Wednesday, January 7, 2026

Web Apps

I've created some Web apps using ChatGPT Codex and Claude Code. You can access them for free:

http://nevilleresearch.com/visual/

Visualizes how close plays are to each other using one semantic and two stylometric measures.

http://nevilleresearch.com/claudius/

Interactive Hamlet e-reader with funny annotations and audio. Works on mobile or desktop.

http://nevilleresearch.com/confession/

Read Henry Neville's confession with pop-up n-grams in Shakespeare's plays.

http://nevilleresearch.com/letters/

Read Henry Neville's letters with pop-up n-grams in Shakespeare's plays.

http://nevilleresearch.com/hamneville/

Read Hamlet with pop-ups to Henry Neville letters.

http://nevilleresearch.com/hamlet/

Shows all cannon references in Hamlet.

Monday, January 5, 2026

Bigram Analysis: Neville's Letters vs. Shakespeare

I created an XML file of Neville's letters from Winwood's Memorials of Affairs of State, Vols. 1 and 2. That's 89 letters Henry Neville wrote, mostly as ambassador from 1599 to 1601.

Using Pervez Rizvi's database of early modern English plays, I compared bigrams (two-word combinations) to see which plays most closely match the letters. I evaluated 239 plays from 1590-1615. The results are stunning. Shakespeare's plays rank at the top consistently:

Rank | Year | Similarity | Title
1 | 1613 | 0.6126 | Henry VIII [Full Play]
2 | 1609 | 0.6079 | The Winter's Tale
3 | 1599 | 0.5897 | Henry V
4 | 1613 | 0.5866 | Henry VIII [Shakespeare Section]
5 | 1610 | 0.5843 | Cymbeline
6 | 1603 | 0.5736 | All's Well That Ends Well
7 | 1600 | 0.5687 | Cynthia's Revels (Jonson)
8 | 1597 | 0.5659 | Henry IV, Part 2
9 | 1608 | 0.5652 | Coriolanus
10 | 1602 | 0.5645 | The Royal King and the Loyal Subject
11 | 1607 | 0.5597 | The Tragedy of Charles Duke of Byron
12 | 1595 | 0.5585 | Love's Labor's Lost
13 | 1603 | 0.5583 | Measure for Measure
14 | 1599 | 0.5528 | 1 Edward the Fourth
15 | 1599 | 0.5509 | Every Man Out of His Humour (Jonson)
16 | 1597 | 0.5473 | Henry IV, Part 1
17 | 1605 | 0.5448 | Philotas
18 | 1601 | 0.5431 | Hamlet
19 | 1599 | 0.5420 | 2 Edward the Fourth
20 | 1609 | 0.5414 | Epicoene (Jonson)
21 | 1591 | 0.5404 | Henry VI, Part 2
22 | 1607 | 0.5398 | The Conspiracy of Charles Duke of Byron
23 | 1605 | 0.5392 | King Lear
24 | 1614 | 0.5381 | The Hector of Germany
25 | 1614 | 0.5356 | Bartholomew Fair (Jonson)
26 | 1604 | 0.5333 | Sejanus His Fall (Jonson)
27 | 1613 | 0.5326 | Henry VIII [Fletcher Section]
28 | 1606 | 0.5324 | The Isle of Gulls
29 | 1604 | 0.5310 | The Widow's Tears
30 | 1605 | 0.5301 | Volpone (Jonson)
31 | 1611 | 0.5299 | Catiline His Conspiracy (Jonson)
32 | 1606 | 0.5289 | Antony and Cleopatra
33 | 1610 | 0.5285 | The Revenge of Bussy D'Ambois
34 | 1614 | 0.5280 | The Staple of News
35 | 1598 | 0.5240 | Every Man in His Humour (Jonson)
36 | 1592 | 0.5216 | A Knack to Know a Knave
37 | 1590 | 0.5210 | Jack Straw
38 | 1596 | 0.5205 | The Merchant of Venice
39 | 1604 | 0.5187 | When You See Me You Know Me
40 | 1595 | 0.5173 | Richard II

This is not conclusive evidence that Henry Neville wrote the works of Shakespeare. But it is an objective and reproducible test that shows a clear affinity between the two-word phrases Henry Neville and Shakespeare used. 
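
For readers who want a feel for what a bigram comparison of this kind involves, here is a minimal sketch. It assumes the similarity score is a cosine similarity over raw bigram counts, which this post does not confirm for this particular test. The function names (`bigrams`, `cosine`, `rank_plays`) and the input shape (a dict of play title to token list) are hypothetical, chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def bigrams(tokens):
    """Return the adjacent word pairs in a list of tokens."""
    return list(zip(tokens, tokens[1:]))


def cosine(a, b):
    """Cosine similarity between two Counters of bigram counts."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


def rank_plays(letter_tokens, plays):
    """Rank plays (title -> token list) by bigram similarity to the letters.

    Assumption: cosine similarity on raw counts; the original test may
    weight or normalise the counts differently.
    """
    target = Counter(bigrams(letter_tokens))
    scores = {
        title: cosine(target, Counter(bigrams(tokens)))
        for title, tokens in plays.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Calling `rank_plays(letter_tokens, plays)` returns (title, score) pairs from most to least similar, which is the shape of the table above.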

This overlap is partly due to the topics of the plays aligning with the experiences Neville had as ambassador. That is not a defect in this study. Quite the opposite: the overlap is another piece of strong evidence.

This research was done with the help of Claude Code. 

I ran a similar test, with the help of ChatGPT Codex, that reduces reliance on topical words. "The new test uses function-word bigrams only (top 200 MFW), then compares length-matched windows with bootstrapping and reports mean ± std. This reduces topical bias and makes comparisons fairer across different text lengths." The results are very similar:

Rank | Year | Title | Mean_Sim
1 | 1613 | Henry VIII [Shakespeare Sect] | 0.7802
2 | 1613 | Henry VIII | 0.7548
3 | 1599 | Henry V | 0.7533
4 | 1607 | Tragedy of Charles Duke of Byron | 0.7435
5 | 1607 | Conspiracy of Charles Duke of Byron | 0.7386
6 | 1609 | The Winter's Tale | 0.7342
7 | 1605 | Philotas | 0.7341
8 | 1606 | Macbeth | 0.7309
9 | 1614 | The Hector of Germany | 0.7268
10 | 1595 | Richard II | 0.7219
11 | 1604 | Sejanus His Fall | 0.7207
12 | 1610 | Cymbeline | 0.7202
13 | 1608 | Coriolanus | 0.7191
14 | 1597 | Henry IV, Part 2 | 0.7165
15 | 1590 | The Reign of King Edward the Third | 0.7160
16 | 1591 | Locrine | 0.7147
17 | 1606 | The Rape of Lucrece | 0.7130
18 | 1592 | Summer's Last Will and Testament | 0.7129
19 | 1610 | The Revenge of Bussy D'Ambois | 0.7117
20 | 1603 | The Family of Love | 0.7116
21 | 1596 | King John | 0.7108
22 | 1592 | Henry VI, Part 1 | 0.7106
23 | 1606 | Hymenaei | 0.7103
24 | 1603 | All's Well That Ends Well | 0.7094
25 | 1595 | Love's Labor's Lost | 0.7091
26 | 1611 | The Atheist's Tragedy | 0.7089
27 | 1591 | 1 The Troublesome Reign of King John | 0.7055
28 | 1613 | Henry VIII [Fletcher Section] | 0.7018
29 | 1604 | The Widow's Tears | 0.7017
30 | 1590 | The Love of David and Fair Bathsheba | 0.7012
31 | 1593 | The Massacre at Paris | 0.7005
32 | 1611 | Catiline His Conspiracy | 0.6993
33 | 1591 | Henry VI, Part 2 | 0.6987
34 | 1591 | 2 The Troublesome Reign of King John | 0.6970
35 | 1610 | The Golden Age | 0.6965
36 | 1606 | The Isle of Gulls | 0.6942
37 | 1614 | The Staple of News | 0.6939
38 | 1603 | Measure for Measure | 0.6935
39 | 1606 | Antony and Cleopatra | 0.6930
40 | 1590 | Jack Straw | 0.6928
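
The function-word test can be sketched roughly as follows. This is an illustration under stated assumptions, not the code that produced the table: it assumes a bigram counts only when both words are on the function-word list, that "length-matched windows" means random contiguous windows of equal size drawn from each text, and that the score is cosine similarity. The names `function_bigrams`, `random_window` and `bootstrap_similarity`, and the default window and sample sizes, are hypothetical.

```python
import random
from collections import Counter
from math import sqrt
from statistics import mean, stdev


def function_bigrams(tokens, function_words):
    """Count adjacent pairs in which both words are function words."""
    return Counter(
        (a, b)
        for a, b in zip(tokens, tokens[1:])
        if a in function_words and b in function_words
    )


def cosine(a, b):
    """Cosine similarity between two Counters."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def random_window(tokens, size, rng):
    """Pick a random contiguous window of `size` tokens (whole text if shorter)."""
    if len(tokens) <= size:
        return tokens
    start = rng.randrange(len(tokens) - size + 1)
    return tokens[start:start + size]


def bootstrap_similarity(text_a, text_b, function_words,
                         window=1000, samples=200, seed=0):
    """Mean and standard deviation of similarity over length-matched windows.

    Both texts are cut to windows of the same size on every draw, so a long
    play does not gain an advantage over a short one. Needs samples >= 2.
    """
    rng = random.Random(seed)
    size = min(window, len(text_a), len(text_b))
    scores = []
    for _ in range(samples):
        window_a = random_window(text_a, size, rng)
        window_b = random_window(text_b, size, rng)
        scores.append(cosine(function_bigrams(window_a, function_words),
                             function_bigrams(window_b, function_words)))
    return mean(scores), stdev(scores)
```

Fixing the seed keeps the run reproducible, which matters when the mean and standard deviation are reported to four decimal places.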

I have placed all the necessary information to reproduce both tests here: https://nevilleresearch.com/bigram/ 

Neville wrote a confession on March 2, 1601, after the execution of the Earl of Essex. You can read it here. If you take the rare bigrams in this confession (those found in 5 or fewer plays) and count how many each of the 239 plays shares, the ranking looks like this:

Rank | Year | Shared | Play
1 | 1610 | 10 | Cymbeline (Shakespeare)
2 | 1613 | 10 | Henry VIII (Shakespeare/Fletcher)
3 | 1613 | 10 | Two Noble Kinsmen (Shakespeare/Fletcher)
4 | 1597 | 9 | Henry IV, Part 2 (Shakespeare)
5 | 1600 | 9 | Cynthia's Revels
6 | 1599 | 8 | Clyomon and Clamydes
7 | 1604 | 8 | The Widow's Tears
8 | 1611 | 8 | Catiline His Conspiracy
9 | 1612 | 8 | The White Devil
10 | 1604 | 7 | Sejanus His Fall
...
15 | 1601 | 6 | Hamlet (Shakespeare)
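
A rare-bigram count like the one in this table might be computed along these lines. The sketch assumes that "rare" means a bigram occurring in at least one and at most five of the plays, and that "Shared" counts distinct bigrams rather than total occurrences; neither detail is spelled out above. The helper names `bigram_set` and `rare_shared_counts` are hypothetical.

```python
from collections import Counter


def bigram_set(tokens):
    """Distinct adjacent word pairs in a list of tokens."""
    return set(zip(tokens, tokens[1:]))


def rare_shared_counts(query_tokens, plays, max_plays=5):
    """Count, per play, the rare query bigrams that the play also contains.

    A bigram is treated as rare when it appears in between 1 and
    `max_plays` plays (an assumption about the definition used).
    """
    play_sets = {title: bigram_set(tokens) for title, tokens in plays.items()}
    # Document frequency: in how many plays does each bigram appear?
    doc_freq = Counter(gram for grams in play_sets.values() for gram in grams)
    rare = {gram for gram in bigram_set(query_tokens)
            if 1 <= doc_freq[gram] <= max_plays}
    counts = {title: len(rare & grams) for title, grams in play_sets.items()}
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)
```

Because very common pairs such as "of the" appear in nearly every play, the document-frequency filter removes them and leaves only the pairs that can discriminate between plays.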

If we look at all bigrams, Hamlet is the second highest:

Rank | Year | Similarity | Play
1 | 1592 | 0.411 | Edward the Second (Marlowe)
2 | 1601 | 0.401 | Hamlet (Shakespeare)
3 | 1607 | 0.389 | Tragedy of Charles Duke of Byron (Chapman)
4 | 1605 | 0.372 | The Revenger's Tragedy
5 | 1599 | 0.369 | 1 Sir John Oldcastle
6 | 1602 | 0.360 | The Gentleman Usher (Chapman)
7 | 1592 | 0.350 | Richard III (Shakespeare)
8 | 1604 | 0.345 | Bussy D'Ambois (Chapman)
9 | 1604 | 0.343 | 1 If You Know Not Me You Know Nobody
10 | 1593 | 0.343 | The Massacre at Paris (Marlowe)


Comparing Neville's "Case", another document written after his arrest for the Essex Rebellion, produces an even stronger result:


Here is another version of the same test, with extensive details, generated by ChatGPT Codex:

The Case (1601) vs. Early Modern Plays: Function-Word + POS Bootstrap Analysis

Overview

This report summarizes a function-word and part-of-speech (POS) n-gram analysis of Sir Henry Neville's The Case (1601), comparing it to 239 plays from 1590–1615. The method emphasizes syntactic habit over topical vocabulary, and uses bootstrap sampling to estimate the stability of similarity rankings.

This syntactic result aligns with earlier findings from word-bigram and rare-bigram analyses, which also placed Hamlet and other Shakespeare plays in the top tier. The key difference is that this test is largely insulated from content vocabulary, suggesting that the observed affinity persists even when topical overlap is minimized. That convergence across methods strengthens the case for a genuine stylistic proximity rather than a coincidence of subject matter.

Method in brief

  1. Lemmas and POS tags were extracted from the XML text of The Case.
  2. Each play was tokenized by lemma from the Early Modern Plays database, then POS-tagged with spaCy.
  3. Two feature sets were built for each text: function-word n-grams and POS-tag n-grams (n = 2–4).
  4. Cosine similarity was computed between The Case and each play for both feature sets.
  5. The final score is the mean of the function-word and POS similarities.
  6. Bootstrap stability: 500 random 400-token windows were drawn from The Case, producing 500 rankings; we report the mean similarity and a 95% interval for each play.
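
Steps 3 to 5 can be sketched as below. This is an illustrative reading of the method, not the script that produced the report. It takes already-tagged input, a list of (lemma, POS) pairs, so it does not depend on spaCy, and it assumes a function-word n-gram is one in which every word is on the function-word list. The names `ngrams`, `features` and `combined_similarity` are hypothetical.

```python
from collections import Counter
from math import sqrt


def ngrams(seq, n_min=2, n_max=4):
    """Count all n-grams of length n_min..n_max in a list."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        # zip over n shifted copies of the list yields consecutive n-tuples
        counts.update(zip(*(seq[i:] for i in range(n))))
    return counts


def cosine(a, b):
    """Cosine similarity between two Counters."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def features(tagged, function_words):
    """Build the function-word and POS n-gram counts from (lemma, pos) pairs."""
    lemmas = [lemma.lower() for lemma, _ in tagged]
    tags = [pos for _, pos in tagged]
    # Assumption: keep only n-grams made up entirely of function words.
    fw_counts = Counter({
        gram: count for gram, count in ngrams(lemmas).items()
        if all(word in function_words for word in gram)
    })
    return fw_counts, ngrams(tags)


def combined_similarity(tagged_a, tagged_b, function_words):
    """Mean of the function-word and POS cosine similarities (step 5)."""
    fw_a, pos_a = features(tagged_a, function_words)
    fw_b, pos_b = features(tagged_b, function_words)
    return (cosine(fw_a, fw_b) + cosine(pos_a, pos_b)) / 2
```

Averaging the two scores gives each feature set equal weight. That matches step 5 as described, though other weightings would be just as easy to try.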

Parameters

  • Plays: 239 texts dated 1590–1615
  • Function words: 200+ Early Modern function words (Burrows-style list with EM additions)
  • POS tagger: spaCy en_core_web_sm
  • N-gram sizes: 2–4
  • Window size: 400 tokens
  • Bootstrap samples: 500
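
With the window size and sample count listed above, the bootstrap step might look something like this. It is a sketch under assumptions: windows are taken to be random contiguous slices, and the 95% interval is taken to be a simple percentile interval (2.5th to 97.5th) of the sampled scores, which the report does not state explicitly. `bootstrap_interval` and its `score` argument (any function from a token window to a similarity) are hypothetical names.

```python
import random
from statistics import mean


def bootstrap_interval(query_tokens, score, window=400, samples=500, seed=0):
    """Mean score and a 95% percentile interval over random windows.

    `score` maps a window of tokens to a number, for example the
    similarity between that window and one particular play.
    """
    rng = random.Random(seed)
    size = min(window, len(query_tokens))
    scores = []
    for _ in range(samples):
        start = rng.randrange(len(query_tokens) - size + 1)
        scores.append(score(query_tokens[start:start + size]))
    scores.sort()
    # Nearest-rank percentiles; a simple choice, assumed for illustration.
    lower = scores[int(0.025 * (samples - 1))]
    upper = scores[int(0.975 * (samples - 1))]
    return mean(scores), (lower, upper)
```

Running this once per play, with `score` bound to that play, would yield one mean and one interval per play, the shape of the table in the next section. With a source text of only 1,422 tokens, 400-token windows overlap heavily, so the intervals describe stability within the text rather than independent evidence.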

Top 20 plays by combined mean similarity

Rank | Play | Year | Combined Mean | 95% Interval
1 | Hamlet | 1601 | 0.1337 | [0.1093, 0.1742]
2 | Richard III | 1592 | 0.1329 | [0.1138, 0.1712]
3 | 2 Edward the Fourth | 1599 | 0.1306 | [0.1124, 0.1587]
4 | Henry VI, Part 2 | 1591 | 0.1297 | [0.1086, 0.1719]
5 | Cymbeline | 1610 | 0.1291 | [0.1066, 0.1670]
6 | The True Chronicle of King Leir | 1590 | 0.1290 | [0.1110, 0.1551]
7 | Othello | 1604 | 0.1281 | [0.1071, 0.1513]
8 | The Winter's Tale | 1609 | 0.1275 | [0.1067, 0.1629]
9 | Edward the Second | 1592 | 0.1269 | [0.1078, 0.1583]
10 | Alphonsus, Emperor of Germany | 1594 | 0.1269 | [0.1072, 0.1621]
11 | Antony and Cleopatra | 1606 | 0.1267 | [0.1066, 0.1647]
12 | The Queen's Arcadia | 1605 | 0.1266 | [0.1135, 0.1540]
13 | The Tragedy of Charles Duke of Byron | 1607 | 0.1266 | [0.1056, 0.1753]
14 | Clyomon and Clamydes | 1599 | 0.1264 | [0.1060, 0.1467]
15 | Henry IV, Part 2 | 1597 | 0.1264 | [0.1017, 0.1702]
16 | Fair Em | 1590 | 0.1263 | [0.1094, 0.1531]
17 | Measure for Measure | 1603 | 0.1262 | [0.1023, 0.1562]
18 | As You Like It | 1599 | 0.1253 | [0.1052, 0.1536]
19 | Volpone | 1605 | 0.1253 | [0.1060, 0.1558]
20 | Two Lamentable Tragedies | 1594 | 0.1248 | [0.1062, 0.1564]

Interpretation

The highest-ranked plays are dominated by Shakespeare’s late and middle-period works, with Hamlet at the top of the list. This is notable because the method suppresses content vocabulary and instead emphasizes grammatical habit (function-word sequences and POS patterns). That means the observed affinity is less likely to be driven by shared topics and more likely to reflect structural linguistic tendencies.

Caveats

  • The Case is short (1,422 tokens), so even bootstrap windows are drawn from a limited pool.
  • POS tags are produced by a modern tagger; Early Modern syntax may be partially misclassified.
  • Similarity does not prove authorship; it indicates stylistic proximity under a specific metric.

[Please note: if you take Hamlet and run the same bigram test against all the plays from 1590-1615, Shakespeare plays fill most of the top 20, which shows the value of the test:

  1. Hamlet (503) 1.0000

  2. All’s Well That Ends Well (496) 0.7479

  3. King Lear (507) 0.7459

  4. Othello (512) 0.7456

  5. The Winter’s Tale (525) 0.7448

  6. Henry IV, Part 2 (491) 0.7442

  7. Richard III (515) 0.7400

  8. Measure for Measure (509) 0.7285

  9. Troilus and Cressida (523) 0.7227

  10. 1 Sir John Oldcastle (357) 0.7221

  11. The White Devil (52) 0.7215

  12. Henry IV, Part 1 (489) 0.7190

  13. Cymbeline (499) 0.7181

  14. Henry VIII (502) 0.7122

  15. The Gentleman Usher (452) 0.7058

  16. The Woman Hater (413) 0.7021

  17. Antony and Cleopatra (495) 0.6979

  18. 2 Edward the Fourth (340) 0.6925

  19. Much Ado About Nothing (494) 0.6900

  20. Sir Giles Goosecap (388) 0.6895]


Comparing Neville's 1613 "Advice" to King James against the same plays, we get these amazing results:


Top 20 (lemma bigrams, 1590–1615), listed as rank | year | similarity | title:


  - 1 | 1599 | 0.3974 | Henry V

  - 2 | 1604 | 0.3802 | Sejanus His Fall

  - 3 | 1609 | 0.3779 | The Winter's Tale

  - 4 | 1613 | 0.3771 | Henry VIII

  - 5 | 1600 | 0.3738 | Cynthia's Revels

  - 6 | 1595 | 0.3697 | Love's Labor's Lost

  - 7 | 1605 | 0.3666 | Philotas

  - 8 | 1608 | 0.3617 | Coriolanus

  - 9 | 1590 | 0.3606 | Jack Straw

  - 10 | 1610 | 0.3581 | Cymbeline

  - 11 | 1607 | 0.3570 | The Conspiracy of Charles Duke of Byron

  - 12 | 1597 | 0.3562 | Henry IV, Part 2

  - 13 | 1601 | 0.3542 | Hamlet

  - 14 | 1610 | 0.3537 | The Revenge of Bussy D'Ambois

  - 15 | 1597 | 0.3527 | Henry IV, Part 1

  - 16 | 1611 | 0.3526 | Catiline His Conspiracy

  - 17 | 1604 | 0.3517 | Arches of Triumph

  - 18 | 1607 | 0.3504 | The Tragedy of Charles Duke of Byron

  - 19 | 1599 | 0.3471 | 1 Edward the Fourth

  - 20 | 1603 | 0.3471 | All's Well That Ends Well


Trigrams provide an equally strong result:


Top 20 (lemma trigrams, 1590–1615), listed as rank | year | similarity | title:


  - 1 | 1609 | 0.0263 | The Captain

  - 2 | 1613 | 0.0263 | Henry VIII

  - 3 | 1603 | 0.0253 | All’s Well That Ends Well

  - 4 | 1605 | 0.0251 | The Noble Gentleman

  - 5 | 1609 | 0.0249 | The Winter’s Tale

  - 6 | 1599 | 0.0247 | 2 Edward the Fourth

  - 7 | 1598 | 0.0236 | Much Ado About Nothing

  - 8 | 1597 | 0.0235 | Henry IV, Part 2

  - 9 | 1599 | 0.0235 | Henry V

  - 10 | 1611 | 0.0233 | A King and No King

  - 11 | 1599 | 0.0232 | 1 Edward the Fourth

  - 12 | 1610 | 0.0231 | The Maid’s Tragedy

  - 13 | 1601 | 0.0220 | Hamlet

  - 14 | 1595 | 0.0220 | Love’s Labor’s Lost

  - 15 | 1599 | 0.0220 | As You Like It

  - 16 | 1592 | 0.0218 | Edward the Second

  - 17 | 1607 | 0.0217 | Cupid’s Revenge

  - 18 | 1599 | 0.0216 | Julius Caesar

  - 19 | 1597 | 0.0208 | Henry IV, Part 1

  - 20 | 1597 | 0.0208 | An Humorous Day’s Mirth