I created an XML file of Henry Neville's letters from Winwood's Memorials of Affairs of State, Vols. 1 and 2. It contains 89 letters that Neville wrote, mostly as ambassador from 1599 to 1601.
Using Pervez Rizvi's database of early modern English plays, I compared bigrams (two-word sequences) to see which plays most closely match the letters. I evaluated 239 plays from 1590-1615. The results are stunning. Shakespeare's plays consistently rank at the top, filling the first six rows (Henry VIII appears twice, as the full play and as the Shakespeare section):
| Rank | Year | Similarity | Title |
|---|---|---|---|
| 1 | 1613 | 0.6126 | Henry VIII [Full Play] |
| 2 | 1609 | 0.6079 | The Winter's Tale |
| 3 | 1599 | 0.5897 | Henry V |
| 4 | 1613 | 0.5866 | Henry VIII [Shakespeare Section] |
| 5 | 1610 | 0.5843 | Cymbeline |
| 6 | 1603 | 0.5736 | All's Well That Ends Well |
| 7 | 1600 | 0.5687 | Cynthia's Revels (Jonson) |
| 8 | 1597 | 0.5659 | Henry IV, Part 2 |
| 9 | 1608 | 0.5652 | Coriolanus |
| 10 | 1602 | 0.5645 | The Royal King and the Loyal Subject |
| 11 | 1607 | 0.5597 | The Tragedy of Charles Duke of Byron |
| 12 | 1595 | 0.5585 | Love's Labor's Lost |
| 13 | 1603 | 0.5583 | Measure for Measure |
| 14 | 1599 | 0.5528 | 1 Edward the Fourth |
| 15 | 1599 | 0.5509 | Every Man Out of His Humour (Jonson) |
| 16 | 1597 | 0.5473 | Henry IV, Part 1 |
| 17 | 1605 | 0.5448 | Philotas |
| 18 | 1601 | 0.5431 | Hamlet |
| 19 | 1599 | 0.5420 | 2 Edward the Fourth |
| 20 | 1609 | 0.5414 | Epicoene (Jonson) |
| 21 | 1591 | 0.5404 | Henry VI, Part 2 |
| 22 | 1607 | 0.5398 | The Conspiracy of Charles Duke of Byron |
| 23 | 1605 | 0.5392 | King Lear |
| 24 | 1614 | 0.5381 | The Hector of Germany |
| 25 | 1614 | 0.5356 | Bartholomew Fair (Jonson) |
| 26 | 1604 | 0.5333 | Sejanus His Fall (Jonson) |
| 27 | 1613 | 0.5326 | Henry VIII [Fletcher Section] |
| 28 | 1606 | 0.5324 | The Isle of Gulls |
| 29 | 1604 | 0.5310 | The Widow's Tears |
| 30 | 1605 | 0.5301 | Volpone (Jonson) |
| 31 | 1611 | 0.5299 | Catiline His Conspiracy (Jonson) |
| 32 | 1606 | 0.5289 | Antony and Cleopatra |
| 33 | 1610 | 0.5285 | The Revenge of Bussy D'Ambois |
| 34 | 1614 | 0.5280 | The Staple of News |
| 35 | 1598 | 0.5240 | Every Man in His Humour (Jonson) |
| 36 | 1592 | 0.5216 | A Knack to Know a Knave |
| 37 | 1590 | 0.5210 | Jack Straw |
| 38 | 1596 | 0.5205 | The Merchant of Venice |
| 39 | 1604 | 0.5187 | When You See Me You Know Me |
| 40 | 1595 | 0.5173 | Richard II |
This research was done with the help of Claude Code.
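
For readers who want to try a comparison like this, here is a minimal sketch of a bigram similarity ranking. It is not the script that produced the table above. It assumes the "Similarity" score is a cosine similarity over raw bigram counts, which the post does not state, and the names `bigrams`, `cosine`, and `rank_plays` are hypothetical.

```python
import math
import re
from collections import Counter


def bigrams(text):
    """Lowercase the text, split it into word tokens, and count adjacent pairs."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(zip(tokens, tokens[1:]))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (assumed metric)."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_plays(letters_text, plays):
    """Return (title, similarity) pairs sorted from most to least similar.

    `plays` maps a title to the full text of that play.
    """
    target = bigrams(letters_text)
    scores = [(title, cosine(target, bigrams(text))) for title, text in plays.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Calling `rank_plays(letters_text, {"Henry V": henry_v_text, ...})` would return the plays ordered by score. Raw word bigrams are sensitive to subject matter and text length, which is what the next test tries to control for.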
I ran a similar test, with the help of ChatGPT Codex, that reduces reliance on topical words. "The new test uses function‑word bigrams only (top 200 MFW), then compares length‑matched windows with bootstrapping and reports mean ± std. This reduces topical bias and makes comparisons fairer across different text lengths." Very similar results:
| Rank | Year | Title | Mean_Sim |
|---|---|---|---|
| 1 | 1613 | Henry VIII [Shakespeare Sect] | 0.7802 |
| 2 | 1613 | Henry VIII | 0.7548 |
| 3 | 1599 | Henry V | 0.7533 |
| 4 | 1607 | Tragedy of Charles Duke of Byron | 0.7435 |
| 5 | 1607 | Conspiracy of Charles Duke of Byron | 0.7386 |
| 6 | 1609 | The Winter's Tale | 0.7342 |
| 7 | 1605 | Philotas | 0.7341 |
| 8 | 1606 | Macbeth | 0.7309 |
| 9 | 1614 | The Hector of Germany | 0.7268 |
| 10 | 1595 | Richard II | 0.7219 |
| 11 | 1604 | Sejanus His Fall | 0.7207 |
| 12 | 1610 | Cymbeline | 0.7202 |
| 13 | 1608 | Coriolanus | 0.7191 |
| 14 | 1597 | Henry IV, Part 2 | 0.7165 |
| 15 | 1590 | The Reign of King Edward the Third | 0.7160 |
| 16 | 1591 | Locrine | 0.7147 |
| 17 | 1606 | The Rape of Lucrece | 0.7130 |
| 18 | 1592 | Summer's Last Will and Testament | 0.7129 |
| 19 | 1610 | The Revenge of Bussy D'Ambois | 0.7117 |
| 20 | 1603 | The Family of Love | 0.7116 |
| 21 | 1596 | King John | 0.7108 |
| 22 | 1592 | Henry VI, Part 1 | 0.7106 |
| 23 | 1606 | Hymenaei | 0.7103 |
| 24 | 1603 | All's Well That Ends Well | 0.7094 |
| 25 | 1595 | Love's Labor's Lost | 0.7091 |
| 26 | 1611 | The Atheist's Tragedy | 0.7089 |
| 27 | 1591 | 1 The Troublesome Reign of King John | 0.7055 |
| 28 | 1613 | Henry VIII [Fletcher Section] | 0.7018 |
| 29 | 1604 | The Widow's Tears | 0.7017 |
| 30 | 1590 | The Love of David and Fair Bathsheba | 0.7012 |
| 31 | 1593 | The Massacre at Paris | 0.7005 |
| 32 | 1611 | Catiline His Conspiracy | 0.6993 |
| 33 | 1591 | Henry VI, Part 2 | 0.6987 |
| 34 | 1591 | 2 The Troublesome Reign of King John | 0.6970 |
| 35 | 1610 | The Golden Age | 0.6965 |
| 36 | 1606 | The Isle of Gulls | 0.6942 |
| 37 | 1614 | The Staple of News | 0.6939 |
| 38 | 1603 | Measure for Measure | 0.6935 |
| 39 | 1606 | Antony and Cleopatra | 0.6930 |
| 40 | 1590 | Jack Straw | 0.6928 |
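
The quoted description (function-word bigrams, length-matched windows, bootstrapping, mean ± std) can be sketched roughly as follows. This is an illustration under stated assumptions, not the Codex script: the function-word list here is a tiny placeholder for the top-200 list, "function-word bigrams" is read as bigrams over the text after content words are dropped, and the window size, sample count, and all function names are hypothetical.

```python
import math
import random
import statistics
from collections import Counter

# Placeholder list for illustration; the test described above used 200 words.
FUNCTION_WORDS = {"the", "and", "of", "to", "a", "in", "that", "i",
                  "it", "my", "is", "not", "with", "for", "but"}


def function_word_bigrams(tokens, vocab=FUNCTION_WORDS):
    """Drop content words, then count adjacent pairs of what remains (assumed reading)."""
    kept = [t for t in tokens if t in vocab]
    return Counter(zip(kept, kept[1:]))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def random_window(tokens, size, rng):
    """Return a random contiguous slice of `size` tokens (or everything if shorter)."""
    if len(tokens) <= size:
        return tokens
    start = rng.randrange(len(tokens) - size + 1)
    return tokens[start:start + size]


def bootstrap_similarity(letters, play, window=2000, samples=200, seed=0):
    """Compare equal-length random windows repeatedly; return (mean, std)."""
    rng = random.Random(seed)
    size = min(window, len(letters), len(play))  # length-matched windows
    scores = []
    for _ in range(samples):
        a = function_word_bigrams(random_window(letters, size, rng))
        b = function_word_bigrams(random_window(play, size, rng))
        scores.append(cosine(a, b))
    return statistics.mean(scores), statistics.pstdev(scores)
```

Both inputs are token lists. Drawing windows of the same size from each text keeps a long play from scoring higher simply because it contains more bigrams.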
Neville wrote a confession on March 2, 1601, after the execution of the Earl of Essex. You can read it here. Counting the rare bigrams (those found in 5 or fewer plays) that this confession shares with each of the 239 plays gives this ranking:
| Rank | Year | Shared | Play |
|---|---|---|---|
| 1 | 1610 | 10 | Cymbeline (Shakespeare) |
| 2 | 1613 | 10 | Henry VIII (Shakespeare/Fletcher) |
| 3 | 1613 | 10 | Two Noble Kinsmen (Shakespeare/Fletcher) |
| 4 | 1597 | 9 | Henry IV, Part 2 (Shakespeare) |
| 5 | 1600 | 9 | Cynthia's Revels |
| 6 | 1599 | 8 | Clyomon and Clamydes |
| 7 | 1604 | 8 | The Widow's Tears |
| 8 | 1611 | 8 | Catiline His Conspiracy |
| 9 | 1612 | 8 | The White Devil |
| 10 | 1604 | 7 | Sejanus His Fall |
| ... | ... | ... | ... |
| 15 | 1601 | 6 | Hamlet (Shakespeare) |
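
A rare-bigram count like the "Shared" column above could be computed along these lines. This is a sketch, not the script used for the table. It assumes each bigram is counted once per play (present or absent), that "rare" means the bigram occurs in at least one and at most `max_plays` plays, and the function names are hypothetical.

```python
from collections import Counter


def bigram_set(tokens):
    """The set of distinct adjacent token pairs in a text."""
    return set(zip(tokens, tokens[1:]))


def rare_bigram_ranking(target_tokens, plays, max_plays=5):
    """Rank plays by how many rare bigrams they share with the target text.

    `plays` maps a title to that play's token list.
    """
    play_bigrams = {title: bigram_set(toks) for title, toks in plays.items()}

    # Document frequency: in how many plays does each bigram occur?
    doc_freq = Counter()
    for grams in play_bigrams.values():
        doc_freq.update(grams)

    # Keep only the target's bigrams that appear in 1..max_plays plays.
    rare = {g for g in bigram_set(target_tokens) if 1 <= doc_freq[g] <= max_plays}

    ranking = [(title, len(rare & grams)) for title, grams in play_bigrams.items()]
    return sorted(ranking, key=lambda pair: pair[1], reverse=True)
```

With counts this small (10 versus 9 versus 8), ties are common, and a longer play has more chances to contain any given bigram, so dividing by play length may be worth considering.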
| Rank | Year | Similarity | Play |
|---|---|---|---|
| 1 | 1592 | 0.411 | Edward the Second (Marlowe) |
| 2 | 1601 | 0.401 | Hamlet (Shakespeare) |
| 3 | 1607 | 0.389 | Tragedy of Charles Duke of Byron (Chapman) |
| 4 | 1605 | 0.372 | The Revenger's Tragedy |
| 5 | 1599 | 0.369 | 1 Sir John Oldcastle |
| 6 | 1602 | 0.360 | The Gentleman Usher (Chapman) |
| 7 | 1592 | 0.350 | Richard III (Shakespeare) |
| 8 | 1604 | 0.345 | Bussy D'Ambois (Chapman) |
| 9 | 1604 | 0.343 | 1 If You Know Not Me You Know Nobody |
| 10 | 1593 | 0.343 | The Massacre at Paris (Marlowe) |
Here is another version of the same test, with extensive details, generated by ChatGPT Codex:
The Case (1601) vs. Early Modern Plays: Function-Word + POS Bootstrap Analysis
Overview
This report summarizes a function-word and part-of-speech (POS) n-gram analysis of Sir Henry Neville's The Case (1601), comparing it to 239 plays from 1590–1615. The method emphasizes syntactic habit over topical vocabulary, and uses bootstrap sampling to estimate the stability of similarity rankings.
This syntactic result aligns with earlier findings from word-bigram and rare-bigram analyses, which also placed Hamlet and other Shakespeare plays in the top tier. The key difference is that this test is largely insulated from content vocabulary, suggesting that the observed affinity persists even when topical overlap is minimized. That convergence across methods strengthens the case for a genuine stylistic proximity rather than a coincidence of subject matter.
Method in brief
- Lemmas and POS tags were extracted from the XML text of The Case.
- Each play was tokenized by lemma from the Early Modern Plays database, then POS-tagged with spaCy.
- Two feature sets were built for each text: function-word n-grams and POS-tag n-grams (n = 2--4).
- Cosine similarity was computed between The Case and each play for both feature sets.
- The final score is the mean of the function-word and POS similarities.
- Bootstrap stability: 500 random 400-token windows were drawn from The Case, producing 500 rankings; we report the mean similarity and a 95% interval for each play.
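
The steps above can be sketched as follows. This is a hedged illustration, not the code behind the report: it assumes each text has already been turned into a list of `(lemma, pos_tag)` pairs (for example by spaCy), that the 95% interval is a simple percentile interval, and that every function name here is hypothetical.

```python
import math
import random
from collections import Counter


def ngram_counts(items, n_min=2, n_max=4):
    """Count all n-grams of length n_min..n_max in a sequence."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(items) - n + 1):
            counts[tuple(items[i:i + n])] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def features(doc, function_words):
    """Build the two feature sets: function-word n-grams and POS-tag n-grams."""
    fw = [lemma for lemma, _ in doc if lemma in function_words]
    pos = [tag for _, tag in doc]
    return ngram_counts(fw), ngram_counts(pos)


def combined_similarity(doc_a, doc_b, function_words):
    """Mean of the function-word cosine and the POS cosine."""
    fw_a, pos_a = features(doc_a, function_words)
    fw_b, pos_b = features(doc_b, function_words)
    return (cosine(fw_a, fw_b) + cosine(pos_a, pos_b)) / 2


def bootstrap_interval(case_doc, play_doc, function_words,
                       window=400, samples=500, seed=0):
    """Score random windows of the short text against a play.

    Returns (mean, (low, high)) with an assumed 95% percentile interval.
    """
    rng = random.Random(seed)
    size = min(window, len(case_doc))
    scores = []
    for _ in range(samples):
        start = rng.randrange(len(case_doc) - size + 1)
        sample = case_doc[start:start + size]
        scores.append(combined_similarity(sample, play_doc, function_words))
    scores.sort()
    low = scores[int(0.025 * (samples - 1))]
    high = scores[int(0.975 * (samples - 1))]
    return sum(scores) / samples, (low, high)
```

Because the windows are drawn from a single short text, they overlap heavily, so an interval computed this way reflects variation within that text rather than independent evidence.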
Parameters
- Plays: 239 texts dated 1590--1615
- Function words: 200+ Early Modern function words (Burrows-style list with EM additions)
- POS tagger: spaCy en_core_web_sm
- N-gram sizes: 2--4
- Window size: 400 tokens
- Bootstrap samples: 500
Top 20 plays by combined mean similarity
| Rank | Play | Year | Combined Mean | 95% Interval |
|---|---|---|---|---|
| 1 | Hamlet | 1601 | 0.1337 | [0.1093, 0.1742] |
| 2 | Richard III | 1592 | 0.1329 | [0.1138, 0.1712] |
| 3 | 2 Edward the Fourth | 1599 | 0.1306 | [0.1124, 0.1587] |
| 4 | Henry VI, Part 2 | 1591 | 0.1297 | [0.1086, 0.1719] |
| 5 | Cymbeline | 1610 | 0.1291 | [0.1066, 0.1670] |
| 6 | The True Chronicle of King Leir | 1590 | 0.1290 | [0.1110, 0.1551] |
| 7 | Othello | 1604 | 0.1281 | [0.1071, 0.1513] |
| 8 | The Winter's Tale | 1609 | 0.1275 | [0.1067, 0.1629] |
| 9 | Edward the Second | 1592 | 0.1269 | [0.1078, 0.1583] |
| 10 | Alphonsus, Emperor of Germany | 1594 | 0.1269 | [0.1072, 0.1621] |
| 11 | Antony and Cleopatra | 1606 | 0.1267 | [0.1066, 0.1647] |
| 12 | The Queen's Arcadia | 1605 | 0.1266 | [0.1135, 0.1540] |
| 13 | The Tragedy of Charles Duke of Byron | 1607 | 0.1266 | [0.1056, 0.1753] |
| 14 | Clyomon and Clamydes | 1599 | 0.1264 | [0.1060, 0.1467] |
| 15 | Henry IV, Part 2 | 1597 | 0.1264 | [0.1017, 0.1702] |
| 16 | Fair Em | 1590 | 0.1263 | [0.1094, 0.1531] |
| 17 | Measure for Measure | 1603 | 0.1262 | [0.1023, 0.1562] |
| 18 | As You Like It | 1599 | 0.1253 | [0.1052, 0.1536] |
| 19 | Volpone | 1605 | 0.1253 | [0.1060, 0.1558] |
| 20 | Two Lamentable Tragedies | 1594 | 0.1248 | [0.1062, 0.1564] |
Interpretation
The highest-ranked plays are dominated by Shakespeare's late and middle-period works, with Hamlet at the top of the list. This is notable because the method suppresses content vocabulary and instead emphasizes grammatical habit (function-word sequences and POS patterns). That means the observed affinity is less likely to be driven by shared topics and more likely to reflect structural linguistic tendencies.
Caveats
- The Case is short (1,422 tokens), so even bootstrap windows are drawn from a limited pool.
- POS tags are produced by a modern tagger; Early Modern syntax may be partially misclassified.
- Similarity does not prove authorship; it indicates stylistic proximity under a specific metric.
1. Hamlet (503) 1.0000
2. All’s Well That Ends Well (496) 0.7479
3. King Lear (507) 0.7459
4. Othello (512) 0.7456
5. The Winter’s Tale (525) 0.7448
6. Henry IV, Part 2 (491) 0.7442
7. Richard III (515) 0.7400
8. Measure for Measure (509) 0.7285
9. Troilus and Cressida (523) 0.7227
10. 1 Sir John Oldcastle (357) 0.7221
11. The White Devil (52) 0.7215
12. Henry IV, Part 1 (489) 0.7190
13. Cymbeline (499) 0.7181
14. Henry VIII (502) 0.7122
15. The Gentleman Usher (452) 0.7058
16. The Woman Hater (413) 0.7021
17. Antony and Cleopatra (495) 0.6979
18. 2 Edward the Fourth (340) 0.6925
19. Much Ado About Nothing (494) 0.6900
20. Sir Giles Goosecap (388) 0.6895
Comparing Neville's 1613 "Advice" to King James with the plays of 1590–1615, we get these amazing results:
Top 20 (lemma bigrams, 1590–1615):
- 1 | 1599 | 0.3974 | Henry V
- 2 | 1604 | 0.3802 | Sejanus His Fall
- 3 | 1609 | 0.3779 | The Winter's Tale
- 4 | 1613 | 0.3771 | Henry VIII
- 5 | 1600 | 0.3738 | Cynthia's Revels
- 6 | 1595 | 0.3697 | Love's Labor's Lost
- 7 | 1605 | 0.3666 | Philotas
- 8 | 1608 | 0.3617 | Coriolanus
- 9 | 1590 | 0.3606 | Jack Straw
- 10 | 1610 | 0.3581 | Cymbeline
- 11 | 1607 | 0.3570 | The Conspiracy of Charles Duke of Byron
- 12 | 1597 | 0.3562 | Henry IV, Part 2
- 13 | 1601 | 0.3542 | Hamlet
- 14 | 1610 | 0.3537 | The Revenge of Bussy D'Ambois
- 15 | 1597 | 0.3527 | Henry IV, Part 1
- 16 | 1611 | 0.3526 | Catiline His Conspiracy
- 17 | 1604 | 0.3517 | Arches of Triumph
- 18 | 1607 | 0.3504 | The Tragedy of Charles Duke of Byron
- 19 | 1599 | 0.3471 | 1 Edward the Fourth
- 20 | 1603 | 0.3471 | All's Well That Ends Well
Trigrams provide an equally strong result:
Top 20 (lemma trigrams, 1590–1615):
- 1 | 1609 | 0.0263 | The Captain
- 2 | 1613 | 0.0263 | Henry VIII
- 3 | 1603 | 0.0253 | All’s Well That Ends Well
- 4 | 1605 | 0.0251 | The Noble Gentleman
- 5 | 1609 | 0.0249 | The Winter’s Tale
- 6 | 1599 | 0.0247 | 2 Edward the Fourth
- 7 | 1598 | 0.0236 | Much Ado About Nothing
- 8 | 1597 | 0.0235 | Henry IV, Part 2
- 9 | 1599 | 0.0235 | Henry V
- 10 | 1611 | 0.0233 | A King and No King
- 11 | 1599 | 0.0232 | 1 Edward the Fourth
- 12 | 1610 | 0.0231 | The Maid’s Tragedy
- 13 | 1601 | 0.0220 | Hamlet
- 14 | 1595 | 0.0220 | Love’s Labor’s Lost
- 15 | 1599 | 0.0220 | As You Like It
- 16 | 1592 | 0.0218 | Edward the Second
- 17 | 1607 | 0.0217 | Cupid’s Revenge
- 18 | 1599 | 0.0216 | Julius Caesar
- 19 | 1597 | 0.0208 | Henry IV, Part 1
- 20 | 1597 | 0.0208 | An Humorous Day’s Mirth