Pilot study for the COST Action "Reassembling the Republic of Letters": language-driven network analysis of letters from the Hartlib's Papers
Barbara McGillivray, Federico Sangati

4 April 2016
The applications of Social Network Analysis [Scott, 2013] to literary and historical texts have attracted growing interest in the scholarly community as powerful tools to investigate social structures. At the same time, the increased access to large amounts of digitized historical texts and the availability of corpus tools and computational methods for analysing those data in automatic ways offer new answers to humanistic research questions. Over the past decades, an increasing number of academic projects have focused on the role played by corpora in historical investigations, and several studies have shown that historical corpora contribute effectively to the progress of historical research (cf. e.g. Knooihuizen and Dediu 2012, Piotrowski 2012).

The present report summarizes an exploratory study which we carried out in the context of the COST Action IS1310
Reassembling the Republic of Letters, 1500–1800, and which is relevant to the activities of Working Group 3 "Texts and Topics" and Working Group 2 "People and Networks". In this study we investigated the use of Natural Language Processing (NLP) and Network Text Analysis [Popping and Roberts, 1997, Diesner and Carley, 2004] on a small sample of seventeenth-century letters selected from the
Hartlib Papers, whose records are in one of the catalogues of Early Modern Letters Online (EMLO), and whose online edition is available on the website of the Humanities Research Institute at the University of Sheffield (http://emlo.bodleian.ox.ac.uk/blog/?catalogue=samuel-hartlib). We will outline the NLP pipeline used to automatically process the texts into a network representation, following the approach of Sudhahar et al. [2015] and van de Camp and van den Bosch [2011], in order to identify the texts' "narrative centrality", i.e. the most central entities in the texts and the relations between them.

Network Text Analysis is typically applied to a large quantity of text, hence our goal is not to provide a complete analysis of the letters under investigation. We will instead aim to make an initial assessment of the validity of this approach, to suggest how it can scale up to a much larger set of letters, and to define which infrastructure would be needed to extend this process to a potentially multilingual historical corpus of epistolary texts.

2 Preprocessing Steps
In our study we have worked on the following 13 texts, selected from the archive of the Hartlib Papers:

1. Dury – Hartlib (1628?)
2. Dury – Hartlib (1632)
3. Dury – Hartlib (1661)
4. Dury – Roe (1637)
5. Dury – St Amand (1637)
6. Dury – Waller (1646)
7. Hartlib (1631–1633)
8. Hartlib – Davenant (1640)
9. Hartlib – Dury (1630)
10. Hartlib – Pell (1657)
11. Hartlib – Robartes (1640)
12. Hartlib – Worthington (1659)
13. Hartlib – Worthington (1660)

We have chosen these texts because they span the chronological range of the Hartlib Papers and cover a relatively wide range of addressees. Moreover, these texts are written in English; this language has the largest number of resources and NLP tools, even considering historical varieties of modern languages, and therefore provided the best conditions for the linguistic processing. In the rest of this section we describe the letters' acquisition procedure, the general NLP pre-processing steps, and the NLP tools we have adopted to prepare the text for the Network Text Analysis described in section 3.
2.1 Letters Acquisition

Although all letters were digitized and transcribed, we had to apply some manual polishing to the text (e.g. removing transcription notes, formatting tags, etc.). This procedure would not be trivial to do automatically, because the text formatting is not consistent across all sources. Moreover, the letters had to be imported manually one by one from the website. A much simpler alternative procedure, which would be paramount for a larger study, is to obtain access to the raw textual data of the letters as stored in the database.

[Footnote: These are all letters, apart from number 7, which is a summary text written by Samuel Hartlib. The letters are available on the website. We refer to the letters with the last name of the sender, followed by the last name of the addressee, and the date of attribution of the text.]
[Footnote: This is with the exception of Dury–Hartlib (1628), which contains German text in its final part. We have excluded this part from the manual syntactic analysis described in section 3.]

2.2 Pre-processing Steps

We have applied the following five textual pre-processing steps to each of the acquired letters:
1) Sentence splitting
The typical contextual unit of reference in NLP analysis is a sentence, and therefore sentence boundaries need to be detected. This is a rather simple procedure whereby language-specific rules are normally used to decide in which cases certain punctuation marks (e.g., full stops, question marks, etc.) identify the end of a sentence.
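As an illustration, a rule of this kind can be sketched in a few lines of Python. This is not the code used in the study; the abbreviation list and the regular expression are deliberately simplified examples.

```python
import re

# Illustrative abbreviation list: full stops after these words
# are not treated as sentence boundaries.
ABBREVIATIONS = {"Mr.", "Mrs.", "Dr.", "St."}

def split_sentences(text):
    """Split raw text into sentences (a rough, rule-based sketch).

    A boundary is a ., ?, or ! followed by whitespace and a capital
    letter, unless the preceding word is a known abbreviation.
    """
    sentences, start = [], 0
    for match in re.finditer(r"[.?!]\s+(?=[A-Z])", text):
        candidate = text[start:match.start() + 1]
        # Do not split if the boundary follows a known abbreviation.
        last_word = candidate.rsplit(None, 1)[-1]
        if last_word in ABBREVIATIONS:
            continue
        sentences.append(candidate.strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

Real splitters use much longer abbreviation lists and statistical models, but the underlying logic is the one shown here.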
2) Tokenization
The basic units of reference for automatic textual analyses are word tokens. These are identified by language-specific rules and separated from adjacent elements such as punctuation marks, or from other word tokens in agglutinative languages (e.g., the German compound Computerlinguistik 'computational linguistics' can be tokenized as two tokens, Computer and Linguistik).
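A minimal tokenizer of this kind can be sketched as follows. This is illustrative only; real tokenizers, such as those in the tools described below, use much richer language-specific rules.

```python
import re

def tokenize(sentence):
    """Split a sentence into word tokens, separating punctuation.

    A minimal sketch: runs of word characters become tokens, and
    every other non-space character becomes a token of its own.
    """
    return re.findall(r"\w+|[^\w\s]", sentence)
```

For example, `tokenize("By this then wee see what trueth is.")` separates the final full stop into its own token.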
3) Part of Speech (PoS) Tagging
Each token is analyzed and assigned a specific category depending on its syntactic role (e.g., verb, noun, adjective, adverb, etc.).
4) Lemmatization
In order to reduce data sparsity, automatic textual analysis often resorts to lemmatizing the text, i.e., turning each inflected or variant word-token form into its basic form (e.g., eating → eat).
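Purely as an illustration of the idea, a toy suffix-stripping lemmatizer might look as follows. The rules are invented for the example and fail on irregular forms; the tools we actually used rely on lexicons and statistical models rather than bare suffix rules.

```python
# Toy suffix rules for regular English inflections (illustrative only).
SUFFIX_RULES = [("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")]

def lemmatize(token):
    """Strip a regular inflectional suffix from a token, if any.

    The length check avoids mangling very short words (e.g. "is").
    """
    for suffix, repl in SUFFIX_RULES:
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)] + repl
    return token
```

With these rules, `lemmatize("eating")` yields `eat`, matching the example above, while irregular forms (e.g. historical spellings like bee for be) would need a lexicon.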
5) Dependency Parsing
The final step is to derive the full syntactic structure of each sentence, e.g., in terms of its subjects, predicates, and objects. This is important in order to identify the argument structure of a sentence, which can be used to derive the actions, actors, and patients (who does what to whom) in the sentence. This is preliminary to a full semantic analysis, which falls outside the scope of this study.
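To illustrate how argument structure can be read off a dependency analysis, the sketch below extracts subject-verb-object triplets from dependency edges represented as (head, relation, dependent) tuples. This tuple format is our own simplification for the example, not the output format of any specific parser; the relation labels nsubj and dobj follow the convention used in the figures below.

```python
def extract_triplets(dependencies):
    """Read subject-verb-object triplets off dependency edges.

    `dependencies` is a list of (head, relation, dependent) tuples;
    a missing subject or object is reported as None.
    """
    subjects, objects = {}, {}
    for head, rel, dep in dependencies:
        if rel == "nsubj":
            subjects[head] = dep
        elif rel == "dobj":
            objects[head] = dep
    verbs = set(subjects) | set(objects)
    return [(subjects.get(v), v, objects.get(v)) for v in verbs]
```

For instance, the edges (move, nsubj, tutor) and (move, dobj, child) yield the triplet (tutor, move, child), i.e. who does what to whom.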
2.3 NLP Tools

We have adopted and compared two different NLP processing tools to analyze the letters.
1) Stanford Core NLP Tools
This is one of the most complete state-of-the-art NLP libraries [Manning et al., 2014], which implements all five pre-processing steps for several modern languages. Since it has no ready model for Early Modern English, we used the model for modern English. It is, however, in principle possible to train new language models (and therefore a model for Early Modern English), provided enough annotated material exists for such languages.
2) MorphAdorner
This tool [Burns, 2013] is one of the most commonly used tools for NLP processing of historical English. It requires the text to be in TEI (Text Encoding Initiative) format; therefore we manually added a TEI header for epistolary texts to each text. This step can be easily automated. MorphAdorner performs all the pre-processing steps described above except for the dependency parsing.

[Footnote: The most common exception in which a full stop is not a sentence boundary is when it is used for abbreviations, e.g., Mr., Mrs., etc.]
[Footnote: Using the modern English model has caused several PoS tagging errors, such as 'bee' in Figure 1 classified as a noun instead of a form of the verb be.]

3 Building the Networks

In this section we describe the procedure we followed to build a network (or graph) from a pre-processed text. A selected set of networks from the texts under investigation is reported in Appendix A. These were built with automatic code that uses the Gephi library [Bastian et al., 2009] for rendering the networks graphically. All code is open source and available at https://github.com/kercos/DH_Code.

The basic elements in our networks are lemmatized word tokens represented as circles (nodes), with specific colors depending on their PoS categories: red for verbs, blue for nouns, and green for adjectives. The lines (arcs) connecting two nodes represent a specific relation between them. We illustrate two basic methodologies for building the networks: one based on word co-occurrences and the other based on their syntactic relations.
1) Co-occurrences
The simplest way to build a network from a text is to rely on co-occurrence information: two word tokens (e.g. a noun and a verb) are connected if they co-occur in the same textual context. Additionally, we want to keep track of the frequency of these connections to distinguish word-token pairs which co-occur more or less often. We have performed an automatic co-occurrence extraction using our own code, starting both from the Stanford and the MorphAdorner preprocessed texts. In the current analysis we consider the sentence as the contextual unit from which to extract co-occurrences. A common alternative is to restrict the contextual region to a window of a specific number of words (typically 4 or 5).
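The idea can be sketched as follows; this is a simplified illustration, not our actual extraction code. Sentences are assumed to be lists of (lemma, PoS) pairs with coarse tags such as NOUN, VERB, and ADJ, and only those three categories are kept, mirroring the node colors used in our networks.

```python
from collections import Counter
from itertools import combinations

def cooccurrences(sentences):
    """Count sentence-level co-occurrences of content lemmas.

    Each sentence is a list of (lemma, pos) tuples; each unordered
    pair of distinct kept lemmas is counted once per sentence.
    """
    counts = Counter()
    for sent in sentences:
        # Unique content lemmas in the sentence, in a fixed order
        # so that each unordered pair has one canonical key.
        kept = sorted({lemma for lemma, pos in sent
                       if pos in {"NOUN", "VERB", "ADJ"}})
        for pair in combinations(kept, 2):
            counts[pair] += 1
    return counts
```

The resulting counts give the arc weights: the more sentences two lemmas share, the heavier the arc between their nodes.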
2) Syntactic relations
A more refined way to represent connections between entities in a given text is to visualize their syntactic relations [Tanev and Magnini, 2008, McGillivray et al., 2008]. In the current study we focused on a subject-verb-object triplet representation and extracted such triplets by hand for one letter (see section 4.2). In order to automate this step we would need dependency parsing. The Stanford NLP Tools provide a dependency parser for modern English. As for historical English, it is possible to develop a parser based on manually annotated texts, and some research has already been done in this direction, as summarized in Piotrowski [2012].

Since the texts pre-processed with MorphAdorner lacked the syntactic information required to extract the subject-verb-object triplets, we devised a workaround to obtain a similar representation based on the typical word order of English: for every verb in the sentence we identified the closest noun on its left as the candidate subject, and the closest noun on its right as the candidate object (in both cases with a maximum distance of four words). For example, let us consider the following sentence, from the letter from John Dury to Samuel Hartlib (1628):

(1) I begin to shew what prudency & care a Tutour must vse to move little Children [. . . ]

From (1), we extracted the following pairs (the verb and noun lemmas are listed):

• show – prudency
• Tutour – use
• move – Children
In the list above, we note that we did not extract pronouns; we will, however, consider pronouns in the manual analysis reported on in section 4.2. Moreover, instead of triplets, we were only able to extract pairs of candidate subjects and verbs, or verbs and candidate objects. Finally, note that prudency is not a direct object of show, because of the indirect clause following this verb, showing that the context-based triplets do not perfectly reflect the syntactic relationships between items. In section 4.2 we will suggest how this can be improved thanks to a manual syntactic analysis, which can itself be automated.
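The word-order workaround described above can be sketched as follows. This is an illustrative reimplementation, not our original code, and the coarse PoS tags (NOUN, VERB, etc.) are assumed.

```python
def candidate_pairs(tagged, window=4):
    """Word-order heuristic for candidate subjects and objects.

    For each verb in `tagged` (a list of (lemma, pos) tuples), the
    closest noun within `window` tokens on its left is taken as the
    candidate subject, and the closest noun within `window` tokens
    on its right as the candidate object.
    """
    pairs = []
    for i, (lemma, pos) in enumerate(tagged):
        if pos != "VERB":
            continue
        # Closest noun to the left, within the window.
        for j in range(i - 1, max(i - window - 1, -1), -1):
            if tagged[j][1] == "NOUN":
                pairs.append((tagged[j][0], lemma))  # (subject, verb)
                break
        # Closest noun to the right, within the window.
        for j in range(i + 1, min(i + window + 1, len(tagged))):
            if tagged[j][1] == "NOUN":
                pairs.append((lemma, tagged[j][0]))  # (verb, object)
                break
    return pairs
```

Run on a tagged fragment of example (1), this heuristic recovers pairs such as (Tutour, vse) and (move, Children), with the known error modes discussed below.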
The number of nodes and connections tends to grow extremely large with the size of the text. It is therefore necessary to show only the most representative ones, i.e., those occurring more frequently. This is accomplished by removing (pruning) less frequent nodes and connections, which also tend to be the less reliable ones. In fact, although the methodology is prone to detect a number of erroneous connections, in a very large text these errors will tend to have a low frequency. We have adopted two basic pruning strategies:
Number-based: we select only nodes and arcs whose frequency is above a predefined threshold (e.g., FREQ > 1 selects only elements with frequency greater than 1).
Mean-based: we select only nodes and arcs whose frequency is above the mean of the respective frequency distribution plus a certain number of standard deviations (e.g., MEAN + 2SD selects only elements with frequency greater than the mean plus two standard deviations).
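Both pruning strategies can be sketched as follows; this is illustrative code, not the scripts used for the figures.

```python
from statistics import mean, stdev

def prune_by_frequency(freqs, threshold):
    """Number-based pruning: keep items with frequency > threshold."""
    return {k: v for k, v in freqs.items() if v > threshold}

def prune_by_mean(freqs, n_sd=2):
    """Mean-based pruning: keep items whose frequency exceeds the
    mean of the distribution plus n_sd standard deviations."""
    values = list(freqs.values())
    cutoff = mean(values) + n_sd * stdev(values)
    return {k: v for k, v in freqs.items() if v > cutoff}
```

Applied to node or arc frequency counts, these filters retain only the elements the figures below label FREQ > n and MEAN + nSD respectively.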
As detailed in section 3, we created a number of different networks, showing the various steps of our approach. In this section we will focus on two groups:

• networks relative to the collection of all 13 letters (Figures 3 and 4);
• networks relative to the letter sent by John Dury to Samuel Hartlib around 1628 (Figures 1, 2, 5, 6, and 7).

Given the small size of the corpus considered, we do not provide a quantitative analysis of the data. We will instead limit ourselves to general observations and focus on the methodological implications of our approach and its potential for broader applications.

We derived the first group of networks in a fully automatic way, by first pre-processing the letters (tokenization, sentence segmentation, and lemmatization) and then automatically extracting co-occurrence patterns. The nodes in the network in Figure 3 correspond to the lemmas of nouns (blue nodes), verbs (red nodes), and adjectives (green nodes) occurring in the letters, and their size is proportional to the frequency of the lemmas in the corpus; if two nodes are connected, it means that they occur in the same sentence. As the sentences in the letters are often long, these networks display the most frequent entities (nouns) and actions (verbs) mentioned in the letters.

Figure 3 summarizes the main topics that the letters are concerned with: church, man, God, Lord, time, and work among the nouns, and come, make, take, and find among the verbs. By contrast, Figure 4 was obtained by considering a narrower context of co-occurrence for verbs and nouns, which led to results that are closer to an actor-action model. The red edges link verbs to the nouns occurring before them in a window of four words (candidate subjects), and the blue edges link verbs to the nouns occurring after them in the same four-word window (candidate objects). As we explain in section 5, this workaround is unlikely to work well for languages with a freer word order, such as Latin.
This approach to detecting candidate subjects and objects is not always accurate, as we explain below. Let us consider the noun truth, connected to the verb see by a blue edge, indicating a candidate object role. In fact, truth follows a form of see twice in the corpus, in both cases in the letter from John Dury to Joseph St Amand (1637):

(2) By this then wee see what trueth is [. . . ].

(3) First the Congregation it selfe is to bee seene: And secondly the trueth or the falshood of the service perfourmed to Christ in the congregation.

In (2) the verb governs a clause introduced by what whose subject is truth, so trueth is not strictly speaking the direct object of see, even though from a semantic point of view this is not completely inaccurate. In (3), however, the algorithm ignores sentence boundaries marked by colons, as we only considered full stops as sentence delimiters. One simple way to avoid these kinds of errors would be to use colons to identify clause boundaries and impose this as a constraint on the algorithm. In other cases the errors concern other syntactic phenomena, which are more difficult to address in the absence of full syntactic parsing. For example, in the same letter we find:

(4) Because before Luthers time the Church which is now called the Protestant Church had no being nor visibilitie [. . . ].

In this case
Church is the subject of a passive form of call; therefore, even though it occurs before the verb, it does not play the agent role that our candidate-subject heuristic assumes. In order to partially remedy these problems, we have included syntactic information for one of the letters, as we show in the next section.

4.2 Networks from the 1628 letter from Dury to Hartlib
Figure 1 shows the network for the 1628 letter from Dury to Hartlib, obtained from data processed with the Stanford parser. We can notice that the pronouns wee 'we' and hee 'he' are incorrectly lemmatized and tagged as nouns, and that the verb bee 'be' is incorrectly lemmatized and tagged as a noun. Figure 2 shows the network derived from the letter preprocessed with MorphAdorner; the edges correspond to the co-occurrence analysis, and the network was pruned with mean-based pruning (MEAN+1SD for nodes and MEAN+2SD for arcs).

Figure 5 contains the network with the context-based definitions of candidate subjects and objects of the verbs occurring in the letter from John Dury to Samuel Hartlib (1628), while Figure 6 displays manually annotated subjects and objects and their verbs. As we can see from the comparison of the two figures, the latter is definitely a more accurate representation of the entities and actions mentioned in the letter. While keeping in mind that this is the analysis of a single letter and that we need to be cautious in any generalization, we will make some general remarks that support the validity and potential of this approach.

In addition to some known collocations (e.g. see light, please God), we can identify active and passive entities from the point of view of their syntactic role in the sentences. For example, we observe that the noun child is the object of the verbs move and lead, suggesting a patient role. This may be opposed to the active role of tutor (subject of come). The Lord predominantly appears as an actor (subject of assist, stir up, and send), possibly suggesting the idea of an interventionist God. Coming to inanimate entities, thoughts appear in need of being ordered (thought is the object of order).
Moreover, topics of concern seem to be the prevention of negative outcomes, as suggested by the nouns associated with the verb concern, like pacification, (pastoral) care, and trouble.

Figure 7 is derived from an additional anaphora resolution step, which contributes to making the analysis richer. For example, we notice that now the nodes child and tutor are connected, because tutor is the subject of move and lead, which have child as their object. Let us look at the relevant passages:

(5) I begin to shew what prudency & care a Tutour must vse to move little Children that are vncapable of the Precepts of Christianity to a Custome of naturall vertues [. . . ]

(6) [. . . ] seeking to enter into a particular consideracion of the whole duty of a Tutour how hee ought to bee fitted & prepared for the Charge & what hee ought to doe to leade a Child from his infancy as it were by the hand through an insensible Custome of well doeing vnto a perfect degree of all vertues

As the excerpts above attest, the tutor is the entity performing the action of moving and leading, respectively in (5) and in (6). In (5) specifically, after the anaphora resolution step, the pronoun hee 'he' is resolved to refer to Tutour 'tutor'. Of course, only a systematic quantitative analysis on a larger scale would be able to confirm the preliminary observations made here. However, we have shown that the networks are able to provide some insights into the content of the letters, as we summarize in the next section.

[Footnote: A collocation is a sequence of two or more words that tend to occur together often.]
[Footnote: In a linguistic context, anaphora resolution refers to the resolution of an expression based on another expression occurring before or after it (its antecedent or postcedent, respectively).]

5 Final Remarks
The results we have achieved show that the approach is promising and, if extended in its scope, can lead to positive results for historical research on correspondence texts. We have also shown that methodologies developed to analyse contemporary texts have the potential to be successfully applied in a historical context, with specific adjustments. Since the nature of this pilot study was methodological and exploratory, we have focused the analysis on a limited set of letters. However, the automatic procedures we have followed can be applied to a significantly higher number of letters. In fact, it is on large datasets that certain patterns can be detected and analyzed statistically, which is one of the strengths of computational approaches such as the present one.
In addition to applying the processing and analysis to a larger set of letters, this study can beextended in a number of directions, as we outline below.
Languages
This pilot study focused on English. However, thanks to the automatic procedures followed, it can in principle be applied to data in other languages as well. This would suit the high degree of multilinguality in the Hartlib Papers well, and capture possibly interesting associations between the semantic content and the languages used. Nevertheless, the specific features of the languages might require adjustments in the syntactic processing and extraction of triplets.

Preprocessing
A number of steps could increase the accuracy of the preprocessing:

• Relying on syntactically parsed texts would make the subject-verb-object triplets more accurate.
• Labelling nouns and verbs according to their semantic classes (such as persons and vehicles for nouns, and communication and motion for verbs, to give a few examples) would allow us to group the triplets into larger categories and detect possible patterns in larger networks.
• Performing the anaphora resolution automatically would enrich the analysis, as shown in section 4.
• Following the approach presented in Trampus and Mladenić [2011], rather than focusing on subject-verb-object triplets we could extract full event patterns from the letters and therefore derive semantic graphs, where nodes represent actors and edges represent actions.
Evaluation
A systematic evaluation of the annotation of the texts would be necessary to assess the quality of the data from which the networks were built. This can be done by comparing the automatically extracted triples with a manually created gold standard.

[Footnote: As the word order in Latin is freer than in English and Latin morphology is richer, the extraction of triplets based on the nouns occurring before/after verbs as candidate subjects/objects is unlikely to lead to good results. By contrast, constraints on the morphological case of the nouns (e.g. nominative for subjects and accusative for objects) and morphological agreement of the verb with the candidate subject noun are more promising in the absence of full syntactic processing. Similar arguments hold for morphologically richer languages like German or Italian.]

Network analysis

A systematic and quantitative analysis of the networks based on centrality measures (such as in-degree and out-degree) would highlight particularly active actors and actions, as well as connections between them. Further, replacing static networks with dynamic networks extracted from the full epistolary corpus would help to identify changes in the importance of actors over time, as outlined in Agarwal et al. [2012].
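As a minimal illustration of such degree-based measures, in- and out-degrees can be computed directly from a list of directed edges. This is only a sketch; a full analysis would use a network library and a wider range of centrality measures.

```python
from collections import Counter

def degree_centrality(edges):
    """In- and out-degree counts for a directed network.

    `edges` is a list of (source, target) pairs, e.g. (subject, verb)
    or (verb, object) links; the returned Counters map each node to
    the number of incoming and outgoing edges respectively.
    """
    in_deg, out_deg = Counter(), Counter()
    for src, tgt in edges:
        out_deg[src] += 1
        in_deg[tgt] += 1
    return in_deg, out_deg
```

For instance, in a triplet network where Lord is the subject of assist and send, the node Lord would have out-degree 2, flagging it as a particularly active actor.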
Further analysis
It is possible to combine the linguistic features explored in this study with the metadata of the letters (which capture historically relevant information, such as date, location, sender, and addressee), as well as other text metadata (e.g. length, structure, and complexity of the letters). This has the potential to offer new insights into the context, content, and structure of the letters, and to support further research on this material.
Figure 1: Network of the 1628 letter from Dury to Hartlib from Stanford preprocessing with co-occurrence analysis and mean-based pruning for nodes and arcs.
Figure 2: Network of the 1628 letter from Dury to Hartlib, from MorphAdorner preprocessing with co-occurrence analysis and mean-based pruning (MEAN+1SD for nodes and MEAN+2SD for arcs).
Figure 3: Network of all 13 letters from MorphAdorner preprocessing with co-occurrence analysis and mean-based pruning (MEAN+2SD for nodes and MEAN+2SD for arcs).
Figure 4: Network of all 13 letters from MorphAdorner preprocessing with triplet analysis and number-based pruning.
Figure 5: Unpruned network of the 1628 letter from Dury to Hartlib; the preprocessing was performed with MorphAdorner and was followed by an automatic extraction of triplets.
Figure 6: Unpruned network of the 1628 letter from Dury to Hartlib; the preprocessing was performed with MorphAdorner and was followed by a manual extraction of triplets.
Figure 7: Unpruned network of the 1628 letter from Dury to Hartlib; the preprocessing was performed with MorphAdorner and was followed by a manual extraction of triplets and anaphora resolution.

References
J. Scott. Social Network Analysis. Sage, London, 3rd edition, 2013.

R. Knooihuizen and D. Dediu. Historical demography and historical sociolinguistics: The role of migrant integration in the development of Dunkirk French in the 17th century. Language Dynamics and Change, 2:1–33, 2012.

Michael Piotrowski. Natural Language Processing for Historical Texts. Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers, 2012.

Roel Popping and Carl W. Roberts. Network approaches in text analysis. In R. Klar and O. Opitz (eds.), Classification and Knowledge Organization, pages 381–389. Springer Berlin Heidelberg, 1997.

Jana Diesner and Kathleen M. Carley. Using network text analysis to detect the organizational structure of covert networks. In Proceedings of the North American Association for Computational Social and Organizational Science (NAACSOS) Conference, 2004.

Saatviga Sudhahar, Gianluca de Fazio, Roberto Franzosi, and Nello Cristianini. Network analysis of narrative content in large corpora. Natural Language Engineering, 21:81–112, 2015.

Matje van de Camp and Antal van den Bosch. A link to the past: Constructing historical social networks. In Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis, WASSA '11, pages 61–69, Stroudsburg, PA, USA, 2011. Association for Computational Linguistics.

Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. The Stanford CoreNLP natural language processing toolkit. In Association for Computational Linguistics (ACL) System Demonstrations, pages 55–60, 2014.

Philip R. Burns. MorphAdorner v2: A Java library for the morphological adornment of English language texts. Northwestern University, 2013. URL https://morphadorner.northwestern.edu/morphadorner/download/morphadorner.pdf.

Mathieu Bastian, Sebastien Heymann, and Mathieu Jacomy. Gephi: An open source software for exploring and manipulating networks. In International AAAI Conference on Weblogs and Social Media, 2009.

H. Tanev and B. Magnini. Weakly supervised approaches for ontology population. In Proceedings of the 2008 Conference on Ontology Learning and Population: Bridging the Gap Between Text and Knowledge, pages 129–143, Amsterdam, 2008. IOS Press.

Barbara McGillivray, Christer Johansson, and Daniel Apollon. Semantic structure from correspondence analysis. In Proceedings of the 3rd TextGraphs Workshop on Graph-Based Algorithms for Natural Language Processing, TextGraphs-3, pages 49–52, Stroudsburg, PA, USA, 2008. Association for Computational Linguistics. URL http://dl.acm.org/citation.cfm?id=1627328.1627335.

Mitja Trampus and Dunja Mladenić. Learning event patterns from text. Informatica, 35(1), 2011.

Apoorv Agarwal, Augusto Corvalan, Jacob Jensen, and Owen Rambow. Social network analysis of Alice in Wonderland. In Proceedings of the NAACL-HLT 2012 Workshop on Computational Linguistics for Literature, pages 88–96, Montréal, Canada, June 2012. Association for Computational Linguistics.