A model of language inflection graphs
September 12, 2018 16:0
International Journal of Modern Physics C
© World Scientific Publishing Company
HENRYK FUKŚ, BABAK FARZAD, YI CAO
Department of Mathematics and Statistics, Brock University, St. Catharines, Ontario, Canada L2S
[email protected], [email protected], [email protected]
Received 3 July 2013
Revised 27 November 2013
Accepted 29 November 2013

Inflection graphs are highly complex networks representing relationships between inflectional forms of words in human languages. For so-called synthetic languages, such as Latin or Polish, they have particularly interesting structure due to the abundance of inflectional forms. We construct the simplest form of inflection graphs, namely a bipartite graph in which one group of vertices corresponds to dictionary headwords and the other group to inflected forms encountered in a given text. We then study the projection of this graph on the set of headwords. The projection decomposes into a large number of connected components, to be called word groups. The distribution of word group sizes exhibits some remarkable properties, resembling the cluster distribution in lattice percolation near the critical point. We propose a simple model which produces graphs of this type, reproducing the desired component distribution and other topological features.
Keywords: complex networks, inflection graphs, percolation, scaling

PACS Nos.: 64.60.ah, 05.90.+m, 05.70.Jk, 02.50.-r, 64.60.aq
1. Introduction
Human languages can be studied from many different perspectives. When we think of a foreign language, however, we typically think of its words, so it is quite natural that vocabulary is one of the most extensively studied features of languages. In recent years, the network paradigm has been used to study vocabularies; within this paradigm, words of the language are viewed as vertices of a large and complex network or graph, with edges representing relationships between words. Many such models emphasizing different relationships between words have been studied in the past decade, including networks of co-occurrences of words in sentences [1], thesaurus graphs [3, 4], WordNet database graphs [5], and many others [2, 6, 7, 8, 9, 10].

It is fair to say that many of the aforementioned works concentrated on the English language, which has the characteristic property of being analytic, that is, exhibiting only minimal inflection. In analytic languages grammatical relations and categories are handled mostly by word order, not by inflection, which makes such languages somewhat easier to learn. In contrast, synthetic languages such as Latin, Greek, Polish, or Russian make extensive use of inflection, and one word in these languages can appear in a great many forms, reflecting grammatical categories such as tense, mood, person, number, gender, case, etc. Word order is less important in synthetic languages. While this is an excellent feature from the point of view of a poet, it presents algorithmic problems in text processing. Suppose, for example, that we want to count the number of distinct words in a given work, e.g., for the purpose of comparing two works and deciding which one uses the larger vocabulary. How do we do this in a language like Latin, where one dictionary headword can have as many as a hundred different forms? To make things even more difficult, in some cases one inflectional form can correspond to more than one dictionary headword, and one must deduce from the context which one to choose.

In Ref. 11, one of the authors considered this problem from a practical point of view and proposed a solution which exploits some features of the so-called inflection graph. Here we will not dwell on this problem, referring the interested reader to Ref. 11; instead we will discuss the inflection graph itself. We will first describe some of its topological features, and then propose a model which reproduces these features.
2. Inflection graphs
The inflection graph for a given language can be constructed as follows. First we need to create a list of all words of the language, which, strictly speaking, is an impossible task, as every such list is bound to be incomplete. Nevertheless, one can easily obtain a reasonably adequate list of words using a sufficiently large dictionary of the language. The set of all dictionary headwords will be denoted by H. For each headword, we generate a list of all possible inflected forms, and the list of all possible inflected forms obtained this way will be denoted by I. We then construct a bipartite graph G = (H, I, E), where E is the set of edges such that the edge between v ∈ H and u ∈ I exists if and only if u is an inflected form of v.

Construction of the inflection graph is obviously possible only if one is able to produce all inflected forms of a given word. For the Latin language, this can be achieved using WORDS, a computerized dictionary of Latin created by William Whitaker [12]. The resulting bipartite graph, with 28 092 headword vertices and 1 000 880 inflected-form vertices, will be denoted by G_LA.

We were also able to construct the inflection graph for the Polish language, using the lexical grammar developed by the Group of Computer Linguistics of AGH University of Science and Technology in Kraków [13]. The corresponding graph, to be denoted by G_PL, has 802 911 edges.
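As a minimal illustration of this construction, the bipartite graph can be represented with plain Python sets. The four-entry mini-lexicon below is hypothetical, with shared forms chosen to mirror the Latin examples discussed later:

```python
# Toy sketch of the bipartite inflection graph G = (H, I, E).
# A real graph would be generated from a full morphological dictionary
# such as Whitaker's WORDS; this mini-lexicon is illustrative only.

# headword -> set of inflected forms (hypothetical mini-lexicon)
lexicon = {
    "dico":    {"dico", "dicunt", "dixit", "dictus"},
    "dictus":  {"dictus", "dicti"},          # shares "dictus" with "dico"
    "tollo":   {"tollo", "sublatus"},
    "suffero": {"suffero", "sublatus"},      # shares "sublatus" with "tollo"
}

H = set(lexicon)                              # headword vertices
I = set().union(*lexicon.values())            # inflected-form vertices
E = {(h, f) for h, forms in lexicon.items() for f in forms}

# An edge (h, f) exists iff f is an inflected form of headword h.
print(len(H), len(I), len(E))                 # prints: 4 8 10
```

The full graphs G_LA and G_PL are built in exactly the same way, only with the headword-to-forms mapping supplied by the morphological dictionaries cited above.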
Normally, for most headwords in H, there are many corresponding inflected forms in I, so an element of H is typically connected to many (sometimes 100 or more) elements of I. For example, the Latin words dicunt (they say) and dixit (he said) are both inflected forms of the verb dico, thus we will have a vertex in H corresponding to dico connected to vertices in I corresponding to dicunt and dixit. However, the opposite can also be true: in some instances, a word can be an inflected form of more than one headword, so that vertices of I are sometimes connected to more than one vertex of H. As an example, consider the word sublatus, which could be a form of tollo (lift, raise) or suffero (bear, endure); thus the vertex in I corresponding to sublatus will be connected to two vertices in H.

The inflection graphs are rather sparse, and they decompose into a large number of connected components of different sizes. From the practical point of view, the size of a component is not as important as the number of distinct headwords in the component; these sets of headwords will be called headword groups. The motivation for this can be explained as follows. Suppose that one wants to perform a computerized count of the number of different words occurring in a given text. Obviously, one wants to count two different inflection forms of a given word as one and the same word, or, to put this differently, one wants to know how many distinct dictionary headwords appear in the text (in various inflected forms). However, since in languages with a complex inflection system a given inflection form can sometimes belong to two (or more) different dictionary headwords, it is impossible for a computer to decide which one is used in a particular case. To make such a decision, one has to understand the sentence and figure out from the context what it means. In English this problem is quite rare, but it still exists.
For example, consider the word dove: this could be the singular form of the noun dove (a type of bird), or the simple past tense of the verb to dive. A computer program, upon encountering dove in a text, will not know whether to count it as an occurrence of the headword dove or of to dive. The simple solution to this problem is to say that dove is an inflected form of a headword from the set (headword group) {dove, to dive}. This means that instead of counting how many distinct headwords are present in the text, we can only count how many distinct headword groups are present. We want to know, however, what the sizes of the headword groups are, as this is, in a sense, a measure of the difficulty of the disambiguation problem. A good way to analyze these sizes is to look at their distribution.

The distribution of headword group sizes in inflection graphs is quite striking, as can be seen in Figure 1, which shows the distributions for Latin and Polish. The graph for Latin and its analysis have been previously published in Ref. 11; here we add the same graph for the Polish language. We fitted a straight line in log-log coordinates to the data points for which the number of groups exceeds 20, in order to exclude points with small counts. The lines of best fit are shown as dashed lines. There seems to be a power-law trend in both data sets, more strongly pronounced in the graph for the Latin language. In the remaining part of this paper we will attempt to shed some light on the origin of this phenomenon.

Fig. 1. Distribution of headword clusters for Latin (left) and Polish (right), plotted as the number of groups versus headword group size, with the slopes of the fitted lines indicated. The figure for Latin previously appeared in Ref. 11.

The dashed lines of best fit shown in Figure 1 represent the power law

n_s ∼ s^(−τ),   (1)

where the fitted exponent τ takes slightly different values for Latin and for Polish.
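The headword groups just described are the connected components obtained after linking headwords that share an inflected form. A minimal union-find sketch, using a hypothetical three-entry lexicon built around the dove example, could look like this:

```python
# Counting headword groups: two headwords fall in the same group when they
# are connected through shared inflected forms. The mini-lexicon below is
# hypothetical.
lexicon = {
    "dove (noun)": {"dove", "doves"},
    "to dive":     {"dive", "dives", "dived", "dove"},   # shares "dove"
    "cat":         {"cat", "cats"},
}

parent = {h: h for h in lexicon}

def find(x):
    # Find the group representative, with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

# Merge headwords that share at least one inflected form.
form_owner = {}
for h, forms in lexicon.items():
    for f in forms:
        if f in form_owner:
            union(h, form_owner[f])
        form_owner[f] = h

groups = {find(h) for h in lexicon}
sizes = sorted(sum(find(h) == g for h in lexicon) for g in groups)
print(sizes)   # prints: [1, 2]
```

Here {dove (noun), to dive} forms one group of size 2, while cat sits alone in a group of size 1; the histogram of such sizes over the whole lexicon is exactly the distribution plotted in Figure 1.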
The errors quoted for τ signify that decreasing or increasing τ by the given amount doubles the reduced χ². Anyone familiar with percolation theory [14, 15] will immediately recall that a very similar scaling law for cluster sizes holds for lattice percolation at the critical point, where τ is known as the Fisher exponent. This is also the case for the Erdős-Rényi model G(n, p), that is, a graph constructed by connecting n nodes randomly so that each edge is included in the graph with probability p, independently of every other edge. It is well known that at np = 1 and n → ∞ the model undergoes a structural transition similar to percolation. The distribution of component sizes follows the power law of eq. (1), with the Fisher exponent known to be τ = 5/2.

Figure 2 shows component size distributions obtained numerically for G(n, p) with n = 28 092, that is, the same n as the number of headwords in G_LA. Three values of np were used: one below the percolation threshold, one above it, and np = 1 at the threshold. The power law in the form of eq. (1) is evident at the percolation threshold, yet it is clearly not valid away from the threshold. In spite of the fact that the number of vertices is relatively small and that only 10 graphs were generated, the value of the exponent τ obtained from fitting a straight line to the data agrees, within error bounds, with the aforementioned value τ = 5/2.

Considering the case of G(n, p), one could suspect that the inflection graphs have a structure somewhat resembling Erdős-Rényi random graphs at the percolation threshold. We will, however, demonstrate that the situation is somewhat more complicated. To avoid repetition, from now on we will be using G_LA as an example.

Fig. 2.
Distribution of component sizes for the random graph G(n, p), averaged over 10 realizations, above (×), below (⋆), and at (+) the percolation threshold. The data point corresponding to the giant component above the percolation threshold is not shown. The slope of the fitted line is consistent with τ = 5/2. Error bars correspond to standard deviations, and for clarity are shown only for the data at the critical point.
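The critical behaviour of G(n, p) described above is easy to reproduce at small scale. The following stdlib-only sketch, with n far smaller than in the paper, generates one realization at np = 1 and extracts its component sizes:

```python
import random

# Sketch: component sizes of an Erdős-Rényi graph G(n, p) at the percolation
# threshold np = 1. n is kept small for speed (the paper uses n = 28092,
# averaged over 10 realizations).
random.seed(0)
n = 2000
p = 1.0 / n

# Each of the n*(n-1)/2 possible edges is included independently with prob. p.
adj = {v: [] for v in range(n)}
for u in range(n):
    for v in range(u + 1, n):
        if random.random() < p:
            adj[u].append(v)
            adj[v].append(u)

# Collect component sizes with an iterative depth-first search.
seen, sizes = set(), []
for start in range(n):
    if start in seen:
        continue
    seen.add(start)
    stack, size = [start], 0
    while stack:
        x = stack.pop()
        size += 1
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    sizes.append(size)

print(len(sizes), max(sizes))
```

Binning the resulting sizes in log-log coordinates, and averaging over several realizations, reproduces the s^(−5/2) trend of Figure 2; repeating the run with np well below or above 1 makes the straight-line trend disappear.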
3. Structure of the inflection graph for Latin
In order to describe some important features of G_LA, we will consider its projection on H. Given a bipartite graph G = (H, I, E), define its H-projection as G′ = (H, E′), where {u, v} ∈ E′ if and only if u and v are both connected to a common vertex in I. The H-projection of G_LA has 28 092 vertices and 24 064 edges. Only 13 345 headwords have degree greater than zero in G′_LA. Note that, for obvious reasons, the distribution of component sizes of G′_LA is the same as the distribution of group sizes in G_LA. Could it then be that G′_LA resembles an Erdős-Rényi random graph?

In order to answer this question, we first consider the degree distribution of G′_LA shown in Figure 3. Unlike in the case of G(n, p), the degree distribution of G′_LA is clearly not Poissonian, and for small degree values it seems to follow an exponential decay, shown in the figure as a straight line. The mean vertex degree is about 1.71 (that is, 2 × 24 064 / 28 092). This already indicates that G(n, p) cannot be a model of G′_LA: the mean vertex degree of a G(n, p) with a power-law distribution of component sizes must be equal to 1.0.

We can see the difference between G(n, p) and G′_LA even better if we use the notion of the core clustering spectrum, introduced in Ref. 18. For a non-negative integer k, the k-core of a graph is the maximal subgraph whose vertices all have degree greater than or equal to k; by the "degree" in this definition we mean the degree of the vertex in the subgraph. If G is a given graph, we denote by G{k} the k-core of G. Now let C(G) denote the clustering coefficient of G. The set of pairs (|G{k}|, C(G{k})), where |G| denotes the number of vertices of G, will be called the core clustering spectrum of G. One can visualize the core clustering spectrum by plotting the points (|G{k}|, C(G{k})) on a plane, as has been done in Ref. 18.
Fig. 3. Degree distribution (left) and clustering coefficients of cores (right) of the H-projection of G_LA (top) and G_PL (bottom).

Here we will use slightly different graphs in order to convey similar information, namely we will plot C(G{k}) as a function of k. We will call this the graph of clustering coefficients of cores. It has the advantage over the plot of the core spectrum that the core number appears explicitly as one of the variables. The value of k ranges up to k_max, the largest k for which G{k} is non-empty.

For some graphs, such as Erdős-Rényi random graphs, most vertices belong to the same k-core, as documented in Ref. 19. This means that the graph of clustering coefficients of cores for Erdős-Rényi random graphs is very narrow, consisting of only a small number of points. This is not the case for G′_LA, as Figure 3 attests: G′_LA possesses a highly clustered inner core, a feature absent in the Erdős-Rényi model near the percolation threshold.

The degree distribution of G′_PL and its graph of clustering coefficients of cores are quite similar to the corresponding graphs of G′_LA, as shown in the bottom row of Figure 3.
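The H-projection and the k-core construction can both be sketched in a few lines. The toy lexicon below is hypothetical; the clustering function computes the average local clustering coefficient:

```python
# Sketch: H-projection of a bipartite inflection graph, plus the k-core and
# its clustering coefficient (one point of the core clustering spectrum).
lexicon = {          # hypothetical headword -> forms mapping
    "a": {"x", "y"},
    "b": {"y", "z"},
    "c": {"z", "x"},
    "d": {"w"},
}

# H-projection: connect two headwords iff they share an inflected form.
H = list(lexicon)
proj = {h: set() for h in H}
for i, u in enumerate(H):
    for v in H[i + 1:]:
        if lexicon[u] & lexicon[v]:
            proj[u].add(v)
            proj[v].add(u)

def k_core(g, k):
    """Iteratively strip vertices whose degree in the subgraph is < k."""
    g = {v: set(nb) for v, nb in g.items()}
    while True:
        drop = [v for v, nb in g.items() if len(nb) < k]
        if not drop:
            return g
        for v in drop:
            for w in g[v]:
                g[w].discard(v)
            del g[v]

def clustering(g):
    """Average local clustering coefficient of graph g."""
    total = 0.0
    for v, nb in g.items():
        nb = list(nb)
        d = len(nb)
        if d < 2:
            continue
        links = sum(1 for i in range(d) for j in range(i + 1, d)
                    if nb[j] in g[nb[i]])
        total += 2.0 * links / (d * (d - 1))
    return total / len(g) if g else 0.0

core2 = k_core(proj, 2)       # the a-b-c triangle survives; d is stripped
print(sorted(core2), clustering(core2))
```

Sweeping k from 0 to k_max and recording C(G{k}) at each step yields the graph of clustering coefficients of cores used in Figure 3.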
4. Model
In order to construct a model of inflection graphs which exhibits power-law scaling resembling Figure 1, as well as having the degree distribution and clustering coefficients of cores of its H-projection resembling Figure 3, we need to make a couple of further remarks regarding the topological structure of inflection graphs, again using G_LA as an example. It is useful to think of G_LA as a collection of stars, each centered at a headword and with arms connecting the headword to some inflected forms. These stars are not completely disjoint, however: sometimes they share one or more vertices in I, and this occurs if a given headword shares some of its inflected forms with another headword (or headwords).

Let n be the number of headwords, and m the number of inflected forms. Construction of the random graph serving as a model of G_LA proceeds in two stages. In stage 1, we generate an assembly of stars, each centered at a headword and with arms connecting the headword to some inflected forms. In stage 2, we generate a number of random bridges between these stars. We now describe the two stages in detail.

Algorithm for generating stars

(1) Generate the set of vertices H = {H_1, H_2, ..., H_n} corresponding to headwords, and another set I = {I_1, I_2, ..., I_m} corresponding to inflected forms.
(2) For each i ∈ {1, 2, ..., n}, draw a random number x_i from a distribution f_h to be described below, and connect vertex H_i to vertices I_{j+1}, I_{j+2}, ..., I_{j+⌊|x_i|⌋}, where j = 0 for i = 1 and j = Σ_{p=1}^{i−1} ⌊|x_p|⌋ otherwise. If any vertex index in I_{j+1}, I_{j+2}, ..., I_{j+⌊|x_i|⌋} exceeds m, it is replaced by its value modulo m.
(3) If any isolated vertices in I still remain, connect each of them to a randomly selected vertex in H. After this is done, relabel the set I so that vertices connected to the same headword are labeled with a block of consecutive integers.

The probability distribution function f_h is a weighted sum of three normal distributions,

f_h(x) = Σ_{i=1}^{3} w_i f_{σ_i, µ_i}(x),   (2)

where

f_{σ,µ}(x) = (1 / (σ√(2π))) exp(−(x − µ)² / (2σ²)).   (3)

The weights w_i, means µ_i and standard deviations σ_i were obtained by fitting the resulting degree distribution to the degree distribution of the actual inflection graph, but their values are not too critical, meaning that small changes in these parameters still produce graphs with a power-law distribution of headword group sizes.

Note that although the random number x_i drawn from the distribution f_h in step 2 may theoretically be zero, the probability of such an event is extremely small. In our program implementing the algorithm for generating stars, we simply reject the x_i = 0 outcome and draw another number if it happens.

The reason for taking f_h to be the sum of three normal distributions is the structure of the Latin vocabulary. With respect to inflection, one can distinguish three main groups of words: (1) verbs (inflection by conjugation), (2) nouns and adjectives (inflection by declension), and (3) all other words. We should remark here that this shape of the distribution is suitable for Latin; for a different language, with a different grammatical structure, it would have to be different, and in particular the number of normal distributions in the sum would likely have to change. Moreover, we used normal distributions for the sake of simplicity; we do not claim that this reflects the actual distribution of inflection forms very accurately, but it is close enough for our purposes.
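A possible way to sample star sizes from f_h is sketched below. The weights, means and standard deviations are illustrative placeholders, not the fitted values used in the paper:

```python
import math
import random

# Sketch: drawing star sizes from a weighted mixture of three normals, as in
# stage 1 of the model. All numeric parameters below are placeholders.
random.seed(1)
w  = (0.3, 0.5, 0.2)          # mixture weights (must sum to 1)
mu = (8.0, 20.0, 2.0)         # component means
sd = (2.0, 5.0, 1.0)          # component standard deviations

def sample_star_size():
    # Pick a mixture component, draw from that normal, take floor(|x|),
    # and reject zero outcomes, exactly as the algorithm prescribes.
    while True:
        i = random.choices(range(3), weights=w)[0]
        x = random.gauss(mu[i], sd[i])
        k = math.floor(abs(x))
        if k > 0:
            return k

sizes = [sample_star_size() for _ in range(1000)]
print(min(sizes), sum(sizes) / len(sizes))
```

Each sampled value is the number of arms of one headword star, i.e., the ⌊|x_i|⌋ of step 2 above.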
One should also note that f_h may theoretically produce negative numbers (again, with very small probability), and this is why we take the absolute value of x_i. We also round x_i down to the nearest integer. One could use in place of the normal distribution some other distribution with a strongly pronounced peak producing only positive numbers, such as, for example, the log-normal distribution. We found, however, that the detailed shape of the distribution is not crucial for our goal of reproducing the desired properties of the inflection graph, thus we kept the normal distribution for simplicity.

Once the assembly of stars is created, we add a number of bridges between the stars. The most crucial feature of these bridges comes from the fact that typically a headword shares not one, but many inflected forms with another headword or headwords. This is because there exists a large number of pairs of closely related Latin words, each having a separate entry in the dictionary. For example, the words dico (say), dictum (utterance, remark) and dictus (speech) are all closely related, thus they share many inflected forms. After experimenting with many possible methods for the generation of bridges, we came up with a simple algorithm, which basically adds a fixed number of edges at a time. Let λ and T be two positive integers, to be used as parameters in our algorithm.

Algorithm for generating bridges

(1) Randomly select two headword vertices H_a and H_b, where H_a denotes the vertex with the larger degree. Vertex H_a is already connected to k inflected forms; let us denote them by {I_r, I_{r+1}, ..., I_{r+k−1}}.
(2) Add λ additional edges by connecting H_b with vertices {I_r, I_{r+1}, ..., I_{r+λ−1}}.
(3) Repeat the above two steps T times.

Note that the second step is performed exactly as described even if λ > k, but in this case some of the inflected forms with which we connect H_b will not be inflected forms of H_a, but inflected forms of some other word(s).
Also note that k is always greater than zero, because the algorithm for generating stars ensures that this is the case. This agrees with our interpretation of the meaning of "inflected form": we assume that every word has at least one inflected form; if it is an adverb, for example, its sole "inflected form" is identical to itself. This is consistent with the treatment of other parts of speech. For instance, for nouns we count the nominative singular among the inflected forms, even though it is identical to the headword form.

Regarding the values of λ and T, they must be selected as follows. After completing the algorithm for generating stars, the number of edges in the graph is only slightly larger than |I| (recall that in step 2 we replace indices exceeding m with their values modulo m, but this happens only rarely, for a few values of i close to n). It could theoretically happen that the number of edges produced in this stage is larger than the desired total number of edges (we want to have the same number of edges in the model graph as in the inflection graph being modeled). With the choice of parameters which we have made, the probability of such an event is so exceedingly small that for all practical purposes we can simply ignore this eventuality. Nevertheless, if it indeed happened, one would have to discard the result and run the algorithm for generating stars again.

Having fewer than the desired number of edges, we must ensure that the product λT is equal to the number of remaining edges which we want to produce. This means that only one of those two parameters can be freely chosen. By experimenting with different values of λ, we found that λ = 10 produces the most clearly pronounced power-law distribution of headword group sizes in the resulting graph. The typical corresponding value of T in this case is T = 7692. We say "typical" because, as explained earlier, the exact number of edges in the graph obtained after applying the algorithm for generating stars will slightly fluctuate between realizations of the graph; thus the number of "missing" edges, and consequently the value of T, will slightly fluctuate too. The shape of the headword group size distribution graph, however, is only weakly affected by changes of λ and T as long as their product remains equal to the number of missing edges and provided λ is not too small. For example, if instead of λ = 10 and T = 7692 we use λ = 5 and T = 15384, there is almost no perceptible difference in the shape of the graph.

We generated a random graph following the above algorithm using |H| = 28 092 and |I| = 1 000 880, that is, the same numbers of vertices as in the actual inflection graph. This graph will be called G_MOD. Its distribution of headword group sizes is shown in Figure 4. Agreement with the actual distribution shown in Figure 1 is indeed very good; even the slope of the fitted line agrees, within the error bounds, with the exponent observed for G_LA.

The model also performs well when one considers the H-projection of G_MOD. Figure 5 shows both the degree distribution and the graph of clustering coefficients of cores of G′_MOD. Comparing these graphs with Figure 3, we observe good qualitative agreement. The degree distribution of G′_MOD is very similar to the degree distribution of G′_LA, except that G′_MOD misses a small number of high-degree vertices present in G′_LA. The clustering coefficients of cores of both graphs exhibit very similar behavior: the clustering sharply increases with increasing core number, and reaches a value close to 1 for the inner core, indicating the presence of cliques in the high (innermost) cores.

Fig. 4. Distribution of headword group sizes for the model graph G_MOD, averaged over 10 realizations of the graph, with the slope of the fitted line indicated.

Fig. 5. Degree distribution (left) and clustering coefficients of cores (right) of the H-projection of the model graph, averaged over 10 realizations of the graph. Error bars correspond to standard deviations.
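Putting the two stages together, a scaled-down sketch of the full model might look as follows. All parameter values here (n, m, λ, T, and the single-Gaussian star-size distribution) are toy placeholders, not those used for G_MOD:

```python
import math
import random
from collections import Counter

# Scaled-down sketch of the two-stage model. All parameters are placeholders
# (the paper uses |H| = 28092, |I| = 1000880, a three-normal mixture for star
# sizes, and lambda = 10, T = 7692).
random.seed(2)
n, m = 300, 1500        # headwords and inflected forms
lam, T = 3, 40          # bridge batch size and number of batches

# Stage 1: stars. forms_of[h] lists the inflected-form indices of headword h.
forms_of, j = {}, 0
for h in range(n):
    k = 0
    while k == 0:                      # reject zero star sizes
        k = math.floor(abs(random.gauss(5, 2)))
    forms_of[h] = [(j + t) % m for t in range(k)]   # wrap indices mod m
    j += k
# Connect any still-isolated inflected form to a random headword.
used = {f for fs in forms_of.values() for f in fs}
for f in set(range(m)) - used:
    forms_of[random.randrange(n)].append(f)

# Stage 2: bridges. H_b receives lambda forms of the higher-degree H_a.
# (If H_a has fewer than lambda forms we just take what it has; the paper
# instead continues into the neighbouring index block.)
for _ in range(T):
    a, b = random.sample(range(n), 2)
    if len(forms_of[b]) > len(forms_of[a]):
        a, b = b, a
    forms_of[b].extend(forms_of[a][:lam])

# Headword groups: union headwords sharing a form, then tally group sizes.
parent = list(range(n))
def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

owner = {}
for h, fs in forms_of.items():
    for f in fs:
        if f in owner:
            parent[find(h)] = find(owner[f])
        owner[f] = h

group_sizes = Counter(find(h) for h in range(n))
print(len(group_sizes), max(group_sizes.values()))
```

At full scale, the histogram of `group_sizes.values()`, averaged over realizations and plotted in log-log coordinates, is the distribution shown in Figure 4.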
5. Conclusions
We have discussed selected topological properties of inflection graphs and proposed a random graph model which exhibits the desired properties. In particular, our model possesses a nearly identical distribution of headword group sizes, and its H-projection exhibits a degree distribution and clustering coefficients of cores qualitatively similar to the analogous properties of the original inflection graphs for the Latin and Polish languages.

A number of unresolved questions remain. First of all, it would be helpful to formally prove that the distribution of headword group sizes in our model follows a power law, as well as to prove that the degree distribution of the H-projection decreases exponentially with degree. We feel that further simplification of the model may be needed in order to achieve this goal.

A separate question is the meaning and implications of the observed features of inflection graphs in the linguistic context. It seems plausible, for example, that the structure of the inflection graphs is in some sense optimal. If the number of "bridges", that is, connections between headword stars, were much higher, the whole inflection graph would be connected, and the disambiguation of headwords based on inflected forms would be difficult. On the other hand, if there were no bridges between headword stars at all, then a much larger number of inflected forms would be needed. One can therefore speculate that the actual inflection graph represents some sort of compromise between these two extremes. In order to substantiate this claim one would need to construct a dynamical process producing many possible forms of inflection graphs, and then show that the attractor of this process is the actual inflection graph, just as in the case of self-organized criticality.

It is also possible to draw some further analogy between the percolation process and inflection graphs.
One can think of percolation as a process in which one starts with a graph with n vertices and no edges, and then adds random edges one by one. The graph will then undergo a percolation transition, and the power-law distribution of component sizes will be observed at the transition point. Below and above the percolation point, no power law will be observed. In order to mimic this process, we took the graph G_LA and started adding random edges to it. As expected, this destroyed the power-law distribution of component sizes of G′_LA, although, obviously, it is very difficult to pinpoint exactly how many edges are needed to destroy the power law, since the power law is not exact in the first place. The same phenomenon can be observed when one adds random edges to G_MOD. One can thus say that both the inflection graphs and the model graph are somewhat "frozen" at the threshold, or slightly below the threshold, of some percolation process. As intriguing as it is, this statement has to be taken very cautiously, because in the actual inflection graph edges cannot be added or removed: the graph is a fixed feature of the language. We plan to probe this issue further in the near future.
Acknowledgments
References
1. R. Ferrer i Cancho and R. V. Solé, "The small world of human language," Proc. Roy. Soc. Lond. B (2001) 2261–2265.
2. A. E. Motter, A. P. S. de Moura, Y. C. Lai, and P. Dasgupta, "Topology of the conceptual network of language," Phys. Rev. E (2002).
3. O. Kinouchi, A. S. Martinez, G. F. Lima, G. M. Lourenço, and S. Risau-Gusman, "Deterministic walks in random networks: an application to thesaurus graphs," Physica A (2002) 665–676.
4. A. D. Holanda, I. T. Pisa, O. Kinouchi, A. S. Martinez, and E. E. S. Ruiz, "Thesaurus as a complex network," Physica A (2004) 530–536.
5. M. Sigman and G. A. Cecchi, "Global organization of the WordNet lexicon," PNAS (February 2002) 1742–1747.
6. J. Y. Ke and Y. Yao, "Analysing language development from a network approach," Journal of Quantitative Linguistics (2008) 70–99.
7. S. M. G. Caldeira, T. C. P. Lobão, R. F. S. Andrade, A. Neme, and J. G. V. Miranda, "The network of concepts in written texts," European Physical Journal B (2006) 523–529.
8. A. Pomi and E. Mizraji, "Semantic graphs and associative memories," Physical Review E (2004) 066136.
9. R. Ferrer i Cancho, R. V. Solé, and R. Köhler, "Patterns in syntactic dependency networks," Physical Review E (2004) 051915.
10. L. Antiqueira, M. G. V. Nunes, O. N. Oliveira, and L. D. Costa, "Strong correlations between text quality and complex networks features," Physica A (2007) 811–820.
11. H. Fukś, "Inflection system of a language as a complex network," in Proceedings of 2009 IEEE Toronto International Conference – Science and Technology for Humanity (TIC-STH 2009), pp. 491–496, 2009. arXiv:1007.1025.
12. W. Whitaker, "WORDS, Latin-English dictionary," http://users.erols.com/whitaker/words.htm.
13. W. Lubaszewski, H. Wróbel, M. Gajęcki, B. Moskal, A. Orzechowska, P. Pietras, P. Pisarek, and T. Rokicka, "Polish inflection lexicon," in Słowniki komputerowe i automatyczna ekstrakcja informacji z tekstu, pp. 37–67. AGH Uczelniane Wydawnictwa Naukowo-Dydaktyczne, Kraków, 2009.
14. D. Stauffer and A. Aharony, Introduction to Percolation Theory. Taylor and Francis, London, 1994.
15. B. Bollobás and O. Riordan, Percolation. Cambridge University Press, Cambridge, 2006.
16. B. Bollobás, "Mathematical results on scale-free random graphs," in Handbook of Graphs and Networks, pp. 1–37. Wiley, 2003.
17. F. Mori and T. Odagaki, "Percolation analysis of clusters in random graphs," J. Phys. Soc. Japan (2001) 2485–2489.
18. H. Fukś and M. Krzemiński, "Topological structure of dictionary graphs," J. Phys. A: Math. Theor. (2009) art. no. 375101.
19. J. I. Alvarez-Hamelin, L. Dall'Asta, A. Barrat, and A. Vespignani, "k-core decomposition: a tool for the visualization of large scale networks," Advances in Neural Information Processing Systems (2006). arXiv:cs.NI/0504107.