On measuring linguistic intelligence
Maxim Litvak ([email protected])
September 16, 2018
Abstract

This work addresses the problem of measuring how many languages a person “effectively” speaks, given that some of the languages are close to each other; in other words, of assigning a meaningful number to her language portfolio.

Intuition says that someone who speaks Spanish and Portuguese fluently is linguistically less proficient than someone who speaks Spanish and Chinese fluently, since it takes more effort for a native Spanish speaker to learn Chinese than Portuguese. As the number of languages grows and their proficiency levels vary, it becomes even harder to assign a score to a language portfolio.

In this article we propose such a measure, the “linguistic quotient” (LQ), that accounts for these effects.

We define the properties that such a measure should have. They are based on the idea of coherent risk measures from mathematical finance.

Having laid this foundation, we propose one such measure together with an algorithm that takes a language classification tree as input.

The algorithm, together with its input, is available online at lingvometer.com.

1 Introduction

If we aim to compare different language portfolios (consisting of different languages at different proficiency levels), we need some way to measure them. Similar to IQ, which is supposed to measure intelligence, we need an “LQ” that measures linguistic intelligence.

Linguistic intelligence, for the scope of this article, is the achieved proficiency in some set of languages, i.e. not the potential to learn a new one quickly, but an achievement that has already taken place.

The main idea is that languages that are related to each other contribute less to linguistic intelligence than those that are unrelated.

We will actively employ ideas and methods from finance. A sample of languages resembles, to some extent, a portfolio of assets.

For a given portfolio of assets it is more valuable to add an asset that is uncorrelated, or even negatively correlated, with the assets already in the portfolio (e.g. [Markowitz 52]).

In the same way, for a given sample of languages (henceforth, a portfolio of languages) it is more valuable (in the sense of linguistic intelligence) to add a language that is not related to the languages already in the portfolio.

The article is organized as follows. In section 3.1 we design the measure from a set of properties that seem reasonable from an intuitive and logical point of view.

In section 3.2 we turn this design into a mathematical model that allows us to calculate a score for a given portfolio of languages (it can be calculated for any language profile at lingvometer.com).
We reason about the properties that linguistic intelligence should have. Based on these principles we will later derive the formal rules.

We consider all languages as equal to each other. This is not a trivial assumption, since some languages have developed rich literary and oral traditions over thousands of years, serving as a medium for scientific expression etc.

There are other languages that fail to compete with the former on this account. It is tempting to say that they are less valuable. However, we still regard them as equal, since the “undeveloped” languages could be harder to acquire precisely because they lack a written tradition.

People master languages to different extents. Assume that the proficiency degree of an adult, educated native speaker is 1. Someone who has never encountered the language has level 0 in it. It is logical to demand that increasing the proficiency in some language of the portfolio increases the LQ of the person.

We set the measure of a portfolio of n independent languages to be n, i.e. the LQ of the portfolio of Spanish and Chinese would be 2.

Consequently, the measure of a portfolio of n languages, some of which are related to each other, is less than n. Further, if we add to a portfolio a language that is already there, the measure of the portfolio must remain the same.

Thus, we assign to a portfolio of two languages a number between 1 and 2. The closer the languages are, the closer this number is to 1; the more distant they are, the closer it is to 2.

We interpret such a measure as the “real” (or effective) number of languages a person speaks.

Thus, one and one is two if we add Spanish to Chinese. If we add Spanish and Portuguese, then one and one could be something like 1.3, maybe a little less or a little more.

There are also reasons other than the time needed to learn a new language to assign a higher score to Spanish+Chinese than to Spanish+Portuguese. One could also argue that by learning a distant language one also learns new structures that are absent from the related language.

3 Formal approach
In this chapter we argue for reasonable properties of a linguistic intelligence measure. We then formalize the properties as axioms and propose a formula that satisfies them.
We record our considerations on how the LQ measure should behave in the following list of axioms. It was inspired by the idea of coherent risk measures [Artzner et al. 99] from financial mathematics.

Consider a language $l$ to be an element of the language space $L$, and a weighted language $w$ (in other words, a language with a proficiency level) to be an element of the space $W := L \times [0, 1]$, that is, a language together with a number between 0 and 1 (1 is a fluent command of the language, 0 means that any knowledge is absent).

A portfolio $\Pi$ of $N$ languages is an element of the space $W^N$. If needed, we can also consider a portfolio $\Pi$ of $N$ languages to be an element of the space $L^N$ (i.e. languages at fluent proficiency).

Definition. A linguistic intelligence measure (l.i.m.) is a function

$$\lambda : W^N \to \mathbb{R}$$

If we set all proficiency levels to 1, then the l.i.m. is also defined on $L^N$: $\lambda : L^N \to \mathbb{R}$.

Now we formalize the arguments of the previous chapter as axioms.

We consider all languages to be equal:

Axiom E. Equivalence.

$$\forall l \in L,\ l \neq \emptyset : \lambda(l) = 1$$

A portfolio of languages weighs at most as much as the sum of its components:

Axiom S. Subadditivity. For any two language portfolios it must hold:

$$\forall \Pi_1, \Pi_2 \in W^N : \lambda(\Pi_1 \cup \Pi_2) \le \lambda(\Pi_1) + \lambda(\Pi_2)$$

Axiom ND. No double-counting. For a language $l$ that is in portfolio $\Pi$ it must hold:

$$l \in \Pi \Rightarrow \lambda(\Pi \cup l) = \lambda(\Pi)$$

Axiom I. Independence. For a language $l$ that is independent of every language in portfolio $\Pi$ it must hold:

$$\Pi \perp l \Rightarrow \lambda(\Pi \cup l) = \lambda(\Pi) + \lambda(l)$$

Axiom PH. Positive homogeneity. The proficiency effect is linear.

$$\forall c \in [0, 1],\ \forall l \in L,\ w = (l, c) \in W : \lambda(w) = c\,\lambda(l)$$

Definition.
A linguistic intelligence measure is called coherent (c.l.i.m.) if it satisfies axioms E, S, ND, I and PH.

There are many measures that satisfy these axioms. We propose one in the next chapter.

Even though a c.l.i.m. seems to be strictly defined, since there are many requirements to be met, two key points in the whole construction still leave room for interpretation: how the language space is constructed and how the set-theoretic operations (independence and union) between elements of this space are defined. Later on we will see two completely different construction approaches, and further constructions are still possible.

Next, we look at the axioms from different points of view to understand them better. If axiom E holds, then axioms S, ND and I lead to the following inequality:

$$\forall \Pi \in L^N,\ \forall l \in L : \lambda(\Pi) \le \lambda(\Pi \cup l) \le \lambda(\Pi) + 1$$

Thus, adding a language to a portfolio adds a number between 0 and 1 to linguistic intelligence.

This inequality helps us to narrow the range of an l.i.m. to $\lambda : L^N \to [1, N]$ or, in terms of $W$ (with axiom PH), $\lambda : W^N \to [0, N]$.

Simplifying the inequality further, we can write it for a portfolio of two languages as follows:

$$\forall l_1, l_2 \in L,\ \Phi = (l_1 \cup l_2) \in L^2 : 1 \le \lambda(\Phi) \le 2$$

If for some reason the space $L$ is defined in such a way that it includes an empty element, then axiom E must be adjusted such that $\lambda(\emptyset) = 0$.

We consider portfolios of languages to be trees with a hypothetical Tower of Babel (ToB) language as the source. A portfolio consisting of only one language is a path from the source to the language through the classification into language families, groups, subgroups etc.

The children of ToB are language families such as Indo-European or Sino-Tibetan. This is the layer of nodes of rank 1; languages are considered independent if they belong to different language families.

Thus, the language space $L^N$ is the full tree of all ($N$) languages. The languages are the leaves of the tree, and portfolios are the induced subgraphs of the full graph containing the source (ToB), leaves and the paths between them. Thus, a portfolio of one language is a path from the source through all families, sub-families, groups etc. to this language. The union of two portfolios is the union of the corresponding subgraphs.

To illustrate, consider a portfolio $\Pi$ consisting of Chinese and Serbian and a portfolio $\Phi$ consisting of English and Slovene. Then the unified portfolio $\Psi = \Pi \cup \Phi$ contains all four languages. The situation is illustrated in figure 1.

Let $N$ be the maximal depth of the tree and $V_r$ the set of all nodes of depth $r$. Initialize all languages with their proficiencies (the languages do not necessarily all hang in the deepest layer).

Starting from the deepest layer and moving up to the source, calculate the LQ for each node in the layer iteratively (bottom-up).

Denote the LQ of node $v$ by $\lambda_v$ and the set of children of node $v$ by $Ch(v)$. For $r = N, \dots, 1$ and every node $v \in V_{r-1}$ that has children, where $r$ is the rank of the layer containing these children:

$$\lambda_v = \Big( \sum_{c \in Ch(v)} \lambda_c^{\sqrt{r}} \Big)^{1/\sqrt{r}}$$

We define the result of the last step of this iterative process as the LQ measure. Pseudocode for the algorithm, in both the straightforward and the recursive form, is given in the appendix.
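To see how the exponent penalizes relatedness, the update rule for a single node can be sketched in Python. The helper name `aggregate` is hypothetical, and we assume that `rank` is the rank of the layer containing the children, as in the worked example of the appendix; this is an illustration, not the code behind lingvometer.com.

```python
from math import sqrt


def aggregate(child_lqs, rank):
    """LQ of a node whose children (with the given LQs) lie in layer `rank`.

    Assumption: the exponent sqrt(rank) is taken from the children's layer.
    """
    p = sqrt(rank)
    return sum(x ** p for x in child_lqs) ** (1 / p)


# Two fluent languages under a common parent, at increasing depth in the tree.
for rank in (1, 2, 4, 9):
    print(rank, round(aggregate([1.0, 1.0], rank), 3))
```

Under these assumptions two fluent sibling languages yield $2^{1/\sqrt{r}}$: exactly 2 at rank 1 (independent families), and closer to 1 the deeper in the tree their common parent sits. A node with a single child passes the child's LQ on unchanged.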
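Definition. The LQ-measure is the l.i.m. given by $LQ = \lambda_v$, $v \in V_0$, i.e. the LQ of the source.

Replacing $\sqrt{r}$ with another function $f(r)$ (with $f(1) = 1$) would give another c.l.i.m.:

$$\lambda_v = \Big( \sum_{c \in Ch(v)} \lambda_c^{f(r)} \Big)^{1/f(r)}$$

LQ turns out to be coherent.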
Lemma. LQ is a c.l.i.m.
Proof. Axiom S. Consider language portfolios $\Phi$ and $\Pi$ that may have common elements (they have at least ToB in common). We can write the children of a node as the union of its children belonging to $\Pi$ and its children belonging to $\Phi$; the algorithm to calculate $LQ(\Pi \cup \Phi)$ then looks as follows (again starting with the nodes from the bottom):

$$\lambda_v = \Big( \sum_{c \in Ch_\Pi(v) \cup Ch_\Phi(v)} \lambda_c^{f(r)} \Big)^{1/f(r)}$$

The algorithms to calculate $LQ(\Pi)$ and $LQ(\Phi)$ are, respectively,

$$\lambda_v = \Big( \sum_{c \in Ch_\Pi(v)} \lambda_c^{f(r)} \Big)^{1/f(r)} \quad \text{and} \quad \lambda_v = \Big( \sum_{c \in Ch_\Phi(v)} \lambda_c^{f(r)} \Big)^{1/f(r)}$$

The values $\lambda_c$, $c \in Ch(v)$, form a vector in the space $\mathbb{R}^{Ch(v)}$; its Minkowski distance to the zero vector with $p = f(r)$ is $\lambda_v$. If $Ch_\Phi(v) \cap Ch_\Pi(v) = \emptyset$, the triangle inequality states that

$$\Big( \sum_{c \in Ch_\Pi(v) \cup Ch_\Phi(v)} \lambda_c^{f(r)} \Big)^{1/f(r)} \le \Big( \sum_{c \in Ch_\Pi(v)} \lambda_c^{f(r)} \Big)^{1/f(r)} + \Big( \sum_{c \in Ch_\Phi(v)} \lambda_c^{f(r)} \Big)^{1/f(r)}$$

If the intersection of the children for this node is not empty, then we have additional terms on the right-hand side of the inequality, and thus it still holds. Note that the right-hand side contains the components that enter the calculation of $LQ(\Pi)$ and $LQ(\Phi)$, respectively.

Thus, going from the bottom to the top, we finally have

$$LQ(\Pi \cup \Phi) \le LQ(\Pi) + LQ(\Phi)$$

This satisfies axiom S.

Axiom I is satisfied due to the property of the formula that at rank 1 (this level is considered to contain independent entities) the sum of LQs is linear (i.e. at no cost).

Axioms E and PH are satisfied due to the fact that, initialized with 1 (or with $c \in [0, 1]$ in the case of PH), the LQ of a single language can be pushed up along the path to the source at no cost.

Axiom ND is satisfied due to the space construction. ∎

A shortcoming of the approach is that some language dependencies are not captured by the tree structure. For instance, the direct French influence on English could be represented by an edge between them. This would, however, destroy the tree structure.

One of the advantages is that language tree data is easily available, for instance in [ethnologue 2015].

One can try the algorithm on different inputs at lingvometer.com.
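The lemma can also be spot-checked numerically on the portfolios of Chinese and Serbian and of English and Slovene. The following minimal sketch rests on several assumptions that are not part of the text: the classification paths in `PATHS` are simplified and hypothetical, the union of two weighted portfolios keeps the higher proficiency of a shared language, and the exponent is taken from the rank of the children's layer, as in the worked example of the appendix. It illustrates the axioms and is not the implementation behind lingvometer.com.

```python
from math import isclose, sqrt

# Hypothetical, simplified classification paths (family, ..., language).
PATHS = {
    "Chinese": ("Sino-Tibetan", "Chinese"),
    "English": ("Indo-European", "Germanic", "English"),
    "Serbian": ("Indo-European", "Slavic", "South", "Western", "Serbian"),
    "Slovene": ("Indo-European", "Slavic", "South", "Western", "Slovene"),
}


def lq(portfolio):
    """Tree-based LQ of a portfolio given as {language: proficiency}."""

    def node_lq(prefix):
        rank = len(prefix)  # ToB is the empty prefix and has rank 0
        for language, level in portfolio.items():
            if PATHS[language] == prefix:
                return float(level)  # leaf: proficiency of the language
        children = {
            PATHS[language][: rank + 1]
            for language in portfolio
            if PATHS[language][:rank] == prefix
        }
        p = sqrt(rank + 1)  # exponent taken from the children's layer
        return sum(node_lq(child) ** p for child in children) ** (1 / p)

    return node_lq(())


def union(a, b):
    """Union of two portfolios; assumption: keep the higher proficiency."""
    return {k: max(a.get(k, 0.0), b.get(k, 0.0)) for k in a.keys() | b.keys()}


pi = {"Chinese": 1.0, "Serbian": 1.0}
phi = {"English": 1.0, "Slovene": 1.0}

assert isclose(lq({"Chinese": 1.0}), 1.0)                        # axiom E
assert isclose(lq({"English": 0.5}), 0.5)                        # axiom PH
assert isclose(lq(union(pi, {"Serbian": 1.0})), lq(pi))          # axiom ND
assert isclose(lq(union(phi, {"Chinese": 1.0})), lq(phi) + 1.0)  # axiom I
assert lq(union(pi, phi)) <= lq(pi) + lq(phi)                    # axiom S
```

Such a check on a handful of portfolios does not replace the proof; it only shows how the axioms translate into concrete numbers for one choice of tree.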
In the previous chapter we saw a measure constructed on language trees. However, the underlying structure does not have to be a tree. In this chapter we discuss an alternative approach, namely one based on correlation matrices.

The entries of the matrices represent a correlation or language distance (more precisely, 1 minus the distance, to make it look like a correlation) expressed as a number between 0 and 1. There are many works on measuring such a distance, to name a few: [Petroni and Serva 2010], [Delsing and Åkesson 2005], [Koehn 2005], [Chiswick and Miller 2005], [Gingsburgh and Weber 2011]. They use different approaches: difficulty of acquiring a language, difficulty of machine translation between languages, number of words in common etc. Most of them cover only a rather small part of all possible correlations/distances. With $N$ languages, there must be $N(N-1)/2$ distances.

Just as stock prices are correlated, with the information stored in correlation matrices, we can also consider languages to be correlated.

We try to construct an l.i.m. on $2 \times 2$ matrices, i.e. portfolios of two languages. The correlation matrix in this case looks like this:

$$M(\rho) = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$$

Languages are independent if $\rho = 0$ and more dependent as $\rho$ gets closer to 1, with two languages being equal if $\rho = 1$.

For the l.i.m. to be coherent we need, among other things, $\lambda(M(0)) = 2$ and $\lambda(M(1)) = 1$, with $\lambda$ monotonically decreasing in $\rho$.

One such c.l.i.m. is $\lambda(M(\rho)) = 2 - \rho$. Axioms I and ND were checked above: they are the conditions $\lambda(M(0)) = 2$ and $\lambda(M(1)) = 1$. The matrix of one language is simply 1, thus axiom E holds as well. Axiom PH is not relevant on the $L$ space (it is relevant only on $W^N$ spaces). Axiom S holds since $2 - \rho \le 2$ ($0 \le \rho \le 1$).

It is not the only c.l.i.m. on this space, since a family of c.l.i.m. can be constructed like this: $\lambda(M(\rho)) = 2 - \rho^r$, $r > 0$.

We have discussed a particular case that shows an example of a c.l.i.m. on the matrix space.
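As an illustration, the two-language family $\lambda(M(\rho)) = 2 - \rho^r$ can be written down in a few lines of Python. The function name `lim_2x2` and the input checks are assumptions made for this sketch; only the formula itself comes from the text.

```python
def lim_2x2(rho, r=1.0):
    """lambda(M(rho)) = 2 - rho**r for a portfolio of two fluent languages.

    rho is the correlation between the two languages (0 = independent,
    1 = identical); r > 0 selects a member of the family.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    if r <= 0:
        raise ValueError("r must be positive")
    return 2.0 - rho ** r


# Independent languages count as two, identical ones as one.
print(lim_2x2(0.0), lim_2x2(1.0))
# A larger r penalizes a moderate correlation less.
print(lim_2x2(0.5, r=1.0), lim_2x2(0.5, r=2.0))
```

For every admissible `r` the value stays between 1 and 2 and decreases as `rho` grows, which is exactly what the conditions above require of a two-language measure.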
The general case is still open.

One of the shortcomings of the matrix approach is that data on all $N(N-1)/2$ dependencies is not available. Different studies assign close, but still different, numbers. Apparently, the tree data can be mapped to a matrix and vice versa (e.g. [Petroni and Serva 2008]).

We presented a sound way to measure a portfolio of languages. How could it be used other than for measuring someone's linguistic intelligence?

There is a broad field of research at the intersection of economics and linguistics. An extensive overview can be found in [Grin 2003].

LQ could be used, for instance, as a communication cost function. An institution that evaluates several options for its working language(s) could aim to minimize the overall LQ of its members, since this would also mean minimizing communication costs. For example, the optimal language (or an optimal bundle of languages) for the European Union could be chosen in such a way that the aggregated LQ of the European population is minimal.

The implementation of the LQ algorithm can be found at lingvometer.com.
Appendix
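The pseudocode is presented in Python style. First, initialize all leaves with LQ equal to the proficiency level of the corresponding language. Note that the leaves can lie in different layers and thus have different ranks:

    for node in nodes:
        if not children(node):  # the node is a leaf
            node.lq = node.language.proficiency

The calculation:

    for rank in range(deepest_rank, 0, -1):
        p = rank ** 0.5  # exponent sqrt(r); the children lie in layer `rank`
        for node in nodes_of_layer(rank - 1):
            if children(node):  # leaves keep their initial LQ
                node.lq = sum(c.lq ** p for c in children(node)) ** (1 / p)

The result is:

    LQ = nodes_of_layer(0)[0].lq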
We also write down the recursive version of the algorithm. The initialization step is the same. The recursive function that does the job can be implemented like this:

    def lq_recursive(node):
        if not children(node):  # leaf: return its proficiency
            return node.language.proficiency
        p = (node.rank + 1) ** 0.5  # exponent sqrt(r) of the children's layer
        temp_sum = 0
        for child in children(node):
            temp_sum += lq_recursive(child) ** p
        return temp_sum ** (1 / p)

Then we can calculate LQ with the following call:

    LQ = lq_recursive(ToB)
We introduce here an example of a language profile that is not trivial, but also not very complex. It contains patterns that test the first principles. Someone speaks Serbian, Slovene, Croatian and Chinese fluently. Besides, he has some command of English that qualifies as a 50% level. We do the arithmetic and show every step of the calculation.

Consider the following language portfolio. To initialize the algorithm, we assign a rank to each layer and set LQ to 1 for each language except English, for which we set LQ to 0.5 (Fig. 1).

Figure 2: Initialization

The deepest layer (that of Serbian, Slovene and Croatian) is of rank 5. Now calculate the LQ for the Western branch of the South Slavic languages (Fig. 2).

$$\lambda_{\text{Western}} = \big( 1^{\sqrt{5}} + 1^{\sqrt{5}} + 1^{\sqrt{5}} \big)^{1/\sqrt{5}} = 3^{1/\sqrt{5}} \approx 1.63$$

If a node has just one child, then according to the formula it takes over the child's LQ unchanged.

For another example, take the node of the Indo-European family (the rank of the layer of its children is 2). It has two children, namely the Germanic and Slavic groups, with LQs calculated at previous iterations equal to 0.5 and 1.63 respectively.

$$\lambda_{\text{Indo-European}} = \big( 0.5^{\sqrt{2}} + 1.63^{\sqrt{2}} \big)^{1/\sqrt{2}} \approx 1.84$$

At the final step we can sum the LQs of the language families at no cost:

$$LQ = 1.84 + 1 = 2.84$$
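The calculation above can be reproduced with a short, self-contained Python sketch. The nested-dictionary representation, the function name `lq` and the intermediate nodes chosen for English and Chinese are assumptions made for illustration (nodes with a single child do not affect the result); the exponent follows the rank of the children's layer, as in the steps above.

```python
from math import sqrt


def lq(node, rank=0):
    """Return the LQ of `node`, which sits in the layer of rank `rank`.

    A leaf is a number (proficiency in [0, 1]); an inner node is a dict
    that maps child names to child nodes.
    """
    if not isinstance(node, dict):
        return float(node)
    p = sqrt(rank + 1)  # the children live one layer deeper
    return sum(lq(child, rank + 1) ** p for child in node.values()) ** (1 / p)


# Simplified, partly hypothetical classification tree for the example profile.
tower_of_babel = {
    "Indo-European": {
        "Germanic": {"English": 0.5},
        "Slavic": {
            "South": {
                "Western": {"Serbian": 1.0, "Croatian": 1.0, "Slovene": 1.0}
            }
        },
    },
    "Sino-Tibetan": {"Chinese": 1.0},
}

western = tower_of_babel["Indo-European"]["Slavic"]["South"]["Western"]
print("Western branch:", lq(western, rank=4))
print("Indo-European:", lq(tower_of_babel["Indo-European"], rank=1))
print("LQ:", lq(tower_of_babel))
```

Because the sketch does not round the intermediate value 1.63, its final result comes out marginally higher (about 2.845) than the 2.84 obtained in the hand calculation.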