Random vector generation of a semantic space
Jean-François Delpech, Sabine Ploux
Institut des Sciences Cognitives UMR5304, CNRS - Université de Lyon
67, boulevard Pinel, 69675 BRON cedex, France
[email protected], [email protected]
We show how random vectors and random projection can be implemented in the usual vector space model to construct a Euclidean semantic space from a French synonym dictionary. We evaluate theoretically the resulting noise and show the experimental distribution of the similarities of terms in a neighborhood according to the choice of parameters. We also show that the Schmidt orthogonalization process is applicable and can be used to separate homonyms with distinct semantic meanings. Neighboring terms are easily arranged into semantically significant clusters which are well suited to the generation of realistic lists of synonyms and to such applications as word selection for automatic text generation. This process, applicable to any language, can easily be extended to collocations, is extremely fast and can be updated in real time whenever new synonyms are proposed.
In their seminal work, Ploux and Victorri have used synonymy relations deduced from French electronic dictionaries to create semantic spaces around French words and their neighbors. Their definition of "synonymy" is fairly broad and includes hyponymy (moineau and oiseau), hyperonymy (arme and pistolet) or even non-synonymous but related terms (autocar and automobile); however, in their work, true synonyms (i.e. terms which are more or less interchangeable) form cliques of the graph of synonyms, i.e. maximally complete subgraphs. While this is very interesting from a theoretical standpoint, as it then becomes straightforward to evaluate an interclique distance (or degree of separation) between any two terms in the graph (as long as neither belongs to an island, such as lapereau and lapinot), it is not very useful in practice. For example, an author in search of the right term may well not be interested in strict synonyms; terms with related or even opposed meanings can often be preferable in rhetorical figures. Also, in many applications such as automatic text generation, a well-defined and mathematically well-behaved semantic distance between terms is often a prerequisite.

In this report, we show how a Euclidean semantic distance can quickly and easily be constructed from Ploux and Victorri's database (which contains 54,685 terms and 116,694 cliques).

Since the pioneering work of Salton, it is well understood that any combination of terms, such as a clique, can be seen as a vector in a space where each dimension represents a distinct term (or lemma):

$C_j = (t_{1,j}, t_{2,j}, \ldots, t_{t,j})$    (1)

This representation is extremely fruitful and forms the basis of numerous information retrieval systems; it suffers however from a severe limitation in that each term is orthogonal to every other. Of course, the dual equation of Equation 1,

$T_k = (c_{1,k}, c_{2,k}, \ldots, c_{s,k})$    (2)

may be used to compute term distances (or similarities), but the very high dimensionality of the subtending space makes such distances difficult to compute and to interpret: this is the "curse of dimensionality".

If $D_i$, with cardinality $d_i$, is the set of distinct terms occurring in all the cliques containing term $t_i$, we define the overlap similarity $d_{i,k}$ between two terms as the cardinality of the intersection $D_i \cap D_k$ (each word being counted only once). Obviously, $d_{i,k} = 0$ for most pairs $(i, k)$, since for any $i$ the total number of distinct terms in the database is much larger than $d_i$, whose average value is 8.5 in Ploux and Victorri's database.
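As a concrete illustration, here is a minimal Python sketch of the overlap similarity defined above; the three-clique toy database is of course hypothetical, not Ploux and Victorri's.

```python
# Minimal sketch (hypothetical data) of the overlap similarity d_ik:
# D[t] is the set of distinct terms of all cliques containing t,
# and d_ik is the cardinality of the intersection of D[t_i] and D[t_k].
from collections import defaultdict

cliques = [["maison", "demeure", "logis"], ["maison", "foyer"],
           ["foyer", "bercail"]]

D = defaultdict(set)                 # D[t] = terms co-cliqued with t
for clique in cliques:
    for t in clique:
        D[t].update(clique)

def overlap(t_i, t_k):
    return len(D[t_i] & D[t_k])      # each word counted only once

print(overlap("maison", "bercail"))  # 1: both sets contain 'foyer'
```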
"Contexonyms" are words which co-occur in a given context (for example in the same sentence of a corpus); while they are not synonyms, they are obviously closely related. Ji, Ploux and Wehrli have proposed an automatic contexonym organizing model (ACOM) which relies on counting co-occurrences and evaluating their probabilities to automatically produce and organize contexonyms for a target word. The test results, after training on an English corpus maintained by Project Gutenberg, show that the model is able to classify contexonyms as well as to reflect words' minute usage and nuance.

Dimensionality reduction can be achieved by a low-rank approximation of the term-document matrix. This can be done by Latent Semantic Indexing, which reduces dimensionality through a singular value decomposition (SVD) of the term-document matrix, retaining only a comparatively small number of the largest singular values. This method has been very successfully used for document indexing and retrieval. It suffers nevertheless from limitations:
- SVD is computationally intensive, even though the large term-document matrix is very sparse;
- there is no really satisfactory way to increment the results as new terms/documents become available.
More importantly, it is not well suited to generating a semantic space from cliques. The resulting, lower-dimensional space is the best approximation, in the least-squares sense, of the position of any term belonging to the whole set of cliques: the distance between any pair of terms will be optimal, while what is really of interest from the present perspective is the accurate determination of distances between semantic neighbors.

As a test, an SVD decomposition of the clique-term matrix (of which eq. 2 is a row) was performed. It reduced the matrix size from 54,685 x 116,694 to 54,685 x 250, meaning that each term was associated with a vector having 250 orthogonal components. The decomposition, which took 134 sec. on a desktop computer, was clearly unsatisfactory as the singular values decayed very slowly, from 63.31 for the first coordinate to 37.4 for the 250th one. According to this computation, the first few neighbors of rapsode would be rimailleur, rimeur, versificateur, métromane, fils d'Apollon, favori des Muses, favori du Parnasse, nourrisson des Muses, héros du Pinde, mâche-laurier, all with similarities extremely close to 1.0. While clearly in the right neighborhood, this seems to be of limited usefulness; note however that in practice a restriction to the first two or three largest singular values may often yield useful information.

Word order is not considered in this publication, but it should be mentioned for completeness that neural networks are often used in natural language processing to encode word sequences. In a recent publication, Mikolov et al. have introduced two novel model architectures for computing continuous vector representations of words from very large data sets. They report large improvements in accuracy at a computational cost which is still substantial, but that they claim is much lower than previous architectures. An interesting consideration is that, according to Mikolov et al., the learned vectors explicitly encode many linguistic regularities and patterns.

2.6. Random vectors and random projection

It has been pointed out that intuitions valid in a low-dimensionality space may be totally misleading in a high-dimensionality space. For example, a set of points picked at random from the unit ball

$\{\, x \in V : \|x\| < 1 \,\}$    (3)

will have some significant fraction near the origin, say within distance 1/2 if $d = 3$, but this fraction becomes rapidly vanishingly small as the dimension $d$ becomes large; for example for $d = 250$, $2^{-d} \approx 5.5 \times 10^{-76}$.

Another useful remark is that while obviously one cannot create more than $d$ orthogonal vectors in a space of dimension $d$, one can create an exponentially large number $\exp(O(d))$ of vectors quasi-orthogonal to each other; in other words, a set of vectors picked at random will with high probability be quasi-orthogonal, i.e. have angles of $90° \pm \epsilon$ with each other. The seed vectors referred to below will be selected from such a set $S_d$.

While an orthogonal projection will in general reduce the average distance between points, it is also known, as shown by Johnson and Lindenstrauss in an often cited paper, that distances may be almost perfectly preserved for any $n$ points in an arbitrary number of dimensions when projected to a random subspace of $O(\log n)$ dimensions.
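The concentration of angles around 90° is easy to verify numerically. The following short sketch is illustrative only (it is not part of the original experiments): it samples random unit vectors at several dimensions and prints the spread of pairwise angles.

```python
# Minimal illustrative sketch: the angle between two random unit vectors
# concentrates around 90 degrees as the dimension d grows.
import numpy as np

rng = np.random.default_rng(0)
for d in (3, 250, 2500):
    u = rng.standard_normal((1000, d))
    u /= np.linalg.norm(u, axis=1, keepdims=True)           # random unit vectors
    cos = np.clip((u[:500] * u[500:]).sum(axis=1), -1, 1)   # 500 random pairs
    angles = np.degrees(np.arccos(cos))
    print(f"d={d}: {angles.mean():.1f} +/- {angles.std():.1f} degrees")
```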
2.6.2. Building random vectors

The comparatively recent method of random projection is based on these three preceding remarks and proceeds as follows:
1. Uniquely associate with each term $t_i$ a random seed vector $s_i \in S_d$ having $d$ independent coordinates;
2. Associate with each clique $c_k$ the vector $C_k = \sum_{i \in c_k} \rho_{ki}\, s_i$, where $i \in c_k$ ranges over the terms found in clique $c_k$ and $\rho_{ki}$ is a function of the number of occurrences in $c_k$ of term $t_i$ and of the weights associated with $t_i$;
3. Finally, associate with each term $t_i$ the (suitably weighted) sum $T_i = \sum_{k \ni t_i} C_k$ of the vectors of the cliques in which $t_i$ appears, where $k \ni t_i$ ranges over the cliques containing $t_i$.

In what follows, we shall assume without loss of generality that the term vectors $T_i$ are normalized to unity. Obviously, each term vector $T_i$ is now embedded in a $d$-dimensional Euclidean semantic space and the similarity $\sigma_{ij}$ between terms $t_i$ and $t_j$ is the scalar product of the associated term vectors:

$\sigma_{ij} = \langle T_i \,|\, T_j \rangle$    (4)

It is easy to see that $\sigma_{ij}$ ranges from -1 to 1. It is sometimes more convenient to consider the distance $D_{ij}$, which is related to the similarity by $D_{ij} = \sqrt{2(1 - \sigma_{ij})}$ and ranges from 0 (same term, $\sigma_{ij} = 1$) to 2 (exactly opposite terms, $\sigma_{ij} = -1$; note however that owing to the extreme sparsity of a high-dimensional space, the neighborhood exactly opposite a term is in practice always empty).
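The three steps above translate almost directly into code. The sketch below is a minimal, assumed implementation under simplifications of our own: uniform weights ($\rho_{ki} = 1$), a two-clique toy input, and names such as term_vecs that are hypothetical, not the authors'.

```python
# Minimal sketch of steps 1-3: sparse ternary seed vectors per term,
# clique vectors as sums of seeds, term vectors as sums of clique vectors.
import numpy as np
from collections import defaultdict

d, m = 250, 4
rng = np.random.default_rng(0)
cliques = [["maison", "demeure", "logis"], ["maison", "foyer", "logis"]]

def seed(term, _cache={}):
    if term not in _cache:                        # step 1: one seed per term
        v = np.zeros(d)
        idx = rng.choice(d, size=2 * m, replace=False)
        v[idx[:m]] = 1 / np.sqrt(2 * m)           # m positive coordinates
        v[idx[m:]] = -1 / np.sqrt(2 * m)          # m negative coordinates
        _cache[term] = v
    return _cache[term]

clique_vecs = [sum(seed(t) for t in c) for c in cliques]   # step 2, rho = 1

term_vecs = defaultdict(lambda: np.zeros(d))               # step 3
for c, cv in zip(cliques, clique_vecs):
    for t in c:
        term_vecs[t] += cv
for t in term_vecs:                                        # normalize to unity
    term_vecs[t] /= np.linalg.norm(term_vecs[t])

sigma = term_vecs["maison"] @ term_vecs["logis"]           # equation 4
print(f"similarity(maison, logis) = {sigma:.3f}")
```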
2.6.3. Locality property

Building a term vector $T_i$ by the process described above involves only the terms pertaining to the set $D_i$ defined in section 2.2. It is thus a purely local process: updating the semantic space requires only a few tens or hundreds of operations, orders of magnitude less than its initial generation (provided small changes to the weights are neglected, which is usually acceptable as they are logarithmic in term frequency and inverse document frequency).

This does not imply that the similarity of term $t_i$ with term $t_k$ is zero when $t_k \notin D_i$ (note that $t_k \notin D_i \implies t_i \notin D_k$), since $D_i$ and $D_k$ may well have neighbors in common. For example carotte and fraude have a degree of separation of 2 but a similarity of 0.364. Moreover, the seed vectors are not quite orthogonal and the scalar product $\langle s_i \,|\, s_j \rangle$ will usually be small, but not zero. Thus, even for uncorrelated term vectors $T_i$ and $T_j$, their similarity $\sigma_{ij}$ will usually be small but non-zero. This induces an unavoidable noise which is studied below in some detail.

A normalized seed vector embedded in a $d$-dimensional space has $d - 2m$ coordinates equal to 0, $m$ equal to $+1/\sqrt{2m}$ and $m$ equal to $-1/\sqrt{2m}$. Having the same number of positive and negative coordinates ensures that the scalar product of two seed vectors is 0 on average. As seed vectors need to be very close to orthogonal with each other, the number $2m$ of non-zero coefficients must be substantially smaller than the dimension $d$.

The number of available, distinct seed vectors is the product of the number of combinations of $2m$ non-zero coordinates amongst $d$ coordinates, times the number of ways of distributing $m$ positive and $m$ negative coordinates amongst these $2m$ non-zero coordinates:

$N_{seed}(d, m) = \binom{d}{2m} \times \binom{2m}{m}$    (5)

In practice, $N_{seed}$ should be much larger than the number of distinct terms to guarantee a negligible collision probability (i.e. two distinct terms having the same seed vector). This condition is already amply met with $m \geq 4$, as $N_{seed}(250, 4) \approx 2.4 \times 10^{16}$ when a dimension $d = 250$ is selected.

Given $d$ and $m$, and noting for simplicity $p = 2m/d$, the probability of an overlap of $v$ non-zero coordinates between two randomly selected seed vectors is

$P_{overlap}(v, d) = \binom{2m}{v} \times p^v \times (1 - p)^{2m - v}$    (6)

When two randomly selected, normalized seed vectors have an overlap of $v$ non-zero coordinates, their scalar products will be arranged symmetrically around zero and vary in discrete steps of $1/2m$. An overlap of 1 will generate the two scalars $+1/2m$ and $-1/2m$ with probabilities 1/2; an overlap of 2 will generate $-1/m$ with probability 1/4, 0 with probability 1/2, and $+1/m$ with probability 1/4; more generally, an overlap $v$ will generate the scalar $s/2m$ with the probability

$P_{scalar}(v, s) = \binom{v}{q} \times 2^{-v}, \quad q = (v + s)/2$    (7)

where $q$ is restricted to integer values $0 \leq q \leq v$; these probabilities sum to one by virtue of the identity $\sum_{q=0}^{v} \binom{v}{q} = 2^v$.

It can be seen from equations 6 and 7 that the noise decreases more or less as $1/\sqrt{d}$. Theoretical and experimental scalar products of two seed vectors, as computed from equations 6 and 7, are plotted in figure 1 (next page), where:
- the light vertical lines are increments of 0.01, the heavier vertical lines are at 0, -0.1 and +0.1;
- the two horizontal lines are at 1.0 and 0.136;
- the red dots are computed by taking the scalar products of 1,000,000 'term vectors', each synthesized by the addition of 5 random seed vectors. If instead we do the statistics directly on seed vectors, the result is unchanged except that the dots now occur only at multiples of 0.01 and the total is accordingly 5 times larger;
- the black dots are statistics over 40,000 points, starting with the 10,000th, taken from the tail of the neighbors of an arbitrary term (here rapsode);
- the purple dots are a Gaussian with a standard deviation of $\sigma$.

Figure 1 - Slice Noise
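The theoretical distribution of equations 6 and 7 can be checked against a direct simulation. The sketch below is ours (with $d = 250$ and $m = 4$ for speed, not the paper's parameters); it compares the Monte-Carlo frequency of a strictly positive scalar product between two random seed vectors with the value predicted by the two equations.

```python
# Minimal sketch: theoretical seed-vector noise (equations 6 and 7)
# versus a Monte-Carlo simulation. Assumes d = 250, m = 4 for speed.
import numpy as np
from math import comb

d, m = 250, 4
p = 2 * m / d

def p_overlap(v):
    # Equation 6: probability that two seed vectors share v non-zero slots
    return comb(2 * m, v) * p**v * (1 - p) ** (2 * m - v)

def p_scalar(v, s):
    # Equation 7: probability of scalar product s/(2m) given an overlap v
    q, r = divmod(v + s, 2)
    return comb(v, q) / 2**v if r == 0 and 0 <= q <= v else 0.0

def seed_vector(rng):
    vec = np.zeros(d)
    idx = rng.choice(d, size=2 * m, replace=False)
    vec[idx[:m]] = 1 / np.sqrt(2 * m)    # m positive coordinates
    vec[idx[m:]] = -1 / np.sqrt(2 * m)   # m negative coordinates
    return vec

rng = np.random.default_rng(0)
dots = [seed_vector(rng) @ seed_vector(rng) for _ in range(100_000)]
theory = sum(p_overlap(v) * p_scalar(v, s)
             for v in range(1, 2 * m + 1) for s in range(1, v + 1))
print(np.mean(np.array(dots) > 1e-12), theory)  # both approx. P(scalar > 0)
```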
Obviously, the size of the database of the term vectors is itself linearly dependent on the dimension; for $d = 2500$, each vector occupies 10 kB if a single coordinate is represented by a 4-byte floating-point number. A database of 1,000,000 distinct terms would thus occupy 10 GB with this elementary data structure; however, for many applications, $d = 250$ will be sufficient and/or more sophisticated data structures may be implemented. Computation times will also increase more or less linearly with $d$, because they mostly involve stepping through all the dimensions.
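As one possible 'more sophisticated data structure', the sketch below (an assumption of ours, not the paper's implementation) keeps the term vectors in a memory-mapped float32 file so that the worst-case 10 GB never has to reside in RAM; the file name is hypothetical.

```python
# Minimal sketch: memory-mapped float32 storage for term vectors.
import numpy as np

d, n_terms = 2500, 54_685
per_vector = d * 4                        # 4-byte float -> 10 kB per vector
print(f"{per_vector} bytes/vector, {n_terms * per_vector / 1e9:.2f} GB total")

store = np.memmap("term_vectors.f32", dtype=np.float32, mode="w+",
                  shape=(n_terms, d))
store[42] = np.random.default_rng(0).standard_normal(d)   # write one vector
store.flush()                                             # persist to disk
```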
The following four figures have been constructed by compiling eight independent databases of 54,685 term vectors for each of the four indicated $(m, d)$ couples, namely (5, 250), (10, 2500), (40, 250) and (50, 2500); tf-idf statistical weights were used. Typically, on a small desktop computer, the compilation time is 3 to 4 seconds for $d = 250$ and 20 to 30 seconds for $d = 2500$.

Abscissas are proportional to the logarithm of the neighbor's rank (from 1 for maison in the upper right corner to 1000 in the lower left corner) and ordinates are the similarities to maison. For a given neighbor, the horizontally aligned red dots represent the eight scalars computed from the eight databases and the thicker black dot is their average value $\sigma_{avg}$. The neighbors are arranged in non-increasing order of their $\sigma_{avg}$ with maison.

Even though the diameter $d_{maison}$ of maison as defined in section 2.2 is only 98, there are several hundred significant neighbors: while most close neighbors belong to the set $D_{maison}$, many lie at more than one degree of separation from each other (their cliques are separated by more than one vertex). Also, it can be seen, as expected, that the noise is inversely proportional to $\sqrt{d}$ but not very dependent on $m$ (in fact, the only noticeable effect of a lower $m$ is that there are more outliers), and that the resulting standard deviation remains small for $d = 2500$.
Figure 2 - Neighbors of maison (scalar vs. rank)

Figure 3 - Neighbors of maison (scalar vs. rank)
Figure 4 - Neighbors of maison (scalar vs. rank)
Figure 5 - Neighbors of maison (scalar vs. rank)
In what follows, unless otherwise noted, we'll use $m = 50$ and $d = 2500$. The size of the file containing the 54,685 term vectors is then 547,724,964 bytes, including some overhead. The number of available seed vectors $N_{seed}(2500, 50)$ given by equation 5 being astronomically large (far in excess of $10^{100}$), the risk of collision is totally negligible.

The 100 first neighbors of maison are listed by decreasing similarity in table 1. It is clear that the proximity decreases as $\sigma$ decreases, but those neighboring words are all reasonably close to maison in its various meanings.
Table 1 ‐ First 100 neighbors of maison
From 1 to 20 | From 21 to 40 | From 41 to 60 | From 61 to 80 | From 81 to 100
1 1.000 maison | 21 0.499 chez-soi | 41 0.309 ménage | 61 0.207 ermitage | 81 0.163 domestique
2 0.843 demeure | 22 0.498 cassine | 42 0.307 bâtiment | 62 0.204 appartement | 82 0.162 plaque_de_blindage
3 0.820 habitation | 23 0.491 cabane | 43 0.306 case | 63 0.198 cagna | 83 0.159 mas
4 0.810 logis | 24 0.486 gourbi | 44 0.301 taudis | 64 0.191 reposée | 84 0.155 tanière
5 0.767 domicile | 25 0.480 gîte | 45 0.300 chalet | 65 0.190 chartreuse | 85 0.154 cache
6 0.762 pénates | 26 0.477 bercail | 46 0.292 pavillon | 66 0.188 domesticité | 86 0.152 niche
7 0.677 home | 27 0.460 masure | 47 0.292 villa | 67 0.187 garde-meubles | 87 0.151 rendez-vous_de_chasse
8 0.669 mesnil | 28 0.436 asile | 48 0.287 chaumière | 68 0.186 gabionnade | 88 0.151 repaire
9 0.661 chacunière | 29 0.428 château | 49 0.278 manse | 69 0.182 lignée | 89 0.149 clinique
10 0.659 foyer | 30 0.404 immeuble | 50 0.275 hôtel_particulier | 70 0.180 caponnière | 90 0.148 gloriette
11 0.653 train_de_maison | 31 0.403 hutte | 51 0.265 isba | 71 0.172 fermette | 91 0.147 havre
12 0.603 logement | 32 0.385 maisonnette | 52 0.254 bas-lieu | 72 0.171 grand_ensemble | 92 0.147 lapinière
13 0.596 maisonnée | 33 0.382 ménil | 53 0.249 manoir | 73 0.171 lieu | 93 0.146 kiosque
14 0.593 nid | 34 0.374 abri | 54 0.244 hôtel | 74 0.171 famille | 94 0.146 intérieur
15 0.576 résidence | 35 0.369 galetas | 55 0.241 standing | 75 0.167 garde-meuble | 95 0.144 bouverie
16 0.572 toit | 36 0.368 lares | 56 0.240 H.L.M. | 76 0.167 deck-house | 96 0.144 parents
17 0.545 bicoque | 37 0.350 lare | 57 0.233 habitacle | 77 0.166 habitat | 97 0.143 tourelle
18 0.537 bâtisse | 38 0.338 clapier | 58 0.223 retraite | 78 0.166 édifice | 98 0.143 mantelet
19 0.532 cahute | 39 0.331 palais | 59 0.211 carbet | 79 0.164 firme | 99 0.142 hangar
20 0.502 baraque | 40 0.314 train_de_vie | 60 0.208 séjour | 80 0.163 tranchée-abri | 100 0.141 pigeonnier

Similarity matrices and clusterization

It is also straightforward to build a similarity matrix (see table 2) and to use such matrices to group terms by clusters, i.e. lists of terms which do not all belong to the same clique, but which are closely related semantically. We use nearest-neighbor clustering in this work.
Table 2 - Similarity matrix (lower triangular; rows and columns in the same order)

chacunière        1.000
mesnil            0.665 1.000
train_de_maison   0.657 0.659 1.000
maisonnée         0.546 0.554 0.560 1.000
demeure           0.456 0.470 0.449 0.394 1.000
habitation        0.414 0.433 0.403 0.344 0.852 1.000
maison            0.661 0.669 0.653 0.596 0.843 0.820 1.000
pénates           0.481 0.496 0.481 0.417 0.837 0.605 0.762 1.000
logement          0.257 0.278 0.250 0.223 0.796 0.724 0.603 0.616 1.000
domicile          0.457 0.464 0.462 0.394 0.780 0.670 0.767 0.620 0.588 1.000
logis             0.507 0.514 0.506 0.447 0.795 0.666 0.810 0.706 0.623 0.892 1.000
résidence         0.305 0.310 0.308 0.262 0.727 0.609 0.576 0.511 0.554 0.840 0.627 1.000
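Such a matrix, and a simple single-link (nearest-neighbor) grouping, can be sketched as follows. This assumes the hypothetical term_vecs mapping from the earlier sketch, and the 0.6 threshold is an arbitrary illustrative choice, not a value from the paper.

```python
# Minimal sketch: similarity matrix as in table 2, then single-link
# (nearest-neighbor) agglomeration of sufficiently similar terms.
import numpy as np

def similarity_matrix(terms, term_vecs):
    M = np.array([term_vecs[t] for t in terms])
    return M @ M.T                        # all pairwise scalar products

def nn_clusters(terms, S, threshold=0.6):
    clusters = [{t} for t in terms]       # start from singletons
    merged = True
    while merged:
        merged = False
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single link: merge if any cross pair is similar enough
                if any(S[terms.index(x)][terms.index(y)] > threshold
                       for x in clusters[a] for y in clusters[b]):
                    clusters[a] |= clusters.pop(b)
                    merged = True
                    break
            if merged:
                break
    return clusters
```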
In table 3, the headers are the members of the original cliques including maison, grouped in semantically homogeneous clusters, and the associated lists are terms similar, with $\sigma > 0.25$, to the center of mass of their header. Terms in blue are from the original cliques, terms in gray are repeats from a previous cluster, and the others could reasonably be aggregated to their head cluster, especially at similarities above 0.35.
Table 3 - Clusters around maison and their cohorts

chacunière, mesnil, train_de_maison, maisonnée, demeure, habitation, pénates, maison, logement, domicile, logis, résidence
abri, clapier, gîte, nid, asile, retraite, bercail, toit, foyer, habitacle
baraque, bicoque, cahute, cabane, hutte, gourbi, masure, case, cassine, chaumière, maisonnette
bas-lieu, naissance, origine, descendance, famille, lignée, race, parents, chez-soi, home, intérieur, ménil, lare, lares, ménage, standing, train_de_vie
appartement, bouge, taudis, galetas, chalet, pavillon, villa, château, manoir, palais, réduit
building, édifice, bâtiment, immeuble, construction, bâtisse, hôtel, campagne, propriété, ferme
boîte, entreprise, firme, établissement, prison, commerce, temple, institut, institution, branche, couvert, domesticité, serviteur, domestique, gens, monde, suite
clinique, hôpital, nom, couronne, trône, pigeonnier, lieu, place, séjour, feu
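A cohort such as those of table 3 can be gathered as sketched below, again assuming the hypothetical term_vecs mapping; the 0.25 threshold is the one quoted above.

```python
# Minimal sketch: terms whose similarity to the normalized center of mass
# of a cluster header exceeds the threshold, sorted by decreasing similarity.
import numpy as np

def cohort(header_terms, term_vecs, threshold=0.25):
    center = sum(term_vecs[t] for t in header_terms)
    center /= np.linalg.norm(center)           # center of mass, normalized
    return sorted((t for t, v in term_vecs.items()
                   if v @ center > threshold and t not in header_terms),
                  key=lambda t: -(term_vecs[t] @ center))
```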
Things get more complicated when two homonyms are semantically disjoint, as is the case with le barde and la barde:

Table 4 - First 100 neighbors of barde

From 1 to 20 | From 21 to 40 | From 41 to 60 | From 61 to 80 | From 81 to 100
1 1.000 barde | 21 0.298 versificateur | 41 0.167 croque-notes | 61 0.097 victimaire | 81 0.080 injurié
2 0.839 aède | 22 0.297 mâche-laurier | 42 0.157 choriste | 62 0.096 flamine | 82 0.079 septemvir
3 0.717 tranche_de_lard | 23 0.290 héros_du_Pinde | 43 0.148 harnais | 63 0.096 prestolet | 83 0.079 lama
4 0.625 chantre | 24 0.290 favori_des_Muses | 44 0.147 prêtre | 64 0.094 iman | 84 0.078 salien
5 0.533 poète | 25 0.289 amant_du_Parnasse | 45 0.146 cigale | 65 0.094 mufti | 85 0.078 brachyne
6 0.521 chanteur | 26 0.286 favori_du_Parnasse | 46 0.124 coryphée | 66 0.094 ratichon | 86 0.077 ménestrier
7 0.503 rhapsode | 27 0.285 nourrisson_du_Parnasse | 47 0.119 trouveur | 67 0.093 ovate | 87 0.076 curé
8 0.493 bardit | 28 0.285 poétereau | 48 0.114 corybante | 68 0.093 utopiste | 88 0.076 talapoin
9 0.468 trouvère | 29 0.284 métromane | 49 0.110 luperque | 69 0.091 ministre_du_culte | 89 0.075 épulon
10 0.450 scalde | 30 0.284 citharède | 50 0.110 muezzin | 70 0.090 pope | 90 0.074 mettre_dans_le_même_sac
11 0.427 troubadour | 31 0.282 crooner | 51 0.109 druide | 71 0.090 mystagogue | 91 0.074 immodérément
12 0.421 minnesinger | 32 0.276 félibre | 52 0.105 eubage | 72 0.090 cantatrice | 92 0.074 sacrificateur
13 0.328 nourrisson_du_Pinde | 33 0.274 duettiste | 53 0.105 parolier | 73 0.089 archiprêtre | 93 0.074 bombardier
14 0.323 amant_des_Muses | 34 0.267 rapsode | 54 0.103 quindecemvir | 74 0.086 sous-ventrière | 94 0.073 lévite
15 0.315 lamelle | 35 0.256 ménestrel | 55 0.103 mollah | 75 0.085 abbé | 95 0.073 englober
16 0.313 favori_d'Apollon | 36 0.240 rimeur | 56 0.103 padre | 76 0.084 curète | 96 0.072 chiennerie
17 0.313 fils_d'Apollon | 37 0.233 rimailleur | 57 0.102 hiérogrammate | 77 0.084 papas | 97 0.072 rabbin
18 0.308 nourrisson_des_Muses | 38 0.216 panne | 58 0.100 saronide | 78 0.082 avarice | 98 0.072 passivité
19 0.302 enfant_d'Apollon | 39 0.207 choreute | 59 0.097 chansonnier | 79 0.081 quindécemvir | 99 0.072 capelan
20 0.300 maître_du_Pinde | 40 0.176 vocaliste | 60 0.097 chapelain | 80 0.080 officiant | 100 0.072 eschatologique
If we meant barde as aède, the third neighbor, tranche_de_lard, is clearly not appropriate, and conversely. However, in a Euclidean space, the Schmidt orthogonalization procedure does remove this kind of interference. Since term vectors are normalized to unity, one needs simply to subtract from the vector $|barde\rangle$ the collinear component of the vector $|tranche\_de\_lard\rangle$:

$|barde\rangle_{\perp tranche\_de\_lard} = |barde\rangle - \langle barde \,|\, tranche\_de\_lard \rangle \times |tranche\_de\_lard\rangle$    (8)

with the following result, where the perturbation due to tranche_de_lard is totally eliminated:

Table 5 - First 100 neighbors of barde orthogonalized w.r.t. tranche_de_lard
From 1 to 20 | From 21 to 40 | From 41 to 60 | From 61 to 80 | From 81 to 100
1 0.744 aède | 21 0.396 poétereau | 41 0.217 croque-notes | 61 0.162 harnais | 81 0.130 lama
2 0.697 barde | 22 0.396 maître_du_Pinde | 42 0.215 lamelle | 62 0.161 prestolet | 82 0.130 papas
3 0.634 chantre | 23 0.394 mâche-laurier | 43 0.205 coryphée | 63 0.161 curète | 83 0.126 soliste
4 0.634 scalde | 24 0.393 favori_des_Muses | 44 0.188 cantatrice | 64 0.157 ministre_du_culte | 84 0.124 talapoin
5 0.629 poète | 25 0.389 nourrisson_du_Parnasse | 45 0.185 quindecemvir | 65 0.157 chapelain | 85 0.121 directeur_de_conscience
6 0.622 chanteur | 26 0.389 amant_du_Parnasse | 46 0.183 mystagogue | 66 0.157 eubage | 86 0.120 utopiste
7 0.582 minnesinger | 27 0.387 héros_du_Pinde | 47 0.178 luperque | 67 0.156 mufti | 87 0.120 prêtraille
8 0.548 trouvère | 28 0.380 félibre | 48 0.177 padre | 68 0.155 saronide | 88 0.115 ceinture_de_sécurité
9 0.525 rhapsode | 29 0.380 rapsode | 49 0.177 mollah | 69 0.153 trouveur | 89 0.113 curé
10 0.523 troubadour | 30 0.364 versificateur | 50 0.175 muezzin | 70 0.152 ovate | 90 0.113 sacrificateur
11 0.429 nourrisson_du_Pinde | 31 0.337 ménestrel | 51 0.173 hiérogrammate | 71 0.144 parolier | 91 0.109 rabbin
12 0.415 crooner | 32 0.333 métromane | 52 0.172 ratichon | 72 0.140 chansonnier | 92 0.106 salien
13 0.414 amant_des_Muses | 33 0.319 choreute | 53 0.172 quindécemvir | 73 0.139 hiérophante | 93 0.105 aumônier
14 0.414 nourrisson_des_Muses | 34 0.292 rimeur | 54 0.171 corybante | 74 0.139 pope | 94 0.105 sous-ventrière
15 0.409 favori_d'Apollon | 35 0.287 rimailleur | 55 0.170 archiprêtre | 75 0.137 diva | 95 0.105 capelan
16 0.408 citharède | 36 0.282 bardit | 56 0.169 druide | 76 0.137 épulon | 96 0.105 affublement
17 0.405 duettiste | 37 0.242 choriste | 57 0.168 victimaire | 77 0.135 abbé | 97 0.104 virtuose
18 0.404 favori_du_Parnasse | 38 0.231 vocaliste | 58 0.168 cigale | 78 0.134 septemvir | 98 0.104 ménestrier
19 0.400 fils_d'Apollon | 39 0.224 panne | 59 0.165 flamine | 79 0.133 musicien | 99 0.102 exécutant
20 0.397 enfant_d'Apollon | 40 0.218 prêtre | 60 0.165 iman | 80 0.131 officiant | 100 0.096 ténor
The number of terms which can be subtracted is only limited by the noise. The corresponding clusters associated with $|barde\rangle_{\perp tranche\_de\_lard}$ now are:

Table 6 - Clusters around barde and their cohorts (orthogonalized w.r.t. tranche_de_lard)

aède, barde, poète, chanteur, chantre
bardit, harnais, prêtre, lamelle, panne, rhapsode, troubadour, trouvère

to be compared to the non-orthogonalized result:

Table 7 - Clusters around barde and their cohorts

aède, barde, chanteur, chantre, poète
bardit, tranche_de_lard, lamelle, rhapsode, troubadour, trouvère
harnais, panne, prêtre
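Equation 8 amounts to a single Gram-Schmidt step; applied sequentially it generalizes to several subtracted terms (modified Gram-Schmidt). A minimal sketch, assuming the hypothetical term_vecs mapping used in the earlier sketches:

```python
# Minimal sketch of equation 8: remove the components collinear with one or
# more subtracted terms, then renormalize to unity.
import numpy as np

def orthogonalize(target, *subtracted, term_vecs):
    v = term_vecs[target].copy()
    for s in subtracted:                 # the number of subtractable terms
        u = term_vecs[s]                 # is only limited by the noise
        v -= (v @ u) * u                 # remove the collinear component
    return v / np.linalg.norm(v)

barde_perp = orthogonalize("barde", "tranche_de_lard", term_vecs=term_vecs)
```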
Conclusion and future work