Categorical Vector Space Semantics for Lambek Calculus with a Relevant Modality (Extended Abstract)
Lachlan McPheat, Mehrnoosh Sadrzadeh, Hadi Wazni, Gijs Wijnholds
David I. Spivak and Jamie Vicary (Eds.): Applied Category Theory 2020 (ACT2020), EPTCS 333, 2021, pp. 168–182, doi:10.4204/EPTCS.333.12. © L. McPheat, M. Sadrzadeh, H. Wazni & G. Wijnholds. This work is licensed under the Creative Commons Attribution License.
Lachlan McPheat, Mehrnoosh Sadrzadeh
University College London, London, UK
{m.sadrzadeh,l.mcpheat}@ucl.ac.uk

Hadi Wazni
Queen Mary University of London, London, UK
[email protected]

Gijs Wijnholds
Utrecht University, Utrecht, NL
[email protected]
We develop a categorical compositional distributional semantics for Lambek Calculus with a Relevant Modality, !L∗, which has a limited version of the contraction and permutation rules. The categorical part of the semantics is a monoidal biclosed category with a coalgebra modality, as defined on Differential Categories. We instantiate this category to finite dimensional vector spaces and linear maps via "quantisation" functors and work with three concrete interpretations of the coalgebra modality. We apply the model to construct categorical and concrete semantic interpretations for the motivating example of !L∗: the derivation of a phrase with a parasitic gap. The effectiveness of the concrete interpretations is evaluated via a disambiguation task, on an extension of a sentence disambiguation dataset to parasitic gap phrases, using BERT, Word2Vec, and FastText vectors and Relational tensors.

Introduction

Distributional Semantics of natural language are semantics which model the
Distributional Hypothesis, due to Firth [11] and Harris [18], which assumes that a word is characterized by the company it keeps. Research in Natural Language Processing (NLP) has turned to Vector Space Models (VSMs) of natural language to accurately model the distributional hypothesis. Such models date as far back as Rubenstein and Goodenough's co-occurrence matrices [35] in 1965, and extend to today's neural machine learning methods, leading to embeddings such as Word2Vec [40], GloVe [32], FastText [6] and BERT [10], to name a few. VSMs were used even earlier by Salton [38] for information retrieval. These models have plenty of applications, for instance thesaurus extraction [9, 17], automated essay marking [23] and semantically guided information retrieval [24]. However, they lack grammatical compositionality, making it difficult to sensibly reason about the semantics of portions of language larger than words, such as phrases and sentences.

Somewhat orthogonally, Type Logical Grammars (TLGs) form highly compositional models of language by accurately modelling grammar; however, they lack distributionality, in that such models do not describe the distributional semantics of a word, only its grammatical role. Distributional Compositional Categorical Semantics (DisCoCat) [8] combines these two approaches using category-theoretic methods originally developed to model quantum protocols. DisCoCat has proven its efficacy empirically [15, 16, 37, 43, 21, 27] and has the added utility of being a modular framework, open to different choices of type logic; one such logic is Lambek calculus, denoted by L. The work in [20] extends Lambek calculus with a relevant modality, and denotes the resulting logic by !L∗. As an example application domain, they use the new logic to formalise the grammatical structure of the parasitic gap phenomena in natural language. In this paper, we first form a sound categorical semantics of !L∗, which we call C(!L∗).
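Before moving on, a toy illustration of the distributional hypothesis mentioned above (the context words and co-occurrence counts are invented for illustration): two words count as similar when their co-occurrence vectors point in similar directions.

```python
import numpy as np

# Invented co-occurrence counts with context words (drink, road, engine):
vectors = {
    "car":   np.array([0.0, 7.0, 9.0]),
    "truck": np.array([1.0, 6.0, 8.0]),
    "tea":   np.array([9.0, 0.0, 1.0]),
}

def cosine(u, v):
    """Cosine of the angle between two co-occurrence vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# "car" keeps the same company as "truck", not "tea":
assert cosine(vectors["car"], vectors["truck"]) > cosine(vectors["car"], vectors["tea"])
```

Note that this gives word-level similarity only; nothing in the construction says how to compose the vectors of "car" and "drives" into a phrase vector, which is the gap DisCoCat fills.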
Forming this semantics boils down to interpreting the logical contraction of !L∗ using comonads known as coalgebra modalities, defined in [4]. In order to facilitate the categorical computations, we use the clasp-string calculus of [2], developed for depicting the computations of a monoidal biclosed category. To this monoidal diagrammatic calculus, we add the necessary new constructions for the coalgebra modality and its operations. Next, we define three candidate coalgebra modalities on the category of finite dimensional real vector spaces, in order to form a sound VSM of !L∗ in terms of structure-preserving functors C(!L∗) → FdVect_R. We also briefly introduce a prospective diagrammatic semantics of C(!L∗) to help visualise our derivations. We conclude with an experiment testing the accuracy of the different coalgebra modalities on FdVect_R. The experiment is performed using different neural word embeddings, on a disambiguation task over an extension of the dataset of [16] from transitive sentences to phrases with parasitic gaps. This paper is an extended abstract of the full arXiv paper [25].

!L∗: Lambek Calculus with a Relevant Modality

Following [20], we assume that the formulae, or types, of Lambek calculus with a Relevant Modality !L∗ are generated from a set of atomic types At, a unary connective !, and three binary connectives \, / and ',' via the following Backus-Naur Form (BNF):

ϕ ::= A ∈ At | ∅ | (ϕ, ϕ) | (ϕ / ϕ) | (ϕ \ ϕ) | !ϕ

We refer to the types of !L∗ by Typ_{!L∗}; here, ∅ denotes the empty type. An element of Typ_{!L∗} is either atomic, a modal type, or two types joined by a comma or a slash. We use uppercase Roman letters to denote arbitrary types of !L∗, and uppercase Greek letters to denote sequences of types, for example Γ = A₁, A₂, ..., Aₙ. It is assumed that ',' is associative, allowing us to omit brackets in expressions like A₁, A₂, ..., Aₙ. A sequent of !
L∗ is a pair of an ordered sequence of types and a type, denoted Γ ⊢ A. The derivations of !L∗ are generated by the set of axioms and rules presented in Table 1. The logic !L∗ extends Lambek Calculus L by endowing it with a modality, denoted !, inspired by the ! modality of Linear Logic, to enable the structural rule of contraction in a controlled way. Here, however, ! is introduced over a non-symmetric monoidal structure, with extra structure allowing the !-ed types to commute over other types. What !L∗ adds to L is thus the (!L) and (!R) rules, the (perm) rules, and the (contr) rule. There is a parallel pregroup syntax which gives the same semantics, as discussed in [5].

(ax)    A ⊢ A
(/L)    from Γ ⊢ A and Δ₁, B, Δ₂ ⊢ C, derive Δ₁, B/A, Γ, Δ₂ ⊢ C
(/R)    from Γ, A ⊢ B, derive Γ ⊢ B/A
(\L)    from Γ ⊢ A and Δ₁, B, Δ₂ ⊢ C, derive Δ₁, Γ, A\B, Δ₂ ⊢ C
(\R)    from A, Γ ⊢ B, derive Γ ⊢ A\B
(!L)    from Γ₁, A, Γ₂ ⊢ C, derive Γ₁, !A, Γ₂ ⊢ C
(!R)    from !A₁, ..., !Aₙ ⊢ B, derive !A₁, ..., !Aₙ ⊢ !B
(perm)  from Δ₁, !A, Γ, Δ₂ ⊢ C, derive Δ₁, Γ, !A, Δ₂ ⊢ C
(perm)  from Δ₁, Γ, !A, Δ₂ ⊢ C, derive Δ₁, !A, Γ, Δ₂ ⊢ C
(contr) from Δ₁, !A, !A, Δ₂ ⊢ C, derive Δ₁, !A, Δ₂ ⊢ C

Table 1: Rules of !L∗.

Categorical Semantics for !L∗

We associate !L∗ with a category C(!L∗), with Typ_{!L∗} as objects and derivable sequents of !L∗ as morphisms, whose domains are the formulae on the left of the turnstile and whose codomains are the formulae on the right. The category C(!L∗) is monoidal biclosed. The connectives ',' and \, / of !L∗ are associated with the monoidal structure on C(!L∗): ',' is the monoidal product, with the empty type as its unit, and \, / are associated with the two internal hom functors with respect to ',', as presented in [39]. The connective ! of !
L∗ is a coalgebra modality, as defined for Differential Categories in [4], with the difference that our underlying category is not necessarily symmetric monoidal; instead, we ask for a restricted symmetry with regards to !, and that ! be a lax monoidal functor. In Differential Categories, ! does not necessarily have a monoidal property, i.e. it need not be a strict, lax, or strong monoidal functor, but there are examples of Differential Categories where strong monoidality holds. We elaborate on these notions via the following definition.
Definition 1.
The category C(!L∗) has the types of !L∗, i.e. the elements of Typ_{!L∗}, as objects, and the derivable sequents of !L∗ as morphisms, together with the following structures:

• A monoidal product ⊗ : C(!L∗) × C(!L∗) → C(!L∗), with a unit I.

• Internal hom-functors ⇒ : C(!L∗)^op × C(!L∗) → C(!L∗) and ⇐ : C(!L∗) × C(!L∗)^op → C(!L∗) such that:

  i. For objects A, B ∈ C(!L∗), we have objects (A ⇒ B), (A ⇐ B) ∈ C(!L∗) and a pair of morphisms, called right and left evaluation, given below:

     ev^r_{A,B} : A ⊗ (A ⇒ B) → B,    ev^l_{A,B} : (A ⇐ B) ⊗ B → A

  ii. For morphisms f : A ⊗ C → B and g : C ⊗ B → A, we have unique right and left curried morphisms, given below:

     Λ^l(f) : C → (A ⇒ B),    Λ^r(g) : C → (A ⇐ B)

  iii. The following hold:

     ev^r_{A,B} ∘ (id_A ⊗ Λ^l(f)) = f,    ev^l_{A,B} ∘ (Λ^r(g) ⊗ id_B) = g

(We follow the convention that products are not symmetric unless stated; hence a monoidal product is not symmetric unless referred to as 'symmetric monoidal'.)

• A coalgebra modality ! on C(!L∗), that is, a lax monoidal comonad (!, δ, ε) such that for every object A ∈ C(!L∗), the object !A has a comonoid structure (!A, Δ_A, e_A) in C(!L∗), where the comultiplication Δ_A : !A → !A ⊗ !A and the counit e_A : !A → I satisfy the usual comonoid equations. Further, we require δ_A : !A → !!A to be a morphism of comonoids [4].

• Restricted symmetry over the coalgebra modality, that is, natural isomorphisms σ^r : 1_{C(!L∗)} ⊗ ! → ! ⊗ 1_{C(!L∗)} and σ^l : ! ⊗ 1_{C(!L∗)} → 1_{C(!L∗)} ⊗ !, with components

  σ^r_{A,B} : A ⊗ !B → !B ⊗ A,    σ^l_{A,B} : !A ⊗ B → B ⊗ !A.

We now define a categorical semantics for !L∗ as a map ⟦−⟧ : !L∗ → C(!L∗) and prove that it is sound.

Definition 2.
The semantics of the formulae and sequents of !L∗ is the image of the interpretation map ⟦−⟧ : !L∗ → C(!L∗). To elements ϕ of Typ_{!L∗}, this map assigns objects C_ϕ of C(!L∗), as defined below:

⟦∅⟧ := C_∅ = I        ⟦ϕ⟧ := C_ϕ                  ⟦(ϕ₁, ϕ₂)⟧ := C_{ϕ₁} ⊗ C_{ϕ₂}
⟦!ϕ⟧ := !C_ϕ          ⟦(ϕ₁ / ϕ₂)⟧ := (C_{ϕ₁} ⇐ C_{ϕ₂})    ⟦(ϕ₁ \ ϕ₂)⟧ := (C_{ϕ₁} ⇒ C_{ϕ₂})

To the sequents Γ ⊢ A of !L∗, for Γ = A₁, A₂, ..., Aₙ with Aᵢ, A ∈ Typ_{!L∗}, it assigns morphisms of C(!L∗) as follows:

⟦Γ ⊢ A⟧ := C_Γ → C_A,    for C_Γ = ⟦A₁⟧ ⊗ ⟦A₂⟧ ⊗ ··· ⊗ ⟦Aₙ⟧.

Since sequents are not labelled, we have no obvious name for the linear map ⟦Γ ⊢ A⟧, so we label such morphisms by lowercase Roman letters as needed.

Definition 3. A categorical model for !L∗, or a !L∗-model, is a pair (C, ⟦−⟧_C), where C is a monoidal biclosed category with a coalgebra modality and restricted symmetry, and ⟦−⟧_C is a mapping Typ_{!L∗} → C factoring through ⟦−⟧ : Typ_{!L∗} → C(!L∗).

Definition 4.
A sequent Γ ⊢ A of !L∗ is sound in (C(!L∗), ⟦−⟧) iff C_Γ → C_A is a morphism of C(!L∗). A rule with premise Γ ⊢ A and conclusion Δ ⊢ B is sound in (C(!L∗), ⟦−⟧) iff whenever C_Γ → C_A is sound, then so is C_Δ → C_B. We say !L∗ is sound with regards to (C(!L∗), ⟦−⟧) iff all of its rules are.

Theorem 1. !L∗ is sound with regards to (C(!L∗), ⟦−⟧).

Proof. See the full paper [25].

Vector Space Semantics of C(!L∗)

Following [5], we develop vector space semantics for !L∗ via a quantisation functor F : C(!L∗) → FdVect_R to the category of finite dimensional vector spaces and linear maps. This functor interprets objects as finite dimensional vector spaces, and derivations as linear maps. Quantisation is a term first introduced by Atiyah in Topological Quantum Field Theory, for a functor from the category of manifolds and cobordisms to the category of vector spaces and linear maps. Since the cobordism category is monoidal, quantisation was later generalised to refer to a functor that 'quantises' any monoidal category in FdVect_R. (Strictly speaking, this applies to symmetric monoidal categories; however, we may abuse notation without worry, as we have symmetry in the image of ! coming from the restricted symmetries σ^l, σ^r.) Since C(!L∗) is free, there is a unique functor C(!L∗) → (FdVect_R, !) for any choice of ! such that (FdVect_R, !) is a !L∗-model. In Definition 5 we introduce the necessary nomenclature to define quantisations in full.

Definition 5. A quantisation is a functor F : C(!L∗) → (FdVect_R, !), defined on the objects of C(!L∗) using the structure of the formulae of !L∗, as follows:

F(C_∅) := R        F(C_ϕ) := V_ϕ        F(C_{ϕ₁ ⊗ ϕ₂}) := V_{ϕ₁} ⊗ V_{ϕ₂}        F(C_{!ϕ}) := !
V_ϕ        F(C_{ϕ₁/ϕ₂}) := (V_{ϕ₁} ⇐ V_{ϕ₂})        F(C_{ϕ₁\ϕ₂}) := (V_{ϕ₁} ⇒ V_{ϕ₂})

Here, V_ϕ is the vector space in which vectors of words with an atomic type live; the other vector spaces are obtained from it by induction on the structure of the formulae they correspond to. Morphisms of C(!L∗) are of the form C_Γ → C_A, associated with sequents Γ ⊢ A of !L∗, for Γ = A₁, A₂, ..., Aₙ. The quantisation functor is defined on these morphisms as follows:

F(C_Γ → C_A) := F(C_Γ) → F(C_A) = V_{A₁} ⊗ V_{A₂} ⊗ ··· ⊗ V_{Aₙ} → V_A

Note that the monoidal product in
FdVect_R is symmetric, so there is formally no need to distinguish between (⟦A⟧ ⇒ ⟦B⟧) and (⟦B⟧ ⇐ ⟦A⟧). However, it may be practical to do so when calculating things by hand, for example when retracing derivations in the semantics. We should also make clear that the freeness of C(!L∗) makes F a strict monoidal closed functor, meaning that F(C_A ⊗ C_B) = F(C_A) ⊗ F(C_B), or rather V_{(A⊗B)} = (V_A ⊗ V_B), and similarly V_{(A⇒B)} = (V_A ⇒ V_B), etc. Further, since we are working with finite dimensional vector spaces, we know that V_ϕ^⊥ ≅ V_ϕ; thus our internal homs have an even simpler structure, which we exploit when computing: V_{ϕ₁} ⇒ V_{ϕ₂} ≅ V_{ϕ₁} ⊗ V_{ϕ₂}.

Coalgebra Modalities on FdVect_R

In this section we present three different coalgebra modalities on
FdVect_R, defined over two different underlying comonads, treated in individual subsections. Defining these modalities lets us reason about sound vector space semantics of C(!L∗) in terms of !-preserving monoidal biclosed functors C(!L∗) → FdVect_R.

We point out that we do not aim for a complete model, in that we do not require the tensor of our vector space semantics to be non-symmetric. This is common practice in the DisCoCat line of research, and also in the standard set-theoretic semantics of Lambek calculus [41]. Consider the English sentence "John likes Mary" and the Farsi sentence "John Mary-ra doost-darad (likes)". These two sentences have the same semantics but different word orders, thus exemplifying the lack of syntax within semantics.

! as the Dual of a Free Algebra Functor

Following [34], we interpret ! using the Fermionic Fock space functor F : FdVect_R → Alg_R. In order to define F, we first introduce the simpler free algebra construction, typically studied in the theory of representations of Lie algebras [19]. It turns out that F is itself a free functor, giving us a comonad structure on UF upon dualising [34]. The choice of the symbol F comes from "Fermionic Fock space" (as opposed to "Bosonic"); the construction is also known as the exterior algebra functor, or the Grassmannian algebra functor [19].

Definition 6.
The free algebra functor T : Vect_R → Alg_R is defined on objects as

V ↦ ⊕_{n≥0} V^{⊗n} = R ⊕ V ⊕ (V ⊗ V) ⊕ (V ⊗ V ⊗ V) ⊕ ···

and for morphisms f : V → W we get the algebra homomorphism T(f) : T(V) → T(W), defined layer-wise as T(f)(v₁ ⊗ v₂ ⊗ ··· ⊗ vₙ) := f(v₁) ⊗ f(v₂) ⊗ ··· ⊗ f(vₙ).

T is free in the sense that it is left adjoint to the forgetful functor U : Alg_R → Vect_R, thus giving us a monad UT on Vect_R with a monoidal algebra modality structure, i.e. the dual of what we are looking for. Note, however, that even when restricting T to finite dimensional vector spaces V ∈ FdVect_R, the resulting UT(V) and UT(V^⊥)^⊥ are infinite dimensional. The necessity of working in FdVect_R motivates us to use F, defined below, rather than T.

Definition 7.
The
Fermionic Fock space functor F : Vect_R → Alg_R is defined on objects as

V ↦ ⊕_{n≥0} V^{∧n} = R ⊕ V ⊕ (V ∧ V) ⊕ (V ∧ V ∧ V) ⊕ ···

where V^{∧n} is the coequaliser of the family of maps (−τ_σ)_{σ ∈ Sₙ}, with −τ_σ : V^{⊗n} → V^{⊗n} given as follows:

(−τ_σ)(v₁ ⊗ ··· ⊗ vₙ) := sgn(σ)(v_{σ(1)} ⊗ v_{σ(2)} ⊗ ··· ⊗ v_{σ(n)}).

Applied to linear maps, F gives an algebra homomorphism analogous to that of Definition 6.

Concretely, one may define V^{∧n} to be the n-fold tensor product of V quotiented by the layer-wise equivalence relations v₁ ⊗ v₂ ⊗ ··· ⊗ vₙ ∼ sgn(σ)(v_{σ(1)} ⊗ v_{σ(2)} ⊗ ··· ⊗ v_{σ(n)}) for n = 1, 2, ..., denoting the equivalence class of a vector v₁ ⊗ v₂ ⊗ ··· ⊗ vₙ by v₁ ∧ v₂ ∧ ··· ∧ vₙ.

Note that simple tensors in V^{∧n} with repeated vectors are zero. That is, if vᵢ = vⱼ for some 1 ≤ i, j ≤ n with i ≠ j, the permutation (i j) ∈ Sₙ has odd sign, and so v₁ ∧ v₂ ∧ ··· ∧ vₙ =
0, since v₁ ∧ v₂ ∧ ··· ∧ vₙ = sgn(i j)(v₁ ∧ v₂ ∧ ··· ∧ vₙ) = −(v₁ ∧ v₂ ∧ ··· ∧ vₙ).

Remark 1.
Given a finite dimensional vector space V, the antisymmetric algebra F(V) is also finite dimensional. This follows immediately from the note in Definition 7, as basis vectors in layers of F(V) above the dim(V)-th are forced to repeat entries.

Remark 1 shows that restricting F to finite dimensional vector spaces turns UF into an endofunctor on FdVect_R. We note that F is the free antisymmetric algebra functor [34], and conclude that UF is a monad (UF, μ, η) on FdVect_R.

Given F, there are two ways to obtain a comonoid structure (UF(V), Δ_V, e_V), and thus to define a coalgebra modality (UF, δ, ε) on FdVect_R, as desired. One is referred to as the Cogebra construction and is given below: for a basis {e_i}_i of V, and thus a basis {1, e_i, e_{i₁} ∧ e_{i₂}, e_{i₁} ∧ e_{i₂} ∧ e_{i₃}, ...} of UF(V), each basis vector b is copied as

Δ(b) = b ⊗ b.

The map e_V : UF(V) → V is given by projection onto the first layer, that is, (1, e_i, e_{i₁} ∧ e_{i₂}, ...) ↦ e_i.

Another coalgebra modality arises from dualising the monad UF and the monoid structure on F(V), or strictly speaking on UF(V). Following [7], we dualise UF to define a comonad structure as follows. We take the comonad comultiplication to be δ_V := μ^⊥_V : UF(V)^⊥ → (UF UF(V))^⊥, and the comonad counit to be ε_V := η^⊥_V : UF(V)^⊥ → V^⊥. To avoid working with dual spaces, one may choose to formally consider !(V) := UF(V^⊥)^⊥, as in [34], since UF(V^⊥)^⊥ ≅ UF(V). (Although this is not strictly necessary, we choose this notation to stay close to its original usage [34, 7].) One may also wish to think of F as having codomain Aalg_R, the category of antisymmetric R-algebras, which is itself a subcategory Aalg_R ↪ Alg_R.
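To make the Fermionic Fock space concrete, the following numpy sketch (ours, not taken from the paper's implementation) builds the antisymmetriser projecting V^{⊗n} onto the exterior power V^{∧n}, and checks two facts used above: v ∧ v = 0, and dim F(V) = 2^{dim V}.

```python
import itertools
import math
import numpy as np

def perm_sign(sigma):
    """Sign of a permutation, computed by counting inversions."""
    s = 1
    for i in range(len(sigma)):
        for j in range(i + 1, len(sigma)):
            if sigma[i] > sigma[j]:
                s = -s
    return s

def antisymmetriser(d, n):
    """Projector from V^{⊗n} onto V^{∧n}, for dim V = d: (1/n!) Σ_σ sgn(σ) τ_σ."""
    D = d ** n
    P = np.zeros((D, D))
    for sigma in itertools.permutations(range(n)):
        M = np.zeros((D, D))
        for idx in itertools.product(range(d), repeat=n):
            src = int(np.ravel_multi_index(idx, (d,) * n))
            dst = int(np.ravel_multi_index(tuple(idx[k] for k in sigma), (d,) * n))
            M[dst, src] = 1.0
        P += perm_sign(sigma) * M
    return P / math.factorial(n)

d = 3
v = np.array([1.0, 2.0, 3.0])
P2 = antisymmetriser(d, 2)

# v ∧ v = 0: the antisymmetriser kills the symmetric tensor v ⊗ v.
assert np.allclose(P2 @ np.kron(v, v), 0.0)

# dim V^{∧n} = trace of the projector = C(d, n); layers above the d-th vanish.
dims = [round(np.trace(antisymmetriser(d, n))) for n in range(1, d + 1)]
assert 1 + sum(dims) == 2 ** d   # scalars + 3 + 3 + 1 = 8 = 2^3
```

The trace of each projector recovers the binomial dimensions of the layers, so the whole algebra F(V) has dimension 2^{dim V}, matching Remark 1.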
Note that dualising in this manner only makes sense for finite dimensional vector spaces, as in general, for an arbitrary family of vector spaces (V_i)_{i∈I}, we have (⊕_{i∈I} V_i)^⊥ ≅ ∏_{i∈I} (V_i^⊥). Finite dimensionality of a vector space V makes the direct sum in UF(V) finite, turning the right-hand product into a direct sum; i.e. for a finite index set I we have (⊕_{i∈I} V_i)^⊥ ≅ ⊕_{i∈I} (V_i^⊥), meaning that we indeed have UF(V)^⊥ ≅ UF(V^⊥). This lets us dualise the monoid structure of UF(V), giving a comonoid structure on UF(V)^⊥ ≅ UF(V), hence making UF into a coalgebra modality. To compute the comultiplication, it suffices to transpose the matrix of the multiplication on UF(V). However, this is in general intractable: for V an n-dimensional space, UF(V) has 2^n dimensions, and its multiplication is a (2^n)² × 2^n matrix. We leave working with a dualised comultiplication to another paper, but in the next subsection we use this construction to obtain a richer copying than the Cogebra one mentioned above.

! as the Identity Functor

The above Cogebra construction can be simplified when one works with free vector spaces; for details we refer to the full version of the paper [25]. The simplified version resembles half of a bialgebra over
FdVect_R, known as a Special Frobenius bialgebra, which was used in [36, 28, 26] to model relative pronouns in English and Dutch. As argued in [42], however, the copying map resulting from this comonoid structure only copies the basis vectors and does not seem adequate as a full copying operation. In fact, a quick computation shows that this Δ in a sense only half-copies the input vector. To see this, consider a vector v = Σ_i C_i s_i, for s_i ∈ S. Extending the comultiplication Δ linearly provides us with

Δ(v) = Σ_i C_i Δ(s_i) = Σ_i C_i (s_i ⊗ s_i).

In the second copy we have lost the C_i weights; informally, it is as if the second copy were replaced by a vector of 1's, denoted 1⃗. This Δ is just one of a family of copying maps, parametrised by reals: for any k ∈ R we may define a Cofree-inspired comonoid (V_ϕ, Δ_k, e) over a vector space V_ϕ with a basis (v_i)_i, as:

Δ_k : V_ϕ → V_ϕ ⊗ V_ϕ :: v ↦ (v ⊗ k⃗) + (k⃗ ⊗ v),    e : V_ϕ → R :: Σ_i C_i v_i ↦ Σ_i C_i

Here, v is as before and k⃗ stands for the vector of V_ϕ each of whose entries is k. In the simplest case, when k =
1, we obtain two copies of the weights of v, spread over its basis vectors, as the following calculation demonstrates. Consider a two dimensional vector space and the vector v = a e₁ + b e₂ in it; the 1⃗ vector is e₁ + e₂. Applying Δ₁ to v results in the matrix

2a e₁⊗e₁ + (a+b) e₁⊗e₂ + (a+b) e₂⊗e₁ + 2b e₂⊗e₂,

where we have two copies of the weights on the diagonal, and the off-diagonal entries mix the weights of the basis vectors.

This construction is inspired by the graded algebra construction on vector spaces, whose dual construction is referred to as a Cofree Coalgebra. The Cofree-inspired coalgebra over a vector space defines a coalgebra modality structure on the identity comonad on
FdVect_R, which provides another !L∗-model, or rather, another quantisation C(!L∗) → FdVect_R.

Diagrammatic Semantics of C(!L∗)

In order to show the semantic computations for the parasitic gap, we introduce a diagrammatic semantics. The derivation of the parasitic gap phrase is involved, and its categorical and vector space interpretations require close inspection to read. The diagrammatic notation makes it easier to visualise the steps of the derivation and the final semantic form. In what follows, we first introduce notation for the clasp diagrams, then extend them with the extra prospective notation necessary to model the ! coalgebra modality. The basic structure of the C(!L∗) category, i.e. its objects, morphisms, monoidal product and internal homs, is depicted as in [2]. To these, we add the necessary diagrams for the coalgebra modality, that is, the coalgebra comultiplication (copying) Δ, the counit of the comonad ε, and the comonad comultiplication δ, found in Figure 1.

The motivating example of [20] was the parasitic gap phrase "the paper that John signed without reading", with the following lexicon:

{(the, NP/N), (paper, N), (that, (N\N)/(S/!NP)), (John, NP), (signed, (NP\S)/NP), (without, ((NP\S)\(NP\S))/NP), (reading, NP/NP)}

The !L∗ derivation of "the paper that John signed without reading" is in the full version of the paper [25]. The categorical semantics of this derivation is the following linear map:

(⟦NP⟧ ⇐ ⟦N⟧) ⊗ ⟦N⟧ ⊗ ((⟦N⟧ ⇒ ⟦N⟧) ⇐ (⟦S⟧ ⇐ ⟦!
NP⟧)) ⊗ ⟦NP⟧ ⊗ ((⟦NP⟧ ⇒ ⟦S⟧) ⇐ ⟦NP⟧) ⊗ (((⟦NP⟧ ⇒ ⟦S⟧) ⇒ (⟦NP⟧ ⇒ ⟦S⟧)) ⇐ ⟦NP⟧) ⊗ (⟦NP⟧ ⇐ ⟦NP⟧) → ⟦NP⟧

defined on elements as follows, with the bracketed subscripts in Sweedler notation:

the(−) ⊗ paper ⊗ that(−,−) ⊗ John ⊗ signed(−,−) ⊗ without(−,−,−) ⊗ reading(−) ↦ the(that(paper, without(John, signed(−, −_(1)), reading(−_(2)))))

The diagrammatic interpretation of the !L∗-derivation is depicted in Figure 2. It is obtained via steps mirroring the steps of the derivation tree of the example; please see the full version of the paper [25].

Figure 2: Diagrammatic interpretation of "the paper that John signed without reading".
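To give a concrete feel for this final semantic form, here is a toy numpy sketch of the composition. It is our simplification, not the paper's implementation: all types share one toy dimension, the tensors are random placeholders rather than trained representations, and the gap filler is duplicated with the non-linear Full copying for readability.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                  # one toy dimension for NP, N and S alike

paper, john = rng.random(d), rng.random(d)
the     = rng.random((d, d))           # NP ⇐ N: linear map on the noun
signed  = rng.random((d, d, d))        # bilinear: subject ⊗ object → S
reading = rng.random((d, d))           # linear: object → object (gapped verb)
without = rng.random((d, d, d))        # combines the clause and the gapped clause
that    = rng.random((d, d, d))        # bilinear: noun ⊗ clause → N

# Δ copies the !NP gap filler so that both 'signed' and 'reading' receive it:
gap1, gap2 = paper, paper              # Full copying Δ(v) = v ⊗ v

clause = np.einsum('sij,i,j->s', without,
                   np.einsum('sij,i,j->s', signed, john, gap1),  # John signed _
                   reading @ gap2)                               # ... reading _
meaning = the @ np.einsum('nij,i,j->n', that, paper, clause)     # an NP vector
assert meaning.shape == (d,)
```

The nesting of the `einsum` contractions mirrors the element-wise formula above: the single `paper` input flows into both verb slots before everything is combined by `without`, `that` and `the`.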
Experiments

The reader might rightly wonder which of these interpretations, the Cogebra or the Cofree-inspired coalgebra model, produces the correct semantic representation. We implement the resulting vector representations on large corpora and experiment with a disambiguation task to provide insights. The disambiguation task is the one originally proposed in [15], but we work with the dataset of [22], which contains verbs deemed genuinely ambiguous by [33], i.e. verbs whose meanings are not related to each other. We extended the latter with a second verb and a preposition, providing enough data to turn the dataset from a set of pairs of transitive sentences into a set of pairs of parasitic gap phrases. As an example, consider the verb file, with meanings register and smooth. Example entries of the original dataset and its extension are below; the full dataset is available from https://msadrzadeh.com/datasets/.

S:   accounts that the local government filed
S1:  accounts that the local government registered
S2:  accounts that the local government smoothed
P:   accounts that the local government filed after inspecting
P1:  accounts that the local government registered after inspecting
P2:  accounts that the local government smoothed after inspecting
P':  nails that the young woman filed after cutting
P'1: nails that the young woman registered after cutting
P'2: nails that the young woman smoothed after cutting
S':  nails that the young woman filed
S'1: nails that the young woman registered
S'2: nails that the young woman smoothed

We follow the same procedure as in [22] to disambiguate the phrases with the ambiguous verb: (1) build vectors for phrases P, P1, P2, and also P', P'1, P'2; (2) check whether the vector of P is closer to the vector of P1 or the vector of P2, and whether P' is closer to P'2 or P'1.
If so, we count two correct outputs; (3) compute a mean average precision (MAP) by counting in how many of the pairs the vector of the phrase with the ambiguous verb is closer to that of the phrase with its appropriate meaning.

In order to instantiate our categorical model on this task and experiment with the different copying maps, we proceed as follows. We work with parasitic gap phrases of the general form "A's that the B C'ed Prep D'ing". Here, C and D are verbs and their vector representations are multilinear maps: C is a bilinear map that takes A and B as input, and D is a linear map that takes A as input. For now, we represent the preposition Prep by the trilinear map
Prep. The vector representation of the parasitic gap phrase with a proper copying operator is
Prep(C(B, A), D(A)), for C and D multilinear maps and A and B vectors, where A = Σ_i C_i^A n_i. The different types of copying applied to this provide us with the following options:

Cogebra copying:    (a) Prep(C(B, A), D(Σ_i n_i)),    (b) Prep(C(B, Σ_i n_i), D(A))

Cofree-inspired copying:
Prep(C(B, A) + D(1⃗), C(B, 1⃗) + D(A))

Table 2: Parasitic Gap Phrase Disambiguation Results

Model            MAP  | Model            MAP  | Model            MAP
BERT             0.65 | FT (+)           0.55 | W2V (+)          0.46
Full             0.48 | Full             0.57 | Full             0.54
Cofree-inspired  0.47 | Cofree-inspired  0.56 | Cofree-inspired  0.54
Cogebra (a)      0.46 | Cogebra (a)      0.56 | Cogebra (a)      0.46
Cogebra (b)      0.42 | Cogebra (b)      0.37 | Cogebra (b)      0.39

In the copy object model of [22], these choices become as follows:
Cogebra copying:    (a) Prep(A ⊙ (C × B), D × Σ_i n_i),    (b) Prep((Σ_i n_i) ⊙ (C × B), D × A)

Cofree-inspired copying:
Prep((A ⊙ (C × B)) + (D × 1⃗), (1⃗ ⊙ (C × B)) + (D × A))

For comparison, we also implemented a model where a
Full copying operation Δ(v) = v ⊗ v was used, resulting in a third option Prep(C(B, A), D(A)), with the copy-object model Prep(A ⊙ (C × B), D × A). Note that this copying is non-linear and thus cannot be an instance of our
FdVect_R categorical semantics; we include it only to study how the other copying models do in relation to it.

The results of experimenting with these models are presented in Table 2. We experimented with three neural embedding architectures: BERT [10], FastText (FT) [6], and Word2Vec CBOW (W2V) [40]. For details of the training, please see the full version of the paper [25].

Uniformly, across all the neural architectures, the Full model provided better disambiguation than the other, linear, copying models. This better performance was closely followed by the Cofree-inspired model: in BERT, the Full model obtained a MAP of 0.48 and the Cofree-inspired model a MAP of 0.47; in FT, we have 0.57 for Full and 0.56 for Cofree-inspired; and in W2V we have 0.54 for both models. Also uniformly, in all of the neural architectures, Cogebra (a) did better than Cogebra (b). It is not surprising that the Full copying did better than the other two copyings, since this is the model that provides two identical copies of the head noun A; this kind of copying can only be obtained via the application of a non-linear Δ. The fact that our linear Cofree-inspired copying closely followed the Full model shows that, in the absence of Full copying, we can always use the Cofree-inspired model as a reliable approximation. It was also not surprising that the Cofree-inspired model did better than either of the Cogebra models, as it uses the sum of the two possibilities, each encoded in one of Cogebra (a) or (b). That Cogebra (a) performed better than Cogebra (b) shows that it is more important to have a full copy of the object for the main verb than for the secondary verb of a parasitic gap phrase. In other words, the verb C, which got a full copy of its object A, played a more important role in disambiguation than the verb D, which only got a vector of 1's as a copy of A.
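As a minimal sketch (ours, with one toy dimension and random tensors standing in for trained ones), the four copying options can be placed side by side; `prep` and `c` contract the trilinear and bilinear tensors following the formulas above.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
A, B = rng.random(d), rng.random(d)   # object and subject noun vectors
C = rng.random((d, d, d))             # main verb: bilinear map
D = rng.random((d, d))                # secondary (gapped) verb: linear map
Prep = rng.random((d, d, d))          # preposition: combines the two clause vectors
ones = np.ones(d)                     # 1⃗; also Σ_i n_i, the basis sum with A's weights dropped

def prep(x, y):
    return np.einsum('sij,i,j->s', Prep, x, y)

def c(x, y):
    return np.einsum('sij,i,j->s', C, x, y)

full      = prep(c(B, A), D @ A)                          # non-linear Δ(v) = v ⊗ v
cogebra_a = prep(c(B, A), D @ ones)                       # D sees only the basis sum
cogebra_b = prep(c(B, ones), D @ A)                       # C sees only the basis sum
cofree    = prep(c(B, A) + D @ ones, c(B, ones) + D @ A)  # Δ₁(v) = v ⊗ 1⃗ + 1⃗ ⊗ v
```

The sketch makes the qualitative comparison visible in code: only `full` gives both verbs the weighted object, `cofree` sums the two one-sided options, and each `cogebra_*` hands one verb an unweighted copy.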
This is natural, as the secondary verb only provides subsidiary information.

The most effective disambiguation of the new dataset was obtained via the BERT phrase vectors, followed by the Full model. BERT is a contextual neural network architecture that provides different vectors for the same word in different contexts.

Conclusions and Future Work

There are plenty of questions arising from the theory in this paper, concerning alternative syntaxes, coherence, and optimisation.

One avenue we are pursuing is to bound the !-modality of !L∗. This is desirable from a natural language point of view: the ! of linear logic symbolises infinite reuse, yet at no point in natural language is this necessary. Bounding ! by indexing it with natural numbers, similarly to Bounded Linear Logic [13], may allow for a more intuitive notion of resource insensitivity, closer to that of natural language.

Showing the coherence of the diagrammatic semantics, by using the proof nets of Modal Lambek Calculus [29] developed for clasp-string diagrams in [44], constitutes work in progress. Proving coherence would allow us to do all our derivations diagrammatically, making the sequent calculus labour superfluous. However, we suspect there are better notations for the diagrammatic semantics, perhaps more closely related to the proof nets of linear logic.

Application of type-logics with limited contraction and permutation to movement phenomena is a line of research initiated in [14, 3], with a recent boost in [1, 30, 31], and also in [12]. Finding commonalities with these approaches is future work.

We would also like to see how much we can improve the implementation of the Cofree-inspired model in this paper. This involves training better tensors, hopefully by using neural network methods.
10 Acknowledgement
Part of the motivation behind this work came from the Dialogue and Discourse Challenge project of the Applied Category Theory adjoint school during the week 22–26 July 2019. We would like to thank the organisers of the school. We would also like to thank Adriana Correia, Alexis Toumi, and Dan Shiebler for discussions. McPheat acknowledges support from the UKRI EPSRC Doctoral Training Programme scholarship, and Sadrzadeh from the Royal Academy of Engineering Industrial Fellowship IF-192058.
References

[1] On the logic of expansion in natural language. In Amblard, de Groote, Pogodalla, and Retoré, editors, volume 10054, Nancy, France, 2016. Springer. doi:10.1007/978-3-662-53826-5_14.

[2] J. Baez and M. Stay. Physics, topology, logic, and computation: a Rosetta Stone. In B. Coecke, editor, New Structures in Physics, volume 813 of Lecture Notes in Physics. Springer, 2011. doi:10.1007/978-3-642-12821-9_2.

[3] Guy Barry, Mark Hepple, Neil Leslie, and Glyn Morrill. Proof figures and structural operators for categorial grammar. 1991. doi:10.3115/977180.977215.

[4] R. F. Blute, J. R. B. Cockett, and R. A. G. Seely. Differential categories. Mathematical Structures in Computer Science, 16(6):1049–1083, 2006. doi:10.1017/S0960129506005676.

[5] Bob Coecke, Edward Grefenstette, and Mehrnoosh Sadrzadeh. Lambek vs. Lambek: functorial vector space semantics and string diagrams for Lambek calculus. Annals of Pure and Applied Logic, 164:1079–1100, 2013. doi:10.1016/j.apal.2013.05.009.

[6] Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 2017. arXiv:1607.04606, doi:10.1162/tacl_a_00051.

[7] Alain Bruguières and Alexis Virelizier. Hopf monads. Advances in Mathematics, 2007. doi:10.1016/j.aim.2007.04.011.

[8] B. Coecke, M. Sadrzadeh, and S. Clark. Mathematical foundations for a compositional distributional model of meaning. Lambek Festschrift. Linguistic Analysis, 36:345–384, 2010.

[9] J. R. Curran. From Distributional to Semantic Similarity. PhD thesis, University of Edinburgh, 2003.

[10] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: pre-training of deep bidirectional transformers for language understanding. CoRR, abs/1810.04805, 2018. arXiv:1810.04805.

[11] John R. Firth. A synopsis of linguistic theory, 1930–1955. 1957.

[12] G. Wijnholds and M. Sadrzadeh. A type-driven vector semantics for ellipsis with anaphora using Lambek calculus with limited contraction. Journal of Logic, Language and Information, 28:331–358, 2019. doi:10.1007/s10849-019-09293-4.

[13] Jean-Yves Girard, Andre Scedrov, and Philip J. Scott. Bounded linear logic: a modular approach to polynomial-time computability. Theoretical Computer Science, 1992. doi:10.1016/0304-3975(92)90386-T.

[14] Glyn Morrill, Neil Leslie, Mark Hepple, and Guy Barry. Categorial deductions and structural operations. In Studies in Categorial Grammar, Edinburgh Working Papers in Cognitive Science, volume 5, pages 1–21. Centre for Cognitive Science, 1990.

[15] Edward Grefenstette and Mehrnoosh Sadrzadeh. Experimental support for a categorical compositional distributional model of meaning. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 1394–1404, Edinburgh, Scotland, UK, July 2011. Association for Computational Linguistics.

[16] Edward Grefenstette and Mehrnoosh Sadrzadeh. Concrete models and empirical evaluations for the categorical compositional distributional model of meaning. Computational Linguistics, 2015. doi:10.1162/COLI_a_00209.

[17] Gregory Grefenstette. Explorations in Automatic Thesaurus Discovery. 1994. doi:10.1007/978-1-4615-2710-7.

[18] Zellig S. Harris. Distributional structure. WORD, 1954. doi:10.1080/00437956.1954.11659520.

[19] James Edward Humphreys. Introduction to Lie Algebras and Representation Theory. Springer-Verlag, 1972. doi:10.1007/978-1-4612-6398-2.

[20] Max Kanovich, Stepan Kuznetsov, and Andre Scedrov. Undecidability of the Lambek calculus with a relevant modality. Lecture Notes in Computer Science, 9804:240–256, 2016. arXiv:1601.06303, doi:10.1007/978-3-662-53042-9_14.

[21] Dimitri Kartsaklis and Mehrnoosh Sadrzadeh. Prior disambiguation of word tensors for constructing sentence vectors. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1590–1601, Seattle, Washington, USA, October 2013. Association for Computational Linguistics.

[22] Dimitri Kartsaklis, Mehrnoosh Sadrzadeh, and Stephen Pulman. Separating disambiguation from composition in distributional semantics. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 114–123, Sofia, Bulgaria, August 2013. Association for Computational Linguistics.

[23] Thomas K. Landauer and Susan T. Dumais. A solution to Plato's problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 1997. doi:10.1037/0033-295X.104.2.211.

[24] Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. 2008. doi:10.1017/cbo9780511809071.

[25] Lachlan McPheat, Mehrnoosh Sadrzadeh, Hadi Wazni, and Gijs Wijnholds. Categorical vector space semantics for Lambek calculus with a relevant modality. 2020. arXiv:2005.03074.

[26] Michael Moortgat, Mehrnoosh Sadrzadeh, and Gijs Wijnholds. A Frobenius algebraic analysis for parasitic gaps. In Workshop on Semantic Spaces at the Intersection of NLP, Physics, and Cognitive Science, Riga, Latvia, 2019.

[27] Dmitrijs Milajevs, Dimitri Kartsaklis, Mehrnoosh Sadrzadeh, and Matthew Purver. Evaluating neural word representations in tensor-based compositional settings. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 708–719, Doha, Qatar, October 2014. Association for Computational Linguistics. doi:10.3115/v1/D14-1079.

[28] M. Moortgat and G. Wijnholds. Lexical and derivational meaning in vector-based models of relativisation. In Proceedings of the 21st Amsterdam Colloquium, 2017.

[29] Michael Moortgat. Multimodal linguistic inference. Logic Journal of the IGPL, 1995. doi:10.1093/jigpal/3.2-3.371.

[30] Glyn Morrill. Grammar logicised: relativisation. Linguistics and Philosophy, 40:119–163, 2017. doi:10.1007/s10988-016-9197-0.

[31] Glyn Morrill. A note on movement in logical grammar. Journal of Language Modelling, pages 353–363, 2018. doi:10.15398/jlm.v6i2.233.

[32] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. GloVe: global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014. doi:10.3115/v1/d14-1162.

[33] Martin Pickering and Steven Frisson. Processing ambiguous verbs: evidence from eye movements. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27:556–573, 2001. doi:10.1037/0278-7393.27.2.556.

[34] Richard Blute, Prakash Panangaden, and Robert Seely. Fock space: a model of linear exponential types. Manuscript, revised version of the MFPS paper Holomorphic Models of Exponential Types in Linear Logic, pages 474–512, 1994.

[35] Herbert Rubenstein and John B. Goodenough. Contextual correlates of synonymy. Communications of the ACM, 8(10):627–633, October 1965. doi:10.1145/365628.365657.

[36] Mehrnoosh Sadrzadeh, Stephen Clark, and Bob Coecke. The Frobenius anatomy of word meanings I: subject and object relative pronouns. Journal of Logic and Computation, 2013. arXiv:1404.5278, doi:10.1093/logcom/ext044.

[37] Mehrnoosh Sadrzadeh, Dimitri Kartsaklis, and Esma Balkır. Sentence entailment in compositional distributional semantics. Annals of Mathematics and Artificial Intelligence, 2018. arXiv:1512.04419, doi:10.1007/s10472-017-9570-x.

[38] G. Salton. A document retrieval system for man-machine interaction. In Proceedings of the 1964 19th ACM National Conference, pages 122.301–122.3020, New York, New York, USA, 1964. ACM Press. doi:10.1145/800257.808923.

[39] P. Selinger. A survey of graphical languages for monoidal categories. 2011. arXiv:0908.3347, doi:10.1007/978-3-642-12821-9_4.

[40] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems, pages 3111–3119, 2013. doi:10.5555/2999792.2999959.

[41] Johan van Benthem. The Lambek calculus. 1988. doi:10.1007/978-94-015-6878-4_3.

[42] Gijs Wijnholds and Mehrnoosh Sadrzadeh. Classical copying versus quantum entanglement in natural language: the case of VP-ellipsis. EPTCS Proceedings of the Second Workshop on Compositional Approaches for Physics, NLP, and Social Sciences (CAPNS), 2018. doi:10.4204/EPTCS.283.8.

[43] Gijs Wijnholds and Mehrnoosh Sadrzadeh. Evaluating composition models for verb phrase elliptical sentence embeddings. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 261–271, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics. doi:10.18653/v1/N19-1023.

[44] Gijs Jasper Wijnholds. Coherent diagrammatic reasoning in compositional distributional semantics. In International Workshop on Logic, Language, Information, and Computation, pages 371–386. Springer, 2017. doi:10.1007/978-3-662-55386-2_27.