Analogical Proportions

Abstract

Analogy-making is at the core of human intelligence and creativity, with applications to such diverse tasks as commonsense reasoning, learning, language acquisition, and storytelling. This paper contributes to the foundations of artificial general intelligence by introducing from first principles an abstract algebraic framework of analogical proportions of the form ‘a is to b what c is to d’ in the general setting of universal algebra. This enables us to compare mathematical objects, possibly across different domains, in a uniform way, which is crucial for AI systems. The main idea is to define solutions to analogical equations in terms of maximal sets of algebraic justifications. This amounts to deriving abstract terms of concrete elements from a ‘known’ source domain, which can then be instantiated in an ‘unknown’ target domain to obtain analogous elements. It turns out that our notion of analogical proportions has appealing mathematical properties. For example, we show that analogical proportions preserve functional dependencies across different domains, which is desirable. We study Lepage's axioms of analogical proportions and argue why we disagree with his symmetry, central permutation, strong reflexivity, and strong determinism axioms. We compare our framework with two prominent and recently introduced frameworks of analogical proportions from the literature in the concrete domains of sets and numbers. In each case, we either disagree with the notion from the literature, justified by a plausible counterexample, or we show that our model yields strictly more reasonable solutions. This provides evidence for its applicability. In a broader sense, this paper is a first step towards a theory of analogical reasoning and learning systems with potential applications to fundamental AI problems like commonsense reasoning and computational learning and creativity.


arXiv preprint [cs.LO]

Christian Antić

Institute of Discrete Mathematics and Geometry, Vienna University of Technology, Wiedner Hauptstraße 8-10, A-1040 Vienna, Austria

Abstract

Analogy-making is at the core of human intelligence and creativity, with applications to such diverse tasks as commonsense reasoning, learning, language acquisition, and storytelling. This paper contributes to the foundations of artificial general intelligence by introducing an abstract algebraic framework of analogical proportions of the form ‘a is to b what c is to d’ in the general setting of universal algebra. This enables us to compare mathematical objects, possibly across different domains, in a uniform way, which is crucial for AI systems. The main idea is to define solutions to analogical equations in terms of generalizations, and to derive abstract terms of concrete elements from a ‘known’ source domain, which can then be instantiated in an ‘unknown’ target domain to obtain analogous elements. We extensively compare our framework with two prominent and recently introduced frameworks of analogical proportions from the literature in the concrete domains of sets, numbers, and words. We show that our framework yields strictly more reasonable solutions in all of these cases, which provides evidence for its applicability. In a broader sense, this paper is a first step towards an algebraic theory of analogical reasoning and learning systems with potential applications to fundamental AI problems like commonsense reasoning and computational learning and creativity.

Keywords: Analogy; Artificial General Intelligence; Computational Learning; Creativity

1. Introduction

Analogy-making is at the core of human intelligence and creativity, with applications to such diverse tasks as commonsense reasoning, learning, language acquisition, and storytelling (see, e.g., Hofstadter (2001), Hofstadter and Sander (2013), Gust et al. (2008), Boden (1998), Sowa and Majumdar (2003), Winston (1980), and Wos (1993)). This paper contributes to the foundations of artificial general intelligence by introducing an abstract algebraic framework of analogical proportions of the form ‘a is to b what c is to d’ in the general setting of universal algebra. This enables us to compare mathematical objects, possibly across different domains, in a uniform way, which is crucial for AI systems. The main idea is to define solutions to analogical equations in terms of generalizations, and to derive abstract terms of concrete elements from a ‘known’ source domain, which can then be instantiated in an ‘unknown’ target domain to obtain analogous elements.

Email address: [email protected] (Christian Antić)
Preprint submitted to Artificial Intelligence, August 26, 2020

Example 1.1

Imagine two domains, one consisting of the positive integers 1, 2, ... and the other made up of words ab, ba, .... The analogical proportion
$$2 : 4 :: ab : z \tag{1}$$
asks for some word $z$ (here $z$ is a variable) which is to $ab$ what 4 is to 2. We observe two things: first, the obvious fact $4 = 2^2$, and second, that the variable $x$ is a common generalization of 2 and $ab$. That is, we can instantiate $x$ with 2 to obtain 2, and with $ab$ to obtain $ab$. By defining $f(x) = x$ and $g(x) = x^2$, we therefore have
$$2 = f(2), \quad 4 = g(2), \quad \text{and} \quad ab = f(ab). \tag{2}$$
Looking at (2), what could $z$ in (1) be equal to? In (2), we see that transforming 2 into 4 means going from $f(2)$ to $g(2)$. Now what does it mean to transform $ab$ ‘in the same way’ or ‘analogously’? By continuing the pattern in (2), $z = g(ab)$ is a natural answer, where we interpret $(ab)^2 = ab \cdot ab$ as the concatenation of $ab$ with itself. This yields the plausible solution to (1) given by
$$2 : 4 :: ab : abab.$$
As simple as this line of reasoning may seem, it cannot be formalized by current models of analogical proportions, which restrict themselves to proportions between objects of a single domain (cf. Stroppa and Yvon (2006) and Miclet et al. (2008)).

The rest of the paper is devoted to formalizing and studying reasoning patterns like the one in the example above within the abstract algebraic setting of universal algebra and instances thereof (cf. Examples 5.7 and 6.12). We extensively compare our framework with two prominent and recently introduced frameworks of analogical proportions from the literature, namely those of Stroppa and Yvon (2006) and Miclet et al. (2008), within the concrete domains of sets, numbers, and words. We show that our framework yields strictly more reasonable solutions in all of these cases, which provides evidence for its applicability.

The aim of this paper is to introduce our model of analogical proportions, which to the best of our knowledge is original, in its full generality.
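The reasoning in Example 1.1 can be mirrored in a few lines of Python. This is a minimal sketch under our own assumptions. The helper names are hypothetical, and $g(x) = x^2$ is read as multiplication on integers and as self-concatenation on words, following the example's reading $(ab)^2 = ab \cdot ab$.

```python
# Sketch of Example 1.1: one abstract pair of polynomials, two interpretations.
# Helper names are illustrative assumptions, not notation from the paper.

def f(x):
    """f(x) = x: the common generalization of 2 and ab."""
    return x


def g_int(x: int) -> int:
    """g(x) = x^2 interpreted in the integers (multiplication)."""
    return x * x


def g_word(w: str) -> str:
    """g(x) = x^2 interpreted on words (concatenation with itself)."""
    return w + w


# Source domain: 2 = f(2) and 4 = g(2), as in identity (2).
a, b = 2, 4
assert f(a) == a and g_int(a) == b

# Target domain: ab = f(ab); transforming ab "in the same way" gives g(ab).
c = "ab"
assert f(c) == c
z = g_word(c)
print(a, ":", b, "::", c, ":", z)  # prints: 2 : 4 :: ab : abab
```

The abstract polynomial $g$ is fixed, and only its interpretation changes from one domain to the other. This separation is what the universal-algebra setting of the following sections makes precise.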
The core idea of the paper is formulated in Definition 3.1. Despite its simplicity, it has interesting consequences with mathematically appealing computational proofs, which we plan to explore further in the future.

For excellent surveys on computational models of analogical reasoning, we refer the interested reader to Hall (1989) and Prade and Richard (2014).

2. Universal Algebra

We recall some basic notions and notations of universal algebra, mainly following Burris and Sankappanavar (2000). We assume a given non-empty ranked alphabet $\Sigma$ of function symbols, including 0-ary function symbols called constants. An algebra of type $\Sigma$ is a pair $\mathbf{A} = (A, F)$, where $A$ is a set and $F$ is a family of finitary operations on $A$ indexed by the function symbols in $\Sigma$, such that corresponding to each $n$-ary function symbol $f$ there is an $n$-ary function $f^{\mathbf{A}} : A^n \to A$ on $A$. The set $A$ is called the universe (or underlying set) of $\mathbf{A}$. We denote the cardinality of $A$ by $|A|$. With a slight abuse of notation, we will often identify the function symbol $f$ with its interpretation $f^{\mathbf{A}}$. A term of type $\Sigma$ is formed as usual from function symbols in $\Sigma$ and variables. A polynomial over $\mathbf{A}$ of type $\Sigma$ is a term of type $\Sigma$ possibly containing constants denoting elements from $A$. We denote the set of all polynomials over $\mathbf{A}$ containing variables among $\vec{x} = x_1, \ldots, x_n$, $n \geq 0$, by $\mathbf{A}[\vec{x}]$. For example, $2x^2 + x + 1$ is a polynomial in $\mathbb{N}[x]$ of type $\{+/2, \cdot/2\}$, where $\mathbb{N} = \{0, 1, 2, \ldots\}$ denotes the natural numbers. However, notice that $2x^2 + x + 1$ is not a polynomial of type $\{+/2\}$, as $x^2$ requires multiplication. We call a polynomial $p$ constant if $p^{\mathbf{A}}$ is a constant function. For instance, the polynomial $f(x) = 0x \in \mathbb{N}[x]$ is constant despite containing the variable $x$. Polynomials can be interpreted as ‘generalized elements’ containing variables as placeholders for concrete elements. They will play a central role in our abstract algebraic formulation of analogical proportions given below.

A structure is an algebra of some type $\Sigma$ possibly containing relations between elements. For instance, $(\mathbb{N}, +, \leq)$ is the structure of the natural numbers with addition, linearly ordered by $\leq$.
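The distinction between a term and its interpretation can be made concrete with a small evaluator. This is a sketch under an encoding of our own choosing (not the paper's notation). A term is a variable name (a string), a constant from the universe (an int here), or a tuple `(symbol, arg_1, ..., arg_n)`. The algebra is a dictionary from symbols to Python functions, and the names `evaluate` and `NAT` are hypothetical.

```python
from typing import Callable, Dict, Union

# Illustrative encoding (an assumption): a term is a variable name (str),
# a constant element of the universe (int here), or a tuple
# (symbol, arg_1, ..., arg_n) applying an n-ary function symbol.
Term = Union[str, int, tuple]


def evaluate(term: Term, ops: Dict[str, Callable], env: Dict[str, int]) -> int:
    """Interpret `term` in the algebra whose operations are `ops`,
    with variables bound to elements by `env`."""
    if isinstance(term, str):        # a variable
        return env[term]
    if isinstance(term, tuple):      # f(t_1, ..., t_n)
        symbol, *args = term
        return ops[symbol](*(evaluate(t, ops, env) for t in args))
    return term                      # a constant from the universe


# The algebra (N, +, *) of type {+/2, */2}.
NAT = {"+": lambda x, y: x + y, "*": lambda x, y: x * y}

# p(x) = 2x^2 + x + 1 uses both function symbols.
p = ("+", ("+", ("*", 2, ("*", "x", "x")), "x"), 1)
print([evaluate(p, NAT, {"x": n}) for n in range(4)])  # [1, 4, 11, 22]

# q(x) = 0 * x contains the variable x, but its interpretation is constant.
q = ("*", 0, "x")
print({evaluate(q, NAT, {"x": n}) for n in range(10)})  # {0}
```

Dispatching on Python types keeps the sketch short. It would need a separate variable wrapper for algebras whose elements are themselves strings, such as the word domains of Section 6.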

3. Analogical Proportions

In the rest of the paper, we assume some ‘known’ source domain $S$ and some ‘unknown’ target domain $T$, both partially ordered algebras of the same type. We may think of the source domain $S$ as our background knowledge, a repertoire of ordered elements we are familiar with. In contrast, $T$ stands for an unfamiliar domain which we want to explore via analogical transfer from $S$. For this, we consider analogical proportions between elements from the source and target domains.

This leads us to the main definition of the paper.

Definition 3.1 An analogical equation is an expression of the form ‘a is to b what c is to z’, in symbols
$$a : b :: c : z, \tag{3}$$
where $a$ and $b$ are source elements from $S$, $c$ is a target element from $T$, and $z$ is a variable. We call an analogical equation of the form (3) solvable if there is a target element $d \in T$ such that there are polynomials $f, g \in (S \cap T)[\vec{x}]$ and sequences of elements $\vec{a} \in S^n$ and $\vec{c} \in T^n$, $n \geq 1$, minimal with respect to $f$, satisfying one of the following lines of identities:
$$a = f^S(\vec{a}), \quad b = g^S(\vec{a}), \quad c = f^T(\vec{c}), \quad d = g^T(\vec{c}), \tag{4}$$
$$a = f^S(\vec{a}), \quad b = g^S(\vec{a}), \quad c = g^T(\vec{c}), \quad d = f^T(\vec{c}), \tag{5}$$
or, in case $a, b, c, d \in S \cap T$ and $\vec{a}, \vec{c} \in (S \cap T)^n$,
$$a = f^S(\vec{a}), \quad b = f^S(\vec{c}), \quad c = g^T(\vec{a}), \quad d = g^T(\vec{c}). \tag{6}$$
In this case, we call $d$ a solution of equation (3), justified by $f$ and $g$, and write $a : b :: c : d$, or $a : b ::_{f,g} c : d$, or $a : b :: c : d \ \mathrm{jus}_{f,g}$. We then say that $a$, $b$, $c$, and $d$ are in analogical proportion (justified by $f$ and $g$).

Analogical equations formalize the idea that analogy-making is the task of transforming different objects from the source to the target domain in ‘the same way’. As Pólya (1954) puts it:

Two systems are analogous if they agree in clearly definable relations of their respective parts.

In our formulation, the ‘parts’ are the ‘subelements’ $\vec{a}$ and $\vec{c}$. The ‘definable relations’ are represented by the polynomials $f$ and $g$, which can be interpreted as generalizations of $a, c$ and $b, d$, respectively. More precisely, transforming $a = f(\vec{a})$ into $c = f(\vec{c})$ means replacing the ‘subelements’ $\vec{a}$ of $a$ by $\vec{c}$, where $f(\vec{x})$ is a generalization of $a$ and $c$. Transforming $b$ in ‘the same way’ as $a$ thus means finding and replacing the ‘subelements’ $\vec{a}$ in $b = g(\vec{a})$ by $\vec{c}$, which is exactly what (4) says. Lines (5) and (6) are permutations of $f$ and $g$ in (4).

Definition 3.1 requires the sequences $\vec{a}$ and $\vec{c}$ to be minimal. This requirement is crucial, as the following example illustrates.

Footnote: Here we denote by $\mathbb{N} = \{0, 1, 2, \ldots\}$ the natural numbers, including zero.
Footnote: Here $+/2$ means that $+$ is a function symbol of arity 2.
Footnote: Of course, $2x + 1$ is shorthand for $x + x + 1$, which is of type $\{+/2\}$.

Example 3.2

Let sgn denote the signum function. Consider the analogical equation given by
$$1 : n :: 1 : z, \tag{7}$$
where $n$ is a positive integer. Define the polynomials $f$ and $g$ by
$$f = \mathrm{sgn} \quad \text{and} \quad g = \mathrm{id}. \tag{8}$$
Then, for any positive integer $k$, we have
$$1 = \mathrm{sgn}(n), \quad n = g(n), \quad 1 = \mathrm{sgn}(k),$$
which yields the ‘solution’ $z = g(k) = k$. This ‘solution’ is counter-intuitive, since it says that ‘1 is to $n$ what 1 is to $k$’ for any positive integers $n$ and $k$. The problem here is that $n$ and $k$ are, in general, not minimal positive integers satisfying
$$1 = \mathrm{sgn}(n) \quad \text{and} \quad 1 = \mathrm{sgn}(k). \tag{9}$$
We can restrict the values of $n$ and $k$ to the minimal positive integers satisfying (9), namely $n = k = 1$. Then $f$ and $g$ as defined in (8) justify only the axiomatic numerical proportion $1 : 1 :: 1 : 1$, as an instance of (15), as desired. A valid solution to (7) is given by $z = n$, justified by $f = g = \mathrm{id}$ as an instance of (6).

Footnote: We mean here point-wise minimality, which is crucial for avoiding implausible ‘solutions’ (cf. Example 3.2).
Footnote: See Remark 1.
Footnote: This is why ‘copycat’ is the name of a prominent model of analogy-making (Hofstadter and Mitchell, 1995).
Footnote: See Correa et al. (2012).

We have to make one more important distinction. Let $\pi_1$ and $\pi_2$ denote the projections of a pair of elements to its first and second argument, respectively. Then, for any elements $a, b \in S$ and $c, d \in T$, we have $a : b ::_{\pi_1, \pi_2} c : d$ as a consequence of
$$a = \pi_1(a, b), \quad b = \pi_2(a, b), \quad c = \pi_1(c, d), \quad \text{and} \quad d = \pi_2(c, d).$$
That is, we can justify any analogical proportion via projections. This motivates the following definition.

Definition 3.3

We say that a justification $(f, g)$ of a solution to an analogical equation is trivial (with respect to $S$ and $T$) if $a : b ::_{f,g} c : d$ holds for all $a, b \in S$ and $c, d \in T$. We call a solution to an analogical equation trivial if it has only trivial justifications, and non-trivial otherwise.

Remark 1

In the sequel, we will write $a : b :: c : d$, without any reference to its justifications $f, g$, only in case $d$ is a non-trivial solution to $a : b :: c : z$. For the most part, we will be interested only in non-trivial solutions to analogical equations. That is, we will write ‘solution’ instead of ‘non-trivial solution’, ‘justification’ instead of ‘non-trivial justification’, et cetera.

Footnote: See Proposition 4.1 and Proposition 5.1.

Analogical equations may have none, one, or multiple solutions, as the following example illustrates.

Example 3.4 Let the source and target domains consist of all words containing letters among $a$, $b$, and $c$. Consider the analogical equation
$$a : b :: c : z. \tag{10}$$
The only generalization of $a$ and $b$ (resp., $a$ and $c$) with respect to concatenation is $f = \mathrm{id}$. Since there is no adequate generalization $g$ given $f$, (10) has no solution. On the other hand, the analogical equation $a : b :: a : z$ has the unique solution $z = b$. Finally, the analogical equation $ab : ac :: bd : z$ has at least two solutions, given by $z = cd$ and $z = dc$ (cf. Example 6.3).

Solutions to analogical equations may have multiple (non-trivial) justifications, which motivates the following concept.

Definition 3.5

The degree of a solution to an analogical equation is the number of its non-trivial justifications.

Intuitively, the degree of a solution can be interpreted as its degree of plausibility (cf. Example 6.3). This induces a natural ordering on the set of all solutions to a proportion.

Lepage (2003) proposes the following axioms as a guideline for formal models of analogical proportions (cf. Miclet et al., 2008, p. 797):
$$a : b :: c : d \iff c : d :: a : b \quad \text{(symmetry)}, \tag{11}$$
$$a : b :: c : d \iff a : c :: b : d \quad \text{(exchange of the means)}, \tag{12}$$
$$a : a :: b : z \Rightarrow z = b; \ \text{or} \ a : b :: a : z \Rightarrow z = b \quad \text{(determinism)}. \tag{13}$$
The first two axioms are plausible, and we show in Theorem 3.7 below that they are satisfied within our framework. However, we disagree with Lepage (2003) on the axiom of determinism.

Example 3.6

Consider the analogical equation given by
$$1 : 1 :: -1 : z. \tag{14}$$
One obvious solution to (14) is $z = -1$, justified by $f = g = \mathrm{id}$. However, there is another solution to (14), justified by the polynomials $f = \mathrm{id}$ and $g(x) = x^2$. In fact, we have
$$1 = f(1), \quad 1 = g(1), \quad -1 = f(-1), \quad \text{and} \quad 1 = g(-1),$$
which shows that $z = (-1)^2 = 1$ is a solution of (14) as well. This solution, which is intuitively plausible, violates Lepage's (2003) axiom of determinism.

We replace the axiom of determinism by its weaker variant given by
$$a : a :: b : b \quad \text{and} \quad a : b :: a : b. \tag{15}$$
That is, we replace the requirement that $b$ is the unique solution by the weaker condition that $b$ is one among possibly further solutions.

Theorem 3.7

Definition 3.1 implies (11), (15), and, in case $b, c \in S \cap T$, (12).

Proof. In the cases (11) and (12), it suffices to permute the roles of the involved justifications $f$ and $g$. For (15), define $f = g = \mathrm{id}$.

The following reasoning pattern will often be used in the rest of the paper.

Proposition 3.8

For any source element $a \in S$, target element $c \in T$, and polynomial $g \in (S \cap T)[x]$, we have
$$a : g(a) :: c : g(c).$$
Proof. An immediate consequence of Definition 3.1 with $f = \mathrm{id}$, $\vec{a} = a$, and $\vec{c} = c$.

Corollary 3.9

For any source element $a \in S$, target element $c \in T$, and joint element $b \in S \cap T$, we have
$$a : b :: c : b \ \mathrm{jus}_{\mathrm{id}, b}. \tag{16}$$
Consequently, in case $S = T$, (16) holds for any $a, b, c$.

Remark 2

In the rest of the paper, we shall treat constant justifications as in Corollary 3.9 as trivial as well (cf. Remark 1).
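The role of minimality in Definition 3.1 can be made concrete with a brute-force check. The following Python sketch replays Examples 3.2 and 3.6. It is illustrative only and rests on several assumptions of our own. Only line (4) is covered, for unary polynomials over the integers. A finite range stands in for the integers, and ‘minimal’ is read as the least preimage under the usual order $\leq$. Triviality (Definition 3.3) is not checked. All function names are hypothetical.

```python
from typing import Callable, Iterable, Optional

Poly = Callable[[int], int]  # a unary polynomial, modelled as a function


def minimal_preimage(f: Poly, value: int, universe: Iterable[int]) -> Optional[int]:
    """Least x in the finite search range with f(x) == value, or None."""
    candidates = [x for x in universe if f(x) == value]
    return min(candidates) if candidates else None


def justifies_line4(a, b, c, d, f: Poly, g: Poly, universe) -> bool:
    """Check line (4) of Definition 3.1 for unary f and g:
    a = f(a'), b = g(a'), c = f(c'), d = g(c'), with a', c' minimal."""
    universe = list(universe)
    a_min = minimal_preimage(f, a, universe)
    c_min = minimal_preimage(f, c, universe)
    if a_min is None or c_min is None:
        return False
    return g(a_min) == b and g(c_min) == d


def justifies_line4_any(a, b, c, d, f: Poly, g: Poly, universe) -> bool:
    """The same identities, but with *any* preimage (minimality dropped)."""
    universe = list(universe)
    return (any(f(x) == a and g(x) == b for x in universe)
            and any(f(y) == c and g(y) == d for y in universe))


def sgn(x: int) -> int:
    return (x > 0) - (x < 0)


def identity(x: int) -> int:
    return x


UNIVERSE = range(-20, 21)  # finite stand-in for the integers (assumption)

# Example 3.2: without minimality, sgn/id 'justify' 1 : 5 :: 1 : 7 ...
print(justifies_line4_any(1, 5, 1, 7, sgn, identity, UNIVERSE))  # True
# ... but with minimal preimages only 1 : 1 :: 1 : 1 survives.
print(justifies_line4(1, 5, 1, 7, sgn, identity, UNIVERSE))      # False
print(justifies_line4(1, 1, 1, 1, sgn, identity, UNIVERSE))      # True
# Example 3.6: f = id and g(x) = x^2 justify 1 : 1 :: -1 : 1.
print(justifies_line4(1, 1, -1, 1, identity, lambda x: x * x, UNIVERSE))  # True
```

Under this reading, minimality forces the argument of $f = \mathrm{sgn}$ to be 1. The implausible proportions of Example 3.2 are then rejected, while the plausible solution of Example 3.6 is kept.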

We now want to study transformations of analogical proportions.

Proposition 3.10

For any source elements $a, b \in S$, target elements $c, d \in T$, and polynomial $h \in (S \cap T)[x]$, we have
$$a : b :: c : d \implies h(a) : h(b) :: h(c) : h(d).$$
Proof. Suppose $f, g \in (S \cap T)[\vec{x}]$ are the polynomials justifying $a : b :: c : d$, for some $\vec{a}, \vec{c}$ minimal with respect to $f$. Then $h(f(\vec{x})), h(g(\vec{x})) \in (S \cap T)[\vec{x}]$ are the polynomials justifying $h(a) : h(b) :: h(c) : h(d)$, with $\vec{a}, \vec{c}$ being minimal with respect to $h(f(\vec{x}))$.

Proposition 3.11

For any source elements $a, b, a', b' \in S$, target elements $c, d, c', d' \in T$, and polynomial $h \in (S \cap T)[x, y]$, we have
$$a : b :: c : d \ \text{and} \ a' : b' :: c' : d' \implies h(a, a') : h(b, b') :: h(c, c') : h(d, d'),$$
provided the justifications $f, g \in (S \cap T)[\vec{x}]$ and $f', g' \in (S \cap T)[\vec{y}]$ of $a : b :: c : d$ and $a' : b' :: c' : d'$, respectively, are instances of the same line of equations from (4)–(6).

Proof. Suppose $f, g$ and $f', g'$ are both justifications satisfying (4), for some $\vec{a}, \vec{c}$ and $\vec{a}', \vec{c}'$ minimal with respect to $f$ and $f'$, respectively. Then $h(f(\vec{x}), f'(\vec{y})), h(g(\vec{x}), g'(\vec{y})) \in (S \cap T)[\vec{x}, \vec{y}]$ are the polynomials justifying $h(a, a') : h(b, b') :: h(c, c') : h(d, d')$, with $\vec{a}, \vec{a}'$ and $\vec{c}, \vec{c}'$ being minimal with respect to $h(f(\vec{x}), f'(\vec{y}))$. The other cases, in which $f, g$ and $f', g'$ satisfy either (5) or (6), are analogous.

Iterating the process of Proposition 3.11 finitely many times yields the following corollary.

Corollary 3.12

For any source elements $a_1, \ldots, a_n, b_1, \ldots, b_n \in S$, target elements $c_1, \ldots, c_n, d_1, \ldots, d_n \in T$, $n \geq 1$, and any polynomial $h \in (S \cap T)[x_1, \ldots, x_n]$, we have
$$a_1 : b_1 :: c_1 : d_1, \ \ldots, \ a_n : b_n :: c_n : d_n \implies h(\vec{a}) : h(\vec{b}) :: h(\vec{c}) : h(\vec{d}),$$
provided the justifications $f_i, g_i \in (S \cap T)[\vec{x}_i]$ of $a_i : b_i :: c_i : d_i$, $1 \leq i \leq n$, are instances of the same line of equations from (4)–(6).
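The polynomial constructions in the proofs of Propositions 3.10 and 3.11 can be checked numerically on small integer instances. This Python sketch verifies only the identities of line (4) for the composed polynomials. It does not check the minimality requirement, which the proofs establish separately. The helper `compose` and the concrete choices of $g$, $g'$, and $h$ are our own illustrative assumptions.

```python
def compose(h, *fs):
    """Return the polynomial (x_1, x_2, ...) -> h(f_1(x_1), f_2(x_2), ...)."""
    return lambda *xs: h(*(fi(x) for fi, x in zip(fs, xs)))


def identity(x):
    return x


# Proposition 3.10: 2 : 5 :: 7 : 10 is justified by f = id and g(x) = x + 3
# (an instance of Proposition 3.8), with the arguments 2 and 7.
f, g, h = identity, (lambda x: x + 3), (lambda x: 2 * x)
a_arg, c_arg = 2, 7
a, b, c, d = f(a_arg), g(a_arg), f(c_arg), g(c_arg)
hf, hg = compose(h, f), compose(h, g)
assert (hf(a_arg), hg(a_arg), hf(c_arg), hg(c_arg)) == (h(a), h(b), h(c), h(d))
print(h(a), h(b), h(c), h(d))  # 4 10 14 20

# Proposition 3.11 with h(x, y) = x + y: combine 1 : 2 :: 3 : 4 (g(x) = x + 1)
# and 10 : 20 :: 30 : 60 (g'(x) = 2x), both line (4) with f = f' = id.
h2 = lambda x, y: x + y
F = compose(h2, identity, identity)                 # h(f(x), f'(y))
G = compose(h2, lambda x: x + 1, lambda y: 2 * y)   # h(g(x), g'(y))
print(F(1, 10), G(1, 10), F(3, 30), G(3, 30))       # 11 22 33 64
```

The second part reproduces $h(a, a') : h(b, b') :: h(c, c') : h(d, d')$, that is, $11 : 22 :: 33 : 64$. Here the composed polynomials $F$ and $G$ play the roles of $h(f(\vec{x}), f'(\vec{y}))$ and $h(g(\vec{x}), g'(\vec{y}))$.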

4. Sets

In the rest of this section, let $S = (S', \cup, \cap, \cdot^c, \emptyset, \subseteq)$ and $T = (T', \cup, \cap, \cdot^c, \emptyset, \subseteq)$, for some sets $S'$ and $T'$, denote the source and target domains, respectively. Moreover, let $A$ and $B$ denote subsets of the source universe $S'$, and let $C$ and $D$ denote subsets of the target universe $T'$.

As an instance of Proposition 3.8, for any sets $A \subseteq S'$ and $C \subseteq T'$, we have
$$A : A^c :: C : C^c \tag{17}$$
and, for any set $E \subseteq S' \cap T'$,
$$A : A \cup E :: C : C \cup E \quad \text{and} \quad A : A \cap E :: C : C \cap E.$$

Proposition 4.1

For any sets $A, B, C, D$, we have
$$A : B ::_{\mathrm{tr}_1, \mathrm{tr}_2} C : D, \tag{18}$$
with $\mathrm{tr}_1(X, Y) = (X \cap Y) \cup (X - Y)$ and $\mathrm{tr}_2(X, Y) = (X \cap Y) \cup (Y - X)$, which shows that $\mathrm{tr}_1$ and $\mathrm{tr}_2$ are trivial justifications of (18).

Proof. We have
$$A = \mathrm{tr}_1(A, B), \quad B = \mathrm{tr}_2(A, B), \quad C = \mathrm{tr}_1(C, D), \quad \text{and} \quad D = \mathrm{tr}_2(C, D), \tag{19}$$
and $A, B, C, D$ are subset-minimal sets satisfying (19).

Proposition 4.1 shows that trivial justifications may contain useful information about the underlying structures. In this case, it encodes the observation that any two sets $A$ and $B$ are symmetrically related via $A = (A \cap B) \cup (A - B)$ and $B = (A \cap B) \cup (B - A)$.

We now want to prove some set-theoretic properties of set proportions.

Theorem 4.2

If $A \cap B = C \cap D$ then $A : B :: C : D$.

Proof. Define the polynomials $f, g \in (S' \cap T')[X, Y]$ by
$$f(X, Y) = (A \cap B) \cup (X - Y) \quad \text{and} \quad g(X, Y) = (A \cap B) \cup (Y - X).$$
As a consequence of $A \cap B = C \cap D$, we have
$$A = f(A, B), \quad B = g(A, B), \quad C = f(C, D), \quad \text{and} \quad D = g(C, D), \tag{20}$$
and $A, B, C, D$ are subset-minimal sets satisfying (20).

The following corollaries show that set proportions are compatible with set inclusion.

Corollary 4.3

For any sets $A \subseteq B \subseteq S'$ and $A \subseteq D \subseteq T'$, we have
$$A : B :: A : D.$$
In particular, we have $\emptyset : B :: \emptyset : D$.

Corollary 4.4

If $A \cap B \subseteq C$, then $A : B :: C : D \cup (A \cap B)$ holds for any set $D$ disjoint from $C$.

Interestingly enough, the next result shows that distinct elements are proportional when considered as singletons (as opposed to letters; cf. Example 3.4).

Corollary 4.5

For any distinct elements $a, b \in S'$ and $c, d \in T'$, we have $\{a\} : \{b\} :: \{c\} : \{d\}$.

We further have the following implication.

Theorem 4.6

If $A \cup B = C \cup D$ then $A : B :: C : D$.

Proof. The proof is similar to the proof of Theorem 4.2. Define the polynomials $f, g \in (S' \cap T')[X, Y]$ by
$$f(X, Y) = (A \cup B) - (Y - X) \quad \text{and} \quad g(X, Y) = (A \cup B) - (X - Y).$$
As a consequence of $A \cup B = C \cup D$, we have
$$A = f(A, B), \quad B = g(A, B), \quad C = f(C, D), \quad \text{and} \quad D = g(C, D), \tag{21}$$
and $A, B, C, D$ are subset-minimal sets satisfying (21).

Footnote: For instance, the ‘solution’ $A = f(A, \emptyset)$ violates the requirement $B = g(A, \emptyset)$ (except for $B = \emptyset$).

Notice that in case $S' = T'$, Theorem 4.6 yields another justification of (17).

Corollary 4.7

For any sets $A \subseteq S'$ and $C \subseteq T'$, we have
$$A : S' \cup T' :: C : S' \cup T'.$$
In what follows, we want to compare our notion of set proportion with two prominent models, due to Miclet et al. (2008) and Stroppa and Yvon (2006).

The following definition is due to Stroppa and Yvon (2006, Proposition 4).

Deﬁnition 4.8

For any sets $A, B, C, D \subseteq S' \cup T'$, define $A : B ::_S C : D$ if
$$A = A_1 \cup A_2, \quad B = A_1 \cup D_2, \quad C = D_1 \cup A_2, \quad \text{and} \quad D = D_1 \cup D_2.$$
For example, with $A_1 = \{a_1\}$, $A_2 = \{a_2\}$, $D_1 = \{d_1\}$, and $D_2 = \{d_2\}$, we obtain the set proportion
$$\{a_1, a_2\} : \{a_1, d_2\} ::_S \{d_1, a_2\} : \{d_1, d_2\}.$$
So, roughly, we obtain the set $\{a_1, d_2\}$ from $\{a_1, a_2\}$ by replacing $a_2$ by $d_2$, which coincides with the transformation from $\{d_1, a_2\}$ into $\{d_1, d_2\}$.

We have the following implication.

Theorem 4.9

If $A : B ::_S C : D$ then $A : B :: C : D$.

Proof. Let $A, B, C, D$ be decomposed as in Definition 4.8. Define the polynomials $f, g \in (S' \cap T')[X]$ by
$$f(X) = X \cup A_2 \quad \text{and} \quad g(X) = X \cup D_2.$$
As a consequence of $A = A_1 \cup A_2$, $B = A_1 \cup D_2$, $C = D_1 \cup A_2$, and $D = D_1 \cup D_2$, we have
$$A = f(A_1), \quad B = g(A_1), \quad C = f(D_1), \quad \text{and} \quad D = g(D_1), \tag{22}$$
and the sets $A_1$ and $D_1$ are subset-minimal sets satisfying (22).

The following example shows that the converse of Theorem 4.9 fails in general.

Example 4.10

Consider the analogical equation
$$\{a\} : \emptyset :: \{c\} : Z. \tag{23}$$
As a consequence of Theorem 4.2, the empty set is a solution of (23). In contrast, the empty set is not a solution of (23) according to Definition 4.8, as we now verify. We first observe that $A = A_1 \cup A_2$ with $A_1 = \emptyset$ and $A_2 = \{a\}$ is the only decomposition of $A$ that allows a decomposition of $\emptyset$ into $\emptyset = A_1 \cup D_2$ with $D_2 = \emptyset$. However, Definition 4.8 requires a decomposition of $\{c\}$ containing $A_2 = \{a\}$, which yields a contradiction.

There is at least one more definition of set proportions in the literature, due to Miclet et al. (2008, Definition 2.3).

Definition 4.11

For any finite sets $A, B, C, D \subseteq S' \cup T'$, define $A : B ::_M C : D$ if there are some finite sets $E$ and $F$ such that
$$B = (A \cup E) - F \quad \text{and} \quad D = (C \cup E) - F.$$
Notice that both Stroppa and Yvon (2006) and Miclet et al. (2008) define set proportions only for sets over the same universe, which seriously restricts their practical applicability. Even more problematic, Miclet et al. (2008) define set proportions only for finite sets.

We have the following implication.

Theorem 4.12

For any finite sets $A, B, C, D \subseteq S' \cup T'$, if $A : B ::_M C : D$ then $A : B :: C : D$.

Proof. For given subsets $E$ and $F$ of $S' \cup T'$ with $B = (A \cup E) - F$ and $D = (C \cup E) - F$, apply Proposition 3.8 with $g(X) = (X \cup E) - F$.

The following example shows that the converse of Theorem 4.12 fails in general.

Example 4.13

Consider the analogical equation
$$\{a\} : \{b\} :: \emptyset : Z. \tag{24}$$
As a consequence of Theorem 4.6, $\{a, b\}$ is a solution of (24). However, there are no finite sets $E$ and $F$ satisfying $\{b\} = (\{a\} \cup E) - F$ and $\{a, b\} = (\emptyset \cup E) - F$. The first equation forces $a \in F$, whereas the second forces $a \in E - F$. Hence $\{a, b\}$ is not a solution of (24) according to Definition 4.11.
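The witnesses used in Examples 4.10 and 4.13 can be checked mechanically with Python's `frozenset` operations. In this sketch, `theorem_4_2_witness` and `theorem_4_6_witness` test the identities (20) and (21) produced by the polynomials from the proofs of Theorems 4.2 and 4.6. `miclet_proportion` brute-forces Definition 4.11. One simplifying assumption is made: it searches $E$ and $F$ only among subsets of a small finite universe, so it illustrates the argument of Example 4.13 rather than proving it. All function names are hypothetical.

```python
from itertools import chain, combinations


def powerset(universe):
    """All subsets of a finite universe, as frozensets."""
    items = sorted(universe)
    return [frozenset(s) for s in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def theorem_4_2_witness(A, B, C, D) -> bool:
    """Do f(X, Y) = (A & B) | (X - Y) and g(X, Y) = (A & B) | (Y - X)
    reproduce A, B, C, D as in identities (20)?"""
    f = lambda X, Y: (A & B) | (X - Y)
    g = lambda X, Y: (A & B) | (Y - X)
    return A == f(A, B) and B == g(A, B) and C == f(C, D) and D == g(C, D)


def theorem_4_6_witness(A, B, C, D) -> bool:
    """Same for f(X, Y) = (A | B) - (Y - X), g(X, Y) = (A | B) - (X - Y),
    as in identities (21)."""
    f = lambda X, Y: (A | B) - (Y - X)
    g = lambda X, Y: (A | B) - (X - Y)
    return A == f(A, B) and B == g(A, B) and C == f(C, D) and D == g(C, D)


def miclet_proportion(A, B, C, D, universe) -> bool:
    """Brute-force Definition 4.11, searching E and F only among the
    subsets of the given finite universe (a simplifying assumption)."""
    subsets = powerset(universe)
    return any(B == (A | E) - F and D == (C | E) - F
               for E in subsets for F in subsets)


a, b, c = "a", "b", "c"
empty = frozenset()
# Example 4.10: the empty set solves {a} : {} :: {c} : Z via Theorem 4.2.
print(theorem_4_2_witness(frozenset({a}), empty, frozenset({c}), empty))  # True
# Example 4.13: {a, b} solves {a} : {b} :: {} : Z via Theorem 4.6 ...
print(theorem_4_6_witness(frozenset({a}), frozenset({b}), empty,
                          frozenset({a, b})))  # True
# ... but no E, F drawn from {a, b, c} realise Definition 4.11.
print(miclet_proportion(frozenset({a}), frozenset({b}), empty,
                        frozenset({a, b}), {a, b, c}))  # False
```

Restricting $E$ and $F$ to a finite universe is enough here, because the contradiction in Example 4.13 involves only the element $a$.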

5. Numbers

This section studies numerical proportions between integers $m$, $n$, $k$, and $\ell$. For instance, it is an immediate consequence of Proposition 3.8 that for any integers $m$ and $k$ (nonzero in the second proportion), we have
$$m : -m :: k : -k \quad \text{and} \quad m : \tfrac{1}{m} :: k : \tfrac{1}{k},$$
and, given some integer $s$,
$$m : m + s :: k : k + s \quad \text{and} \quad m : ms :: k : ks.$$
In analogy to Proposition 4.1, we have the following result.

Proposition 5.1

For any integers $m$, $n$, $k$, and $\ell$, we have
$$m : n ::_{\mathrm{tr}_1, \mathrm{tr}_2} k : \ell, \tag{25}$$
with $\mathrm{tr}_1(x, y) = x + y - y$ and $\mathrm{tr}_2(x, y) = x + y - x$, which shows that $\mathrm{tr}_1$ and $\mathrm{tr}_2$ are trivial justifications of (25). This encodes the observation that any two integers $m$ and $n$ are symmetrically related via $m = m + n - n$ and $n = n + m - m$.

The next theorem formally proves a well-known arithmetic proportion.

Theorem 5.2

If $n - m = \ell - k$ then $m : n :: k : \ell$.

Proof. Apply Proposition 3.8 with $g(x) = x + n - m$.

The following notion of numerical proportions is an instance of the more general definition for abelian semigroups due to Stroppa and Yvon (2006, Proposition 2).

Definition 5.3

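Define $m : n ::_S k : \ell$ if
$$m = m_1 + m_2, \quad n = m_1 + \ell_2, \quad k = \ell_1 + m_2, \quad \text{and} \quad \ell = \ell_1 + \ell_2.$$
For instance, with $m_1 = m_2 = 1$ and $\ell_2 = 2$, we have $m = 1 + 1$, $n = 1 + 2$, $k = \ell_1 + 1$, and $\ell = \ell_1 + 2$. Hence $2 : 3 ::_S (\ell_1 + 1) : (\ell_1 + 2)$ for every integer $\ell_1$.

We have the following implication.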

Theorem 5.4

If $m : n ::_S k : \ell$ then $m : n :: k : \ell$.

Proof. Let $m$, $n$, $k$, and $\ell$ be decomposed as in Definition 5.3. Define the polynomials $f, g \in \mathbb{Z}[x]$ by $f(x) = x + m_2$ and $g(x) = x + \ell_2$. As a consequence of $m = m_1 + m_2$, $n = m_1 + \ell_2$, $k = \ell_1 + m_2$, and $\ell = \ell_1 + \ell_2$, we have
$$m = f(m_1), \quad n = g(m_1), \quad k = f(\ell_1), \quad \ell = g(\ell_1),$$
where $m_1$ and $\ell_1$ are minimal such integers.

The following example shows that the converse of Theorem 5.4 fails in general.

Example 5.5

The numerical proportion
$$2 : 4 :: 3 : 9$$
is an instance of Proposition 3.8 with justification $g(x) = x^2$. According to Definition 5.3, we need decompositions $2 = m_1 + m_2$, $4 = m_1 + \ell_2$, $3 = \ell_1 + m_2$, and $9 = \ell_1 + \ell_2$, for some integers $m_1, m_2, \ell_1, \ell_2$. We obtain the following values for $m_1$, $m_2$, and $\ell_1$:
$$2 = m_1 + m_2 \Rightarrow m_1 = 2 - m_2,$$
$$4 = m_1 + \ell_2 \Rightarrow 4 = 2 - m_2 + \ell_2 \Rightarrow m_2 = -2 + \ell_2,$$
$$3 = \ell_1 + m_2 \Rightarrow 3 = \ell_1 - 2 + \ell_2 \Rightarrow \ell_1 = 5 - \ell_2.$$
Hence, we finally arrive at $9 = 5 - \ell_2 + \ell_2 = 5$, a contradiction.

Theorem 5.6

If $\frac{n}{m} = \frac{\ell}{k}$ then $m : n :: k : \ell$, for $m \neq 0$ and $k \neq 0$.

Proof. Apply Proposition 3.8 with $g(x) = \frac{n}{m} x$.

The following example shows how we can solve analogical equations across different domains within our framework.

Example 5.7

Let $\Sigma = \{\circ/2, c/0\}$ be a type consisting of a binary function symbol $\circ$ and a constant $c$. Let $A$ be a finite non-empty alphabet containing $a$ and $b$, and let $S = (2^A, \cup, \{b\}, \subseteq)$ and $T = (\mathbb{N}, +, 1, \leq)$ be structures of type $\Sigma$. That is, we interpret $\circ$ as union in $S$ and as addition in $T$, and we interpret the constant $c$ as the set $\{b\}$ in $S$ and as the number 1 in $T$. Now consider the analogical equation
$$\{a\} : \{a, b\} :: 1 : z.$$
Define the polynomial $g$ by $g(x) = x \circ c$. The interpretations of $g$ in $S$ and $T$ are then given by $g^S(x) = x \cup \{b\}$ and $g^T(x) = x + 1$. Hence, as a consequence of Proposition 3.8, we have
$$\{a\} : \{a, b\} :: 1 : 2.$$
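The elimination carried out in Example 5.5 generalizes. Once $m_1$ is fixed, Definition 5.3 forces $m_2 = m - m_1$, $\ell_2 = n - m_1$, and $\ell_1 = k - m_2$. The remaining equation then reads $\ell = \ell_1 + \ell_2 = n + k - m$, independently of $m_1$. Over the integers, $m : n ::_S k : \ell$ therefore holds exactly when $m + \ell = n + k$. For Example 5.5 this condition reads $11 = 7$, which fails. The Python sketch below (the function name is hypothetical) follows this elimination step by step.

```python
from typing import Optional, Tuple


def stroppa_yvon_decomposition(m: int, n: int, k: int, l: int,
                               m1: int = 0) -> Optional[Tuple[int, int, int, int]]:
    """Try to decompose m, n, k, l as in Definition 5.3 for a chosen m1.
    All other parts are forced; return None if the last equation fails."""
    m2 = m - m1          # from m = m1 + m2
    l2 = n - m1          # from n = m1 + l2
    l1 = k - m2          # from k = l1 + m2
    return (m1, m2, l1, l2) if l == l1 + l2 else None


# Example 5.5: 2 : 4 :: 3 : 9 holds in our framework via g(x) = x^2 ...
g = lambda x: x * x
print(g(2) == 4 and g(3) == 9)                  # True
# ... but no choice of m1 yields a decomposition in the sense of Definition 5.3.
print(stroppa_yvon_decomposition(2, 4, 3, 9))   # None
print(any(stroppa_yvon_decomposition(2, 4, 3, 9, m1) for m1 in range(-100, 101)))  # False
```

Since the outcome does not depend on $m1$, the default `m1=0` is as good as any other choice. The final line only confirms this over a sample range.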

6. Words

Words are ubiquitous in computer science and linguistics, and in this section we study word proportions. In the rest of this section, let $S = (\Sigma^*, \cdot, \varepsilon, \leq_{lex})$ and $T = (\Gamma^*, \cdot, \varepsilon, \leq_{lex})$, for some finite alphabets $\Sigma$ and $\Gamma$, denote the source and target domain, respectively. Here, $\cdot$ denotes concatenation of words, and $\leq_{lex}$ denotes the lexicographic ordering on words. Moreover, let $s$ and $u$ be words over the source alphabet $\Sigma$, and let $v$ and $w$ be words over the target alphabet $\Gamma$.

As an instance of Corollary 3.12, we have, for every $n \geq 1$,
$$s_1 : u_1 :: v_1 : w_1, \ \ldots, \ s_n : u_n :: v_n : w_n \implies s_1 \ldots s_n : u_1 \ldots u_n :: v_1 \ldots v_n : w_1 \ldots w_n, \tag{26}$$
provided the justifications $f_i, g_i$ of $s_i : u_i :: v_i : w_i$, $1 \leq i \leq n$, are instances of the same line of equations from (4)–(6).

Footnote: Recall from Section 2 that we may use constants to form polynomials.

Our first theorem of this section associates factorizations of words with word proportions.

Theorem 6.1

For any words $s, u, v, w \in (\Sigma \cap \Gamma)^*$, if $sw = vu$ then $s : u :: v : w$.

Proof. The equation $sw = vu$ implies that either (i) $v$ is a prefix of $s$, in which case $w$ is a suffix of $u$, or (ii) vice versa. In the first case, we have $s = vs'$ for some word $s'$ and, consequently, $vs'w = vu$, which implies $u = s'w$. Therefore, the polynomials $f, g \in (\Sigma \cap \Gamma)^*[x]$ given by $f(x) = vx$ and $g(x) = xw$ justify the word proportion $s : u :: v : w$, since
$$s = f(s'), \quad u = g(s'), \quad v = f(\varepsilon), \quad \text{and} \quad w = g(\varepsilon).$$
Here, the word $s'$ is lexicographically minimal with respect to $f$, as it is the unique solution to $s = f(s')$. In the second case, we have $v = sv'$ for some word $v'$ and, consequently, $sv'u = sw$, which implies $w = v'u$. Therefore, the polynomials $f, g \in (\Sigma \cap \Gamma)^*[x]$ given by $f(x) = sx$ and $g(x) = xu$ justify the word proportion $s : u :: v : w$, since
$$s = f(\varepsilon), \quad u = g(\varepsilon), \quad v = f(v'), \quad \text{and} \quad w = g(v').$$
Here, the word $v'$ is lexicographically minimal with respect to $f$, as it is the unique solution to $v = f(v')$.

The following examples demonstrate that word proportions may have multiple solutions.

Example 6.2

Consider the analogical equation
$$a : bc :: ac : z. \tag{27}$$
As a consequence of Theorem 6.1, $z = cbc$ is a solution of (27). To be more precise, we can apply the second case in the proof of Theorem 6.1 as follows. First observe that $ac = av'$ with $v' = c$. Then define the polynomials $f(x) = ax$ and $g(x) = xbc$, and compute
$$a = f(\varepsilon), \quad bc = g(\varepsilon), \quad ac = f(c), \quad \text{and} \quad cbc = g(c).$$
The crucial observation here is that the empty word can be inserted at an arbitrary position in $g$ to obtain the word $bc$. For instance, if we define $g'(x) = bxc$ (or, equivalently here, $g'(x) = bcx$), we again have $bc = g'(\varepsilon)$. This yields one more solution of (27), namely $z = g'(c) = bcc$.

Example 6.3

Consider the analogical equation
$$ab : ac :: bd : z. \tag{28}$$
Define the polynomial $f$ by $f(x, y) = xby$. We have $ab = f(a, \varepsilon)$ and $bd = f(\varepsilon, d)$. Now define the three non-constant polynomials
$$g_1(x, y) = xcy, \quad g_2(x, y) = xyc, \quad \text{and} \quad g_3(x, y) = yxc.$$
In all three cases, we have $ac = g_1(a, \varepsilon) = g_2(a, \varepsilon) = g_3(a, \varepsilon)$, which yields two solutions to (28), given by
$$z = g_1(\varepsilon, d) = cd \quad \text{and} \quad z = g_2(\varepsilon, d) = g_3(\varepsilon, d) = dc.$$
In this case, we see that the solution $dc$ has degree at least 2. Intuitively, we can interpret the transformation from $ab$ to $ac$ in (at least) two different ways, represented by the two solutions above. (i) We can say that we get $ac$ from $ab$ by simply replacing $b$ by $c$, which yields the transformation of $bd$ into $z = cd$. (ii) We can say that $ac$ is obtained from $ab$ by replacing $b$ by $c$ and moving $c$ to the right-hand side of $a$ (which in the first case happens to be the identity transformation), which yields the transformation of $bd$ into $z = dc$.

The following notion of word proportion is an instance of Stroppa and Yvon (2006, Definition 2), given in the more general context of semigroups.

Definition 6.4

Given $s, u, v, w \in \Sigma^*$, define $s : u ::_S v : w$ if there are decompositions $s = s_1 \ldots s_n$, $u = u_1 \ldots u_n$, $v = v_1 \ldots v_n$, and $w = w_1 \ldots w_n$, $n \geq 1$, such that
$$s_i = u_i \text{ and } v_i = w_i, \quad \text{or} \quad s_i = v_i \text{ and } u_i = w_i, \quad \text{holds for every } 1 \leq i \leq n. \tag{29}$$
For instance, the word proportions $a : a ::_S bb : bb$ and $abc : abd ::_S bbc : bbd$ are instances of Definition 6.4.

We have the following implication.

Theorem 6.5

If $s : u ::_S v : w$, then $s : u :: v : w$.

PROOF. Let $s = s_1 \cdots s_n$, $u = u_1 \cdots u_n$, $v = v_1 \cdots v_n$, and $w = w_1 \cdots w_n$, $n \geq 1$, be decompositions of $s$, $u$, $v$, and $w$ as in (29). Let $I$ be the set of indices $i$, $1 \leq i \leq n$, such that $s_i \neq v_i$ and $u_i \neq w_i$ (by (29), every $i \in I$ then satisfies $s_i = u_i$ and $v_i = w_i$), and let $m$ be the finite cardinality of $I = \{i_1, \ldots, i_m\}$. If $m = 0$, then $s = v$ and $u = w$ and, hence, $s : u :: v : w$ as an instance of (15). Otherwise, we define the polynomials $f, g \in \Sigma^*[x_{i_1}, \ldots, x_{i_m}]$ as follows. For every $i$, $1 \leq i \leq n$, if $s_i = v_i$ and $u_i = w_i$, define $f_i = s_i$ and $g_i = u_i$ and, otherwise, define $f_i = x_i$ and $g_i = x_i$; finally, define $f(x_{i_1}, \ldots, x_{i_m}) = f_1 \cdots f_n$ and $g(x_{i_1}, \ldots, x_{i_m}) = g_1 \cdots g_n$.
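The construction of $f$ and $g$ is mechanical, and the following Python sketch replays it. It assumes that words are Python strings, that a decomposition is given as four equal-length lists of factors satisfying (29), and that a polynomial is encoded as a list whose entries are constant words (strings) or variable indices (integers, 0-based). The helper names `build_justification` and `evaluate` are hypothetical and not part of the paper, and the sketch only checks the instantiation identities used below, not the lexicographic minimality argument.

```python
from typing import Dict, List, Sequence, Tuple, Union

# A polynomial factor is a constant word (str) or the index i of a variable x_i (int).
Factor = Union[str, int]


def build_justification(s: Sequence[str], u: Sequence[str],
                        v: Sequence[str], w: Sequence[str]
                        ) -> Tuple[List[Factor], List[Factor], List[int]]:
    """Build f = f_1...f_n and g = g_1...g_n from decompositions satisfying (29).

    Returns (f, g, I), where I lists the positions that became variables.
    """
    if not (len(s) == len(u) == len(v) == len(w) >= 1):
        raise ValueError("decompositions must have the same length n >= 1")
    f: List[Factor] = []
    g: List[Factor] = []
    variables: List[int] = []
    for i, (si, ui, vi, wi) in enumerate(zip(s, u, v, w)):
        if si == vi and ui == wi:
            # Constant position: f_i = s_i (= v_i) and g_i = u_i (= w_i).
            f.append(si)
            g.append(ui)
        elif si == ui and vi == wi:
            # Variable position: f_i = g_i = x_i.
            f.append(i)
            g.append(i)
            variables.append(i)
        else:
            raise ValueError(f"position {i} violates condition (29)")
    return f, g, variables


def evaluate(poly: Sequence[Factor], assignment: Dict[int, str]) -> str:
    """Instantiate a word polynomial: each variable x_i is replaced by assignment[i]."""
    return "".join(assignment[p] if isinstance(p, int) else p for p in poly)


# abc : abd ::_S bbc : bbd with the decompositions ab.c, ab.d, bb.c, bb.d
s, u, v, w = ["ab", "c"], ["ab", "d"], ["bb", "c"], ["bb", "d"]
f, g, I = build_justification(s, u, v, w)
source = {i: s[i] for i in I}  # the tuple (s_{i_1}, ..., s_{i_m})
target = {i: v[i] for i in I}  # the tuple (v_{i_1}, ..., v_{i_m})
print(evaluate(f, source), evaluate(g, source), evaluate(f, target), evaluate(g, target))
# prints: abc abd bbc bbd
```

On this input the first position becomes the shared variable and the second position the constants $c$ (in $f$) and $d$ (in $g$), so instantiating the variable with $ab$ and with $bb$ recovers all four words.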

By construction, we have
$$s = f(s_{i_1}, \ldots, s_{i_m}), \quad u = g(s_{i_1}, \ldots, s_{i_m}), \quad v = f(v_{i_1}, \ldots, v_{i_m}), \quad w = g(v_{i_1}, \ldots, v_{i_m}), \tag{30}$$
which shows that $f$ and $g$ are justifications of $s : u :: v : w$, since $\mathbf{s} = (s_{i_1}, \ldots, s_{i_m})$ and $\mathbf{v} = (v_{i_1}, \ldots, v_{i_m})$ are lexicographically minimal sequences with respect to $f$, as $\mathbf{s}$ and $\mathbf{v}$ are the unique solutions of $s = f(\mathbf{s})$ and $v = f(\mathbf{v})$, respectively.

Notice that Stroppa and Yvon (2006) define word proportions only for words over the same alphabet. We therefore cannot expect the converse of Theorem 6.5 to be true, and the following example shows that it may fail even in the case of a single alphabet.

Example 6.6

Consider the analogical equation given by
$$ab : ac :: bc : z. \tag{31}$$
The polynomials $f(x, y) = xby$ and $g(x, y) = xcy$ justify the solution $z = cc$ of (31) as an instance of (6), since $ab = f(a, \varepsilon)$, $ac = g(a, \varepsilon)$, $bc = f(\varepsilon, c)$, and $cc = g(\varepsilon, c)$. This solution formalizes the intuitive observation that $ac$ and $cc$ are obtained from $ab$ and $bc$, respectively, by replacing $b$ by $c$. This solution cannot be obtained from Definition 6.4, as the first letters of $ab$ and $ac$ are identical, whereas the first letters of $bc$ and $cc$ differ.

Theorem 6.5 and Example 6.6 show that our notion of word proportions yields strictly more reasonable solutions than Stroppa and Yvon (2006)'s notion.

We now want to compare our notion of word proportions with the one of Miclet et al. (2008). This requires some auxiliary definitions (cf. (Miclet et al., 2008, Definitions 2.6–2.8)).

Definition 6.7

We say that a word $u \in \Sigma^*$ is semantically equivalent to a word $v \in (\Sigma \cup \{\sim\})^*$ if $u$ can be obtained from $v$ by omitting the symbol $\sim$ in $v$. We write $u \approx v$ in this case.

Semantical equivalence identifies words which differ only in occurrences of the symbol $\sim$. For example, we have $ab{\sim}a{\sim}a \approx abaa$.

Definition 6.8

An alignment between four words $s, u, v, w \in \Sigma^*$ is a word over the alphabet $(\Sigma \cup \{\sim\})^4 - \{(\sim, \sim, \sim, \sim)\}$ whose projections on the first, second, third, and fourth components are semantically equivalent to $s$, $u$, $v$, and $w$, respectively.

Informally, an alignment represents a one-to-one letter correspondence between words, in which some letters $\sim$ may be inserted. For instance, an alignment between $ab$, $abc$, $acd$, and $a$ is given, componentwise, by $(a{\sim}b,\ abc,\ acd,\ a{\sim}{\sim})$.

The following definition of word proportions is due to (Miclet et al., 2008, Definition 2.9).

Definition 6.9

For any words $s, u, v, w \in \Sigma^*$ on which an analogical proportion is defined, define $s : u ::_M v : w$ if there exist four words $s', u', v', w' \in (\Sigma \cup \{\sim\})^*$ of the same length $n$, $n \geq 0$, such that

1. $s' \approx s$, $u' \approx u$, $v' \approx v$, $w' \approx w$;
2. $s'_i : u'_i ::_M v'_i : w'_i$ holds true for every $1 \leq i \leq n$.

For example, let $\Sigma = \{a, b, \alpha, \beta, A, B\}$ with the given letter proportions
$$a : b ::_M A : B \quad\text{and}\quad a : \alpha ::_M b : \beta \quad\text{and}\quad A : \alpha ::_M B : \beta. \tag{32}$$
Then (32) and the alignment $(a{\sim}BA,\ \alpha bBA,\ b{\sim}a{\sim},\ \beta ba{\sim})$ between the four words $aBA$, $\alpha bBA$, $ba$, and $\beta ba$ 'justify' the word proportion
$$aBA : \alpha bBA ::_M ba : \beta ba. \tag{33}$$
First, notice that Miclet et al. (2008) assume, in the derivation of (33), given analogical proportions of the form (32) between letters of the alphabet as 'axioms', which have no direct correspondence within our framework. More precisely, the 'axioms' in (32) have no justifications according to Definition 3.1 with respect to concatenation. However, we can extend the source and target domains by unary substitutions modeling the given axioms as follows. Define a substitution to be any one-to-one mapping $\sigma : S \to T$, homomorphically extended letter-wise to non-empty words in $S^*$. In the example above, we define $\sigma_1 = \{a \mapsto A, b \mapsto B\}$, $\sigma_2 = \{a \mapsto b, \alpha \mapsto \beta\}$, and $\sigma_3 = \{A \mapsto B, \alpha \mapsto \beta\}$. The 'axioms' in (32) can now be modeled within our framework as instances of Proposition 3.8, given the unary substitution operations $\sigma_1, \sigma_2, \sigma_3$, by
$$a : b :: \sigma_1(a) : \sigma_1(b) \quad\text{and}\quad a : \alpha :: \sigma_2(a) : \sigma_2(\alpha) \quad\text{and}\quad A : \alpha :: \sigma_3(A) : \sigma_3(\alpha). \tag{34}$$
We can now justify the word proportion in (33) according to Definition 3.1 by an iterated application of (26) to the proportions in (34), together with axiomatic letter proportions of the form
$${\sim} : b :: {\sim} : b \quad\text{and}\quad A : A :: {\sim} : {\sim} \quad\text{and}\quad B : B :: a : a. \tag{35}$$
(The proportions in (35) are instances of (15); see Theorem 3.7.) More precisely, as an instance of (26) we have
$$a : \alpha :: b : \beta \quad\text{and}\quad {\sim} : b :: {\sim} : b \quad\Longrightarrow\quad a{\sim} : \alpha b :: b{\sim} : \beta b.$$
Two more applications of (26), using the remaining proportions in (35), yield
$$a{\sim}BA : \alpha bBA :: b{\sim}a{\sim} : \beta ba{\sim}, \tag{36}$$
which is an aligned variant of (33). Lastly, remove $\sim$ from (36) to obtain (33).

Finally, notice that Miclet et al. (2008) only define word proportions between words over the same alphabet, which seriously restricts the practical applicability of their notion.

We have the following implication.

Theorem 6.10

If $s : u ::_M v : w$, then $s : u :: v : w$.

PROOF. A straightforward generalization of the reasoning pattern in the example above: given 'axioms' in the form of analogical proportions between letters of the alphabet and an alignment of $s, u, v, w$, construct the necessary substitutions, iteratively apply (26) to the given 'axioms' to obtain a proportion between aligned words, and then remove all occurrences of $\sim$ from the obtained word proportion.

As Miclet et al. (2008) define word proportions only for words over the same alphabet, we cannot expect the converse of Theorem 6.10 to be true, and the following example shows that it may fail even in the case of a single alphabet.

Example 6.11

Reconsider the analogical equation of Example 6.6 given by
$$ab : ac :: bc : z. \tag{37}$$
In Example 6.6 we have seen that $z = cc$ is a solution of (37). We try to construct an alignment of the words in (37) with $cc$ in place of $z$. First, we must have $a : a ::_M {\sim} : {\sim}$ to align the $a$'s in $ab$ and $ac$. Second, we must have $b : {\sim} ::_M b : {\sim}$ to align the $b$'s in $ab$ and $bc$. The only possibility now to align the $c$'s in $ac$ and $bc$ is to have ${\sim} : c ::_M c : {\sim}$. However, we still have both letters of $cc$ left, which requires ${\sim} : {\sim} ::_M {\sim} : c$, a contradiction.

Finally, we can formally solve the analogical equation in Example 1.1 within our framework.

Example 6.12

Reconsider the analogical equation of Example 1.1 given by $2 : 4 :: ab : z$. Let $\Sigma = \{\circ/2\}$ be a type, and let $S = (\mathbb{N}, +, \leq)$ and $T = (\{a, b\}^*, \cdot, \leq_{lex})$ be structures of type $\Sigma$; that is, we interpret $\circ$ as addition of numbers in $S$ and as concatenation of words in $T$. Now define the polynomial $g$ by $g(x) = x \circ x$. As a direct consequence of Proposition 3.8, we have $2 : 4 :: ab : abab$.
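The polynomial computations behind equation (27), Example 6.3, and Example 6.12 can be replayed mechanically. The Python sketch below assumes that words are Python strings (with $\varepsilon$ as the empty string) and models each polynomial as a function of an interpretation `op` of the binary symbol $\circ$ followed by its arguments. The helper `solve_via` is hypothetical: it checks $a = f(\mathbf{e})$, $b = g(\mathbf{e})$ in the source and $c = f(\mathbf{e}')$ in the target, then returns the candidate $g(\mathbf{e}')$. It does not test the maximality condition on sets of justifications, so it confirms the computations rather than establishing solutions in the full sense of the paper. For Example 6.12 it pairs $2$ with $ab$ via the projection $f(x) = x$, which is an assumption of this sketch.

```python
import operator
from typing import Any, Callable, Tuple

Op = Callable[[Any, Any], Any]

# Interpretations of the binary symbol ∘ (words are Python strings; ε is "").
concat: Op = lambda x, y: x + y  # concatenation of words
add: Op = operator.add           # addition of natural numbers


def solve_via(f: Callable[..., Any], g: Callable[..., Any], a: Any, b: Any, c: Any,
              src: Tuple, tgt: Tuple, op_src: Op = concat, op_tgt: Op = concat) -> Any:
    """Check a = f(src), b = g(src) in the source and c = f(tgt) in the target,
    then return the candidate z = g(tgt).

    Only these polynomial identities are checked; the maximality condition
    behind the paper's notion of solution is NOT tested.
    """
    if (f(op_src, *src), g(op_src, *src), f(op_tgt, *tgt)) != (a, b, c):
        raise ValueError("f and g do not fit a : b :: c : z for these arguments")
    return g(op_tgt, *tgt)


# Each polynomial takes the interpretation `op` of ∘ followed by its arguments.
# Equation (27): a : bc :: ac : z with f(x) = ax, g(x) = xbc and g'(x) = bxc.
f27 = lambda op, x: op("a", x)
g27 = lambda op, x: op(op(x, "b"), "c")
g27_alt = lambda op, x: op(op("b", x), "c")
print(solve_via(f27, g27, "a", "bc", "ac", ("",), ("c",)))      # cbc
print(solve_via(f27, g27_alt, "a", "bc", "ac", ("",), ("c",)))  # bcc

# Example 6.3: ab : ac :: bd : z with f(x, y) = xby and g1, g2, g3.
f28 = lambda op, x, y: op(op(x, "b"), y)
g1 = lambda op, x, y: op(op(x, "c"), y)  # xcy
g2 = lambda op, x, y: op(op(x, y), "c")  # xyc
g3 = lambda op, x, y: op(op(y, x), "c")  # yxc
print([solve_via(f28, gk, "ab", "ac", "bd", ("a", ""), ("", "d")) for gk in (g1, g2, g3)])
# ['cd', 'dc', 'dc']

# Example 6.12: 2 : 4 :: ab : z, with ∘ read as + in the source and as
# concatenation in the target; the pairing f(x) = x is an assumption of this sketch.
ident = lambda op, x: x
double = lambda op, x: op(x, x)  # g(x) = x ∘ x
print(solve_via(ident, double, 2, 4, "ab", (2,), ("ab",), op_src=add, op_tgt=concat))  # abab
```

Passing the interpretation of $\circ$ as an argument is what lets the same polynomial $g(x) = x \circ x$ be evaluated both in $(\mathbb{N}, +)$ and in words, mirroring the distinction between a polynomial and its interpretations in $S$ and $T$.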

7. Conclusion

This paper contributed to the foundations of artificial general intelligence by introducing an abstract algebraic framework of analogical proportions in the general setting of universal algebra. This enabled us to compare mathematical objects, possibly across different domains, in a uniform way, which is crucial for AI systems. We then showed that our framework yields strictly more reasonable solutions than two prominent models from the literature, namely Stroppa and Yvon (2006)'s and Miclet et al. (2008)'s, in the concrete domains of sets, numbers, and words, which provides evidence for the applicability of our framework.

In a broader sense, this paper is a first step towards an algebraic theory of analogical reasoning and learning with potential applications to fundamental AI problems like commonsense reasoning and computational learning and creativity.

7.1. Future Work

This theoretical paper studies some fundamental properties of analogical proportions within the general domain of universal algebra and within the specific domains of sets, numbers, and words. In the future, we wish to expand this study to other domains relevant for computer science and artificial intelligence, for instance, trees, graphs, automata, and neural networks.

The main task for future research is to develop algorithms for computing some or all solutions to analogical equations as defined in this paper. At its core, this requires algebraic methods for constructing and solving algebraic equations of the form (4)–(6). For instance, in the arithmetic setting of Section 5, this task amounts to constructing some or all polynomials (i.e., justifications) $f$ and $g$ with integer coefficients, given some positive integers $m$, $n$, and $k$, such that some line of the Diophantine equations in (4)–(6) has integer solutions, which is non-trivial.

Another interesting line of research relevant in practice is to study solutions to analogical equations justified by minimal or least general generalizations, which is closely related to anti-unification (modulo an equational theory, or E-generalization) (cf. Plotkin (1970); Reynolds (1970); Burghardt (2003)). More precisely, in this paper we have not restricted the space of generalizations, as we were interested in all possible justifications of analogical proportions. However, in practice it will be necessary to restrict the search space of justifications in order to compute solutions to analogical equations efficiently, and developing techniques for computing minimal or least general justifications will be a first step in this direction.

From a practical point of view, applying our model to various AI-related problems such as commonsense reasoning, formalizing metaphors, and learning by analogy is interesting, and we are convinced that promising results will follow.

From a mathematical point of view, relating analogical proportions to other concepts of universal algebra (e.g., congruences, homomorphisms, and isomorphisms) and related subjects is an interesting line of research. Moreover, studying analogical proportions in more abstract mathematical structures, for example, various kinds of lattices, semigroups and groups, and rings, is particularly interesting in the case of proportions between objects from different domains.

Arguably, the most prominent (symbolic) model of analogical reasoning to date is Gentner (1983)'s Structure-Mapping Theory (or SMT), first implemented by Falkenhainer et al. (1989). Our approach shares with Gentner's SMT its symbolic nature. However, while in SMT mappings are constructed with respect to meta-logical considerations (for instance, Gentner's systematicity principle prefers connected knowledge over independent facts), in our framework 'mappings' are realized via analogical proportions satisfying mathematically well-defined properties. We leave a more detailed comparison between our algebraic approach and Gentner's SMT as future work.

Formal models of analogical proportions started to appear only very recently, and in this paper we extensively compared our model with two prominent models from the literature, namely Miclet et al. (2008)'s and Stroppa and Yvon (2006)'s algebraic models, and showed that our model yields strictly more solutions in the concrete domains of sets, numbers, and words. We expect similar results in other domains where the models of Stroppa and Yvon (2006) and Miclet et al. (2008) are applicable.

A conceptually similar approach to solving analogical word equations is given by Dastani et al. (2003). At this point, it is not entirely clear how the simple framework formulated in this paper relates to the rather complicated model of Dastani et al. (2003), which is built on top of concepts such as "gestalts" of sequential patterns, structural information theory (SIT), algebraic coding systems for SIT, information load, representation systems, local homomorphisms, and constraints. We challenge the reader to find instances where the model of Dastani et al. (2003) is more expressive than our model in the word domain, which would (partially) justify their heavy machinery. To give a glimpse of what we mean, consider the following simple example (cf. (Navarrete and Dartnell, 2017, p. 4)).

Example 7.1

Let $S = T = \{a, b, c, d\}$ be linearly ordered via
$$a <_S^{lex} b <_S^{lex} c <_S^{lex} d \quad\text{and}\quad d <_T^{lex} c <_T^{lex} b <_T^{lex} a, \tag{38}$$
extended to words lexicographically. We assume unary 'successor' functions $succ_S$ and $succ_T$ preserving the linear orderings in (38), e.g., $succ_S(a) = b$, $succ_S(d) = d$, $succ_T(d) = c$, and so on. Consider the analogical equation
$$abc : abcd :: dcb : z. \tag{39}$$
This equation asks for a word which is to $dcb$ what $abcd$ is to $abc$. Observe that we obtain $abcd$ from $abc$ by concatenating the successor of $c$ at the end of $abc$. Therefore define the polynomials $f, g \in (S^* \cap T^*)[w, x, y]$ by
$$f(w, x, y) = wxy \quad\text{and}\quad g(w, x, y) = wxy \cdot succ(y),$$
where $succ$ is a new function symbol interpreted as $succ_S$ and $succ_T$ in $S$ and $T$, respectively. We have $abc = f^S(a, b, c)$, $abcd = g^S(a, b, c)$, $dcb = f^T(d, c, b)$, and, consequently, the solution $z = g^T(d, c, b) = dcb \cdot succ_T(b) = dcba$. (Here, it is important to distinguish between the function symbol $succ$ and its associated interpretation functions $succ_S$ and $succ_T$, and between the polynomials $f, g$ and the interpretation functions $f^S, g^S, f^T, g^T$; cf. Section 2.)

Dastani et al. (2003) obtain the same solution in a different way: by using the algebras generated by the letters in $T$ and operators (in their terminology, "gestalts") such as "iteration", "successor", "symmetry", and "alternation", together with "representation systems" and further machinery, and, finally, by computing the solution $dcba$ via "local homomorphisms".

Acknowledgments

This work has been supported by the Austrian Science Fund (FWF) project P31063-N35.

References