A Note on Argumentative Topology: Circularity and Syllogisms as Unsolved Problems
Wlodek W. Zadrozny, [email protected]
College of Computing, University of North Carolina at Charlotte
School of Data Science, University of North Carolina at Charlotte
January 25, 2021
Abstract
In the last couple of years there have been a few attempts to apply topological data analysis to text, and in particular to natural language inference. A recent work by Tymochko et al. suggests the possibility of capturing 'the notion of logical shape in text' using 'topological delay embeddings,' a technique derived from dynamical systems, applied to word embeddings. In this note we reconstruct their argument and show, using several old and new examples, that the problem of connecting logic, topology and text is still very much unsolved. We conclude that there is no clear answer to the question: "Can we find a circle in a circular argument?" We point out some possible avenues of exploration. The code used in our experiments is also shown.
This note describes our attempt to reconstruct the recent work by Tymochko et al. [12], which suggests the possibility of capturing 'the notion of logical shape in text' using 'topological delay embeddings,' a technique derived from dynamical systems and applied to word embeddings. The authors argue that using topological techniques it might be possible to "find a circle in a circular argument." The authors say [12]:

We were originally motivated by the question "Why do we call a circular argument 'circular'?" A circular argument is one that logically loops back on itself. This intuitive definition of why the argument is circular actually has an analogous and mathematically precise definition from a topological perspective. Topology is the mathematical notion of shape and a topological circle can be abstractly defined as any shape which starts and loops back on itself (i.e. a circle, a square, and a triangle are all topologically a circle).

This is an interesting problem, and to address it, the cited work uses a new technique of
Topological Word Embeddings (TWE). The method combines topological data analysis (TDA) with mathematical techniques from dynamical systems, namely time-delayed embeddings. The cited article suggests a positive answer to the motivating question. However, in this note we argue, using similar examples, that finding circularity using TDA might be more complicated. We describe our attempt, after reconstructing the TWE method, to apply it to circular and non-circular examples. We observe that the method often fails to distinguish the two cases. Thus, while in the cited work [12] and elsewhere [8] we have some interesting examples connecting topological data analysis and inference, the problem of connecting reasoning with topological data analysis is, in our opinion, still open.

This note is organized as follows. We dispense with the preliminaries, and instead refer the reader who is not familiar with topological data analysis to the original article [12] for the TDA background and the description of the experiment which motivated this reconstruction and extension. Similarly, since word embeddings are a standard natural language representation technique, we only need to mention that they consist in replacing words (or terms) by 50- to 1000-dimensional vectors, which have been shown to capture, e.g., word similarities. Again we refer the reader to the standard sources, e.g. [5, 6]. Finally, we make no attempt to put this note into the broader context of TDA-based research, and we provide only a bare minimum of references.

In Section 2 we briefly describe the method, as well as the software and settings used in our experiment. Section 3 is all about examples showing the instability of the TWE method; that is, a slight change in parameters results in markedly different persistence diagrams.
We finish, in Section 4, with a discussion and propose some avenues of research on connecting TDA and inference. The Appendix has the code we used for these experiments.

The purpose of our experiment was to see whether circularity of arguments corresponds to the presence of loops in a representation of the sentences in the arguments, as reported in [12]. Briefly, the representation to which TDA is applied is created by the following steps:

1. The words in the sentences are replaced by their embedding vectors.
2. Since the words occur in a sequence, the embedding vectors can be viewed as a series of vectors.
3. To get a one-dimensional time series, we compute the dot product of each vector with a random vector of the same dimension.
4. This time series is then processed using a time-delay embedding method.
5. Finally, TDA is applied and the persistence computed and displayed.

These are exactly the steps in Fig. 1 in [12]. However, we need to clarify some aspects of our experiment, since the original paper does not mention certain details. The following list refers to the steps above:

1. We used 50-, 100-, 200- and 300-dimensional GloVe embedding vectors [6]. GloVe vectors were also used in [12].
2. (Nothing to add here.)
3. We used the numpy dot product, and a random seed of 42 in most experiments, but we also used the seeds 1, 2 and 3 (to show that changing the seed changes the persistence diagrams).
4. We used the function takensEmbedding from the public repository available on Kaggle ( ). It implements the Takens time-delay embedding method [10]. However, we did not investigate the optimal parameters, and instead used a time delay of 2 and dimension 2, as in the original paper. The cited Kaggle code allows us to search for optimal parameters, and we intend to use it in further experiments. In this note, we also looked into (2, 3) and (3, 2) as the time delay and dimension of the Takens embeddings, and observed that such changes can significantly impact the patterns shown in persistence diagrams.
5.
For TDA and persistence we used the Python version (0.6) of Ripser [11], https://ripser.scikit-tda.org/en/latest/.

In this section we first look at persistence diagrams for circular reasoning. We compare them with examples with very similar words and structures which are not circular. Then we perform the same exercise for syllogisms. Two examples are from the cited paper. The other ones are ours, and are intended to test the influence of changes in vocabulary, paraphrase, and patterns of reasoning.

• Circular ([12]): "There is no way they can win because they do not have enough support. They do not have enough support because there is no way they can win."

• Circular (ours, word substitution in the previous one): "There is no way for the crew to win because the crew does not have good rowers. The crew does not have good rowers because there is no way for the crew to win."

• Circular (ours, modified paraphrase of the above): "The Russian crew must lose because the coach did not hire Siberian rowers. The team did not enlist the good Siberian rowers because there is no way for the Russian crew to win."

• Non-circular ([12]): "There is no way they can win if they do not have enough support. They do not have enough support, so there is no way they can win."

• Non-circular (ours): "No way the anarchists can lose the primary elections if they have enough support. The anarchists have strong support, so there is no way they can lose the primaries."

• Inductive reasoning (ours): "Gold is going up. Platinum is also going up. We should buy all metals, including copper and silver."

• Syllogism (ours): "Every animal is created by evolution. The lion is an animal. The lion is created by evolution."

• Absurd (random text, [12]): "The body may perhaps compensates for the loss of a true metaphysics. Yeah, I think it's a good environment for learning English. Wednesday is hump day, but has anyone asked the camel..."
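The five-step pipeline described above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code: the function names, the use of numpy's default_rng, and dropping out-of-vocabulary tokens are our choices, and step 5 would pass the resulting point cloud to Ripser's `ripser` function, whose import we omit here.

```python
import numpy as np

def takens_embedding(series, delay=2, dimension=2):
    """Takens time-delay embedding of a 1-D series: each point is
    (x[i], x[i+delay], ..., x[i+(dimension-1)*delay])."""
    n = len(series) - (dimension - 1) * delay
    return np.array([[series[i + j * delay] for j in range(dimension)]
                     for i in range(n)])

def text_to_point_cloud(tokens, embeddings, dim=50, seed=42,
                        delay=2, dimension=2):
    """Steps 1-4: look up word vectors (step 1-2), project them onto a
    random vector to get a one-dimensional time series (step 3), and
    delay-embed the series (step 4). Step 5 would be, e.g.,
    ripser(cloud)['dgms'] to compute and plot the persistence diagrams."""
    rng = np.random.default_rng(seed)
    vectors = np.array([embeddings[t] for t in tokens if t in embeddings])
    random_vec = rng.standard_normal(dim)
    series = vectors @ random_vec      # step 3: dot products
    return takens_embedding(series, delay, dimension)
```

With the defaults above (delay 2, dimension 2), a sentence of T in-vocabulary tokens yields a point cloud of T - 2 planar points.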
Not only do we fail to detect any particular correlation between circularity and persistence, but we also see that persistence, for the same sentences, looks very different for 50-, 100-, 200- and 300-dimensional GloVe vectors. In addition, we see the same patterns for valid reasoning (e.g. a syllogism) and for inductive reasoning. Inductive reasoning usually is not logically valid, although depending on the context it might be plausible. We might have seen some influence of less frequent words on the persistence diagrams, but we are not sure.

And contrary to [12], we see that the absurd, random text cited above can exhibit pretty much any pattern of persistence, depending on the parameters used to find the persistent homology. The conclusion we draw from this exercise is that circular, non-circular and absurd modes of argument might exhibit the same visual properties derived from persistent homology, and therefore cannot be distinguished by looking for "a circle in a circular argument."

(a) (g:50; seed:1) (b) (g:50; seed:2) (c) (g:50; seed:3)
Figure 1:
Persistence diagrams for the circular text from [12]: "There is no way they can win because they do not have enough support. They do not have enough support because there is no way they can win." With different seeds (1, 2, 3) we get different patterns, even for the same embedding dimension. Here the dimension of embeddings is 50 (g:50), but the same phenomenon occurs for other dimensions.

(a) (g:200; seed:42) (b) (g:100; seed:42) (c) (g:300; seed:42)
Figure 2:
Persistence diagrams for two circular text examples. Panel (a), "There is no way for the crew to win because the crew does not have good rowers. The crew does not have good rowers because there is no way for the crew to win.", makes simple vocabulary substitutions to the circular sentences from [12]. Panels (b) and (c) show the patterns obtained with additional changes: "The Russian crew must lose because the coach did not hire Siberian rowers. The team did not enlist the good Siberian rowers because there is no way for the Russian crew to win."
Figure 1 shows persistence diagrams for the circular text example from [12]: "There is no way they can win because they do not have enough support. They do not have enough support because there is no way they can win." It shows that with different seeds (1, 2, 3) we get different patterns, even for the same embedding dimension. In the pictured examples the dimension of embeddings is 50 (g:50), but the same phenomenon occurs for other dimensions. As in [12], we use the Takens embedding dimension = 2 and the Takens embedding delay = 2.

Figure 2 shows persistence diagrams for two circular text examples. Panel (a) makes simple vocabulary substitutions to the circular sentences from [12]: "There is no way for the crew to win because the crew does not have good rowers. The crew does not have good rowers because there is no way for the crew to win." Panels (b) and (c) show the patterns obtained with a few more changes: "The Russian crew must lose because the coach did not hire Siberian rowers. The team did not enlist the good Siberian rowers because there is no way for the Russian crew to win."
These examples show that it is possible to get a 'random' pattern by changing the vocabulary of the sentences, and such patterns can be 'random' or not, depending on the dimension of embeddings. As before, we use the Takens embedding dimension = 2 and the Takens embedding delay = 2.

Based on the examples shown in Figures 1 and 2, we observe that, contrary to the hypothesis proposed in [12], circular reasoning patterns can produce persistent homology patterns associated with random text. In Section 3.3 we will see that the opposite is true as well.

(a) Non-circular: (g:200; seed:42) (b) Induction: (g:50; seed:42) (c) Syllogism: (g:50; seed:42)
Figure 3:
Persistence diagrams for three non-circular text examples. Panel (a) represents "No way the anarchists can lose the primary elections if they have enough support. The anarchists have strong support, so there is no way they can lose the primaries.", modeled after the non-circular example of [12] but with changed vocabulary and a less repetitive pattern. It shows that a valid argument can result in a 'random pattern.' Panel (b) shows an example of inductive reasoning: "Gold is going up. Platinum is also going up. We should buy all metals, including copper and silver."
Panel (c) shows the pattern obtained from a syllogism: "Every animal is created by evolution. The lion is an animal. The lion is created by evolution."
Even though we see some similarities between (b) and (c), in other dimensions such similarities might not appear. As before, we use the Takens embedding dimension = 2 and the Takens embedding delay = 2.
Again, despite repeated experiments with different parameters, we cannot see any persistent patterns of homological persistence (pun intended). The examples show that it is possible to get a 'random' pattern by changing the vocabulary of non-circular sentences, and such patterns can look 'random' or not, depending on the dimension of embeddings.

(a) (g:100, tdim:2, tdel:2) (b) (g:200, tdim:2, tdel:2) (c) (g:300, tdim:2, tdel:2)
(d) (g:100, tdim:3, tdel:2) (e) (g:200, tdim:3, tdel:2) (f) (g:300, tdim:3, tdel:2)
(g) (g:100, tdim:2, tdel:3) (h) (g:200, tdim:2, tdel:3) (i) (g:300, tdim:2, tdel:3)
Figure 4:
This figure shows persistence diagrams for the random text example from [12]: "The body may perhaps compensates for the loss of a true metaphysics. Yeah, I think it's a good environment for learning English. Wednesday is hump day, but has anyone asked the camel..."
The first parameter is the dimension of the (GloVe) word embedding vectors; the second parameter is the dimension of the Takens embedding; and the third parameter is the delay of the Takens embedding.

3.3 Random text

The cited work uses the following example to argue that random text produces a chaotic display of persistence:
The body may perhaps compensates for the loss of a true metaphysics. Yeah, I thinkit’s a good environment for learning English. Wednesday is hump day, but has anyone asked thecamel...
As shown in Figure 4, there seems to be no discernible pattern in the persistence diagrams for this text. Depending on the choice of parameters, the same random text can produce virtually no signal, as in panel (b); produce a 'random' signal, as in panel (d); produce a relatively strong signal, as in panels (c) and (g); and produce patterns in between in the remaining panels.
The first point we want to make is that the examples shown in the previous section suggest that there is no clear relationship between persistence and the circularity of a textual argument, contrary to the hypothesis proposed in [12]. Nor does there seem to be such a relationship for random text.

The second point is that our counter-examples do not prove that there are no significant relations between topological features and circularity. An even stronger point we want to make is that our intuitions agree with [12], and that such relationships are plausible. We hope and expect them to be discovered eventually, but we feel that the methods will likely be more subtle. Our mathematical intuition points to the fact that topology and logic are connected through Heyting/Brouwer algebras (see e.g. [7, 9]). On the TDA side, our earlier and current research [3, 8, 4, 2] suggests that the contribution of topology to classification and/or inference may lie in augmenting other methods (and not necessarily being the center of the show, as in [1]). In particular, it is possible we could see some statistical dependencies between persistence and circularity, if we ever ran proper testing on a dataset of circular and non-circular arguments. Alas, we do not think such a dataset exists. Thus, we end this short note by pointing to these two avenues of exploration, one mathematical and one experimental.
Note:
The references below are very incomplete, and we suggest that readers consult the list of references in [12] for a more appropriate introduction to topological data analysis.
References

[1] Pratik Doshi and Wlodek Zadrozny. Movie genre detection using topological data analysis. In International Conference on Statistical Language and Speech Processing, pages 117-128. Springer, 2018.

[2] Shafie Gholizadeh, Ketki Savle, Armin Seyeditabari, and Wlodek Zadrozny. Topological data analysis in text classification: Extracting features with additive information. arXiv preprint arXiv:2003.13138, 2020.

[3] Shafie Gholizadeh, Armin Seyeditabari, and Wlodek Zadrozny. Topological signature of 19th century novelists: Persistent homology in text mining. Big Data and Cognitive Computing, 2(4):33, 2018.

[4] Shafie Gholizadeh, Armin Seyeditabari, and Wlodek Zadrozny. A novel method of extracting topological features from word embeddings. arXiv preprint arXiv:2003.13074, 2020.

[5] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111-3119, 2013.

[6] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532-1543, 2014.

[7] Helena Rasiowa and Roman Sikorski. The Mathematics of Metamathematics. PWN, Polish Scientific Publishers, Warsaw, 1963.

[8] Ketki Savle, Wlodek Zadrozny, and Minwoo Lee. Topological data analysis for discourse semantics? In Proceedings of the 13th International Conference on Computational Semantics - Student Papers, pages 34-43, 2019.

[9] R. Sikorski. Applications of topology to foundations of mathematics. General Topology and its Relations to Modern Analysis and Algebra, pages 322-330, 1962.

[10] Floris Takens. Detecting strange attractors in turbulence. In D.A. Rand and L.-S. Young, editors, Dynamical Systems and Turbulence, volume 898 of Lecture Notes in Mathematics, pages 366-381. Springer-Verlag, New York, 1981.

[11] Christopher Tralie, Nathaniel Saul, and Rann Bar-On. Ripser.py: A lean persistent homology library for Python. Journal of Open Source Software, 3(29):925, 2018.

[12] Sarah Tymochko, Zachary New, Lucius Bynum, Emilie Purvine, Timothy Doster, Julien Chaput, and Tegan Emerson. Argumentative topology: Finding loop(holes) in logic. arXiv preprint arXiv:2011.08952, 2020.

Code Appendix
This code runs on Google Colab under the assumption that GloVe vectors are installed in a particular directory. Therefore, to run it, the paths to the word embeddings should be changed appropriately. The next four figures have all the commands required to replicate or extend our results. However, the code for entities with a 200 in their names should be replicated (and adjusted) for embedding vectors of different sizes. For dimension 200 the code runs exactly in its current form.

Figure 5:
Step 1.
First, we install the preliminaries, and then the code which will be used to produce the Takens embeddings. The source of this function is given in the comment.
Step 2.
Data preparation. Getting the embedding vectors, creating a map between words and vectors. (Models for other dimensions are created in exactly the same way.) The glove function maps tokens into embedding vectors.
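This data-preparation step can be sketched as follows. This is a minimal sketch, not the appendix code itself: the function name load_glove is ours, and we assume the standard plain-text format of the pre-trained GloVe files (one word followed by its vector components per line).

```python
import numpy as np

def load_glove(path):
    """Parse a plain-text GloVe file into a {word: vector} map.
    Each line has the form: word v1 v2 ... vN."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            # first field is the token, the rest are the coordinates
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors
```

Looking up a token is then a dictionary access, e.g. `vectors["win"]`, returning the 50-, 100-, 200- or 300-dimensional embedding depending on which file was loaded.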
Step 3.
The time series are obtained by projections onto a random vector (as in [12]). The next panel allows interactive exploration of persistence diagrams for different arguments, and the saving of the images. An example of using it is shown (without the output).
Optional:
The method display argument3 enables the user to interactively explore persistence diagrams with arbitrary parameters of the Takens embeddings.