Architecture and evolution of semantic networks in mathematics texts
Nicolas H. Christianson, Ann Sizemore Blevins, Danielle S. Bassett
John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138, USA
Department of Bioengineering, School of Engineering & Applied Science, University of Pennsylvania, Philadelphia, PA 19104, USA
Department of Physics & Astronomy, College of Arts & Sciences, University of Pennsylvania, Philadelphia, PA 19104, USA
Department of Electrical & Systems Engineering, School of Engineering & Applied Science, University of Pennsylvania, Philadelphia, PA 19104, USA
Department of Neurology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
Santa Fe Institute, Santa Fe, NM 87501, USA

∗ [email protected]

September 26, 2019

Abstract.
Knowledge is a network of interconnected concepts. Yet, precisely how the topological structure of knowledge constrains its acquisition remains unknown, hampering the development of learning enhancement strategies. Here we study the topological structure of semantic networks reflecting mathematical concepts and their relations in college-level linear algebra texts. We hypothesize that these networks will exhibit structural order, reflecting the logical sequence of topics that ensures accessibility. We find that the networks exhibit strong core-periphery architecture, where a dense core of concepts presented early is complemented with a sparse periphery presented evenly throughout the exposition; the latter is composed of many small modules each reflecting more narrow domains. Using tools from applied topology, we find that the expositional evolution of the semantic networks produces and subsequently fills knowledge gaps, and that the density of these gaps tracks negatively with community ratings of each textbook. Broadly, our study lays the groundwork for future efforts developing optimal design principles for textbook exposition and teaching in a classroom setting.

Introduction

Knowledge has been distilled into formal representations for millennia [1, 2]. Such efforts have sought to explain human reasoning and support artificial intelligence [3, 4, 5]. Semantic networks organize information by detailing concepts and their relations as the nodes and edges of a graph [6]. In an educational context, concept maps reflect students’ understanding of information in a similar manner, but may be used to evaluate comprehension [7, 8, 9, 10] and identify topics that are most difficult to connect to other concepts [11].
With the capacity to construct semantic networks, concept maps, and similar formal representations of knowledge comes the challenge of distilling mechanisms of knowledge acquisition.

Network science offers an appropriate conceptual language and useful mathematical toolset with which to meet this challenge [12]. In the parlance of network science, semantic networks of language tend to exhibit highly ordered architectures with strong local clustering, relatively short paths between any pair of nodes, and a few hubs, which are connected to an unexpectedly large number of other nodes [6, 13]. Recent work using highly stylized laboratory experiments provides some preliminary evidence that network structure may play a role in how humans process information [14] and acquire knowledge [15, 16, 17]. Yet extending these findings to the real world has proven difficult, and it remains unknown precisely how the network structure of knowledge in the form of science textbooks [18], science and mathematics topics on Wikipedia [19], and even formal scientific papers [20, 21] impacts the learnability of these content domains. Furthermore, as learning is a process, studies of semantic network architecture would benefit from evaluating a network’s dynamic structure as it unfurls over the course of presentation, exposition, or acquisition. The education literature establishes that the order in which topics are introduced can help or hinder learning at this level [22, 23], but a rigorous understanding of order and dynamic structure in knowledge acquisition has not been formalized in ecologically valid experimental settings.

Here we seek to address these limitations by studying semantic networks of mathematical concepts in linear algebra textbooks [24, 25]. A common college-level course, the subject is rigorous and logical, sequentially introducing concepts that naturally relate to, depend on, and follow from other concepts.
To begin, we seek to understand the structure of these inter-concept relations in textbooks, which present the knowledge in a thoughtfully ordered and comprehensive exposition. While each author may introduce and relate topics in a different order, we assume that each text serves to elucidate and approximate the latent structure of the domain of knowledge it conveys. Using techniques from network science [12], we test the hypothesis that these semantic networks exhibit structural order, indicating a logical sequence of topics that ensures accessibility. Motivated by a recent report that language acquisition proceeds through an ordered progression filling knowledge gaps [26], we use persistent homology [27, 28, 29, 30] to track the growth and development of topological cavities in the semantic network. We predict that fewer knowledge gaps will exist in the texts than in null models of randomly growing semantic networks; withholding connections between topics that have already been taught is unlikely to effectively convey knowledge. Finally, we compare the growth of semantic networks elicited from multiple texts, in terms of their different expositional structures and topic orderings. We hypothesize that the degree to which knowledge gaps are created and persist within texts may be related to the complexity or difficulty of a text, and to the knowledge it conveys. Broadly, our quantitative evaluation of the differing structures and expositional layouts of distinct textbooks provides a foundation for future work examining the effects of topic ordering and network architecture on classroom learning.
Results
We constructed semantic networks and expositional growing networks from 10 linear algebra textbooks (see Methods). We first used a modified version of the RAKE algorithm [31] to identify significant phrases (Fig. 1, step 1), which we refer to collectively as the index list of concepts. We represent these concepts as nodes, and connect two nodes by an edge if their corresponding concepts co-occur within the same sentence (Fig. 1, step 2). To mimic the growth of a reader’s knowledge network, we add nodes and edges as soon as they are mentioned in the book (Fig. 1, step 3). Across textbooks, node sets ranged in length from 146 to 453 (average 279.4) and edge densities ranged from 0.0748 to 0.204 (average 0.129). In what follows, we characterize the semantic network growth of all texts, and when useful we give examples from individual texts referred to by author last name.

Figure 1: Extracting growing semantic networks from textbooks. (1)
The index set is populated with phrases conveying significant mathematical concepts. (2)
Any index nodes that co-occur within the same sentence are connected by an edge. (3)
This procedure is applied to each sentence in the exposition, forming a view of the semantic network as it grows throughout the text.

Meso-scale structure of semantic networks
Mathematics as a field and linear algebra as a subject contain many fundamental topics and conceptual connections between those topics. Practitioners and authors might contest which topics are fundamental, and which are more tangential, or less strongly linked to the rest. Within a network, this organizational scheme can manifest as core-periphery structure, where fundamental concepts are densely connected to one another, while peripheral concepts connect to the core but not to one another (Fig. 2a). To assess this structure in a semantic network constructed from the whole text, we calculate the core-periphery statistic and compare statistic values to those obtained from two null models (Fig. 2c): (i) a random index null model, in which random words from each text are used to generate an expositional network, and (ii) a continuous configuration model, in which the original network is rewired while maintaining node degree and strength. Generally, we observe that the empirical semantic networks show greater core-periphery organization than the continuous configuration model, suggesting the presence of a strongly connected core of topics along with a set of sparsely connected periphery topics, given the degree and strength distributions. Interestingly, we also observe that the empirical networks show less core-periphery organization than the random index model, indicating that the networks of math terms are more homogeneous than a network of randomly chosen words.

We next investigate the internal structure of the core and periphery. For the core, we find that across texts many similar words participate, including ‘determinant’, ‘vector space’, and ‘matrix’ as expected (see Supplementary Table S3). In contrast, we expect that the periphery contains terms more specific to a given book and its particular sub-topics. We therefore hypothesize that the periphery will display community structure (Fig. 2d).
To test our hypothesis, we calculate the modularity of the periphery subnetwork, along with the relevant subnetworks of the random index and continuous configuration null models. We observe that the periphery of each semantic network generally exhibits a modular organization that is stronger than that of the continuous configuration model, but weaker than that of the random index model (Fig. 2e). Intuitively, while randomly chosen words may display strong modularity due to greater variation in semantic relationships and frequencies, mathematics phrases are used in a more modular fashion than expected from the rewired continuous configuration model, perhaps due to the nature of focusing on one general idea at a time in chapters and sections.
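This comparison can be sketched in a few lines. In the toy illustration below, a small graph stands in for a text's semantic network, and networkx's `greedy_modularity_communities` and `double_edge_swap` stand in for whichever community-detection and rewiring routines the analysis actually used:

```python
import networkx as nx
from networkx.algorithms import community

def periphery_modularity(G, core_nodes):
    """Modularity of the subnetwork induced on the non-core (periphery) nodes."""
    P = G.subgraph(n for n in G if n not in core_nodes)
    comms = community.greedy_modularity_communities(P)
    return community.modularity(P, comms)

def rewired_null(G, seed=0):
    """Degree-preserving null model: repeatedly swap pairs of edge endpoints."""
    H = G.copy()
    nx.double_edge_swap(H, nswap=4 * H.number_of_edges(), max_tries=10**5, seed=seed)
    return H
```

Comparing `periphery_modularity(G, core)` against the same quantity on `rewired_null(G)` mirrors the continuous-configuration comparison in spirit, though the model used in the paper also preserves node strength.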
Expositional development of the large-scale structure
How does the identified network structure develop along a text’s exposition? We find that the expositional introduction of nodes in the final network’s core precedes the introduction of periphery nodes throughout the exposition (Fig. 3a). We quantify this observation by calculating the area between the core and periphery node introduction curves; high values indicate that the core appears much earlier than the periphery, and low values indicate that the core and periphery appear at a more equal rate. The areas range from 0.064 to 0.20 across texts, and a one-sample t-test rejects the null hypothesis that these values are drawn from a distribution with mean 0 (t = 8.…, p = 1.… × 10⁻…; calculated with the SciPy library, version 1.1.0 [33]). Next, we compare the areas obtained from the texts to those expected in statistical null models. Notably, we find that the empirical periphery is introduced earlier (relative to the core) than expected from the random index model, which has a more stark difference between core and periphery introduction (Fig. 3b). We observe no consistent trend across texts in comparison to a random sentence model, in which we use the original index list to build a growing graph from the texts after randomizing sentence order. While many texts show a marked discrepancy between the core and periphery development, others show a more even development. These differences across texts could reflect different expositional styles amongst different authors: some may choose to introduce core topics initially and save extra tangents for later, while others may involve discussions of peripheral topics throughout the text for motivation.
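A minimal sketch of this area statistic (the function names and grid resolution are ours): the two cumulative introduction curves are evaluated on a common grid over the normalized exposition, and their difference is averaged:

```python
import numpy as np
from scipy import stats

def introduction_curve(times, grid):
    """Fraction of a node set introduced by each point of the exposition.

    `times` are node introduction times rescaled to lie in [0, 1]."""
    times = np.sort(np.asarray(times))
    return np.searchsorted(times, grid, side="right") / len(times)

def core_periphery_area(core_times, periph_times, n_grid=10001):
    """Area between the core and periphery introduction curves; positive
    values mean the core is, on balance, introduced before the periphery."""
    grid = np.linspace(0.0, 1.0, n_grid)
    gap = introduction_curve(core_times, grid) - introduction_curve(periph_times, grid)
    return float(gap.mean())  # mean over a uniform grid on [0, 1] approximates the integral
```

Across texts, the resulting areas can then be tested against zero with `stats.ttest_1samp(areas, popmean=0.0)`, mirroring the one-sample t-test reported above.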
Additionally, we take a similar approach in examining the relative rate of introduction of edges connecting different types of groups within the core and periphery, and find that of all edge types, those connecting concepts within a single periphery community are introduced the most sporadically, with some communities being fully introduced early on in the text, and some being introduced later (see Supplementary Fig. S2).

Expositional development of knowledge gaps

In studying core and periphery formation, we focused on densely connected areas in the growing networks; now we turn to a study of sparsely connected areas. Specifically, we seek to understand how voids or knowledge gaps might emerge and evolve throughout the exposition. Teaching strategies may intentionally leave open a connection or an area of the knowledge space in order to more intuitively reveal the connection later when a learner has more experience, or to provide the reader the opportunity to derive the connection on his/her/their own. A lack of connections between concepts can manifest as a topological gap in the network (Fig. 4a).

To detect gaps that form and evolve throughout the text, we compute the persistent homology [27, 28, 29, 30] of the ordered set of networks composed of nodes and edges that exist at each point in the exposition; note, this ordered set of binary graphs is referred to as a filtration. We specifically detect gaps between connected components (dimension 0 homology) as well as higher-dimensional topological cavities (dimensions 1 and 2). Persistent cavities are born at the first instance of their appearance in the network, they live as long as the network grows and the topological void still persists, and they die when they are either connected to another previously disconnected component (in the case of dimension 0) or are tessellated by crossing edges (in the higher-dimension cases).
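The filtration construction described above can be sketched as follows. This is a minimal illustration with variable names of our own choosing, in which concepts are single tokens (the real pipeline extracts multi-word phrases): each node is stamped with the index of its first mention, and each edge with the index of the first sentence in which its two concepts co-occur.

```python
from itertools import combinations

def growing_network(sentences, concepts):
    """Return node and edge birth times for the co-occurrence filtration.

    `sentences` is a list of token lists; `concepts` is the index set of terms.
    A node is born at its first mention; an edge is born at the first
    sentence in which both of its endpoints co-occur."""
    node_birth, edge_birth = {}, {}
    for t, sent in enumerate(sentences):
        present = [c for c in concepts if c in sent]
        for c in present:
            node_birth.setdefault(c, t)
        for u, v in combinations(sorted(present), 2):
            edge_birth.setdefault((u, v), t)
    return node_birth, edge_birth
```

The graph existing at exposition point t is then simply the set of nodes and edges with birth time at most t.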
We invite the reader to refer to the Methods for a more rigorous description of persistent homology in this application.

In order to detect emerging and evolving gaps throughout exposition, we compute the persistent homology of each text. The number of gaps of dimension n that are alive at a given point in the filtration, called the Betti curve, is denoted β_n. We see that the texts tend to generate a large number of components, as manifest by the initial β_0 peak, followed by a rise in β_1, and finally a slow and steady increase in β_2 (Fig. 4b). For each text, we summarize the life and death of each persistent gap in a barcode (Fig. 4c). Each bar represents a single persistent cavity; the left endpoint of the bar indicates the birth time of the persistent cavity, while the right endpoint indicates the death time. Across all texts, we see that although many persistent cavities are killed soon after birth, a non-trivial number of gaps in each of the three dimensions persist throughout many sentences, suggesting that long-lived gaps are a consistent hallmark of the growing text structure.

To further evaluate the substantiveness of the gap architecture, we compared the persistent homology of the text networks to two filtration-based null models. In the first null model, we use the introduction of concepts to order the complete network for the filtration. More precisely, this node-ordered null model adds a node at the first mention of the concept, and also adds all of the connections that will ever exist from that node to previously acquired concepts. This model mimics the teaching strategy of introducing all connections of a new concept to anything previously taught. We find that the node-ordered model produces almost no persistent homology, in stark contrast to the original text (see Supplementary Figs. S5, S6). This result suggests that the text expositions consistently leave connections between already-learned concepts for later discussion.
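In dimension 0, this barcode bookkeeping can be made concrete with a union-find sketch (a from-scratch illustration of our own; the higher-dimensional computations require a full persistent homology library): each component is born with its earliest node, and when two components merge, the younger one dies (the elder rule).

```python
def dim0_barcode(node_birth, edge_birth):
    """Dimension-0 persistence of a growing graph.

    Components are born when their earliest node appears; on a merge, the
    younger component dies (elder rule). The oldest component never dies,
    recorded with death time None."""
    parent = {v: v for v in node_birth}
    birth = dict(node_birth)  # birth time of each component root

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    bars = []
    for (u, v), t in sorted(edge_birth.items(), key=lambda kv: kv[1]):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # edge closes a loop; no dimension-0 event
        old, young = (ru, rv) if birth[ru] <= birth[rv] else (rv, ru)
        bars.append((birth[young], t))  # younger component dies at time t
        parent[young] = old
    roots = {find(v) for v in parent}
    bars.extend((birth[r], None) for r in roots)
    return sorted(bars, key=lambda b: b[0])
```

Counting the bars alive at each time t recovers the β_0 Betti curve described above.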
We use a second null model to determine whether the totally random introduction of edges might produce similar progressions of persistent cavities. We find that this random edge order model produces an order of magnitude more persistent cavities of dimension 1 and 2 than the original text (see Supplementary Figs. S5, S6). Broadly, the presence of a few long-lived cavities in the actual text is consistent with the notion that knowledge gaps exist but are introduced sparingly, and that introducing connections to all topics previously learned is not the strategy of these texts.

At this point we know that throughout the text the introduction of terms and connections forms and fills gaps as a reader progresses. However, we do not yet know if the number and longevity of persistent cavities is different than we would expect from any growing semantic network in the text or from a reordered text. In order to answer this question, we define the normalized average cycle lifetime in dimension n as the sum of all persistent cavity lifetimes normalized by the number of cavities and filtration length (similar to metrics defined in [34]; see Methods for details). Then intuitively a large value of normalized average cycle lifetime suggests that multiple long-lived persistent gaps exist, while a small value suggests that any gaps that form will die shortly after birth. We show the distributions of normalized average cycle lifetime values in dimension 0 in Fig. 4d for the random index and random sentence models, and the corresponding distributions for dimensions 1 and 2 in Supplementary Figs. S7, S8. For completeness, we also include the barcodes and Betti curves for each model in the Supplementary Figs. S3, S4, S5, S6. Strikingly, the original text expositions generally fall below both null models’ expected normalized average cycle lifetimes in dimension 0.
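Under our reading of this definition (an assumption; the precise handling of cavities still open at the end of the filtration is specified in the Methods, and here we truncate them at the final filtration value), the metric is a one-liner over a barcode:

```python
def normalized_avg_lifetime(bars, filtration_length):
    """Sum of persistent-cavity lifetimes, normalized by the number of
    cavities and by the filtration length.

    `bars` is a list of (birth, death) pairs; death may be None for
    cavities still alive at the end of the filtration."""
    if not bars:
        return 0.0
    total = sum((filtration_length if d is None else d) - b for b, d in bars)
    return total / (len(bars) * filtration_length)
```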
This observation suggests that the exposition proceeds in a manner that may intentionally avoid developing disconnected topics, or possibly connects new topics to others very quickly. In dimensions 1 and 2, texts’ normalized average cycle lifetimes vary more in relation to their null models, with only a handful of texts showing lower values than the null models.

Evolving structure and text properties
After characterizing the structural features of the growing text networks, we next ask if these features might relate to text rating. Perhaps some readers particularly enjoy a book that leaves open many gaps motivating future study, while others enjoy a book with a stronger core offering conceptual closure. To determine whether readers’ preferences relate to network structure, we used the average text rating across all editions from Goodreads ( goodreads.com ). We kept any text which had at least five ratings, which was the case for seven of the ten texts. We observe no significant correlation between average text rating and normalized average cycle lifetime across texts’ sentence-based filtrations (Fig. 5a; see Supplementary Table S5 for all Spearman’s correlation coefficients and p-values). We also consider a one-at-a-time (OAAT) filtration (see Supplementary Methods), which in addition to allowing for comparability in persistent homology across texts and null models, provides additional information not just about a text’s knowledge gaps on the sentence scale, but furthermore its sub-sentence topological structure. Remarkably, we observe significant negative correlations between average rating and OAAT normalized average cycle lifetime in dimensions 0 (Spearman’s correlation coefficient ρ = −0.…, p = 0.…), 1 (ρ = −0.…, p = 0.…), and 2 (ρ = −0.…, p = 0.…; see Supplementary Table S5).

Discussion
Here we examined the structure and topological development of semantic networks of mathematical knowledge as extracted from linear algebra texts. Meso-scale structural analysis indicates that the semantic networks exhibit strong core-periphery structure, where a tightly knit group of concepts forms a core, surrounded by sparsely-connected periphery concepts that are grouped into communities. Furthermore, these features appear to relate to the growth of the networks over the course of exposition; the cores of networks are built more quickly than the peripheries, and edges within each particular periphery community are introduced at varied times over the course of exposition. Using persistent homology, we extracted the knowledge gaps inherent in the exposition and found that the number of distinct connected components tends to decrease throughout the text, while topological cavities tend to increase. Finally, we examined possible relationships between the extent and persistence of knowledge gaps and other features of a text and its associated semantic network, providing motivation for future work examining the role of knowledge gaps in learning.
Structure and evolution of mesoscale features in semantic networks
The prevalence of core-periphery and community structures in the networks we examine is consistent with a hierarchical structuring of mathematical knowledge, in which there exist a set of foundational concepts (the core), which are necessary for the subsequent logical development of subsidiary (periphery) concepts, which themselves are hierarchically organized into related communities. The generic notion of hierarchical structure in mathematics has been discussed in the context of presenting a logical sequence of concepts in education [35]. Hierarchical structure has also been noted in Wikipedia topic networks, in which concepts tend to maintain several connections to the foundational concepts used in each article [19]. A hierarchical structure of mathematics knowledge is intuitive, particularly within a delimited area such as linear algebra: a set of foundational concepts, such as matrix, vector, and linearity, are used to motivate and develop the rest of the topics within the field, which, for the most part, all presuppose the concepts in the core. Naturally, this hierarchy will not be a simple dichotomy (core-periphery), but the subsidiary concepts should themselves fall hierarchically into different groups, which may differ across texts due to author interests and publisher goals.

The observed growth dynamics of core-periphery structure offers a coherent expositional model. Given that the set of core concepts are highly related, and thus plausibly represent the concepts providing the foundation for linear algebra, it seems reasonable to introduce these concepts early and to introduce periphery concepts, which presuppose the core, later. The importance of giving sufficient foundational context and prior knowledge in exposition is well-appreciated [36].
Further, the edge dynamics we observe are consistent with an expositional model in which topics are procedurally related to each other; the core concepts are introduced first, and used to introduce, at each point in the exposition, the communities that are being focused on; furthermore, subsidiary concepts that have already been introduced may then be used to give context for and develop further, separate subsidiary communities. Such an expository approach, in which connections are consistently introduced between that which has been learned and that which is to be learned, has been demonstrated as useful in teaching mathematics proofs [37]. An extension of this motivating expository style is one that incorporates the historical context [38]. In the future, it could prove fruitful to compare the expository structure of mathematics texts and the historical development of the results included in those texts.

It is worth noting that, while recent efforts address the dynamics of core-periphery [39, 40] and community [41, 42] structures in networks, comparatively little work has addressed the growth and emergence of these structures over time. The perspective of our work is therefore important; we consider there to be some a priori structure of mathematical knowledge, unbeknownst to us, which each author seeks to convey. Thus, rather than examining the evolution of meso-scale features in the networks, we instead focus on how the eventual features, which we take to represent those present in the latent structure of the knowledge, are created throughout exposition. Such methods dealing with the emergence of meso-scale features could prove useful in studies of learning. For example, how do semantic networks of students’ knowledge evolve as students are taught? Can that evolution be formally predicted by a generative network model built from the textbook used in their class?
Knowledge gaps in the exposition of mathematics texts
Using the tool of persistent homology, we examined the growth and persistence of knowledge gaps (colloquially), or topological cavities (formally), in the semantic networks of linear algebra texts. While this tool has been applied to other types of text and knowledge, including Shakespeare’s plays [43], natural language [44], discourse [45], and collections of mathematics papers [46], little is known about how knowledge gaps within a single expositional text or growing semantic network may impact how that text or knowledge structure might be received or understood. Our hypothesis, motivated by the idea that a topologically complex structure with many gaps in knowledge might be more difficult to learn, was that effective exposition likely seeks to produce a smaller number of knowledge gaps, as the creation of a great number of topological cavities could prove confusing to a reader. Still, leaving a few gaps throughout exposition can add intrigue to the subject, piquing the reader’s curiosity to make connections themselves [47].

In the context of this discussion, it is interesting to contrast the features of a process that humans have arguably optimized for explicit learning with the features of a process that nature has arguably optimized for implicit learning [48]. As a token of the former, we consider textbook writing; as a token of the latter, we consider language acquisition in children [6, 49]. Evidence suggests that knowledge gaps, detected as topological cavities, are a robust feature of language acquisition in toddlers, and their prevalence is unaffected by maternal education or by the order in which words are learned [26]. One could speculate that this observed homogeneity in early semantic feature network learning supports robust language acquisition, ensuring that children who are exposed to different sets of words at different times are still able to reach adult language proficiency.
In contrast, when constructing an exposition for a textbook whose sole purpose is to take a set of students from naivety to sophistication in the same place and at the same time, such robustness is not needed; instead, consistency, thoroughness, and comprehensiveness are required. The relative paucity of knowledge gaps in the textbooks we study here would be consistent with these distinctions in goals and environment. It could prove useful in the future to more generically assess the robustness of growing networks to the order of node introduction [50], particularly to assess differences between implicit or explicit learning processes.

Notably, we found that most cavities that were introduced were eliminated before the end of each text. We observed that, while multiple connected components were introduced, all were eventually – and usually quite quickly – connected into a single connected component, suggesting that the expositional order of introduction of edges throughout the text minimizes the extent to which cavities are formed. Remarkably, though, the order of the expositions – that is, the extent to which cycles were not introduced and did not persist – did not appear to be maximal. That is, the node-ordered filtration null model exhibited significantly sparser persistent homology than we observed in the texts (Supplementary Fig. S8).
This observation suggests a tradeoff between topological order and apparent learnability; specifically, while such neatly-ordered expositions might minimize the extent to which knowledge gaps are created and persist, it is likely in the best interest of readable and enjoyable exposition to not follow this purely structural ordering – that is, to properly motivate concepts, give relationships where they might seem natural and useful, and make the text generally more readable.

Our correlation analysis of the barcode densities suggests some interesting directions for further study of the potential relationship between the persistent homology of a growing semantic network and effective learnability. Specifically, while our study did not deal explicitly with differential learnability of texts or with how knowledge gaps might affect the learning process, we did observe several interesting relationships between the 0- and 2-dimensional barcode densities and textbook ratings. While these results are preliminary, they suggest that an interesting avenue for further study would be to examine the topology of growing semantic networks in the classroom setting. In particular, one could consider multiple networks: the network of the textbook being used, providing the “latent space” of the knowledge and the relationships between concepts; the teacher’s network, as provided in class to the students through lessons; and finally, the students’ networks, as they develop over time while the students learn the material. An analysis of the developmental and topological relationships between all three of these classes of semantic networks could yield interesting results on how knowledge structures are transferred from teacher and book to student, and could provide useful insight into the effective structuring and expositional presentation of knowledge in a textbook format.
Methodological considerations
There are certain limitations inherent in our work that should be considered for future study. First, our text extraction methodology imperfectly converted PDFs to plaintext, leaving significant textual noise and artifacts of embedded math which required subsequent automated removal, and the remnants of which prevented perfect concept extraction and sentence-level co-occurrence calculation. Because textbook PDFs are easier to access than textbook source material, we spent significant time developing our text extraction approach to account for these circumstances so that our methodology could be widely applicable. However, future work could utilize the LaTeX source for textbooks in order to reduce noise. Second, the problem of concept extraction is ill-posed due to the subjectivity of the notion of “concept”. We examined a number of supervised and unsupervised keyphrase extraction algorithms, and our modified RAKE algorithm performed best in comparison to our intuitive expectations for linear algebra contexts. However, future work will be necessary to better understand (a) how to determine how many concepts should be extracted from a text, (b) what should comprise a “concept” in a semantic network, and (c) how to examine hierarchically structured semantic networks to incorporate the subjectivity of concepts into the network structure, so that high-level concepts are distinguished from those which are lower-level. Third, our network and filtration construction methodology is only one of many possible methodologies; as we chose to use co-occurrence to construct the networks, they are undirected and lack edge labels detailing the nature of each relationship.
Fourth, the application of a clique complex to infer knowledge gaps in a growing network is one of many choices, and it assumes that any fully-connected (k + 1)-clique should, in fact, reflect a filled k-simplex of knowledge. However, a possible alternative could be to only add a k-simplex when such higher-order relationships are observed simultaneously, such as when three words co-occur in the same sentence. Finally, further research in a classroom setting should be able to provide insight into what types of knowledge gaps might have an effect on student learning, thus providing an answer as to how persistent homology should be computed on growing semantic networks.

Future directions

A clear open area for future work lies in understanding tradeoffs in ordered network structure. Here, we find four separate instances in which semantic networks of linear algebra textbooks appear to balance competing constraints. First, while core-ness and modularity are higher than expected in a continuous configuration null model, they are notably lower than expected in a random index null model. Second, while core nodes tend to be added more quickly than periphery nodes, the difference in speed is more stark in the random index model. Third, while some texts add core nodes faster than expected in the random sentence order null model, some texts add core nodes more slowly, suggesting that each text opts for a different expositional style. Fourth, while the barcodes of the empirical networks are relatively sparse compared to the random edge model, they still exhibit more persistent cycles than the most ordered model, the node-ordered filtration. Collectively, these results suggest that effective and useful exposition, while structured in nature, is not as strongly structured as it could be. It may be effective to purposefully introduce some gaps in knowledge by withholding topics to support productive failure [23] or provide detailed motivation to stimulate curiosity [47, 51].
Of course, it is also possible that our observations reflect the nature of the structure of mathematics: perhaps mathematics simply does not have as strongly-ordered a structure as we might observe in our null models. Future efforts could seek to better understand this tradeoff and its potential causes.
Materials and methods
All tools and methods developed for use in this work are designed to be broadly applicable to any expositional text. We thus provide Python code for the extraction and analysis of semantic networks at https://github.com/nhchristianson/Math-text-semantic-networks . Further details and considerations for the methods used can be found in the Supplementary Methods.
Data collection and preprocessing
We collected a diverse set of ten linear algebra textbooks in PDF format, ranging in focus from theory to application (see Supplementary Methods for more details). We converted the PDF files to plaintext with the tool at https://pdftotext.com , and manually cleaned each text to isolate the main chapters, discarding introductory and appendix sections. We then converted the text to Unicode KD normal form, replaced hyphens with spaces, and used spaCy [52] to lemmatize all words in each text, which reduces inflected words to their dictionary form. We then used the Python Natural Language Toolkit (NLTK, Version 3.3 [53]) to tokenize the text into sentences and their component words. Finally, we applied a series of rules to determine whether any word token was sufficiently variable-like to be converted to a “VAR” variable placeholder; for example, any word containing numerical characters, and any single-character word, was converted, since such tokens likely represent variables within the text.
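As an illustration of the placeholder step, the following is a simplified pure-Python sketch. The actual pipeline lemmatizes with spaCy and tokenizes with NLTK; the regex tokenizer and the specific rules here (digit-containing or single-character tokens) are illustrative assumptions, not the full rule set.

```python
import re

def preprocess_sentence(sentence):
    """Replace variable-like tokens with a 'VAR' placeholder.

    Simplified sketch: any token containing a numerical character, and
    any single-character token (likely a mathematical variable), is
    mapped to 'VAR'. The real pipeline also lemmatizes each word.
    """
    tokens = re.findall(r"[A-Za-z0-9]+", sentence.lower())
    out = []
    for tok in tokens:
        if any(ch.isdigit() for ch in tok) or len(tok) == 1:
            out.append("VAR")
        else:
            out.append(tok)
    return out
```

For instance, `preprocess_sentence("Let x be a 2x2 matrix")` collapses `x`, `a`, and `2x2` to the placeholder while leaving content words intact.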
Concept extraction
The linguistics and natural language processing literature provide a number of canonical statistical metrics for determining the significance of n-grams, or phrases comprised of n words, within text [55]. After testing multiple supervised and unsupervised keyphrase extraction methodologies, we chose to use an unsupervised method based on the rapid automatic keyword extraction (RAKE) algorithm [31] to extract concepts from our texts. RAKE works as follows:

1. A provided set of stop words, phrase delimiters, and word delimiters is used to divide the document into a set of candidate keyphrases and their comprising keywords.
2. The frequency of keywords and their co-occurrence in different keyphrases are calculated, forming a co-occurrence graph.
3. The candidate keyphrases are ranked by a scoring function score(k), which typically ranks candidate keyphrases by certain properties of their comprising keywords.
4. A threshold n is chosen, and the top n ranked candidate keyphrases are kept as the extracted keyphrases.

In RAKE, the scoring function for a candidate keyphrase k is typically taken to be

score_{RAKE}(k) = \sum_i \frac{\deg(k_i)}{\mathrm{freq}(k_i)},

where deg(k_i) and freq(k_i) are the degree and frequency, respectively, of the i-th keyword comprising the phrase k in the co-occurrence graph RAKE constructs. As such, RAKE poses that significant keyphrases are those whose component words co-occur with many other words, but do not occur very frequently. Because we wish to ensure that the scores of more plausibly mathematical words are high, we modify this keyphrase scoring function to incorporate the term frequency-inverse document frequency ranking method [56], including an additional term to account for a given keyphrase’s frequency in an external corpus.
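A toy sketch of this scoring with the corpus-frequency penalty follows; `brown_counts` is an illustrative stand-in for the real Brown corpus lookup, and phrases are represented as tuples of keywords.

```python
from collections import defaultdict

def rake_scores(keyphrases, brown_counts):
    """Score candidate keyphrases with a corpus-penalized RAKE metric.

    keyphrases: list of phrases, each a tuple of keywords.
    brown_counts: phrase -> occurrence count in an external corpus
    (stand-in for the Brown corpus with the "Learned" category removed).
    """
    deg = defaultdict(int)   # co-occurrence degree of each keyword
    freq = defaultdict(int)  # frequency of each keyword
    for phrase in keyphrases:
        for w in phrase:
            freq[w] += 1
            deg[w] += len(phrase)  # w co-occurs with every word in phrase
    scores = {}
    for phrase in keyphrases:
        rake = sum(deg[w] / freq[w] for w in phrase)
        # Penalize phrases common in non-mathematical text; the +1
        # avoids division by zero for phrases absent from the corpus.
        scores[phrase] = rake / (1 + brown_counts.get(phrase, 0))
    return scores
```

Here a phrase like ("the", "set") that is frequent in everyday prose is penalized relative to a mathematically distinctive phrase with the same raw RAKE score.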
Specifically, we specify our phrase scoring function as

score(k) = \frac{score_{RAKE}(k)}{1 + \mathrm{brown}(k)},

where brown(k) is the number of times that the whole keyphrase occurs in the Brown corpus [57] with the “Learned” category (comprised of scientific and other academic texts) removed. As such, we aim to penalize phrases that occur very frequently in non-mathematical text, as such words will likely not be mathematically meaningful. We add 1 to the brown(k) term in the denominator since not all phrases RAKE extracts occur in the Brown corpus. Details on our specific implementation of the modified RAKE algorithm can be found in the Supplementary Methods.

Network construction
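The construction detailed in this section, sentence-level co-occurrence with first-introduction times, can be sketched in a few lines of pure Python; function and variable names here are illustrative, not the repository's API.

```python
from itertools import combinations

def build_filtration(sentences, concepts):
    """Build a weighted co-occurrence graph and record, for each node
    and edge, the index of the first sentence introducing it.

    sentences: list of token lists; concepts: set of concept strings.
    Returns (edge_weights, node_birth, edge_birth); the graph G_k of
    the expositional filtration is recovered by keeping nodes and
    edges born at or before sentence k.
    """
    edge_weights = {}
    node_birth, edge_birth = {}, {}
    for k, sent in enumerate(sentences):
        present = sorted(concepts & set(sent))
        for c in present:
            node_birth.setdefault(c, k)
        for u, v in combinations(present, 2):
            e = (u, v)
            edge_weights[e] = edge_weights.get(e, 0) + 1
            edge_birth.setdefault(e, k)
    return edge_weights, node_birth, edge_birth
```

The integer edge weights give the total (weighted) network, while the birth indices alone give the binarized filtration used for the topological analysis.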
We construct each text’s semantic network of concepts by calculating the co-occurrence of concepts in each text on a sentence level. That is, we deem two concepts in each text’s index set to co-occur, and thus be related, if they occur in the same sentence at some point in the text. We also assign to each edge between concepts an integer weight indicating the number of sentences in which the two concepts co-occur. This data yields an undirected weighted graph G = (V, E), where each node v ∈ V is a concept and each edge (v_i, v_j) = e ∈ E represents a semantic relationship between concepts with an associated positive integer weight w(e) ∈ Z^+ denoting the number of sentences in which the two concepts co-occur.

We are not merely interested in the total semantic network of each textbook, but in the development of the semantic networks over the course of exposition. Thus, for each text, we keep track of the first sentence in which each concept and each relationship – equivalently, each node and each edge – is introduced. If a text has N sentences, our methodology of extracting growing semantic networks yields a sequence of N graphs G_1 → · · · → G_N, where the k-th graph G_k includes all nodes and edges which have been introduced prior to or during the k-th sentence of the text. In the context of the algebraic topology which we employ throughout this study, such a sequence of nested objects is called a filtration. We call this sequence of graphs the expositional filtration of a text. In considering this filtration, we consider the binarized graphs; that is, we disregard edge weight data during the exposition, only considering edge weight data for the final semantic network, which we call the total network.

Meso-scale network structure

Complex networks often exhibit meso-scale or global characteristics of structural order. Certain networks exhibit community structure, in which densely connected communities of nodes exhibit sparse or weak inter-community connections [58].
In the context of semantic networks, such densely connected communities may represent strongly related concepts that indicate the existence of some higher-order enveloping concept or umbrella term. Another type of meso-scale structure which may be exhibited is core-periphery structure, which is characterized by a densely connected set of core nodes and a set of periphery nodes which are sparsely connected amongst themselves, but are strongly connected to the core [59]. Such an organization of semantic networks is plausible in the context of mathematics, in which many different ideas may be developed from a smaller set of highly related concepts.

To detect community and core-periphery structure in the networks, we used the Brain Connectivity Toolbox for Python, version 0.5.0, which is based on the MATLAB Brain Connectivity Toolbox (BCT) [60]. To evaluate the presence of a core-periphery structure, we seek to assign a network’s nodes to either the core or the periphery group so as to maximize the core-ness quality function [61]:

Q_C = \frac{1}{v_C} \left[ \sum_{i,j \in C_c} (w_{ij} - \gamma_C \bar{w}) - \sum_{i,j \in C_p} (w_{ij} - \gamma_C \bar{w}) \right],

where C_c and C_p are the sets of nodes in the core and periphery, respectively, w_{ij} is the weight of the edge from node i to node j (which will be 0 if the nodes are not connected by an edge), \bar{w} is the average of all edge weights, where nonexistent edges with “zero weight” are also included in the average, \gamma_C is a resolution parameter which controls the size of the core, which we set to 1, and v_C is a normalization constant.
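Given a candidate core assignment, Q_C can be evaluated directly. A pure-Python sketch follows; the normalization v_C is taken here as the total edge weight, an illustrative assumption since the text does not specify the constant.

```python
def coreness(w, core, gamma=1.0):
    """Evaluate the core-ness quality function Q_C for a candidate
    core/periphery split (sketch; v_C assumed to be the total weight).

    w: symmetric weight matrix (list of lists); core: set of core node
    indices; all other nodes form the periphery.
    """
    n = len(w)
    total = sum(sum(row) for row in w)
    wbar = total / (n * n)        # average over all pairs, zeros included
    v = total or 1.0              # normalization constant (assumed)
    q = 0.0
    for i in range(n):
        for j in range(n):
            dev = w[i][j] - gamma * wbar
            if i in core and j in core:
                q += dev          # reward intra-core weight
            elif i not in core and j not in core:
                q -= dev          # penalize intra-periphery weight
    return q / v
```

Maximizing this function over assignments (as the BCT routines do) selects the split whose core is densest and whose periphery is sparsest.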
In effect, in maximizing core-ness we seek to maximize the number and weight of intra-core connections, while minimizing the number and weight of intra-periphery connections.

To evaluate the presence of community structure in the networks, we use a Louvain-like locally greedy algorithm [62] to optimize the modularity quality function:

Q_M = \frac{1}{v_M} \sum_{i,j \in C} \left( w_{ij} - \gamma_M \frac{s_i s_j}{v_M} \right) \delta_{ij},

where C is the set of network nodes, w_{ij} is the weight of the connection from node i to node j, s_i and s_j are the summed weights of edges connected to node i and node j, respectively, \gamma_M is a resolution parameter controlling the size of communities which we set to 1, v_M is a normalization constant, and \delta_{ij} is the Kronecker delta function, which is 1 when node i and node j are in the same community and is 0 otherwise [61]. In effect, modularity maximization seeks to maximize the strength and number of connections within communities, yielding a partition of the network nodes into a set of densely connected communities with few inter-community connections.

Persistent homology
Beyond characterization of the local and meso-scale attributes of the total semantic network of the texts, we furthermore seek to evaluate structural and topological characteristics of the semantic networks as they are built over the course of the entire text. In particular, we study the extent to which “knowledge gaps” are created and persist in semantic networks throughout a text’s exposition. To this end, we use a method with roots in the mathematics of algebraic topology called persistent homology which, in short, evaluates the creation and lifespan of topological “holes” in data over time, or in this case, over the course of exposition, thus allowing us to characterize and evaluate the presence of these gaps in knowledge. Here we give a brief, intuitive overview of how we calculate persistent homology for our expositional semantic networks; the particularly interested reader may refer to Refs. [27, 28, 29] for a rigorous overview of persistent homology and its computation for data analysis, as well as Refs. [63, 64, 65, 30, 66] for example uses of persistent homology in the context of complex networks.

Recall that a text’s semantic network at a certain point in the exposition (a particular graph in the expositional filtration) is an undirected graph, where connections between nodes indicate that the concepts represented by those nodes have already co-occurred in a sentence. Given a binary undirected graph G = (V, E), we may construct an object called the clique complex X(G), which, for every natural number k, assigns to every all-to-all connected subgraph of G on (k + 1) vertices (also known as a (k + 1)-clique) a k-simplex, which may geometrically be represented as the convex hull of (k + 1) affinely independent points. For example, a 0-simplex is simply a single node, a 1-simplex is an edge, a 2-simplex is a filled-in triangle, and a 3-simplex is a filled-in tetrahedron.
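The clique-to-simplex correspondence can be made concrete with a brute-force enumeration; this is only an illustrative sketch, since practical computations use optimized libraries such as Ripser.py.

```python
from itertools import combinations

def clique_complex(nodes, edges, max_dim=2):
    """Enumerate the simplices of the clique complex X(G) up to max_dim.

    Every (k + 1)-clique (all-to-all connected set of k + 1 nodes)
    becomes a k-simplex. Brute-force over node subsets, so suitable
    only for small example graphs.
    """
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    simplices = {0: [(v,) for v in sorted(nodes)]}
    for k in range(1, max_dim + 1):
        simplices[k] = [
            s for s in combinations(sorted(nodes), k + 1)
            if all(v in adj[u] for u, v in combinations(s, 2))
        ]
    return simplices
```

On a triangle with a pendant edge, for instance, the triangle is the sole 2-simplex, while the pendant edge contributes only a 1-simplex.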
Intuitively speaking, this clique complex X(G) is a “filled-in” version of the graph G, where, for each k, we choose a distinct color and then color in all (k + 1)-cliques in G to form k-simplices. Then, classical homology intuitively describes, for each k, how many topological “holes” of dimension k are in the complex, or how many regions are enclosed by the k-th color, but are themselves not colored as such. In other words, homology detects cycles of k-simplices that surround a void. For example, a 1-cycle reflects a conventional cycle in a graph, just like the hole in a circle; and a 2-cycle reflects a cavity, like the hole in the center of a sphere. A 0-cycle is intuitively slightly different, in that 0-cycles refer to connected components of the graph, so that having more than one 0-cycle tells us that multiple disconnected components exist. In our work, we restrict our focus to these first three dimensions, since these are the most geometrically intuitive. These cavities or holes are exactly the knowledge gaps we seek in the semantic networks, as they indicate some closed cycle of (k + 1)-order connections between concepts surrounding a region of lesser connectivity.

A useful extension of homology enables us both to count the number of holes present in the semantic network at each step in a text’s exposition, as well as to keep track of which topological cavities are created and destroyed at each step. Specifically, persistent homology allows for the computation of the homology for the sequence of clique complexes of our expositional filtration X(G_1) → · · · → X(G_N); this tool not only keeps track of the number of cavities of each dimension present at each expositional step, but it also tracks the persistence of each individual cavity over the course of the exposition, so we may identify individual knowledge gaps, when they were created, how long they persist, and when they are extinguished.
Rigorously, the k-th persistent homology of a graph filtration yields a (multi-)set of intervals called the barcode:

{[b_1, d_1), . . . , [b_m, d_m)},

where b_i indicates the time of birth of the i-th k-dimensional cavity, and d_i indicates the time of death of that cavity (which may be ∞ if the cavity is still present in the total network, i.e., it never dies). Thus, the number of intervals, as well as their length (the difference between their death and birth times), indicate the number and persistence, respectively, of topological cavities during exposition.

Once we have computed the persistent homology of a text’s expositional filtration for a given k, we may use several characteristics of the resultant persistence intervals to examine various aspects of the persistence of knowledge gaps in the semantic networks. In particular, we consider two metrics: first, we examine the value of m, which gives the total number of k-cavities which were created, at some point, over the course of the exposition. Second, we define a metric similar to one presented in Ref. [34], which we refer to as the normalized average cycle lifetime of dimension k:

D_k = \frac{1}{mN} \sum_{i=1}^{m} (d_i - b_i),

where N is the number of steps in the filtration, and d_i is the time of death of the i-th k-cavity, unless d_i = ∞, in which case we set d_i = N + 1, to distinguish these infinitely-persisting cavities from those that die at step N. Intuitively, this metric describes the extent to which an expositional filtration has cycles which are persistent; it is normalized by N, the length of the filtration, and m, the number of k-cycles introduced throughout the filtration, so as to be comparable across texts which might have different filtration lengths or total numbers of cycles introduced.
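Given a barcode, D_k is a direct computation; a minimal sketch of the formula above, with infinite deaths mapped to N + 1 as in the text:

```python
import math

def normalized_avg_lifetime(intervals, n_steps):
    """Compute D_k, the normalized average cycle lifetime.

    intervals: list of (birth, death) pairs from a barcode, with
    death = math.inf for cavities that never die; n_steps: the
    filtration length N. Returns 0.0 for an empty barcode.
    """
    m = len(intervals)
    if m == 0:
        return 0.0
    total = 0.0
    for b, d in intervals:
        if math.isinf(d):
            d = n_steps + 1  # infinite bars persist one step past N
        total += d - b
    return total / (m * n_steps)
```

For example, two bars [2, 5) and [3, ∞) in a 10-step filtration have lifetimes 3 and 8, giving D_k = 11 / 20.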
The goal is to allow a formal comparison of how persistent k-cycles tend to be in different texts.

In our work, we use Ripser.py [67] due to its speed and efficiency for the computation of persistent homology for the empirical networks and the null models. (More precisely, homology finds equivalence classes of cycles, but we refer to an equivalence class as a cycle for simplicity.)

Null models

In order to determine to what extent the results we obtain for meso-scale structure and topological dynamics in the semantic networks are significant, we employ two categories of null models: data-level null models, which randomize on the scale of the underlying text and index list, from which we may then extract semantic networks and expositional filtrations; and projected network-level null models, which randomize on the scale of the networks we extract for each text. Furthermore, while some of these models are particularly suited as null models for the structural metrics on the total network since they yield a single, weighted network, others are more suitable as null models for the growing dynamics of the semantic networks, as they provide a null expositional filtration. For each null model, our null ensemble is comprised of 100 random instantiations; we present the resulting null distributions of metrics alongside the data for our actual networks in our results. Here we summarize the null models and their uses; more detailed descriptions of the models can be found in the Supplementary Methods.

(a) Random index: Expositional filtration of randomly chosen words in each text, to serve as a semantic network on random “concepts”. Acts as a null model for the total network and empirical filtration.

(b) Random sentence order: Expositional filtration of original index set with randomly-shuffled sentences, keeping sub-sentence structure while randomizing exposition on the sentence scale.
Acts as a null model for the empirical filtration.

(c) Continuous configuration: Rewired network preserving node degree and strength. Acts as a null model for the total network.

(d) Random edge: Random reordering of edge introduction from the empirical filtration, mimicking totally random exposition. Acts as a null model for the empirical filtration.

(e) Node-ordered: Adds each node and all its edges in order of node introduction in the exposition, mimicking exposition that connects each concept to all previously-related concepts. Acts as a null model for the empirical filtration.
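The two shuffle-based models, (b) and (d), amount to simple permutations; a sketch of one instantiation of each, with function names illustrative (the paper draws 100 such instantiations per model):

```python
import random

def random_sentence_order(sentences, rng):
    """Null model (b): shuffle the order of sentences, keeping
    sub-sentence structure intact; an expositional filtration is then
    rebuilt from the shuffled text."""
    shuffled = list(sentences)
    rng.shuffle(shuffled)
    return shuffled

def random_edge_order(edges, rng):
    """Null model (d): keep the empirical edge set but assign each edge
    a uniformly random introduction step, mimicking totally random
    exposition."""
    order = list(edges)
    rng.shuffle(order)
    return {e: step for step, e in enumerate(order)}
```

Both models preserve the total network exactly, so any difference from the empirical filtration isolates the effect of expositional ordering.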
Acknowledgments
We are grateful to David Lydon-Staley, Dale Zhou, Alec Helm, and Shubhankar Patankar for their generous advice on early versions of this manuscript. D.S.B., N.H.C., and A.S.B. acknowledge support from the John D. and Catherine T. MacArthur Foundation, the Alfred P. Sloan Foundation, the ISI Foundation, the Paul Allen Foundation, the Army Research Laboratory (W911NF-10-2-0022), the Army Research Office (Bassett-W911NF-14-1-0679, Grafton-W911NF-16-1-0474, DCIST-W911NF-17-2-0181), the Office of Naval Research, the National Institute of Mental Health (2-R01-DC-009209-11, R01-MH112847, R01-MH107235, R21-MH-106799), the National Institute of Child Health and Human Development (1R01HD086888-01), the National Institute of Neurological Disorders and Stroke (R01 NS099348), and the National Science Foundation (BCS-1441502, BCS-1430087, NSF PHY-1554488, and BCS-1631550).

Figure 2: Meso-scale structure of semantic networks. (a)
A schematic of core-periphery structure, with densely-connected core nodes and a sparsely-connected periphery. (b)
A schematic of community structure, with densely-connected communities which are themselves sparsely connected to each other. (c)
The core-ness statistic of each network and of the corresponding random index and continuous configuration null ensembles. (d)
Visualization of the Axler core-periphery structure. (e)
Modularity statistics of the periphery of each network and of the corresponding random index and continuous configuration null ensembles. (f)
Visualization of the Axler periphery community structure. Graph visualizations generated with Graph-tool [32]. See Supplementary Table S4 for example nodes present in the Axler periphery communities.

Figure 3: Development of core-periphery structure during exposition. (a)
Core and periphery development curves, showing the fraction of nodes in each group introduced by a particular time in the exposition; mean ± (b) Difference in area between core and periphery development curves for all texts and corresponding random index and random sentence order null ensembles.

Figure 4: Development and persistence of knowledge gaps throughout exposition. (a)
Examples of knowledge gaps in dimensions 0, 1, and 2. (b)
Number of live knowledge gaps, or cycles, in each dimension throughout exposition; mean ± (c) Barcode for the Treil text, showing introduction, persistence, and death of cycles introduced throughout exposition. Barcodes for other texts are provided in Supplementary Figs. S3, S4, S5, S6. (d)
The 0-dimensional normalized average cycle lifetime across all texts, as well as corresponding random index and random sentence order null models.

Figure 5: Relation between textbook ratings and network topology. Scatterplots and best-fit lines for average Goodreads rating versus the normalized average cycle lifetime in dimensions 0 through 2 and the average over dimensions, across all texts for the (a) sentence filtrations and (b)
OAAT filtrations.

References

[1] Sowa, J. F.
Semantic Networks (John Wiley & Sons, 2006). URL https://onlinelibrary.wiley.com/doi/abs/10.1002/0470018860.s00065 .[2] Steup, M. Epistemology. In Zalta, E. N. (ed.)
The Stanford Encyclopedia of Philosophy (Metaphysics Research Lab, Stanford University, 2018), winter 2018 edn.
[3] Hartley, R. T. & Barnden, J. A. Semantic networks: visualizations of knowledge.
Trends in Cognitive Sciences, 169–175 (1997).
[4] Lehmann, F. Semantic networks. Computers & Mathematics with Applications, 1–50 (1992).
[5] Nickel, M., Murphy, K., Tresp, V. & Gabrilovich, E. A review of relational machine learning for knowledge graphs. Proceedings of the IEEE, 11–33 (2016).
[6] Steyvers, M. & Tenenbaum, J. B. The large-scale structure of semantic networks: Statistical analyses and a model of semantic growth.
Cogn Sci , 41–78 (2005).[7] L. Gallenstein, N. Mathematics concept maps: assessing connections. Teaching Children Mathematics , 436–440 (2011).[8] Broggy, J. & McClelland, G. Integrating concept mapping into higher education: A case study withphysics education students in an irish university. British Education Research Association Annual Con-ference
Manchester, Sept (2009).[9] Hill, L. H. Concept mapping to encourage meaningful student learning.
Adult Learning , 7–13 (2005).[10] Daley, B. J. Facilitating learning with adult students through concept mapping. The Journal ofContinuing Higher Education , 21–31 (2002).[11] Lapp, D. A., Nyman, M. A. & Berry, J. S. Student connections of linear algebra concepts: an analysisof concept maps. International Journal of Mathematical Education in Science and Technology , 1–18(2010).[12] Newman, M. E. J. Networks: An Introduction (Oxford University Press, 2010).[13] Utsumi, A. A complex network approach to distributional semantic models.
PLOS ONE, 1–34 (2015).
[14] Lynn, C. W., Papadopoulos, L., Kahn, A. E. & Bassett, D. S. Human information processing in complex networks. arXiv, 00926 (2019).
[15] da Fontoura Costa, L. Learning about knowledge: A complex network approach. Phys. Rev. E, 026103 (2006).
[16] Karuza, E., Thompson-Schill, S. & Bassett, D. Local patterns to global architectures: Influences of network topology on human learning. Trends in Cognitive Sciences, 629–640 (2016).
[17] Koponen, I. T. & Nousiainen, M. Concept networks of students’ knowledge of relationships between physics concepts: finding key concepts and their epistemic support. Applied Network Science, 14 (2018).
[18] Yun, E. & Park, Y. Extraction of scientific semantic networks from science textbooks and comparison with science teachers’ spoken language by text network analysis. International Journal of Science Education, 2118–2136 (2018).
[19] Fang, Z., Wang, J., Liu, B. & Gong, W. Wikipedia as domain knowledge networks - domain extraction and statistical measurement. In KDIR (2011).
[20] Chai, L. R. & Bassett, D. S. Evolution of semantic networks in biomedical texts. arXiv, 10534 (2018).
[21] Pereira, H., Fadigas, I., Senna, V. & Moret, M. Semantic networks based on titles of scientific papers.
Physica A: Statistical Mechanics and its Applications , 1192 – 1197 (2011).[22] Ritter, F., Nerb, J., Lehtinen, E. & O’Shea, T. (eds.)
In Order to Learn: How the sequence of topics influences learning (Oxford University Press, 2007).
[23] Kapur, M. Productive failure in learning math.
Cognitive Science , 1008–1022 (2014).[24] Mowat, E. Making connections: Mathematical understanding and network theory. For the Learning ofMathematics , 20–27 (2008). URL .[25] Mowat, E. & Davis, B. Interpreting embodied mathematics using network theory: Implications formathematics education. Complicity: An International Journal of Complexity and Education (2010).URL https://doi.org/10.29173/cmplct8834 .[26] Sizemore, A. E., Karuza, E., Giusti, C. & Bassett, D. Knowledge gaps in the early growth of semanticfeature networks. Nature Human Behavior , 682–692 (2019).[27] Carlsson, G. Topology and data. Bull. Amer. Math. Soc. , 255–308 (2009).[28] Zomorodian, A. & Carlsson, G. Computing persistent homology. Discrete Comput. Geom. , 249–274(2005). URL http://dx.doi.org/10.1007/s00454-004-1146-y .[29] Edelsbrunner, H. & Morozov, D. Persistent homology: Theory and practice. In Proceedings of theEuropean congress of mathematics , 31–50 (2013).[30] Otter, N., Porter, M. A., Tillmann, U., Grindrod, P. & Harrington, H. A. A roadmap for the computationof persistent homology.
EPJ Data Science , 17 (2017). URL https://doi.org/10.1140/epjds/s13688-017-0109-5 .[31] Rose, S., Engel, D., Cramer, N. & Cowley, W. Automatic Keyword Extraction from Individual Docu-ments , 1 – 20 (John Wiley & Sons, Ltd, 2010).[32] Peixoto, T. P. The graph-tool python library. figshare (2014). URL http://figshare.com/articles/graph_tool/1164194 .[33] Jones, E., Oliphant, T., Peterson, P. et al.
SciPy: Open source scientific tools for Python (2001–). URL .[34] Adcock, A., Carlsson, E. & Carlsson, G. The ring of algebraic functions on persistence bar codes.
Homology, Homotopy and Applications , 381–402 (2016).[35] Hart, K. Hierarchies in mathematics education. Educational Studies in Mathematics , 205–218 (1981).URL .[36] Gijselaers, W. H. Connecting problem-based practices with educational theory. New Directions forTeaching and Learning , 13–21 (1996). URL https://onlinelibrary.wiley.com/doi/abs/10.1002/tl.37219966805 .[37] Avital, S. M. Teaching a mathematical proof by exposition or the “let us define a function” syndrome.
International Journal of Mathematical Education in Science and Technology, 143–147 (1973). URL https://doi.org/10.1080/0020739730040209 .
[38] Fried, M. N. & Jahnke, H. N. Otto Toeplitz’s 1927 paper on the genetic method in the teaching of mathematics. Science in Context, 285–295 (2015).
[39] Csermely, P., London, A., Wu, L.-Y. & Uzzi, B. Structure and dynamics of core/periphery networks. Journal of Complex Networks, 93–123 (2013). URL https://doi.org/10.1093/comnet/cnt016 .
[40] Verma, T., Russmann, F., Araujo, N. A. M., Nagler, J. & Herrmann, H. J. Emergence of core-peripheries in networks. Nature Communications (2016).
[41] Bassett, D. S. et al. Robust detection of dynamic community structure in networks.
Chaos: An Interdisciplinary Journal of Nonlinear Science, 013142 (2013). URL https://doi.org/10.1063/1.4790830 .
[42] Alvari, H., Hajibagheri, A., Sukthankar, G. & Lakkaraju, K. Identifying community structures in dynamic networks (2016).
[43] Rieck, B. & Leitte, H. ’Shall I compare thee to a network?’: Visualizing the Topological Structure of Shakespeare’s plays. In Workshop on Visualization for the Digital Humanities at IEEE VIS (2016).
[44] Zhu, X. Persistent homology: An introduction and a new text representation for natural language processing. In
Proceedings of the twenty-third international joint conference on artificial intelligence,IJCAI ’13 , 1953–1959 (AAAI Press, 2013).[45] Savle, K., Zadrozny, W. & Lee, M. Topological data analysis for discourse semantics? In
Proceedings ofthe 13th International Conference on Computational Semantics - Student Papers , 34–43 (Association forComputational Linguistics, Gothenburg, Sweden, 2019). URL .[46] Salnikov, V., Cassese, D., Lambiotte, R. & Jones, N. S. Co-occurrence simplicial complexes in mathe-matics: identifying the holes of knowledge.
Appl Netw Sci , 37 (2018). URL https://doi.org/10.1007/s41109-018-0074-3 .[47] Loewenstein, G. The psychology of curiosity: A review and reinterpretation. Psychological Bulletin , 75–98 (1994).[48] Seger, C. A. Implicit learning.
Psychol Bull , 163–196 (1994).[49] Hills, T. T., Maouene, M., Maouene, J., Sheya, A. & Smith, L. Longitudinal analysis of early semanticnetworks preferential attachment or preferential acquisition?
Psychol Sci , 729–739 (2009).[50] Blevins, A. S. & Bassett, D. S. On the reorderability of node-filtered order complexes. arXiv ,02330 (2019).[51] Wade, S. & Kidd, C. The role of prior knowledge and curiosity in learning. Psychon Bull Rev (2019).URL https://doi.org/10.3758/s13423-019-01598-6 .[52] Honnibal, M. & Montani, I. spaCy 2: Natural language understanding with bloom embeddings, convo-lutional neural networks and incremental parsing.
To appear (2017).[53] Bird, S., Klein, E. & Loper, E.
Natural Language Processing with Python (O’Reilly Media, 2009).[54] Lachowicz, D. Enchant (2017). URL .[55] Manning, C. D. & Sch¨utze, H.
Foundations of Statistical Natural Language Processing , chap. Colloca-tions (The MIT Press, 1999).[56] Salton, G. & Buckley, C. Term-weighting approaches in automatic text retrieval.
Information Processing & Management, 513–523 (1988).
[57] Kučera, H. Computational analysis of present-day American English (Brown University Press, Providence, R.I., 1967).
[58] Newman, M. E. J. Modularity and community structure in networks.
Proceedings of the National Academy of Sciences, 8577–8582 (2006).
[59] Borgatti, S. P. & Everett, M. G. Models of core/periphery structures.
Social Networks , 375–395(2000).[60] Rubinov, M. & Sporns, O. Complex network measures of brain connectivity: Uses and interpretations. NeuroImage , 1059 – 1069 (2010). URL .[61] Rubinov, M., Ypma, R. J. F., Watson, C. & Bullmore, E. T. Wiring cost and topological participationof the mouse brain connectome. Proceedings of the National Academy of Sciences , 10032–10037(2015). URL .[62] Blondel, V. D., Guillaume, J.-L., Lambiotte, R. & Lefebvre, E. Fast unfolding of communities inlarge networks.
Journal of Statistical Mechanics: Theory and Experiment , P10008 (2008). URL http://stacks.iop.org/1742-5468/2008/i=10/a=P10008 .[63] Horak, D., Maleti´c, S. & Rajkovi´c, M. Persistent homology of complex networks.
Journal of StatisticalMechanics: Theory and Experiment , P03034 (2009). URL http://stacks.iop.org/1742-5468/2009/i=03/a=P03034 .[64] Petri, G., Scolamiero, M., Donato, I. & Vaccarino, F. Topological strata of weighted complex networks.
PLOS ONE , 1–8 (2013). URL https://doi.org/10.1371/journal.pone.0066506 .[65] Stolz, B. J., Harrington, H. A. & Porter, M. A. Persistent homology of time-dependent functionalnetworks constructed from coupled time series. Chaos: An Interdisciplinary Journal of NonlinearScience , 047410 (2017). URL https://doi.org/10.1063/1.4978997 .[66] Bampasidou, M. & Gentimis, T. Modeling collaborations with persistent homology. arXiv , 5346(2014).[67] Tralie, C., Saul, N. & Bar-On, R. Ripser.py: A lean persistent homology library for python. The Journalof Open Source Software , 925 (2018). URL https://doi.org/10.21105/joss.00925 .[68] Axler, S. Linear Algebra Done Right (Springer International Publishing, 2015), 3 edn.[69] Bretscher, O.
Linear Algebra with Applications (Pearson Education, Inc., 2013), 5 edn.[70] Edwards, H. M.
Linear Algebra (Springer Science+Business Media, 1995).[71] Greub, W. H.
Linear Algebra (Springer Science+Business Media, 1967), 3 edn.[72] Hefferson, J.
Linear Algebra (2017), 3 edn. URL http://joshua.smcvt.edu/linearalgebra .[73] Lang, S.
Introduction to Linear Algebra (Springer Science+Business Media, 1986), 2 edn.[74] Petersen, P.
Linear Algebra (Springer Science+Business Media, 2012).[75] Robbiano, L.
Linear Algebra for everyone (Springer-Verlag Italia, 2011).[76] Strang, G.
Linear Algebra and Its Applications (Thomson Learning, 2006), 4 edn.[77] Treil, S. Linear algebra done wrong (2017). URL .[78] Mihalcea, R. & Tarau, P. TextRank: Bringing order into text. In
Proceedings of EMNLP 2004 , 404–411(Association for Computational Linguistics, Barcelona, Spain, 2004). URL .[79] Coulet, A., Shah, N. H., Garten, Y., Musen, M. & Altman, R. B. Using text to build semanticnetworks for pharmacogenomics.
Journal of Biomedical Informatics, 1009–1019 (2010).
[80] Gábor, K. et al. SemEval-2018 task 7: Semantic relation extraction and classification in scientific papers. In
Proceedings of The 12th International Workshop on Semantic Evaluation , 679–688 (Associ-ation for Computational Linguistics, New Orleans, Louisiana, 2018). URL .[81] Wermter, J. & Hahn, U. You can’t beat frequency (unless you use linguistic knowledge): A qualitativeevaluation of association measures for collocation and term extraction. In
Proceedings of the 21st Inter-national Conference on Computational Linguistics and the 44th Annual Meeting of the Association forComputational Linguistics , ACL-44, 785–792 (Association for Computational Linguistics, Stroudsburg,PA, USA, 2006). URL https://doi.org/10.3115/1220175.1220274 .[82] Sch¨utze, H. Word space. In
Advances in Neural Information Processing Systems 5 , 895–902 (MorganKaufmann, 1993).[83] Mcdonald, S. & Lowe, W. Modelling functional priming and the associative boost. In
Proceedings ofthe 20th Annual Meeting of the Cognitive Science Society , 675–680 (Erlbaum, 1998).[84] Bullinaria, J. A. & Levy, J. P. Extracting semantic representations from word co-occurrence statistics:A computational study.
Behavior Research Methods , 510–526 (2007). URL https://doi.org/10.3758/BF03193020 .[85] Palowitch, J., Bhamidi, S. & Nobel, A. B. Significance-based community detection in weighteds net-works. Journal of Machine Learning Research , 1–48 (2018). URL http://jmlr.org/papers/v18/17-377.html .[86] Vallat, R. Pingouin: statistics in python. The Journal of Open Source Software , 1026 (2018).22 ummary of Supplementary Material In this supplementary document, we provide supplemental methods, followed by supplemental results.We conclude with additional discussion relevant to the findings in both the main and supplemental texts.
Supplementary Methods
Textbooks used
The ten textbooks used in our study [68, 69, 70, 71, 72, 73, 74, 75, 76, 77] have publication dates ranging from 1967 to 2018. The set also includes two texts that were translated from a different language, and two texts that are made available online for free use.
Considerations for concept extraction
In order to construct a semantic network, it is first necessary to choose which concepts should comprise the nodes of that network. Much previous work has considered all or most of the individual words in a text as the network nodes [20, 21]; we avoid this assumption so that we may consider, beyond individual words, higher-level concepts that may be presented in multi-word phrases. Another choice of nodes could be the topics present in the index of a text, if an index is included. We also choose not to use this method, as we seek to determine and extract the concepts from the text's exposition via some more intrinsic metric of conceptual significance. This choice was motivated by an interest in examining the semantic networks of concepts that the text poses as significant, rather than simply those of concepts which the author deems significant. Thus, via this paradigm of intrinsic conceptual significance, we aim to emulate human readers in their assessment of the significance of concepts. In choosing a methodology for extracting concepts from the texts for use as the networks' nodes, we sought a method that would maximize the number of extracted mathematical concepts while minimizing the number of extracted words and phrases that are not mathematics related. We also sought a method that would be extensible to domains of knowledge and exposition aside from mathematics, so that our whole methodology can be extended to the analysis of general textual exposition. These considerations led to our development of the modified RAKE algorithm.
Implementation details for our concept extraction methodology
In our code, we use the python-rake implementation of RAKE (https://github.com/fabianvf/python-rake); as a stop word list, we use the modified Ranks NL Long stop word list we discuss in the main text, from which we remove the word "value", which plays an important role in linear algebra phrases such as "singular value decomposition". We also add to this stop list our placeholder words, as well as the common words of mathematical exposition listed in Table S1. When forming each text's extracted index list, we retain the top one-half of extracted candidate phrases by score; this threshold of one-half is similar to thresholds used in other work, such as the threshold of one-third in RAKE [31] and TextRank [78]. However, no choice of threshold will perfectly include all relevant concepts and omit all irrelevant words.

Table S1: Common words in mathematical exposition that we add to the stop word list for concept extraction.

examples, counterexample, text, texts, undergraduate, chapter, definition, notation, proof, exercise, result
Figure S1: A simple example of a semantic network comprised of linear algebra concepts. (a) Unfilled knowledge gap in the network: the lack of connection between "square matrix" and "eigenvalues", or between "isomorphism" and "determinant", indicates the presence of a knowledge gap. (b) Knowledge gap is filled: the knowledge gap is extinguished by the addition of the relationship between "isomorphism" and "determinant", thus ensuring that all concepts' neighbors are also neighbors themselves.
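To make the extraction step concrete, a minimal RAKE-style sketch is given below. This is an illustration, not our released pipeline: the stop-word list is a toy stand-in for the augmented Ranks NL list, and `extract_concepts` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Toy stop-word list standing in for the augmented Ranks NL Long list
STOP_WORDS = {"a", "an", "the", "is", "of", "and", "to", "in", "if", "then"}

def extract_concepts(text, top_frac=0.5):
    """Score candidate phrases by summed word degree/frequency, RAKE-style."""
    # Split into sentences, then split each sentence into candidate
    # phrases at stop words
    phrases = []
    for sentence in re.split(r"[.!?]", text.lower()):
        phrase = []
        for w in re.findall(r"[a-z]+", sentence):
            if w in STOP_WORDS:
                if phrase:
                    phrases.append(tuple(phrase))
                phrase = []
            else:
                phrase.append(w)
        if phrase:
            phrases.append(tuple(phrase))
    # Word scores: degree (co-membership in phrases) divided by frequency
    freq, degree = defaultdict(int), defaultdict(int)
    for phrase in phrases:
        for w in phrase:
            freq[w] += 1
            degree[w] += len(phrase)
    scores = {p: sum(degree[w] / freq[w] for w in p) for p in set(phrases)}
    # Keep the top fraction of candidate phrases by score
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [" ".join(p) for p in ranked[: max(1, int(len(ranked) * top_frac))]]

concepts = extract_concepts(
    "A square matrix is invertible if the determinant is nonzero. "
    "The determinant of a square matrix is the product of the eigenvalues."
)
```

Note how the multi-word phrase "square matrix" is scored as a unit rather than as two separate words, which is the property motivating our choice of RAKE over single-word node extraction.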
Considerations for network construction
Once we have determined a set of concepts to use as the nodes of a text's semantic network, we then wish to form the semantic network of those concepts and their relationships, as provided by the text's exposition. Certain approaches to semantic network construction seek to determine not only whether two entities are related, but also the semantic nature of the relationship between the entities in question. Fig. S1 gives an example of such an annotated semantic network, in which each relation has a meaningful label. Such semantic parsing techniques have been applied to scientific texts in several cases [79, 80], but they generally require involved syntactic parsing rules or data annotation. We did not use these approaches, as the messy nature of the text-converted mathematics textbooks – with embedded variables, formulas, and symbols sometimes interjecting sentences – would likely have interfered with effective inference of semantic relationships. Instead, we use a method of extracting concept relationships that is more resistant to such noise: co-occurrence frequency [81]. Co-occurrence specifies the degree to which words or phrases tend to occur near each other in a text or a set of texts. Statistical metrics based on co-occurrence have been studied extensively in computational linguistics as measures of the semantic relatedness of words or phrases [82, 83, 84]. Because we are interested in relationships between concepts that are not purely linguistic in nature, and since many of our extracted concepts are multi-word phrases, we calculate co-occurrence at the sentence level; this level of granularity also ensures that phrases in the same sentence, yet separated by a string of math variables, will be inferred to be related.
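Sentence-level co-occurrence counting of this kind can be sketched as follows. This is a simplified illustration, not our exact pipeline: `cooccurrence_network` is a hypothetical name, and concept matching here is naive substring search on lowercased sentences.

```python
import re
from collections import Counter
from itertools import combinations

def cooccurrence_network(text, concepts):
    """Weighted edges: number of sentences in which two concepts co-occur."""
    weights = Counter()
    for sentence in re.split(r"[.!?]", text.lower()):
        present = sorted({c for c in concepts if c in sentence})
        # Every pair of concepts appearing in the same sentence gains one
        # unit of edge weight
        for u, v in combinations(present, 2):
            weights[(u, v)] += 1
    return weights

edges = cooccurrence_network(
    "The determinant of a square matrix is the product of its eigenvalues. "
    "A square matrix with nonzero determinant is invertible.",
    ["determinant", "square matrix", "eigenvalues", "invertible"],
)
# edges[("determinant", "square matrix")] == 2
```

Because matching is done per sentence, a formula or run of variables between two phrases does not prevent the edge from being counted, which is the robustness property discussed above.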
Null models
Here, we describe in more detail the construction and role of each null model we employ in our work. We begin with the data-level null models: for both the total network and the expositional filtration, we wish to determine the extent to which our results might simply reflect the topology one would expect from the growing "semantic network" generated by computing the co-occurrence of a random set of words in our texts. To this end, we employ a random index null model, in which we select a random set of index terms of equal size to the original index list, drawn without replacement from the set of words comprising each text (excluding the augmented stop word list we used for RAKE extraction). We use this random index list as our set of "concepts" for calculating each text's co-occurrence, yielding both a final weighted network and an expositional filtration, allowing this null to be used in the comparison of meso-scale structure and development as well as of persistent homology. Note, however, that we may interpret the random index null model in a different way: since the random index set excludes any stop words, it must be comprised of meaningful words. Thus, the random index model can be viewed as conveying a semantic network – not the network that the book intends to convey, but a semantic network nonetheless, one that may very well include some mathematically meaningful concepts.

We further seek to establish the extent to which our results on the topological development of the networks depend on the order in which relationships are introduced within the texts. We therefore employ a random sentence order null model, in which for each text we randomly permute that text's sentences and use the original set of index terms to calculate co-occurrence.
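Both data-level nulls can be sketched in a few lines; the function names are illustrative, and a real implementation would feed these outputs into the sentence-level co-occurrence computation described above.

```python
import random

def random_index_null(words, stop_words, index_size, seed=0):
    """Draw a random 'index' of non-stop words, matched in size to the real index."""
    rng = random.Random(seed)
    # Candidates are all distinct non-stop words in the text
    candidates = sorted(set(words) - set(stop_words))
    return rng.sample(candidates, index_size)

def random_sentence_order_null(sentences, seed=0):
    """Permute sentence order; the total network is unchanged, the filtration is not."""
    rng = random.Random(seed)
    shuffled = list(sentences)
    rng.shuffle(shuffled)
    return shuffled

null_index = random_index_null(
    ["matrix", "the", "vector", "and", "basis", "rank"], {"the", "and"}, 3
)
shuffled = random_sentence_order_null(["s1", "s2", "s3", "s4"])
```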
This null model yields the same total network, since the index set is the same and the same sentences are present, and thus the same sentence-level co-occurrences will occur; however, the filtration it yields will differ in the order of edge introduction, thus enabling us to study how the meso-scale and topological development of the network differs under a different sentence order.

The remainder of our null models are projected network-level nulls. To evaluate the extent to which the results we observe for the core-periphery and community structure of the empirical networks would be expected from a random network with a similar joint distribution of node degrees and weights, we use the continuous configuration model [85]. This model is an extension of the configuration model for random graph generation, and seeks to preserve the expected degree of each node, as well as the expected strength of the node, where a node's strength is the sum of the weights of the edges in which it participates. Specifically, if d_u and s_u give the degree and strength, respectively, of a node u, and d_T and s_T are the sums of all node degrees and strengths, respectively, then given some graph with node set [n], for any two nodes u, v ∈ [n] we define d_uv = d_u d_v / d_T and s_uv = s_u s_v / s_T, and let {P_uv} be some family of probability distributions with mean one. Then, to generate a graph using the continuous configuration model, we iterate through all possible pairs of nodes u, v, introducing an edge between u and v with probability d_uv; if an edge is introduced, then the edge is given weight w_uv = (s_uv / d_uv) ξ_uv, where the normalized weight random variable ξ_uv ∼ P_uv.
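The generative step just described can be sketched as follows. This is a minimal illustration, taking all ξ_uv to be iid draws from a log-normal distribution with mean one purely for concreteness, and assuming the edge probabilities d_uv lie in [0, 1].

```python
import random

def continuous_configuration_sample(degrees, strengths, seed=0):
    """Sample one weighted graph preserving expected node degrees and strengths."""
    rng = random.Random(seed)
    n = len(degrees)
    d_T, s_T = sum(degrees), sum(strengths)
    edges = {}
    for u in range(n):
        for v in range(u + 1, n):
            d_uv = degrees[u] * degrees[v] / d_T      # edge probability
            s_uv = strengths[u] * strengths[v] / s_T  # sets the expected weight
            if rng.random() < d_uv:
                # Normalized weight xi ~ P with mean one; a log-normal with
                # mu = -sigma^2/2 is used here purely for illustration
                xi = rng.lognormvariate(-0.5, 1.0)
                edges[(u, v)] = (s_uv / d_uv) * xi
    return edges

graph = continuous_configuration_sample([3, 3, 3, 3], [2.0, 2.0, 2.0, 2.0])
```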
For the sake of simplicity, we assume that all distributions P_uv are identical, so that all ξ_uv are iid draws from a single distribution P; we discuss our fitting of the distribution P for each network below.

To examine how our results on persistent homology differ from a model of exposition in which connections are drawn completely at random – that is, with a filtration of the empirical total network that adds edges randomly – we employ the random edge null model. In this model, edges present in the empirical total network are introduced in a random order, and nodes are introduced immediately preceding their first inclusion in an added edge. Next, to determine how our persistent homology results differ from a model of exposition in which concepts are iteratively introduced and connected to all already-introduced concepts, we examine a node-ordered filtration [26, 50]. In this null model, nodes are added by order of introduction in the text; if multiple nodes were originally added in a single sentence, then those nodes are added to the node-ordered model in a random order. After each node is added to the null, all edges between it and previously-added nodes that are present in the total network are added in a random order.

Finally, we must consider a caveat for the filtration null models: while the original expositional filtration, the random index null, and the random sentence order null have some intrinsic sense of "time" of introduction due to the presence of the sentence structure of the text, the latter two null models do not, as they introduce nodes and edges one at a time. As such, in order to meaningfully compare persistence barcodes amongst all these models, we must "unfurl" the expositional filtrations of the real network and the random index networks.
To this end, we introduce the one-at-a-time (OAAT) filtration process; this methodology takes a filtration in which multiple nodes and edges might be introduced in single sentences, such as the expositional filtration of a text, and transforms it so that only a single node or edge is added at each step in the filtration. Specifically, for each sentence, the OAAT process examines which nodes and edges are added to the network in that sentence; if multiple nodes are added, then they are added first, one at a time, in a random order; then edges are added, one at a time, in a random order. For our empirical expositional network, we compute 100 instantiations of this OAAT filtration in order to account for stochasticity in the random ordering (we do not do this for each random index or sentence order filtration, since we already compute 100 distinct such graphs). With this method, we may examine the topological development that occurs not just over the course of the text at a sentence-level granularity, but also on a sub-sentence scale.

There are certain tradeoffs we make in using the OAAT filtration for our expositional filtrations. In particular, we lose the direct relationship of cavity persistence length to "time", or sentence duration throughout the text, since we instead simply introduce one node or edge at each "timestep" in the OAAT filtration. However, long cycles should still tend to be long, under the assumption that nodes and edges are introduced relatively consistently throughout the texts. Furthermore, this "unfurling" of the expositional filtration gives us the ability to make a head-to-head comparison of our latter two null models to the expositional filtrations. These two nulls have no built-in notion of time, and introduce a single node or edge at each step of their filtration; as such, putting our expositional filtrations on equal footing makes the qualitative and quantitative comparison of the persistent homologies of these filtrations more direct.
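The OAAT unfurling can be sketched as follows, assuming each sentence's contribution is given as a pair of newly introduced nodes and edges; the function name is illustrative.

```python
import random

def oaat_filtration(sentence_steps, seed=0):
    """Unfurl a sentence-level filtration so one node or edge is added per step.

    `sentence_steps` is a list of (new_nodes, new_edges) pairs, one per
    sentence. Within a sentence, nodes are added first, one at a time in a
    random order, and then edges, one at a time in a random order.
    """
    rng = random.Random(seed)
    steps = []
    for new_nodes, new_edges in sentence_steps:
        nodes, edges = list(new_nodes), list(new_edges)
        rng.shuffle(nodes)
        rng.shuffle(edges)
        steps.extend(("node", n) for n in nodes)
        steps.extend(("edge", e) for e in edges)
    return steps

# Two "sentences": the first introduces a, b and edge (a, b);
# the second introduces c and its edges to a and b
steps = oaat_filtration(
    [({"a", "b"}, {("a", "b")}), ({"c"}, {("a", "c"), ("b", "c")})]
)
```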
Supplementary Results
Estimating the normalized weight distributions for the continuous configuration model
The parametrization of the continuous configuration null model for weighted undirected graphs rests upon the choice of a family of probability distributions P_uv specifying the distribution of the possible "normalized weight" values for each edge connecting nodes u and v in a network's node set. Specifically, where d_u and s_u are the degree and strength, respectively, of a node u, d_T and s_T are the sums of degrees and strengths, respectively, over all nodes in a network, and d_uv = d_u d_v / d_T and s_uv = s_u s_v / s_T give a normalized view of the extent to which two nodes are both high (or low) in degree or strength, the continuous configuration model assumes that the weight of an edge between two nodes u and v, if such an edge exists, will be w_uv = (s_uv / d_uv) ξ_uv, where ξ_uv ∼ P_uv, some probability distribution on what we call the "normalized weight" of an edge. In our work, for the sake of simplicity, we assume that all normalized weight distributions are the same distribution P. With this assumption, we may choose a parametrization of P and fit this distribution to the empirical normalized weights of all edges in a given network. In particular, if the empirical edge weights are given as ŵ_uv for all (u, v) in the edge set, then the empirical normalized weights are simply given by ŵ_uv d_uv / s_uv.

Once we have the normalized weights, we may choose a parametrization. Because the normalized weights of a network are positive and not restricted to the integers, we attempted maximum likelihood fits of a number of continuous probability distributions with support on the positive real line to each network's normalized weights. Specifically, we focused on long-tailed distributions: the Pareto, Log-normal, Lévy, Burr, Fisk, Log-gamma, Log-Laplace, and power-law distributions. We also calculated the Kolmogorov-Smirnov (K-S) statistic D of each best-fit distribution in order to determine how well the distribution fit the empirical normalized weight data.
Distributions were fit and K-S statistics were calculated in Python with the SciPy library, version 1.1.0 [33]. In all networks, the K-S statistic was quite low, with p-values all significantly greater than 0.05, indicating good fit between the empirical and best-fit distributions, or insufficient evidence to reject the null hypothesis that the empirical normalized weight distribution and the best-fit distribution are identical. The best fits and statistics for each text's network are reported in Table S2.
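The fit-and-test pattern can be sketched with SciPy on synthetic data. This is an illustration of the procedure, not our exact script; only three of the candidate families are shown, and the synthetic weights are a stand-in for a network's empirical normalized weights.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for a network's empirical normalized weights
rng = np.random.RandomState(0)
normalized_weights = rng.lognormal(mean=0.0, sigma=0.5, size=1000)

candidates = ["lognorm", "burr", "fisk"]
results = {}
for name in candidates:
    dist = getattr(stats, name)
    params = dist.fit(normalized_weights)  # maximum likelihood fit
    # K-S goodness-of-fit test of the data against the fitted distribution
    D, p = stats.kstest(normalized_weights, name, args=params)
    results[name] = (D, p)

# The reported best fit is the family with the smallest K-S statistic
best = min(results, key=lambda name: results[name][0])
```

Note that because the parameters are fit to the same data being tested, the resulting K-S p-values are optimistic; we use them here, as in the text, only as a descriptive check of fit quality.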
Concepts that appear in more than half the semantic networks’ cores
See Table S3.
Example concepts in the Axler periphery communities
See Table S4.
Development of the meso-scale core-periphery and community structures
Similar to our analysis of the development of each text's core and periphery, we further wish to examine the development of the community structure in the semantic networks through the addition of edges between particular groups over the course of exposition. Specifically, we consider four edge types: 'core-periphery' edges, or those connecting a core node with a periphery node; 'intra-core' edges, connecting two core nodes; 'inter-periphery' edges, connecting nodes in two different periphery communities; and 'intra-community' edges, connecting two nodes in the same periphery community. We examine the relative introduction of each group of edge types by calculating, at each point in the texts' expositions, what fraction of edges in a particular group have been introduced. We show these development curves in Fig. S2. While curves near the line y = x reflect constant introduction over time, the early and late intra-community examples shown have significant variability and deviate greatly from such a rule of constant introduction. We may quantify this deviation from constant introduction with the Kolmogorov-Smirnov (K-S) distance: for any edge group development curve c(·), we examine its K-S distance, or greatest vertical distance, to the line y = x on the interval (0, 1), KS(c) = max over t ∈ (0, 1) of |c(t) − t|. Note that we chose our early- and late-introduced communities in Fig. S2 as those communities with the most positive and most negative values of c(t) − t on the interval (0, 1).

Table S2: Best-fit distributions and corresponding K-S statistics and p-values for the normalized weight distribution of each text.

Text | Best-fit distribution | K-S statistic | K-S p-value
Treil | Burr | 0.0163 | 0.303
Axler | Burr | 0.00997 | 0.795
Edwards | Log-normal | 0.0232 | 0.195
Lang | Log-normal | 0.0140 | 0.687
Petersen | Burr | 0.0165 | 0.174
Robbiano | Fisk | 0.00964 | 0.847
Bretscher | Burr | 0.00870 | 0.758
Greub | Burr | 0.0146 | 0.436
Hefferson | Burr | 0.00696 | 0.910
Strang | Burr | 0.00762 | 0.759

Table S3: Concepts that occur in more than half of the texts' cores.

Concept | Frequency in cores
multiplication | 8
vector space | 7
scalar | 7
vector | 8
inverse | 8
matrix | 9
polynomial | 7
coefficient | 8
linear transformation | 6
linear | 8
linearly independent | 9
diagonal | 9
theorem | 9
projection | 6
orthogonal | 9
invertible | 6
subspace | 9
determinant | 9
diagonal matrix | 6
eigenvalue | 9
eigenvector | 8
orthonormal | 7
orthonormal basis | 6
equation | 7
symmetric | 6

Table S4: Example concepts present within communities in the Axler periphery.

Community | Example concepts
1 | commutative, associative, dual space, dual map, duality, column rank, row rank
3 | finite dimensional subspace, orthogonal, orthogonal complement
4 | inverse, additive inverse, additive identity
6 | null space, injective, surjective, isomorphism, invertibility, identity map
7 | induction hypothesis, division algorithm, factorization
8 | linearly dependent, linear combination, orthonormal list, gramschmidt procedure
9 | euclidean inner product, dot product, continuous real value[d] function, derivative
10 | positive operator, adjoint operator, complexification, complex spectral theorem
11 | transpose, permutation, determinant, square matrix

Barcodes and Betti curves for all texts and null models
For the barcodes and Betti curves of the sentence-granularity text filtration, random index model, and random sentence order model, see Figs. S3 and S4. For barcodes and Betti curves of the OAAT text filtration and all null ensembles, see Figs. S5 and S6.

Figure S2: Community development curves across texts, and associated K-S distance between community development curve types and the line y = x across all texts and null ensembles. (a) Mean ± standard deviation of community development curves, (b) K-S distances for the core-core edge introduction curve, (c) K-S distances for the core-periphery edge introduction curve, (d) K-S distances for the periphery-periphery edge introduction curve, and (e) mean K-S distances across intra-community edge introduction curves.

Figure S3: Sentence-filtration barcodes and Betti curves for the first half of the texts. Each pair of rows shows an example barcode and Betti curves for a given text, with text results in the leftmost column and null models in the other columns.

Figure S4: Sentence-filtration barcodes and Betti curves for the second half of the texts. Each pair of rows shows an example barcode and Betti curves for a given text, with text results in the leftmost column and null models in the other columns.

Figure S5: OAAT barcodes and Betti curves for the first half of the texts. Each pair of rows shows an example barcode and Betti curves for a given text, with text results in the leftmost column and null models in the other columns.

Figure S6: OAAT barcodes and Betti curves for the second half of the texts. Each pair of rows shows an example barcode and Betti curves for a given text, with text results in the leftmost column and null models in the other columns.

Figure S7: Normalized average cycle lifetime for 0-, 1-, and 2-dimensional persistent homology across all texts' sentence-granularity filtrations and random index and random sentence order null models. From top to bottom: dimensions 0, 1, and 2.
Normalized average cycle lifetime for texts and all null ensembles
For normalized average cycle lifetimes of the sentence-granularity filtrations for the empirical texts, random index model, and random sentence order model, see Fig. S7. For the normalized average lifetimes of the OAAT filtrations for the empirical texts and all null models, see Fig. S8.
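Given a barcode as a list of (birth, death) pairs in filtration-step units, one reasonable reading of the normalized average cycle lifetime is the mean bar length divided by the total filtration length, truncating bars that never die at the final step. This is a sketch of that summary statistic; the exact convention in our code may differ.

```python
def normalized_average_cycle_lifetime(barcode, n_steps):
    """Mean bar length divided by total filtration length.

    `barcode` is a list of (birth, death) pairs in filtration-step units;
    bars that never die are encoded with death = float("inf") and are
    truncated at the final step n_steps.
    """
    if not barcode:
        return 0.0
    lifetimes = [min(death, n_steps) - birth for birth, death in barcode]
    return sum(lifetimes) / (len(lifetimes) * n_steps)

nacl = normalized_average_cycle_lifetime([(10, 50), (20, float("inf"))], n_steps=100)
# lifetimes 40 and 80, mean 60, so NACL = 0.6
```

Because the OAAT filtrations of different texts have different numbers of steps, this normalization by filtration length is what makes the lifetimes comparable across texts and null ensembles.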
Extended correlation analysis
In the main text, we report results of a brief exploratory analysis assessing the relationship between structural features of exposition and community ratings of the textbooks from which the expositions are taken. Here, we provide the complete statistics for the Spearman and Pearson correlations between average rating on Goodreads and normalized average cycle lifetime (NACL) in Table S5. We also note that while we consider average rating across editions, the default rating presented by Goodreads for textbooks, this metric should reasonably approximate the rating of each specific text edition we consider, since textbook editions tend to be similar.

We furthermore examine additional correlations between text features, both structural and otherwise, in Fig. S9, with associated p-values in Fig. S10. Notably, while we observe correlations between average and dimension-2 OAAT NACL and both the number of sentences and the node count of each text, neither of the latter structural features is significantly correlated with the average text rating. Furthermore, though the number of ratings for each text is highly variable (Table S6), we find that this number does not significantly correlate with text rating. All p-values reported here and in the corresponding section of the main text were calculated using the Pingouin Python library, version 0.2.8 [86].

Figure S8: Normalized average cycle lifetime for 0-, 1-, and 2-dimensional persistent homology across all texts' OAAT filtrations and all null models. From top to bottom: dimensions 0, 1, and 2.

Table S5: Spearman and Pearson correlation coefficients and p-values for Goodreads ratings and normalized average cycle lifetimes (NACLs).

Correlate | Spearman corr. coef. | Spearman p-value | Pearson corr. coef. | Pearson p-value
NACL, dim. 0 | 0.143 | 0.760 | 0.466 | 0.291
NACL, dim. 1 | 0.036 | 0.939 | -0.334 | 0.464
NACL, dim. 2 | 0.0 | 1.0 | -0.145 | 0.757
Avg. NACL | 0.071 | 0.879 | -0.187 | 0.689
OAAT NACL, dim. 0 | -0.857 | 0.0137 | -0.821 | 0.0237
OAAT NACL, dim. 1 | -0.500 | 0.253 | -0.575 | 0.177
OAAT NACL, dim. 2 | -0.893 | 0.00681 | -0.846 | 0.0163
Avg. OAAT NACL | -0.821 | 0.0234 | -0.828 | 0.0213

Table S6: Average rating and total number of ratings on Goodreads for texts with more than 5 ratings.

Text | Average Goodreads rating | Number of ratings
Treil | 3.83 | 6
Axler | 4.26 | 673
Lang | 4.23 | 31
Bretscher | 3.37 | 71
Greub | 3.43 | 7
Hefferson | 3.96 | 25
Strang | 4.21 | 891

Supplementary Discussion
The remarkable effectiveness of the random index null model
Throughout our study, we have used the random index model as a null to examine how the results we obtain for the texts' actual semantic networks differ from what we might expect when simply calculating the co-occurrence networks and filtrations of a set of random words in a text. Notably, while most of our results have fallen at the extreme ends of the metric distributions exhibited by the random index ensemble, in some cases, such as in 1- and 2-dimensional normalized average cycle lifetime, the empirical filtrations demonstrate values that fall near the bulk of the corresponding random index ensemble results. The perspective that the null simply gives us a weighted network and filtration computed from the co-occurrence of a random set of words might be disheartening, as this could suggest that our results, rather than reflecting the meaningful structure of semantic networks of concepts elucidated by the textbooks, instead might simply reflect growing topologies that would be expected from any similar calculation of co-occurrence within a text. However, there is another lens through which we can view the random index null model: recalling that the random index sets are comprised of words not found within the stop word list, we might consider each random index graph as a semantic network itself. Certainly, the semantic features extracted through co-occurrence might not reflect the content which is the primary focus of the text, since the random index set might include non-mathematically-meaningful words.
Even so, it is likely that some mathematical words will make their way onto the index set, and the remainder of the words are also meaningful in some way, since they are not stop words.

Figure S9: Spearman correlation matrix for text features, including sentence- and OAAT-normalized average cycle lifetime (NACL), core-ness and modularity statistics, core-periphery area, intra-community edge development K-S, word frequencies, average text ratings and number of ratings, and text length, node count, and edge density. "NACL d" refers to NACL in dimension d.

Figure S10: Spearman correlation p-values for text features, including sentence- and OAAT-normalized average cycle lifetime (NACL), core-ness and modularity statistics, core-periphery area, intra-community edge development K-S, word frequencies, average text ratings and number of ratings, and text length, node count, and edge density. "NACL d" refers to NACL in dimension d.