The Role of Network Analysis in Industrial and Applied Mathematics
Abstract
Many problems in industry (and in the social, natural, information, and medical sciences) involve discrete data and benefit from approaches from subjects such as network science, information theory, optimization, probability, and statistics. The study of networks is concerned explicitly with connectivity between different entities, and it has become very prominent in industrial settings, an importance that has intensified amidst the modern data deluge. In this commentary, we discuss the role of network analysis in industrial and applied mathematics, and we give several examples of network science in industry. We focus, in particular, on discussing a physical-applied-mathematics approach to the study of networks. We also discuss several of our own collaborations with industry on projects in network analysis.

Mason A. Porter and Sam D. Howison
Department of Mathematics, UCLA, Los Angeles, California 90095, USA
Mathematical Institute, University of Oxford, Oxford OX2 6GG, UK
CABDyN Complexity Centre, University of Oxford, Oxford OX1 1HP, UK
August 8, 2018
arXiv [cs.SI]
Mathematics has long played a vital role in industry. From the mixing of fluids to produce an ideal bar of chocolate to the study of gasoline emissions in vehicles, wonderful problems in continuous mathematics (typically framed in terms of ordinary and partial differential equations) have arisen from industrial problems [2]. They have increasingly complemented the equally wonderful problems posed by applications in fundamental science and engineering. The applied mathematics curricula (and subjects studied by the academic staff) of many universities reflect this historical bias toward continuum problems.

The goal of the present commentary is to promote an approach to network analysis (especially in industry) through so-called ‘physical applied mathematics’. It is first useful to convey our perspective on such a viewpoint, which is one of the most prominent approaches to applied mathematics and our personally favored approach to science. In a physical-applied-mathematics approach to a problem, one uses basic physical (or biological or chemical) principles and relevant domain knowledge to derive governing equations (most often in the form of ordinary or partial differential equations) and boundary and/or initial conditions; simplifies the equations to make them mathematically tractable; studies the equations both computationally and with a wide variety of mathematical tools (often approximately, such as with asymptotic analysis and perturbation theory); compares the numerical solutions (ideally of both the simplified governing equations and, if possible, the ‘original’ equations) with approximate analytical solutions or qualitative behavior revealed through procedures like a dynamical-systems analysis; compares these results with controlled experiments; and, when possible, compares the experimental results with natural or industrial phenomena in more realistic settings. Textbooks such as [29, 50] describe these ideals.
Through making comparisons, one also revisits one’s assumptions, adjusts models, refines experiments, and so on. A final crucial step is to interpret the results of the mathematical and numerical studies in a way that engages seriously with the original problem. A problem’s stakeholders must learn something from the mathematical efforts, and such stakeholders (whether they are scientists in other academic departments, people who work in industry or government, or others) often collaborate directly on the problem. At a minimum, they need to be consulted early and often, as they offer domain knowledge.

Good physical applied mathematics can start from potentially any type of problem, including fluid, solid, or granular phenomena in nature; observations of biological systems; observations of human or animal behavior; physical, behavioral, or other phenomena in industry; and much more. Industrial problems (and other problems as well) typically start out in woolly form, and a key challenge for applied mathematicians is formulating a concrete, tractable mathematical problem whose solution (perhaps in approximate or numerical form) can yield important insights about the original problem or phenomenon. This is the art of ‘mathematical modeling’, and in industrial mathematics, it often entails taking a physical-applied-mathematics approach to problems that arise from industry. It is Applied mathematics, rather than applied Mathematics, as it is crucial to engage very seriously with applications.

This approach, for which open-problem brainstorming workshops with industry, known as ‘study groups’, have made pioneering contributions [2], also applies to problems and data that take discrete forms. It has long been true that many problems in industry include discrete data and benefit from approaches that incorporate topics from subjects such as network science, information theory, optimization, probability, and statistics. For example, solving problems in optimization is crucial for assembly lines, and the famous traveling salesperson problem (TSP) has an undeniably practical origin [18]. Amidst the modern data deluge, the importance of discrete data and associated approaches has reached new heights. Social media, which now pervade all aspects of life, involve interactions between entities; radio-frequency identification (RFID) device data track the movements of people in cities and stores through discrete delineated zones; people have associated metadata that describe their characteristics using discrete (categorical or ordinal) variables; and so on. Network science is concerned explicitly with connectivity between (and among) different entities [63], and it has become very prominent in industrial settings [73], an importance that has been accentuated by the modern wealth of data.

There is also a long history of using statistical approaches (such as actor-oriented models and others) to study networks [81, 82, 90], and statistical approaches have long dominated industrial approaches to network analysis. When studying networks, however, it can be very fruitful to take a complementary perspective: one uses the established approach of physical applied mathematics, but now the problems need not be physical (or from other traditional domains), and in particular they are often discrete in nature and/or involve copious amounts of data.
Specifically, as we discuss in more detail in Section 2, it is desirable to combine network science, ‘the study of connectivity’, with an applied-mathematics philosophy, which has been enormously successful in collaborations with industry [6]. For a recent collection of modeling efforts involving networks (including in industry), see the December 2016 special issue of European Journal of Applied Mathematics [73]. For some more specific examples, see the special issue of Royal Society Open Science on urban analytics [36]. There even exist companies that specialize in network analysis, and there are of course myriad companies that specialize in data analysis more generally. The latter category includes our collaborator dunnhumby [22], and numerous other companies (including our collaborators HSBC, Tesco, and Unilever) include data and network analysis as part of larger research portfolios.

In this article, we discuss the role of network analysis in industrial and applied mathematics, and we give several examples of our own work on network science in collaborations with industrial partners. Specifically, we highlight a physical-applied-mathematics approach to industrial problems, though we are well aware that other perspectives are also important. In Section 2, we discuss network modeling and relate it to traditional ideas from physical applied mathematics. In Section 3, we discuss applications of mathematics to the social sciences. In Section 4, we discuss network science in industrial settings. We conclude in Section 5.

The study of networks incorporates tools from a wide variety of subjects, such as graph theory (of course), computational linear algebra, dynamical systems, optimization, statistical physics, probability, statistics, and more, and is important for applications in just about any area that one might imagine. Scholars who study networks ask questions like the following: Who are the most important people, and which are the most important collaborations, in a network of overlapping committee memberships? What is a good movie-recommendation strategy in a social network? How did ideas spread over Twitter and other social media in the [...]

See [35] (from the collection [6]) for an example application of network analysis in industry.
One example is Orgnet LLC [3], with which neither of us has an affiliation or collaboration, which ‘provides software, training, consulting, and research in the application of network analysis in a wide variety of domains.’ Essentially, they do network analysis and data visualization for hire. For instance, a company may hire them as a consultant to analyze its organizational structure as a network (say, of people connected based on their interactions with each other) to determine its key employees, including perhaps underappreciated people, for information flow.

By its nature, network analysis intersects significantly with graph theory, but the former has a much broader spectrum than the latter (e.g., dynamics often plays a central role), and graph-theoretic analysis is often not the primary focus in studies of networks.

The simplest type of network is a graph $G$, which consists of a set $V$ of ‘nodes’ (or ‘vertices’) that encode entities and a set $E \subseteq V \times V$ of ‘edges’ (or ‘links’ or ‘ties’) that encode the interactions between those entities. However, the term ‘network’ is more general than a graph, as a network can encompass connections among an arbitrary number of entities, can have nodes and/or edges that change in time, can include multiple types of edges, often has associated dynamical processes both on the network and of the network, and so on. Associated with a graph is an ‘adjacency matrix’ $A$, where, if one does not include a value to model the strength of a connection, an entry $a_{ij} = 1$ indicates the presence of an edge that connects entity $i$ to entity $j$ directly, while $a_{ij} = 0$ indicates its absence. That is, when $a_{ij} = 1$, node $j$ is ‘adjacent’ to node $i$, and the associated edge is ‘incident’ from node $i$ and to node $j$. The number of edges that emanate from a node is its ‘out-degree’, and the number of edges incident to a node is its ‘in-degree’. For an undirected network, $a_{ij} = a_{ji}$, and the number of edges incident to a node constitutes the node’s ‘degree’.
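As a minimal illustration of these definitions, the following Python sketch computes out-degrees and in-degrees from an adjacency matrix, using the convention above that $a_{ij} = 1$ encodes an edge from node $i$ to node $j$. The three-node matrix and the function names are hypothetical and chosen only for illustration.

```python
# Hypothetical 3-node directed graph. Convention from the text:
# A[i][j] == 1 means that there is an edge from node i to node j.
A = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
]


def out_degrees(A):
    """Number of edges that emanate from each node (row sums)."""
    return [sum(row) for row in A]


def in_degrees(A):
    """Number of edges that are incident to each node (column sums)."""
    return [sum(col) for col in zip(*A)]


def is_undirected(A):
    """An undirected network has a symmetric adjacency matrix."""
    n = len(A)
    return all(A[i][j] == A[j][i] for i in range(n) for j in range(n))


print(out_degrees(A))    # [2, 1, 1]
print(in_degrees(A))     # [1, 1, 2]
print(is_undirected(A))  # False
```

For an undirected network, the two functions return the same list, which is then the list of node degrees.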
The spectral properties of adjacency (and other) matrices give important information about associated graphs [63, 88]; for undirected networks, it is common to exploit the beneficent property that real symmetric matrices have only real eigenvalues.

Although the study of networks continues to advance at a rapid pace, it can be useful to keep in mind some basic ideas. For example, it can be very insightful (e.g., for developing ranking methods for Web pages, sports teams, and other things [31]), though sometimes fraught with complications, to study important (i.e., ‘central’) nodes, edges, and other small structures in networks [63]. Another important theme is the study of both small and large mesoscale features, which impact network function and dynamics in important ways. Certain small subgraphs (called ‘motifs’) may appear frequently in some networks [58], providing possible indications of fundamental structures such as feedback loops and other building blocks of global behavior. There have been extensive studies of various types of larger-scale network structures, such as dense ‘communities’ of nodes [28, 75] (see Section 4), core–periphery structure [19], and others. Other famous features of many networks have also played an important role in the emergence of ‘network science’ as an area of study [89]. One of these is the ‘small-world property’ [76, 93], in which the mean shortest-path distance between nodes in a network scales sufficiently slowly (specifically, logarithmically or slower) as a function of the number of nodes. In many situations, such as in social networks, there is simultaneously significant local clustering.
Another is heavy-tailed degree distributions (as idealized by a power law), indicating the presence of many nodes with a small degree but few nodes (sometimes called ‘hubs’) with a large degree [8, 63].

To consider edges with different amounts of importance, one can assign a weight (typically a nonnegative real number, although there are many situations, such as in the study of international relations or social interactions [63, 90], in which negative values can be appropriate) so that the entry $w_{ij}$ of a ‘weighted adjacency matrix’ (or ‘weight matrix’) $W$ represents the weight of the connection between nodes $i$ and $j$. A large value of $w_{ij}$ represents a strong connection from $i$ to $j$, though sometimes (e.g., for applications in transportation) one might instead have a distance matrix, and then elements with smaller values represent stronger connections. Such data can arise, for example, in the form of physical distances (e.g., road networks) or from measurements of empirical or expected travel time between a pair of locations (e.g., using Oyster-card data from Transport for London).

One has to define this property somewhat differently if one is studying a single, finite-size network rather than a sequence of networks of increasing size.

2.2 Modeling Considerations and the Incorporation of Complications

As we mentioned in Section 1, an important issue in the study of networks involves the notion of ‘modeling’ itself. An applied mathematical (or physical) model usually takes as a starting point a postulated mechanism of cause and effect. Statistical models, by contrast, have probabilities underlying them and are more (and sometimes purely) descriptive in nature. Statistical models are often very successful at indicating correlations (interpreted broadly).
Consequently, it is not surprising that the study of networks is more common among statisticians than among applied mathematicians, as reflected by the prominence of statistical approaches in studies of networks in industry and applications [45]. However, statistical approaches are rightly cautious in deducing causation from correlation. Hence, a distinctive feature of network modeling in the spirit of physical applied mathematics is its linking of ideas and tools from statistics (which are necessary, given the high-dimensional nature of networked systems) with the desire for causal mechanisms. Put another way, a physical-applied-mathematics approach tends to put more emphasis on detailed modeling of mechanisms than do statistical perspectives. One possible desirable outcome is to derive some kind of governing equations (perhaps high-dimensional ones), which one can try to simplify using some sort of mean-field theory, master equation, or other approximations [63, 74]. Unfortunately, this is often very difficult.

Given data in which connectivity plays a role, and assuming that one wishes to use tools from network science to help in one’s analysis (that itself is not always obvious, and it is an important modeling decision), one needs to decide what type of network description to use, analogously to deciding a level of description using other approaches (e.g., discrete particulate interactions versus a continuum (PDE) model). Just as one needs to use the correct conservation laws (and boundary conditions, initial conditions, and so on) in continuum models, it is crucial to choose an appropriate network representation for the problem at hand. If one studies the wrong network or asks questions that one’s network representation cannot answer adequately, it is easy to end up with nonsense.
In taking a physical-applied-mathematics approach, it is typically desirable (though it can be rather challenging to do it well) to proceed as follows: (1) propose mechanisms, often probabilistic ones such as interactions arising from a Poisson process [23], for the generation of edges and edge weights; and then (2) interpret the ensuing results.

To be a bit more concrete, suppose that one possesses a time-resolved data set representing social interactions. The most common approach in such a situation is to aggregate the data into a time-independent representation and study the resulting graphs and adjacency matrices with standard tools, but that can cause several problems. There are many choices for aggregation, the simplest of which is to count the number of interactions between each pair of entities and place those numbers in the weight matrix $W$. If, for example, $i$ and $j$ interacted with each other twice during the monitored time window, then $w_{ij} = w_{ji} = 2$. Unfortunately, this type of aggregation ignores the ‘bursty’ nature of dynamics in social systems and instead is based on an implicit (and often incorrect) assumption of Poissonian-in-time interactions between entities [37]. One can try to aggregate the temporal information in a more sophisticated way, but then one has to think very carefully both about the observations and about the sociological (or other) model of communication between individuals. Such aggregations of temporal data into time-independent representations also suffer problems related to concurrency and ordering of interactions, which are crucial for applications such as transmission of information and diseases (and thus for many problems of industrial interest), so one needs to go beyond the traditional tools associated with time-independent graphs.
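The simplest aggregation described above can be sketched in a few lines of Python. The contact list, the node labels, and the window width below are hypothetical, and the windowed variant uses non-overlapping windows purely for brevity; it is one possible choice among many.

```python
from collections import defaultdict

# Hypothetical time-stamped, undirected contacts: (time, i, j).
contacts = [
    (1, "a", "b"),
    (2, "a", "b"),
    (3, "b", "c"),
    (50, "a", "c"),
    (51, "a", "c"),
]


def aggregate(contacts):
    """Count interactions per pair, so that w[i][j] == w[j][i]."""
    w = defaultdict(lambda: defaultdict(int))
    for _, i, j in contacts:
        w[i][j] += 1
        w[j][i] += 1
    return w


def aggregate_by_window(contacts, width):
    """Build one weight matrix per non-overlapping time window."""
    windows = defaultdict(list)
    for t, i, j in contacts:
        windows[t // width].append((t, i, j))
    return {k: aggregate(v) for k, v in sorted(windows.items())}


W = aggregate(contacts)
print(W["a"]["b"], W["b"]["a"])  # 2 2

layers = aggregate_by_window(contacts, width=10)
print(sorted(layers))  # [0, 5]
```

The full aggregate `W` cannot distinguish the early burst of contacts between `a` and `b` from the late burst between `a` and `c`, whereas the windowed version separates them. Neither retains the ordering of contacts within a window, which is one reason to go beyond time-independent graphs.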
One way to do this is to study ‘temporal networks’ [39, 41] and either perform aggregations over multiple windows (which can either overlap or not) or perhaps not aggregate at all and consider the timeline of interactions to be the objects of interest. Indeed, the study of temporal networks is a very active area of network science, with several actively researched, unresolved theoretical issues:

1. It is very far from clear how to generalize measures and approaches (e.g., ‘centrality’ measures of node and edge importance, data clustering methods, and others) from time-independent networks to temporal networks.
2. These concepts can be generalized in many different ways, and which generalizations are better for which applications, problems, and data is an open issue.
3. One has to consider the important issues of discrete versus continuous time and of interaction duration.

Another approach in data science, such as in hierarchical clustering, is to ‘start from the data’.
5. One must also consider the relative temporal scales of changes in network structure (weights and/or connections) and changes in the states of network nodes and edges (e.g., time-dependent traffic on city streets).

Thus, with temporal networks (and any other generalization of ordinary graphs), there is a modeling tradeoff: Should one collapse the data and use a simpler representation, possibly losing something vital or even obtaining a qualitatively incorrect answer in the process, to be able to use a better-developed and better-understood approach; or should one keep some of the salient information (surely one cannot keep all of it, given limitations imposed by data size and measurement) and have to generalize the mathematical approach and perhaps make some missteps along the way? As with mathematical modeling (and very prominently indeed in industry), which approach to take depends on the problem and the question that one is asking. Ideally, one pursues both approaches, because it is necessary to develop a better understanding of which simplifications are acceptable.

Temporal dynamics is not the only type of complication in interaction data. For example, data can have ‘multilayer’ structures, perhaps through the interaction of multiple subsystems or through the presence of multiple types of connections (e.g., there can be multiple communication channels or multiple modes of transportation) [13, 44], and one thus has to consider whether to use a monolayer (i.e., single-layer) or multilayer approach. As with temporal networks, one keeps more information with a multilayer approach, but it is challenging to generalize monolayer measures and methods, as different generalizations are appropriate in different situations. There are several other similar issues.
Should one consider just network structure, a dynamical process on a network, or an ‘adaptive network’ [80], in which the dynamics on top of networks are coupled to the dynamics of the network structure (e.g., a driver changes his/her route based on traffic conditions)? Should one include annotations (e.g., demographic data) on nodes and/or edges? Should one allow ‘hyperedges’ or simplices to connect more than two nodes at a time? Should one perhaps do all of these things (as well as others that we have not mentioned)? However, including everything yields an enormous mess that nobody knows how to study!
Further issues arise in the form and fidelity of data. Data may be inaccurate or missing (and ‘Big Data’ is very far from the same thing as ‘good data’ or ‘appropriate data’), and generalizing network structure to incorporate more features necessitates demanding reasonable measurements of more things. Thus, what data one can reliably collect (or obtain access to) will also influence the complexity of the chosen network representation. There is also the issue that most data do not come initially in the form of a network, or they may come in such a form but with difficulty in determining the weights (or even existence) of edges between entities. In some situations (e.g., physical networks, such as in the study of traffic on road networks), there are straightforward ways to measure edge weights. However, in many other examples (e.g., from social or biological interactions), it is much harder to reliably calculate the weight of a network edge from empirical data. For example, suppose that data arise in the form of pairwise similarities between entities. One can construct an adjacency matrix and thus a network, but is a network approach the best one (or even a good one) to use? Perhaps one should instead use data-reduction techniques from machine learning?
To give an even more complicated scenario, one may possess coupled time series, so one can construct a (possibly time-dependent) set of similarities using one or more of many possible ways of measuring similarities between time series, and one thereby obtains a network (either a temporal one or a time-independent one) to analyze. However, one started with a set of coupled time series, so maybe one should use time-series approaches?

Another salient point, which Andrew Stuart has pointed out [86] in the context of data assimilation (see [47] for an introduction to data assimilation) and which we borrow for our discussion, is the level of verifiability (a kind of trust) of models in different domains and how that affects mathematical (and statistical) modeling and the interaction between data and models. At one extreme (i.e., most verifiable) lie physical models, in which there is a set of mechanistic equations, which, in many cases, one trusts fully because they are derived from fundamental physical principles that are supported by numerous repeatable experiments with very precise and accurate results. Somewhat less verifiable (or calibratable) are typical [...]

It often feels like many people believe in the Data Fairy. Perhaps they put their hard drive under their pillow and hope that the Data Fairy comes during the night to leave them a clue?

As Tom Petty and the Heartbreakers might sing, the weighting is the hardest part.

[...] It is important to emphasize that the location of a particular model on this spectrum depends strongly on the availability of widely accepted basic principles, but it is also true that each discipline has its own prejudices in favor of some kinds of models and against others.
A key challenge in network science (and a major difference with other areas of industrial and applied mathematics) is that the off-the-shelf tools are much less developed in network science than they are in other areas. It is far from clear how to generalize tools and methods from graphs to more complicated types of networks (and there remain a wealth of open problems even when studying graphs), and such efforts are perhaps the most active part of network science. When faced with a problem from industry (or elsewhere), there is a tension between (1) simplifying it and using available approaches and (2) trying to develop the new mathematical and computational tools that are necessary to examine the problem in a more detailed and perhaps more appropriate way.
An area that helps set the stage for the importance of networks in industrial problems is the application of mathematics, and networks especially, in the social sciences. The use of mathematical approaches to social phenomena is much older than widely appreciated [21, 90]. More recently, the wealth of social data (e.g., from social media such as Twitter and Facebook, and directly from companies in forms such as mobile-phone data, shopping data, human movement data, and others) has brought social science to the center of the ‘Big Data’ explosion. In contrast to much smaller data sets, collected traditionally in forms such as surveys, the data deluge has led to the formation of subjects such as ‘computational social science’ [48, 79]. This has placed social science in a transition period in which an increasing number of researchers who are trained in subjects such as computer science, physics, and mathematics are trying to apply their techniques to social systems [92]. There is an ongoing revolution in our ability to predict and explain their dynamics [38], underscoring the importance of developing new mechanistic models for use in computational social science [40].
To give a concrete example of mathematical modeling in the social sciences, we discuss ideas related to opinions and social influence on networks [14, 49, 74]. One of the best-known, albeit in many ways rather [...]

See [46] for additional discussion of data-driven modeling in complex systems (for scenarios with various levels of trust), and see [66] for a discussion of model verification and validation in the context of the earth sciences.

[...] node $i$ has a threshold $R_i$ (which, in most models, is time-independent) that is drawn from some distribution. At any given time, each node can be in one of two states: 0 (inactive, not adopted, not infected, etc.) or 1 (active, adopted, infected, etc.). The states of the nodes change in time according to an update rule, and one can update nodes either synchronously or asynchronously. In the context of spreading of information, one can construe the transition from 0 to 1 as an instantaneous purchase of a product or as representing, in an extreme simplification, the instant of a recognizable change in opinion. When updating the state of a node in the WTM, one compares the node’s fraction $m_i / k_i$ of infected neighbors (where $m_i$ is the number of infected neighbors and $k_i$ is the degree of node $i$) to the node’s threshold $R_i$. If node $i$ is inactive, it then becomes active (i.e., it switches to state 1) if $m_i / k_i \geq R_i$; otherwise, its state remains unchanged. One can think of the quantity $m_i / k_i$ as a peer pressure, and one can think of $R_i$ as representing stubbornness or inertia. One way to generalize the WTM is by calculating peer pressure in different ways.
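The update rule just described can be sketched as follows. This is a minimal illustration with synchronous updating, not a reference implementation: the path graph, the uniform thresholds, the function names, and the decision to leave isolated nodes (for which $k_i = 0$) unchanged are all assumptions made here for illustration.

```python
def wtm_step(neighbors, state, threshold):
    """One synchronous update: inactive node i activates if m_i / k_i >= R_i."""
    new_state = dict(state)
    for i, nbrs in neighbors.items():
        if state[i] == 1 or not nbrs:
            # Active nodes stay active. Isolated nodes (k_i = 0) are left
            # unchanged, which is an assumption of this sketch.
            continue
        m = sum(state[j] for j in nbrs)  # number of active neighbors, m_i
        if m / len(nbrs) >= threshold[i]:
            new_state[i] = 1
    return new_state


def run_wtm(neighbors, seeds, threshold):
    """Iterate synchronous updates from the seed set until nothing changes."""
    state = {i: int(i in seeds) for i in neighbors}
    while True:
        new_state = wtm_step(neighbors, state, threshold)
        if new_state == state:
            return state
        state = new_state


# Hypothetical example: the path graph 0 - 1 - 2 - 3 with node 0 as the seed.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

final = run_wtm(path, seeds={0}, threshold={i: 0.5 for i in path})
print(sum(final.values()))  # 4: the cascade reaches every node

final = run_wtm(path, seeds={0}, threshold={i: 0.6 for i in path})
print(sum(final.values()))  # 1: node 1 sees peer pressure 0.5 < 0.6
```

Because nodes only switch from 0 to 1, the iteration terminates after at most as many steps as there are nodes. Generalizations of the model typically change how the peer pressure `m / len(nbrs)` is computed.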
For example, in a model introduced in [57], there are three states: nodes can be passive, active, or ‘hyper-active’, where the last category of nodes, which could represent leaders in a mass movement, exert more influence than nodes that are merely active.

Threshold models are rather simplistic, and it is natural to ask whether there exist real-life scenarios in which such models are appropriate for explaining empirical observations [74, 84]. Although a binary decision process on a network is a gross oversimplification of reality, it can already capture two very important features [65]: interdependence (an entity’s behavior depends on the behavior of other entities) and heterogeneity (differences in behavior are reflected in the distribution of thresholds). Typically, some seed fraction $\rho(0)$ of nodes is assigned to the active state, although that is not always true (e.g., when $R_i < 0$ for some node $i$). Depending on the problem under study, one can choose the initially active nodes either randomly (often uniformly at random) or deterministically. For the latter, for example, one can imagine planting a rumor at specific nodes in a network, or perhaps a seed node is trying to spread misinformation, ‘fake news’, or ‘alternative facts’.

There are many commercial and governmental applications of social influence and opinion dynamics on networks. An effective model of a marketing campaign, whether to promote a product (or an idea or desired behavior, such as a healthy diet), to perform targeted and personalized advertising, or to counteract the spread of misinformation, requires one to understand how network structure can affect network dynamics and how different dynamical processes (even ones that cosmetically seem rather similar to each other) can exhibit qualitatively different behaviors.
These are active research areas with numerous fascinating theoretical issues (in mathematics, social science, human behavior, economics, and more), methodological issues (for both analytical theories and computation), and commercial, governmental, and societal applications. In the commercial sector, for example, research on human opinion, behavior, and influence can play a significant role in personalized coupons. If, as sociological theory suggests, ‘you are who your friends are’ (as social network structure arises in large part from homophily) [90], one can imagine that coupon-program design should include information about what one’s neighbors in a social network are buying.

3.2 Ethical Considerations

Mathematical and computational scientists are faced increasingly with ethical issues in their data analysis and other research [27, 51, 52]. These issues, which are diverse, multifaceted, and interdependent, include replicability [5], accessibility of code and data, tension between replicability (and availability) and privacy of human data, and others. In the context of network analysis in industry, we would like to briefly discuss data ethics. For a brief introduction and pointers to many references, see the slide deck [72], a recent theme issue [27] of a Royal Society journal, and a recent conference [1] at University of Cambridge on ethics in mathematics.

Privacy is one of several crucial ethical issues raised by the analysis of personal data [34, 72]. Because data often include either explicit or implicit information about interactions between entities, one can use network analysis to de-anonymize data, especially when there are data about many of the same people in different social networks [7, 60]. This becomes especially salient in light of the fact that personal data (e.g., medical records) can influence insurance premiums or other important items. The privacy breaches via Facebook of the defunct company Cambridge Analytica are now infamous (see, e.g., [33]), and these issues are also paramount when analyzing data from social media for research purposes [17]. Additionally, although privacy may be the most obvious ethical dilemma facing researchers who analyze personal data, other ethical issues are also present, and applied mathematicians may encounter them in their research.

Most mathematical scientists have insufficient ethical training for the era of Big Data, and it is necessary that such training be built into their education. Much of the social data used in collaborations with industry raise these issues rather prominently, so this will become an increasingly important aspect of industrial mathematics with social data, and especially with network data, as the ties between people can play a major role in invasion of privacy and removal of anonymity. Several years ago, we (MAP, with help from SDH) set up a procedure for studying human data for the Mathematical Institute at University of Oxford (the current version is available at [56]), and we would like to see this kind of approach become standard.
To illustrate the role of network analysis in industry through the lens of physical applied mathematics, we now briefly discuss a few of our past and ongoing projects. During the last decade, we have cosupervised several doctoral students jointly with industrial partners, initially through EPSRC Collaborative (Industrial CASE) awards and more recently through programs such as University of Oxford’s Center for Doctoral Training (CDT) in Industrially Focused Mathematical Modelling (InFoMM) [55]. All together, this covers half a dozen students: four in collaboration with the investment bank HSBC, one with the supermarket company Tesco, and one with the customer-science company dunnhumby.

Our collaboration on network science started with our students who worked on problems in partnership with HSBC, and (very importantly) it has led directly to results of interest both to stakeholders (HSBC and their clients) and to us and the applied mathematics and network-science communities more broadly. As these projects illustrate, the applications and the mathematics are intertwined, as each drives the other in a crucial way. Early work on time-dependent correlations of financial assets [24–26] included ideas from network science and random-matrix theory [26] and led to HSBC’s ‘risk-on, risk-off’ (RORO) analysis of market behavior. RORO has been featured prominently in financial circles (including as a tag in the
Financial Times blog); see, e.g., [42].

Our work with HSBC helped pave the way for our more recent work on financial assets and multilayer networks and our current project on consumer–product purchasing networks. In parallel, it stimulated theoretical analysis of time-dependent networks. The starting point was the study of ‘community detection’ [28, 75], an approach to network clustering in which, in some hopefully optimal way, one seeks to algorithmically find dense sets of nodes that are connected sparsely to other dense sets of nodes. In the HSBC-related work in [24, 25], led by our doctoral student Dan Fenn, we detected communities by optimizing a ‘modularity’ objective function [61, 63]. We used a measure of similarities in the time series of the exchange rates, based on a time-window aggregation, to compare observed connections in a network with ‘random ones’ in a null network constructed from a random-graph model. We did this separately in each window, but we connected the windows sequentially, with a temporal overlap between consecutive windows.

[Footnote: We primarily use null models, and null networks that are generated from null models, as structures to compare to those from empirical data. This resembles the standard use of null models in statistics, but it is not exactly the same.]

In addition to affirming the results (such as structural changes in the networks following the Lehman Brothers bankruptcy in 2008) from the work of Fenn et al. [24–26], Marya Bazzi also found subtler structural changes that merit further exploration. As the focus of the work shifted from application to theory, she made several theoretical and methodological advances, first in [12] and then (jointly with us, fellow University of Oxford doctoral student Lucas Jeub, and our collaborator Alex Arenas) in [11]. Advancements in [12] include the careful distinction between null models and null networks in modularity maximization, proofs of crucial conceptual ideas in multilayer networks (e.g., that the limit of zero interlayer coupling is a singular one), and toy examples that set the stage for our recent systematic development of flexible generative models for multilayer networks in [11]. These models allow a wide variety of correlations across different layers, a feature that is very important for real multilayer networks.

[Footnote: Following the 2010 publication of [59], the study of multilayer networks has become one of the most prominent areas of network science [13, 44]; and several papers, such as [20, 69] and others, have proposed different approaches for studying communities and other mesoscale structures in such networks.]

Our current work in collaboration with supermarket and customer-science companies partly builds on the above insights on clustering in networks and partly moves in entirely new directions. In one project, we are clustering shopping data and trying to incorporate constraints and metadata that are appropriate for that application. In another, we are developing a generative model for shopping trajectories of people in supermarkets, with the hope of finding good approaches for relieving congestion.

The project of our doctoral student Roxana Pamfil lies within this strand of research but also considers a new application, whose structural constraints differ in important ways from the ones in the above problems. Our work with Roxana [67] is in collaboration with dunnhumby, a customer-science company. She is analyzing data from anonymized consumers and the products that they purchase in their shopping baskets (from many Tesco stores in the United Kingdom). From these data, we construct bipartite (i.e., two-mode) networks in which consumers are adjacent to purchased products. See Fig. 1 for a schematic. The bipartite structure needs to be incorporated into methods for clustering the data. Determining the edge weights also requires considerable care. For example, one can use so-called ‘item penetration’ (the fraction of all of the items bought by customer c that were product p), ‘basket penetration’ (the fraction of all baskets of customer c that included product p), or something else. Different choices can yield qualitatively different results, and a key challenge is to determine precisely which weights are most appropriate for which questions and which of them give the most robust results.

We have access to time-resolved data, which are collected from several different stores (including multiple different store formats) and which include various shopper metadata. We also have access to product descriptions at different hierarchical ‘levels’ (e.g., organic milk versus a particular type of organic milk), opening the door for multilevel modeling with interlayer edges that represent inclusion relationships and induced intralayer edges, whose existence can be inferred on a different layer of a multilayer network. For example, if a consumer bought a particular type of organic milk, then he/she necessarily purchased organic milk more generally. For examples of networks with various complications, see Fig. 2.

Figure 1: A schematic bipartite (i.e., two-mode) network in which consumers are adjacent to the products that they purchase. The blue squares are consumers, and the red circles are the products. The dashed ellipses enclose nodes that have been assigned to the same community using a clustering technique. The gray dashed edge towards the bottom of the figure is a potential recommendation that one might give to a consumer, as there is a consumer who is not yet purchasing kale but who has been assigned to the same community as kale. [This figure is a slight modification of one that was created by Roxana Pamfil.]

Figure 2: Schematic of network structures that are relevant for our work on product–purchase networks. (a) Time-dependent network, encoded with a multilayer representation, with time layers that indicate purchases by consumers (blue squares) of products (red circles) in supermarkets. The interlayer edge weights ω encode dependencies across time layers. Determining values for ω, ideally from data or as an output of analysis, is an open problem in the study of multilayer networks. (b) A multilevel network, in which different layers represent different hierarchical levels in product descriptions. Note that the interlayer edge weights ω_i can be heterogeneous, as is also the case for multilayer networks more generally. (c) Annotated networks, in which we use product categories as product-node labels. [This figure is a modification of one that was created originally by Roxana Pamfil.]

We have also been incorporating statistical thinking into our collaboration with dunnhumby. An important aspect of our project with dunnhumby is clustering with mesoscale structures other than the ‘assortative’ ones (which correspond to adjacency matrices that look dense in the blocks along the main diagonal) that are typified by traditional community structure [28]. We do this in a statistically principled way by using ‘stochastic block models’ (SBMs), in which one specifies a block structure and tries to find the clustering that best fits that structure [28, 71]. In this approach, statistical inference takes center stage, and statistical model selection (e.g., between different block structures or between whether or not one allows overlapping communities) becomes a key consideration [70, 71]. In a multilayer setting, as we showed in [11], one can incorporate interlayer dependencies directly into generative network models (such as SBMs) in a convenient way, and we are actively using these ideas for clustering in consumer–product purchasing networks. Roxana’s work on SBM inference includes studying annotated networks (see Fig. 2c), in which we incorporate node labels (e.g., consumer type).
This is helpful for making recommendations for a newly introduced product, for which no network data are available. Specifically, one can use the annotations and the inferred relationship between annotations and mesoscale structure to assign this new product to a community, which in turn indicates relevant consumers for that product.

As with our other collaborations with industry, Roxana’s work has also led to new mathematical results, such as extensions to weighted bipartite networks of methodology from [64] for detecting communities in annotated networks and improved methodology for how to determine interlayer edge weights in multilayer SBMs in a principled way [68]. The latter is a significant extension of Newman’s recent work [62] that established an equivalence between modularity maximization (an ad hoc approach to community detection [28]), with a principled choice of a resolution parameter, and a special case of an SBM in monolayer networks. Roxana’s results are also important in network science more broadly, as most work on multilayer networks still uses ad hoc values for interlayer edge weights.

Roxana’s project is a good illustration of the way in which model choices, analysis, and simulation interact with data in an iterative and question-driven way. A crucial point to stress once again is the benefit to both industrial and academic partners. In this project, the information in the data and the structural constraints of the application yield multilayer networks with different structures and metadata (and different sparsity patterns in the network connections) than what has been analyzed previously. To further develop the theory of multilayer networks, which is one of our primary scientific interests and is one of the most active areas in network science, it is necessary to consider diverse structures, applications, and ensuing challenges. Otherwise, one risks developing a biased theory that has not been tested adequately on relevant structures.
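The two consumer–product edge weights described earlier, item penetration and basket penetration, can be made concrete with a small sketch. The basket format (a list of customer and product-list pairs), the function name `penetration_weights`, and the toy data are assumptions made for illustration; they are not the data format or code of the dunnhumby project.

```python
from collections import Counter


def penetration_weights(baskets):
    """Compute two candidate edge weights for a consumer-product network.

    baskets: list of (customer, products) pairs, one pair per shopping
        basket, where products is a list that may repeat a product.
    Returns (item_pen, basket_pen), two dicts keyed by (customer, product).
    """
    items_per_customer = Counter()    # all items bought by each customer
    baskets_per_customer = Counter()  # number of baskets of each customer
    item_counts = Counter()           # items of product p bought by customer c
    basket_counts = Counter()         # baskets of customer c that contain p
    for customer, products in baskets:
        baskets_per_customer[customer] += 1
        items_per_customer[customer] += len(products)
        for p in products:
            item_counts[(customer, p)] += 1
        for p in set(products):
            basket_counts[(customer, p)] += 1
    # Item penetration: fraction of customer c's items that were product p.
    item_pen = {cp: n / items_per_customer[cp[0]] for cp, n in item_counts.items()}
    # Basket penetration: fraction of customer c's baskets that included p.
    basket_pen = {cp: n / baskets_per_customer[cp[0]] for cp, n in basket_counts.items()}
    return item_pen, basket_pen


# Hypothetical toy data: customer c1 has two baskets, customer c2 has one.
toy_baskets = [
    ("c1", ["milk", "milk", "bread"]),
    ("c1", ["milk"]),
    ("c2", ["kale"]),
]
item_pen, basket_pen = penetration_weights(toy_baskets)
# item_pen[("c1", "milk")] is 3/4, whereas basket_pen[("c1", "milk")] is 2/2.
```

Even on this toy input the two weights disagree about how strongly c1 is tied to milk (0.75 versus 1.0), which illustrates why the choice of weight can change downstream clustering results.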
For our industrial partners, Roxana’s project helps improve understanding of different types of edge weightings between consumers and the products that they purchase, of how to categorize different types of customers, and of how to develop strategies for product recommendations and personalized coupons (through reward cards, which are also helpful for gathering data). Eventually, it is also desirable to account for geographic variabilities in how people shop and for large-scale changes in customer preferences over time. Long term, it is also valuable to combine these insights with social-media data (e.g., with recommendations that are also influenced by friends’ purchases), though that will of course involve very serious ethical considerations regarding what research in that direction is appropriate.

We hope to be able to use our models and data analysis in a predictive way to make product recommendations by assigning probabilities to unobserved edges (so-called ‘edge prediction’ [53]). Here, too, one faces a choice of model and level of sophistication: Should one use monolayer or multilayer networks, an SBM or modularity maximization, unweighted or weighted edges (for various choices of weights), and so on? As part of her thesis, Roxana has also modeled edge correlations in multilayer networks, and this too entails choices. However, because people tend to buy the same things over time, modeling correlations explicitly should improve edge-prediction results, and this is important for our work with dunnhumby and for many other applications. Excitingly, some of Roxana’s work has suggested ‘experimental’ work with dunnhumby to further evaluate her insights from methodological development, modeling, and data analysis.

Our latest doctoral student, Fabian Ying, is working on a project in collaboration with Tesco [95], which is also in collaboration with our colleague Mariano Beguerisse Díaz.
Fabian is also examining data from Tesco, though his project (investigating human mobility and congestion inside supermarkets) is rather different from the ones that we described above. The study of human and animal mobility is one of the most fascinating areas in complex systems [9], and investigating human mobility in supermarkets, which occurs on shorter time scales and smaller spatial scales than almost all existing studies of human mobility, is important for several questions (see [78] for a recent study), including the following: How do customers shop and navigate within a supermarket? What is the best store layout to reduce congestion? Where in the stores should the promotional items be placed? For our industrial partner, these questions are of course related to one of their major questions: How do they maximize revenue?

Figure 3: (a) A Tesco market (left) is divided into zones, which are the nodes in a network (right). We calculate mobility flow between zones from anonymized shopping-journey data. We show an example customer journey in green. (b) Original (left) and optimized (right) layouts of a store. In our optimization, we seek to minimize congestion according to a congestion model. In this optimized store layout, popular zones are on the outer perimeter of the store. Larger nodes have more shopping trips going through them. We use node color to signify positions in the original store layout and to help visualize how they disperse in the optimized layout. [These figures were created by Fabian Ying.]

For Fabian’s research, we model each store as a network whose nodes are different regions of that store, with edges present whenever two regions are next to each other (see Fig. 3a). There are also specific entrance and exit points (the tills), and these networks are directed. The spatial nature of a supermarket network is also an important consideration, as the embedding of a network in space induces structural features that make it different from other networks [10]. An exciting aspect of Fabian’s project is that we are revisiting classical results from stochastic modeling (e.g., queuing theory), dusting them off, and adapting them for the network age.

Fabian’s research involves two interrelated projects. In one project, we seek to minimize congestion in single-source, single-sink queuing networks on which random walkers [54] (which, for simplicity, we take initially to be homogeneous and unbiased) are traversing. This project, while mathematical in nature, has the potential to give insights on customer congestion in markets.
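As a minimal illustration of an unbiased random walker on a directed single-source, single-sink network, the following sketch computes the expected number of visits to each zone before the walker reaches the sink. In a standard open queueing network with arrivals only at the source, the arrival rate at each node is proportional to these visit counts (the traffic equations), so they give a first indication of where load concentrates. This is a generic textbook calculation under stated assumptions, not the congestion model of the project; the function name, the dictionary-based network format, and the toy store are hypothetical.

```python
def expected_visits(adjacency, source, sink, tol=1e-12, max_iter=100_000):
    """Expected visits per node for an unbiased walker from source to sink.

    adjacency: dict mapping each node to a list of out-neighbours. Every
        node (including the sink) must be a key, and every non-sink node
        is assumed to have a directed path to the sink.
    The walker moves to a uniformly random out-neighbour at each step and
    stops on first reaching the sink.
    """
    nodes = list(adjacency)
    visits = {n: 0.0 for n in nodes}
    for _ in range(max_iter):
        new = {n: 0.0 for n in nodes}
        new[source] = 1.0  # the walk starts at the source
        for n in nodes:
            if n == sink or not adjacency[n]:
                continue  # no onward movement from the sink
            share = visits[n] / len(adjacency[n])
            for neighbour in adjacency[n]:
                new[neighbour] += share
        # Stop once the fixed point v = e_source + v P has been reached.
        if max(abs(new[n] - visits[n]) for n in nodes) < tol:
            return new
        visits = new
    return visits


# Hypothetical four-zone store: entrance -> aisle1 <-> aisle2, both -> tills.
toy_store = {
    "entrance": ["aisle1"],
    "aisle1": ["aisle2", "tills"],
    "aisle2": ["aisle1", "tills"],
    "tills": [],
}
visits = expected_visits(toy_store, source="entrance", sink="tills")
# aisle1 is visited 4/3 times on average and aisle2 2/3 times.
```

The fixed-point iteration is used here only to keep the sketch dependency-free; for larger networks one would typically solve the equivalent linear system directly.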
We hope to learn how network structure affects congestion in this model, which network structures minimize congestion (e.g., as measured by mean queue size or the total number of customers who are currently waiting), and how to alter an existing network (e.g., by adding or removing a small number of pathways between shopping aisles) to decrease congestion. In the other project, Fabian is using human-mobility models [9] to analyze the flow of customers between zones in a store. This project involves comparisons of several generative models and a systematic comparison with anonymized data of customer journeys from Tesco stores.

Our approach for the human-mobility project is to study generative models for customer movements within a store. We are using insights from existing models for human mobility, which have been a major research topic during the last decade thanks to the recent availability and abundance of human and animal mobility data [9, 15]. However, our project entails analysis of rather different temporal and spatial scales from existing studies. Ultimately, this may necessitate the development of new models, though thus far we have found that existing population-level human mobility models can successfully predict (not merely fit) about 65–70% of the mobility flow of customers between zones in a store, which as far as we are aware is the first successful application of these models on such a small spatial scale. Additionally, through optimization (using simulated annealing), we are able to produce store layouts with less congestion (according to our model), such as by fixing the basic store geometry and swapping zones (see Fig. 3b). The best layouts bring customers as quickly as possible to a store’s exit.

There are numerous exciting directions, which intermingle theoretical and practical foci, to take Fabian’s research. For example, it will be useful to study a more realistic routing model between purchases as well as more realistic measures and models for congestion.
To compare the results of such expanded models with empirical data, we will need to use data sets (e.g., which combine customer-location data and purchase data) that are not currently available to us. It is also important to try to validate our model on a larger number of stores. We also aim to incorporate business constraints into our optimization of store layouts, so that we can provide Tesco with suggestions of viable store layouts. On the conceptual side, we are fascinated by the idea of augmenting random-walker dynamics through incorporation of shopping lists or zones with different attraction levels, developing new human-mobility models that are tailored for this application’s small spatial and temporal scales, and so on. Potential future avenues include incorporating insights from recommender systems (e.g., using mobility and behavioral economics) to influence movement and avoid congestion.
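The layout optimization described above (simulated annealing over zone swaps with the store geometry held fixed) can be sketched as follows. The cost function is a hypothetical toy proxy, chosen only to make the example runnable; it is not the congestion model used with Tesco. The function name, parameters, and geometric cooling schedule are likewise assumptions.

```python
import math
import random


def anneal_layout(adjacent_pairs, popularity, steps=5000, t0=1.0,
                  cooling=0.999, seed=0):
    """Simulated annealing over assignments of zones to fixed positions.

    adjacent_pairs: list of (i, j) position indices that are next to
        each other (the fixed store geometry).
    popularity: popularity[z] is a hypothetical popularity score of zone z.
    Returns (best_layout, best_cost), where best_layout[position] = zone.
    """
    rng = random.Random(seed)
    layout = list(range(len(popularity)))  # start with zone z at position z

    def cost(lay):
        # Toy congestion proxy (an assumption): penalise popular zones
        # that sit in neighbouring positions.
        return sum(popularity[lay[i]] * popularity[lay[j]]
                   for i, j in adjacent_pairs)

    current = cost(layout)
    best, best_cost = layout[:], current
    temperature = t0
    for _ in range(steps):
        i, j = rng.sample(range(len(layout)), 2)
        layout[i], layout[j] = layout[j], layout[i]  # propose a zone swap
        proposed = cost(layout)
        delta = proposed - current
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = proposed  # accept the swap
            if current < best_cost:
                best, best_cost = layout[:], current
        else:
            layout[i], layout[j] = layout[j], layout[i]  # reject: undo swap
        temperature *= cooling
    return best, best_cost


# Hypothetical store with four positions in a row and two popular zones.
best, best_cost = anneal_layout([(0, 1), (1, 2), (2, 3)], [5, 4, 1, 1])
```

Business constraints of the kind mentioned above could enter such a sketch either as penalty terms in the cost or as restrictions on which swaps are proposed; which is preferable would depend on the constraints.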
Network science is playing a large (and increasing) role in industrial problems. Many problems, and associated data, have a natural network structure; and the study of networks and other discrete structures is rapidly becoming a core area of applied mathematics alongside traditional continuum approaches [73]. The traditional and very successful ‘physical-applied-mathematics’ philosophy is just as relevant in network modeling as it is in more traditional applied-mathematics topics, but there are also many fascinating, important, and often rather difficult challenges: (1) the field of network science is much less mature than topics such as partial differential equations and asymptotic analysis, and this necessitates both the development of new methodologies from industrial (and other application-oriented) problems and the navigation of a situation with less clarity about how, and at what level of description, to attack those problems; (2) because networks are high-dimensional and the interactions between many entities play a prominent role in network analysis, modelers should become comfortable not only with traditional mechanistic modeling, a longstanding expertise of applied mathematicians, but also with ideas such as probabilistic modeling, statistics, and uncertainty quantification; (3) the large scale of networked systems poses challenges for scientific computation, especially given not only large static data sets but also real-time computations with data streams; (4) missing and incomplete data (and data cleaning) provide significant guidance (and limitations!) that affect not only what calculations are reasonable but also the level of detail that one may wish to use in a network description; and (5) the vast and increasing use of human data poses significant ethical issues (e.g., data privacy) that depart rather markedly from, say, the use of data that arise from fluid mixing in chocolate and other traditional industrial applications that motivate mathematical studies.

The study of networks is a core part (and, we would argue, one of the most important parts) of the mathematics for the modern economy, as it relates very strongly both to the types of problems and to the types of data that arise in it. Network modeling also has deep connections to data analysis, data science, and Big Data, but it is more than that: network science incorporates modeling tenets from both physical applied mathematics and statistics, and it seeks to marry them together. Mathematics departments need to develop strength in network modeling, and those efforts need to include the study of problems with close ties to problems that arise in the industrial, commercial, and governmental sectors. This will help solve important societal problems and simultaneously lead to the development of new mathematical and computational techniques and insights.
Acknowledgements
We thank John Ockendon for the invitation to write this article and to give an associated talk at the Royal Society workshop on ‘Mathematics for the Modern Economy’ [4].

We thank Robert Armstrong, Mariano Beguerisse Díaz, Jeremy Bradley, Peter Grindrod, Valdis Krebs, Ursula Martin, Roxana Pamfil, Rosie Prior, and Fabian Ying for helpful comments. We also thank Roxana Pamfil for creating the original versions of Figs. 1 and 2 and helping us modify them, Fabian Ying for creating Fig. 3, and Andrew Stuart for an inspiring slide and discussion that we adapted for one of our paragraphs. We thank several anonymous referees for helpful comments.

Our doctoral students Fabian Ying and Roxana Pamfil were funded by the EPSRC Centre For Doctoral Training in Industrially Focused Mathematical Modelling (EP/L015803/1) in collaboration with Tesco and dunnhumby, respectively. Our doctoral students Daniel Fenn and Marya Bazzi were funded by the EPSRC through CASE studentship awards.
References

[1] Eim 1: The first meeting on ethics in mathematics. Available at .
[2] Mathematics in Industry: Information Service. Available at , 2017.
[3] Orgnet LLC. Available at , 2017.
[4] The Royal Society of London, workshop on “Mathematics for the Modern Economy”. Available at https://royalsociety.org/science-events-and-lectures/2017/06/maths-modern-economy/ , 2017.
[5] D. B. Allison, R. M. Shiffrin, and V. Stodden. Reproducibility of research: Issues and proposed remedies. Proceedings of the National Academy of Sciences of the United States of America, 115(11):2561–2562, 2018.
[6] P. J. Aston, A. J. Mulholland, and K. M. M. Tant, editors. UK Success Stories in Industrial Mathematics. Springer-Verlag, 2016.
[7] L. Backstrom, C. Dwork, and J. Kleinberg. Wherefore art thou r3579x? Anonymized social networks, hidden patterns, and structural steganography. In Proceedings of the 16th International Conference on World Wide Web (WWW ’07), pages 181–190, Banff, Alberta, Canada, 2007. ACM.
[8] A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.
[9] Hugo Barbosa, Marc Barthelemy, Gourab Ghoshal, Charlotte R. James, Maxime Lenormand, Thomas Louail, Ronaldo Menezes, José J. Ramasco, Filippo Simini, and Marcello Tomasini. Human mobility: Models and applications. Physics Reports, 734:1–74, 2018.
[10] Marc Barthelemy. Morphogenesis of Spatial Networks. Springer-Verlag, 2018.
[11] M. Bazzi, L. G. S. Jeub, A. Arenas, S. D. Howison, and M. A. Porter. Generative benchmark models for mesoscale structures in multilayer networks. arXiv:1608.06196, 2016.
[12] M. Bazzi, M. A. Porter, S. Williams, M. McDonald, D. J. Fenn, and S. D. Howison. Community detection in temporal multilayer networks, with an application to correlation networks. Multiscale Modeling and Simulation: A SIAM Interdisciplinary Journal, 14(1):1–41, 2016.
[13] S. Boccaletti, G. Bianconi, R. Criado, C. I. Del Genio, J. Gómez-Gardeñes, M. Romance, I. Sendiña-Nadal, Z. Wang, and M. Zanin. The structure and dynamics of multilayer networks. Physics Reports, 544(1):1–122, 2014.
[14] C. Castellano, S. Fortunato, and V. Loreto. Statistical physics of social dynamics. Reviews of Modern Physics, 81:591–646, 2009.
[15] S. Çolak, A. Lima, and M. C. González. Understanding congested travel in urban areas. Nature Communications, 7:10793, 2016.
[16] J. Chalupa, P. L. Leath, and G. R. Reich. Bootstrap percolation on a Bethe lattice. Journal of Physics C: Solid State Physics, 12(1):L31–L35, 1979.
[17] C. Chambers. Facebook fiasco: Was Cornell’s study of ‘emotional contagion’ an ethics breach? Available at , 2014.
[18] W. J. Cook. In Pursuit of the Traveling Salesman: Mathematics at the Limits of Computation. Princeton University Press, Princeton, USA, 2014.
[19] P. Csermely, A. London, L.-Y. Wu, and B. Uzzi. Structure and dynamics of core–periphery networks. J. Complex Networks, 1:93–123, 2013.
[20] M. De Domenico, A. Lancichinetti, A. Arenas, and M. Rosvall. Identifying modular flows on multilayer networks reveals highly overlapping organization in interconnected systems. Physical Review X, 5:011027, 2015.
[21] I. de Sola Pool and M. Kochen. Contacts and influence. Social Networks, 1(1):5–51, 1978–1979.
[22] dunnhumby. Bringing mathematics into business. Available at (21 June 2017), 2017.
[23] W. Feller. An Introduction to Probability Theory and its Applications, Volume I, Third Edition. John Wiley & Sons, 1968.
[24] D. J. Fenn, M. A. Porter, M. McDonald, S. Williams, N. F. Johnson, and N. S. Jones. Dynamic communities in multichannel data: An application to the foreign exchange market during the 2007–2008 credit crisis. Chaos, 19(3):033119, 2009.
[25] D. J. Fenn, M. A. Porter, P. J. Mucha, M. McDonald, S. Williams, N. F. Johnson, and N. S. Jones. Dynamical clustering of exchange rates. Quantitative Finance, 12(10):1493–1520, 2012.
[26] D. J. Fenn, M. A. Porter, S. Williams, M. McDonald, N. F. Johnson, and N. S. Jones. Temporal evolution of financial market correlations. Physical Review E, 84(2):026109, 2011.
[27] L. Floridi and M. Taddeo. What is data ethics? Philosophical Transactions of the Royal Society A, 374(2083):20160360, 2016.
[28] S. Fortunato and D. Hric. Community detection in networks: A user guide. Physics Reports, 659:1–44, 2016.
[29] A. C. Fowler. Mathematical Models in the Applied Sciences. Cambridge University Press, Cambridge, UK, 1997.
[30] J. P. Gleeson. Binary-state dynamics on complex networks: Pair approximation and beyond. Physical Review X, 3:021004, 2013.
[31] David F. Gleich. PageRank beyond the Web. SIAM Review, 57(3):321–363, 2015.
[32] M. Granovetter. Threshold models of collective behavior. The American Journal of Sociology, 83(6):1420–1443, 1978.
[33] H. Grassegger and M. Krogerus. The data that turned the world upside down. Available at http://motherboard.vice.com/read/big-data-cambridge-analytica-brexit-trump
https://socialcontagionbook.github.io .
[50] C. C. Lin and L. A. Segel. Mathematics Applied to Deterministic Problems in the Natural Sciences. Classics in Applied Mathematics (Book 1). Society for Industrial and Applied Mathematics, Philadelphia, USA, 1988.
[51] M. Loukides, H. Mason, and DJ Patil. Data ethics. 2018. Available at . (As of 6 August 2018, this series consists of four posts by this trio of authors.)
[52] M. Loukides, H. Mason, and DJ Patil. Doing good data science. 2018. Available at (10 July 2018).
[53] Víctor Martínez, Fernando Berzal, and Juan-Carlos Cubero. A survey of link prediction in complex networks. ACM Computing Surveys, 49(4):69:1–69:33, December 2016.
[54] N. Masuda, M. A. Porter, and R. Lambiotte. Random walks and diffusion on networks. Physics Reports, 716–717:1–58, 2017.
[55] University of Oxford Mathematical Institute. Industrially focused mathematical modelling (EPSRC centre for doctoral training). Available at (accessed 29 January 2017).
[56] University of Oxford Mathematical Institute. Research using data involving humans. Available at (accessed 29 January 2017).
[57] S. Melnik, J. A. Ward, J. P. Gleeson, and M. A. Porter. Multi-stage complex contagions. Chaos, 23(1):013124, 2013.
[58] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. Network motifs: Simple building blocks of complex networks. Science, 298(5594):824–827, 2002.
[59] P. J. Mucha, T. Richardson, K. Macon, M. A. Porter, and J.-P. Onnela. Community structure in time-dependent, multiscale, and multiplex networks. Science, 328(5980):876–878, 2010.
[60] A. Narayanan and V. Shmatikov. Robust de-anonymization of large sparse datasets. In Proceedings of the 2008 IEEE Symposium on Security and Privacy (SP ’08), pages 111–125, Washington, DC, USA, 2008. IEEE Computer Society.
[61] M. E. J. Newman. Finding community structure in networks using the eigenvectors of matrices. Physical Review E, 74:036104, 2006.
[62] M. E. J. Newman. Equivalence between modularity optimization and maximum likelihood methods for community detection. Physical Review E, 94:052315, 2016.
[63] M. E. J. Newman. Networks. Oxford University Press, Second edition, 2018.
[64] M. E. J. Newman and A. Clauset. Structure and inference in annotated networks. Nat. Commun., 7:11863, 2016.
[65] P. Oliver, G. Marwell, and R. Teixeira. A theory of the critical mass. I. Interdependence, group heterogeneity, and the production of collective action. American Journal of Sociology, 91(3):522–556, 1985.
[66] N. Oreskes, K. Shrader-Frechette, and K. Belitz. Verification, validation, and confirmation of numerical models in the earth sciences. Science, 263(5147):641–646, 1994.
[67] A. R. Pamfil. Community structure in product–purchase networks. 2018. Available at .
[68] A. R. Pamfil, S. D. Howison, R. Lambiotte, and M. A. Porter. Relating modularity maximization and stochastic block models in multilayer networks. arXiv:1804.01964, 2018.
[69] T. P. Peixoto. Inferring the mesoscale structure of layered, edge-valued, and time-varying networks. Physical Review E, 92(4):042807, 2015.
[70] T. P. Peixoto. Model selection and hypothesis testing for large-scale network models with overlapping groups. Physical Review X, 5(1):011033, 2015.
[71] T. P. Peixoto. Bayesian stochastic blockmodeling. In Advances in Network Clustering and Blockmodeling. Wiley, New York City, USA, 2018.
[72] M. A. Porter. Data ethics (for mathematicians and others). Available at , 2017.
[73] M. A. Porter and G. Bianconi. Network analysis and modelling: Special issue of European Journal of Applied Mathematics. European Journal of Applied Mathematics, 27(6):807–811, 2016.
[74] M. A. Porter and J. P. Gleeson. Dynamical Systems on Networks: A Tutorial. Springer-Verlag (Vol. 4 in “Frontiers in Applied Dynamical Systems: Reviews and Tutorials”), Heidelberg, Germany, 2016.
[75] M. A. Porter, J.-P. Onnela, and P. J. Mucha. Communities in networks. Notices of the American Mathematical Society, 56:1082–1097, 1164–1166, 2009.
[76] Mason A. Porter. Small-world networks. Scholarpedia, 7(2):1739, 2012.
[77] S. Redner. Dynamics of voter models on simple and complex networks. DSWeb (The Dynamical Systems Web), (2), April 2017. Available at https://dsweb.siam.org/The-Magazine/Article/dynamics-of-voter-models-on-simple-and-complex-networks .
[78] P. S. Riefer, R. Prior, N. Blair, G. Pavey, and B. C. Love. Coherency-maximizing exploration in the supermarket. Nature Human Behaviour, 1:0017, 2017.
[79] M. Salganik. Computational Social Science: Social Research in the Digital Age. Available at (accessed 29 January 2017), Fall 2016.
[80] H. Sayama, I. Pestov, J. Schmidt, B. J. Bush, C. Wong, J. Yamanoi, and T. Gross. Modeling complex systems with adaptive networks. Computers & Mathematics with Applications, 65(10):1645–1664, 2013.
[81] T. A. B. Snijders. The statistical evaluation of social network dynamics. Sociological Methodology, 31(1):361–395, 2001.
[82] T. A. B. Snijders, G. G. Van de Bunt, and C. E. G. Steglich. Introduction to stochastic actor-based models for network dynamics. Social Networks, 31(1):44–60, 2010.
[83] V. Sood, T. Antal, and S. Redner. Voter models on heterogeneous networks. Physical Review E, 77(4):041121, 2008.
[84] B. State and L. Adamic. The diffusion of support in an online social movement: Evidence from the adoption of equal-sign profile pictures. In Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work &