Observement as Universal Measurement

Abstract

Measurement theory is the cornerstone of science, but no equivalent theory underpins the huge volumes of non-numerical data now being generated. In this study, we show that replacing numbers with alternative mathematical models, such as strings and graphs, generalises traditional measurement to provide rigorous, formal systems (‘observement’) for recording and interpreting non-numerical data. Moreover, we show that these representations are already widely used and identify general classes of interpretive methodologies implicit in representations based on character strings and graphs (networks). This implies that a generalised concept of measurement has the potential to reveal new insights as well as deep connections between different fields of research.



David G. Green∗†, Kerri Morgan‡, and Marc Cheong§

December 23, 2020

∗ Faculty of Information Technology, Monash University, Clayton, Victoria, Australia.
† Corresponding Author. Email: [email protected]
‡ School of Information Technology, Faculty of Science Engineering & Built Environment, Deakin University, Geelong, Australia.
§ Centre for AI and Digital Ethics (CAIDE), School of Computing and Information Systems, University of Melbourne, Parkville, Australia.

Subjects: complexity, mathematical modelling

Keywords: formal languages, graph theory, measurement, non-numeric data, observement, complexity

It is impossible to overstate the influence that measurement has exerted on scientific thinking. In physics, for instance, both theory and experiment are dominated by concepts that are expressed as numerical values: time, distance, mass, charge, etc. Such measures are so familiar that we hardly think about their implications, and yet they define fundamental concepts. In the 20th century, quantitative thinking came to play a role in almost every field of science.

The association of the term ‘measurement’ with numbers is so deeply ingrained that it blinds us to fundamental benefits that the process of measurement itself conveys. A formal system of measurement performs several crucial roles in research [74]:

1. It ensures that data are gathered in a standard way. The application of standards ensures that measurements are taken in a consistent and comparable form. This makes it possible to compare and combine data from different sources.

2. It produces data with well-known properties. Representing attributes as numbers means that we can express relationships between values using equations and other well-known tools.

3. It has the power of mathematical abstraction. Representing concepts as numbers makes it possible to develop mathematical models and techniques that apply to a wide range of phenomena.

4. It shapes the development of theory and methods.

Combined with the power of quantitative mathematics, measurable variables (such as mass and length) shape the way we think about the world.

Measurement is an indispensable cornerstone of science, but its success may limit our thinking. It has led to a culture where attempts are made to reduce problems and phenomena to numbers. This approach may oversimplify or bias thinking: the quality of human performance is more than the value of (say) dollar profit per quarter; avian behaviour is more complex than the number of times birds visit their nest. Needless to say, socio-cultural and historical contexts, and nuances of human behaviour, are disregarded within this approach.

A consequence of the information revolution is that organisations collect enormous volumes of data, much of it non-numerical in nature. However, the theoretical foundations underlying current practices in data collection and analysis have lagged behind. The many advantages of formal measurement suggest that there are benefits to be gained by extending its underlying principles to non-numerical data. In this study, we show that replacing arithmetic with other mathematical models in the definition of measurement provides formal systems for representing non-numerical data.

Because the term ‘measurement’ is so closely associated with obtaining, and assigning, numbers [77, 16, 74], we make the distinction clear by introducing the term observement for formal systems of observation. In our definition (Section 3.2), observement is a formal system that maps real-world phenomena to well-defined representations. In this sense, measurement is a special case (a subset) of observement in which the representation is a number.

We argue that many kinds of observations could be, or already are, considered as observement. To illustrate how this works in practice, we present two examples as case studies.
These are based on abstract models that are widely used for representing non-numeric data: strings of symbols and graphs (networks). A variety of analytic techniques (many of them common to both) have supported these representations, as have rapidly developing bodies of supporting theory.

The theory of measurement arose from the need to codify and standardise procedures for representing properties by numerical values. Its development as a formal approach to science dates back at least to the late 17th century and John Locke’s interest in metrology [9].

Perhaps the most general approach to measurement is the Representation Theory of Measurement (RTM). It assumes that a system assigns numbers to objects in such a way that they represent a particular property. Formally, there is a measurement system $(\mathcal{S}, \mathcal{N}, M)$ where the system $\mathcal{S} = \langle S, R \rangle$ consists of a set $S$ of objects and a finite set $R$ of relations on $S$, and $\mathcal{N} = \langle N, P \rangle$ is the corresponding numerical system, consisting of numbers $N$ and a finite set $P$ of relations on $N$. $M$ is a mapping from $S$ to $N$.

For mass, for instance, the system comprises the set of all physical objects $S$ and the set of relations between them, such as “heavier”. The numerical system would comprise the non-negative real numbers and relations such as $\geq$.

The RTM specifies the following three conditions that a measure must satisfy [29], [54]:

Condition 2.1. (M1) Representation: there is an experimental process (mapping) that defines a homomorphism (property-preserving map) from objects to numbers.

Formally, measurement is an experimental process that generates a mapping $M : S \to N$. Moreover, $M$ must be a homomorphism, that is, for any relation $r \in R$ there is a corresponding relation $p \in P$ such that for any $x_1, x_2, \ldots, x_k \in S$:

$$r(x_1, x_2, \ldots, x_k) \Leftrightarrow p(M(x_1), M(x_2), \ldots, M(x_k)).$$

The representation condition requires that a set $H(S)$ of homomorphisms from $S$ to $N$ can be proved to exist.

For example, if object $a$ is heavier than object $b$, then the measurements of their masses, $m(a)$ and $m(b)$, must satisfy $m(a) \geq m(b)$.

Condition 2.2. (M2) Existence: there must be at least one mapping.

Clearly, there must be at least one mapping from $S$ to $N$, otherwise there exists no process to “measure” objects in $S$. This leads to the existence condition: the set $H(S)$ of homomorphisms from $S$ to $N$ is non-empty.

For example, procedures for measuring mass ensure that a measure $m$ exists.

Condition 2.3. (M3) Uniqueness: any two mappings that satisfy Condition (M1) are equivalent up to homomorphism.

This condition means that a measure must be unique in the sense that any two measures of the same object must be related. That is, if there exists another measure $M' : S \to N$, then $M$ and $M'$ are related by an isomorphism $f : M \to M'$ such that for any relation $r \in R$ and corresponding relation $p \in P$:

$$r(x_1, x_2, \ldots, x_k) \Leftrightarrow p(f(M(x_1)), f(M(x_2)), \ldots, f(M(x_k))).$$

For example, the mass of an item can be measured in kilograms, $M_K$, or pounds, $M_P$, but there is a simple transformation between them: $M_K = 0.45359237 \times M_P$, since one pound is defined as exactly 0.45359237 kg. Together, the conditions ensure that a measure $m$ exists, and that the measurement is unique: that is, there is a simple conversion between two different measurement algorithms, say pounds and kilograms.

One advantage of measurement is the abstraction of entities and properties (and the relations between them) to a simple representation: a number (and the corresponding relations between numbers). This makes it possible for humans to think and work with abstract ideas. However, mathematics provides many other models that allow us to represent other features of the real world in abstract terms. Graphs (networks), for instance, serve as abstract representations for many complex structures (Section 5).

Traditionally, measurement deals with numbers, but the essence of measurement is not quantities; it is the rigorous process by which observations are obtained [79], [49]. The RTM specifies the features that every measurement system must have. However, the RTM works just as well as a formal system if the underlying model is not arithmetic but another representation model. This means that we can define formal systems of observation for any representation that is based on an underlying formal model.

In order to satisfy the RTM conditions, there must be a procedure (an ‘algorithm’) that implements the mapping from real-world objects to the abstract model. Traditionally, this algorithm is built upon standards that define how to interpret the constants and relations in the model.
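To make Conditions M1 and M3 concrete, here is a minimal Python sketch for mass. It assumes a small, hypothetical set of objects whose masses are simply looked up in `TRUE_MASS_KG` (a stand-in for an actual balance), checks M1 for one binary relation (‘at least as heavy as’ against $\geq$) by brute force, and checks that the kilogram and pound measurements are related by a rescaling, as M3 requires. All names are illustrative rather than taken from the RTM literature.

```python
from itertools import product

# Hypothetical objects and "true" masses in kilograms, used only to
# simulate the outcome of an empirical comparison (e.g. a balance).
TRUE_MASS_KG = {"apple": 0.2, "book": 1.1, "kettle": 1.1, "bicycle": 12.0}
KG_PER_LB = 0.45359237  # exact definition of the international pound


def at_least_as_heavy(a: str, b: str) -> bool:
    """Empirical relation r on objects: a balances or outweighs b."""
    return TRUE_MASS_KG[a] >= TRUE_MASS_KG[b]


def measure_kg(x: str) -> float:
    """Measurement M_K: object -> number of kilograms."""
    return TRUE_MASS_KG[x]


def measure_lb(x: str) -> float:
    """Alternative measurement M_P: object -> number of pounds."""
    return TRUE_MASS_KG[x] / KG_PER_LB


def geq(u: float, v: float) -> bool:
    """Numerical relation p on N."""
    return u >= v


def is_homomorphism(objects, r, p, m) -> bool:
    """Condition M1 for one binary relation: r(x, y) <=> p(m(x), m(y))."""
    return all(r(x, y) == p(m(x), m(y)) for x, y in product(objects, repeat=2))


def lb_to_kg(pounds: float) -> float:
    """Candidate f for Condition M3: a strictly increasing rescaling."""
    return pounds * KG_PER_LB


objects = list(TRUE_MASS_KG)
print(is_homomorphism(objects, at_least_as_heavy, geq, measure_kg))  # True
print(is_homomorphism(objects, at_least_as_heavy, geq, measure_lb))  # True
print(all(abs(lb_to_kg(measure_lb(x)) - measure_kg(x)) < 1e-9 for x in objects))  # True
```

Brute force only works for a finite sample of objects; in the RTM, M1 is a claim about every object in $S$, which is why it has to be secured by the design of the measuring procedure rather than by testing.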
For measurement, the standards would typically define what is meant by numbers, such as 0 and 1, functions such as ‘+’, and relations such as ‘=’. But such an algorithm need not output a number. For example, an algorithm might output a string of symbols. Our concept of observement builds on this idea: generalising measurement to observement includes generalising the types of output that the process can produce.

The use of standards has never been confined to traditional measurement. Standards today are used for everything from food preparation to business management, and from the design of petrol pumps to building safety. The International Organization for Standardization (ISO) [3] maintains thousands of standards for a vast range of items and procedures. These standards often include rules for representing data [1], aided by the increasing use of automation to acquire, manipulate and display data.

Observement partly subscribes to the realist philosophy of measurement, which “distinguishes what is measured from how it is measured; [and] holds that what is measured are attributes of things, rather than things themselves” [53]. As “numbers are in no way the only usable symbols” [79], we propose an extension of the representational view of measurement that removes the dependence, inherent in measurement, “upon an isomorphism between an empirical system and a numerical system” [53]. Observement still supports the idea of the “mapping, of relations between objects and ... entities” [74], and the traditional notion of numerical measurement can be seen as a subset of observement which designates ‘numbers’ as such entities.
Our definition of observement ties in better with existing concepts of ‘measuring’ systems such as social networks [50], which need not be reduced to numbers because they are (intuitively) better represented as a network graph (Section 5).

By reducing measurement to numbers, we abstract away details ranging from temporal detail [42] to historical and cultural context, per José Ortega y Gasset’s conception of perspectivism [45]. We also lose the subjectivity of the observer’s “needs that [actually] ... interpret the world” [56]; for example, a social network captures social and cultural context more intuitively than a mere reduction to numbers. The Graph Commons project “makes the computation and visualisation of network mappings accessible in a way that does not rely on mathematical ability ... as a way to help reshape shared understandings in the context of an active social struggle” (emphases ours) [52, 10].

We introduce the following definition of observement, in which we generalise Conditions M1 to M3 of the RTM (Section 2) to Conditions Ob1 to Ob3 below.

Formally, there is an observement system $(\mathcal{S}, \mathcal{O}, m)$, where the system $\mathcal{S} = \langle S, R \rangle$ consists of a set of objects $S$ and a finite set of relations $R$ on $S$, and the corresponding system $\mathcal{O} = \langle O, P \rangle$ consists of a set $O$ of observations (that is, the set of objects that can be obtained by the observement system) and a set $P$ of relations on $O$. In addition, there exists at least one algorithm $m$ that, for any input in $S$, determines an observation in $O$.

For example, graphs are commonly used to model real-world networks. Let $S$ be a set of social networks and let $R$ be the set of relationships on $S$. Here the observements of the networks are graphs: $O$ is the set of graphs and $P$ is the set of relations on graphs.

The observement system satisfies the following conditions:

Condition 3.1. (Ob1) Representation Condition: the algorithm $m$ generates a homomorphism $h_m : S \to O$ such that for any relation $r \in R$ there is a corresponding relation $p \in P$ such that for any $x_1, x_2, \ldots, x_k \in S$:

$$r(x_1, x_2, \ldots, x_k) \Leftrightarrow p(h_m(x_1), h_m(x_2), \ldots, h_m(x_k)).$$

Condition 3.2. (Ob2) Existence Condition: clearly, there must be at least one mapping from $S$ to $O$, otherwise there is no algorithm to “observe” objects in $S$. This leads to the existence condition: the set $A(S)$ of algorithms that give homomorphisms from $S$ to $O$ is non-empty.

If the observement system satisfies the following additional condition, we say that the observement system is strong.

Condition 3.3. (Ob3) Uniqueness Condition: the algorithm must be unique in the sense that any two algorithms used to observe the same object must be related.
That is, if there exists another algorithm $m'$ giving a homomorphism $h_{m'} : S \to O$, then there exists a mapping $f : m \to m'$ such that for any relation $r \in R$ and the corresponding relation $p \in P$:

$$r(h_m(x_1), h_m(x_2), \ldots, h_m(x_k)) \Leftrightarrow p(f(h_m)(x_1), f(h_m)(x_2), \ldots, f(h_m)(x_k)).$$

Condition Ob3 requires that the observation obtained by any method of observing a given property can be obtained by a mapping from the result obtained by another method of observing the same property. Appropriate standards for these methods of observement should ensure this. This condition holds for measurements such as length, distance and mass, and it may also hold for some systems of observement. The observement system is called weak if this condition does not hold.

To elaborate, Condition Ob3 does not hold for all observement systems. Unlike measurement, where a mapping can be found between algorithms for measuring with numerical outputs, the case may be more complex for observement. As an example of a weak observement system where Ob3 does not hold, consider a simple scale for height: short, medium, tall. Suppose that System A defines short as < 150 cm and tall as > 183 cm, whereas System B defines short as < 155 cm and tall as > 178 cm. Both systems capture the intuitive and formal properties of height, but there is no mapping between the two: a person of 152 cm is medium in System A but short in System B, while a person of 145 cm is short in both, so the label ‘short’ in System B cannot be translated into a single label in System A.

The above definition provides a basic set of criteria that can serve as guidelines when setting up an observement system. We now argue that several such representations already exist. Below, we consider two non-numerical categories of observement systems: the first maps objects to strings, and the second maps objects to graphs.
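The height example of a weak observement system can be checked mechanically. The Python sketch below is illustrative only (the function names and the sample of whole-centimetre heights from 140 to 199 cm are assumptions): it tries to build a mapping f from the labels of one system to those of the other, and gives up as soon as one label would have to map to two different labels.

```python
def scale_a(height_cm: float) -> str:
    """System A: short below 150 cm, tall above 183 cm, otherwise medium."""
    if height_cm < 150:
        return "short"
    if height_cm > 183:
        return "tall"
    return "medium"


def scale_b(height_cm: float) -> str:
    """System B: short below 155 cm, tall above 178 cm, otherwise medium."""
    if height_cm < 155:
        return "short"
    if height_cm > 178:
        return "tall"
    return "medium"


def induced_mapping(observe_from, observe_to, heights):
    """Try to build f with observe_to(h) == f(observe_from(h)) for all heights.

    Returns f as a dict, or None as soon as one label of observe_from would
    need to map to two different labels of observe_to (so no f can exist).
    """
    f = {}
    for h in heights:
        src, dst = observe_from(h), observe_to(h)
        if f.setdefault(src, dst) != dst:
            return None
    return f


heights = range(140, 200)  # sample heights in whole centimetres
print(induced_mapping(scale_a, scale_b, heights))  # None
print(induced_mapping(scale_b, scale_a, heights))  # None
print(induced_mapping(scale_a, scale_a, heights))  # identity mapping
```

A single conflicting height is enough to prove that no mapping exists. The converse does not hold: finding no conflict on a finite sample would not by itself prove that a mapping exists for every height.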

Some experimental methods that produce strings as their outputs satisfy the formal definition of observement. By strings we mean well-formed sequences of symbols within a formal language.

Data in the form of strings are common because many processes form sequences, especially over time, but also in space or some other ordering. An important case is written language, which consists of sequences of characters arranged according to the relevant syntax. This idea is not restricted to natural languages, but also includes formal languages (e.g. arithmetic) and computer code. For instance, formal languages (e.g. L-system models of plants) have been widely used to describe growth patterns [64].

Strings have also been used to define complexity. An information-theoretic interpretation is that the complexity of a system is the length of the message needed to describe it [17, 46, 58].

The use of symbolic strings to represent data is common. We look at two examples, animal behaviour and genetic codes, that satisfy the definition of observement.
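The information-theoretic view of complexity mentioned above can be illustrated with a common practical proxy. The exact shortest description (Kolmogorov complexity) is not computable, so this Python sketch swaps in the length of a zlib-compressed string as an upper bound; the two example strings are made up for illustration.

```python
import random
import zlib


def description_length(s: str) -> int:
    """Bytes needed by zlib to encode s: a computable upper bound on the
    length of a message describing s."""
    return len(zlib.compress(s.encode("utf-8"), 9))


patterned = "ab" * 500                  # 1000 symbols, one repeated motif
rng = random.Random(0)                  # fixed seed for reproducibility
irregular = "".join(rng.choice("ab") for _ in range(1000))

print(len(patterned), description_length(patterned))
print(len(irregular), description_length(irregular))
```

Both strings contain 1000 symbols, but the patterned one compresses to far fewer bytes, matching the intuition that it is the less complex of the two.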

Strings have been used to encapsulate sequences of actions. We briefly describe methods for modelling animal behaviour using strings.

Turtle Geometry

A simple string language $S = \langle \mathit{Alphabet}, \mathit{Syntax} \rangle$ consists of all the strings that can be made from an alphabet by applying its syntax (a set of production rules such as replacement, addition and concatenation). For example, a grammar for turtle geometry is given in Figure 1.

$$S = \langle \mathit{Alphabet}, \mathit{Syntax} \rangle$$
$$\mathit{Alphabet} = \{L, R, F, T\} \cup \{\langle \mathit{path} \rangle\}$$
$$\mathit{Syntax} = \{\langle \mathit{path} \rangle,\ \langle \mathit{path} \rangle \to F\langle \mathit{path} \rangle,\ \langle \mathit{path} \rangle \to L\langle \mathit{path} \rangle,\ \langle \mathit{path} \rangle \to R\langle \mathit{path} \rangle,\ \langle \mathit{path} \rangle \to T\}$$

Figure 1: A simple grammar for turtle geometry. Here L, R, F and T represent Left, Right, Forward step and Terminate, respectively.

The simple grammar in Figure 1 generates strings such as FFLFFFRFT, which describe simple paths in turtle geometry [4]. One advantage is that it provides a formal method to compare patterns: similar patterns have similar strings.
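One way to make the turtle grammar operational is sketched below in Python. Reading the productions of Figure 1 as ⟨path⟩ → F⟨path⟩ | L⟨path⟩ | R⟨path⟩ | T, the generated language is exactly the set of strings matching the regular expression `[FLR]*T`. The interpretation assumes unit forward steps and 90-degree turns, starting at the origin facing east (Figure 1 does not specify a turning angle), and it uses Levenshtein edit distance as one possible, swapped-in measure of how similar two path strings are.

```python
import re

# Strings generated by the Figure 1 grammar, read as (F|L|R)*T.
TURTLE_PATH = re.compile(r"[FLR]*T")


def is_path(s: str) -> bool:
    """True if s is generated by the turtle grammar."""
    return TURTLE_PATH.fullmatch(s) is not None


def trace(path: str) -> list[tuple[int, int]]:
    """Visited points, assuming unit steps and 90-degree turns from (0, 0) facing east."""
    if not is_path(path):
        raise ValueError(f"not generated by the grammar: {path!r}")
    x, y, dx, dy = 0, 0, 1, 0
    points = [(x, y)]
    for symbol in path:
        if symbol == "F":
            x, y = x + dx, y + dy
            points.append((x, y))
        elif symbol == "L":
            dx, dy = -dy, dx   # rotate heading 90 degrees anticlockwise
        elif symbol == "R":
            dx, dy = dy, -dx   # rotate heading 90 degrees clockwise
        else:                  # "T": terminate
            break
    return points


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


print(trace("FFLFFFRFT"))  # [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (2, 3), (3, 3)]
print(edit_distance("FFLFFFRFT", "FFLFFRFT"))  # 1
```

The second path drops one forward step, so its string differs from the first by a single deletion, which is the sense in which similar patterns have similar strings.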

The idea of representing behaviour as a string of actions has been employed by ethologists [69]. An animal’s behaviour is recorded as a string of symbols in a language $L_B = \langle A_B, S_B \rangle$ (Figure 2). We can regard this system as an observement $\langle S, O, m \rangle$ by defining $S$ to be sequences of animal behaviour and $O = L_B$. The mapping $m$ is defined by first assigning symbols to particular actions (the semantics; Figure 2). To record a sequence of behaviour, the observer uses an event recorder, which is a device with keys corresponding to predefined actions. Each time an action occurs, the observer presses the corresponding key, producing a string that describes the animal’s behaviour. The advantage is that the sequence provides an analytic approach that simplifies the task of identifying similar or repeating patterns of behaviour.

The above approach to obtaining behavioural strings ensures that $m$ is a homomorphism: if any sequence of behaviour is followed by any of the defined actions, then the symbol for that action will be the next entry in the recorded string. So the definition satisfies Condition Ob1. Also, the definition of the recording process ensures that at least one algorithm exists, so Condition Ob2 is satisfied. Finally, the system satisfies Condition Ob3 because if we use any other symbols to express individual actions, then simple replacement of the corresponding symbols gives a mapping from one system to the other.

In this and the following examples, we use the conventions of Backus-Naur Form (BNF) notation to define syntax. The symbol ‘+’ denotes one or more repetitions and the symbol ‘|’ represents alternatives.
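The event-recorder procedure can be simulated in a few lines of Python. Because Figure 2 is not reproduced here, the action names and key symbols in `SEMANTICS` are hypothetical. The sketch records a behaviour sequence as a string, counts repeated sub-sequences (one simple way to find repeating patterns), and illustrates the Ob3 argument: relabelling every symbol is invertible, so two recorders that use different symbols are related by a mapping.

```python
from collections import Counter

# Hypothetical semantics: Figure 2's actual alphabet is not reproduced
# here, so these action names and key symbols are illustrative only.
SEMANTICS = {"approach nest": "a", "feed chick": "f",
             "leave nest": "l", "preen": "p"}


def record(actions: list[str]) -> str:
    """Event recorder: one key press (symbol) per observed action, in order."""
    return "".join(SEMANTICS[action] for action in actions)


def repeated_patterns(behaviour: str, length: int) -> Counter:
    """Substrings of the given length that occur more than once."""
    counts = Counter(behaviour[i:i + length]
                     for i in range(len(behaviour) - length + 1))
    return Counter({k: c for k, c in counts.items() if c > 1})


def relabel(behaviour: str, old_to_new: dict[str, str]) -> str:
    """Ob3: a symbol-for-symbol substitution translates between recorders."""
    return "".join(old_to_new[s] for s in behaviour)


observed = ["approach nest", "feed chick", "leave nest", "preen",
            "approach nest", "feed chick", "leave nest"]
s = record(observed)
print(s)                          # aflpafl
print(repeated_patterns(s, 3))    # Counter({'afl': 2})
```

Because recording is symbol by symbol, `record(x + y) == record(x) + record(y)` for any two action sequences, which is the concatenation-preserving property behind the Ob1 argument.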

A common representation of data by strings is used to record sequences of genes and proteins. Genetic data consist of DNA sequences in which each character is one of four bases: Adenine (a), Cytosine (c), Guanine (g) and Thymine (t) (see Figure 3). Most genes code for proteins (strings of amino acids). In DNA, the bases are grouped in sequences of three to form codons, each of which corresponds to an amino acid or signals the start or end of the sequence (see Table 1).

For instance, the DNA codon atg signals both the start of a gene sequence and the amino acid Methionine (M), the codon atc produces Isoleucine (I), and the codon tag signals the end of a gene. There are 64 codons but only 20 amino acids, so there is a lot of redundancy. For instance, there are six codons, tct, tcc, tca, tcg, agt and agc, for the amino acid Serine (S). Even so, each codon specifies exactly one amino acid (or a stop signal), so there is a strict correspondence from the DNA sequences in genes to the amino acid sequences in proteins (Figure 3).

To interpret the acquisition of gene sequences as observement, we define a language $L_G = \langle A_G, S_G \rangle$ to describe gene sequences (Figure 4). So for genes, we define the observement system $\langle S, O, m \rangle$ by interpreting $S$ to be the set of all genes and setting $O$ to be the language $L_G$.

To define the mapping $m : S \to O$, we associate the constants $(a, c, g, t)$ in $A_G$ with the DNA bases listed above.
A comprehensive methodology exists for recording gene sequences, including processes to identify the base sequences and a variety of software tools to construct sequences and identify the genes [31], [2].

Amino acid | Label | DNA codons
Isoleucine | I | att, atc, ata
Leucine | L | ctt, ctc, cta, ctg, tta, ttg
Valine | V | gtt, gtc, gta, gtg
Phenylalanine | F | ttt, ttc
Methionine (start codon) | M (START) | atg
Cysteine | C | tgt, tgc
Alanine | A | gct, gcc, gca, gcg
Glycine | G | ggt, ggc, gga, ggg
Proline | P | cct, ccc, cca, ccg
Threonine | T | act, acc, aca, acg
Serine | S | tct, tcc, tca, tcg, agt, agc
Tyrosine | Y | tat, tac
Tryptophan | W | tgg
Glutamine | Q | caa, cag
Asparagine | N | aat, aac
Histidine | H | cac, cat
Glutamic acid | E | gaa, gag
Aspartic acid | D | gac, gat
Lysine | K | aaa, aag
Arginine | R | cgt, cgc, cga, cgg, aga, agg
Stop codons | STOP | taa, tag, tga

Table 1: DNA codes for amino acids. Note that the start codon also codes for the amino acid Methionine, and is usually represented as ‘M’ [41, 35].

Figure 3: The start and end of a DNA sequence of 1554 bases and the corresponding amino acid sequence for lactose permease [Escherichia coli] (Source: GenBank ID AAA24054.1). Each subsequence of three bases (a codon) codes for an amino acid. The amino acid string corresponding to this gene is listed beneath it [2].

Figure 4: Language for describing gene sequences. We use the conventions of Backus-Naur Form (BNF) notation to define syntax. The symbol ‘+’ denotes one or more repetitions and the symbol ‘|’ represents alternatives.

The above definition satisfies Condition Ob1. The mapping $m$ is a homomorphism, because both real genes and the sequences that describe them are strings, and $m$ satisfies the same rules for strings: any substring of a sequence corresponds to a section of the gene it represents. The above definitions also ensure that there is at least one algorithm to obtain a description of a gene sequence, so it satisfies Condition Ob2. Finally, gene sequencing satisfies the strong Condition Ob3. Suppose we used a different mapping $m' : S \to O'$, where $O'$ uses different symbols to represent the four bases. Then the mapping $f : O' \to O$, which performs a simple replacement of corresponding symbols, gives a mapping from one system to the other.

Although the observement of genes described here is simple, the biological processes involved in the translation of genes into proteins are complex. Many biological issues are beyond the scope of this discussion, such as reading frames, exons and introns, controller genes, and the roles of messenger RNA and ribosomes.

Most genes provide the code for producing proteins, which are formed as strings of amino acids. Just as we did for genes, we can define a simple language $L_P = \langle A_P, S_P \rangle$ to represent proteins as amino acid sequences (Figure 5). This language will produce any amino acid sequence (e.g. Figure 3).

The proof that amino acid sequencing is a strong observement system parallels the argument for gene sequencing almost exactly. The standards associate the constant symbols with the amino acids they represent (Ob1). As for DNA, there are many widely known methods (Ob2) for extracting and interpreting amino acid strings and protein structure [2]. As Figure 3 shows, the observement systems for representing genes and proteins are related: gene sequences map to protein sequences (Ob3). The equivalences in Table 1 define a homomorphism from gene sequences onto amino acid sequences.

Figure 5: Language for defining proteins as strings of amino acids.
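Table 1 can be read directly as a translation procedure. The Python sketch below is a deliberate simplification under stated assumptions: the sequence is read in frame from its first base, translation stops at the first stop codon, and reading frames, introns and the other biological issues set aside above are ignored. The names `TABLE_1`, `CODON_TO_AA` and `translate` are illustrative, not a bioinformatics library API.

```python
# Table 1 as data: one-letter amino-acid label -> codons ("*" = stop).
TABLE_1 = {
    "I": "att atc ata",
    "L": "ctt ctc cta ctg tta ttg",
    "V": "gtt gtc gta gtg",
    "F": "ttt ttc",
    "M": "atg",  # also the start codon
    "C": "tgt tgc",
    "A": "gct gcc gca gcg",
    "G": "ggt ggc gga ggg",
    "P": "cct ccc cca ccg",
    "T": "act acc aca acg",
    "S": "tct tcc tca tcg agt agc",
    "Y": "tat tac",
    "W": "tgg",
    "Q": "caa cag",
    "N": "aat aac",
    "H": "cac cat",
    "E": "gaa gag",
    "D": "gac gat",
    "K": "aaa aag",
    "R": "cgt cgc cga cgg aga agg",
    "*": "taa tag tga",
}
CODON_TO_AA = {codon: aa for aa, codons in TABLE_1.items()
               for codon in codons.split()}


def translate(dna: str) -> str:
    """Map a gene string (read in frame from its first base) to a protein
    string, stopping at the first stop codon."""
    dna = dna.lower()
    if len(dna) % 3:
        raise ValueError("length is not a multiple of 3")
    protein = []
    for i in range(0, len(dna), 3):
        aa = CODON_TO_AA[dna[i:i + 3]]
        if aa == "*":  # stop codon: end of the coding sequence
            break
        protein.append(aa)
    return "".join(protein)


print(len(CODON_TO_AA))        # 64: every codon has exactly one meaning
print(translate("atgatctag"))  # MI
```

For codon-aligned pieces without stop codons, `translate(x + y) == translate(x) + translate(y)`, which is the homomorphism from gene strings onto amino acid strings described above.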

Networks have gained increasing prominence in many areas [42]. Examples include social networks, infrastructure networks and software systems. Networks are used in the analysis of biological systems, such as food webs and genetic regulatory networks. Diagrams of networks are widely used to convey information, such as organisational structure, family trees, flow diagrams and semantic relationships (see Figure 6).

As networks underlie many diverse fields and applications, it is important to have methods of observing behaviour in networks, understanding interactions between entities within networks, and comparing networks.

Graphs are widely used to model networks and network behaviour. A graph is a pair $(V, E)$, where $V$ is a set of nodes (or vertices) and $E \subseteq V \times V$ is a set of edges (or arcs) connecting pairs of nodes. (A graph with directed edges is called a digraph.) For clarity, here we define networks to be graphs in which the nodes and edges can have associated attributes. Nodes are an abstraction of entities in the networks, and edges are an abstraction of relationships between entities. For example, the graph in Figure 6(a) models species as nodes and the ‘to eat’ relationship as directed edges.

Many important network properties can be determined from a graph. Using graphs to model networks is powerful, as it allows the use of a wide range of well-developed tools and methods to extract valuable information about networks, such as network reliability [25, 7], connectedness [60, 65], common network structure [26, 57], efficient resource allocation [63, 27], flows within networks [82, 76] and efficient route detection [22, 62]. Graphs and networks provide a common theoretical model for patterns of interactions, where common interactions inside a network can be represented as a subgraph (subnetwork).

For example, Figure 6(a) is a directed graph where each node represents a species (frogs, spiders, insects, etc.) and each directed edge represents a relationship between species (e.g. (frog, insect) denotes ‘frogs eat insects’).

There are existing standards for representing graphs. Common data structures for storing graph data include adjacency lists and adjacency matrices (see Figure 7). These structures can also be represented as a bit string, such as the graph6 format [51].

Figure 6: Examples of data represented as networks (graphs with values for nodes/edges). (a) A food web from Aspen, Manitoba (redrawn from data in [21]). The arrows denote that one species serves as food for another. (b) A phylogenetic tree (dendrogram), showing inter-specific relationships (after [72]).

A proof of the universality of graphs as the underlying structure of all complex systems is given in [39]. This universality means that network models provide important insights about many different systems [59]. Certain network topologies, such as trees, occur widely and are known to convey important properties and behaviours. Perhaps the most far-reaching insight was the proof that random graphs undergo a critical phase change, from fragmented to connected, as the edge density increases [28]. This property of graphs accounts for a wide range of physical phenomena [40], such as crystallization, the firing of a laser and the onset of percolation.

Figure 7: Different graph representations: a diagram, an adjacency list, an adjacency matrix, and a bit-string representation of the upper triangular adjacency matrix (graph6 format).

In this section, we demonstrate that graphs, together with related methods, data structures and standards, form an observement system for networks. Here graphs are viewed as observations of networks, and the mapping of a network to a graph data structure is a homomorphism between common graph relations and network relations. For example, the subgraph relation is homomorphic to the subnetwork relation, and the isomorphism relation is homomorphic to ‘equality’ between networks. We demonstrate that a mapping $m$ exists that gives a homomorphism between relations on the network and relations on the graphs, and that satisfies the observement properties.

A Graph Observement System

Formally, we can regard the system $\langle \mathcal{S}, \mathcal{O}, m \rangle$ of observements of networks as graphs as follows. Let $\mathcal{S} = (S, R)$, where $S$ is the set of networks of interest and $R$ is the set of relations on these networks. An example of a network relation is the subnetwork relation, where network $N_1$ is related to network $N_2$ if and only if $N_1$ is a subnetwork of $N_2$.

Let $O$ be the set of graphs (observations) and $P$ the set of relations on these graphs. An example of a graph relation is the subgraph relation, where $G_1$ is related to $G_2$ if and only if $G_1$ is a subgraph of $G_2$.

There are many possible mappings from real-world networks to graphs. Here we use an adjacency matrix (e.g. as in Figure 7), where the entities in the network are represented as vertices and the relationships between pairs of entities as edges. This is a well-established structure for representing graphs and has associated methods and standards. The mapping $m$ first creates an $n \times n$ adjacency matrix, where $n$ is the number of vertices. It then assigns non-zero entries corresponding to relationships between pairs of entities. Thus, $m$ maps a network to an observation (graph).

Ob1 is satisfied

This system satisfies Condition Ob1, as the algorithm $m$ is a homomorphism from networks to graphs that maps relations on networks to relations on graphs; for example, it maps the subnetwork relationship to the subgraph relationship. An example of the usefulness of such relation-preserving mappings is the mapping of subnetworks with significant interactions in biological networks to subgraphs (network motifs) in graphs.

Ob2 is satisfied

There is at least one algorithm $m$ that can be used to represent a network by an adjacency matrix. Thus, the system $\langle \mathcal{S}, \mathcal{O}, m \rangle$ satisfies Condition Ob2.

Is Ob3 satisfied?

Clearly, there is a mapping between any of the common graph representations (adjacency matrices, adjacency lists, graph drawings, and compact storage encodings such as graph6 [51]), and so Ob3 is satisfied for these representations. Although mappings exist between algorithms that observe networks as graphs represented in these formats, it is an open question whether this is always the case.
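As a concrete instance of the mappings between representations mentioned above, the Python sketch below converts an undirected simple graph between an adjacency matrix, an adjacency list and the graph6 string format [51]. The encoder follows the graph6 layout for graphs with at most 62 vertices (one size character, then the upper triangle read column by column and packed six bits per character, each character offset by 63); larger graphs and directed graphs are outside the scope of this sketch, and the function names are illustrative.

```python
def matrix_to_list(matrix):
    """Adjacency matrix -> adjacency list (vertex -> sorted neighbours)."""
    return {i: [j for j, bit in enumerate(row) if bit]
            for i, row in enumerate(matrix)}


def list_to_matrix(adj):
    """Adjacency list -> adjacency matrix."""
    n = len(adj)
    matrix = [[0] * n for _ in range(n)]
    for i, neighbours in adj.items():
        for j in neighbours:
            matrix[i][j] = 1
    return matrix


def _upper_pairs(n):
    """Upper-triangle positions in graph6 order: (0,1), (0,2), (1,2), (0,3), ..."""
    return [(i, j) for j in range(1, n) for i in range(j)]


def to_graph6(matrix):
    """Encode an undirected simple graph with at most 62 vertices as graph6."""
    n = len(matrix)
    if n > 62:
        raise ValueError("this sketch only handles n <= 62")
    bits = [1 if matrix[i][j] else 0 for i, j in _upper_pairs(n)]
    bits += [0] * (-len(bits) % 6)  # pad to a multiple of 6
    chars = [chr(int("".join(map(str, bits[k:k + 6])), 2) + 63)
             for k in range(0, len(bits), 6)]
    return chr(n + 63) + "".join(chars)


def from_graph6(code):
    """Decode a graph6 string (at most 62 vertices) to an adjacency matrix."""
    n = ord(code[0]) - 63
    bits = []
    for ch in code[1:]:
        value = ord(ch) - 63
        bits.extend((value >> k) & 1 for k in range(5, -1, -1))
    matrix = [[0] * n for _ in range(n)]
    for (i, j), bit in zip(_upper_pairs(n), bits):  # padding bits are ignored
        matrix[i][j] = matrix[j][i] = bit
    return matrix


path3 = [[0, 1, 0],
         [1, 0, 1],
         [0, 1, 0]]
print(matrix_to_list(path3))  # {0: [1], 1: [0, 2], 2: [1]}
print(to_graph6(path3))       # Bg
```

Each conversion has an inverse, so an observation recorded in one of these formats can be mapped onto the corresponding observation in any other, which is what Ob3 asks for.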

Networks have been employed to analyse a wide variety of data, from medieval politics [81] to geospatial patterns [73].

A powerful relation that is preserved when mapping networks to graphs is the mapping of the subnetwork relation on networks to the subgraph relation on graphs. Identifying common substructures in networks and graphs underlies many research areas, as “. . . [g]raphs seem to be the current answer to the question no matter the type of information: molecular data, brain images or neural signals” [75]. Thomas, Dongmin and Lee’s survey of similarity relations on graphs [75], in particular, shows how they can be used to understand neurological interactions and to identify neurological disorders. Another example [66] uses network topologies based on interactions between and within subnetworks to investigate changes in the brain in people with Alzheimer’s disease. Although their measures of connectivity are numbers, such as path lengths and clustering coefficients, the comparison of the networks represented in this way adds a meaningful layer to the studies (see Figures 4 and 5 in [66]).

Trees are a common way to represent hierarchical relationships in many fields. For example, dendrograms are standard tools for representing community structure in large networks [19] and are widely used in representing taxonomic relationships (Figure 6(b)).
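The relation-preserving mapping from subnetworks to subgraphs, and the search for common substructures, can be sketched as follows. In this Python example the networks are labelled (entities are identified by name), so the subgraph test reduces to inclusion of vertices and edges; unlabelled subgraph isomorphism is a much harder, NP-complete problem. The function `m` is a hypothetical rendering of the adjacency-matrix mapping described in the graph observement system above, the food-web data are invented for illustration, and the feed-forward loop (a→b, b→c, a→c) is used as one example of a three-node network motif.

```python
from collections import defaultdict

# A network is (entities, relationships); each relationship is an ordered
# pair of entity names, e.g. ("frog", "insect") for "frogs eat insects".
# The food-web data are illustrative only, not taken from Figure 6.


def m(entities, relationships):
    """Observe a network as a graph: a vertex order plus an adjacency matrix."""
    order = sorted(entities)
    index = {e: i for i, e in enumerate(order)}
    matrix = [[0] * len(order) for _ in order]
    for a, b in relationships:
        matrix[index[a]][index[b]] = 1
    return order, matrix


def is_subnetwork(n1, n2):
    """Network relation: every entity and relationship of n1 occurs in n2."""
    (e1, r1), (e2, r2) = n1, n2
    return set(e1) <= set(e2) and set(r1) <= set(r2)


def is_subgraph(g1, g2):
    """Graph relation: labelled subgraph test on adjacency matrices."""
    (order1, mat1), (order2, mat2) = g1, g2
    index2 = {v: i for i, v in enumerate(order2)}
    if not set(order1) <= set(index2):
        return False
    return all(mat2[index2[a]][index2[b]] == 1
               for i, a in enumerate(order1)
               for j, b in enumerate(order1) if mat1[i][j])


def feed_forward_loops(relationships):
    """All (a, b, c) of distinct nodes with edges a->b, b->c and a->c."""
    out = defaultdict(set)
    for a, b in relationships:
        out[a].add(b)
    return sorted((a, b, c)
                  for a in list(out) for b in out[a] for c in out.get(b, ())
                  if c in out[a] and len({a, b, c}) == 3)


food_web = ({"frog", "spider", "insect", "plant"},
            {("frog", "spider"), ("spider", "insect"),
             ("frog", "insect"), ("insect", "plant")})
part = ({"frog", "insect"}, {("frog", "insect")})
reversed_edge = ({"frog", "insect"}, {("insect", "frog")})
networks = [food_web, part, reversed_edge]

# Ob1 for this relation: subnetwork(N1, N2) <=> subgraph(m(N1), m(N2))
print(all(is_subnetwork(n1, n2) == is_subgraph(m(*n1), m(*n2))
          for n1 in networks for n2 in networks))  # True
print(feed_forward_loops(food_web[1]))  # [('frog', 'spider', 'insect')]
```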

Genealogical information is often represented by a directed acyclic graph (DAG), commonly known as a family tree. In this representation, family members are represented by nodes, the "child of" relationship is represented by a directed edge from parent to child, and the "partnered with" relationship is represented by a bi-directed (or undirected) edge. The underlying graph representation is technically not a tree, as there may be more than one path between a pair of nodes. A graph with directed edges is called a digraph.

Large databases of genealogical data are maintained, for example by ancestry.com. An interesting account of different visualisations of this data is given in [83].

A digraph $D$ can be represented as a pair $D = (V, E)$, where $V = \{0, 1, \ldots, n-1\}$ is the set of nodes, or vertices, and $E \subseteq V \times V$ is the set of directed edges connecting pairs of nodes. Nodes may be labelled with information such as names and date and place of birth.

The empty digraph, denoted $D(\emptyset)$, has a single vertex and no edges. We use the notation $D_1 + \overrightarrow{u_1u_2}\, D_2$ and $D_1 + u_1u_2\, D_2$ to denote the digraphs obtained by connecting digraphs $D_1$ and $D_2$ by the directed, or bidirectional, edge $u_1u_2$ respectively, where $u_i \in V(D_i)$ for $i \in \{1, 2\}$. Similarly, we denote adding a directed, or bidirectional, edge between vertices $u$ and $v$ in $D$ by $D + \overrightarrow{uv}$ or $D + uv$ respectively.

If some relationships are missing, we may have a disconnected family tree. In such cases $D$ is a disjoint union of digraphs $D_1, \ldots, D_k$, which we denote as $D = D_1 + \cdots + D_k$.

A digraph $D$ can then be recursively defined as:

$D := D(\emptyset) \mid D_1 + \overrightarrow{u_1u_2}\, D_2 \mid D_1 + u_1u_2\, D_2 \mid D + \overrightarrow{uv} \mid D + uv \mid D_1 + D_2$.

The following relationships are encapsulated within the digraph:
• is child of: For all $u, v \in V(D)$, $u$ is a child of $v$ if and only if $\overrightarrow{vu} \in E(D)$.
• is parent of: For all $u, v \in V(D)$, $u$ is a parent of $v$ if and only if $\overrightarrow{uv} \in E(D)$.
• partnered: For all $u, v \in V(D)$, $u$ partnered $v$ if and only if $uv$ is a bidirectional edge in $E(D)$.
• is related to: For all $u, v \in V(D)$, $u$ is related to $v$ if and only if $u$ and $v$ are connected by a path (disregarding the direction of the edges) in $D$.
• is descendant of: For all $u, v \in V(D)$, $u$ is a descendant of $v$ if and only if there exists a directed path $v = v_0, v_1, \ldots, v_k = u$ in $D$ where each edge $\overrightarrow{v_iv_{i+1}} \in E(D)$ for $i \in \{0, \ldots, k-1\}$.
• is predecessor of: For all $u, v \in V(D)$, $u$ is a predecessor of $v$ if and only if there exists a directed path $u = u_0, u_1, \ldots, u_k = v$ in $D$ where each edge $\overrightarrow{u_iu_{i+1}} \in E(D)$ for $i \in \{0, \ldots, k-1\}$.

The underlying data representation of a digraph is commonly an adjacency matrix or an adjacency list. Clearly, there is a mapping between these two representations (and so the Observement Condition Ob3 is satisfied for these representations). To reflect its inherent symmetry, the 'partnered with' relationship is represented by a bi-directed edge.

Due to their hierarchical nature, a common way to visualise genealogical trees is as an ancestry chart (for example, [83]), where a selected node is positioned as the root of the 'tree' (or subtree) and a tree-like structure emanates from the root. The problem of representation and layout of family trees is a specialisation of general graph layout problems (for example, [68]). Family trees can be extended to other types of genealogical networks (for example, [34]). As the size of the tree increases, the tree-like layout may be replaced by alternative visualisations. Systems for extracting genealogical data and visualising the genealogical structure using matrix representations are given in [13].

A dendrogram is a (tree) graph that illustrates the hierarchical relationships between clusters of data (see Figure 6).
Each leaf node corresponds to a particular cluster, and clusters corresponding to leaf nodes in the same subtree have common properties. Leaf nodes belonging to smaller subtrees are more closely related than those connected only by larger subtrees.

Just as for strings, motifs are also used in the interpretation of graphs (see Figure 10). Network motifs are subgraphs that appear to occur more frequently than expected in certain networks [55]. Often the motifs of a network are related to meaningful interactions such as social interaction [32], protein interactions and gene transcription [80, 15] and other biological interactions [55, 12].

As with strings, motifs can be used to represent common patterns between graphs, which in turn represent networks. However, at another level, motifs can be used as observements themselves. Thus, we can have a mapping $N \to G \to M$, which maps a network to a graph, and a graph to a motif.

Graphs and networks have gained widespread theoretical prominence in the study of complex systems because they provide a common theoretical model for patterns of interaction. The following theorems justify this by establishing the universality of graphs [39, 40].

Theorem 5.1.

Graphs underlie the structure of all complex systems.

Proof.

The proof [39, 40] rests on the observation that models of complex systems use only a small number of representations (e.g. matrices, systems of equations, cellular automata). So if we show that graphs are implicit in those representations, then graphs are implicit in the structure of any complex system for which that representation is used to create a valid model.

Theorem 5.2.

In any deterministic automaton with a finite number of states, the state space forms a directed graph.

Proof.

For any automaton $\langle A, S \rangle$, the proof [39, 40] defines the set of states $A$ to be the nodes of a graph, and the successor function $S$ defines a set of edges $R_A = \{(x, y) \mid S(x) = y,\ x, y \in A\}$.

The proof extends to arrays of automata, since in any such system the suite of individual states at any stage defines a state for the array. By defining edges for any transition with non-zero probability, the result also extends to stochastic processes.

This universality of graphs means that the network model provides important insights about many different systems. Certain network topologies (trees, for instance) occur widely and are known to convey important properties and behaviours. Perhaps the most far-reaching insight was the proof that random graphs undergo a critical phase change, from fragmented to connected, as the edge density increases [28, 40]. This property of graphs accounts for a wide range of physical phenomena, such as crystallization, the firing of a laser and the onset of percolation.

Graphs have proved a convenient way to represent transportation, utilities and other large-scale infrastructure. A useful abstraction is "networks of networks" [24], which allows efficient identification of key properties such as network reliability (important in utility networks), network flow (important in transportation networks), and catastrophic cascading failures [14]. These critical properties are preserved by the homomorphism (Condition Ob1), illustrating the benefits of observement systems: they abstract and simplify while maintaining these important relationships.

Another example of the usefulness of graphs in observement is modelling the spread of disease by encapsulating the connections between members of the community (e.g. [33], [6]). These observations (graphs) can be used to identify how to fragment the graph into subgraphs that minimise interactions and thus minimise the spread of such diseases.
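The construction used in the proof of Theorem 5.2 is easy to make concrete. The sketch below assumes a finite automaton given by a set of states and a Python function playing the role of the successor function S; it builds the edge set R_A, then applies the same idea to an array of automata (whose state is the tuple of individual states) and to a stochastic process (keeping only transitions with non-zero probability). The example automata and the dictionary format for transition probabilities are hypothetical choices made for illustration.

```python
"""State-space digraphs of automata, following the construction in Theorem 5.2 (sketch)."""


def state_graph(states, successor):
    """Deterministic automaton <A, S>: nodes are the states A, edges are R_A = {(x, S(x))}."""
    nodes = set(states)
    edges = {(x, successor(x)) for x in nodes}
    if any(y not in nodes for _, y in edges):
        raise ValueError("the successor function must map states to states")
    return nodes, edges


def stochastic_state_graph(transition_probs):
    """Stochastic process: one edge for every transition with non-zero probability."""
    nodes = set(transition_probs)
    edges = {
        (x, y)
        for x, row in transition_probs.items()
        for y, p in row.items()
        if p > 0
    }
    return nodes, edges


# Hypothetical deterministic automaton: a counter modulo 4.
counter_nodes, counter_edges = state_graph(range(4), lambda x: (x + 1) % 4)
print(sorted(counter_edges))  # [(0, 1), (1, 2), (2, 3), (3, 0)]

# An array of two counters: the tuple of individual states is the array's state.
pairs = [(a, b) for a in range(2) for b in range(3)]
array_nodes, array_edges = state_graph(pairs, lambda s: ((s[0] + 1) % 2, (s[1] + 1) % 3))
print(len(array_nodes), len(array_edges))  # 6 6

# Hypothetical two-state stochastic process ("wet" never persists).
P = {"dry": {"dry": 0.7, "wet": 0.3}, "wet": {"dry": 1.0, "wet": 0.0}}
_, weather_edges = stochastic_state_graph(P)
print(sorted(weather_edges))  # [('dry', 'dry'), ('dry', 'wet'), ('wet', 'dry')]
```

In the deterministic case every node has exactly one outgoing edge, which is why the state space is a directed graph; in the stochastic case a node may have several outgoing edges, one per possible transition.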

The generalisation of the concept of measurement to include non-numerical data prompts us to ask what kinds of analytic methods can be used. Numerical methods may no longer apply, so there is a need to identify new kinds of analyses suited to each kind of representation.

Motifs are small, recurring structures with significant meaning. They occur in many contexts. In music, for instance, a short phrase (a leitmotif) is sometimes used to identify people or characters in a ballet, opera or movie. In recent years motifs have become popular analytic tools and are used to analyze both strings and graphs.

In strings, motifs are substrings that typically occur more frequently than would be expected by chance. A good example of motif use is in the observement of proteins, via strings of amino acid sequences [2].

Figure 8: Rules for construction of protein representation.

Many methodologies have been developed for interpreting gene and amino acid sequences [2]. Most relevant here are methods that compare different sequences. For genes, important questions include relationships between organisms and unravelling genetic regulatory networks. For proteins, some of the most important questions concern their structure and function. Motifs can be used to describe families of proteins and to interrogate their structure, an approach pioneered by Bairoch and his colleagues.
The protein database Prosite [11] stores information about the structural homology of families of proteins [44]. The database represents proteins as amino acid sequences and uses motifs to capture homology patterns.

Motifs are short sections of strings that are associated with known structural features, such as alpha helices or beta sheets. They are characterized by identifying:
• sets of amino acids that share common physical properties [18]; and
• sequences that play a role in folding.

Some typical rules are given in Figure 8. In practice, the database does not use these labels in its descriptors, preferring instead to provide a standalone code. Figure 9 gives an example of three amino acid sequences and the motif pattern they share in common. This pattern highlights the structural similarities between the sequences. In this example x(N) means "any N amino acids" and {A, B, C} means any one of the listed amino acids.

In a similar vein, gene sequence analysis uses alignment methods to identify similarities and differences between corresponding genes of distinct species (for example, see [48]); several methods make use of motifs to assist in this process.

Just as motifs represent common patterns between strings, they can represent common patterns between graphs, which in turn represent networks (see Figure 10). Network motifs are small subgraphs that occur more frequently in particular networks of interest than in comparable randomised networks [55]. These motifs can be related to meaningful interactions such as social interactions [32], protein interactions and gene transcription [80, 15] and other biological interactions [55, 12].

Figure 9: Using motifs to express similarities between amino acid sequences.
Amino acid sequences:
A-A-G-K-V-L-F-M-Y-G-Y-W-A-T-L-G
N-A-W-K-V-L-F-M-Y-N-H-P-C-C-A-G
F-Q-G-L-V-L-F-M-Y-D-I-Y-B-K-Q-G
Shared motif pattern:
x(4)-V-L-F-M-Y-x(3)-{A,B,C}-x(2)-G

Figure 10: Simple examples of network motifs.
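The shared motif pattern in Figure 9 can be checked mechanically by translating it into a regular expression. The sketch below assumes the notation defined for Figure 9: elements separated by hyphens, x(N) for any N residues, and {A,B,C} for any one of the listed residues. Note that this differs from PROSITE's own pattern syntax, where square brackets list the allowed residues and curly braces list excluded ones. The translator function `motif_to_regex` is hypothetical and not part of any PROSITE tool.

```python
import re


def motif_to_regex(pattern):
    """Translate a Figure 9-style pattern, e.g. 'x(4)-V-{A,B,C}-G', into a compiled regex."""
    parts = []
    for element in pattern.split("-"):
        if element.startswith("x(") and element.endswith(")"):
            parts.append(".{%d}" % int(element[2:-1]))  # x(N): any N residues
        elif element.startswith("{") and element.endswith("}"):
            residues = [r.strip() for r in element[1:-1].split(",")]
            parts.append("[" + "".join(residues) + "]")  # {A,B,C}: any one listed residue
        else:
            parts.append(re.escape(element))  # a single literal residue
    return re.compile("".join(parts))


sequences = [
    "A-A-G-K-V-L-F-M-Y-G-Y-W-A-T-L-G",
    "N-A-W-K-V-L-F-M-Y-N-H-P-C-C-A-G",
    "F-Q-G-L-V-L-F-M-Y-D-I-Y-B-K-Q-G",
]
motif = motif_to_regex("x(4)-V-L-F-M-Y-x(3)-{A,B,C}-x(2)-G")
print(motif.pattern)  # .{4}VLFMY.{3}[ABC].{2}G
for seq in sequences:
    print(seq, motif.search(seq.replace("-", "")) is not None)  # True for all three
```

Using `search` rather than `fullmatch` lets the same motif be located inside longer sequences; a sequence with, say, D in place of the A/B/C position would not match.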
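To illustrate how a network motif is detected, the following sketch counts feed-forward loops (edges a→b, b→c and a→c), a three-node motif highlighted by Milo et al. [55], in a small hypothetical directed network given as an edge list. It only counts occurrences; establishing that a motif is over-represented would additionally require comparing the count against randomised networks, which this sketch omits.

```python
from collections import defaultdict


def count_feed_forward_loops(edges):
    """Count node triples (a, b, c) with edges a->b, b->c and a->c (self-loops ignored)."""
    successors = defaultdict(set)
    for a, b in edges:
        if a != b:
            successors[a].add(b)
    count = 0
    for a in list(successors):
        for b in successors[a]:
            for c in successors.get(b, ()):
                # c != b holds because self-loops were dropped; also require c != a.
                if c != a and c in successors[a]:
                    count += 1
    return count


# Hypothetical regulatory network: X regulates Y, Y regulates Z, and X also regulates Z.
regulation = [("X", "Y"), ("Y", "Z"), ("X", "Z"), ("Z", "W")]
print(count_feed_forward_loops(regulation))  # 1

# A directed 3-cycle has three nodes and three edges but is a different motif.
print(count_feed_forward_loops([(1, 2), (2, 3), (3, 1)]))  # 0
```

The two examples show why motifs are defined by their wiring rather than by size alone: both subgraphs have three nodes and three edges, but only one is a feed-forward loop.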

In the structure of complex systems, motifs are closely related to modules, which are self-contained, repeatable units of structure. Examples of modular systems include large corporations, which divide their operations into self-contained units such as finance and manufacturing, and plants, which grow by repeatedly adding modules such as branches, leaves and buds.

Shared motif patterns in families of proteins, such as those in Figure 9, lead to the idea of templates: common patterns shared by different phenomena. The idea of templates and modules is familiar in language. For instance, the sentences "A cat ate my canary." and "The dog buried a bone." are analogous in the sense that they share the common grammatical structure <article><noun><verb><object>. In this case, the underlying template relates to describing actions in the real world.

The idea of templates is also widely used in taxonomy. For example, the common body plan for arthropods (insects, spiders, crabs, etc.) is <HEAD><SEGMENT>+<TAIL>, where <SEGMENT> is a body unit with two legs.

An important property of measurement is that it can make vague concepts precise. However, there are many concepts that numerical measurement cannot represent adequately, such as organisation and process. As we saw earlier (Section 5.1), graphs are implicit in all complex systems. Graphs as observements handle this problem gracefully because they can do double duty: database Entity-Relationship Diagrams (ERDs), for example, represent organisation, and flowcharts represent process. Recalling Section 3.1, this means that many systems can be represented as networks of nodes and edges, which is more intuitive than the reduction to numbers that characterises the standard conception of conventional measurement [29, 54].

Several systems base measures of complexity on strings. They define the complexity of a system as the length of the shortest message required to describe the system [17, 46, 71].
Wallace refined this idea, basing his Minimum Message Length principle on the computational model of a program plus data [78]. This idea is consistent with Papentin's division of complexity into two components: primary order (ordered complexity, or pattern) and secondary order (entropy, the random complexity) [58]. All of these approaches rest on the assumption that there is an absolute minimum value.

An alternative is to define complexity relative to a particular observement frame (based on either a graph or a string representation) [36, 37, 38]. This is a more practical approach to dealing with complexity because we can apply a consistent method (i.e. the same frame of reference) when comparing different systems. For instance, a social network of key power-brokers in an organisation would look very different from the network of social sports participation of employees within the same organisation.

Conversely, any observement frame, based on graphs or strings, immediately assigns a complexity value to the observed object. We can represent any complex system as a graph, and we can describe that graph as a text string (see Figure 7). The length of that text string then provides a number, which is a measure of the system's complexity. That is, there are mappings:

GRAPH → STRING → NUMBER

Thus we can associate a measure of relative complexity with any observement based on graphs or strings.

A good example is the way measurement of diversity transformed ecology. Initially, the notion of diversity was vague, but in the mid 20th Century, ecologists introduced a succession of metrics [70, 61]. Also, particularly in the humanities, certain observements involve entangled contexts (such as historical and social conventions, see Section 3.1).

Data storage and compression

As we saw in Figure 9, protein analysis uses motif patterns to infer features of protein structure. Such ideas are also common in syntactic pattern recognition [30, 67], which applies parsing and other inference methods to formal descriptors of observed patterns. Here, motifs can be viewed as representing common patterns between strings. However, at another level, motifs can be used as observations themselves.

Thus, we can have a nested 'Russian Matryoshka doll' mapping $P \to S \to M \to N$, which maps a protein to a string, a string to a motif, and a motif to a number. Each of these is an observation in some observement system, but at each level of the nesting there is some loss of information. Because these losses accumulate, the final step, reduction to a number, retains the least information about the original protein.

Several methods exploit the concept of string motifs by using syntactic and related approaches to storing data. A simple example is LZW compression, which maps data to a string of codes [84]. It achieves compression by building a dictionary of repeated substrings. The method is lossless, and dictionary-based compression of this kind is particularly effective on highly repetitive data, such as large-scale genomic databases [47]. LZW compression could be regarded as an observement that maps data to a pair $\langle M, S \rangle$, where $M$ is a set of string motifs and $S$ is a string in which motifs are replaced by identifying codes.

There are numerous examples of databases that demonstrate the usefulness of observements in storing data.

Graph databases highlight a shift from the emphasis on collecting data in rigidly defined and normalised tables [20] to collecting information about relationships. They are designed "to store data about relationship-rich environments" [23]. Graph databases are particularly useful where the interactions between entities are important, where the shape of the structure is meaningful, or where no suitable pre-determined 'template' exists to represent (often incomplete) data. Hence their popularity for social network applications, where "relationships become just as important as the data itself" [23].

A survey of these databases is given in [8], which lists advantages of using this representation: users are able to easily visualise their data and relationships; queries are easily associated with familiar graph operations; and special structures are available to store graphs, along with efficient algorithms to process them.

The implicit graphical nature of information has also led to new kinds of analytic methods for knowledge discovery in large databases. A widely used example is association analysis, which searches for connections between items in a database [5, 43].
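Association analysis, in the formulation of Agrawal et al. [5], scores a rule X ⇒ Y by its support (the fraction of transactions containing every item of X ∪ Y) and its confidence (support(X ∪ Y) / support(X)). The sketch below computes both quantities and enumerates single-item rules that meet given thresholds, using a hypothetical set of shopping-basket transactions. It is a brute-force illustration of the two measures, not the Apriori algorithm or any library's API.

```python
from itertools import permutations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(lhs, rhs, transactions):
    """support(lhs | rhs) / support(lhs): how often rhs holds when lhs does."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)


def single_item_rules(transactions, min_support, min_confidence):
    """Brute force: every rule {a} => {b} that meets both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        s = support({a, b}, transactions)
        if s < min_support:
            continue
        c = confidence({a}, {b}, transactions)  # support({a}) > 0 since a occurs somewhere
        if c >= min_confidence:
            rules.append((a, b, s, c))
    return rules


# Hypothetical shopping-basket transactions.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support({"bread", "milk"}, baskets))  # 0.5
print(single_item_rules(baskets, min_support=0.5, min_confidence=0.9))
# [('butter', 'bread', 0.5, 1.0)]
```

Viewed as an observement, the rule set is itself a graph: items are nodes and each surviving rule is a directed, weighted edge, which is one reason association analysis sits naturally alongside graph databases.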
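Returning to the LZW observement described earlier in this section, the mapping from data to a pair ⟨M, S⟩ can be sketched with a textbook LZW encoder. Here we take M to be the multi-character dictionary entries learned during encoding (the string "motifs") and S to be the sequence of integer codes; this reading of M and S is our illustrative assumption. The encoder is a plain teaching version that assumes every input character has a code below 256, not a production codec. A decoder is included to show that the mapping is lossless; it rebuilds M on the fly, so M is an interpretable by-product rather than something that must be stored.

```python
def lzw_encode(text):
    """Textbook LZW: return the output codes and the final dictionary."""
    dictionary = {chr(i): i for i in range(256)}  # assumed single-byte starting alphabet
    current = ""
    codes = []
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate
        else:
            codes.append(dictionary[current])
            dictionary[candidate] = len(dictionary)  # learn a new string motif
            current = ch
    if current:
        codes.append(dictionary[current])
    return codes, dictionary


def lzw_decode(codes):
    """Invert lzw_encode, rebuilding the dictionary as the codes are read."""
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    previous = dictionary[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == len(dictionary):
            entry = previous + previous[0]  # code defined by the step that emitted it
        else:
            raise ValueError(f"invalid LZW code {code}")
        output.append(entry)
        dictionary[len(dictionary)] = previous + entry[0]
        previous = entry
    return "".join(output)


def lzw_observement(text):
    """Map data to a pair (M, S): learned multi-character motifs and the code sequence."""
    codes, dictionary = lzw_encode(text)
    motifs = {code: s for s, code in dictionary.items() if len(s) > 1}
    return motifs, codes


text = "TOBEORNOTTOBEORTOBEORNOT"
motifs, codes = lzw_observement(text)
assert lzw_decode(codes) == text  # lossless: the codes alone recover the data
print(len(text), len(codes))      # 24 16
print(motifs[256], motifs[265])   # TO TOB
```

The compression comes entirely from repetition: the more often substrings recur, the longer the learned motifs become and the fewer codes are needed, which is why such methods suit repetitive data.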

Conclusion

In this study we have shown that the formal definition of measurement, once the requirement of numeric representation is relaxed, extends to other, non-numeric representations. We call such systems observement, as a generalisation of, and to distinguish them from, traditional numeric measurement systems. We have also shown that several systems based on two commonly used data representations, strings (Section 4) and graphs (Section 5), immediately satisfy the definition.

Revisiting the crucial roles of a traditional measurement system listed in Section 1: first, observement, like measurement, allows data to be gathered in a standard way. Standard representations lend themselves to standard methods of interpretation. As we have seen, strings and graphs are already used to represent many different kinds of data, and general methods, such as motif detection and association analysis, can be used in many different areas of study.

Secondly, observement produces data with well-known properties. For example, strings and graphs share some commonalities with numerical measurement. Analogies for relations (such as equality) and operations (such as addition) do exist for both strings and graphs, but they are richer in variety. For instance, appending one string to the end of another provides an analogy for addition, but one string could also be inserted anywhere within the other. Likewise, two graphs can be joined by connecting any pair of nodes with an edge, or by identifying sets of nodes to bring together with intermediate edges. Moreover, strings and graphs also introduce other kinds of properties. Graphs, for instance, can exhibit clusters, modularity, and various topologies. So observement opens up the prospect of formal systems with new kinds of operators. An enormous array of tools exists to support numerical measurement systems.

Thirdly, observement has the power of mathematical abstraction.
We have illustrated this for graphs, which encapsulate the organization of large networks by relationships (edges) between entities, and for strings, which encapsulate extremely complex biological structures, such as proteins and DNA.

Lastly, observement can shape the development of theory and methods. Techniques for analysing and interpreting strings and graphs are now very active research areas. The example of motifs and other patterns, which are widely used to interpret strings and graphs, provides a case in point. A potential contribution of observement theory would be to encourage the development of further methods and applications based around widely used representations. In the early days of measurement theory, only numeric data offered formal analytic methods of interpretation; modern high-performance computers and interactive visualization can effectively bring many kinds of observement systems within the scope of formal analysis.

Finally, we point out that the examples we have given here are just the tip of the iceberg. There are many other kinds of data that already are, or could be, observed using these representations. There are also many other formal representations that could serve for certain kinds of non-numeric data.

Acknowledgment

We would like to thank Professor John Crossley, Professor Graham Farr and Professor Mark Sanderson for useful suggestions on earlier versions of the manuscript.

References

[1] American National Standards Institute: ANSI 2018. Available at , Accessed: 22/11/2018.
[2] GenBank. Sequence ID AAA24054.1. Available at  Accessed: 22/11/2020.
[3] International Organization for Standardization: ISO 2018. Available at: , Accessed: 22/11/2018.
[4] H. Abelson and A.A. DiSessa. Turtle Geometry: The Computer as a Medium for Exploring Mathematics. MIT Press, 1986.
[5] Rakesh Agrawal, Tomasz Imieliński, and Arun Swami. Mining association rules between sets of items in large databases. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, pages 207–216, 1993.
[6] G.M. Ames, D.B. George, C.P. Hampson, et al. Using network properties to predict disease dynamics on human contact networks. Proc. R. Soc. B, 278:3544–3550, 2011.
[7] M. Andellini, V. Cannata, S. Gazzellini, B. Bernardi, and A. Napolitano. Test-retest reliability of graph metrics of resting state MRI functional brain networks: A review. Journal of Neuroscience Methods, 253:183–192, 2015.
[8] R. Angles and C. Gutierrez. Survey of graph database models. ACM Comput. Surv., 40:1–39, 2008.
[9] P.R. Anstey. Locke on measurement. Studies in History and Philosophy of Science Part A, 60:70–81, 2016.
[10] B. Arikan. Analyzing data networks. The Graph Commons Journal, 2016. (Available at: http://blog.graphcommons.com/analyzing-data-networks/, Accessed: 14/5/2020).
[11] A. Bairoch. PROSITE: a dictionary of sites and patterns in proteins. Nucleic Acids Research, 19(Suppl):2241, 1991.
[12] A.R. Benson, D.F. Gleich, and J. Leskovec. Higher-order organization of complex networks. Science, 353(6295):163–166, 2016.
[13] A. Bezerianos, P. Dragicevic, J. Bae, and B. Watson. GeneaQuilts: A system for exploring large genealogies. IEEE Transactions on Visualization and Computer Graphics, 16:1073–1081, 2010.
[14] Charles D. Brummitt, George Barnett, and Raissa M. D'Souza. Coupled catastrophes: sudden shifts cascade and hop among interdependent systems. Journal of The Royal Society Interface, 12(20150712), 2015.
[15] Z. Burda, A. Krzywicki, O.C. Martin, and M. Zagorski. Motifs emerge from function in model gene regulatory networks. Proceedings of the National Academy of Sciences, 108(42):17263–17268, 2011.
[16] N.R. Campbell. Physics: the Elements. Cambridge University Press, 1920.
[17] Gregory J. Chaitin. On the length of programs for computing finite binary sequences. Journal of the ACM, 13(4):547–569, 1966.
[18] H. Chen, X. Zhou, and Z-C. Ou-Yang. Classification of amino acids based on statistical results of known structures and cooperativity of protein folding. Physical Review E, 65:061907, 2002.
[19] A. Clauset, M.E.J. Newman, and C. Moore. Finding community structure in very large networks. Phys. Rev. E, 70, 2004.
[20] E.F. Codd. A relational model of data for large shared data banks. Commun. ACM, 13:377–387, 1970.
[21] J.E. Cohen, F. Briand, and C.M. Newman. Community Food Webs: Data and Theory. Springer Science & Business Media, 2012.
[22] F. Corman, A. D'Ariano, D. Pacciarelli, and M. Pranzo. Bi-objective conflict detection and resolution in railway traffic management. Transportation Research Part C: Emerging Technologies, 20:79–94, 2012.
[23] Carlos Coronel and Steven Morris. Database Systems: Design, Implementation, & Management. Cengage Learning, 2016.
[24] Gregorio D'Agostino and Antonio Scala. Networks of Networks: the Last Frontier of Complexity, volume 340. Springer, 2014.
[25] T. Elperin, I. Gertsbakh, and M. Lomonosov. Estimation of network reliability using graph evolution models. IEEE Transactions on Reliability, 40:572–581, 1991.
[26] F. Emmert-Streib, M. Dehmer, and Y. Shi. Fifty years of graph matching, network alignment and network comparison. Information Sciences, 346-347:180–197, 2016.
[27] P.T. Endo, A.V. de Almeida Palhares, N. Pereira, G.E. Goncalves, D. Sadok, J. Kelner, B. Melander, and J. Mangs. Resource allocation for distributed cloud: concepts and research challenges. IEEE Network, 25:42–46, 2011.
[28] P. Erdős and A. Rényi. On the evolution of random graphs. Matematikai Kutató Intézetének Közleményei, 5:17–61, 1960.
[29] L. Finkelstein and M.S. Leaning. A review of the fundamental concepts of measurement. Measurement, 2:25–34, 1984.
[30] K.S. Fu, editor. Syntactic Pattern Recognition, Applications. Springer, 1982.
[31] Walter Gilbert, Nancy Maizels, and Allan Maxam. Sequences of controlling regions of the lactose operon. In Cold Spring Harbor Symposia on Quantitative Biology, volume 38, pages 845–855. Cold Spring Harbor Laboratory Press, 1974.
[32] M. Girvan and M.E.J. Newman. Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99:7821–7826, 2002.
[33] K. Glass and B. Barnes. How much would closing schools reduce transmission during an influenza pandemic? Epidemiology, 18:623–628, 2007.
[34] M. Graham and J. Kennedy. Exploring multiple trees through DAG representations. IEEE Transactions on Visualization and Computer Graphics, 13(6):1294–1301, 2007.
[35] D. Graur and W-H. Li. Fundamentals of Molecular Evolution, volume 2. Sinauer, Sunderland, MA, 2000.
[36] David G. Green. Towards a mathematics of complexity. Complex Systems: From Local Interactions to Global Phenomena, pages 98–105, 1996.
[37] David G. Green. Elements of a network theory of complex adaptive systems. International Journal of Bio-Inspired Computation, 3(3):159–167, 2011.
[38] David G. Green and David Newth. Towards a theory of everything? Grand challenges in complexity and informatics. Complexity International, 8(1):36, 2001.
[39] D.G. Green. Emergent behaviour in biological systems. In D.G. Green and T.J. Bossomaier, editors, Complex Systems – From Biology to Computation, pages 25–36. IOS Press, Amsterdam, 1992. Reprinted as D.G. Green (1993). Emergent behaviour in biological systems. Complexity International, Vol. 1.
[40] D.G. Green. Connectivity and the evolution of biological systems. Journal of Biological Systems, 2(1):91–103, 1994.
[41] A.J.F. Griffiths, J.H. Miller, D.T. Suzuki, R.C. Lewontin, and W.M. Gelbart. An Introduction to Genetic Analysis. W.H. Freeman, New York, 2000.
[42] D.M. Gysi and K. Nowick. Construction, comparison and evolution of networks in life sciences and other disciplines. J. R. Soc. Interface, 17(20190610), 2020.
[43] Jochen Hipp, Ulrich Güntzer, and Gholamreza Nakhaeizadeh. Algorithms for association rule mining: a general survey and comparison. ACM SIGKDD Explorations Newsletter, 2:58–64, 2000.
[44] K. Hofmann, P. Bucher, L. Falquet, and A. Bairoch. The PROSITE database, its status in 1999. Nucleic Acids Research, 27(1):215–219, 1999.
[45] Oliver Holmes. José Ortega y Gasset. In E.N. Zalta, editor, The Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, Stanford University, winter 2017 edition, 2017.
[46] Andrei Nikolaevich Kolmogorov. Three approaches to the quantitative definition of information. International Journal of Computer Mathematics, 2:157–168, 1968.
[47] S. Kuruppu, S.J. Puglisi, and J. Zobel. Relative Lempel-Ziv Compression of Genomes for Large-scale Storage and Retrieval, pages 201–206. 2010.
[48] H. Li, B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, R. Durbin, and the 1000 Genome Project Data Processing Subgroup. The sequence alignment/map format and SAMtools. Bioinformatics, 25(16):2078–2079, 2009.
[49] L. Mari, A. Maul, D.T. Irribarra, and M. Wilson. Quantities, quantification, and the necessary and sufficient conditions for measurement. Measurement, 100:115–121, 2017.
[50] Peter V. Marsden. Network data and measurement. Annual Review of Sociology, 16:435–463, 1990.
[51] B. McKay. The graph6 encoding. A description of graph6 and other encodings is available at http://users.cecs.anu.edu.au/~bdm/data/formats.txt, Accessed: 27/11/2020.
[52] Dan McQuillan. Data science as machinic neoplatonism. Philos. Technol., 31(2):253–272, June 2018.
[53] J. Michell. History and philosophy of measurement: A realist view. In , 2004.
[54] J. Michell. Representational theory of measurement, pages 19–39. Elsevier, Amsterdam, 2007.
[55] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. Network motifs: Simple building blocks of complex networks. Science, 298:824–827, 2002.
[56] F. Nietzsche. The Will to Power. Vintage Books, 1968.
[57] F. Papadopoulos, M. Kitsak, M.A. Serrano, M. Boguna, and D. Krioukov. Popularity versus similarity in growing networks. Nature, 489:537–540, 2012.
[58] Frank Papentin. On order and complexity. I. General considerations. Journal of Theoretical Biology, 87:421–456, 1980.
[59] Greg Paperin, David G. Green, and Suzanne Sadedin. Dual-phase evolution in complex adaptive systems. Journal of the Royal Society Interface, 8(58):609–629, 2011.
[60] E.R. Peay. Connectedness in a general model for valued networks. Social Networks, 2:385–410, 1980.
[61] E.C. Pielou. Ecological Diversity. Wiley, New York, 1969.
[62] N. Pinto and T.H. Keitt. Beyond the least-cost path: evaluating corridor redundancy using a graph-theoretic approach. Landscape Ecol., 24:253–266, 2009.
[63] V.M. Preciado, M. Zargham, C. Enyioha, A. Jadbabaie, and G.J. Pappas. Optimal resource allocation for network protection against spreading processes. IEEE Transactions on Control of Network Systems, 1:99–108, 2014.
[64] Przemyslaw Prusinkiewicz and Aristid Lindenmayer. The Algorithmic Beauty of Plants. Springer Science & Business Media, 2012.
[65] P. Santi and D.M. Blough. An evaluation of connectivity in mobile wireless ad hoc networks. In Proceedings International Conference on Dependable Systems and Networks, pages 89–98, 2002.
[66] E.J. Sanz-Arigita, M.M. Schoonheim, and J.S. Damoiseaux. Loss of 'small-world' networks in Alzheimer's disease: graph analysis of fMRI resting-state functional connectivity. PLoS One, 5:e13788, 2010.
[67] R. Schalkoff. Pattern Recognition: Statistical, Structural and Neural Approaches. John Wiley & Sons, New York, 1992.
[68] H. Schulz. Treevis.net: A tree visualization reference. IEEE Computer Graphics and Applications, 31(6):11–15, 2011.
[69] H-F. Shih and H-K. Mok. ETHOM: event-recording computer software for the study of animal behavior. Acta Zool. Taiwanica, 11:47–61, 2000.
[70] E.H. Simpson. Measurement of diversity. Nature, 163(4148):688, 1949.
[71] R.J. Solomonoff. A formal theory of inductive inference. Part I. Information and Control, 7:1–22, 1964.
[72] D. Strait, F.E. Grine, and J.G. Fleagle. Analyzing hominin phylogeny: cladistic approach, pages 1989–2014. Springer, 2015.
[73] Ri-Qi Su, Wen-Xu Wang, Xiao Wang, and Ying-Cheng Lai. Data-based reconstruction of complex geospatial networks, nodal positioning and detection of hidden nodes. Royal Society Open Science, 3(150577), 2016.
[74] E. Tal. Measurement in science. In Edward N. Zalta, editor, The Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, Stanford University, fall 2017 edition, 2017.
[75] J. Thomas, S. Dongmin, and S. Lee. Review on graph clustering and subgraph similarity based analysis of neurological disorders. International Journal of Molecular Sciences, 17:862–884, 2016.
[76] D.L. Urban, E.S. Minor, E.A. Treml, and R.S. Schick. Graph models of habitat mosaics. Ecology Letters, 12:260–273, 2009.
[77] H. von Helmholtz. Zahlen und Messen erkenntnistheoretisch betrachtet. Gesammelte Abhandl., 1895. Translated by C.L. Bryan (1930) as Counting and measuring. Van Nostrand, Princeton.
[78] Christopher S. Wallace. Statistical and Inductive Inference by Minimum Message Length. Springer Science & Business Media, 2005.
[79] H. Weyl. Philosophy of Mathematics and Natural Science. Princeton, 1949.
[80] E. Yeger-Lotem, S. Sattath, N. Kashtan, S. Itzkovitz, R. Milo, R. Pinter, U. Alon, and H. Margalit. Network motifs in integrated cellular networks of transcription–regulation and protein–protein interaction. Proceedings of the National Academy of Sciences, 101:5934–5939, 2004.
[81] Joseph Yose, Ralph Kenna, Máirín MacCarron, and Pádraig MacCarron. Network analysis of the Viking age in Ireland as portrayed in Cogadh Gaedhel re Gallaibh. Royal Society Open Science, 5(171024), 2018.
[82] J. Yu and S.M. LaValle. Planning optimal paths for multiple robots on graphs. In , pages 3612–3617, 2013.
[83] L. Zhukov. Visualizing Family Trees, 2014. (Available at: https://blogs.ancestry.com/ancestry/2014/1/17/visualizing-family-trees/, Accessed: 10/8/2018).
[84] J. Ziv and A. Lempel. A universal algorithm for sequential data compression.