Information and Computation
Carlos Gershenson
Departamento de Ciencias de la Computación
Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas
Universidad Nacional Autónoma de México
A.P. 20-726, 01000 México D.F., México
[email protected]
http://turing.iimas.unam.mx/~cgg

Centro de Ciencias de la Complejidad
Universidad Nacional Autónoma de México

October 3, 2018
Abstract
In this chapter, concepts related to information and computation are reviewed in the context of human computation. A brief introduction to information theory and different types of computation is given. Two examples of human computation systems, online social networks and Wikipedia, are used to illustrate how these can be described and compared in terms of information and computation.
1 Introduction

Before delving into the role of information theory as a descriptive tool for human computation (von Ahn, 2009), we have to agree on at least two things: what is human, and what is computation, as human computation is, at its most general level, computation performed by humans. It might be difficult to define what makes us human, but for practical purposes we can take an "I-know-it-when-I-see-it" stance. For computation, on the other hand, there are formal definitions, tools, and methods that have been useful in the development of digital computers and can also be useful in the study of human computation.
2 Information

Information has had a long and interesting history (Gleick, 2011). It was Claude Shannon (1948) who developed mathematically the basis of what we now know as information theory (Ash, 1990). Shannon was interested in particular in how a message could be transmitted reliably across a noisy channel. This is very relevant for telecommunications. Still, information theory has proven to be useful beyond engineering (von Baeyer, 2005), as anything can be described in terms of information (Gershenson, 2012).

A brief technical introduction to Shannon information H is given in Appendix A. The main idea behind this measure is that messages carry more information if they reduce uncertainty. Thus, if some data is very regular, i.e. already certain, more data will bring little or no new information, so H will be low. If data is irregular or close to random, then more data will be informative and H will be high, since this new data could not have been expected from previous data.

Shannon information assumes that the meaning or decoding is fixed, and this is generally so for information theory. The study of meaning has been made by semiotics (Peirce, 1991; Eco, 1979). The study of the evolution of language (Christiansen and Kirby, 2003) has also dealt with how meaning is acquired by natural or artificial systems (Steels, 1997).

Information theory can be useful for different aspects of human computation. It can be used to measure, among other properties: the information transmitted between people, novelty, dependence, and complexity (Prokopenko et al., 2009; Gershenson and Fernández, 2012). For a deeper treatment of information theory, the reader is referred to the textbook by Cover and Thomas (2006).

3 Computation

Taking a most general view, computation can be seen simply as the transformation of information (Gershenson, 2012). If anything can be described in terms of information, then anything humans do could be said to be human computation. However, this notion is too broad to be useful.

A formal definition of computation was proposed by Alan Turing (1936). He defined an abstract "machine" (a Turing machine) and defined "computable functions" as those which the machine could calculate in finite time. This notion is perhaps too narrow to be useful, as Turing machines are cumbersome to program and it is actually debated whether Turing machines can model all human behavior (Edmonds and Gershenson, 2012).

An intermediate and more practical notion of computation is the transformation of information by means of an algorithm or program.
This notion is, on the one hand, tractable, and, on the other hand, not limited to abstract machines.

In this view of computation, the algorithm or program (which can be run by a machine or animal) defines rules by which information will change. By studying at a general level what happens when the information introduced to a program (input) is changed, or how the computation (output) changes when the program is modified (for the same input), different types of dynamics of information can be identified (a code sketch contrasting them follows the list):
Static.
Information is not transformed. For example, a crystal has a pattern which does not change in observable time.
Periodic.
Information is transformed following a regular pattern. For example, planets have regular cycles in which measured information is repeated every period.

Chaotic.
Information is very sensitive to changes in itself or in the program; it is difficult to find patterns. For example, small changes in temperature or pressure can lead to very different meteorological futures, a fact which limits the precision of weather prediction.
Complex.
Also called critical, it is regular enough to preserve information but allows enough flexibility to make changes. It balances robustness and adaptability (Langton, 1990). Living systems would fall in this category.
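To connect these regimes with the measure introduced in Appendix A, the following minimal Python sketch estimates Shannon's H for static, periodic, and chaotic bit strings. All function names, the logistic-map discretization, and the sequence lengths are illustrative choices for this sketch, not constructions from the chapter.

```python
import math
from collections import Counter

def entropy(symbols, base=2):
    """Estimate Shannon's H from symbol frequencies (see Appendix A)."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * math.log(c / n, base) for c in counts.values())

def block_entropy_rate(seq, k):
    """H of length-k blocks divided by k; helps separate periodic
    sequences (low rate) from chaotic ones (high rate)."""
    blocks = [tuple(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    return entropy(blocks) / k

static = "0" * 64                       # crystal-like: nothing changes
periodic = "01" * 32                    # planetary-cycle-like repetition
x, chaotic = 0.1, []                    # logistic map at r = 4, a standard
for _ in range(64):                     # stand-in for chaotic dynamics
    x = 4 * x * (1 - x)
    chaotic.append("1" if x > 0.5 else "0")

for name, s in [("static", static), ("periodic", periodic),
                ("chaotic", "".join(chaotic))]:
    print(f"{name:8s} H={entropy(s):.2f} H_4/4={block_entropy_rate(s, 4):.2f}")
```

Single-symbol H cannot tell the periodic string from the chaotic one (both are near 1 bit), but the block-entropy rate of the periodic string collapses toward 1/4 (it has only two distinct blocks of length 4), while the chaotic string stays close to 1.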
Wolfram (2002) conjectured that there are only two types of computation: universal or regular. In other words, programs are either able to perform any possible computation (universal), or they are simple and limited (regular). This is still an open question, and the theory of computation is an active research area.

4 Computing Networks

Computing networks (CNs) are a formalism proposed to compare different types of computing structures (Gershenson, 2010). CNs will be used to compare neural computation (information transformed by neurons), machine distributed computation (information transformed by networked computers), and human computation.

In computing networks, nodes can process information (compute) and exchange information through their edges, each of which connects the output of one node with the input of another node. A computing network is defined as a set of nodes N linked by a set of edges K, used by an algorithm a to compute a function f (Gershenson, 2010). Nodes and edges can have internal variables that determine their state, and functions that determine how their state changes. CNs can be stochastic or deterministic, synchronous or asynchronous, discrete or continuous.

In a CN description of a neural network (NN) model, nodes represent neurons. Each neuron i has a continuous state (output) determined by a function y_i, which is composed of two other functions: the weighted sum S_i of its inputs x̄_i and an activation function A_i, usually a sigmoid. Directed edges ij represent synapses, relating outputs y_i of neurons i to inputs x_j of neurons j, as well as external inputs and outputs with the network. Edges have a continuous state w_ij (weight) that relates the states of neurons. The function f may be given by the states of a subset of N (outputs ȳ), or by the complete set N. NNs usually have two dynamical scales: a "fast" scale where the network function f is calculated by the functional composition of the function y_i of each neuron i, and a "slow" scale where a learning algorithm a adjusts the weights w_ij (states) of edges. There is a broad diversity of algorithms a used to update weights in different types of NN. Figure 1 illustrates NNs as CNs.

Figure 1: A NN represented as a CN: nodes compute y_i = A_i(S_i(x̄_i)), edges ij carry weights w_ij, f ⊆ ȳ, and a → Δw_ij, ∀ i, j.

Digital machines carrying out distributed computation (DC) can also be represented as CNs. Nodes represent computers, while edges represent network connections between them. Each computer i has information H_i which is modified by a program P_i(H_i). Physically, both H_i and P_i are stored in the computer memory, while the information transformation is carried out by a processor. Computers can share information H_ij across edges using a communication protocol. The function f of the DC will be determined by the output of P_i(H_i) of some or all of the nodes, which can be seen as a "fast" scale. Usually there is an algorithm a working at a "slower" scale, determining and modifying the interactions between computers, i.e. the network topology. Figure 2 shows a diagram of DC as a CN.

Human computation (HC) can be described as a CN in a very similar way to DC. People are represented as nodes and their interactions as edges. People within a HC system transform information H_i following a program P_i(H_i). In many cases, the information shared between people H_ij is transmitted using digital computers, e.g. in social networks, wikis, forums, etc. In other cases, e.g. crowd dynamics, information H_ij is shared through the environment: acoustically, visually (Moussaïd et al., 2011), stigmergically (Doyle and Marsh, 2013), etc. The function f of a HC system can be difficult to define, since in many cases the outcome is observed and described only a posteriori. Still, we can say that f is a combination of the computation carried out by people. An algorithm a would determine how the social links change in time. Depending on the system, a can be slower than f or vice versa.

Figure 2: A DC system or a HC system represented as a CN: nodes hold information H_i transformed by programs P_i, edges carry shared information H_ij, f ⊆ {P_i(H_i), ∀ i}, and a → H_ij, ∀ i, j.
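To make the CN formalism concrete, here is a minimal Python sketch of a deterministic, synchronous CN. The ComputingNetwork class, its field names, and the two-node example are illustrative assumptions layered on the definition above; they are not part of Gershenson's (2010) formalism.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class ComputingNetwork:
    """A toy CN C(N, K, a, f): nodes N with internal states, edges K with
    their own states (weights), and a fast node update; the slow algorithm
    a that would rewire or reweight the edges is not shown."""
    states: Dict[str, float]
    programs: Dict[str, Callable[[List[float]], float]]
    edges: Dict[Tuple[str, str], float]

    def step(self) -> None:
        """Fast scale: every node transforms the information arriving on
        its incoming edges (synchronous, deterministic update)."""
        inputs: Dict[str, List[float]] = {n: [] for n in self.states}
        for (src, dst), w in self.edges.items():
            inputs[dst].append(w * self.states[src])
        self.states = {n: p(inputs[n]) for n, p in self.programs.items()}

# Two-node example: "j" computes the weighted sum S_j of its inputs,
# as in the NN description above (activation function omitted).
cn = ComputingNetwork(
    states={"i": 1.0, "j": 0.0},
    programs={"i": lambda xs: 1.0,       # constant information source
              "j": lambda xs: sum(xs)},  # weighted sum S_j
    edges={("i", "j"): 0.5},
)
cn.step()
print(cn.states)  # {'i': 1.0, 'j': 0.5}
```

Swapping the node programs for P_i(H_i) over strings, and adding a rule that rewrites the edges dictionary, would turn the same skeleton into the DC and HC pictures of Figure 2.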
In DC, the algorithm a is centrally determined by a designer, while in most HC systems, a is determined and executed by the people (nodes) themselves.

Using information theory, we can measure how much information H_ij is transmitted between people, how much each person receives and produces, and how much the entire system receives and produces. In many cases, machines enable this transmission and thus also facilitate its measurement. Comparing the history of information transfers with current information flows can be used to measure the novelty in current information.

5 Online Social Networks

A straightforward example of human computation can be given with online social networks. Different platforms have key differences, e.g. links are bidirectional in Facebook (my friends also have me as their friend) and unidirectional in Twitter (the people I follow do not necessarily follow me; I do not necessarily follow my followers). People and organizations are represented by their accounts in the system as nodes, and they receive information through their incoming links. They can share this information with their outgoing links and also produce novel information that their links may receive. People can decide how to create or eliminate social links, i.e. a is decided by individuals.

These simple rules of the information dynamics on social networks are able to produce very interesting features of human computation (Lerman and Ghosh, 2010), which can be described as functions f. For example, non-official news can spread very quickly through social networks, challenging mass media dominated by some governments. On the other hand, false rumors can also spread very quickly, potentially leading to collective misbelief. Nevertheless, it has been found that the dynamics of false rumor spreading is different from that of verifiable information (Castillo et al., 2011).

Describing social networks as CNs is useful because interactions are stated explicitly. Moreover, one can relate different scales with the same model: the local scale (nodes), the global scale (networks), and meso scales (modules); and also temporal scales: fast (f) and slow (a). Information theory can be used to detect novelty in social interactions (high H values in edges), imitation (low H values in edges), unusual patterns ("fake" information), correlations (with mutual information), and communities (modules (Newman, 2010)).
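As one concrete possibility, the Python sketch below estimates mutual information between the topic streams of two accounts to flag correlated (imitative) activity. The account names, the topic encoding, and the data are invented for illustration; a real analysis would estimate the probabilities from logged interactions.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """I(X;Y) = sum over (x,y) of p(x,y) log2[p(x,y) / (p(x)p(y))],
    estimated from paired samples; high values suggest correlation."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

# Toy daily topic choices of two accounts over ten days.
alice = ["news", "cats", "news", "music", "cats",
         "news", "music", "cats", "news", "music"]
bob_copycat = alice[:]        # pure imitation: I(X;Y) equals H(X)
bob_indep = ["cats"] * 10     # constant stream: shares no information

print(round(mutual_information(alice, bob_copycat), 2))  # ~1.57
print(round(mutual_information(alice, bob_indep), 2))    # 0.0
```

The same estimator, applied edge by edge, would separate links that mostly copy information (high I, low novelty) from links that contribute new content.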
6 Wikipedia

Wikipedia gives a clear example of the power of human computation. Millions of people (nodes) from all over the world have collaboratively built the most extensive encyclopedia ever. The sharing of information is made through editable webpages on a specific topic. Since these pages can potentially link more than two people (those editing the webpage), the links can be represented as those of a hypernetwork (Johnson, 2009), where edges can link more than two nodes (unlike in usual networks, where edges link exactly two). The information in pages (hyperedges) can be measured as it changes over time with the edits made by the people linked to them. The information content delivered by different authors can be measured with H; when H increases, it implies novelty. The complexity of the webpages, edits, and user interactions can also be measured, seen as a balance between maximum information (noise) and minimum information (stasis) (Fernández et al., 2013).

The function f of Wikipedia is its own creation, growth, and refinement: the pages themselves are the output of the system. Again, people decide which pages to edit, so the algorithm a is also decided by individuals.

Traditionally, Wikipedia, like any set of webpages, is described as a network of pages with directional edges from pages that link to other pages. This is a useful description to study the structure of Wikipedia itself, but it might not be the most appropriate in the context of human computation, as no humans are represented. Describing Wikipedia as a CN, the relationships between humans and the information they produce collaboratively are explicit, providing a better understanding of this collective phenomenon.
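The hyperedge structure itself is straightforward to encode. The following Python sketch, with invented page titles and editor names, shows one minimal representation and the kind of HC-level query (who computes with whom) that it makes explicit; a fuller model would also attach each page's text H and edit history to its hyperedge.

```python
from collections import defaultdict

# Hyperedges: each page links ALL of its editors at once, not just pairs.
# A plain mapping from page to its set of editors suffices for a sketch.
page_editors = {
    "Entropy":        {"ana", "ben", "chen"},
    "Complex_system": {"ben", "dia"},
}

# Invert the mapping to ask through which hyperedges a given person
# exchanges information, and with whom.
editor_pages = defaultdict(set)
for page, editors in page_editors.items():
    for e in editors:
        editor_pages[e].add(page)

def coeditors(person):
    """Everyone sharing at least one hyperedge (page) with `person`."""
    return {e for p in editor_pages[person]
              for e in page_editors[p]} - {person}

print(sorted(coeditors("ben")))  # ['ana', 'chen', 'dia']
```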
7 Conclusions

Concepts related to information and computation can be applied to any system, as anything can be described in terms of information (Gershenson, 2012). Thus, HC can also benefit from the formalisms and descriptions related to information and computation.

CNs are general, so they can be used to describe and compare any HC system. For example, it is straightforward to represent online social networks such as Facebook, Twitter, LinkedIn, Google+, Instagram, etc. as CNs. As such, their structure, functions, and algorithms can be contrasted, and their local and global information dynamics can be measured. The properties of each of these online social networks could be compared with those of other HC systems, such as Wikipedia.

Moreover, CNs and information theory can be used to design and self-monitor HC systems (Gershenson, 2007). For example, information overload should be avoided in HC systems. The formalisms presented in this chapter and in the cited material can be used to measure information inputs, transfers, and outputs to avoid not only information overload, but also information poverty (Bateson, 1972).

In our age where data is overflowing, we require appropriate measures and tools to be able to make sense out of "big data". Information and computation provide some of these measures and tools. There are still several challenges and opportunities ahead, but what has been achieved so far is very promising and invites us to continue exploring appropriate descriptions of HC systems.

Acknowledgments
I should like to thank Matthew Blumberg and Pietro Michelucci for useful advice. This work was partially supported by SNI membership 47907 of CONACyT, Mexico.
A Shannon Information
Given a string X, composed of a sequence of values x which follow a probability distribution P(x), information (according to Shannon) is defined as:

H = -\sum_x P(x) \log P(x).    (1)

For binary strings, the most commonly used in ICT systems, the logarithm is usually taken with base two. For example, if the probability of receiving ones is maximal (P(1) = 1) and the probability of receiving zeros is minimal (P(0) = 0), the information is minimal, i.e. H = 0, since we know beforehand that the future value of x will be 1. Information is zero because future values of x do not add anything new, i.e. the values are known beforehand. If we have no knowledge about the future value of x, as with a fair coin toss, then P(0) = P(1) = 0.5. In this case, information will be maximal, i.e. H = 1, because a future observation will give us all the relevant information, which is also independent of previous values. Equation (1) is plotted in Figure 3.

Shannon information can also be seen as a measure of uncertainty. If there is absolute certainty about the future of x, be it zero (P(0) = 1) or one (P(1) = 1), then the information received will be zero. If there is no certainty due to the probability distribution (P(0) = P(1) = 0.5), then the information received will be maximal. Information is also called entropy H, because equation (1) is equivalent to Boltzmann's entropy in thermodynamics, which is also defined as H. The unit of information is the bit. One bit represents the information gained when a binary random variable becomes known.
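For completeness, here is a small Python sketch of equation (1) for the binary case, tracing the curve of Figure 3; the function name and the sampled probabilities are arbitrary choices for this sketch.

```python
from math import log2

def binary_entropy(p1):
    """H(X) for a binary string with P(1) = p1 and P(0) = 1 - p1 (eq. 1)."""
    h = 0.0
    for p in (p1, 1.0 - p1):
        if p > 0:                 # the limit of p*log2(p) as p -> 0 is 0
            h -= p * log2(p)
    return h

for p in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(p, round(binary_entropy(p), 3))
# 0.0 -> 0.0, 0.1 -> 0.469, 0.5 -> 1.0 (maximal), 0.9 -> 0.469, 1.0 -> 0.0
```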
A more detailed explanation of information theory, as well as measures of complexity, emergence, self-organization, homeostasis, and autopoiesis based on information theory, can be found in Fernández et al. (2013).

Figure 3: Shannon's information H(X) of a binary string X for different probabilities P(x). Note that P(0) = 1 − P(1).

References
Ash, R. B. (1990). Information Theory. Dover Publications, Inc.

Bateson, G. (1972). Steps to an Ecology of Mind. Ballantine, New York.

Castillo, C., Mendoza, M., and Poblete, B. (2011). Information credibility on Twitter. In Proceedings of the 20th International Conference on World Wide Web, WWW '11. ACM, New York, NY, USA, pp. 675–684. URL http://doi.acm.org/10.1145/1963405.1963500.

Christiansen, M. H. and Kirby, S. (2003). Language Evolution. Vol. 3. Oxford University Press.

Cover, T. M. and Thomas, J. A. (2006). Elements of Information Theory. Wiley-Interscience.

Doyle, M. J. and Marsh, L. (2013). Stigmergy 3.0: From ants to economies. Cognitive Systems Research: 1–6. URL http://dx.doi.org/10.1016/j.cogsys.2012.06.001.

Eco, U. (1979). A Theory of Semiotics. Indiana University Press.

Edmonds, B. and Gershenson, C. (2012). Learning, social intelligence and the Turing test: why an "out-of-the-box" Turing machine will not pass the Turing test. In How the World Computes: Turing Centenary Conference and 8th Conference on Computability in Europe, CiE 2012, Cambridge, UK, June 18–23, 2012, Proceedings, S. B. Cooper, A. Dawar, and B. Löwe (Eds.). Lecture Notes in Computer Science, vol. 7318. Springer-Verlag, Berlin Heidelberg, 182–192. URL http://arxiv.org/abs/1203.3376.

Fernández, N., Maldonado, C., and Gershenson, C. (2013). Information measures of complexity, emergence, self-organization, homeostasis, and autopoiesis. In Guided Self-Organization: Inception, M. Prokopenko (Ed.). Springer. In press. URL http://arxiv.org/abs/1304.1842.

Gershenson, C. (2007). Design and Control of Self-organizing Systems. CopIt Arxives, Mexico. URL http://tinyurl.com/DCSOS2007.

Gershenson, C. (2010). Computing networks: A general framework to contrast neural and swarm cognitions. Paladyn, Journal of Behavioral Robotics 1(2): 147–153. URL http://dx.doi.org/10.2478/s13230-010-0015-z.

Gershenson, C. (2012). The world as evolving information. In Unifying Themes in Complex Systems, A. Minai, D. Braha, and Y. Bar-Yam (Eds.). Vol. VII. Springer, Berlin Heidelberg, 100–115. URL http://arxiv.org/abs/0704.0304.

Gershenson, C. and Fernández, N. (2012). Complexity and information: Measuring emergence, self-organization, and homeostasis at multiple scales. Complexity 18(2): 29–44. URL http://dx.doi.org/10.1002/cplx.21424.

Gleick, J. (2011). The Information: A History, a Theory, a Flood. Pantheon, New York.

Johnson, J. (2009). Hypernetworks in the Science of Complex Systems. Imperial College Press.

Langton, C. (1990). Computation at the edge of chaos: Phase transitions and emergent computation. Physica D 42: 12–37.

Lerman, K. and Ghosh, R. (2010). Information contagion: An empirical study of the spread of news on Digg and Twitter social networks. In Proceedings of the 4th International Conference on Weblogs and Social Media (ICWSM).

Moussaïd, M., Helbing, D., and Theraulaz, G. (2011). How simple rules determine pedestrian behavior and crowd disasters. PNAS 108(17): 6884–6888. URL http://dx.doi.org/10.1073/pnas.1016507108.

Newman, M. (2010). Networks: An Introduction. Oxford University Press, Oxford, UK.

Peirce, C. S. (1991). Peirce on Signs: Writings on Semiotic by Charles Sanders Peirce. University of North Carolina Press.

Prokopenko, M., Boschetti, F., and Ryan, A. J. (2009). An information-theoretic primer on complexity, self-organisation and emergence. Complexity 15(1): 11–28. URL http://dx.doi.org/10.1002/cplx.20249.

Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal 27: 379–423 and 623–656. URL http://tinyurl.com/6qrcc.

Steels, L. (1997). The synthetic modeling of language origins. Evolution of Communication 1(1): 1–34.

Turing, A. M. (1936). On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, Series 2, 42: 230–265.

von Ahn, L. (2009). Human computation. In Design Automation Conference, 2009. DAC '09. 46th ACM/IEEE, pp. 418–419.

von Baeyer, H. C. (2005). Information: The New Language of Science. Harvard University Press, Cambridge, MA.

Wolfram, S. (2002). A New Kind of Science. Wolfram Media.