No Substitute for Functionalism -- A Reply to 'Falsification & Consciousness'
CC-Wars: The Unfolding Argument Strikes Back - A Reply to ‘Falsification & Consciousness’
A Preprint
Natesh Ganesh
Department of Physics, University of Colorado, Boulder, Boulder, Colorado 80305 [email protected]
June 25, 2020

Abstract

The unfolding argument was presented by Doerig et al. in [1] to show that causal structure theories (CSTs) like IIT are either falsified or outside the realm of science. In [2] and then [3], the authors mathematically formalized the process of generating observable data from experiments and using that data to generate inferences and predictions onto an experience space. The resulting substitution argument, built on this formal framework, was used to show that all existing theories of consciousness are pre-falsified if the inference reports are valid. If this argument is indeed correct, it would have a profound effect on the field of consciousness as a whole, indicating extremely fundamental problems that would require radical changes to how consciousness science is performed. However, in this note the author identifies the shortcomings in the formulation of the substitution argument and explains why its claims about functionalist theories are wrong.

Keywords: Consciousness · Falsification · Unfolding Argument · Substitution Argument · IIT · Causal Structure · Krohn-Rhodes Theorem
The ‘unfolding argument’ presented in [1] made the case that IIT and other causal structure theories (CSTs) are either already falsified or outside the realm of science. This argument was first extended in [2], and a more generalized version was presented as the ‘substitution argument’ in [3]. The author here will assume that readers are familiar with all three papers - [1], [2] and [3]. The focus will be on the last one, which we find to be the most general version of the argument and the broadest in its claims. It is very interesting and accessible work, and it proposes a descriptive mathematical framework that could be very useful moving forward. For the sake of brevity, we will borrow the symbols and terminology from [3] as much as possible to point out the errors in that model and make suitable corrections. With these corrections incorporated, we should be able to see that the substitution argument does not apply to functionalist theories (or at best has not been proven to do so) in [3].

In this short note, we will start by introducing some relevant concepts and definitions from [3] in section 2. The main contribution of this note is section 3, where we present arguments as to why the results of the substitution argument do not apply to functionalist theories of consciousness, by pointing out what the formalism missed. The note will conclude in section 4 by summarizing the ideas presented here.
We start by summarizing in detail the different parts of the figure from section 2.5 in [3], since the error stems from these very definitions.

Figure 1: This picture illustrates substitutions. Assume that some data set $o$ with inference content $o_r$ is given. A substitution is a transformation $T$ of physical systems which leaves the inference content $o_r$ invariant but which changes the result of the prediction process. Thus whereas $p$ and $T(p)$ have the same inference content $o_r$, the prediction content of experimental data sets is different. Different in fact to such an extent that the predictions of consciousness based on these datasets are incompatible (illustrated by the non-overlapping circles on the right). Here we have used that by definition of $P_{o_r}$, every $p \in P_{o_r}$ yields at least one data set $o'$ with the same inference content as $o$ and have identified $o$ and $o'$ in the drawing [3].

1. $P$ denotes a class of physical systems, each in various different configurations. In most cases, every $p \in P$ thus describes a physical system in a particular state, dynamical trajectory, or configuration.
2. $obs$ is a correspondence which contains all details on how the measurements are set up and what is measured. It describes how measurement results (data sets) are determined by a system configuration under investigation. This correspondence is given, though usually not explicitly known, once a choice of measurement scheme has been made. $O$ is the class of all possible data sets that can result from observations or measurements of the systems in the class $P$. Any single experimental trial results in a single data set $o \in O$, whose data is used for making predictions based on the theory of consciousness and for inference purposes.
3. $pred$ describes the process of making predictions by applying some theory of consciousness to a data set $o$. It is therefore a mapping from $O$ to $E$.
4. $E$ denotes the space of possible experiences specified by the theory under consideration. The result of the prediction is a subset of this space, denoted as $pred(o)$. Elements of this subset are denoted by $e_i$ and describe predicted experiences.
5. $inf$ describes the process of inferring a state of experience from some observed data, e.g. verbal reports, button presses or using no-report paradigms. Inferred experiences are denoted by $e_r$.

Under these definitions for the different components, falsification is defined (Definition 2.1 in [3]) as follows: there is a falsification at $o \in O$ if $inf(o) \notin pred(o)$. A substitution is defined (Definition 3.1) as follows: there is an $o_r$-substitution if there is a transformation $S: P_{o_r} \to P_{o_r}$ such that for at least one $p \in P_{o_r}$, $pred \circ obs(p) \cap pred \circ obs(S(p)) = \emptyset$. The crux of the paper, i.e. the substitution argument (represented in Fig. 1 here), is built using a number of propositions and lemmas, and is given in Theorem 3.10 of [3]:

Theorem 3.10 - If inference and prediction data are independent, either every single inference operation is wrong or the theory under consideration is already falsified.
Inference and prediction data are defined as independent (by Definition 3.8) if for any $o_i$, $o_i'$ and $o_r$, there is a variation $v: P \to P$ such that $o_i \in obs(p)$, $o_i' \in obs(v(p))$, but $o_r \in obs(p)$ and $o_r \in obs(v(p))$ for some $p \in P$.

The authors in [3] then suggest that the unfolding argument against IIT is a special case of the substitution argument, and that the latter applies to all known frameworks for the science of consciousness, including functionalist frameworks. It is a very big claim that all functionalist frameworks either have every single inference wrong or have already been falsified. Since the author's contention in this note is whether the substitution argument has been properly applied to functionalist frameworks, we will quote from section 3.4.4 of [3], where this is discussed:

“Artificial neural networks. ANNs, particularly those trained using deep learning, have grown increasingly powerful and capable of human-like performance (LeCun et al., 2015; Bojarski et al., 2016). For any ANN, report (output) is a function of node states. Crucially, this function is non-injective, i.e. some nodes are not part of the output. E.g., in deep learning, the report is typically taken to consist of the last layer of the ANN, while the hidden layers are not taken to be part of the output. Correspondingly, for any given inference data, one can construct a ANN with arbitrary prediction data by adding nodes, changing connections and changing those nodes which are not part of the output. Put differently, one can always substitute a given ANN with another with different internal observables but identical or near-identical reports. From a mathematical perspective it is well-known that both single-layer ANNs and recurrent ANNs can approximate any given function (Hornik et al., 1989; Schafer and Zimmermann, 2006). Since reports are just some function, there are viable universal substitutions that provably exist.

“A special case thereof is the unfolding transformation considered in Doerig et al. (2019) in the context of IIT. The arguments in this paper constitute a proof of the fact that for ANNs, inference and prediction data are independent (Definition 3.8). Crucially, our main theorem shows that this has implications for all minimally informative theories of consciousness. A similar results (using a different characterization of theories of consciousness than minimally informative) has been shown in Kleiner (2019).”
While we have introduced the most relevant parts here, we strongly encourage readers to read the full paper in [3]. We will now explore the shortcomings of the substitution argument in the next section and explain why it has not been properly applied to functionalist frameworks.
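To make the definitions recalled in this section concrete, the following is a minimal Python sketch of falsification (Definition 2.1) and of an $o_r$-substitution (Definition 3.1). It is a toy under stated assumptions and not the formalism of [3]: the two configurations, the data labels, the toy theory inside `pred` and the variation `T` are all hypothetical, and `obs` is simplified from a correspondence to a function returning a single data set per configuration.

```python
# Toy model of Definitions 2.1 and 3.1. All names and values are hypothetical.
# Simplification: obs returns one data set per configuration, split into
# prediction data o_i (internal observables) and inference data o_r (report).
OBS = {
    "p": ("recurrent", "says_happy"),
    "T(p)": ("feedforward", "says_happy"),
}


def obs(p):
    """Return the data set (o_i, o_r) measured on configuration p."""
    return OBS[p]


def T(p):
    """Hypothetical variation: swaps the internals, keeps the report."""
    return {"p": "T(p)", "T(p)": "p"}[p]


def pred(o):
    """Toy theory: the predicted experiences depend only on o_i."""
    o_i, _ = o
    return {"recurrent": {"happy"}, "feedforward": {"no experience"}}[o_i]


def inf(o):
    """Inferred experience: depends only on the report o_r."""
    _, o_r = o
    return {"says_happy": "happy"}[o_r]


def falsified_at(o):
    """Definition 2.1: falsification at o if inf(o) is not in pred(o)."""
    return inf(o) not in pred(o)


def is_substitution(p):
    """Definition 3.1 (toy form): same report, disjoint predictions."""
    o, o_sub = obs(p), obs(T(p))
    same_report = o[1] == o_sub[1]
    return same_report and pred(o).isdisjoint(pred(o_sub))


print(is_substitution("p"))
print(falsified_at(obs("p")), falsified_at(obs(T("p"))))
```

In this toy, the two configurations share one report but receive disjoint predictions, so the single inferred experience can agree with at most one of them. This is the mechanism behind Theorem 3.10.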
The work in [3] is a very interesting base to build from for further formalizing the science of consciousness. However, the broad mathematical picture it paints presents a low-resolution view of experimental work and modeling methodologies that coarse-grains out some very important details. In their paper, Hoel and Kleiner referred to neural networks during their discussion of functionalist frameworks and the equivalence of feedforward and recurrent networks that can produce the same input-output behavior (as utilized in [1]). We will discuss what is missed in their framework using a simple example from a machine state functionalism picture, which was used in [2]. (A much broader argument equivalent to the unfolding argument was presented in [5] using the Krohn-Rhodes decomposition [6], [7] of finite state automata.) Since neural networks can be decomposed into finite state automata, the work presented here can be extended to cover the cases discussed in [3].

From [4], we have machine state functionalism to be - any creature with a mind can be regarded as a Turing machine (an idealized finite state digital computer), whose operation can be fully specified by a set of instructions (a “machine table” or program) each having the form: If the machine is in state $S_i$, and receives input $I_j$, it will go into state $S_k$ and produce output $O_l$ (for a finite number of states, inputs and outputs).

A machine table of this sort describes the operation of a deterministic automaton, but most machine state functionalists (e.g. Putnam 1967) take the proper model for the mind to be that of a probabilistic automaton: one in which the program specifies, for each state and set of inputs, the probability with which the machine will enter some subsequent state and produce some particular output. On either model, however, the mental states of a creature are to be identified with such “machine table states” $(S_1, S_2, \ldots, S_n)$. These states are not mere behavioral dispositions, since they are specified in terms of their relations not only to inputs and outputs, but also to the state of the machine at the time.

Let us now set up the machine state functionalism picture under the framework presented in [3]. Consider a physical system $p \in P$ that produces the finite state automaton shown, together with its corresponding transition table, in Fig. 2(a). The transition table, which captures the relationship between the current state, inputs and next state, specifies the functional structure. The $pred$ and $inf$ functions (correspondences) operate on these automaton states, inputs and outputs, mapping them to the space $E$, and generate both $o_i$ and $o_r$ as shown in Fig. 2(b).

In our case here, the variation map $T$ is simply a Krohn-Rhodes (KR) decomposition, which produces a homomorphic cascaded automaton (that might have a different state space and transition map - we will deal with the isomorphic case later) with the same input-output characteristics, as shown in Fig. 3. In this figure we have a variation $T(p)$ of the physical system $p$ which produces a different transition table (at the bottom) corresponding to this KR decomposition while maintaining the same input-output characteristics. Since $inf$ obtained from a verbal report (for example) is a function of the output, the KR-decomposed state machine will produce the same $inf$. However, since this transition table is different, application of the $pred$ function on the states and their transitions (i.e. the functional structure) should produce a different prediction of the experience, hence indicating a successful substitution, i.e. falsification of machine state functionalism if the inference report is to be believed.

To identify what the argument missed, let's take another step closer and focus on the $pred$ function.
In the first instance, when dealing with $p$, we have the $pred$ function, which maps the internal states and inputs (the transitions to the next state are a function of these two) to an experience in $E$. For example, this might look something like $pred(\text{current-state} = 00,\ \text{input} = 1,\ \text{next-state} = 10) = \text{'happy'}$, together with an inference report of 'happy'. Now, with the inference report being fixed, there would be a substitution $T(p)$ that generates a different transition table as per the KR-decomposition, for which we have $pred(\text{current-state} = 000,\ \text{input} = 1,\ \text{next-state} = 001) = \text{'sad'}$. Clearly the $pred$ function mapped to different experiences for the same inference report. But we should notice that the argument space for the $pred$ function has now changed with respect to the machine states, from $\{00, 01, 10, 11\}$ to $\{000, 001, \ldots, 110, 111\}$. In order to be consistent, we need a different prediction function $pred'$ which is properly defined on the new state space and maps it to $E$. Thus we rewrite $pred(\text{current-state} = 000,\ \text{input} = 1,\ \text{next-state} = 001) = \text{'sad'}$ as $pred'(\text{current-state} = 000,\ \text{input} = 1,\ \text{next-state} = 001) = \text{'sad'}$.

Figure 2: (a) 2-bit/4-state finite state automaton and its corresponding transition table indicating the next state for every current state and input. (b) A functionalist framework generating a transition table from a physical system $p$ using the $obs$ correspondence. The $pred$ and $inf$ functions act on this state machine to produce model predictions and inference reports on the experience space $E$.

Clearly we can see that for the substitution argument to be successful, we need $pred[obs(p)] \cap pred[obs(T(p))] = \emptyset$. But when properly accounting for the change in the prediction function, what we have here is that $pred[obs(p)] \cap pred'[obs(T(p))] = \emptyset$, which is not falsification by the very definition. One could argue that a theoretical framework could have a set of prediction functions $\{pred\}$, with $pred, pred' \in \{pred\}$.
Then a slight modification of the original definition of $o_r$-substitution, allowing for multiple prediction functions, would allow one to infer that a framework has been falsified if $pred[obs(p)] \cap pred'[obs(T(p))] = \emptyset$. This captures the main drawback of the entire mathematical formalization - it operates at a high level with low resolution, which coarse-grains out some of these crucial nuances.

Let us now explore why the substitution argument still does not work even with a modified definition of substitution. Under this new definition, we can have a substitution if $pred[obs(p)] \cap pred'[obs(T(p))] = \emptyset$, where both $pred, pred' \in \{pred\}_{o_r}$. Assume you are a scientist working in the science of consciousness. You generate the datasets from system $p$ and use a state machine functionalist picture to generate the transition table of the 2-bit state machine $A$ (with 4 states). You then proceed to use a prediction function $pred$ that maps these states and inputs to the space of experience $E$. Now you are working with the transformed system $T(p)$, which produces data from which you extract the transition table for the 3-bit machine $A'$. We assume that $A$ and $A'$ are related through a KR-decomposition and maintain the same input-output relationships and inference reports. You can use your original $pred$ function only if the new 3-bit state space is suitably coarse-grained to the original 2-bit space (suitably here referring to consistency being maintained with respect to the KR-transformation). Of course, by the very nature of the KR decomposition, if the state-space change is properly implemented, we would obtain $pred$ results that are identical to the results obtained by prediction from $A$.

Figure 3: A detailed look into the substitution argument using variation $T$ on the physical system $p$, in the case of a machine state functionalist framework. The inference reports $inf$ in both cases are the same while the predictions $pred$ vary on the experience space $E$.

The other option is to identify a new prediction function $pred'$ which can operate on the 3-bit state space of $A'$. This function cannot be picked independently of our knowledge of both $A$ and $pred$. In fact, if we know that $A$ and $A'$ are related through a KR-transformation, then the choice of $pred'$ specifically depends on $pred$ and should be chosen such that $pred[A] = pred'[A']$ in order to maintain consistency. However, this constraint on the choice of $pred'$ immediately entails that the substitution argument fails with respect to functionalist theories. We would specifically need an erroneous methodology with respect to the state space change to make the substitution argument work here.

We can clarify this further with a very simple analogy of Cartesian and polar coordinates. (Note that this is not an example, but a simple analogy to pump our intuitions about the larger idea and, like all analogies, it does not map perfectly to the original concept.) Let the angle $\theta$ be the equivalent of the result of the $pred$ function on the experience space, and say the inference report is the circle centered at the origin with radius $\sqrt{x^2 + y^2}$. The angle $\theta$ in a 2D case using the Cartesian coordinates $(x, y)$ is given by the function $F$ as $\theta = F(x, y) = \tan^{-1}(y/x)$.
Now there is a mathematical transformation $M$ that maps the Cartesian coordinates $(x, y)$ to polar coordinates $(r, \theta)$ - $M: (x, y) \to (r, \theta)$, where $r = \sqrt{x^2 + y^2}$ and $\theta = \tan^{-1}(y/x)$. In this analogy, $M$ plays the role of the variation map $T$ in [3]. Note that the circle generated by the polar coordinates (i.e. the inference report) remains the same under the transformation $M$. We can see that applying $F$ to the new polar coordinates to generate $\theta$ will give a different value, i.e. $F(x, y) \neq F(r, \theta)$, and is in fact nonsensical, since the function $F$ is only defined over $(x, y)$ (even though we have a pair of numbers in both cases). But this is what the authors in [3] are expecting us to do when they apply the exact same $pred$ function to the transition table generated by $T(p)$ in order to generate the substitution argument. In the polar coordinate system, let us say the angle is given by a different function $G(r, \theta)$ (equivalent to $pred'$). Consider the functions $F$, $G$ and $M$. Given the mapping from $(x, y)$ to $(r, \theta)$ through $M$, $F$ and $G$ are not independent of each other. With $F$ and $M$ fixed, an arbitrary choice of $G$ is not possible if the objective of $G$ is to calculate the angle $\theta$. To be consistent, we need to pick $G(r, \theta) = \theta$, i.e. the choice of $G$ is validated by ensuring that the angles calculated in both systems are equal to each other - $F(x, y) = \theta = G(r, \theta)$. For the substitution-like argument to work in this case, you would need all the functions $F$, $G$ and $M$ to be appropriately picked to ensure consistency, and still have $F(x, y) \neq G(r, \theta)$, which is not possible by the very definition.

Thus once the finer details in the differences between different prediction functions are accounted for, we can see that for (machine state) functionalist frameworks, the substitution argument would only work under the following bad methodological scenarios -

• Trying to apply the $pred$ function on a domain on which it is not defined, i.e. trying to apply $F$, defined on the $(x, y)$ domain, to $(r, \theta)$ and expecting to get accurate answers.
• Assuming that $pred$ and $pred'$ are not constrained by each other and by the variation map $T$. In the analogy, $F$ and $M$ constrain $G$ such that it has to give the same angle $\theta$. Picking $pred'$ arbitrarily is bound to give inaccurate predictions.

A natural counter to the author's objections would be to point to the case of isomorphic KR-decompositions, in which the number of system states and the global network topology remain consistent and the only difference corresponds to a permutation of the state labels [5]. Our original argument rested on the difference between the sizes of the state spaces of the different state machines. But what if the KR-decomposition of a state machine produces another with the same number of states? This objection once again arises from the coarse-grained view created by the original framework. To understand this better, let us go back to the case of a scientist performing these experiments. The scientist uses the observation function $obs$ on physical system $p$ to generate different datasets, upon which the transition table is constructed and the $pred$ function is defined.
Now consider the case where the scientist is operating on an isomorphic KR-transformation $T(p)$ such that the global topology and number of states remain consistent, while the state labels are permuted. In the most straightforward scenario, application of $obs$ on $T(p)$ produces a state machine identical to the earlier case, and the application of $pred$ produces the same result, i.e. there is no substitution, and hence no falsification, since $pred[obs(p)] = pred[obs(T(p))]$. We move on to the scenario where application of $obs$ on $T(p)$ produces a state transition table with the same number of states and transition topology but with the labels permuted. It seems like the substitution argument wins out here, since the $pred$ function can be applied here as well, with no change in the state space dimensions, while obtaining a different prediction result. However, this is once again not the case, for the following reasons. As the scientist applies $obs$ to generate the isomorphic machine with permuted labels, it is reasonable to assume that the labeling scheme is contained within $obs$, in the manner in which the experiment is set up. Now when the labeling scheme is changed (as required to permute the state symbols) during the process of generating datasets, we can keep track of this change with respect to the original labels and thus know the label-to-label map between the two state machines. While the $pred$ function can technically be applied to the new state machine, it would be poor practice to do so directly, knowing the change in the labeling scheme, without first un-permuting the state symbols. We need to first un-permute the labels, and then we can apply the $pred$ function to the state machine with the original labels. This would produce the same prediction results as the earlier case. We can think of this as once again replacing $pred$ with a new function $pred' = pred \circ unpermute$.
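The composition $pred' = pred \circ unpermute$ can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 2-bit transition table, the bit-flip relabeling and the experience labels ('happy', 'sad', 'neutral') are hypothetical choices made for the sketch and are not taken from Fig. 2 or from [3].

```python
# Hypothetical 2-bit automaton: (current state, input) -> next state.
TABLE = {
    ("00", 0): "00", ("00", 1): "10",
    ("01", 0): "00", ("01", 1): "11",
    ("10", 0): "01", ("10", 1): "11",
    ("11", 0): "10", ("11", 1): "01",
}


def flip(label):
    """Swap the '0'/'1' labeling convention; this relabeling is its own inverse."""
    return "".join("1" if bit == "0" else "0" for bit in label)


# Isomorphic machine observed under the flipped labeling scheme.
RELABELED = {(flip(s), i): flip(n) for (s, i), n in TABLE.items()}

# Hypothetical prediction function, defined on the ORIGINAL labels.
EXPERIENCE = {("00", 1, "10"): "happy", ("11", 1, "01"): "sad"}


def pred(state, inp, nxt):
    return EXPERIENCE.get((state, inp, nxt), "neutral")


def unpermute(state, inp, nxt):
    """Map a transition of the relabeled machine back to the original labels."""
    return flip(state), inp, flip(nxt)


def pred_prime(state, inp, nxt):
    """pred' = pred composed with unpermute."""
    return pred(*unpermute(state, inp, nxt))


# The transition 00 --1--> 10 appears as 11 --1--> 01 after relabeling.
s, i = "11", 1
n = RELABELED[(s, i)]
print(pred("00", 1, TABLE[("00", 1)]))  # prediction on the original machine
print(pred(s, i, n))                    # naive reuse of pred on permuted labels
print(pred_prime(s, i, n))              # pred' recovers the original prediction
```

Applying `pred` directly to the relabeled transition gives a different answer only because the label bookkeeping was skipped. Once the known relabeling is undone, the prediction agrees with the original machine on every transition.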
With this definition of $pred'$, we go back to the case of utilizing $pred'$ discussed earlier in this section. We see that with the permutation taken into account, we have $pred[obs(p)] = pred[unpermute(obs(T(p)))]$ and the substitution argument fails. We will make this clear with a simple example. Consider running experiments on physical system $p$ and generating a 2-bit (4-state) finite state automaton by picking a labeling scheme that assigns binary symbols to specific physical observables (say, voltage highs/lows are binary ‘1’/‘0’), and then making predictions on the experience space by applying $pred$ to the states of the automaton. Next, you are working on a different physical system $T(p)$ and generate an isomorphic 2-bit state machine in which the labels of the states are changed by flipping the labeling assignment, i.e. voltage lows/highs are now mapped to binary ‘1’/‘0’. Of course the original $pred$ function can still be applied to the 2-bit state space of this new automaton, but it would be overly optimistic to expect the same prediction results on the experience space as for the original state machine, given the knowledge of the change in labeling scheme. The commonsense methodology for making predictions would be to generate a new function $pred'$ which swaps the labels first and then applies $pred$ to the resulting state of the automaton. Once the labels are flipped and we apply the prediction function, we should expect to get the same results as for the earlier state machine (how could we not?). Note that while the independence/dependence of prediction and inference data was discussed in [3], the authors did not study this dependence between different $pred$ functions, or the relationship between $pred$ and $obs$.
Their mathematical framework as presented does not recognize these crucial interdependencies that exist between $T$, $obs$ and $pred$, which results in its erroneous claims.

The translation from the state machine picture to the examples of neural networks discussed in the original paper is straightforward. As stated in [3] - “for any given inference data, one can construct a ANN with arbitrary prediction data by adding nodes, changing connections and changing those nodes which are not part of the output. Put differently, one can always substitute a given ANN with another with different internal observables but identical or near-identical reports”. This simply maps to changing the state space of the network and/or relabeling the states, and thus will not work. While the author here has specifically focused on state machines and neural networks, we suspect that objections of the same flavor can be constructed against the other examples of universal computers and intelligences if they are cast under a functionalist picture. We will now conclude this note by summarizing the results in this paper.
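The Cartesian/polar analogy used earlier in this section can also be checked numerically. The sketch below is illustrative only: the function names `F`, `G` and `M` follow the text, the sample point is arbitrary, and `math.atan2` is swapped in for $\tan^{-1}(y/x)$ as a quadrant-aware variant that also avoids division by zero.

```python
import math


def F(x, y):
    """Angle computed from Cartesian coordinates (plays the role of pred)."""
    return math.atan2(y, x)


def M(x, y):
    """Cartesian -> polar change of coordinates (plays the role of T)."""
    return math.hypot(x, y), math.atan2(y, x)


def G(r, theta):
    """Angle computed from polar coordinates (plays the role of pred')."""
    return theta


x, y = 1.0, 1.0       # arbitrary sample point
r, theta = M(x, y)

consistent = G(r, theta)   # the right function on the new domain
misapplied = F(r, theta)   # F fed numbers it was never defined for

print(math.isclose(F(x, y), consistent))   # same angle in both pictures
print(math.isclose(F(x, y), misapplied))   # disagreement is only a domain error
print(math.isclose(r, math.hypot(x, y)))   # the circle (report) is unchanged
```

With $F$ and $M$ fixed, $G$ is forced to return $\theta$. The only way to obtain a mismatch is to feed $(r, \theta)$ to $F$, which mirrors applying $pred$ to a state space on which it was not defined.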
The substitution argument presented in [3] would have massive implications for the field of consciousness if it ‘pre-falsified’ all the current major frameworks of consciousness and pointed towards a new approach in this research area. However, the mathematical formalization of the observation-prediction-inference process coarse-grains out some important details, which implicitly invalidates the claims made in the paper with respect to functionalist frameworks. We would need to follow poor experimental and modeling methodologies to allow the substitution argument to apply to functionalist frameworks and falsify them. Once we take a more fine-grained view and account for the dependencies between these different components, the substitution argument once again simply reinforces the results of the unfolding(-like) arguments [1], [5]: that integrated information theory (and causal structure theories) are falsified if the inference reports are taken to be valid. The science of consciousness is safe for now, and we should not be hasty in abandoning existing frameworks and methodologies to explore phenomenology-first approaches.
References

[1] Doerig, Adrien, et al., “The unfolding argument: Why IIT and other causal structure theories cannot explain consciousness,” Consciousness and Cognition, 72 (2019): 49-59.
[2] Kleiner, Johannes, “On empirical well-definedness of models of consciousness,” Psyarxiv, (2019).
[3] Kleiner, Johannes and Hoel, Erik, “Falsification and consciousness,” Psyarxiv, (2020).
[4] Levin, Janet, “Functionalism,” Stanford Encyclopedia of Philosophy, (2004).
[5] Hanson, Jake R., and Sara I. Walker, “Integrated Information Theory and Isomorphic Feed-Forward Philosophical Zombies,” Entropy, 21.11 (2019): 1073.
[6] Krohn, K., Rhodes, J., “Algebraic theory of machines - I: Prime decomposition theorem for finite semigroups and machines,” Transactions of the American Mathematical Society, 1965, 116, 450–464.
[7] Zeiger, H.P., “Cascade synthesis of finite-state machines,”