Are mental properties supervenient on brain properties?
Joshua T. Vogelstein, Department of Applied Mathematics & Statistics, Johns Hopkins University
R. Jacob Vogelstein, National Security Technology Department, Johns Hopkins University Applied Physics Laboratory
Carey E. Priebe, Department of Applied Mathematics & Statistics, Johns Hopkins University
Abstract
The "mind-brain supervenience" conjecture suggests that all mental properties are derived from the physical properties of the brain. To address the question of whether the mind supervenes on the brain, we frame a supervenience hypothesis in rigorous statistical terms. Specifically, we propose a modified version of supervenience (called $\varepsilon$-supervenience) that is amenable to experimental investigation and statistical analysis. To illustrate this approach, we perform a thought experiment that illustrates how the probabilistic theory of pattern recognition can be used to make a one-sided determination of $\varepsilon$-supervenience. The physical property of the brain employed in this analysis is the graph describing brain connectivity (i.e., the brain-graph or connectome). $\varepsilon$-supervenience allows us to determine whether a particular mental property can be inferred from one's connectome to within any given positive misclassification rate, regardless of the relationship between the two. This may provide motivation for cross-disciplinary research between neuroscientists and statisticians.

Introduction
Questions and assumptions about mind-brain supervenience go back at least as far as Plato's dialogues, circa 400 BCE [1]. While there are many different notions of supervenience, we find Davidson's canonical description particularly illustrative [2]:

[mind-brain] supervenience might be taken to mean that there cannot be two events alike in all physical respects but differing in some mental respect, or that an object cannot alter in some mental respect without altering in some physical respect.

This philosophical conjecture has potentially widespread implications. For example, neural network theory and artificial intelligence often implicitly assume a local version of mind-brain supervenience [3, 4]. Cognitive neuroscience similarly seems to operate under such assumptions [5]. Philosophers continue to debate and refine notions of supervenience [6]. Yet, to date, relatively scant attention has been paid to what might be empirically learned about supervenience.

In this work we attempt to bridge the gap between philosophical conjecture and empirical investigation by casting supervenience in a probabilistic framework amenable to hypothesis testing. We then use the probabilistic theory of pattern recognition to determine the limits of what one can and cannot learn about supervenience through data analysis.
Results
Statistical supervenience: a definition
Let $\mathcal{M} = \{m_1, m_2, \ldots\}$ be the space of all possible minds and let $\mathcal{B} = \{b_1, b_2, \ldots\}$ be the set of all possible brains. $\mathcal{M}$ includes a mind for each possible collection of thoughts, memories, beliefs, etc. $\mathcal{B}$ includes a brain for each possible position and momentum of all subatomic particles within the skull. Given these definitions, Davidson's conjecture may be concisely and formally stated thus: $m \neq m' \implies b \neq b'$, where $(m, b), (m', b') \in \mathcal{M} \times \mathcal{B}$ are mind-brain pairs. This mind-brain supervenience relation does not imply an injective relation, a causal relation, or an identity relation (see Appendix for more details and some examples). To facilitate both statistical analysis and empirical investigation, we convert this supervenience relation from a logical to a probabilistic relation.

Let $F_{MB}$ denote a joint distribution of minds and brains. Statistical supervenience can then be defined as follows:

Definition 1. $\mathcal{M}$ is said to statistically supervene on $\mathcal{B}$ for distribution $F = F_{MB}$, denoted $\mathcal{M} \overset{S}{\sim}_F \mathcal{B}$, if and only if $P[m \neq m' \mid b = b'] = 0$, or equivalently $P[m = m' \mid b = b'] = 1$.

Statistical supervenience is therefore a probabilistic relation on sets (which could be considered a generalization of correlation; see Appendix for details).
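For concreteness, Definition 1 can be checked mechanically on finite spaces. The following Python sketch uses a hypothetical joint distribution (none of these numbers appear in the paper) and tests whether any single brain carries positive probability under two distinct minds:

```python
import numpy as np

# Hypothetical joint distribution: rows index minds m, columns index brains b;
# entries are P[M = m, B = b].
F_MB = np.array([
    [0.3, 0.0, 0.2],   # mind m1
    [0.0, 0.4, 0.0],   # mind m2
    [0.0, 0.0, 0.1],   # mind m3
])

def statistically_supervenes(F):
    """M statistically supervenes on B iff no brain b has positive
    probability under two different minds, i.e. P[m != m' | b = b'] = 0."""
    # For each brain (column), count how many minds carry positive mass.
    return bool(np.all((F > 0).sum(axis=0) <= 1))

print(statistically_supervenes(F_MB))  # False: b3 supports both m1 and m3
```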
Statistical supervenience is equivalent to perfect classification accuracy
If minds statistically supervene on brains, then whenever two minds differ, there must be some brain-based difference to account for the mental difference. This means that there must exist a deterministic function $g^*$ mapping each brain to its supervening mind. One could therefore, in principle, know this function. When the space of all possible minds is finite—that is, $|\mathcal{M}| < \infty$—any function $g : \mathcal{B} \to \mathcal{M}$ mapping from brains to minds is called a classifier. Define the misclassification rate, the probability that $g$ misclassifies $b$ under distribution $F = F_{MB}$, as

\[ L_F(g) = P[g(B) \neq M] = \sum_{(m,b) \in \mathcal{M} \times \mathcal{B}} \mathbb{I}\{g(b) \neq m\} \, P[B = b, M = m], \tag{1} \]

where $\mathbb{I}\{\cdot\}$ denotes the indicator function taking value unity whenever its argument is true and zero otherwise. The Bayes optimal classifier $g^*$ minimizes $L_F(g)$ over all classifiers: $g^* = \operatorname{argmin}_g L_F(g)$. The Bayes error, or Bayes risk, $L^* = L_F(g^*)$, is the minimum possible misclassification rate.

The primary result of casting supervenience in a statistical framework is the below theorem, which follows immediately from Definition 1 and Eq. (1):

Theorem 1. $\mathcal{M} \overset{S}{\sim}_F \mathcal{B} \iff L^* = 0$.

The above argument shows (for the first time to our knowledge) that statistical supervenience and zero Bayes error are equivalent. Statistical supervenience can therefore be thought of as a constraint on the possible distributions on minds and brains. Specifically, let $\mathcal{F}$ denote the set of all possible joint distributions on minds and brains, and let $\mathcal{F}_s = \{F_{MB} \in \mathcal{F} : L^* = 0\}$ be the subset of distributions for which supervenience holds. Theorem 1 implies that $\mathcal{F}_s \subsetneq \mathcal{F}$. Mind-brain supervenience is therefore an extremely restrictive assumption about the possible relationships between minds and brains. Such a restrictive assumption begs for empirical evaluation, vis-à-vis, for instance, a hypothesis test.
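The following sketch (again with the hypothetical distribution from above) computes the Bayes optimal classifier and the Bayes error of Eq. (1) for a finite joint distribution, illustrating Theorem 1: supervenience holds exactly when each brain's probability mass concentrates on a single mind.

```python
import numpy as np

# Same hypothetical joint distribution as in the previous sketch.
F_MB = np.array([
    [0.3, 0.0, 0.2],
    [0.0, 0.4, 0.0],
    [0.0, 0.0, 0.1],
])

# g*(b) = argmax_m P[M = m, B = b]: pick the most probable mind per brain.
g_star = F_MB.argmax(axis=0)

# L* = P[g*(B) != M] = 1 - sum_b max_m P[M = m, B = b].
bayes_error = 1.0 - F_MB.max(axis=0).sum()

print(g_star)       # [0 1 0]: brain b3 is classified as mind m1
print(bayes_error)  # 0.1 > 0, so M does not supervene on B here (Theorem 1)
```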
The non-existence of a viable statistical test for supervenience

The above theorem implies that if we desire to know whether minds supervene on brains, we can check whether $L^* = 0$. Unfortunately, $L^*$ is typically unknown. Fortunately, we can approximate $L^*$ using training data.

Assume that training data $\mathcal{T}_n = \{(M_1, B_1), \ldots, (M_n, B_n)\}$ are sampled independently and identically distributed (iid) from the true (but unknown) joint distribution $F = F_{MB}$. Let $g_n$ be a classifier induced by the training data, $g_n : \mathcal{B} \times (\mathcal{M} \times \mathcal{B})^n \mapsto \mathcal{M}$. The misclassification rate of such a classifier is given by

\[ L_F(g_n) = \sum_{(m,b) \in \mathcal{M} \times \mathcal{B}} \mathbb{I}\{g_n(b; \mathcal{T}_n) \neq m\} \, P[B = b, M = m], \tag{2} \]

which is a random variable due to its dependence on the randomly sampled training set $\mathcal{T}_n$. Calculating the expected misclassification rate $E[L_F(g_n)]$ is often intractable in practice because it requires a sum over all possible training sets. Instead, the expected misclassification rate can be approximated by the "hold-out" error. Let $\mathcal{H}_{n'} = \{(M_{n+1}, B_{n+1}), \ldots, (M_{n+n'}, B_{n+n'})\}$ be a set of $n'$ hold-out samples, each sampled iid from $F_{MB}$. The hold-out approximation to the misclassification rate is given by

\[ \widehat{L}_F^{n'}(g_n) = \frac{1}{n'} \sum_{(M_i, B_i) \in \mathcal{H}_{n'}} \mathbb{I}\{g_n(B_i; \mathcal{T}_n) \neq M_i\} \approx E[L_F(g_n)] \geq L^*. \tag{3} \]

By definition of $g^*$, the expectation of $\widehat{L}_F^{n'}(g_n)$ (with respect to both $\mathcal{T}_n$ and $\mathcal{H}_{n'}$) is greater than or equal to $L^*$ for any $g_n$ and all $n$. Thus, we can construct a hypothesis test for $L^*$ using the surrogate $\widehat{L}_F^{n'}(g_n)$.

A statistical test proceeds by specifying the allowable Type I error rate $\alpha > 0$ and then calculating a test statistic. The $p$-value is the probability, under the least favorable null hypothesis (the simple hypothesis within the potentially composite null which is closest to the boundary with the alternative), of observing a result at least as extreme as the one observed. In other words, the $p$-value is the cumulative distribution function of the test statistic evaluated at the observed test statistic, with parameter given by the least favorable null distribution. We reject if the $p$-value is less than $\alpha$. A test is consistent whenever its power (the probability of rejecting the null when it is indeed false) goes to unity as $n \to \infty$. For any statistical test, if the $p$-value converges in distribution to $\delta_0$ (point mass at zero), then whenever $\alpha > 0$, power goes to unity.
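Before turning to the test itself, here is a minimal sketch of the hold-out estimate in Eq. (3); `g_n` and `holdout` are placeholders for any trained classifier and hold-out set, and the example data are hypothetical:

```python
import numpy as np

def holdout_error(g_n, holdout):
    """Hold-out approximation to the misclassification rate, Eq. (3):
    the fraction of hold-out pairs (m_i, b_i) that g_n misclassifies."""
    return float(np.mean([g_n(b) != m for m, b in holdout]))

# Hypothetical usage: g_n is any classifier trained on T_n, and holdout is
# a list of n' iid (mind, brain) pairs never used during training. The
# estimate has expectation E[L_F(g_n)] >= L*, whatever g_n is.
example = [(0, (0.1, 0.2)), (1, (0.9, 0.8))]
print(holdout_error(lambda b: int(b[0] > 0.5), example))  # 0.0
```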
Based on the above considerations, we might consider the following hypothesis test: $H_0 : L^* > 0$ versus $H_A : L^* = 0$; rejecting the null would indicate that $\mathcal{M} \overset{S}{\sim}_F \mathcal{B}$. Unfortunately, the alternative hypothesis lies on the boundary of the null, so the $p$-value is always equal to unity [7]. From this, Theorem 2 follows immediately:

Theorem 2. There does not exist a viable test of $\mathcal{M} \overset{S}{\sim}_F \mathcal{B}$.

In other words, we can never reject $L^* > 0$ in favor of supervenience, no matter how much data we obtain.

Conditions for a consistent statistical test for ε-supervenience

To proceed, therefore, we introduce a relaxed notion of supervenience:
Definition 2. $\mathcal{M}$ is said to $\varepsilon$-supervene on $\mathcal{B}$ for distribution $F = F_{MB}$, denoted $\mathcal{M} \overset{\varepsilon}{\sim}_F \mathcal{B}$, if and only if $L^* < \varepsilon$ for some $\varepsilon > 0$.

Given this relaxation, consider the problem of testing for $\varepsilon$-supervenience:

\[ H_0^\varepsilon : L^* \geq \varepsilon, \qquad H_A^\varepsilon : L^* < \varepsilon. \]

Let $\hat{T}_{n'} = n' \widehat{L}_F^{n'}(g_n)$, the number of hold-out errors, be the test statistic. The distribution of $\hat{T}_{n'}$ is available under the least favorable null distribution. For the above hypothesis test, the $p$-value is therefore the binomial cumulative distribution function with parameter $\varepsilon$; that is, $p\text{-value} = B(\hat{T}_{n'}; n', \varepsilon) = \sum_{k \in [\hat{T}_{n'}]} \mathrm{Binomial}(k; n', \varepsilon)$, where $[\hat{T}_{n'}] = \{0, 1, \ldots, \hat{T}_{n'}\}$. We reject whenever this $p$-value is less than $\alpha$; rejection implies that we are $100(1-\alpha)\%$ confident that $\mathcal{M} \overset{\varepsilon}{\sim}_F \mathcal{B}$.

For the above $\varepsilon$-supervenience statistical test, if $g_n \to g^*$ as $n \to \infty$, then $\widehat{L}_F^{n'}(g_n) \to L^*$ as $n, n' \to \infty$. Thus, if $L^* < \varepsilon$, power goes to unity. The definition of $\varepsilon$-supervenience therefore admits, for the first time to our knowledge, a viable statistical test of supervenience, given a specified $\varepsilon$ and $\alpha$. Moreover, this test is consistent whenever $g_n$ converges to the Bayes classifier $g^*$.

The existence and construction of a consistent statistical test for ε-supervenience

The above considerations indicate the existence of a consistent test for $\varepsilon$-supervenience whenever the classifier used is consistent. To actually implement such a test, one must be able to (i) measure mind/brain pairs and (ii) have a consistent classifier $g_n$. Unfortunately, we do not know how to measure the entirety of one's brain, much less one's mind. We therefore must restrict our interest to a mind/brain property pair. A mind (mental) property might be a person's intelligence, psychological state, current thought, gender identity, etc. A brain property might be the number of cells in a person's brain at some time $t$, or the collection of spike trains of all neurons in the brain during some time period $t$ to $t'$. Regardless of the details of the specifications of the mental property and the brain property, given such specifications, one can assume a model, $\mathcal{F}$. We desire a classifier $g_n$ that is guaranteed to be consistent, no matter which of the possible distributions $F_{MB} \in \mathcal{F}$ is the true distribution. A classifier with such a property is called a universally consistent classifier. Below, under a very general mind-brain model $\mathcal{F}$, we construct a universally consistent classifier.

Gedankenexperiment

Let the physical property under consideration be brain connectivity structure, so $b$ is a brain-graph ("connectome") with vertices representing neurons (or collections thereof) and edges representing synapses (or collections thereof). Further let $\mathcal{B}$, the brain observation space, be the collection of all graphs on a given finite number of vertices, and let $\mathcal{M}$, the mental property observation space, be finite. Now, imagine collecting very large amounts of very accurate, identically and independently sampled brain-graph data and associated mental property indicators from $F_{MB}$. A $k_n$-nearest neighbor classifier using a Frobenius norm is universally consistent (see Methods for details). The existence of a universally consistent classifier guarantees that eventually (in $n, n'$) we will be able to conclude $\mathcal{M} \overset{\varepsilon}{\sim}_F \mathcal{B}$ for this mind-brain property pair, if indeed $\varepsilon$-supervenience holds.
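To make the binomial test described above concrete, the following sketch computes its $p$-value; the error count, hold-out size, and $\varepsilon$ below are hypothetical:

```python
from scipy.stats import binom

def epsilon_supervenience_pvalue(n_errors, n_prime, eps):
    """p-value for H0: L* >= eps against HA: L* < eps. Under the least
    favorable null (L* = eps), the number of hold-out errors is
    Binomial(n', eps), so the p-value is the lower tail B(n_errors; n', eps)."""
    return binom.cdf(n_errors, n_prime, eps)

# Hypothetical numbers: 31 errors on n' = 1000 hold-out samples, eps = 0.05.
p = epsilon_supervenience_pvalue(31, 1000, 0.05)
print(p)  # well below alpha = 0.01, so we would reject H0 here
```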
This logic holds for directed graphs, multigraphs, and hypergraphs with discrete edge weights and vertex attributes, as well as for unlabeled graphs (see [8] for details). Furthermore, the proof holds for other matrix norms (which might speed up convergence and hence reduce the required $n$), and for the regression scenario in which $|\mathcal{M}|$ is infinite (again, see Methods for details). Thus, under the conditions stated in the above Gedankenexperiment, universal consistency yields:
Theorem 3. $\mathcal{M} \overset{\varepsilon}{\sim}_F \mathcal{B} \implies \beta \to 1$ as $n, n' \to \infty$.

Unfortunately, the rate of convergence of $L_F(g_n)$ to $L_F(g^*)$ depends on the (unknown) distribution $F = F_{MB}$ [9]. Furthermore, arbitrarily-slow-convergence theorems regarding the rate of convergence of $L_F(g_n)$ to $L_F(g^*)$ demonstrate that there is no universal $n, n'$ which will guarantee that the test has power greater than any specified target $\beta > \alpha$ [10]. For this reason, the test outlined above can provide only a one-sided conclusion: if we reject, we can be $100(1-\alpha)\%$ confident that $\mathcal{M} \overset{\varepsilon}{\sim}_F \mathcal{B}$ holds, but we can never be confident in its negation; rather, it may be the case that the evidence in favor of $\mathcal{M} \overset{\varepsilon}{\sim}_F \mathcal{B}$ is insufficient because we simply have not yet collected enough data. This leads immediately to the following theorem:
Theorem 4. For any target power $\beta_{\min} > \alpha$, there is no universal $n, n'$ that guarantees $\beta \geq \beta_{\min}$.

Therefore, even $\varepsilon$-supervenience does not satisfy Popper's falsifiability criterion [11].

The feasibility of a consistent statistical test for ε-supervenience

Theorem 3 demonstrates the availability of a consistent test under certain restrictions. Theorem 4, however, demonstrates that convergence rates might be unbearably slow. We therefore provide an illustrative example of the feasibility of such a test on synthetic data.
Caenorhabditis elegans is a species whose nervous system is believed to consist of the same labeled neurons in each organism [12]. Moreover, these animals exhibit a rich behavioral repertoire that seemingly depends on circuit properties [13]. These findings motivate the use of C. elegans for a synthetic data analysis [14]. Conducting such an experiment requires specifying a joint distribution $F_{MB}$ over brain-graphs and behaviors. The joint distribution decomposes into the product of a class-conditional distribution (likelihood) and a prior, $F_{MB} = F_{B|M} F_M$. The prior specifies the probability of any particular organism exhibiting the behavior. The class-conditional distribution specifies the brain-graph distribution given that the organism does (or does not) exhibit the behavior.

Let $A_{uv}$ be the number of chemical synapses between neuron $u$ and neuron $v$ according to [15]. Then, let $S$ be the set of edges deemed responsible for odor-evoked behavior according to [16]. If odor-evoked behavior is supervenient on this signal subgraph $S$, then the distribution of edges in $S$ must differ between the two classes of odor-evoked behavior [17]. Let $E_{uv|j}$ denote the expected number of edges from vertex $v$ to vertex $u$ in class $j$. For class $m_0$, let $E_{uv|0} = A_{uv} + \eta$, where $\eta$ is a small noise parameter (it is believed that the C. elegans connectome is similar across organisms [12]). For class $m_1$, let $E_{uv|1} = A_{uv} + z_{uv}$, where the signal parameter $z_{uv} = \eta$ for all edges not in $S$, and $z_{uv}$ is sampled uniformly from a fixed interval for all edges within $S$. For both classes, let each edge be Poisson distributed, $F_{A_{uv}|M=m_j} = \mathrm{Poisson}(E_{uv|j})$.

We consider $k_n$-nearest neighbor classification of labeled multigraphs (directed, with loops) on 279 vertices, under the Frobenius norm. The $k_n$-nearest neighbor classifier used here satisfies $k_n \to \infty$ and $k_n/n \to 0$ as $n \to \infty$, ensuring universal consistency. (Better classifiers can be constructed for the joint distribution $F_{MB}$ used here; however, we demand universal consistency.) Figure 1 shows that for this simulation, rejecting the null in favor of $\varepsilon$-supervenience at the specified significance level requires only a few hundred training samples.

Importantly, conducting this experiment in actu is not beyond current technological limitations. 3D superresolution imaging [18] combined with neurite tracing algorithms [19, 20, 21] allows the collection of a C. elegans brain-graph within a day. Genetic manipulations, laser ablations, and training paradigms can each be used to obtain a non-wild-type population for use as $M = m_1$ [13], and the class of each organism ($m_0$ vs. $m_1$) can also be determined automatically [22].

[Figure 1 here: the estimated hold-out misclassification rate $\widehat{L}_F^{n'}(g_n)$ plotted against the class-conditional training sample size $n_j$.]

Figure 1: C. elegans graph classification simulation results. The estimated hold-out misclassification rate $\widehat{L}_F^{n'}(g_n)$ (with $n' = 1000$ testing samples) is plotted as a function of the class-conditional training sample size $n_j = n/2$, suggesting that we can determine that $\mathcal{M} \overset{\varepsilon}{\sim}_F \mathcal{B}$ holds with the desired confidence with just a few hundred training samples generated from $F_{MB}$. Each dot depicts $\widehat{L}_F^{n'}(g_n)$ for some $n$; standard errors are $(\widehat{L}_F^{n'}(g_n)(1 - \widehat{L}_F^{n'}(g_n))/n')^{1/2}$. For example, at $n_j = 180$ we have $k_n = \lfloor \sqrt{n} \rfloor = 53$ (where $\lfloor \cdot \rfloor$ indicates the floor operator), with a small estimated misclassification rate and correspondingly small standard error.
We reject $H_0^\varepsilon : L^* \geq \varepsilon$ at the specified $\alpha$. Note that $L^* \approx 0$ for this simulation.
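The following sketch outlines a scaled-down version of this synthetic experiment. The connectome stand-in, the signal subgraph, $\eta$, the signal interval, $\varepsilon$, and the sample sizes are all assumptions chosen for illustration (several of these values are not specified above), so the printed $p$-value should not be read as reproducing Figure 1:

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(0)
V = 279                                   # labeled C. elegans neurons
A = rng.poisson(0.5, (V, V))              # stand-in for observed synapse counts
S = rng.random((V, V)) < 0.01             # hypothetical signal subgraph
eta = 0.05                                # assumed noise parameter

E0 = A + eta                                          # class m0 edge rates
E1 = A + np.where(S, rng.uniform(0, 2, (V, V)), eta)  # class m1: signal on S

def sample(E, n):
    """n iid multigraphs with independent Poisson(E_uv) edge counts,
    flattened so the Frobenius norm becomes the Euclidean norm."""
    return rng.poisson(E, (n, V, V)).reshape(n, -1).astype(np.float32)

n_j, n_prime = 100, 200                   # shrunk for speed
X = np.vstack([sample(E0, n_j), sample(E1, n_j)])      # training graphs
y = np.repeat([0, 1], n_j)                             # training classes
T = np.vstack([sample(E0, n_prime // 2), sample(E1, n_prime // 2)])
t = np.repeat([0, 1], n_prime // 2)

k = int(np.sqrt(2 * n_j))                 # k_n = floor(sqrt(n)), n = 2 n_j
# Squared Euclidean distances between every test and training graph.
D = (T**2).sum(1)[:, None] + (X**2).sum(1)[None, :] - 2 * T @ X.T
votes = y[np.argsort(D, axis=1)[:, :k]]   # classes of the k nearest neighbors
errors = int((votes.mean(axis=1).round() != t).sum())

print(errors / n_prime)                   # hold-out misclassification estimate
print(binom.cdf(errors, n_prime, 0.05))   # p-value for eps = 0.05
```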
Discussion

This work makes the following contributions. First, we define statistical supervenience based on Davidson's canonical statement (Definition 1). This definition makes it apparent that supervenience implies the possibility of perfect classification (Theorem 1). We then prove that there is no viable test against supervenience, so one can never reject a null hypothesis in favor of supervenience, regardless of the amount of data (Theorem 2). This motivates the introduction of a relaxed notion called $\varepsilon$-supervenience (Definition 2), against which consistent statistical tests are readily available. Under a very general brain-graph/mental property model (the Gedankenexperiment), a consistent test for $\varepsilon$-supervenience is always available, no matter the true distribution $F_{MB}$ (Theorem 3). In other words, the proposed test is guaranteed to reject the null whenever the null is false, given sufficient data, for any possible distribution governing mental property/brain property pairs.

Alas, arbitrarily-slow-convergence theorems demonstrate that there is no universal $n, n'$ for which convergence is guaranteed (Theorem 4). Thus, a failure to reject is ambiguous: even if the data satisfy the above assumptions, the failure to reject may be due to either (i) an insufficient amount of data or (ii) $\mathcal{M}$ not being $\varepsilon$-supervenient on $\mathcal{B}$. Moreover, the data will not, in general, satisfy the above assumptions. In addition to dependence (because each human does not exist in a vacuum), the mental property measurements will often be "noisy" (for example, accurately diagnosing psychiatric disorders is a sticky wicket [23]). Nonetheless, the synthetic data analysis suggests that under somewhat realistic assumptions, convergence obtains with an amount of data one might conceivably collect (Figure 1 and ensuing discussion).

Thus, given measurements of mental and brain properties that we believe reflect the properties of interest, and given a sufficient amount of data satisfying the independent and identically sampled assumption, a rejection of $H_0^\varepsilon : L^* \geq \varepsilon$ in favor of $\mathcal{M} \overset{\varepsilon}{\sim}_F \mathcal{B}$ entails that we are $100(1-\alpha)\%$ confident that the mental property under investigation is $\varepsilon$-supervenient on the brain property under investigation. Unfortunately, failure to reject is more ambiguous.

Interestingly, much of contemporary research in neuroscience and cognitive science could be cast as mind-brain supervenience investigations. Specifically, searches for "engrams" of memory traces [24] or "neural correlates" of various behaviors or mental properties (for example, consciousness [25]) may be more aptly called searches for the "neural supervenia" of such properties. Letting the brain property be a brain-graph is perhaps especially pertinent in light of the advent of "connectomics" [26, 27], a field devoted to estimating whole-organism brain-graphs and relating them to function. Testing the supervenience of various mental properties on these brain-graphs will therefore perhaps become increasingly compelling; the framework developed herein could be fundamental to these investigations. For example, the question of whether connectivity structure alone is sufficient to explain a particular mental property is one possible mind-brain $\varepsilon$-supervenience investigation. The above synthetic data analysis demonstrates the feasibility of testing $\varepsilon$-supervenience on small brain-graphs. Similar supervenience tests on larger animals (such as humans) will potentially benefit from higher-throughput imaging modalities [28, 29], more coarse brain-graphs [30, 31], or both.
Methods
The 1-nearest neighbor (1-NN) classifier works as follows. Compute the distance between the test brain $b$ and all $n$ training brains, $d_i = d(b, b_i)$ for all $i \in [n]$, where $[n] = \{1, 2, \ldots, n\}$. Then, sort these distances, $d_{(1)} < d_{(2)} < \cdots < d_{(n)}$, and consider their corresponding minds, $m_{(1)}, m_{(2)}, \ldots, m_{(n)}$, where parenthetical indices indicate rank order among $\{d_i\}_{i \in [n]}$. The 1-NN algorithm predicts that the unobserved mind is of the same class as the closest brain's class: $\hat{m} = m_{(1)}$. The $k_n$-nearest neighbor classifier is a straightforward generalization of this approach: the test mind is assigned to the plurality class among the $k_n$ nearest neighbors, $\hat{m} = \operatorname{argmax}_{m'} \sum_{i=1}^{k_n} \mathbb{I}\{m_{(i)} = m'\}$. Given a particular choice of $k_n$ (the number of nearest neighbors to consider) and a choice of $d(\cdot, \cdot)$ (the distance metric used to compare the test datum and training data), one has a relatively simple and intuitive algorithm.

Let $g_n$ be the $k_n$-nearest neighbor ($k_n$NN) classifier when there are $n$ training samples. A collection of such classifiers $\{g_n\}$, with $k_n$ increasing with $n$, is called a classifier sequence. A universally consistent classifier sequence is any classifier sequence that is guaranteed to converge to the Bayes optimal classifier regardless of the true distribution from which the data were sampled; that is, a universally consistent classifier sequence satisfies $L_F(g_n) \to L_F(g^*)$ as $n \to \infty$ for all $F_{MB}$. In the main text, we refer to the whole sequence as a classifier.

The $k_n$NN classifier is universally consistent if (i) $k_n \to \infty$ as $n \to \infty$ and (ii) $k_n/n \to 0$ as $n \to \infty$ [32]. In Stone's original proof [32], $b$ was assumed to be a $q$-dimensional vector, and the $L_2$ norm ($d(b, b') = (\sum_{j=1}^{q} (b_j - b'_j)^2)^{1/2}$, where $j$ indexes elements of the $q$-dimensional vector) was shown to satisfy the constraints on the distance metric for this collection of classifiers to be universally consistent. Later, others extended these results to apply to any $L_p$ norm [9]. When brain-graphs are represented by their adjacency matrices, one can stack the columns of the adjacency matrices, effectively embedding graphs into a vector space, in which case Stone's theorem applies. Stone's original proof also applied to the scenario in which $|\mathcal{M}|$ is infinite, yielding a universally consistent regression algorithm as well.

Note that the above extension of Stone's original theorem to the graph domain implicitly assumes that vertices are labeled, such that elements of the adjacency matrices can easily be compared across graphs. In theory, when vertices are unlabeled, one could first map each graph to a quotient space invariant to isomorphisms, and then proceed as before. Unfortunately, there is no known polynomial-time algorithm for graph isomorphism [33], so in practice, dealing with unlabeled vertices will likely be computationally challenging [8].
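A sketch of the $k_n$NN classifier just described, operating on stacked adjacency matrices so that the Frobenius norm reduces to the Euclidean norm; the toy data below are hypothetical:

```python
import numpy as np

def knn_classify(b, train_brains, train_minds, k):
    """Plurality vote among the k training brains nearest to the test
    brain b; adjacency matrices are compared in Frobenius norm."""
    d = np.linalg.norm(train_brains - b, axis=(1, 2))  # d_i = ||b_i - b||_F
    nearest = np.argsort(d)[:k]                        # ranks (1), ..., (k)
    classes, counts = np.unique(train_minds[nearest], return_counts=True)
    return classes[counts.argmax()]

# Toy usage: 40 random 5-vertex multigraphs per class, and k_n = floor(sqrt(n))
# to satisfy conditions (i) and (ii) above.
rng = np.random.default_rng(0)
brains = np.concatenate([rng.poisson(1.0, (40, 5, 5)),
                         rng.poisson(3.0, (40, 5, 5))])
minds = np.repeat([0, 1], 40)
k_n = int(np.sqrt(len(minds)))                         # floor(sqrt(80)) = 8
print(knn_classify(rng.poisson(3.0, (5, 5)), brains, minds, k_n))  # likely 1
```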
Appendix

In this appendix we compare supervenience to several other relations on sets (see Figure 2).

First, a supervenient relation does not imply an injective relation. An injective relation is any relation that preserves distinctness. Thus, if minds are injective on brains, then $b \neq b' \implies m \neq m'$ (note that the directionality of the implication has been switched relative to supervenience). However, it might be the case that a brain could change without the mind changing. Consider the case in which a single subatomic particle shifts its position by a Planck length, changing the brain state from $b$ to $b'$. It is possible (likely?) that the mental state supervening on brain state $b$ remains $m$, even after $b$ changes to $b'$. In such a scenario, the mind might still supervene on the brain, but the relation from brains to minds is not injective. This argument also shows that supervenience is not necessarily a symmetric relation: minds supervening on brains does not imply that brains supervene on minds.

Second, supervenience does not imply causality. For instance, consider an analogy where $M$ and $B$ correspond to two coins being flipped, each possibly landing on heads or tails. Further assume that every time one lands on heads, so does the other, and every time one lands on tails, so does the other. This implies that $M$ supervenes on $B$, but it assumes nothing about whether $M$ causes $B$, or $B$ causes $M$, or some exogenous force causes both.

Third, supervenience does not imply identity. The above example with the two coins demonstrates this, as the two coins are not the same thing, even if one has perfect information about the other.

What supervenience does imply, however, is the following. Imagine finding two unequal minds. If $\mathcal{M} \overset{S}{\sim}_F \mathcal{B}$, then the brains on which those two minds supervene must be different. In other words, there cannot be two unequal minds, either of which could supervene on a single brain. Figure 2 shows several possible relations between the sets of minds and brains.

Note that statistical supervenience is distinct from statistical correlation. Statistical correlation between brain states and mental states is defined as $\rho_{MB} = E[(B - \mu_B)(M - \mu_M)]/(\sigma_B \sigma_M)$, where $\mu_X$ and $\sigma_X$ are the mean and standard deviation of $X$, and $E[X]$ is the expected value of $X$. If $\rho_{MB} = 1$, then both $\mathcal{M} \overset{S}{\sim}_F \mathcal{B}$ and $\mathcal{B} \overset{S}{\sim}_F \mathcal{M}$. Thus, perfect correlation implies supervenience, but supervenience does not imply correlation. In fact, supervenience may be thought of as a generalization of correlation that incorporates directionality, can be applied to arbitrarily valued random variables (such as mental or brain properties), and can depend on any moment of a distribution (not just the first two).

[Figure 2 here: schematic diagrams of three possible relations between the sets of minds and brains.]

Figure 2: Possible relations between minds and brains. (A) Minds supervene on brains, and it so happens that there is a bijective relation from brains to minds. (B) Minds supervene on brains, and it so happens that there is a surjective (a.k.a. onto) relation from brains to minds, but not a bijective one. (C) Minds are not supervenient on brains, because two different minds supervene on the same brain.
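As a small numerical illustration of the last point, the following sketch exhibits supervenience with Pearson correlation near zero, using hypothetical numeric codes for brain and mind states:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.integers(-2, 3, 10000)   # brain states coded as integers -2, ..., 2
M = B ** 2                       # the mind is a deterministic function of the brain
print(np.corrcoef(B, M)[0, 1])   # ~0: Pearson correlation misses the relation
# Yet P[m != m' | b = b'] = 0 by construction: equal brains entail equal minds,
# so M supervenes on B even though the two are (nearly) uncorrelated.
```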
References

[1] Plato. Plato: Complete Works. Hackett Publishing Co. (1997).
[2] Davidson, D. Mental events. In Experience and Theory. Duckworth (1970).
[3] Haykin, S. Neural Networks and Learning Machines, 3rd edition. Prentice Hall (2008).
[4] Ripley, B. D. Pattern Recognition and Neural Networks. Cambridge University Press (2008).
[5] Gazzaniga, M. S., Ivry, R. B., and Mangun, G. R. Cognitive Neuroscience: The Biology of the Mind, 3rd edition. W. W. Norton & Company (2008).
[6] Kim, J. Physicalism, or Something Near Enough (Princeton Monographs in Philosophy). Princeton University Press (2007).
[7] Bickel, P. J. and Doksum, K. A. Mathematical Statistics: Basic Ideas and Selected Topics, Vol. I, 2nd edition. Prentice Hall (2000).
[8] Vogelstein, J. T. and Priebe, C. E. Submitted for publication (2011).
[9] Devroye, L., Györfi, L., and Lugosi, G. A Probabilistic Theory of Pattern Recognition. Springer (1996).
[10] Devroye, L. Utilitas Mathematica, 475–483 (1983).
[11] Popper, K. R. The Logic of Scientific Discovery. Routledge (1959).
[12] Durbin, R. M. Studies on the Development and Organisation of the Nervous System of Caenorhabditis elegans. PhD thesis, University of Cambridge (1987).
[13] de Bono, M. and Maricq, A. V. Annual Review of Neuroscience 28, 451–501 (2005).
[14] Gelman, A. and Shalizi, C. R. Submitted for publication, 1–36 (2011).
[15] Varshney, L. R., Chen, B. L., Paniagua, E., Hall, D. H., and Chklovskii, D. B. Structural properties of the Caenorhabditis elegans neuronal network (2011).
[16] Chalasani, S. H., Chronis, N., Tsunozaki, M., Gray, J. M., Ramot, D., Goodman, M. B., and Bargmann, C. I. Nature 450(7166), 63–70, November (2007).
[17] Vogelstein, J. T., Gray, W. R., Vogelstein, R. J., and Priebe, C. E. Submitted for publication (2011).
[18] Vaziri, A., Tang, J., Shroff, H., and Shank, C. V. Proceedings of the National Academy of Sciences of the United States of America 105(51), 20221–20226, December (2008).
[19] Helmstaedter, M., Briggman, K. L., and Denk, W. Current Opinion in Neurobiology 18(6), 633–641, December (2008).
[20] Mishchenko, Y. Journal of Neuroscience Methods, 276–289, January (2009).
[21] Lu, J., Fiala, J. C., and Lichtman, J. W. PLoS ONE 4(5), e5655 (2009).
[22] Buckingham, S. D. and Sattelle, D. B. Invertebrate Neuroscience, 121–131, September (2008).
[23] Kessler, R. C., Berglund, P., Demler, O., Jin, R., Merikangas, K. R., and Walters, E. E. Archives of General Psychiatry 62(6), 593–602, June (2005).
[24] Lashley, K. S. Symposia of the Society for Experimental Biology 4, 454–482 (1950).
[25] Koch, C. The Quest for Consciousness. Roberts and Company Publishers (2010).
[26] Sporns, O., Tononi, G., and Kötter, R. PLoS Computational Biology 1(4), e42 (2005).
[27] Hagmann, P. From Diffusion MRI to Brain Connectomics. PhD thesis, Institut de traitement des signaux (2005).
[28] Hayworth, K. J., Kasthuri, N., Schalek, R., and Lichtman, J. W. Microscopy and Microanalysis 12(Supp 2), 86–87 (2006).
[29] Bock, D. D., Lee, W.-C. A., Kerlin, A. M., Andermann, M. L., Hood, G., Wetzel, A. W., Yurgenson, S., Soucy, E. R., Kim, H. S., and Reid, R. C. Nature 471(7337), 177–182, March (2011).
[30] Palm, C., Axer, M., Gräßel, D., Dammers, J., Lindemeyer, J., Zilles, K., Pietrzyk, U., and Amunts, K. Frontiers in Human Neuroscience (2010).
[31] Johansen-Berg, H. and Behrens, T. E. Diffusion MRI: From Quantitative Measurement to In-vivo Neuroanatomy. Academic Press (2009).
[32] Stone, C. J. The Annals of Statistics 5(4), 595–620, July (1977).
[33] Garey, M. R. and Johnson, D. S. Computers and Intractability: A Guide to the Theory of NP-Completeness (A Series of Books in the Mathematical Sciences). W. H. Freeman and Company, San Francisco (1979).
Acknowledgments
The authors would like to acknowledge helpful discussions with J. Lande, B. Vogelstein, S. Seung, and two helpful referees.
Author Contributions
JTV, RJV, and CEP conceived of the manuscript. JTV and CEP wrote it. CEP ran the experiment.