Interfacing biology, category theory and mathematical statistics
Dominique Pastor, Erwan Beurier, Andrée Ehresmann, Roger Waldeck
In John Baez and Bob Coecke (Eds.): Applied Category Theory 2019, EPTCS 323, 2020, pp. 136–148, doi:10.4204/EPTCS.323.9
© Pastor, Beurier, Ehresmann & Waldeck. This work is licensed under the Creative Commons Attribution License.
Dominique Pastor, Erwan Beurier
IMT Atlantique, LabSTICC & LEGO, Université de Bretagne-Loire, 29238 Brest, France
{erwan.beurier; dominique.pastor; roger.waldeck}@imt-atlantique.fr

Andrée Ehresmann, Roger Waldeck
Faculté des Sciences, Dépt. de Mathématiques, LAMFA, Université de Picardie Jules Verne, 33 rue Saint-Leu, F-80039 Amiens, France
[email protected]
Motivated by the concept of degeneracy in biology [3], we establish a first connection between the Multiplicity Principle [4, 5] and mathematical statistics. Specifically, we exhibit two families of tests that satisfy this principle to achieve the detection of a signal in noise.
In [3], Edelman & Gally pointed out degeneracy as the fundamental property that allows living systems to evolve through natural selection towards more complexity in fluctuating environments. Degeneracy is defined in [3] as "the ability of elements that are structurally different to perform the same function or yield the same output". Degeneracy is a crucial feature of immune systems and neural networks, at all organization levels.

The Multiplicity Principle (MP) [4, 5], introduced by Ehresmann & Vanbremeersch, is a mathematical formalization of degeneracy in categorical terms. The consequences of this principle, as treated in [4, 5], underpin Edelman & Gally's conjecture according to which "complexity and degeneracy go hand in hand" [3].

Another property of many biological and social systems is their resilience: (i) they can perform in degraded mode, with some performance loss, but without collapsing; (ii) they can recover their initial performance level when nominal conditions are satisfied again; (iii) they can perform corrections and auto-adaptation so as to maintain the tasks essential for their survival. In addition, resilience of social or biological systems is achieved via agents with different skills. For instance, cells are simply reactive organisms, whereas social agents have some cognitive properties. Thence the idea that resilience may derive from fundamental properties satisfied by agents, interactions and organizations. Could this fundamental property be a consequence of degeneracy [5, Section 3.1, p. 15]?

The notion of resilience remains, however, somewhat elusive, mathematically speaking. In contrast, the notion of robustness has a long history and track record in mathematical statistics [6]. By and large, a statistical method is robust if its performance is not unduly altered in case of outliers or fluctuations around the model for which it is designed. Can we fathom the links between resilience and robustness?
Summary of main results
Because this paper lies at the interface between different mathematical specialties, the present section summarizes its contents in plain text. To begin with, the MP is a property that a category may satisfy when it involves structurally different diagrams sharing the same cocones. To state our main results, it will not be necessary to consider the general MP, though: the particular case of preordered sets will suffice, in which case the MP reduces to Proposition 1.

Second, in statistical hypothesis testing, a hypothesis can be seen as a predicate whose truth value we aim at determining by means of statistical decisions. There exist many optimality criteria to devise a decision to test a given hypothesis. In non-Bayesian approaches, which will be our focus below, such criteria are specified through the notions of size and power.

The size is the least upper bound for the probability of rejecting the hypothesis when this one is actually true. We generally want this size to remain below a certain value called the level, because the hypothesis to test mostly represents the standard situation. For instance, planes in the sky are rare events, after all, and the standard hypothesis is "there is no plane", which represents the nominal situation. A too large level may result in an intolerable cluttering of a radar screen: we do not want to be bothered by too many alarms. In contrast, when the hypothesis is false, we want to reject it with the highest possible confidence. The probability that a decision rejects the hypothesis when this one is actually false is called the power of the decision. For a given testing problem, we thus look for decisions with maximal power within the set of those decisions whose size is less than or equal to a specified level. This defines a preorder.
A maximal element in this preorder is said to be optimal. Different hypotheses to test may thus require different criteria, specified through different notions of size and different notions of power. This is what we exploit below to exhibit two sets of "structurally different" decisions that satisfy the MP.

To carry out this construction, we consider the detection of a signal in independent standard Gaussian noise, a classical problem in many applications. This is a hypothesis testing problem for which there exists an optimality criterion where the size is the so-called probability of false alarm and the power is the so-called probability of detection. This criterion has a solution, the Neyman-Pearson (NP) decision, which is thus the maximal element of a certain preorder. We can consider a second class of decisions, namely, the RDT decisions. These decisions are aimed at detecting deviations of a signal with respect to a known deterministic model in the presence of independent standard Gaussian noise. This problem is rotationally invariant and the RDT decisions are optimal with respect to a specific criterion defined through suitable notions of size and power. They are maximal elements of another preordered set. Although not dedicated to signal detection, these decisions can be used as surrogates for NP decisions to detect a signal. It turns out that the family of RDT decisions and that of NP decisions satisfy the MP, as stated in Theorem 4. This is because the more data we have, the closer to perfection both types of decisions are.
Notation
Random variables.
Given two measurable spaces E and F, M(E, F) denotes the set of all measurable functions defined on E and valued in F. The two σ-algebras involved are omitted from the notation because, in the sequel, they will always be obvious from the context. In particular, we will throughout consider a probability space (Ω, B, P) and systematically endow R with the Borel σ-algebra, which will not be recalled. Therefore, M(Ω, R) designates the set of all real random variables and M(Ω, R^n) is the set of all n-dimensional real random vectors.

Given q ∈ [0, ∞), B_∞(q) is the set of all real random variables Δ ∈ M(Ω, R) such that ‖Δ‖_∞ ≤ q, where ‖·‖_∞ is the essential supremum norm. As usual, we write X ∼ N(0, 1) to mean that X ∈ M(Ω, R) is standard normal. Given a sequence (X_n)_{n∈N} ∈ M(Ω, R)^N of real random variables, we write X_1, X_2, … iid∼ N(0, 1) to mean that X_1, X_2, … are independent and identically distributed with common distribution N(0, 1).

Decisions and observations.
Throughout, M({0,1}×Ω, {0,1}) designates the set of all measurable functions D: {0,1}×Ω → {0,1}. Any element of M({0,1}×Ω, {0,1}) is called a decision, for obvious reasons given below. If D ∈ M({0,1}×Ω, {0,1}) then, for any ε ∈ {0,1}, D(ε) denotes the Bernoulli-distributed random variable D(ε): Ω → {0,1} defined for any given ω ∈ Ω by D(ε)(ω) = D(ε, ω). An n-dimensional test is hereafter any measurable function f: R^n → {0,1}, and M(R^n, {0,1}) stands for the set of all n-dimensional tests. A measurable function X: {0,1}×Ω → R^n is hereafter called an observation, and M({0,1}×Ω, R^n) denotes the set of all these observations. Given a test f ∈ M(R^n, {0,1}) and X ∈ M({0,1}×Ω, R^n), D = f(X) is trivially a decision: D ∈ M({0,1}×Ω, {0,1}). If X ∈ M({0,1}×Ω, R^n) then, for any ε ∈ {0,1}, X(ε) = X(ε, ·) ∈ M(Ω, R^n) is defined for every ω ∈ Ω by X(ε)(ω) = X(ε, ω).

Empirical means.
We define the empirical mean of a given sequence y = (y_n)_{n∈N} of real values as the sequence (⟨y⟩_n)_{n∈N} of real values such that, ∀n ∈ N, ⟨y⟩_n := (1/n) ∑_{i=1}^n y_i. By extension, the empirical mean of a sequence Y = (Y_n)_{n∈N} of random variables, where each Y_n ∈ M(Ω, R), is the sequence (⟨Y⟩_n)_{n∈N} of random variables where, for each n ∈ N, ⟨Y⟩_n ∈ M(Ω, R) is defined by ⟨Y⟩_n := (1/n) ∑_{i=1}^n Y_i. Therefore, for any ω ∈ Ω, ⟨Y⟩_n(ω) := ⟨Y(ω)⟩_n with Y(ω) = (Y_n(ω))_{n∈N}. If Y = (Y_n)_{n∈N} is a sequence of observations (∀n ∈ N, Y_n ∈ M({0,1}×Ω, R)), we define the empirical mean of Y as the sequence (⟨Y⟩_n)_{n∈N} of observations such that, for ε ∈ {0,1}, ⟨Y⟩_n ∈ M({0,1}×Ω, R) with ⟨Y⟩_n(ε) = ⟨Y(ε)⟩_n and Y(ε) = (Y_n(ε))_{n∈N}.

Preordered sets.
Given a preordered set (E, ≼) and A ⊂ E, the set of maximal elements of A is denoted by max(A, (E, ≼)), the set of upper bounds of A is denoted by upper(A, (E, ≼)), and the set of least upper bounds of A in (E, ≼) is denoted by sup(A, (E, ≼)).

The Multiplicity Principle (MP) comes from [4]. It proposes a categorical approach to the biological degeneracy principle, which ensures a kind of flexible redundancy. Roughly, the MP, in a category C, ensures the existence of structurally non-isomorphic diagrams with the same colimit. A formal definition relies on the notion of a cluster between diagrams in a category C.

Definition 1 (Cluster). Let D: D → C and E: E → C be two (small) diagrams. A cluster G: D → E is a maximal set G = {f: D(d) → E(e) | d ∈ D, e ∈ E, f ∈ C} such that:
(i) for all d ∈ D, there exist e ∈ E and g: D(d) → E(e) such that g ∈ G;
(ii) if G(d) denotes the subset of G consisting of the arrows g: D(d) → E(e) associated to the same d, then G(d) is included in a connected component of the comma-category (D(d) | E);
(iii) if g: D(d) → E(e) ∈ G(d) and ε: e → e′ ∈ E, then E(ε) ∘ g ∈ G(d);
(iv) if δ: d′ → d ∈ D and g: D(d) → E(e) ∈ G(d), then g ∘ D(δ) ∈ G(d′).

For instance, a connected cone from c to D can be seen as a cluster from the constant functor Δ(c) to D; and any cocone from E to c is a cluster E → Δ(c).

Remark 1.
Adjacent clusters can be composed: a cluster G_1: D → E and a cluster G_2: E → F can be composed to a cluster G_2 ∘ G_1. We can then consider a category of clusters of C, whose objects are the (small) diagrams D → C, and in which an arrow D → E is a cluster. This category is isomorphic to the free cocompletion of C [4].

A cluster G: D → E defines a functor Ω_G: Cocones(E) → Cocones(D) mapping a cocone α to the cocone α ∘ G (composite of α, seen as a cluster, and G, which is a cluster).

Definition 2 (Multiplicity principle (MP)). A category C satisfies the multiplicity principle (MP) if there exist two diagrams D: D → C and E: E → C such that:
(i) Cocones(D) ≅ Cocones(E);
(ii) there is no cluster G: D → E nor G: E → D such that Ω_G is an isomorphism.

That D and E have the same cocones translates the ability of both systems to accomplish the same function. The absence of clusters between D and E that define an isomorphism reflects the structural difference between D and E, which is key to robustness and adaptability: if the system described by E fails, then D may replace it.

The main purpose of this paper is to find a meaningful instance of the MP in some preorder. In the following, we do not distinguish between a preorder and its associated category.
Proposition 1 (MP in a preorder). Let (E, ≤) be a preorder. If there are two disjoint subsets A, B ⊂ E such that the following conditions hold, then E satisfies the MP:
(i) A and B have the same sets of upper bounds;
(ii) there is an a ∈ A with no upper bound in B;
(iii) there is a b ∈ B with no upper bound in A.
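The conditions of Proposition 1 can be checked mechanically on small finite examples. The following sketch is ours and not part of the paper: it encodes a finite preorder as a set of pairs (x, y) meaning x ≤ y, and tests conditions (i)-(iii) on two candidate subsets.

```python
def upper_bounds(S, E, leq):
    """Elements of E lying above every element of S."""
    return {x for x in E if all((s, x) in leq for s in S)}

def satisfies_mp(A, B, E, leq):
    """Sufficient conditions of Proposition 1 for the MP in the preorder (E, leq)."""
    same_upper = upper_bounds(A, E, leq) == upper_bounds(B, E, leq)   # condition (i)
    a_isolated = any(all((a, b) not in leq for b in B) for a in A)    # condition (ii)
    b_isolated = any(all((b, a) not in leq for a in A) for b in B)    # condition (iii)
    return same_upper and a_isolated and b_isolated

# Toy preorder: a1, a2, b1, b2 all sit below a common top t; no other relations.
E = {"a1", "a2", "b1", "b2", "t"}
leq = {(x, x) for x in E} | {(x, "t") for x in E}

# {a1, a2} and {b1, b2} are 'structurally different' yet share the upper bounds {t}.
print(satisfies_mp({"a1", "a2"}, {"b1", "b2"}, E, leq))  # True
```

Here A and B play, in miniature, the role of the two families of decisions exhibited later (NP and RDT): two incomparable subsets with the same upper bounds.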
Proof. Condition (i) ensures that A and B have isomorphic categories of cocones. Conditions (ii) and (iii) respectively ensure that there is no cluster i_A → i_B nor i_B → i_A, where i_A: A ↪ E and i_B: B ↪ E are the inclusion functors.

Albeit trivial, the following lemma will be helpful.

Lemma 1. Given a preordered set (E, ≼), if A and B are two subsets of E such that (A × B) ∩ ≼ = ∅ and sup(A, (E, ≼)) = sup(B, (E, ≼)), then E satisfies the MP.

Let ε ∈ {0, 1} be the unknown indicator value of whether a certain physical phenomenon has occurred (ε = 1) or not (ε = 0). A decision D ∈ M({0,1}×Ω, {0,1}) aims at estimating ε. However, whatever D, the decision is erroneous for any ω ∈ Ω such that D(ε, ω) ≠ ε. We thus have two distinct cases.

False alarm probability: If ε = 0 and D(0, ω) = 1, we commit a false alarm or error of the 1st kind, since we have erroneously decided that the phenomenon has occurred while nothing actually happened. We thus define the false alarm probability (aka size, aka error probability of the 1st kind) of D as:

P_FA[D] := P[D(0) = 1]   (1)

Detection probability: If ε = 1 and D(1, ω) = 0, we commit an error of the 2nd kind, also called a missed detection since, in this case, we have missed the occurrence of the phenomenon. As often in the literature on the topic, we prefer to use the probability of correctly detecting the phenomenon and we define the detection probability as:

P_DET[D] := P[D(1) = 1]   (2)

Levels γ ∈ (0, 1) and oracles

Among all possible decisions, the omniscient oracle D* ∈ M({0,1}×Ω, {0,1}) is defined for any pair (ε, ω) ∈ {0,1}×Ω by setting D*(ε, ω) = ε. Its probability of false alarm is 0 and its probability of detection is 1: P_FA[D*] = 0 and P_DET[D*] = 1. This omniscient oracle has no practical interest since it knows ε. That's not really fair! Since it is not possible in practice to guarantee a null false alarm probability, we focus on decisions whose false alarm probabilities are upper-bounded by a real number γ ∈ (0, 1) called the level. We state the following definition.

Definition 3 (Level). Given γ ∈ (0, 1), we say that D ∈ M({0,1}×Ω, {0,1}) has level γ if P_FA[D] ≤ γ. The set of all decisions with level γ is denoted by Dec_γ.

We can easily prove the existence of an infinite number of elements in Dec_γ that all have a detection probability equal to 1. Whence the following definition.

Definition 4. Given γ ∈ (0, 1), an oracle with level γ is any decision D ∈ Dec_γ such that P_DET[D] = 1. The set of all the oracles with level γ is denoted by O_γ.

Oracles with level γ have no practical interest either, since they require prior knowledge of ε! Therefore, we restrict our attention to decisions in Dec_γ that "approximate" at best the oracles with level γ, without prior knowledge of ε, of course. To this end, we must preorder decisions.

Lemma-Definition 1 (Total preorder (Dec_γ, ≼)). For any given γ ∈ (0, 1) and any pair (D, D′) ∈ Dec_γ × Dec_γ, we define a preorder (Dec_γ, ≼) by setting:

D ≼ D′ if P_DET[D] ≤ P_DET[D′].   (3)

We write D ≅ D′ if D ≼ D′ and D′ ≼ D.

In practice, observations help us decide whether the phenomenon has occurred or not. By collecting a certain number of them, we can expect to make a decision. Hereafter, observations are assumed to be elements of M({0,1}×Ω, R) and corrupted versions of ε. We suppose that we have a sequence (Y_n)_{n∈N} of such random variables. As a first standard model, we could assume that, for any n ∈ N and any (ε, ω) ∈ {0,1}×Ω, Y_n(ε, ω) = ε + X_n(ω) with X_1, X_2, …, X_n, … iid∼ N(0, 1). In this additive model, X_n models noise on the nth observation. We could make this model more complicated and realistic by considering random vectors instead of variables. However, with respect to our purpose, the significant improvement we can bring to the model is elsewhere. Indeed, we have assumed above that the signal, regardless of noise, is ε. However, from a practical point of view, it is more realistic to assume that the nth observation Y_n captures ε in the presence of some interference Δ_n, independent of X_n.
In practice, the probability distribution of Δ_n will hardly be known and, as a means to compensate for this lack of knowledge, we assume the existence of a uniform bound on the amplitude of all possible interferences. Therefore, we assume that, for all (ε, ω) ∈ {0,1}×Ω, Y_n(ε, ω) = ε + X_n(ω) + Δ_n(ω), and the existence of q ∈ [0, ∞) such that Δ_n ∈ B_∞(q). After all, this model is standard in time series analysis: ε plays the role of a trend, Δ_n is the seasonal variation and X_n is the measurement noise.

For each q ∈ [0, ∞), Seq_q henceforth designates the set of all the sequences Y = (Y_n)_{n∈N} ∈ M({0,1}×Ω, R)^N such that, ∀n ∈ N and ∀(ε, ω) ∈ {0,1}×Ω, Y_n(ε, ω) = ε + Δ_n(ω) + X_n(ω), where Δ_n ∈ B_∞(q) and X_n ∼ N(0, 1) are independent. Therefore, for all n ∈ N and all ε ∈ {0,1}, Y_n(ε) = ε + Δ_n + X_n, with X_1, X_2, …, X_n, … iid∼ N(0, 1).

For any sequence Y = (Y_n)_{n∈N} ∈ M({0,1}×Ω, R)^N, we henceforth set:

Y_(n) = (Y_1, Y_2, …, Y_n)   (4)

In other words, Y_(n) is the truncated version of the original sequence Y at the nth term.

Definition 5 (Selectivity of a test). Given any n ∈ N and any test f ∈ M(R^n, {0,1}), the selectivity of f at given level γ ∈ (0, 1) is defined as the set:

Sel_γ(f) := { q ∈ [0, 1/2) : ∀Y ∈ Seq_q, f(Y_(n)) ∈ Dec_γ }

The relevance of the interval [0, 1/2) in the definition above will become clear in Section 6.2.
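The model behind Seq_q is easy to simulate. The following Monte Carlo sketch is ours (illustrative only, not from the paper): it draws Y_i(ε) = ε + Δ_i + X_i with a bounded interference Δ_i ∈ [−q, q] and estimates the probability that a simple mean-threshold test decides 1. With ε = 0 this estimates a false alarm probability; with ε = 1, a detection probability.

```python
import math
import random

def phi(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def inv_phi(p):
    """Inverse standard normal cdf by bisection (illustrative, not production-grade)."""
    lo, hi = -10.0, 10.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def reject_rate(n, eps, q, gamma, trials=20000, seed=1):
    """Fraction of trials on which the mean-threshold test decides 1, under
    Y_i(eps) = eps + Delta_i + X_i with |Delta_i| <= q and X_i iid N(0, 1).
    With eps = 0 this estimates P_FA; with eps = 1 it estimates P_DET."""
    rng = random.Random(seed)
    thr = math.sqrt(n) * inv_phi(1.0 - gamma)
    hits = 0
    for _ in range(trials):
        total = sum(eps + rng.uniform(-q, q) + rng.gauss(0.0, 1.0) for _ in range(n))
        hits += total > thr
    return hits / trials
```

With q = 0 the estimated false alarm probability sits near the target γ; for q > 0 the interference can push it beyond γ, which is precisely what the selectivity Sel_γ(f) keeps track of.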
Definition 6 (Landscapes of tests). Given any n ∈ N and any test f ∈ M(R^n, {0,1}), the landscape of f at given level γ ∈ (0, 1) is the subset of Dec_γ defined by:

Lnd_γ(f) := ⋃_{q ∈ Sel_γ(f)} { f(Y_(n)) : Y ∈ Seq_q }   (5)

The total landscape covered by all the tests f ∈ M(R^n, {0,1}), n ∈ N, is defined by setting:

Lndscp_γ := ⋃_{n ∈ N} { Lnd_γ(f) : f ∈ M(R^n, {0,1}) }   (6)

This notion of landscape makes it possible to compare tests via the following preorder. The proofs that the following definition is consistent and that the next lemma holds true are left to the reader.

Definition 7 (Preorder (Dec_γ, ≼*)). Given any level γ ∈ (0, 1), we define the preorder (Dec_γ, ≼*) via the following three properties:
(P1) ∀n ∈ N, ∀(f, g) ∈ M(R^n, {0,1}) × M(R^n, {0,1}), Lnd_γ(f) ≼* Lnd_γ(g) if: Sel_γ(f) = Sel_γ(g) and ∀q ∈ Sel_γ(f), ∀Y ∈ Seq_q, f(Y_(n)) ≼ g(Y_(n));
(P2) ∀(L, L′) ∈ (Lndscp_γ ∪ O_γ) × O_γ, L ≼* L′;
(P3) ∀L ∈ Dec_γ \ (Lndscp_γ ∪ O_γ), L ≼* L.

Lemma 2. ∀(L, L′) ∈ (Lndscp_γ ∪ O_γ) × (Lndscp_γ ∪ O_γ), L ≼* L′ ⇒ L × L′ ⊂ ≼.

With this material, we can state our first result, which will prove useful in applications to statistical decisions below.
Theorem 1 (Approximation of oracles in (Dec_γ, ≼*)). Given γ ∈ (0, 1), if a set X_γ and a family of tests (f_{ξ,n})_{ξ∈X_γ, n∈N} satisfy:
(i) ∀(ξ, n) ∈ X_γ × N, f_{ξ,n} ∈ M(R^n, {0,1});
(ii) ∃Q_γ ⊂ [0, ∞), ∀(ξ, n) ∈ X_γ × N, Sel_γ(f_{ξ,n}) = Q_γ;
(iii) ∀(ξ, q) ∈ X_γ × Q_γ, ∀Y ∈ Seq_q, lim_{n→∞} P_DET[f_{ξ,n}(Y_(n))] = 1;
then, by setting Lndscp′_γ = { Lnd_γ(f_{ξ,n}) : n ∈ N, ξ ∈ X_γ }, we have:

O_γ = upper(Lndscp′_γ, (Dec_γ, ≼*)) = sup(Lndscp′_γ, (Dec_γ, ≼*))   (7)

Proof.
For any (ξ, n) ∈ X_γ × N and any L ∈ O_γ, (P2) in Definition 7 straightforwardly implies that Lnd_γ(f_{ξ,n}) ≼* L. As a consequence:

O_γ ⊂ upper(Lndscp′_γ, (Dec_γ, ≼*))   (8)

To prove the converse inclusion, consider some L ∈ upper(Lndscp′_γ, (Dec_γ, ≼*)). We thus have ∀(ξ, n) ∈ X_γ × N, Lnd_γ(f_{ξ,n}) ≼* L. According to Lemma 2, we have ∀(ξ, n) ∈ X_γ × N, Lnd_γ(f_{ξ,n}) × L ⊂ ≼. Therefore, ∀(ξ, n) ∈ X_γ × N, ∀q ∈ Sel_γ(f_{ξ,n}), ∀Y ∈ Seq_q and ∀D ∈ L, f_{ξ,n}(Y_(n)) ≼ D. It follows from the definition of ≼ and assumption (ii) above that:

∀(ξ, n) ∈ X_γ × N, ∀q ∈ Q_γ, ∀Y ∈ Seq_q, ∀D ∈ L, P_DET[f_{ξ,n}(Y_(n))] ≤ P_DET[D]

Letting n → ∞, assumption (iii) yields P_DET[D] = 1, so that D ∈ O_γ. It follows that L ∈ O_γ. We obtain that upper(Lndscp′_γ, (Dec_γ, ≼*)) ⊂ O_γ and therefore, from (8), O_γ = upper(Lndscp′_γ, (Dec_γ, ≼*)). The second equality in (7) is straightforward since the elements of O_γ are isomorphic in the sense of ≼*.

For later use, given J ⊂ [0, ∞), n ∈ N and F ⊂ M(R^n, {0,1}), we hereafter set:

Lndscps^J_γ(F) := { Lnd_γ(f) ∈ Lndscp_γ : f ∈ F, Sel_γ(f) = J }   (9)

When n spans N, the Neyman-Pearson (NP) Lemma makes it possible to pinpoint a maximal element in each (Lndscps^{{0}}_γ(F), ≼*) with F = M(R^n, {0,1}). These maximal elements are hereafter called NP decisions. Specifically, we have the following result.

Lemma 3 (Maximality of the NP decisions).
For any γ ∈ (0, 1) and any n ∈ N,

Lnd_γ(f^{NP(γ)}_n) = max(Lndscps^{{0}}_γ(M(R^n, {0,1})), ≼*)   (10)

where f^{NP(γ)}_n ∈ M(R^n, {0,1}) is the n-dimensional NP test with size γ defined by:

∀(y_1, y_2, …, y_n) ∈ R^n, f^{NP(γ)}_n(y_1, y_2, …, y_n) = 1 if ∑_{i=1}^n y_i > √n Φ^{-1}(1 − γ), and 0 otherwise   (11)

and satisfies, ∀Y ∈ Seq_0:

P_FA[f^{NP(γ)}_n(Y_(n))] = γ and P_DET[f^{NP(γ)}_n(Y_(n))] = 1 − Φ(Φ^{-1}(1 − γ) − √n)

Proof.
A direct application of the Neyman-Pearson Lemma [8, Theorem 3.2.1, page 60], followed by some standard algebra to obtain P_DET[f^{NP(γ)}_n(Y_(n))].

The next result states that it suffices to increase the number of observations to approximate oracles with level γ by NP decisions.

Theorem 2 (Approximation of oracles with level γ by NP decisions in (Dec_γ, ≼*)). Setting
Lnd^{NP(γ)} := { Lnd_γ(f^{NP(γ)}_n) : n ∈ N }

for any γ ∈ (0, 1), we have:

O_γ = upper(Lnd^{NP(γ)}, (Dec_γ, ≼*)) = sup(Lnd^{NP(γ)}, (Dec_γ, ≼*))

Proof.
Given γ ∈ (0, 1), set X_γ = {0} and, ∀n ∈ N, f_{0,n} = f^{NP(γ)}_n. According to Lemma 3:

lim_{n→∞} P_DET[f^{NP(γ)}_n(Y_(n))] = lim_{n→∞} (1 − Φ(Φ^{-1}(1 − γ) − √n)) = 1

The conclusion then follows from Theorem 1.
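The convergence invoked in this proof can be checked numerically from the closed form in Lemma 3. The sketch below is ours (the bisection-based inverse cdf is included only to keep it self-contained in the standard library):

```python
import math

def phi(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def inv_phi(p):
    """Inverse standard normal cdf by bisection (illustrative, not production-grade)."""
    lo, hi = -10.0, 10.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def np_detection_probability(n, gamma):
    """P_DET of the n-dimensional NP test with size gamma (closed form of Lemma 3)."""
    return 1.0 - phi(inv_phi(1.0 - gamma) - math.sqrt(n))

# For gamma = 0.01, P_DET rises from about 0.09 at n = 1 to above 0.999 at n = 36,
# illustrating condition (iii) of Theorem 1.
```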
Problem statement.
The RDT theoretical framework is exposed in full detail in [9, 10]. To ease the reading of the present paper, we directly focus on the particular RDT problem that can be used in connection with the detection problem at stake.

In this respect, suppose that Z = Θ + W ∈ M(Ω, R^n), where Θ and W are independent elements of M(Ω, R^n). In the sequel, we assume that W ∼ N(0, I_n), I_n being the n × n identity matrix, and consider the mean testing problem of deciding on whether |⟨Θ⟩_n(ω)| ≤ τ (null hypothesis H_0) or |⟨Θ⟩_n(ω)| > τ (alternative hypothesis H_1), when we are given Z(ω) = Θ(ω) + W(ω), for ω ∈ Ω. The idea is that Θ oscillates uncontrollably around 0 and that only sufficiently large deviations of its empirical mean should be detected. This is a particular Block-RDT problem, following the terminology and definition given in [10]. This problem is summarized by dropping ω, as usual, and writing:

Observation: Z = Θ + W ∈ M(Ω, R^n), with Θ ∈ M(Ω, R^n), W ∼ N(0, I_n), Θ and W independent,
H_0: |⟨Θ⟩_n| ≤ τ,
H_1: |⟨Θ⟩_n| > τ.   (12)

Standard likelihood theory [8, 1, 2] does not make it possible to solve this problem. Fortunately, this problem can be solved via the Random Distortion Testing (RDT) framework, as follows.
We seek tests with guaranteed size and optimal power, in thesense specified below.
Definition 8 (Size for the mean testing problem). The size of f ∈ M(R^n, {0,1}) for testing the empirical mean of the signals Θ ∈ M(Ω, R^n) such that P[|⟨Θ⟩_n| ≤ τ] ≠ 0, given Z = Θ + W ∈ M(Ω, R^n) with W independent of Θ, is defined by:

α^[n](f) = sup_{Θ ∈ M(Ω, R^n): P[|⟨Θ⟩_n| ≤ τ] ≠ 0} P[f(Z) = 1 | |⟨Θ⟩_n| ≤ τ]   (13)

We say that f ∈ M(R^n, {0,1}) has level (resp. size) γ if α^[n](f) ≤ γ (resp. α^[n](f) = γ). The class of all the tests with level γ is denoted by Tests^[n]_γ:

Tests^[n]_γ = { f ∈ M(R^n, {0,1}) : α^[n](f) ≤ γ }

Definition 9 (Power for the mean testing problem). The power of f ∈ M(R^n, {0,1}) for testing the empirical mean of Θ ∈ M(Ω, R^n) such that P[|⟨Θ⟩_n| > τ] ≠ 0, when we are given Z = Θ + W ∈ M(Ω, R^n) with W independent of Θ, is defined by:

β^[n]_Θ(f) = P[f(Z) = 1 | |⟨Θ⟩_n| > τ]   (14)
With the same notation as above, we can easily construct a preorder (cid:16)
Tests [ n ] γ , (cid:22) (cid:5) (cid:17) by setting: ∀ ( f , f (cid:48) ) ∈ Tests [ n ] γ × Tests [ n ] γ , f (cid:22) (cid:5) f (cid:48) if ∀ Θ ∈ M ( Ω , R n ) , P (cid:2) |(cid:104) Θ (cid:105) n | > τ (cid:3) (cid:54) = ⇒ β [ n ] Θ ( f ) (cid:54) β [ n ] Θ ( f (cid:48) ) astor, Beurier, Ehresmann & Waldeck (cid:16) Tests [ n ] γ , (cid:22) (cid:5) (cid:17) . However, we can exhibit C [ n ] γ ⊂ Tests [ n ] γ whose elementssatisfy suitable invariance properties with respect to the mean testing problem and prove the existence ofa maximal element in (cid:16) C [ n ] γ , (cid:22) (cid:5) (cid:17) .Set S = (cid:8) id , − id (cid:9) where id is the identity of R . Endowed with the usual composition law ◦ offunctions, ( S , ◦ ) is a group. Let A be the group action that associates to each given s ∈ S the map A s : R n → R n defined for every x = ( x , x , . . . , x n ) ∈ R n by A s ( x ) = ( s ( x ) , s ( x ) , . . . , s ( x n )) . Readily,the mean testing problem is invariant under the action of A in that A s ( Z ) = A s ( Θ ) + W (cid:48) where W (cid:48) =( W (cid:48) , W (cid:48) , . . . , W (cid:48) n ) ∼ N ( , I n ) is independent of A s ( Θ ) . Therefore, A s ( Z ) satisfies the same hypothesesas Z . We also have |(cid:104) A s ( Θ ) (cid:105) n | = |(cid:104) Θ (cid:105) n | . Hence, the mean testing problem remains unchanged bysubstituting A s ( Θ ) for Θ and W (cid:48) for W . It is thus natural to seek A -invariant tests, that is, tests f ∈ M ( R n , { , } ) such that f ( A s ( x )) = f ( x ) for any s ∈ S and any x ∈ R n .On the other hand, since we can reduce the noise variance by averaging observations, we con-sider A -invariant integrator tests, that is, A -invariant tests f ∈ M ( R n , { , } ) for which exists f ∈ M (cid:0) R , { , } (cid:1) , henceforth called the reduced form of f , such that f ( xxx ) = f ( (cid:104) xxx (cid:105) n ) for any xxx ∈ R n . 
Reduced forms of A-invariant integrator tests are also A-invariant: ∀x ∈ R, ∀s ∈ S, f°(s(x)) = f°(x). We thus define C^[n]_γ ⊂ Tests^[n]_γ as the class of all A-invariant integrator tests with level γ. We thus have f ∈ C^[n]_γ if:
[Size]: α^[n](f) ≤ γ;
[A-invariance]: ∀(s, x) ∈ S × R^n, f(A_s(x)) = f(x);
[Integration]: ∃f° ∈ M(R, {0,1}), ∀x ∈ R^n, f(x) = f°(⟨x⟩_n).

The following result derives from the foregoing and [9, 10].

Proposition 2 (Maximal element of (C^[n]_γ, ≼⋄)). For any γ ∈ (0, 1) and any n ∈ N,

{ f^{RDT(γ,τ)}_n } = max(C^[n]_γ, ≼⋄)   (15)

where f^{RDT(γ,τ)}_n ∈ M(R^n, {0,1}) is defined by setting, ∀(y_1, y_2, …, y_n) ∈ R^n:

f^{RDT(γ,τ)}_n(y_1, y_2, …, y_n) = 0 if |∑_{i=1}^n y_i| ≤ √n λ_γ(τ√n), and 1 otherwise

and λ_γ(τ√n) is the unique solution in x to the equation 2 − Φ(x − τ√n) − Φ(x + τ√n) = γ, where Φ is the cumulative distribution function (cdf) of the N(0, 1) law.

RDT and NP tests are structurally different because they are dedicated to two different testing problems and optimal with respect to two different criteria. This structural difference will be enhanced by coming back to our initial detection problem.
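Since x ↦ 2 − Φ(x − ρ) − Φ(x + ρ) decreases from 1 (at x = 0) to 0, the threshold λ_γ(ρ) of Proposition 2 can be computed by bisection. The sketch below is ours (function names hypothetical), using only the standard library:

```python
import math

def phi(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def rdt_threshold(gamma, rho):
    """lambda_gamma(rho): the unique x with 2 - phi(x - rho) - phi(x + rho) = gamma."""
    lo, hi = 0.0, rho + 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if 2.0 - phi(mid - rho) - phi(mid + rho) > gamma:
            lo = mid   # left-hand side too large: move the threshold up
        else:
            hi = mid
    return 0.5 * (lo + hi)

def f_rdt(y, gamma, tau):
    """Sketch of the RDT test of Proposition 2 on a sample y = (y_1, ..., y_n)."""
    n = len(y)
    lam = rdt_threshold(gamma, tau * math.sqrt(n))
    return 0 if abs(sum(y)) <= math.sqrt(n) * lam else 1
```

For ρ = 0 the equation reduces to 2(1 − Φ(x)) = γ, so λ_γ(0) = Φ^{-1}(1 − γ/2), the usual two-sided Gaussian threshold; for ρ > 0 the threshold inflates to absorb the tolerated distortion.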
Consider again the problem of estimating ε ∈ {0, 1} when we have a sequence Y ∈ Seq_q of observations such that:

∀n ∈ N, ∀(ε, ω) ∈ {0,1}×Ω, Y_n(ε, ω) = ε + Δ_n(ω) + X_n(ω)   (16)

where X_1, X_2, … iid∼ N(0, 1) and, ∀n ∈ N, Δ_n ∈ B_∞(q) with q ∈ [0, ∞). The empirical mean of Y satisfies: ∀n ∈ N, ⟨Y⟩_n(ε) = ⟨Y(ε)⟩_n = ε + ⟨Δ⟩_n + ⟨X⟩_n. We thus have |⟨Δ⟩_n| ≤ q (a.s.). Set Θ_n = ε + Δ_n for every n ∈ N. In the sequel, we assume q < 1/2, so that:

ε = 0 ⇔ |⟨Θ⟩_n| ≤ q,
ε = 1 ⇔ |⟨Θ⟩_n| ≥ 1 − q.   (17)

Therefore, when q ∈ [0, 1/2), deciding on whether ε is zero or not when we are given Y_(n)(ω) amounts to testing whether |⟨Θ⟩_n(ω)| ≤ τ or not, for τ ∈ [q, 1 − q]. We thus can use the decision f^{RDT(γ,τ)}_n(Y_(n)), where f^{RDT(γ,τ)}_n is given by Proposition 2.

We can calculate the false alarm probability (1) of f^{RDT(γ,τ)}_n(Y_(n)), where Y_(n) is defined by (4). The theoretical results in [9] yield that, ∀τ ∈ [q, 1 − q], P_FA[f^{RDT(γ,τ)}_n(Y_(n))] ≤ γ. In the sequel, for the sake of simplifying notation, we assume that both τ and q are in [0, 1/2). In this case, we have, ∀τ ∈ [0, 1/2):

Sel_γ(f^{RDT(γ,τ)}_n) = [0, τ]

Lnd_γ(f^{RDT(γ,τ)}_n) = ⋃_{q ∈ [0,τ]} { f^{RDT(γ,τ)}_n(Y_(n)) : Y ∈ Seq_q }   (18)

We can then state the following result, which is the counterpart to Lemma 3.

Theorem 3 (Maximality of RDT decisions). For any γ ∈ (0, 1), any n ∈ N and any 0 ≤ q ≤ τ < 1/2,

Lnd_γ(f^{RDT(γ,τ)}_n) = max(Lndscps^{[0,τ]}_γ(C^[n]_γ), ≼*).

Proof.
It results from Definition 6 that $\mathrm{Lnd}_\gamma(f) := \big\{ f(\boldsymbol{Y}_n) : Y \in \mathrm{Seq}_q, \ q \in [0,\tau] \big\}$. According to (9), we also have:
$$\mathrm{Lndscps}^{[0,\tau]}_\gamma\big(\mathcal{C}^{[n]}_\gamma\big) = \big\{ \mathrm{Lnd}_\gamma(f) \in \mathrm{Lndscp}_\gamma : f \in \mathcal{C}^{[n]}_\gamma, \ \mathrm{Sel}_\gamma(f) = [0,\tau] \big\}$$
Given $q \in [0,\tau]$ and $Y \in \mathrm{Seq}_q$, set:
$$Z = \boldsymbol{Y}_n = (Y_1, Y_2, \ldots, Y_n) \ \text{(see (4))}, \quad W = (X_1, X_2, \ldots, X_n) \sim \mathcal{N}(0, \mathbf{I}_n), \quad \Theta = (\varepsilon + \Delta_1, \varepsilon + \Delta_2, \ldots, \varepsilon + \Delta_n)$$
We basically have $Z = \Theta + W$. Consider now the mean testing problem (12) with $\Theta$, $W$ and $Z$ defined as above. For any $f \in \mathcal{M}(\mathbb{R}^n, \{0,1\})$, it follows from Eqs. (16), (17), (2) and (13) that:
$$\beta^{[n]}_\Theta(f) = P_{\mathrm{DET}}[f(\boldsymbol{Y}_n)] \qquad (19)$$
Suppose now that $f \in \mathcal{C}^{[n]}_\gamma$ with $\mathrm{Sel}_\gamma(f) = [0,\tau]$. We derive from Proposition 2, (19) and its application to $f^{\mathrm{RDT}}_{(\gamma,\tau),n}$ that $P_{\mathrm{DET}}[f(\boldsymbol{Y}_n)] \le P_{\mathrm{DET}}\big[f^{\mathrm{RDT}}_{(\gamma,\tau),n}(\boldsymbol{Y}_n)\big]$. Since $q \le \tau < 1/2$ implies $q \in \mathrm{Sel}_\gamma(f)$, and since $\mathrm{Sel}_\gamma(f) = \mathrm{Sel}_\gamma\big(f^{\mathrm{RDT}}_{(\gamma,\tau),n}\big) = [0,\tau]$, we can rewrite the foregoing inequality as $f(\boldsymbol{Y}_n) \preceq f^{\mathrm{RDT}}_{(\gamma,\tau),n}(\boldsymbol{Y}_n)$. This holds true for any $q \in \mathrm{Sel}_\gamma(f)$ and any $Y \in \mathrm{Seq}_q$; since $f$ and $f^{\mathrm{RDT}}_{(\gamma,\tau),n}$ have the same selectivity $[0,\tau]$, we derive from the foregoing and Definition 7 that $\mathrm{Lnd}_\gamma(f) \preceq^* \mathrm{Lnd}_\gamma\big(f^{\mathrm{RDT}}_{(\gamma,\tau),n}\big)$. □

We now show that oracles with level $\gamma$ are approximated by RDT decisions.

Lemma 4 (Approximation of oracles with level $\gamma$ by RDT decisions in $(\mathrm{Dec}_\gamma, \preceq^*)$). Setting
$$\mathrm{Lnd}^{\mathrm{RDT}}(\gamma,\tau) := \big\{ \mathrm{Lnd}_\gamma\big(f^{\mathrm{RDT}}_{(\gamma,\tau),n}\big) : n \in \mathbb{N} \big\}$$
for any given $\gamma \in (0,1)$, we have:
$$\mathcal{O}_\gamma = \mathrm{upper}\big(\mathrm{Lnd}^{\mathrm{RDT}}(\gamma,\tau), (\mathrm{Dec}_\gamma, \preceq^*)\big) = \sup\big(\mathrm{Lnd}^{\mathrm{RDT}}(\gamma,\tau), (\mathrm{Dec}_\gamma, \preceq^*)\big)$$

Proof.
Given $\gamma \in (0,1)$, it follows from (2) and [9, Theorem 2] that:
$$\forall (q,\tau) \in [0,1/2) \times [0,1/2), \ \forall n \in \mathbb{N}, \quad P_{\mathrm{DET}}\big[f^{\mathrm{RDT}}_{(\gamma,\tau),n}(\boldsymbol{Y}_n)\big] \ge Q_{1/2}\big((1-q)\sqrt{n}, \lambda_\gamma(\tau\sqrt{n})\big)$$
Since $\tau < 1 - q$, [7, Eq. (3) and Lemma B.2] imply that $\lim_{n \to \infty} P_{\mathrm{DET}}\big[f^{\mathrm{RDT}}_{(\gamma,\tau),n}(\boldsymbol{Y}_n)\big] = 1$. The set $\mathrm{Lnd}^{\mathrm{RDT}}(\gamma,\tau) \subset \mathrm{Lndscp}_\gamma$ thus satisfies the conditions of Theorem 1 with $X_\gamma = \{\tau\}$ and, for all $n \in \mathbb{N}$, $f_{n,\tau} = f^{\mathrm{RDT}}_{(\gamma,\tau),n}$. □

To state the MP in $(\mathrm{Dec}_\gamma, \preceq^*)$, we need the following lemma.

Lemma 5 (Selectivity of NP tests). $\forall n \in \mathbb{N}, \ \mathrm{Sel}_\gamma\big(f^{\mathrm{NP}(\gamma)}_n\big) = \{0\}$.

Proof.
A consequence of [9, Section B, p. 6]. □

We now have all the material to state the main result.
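To make the contrast between the two families concrete before the main result, here is a minimal Monte-Carlo sketch. It relies on two assumptions not restated in this section: that $f^{\mathrm{RDT}}_{(\gamma,\tau),n}$ reduces to comparing $\sqrt{n}\,|\langle Y \rangle_n|$ with the threshold $\lambda_\gamma(\tau\sqrt{n})$, where $\lambda_\gamma(\rho)$ solves $Q_{1/2}(\rho,\lambda) = \gamma$ (consistent with the bound from [9, Theorem 2] used above), and that $f^{\mathrm{NP}(\gamma)}_n$ is the classical one-sided Gaussian mean test. All function names are ours. The experiment illustrates Lemma 5: under a worst-case interference $\Delta_k \equiv q > 0$, the NP test loses its level-$\gamma$ guarantee, while the RDT test keeps $P_{\mathrm{FA}} \le \gamma$ whenever $q \le \tau$.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def marcum_q_half(rho, lam):
    """Generalized Marcum function of order 1/2: P(|Z + rho| > lam), Z ~ N(0, 1)."""
    return norm.sf(lam - rho) + norm.cdf(-lam - rho)

def lambda_gamma(gamma, rho):
    """Threshold lam >= 0 solving Q_{1/2}(rho, lam) = gamma (Q decreases in lam)."""
    return brentq(lambda lam: marcum_q_half(rho, lam) - gamma, 0.0, rho + 20.0)

n, gamma, tau, q, trials = 100, 0.05, 0.25, 0.2, 2000
lam_rdt = lambda_gamma(gamma, tau * np.sqrt(n))  # assumed RDT threshold
lam_np = norm.isf(gamma)                         # classical one-sided NP threshold

# eps = 0 with worst-case bounded interference Delta_k = q, as allowed by model (16)
rng = np.random.default_rng(0)
s = np.sqrt(n) * (q + rng.standard_normal((trials, n))).mean(axis=1)

fa_rdt = np.mean(np.abs(s) > lam_rdt)  # stays below gamma, since q <= tau
fa_np = np.mean(s > lam_np)            # far above gamma: the NP selectivity is {0}
print(fa_rdt, fa_np)
```

With these illustrative values, the RDT false-alarm rate remains below $\gamma = 0.05$ while the NP rate exceeds one half: the two tests are structurally different yet comparable in function, which is the behaviour behind the disjointness of the two landscapes in the theorem below.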
Theorem 4 (Multiplicity Principle in $(\mathrm{Dec}_\gamma, \preceq^*)$). For any given $\tau \in (0,1/2)$, the MP is satisfied in $(\mathrm{Dec}_\gamma, \preceq^*)$ by the pair $\big(\mathrm{Lnd}^{\mathrm{NP}}(\gamma), \mathrm{Lnd}^{\mathrm{RDT}}(\gamma,\tau)\big)$.

Proof. According to Theorems 2 and 3, the subsets $\mathrm{Lnd}^{\mathrm{NP}}(\gamma)$ and $\mathrm{Lnd}^{\mathrm{RDT}}(\gamma,\tau)$ of $2^{\mathrm{Dec}_\gamma}$ are such that
$$\sup\big(\mathrm{Lnd}^{\mathrm{NP}}(\gamma), (\mathrm{Dec}_\gamma, \preceq^*)\big) = \sup\big(\mathrm{Lnd}^{\mathrm{RDT}}(\gamma,\tau), (\mathrm{Dec}_\gamma, \preceq^*)\big) = \mathcal{O}_\gamma$$
In addition, (18) and Lemma 5 imply that $\big(\mathrm{Lnd}^{\mathrm{NP}}(\gamma) \times \mathrm{Lnd}^{\mathrm{RDT}}(\gamma,\tau)\big) \cap \preceq^* = \emptyset$. The conclusion follows from Lemma 1. □

In this paper, via the framework provided by the Multiplicity Principle (MP), which is motivated by the concept of degeneracy in biology, and by introducing the notions of test landscapes and selectivity, we have established that this principle is satisfied when we consider the standard NP tests and the RDT tests applied to a detection problem. One interest of this result is that it opens prospects on the construction of Memory Evolutive Systems [4, 5] via tests.

More elaborate statistical decision problems should be considered beyond this preliminary work. Sequential tests are particularly appealing because they collect information until they can decide with guaranteed performance bounds. On the one hand, the Sequential Probability Ratio Test (SPRT) established in [11] is proved to be optimal; on the other hand, in [7], we have exhibited non-optimal sequential tests with performance guarantees in the presence of interferences. In the same way as the NP and RDT tests satisfy the MP, we conjecture that these two types of sequential tests satisfy the MP as well.

From a practical point of view, such results open new prospects for the design of networks of sensors, where combining different types of sensors and tests satisfying the MP could bring resilience to the overall system.
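As a concrete starting point for that conjecture, the following sketch implements Wald's SPRT [11] for the interference-free version of the mean-testing problem above (unit-variance Gaussian samples, mean 0 against mean 1). The function name is ours, and the stopping thresholds use Wald's classical approximations with illustrative error levels `alpha` and `beta`.

```python
import numpy as np

def sprt_gaussian(ys, alpha=0.05, beta=0.05):
    """Wald's SPRT for H0: mean 0 vs H1: mean 1 on unit-variance Gaussian samples.

    Returns (decision, stopping_time); decision is None if the sample runs out."""
    upper = np.log((1 - beta) / alpha)  # crossing it accepts H1 (Wald's approximation)
    lower = np.log(beta / (1 - alpha))  # crossing it accepts H0
    llr = 0.0
    for t, y in enumerate(ys, start=1):
        llr += y - 0.5  # log N(y; 1, 1) - log N(y; 0, 1)
        if llr >= upper:
            return 1, t
        if llr <= lower:
            return 0, t
    return None, len(ys)

rng = np.random.default_rng(0)
decision, t = sprt_gaussian(1.0 + rng.standard_normal(10_000))  # samples drawn under H1
print(decision, t)
```

Unlike the fixed-sample NP and RDT tests, the SPRT stops as soon as the accumulated log-likelihood ratio leaves the interval $[\texttt{lower}, \texttt{upper}]$, which is what "collecting information until one can decide" means above.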
Acknowledgements
The authors are very grateful to the reviewers for their strong encouragement and insightful remarks, which helped improve the readability of this paper.
References

[1] A. A. Borovkov (1998): Mathematical Statistics. Gordon and Breach Science Publishers, doi:10.2307/3619119.
[2] M. L. Eaton (1983): Multivariate Statistics. A Vector Space Approach. Wiley. Available at https://projecteuclid.org/euclid.lnms/1196285102.
[3] Gerald M. Edelman & Joseph A. Gally (2001): Degeneracy and complexity in biological systems. Proceedings of the National Academy of Sciences 98(24), pp. 13763–13768, doi:10.1073/pnas.231499798.
[4] Andrée Ehresmann & Jean-Paul Vanbremeersch (2007): Memory Evolutive Systems; Hierarchy, Emergence, Cognition, first edition. Studies in Multidisciplinarity 4, Elsevier, doi:10.1016/S1571-0831(06)04001-9.
[5] Andrée Ehresmann & Jean-Paul Vanbremeersch (2019): MES: A Mathematical Model for the Revival of Natural Philosophy. Philosophies 4(1), doi:10.3390/philosophies4010009.
[6] Frank R. Hampel, Elvezio M. Ronchetti, Peter J. Rousseeuw & Werner A. Stahel (1986): Robust Statistics: The Approach Based on Influence Functions. John Wiley and Sons, New York, doi:10.1002/9781118186435.
[7] Prashant Khanduri, Dominique Pastor, Vinod Sharma & Pramod K. Varshney (2019): Sequential Random Distortion Testing of Non-Stationary Processes. IEEE Transactions on Signal Processing.
[8] E. L. Lehmann & Joseph P. Romano (2005): Testing Statistical Hypotheses, 3rd edition. Springer, doi:10.1007/0-387-27605-X.
[9] D. Pastor & Q.-T. Nguyen (2013): Random Distortion Testing and Optimality of Thresholding Tests. IEEE Transactions on Signal Processing.
[10] D. Pastor & F.-X. Socheleau (2018): Random distortion testing with linear measurements. Signal Processing.
[11] A. Wald (1945): Sequential Tests of Statistical Hypotheses. The Annals of Mathematical Statistics 16(2), pp. 117–186, doi:10.1214/aoms/1177731118.