Pragmatic hypotheses in the evolution of science
Luis G. Esteves, Rafael Izbicki, Rafael B. Stern, Julio M. Stern
University of São Paulo; Federal University of São Carlos
December 22, 2018
Corresponding author: Julio Michael Stern (J.M.S.)

Abstract
This paper introduces pragmatic hypotheses and relates this concept to the spiral of scientific evolution. Previous works determined a characterization of logically consistent statistical hypothesis tests and showed that the modal operators obtained from these tests can be represented in the hexagon of oppositions. However, despite the importance of precise hypotheses in science, they cannot be accepted by logically consistent tests. Here, we show that this dilemma can be overcome by the use of pragmatic versions of precise hypotheses. These pragmatic versions allow a level of imprecision in the hypothesis that is small relative to other experimental conditions. The introduction of pragmatic hypotheses allows the evolution of scientific theories based on statistical hypothesis testing to be interpreted using the narratological structure of hexagonal spirals, as defined by Pierre Gallais.
Standard hypothesis tests can lead to immediate logical incoherence, which makes their conclusions hard to interpret. This incoherence is a result of such tests having only two possible outcomes. Indeed, Izbicki and Esteves [2015] show that there exists no two-valued test that satisfies desirable statistical properties and that is also logically coherent.

In order to overcome such an impossibility result, Esteves et al. [2016] propose agnostic hypothesis tests, which have three possible outputs: (A) accept the hypothesis, say H, (E) reject H, or (Y) remain agnostic about H. These tests can be made logically coherent while preserving desirable statistical properties. For instance, both conditions are satisfied by the Generalized Full Bayesian Significance Test (GFBST). Furthermore, Stern et al. [2017] show that the GFBST's modal operators and their respective negations can be represented by vertices of the hexagon of oppositions [Blanché, 1966, Béziau, 2012, 2015, Carnielli and Pizzi, 2008, Dubois and Prade, 1982, 2012], which is depicted in fig. 1.

This paper complements the above static representation with an analysis of the GFBST in the dynamic evolution of scientific theories. The analysis is based on the metaphor of evolutive hexagonal spirals [Gallais and Pollina, 1974, Gallais, 1982], in which the logical modalities associated with scientific theories change over time, as in fig. 2. Our key point in this paradigm is reconciling two apparently contradictory facts. On the one hand, precise or sharp hypotheses, that is, hypotheses that have a priori zero probability, are central in scientific theories [Stern, 2011b, 2017]. On the other hand, the GFBST never accepts (A) precise hypotheses.
These observations lead to the apparent paradox that, if the GFBST were used to test scientific theories, then the acceptance step in the spiral of scientific theories would be forfeited.

Figure 1: Hexagons of Opposition for Statistical Modalities.

In order to overcome this paradox, we propose the concept of a pragmatic hypothesis associated with a precise hypothesis. While precise hypotheses are commonly obtained from mathematical theories used in areas of science and technology [Stern, 2011b, 2017], the associated pragmatic hypothesis is an imprecise hypothesis which is sufficiently good for the practical purposes of an end-user of the theories. For instance, Newtonian theory assumes a gravitational force of magnitude given by the equation $F = G m_1 m_2 d^{-2}$, where the gravitational constant $G$ has a precise value. However, the current CODATA (Committee on Data for Science and Technology) value for the gravitational constant is $G = 6.67408 \times 10^{-11}\ \mathrm{m^3\,kg^{-1}\,s^{-2}}$, which includes a standard deviation for the last significant digits, $408 \pm 31$. Hence, it may be reasonable for a given end-user to assume that the theoretical form of the last equation is exact, but that, pragmatically, the constant $G$ can only be known up to a chosen precision. As a result, one might wish to test an imprecise hypothesis associated with the scientific hypothesis of interest [DeGroot and Schervish, 2012, Berger, 2013].

This article advocates for the conceptual distinction between a precise scientific theory and an associated pragmatic hypothesis. The alternate use of precise and pragmatic versions of corresponding statistical hypotheses enables the GFBST to (pragmatically) accept scientific hypotheses. Moreover, this alternate use allows the GFBST to track the evolution of scientific theories, as interpreted in the context of Gallais' hexagonal spirals.

Our main goal in this paper is to formalize testing procedures for a theory taking into consideration the level of precision that is appropriate for a given end-user. In order to handle this problem, we consider the end-user's predictions about an experiment of his interest. The variation in these predictions can be explained by a combination of the level of imprecision in the theory and properties of the end-user's experiment. For instance, the latter source of variation is influenced by properties of the equipment, including precision, accuracy and resolution of measuring devices [Bucher, 2012, Czichos et al., 2011], and also by error bounds for fundamental constants and calibration factors [Cohen et al., 1957, Cohen, 1957, Lévy-Leblond, 1977, Pakkan and Akman, 1995, Akman and Pakkan, 1996, Wainwright, 2002, Bishop, 2006, Iordanov, 2010, Gelman et al., 2014].
We propose to choose a pragmatic hypothesis in such a way that the imprecision in the end-user's predictions is mostly due to his experimental conditions and not to the level of imprecision in the theory that he uses.

In order to develop this argument, section 2 first adapts Gallais's metaphor of hexagonal spirals to the evolution of science. Next, section 3 proposes three methods of decomposing the variability in an end-user's predictions into the level of precision of the theory he uses and his experimental conditions. Sections 3.1 and 3.2 use these decompositions in order to build pragmatic hypotheses. They first build pragmatic hypotheses for simple hypotheses and then prove that there exists a single way of extending this construction to composite hypotheses while preserving logical coherence in simultaneous hypothesis testing. This methodology is illustrated in section 4. All proofs are found in appendix A.

Following a well-established tradition in structural semantics and narratology [Greimas, 1983, Propp, 2000], Gallais and Pollina [1974] propose that many classical medieval tales follow the same organizational pattern. More precisely, these narratives exhibit an underlying intellectual structure and are organized according to an underlying archetypal format or prototypical pattern. This pattern includes both static and dynamical aspects. From a static perspective, the logical structure of the narrative is such that each arch is represented by a vertex of the hexagon of oppositions [Blanché, 1966]. The static hexagon of oppositions is depicted in fig. 1 and represents in each vertex a modal operator among necessity (□), possibility (♦), contingency (∆) and their negations (¬). These modal operators are structured according to three axes of opposition (===), a triangle of contrariety (− − −), another triangle of sub-contrariety (···), and several edges of subalternation (−→).
From a dynamical perspective, the temporal evolution of the narrative follows a spiral (fig. 2) that unwinds (se déroule) around concentric and expanding hexagons of opposition [Gallais and Pollina, 1974, Gallais, 1982].

Figure 2: Gallais' evolutionary spiral.

Since the evolution of science can also be conceived as following a spiral pattern [Stern, 2014], its analysis can benefit from the structure in [Gallais and Pollina, 1974, Gallais, 1982]. From a static perspective, the logical modalities induced by agnostic hypothesis tests [Stern et al., 2017] can be represented in the hexagon of oppositions. From a dynamic perspective, scientific theories evolve as a spiral which unwinds around the following states:

• A - Extant thesis: This vertex represents a standing paradigm, an accepted theory using well-known formalisms and familiar concepts, relying on accredited experimental means and methods, etc. In fact, the concepts of a current paradigm may become so familiar and look so natural that they become part of a reified ontology. That is, there is a perceived correspondence between concepts of the theory and Dinge an sich (things-in-themselves) as seen in nature [Stern, 2011a, 2014].
• U - Analysis: This vertex represents the moment when some hypotheses of the standing theory are put in question. At this moment, possible alternatives to the standing hypotheses may still be only vaguely defined.
• E - Antithesis: This vertex represents the moment when some laws of the standing theory have to be rejected. Such a rejection of old laws may put in question the entire world-view of the current paradigm, opening the way for revolutionary ideas, as described in the next vertex.
• O - Apothesis/Prosthesis: This vertex is the locus of revolutionary freedom. Alternative models are considered, and specific (precise) forms are investigated. There is intellectual freedom to set aside and dispose of (apothesis) old preconceptions, prejudices and stereotypes, and also to explore and investigate new paths, to put together (prosthesis) and try out new concepts and ideas.
• Y - Synthesis: It is at this vertex that new laws are formulated; this is the point of Eureka moment(s). A selection of old and new concepts seem to click into place, fitting together in the form of new laws, laws that are able to explain new phenomena and incorporate objects of an expanded reality.
• I - Enthesis: At this vertex new laws, concepts and methods must enter and be integrated into a consistent and coherent system. At this stage many tasks are performed in order to combine novel and traditional pieces or to accommodate original and conventional components into a well-integrated framework. Finally, new experimental means and methods are developed and perfected, allowing the new laws to be corroborated.
• A - New Thesis: At this vertex, the new theory is accepted as the standard paradigm that succeeds the preceding one (A). Acceptance occurs after careful determination of fundamental constants and calibration factors (including their known precision), metrological and instrumentational error bounds, etc. At later stages of maturity, equivalent theoretical frameworks may be developed using alternative formalisms and ontologies. For example, analytical mechanics offers variational alternatives that are (almost) equivalent to the classical formulation of Newtonian mechanics [Abraham and Marsden, 2013]. Usually, these alternative world-views reinforce the trust and confidence in the underlying laws. Nevertheless, the existence of such alternative perspectives may also foster exploratory efforts and investigative works in the next cycle of evolution.

Table 1 applies this spiral structure to the evolution of the theories of orbital astronomy and chemical affinity.
The evolution of orbital astronomy has been widely studied [Hawking, 2004]. The evolution of chemical affinity is presented in greater detail in Stern [2014], Stern and Nakano [2014].

Vertex                   | Orbital astronomy                                     | Chemical affinity
I - Enthesis/A - Thesis  | Ptolemaic/Copernican cycles and epicycles             | Geoffroy affinity table and highest rank substitution
U - Analysis             | Circular or oval orbits?                              | Ordinal or numeric affinity?
E - Antithesis           | Non-circular orbits                                   | Non-ordinal affinity
O - Apothesis/Prosthesis | Elliptic planetary orbits, focal centering of sun     | Integer affinity values, for arithmetic recombination
Y - Synthesis            | Kepler laws!                                          | Morveau rules and tables!
I - Enthesis/A - Thesis  | Vortex physics theories, Keplerian astronomy          | Affinity + stoichiometry substitution reactions
U - Analysis             | Tangential or radial forces?                          | Total or partial reaction?
E - Antithesis           | Non-tangential forces                                 | Non-total substitutions
O - Apothesis/Prosthesis | Radial attraction forces, inverse square of distance  | Reversible reactions, equilibrium conditions
Y - Synthesis            | Newton laws!                                          | Mass-Action kinetics!
I - Enthesis/A - Thesis  | Newtonian mechanics & variational equivalents         | Thermodynamic theories for reaction networks
Table 1: Evolution of orbital astronomy and chemical affinity.

The above spiral structure highlights that a statistical methodology should be able to obtain each of the six modalities in the hexagon of oppositions. Before an acceptance vertex (A) in the hexagon is reached by the spiral of scientific evolution, theoretically precise or sharp hypotheses must be formulated. However, a logically coherent hypothesis test, such as the GFBST, can choose solely between rejecting or remaining agnostic about (i.e. corroborating) such sharp hypotheses.
Once the evolving theory becomes (part of) a well-established paradigm, the GFBST can be used with the goal of accepting non-sharp hypotheses in the context of the same paradigm, a context that includes fundamental constants and calibration factors (and their respective uncertainties), metrological error bounds, specified accuracies of scientific instrumentation, etc. The non-sharp versions of sharp hypotheses used in such tests are called pragmatic, and their formulation is developed in the following sections.

In order to derive pragmatic hypotheses from precise ones, it is necessary to define an idealized future experiment. Let $\theta$ be an unknown parameter of interest which is used to express scientific hypotheses and that takes values in the parameter space, $\Theta$. A scientific hypothesis takes the form $H_0: \theta \in \Theta_0$, where $\Theta_0 \subset \Theta$. Whenever there is no ambiguity, $H_0$ and $\Theta_0$ are used interchangeably. Also, the determination of $\theta$ is useful for predicting an idealized future experiment, $Z$, which takes values in $\mathcal{Z}$. The uncertainty about $Z$ depends on $\theta$ by means of $\mathbb{P}_{\theta^*}$, the probability measure over $\mathcal{Z}$ when it is known that $\theta = \theta^*$, $\theta^* \in \Theta$.

Often, it is sufficient for an end-user to determine a pragmatic hypothesis, that is, that the parameter lies in a set of plausible values which is larger than the null hypothesis. This set can be chosen in such a way that the variation over predictions about a future experiment is mostly due to experimental conditions rather than to the imprecision in the value of the parameter. This section formally develops a methodology for determining these pragmatic hypotheses.

In order to compare two parameter values, we use a predictive dissimilarity, $d_Z$, which is a function $d_Z: \Theta \times \Theta \to \mathbb{R}^+$ such that $d_Z(\theta_0, \theta^*)$ measures how much the predictions made for $Z$ based on $\theta^*$ diverge from the ones made based on $\theta_0$. We define and compare three possible choices for such a dissimilarity.

Definition 3.1.
The Kullback-Leibler predictive dissimilarity, $KL_Z$, is
$$KL_Z(\theta_0, \theta^*) = KL(\mathbb{P}_{\theta^*}, \mathbb{P}_{\theta_0}) = \int_{\mathcal{Z}} \log\left(\frac{d\mathbb{P}_{\theta^*}}{d\mathbb{P}_{\theta_0}}\right) d\mathbb{P}_{\theta^*},$$
that is, $KL_Z(\theta_0, \theta^*)$ is the relative entropy between $\mathbb{P}_{\theta^*}$ and $\mathbb{P}_{\theta_0}$.

Example 3.2 (Gaussian with known variance). Let $Z = (Z_1, \ldots, Z_d) \sim N(\theta, \Sigma)$ be a random vector with a multivariate Gaussian distribution:
$$\frac{d\mathbb{P}_\theta(z)}{dz} = |2\pi\Sigma|^{-1/2} \exp\left(-\tfrac{1}{2}(z - \theta)^T \Sigma^{-1} (z - \theta)\right).$$
Then
$$KL_Z(\theta_0, \theta^*) = \int_{\mathbb{R}^d} \log\left(\frac{d\mathbb{P}_{\theta^*}(z)}{d\mathbb{P}_{\theta_0}(z)}\right) d\mathbb{P}_{\theta^*}(z) = \tfrac{1}{2}(\theta_0 - \theta^*)^T \Sigma^{-1} (\theta_0 - \theta^*).$$
When $d = 1$ and $\Sigma = \sigma^2$,
$$KL_Z(\theta_0, \theta^*) = \frac{(\theta_0 - \theta^*)^2}{2\sigma^2}. \qquad (1)$$

The KL dissimilarity evaluates the distance between the predictive probability distributions for the future experiment under two parameter values, $\theta_0$ and $\theta^*$. Although the KL dissimilarity is general, it can be challenging to interpret. In particular, it can be hard to establish how good the predictions for $Z$ based on $\theta^*$ are when $Z$ is actually generated from $\theta_0$ and $KL_Z(\theta_0, \theta^*) \leq \epsilon$. A more interpretable dissimilarity is obtained by taking $d_Z(\theta_0, \theta^*)$ to measure how far apart the best predictions for $Z$ based on $\theta^*$ and on $\theta_0$ are. In this case, if one makes a prediction for $Z$ based on $\theta^*$, $z^*$, and $Z$ was actually generated using $\theta_0$, then $d_Z(\theta_0, \theta^*) \leq \epsilon$ guarantees that $z^*$ will be at most $\epsilon$ apart from the best possible prediction. Such a dissimilarity is discussed in the following definition.

Definition 3.3 (Best prediction dissimilarity - BP). Let $\hat{Z}: \Theta \to \mathcal{Z}$ be such that $\hat{Z}(\theta_0)$ is the best prediction for $Z$ given that $\theta = \theta_0$. For example, one can take
$$\hat{Z}(\theta_0) = \arg\min_{z \in \mathcal{Z}} \delta_{Z,\theta_0}(z),$$
where $\delta_{Z,\theta_0}: \mathcal{Z} \to \mathbb{R}$ is such that $\delta_{Z,\theta_0}(z)$ measures how badly $z$ predicts $Z$ when $\theta = \theta_0$. The best prediction dissimilarity, $BP_Z(\theta_0, \theta^*)$, measures how badly $\hat{Z}(\theta^*)$ predicts $Z$ relative to $\hat{Z}(\theta_0)$ when $\theta = \theta_0$.
Formally,
$$BP_Z(\theta_0, \theta^*) = g\left(\frac{\delta_{Z,\theta_0}(\hat{Z}(\theta^*)) - \delta_{Z,\theta_0}(\hat{Z}(\theta_0))}{\delta_{Z,\theta_0}(\hat{Z}(\theta_0))}\right),$$
where $g: \mathbb{R} \to \mathbb{R}$ is a monotonic function. The choice of $g$ in a particular setting aims at improving the interpretation of the best prediction dissimilarity criterion.

Example 3.4 (BP under quadratic form). Let $\mathcal{Z} = \mathbb{R}^d$, $\mu_{Z,\theta} = \mathbb{E}[Z \mid \theta]$, $\Sigma_{Z,\theta} = \mathbb{V}[Z \mid \theta]$ and $S$ be a positive definite matrix. Define the quadratic form induced by $S$ to be $\|z\|_S^2 = z^T S z$ and
$$\delta_{Z,\theta_0}(z) = \mathbb{E}\left[\|Z - z\|_S^2 \mid \theta = \theta_0\right].$$
The optimal prediction under $\theta^*$ is $\hat{Z}(\theta^*) = \mu_{Z,\theta^*}$. It follows that
$$\delta_{Z,\theta_0}(\hat{Z}(\theta^*)) = \mathbb{E}\left[\|Z - \mu_{Z,\theta^*}\|_S^2 \mid \theta = \theta_0\right] = \|\mu_{Z,\theta_0} - \mu_{Z,\theta^*}\|_S^2 + \mathbb{E}\left[\|Z - \mu_{Z,\theta_0}\|_S^2 \mid \theta = \theta_0\right].$$
In particular, $\delta_{Z,\theta_0}(\hat{Z}(\theta_0)) = \mathbb{E}\left[\|Z - \mu_{Z,\theta_0}\|_S^2 \mid \theta = \theta_0\right]$. Therefore,
$$BP_Z(\theta_0, \theta^*) = g\left(\frac{\|\mu_{Z,\theta_0} - \mu_{Z,\theta^*}\|_S^2}{\mathbb{E}\left[\|Z - \mu_{Z,\theta_0}\|_S^2 \mid \theta = \theta_0\right]}\right). \qquad (2)$$
In this example, $BP_Z$ can be put on the same scale as $Z$ by taking $g(x) = \sqrt{x}$. Also, two choices of $S$ are of particular interest. When $S = \mathbb{V}[Z \mid \theta = \theta_0]^{-1}$, the denominator of eq. (2) equals $\mathrm{tr}(S\,\Sigma_{Z,\theta_0}) = d$, so eq. (2) simplifies to
$$BP_Z(\theta_0, \theta^*) = g\left(d^{-1}\|\mu_{Z,\theta_0} - \mu_{Z,\theta^*}\|^2_{\Sigma^{-1}_{Z,\theta_0}}\right). \qquad (3)$$
Similarly, when $S$ is the identity matrix, eq. (2) simplifies to
$$BP_Z(\theta_0, \theta^*) = g\left(\frac{\|\mathbb{E}[Z \mid \theta = \theta_0] - \mathbb{E}[Z \mid \theta = \theta^*]\|^2}{\mathrm{tr}(\mathbb{V}[Z \mid \theta = \theta_0])}\right). \qquad (4)$$

Equation (4) admits an intuitive interpretation. The larger the value of $\mathrm{tr}(\mathbb{V}[Z \mid \theta = \theta_0])$, the more $Z$ is dispersed and the harder it is to predict its value. Also, $\|\mathbb{E}[Z \mid \theta = \theta_0] - \mathbb{E}[Z \mid \theta = \theta^*]\|$ measures how far apart the best predictions for $Z$ under $\theta = \theta_0$ and under $\theta = \theta^*$ are.
That is, $BP_Z(\theta_0, \theta^*)$ captures that, if one predicts $Z$ assuming that $\theta = \theta^*$ when actually $\theta = \theta_0$, then the error with respect to the best prediction increases with the ratio between the distance separating the two predictions and the dispersion of $Z$.

Example 3.5 (Gaussian with known variance). Consider again Example 3.2 and let $\delta_{Z,\theta}(z)$ be as in Example 3.4. It follows from eq. (4) that, when $S$ is the identity matrix,
$$BP_Z(\theta_0, \theta^*) = g\left(\frac{\|\theta_0 - \theta^*\|^2}{\mathrm{tr}(\Sigma)}\right). \qquad (5)$$
Similarly, it follows from eq. (3) that, when $S = \Sigma^{-1}$,
$$BP_Z(\theta_0, \theta^*) = g\left(d^{-1}(\theta_0 - \theta^*)^T \Sigma^{-1} (\theta_0 - \theta^*)\right). \qquad (6)$$
Conclude from eq. (6) that, if $S = \Sigma^{-1}$ and $g(x) = x$, then $BP_Z(\theta_0, \theta^*) = 2d^{-1} KL_Z(\theta_0, \theta^*)$. Also, when $d = 1$, $\Sigma = \sigma^2$ and $g(x) = \sqrt{x}$, both eq. (5) and eq. (6) simplify to
$$BP_Z(\theta_0, \theta^*) = \sigma^{-1}|\theta_0 - \theta^*|. \qquad (7)$$
In some situations, $Z$ is the average of $m$ independent observations distributed as $N(\theta, \Sigma)$. In this case, $Z \sim N(\theta, m^{-1}\Sigma)$. It follows from eqs. (5) and (6) that $BP_Z(\theta_0, \theta^*) = g\left(m\|\theta_0 - \theta^*\|^2/\mathrm{tr}(\Sigma)\right)$ when $S$ is the identity, and $BP_Z(\theta_0, \theta^*) = g\left(m d^{-1}(\theta_0 - \theta^*)^T \Sigma^{-1} (\theta_0 - \theta^*)\right)$ when $S = \Sigma^{-1}$.

Although $BP_Z$ is more interpretable than $KL_Z$, it also relies on more tuning variables, such as $\delta$, $\hat{Z}$ and $g$. A balance between these features is obtained by a third predictive dissimilarity, which evaluates how easy it is to determine, based on $Z$, whether the value of $\theta$ is $\theta_0$ or $\theta^*$.

Definition 3.6 (Classification distance - CD). Let $\hat{\theta}_{\theta_0,\theta^*}: \mathcal{Z} \to \Theta$ be such that
$$\hat{\theta}_{\theta_0,\theta^*}(z) = \arg\max_{\theta \in \{\theta_0, \theta^*\}} f_Z(z \mid \theta).$$
That is, $\hat{\theta}_{\theta_0,\theta^*}$ assigns to each possible outcome of the future experiment, $z$, the value of $\theta$, $\theta_0$ or $\theta^*$, that makes the experimental result more likely.
The classification distance between $\theta_0$ and $\theta^*$, $CD(\theta_0, \theta^*)$, is defined as
$$CD(\theta_0, \theta^*) = \frac{\mathbb{P}\left(\hat{\theta}_{\theta_0,\theta^*}(Z) = \theta_0 \mid \theta_0\right) + \mathbb{P}\left(\hat{\theta}_{\theta_0,\theta^*}(Z) = \theta^* \mid \theta^*\right)}{2} - 0.5.$$
Note that $CD(\theta_0, \theta^*) + 0.5$ is the expected utility of the Bayes test of $\theta_0$ against $\theta^*$ using a uniform prior for $\theta$ and the 0/1 utility [Berger, 2013]. By subtracting 0.5 from this quantity, $CD(\theta_0, \theta^*)$ varies between 0 and 0.5 and is a distance. Also,
$$CD(\theta_0, \theta^*) = \tfrac{1}{2}TV(\mathbb{P}_{\theta_0}, \mathbb{P}_{\theta^*}) = \tfrac{1}{4}\|\mathbb{P}_{\theta_0} - \mathbb{P}_{\theta^*}\|_1,$$
where $TV(\mathbb{P}_{\theta_0}, \mathbb{P}_{\theta^*}) = \sup_A |\mathbb{P}_{\theta_0}(A) - \mathbb{P}_{\theta^*}(A)|$ and $\|\mathbb{P}_{\theta_0} - \mathbb{P}_{\theta^*}\|_1 = \int_{\mathcal{Z}} |f_Z(z \mid \theta_0) - f_Z(z \mid \theta^*)|\, dz$ is the $L_1$-distance between probability measures.

Example 3.7 (Gaussian with known variance). Consider Examples 3.2 and 3.5. When $d = 1$ and $\Sigma = \sigma^2$, we obtain
$$CD_Z(\theta_0, \theta^*) = \Phi\left(\frac{|\theta_0 - \theta^*|}{2\sigma}\right) - \frac{1}{2}, \qquad (8)$$
where $\Phi$ is the standard normal cumulative distribution function. Note that, in this case, $CD$ would be the same as $BP$ if, instead of taking $g(x) = \sqrt{x}$, one chose $g(x) = \Phi(0.5\sqrt{x}) - 0.5$.

We start by defining the pragmatic hypothesis associated with a singleton hypothesis. A singleton hypothesis is one in which the parameter assumes a single value, such as $H_0: \theta = \theta_0$. In this case, the pragmatic hypothesis associated with $H_0$ is the set of points whose dissimilarity to $\theta_0$ is at most $\epsilon$, as formalized below.

Definition 3.8 (Pragmatic hypothesis for a singleton). Let $H_0: \theta = \theta_0$, $d_Z$ be a predictive dissimilarity function and $\epsilon > 0$. The pragmatic hypothesis for $H_0$, $Pg(\{\theta_0\}, d_Z, \epsilon)$, is
$$Pg(\{\theta_0\}, d_Z, \epsilon) = \{\theta^* \in \Theta : d_Z(\theta_0, \theta^*) \leq \epsilon\}.$$

Example 3.9 (Gaussian with known variance). Consider Examples 3.2 and 3.5 when $d = 1$, $\Sigma = \sigma^2$ and $g(x) = \sqrt{x}$. It follows from eqs. (1), (7) and (8) that
$$Pg(\{\theta_0\}, BP_Z, \epsilon) = [\theta_0 - \epsilon\sigma,\ \theta_0 + \epsilon\sigma]$$
$$Pg(\{\theta_0\}, KL_Z, \epsilon) = [\theta_0 - \sqrt{2\epsilon}\,\sigma,\ \theta_0 + \sqrt{2\epsilon}\,\sigma]$$
$$Pg(\{\theta_0\}, CD_Z, \epsilon) = [\theta_0 - 2\Phi^{-1}(0.5 + \epsilon)\,\sigma,\ \theta_0 + 2\Phi^{-1}(0.5 + \epsilon)\,\sigma]$$
Note that the size of each of the pragmatic hypotheses is proportional to $\sigma$. This occurs because every one of the predictive dissimilarity functions makes the prediction error due to the unknown parameter value small with respect to that due to the data variability, $\sigma$. Next, we consider pragmatic hypotheses for general hypotheses $H_0: \theta \in \Theta_0$, where $\Theta_0 \subset \Theta$.

Definition 3.10.
For each hypothesis $\Theta_0 \subseteq \Theta$, predictive dissimilarity $d_Z$ and $\epsilon > 0$, $Pg(\Theta_0, d_Z, \epsilon)$ is the pragmatic hypothesis associated with $\Theta_0$ induced by $d_Z$ and $\epsilon$. Whenever $d_Z$ and $\epsilon$ are clear or not relevant to the result, we write $Pg(\Theta_0)$ instead of $Pg(\Theta_0, d_Z, \epsilon)$.

In order to construct these pragmatic hypotheses, we use logically coherent agnostic hypothesis tests. For each hypothesis, an agnostic hypothesis test can either reject it (1), accept it (0) or remain agnostic (1/2). Esteves et al. [2016] show that an agnostic hypothesis test is logically coherent if and only if it is based on a region estimator. Such tests are presented in Definition 3.12 and illustrated in fig. 3.

Definition 3.11.
Let $\mathcal{X}$ denote the sample space of the data used to test a hypothesis. A region estimator is a function, $R: \mathcal{X} \to \mathcal{P}(\Theta)$, where $\mathcal{P}(\Theta)$ is the power set of $\Theta$.

Definition 3.12 (Agnostic test based on a region estimator). The agnostic test based on the region estimator $R$ for testing $H$, $\phi_H^R$, is such that
$$\phi_H^R(x) = \begin{cases} 0, & \text{if } R(x) \subseteq H, \\ 1, & \text{if } R(x) \subseteq H^c, \\ \tfrac{1}{2}, & \text{otherwise.} \end{cases}$$

Figure 3: $\phi(x)$ is an agnostic test based on the region estimator $R(x)$ for testing $H$. The three panels show $R(x)$ relative to $H$ and $H^c$.

Besides the logical conditions on the hypothesis test, one might also impose logical constraints on how pragmatic hypotheses are constructed. For instance, let $A$ and $B$ be two hypotheses such that $B$ logically entails $A$, that is, $B \subseteq A$. If a logically coherent test accepts $B$, then it also accepts $A$. This property is called monotonicity [Izbicki and Esteves, 2015, da Silva et al., 2015, Fossaluza et al., 2017]. One might also impose that $Pg$ is such that, if a logically coherent hypothesis test accepts $Pg(B)$, then it should also accept $Pg(A)$. Similarly, let $(A_i)_{i \in I}$ be a collection of hypotheses which cover $A$, that is, $A \subseteq \cup_{i \in I} A_i$. If a logically coherent hypothesis test rejects every $A_i$, then it rejects $A$. This property is called union consonance. One might also impose that $Pg$ is such that, if a logically coherent hypothesis test rejects $Pg(A_i)$ for every $i$, then it should also reject $Pg(A)$. The above conditions define the logical coherence of a procedure for constructing pragmatic hypotheses.

Definition 3.13.
A procedure for constructing pragmatic hypotheses, $Pg$, is logically coherent if, for every logically coherent hypothesis test $\phi$ and sample point $x$:
1. If $\phi_{Pg(B)}(x) = 0$ and $B \subseteq A$, then $\phi_{Pg(A)}(x) = 0$.
2. If $\phi_{Pg(A_i)}(x) = 1$ for every $i \in I$ and $A \subseteq \cup_{i \in I} A_i$, then $\phi_{Pg(A)}(x) = 1$.

For example, let the frequencies of the genotypes AA, AB and BB in a given population be $\theta_1$, $\theta_2$ and $\theta_3$. Note that $B := \{(0.25, 0.5, 0.25)\}$ is a subset of $A = \{(p^2, 2p(1-p), (1-p)^2): p \in [0,1]\}$, which denotes the Hardy-Weinberg equilibrium. That is, if the frequencies of AA, AB and BB are, respectively, 0.25, 0.5 and 0.25, then the population follows the Hardy-Weinberg equilibrium. As a result, if one pragmatically accepts that the population satisfies the specified proportions, then one might also wish to pragmatically accept that the population follows the Hardy-Weinberg equilibrium. Similarly, if one pragmatically rejects, for every $p \in [0,1]$, that the frequencies of AA, AB and BB are, respectively, $p^2$, $2p(1-p)$ and $(1-p)^2$, then one might also wish to pragmatically reject that the population follows the Hardy-Weinberg equilibrium. These conditions are ensured by Definition 3.13.

In a logically coherent procedure for constructing pragmatic hypotheses, the pragmatic hypothesis associated with a composite hypothesis is completely determined by the pragmatic hypotheses associated with simple hypotheses. This result is presented in Theorem 3.14.

Theorem 3.14.
A procedure for constructing pragmatic hypotheses, $Pg$, is logically coherent if and only if, for every hypothesis $\Theta_0$, $Pg(\Theta_0) = \bigcup_{\theta \in \Theta_0} Pg(\{\theta\})$.

Using Theorem 3.14, it is possible to determine a logically coherent procedure for constructing pragmatic hypotheses by determining only the pragmatic hypotheses associated with simple hypotheses, such as in section 3.1. Theorem 3.14 is illustrated in section 4.

Besides being logically coherent, it is often desirable in statistics [Pereira and Stern, 2008, Stern and Pereira, 2014] and in science [Stern, 2011b, 2017] for a procedure to be invariant to reparametrization. That is, the procedure should reach the same conclusions whatever coordinate system is used to specify both the sample and the parameter spaces. For instance, the pragmatic hypothesis that is obtained using the International System of Units should be compatible with the one that is obtained using English (imperial) units. Invariance to reparametrization is formally presented in Definition 3.16.
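Returning to Theorem 3.14 and the agnostic tests of Definition 3.12, the sketch below shows how the two fit together in the one-dimensional Gaussian setting of Example 3.9 ($Z \sim N(\theta, \sigma^2)$, $\sigma$ known). It assumes that the null hypothesis is an interval $[a, b]$ (a singleton when $a = b$); since every singleton pragmatic hypothesis in Example 3.9 has the same half-width $w$, the union in Theorem 3.14 is simply $[a - w, b + w]$. The region estimate passed to the test is hypothetical, and the function names (`half_width`, `pragmatic_interval`, `agnostic_test`) are ours, not the paper's; $\Phi$ and $\Phi^{-1}$ come from Python's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

PHI = NormalDist()  # standard normal: PHI.cdf is Phi, PHI.inv_cdf is Phi^{-1}


def half_width(dissimilarity, eps, sigma):
    """Half-width of Pg({theta0}, d_Z, eps) in Example 3.9 (d = 1, Sigma = sigma^2)."""
    if dissimilarity == "BP":  # eq. (7) with g(x) = sqrt(x): |theta0 - theta*| / sigma <= eps
        return eps * sigma
    if dissimilarity == "KL":  # eq. (1): (theta0 - theta*)^2 / (2 sigma^2) <= eps
        return math.sqrt(2.0 * eps) * sigma
    if dissimilarity == "CD":  # eq. (8): Phi(|theta0 - theta*| / (2 sigma)) - 0.5 <= eps
        if not 0.0 < eps < 0.5:
            raise ValueError("CD takes values in [0, 0.5); use 0 < eps < 0.5")
        return 2.0 * PHI.inv_cdf(0.5 + eps) * sigma
    raise ValueError(f"unknown dissimilarity: {dissimilarity}")


def pragmatic_interval(a, b, dissimilarity, eps, sigma):
    """Pg([a, b]) as the union of singleton pragmatic intervals (Theorem 3.14).

    Every singleton interval is [theta0 - w, theta0 + w] with the same w,
    so the union over theta0 in [a, b] is [a - w, b + w].
    """
    w = half_width(dissimilarity, eps, sigma)
    return (a - w, b + w)


def agnostic_test(region, hypothesis):
    """Definition 3.12 for closed intervals: 0 = accept, 1 = reject, 0.5 = agnostic."""
    lo, hi = region
    h_lo, h_hi = hypothesis
    if h_lo <= lo and hi <= h_hi:  # R(x) is contained in H
        return 0.0
    if hi < h_lo or lo > h_hi:  # R(x) is contained in the complement of H
        return 1.0
    return 0.5


if __name__ == "__main__":
    sigma, eps, theta0 = 1.0, 0.5, 0.0
    region = (-0.2, 0.3)  # hypothetical region estimate R(x), e.g. a credible interval
    sharp = (theta0, theta0)
    prag = pragmatic_interval(theta0, theta0, "BP", eps, sigma)
    # A region with positive width is never contained in a single point,
    # so the sharp hypothesis can only be rejected or left agnostic.
    print("sharp H0:", agnostic_test(region, sharp))
    print("pragmatic H0:", prag, agnostic_test(region, prag))
```

Here the sharp hypothesis $\theta = 0$ yields an agnostic outcome, while its BP pragmatic version $[-0.5, 0.5]$ contains the region estimate and is accepted, which is the dilemma and its resolution described in the introduction.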
Definition 3.15. $(\mathbb{P}^*_{\theta^*})_{\theta^* \in \Theta^*}$ is a reparametrization of $(\mathbb{P}_\theta)_{\theta \in \Theta}$ if there exists a bijective function, $f: \Theta \to \Theta^*$, such that for every $\theta \in \Theta$, $\mathbb{P}_\theta = \mathbb{P}^*_{f(\theta)}$.

Definition 3.16.
Let $(\mathbb{P}^*_{\theta^*})_{\theta^* \in \Theta^*}$ be a reparametrization of $(\mathbb{P}_\theta)_{\theta \in \Theta}$ by a bijective function, $f: \Theta \to \Theta^*$. Also, let $d_Z$ and $d^*_Z$ be predictive dissimilarity functions. The functions $d_Z$ and $d^*_Z$ are invariant to the reparametrization if, for every logically coherent procedure for constructing pragmatic hypotheses, $Pg$, every $\Theta_0 \subseteq \Theta$ and every $\epsilon > 0$,
$$f[Pg(\Theta_0, d_Z, \epsilon)] = Pg(f[\Theta_0], d^*_Z, \epsilon).$$
Definition 3.16 states that, if $\Theta_0$ is a hypothesis and invariance to reparametrization holds, then the pragmatic hypothesis obtained in a reparametrization of $\Theta_0$, say $Pg(f[\Theta_0])$, is the same as the transformed pragmatic hypothesis associated with $\Theta_0$, $f[Pg(\Theta_0)]$. Theorem 3.17 presents a sufficient condition for obtaining invariance to reparametrization.

Theorem 3.17.
Let $(\mathbb{P}^*_{\theta^*})_{\theta^* \in \Theta^*}$ be a reparametrization of $(\mathbb{P}_\theta)_{\theta \in \Theta}$ given by a bijective function, $f$. If $d_Z$ and $d^*_Z$ satisfy $d_Z(\theta_1, \theta_2) = d^*_Z(f(\theta_1), f(\theta_2))$ for every $\theta_1, \theta_2 \in \Theta$, then $d_Z$ and $d^*_Z$ are invariant to this reparametrization.

Corollary 3.18. If $d_Z$ and $d^*_Z$ are the same choice among KL, BP or CD, then $d_Z$ and $d^*_Z$ are invariant to every reparametrization.

The procedures for constructing pragmatic hypotheses induced by KL and CD also satisfy an additional property, given by Theorem 3.19.
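Corollary 3.18 can be checked numerically in a simple case. The sketch below is our own illustration, not taken from the paper: it assumes $Z \sim \text{Bernoulli}(p)$ and the reparametrization $f(p) = \log(p/(1-p))$ (the log-odds). Because $KL_Z$ in Definition 3.1 is computed from the predictive distributions alone, its value does not change when the same two distributions are indexed by log-odds instead of probabilities. For contrast, it also computes the absolute difference between parameter values, a hypothetical naive choice that is not a predictive dissimilarity and is not invariant. The function names are ours.

```python
import math


def kl_bernoulli(p0, p_star):
    """KL_Z(p0, p*) = KL(P_{p*}, P_{p0}) for Z ~ Bernoulli(p), as in Definition 3.1."""
    return (p_star * math.log(p_star / p0)
            + (1.0 - p_star) * math.log((1.0 - p_star) / (1.0 - p0)))


def logit(p):
    # Reparametrization f: (0, 1) -> R, eta = log(p / (1 - p))
    return math.log(p / (1.0 - p))


def expit(eta):
    # Inverse of f: maps log-odds back to a probability
    return 1.0 / (1.0 + math.exp(-eta))


def kl_bernoulli_logodds(eta0, eta_star):
    """The same predictive dissimilarity, written in the log-odds parametrization."""
    return kl_bernoulli(expit(eta0), expit(eta_star))


if __name__ == "__main__":
    p0, p_star = 0.3, 0.45
    # Predictive dissimilarity: same value in both coordinate systems.
    print(kl_bernoulli(p0, p_star), kl_bernoulli_logodds(logit(p0), logit(p_star)))
    # Naive parameter distance: depends on the coordinate system.
    print(abs(p0 - p_star), abs(logit(p0) - logit(p_star)))
```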
Theorem 3.19.
Let $Z_m = (Z_1, \ldots, Z_m)$, where the $Z_i$'s are i.i.d. $F_\theta$ and $(F_\theta)_{\theta \in \Theta}$ is identifiable [Casella and Berger, 2002, Wechsler et al., 2013]. Also, let $KL_m$ and $CD_m$ be the dissimilarities calculated using $Z_m$. If $Pg$ is logically coherent then, for every $\Theta_0 \subseteq \Theta$ and $\epsilon > 0$,
(i) $(Pg(\Theta_0, KL_m, \epsilon))_{m \geq 1}$ and $(Pg(\Theta_0, CD_m, \epsilon))_{m \geq 1}$ are non-increasing sequences of sets;
(ii) $Pg(\Theta_0, KL_m, \epsilon) \xrightarrow{m \to \infty} \Theta_0$ and $Pg(\Theta_0, CD_m, \epsilon) \xrightarrow{m \to \infty} \Theta_0$.

Theorem 3.19 states that the sequence of pragmatic hypotheses for $\Theta_0$ induced by $d_{Z_m}$ is non-increasing if the dissimilarity is evaluated by either KL or CD. The greater the number of observable quantities in $Z_m$, the easier it is to distinguish two parameter values $\theta_0$ and $\theta^*$ and, therefore, the fewer parameter values are taken as close to $\theta_0$. Also, as the sample size goes to infinity, the pragmatic hypothesis associated with $\Theta_0$ converges to $\Theta_0$. In other words, for each $\theta_0 \in \Theta_0$, no other parameter value can predict infinitely many observable quantities with a precision sufficiently close to that of $\theta_0$.

In the following, pragmatic hypotheses for standard statistical problems are derived.
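Before the examples, the behaviour stated in Theorem 3.19 can be made explicit in the Gaussian case. The sketch below assumes $Z_i$ i.i.d. $N(\theta, \sigma^2)$ with $\sigma$ known. Because the sample mean is sufficient, $KL_m$ and $CD_m$ coincide with the dissimilarities of $\bar{Z}_m \sim N(\theta, \sigma^2/m)$, which is the setting of Example 3.9 with $\sigma$ replaced by $\sigma/\sqrt{m}$. The code computes the half-widths of the singleton pragmatic hypotheses; they decrease in $m$ and vanish as $m \to \infty$, in line with items (i) and (ii). The function names are ours.

```python
import math
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution


def kl_half_width(eps, sigma, m):
    # KL_m(theta0, theta*) = m (theta0 - theta*)^2 / (2 sigma^2) <= eps
    return sigma * math.sqrt(2.0 * eps / m)


def cd_half_width(eps, sigma, m):
    # CD_m(theta0, theta*) = Phi(sqrt(m) |theta0 - theta*| / (2 sigma)) - 0.5 <= eps,
    # which is only informative for 0 < eps < 0.5
    return 2.0 * sigma * PHI.inv_cdf(0.5 + eps) / math.sqrt(m)


if __name__ == "__main__":
    eps, sigma = 0.1, 1.0
    for m in (1, 10, 100, 1000, 10000):
        print(m, kl_half_width(eps, sigma, m), cd_half_width(eps, sigma, m))
```

For $m = 1$ these half-widths reduce to $\sqrt{2\epsilon}\,\sigma$ and $2\Phi^{-1}(0.5 + \epsilon)\,\sigma$, the KL and CD intervals of Example 3.9.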
Example 4.1 (Gaussian with unknown variance). Consider the setting from Example 3.9, but with $\sigma$ unknown and $0 < \sigma \leq M$. In this case, the parameter is $\theta = (\mu, \sigma)$. Consider the composite hypothesis $H_0: \theta \in \{\mu_0\} \times (0, M]$, which is often written as $H_0: \mu = \mu_0$. In this case, let $\theta_0 = (\mu_0, \sigma_0)$ and $\Theta_0 = \{\mu_0\} \times (0, M]$. Proceeding as in Example 3.9, it follows that
$$Pg(\{\theta_0\}, BP_Z, \epsilon) = [\mu_0 - \epsilon\sigma_0,\ \mu_0 + \epsilon\sigma_0] \times (0, M]$$
$$Pg(\Theta_0, BP_Z, \epsilon) = [\mu_0 - \epsilon M,\ \mu_0 + \epsilon M] \times (0, M] \qquad \text{(Theorem 3.14)}$$
The rectangular shape of these pragmatic hypotheses seems to be unreasonable as, for instance, whether a point $(\mu, \sigma)$ is close to $(\mu_0, \sigma_0)$ does not depend on $\sigma$. This is a consequence of the choice of $\delta$ in Example 3.9.

Figure 4: Pragmatic hypotheses in Example 4.1 for $H_0: \mu = \mu_0$. $H_0$ is represented by a red line in all figures.

Figure 4 presents the pragmatic hypotheses for $H_0: \mu = \mu_0, \sigma = \sigma_0$ and for $H_0: \mu = \mu_0$, with $M = 2$, using the KL and CD dissimilarities. Contrary to BP, the hypotheses obtained from these dissimilarities do not have a rectangular shape. In particular, the triangular shape of the pragmatic hypotheses for $H_0: \mu = \mu_0$ shows that the closer $\sigma$ is to 0, the smaller the range of values for $\mu$ that are included in the pragmatic hypothesis. This behavior might be desirable since, when $\sigma$ is small, there is little uncertainty about the value of $Z$ and, consequently, only a narrow interval of values of $\mu$ can predict $Z$ with precision $\epsilon$.

Example 4.2 (Hardy-Weinberg equilibrium). Let $Z \sim \text{Multinomial}(m, \theta)$, where $\theta = (\theta_1, \theta_2, \theta_3)$, $\theta_i \geq 0$, and $\sum_{i=1}^{3} \theta_i = 1$. The Hardy-Weinberg (HW) hypothesis [Hardy, 2003], $H_0$, which is depicted by the red curve in fig. 5, satisfies
$$H_0: \theta \in \Theta_0, \qquad \Theta_0 = \left\{\left(p^2,\ 2p(1-p),\ (1-p)^2\right): 0 \leq p \leq 1\right\}.$$
If $\theta_p = (p^2, 2p(1-p), (1-p)^2)$, $\delta_Z(z) = \mathbb{E}[\|Z - z\|^2 \mid \theta = \theta_p]$ and $g(x) = \sqrt{x}$, then it follows from Example 3.4 that
$$BP_Z(\theta_p, \theta^*) = \left(m \times \frac{(\theta^*_1 - p^2)^2 + (\theta^*_2 - 2p(1-p))^2 + (\theta^*_3 - (1-p)^2)^2}{p^2(1-p^2) + 2p(1-p)\left(1-2p(1-p)\right) + (1-p)^2\left(1-(1-p)^2\right)}\right)^{1/2}.$$
The pragmatic hypotheses that are obtained using KL, BP and CD for the HW hypothesis are depicted in fig. 5.

Figure 5: Pragmatic hypotheses obtained for the HW equilibrium, depicted in red, for a single value of $p$ (top) and for HW (bottom). The left, middle and right panels were obtained, respectively, with BP, KL and CD. The green regions in the right panels represent 80% HPD regions for the genotype distribution of each of the eight groups collected by Brentani et al. [2011] and two simulated datasets.

The choice between BP and either KL or CD has a large impact on the shape of the pragmatic hypotheses. While for BP the width of the pragmatic hypothesis is approximately uniform along the HW curve, the width of the pragmatic hypotheses obtained using KL and CD is smaller towards the edges of the HW curve. This behavior could be expected since, towards the edges of the HW curve, $Z$ has the smallest variability. The figure also depicts the challenge in calibrating KL: while the pragmatic hypotheses for BP and CD have similar sizes when using the same value of $\epsilon$, a different value of $\epsilon$ is needed for KL to produce pragmatic hypotheses of comparable size.

Example 4.3 (Bioequivalence). Assume that $Z = (X, Y) \sim N((\mu_1, \mu_2), \sigma^2 \mathbb{I})$, with σ
17A AD DD Decision1 4 18 94 Agnostic2 6 53 74 Accept3 57 118 100 Agnostic4 58 97 48 Agnostic5 120 361 194 Agnostic6 206 309 142 Accept7 110 148 44 Accept8 34 22 12 Agnostic9 198 282 520 Reject10 641 314 45 AcceptTable 2: Genotype counts for the eight groups in Brentani et al. [2011]. Also, thedecision of the GFBST agnostic hypothesis test [Esteves et al., 2016] for testing ineach group the pragmatic Hardy-Weinberg equilibrium hypothesis with m =
20. Thedecisions are the same for
K L , BP and C D .known. We derive the pragmatic hypothesis for H : µ = µ , that is, for {( µ , µ ) ∈ (cid:82) : µ = µ }. Such a test might be used in a bioequivalence study, where X and Y arethe concentrations of an active ingredient in a generic (test) drug medication and inthe brand-name (reference) medication [Chow et al., 2016], respectively. Since H iscomposite, it is helpful to derive the pragmatic hypothesis of its constituents.In order to do so, let θ = ( µ , µ ), µ ∈ (cid:82) , θ ∗ = ( µ ∗ , µ ∗ ), and H θ : θ = θ . If δ Z , θ ∗ ( z ) = E (cid:163) ( X − z ) + ( Y − z ) | θ = θ ∗ (cid:164) and g ( x ) = (cid:112) x , thenBP Z ( θ , θ ∗ ) = (cid:115) ( µ ∗ − µ ) + ( µ ∗ − µ ) σ Hence,
P g ({ θ }, BP Z , (cid:178) ) = (cid:169) ( µ ∗ , µ ∗ ) : ( µ ∗ − µ ) + ( µ ∗ − µ ) ≤ (cid:178) σ (cid:170) which is a circlewith center ( µ , µ ) and radius (cid:112) (cid:178)σ , as depicted on the left panel of fig. 6. In thiscase, the pragmatic hypothesis is the Tier 1 Equivalence Test hypothesis suggestedby the US Food and Drug Administration [Chow et al., 2016]. The pragmatic hypoth-esis for H : µ = µ is obtained by taking the union of the pragmatic hypothesesassociated to its constituents, as illustrated in the right panel of fig. 6. Specifically, P g ( H , BP Z , (cid:178) ) = (cid:169) ( µ ∗ , µ ∗ ) : | µ ∗ − µ ∗ | ≤ (cid:178)σ (cid:170) The pragmatic hypothesis for H using KL is obtained similarly. Note thatKL Z ( θ , θ ∗ ) = Z ( θ , θ ∗ )18 a) H : µ = µ = µ . (b) H : µ = µ . Figure 6: Pragmatic hypotheses using BP in Example 4.3 when σ is known.Therefore, P g ({ θ }, K L Z , (cid:178) ) = (cid:169) ( µ ∗ , µ ∗ ) : ( µ ∗ − µ ) + ( µ ∗ − µ ) ≤ (cid:178)σ (cid:170) and P g ( H , K L Z , (cid:178) ) = P g ( H , K L Z ,0.5 (cid:178) )The pragmatic hypothesis for H that is obtained using CD has no analytic ex-pression. However, by observing that N ( µ , σ ) = µ + σ N (0,1), it is possible to showthat, there exists a monotonically increasing function, h : (cid:82) −→ (cid:82) , such that P g ( H , C D Z ,0.5 (cid:178) ) = (cid:169) ( µ ∗ , µ ∗ ) : | µ ∗ − µ ∗ | ≤ h ( (cid:178) ) σ (cid:170) That is, the pragmatic hypothesis associated to H have the same shape as in theright panel of fig. 6. They differ solely on how many standard deviations correspondto the width of the pragmatic hypothesis. The spiral structure studied in [Gallais and Pollina, 1974] can be used to describescientific evolution. However, in order for the analogy to be complete, it is neces-sary to indicate what types of scientific theories or hypotheses are effectively testedin the acceptance vertex of the hexagon of oppositions. 
We argue that these are pragmatic hypotheses, which are sufficiently precise for the end user of the theory.

In order to make this statement formal, we introduce three methods for constructing a pragmatic hypothesis associated to a precise hypothesis. These methods are based on three predictive dissimilarity functions: KL, BP and CD. Each of these methods has different advantages. For instance, the scale of BP and CD is more interpretable than that of KL, making it easier to determine whether the former are large or small. On the other hand, BP relies on the definition of more functions than KL and CD, such as $\delta_{Z,\theta}(z)$ in Definition 3.3. If these functions are chosen inadequately, then the shape of the resulting pragmatic hypothesis might be counter-intuitive or meaningless. Finally, CD often does not have an analytic expression. It relies on numerical integration over the sample space, which can be taxing in high dimensions.

Acknowledgments
The authors are grateful for the support of IME-USP, the Institute of Mathematics and Statistics of the University of São Paulo, and of the Department of Statistics of UFSCar, the Federal University of São Carlos. Finally, the authors are grateful for advice and comments received from anonymous referees and from participants of the 6th World Congress on the Square of Opposition, held on November 1-5, 2018, in Chania, Crete, whose main organizers were Jean-Yves Béziau and Ioannis Vandoulakis. This work was partially supported by CNPq – Conselho Nacional de Desenvolvimento Científico e Tecnológico, grants PQ 06943-2017-4, 301206-2011-2 and 301892-2015-6; and FAPESP – Fundação de Amparo à Pesquisa do Estado de São Paulo, grants 2017/03363-8, 2014/25302-2, CEPID-2013/07375-0, and CEPID-2014/50279-4.
References
R. Izbicki and L. G. Esteves. Logical consistency in simultaneous statistical test procedures. Logic Journal of the IGPL, 23(5):732–758, 2015.
Luís G. Esteves, Rafael Izbicki, Julio M. Stern, and Rafael B. Stern. The logical consistency of simultaneous agnostic hypothesis tests. Entropy, 18(7):256, 2016.
J. M. Stern, R. Izbicki, L. G. Esteves, and R. B. Stern. Logically-consistent hypothesis testing and the hexagon of oppositions. Logic Journal of the IGPL, 25(5):741–757, 2017.
R. Blanché. Structures Intellectuelles: Essai sur l'Organisation Systématique des Concepts. Vrin, 1966.
Jean-Yves Béziau. The power of the hexagon. Logica Universalis, 6(1-2):1–43, 2012.
Jean-Yves Béziau. Opposition and order. In J.-Y. Béziau and K. Gan-Krzywoszynska, editors, New Dimensions of the Square of Opposition, pages 1–11. Philosophia Verlag, 2015.
W. Carnielli and C. Pizzi. Modalities and Multimodalities, volume 12 of Logic, Epistemology, and the Unity of Science. Springer, Dordrecht, 2008. ISBN 978-1-4020-6781-5.
Didier Dubois and Henri Prade. On several representations of an uncertain body of evidence. In M. M. Gupta and E. Sanchez, editors, Fuzzy Information and Decision Processes, pages 167–181. Elsevier/North-Holland, 1982.
Didier Dubois and Henri Prade. From Blanché's hexagonal organization of concepts to formal concept analysis and possibility theory. Logica Universalis, 6(1-2):149–169, 2012.
P. Gallais and V. Pollina. Hexagonal and spiral structure in medieval narrative. Yale French Studies, 51:115–132, 1974.
P. Gallais. Dialectique du Récit Médiéval: Chrétien de Troyes et l'Hexagone Logique. Rodopi, 1982.
Julio Michael Stern. Symmetry, invariance and ontology in physics and statistics. Symmetry, 3(3):611–635, 2011b.
Julio Michael Stern. Continuous versions of Haack's puzzles: Equilibria, eigen-states and ontologies. Logic Journal of the IGPL, 25(4):604–631, 2017.
M. H. DeGroot and M. J. Schervish. Probability and Statistics. Pearson Education, 2012.
J. O. Berger. Statistical Decision Theory and Bayesian Analysis. Springer Science & Business Media, 2013.
Jay L. Bucher. The Metrology Handbook, Second Edition. ASQ Quality Press, 2nd edition, 2012.
Horst Czichos, Tetsuya Saito, and Leslie Smith. Springer Handbook of Metrology and Testing. Springer Handbooks. Springer-Verlag Berlin Heidelberg, 2nd edition, 2011.
Richard Cohen, Kenneth Crowe, and Jesse DuMond. The Fundamental Constants of Physics. CODATA Task Group on Fundamental Constants/Interscience Publishers, 1957.
Richard Cohen. Mathematical analysis of the universal physical constants. Il Nuovo Cimento, 6(sup.):187–214, 1957.
J. M. Lévy-Leblond. On the conceptual nature of the physical constants. Il Nuovo Cimento, 7(2):187–214, 1977.
Müjdat Pakkan and Varol Akman. Hypersolver: a graphical tool for commonsense set theory. Information Sciences: An International Journal, 85(1):43–61, 1995.
Varol Akman and Müjdat Pakkan. Nonstandard set theories and information management. Journal of Intelligent Information Systems, 6(1):5–31, 1996.
Martin James Wainwright. Stochastic processes on graphs with cycles: geometric and variational approaches. PhD thesis, Massachusetts Institute of Technology, 2002.
Christopher M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag, Berlin, Heidelberg, 2006. ISBN 0387310738.
Borislav Iordanov. HyperGraphDB: A generalized graph database. In International Conference on Web-Age Information Management, pages 25–36. Springer, 2010.
Andrew Gelman, Aki Vehtari, Pasi Jylänki, Tuomas Sivula, Dustin Tran, Swupnil Sahai, Paul Blomstedt, John P. Cunningham, David Schiminovich, and Christian Robert. Expectation propagation as a way of life: A framework for Bayesian inference on partitioned data. arXiv preprint arXiv:1412.4869, 2014.
A. J. Greimas. Structural Semantics: An Attempt at a Method. Nebraska Univ. Press, 1983.
V. Propp. Morphology of the Folktale. Texas Univ. Press, 2000.
Julio Michael Stern. Jacob's ladder and scientific ontologies. Cybernetics & Human Knowing, 21(3):9–43, 2014.
Julio Michael Stern. Constructive verification, empirical induction, and falibilist deduction: A threefold contrast. Information, 2(4):635–650, 2011a.
R. Abraham and J. E. Marsden. Foundations of Mechanics. Addison-Wesley, 2013.
S. Hawking. The Illustrated On the Shoulders of Giants: The Great Works of Physics and Astronomy. Running Press, 2004.
Julio Michael Stern and Fábio Nakano. Optimization models for reaction networks: Information divergence, quadratic programming and Kirchhoff's laws. Axioms, 3:109–118, 2014.
G. M. da Silva, L. G. Esteves, V. Fossaluza, R. Izbicki, and S. Wechsler. A Bayesian decision-theoretic approach to logically-consistent hypothesis testing. Entropy, 17(10):6534–6559, 2015.
V. Fossaluza, R. Izbicki, G. M. da Silva, and L. G. Esteves. Coherent hypothesis testing. The American Statistician, 71(3):242–248, 2017.
C. A. B. Pereira and J. M. Stern. Can a significance test be genuinely Bayesian? Bayesian Analysis, 3(1):79–100, 2008.
Julio Michael Stern and Carlos A. de Bragança Pereira. Bayesian epistemic values: Focus on surprise, measure probability! Logic Journal of the IGPL, 22(2):236–254, 2014.
G. Casella and R. L. Berger. Statistical Inference, volume 2. Duxbury Pacific Grove, CA, 2002.
S. Wechsler, R. Izbicki, and L. G. Esteves. A Bayesian look at nonidentifiability: a simple example. The American Statistician, 67(2):90–93, 2013.
G. H. Hardy. Mendelian proportions in a mixed population. 1908. The Yale Journal of Biology and Medicine, 76(2):79, 2003.
H. Brentani, E. Y. Nakano, C. B. Martins, R. Izbicki, and C. A. de B. Pereira. Disequilibrium coefficient: a Bayesian perspective. Statistical Applications in Genetics and Molecular Biology, 10(1), 2011.
S. C. Chow, F. Song, and H. Bai. Analytical similarity assessment in biosimilar studies. The AAPS Journal, 18(3):670–677, 2016.
A Proofs
Proof of Theorem 3.14.
Let $P_g$ be logically coherent. Pick an arbitrary $\theta \in \Theta_0$ and note that, if $R(x) \equiv P_g(\{\theta\})$, then $\phi^{R}_{P_g(\{\theta\})}(x) = 0$. Since $P_g$ is logically coherent, conclude that $\phi^{R}_{P_g(\Theta_0)}(x) \equiv 0$, that is, $P_g(\{\theta\}) \subseteq P_g(\Theta_0)$. Since $\theta \in \Theta_0$ was arbitrary, conclude that

$$\bigcup_{\theta \in \Theta_0} P_g(\{\theta\}) \subseteq P_g(\Theta_0). \tag{9}$$

Next, let $R(x) \equiv \bigcap_{\theta \in \Theta_0} P_g(\{\theta\})^c$. For every $\theta \in \Theta_0$, $\phi^{R}_{P_g(\{\theta\})}(x) = 1$. Since $P_g$ is logically coherent, $\phi^{R}_{P_g(\Theta_0)} \equiv 1$, that is, $P_g(\Theta_0) \subseteq R^c \equiv \bigcup_{\theta \in \Theta_0} P_g(\{\theta\})$. Conclude that

$$P_g(\Theta_0) \subseteq \bigcup_{\theta \in \Theta_0} P_g(\{\theta\}). \tag{10}$$

It follows from Equations (9) and (10) that $P_g(\Theta_0) = \bigcup_{\theta \in \Theta_0} P_g(\{\theta\})$. It also follows from direct calculation that, if $P_g(\Theta_0) = \bigcup_{\theta \in \Theta_0} P_g(\{\theta\})$ for every $\Theta_0 \subseteq \Theta$, then $P_g$ is logically coherent.
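The identity $P_g(\Theta_0) = \bigcup_{\theta \in \Theta_0} P_g(\{\theta\})$ can be checked numerically in the setting of Example 4.3. The NumPy sketch below approximates the union of the discs $P_g(\{\theta_0\}, BP_Z, \epsilon)$ over a fine grid of $\mu_0$ values and compares it with the closed-form band $\{|\mu^*_1 - \mu^*_2| \leq 2\epsilon\sigma\}$. The function names, the values $\sigma = 1.5$ and $\epsilon = 0.4$, the random test points and the grid are all illustrative assumptions, not part of the original derivation.

```python
import numpy as np


def bp_bioequivalence(mu0, mu1_star, mu2_star, sigma):
    """BP_Z(theta_0, theta*) of Example 4.3, with theta_0 = (mu0, mu0) and theta* = (mu1*, mu2*)."""
    return np.sqrt(((mu1_star - mu0) ** 2 + (mu2_star - mu0) ** 2) / (2 * sigma ** 2))


def in_union_of_discs(points, mu0_grid, sigma, eps):
    """Membership in the union over mu0 of P_g({theta_0}, BP_Z, eps), with mu0 restricted to a grid."""
    # points has shape (n, 2); broadcasting yields BP values of shape (n, len(mu0_grid)).
    bp = bp_bioequivalence(mu0_grid[None, :], points[:, [0]], points[:, [1]], sigma)
    return (bp <= eps).any(axis=1)


def in_band(points, sigma, eps):
    """Closed form P_g(H_0, BP_Z, eps) = {(mu1*, mu2*) : |mu1* - mu2*| <= 2 eps sigma}."""
    return np.abs(points[:, 0] - points[:, 1]) <= 2 * eps * sigma


rng = np.random.default_rng(0)
sigma, eps = 1.5, 0.4  # hypothetical values
points = rng.uniform(-3.0, 3.0, size=(500, 2))
mu0_grid = np.linspace(-5.0, 5.0, 4001)  # spacing 0.0025; covers every midpoint (mu1* + mu2*) / 2

union = in_union_of_discs(points, mu0_grid, sigma, eps)
band = in_band(points, sigma, eps)

# Grid discretization can only matter for points extremely close to the band's edge, so skip those.
away_from_edge = np.abs(np.abs(points[:, 0] - points[:, 1]) - 2 * eps * sigma) > 1e-3
print(np.array_equal(union[away_from_edge], band[away_from_edge]))  # True
```

The agreement is expected: the squared distance from $(\mu^*_1, \mu^*_2)$ to the closest center $(\mu_0, \mu_0)$ is $(\mu^*_1 - \mu^*_2)^2/2$, so the union of discs of radius $\sqrt{2}\,\epsilon\sigma$ is exactly the band $|\mu^*_1 - \mu^*_2| \leq 2\epsilon\sigma$. The grid merely approximates the union over $\mu_0 \in \mathbb{R}$.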
Proof of Theorem 3.17.
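Let $\Theta_0 \subseteq \Theta$. Then

$$\begin{aligned} P_g(f[\Theta_0], d^*_Z, \epsilon) &= \left\{\theta^* \in \Theta^* : \exists\, \theta^*_0 \in f[\Theta_0] \text{ s.t. } d^*_Z(\theta^*_0, \theta^*) \leq \epsilon\right\} \\ &= \left\{\theta^* \in \Theta^* : \exists\, \theta^*_0 \in f[\Theta_0] \text{ s.t. } d_Z\left(f^{-1}(\theta^*_0), f^{-1}(\theta^*)\right) \leq \epsilon\right\} \\ &= f\left[\left\{\theta \in \Theta : \exists\, \theta_0 \in \Theta_0 \text{ s.t. } d_Z(\theta_0, \theta) \leq \epsilon\right\}\right] \\ &= f\left[P_g(\Theta_0, d_Z, \epsilon)\right]. \end{aligned}$$

Proof of Theorem 3.19.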
Since the $Z_i$'s are i.i.d., $KL_m(\theta_0, \theta^*) = m\,KL_Z(\theta_0, \theta^*)$. It follows that

$$\begin{aligned} P_g(\Theta_0, KL_m, \epsilon) &= \bigcup_{\theta_0 \in \Theta_0} P_g(\{\theta_0\}, KL_m, \epsilon) = \bigcup_{\theta_0 \in \Theta_0} P_g(\{\theta_0\}, m\,KL_Z, \epsilon) \\ &= \bigcup_{\theta_0 \in \Theta_0} \left\{\theta^* \in \Theta : KL_Z(\theta_0, \theta^*) \leq m^{-1}\epsilon\right\}. \end{aligned}$$

Thus, $\left(P_g(\Theta_0, KL_m, \epsilon)\right)_{m \geq 1}$ is a non-increasing sequence of sets. It follows that

$$\begin{aligned} \lim_{m \to \infty} P_g(\Theta_0, KL_m, \epsilon) &= \bigcap_{m \geq 1} \bigcup_{\theta_0 \in \Theta_0} \left\{\theta^* \in \Theta : KL_Z(\theta_0, \theta^*) \leq m^{-1}\epsilon\right\} \\ &= \bigcup_{\theta_0 \in \Theta_0} \bigcap_{m \geq 1} \left\{\theta^* \in \Theta : KL_Z(\theta_0, \theta^*) \leq m^{-1}\epsilon\right\} \\ &= \bigcup_{\theta_0 \in \Theta_0} \left\{\theta^* \in \Theta : KL_Z(\theta_0, \theta^*) = 0\right\} \\ &= \bigcup_{\theta_0 \in \Theta_0} \{\theta_0\} = \Theta_0, \end{aligned}$$

where the next-to-last equality follows from the assumption that $(F_\theta)_{\theta \in \Theta}$ is identifiable. The proof for the CD dissimilarity follows from the fact that $TV(\mathbb{P}_\theta, \mathbb{P}_{\theta^*}) \leq \sqrt{KL(\mathbb{P}_\theta, \mathbb{P}_{\theta^*})/2}$ (Pinsker's inequality).