Resolving the induction problem: Can we state with complete confidence via induction that the sun rises forever?
Youngjo Lee, Department of Statistics, Seoul National University, [email protected]

Abstract

Induction is a form of reasoning from the particular example to the general rule. However, establishing the truth of a general proposition is problematic, because it is always possible for a conflicting observation to occur. This problem is known as the induction problem. The sunrise problem is a quintessential example of the induction problem, and was first introduced by Laplace (1814). However, in Laplace's solution, a zero probability was assigned to the proposition that the sun will rise forever, regardless of the number of observations made. Therefore, it has often been stated that complete confidence regarding a general proposition can never be attained via induction. In this study, we attempt to overcome this skepticism by using a recently developed, theoretically consistent procedure. The findings demonstrate that through induction, one can rationally gain complete confidence in propositions based on scientific theory.

Introduction

In the era of artificial intelligence, learning from experience (induction from data) becomes crucial to drawing valid inferences. Induction is a form of reasoning from particular examples to the general rule, in which one infers a proposition based on data. Originally, the goal of science was to prove propositions (scientific theory) such as "all ravens are black", or to infer them from observational data. However, the difficulty of deriving such inductive logic has been recognized as the problem of induction since the Greek and Roman periods. The Pyrrhonian skeptic Sextus Empiricus questioned the validity of inductive reasoning, "positing that a universal rule could not be established from an incomplete set of particular instances. Specifically, to establish a universal rule from particular instances by means of induction, either all or some of the particulars can be reviewed. If only some of the instances are reviewed, the induction may not be definitive, as some of the instances omitted in the induction may contravene the universal fact; however, reviewing all the instances may be nearly impossible as the instances are infinite and indefinite (Heinemann, 1933, p. 283)". Hume (1748) argued that inductive reasoning cannot be justified rationally because it presupposes that the future will resemble the past. Kant (1781) proposed a resolution to the induction problem, which involved considering the propositions as valid, absolutely a priori. Popper (1959) suggested pursuing the falsification of propositions instead of proving them or trying to view them as valid. Broad (1923) stated that "induction is the glory of science but the scandal of philosophy". However, induction may also be viewed as a scandal of science if there is no way to confirm scientific theories with complete confidence via induction. Therefore, in this study, we attempt to demonstrate that if a scientific theory on how the future data are generated is available, an inductive reasoning can be justified rationally based on the scientific theory, whose role is that of an axiom in mathematics.

The question addressed in this work is whether, via scientific induction, a scientist (or artificial intelligence) can obtain complete confidence regarding a general proposition. We use probability as the main tool, and for the purpose of this work, we limit ourselves to two concepts of probability.
The first concept is Kolmogorov's (1933) formal mathematical probability of random events such as coin tossing, which relates to the long-run rate of observable events (Von Mises, 1928); this aspect involves the P-value and the coverage probabilities of confidence intervals. The second concept concerns the logical probability of a proposition, developed for the scientific induction being true. Bayes (1763) introduced a logical probability; however, he might not have embraced the broad application scope now known as Bayesianism, which was in fact pioneered and popularized by Laplace (1814) as an inverse probability. Bayesianism has been applied to all types of propositions in scientific and other fields (Paulos, 2011). Savage (1954) provided an axiomatic basis for the Bayesian probability as a subjective probability. It is of interest to derive an objective logical probability. Fisher (1930) developed an alternative logical probability, namely, the fiducial probability, which is based on the P-value. Neyman (1937) introduced the idea of confidence, represented by the coverage probabilities of confidence intervals. The confidence allows for a frequentist interpretation of the long-run rate of coverage if the confidence intervals are repeatedly produced over different observations. Schweder and Hjort (2016) viewed the confidence as the Neymanian interpretation of Fisher's logical probability. Recently, there has been a surge of renewed interest in the confidence as an estimate of the logical probability (Xie and Singh, 2013).

An inductive logic is based on the idea that the probability represents a logical relation between the proposition and the observations. Accordingly, a theory of induction should explain how one can ascertain that certain observations establish a degree of belief strong enough to confirm a given proposition. Let G be a general proposition, such as "all ravens are black" or "the sun rises forever", and E be a particular proposition or an observation (evidence) such as "the raven in front of me is black" or "the sun rises tomorrow". Then, we can use the logical probability to represent a deductive logic:

P(E | G) = 1 and P(not E | G) = 0.

The logical probability can be quantified as a number between 0 and 1, where 0 indicates impossibility (the proposition is false) and 1 indicates certainty (complete confidence; the proposition is true). Thus, deductive reasoning can help attain complete confidence, provided that the basic premises such as the axioms are true. The use of the logical probability for scientific reasoning was proposed by Cox (1946).

The logical probability allows us to represent the inductive logic as follows:

P(G | not E) = P(not E | G) P(G) / P(not E) = 0,   (1)

P(G | E) = P(E | G) P(G) / P(E) = P(G) / P(E) ≥ P(G),   (2)

provided that the denominators are not zero. From (1), we see that one observation of a non-black raven can certainly falsify the general proposition. Popper (1959) saw this falsifiability of a proposition as a criterion for scientific theory; if a theory is falsifiable, it is scientific, and if not, then this theory is unscientific. From (2), we see that a particular observation can corroborate the general proposition.
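As a worked illustration of (1) and (2), the following sketch evaluates both updates numerically. The values P(G) = 0.3 and P(E) = 0.6 are arbitrary assumptions chosen only to make the arithmetic concrete; they are not quantities from the text.

```python
# Numerical illustration of the inductive logic in (1) and (2).
# P_G and P_E below are arbitrary assumed values.

P_G = 0.3   # prior logical probability of the general proposition G
P_E = 0.6   # marginal probability of the evidence E; P(E) >= P(G)
            # must hold because G implies E

# Deductive logic: P(E | G) = 1 and P(not E | G) = 0.
P_E_given_G, P_notE_given_G = 1.0, 0.0

# (1) One conflicting observation falsifies G completely.
P_G_given_notE = P_notE_given_G * P_G / (1.0 - P_E)
print(P_G_given_notE)   # 0.0

# (2) A confirming observation corroborates G: P(G | E) >= P(G).
P_G_given_E = P_E_given_G * P_G / P_E
print(P_G_given_E)      # 0.5, which exceeds the prior 0.3
```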
Laplace (1814) elaborated on the Bayesian approach to compute the logical probability. However, Broad (1918) indicated that Laplace's solution involved the assignment of a zero probability to the general proposition, regardless of the number of observations made. To this end, in this study, the confidence resolution of the induction problem is realized by demonstrating that complete confidence on a general proposition can be achieved via induction based on a finite number of observations. We also demonstrate that the attained complete confidence is theoretically consistent.

Laplace solution to the sunrise problem

The Bernoulli model was developed for random binary events such as coin tossing. Suppose that a coin was tossed, but the outcome is unknown. The logical probability for a general proposition would be like the probability of a coin toss whose outcome is unknown, as the truthfulness of the general proposition is unknown. Laplace (1814) demonstrated how to compute such an actual logical probability, based on the data. He used the Bernoulli model as an instance of a scientific theory for the sunrise problem. Let θ be the long-run frequency of sunrises, i.e., the sun rises on 100 × θ% of days. Under the Bernoulli model, the general proposition G that the sun rises forever is equivalent to the hypothesis θ = 1. The general proposition that θ = 1 is then a Popper scientific theory because it can be falsified if a conflicting observation, i.e., one day of no sunrise, occurs. With a finite sample based on observations until now, could this Bernoulli model allow for complete confidence on θ = 1?

Prior to the knowledge of any sunrise, suppose that one is completely ignorant of the value of θ. Laplace (1814) represented this prior ignorance by means of a uniform prior π(θ) = 1 on θ ∈ [0, 1]. This uniform prior was first proposed by Bayes (1763). Given the value of θ and no other information relevant to the question of whether the sun will rise tomorrow, the probability of the particular proposition E that the sun will rise tomorrow is θ. However, we do not know the true value of θ. Thus, let T_n be the number of sunrises in n days. We are provided with the observed data that the sun has risen every day on record (T_n = n). Laplace, based on a young-earth creationist reading of the Bible, inferred the number of days by considering that the universe was created approximately 6000 years ago. The Bayes–Laplace rule defines the posterior, the logical probability given data,

P(θ | T_n = n) = π(θ)θ^n / ∫π(θ)θ^n dθ = θ^n / ∫θ^n dθ = (n + 1)θ^n,

which is a proper probability for θ; consequently, probability statements for θ can be established from this posterior. As described in the supplementary materials,
given n = 6000 × 365 = 2,190,000 days of consecutive sunrises, the logical probability of E is

P(E | T_n = n) = ∫ θ P(θ | T_n = n) dθ = (n + 1)/(n + 2) = 2190001/2190002 ≈ 0.9999995.

The probability of this particular proposition, that is, the sun rising the next day, eventually becomes one as the number of observations increases. However, this is not sufficient to confirm the general proposition G that the sun rises forever. Broad (1918) showed that P(G | T_n = n) = 0 for all n; there is no justification whatsoever for attaching even a moderate probability to a general proposition if the possible instances of the rule are many times more numerous than the instances already investigated (see Senn (2003) for a more thorough discussion). Thus, the Bayes–Laplace rule cannot overcome the degree of skepticism raised by Hume (1748). Popper (1959, p. 383) concluded that the presence of observations cannot alter the zero logical probability. In Carnap's inductive logic (1950), the degree of confirmation of every universal law is always zero. Therefore, the universal law cannot be accepted, but is not rejected until conflicting evidence appears.
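The computations of this section can be verified directly; a minimal sketch, assuming only the Bernoulli model and the uniform prior above, reproduces the predictive probability (n + 1)/(n + 2) and shows numerically why P(G | T_n = n) = 0:

```python
# Laplace's solution under the uniform prior, with n = 6000 * 365 days.
from fractions import Fraction

n = 6000 * 365                        # 2,190,000 consecutive sunrises
print(float(Fraction(n + 1, n + 2)))  # P(E | T_n = n) ~ 0.9999995

# Probability of m further consecutive sunrises: (n + 1)/(n + m + 1),
# which tends to 0 as m grows, so P(G | T_n = n) = 0.
for m in (10**6, 10**9, 10**12):
    print(m, float(Fraction(n + 1, n + m + 1)))
```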
Jeffreys' resolution

The fact that laws cannot be confirmed via scientific induction based on the Bayes–Laplace rule means that the choice of the prior had been wrong. Jaynes (2003) argued that a beta prior density,
Beta(α, β) = θ^(α−1)(1 − θ)^(β−1) / b(α, β),

with b(·, ·) being the beta function, α > 0 and β > 0, describes the state of knowledge that we have observed α successes and β failures prior to the experiment. The Bayes–Laplace uniform prior π(θ) = 1 is the Beta(1, 1) prior, which means that the experiment is a true binary one in the sense of physical possibility. This phenomenon explains why we cannot attain complete confidence in G by using the Bayes–Laplace rule. The Beta(1, 1) prior means that a trustworthy manufacturer sent you a coin with the information that he/she observed one head and one tail in two trials before sending the coin. Even if you run an experiment yielding heads only for many trials, there is no way to attain complete confidence on θ = 1, unless you discard the manufacturer's information. In this case, Jeffreys' prior (1939) is Beta(1/2, 1/2), but any prior with α > 0 and β > 0 cannot overcome the degree of skepticism, i.e., P(G | T_n = n) = 0.

Jeffreys' (1939) resolution was another prior, which places a mass of 1/2 on the general proposition θ = 1 and a uniform prior on [0, 1) with 1/2 weight. Then, as described in the supplementary materials, we have

P(E | T_n = n) = (n + 1)(n + 3)/(n + 2)^2 and P(G | T_n = n) = (n + 1)/(n + 2).

Jeffreys' resolution produced an important innovation, the Bayes factor for hypothesis testing (Etz and Wagenmakers, 2017). Senn (2009) considered Jeffreys' (1939) work "a touch of genius, necessary to rescue the Laplacian formulation of induction", by allowing P(G | T_n = n) > 0. According to Jeffreys' resolution with a prior P(G) = 1/2,

P(G) = 1/2 < P(G | T_1 = 1) = 2/3 < P(G | T_2 = 2) = 3/4 < ···,

and thus P(G | T_n = n) increases to one eventually. Using this resolution, a hypothesis cannot be rejected. However, the scientific induction cannot attain complete confidence even in this era of big data, because such a process requires infinite evidence.
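The growth of P(G | T_n = n) under Jeffreys' resolution can be checked directly; a sketch using exact rational arithmetic:

```python
# Jeffreys' resolution: prior mass 1/2 on theta = 1 plus a uniform
# density with weight 1/2 on [0, 1), so that
# P(T_n = n) = 1/2 + (1/2)/(n + 1) and P(G | T_n = n) = (n + 1)/(n + 2).
from fractions import Fraction

def posterior_G(n):
    p_Tn = Fraction(1, 2) + Fraction(1, 2) * Fraction(1, n + 1)
    return Fraction(1, 2) / p_Tn     # posterior mass on theta = 1

for n in (1, 2, 10, 100, 10**6):
    print(n, posterior_G(n))         # 2/3, 3/4, ... -> 1, but never 1
```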
Confidence resolution

Different priors lead to different logical probabilities. Savage (1954) interpreted these probabilities as subjective probabilities, depending upon personal preferences. It can be controversial to allow personal preferences in scientific induction. The question is, however, whether we can form an objective logical probability without presupposing a prior. Newton and Einstein may not have believed a priori that their laws are true with half of their personal probability. Fisher (1930) derived, when T_n is continuous, an alternative approach using the P-value, which has been widely used for scientific inference. As shown in the supplementary materials, we derive a logical probability using Pawitan's (2001, Chapter 5) right-side P-value P(T_n ≥ t | θ) for discrete T_n. This P-value leads to a logical probability P(E | T_n = n) = 1, so that P(G | T_n = n) = 1. According to this confidence resolution,

P(G | T_1 = 1) = P(G | T_2 = 2) = ··· = P(G | T_n = n) = 1,

and P(G | T_{i+1} = i) = 0 for any i ≤ n. This allows the realization of complete confidence even with a finite n. As shown in the supplementary materials, a prior can be induced from the logical probability, which can then be obtained from the Bayes–Laplace rule using the induced prior. The right-side P-value P(T_n ≥ t | θ) is an unobservable random variable because the true value of θ is unknown, and according to Pawitan and Lee (2020), the Bayes–Laplace rule is also an update rule for the likelihood of unobservable events (Lee et al., 2017).

In the supplementary materials, we demonstrate that the confidence leads to two potential induced priors, specifically, Beta(0, 1) and Beta(1, 0). Although these priors are improper, ∫π(θ)dθ = ∞, they allow a reasonable interpretation; for example, the Beta(0, 1) prior indicates that only one failure is observed a priori. Thus, if we observe all failures, it is legitimate to attain complete confidence on θ = 0. However, even if we observe all successes, we can never attain complete confidence on θ = 1 because of the failure a priori.
The Beta(1, 0) prior exhibits the contrasting property.

The confidence resolution leads to simultaneous hypothesis testing and estimation of the confidence density (or probability) function for the confidence intervals. As described in the supplementary materials, for example, under the induced prior Beta(1, 0), the Bayes–Laplace rule leads to a posterior (confidence density), for t = 0, 1, ···, n − 1,

P(θ | T_n = t) = θ^t (1 − θ)^(n−t−1) / b(t + 1, n − t).

From this confidence density, we can form a confidence interval for θ; the actual coverage rates in finite samples were reported by Pawitan (2001, Chapter 5). When T_n = n, the discrete posterior (confidence) P(θ = 1 | T_n = n) = 1 is attained. Thus, with T_n = n, the 100% confidence interval for θ is {1}. In the recent developments of the confidence theory, the confidence density is viewed as an estimate of the true but unknown logical probability, leading to consistent interval estimation. The coverage probability is a long-run rate of coverage of the confidence interval in hypothetical repetitions. Thus, the confidence concept is a bridge between the Kolmogorov and logical probabilities. We may view Bayesian posteriors as estimates of the true logical probability (confidence). In this case, the consistency of the estimation becomes important.
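A sketch of the resulting interval procedure, assuming SciPy is available; the function returns an equal-tailed interval from the confidence density above for t < n, and the degenerate 100% interval {1} when T_n = n:

```python
# Confidence intervals under the induced Beta(1, 0) prior: for t < n the
# confidence density is Beta(t + 1, n - t); for t = n all confidence
# concentrates on theta = 1.
from scipy.stats import beta

def confidence_interval(t, n, level=0.95):
    if t == n:
        return (1.0, 1.0)                    # 100% interval {1}
    tail = (1 - level) / 2
    dist = beta(t + 1, n - t)                # confidence density
    return (dist.ppf(tail), dist.ppf(1 - tail))

print(confidence_interval(95, 100))          # ordinary interval in (0, 1)
print(confidence_interval(100, 100))         # (1.0, 1.0): complete confidence
```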
Oracle hypothesis testing and confidence estimation
Our confidence resolution provides an extension of the oracle property, which has been of recent interest for simultaneous hypothesis testing and point estimation (Fan and Li, 2001). The oracle works as if it were known in advance whether the general proposition is true or not. We may define the oracle property as the attainment of complete confidence in finite samples. Thus, P(G | T_n = n) = 1 is an oracle estimator of the true logical probability. To form an oracle procedure, Lee and Oh (2014) proposed the use of a prior such as Beta(1, 0), which is infinite and not differentiable at θ = 1. An advantage of using such a prior in the change-point problem is that it allows simultaneous consistent estimation of the number of change points and their locations and sizes (Ng et al., 2018).
The induced prior Beta(1, 0) provides simultaneous hypothesis testing for the two hypotheses H_0: θ = 1 versus H_1: θ ≠ 1 and estimation of the logical probability (confidence). When T_n = n, the confidence resolution ensures that H_0 can be accepted with complete confidence. When T_n ≠ n, H_1 is accepted with complete confidence, and a confidence interval for θ can be formed using the confidence density. The coverage probability statements of these intervals are consistent, maintaining the stated level as n increases. Note that Jeffreys' resolution cannot accept H_0 with complete confidence (see the supplementary materials for a detailed discussion).

In response to the skepticism raised by Hume (1748), Kant (1781) proposed considering the general proposition as absolutely valid, a priori, which is otherwise drawn from the dubious inferential inductions. In contrast, Bayes (1763) and Laplace (1814) presumed a priori that the general proposition is false. Thus, Kant's proposal is consistent only if the general proposition is true, whereas the Bayes–Laplace rule is consistent only if the general proposition is false. It is not necessary a priori to presume P(G) = 0 or 1. The term "confirmation" has been used in the epistemology and philosophy of science whenever the observational data (evidence) support scientific theories. Many Bayesian confirmation measures have been proposed. For example, Carnap's (1950) degree of confirmation of the general proposition G by the evidence E is C(G, E) = P(G | E) − P(G). Because C(G, E) ≤ 1 − P(G) = P(not G) ≤ 1, Popper (1959) equated confirmability with "refutability". We see that P(G) = 0 leads to P(G | T_n = n) = 0 in the sunrise problem.

In Jeffreys' (1939) resolution with P(G) = 1/2,

C(G, T_n = n) = P(G | T_n = n) − P(G) = n/{2(n + 2)} > 0,

and thus the evidence T_n = n confirms the general theory G positively. However, in the confidence resolution, although the prior P(G) is not defined, complete confidence (confirmation) P(G | T_n = n) = 1 is achieved. Both resolutions are consistent, regardless of whether G is true or false. However, the former is not an oracle because it cannot attain complete confidence on the general proposition in a finite sample, whereas the latter can. In this study, the Bayesian prior, the Bayes rule, the Fisherian P-value, the fiducial probability, and the Neymanian confidence ideas are combined to resolve the induction problem, facilitating the attainment of complete confidence.
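To make the comparison concrete, a sketch computing Carnap's confirmation measure under both resolutions; only quantities already derived in the text are used, with exact rational arithmetic:

```python
# Carnap's confirmation C(G, E) = P(G | E) - P(G) for the sunrise problem.
from fractions import Fraction

def jeffreys_confirmation(n):
    # Jeffreys: P(G) = 1/2 and P(G | T_n = n) = (n + 1)/(n + 2),
    # so C = n / (2 * (n + 2)), positive but below 1/2 for finite n.
    return Fraction(n + 1, n + 2) - Fraction(1, 2)

for n in (1, 10, 1000):
    # the two expressions agree, confirming C = n / (2 * (n + 2))
    print(n, jeffreys_confirmation(n), Fraction(n, 2 * (n + 2)))

# Confidence resolution: P(G | T_n = n) = 1 for every finite n, i.e.,
# complete confirmation without presupposing any prior P(G).
```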
Concluding remarks

Through deduction, one can achieve complete confidence regarding a particular proposition E:

P(E | G) = 1,

provided that the general proposition G is true. Through induction, we see that one can attain complete confidence regarding the general proposition:

P(G) ≤ P(G | E) (= P(G | T_n = n)) = 1.

Considering the information provided by the data, we can be certain that the sun rises forever, provided that the assumed scientific (binomial) model is true. (Of course, in physics, the sun runs out of energy, and the solar system vanishes eventually.) To establish universal laws from particular instances by means of induction, scientists (or artificial intelligence) do not need to review all the instances, but rather to establish a scientific model pertaining to the generation of the instances. To confirm the validity of the general relativity theory, the observational evidence of light bending was obtained in 1919, and the astrophysical measurement of the gravitational redshift was obtained in 1925. Thus, a new theory was confirmed based on a few observations. The oracle confidence resolution shows that such inductive reasoning is theoretically consistent and therefore rational. More supporting evidence corroborates the consistency of the oracle estimation of the logical probability. If a long-existing scientific theory has not been refuted by conflicting evidence, it is theoretically consistent to claim complete confidence regarding the general propositions derived from the existing scientific theory. The role of the scientific theory in a scientific induction is the same as that of an axiom in a mathematical deduction. Thus, a scientific theory can be falsified or confirmed via induction. If one drops an apple, one can be sure that it will fall, unless the Newtonian laws suddenly stop holding. Indeed, induction can be the glory of both science and philosophy.
References and Notes
1. T. Bayes, An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society, 53, 370-418 (1763).

2. C. D. Broad, On the relation between induction and probability (Part I). Mind, 27, 389-404 (1918).

3. C. D. Broad, Scientific Thought (Brace and Co., New York, 1963).

4. R. Carnap, Logical Foundations of Probability (University of Chicago Press, 1950).

5. R. T. Cox, Probability, frequency, and reasonable expectation. American Journal of Physics, 14, 1-13 (1946).

6. A. Etz and E. Wagenmakers, Haldane's contribution to the Bayes factor hypothesis test. Statistical Science, 32, 313-329 (2017).

7. J. Fan and R. Li, Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348-1360 (2001).

8. R. A. Fisher, Inverse probability. Proceedings of the Cambridge Philosophical Society, 26, 528-535 (1930).

9. J. B. S. Haldane, A note on inverse probability. Mathematical Proceedings of the Cambridge Philosophical Society, 28, 55-61 (1932).

10. W. Heinemann, Sextus Empiricus: Outlines of Pyrrhonism (trans. Robert Gregg Bury, 1933).

11. D. Hume, An Enquiry Concerning Human Understanding (P. F. Collier & Son, 1910) (1748).

12. E. T. Jaynes, Probability Theory: The Logic of Science (Cambridge University Press, 2003).

13. H. Jeffreys, Theory of Probability (Oxford University Press, 1939).

14. I. Kant, Critique of Pure Reason. P. Kitcher (intro.), W. Pluhar (trans.) (Hackett, Indianapolis, 1996) (1781).

15. A. N. Kolmogorov, Foundations of the Theory of Probability, second English edition (Chelsea, 1956) (1933).

16. P. S. Laplace, A Philosophical Essay on Probabilities, English translation of the 6th French edition (Dover, 1951) (1814).

17. Y. Lee, J. A. Nelder and Y. Pawitan, Generalized Linear Models with Random Effects: Unified Analysis via H-likelihood, 2nd edition (Chapman & Hall/CRC, 2017).

18. Y. Lee and H. S. Oh, A new sparse variable selection via random-effect model. Journal of Multivariate Analysis, 125, 89-99 (2014).

19. J. Neyman, Note on an article by Sir Ronald Fisher. Journal of the Royal Statistical Society, Series B (Methodological), 18(2), 288-294 (1956).

20. T. Ng, W. Lee and Y. Lee, Change-point estimators with true identification property. Bernoulli, 24, 616-660 (2018).

21. J. A. Paulos, The Mathematics of Changing Your Mind. New York Times (US), 5 August 2011.

22. Y. Pawitan, In All Likelihood: Statistical Modelling and Inference Using Likelihood (Oxford University Press, 2001).

23. Y. Pawitan and Y. Lee, Confidence as likelihood. To appear in Statistical Science (2020).

24. K. R. Popper, New Appendices to the Logic of Scientific Discovery, 6th revised impression of the 1959 English translation (Hutchinson, 1972) (1959).

25. L. J. Savage, Foundations of Statistics (Wiley & Sons, 1954).

26. S. Senn, Dicing with Death: Chance, Risk and Health (Cambridge University Press, 2003).

27. S. Senn, Comments on "Harold Jeffreys's Theory of Probability revisited". Statistical Science (2009).

28. T. Schweder and N. L. Hjort, Confidence, Likelihood, Probability (Cambridge University Press, 2016).

29. R. Von Mises, Probability, Statistics and Truth, 2nd revised English edition (Allen and Unwin, 1961) (1928).

30. M. Xie and K. Singh, Confidence distribution, the frequentist distribution estimator of a parameter: a review. International Statistical Review, 81, 3-39 (2013).

Supplementary materials
Bayesian approach
Laplace (1814) used the Bernoulli model as an instance of scientific theory for the sunrise problem. Let X = (X_1, ···, X_n) be independent and identically distributed Bernoulli random variables with success probability θ. Once we observe data x = (x_1, ···, x_n), we have the likelihood

L(x, θ) = P(X = x | θ) = θ^t (1 − θ)^(n−t),   (1)

where t = Σ x_i. Prior to knowing of any sunrise, suppose that one is completely ignorant of the value of θ. Laplace (1814) represented this prior ignorance by means of a uniform prior π(θ) = 1 on θ ∈ [0, 1]. Let T_n = Σ X_i. To find the logical conditional probability of θ given T_n = n, one uses the Bayes–Laplace rule: the conditional probability distribution of θ given the data x is called the posterior,

P(θ | x) = π(θ)L(x, θ) / ∫ L(x, θ)π(θ) dθ = π(θ)L(x, θ) / P(x),   (2)

where

P(x) = ∫ L(x, θ)π(θ) dθ = ∫ θ^t (1 − θ)^(n−t) dθ = b(t + 1, n − t + 1),

and b(·, ·) is the beta function. Let E be the particular proposition that the sun rises tomorrow. Then,
P(T_n = n) = b(n + 1, 1) = 1/(n + 1), to give

P(E | it has risen n consecutive days) = P(X_{n+1} = 1 | T_n = n) = (n + 1)/(n + 2) = 2190001/2190002 ≈ 0.9999995.

This shows that P(E | T_n = n) → 1 as n → ∞. One observation increases the probability of the particular proposition from n/(n + 1) to (n + 1)/(n + 2), so that the increment of probability is

(n + 1)/(n + 2) − n/(n + 1) = 1/{(n + 1)(n + 2)}.

Thus, the probability of this particular proposition will eventually be one. However, this is not enough to ensure that the general proposition G that the sun rises forever holds (Senn, 2003). The probability that the sun rises in the next m consecutive days, given the previous n consecutive sunrises, is

P(X_{n+1} = 1, ..., X_{n+m} = 1 | T_n = n) = P(T_{n+m} = n + m | T_n = n) = (n + 1)/(n + m + 1).

As long as n is finite, the probability of the general proposition G becomes zero because for all n > 0,

P(G | T_n = n) = lim_{m→∞} P(T_{n+m} = n + m | T_n = n) = 0.

Consider Jeffreys' prior, which places probability 1/2 on θ = 1 and puts a uniform prior on [0, 1) with probability 1/2. Then, P(T_n = n) = 0.5{1 + 1/(n + 1)}, to give

P(E | T_n = n) = P(T_{n+1} = n + 1 | T_n = n) = {1 + 1/(n + 2)}/{1 + 1/(n + 1)} = (n + 1)(n + 3)/(n + 2)^2,

P(T_{n+m} = n + m | T_n = n) = {1 + 1/(n + m + 1)}/{1 + 1/(n + 1)} = (n + 1)(n + m + 2)/{(n + 2)(n + m + 1)}.

Thus,

P(G | T_n = n) = lim_{m→∞} P(T_{n+m} = n + m | T_n = n) = (n + 1)/(n + 2).
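The two conditional-probability formulas derived above can be cross-checked with exact arithmetic, using ∫θ^k dθ = 1/(k + 1); a sketch:

```python
# Cross-check of P(T_{n+m} = n+m | T_n = n) under the two priors.
from fractions import Fraction

n, m = 50, 200

# Uniform prior: the ratio of marginal probabilities of all-success
# sequences is (1/(n+m+1)) / (1/(n+1)) = (n + 1)/(n + m + 1).
uniform = Fraction(1, n + m + 1) / Fraction(1, n + 1)
assert uniform == Fraction(n + 1, n + m + 1)

# Jeffreys' prior: mass 1/2 at theta = 1 plus uniform weight 1/2 on [0, 1).
num = Fraction(1, 2) + Fraction(1, 2) * Fraction(1, n + m + 1)
den = Fraction(1, 2) + Fraction(1, 2) * Fraction(1, n + 1)
assert num / den == Fraction((n + 1) * (n + m + 2), (n + 2) * (n + m + 1))

print(float(uniform), float(num / den))  # ~0.203 versus ~0.985
```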
Confidence approach

Let T be a continuous sufficient statistic for the parameter θ, and let t be an observed value of T. Define the right-side P-value function

C(t, θ) = P(T ≥ t | θ).

Given t, as a function of θ, C(t, −∞) = 0, C(t, ∞) = 1, and C(t, θ) is a strictly increasing function of θ. Thus, C(t, θ) behaves as if it were the cumulative distribution function of θ. This leads to the confidence density for θ, analogous to the Bayesian posterior,

P(θ | t) = dC(t, θ)/dθ,

which is the fiducial probability of Fisher (1930), not using a subjective prior. Schweder and Hjort (2016) called it the confidence density.

Fisher considered cases where T is continuous. However, in practical applications T does not have to be continuous. In the Bernoulli model, T_n is a sufficient statistic but is discrete. The so-called 'exact inference' from discrete data can be expressed in terms of the confidence density. Consider the conservative right-side P-value

C(t, θ) = P(T_n ≥ t | θ) = Σ_{y=t}^{n} [n!/{y!(n − y)!}] θ^y (1 − θ)^(n−y) = ∫_0^θ x^(t−1)(1 − x)^(n−t) dx / b(t, n − t + 1).

This leads to the confidence density

P(θ | t) = θ^(t−1)(1 − θ)^(n−t) / b(t, n − t + 1),

which gives a conservative confidence interval (Pawitan, 2001, Chapter 5). See Pawitan and Lee (2020) for a more thorough discussion. Because the confidence density is analogous to the posterior, the induced model prior is

c(θ) ∝ P(θ | t)/L(θ) = θ^(−1),

which is objective because it is directly obtained solely from the model. However, the model prior c(θ) ∝ θ^(−1), namely the Beta(0, 1) distribution, is improper, giving ∫ c(θ) dθ = ∞. Necessary computations for
Beta(0, 1) can be obtained as the limit of proper Beta(a, 1) distributions for a > 0. The confidence density can be viewed as the posterior (2), but with the Bayesian prior π(θ) replaced by the confidence prior c(θ). This leads to

P(T_n = n) = ∫ θ^(n−1) dθ = 1/n,

to give

P(X_{n+1} = 1 | T_n = n) = P(T_{n+1} = n + 1)/P(T_n = n) = n/(n + 1),

P(T_{n+m} = n + m | T_n = n) = P(T_{n+m} = n + m)/P(T_n = n) = n/(n + m).

Given n, P(G | T_n = n) = lim_{m→∞} P(T_{n+m} = n + m | T_n = n) = 0. Thus, this confidence density cannot yet overcome the degree of scepticism.

Now apply the confidence to the transformed data, by defining Y_i = 0 if the sun rises on the ith day, where P(Y_i = 1) = θ* = 1 − θ and θ is the long-run frequency of sunrises. Thus, θ* = 1 is equivalent to θ = 0. Then, Y_i = 1 − X_i. Let T*_n = Σ Y_i = n − T_n. Then,

P(Y_{n+1} = 0 | T*_n = 0) = P(X_{n+1} = 1 | T_n = n) = lim_{a↓0} b(a, n + 2)/b(a, n + 1) = 1,

to give

P(G | T*_n = 0) = P(G | T_n = n) = lim_{m→∞} P(T*_{n+m} = 0 | T*_n = 0) = 1.

Since P(G | T_n = n) = 1 under the transformed data, we can say that the sun will always rise with the logical probability being one. It is a little disturbing that the confidence statement depends upon the scale of the data, X_i or Y_i = 1 − X_i, to which the confidence procedure is applied. Below we examine the consequences and show that the resulting confidence procedures are consistent.

Because the confidence interval for θ* is easily transformed to that of θ, we can readily apply the confidence to the transformed data. This leads to the confidence density and the Beta(1, 0) model prior

P(θ | t) = P(θ* | n − t) ∝ θ^t (1 − θ)^(n−t−1) and c(θ) ∝ P(θ | t)/L(θ) = (1 − θ)^(−1).

This model prior leads to

P(X_{n+1} = 1 | T_n = n) = lim_{a↓0} b(n + 2, a)/b(n + 1, a) = 1,

to give

P(G | T_n = n) = lim_{m→∞} P(T_{n+m} = n + m | T_n = n) = 1.

We can interpret a
Beta(α, β) prior with α > 0 and β > 0 as describing the state of knowledge that a priori we have observed α successes and β failures. Then,

P(X_{n+1} = 1 | T_n = n) = b(n + 1 + α, β)/b(n + α, β) = (n + α)/(n + α + β)

and

P(X_{n+1} = 0 | T_n = 0) = b(α, n + 1 + β)/b(α, n + β) = (n + β)/(n + α + β).

Thus, P(X_{n+1} = 1 | T_n = n) → 1 as β → 0, and P(X_{n+1} = 0 | T_n = 0) → 1 as α → 0. Jaynes (2003) argued that the Bayes–Laplace
Beta(1, 1) prior is the state of knowledge in which we have observed one success and one failure prior to the experiment. Thus, we know that the experiment is a true binary one, in the sense of physical possibility. This explains why we cannot reach a sure confidence of the general proposition with the Bayes–Laplace uniform prior: one day of no sunrise is assumed a priori. Thus, it cannot be an ignorant prior. Suppose that you were sent a coin from a manufacturer who informed you that, before sending the coin, an experiment was done and found one head and one tail in two trials. Even if you have done an experiment with heads only for many trials, there is no way to have sure confidence on θ = 1 if you accept the manufacturer's information.

The model priors Beta(0, 1) and Beta(1, 0) allow the possibility that θ = 0 and θ = 1, respectively. The Haldane (1932) prior, namely the Beta(0, 0) ∝ θ^(−1)(1 − θ)^(−1), means that no success and no failure has been observed a priori, so that it presumes that either θ = 1 or θ = 0 is possible. Under Haldane's prior,

P(X_{n+1} = 1 | T_n = n) = 1 and P(X_{n+1} = 0 | T_n = 0) = 1,

to give P(G | T_n = n) = 1. If you were not given any information from the manufacturer, it is natural to have sure confidence on θ = 1, provided you have not observed a single tail in your own experiment. Note here that with the right-side P-value, for any θ ∈ [0, 1],

C(0, θ) = P(T_n ≥ 0 | θ) = 1.

This means that θ has a point mass at zero given t = 0, leading to the 100% confidence interval for θ, given t = 0, being {0}. With the transformed data, we can show that it has the point mass at θ = 1 given t = n, leading to the 100% confidence interval for θ, given t = n, being {1}. The general proposition such as θ = 0 or 1 is scientific because it can be falsified if a conflicting observation appears. Sure confidence for a general proposition such as θ = 1 or 0 is a consequence of the scientific procedure, the P-value.
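A quick numerical check of the discrete right-side P-value used above, assuming SciPy is available; the binomial tail probability coincides with the regularized incomplete beta function, so the confidence density is a Beta(t, n − t + 1) density:

```python
# Check: P(T_n >= t | theta) equals the Beta(t, n - t + 1) CDF at theta,
# so dC(t, theta)/dtheta is the Beta(t, n - t + 1) pdf.
from scipy.stats import binom, beta

n, t, theta = 20, 14, 0.7
print(binom.sf(t - 1, n, theta))        # P(T_n >= t | theta)
print(beta.cdf(theta, t, n - t + 1))    # same value via incomplete beta
print(beta.pdf(theta, t, n - t + 1))    # the confidence density at theta
```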
Oracle hypothesis testing and confidence estimation

The induced prior
Beta(1, 0) indeed gives simultaneous hypothesis testing for the two hypotheses H_0: θ = 1 versus H_1: θ ≠ 1 and estimation of the logical probability (confidence). When T_n = n, the confidence resolution allows H_0 to be accepted with sure confidence. When T_n ≠ n, H_1 is accepted with sure confidence, and a confidence interval for θ can be formed using the confidence density. Coverage probability statements of these intervals are consistent, maintaining the stated level as n increases. Thus, when θ = 1, we can say it is true with sure confidence provided T_n = n. When θ ≠ 1, T_n cannot be n if n is sufficiently large, so that we cannot achieve sure confidence on θ = 1 with a large n. The procedure is therefore consistent whether θ = 1 or not. With the Haldane (1932) prior, Beta(0, 0), the resulting procedure gives simultaneous hypothesis testing and confidence estimation: when T_n = n (T_n = 0), H_0: θ = 1 (H_0: θ = 0) is accepted with 100% confidence interval {1} ({0}), and when 1 ≤ T_n ≤ n − 1, the hypothesis H_1: θ ∈ (0, 1) is accepted with an estimator of the logical probability (confidence density)

P(θ | T_n = t) = θ^(t−1)(1 − θ)^(n−t−1) / b(t, n − t).
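A sketch of this full procedure under the Haldane prior Beta(0, 0), assuming SciPy; boundary counts return a degenerate 100% interval, while interior counts return an equal-tailed interval from the confidence density above:

```python
# Oracle hypothesis testing and confidence estimation with Beta(0, 0).
from scipy.stats import beta

def oracle_inference(t, n, level=0.95):
    if t == n:
        return "accept theta = 1 with 100% confidence interval {1}"
    if t == 0:
        return "accept theta = 0 with 100% confidence interval {0}"
    dist = beta(t, n - t)                    # confidence density, 0 < t < n
    tail = (1 - level) / 2
    return (dist.ppf(tail), dist.ppf(1 - tail))

print(oracle_inference(100, 100))            # complete confidence: theta = 1
print(oracle_inference(0, 100))              # complete confidence: theta = 0
print(oracle_inference(60, 100))             # interval for theta in (0, 1)
```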