Resolving the induction problem: Can we state with complete confidence via induction that the sun rises forever?
Youngjo Lee, Department of Statistics, Seoul National University, [email protected]

Abstract

Induction is a form of reasoning from the particular example to the general rule. However, establishing the truth of a general proposition is problematic, because it is always possible for a conflicting observation to occur. This problem is known as the induction problem. The sunrise problem is a quintessential example of the induction problem, and was first introduced by Laplace (1814). However, in Laplace's solution, a zero probability was assigned to the proposition that the sun will rise forever, regardless of the number of observations made. Therefore, it has often been stated that complete confidence regarding a general proposition can never be attained via induction. In this study, we attempt to overcome this skepticism by using a recently developed, theoretically consistent procedure. The findings demonstrate that through induction, one can rationally gain complete confidence in propositions based on scientific theory.

Introduction

In the era of artificial intelligence, learning from experience (induction from data) becomes crucial to drawing valid inferences. Induction is a form of reasoning from particular examples to the general rule, in which one infers a proposition based on data. Originally, the goal of science was to prove propositions (scientific theory) such as "all ravens are black", or to infer them from observational data. However, the difficulty of deriving such inductive logic has been recognized as the problem of induction since the Greek and Roman periods. The Pyrrhonian skeptic Sextus Empiricus questioned the validity of inductive reasoning, "positing that a universal rule could not be established from an incomplete set of particular instances. Specifically, to establish a universal rule from particular instances by means of induction, either all or some of the particulars can be reviewed. If only some of the instances are reviewed, the induction may not be definitive, as some of the instances omitted in the induction may contravene the universal fact; however, reviewing all the instances may be nearly impossible as the instances are infinite and indefinite (Heinemann, 1933, p. 283)". Hume (1748) argued that inductive reasoning cannot be justified rationally because it presupposes that the future will resemble the past. Kant (1781) proposed a resolution to the induction problem, which involved considering the propositions as valid, absolutely a priori. Popper (1959) suggested pursuing the falsification of propositions instead of proving them or trying to view them as valid. Broad (1923) stated that "induction is the glory of science but the scandal of philosophy". However, induction may also be viewed as a scandal of science if there is no way to confirm scientific theories with complete confidence via induction. Therefore, in this study, we attempt to demonstrate that if a scientific theory on how the future data are generated is available, an inductive reasoning can be justified rationally based on the scientific theory, whose role is that of an axiom in mathematics.

The question addressed in this work is whether, via scientific induction, a scientist (or artificial intelligence) can obtain complete confidence regarding a general proposition. We use probability as the main tool, and for the purpose of this work, we limit ourselves to two concepts of probability.
The first concept is Kolmogorov's (1933) formal mathematical probability of random events such as coin tossing, which relates to the long-run rate of observable events (Von Mises, 1928); this aspect involves the P-value and the coverage probabilities of confidence intervals. The second concept concerns the logical probability of a proposition, developed for the scientific induction being true. Bayes (1763) introduced a logical probability; however, he might not have embraced the broad application scope now known as Bayesianism, which was in fact pioneered and popularized by Laplace (1814) as an inverse probability. Bayesianism has been applied to all types of propositions in scientific and other fields (Paulos, 2011). Savage (1954) provided an axiomatic basis for the Bayesian probability as a subjective probability. It is of interest to derive an objective logical probability. Fisher (1930) developed an alternative logical probability, namely, the fiducial probability, which is based on the P-value. Neyman (1937) introduced the idea of confidence, represented by the coverage probabilities of confidence intervals. The confidence allows for a frequentist interpretation of the long-run rate of coverage if the confidence intervals are repeatedly produced over different observations. Schweder and Hjort (2016) viewed the confidence as the Neymanian interpretation of Fisher's logical probability. Recently, there has been a surge of renewed interest in the confidence as an estimate of the logical probability (Xie and Singh, 2013).

An inductive logic is based on the idea that the probability represents a logical relation between the proposition and the observations. Accordingly, a theory of induction should explain how one can ascertain that certain observations establish a degree of belief strong enough to confirm a given proposition. Let G be a general proposition, such as "all ravens are black" or "the sun rises forever", and E be a particular proposition or an observation (evidence) such as "the raven in front of me is black" or "the sun rises tomorrow". Then, we can use the logical probability to represent a deductive logic:

P(E | G) = 1 and P(not E | G) = 0.

The logical probability can be quantified as a number between 0 and 1, where 0 indicates impossibility (the proposition is false) and 1 indicates certainty (complete confidence; the proposition is true). Thus, deductive reasoning can help attain complete confidence, provided that the basic premises such as the axioms are true. The use of the logical probability for scientific reasoning was proposed by Cox (1946).

The logical probability allows us to represent the inductive logic as follows:

P(G | not E) = P(not E | G) P(G) / P(not E) = 0,   (1)

P(G | E) = P(E | G) P(G) / P(E) = P(G) / P(E) ≥ P(G),   (2)

provided that the denominators are not zero. From (1), we see that one observation of a non-black raven can certainly falsify the general proposition. Popper (1959) saw this falsifiability of a proposition as a criterion for scientific theory; if a theory is falsifiable, it is scientific, and if not, then this theory is unscientific. From (2), we see that a particular observation can corroborate the general proposition.
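As a worked illustration of (1) and (2), the following sketch evaluates both updates numerically. The values P(G) = 0.3 and P(E) = 0.6 are arbitrary assumptions chosen only to make the arithmetic concrete; they are not quantities from the text.

```python
# Numerical illustration of the inductive logic in (1) and (2).
# P_G and P_E below are arbitrary assumed values.

P_G = 0.3   # prior logical probability of the general proposition G
P_E = 0.6   # marginal probability of the evidence E; P(E) >= P(G)
            # must hold because G implies E

# Deductive logic: P(E | G) = 1 and P(not E | G) = 0.
P_E_given_G, P_notE_given_G = 1.0, 0.0

# (1) One conflicting observation falsifies G completely.
P_G_given_notE = P_notE_given_G * P_G / (1.0 - P_E)
print(P_G_given_notE)   # 0.0

# (2) A confirming observation corroborates G: P(G | E) >= P(G).
P_G_given_E = P_E_given_G * P_G / P_E
print(P_G_given_E)      # 0.5, which exceeds the prior 0.3
```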
Laplace (1814) elaborated on the Bayesian approach to compute the logical probability. However, Broad (1918) indicated that Laplace's solution involved the assignment of a zero probability to the general proposition, regardless of the number of observations made. To this end, in this study, the confidence resolution of the induction problem is realized by demonstrating that complete confidence on a general proposition can be achieved via induction based on a finite number of observations. We also demonstrate that the attained complete confidence is theoretically consistent.

Laplace solution to the sunrise problem

The Bernoulli model was developed for random binary events such as coin tossing. Suppose that a coin was tossed, but the outcome is unknown. The logical probability for a general proposition would be like the probability of a coin toss whose outcome is unknown, as the truthfulness of the general proposition is unknown. Laplace (1814) demonstrated how to compute such an actual logical probability, based on the data. He used the Bernoulli model as an instance of a scientific theory for the sunrise problem. Let θ be the long-run frequency of sunrises, i.e., the sun rises on 100 × θ% of days. Under the Bernoulli model, the general proposition G that the sun rises forever is equivalent to the hypothesis θ = 1. The general proposition that θ = 1 is then a Popper scientific theory because it can be falsified if a conflicting observation, i.e., one day of no sunrise, occurs. With a finite sample based on observations until now, could this Bernoulli model allow for complete confidence on θ = 1?

Prior to the knowledge of any sunrise, suppose that one is completely ignorant of the value of θ. Laplace (1814) represented this prior ignorance by means of a uniform prior π(θ) = 1 on θ ∈ [0, 1]. This uniform prior was first proposed by Bayes (1763). Given the value of θ and no other information relevant to the question of whether the sun will rise tomorrow, the probability of the particular proposition E that the sun will rise tomorrow is θ. However, we do not know the true value of θ. Thus, let T_n be the number of sunrises in n days. We are provided with the observed data that the sun has risen every day on record (T_n = n). Laplace, based on a young-earth creationist reading of the Bible, inferred the number of days by considering that the universe was created approximately 6000 years ago. The Bayes–Laplace rule defines the posterior, the logical probability given data,

P(θ | T_n = n) = π(θ)θ^n / ∫π(θ)θ^n dθ = θ^n / ∫θ^n dθ = (n + 1)θ^n,

which is a proper probability for θ; consequently, probability statements for θ can be established from this posterior. As described in the supplementary materials,
given n = 6000 × 365 = 2,190,000 days of consecutive sunrises, the logical probability of E is

P(E | T_n = n) = ∫ θ P(θ | T_n = n) dθ = (n + 1)/(n + 2) = 2190001/2190002 ≈ 0.9999995.

The probability of this particular proposition, that is, the sun rising the next day, eventually becomes one as the number of observations increases. However, this is not sufficient to confirm the general proposition G that the sun rises forever. Broad (1918) showed that P(G | T_n = n) = 0 for all n; there is no justification whatsoever for attaching even a moderate probability to a general proposition if the possible instances of the rule are many times more numerous than the instances already investigated (see Senn (2003) for a more thorough discussion). Thus, the Bayes–Laplace rule cannot overcome the degree of skepticism raised by Hume (1748). Popper (1959, p. 383) concluded that the presence of observations cannot alter the zero logical probability. In Carnap's inductive logic (1950), the degree of confirmation of every universal law is always zero. Therefore, the universal law cannot be accepted, but is not rejected until conflicting evidence appears.
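The computations of this section can be verified directly; a minimal sketch, assuming only the Bernoulli model and the uniform prior above, reproduces the predictive probability (n + 1)/(n + 2) and shows numerically why P(G | T_n = n) = 0:

```python
# Laplace's solution under the uniform prior, with n = 6000 * 365 days.
from fractions import Fraction

n = 6000 * 365                        # 2,190,000 consecutive sunrises
print(float(Fraction(n + 1, n + 2)))  # P(E | T_n = n) ~ 0.9999995

# Probability of m further consecutive sunrises: (n + 1)/(n + m + 1),
# which tends to 0 as m grows, so P(G | T_n = n) = 0.
for m in (10**6, 10**9, 10**12):
    print(m, float(Fraction(n + 1, n + m + 1)))
```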
Jeffreys' resolution

The fact that laws cannot be confirmed via scientific induction based on the Bayes–Laplace rule means that the choice of the prior had been wrong. Jaynes (2003) argued that a beta prior density,
Beta(α, β) = θ^(α−1)(1 − θ)^(β−1) / b(α, β),

with b(·, ·) being the beta function, α > 0 and β > 0, describes the state of knowledge that we have observed α successes and β failures prior to the experiment. The Bayes–Laplace uniform prior π(θ) = 1 is the Beta(1, 1) prior, which means that the experiment is a true binary one in the sense of physical possibility. This phenomenon explains why we cannot attain complete confidence in G by using the Bayes–Laplace rule. The Beta(1, 1) prior means that a trustworthy manufacturer sent you a coin with the information that he/she observed one head and one tail in two trials before sending the coin. Even if you run an experiment yielding heads only for many trials, there is no way to attain complete confidence on θ = 1, unless you discard the manufacturer's information. In this case, Jeffreys' prior (1939) is Beta(1/2, 1/2), but any prior with α > 0 and β > 0 cannot overcome the degree of skepticism, i.e., P(G | T_n = n) = 0.

Jeffreys' (1939) resolution was another prior, which places a mass of 1/2 on the general proposition θ = 1 and a uniform prior on [0, 1) with 1/2 weight. Then, as described in the supplementary materials, we have

P(E | T_n = n) = (n + 1)(n + 3)/(n + 2)^2 and P(G | T_n = n) = (n + 1)/(n + 2).

Jeffreys' resolution produced an important innovation, the Bayes factor for hypothesis testing (Etz and Wagenmakers, 2017). Senn (2009) considered Jeffreys' (1939) work "a touch of genius, necessary to rescue the Laplacian formulation of induction", by allowing P(G | T_n = n) > 0. According to Jeffreys' resolution with a prior P(G) = 1/2,

P(G) = 1/2 < P(G | T_1 = 1) = 2/3 < P(G | T_2 = 2) = 3/4 < ···,

and thus P(G | T_n = n) increases to one eventually. Using this resolution, a hypothesis cannot be rejected. However, the scientific induction cannot attain complete confidence even in this era of big data, because such a process requires infinite evidence.
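The growth of P(G | T_n = n) under Jeffreys' resolution can be checked directly; a sketch using exact rational arithmetic:

```python
# Jeffreys' resolution: prior mass 1/2 on theta = 1 plus a uniform
# density with weight 1/2 on [0, 1), so that
# P(T_n = n) = 1/2 + (1/2)/(n + 1) and P(G | T_n = n) = (n + 1)/(n + 2).
from fractions import Fraction

def posterior_G(n):
    p_Tn = Fraction(1, 2) + Fraction(1, 2) * Fraction(1, n + 1)
    return Fraction(1, 2) / p_Tn     # posterior mass on theta = 1

for n in (1, 2, 10, 100, 10**6):
    print(n, posterior_G(n))         # 2/3, 3/4, ... -> 1, but never 1
```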
Confidence resolution

Different priors lead to different logical probabilities. Savage (1954) interpreted these probabilities as subjective probabilities, depending upon personal preferences. It can be controversial to allow personal preferences in scientific induction. The question is, however, whether we can form an objective logical probability without presupposing a prior. Newton and Einstein may not have believed a priori that their laws are true with half of their personal probability. Fisher (1930) derived, when T_n is continuous, an alternative approach using the P-value, which has been widely used for scientific inference. As shown in the supplementary materials, we derive a logical probability using Pawitan's (2001, Chapter 5) right-side P-value P(T_n ≥ t | θ) for discrete T_n. This P-value leads to a logical probability P(E | T_n = n) = 1, so that P(G | T_n = n) = 1. According to this confidence resolution,

P(G | T_1 = 1) = P(G | T_2 = 2) = ··· = P(G | T_n = n) = 1,

and P(G | T_{i+1} = i) = 0 for any i ≤ n. This allows the realization of complete confidence even with a finite n. As shown in the supplementary materials, a prior can be induced from the logical probability, which can then be obtained from the Bayes–Laplace rule using the induced prior. The right-side P-value P(T_n ≥ t | θ) is an unobservable random variable because the true value of θ is unknown, and according to Pawitan and Lee (2020), the Bayes–Laplace rule is also an update rule for the likelihood of unobservable events (Lee et al., 2017).

In the supplementary materials, we demonstrate that the confidence leads to two potential induced priors, specifically, Beta(0, 1) and Beta(1, 0). Although these priors are improper, ∫π(θ)dθ = ∞, they allow a reasonable interpretation; for example, the Beta(0, 1) prior indicates that only one failure is observed a priori. Thus, if we observe all failures, it is legitimate to attain complete confidence on θ = 0. However, even if we observe all successes, we can never attain complete confidence on θ = 1 because of the failure a priori.
The Beta(1, 0) prior exhibits the contrasting property.

The confidence resolution leads to simultaneous hypothesis testing and estimation of the confidence density (or probability) function for the confidence intervals. As described in the supplementary materials, for example, under the induced prior Beta(1, 0), the Bayes–Laplace rule leads to a posterior (confidence density), for t = 0, 1, ···, n − 1,

P(θ | T_n = t) = θ^t (1 − θ)^(n−t−1) / b(t + 1, n − t).

From this confidence density, we can form a confidence interval for θ; the actual coverage rates in finite samples were reported by Pawitan (2001, Chapter 5). When T_n = n, the discrete posterior (confidence) P(θ = 1 | T_n = n) = 1 is attained. Thus, with T_n = n, the 100% confidence interval for θ is {1}. In the recent developments of the confidence theory, the confidence density is viewed as an estimate of the true but unknown logical probability, leading to consistent interval estimation. The coverage probability is a long-run rate of coverage of the confidence interval in hypothetical repetitions. Thus, the confidence concept is a bridge between the Kolmogorov and logical probabilities. We may view Bayesian posteriors as estimates of the true logical probability (confidence). In this case, the consistency of the estimation becomes important.
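A sketch of the resulting interval procedure, assuming SciPy is available; the function returns an equal-tailed interval from the confidence density above for t < n, and the degenerate 100% interval {1} when T_n = n:

```python
# Confidence intervals under the induced Beta(1, 0) prior: for t < n the
# confidence density is Beta(t + 1, n - t); for t = n all confidence
# concentrates on theta = 1.
from scipy.stats import beta

def confidence_interval(t, n, level=0.95):
    if t == n:
        return (1.0, 1.0)                    # 100% interval {1}
    tail = (1 - level) / 2
    dist = beta(t + 1, n - t)                # confidence density
    return (dist.ppf(tail), dist.ppf(1 - tail))

print(confidence_interval(95, 100))          # ordinary interval in (0, 1)
print(confidence_interval(100, 100))         # (1.0, 1.0): complete confidence
```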
Oracle hypothesis testing and confidence estimation
Our confidence resolution provides an extension of the oracle property, which has been of recent interest for simultaneous hypothesis testing and point estimation (Fan and Li, 2001). The oracle works as if it were known in advance whether the general proposition is true or not. We may define the oracle property as the attainment of complete confidence in finite samples. Thus, P(G | T_n = n) = 1 is an oracle estimator of the true logical probability. To form an oracle procedure, Lee and Oh (2014) proposed the use of a prior such as Beta(1, 0), which is infinite and not differentiable at θ = 1. An advantage of using such a prior in the change-point problem is that it allows simultaneous consistent estimation of the number of change points and their locations and sizes (Ng et al., 2018).
The induced prior Beta(1, 0) provides simultaneous hypothesis testing for the two hypotheses H_0: θ = 1 versus H_1: θ ≠ 1 and estimation of the logical probability (confidence). When T_n = n, the confidence resolution ensures that H_0 can be accepted with complete confidence. When T_n ≠ n, H_1 is accepted with complete confidence, and a confidence interval for θ can be formed using the confidence density. The coverage probability statements of these intervals are consistent, maintaining the stated level as n increases. Note that Jeffreys' resolution cannot accept H_0 with complete confidence (see the supplementary materials for a detailed discussion).

In response to the skepticism raised by Hume (1748), Kant (1781) proposed considering the general proposition as absolutely valid, a priori, which is otherwise drawn from the dubious inferential inductions. In contrast, Bayes (1763) and Laplace (1814) presumed a priori that the general proposition is false. Thus, Kant's proposal is consistent only if the general proposition is true, whereas the Bayes–Laplace rule is consistent only if the general proposition is false. It is not necessary a priori to presume P(G) = 0 or 1. The term "confirmation" has been used in the epistemology and philosophy of science whenever the observational data (evidence) support scientific theories. Many Bayesian confirmation measures have been proposed. For example, Carnap's (1950) degree of confirmation of the general proposition G by the evidence E is C(G, E) = P(G | E) − P(G). Because C(G, E) ≤ 1 − P(G) = P(not G) ≤ 1, Popper (1959) equated confirmability with "refutability". We see that P(G) = 0 leads to P(G | T_n = n) = 0 in the sunrise problem.

In Jeffreys' (1939) resolution with P(G) = 1/2,

C(G, T_n = n) = P(G | T_n = n) − P(G) = n/{2(n + 2)} > 0,

and thus the evidence T_n = n confirms the general theory G positively. However, in the confidence resolution, although the prior P(G) is not defined, complete confidence (confirmation) P(G | T_n = n) = 1 is achieved. Both resolutions are consistent, regardless of whether G is true or false. However, the former is not an oracle because it cannot attain complete confidence on the general proposition in a finite sample, whereas the latter can. In this study, the Bayesian prior, the Bayes rule, the Fisherian P-value, the fiducial probability, and the Neymanian confidence ideas are combined to resolve the induction problem, facilitating the attainment of complete confidence.
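To make the comparison concrete, a sketch computing Carnap's confirmation measure under both resolutions; only quantities already derived in the text are used, with exact rational arithmetic:

```python
# Carnap's confirmation C(G, E) = P(G | E) - P(G) for the sunrise problem.
from fractions import Fraction

def jeffreys_confirmation(n):
    # Jeffreys: P(G) = 1/2 and P(G | T_n = n) = (n + 1)/(n + 2),
    # so C = n / (2 * (n + 2)), positive but below 1/2 for finite n.
    return Fraction(n + 1, n + 2) - Fraction(1, 2)

for n in (1, 10, 1000):
    # the two expressions agree, confirming C = n / (2 * (n + 2))
    print(n, jeffreys_confirmation(n), Fraction(n, 2 * (n + 2)))

# Confidence resolution: P(G | T_n = n) = 1 for every finite n, i.e.,
# complete confirmation without presupposing any prior P(G).
```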
Concluding remarks

Through deduction, one can achieve complete confidence regarding a particular proposition E:

P(E | G) = 1,

provided that the general proposition G is true. Through induction, we see that one can attain complete confidence regarding the general proposition:

P(G) ≤ P(G | E) (= P(G | T_n = n)) = 1.

Considering the information provided by the data, we can be certain that the sun rises forever, provided that the assumed scientific (binomial) model is true. (Of course, in physics, the sun runs out of energy, and the solar system vanishes eventually.) To establish universal laws from particular instances by means of induction, scientists (or artificial intelligence) do not need to review all the instances, but rather to establish a scientific model pertaining to the generation of the instances. To confirm the validity of the general relativity theory, the observational evidence of light bending was obtained in 1919, and the astrophysical measurement of the gravitational redshift was obtained in 1925. Thus, a new theory was confirmed based on a few observations. The oracle confidence resolution shows that such inductive reasoning is theoretically consistent and therefore rational. More supporting evidence corroborates the consistency of the oracle estimation of the logical probability. If a long-existing scientific theory has not been refuted by conflicting evidence, it is theoretically consistent to claim complete confidence regarding the general propositions derived from the existing scientific theory. The role of the scientific theory in a scientific induction is the same as that of an axiom in a mathematical deduction. Thus, a scientific theory can be falsified or confirmed via induction. If one drops an apple, one can be sure that it will fall, unless the Newtonian laws suddenly stop holding. Indeed, induction can be the glory of both science and philosophy.
References and Notes
1. T. Bayes, An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society, 53, 370-418 (1763).

2. C. D. Broad, On the relation between induction and probability (Part I). Mind, 27, 389-404 (1918).

3. C. D. Broad, Scientific Thought (Brace and Co., New York, 1963).

4. R. Carnap, Logical Foundations of Probability (University of Chicago Press, 1950).

5. R. T. Cox, Probability, frequency, and reasonable expectation. American Journal of Physics, 14, 1-13 (1946).

6. A. Etz and E. Wagenmakers, Haldane's contribution to the Bayes factor hypothesis test. Statistical Science, 32, 313-329 (2017).

7. J. Fan and R. Li, Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348-1360 (2001).

8. R. A. Fisher, Inverse probability. Proceedings of the Cambridge Philosophical Society, 26, 528-535 (1930).

9. J. B. S. Haldane, A note on inverse probability. Mathematical Proceedings of the Cambridge Philosophical Society, 28, 55-61 (1932).

10. W. Heinemann, Sextus Empiricus: Outlines of Pyrrhonism (trans. Robert Gregg Bury, 1933).

11. D. Hume, An Enquiry Concerning Human Understanding (P. F. Collier & Son, 1910) (1748).

12. E. T. Jaynes, Probability Theory: The Logic of Science (Cambridge University Press, 2003).

13. H. Jeffreys, Theory of Probability (Oxford University Press, 1939).

14. I. Kant, Critique of Pure Reason. P. Kitcher (intro.), W. Pluhar (trans.) (Hackett, Indianapolis, 1996) (1781).

15. A. N. Kolmogorov, Foundations of the Theory of Probability, second English edition (Chelsea, 1956) (1933).

16. P. S. Laplace, A Philosophical Essay on Probabilities, English translation of the 6th French edition (Dover, 1951) (1814).

17. Y. Lee, J. A. Nelder and Y. Pawitan, Generalized Linear Models with Random Effects: Unified Analysis via H-likelihood, 2nd edition (Chapman & Hall/CRC, 2017).

18. Y. Lee and H. S. Oh, A new sparse variable selection via random-effect model. Journal of Multivariate Analysis, 125, 89-99 (2014).

19. J. Neyman, Note on an article by Sir Ronald Fisher. Journal of the Royal Statistical Society, Series B (Methodological), 18(2), 288-294 (1956).

20. T. Ng, W. Lee and Y. Lee, Change-point estimators with true identification property. Bernoulli, 24, 616-660 (2018).

21. J. A. Paulos, The Mathematics of Changing Your Mind. New York Times (US), 5 August 2011.

22. Y. Pawitan, In All Likelihood: Statistical Modelling and Inference Using Likelihood (Oxford University Press, 2001).

23. Y. Pawitan and Y. Lee, Confidence as likelihood. To appear in Statistical Science (2020).

24. K. R. Popper, New Appendices to the Logic of Scientific Discovery, 6th revised impression of the 1959 English translation (Hutchinson, 1972) (1959).

25. L. J. Savage, Foundations of Statistics (Wiley & Sons, 1954).

26. S. Senn, Dicing with Death: Chance, Risk and Health (Cambridge University Press, 2003).

27. S. Senn, Comments on "Harold Jeffreys's Theory of Probability revisited". Statistical Science (2009).

28. T. Schweder and N. L. Hjort, Confidence, Likelihood, Probability (Cambridge University Press, 2016).

29. R. Von Mises, Probability, Statistics and Truth, 2nd revised English edition (Allen and Unwin, 1961) (1928).

30. M. Xie and K. Singh, Confidence distribution, the frequentist distribution estimator of a parameter: a review. International Statistical Review, 81, 3-39 (2013).

Supplementary materials
Bayesian approach
Laplace (1814) used the Bernoulli model as an instance of scientific theory for the sunrise problem. Let X = (X_1, ···, X_n) be independent and identically distributed Bernoulli random variables with success probability θ. Once we observe data x = (x_1, ···, x_n), we have the likelihood

L(x, θ) = P(X = x | θ) = θ^t (1 − θ)^(n−t),   (1)

where t = Σ x_i. Prior to knowing of any sunrise, suppose that one is completely ignorant of the value of θ. Laplace (1814) represented this prior ignorance by means of a uniform prior π(θ) = 1 on θ ∈ [0, 1]. Let T_n = Σ X_i. To find the logical conditional probability of θ given T_n = n, one uses the Bayes–Laplace rule: the conditional probability distribution of θ given the data x is called the posterior,

P(θ | x) = π(θ)L(x, θ) / ∫ L(x, θ)π(θ) dθ = π(θ)L(x, θ) / P(x),   (2)

where

P(x) = ∫ L(x, θ)π(θ) dθ = ∫ θ^t (1 − θ)^(n−t) dθ = b(t + 1, n − t + 1),

and b(·, ·) is the beta function. Let E be the particular proposition that the sun rises tomorrow. Then,
P(T_n = n) = b(n + 1, 1) = 1/(n + 1), to give

P(E | it has risen n consecutive days) = P(X_{n+1} = 1 | T_n = n) = (n + 1)/(n + 2) = 2190001/2190002 ≈ 0.9999995.

This shows that P(E | T_n = n) → 1 as n → ∞. One observation increases the probability of the particular proposition from n/(n + 1) to (n + 1)/(n + 2), so that the increment of probability is

(n + 1)/(n + 2) − n/(n + 1) = 1/{(n + 1)(n + 2)}.

Thus, the probability of this particular proposition will eventually be one. However, this is not enough to ensure that the general proposition G that the sun rises forever holds (Senn, 2003). The probability that the sun rises in the next m consecutive days, given the previous n consecutive sunrises, is

P(X_{n+1} = 1, ..., X_{n+m} = 1 | T_n = n) = P(T_{n+m} = n + m | T_n = n) = (n + 1)/(n + m + 1).

As long as n is finite, the probability of the general proposition G becomes zero because for all n > 0,

P(G | T_n = n) = lim_{m→∞} P(T_{n+m} = n + m | T_n = n) = 0.

Consider Jeffreys' prior, which places probability 1/2 on θ = 1 and puts a uniform prior on [0, 1) with probability 1/2. Then, P(T_n = n) = 0.5{1 + 1/(n + 1)}, to give

P(E | T_n = n) = P(T_{n+1} = n + 1 | T_n = n) = {1 + 1/(n + 2)}/{1 + 1/(n + 1)} = (n + 1)(n + 3)/(n + 2)^2,

P(T_{n+m} = n + m | T_n = n) = {1 + 1/(n + m + 1)}/{1 + 1/(n + 1)} = (n + 1)(n + m + 2)/{(n + 2)(n + m + 1)}.

Thus,

P(G | T_n = n) = lim_{m→∞} P(T_{n+m} = n + m | T_n = n) = (n + 1)/(n + 2).
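The two conditional-probability formulas derived above can be cross-checked with exact arithmetic, using ∫θ^k dθ = 1/(k + 1); a sketch:

```python
# Cross-check of P(T_{n+m} = n+m | T_n = n) under the two priors.
from fractions import Fraction

n, m = 50, 200

# Uniform prior: the ratio of marginal probabilities of all-success
# sequences is (1/(n+m+1)) / (1/(n+1)) = (n + 1)/(n + m + 1).
uniform = Fraction(1, n + m + 1) / Fraction(1, n + 1)
assert uniform == Fraction(n + 1, n + m + 1)

# Jeffreys' prior: mass 1/2 at theta = 1 plus uniform weight 1/2 on [0, 1).
num = Fraction(1, 2) + Fraction(1, 2) * Fraction(1, n + m + 1)
den = Fraction(1, 2) + Fraction(1, 2) * Fraction(1, n + 1)
assert num / den == Fraction((n + 1) * (n + m + 2), (n + 2) * (n + m + 1))

print(float(uniform), float(num / den))  # ~0.203 versus ~0.985
```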
Confidence approach

Let T be a continuous sufficient statistic for the parameter θ, and let t be an observed value of T. Define the right-side P-value function

C(t, θ) = P(T ≥ t | θ).

Given t, as a function of θ, C(t, −∞) = 0, C(t, ∞) = 1, and C(t, θ) is a strictly increasing function of θ. Thus, C(t, θ) behaves as if it were the cumulative distribution function of θ. This leads to the confidence density for θ, analogous to the Bayesian posterior,

P(θ | t) = dC(t, θ)/dθ,

which is the fiducial probability of Fisher (1930), not using a subjective prior. Schweder and Hjort (2016) called it the confidence density.

Fisher considered cases where T is continuous. However, in practical applications T does not have to be continuous. In the Bernoulli model, T_n is a sufficient statistic but is discrete. The so-called 'exact inference' from discrete data can be expressed in terms of the confidence density. Consider the conservative right-side P-value

C(t, θ) = P(T_n ≥ t | θ) = Σ_{y=t}^{n} [n!/{y!(n − y)!}] θ^y (1 − θ)^(n−y) = ∫_0^θ x^(t−1)(1 − x)^(n−t) dx / b(t, n − t + 1).

This leads to the confidence density

P(θ | t) = θ^(t−1)(1 − θ)^(n−t) / b(t, n − t + 1),

which gives a conservative confidence interval (Pawitan, 2001, Chapter 5). See Pawitan and Lee (2020) for a more thorough discussion. Because the confidence density is analogous to the posterior, the induced model prior is

c(θ) ∝ P(θ | t)/L(θ) = θ^(−1),

which is objective because it is directly obtained solely from the model. However, the model prior c(θ) ∝ θ^(−1), namely the Beta(0, 1) distribution, is improper, giving ∫ c(θ) dθ = ∞. Necessary computations for
Beta(0, 1) can be obtained as the limit of proper Beta(a, 1) distributions for a > 0. The confidence density can be viewed as the posterior (2), but with the Bayesian prior π(θ) replaced by the confidence prior c(θ). This leads to

P(T_n = n) = ∫ θ^(n−1) dθ = 1/n,

to give

P(X_{n+1} = 1 | T_n = n) = P(T_{n+1} = n + 1)/P(T_n = n) = n/(n + 1),

P(T_{n+m} = n + m | T_n = n) = P(T_{n+m} = n + m)/P(T_n = n) = n/(n + m).

Given n, P(G | T_n = n) = lim_{m→∞} P(T_{n+m} = n + m | T_n = n) = 0. Thus, this confidence density cannot yet overcome the degree of scepticism.

Now apply the confidence to the transformed data, by defining Y_i = 0 if the sun rises on the ith day, where P(Y_i = 1) = θ* = 1 − θ and θ is the long-run frequency of sunrises. Thus, θ* = 1 is equivalent to θ = 0. Then, Y_i = 1 − X_i. Let T*_n = Σ Y_i = n − T_n. Then,

P(Y_{n+1} = 0 | T*_n = 0) = P(X_{n+1} = 1 | T_n = n) = lim_{a↓0} b(a, n + 2)/b(a, n + 1) = 1,

to give

P(G | T*_n = 0) = P(G | T_n = n) = lim_{m→∞} P(T*_{n+m} = 0 | T*_n = 0) = 1.

Since P(G | T_n = n) = 1 under the transformed data, we can say that the sun will always rise with the logical probability being one. It is a little disturbing that the confidence statement depends upon the scale of the data, X_i or Y_i = 1 − X_i, to which the confidence procedure is applied. Below we examine the consequences and show that the resulting confidence procedures are consistent.

Because the confidence interval for θ* is easily transformed to that of θ, we can readily apply the confidence to the transformed data. This leads to the confidence density and the Beta(1, 0) model prior

P(θ | t) = P(θ* | n − t) ∝ θ^t (1 − θ)^(n−t−1) and c(θ) ∝ P(θ | t)/L(θ) = (1 − θ)^(−1).

This model prior leads to

P(X_{n+1} = 1 | T_n = n) = lim_{a↓0} b(n + 2, a)/b(n + 1, a) = 1,

to give

P(G | T_n = n) = lim_{m→∞} P(T_{n+m} = n + m | T_n = n) = 1.

We can interpret a
Beta(α, β) prior with α > 0 and β > 0 as describing the state of knowledge that a priori we have observed α successes and β failures. Then,

P(X_{n+1} = 1 | T_n = n) = b(n + 1 + α, β)/b(n + α, β) = (n + α)/(n + α + β)

and

P(X_{n+1} = 0 | T_n = 0) = b(α, n + 1 + β)/b(α, n + β) = (n + β)/(n + α + β).

Thus, P(X_{n+1} = 1 | T_n = n) → 1 as β → 0, and P(X_{n+1} = 0 | T_n = 0) → 1 as α → 0. Jaynes (2003) argued that the Bayes–Laplace
Beta(1, 1) prior is the state of knowledge in which we have observed one success and one failure prior to the experiment. Thus, we know that the experiment is a true binary one, in the sense of physical possibility. This explains why we cannot reach a sure confidence of the general proposition with the Bayes–Laplace uniform prior: one day of no sunrise is assumed a priori. Thus, it cannot be an ignorant prior. Suppose that you were sent a coin from a manufacturer who informed you that, before sending the coin, an experiment was done and found one head and one tail in two trials. Even if you have done an experiment with heads only for many trials, there is no way to have sure confidence on θ = 1 if you accept the manufacturer's information.

The model priors Beta(0, 1) and Beta(1, 0) allow the possibility that θ = 0 and θ = 1, respectively. The Haldane (1932) prior, namely the Beta(0, 0) ∝ θ^(−1)(1 − θ)^(−1), means that no success and no failure has been observed a priori, so that it presumes that either θ = 1 or θ = 0 is possible. Under Haldane's prior,

P(X_{n+1} = 1 | T_n = n) = 1 and P(X_{n+1} = 0 | T_n = 0) = 1,

to give P(G | T_n = n) = 1. If you were not given any information from the manufacturer, it is natural to have sure confidence on θ = 1, provided you have not observed a single tail in your own experiment. Note here that with the right-side P-value, for any θ ∈ [0, 1],

C(0, θ) = P(T_n ≥ 0 | θ) = 1.

This means that θ has a point mass at zero given t = 0, leading to the 100% confidence interval for θ, given t = 0, being {0}. With the transformed data, we can show that it has the point mass at θ = 1 given t = n, leading to the 100% confidence interval for θ, given t = n, being {1}. The general proposition such as θ = 0 or 1 is scientific because it can be falsified if a conflicting observation appears. Sure confidence for a general proposition such as θ = 1 or 0 is a consequence of the scientific procedure, the P-value.
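A quick numerical check of the discrete right-side P-value used above, assuming SciPy is available; the binomial tail probability coincides with the regularized incomplete beta function, so the confidence density is a Beta(t, n − t + 1) density:

```python
# Check: P(T_n >= t | theta) equals the Beta(t, n - t + 1) CDF at theta,
# so dC(t, theta)/dtheta is the Beta(t, n - t + 1) pdf.
from scipy.stats import binom, beta

n, t, theta = 20, 14, 0.7
print(binom.sf(t - 1, n, theta))        # P(T_n >= t | theta)
print(beta.cdf(theta, t, n - t + 1))    # same value via incomplete beta
print(beta.pdf(theta, t, n - t + 1))    # the confidence density at theta
```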
Oracle hypothesis testing and confidence estimation

The induced prior
Beta(1, 0) indeed gives simultaneous hypothesis testing for the two hypotheses H_0: θ = 1 versus H_1: θ ≠ 1 and estimation of the logical probability (confidence). When T_n = n, the confidence resolution allows H_0 to be accepted with sure confidence. When T_n ≠ n, H_1 is accepted with sure confidence, and a confidence interval for θ can be formed using the confidence density. Coverage probability statements of these intervals are consistent, maintaining the stated level as n increases. Thus, when θ = 1, we can say it is true with sure confidence provided T_n = n. When θ ≠ 1, T_n cannot be n if n is sufficiently large, so that we cannot achieve sure confidence on θ = 1 with a large n. The procedure is therefore consistent whether θ = 1 or not. With the Haldane (1932) prior, Beta(0, 0), the resulting procedure gives simultaneous hypothesis testing and confidence estimation: when T_n = n (T_n = 0), H_0: θ = 1 (H_0: θ = 0) is accepted with 100% confidence interval {1} ({0}), and when 1 ≤ T_n ≤ n − 1, the hypothesis H_1: θ ∈ (0, 1) is accepted with an estimator of the logical probability (confidence density)

P(θ | T_n = t) = θ^(t−1)(1 − θ)^(n−t−1) / b(t, n − t).
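A sketch of this full procedure under the Haldane prior Beta(0, 0), assuming SciPy; boundary counts return a degenerate 100% interval, while interior counts return an equal-tailed interval from the confidence density above:

```python
# Oracle hypothesis testing and confidence estimation with Beta(0, 0).
from scipy.stats import beta

def oracle_inference(t, n, level=0.95):
    if t == n:
        return "accept theta = 1 with 100% confidence interval {1}"
    if t == 0:
        return "accept theta = 0 with 100% confidence interval {0}"
    dist = beta(t, n - t)                    # confidence density, 0 < t < n
    tail = (1 - level) / 2
    return (dist.ppf(tail), dist.ppf(1 - tail))

print(oracle_inference(100, 100))            # complete confidence: theta = 1
print(oracle_inference(0, 100))              # complete confidence: theta = 0
print(oracle_inference(60, 100))             # interval for theta in (0, 1)
```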