Is profile likelihood a true likelihood? An argument in favor
Oliver J. Maclaren
Department of Engineering Science, University of Auckland, Auckland, New Zealand
Abstract.
Profile likelihood is the key tool for dealing with nuisance parameters in likelihood theory. It is often asserted, however, that profile likelihood is not a ‘true’ likelihood. One implication is that likelihood theory lacks the generality of e.g. Bayesian inference, wherein marginalization is the universal tool for dealing with nuisance parameters. Here we argue that profile likelihood has as much claim to being a true likelihood as a marginal probability has to being a true probability distribution. The crucial point, we argue, is that a likelihood function is naturally interpreted as a maxitive possibility measure: given this, the associated theory of integration with respect to maxitive measures delivers profile likelihood as the direct analogue of marginal probability in additive measure theory. Thus, given a background likelihood function, we argue that profiling over the likelihood function is as natural (or as unnatural, as the case may be) as marginalizing over a background probability measure. The connections to Bayesian inference can also be further clarified with the introduction of a suitable logarithmic distance function, in which case the present theory can be naturally described as ‘Tropical Bayes’ in the sense of tropical algebra.
Key words and phrases:
Estimation, Inference, Profile Likelihood, Marginalization, Nuisance Parameters, Idempotent Integration, Maxitive Measure Theory, Tropical Algebra, Tropical Bayes.
1. INTRODUCTION
Consider the opening sentence from the entry on profile likelihood in the Encyclopedia of Biostatistics (Aitkin, 2005):
The profile likelihood is not a likelihood, but a likelihood maximized over nuisance parameters given the values of the parameters of interest.
Numerous similar assertions that profile likelihood is not a ‘true’ likelihood may be found throughout the literature and various textbooks, and this is apparently the accepted viewpoint of the statistical community. Importantly, this includes the ‘pure’ likelihood literature, which generally accepts a lack of systematic methods for dealing with nuisance parameters, while still recommending profile likelihood as the most general, albeit ‘ad-hoc’, solution (see e.g. Royall, 1997; Rohde, 2014; Edwards, 1992; Pawitan, 2001). Similarly, recent monographs on characterizing statistical evidence present favorable opinions of the likelihood approach but criticize the lack of general methods for dealing with nuisance parameters (Aitkin, 2010; Evans, 2015). The various justifications given, however, appear to the present author to be rather vague and unconvincing. For example, suppose we modified the above quotation to refer to marginal probability instead of profile likelihood:
A marginal probability is not a probability, but a probability distribution integrated over nuisance variables given the values of the variables of interest.
The above would be a perfectly fine characterization of a marginal probability if the “not a probability, but” part was dropped, i.e.
A marginal probability is a probability distribution integrated over nuisance variables given the values of the variables of interest.
Simply put: the fact that a marginal probability is obtained by integrating over a ‘background’ probability distribution does not prevent the marginal probability from being a true probability. The crucial observation in the case of marginal probability is that integration over variables takes probability distributions to probability distributions.

The purpose of the present article is to point out that there is an appropriate notion of integration over variables that takes likelihood functions to likelihood functions via maximization. This notion of integration is based on the idea of idempotent analysis, wherein one replaces a standard algebraic operation such as addition in a given mathematical theory with another basic algebraic operation, defining a form of ‘idempotent addition’, to obtain a new analogous, self-consistent theory (Maslov, 1992; Kolokoltsov and Maslov, 1997). In this case one simply replaces the usual ‘addition’ operations, including the usual (Lebesgue) integration, with ‘maximization’ operations, including taking suprema, to obtain a new ‘idempotent probability theory’. Maximization in this context is understood algebraically as an idempotent addition operation, hence the terminology. While perhaps somewhat exotic at first sight, this idea finds direct applications in e.g. large deviation theory (Puhalskii, 2001) and, most relevantly, possibility theory, fuzzy set theory and pure-likelihood-based decision theory (Dubois, Moral and Prade, 1997; Cattaneo, 2013, 2017). A popular special instance of idempotent mathematics is so-called ‘tropical mathematics’, in which multiplication is also converted to a new algebraic operation, here addition (see e.g. Speyer and Sturmfels, 2009; Akian, Quadrat and Viot, 1996; Litvinov, 2007; Pachter and Sturmfels, 2004; Bernhard, 2000). That is, the basic ‘addition’ and ‘multiplication’ operations in tropical algebra are interpreted as (max, +), respectively, instead of the usual (+, ×).
With the introduction of a logarithmic distance in likelihood theory, multiplication of likelihoods becomes addition of log-likelihoods and we are naturally led to a ‘Tropical Bayesian’ interpretation of (log) profile likelihoods. This provides a formal foundation for the usual intuitive interpretation of (negative) log-likelihoods as ‘cost’ measures.

The present argument is not, of course, without objections. In particular, acceptance or rejection of the present interpretation depends on what one believes the key properties of likelihood should be; this is, perhaps surprisingly, not without significant controversy (Bayarri and DeGroot, 1992; Bjørnstad, 1996; Bayarri, DeGroot and Kadane, 1988). Thus we end with a discussion of various potential objections, including a discussion of some properties one might want a general notion of ‘likelihood’ to satisfy and whether the present interpretation does or does not satisfy these. Despite potential conflicts with some frequentist, evidential and/or Bayesian considerations, we believe that the present interpretation is a clear, self-consistent and suitable foundational concept for ‘pure’ likelihood theory (particularly that developed by Edwards, 1992), and/or for what we propose to call ‘Tropical Bayes’.
2. LIKELIHOOD AS A POSSIBILITY MEASURE
Though apparently not well known in the statistical literature, likelihood theory is known in the wider literature on uncertainty quantification to have a natural correspondence to possibility theory rather than to probability theory (Dubois, Moral and Prade, 1997; Cattaneo, 2013, 2017). This has perhaps been obscured by the usefulness of likelihood methods as tools in probabilistic statistical inference. It is not our intention to review this wider literature in detail here (see e.g. Dubois, Moral and Prade, 1997; Cattaneo, 2013, 2017; Augustin et al., 2014; Halpern, 2017, for more), but simply to point out the implications of this correspondence. In particular, likelihood theory interpreted as a possibilistic, rather than probabilistic, theory can be summarized as:
Probability theory with addition replaced by maximization.
As indicated above, this is sometimes known as, for example, ‘idempotent measure theory’, ‘maxitive measure theory’ or ‘possibility theory’, among other names (see e.g. Dubois, Moral and Prade, 1997; Cattaneo, 2013, 2017; Augustin et al., 2014; Halpern, 2017; Maslov, 1992; Kolokoltsov and Maslov, 1997; Puhalskii, 2001, for more). This correspondence perhaps explains the preponderance of maximization methods in likelihood theory, including the methods of maximum likelihood and profile likelihood.

The most important consequence of this perspective is that the usual Lebesgue integration with respect to an additive measure, as in probability theory, becomes, in likelihood/possibility theory, a different type of integration, defined with respect to a maxitive measure. Again, the key point is simply that addition operations (including summation and integration) are replaced by maximization operations (or taking suprema in general).

For completeness, we contrast the key axioms of possibility theory with those of probability theory. Given a set of possibilities Ω, assumed to be discrete for the moment for simplicity, and two discrete sets of possibilities A, B ⊆ Ω, the key axioms of elementary possibility theory are (Halpern, 2017):

(2.1)
    poss(∅) = 0
    poss(Ω) = 1
    poss(A ∪ B) = max{poss(A), poss(B)}

which can be contrasted with those of elementary probability theory:

(2.2)
    prob(∅) = 0
    prob(Ω) = 1
    prob(A ∪ B) = prob(A) + prob(B)

where A and B are required to be disjoint in the probabilistic case, but this is not strictly required in the possibilistic case.

Given a ‘background’ or ‘starting’ likelihood measure, likelihood theory can be developed as a self-contained theory of possibility, where derived distributions are manipulated according to the first set of axioms above. This is entirely analogous to developing probability theory from a background measure, with derived distributions manipulated according to the second set of axioms. As our intention is to consider methods for obtaining derived distributions by ‘eliminating’ nuisance parameters, we need not consider here where the starting measure comes from (but see the Discussion).

To make the correspondences of interest clear in what follows, we first present probabilistic marginalization as a special case of a pushforward measure or, equivalently, as a special case of a general (not necessarily 1-1) change of variables. We then consider the possibilistic analogues.
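The contrast between the axioms (2.1) and (2.2) can be checked directly on a small discrete space. The following sketch (Python; the point values are illustrative only, chosen to be exactly representable in binary floating point) builds a possibility measure and a probability measure from point values and verifies the respective union rules:

```python
# Contrast of axioms (2.1) and (2.2) on a small discrete space.
# Point values are illustrative only.
omega = {"a", "b", "c"}
point_poss = {"a": 1.0, "b": 0.5, "c": 0.25}  # possibility of each singleton

def poss(event):
    """Maxitive measure: poss(A) = max of point possibilities (0 for the empty set)."""
    return max((point_poss[w] for w in event), default=0.0)

point_prob = {"a": 0.5, "b": 0.25, "c": 0.25}  # probability of each singleton

def prob(event):
    """Additive measure: prob(A) = sum of point probabilities."""
    return sum(point_prob[w] for w in event)

A, B = {"a"}, {"b", "c"}  # disjoint events
# Maxitive union rule: poss(A ∪ B) = max{poss(A), poss(B)}
assert poss(A | B) == max(poss(A), poss(B))
# Additive union rule (disjoint A, B): prob(A ∪ B) = prob(A) + prob(B)
assert prob(A | B) == prob(A) + prob(B)
# Normalization: poss(∅) = 0, poss(Ω) = 1, prob(Ω) = 1
assert poss(set()) == 0.0 and poss(omega) == 1.0 and prob(omega) == 1.0
```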
3. PUSHFORWARD PROBABILITY MEASURES AND THE DELTA FUNCTION METHOD FOR GENERAL CHANGES OF VARIABLE
Given a probability measure µ over a random variable x ∈ R^n with associated density ρ, define the new random variable t = T(x), where T : R^n → R^m. This variable is distributed according to the pushforward measure T⋆µ, i.e. t ∼ T⋆µ. The density of t, here denoted by q = T⋆ρ, is conveniently calculated via the delta function method, which is valid for arbitrary changes of variables (not necessarily 1-1):

(3.1)  q(t) = [T⋆ρ](t) = ∫ δ(t − T(x)) ρ(x) dx.

As a side point, we note that this method of carrying out arbitrary transformations of variables is standard in statistical physics (see e.g. Van Kampen, 1992), but is apparently less common in statistics (see the articles Au and Tam, 1999; Khuri, 2004, aimed at highlighting this method to the statistical community).
The above means that we can interpret marginalization to a component x_1, say, as a special case of a (non-1-1) deterministic change of variables via:

(3.2)  ρ(x_1) = ∫ δ(x_1 − proj_1(x)) ρ(x) dx,

where proj_1(x) is simply the projection of x to its first coordinate. Thus marginalization can be thought of as the pushforward under the projection operator and as a special case of a general (not necessarily 1-1) change of variables t = T(x).
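On a discrete space the delta-function integral of Eq. (3.2) reduces to summing the density over each fiber {x | T(x) = t}. A minimal sketch (Python; the joint distribution below is a made-up illustration):

```python
# Marginalization as a pushforward under projection: on a discrete space the
# delta-function integral of Eq. (3.2) becomes a sum over each fiber
# {x | T(x) = t}. The joint distribution below is illustrative only.
joint = {  # p(x1, x2) on a 2x3 grid
    (0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.10,
    (1, 0): 0.25, (1, 1): 0.15, (1, 2): 0.20,
}

def pushforward(density, T):
    """Pushforward of a discrete density under an arbitrary (not necessarily
    1-1) map T: sum the density over each fiber {x | T(x) = t}."""
    out = {}
    for x, p in density.items():
        t = T(x)
        out[t] = out.get(t, 0.0) + p
    return out

proj1 = lambda x: x[0]  # projection to the first coordinate
marginal = pushforward(joint, proj1)
# Sums over fibers: p(x1=0) = 0.10 + 0.20 + 0.10, p(x1=1) = 0.25 + 0.15 + 0.20
assert abs(marginal[0] - 0.40) < 1e-12
assert abs(marginal[1] - 0.60) < 1e-12
```

The same `pushforward` works unchanged for any non-1-1 map T, not only projections, mirroring the generality of Eq. (3.1).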
4. PROFILE LIKELIHOOD AS MARGINAL POSSIBILITY AND AN EXTENSION TO GENERAL CHANGES OF VARIABLE
As we have repeatedly stressed above, likelihood theory interpreted as a possibilistic, and hence maxitive, measure theory simply means that addition operations such as the usual Lebesgue integration are replaced by maximization operations such as taking the supremum.

Consider first, then, the analogue of a marginal probability density, which we will call a marginal possibility distribution and denote by L_p. Starting from a ‘background’ likelihood measure L(x) we ‘marginalize’ in the analogous manner to before:

(4.1)  L_p(x_1) = sup_x {δ(x_1 − proj_1(x)) L(x)} = sup_{x | proj_1(x) = x_1} {L(x)}.

This is again simply the pushforward under the projection operator, but here under a different type of ‘integration’, i.e. the operation of taking a supremum. Of course, this is just the usual profile likelihood for x_1.

As above, we need not be restricted to marginal possibility distributions: we can consider arbitrary functions of the parameter, t = T(x). This leads to an analogous pushforward operation taking L(x) to L_p(t), which we denote by ⋆_p:

(4.2)  L_p(t) = [T ⋆_p L](t) = sup_x {δ(t − T(x)) L(x)} = sup_{x | T(x) = t} {L(x)},

which again corresponds to the usual definition of profile likelihood.
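The maxitive pushforward of Eq. (4.2) is the additive computation with the sum over each fiber replaced by a maximum. A sketch on a discrete parameter grid (Python; the background likelihood values and the interest map are made up for illustration):

```python
# Profile likelihood as a maxitive pushforward, Eq. (4.2):
# L_p(t) = sup over the fiber {x | T(x) = t} of L(x).
# The background likelihood below is illustrative only.
likelihood = {  # L(theta1, theta2) on a small grid
    (0, 0): 0.2, (0, 1): 0.9, (0, 2): 0.4,
    (1, 0): 1.0, (1, 1): 0.3, (1, 2): 0.5,
}

def profile(L, T):
    """Possibilistic pushforward: max (not sum) over each fiber {x | T(x) = t}."""
    out = {}
    for x, val in L.items():
        t = T(x)
        out[t] = max(out.get(t, 0.0), val)
    return out

# Profile out theta2 (i.e. project onto theta1):
Lp = profile(likelihood, lambda x: x[0])
assert Lp == {0: 0.9, 1: 1.0}
# The supremum is preserved: sup_t L_p(t) = sup_x L(x)
assert max(Lp.values()) == max(likelihood.values())
```

Note that the only change from the additive `pushforward` of the previous section is `max` in place of `+`, which is exactly the idempotent-integration substitution described in the text.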
5. A SIMPLE EXAMPLE COMPARING MARGINAL PROBABILITY ANDMARGINAL POSSIBILITY
Here we consider a simple example illustrating the difference between probabilistic and possibilistic reasoning, in particular under marginalization/non-1-1 changes of variable.

Suppose you have three suspects in a crime. Through some means or another you decide on the following ‘plausibility’ distribution, where plausibility is used here as a general umbrella term for probabilistic and/or possibilistic reasoning: suspect one has plausibility 0.4, while the other two suspects each have plausibility 0.3. You also know that suspect one was wearing a red hat at the time of the crime while the other two were wearing blue hats.

According to the above, under a probabilistic interpretation, the most probable perpetrator is suspect one (who wore a red hat); but the most probable hat color of the perpetrator is blue (with probability 0.3 + 0.3 = 0.6). This is a consequence of the additivity of probability theory and the non-1-1 change of variables in going from suspects to hat colors.

On the other hand, if you interpret the given plausibility numbers as a possibility distribution, then according to standard possibility theory the most possible suspect is suspect one and the most possible hat color is now red, i.e. the hat color of the most possible suspect. Similarly, this is a consequence of the maxitivity of possibility theory.

The difference can be made more extreme given a large number of ‘other’ suspects, each with low plausibility but sharing some common property that the main suspect lacks. Again, these results are a simple consequence of how additivity and maxitivity, respectively, interact with non-1-1 changes of variable (here: person to hat color).

We believe that there are reasonable situations where additivity is desirable, but also reasonable situations in which maxitivity might be preferred. This is a subject worth further debate. We note, however, that a relative probability approach to the problem of statistical evidence, such as that presented in Evans (2015), comes to conclusions similar to those of a possibility approach (Michael Evans, personal communication).
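The suspect example can be computed directly; a self-contained sketch (Python, using the plausibility numbers from the text) makes the divergence between the two marginals explicit:

```python
# The three-suspect example: the same point plausibilities, pushed forward
# additively vs maxitively under the non-1-1 map suspect -> hat color.
plaus = {"s1": 0.4, "s2": 0.3, "s3": 0.3}
hat = {"s1": "red", "s2": "blue", "s3": "blue"}

# Probabilistic (additive) marginal over hat color:
prob_hat = {}
for s, p in plaus.items():
    prob_hat[hat[s]] = prob_hat.get(hat[s], 0.0) + p

# Possibilistic (maxitive) marginal over hat color:
poss_hat = {}
for s, p in plaus.items():
    poss_hat[hat[s]] = max(poss_hat.get(hat[s], 0.0), p)

# Most plausible suspect is s1 (red hat), yet the most probable hat color is blue:
assert max(plaus, key=plaus.get) == "s1"
assert max(prob_hat, key=prob_hat.get) == "blue"  # 0.3 + 0.3 = 0.6 > 0.4
# Under maxitivity the most possible hat color agrees with the most possible suspect:
assert max(poss_hat, key=poss_hat.get) == "red"   # max(0.3, 0.3) = 0.3 < 0.4
```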
6. STRENGTH OF EVIDENCE, DISTANCES AND ‘TROPICAL BAYES’
As noted in Evans (2015), it is perhaps less controversial to hold that likelihood gives a qualitatively reasonable relative ordering of preference for parameter values in light of data than it is to hold (e.g. Royall, 1997) that it provides a quantitative measure of relative support.

To make some progress towards addressing this distinction, we consider how to define a suitable notion of distance that respects, but is distinct from, a given qualitative ordering. Notions of statistical distance are common in the statistical literature (see e.g. Basu, Shioya and Park, 2011, and references therein); here, however, we follow the ideas developed by Tarantola (2006) of quality spaces and distances defined in these. This leads naturally to the idea of pure likelihood theory as a form of what we propose to call ‘Tropical Bayes’, where the meaning of this term is discussed below.

In particular, given the ordering induced by a likelihood function (and/or profile likelihood function):

(6.1)  θ_1 is preferred to θ_2 iff L(θ_1) > L(θ_2),

we can define a likelihood distance via

(6.2)  D_L(θ_1, θ_2) = |log(L(θ_1)/L(θ_2))| = |log L(θ_1) − log L(θ_2)|.

This distance has the properties of being symmetric, additive and zero iff L(θ_1) = L(θ_2). Tarantola (2006) argues that this notion of distance is widely applicable for many types of qualitative orderings. In the present case it is, of course, just the well-known log-likelihood ratio function. We propose then that, accepting that the likelihood gives a natural qualitative preference or plausibility ordering, the log-likelihood then gives a natural distance in this ‘qualitative space’. There remains, however, a choice of logarithm base and/or a choice of arbitrary distance scale factor; thus we cannot fully remove some of the ‘qualitative’ features associated with pure likelihood theory without a further choice of reference. One natural choice might be to take the minimum distance to a fully saturated model, i.e. one which can fit the data perfectly, in which case one would be interested in how much ‘fit’ to trade off against parsimony considerations (Edwards, 1992).

Interestingly, the combination of replacing addition operations by maximization and then working in log-space (wherein multiplication becomes addition) corresponds to completing the ‘tropicalization’ of probability theory: moving from an algebraic structure in terms of (+, ×) to one in terms of (max, +). This is the subject of ‘tropical algebra’, which also goes by the name ‘max-plus’ algebra, and is a popular special instance of idempotent mathematics with applications to decision theory, uncertainty quantification, statistical inference and optimization (see e.g. Speyer and Sturmfels, 2009; Akian, Quadrat and Viot, 1996; Litvinov, 2007; Pachter and Sturmfels, 2004; Bernhard, 2000, for some relevant starting points in this area). A natural interpretation of negative log-likelihood functions in this context is as ‘cost measures’; these have also been termed ‘Maslov measures’, due to their origins in Maslov’s idempotent probability theory (Akian, Quadrat and Viot, 1996; Bernhard, 2000). These analogies are explored in detail by Akian, Quadrat and Viot (1996), where the natural analogue of a random variable is a decision variable, the analogue of a Markov chain is a Bellman chain (i.e. the Bellman equation from the subject of dynamic programming), and so on.

Finally, however, we note that even if profile likelihood is accepted as the natural analogue of marginal probability, the evidential interpretation of profile likelihood may still have difficulties; this is discussed further below.
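The stated properties of the distance in Eq. (6.2), symmetry, additivity along the likelihood ordering, and vanishing iff the likelihoods agree, can be checked numerically. A small sketch (Python; the likelihood values are illustrative only):

```python
import math

# Likelihood distance of Eq. (6.2): D_L(a, b) = |log L(a) - log L(b)|.
L = {"th1": 0.8, "th2": 0.2, "th3": 0.05}  # illustrative likelihood values

def D(a, b):
    """Log-likelihood-ratio distance between parameter values a and b."""
    return abs(math.log(L[a]) - math.log(L[b]))

# Symmetry and identity of indiscernibles:
assert D("th1", "th2") == D("th2", "th1")
assert D("th1", "th1") == 0.0
# Additivity along the likelihood ordering th1 > th2 > th3:
assert abs(D("th1", "th3") - (D("th1", "th2") + D("th2", "th3"))) < 1e-12
```

Changing the logarithm base rescales every distance by the same constant, which is the arbitrary scale factor mentioned in the text.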
7. DISCUSSION

7.1 Objections to profile likelihood
As discussed, it is frequently asserted that profile likelihood is not a true likelihood (Aitkin, 2005; Royall, 1997; Pawitan, 2001; Rohde, 2014; Evans, 2015). Common reasons include: that it is obtained from a likelihood via maximization (Aitkin, 2005), that it is not based directly on observable quantities (Royall, 1997; Pawitan, 2001; Rohde, 2014), and that it lacks particular repeated sampling properties (Royall, 1997; Cox and Barndorff-Nielsen, 1994).

None of the above objections appear to the present author to apply to the following: given a starting or ‘background’ likelihood function, profile likelihood satisfies the axioms of possibility theory, in which the basic additivity axiom of probability theory is replaced by a maxitivity axiom. Profile likelihood is simply the natural possibilistic counterpart to marginal probability, where additive integration is replaced by a maxitive analogue. We thus argue that, if marginal probability is a ‘true’ probability, then profile likelihood should likewise be considered a ‘true’ likelihood, at least when likelihood theory is interpreted in a possibilistic manner. Negative log-likelihood functions can then be naturally interpreted as cost measures in the sense of tropical mathematics.
Regarding the latter two objections mentioned above, concerning observable quantities and repeated sampling properties: it is important to note that the given data must be held fixed to give a consistent background likelihood over which to profile. Given fixed data one has a fixed possibility measure and thus can consider ‘marginal’, i.e. profile, likelihoods. In contrast, repeated sampling will produce a distribution of such possibility measures, and these may or may not have good frequentist properties. None of this is in contrast to marginal probability: changing the distribution over which we marginalize changes the resulting marginal probability. Of course, despite this caveat, profile likelihood often does have good repeated sampling properties (Royall, 1997; Cox and Barndorff-Nielsen, 1994) and also plays a key role in frequentist theory, though we do not discuss this further here. One consequence is that our conception of profile likelihood does not generally satisfy properties such as zero expectation of the associated score function (Cox and Barndorff-Nielsen, 1994; Pawitan, 2001). These are, however, properties dependent on particular repeated sampling notions such as ‘unbiasedness’, and hence are more properly considered frequentist concepts. The present approach is more suitable for those seeking a non-probabilistic ‘plausibility’ measure, as induced by data that are considered fixed once observed.
A natural question, perhaps, is: why worry about whether profile likelihood is a true likelihood? One answer is that profile likelihood is a widely used tool but is often dismissed as ‘ad-hoc’ or lacking proper justification. This gives the impression that, for example, likelihood theory is lacking in comparison with e.g. Bayesian theory in terms of systematic methods for dealing with nuisance parameters. By understanding that profile likelihood does in fact have a systematic basis in terms of possibility theory, practitioners and students can better understand and reason about a widely popular and useful tool. Understanding the connection to possibilistic as opposed to probabilistic reasoning may also help explain why profile likelihood has emerged as a particularly promising method of identifiability analysis (Raue et al., 2009), where identifiability is traditionally a prerequisite for probabilistic analysis. Of course, as indicated, the price of accepting profile likelihood as a ‘true’ likelihood is an interpretation in terms of pure likelihood theory, and this makes the connections to repeated sampling properties more complicated. We see no need, however, to restrict oneself to one perspective on statistical inference: the present possibilistic view can complement other approaches such as frequentist or Bayesian statistics. Furthermore, this analogy opens strong connections between likelihood theory and the optimization literature; the foundations of such connections have already been explored by e.g. Akian, Quadrat and Viot (1996) and Bernhard (2000), and provide a natural link to pure likelihood decision theory as developed by Cattaneo (2013).
The possibilistic interpretation of likelihood also helps in understanding the representation of ignorance. While probabilistic ignorance is not preserved under arbitrary changes of variables (e.g. non-1-1 transformations), even in the discrete case, possibilistic ignorance is, in the following sense: if we take the maximum likelihood over a set of possibilities, such as {x | T(x) = t} for each t, rather than summing over them, a flat ‘prior likelihood’ (Edwards, 1969, 1992) over x becomes a flat prior likelihood over t. On the other hand, a flat prior probability over x in general becomes non-flat over t under non-1-1 changes of variable. Thus a profile prior likelihood has what, in many cases, may be desirable properties as a representation of prior ignorance (see the discussion in Edwards, 1969, 1992, for more on likelihood and the representation of ignorance). This difference in transformation properties was also illustrated in our simple example comparing the probabilistic and possibilistic analysis of criminal evidence. As noted there, however, the relative probability approach à la Evans (2015) reaches conclusions closer to the possibilistic analysis, compared to the conclusions of the ‘absolute’ probabilistic analysis (Michael Evans, personal communication).
Likelihood is traditionally considered a point function as opposed to a set function; this is also related to controversy over defining likelihood functions for so-called composite hypotheses (see e.g. Edwards, 1992; Royall, 1997). Authors such as Basu (2012) have argued, contra e.g. Fisher, that likelihood could be directly extended to a set function. Basu (2012) further developed the argument that this set function could be taken as additive; we are more inclined, here at least, to accept the first possibility and reject the second. A number of other authors have also considered the question of composite hypotheses, in particular in the context of defining evidence (see e.g. Zhang and Zhang, 2013; Blume, 2013; Bickel, 2012).

We have attempted to avoid the issue of set functions/composite hypotheses somewhat by instead using the concept of a non-1-1 transformation of variables. This allows us to consider the likelihood of subsets of the full/background parameter space based on an indexing statistic, i.e. by using subsets defined via {x | T(x) = t}. This approach is based on what amounts to equality constraints, leaving out subsets defined via inequality constraints. It may be desirable to further relax this and simply consider likelihood directly as a set function defined via

(7.1)  L_p(A) = sup_{x ∈ A} {L(x)} for A ⊆ X.

This allows for inequality constraints such as those in A = {x | T(x) ≤ t}. We leave consideration of this approach to future work. Presumably, however, one could recover the present approach by considering some notion of minimal and/or extremal sets of equality constraints, e.g. by restricting attention to those inequality constraints that are active during the profiling/maximization procedure, and hence are reduced to binding equality constraints. The interpretation of negative log-likelihoods as cost measures may also be helpful here.
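The set-function extension of Eq. (7.1) is a one-liner on a discrete parameter space; the sketch below (Python; the likelihood values and the indexing statistic are made up for illustration) shows that equality constraints recover the profile likelihood while inequality constraints are handled just as directly:

```python
# Set-function likelihood, Eq. (7.1): L_p(A) = sup over x in A of L(x).
L = {1: 0.2, 2: 0.7, 3: 1.0, 4: 0.4}  # illustrative background likelihood
T = lambda x: x * x                    # illustrative indexing statistic

def L_set(A):
    """Maxitive set function induced by the point likelihood L."""
    return max((L[x] for x in A), default=0.0)

# Equality constraint {x | T(x) = 4} recovers the profile likelihood at t = 4:
assert L_set({x for x in L if T(x) == 4}) == 0.7
# Inequality constraint {x | T(x) <= 9} is handled directly:
assert L_set({x for x in L if T(x) <= 9}) == 1.0
# Maxitivity of the induced set function: L_p(A ∪ B) = max{L_p(A), L_p(B)}
A, B = {1, 2}, {3, 4}
assert L_set(A | B) == max(L_set(A), L_set(B))
```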
One of the key issues to consider when deciding whether to accept profile likelihoods as ‘true’ likelihoods is whether they can play the same role that ‘full’ likelihoods play in defining evidential measures (Royall, 1997; Aitkin, 2010; Evans, 2015; Zhang and Zhang, 2013; Blume, 2013; Bickel, 2012). Mathematically, it appears clear that profile likelihood is entirely analogous to marginal probability; it is less clear whether, or under what circumstances, one should use marginal (whether maxitive or additive) measures in defining evidence. We believe that this applies equally to the Bayesian approach. A way forward from here would be to separate the questions: first accept profile likelihood as a ‘marginal’ possibility measure, and then investigate under what circumstances marginal measures can be given further evidential interpretations. We suspect that the answer may require additional concepts and/or assumptions like those used in the causal inference literature to separate spurious marginal associations from ‘true’ causation (Pearl, 2009a,b). That is, we suspect that ‘evidence’ may be better defined in causal terms than in either purely probabilistic or purely possibilistic terms. As such, the question of whether or not profile likelihood is a ‘true’ likelihood should be independent of whether it plays the role of an evidential measure, unless the definition of likelihood is itself explicitly supplemented with causal assumptions.
8. CONCLUSIONS
We have argued that profile likelihood has as much claim to being a true likelihood as a marginal probability has to being a true probability distribution. In the case of marginal probability, integration over variables takes probability distributions to probability distributions, while in the case of likelihood, maximization takes likelihood functions to likelihood functions. Maximization can be considered in this context as an alternative (idempotent) notion of integration, and a likelihood function as a maxitive possibility measure. There are some conflicts with both Bayesian and frequentist considerations, however: lack of additivity and lack of some repeated sampling properties, respectively. In our view, these conflicts are not necessarily an issue, as neither additivity nor repeated sampling properties such as unbiasedness are beyond objection. Instead we argue that the present approach gives a self-consistent theory suitable for possibilistic statistical analysis, with a well-defined method of treating nuisance parameters, and one which continues in the tradition of ‘pure’ likelihood theories. The connection of profile likelihoods to evidential interpretations appears subtle (as is, we believe, the connection of marginal probabilities to evidence); our view is that this issue should be explored further in the context of formulating additional causal properties that an evidence measure should satisfy, such as those required to classify marginal correlations into ‘spurious’ and ‘true’ causal relationships. Finally, taking profile likelihood seriously as a ‘true’ likelihood leads naturally to the idea of ‘Tropical Bayesian Inference’, a subject yet to be properly explored by the statistical community.
ACKNOWLEDGEMENTS
The author would like to thank Michael Evans, Marco Cattaneo, Yudi Pawitan,Alexandre Patriota, Christian Robert and Anthony Edwards for useful commentsand/or discussions.
REFERENCES
Aitkin, M. (2005). Profile likelihood. In Encyclopedia of Biostatistics. John Wiley & Sons, Ltd.
Aitkin, M. (2010). Statistical Inference: An Integrated Bayesian/Likelihood Approach. Chapman & Hall/CRC Monographs on Statistics & Applied Probability. CRC Press.
Akian, M., Quadrat, J. P. and Viot, M. (1996). Duality between probability and optimization. In Idempotency (J. Gunawardena, ed.). Cambridge University Press.
Au, C. and Tam, J. (1999). Transforming variables using the Dirac generalized function. Am. Stat.
Augustin, T., Coolen, F. P. A., de Cooman, G. and Troffaes, M. C. M. (2014). Introduction to Imprecise Probabilities. John Wiley & Sons.
Basu, D. (2012). Statistical Information and Likelihood: A Collection of Critical Essays by Dr. D. Basu. Springer Science & Business Media.
Basu, A., Shioya, H. and Park, C. (2011). Statistical Inference: The Minimum Distance Approach. CRC Press.
Bayarri, M. J., DeGroot, M. H. and Kadane, J. B. (1988). What is the likelihood function? (with discussion). In Statistical Decision Theory and Related Topics IV (S. S. Gupta and J. O. Berger, eds.). Springer, New York.
Bayarri, M. J. and DeGroot, M. H. (1992). Difficulties and ambiguities in the definition of a likelihood function. J. It. Statist. Soc.
Bernhard, P. (2000). Max-plus algebra and mathematical fear in dynamic optimization. Set-Valued Analysis.
Bickel, D. R. (2012). The strength of statistical evidence for composite hypotheses: Inference to the best explanation. Stat. Sin.
Bjørnstad, J. F. (1996). On the generalization of the likelihood function and the likelihood principle. J. Am. Stat. Assoc.
Blume, J. D. (2013). Likelihood and composite hypotheses [Comment on “A Likelihood Paradigm for Clinical Trials”]. J. Stat. Theory Pract.
Cattaneo, M. E. G. V. (2013). Likelihood decision functions. Electron. J. Stat.
Cattaneo, M. E. G. V. (2017). The likelihood interpretation as the foundation of fuzzy set theory. Int. J. Approx. Reason.
Cox, D. R. and Barndorff-Nielsen, O. E. (1994). Inference and Asymptotics. Chapman and Hall, London.
Dubois, D., Moral, S. and Prade, H. (1997). A semantics for possibility theory based on likelihoods. J. Math. Anal. Appl.
Edwards, A. W. F. (1969). Statistical methods in scientific inference. Nature.
Edwards, A. W. F. (1992). Likelihood, expanded ed. Johns Hopkins University Press, Baltimore.
Evans, M. (2015). Measuring Statistical Evidence Using Relative Belief. Chapman & Hall/CRC Monographs on Statistics & Applied Probability. CRC Press.
Halpern, J. Y. (2017). Reasoning about Uncertainty. MIT Press.
Khuri, A. I. (2004). Applications of Dirac’s delta function in statistics. Internat. J. Math. Ed. Sci. Tech.
Kolokoltsov, V. and Maslov, V. P. (1997). Idempotent Analysis and Its Applications. Springer Science & Business Media.
Litvinov, G. L. (2007). Maslov dequantization, idempotent and tropical mathematics: A brief introduction. J. Math. Sci.
Maslov, V. P. (1992). Idempotent Analysis. American Mathematical Soc.
Pachter, L. and Sturmfels, B. (2004). Tropical geometry of statistical models. Proc. Natl. Acad. Sci. U.S.A.
Pawitan, Y. (2001). In All Likelihood: Statistical Modelling and Inference Using Likelihood. Oxford Science Publications. OUP Oxford.
Pearl, J. (2009a). Causal inference in statistics: An overview. Stat. Surv.
Pearl, J. (2009b). Causality. Cambridge University Press.
Puhalskii, A. (2001). Large Deviations and Idempotent Probability. CRC Press.
Raue, A., Kreutz, C., Maiwald, T., Bachmann, J., Schilling, M., Klingmüller, U. and Timmer, J. (2009). Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood. Bioinformatics.
Rohde, C. A. (2014). Introductory Statistical Inference with the Likelihood Function. Springer International Publishing.
Royall, R. (1997). Statistical Evidence: A Likelihood Paradigm. CRC Press.
Speyer, D. and Sturmfels, B. (2009). Tropical mathematics. Math. Mag.
Tarantola, A. (2006). Elements for Physics: Quantities, Qualities, and Intrinsic Theories. Springer Science & Business Media.
Van Kampen, N. G. (1992). Stochastic Processes in Physics and Chemistry. Elsevier.
Zhang, Z. and Zhang, B. (2013). A likelihood paradigm for clinical trials. J. Stat. Theory Pract.