Linear Functions to the Extended Reals∗
Bo Waggoner
University of Colorado, Boulder
February 19, 2021
Abstract
This note investigates functions from R^d to R ∪ {±∞} that satisfy axioms of linearity wherever allowed by extended-value arithmetic. They have a nontrivial structure defined inductively on d, and unlike finite linear functions, they require Ω(d²) parameters to uniquely identify. In particular they can capture vertical tangent planes to epigraphs: a function (never −∞) is convex if and only if it has an extended-valued subgradient at every point in its effective domain, if and only if it is the supremum of a family of "affine extended" functions. These results are applied to the well-known characterization of proper scoring rules, for the finite-dimensional case: it is carefully and rigorously extended here to a more constructive form. In particular, it is investigated when proper scoring rules can be constructed from a given convex function.

1 Introduction

The extended real number line, denoted R̄ = R ∪ {±∞}, is widely useful, particularly in convex analysis. But I am not aware of an answer to the question: what would it mean to have a "linear" f : R^d → R̄? The extended reals have enough structure to hope for a useful answer, but differ enough from a vector space to need investigation. A natural approach is that f must satisfy the usual linearity axioms of homogeneity and additivity whenever legal under extended-reals arithmetic. Here is an example when d = 3:

    f(x, y, z) =  ∞ if z > 0;  −∞ if z < 0;  and if z = 0:
                  ∞ if y > 0;  −∞ if y < 0;  x if y = 0.

We will see that all linear extended functions (Definition 2.1) on R^d have the above inductive format, descending dimension-by-dimension until reaching a finite linear function (Proposition 2.5). In fact, a natural representation of this procedure is parsimonious, requiring as many as Ω(d²) real-valued parameters to uniquely identify a linear extended function on R^d (Proposition 2.6). This structure raises the possibility, left for future work, that linear extended functions are significantly nontrivial on infinite-dimensional spaces.

Linear extended functions arise naturally as extended subgradients (Definition 3.1) of a convex function g. These can be used to construct affine extended functions (Definition 3.4) capturing vertical supporting hyperplanes to convex epigraphs. Along such a hyperplane, intersected with the boundary of the domain, g can sometimes have the structure of an arbitrary convex function in d − 1 dimensions. For example, the above f is an extended subgradient, at the point (1, 0, 0), of

    g(x, y, z) =  ∞ if z > 0;  ∞ if z < 0;  and if z = 0:
                  ∞ if y > 0;  ∞ if y < 0;  x²/2 if y = 0.

A proper function (one that is never −∞) is convex if and only if it has an extended subgradient at every point in its effective domain (Proposition 3.10). We also find it is convex if and only if it is the pointwise supremum of affine extended functions (Proposition 3.7).

∗ Many thanks to Rafael Frongillo for discussions and suggestions, and to Terry Rockafellar for comments.
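For concreteness, the nested cases of the example f are easy to express as a short program. The following Python sketch is purely illustrative (the function name and structure are mine, not the paper's):

```python
import math

def f(x, y, z):
    """Example linear extended function on R^3 from the introduction.

    The sign of z is checked first, then the sign of y; only on the
    remaining one-dimensional subspace (y = z = 0) is f finite linear.
    """
    if z > 0:
        return math.inf
    if z < 0:
        return -math.inf
    if y > 0:
        return math.inf
    if y < 0:
        return -math.inf
    return x  # finite linear on the subspace y = z = 0
```

Note how the definition descends one dimension at a time; this is exactly the inductive structure formalized below.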
Proper scoring rules. The motivation for this investigation was the study of scoring rules: functions S assigning a score S(p, y) to any prediction p (a probability distribution over outcomes) and observed outcome y. This S is called proper if reporting the true distribution of the outcome maximizes expected score. A well-known characterization states that proper scoring rules arise as and only as subgradients of convex functions [McCarthy, 1956, Savage, 1971, Schervish, 1989].

Modern works [Gneiting and Raftery, 2007, Frongillo and Kash, 2014] generalize this characterization to allow scores of −∞, corresponding to convex functions that are not subdifferentiable. (These works also consider infinite-dimensional outcome spaces, which are not treated here.) They replace subgradients with briefly-defined generalized "subtangents", but they slightly sidestep the questions tackled head-on here: the existence of these objects and their roles in convex analysis. So these characterizations are not fully constructive. In particular, it may surprise even experts that rigorous answers are not available to the following questions, given a convex function g: (a) under what conditions can one construct a proper scoring rule from it? (b) when is the resulting scoring rule strictly proper? (c) do the answers depend on which subgradients of g are used, when multiple choices are available?

Section 4 uses the machinery of linear extended functions to prove a "construction of proper scoring rules" theorem that answers such questions for the finite-outcome case: any convex function on the probability simplex gives rise to a proper scoring rule (using any choices of its extended subgradients).
Furthermore, it shows that a strictly convex function can only give rise to strictly proper scoring rules, and vice versa. These facts are likely known by experts in the community; however, I do not know of formal claims and proofs. This may be because, when scoring rules can be −∞, proofs seem to require significant formalization and investigation of "subtangents". In this paper, this investigation is supplied by linear extended functions and characterizations of (strictly) convex functions as those that have (uniquely supporting) extended subgradients everywhere.

Next, Theorem 4.15 allows prediction spaces P to be any subset of the simplex. It sharpens the characterizations of Gneiting and Raftery [2007], Frongillo and Kash [2014] by showing S to be proper if and only if it can be constructed from extended subgradients of a convex g that is interior-locally-Lipschitz (Definition 4.13). It also answers the construction questions (a)-(c) above, e.g. showing that given such a g, some but not necessarily all selections of its extended subgradients give rise to proper scoring rules.

Preliminaries.
Functions in this work are defined on Euclidean spaces of dimension d ≥ 0; I let R^d refer to any such space. For example, I may argue that a certain function exists on domain R^d, then refer to that function as being defined on a given d-dimensional subspace of R^{d+1}.

Let g : R^d → R̄. The effective domain effdom(g) of g is {x ∈ R^d : g(x) ≠ ∞}. The function g is convex if its epigraph {(x, y) ∈ R^d × R : y ≥ g(x)} is a convex set. Equivalently, it is convex if for all x, x′ ∈ effdom(g) and all 0 < ρ < 1, g(ρ·x + (1 − ρ)x′) ≤ ρg(x) + (1 − ρ)g(x′), observing that this sum may contain −∞ but not +∞. It is strictly convex on a convex set P ⊆ effdom(g) if the previous inequality is always strict for x, x′ ∈ P with x ≠ x′.

We say a function h minorizes g if h(x) ≤ g(x) for all x. cl(A) denotes the closure of the set A and int(A) its interior. For a ∈ R, the sign function is sign(a) = 1 if a > 0, sign(a) = 0 if a = 0, and sign(a) = −1 if a < 0.

The extended reals.
The extended reals, R̄ = R ∪ {±∞}, have the following rules of arithmetic for any α, β ∈ R: β + ∞ = ∞, β − ∞ = −∞, and

    α · ∞ =  ∞ if α > 0;  0 if α = 0;  −∞ if α < 0.

Illegal and disallowed is addition of ∞ and −∞. Addition is associative and commutative as long as it is legal; multiplication of multiple scalars and possibly one non-scalar is associative and commutative. Multiplication by a scalar distributes over legal sums.

R̄ has the following rules of comparison: −∞ < β < ∞ for every β ∈ R. The supremum of a subset of R̄ is ∞ if it contains ∞ or it contains an unbounded-above set of reals; the analogous facts hold for the infimum. Also, inf ∅ = ∞ and sup ∅ = −∞. I will not put a topology on R̄ in this work.
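As an aside for readers following along in code, IEEE floating point already implements most of these conventions. Two caveats are handled explicitly in this hypothetical Python helper: the illegal sum ∞ + (−∞) yields NaN rather than an error, and Python gives 0.0 · ∞ = NaN where the convention above requires 0.

```python
import math

def legal_sum(a, b):
    """a + b in the extended reals; raise on the illegal case of ∞ + (−∞)."""
    if {a, b} == {math.inf, -math.inf}:
        raise ValueError("infinity plus negative infinity is undefined")
    return a + b

def scale(alpha, a):
    """alpha * a with the convention 0·(±∞) = 0 (IEEE floats give NaN)."""
    return 0.0 if alpha == 0 else alpha * a

assert legal_sum(5.0, math.inf) == math.inf       # β + ∞ = ∞
assert legal_sum(5.0, -math.inf) == -math.inf     # β − ∞ = −∞
assert scale(0.0, math.inf) == 0.0                # the α = 0 convention
assert scale(-3.0, math.inf) == -math.inf         # α < 0
```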
2 Linear extended functions

The following definition makes sense with a general real vector space X in place of R^d, but it remains to be seen if all results can extend.

Definition 2.1. Call the function f : R^d → R̄ a linear extended function if:

1. (scaling) For all x ∈ R^d and all α ∈ R: f(αx) = αf(x).
2. (additivity) For all x, x′ ∈ R^d: If f(x) + f(x′) is legal, i.e. {f(x), f(x′)} ≠ {±∞}, then f(x + x′) = f(x) + f(x′).

If the range of f is included in R, Definition 2.1 reduces to the usual definition of a linear function. For clarity, this paper may emphasize the distinction by calling such f finite linear.

To see that this definition can be satisfied nontrivially, let f₁, f₂ : R^d → R be finite linear and consider f(x) := ∞ · f₁(x) + f₂(x). With a representation f₁(x) = v₁ · x, we have

    f(x) =  ∞ if v₁ · x > 0;  −∞ if v₁ · x < 0;  f₂(x) if v₁ · x = 0.
Claim 2.2. Any such f is a linear extended function.

Proof. Multiplication by a scalar distributes over the legal sum ∞ · f₁(x) + f₂(x), so αf(x) = ∞ · f₁(αx) + f₂(αx) = f(αx). To obtain additivity of f, consider cases on the pair (v₁ · x, v₁ · x′). If both are zero, f(x + x′) = f₂(x + x′) = f₂(x) + f₂(x′) = f(x) + f(x′). If they have opposite signs, then {f(x), f(x′)} = {±∞} and the requirement is vacuous. If one is positive and the other nonnegative, then v₁ · (x + x′) > 0, so we have f(x + x′) = ∞ = f(x) + f(x′). The remaining case, one negative and the other nonpositive, is exactly analogous.
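A quick numerical sanity check of Claim 2.2 can be run in a few lines; this is an illustrative sketch (make_f and the test vectors are hypothetical choices, not from the paper):

```python
import math, random

def make_f(v1, v2):
    """f(x) = ∞·(v1·x) + v2·x, the depth-one construction of Claim 2.2."""
    def dot(u, x):
        return sum(ui * xi for ui, xi in zip(u, x))
    def f(x):
        s = dot(v1, x)
        if s > 0:
            return math.inf
        if s < 0:
            return -math.inf
        return dot(v2, x)
    return f

f = make_f([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
x = [0.0, random.random(), random.random()]   # points with v1·x = 0
y = [0.0, random.random(), random.random()]
assert f([a + b for a, b in zip(x, y)]) == f(x) + f(y)  # additivity (legal case)
assert f([2 * a for a in x]) == 2 * f(x)                # scaling, α > 0
assert f([3.0, 1.0, 0.0]) == math.inf                   # v1·x > 0 gives ∞
```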
Such f are not all the linear extended functions, because the subspace where v₁ · x = 0 need not be entirely finite-valued. As in the introduction's example, it can itself be divided into infinite and neg-infinite open halfspaces, with f finite only on some further-reduced subspace. We will show that all linear extended functions can be constructed in this recursive way, Algorithm 1 (heavily relying on the setting of R^d).

For the following lemma, recall that a subset S of R^d is a convex cone if, when x, x′ ∈ S and α, β > 0, we have αx + βx′ ∈ S. A convex cone need not contain 0.

Lemma 2.3 (Decomposition). f : R^d → R̄ is linear extended if and only if: (1) the sets S₊ = f⁻¹(∞) and S₋ = f⁻¹(−∞) are convex cones with S₊ = −S₋, and (2) the set F = f⁻¹(R) is a subspace of R^d, and (3) f coincides with some finite linear function on F.

Algorithm 1 Computing a linear extended function f : R^d → R̄

    Parameters: t ∈ {0, . . . , d}; finite linear f̂ : R^{d−t} → R; if t ≥ 1, unit vectors v₁ ∈ R^d, . . . , v_t ∈ R^{d−t+1}
    Input: x ∈ R^d
    for j = 1, . . . , t do
        if v_j · x > 0 then return ∞
        else if v_j · x < 0 then return −∞
        end if
        reparameterize x as a vector in R^{d−j}, a member of the subspace {x′ : v_j · x′ = 0}
    end for
    return f̂(x)
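Here is a minimal Python sketch of Algorithm 1. To stay self-contained it avoids explicit reparameterization: it assumes the unit vectors v_j are all represented in R^d and are mutually orthogonal, with the finite linear part given by a vector w orthogonal to every v_j. This is an equivalent formulation under a choice of coordinates, not the paper's literal algorithm.

```python
import math

def linear_extended(vs, w):
    """Algorithm 1, sketched in R^d instead of reparameterizing.

    vs: list of t mutually orthogonal unit vectors v_1, ..., v_t.
    w:  vector giving the finite linear part f̂(x) = w·x on the final
        subspace; assumed orthogonal to every v_j.
    """
    def dot(u, x):
        return sum(ui * xi for ui, xi in zip(u, x))
    def f(x):
        for v in vs:
            s = dot(v, x)
            if s > 0:
                return math.inf
            if s < 0:
                return -math.inf
            # s == 0: x already lies in the subspace v·x' = 0; descend.
        return dot(w, x)
    return f

# The introduction's d = 3 example: t = 2, v1 = e_z, v2 = e_y, f̂(x) = x.
f = linear_extended([[0, 0, 1], [0, 1, 0]], [1, 0, 0])
assert f([5, 0, 2]) == math.inf
assert f([5, -1, 0]) == -math.inf
assert f([5, 0, 0]) == 5
```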
Proof. (⟹) Let f be linear extended. The scaling axiom implies f(0) = 0, so 0 ∈ F. Observe that F is closed under scaling (by the scaling axiom) and addition (the addition axiom is never vacuous on x, x′ ∈ F). So it is a subspace. f satisfies scaling and additivity (never vacuous) for all members of F, so it is finite linear there. This proves (2) and (3). Next: the scaling axiom implies that if x ∈ S₊ then −x ∈ S₋, giving S₊ = −S₋ as claimed. Now let x, x′ ∈ S₊ and α, β > 0. The scaling axiom implies αx and βx′ are in S₊. The additivity axiom implies αx + βx′ ∈ S₊, proving it is a convex cone. We immediately get S₋ = −S₊ is a convex cone as well, proving (1).

(⟸) Suppose F is a subspace on which f is finite linear and S₊ = −S₋ is a convex cone. We prove f satisfies the two axioms needed to be linear extended. First, F includes 0 (0 ∉ S₊ = −S₋, because S₊ ∩ S₋ = ∅). Because f is finite linear on F, f(0) = 0. We now show f satisfies the two axioms of linearity.

For scaling, if x ∈ F, the axiom follows from closure of F under scaling and finite linearity of f on F. Else, let α ∈ R and x ∈ S₊ (the case x ∈ S₋ is exactly analogous). If α > 0, then αx ∈ S₊ because it is a convex cone, which gives f(αx) = αf(x) = ∞. If α = 0, then f(αx) = αf(x) = 0. If α < 0, then use that −x ∈ S₋ by assumption, and since S₋ is a cone and −α > 0, we have (−α)(−x) ∈ S₋. In other words, f(αx) = −∞ = αf(x), as required.

For additivity, let x, x′ be given.

• If x, x′ ∈ F, additivity follows from closure of F under addition and finite linearity of f on F.
• If x, x′ ∈ S₊, then x + x′ ∈ S₊ as required because it is a convex cone; analogously for x, x′ ∈ S₋.
• If x ∈ S₊, x′ ∈ S₋ or vice versa, the axiom is vacuously satisfied.

The remaining case is, without loss of generality, x ∈ S₊, x′ ∈ F (the proof is identical for x ∈ S₋, x′ ∈ F). We must show x + x′ ∈ S₊. Because F is a subspace and x′ ∈ F, x ∉ F, we must have x + x′ ∉ F. Now suppose for contradiction that x + x′ ∈ S₋. We have −x ∈ S₋ because S₋ = −S₊. Because S₋ is a convex cone, it is closed under addition, so x + x′ + (−x) = x′ ∈ S₋, a contradiction. So x + x′ ∈ S₊.

Lemma 2.4 (Recursive definition). f : R^d → R̄ is linear extended if and only if one of the following hold:

1. f is finite linear (this case must hold if d = 0), or
2. There exists a unit vector v₁ and linear extended function f₂ on the (d − 1)-dimensional subspace {x : v₁ · x = 0} such that f(x) = f₂(x) if v₁ · x = 0, else f(x) = ∞ · sign(v₁ · x).
Proof. (⟹) Suppose f is linear extended. The case d = 0 is immediate, as f(0) = 0 by the scaling axiom. So let d ≥ 1 and suppose f is not finite linear; we show case (2) holds. Let S₊ = f⁻¹(∞), S₋ = f⁻¹(−∞), and F = f⁻¹(R). Recall from Lemma 2.3 that F is a subspace, necessarily of dimension < d by assumption that f is not finite; meanwhile S₊ = −S₋ and both are convex cones.

We first claim that there is an open halfspace on which f(x) = ∞, i.e. included in S₊. First, cl(S₊) includes a closed halfspace: if not, the set cl(S₊) ∪ cl(S₋) ≠ R^d and then its complement, an open set, would necessarily have affine dimension d yet would be included in F, a contradiction. Now, because S₊ is convex, it includes the relative interior of cl(S₊), so it includes an open halfspace. Write this open halfspace {x : v₁ · x > 0} for some unit vector v₁. Because S₋ = −S₊, we have f(x) = −∞ on the complement {x : v₁ · x < 0}. Let f₂ be the restriction of f to the remaining subspace, {x : v₁ · x = 0}. This set is closed under addition and scaling, and f₂ satisfies the axioms of a linear extended function, so f₂ is a linear extended function as well.

(⟸) If case (1) holds and f is finite linear, then it is immediately linear extended as well, QED. In case (2), we apply Lemma 2.3 to f₂. We obtain that it is finite linear on a subspace of {x : v₁ · x = 0}, which is a subspace of R^d, giving that f is finite linear on a subspace. We also obtain f₂⁻¹(∞) = −f₂⁻¹(−∞) and is a convex cone. It follows directly that f⁻¹(∞) = −f⁻¹(−∞). In fact, f⁻¹(∞) = f₂⁻¹(∞) ∪ {x : v₁ · x > 0}. The first set is a convex cone lying in the closure of the second set, also a convex cone. So the union is a convex cone: scaling is immediate; any nontrivial convex combination of a point from each set lies in the second, giving convexity; scaling and convexity give additivity. This shows that f⁻¹(∞) is a convex cone, the final piece needed to apply Lemma 2.3 and declare f linear extended.

Proposition 2.5 (Correctness of Algorithm 1). A function f : R^d → R̄ is linear extended if and only if it is computed by Algorithm 1 for some t ∈ {0, . . . , d}, some v₁ ∈ R^d, . . . , v_t ∈ R^{d−t+1}, and some finite linear f̂ : R^{d−t} → R.

Proof. (⟹) Suppose f is linear extended. By Lemma 2.4, there are two cases. If f is finite linear, then take t = 0 and f̂ = f in Algorithm 1. Otherwise, Lemma 2.4 gives a unit vector v₁ so that f(x) = ∞ if v₁ · x > 0 and f(x) = −∞ if v₁ · x < 0, as in Algorithm 1. f is linear extended on {x : v₁ · x = 0}, so we iterate the procedure until reaching a subspace where f is finite linear, setting t to be the number of iterations.

(⟸) Suppose f is computed by Algorithm 1. We will use the two cases of Lemma 2.4 to show f is linear extended. If t = 0, then f is finite linear, hence linear extended (case 1). If t ≥ 1, then f is in case 2 with unit vector v₁ and function f₂ equal to the implementation of Algorithm 1 on t − 1, f̂, and v₂, . . . , v_t. This proves by induction on t that, if f is computed by Algorithm 1, then it satisfies one of the cases of Lemma 2.4, so it is linear extended.

Proposition 2.6 (Parsimonious parameterization). Each linear extended function has a unique representation by the parameters of Algorithm 1.
Proof.
For i ∈ {1, 2}, let f^(i) be the function computed by Algorithm 1 with the parameters t^(i), f̂^(i), {v_j^(i) : j = 1, . . . , t^(i)}. We will prove that f^(1) and f^(2) are distinct if any of their parameters differ: i.e. if t^(1) ≠ t^(2), or else f̂^(1) ≠ f̂^(2), or else there exists j with v_j^(1) ≠ v_j^(2).

By Lemma 2.3, each f^(i) is finite linear on a subspace F^(i) and positive (negative) infinite on a convex cone S_{i,+} (respectively, S_{i,−}) with S_{i,+} = −S_{i,−}. It follows that they are equal if and only if: S_{1,+} = S_{2,+} (this implies F^(1) = F^(2)) and they coincide on F^(1).

If t^(1) ≠ t^(2), then the dimensions of F^(1) and F^(2) differ, so they are nonequal, so S_{1,+} ≠ S_{2,+} and the functions are not the same. So suppose t^(1) = t^(2). Now suppose the unit vectors are not the same, i.e. there is some smallest index j such that v_j^(1) ≠ v_j^(2). Observe that on iteration j of the algorithm, the (d − j + 1)-dimensional subspace under consideration is identical for f^(1) and f^(2). But now there is some x with v_j^(1) · x > 0 > v_j^(2) · x, for example, x = v_j^(1) − v_j^(2). On this x (parameterized as a vector in R^d), f^(1)(x) = ∞ while f^(2)(x) = −∞, so the functions differ.

Finally, suppose t^(1) = t^(2) and all unit vectors are the same. Observe that F^(1) = F^(2). But if f̂^(1) ≠ f̂^(2), then there is a point in F^(1) where f̂^(1) and f̂^(2) differ. On this point (parameterized as a vector in R^d), f^(1) and f^(2) differ.

One corollary is the following definition:
Definition 2.7 (Depth). Say the depth of a linear extended function f is the value of t in its parameterization of Algorithm 1 (shown in Proposition 2.6 to be unique).

Another is that, while finite linear functions on R^d are uniquely identified by d real-valued parameters, linear extended functions require as many as (d choose 2) + 1 = Ω(d²): unit vectors in R^k require k − 1 parameters to identify. With t = d − 1, identifying the unit vectors takes (d − 1) + (d − 2) + · · · + 1 = (d choose 2) parameters, and one more defines f̂ : R¹ → R.

By the way, we would be remiss in forgetting to prove:

Proposition 2.8 (Convexity). Linear extended functions are convex.
Proof.
We prove convexity by showing the epigraph of a linear extended function f : R^d → R̄ is convex. By induction on d: If d = 0, then f is finite linear (Lemma 2.4), so it has a convex epigraph. Otherwise, let d ≥ 1. If f is finite linear, then again it has a convex epigraph, QED. Otherwise, by Lemma 2.4, its epigraph is A ∪ B where: A = {(x, y) : v₁ · x < 0} for some unit vector v₁ ∈ R^d; and B is the epigraph (appropriately reparameterized) of a linear extended function f₂ : R^{d−1} → R̄ on the set {x : v₁ · x = 0}. We have B ⊆ cl(A), and both sets are convex (by inductive hypothesis), so their union is convex: any nontrivial convex combination of points lies in the interior of A.

The effective domain of a linear extended function at least includes an open halfspace. If the depth is zero, it is R^d; if the depth is 1, it is a closed halfspace {x : v₁ · x ≤ 0}; and otherwise it is neither a closed nor an open set.

Structure. It seems difficult to put useful algebraic or topological structure on the set of linear extended functions on R^d. For example, addition of two functions is typically undefined. Convergence in the parameters of Algorithm 1 does not seem to imply pointwise convergence of the functions, as for instance we can have a sequence v₁^(m) → v₁ such that members of {x : v₁ · x = 0} are always mapped to ∞. But perhaps future work can put a creative structure on this set.

3 Extended subgradients

Recall that a subgradient of a function g at a point x₀ is a (finite) linear function f satisfying g(x) ≥ g(x₀) + f(x − x₀) for all x. The following definition of extended subgradient replaces f with a linear extended function; to force sums to be legal, we will only define it at points in the effective domain of a "proper" function, i.e. one that is never −∞. A main aim of this section is to show that a proper function is convex if and only if it has an extended subgradient everywhere in its effective domain (Proposition 3.10).

Definition 3.1 (Extended subgradient). Given a function g : R^d → R ∪ {∞}, a linear extended function f is an extended subgradient of g at a point x₀ in its effective domain if, for all x ∈ R^d, g(x) ≥ g(x₀) + f(x − x₀).

Again for clarity, if this f is a finite linear function, we may call it a finite subgradient of g. Note that the sum appearing in Definition 3.1 is always legal because g(x₀) ∈ R under its assumptions.

As is well-known, finite subgradients correspond to supporting hyperplanes of the epigraphs of convex functions g, but they do not exist at points where all such hyperplanes are vertical; consider, in one dimension,

    g(x) =  0 if x < 1;  1 if x = 1;  ∞ if x > 1.   (1)

We next show that every convex g has an extended subgradient everywhere in its effective domain. For example, f(z) = ∞ · z is an extended subgradient of the above g at x₀ = 1.
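The one-dimensional example and its extended subgradient can be checked numerically; the following sketch (names hypothetical) verifies the subgradient inequality of Definition 3.1 on a few points:

```python
import math

def g(x):
    """The one-dimensional example of Display (1)."""
    if x < 1:
        return 0.0
    if x == 1:
        return 1.0
    return math.inf

def f(z):
    """f(z) = ∞·z, a linear extended function on R."""
    return math.copysign(math.inf, z) if z != 0 else 0.0

# Extended subgradient inequality g(x) >= g(1) + f(x - 1) on a grid.
for x in [-2.0, 0.0, 0.5, 1.0, 1.5, 3.0]:
    assert g(x) >= g(1.0) + f(x - 1.0)
```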
Proposition 3.2 (Existence of extended subgradients). Each convex function g : R^d → R ∪ {∞} has an extended subgradient at every point in its effective domain.

Proof. Let x₀ be in the effective domain G of g; we construct an extended subgradient f_{x₀}. The author finds the following approach most intuitive: Define g_{x₀}(z) = g(x₀ + z) − g(x₀), the shift of g that moves (x₀, g(x₀)) to (0, 0), and let G_{x₀} = effdom(g_{x₀}). Now we appeal to a technical lemma, Lemma 3.3 (next), which says there exists a linear extended f_{x₀} that minorizes g_{x₀}. This completes the proof: For any x′ ∈ R^d, we have f_{x₀}(x′ − x₀) ≤ g_{x₀}(x′ − x₀), which rearranges to give g(x₀) + f_{x₀}(x′ − x₀) ≤ g(x′) (using that g(x₀) ∈ R by assumption).

Lemma 3.3. Let g_{x₀} : R^d → R ∪ {∞} be a convex function with g_{x₀}(0) = 0. Then there exists a linear extended function f_{x₀} with f_{x₀}(z) ≤ g_{x₀}(z) for all z ∈ R^d.

Proof.
First we make Claim (*): if 0 is in the interior of the effective domain G_{x₀} of g_{x₀}, then such an f_{x₀} exists. This follows because g_{x₀} necessarily has at 0 a finite subgradient, which can serve as f_{x₀}.

Now, we prove the result by induction on d.

If d = 0, then it is only possible to have f_{x₀} = g_{x₀}.

If d ≥ 1, there are two cases. If 0 is in the interior of G_{x₀}, then Claim (*) applies and we are done.

Otherwise, 0 is on the boundary of G_{x₀}. This implies that there exists a hyperplane supporting G_{x₀} at 0.¹ That is, there is some unit vector v ∈ R^d such that G_{x₀} ⊆ {z : v · z ≤ 0}. Set f_{x₀}(z) = ∞ if v · z > 0 and f_{x₀}(z) = −∞ if v · z < 0. Note f_{x₀} minorizes g_{x₀} on these regions, in the first case because g_{x₀}(z) = ∞ = f_{x₀}(z) (by definition of effective domain), and in the second case because f_{x₀}(z) = −∞.

So it remains to define f_{x₀} on the subspace S := {z : v · z = 0} and show that it minorizes g_{x₀} there. But g_{x₀}, restricted to this subspace, is again a proper convex function equalling 0 at 0, so by induction we have a minorizing linear extended function f̂_{x₀} on this (d − 1)-dimensional subspace. Set f_{x₀}(z) = f̂_{x₀}(z) for z ∈ S. Then f_{x₀} minorizes g_{x₀} everywhere. We also have that f_{x₀} is linear extended by Lemma 2.4, as it satisfies the necessary recursive format.

This proof would have gone through if Claim (*) used "relative interior" rather than interior, because supporting hyperplanes also exist at the relative boundary. That version would have constructed extended subgradients of possibly lesser depth (Definition 2.7).

It is now useful to define affine extended functions. Observe that a bad definition would be "a linear extended function plus a constant." Both vertical and horizontal shifts of linear extended functions must be allowed in order to capture, e.g.

    h(x) =  −∞ if x < 1;  1 if x = 1;  ∞ if x > 1.   (2)

Definition 3.4 (Affine extended function). A function h : R^d → R̄ is affine extended if for some β ∈ R and x₀ ∈ R^d and some linear extended f : R^d → R̄ we have h(x) = f(x − x₀) + β.

Definition 3.5 (Supports). An affine extended function h supports a function g : R^d → R̄ at a point x₀ ∈ R^d if h(x₀) = g(x₀) and for all x ∈ R^d, h(x) ≤ g(x).

For example, the above h in Display 2 supports, at x₀ = 1, the previous example g of Display 1. Actually, h supports g at all x₀ ≥ 1.

¹ Definition and existence of a supporting hyperplane are referred to e.g. in Hiriart-Urruty and Lemaréchal [2001]: Definition 1.1.2 of a hyperplane (there v must merely be nonzero, but it is equivalent to requiring a unit vector); Definition 2.4.1 of a supporting hyperplane to G_{x₀} at x, specialized here to the case x = 0; and Lemma 4.2.1, giving existence of a supporting hyperplane to any nonempty convex set G_{x₀} at any boundary point.

Observation 3.6. Given β, x₀, f, define the affine extended function h(x) = f(x − x₀) + β.

1. h is convex: its epigraph is a shift of a linear extended function's.
2. Let g : R^d → R ∪ {∞} such that x₀ ∈ effdom(g). Then h supports g at x₀ if and only if f is an extended subgradient of g at x₀.
3. If h(x₁) is finite, then for all x we have h(x) = f(x − x₁) + β₁ where β₁ = f(x₁ − x₀) + β.

Point 2 follows because h supports g at x₀ if and only if β = g(x₀) and g(x) ≥ β + f(x − x₀) for all x. Point 3 follows because h(x₁) = f(x₁ − x₀) + β, so f(x₁ − x₀) is finite: the sum f(x − x₁) + f(x₁ − x₀) is always legal and equals f(x − x₀).

An important fact in convex analysis is that a closed (i.e. lower semicontinuous) convex function g is the pointwise supremum of a family of affine functions. We have the following analogue.

Proposition 3.7 (Supremum characterization). A function g : R^d → R ∪ {∞} is convex if and only if it is the pointwise supremum of a family of affine extended functions.

Proof. (⟸) Let H be a family of affine extended functions and let g(x) := sup_{h∈H} h(x). The epigraph of each affine extended function is a convex set (Observation 3.6), and g's epigraph is the intersection of these, hence a convex set, so g is convex.

(⟹) Suppose g is convex; let H be the set of affine extended functions minorizing g. We show g(x) = sup_{h∈H} h(x).

First consider x in the effective domain G of g. By Proposition 3.2, g has an extended subgradient at x, which implies (Observation 3.6) it has a supporting affine function h_x there, and of course h_x ∈ H. We have g(x) = h_x(x) and by definition g(x) ≥ h(x) for all h ∈ H, so g(x) = max_{h∈H} h(x).

Finally, let x ∉ G. We must show that sup_{h∈H} h(x) = ∞. We apply a technical lemma, Lemma 3.8 (stated and proven next), to obtain a set H′ of affine extended functions that all equal −∞ on G (hence H′ ⊆ H), but for which sup_{h∈H′} h(x) = ∞, as required.
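To make Proposition 3.7 concrete, the following illustrative sketch recovers the g of Display (1) as a pointwise supremum of just two affine extended functions: the constant 0, and the h of Display (2) with β = 1.

```python
import math

def f_inf(z):                        # linear extended: ∞·z
    return math.copysign(math.inf, z) if z != 0 else 0.0

h0 = lambda x: 0.0                   # finite affine member of the family
h1 = lambda x: f_inf(x - 1.0) + 1.0  # Display (2) with β = 1

def g_sup(x):
    """Pointwise supremum of the two affine extended functions above."""
    return max(h0(x), h1(x))

# Recovers Display (1): 0 below 1, the value 1 at x = 1, ∞ above 1.
assert g_sup(0.5) == 0.0 and g_sup(1.0) == 1.0 and g_sup(2.0) == math.inf
```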
Lemma 3.8. Let G be a convex set in R^d and let x ∉ G. There is a set H′ of affine extended functions such that sup_{h∈H′} h(x) = ∞ while h(x′) = −∞ for all h ∈ H′ and x′ ∈ G.

For intuition, observe that the easiest case is if x ∉ cl(G), when we can use a strongly separating hyperplane to get a single h that is ∞ at x and −∞ on G. The most difficult case is when x is an extreme, but not exposed, point of cl(G). (Picture a convex g with effective domain cl(G); now modify g so that g(x) = ∞.) Capturing this case requires the full recursive structure of linear extended functions.

Proof.
Let G′ = {x′ − x : x′ ∈ G}. Note that G′ is a convex set not containing 0. By Lemma 3.9, next, there is a linear extended function f equalling −∞ on G′. Construct H′ = {h(x′) = f(x′ − x) + β : β ∈ R}. Then sup_{h∈H′} h(x) = sup_{β∈R} β = ∞, while for x′ ∈ G we have h(x′) = f(x′ − x) + β = −∞. This proves Lemma 3.8.

Lemma 3.9.
Given a convex set G′ ⊆ R^d not containing 0, there exists a linear extended f with f(x′) = −∞ for all x′ ∈ G′.

Proof. Note that if G′ = ∅, then the claim is immediately true; and if d = 0, then G′ must be empty. Now by induction on d: If d = 1, then we can choose f(z) = ∞ · z or else f(z) = −∞ · z; one of these must be −∞ on the convex set G′.

Now suppose d ≥ 2; we construct f. The convex hull of G′ ∪ {0} must be supported at 0, which is a boundary point,² by some hyperplane. Let this hyperplane be parameterized by the unit vector v₁ where G′ ⊆ {z : v₁ · z ≤ 0}. Set f(z) = ∞ if v₁ · z > 0 and −∞ if v₁ · z < 0. It remains to define f on the subspace S := {z : v₁ · z = 0}. Immediately, G′ ∩ S is a convex set not containing 0. So by inductive hypothesis, there is a linear extended f̂ : R^{d−1} → R̄ that is −∞ everywhere on G′ ∩ S. Set f(z) = f̂(z) on S. Now we have f is linear extended because it satisfies the recursive hypotheses of Lemma 2.4. We also have that f is −∞ on G′: each z ∈ G′ either has v₁ · z < 0 or z ∈ S, and both cases have been covered.

² If 0 were in the interior, it would be a nontrivial convex combination of points of G′ and hence in G′, a contradiction. For the supporting hyperplane argument, see the above footnote, using reference Hiriart-Urruty and Lemaréchal [2001].

Consider the convex function in one dimension,

    g(x) =  0 if x < 1;  ∞ if x ≥ 1.

For the previous proof to obtain g as a pointwise supremum of affine extended functions, it needed a sequence of the form

    h(x) =  −∞ if x < 1;  β if x = 1;  ∞ if x > 1.

Indeed, one can show that g has no affine extended supporting function at x = 1. But it does everywhere else: in particular, h supports g at all x > 1.
So we might have hoped that each convex function is the pointwise maximum of affine extended functions; but the example of g and x = 1 prevents this. This does hold on g's effective domain (implied by existence of extended subgradients there, Proposition 3.2).

Speaking of which, we are now ready to prove the converse.

Proposition 3.10 (Subgradient characterization). A function g : R^d → R ∪ {∞} is convex if and only if it has an extended subgradient at every point in its effective domain.

Proof. (⟹) This is Proposition 3.2.

(⟸) Suppose g has an extended subgradient at each point in its effective domain G. We will show it is the pointwise supremum of a family H of affine extended functions, hence (Proposition 3.7) convex.

First, let H* be the set of affine extended functions that support g. Then H* must include a supporting function at each x in the effective domain (Observation 3.6). Now we repeat a trick: for each x ∉ G, we obtain from Lemma 3.8 a set H_x of affine extended functions that are all −∞ on G, but for which sup_{h∈H_x} h(x) = ∞. Letting H = H* unioned with ∪_{x∉G} H_x, we claim g(x) = sup_{h∈H} h(x). First, every h ∈ H minorizes g: this follows immediately for h ∈ H*, and by definition of effective domain for each H_x. For x ∈ G, the claim then follows because H* contains a supporting function with h(x) = g(x). For x ∉ G, the claim follows by construction of H_x. So g(x) = sup_{h∈H} h(x).

Finally, characterizations of strict convexity will be useful. Say an affine extended function is uniquely supporting of g at b ∈ P ⊆ effdom(g), with respect to P, if it supports g at b and at no other point in P.

Lemma 3.11 (Strict convexity). Let g : R^d → R ∪ {∞} be convex, and let P ⊆ effdom(g) be a convex set. The following are equivalent: (1) g is strictly convex on P; (2) no two distinct points in P share an extended subgradient; (3) every affine extended function supports g in P at most once; (4) g has at each x ∈ P a uniquely supporting affine extended function.

Although the differences between (2, 3, 4) are small, they may be useful in different scenarios. For example, proving (4) may be the easiest way to show g is strictly convex, whereas (3) may be the more powerful implication of strict convexity.

Proof.
We prove a ring of contrapositives.

(¬1 ⇒ ¬4) If g is convex but not strictly convex on P ⊆ effdom(g), then there are two points a, c ∈ P with a ≠ c and there is 0 < ρ < 1 such that, letting b = ρ·a + (1 − ρ)c, we have g(b) = ρg(a) + (1 − ρ)g(c). We show g has no uniquely supporting h at b; intuitively, tangents at b must be tangent at a and c as well.

Let h be any supporting affine extended function at b and write h(x) = g(b) + f(x − b). Because h is supporting, g(c) ≥ h(c) = g(b) + f(c − b), implying with the axioms of linear extended functions that f(b − c) ≥ g(b) − g(c). Now, b − c = ρ(a − c) = (ρ/(1 − ρ))(a − b). So f(b − c) = (ρ/(1 − ρ)) f(a − b). Similarly, g(b) − g(c) = ρ(g(a) − g(c)) = (ρ/(1 − ρ))(g(a) − g(b)). So f(a − b) ≥ g(a) − g(b), so h(a) = g(b) + f(a − b) ≥ g(a). But because h supports g, we also have h(a) ≤ g(a), so h(a) = g(a) and h supports g at a.

(¬4 ⇒ ¬3) Almost immediate: g has a supporting affine extended function at every point in P because it has an extended subgradient everywhere (Proposition 3.2). If ¬4, then one supports g at two points in P.

(¬3 ⇒ ¬2) Suppose an affine extended h supports g at two distinct points a, b ∈ P. Using Observation 3.6, we can write h(x) = f(x − a) + g(a) = f(x − b) + g(b) for a unique linear extended f, so f is an extended subgradient at two distinct points.

(¬2 ⇒ ¬1) Suppose f is an extended subgradient at distinct points a, c ∈ P. By definition of extended subgradient, we have g(a) ≥ g(c) + f(a − c) and g(c) ≥ g(a) − f(a − c), implying f(a − c) = g(a) − g(c). Now let b = ρ·a + (1 − ρ)c for any 0 < ρ < 1. By convexity and linear extended axioms, g(b) ≥ g(a) − f(a − b) = g(a) − f((1 − ρ)(a − c)) = g(a) − (1 − ρ)(g(a) − g(c)) = ρg(a) + (1 − ρ)g(c). So g is not strictly convex.

The extended subdifferential.
Although its topological properties and general usefulness are unclear, we should probably not conclude without defining the extended subdifferential of the function g : R^d → R ∪ {∞} at x ∈ effdom(g) to be the (nonempty) set of extended subgradients of g at x. On the interior of effdom(g), the extended subdifferential can only contain finite subgradients. But note that if the effective domain has affine dimension smaller than R^d, then the epigraph is contained in one of its vertical supporting hyperplanes. In this case, the relative interior, although it has finite subgradients, will have extended ones as well that take on non-finite values outside the effective domain.

4 Proper scoring rules

This section will first define scoring rules; then recall their characterization and discuss why it is not fully constructive; then use linear extended functions to prove more complete and constructive versions.

Y is a finite set of mutually exclusive and exhaustive outcomes, also called observations. The probability simplex on Y is ∆Y = {p ∈ R^Y : (∀y) p(y) ≥ 0, Σ_{y∈Y} p(y) = 1}. The support of p ∈ ∆Y is Supp(p) = {y : p(y) > 0}. The distribution with full mass on y is δ_y ∈ ∆Y.

Proper scoring rules (e.g. Gneiting and Raftery [2007]) model an expert providing a forecast p ∈ ∆Y, after which y ∈ Y is observed and the expert's score is S(p, y). The expert chooses p to maximize expected score according to an internal belief q ∈ ∆Y. More generally, it is sometimes assumed that reports and beliefs are restricted to a subset P ⊆ ∆Y. Two of the most well-known are the log scoring rule, where S(p, y) = log p(y); and the quadratic scoring rule, where S(p, y) = −‖δ_y − p‖². An example of a scoring rule that is not proper is S(p, y) = p(y).

Definition 4.1 (Scoring rule, regular). Let Y be finite and P ⊆ ∆Y, nonempty. A function S : P × Y → R̄ is termed a scoring rule. It is regular if S(p, y) ≠ ∞ for all p ∈ P, y ∈ Y.

Regular scoring rules are nice for two reasons. First, calculating expected scores such as Σ_y q(y) S(p, y) may lead to illegal sums if S(p, y) ∈ R̄, but regular scores guarantee that the sum is legal (using that q(y) ≥ 0). Second, allowing scores of S(p, y) = ∞ leads to strange and unexciting rules: p is an optimal report for any belief q unless q(y) = 0.

Definition 4.2 (Expected score, strictly proper). Given a regular scoring rule S : P × Y → R ∪ {−∞}, the expected score for report p ∈ P under belief q ∈ ∆Y is written S(p; q) := Σ_{y∈Y} q(y) S(p, y). The regular scoring rule S is proper if for all p, q ∈ P with p ≠ q, S(p; q) ≤ S(q; q), and it is strictly proper if this inequality is always strict.

Why use a general definition of scoring rules, taking values in R̄, if I claim that the only interesting rules are regular? It will be useful for technical reasons: we will consider constructions that obviously yield a well-defined scoring rule, then investigate conditions under which that scoring rule is regular. Understanding these conditions is a main contribution of Theorem 4.9 and particularly Theorem 4.15.

The well-known proper scoring rule characterization, discussed next, roughly states that all proper scoring rules are of the following form.

Definition 4.3 (Subtangent rule).
If a scoring rule S : P × Y → R̄ satisfies

    S(p, y) = g(p) + f_p(δ_y − p)   ∀p ∈ P, y ∈ Y   (3)

for some convex function g : R^Y → R ∪ {∞} with P ⊆ effdom(g) and some choices of its extended subgradients f_p at each p ∈ P, then call S a subtangent rule (of g).

The geometric intuition is that the affine extended function q ↦ g(p) + f_p(q − p) is a linear approximation of g at the point p. So a subtangent rule, on prediction p and observation y, evaluates this linear approximation at δ_y.
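As a concrete instance of Definition 4.3, the following sketch (hypothetical names; p is assumed to be interior, so an ordinary finite gradient suffices) builds the log scoring rule as a subtangent rule of negative entropy:

```python
import math

def g(p):
    """Negative entropy, the convex function behind the log scoring rule."""
    return sum(pi * math.log(pi) for pi in p)

def score(p, y):
    """Subtangent rule S(p, y) = g(p) + f_p(δ_y − p), using the gradient
    f_p(z) = Σ_y' z(y')·(1 + log p(y')), valid for interior p."""
    f_p = lambda z: sum(zi * (1.0 + math.log(pi)) for zi, pi in zip(z, p))
    delta = [1.0 if i == y else 0.0 for i in range(len(p))]
    return g(p) + f_p([d - pi for d, pi in zip(delta, p)])

p = [0.2, 0.3, 0.5]
for y in range(3):   # recovers S(p, y) = log p(y)
    assert abs(score(p, y) - math.log(p[y])) < 1e-12
```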
The classic characterization. The scoring rule characterization has appeared in many forms, notably in McCarthy [1956], Savage [1971], Schervish [1989], Gneiting and Raftery [2007]. In particular, Savage [1971] proved:

Theorem 4.4 (Savage [1971] characterization). A scoring rule S on P = ∆Y taking only finite values is proper (respectively, strictly proper) if and only if it is of the form (3) for some convex (respectively, strictly convex) g with finite subgradients {f_p}.

There are several nonconstructive wrinkles in this result. By itself, it does not answer the following questions (and the fact that it does not may surprise even experts, especially if the answers feel obvious and well-known).³

1. Let the finite-valued S be proper, but not strictly proper. Theorem 4.4 asserts that it is of the form (3) for some convex g. Can g be strictly convex?
2. Now let S be strictly proper. Theorem 4.4 asserts that it is of the form (3) for some strictly convex g. Can it also be of the form (3) for some other, non-strictly convex g?
3. Let g : ∆Y → R be convex. Theorem 4.4 implies that if there exists a finite-valued scoring rule S of the form (3), then S is proper. But does such an S exist?
4. Now let g be strictly convex and suppose it has a finite-valued proper scoring rule S of the form (3). Is S necessarily strictly proper?

For the finite-valued case, answers are not too difficult to derive, although I do not know of a citation. But as discussed next, these questions are also unanswered by characterizations that allow scores of −∞, although to an extent the answers may be known to experts. Theorems 4.9 and 4.15 will address them.

³ The answer to the first two questions is no, and the fourth (the contrapositive of the first) is yes. These follow because a convex subdifferentiable g is strictly convex on ∆Y if and only if each subgradient appears at most at one point p ∈ ∆Y; this is an easier version of Lemma 3.11. The third question is the deepest, and the answer is not necessarily. Some convex g are not subdifferentiable, i.e. they have no finite subgradient at some points, so it is not possible to construct a finite-valued scoring rule from them at all.

Extension to scores with −∞. Savage [1971] disallows (intentionally, Section 9.4) the log scoring rule of Good [1952], S(p, y) = log p(y), because it can assign score −∞ if p(y) = 0. The prominence of the log scoring rule has historically (e.g. Hendrickson and Buehler [1971]) motivated including it, but doing so requires generalizing the characterization to include possibly-neg-infinite scores.

The modern treatment of Gneiting and Raftery [2007] (also Frongillo and Kash [2014]) captures such scoring rules as follows: It briefly defines extended-valued subtangents, analogues of this paper's extended subgradients (but possibly on infinite-dimensional spaces, so for instance requirements of "legal sums" are replaced with "quasi-integrable"). The characterization can then be stated: A regular scoring rule is (strictly) proper if and only if it is of the form (3) for some (strictly) convex g with subtangents {f_p}.

However, these characterizations have drawbacks analogous to those enumerated above, and also utilize the somewhat-mysterious subtangent objects. This motivates Theorem 4.9 (for the case P = ∆Y) and Theorem 4.15 (for general P ⊆ ∆Y), which are more specific about when and how subtangent scoring rules can be constructed.

In particular, for P = ∆Y, Theorem 4.9 formalizes a slightly-informal construction and claim of Gneiting and Raftery [2007], Section 3.1: given any convex function on ∆Y, all of its subtangent rules are proper. I do not know if proving it would be rigorously possible without much of the development of extended subgradients in this paper.
This circles us back to the first paper to state a characterization, McCarthy [1956], which asserts "any convex function of a set of probabilities may serve", but, omitting proofs, merely remarks, "The derivative has to be taken in a suitable generalized sense."

For general P, the following question becomes interesting: Suppose we are given a convex function g and form a subtangent scoring rule S. Prior characterizations imply that if S is regular, then it is proper. But under what conditions is S regular? Theorem 4.15 gives an answer (see also Figure 1), namely, one must choose certain subgradients of a g that is interior-locally-Lipschitz (Definition 4.13).

Aside: subgradient rules.
For completeness, we will briefly connect to a different version of the characterization [McCarthy, 1956, Hendrickson and Buehler, 1971]: a regular scoring rule is proper if and only if it is a subgradient rule of a positively homogeneous⁴ convex function g, where:

Definition 4.5 (Subgradient scoring rule). Let Y be finite and P ⊆ ∆Y. A scoring rule S : P × Y → R̄ is a subgradient scoring rule (of g) if

    S(p, y) = f_p(δ_y)   ∀p ∈ P, y ∈ Y   (4)

for some set of extended subgradients, f_p at each p ∈ P, of some convex function g : R^Y → R ∪ {∞} with effdom(g) ⊇ P.

To understand the interplay of the two characterizations, consider the convex function g(z) = z · 1 − 1 = Σ_{y∈Y} z(y) − 1, where 1 ∈ R^Y is the all-ones vector. In particular, it is zero for z ∈ ∆Y. Its subgradients are f_p(z) = z · 1, the same at every p. The proper scoring rule S(p, y) = 0 (∀p, y) is a subtangent rule of g, since it is of the form S(p, y) = g(p) + f_p(δ_y − p) = 0 + 0 = 0. It is not a subgradient rule of g, since each f_p(δ_y) = 1, whereas S(p, y) = 0.

So S instantiates the Savage characterization (subtangent rules) and not the McCarthy characterization (subgradient rules) with respect to this g. However, this all-zero scoring rule is not a counterexample to the McCarthy characterization: it is a subgradient rule of some other convex function, which is in fact positively homogeneous, i.e. g₀(z) = 0.

Another example: The log scoring rule S(p, y) = log p(y) is typically presented as a subtangent rule of the negative entropy g(p) = Σ_{y∈Y} p(y) log p(y) (infinite if p ∉ ∆Y), but it is not a subgradient rule of that function, as pointed out by Marschak [1959]. But, as Hendrickson and Buehler [1971] responds, it is a subgradient rule of the positively homogeneous function g(p) = Σ_{y∈Y} p(y) log(p(y) / Σ_{y′} p(y′)) defined on the nonnegative orthant.

So in general (Theorem 4.9), a proper scoring rule can be written as a subtangent rule of some convex function g; and it can also be written as a subgradient rule of a possibly-different one.

⁴ g is positively homogeneous if g(αx) = αg(x) for all α > 0, x ∈ R^d.

Expected scores. Before proving the scoring rule characterization, we encounter a problem with the expected score function S(p; q) := Σ_{y∈Y} q(y) S(p, y) of a regular scoring rule. Usual characterization proofs proceed by observing that S(p; q) is an affine function of q, and that a pointwise supremum over these functions yields the convex g(q) from Definition 4.3 (subtangent rule). However, S(p; q) is not technically an affine nor affine extended function, as it is only defined on q ∈ ∆Y. Attempting to naively extend it to q ∈ R^Y leads to illegal sums.

Therefore, this section formalizes a key fact about regular scoring rules (Lemma 4.8): for fixed p, S(p; q) always coincides on ∆Y with at least one affine extended function; we call any such an extended expected score of S.

Definition 4.6 (Extended expected score). Let Y be finite and P a subset of ∆Y. Given a regular scoring rule S : P × Y → R ∪ {−∞}, for each p ∈ P, an affine extended function h : R^Y → R̄ is an extended expected score function of S at p if h(δ_y) = S(p, y) for all y ∈ Y.

The name is justified: h(q) is the expected score for report p under belief q. In fact this holds for any q ∈ ∆Y, not just q ∈ P. Recall that S(p; q) := Σ_{y∈Y} q(y) S(p, y).

Observation 4.7. If h is an extended expected score of a regular S at p, then for all q ∈ ∆Y, h(q) = S(p; q).

To prove it, write h(q) = β + f(q − p) and recall by Definition 4.6 that S(p, y) = h(δ_y). So S(p; q) = β + Σ_y q(y) f(δ_y − p). By regularity of S, the sum never contains +∞, so it is a legal sum and, using the scaling axiom of f, it equals β + f(q − p) = h(q).

Lemma 4.8 (Existence of expected scores). Let Y be finite, P ⊆ ∆Y, and S : P × Y → R ∪ {−∞} a regular scoring rule. Then S has at least one extended expected score function S_p at every p ∈ P; in fact, it has an extended linear one.

Proof. Given p, let Y₀ = {y : S(p, y) = −∞} and Y₁ = Y \ Y₀. We define S_p : R^Y → R̄ via Algorithm 1 using the following parameters. Set t = |Y₀|. Let f̂ be defined on the subspace spanned by {δ_y : y ∈ Y₁} via f̂(x) = Σ_{y∈Y₁} x(y) S(p, y). By definition, f̂ is a finite linear function. If t ≥ 1, let v₁, . . . , v_t = {−δ_y : y ∈ Y₀}, in any order. By Proposition 2.5, S_p is linear extended because it is implemented by Algorithm 1.

To show it is an extended expected score function, we calculate S_p(δ_y) for any given y ∈ Y. If y ∈ Y₁, then v_j · δ_y = 0 for all j = 1, . . . , t and S_p(δ_y) = f̂(δ_y) = S(p, y). Otherwise, v_j · δ_y = 0 for all j = 1, . . . , t except for some j* where v_{j*} = −δ_y. There, v_{j*} · δ_y = −1, so S_p(δ_y) = −∞ = S(p, y).
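Lemma 4.8's construction translates directly to code. The sketch below is illustrative only; it represents S_p through its Algorithm 1 parameters, one unit vector −δ_y per infinite score and the finite linear f̂ on the rest:

```python
import math

def extended_expected_score(scores):
    """Given scores[y] = S(p, y) (finite or -inf), return the linear
    extended function S_p from Lemma 4.8's proof."""
    Y0 = [y for y, s in enumerate(scores) if s == -math.inf]
    def S_p(x):
        for y in Y0:                   # unit vectors v_j = −δ_y
            if x[y] > 0:
                return -math.inf       # v_j · x < 0
            if x[y] < 0:
                return math.inf        # v_j · x > 0
        return sum(x[y] * s for y, s in enumerate(scores) if s != -math.inf)
    return S_p

S_p = extended_expected_score([math.log(0.5), -math.inf, math.log(0.5)])
assert S_p([1, 0, 0]) == math.log(0.5)   # S_p(δ_y) = S(p, y) for finite scores
assert S_p([0, 1, 0]) == -math.inf       # and for the −∞ scores as well
```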
Scoring rules on ∆Y. This section considers P = ∆Y and uses the machinery of linear extended functions to prove a complete characterization of proper scoring rules, including their construction from any convex function and set of extended subgradients. This arguably improves little on the state of knowledge as in Gneiting and Raftery [2007] and folklore of the field, but formalizes some important previously-informal or unstated facts.

Theorem 4.9 (Construction of proper scoring rules). Let Y be a finite set.

1. Let g : R^Y → R ∪ {∞} be convex with effdom(g) ⊇ ∆Y. Then: (a) it has at least one subtangent rule; and (b) all of its subtangent rules are regular and proper; and (c) if g is strictly convex on ∆Y, then all of its subtangent rules are strictly proper.
2. Let S : ∆Y × Y → R ∪ {−∞} be a regular, proper scoring rule. Then: (a) it is a subtangent rule of some convex g with effdom(g) ⊇ ∆Y; and (b) if S is strictly proper, then any such g is strictly convex on ∆Y; and (c) in fact, it is a subgradient rule of some (possibly different) positively homogeneous convex g.

Proof. (1) Given g, define any subtangent rule by S(p, y) = g(p) + f_p(δ_y − p) for any choices of extended subgradients f_p : R^Y → R̄ at each p ∈ ∆Y. Proposition 3.2 asserts that at least one choice of f_p exists for all p ∈ ∆Y, so at least one such function S can be defined, proving (a). And it is a regular scoring rule (i.e. never assigns score +∞) because effdom(g) ⊇ ∆Y and, by definition of extended subgradient, S(p, y) ≤ g(δ_y).

To show S is proper: Immediately, the function S_p(q) = g(p) + f_p(q − p) is an extended expected score of S (Definition 4.6). It is an affine extended function supporting g at p. So S_q(q) ≥ S_p(q) for all p ≠ q, so S is proper by definition, completing (b). Furthermore, if g is strictly convex on ∆Y, then (for any choices of subgradients) each resulting S_p must support g at only one point in ∆Y by Lemma 3.11, so S_q(q) > S_p(q) for all p ≠ q and S is strictly proper. This proves (c).

(2) Given S, by Lemma 4.8, for each p there exists a linear extended expected score S_p. Define g(q) = sup_{p∈∆Y} S_p(q), a convex function by Proposition 3.7. By definition of properness, S_q(q) = max_p S_p(q) = g(q). So S_q is an affine extended function supporting g at q ∈ effdom(g). So (e.g. Observation 3.6) it can be written S_q(x) = g(q) + f_q(x − q) for some extended subgradient f_q of g at q. In particular, by definition of extended expected score, S(q, y) = S_q(δ_y) = g(q) + f_q(δ_y − q), so S is a subtangent rule of g, whose effective domain contains ∆Y. This proves (a).

Furthermore, suppose S is strictly proper and a subtangent rule of some g. Then as above, it has affine extended expected score functions S_q at each q and, by strict properness, S_q(p) < S_p(p) = g(p) for all p ∈ ∆Y, p ≠ q. So at each p ∈ ∆Y there is a supporting affine extended function S_p that supports g nowhere else on ∆Y. By Lemma 3.11, this is a necessary and sufficient condition for strict convexity of g on ∆Y. This proves (b).

Now looking closer, recall Lemma 4.8 guarantees existence of a linear extended S_p. So S_p(0) = 0 = g(p) + f_p(−p), implying f_p(p) = g(p) = S_p(p). Then for all x, S_p(x) = S_p(p) + f_p(x − p), which rearranges and uses the axioms of linearity (along with finiteness of S_p(p)) to get S_p(x − p) = f_p(x − p), or S_p = f_p. Since we have S(p, y) = S_p(δ_y) = f_p(δ_y), we get that S is a subgradient rule of g. Furthermore, since each S_p is linear extended, αg(q) = α sup_{p∈∆Y} S_p(q) = sup_{p∈∆Y} αS_p(q) = g(αq) for α ≥ 0. So g is positively homogeneous, proving (c).

Observe that (as is well-known) if S is a subtangent rule of g, then g(q) = S(q; q) for all q, i.e. g gives the expected score at belief q for a truthful report. This also implies that g is uniquely defined on ∆Y.

One nontrivial example (besides the log scoring rule) is the scoring rule that assigns k points to any prediction with support size |Y| − k, assuming the outcome y is in its support, and −∞ if it is not. This is associated with the convex function g taking the value k on the relative interior of each (|Y| − k − 1)-dimensional face of the simplex, e.g. the value |Y| − 1 at each corner δ_y.

A generalization can be obtained from any set function G : 2^Y → R that is monotone, meaning X ⊆ X′ ⟹ G(X) ≤ G(X′). We can set g(p) := G(Supp(p)). In other words, the score for prediction p is G(Supp(p)) if the observation y is in Supp(p); the score is −∞ otherwise. We utilized this construction in Chen and Waggoner [2016]. A sketch of this rule in code appears below.

Of course neither of these is strictly proper. A similar, strictly proper approach is to construct g from any strictly convex and bounded function g₁ on the interior of the simplex; then on the interior of each facet, let g coincide with any bounded strictly convex function whose lower bound exceeds the upper bound of g₁; and so on.
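The support-size rule, with its monotone-set-function generalization g(p) = G(Supp(p)), is simple enough to state in a few lines of illustrative Python (here G(X) = |Y| − |X|):

```python
import math

def support_size_score(p, y):
    """k points for support size |Y| − k, provided the outcome lies in
    the support; −∞ otherwise. (Illustrative sketch.)"""
    supp = {i for i, pi in enumerate(p) if pi > 0}
    if y not in supp:
        return -math.inf
    return len(p) - len(supp)

assert support_size_score([0.5, 0.5, 0.0], 0) == 1   # support size 2, k = 1
assert support_size_score([0.5, 0.5, 0.0], 2) == -math.inf
assert support_size_score([1.0, 0.0, 0.0], 0) == 2   # maximal confidence pays most
```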
Remark 4.10. In this remark, we restate these results using notation for the sets of regular, proper, strictly proper scoring rules, and so on. Some readers may find this phrasing and comparison to prior work helpful.

Let Reg(P), Proper(P), and StrictProper(P) be the sets of regular, proper, and strictly proper scoring rules on P. Let C(P) be the set of convex functions, nowhere −∞, with effective domain containing P. Let ST(G) be the subtangent scoring rules of all members of the set of functions G.

Then the characterization (prior work) is the statement Reg(∆Y) ∩ Proper(∆Y) = Reg(∆Y) ∩ ST(C(∆Y)). That is, regular rules are proper if and only if they are subtangent rules. The construction (this work, Theorem 4.9) states that Proper(∆Y) = ST(C(∆Y)). In other words, it claims that all subtangent rules on ∆Y are regular.

Meanwhile, let C⁺(P) be those g ∈ C(P) that are strictly convex on P, while C⁻(P) is those that are not. The strict characterization (prior work) is Reg(∆Y) ∩ StrictProper(∆Y) = Reg(∆Y) ∩ ST(C⁺(∆Y)). The construction, Theorem 4.9, states StrictProper(∆Y) = ST(C⁺(∆Y)). Theorem 4.9 also states ST(C⁻(∆Y)) ∩ ST(C⁺(∆Y)) = ∅. Finally, for all g ∈ C(P), it states ST(g) ≠ ∅.
Scoring rules on general P. We finally consider general belief and report spaces P ⊆ ∆Y. Here proper scoring rules have been characterized by Gneiting and Raftery [2007] (when P is convex) and Frongillo and Kash [2014] (in general), including for infinite-dimensional outcome spaces. The statement is that a regular scoring rule S : P × Y → R ∪ {−∞} is (strictly) proper if and only if it is a subtangent rule of some (strictly) convex g with effdom(g) ⊇ P.

Again, this characterization leaves open the question of exactly which convex functions produce proper scoring rules on P. Unlike in the case P = ∆Y, not all of them do, as Figure 1b illustrates: suppose P = effdom(g) ⊆ int(∆Y) and it has only vertical supporting affine extended functions at some p on the boundary of P. Then the construction S(p, y) = g(p) + f_p(δ_y − p) will lead to S(p, y) = +∞ for some y. So S will not be regular, so it cannot be proper.

Therefore, the key question is: for which convex g are their subtangent rules possibly, or necessarily, regular? We next make some definitions to answer this question.
[Figure 1: Let Y = {0, 1}; the horizontal axes are ∆Y parameterized by p(1). Each subfigure gives the domain of g and whether it is interior-locally-Lipschitz, i.e. produces a scoring rule. Subfigure (a) plots the log scoring rule's associated g(p) = Σ_y p(y) log p(y); the others shift, squeeze, or truncate the domain. Vertical supports on the interior of the simplex violate interior-local-Lipschitzness and lead to illegal scoring rules, which would assign e.g. S(p, 0) = +∞ and S(p, 1) = −∞.]

The following definition will capture extended subgradients that lead to regular scoring rules. Recall from Proposition 2.6 that a linear extended function has a unique parameterization in Algorithm 1. Because we may have t = 0 and Supp(p) = Y, the conditions below can be vacuously true.

Definition 4.11 (Interior-finite). Say a linear extended function f : R^Y → R̄ is p-interior-finite for p ∈ ∆Y if its parameterization in Algorithm 1 has: (1) for all pairs y, y′ ∈ Supp(p), that v_j(y) = v_j(y′) for all j = 1, . . . , t; and (2) for all y ∉ Supp(p) and y′ ∈ Supp(p), that v_j(y) = v_j(y′) for some sequence j = 1, . . . , k, with either k = t or else v_{k+1}(y) < v_{k+1}(y′).

If f is p-interior-finite, then property (1) gives that f(q − p) is finite for q in the face of the simplex generated by the support of p. Property (2) gives that f(q − p) is either finite or −∞ at all q ∈ ∆Y that are not in that face. These are the key properties that we will need for regular scoring rules.

Lemma 4.12 (Interior-finite implies quasi-integrable). Let p ∈ ∆Y. A linear extended function f : R^Y → R̄ is p-interior-finite if and only if f(δ_y − p) ∈ R ∪ {−∞} for all y ∈ Y. Furthermore, in this case f(δ_y − p) ∈ R for any y ∈ Supp(p).

Proof.
Let f be a linear extended function and v₁, . . . , v_t its unit vectors in the parameterization of Algorithm 1.

(⟹) Suppose f is p-interior-finite. If it is finite everywhere, then the conclusion is immediate. Otherwise, let y ∈ Supp(p). Using that v₁(y) = v₁(y′) for any y′ ∈ Supp(p), we have

    v₁ · (δ_y − p) = v₁(y) − v₁ · p
                   = v₁(y) − Σ_{y′∈Supp(p)} v₁(y′) p(y′)
                   = v₁(y) − v₁(y) Σ_{y′∈Supp(p)} p(y′)
                   = 0.

This applies to each successive v_j, so f(δ_y − p) ∈ R.

Now let y ∉ Supp(p). Having just shown v₁ · (δ_{y′} − p) = 0 for any y′ ∈ Supp(p), we have v₁ · (δ_y − p) = v₁ · (δ_y − δ_{y′}) = v₁(y) − v₁(y′) ≤ 0 by definition of p-interior-finite. So either Algorithm 1 returns −∞ in the first iteration, or we continue to the second iteration. The same argument applies at each iteration, so f(δ_y − p) ∈ R ∪ {−∞}.

(⟸) Suppose f(δ_y − p) ∈ R ∪ {−∞} for all y ∈ Y. Again if f is finite everywhere, it is p-interior-finite, QED. So suppose its depth is at least 1. We must have v₁ · (δ_y − p) ≤ 0 for every y ∈ Y. This rearranges to v₁(y) ≤ v₁ · p for all y, or max_{y∈Y} v₁(y) ≤ v₁ · p. But p is a probability distribution, so v₁ · p ≤ max_{y∈Y} v₁(y). So v₁ · p = max_{y∈Y} v₁(y). This implies that v₁(y) = v₁(y′) = v₁ · p for all y, y′ ∈ Supp(p). So v₁ · (δ_y − p) = 0 if y ∈ Supp(p). Meanwhile, if y ∉ Supp(p), then v₁(y) ≤ v₁ · p = v₁(y′) for any y′ ∈ Supp(p).

This argument repeats to prove that if y ∈ Supp(p), then for all j = 1, . . . , t, v_j(y) = max_{y′∈Y} v_j(y′) and also v_j · (δ_y − p) = 0. So it also gives f(δ_y − p) ∈ R. Meanwhile, if y ∉ Supp(p), then we have a sequence v₁(y) = max_{y′∈Y} v₁(y′), . . . , v_k(y) = max_{y′∈Y} v_k(y′), where in each case v_j · (δ_y − p) = 0. Then finally, either k = t and f(δ_y − p) ∈ R, or we have v_{k+1}(y) < max_{y′∈Y} v_{k+1}(y′) and f(δ_y − p) = −∞.
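Lemma 4.12 also yields a practical test for p-interior-finiteness that needs only function evaluations. A sketch, with hypothetical names:

```python
import math

def is_p_interior_finite(f, p):
    """Lemma 4.12's criterion: f is p-interior-finite iff f(δ_y − p) is
    never +∞ over the outcomes y."""
    n = len(p)
    for y in range(n):
        delta = [1.0 if i == y else 0.0 for i in range(n)]
        if f([d - pi for d, pi in zip(delta, p)]) == math.inf:
            return False
    return True

# A depth-one f = ∞·x(0): fine at p = δ_0 itself, illegal at p = δ_1,
# where it would assign the score S(p, 0) = +∞.
def f(x):
    return math.copysign(math.inf, x[0]) if x[0] != 0 else 0.0

assert is_p_interior_finite(f, [1.0, 0.0])
assert not is_p_interior_finite(f, [0.0, 1.0])
```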
We can now make the key definition for the scoring rule characterization.

Definition 4.13 (Interior-locally-Lipschitz). For P ⊆ ∆_Y ∩ effdom(g), say g : R^Y → R ∪ {∞} is P-interior-locally-Lipschitz if g has, at every p ∈ P, a p-interior-finite extended subgradient.
An interior-locally-Lipschitz g simply places two conditions on its vertical supporting hyperplanes. First, they cannot cut faces of the simplex whose relative interior intersects effdom(g); they can only contain them. This is enforced by v_j(y) = v_j(y′) for y, y′ ∈ Supp(p). In other words, the extended subgradients must be finite (in particular bounded, in analogy with g being Lipschitz) relative to the affine hull of these faces. Second, they must be oriented correctly so as not to cut that face off from the rest of the simplex. This is enforced by v_j(y) ≤ v_j(y′) for y ∉ Supp(p), y′ ∈ Supp(p).
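Continuing the toy sketch from Lemma 4.12, both failure modes of Definition 4.13 are easy to exhibit at a boundary point p; these vectors are illustrative, not drawn from the paper.

p = np.array([0.5, 0.5, 0.0])
v_cut = np.array([1.0, 0.0, 0.0])                # not constant on Supp(p): cuts the face
v_hold = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)  # contains the face, tilted away from y = 2
print(is_interior_finite([v_cut], p))    # False: condition (1) fails
print(is_interior_finite([v_hold], p))   # True
print(is_interior_finite([-v_hold], p))  # False: right face, wrong orientation (condition (2))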
Lemma 4.12 gives the following corollary, in which the word "regular" is the point:

Corollary 4.14. A convex g : R^Y → R ∪ {∞} with effdom(g) ⊇ P has a regular subtangent scoring rule if and only if it is P-interior-locally-Lipschitz.

Proof. (⇒) Suppose g has a regular subtangent scoring rule S(p, y) = g(p) + f_p(δ_y − p), where f_p is an extended subgradient of g at p. By regularity, f_p(δ_y − p) ∈ R ∪ {−∞}, so by Lemma 4.12, f_p is p-interior-finite. So g has a p-interior-finite subgradient at every p ∈ P, i.e. it is P-interior-locally-Lipschitz.

(⇐) Suppose g is P-interior-locally-Lipschitz; then it has at each p ∈ P a p-interior-finite subgradient f_p. Let S(p, y) = g(p) + f_p(δ_y − p); by definition of p-interior-finite, f_p(δ_y − p) is never +∞, so S is regular, and it is a subtangent score.

Theorem 4.15 (Construction of scoring rules on P). Let Y be a finite set and P ⊆ ∆_Y.

1. Let g : R^Y → R ∪ {∞} be convex and P-interior-locally-Lipschitz with effdom(g) ⊇ P. Then: (a) g has at least one subtangent scoring rule that is regular; (b) all of its subtangent scoring rules that are regular are proper; and (c) if g is strictly convex on P, then all of its regular subtangent rules are strictly proper.

2. Let S : P × Y → R ∪ {−∞} be a regular proper scoring rule. Then: (a) it is a subtangent rule of some convex P-interior-locally-Lipschitz g : R^Y → R ∪ {∞} with effdom(g) ⊇ P; and (b) if S is strictly proper and P is a convex set, then any such g is strictly convex on P.

Observe that in (1), g may have some subtangent rules that are regular and some that are not, depending on the choices of extended subgradients at each p. Part (1) only claims that there exists at least one choice that is regular (and therefore proper). An example is if P = effdom(g) is a single point, say the uniform distribution; vertical tangent planes will not work, but finite subgradients will. (This issue did not arise in the previous section, where P = ∆_Y; there, all of a convex g's subtangent rules were regular.)

Proof. (1) Let such a g be given. We first prove that there exists a regular subtangent scoring rule of g. At each p ∈ P, let f_p be a p-interior-finite extended subgradient of g, guaranteed by the definition of interior-locally-Lipschitz, and let S(p, y) = g(p) + f_p(δ_y − p). Applying Lemma 4.12, f_p(δ_y − p) ∈ R ∪ {−∞}, so S(p, y) ≠ ∞, so S is a regular scoring rule, proving (a).

Now, let S be any regular subtangent scoring rule of g; we show S is proper. Immediately we have an affine extended expected score function at each p satisfying S_p(q) = g(p) + f_p(q − p) for all p ∈ P and all q ∈ ∆_Y. We have S_p(p) ≥ S_q(p) for all q, because each S_q supports g from below while S_p(p) = g(p); so S is proper, proving (b). Furthermore, if g is strictly convex, then each S_q supports g at just one point in its effective domain (Lemma 3.11). So S_p(p) > S_q(p) for p, q ∈ P with q ≠ p, so S is strictly proper, proving (c).

(2) Let a regular proper scoring rule S be given. By Lemma 4.8, it has an affine extended expected score function S_p at each p ∈ P. Define g : R^Y → R ∪ {∞} by g(q) = sup_{p ∈ P} S_p(q). This is convex by Proposition 3.7. By definition of proper, for all p ∈ P, we have S_p(p) = max_{q ∈ P} S_q(p) = g(p). So S_p is an affine extended function supporting g at p, so it can be written S_p(q) = g(p) + f_p(q − p) for some subgradient f_p of g at p. By definition of extended expected score, S(p, y) = S_p(δ_y) = g(p) + f_p(δ_y − p), so S is a subtangent rule.

Next, since S is a scoring rule, in particular S_p(δ_y) ∈ R ∪ {−∞} for all y. So by Lemma 4.12, each f_p is p-interior-finite, so g is P-interior-locally-Lipschitz, proving (a).

Now suppose S is strictly proper and P is a convex set. Then for any q ∈ P, S_q supports g at q alone out of all points in P, by definition of strict properness. By Lemma 3.11, g is strictly convex on P, proving (b).
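For a standard worked instance of part (1), take negative entropy g(p) = Σ_y p(y) log p(y): its gradient coordinates log p(y) + 1 diverge at the simplex boundary, so the supporting hyperplanes there become vertical, and the resulting subtangent rule is the log score S(p, y) = log p(y), which is regular (never +∞) and strictly proper. This is the familiar textbook example consistent with the theorem, not the paper's general construction; the helper names and the 0 · (−∞) = 0 convention in the expectation are choices of this note.

import numpy as np

def log_score(p, y):
    """S(p, y) = log p(y): a regular, strictly proper scoring rule whose
    scores are -infinity wherever the report puts zero mass on the outcome."""
    return float(np.log(p[y])) if p[y] > 0 else float('-inf')

def expected_score(p, q):
    """Expected score of reporting q when outcomes follow p, using the
    convention 0 * (-infinity) = 0: outcomes with p(y) = 0 contribute nothing."""
    return sum(p[y] * log_score(q, y) for y in range(len(p)) if p[y] > 0)

p = np.array([0.6, 0.4, 0.0])
q = np.array([0.3, 0.3, 0.4])
assert expected_score(p, p) > expected_score(p, q)  # truthful report wins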
There remain open directions in investigating interior-locally-Lipschitz functions. Two important cases can be resolved immediately:

• If P ⊆ int(∆_Y), then g is P-interior-locally-Lipschitz if g is subdifferentiable on P, i.e. has finite subgradients everywhere on P. This follows because every p ∈ P has full support. More precisely, this must be true of g when restricted to the affine hull of the simplex; its subgradients can still be infinite perpendicular to the simplex, e.g. of the form f(x) = ∞ · (x · 1⃗) + f̂(x) (see the sketch after this list).

• If P is an open set relative to the affine hull of ∆_Y, then every convex function g with effdom(g) ⊇ P is P-interior-locally-Lipschitz. This follows from the previous case together with the fact that convex functions are subdifferentiable on open sets in their domain.
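Here is a sketch of the form f(x) = ∞ · (x · 1⃗) + f̂(x) from the first bullet, continuing the earlier snippets and assuming the usual convex-analysis convention 0 · ∞ = 0; the names are illustrative. On any difference q − p of simplex points the coordinates sum to zero, so the infinite term vanishes and f agrees with the finite f̂.

def perp_infinite(fhat, x, tol=1e-9):
    """f(x) = infinity * (x . 1) + fhat . x: infinite perpendicular to the
    simplex's affine hull, finite along it (0 * infinity = 0 convention)."""
    s = float(np.sum(x))  # x . vec(1)
    if s > tol:
        return float('inf')
    if s < -tol:
        return float('-inf')
    return float(np.dot(fhat, x))  # the infinite term vanishes

fhat = np.array([1.0, 2.0, 3.0])
print(perp_infinite(fhat, np.array([0.5, -0.5, 0.0])))  # finite: a q - p direction
print(perp_infinite(fhat, np.array([0.1, 0.1, 0.1])))   # inf: off the affine hull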
One question for future work is whether one can optimize efficiently over the space of interior-locally-Lipschitz g for a given set P ⊆ ∆_Y. This could be useful for constructing proper scoring rules that optimize some objective.

Finally, we restate the general-P construction using the notation of Remark 4.10.
Remark 4.16. Let L(P) be the set of P-interior-locally-Lipschitz convex g with effective domain containing P, with L_+(P) those that are strictly convex on P and L_−(P) those that are not.

The characterization (prior work) states Reg(P) ∩ Proper(P) = Reg(P) ∩ ST(C(P)); the construction (Theorem 4.15) states Proper(P) = Reg(P) ∩ ST(L(P)). In other words, if a subtangent rule of g is regular, then g is P-interior-locally-Lipschitz. Similarly, the characterization states Reg(P) ∩ StrictProper(P) = Reg(P) ∩ ST(C_+(P)), while Theorem 4.15 states StrictProper(P) = Reg(P) ∩ ST(L_+(P)).

Theorem 4.15 furthermore states that Reg(P) ∩ ST(L_−(P)) ∩ ST(L_+(P)) = ∅, i.e. strict properness exactly corresponds to strict convexity. Finally, it also states that for any particular g, Reg(P) ∩ ST(g) ≠ ∅ if and only if g ∈ L(P), i.e. g has a regular subtangent rule if and only if it is convex and P-interior-locally-Lipschitz.

References

Yiling Chen and Bo Waggoner. Informational substitutes. In Proceedings of the 57th Annual Symposium on Foundations of Computer Science, FOCS '16, 2016.

Rafael Frongillo and Ian Kash. General truthfulness characterizations via convex analysis. In
Proceedings of the 10th Conference on Web and Internet Economics, WINE '14, pages 354-370. Springer, 2014.

Tilmann Gneiting and Adrian E. Raftery. Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477):359-378, 2007.

Irving J. Good. Rational decisions. Journal of the Royal Statistical Society: Series B, 14(1):107-114, 1952.

Arlo D. Hendrickson and Robert J. Buehler. Proper scores for probability forecasters. The Annals of Mathematical Statistics, 42(6):1916-1921, 1971.

Jean-Baptiste Hiriart-Urruty and Claude Lemaréchal. Fundamentals of Convex Analysis. Springer, 2001.

Jacob Marschak. Remarks on the Economics of Information, pages 91-117. Springer Netherlands, 1959. ISBN 978-94-010-9278-4.

John McCarthy. Measures of the value of information. Proceedings of the National Academy of Sciences, 42(9):654-655, 1956.

R. Tyrrell Rockafellar. Convex Analysis. Princeton University Press, 1970.

Leonard J. Savage. Elicitation of personal probabilities and expectations. Journal of the American Statistical Association, 66(336):783-801, 1971.

Mark J. Schervish. A general method for comparing probability assessors. The Annals of Statistics, 17(4):1856-1879, 1989. doi: 10.1214/aos/1176347398.