Efficient Synthesis with Probabilistic Constraints
Samuel Drews, Aws Albarghouthi, and Loris D’Antoni
University of Wisconsin–Madison
Abstract.
We consider the problem of synthesizing a program given a probabilistic specification of its desired behavior. Specifically, we study the recent paradigm of distribution-guided inductive synthesis (digits), which iteratively calls a synthesizer on finite sample sets from a given distribution. We make theoretical and algorithmic contributions: (i) We prove the surprising result that digits only requires a polynomial number of synthesizer calls in the size of the sample set, despite its ostensibly exponential behavior. (ii) We present a property-directed version of digits that further reduces the number of synthesizer calls, drastically improving synthesis performance on a range of benchmarks.

1 Introduction

Over the past few years, progress in automatic program synthesis has touched many application domains, including automating data wrangling and data extraction tasks [20,21,29,14,2,12], generating network configurations that meet user intents [28,9], optimizing low-level code [27,24], and more [4,13].

The majority of the current work has focused on synthesis under Boolean constraints. However, we often require the program to adhere to a probabilistic specification, e.g., a controller that succeeds with high probability, a decision-making model operating over a probabilistic population model, a randomized algorithm ensuring privacy, etc. In this work, we are interested in (1) investigating probabilistic synthesis from a theoretical perspective and (2) developing efficient algorithmic techniques to tackle this problem.

Our starting point is our recent framework for probabilistic synthesis called distribution-guided inductive synthesis (digits) [1]. The digits framework is analogous in nature to the guess-and-check loop popularized by counterexample-guided approaches to synthesis and verification (cegis and cegar). The key idea of the algorithm is to reduce the probabilistic synthesis problem to a non-probabilistic one that can be solved using existing techniques, e.g., sat solvers. This is performed using the following loop: (1) approximating the input probability distribution with a finite sample set; (2) synthesizing a program for various possible output assignments of the finite sample set; and (3) invoking a probabilistic verifier to check whether one of the synthesized programs indeed adheres to the given specification.

digits has been shown to theoretically converge to correct programs when they exist—thanks to learning-theory guarantees. The primary bottleneck of digits is the number of expensive calls to the synthesizer, which is ostensibly exponential in the size of the sample set. Motivated by this observation, this paper makes theoretical, algorithmic, and practical contributions:

– On the theoretical side, we present a detailed analysis of digits and prove that it only requires a polynomial number of invocations of the synthesizer, showing that the strong empirical performance of the algorithm is not merely due to the heuristics presented in [1] (Section 3).
– On the algorithmic side, we develop an improved version of digits that is property-directed, in that it only invokes the synthesizer on instances that have a chance of resulting in a correct program, without sacrificing convergence. We call the new approach τ-digits (Section 4).
– On the practical side, we implement τ-digits for sketch-based synthesis and demonstrate its ability to converge significantly faster than digits.
We apply our technique to a range of benchmarks, including illustrative examples that elucidate our theoretical analysis, probabilistic repair problems of unfair programs, and probabilistic synthesis of controllers (Section 5).

2 Probabilistic Synthesis Problem

In this section, we present the synthesis problem, the digits [1] algorithm, and fundamental background on learning theory.
2.1 Program Model

As discussed in [1], digits searches through some (infinite) set of programs, but it requires that the set of programs has finite VC dimension (we restate this condition in Section 2.3). Here we describe one constructive way of obtaining such sets of programs with finite VC dimension: we will consider sets of programs defined as program sketches [26] in the simple grammar from [1], where a program is written in a loop-free language, and "holes" defining the sketch replace some constant terminals in expressions. The syntax of the language is defined below:

P := V ← E | if B then P else P | P ; P | return V

Here, P is a program, V is the set of variables appearing in P, E (resp. B) is the set of linear arithmetic (resp. Boolean) expressions over V (where, again, constants in E and B can be replaced with holes), and V ← E is an assignment. We assume a vector v_I of variables in V that are inputs to the program. We also assume there is a single Boolean variable v_r ∈ V that is returned by the program. All variables are real-valued or Boolean. (In the case of loop-free program sketches as considered in our program model, we can convert the input-output relation into a real arithmetic formula that is guaranteed to have finite VC dimension [11]. Restricting the output to Boolean is required by the algorithm; other output types can be turned into Boolean by rewriting—see, e.g., the thermostat example in Section 5.) Given a vector of constant values c, where |c| = |v_I|, we use P(c) to denote the result of executing P on the input c.

In our setting, the inputs to a program are distributed according to some joint probability distribution D over the variables v_I. Semantically, a program P is denoted by a distribution transformer ⟦P⟧, whose input is a distribution over values of v_I and whose output is a distribution over v_I and v_r.

A program also has a probabilistic postcondition, post, defined as an inequality over terms of the form Pr[B], where B is a Boolean expression over v_I and v_r. Specifically, a probabilistic postcondition consists of Boolean combinations of the form e > c, where c ∈ R and e is an arithmetic expression over terms of the form Pr[B], e.g., Pr[B_1] / Pr[B_2] > c.

Given a triple (P, D, post), we say that P is correct with respect to D and post, denoted ⟦P⟧(D) |= post, iff post is true on the distribution ⟦P⟧(D).

Example 1.
Consider the set of intervals of the form [0, a] ⊆ [0, 1] and inputs x uniformly distributed over [0, 1] (i.e., D = Uniform[0, 1]). We can write inclusion in the interval as a (C-style) program (left) and consider a postcondition stating that the interval must include at least half the input probability mass (right):

if (0 <= x && x <= a) { return 1; } else { return 0; }        Pr_{x∼D}[P(x) = 1] ≥ 0.5

Let P_c denote the interval program where a is replaced by a constant c ∈ [0, 1]. Observe that ⟦P_c⟧(D) describes a joint distribution over (x, v_r) pairs, where [0, c] × {1} is assigned probability measure c and (c, 1] × {0} is assigned probability measure 1 − c. Therefore, ⟦P_c⟧(D) |= post if and only if c ∈ [0.5, 1].

Synthesis Problem. digits outputs a program that is approximately "similar" to a given functional specification and that meets a postcondition. This functional specification is some input-output relation which we quantitatively want to match as closely as possible: specifically, we want to minimize the error of the output program P from the functional specification P̂, defined as Er(P) := Pr_{x∼D}[P(x) ≠ P̂(x)]. (Note that we represent the functional specification as a program.) The postcondition is Boolean, and therefore we always want it to be true. digits is guaranteed to converge whenever the space of solutions satisfying the postcondition is robust under small perturbations. The following definition captures this notion of robustness:

Definition 1 (α-Robust Programs). Fix an input distribution D, a postcondition post, and a set of programs P. For any P ∈ P and any α > 0, denote the open α-ball centered at P as B_α(P) = {P′ ∈ P | Pr_{x∼D}[P(x) ≠ P′(x)] < α}. We say a program P is α-robust if ∀P′ ∈ B_α(P). ⟦P′⟧(D) |= post.

We can now state the synthesis problem solved by digits:

Definition 2 (Synthesis Problem).
Given an input distribution D, a set of programs P, a postcondition post, a functional specification P̂ ∈ P, and parameters α > 0 and 0 < ε ≤ α, the synthesis problem is to find a program P ∈ P such that ⟦P⟧(D) |= post and such that, for any other α-robust P′, Er(P) ≤ Er(P′) + ε.

2.2 A Naive DIGITS Algorithm

Algorithm 1 shows a simplified, naive version of digits, which employs a synthesize-then-verify approach. The idea of digits is to utilize non-probabilistic synthesis techniques to synthesize a set of programs, and then apply a probabilistic verification step to check if any of the synthesized programs is a solution.
Procedure digits(P̂, D, post, m):
    S ← {x_i ∼ D | i ∈ {1, ..., m}}
    progs ← ∅
    foreach f : S → {0, 1} do
        P ← O_syn({(x, f(x)) | x ∈ S})
        if P ≠ ⊥ then progs ← progs ∪ {P}
    res ← {P ∈ progs | O_ver(P, D, post)}
    return argmin_{P ∈ res} O_err(P)
Algorithm 1: Naive digits
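To make the oracles concrete, the following is a minimal Python sketch of Algorithm 1 specialized to the interval programs of Example 1. The synthesis oracle, the statistical verifier, and the error oracle are illustrative stand-ins of ours (simple interval reasoning and Monte-Carlo estimation), not the SMT-based and statistical oracles used in the paper's implementation.

```python
import itertools
import random

random.seed(0)

def run_interval(a, x):
    """Semantics of the interval program P_a: return 1 iff x lies in [0, a]."""
    return 1 if 0.0 <= x <= a else 0

def synth_oracle(examples):
    """O_syn for interval programs: return a parameter a such that [0, a] is
    consistent with the labeled examples, or None (representing bottom)."""
    ones = [x for x, y in examples if y == 1]
    zeros = [x for x, y in examples if y == 0]
    a = max(ones, default=0.0)          # a must be at least every 1-labeled point
    if any(z <= a for z in zeros):      # a 0-labeled point would be included
        return None
    return a

def estimate_prob(pred, n=20_000):
    """Monte-Carlo estimate of Pr_{x ~ Uniform[0,1]}[pred(x)] (statistical oracle)."""
    return sum(pred(random.random()) for _ in range(n)) / n

def naive_digits(spec_a, post, m):
    """Naive digits (Algorithm 1): enumerate all 2^m labelings of the m samples."""
    samples = [random.random() for _ in range(m)]
    progs = []
    for labeling in itertools.product([0, 1], repeat=m):     # all f : S -> {0,1}
        a = synth_oracle(list(zip(samples, labeling)))
        if a is not None:
            progs.append(a)
    correct = [a for a in progs if post(a)]                   # O_ver
    # O_err: error w.r.t. the functional specification [0, spec_a]
    err = lambda a: estimate_prob(lambda x: run_interval(a, x) != run_interval(spec_a, x))
    return min(correct, key=err, default=None)

# Postcondition from Example 1: the interval must cover at least half the mass.
post = lambda a: estimate_prob(lambda x: run_interval(a, x) == 1) >= 0.5
print(naive_digits(spec_a=0.3, post=post, m=6))   # typically a value a little above 0.5
```

Even at this toy scale, the exponential enumeration over all labelings is visible; the trie-based search analyzed in Section 3 is what avoids it.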
Specifically, this "Naive digits" begins by sampling an appropriate number of inputs from the input distribution and stores them in the set S. Second, it iteratively explores each possible function f that maps the input samples to a Boolean and invokes a synthesis oracle to synthesize a program P that implements f, i.e., that satisfies the set of input-output examples in which each input x ∈ S is mapped to the output f(x). Naive digits then finds which of the synthesized programs satisfy the postcondition (the set res); we assume that we have access to a probabilistic verifier O_ver to perform these computations. Finally, the algorithm outputs the program in the set res that has the lowest error with respect to the functional specification, once again assuming access to another oracle O_err that can measure the error.

Note that the number of such functions f : S → {0, 1} is exponential in |S|. As a "heuristic" to improve performance, the actual digits algorithm as presented in [1] employs an incremental trie-based search, which we describe (alongside our new algorithm, τ-digits) and analyze in Section 3. The naive version described here is, however, sufficient to discuss the convergence properties of the full algorithm.

digits is only guaranteed to converge when the program model P has finite VC dimension. Intuitively, the VC dimension captures the expressiveness of the set of ({0, 1}-valued) programs P. Given a set of inputs S, we say that P shatters S iff, for every partition of S into sets S_0 ⊔ S_1, there exists a program P ∈ P such that (i) for every x ∈ S_0, P(x) = 0, and (ii) for every x ∈ S_1, P(x) = 1.

Definition 3 (VC Dimension).
The VC dimension of a set of programs P is the largest integer d such that there exists a set of inputs S with cardinality d that is shattered by P. We define the function
VCcost(ε, δ, d) = (1/ε)·(4 log₂(2/δ) + 8d log₂(13/ε)) [5], which is used in the following theorem. (Recall that finite VC dimension is largely a "free" assumption since, again, sketches in our loop-free grammar are guaranteed to have finite VC dimension.)

Theorem 1 (Convergence).
Assume that there exist an α > 0 and a program P* that is α-robust w.r.t. D and post. Let d be the VC dimension of the set of programs P. For all bounds 0 < ε ≤ α and δ > 0, for every function O_syn, and for any m > VCcost(ε, δ, d), with probability ≥ 1 − δ we have that digits enumerates a program P with Pr_{x∼D}[P*(x) ≠ P(x)] ≤ ε and ⟦P⟧(D) |= post.

To reiterate, suppose P* is a correct program with small error Er(P*) = k; the convergence result follows from two main points: (i) P* must be α-robust, meaning every P with Pr_{x∼D}[P(x) ≠ P*(x)] < α must also be correct, and therefore (ii) by synthesizing any P such that Pr_{x∼D}[P(x) ≠ P*(x)] ≤ ε, where ε ≤ α, we obtain a correct program with error Er(P) within k ± ε.

The importance of finite VC dimension is due to the fact that the convergence statement borrows directly from probably approximately correct (PAC) learning. We will briefly discuss a core detail of efficient PAC learning that is relevant to understanding the convergence of digits (and, in turn, our analysis of τ-digits in Section 4), and refer the interested reader to Kearns and Vazirani's book [15] for a complete overview. Specifically, we consider the notion of an ε-net, which establishes the approximate definability of a target program in terms of points in its input space.

Definition 4 (ε-net). Suppose P ∈ P is a target program, and points in its input domain X are distributed x ∼ D. For a fixed ε ∈ [0, 1], we say a set of points S ⊂ X is an ε-net for P (with respect to P and D) if for every P′ ∈ P with Pr_{x∼D}[P(x) ≠ P′(x)] > ε there exists a witness x ∈ S such that P(x) ≠ P′(x).

In other words, if S is an ε-net for P, and if P′ "agrees" with P on all of S, then P and P′ can only differ by at most ε probability mass. Observe the relevance of ε-nets to the convergence of digits: the synthesis oracle is guaranteed not to "fail" by producing only programs ε-far from some ε-robust P* if the sample set happens to be an ε-net for P*. In fact, this observation is exactly the core of the PAC learning argument: having an ε-net exactly guarantees the approximate learnability.

A remarkable result of computational learning theory is that whenever P has finite VC dimension, the probability that m random samples fail to yield an ε-net becomes diminishingly small as m increases. Indeed, the VCcost function used in Theorem 1 is a dual form of this latter result—polynomially many samples are sufficient to form an ε-net with high probability.

After providing details on the search strategy employed by digits, we present our theoretical result on the polynomial bound on the number of synthesis queries that digits requires.

Initialize:
    explored ← {ε};  P_ε ← P̂;  depth ← 0;  best ← ⊥

Deepen (when every σb with P_σ ≠ ⊥, |σb| ≤ depth, and unblocked(σb) is already in explored):
    sample_{depth+1} ∼ D;  depth ← depth + 1

Explore—Synthesis Query (for σ ∈ explored with P_σ ≠ ⊥, b ∈ {0, 1}, σb ∉ explored, |σb| ≤ depth, unblocked(σb), and P_σ(sample_{|σb|}) ≠ b):
    P_{σb} ← O_syn({(sample_{i+1}, σb(i)) : 0 ≤ i < |σb|});  explored ← explored ∪ {σb}

Explore—Solution Propagation (as above, but P_σ(sample_{|σb|}) = b):
    P_{σb} ← P_σ;  explored ← explored ∪ {σb}

Best:
    σ* ← argmin_σ {O_err(P_σ) | σ ∈ explored ∧ P_σ ≠ ⊥ ∧ O_ver(P_σ) = true};  best ← P_{σ*}

where unblocked(σ) := |{i : 0 ≤ i < |σ| ∧ σ(i) ≠ P̂(sample_{i+1})}| ≤ τ · depth (the framed, τ-digits-only condition; plain digits treats every σ as unblocked).

Fig. 1.
Full digits description and our new extension, τ-digits, shown in boxes.

Naive digits, as presented in Algorithm 1, performs a very unstructured, exponential search over the output labelings of the sampled inputs—i.e., the possible Boolean functions f in Algorithm 1. In our original paper [1] we present a "heuristic" implementation strategy that incrementally explores the set of possible output labelings using a trie data structure. In this section, we study the complexity of this technique through the lens of computational learning theory and discover the surprising result that digits requires a polynomial number of calls to the synthesizer in the size of the sample set! Our improved search algorithm (Section 4) inherits these results.

For the remainder of this paper, we use digits to refer to this incremental version. A full description is necessary for our analysis: Figure 1 (non-framed rules only) consists of a collection of guarded rules describing the construction of the trie used by digits to incrementally explore the set of possible output labelings. Our improved version, τ-digits (presented in Section 4), corresponds to the addition of the framed parts; without them, the rules describe digits.

Nodes in the trie represent partial output labelings—i.e., functions f assigning Boolean values to only some of the samples in S = {x_1, ..., x_m}. Each node is identified by a binary string σ = b_1···b_k (k can be smaller than m) denoting the path to the node from the root.
The string σ also describes the partial output-labeling function f corresponding to the node—i.e., if the i-th bit b_i is set to 1, then f(x_i) = true. The set explored represents the nodes in the trie built thus far; for each new node, the algorithm synthesizes a program consistent with the corresponding partial output function ("Explore" rules). The variable depth controls the incremental aspect of the search and represents the maximum length of any σ in explored; it is incremented whenever all nodes up to that depth have been explored (the "Deepen" rule). The crucial part of the algorithm is that, if no program can be synthesized for the partial output function of a node identified by σ, the algorithm does not need to issue further synthesis queries for the descendants of σ.

Fig. 2. Example execution of incremental digits on interval programs, starting from an initial (incorrect) interval. Hollow circles denote calls to O_syn that yield new programs; the cross denotes a call to O_syn that returns ⊥.

Figure 2 shows how digits builds a trie for an example run on the interval programs from Example 1, where we suppose we begin with an incorrect program describing some interval [0, c]. Initially, we set the root program to [0, c] (left figure). The "Deepen" rule applies, so a sample x_1 is added to the set of samples. "Explore" rules are then applied twice to build the children of the root: the child following the 0 branch needs to map x_1 to false, which [0, c] already does, thus it is propagated to that child without asking O_syn to perform a synthesis query. For the child following 1, we instead make a synthesis query, using the oracle O_syn, for any value of a such that [0, a] maps x_1 to true—suppose it returns the solution a = 1, and we associate [0, 1] with this node. At this point we have exhausted depth 1 (middle figure), so "Deepen" once again applies, adding a second sample x_2. At this depth (right figure), only two calls to O_syn are made: in the case of the call at σ = 01, there is no value of a for which [0, a] excludes x_1 yet includes x_2, so O_syn returns ⊥, and we do not try to explore any children of this node in the future. The algorithm continues in this manner until a stopping condition is reached—e.g., enough samples are enumerated.

We observed in [1] that the trie-based exploration seems to be efficient in practice, despite potential exponential growth of the number of explored nodes in the trie as the depth of the search increases. The convergence analysis of digits relies on the finite VC dimension of the program model, but VC dimension itself is just a summary of the growth function, a function that describes a notion of complexity of the set of programs in question. We will see that the growth function much more precisely describes the behavior of the trie-based search; we will then use a classic result from computational learning theory to derive better bounds on the performance of the search. We define the growth function below, adapting the presentation from [15].
Definition 5 (Realizable Dichotomies).
We are given a set P of programs representing functions from X → {0, 1} and a (finite) set of inputs S ⊂ X. We call any f : S → {0, 1} a dichotomy of S; if there exists a program P ∈ P that extends f to its full domain X, we call f a realizable dichotomy in P. We denote the set of realizable dichotomies as

Π_P(S) := {f : S → {0, 1} | ∃P ∈ P. ∀x ∈ S. P(x) = f(x)}.

Observe that for any (infinite) set P and any finite set S we have |Π_P(S)| ≤ 2^{|S|}. We define the growth function in terms of the realizable dichotomies:

Definition 6 (Growth Function).
The growth function is the maximal number of realizable dichotomies as a function of the number of samples, denoted

Π̂_P(m) := max_{S⊂X : |S|=m} |Π_P(S)|.

Observe that P has VC dimension d if and only if d is the largest integer satisfying Π̂_P(d) = 2^d (and P has infinite VC dimension when Π̂_P(m) is identically 2^m)—in fact, VC dimension is often defined using this characterization.

Example 2.
Consider the set of intervals of the form [0, a] as in Example 1 and Figure 2. For a set of two points S = {x_1, x_2} with x_1 < x_2, we have that |Π_{[0,a]}(S)| = 3, since a value of a with x_1 ≤ a < x_2 accepts x_1 but not x_2, a < x_1 accepts neither, and a ≥ x_2 accepts both, thus these three dichotomies are realizable; however, no interval with 0 as a left endpoint can accept x_2 and not x_1, thus this dichotomy is not realizable. In fact, for any (finite) set S ⊂ [0, 1], we have that |Π_{[0,a]}(S)| = |S| + 1; we then have that Π̂_{[0,a]}(m) = m + 1.

When digits terminates having used a sample set S, it has considered all the dichotomies of S: the programs it has enumerated exactly correspond to extensions of the realizable dichotomies Π_P(S). The trie-based exploration is effectively trying to minimize the number of O_syn queries performed on non-realizable ones, but doing so without explicit knowledge of the full functional behavior of programs in P. In fact, it manages to stay relatively close to performing queries only on the realizable dichotomies:

Lemma 1. digits performs at most |S| · |Π_P(S)| synthesis-oracle queries. More precisely, let S = {x_1, ..., x_m} be indexed by the depth at which each sample was added: the exact number of synthesis queries is Σ_{ℓ=1}^{m} |Π_P({x_1, ..., x_{ℓ−1}})|.

Proof. Let T_d denote the total number of queries performed once depth d is completed. We perform no queries for the root, thus T_0 = 0. Upon completing depth d − 1, the realizable dichotomies of {x_1, ..., x_{d−1}} exactly specify the nodes whose children will be explored at depth d. For each such node, one child is skipped due to solution propagation, while an oracle query is performed on the other, thus T_d = T_{d−1} + |Π_P({x_1, ..., x_{d−1}})|. Lastly, |Π_P(S)| cannot decrease by adding elements to S, so we have that T_m = Σ_{ℓ=1}^{m} |Π_P({x_1, ..., x_{ℓ−1}})| ≤ Σ_{ℓ=1}^{m} |Π_P(S)| ≤ |S| · |Π_P(S)|. ⊓⊔

Connecting digits to the realizable dichotomies and, in turn, the growth function allows us to employ a remarkable result from computational learning theory, stating that the growth function for any set exhibits one of two asymptotic behaviors: it is either identically 2^m (infinite VC dimension) or dominated by a polynomial! This is commonly called the Sauer-Shelah Lemma [23,25]:

Lemma 2 (Sauer-Shelah). If P has finite VC dimension d, then for all m > d, Π̂_P(m) ≤ (em/d)^d; i.e., Π̂_P(m) = O(m^d).

Combining our lemma with this famous one yields a surprising result—that for a fixed set of programs P with finite VC dimension, the number of oracle queries performed by digits is guaranteed to be polynomial in the depth of the search, where the degree of the polynomial is determined by the VC dimension:

Theorem 2. If P has VC dimension d, then digits performs O(m^{d+1}) synthesis-oracle queries.

In short, the reason an execution of digits seems to enumerate a sub-exponential number of programs (as a function of the depth of the search) is because it literally must be polynomial. Furthermore, the algorithm performs oracle queries on nearly only those polynomially-many realizable dichotomies.
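As an illustration of the trie rules of Figure 1 and of Lemma 1, the following sketch (again with a toy, assumed synthesis oracle for interval programs) runs the incremental search and counts synthesis queries; the counts grow quadratically, matching the m(m + 1)/2 count derived in Example 3 below rather than 2^m.

```python
import random

random.seed(1)

def interval_synth(points, labels):
    """O_syn for interval programs [0, a]: a consistent parameter a, or None (bottom)."""
    ones = [x for x, y in zip(points, labels) if y == 1]
    zeros = [x for x, y in zip(points, labels) if y == 0]
    a = max(ones, default=0.0)
    return None if any(z <= a for z in zeros) else a

def trie_digits(spec_a, max_depth):
    """Trie-based digits (Figure 1, without the framed τ parts), counting O_syn queries."""
    samples = []
    explored = {"": spec_a}          # node label string sigma -> parameter a (root = spec)
    queries = 0
    for depth in range(1, max_depth + 1):                  # "Deepen"
        samples.append(random.random())
        for sigma, a in list(explored.items()):
            if a is None or len(sigma) != depth - 1:       # extend only live frontier nodes
                continue
            for b in "01":
                child = sigma + b
                if int(b) == (1 if 0.0 <= samples[depth - 1] <= a else 0):
                    explored[child] = a                    # Explore: solution propagation
                else:
                    queries += 1                           # Explore: synthesis query
                    explored[child] = interval_synth(samples, [int(c) for c in child])
    return queries

for m in (4, 8, 16):
    print(m, trie_digits(spec_a=0.3, max_depth=m))         # grows like m(m+1)/2, not 2^m
```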
Example 3. A digits run on the [0, a] programs as in Figure 2 using a sample set of size m will perform O(m²) oracle queries, since the VC dimension of these intervals is 1. (In fact, every run of the algorithm on these programs will perform exactly m(m + 1)/2 many queries.)

4 τ-digits

digits has better convergence guarantees when it operates on larger sets of sampled inputs. In this section, we describe a new optimization of digits that reduces the number of synthesis queries performed by the algorithm so that it more quickly reaches higher depths in the trie, and thus allows it to scale to larger sample sets. This optimized digits, called τ-digits, is shown in Figure 1 as the set of all the rules of digits plus the framed elements. The high-level idea is to skip synthesis queries that are (quantifiably) unlikely to result in optimal solutions. (We assume the functional specification itself is some P̂ ∈ P and thus can be used as the root program—the alternative is a trivial synthesis query on an empty set of constraints.) For example, if the functional specification P̂ maps every sampled input in S to 0, then the synthesis query on the mapping of every element of S to 1 becomes increasingly likely to result in programs that have maximal distance from P̂ as the size of S increases; hence the algorithm could probably avoid performing that query. In the following, we make use of the concept of Hamming distance between pairs of programs:
Definition 7 (Hamming Distance).
For any finite set of inputs S and any two programs P_1, P_2, we denote Hamming_S(P_1, P_2) := |{x ∈ S | P_1(x) ≠ P_2(x)}| (we will also allow any {0, 1}-valued string to be an argument of Hamming_S).

Fix the given functional specification P̂ and suppose that there exists an ε-robust solution P* with (nearly) minimal error k = Er(P*) := Pr_{x∼D}[P̂(x) ≠ P*(x)]; we would be happy to find any program P in P*'s ε-ball. Suppose we angelically know k a priori, and we thus restrict our search (for each depth m) only to constraint strings (i.e., σ in Figure 1) that have Hamming distance not much larger than km. To be specific, we first fix some threshold τ ∈ (k, 1]. Intuitively, the optimization corresponds to modifying digits to consider only paths σ through the trie such that Hamming_S(P̂, σ) ≤ τ|S|. This is performed using the unblocked function in Figure 1. Since we are ignoring certain paths through the trie, we need to ask: How much does this decrease the probability of the algorithm succeeding? It depends on the tightness of the threshold, which we address in Section 4.2. In Section 4.3, we discuss how to adaptively modify the threshold τ as τ-digits is executing, which is useful when a good τ is unknown a priori.

Using τ-digits, the choice of τ will affect both (i) how many synthesis queries are performed, and (ii) the likelihood that we miss optimal solutions; in this section we explore the latter point. (The former point is a difficult combinatorial question that to our knowledge has no precedent in the computational learning literature, and so we leave it as future work.) Interestingly, we will see that all of the analysis depends only on parameters directly related to the threshold; notably, none of this analysis depends on the complexity of P (i.e., its VC dimension).

If we really want to learn (something close to) a program P*, then we should use a value of the threshold τ such that Pr_{S∼D^m}[Hamming_S(P̂, P*) ≤ τm] is large—to do so requires knowledge of the distribution of Hamming_S(P̂, P*). Recall the binomial distribution: for parameters (n, p), it describes the number of successes in n-many trials of an experiment that has success probability p.

Claim.
Fix P and let k = Pr_{x∼D}[P̂(x) ≠ P(x)]. If S is sampled from D^m, then Hamming_S(P̂, P) is binomially distributed with parameters (m, k).

Next, we will use our knowledge of this distribution to reason about the failure probability, i.e., the probability that τ-digits does not preserve the convergence result of digits. The simplest argument we can make is a union-bound-style argument: the thresholded algorithm can "fail" by (i) failing to sample an ε-net, or otherwise (ii) sampling a set on which the optimal solution has a Hamming distance that is not representative of its actual distance. We provide the quantification of this failure probability in the following theorem:

Theorem 3.
Let P* be a target ε-robust program with k = Pr_{x∼D}[P̂(x) ≠ P*(x)], and let δ be the probability that m samples do not form an ε-net for P*. If we run τ-digits with τ ∈ (k, 1], then the failure probability is at most δ + Pr[X > τm], where X ∼ Binomial(m, k).

In other words, we can use tail probabilities of the binomial distribution to bound the probability that the threshold causes us to "miss" a desirable program we otherwise would have enumerated. Explicitly, we have the following corollary:
Corollary 1. τ-digits increases failure probability (relative to digits) by at most Pr[X > τm] = Σ_{i=⌊τm⌋+1}^{m} (m choose i) k^i (1 − k)^{m−i}.

Informally, when m is not too small, k is not too large, and τ is reasonably forgiving, these tail probabilities can be quite small. We can even analyze the asymptotic behavior by using any existing upper bound on the binomial distribution's tail probabilities—importantly, the additional error diminishes exponentially as m increases, dependent on the size of τ relative to k.

Corollary 2. τ-digits increases failure probability by at most e^{−2m(τ−k)²}. (A more precise, though less convenient, bound is e^{−m(τ ln(τ/k) + (1−τ) ln((1−τ)/(1−k)))}.)
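To get a sense of the magnitude of these quantities, the snippet below evaluates the exact binomial tail of Corollary 1 and the two closed-form bounds; the particular values of m, k, and τ are illustrative choices of ours.

```python
import math

def binom_tail(m, k, tau):
    """Exact Pr[X > tau*m] for X ~ Binomial(m, k) (Corollary 1)."""
    lo = math.floor(tau * m) + 1
    return sum(math.comb(m, i) * k**i * (1 - k)**(m - i) for i in range(lo, m + 1))

def hoeffding_bound(m, k, tau):
    """Convenient bound e^{-2m(tau-k)^2} (Corollary 2)."""
    return math.exp(-2 * m * (tau - k) ** 2)

def chernoff_kl_bound(m, k, tau):
    """Tighter bound e^{-m*KL(tau||k)} (the parenthetical bound above)."""
    kl = tau * math.log(tau / k) + (1 - tau) * math.log((1 - tau) / (1 - k))
    return math.exp(-m * kl)

m, k, tau = 100, 0.1, 0.2                # illustrative values only
print(binom_tail(m, k, tau))             # exact extra failure probability, on the order of 1e-3
print(chernoff_kl_bound(m, k, tau))      # roughly 0.012 for these values
print(hoeffding_bound(m, k, tau))        # e^{-2}, roughly 0.135 for these values
```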
Example 4. Suppose m = 100 and the threshold τ exceeds k by a modest margin; then the extra failure-probability term in Theorem 3 is already small, and by Corollary 2 it shrinks exponentially as m grows.

As stated at the beginning of this subsection, the balancing act is to choose τ (i) small enough so that the algorithm is still fast for large m, yet (ii) large enough so that the algorithm is still likely to learn the desired programs. The further challenge is to relax our initial strong assumption that we know the optimal k a priori when determining τ, which we address in the following subsection.

Of course, we do not have the angelic knowledge that lets us pick an ideal threshold τ; the only absolutely sound choice we can make is the trivial τ = 1. Fortunately, we can begin with this choice of τ and adaptively refine it as the search progresses. Specifically, every time we encounter a correct program P with k = Er(P), we can refine τ to reflect our newfound knowledge that "the best solution has distance of at most k." We refer to this refinement as adaptive τ-digits. The modification involves the addition of the following rule to Figure 1:

Refine Threshold (for some non-decreasing g : [0, 1] → [0, 1], whenever best ≠ ⊥):
    τ ← g(O_err(best))

We can use any (non-decreasing) function g to update the threshold τ ← g(k). The simplest choice would be the identity function (which we use in our experiments), although one could use a looser function so as not to over-prune the search. If we choose functions of the form g(k) = k + b, then Corollary 2 allows us to make (slightly weak) claims of the following form:

Claim.
Suppose the adaptive algorithm completes a search up to depth m, yielding a best solution with error k (so we have the final threshold value τ = k + b). Suppose also that P* is an optimal ε-robust program at distance k − η. Then the optimization-added failure probability (as in Corollary 1) for a run of (non-adaptive) τ-digits completing depth m and using this τ is at most e^{−2m(b+η)²}.

5 Evaluation
In this section, we evaluate our new algorithm τ - digits (Fig-ure 1) and its adaptive variant (Section 4.3) against digits (i.e., τ - digits with τ = 1 ). Both algorithms are implemented in Python and use the SMT solverZ3 [8] to implement a sketch-based synthesizer O syn . We employ statistical ver-ification for O ver and O err : we use Hoeffding’s inequality for estimating proba-bilities in post and Er. Probabilities are computed with 95% confidence, leavingour oracles potentially unsound. Research Questions.
Our evaluation aims to answer the following questions:
RQ1
Is adaptive τ - digits more effective/precise than τ - digits ? RQ2 Is τ - digits more effective/precise than digits ? RQ3
Can τ-digits solve challenging synthesis problems?

We experiment on three sets of benchmarks: (i) synthetic examples for which the optimal solutions can be computed analytically (Section 5.1), (ii) the set of benchmarks considered in the original digits paper (Section 5.2), and (iii) a variant of the thermostat-controller synthesis problem presented in [7] (Section 5.3).

We consider a class of synthetic programs for which we can compute the optimal solution exactly; this lets us compare the results of our implementation to an ideal baseline. Here, the program model P is defined as the set of axis-aligned hyperrectangles within [−1, 1]^d (for d ∈ {1, 2, 3}, with VC dimension 2d), and the input distribution D is such that inputs are distributed uniformly over [−1, 1]^d. We fix some probability mass b (one of three values) and define the benchmarks so that the best error for a correct solution is exactly b (see Appendix B). We run our implementation using a small set of fixed thresholds τ ≤ 1, omitting those values for which τ < b; additionally, we also consider an adaptive run
where τ is initialized to the value 1, and whenever a new best solution is enumerated with error k, we update τ ← k. Each combination of parameters was run for a period of 2 minutes. Figure 3 fixates on one d = 1 instance and shows each of the following as a function of time: (i) the depth completed by the search (i.e., the current size of the sample set), and (ii) the best solution found by the search. (See Appendix B for the other configurations of (d, b).)

Fig. 3. Synthetic hyperrectangle problem instance with d = 1: depth completed and best error as functions of time, for the adaptive run and the fixed thresholds.

By studying Figure 3 we see that the adaptive threshold search performs at least as well as the tight thresholds fixed a priori because reasonable solutions are found early. In fact, all search configurations find solutions very close to the optimal error (indicated by the horizontal dashed line). Regardless, they reach different depths, and the main advantage of reaching large depths concerns the strength of the optimality guarantee. Note, also, that small τ values are necessary to see improvements in the completed depth of the search. Indeed, the discrepancy between the depth-versus-time functions diminishes drastically for the problem instances with larger values of b (see Appendix B); the gains of the optimization are contingent on the existence of correct solutions close to the functional specification.

Findings (RQ1): τ-digits does tend to find reasonable solutions at early depths and near-optimal solutions at later depths, thus adaptive τ-digits is more effective than τ-digits, and we use it throughout our remaining experiments.

The original digits paper [1] evaluates on a set of 18 repair problems of varying complexity. The functional specifications are machine-learned decision trees and support vector machines, and each search space P involves the set of programs formed by replacing some number of real-valued constants in the program with holes. The postcondition is a form of algorithmic fairness—e.g., the program should output true on inputs of type A as often as it does on inputs of type B [10].
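As an illustration of this kind of postcondition, the sketch below checks a disparate-impact-style ratio property by estimating two conditional probabilities from samples; the population model, the group attribute, and the 0.8 ratio are assumptions of ours, not the exact property or data used in the benchmarks.

```python
import random

random.seed(2)

def estimate(pred, sampler, n=50_000):
    """Monte-Carlo estimate of Pr[pred(x)] under the given input sampler."""
    return sum(pred(sampler()) for _ in range(n)) / n

def fairness_post(program, sampler, ratio=0.8):
    """Check Pr[P(x)=1 | group A] / Pr[P(x)=1 | group B] > ratio by estimating
    the two conditional probabilities separately (illustrative property)."""
    p_a = estimate(lambda x: program(x) == 1 and x["group"] == "A", sampler) / \
          estimate(lambda x: x["group"] == "A", sampler)
    p_b = estimate(lambda x: program(x) == 1 and x["group"] == "B", sampler) / \
          estimate(lambda x: x["group"] == "B", sampler)
    return p_a / p_b > ratio

# A toy population model and a toy decision program with a numeric threshold hole.
def sampler():
    group = random.choice(["A", "B"])
    score = random.gauss(0.5 if group == "A" else 0.6, 0.2)
    return {"group": group, "score": score}

program = lambda x: 1 if x["score"] > 0.55 else 0
print(fairness_post(program, sampler))   # False: this toy program violates the 0.8 ratio
```

In the repair setting, digits would search over the hole (here, the 0.55 threshold) for a program that both satisfies such a check and stays close to the original decision procedure.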
For each such repair problem, we run both digits and adaptive τ-digits (again, with initial τ = 1 and the identity refinement function). Each benchmark is run for 10 minutes, where the same sample set is used for both algorithms.

Fig. 4. Improvement of using adaptive τ-digits on the original digits benchmarks. Left: depth completed by adaptive τ-digits versus digits; the dotted line marks the average increase in depth. Right: best error.

Figure 4 shows, for each benchmark, (i) the largest sample set size completed by adaptive τ-digits versus digits (left—above the diagonal line indicates adaptive τ-digits reaches further depths), and (ii) the error of the best solution found by adaptive τ-digits versus digits (right—below the diagonal line indicates adaptive τ-digits finds better solutions). We see that adaptive τ-digits reaches further depths on every problem instance, many of which are substantial improvements, and that it finds better solutions on 10 of the 18 problems. For those which did not improve, either the search was already deep enough that digits was able to find near-optimal solutions, or the complexity of the synthesis queries is such that the search is still constrained to small depths.

Findings (RQ2):
Adaptive τ-digits can find better solutions than those found by digits and can reach greater search depths.

We challenge adaptive τ-digits with the task of synthesizing a thermostat controller, borrowing the benchmark from [7]. The input to the controller is the initial temperature of the environment; since the world is uncertain, there is a specified probability distribution over the temperatures. The controller itself is a program sketch consisting primarily of a single main loop: iterations of the loop correspond to timesteps, during which the synthesized parameters dictate an incremental update made by the thermostat based on the current temperature. The loop runs for 40 iterations, then terminates, returning the absolute value of the difference between its final actual temperature and the target temperature. The postcondition is a Boolean probabilistic correctness property intuitively corresponding to controller safety, e.g., with high probability, the temperature should never exceed certain thresholds. In [7], there is a quantitative objective
in the form of minimizing the expected value E[|actual − target|]—our setting does not admit optimizing with respect to expectations, so we must modify the problem. Instead, we fix some value N (N ∈ {2, 4, 8}) and have the program return 0 when |actual − target| < N and 1 otherwise. Our quantitative objective is to minimize the error from the constant-zero functional specification P̂(x) := 0 (i.e., the actual temperature always gets close enough to the target). The full specification of the controller is provided in Appendix C.

We consider variants of the program where the thermostat runs for fewer timesteps and try loop unrollings of size {5, 10, 20, 40}. We run each benchmark for 10 minutes: the final completed search depths and best errors of solutions are shown in Figure 5. For this particular experiment, we use the SMT solver CVC4 [3] because it performs better than Z3 on the occurring SMT instances.

Fig. 5. Thermostat controller results: completed depth and best error as a function of the number of loop unrollings, for N = 2, 4, 8.

As we would expect, for larger values of N it is "easier" for the thermostat to reach the target temperature threshold and thus the quality of the best solution increases in N. However, with small unrollings (i.e., 5) the synthesized controllers do not have enough iterations (time) to modify the temperature enough for the probability mass of extremal temperatures to reach the target: as we increase the number of unrollings to 10, we see that better solutions can be found since the set of programs is capable of stronger behavior. On the other hand, the completed depth of the search plummets as the unrolling increases due to the complexity of the O_syn queries. Consequently, for 20 and 40 unrollings, adaptive τ-digits synthesizes worse solutions because it cannot reach the necessary depths to obtain better guarantees.

One final point of note is that for N = 8 and 10 unrollings, there seems to be a sharp spike in the completed depth. However, this is somewhat artificial: because N = 8 creates a very lenient quantitative objective, an early O_syn query happens to yield a program with an error very close to zero. Adaptive τ-digits then updates τ to this tiny value and skips most subsequent synthesis queries.

Findings (RQ3):
Adaptive τ-digits can synthesize small variants of a complex thermostat controller, but cannot solve variants with many loop iterations.
Synthesis & Probability.
Program synthesis is a mature area with many pow-erful techniques. The primary focus is on synthesis under Boolean constraints,and probabilistic specifications have received less attention [1,7,18,16]. We dis-cuss the works that are most related to ours. digits [1] is the most relevant work. First, we show for the first time that digits only requires a number of synthesis queries polynomial in the number ofsamples. Second, our adaptive τ - digits further reduces the number of synthesisqueries required to solve a synthesis problem without sacrificing correctness.The technique of smoothed proof search [7] approximates a combination offunctional correctness and maximization of an expected value as a smooth, con-tinuous function. It then uses numerical methods to find a local optimum ofthis function, which translates to a synthesized program that is likely to be cor-rect and locally maximal. The benchmarks described in Section 5.3 are variantsof benchmarks from [7]. Smoothed proof search can minimize expectation; τ - digits minimizes probability only. However, unlike τ - digits , smoothed proofsearch lacks formal convergence guarantees and cannot support the rich proba-bilistic postconditions we support, e.g., as in the fairness benchmarks.Works on synthesis of probabilistic programs are aimed at a different prob-lem [18,6,22]: that of synthesizing a generative model of data. For example,Nori et al. [18] use sketches of probabilistic programs and complete them witha stochastic search. Recently, Saad et al. [22] synthesize an ensemble of proba-bilistic programs for learning Gaussian processes and other models.Kˇucera et al. [16] present a technique for automatically synthesizing programtransformations that introduce uncertainty into a given program with the goal ofsatisfying given privacy policies—e.g., preventing information leaks. They lever-age the specific structure of their problem to reduce it to an SMT constraintsolving problem. The problem tackled in [16] is orthogonal to the one targetedin this paper and the techniques are therefore very different. Stochastic Satisfiability.
Our problem is closely related to e-majsat [17], a special case of stochastic satisfiability (ssat) [19] and a means for formalizing probabilistic planning problems. e-majsat is NP^PP-complete. An e-majsat formula has deterministic and probabilistic variables. The goal is to find an assignment of the deterministic variables such that the probability that the formula is satisfied is above a given threshold. Our setting is similar, but we operate over complex program statements and have an additional optimization objective (i.e., the program should be close to the functional specification). The deterministic variables in our setting are the holes defining the search space; the probabilistic variables are program inputs.

Acknowledgements.
We thank Shuchi Chawla, Yingyu Liang, Jerry Zhu, the entire fairness reading group at UW-Madison, and Nika Haghtalab for all of the detailed discussions. This material is based upon work supported by the National Science Foundation under grant numbers 1566015, 1704117, and 1750965.

References
1. Albarghouthi, A., D’Antoni, L., Drews, S.: Repairing decision-making programsunder uncertainty. In: Majumdar, R., Kunčak, V. (eds.) Computer Aided Verifica-tion. pp. 181–200. Springer International Publishing, Cham (2017)2. Barowy, D.W., Gulwani, S., Hart, T., Zorn, B.G.: Flashrelate: extract-ing relational data from semi-structured spreadsheets using examples.In: Proceedings of the 36th ACM SIGPLAN Conference on Program-ming Language Design and Implementation, Portland, OR, USA, June15-17, 2015. pp. 218–228 (2015). https://doi.org/10.1145/2737924.2737952, http://doi.acm.org/10.1145/2737924.2737952
3. Barrett, C., Conway, C.L., Deters, M., Hadarean, L., Jovanović,D., King, T., Reynolds, A., Tinelli, C.: Cvc4. In: Proceedings ofthe 23rd International Conference on Computer Aided Verification.pp. 171–177. CAV’11, Springer-Verlag, Berlin, Heidelberg (2011), http://dl.acm.org/citation.cfm?id=2032305.2032319
4. Bastani, O., Sharma, R., Aiken, A., Liang, P.: Synthesizing program inputgrammars. In: Proceedings of the 38th ACM SIGPLAN Conference on Pro-gramming Language Design and Implementation, PLDI 2017, Barcelona, Spain,June 18-23, 2017. pp. 95–110 (2017). https://doi.org/10.1145/3062341.3062349, http://doi.acm.org/10.1145/3062341.3062349
5. Blumer, A., Ehrenfeucht, A., Haussler, D., Warmuth, M.K.: Learnability andthe vapnik-chervonenkis dimension. Journal of the ACM (JACM) (4), 929–965(1989)6. Chasins, S., Phothilimthana, P.M.: Data-driven synthesis of full probabilistic pro-grams. In: International Conference on Computer Aided Verification. pp. 279–304.Springer (2017)7. Chaudhuri, S., Clochard, M., Solar-Lezama, A.: Bridging boolean and quantitativesynthesis using smoothed proof search. In: POPL. vol. 49, pp. 207–220. ACM (2014)8. De Moura, L., Bjørner, N.: Z3: An efficient smt solver. In: International conferenceon Tools and Algorithms for the Construction and Analysis of Systems. pp. 337–340. Springer (2008)9. El-Hassany, A., Tsankov, P., Vanbever, L., Vechev, M.: Network-wide configurationsynthesis (2017)10. Feldman, M., Friedler, S.A., Moeller, J., Scheidegger, C., Venkatasubramanian,S.: Certifying and removing disparate impact. In: Proceedings of the 21th ACMSIGKDD International Conference on Knowledge Discovery and Data Mining. pp.259–268. ACM (2015)11. Goldberg, P.W., Jerrum, M.: Bounding the vapnik-chervonenkis di-mension of concept classes parameterized by real numbers. MachineLearning (2-3), 131–148 (1995). https://doi.org/10.1007/BF00993408, https://doi.org/10.1007/BF00993408
12. Gulwani, S.: Automating string processing in spreadsheets using input-outputexamples. In: Proceedings of the 38th ACM SIGPLAN-SIGACT Symposiumon Principles of Programming Languages, POPL 2011, Austin, TX, USA, Jan-uary 26-28, 2011. pp. 317–330 (2011). https://doi.org/10.1145/1926385.1926423, http://doi.acm.org/10.1145/1926385.1926423
13. Gulwani, S.: Program synthesis. In: Software Systems Safety,pp. 43–75 (2014). https://doi.org/10.3233/978-1-61499-385-8-43, https://doi.org/10.3233/978-1-61499-385-8-43
14. Gulwani, S.: Programming by examples - and its applications in data wrangling. In: Dependable Software Systems Engineering, pp. 137–158 (2016). https://doi.org/10.3233/978-1-61499-627-9-137, https://doi.org/10.3233/978-1-61499-627-9-137
15. Kearns, M.J., Vazirani, U.V.: An Introduction to Computational Learning Theory.MIT Press, Cambridge, MA, USA (1994)16. Kučera, M., Tsankov, P., Gehr, T., Guarnieri, M., Vechev, M.: Synthesis ofprobabilistic privacy enforcement. In: Proceedings of the 2017 ACM SIGSACConference on Computer and Communications Security. pp. 391–408. CCS ’17,ACM, New York, NY, USA (2017). https://doi.org/10.1145/3133956.3134079, http://doi.acm.org/10.1145/3133956.3134079
17. Littman, M.L., Goldsmith, J., Mundhenk, M.: The computational complexity ofprobabilistic planning. Journal of Artificial Intelligence Research , 1–36 (1998)18. Nori, A.V., Ozair, S., Rajamani, S.K., Vijaykeerthy, D.: Effi-cient synthesis of probabilistic programs. SIGPLAN Not. (6),208–217 (Jun 2015). https://doi.org/10.1145/2813885.2737982, http://doi.acm.org/10.1145/2813885.2737982
19. Papadimitriou, C.H.: Games against nature. Journal of Computer and SystemSciences (2), 288–301 (1985)20. Polozov, O., Gulwani, S.: Flashmeta: a framework for inductive programsynthesis. In: Proceedings of the 2015 ACM SIGPLAN International Con-ference on Object-Oriented Programming, Systems, Languages, and Applica-tions, OOPSLA 2015, part of SPLASH 2015, Pittsburgh, PA, USA, Octo-ber 25-30, 2015. pp. 107–126 (2015). https://doi.org/10.1145/2814270.2814310, http://doi.acm.org/10.1145/2814270.2814310
21. Raza, M., Gulwani, S.: Automated data extraction using predictive program syn-thesis. In: Proceedings of the Thirty-First AAAI Conference on Artificial Intelli-gence, February 4-9, 2017, San Francisco, California, USA. pp. 882–890 (2017), http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/15034
22. Saad, F.A., Cusumano-Towner, M.F., Schaechtle, U., Rinard, M.C., Mansinghka,V.K.: Bayesian synthesis of probabilistic programs for automatic data modeling.Proceedings of the ACM on Programming Languages (POPL), 37 (2019)23. Sauer, N.: On the density of families of sets. Journal of Combinatorial Theory,Series A (1), 145–147 (1972)24. Schkufza, E., Sharma, R., Aiken, A.: Stochastic program optimization.Commun. ACM (2), 114–122 (2016). https://doi.org/10.1145/2863701, http://doi.acm.org/10.1145/2863701
25. Shelah, S.: A combinatorial problem; stability and order for models and theories in infinitary languages. Pacific Journal of Mathematics (1), 247–261 (1972)
26. Solar-Lezama, A.: Program Synthesis by Sketching. Ph.D. thesis, Berkeley, CA, USA (2008), AAI3353225
27. Srinivasan, V., Reps, T.W.: Synthesis of machine code from semantics. In: Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, Portland, OR, USA, June 15-17, 2015. pp. 596–607 (2015). https://doi.org/10.1145/2737924.2737960, http://doi.acm.org/10.1145/2737924.2737960
28. Subramanian, K., D’Antoni, L., Akella, A.: Genesis: synthesizing forwarding tablesin multi-tenant networks. In: Proceedings of the 44th ACM SIGPLAN Symposiumon Principles of Programming Languages, POPL 2017, Paris, France, January 18-20, 2017. pp. 572–585 (2017), http://dl.acm.org/citation.cfm?id=3009845
29. Wang, X., Gulwani, S., Singh, R.: FIDEX: filtering spreadsheet data using examples. In: Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2016, part of SPLASH 2016, Amsterdam, The Netherlands, October 30 - November 4, 2016. pp. 195–213 (2016). https://doi.org/10.1145/2983990.2984030, http://doi.acm.org/10.1145/2983990.2984030
A Miscellaneous Proofs
A.1 Main Theorem
Proof (Theorem 2).
Let S be the set of samples, with |S| = m. By Lemma 1, the number of queries is at most |S| · |Π_P(S)|, which is in turn at most m · Π̂_P(m). Applying Lemma 2 immediately gives us the O(m^{d+1}) bound. ⊓⊔

A.2 Interval Details
Here we expand on the details related to the set of interval programs (Figure 2)that were elided in the various examples in Section 3.
Claim.
For any (finite) set S ⊂ [0, 1], |Π_{[0,a]}(S)| = |S| + 1. (Technically, this claim is not quite correct: when 0 ∈ S, the number of dichotomies is one fewer. However, S is obtained by sampling from a distribution, and if the distribution over [0, 1] does not contain atoms, then this case almost surely does not happen. We omit this detail for simpler presentation throughout.)

Proof.
Let the elements of S = {x_1, ..., x_m} be ordered increasingly; there are exactly |S| + 1 equivalence classes of programs based on the choice of a: one from a < x_1, one from a > x_m, and (|S| − 1)-many from x_i < a < x_{i+1} for i ∈ {1, ..., m − 1}.

B Varying Synthetic Problem Parameters
In this section, we provide the complete description of the synthetic benchmarks and present the complete plots of our evaluation.

We consider a class of hyperrectangle programs for which we can compute the optimal solution exactly; this lets us compare the results of our implementation to an ideal baseline. Here, the concept class P (i.e., the set of programs) is defined as the set of axis-aligned hyperrectangles within [−1, 1]^d, and the input distribution D is such that inputs are distributed uniformly over [−1, 1]^d. We fix some probability mass b and aim to synthesize a program that is close to a functional specification of the form 0 < x_1 ≤ b ∧ ⋀_{i∈{2,...,d}} (−1 ≤ x_i ≤ 1), which only returns 1 for points whose first coordinate is positive and at most b. We fix the following postcondition:

Pr[P(x) = 1 | x_1 < 0] ≥ Pr[P(x) = 1 | x_1 > 0] ∧ Pr[P(x) = 1] > b.

In other words, a correct hyperrectangle must include as much probability mass of points whose first coordinate is negative as it does for those with a positive first coordinate, and additionally it must include at least as much probability mass as the original hyperrectangle. Observe that, independent of d, the best error for a correct solution is exactly b (and there exist dense regions of α-robust programs that have error b + α).

We consider problem instances formed from combinations of d ∈ {1, 2, 3} and three values of b. As d increases, the set of programs increases in complexity (it has VC dimension 2d) and the synthesis queries become more expensive. As b increases, the threshold used by the optimization cannot be as small, so we expect the search to benefit less from our optimizations. We run our implementation using a small set of fixed thresholds τ, omitting those values for which τ < b; additionally, we also consider an adaptive run where τ is initialized to the value 1, and whenever a new best solution is enumerated with error k we update τ ← k.

Each combination of parameters was run for a period of 2 minutes. Figure 6 shows each of the following as a function of time: (i) the depth completed by the search (i.e., the current size of the sample set), and (ii) the best solution found by the search.
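For concreteness, the following sketch estimates, by sampling, the quantities relevant to one such hyperrectangle instance—the error of a candidate box against the specification box and the probability-mass terms informing the postcondition—using d = 2 and box bounds chosen by us purely for illustration.

```python
import random

random.seed(3)

def in_box(box, x):
    """box is a list of (lo, hi) bounds per dimension; returns 1 if x lies inside."""
    return 1 if all(lo <= xi <= hi for (lo, hi), xi in zip(box, x)) else 0

def estimate(pred, d, n=100_000):
    """Monte-Carlo estimate of Pr[pred(x)] for x uniform over [-1, 1]^d."""
    hits = sum(pred([random.uniform(-1.0, 1.0) for _ in range(d)]) for _ in range(n))
    return hits / n

d, b = 2, 0.1
spec = [(0.0, b)] + [(-1.0, 1.0)] * (d - 1)    # the specification box
cand = [(-b, b)] + [(-1.0, 1.0)] * (d - 1)     # an illustrative candidate repair

err = estimate(lambda x: in_box(cand, x) != in_box(spec, x), d)
mass_neg = estimate(lambda x: in_box(cand, x) == 1 and x[0] < 0, d)
mass_pos = estimate(lambda x: in_box(cand, x) == 1 and x[0] > 0, d)
total = estimate(lambda x: in_box(cand, x) == 1, d)
print(err, mass_neg, mass_pos, total)          # err ~ b/2; one-sided masses ~ b/2 each; total ~ b
```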
C Thermostat Benchmark

Here we include the specification of our modified version of the thermostat-controller synthesis benchmark [7]. Figure 7 shows the definitions of pre, which describes the probability distribution D over the inputs, and thermostat, a program sketch describing the set of possible programs. We handle the thermostat loop (line 11) through syntactic unrolling, since it has a constant bound: Unrollings is the value we instantiate from {5, 10, 20, 40} in the creation of problem instances for our experiments. Similarly, the threshold N in line 26 is instantiated from {2, 4, 8}.

A synthesized program instantiates the sketch by replacing the holes with real-valued constants: for example, the syntax in the thermostat definition at line 2 specifies that the synthesizer must replace the right side of the assignment with a constant between 0 and 10. The assert statements form the probabilistic postcondition: if we have the set of assert statements in the program {assert(event_i; θ_i);}_{i∈I}, then the postcondition is given by the following conjunction: ⋀_{i∈I} Pr[event_i] > θ_i. (Recall that the loop is syntactically unrolled and observe that all execution paths encounter all assert statements, so this is well-defined.)
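Figure 7 itself is not reproduced here; to convey the shape of the benchmark, the following Python model is our own illustrative rendering of such a controller sketch. The temperature dynamics, hole ranges, and safety bounds below are assumptions, not the constants from [7] or from Figure 7.

```python
import random

random.seed(4)

UNROLLINGS = 10   # instantiated from {5, 10, 20, 40}
N = 4             # instantiated from {2, 4, 8}
TARGET = 75.0

def pre():
    """Input model: an uncertain initial temperature (illustrative distribution)."""
    return random.gauss(65.0, 8.0)

def thermostat(init_temp, holes):
    """A thermostat controller sketch: `holes` plays the role of the synthesized
    constants (on-threshold, off-threshold, heating rate, cooling rate).
    Returns (output, safe): output is 0 iff the final temperature is within N of
    the target; safe records the assert-style safety events."""
    on_thresh, off_thresh, heat, cool = holes
    temp, heater_on, safe = init_temp, False, True
    for _ in range(UNROLLINGS):                  # timesteps (syntactically unrolled)
        if temp < on_thresh:
            heater_on = True
        elif temp > off_thresh:
            heater_on = False
        temp += heat if heater_on else -cool     # incremental update per timestep
        safe = safe and (50.0 <= temp <= 90.0)   # illustrative safety bounds
    output = 0 if abs(temp - TARGET) < N else 1  # functional spec is constantly 0
    return output, safe

def evaluate(holes, n=20_000):
    runs = [thermostat(pre(), holes) for _ in range(n)]
    err = sum(out for out, _ in runs) / n        # Er w.r.t. the constant-0 spec
    p_safe = sum(ok for _, ok in runs) / n       # Pr[safety event] for the postcondition
    return err, p_safe

print(evaluate(holes=(70.0, 78.0, 1.5, 1.0)))
```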
Fig. 6. Plots of completed depth vs. time (s) and best error vs. time (s, log scale) for dimensions d = 1, 2, 3, with the optimal error b marked on each panel; curves shown for the adaptive run and the fixed thresholds (τ = 1 and smaller values).