Achieving High Coverage for Floating-point Code via Unconstrained Programming (Extended Version)
Zhoulai Fu Zhendong Su
University of California, Davis, USA [email protected] [email protected]
Abstract
Achieving high code coverage is essential in testing, which gives us confidence in code quality. Testing floating-point code usually requires painstaking efforts in handling floating-point constraints, e.g., in symbolic execution. This paper turns the challenge of testing floating-point code into the opportunity of applying unconstrained programming — the mathematical solution for calculating function minimum points over the entire search space. Our core insight is to derive a representing function from the floating-point program, any of whose minimum points is a test input guaranteed to exercise a new branch of the tested program. This guarantee allows us to achieve high coverage of the floating-point program by repeatedly minimizing the representing function.

We have realized this approach in a tool called CoverMe and conducted an extensive evaluation of it on Sun's C math library. Our evaluation results show that CoverMe achieves, on average, 90.8% branch coverage in 6.9 seconds, drastically outperforming the compared tools: (1) random testing, (2) AFL, a highly optimized, robust fuzzer released by Google, and (3) Austin, a state-of-the-art coverage-based testing tool designed to support floating-point code.
1. Introduction
Test coverage criteria attempt to quantify the quality of test data. Coverage-based testing [39] has become the state of the practice in the software industry. The higher expectation for software quality and the shrinking development cycle have driven the research community to develop a spectrum of automated testing techniques for achieving high code coverage.

A significant challenge in coverage-based testing lies in the testing of numerical code, e.g., programs with floating-point arithmetic, non-linear variable relations, or external function calls, such as logarithmic and trigonometric functions. Existing solutions include random testing [13, 26], symbolic execution [14, 17, 20, 27], and various search-based strategies [11, 29, 32, 35], which have found their way into many mature implementations [15, 16, 46]. Random testing is easy to employ and fast, but ineffective in finding deep semantic issues and handling large input spaces; symbolic execution and its variants can perform systematic path exploration, but suffer from path explosion and are weak in dealing with complex program logic involving numerical constraints.
Our Work
This paper considers the problem of coverage-based testing for floating-point code and focuses on the coverage of program branches. We turn the challenge of testing floating-point programs into the opportunity of applying unconstrained programming — the mathematical solution for calculating function minima over the entire search space [38, 51].

Our approach has two unique features. First, it introduces the concept of the representing function, which reduces the branch coverage-based testing problem to the unconstrained programming problem. Second, the representing function is specially designed to achieve the following theoretical guarantee: each minimum point of the representing function is an input to the tested floating-point program, and the input necessarily triggers a new branch unless all branches have been covered. This guarantee is critical not only for the soundness of our approach, but also for its efficiency — the unconstrained programming process is designed to cover only new branches; it does not waste effort on covering already covered branches.

We have implemented our approach in a tool called CoverMe. CoverMe first derives the representing function from the program under test. Then, it uses an existing unconstrained programming algorithm to compute the minimum points. Note that the theoretical guarantee mentioned above allows us to apply any unconstrained programming algorithm as a black box. Our implementation uses an off-the-shelf Monte Carlo Markov Chain (MCMC) [10] tool.

CoverMe has achieved high or full branch coverage for the tested floating-point programs. Fig. 1 lists the program s_tanh.c from our benchmark suite Fdlibm [5]. The program takes a double input.
In Line 3, variable jx is assigned the high word of x, according to the comment given in the source code; the right-hand-side expression in the assignment takes the address of x (&x), casts it as a pointer-to-int (int*), adds 1, and dereferences the resulting pointer. In Line 4, variable ix is assigned jx with its sign bit masked off. Lines 5-15 are two nested conditional statements on ix and jx, which contain 16 branches in total according to Gcov [6].

double tanh(double x) {
    int jx, ix;
    jx = *(1+(int*)&x);     // High word of x
    ix = jx & 0x7fffffff;
    if (ix >= 0x7ff00000) {
        if (jx >= 0) ...; else ...;
    }
    if (ix < 0x40360000) {
        if (ix < 0x3c800000) ...;
        if (ix >= 0x3ff00000) ...; else ...;
    } else ...;
    return ...;
}

Figure 1: Benchmark program s_tanh.c taken from Fdlibm.

Testing this type of program is beyond the capabilities of traditional symbolic execution tools such as Klee [16]. CoverMe achieves full coverage within 0.7 seconds, dramatically outperforming our compared tools, including random testing, Google's AFL, and Austin (a tool that combines symbolic execution and search-based heuristics). See details in Sect. 6.
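The high-word trick in Lines 3-4 of Fig. 1 can be reproduced outside C. The following snippet is ours, assuming IEEE-754 doubles; it extracts the same jx and ix as the benchmark program does on a little-endian machine:

```python
import struct

def high_word(x: float) -> int:
    # The high 32 bits of an IEEE-754 double: sign, exponent, and top
    # mantissa bits. This is what jx = *(1+(int*)&x) reads on a
    # little-endian machine.
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]  # raw 64-bit pattern
    return (bits >> 32) & 0xFFFFFFFF

jx = high_word(1.0)
ix = jx & 0x7FFFFFFF               # Line 4: mask off the sign bit
print(hex(jx), hex(ix))            # 0x3ff00000 0x3ff00000
assert high_word(float("inf")) == 0x7FF00000   # triggers the first guard
```

The branch conditions in Fig. 1 thus compare bit patterns rather than numeric values, which is one reason constraint-based tools struggle with such code.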
Contributions

This work introduces a promising automated testing solution for programs that are heavy on floating-point computation. Our approach designs the representing function whose minimum points are guaranteed to exercise new branches of the floating-point program. This guarantee allows us to apply any unconstrained programming solution as a black box, and to efficiently generate test inputs for covering program branches. Our implementation, CoverMe, proves to be highly efficient and effective: it achieves 90.8% branch coverage on average, which is substantially higher than that obtained by random testing (38.0%), AFL [1] (72.9%), and Austin [30] (42.8%).
Paper Outline
We structure the rest of the paper as follows. Sect. 2 presents background material on unconstrained programming. Sect. 3 gives an overview of our approach, and Sect. 4 presents the algorithm. Sect. 5 describes our implementation, CoverMe, and Sect. 6 describes our evaluation. Sect. 7 surveys related work, and Sect. 8 concludes the paper. For completeness, Sects. A-D provide additional details on our approach.
Notation
We write F for the floating-point numbers, Z for the integers, and Z>0 for the strictly positive integers. We use the ternary operation B ? a : a′ to denote an evaluation to a if B holds, or to a′ otherwise. A lambda term of the form λx. f(x) may denote the mathematical function f or its machine implementation, depending on the context.
2. Background
This section presents the definition and algorithms of unconstrained programming that will be used in this paper. As mentioned in Sect. 1, we treat the unconstrained programming algorithms as black boxes.
Unconstrained Programming
We formalize unconstrained programming as the problem below [22]:

Given f : R^n → R, find x∗ ∈ R^n for which f(x∗) ≤ f(x) for all x ∈ R^n,

where f is the objective function; x∗, if found, is called a minimum point; and f(x∗) is the minimum. An example is

f(x1, x2) = (x1 − 1)² + (x2 − 2)²,   (1)

which has the minimum point x∗ = (1, 2).
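As a quick, hands-on illustration (not part of the paper's toolchain), a quadratic of the same shape as Eq. (1) can be minimized with a few lines of steepest descent; the starting point, step size, and iteration count below are our own choices:

```python
# Steepest descent on f(x1, x2) = (x1 - 1)^2 + (x2 - 2)^2,
# whose unique minimum point is (1, 2) with minimum 0.
def f(x1, x2):
    return (x1 - 1.0) ** 2 + (x2 - 2.0) ** 2

def grad(x1, x2):
    return 2.0 * (x1 - 1.0), 2.0 * (x2 - 2.0)

x = (0.0, 0.0)                  # arbitrary starting point
for _ in range(200):
    g = grad(*x)
    x = (x[0] - 0.1 * g[0], x[1] - 0.1 * g[1])  # fixed step size 0.1

print(x)    # converges to the minimum point (1.0, 2.0)
```

Each update contracts the distance to the minimum point by a constant factor, so a few hundred iterations suffice here.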
Unconstrained Programming Algorithms

We consider two kinds of algorithms, known as local optimization and global optimization. Local optimization focuses on how a function is shaped near a given input and where a minimum can be found in a local region. It usually involves standard techniques such as Newton's method or the steepest descent method [40]. Fig. 2(a) shows a common local optimization method with the objective function f(x) that equals 0 if x ≤ 1, or (x − 1)² otherwise. The algorithm uses tangents of f to converge to a minimum point quickly. In general, local optimization is usually fast: if the objective function is smooth to some degree, local optimization can deduce the function's behavior in the neighborhood of a particular point x by using information at x only (the tangent here).

Global optimization for unconstrained programming searches for minimum points over R^n. Many global optimization algorithms have been developed. This work uses Monte Carlo Markov Chain (MCMC) [10]. MCMC is a sampling method that targets (usually unknown) probability distributions. A fundamental fact is that MCMC sampling follows the target distributions asymptotically, which is formalized by the lemma below. For simplicity, we present the lemma in the form of discrete-valued probabilities [10].

Lemma 2.1. Let x be a random variable, A an enumerable set of the possible values of x, and f a target probability distribution for x, i.e., the probability of x taking value a ∈ A is f(a). Then an MCMC sampling sequence x1, . . . , xn, . . . satisfies the property that Prob(xn = a) → f(a).

For example, consider the target distribution of tossing a fair coin, with probability 0.5 for head. An MCMC sampling is a sequence of samples x1, . . . , xn, . . . such that the probability of xn being head converges to 0.5 as n increases.

Figure 2: (a) Local optimization example with objective function λx. x ≤ 1 ? 0 : (x − 1)². The local optimization algorithm uses tangents of the curve to quickly converge to a minimum point. (b) Global optimization example: the MCMC method starts from p0 and converges to the local minimum point p1, performs a Monte-Carlo move to p2 and converges to p3, then moves to p4 and converges to p5.

MCMC has several strengths for our setting. It integrates well with local optimization; an example is the Basinhopping algorithm [33] used in Sect. 5. Moreover, MCMC techniques are robust; some variants can even handle high-dimensional problems [44] or non-smooth objective functions [23]. Our approach uses unconstrained optimization as a black box. Fig. 2(b) provides a simple example: steps p0 → p1, p2 → p3, and p4 → p5 employ local optimization, while steps p1 → p2 and p3 → p4, known as Monte-Carlo moves [10], prevent the MCMC sampling from being trapped in local minimum points.
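Lemma 2.1 can be checked empirically with a textbook Metropolis sampler, a member of the MCMC family; the finite set A and the target distribution f below are our own toy example, not from the paper:

```python
import random
random.seed(0)

# Metropolis sampling over a finite set A, targeting distribution f.
A = ["a", "b", "c"]
f = {"a": 0.2, "b": 0.3, "c": 0.5}

x = "a"
counts = {v: 0 for v in A}
for _ in range(100_000):
    candidate = random.choice(A)                  # symmetric proposal
    if random.random() < min(1.0, f[candidate] / f[x]):
        x = candidate                             # Metropolis acceptance rule
    counts[x] += 1

freq = {v: counts[v] / 100_000 for v in A}
print(freq)    # the empirical frequencies approach f, per Lemma 2.1
```

The acceptance rule makes f the stationary distribution of the chain, so the sample frequencies converge to f regardless of the starting value.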
3. Overview
This section states the problem and illustrates our solution.
Notation
Let FOO be the program under test, with N conditional statements labeled l0, . . . , lN−1. Each li has a true branch iT and a false branch iF. We write dom(FOO) to denote the input domain of program FOO.

Definition 3.1. The problem of branch coverage-based testing aims to find a set of inputs X ⊆ dom(FOO) that covers all branches of FOO. Here, we say a branch is "covered" by X if it is passed through by executing FOO with an input of X.

We scope the problem with three assumptions, which will be partially relaxed in our implementation (Sect. 5):
(a) The inputs of FOO are floating-point numbers;
(b) each Boolean condition in FOO is an arithmetic comparison between two floating-point variables or constants; and
(c) each branch of FOO is feasible, i.e., it is covered by an input of program FOO.

The concept below is crucial. It allows us to rewrite the branch coverage-based testing problem of Def. 3.1 into an equivalent, but easier-to-solve, one.
[Figure 3 sketches the workflow. Step 1: the program under test FOO (in any LLVM-supported language, with signature type_t FOO(double x1, double x2, ...)) is instrumented by an LLVM pass into FOO_I (.bc), using pen (.cpp) with signature double pen(int i, int op, double lhs, double rhs). Step 2: FOO_I is linked with a loader (.cpp) into the representing function FOO_R, void FOO_R(double* P), stored in libr.so. Step 3: an MCMC minimization procedure (.py), basinhopping(func, sp, n_iter, callback), computes X, a set of FOO_R's global minimum points, which saturates (therefore covers) all branches of FOO; X is the set of generated test inputs.]
Figure 3: An illustration of our approach. The goal is to find inputs that saturate (therefore cover) all branches of FOO, i.e., {0T, 0F, 1T, 1F}.

Definition 3.2.
Let X be a set of inputs generated during the testing process. We say that a branch is saturated by X if the branch itself and all its descendant branches, if any, are covered by X. Here, a branch b′ is called a descendant branch of b if there exists a control flow from b to b′. We write

Saturate(X)   (2)

for the set of branches saturated by X.

The control-flow graph on the right illustrates Def. 3.2. Suppose that an input set X covers {0T, 0F, 1F}. Then Saturate(X) = {0F, 1F}. Branch 1T is not saturated because it is not covered; branch 0T is not saturated either, because its descendant branch 1T is not covered.

Our approach reformulates the branch coverage-based testing problem with the lemma below.

Lemma 3.3.
Let
FOO be the program under test. Assume Def. 3.1(a-c) hold. Then a set of inputs X ⊆ dom(FOO) saturates all FOO's branches iff X covers all FOO's branches.

By consequence, the goal of branch coverage-based testing defined in Def. 3.1 can be equivalently stated: to find a set of inputs X ⊆ dom(FOO) that saturates all FOO's branches.
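The saturation computation of Def. 3.2 is easy to mimic on the two-statement example above; the dictionary encoding of branches and their descendant relation is our own sketch:

```python
# Branch saturation per Def. 3.2: the true branch 0T leads to l1
# (branches 1T and 1F), while 0F, 1T, and 1F are leaves.
descendants = {"0T": {"1T", "1F"}, "0F": set(), "1T": set(), "1F": set()}

def saturate(covered):
    # A branch is saturated iff it and all its descendant branches are covered.
    return {b for b in descendants
            if b in covered and descendants[b] <= covered}

print(sorted(saturate({"0T", "0F", "1F"})))        # ['0F', '1F']: 0T lacks 1T
print(sorted(saturate({"0T", "0F", "1T", "1F"})))  # all four branches
```

Note how covering every branch immediately implies saturating every branch, which is the iff direction of Lem. 3.3 on this example.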
Fig. 3 illustrates our approach. The program under test has two conditional statements, l0 and l1. Our objective is to find an input set that saturates all branches, namely 0T, 0F, 1T, and 1F. Our approach proceeds in three steps:

Step 1
We inject the global variable r in FOO and, immediately before each conditional statement li, we inject the assignment

r = pen(...)   (3)

where pen invokes a code segment with parameters associated with li. The idea of pen is to capture the distance of the program input from saturating a branch that has not yet been saturated. Observe that in Fig. 3, each injected pen returns different values depending on whether the branches at li are saturated or not. FOO_I denotes the instrumented program.
Step 2
This step constructs the representing function that we have mentioned in Sect. 1. The representing function is the driver program FOO_R shown in Fig. 3. It initializes r to 1, invokes FOO_I, and then returns r as the output of FOO_R. That is to say, FOO_R(x) for a given input x calculates the value of r at the end of executing FOO_I(x).

The key in Steps 1 and 2 is to design pen so that FOO_R meets the two conditions below:
C1. FOO_R(x) ≥ 0 for all x ∈ dom(FOO), and
C2. FOO_R(x) = 0 if and only if x saturates a new branch. In other words, a branch that has not been saturated by the generated input set X becomes saturated with X ∪ {x}, i.e., Saturate(X) ≠ Saturate(X ∪ {x}).

Conditions C1 and C2 are essential because they allow us to transform the branch coverage-based testing problem into an unconstrained programming problem. Ideally, we can then saturate all branches of FOO by repeatedly minimizing
FOO_R as shown in the step below.
Step 3
We calculate the minimum points of
FOO_R via the unconstrained programming algorithms described in Sect. 2. Typically, we start with an input set X = ∅ and Saturate(X) = ∅. We minimize FOO_R and obtain a minimum point x∗, which necessarily saturates a new branch by condition C2. Then we have X = {x∗}, and we minimize FOO_R again, which gives another input x∗∗ such that {x∗, x∗∗} saturates a branch not saturated by {x∗}. We continue this process until all branches are saturated. Note that when Step 3 terminates, FOO_R(x) must be strictly positive for any input x, due to C1 and C2.

Tab. 1 illustrates a scenario of how our approach saturates all branches of the program FOO given in Fig. 3. Each numbered step below corresponds to a row in the table. We write pen0 and pen1 to distinguish the two pen injected at l0 and l1, respectively.

(1) Initially, no branch has been saturated. Both pen0 and pen1 set r = 0, and FOO_R returns 0 for any input. Suppose the minimization returns x∗ = 0.7.

(2) The branch 1F is now saturated and 1T is not. Thus, pen1 sets r = (y − 4)². Minimizing FOO_R gives x∗ = −2.0, 1.5, or 2.0, for example. We have illustrated how these minimum points can be computed by unconstrained programming in Fig. 2(b). Suppose x∗ = −2.0.

(3) Both 1T and 1F, as well as 0T, are saturated by the generated inputs {0.7, −2.0}. Thus, pen1 returns the previous r, and FOO_R amounts to pen0, which returns 0 if x > 1 and (x − 1)² + ε otherwise, where ε is a small predefined constant (Sect. 4). Suppose x∗ = 1.5.

(4) All branches have been saturated. In this case, both pen0 and pen1 return r, and FOO_R returns 1 for all x, since FOO_R initializes r to 1. Suppose the minimum found is x∗ = −1.0. Then FOO_R(x∗) = 1 > 0, which confirms that all branches have been saturated (due to C1 and C2).
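To make Steps 1-3 concrete, here is a self-contained miniature in Python: a hand-instrumented representing function for a toy program of the same shape as Fig. 3. All names and constants are ours; we also swap the equality condition for the inequality y >= 4 so that plain random search can stand in for the MCMC backend:

```python
import random
random.seed(1)

covered, saturated = set(), set()
EPS = 1e-9   # the small constant epsilon of Sect. 4

def pen(label, dist_true, dist_false, r):
    # Def. 4.2: 0 if neither branch is saturated; the distance to the
    # unsaturated branch if exactly one is; the previous r if both are.
    t, f = label + "T", label + "F"
    if t not in saturated and f not in saturated:
        return 0.0
    if t not in saturated:
        return dist_true
    if f not in saturated:
        return dist_false
    return r

def FOO_R(x):
    # Hand-instrumented representing function for the toy program
    #   if (x <= 1) { y = x * x; if (y >= 4) ...; else ...; } else ...;
    r = 1.0
    r = pen("0", 0.0 if x <= 1 else (x - 1.0) ** 2,          # d(<=, x, 1)
                 0.0 if x > 1 else (1.0 - x) ** 2 + EPS, r)  # d(>, x, 1)
    if x <= 1:
        y = x * x
        r = pen("1", 0.0 if y >= 4 else (4.0 - y) ** 2,          # d(>=, y, 4)
                     0.0 if y < 4 else (y - 4.0) ** 2 + EPS, r)  # d(<, y, 4)
    return r

def execute_and_update(x):
    # Run the toy program, record covered branches, recompute saturation.
    covered.add("0T" if x <= 1 else "0F")
    if x <= 1:
        covered.add("1T" if x * x >= 4 else "1F")
    saturated.update(b for b in ("0F", "1T", "1F") if b in covered)  # leaves
    if {"0T", "1T", "1F"} <= covered:   # 0T needs its descendants 1T, 1F
        saturated.add("0T")

X = []
while len(saturated) < 4:
    # Stand-in for the MCMC backend: random search for a zero of FOO_R.
    x_star = min((random.uniform(-10.0, 10.0) for _ in range(2000)), key=FOO_R)
    if FOO_R(x_star) == 0.0:        # condition C2: saturates a new branch
        X.append(x_star)
        execute_and_update(x_star)

print(sorted(saturated), [round(x, 2) for x in X])
```

After the loop, FOO_R is the constant 1, mirroring step (4) above: every zero of FOO_R found along the way saturated a fresh branch, and no zeros remain.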
4. Algorithm
We provide details corresponding to the three steps in Sect. 3.2. The algorithm is summarized in Algo. 1.
Algorithm for Step 1
The outcome of this step is the instrumented program FOO_I. As explained in Sect. 3.2, the essence is to inject the variable r and the assignment r = pen(...) before each conditional statement (Algo. 1, Lines 1-4).

To define pen, we first introduce a set of helper functions that are sometimes known as branch distance. There are many different forms of branch distance in the literature [29, 35]. We define ours with respect to an arithmetic condition a op b.

Definition 4.1.
Let a, b ∈ R, op ∈ {==, ≤, <, ≠, ≥, >}, and ε ∈ R>0. We define the branch distance d_ε(op, a, b) as follows:

d_ε(==, a, b) := (a − b)²   (4)
d_ε(≤, a, b)  := (a ≤ b) ? 0 : (a − b)²   (5)
d_ε(<, a, b)  := (a < b) ? 0 : (a − b)² + ε   (6)
d_ε(≠, a, b)  := (a ≠ b) ? 0 : ε   (7)

and d_ε(≥, a, b) := d_ε(≤, b, a), d_ε(>, a, b) := d_ε(<, b, a). We use ε to denote a small positive floating-point number close to machine epsilon. The idea is to treat a strict floating-point inequality x > y as the non-strict inequality x ≥ y + ε, etc. We will drop the explicit reference to ε when using the branch distance.

Table 1: A scenario of how our approach saturates all branches of FOO by repeatedly minimizing FOO_R. Column "Saturate": branches that have been saturated. Column "FOO_R": the representing function (plots omitted). Column "x∗": the point where FOO_R attains the minimum. Column "X": generated test inputs.

Saturate           FOO_R                             x∗      X
∅                  λx. 0                             0.7     {0.7}
{1F}               λx. x ≤ 1 ? ((x+2)(x−2))² : 0     −2.0    {0.7, −2.0}
{0T, 1T, 1F}       λx. x > 1 ? 0 : (x − 1)² + ε      1.5     {0.7, −2.0, 1.5}
{0T, 0F, 1T, 1F}   λx. 1                             −1.0    {0.7, −2.0, 1.5, −1.0}

The intention of d(op, a, b) is to quantify how far a and b are from attaining a op b. For example, d(==, a, b) is strictly positive when a ≠ b, becomes smaller as a and b move closer, and vanishes when a == b. The following property holds:

d(op, a, b) ≥ 0, and d(op, a, b) = 0 ⇔ a op b.   (8)

As an analogue, we set pen to quantify how far an input is from saturating a new branch. We define pen following Algo. 1, Lines 14-23.

Definition 4.2. For branch coverage-based testing, the function pen has four parameters, namely the label of the conditional statement li, op, and the a and b from the arithmetic condition a op b.
(a) If neither of the two branches at li is saturated, pen returns 0, because any input saturates a new branch (Lines 16-17).
(b) If one branch at li is saturated but the other is not, we set r to be the distance to the unsaturated branch (Lines 18-21).
(c) If both branches at li have already been saturated, pen returns the previous value of the global variable r (Lines 22-23).

Algorithm 1:
Branch coverage-based testing

Input:
  FOO       Program under test
  n_start   Number of starting points
  LM        Local optimization used in MCMC
  n_iter    Number of iterations for MCMC
Output:
  X         Generated input set

     /* Step 1 */
1    Inject the global variable r in FOO
2    for each conditional statement l_i in FOO do
3        Let the Boolean condition at l_i be a op b, where op ∈ {≤, <, ==, >, ≥, ≠}
4        Insert the assignment r = pen(l_i, op, a, b) before l_i
     /* Step 2 */
5    Let FOO_I be the newly instrumented program, and FOO_R be:
         double FOO_R(double x) { r = 1; FOO_I(x); return r; }
     /* Step 3 */
6    Let Saturate = ∅
7    Let X = ∅
8    for k = 1 to n_start do
9        Randomly take a starting point x
10       Let x∗ = MCMC(FOO_R, x)
11       if FOO_R(x∗) = 0 then X = X ∪ {x∗}
12       Update Saturate
13   return X

14   Function pen(l_i, op, a, b)
15       Let iT and iF be the true and the false branches at l_i
16       if iT ∉ Saturate and iF ∉ Saturate then
17           return 0
18       else if iT ∉ Saturate and iF ∈ Saturate then
19           return d(op, a, b)        /* d: branch distance */
20       else if iT ∈ Saturate and iF ∉ Saturate then
21           return d(op', a, b)       /* op': the opposite of op */
22       else /* iT ∈ Saturate and iF ∈ Saturate */
23           return r

24   Function MCMC(f, x)
25       x_L = LM(f, x)                /* local minimization */
26       for k = 1 to n_iter do
27           Let δ be a random perturbation generated from a predefined distribution
28           Let x̃_L = LM(f, x_L + δ)
29           if f(x̃_L) < f(x_L) then accept = true
30           else
31               Let m be a random number generated from the uniform distribution on [0, 1]
32               Let accept be the Boolean m < exp(f(x_L) − f(x̃_L))
33           if accept then x_L = x̃_L
34       return x_L

For example, pen at l0 and l1 are invoked as pen(l0, ≤, x, 1) and pen(l1, ==, y, 4), respectively, in Fig. 3.

Algorithm for Step 2
This step constructs the representingfunction
FOO_R (Algo. 1, Line 5). Its input domain is the same as that of FOO_I and FOO, and its output domain is double, so as to simulate a real-valued mathematical function that can then be processed by the unconstrained programming backend.

FOO_R initializes r to 1. This is essential for the correctness of the algorithm, because we expect FOO_R to return a strictly positive value when all branches are saturated (Sect. 3.2, Step 2). FOO_R then calls FOO_I(x) and records the value of r at the end of executing FOO_I(x); this r is the return value of FOO_R.

As mentioned in Sect. 3.2, it is important to ensure that FOO_R meets conditions C1 and C2. Condition C1 holds true since FOO_R returns the value of the instrumented r, which is never assigned a negative quantity. The theorem below states that FOO_R also satisfies C2.
Theorem 4.3.
Let
FOO_R be the program constructed in Algo. 1, and S the set of branches that have been saturated. Then, for any input x ∈ dom(FOO), FOO_R(x) = 0 ⇔ x saturates a branch that does not belong to S.

Proof. We first prove the ⇒ direction. Take an arbitrary x such that FOO_R(x) = 0. Let τ = [l1, . . . , ln] be the path in FOO passed through by executing FOO(x). We know, from Lines 2-4 of the algorithm, that each li is preceded by an invocation of pen in FOO_R. We write pen_i for the one injected before li, and divide {pen_i | i ∈ [1, n]} into three groups: for the given input x, we let P1, P2 and P3 denote the groups of pen_i that fall under Def. 4.2(a), (b) and (c), respectively. Then we can always find a prefix path [l1, . . . , lm] of τ, with 0 ≤ m ≤ n, such that each pen_i for i ∈ [m+1, n] belongs to P3, and each pen_i for i ∈ [1, m] belongs to either P1 or P2. Here, we can guarantee the existence of such an m because, otherwise, all pen_i belong to P3 and FOO_R becomes λx. 1; the latter contradicts the assumption that FOO_R(x) = 0. Because each pen_i for i > m does nothing but perform r = r, we know that FOO_R(x) equals the exact value of r that pen_m assigns. Now consider two disjunctive cases for pen_m. If pen_m is in P1, we immediately conclude that x saturates a new branch. Otherwise, if pen_m is in P2, we obtain the same from Eq. (8). Thus, we have established the ⇒ direction of the theorem.

To prove the ⇐ direction, we use the same notation as above, and let x be an input that saturates a new branch, with [l1, . . . , ln] the exercised path. Assume that lm, where 0 ≤ m ≤ n, corresponds to the newly saturated branch. We know from the algorithm that (1) pen_m updates r to 0, and (2) each pen_i with i > m maintains the value of r, because their descendant branches have been saturated. We have thus proven the ⇐ direction of the theorem.
The main loop (Algo. 1, Lines 8-12) relies on an existing MCMC engine, which takes an objective function and a starting point and outputs an x∗ that it regards as a minimum point. Each iteration of the loop launches MCMC from a randomly selected starting point (Line 9). From each starting point, MCMC computes a minimum point x∗ (Line 10). If FOO_R(x∗) = 0, x∗ is added to X (Line 11). Thm. 4.3 ensures that x∗ saturates a new branch if FOO_R(x∗) = 0. Therefore, in theory, we only need to set n_start = 2∗N, where N denotes the number of conditional statements, so as to saturate all 2∗N branches. In practice, however, we set n_start > 2∗N, because MCMC cannot guarantee that its output is a true global minimum point.

The MCMC procedure (Algo. 1, Lines 24-34) is also known as the Basinhopping algorithm [33]. It is an MCMC sampling over the space of local minimum points [34]. The random starting point x is first updated to a local minimum point x_L (Line 25). Each iteration (Lines 26-33) is composed of the two phases that are classic in the Metropolis-Hastings algorithm family of MCMC [19]. In the first phase (Lines 27-28), the algorithm proposes a sample x̃_L from the current sample x_L. The sample x̃_L is obtained by a perturbation δ followed by a local minimization, i.e., x̃_L = LM(f, x_L + δ) (Line 28), where LM denotes the local minimization in Basinhopping and f is the objective function. The second phase (Lines 29-33) decides whether the proposed x̃_L should be accepted as the next sampling point. If f(x̃_L) < f(x_L), the proposed x̃_L is sampled; otherwise, x̃_L may still be sampled, but only with the probability exp((f(x_L) − f(x̃_L)) / T), in which T (called the annealing temperature [28]) is set to 1 in Algo. 1 for simplicity.
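The MCMC procedure of Algo. 1, Lines 24-34, can be sketched directly in Python. The local minimizer LM (a crude gradient descent with clamped steps), the perturbation distribution, and the toy objective below are our own stand-ins, not the paper's implementation:

```python
import math, random

random.seed(7)

def LM(f, x, steps=200, h=1e-6, lr=0.01):
    # Stand-in local minimizer: numerical gradient descent, clamped steps.
    for _ in range(steps):
        g = (f(x + h) - f(x - h)) / (2.0 * h)
        x -= max(-1.0, min(1.0, lr * g))
    return x

def MCMC(f, x, n_iter=30):
    # Algo. 1, Lines 24-34 (Basinhopping with annealing temperature T = 1).
    xL = LM(f, x)                            # Line 25: local minimization
    for _ in range(n_iter):
        delta = random.gauss(0.0, 2.0)       # Line 27: random perturbation
        cand = LM(f, xL + delta)             # Line 28: proposed sample
        if f(cand) < f(xL):
            accept = True                    # Line 29: improvement
        else:
            m = random.random()              # Line 31: uniform on [0, 1]
            accept = m < math.exp(f(xL) - f(cand))   # Line 32
        if accept:
            xL = cand                        # Line 33
    return xL

# Toy objective with two global minimum points, x = -2 and x = 2.
f = lambda x: (x * x - 4.0) ** 2
x_star = MCMC(f, x=5.0)
print(x_star, f(x_star))
```

The Monte-Carlo moves (the gaussian perturbations) let the sampler hop between basins, while the Metropolis test keeps the chain from wandering off to clearly worse local minima.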
5. Implementation
As a proof-of-concept demonstration, we have implemented Algo. 1 in a tool called CoverMe. This section presents its implementation and technical details.
The frontend implements Steps 1 and 2 of Algo. 1. CoverMe compiles the program under test FOO to LLVM IR with Clang [2]. Then it uses an LLVM pass [8] to inject the assignments. The program under test can be in any LLVM-supported language, e.g., Ada, the C/C++ language family, or Julia. Our implementation has been tested on C code.

Fig. 4 illustrates FOO as a function of signature type_t FOO(type_t1 x1, type_t2 x2, ...). The return type (output) of the function, type_t, can be any type supported by C, whereas the types of the input parameters, type_t1, type_t2, ..., are restricted to double or double*. We have explained the signature of pen in Def. 4.2. Note that CoverMe does not inject pen itself into FOO, but instead injects assignments that invoke pen; we implement pen in a separate C++ file.

The frontend also links FOO_I and FOO_R with a simple program loader into a shared object file libr.so, which is the outcome of the frontend: it stores the representing function in the form of a shared object (.so) file.
The backend implements Step 3 of Algo. 1. It invokes the representing function via libr.so. The kernel of the backend is an external MCMC engine: it uses the off-the-shelf implementation known as Basinhopping from the SciPy optimization package [9]. Basinhopping takes a range of input parameters

[Figure 4: The architecture of CoverMe. The program under test FOO (in any LLVM-supported language), with signature type_t FOO(double x1, double x2, ...), is compiled and instrumented by an LLVM pass into FOO_I (.bc); pen (.cpp), double pen(int i, int op, double lhs, double rhs), and loader (.cpp), void LOADER(double* P), are linked with FOO_I to build the representing function FOO_R (.cpp), void FOO_R(double* P), in libr.so; the MCMC minimization procedure (.py) invokes basinhopping(func, sp, n_iter, callback) on it.]
FOO_R (.cpp) void FOO_R (double* P) libr.so void FOO_R (double* P)
MCMC minimization procedure (.py) basinhopping (func, sp, n_iter, callback)
LLVM passLinking
FOO : Program under testin any LLVM-supported language type_t FOO (double x1, double x2, ...) pen (.cpp) double pen (int i, int op, double lhs, double rhs)
FOO_I : Instrumented program (.bc) type_t FOO_I (double x1, double x2, ...) loader (.cpp) void LOADER (double* P)
FOO_R (.cpp) void FOO_R (double* P) libr.so void FOO_R (double* P)
MCMC minimization procedure (.py) basinhopping (func, sp, n_iter, callback)
LLVM passLinking
FOO : Program under testin any LLVM-supported language type_t FOO (double x1, double x2, ...) pen (.cpp) double pen (int i, int op, double lhs, double rhs)
FOO_I : Instrumented program (.bc) type_t FOO_I (double x1, double x2, ...) loader (.cpp) void LOADER (double* P)
FOO_R (.cpp) void FOO_R (double* P) libr.so void FOO_R (double* P)
MCMC minimization procedure (.py) basinhopping (func, sp, n_iter, callback)
LLVM passLinking
FOO : Program under testin any LLVM-supported language type_t FOO (double x1, double x2, ...) pen (.cpp) double pen (int i, int op, double lhs, double rhs)
FOO_I : Instrumented program (.bc) type_t FOO_I (double x1, double x2, ...) loader (.cpp) void loader (double* P)
FOO_R (.cpp) void FOO_R (double* P) libr.so void FOO_R (double* P)
MCMC minimization procedure (.py) basinhopping (func, sp, n_iter, callback)
LLVM passLinking
FOO : Program under testin any LLVM-supported language type_t FOO (double x1, double x2, ...) pen (.cpp) double pen (int i, int op, double lhs, double rhs)
FOO_I : Instrumented program (.bc) type_t FOO_I (double x1, double x2, ...) loader (.cpp) void loader (double* P)
FOO_R (.cpp) void FOO_R (double* P) libr.so void FOO_R (double* P)
MCMC minimization procedure (.py) basinhopping (func, sp, n_iter, callback)
LLVM passLinking
FOO : Program under testin any LLVM-supported language type_t FOO (double x1, double x2, ...) pen (.cpp) double pen (int i, int op, double lhs, double rhs)
FOO_I : Instrumented program (.bc) type_t FOO_I (double x1, double x2, ...) loader (.cpp) void loader (double* P)
FOO_R (.cpp) void FOO_R (double* P) libr.so void FOO_R (double* P)
MCMC minimization procedure (.py) basinhopping (func, sp, n_iter, callback)
LLVM passLinking
FOO : Program under testin any LLVM-supported language type_t FOO (double x1, double x2, ...) pen (.cpp) double pen (int i, int op, double lhs, double rhs)
FOO_I : Instrumented program (.bc) type_t FOO_I (double x1, double x2, ...) loader (.cpp) void loader (double* P)
FOO_R (.cpp) void FOO_R (double* P) libr.so void FOO_R (double* P)
MCMC minimization procedure (.py) basinhopping (func, sp, n_iter, callback)
LLVM passLinking
Generated test inputs X : A set of FOO_R ’s global minimum pointsGenerated test inputs X : A set of FOO_R ’s global minimum points
FOO : Program under testin any LLVM-supported language type_t FOO (double x1, double x2, ...) pen (.cpp) double pen (int i, int op, double lhs, double rhs)
FOO_I : Instrumented program (.bc) type_t FOO_I (double x1, double x2, ...) loader (.cpp) void loader (double* P)
FOO_R (.cpp) void FOO_R (double* P) libr.so void FOO_R (double* P)
MCMC minimization procedure (.py) basinhopping (func, sp, n_iter, callback)
LLVM passLinking
FOO : Program under testin any LLVM-supported language type_t FOO (double x1, double x2, ...) pen (.cpp) double pen (int i, int op, double lhs, double rhs)
FOO_I : Instrumented program (.bc) type_t FOO_I (double x1, double x2, ...) loader (.cpp) void LOADER (double* P)
FOO_R (.cpp) void FOO_R (double* P) libr.so void FOO_R (double* P)
MCMC minimization procedure (.py) basinhopping (func, sp, n_iter, callback)
LLVM passLinking
FOO : Program under testin any LLVM-supported language type_t FOO (double x1, double x2, ...) pen (.cpp) double pen (int i, int op, double lhs, double rhs)
FOO_I : Instrumented program (.bc) type_t FOO_I (double x1, double x2, ...) loader (.cpp) void loader (double* P)
FOO_R (.cpp) void FOO_R (double* P) libr.so void FOO_R (double* P)
MCMC minimization procedure (.py) basinhopping (func, sp, n_iter, callback)
LLVM passLinking
Front-end: Construct the representing functionBack-end: Minimize the representing function
Figure 4: CoverMe implementation.

Basinhopping takes several parameters; Fig. 4 shows the important ones for our implementation: basinhopping(f, sp, n_iter, call_back), where f refers to the representing function from libr.so, sp is a starting point given as a Python Numpy array, n_iter is the iteration number used in Algo. 1, and call_back is a client-defined procedure. Basinhopping invokes call_back at the end of each iteration (Algo. 1, Lines 24-34). The call_back procedure allows CoverMe to terminate as soon as it saturates all branches; in this way, CoverMe does not need to wait until all n_start iterations have passed (Algo. 1, Lines 8-12).

Sect. 3 assumes Def. 3.1(a-c) for the sake of simplification. This section discusses how CoverMe relaxes these assumptions when handling real-world floating-point code. We also show how CoverMe handles function calls at the end of this section.
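As a concrete illustration of the basinhopping interface discussed above, the sketch below re-implements a toy basinhopping-style loop in plain Python. It is not CoverMe's actual backend: the local descent is a crude numeric-gradient step rather than Powell's method, and the callback is simplified to callback(x, f) instead of SciPy's callback(x, f, accept). What it shows is the early-termination convention: the callback runs at the end of each iteration and may return True to stop the search, which is how CoverMe stops once all branches are saturated.

```python
import random

def local_minimize(func, x, iters=50, h=1e-6, lr=0.1):
    """Crude local descent via a numeric gradient (a stand-in for the
    local optimization algorithm LM, e.g. Powell's method)."""
    for _ in range(iters):
        g = (func(x + h) - func(x - h)) / (2 * h)
        x -= lr * g
    return x

def basinhopping_sketch(func, sp, n_iter, callback, step=1.0, seed=0):
    """Toy MCMC basinhopping loop: perturb, locally minimize, accept
    with a Metropolis-style rule, then consult the client callback."""
    rng = random.Random(seed)
    x = local_minimize(func, sp)
    best_x, best_f = x, func(x)
    for _ in range(n_iter):
        cand = local_minimize(func, x + rng.uniform(-step, step))
        if func(cand) <= func(x) or rng.random() < 0.5:
            x = cand                      # accept the Monte-Carlo hop
        if func(x) < best_f:
            best_x, best_f = x, func(x)
        if callback(best_x, best_f):      # client-defined early stop
            break
    return best_x, best_f

# Usage: stop as soon as the objective (playing the role of FOO_R)
# reaches zero, i.e. as soon as a new-branch-triggering input is found.
f = lambda x: (x - 3.0) ** 2
x_min, f_min = basinhopping_sketch(f, 0.0, 100, lambda x, fx: fx < 1e-6)
```

On this quadratic toy objective the first local descent already reaches the global minimum, so the callback fires after the first iteration rather than after all n_iter iterations.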
Handling Pointers (Relaxing Def. 3.1(a))
We consider only pointers to floating-point numbers. They may occur (1) in an input parameter, (2) in a conditional statement, or (3) in the code body but not in a conditional statement. CoverMe inherently handles case (3) because it is execution-based and does not need to analyze pointers and their effects. CoverMe currently does not handle case (2) and ignores these conditional statements by not injecting pen before them. Below we explain how CoverMe deals with case (1).

A branch coverage testing problem for a program whose inputs are pointers to doubles can be regarded as the same problem with a simplified program under test. For instance, finding test inputs to cover the branches of the program void FOO(double* p) {if (*p <= 1) ...} can be reduced to testing the program void FOO_with_no_pointer(double x) {if (x <= 1) ...}. CoverMe transforms FOO to FOO_with_no_pointer whenever one of FOO's input parameters is a floating-point pointer.
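The pointer-to-scalar reduction above can be sketched as follows. This is a Python model, not CoverMe's LLVM transformation: a one-element list stands in for a C double*, and the names FOO_with_no_pointer and lift are illustrative.

```python
# Illustrative model of the pointer-to-scalar reduction: FOO
# dereferences its "pointer" argument, while FOO_with_no_pointer takes
# the pointed-to value directly, so both expose the same two branches.

def FOO(p):                      # void FOO(double* p) { if (*p <= 1) ... }
    if p[0] <= 1.0:
        return "true-branch"
    return "false-branch"

def FOO_with_no_pointer(x):      # void FOO(double x) { if (x <= 1) ... }
    if x <= 1.0:
        return "true-branch"
    return "false-branch"

def lift(test_input):
    """Map a test input of the simplified program back to the original:
    any x covering a branch of FOO_with_no_pointer yields a pointer
    argument [x] covering the same branch of FOO."""
    return [test_input]

# Covering both branches of the simplified program covers both of FOO.
for x in (0.5, 2.0):
    assert FOO(lift(x)) == FOO_with_no_pointer(x)
```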
Handling Comparisons between Non-floating-point Expressions (Relaxing Def. 3.1(b))
We have encountered situations where a conditional statement invokes a comparison between non-floating-point numbers. CoverMe handles these situations by first promoting the non-floating-point numbers to floating-point numbers and then injecting pen as described in Algo. 1. For example, before a conditional statement like if (xi op yi) where xi and yi are integers, CoverMe injects r = pen(i, op, (double) xi, (double) yi);. Note that such an approach does not allow us to handle data types that are incompatible with floating-point types, e.g., conditions like if (p != NULL), which CoverMe has to ignore.

Handling Infeasible Branches (Relaxing Def. 3.1(c))
Infeasible branches are branches that cannot be covered by any input. Attempts to cover infeasible branches are useless and time-consuming. Detecting infeasible branches is a difficult problem in general, and CoverMe uses a heuristic: when CoverMe finds a minimum that is not zero, it deems the unvisited branch of the last conditional to be infeasible and adds it to Saturate, the set of unvisited branches deemed infeasible.

Imagine that we modify l1 of the program FOO in Fig. 3 to the conditional statement if (y == -1). Then the branch 1T becomes infeasible. We rewrite this modified program below and illustrate how we deal with infeasible branches:

l0: if (x <= 1) { x++; };
    y = square(x);
l1: if (y == -1) { ... }

where we omit the concrete implementation of square. Let FOO_R denote the representing function constructed for the program. In the minimization process, whenever CoverMe obtains x* such that FOO_R(x*) > 0, CoverMe selects a branch that it regards as infeasible, as follows: suppose x* exercises a path τ whose last conditional statement is denoted lz, and, without loss of generality, suppose zT is passed through by τ; then CoverMe regards zF as an infeasible branch.

In the modified program above, if 1F has been saturated, the representing function evaluates to (y + 1)² or (y + 1)² + 1, where y equals the non-negative square(x). Thus, every minimum point x* must satisfy FOO_R(x*) > 0, and CoverMe regards 1T as an infeasible branch.

CoverMe then treats infeasible branches as already saturated. That is, in line 12 of Algo. 1, CoverMe updates Saturate with both the saturated branches and the branches it regards as infeasible. This heuristic works well in practice (see Sect. 6), but we do not claim that it always correctly detects infeasible branches.

Handling Function Calls
By default, CoverMe injects the assignment r = pen(...) only in the entry function under test. If the entry function invokes other external functions, they are not transformed. For example, in the program FOO of Fig. 3, we do not transform square(x). In this way, CoverMe only attempts to saturate all branches of a single function at a time.

However, CoverMe can also easily handle functions invoked by its entry function. As a simple example, consider:

void FOO (double x) { GOO(x); }
void GOO (double x) { if (sin(x) <= 0.99) ... }

If CoverMe aims to saturate FOO and GOO but not sin, and it sets FOO as the entry function, then it instruments both FOO and GOO. Only GOO has a conditional statement, so CoverMe injects an assignment to r in GOO.
6. Evaluation
This section presents our extensive evaluation of CoverMe. All experiments were performed on a laptop with a 2.6 GHz Intel Core i7 running an Ubuntu 14.04 virtual machine with 4GB RAM. The main results are presented in Tab. 2, Tab. 3, and Fig. 5.
Benchmarks

Our benchmarks are C floating-point programs from the Freely Distributable Math Library (Fdlibm) 5.3, developed by Sun Microsystems, Inc. These programs are available from the network library netlib. We chose Fdlibm because it represents a set of floating-point programs of reference quality with a large user base. For example, Java SE 8's math library is defined with respect to Fdlibm 5.3 [3]; Fdlibm has also been ported to Matlab and JavaScript, and has been integrated into Android.

Fdlibm 5.3 has 80 programs. Each program defines one or multiple math functions; in total, Fdlibm 5.3 contains 92 math functions. Among them, we exclude 36 functions that have no branch, 11 functions whose input parameters are not floating-point, and 5 static C functions. Our benchmarks comprise the remaining 40 functions. Sect. A lists all excluded functions in Fdlibm 5.3.
Parameter Settings of CoverMe
CoverMe supports three parameters: (1) the number of Monte-Carlo iterations n_iter, (2) the local optimization algorithm LM, and (3) the number of starting points n_start. They correspond to the three input parameters of Algo. 1. Our experiment sets fixed values for n_iter and n_start, and sets LM = "powell" (which refers to Powell's algorithm [43]).

Tools for Comparison
We have compared CoverMe with three tools that support coverage-based testing of floating-point code:

• Rand is a pure random testing tool. We implemented Rand using a pseudo-random number generator.
• AFL [1] is a gray-box testing tool released by the Google security team. It integrates a variety of guided search strategies and employs genetic algorithms to efficiently increase code coverage.
• Austin [30] is a coverage-based testing tool that implements symbolic execution and search-based heuristics. Austin has been shown [31] to be more effective than the testing tool CUTE [46] (which is not publicly available).

We decided not to consider the following tools:

• Klee [16] is the state-of-the-art implementation of symbolic execution. We do not consider Klee because its expression language does not support floating-point constraints. In addition, many operations in our benchmark programs, such as pointer reference, dereference, and type casting, are not supported by Klee's backend solver STP [25], or by any other backend solver compatible with the Klee Multi-solver extension [41].
• Klee-FP [21] is a variant of Klee geared toward reasoning about floating-point value equivalence. It determines equivalence by checking whether two floating-point values are produced by the same operations [21]. We do not consider Klee-FP because its special-purpose design does not support coverage-based testing.
• Pex [49] is a coverage tool based on dynamic symbolic execution. We do not consider Pex because it only runs on .NET programs on Windows, whereas our tested programs are in C and our testing platform is Linux.
• FloPSy [32] is a floating-point testing tool that combines search-based testing and symbolic execution. We do not consider this tool because it was developed by the same author before Austin was released, and the tool is not available to us.
Coverage Measurement
Our evaluation focuses on branch coverage. Sect. C also shows our line coverage results. For CoverMe and Rand, we use the GNU coverage tool Gcov [6]. (In a recent klee-dev mailing list thread, the Klee developers stated that Klee and its variant Klee-FP have only basic floating-point support and that they are still "working on full FP support for Klee" [7].)

Figure 5: Benchmark results corresponding to the data given in Tab. 2. The y-axis refers to the branch coverage; the x-axis refers to the benchmarks.
For AFL, we use AFL-cov [4], a Gcov-based coverage analysis tool for AFL. For Austin, we calculate the branch coverage from the number of covered branches reported in the .csv file that Austin produces when it terminates.
Time Measurement
Comparing the running time of CoverMe with that of the other tools requires careful design. CoverMe, Rand, and AFL all have the potential to achieve higher coverage if given more time or iteration cycles. CoverMe terminates when it exhausts its iterations or achieves full coverage, whereas random testing and AFL do not terminate by themselves. In our experiment, we first run CoverMe until it terminates using the parameters provided above; then we run Rand and AFL with ten times the CoverMe time.

Austin terminates when it decides that no more coverage can be attained, and it reports no coverage results until it terminates. Thus, it would not be reasonable to give Austin the same amount of time as AFL and Rand; the time reported for Austin refers to the time Austin spends before it stops running.
As a sanity check, we first compare CoverMe with Rand. In Tab. 2, we sort all programs (Col. 1) and functions (Col. 2) by their names and give the number of branches (Col. 3). Tab. 2, Col. 6 gives the time spent by CoverMe; the time refers to the wall time reported by the Linux command time. Observe that the time varies considerably, ranging from 0.1 seconds (s_erf.c, erfc) to 22.1 seconds (e_fmod.c). Moreover, the time is not necessarily correlated with the number of branches: for example, CoverMe takes 1.1 seconds on s_expm1.c (42 branches) and 10.1 seconds on s_floor.c (30 branches). This suggests potential for real-world program testing, since CoverMe may not be very sensitive to the number of lines or branches.

Tab. 2, Col. 4 gives the time spent by Rand. Since Rand does not terminate by itself, the time refers to the timeout bound; as mentioned above, we set the bound to ten times the CoverMe time.

Tab. 2, Col. 7 and 9 show the branch coverage of Rand and CoverMe, respectively, as reported by the GNU coverage tool Gcov [6]. CoverMe achieves 100% coverage for 11 out of 40 tested functions, with an average of 90.8% coverage, while Rand achieves no 100% coverage and attains only 38.0% coverage on average. The last row of the table shows the mean values. Observe that every value in Col. 9 is larger than the corresponding value in Col. 7; that is, CoverMe achieves higher branch coverage than Rand for every benchmark program, which validates our sanity check.

Col. 10 gives the improvement of CoverMe over Rand, calculated as the difference between their coverage percentages. CoverMe provides 52.9% coverage improvement on average.

Remark 6.1.
Tab. 2 shows that CoverMe achieves only partial coverage for some tested programs. The incompleteness occurs in two situations: (1) the program under test has unreachable branches; (2) the minimization backend fails to find a global minimum point of the representing function FOO_R. Sect. D illustrates these two situations.
Tab. 2 also gives the experimental results of AFL. Col. 5 corresponds to the "run time" statistics provided by the frontend of AFL, and Col. 8 shows the branch coverage of AFL. As mentioned in Sect. 6.1, we terminate AFL once it has spent ten times the CoverMe time. Sect. B gives additional details on our AFL settings.

Table 2: CoverMe versus Rand and AFL. The benchmark programs are taken from Fdlibm [5]. The coverage percentage is reported by Gcov [6]. The time for CoverMe refers to the wall time; the time for Rand and AFL is set to ten times the CoverMe time. Columns: Benchmark (File, Function), Time (s), Branch (%), Improvement (%).

Our results show that AFL achieves 100% coverage for 2 out of 40 tested functions with an average of 72.9% coverage, while CoverMe achieves 100% coverage for 11 out of 40 tested functions with an average of 90.8% coverage. The average improvement is 17.9% (shown in the last row).

The last column gives the improvement of CoverMe over AFL, calculated as the difference between their coverage percentages. Observe that CoverMe outperforms AFL on 33 out of 40 tested functions. The largest improvement is 73.1%, on program s_atan.c. For five of the tested functions, CoverMe achieves the same coverage as AFL; for three (e_acosh.c, e_atan2.c, and e_pow.c), CoverMe achieves less.

Remark 6.2.
We have further studied CoverMe's coverage on these three programs versus that of AFL. We ran AFL with the same amount of time as CoverMe (rather than ten times as much). With this setting, AFL achieves 70.0% for e_acosh.c, 63.6% for e_atan2.c, and 54.4% for e_pow.c, which is less than or equal to CoverMe's coverage in each case. That being said, comparing CoverMe and AFL under the same amount of time may be unfair, because AFL usually requires much time to obtain good code coverage; this is why we let AFL run ten times as long as CoverMe for the results reported in Tab. 2.
Tab. 3 compares the results of CoverMe and Austin. We use the same set of benchmarks as in Tab. 2 (Col. 1-2), and we use the time (Col. 3-4) and the branch coverage (Col. 5-6) to evaluate efficiency and coverage. The branch coverage of Austin (Col. 5) is provided by Austin itself rather than by Gcov: Gcov needs access to the generated test inputs to report coverage, but Austin does not provide a viable way to access them.

Austin shows large performance variance across benchmarks, from 667.1 seconds (s_sin.c) to hours. As shown in the last row of Tab. 3, Austin needs 6058.4 seconds on average; this average does not include the benchmarks on which Austin crashes or times out. Compared with Austin, CoverMe is much faster (Tab. 3, Col. 4), with 6.9 seconds on average.

CoverMe also achieves higher branch coverage (90.8%) than Austin (42.8%). Comparing across Tab. 3 and Tab. 2, Austin provides slightly higher average branch coverage (42.8%) than Rand (38.0%), but lower than AFL (72.9%).

(Austin raised an exception when testing e_sqrt.c. The exception was triggered at line 209, column 1 of AustinOcaml/symbolic/symbolic.ml in Austin's code.)
Col. 7-8 give the improvement of CoverMe over Austin. We calculate the speedup (Col. 7) as the ratio of the time spent by Austin to the time spent by CoverMe, and the coverage improvement (Col. 8) as the difference between the branch coverage of CoverMe and that of Austin. We observe that CoverMe provides 3,868X speedup and 48.9% coverage improvement on average.
Remark 6.3.
Three reasons contribute to CoverMe's effectiveness. First, Thm. 4.3 allows the search process to target the right test inputs only: each minimum point found in the minimization process corresponds to a new branch until all branches are saturated. Most random search techniques do not have such a guarantee and may waste time searching for irrelevant test inputs. Second, SMT-based methods run into difficulties with certain kinds of programs, in particular those with nonlinear arithmetic, so it makes sense to use unconstrained programming for programs that are heavy on floating-point computation. In particular, the representing function used by CoverMe is carefully designed to be smooth to some degree (e.g., the branch distances defined in Def. 4.1 are quadratic expressions), which allows CoverMe to leverage the power of local optimization and MCMC. Third, compared with symbolic-execution-based solutions such as Austin, CoverMe invokes its unconstrained programming backend to minimize a single representing function, whereas symbolic execution usually needs to invoke SMT solvers on a large number of paths.

Since CoverMe has achieved high code coverage on most tested programs, one may wonder whether our generated inputs have triggered any latent bugs. Note that, when no specifications are given, program crashes are frequently used as an oracle for finding bugs in integer programs. Floating-point programs, on the other hand, can silently produce wrong results without crashing, so program crashes cannot serve as a simple, readily available oracle as they do for integer programs. Our experiments therefore focus on assessing the effectiveness of CoverMe in solving the problem defined in Def. 3.1; evaluating its effectiveness in finding bugs is orthogonal and exciting future work.
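The smoothness point above can be seen on a one-dimensional toy (illustrative only, not taken from the paper): a flat 0/1 "branch covered?" objective gives local descent no signal, while a quadratic branch distance in the style of Def. 4.1 decreases smoothly toward a satisfying input.

```python
# A 0/1 objective (1 until the target condition `x == 4` holds) is flat
# almost everywhere, so a local descent step gets no gradient signal;
# the quadratic branch distance (x - 4)^2 for the same condition
# decreases smoothly toward the satisfying input.

def step_obj(x):          # 0/1 signal: did `x == 4` hold?
    return 0.0 if x == 4.0 else 1.0

def quad_obj(x):          # quadratic branch distance for `x == 4`
    return (x - 4.0) ** 2

def descend(f, x, iters=200, h=1e-6, lr=0.1):
    """Plain numeric-gradient descent."""
    for _ in range(iters):
        g = (f(x + h) - f(x - h)) / (2 * h)
        x -= lr * g
    return x

x_step = descend(step_obj, 0.0)   # stuck: the gradient is zero everywhere
x_quad = descend(quad_obj, 0.0)   # converges toward the input x = 4
```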
7. Related Work
Many survey papers [35, 47, 50] have reviewed the algorithms and implementations for coverage-based testing.
Random Testing
The most generic automated testing solution may be to sample the input space randomly. Pure random testing is usually ineffective when the input space is large, as in the case of testing floating-point programs. Adaptive random testing [18] evenly spreads the randomly generated test cases, motivated by the observation that neighboring inputs often exhibit similar failure behavior. AFL [1] is another improved form of random testing; its sampling is aided by a set of heuristics. AFL starts from a random seed and repeatedly mutates it into inputs that can attain more program coverage.

Table 3: CoverMe versus Austin. The benchmark programs are taken from the Fdlibm library [5]. "Timeout" refers to a time of more than 30000 seconds; "crash" refers to a fatal exception when running Austin. Columns: time in seconds (Austin, CoverMe), branch coverage in % (Austin, CoverMe), and improvement (speedup, coverage in %).

Program | Entry function | Austin time | CoverMe time | Austin cov. | CoverMe cov. | Speedup | Cov. impr.
e_acos.c | ieee754_acos(double) | 6058.8 | 7.8 | 16.7 | 100.0 | 776.4 | 83.3
e_acosh.c | ieee754_acosh(double) | 2016.4 | 2.3 | 40.0 | 90.0 | 887.5 | 50.0
e_asin.c | ieee754_asin(double) | 6935.6 | 8.0 | 14.3 | 92.9 | 867.0 | 78.6
e_atan2.c | ieee754_atan2(double, double) | 14456.0 | 17.4 | 34.1 | 63.6 | 831.2 | 29.6
e_atanh.c | ieee754_atanh(double) | 4033.8 | 8.1 | 8.3 | 91.7 | 495.4 | 83.3
e_cosh.c | ieee754_cosh(double) | 27334.5 | 8.2 | 37.5 | 93.8 | 3327.7 | 56.3
e_exp.c | ieee754_exp(double) | 2952.1 | 8.4 | 75.0 | 96.7 | 349.7 | 21.7
e_fmod.c | ieee754_fmod(double, double) | timeout | 22.1 | n/a | 70.0 | n/a | n/a
e_hypot.c | ieee754_hypot(double, double) | 5456.8 | 15.6 | 36.4 | 90.9 | 350.9 | 54.6
e_j0.c | ieee754_j0(double) | 6973.0 | 9.0 | 33.3 | 94.4 | 776.5 | 61.1
e_j0.c | ieee754_y0(double) | 5838.3 | 0.7 | 56.3 | 100.0 | 8243.5 | 43.8
e_j1.c | ieee754_j1(double) | 4131.6 | 10.2 | 50.0 | 93.8 | 403.9 | 43.8
e_j1.c | ieee754_y1(double) | 5701.7 | 0.7 | 56.3 | 100.0 | 8411.0 | 43.8
e_log.c | ieee754_log(double) | 5109.0 | 3.4 | 59.1 | 90.9 | 1481.9 | 31.8
e_log10.c | ieee754_log10(double) | 1175.5 | 1.1 | 62.5 | 87.5 | 1061.3 | 25.0
e_pow.c | ieee754_pow(double, double) | timeout | 18.8 | n/a | 81.6 | n/a | n/a
e_rem_pio2.c | ieee754_rem_pio2(double, double*) | timeout | 1.1 | n/a | 93.3 | n/a | n/a
e_remainder.c | ieee754_remainder(double, double) | 4629.0 | 2.2 | 45.5 | 100.0 | 2146.5 | 54.6
e_scalb.c | ieee754_scalb(double, double) | 1989.8 | 8.5 | 57.1 | 92.9 | 233.8 | 35.7
e_sinh.c | ieee754_sinh(double) | 5534.8 | 0.6 | 35.0 | 95.0 | 9695.9 | 60.0
e_sqrt.c | ieee754_sqrt(double) | crash | 15.6 | n/a | 82.6 | n/a | n/a
k_cos.c | kernel_cos(double, double) | 1885.1 | 15.4 | 37.5 | 87.5 | 122.6 | 50.0
s_asinh.c | asinh(double) | 2439.1 | 8.4 | 41.7 | 91.7 | 290.8 | 50.0
s_atan.c | atan(double) | 7584.7 | 8.5 | 26.9 | 88.5 | 890.6 | 61.6
s_cbrt.c | cbrt(double) | 3583.4 | 0.4 | 50.0 | 83.3 | 9109.4 | 33.3
s_ceil.c | ceil(double) | 7166.3 | 8.8 | 36.7 | 83.3 | 812.3 | 46.7
s_cos.c | cos(double) | 669.4 | 0.4 | 75.0 | 100.0 | 1601.6 | 25.0
s_erf.c | erf(double) | 28419.8 | 9.0 | 30.0 | 100.0 | 3166.8 | 70.0
s_erf.c | erfc(double) | 6611.8 | 0.1 | 25.0 | 100.0 | 62020.9 | 75.0
s_expm1.c | expm1(double) | timeout | 1.1 | n/a | 97.6 | n/a | n/a
s_floor.c | floor(double) | 7620.6 | 10.1 | 36.7 | 83.3 | 757.8 | 46.7
s_ilogb.c | ilogb(double) | 3654.7 | 8.3 | 16.7 | 75.0 | 438.7 | 58.3
s_log1p.c | log1p(double) | 11913.7 | 9.9 | 61.1 | 88.9 | 1205.7 | 27.8
s_logb.c | logb(double) | 1064.4 | 0.3 | 50.0 | 83.3 | 3131.8 | 33.3
s_modf.c | modf(double, double*) | 1795.1 | 3.5 | 50.0 | 100.0 | 507.0 | 50.0
s_nextafter.c | nextafter(double, double) | 7777.3 | 17.5 | 50.0 | 79.6 | 445.4 | 29.6
s_rint.c | rint(double) | 5355.8 | 3.0 | 35.0 | 90.0 | 1808.3 | 55.0
s_sin.c | sin(double) | 667.1 | 0.3 | 75.0 | 100.0 | 1951.4 | 25.0
s_tan.c | tan(double) | 704.2 | 0.3 | 50.0 | 100.0 | 2701.9 | 50.0
s_tanh.c | tanh(double) | 2805.5 | 0.7 | 33.3 | 100.0 | 4075.0 | 66.7
MEAN | | 6058.4 | 6.9 | 42.8 | 90.8 | 3868.0 | 48.9

Figure 6: Symbolic execution methods versus CoverMe.
Symbolic Execution
Most branch-coverage-based testing algorithms follow the pattern of symbolic execution [27]: select a target path τ, derive a path condition Φτ, and calculate a model of the path condition with an SMT solver. The symbolic execution approach is attractive because of its theoretical guarantee: each model of the path condition Φτ necessarily exercises its associated target path τ. In practice, however, symbolic execution can be ineffective if there are too many target paths (a.k.a. path explosion), or if the SMT backend has difficulty handling the path condition. When analyzing floating-point programs, symbolic execution and its DSE (dynamic symbolic execution) variants [15, 45, 48] typically reduce floating-point SMT solving to Boolean satisfiability solving [42], or approximate constraints over floats by constraints over rationals [32] or reals [12].

Fig. 6 compares symbolic execution approaches with CoverMe. Their major difference is that, while symbolic execution solves a path condition Φτ with an SMT backend for each target path τ, CoverMe minimizes a single representing function FOO_R with unconstrained programming. Because of this difference, CoverMe has no path explosion issue, and in addition it does not need to analyze program semantics. Like symbolic execution, CoverMe also comes with a theoretical guarantee, namely that each minimum that attains 0 must trigger a new branch, a guarantee that contributes to its effectiveness (Sect. 6).
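This guarantee can be illustrated on a toy program (illustrative only; the quadratic distance below is a simplified stand-in for Def. 4.1's pen). Suppose the program under test is if (x*x - 2.0 <= 0) ... and only its true branch remains uncovered; any input attaining FOO_R = 0 then takes that branch, with no path condition or SMT solving involved.

```python
# Toy illustration of the guarantee: any input x* with FOO_R(x*) == 0
# exercises the not-yet-covered branch. Here only the true branch of
# `if (x*x - 2.0 <= 0) ...` is still uncovered, so FOO_R is the
# quadratic distance to satisfying the condition.

def FOO_R(x):
    lhs = x * x - 2.0
    return lhs ** 2 if lhs > 0.0 else 0.0  # 0 iff the true branch is taken

def takes_true_branch(x):
    return x * x - 2.0 <= 0.0

# A crude grid "minimization" suffices for the illustration.
candidates = [i / 100.0 for i in range(-500, 501)]
x_star = min(candidates, key=FOO_R)

assert FOO_R(x_star) == 0.0        # the minimum attains 0 ...
assert takes_true_branch(x_star)   # ... and triggers the new branch
```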
Search-based Testing
Miller et al. [37] were probably among the first to introduce optimization methods into automated testing. They reduce the problem of testing straight-line floating-point programs to constrained programming [38, 51], that is, optimization problems in which one must both minimize an objective function and satisfy a set of constraints. Korel [24, 29] extended Miller et al.'s work to general programs, leading to the development of search-based testing [36], which views a testing problem as a sequence of subgoals, each associated with a fitness function; the search process is then guided by minimizing those fitness functions. Search-based techniques have been integrated into symbolic execution, as in FloPSy [32] and Austin [30].

CoverMe's representing function is similar to the fitness function of search-based testing in the sense that both reduce testing to function minimization. However, CoverMe uses the more efficient unconstrained programming rather than the constrained programming used in search-based testing. Moreover, CoverMe's representing function comes with a theoretical guarantee, which also opens the door to a whole suite of optimization backends.
8. Conclusion
We have introduced a new branch-coverage-based testing algorithm for floating-point code. We turn the challenge of testing floating-point programs into the opportunity of applying unconstrained programming. Our core insight is to introduce the representing function so that Thm. 4.3 holds, which guarantees that each minimum point of the representing function is an input that exercises a new branch of the tested program. We have implemented this approach in the tool CoverMe. Our evaluation on Sun's math library Fdlibm shows that CoverMe is highly effective, achieving 90.8% branch coverage in 6.9 seconds on average, largely outperforming random testing, AFL, and Austin. For future work, we plan to investigate potential synergies between CoverMe and symbolic execution, and to extend this work to programs beyond floating-point code.
Acknowledgment
This research was supported in part by the United States National Science Foundation (NSF) Grants 1319187, 1528133, and 1618158, and by a Google Faculty Research Award. The information presented here does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred. We would like to thank our shepherd, Sasa Misailovic, and the anonymous reviewers for valuable feedback on earlier drafts of this paper, which helped improve its presentation.
References

[1] AFL: American Fuzzy Lop. http://lcamtuf.coredump.cx/afl/. Retrieved: 01 November, 2016.
[2] Clang: a C language family frontend for LLVM. http://clang.llvm.org/. Retrieved: 24 March, 2016.
[3] Class StrictMath of Java SE 8. https://docs.oracle.com/javase/8/docs/api/java/lang/StrictMath.html. Retrieved: 09 November, 2016.
[4] Code coverage analysis tool for AFL. https://github.com/mrash/afl-cov. Retrieved: 01 November, 2016.
[5] Fdlibm: Freely Distributed Math Library. Retrieved: 01 November, 2016.
[6] Gcov: GNU compiler collection tool. https://gcc.gnu.org/onlinedocs/gcc/Gcov.html/. Retrieved: 24 March, 2016.
[7] klee-dev mailing list. Retrieved: 09 November, 2016.
[8] llvm::Pass class reference. http://llvm.org/docs/doxygen/html/classllvm_1_1Pass.html. Retrieved: 24 March, 2016.
[9] SciPy optimization package. http://docs.scipy.org/doc/scipy-dev/reference/optimize.html. Retrieved: 24 March, 2016.
[10] Christophe Andrieu, Nando de Freitas, Arnaud Doucet, and Michael I. Jordan. An introduction to MCMC for machine learning. Machine Learning, 50(1-2):5–43, 2003.
[11] Arthur Baars, Mark Harman, Youssef Hassoun, Kiran Lakhotia, Phil McMinn, Paolo Tonella, and Tanja Vos. Symbolic search-based testing. In Proceedings of the 26th IEEE/ACM International Conference on Automated Software Engineering, ASE '11, pages 53–62, Washington, DC, USA, 2011.
[12] Earl T. Barr, Thanh Vo, Vu Le, and Zhendong Su. Automatic detection of floating-point exceptions. In POPL, pages 549–560, 2013.
[13] D. L. Bird and C. U. Munoz. Automatic generation of random self-checking test cases. IBM Syst. J., 22(3):229–245, 1983.
[14] Robert S. Boyer, Bernard Elspas, and Karl N. Levitt. A formal system for testing and debugging programs by symbolic execution. In Proceedings of the International Conference on Reliable Software, pages 234–245, New York, NY, USA, 1975.
[15] Jacob Burnim and Koushik Sen. Heuristics for scalable dynamic test generation. Pages 443–446, 2008.
[16] Cristian Cadar, Daniel Dunbar, and Dawson Engler. KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs. In Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, OSDI '08, pages 209–224, Berkeley, CA, USA, 2008.
[17] Cristian Cadar and Koushik Sen. Symbolic execution for software testing: three decades later. Commun. ACM, 56(2):82–90, 2013.
[18] Tsong Yueh Chen, Hing Leung, and I. K. Mak. Adaptive random testing. In Advances in Computer Science - ASIAN 2004: Higher-Level Decision Making, pages 320–329, 2004.
[19] Siddhartha Chib and Edward Greenberg. Understanding the Metropolis-Hastings algorithm. The American Statistician, 49(4):327–335, 1995.
[20] L. A. Clarke. A system to generate test data and symbolically execute programs. IEEE Trans. Softw. Eng., 2(3):215–222, 1976.
[21] Peter Collingbourne, Cristian Cadar, and Paul H. J. Kelly. Symbolic crosschecking of floating-point and SIMD code. In Proceedings of the Sixth Conference on Computer Systems, pages 315–328, 2011.
[22] John E. Dennis Jr. and Robert B. Schnabel. Numerical Methods for Unconstrained Optimization and Nonlinear Equations, volume 16. 1996.
[23] Jonathan Eckstein and Dimitri P. Bertsekas. On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators. Mathematical Programming, 55(1-3):293–318, 1992.
[24] Roger Ferguson and Bogdan Korel. The chaining approach for software test data generation. ACM Trans. Softw. Eng. Methodol., 5(1):63–86, 1996.
[25] Vijay Ganesh and David L. Dill. A decision procedure for bit-vectors and arrays. In International Conference on Computer Aided Verification, pages 519–531. Springer, 2007.
[26] Patrice Godefroid, Nils Klarlund, and Koushik Sen. DART: directed automated random testing. In Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation, Chicago, IL, USA, pages 213–223, 2005.
[27] James C. King. Symbolic execution and program testing. Commun. ACM, 19(7):385–394, 1976.
[28] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. Science, 220(4598):671–680, 1983.
[29] B. Korel. Automated software test data generation. IEEE Trans. Softw. Eng., 16(8):870–879, 1990.
[30] Kiran Lakhotia, Mark Harman, and Hamilton Gross. AUSTIN: An open source tool for search based software testing of C programs. Information and Software Technology, 55(1):112–125, 2013.
[31] Kiran Lakhotia, Phil McMinn, and Mark Harman. An empirical investigation into branch coverage for C programs using CUTE and AUSTIN. Journal of Systems and Software, 83(12):2379–2391, 2010.
[32] Kiran Lakhotia, Nikolai Tillmann, Mark Harman, and Jonathan de Halleux. FloPSy: Search-based floating point constraint solving for symbolic execution. In ICTSS '10, pages 142–157, Berlin, Heidelberg, 2010.
[33] D. M. Leitner, C. Chakravarty, R. J. Hinde, and D. J. Wales. Global optimization by basin-hopping and the lowest energy structures of Lennard-Jones clusters containing up to 110 atoms. Phys. Rev. E, 56:363, 1997.
[34] Z. Li and H. A. Scheraga. Monte Carlo-minimization approach to the multiple-minima problem in protein folding. Proceedings of the National Academy of Sciences of the United States of America, 84(19):6611–6615, 1987.
[35] Phil McMinn. Search-based software test data generation: A survey. Softw. Test. Verif. Reliab., 14(2):105–156, 2004.
[36] Phil McMinn. Search-based software testing: Past, present and future. In ICSTW '11, pages 153–163, Washington, DC, USA, 2011.
[37] W. Miller and D. L. Spooner. Automatic generation of floating-point test data. IEEE Trans. Softw. Eng., 2(3):223–226, 1976.
[38] M. Minoux. Mathematical Programming: Theory and Algorithms. Wiley, New York, 1986.
[39] Glenford J. Myers. The Art of Software Testing (2nd ed.). 2004.
[40] Jorge Nocedal and Stephen J. Wright. Numerical Optimization. 2006.
[41] Hristina Palikareva and Cristian Cadar. Multi-solver support in symbolic execution. In Computer Aided Verification, pages 53–68. Springer, 2013.
[42] Jan Peleska, Elena Vorobev, and Florian Lapschies. Automated test case generation with SMT-solving and abstract interpretation. In NFM '11, pages 298–312, Berlin, Heidelberg, 2011.
[43] William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery. Numerical Recipes 3rd Edition: The Art of Scientific Computing. New York, NY, USA, 2007.
[44] Herbert Robbins and Sutton Monro. A stochastic approximation method. The Annals of Mathematical Statistics, pages 400–407, 1951.
[45] Koushik Sen and Gul Agha. CUTE and jCUTE: Concolic unit testing and explicit path model-checking tools. In CAV '06, pages 419–423, 2006.
[46] Koushik Sen, Darko Marinov, and Gul Agha. CUTE: A concolic unit testing engine for C. In ESEC/FSE-13, pages 263–272, New York, NY, USA, 2005.
[47] Chayanika Sharma, Sangeeta Sabharwal, and Ritu Sibal. A survey on software testing techniques using genetic algorithm. arXiv:1411.1154, 2014.
[48] Dawn Song, D. Brumley, H. Yin, J. Caballero, I. Jager, M. G. Kang, Z. Liang, J. Newsome, P. Poosankam, and P. Saxena. BitBlaze: Binary analysis for computer security, 2013.
[49] Nikolai Tillmann and Jonathan de Halleux. Pex: White box test generation for .NET. In TAP '08, pages 134–153, Berlin, Heidelberg, 2008.
[50] Mark Utting, Alexander Pretschner, and Bruno Legeard. A taxonomy of model-based testing approaches. Software Testing, Verification and Reliability, 22(5):297–312, 2012.
[51] G. Zoutendijk. Mathematical Programming Methods. North-Holland, Amsterdam, 1976.

A. Untested Programs in Fdlibm and Explanations
Tab. 4 lists all untested programs and functions and explains the reasons. Three types of functions are excluded from the evaluation: (1) functions without any branch, (2) functions whose input parameters are not floating-point, and (3) static C functions.
B. AFL Settings
We run AFL following the instructions in its readme file: we first instrument the source program using afl-gcc, then generate the test data using afl-fuzz. We terminate afl-fuzz when it has spent ten times the time taken by CoverMe. Below we describe the test harness used when running AFL on Fdlibm programs that accept a single double parameter, e.g., e_acos.c.
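The harness listing itself is not reproduced here. As a hypothetical sketch of its essential step (in Python rather than C, with math.acos standing in for Fdlibm's acos, and bytes_to_double/harness being our own names), the harness reinterprets AFL's mutated byte stream as an IEEE-754 double and feeds it to the function under test:

```python
import math
import struct

def bytes_to_double(buf: bytes) -> float:
    """Reinterpret the first 8 raw fuzzer bytes as a little-endian double."""
    if len(buf) < 8:
        raise ValueError("need at least 8 bytes of input")
    return struct.unpack("<d", buf[:8])[0]

def harness(raw: bytes) -> float:
    """Decode the mutated byte stream and call the tested function
    (math.acos as a stand-in; NaN for out-of-domain inputs)."""
    x = bytes_to_double(raw)
    return math.acos(x) if -1.0 <= x <= 1.0 else float("nan")

# An AFL-style driver would read the byte stream from standard input and
# print harness(...) so crashes and hangs remain observable to the fuzzer.
assert bytes_to_double(struct.pack("<d", 0.5)) == 0.5
assert harness(struct.pack("<d", 1.0)) == 0.0
```

The design point this captures is that AFL mutates opaque bytes, so the harness, not the fuzzer, decides how those bytes map onto the double-typed input space.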
The LCOV_EXCL_START and LCOV_EXCL_STOP markers denote the beginning and the end of an excluded section, which allows the coverage analysis tool Gcov [6] to ignore the lines in between when reporting the coverage data. (We follow the instructions at http://lcamtuf.coredump.cx/afl/README.txt. For the programs e_j0.c and e_j1.c we use afl-clang, the alternative to afl-gcc, because afl-gcc crashes on these two due to a linking issue; we are unsure whether this is caused by bugs in afl-gcc or by our misuse of it.)

C. Line Coverage Results
The algorithm presented in the paper focuses on branch coverage, but CoverMe can also achieve high line coverage. Line coverage refers to the percentage of source lines covered by the generated test data. Tab. 5 provides the line coverage results of CoverMe, AFL, and Rand, shown in Col. 7, 8, and 9, respectively. The time is the same as in Tab. 2. In particular, CoverMe has achieved 100% line coverage for 28 out of 40 tested functions. On average, CoverMe, AFL, and Rand attain line coverage of 97.0%, 87.0%, and 54.2%, respectively.

Table 4: Untested programs in Fdlibm and the explanations.
File name | Function name | Explanation
e_gamma_r.c | ieee754_gamma_r(double) | no branch
e_gamma.c | ieee754_gamma(double) | no branch
e_j0.c | pzero(double) | static C function
e_j0.c | qzero(double) | static C function
e_j1.c | pone(double) | static C function
e_j1.c | qone(double) | static C function
e_jn.c | ieee754_jn(int, double) | unsupported input type
e_jn.c | ieee754_yn(int, double) | unsupported input type
e_lgamma_r.c | sin_pi(double) | static C function
e_lgamma_r.c | ieee754_lgammar_r(double, int*) | unsupported input type
e_lgamma.c | ieee754_lgamma(double) | no branch
k_rem_pio2.c | kernel_rem_pio2(double*, double*, int, int, const int*) | unsupported input type
k_sin.c | kernel_sin(double, double, int) | unsupported input type
k_standard.c | kernel_standard(double, double, int) | unsupported input type
k_tan.c | kernel_tan(double, double, int) | unsupported input type
s_copysign.c | copysign(double) | no branch
s_fabs.c | fabs(double) | no branch
s_finite.c | finite(double) | no branch
s_frexp.c | frexp(double, int*) | unsupported input type
s_isnan.c | isnan(double) | no branch
s_ldexp.c | ldexp(double, int) | unsupported input type
s_lib_version.c | lib_versioin(double) | no branch
s_matherr.c | matherr(struct exception*) | unsupported input type
s_scalbn.c | scalbn(double, int) | unsupported input type
s_signgam.c | signgam(double) | no branch
s_significand.c | significand(double) | no branch
w_acos.c | acos(double) | no branch
w_acosh.c | acosh(double) | no branch
w_asin.c | asin(double) | no branch
w_atan2.c | atan2(double, double) | no branch
w_atanh.c | atanh(double) | no branch
w_cosh.c | cosh(double) | no branch
w_exp.c | exp(double) | no branch
w_fmod.c | fmod(double, double) | no branch
w_gamma_r.c | gamma_r(double, int*) | no branch
w_gamma.c | gamma(double, int*) | no branch
w_hypot.c | hypot(double, double) | no branch
w_j0.c | j0(double) | no branch
w_j0.c | y0(double) | no branch
w_j1.c | j1(double) | no branch
w_j1.c | y1(double) | no branch
w_jn.c | jn(double) | no branch
w_jn.c | yn(double) | no branch
w_lgamma_r.c | lgamma_r(double, int*) | no branch
w_lgamma.c | lgamma(double) | no branch
w_log.c | log(double) | no branch
w_log10.c | log10(double) | no branch
w_pow.c | pow(double, double) | no branch
w_remainder.c | remainder(double, double) | no branch
w_scalb.c | scalb(double, double) | no branch
w_sinh.c | sinh(double) | no branch
w_sqrt.c | sqrt(double) | no branch

Table 5: Line coverage: CoverMe versus Rand and AFL. The benchmark programs are taken from Fdlibm [5]. We calculate the coverage percentage using Gcov [6].
Benchmark (File, Function) | Time (s) | Line (%) | Improvement (%)

    if (hx<0x00100000) {            /* subnormal x */
        if (hx==0) {
            for (ix = -1043, i=lx; i>0; i<<=1) ix -=1;
        } else {
            for (ix = -1022, i=(hx<<11); i>0; i<<=1) ix -=1;
        }
    } else ix = (hx>>20)-1023;
    /* determine iy = ilogb(y) */
    if (hy<0x00100000) {            /* subnormal y */
        if (hy==0) {
            for (iy = -1043, i=ly; i>0; i<<=1) iy -=1;
        } else {
            for (iy = -1022, i=(hy<<11); i>0; i<<=1) iy -=1;
        }
    } else iy = (hy>>20)-1023;

Figure 8: Benchmark program e_fmod.c, lines 57-72.
D. Illustration of the Incompleteness
This section discusses two benchmark programs for which CoverMe did not attain full coverage. The first case is caused by infeasible branches; the second is due to the weakness of the unconstrained programming backend Basinhopping in sampling subnormal numbers.
Missed branches in k_cos.c
Fig. 7 lists one of the smallest programs in our tested benchmarks, k_cos.c. The program operates on two double inputs. It first takes |x|'s high word by bit twiddling, involving a bitwise AND (&), a pointer reference (&), and a dereference (*) operator. The bit-twiddling result is then stored in an integer variable ix (Line 3), following which four conditional statements depend on ix (Lines 4-15). As shown in Tab. 2, CoverMe attained 87.5% branch coverage. The true branch of if (((int)x) == 0) always holds because it is nested within the |x| < 2**(-27) branch. (We presume that Sun's developers decided to use this statement to trigger the floating-point inexact exception as a side effect.) Since no input of __kernel_cos can trigger the opposite branch of if (((int)x) == 0), the 87.5% branch coverage is in fact optimal.
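The same high-word extraction can be reproduced outside C. The following Python sketch (our illustration, not part of CoverMe or Fdlibm; high_word is our own name) mimics __HI(x) & 0x7fffffff and confirms why the nested branch is infeasible for a tiny input:

```python
import struct

def high_word(x: float) -> int:
    """High 32 bits of x's IEEE-754 encoding, mimicking Fdlibm's __HI(x)."""
    return struct.unpack("<Q", struct.pack("<d", x))[0] >> 32

# |x|'s high word, as computed on Line 3 of k_cos.c
ix = high_word(2.0 ** -28) & 0x7fffffff
assert ix < 0x3e400000          # Line 4 guard: |x| < 2**(-27) holds
assert int(2.0 ** -28) == 0     # so the nested (int)x == 0 check always succeeds
```

Any |x| small enough to pass the Line 4 guard is far below 1, so truncating it to an int necessarily yields 0, which is exactly why the opposite branch can never be taken.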
Missed branches in e_fmod.c
Fig. 8 lists a segment of program e_fmod.c, from line 57 to 72; the comments are copied from the original program directly. As shown in Tab. 3, CoverMe covered 28 out of the 44 branches in the program. Line 57 (resp. Line 66) is a condition that can be triggered only if the input x (resp. y) is subnormal. With our current settings specified in Sect. 6.1, CoverMe could not generate any subnormal number because its optimization backend could not. Thus, CoverMe missed the two branches in Lines 57 and 66 and all the nested branches, 12 in total.

    #define __HI(x) *(1+(int*)&x)

    double __kernel_cos(double x, double y)
    {
        ix = __HI(x)&0x7fffffff;           /* ix = |x|'s high word */
        if (ix<0x3e400000) {               /* if |x| < 2**(-27) */
            if (((int)x)==0) return ...;   /* generate inexact */
        }
        ...;
        if (ix < 0x3FD33333)               /* if |x| < 0.3 */
            return ...;
        else {
            if (ix > 0x3fe90000) {         /* if |x| > 0.78125 */
                ...;
            } else {
                ...;
            }
            return ...;
        }
    }

Figure 7: Benchmark program k_cos.c.
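To make the subnormal guard of e_fmod.c concrete, this Python sketch (again our illustration, assuming the same high-word view as above) shows that hx < 0x00100000 holds only for zero and subnormal magnitudes, so a sampler that never emits subnormals cannot reach the Line 57 and Line 66 branches:

```python
import struct
import sys

def high_word(x: float) -> int:
    """High 32 bits of |x|'s IEEE-754 encoding (cf. Fdlibm's __HI macro)."""
    return struct.unpack("<Q", struct.pack("<d", abs(x)))[0] >> 32

DBL_MIN = sys.float_info.min                 # smallest positive normal, 2**-1022

# The Line 57 guard hx < 0x00100000 holds only for zero and subnormals:
assert high_word(DBL_MIN) == 0x00100000      # the smallest normal just misses it
assert high_word(DBL_MIN / 2) < 0x00100000   # a subnormal satisfies it
assert all(high_word(x) >= 0x00100000        # normal-range samples never do
           for x in (1e-300, 1e-6, 1.0, 1e300))
```

Since the smallest normal double already has high word 0x00100000, an optimizer whose proposals stay in the normal range can never drive hx below the threshold, which matches the 12 missed branches reported above.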