Comparing and Combining Approximate Computing Frameworks
Saeid Barati
Computer Science Department, University of Chicago
Chicago, [email protected]
Gordon Kindlmann
Computer Science Department, University of Chicago
Chicago, [email protected]
Hank Hoffmann
Computer Science Department, University of Chicago
Chicago, [email protected]
Abstract—Approximate computing frameworks configure applications so they can operate at a range of points in an accuracy-performance trade-off space. Prior work has introduced many frameworks to create approximate programs. As approximation frameworks proliferate, it is natural to ask how they can be compared and combined to create even larger, richer trade-off spaces. We address these questions by presenting VIPER and BOA. VIPER compares trade-off spaces induced by different approximation frameworks by visualizing performance improvements across the full range of possible accuracies. BOA is a family of exploration techniques that quickly locate Pareto-efficient points in the immense trade-off space produced by the combination of two or more approximation frameworks. We use VIPER and BOA to compare and combine three different approximation frameworks from across the system stack, including: one that changes numerical precision, one that skips loop iterations, and one that manipulates existing application parameters. Compared to simply looking at Pareto-optimal curves, we find VIPER's visualizations provide a quicker and more convenient way to determine the best approximation technique for any accuracy loss. Compared to a state-of-the-art evolutionary algorithm, we find that BOA explores 14× fewer configurations yet locates 35% more Pareto-efficient points.

I. INTRODUCTION
Approximation frameworks configure applications to operate within a trade-off space where result accuracy is exchanged for other benefits, typically increased performance. Different approximation frameworks exist across the layers of the system stack. Some focus on the circuit level [1]–[6]. Others replace expensive hardware with approximations [7]–[10]. Still others exist at the programming language and compiler level [11]–[18]. As approximation methods proliferate, it is natural to question their interaction; especially:
• How to compare the trade-off spaces induced by different techniques? Comparing individual points in the trade-off space is easy: simply compare all frameworks' performance at that point. Most approximation frameworks, however, produce a trade-off space—with a range of operating points—so we need techniques that compare frameworks across that entire range.
• How to combine different techniques' trade-off spaces and locate Pareto-efficient points in the new space? The challenge is quickly locating more efficient configurations in the immense combined trade-off space, which is too big to search exhaustively.
VIPER and BOA: To compare approximation frameworks we propose VIPER: Visualizing Improved PErformance Ratios. While existing techniques use numerical comparisons [19]–[23] or simply display Pareto-optimal curves, VIPER produces a visual representation of the trade-off space. VIPER produces charts showing normalized performance for different frameworks across all possible accuracy loss ranges. A chart is divided into different, mathematically meaningful regions that show how much one framework out-performs others.
To combine frameworks, we propose BOA: Blending Optimal Approximations. BOA is a family of algorithms that locate Pareto-efficient points in the huge trade-off space produced by multiple frameworks. In its simplest version, BOA-simple searches the cross product of Pareto-optimal points from individual frameworks. BOA extensions evaluate more of the search space—either deterministically or probabilistically—including more near-optimal points. BOA then returns the Pareto-efficient points from this search space.
Summary of Results: We consider two case studies. Both use prior approximation frameworks from across the system stack: Loop Perforation—a compiler technique (LP) [24], PowerDial—an application-level technique (PD) [25], and the Approximate Math Library—a library that changes numerical precision (AML) [18]. We use eleven applications covering domains from machine learning to image processing. Each application includes multiple inputs that we divide into training and test sets to evaluate whether combination methods produce statistically sound results on unseen inputs.
The first case study uses VIPER to compare Loop Perforation, PowerDial, and the Approximate Math Library. Loop Perforation simply discards loop iterations with no regard to original intent. PowerDial, in contrast, builds off approximations that already exist in the application; i.e., those envisioned by the original programmer. The Approximate Math Library approximates math functions (e.g., exp, log, and sqrt) using variable Taylor series expansion. While these approximations work at different levels of the stack, VIPER allows us to quickly compare them and produces more intuitive visualizations than simply looking at Pareto-optimal curves.
The second case study combines the three approximation techniques and locates Pareto-efficient points in this new, significantly larger trade-off space. We compare BOA to two state-of-the-art design-space exploration algorithms: the Multiple Choice Knapsack Problem (MCKP) [26] and the Non-dominated Sorting Genetic Algorithm (NSGA-II) [27]. Compared to these two approaches, BOA achieves:
• More efficient configurations: BOA-simple produces 48.2% and 35.1% more Pareto-efficient points than MCKP and NSGA-II, respectively.
• More reliable behavior on unseen inputs: BOA-simple finds statistically meaningful Pareto-efficient configurations that are not sensitive to input data and are more likely to be efficient on an unseen set of inputs. That is, the correlation between BOA's behavior on training (seen) and test (unseen) inputs is much higher than with MCKP and NSGA-II.
Somewhat surprisingly, BOA produces much better results while using a much simpler search technique than the prior works to which it is compared. The fact that simpler search methods can produce better results is a key contribution of this work. The primary insight is that, empirically, we find that optimal combinations of approximation frameworks tend to involve configurations that are near-optimal for the individual frameworks. Therefore, BOA explores combinations derived from these points with high probability. In contrast, MCKP does not consider enough non-optimal configurations and gets stuck in local minima, while NSGA-II explores too many non-optimal configurations—avoiding the worst local minima, but also stopping short of the true optimal combinations. Thus, BOA's method represents a compromise that works well for approximation frameworks, whose optimal combinations tend to be near the individually optimal points.
Contributions:
• Introduction of VIPER to visually compare approximation trade-off spaces over their entire range.
• Comparison analysis—based on VIPER—of three approximation frameworks (Loop Perforation [24], PowerDial [25], and the Approximate Math Library [18]).
• Proposal of variations of BOA for quickly locating Pareto-efficient configurations in a combined trade-off space.
• Open-source release of the VIPER and BOA tools.

II. MOTIVATION AND BACKGROUND
A. Approximation Across the System Stack
Approximation frameworks reduce runtime (or energy) by allowing output quality degradation. Hardware approximation computes inexactly in return for reduced energy, area, or time [2], [4], [28]. Many software approximation techniques allow specific software components to be replaced by approximate variants; e.g., skipping loop iterations or replacing math operations with Taylor-series expansions [17], [18], [24], [25], [29]–[38]. Some approaches use machine learning to replace exact computation with a faster, less accurate learned variant [5], [8], [9], [39]–[41]. Languages support approximation, allowing specification of variants for key functionality and formal analysis of their effects [11]–[13], [15], [16], [42]–[45]. Other mechanisms guarantee that approximate programs will maintain some quality or energy guarantees, either through program analysis [46]–[48] or runtimes with formally analyzable dynamic adaptation [37], [49]–[55].

Fig. 1: fluidanimate's accuracy/runtime trade-off space with Loop Perforation (LP), PowerDial (PD), and the Approximate Math Library (AML). The closer a configuration is to the origin, the more efficient it is.
B. Comparing Approximation Frameworks
Our intuition is that some approximation frameworks will produce better results (e.g., higher performance for the same accuracy) than others in different situations. Hence, we need a method to compare frameworks across their full range of accuracy and choose the best one for a specific usage scenario. Points in these trade-off spaces correspond to configurations of the approximation framework. Point-by-point comparison is infeasible since trade-off spaces include numerous configurations, many of which are not useful. Typically, only the Pareto-optimal points are used for comparison.
As an example, we compare three approximation frameworks: PowerDial [25], Loop Perforation [24], [31], [32], and the Approximate Math Library [18]. We pick these three frameworks because (1) they are either easily recreated or publicly available, requiring no specialized language or hardware support, and yet (2) they are representative of approaches applied at different levels of the system stack. PowerDial is an application-level approach that exploits existing trade-offs envisioned by the application developers. Loop Perforation creates approximate applications by applying a compiler transformation to selectively skip loop iterations. The Approximate Math Library changes computation, and while implemented in software, it is a good proxy for approximation techniques that change hardware arithmetic units. Figure 1 illustrates the trade-off spaces induced by these three approaches for the fluidanimate benchmark. Each point represents the normalized runtime and accuracy loss.
Comparison of approximation frameworks across their full range of accuracy is necessary, as not all users have the same accuracy requirements. We find, however, that looking at Pareto-optimal curves like those in Figure 1 is unsatisfying and rarely makes it obvious which approximation method is better across a range of operating points. Moreover, while for a certain range of accuracy one framework might perform better, another framework might produce higher performance at different accuracy ranges. This motivates VIPER, a tool that allows users to tell—at a glance—which framework has the best performance for any range of accuracy loss.
C. Combining Approximation Frameworks
Figure 1 suggests that none of the three approximation frameworks is uniformly best. Furthermore, the fact that the three are broadly representative of approaches from different levels of the system stack motivates us to combine them for better accuracy/runtime trade-offs than any individual framework. The challenge is that merging multiple frameworks leads to an enormous trade-off space, which is infeasible to explore exhaustively. Table II lists the number of points in the trade-off spaces of Loop Perforation, PowerDial, and the Approximate Math Library for sample benchmarks. For example, the x264 benchmark takes up to 4 weeks to test all combined configurations with a single input. Considering that multiple inputs should be tested for statistically sound results, the infeasibility of exhaustive exploration is obvious.
More formally, combining approximation frameworks requires quickly locating Pareto-efficient configurations in the new, larger trade-off space. Exploring large trade-off spaces is well-studied and has produced two broad classes of approach. The first carefully selects and exhaustively searches a subset of the combined trade-off space [26], [56]. The second class intelligently traverses the entire combined trade-off space—not limiting the initial configuration combinations, but exploring only a small number of the total [21], [57], [58]. Among these intelligent search techniques, NSGA-II—a genetic algorithm-based approach—has repeatedly out-performed other proposals [27], [58].
While prior work has proven effective for application-specific processor design, we find that it is not the best match for combining approximation frameworks. Specifically, the heuristic exploration of genetic algorithms appears to cause two issues: (1) in an effort to avoid local minima, they produce less efficient combinations (see Section VI-B) and (2) they add too much randomization, which leads to lower correlation between training and test inputs (see Section VI-E).

III. COMPARING APPROXIMATION FRAMEWORKS
A. Terminology
To produce performance/accuracy trade-offs, any approximation framework must have one or more tunable parameters. The values assigned to the parameter set represent a configuration, and the range of possible parameter settings is a configuration space. Each configuration represents a trade-off between performance and accuracy. The trade-off space (or design space) is the set of all possible trade-offs; i.e., the range of achievable performance and accuracy.
We consider large search spaces and often do not know the true optimal values for which we are searching. We therefore distinguish between Pareto-optimal—meaning we know that a point is on the true Pareto-optimal frontier—and Pareto-efficient—meaning a point on the estimated, unknown Pareto-optimal frontier. Thus, if we say a point is Pareto-efficient, it is better than all other points found so far, but we do not know that it is truly Pareto-optimal.
B. Numerical Comparisons
For large trade-off spaces, a point-by-point comparison is not possible. Therefore, prior work has introduced analytical methods for comparing trade-off spaces based on the number of Pareto-optimal—if the trade-off space is known—or Pareto-efficient—if the trade-off space is estimated—points from each framework.
A point in our accuracy-performance trade-off space is a 2D vector with runtime and accuracyLoss. Ideally, we would have zero runtime and zero accuracy loss; i.e., instantaneously get a perfect answer, leading to:
Definition 1. Objective Function: Given points x1 and x2, the objective to be minimized is f(x) where:

f(x1) < f(x2) ⟺ accuracyLoss(x1) < accuracyLoss(x2) and runtime(x1) < runtime(x2)   (1)

Points closer to the origin represent more efficient configurations. Given the objective function f(x), we determine if a point is more efficient than another by:

Definition 2. Dominance: Given points x1 and x2, we say:

x1 ⪰ x2 (weakly dominates) if f(x1) ≤ f(x2)
x1 ≻ x2 (dominates) if f(x1) < f(x2)   (2)

A point is Pareto-optimal if it is not dominated by any other point. A point is Pareto-efficient if we do not know of another point that dominates it. Figure 2(a) illustrates an example of dominance, where point x2 is dominated by point x1. Coverage quantifies the number of Pareto-efficient points produced by different techniques [57]:
Definition 3. Coverage is the dominance ratio of the Pareto-efficient curves induced by two separate frameworks. If X and Y are two Pareto-efficient curves, and x and y represent points on them respectively, then:

C(X, Y) = |{y ∈ Y | ∃x ∈ X : x ⪰ y}| / |Y|   (3)

C(X, Y) = 1 means that all points in Y are weakly dominated by points in X; i.e., all points of X provide lower runtime for the same accuracy loss than the points of Y. Figure 2(b) illustrates the coverage of curve X with respect to curve Y. The points y1 and y2 on the Y curve are dominated by at least one point on the curve X—x1, for example—therefore C(X, Y) = 2/3. In contrast, no point on X is dominated by a point on curve Y, which means C(Y, X) = 0. By this metric, we consider X more efficient, but note that the curve Y extends through a larger range within the trade-off space; i.e., y3 is a useful point which neither dominates nor is dominated by any points in X. The coverage function is non-symmetric (C(X, Y) ≠ C(Y, X)) and usually the two values do not sum to 1 [20]. Hence, we need a metric that considers both coverage functions simultaneously:

Fig. 2: Dominance (a) and Coverage (b) functions (from [57]). In (a), x1 dominates x2, as x1 is both faster and more accurate. In (b), X covers 2/3 of Y because y1 and y2 are dominated by at least one point in X.

Definition 4. Difference of Coverage compares coverage for two different Pareto-efficient curves.
DOC(X, Y) = C(X, Y) − C(Y, X)   (4)

As a result, when DOC(X, Y) ≥ 0, the fraction of Y points dominated by X is greater than the fraction of X points dominated by Y. A higher DOC implies one set is more efficient than the other. If DOC(X, Y) is close to zero, both may provide the same efficiency. This metric is widely used in multi-objective optimization problems [21]–[23].

C. VIPER
Numerical comparisons suffer from two major shortcomings. First, they do not show the full range of accuracy loss induced by each framework. Second, as seen in Figure 1, the best approximation framework varies as accuracy loss changes. Numerical metrics—like DOC—have limited expressiveness; Figure 2(b) shows that y3 is a useful point, but DOC makes X look uniformly better than Y.
As an alternative to numerical methods, researchers have used Pareto-optimal curves to compare frameworks, but this graphical evaluation has proven problematic [56], [59]. While curves may look compact, they can differ by orders of magnitude; e.g., when there is a steep slope and a large range covered, a small change in one dimension (e.g., accuracy loss) leads to a significant shift in the other (e.g., runtime).

Algorithm 1 VIPER.
Require: M, B ▷ lower convex hulls for framework M and baseline B
  MinX = Max(Min(M.x), Min(B.x)) ▷ lower bound
  MaxX = Min(Max(M.x), Max(B.x)) ▷ upper bound
  step = (MaxX − MinX) / 1000
  for accuLoss = MinX; accuLoss < MaxX; accuLoss += step do
    M_i ← find point on M where M_i.x < accuLoss < M_{i+1}.x
    B_j ← find point on B where B_j.x < accuLoss < B_{j+1}.x
    ŷ_M ← interpolate runtime between M_i and M_{i+1} where x = accuLoss
    ŷ_B ← interpolate runtime between B_j and B_{j+1} where x = accuLoss
    perfImprovRatio[accuLoss] = ŷ_M / ŷ_B
  end for
  NORMALIZE(perfImprovRatio) ▷ limit the ratio to [0,1]
  return perfImprovRatio ▷ array of points

To provide an alternative visualization of approximation frameworks we introduce VIPER, which illustrates the relative performance of frameworks for any range of accuracy loss. Algorithm 1 explains how VIPER calculates the performance improvement ratio (PIR) of one framework M over a baseline B. The PIR is the ratio of the frameworks' performance at a given accuracy loss. To make the charts readable, PIR is in the range [0,1]. PIR = 1 means a configuration is the fastest in the space, while PIR = 0 is the slowest. A configuration with PIR = 0.6 achieves 60% of the maximum speedup.
First, VIPER finds the lower and upper bounds of the accuracy loss, which define the range of comparison. Then, this range is divided by a parameterized granularity. We use a granularity of 1000 in this paper, as larger values produced no benefit and smaller values make the charts less clear. For each accuLoss value, VIPER finds the corresponding runtime in both frameworks. Afterwards, we search for the nearest points on each lower convex hull whose accuracy loss is smaller than accuLoss (identified as points M_i and B_j, respectively). Then, we interpolate the runtime at the specified accuLoss for both frameworks (named ŷ_M and ŷ_B), and divide these interpolated runtime values to compute the performance improvement ratio.
Finally, we normalize the ratio to [0,1]. When we compare more than two frameworks, the ratio is normalized to the lowest and highest among all. VIPER then charts the PIR across the range of accuracy loss. The values for the baseline B form a straight horizontal line. Values of M above that line indicate that M achieves higher performance for that accuracy loss. If the line for M stays above that for B for a greater range of accuracy loss, it means M has found more efficient configurations, on average. The color shading on the plot background indicates the highest-performance method for that accuracy loss among the multiple frameworks. Therefore, if the plot's background is dominated by a single color, the corresponding method provides the more efficient configurations. Thus, VIPER allows users to see at a (literal) glance whether one approximation framework is clearly superior to another.
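Algorithm 1's core loop (bracketing each accuracy-loss sample on the two lower convex hulls and interpolating runtimes) can be sketched as follows. The sketch returns the raw ratio and omits the final cross-framework normalization; the helper names are ours, not the released tool's.

```python
import bisect

def interpolate(hull, x):
    """Linearly interpolate runtime at accuracy loss x on a hull given as
    a list of (accuracy_loss, runtime) points sorted by accuracy loss."""
    xs = [p[0] for p in hull]
    i = min(max(bisect.bisect_right(xs, x) - 1, 0), len(hull) - 2)
    (x0, y0), (x1, y1) = hull[i], hull[i + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def viper_pir(M, B, granularity=1000):
    """Raw performance-improvement ratios of framework M over baseline B,
    sampled across their shared accuracy-loss range (Algorithm 1)."""
    lo = max(M[0][0], B[0][0])   # shared lower bound on accuracy loss
    hi = min(M[-1][0], B[-1][0])  # shared upper bound
    step = (hi - lo) / granularity
    return [(lo + k * step,
             interpolate(M, lo + k * step) / interpolate(B, lo + k * step))
            for k in range(granularity)]
```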
IV. CASE STUDY 1: COMPARING FRAMEWORKS
A. Experimental Setup
We use a dual-socket Intel Xeon E5 server system with 20 physical cores at 2.9 GHz, hyperthreading, and 32 GB memory. Table I lists the benchmarks used, from Parsec 3.0 [60] and Rodinia 3.1 [61]. Table I also contains the description, type, application accuracy metric, and default runtime for each benchmark. Accuracy loss is the error relative to the most accurate configuration. Shorter runtime corresponds to higher performance. blackscholes's only tunable parameter is the number of prices to estimate, and modifying it does not affect accuracy. Thus, PowerDial has no effect on blackscholes. Similarly, canneal, heartwall, kmeans, and x264 use math functions infrequently; the Approximate Math Library is not applicable to them.
We evaluate each suite across multiple inputs and compare the median across the inputs for this evaluation. In this section, we are evaluating known frameworks, thus we use the training inputs from Table I. In subsequent sections—where we present new techniques—we divide inputs into training and test sets, build combinations of frameworks using the training data, and then use the test data to ensure our selected combination works well on previously unseen data.
We evaluate three approximation frameworks. PowerDial (PD) transforms an application's command line parameters into software knobs that are automatically manipulated to trade accuracy for performance [25]. Each application has tunable knobs, which can take different values, and an assignment of values to knobs is a configuration. Loop Perforation (LP) identifies perforatable loops whose iterations can be skipped to produce faster, but less accurate, results [24]. A set of loops and perforation rates is a configuration. The Approximate Math Library (AML) substitutes math functions with a variable Taylor series expansion. A set of functions and their number of terms is the configuration.

B. Comparison by Difference of Coverage
To compare Loop Perforation and PowerDial, we calculate the average coverage function (C(X, Y) from Eq. 3) across all benchmarks for both. On average, Loop Perforation covers only . Pareto-optimal points of PowerDial, while PowerDial covers . Pareto-optimal points of Loop Perforation. Thus, DOC(LoopPerforation, PowerDial) = −. , which shows the slight superiority of PowerDial over Loop Perforation, on average. On the other hand, the negative values DOC(AML, PD) = −. and DOC(AML, LP) = −. show the significant inferiority of the Approximate Math Library against the other frameworks, on average.

C. Comparison by Pareto-optimal Curves
Figure 3a illustrates the frameworks' trade-off spaces. Each point represents a configuration. The y-axis is runtime normalized to the default configuration and the x-axis is the accuracy loss. Each framework's Pareto-optimal curve is shown in the same color as its trade-off space. These plots highlight how configurations cover wide ranges of runtime and accuracy loss. While in some cases—e.g., canneal, kmeans, and srad—Pareto-optimal curves are easy to compare, in other benchmarks—e.g., particlefilter and swaptions—comparison is infeasible. For fluidanimate, the Pareto-optimal curves intersect multiple times; the best approximation framework differs across the range of accuracy loss.

D. Comparison by VIPER
Just viewing the Pareto-optimal curves in Figure 3a provides limited intuition, as differences are not always visible. We use VIPER to compare these frameworks in Figure 3b. The y-axis represents the performance improvement ratio (PIR), while the x-axis illustrates accuracy loss. The horizontal line represents Loop Perforation: points above that line mean the corresponding technique is faster than Loop Perforation. The backdrop color indicates the best method for an accuracy loss. VIPER shows only useful configurations—if one configuration dominates the others, VIPER shows a small range of accuracy loss. Thus, some accuracy loss ranges in the Figure 3b plots are narrower than those in Figure 3a because there is no benefit to increasing accuracyLoss. VIPER provides the following insights:
• It illustrates how frameworks perform within a specific accuracy loss range. For instance, while PowerDial finds a higher-performance configuration than Loop Perforation for bodytrack and canneal, its performance is worse for kmeans and x264 for most accuracies.
• While the distinction between frameworks is not clear in Figure 3a for streamcluster and swaptions, VIPER allows quick, obvious comparison.
• VIPER clearly illustrates the intersection of Pareto-optimal curves; e.g., in fluidanimate and srad.
Since VIPER only requires trade-off spaces to compare, it can be applied to any approximation frameworks regardless of system level. We believe VIPER provides clear insights, which are instantly visually recognizable and mathematically meaningful. VIPER is not a replacement for existing methods, but a complement that simplifies comparison.

V. COMBINING APPROXIMATION FRAMEWORKS
The prior section shows that none of Loop Perforation, PowerDial, or the Approximate Math Library is uniformly best. This observation motivates us to combine frameworks. At one level, this process is quite easy—just create a new trade-off space that is the cross product of all configurations in the original frameworks. The challenge, of course, is quickly locating the Pareto-efficient points in the resulting massive combined trade-off space.
We meet this challenge with the BOA family of search algorithms. All BOA methods select a subset of the combined trade-off space and exhaustively search that subspace. The first algorithm, BOA-simple, only considers configurations in the cross-product of individual frameworks' Pareto-optimal configurations. This technique produces a relatively small set of points to search, but may be subject to local minima if the approximation frameworks are not independent.
Unfortunately, most approximation frameworks are not independent. For example, Loop Perforation changes the number of loop iterations within an application; PowerDial may change convergence criteria. When we combine configurations from these frameworks, we find that some configurations that were Pareto-optimal when considering only the original frameworks are now far from optimal. Conversely, we empirically find that some configurations that were not Pareto-optimal in the original frameworks combine to be Pareto-optimal when we consider multiple frameworks together. These observations motivate us to expand BOA-simple to include more non-Pareto-optimal configurations in combination.
BOA-flex expands the combined search space to consider the configurations that produce a trade-off within a user-defined threshold of Pareto-optimal. This technique searches more points and tends to find more efficient combinations, but it is deterministic. A common way to avoid local minima in large search spaces is expanding the exploration area with some form of randomization. We follow this approach with the last algorithm: BOA-prob, which probabilistically selects configurations from each individual framework to combine.
TABLE I: Benchmarks used for evaluation.

Benchmark      | Accuracy Metric                        | Training Inputs                   | Test Inputs                       | Runtime (sec)
Blackscholes   | Average Relative Error of Prices       | 30 lists with 1M initial prices   | 90 lists with 1M initial prices   | 3.2
Bodytrack      | Average Distance of Poses              | sequence of 100 frames            | sequence of 261 frames            | 3.1
Canneal        | Average Relative Routing Cost          | 30 netlists with 400K+ elements   | 90 netlists with 400K+ elements   | 6.88
Fluidanimate   | Distance between Particles             | 5 fluids with 100K+ particles     | 15 fluids with 500K+ particles    | 17.2
Heartwall      | Average Relative Error of Heart Frames | sequence of 30 ultrasound images  | sequence of 100 ultrasound images | 11.6
Kmeans         | Distance between Cluster Centers       | 30 vectors with 256K data points  | 90 vectors with 256K data points  | 3.1
Particlefilter | Distance between Particles             | sequence of 60 frames             | sequence of 240 frames            | 12.9
Srad           | Image Diff (RMSE)                      | 3 images with 2560*1920 pixels    | 9 images with 2560*1920 pixels    | 22.6
Streamcluster  | Distance between Cluster Centers       | 3 streams of 19K-100K data points | 9 streams of 100K data points     | 30
Swaptions      | Average Relative Error of Prices       | 40 swaptions                      | 160 swaptions                     | 6.2
x264           | Relative PSNR+Bitrate                  | 4 HD videos of 200+ frames        | 12 HD videos of 200+ frames       | 7.7
Specifically, it uses a sigmoid probability function, so that the closer points are to Pareto-optimal, the more likely they are to be included in the combined trade-off space. BOA-prob includes most of the same points as the other BOA algorithms, but includes some outliers with small probability, making it more robust in the presence of local minima.
BOA-simple:
The simple version of BOA forms the cross-product of all Pareto-optimal configurations from the individual frameworks. After executing on the evaluation platform, BOA-simple returns the Pareto-efficient configurations found in this combined trade-off space. The worst-case complexity of BOA-simple is bounded by O((m + log(m)) ∗ n^m), where m is the number of frameworks to be merged and n is the total number of parameters of all approximation frameworks [56]. In our experiments, the input parameters are the sum of the number of loop rates, software knobs, and Taylor series bounds that represent our three frameworks. While the algorithm has exponential complexity, it is practical because so few configurations lie on the Pareto-optimal frontiers produced by the individual frameworks (see Figure 3 and Table V).
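BOA-simple can be sketched as follows. The sketch assumes each framework is given as a mapping from configuration name to its measured (accuracy_loss, runtime) trade-off, and that a `measure` callback runs a combined configuration and returns its trade-off; both are stand-ins for the real measurement infrastructure, and the names are ours.

```python
import itertools

def pareto_front(space):
    """Keep configurations whose trade-offs no other configuration
    strictly dominates (lower accuracy loss AND lower runtime)."""
    def dominates(a, b):
        return a[0] < b[0] and a[1] < b[1]
    return {c: t for c, t in space.items()
            if not any(dominates(u, t) for u in space.values())}

def boa_simple(frameworks, measure):
    """Exhaustively measure the cross product of each framework's
    Pareto-optimal configurations; return the efficient combinations."""
    fronts = [list(pareto_front(f)) for f in frameworks]
    combined = {combo: measure(combo)
                for combo in itertools.product(*fronts)}
    return pareto_front(combined)
```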
BOA-flex:
BOA-flex augments BOA-simple with a user-specified selection threshold, as shown in Algorithm 2. Thisthreshold also removes some inconsistency that may arise dueto experimental noise; i.e., it is possible that for high varianceapplications, the true Pareto-optimal configurations cannotbe found with confidence, so adding the threshold makesthe search more robust. Specifically, BOA-flex considers allconfigurations whose trade-off is within the user-specifiedthreshold of a Pareto-optimal trade-off.This threshold is specified in terms of normalized Euclideandistance . All trade-offs are normalized so that accuracy lossand runtime range from 0 to 1. Accuracy loss of 1 means thelowest quality. A runtime of 1 is the slowest execution time.A trade-off point is the output of executing a configurationand, is thus, a pair of accuracy loss and runtime. Havingnormalized all configurations accuracy loss and runtime, wecan then compute the Euclidean distance between the trade-offs of two separate configurations. Given this definition, thethreshold specifies how close to Pareto-optimal a trade-offmust be for it to be included in the search. For example, For the purpose of time complexity analysis, we assume each approxima-tion knob can take on two values only, however, in reality, parameters maybe assigned a larger number of values. the threshold is . , and then the algorithm will include anyconfiguration whose accuracy loss/runtime trade-off is within5% of a Pareto-optimal point. If the threshold is zero, thisalgorithm is equivalent to BOA-simple. Algorithm 2
BOA-flex: expands the search space by a threshold.

Require: frameworks              ▷ trade-off spaces of the frameworks
Require: threshold               ▷ user-defined threshold
  Combination = []               ▷ configurations to explore
  for f in frameworks do
      Pareto-opt_f ← Get-Pareto-Opt(f)
      for Config C_i in Pareto-opt_f do
          for Config C_j in f do
              if NormalizedEuclideanDistance(C_i, C_j) ≤ threshold then
                  Combination.append(C_j)
              end if
          end for
      end for
  end for
  return Combination             ▷ set of points to explore
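Algorithm 2 can be sketched in Python roughly as follows; the dictionary-based configuration records and helper names are illustrative assumptions, not the authors' code:

```python
import math

def pareto_optimal(configs):
    """Non-dominated configs; each has normalized 'loss' and 'time' in [0, 1]."""
    return [c for c in configs
            if not any(o['loss'] <= c['loss'] and o['time'] <= c['time']
                       and (o['loss'] < c['loss'] or o['time'] < c['time'])
                       for o in configs)]

def distance(a, b):
    """Normalized Euclidean distance between two trade-off points."""
    return math.hypot(a['loss'] - b['loss'], a['time'] - b['time'])

def boa_flex(frameworks, threshold):
    """Keep every configuration within `threshold` of its framework's frontier."""
    combination = []
    for f in frameworks:
        for ci in pareto_optimal(f):
            for cj in f:
                if distance(ci, cj) <= threshold and cj not in combination:
                    combination.append(cj)
    return combination
```

With threshold = 0, only the frontier points themselves survive, recovering BOA-simple's per-framework selection.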
BOA-prob:
While BOA-flex expands the combined search space, it only considers additional configurations that are close to an individual framework's Pareto-optimal curve. To make BOA even more robust to local minima, BOA-prob employs a sigmoid probability function to include a few points that are farther from the individual frameworks' Pareto-optimal curves:

S(C_j) = 1 / (1 + exp((∆ − β) / γ))    (5)

where ∆ is the normalized Euclidean distance between configuration C_j and the nearest Pareto-optimal configuration, β is the horizontal shift, and γ decides the curve's smoothness. Algorithm 3 shows how BOA-prob uses Equation 5. We choose the constants β and γ so that there is a 92% chance of including points very close to the frontier, a 50% chance of selecting a point at ∆ = β, and less than a 1% chance of inclusion at larger distances. If ∆ = 0, then C_j is actually Pareto-optimal and BOA-prob always includes it. The interdependent parameters β and γ control the size of the combined trade-off space and the exploration time.

VI. CASE STUDY 2: BOA EVALUATION
We compare variations of BOA to prior exploration techniques using the same experimental setup from Section IV-A.

Fig. 3: Comparison of PowerDial (PD), Loop Perforation (LP), and the Approximate Math Library (AML) on each benchmark. (a) shows the Performance/Accuracy-Loss trade-off spaces (Pareto-optimal frontiers of normalized runtime vs. accuracy loss); (b) shows the VIPER comparison.
Algorithm 3 BOA-prob: probabilistic search space expansion.

Require: frameworks              ▷ trade-off spaces of the frameworks
  Combination = []               ▷ Pareto-efficient configurations
  for f in frameworks do
      Pareto-opt_f ← Get-Pareto-Opt(f)
      for Config C_i in Pareto-opt_f do
          for Config C_j in f do
              ∆ = NormalizedEuclideanDistance(C_i, C_j)
              S(C_j) = 1 / (1 + exp((∆ − β) / γ))
              r = rand()         ▷ random number between 0 and 1
              if (r < S(C_j)) or (∆ = 0) then
                  Combination.append(C_j)
              end if
          end for
      end for
  end for
  return Combination             ▷ set of points to explore
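A sketch of BOA-prob's selection rule follows. The concrete β and γ values below are illustrative assumptions (the paper's constants did not survive extraction), and the frontier and distance helpers are passed in by the caller:

```python
import math
import random

def keep_probability(delta, beta=0.1, gamma=0.02):
    """Sigmoid of Eq. 5: inclusion probability decays with the distance
    delta from the frontier; beta is the 50% point, gamma the smoothness."""
    return 1.0 / (1.0 + math.exp((delta - beta) / gamma))

def boa_prob(frameworks, frontier_of, dist, beta=0.1, gamma=0.02,
             rng=random.random):
    """Probabilistically expand each framework's frontier (Algorithm 3)."""
    combination = []
    for f in frameworks:
        for ci in frontier_of(f):
            for cj in f:
                delta = dist(ci, cj)
                # Pareto-optimal points (delta == 0) are always included.
                if delta == 0 or rng() < keep_probability(delta, beta, gamma):
                    if cj not in combination:
                        combination.append(cj)
    return combination
```

With these illustrative constants, a point at ∆ = 0.05 is kept with probability ≈ 92%, at ∆ = 0.1 with 50%, and at ∆ = 0.2 with under 1%, which matches the qualitative shape described above.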
We now split our inputs into training and test data sets as shown in Table I. For each exploration technique, we first use the training inputs to find Pareto-efficient configurations; then we evaluate those points using the separate test data.
A. Points of Comparison
We compare BOA to state-of-the-art approaches for locating Pareto-efficient points in large trade-off spaces:

• MCKP: The multiple-choice knapsack problem, a variant of the classic knapsack problem, has classes of items and must choose one item from each class. MCKP has been used to find Pareto-efficient processor designs in the performance-power space for application-specific embedded processors [26]. We declare each framework to be a class. MCKP then selects the Pareto-optimal configurations of each class while keeping the default values for the other classes. This creates a new, small trade-off space which can be searched by brute force.

• NSGA-II: The non-dominated sorting-based multi-objective evolutionary algorithm (NSGA-II) explores large trade-off spaces to find Pareto-optimal configurations using an evolutionary genetic algorithm [27]. NSGA-II is the state of the art for multiobjective optimization of embedded processors that navigate performance-power trade-offs, and it has been widely cited.

Fig. 4: Difference of coverage over NSGA-II. Higher bars are better. (Bars for MCKP, BOA-simple, BOA-flex, and BOA-prob, per benchmark and on average.)
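The MCKP-style baseline can be sketched as follows (the configuration encoding and names are our own assumptions): each framework is a class, and each candidate varies one framework over its Pareto-optimal settings while every other framework stays at its default.

```python
def mckp_candidates(defaults, frontiers):
    """Build the small MCKP search space: for class (framework) i, substitute
    each of its Pareto-optimal settings into the all-defaults configuration."""
    candidates = []
    for i, frontier in enumerate(frontiers):
        for setting in frontier:
            combined = list(defaults)
            combined[i] = setting   # vary only framework i
            candidates.append(tuple(combined))
    return candidates
```

The resulting space has at most the sum of the individual frontier sizes, which is why MCKP explores so few configurations but misses cross-framework interactions.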
B. Comparison by Difference of Coverage
Recall from Section III that the difference of coverage (DOC) measures the efficiency of one curve relative to another. Figure 4 displays the difference of coverage of various techniques over NSGA-II, per benchmark and on average. The y-axis shows DOC(X, NSGA), the DOC of exploration technique X over NSGA-II (see Section III-B). Negative values of DOC(X, NSGA) indicate that X does not find as many Pareto-efficient points as NSGA-II; conversely, positive values imply that technique X provides that many more points that dominate NSGA-II's. BOA-flex and BOA-prob on average locate 52.8-65.6% more Pareto-efficient configurations than NSGA-II. BOA's superiority is due to its focus on configurations that have been shown to be Pareto-optimal in the individual frameworks. NSGA-II starts the exploration from a random set of points in the combined trade-off space and iteratively looks for more efficient points. MCKP uses the individual Pareto-optimal curves but keeps the rest of the frameworks at default configurations. Since the frameworks are not fully independent, we empirically find that some configurations that were not Pareto-optimal in the original frameworks become part of the Pareto-efficient curve of the combined trade-off space when we consider multiple frameworks together. By expanding BOA with threshold-based and probabilistic exploration, we search more points, resulting in more efficient configurations. In short, MCKP does not search enough combinations, while NSGA-II searches too many. By restricting the search to points likely to be near the Pareto-optimal frontiers of the individual frameworks, BOA achieves the right balance and the best empirical results. These data show that, for approximate computations, BOA produces many more efficient configurations than prior state-of-the-art search techniques.

C. Comparison by VIPER
Figure 5a shows the Pareto-efficient points for each benchmark and search method. The y-axis shows runtime (normalized to the default configuration) and the x-axis represents accuracy loss. We use the median runtime across test inputs. These figures display the range of runtime and accuracy loss that each method can achieve. For instance, NSGA-II and MCKP cannot provide normalized runtime less than 78.1% and 55.5% of the default configuration, respectively, for particlefilter.

Figure 5b illustrates the VIPER comparison of NSGA-II, MCKP, and the different variations of BOA. The y-axis shows the performance improvement ratio while the x-axis shows accuracy loss. We use NSGA-II as the baseline, so it is represented by a horizontal line. For the same accuracy loss, lines above that horizontal represent better (more efficient) configurations, and lines below represent configurations worse than those found by NSGA-II. For most applications, MCKP stays below the horizontal, meaning it is worse than NSGA-II. By comparing the performance improvement ratio lines of BOA-simple and MCKP, we see that BOA-simple outperforms MCKP.

From the VIPER plots we also find the maximum and minimum performance improvement over NSGA-II. Considering fluidanimate, the NSGA-II line is at 0.25, indicating that the maximum performance is 4× better than NSGA-II, and the minimum performance is 25% worse. In fact, for every benchmark BOA-flex finds at least one configuration with higher performance for the same accuracy.

Whenever NSGA-II locates more Pareto-efficient points than BOA-simple, expanding the Pareto-efficient configurations reduces the performance improvement ratio gap. Benchmarks heartwall, kmeans, and particlefilter demonstrate how expanding the combined configurations provides higher Pareto-efficiency. In total, we find that by increasing the threshold, the lines of BOA-flex are above the NSGA-II line more than 95% of the time.
These results provide visual confirmation that BOA not only finds a greater number of efficient points than prior techniques, but BOA's points are also significantly better, representing much more efficient trade-offs. Furthermore, we believe this case study provides further evidence of VIPER's value: the VIPER charts in Figure 5b are visually intuitive, whereas the Pareto frontiers (Figure 5a) do not immediately show which framework is best at a given accuracy or by how much.

D. Exploration Time
The Pareto-efficiency of the located points depends on exploration time. While Figures 4 and 5 show that BOA produces better configurations than the other techniques, it is important to know whether that gain comes from exploring more points or from a better exploration strategy. Table II presents the number of configurations explored for each benchmark and method, including different thresholds for BOA-flex. To estimate the time spent exploring the combined trade-off space for a specific benchmark, we can multiply the number of explored configurations by the average runtime (from the last row of Table I). Comparing NSGA-II and BOA-simple across all benchmarks, NSGA-II explores 2.05% of all possible configurations, while BOA-simple explores about 14× fewer. BOA-flex and BOA-prob search only 0.682% and 0.692% of all possible configurations, respectively. These results indicate that BOA not only finds better combinations of approximation frameworks, it does so with less searching.

Fig. 5: Comparison of MCKP, NSGA-II, and BOA with different thresholds. (a) shows the Pareto-optimal curves found by each exploration technique; (b) shows the VIPER comparison of MCKP, NSGA-II, and the different versions of BOA.

TABLE II: Number of Explored Configurations.

Benchmark      | LP  | PD  | AML | MCKP | NSGA-II | BOA-simple | BOA-flex Th=0.05 | BOA-flex Th=0.1 | BOA-prob | Combined Configs
Blackscholes   | 20  | -   | 216 | 5    | 240     | 3          | 12               | 18              | 27       | 4,320
Bodytrack      | 768 | 200 | 36  | 15   | 800     | 30         | 224              | 256             | 144      | 5,529,600
Canneal        |     |     |     |      |         |            |                  |                 |          |
Fluidanimate   | 144 | 20  | 6   | 11   | 180     | 24         | 216              | 672             | 168      | 17,280
Heartwall      | 256 | 320 | -   | 19   | 400     | 98         | 665              | 722             | 777      | 81,920
Kmeans         | 120 | 100 | -   | 11   | 200     | 32         | 216              | 216             | 192      | 12,000
Particlefilter | 200 | 380 | 216 | 12   | 1000    | 240        | 1760             | 7200            | 1728     | 16,416,000
Srad           | 256 | 10  | 36  | 11   | 320     | 48         | 288              | 396             | 160      | 92,160
Streamcluster  | 384 | 256 | 6   | 9    | 480     | 80         | 392              | 1380            | 640      | 589,824
Swaptions      | 768 | 100 | 36  | 15   | 1000    | 330        | 2310             | 11025           | 1584     | 2,764,800
X264           | 768 | 400 | -   | 17   | 1000    | 45         | 345              | 400             | 405      | 307,200
Since MCKP only chooses configurations from the individual Pareto-optimal curves rather than merging the configurations, the number of explored configurations stays very low. In the worst case, MCKP explores up to the sum of the Pareto-optimal points of PowerDial, Loop Perforation, and the Approximate Math Library. Unfortunately, while MCKP searches a small space, that space is too small to find many useful points.
E. Input Sensitivity
Since exhaustive exploration is not feasible, we use training and test data to ensure robustness of BOA on unseen inputs. We show how well the behavior on training inputs predicts that on test inputs. For each search method, we take the normalized runtime and accuracy loss, compute a linear least-squares fit of training data to test data, and compute the correlation coefficient of each fit. Higher correlation coefficients imply greater robustness; i.e., the behavior of configurations found on the training data is a good predictor of test behavior.

Table III shows the correlation coefficients (R-values) for accuracy loss for each exploration method per benchmark. Table IV shows the R-values for runtime. By harmonic mean, BOA has 17% and 64% higher consistency of accuracy loss and normalized runtime, respectively, compared to NSGA-II. Since MCKP evaluates few configurations, its predictions are quite robust, one advantage of MCKP over the other techniques. Some benchmarks, such as fluidanimate and streamcluster, clearly stress the difference between training and test inputs: NSGA-II's heuristic approach can select configurations for the training data that produce bad results on the test data. In contrast, BOA not only finds more efficient configurations, its results are also much more robust when applied to new inputs, producing uniformly high R-values. These results indicate that BOA is a sound method for combining approximation frameworks.

F. Combination Distribution
When BOA combines frameworks, it considers multiple configurations from each rather than choosing from only one or two. Table V lists the number of configurations BOA-simple selects from each framework to generate the new, combined trade-off space. As mentioned in Section IV-D, the Approximate Math Library is never better than Loop Perforation or PowerDial in any range of accuracy loss. However, BOA uses the Approximate Math Library in combination with Loop Perforation and PowerDial for 7 out of the 11 applications.

TABLE III: Correlation coefficients for accuracy loss.

Benchmark | MCKP | NSGA-II | BOA-simple | BOA-flex | BOA-prob
(rows for Blackscholes, Bodytrack, Canneal, Fluidanimate, Heartwall, Kmeans, Particlefilter, Srad, Streamcluster, Swaptions, X264, and the Average)

TABLE IV: Correlation coefficients for normalized runtime.

Benchmark | MCKP | NSGA-II | BOA-simple | BOA-flex | BOA-prob
(rows for the same benchmarks and the Average)
These results show that there is real benefit to combining frameworks: even the Approximate Math Library, which is uniformly the worst of the three techniques by itself, contributes to Pareto-efficient points in the combined space found by BOA.
VII. CONCLUSION
A proliferation of approximation frameworks has recently appeared, exploiting different configurable parameters to trade reduced accuracy for decreased resource consumption. This paper proposes methods for both comparing and combining different frameworks. VIPER is a visualization tool for comparing approximation frameworks across their entire range of available accuracies. We show this tool is useful for comparing existing approximation frameworks regardless of their type and the system level at which they apply. BOA is a family of algorithms that combine approximation frameworks and quickly locate Pareto-efficient configuration combinations.
Acknowledgments:
The effort on this project is funded by the U.S. Government under the DARPA BRASS program and by a DOE Early Career Award. Additional funding comes from the NSF (CCF-1439156, CNS-1526304, CCF-1823032, CNS-1764039).

TABLE V: Combinations of approximation frameworks found by BOA.

Benchmark | LP (Pareto-opt) | PD (Pareto-opt) | AML (Pareto-opt) | BOA-simple
Swaptions | 11 | 5 | 6 | 330
(rows for the remaining benchmarks: Blackscholes, Bodytrack, Canneal, Fluidanimate, Heartwall, Kmeans, Particlefilter, Srad, Streamcluster, X264)

REFERENCES

[1] K. V. Palem, "Energy aware algorithm design via probabilistic computing: From algorithms and models to Moore's law and novel (semiconductor) devices," in Proceedings of the 2003 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES '03). New York, NY, USA: ACM, 2003, pp. 113-116. Available: http://doi.acm.org/10.1145/951710.951712
[2] A. Lingamneni, C. Enz, K. Palem, and C. Piguet, "Designing energy-efficient arithmetic operators using inexact computing," Journal of Low Power Electronics, vol. 9, no. 1, pp. 141-153, 2013.
[3] A. Ingole, B. Maiti, J. Augustine, and K. Palem, "Does customizing inexactness help over simplistic precision (bit-width) reduction? A case study," in Compilers, Architecture and Synthesis for Embedded Systems (CASES), 2015 International Conference on, Oct 2015, pp. 33-34.
[4] V. K. Chippa, S. Venkataramani, S. T. Chakradhar, K. Roy, and A. Raghunathan, "Approximate computing: An integrated hardware approach," in , Nov 2013, pp. 111-117.
[5] S. Muralidharan, A. Roy, M. Hall, M. Garland, and P. Rai, "Architecture-adaptive code variant tuning," SIGPLAN Not., vol. 51, no. 4, pp. 325-338, Mar. 2016. Available: http://doi.acm.org/10.1145/2954679.2872411
[6] L. N. Chakrapani, B. E. S. Akgul, S. Cheemalavagu, P. Korkmaz, K. V. Palem, and B. Seshasayee, "Ultra-efficient (embedded) SoC architectures based on probabilistic CMOS (PCMOS) technology," in Proceedings of the Design Automation Test in Europe Conference, vol. 1, March 2006, pp. 1-6.
[7] M. Samadi, D. A. Jamshidi, J. Lee, and S. Mahlke, "Paraprox: Pattern-based approximation for data parallel applications," in ACM SIGARCH Computer Architecture News, vol. 42, no. 1. ACM, 2014, pp. 35-50.
[8] H. Esmaeilzadeh, A. Sampson, L. Ceze, and D. Burger, "Neural acceleration for general-purpose approximate programs," in Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-45). Washington, DC, USA: IEEE Computer Society, 2012, pp. 449-460. Available: http://dx.doi.org/10.1109/MICRO.2012.48
[9] O. Temam, "A defect-tolerant accelerator for emerging high-performance applications," in Proceedings of the 39th Annual International Symposium on Computer Architecture (ISCA '12). Washington, DC, USA: IEEE Computer Society, 2012, pp. 356-367. Available: http://dl.acm.org/citation.cfm?id=2337159.2337200
[10] H. Esmaeilzadeh, P. Saeedi, B. N. Araabi, C. Lucas, and S. M. Fakhraie, "Neural network stream processing core (NnSP) for embedded systems," in , May 2006, pp. 4 pp.-2776.
[11] J. Bornholt, T. Mytkowicz, and K. S. McKinley, "Uncertain<T>: A first-order type for uncertain data," ACM SIGPLAN Notices, vol. 49, no. 4, pp. 51-66, 2014.
[12] A. Kansal, S. Saponas, A. B. Brush, K. S. McKinley, T. Mytkowicz, and R. Ziola, "The latency, accuracy, and battery (LAB) abstraction: Programmer productivity and energy efficiency for continuous mobile context sensing," in Proceedings of the 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Applications (OOPSLA '13). New York, NY, USA: ACM, 2013, pp. 661-676. Available: http://doi.acm.org/10.1145/2509136.2509541
[13] A. Sampson, W. Dietl, E. Fortuna, D. Gnanapragasam, L. Ceze, and D. Grossman, "EnerJ: Approximate data types for safe and general low-power computation," in Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '11). New York, NY, USA: ACM, 2011, pp. 164-174. Available: http://doi.acm.org/10.1145/1993498.1993518
[14] T. Oh, H. Kim, N. P. Johnson, J. W. Lee, and D. I. August, "Practical automatic loop specialization," SIGPLAN Not., vol. 48, no. 4, pp. 419-430, Mar. 2013. Available: http://doi.acm.org/10.1145/2499368.2451161
[15] J. Ansel, Y. L. Wong, C. Chan, M. Olszewski, A. Edelman, and S. Amarasinghe, "Language and compiler support for auto-tuning variable-accuracy algorithms," in Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization. IEEE Computer Society, 2011, pp. 85-96.
[16] W. Baek and T. M. Chilimbi, "Green: A framework for supporting energy-conscious programming using controlled approximation," SIGPLAN Not., vol. 45, no. 6, pp. 198-209, Jun. 2010. Available: http://doi.acm.org/10.1145/1809028.1806620
[17] L. Fousse, G. Hanrot, V. Lefèvre, P. Pélissier, and P. Zimmermann, "MPFR: A multiple-precision binary floating-point library with correct rounding," ACM Trans. Math. Softw., vol. 33, no. 2, Jun. 2007. Available: http://doi.acm.org/10.1145/1236463.1236468
[18] T.-J. Kwon and J. Draper, "Floating-point division and square root using a Taylor-series expansion algorithm," Microelectronics Journal.
[19]-[20] … Swarm and Evolutionary Computation, July 2007, pp. 126-133.
[21] G. Ascia, V. Catania, and M. Palesi, "A GA-based design space exploration framework for parameterized system-on-a-chip platforms," IEEE Transactions on Evolutionary Computation, vol. 8, no. 4, pp. 329-346, Aug 2004.
[22] L. Martí, J. García, A. Berlanga, and J. M. Molina, "A cumulative evidential stopping criterion for multiobjective optimization evolutionary algorithms," in Proceedings of the 9th Annual Conference Companion on Genetic and Evolutionary Computation (GECCO '07). New York, NY, USA: ACM, 2007, pp. 2835-2842. Available: http://doi.acm.org/10.1145/1274000.1274053
[23] L. Marti, J. Garcia, A. Berlanga, and J. M. Molina, "An approach to stopping criteria for multi-objective optimization evolutionary algorithms: The MGBM criterion," in , May 2009, pp. 1263-1270.
[24] S. Sidiroglou-Douskos, S. Misailovic, H. Hoffmann, and M. Rinard, "Managing performance vs. accuracy trade-offs with loop perforation," in Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering (ESEC/FSE '11). New York, NY, USA: ACM, 2011, pp. 124-134. Available: http://doi.acm.org/10.1145/2025113.2025133
[25] H. Hoffmann, S. Sidiroglou, M. Carbin, S. Misailovic, A. Agarwal, and M. Rinard, "Dynamic knobs for responsive power-aware computing," in Proceedings of the Sixteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XVI). New York, NY, USA: ACM, 2011, pp. 199-212. Available: http://doi.acm.org/10.1145/1950365.1950390
[26] P. Yang and F. Catthoor, "Pareto-optimization-based run-time task scheduling for embedded systems," in Hardware/Software Codesign and System Synthesis, 2003. First IEEE/ACM/IFIP International Conference on, Oct 2003, pp. 120-125.
[27] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, "A fast and elitist multiobjective genetic algorithm: NSGA-II," IEEE Transactions on Evolutionary Computation, vol. 6, no. 2, pp. 182-197, Apr 2002.
[28] H. Esmaeilzadeh, A. Sampson, L. Ceze, and D. Burger, "Architecture support for disciplined approximate programming," SIGPLAN Not., vol. 47, no. 4, pp. 301-312, Mar. 2012. Available: http://doi.acm.org/10.1145/2248487.2151008
[29] Q. Shi, H. Hoffmann, and O. Khan, "A cross-layer multicore architecture to tradeoff program accuracy and resilience overheads," IEEE Comput. Archit. Lett., vol. 14, no. 2, pp. 85-89, 2015. Available: https://doi.org/10.1109/LCA.2014.2365204
[30] M. Rinard, H. Hoffmann, S. Misailovic, and S. Sidiroglou, "Patterns and statistical analysis for understanding reduced resource computing," in Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA '10). New York, NY, USA: Association for Computing Machinery, 2010, pp. 806-821. Available: https://doi.org/10.1145/1869459.1869525
[31] S. Misailovic, S. Sidiroglou, H. Hoffmann, and M. Rinard, Quality of Service Profiling. New York, NY, USA: Association for Computing Machinery, 2010, pp. 25-34. Available: https://doi.org/10.1145/1806799.1806808
[32] H. Hoffmann, S. Misailovic, S. Sidiroglou, A. Agarwal, and M. Rinard, "Using code perforation to improve performance, reduce energy consumption, and respond to failures," no. MIT-CSAIL-TR-2009-042, 09 2009.
[33] M. Samadi, J. Lee, D. A. Jamshidi, A. Hormati, and S. Mahlke, "SAGE: Self-tuning approximation for graphics engines," in Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-46). New York, NY, USA: ACM, 2013, pp. 13-24. Available: http://doi.acm.org/10.1145/2540708.2540711
[34] J. Park, E. Amaro, D. Mahajan, B. Thwaites, and H. Esmaeilzadeh, "AxGames: Towards crowdsourcing quality target determination in approximate computing," SIGPLAN Not., vol. 51, no. 4, pp. 623-636, Mar. 2016. Available: http://doi.acm.org/10.1145/2954679.2872376
[35] R. J. Mathar, "A Java math.BigDecimal implementation of core mathematical functions," arXiv preprint arXiv:0908.3030, 2009.
[36] A. Abad, R. Barrio, M. Marco-Buzunariz, and M. Rodríguez, "Automatic implementation of the numerical Taylor series method: A Mathematica and Sage approach," Applied Mathematics and Computation.
[37] … SOSP, 2015.
[38] A. Canino, Y. D. Liu, and H. Masuhara, "Stochastic energy optimization for mobile GPS applications," in Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2018). New York, NY, USA: ACM, 2018, pp. 703-713. Available: http://doi.acm.org/10.1145/3236024.3236076
[39] X. Sui, A. Lenharth, D. S. Fussell, and K. Pingali, "Proactive control of approximate programs," SIGOPS Oper. Syst. Rev., vol. 50, no. 2, pp. 607-621, Mar. 2016. Available: http://doi.acm.org/10.1145/2954680.2872402
[40] C. Wan, H. Hoffmann, S. Lu, and M. Maire, "Orthogonalized SGD and nested architectures for anytime neural networks," in Proceedings of the 37th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, H. D. III and A. Singh, Eds., vol. 119. PMLR, 13-18 Jul 2020, pp. 9807-9817. Available: http://proceedings.mlr.press/v119/wan20a.html
[41] C. Wan, M. Santriaji, E. Rogers, H. Hoffmann, M. Maire, and S. Lu, "ALERT: Accurate learning for energy and timeliness," in …
[42] … in Proceedings of the 5th International Conference on Embedded Networked Sensor Systems (SenSys '07), 2007.
[43] S. Barati, F. A. Bartha, S. Biswas, R. Cartwright, A. Duracz, D. S. Fussell, H. Hoffmann, C. Imes, J. E. Miller, N. Mishra, Arvind, D. Nguyen, K. V. Palem, Y. Pei, K. Pingali, R. Sai, A. Wright, Y. Yang, and S. Zhang, "Proteus: Language and runtime support for self-adaptive software development," IEEE Software, vol. 36, no. 2, pp. 73-82, 2019. Available: https://doi.org/10.1109/MS.2018.2884864
[44] A. Kansal, S. Saponas, A. Brush, K. S. McKinley, T. Mytkowicz, and R. Ziola, "The latency, accuracy, and battery (LAB) abstraction: Programmer productivity and energy efficiency for continuous mobile context sensing," in ACM SIGPLAN Notices, 2013.
[45] A. Canino and Y. D. Liu, "Proactive and adaptive energy-aware programming with mixed typechecking," in Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2017). New York, NY, USA: ACM, 2017, pp. 217-232. Available: http://doi.acm.org/10.1145/3062341.3062356
[46] M. Ringenburg, A. Sampson, I. Ackerman, L. Ceze, and D. Grossman, "Monitoring and debugging the quality of results in approximate programs," SIGPLAN Not., vol. 50, no. 4, pp. 399-411, Mar. 2015. Available: http://doi.acm.org/10.1145/2775054.2694365
[47] M. Carbin, S. Misailovic, and M. C. Rinard, "Verifying quantitative reliability for programs that execute on unreliable hardware," in Proceedings of the 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Applications (OOPSLA '13). New York, NY, USA: ACM, 2013, pp. 33-52. Available: http://doi.acm.org/10.1145/2509136.2509546
[48] E. Darulova, V. Kuncak, R. Majumdar, and I. Saha, "Synthesis of fixed-point programs," in Proceedings of the Eleventh ACM International Conference on Embedded Software (EMSOFT '13). Piscataway, NJ, USA: IEEE Press, 2013, pp. 22:1-22:10. Available: http://dl.acm.org/citation.cfm?id=2555754.2555776
[49] H. Hoffmann, "CoAdapt: Predictable behavior for accuracy-aware applications running on power-aware systems," in , 2014, pp. 223-232.
[50] M. Maggio, A. V. Papadopoulos, A. Filieri, and H. Hoffmann, "Automated control of multiple software goals using multiple actuators," in Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2017), Paderborn, Germany, September 4-8, 2017, pp. 373-384. Available: https://doi.org/10.1145/3106237.3106247
[51] A. Filieri, H. Hoffmann, and M. Maggio, "Automated multi-objective control for self-adaptive software design," in Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2015), Bergamo, Italy, August 30-September 4, 2015, E. D. Nitto, M. Harman, and P. Heymans, Eds. ACM, 2015, pp. 13-24. Available: https://doi.org/10.1145/2786805.2786833
[52] A. Farrell and H. Hoffmann, "MEANTIME: Achieving both minimal energy and timeliness with approximate computing," in , 2016, pp. 421-435.
[53] A. Filieri, M. Maggio, K. Angelopoulos, N. D'Ippolito, I. Gerostathopoulos, A. B. Hempel, H. Hoffmann, P. Jamshidi, E. Kalyvianaki, C. Klein, F. Krikava, S. Misailovic, A. V. Papadopoulos, S. Ray, A. M. Sharifloo, S. Shevtsov, M. Ujma, and T. Vogel, "Control strategies for self-adaptive software systems," ACM Trans. Auton. Adapt. Syst., vol. 11, no. 4, pp. 24:1-24:31, 2017. Available: https://doi.org/10.1145/3024188
[54] S. Wang, C. Li, H. Hoffmann, S. Lu, W. Sentosa, and A. I. Kistijantoro, "Understanding and auto-adjusting performance-sensitive configurations," in Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2018), Williamsburg, VA, USA, March 24-28, 2018, X. Shen, J. Tuck, R. Bianchini, and V. Sarkar, Eds. ACM, 2018, pp. 154-168. Available: https://doi.org/10.1145/3173162.3173206
[55] C. Hankendi, A. K. Coskun, and H. Hoffmann, "Adapt&Cap: Coordinating system- and application-level adaptation for power-constrained systems," IEEE Des. Test, vol. 33, no. 1, pp. 68-76, 2016. Available: https://doi.org/10.1109/MDAT.2015.2463275
[56] T. Givargis, F. Vahid, and J. Henkel, "System-level exploration for Pareto-optimal configurations in parameterized systems-on-a-chip," in Computer Aided Design, 2001. ICCAD 2001. IEEE/ACM International Conference on, Nov 2001, pp. 25-30.
[57] G. Palermo, C. Silvano, and V. Zaccaria, "ReSPIR: A response surface-based Pareto iterative refinement for application-specific design space exploration," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 28, no. 12, pp. 1816-1829, Dec 2009.
[58] E. Zitzler, M. Laumanns, and L. Thiele, "SPEA2: Improving the strength Pareto evolutionary algorithm," Tech. Rep., 2001.
[59] J. Knowles and D. Corne, "The Pareto archived evolution strategy: A new baseline algorithm for Pareto multiobjective optimisation," in Evolutionary Computation, 1999. CEC 99. Proceedings of the 1999 Congress on, vol. 1, 1999, p. 105.
[60] C. Bienia, "Benchmarking modern multiprocessors," Ph.D. dissertation, Princeton University, January 2011.
[61] S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, S. H. Lee, and K. Skadron, "Rodinia: A benchmark suite for heterogeneous computing," in Workload Characterization, 2009. IISWC 2009. IEEE International Symposium on, Oct 2009, pp. 44-54.