Optimizing Program Size Using Multi-result Supercompilation
Dimitur Nikolaev Krustev
IGE+XAO Balkan, Sofia, Bulgaria
dkrustev@ige-xao.com
Supercompilation is a powerful program transformation technique with numerous interesting applications. Existing methods of supercompilation, however, are often very unpredictable with respect to the size of the resulting programs. We consider an approach for controlling result size, based on a combination of multi-result supercompilation and a specific generalization strategy, which avoids code duplication. The current early experiments with this method show promising results – we can keep the size of the result small, while still performing powerful optimizations.
Supercompilation was invented by Turchin [17] and has found numerous applications, such as program optimization [15, 16, 6], program analysis, software testing, and formal verification [7, 13, 14]. It is closely related to partial evaluation [4] and deforestation. Different extensions of the basic supercompilation approach are also studied, aiming to further increase its power – for example, distillation [3], higher-level supercompilation [9], etc.

Supercompilation performs very powerful program transformations by simulating the actual execution of the input program on a whole set of possible inputs simultaneously. The flip side of this power is that the behavior of supercompilation – with respect to both transformation time and result size – can be very unpredictable. This makes it problematic, for example, to include supercompilation as an optimization step of a standard compiler. Measures have been proposed to make supercompilation more well-behaved, both in execution time and result size [1, 5], while still achieving substantial improvements in program run time. These proposals are all based on a combination of specially crafted and empirically fine-tuned heuristics. The main goal of the present study is to experiment with a more principled approach for finding a better balance between the size and the run-time performance of programs produced by supercompilation. This approach is based on a few key ideas:

• use multi-result supercompilation [8, 10, 2] (often abbreviated as MRSC from now on) to systematically explore a large set of different generalizations during the transformation process, leading to different trade-offs between performed optimizations and code explosion;

• carefully select a generalization scheme, which can – if applied systematically – avoid duplicating code during the supercompilation process itself;

• re-use ideas from Grechanik et al. [2] to compactly represent and efficiently explore the set of programs resulting from multi-result supercompilation.

We outline the main ideas of multi-result supercompilation – as well as the specific approach to its implementation that we use – in Sec. 2. We then describe the main contributions of this study:

• We propose a method of adapting some existing techniques (MRSC with efficient queries over result sets) to solve in a systematic way the problem of code explosion during supercompilation (Sec. 3).

• We define a particular strategy for generalization during MRSC (Sec. 3.3), which:
  – avoids any risk of duplicating code during supercompilation (when applied);
  – avoids unnecessary increase of the search space of possible transformed programs, which MRSC must explore.

• We analyze the performance of the proposed strategy on several simple examples (Sec. 4).
Supercompilation is often defined as a transformation of "configurations" – data structures containing information about the set of states of program execution currently being explored. The configurations are typically produced by a process called "driving", and organized in "configuration trees". Sometimes the current configuration is similar enough to some previous one, and we can "fold" the former to the latter, turning the configuration tree into a "configuration graph". Finally, we can generate a new "residual" program from the configuration graph, where folding typically corresponds to calls to functions introduced by new recursive definitions. To ensure termination of the supercompilation process, a check – usually called a "whistle" – is systematically performed on the sequence of configurations being produced. When this whistle signals a potential risk of non-termination, one possibility to continue the supercompilation process is to perform a "generalization" – replace the current configuration by another, which avoids the non-termination risk, possibly by forgetting some information. The whistle usually marks the current configuration ("lower" in the tree) as risking non-termination with respect to some other configuration produced earlier ("higher" in the tree). Typical positive supercompilers – as described, for example, by Sørensen et al. [16] – make a choice whether to generalize the "lower" or the "upper" configuration.

One of the key insights behind multi-result supercompilation is that the place where the whistle has blown is not always the best place to make a generalization. The proposed solution is radical: do not generalize when the whistle has blown; instead, at any driving step check if suitable generalizations exist and continue driving not only the non-generalized configuration, but also all generalized ones, giving rise to multiple alternative results of driving. This process is illustrated by the example at the end of Sec. 3.3.

Our implementation of multi-result supercompilation mostly follows the same generic framework used in [2, 12]. It offers some important simplifications compared to the original work of Klyuchnikov et al. [8, 10]:

• It is based on "big-step" driving (called so by analogy with big-step operational semantics, as driving a configuration produces the full configuration subtree corresponding to it).

• The whole set of transformed programs is represented compactly in a tree-like data structure, which further permits not only recovering the full set of configuration graphs, but also performing efficiently certain kinds of queries on this set.

The compact representation of the set of graphs is shown in Fig. 1. We use direct excerpts of the F# implementation (available at https://github.com/dkrustev/MRScpOptSize), but hopefully they will be readable by anyone familiar with other functional languages such as OCaml or Haskell. Another important caveat is that we have implemented MRSC for programs in a specific language, so some details will only become clear once we introduce this language.
type MConf = MultiDriveStepResult * Exp

type GraphSet =
  | GSNone
  | GSFold of MConf * int * list<VarName * VarName>
  | GSBuild of MConf * list<list<GraphSet>>

let rec gset2graphs (gs: GraphSet) : seq<MConf * ConfGraph> =
  match gs with
  | GSNone -> Seq.empty
  | GSFold (conf, n, ren) -> Seq.singleton (conf, CGFold (n, ren))
  | GSBuild (conf, alts) ->
    let buildGraph' subGraphs = (conf, buildGraph (snd conf) subGraphs)
    let buildAlt alt =
      Seq.map buildGraph' (Seq.cartesian (Seq.map gset2graphs alt))
    Seq.collect buildAlt alts

Figure 1: Representation and Expansion of Graph Sets
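As a quick illustration (a sketch of our own, not code from the paper) of how such a lazy graph is used, the denoted set of configuration graphs can be expanded on demand:

// Expand the lazy graph and materialize only the first few configuration graphs.
let someGraphs (gs: GraphSet) : list<MConf * ConfGraph> =
  gset2graphs gs |> Seq.truncate 3 |> Seq.toList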
In particular, the configurations (MConf) we use are pairs of a multi-result driving step output and an expression of the language, as described in Sec. 3.3. A language-specific helper function buildGraph builds a configuration graph from a given configuration and subgraphs. Still, the details of the language are not important for understanding the core MRSC algorithm; indeed, such details are successfully abstracted away in [2, 12]. We prefer to show excerpts from our actual implementation for concreteness. The representation from Fig. 1 is termed lazy graph by Grechanik et al. [2], and can be viewed as a domain-specific language (DSL) describing the construction of the complete set of configuration graphs produced by multi-result supercompilation.

• Node GSNone is used when the whistle has blown. It represents an empty set of configuration graphs.

• Node GSFold is used when folding is possible. It gives the relative distance to the upper node to which we fold, plus the renaming (finite mapping of variables to variables), which makes the folded configurations compatible. Note that, following Grechanik et al. [2], we consider only folding to a node on the path from the current node to the root of the graph. This choice enables us to keep the representation of the set of configuration graphs we produce simpler. Also, similar to other positive supercompilers [15, 16, 6], it is only possible to fold to a node, which is a renaming of the current one.

• Node GSBuild is the most complicated one, representing a list of alternative developments (driving or generalization) of the current configuration. Each alternative, in turn, gives rise to a list of new configurations to explore, and hence, to a list of nested graph sets.
The semantics of this DSL is shown in the same Fig. 1 as a function gset2graphs, expanding a GraphSet into a sequence of configuration graphs. Note the use of Seq.cartesian to compose the subgraphs of the graph node of each alternative configuration.
The main MRSC algorithm – the one that builds the graph set of a given initial configuration – is presented with some simplifications as Algorithm 1. We can ignore the details about splitting the configuration history into a local and a global one – they mostly follow established heuristics as in Sørensen et al. [15, 16]. The overall approach is simple: if folding is possible, we produce a fold node and stop pursuing the current configuration. Otherwise we check the whistle – in our case, the same homeomorphic embedding relation, which is used in other positive supercompilers [16]. If it blows, we stop immediately with an empty set of resulting graphs. When there is neither folding nor a whistle, we continue analyzing the execution of the current configuration – based on 2 language-specific functions:

• multiDriveSteps returns a number of alternatives for the current configuration – either a set of new configurations produced by a single step of driving, or by (possibly several different forms of) generalization.

• mdsrSubExps returns – for a given alternative produced by the previous function – the list of sub-configurations (in our case – subexpressions) that must be subjected to further analysis.

The implementation of both functions is described in Sec. 3. Once we have this list of lists of sub-configurations, we simply apply the same algorithm recursively, but with extended history. Readers familiar with the implementation details of other supercompilers are invited to compare them to the simplicity of this MRSC approach.

Algorithm 1 Main MRSC Algorithm

function MRSCPRec(P, l, h, c)
  ▷ P – program (function definitions); l – nesting level;
  ▷ h – history (list of tuples: local/global flag, level, configuration); c – configuration
  if ∃ (_, l′, c′) ∈ h and a renaming ρ such that c = rename(c′, ρ) then
    return GSFold(c, l − l′, ρ)
  else
    rs ← multiDriveSteps(P, c)
    if ∃ (MDSRCases ...) ∈ rs then
      hk ← HEGlobal
      relHist ← [(hk′, l, c) ∈ h | hk′ = HEGlobal]
    else
      hk ← HELocal
      relHist ← takeWhile(λ(hk′, l, c). hk′ = HELocal, h)
    end if
    if ∃ (_, _, c′) ∈ relHist such that c′ ⊴ c then    ▷ ⊴ denotes homeomorphic embedding
      return GSNone
    else
      css ← map(mdsrSubExps, rs)
      h′ ← (hk, l, c) :: h
      return GSBuild(c, map(λ cs. map(λ c. MRSCPRec(P, l + 1, h′, c), cs), css))
    end if
  end if
end function

function MRSCP(P, c)
  return MRSCPRec(P, 0, [], c)
end function

Expressions          e ::= x                          variable
                        |  a(e1, ..., en)             call
Call kinds           a ::= C                          constructor
                        |  f                          function
Patterns             p ::= C(x1, ..., xn)
Function definitions d ::= f(x1, ..., xn) = e         ordinary function
                        |  g(p1, y1, ..., ym) = e1    pattern-matching
                           ...                        function
                           g(pn, y1, ..., ym) = en
Programs             P ::= d1, ..., dn

Figure 2: Object Language Syntax

type DriveStepResult =
  | DSRNone
  | DSRCon of ConName * list<Exp>
  | DSRUnfold of Exp
  | DSRCases of VarName * list<Pattern * Exp>

Figure 3: Result of a Single Step of Driving
The object language we consider is a first-order functional language with ordinary (not pattern-matching) and pattern-matching function definitions. Its syntax is summarized in Fig. 2. A very similar language – with call-by-name semantics – is often used in many introductions to positive supercompilation [15, 16, 6]. A notable restriction in our case is that we omit if-expressions and a built-in generic equality. We use the convention that data constructors always start with an uppercase letter, while function and variable names start with a lowercase one. The patterns of any function definition must be exhaustive, not nested, and non-overlapping. As a technical detail, we do not make a distinction between ordinary and pattern-matching functions at each call site, as this information is uniquely determined by the function definition itself.
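For concreteness, here is a minimal sketch of how the expressions of Fig. 2 could be represented, together with one possible definition of the homeomorphic-embedding whistle used in Algorithm 1. The type and constructor names here are our own assumptions for illustration; the actual implementation may differ:

type VarName = string
type ConName = string
type FunName = string

// Call kinds and expressions, mirroring the grammar of Fig. 2
type CallKind =
  | CKCon of ConName                // constructor C
  | CKFun of FunName                // function f or g
type Exp =
  | EVar of VarName                 // x
  | ECall of CallKind * list<Exp>   // a(e1, ..., en)

// Homeomorphic embedding: e1 is embedded in e2 if e1 can be obtained from e2
// by deleting some subterms (diving) or by matching call heads argument-wise (coupling).
let rec hembed (e1: Exp) (e2: Exp) : bool =
  let diving =
    match e2 with
    | ECall (_, args) -> List.exists (hembed e1) args
    | _ -> false
  let coupling =
    match e1, e2 with
    | EVar _, EVar _ -> true
    | ECall (a1, es1), ECall (a2, es2) ->
        a1 = a2 && List.length es1 = List.length es2 && List.forall2 hembed es1 es2
    | _ -> false
  coupling || diving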
Let us recall what driving looks like for this simple language, in the case of positive supercompilation (which is a simplification of the more general approach pioneered by Turchin [17]). As a technical device, we define a single step of driving, producing a result of type DriveStepResult, as defined in Fig. 3.

• We cannot drive a variable any further: drive⟦x⟧ = DSRNone;

• Driving a constructor results in a constructor node with all arguments available for further driving: drive⟦C(e1, ..., en)⟧ = DSRCon(C, e1, ..., en);

• If we stumble upon a call to an ordinary function, we simply unfold its definition: drive⟦f(e1, ..., en)⟧ = DSRUnfold(e[x1 → e1, ..., xn → en]), where f(x1, ..., xn) = e ∈ P and e[x1 → e1, ..., xn → en] denotes simultaneous substitution;

• The most interesting cases concern a call to a pattern-matching function, as the situation is different depending on the kind of the first argument:

  – drive⟦g(C(e′1, ..., e′m), e1, ..., en)⟧ = DSRUnfold(e[x1 → e′1, ..., xm → e′m, y1 → e1, ..., yn → en]), where g(C(x1, ..., xm), y1, ..., yn) = e ∈ P;
  – drive⟦g(x, e1, ..., en)⟧ = DSRCases(x, propagate(x, p1, (e1, ..., en), e′1), ..., propagate(x, pm, (e1, ..., en), e′m)), where g(p1, y1, ..., yn) = e′1, ..., g(pm, y1, ..., yn) = e′m ∈ P and propagate performs positive information propagation by substituting (a suitable renaming of) pi for x in the corresponding branch i;

  – drive⟦g(f(e′1, ..., e′m), e1, ..., en)⟧ = dsrMap(⟦g(•, e1, ..., en)⟧, drive⟦f(e′1, ..., e′m)⟧), where dsrMap transforms a driving step result by splicing it in an expression with a hole (we implement expressions with a single hole as functions from expressions to expressions).

type ConfGraph =
  | CGLeaf of Exp
  | CGCon of ConName * list<ConfGraph>
  | CGUnfold of ConfGraph
  | CGCases of VarName * list<Pattern * ConfGraph>
  | CGFold of int * list<VarName * VarName>
  | CGLet of list<VarName * ConfGraph> * ConfGraph

Figure 4: Representation of a Configuration Graph

We deliberately omit many low-level details in this description, as they are well-known and can be found in most introductions to positive supercompilation [15, 16, 6]. Using this definition of driving, plus the usual definitions of folding, whistle, and generalization, we can build configuration graphs of the form shown in Fig. 4. Note that we use the same representation of variables inside object-language programs and inside configuration graphs, as no confusions arise (assuming suitable measures for avoiding variable capture).
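The dsrMap helper used in the last driving rule above is not shown in the figures; a rough sketch of it (our assumption, reusing the Exp sketch from earlier and representing the context g(•, e1, ..., en) as a function from expressions to expressions) might be:

// Splice the result of driving an inner call into the surrounding context.
let dsrMap (ctx: Exp -> Exp) (dsr: DriveStepResult) : DriveStepResult =
  match dsr with
  | DSRUnfold e -> DSRUnfold (ctx e)
  | DSRCases (x, branches) -> DSRCases (x, List.map (fun (p, e) -> (p, ctx e)) branches)
  | other -> other   // driving a function call cannot directly yield DSRNone or DSRCon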
As already mentioned, a key difference in multi-result supercompilation is that driving and generalization are grouped together: a multi-driving step can return not one, but several alternative configurations. One of them is typically the result of standard driving, but the others can be different kinds of generalizations. The choice of generalization strategy depends on the intended use of the multi-result supercompiler. In our case, the main goal is to find a program of optimal size among the results. Previous analyses have shown that one of the main reasons for code size explosion in supercompilation is the unrestricted duplication of subexpressions during driving. Of course, sometimes such duplication pays off, as it leads to new opportunities for optimization. But this is not always the case. These observations lead us to consider two guiding principles that should help us attain our goal:

• if standard driving can duplicate existing code, provide also a generalized configuration, where no existing (non-trivial) subexpressions are duplicated (we do not attempt to remove already existing code duplication, only to avoid introducing new duplication);

• if there is no risk of duplicating code, avoid any generalization, as it will be unlikely to help with the size of the result.

To apply these principles, we analyze standard driving, case by case, to see where we need to avoid code duplication by generalization. In order to express generalization as a possible result of a driving step, we extend our representation of a step result – Fig. 5. The same figure shows the implementation of mdsrSubExps that we have encountered earlier. The source function multiDriveSteps will be denoted mrdrive for brevity below.

type MultiDriveStepResult =
  | MDSRLeaf of Exp
  | MDSRCon of ConName * list<Exp>
  | MDSRUnfold of Exp
  | MDSRCases of VarName * list<Pattern * Exp>
  | MDSRLet of list<VarName * Exp> * Exp

let mdsrSubExps (mdsr: MultiDriveStepResult) : list<Exp> =
  match mdsr with
  | MDSRLeaf _ -> []
  | MDSRCon (_, es) -> es
  | MDSRUnfold e -> [e]
  | MDSRCases (_, cases) -> List.map snd cases
  | MDSRLet (binds, e) -> e :: List.map snd binds

Figure 5: One Step of Multi-result Driving

• The variable case is again trivial: mrdrive⟦x⟧ = [MDSRLeaf(x)]. (We use [...; ...; ...] to denote a list of results.)

• Driving a constructor does not duplicate code, so we again make no generalization: mrdrive⟦C(e1, ..., en)⟧ = [MDSRCon(C, e1, ..., en)];

• The unfolding of a call to an ordinary function can produce code duplication, if some arguments appear multiple times in the body of the definition. We conservatively generalize all arguments of the call: mrdrive⟦f(e1, ..., en)⟧ = [MDSRLet(y1 = e1, ..., yn = en, e[x1 → y1, ..., xn → yn]); MDSRUnfold(e[x1 → e1, ..., xn → en])], where y1, ..., yn are fresh and f(x1, ..., xn) = e ∈ P. Notice that we shall always place the generalization result before the driving result in the list. In this way, when we expand the lazy graph using gset2graphs, configuration graphs earlier in the resulting sequence will have more generalizations;

• The case of a pattern-matching call with a known constructor is completely analogous to the previous one: mrdrive⟦g(C(e′1, ..., e′m), e1, ..., en)⟧ = [MDSRLet(u1 = e′1, ..., um = e′m, z1 = e1, ..., zn = en, e[x1 → u1, ..., xm → um, y1 → z1, ..., yn → zn]); MDSRUnfold(e[x1 → e′1, ..., xm → e′m, y1 → e1, ..., yn → en])], where u1, ..., um, z1, ..., zn are fresh and g(C(x1, ..., xm), y1, ..., yn) = e ∈ P;

• When we pattern-match on a variable, information propagation can introduce some code duplication. The code potentially being duplicated, however, is always of the form C(x1, ..., xn). We have currently decided to accept this limited form of potential duplication, without adding a generalization: mrdrive⟦g(x, e1, ..., en)⟧ = [MDSRCases(x, propagate(x, p1, (e1, ..., en), e′1), ..., propagate(x, pm, (e1, ..., en), e′m))], where g(p1, y1, ..., yn) = e′1, ..., g(pm, y1, ..., yn) = e′m ∈ P;

• The case of matching on a function call is perhaps the least obvious. As during normal driving we reuse the result of driving the nested call, it is not clear in advance what it will be. So we prefer to be conservative, and add a full generalization of the outer call here: mrdrive⟦g(f(e′1, ..., e′m), e1, ..., en)⟧ = [MDSRLet(x0 = f(e′1, ..., e′m), x1 = e1, ..., xn = en, g(x0, x1, ..., xn)); mdsrMap(⟦g(•, e1, ..., en)⟧, mrdrive⟦f(e′1, ..., e′m)⟧)], where x0, x1, ..., xn are, as usual, fresh. We leave a more refined generalization treatment for future work.
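To make the third rule above concrete, here is a rough sketch of the corresponding case of mrdrive, again reusing the Exp sketch from earlier; freshVar and substitute are hypothetical helpers, and the real implementation may differ:

// Multi-driving a call f(e1, ..., en) to an ordinary function f(x1, ..., xn) = body:
// first the conservative generalization, then the usual unfolding.
let mrdriveFunCall (defs: Map<FunName, list<VarName> * Exp>)
                   (f: FunName) (args: list<Exp>) : list<MultiDriveStepResult> =
  let (pars, body) = Map.find f defs
  let freshVars = List.map (fun _ -> freshVar ()) pars                     // y1, ..., yn
  let genBody = substitute (List.zip pars (List.map EVar freshVars)) body  // body[xi -> yi]
  let unfolded = substitute (List.zip pars args) body                      // body[xi -> ei]
  [ MDSRLet (List.zip freshVars args, genBody)   // generalization placed first ...
  ; MDSRUnfold unfolded ]                        // ... then the plain driving result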
[Figure 6: Partial Lazy Graph of "exp growth" Small Example – a tree of GSBuild/GSFold nodes rooted at the configuration g(Cons(A(), Nil()), z); at each multi-driving step the alternatives are shown, with only the subgraphs of the first alternatives expanded.]
To illustrate the process on a very simple example, consider the example program from Fig. 7f-g (explained in Sec. 4), but specialized to a smaller input: g(Cons(A, Nil), z). A part of the resulting lazy graph (omitting subgraphs for alternatives other than the first) is shown in Fig. 6. The full lazy graph for the same example is shown in the extended version of this article [11].

• Multi-result driving produces 2 alternatives from the initial expression g(Cons(A, Nil), z): let x0 = A; xs0 = Nil; y0 = z in f(g(xs0, y0)) and f(g(Nil, z)). As mentioned, we further consider only the subgraph for the first alternative, which is the result of a generalization.

• Driving cannot transform any further the subexpressions A, Nil, and z, so they end up as leafs in the lazy graph. Driving the subexpression f(g(xs0, y0)) again produces 2 alternatives: let w0 = g(xs0, y0) in B(w0, w0) and B(g(xs0, y0), g(xs0, y0)).

• The first alternative here has 2 subexpressions:
  – B(w0, w0), where driving of its subexpressions in turn leads to 2 leafs, both w0;
  – g(xs0, y0), where driving must perform a case analysis on xs0, resulting in 2 subgraphs:
    ∗ case xs0 = Nil() : y0, where driving cannot proceed any further;
    ∗ case xs0 = Cons(x0, xs1) : f(g(xs1, y0)) – this expression is a renaming of f(g(xs0, y0)), encountered above, so we end up with a folding node.

We have studied the behavior of the proposed multi-result supercompiler on a few simple examples, with a focus on the resulting program size. A straightforward approach, which works for some of the smaller examples, is to enumerate all resulting configuration graphs and gather statistics from them. For one example – the well-known "KMP test" – this approach turned out to be too time consuming. So, to make it possible to analyze larger examples, we adapted the approach of Grechanik et al. [2], which makes it possible to filter the sequence of configuration graphs produced by MRSC, without explicitly enumerating them. In particular, we have implemented functions to extract:

• the first configuration graph in the sequence (recall that by the ordering of results during driving, it should contain the most generalizations);

• the last one – with the least number of generalizations;

• the graph with the smallest number of nodes;

• the graph with the largest number of nodes.

The implementation of some of these functions is shown in the extended version of this article [11]. These implementations take polynomial time (and often almost linear time in practice) with respect to the size of the lazy graph, while the number of configuration graphs can be exponential with respect to this size. As such, they are key to making the proposed approach tractable on larger examples.
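For illustration, here is a rough sketch (our own, not the actual implementation) of how the minimum-size query can be computed directly on the lazy graph, taking graph size to be the plain node count:

// Minimum configuration-graph size over all graphs denoted by a GraphSet,
// computed without expanding the set; GSNone denotes an empty set of graphs.
let rec minGraphSize (gs: GraphSet) : option<int> =
  match gs with
  | GSNone -> None
  | GSFold _ -> Some 1                // a fold node counts as a single node
  | GSBuild (_, alts) ->
      // One alternative contributes 1 (this node) plus the sizes of all its children.
      let altSize alt =
        let childSizes = List.map minGraphSize alt
        if List.exists Option.isNone childSizes then None
        else Some (1 + List.sumBy Option.get childSizes)
      match List.choose altSize alts with
      | [] -> None
      | sizes -> Some (List.min sizes)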
By comparing the first to the smallest and the largest graph we can see how successful the proposed generalization strategy is in controlling code size. By comparing the last to the first and the smallest graph, we can see the improvements that our approach can achieve with respect to standard supercompilation – as the last configuration graph usually corresponds to the one that a standard positive supercompiler would produce.

After we extract a single configuration graph out of the lazy graph, we can further produce a new program in the object language from this graph. This residualization process involves 2 main steps: 1) Extract a program in a language extended with case and let expressions from the configuration graph. This step also creates recursive function definitions from fold nodes. 2) Remove case and let expressions (by a method similar to lambda lifting) to obtain a program in the original object language. The residualization process also includes several optimizations: a) removing "trivial" let expressions (expressions of the form let x = y in ..., or where the variable is only used once); b) removing duplicated function definitions (a limited form of common subexpression elimination). The results shown in Fig. 8-12 are all produced by applying this residualization process on the given configuration graph. Note, however, that we have decided to compare the sizes of the configuration graphs, and not of the residualized programs, for a couple of reasons:

• The size of the resulting program depends not only on the optimizations performed by supercompilation proper (which are reflected directly in the configuration graph), but also on the additional optimizations performed during residualization. The latter are standard optimizations that can be performed on any program, no matter if it is produced by supercompilation or written by hand.

• As already mentioned, we can efficiently extract a configuration graph of optimum size from the lazy graph. We have no way to do the same with respect to the size of the residual program.

Here we analyze in detail several of the example programs (Fig. 7) showing the most interesting results. The extended version of this article [11] discusses a few additional examples.

• "double append" (Fig. 7a-b) is traditionally used to demonstrate the power of deforestation and supercompilation. It can also be seen as a first step of a proof that list append is associative.

• The "KMP test" (Fig. 7c-d) is another classical example, which demonstrates the power of supercompilation with respect to deforestation and partial evaluation. It involves specializing a sublist predicate to a fixed sublist being searched in an unknown list.

• "eqBool symmetry" (Fig. 7c, 7e) is intended to show that Boolean equality is symmetric.

• "exp growth" (Fig. 7f-g) is an example taken from Sørensen's thesis [15, Example 11.4.1], who attributes it to Sestoft. It is aimed to demonstrate how classical supercompilation can produce output programs, which grow exponentially with respect to the input.

The results of the multi-result supercompilation – using our specific generalization approach – are summarized in Table 1.
We give the configuration graph sizes for each of the four types of results (first, last, minimum/maximum size).

Example            First   Last   Min   Max
eqBool symmetry      16      17    16    30
exp growth           15      37    15    57

Table 1: Configuration graph sizes of the results (first, last, minimum, maximum)

append(Nil, ys) = ys;
append(Cons(x, xs), ys) = Cons(x, append(xs, ys));

(a) List Append Program

append(append(xs, ys), zs)

(b) Double Append

not(True) = False;
not(False) = True;
eqBool(True, b) = b;
eqBool(False, b) = not(b);
match(Nil, ss, op, os) = True;
match(Cons(p, pp), ss, op, os) = matchCons(ss, p, pp, op, os);
matchCons(Nil, p, pp, op, os) = False;
matchCons(Cons(s, ss), p, pp, op, os) = matchHdEq(eqBool(p, s), pp, ss, op, os);
matchHdEq(True, pp, ss, op, os) = match(pp, ss, op, os);
matchHdEq(False, pp, ss, op, os) = next(os, op);
next(Nil, op) = False;
next(Cons(s, ss), op) = match(op, ss, op, ss);
isSublist(p, s) = match(p, s, p, s);

(c) Substring Program

isSublist(Cons(True, Cons(True, Cons(False, Nil))), s)

(d) "KMP Test"

eqBool(eqBool(x, y), eqBool(y, x))

(e) Bool Equality Symmetry

g(Nil, y) = y;
g(Cons(x, xs), y) = f(g(xs, y));
f(w) = B(w, w);

(f) Program Demonstrating Exponential Growth

g(Cons(A, Cons(A, Cons(A, Nil))), z)

(g) Expression Demonstrating Exponential Growth

Figure 7: Example Programs

Several interesting observations arise from analyzing the selected resulting programs themselves:

• In 2 cases ("eqBool symmetry" and "exp growth") the minimum program coincides with the first (most generalizing) one; in 1 case ("double append") – with the last (least generalizing) result. In "KMP test" the difference between the minimum-size and the last program is minimal. This confirms the highly unpredictable impact of driving+generalization on program size.

• Note, however, that "double append" is a bit of an outlier – it results in only 3 programs, one of which (the first) is isomorphic to the input. The smallest one is the expected optimized version, shown in Fig. 8.

• In all cases, the size of the first result is closer to the minimum than to the maximum size. This confirms that our choice of generalization ensures limited growth of the result size.

• The smallest/last "KMP test" graph produces the expected optimal (as execution time) program, as shown in Fig. 9.

• The last "eqBool symmetry" graph produces a program, which can indeed serve as evidence of the symmetry of Boolean equality – Fig. 10.
• The results of "exp growth" are especially interesting in view of our main goal. The last result is the same as produced by Sørensen's supercompiler – B(B(B(z, z), B(z, z)), B(B(z, z), B(z, z))) – clearly suffering from code-size explosion. The minimum-size (and also first) program – shown in Fig. 11 – avoids the pitfall of code-size explosion, thanks to generalization. It has, however, also missed some opportunities for static evaluation. Interestingly, if we analyze the full set of results, there is another graph of size 17 that produces a program, which has eliminated all possible static reductions, while avoiding the risk of code explosion – Fig. 12. Apparently, if we do not want to miss such results, we need a more refined approach for looking for (close to) minimum-size programs. One explanation of this discrepancy is that – as already explained – rather than comparing the sizes of the programs produced by residualizing these graphs, we compare configuration graph sizes. A possible compromise is to study better size measures for configuration graphs, instead of the simple node count we currently use. For example, ignoring unfolding nodes when calculating size can give a better idea of the expected size of the residualized program, as unfolding nodes are skipped during residualization. Another possibility is to find not only (one of) the minimum-size result(s), but the N smallest results (N being an input parameter).

Based on the last observation above, we have implemented modified queries for finding the graph of minimum (and maximum) size, where unfolding nodes are not counted – with very encouraging results:

• for "double append", "KMP test", and "eqBool symmetry" the modified query returns the same optimal programs discussed above, which were also found by the existing queries for minimum or last program;

• for "exp growth", the modified-minimum query again finds the optimal program – shown in Fig. 12 – which was missed by all standard queries.
f0(ys, zs) = f0case0(ys, zs);
f0case0(Nil(), zs) = zs;
f0case0(Cons(x0, xs0), zs) = Cons(x0, f0(xs0, zs));
f(xs, ys, zs) = fcase0(xs, ys, zs);
fcase0(Nil(), ys, zs) = f0(ys, zs);
fcase0(Cons(x00, xs00), ys, zs) = Cons(x00, f(xs00, ys, zs));
expression: f(xs, ys, zs)

Figure 8: Optimized double-append

f1000100(s0, ss1) = f1000100case0(s0, ss1);
f1000100case0(True(), ss1) = f1000100case1(ss1);
f1000100case0(False(), ss1) = f0(ss1);
f1000100case1(Nil(), _) = False();
f1000100case1(Cons(s0, ss0), _) = f1000100case2(s0, s0, ss0);
f1000100case2(True(), s0, ss0) = f1000100(s0, ss0);
f1000100case2(False(), s0, ss0) = True();
f0(s) = f0case0(s);
f0case0(Nil(), _) = False();
f0case0(Cons(s0, ss0), _) = f0case1(s0, ss0);
f0case1(True(), ss0) = f0case2(ss0);
f0case1(False(), ss0) = f0(ss0);
f0case2(Nil(), _) = False();
f0case2(Cons(s0, ss1), _) = f1000100(s0, ss1);
expression: f0(s)

Figure 9: Optimized KMP Test Result

maincase0(True(), y) = maincase2(y);
maincase0(False(), y) = maincase2(y);
maincase2(True(), _) = True();
maincase2(False(), _) = True();
expression: maincase0(x, y)

Figure 10: "eqBool symmetry" Optimal Result

f3(xs0, y0) = f3let0(f3case0(xs0, y0));
f3let0(w0) = B(w0, w0);
f3case0(Nil(), y0) = y0;
f3case0(Cons(x0, xs1), y0) = f3(xs1, y0);
expression: f3(Cons(A(), Cons(A(), Nil())), z)

Figure 11: "exp growth" Minimum-size Result

mainlet1(w0) = B(w0, w0);
expression: mainlet1(mainlet1(B(z, z)))

Figure 12: "exp growth" Optimal Result
The unpredictability of supercompilation with respect to both performance and result size is a well-established issue. Problems with code duplication and result size are discussed by Sørensen [15], for example. Few works directly tackle this problem, however. Bolingbroke et al. [1] study heuristics for improving the general performance of a specific supercompiler, in order to use it as an automatic phase of an optimizing compiler for Haskell. Some of these heuristics concern avoiding code duplication, and as a consequence they may lead to improvements in result size, while still producing faster programs. The key idea is to roll back – discarding some work done by the supercompiler – if the heuristics indicate that this work is not leading to a useful result (a form of generalization). Speculative execution can also help with code size in some instances. One problem with this approach is that it is not clear how to generalize it or apply it to a completely different supercompiler. The main advantage is that by carefully selecting heuristics, suitable for the specific supercompiler, the authors report good results on a number of benchmarks.

Jonsson et al. [5] explicitly address both the issue of code explosion and the related issue of supercompilation time. The main idea is again to discard the result of supercompiling certain program fragments if they do not meet certain usefulness criteria (based on the number of reductions performed by the supercompiler and the resulting code size). We can again consider this a form of generalization. When such generalization happens, however, is based on specific hand-picked heuristics, apparently based on analyzing the results of different test runs.

Grechanik et al. [2] propose a generic framework for building "big-step" multi-result supercompilers, and a way to efficiently extract results satisfying certain criteria. Selecting the smallest result is one of the criteria studied. Optimization of the result size is not a goal of their work, however. The authors have instantiated the framework on a language simulating counter systems, which is not Turing-complete, and thus does not demonstrate some of the complications coming with Turing-complete object languages. The work of Grechanik et al. [2] is most closely related to ours: we re-use the ideas for implementing our multi-result supercompiler and for efficiently filtering its results by criteria. Our main emphasis, however, is on using MRSC together with a generalization strategy, which is explicitly tailored towards avoiding code duplication and – consequently – optimizing result size. We pay much less attention to supercompilation time, as long as it is not unacceptably big even for the small examples we want to analyze.
We have presented a study on the feasibility of controlling result size after supercompilation – based on using multi-result supercompilation coupled with a specific generalization strategy avoiding code duplication. While the idea of multi-result supercompilation is not new, the idea to use it – combined with a specific generalization strategy – for taming code explosion in supercompilation results appears new. The current results of the approach – based on a small set of typical supercompilation examples – are encouraging:

• the smallest configuration graphs we produce do not show exponential growth with respect to the size of the input and typically are much smaller than the largest results;

• often the results of small size (though not necessarily the smallest) also feature a significant number of optimizations, comparable to what a standard classical supercompiler can achieve on the same task.
We have already hinted at some areas for potential improvements of the proposed approach:

• study less conservative definitions of generalization; for example, avoid generalizing expressions, which will not be duplicated (because the corresponding function parameter is not referenced multiple times), or expressions, whose duplication is not critical;

• study definitions of configuration graph size, which more closely match the expected size of the residualized program, to avoid missing interesting results, as was the case with "exp growth".

Clearly the first thing to do, however, is to test the proposed approach on a larger set of different examples. Due to the small number of analyzed examples, we consider the current proposal to be work in progress. An extended set of tests could give more insight on the strengths and weaknesses of the proposed technique, and would likely lead to ideas for further study.

Provided we obtain mostly encouraging results from further testing, the next logical step would be to make the approach more practical:

• make an implementation covering a larger object language, closer to functional languages actually used in practice;

• provide a larger set of functions for quickly filtering useful results.

From a more theoretical perspective, it would be interesting to try to formulate properties of generalization, which can give some upper bounds on the code size of MRSC results. Because of unfolding, which can replace the current configuration with a new one of unrelated size (even if we avoid code duplication at this point), the task is not trivial. On the other hand, at least in the case of our simple object language, we have a fixed list of function definitions, which can give us some bound on the configuration size after unfolding.

Acknowledgments
The author would like to thank the four anonymous reviewers for the helpful suggestions on improving the presentation of this article.
References

[1] Max Bolingbroke & Simon Peyton Jones (2011): Improving supercompilation: tag-bags, rollback, speculation, normalisation, and generalisation. In: ICFP.

[2] Sergei Grechanik, Ilya Klyuchnikov & Sergei Romanenko (2014): Staged Multi-Result Supercompilation: Filtering by Transformation. In Andrei Klimov & Sergei Romanenko, editors: Proceedings of the Fourth International Valentin Turchin Workshop on Metacomputation, University of Pereslavl Publishing House, Pereslavl-Zalessky, Russia, pp. 54–78. Available at http://meta2014.pereslavl.ru/papers/2014_Grechanik_Klyuchnikov_Romanenko__Staged_Multi-Result_Supercompilation__Filtering_by_Transformation.pdf.

[3] G. W. Hamilton (2007): Distillation: Extracting the Essence of Programs. In: Proceedings of the 2007 ACM SIGPLAN Symposium on Partial Evaluation and Semantics-Based Program Manipulation, PEPM '07, Association for Computing Machinery, New York, NY, USA, pp. 61–70, doi:10.1145/1244381.1244391.

[4] Neil D. Jones, Carsten K. Gomard & Peter Sestoft (1993): Partial Evaluation and Automatic Program Generation. Prentice-Hall, Inc., Upper Saddle River, NJ, USA.

[5] Peter A. Jonsson & Johan Nordlander (2011): Taming Code Explosion in Supercompilation. In: Proceedings of the 20th ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation, PEPM '11, Association for Computing Machinery, New York, NY, USA, pp. 33–42, doi:10.1145/1929501.1929507.

[6] Ilya Klyuchnikov & Dimitur Krustev (2014): Supercompilation: Ideas and methods. The Monad Reader.

[7] Ilya Klyuchnikov & Sergei Romanenko (2010): Proving the Equivalence of Higher-Order Terms by Means of Supercompilation. In Amir Pnueli, Irina Virbitskaite & Andrei Voronkov, editors: Perspectives of Systems Informatics: 7th International Andrei Ershov Memorial Conference, PSI 2009, Novosibirsk, Russia, June 15-19, 2009, Revised Papers, Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 193–205, doi:10.1007/978-3-642-11486-1_17.

[8] Ilya Klyuchnikov & Sergei A. Romanenko (2012): Multi-result Supercompilation as Branching Growth of the Penultimate Level in Metasystem Transitions. In Edmund Clarke, Irina Virbitskaite & Andrei Voronkov, editors: Perspectives of Systems Informatics, Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 210–226, doi:10.1007/978-3-642-29709-0_19.

[9] Ilya G. Klyuchnikov & Sergei A. Romanenko (2010): Towards Higher-Level Supercompilation. In A. P. Nemytykh, editor: Proceedings of the Second International Workshop on Metacomputation (META 2010), pp. 82–101.

[10] Ilya G. Klyuchnikov & Sergei A. Romanenko (2012): Formalizing and Implementing Multi-Result Supercompilation. In A. V. Klimov & S. A. Romanenko, editors: Proceedings of the Third International Workshop on Metacomputation (META 2012), pp. 142–164.

[11] Dimitur Krustev (2020): Controlling the Size of Supercompiled Programs using Multi-result Supercompilation. Available at https://arxiv.org/abs/2006.02204.

[12] Dimitur Nikolaev Krustev (2014): An Approach for Modular Verification of Multi-Result Supercompilers. In A. V. Klimov & S. A. Romanenko, editors: Proceedings of the Fourth International Valentin Turchin Workshop on Metacomputation, University of Pereslavl Publishing House, Pereslavl-Zalessky, Russia, pp. 177–193.

[13] Alexei P. Lisitsa & Andrei P. Nemytykh (2017): Verification of Programs via Intermediate Interpretation. Electronic Proceedings in Theoretical Computer Science.

[14] Types and Verification for Infinite State Systems. PhD thesis, Dublin City University, Dublin, Ireland.

[15] M. H. Sørensen (1994): Turchin's Supercompiler Revisited: an Operational Theory of Positive Information Propagation. Master's thesis, Københavns Universitet, Datalogisk Institut.

[16] Morten Heine Sørensen & Robert Glück (1999): Introduction to Supercompilation. In John Hatcliff, Torben Mogensen & Peter Thiemann, editors: Partial Evaluation: Practice and Theory, Lecture Notes in Computer Science