Optimizing Program Size Using Multi-result Supercompilation
Dimitur Nikolaev Krustev
IGE+XAO Balkan, Sofia, Bulgaria
dkrustev@ige-xao.com
Supercompilation is a powerful program transformation technique with numerous interesting applications. Existing methods of supercompilation, however, are often very unpredictable with respect to the size of the resulting programs. We consider an approach for controlling result size, based on a combination of multi-result supercompilation and a specific generalization strategy, which avoids code duplication. The current early experiments with this method show promising results – we can keep the size of the result small, while still performing powerful optimizations.
Supercompilation was invented by Turchin [17] and has found numerous applications, such as program optimization [15, 16, 6], program analysis, software testing, and formal verification [7, 13, 14]. It is closely related to partial evaluation [4] and deforestation. Different extensions of the basic supercompilation approach are also studied, aiming to further increase its power – for example, distillation [3], higher-level supercompilation [9], etc.

Supercompilation performs very powerful program transformations by simulating the actual execution of the input program on a whole set of possible inputs simultaneously. The flip side of this power is that the behavior of supercompilation – with respect to both transformation time and result size – can be very unpredictable. This makes it problematic, for example, to include supercompilation as an optimization step of a standard compiler. Measures have been proposed to make supercompilation more well-behaved, both in execution time and result size [1, 5], while still achieving substantial improvements in program run time. These proposals are all based on a combination of specially crafted and empirically fine-tuned heuristics. The main goal of the present study is to experiment with a more principled approach for finding a better balance between the size and the run-time performance of programs produced by supercompilation. This approach is based on a few key ideas:

• use multi-result supercompilation [8, 10, 2] (often abbreviated as MRSC from now on) to systematically explore a large set of different generalizations during the transformation process, leading to different trade-offs between performed optimizations and code explosion;

• carefully select a generalization scheme, which can – if applied systematically – avoid duplicating code during the supercompilation process itself;

• re-use ideas from Grechanik et al. [2] to compactly represent and efficiently explore the set of programs resulting from multi-result supercompilation.

We outline the main ideas of multi-result supercompilation – as well as the specific approach to its implementation that we use – in Sec. 2. We then describe the main contributions of this study:

• We propose a method of adapting some existing techniques (MRSC with efficient queries over result sets) to solve in a systematic way the problem of code explosion during supercompilation (Sec. 3).

• We define a particular strategy for generalization during MRSC (Sec. 3.3), which:
  – avoids any risk of duplicating code during supercompilation (when applied);
  – avoids unnecessary increase of the search space of possible transformed programs, which MRSC must explore.

• We analyze the performance of the proposed strategy on several simple examples (Sec. 4).
Supercompilation is often defined as a transformation of "configurations" – data structures containing information about the set of states of program execution currently being explored. The configurations are typically produced by a process called "driving", and organized in "configuration trees". Sometimes the current configuration is similar enough to some previous one, and we can "fold" the former to the latter, turning the configuration tree into a "configuration graph". Finally, we can generate a new "residual" program from the configuration graph, where folding typically corresponds to calls to functions introduced by new recursive definitions. To ensure termination of the supercompilation process, a check – usually called a "whistle" – is systematically performed on the sequence of configurations being produced. When this whistle signals a potential risk of non-termination, one possibility to continue the supercompilation process is to perform a "generalization" – replace the current configuration by another, which avoids the non-termination risk, possibly by forgetting some information. The whistle usually marks the current configuration ("lower" in the tree) as risking non-termination with respect to some other configuration produced earlier ("higher" in the tree). Typical positive supercompilers – as described, for example, by Sørensen et al. [16] – make a choice whether to generalize the "lower" or the "upper" configuration.

One of the key insights behind multi-result supercompilation is that the place where the whistle has blown is not always the best place to make a generalization. The proposed solution is radical: do not generalize when the whistle has blown; instead, at any driving step check if suitable generalizations exist and continue driving not only the non-generalized configuration, but also all generalized ones, giving rise to multiple alternative results of driving. This process is illustrated by the example at the end of Sec. 3.3.

Our implementation of multi-result supercompilation mostly follows the same generic framework used in [2, 12]. It offers some important simplifications compared to the original work of Klyuchnikov et al. [8, 10]:

• It is based on "big-step" driving (called so by analogy with big-step operational semantics, as driving a configuration produces the full configuration subtree corresponding to it).

• The whole set of transformed programs is represented compactly in a tree-like data structure, which further permits not only recovering the full set of configuration graphs, but also performing efficiently certain kinds of queries on this set.

The compact representation of the set of graphs is shown in Fig. 1. We use direct excerpts of the F# implementation (available at https://github.com/dkrustev/MRScpOptSize), but hopefully they will be readable by anyone familiar with other functional languages such as OCaml or Haskell. Another important caveat is that we have implemented MRSC for programs in a specific language, so some details will only become clear once we introduce this language.
type MConf = MultiDriveStepResult * Exp

type GraphSet =
  | GSNone
  | GSFold of MConf * int * list<VarName * VarName>
  | GSBuild of MConf * list<list<GraphSet>>

let rec gset2graphs (gs: GraphSet) : seq<MConf * ConfGraph> =
  match gs with
  | GSNone -> Seq.empty
  | GSFold (conf, n, ren) -> Seq.singleton (conf, CGFold (n, ren))
  | GSBuild (conf, alts) ->
    let buildGraph' subGraphs = (conf, buildGraph (snd conf) subGraphs)
    let buildAlt alt =
      Seq.map buildGraph' (Seq.cartesian (Seq.map gset2graphs alt))
    Seq.collect buildAlt alts

Figure 1: Representation and Expansion of Graph Sets
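As a quick illustration (a sketch of our own, not code from the paper) of how such a lazy graph is used, the denoted set of configuration graphs can be expanded on demand:

// Expand the lazy graph and materialize only the first few configuration graphs.
let someGraphs (gs: GraphSet) : list<MConf * ConfGraph> =
  gset2graphs gs |> Seq.truncate 3 |> Seq.toList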
In particular, the configurations (MConf) we use are pairs of a multi-result driving step output and an expression of the language, as described in Sec. 3.3. A language-specific helper function buildGraph builds a configuration graph from a given configuration and subgraphs. Still, the details of the language are not important for understanding the core MRSC algorithm; indeed, such details are successfully abstracted away in [2, 12]. We prefer to show excerpts from our actual implementation for concreteness. The representation from Fig. 1 is termed lazy graph by Grechanik et al. [2], and can be viewed as a domain-specific language (DSL) describing the construction of the complete set of configuration graphs produced by multi-result supercompilation.

• Node GSNone is used when the whistle has blown. It represents an empty set of configuration graphs.

• Node GSFold is used when folding is possible. It gives the relative distance to the upper node to which we fold, plus the renaming (finite mapping of variables to variables), which makes the folded configurations compatible. Note that, following Grechanik et al. [2], we consider only folding to a node on the path from the current node to the root of the graph. This choice enables us to keep the representation of the set of configuration graphs we produce simpler. Also, similar to other positive supercompilers [15, 16, 6], it is only possible to fold to a node, which is a renaming of the current one.

• Node GSBuild is the most complicated one, representing a list of alternative developments (driving or generalization) of the current configuration. Each alternative, in turn, gives rise to a list of new configurations to explore, and hence, to a list of nested graph sets.
The semantics of this DSL is shown in the same Fig. 1 as a function gset2graphs, expanding a GraphSet into a sequence of configuration graphs. Note the use of Seq.cartesian to compose the subgraphs of the graph node of each alternative configuration.
The main MRSC algorithm – the one that builds the graph set of a given initial configuration – is presented with some simplifications as Algorithm 1. We can ignore the details about splitting the configuration history into a local and a global one – they mostly follow established heuristics as in Sørensen et al. [15, 16]. The overall approach is simple: if folding is possible, we produce a fold node and stop pursuing the current configuration. Otherwise we check the whistle – in our case, the same homeomorphic embedding relation, which is used in other positive supercompilers [16]. If it blows, we stop immediately with an empty set of resulting graphs. When there is neither folding nor a whistle, we continue analyzing the execution of the current configuration – based on 2 language-specific functions:

• multiDriveSteps returns a number of alternatives for the current configuration – either a set of new configurations produced by a single step of driving, or by (possibly several different forms of) generalization.

• mdsrSubExps returns – for a given alternative produced by the previous function – the list of sub-configurations (in our case – subexpressions) that must be subjected to further analysis.

The implementation of both functions is described in Sec. 3. Once we have this list of lists of sub-configurations, we simply apply the same algorithm recursively, but with extended history. Readers familiar with the implementation details of other supercompilers are invited to compare them to the simplicity of this MRSC approach.

Algorithm 1 Main MRSC Algorithm

function MRSCPRec(P, l, h, c)
  ▷ P – program (function definitions); l – nesting level;
  ▷ h – history (list of tuples: local/global flag, level, configuration); c – configuration
  if ∃ (_, l′, c′) ∈ h and a renaming ρ such that c = rename(c′, ρ) then
    return GSFold(c, l − l′, ρ)
  else
    rs ← multiDriveSteps(P, c)
    if ∃ (MDSRCases ...) ∈ rs then
      hk ← HEGlobal
      relHist ← [(hk′, l, c) ∈ h | hk′ = HEGlobal]
    else
      hk ← HELocal
      relHist ← takeWhile(λ(hk′, l, c). hk′ = HELocal, h)
    end if
    if ∃ (_, _, c′) ∈ relHist such that c′ ⊴ c then    ▷ ⊴ denotes homeomorphic embedding
      return GSNone
    else
      css ← map(mdsrSubExps, rs)
      h′ ← (hk, l, c) :: h
      return GSBuild(c, map(λ cs. map(λ c. MRSCPRec(P, l + 1, h′, c), cs), css))
    end if
  end if
end function

function MRSCP(P, c)
  return MRSCPRec(P, 0, [], c)
end function

Expressions          e ::= x                          variable
                        |  a(e1, ..., en)             call
Call kinds           a ::= C                          constructor
                        |  f                          function
Patterns             p ::= C(x1, ..., xn)
Function definitions d ::= f(x1, ..., xn) = e         ordinary function
                        |  g(p1, y1, ..., ym) = e1    pattern-matching
                           ...                        function
                           g(pn, y1, ..., ym) = en
Programs             P ::= d1, ..., dn

Figure 2: Object Language Syntax

type DriveStepResult =
  | DSRNone
  | DSRCon of ConName * list<Exp>
  | DSRUnfold of Exp
  | DSRCases of VarName * list<Pattern * Exp>

Figure 3: Result of a Single Step of Driving
The object language we consider is a first-order functional language with ordinary (not pattern-matching) and pattern-matching function definitions. Its syntax is summarized in Fig. 2. A very similar language – with call-by-name semantics – is often used in many introductions to positive supercompilation [15, 16, 6]. A notable restriction in our case is that we omit if-expressions and a built-in generic equality. We use the convention that data constructors always start with an uppercase letter, while function and variable names start with a lowercase one. The patterns of any function definition must be exhaustive, not nested, and non-overlapping. As a technical detail, we do not make a distinction between ordinary and pattern-matching functions at each call site, as this information is uniquely determined by the function definition itself.
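For concreteness, here is a minimal sketch of how the expressions of Fig. 2 could be represented, together with one possible definition of the homeomorphic-embedding whistle used in Algorithm 1. The type and constructor names here are our own assumptions for illustration; the actual implementation may differ:

type VarName = string
type ConName = string
type FunName = string

// Call kinds and expressions, mirroring the grammar of Fig. 2
type CallKind =
  | CKCon of ConName                // constructor C
  | CKFun of FunName                // function f or g
type Exp =
  | EVar of VarName                 // x
  | ECall of CallKind * list<Exp>   // a(e1, ..., en)

// Homeomorphic embedding: e1 is embedded in e2 if e1 can be obtained from e2
// by deleting some subterms (diving) or by matching call heads argument-wise (coupling).
let rec hembed (e1: Exp) (e2: Exp) : bool =
  let diving =
    match e2 with
    | ECall (_, args) -> List.exists (hembed e1) args
    | _ -> false
  let coupling =
    match e1, e2 with
    | EVar _, EVar _ -> true
    | ECall (a1, es1), ECall (a2, es2) ->
        a1 = a2 && List.length es1 = List.length es2 && List.forall2 hembed es1 es2
    | _ -> false
  coupling || diving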
Let us recall what driving looks like for this simple language, in the case of positive supercompilation (which is a simplification of the more general approach pioneered by Turchin [17]). As a technical device, we define a single step of driving, producing a result of type DriveStepResult, as defined in Fig. 3.

• We cannot drive a variable any further: drive⟦x⟧ = DSRNone;

• Driving a constructor results in a constructor node with all arguments available for further driving: drive⟦C(e1, ..., en)⟧ = DSRCon(C, e1, ..., en);

• If we stumble upon a call to an ordinary function, we simply unfold its definition: drive⟦f(e1, ..., en)⟧ = DSRUnfold(e[x1 → e1, ..., xn → en]), where f(x1, ..., xn) = e ∈ P and e[x1 → e1, ..., xn → en] denotes simultaneous substitution;

• The most interesting cases concern a call to a pattern-matching function, as the situation is different depending on the kind of the first argument:

  – drive⟦g(C(e′1, ..., e′m), e1, ..., en)⟧ = DSRUnfold(e[x1 → e′1, ..., xm → e′m, y1 → e1, ..., yn → en]), where g(C(x1, ..., xm), y1, ..., yn) = e ∈ P;
  – drive⟦g(x, e1, ..., en)⟧ = DSRCases(x, propagate(x, p1, (e1, ..., en), e′1), ..., propagate(x, pm, (e1, ..., en), e′m)), where g(p1, y1, ..., yn) = e′1, ..., g(pm, y1, ..., yn) = e′m ∈ P and propagate performs positive information propagation by substituting (a suitable renaming of) pi for x in the corresponding branch i;

  – drive⟦g(f(e′1, ..., e′m), e1, ..., en)⟧ = dsrMap(⟦g(•, e1, ..., en)⟧, drive⟦f(e′1, ..., e′m)⟧), where dsrMap transforms a driving step result by splicing it in an expression with a hole (we implement expressions with a single hole as functions from expressions to expressions).

type ConfGraph =
  | CGLeaf of Exp
  | CGCon of ConName * list<ConfGraph>
  | CGUnfold of ConfGraph
  | CGCases of VarName * list<Pattern * ConfGraph>
  | CGFold of int * list<VarName * VarName>
  | CGLet of list<VarName * ConfGraph> * ConfGraph

Figure 4: Representation of a Configuration Graph

We deliberately omit many low-level details in this description, as they are well-known and can be found in most introductions to positive supercompilation [15, 16, 6]. Using this definition of driving, plus the usual definitions of folding, whistle, and generalization, we can build configuration graphs of the form shown in Fig. 4. Note that we use the same representation of variables inside object-language programs and inside configuration graphs, as no confusions arise (assuming suitable measures for avoiding variable capture).
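The dsrMap helper used in the last driving rule above is not shown in the figures; a rough sketch of it (our assumption, reusing the Exp sketch from earlier and representing the context g(•, e1, ..., en) as a function from expressions to expressions) might be:

// Splice the result of driving an inner call into the surrounding context.
let dsrMap (ctx: Exp -> Exp) (dsr: DriveStepResult) : DriveStepResult =
  match dsr with
  | DSRUnfold e -> DSRUnfold (ctx e)
  | DSRCases (x, branches) -> DSRCases (x, List.map (fun (p, e) -> (p, ctx e)) branches)
  | other -> other   // driving a function call cannot directly yield DSRNone or DSRCon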
As already mentioned, a key difference in multi-result supercompilation is that driving and generalization are grouped together: a multi-driving step can return not one, but several alternative configurations. One of them is typically the result of standard driving, but the others can be different kinds of generalizations. The choice of generalization strategy depends on the intended use of the multi-result supercompiler. In our case, the main goal is to find a program of optimal size among the results. Previous analyses have shown that one of the main reasons for code size explosion in supercompilation is the unrestricted duplication of subexpressions during driving. Of course, sometimes such duplication pays off, as it leads to new opportunities for optimization. But this is not always the case. These observations lead us to consider two guiding principles that should help us attain our goal:

• if standard driving can duplicate existing code, provide also a generalized configuration, where no existing (non-trivial) subexpressions are duplicated (we do not attempt to remove already existing code duplication, only to avoid introducing new duplication);

• if there is no risk of duplicating code, avoid any generalization, as it will be unlikely to help with the size of the result.

To apply these principles, we analyze standard driving, case by case, to see where we need to avoid code duplication by generalization. In order to express generalization as a possible result of a driving step, we extend our representation of a step result – Fig. 5. The same figure shows the implementation of mdsrSubExps that we have encountered earlier. The source function multiDriveSteps will be denoted mrdrive for brevity below.

type MultiDriveStepResult =
  | MDSRLeaf of Exp
  | MDSRCon of ConName * list<Exp>
  | MDSRUnfold of Exp
  | MDSRCases of VarName * list<Pattern * Exp>
  | MDSRLet of list<VarName * Exp> * Exp

let mdsrSubExps (mdsr: MultiDriveStepResult) : list<Exp> =
  match mdsr with
  | MDSRLeaf _ -> []
  | MDSRCon (_, es) -> es
  | MDSRUnfold e -> [e]
  | MDSRCases (_, cases) -> List.map snd cases
  | MDSRLet (binds, e) -> e :: List.map snd binds

Figure 5: One Step of Multi-result Driving

• The variable case is again trivial: mrdrive⟦x⟧ = [MDSRLeaf(x)]. (We use [...; ...; ...] to denote a list of results.)

• Driving a constructor does not duplicate code, so we again make no generalization: mrdrive⟦C(e1, ..., en)⟧ = [MDSRCon(C, e1, ..., en)];

• The unfolding of a call to an ordinary function can produce code duplication, if some arguments appear multiple times in the body of the definition. We conservatively generalize all arguments of the call: mrdrive⟦f(e1, ..., en)⟧ = [MDSRLet(y1 = e1, ..., yn = en, e[x1 → y1, ..., xn → yn]); MDSRUnfold(e[x1 → e1, ..., xn → en])], where y1, ..., yn are fresh and f(x1, ..., xn) = e ∈ P. Notice that we shall always place the generalization result before the driving result in the list. In this way, when we expand the lazy graph using gset2graphs, configuration graphs earlier in the resulting sequence will have more generalizations;

• The case of a pattern-matching call with a known constructor is completely analogous to the previous one: mrdrive⟦g(C(e′1, ..., e′m), e1, ..., en)⟧ = [MDSRLet(u1 = e′1, ..., um = e′m, z1 = e1, ..., zn = en, e[x1 → u1, ..., xm → um, y1 → z1, ..., yn → zn]); MDSRUnfold(e[x1 → e′1, ..., xm → e′m, y1 → e1, ..., yn → en])], where u1, ..., um, z1, ..., zn are fresh and g(C(x1, ..., xm), y1, ..., yn) = e ∈ P;

• When we pattern-match on a variable, information propagation can introduce some code duplication. The code potentially being duplicated, however, is always of the form C(x1, ..., xn). We have currently decided to accept this limited form of potential duplication, without adding a generalization: mrdrive⟦g(x, e1, ..., en)⟧ = [MDSRCases(x, propagate(x, p1, (e1, ..., en), e′1), ..., propagate(x, pm, (e1, ..., en), e′m))], where g(p1, y1, ..., yn) = e′1, ..., g(pm, y1, ..., yn) = e′m ∈ P;

• The case of matching on a function call is perhaps the least obvious. As during normal driving we reuse the result of driving the nested call, it is not clear in advance what it will be. So we prefer to be conservative, and add a full generalization of the outer call here: mrdrive⟦g(f(e′1, ..., e′m), e1, ..., en)⟧ = [MDSRLet(x0 = f(e′1, ..., e′m), x1 = e1, ..., xn = en, g(x0, x1, ..., xn)); mdsrMap(⟦g(•, e1, ..., en)⟧, mrdrive⟦f(e′1, ..., e′m)⟧)], where x0, x1, ..., xn are, as usual, fresh. We leave a more refined generalization treatment for future work.
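To make the third rule above concrete, here is a rough sketch of the corresponding case of mrdrive, again reusing the Exp sketch from earlier; freshVar and substitute are hypothetical helpers, and the real implementation may differ:

// Multi-driving a call f(e1, ..., en) to an ordinary function f(x1, ..., xn) = body:
// first the conservative generalization, then the usual unfolding.
let mrdriveFunCall (defs: Map<FunName, list<VarName> * Exp>)
                   (f: FunName) (args: list<Exp>) : list<MultiDriveStepResult> =
  let (pars, body) = Map.find f defs
  let freshVars = List.map (fun _ -> freshVar ()) pars                     // y1, ..., yn
  let genBody = substitute (List.zip pars (List.map EVar freshVars)) body  // body[xi -> yi]
  let unfolded = substitute (List.zip pars args) body                      // body[xi -> ei]
  [ MDSRLet (List.zip freshVars args, genBody)   // generalization placed first ...
  ; MDSRUnfold unfolded ]                        // ... then the plain driving result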
[Figure 6: Partial Lazy Graph of "exp growth" Small Example – a tree of GSBuild/GSFold nodes rooted at the configuration g(Cons(A(), Nil()), z); at each multi-driving step the alternatives are shown, with only the subgraphs of the first alternatives expanded.]
To illustrate the process on a very simple example, consider the example program from Fig. 7f-g (explained in Sec. 4), but specialized to a smaller input: g(Cons(A, Nil), z). A part of the resulting lazy graph (omitting subgraphs for alternatives other than the first) is shown in Fig. 6. The full lazy graph for the same example is shown in the extended version of this article [11].

• Multi-result driving produces 2 alternatives from the initial expression g(Cons(A, Nil), z): let x0 = A; xs0 = Nil; y0 = z in f(g(xs0, y0)) and f(g(Nil, z)). As mentioned, we further consider only the subgraph for the first alternative, which is the result of a generalization.

• Driving cannot transform any further the subexpressions A, Nil, and z, so they end up as leafs in the lazy graph. Driving the subexpression f(g(xs0, y0)) again produces 2 alternatives: let w0 = g(xs0, y0) in B(w0, w0) and B(g(xs0, y0), g(xs0, y0)).

• The first alternative here has 2 subexpressions:
  – B(w0, w0), where driving of its subexpressions in turn leads to 2 leafs, both w0;
  – g(xs0, y0), where driving must perform a case analysis on xs0, resulting in 2 subgraphs:
    ∗ case xs0 = Nil() : y0, where driving cannot proceed any further;
    ∗ case xs0 = Cons(x0, xs1) : f(g(xs1, y0)) – this expression is a renaming of f(g(xs0, y0)), encountered above, so we end up with a folding node.

We have studied the behavior of the proposed multi-result supercompiler on a few simple examples, with a focus on the resulting program size. A straightforward approach, which works for some of the smaller examples, is to enumerate all resulting configuration graphs and gather statistics from them. For one example – the well-known "KMP test" – this approach turned out to be too time consuming. So, to make it possible to analyze larger examples, we adapted the approach of Grechanik et al. [2], which makes it possible to filter the sequence of configuration graphs produced by MRSC, without explicitly enumerating them. In particular, we have implemented functions to extract:

• the first configuration graph in the sequence (recall that by the ordering of results during driving, it should contain the most generalizations);

• the last one – with the least number of generalizations;

• the graph with the smallest number of nodes;

• the graph with the largest number of nodes.

The implementation of some of these functions is shown in the extended version of this article [11]. These implementations take polynomial time (and often almost linear time in practice) with respect to the size of the lazy graph, while the number of configuration graphs can be exponential with respect to this size. As such, they are key to making the proposed approach tractable on larger examples.
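For illustration, here is a rough sketch (our own, not the actual implementation) of how the minimum-size query can be computed directly on the lazy graph, taking graph size to be the plain node count:

// Minimum configuration-graph size over all graphs denoted by a GraphSet,
// computed without expanding the set; GSNone denotes an empty set of graphs.
let rec minGraphSize (gs: GraphSet) : option<int> =
  match gs with
  | GSNone -> None
  | GSFold _ -> Some 1                // a fold node counts as a single node
  | GSBuild (_, alts) ->
      // One alternative contributes 1 (this node) plus the sizes of all its children.
      let altSize alt =
        let childSizes = List.map minGraphSize alt
        if List.exists Option.isNone childSizes then None
        else Some (1 + List.sumBy Option.get childSizes)
      match List.choose altSize alts with
      | [] -> None
      | sizes -> Some (List.min sizes)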
By comparing the first to the smallest and the largest graph we can see how successful the proposed generalization strategy is in controlling code size. By comparing the last to the first and the smallest graph, we can see the improvements that our approach can achieve with respect to standard supercompilation – as the last configuration graph usually corresponds to the one that a standard positive supercompiler would produce.

After we extract a single configuration graph out of the lazy graph, we can further produce a new program in the object language from this graph. This residualization process involves 2 main steps: 1) Extract a program in a language extended with case and let expressions from the configuration graph. This step also creates recursive function definitions from fold nodes. 2) Remove case and let expressions (by a method similar to lambda lifting) to obtain a program in the original object language. The residualization process also includes several optimizations: a) removing "trivial" let expressions (expressions of the form let x = y in ..., or where the variable is only used once); b) removing duplicated function definitions (a limited form of common subexpression elimination). The results shown in Fig. 8-12 are all produced by applying this residualization process on the given configuration graph. Note, however, that we have decided to compare the sizes of the configuration graphs, and not of the residualized programs, for a couple of reasons:

• The size of the resulting program depends not only on the optimizations performed by supercompilation proper (which are reflected directly in the configuration graph), but also on the additional optimizations performed during residualization. The latter are standard optimizations that can be performed on any program, no matter if it is produced by supercompilation or written by hand.

• As already mentioned, we can efficiently extract a configuration graph of optimum size from the lazy graph. We have no way to do the same with respect to the size of the residual program.

Here we analyze in detail several of the example programs (Fig. 7) showing the most interesting results. The extended version of this article [11] discusses a few additional examples.

• "double append" (Fig. 7a-b) is traditionally used to demonstrate the power of deforestation and supercompilation. It can also be seen as a first step of a proof that list append is associative.

• The "KMP test" (Fig. 7c-d) is another classical example, which demonstrates the power of supercompilation with respect to deforestation and partial evaluation. It involves specializing a sublist predicate to a fixed sublist being searched in an unknown list.

• "eqBool symmetry" (Fig. 7c, 7e) is intended to show that Boolean equality is symmetric.

• "exp growth" (Fig. 7f-g) is an example taken from Sørensen's thesis [15, Example 11.4.1], who attributes it to Sestoft. It is aimed to demonstrate how classical supercompilation can produce output programs, which grow exponentially with respect to the input.

The results of the multi-result supercompilation – using our specific generalization approach – are summarized in Table 1.
We give the configuration graph sizes for each of the four types of results (first, last, minimum/maximum size).

Example            First   Last   Min   Max
eqBool symmetry      16      17    16    30
exp growth           15      37    15    57

Table 1: Configuration graph sizes of the results (first, last, minimum, maximum)

append(Nil, ys) = ys;
append(Cons(x, xs), ys) = Cons(x, append(xs, ys));

(a) List Append Program

append(append(xs, ys), zs)

(b) Double Append

not(True) = False;
not(False) = True;
eqBool(True, b) = b;
eqBool(False, b) = not(b);
match(Nil, ss, op, os) = True;
match(Cons(p, pp), ss, op, os) = matchCons(ss, p, pp, op, os);
matchCons(Nil, p, pp, op, os) = False;
matchCons(Cons(s, ss), p, pp, op, os) = matchHdEq(eqBool(p, s), pp, ss, op, os);
matchHdEq(True, pp, ss, op, os) = match(pp, ss, op, os);
matchHdEq(False, pp, ss, op, os) = next(os, op);
next(Nil, op) = False;
next(Cons(s, ss), op) = match(op, ss, op, ss);
isSublist(p, s) = match(p, s, p, s);

(c) Substring Program

isSublist(Cons(True, Cons(True, Cons(False, Nil))), s)

(d) "KMP Test"

eqBool(eqBool(x, y), eqBool(y, x))

(e) Bool Equality Symmetry

g(Nil, y) = y;
g(Cons(x, xs), y) = f(g(xs, y));
f(w) = B(w, w);

(f) Program Demonstrating Exponential Growth

g(Cons(A, Cons(A, Cons(A, Nil))), z)

(g) Expression Demonstrating Exponential Growth

Figure 7: Example Programs

Several interesting observations arise from analyzing the selected resulting programs themselves:

• In 2 cases ("eqBool symmetry" and "exp growth") the minimum program coincides with the first (most generalizing) one; in 1 case ("double append") – with the last (least generalizing) result. In "KMP test" the difference between the minimum-size and the last program is minimal. This confirms the highly unpredictable impact of driving+generalization on program size.

• Note, however, that "double append" is a bit of an outlier – it results in only 3 programs, one of which (the first) is isomorphic to the input. The smallest one is the expected optimized version, shown in Fig. 8.

• In all cases, the size of the first result is closer to the minimum than to the maximum size. This confirms that our choice of generalization ensures limited growth of the result size.

• The smallest/last "KMP test" graph produces the expected optimal (as execution time) program, as shown in Fig. 9.

• The last "eqBool symmetry" graph produces a program, which can indeed serve as evidence of the symmetry of Boolean equality – Fig. 10.
• The results of "exp growth" are especially interesting in view of our main goal. The last result is the same as produced by Sørensen's supercompiler – B(B(B(z, z), B(z, z)), B(B(z, z), B(z, z))) – clearly suffering from code-size explosion. The minimum-size (and also first) program – shown in Fig. 11 – avoids the pitfall of code-size explosion, thanks to generalization. It has, however, also missed some opportunities for static evaluation. Interestingly, if we analyze the full set of results, there is another graph of size 17 that produces a program, which has eliminated all possible static reductions, while avoiding the risk of code explosion – Fig. 12. Apparently, if we do not want to miss such results, we need a more refined approach for looking for (close to) minimum-size programs. One explanation of this discrepancy is that – as already explained – rather than comparing the sizes of the programs produced by residualizing these graphs, we compare configuration graph sizes. A possible compromise is to study better size measures for configuration graphs, instead of the simple node count we currently use. For example, ignoring unfolding nodes when calculating size can give a better idea of the expected size of the residualized program, as unfolding nodes are skipped during residualization. Another possibility is to find not only (one of) the minimum-size result(s), but the N smallest results (N being an input parameter).

Based on the last observation above, we have implemented modified queries for finding the graph of minimum (and maximum) size, where unfolding nodes are not counted – with very encouraging results:

• for "double append", "KMP test", and "eqBool symmetry" the modified query returns the same optimal programs discussed above, which were also found by the existing queries for minimum or last program;

• for "exp growth", the modified-minimum query again finds the optimal program – shown in Fig. 12 – which was missed by all standard queries.
f0(ys, zs) = f0case0(ys, zs);
f0case0(Nil(), zs) = zs;
f0case0(Cons(x0, xs0), zs) = Cons(x0, f0(xs0, zs));
f(xs, ys, zs) = fcase0(xs, ys, zs);
fcase0(Nil(), ys, zs) = f0(ys, zs);
fcase0(Cons(x00, xs00), ys, zs) = Cons(x00, f(xs00, ys, zs));
expression: f(xs, ys, zs)

Figure 8: Optimized double-append

f1000100(s0, ss1) = f1000100case0(s0, ss1);
f1000100case0(True(), ss1) = f1000100case1(ss1);
f1000100case0(False(), ss1) = f0(ss1);
f1000100case1(Nil(), _) = False();
f1000100case1(Cons(s0, ss0), _) = f1000100case2(s0, s0, ss0);
f1000100case2(True(), s0, ss0) = f1000100(s0, ss0);
f1000100case2(False(), s0, ss0) = True();
f0(s) = f0case0(s);
f0case0(Nil(), _) = False();
f0case0(Cons(s0, ss0), _) = f0case1(s0, ss0);
f0case1(True(), ss0) = f0case2(ss0);
f0case1(False(), ss0) = f0(ss0);
f0case2(Nil(), _) = False();
f0case2(Cons(s0, ss1), _) = f1000100(s0, ss1);
expression: f0(s)

Figure 9: Optimized KMP Test Result

maincase0(True(), y) = maincase2(y);
maincase0(False(), y) = maincase2(y);
maincase2(True(), _) = True();
maincase2(False(), _) = True();
expression: maincase0(x, y)

Figure 10: "eqBool symmetry" Optimal Result

f3(xs0, y0) = f3let0(f3case0(xs0, y0));
f3let0(w0) = B(w0, w0);
f3case0(Nil(), y0) = y0;
f3case0(Cons(x0, xs1), y0) = f3(xs1, y0);
expression: f3(Cons(A(), Cons(A(), Nil())), z)

Figure 11: "exp growth" Minimum-size Result

mainlet1(w0) = B(w0, w0);
expression: mainlet1(mainlet1(B(z, z)))

Figure 12: "exp growth" Optimal Result
The unpredictability of supercompilation with respect to both performance and result size is a well-established issue. Problems with code duplication and result size are discussed by Sørensen [15], for example. Few works directly tackle this problem, however. Bolingbroke et al. [1] study heuristics for improving the general performance of a specific supercompiler, in order to use it as an automatic phase of an optimizing compiler for Haskell. Some of these heuristics concern avoiding code duplication, and as a consequence they may lead to improvements in result size, while still producing faster programs. The key idea is to roll back – discarding some work done by the supercompiler – if the heuristics indicate that this work is not leading to a useful result (a form of generalization). Speculative execution can also help with code size in some instances. One problem with this approach is that it is not clear how to generalize it or apply it to a completely different supercompiler. The main advantage is that by carefully selecting heuristics, suitable for the specific supercompiler, the authors report good results on a number of benchmarks.

Jonsson et al. [5] explicitly address both the issue of code explosion and the related issue of supercompilation time. The main idea is again to discard the result of supercompiling certain program fragments if they do not meet certain usefulness criteria (based on the number of reductions performed by the supercompiler and the resulting code size). We can again consider this a form of generalization. When such generalization happens, however, is based on specific hand-picked heuristics, apparently based on analyzing the results of different test runs.

Grechanik et al. [2] propose a generic framework for building "big-step" multi-result supercompilers, and a way to efficiently extract results satisfying certain criteria. Selecting the smallest result is one of the criteria studied. Optimization of the result size is not a goal of their work, however. The authors have instantiated the framework on a language simulating counter systems, which is not Turing-complete, and thus does not demonstrate some of the complications coming with Turing-complete object languages. The work of Grechanik et al. [2] is most closely related to ours: we re-use the ideas for implementing our multi-result supercompiler and for efficiently filtering its results by criteria. Our main emphasis, however, is on using MRSC together with a generalization strategy, which is explicitly tailored towards avoiding code duplication and – consequently – optimizing result size. We pay much less attention to supercompilation time, as long as it is not unacceptably big even for the small examples we want to analyze.
We have presented a study on the feasibility of controlling result size after supercompilation – based on using multi-result supercompilation coupled with a specific generalization strategy avoiding code duplication. While the idea of multi-result supercompilation is not new, the idea to use it – combined with a specific generalization strategy – for taming code explosion in supercompilation results appears new. The current results of the approach – based on a small set of typical supercompilation examples – are encouraging:

• the smallest configuration graphs we produce do not show exponential growth with respect to the size of the input and typically are much smaller than the largest results;

• often the results of small size (though not necessarily the smallest) also feature a significant number of optimizations, comparable to what a standard classical supercompiler can achieve on the same task.
We have already hinted at some areas for potential improvements of the proposed approach:

• study less conservative definitions of generalization; for example, avoid generalizing expressions, which will not be duplicated (because the corresponding function parameter is not referenced multiple times), or expressions, whose duplication is not critical;

• study definitions of configuration graph size, which more closely match the expected size of the residualized program, to avoid missing interesting results, as was the case with "exp growth".

Clearly the first thing to do, however, is to test the proposed approach on a larger set of different examples. Due to the small number of analyzed examples, we consider the current proposal to be work in progress. An extended set of tests could give more insight on the strengths and weaknesses of the proposed technique, and would likely lead to ideas for further study.

Provided we obtain mostly encouraging results from further testing, the next logical step would be to make the approach more practical:

• make an implementation covering a larger object language, closer to functional languages actually used in practice;

• provide a larger set of functions for quickly filtering useful results.

From a more theoretical perspective, it would be interesting to try to formulate properties of generalization, which can give some upper bounds on the code size of MRSC results. Because of unfolding, which can replace the current configuration with a new one of unrelated size (even if we avoid code duplication at this point), the task is not trivial. On the other hand, at least in the case of our simple object language, we have a fixed list of function definitions, which can give us some bound on the configuration size after unfolding.

Acknowledgments
The author would like to thank the four anonymous reviewers for the helpful suggestions on improving the presentation of this article.
References

[1] Max Bolingbroke & Simon Peyton Jones (2011): Improving supercompilation: tag-bags, rollback, speculation, normalisation, and generalisation. In: ICFP.

[2] Sergei Grechanik, Ilya Klyuchnikov & Sergei Romanenko (2014): Staged Multi-Result Supercompilation: Filtering by Transformation. In Andrei Klimov & Sergei Romanenko, editors: Proceedings of the Fourth International Valentin Turchin Workshop on Metacomputation, University of Pereslavl Publishing House, Pereslavl-Zalessky, Russia, pp. 54–78. Available at http://meta2014.pereslavl.ru/papers/2014_Grechanik_Klyuchnikov_Romanenko__Staged_Multi-Result_Supercompilation__Filtering_by_Transformation.pdf.

[3] G. W. Hamilton (2007): Distillation: Extracting the Essence of Programs. In: Proceedings of the 2007 ACM SIGPLAN Symposium on Partial Evaluation and Semantics-Based Program Manipulation, PEPM '07, Association for Computing Machinery, New York, NY, USA, pp. 61–70, doi:10.1145/1244381.1244391.

[4] Neil D. Jones, Carsten K. Gomard & Peter Sestoft (1993): Partial Evaluation and Automatic Program Generation. Prentice-Hall, Inc., Upper Saddle River, NJ, USA.

[5] Peter A. Jonsson & Johan Nordlander (2011): Taming Code Explosion in Supercompilation. In: Proceedings of the 20th ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation, PEPM '11, Association for Computing Machinery, New York, NY, USA, pp. 33–42, doi:10.1145/1929501.1929507.

[6] Ilya Klyuchnikov & Dimitur Krustev (2014): Supercompilation: Ideas and methods. The Monad Reader.

[7] Ilya Klyuchnikov & Sergei Romanenko (2010): Proving the Equivalence of Higher-Order Terms by Means of Supercompilation. In Amir Pnueli, Irina Virbitskaite & Andrei Voronkov, editors: Perspectives of Systems Informatics: 7th International Andrei Ershov Memorial Conference, PSI 2009, Novosibirsk, Russia, June 15-19, 2009, Revised Papers, Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 193–205, doi:10.1007/978-3-642-11486-1_17.

[8] Ilya Klyuchnikov & Sergei A. Romanenko (2012): Multi-result Supercompilation as Branching Growth of the Penultimate Level in Metasystem Transitions. In Edmund Clarke, Irina Virbitskaite & Andrei Voronkov, editors: Perspectives of Systems Informatics, Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 210–226, doi:10.1007/978-3-642-29709-0_19.

[9] Ilya G. Klyuchnikov & Sergei A. Romanenko (2010): Towards Higher-Level Supercompilation. In A. P. Nemytykh, editor: Proceedings of the Second International Workshop on Metacomputation (META 2010), pp. 82–101.

[10] Ilya G. Klyuchnikov & Sergei A. Romanenko (2012): Formalizing and Implementing Multi-Result Supercompilation. In A. V. Klimov & S. A. Romanenko, editors: Proceedings of the Third International Workshop on Metacomputation (META 2012), pp. 142–164.

[11] Dimitur Krustev (2020): Controlling the Size of Supercompiled Programs using Multi-result Supercompilation. Available at https://arxiv.org/abs/2006.02204.

[12] Dimitur Nikolaev Krustev (2014): An Approach for Modular Verification of Multi-Result Supercompilers. In A. V. Klimov & S. A. Romanenko, editors: Proceedings of the Fourth International Valentin Turchin Workshop on Metacomputation, University of Pereslavl Publishing House, Pereslavl-Zalessky, Russia, pp. 177–193.

[13] Alexei P. Lisitsa & Andrei P. Nemytykh (2017): Verification of Programs via Intermediate Interpretation. Electronic Proceedings in Theoretical Computer Science.

[14] Types and Verification for Infinite State Systems. PhD thesis, Dublin City University, Dublin, Ireland.

[15] M. H. Sørensen (1994): Turchin's Supercompiler Revisited: an Operational Theory of Positive Information Propagation. Master's thesis, Københavns Universitet, Datalogisk Institut.

[16] Morten Heine Sørensen & Robert Glück (1999): Introduction to Supercompilation. In John Hatcliff, Torben Mogensen & Peter Thiemann, editors: Partial Evaluation: Practice and Theory, Lecture Notes in Computer Science