Automatic Synthesis of Parallel and Distributed Unix Commands with KumQuat
Nikos Vasilakis∗ (CSAIL, MIT, USA; [email protected])
Jiasi Shen∗ (CSAIL, MIT, USA; [email protected])
Martin Rinard (CSAIL, MIT, USA; [email protected])

Conference'17, July 2017, Washington, DC, USA
Abstract
We present KumQuat, a system for automatically synthesizing parallel and distributed versions of Unix shell commands. KumQuat follows a divide-and-conquer approach, decomposing commands into (i) a parallel mapper that applies the original command to produce partial results, and (ii) an ordered combiner that combines the partial results into the final output. KumQuat synthesizes the combiner by applying repeated rounds of exploration; at each round, it compares the results of the synthesized program with those from the sequential program to discard invalid candidates. A series of refinements improve the performance of both the synthesis component and the resulting synthesized programs. For 98.2% of Unix commands from real pipelines, KumQuat either synthesizes a combiner (92.2%) or reports that a combiner is not synthesizable (7.8%), offering an average speedup of 7.8× for the parallel version and 3.8× for the distributed version.

The Unix shell offers a convenient way to write many common stream processing computations. The shell comes with a wide range of useful commands, which can be composed to obtain useful computations. Unix shell commands typically execute sequentially on a single machine. This fact leaves unexploited the significant performance benefits available via data parallelism (executing the command in parallel on different parts of the input stream). This model of computation also does not support computations that operate on large datasets available only on distributed computing platforms.
KumQuat: We present KumQuat, a new system for automatically synthesizing parallel and distributed versions of Unix shell commands. KumQuat uses active learning [7]: it repeatedly feeds the computation selected inputs, observes the resulting outputs, then repeats the process to infer key properties about the behavior of the computation and eventually generate a parallel and/or distributed version. KumQuat works with commands that can be expressed as divide-and-conquer computations with two phases: the first executes the original, unmodified command on parts of the input; the second combines the partial results from the first phase to obtain the final output. There is no requirement that the actual internal implementation of the command be structured as a divide-and-conquer computation; the requirement is instead only that the computation it implements can be expressed in this way. To automate the generation of parallel and/or distributed versions, KumQuat automatically synthesizes the combine operators required to implement the second phase.

The resulting (automatically generated) parallel computation executes directly in the same environment and with the same program and data locations as the original sequential command. This version is suitable, for example, for increasing the performance of local computations that do not require the heavyweight resources of a distributed computing infrastructure. The current KumQuat implementation also leverages Hadoop's streaming API, as well as custom data and program transformations, to produce distributed versions that contain synthesized Hadoop-ready code.

∗ The two marked authors contributed equally to the paper.
Results: Applied to 115 commands, KumQuat either synthesizes a combiner (106/115, 92.2%) or reports that a combiner is not synthesizable (9/115, 7.8%). For 85/106 of synthesizable commands (80.2%), KumQuat synthesizes a combiner within 35 seconds, checking candidate combiners at a rate of 1.3K expressions per second. Compared to KumQuat's baseline synthesis, refinements lead to significant improvements: from type refinements, from parallelism, and from refinements targeting input generation. On a 64-core machine, speedup due to parallel execution averages 22.6× for the data parallel version of the command and 7.8× after applying the combiner. On a 10-machine cluster, the speedup due to distributed execution averages 3.8×. We also observe some I/O-bound commands that incur a slowdown. We note that in many cases the goal of the distributed execution is to support computations over data that does not fit in a single computer (or cannot be moved off of the multiple computers in the distributed platform). We also note that it is possible to further improve performance by applying an optimization that eliminates several combiners between pipeline stages [44]. With this optimization, which can eliminate 158 out of the 251 generated combiners, experimental results show that we can anticipate full-pipeline speedups that average 6.02× on a 64-core machine [44].

Contributions: We make the following contributions:

• Algorithm: We present a new algorithm that uses active learning to automatically synthesize combiners for parallel and distributed versions of Unix commands. The resulting synthesized combiners enable the automatic generation of parallel and distributed versions of standard sequential Unix commands.
• Domain-Specific Language: We present a domain-specific language for combiner operators. This language supports both the class of combiner operators relevant to this domain and the efficient synthesis algorithm for automatically generating these combiner operators.
• Transformations: We present program and data transformations that augment inputs and outputs for Unix commands with identifiers that enable combiners to correctly combine multiple parallel input streams and successfully reconstruct correct ordered output streams.
• System: We present KumQuat, a system that uses the synthesized combiner operators to automatically generate divide-and-conquer parallel and distributed versions of Unix commands.
• Experimental Results: We present experimental results that characterize the effectiveness of KumQuat on a set of benchmark commands. The results show that KumQuat can synthesize efficient combiners that improve the performance of popular commands by an order of magnitude in under a minute.
Structure: The paper starts with an example illustrating how KumQuat automatically generates parallel and distributed versions of a Unix pipeline 𝑝 (§2). It then presents the core KumQuat design (§3) and several performance and correctness refinements that enhance its effectiveness (§4). It then presents KumQuat's implementation (§5) and evaluation (§6), compares with previous work (§7), and concludes (§8).

We next illustrate how KumQuat exploits data parallelism available in commands from the following classic Unix script, which computes term frequencies in a collection of input documents [3]:

    cat * | tr A-Z a-z | tr -cs a-z '\n' | sort | uniq -c | sort -rn | head -5 > out
Parallelizing tr: The first processing stage, command tr A-Z a-z, transliterates upper- to lower-case characters:

    kq1 tr A-Z a-z

KumQuat starts by generating and feeding some random inputs to the tr command. It observes the (types of the) tr command's outputs and attempts to extract a coarse, high-level specification 𝜎 for the component's interface, for example, that the result contains numerical elements.

Fig. 1. High-level schematic.
KumQuat infers and generates parallel and distributed versions of a sequential command. The original sequential command is treated as a black box (left). The generated version internally leverages the black-box command to implement most of the functionality, but synthesizes an appropriate combiner (right) by studying the input-output behavior of both versions.
KumQuat then creates a lightweight scaffolding infrastructure such as the one presented in Fig. 1. The infrastructure places two copies of the tr command next to each other, ready to receive input and execute in parallel. A final combiner function, marked in Fig. 1 with a circle, collects the results and attempts to merge them in a meaningful way. The combiner is based on 𝜎, with KumQuat focusing on collapsing the search space as quickly as possible. With the starting sequential setup and the current parallel setup, KumQuat infers the combiner function by choosing an input, feeding it to both setups, and comparing the outputs. It gives the sequential version the entire input, which is split into two chunks and fed to commands in the parallel setup. To synthesize the combiner, KumQuat attempts several candidate operators, e.g., list concatenation (++), integer addition (+), etc., as well as compositions of these operators. Comparing the outputs from the sequential and parallel setups, KumQuat refines its selection of both the terms forming the combiner and the subsequent inputs. If the results do not match, KumQuat discards the invalid candidate. If the results match, KumQuat continues to generate more inputs that may distinguish invalid candidates. When KumQuat has obtained enough information from these executions, it generates the parallel and distributed versions.

Parallel Version: The generated parallel version executes without any modifications to the surrounding environment or any additional runtime support. It executes with the same shell, within the same environment, and with the same program and data locations as the original command. It is therefore suitable for increasing the performance of commands that operate on inputs that fit within the user's computer and do not require the heavyweight resources of a large distributed computing infrastructure.
Because the parallel version operates with no distributed processing overheads (such as startup overheads and overheads associated with moving inputs and outputs into and out of the distributed infrastructure), KumQuat's active learning and generation works with the parallel version. The generated parallel command is written in tr.par.sh, a script whose contents are available for inspection by the developer and amount to the following (we have simplified the script for clarity, but not significantly):

    $C <( cat $IN1 | $M ) <( cat $IN2 | $M )
This generated script works with two processors. Variables IN1 and IN2 point to the two input halves, M points to tr A-Z a-z, and C points to the synthesized combiner command implementing (++). The combiner is a program expressed in KumQuat's DSL embedded in Python:

    import sys, os, functools, utils, kq
    def comb(a, b):
        return kq.concat(a, b)
    utils.help()
    s = functools.reduce(comb, utils.read(), [])
    utils.out("".join(s))

In practice, the generated program also depends on several parameters such as the number of available processors and the input file sizes. A larger number of available processors leads to a corresponding number of <( ... ) constructs and associated input files. If the original tr A-Z a-z command received its input from a file, then KumQuat splits the input and populates the IN* variables; otherwise, these variables point to named pipes (FIFOs) in the file system.
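The generated combiner relies on the paper's `utils` and `kq` helper modules, which are not shown in full. The sketch below is a self-contained analogue (the helper-free I/O and the `combine` wrapper are assumptions) of the fold that the generated script performs over ordered partial results:

```python
import functools

# Self-contained analogue of the generated combiner script (a sketch;
# the real script reads the partial outputs through the paper's
# `utils`/`kq` helpers, which are elided here).

def comb(a: str, b: str) -> str:
    # Synthesized operator for `tr A-Z a-z`: string concatenation (++).
    return a + b

def combine(partial_outputs):
    # Fold the ordered partial results into the final output, mirroring
    # `functools.reduce(comb, utils.read(), [])` in the generated script.
    return functools.reduce(comb, partial_outputs, "")

# Two input chunks, each already processed by a parallel `tr` instance:
print(combine(["hello wo", "rld\n"]))
```

Because (++) is associative, the same fold works unchanged for any number of parallel instances, not just two.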
Distributed Version: KumQuat also generates a distributed, fault-tolerant version; the current implementation synthesizes Hadoop-ready code. It leverages Hadoop's streaming API [17], which supports mapper and reducer functions in languages other than Java: commands may be written in a plethora of languages, and the synthesized combiners use Python. KumQuat's mapper and reducer functions use custom data and program transformations to retrofit order on Hadoop's unordered reducer invocation, namely, by introducing additional metadata and value wrapping-unwrapping pairs. The core of the generated code looks like this:

    hadoop jar $HLib/hadoop-streaming-*.jar \
        -input $IN -output $OUT \
        -mapper $M -combiner $S -reducer $C
Variables point to similar values as in the parallel case, except for S, which points to a custom Hadoop shuffler (§5). Hadoop takes care of partitioning and replicating the input files and intermediary results, as well as the communication between different parts of the program. The core of the generated functionality, the map and combiner functions, is similar between the parallel and distributed versions. Where they differ is the scaffolding infrastructure built around these two functions (not shown here): the scaffolding infrastructure for the distributed program is significantly simpler than for the parallel version. This simplicity, somewhat counter-intuitive as distribution subsumes parallelism, arises because the parallel version has to provide explicit support for input partitioning, scheduling, and communication, which in the distributed version are taken care of directly by Hadoop.

Decomposing the Pipeline: Our example Unix script features several other commands than just tr. Rather than invoking KumQuat for every command, a developer can simply pass the script to KumQuat as follows:

    kq -t 200 -p p1.sh

KumQuat statically parses and rewrites the script, prefixing each command with a new kq1 command, effectively converting each command and its arguments into arguments of kq1. This simple rewriting serves a dual purpose. The first is to decompose a single complex pipeline into multiple stages, each significantly more tractable than the original pipeline, without any manual effort from the developer. By injecting active-learning monitors at the boundaries of each command, KumQuat interposes on each command's input-output pairs, generating the command in the way described earlier. The second purpose is to deal with runtime expansion of parameters such as *, ~, and $(). Expansion is important, because KumQuat needs access to a command's flags in order to invoke it correctly, and challenging, because it may require full evaluation.
By prefixing a command and its flags with a command, KumQuat effectively inverts the order of evaluation: parameters are expanded to their full extent before active learning is run. After parallelizing the commands in the script, KumQuat generates a data parallel version that pipelines the execution of the data parallel commands, with each data parallel command followed by a combiner that produces the input for the next command. It is often possible for KumQuat to further enhance the parallelism by removing combiners between adjacent parallel commands in the pipeline, with the output from each parallel instance of the first command comprising the input for the corresponding parallel instance of the second command. We next present the core system design of KumQuat.
In Unix, shell commands are abstract processes that operate on data streams. Streams, in combination with optional arguments and carefully designed default values informed by practical use, are partly responsible for much of the ability of shell commands and pipelines to express powerful computations with little code. Many expressions are simplified because commands and pipelines are designed to operate on streams while maintaining minimal, if any, state on the side. Commands often operate on a single element (line) or on pairs of adjacent lines. KumQuat views streams as ordered and finite, and processes as deterministic and monotonic. These assumptions simplify KumQuat's underlying theoretical model. A stream can be represented as a finite sequence of elements. The Unix stream abstraction (and, more generally, the language of the shell) is weakly uni-typed, representing all data as lines of text, including numbers, booleans, and other primitive types. Thus, at the core, KumQuat models the base type of each element as String, the newline character as the element separator, and the EOF condition as the base nil element. While the Unix shell does not explicitly specify many other types, individual elements or results can be encoded using a variety of data types. For example, wc's result
729 5435 42860, counting lines, words, and bytes, can be viewed as a 3-tuple of the form (Int, Int, Int). Processes can be thought of as pure functions operating on such lists. For example, the tr command, transliterating characters, can be viewed as a function with type [String] → [String], whereas wc -l can be viewed as a function with type [String] → Int. Some commands are higher-order functions: xargs -n 𝜈 is a command that takes a command and provides it as input 𝜈 elements from the input stream; time takes a command and wraps its results with timing metadata.

KumQuat's underlying idea is to decompose the parallel version of a program 𝑓 into two parts. The first part is comprised of parallel applications of a mapping function 𝑚 (the mapper), followed by the second part, the application of a reduction function 𝑔 (the combiner). Applications of 𝑚 (and, often, 𝑔) are commutative and associative, which means that their applications on partial input can execute in parallel (this does not mean, however, that 𝑚 and 𝑔 can be interleaved, as the two cannot be assumed to be commutative or associative with each other). One key enabler behind KumQuat is to assume that the mapper 𝑚 is 𝑓 itself.

For clarity of exposition, the following discussion focuses on 2-ary combiners, i.e., ones with two input arguments, but the analysis generalizes straightforwardly to combiners of any arity (§6). For 2-ary combiners, the goal is to identify 𝑔 such that the result of a command 𝑓 over a stream (𝑖₁ ++ 𝑖₂) that can be broken into two parts 𝑖₁ and 𝑖₂ is the same as independent piecewise applications of 𝑓 over 𝑖₁ and 𝑖₂ whose results are combined by a function 𝑔. That is:

    𝑓(𝑖₁ ++ 𝑖₂) = 𝑔(𝑓(𝑖₁), 𝑓(𝑖₂))    (1)

For classes where this transformation is possible, the goal becomes to identify and synthesize the function 𝑔. Commands that are part of a pipeline are often pure functions over their input stream and do not feature context-dependent semantics. KumQuat's target data-intensive Unix commands and pipelines, operating as pure functions over streams carrying large amounts of data, have this property. KumQuat does not target commands that may mutate the pipeline context (which is naturally rare in Unix's task-parallel pipelines). An important invariant is that the type of the outputs of 𝑓 and 𝑔 must be the same.
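Eq. 1 can be checked concretely by modeling a command as a pure function over its input stream. The sketch below models `wc -l` as a line count (an illustration only; KumQuat's actual checking harness invokes the real command as a black box):

```python
# Model `wc -l` as a pure function f over its input stream and check
# Eq. 1 with the candidate combiner g = integer addition.

def f(stream: str) -> int:        # models `wc -l`: count lines
    return stream.count("\n")

def g(o1: int, o2: int) -> int:   # candidate combiner: (+)
    return o1 + o2

i1, i2 = "a\nb\n", "c\n"
assert f(i1 + i2) == g(f(i1), f(i2))   # Eq. 1 holds for this input pair

# A wrong candidate, e.g. one that returns only its first argument,
# is eliminated by the same comparison:
assert f(i1 + i2) != (lambda o1, o2: o1)(f(i1), f(i2))
```

The second assertion is exactly the kind of mismatch the synthesis loop uses to discard invalid candidates.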
If Γ ⊢ 𝑓 : 𝑇 → 𝑇′ then, by construction, Γ ⊢ 𝑔 : 𝑇′ → 𝑇′ → 𝑇′; that is, 𝑔 is a special type of combiner whose input and output types are the same. When operating on Unix's weakly typed streams of type [String], this invariant aids structural decomposition. Examples of type information include that wc has spaces between its results and that its values are numerical. As seen next, this property is used by the definition of the language (§3.3) describing combiners: all constructs are designed to produce outputs that have the same formatting as inputs.

To capture the space of possible combiners, KumQuat defines and uses a domain-specific language (DSL) presented in Fig. 2. A program in KumQuat's DSL is an expression, which is a binary operation with parameter variables 𝑥₁ and 𝑥₂. Binary operations include numeric operations (num), string concatenations (concat), selections (first and second), command executions (rerun and merge), and delimiter-based composite operations (unwrap-f, unwrap-b, fuse, offset, stitch, and stitch-ht). These operators are defined with a variety of numeric operators and delimiters. The comparator is a set of known flags specific to the current command 𝑓.

    𝑒 ∈ Expr := 𝑏 𝑥₁ 𝑥₂
    𝑏, 𝑏₁, 𝑏₂ ∈ BinOp := num 𝑛 | concat | concat 𝑑 | first | second
                        | rerun | merge 𝑐 | unwrap-f 𝑑 𝑏 | unwrap-b 𝑑 𝑏
                        | fuse 𝑑 𝑏 | offset 𝑑₁ 𝑑₂ 𝑏₁ 𝑏₂
                        | stitch 𝑑 𝑏₁ 𝑏₂ | stitch-ht 𝑑₁ 𝑑₂ 𝑏₁ 𝑏₂ 𝑏₃
    𝑛 ∈ NumOp := + | − | × | / | %
    𝑑, 𝑑₁, 𝑑₂ ∈ Delim := '\n' | '\t' | ' ' | ',' | '_' | ':'
    𝑐 ∈ Comparator := (flags specific to the current command 𝑓)

Fig. 2. The DSL captures the space of synthesizable combiners.

Figure 3 presents the big-step execution semantics for the DSL. An environment 𝜎 maps variable names to their values. In our usage scenarios, variables a and b hold the outputs of executing command 𝑓 on the first and the second input streams, respectively. In other words, given 𝜎 with information of the input streams 𝑖₁ and 𝑖₂, 𝜎(a) evaluates to 𝑓(𝑖₁) and 𝜎(b) evaluates to 𝑓(𝑖₂). The transition function ⇒ maps a DSL expression to its output value. Errors while evaluating a combiner 𝑔 would imply that 𝑔 does not always satisfy Eq. 1 for command 𝑓. strToInt converts a string into an integer. intToStr converts an integer into a string. ++ concatenates two strings. runCmd executes command 𝑓. unixMerge takes a comparator flag and two strings, then uses the flag to execute the sort -m command to merge the two strings. delFront and delBack each takes a delimiter and a string.
delFront removes leadingdelimiters from the string, while delBack removes trailingdelimiters. nil denotes an empty string. splitFirst , splitLast ,and splitLastNonempty each takes a delimiter and a string. utomatic Synthesis of Parallel and Distributed Unix Commands with KumQuat Conference’17, July 2017, Washington, DC, USA 𝜎 ⊢ 𝑥 = ⇒ 𝜎 ( 𝑥 ) 𝜎 ⊢ 𝑥 = ⇒ 𝑣 𝜎 ⊢ 𝑥 = ⇒ 𝑣 𝑏 𝑣 𝑣 = ⇒ 𝑒 𝑣𝜎 ⊢ 𝑏 𝑥 𝑥 = ⇒ 𝑣𝑣 ′ = strToInt 𝑣 𝑣 ′ = strToInt 𝑣 ( num 𝑛 ) 𝑣 𝑣 = ⇒ 𝑒 intToStr ( 𝑛 𝑣 ′ 𝑣 ′ ) concat 𝑣 𝑣 = ⇒ 𝑒 𝑣 ++ 𝑣 ( concat 𝑑 ) 𝑣 𝑣 = ⇒ 𝑒 𝑣 ++ 𝑑 ++ 𝑣 first 𝑣 𝑣 = ⇒ 𝑒 𝑣 second 𝑣 𝑣 = ⇒ 𝑒 𝑣 𝑣 = runCmd ( 𝑣 ++ 𝑣 ) rerun 𝑣 𝑣 = ⇒ 𝑒 𝑣𝑣 = unixMerge 𝑐 𝑣 𝑣 ( merge 𝑐 ) 𝑣 𝑣 = ⇒ 𝑒 𝑣𝑏 ( delFront 𝑑 𝑣 ) ( delFront 𝑑 𝑣 ) = ⇒ 𝑒 𝑣 ( unwrap - f 𝑑 𝑏 ) 𝑣 𝑣 = ⇒ 𝑒 𝑑 ++ 𝑣𝑏 ( delBack 𝑑 𝑣 ) ( delBack 𝑑 𝑣 ) = ⇒ 𝑒 𝑣 ( unwrap - b 𝑑 𝑏 ) 𝑣 𝑣 = ⇒ 𝑒 𝑣 ++ 𝑑 ( fuse 𝑑 𝑏 ) nil nil = ⇒ 𝑒 nil ( offset 𝑑 𝑑 𝑏 𝑏 ) 𝑣 nil = ⇒ 𝑒 𝑣 ℎ , 𝑡 = splitFirst 𝑑 𝑣 ℎ , 𝑡 = splitFirst 𝑑 𝑣 𝑏 ℎ ℎ = ⇒ 𝑒 𝑣 ( fuse 𝑑 𝑏 ) 𝑡 𝑡 = ⇒ 𝑒 𝑣 ′ ( fuse 𝑑 𝑏 ) 𝑣 𝑣 = ⇒ 𝑒 𝑣 ++ 𝑑 ++ 𝑣 ′ ℎ , 𝑡 = splitFirst 𝑑 𝑣 ℎ is nil ( offset 𝑑 𝑑 𝑏 𝑏 ) 𝑣 𝑡 = ⇒ 𝑒 𝑣 ( offset 𝑑 𝑑 𝑏 𝑏 ) 𝑣 𝑣 = ⇒ 𝑒 𝑣 ++ 𝑑 ++ 𝑣𝑡 = splitLastNonempty 𝑑 𝑣 𝑝 , 𝑡 ′ = delPad 𝑡 ℎ ′ , 𝑡 ′′ = splitFirst 𝑑 𝑡 ′ ℎ , 𝑡 = splitFirst 𝑑 𝑣 ℎ is not nil 𝑝 , 𝑡 ′ = delPad ℎ ℎ ′ , 𝑡 ′′ = splitFirst 𝑑 𝑡 ′ 𝑏 ℎ ′ ℎ ′ = ⇒ 𝑒 ℎ 𝑏 𝑡 ′′ 𝑡 ′′ = ⇒ 𝑒 𝑡 𝑝 = calcPad 𝑝 ℎ ′ ℎ𝑣 = addPad 𝑝 ( ℎ ++ 𝑑 ++ 𝑡 ) ( offset 𝑑 𝑑 𝑏 𝑏 ) 𝑣 𝑡 = ⇒ 𝑒 𝑣 ′ ( offset 𝑑 𝑑 𝑏 𝑏 ) 𝑣 𝑣 = ⇒ 𝑒 𝑣 ++ 𝑣 ++ 𝑑 ++ 𝑣 ′ ℎ , 𝑡 = splitLast 𝑑 𝑣 ℎ , 𝑡 = splitFirst 𝑑 𝑣 𝑡 == ℎ 𝑏 𝑡 ℎ = ⇒ 𝑒 𝑣 ( stitch 𝑑 𝑏 𝑏 ) 𝑣 𝑣 = ⇒ 𝑒 ℎ ++ 𝑑 ++ 𝑣 ++ 𝑑 ++ 𝑡 ℎ , 𝑡 = splitLast 𝑑 𝑣 ℎ , 𝑡 = splitFirst 𝑑 𝑣 𝑡 ! 
= ℎ 𝑏 𝑡 ℎ = ⇒ 𝑒 𝑣 ( stitch 𝑑 𝑏 𝑏 ) 𝑣 𝑣 = ⇒ 𝑒 ℎ ++ 𝑑 ++ 𝑣 ++ 𝑑 ++ 𝑡 ℎ , 𝑡 = splitLast 𝑑 𝑣 𝑝 , 𝑡 ′ = delPad 𝑡 ℎ ′ , 𝑡 ′′ = splitFirst 𝑑 𝑡 ′ ℎ , 𝑡 = splitFirst 𝑑 𝑣 𝑝 , 𝑡 ′ = delPad ℎ ℎ ′ , 𝑡 ′′ = splitFirst 𝑑 𝑡 ′ 𝑡 ′′ == 𝑡 ′′ 𝑏 ℎ ′ ℎ ′ = ⇒ 𝑒 ℎ 𝑏 𝑡 ′′ 𝑡 ′′ = ⇒ 𝑒 𝑡𝑝 = calcPad 𝑝 ℎ ′ ℎ 𝑣 = addPad 𝑝 ( ℎ ++ 𝑑 ++ 𝑡 )( stitch - ht 𝑑 𝑑 𝑏 𝑏 𝑏 ) 𝑣 𝑣 = ⇒ 𝑒 ℎ ++ 𝑑 ++ 𝑣 ++ 𝑑 ++ 𝑡 ℎ , 𝑡 = splitLast 𝑑 𝑣 𝑝 , 𝑡 ′ = delPad 𝑡 ℎ ′ , 𝑡 ′′ = splitFirst 𝑑 𝑡 ′ ℎ , 𝑡 = splitFirst 𝑑 𝑣 𝑝 , 𝑡 ′ = delPad ℎ ℎ ′ , 𝑡 ′′ = splitFirst 𝑑 𝑡 ′ 𝑡 ′′ ! = 𝑡 ′′ 𝑏 𝑡 ℎ = ⇒ 𝑒 𝑣 ( stitch - ht 𝑑 𝑑 𝑏 𝑏 𝑏 ) 𝑣 𝑣 = ⇒ 𝑒 ℎ ++ 𝑑 ++ 𝑣 ++ 𝑑 ++ 𝑡 Fig. 3. DSL Semantics.
The semantics of KumQuat’s synthesis DSL, describing all the synthesizable classes of combiners. splitFirst splits the string into elements separated by thedelimiter, then returns the first element as the first output.It connects the remaining elements using the delimiter asthe second output. splitLast likewise splits the string withthe delimiter, then returns the last element as the secondoutput and returns the remaining substring as the first out-put. splitLastNonempty splits the string with the delimiter,then returns the last nonempty element. delPad removesleading spaces from a string, then returns the number ofremoved spaces as the first output and returns the remainingsubstring as the second output. calcPad takes an integer andtwo strings, where the integer denotes the number of spacesthat pad the first string. It returns the padding needed forthe second string. addPad inserts padding before a string.The DSL operators capture the following broad classes offunctionality required to implement shell commands.
Simple Merge: Combiners that apply a straightforward operation on their inputs. Instances of this class include arithmetic operators and basic string operators such as concatenation. For example, the combiner for the command tr -cs simply concatenates the outputs from divide-and-conquer executions. KumQuat's DSL supports such combiners with operators num, concat, first, second, unwrap-f, and unwrap-b.

Sequence Fusion: Combiners that apply a binary operator piecewise on input elements separated by a delimiter. After applying the piecewise binary operators, the combiner then connects the piecewise results using the original delimiter. For example, the outputs from the command wc -lw contain two numbers separated by spaces. The combiner for this command sums up the corresponding numbers. KumQuat's DSL supports such combiners with the operator fuse.

Stateful Merge: Combiners that operate on certain input elements that are separated by delimiters, using merge operators that depend on the input state, such as the values at certain locations. An example is the nl command: for every non-empty element in its input stream, the command outputs a tuple whose first element is the element's index from the start of the stream. The combiner for this command uses the first output's last index to update the second output's indices. KumQuat's DSL supports such combiners with operators offset, stitch, and stitch-ht.

Command Execution: Combiners that execute an off-the-shelf command as part of the calculation. An example is the sort command, whose combiner may utilize a standard Unix merge command that takes two pre-sorted streams and interleaves them into a sorted merged stream. KumQuat supports such reductions with operators rerun and merge.

Given a command 𝑓, KumQuat synthesizes a combiner 𝑔 such that Eq. 1 holds for all of the observed input streams 𝑖₁, 𝑖₂.
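The Sequence Fusion class above can be sketched concretely for `wc -lw` (a simplified illustration: the DSL's delPad/addPad handling of wc's column padding is omitted):

```python
# Sketch of a `fuse`-style combiner: apply a binary operator piecewise
# to delimiter-separated elements and rejoin with the same delimiter.
# For `wc -lw`, whose output is two space-separated counts, the fused
# operator is integer addition. (The DSL's delPad/addPad handling of
# wc's column padding is omitted for brevity.)

def fuse(delim, op, v1, v2):
    return delim.join(op(a, b)
                      for a, b in zip(v1.split(delim), v2.split(delim)))

def add(a: str, b: str) -> str:
    return str(int(a) + int(b))

# Partial `wc -lw` outputs for the two halves of an input stream:
print(fuse(" ", add, "3 7", "2 5"))  # the combined counts: "5 12"
```

The same `fuse` skeleton works for any piecewise binary operator, which is why the DSL parameterizes it over both a delimiter and an inner BinOp.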
A key property of the synthesis algorithm is that it actively generates inputs 𝑖₁, 𝑖₂ during synthesis to effectively eliminate incorrect candidate combiners.

Notation: I denotes the set of all possible input streams for a command 𝑓. Each input stream 𝑖 ∈ I is a string that terminates with a newline character and may contain the following ASCII characters: newline, space, tab, lowercase/uppercase letters, and Arabic numerals. The tuple ⟨𝑖₁, 𝑖₂⟩ denotes an input pair, where inputs 𝑖₁, 𝑖₂ ∈ I. I × I denotes the set of all input pairs. An input shape 𝑠 = ⟨𝑠_L, 𝑠_W, 𝑠_C⟩ ∈ Shape specifies the configurations for three dimensions of an input: the lines in each input as separated by newline characters (𝑠_L ∈ Config), the words in each line as separated by spaces (𝑠_W ∈ Config), and the characters in each word (𝑠_C ∈ Config). The configuration for each dimension is of the form ⟨𝑙, 𝑢, 𝑑⟩ and specifies three bounds: the minimum element count (𝑙 ∈ Int), the maximum element count (𝑢 ∈ Int), and the percentage of distinct elements (𝑑 ∈ Percent) on that dimension.

    𝑠 ∈ Shape = Config × Config × Config
    𝑠_L, 𝑠_W, 𝑠_C ∈ Config = Int × Int × Percent
An input 𝑖 ∈ I satisfies an input shape 𝑠 ∈ Shape, denoted 𝑖 ∼ 𝑠, if 𝑖 conforms to the bounds specified in 𝑠. An input pair ⟨𝑖₁, 𝑖₂⟩ ∈ I × I satisfies an input shape 𝑠 ∈ Shape, denoted ⟨𝑖₁, 𝑖₂⟩ ∼ 𝑠, if (𝑖₁ ++ 𝑖₂) ∼ 𝑠. 𝐾 denotes all possible combiners in KumQuat's DSL and |𝑔| denotes the size of a combiner 𝑔 ∈ 𝐾, defined as the number of variables plus the number of BinOp expansions in 𝑔's abstract syntax tree (AST). 𝐺_𝑛 = {𝑔 ∈ 𝐾 : |𝑔| ≤ 𝑛} denotes the set of combiners that are under size 𝑛. A combiner 𝑔 ∈ 𝐾 is correct with respect to input pairs 𝐼 ⊆ I × I, denoted 𝑔 ∼_𝑓 𝐼, if 𝑓(𝑖₁ ++ 𝑖₂) = 𝑔(𝑓(𝑖₁), 𝑓(𝑖₂)) for all ⟨𝑖₁, 𝑖₂⟩ ∈ 𝐼. 𝐺^{𝑓,𝐼}_𝑛 = {𝑔 ∈ 𝐺_𝑛 : 𝑔 ∼_𝑓 𝐼} denotes the combiners for command 𝑓 that are under size 𝑛 and are correct with respect to input pairs 𝐼. Size 𝑛 is sufficient for command 𝑓 if 𝐺^{𝑓,I×I}_{+∞} ≠ ∅ implies 𝐺^{𝑓,I×I}_𝑛 ≠ ∅. An input pair ⟨𝑖₁, 𝑖₂⟩ ∈ I × I strengthens input pairs 𝐼 for command 𝑓 at size 𝑛 if 𝐺^{𝑓,𝐼∪{⟨𝑖₁,𝑖₂⟩}}_𝑛 ⊂ 𝐺^{𝑓,𝐼}_𝑛.

Data:
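A minimal sketch of shape-satisfying input generation follows (the function name and the restriction of each Config to (min, max) counts are assumptions; the distinct-element percentage is omitted):

```python
import random
import string

# Sketch: generate a random input that satisfies a (simplified) input
# shape. Each dimension's Config is modeled as (min count, max count);
# the distinct-element percentage from the paper's Config is omitted.

def random_input(shape, rng=random):
    (l_lines, u_lines), (l_words, u_words), (l_chars, u_chars) = shape
    lines = []
    for _ in range(rng.randint(l_lines, u_lines)):
        words = [
            "".join(rng.choice(string.ascii_lowercase)
                    for _ in range(rng.randint(l_chars, u_chars)))
            for _ in range(rng.randint(l_words, u_words))
        ]
        lines.append(" ".join(words))
    return "\n".join(lines) + "\n"   # every input terminates with a newline

i = random_input(((2, 4), (1, 3), (1, 5)))
print(repr(i))
```

Splitting such an input at a random line boundary then yields a pair ⟨𝑖₁, 𝑖₂⟩ with (𝑖₁ ++ 𝑖₂) satisfying the shape.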
Command 𝑓, max combiner size 𝑛
Result: Synthesized combiner 𝑔

    𝐶₀ ← AllCandidates(𝑛)
    for 𝑟 = 1, 2, ... do
        𝐼_𝑟 ← GetEffectiveInputs(𝑓, 𝐶_{𝑟−1}, RandomShape())
        𝐶_𝑟 ← FilterCandidates(𝑓, 𝐶_{𝑟−1}, 𝐼_𝑟)
        if 𝐶_𝑟 = ∅ then
            return nil
        end
        if not MakingProgress([𝐶₀, ..., 𝐶_𝑟]) then
            return GetBestCandidate(𝐶_𝑟)
        end
    end

Algorithm 1: Procedure Synthesize, which implements KumQuat's core synthesis algorithm. The procedure takes a command and synthesizes a combiner for the command that is correct with respect to a range of carefully chosen input pairs.

Combiner Synthesis: Alg. 1 presents KumQuat's combiner synthesis, which takes a black-box command 𝑓 and a user-specified maximum size 𝑛 of candidate combiners. It starts by invoking AllCandidates, which prepares the set 𝐺_𝑛 of all combiners under size 𝑛. The algorithm then performs multiple rounds of filtering on the candidates. The 𝑟-th round (𝑟 = 1, 2, ...) generates a set of input pairs 𝐼_𝑟, uses them to filter candidates, and stores the remaining candidates in variable 𝐶_𝑟. We present candidate selection below and defer input generation to the next section.

Alg. 2 presents CheckCandidate, which takes a command 𝑓, a candidate combiner 𝑔, and a set of input pairs 𝐼. It uses the input pairs to execute command 𝑓 and uses the corresponding outputs to check if combiner 𝑔 is plausible. For each input pair, CheckCandidate executes command 𝑓 with the divide-and-conquer inputs to obtain outputs 𝑜₁, 𝑜₂. The procedure also executes 𝑓 with the full input to obtain the full output 𝑜. The procedure evaluates 𝑔 with the two divide-and-conquer outputs according to Fig. 3, then compares the result against the full output. If they always match, the procedure returns true. Otherwise it returns false.

Procedure FilterCandidates takes a command 𝑓, a set of candidate combiners 𝐶, and a set of input pairs 𝐼. It invokes CheckCandidate for each candidate 𝑔 ∈ 𝐶 to eliminate the incorrect ones, returning the set of remaining candidates that are correct with respect to 𝐼. 𝐶_𝑟 holds the set of combiners that are correct with respect to all seen input pairs, i.e., the set 𝐺^{𝑓,𝐼}_𝑛 where 𝐼 = ⋃_{𝑟′=1}^{𝑟} 𝐼_{𝑟′}.

Procedure Synthesize terminates if either (1) no candidate combiners remain, in which case it returns nil and reports an error, or (2) no progress is made according to empirical criteria, in which case it invokes GetBestCandidate to pick the best combiner based on likelihood and performance features, and returns it as the synthesis outcome. Progress criteria are encoded in MakingProgress (see below).
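Alg. 1's round structure can be sketched in Python; the helper procedures are passed in as parameters, and the stub implementations in the toy usage below are assumptions, not KumQuat's actual heuristics:

```python
# Condensed sketch of Alg. 1. Helper names follow the pseudocode; the
# stub implementations in the usage example below are assumptions.

def synthesize(f, candidates, get_effective_inputs, filter_candidates,
               making_progress, get_best_candidate, random_shape):
    history = [candidates]                  # C_0, C_1, ... per round
    while True:
        inputs = get_effective_inputs(f, history[-1], random_shape())
        history.append(filter_candidates(f, history[-1], inputs))
        if not history[-1]:
            return None                     # no combiner is synthesizable
        if not making_progress(history):
            return get_best_candidate(history[-1])

# Toy usage: f counts lines; candidates are (name, combiner) pairs.
f = lambda s: s.count("\n")
candidates = [("add", lambda a, b: a + b), ("left", lambda a, b: a)]
correct = lambda f, g, i1, i2: f(i1 + i2) == g(f(i1), f(i2))
result = synthesize(
    f, candidates,
    get_effective_inputs=lambda f, C, s: [("a\n", "b\nc\n")],
    filter_candidates=lambda f, C, I: [
        (n, g) for n, g in C if all(correct(f, g, i1, i2) for i1, i2 in I)],
    making_progress=lambda h: len(h[-1]) < len(h[-2]),
    get_best_candidate=lambda C: C[0][0],
    random_shape=lambda: None,
)
print(result)  # "add" survives; "left" is eliminated by the input pair
```

The progress stub mirrors the paper's criterion: continue only while the most recent round eliminated at least one candidate.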
Command 𝑓, candidate combiner 𝑔, input pairs 𝐼
Result: Whether 𝑔 ∼_𝑓 𝐼

    for ⟨𝑖₁, 𝑖₂⟩ ∈ 𝐼 do
        𝑜₁ ← Execute(𝑓, 𝑖₁)
        𝑜₂ ← Execute(𝑓, 𝑖₂)
        𝑜 ← Execute(𝑓, 𝑖₁ ++ 𝑖₂)
        if Evaluate(𝑔, 𝑜₁, 𝑜₂) ≠ 𝑜 then
            return false
        end
    end
    return true

Algorithm 2: Procedure CheckCandidate, which checks whether a candidate is correct with respect to a set of input pairs. This procedure is invoked by FilterCandidates.

Data: Command 𝑓, candidate combiners 𝐶, input shape 𝑠₀
Result: Input pairs, generated from mutating 𝑠₀, that effectively eliminate incorrect candidates in 𝐶

    𝐼 ← empty set
    for 𝑚 = 1, ..., 𝑀 do
        for 𝑗 = 1, ..., 12 do
            𝑠^𝑗_{𝑚−1} ← MutateShape(𝑠_{𝑚−1}, 𝑗)
            𝐼^𝑗_{𝑚−1} ← RandomInputPairs(𝑠^𝑗_{𝑚−1})
            Add 𝐼^𝑗_{𝑚−1} to 𝐼
        end
        𝑗′ ← IndexBestMutation(𝐶, 𝐼¹_{𝑚−1}, ..., 𝐼¹²_{𝑚−1})
        𝑠_𝑚 ← 𝑠^{𝑗′}_{𝑚−1}
    end
    return 𝐼

Algorithm 3: Procedure GetEffectiveInputs, which mutates input shapes to generate effective input pairs.

Input Generation: Recall from Alg. 1 that the synthesizer automatically generates input streams to characterize the desired combiner behavior. A key goal of input generation is to generate a variety of input streams that enable the black-box command to exercise a wide range of its functionality. KumQuat generates inputs with an active learning algorithm. The algorithm is driven by mutations to an input shape, from which KumQuat generates random inputs. The mutations are chosen by how effectively their resulting inputs eliminate incorrect candidate combiners.

Alg. 3 presents KumQuat's input generation algorithm. Procedure
GetEffectiveInputs takes a black-box command 𝑓, a set of candidate combiners 𝐶, and an initial input shape 𝑠0; it mutates the input shape iteratively, generating input streams along the way.

The iterative mutation process is inspired by gradient descent. The 𝑚-th iteration (𝑚 = 1, . . . , 𝑀) mutates the input shape 𝑠_{𝑚−1} using one of twelve potential mutations. These potential mutations are along three dimensions (lines, words, and characters) and four directions (more/fewer elements, more/less homogeneous). Procedure MutateShape takes an initial input shape and a mutation index, then returns a new input shape mutated as specified. For the 𝑗-th potential mutation (𝑗 = 1, . . . , 12), Alg. 3 uses the mutated input shape 𝑠^𝑗_{𝑚−1} to generate a set of input pairs 𝐼^𝑗_{𝑚−1} by invoking RandomInputPairs (see below). Our current implementation generates five such input pairs each time.

Procedure
RandomInputPairs takes an input shape and generates a set of random input pairs that each satisfy the input shape. In other words, the variable 𝐼^𝑗_{𝑚−1} satisfies ⟨𝑖1, 𝑖2⟩ ∼ 𝑠^𝑗_{𝑚−1} for all ⟨𝑖1, 𝑖2⟩ ∈ 𝐼^𝑗_{𝑚−1}. A complication when generating inputs for black-box commands is that certain commands may crash or produce trivial outputs when the inputs do not meet certain properties. For example, the comm command prints an error if its input lines are not sorted according to a command-line flag, and grep produces no output if the input does not match its regular expression. To address these problems, KumQuat supports generating sorted inputs or inputs based on a regular expression.

Procedure GetEffectiveInputs then evaluates the effectiveness of all of the input shape mutations. It identifies the most effective one with
IndexBestMutation , which takes a set of candidate combiners and twelve sets of input pairs. For each set of input pairs, it invokes
FilterCandidates to evaluate how effective the input pairs are at eliminating incorrect candidates. It returns the index, 𝑗′, of the most effective set. The 𝑗′-th mutation then produces the input shape for the next iteration, 𝑠𝑚 = 𝑠^{𝑗′}_{𝑚−1}, in GetEffectiveInputs . The procedure repeats these operations for 𝑀 iterations. Our current implementation adopts an empirical upper bound for 𝑀. Finally the procedure returns the set of all observed input pairs, 𝐼 = ⋃_{𝑗,𝑚} 𝐼^𝑗_{𝑚−1}.

Alg. 1's iterative rounds are designed to identify new input pairs that strengthen previous observations. When such input pairs abound, they are often identified quickly. We encode this empirical pattern in the MakingProgress procedure, which takes a sequence of sets of candidate combiners and returns whether the sequence indicates that the algorithm may continue to identify more inputs that strengthen previous observations. In our current implementation, this procedure returns true if the two most recent rounds eliminated at least one candidate.
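The mutation search can be sketched in Python as follows. This is an illustrative reduction of Alg. 3, not KumQuat's implementation: the shape has only two dimensions (line count and adjacent-duplicate probability), the mutation set and all helper names are hypothetical simplifications of MutateShape, RandomInputPairs, and IndexBestMutation, and the winning mutation is the one whose inputs eliminate the most candidates.

```python
import random

def random_input(shape, rng):
    """Draw one random input stream satisfying the given shape."""
    lines = []
    for _ in range(shape["lines"]):
        # higher "dup" makes adjacent lines more likely to repeat
        if lines and rng.random() < shape["dup"]:
            lines.append(lines[-1])
        else:
            lines.append(rng.choice("abc"))
    return "".join(l + "\n" for l in lines)

def mutations(shape):
    """Candidate mutations: more/fewer lines, more/less homogeneous."""
    yield {**shape, "lines": shape["lines"] + 2}
    yield {**shape, "lines": max(1, shape["lines"] - 2)}
    yield {**shape, "dup": min(0.9, shape["dup"] + 0.3)}
    yield {**shape, "dup": max(0.0, shape["dup"] - 0.3)}

def effective_inputs(f, candidates, shape, rounds=4, seed=0):
    """Hill-climb over shapes; keep every generated pair (the set I)."""
    rng = random.Random(seed)
    seen = []
    for _ in range(rounds):
        scored = []
        for mut in mutations(shape):
            pairs = [(random_input(mut, rng), random_input(mut, rng))
                     for _ in range(5)]
            seen.extend(pairs)
            # a candidate is "killed" if any pair exposes it as incorrect
            kills = sum(any(g(f(i1), f(i2)) != f(i1 + i2) for i1, i2 in pairs)
                        for g in candidates)
            scored.append((kills, mut))
        shape = max(scored, key=lambda t: t[0])[1]  # best mutation wins
    return seen

# toy check: with f = line count, the generated inputs quickly separate a
# correct (additive) combiner from an incorrect (concatenating) one
f = lambda s: str(s.count("\n"))
add = lambda a, b: str(int(a) + int(b))
cat = lambda a, b: a + b
pairs = effective_inputs(f, [add, cat], {"lines": 3, "dup": 0.0})
print(any(cat(f(a), f(b)) != f(a + b) for a, b in pairs))
```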
Design Rationale : KumQuat generates inputs to execute a black-box command, whose behavior indirectly specifies the desired behavior of its combiner. As a result, candidate combiners that are not equivalent as stand-alone DSL programs may turn out to be equivalent for the specific purpose of combining a command's divide-and-conquer executions. Even when only one equivalent candidate combiner remains, it is still generally infeasible to determine whether this candidate may be eliminated by another input pair. We therefore decided to stop generating inputs only after KumQuat stops making progress for an empirically long enough time.

The algorithm takes advantage of domain knowledge about typical Unix commands, many of which process inputs with certain properties that effectively expose the nontrivial command functionality—e.g., uniq exposes its deduplication functionality primarily for input streams where multiple adjacent nonempty lines are equal. Based on this domain knowledge, KumQuat generates inputs by searching for an effective input shape. The algorithm generates some inputs, observes the command, determines what input properties are important, then uses these properties to generate more inputs.

Candidate combiners may contain conditional branches (e.g., offset , stitch , and stitch-ht ), but not all candidates are fully distinguishable by inputs that exercise only one branch. KumQuat addresses this challenge by using various initial input shapes for multiple rounds of input generation. This form of active learning enables KumQuat to synthesize combiners efficiently.

This section describes a series of refinements that aim at improving the efficiency and effectiveness of the synthesizer as well as that of the synthesized programs.
A few extensions on the core synthesis algorithm have the potential to improve KumQuat's performance significantly.
Type Guidance : Our current implementation prunes the search space (variable 𝐶 in Alg. 1) with a set of initial command executions that imply the command's output types. KumQuat currently focuses on two types: delimiters and values. If the initial command outputs do not contain a specific delimiter, e.g., ':' , KumQuat notes this type information and discards candidate combiners that use this delimiter. Similarly, if the initial outputs do not contain Arabic numerals, KumQuat notes this information to discard candidate combiners that involve numerical operations. These refinements prune the synthesis search space significantly.

A complication with delimiters is that they may contain duplicates of varying lengths, often due to padding. KumQuat first attempts to synthesize a combiner with the type guidance described above. If this synthesis fails, it retries after rewriting each group of consecutive spaces in the command outputs into the '\t' delimiter.

Term Weights : Different combiners appear with different likelihoods. For example, many combiners use concat , followed closely by concat d , in turn followed by num . KumQuat uses term likelihood to rank the satisfying candidate combiners at the end of Alg. 1. This ranking often produces the most succinct and correct candidate as the best one.

A special case is commands that may serve as their own combiners, that is, where the combiner 𝑔 satisfies 𝑓(𝑖1 ++ 𝑖2) = 𝑔(𝑓(𝑖1), 𝑓(𝑖2)) = 𝑓(𝑓(𝑖1) ++ 𝑓(𝑖2)) for all input streams 𝑖1, 𝑖2. A trivial DSL combiner would use rerun . However, re-executing the original command 𝑓 over the divide-and-conquer execution outputs can result in significant overheads due to constant costs (e.g., process fork) as well as repeated work, foregoing many of the parallelism gains. KumQuat hence de-prioritizes such candidate combiners due to poor performance.
In our experiments (§6), candidate combiners that use rerun often take two orders of magnitude longer to execute than combiners that do not.

Parallel Synthesis : KumQuat's synthesis features ample opportunities for parallelization. One opportunity occurs in candidate generation, in which different worker replicas can explore disjoint subsets of the candidate space. Another opportunity occurs in input generation and testing. As scaling out involves constant overheads for process spawning and interprocess communication, it makes sense only once those constant costs are negligible relative to the synthesis work. KumQuat achieves this by scaling out only after a few AST levels have been explored.
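The type-guidance idea can be illustrated with a small sketch. The profile fields and the dictionary encoding of candidates below are hypothetical, not KumQuat's DSL representation; the point is only that a few observed outputs rule out whole families of combiners before synthesis begins.

```python
# Sketch of type-guided pruning: initial command outputs reveal which
# delimiters and value types can appear, and candidates that mention
# anything else are discarded up front.

def output_profile(outputs):
    """Record which delimiter characters and value types the outputs use."""
    profile = {"delims": set(), "numeric": False}
    for out in outputs:
        for ch in out:
            if ch in ":,;\t ":
                profile["delims"].add(ch)
        if any(ch.isdigit() for ch in out):
            profile["numeric"] = True
    return profile

def prune(candidates, profile):
    """Keep only candidates consistent with the observed output types."""
    kept = []
    for cand in candidates:
        if cand.get("delim") and cand["delim"] not in profile["delims"]:
            continue  # uses a delimiter the command never emits
        if cand.get("numeric") and not profile["numeric"]:
            continue  # numerical combiner for a non-numeric command
        kept.append(cand)
    return kept

# e.g. output in the style of `tr A-Z a-z`: no ':' delimiter, no digits
profile = output_profile(["hello world\n", "foo bar\n"])
cands = [
    {"name": "concat"},
    {"name": "concat_colon", "delim": ":"},
    {"name": "num_add", "numeric": True},
]
print([c["name"] for c in prune(cands, profile)])
```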
We now turn to refinements related to distributed execution and side effects.
Mapping onto MapReduce : KumQuat's map-combiner decomposition is subtly different from the one in the distributed MapReduce paradigm [10]. By default, MapReduce does not maintain input order: inputs to map are key-value pairs, and outputs are grouped by key before being handed off to the reducer. To leverage a MapReduce implementation, KumQuat must first graft ordering atop MapReduce.

KumQuat therefore augments MapReduce operations, starting with a custom wrap-unwrap function pair that operates at a few crucial points (Fig. 4). This pair transforms streams to augment them with ordered identifiers. As these identifiers need to respect global order, the first wrap is applied before the start of the MapReduce job—e.g., when the stream is in transit to the distributed file system.

Identifiers can be placed in-band, as long as the KumQuat-provided map and reduce functions take care of wrapping and unwrapping so that the original command and synthesized combiner operate on the raw elements of the stream. KumQuat's map thus first applies unwrap , then calls the black-box command, and finally applies wrap . On ingress, unwrap splits the stream into two parallel streams, one for identifiers and one for raw data values. On egress, wrap merges the two streams into one, ending the identifier stream early when the output value stream has fewer elements than the identifier stream—e.g., for wc .

These identifiers are not only ordered but also unique, to prohibit MapReduce from grouping elements together: each stream is processed by the reducer in due order. To complete the picture, KumQuat provides a custom partitioner for sorting (rather than hashing) identifiers during MapReduce's shuffling phase.

Side Effects : Side-effectful commands are ones that modify some state outside their output streams (the main effect). Broadly, such state falls into two classes, both of which pose significant challenges for parallel and distributed systems.
Fig. 4. KumQuat's distributed implementation over MapReduce's core abstractions.
To be able to use the abstractions made available by a MapReduce implementation, KumQuat wraps stream elements with ordered identifiers. Wrappers around map and reduce take care of wrapping and unwrapping elements, and a custom partitioner takes care of partitioning data by its identifiers.
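A minimal sketch of the wrap/unwrap scheme, assuming line-oriented streams and in-band, tab-separated identifiers. This illustrates the idea only; it is not KumQuat's Hadoop code, and the helper names are hypothetical.

```python
# Each element is tagged with a unique, ordered identifier so a MapReduce
# shuffle cannot reorder or group elements; map unwraps, applies the
# black-box command to the raw values, and re-wraps its output.

def wrap(lines):
    # zero-padded ordinals: lexicographic sort equals original order,
    # which lets a sorting (rather than hashing) partitioner restore order
    return [f"{i:010d}\t{line}" for i, line in enumerate(lines)]

def unwrap(tagged):
    # split the tagged stream into an identifier stream and a value stream
    ids = [t.split("\t", 1)[0] for t in tagged]
    values = [t.split("\t", 1)[1] for t in tagged]
    return ids, values

def mapreduce_map(command, tagged):
    ids, values = unwrap(tagged)
    out = command(values)
    # if the command shrinks the stream (e.g. a wc-style count),
    # end the identifier stream early to match the output length
    return [f"{i}\t{v}" for i, v in zip(ids[:len(out)], out)]

# toy order-preserving command: uppercase every line (like `tr a-z A-Z`)
upper = lambda vs: [v.upper() for v in vs]
tagged = wrap(["foo", "bar", "baz"])
print(mapreduce_map(upper, tagged))
```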
The first class comprises side effects that fall outside the command's main memory—for example, writes to the file system, environment variables, or requests over the network. These side effects are particularly challenging for the synthesizer to deal with, exactly because they are outside the monitored interface—even if two commands produce the same stdout and stderr streams for the same inputs, they might encode different computations.

To address this challenge, KumQuat wraps a command with system-call tracing infrastructure that reports all system calls during its execution—excluding accesses to the standard input, output, and error streams. Applied online on every input, this wrapping could decelerate the synthesizer significantly. To avoid such slowdown, KumQuat wraps only a single invocation of a command, which completes concurrently with (and always before) the synthesis. Note that exceptions in Unix are typically signaled through either the error stream or the exit code, both of which KumQuat collects.

The second class of side effects comprises in-memory ones—for example, the fact that sort maintains a sorted list of inputs and that wc maintains a few counters. Fortunately, such side effects are still encoded within the stdout and stderr streams in KumQuat's domain: server or daemon commands would be challenging to deal with, but are not typically found as intermediate phases of data-processing pipelines.

Core System : KumQuat combines several components written in various languages. The active learning and generation components are written in Python, implementing the algorithms and DSL presented in this paper. The base set of terms as well as the resulting combiners are evaluated by a small interpreter written in Python. Synthesized combiners link against an 80-line utility library that provides runtime support—e.g.
, for parsing stream descriptors provided as arguments (§2), reading streams with partial results, and applying the combiner.

For pipelines that incorporate more than one command, KumQuat first calls into Smoosh's OCaml bindings [19]. Smoosh passes the script's AST as JSON to KumQuat, which performs light transformations that decompose the pipeline into individual stages and prefix each command with a KumQuat-provided kq1 command (§2). The kq1 command supplies the original command and its arguments to the active learning and generation core. KumQuat finally feeds the generated AST back to Smoosh, which converts it back to a POSIX-compliant string.

To check for side effects outside main memory, KumQuat's current implementation prefixes commands with a call to strace [26]. This reports any system calls outside accesses to the input, output, and error streams.
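The side-effect check can be approximated by scanning an strace-style log for system calls beyond standard-stream I/O. The sketch below operates on illustrative log text rather than invoking strace, and the syscall patterns it recognizes are a simplified subset chosen for illustration.

```python
import re

STDIO_FDS = {"0", "1", "2"}  # stdin, stdout, stderr

def has_external_effects(trace):
    """Flag syscalls that touch anything beyond the standard streams."""
    for line in trace.splitlines():
        m = re.match(r"(read|write)\((\d+)", line)
        if m:
            if m.group(2) not in STDIO_FDS:
                return True   # I/O on a non-standard file descriptor
            continue
        if re.match(r"(open|openat|unlink|connect|rename)\(", line):
            return True       # touches the file system or the network
    return False

# illustrative log fragments in strace's output style
pure = 'read(0, "abc", 3) = 3\nwrite(1, "abc", 3) = 3\n'
dirty = pure + 'openat(AT_FDCWD, "/tmp/x", O_WRONLY) = 3\nwrite(3, "x", 1) = 1\n'
print(has_external_effects(pure), has_external_effects(dirty))
```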
Parallel and Distributed Drivers : For the parallel version, a driver orchestrates execution by splitting input into multiple chunks, creating the necessary FIFOs, passing partial inputs to maps and redirecting their output to the FIFOs, and passing FIFOs as arguments to the combiner.

For the distributed version, the driver reads from stdin and writes to stdout . The wrap function is a call to nl with custom delimiters. The unwrap function is a combination of tee with two sed s, each applied to one of tee 's resulting streams to demultiplex the identifier stream from the data stream. For convenience, HDFS's put and get are modified to call wrap and unwrap on the stream to/from HDFS.

We use a series of benchmarks to evaluate the performance of KumQuat as well as that of the synthesized parallel and distributed programs.
Methodology : We identify a total of 115 data-processing commands from three sets of benchmarks:

• The top 100 most popular shell pipelines from GitHub, as ranked by popularity [18]. This set results in 42 command instances.
• Unofficial GitHub solutions to Unix50, a pipeline-construction game developed by Bell Labs [4]. These pipelines result in 76 command instances.
• Four Unix pipelines collected from the literature [2, 3, 23, 35], resulting in 29 command instances (7, 8, 8, and 8 commands, respectively).

Tab. 1. Evaluation results for 115 benchmarks.
We group benchmarks into 25 bins based on their characteristics in terms of (1) active learning and generation, and (2) runtime performance (cf. §6). Per bin, the table reports the Bin ID, Bin Size, an Example Command, Synthesis Time, Sufficient AST Size, Total Candidates, Satisfying Candidates, Seq. Execution, Par. Execution, Seq/Par, Cmb. Execution, Seq/(Par+Cmb), and the Synthesized Ops and Requirements.

Bin  Size  Example Command               Synthesized Ops; Requirements
1    37    awk '{print $2, $0}'          concat
2          awk 'BEGIN{print ""}{print}'  unwrap-f, concat
3          comm -12 - $DICT              rerun; sorted input
4    22    cut -d '(' -f 2               concat; special chars
5    2     fmt -w1                       concat
6          grep $REGEX                   concat; regex, input bound
7    1     grep -c $REGEX                unwrap-b, num+; regex
8    2     head -n 1                     rerun
9          head -n 2                     first
10   1     nl                            offset, num+, second
11   8     sort -n -k 2                  merge('-k2n')
12   1     tac                           concat (reverse variables)
13   1     tail -n 1                     rerun
14   2     tr -c '[A-Z]' '\n'            concat
15   1     tr -cs '[a-z][A-Z]' '\n'      rerun
16   4     tr A-Z a-z                    concat
17   2     uniq                          rerun
18         —"—                           stitch, first, concat (synthesis time 43.2 s)
19   2     uniq -c                       stitch-ht, num+, first, concat
20   9     wc -cl                        unwrap-b, unwrap-f, fuse, num+
21   5     wc -l                         unwrap-b, num+
22   2     xargs                         rerun

Synthesis time over the 106 successfully synthesized commands: minimum 28.4 s, maximum 13718 s, average 194.2 s.

23   6     uniq -d, sed 1d               Commands violate Eq. 1
24   1     comm -23 - $DICT              Command violates Eq. 1
25   1     grep -in $REGEX               Command has no DSL combiners under size 6
26   1     comm -2 - $DICT               Command has no DSL combiners under size 6
For each set, we extract all the commands these benchmarks employ and filter out (1) commands not in our PATH, such as csvcut ; (2) commands with side effects, such as cp or mv ; and (3) commands with multiple inputs or outputs, such as paste or tee . The total is 115 command instances, i.e., unique combinations of commands and flags.

Setup : KumQuat and all parallel programs run on a server with 0.5TB of memory and 40 physical (80 virtual) cores. The distributed experiments use i3.large AWS EC2 nodes, with Hadoop 3.2.1 running on Oracle's JDK 1.8.0_251; the rest of the software setup was identical to the server used for parallel experiments. Commands are evaluated on the Wikipedia corpus (150GB), with the 654K-word american-insane dictionary used for commands such as comm .

Command Breakdown : We group commands that are similar—based on characteristics of their input generation, synthesis, and runtime execution—into 25 bins. The size of each bin is the number of similar commands in the bin.

Tab. 1 presents results. The first 22 rows each correspond to a bin whose commands have successfully synthesized combiners. These bins contain 106 commands, for which we report the minimum, maximum, and average statistics (next three rows). The last four rows each correspond to a bin whose commands KumQuat did not synthesize combiners for, instead reporting errors. The first three columns describe the bins, the next four present combiner synthesis statistics, the next five present execution time statistics, and the last describes the synthesized combiners and the commands' requirements.
Synthesis Results : The fourth column ("Synthesis Time") presents the wall-clock time needed to synthesize a satisfying combiner with the smallest sufficient size (the fifth column). The sixth column ("Total Candidates") presents the size of the search space, that is, the number of all DSL combiners under the specified size after pruning with type guidance (§4.1). The next column ("Satisfying Candidates") presents the number of candidate combiners that are correct with respect to all of the observed input pairs. The last column ("Synthesized Ops; Requirements") describes the combiners synthesized by Alg. 1, as well as the commands' input format requirements.

For 112/115 (97.4%) of commands KumQuat either synthesizes a combiner (106/115, 92.2%) or reports that a combiner is not synthesizable (6/115, 5.2%). Overall, the synthesis time correlates positively with the size of the search space, which increases for each command as the specified AST size increases. All except one of these commands (111/112, 99.1%) took KumQuat less than 8 minutes to synthesize a combiner. The only exception is a grep command (bin 6), which uses an expensive regular expression that evaluates slowly on certain inputs generated by KumQuat.

For most of the successful commands (86/106, 81.1%) an AST size of 3 is sufficient. For all but one of these commands, KumQuat synthesizes a combiner within 35 seconds. The only exception is the comm -12 command (bin 3), which produced only empty outputs for certain input shapes in our experiments. In this scenario our KumQuat implementation invokes Alg. 3 with various input shapes until observing nontrivial behavior from the command.

We manually inspected the synthesized combiners.
The highest-ranked combiners are all correct with respect to all possible inputs.

Command uniq is an interesting case where KumQuat returns different highest-ranked combiners depending on the specified maximum AST size. When the size limit is 3 or 4, KumQuat synthesizes a combiner whose core is the term rerun . It is correct but operates on all lines of the partial outputs, which would lead to bad performance. When the size limit is 5 or 6, KumQuat synthesizes a combiner that uses stitch , first , and concat . While the new combiner contains more terms, it performs significantly better because it operates only at the boundaries of the partial outputs.

The last four rows of Tab. 1 contain nine commands for which KumQuat did not synthesize combiners. For these commands KumQuat often eliminated all candidate combiners in the search space and reported an error. We report the time that KumQuat took to report errors for the maximum AST size 6. For most of these commands, KumQuat correctly reports an error within four minutes.

Seven of these commands (bins 23, 24) are not parallelizable in the most general form (Eq. 1). The reason is that there exist corner-case input pairs that produce counterexamples for this equation. An example is the uniq -d command, which outputs only the lines that are repeated in the input. A corner-case input pair may contain two identical lines, one as the last line in the first input stream and the other as the first line in the second input stream. The concatenation of these two input streams would cause the command to output the repeated line, which does not appear in either of the divide-and-conquer execution outputs. The remaining two commands (bins 25, 26) are potentially parallelizable, but they do not have combiners in the specified search space. The reason is that these commands produce structured outputs that would require a potential combiner to perform complex string parsing and reformatting operations that are not effectively captured by the current DSL.
In our experiments KumQuat successfully generated corner-case input pairs for these commands, concluding that the observed behavior matches no candidates in the search space.

Execution Time Results : Tab. 1's columns 8–12 are related to execution time over 100GB input, except for 10GB input for all grep commands and 1GB input for all xargs commands. Parallel execution is split between the parallel-only segment ("Par. Execution") and the combiner segment ("Cmb. Execution"), with the speedup gained by the former shown in "Seq/Par" and by the combination of the former and latter in "Seq/(Par+Cmb)".

The speedup due to parallel execution averages 22.6 × for the data-parallel-only execution and 7.8 × for the parallel version plus the combiner. For a few commands such as grep the speedup is near-linear—47.5 × and 42.2 × , respectively. Other commands see sub-linear speedups, partly because some commands are I/O-bound and partly because the combiner exercises limited parallelism. A few command instances have near-instant sequential execution and thus decelerate when the combiner is applied: for example, parallelizing head ends up adding work, as the combiner could have itself filtered out the necessary rows by rerun . Their execution still remains within a few seconds of the sequential execution.

The distributed experiments indicate lower speedups (not shown), averaging 3.8 × . To better test whether some of the speedup limitations are due to Hadoop's streaming interface, we compared with native implementations expressed in Java for two benchmarks: wc and grep . KumQuat's runtime is about 47% slower; overheads are due to KumQuat's ordering transformations and Hadoop's streaming interface [11].

Parallel Pipelines and Combiner Removal : Unix commands are often composed into pipelines to accomplish a desired task [23, 27]. Leveraging the synthesized parallel commands, KumQuat generates a hybrid series/parallel+pipeline parallel execution of Unix pipelines.
In this execution, each parallel command is followed by its combiner, which produces the input for the next parallel command in the pipeline. In addition to the data parallelism available within each parallel command, the resulting computation also exploits pipelined parallelism available between different parallel commands.

In many cases it is also possible to remove intermediate combiners between data-parallel commands. For example, when the combiner is concat it is possible to combine map stages and apply concat only once after the last map stage. Experimental results show that, depending on the characteristics of the pipeline, combiner removal can deliver additional significant parallel speedups [8, 12, 25, 44]. For the Unix50 pipelines and four pipelines from the literature, it is possible to remove the combiners for 158 of the 251 pipeline stages. For computations with relatively small parallel pipeline stages, this combiner removal can offer significant additional parallel performance. Specifically, experimental results from the PaSh system [44], which currently uses manually coded combiners and implements an optimization that removes combiners between adjacent parallel phases, indicate that this optimization can produce parallel versions of the Unix50 benchmarks that exhibit parallel speedups of 6.02 × .

Micro-benchmark—Tracing : To understand the overheads of system call tracing, we apply it on successfully synthesized commands. For each command, the output was written to /dev/null and the output of strace was written to /dev/shm (a memory-mapped file system). The average slowdown is small, and this overhead is insignificant relative to KumQuat's overall runtime performance, even for 100MB inputs.
Micro-benchmark—Refinements : Understanding the benefits of the various optimizations is challenging due to the high synthesis times. We use the synthesis of cat 's combiner with an AST size of 8 to measure the impact of different synthesis refinements. Without type refinements, the search space contains 335636 terms; with type refinements, these drop to 5810 (a 57.7 × reduction). Sequential synthesis takes 273 𝑠, dropped to 75.9 𝑠 by parallel synthesis (3.64 ×).

We discuss related work in Unix synthesis, synthesis of divide-and-conquer computations, synthesis of distributed systems, program synthesis driven by provided input/output examples, and active learning of computer programs.
Unix Synthesis : Prior work on synthesis for Unix shell commands and pipelines [5, 9] is guided by examples or natural-language specifications. Instead of automatically generating parallel or distributed versions of an existing command or pipeline, the goal was to synthesize the sequential command itself from examples or natural-language specifications.
Divide and Conquer Decomposition : Prior work focused on decomposing programs or program fragments using divide-and-conquer techniques [13, 14, 33, 41]. The majority of this work focuses on parallelizing special constructs—e.g., loops, matrices, and arrays—rather than stream-oriented primitives. In some cases [13, 14], the map phase is augmented to maintain additional metadata used by the reducer phase; this is antithetical to KumQuat's approach, which explicitly leaves the unmodified original command as the map phase.

Of particular relevance is the synthesis of MapReduce-style distributed programs [41]. This work focuses on synthesizing entire programs from input-output pairs rather than inferring only the reducer function from an existing implementation, and it is applied in scenarios where a programmer develops the full distributed computation by preparing examples. It also enables a much larger decomposition space that includes flatMap , reduceByKey , and filterBy , and as a result requires a much more expressive underlying framework such as Spark [47]. KumQuat can, in principle, synthesize the distributed computation for any framework that follows the basic MapReduce paradigm, as long as the framework provides support for black-box streaming operations—for which Hadoop provides a simple interface. KumQuat also adds metadata for ordering, which is not a requirement in non-streaming systems generally, nor in [41] specifically.

Synthesizing Distributed Systems : There is prior work on synthesizing features of distributed systems [6, 16, 24, 34], for example to automatically infer fences and repair programs in the context of distributed environments. KumQuat is different both in application and technique: it synthesizes data-parallel programs from black-box Unix commands, as opposed to modifications or augmentations of existing (and often white-box) programs or program fragments.
Input-Output Synthesis : KumQuat overlaps with (and builds upon) techniques explored in the program inference and synthesis communities [1, 15, 20, 22, 29], and particularly programming-by-example [31, 40, 45]. These works assume different usage scenarios—for example, many approaches require the user to provide a set of examples. KumQuat performs program inference by interacting with and observing the behavior of a full application. Rather than relying on a fixed corpus of input-output examples, the application's execution is used to define the target specification. Incorporating active learning allows these systems to successfully and efficiently expand the input-output examples automatically as needed.
Active Learning : Active learning is a classical topic in machine learning [36]. In the context of program inference, it includes learning (and generating) programs that interact with relational databases [37] or key/value stores [32], oracle-guided synthesis for loop-free programs [22], and techniques for pruning the search space in synthesis targeting Datalog [39]. KumQuat, in contrast, works with Unix commands to synthesize merge operations over streams produced by divide-and-conquer computations, for automatic parallelization and/or distribution of the computation.
Commands and Shells : There is a series of systems that aid developers in running commands or script fragments in a parallel or distributed fashion. These range from simple Unix utilities [38, 43, 46] to parallel/distributed shells [28, 42] to data-parallel frameworks that incorporate Unix commands [17, 21]. These tools require developers to modify programs to make use of the tools' APIs, in contrast to KumQuat, which aims to provide an automated solution that works directly on the original sequential command.

The PaSh and POSH systems parallelize and distribute Unix shell scripts mostly automatically [30, 44], but their current implementations work with manually coded combiners. PaSh can also work directly with the automatically synthesized KumQuat combiners. This approach eliminates the need for manual coding and can enable the immediate extension of PaSh to work with new commands that require new combiners. PaSh also implements an optimization for removing combiners between adjacent parallel stages, which can improve the performance of a pipeline significantly.
This paper presented KumQuat, a system for automatically synthesizing parallel and distributed versions of Unix commands. KumQuat decomposes commands into two phases: (i) a parallel mapper that applies the original command to produce partial results, and (ii) a combiner, synthesized by KumQuat, that combines the partial results. To synthesize the combiner, KumQuat applies program inference on the command—with a key insight being that the parallel specification to be inferred is the already-available sequential command. KumQuat synthesizes the combiner by applying repeated rounds of exploration; at each round, it compares the results of the synthesized program with those from the sequential program to discard invalid candidates. KumQuat's implementation supports two back-ends. A parallel back-end accelerates execution on a single host without any additional runtime support. A distributed back-end offers program- and data-transformation wrappers to bolt KumQuat's two phases onto Hadoop, offloading several distributed-systems challenges to a production-grade distributed computing runtime. KumQuat can offer significant parallel speedups and synthesizes combiners for most of our benchmark commands within a minute.
Acknowledgments
We are thankful to Konstantinos Kallas, Claudia Zhu, Shivam Handa, and Konstantinos Mamouras for interesting discussions. This research was funded in part by DARPA contracts HR00112020013 and HR001120C0191. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect those of DARPA.
References

[1] Rajeev Alur, Rastislav Bodik, Garvit Juniwal, Milo M. K. Martin, Mukund Raghothaman, Sanjit A. Seshia, Rishabh Singh, Armando Solar-Lezama, Emina Torlak, and Abhishek Udupa. 2013. Syntax-Guided Synthesis. In Formal Methods in Computer-Aided Design (FMCAD). IEEE, 1–8.
[2] Jon Bentley. 1985. Programming Pearls: A Spelling Checker. Commun. ACM 28, 5 (May 1985), 456–462. https://doi.org/10.1145/3532.315102
[3] Jon Bentley, Don Knuth, and Doug McIlroy. 1986. Programming Pearls: A Literate Program. Commun. ACM 29, 6 (June 1986), 471–483. https://doi.org/10.1145/5948.315654
[4] Pawan Bhandari. 2020. Solutions to unixgame.io. https://git.io/Jf2dn Accessed: 2020-04-14.
[5] Sanjay Bhansali and Mehdi T. Harandi. 1993. Synthesis of UNIX Programs Using Derivational Analogy. Machine Learning 10, 1 (1993), 7–55.
[6] Borzoo Bonakdarpour and Sandeep S. Kulkarni. 2008. SYCRAFT: A Tool for Synthesizing Distributed Fault-Tolerant Programs. In CONCUR 2008 - Concurrency Theory, Franck van Breugel and Marsha Chechik (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 167–171.
[7] José P. Cambronero, Thurston H. Y. Dang, Nikos Vasilakis, Jiasi Shen, Jerry Wu, and Martin C. Rinard. 2019. Active Learning for Software Engineering. In Proceedings of the 2019 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software (Athens, Greece) (Onward! 2019). Association for Computing Machinery, New York, NY, USA, 62–78. https://doi.org/10.1145/3359591.3359732
[8] Duncan Coutts, Roman Leshchinskiy, and Don Stewart. 2007. Stream Fusion: From Lists to Streams to Nothing at All. ACM SIGPLAN Notices 42, 9 (2007), 315–326.
[9] Anthony Cozzie, Murph Finnicum, and Samuel T. King. 2011. Macho: Programming with Man Pages. In Proceedings of the 13th Workshop on Hot Topics in Operating Systems (HotOS XIII). USENIX Association, Napa, CA, United States.
[10] Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce: Simplified Data Processing on Large Clusters. Commun. ACM 51, 1 (Jan. 2008), 107–113. https://doi.org/10.1145/1327452.1327492
[11] Mengwei Ding, Long Zheng, Yanchao Lu, Li Li, Song Guo, and Minyi Guo. 2011. More Convenient More Overhead: The Performance Evaluation of Hadoop Streaming. In Proceedings of the 2011 ACM Symposium on Research in Applied Computation (Miami, Florida) (RACS '11). Association for Computing Machinery, New York, NY, USA, 307–313. https://doi.org/10.1145/2103380.2103444
[12] Andrew Farmer, Christian Hoener zu Siederdissen, and Andy Gill. 2014. The HERMIT in the Stream: Fusing Stream Fusion's concatMap. In Proceedings of the ACM SIGPLAN 2014 Workshop on Partial Evaluation and Program Manipulation. 97–108.
[13] Azadeh Farzan and Victor Nicolet. 2017. Synthesis of Divide and Conquer Parallelism for Loops. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (Barcelona, Spain) (PLDI 2017). Association for Computing Machinery, New York, NY, USA, 540–555. https://doi.org/10.1145/3062341.3062355
[14] Azadeh Farzan and Victor Nicolet. 2019. Modular Divide-and-Conquer Parallelization of Nested Loops. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (Phoenix, AZ, USA) (PLDI 2019). Association for Computing Machinery, New York, NY, USA, 610–624. https://doi.org/10.1145/3314221.3314612
[15] John K. Feser, Swarat Chaudhuri, and Isil Dillig. 2015. Synthesizing Data Structure Transformations from Input-Output Examples. In ACM SIGPLAN Notices, Vol. 50. ACM, 229–239.
[16] Bernd Finkbeiner and Paul Gölz. 2017. Synthesis in Distributed Environments. arXiv preprint arXiv:1710.05368 (2017).
[17] Apache Software Foundation. 2020. Hadoop Streaming. https://hadoop.apache.org/docs/r1.2.1/streaming.html
[18] GitHub, Inc. 2020. Most Starred Shell Repositories. https://git.io/JU2pB
[19] Michael Greenberg and Austin J. Blatt. 2019. Executable Formal Semantics for the POSIX Shell. Proceedings of the ACM on Programming Languages 4, POPL (2019), 1–30.
[20] Sumit Gulwani. 2011. Automating String Processing in Spreadsheets Using Input-Output Examples. In ACM SIGPLAN Notices, Vol. 46. ACM, 317–330.
[21] Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly. 2007. Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks. In Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007. 59–72.
[22] Susmit Jha, Sumit Gulwani, Sanjit A. Seshia, and Ashish Tiwari. 2010. Oracle-Guided Component-Based Program Synthesis. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1. ACM, 215–224.
[23] Dan Jurafsky. 2017. Unix for Poets. https://web.stanford.edu/class/cs124/lec/124-2018-UnixForPoets.pdf
[24] Idit Keidar. 2012. Distributed Computing Column 46: Synthesizing Distributed and Concurrent Programs. SIGACT News 43, 2 (June 2012), 84. https://doi.org/10.1145/2261417.2261436
[25] Oleg Kiselyov, Aggelos Biboudis, Nick Palladinos, and Yannis Smaragdakis. 2017. Stream Fusion, to Completeness. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages. 285–299.
[26] R. McGrath and W. Akkerman. 2004. Linux Strace.
[27] Malcolm D. McIlroy, Elliot N. Pinson, and Berkley A. Tague. 1978. UNIX Time-Sharing System: Foreword. Bell System Technical Journal 57, 6 (1978), 1899–1904.
[28] Rob Pike, Dave Presotto, Ken Thompson, Howard Trickey, et al. 1990. Plan 9 from Bell Labs. In Proceedings of the Summer 1990 UKUUG Conference. 1–9. http://css.csail.mit.edu/6.824/2014/papers/plan9.pdf
[29] Nadia Polikarpova, Ivan Kuraj, and Armando Solar-Lezama. 2016. Program Synthesis from Polymorphic Refinement Types. ACM SIGPLAN Notices 51, 6 (2016), 522–538.
[30] Deepti Raghavan, Sadjad Fouladi, Philip Levis, and Matei Zaharia. 2020. POSH: A Data-Aware Shell. In 2020 USENIX Annual Technical Conference (USENIX ATC '20). USENIX Association, 617–631.
[31] Mohammad Raza and Sumit Gulwani. 2018. Disjunctive Program Synthesis: A Robust Approach to Programming by Example. In Thirty-Second AAAI Conference on Artificial Intelligence.
[32] Martin C. Rinard, Jiasi Shen, and Varun Mangalick. 2018. Active Learning for Inference and Regeneration of Computer Programs That Store and Retrieve Data. In Proceedings of the 2018 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software (Onward! 2018), Boston, MA, USA, November 7-8, 2018, Elisa Gonzalez Boix and Richard P. Gabriel (Eds.).
[33] Radu Rugina and Martin Rinard. 1999. Automatic Parallelization of Divide and Conquer Algorithms. In Proceedings of the Seventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (Atlanta, Georgia, USA) (PPoPP '99). Association for Computing Machinery, New York, NY, USA, 72–83. https://doi.org/10.1145/301104.301111
[34] Sven Schewe. 2008. Synthesis of Distributed Systems. (2008).
[35] Torsten Seemann. 2019. 25 Reasons Assemblies Don't Make It into RefSeq. http://thegenomefactory.blogspot.com/2019/09/25-reasons-assemblies-dont-make-it-into.html
[36] Burr Settles. 2009. Active Learning Literature Survey. Computer Sciences Technical Report 1648. University of Wisconsin–Madison.
[37] Jiasi Shen and Martin Rinard. 2019. Using Active Learning to Synthesize Models of Applications That Access Databases. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (Phoenix, AZ, USA) (PLDI '19). ACM. https://doi.org/10.1145/3314221.3314591
[38] Wei Shen. 2019. A Cross-Platform Command-Line Tool for Executing Jobs in Parallel. https://github.com/shenwei356/rush
[39] Xujie Si, Woosuk Lee, Richard Zhang, Aws Albarghouthi, Paraschos Koutris, and Mayur Naik. 2018. Syntax-Guided Synthesis of Datalog Programs. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (Lake Buena Vista, FL, USA) (ESEC/FSE 2018). Association for Computing Machinery, New York, NY, USA, 515–527. https://doi.org/10.1145/3236024.3236034
[40] Rishabh Singh. 2016. BlinkFill: Semi-Supervised Programming by Example for Syntactic String Transformations. Proceedings of the VLDB Endowment 9, 10 (2016), 816–827.
[41] Calvin Smith and Aws Albarghouthi. 2016. MapReduce Program Synthesis. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation (Santa Barbara, CA, USA) (PLDI '16). Association for Computing Machinery, New York, NY, USA, 326–340. https://doi.org/10.1145/2908080.2908102
[42] Diomidis Spinellis and Marios Fragkoulis. 2017. Extending Unix Pipelines to DAGs. IEEE Trans. Comput. 66, 9 (2017), 1547–1561.
[43] Ole Tange. 2011. GNU Parallel: The Command-Line Power Tool. ;login: The USENIX Magazine 36, 1 (Feb. 2011), 42–47. https://doi.org/10.5281/zenodo.16303
[44] Nikos Vasilakis, Konstantinos Kallas, Konstantinos Mamouras, Achilleas Benetopoulos, and Lazar Cvetkovich. 2020. PaSh: Light-Touch Data-Parallel Shell Processing. arXiv preprint arXiv:2007.09436 (2020).
[45] Navid Yaghmazadeh, Xinyu Wang, and Isil Dillig. 2018. Automated Migration of Hierarchical Data to Relational Tables Using Programming-by-Example. Proceedings of the VLDB Endowment 11, 5 (2018), 580–593.
[46] Andy B. Yoo, Morris A. Jette, and Mark Grondona. 2003. SLURM: Simple Linux Utility for Resource Management. In Workshop on Job Scheduling Strategies for Parallel Processing. Springer, 44–60.
[47] Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2012. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation (San Jose, CA) (NSDI '12). USENIX Association, Berkeley, CA, USA, 2–2. http://dl.acm.org/citation.cfm?id=2228298.2228301