Featured Researches

Programming Languages

Automatic Synthesis of Parallel and Distributed Unix Commands with KumQuat

We present KumQuat, a system for automatically synthesizing parallel and distributed versions of Unix shell commands. KumQuat follows a divide-and-conquer approach, decomposing commands into (i) a parallel mapper applying the original command to produce partial results, and (ii) an ordered combiner that combines the partial results into the final output. KumQuat synthesizes the combiner by applying repeated rounds of exploration; at each round, it compares the results of the synthesized program with those from the sequential program to discard invalid candidates. A series of refinements improve the performance of both the synthesis component and the resulting synthesized programs. For 98.2% of Unix commands from real pipelines, KumQuat either synthesizes a combiner (92.2%) or reports that a combiner is not synthesizable (7.8%), offering an average speedup of 7.8 ? for the parallel version and 3.8 ? for the distributed version.

Read more
Programming Languages

Automatic Verification of LLVM Code

In this work we present our work in developing a software verification tool for llvm-code - Lodin - that incorporates both explicit-state model checking, statistical model checking and symbolic state model checking algorithms.

Read more
Programming Languages

Automatic and Efficient Variability-Aware Lifting of Functional Programs

A software analysis is a computer program that takes some representation of a software product as input and produces some useful information about that product as output. A software product line encompasses \emph{many} software product variants, and thus existing analyses can be applied to each of the product variations individually, but not to the entire product line as a whole. Enumerating all product variants and analyzing them one by one is usually intractable due to the combinatorial explosion of the number of product variants with respect to product line features. Several software analyses (e.g., type checkers, model checkers, data flow analyses) have been redesigned/re-implemented to support variability. This usually requires a lot of time and effort, and the variability-aware version of the analysis might have new errors/bugs that do not exist in the original one. Given an analysis program written in a functional language based on PCF, in this paper we present two approaches to transforming (lifting) it into a semantically equivalent variability-aware analysis. A light-weight approach (referred to as \emph{shallow lifting}) wraps the analysis program into a variability-aware version, exploring all combinations of its input arguments. Deep lifting, on the other hand, is a program rewriting mechanism where the syntactic constructs of the input program are rewritten into their variability-aware counterparts. Compositionally this results in an efficient program semantically equivalent to the input program, modulo variability. We present the correctness criteria for functional program lifting, together with correctness proof sketches of our program transformations. We evaluate our approach on a set of program analyses applied to the BusyBox C-language product line.

Read more
Programming Languages

Automatic generation and verification of test-stable floating-point code

Test instability in a floating-point program occurs when the control flow of the program diverges from its ideal execution assuming real arithmetic. This phenomenon is caused by the presence of round-off errors that affect the evaluation of arithmetic expressions occurring in conditional statements. Unstable tests may lead to significant errors in safety-critical applications that depend on numerical computations. Writing programs that take into consideration test instability is a difficult task that requires expertise on finite precision computations and rounding errors. This paper presents a toolchain to automatically generate and verify a provably correct test-stable floating-point program from a functional specification in real arithmetic. The input is a real-valued program written in the Prototype Verification System (PVS) specification language and the output is a transformed floating-point C program annotated with ANSI/ISO C Specification Language (ACSL) contracts. These contracts relate the floating-point program to its functional specification in real arithmetic. The transformed program detects if unstable tests may occur and, in these cases, issues a warning and terminate. An approach that combines the Frama-C analyzer, the PRECiSA round-off error estimator, and PVS is proposed to automatically verify that the generated program code is correct in the sense that, if the program terminates without a warning, it follows the same computational path as its real-valued functional specification.

Read more
Programming Languages

Automatically Deriving Control-Flow Graph Generators from Operational Semantics

We develop the first theory of control-flow graphs from first principles, and use it to create an algorithm for automatically synthesizing many variants of control-flow graph generators from a language's operational semantics. Our approach first introduces a new algorithm for converting a large class of small-step operational semantics to an abstract machine. It next uses a technique called "abstract rewriting" to automatically abstract the semantics of a language, which is used both to directly generate a CFG from a program ("interpreted mode") and to generate standalone code, similar to a human-written CFG generator, for any program in a language. We show how the choice of two abstraction and projection parameters allow our approach to synthesize several families of CFG-generators useful for different kinds of tools. We prove the correspondence between the generated graphs and the original semantics. We provide and prove an algorithm for automatically proving the termination of interpreted-mode generators. In addition to our theoretical results, we have implemented this algorithm in a tool called Mandate, and show that it produces human-readable code on two medium-size languages with 60-80 rules, featuring nearly all intraprocedural control constructs common in modern languages. We then showed these CFG-generators were sufficient to build two static analyzers atop them. Our work is a promising step towards the grand vision of being able to synthesize all desired tools from the semantics of a programming language.

Read more
Programming Languages

AwkwardForth: accelerating Uproot with an internal DSL

File formats for generic data structures, such as ROOT, Avro, and Parquet, pose a problem for deserialization: it must be fast, but its code depends on the type of the data structure, not known at compile-time. Just-in-time compilation can satisfy both constraints, but we propose a more portable solution: specialized virtual machines. AwkwardForth is a Forth-driven virtual machine for deserializing data into Awkward Arrays. As a language, it is not intended for humans to write, but it loosens the coupling between Uproot and Awkward Array. AwkwardForth programs for deserializing record-oriented formats (ROOT and Avro) are about as fast as C++ ROOT and 10-80 ? faster than fastavro. Columnar formats (simple TTrees, RNTuple, and Parquet) only require specialization to interpret metadata and are therefore faster with precompiled code.

Read more
Programming Languages

BARR-C:2018 and MISRA C:2012: Synergy Between the Two Most Widely Used C Coding Standards

The Barr Group's Embedded C Coding Standard (BARR-C:2018, which originates from the 2009 Netrino's Embedded C Coding Standard) is, for coding standards used by the embedded system industry, second only in popularity to MISRA C. However, the choice between MISRA C:2012 and BARR-C:2018 needs not be a hard decision since they are complementary in two quite different ways. On the one hand, BARR-C:2018 has removed all the incompatibilities with respect to MISRA C:2012 that were present in the previous edition (BARR-C:2013). As a result, disregarding programming style, BARR-C:2018 defines a subset of C that, while preventing a significant number of programming errors, is larger than the one defined by MISRA C:2012. On the other hand, concerning programming style, whereas MISRA C leaves this to individual organizations, BARR-C:2018 defines a programming style aimed primarily at minimizing programming errors. As a result, BARR-C:2018 can be seen as a first, dramatically useful step to C language subsetting that is suitable for all kinds of projects; critical projects can then evolve toward MISRA C:2012 compliance smoothly while maintaining the BARR-C programming style. In this paper, we introduce BARR-C:2018, we describe its relationship with MISRA C:2012, and we discuss the parallel and serial adoption of the two coding standards.

Read more
Programming Languages

BISM: Bytecode-Level Instrumentation for Software Monitoring

BISM (Bytecode-Level Instrumentation for Software Monitoring) is a lightweight bytecode instrumentation tool that features an expressive high-level control-flow-aware instrumentation language. The language follows the aspect-oriented programming paradigm by adopting the joinpoint model, advice inlining, and separate instrumentation mechanisms. BISM provides joinpoints ranging from bytecode instruction to method execution, access to comprehensive static and dynamic context information, and instrumentation methods. BISM runs in two instrumentation modes: build-time and load-time. We demonstrate BISM effectiveness using two experiments: a security scenario and a general runtime verification case. The results show that BISM instrumentation incurs low runtime and memory overheads.

Read more
Programming Languages

BUSTLE: Bottom-up program-Synthesis Through Learning-guided Exploration

Program synthesis is challenging largely because of the difficulty of search in a large space of programs. Human programmers routinely tackle the task of writing complex programs by writing sub-programs and then analysing their intermediate results to compose them in appropriate ways. Motivated by this intuition, we present a new synthesis approach that leverages learning to guide a bottom-up search over programs. In particular, we train a model to prioritize compositions of intermediate values during search conditioned on a given set of input-output examples. This is a powerful combination because of several emergent properties: First, in bottom-up search, intermediate programs can be executed, providing semantic information to the neural network. Second, given the concrete values from those executions, we can exploit rich features based on recent work on property signatures. Finally, bottom-up search allows the system substantial flexibility in what order to generate the solution, allowing the synthesizer to build up a program from multiple smaller sub-programs. Overall, our empirical evaluation finds that the combination of learning and bottom-up search is remarkably effective, even with simple supervised learning approaches. We demonstrate the effectiveness of our technique on a new data set for synthesis of string transformation programs.

Read more
Programming Languages

Bacatá: Notebooks for DSLs, Almost for Free

Context: Computational notebooks are a contemporary style of literate programming, in which users can communicate and transfer knowledge by interleaving executable code, output, and prose in a single rich document. A Domain-Specific Language (DSL) is an artificial software language tailored for a particular application domain. Usually, DSL users are domain experts that may not have a software engineering background. As a consequence, they might not be familiar with Integrated Development Environments (IDEs). Thus, the development of tools that offer different interfaces for interacting with a DSL is relevant. Inquiry: However, resources available to DSL designers are limited. We would like to leverage tools used to interact with general purpose languages in the context of DSLs. Computational notebooks are an example of such tools. Then, our main question is: What is an efficient and effective method of designing and implementing notebook interfaces for DSLs? By addressing this question we might be able to speed up the development of DSL tools, and ease the interaction between end-users and DSLs. Approach: In this paper, we present Bacatá, a mechanism for generating notebook interfaces for DSLs in a language parametric fashion. We designed this mechanism in a way in which language engineers can reuse as many language components (e.g., language processors, type checkers, code generators) as possible. Knowledge: Our results show that notebook interfaces generated by Bacatá can be automatically generated with little manual configuration. There are few considerations and caveats that should be addressed by language engineers that rely on language design aspects. The creation of a notebook for a DSL with Bacatá becomes a matter of writing the code that wires existing language components in the Rascal language workbench with the Jupyter platform. Grounding: We evaluate Bacatá by generating functional computational notebook interfaces for three different non-trivial DSLs, namely: a small subset of Halide (a DSL for digital image processing), SweeterJS (an extended version of JavaScript), and QL (a DSL for questionnaires). Additionally, it is relevant to generate notebook implementations rather than implementing them manually. We measured and compared the number of Source Lines of Code (SLOCs) that we reused from existing implementations of those languages. Importance: The adoption of notebooks by novice-programmers and end-users has made them very popular in several domains such as exploratory programming, data science, data journalism, and machine learning. Why are they popular? In (data) science, it is essential to make results reproducible as well as understandable. However, notebooks are only available for GPLs. This paper opens up the notebook metaphor for DSLs to improve the end-user experience when interacting with code and to increase DSLs adoption.

Read more

Ready to get started?

Join us today