Featured Researches

Programming Languages

Data-Driven Inference of Representation Invariants

A representation invariant is a property that holds of all values of abstract type produced by a module. Representation invariants play important roles in software engineering and program verification. In this paper, we develop a counterexample-driven algorithm for inferring a representation invariant that is sufficient to imply a desired specification for a module. The key novelty is a type-directed notion of visible inductiveness, which ensures that the algorithm makes progress toward its goal as it alternates between weakening and strengthening candidate invariants. The algorithm is parameterized by an example-based synthesis engine and a verifier, and we prove that it is sound and complete for first-order modules over finite types, assuming that the synthesizer and verifier are as well. We implement these ideas in a tool called Hanoi, which synthesizes representation invariants for recursive data types. Hanoi not only handles invariants for first-order code, but higher-order code as well. In its back end, Hanoi uses an enumerative synthesizer called Myth and an enumerative testing tool as a verifier. Because Hanoi uses testing for verification, it is not sound, though our empirical evaluation shows that it is successful on the benchmarks we investigated.

Read more
Programming Languages

Dataflow Analysis With Prophecy and History Variables

Leveraging concepts from state machine refinement proofs, we use prophecy variables, which predict information about the future program execution, to enable forward reasoning for backward dataflow analyses. Drawing prophecy and history variables (concepts from the dynamic execution of the program) from the same lattice as the static program analysis results, we require the analysis results to satisfy both the dataflow equations and the transition relations in the operational semantics of underlying programming language. This approach eliminates explicit abstraction and concretization functions and promotes a more direct connection between the analysis and program executions, with the connection taking the form of a bisimulation relation between concrete executions and an augmented operational semantics over the analysis results. We present several classical dataflow analyses with this approach (live variables, very busy expressions, defined variables, and reaching definitions) along with proofs that highlight how this approach can enable more streamlined reasoning. To the best of our knowledge, we are the first to use prophecy variables for dataflow analysis.

Read more
Programming Languages

Debug-Localize-Repair: A Symbiotic Construction for Heap Manipulations

We present Wolverine, an integrated Debug-Localize-Repair environment for heap manipulating programs. Wolverine packages a new bug localization algorithm that reduces the search space of repairs and a novel, proof-directed repair algorithm to synthesize the repair patches. While concretely executing a program, Wolverine displays the abstract program states (as box-and-arrow diagrams) as a visual aid to the programmer. Wolverine supports "hot-patching" of the generated patches to provide a seamless debugging environment, and the bug localization enkindles tremendous speedups in repair timing. Wolverine also facilitates new debug-localize-repair possibilities, specification refinement, and checkpoint-based hopping. We evaluate our framework on 6400 buggy programs (generated using automated fault injection) on a variety of data-structures like singly, doubly, and circular linked lists, AVL trees, Red-Black trees, Splay Trees, and Binary Search Trees; Wolverine could repair all the buggy instances within realistic programmer wait-time (less than 5 sec in most cases). Wolverine could also repair more than 80% of the 247 (buggy) student submissions (where a reasonable attempt was made).

Read more
Programming Languages

Decidability and Synthesis of Abstract Inductive Invariants

Decidability and synthesis of inductive invariants ranging in a given domain play an important role in many software and hardware verification systems. We consider here inductive invariants belonging to an abstract domain A as defined in abstract interpretation, namely, ensuring the existence of the best approximation in A of any system property. In this setting, we study the decidability of the existence of abstract inductive invariants in A of transition systems and their corresponding algorithmic synthesis. Our model relies on some general results which relate the existence of abstract inductive invariants with least fixed points of best correct approximations in A of the transfer functions of transition systems and their completeness properties. This approach allows us to derive decidability and synthesis results for abstract inductive invariants which are applied to the well-known Kildall's constant propagation and Karr's affine equalities abstract domains. Moreover, we show that a recent general algorithm for synthesizing inductive invariants in domains of logical formulae can be systematically derived from our results and generalized to a range of algorithms for computing abstract inductive invariants.

Read more
Programming Languages

Declarative Demand-Driven Reverse Engineering

Binary reverse engineering is a challenging task because it often necessitates reasoning using both domain-specific knowledge (e.g., understanding entrypoint idioms common to an ABI) and logical inference (e.g., reconstructing interprocedural control flow). To help perform these tasks, reverse engineers often use toolkits (such as IDA Pro or Ghidra) that allow them to interactively explicate properties of binaries. We argue that deductive databases serve as a natural abstraction for interfacing between visualization-based binary analysis tools and high-performance logical inference engines that compute facts about binaries. In this paper, we present a vision for the future in which reverse engineers use a visualization-based tool to understand binaries while simultaneously querying a logical-inference engine to perform arbitrarily-complex deductive inference tasks. We call our vision declarative demand-driven reverse engineering (D^3RE for short), and sketch a formal semantics whose goal is to mediate interaction between a logical-inference engine (such Souffle) and a reverse engineering tool. We describe aprototype tool, d3re, which are using to explore the D^3RE vision. While still a prototype, we have used d3re to reimplement several common querying tasks on binaries. Our evaluation demonstrates that d3re enables both better performance and more succinct implementation of these common RE tasks.

Read more
Programming Languages

Declarative Programming with Intensional Sets in Java Using JSetL

Intensional sets are sets given by a property rather than by enumerating their elements. In previous work, we have proposed a decision procedure for a first-order logic language which provides Restricted Intensional Sets (RIS), i.e., a sub-class of intensional sets that are guaranteed to denote finite---though unbounded---sets. In this paper we show how RIS can be exploited as a convenient programming tool also in a conventional setting, namely, the imperative O-O language Java. We do this by considering a Java library, called JSetL, that integrates the notions of logical variable, (set) unification and constraints that are typical of constraint logic programming languages into the Java language. We show how JSetL is naturally extended to accommodate for RIS and RIS constraints, and how this extension can be exploited, on the one hand, to support a more declarative style of programming and, on the other hand, to effectively enhance the expressive power of the constraint language provided by the library.

Read more
Programming Languages

Deductive Verification of Floating-Point Java Programs in KeY

Deductive verification has been successful in verifying interesting properties of real-world programs. One notable gap is the limited support for floating-point reasoning. This is unfortunate, as floating-point arithmetic is particularly unintuitive to reason about due to rounding as well as the presence of the special values infinity and `Not a Number' (NaN). In this paper, we present the first floating-point support in a deductive verification tool for the Java programming language. Our support in the KeY verifier handles arithmetic via floating-point decision procedures inside SMT solvers and transcendental functions via axiomatization. We evaluate this integration on new benchmarks, and show that this approach is powerful enough to prove the absence of floating-point special values -- often a prerequisite for further reasoning about numerical computations -- as well as certain functional properties for realistic benchmarks.

Read more
Programming Languages

Deep Data Flow Analysis

Compiler architects increasingly look to machine learning when building heuristics for compiler optimization. The promise of automatic heuristic design, freeing the compiler engineer from the complex interactions of program, architecture, and other optimizations, is alluring. However, most machine learning methods cannot replicate even the simplest of the abstract interpretations of data flow analysis that are critical to making good optimization decisions. This must change for machine learning to become the dominant technology in compiler heuristics. To this end, we propose ProGraML - Program Graphs for Machine Learning - a language-independent, portable representation of whole-program semantics for deep learning. To benchmark current and future learning techniques for compiler analyses we introduce an open dataset of 461k Intermediate Representation (IR) files for LLVM, covering five source programming languages, and 15.4M corresponding data flow results. We formulate data flow analysis as an MPNN and show that, using ProGraML, standard analyses can be learned, yielding improved performance on downstream compiler optimization tasks.

Read more
Programming Languages

Deep Generation of Coq Lemma Names Using Elaborated Terms

Coding conventions for naming, spacing, and other essentially stylistic properties are necessary for developers to effectively understand, review, and modify source code in large software projects. Consistent conventions in verification projects based on proof assistants, such as Coq, increase in importance as projects grow in size and scope. While conventions can be documented and enforced manually at high cost, emerging approaches automatically learn and suggest idiomatic names in Java-like languages by applying statistical language models on large code corpora. However, due to its powerful language extension facilities and fusion of type checking and computation, Coq is a challenging target for automated learning techniques. We present novel generation models for learning and suggesting lemma names for Coq projects. Our models, based on multi-input neural networks, are the first to leverage syntactic and semantic information from Coq's lexer (tokens in lemma statements), parser (syntax trees), and kernel (elaborated terms) for naming; the key insight is that learning from elaborated terms can substantially boost model performance. We implemented our models in a toolchain, dubbed Roosterize, and applied it on a large corpus of code derived from the Mathematical Components family of projects, known for its stringent coding conventions. Our results show that Roosterize substantially outperforms baselines for suggesting lemma names, highlighting the importance of using multi-input models and elaborated terms.

Read more
Programming Languages

Deep Static Modeling of invokedynamic

Java 7 introduced programmable dynamic linking in the form of the invokedynamic framework. Static analysis of code containing programmable dynamic linking has often been cited as a significant source of unsoundness in the analysis of Java programs. For example, Java lambdas, introduced in Java 8, are a very popular feature, which is, however, resistant to static analysis, since it mixes invokedynamic with dynamic code generation. These techniques invalidate static analysis assumptions: programmable linking breaks reasoning about method resolution while dynamically generated code is, by definition, not available statically. In this paper, we show that a static analysis can predictively model uses of invokedynamic while also cooperating with extra rules to handle the runtime code generation of lambdas. Our approach plugs into an existing static analysis and helps eliminate all unsoundness in the handling of lambdas (including associated features such as method references) and generic invokedynamic uses. We evaluate our technique on a benchmark suite of our own and on third-party benchmarks, uncovering all code previously unreachable due to unsoundness, highly efficiently.

Read more

Ready to get started?

Join us today