Sumit Gulwani | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Sumit Gulwani is active.

Explore More

Publication

Featured researches published by Sumit Gulwani.

symposium on principles of programming languages | 2011

Automating string processing in spreadsheets using input-output examples

Sumit Gulwani

We describe the design of a string programming/expression language that supports restricted forms of regular expressions, conditionals and loops. The language is expressive enough to represent a wide variety of string manipulation tasks that end-users struggle with. We describe an algorithm based on several novel concepts for synthesizing a desired program in this language from input-output examples. The synthesis algorithm is very efficient taking a fraction of a second for various benchmark examples. The synthesis algorithm is interactive and has several desirable features: it can rank multiple solutions and has fast convergence, it can detect noise in the user input, and it supports an active interaction model wherein the user is prompted to provide outputs on inputs that may have multiple computational interpretations. The algorithm has been implemented as an interactive add-in for Microsoft Excel spreadsheet system. The prototype tool has met the golden test - it has synthesized part of itself, and has been used to solve problems beyond authors imagination.

symposium on principles of programming languages | 2009

SPEED: precise and efficient static estimation of program computational complexity

Sumit Gulwani; Krishna Kumar Mehra; Trishul M. Chilimbi

This paper describes an inter-procedural technique for computing symbolic bounds on the number of statements a procedure executes in terms of its scalar inputs and user-defined quantitative functions of input data-structures. Such computational complexity bounds for even simple programs are usually disjunctive, non-linear, and involve numerical properties of heaps. We address the challenges of generating these bounds using two novel ideas. We introduce a proof methodology based on multiple counter instrumentation (each counter can be initialized and incremented at potentially multiple program locations) that allows a given linear invariant generation tool to compute linear bounds individually on these counter variables. The bounds on these counters are then composed together to generate total bounds that are non-linear and disjunctive. We also give an algorithm for automating this proof methodology. Our algorithm generates complexity bounds that are usually precise not only in terms of the computational complexity, but also in terms of the constant factors. Next, we introduce the notion of user-defined quantitative functions that can be associated with abstract data-structures, e.g., length of a list, height of a tree, etc. We show how to compute bounds in terms of these quantitative functions using a linear invariant generation tool that has support for handling uninterpreted functions. We show application of this methodology to commonly used data-structures (namely lists, list of lists, trees, bit-vectors) using examples from Microsoft product code. We observe that a few quantitative functions for each data-structure are usually sufficient to allow generation of symbolic complexity bounds of a variety of loops that iterate over these data-structures, and that it is straightforward to define these quantitative functions. The combination of these techniques enables generation of precise computational complexity bounds for real-world examples (drawn from Microsoft product code and C++ STL library code) for some of which it is non-trivial to even prove termination. Such automatically generated bounds are very useful for early detection of egregious performance problems in large modular codebases that are constantly being changed by multiple developers who make heavy use of code written by others without a good understanding of their implementation complexity.

international conference on software engineering | 2010

Oracle-guided component-based program synthesis

Susmit Jha; Sumit Gulwani; Sanjit A. Seshia; Ashish Tiwari

We present a novel approach to automatic synthesis of loop-free programs. The approach is based on a combination of oracle-guided learning from examples, and constraint-based synthesis from components using satisfiability modulo theories (SMT) solvers. Our approach is suitable for many applications, including as an aid to program understanding tasks such as deobfuscating malware. We demonstrate the efficiency and effectiveness of our approach by synthesizing bit-manipulating programs and by deobfuscating programs.

symposium on principles of programming languages | 2010

From program verification to program synthesis

Saurabh Srivastava; Sumit Gulwani; Jeffrey S. Foster

This paper describes a novel technique for the synthesis of imperative programs. Automated program synthesis has the potential to make programming and the design of systems easier by allowing programs to be specified at a higher-level than executable code. In our approach, which we call proof-theoretic synthesis, the user provides an input-output functional specification, a description of the atomic operations in the programming language, and a specification of the synthesized programs looping structure, allowed stack space, and bound on usage of certain operations. Our technique synthesizes a program, if there exists one, that meets the input-output specification and uses only the given resources. The insight behind our approach is to interpret program synthesis as generalized program verification, which allows us to bring verification tools and techniques to program synthesis. Our synthesis algorithm works by creating a program with unknown statements, guards, inductive invariants, and ranking functions. It then generates constraints that relate the unknowns and enforces three kinds of requirements: partial correctness, loop termination, and well-formedness conditions on program guards. We formalize the requirements that program verification tools must meet to solve these constraint and use tools from prior work as our synthesizers. We demonstrate the feasibility of the proposed approach by synthesizing programs in three different domains: arithmetic, sorting, and dynamic programming. Using verification tools that we previously built in the VS3 project we are able to synthesize programs for complicated arithmetic algorithms including Strassens matrix multiplication and Bresenhams line drawing; several sorting algorithms; and several dynamic programming algorithms. For these programs, the median time for synthesis is 14 seconds, and the ratio of synthesis to verification time ranges between 1x to 92x (with an median of 7x), illustrating the potential of the approach.

programming language design and implementation | 2013

Automated feedback generation for introductory programming assignments

Rishabh Singh; Sumit Gulwani; Armando Solar-Lezama

We present a new method for automatically providing feedback for introductory programming problems. In order to use this method, we need a reference implementation of the assignment, and an error model consisting of potential corrections to errors that students might make. Using this information, the system automatically derives minimal corrections to students incorrect solutions, providing them with a measure of exactly how incorrect a given solution was, as well as feedback about what they did wrong. We introduce a simple language for describing error models in terms of correction rules, and formally define a rule-directed translation strategy that reduces the problem of finding minimal corrections in an incorrect program to the problem of synthesizing a correct program from a sketch. We have evaluated our system on thousands of real student attempts obtained from the Introduction to Programming course at MIT (6.00) and MITx (6.00x). Our results show that relatively simple error models can correct on average 64% of all incorrect submissions in our benchmark set.

programming language design and implementation | 2011

Synthesis of loop-free programs

Sumit Gulwani; Susmit Jha; Ashish Tiwari; Ramarathnam Venkatesan

We consider the problem of synthesizing loop-free programs that implement a desired functionality using components from a given library. Specifications of the desired functionality and the library components are provided as logical relations between their respective input and output variables. The library components can be used at most once, and hence the library is required to contain a reasonable overapproximation of the multiset of the components required. We solve the above component-based synthesis problem using a constraint-based approach that involves first generating a synthesis constraint, and then solving the constraint. The synthesis constraint is a first-order ∃∀ logic formula whose size is quadratic in the number of components. We present a novel algorithm for solving such constraints. Our algorithm is based on counterexample guided iterative synthesis paradigm and uses off-the-shelf SMT solvers. We present experimental results that show that our tool Brahma can efficiently synthesize highly nontrivial 10-20 line loop-free bitvector programs. These programs represent a state space of approximately 2010 programs, and are beyond the reach of the other tools based on sketching and superoptimization.

programming language design and implementation | 2008

Program analysis as constraint solving

Sumit Gulwani; Saurabh Srivastava; Ramarathnam Venkatesan

A constraint-based approach to invariant generation in programs translates a program into constraints that are solved using off-the-shelf constraint solvers to yield desired program invariants. In this paper we show how the constraint-based approach can be used to model a wide spectrum of program analyses in an expressive domain containing disjunctions and conjunctions of linear inequalities. In particular, we show how to model the problem of context-sensitive interprocedural program verification. We also present the first constraint-based approach to weakest precondition and strongest postcondition inference. The constraints we generate are boolean combinations of quadratic inequalities over integer variables. We reduce these constraints to SAT formulae using bitvector modeling and use off-the-shelf SAT solvers to solve them. Furthermore, we present interesting applications of the above analyses, namely bounds analysis and generation of most-general counter-examples for both safety and termination properties. We also present encouraging preliminary experimental results demonstrating the feasibility of our technique on a variety of challenging examples.

formal methods in computer-aided design | 2010

Dimensions in program synthesis

Sumit Gulwani

Program Synthesis, which is the task of discovering programs that realize user intent, can be useful in several scenarios: discovery of new algorithms, helping regular programmers automatically discover tricky/mundane programming details, enabling people with no programming background to develop scripts for performing repetitive tasks (end-user programming), and even problem solving in the context of automating teaching. In this tutorial, I will describe the three key dimensions that should be taken into account in designing any program synthesis system: expression of user intent, space of programs over which to search, and the search technique [1]. (i) The user intent can be expressed in the form of logical relations between inputs and outputs, input-output examples, demonstrations, natural language, and inefficient or related programs. (ii) The search space can be over imperative or functional programs (with possible restrictions on the control structure or the operator set), or over restricted models of computations such as regular/context-free grammars/transducers, or succinct logical representations. (iii) The search technique can be based on exhaustive search, version space algebras, machine learning techniques (such as belief propagation or genetic programming), or logical reasoning techniques based on SAT/SMT solvers. I will illustrate these concepts by brief description of various program synthesis projects that target synthesis of a wide variety of programs such as standard undergraduate textbook algorithms (e.g., sorting, dynamic programming), program inverses (e.g., decoders, deserializers), bitvector manipulation routines, deobfuscated programs, graph algorithms, text-manipulating routines, geometry algorithms etc.

symposium on principles of programming languages | 2008

Lifting abstract interpreters to quantified logical domains

Sumit Gulwani; Bill McCloskey; Ashish Tiwari

We describe a general technique for building abstract interpreters over powerful universally quantified abstract domains that leverage existing quantifier-free domains. Our quantified abstract domain can represent universally quantified facts like ∀i(0 ≤ i < n ⇒ α[i] = 0). The principal challenge in this effort is that, while most domains supply over-approximations of operations like join, meet, and variable elimination, working with the guards of quantified facts requires under-approximation. We present an automatic technique to convert the standard over-approximation operations provided with all domains into sound under-approximations. We establish the correctness of our abstract interpreters by identifying two lattices---one that establishes the soundness of the abstract interpreter and another that defines its precision, or completeness. Our experiments on a variety of programs using arrays and pointers (including several sorting algorithms) demonstrate the feasibility of the approach on challenging examples.

Communications of The ACM | 2012

Spreadsheet data manipulation using examples

Sumit Gulwani; William R. Harris; Rishabh Singh

Millions of computer end users need to perform tasks over large spreadsheet data, yet lack the programming knowledge to do such tasks automatically. We present a programming by example methodology that allows end users to automate such repetitive tasks. Our methodology involves designing a domain-specific language and developing a synthesis algorithm that can learn programs in that language from user-provided examples. We present instantiations of this methodology for particular domains of tasks: (a) syntactic transformations of strings using restricted forms of regular expressions, conditionals, and loops, (b) semantic transformations of strings involving lookup in relational tables, and (c) layout transformations on spreadsheet tables. We have implemented this technology as an add-in for the Microsoft Excel Spreadsheet system and have evaluated it successfully over several benchmarks picked from various Excel help forums.

Explore More