Michael D. Ernst | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Michael D. Ernst is active.

Explore More

Publication

Featured researches published by Michael D. Ernst.

international conference on software engineering | 1999

Dynamically discovering likely program invariants to support program evolution

Michael D. Ernst; Jake Cockrell; William G. Griswold; David Notkin

Explicitly stated program invariants can help programmers by identifying program properties that must be preserved when modifying code. In practice, however, these invariants are usually implicit. An alternative to expecting programmers to fully annotate code with invariants is to automatically infer invariants from the program itself. This research focuses on dynamic techniques for discovering invariants from execution traces. This paper reports two results. First, it describes techniques for dynamically discovering invariants, along with an instrumenter and an inference engine that embody these techniques. Second, it reports on the application of the engine to two sets of target programs. In programs from Criess work on program derivation, we rediscovered predefined invariants. In a C program lacking explicit invariants, we discovered invariants that assisted a software evolution task.

Science of Computer Programming | 2007

The Daikon system for dynamic detection of likely invariants

Michael D. Ernst; Jeff H. Perkins; Philip J. Guo; Stephen McCamant; Carlos Pacheco; Matthew S. Tschantz; Chen Xiao

Daikon is an implementation of dynamic detection of likely invariants; that is, the Daikon invariant detector reports likely program invariants. An invariant is a property that holds at a certain point or points in a program; these are often used in assert statements, documentation, and formal specifications. Examples include being constant (x=a), non-zero (x 0), being in a range (a@?x@?b), linear relationships (y=ax+b), ordering (x@?y), functions from a library (x=fn(y)), containment (x@?y), sortedness (xissorted), and many more. Users can extend Daikon to check for additional invariants. Dynamic invariant detection runs a program, observes the values that the program computes, and then reports properties that were true over the observed executions. Dynamic invariant detection is a machine learning technique that can be applied to arbitrary data. Daikon can detect invariants in C, C++, Java, and Perl programs, and in record-structured data sources; it is easy to extend Daikon to other applications. Invariants can be useful in program understanding and a host of other applications. Daikons output has been used for generating test cases, predicting incompatibilities in component integration, automating theorem proving, repairing inconsistent data structures, and checking the validity of data streams, among other tasks. Daikon is freely available in source and binary form, along with extensive documentation, at http://pag.csail.mit.edu/daikon/.

formal methods for industrial critical systems | 2005

An overview of JML tools and applications

Lilian Burdy; Yoonsik Cheon; David R. Cok; Michael D. Ernst; Joseph R. Kiniry; Gary T. Leavens; K. Rustan M. Leino; Erik Poll

The Java Modeling Language (JML) can be used to specify the detailed design of Java classes and interfaces by adding annotations to Java source files. The aim of JML is to provide a specification language that is easy to use for Java programmers and that is supported by a wide range of tools for specification typechecking, runtime debugging, static analysis, and verification.This paper gives an overview of the main ideas behind JML, details about JML’s wide range of tools, and a glimpse into existing applications of JML.

very large data bases | 2010

HaLoop: efficient iterative data processing on large clusters

Yingyi Bu; Bill Howe; Magdalena Balazinska; Michael D. Ernst

The growing demand for large-scale data mining and data analysis applications has led both industry and academia to design new types of highly scalable data-intensive computing platforms. MapReduce and Dryad are two popular platforms in which the dataflow takes the form of a directed acyclic graph of operators. These platforms lack built-in support for iterative programs, which arise naturally in many applications including data mining, web ranking, graph analysis, model fitting, and so on. This paper presents HaLoop, a modified version of the Hadoop MapReduce framework that is designed to serve these applications. HaLoop not only extends MapReduce with programming support for iterative applications, it also dramatically improves their efficiency by making the task scheduler loop-aware and by adding various caching mechanisms. We evaluated HaLoop on real queries and real datasets. Compared with Hadoop, on average, HaLoop reduces query runtimes by 1.85, and shuffles only 4% of the data between mappers and reducers.The growing demand for large-scale data mining and data analysis applications has led both industry and academia to design new types of highly scalable data-intensive computing platforms. MapReduce and Dryad are two popular platforms in which the dataflow takes the form of a directed acyclic graph of operators. These platforms lack built-in support for iterative programs, which arise naturally in many applications including data mining, web ranking, graph analysis, model fitting, and so on. This paper presents HaLoop, a modified version of the Hadoop MapReduce framework that is designed to serve these applications. HaLoop not only extends MapReduce with programming support for iterative applications, it also dramatically improves their efficiency by making the task scheduler loop-aware and by adding various caching mechanisms. We evaluated HaLoop on real queries and real datasets. Compared with Hadoop, on average, HaLoop reduces query runtimes by 1.85, and shuffles only 4% of the data between mappers and reducers.

international conference on software engineering | 2007

Feedback-Directed Random Test Generation

Carlos Pacheco; Shuvendu K. Lahiri; Michael D. Ernst; Thomas Ball

We present a technique that improves random test generation by incorporating feedback obtained from executing test inputs as they are created. Our technique builds inputs incrementally by randomly selecting a method call to apply and finding arguments from among previously-constructed inputs. As soon as an input is built, it is executed and checked against a set of contracts and filters. The result of the execution determines whether the input is redundant, illegal, contract-violating, or useful for generating more inputs. The technique outputs a test suite consisting of unit tests for the classes under test. Passing tests can be used to ensure that code contracts are preserved across program changes; failing tests (that violate one or more contract) point to potential errors that should be corrected. Our experimental results indicate that feedback-directed random test generation can outperform systematic and undirected random test generation, in terms of coverage and error detection. On four small but nontrivial data structures (used previously in the literature), our technique achieves higher or equal block and predicate coverage than model checking (with and without abstraction) and undirected random generation. On 14 large, widely-used libraries (comprising 780KLOC), feedback-directed random test generation finds many previously-unknown errors, not found by either model checking or undirected random generation.

international conference on software engineering | 2009

Automatic creation of SQL Injection and cross-site scripting attacks

Adam Kieyzun; Philip J. Guo; Karthick Jayaraman; Michael D. Ernst

We present a technique for finding security vulnerabilities in Web applications. SQL Injection (SQLI) and cross-site scripting (XSS) attacks are widespread forms of attack in which the attacker crafts the input to the application to access or modify user data and execute malicious code. In the most serious attacks (called second-order, or persistent, XSS), an attacker can corrupt a database so as to cause subsequent users to execute malicious code.

symposium on operating systems principles | 2009

Automatically patching errors in deployed software

Jeff H. Perkins; Sunghun Kim; Samuel Larsen; Saman P. Amarasinghe; Jonathan Bachrach; Michael Carbin; Carlos Pacheco; Frank Sherwood; Stelios Sidiroglou; Greg Sullivan; Weng-Fai Wong; Yoav Zibin; Michael D. Ernst; Martin C. Rinard

We present ClearView, a system for automatically patching errors in deployed software. ClearView works on stripped Windows x86 binaries without any need for source code, debugging information, or other external information, and without human intervention. ClearView (1) observes normal executions to learn invariants thatcharacterize the applications normal behavior, (2) uses error detectors to distinguish normal executions from erroneous executions, (3) identifies violations of learned invariants that occur during erroneous executions, (4) generates candidate repair patches that enforce selected invariants by changing the state or flow of control to make the invariant true, and (5) observes the continued execution of patched applications to select the most successful patch. ClearView is designed to correct errors in software with high availability requirements. Aspects of ClearView that make it particularly appropriate for this context include its ability to generate patches without human intervention, apply and remove patchesto and from running applications without requiring restarts or otherwise perturbing the execution, and identify and discard ineffective or damaging patches by evaluating the continued behavior of patched applications. ClearView was evaluated in a Red Team exercise designed to test its ability to successfully survive attacks that exploit security vulnerabilities. A hostile external Red Team developed ten code injection exploits and used these exploits to repeatedly attack an application protected by ClearView. ClearView detected and blocked all of the attacks. For seven of the ten exploits, ClearView automatically generated patches that corrected the error, enabling the application to survive the attacks and continue on to successfully process subsequent inputs. Finally, the Red Team attempted to make Clear-View apply an undesirable patch, but ClearViews patch evaluation mechanism enabled ClearView to identify and discard both ineffective patches and damaging patches.

international symposium on software testing and analysis | 2009

HAMPI: a solver for string constraints

Adam Kiezun; Vijay Ganesh; Philip J. Guo; Pieter Hooimeijer; Michael D. Ernst

Many automatic testing, analysis, and verification techniques for programs can be effectively reduced to a constraint generation phase followed by a constraint-solving phase. This separation of concerns often leads to more effective and maintainable tools. The increasing efficiency of off-the-shelf constraint solvers makes this approach even more compelling. However, there are few effective and sufficiently expressive off-the-shelf solvers for string constraints generated by analysis techniques for string-manipulating programs. We designed and implemented Hampi, a solver for string constraints over fixed-size string variables. Hampi constraints express membership in regular languages and fixed-size context-free languages. Hampi constraints may contain context-free-language definitions, regular language definitions and operations, and the membership predicate. Given a set of constraints, Hampi outputs a string that satisfies all the constraints, or reports that the constraints are unsatisfiable. Hampi is expressive and efficient, and can be successfully applied to testing and analysis of real programs. Our experiments use Hampi in: static and dynamic analyses for finding SQL injection vulnerabilities in Web applications; automated bug finding in C programs using systematic testing; and compare Hampi with another string solver. Hampis source code, documentation, and the experimental data are available at http://people.csail.mit.edu/akiezun/hampi.

international conference on software engineering | 2000

Quickly detecting relevant program invariants

Michael D. Ernst; Adam Czeisler; William G. Griswold; David Notkin

Explicitly stated program invariants can help programmers by characterizing certain aspects of program execution and identifying program properties that must be preserved when modifying code. Unfortunately, these invariants are usually absent from code. Previous work showed how to dynamically detect invariants from program traces by looking for patterns in and relationships among variable values. A prototype implementation, Daikon, accurately recovered invariants from formally-specified programs, and the invariants it detected in other programs assisted programmers in a software evolution task. However, Daikon suffered from reporting too many invariants, many of which were not useful, and also failed to report some desired invariants. The paper presents, and gives experimental evidence of the efficacy of, four approaches for increasing the relevance of invariants reported by a dynamic invariant detector. One of them (exploiting unused polymorphism), adds desired invariants to the output. The other three (suppressing implied invariants, limiting which variables are compared to one another, and ignoring unchanged values), eliminate undesired invariants from the output and also improve runtime by reducing the work done by the invariant detector.

european conference on object oriented programming | 2005

Eclat: automatic generation and classification of test inputs

Carlos Pacheco; Michael D. Ernst

This paper describes a technique that selects, from a large set of test inputs, a small subset likely to reveal faults in the software under test. The technique takes a program or software component, plus a set of correct executions — say, from observations of the software running properly, or from an existing test suite that a user wishes to enhance. The technique first infers an operational model of the softwares operation. Then, inputs whose operational pattern of execution differs from the model in specific ways are suggestive of faults. These inputs are further reduced by selecting only one input per operational pattern. The result is a small portion of the original inputs, deemed by the technique as most likely to reveal faults. Thus, the technique can also be seen as an error-detection technique. The paper describes two additional techniques that complement test input selection. One is a technique for automatically producing an oracle (a set of assertions) for a test input from the operational model, thus transforming the test input into a test case. The other is a classification-guided test input generation technique that also makes use of operational models and patterns. When generating inputs, it filters out code sequences that are unlikely to contribute to legal inputs, improving the efficiency of its search for fault-revealing inputs. We have implemented these techniques in the Eclat tool, which generates unit tests for Java classes. Eclats input is a set of classes to test and an example program execution—say, a passing test suite. Eclats output is a set of JUnit test cases, each containing a potentially fault-revealing input and a set of assertions at least one of which fails. In our experiments, Eclat successfully generated inputs that exposed fault-revealing behavior; we have used Eclat to reveal real errors in programs. The inputs it selects as fault-revealing are an order of magnitude as likely to reveal a fault as all generated inputs.

Explore More