Is this you? Create Your Porfile

Mark Gabel

University of California, Davis

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Mark Gabel is active.

Explore More

Publication

Featured researches published by Mark Gabel.

international conference on software engineering | 2012

On the naturalness of software

Abram Hindle; Earl T. Barr; Zhendong Su; Mark Gabel; Premkumar T. Devanbu

Natural languages like English are rich, complex, and powerful. The highly creative and graceful use of languages like English and Tamil, by masters like Shakespeare and Avvaiyar, can certainly delight and inspire. But in practice, given cognitive constraints and the exigencies of daily life, most human utterances are far simpler and much more repetitive and predictable. In fact, these utterances can be very usefully modeled using modern statistical methods. This fact has led to the phenomenal success of statistical approaches to speech recognition, natural language translation, question-answering, and text mining and comprehension. We begin with the conjecture that most software is also natural, in the sense that it is created by humans at work, with all the attendant constraints and limitations — and thus, like natural language, it is also likely to be repetitive and predictable. We then proceed to ask whether a) code can be usefully modeled by statistical language models and b) such models can be leveraged to support software engineers. Using the widely adopted n-gram model, we provide empirical evidence supportive of a positive answer to both these questions. We show that code is also very repetitive, and in fact even more so than natural languages. As an example use of the model, we have developed a simple code completion engine for Java that, despite its simplicity, already improves Eclipses built-in completion capability. We conclude the paper by laying out a vision for future research in this area.

international conference on software engineering | 2008

Scalable detection of semantic clones

Mark Gabel; Lingxiao Jiang; Zhendong Su

Several techniques have been developed for identifying similar code fragments in programs. These similar fragments, referred to as code clones, can be used to identify redundant code, locate bugs, or gain insight into program design. Existing scalable approaches to clone detection are limited to finding program fragments that are similar only in their contiguous syntax. Other, semantics-based approaches are more resilient to differences in syntax, such as reordered statements, related statements interleaved with other unrelated statements, or the use of semantically equivalent control structures. However, none of these techniques have scaled to real world code bases. These approaches capture semantic information from Program Dependence Graphs (PDGs), program representations that encode data and control dependencies between statements and predicates. Our definition of a code clone is also based on this representation: we consider program fragments with isomorphic PDGs to be clones. In this paper, we present the first scalable clone detection algorithm based on this definition of semantic clones. Our insight is the reduction of the difficult graph similarity problem to a simpler tree similarity problem by mapping carefully selected PDG subgraphs to their related structured syntax. We efficiently solve the tree similarity problem to create a scalable analysis. We have implemented this algorithm in a practical tool and performed evaluations on several million-line open source projects, including the Linux kernel. Compared with previous approaches, our tool locates significantly more clones, which are often more semantically interesting than simple copied and pasted code fragments.

foundations of software engineering | 2008

Javert: fully automatic mining of general temporal properties from dynamic traces

Mark Gabel; Zhendong Su

Program specifications are important for many tasks during software design, development, and maintenance. Among these, temporal specifications are particularly useful. They express formal correctness requirements of an applications ordering of specific actions and events during execution, such as the strict alternation of acquisition and release of locks. Despite their importance, temporal specifications are often missing, incomplete, or described only informally. Many techniques have been proposed that mine such specifications from execution traces or program source code. However, existing techniques mine only simple patterns, or they mine a single complex pattern that is restricted to a particular set of manually selected events. There is no practical, automatic technique that can mine general temporal properties from execution traces. In this paper, we present Javert, the first general specification mining framework that can learn, fully automatically, complex temporal properties from execution traces. The key insight behind Javert is that real, complex specifications can be formed by composing instances of small generic patterns, such as the alternating pattern ((ab)) and the resource usage pattern ((ab c)). In particular, Javert learns simple generic patterns and composes them using sound rules to construct large, complex specifications. We have implemented the algorithm in a practical tool and conducted an extensive empirical evaluation on several open source software projects. Our results are promising; they show that Javert is scalable, general, and precise. It discovered many interesting, nontrivial specifications in real-world code that are beyond the reach of existing automatic techniques.

international conference on software engineering | 2010

Online inference and enforcement of temporal properties

Mark Gabel; Zhendong Su

The interfaces of software components are often paired with specifications or protocols that prescribe correct and safe usage. An important class of these specifications consists of temporal safety properties over function or method call sequences. Because violations of these properties can lead to program crashes or subtly inconsistent program state, these properties are frequently the target of runtime monitoring techniques. However, the properties must be specified in advance, a time-consuming process. Recognizing this problem, researchers have proposed various specification inference techniques, but they suffer from imprecision and require a significant investment in developer time. This work presents the first fully automatic dynamic technique for simultaneously learning and enforcing general temporal properties over method call sequences. Our technique is an online algorithm that operates over a short, finite execution history. This limited view works well in practice due to the inherent temporal locality in sequential method calls on Java objects, a property we validate empirically. We have implemented our algorithm in a practical tool for Java, Ocd, that operates with a high degree of precision and finds new defects and code smells in well-tested applications.

international conference on software engineering | 2008

Symbolic mining of temporal specifications

Mark Gabel; Zhendong Su

Program specifications are important in many phases of the software development process, but they are often omitted or incomplete. An important class of specifications takes the form of temporal properties that prescribe proper usage of components of a software system. Recent work has focused on the automated inference of temporal specifications from the static or runtime behavior of programs. Many techniques match a specification pattern (represented by a finite state automaton) to all possible combinations of program components and enumerate the possible matches. Such approaches suffer from high space complexity and have not scaled beyond simple, two-letter alternating patterns (e.g. (ab)*). In this paper, we precisely define this form of specification mining and show that its general form is NP-complete. We observe a great deal of regularity in the representation and tracking of all possible combinations of system components. This motivates us to introduce a symbolic algorithm, based on binary decision diagrams (BDDs), that exploits this regularity. Our results show that this symbolic approach expands the tractability of this problem by orders of magnitude in both time and space. It enables us to mine more complex specifications, such as the common three-letter resource acquisition, usage, and release pattern ((ab+c)*). We have implemented our algorithm in a practical tool and used it to find significant specifications in real systems, including Apache Ant and Hibernate. We then used these specifications to find previously unknown bugs.

conference on object-oriented programming systems, languages, and applications | 2010

Scalable and systematic detection of buggy inconsistencies in source code

Mark Gabel; Junfeng Yang; Yuan Yu; Moises Goldszmidt; Zhendong Su

Software developers often duplicate source code to replicate functionality. This practice can hinder the maintenance of a software project: bugs may arise when two identical code segments are edited inconsistently. This paper presents DejaVu, a highly scalable system for detecting these general syntactic inconsistency bugs. DejaVu operates in two phases. Given a target code base, a parallel /inconsistent clone analysis/ first enumerates all groups of source code fragments that are similar but not identical. Next, an extensible /buggy change analysis/ framework refines these results, separating each group of inconsistent fragments into a fine-grained set of inconsistent changes and classifying each as benign or buggy. On a 75+ million line pre-production commercial code base, DejaVu executed in under five hours and produced a report of over 8,000 potential bugs. Our analysis of a sizable random sample suggests with high likelihood that at this report contains at least 2,000 true bugs and 1,000 code smells. These bugs draw from a diverse class of software defects and are often simple to correct: syntactic inconsistencies both indicate problems and suggest solutions.

foundations of software engineering | 2010