Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Earl T. Barr is active.

Publication


Featured research published by Earl T. Barr.


International Conference on Software Engineering | 2012

On the naturalness of software

Abram Hindle; Earl T. Barr; Zhendong Su; Mark Gabel; Premkumar T. Devanbu

Natural languages like English are rich, complex, and powerful. The highly creative and graceful use of languages like English and Tamil, by masters like Shakespeare and Avvaiyar, can certainly delight and inspire. But in practice, given cognitive constraints and the exigencies of daily life, most human utterances are far simpler and much more repetitive and predictable. In fact, these utterances can be very usefully modeled using modern statistical methods. This fact has led to the phenomenal success of statistical approaches to speech recognition, natural language translation, question-answering, and text mining and comprehension. We begin with the conjecture that most software is also natural, in the sense that it is created by humans at work, with all the attendant constraints and limitations, and thus, like natural language, it is also likely to be repetitive and predictable. We then proceed to ask whether a) code can be usefully modeled by statistical language models and b) such models can be leveraged to support software engineers. Using the widely adopted n-gram model, we provide empirical evidence supportive of a positive answer to both these questions. We show that code is also very repetitive, and in fact even more so than natural languages. As an example use of the model, we have developed a simple code completion engine for Java that, despite its simplicity, already improves Eclipse's built-in completion capability. We conclude the paper by laying out a vision for future research in this area.
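
To make the idea concrete, here is a minimal sketch of an n-gram completion engine in the spirit of the paper. The corpus, tokenizer, and lack of smoothing are illustrative assumptions, not the authors' implementation.

```python
# Toy trigram model over code tokens: train on a small "corpus" of code lines,
# then suggest the most likely next token for a given prefix.
import re
from collections import Counter, defaultdict

def tokenize(line):
    # Naive tokenizer (identifiers or single symbols); real lexers are language-aware.
    return re.findall(r"[A-Za-z_]\w*|\S", line)

class TrigramModel:
    def __init__(self):
        self.counts = defaultdict(Counter)   # (tok_1, tok_2) -> Counter of next tokens

    def train(self, lines):
        for line in lines:
            tokens = ["<s>", "<s>"] + tokenize(line) + ["</s>"]
            for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
                self.counts[(a, b)][c] += 1

    def suggest(self, prefix_tokens, k=3):
        context = tuple((["<s>", "<s>"] + prefix_tokens)[-2:])
        return [tok for tok, _ in self.counts[context].most_common(k)]

corpus = [
    "for ( int i = 0 ; i < n ; i ++ ) {",
    "for ( int j = 0 ; j < m ; j ++ ) {",
    "if ( x == null ) return ;",
]
model = TrigramModel()
model.train(corpus)
print(model.suggest(tokenize("for ( int")))   # suggests identifiers seen in this context
```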


European Conference on Object-Oriented Programming | 2000

Runtime Support for Type-Safe Dynamic Java Classes

Scott Malabarba; Raju Pandey; Jeff Gragg; Earl T. Barr; J. Fritz Barnes

Modern software must evolve in response to changing conditions. In the most widely used programming environments, code is static and cannot change at runtime. This poses problems for applications that have limited down-time. More support is needed for dynamic evolution. In this paper we present an approach for supporting dynamic evolution of Java programs. In this approach, Java programs can evolve by changing their components, namely classes, during their execution. Changes in a class lead to changes in its instances, thereby allowing evolution of both code and state. The approach promotes compatibility with existing Java applications, and maintains the security and type safety controls imposed by Java's dynamic linking mechanism. Experimental analyses of our implementation indicate that the implementation imposes a moderate performance penalty relative to the unmodified virtual machine.
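
The paper's mechanism lives inside a modified JVM; as a rough language-level analogy only (an assumption, not their approach), the sketch below shows the core idea of evolving a class at runtime while existing instances keep their state, using Python's ability to reassign an instance's class.

```python
# Analogy in Python: "hot swap" an instance to a new class version,
# preserving its existing state.

class CounterV1:
    def __init__(self):
        self.count = 0
    def tick(self):
        self.count += 1
        return self.count

class CounterV2:
    """New version adds reset(); field layout stays compatible with V1."""
    def __init__(self):
        self.count = 0
    def tick(self):
        self.count += 1
        return self.count
    def reset(self):
        self.count = 0

c = CounterV1()
c.tick(); c.tick()

c.__class__ = CounterV2   # evolve the live instance to the new class
assert c.count == 2       # state survives the update
c.reset()
assert c.count == 0
```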


IEEE Transactions on Software Engineering | 2015

The Oracle Problem in Software Testing: A Survey

Earl T. Barr; Mark Harman; Phil McMinn; Muzammil Shahbaz; Shin Yoo

Testing involves examining the behaviour of a system in order to discover potential faults. Given an input for a system, the challenge of distinguishing the corresponding desired, correct behaviour from potentially incorrect behaviour is called the “test oracle problem”. Test oracle automation is important to remove a current bottleneck that inhibits greater overall test automation. Without test oracle automation, the human has to determine whether observed behaviour is correct. The literature on test oracles has introduced techniques for oracle automation, including modelling, specifications, contract-driven development and metamorphic testing. When none of these is completely adequate, the final source of test oracle information remains the human, who may be aware of informal specifications, expectations, norms and domain specific information that provide informal oracle guidance. All forms of test oracles, even the humble human, involve challenges of reducing cost and increasing benefit. This paper provides a comprehensive survey of current approaches to the test oracle problem and an analysis of trends in this important area of software testing research and practice.
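
As a small illustration of one oracle technique the survey covers, metamorphic testing checks a relation between outputs rather than an exact expected value. The function and relation below are illustrative choices, not taken from the paper.

```python
# Metamorphic oracle sketch: we may not know sin(x) exactly, but we do know
# that sin(x) == sin(pi - x) must hold, so a violation signals a fault.
import math

def system_under_test(x):
    return math.sin(x)

def metamorphic_oracle(x, tol=1e-9):
    return abs(system_under_test(x) - system_under_test(math.pi - x)) <= tol

assert all(metamorphic_oracle(x / 10.0) for x in range(-50, 50))
```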


Foundations of Software Engineering | 2014

Learning natural coding conventions

Miltiadis Allamanis; Earl T. Barr; Christian Bird; Charles A. Sutton

Every programmer has a characteristic style, ranging from preferences about identifier naming to preferences about object relationships and design patterns. Coding conventions define a consistent syntactic style, fostering readability and hence maintainability. When collaborating, programmers strive to obey a project’s coding conventions. However, one third of reviews of changes contain feedback about coding conventions, indicating that programmers do not always follow them and that project members care deeply about adherence. Unfortunately, programmers are often unaware of coding conventions because inferring them requires a global view, one that aggregates the many local decisions programmers make and identifies emergent consensus on style. We present NATURALIZE, a framework that learns the style of a codebase, and suggests revisions to improve stylistic consistency. NATURALIZE builds on recent work in applying statistical natural language processing to source code. We apply NATURALIZE to suggest natural identifier names and formatting conventions. We present four tools focused on ensuring natural code during development and release management, including code review. NATURALIZE achieves 94% accuracy in its top suggestions for identifier names. We used NATURALIZE to generate 18 patches for 5 open source projects: 14 were accepted.
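
A toy sketch of the underlying idea, not NATURALIZE itself: score how well candidate identifier names fit their surrounding tokens under corpus statistics, and suggest a rename only when an alternative is clearly more "natural". The corpus counts, candidates, and threshold are illustrative assumptions.

```python
# Score identifier candidates by how often they co-occur with their context
# in a (here, hard-coded) corpus, with add-one smoothing.
from collections import Counter

corpus_bigrams = Counter({
    ("int", "count"): 40, ("count", "="): 38,
    ("int", "cnt"): 3,    ("cnt", "="): 2,
})

def score(name, left="int", right="="):
    return (corpus_bigrams[(left, name)] + 1) * (corpus_bigrams[(name, right)] + 1)

candidates = ["cnt", "count"]
best = max(candidates, key=score)
if best != "cnt" and score(best) > 2 * score("cnt"):
    print(f"suggest renaming 'cnt' -> '{best}'")
```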


Foundations of Software Engineering | 2015

Suggesting accurate method and class names

Miltiadis Allamanis; Earl T. Barr; Christian Bird; Charles A. Sutton

Descriptive names are a vital part of readable, and hence maintainable, code. Recent progress on automatically suggesting names for local variables tantalizes with the prospect of replicating that success with method and class names. However, suggesting names for methods and classes is much more difficult. This is because good method and class names need to be functionally descriptive, but suggesting such names requires that the model goes beyond local context. We introduce a neural probabilistic language model for source code that is specifically designed for the method naming problem. Our model learns which names are semantically similar by assigning them to locations, called embeddings, in a high-dimensional continuous space, in such a way that names with similar embeddings tend to be used in similar contexts. These embeddings seem to contain semantic information about tokens, even though they are learned only from statistical co-occurrences of tokens. Furthermore, we introduce a variant of our model that is, to our knowledge, the first that can propose neologisms, names that have not appeared in the training corpus. We obtain state-of-the-art results on the method, class, and even the simpler variable naming tasks. More broadly, the continuous embeddings that are learned by our model have the potential for wide application within software engineering.
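
To illustrate what "embeddings" buy here, the minimal sketch below (not the paper's model) ranks candidate method names by cosine similarity between each name's vector and the average vector of the method body's tokens. The vectors themselves are made-up placeholders; in the real model they are learned.

```python
# Rank candidate names by how close their embedding is to the body's context vector.
import numpy as np

emb = {
    "open":     np.array([0.9, 0.1, 0.0]),
    "read":     np.array([0.8, 0.2, 0.1]),
    "file":     np.array([0.7, 0.1, 0.2]),
    "loadFile": np.array([0.8, 0.1, 0.1]),
    "shutdown": np.array([0.1, 0.8, 0.1]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

body_tokens = ["open", "read", "file"]
context = np.mean([emb[t] for t in body_tokens], axis=0)

candidates = ["loadFile", "shutdown"]
ranked = sorted(candidates, key=lambda n: cosine(emb[n], context), reverse=True)
print(ranked[0])   # "loadFile" fits this body better than "shutdown"
```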


International Conference on Software Engineering | 2010

Has the bug really been fixed?

Zhongxian Gu; Earl T. Barr; David J. Hamilton; Zhendong Su

Software has bugs, and fixing those bugs pervades the software engineering process. It is folklore that bug fixes are often buggy themselves, resulting in bad fixes, either failing to fix a bug or creating new bugs. To confirm this folklore, we explored bug databases of the Ant, AspectJ, and Rhino projects, and found that bad fixes comprise as much as 9% of all bugs. Thus, detecting and correcting bad fixes is important for improving the quality and reliability of software. However, no prior work has systematically considered this bad fix problem, which this paper introduces and formalizes. In particular, the paper formalizes two criteria to determine whether a fix resolves a bug: coverage and disruption. The coverage of a fix measures the extent to which the fix correctly handles all inputs that may trigger a bug, while disruption measures the deviations from the program's intended behavior after the application of a fix. This paper also introduces a novel notion of distance-bounded weakest precondition as the basis for the developed practical techniques to compute the coverage and disruption of a fix. To validate our approach, we implemented Fixation, a prototype that automatically detects bad fixes for Java programs. When it detects a bad fix, Fixation returns an input that still triggers the bug or reports a newly introduced bug. Programmers can then use that bug-triggering input to refine or reformulate their fix. We manually extracted fixes drawn from real-world projects and evaluated Fixation against them: Fixation successfully detected the extracted bad fixes.
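
An intuition-level harness, not Fixation itself: coverage can be approximated by re-running known bug-triggering inputs against the fixed code, and disruption by re-running a regression suite. The functions and inputs below are hypothetical.

```python
# Classify a candidate fix by (coverage, non-disruption) over two input sets.

def buggy_div(a, b):
    return a / b                      # crashes when b == 0

def fixed_div(a, b):
    return a / b if b != 0 else 0.0   # candidate fix

bug_triggering_inputs = [(1, 0), (5, 0)]
regression_suite = [((6, 3), 2.0), ((9, 3), 3.0)]

def _safe(f, args):
    try:
        f(*args)
        return True
    except Exception:
        return False

def check_fix(fix):
    covered = all(_safe(fix, args) for args in bug_triggering_inputs)
    undisrupted = all(fix(*args) == expected for args, expected in regression_suite)
    return covered, undisrupted

print(check_fix(fixed_div))   # (True, True): covers the bug without disrupting old behavior
```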


Symposium on Principles of Programming Languages | 2013

Automatic detection of floating-point exceptions

Earl T. Barr; Thanh Vo; Vu Le; Zhendong Su

It is well-known that floating-point exceptions can be disastrous and writing exception-free numerical programs is very difficult. Thus, it is important to automatically detect such errors. In this paper, we present Ariadne, a practical symbolic execution system specifically designed and implemented for detecting floating-point exceptions. Ariadne systematically transforms a numerical program to explicitly check each exception triggering condition. Ariadne symbolically executes the transformed program using real arithmetic to find candidate real-valued inputs that can reach and trigger an exception. Ariadne converts each candidate input into a floating-point number, then tests it against the original program. In general, approximating floating-point arithmetic with real arithmetic can change paths from feasible to infeasible and vice versa. The key insight of this work is that, for the problem of detecting floating-point exceptions, this approximation works well in practice because, if one input reaches an exception, many are likely to, and at least one of them will do so over both floating-point and real arithmetic. To realize Ariadne, we also devised a novel, practical linearization technique to solve nonlinear constraints. We extensively evaluated Ariadne over 467 scalar functions in the widely used GNU Scientific Library (GSL). Our results show that Ariadne is practical and identifies a large number of real runtime exceptions in GSL. The GSL developers confirmed our preliminary findings and look forward to Ariadne's public release, which we plan to do in the near future.
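
A tiny sketch of the overall workflow (not Ariadne): make an exception-triggering condition explicit, pick a candidate input by reasoning over the reals, then confirm the exception on the actual floating-point program. The function and trigger condition are illustrative assumptions.

```python
# Confirm a real-arithmetic candidate input against the floating-point program.
import math
import sys

def original(x):
    return math.exp(x)          # math.exp raises OverflowError on overflow

# Explicit trigger condition for overflow in exp(x): x > ln(DBL_MAX).
threshold = math.log(sys.float_info.max)

# "Solving" over the reals is trivial here: any real x above the threshold works.
candidate_float = float(threshold + 1.0)

try:
    original(candidate_float)
    print("no exception triggered")
except OverflowError:
    print(f"confirmed overflow at x = {candidate_float}")
```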


International Conference on Software Engineering | 2014

Comparing static bug finders and statistical prediction

Foyzur Rahman; Sameer Khatri; Earl T. Barr; Premkumar T. Devanbu

The all-important goal of delivering better software at lower cost has led to a vital, enduring quest for ways to find and remove defects efficiently and accurately. To this end, two parallel lines of research have emerged in recent years. Static analysis seeks to find defects using algorithms that process well-defined semantic abstractions of code. Statistical defect prediction uses historical data to estimate parameters of statistical formulae modeling the phenomena thought to govern defect occurrence and predict where defects are likely to occur. These two approaches have emerged from distinct intellectual traditions and have largely evolved independently, in “splendid isolation”. In this paper, we evaluate these two (largely) disparate approaches on a similar footing. We use historical defect data to appraise the two approaches, compare them, and seek synergies. We find that under some accounting principles, they provide comparable benefits; we also find that in some settings, the performance of certain static bug-finders can be enhanced using information provided by statistical defect prediction.
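
For readers unfamiliar with the "statistical defect prediction" side of the comparison, here is a minimal sketch: a logistic regression over per-file process metrics that ranks files by predicted defect risk. The metrics and data are synthetic assumptions, not the paper's dataset.

```python
# Train a toy defect predictor and score a new file by defect risk.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Features per file: [recent_commits, distinct_authors, lines_changed]
X = np.array([[2, 1, 30], [15, 6, 900], [1, 1, 10], [20, 8, 1200], [4, 2, 80]])
y = np.array([0, 1, 0, 1, 0])   # 1 = a defect was later found in the file

model = LogisticRegression().fit(X, y)

new_file = np.array([[12, 5, 700]])
risk = model.predict_proba(new_file)[0, 1]
print(f"predicted defect risk: {risk:.2f}")

# Files can then be ranked by risk and compared, under a common inspection budget,
# against the warnings produced by a static bug finder.
```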


International Symposium on Software Testing and Analysis | 2015

Automated software transplantation

Earl T. Barr; Mark Harman; Yue Jia; Alexandru Marginean; Justyna Petke

Automated transplantation would open many exciting avenues for software development: suppose we could autotransplant code from one system into another, entirely unrelated, system. This paper introduces a theory, an algorithm, and a tool that achieve this. Leveraging lightweight annotation, program analysis identifies an organ (interesting behavior to transplant); testing validates that the organ exhibits the desired behavior during its extraction and after its implantation into a host. While we do not claim automated transplantation is now a solved problem, our results are encouraging: we report that in 12 of 15 experiments, involving 5 donors and 3 hosts (all popular real-world systems), we successfully autotransplanted new functionality and passed all regression tests. Autotransplantation is also already useful: in 26 hours of computation time we successfully autotransplanted the H.264 video encoding functionality from the x264 system to the VLC media player; compare this to upgrading x264 within VLC, a task that we estimate, from VLC's version history, took human programmers an average of 20 days of elapsed, as opposed to dedicated, time.
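
The validation step can be pictured as a simple acceptance gate, sketched below as a schematic harness only (not the paper's tool): the transplant is accepted when the organ's acceptance tests and the host's regression suite both pass. All functions here are hypothetical placeholders.

```python
# Acceptance gate for a transplanted "organ": organ tests plus host regression tests.

def run_suite(tests):
    return all(test() for test in tests)

def validate_transplant(organ_tests, host_regression_tests):
    # Accept only if the organ behaves as specified and host behaviour is preserved.
    return run_suite(organ_tests) and run_suite(host_regression_tests)

# Trivial stand-in tests for illustration.
organ_tests = [lambda: (2 + 2) == 4]
host_regression_tests = [lambda: "vlc".upper() == "VLC"]
print(validate_transplant(organ_tests, host_regression_tests))   # True
```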


Fundamental Approaches to Software Engineering | 2012

Cohesive and isolated development with branches

Earl T. Barr; Christian Bird; Peter C. Rigby; Abram Hindle; Daniel M. German; Premkumar T. Devanbu

The adoption of distributed version control (DVC), such as Git and Mercurial, in open-source software (OSS) projects has been explosive. Why is this and how are projects using DVC? This new generation of version control supports two important new features: distributed repositories and histories that preserve branches and merges. Through interviews with lead developers in OSS projects and a quantitative analysis of mined data from the histories of sixty projects, we find that the vast majority of the projects now using DVC continue to use a centralized model of code sharing, while using branching much more extensively than before their transition to DVC. We then examine the Linux history in depth in an effort to understand and evaluate how branches are used and what benefits they provide. We find that they enable natural collaborative processes: DVC branching allows developers to collaborate on tasks in highly cohesive branches, while enjoying reduced interference from developers working on other tasks, even if those tasks are strongly coupled to theirs.

Collaboration


Dive into Earl T. Barr's collaboration.

Top Co-Authors

Zhendong Su (University of California)
Mark Harman (University College London)
Zhongxian Gu (University of California)
David Clark (University College London)