Emily Hill | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Emily Hill is active.

Explore More

Publication

Featured researches published by Emily Hill.

automated software engineering | 2010

Towards automatically generating summary comments for Java methods

Giriprasad Sridhara; Emily Hill; Divya Muppaneni; Lori L. Pollock; K. Vijay-Shanker

Studies have shown that good comments can help programmers quickly understand what a method does, aiding program comprehension and software maintenance. Unfortunately, few software projects adequately comment the code. One way to overcome the lack of human-written summary comments, and guard against obsolete comments, is to automatically generate them. In this paper, we present a novel technique to automatically generate descriptive summary comments for Java methods. Given the signature and body of a method, our automatic comment generator identifies the content for the summary and generates natural language text that summarizes the methods overall actions. According to programmers who judged our generated comments, the summaries are accurate, do not miss important content, and are reasonably concise.

aspect-oriented software development | 2007

Using natural language program analysis to locate and understand action-oriented concerns

David C. Shepherd; Zachary P. Fry; Emily Hill; Lori L. Pollock; K. Vijay-Shanker

Most current software systems contain undocumented high-level ideas implemented across multiple files and modules. When developers perform program maintenance tasks, they often waste time and effort locating and understanding these scattered concerns. We have developed a semi-automated concern location and comprehension tool, Find-Concept, designed to reduce the time developers spend on maintenance tasks and to increase their confidence in the results of these tasks. Find-Concept is effective because it searches a unique natural language-based representation of source code, uses novel techniques to expand initial queries into more effective queries, and displays search results in an easy-to-comprehend format. We describe the Find-Concept tool, the underlying program analysis, and an experimental study comparing Find-Concepts search effectiveness with two state-of-the-art lexical and information retrieval-based search tools. Across nine action-oriented concern location tasks derived from open source bug reports, our Eclipse-based tool produced more effective queries more consistently than either competing search tool with similar user effort.

international conference on software engineering | 2009

Automatically capturing source code context of NL-queries for software maintenance and reuse

Emily Hill; Lori L. Pollock; K. Vijay-Shanker

As software systems continue to grow and evolve, locating code for maintenance and reuse tasks becomes increasingly difficult. Existing static code search techniques using natural language queries provide little support to help developers determine whether search results are relevant, and few recommend alternative words to help developers reformulate poor queries. In this paper, we present a novel approach that automatically extracts natural language phrases from source code identifiers and categorizes the phrases and search results in a hierarchy. Our contextual search approach allows developers to explore the word usage in a piece of software, helping them to quickly identify relevant program elements for investigation or to quickly recognize alternative words for query reformulation. An empirical evaluation of 22 developers reveals that our contextual search approach significantly outperforms the most closely related technique in terms of effort and effectiveness.

mining software repositories | 2009

Mining source code to automatically split identifiers for software analysis

Eric Enslen; Emily Hill; Lori L. Pollock; K. Vijay-Shanker

Automated software engineering tools (e.g., program search, concern location, code reuse, quality assessment, etc.) increasingly rely on natural language information from comments and identifiers in code. The first step in analyzing words from identifiers requires splitting identifiers into their constituent words. Unlike natural languages, where space and punctuation are used to delineate words, identifiers cannot contain spaces. One common way to split identifiers is to follow programming language naming conventions. For example, Java programmers often use camel case, where words are delineated by uppercase letters or non-alphabetic characters. However, programmers also create identifiers by concatenating sequences of words together with no discernible delineation, which poses challenges to automatic identifier splitting. In this paper, we present an algorithm to automatically split identifiers into sequences of words by mining word frequencies in source code. With these word frequencies, our identifier splitter uses a scoring technique to automatically select the most appropriate partitioning for an identifier. In an evaluation of over 8000 identifiers from open source Java programs, our Samurai approach outperforms the existing state of the art techniques.

automated software engineering | 2007

Exploring the neighborhood with dora to expedite software maintenance

Emily Hill; Lori L. Pollock; K. Vijay-Shanker

Completing software maintenance and evolution tasks for todayâ s large, complex software systems can be difficult, often requiring considerable time to understand the system well enough to make correct changes. Despite evidence that successful programmers use program structure as well as identifier names to explore software, most existing program exploration techniques use either structural or lexical identifier information. By using only one type of information, automated tools ignore valuable clues about a developers intentions - clues critical to the human program comprehension process. In this paper, we present and evaluate a technique that exploits both program structure and lexical information to help programmers more effectively explore programs. Our approach uses structural information to focus automated program exploration and lexical information to prune irrelevant structure edges from consideration. For the important program exploration step of expanding from a seed, our experimental results demonstrate that an integrated lexical-and structural-based approach is significantly more effective than a state-of-the-art structural program exploration technique

mining software repositories | 2008

AMAP: automatically mining abbreviation expansions in programs to enhance software maintenance tools

Emily Hill; Zachary P. Fry; Haley Boyd; Giriprasad Sridhara; Yana Novikova; Lori L. Pollock; K. Vijay-Shanker

When writing software, developers often employ abbreviations in identifier names. In fact, some abbreviations may never occur with the expanded word, or occur more often in the code. However, most existing program comprehension and search tools do little to address the problem of abbreviations, and therefore may miss meaningful pieces of code or relationships between software artifacts. In this paper, we present an automated approach to mining abbreviation expansions from source code to enhance software maintenance tools that utilize natural language information. Our scoped approach uses contextual information at the method, program, and general software level to automatically select the most appropriate expansion for a given abbreviation. We evaluated our approach on a set of 250 potential abbreviations and found that our scoped approach provides a 57% improvement in accuracy over the current state of the art.

foundations of software engineering | 2007

Does a programmer's activity indicate knowledge of code?

Thomas Fritz; Gail C. Murphy; Emily Hill

The practice of software development can likely be improved if an externalized model of each programmers knowledge of a particular code base is available. Some tools already assume a useful form of such a model can be created from data collected during development, such as expertise recommenders that use information about who has changed each file to suggest who might answer questions about particular parts of a system. In this paper, we report on an empirical study that investigates whether a programmers activity can be used to build a model of what a programmer knows about a code base. In this study, nineteen professional Java programmers completed a series of questionnaires about the code on which they were working. These questionnaires were generated automatically and asked about program elements a programmer had worked with frequently and recently and ones that he had not. We found that a degree of interest model based on this frequency and recency of interaction can often indicate the parts of the code base for which the programmer has knowledge. We also determined a number of factors that may be used to improve the model, such as authorship of program elements, the role of elements, and the task being performed.

international conference on program comprehension | 2008

Identifying Word Relations in Software: A Comparative Study of Semantic Similarity Tools

Giriprasad Sridhara; Emily Hill; Lori L. Pollock; K. Vijay-Shanker

Modern software systems are typically large and complex, making comprehension of these systems extremely difficult. Experienced programmers comprehend code by seamlessly processing synonyms and other word relations. Thus, we believe that automated comprehension and software tools can be significantly improved by leveraging word relations in software. In this paper, we perform a comparative study of six state of the art, English-based semantic similarity techniques and evaluate their effectiveness on words from the comments and identifiers in software. Our results suggest that applying English-based semantic similarity techniques to software without any customization could be detrimental to the performance of the client software tools. We propose strategies to customize the existing semantic similarity techniques to software, and describe how various program comprehension tools can benefit from word relation information.

automated software engineering | 2011

Improving source code search with natural language phrasal representations of method signatures

Emily Hill; Lori L. Pollock; K. Vijay-Shanker

As software continues to grow, locating code for maintenance tasks becomes increasingly difficult. Software search tools help developers find source code relevant to their maintenance tasks. One major challenge to successful search tools is locating relevant code when the users query contains words with multiple meanings or words that occur frequently throughout the program. Traditional search techniques, which treat each word individually, are unable to distinguish relevant and irrelevant methods under these conditions. In this paper, we present a novel search technique that uses information such as the position of the query word and its semantic role to calculate relevance. Our evaluation shows that this approach is more consistently effective than three other state of the art search techniques.

international conference on software maintenance | 2011

A comparison of stemmers on source code identifiers for software search

Andrew Wiese; Valerie Ho; Emily Hill

As the popularity of text-based source code analysis grows, the use of stemmers to strip suffixes has increased. Stemmers have been used to more accurately determine relevance between a keyword query and methods in source code for search, exploration, and bug localization. In this paper, we investigate which traditional stemmers perform best on the domain of software, specifically, Java source code. We compare the stemmers using two case studies: a comparative analysis of the unified word classes in terms of accuracy and completeness, as well as an investigation into the effectiveness of stemming for software search. Our results indicate that relative stemmer effectiveness varies with a software engineering tool such as search, justifying further research into this area.

Explore More