Michael W. Godfrey | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Michael W. Godfrey is active.

Explore More

Publication

Featured researches published by Michael W. Godfrey.

IEEE Transactions on Software Engineering | 2005

Using origin analysis to detect merging and splitting of source code entities

Michael W. Godfrey; Lijie Zou

Merging and splitting source code entities is a common activity during the lifespan of a software system; as developers rethink the essential structure of a system or plan for a new evolutionary direction, so must they be able to reorganize the design artifacts at various abstraction levels as seems appropriate. However, while the raw effects of such changes may be plainly evident in the new artifacts, the original context of the design changes is often lost. That is, it may be obvious which characters of which files have changed, but it may not be obvious where or why moving, renaming, merging, and/or splitting of design elements has occurred. In this paper, we discuss how we have extended origin analysis (Q. Tu et al., 2002), (M.W. Godfrey et al., 2002) to aid in the detection of merging and splitting of files and functions in procedural code; in particular, we show how reasoning about how call relationships have changed can aid a developer in locating where merges and splits have occurred, thereby helping to recover some information about the context of the design change. We also describe a case study of these techniques (as implemented in the Beagle tool) using the PostgreSQL database system as the subject.

working conference on reverse engineering | 2006

Cloning Considered Harmful Considered Harmful

Cory Kapser; Michael W. Godfrey

Current literature on the topic of duplicated (cloned) code in software systems often considers duplication harmful to the system quality and the reasons commonly cited for duplicating code often have a negative connotation. While these positions are sometimes correct, during our case studies we have found that this is not universally true, and we have found several situations where code duplication seems to be a reasonable or even beneficial design option. For example, a method of introducing experimental changes to core subsystems is to duplicate the subsystem and introduce changes there in a kind of sandbox testbed. As features mature and become stable within the experimental subsystem, they can then be introduced gradually into the stable code base. In this way risk of introducing instabilities in the stable version is minimized. This paper describes several patterns of cloning that we have encountered in our case studies and discusses the advantages and disadvantages associated with using them

2008 Frontiers of Software Maintenance | 2008

The past, present, and future of software evolution

Michael W. Godfrey; Daniel M. German

Change is an essential characteristic of software development, as software systems must respond to evolving requirements, platforms, and other environmental pressures. In this paper, we discuss the concept of software evolution from several perspectives. We examine how it relates to and differs from software maintenance. We discuss insights about software evolution arising from Lehmanpsilas laws of software evolution and the staged lifecycle model of Bennett and Rajlich. We compare software evolution to other kinds of evolution, from science and social sciences, and we examine the forces that shape change. Finally, we discuss the changing nature of software in general as it relates to evolution, and we propose open challenges and future directions for software evolution research.

workshop on program comprehension | 2002

An integrated approach for studying architectural evolution

Qiang Tu; Michael W. Godfrey

Studying how a software system has evolved over time is difficult, time consuming, and costly; existing techniques are often limited in their applicability, are hard to extend, and provide little support for coping with architectural change. The paper introduces an approach to studying software evolution that integrates the use of metrics, software visualization, and origin analysis, which is a set of techniques for reasoning about structural and architectural change. Our approach incorporates data from various statistical and metrics tools, and provides a query engine as well as a Web-based visualization and navigation interface. It aims to provide an extensible, integrated environment for aiding software maintainers in understanding the evolution of long-lived systems that have undergone significant architectural change. We use the evolution of GCC as an example to demonstrate the uses of various functionalities of BEAGLE, a prototype implementation of the proposed environment.

international conference on software maintenance | 2009

What's hot and what's not: Windowed developer topic analysis

Abram Hindle; Michael W. Godfrey; Richard C. Holt

As development on a software project progresses, developers shift their focus between different topics and tasks many times. Managers and newcomer developers often seek ways of understanding what tasks have recently been worked on and how much effort has gone into each; for example, a manager might wonder what unexpected tasks occupied their teams attention during a period when they were supposed to have been implementing new features. Tools such as Latent Dirichlet Allocation (LDA) and Latent Semantic Indexing (LSI) can be used to extract a set of independent topics from a corpus of commit-log comments. Previous work in the area has created a single set of topics by analyzing comments from the entire lifetime of the project. In this paper, we propose windowing the topic analysis to give a more nuanced view of the systems evolution. By using a defined time-window of, for example, one month, we can track which topics come and go over time, and which ones recur. We propose visualizations of this model that allows us to explore the evolving stream of topics of development occurring over time. We demonstrate that windowed topic analysis offers advantages over topic analysis applied to a projects lifetime because many topics are quite local.

mining software repositories | 2011

Software bertillonage: finding the provenance of an entity

Julius Davies; Daniel M. German; Michael W. Godfrey; Abram Hindle

Deployed software systems are typically composed of many pieces, not all of which may have been created by the main development team. Often, the provenance of included components -- such as external libraries or cloned source code -- is not clearly stated, and this uncertainty can introduce technical and ethical concerns that make it difficult for system owners and other stakeholders to manage their software assets. In this work, we motivate the need for the recovery of the provenance of software entities by a broad set of techniques that could include signature matching, source code fact extraction, software clone detection, call flow graph matching, string matching, historical analyses, and other techniques. We liken our provenance goals to that of Bertillonage, a simple and approximate forensic analysis technique based on bio-metrics that was developed in 19th century France before the advent of fingerprints. As an example, we have developed a fast, simple, and approximate technique called anchored signature matching for identifying library version information within a given Java application. This technique involves a type of structured signature matching performed against a database of candidates drawn from the Maven2 repository, a 150GB collection of open source Java libraries. An exploratory case study using a proprietary e-commerce Java application illustrates that the approach is both feasible and effective.

international workshop on principles of software evolution | 2004

Aiding comprehension of cloning through categorization

Cory Kapser; Michael W. Godfrey

Management of duplicated code in software systems is important in ensuring its graceful evolution. Commonly clone detection tools return large numbers of detected clones with little or no information about them, making clone management impractical and unscalable. We have used taxonomy of clones to augment current clone detection tools in order to increase the user comprehension of duplication of code within software systems and filter false positives from the clone set. We support our arguments by means of 2 case studies, where we found that as much as 53% of clones can be grouped to form function clones or partial function clones and we were able to filter out as many as 65% of clones as false positives from the reported clone pairs.

international conference on software maintenance | 2005

Improved tool support for the investigation of duplication in software

Cory Kapser; Michael W. Godfrey

Code duplication is a well documented problem in software systems. There has been considerable research into techniques for detecting duplication in software, and there are several effective tools to perform this task. However, a common problem with such tools is that the result set returned can be too large to handle without complementary tool support. The goal of this paper is to describe the criteria for a complete tool that is designed to aid in the comprehension of cloning within a software system. Furthermore, we present a prototype of such a tool and demonstrate the value of its features through a case study on the Apache httpd Web server. For example, in our study we found that a single subsystem comprising only 17% of the system code contained 38.8% of the clones.

international conference on program comprehension | 2009

A bug you like: A framework for automated assignment of bugs

Olga Baysal; Michael W. Godfrey; Robin Cohen

Assigning bug reports to individual developers is typically a manual, time-consuming, and tedious task. In this paper, we present a framework for automated assignment of bug-fixing tasks. Our approach employs preference elicitation to learn developer predilections in fixing bugs within a given system. This approach infers knowledge about a developers expertise by analyzing the history of bugs previously resolved by the developer. We apply a vector space model to recommend experts for resolving bugs. When a new bug report arrives, the system automatically assigns it to the appropriate developer considering his or her expertise, current workload, and preferences. We address the task allocation problem by proposing a set of heuristics that support accurate assignment of bug reports to the developers.

mining software repositories | 2011

Automated topic naming to support cross-project analysis of software maintenance activities

Abram Hindle; Neil A. Ernst; Michael W. Godfrey; John Mylopoulos

Researchers have employed a variety of techniques to extract underlying topics that relate to software development artifacts. Typically, these techniques use semi-unsupervised machine-learning algorithms to suggest candidate word-lists. However, word-lists are difficult to interpret in the absence of meaningful summary labels. Current topic modeling techniques assume manual labelling and do not use domainspecific knowledge to improve, contextualize, or describe results for the developers. We propose a solution: automated labelled topic extraction. Topics are extracted using Latent Dirichlet Allocation (LDA) from commit-log comments recovered from source control systems such as CVS and Bit-Keeper. These topics are given labels from a generalizable cross-project taxonomy, consisting of non-functional requirements. Our approach was evaluated with experiments and case studies on two large-scale RDBMS projects: MySQL and MaxDB. The case studies show that labelled topic extraction can produce appropriate, context-sensitive labels relevant to these projects, which provides fresh insight into their evolving software development activities.

Explore More