Harald C. Gall | Researchain

Publication

Featured research published by Harald C. Gall.

international conference on software maintenance | 1998

Detection of logical coupling based on product release history

Harald C. Gall; Karin Hajek; Mehdi Jazayeri

Code-based metrics such as coupling and cohesion are used to measure a system's structural complexity. But dealing with large systems (those consisting of several million lines of code) at the code level faces many problems. An alternative approach is to concentrate on the system's building blocks, such as programs or modules, as the unit of examination. We present an approach that uses information in a release history of a system to uncover logical dependencies and change patterns among modules. We have developed the approach by working with 20 releases of a large Telecommunications Switching System. We use release information such as version numbers of programs, modules, and subsystems together with change reports to discover common change behavior (i.e. change patterns) of modules. Our approach identifies logical coupling among modules in such a way that potential structural shortcomings can be identified and further examined, pointing to restructuring or reengineering opportunities.
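To make the idea of logical coupling concrete, here is a minimal Python sketch that counts how often two modules change in the same release. It is not the authors' method: the input format (one set of changed module names per release), the module names, and the `min_support` cut-off are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def logical_coupling(release_changes, min_support=2):
    """Count how often each pair of modules changed in the same release.

    release_changes: list of sets; each set holds the modules changed in one release.
    Returns (pair, count) tuples whose count is at least min_support, most frequent first.
    """
    pair_counts = Counter()
    for changed in release_changes:
        # sorted() gives each unordered pair a single canonical key
        for a, b in combinations(sorted(changed), 2):
            pair_counts[(a, b)] += 1
    return [(pair, n) for pair, n in pair_counts.most_common() if n >= min_support]


# Hypothetical release history; the module names are invented for illustration
history = [
    {"call_ctrl", "billing", "routing"},
    {"call_ctrl", "billing"},
    {"routing"},
    {"call_ctrl", "billing", "ui"},
]
print(logical_coupling(history))  # [(('billing', 'call_ctrl'), 3)]
```

Pairs that keep changing together, even without a direct code-level dependency, are the candidates one would then examine for structural shortcomings.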

IEEE Transactions on Software Engineering | 2007

Change Distilling: Tree Differencing for Fine-Grained Source Code Change Extraction

Beat Fluri; Michael Würsch; Martin Pinzger; Harald C. Gall

A key issue in software evolution analysis is the identification of particular changes that occur across several versions of a program. We present change distilling, a tree differencing algorithm for fine-grained source code change extraction. For that, we have improved the existing algorithm by Chawathe et al. for extracting changes in hierarchically structured data. Our algorithm extracts changes by finding both a match between the nodes of the two compared abstract syntax trees and a minimum edit script that can transform one tree into the other given the computed matching. As a result, we can identify fine-grained change types between program versions according to our taxonomy of source code changes. We evaluated our change distilling algorithm with a benchmark that we developed, which consists of 1,064 manually classified changes in 219 revisions of eight methods from three different open source projects. We achieved significant improvements in extracting types of source code changes: our algorithm approximates the minimum edit script 45 percent better than the original change extraction approach by Chawathe et al. We are able to find all occurring changes and almost reach the minimum conforming edit script; that is, we reach a mean absolute percentage error of 34 percent, compared to the 79 percent reached by the original algorithm. The paper describes both our change distilling algorithm and the results of our evaluation.
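The abstract reports its accuracy as a mean absolute percentage error (MAPE) against the minimum conforming edit script. The sketch below shows how such a figure is computed, assuming each case pairs the length of the minimum edit script with the length of the extracted one; the numbers are hypothetical and not taken from the benchmark.

```python
def mean_absolute_percentage_error(actual, predicted):
    """MAPE in percent: mean of |predicted - actual| / actual over all cases."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [abs(p - a) / a for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical edit-script lengths (number of edit operations) for three revisions
minimum_script = [4, 10, 2]
extracted_script = [5, 12, 2]
print(round(mean_absolute_percentage_error(minimum_script, extracted_script), 2))  # 15.0
```

A value of 0 would mean every extracted edit script is exactly as short as the minimum one.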

foundations of software engineering | 2009

Cross-project defect prediction: a large scale experiment on data vs. domain vs. process

Thomas Zimmermann; Nachiappan Nagappan; Harald C. Gall; Emanuel Giger; Brendan Murphy

Prediction of software defects works well within projects as long as there is a sufficient amount of data available to train any models. However, this is rarely the case for new software projects and for many companies. So far, only a few studies have focused on transferring prediction models from one project to another. In this paper, we study cross-project defect prediction models on a large scale. For 12 real-world applications, we ran 622 cross-project predictions. Our results indicate that cross-project prediction is a serious challenge, i.e., simply using models from projects in the same domain or with the same process does not lead to accurate predictions. To help software engineers choose models wisely, we identified factors that do influence the success of cross-project predictions. We also derived decision trees that can provide early estimates for precision, recall, and accuracy before a prediction is attempted.
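Precision, recall, and accuracy, the three quantities the decision trees estimate, are derived from the confusion matrix of a defect prediction. A minimal sketch, assuming each entity is labeled either defect-prone or clean; the counts in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # defect-prone entities predicted as defect-prone
    fp: int  # clean entities predicted as defect-prone
    fn: int  # defect-prone entities predicted as clean
    tn: int  # clean entities predicted as clean

    def precision(self) -> float:
        # Share of predicted defect-prone entities that really are defect-prone
        return self.tp / (self.tp + self.fp) if self.tp + self.fp else 0.0

    def recall(self) -> float:
        # Share of truly defect-prone entities that the model found
        return self.tp / (self.tp + self.fn) if self.tp + self.fn else 0.0

    def accuracy(self) -> float:
        # Share of all entities classified correctly
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total if total else 0.0


cm = ConfusionMatrix(tp=30, fp=10, fn=20, tn=40)
print(cm.precision(), cm.recall(), cm.accuracy())
```

Reporting all three matters in the cross-project setting: a model can reach high accuracy on mostly clean code while still missing most defect-prone entities.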

international workshop on principles of software evolution | 2003

CVS release history data for detecting logical couplings

Harald C. Gall; Mehdi Jazayeri; Jacek Krajewski

The dependencies and interrelations between classes and modules affect the maintainability of object-oriented systems. It is therefore important to capture weaknesses of the software architecture to make necessary corrections. We describe a method for software evolution analysis. It consists of three complementary steps, which form an integrated approach for reasoning about software structures based on historical data: 1) the quantitative analysis uses version information for the assessment of growth and change behavior; 2) the change sequence analysis identifies common change patterns across all system parts; and 3) the relation analysis compares classes based on CVS release history data and reveals the dependencies within the evolution of particular entities. We focus on the relation analysis and discuss its results; it has been validated based on empirical data collected from a concurrent versions system (CVS) covering 28 months of a picture archiving and communication system (PACS). Our software evolution analysis approach enabled us to detect shortcomings of PACS such as architectural weaknesses, poorly designed inheritance hierarchies, or blurred interfaces of modules.
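CVS records revisions per file rather than per commit, so comparing classes on CVS history usually requires first grouping file revisions into logical changes. The sketch below shows one common heuristic for that step. It is an assumption, not necessarily the procedure used in this paper: revisions with the same author and log message whose timestamps lie within a hypothetical `window` (in seconds) of the previous one are treated as one transaction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    file: str
    author: str
    message: str
    timestamp: int  # seconds since epoch


def group_transactions(revisions, window=200):
    """Group per-file CVS revisions into logical change transactions.

    Assumed heuristic: same author and same log message, each revision at most
    `window` seconds after the previous one in the group (a sliding window).
    """
    transactions = []
    open_groups = {}  # (author, message) -> (last timestamp, set of files)
    for rev in sorted(revisions, key=lambda r: r.timestamp):
        key = (rev.author, rev.message)
        group = open_groups.get(key)
        if group is not None and rev.timestamp - group[0] <= window:
            group[1].add(rev.file)  # mutates the set already stored in transactions
            open_groups[key] = (rev.timestamp, group[1])
        else:
            files = {rev.file}
            transactions.append(files)
            open_groups[key] = (rev.timestamp, files)
    return transactions


# Hypothetical revisions; file names, authors, and messages are invented
revs = [
    Revision("A.java", "ann", "fix 12", 1000),
    Revision("B.java", "ann", "fix 12", 1100),
    Revision("C.java", "bob", "refactor", 1150),
    Revision("A.java", "ann", "fix 12", 5000),
]
print(group_transactions(revs))
```

The resulting sets of files could then be compared class by class, for example by counting how often two classes appear in the same transaction.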

foundations of software engineering | 2011

Don't touch my code!: examining the effects of ownership on software quality

Christian Bird; Nachiappan Nagappan; Brendan Murphy; Harald C. Gall; Premkumar T. Devanbu

Ownership is a key aspect of large-scale software development. We examine the relationship between different ownership measures and software failures in two large software projects: Windows Vista and Windows 7. We find that in all cases, measures of ownership such as the number of low-expertise developers and the proportion of ownership for the top owner have a relationship with both pre-release faults and post-release failures. We also empirically identify reasons that low-expertise developers make changes to components and show that the removal of low-expertise contributions dramatically decreases the performance of contribution-based defect prediction. Finally, we provide recommendations for source code change policies and utilization of resources such as code inspections based on our results.
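The two ownership measures named above can be computed from per-developer change counts for a component. In this sketch, "low expertise" is approximated as contributing less than 5% of a component's changes; that `minor_threshold` value and the developer names are assumptions for illustration.

```python
def ownership_metrics(changes_by_dev, minor_threshold=0.05):
    """Ownership measures for one component.

    changes_by_dev: mapping developer -> number of changes to the component.
    minor_threshold: assumed share below which a developer counts as low-expertise.
    """
    total = sum(changes_by_dev.values())
    if total == 0:
        raise ValueError("component has no recorded changes")
    shares = {dev: n / total for dev, n in changes_by_dev.items()}
    minor = sum(1 for s in shares.values() if s < minor_threshold)
    return {
        "ownership": max(shares.values()),  # proportion held by the top owner
        "minor_contributors": minor,        # low-expertise developers
        "major_contributors": len(shares) - minor,
        "total_contributors": len(shares),
    }


# Hypothetical change counts for a single component
print(ownership_metrics({"dev_a": 80, "dev_b": 16, "dev_c": 2, "dev_d": 2}))
```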

business process management | 2007

Generation of business process models for object life cycle compliance

Jochen Malte Küster; Ksenia Ryndina; Harald C. Gall

Business process models usually capture data exchanged between tasks in terms of objects. These objects are commonly standardized using reference data models that prescribe, among other things, allowed object states. Allowed state transitions can be modeled as object life cycles, with which business processes are required to comply. In this paper, we first establish a notion of compliance of a business process model with an object life cycle. We then propose a technique for generating a compliant business process model from a set of given reference object life cycles.
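As a much simplified illustration of life cycle compliance, the sketch below checks a single sequence of object states produced by a process against an object life cycle given as an initial state, a set of final states, and a set of allowed transitions. This trace-level check is an assumption for illustration only; the paper defines compliance on process models and also addresses generating them, which is not shown here. The order states are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectLifeCycle:
    initial: str
    finals: set
    transitions: set = field(default_factory=set)  # allowed (from_state, to_state) pairs

    def conforms(self, state_trace):
        """True if the trace starts in the initial state, uses only allowed
        transitions, and ends in a final state."""
        if not state_trace or state_trace[0] != self.initial:
            return False
        for src, dst in zip(state_trace, state_trace[1:]):
            if (src, dst) not in self.transitions:
                return False
        return state_trace[-1] in self.finals


# Hypothetical life cycle of an "order" object
order = ObjectLifeCycle(
    initial="created",
    finals={"closed", "cancelled"},
    transitions={
        ("created", "approved"),
        ("approved", "shipped"),
        ("shipped", "closed"),
        ("created", "cancelled"),
    },
)
print(order.conforms(["created", "approved", "shipped", "closed"]))  # True
print(order.conforms(["created", "shipped", "closed"]))  # False: skips approval
```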

international conference on software maintenance | 1997

Software evolution observations based on product release history

Harald C. Gall; Mehdi Jazayeri; René Klösch; Georg Trausmuth

Large software systems evolve slowly but constantly. In this paper, we examine the structure of several releases of a telecommunication switching system (TSS) based on information stored in a database of product releases. We tracked the historical evolution of the TSS structure and related the adaptations made (e.g. addition of new features) to the structure of the system. Such a systematic examination can uncover potential shortcomings in the structure of the system and identify modules or subsystems that should be subject to restructuring or reengineering. Further, we have identified additional information that would be useful for such investigations but is currently lacking in the database.

international conference on software maintenance | 1999

Visualizing software release histories: the use of color and third dimension

Harald C. Gall; Mehdi Jazayeri; Claudio de la Riva

The data regarding the components of a software system consists of a large amount of information such as version history, number of lines, defect density, and complexity measures. The ability to quickly grasp a comprehensive view of the evolution and dependencies of such information is the key to making informed decisions about future developments of the system. Managers usually make such decisions based only on expert judgement. For help in making such decisions, we can turn to the evolution history of large software systems, which contains a wealth of hidden information. Traditionally, this information is passed on through anecdotes without any supporting analytical data. This paper reports on our attempts to make such information more concrete through information visualization techniques. We present a three-dimensional visual representation for examining a system's software release history. The structure of the system is displayed by 2D or 3D graphs. The historical information is displayed by using time as the third dimension. Colors are used for displaying module properties and their historical changes in the system. A supporting software tool enables not only visualization but also navigation in the 3D space to change the viewpoint, to browse system information, to find interesting patterns, and to discover previously unknown relationships among system components.

Proceedings of the 2nd International Workshop on Recommendation Systems for Software Engineering | 2010

Predicting the fix time of bugs

Emanuel Giger; Martin Pinzger; Harald C. Gall

Two important questions concerning the coordination of development effort are which bugs to fix first and how long it takes to fix them. In this paper, we investigate empirically the relationships between bug report attributes and the time to fix. The objective is to compute prediction models that can be used to recommend whether a new bug should and will be fixed fast or will take more time for resolution. We examine in detail whether attributes of a bug report can be used to build such a recommender system. We use decision tree analysis to compute prediction models and 10-fold cross validation to test them. We explore prediction models in a series of empirical studies with bug report data of six systems of the three open source projects Eclipse, Mozilla, and Gnome. Results show that our models perform significantly better than random classification. For example, fast fixed Eclipse Platform bugs were classified correctly with a precision of 0.654 and a recall of 0.692. We also show that the inclusion of post-submission bug report data of up to one month can further improve prediction models.
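The 10-fold cross validation mentioned above splits the bug reports into ten disjoint folds; each fold serves once as the test set while the other nine train the model. A minimal sketch of the split using only the standard library; the decision-tree learner itself is not shown, and the `seed` parameter is an illustrative choice.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder over the first folds so fold sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# 25 hypothetical bug reports split into 10 folds
for train, test in k_fold_indices(25, k=10):
    print(len(train), len(test))
```

In each round, a model would be fitted on the `train` reports and its precision and recall measured on the `test` reports; the ten results are then averaged.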

international symposium on software reliability engineering | 2009

Putting It All Together: Using Socio-technical Networks to Predict Failures

Christian Bird; Nachiappan Nagappan; Harald C. Gall; Brendan Murphy; Premkumar T. Devanbu

Studies have shown that social factors in development organizations have a dramatic effect on software quality. Separately, program dependency information has also been used successfully to predict which software components are more fault prone. Interestingly, the influence of these two phenomena has only been studied separately. Intuition and practical experience suggest, however, that task assignment (i.e. who worked on which components and how much) and dependency structure (which components have dependencies on others) together interact to influence the quality of the resulting software. We study the influence of combined socio-technical software networks on the fault-proneness of individual software components within a system. The network properties of a software component in this combined network are able to predict if an entity is failure prone with greater accuracy than prior methods which use dependency or contribution information in isolation. We evaluate our approach in different settings by using it on Windows Vista and across six releases of the Eclipse development environment, including using models built from one release to predict failure prone components in the next release. We compare this to previous work. In every case, our method performs as well or better and is able to more accurately identify those software components that have more post-release failures, with precision and recall rates as high as 85%.
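The combined socio-technical network joins two kinds of edges: component-to-component dependencies and developer-to-component contributions. The sketch below builds such a network as an undirected adjacency map and reports each component's degree. Degree is used here only as the simplest example of a network property; the measures actually used in the study are not reproduced, and the component and developer names are hypothetical.

```python
from collections import defaultdict


def build_network(dependencies, contributions):
    """Build an undirected combined network.

    dependencies: iterable of (component, component) pairs
    contributions: iterable of (developer, component) pairs
    Nodes are tagged with their kind so a developer and a component never collide.
    """
    graph = defaultdict(set)
    for a, b in dependencies:
        u, v = ("component", a), ("component", b)
        graph[u].add(v)
        graph[v].add(u)
    for dev, comp in contributions:
        u, v = ("developer", dev), ("component", comp)
        graph[u].add(v)
        graph[v].add(u)
    return graph


def component_degrees(graph):
    """Number of neighbors (components and developers) of each component."""
    return {name: len(neigh) for (kind, name), neigh in graph.items() if kind == "component"}


deps = [("parser", "lexer"), ("parser", "ast")]
contribs = [("alice", "parser"), ("bob", "parser"), ("bob", "lexer")]
print(component_degrees(build_network(deps, contribs)))
```

Dropping either edge list reduces this to a dependency-only or contribution-only network, which is the kind of isolated baseline the combined network is compared against.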
