Harald C. Gall
University of Zurich
Publication
Featured research published by Harald C. Gall.
international conference on software maintenance | 1998
Harald C. Gall; Karin Hajek; Mehdi Jazayeri
Code-based metrics such as coupling and cohesion are used to measure a system's structural complexity. But dealing with large systems (those consisting of several million lines) at the code level faces many problems. An alternative approach is to concentrate on the system's building blocks, such as programs or modules, as the unit of examination. We present an approach that uses information in a release history of a system to uncover logical dependencies and change patterns among modules. We have developed the approach by working with 20 releases of a large Telecommunications Switching System. We use release information such as version numbers of programs, modules, and subsystems together with change reports to discover common change behavior (i.e., change patterns) of modules. Our approach identifies logical coupling among modules in such a way that potential structural shortcomings can be identified and further examined, pointing to restructuring or reengineering opportunities.
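The idea of deriving logical coupling from a release history can be illustrated with a minimal sketch. It is not the authors' implementation: it simply counts how often two modules change in the same release. The function name `co_change_counts`, the input format (one set of changed module names per release), and the sample history are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def co_change_counts(releases):
    """Count how often each pair of modules changed in the same release.

    `releases` is a list of sets; each set holds the names of the modules
    whose version number changed in that release (hypothetical format).
    """
    pair_counts = Counter()
    for changed in releases:
        # sorted() gives every pair a canonical (a, b) order with a < b
        for pair in combinations(sorted(changed), 2):
            pair_counts[pair] += 1
    return pair_counts


# Made-up release history, for illustration only
history = [
    {"A", "B"},
    {"A", "B", "C"},
    {"C"},
    {"A", "B"},
]
counts = co_change_counts(history)
```

Pairs with a high count relative to the number of releases are candidates for closer inspection; what counts as "high" is a judgment call that this sketch leaves open.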
IEEE Transactions on Software Engineering | 2007
Beat Fluri; Michael Würsch; Martin Pinzger; Harald C. Gall
A key issue in software evolution analysis is the identification of particular changes that occur across several versions of a program. We present change distilling, a tree differencing algorithm for fine-grained source code change extraction. For that, we have improved the existing algorithm by Chawathe et al. for extracting changes in hierarchically structured data. Our algorithm extracts changes by finding both a match between the nodes of the two compared abstract syntax trees and a minimum edit script that can transform one tree into the other given the computed matching. As a result, we can identify fine-grained change types between program versions according to our taxonomy of source code changes. We evaluated our change distilling algorithm with a benchmark that we developed, which consists of 1,064 manually classified changes in 219 revisions of eight methods from three different open source projects. We achieved significant improvements in extracting types of source code changes: our algorithm approximates the minimum edit script 45 percent better than the original change extraction approach by Chawathe et al. We are able to find all occurring changes and almost reach the minimum conforming edit script, that is, we reach a mean absolute percentage error of 34 percent, compared to the 79 percent reached by the original algorithm. The paper describes both our change distilling algorithm and the results of our evaluation.
foundations of software engineering | 2009
Thomas Zimmermann; Nachiappan Nagappan; Harald C. Gall; Emanuel Giger; Brendan Murphy
Prediction of software defects works well within projects as long as there is a sufficient amount of data available to train any models. However, this is rarely the case for new software projects and for many companies. So far, only a few studies have focused on transferring prediction models from one project to another. In this paper, we study cross-project defect prediction models on a large scale. For 12 real-world applications, we ran 622 cross-project predictions. Our results indicate that cross-project prediction is a serious challenge, i.e., simply using models from projects in the same domain or with the same process does not lead to accurate predictions. To help software engineers choose models wisely, we identified factors that do influence the success of cross-project predictions. We also derived decision trees that can provide early estimates for precision, recall, and accuracy before a prediction is attempted.
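The precision, recall, and accuracy named in the abstract above follow the standard definitions for binary classifiers. The sketch below shows how they could be computed when a model's predictions are compared with the actual defect labels. The function name `classification_scores` and the sample labels are assumptions for illustration; they are not data from the study.

```python
def classification_scores(actual, predicted):
    """Return precision, recall, and accuracy for binary defect labels.

    `actual` and `predicted` are equal-length lists of booleans, where
    True means "defect-prone" (hypothetical encoding).
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # true positives
    fp = sum(1 for a, p in pairs if not a and p)      # false positives
    fn = sum(1 for a, p in pairs if a and not p)      # false negatives
    tn = sum(1 for a, p in pairs if not a and not p)  # true negatives

    # Guard against division by zero when a class is never predicted or seen
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy}


# Made-up labels, for illustration only
actual = [True, True, False, False, True]
predicted = [True, False, False, True, True]
scores = classification_scores(actual, predicted)
```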
international workshop on principles of software evolution | 2003
Harald C. Gall; Mehdi Jazayeri; Jacek Krajewski
The dependencies and interrelations between classes and modules affect the maintainability of object-oriented systems. It is therefore important to capture weaknesses of the software architecture to make necessary corrections. We describe a method for software evolution analysis. It consists of three complementary steps, which form an integrated approach for the reasoning about software structures based on historical data: 1) the quantitative analysis uses version information for the assessment of growth and change behavior; 2) the change sequence analysis identifies common change patterns across all system parts; and 3) the relation analysis compares classes based on CVS release history data and reveals the dependencies within the evolution of particular entities. We focus on the relation analysis and discuss its results; it has been validated based on empirical data collected from a concurrent versions system (CVS) covering 28 months of a picture archiving and communication system (PACS). Our software evolution analysis approach enabled us to detect shortcomings of PACS such as architectural weaknesses, poorly designed inheritance hierarchies, or blurred interfaces of modules.
foundations of software engineering | 2011
Christian Bird; Nachiappan Nagappan; Brendan Murphy; Harald C. Gall; Premkumar T. Devanbu
Ownership is a key aspect of large-scale software development. We examine the relationship between different ownership measures and software failures in two large software projects: Windows Vista and Windows 7. We find that in all cases, measures of ownership such as the number of low-expertise developers and the proportion of ownership for the top owner have a relationship with both pre-release faults and post-release failures. We also empirically identify reasons that low-expertise developers make changes to components and show that the removal of low-expertise contributions dramatically decreases the performance of contribution-based defect prediction. Finally, we provide recommendations for source code change policies and utilization of resources such as code inspections based on our results.
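As a rough illustration of the two ownership measures mentioned above, the sketch below derives them from the list of commit authors for a single component. It rests on assumptions the abstract does not state: a developer's share of commits stands in for expertise, and the 5 percent cut-off for a low-expertise contributor is a hypothetical default. The function name `ownership_measures` is likewise invented for this example.

```python
from collections import Counter


def ownership_measures(commit_authors, minor_threshold=0.05):
    """Return (top owner's share, number of low-share contributors).

    `commit_authors` lists the author of every commit to one component.
    `minor_threshold` is an assumed cut-off, not a value from the abstract.
    """
    if not commit_authors:
        raise ValueError("component has no commits")
    counts = Counter(commit_authors)
    total = sum(counts.values())
    shares = {dev: n / total for dev, n in counts.items()}
    top_share = max(shares.values())
    # Contributors whose share falls strictly below the threshold
    minor = sum(1 for share in shares.values() if share < minor_threshold)
    return top_share, minor


# Made-up commit log, for illustration only
authors = ["alice"] * 30 + ["bob"] * 9 + ["carol"] * 1
top_share, minor_contributors = ownership_measures(authors)
```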
business process management | 2007
Jochen Malte Küster; Ksenia Ryndina; Harald C. Gall
Business process models usually capture data exchanged between tasks in terms of objects. These objects are commonly standardized using reference data models that prescribe, among other things, allowed object states. Allowed state transitions can be modeled as object life cycles that require compliance of business processes. In this paper, we first establish a notion of compliance of a business process model with an object life cycle. We then propose a technique for generating a compliant business process model from a set of given reference object life cycles.
international conference on software maintenance | 1997
Harald C. Gall; Mehdi Jazayeri; René Klösch; Georg Trausmuth
Large software systems evolve slowly but constantly. In this paper, we examine the structure of several releases of a telecommunication switching system (TSS) based on information stored in a database of product releases. We tracked the historical evolution of the TSS structure and related the adaptations made (e.g., the addition of new features) to the structure of the system. Such a systematic examination can uncover potential shortcomings in the structure of the system and identify modules or subsystems that should be subject to restructuring or reengineering. Further, we have identified additional information that would be useful for such investigations but is currently lacking in the database.
international conference on software maintenance | 1999
Harald C. Gall; Mehdi Jazayeri; Claudio de la Riva
The data regarding the components of a software system consists of a large amount of information such as version history, number of lines, defect density, and complexity measures. The ability to quickly grasp a comprehensive view of the evolution and dependencies of such information is the key to making informed decisions about future developments of the system. Managers usually make such decisions based only on expert judgement. For help in making such decisions, we can turn to the evolution history of large software systems, which contains a wealth of hidden information. Traditionally, this information is passed on through anecdotes without any supporting analytical data. This paper reports on our attempts to make such information more concrete through information visualization techniques. We present a three-dimensional visual representation for examining a system's software release history. The structure of the system is displayed by 2D or 3D graphs. The historical information is displayed by using time as the third dimension. Colors are used for displaying module properties and their historical changes in the system. A supporting software tool enables not only visualization but also navigation in the 3D space to change the viewpoint, to browse system information, to find interesting patterns, and to discover previously unknown relationships among system components.
Proceedings of the 2nd International Workshop on Recommendation Systems for Software Engineering | 2010
Emanuel Giger; Martin Pinzger; Harald C. Gall
Two important questions concerning the coordination of development effort are which bugs to fix first and how long it takes to fix them. In this paper, we empirically investigate the relationships between bug report attributes and the time to fix. The objective is to compute prediction models that can be used to recommend whether a new bug should and will be fixed fast or will take more time for resolution. We examine in detail whether attributes of a bug report can be used to build such a recommender system. We use decision tree analysis to compute prediction models and 10-fold cross-validation to test them. We explore prediction models in a series of empirical studies with bug report data of six systems of the three open source projects Eclipse, Mozilla, and Gnome. Results show that our models perform significantly better than random classification. For example, fast-fixed Eclipse Platform bugs were classified correctly with a precision of 0.654 and a recall of 0.692. We also show that the inclusion of post-submission bug report data of up to one month can further improve prediction models.
international symposium on software reliability engineering | 2009
Christian Bird; Nachiappan Nagappan; Harald C. Gall; Brendan Murphy; Premkumar T. Devanbu
Studies have shown that social factors in development organizations have a dramatic effect on software quality. Separately, program dependency information has also been used successfully to predict which software components are more fault prone. Interestingly, the influence of these two phenomena has only been studied separately. Intuition and practical experience suggest, however, that task assignment (i.e., who worked on which components and how much) and dependency structure (which components have dependencies on others) together interact to influence the quality of the resulting software. We study the influence of combined socio-technical software networks on the fault-proneness of individual software components within a system. The network properties of a software component in this combined network are able to predict if an entity is failure prone with greater accuracy than prior methods which use dependency or contribution information in isolation. We evaluate our approach in different settings by using it on Windows Vista and across six releases of the Eclipse development environment, including using models built from one release to predict failure-prone components in the next release. We compare this to previous work. In every case, our method performs as well or better and is able to more accurately identify those software components that have more post-release failures, with precision and recall rates as high as 85%.
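A combined socio-technical network of the kind described above can be sketched as a single graph that holds both developer-to-component contribution edges and component-to-component dependency edges. The example below is a simplified illustration, not the authors' method: it treats all edges as undirected and uses node degree as the only network property. The function names, the node tagging scheme, and the sample data are assumptions.

```python
from collections import defaultdict


def build_network(contributions, dependencies):
    """Build an undirected graph mixing developers and components.

    `contributions` holds (developer, component) pairs and `dependencies`
    holds (component, component) pairs; both formats are hypothetical.
    Nodes are tagged tuples so a developer and a component may share a name.
    """
    graph = defaultdict(set)
    for dev, comp in contributions:
        graph[("dev", dev)].add(("comp", comp))
        graph[("comp", comp)].add(("dev", dev))
    for src, dst in dependencies:
        graph[("comp", src)].add(("comp", dst))
        graph[("comp", dst)].add(("comp", src))
    return graph


def degree(graph, node):
    """Number of distinct neighbours of `node` (0 if the node is absent)."""
    return len(graph.get(node, set()))


# Made-up project data, for illustration only
contributions = [("alice", "kernel"), ("bob", "kernel"), ("bob", "ui")]
dependencies = [("ui", "kernel")]
network = build_network(contributions, dependencies)
kernel_degree = degree(network, ("comp", "kernel"))
```

In a prediction setting, such per-component properties would serve as input features for a classifier; which properties and which classifier to use is left open here.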