Publication


Featured research published by Nicholas A. Kraft.


Information & Software Technology | 2010

Bug localization using latent Dirichlet allocation

Stacy K. Lukins; Nicholas A. Kraft; Letha H. Etzkorn

Context: Some recent static techniques for automatic bug localization have been built around modern information retrieval (IR) models such as latent semantic indexing (LSI). Latent Dirichlet allocation (LDA) is a generative statistical model that has significant advantages, in modularity and extensibility, over both LSI and probabilistic LSI (pLSI). Moreover, LDA has been shown effective in topic-model-based information retrieval. In this paper, we present a static LDA-based technique for automatic bug localization and evaluate its effectiveness.

Objective: We evaluate the accuracy and scalability of the LDA-based technique and investigate whether it is suitable for use with open-source software systems of varying size, including those developed using agile methods.

Method: We present five case studies designed to determine the accuracy and scalability of the LDA-based technique, as well as its relationships to software system size and to source code stability. The studies examine over 300 bugs across more than 25 iterations of three software systems.

Results: The results of the studies show that the LDA-based technique maintains sufficient accuracy across all bugs in a single iteration of a software system and is scalable to a large number of bugs across multiple revisions of two software systems. The results also indicate that the accuracy of the LDA-based technique is affected neither by the size of the subject software system nor by the stability of its source code base.

Conclusion: We conclude that an effective static technique for automatic bug localization can be built around LDA, and that there is no significant relationship between the accuracy of the LDA-based technique and the size of the subject software system or the stability of its source code base. Thus, the LDA-based technique is widely applicable.
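
The abstract gives only a high-level view of the retrieval step. As a loose illustration, here is a minimal Python sketch of LDA-based retrieval over a toy "corpus" of source files, using the gensim library; the file names, token lists, and bug report query are all invented, and the authors' actual preprocessing, parameter choices, and evaluation are not reproduced.

```python
# Minimal sketch of LDA-based bug localization (not the paper's implementation).
# Each source file is assumed to be already tokenized and normalized.
from gensim import corpora, models, similarities

# Hypothetical preprocessed corpus: one token list per source file.
source_files = {
    "Parser.java": ["parse", "token", "syntax", "error", "recover"],
    "Logger.java": ["log", "write", "file", "error", "level"],
    "Cache.java":  ["cache", "evict", "entry", "lookup", "miss"],
}
names = list(source_files)
texts = [source_files[n] for n in names]

dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]

# Fit a topic model over the source code corpus.
lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2,
                      passes=50, random_state=0)

# Treat the bug report as a query and rank files by topic similarity.
index = similarities.MatrixSimilarity(lda[corpus], num_features=lda.num_topics)
bug_report = ["parse", "error", "recover", "syntax"]  # already normalized
query = lda[dictionary.doc2bow(bug_report)]

for name, score in sorted(zip(names, index[query]), key=lambda p: -p[1]):
    print(f"{score:.3f}  {name}")
```

In practice the corpus would be built from identifiers and comments extracted from the subject system, and accuracy would be judged by where the files actually changed to fix the bug appear in the ranking.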


Mining Software Repositories | 2013

Building reputation in StackOverflow: An empirical investigation

Amiangshu Bosu; Christopher S. Corley; Dustin Heaton; Debarshi Chatterji; Jeffrey C. Carver; Nicholas A. Kraft

StackOverflow (SO) contributors are recognized by reputation scores. Earning a high reputation score requires technical expertise and sustained effort. We analyzed SO data from four perspectives to understand the dynamics of reputation building on SO. The results of our analysis provide guidance to new SO contributors who want to earn high reputation scores quickly. In particular, the results indicate that the following activities can help to build reputation quickly: answering questions related to tags with lower expertise density, answering questions promptly, being the first to answer a question, being active during off-peak hours, and contributing to diverse areas.


Journal of Software: Evolution and Process | 2013

Clone evolution: a systematic review

Jeremy R. Pate; Robert Tairas; Nicholas A. Kraft

Detection of code clones (similar or identical source code fragments) is of concern both to researchers and to practitioners. An analysis of the clone detection results for a single source code version provides a developer with information about a discrete state in the evolution of the software system. However, tracing clones across multiple source code versions permits a clone analysis to consider a temporal dimension. Such an analysis of clone evolution can be used to uncover the patterns and characteristics exhibited by clones as they evolve within a system. Developers can use the results of this analysis to understand the clones more completely, which may help them to manage the clones more effectively. Thus, studies of clone evolution serve a key role in understanding and addressing issues of cloning in software. In this paper, we present a systematic review of the literature on clone evolution. In particular, we present a detailed analysis of 30 relevant papers that we identified in accordance with our review protocol. The review results were organized to address three research questions. Through our answers to these questions, we present the methods that researchers have used to study clone evolution, the patterns that researchers have found evolving clones to exhibit, and the evidence that researchers have established regarding the extent of inconsistent change undergone by clones during software evolution. Overall, the review results indicate that whereas researchers have conducted several empirical studies of clone evolution, there are contradictions among the reported findings, particularly regarding the lifetimes of clone lineages and the consistency with which clones are changed during software evolution. We identify human-based empirical studies and classification of clone evolution patterns as two areas that are in particular need of further work.
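
A rough sketch of what "tracing clones across multiple source code versions" can mean mechanically: link each clone class in one version to the most-overlapping class in the next, forming lineages. The fragment representation, Jaccard measure, and threshold below are assumptions made for illustration, not a method taken from any of the reviewed papers.

```python
# Toy clone-lineage tracker: a clone class is a set of fragment identifiers.
# Real trackers must also handle shifted line numbers and edited fragments.

def map_clone_classes(old_classes, new_classes, threshold=0.5):
    """Link each old clone class to the new class with the greatest overlap."""
    links = {}
    for old_id, old_frags in old_classes.items():
        best_id, best_overlap = None, 0.0
        for new_id, new_frags in new_classes.items():
            overlap = len(old_frags & new_frags) / len(old_frags | new_frags)
            if overlap > best_overlap:
                best_id, best_overlap = new_id, overlap
        # Below the threshold the lineage ends (clone removed or changed).
        links[old_id] = best_id if best_overlap >= threshold else None
    return links

v1 = {"A": {("Foo.java", 10), ("Bar.java", 42)}}
v2 = {"B": {("Foo.java", 10), ("Bar.java", 42)}, "C": {("Baz.java", 7)}}
print(map_clone_classes(v1, v2))  # {'A': 'B'}
```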


Empirical Software Engineering | 2014

Configuring latent Dirichlet allocation based feature location

Lauren R. Biggers; Cecylia Bocovich; Riley Capshaw; Brian P. Eddy; Letha H. Etzkorn; Nicholas A. Kraft

Feature location is a program comprehension activity, the goal of which is to identify source code entities that implement a functionality. Recent feature location techniques apply text retrieval models such as latent Dirichlet allocation (LDA) to corpora built from text embedded in source code. These techniques are highly configurable, and the literature offers little insight into how different configurations affect their performance. In this paper we present a study of an LDA-based feature location technique (FLT) in which we measure the performance effects of using different configurations to index corpora and to retrieve 618 features from 6 open-source Java systems. In particular, we measure the effects of the query, the text extractor configuration, and the LDA parameter values on the accuracy of the LDA-based FLT. Our key findings are that exclusion of comments and literals from the corpus lowers accuracy and that heuristics for selecting LDA parameter values in the natural language context are suboptimal in the source code context. Based on the results of our case study, we offer specific recommendations for configuring the LDA-based FLT.
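
One configuration dimension the study varies, the text extractor, is easy to make concrete. The sketch below shows how comments and string literals can be included in or excluded from the corpus and how camelCase identifiers are split; the function and flag names are invented for illustration and do not correspond to the paper's tooling.

```python
# Hypothetical configurable text extractor for building an FLT corpus.
import re

COMMENT = r"//[^\n]*|/\*.*?\*/"

def extract_terms(java_source, include_comments=True, include_literals=True):
    """Extract terms from Java source, optionally keeping comment text and
    string literals, then split camelCase identifiers."""
    comments = re.findall(COMMENT, java_source, re.S)
    literals = re.findall(r'"([^"\\]*)"', java_source)
    code = re.sub(COMMENT, " ", java_source, flags=re.S)
    code = re.sub(r'"[^"\\]*"', " ", code)

    text = code
    if include_comments:
        text += " " + " ".join(comments)
    if include_literals:
        text += " " + " ".join(literals)

    terms = []
    for word in re.findall(r"[A-Za-z]+", text):
        # Split camelCase into component terms.
        terms += re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", word)
    return [t.lower() for t in terms]

src = '/* Save the user */ void saveUser() { log("saving user"); }'
print(extract_terms(src, include_comments=False, include_literals=False))
# ['void', 'save', 'user', 'log']
```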


International Conference on Program Comprehension | 2013

Evaluating source code summarization techniques: Replication and expansion

Brian P. Eddy; Jeffrey Robinson; Nicholas A. Kraft; Jeffrey C. Carver

During software evolution a developer must investigate source code to locate and then understand the entities that must be modified to complete a change task. To help developers in this task, Haiduc et al. proposed approaches based on text summarization for the automatic generation of class and method summaries, and via a study of four developers, they evaluated source code summaries generated using their techniques. In this paper we propose a new approach to source code summarization based on topic modeling, and via a study of 14 developers, we evaluate source code summaries generated using the proposed technique. Our study partially replicates the original study by Haiduc et al. in that it uses the objects, the instruments, and a subset of the summaries from the original study, but it also expands the original study in that it includes more subjects and new summaries. The results of our study both support the findings of the original study and provide new insights into the processes and criteria that developers use to evaluate source code summaries. Based on our results, we suggest future directions for research on source code summarization.
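
The paper's technique is not reproduced here, but one simple reading of topic-modeling-based summarization is: find a method's dominant topic and emit that topic's top terms. The Python/gensim sketch below illustrates that reading with invented method token lists; it is not the authors' algorithm.

```python
# Toy topic-model summarizer: summarize a method with the top terms
# of its dominant topic (illustrative, not the paper's technique).
from gensim import corpora, models

methods = {
    "saveUser":   ["save", "user", "write", "database", "commit"],
    "loadUser":   ["load", "user", "read", "database", "query"],
    "renderMenu": ["render", "menu", "draw", "widget", "screen"],
}

dictionary = corpora.Dictionary(methods.values())
corpus = [dictionary.doc2bow(t) for t in methods.values()]
lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2,
                      passes=50, random_state=0)

def summarize(tokens, topn=4):
    """Return the top terms of the method's dominant topic."""
    bow = dictionary.doc2bow(tokens)
    topic_id = max(lda[bow], key=lambda p: p[1])[0]
    return [term for term, _ in lda.show_topic(topic_id, topn=topn)]

print("saveUser:", summarize(methods["saveUser"]))
```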


Journal of Software Engineering and Applications | 2009

Cyclomatic Complexity and Lines of Code: Empirical Evidence of a Stable Linear Relationship

Graylin Trevor Jay; Joanne E. Hale; Randy K. Smith; David P. Hale; Nicholas A. Kraft; Charles Ward

Researchers have often commented on the high correlation between McCabe’s Cyclomatic Complexity (CC) and lines of code (LOC). Many have believed this correlation high enough to justify adjusting CC by LOC or even substituting LOC for CC. However, from an empirical standpoint, the relationship of CC to LOC remains an open question. We undertake the largest statistical study of this relationship to date. Employing modern regression techniques, we find the linearity of this relationship has been severely underestimated, so much so that CC can be said to have absolutely no explanatory power of its own. This research presents evidence that LOC and CC have a stable, practically perfect linear relationship that holds across programmers, languages, code paradigms (procedural versus object-oriented), and software processes. Linear models are developed relating LOC and CC. These models are verified against over 1.2 million randomly selected source files from the SourceForge code repository. These files represent software projects from three target languages (C, C++, and Java) and a variety of programmer experience levels, software architectures, and development methodologies. The models developed are found to successfully predict roughly 90% of CC’s variance by LOC alone. This suggests not only that the linear relationship between LOC and CC is stable, but also that the aspects of code complexity that CC measures, such as the size of the test case space, grow linearly with source code size across languages and programming paradigms.
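
To make the measured relationship concrete, the sketch below counts non-blank LOC, approximates McCabe's CC as decision points plus one, and fits an ordinary least squares line. The decision-point regex and the two toy files are simplifications; the study's regression methodology and its 1.2 million-file sample are far more substantial.

```python
# Toy LOC-vs-CC measurement and linear fit (illustrative only).
import re

DECISION = re.compile(r"\b(?:if|for|while|case|catch)\b|&&|\|\|")

def loc_and_cc(source):
    loc = len([l for l in source.splitlines() if l.strip()])
    cc = 1 + len(DECISION.findall(source))  # McCabe: decision points + 1
    return loc, cc

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

files = ["if (x) { y(); }\nz();",
         "for (;;) { if (a && b) c(); }\nd();\ne();"]
pairs = [loc_and_cc(f) for f in files]
a, b = fit_line([p[0] for p in pairs], [p[1] for p in pairs])
print(f"CC ~= {a:.2f} * LOC + {b:.2f}")  # exact fit with only two points
```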


Proceedings of the 6th International Workshop on Traceability in Emerging Forms of Software Engineering | 2011

Recovering traceability links between source code and fixed bugs via patch analysis

Christopher S. Corley; Nicholas A. Kraft; Letha H. Etzkorn; Stacy K. Lukins

Traceability links can be recovered using data mined from a revision control system, such as CVS, and an issue tracking system, such as Bugzilla. Existing approaches that recover links between a bug and the methods changed to fix it rely on the presence of the bug's identifier in a CVS log message. In this paper we present an approach that relies instead on the presence of a patch in the issue report for the bug. That is, rather than analyzing deltas retrieved from CVS to recover links, our approach analyzes patches retrieved from Bugzilla. We use BugTrace, the tool implementing our approach, to conduct a case study in which we compare the links recovered by our approach to links recovered by manual inspection. The results of the case study support the efficacy of our approach. After describing the limitations of our case study, we conclude by reviewing closely related work and suggesting possible future work.
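
The mechanics of patch analysis can be illustrated with a short sketch that parses a unified diff into (bug, file, line range) links. The patch text and bug identifier below are fabricated, and this is not BugTrace itself: the real approach recovers links down to the changed methods, which requires mapping line ranges onto the parsed source.

```python
# Toy patch analyzer: recover (bug, file, start line, length) links
# from a unified diff attached to an issue report.
import re

HUNK = re.compile(r"@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@")

def links_from_patch(patch, bug_id):
    current = None
    for line in patch.splitlines():
        if line.startswith("+++ "):
            path = line[4:].split("\t")[0]
            current = path[2:] if path.startswith("b/") else path
        else:
            m = HUNK.match(line)
            if m and current:
                start, length = int(m.group(1)), int(m.group(2) or 1)
                yield (bug_id, current, start, length)

patch = """--- a/src/Parser.java
+++ b/src/Parser.java
@@ -118,7 +118,9 @@ void parseBlock()
-    consume();
+    if (lookahead()) consume();
"""
print(list(links_from_patch(patch, "BUG-1234")))
# [('BUG-1234', 'src/Parser.java', 118, 9)]
```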


International Conference on Program Comprehension | 2013

Structural information based term weighting in text retrieval for feature location

Blake Bassett; Nicholas A. Kraft

Many recent feature location techniques (FLTs) apply text retrieval (TR) techniques to corpora built from text embedded in source code. Term weighting is a standard preprocessing step in TR and is used to adjust the importance of a term within a document or corpus. Common term weighting schemes such as tf-idf may not be optimal for use with source code, because they originate from a natural language context and were designed for use with unstructured documents. In this paper we propose a new approach to term weighting in which term weights are assigned using the structural information from the source code. We then evaluate the proposed approach by conducting an empirical study of a TR-based FLT. In all, we study over 400 bugs and features from five open source Java systems and find that structural term weighting can cause a statistically significant improvement in the accuracy of the FLT.
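
The core idea, deriving a term's weight from where it occurs in the code's structure rather than from frequency alone, fits in a few lines. The location categories and weight values below are invented for illustration; they are not the scheme evaluated in the paper.

```python
# Toy structural term weighting: a term's weight depends on the kind of
# source code element it appears in (weights here are assumptions).
STRUCTURAL_WEIGHTS = {
    "method_name": 4.0,  # terms in method names are strong signals
    "parameter":   2.0,
    "identifier":  1.5,
    "comment":     1.0,
}

def weigh_terms(occurrences):
    """occurrences: iterable of (term, location) pairs for one document."""
    weights = {}
    for term, location in occurrences:
        weights[term] = weights.get(term, 0.0) + STRUCTURAL_WEIGHTS[location]
    return weights

doc = [("save", "method_name"), ("user", "method_name"),
       ("user", "parameter"), ("database", "comment")]
print(weigh_terms(doc))
# {'save': 4.0, 'user': 6.0, 'database': 1.0}
```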


Empirical Software Engineering and Measurement | 2013

Identifying Barriers to the Systematic Literature Review Process

Jeffrey C. Carver; Edgar E. Hassler; Elis Hernandes; Nicholas A. Kraft

Conducting a systematic literature review (SLR) is difficult and time-consuming for an experienced researcher, and even more so for a novice graduate student. With a better understanding of the most common difficulties in the SLR process, mentors will be better prepared to guide novices through its planning, execution, and documentation phases, and researchers will have more realistic expectations of the process. Consequently, the objectives of this work are to identify the most difficult and time-consuming phases of the SLR process. Using data from two sources (52 responses to an online survey sent to all authors of SLRs published in software engineering venues, and qualitative experience reports from 8 PhD students who conducted SLRs as part of a course), we identified specific difficulties related to each phase of the SLR process. Our findings highlight the importance of planning, teamwork, and mentoring by an experienced researcher throughout the process. The paper also identifies implications for the teaching of the SLR process.


Empirical Software Engineering and Measurement | 2011

Measuring the Efficacy of Code Clone Information in a Bug Localization Task: An Empirical Study

Debarshi Chatterji; Jeffrey C. Carver; Beverly Massengil; Jason Oslin; Nicholas A. Kraft

Much recent research effort has been devoted to designing efficient code clone detection techniques and tools. However, there has been little human-based empirical study of developers as they use the outputs of those tools while performing maintenance tasks. This paper describes a study that investigates the usefulness of code clone information for performing a bug localization task. In this study 43 graduate students were observed while identifying defects in both cloned and non-cloned portions of code. The goal of the study was to understand how those developers used clone information to perform this task. The results of this study showed that participants who first identified a defect then used it to look for clones of the defect were more effective than participants who used the clone information before finding any defects. The results also show a relationship between the perceived efficacy of the clone information and effectiveness in finding defects. Finally, the results show that participants who had industrial experience were more effective in identifying defects than those without industrial experience.

Collaboration


Dive into Nicholas A. Kraft's collaborations.

Top Co-Authors

Kostadin Damevski

Virginia Commonwealth University

Letha H. Etzkorn

University of Alabama in Huntsville
