
Publication


Featured research published by Letha H. Etzkorn.


Information & Software Technology | 2010

Bug localization using latent Dirichlet allocation

Stacy K. Lukins; Nicholas A. Kraft; Letha H. Etzkorn

Context: Some recent static techniques for automatic bug localization have been built around modern information retrieval (IR) models such as latent semantic indexing (LSI). Latent Dirichlet allocation (LDA) is a generative statistical model that has significant advantages, in modularity and extensibility, over both LSI and probabilistic LSI (pLSI). Moreover, LDA has been shown effective in topic model based information retrieval. In this paper, we present a static LDA-based technique for automatic bug localization and evaluate its effectiveness. Objective: We evaluate the accuracy and scalability of the LDA-based technique and investigate whether it is suitable for use with open-source software systems of varying size, including those developed using agile methods. Method: We present five case studies designed to determine the accuracy and scalability of the LDA-based technique, as well as its relationships to software system size and to source code stability. The studies examine over 300 bugs across more than 25 iterations of three software systems. Results: The results of the studies show that the LDA-based technique maintains sufficient accuracy across all bugs in a single iteration of a software system and is scalable to a large number of bugs across multiple revisions of two software systems. The results of the studies also indicate that the accuracy of the LDA-based technique is not affected by the size of the subject software system or by the stability of its source code base. Conclusion: We conclude that an effective static technique for automatic bug localization can be built around LDA. We also conclude that there is no significant relationship between the accuracy of the LDA-based technique and the size of the subject software system or the stability of its source code base. Thus, the LDA-based technique is widely applicable.
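
As a concrete illustration of the technique described above, here is a minimal sketch of LDA-based bug localization, assuming scikit-learn's LatentDirichletAllocation and cosine similarity between topic distributions; the corpus, bug report, and parameter values are invented for illustration and do not reflect the paper's actual implementation.

```python
# A minimal sketch of LDA-based bug localization, not the authors' implementation.
# The corpus, the bug report, and the choice of cosine similarity over topic
# distributions are illustrative assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

# Toy "documents": identifiers and comments extracted from three source files.
source_files = {
    "parser.c": "parse token stream syntax tree node error recovery",
    "buffer.c": "allocate resize buffer copy bytes free memory leak",
    "render.c": "draw pixel screen refresh window resize repaint",
}

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(source_files.values())

# Fit a small topic model; n_components is the tunable topic count K.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)  # per-file topic distributions

# A bug report is folded into the same topic space and compared to each file.
bug_report = "memory leak when buffer is resized"
query_topics = lda.transform(vectorizer.transform([bug_report]))

scores = cosine_similarity(query_topics, doc_topics)[0]
ranking = sorted(zip(source_files, scores), key=lambda p: -p[1])
for name, score in ranking:
    print(f"{score:.3f}  {name}")  # files ranked by relevance to the bug report
```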


Working Conference on Reverse Engineering | 2008

Source Code Retrieval for Bug Localization Using Latent Dirichlet Allocation

Stacy K. Lukins; Nicholas A. Kraft; Letha H. Etzkorn

In bug localization, a developer uses information about a bug to locate the portion of the source code to modify to correct the bug. Developers expend considerable effort performing this task. Some recent static techniques for automatic bug localization have been built around modern information retrieval (IR) models such as latent semantic indexing (LSI); however, latent Dirichlet allocation (LDA), a modular and extensible IR model, has significant advantages over both LSI and probabilistic LSI (pLSI). In this paper we present an LDA-based static technique for automating bug localization. We describe the implementation of our technique and three case studies that measure its effectiveness. For two of the case studies we directly compare our results to those from similar studies performed using LSI. The results demonstrate that our LDA-based technique performs at least as well as the LSI-based techniques for all bugs and performs better, often significantly so, for most bugs.
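
For contrast with the LSI-based techniques the case studies compare against, a minimal LSI-style retrieval sketch follows, using scikit-learn's TruncatedSVD as a stand-in for latent semantic indexing; the corpus and query are invented.

```python
# A minimal LSI-style retrieval sketch (the baseline family the study compares
# against), built on TF-IDF plus TruncatedSVD; corpus and query are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = ["parse token stream syntax tree",
        "allocate resize buffer free memory",
        "draw pixel screen refresh window"]

tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs)

lsi = TruncatedSVD(n_components=2, random_state=0)  # latent semantic space
doc_vecs = lsi.fit_transform(X)

# Project the query into the same latent space and rank documents by similarity.
query_vec = lsi.transform(tfidf.transform(["memory leak on buffer resize"]))
print(cosine_similarity(query_vec, doc_vecs))  # similarity to each file
```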


IEEE Computer | 1997

Automatically identifying reusable OO legacy code

Letha H. Etzkorn; Carl G. Davis

Much object-oriented code has been written without reuse in mind, making identification of useful components difficult. The Patricia (Program Analysis Tool for Reuse) system automatically identifies these components by understanding comments and identifiers. To understand a program, Patricia uses a unique heuristic approach, deriving information from the linguistic aspects of comments and identifiers and from other nonlinguistic aspects of OO code, such as a class hierarchy. In developing the Patricia system, we had to overcome the problems of syntactically parsing natural language comments and syntactically analyzing identifiers, all prior to a semantic understanding of the comments and identifiers. Another challenge was the semantic understanding phase, when the organization of the knowledge base and an inferencing scheme were developed.
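
One of the subproblems mentioned above, syntactically analyzing identifiers before any semantic understanding, can be sketched as follows; the regular-expression heuristic is an illustrative assumption, not Patricia's actual analyzer.

```python
# A minimal sketch of the identifier-analysis step: splitting camelCase and
# snake_case identifiers into candidate English words. The regex heuristic
# below is an assumption for illustration only.
import re

def split_identifier(name: str) -> list[str]:
    # Break on underscores, then on digit runs, acronyms, and capitalized words.
    parts = []
    for chunk in name.split("_"):
        parts += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", chunk)
    return [p.lower() for p in parts if p]

print(split_identifier("getHTTPResponseCode"))  # ['get', 'http', 'response', 'code']
print(split_identifier("sorted_node_list"))     # ['sorted', 'node', 'list']
```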


ACM Southeast Regional Conference | 2004

Towards a semantic-based approach for software reusable component classification and retrieval

Haining Yao; Letha H. Etzkorn

In this paper, we propose a semantic-based approach to improve software component reuse. The approach extends the reusable software library to the World Wide Web; overcomes the keyword-based barrier by allowing user queries in natural language; treats a software component as a service described by a semantic service representation format; enhances retrieval by semantically matching a user query's semantic representation against software component semantic descriptions with respect to a domain ontology; and finally stores the relevant software components in a reusable repository based on a UDDI infrastructure. The technologies applied to achieve this goal include natural language processing, Web services, the Semantic Web, conceptual graphs, and domain ontologies. The research in the first phase focuses on the classification and retrieval of reusable software components. In the classification process, natural language processing and domain knowledge technologies are employed for program understanding down to the code level, and Web services and Semantic Web technologies, as well as conceptual graphs, are used to semantically describe and represent a component. In the retrieval process, a user query in natural language is translated into semantic representation formats in order to improve retrieval recall and precision by deploying the same semantic representation technologies on both the user query side and the component side.
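
A toy sketch of the semantic-matching idea follows: the query is expanded against a hand-written stand-in for a domain ontology before being matched to component descriptions. All names and data are invented, and the paper's actual approach uses conceptual graphs and Semantic Web technologies rather than this simplification.

```python
# A toy sketch of ontology-assisted retrieval: query terms are expanded with
# related domain terms so a component can match even when no keyword overlaps.
# The ontology, components, and overlap scoring are illustrative assumptions.
ontology = {  # term -> related terms in the domain
    "sort": {"order", "arrange", "rank"},
    "list": {"sequence", "array", "collection"},
}

components = {  # component -> terms extracted from its description
    "QuickSorter": {"arrange", "array", "pivot"},
    "HashIndex":   {"lookup", "key", "bucket"},
}

def expand(terms):
    out = set(terms)
    for t in terms:
        out |= ontology.get(t, set())
    return out

query = expand({"sort", "list"})
for name, desc in components.items():
    overlap = len(query & desc)
    print(name, overlap)  # QuickSorter matches via the ontology, not keywords
```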


European Software Engineering Conference | 1999

An entropy-based complexity measure for object-oriented designs

Jagdish Bansiya; Carl G. Davis; Letha H. Etzkorn

The use of entropy as a measure of information content has led to its use in measuring the code complexity of functionally developed software products; however, no similar capability exists for evaluating the complexity of object-oriented systems using entropy. In this paper a new metric based on entropy as a complexity measure for object-oriented classes is defined and validated using several large commercial object-oriented projects. The metric is computed using information available in class definitions. The new complexity measure of classes is correlated with traditional complexity measures such as McCabe's cyclomatic metric and the number-of-defects metric, both of which were evaluated from the implementation of the methods of the classes. The correlation study used the final versions of the class definitions. The high degree of positive correlation between the entropy-based class definition measure and the traditional measures of class implementation complexity verifies that the new entropy measure computed from class definitions can be used as a predictive measure for class implementation complexity, provided the class definitions do not change significantly during implementation.
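
The measure builds on Shannon entropy, H = -sum_i p_i log2 p_i. A minimal sketch of an entropy computation over class-definition information follows; treating name tokens as the underlying distribution is an assumption for illustration, since the paper defines its own counting scheme.

```python
# A minimal sketch of an entropy-style class-definition measure in the spirit
# of the paper: Shannon entropy over the frequency distribution of name tokens
# in a class declaration. The token source and granularity are assumptions.
import math
from collections import Counter

def entropy(tokens):
    counts = Counter(tokens)
    total = sum(counts.values())
    # H = -sum(p_i * log2(p_i)) over the token frequency distribution.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Tokens drawn from a toy class definition (method and attribute names).
class_tokens = ["node", "insert", "node", "remove", "node", "find", "size"]
print(f"H = {entropy(class_tokens):.3f} bits")
```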


Information & Software Technology | 2001

Automated reusability quality analysis of OO legacy software

Letha H. Etzkorn; William E. Hughes; Carl G. Davis

Software reuse increases productivity, reduces costs, and improves quality. Object-oriented (OO) software has been shown to be inherently more reusable than functionally decomposed software; however, most OO software was not specifically designed for reuse [Software Reuse Guidelines and Methods, Plenum Press, New York, 1991]. This paper describes the analysis, in terms of quality factors related to reusability, contained in an approach that aids significantly in assessing existing OO software for reusability. An automated tool implementing the approach is validated by comparing the tool's quality determinations to those of human experts. This comparison provides insight into how OO software metrics should be interpreted in relation to the quality factors they purport to measure.
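
The validation step, comparing tool scores to expert ratings, is typically carried out with a rank correlation; a minimal sketch using SciPy's spearmanr follows, with invented ratings rather than the paper's data.

```python
# A minimal sketch of validating tool output against expert ratings using
# Spearman rank correlation (scipy.stats.spearmanr). The scores below are
# invented; the paper's actual validation data and procedure are its own.
from scipy.stats import spearmanr

tool_scores   = [0.9, 0.4, 0.7, 0.2, 0.8]  # tool's reusability ratings
expert_scores = [5,   2,   4,   1,   4]    # human experts' ratings

rho, pvalue = spearmanr(tool_scores, expert_scores)
print(f"Spearman rho = {rho:.2f} (p = {pvalue:.3f})")
```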


Empirical Software Engineering | 2014

Configuring latent Dirichlet allocation based feature location

Lauren R. Biggers; Cecylia Bocovich; Riley Capshaw; Brian P. Eddy; Letha H. Etzkorn; Nicholas A. Kraft

Feature location is a program comprehension activity, the goal of which is to identify source code entities that implement a functionality. Recent feature location techniques apply text retrieval models such as latent Dirichlet allocation (LDA) to corpora built from text embedded in source code. These techniques are highly configurable, and the literature offers little insight into how different configurations affect their performance. In this paper we present a study of an LDA-based feature location technique (FLT) in which we measure the performance effects of using different configurations to index corpora and to retrieve 618 features from 6 open source Java systems. In particular, we measure the effects of the query, the text extractor configuration, and the LDA parameter values on the accuracy of the LDA-based FLT. Our key findings are that exclusion of comments and literals from the corpus lowers accuracy and that heuristics for selecting LDA parameter values in the natural language context are suboptimal in the source code context. Based on the results of our case study, we offer specific recommendations for configuring the LDA-based FLT.
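
A minimal sketch of the kind of configuration sweep such a study involves follows, varying the number of topics K and the document-topic prior alpha with scikit-learn and measuring a toy retrieval accuracy; the corpus, queries, and accuracy proxy are invented.

```python
# A minimal sketch of an LDA configuration sweep: vary the topic count K and
# the doc-topic prior alpha, then score retrieval on known-relevant documents.
# The corpus, queries, and "hits" accuracy proxy are illustrative assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

docs = ["open file read bytes close file",
        "draw widget layout resize event",
        "connect socket send packet receive"]
queries = [("file is never closed", 0),   # (query text, index of relevant doc)
           ("packet loss on send", 2)]

vec = CountVectorizer()
X = vec.fit_transform(docs)

for k in (2, 3):
    for alpha in (0.1, 1.0):
        lda = LatentDirichletAllocation(n_components=k,
                                        doc_topic_prior=alpha,
                                        random_state=0)
        doc_topics = lda.fit_transform(X)
        hits = 0
        for text, relevant in queries:
            q = lda.transform(vec.transform([text]))
            best = cosine_similarity(q, doc_topics)[0].argmax()
            hits += (best == relevant)
        print(f"K={k} alpha={alpha}: {hits}/{len(queries)} queries hit")
```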


Information & Software Technology | 2004

A comparison of cohesion metrics for object-oriented systems

Letha H. Etzkorn; Sampson Gholston; Julie Fortune; Cara Stein; Dawn R. Utley; Phillip A. Farrington; Glenn W. Cox

Cohesion is the degree to which the elements of a class or object belong together. Many different object-oriented cohesion metrics have been developed; many of them are based on the notion of degree of similarity of methods. No consensus has yet arisen as to which of these metrics best measures cohesion; this is a problem for software developers because, with so many suggested metrics, it is difficult to make an informed choice. This research compares various cohesion metrics with the ratings of two separate teams of experts over two software packages, to determine which of these metrics best match human-oriented views of cohesion. Additionally, the metrics are compared statistically, to determine which tend to measure the same kinds of cohesion. Differences in results for different object-oriented metrics tools are discussed.
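
As one concrete example from the family of metrics such studies compare, here is a sketch of Chidamber and Kemerer's LCOM (Lack of Cohesion in Methods), computed from which attributes each method uses; the toy class is invented, and the paper evaluates several metrics beyond this one.

```python
# A minimal sketch of LCOM (Chidamber-Kemerer): count method pairs that share
# no instance attributes (p) versus pairs that share at least one (q), and
# take max(p - q, 0). The toy method/attribute table is illustrative.
from itertools import combinations

# method -> set of instance attributes it touches
uses = {
    "push": {"items", "top"},
    "pop":  {"items", "top"},
    "log":  {"logger"},
}

p = q = 0
for m1, m2 in combinations(uses, 2):
    if uses[m1] & uses[m2]:
        q += 1  # pair shares at least one attribute
    else:
        p += 1  # pair shares none
lcom = max(p - q, 0)
print(f"LCOM = {lcom}")  # 1 here: the unrelated 'log' method lowers cohesion
```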


Natural Language Engineering | 1999

An approach to program understanding by natural language understanding

Letha H. Etzkorn; Lisa L. Bowen; Carl G. Davis

An automated tool to assist in the understanding of legacy code components can be useful both in the areas of software reuse and software maintenance. Most previous work in this area has concentrated on functionally-oriented code. Although object-oriented code has been shown to be inherently more reusable than functionally-oriented code, in many cases the eventual reuse of the object-oriented code was not considered during development. A knowledge-based, natural language processing approach to the automated understanding of object-oriented code as an aid to its reuse is described. A system called the PATRicia system (Program Analysis Tool for Reuse) that implements the approach is examined. The natural language processing/information extraction system that comprises a large part of the PATRicia system is discussed, and the knowledge base of the PATRicia system, in the form of conceptual graphs, is described. Reports produced by natural language generation in the PATRicia system are described.
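
The knowledge base's conceptual graphs can be pictured with a small sketch; the triple encoding below is an illustrative simplification of conceptual graph notation, not the PATRicia system's actual representation.

```python
# A toy sketch of a conceptual-graph-style knowledge representation: concept
# nodes linked by named relations. Encoding graphs as (concept, relation,
# concept) triples is a simplification made here for illustration.
# "A queue stores elements in FIFO order":
graph = {
    ("queue", "agent_of", "store"),
    ("store", "object",   "element"),
    ("store", "manner",   "fifo_order"),
}

def concepts_related_to(concept):
    # Collect every (relation, neighbor) pair touching the given concept.
    out = set()
    for a, r, b in graph:
        if a == concept:
            out.add((r, b))
        if b == concept:
            out.add((r, a))
    return out

print(concepts_related_to("store"))
```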


Journal of Pragmatics | 2001

The language of comments in computer software: A sublanguage of English

Letha H. Etzkorn; Carl G. Davis; Lisa L. Bowen

A sublanguage is a subset of a natural language such as the English language. Sublanguages tend to emerge gradually through the use of a language in various fields by specialists in those fields. Some such sublanguages are the ‘language of biophysics’ and the ‘language of naval telegraphic transmissions’. This paper explores whether English-language comments in object-oriented software can be considered to be a sublanguage of English, using standard criteria for sublanguage determination. To make this determination, the article looks at the grammatical content of comments, including: sentence-style comments versus non-sentence-style comments, and the use of tense, mood, and voice in sentence-style comments. The telegraphic nature of comments is also examined. Additionally, the subject-matter of comments is analyzed in terms of the purpose of comments in describing the operation of computer software.
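
A toy sketch of one measurement such an analysis involves, separating sentence-style from telegraphic (non-sentence) comments, follows; the heuristic is an invented simplification, since a real determination requires grammatical analysis.

```python
# A toy sketch of classifying comments as sentence-style vs. telegraphic.
# The word-count-and-period heuristic is a crude illustrative assumption;
# the paper's analysis rests on actual grammatical criteria.
comments = [
    "This function returns the index of the first match.",
    "loop counter",
    "Frees the buffer and resets the pointer.",
    "TODO: fix",
]

def looks_like_sentence(c: str) -> bool:
    # Heuristic: several words and a terminating period.
    return len(c.split()) >= 4 and c.rstrip().endswith(".")

for c in comments:
    style = "sentence" if looks_like_sentence(c) else "telegraphic"
    print(f"{style:11}  {c}")
```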

Collaboration


Dive into Letha H. Etzkorn's collaborations.

Top Co-Authors

Glenn W. Cox (University of Alabama in Huntsville)
Carl G. Davis (University of Alabama in Huntsville)
Dawn R. Utley (University of Alabama in Huntsville)
Sampson Gholston (University of Alabama in Huntsville)
Cara Stein (University of Alabama in Huntsville)
Julie Fortune (University of Alabama in Huntsville)
Phillip A. Farrington (University of Alabama in Huntsville)
Haining Yao (National Institutes of Health)
Bradley L. Vinz (University of Alabama in Huntsville)