Hoan Anh Nguyen | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Hoan Anh Nguyen is active.

Explore More

Publication

Featured researches published by Hoan Anh Nguyen.

foundations of software engineering | 2009

Graph-based mining of multiple object usage patterns

Tung Thanh Nguyen; Hoan Anh Nguyen; Nam H. Pham; Jafar M. Al-Kofahi; Tien N. Nguyen

The interplay of multiple objects in object-oriented programming often follows specific protocols, for example certain orders of method calls and/or control structure constraints among them that are parts of the intended object usages. Unfortunately, the information is not always documented. That creates long learning curve, and importantly, leads to subtle problems due to the misuse of objects. In this paper, we propose GrouMiner, a novel graph-based approach for mining the usage patterns of one or multiple objects. GrouMiner approach includes a graph-based representation for multiple object usages, a pattern mining algorithm, and an anomaly detection technique that are efficient, accurate, and resilient to software changes. Our experiments on several real-world programs show that our prototype is able to find useful usage patterns with multiple objects and control structures, and to translate them into user-friendly code skeletons to assist developers in programming. It could also detect the usage anomalies that caused yet undiscovered defects and code smells in those programs.

conference on object-oriented programming systems, languages, and applications | 2010

A graph-based approach to API usage adaptation

Hoan Anh Nguyen; Tung Thanh Nguyen; Gary R. Wilson; Anh Tuan Nguyen; Miryung Kim; Tien N. Nguyen

Reusing existing library components is essential for reducing the cost of software development and maintenance. When library components evolve to accommodate new feature requests, to fix bugs, or to meet new standards, the clients of software libraries often need to make corresponding changes to correctly use the updated libraries. Existing API usage adaptation techniques support simple adaptation such as replacing the target of calls to a deprecated API, however, cannot handle complex adaptations such as creating a new object to be passed to a different API method, or adding an exception handling logic that surrounds the updated API method calls. This paper presents LIBSYNC that guides developers in adapting API usage code by learning complex API usage adaptation patterns from other clients that already migrated to a new library version (and also from the API usages within the librarys test code). LIBSYNC uses several graph-based techniques (1) to identify changes to API declarations by comparing two library versions, (2) to extract associated API usage skeletons before and after library migration, and (3) to compare the extracted API usage skeletons to recover API usage adaptation patterns. Using the learned adaptation patterns, LIBSYNC recommends the locations and edit operations for adapting API usages. The evaluation of LIBSYNC on real-world software systems shows that it is highly correct and useful with a precision of 100% and a recall of 91%.

international conference on software engineering | 2009

Complete and accurate clone detection in graph-based models

Nam H. Pham; Hoan Anh Nguyen; Tung Thanh Nguyen; Jafar M. Al-Kofahi; Tien N. Nguyen

Model-Driven Engineering (MDE) has become an important development framework for many large-scale software. Previous research has reported that as in traditional code-based development, cloning also occurs in MDE. However, there has been little work on clone detection in models with the limitations on detection precision and completeness. This paper presents ModelCD, a novel clone detection tool for Matlab/Simulink models, that is able to efficiently and accurately detect both exactly matched and approximate model clones. The core of ModelCD is two novel graph-based clone detection algorithms that are able to systematically and incrementally discover clones with a high degree of completeness, accuracy, and scalability. We have conducted an empirical evaluation with various experimental studies on many real-world systems to demonstrate the usefulness of our approach and to compare the performance of ModelCD with existing tools.

international conference on software engineering | 2010

Recurring bug fixes in object-oriented programs

Tung Thanh Nguyen; Hoan Anh Nguyen; Nam H. Pham; Jafar M. Al-Kofahi; Tien N. Nguyen

Previous research confirms the existence of recurring bug fixes in software systems. Analyzing such fixes manually, we found that a large percentage of them occurs in code peers, the classes/methods having the similar roles in the systems, such as providing similar functions and/or participating in similar object interactions. Based on graph-based representation of object usages, we have developed several techniques to identify code peers, recognize recurring bug fixes, and recommend changes for code units from the bug fixes of their peers. The empirical evaluation on several open-source projects shows that our prototype, FixWizard, is able to identify recurring bug fixes and provide fixing recommendations with acceptable accuracy.

foundations of software engineering | 2013

A statistical semantic language model for source code

Tung Thanh Nguyen; Anh Tuan Nguyen; Hoan Anh Nguyen; Tien N. Nguyen

Recent research has successfully applied the statistical n-gram language model to show that source code exhibits a good level of repetition. The n-gram model is shown to have good predictability in supporting code suggestion and completion. However, the state-of-the-art n-gram approach to capture source code regularities/patterns is based only on the lexical information in a local context of the code units. To improve predictability, we introduce SLAMC, a novel statistical semantic language model for source code. It incorporates semantic information into code tokens and models the regularities/patterns of such semantic annotations, called sememes, rather than their lexemes. It combines the local context in semantic n-grams with the global technical concerns/functionality into an n-gram topic model, together with pairwise associations of program elements. Based on SLAMC, we developed a new code suggestion method, which is empirically evaluated on several projects to have relatively 18-68% higher accuracy than the state-of-the-art approach.

IEEE Transactions on Software Engineering | 2012

Clone Management for Evolving Software

Hoan Anh Nguyen; Tung Thanh Nguyen; Nam H. Pham; Jafar M. Al-Kofahi; Tien N. Nguyen

Recent research results suggest a need for code clone management. In this paper, we introduce JSync, a novel clone management tool. JSync provides two main functions to support developers in being aware of the clone relation among code fragments as software systems evolve and in making consistent changes as they create or modify cloned code. JSync represents source code and clones as (sub)trees in Abstract Syntax Trees, measures code similarity based on structural characteristic vectors, and describes code changes as tree editing scripts. The key techniques of JSync include the algorithms to compute tree editing scripts, to detect and update code clones and their groups, to analyze the changes of cloned code to validate their consistency, and to recommend relevant clone synchronization and merging. Our empirical study on several real-world systems shows that JSync is efficient and accurate in clone detection and updating, and provides the correct detection of the defects resulting from inconsistent changes to clones and the correct recommendations for change propagation across cloned code.

automated software engineering | 2013

A study of repetitiveness of code changes in software evolution

Hoan Anh Nguyen; Anh Tuan Nguyen; Tung Thanh Nguyen; Tien N. Nguyen; Hridesh Rajan

In this paper, we present a large-scale study of repetitiveness of code changes in software evolution. We collected a large data set of 2,841 Java projects, with 1.7 billion source lines of code (SLOC) at the latest revisions, 1.8 million code change revisions (0.4 million fixes), 6.2 million changed files, and 2.5 billion changed SLOCs. A change is considered repeated within or cross-project if it matches another change having occurred in the history of the project or another project, respectively. We report the following important findings. First, repetitiveness of changes could be as high as 70-100% at small sizes and decreases exponentially as size increases. Second, repetitiveness is higher and more stable in the cross-project setting than in the within-project one. Third, fixing changes repeat similarly to general changes. Importantly, learning code changes and recommending them in software evolution is beneficial with accuracy for top-1 recommendation of over 30% and top-3 of nearly 35%. Repeated fixing changes could also be useful for automatic program repair.

fundamental approaches to software engineering | 2009

Accurate and Efficient Structural Characteristic Feature Extraction for Clone Detection

Hoan Anh Nguyen; Tung Thanh Nguyen; Nam H. Pham; Jafar M. Al-Kofahi; Tien N. Nguyen

Structure-oriented approaches in clone detection have become popular in both code-based and model-based clone detection. However, existing methods for capturing structural information in software artifacts are either too computationally expensive to be efficient or too light-weight to be accurate in clone detection. In this paper, we present Exas, an accurate and efficient structural characteristic feature extraction approach that better approximates and captures the structure within the fragments of artifacts. Exas structural features are the sequences of labels and numbers built from nodes, edges, and paths of various lengths of a graph-based representation. A fragment is characterized by a structural characteristic vector of the occurrence counts of those features. We have applied Exas in building two clone detection tools for source code and models. Our analytic study and empirical evaluation on open-source software show that Exas and its algorithm for computing the characteristic vectors are highly accurate and efficient in clone detection.

automated software engineering | 2009

Clone-Aware Configuration Management

Tung Thanh Nguyen; Hoan Anh Nguyen; Nam H. Pham; Jafar M. Al-Kofahi; Tien N. Nguyen

Recent research results show several benefits of the management of code clones. In this paper, we introduce Clever, a novel clone-aware software configuration management (SCM) system. In addition to traditional SCM functionality, Clever provides clone management support, including clone detection and update, clone change management, clone consistency validating, clone synchronizing, and clone merging. Clever represents source code and clones as (sub)trees in Abstract Syntax Trees (ASTs), measures code similarity based on structural characteristic vectors, and describes code changes as tree editing scripts. The key techniques of Clever include the algorithms to compute tree editing scripts; to detect and update code clones and their groups; and to analyze the changes of cloned code to validate their consistency and recommend the relevant synchronization. Our empirical study on many real-world programs shows that Clever is highly efficient and accurate in clone detection and updating, and provides useful analysis of clone changes.

foundations of software engineering | 2012

Multi-layered approach for recovering links between bug reports and fixes

Anh Tuan Nguyen; Tung Thanh Nguyen; Hoan Anh Nguyen; Tien N. Nguyen

The links between the bug reports in an issue-tracking system and the corresponding fixing changes in a version repository are not often recorded by developers. Such linking information is crucial for research in mining software repositories in measuring software defects and maintenance efforts. However, the state-of-the-art bug-to-fix link recovery approaches still rely much on textual matching between bug reports and commit/change logs and cannot handle well the cases where their contents are not textually similar. This paper introduces MLink, a multi-layered approach that takes into account not only textual features but also source code features of the changed code corresponding to the commit logs. It is also capable of learning the association relations between the terms in bug reports and the names of entities/components in the changed source code of the commits from the established bug-to-fix links, and uses them for link recovery between the reports and commits that do not share much similar texts. Our empirical evaluation on real-world projects shows that MLink can improve the state-of-the-art bug-to-fix link recovery methods by 11--18%, 13--17%, and 8--17% in F-score, recall, and precision, respectively.

Explore More