Na Meng | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Na Meng is active.

Explore More

Publication

Featured researches published by Na Meng.

international conference on program comprehension | 2017

How does execution information help with information-retrieval based bug localization?

Tung Dao; Lingming Zhang; Na Meng

Bug localization is challenging and time-consuming. Given a bug report, a developer may spend tremendous time comprehending the bug description together with code in order to locate bugs. To facilitate bug report comprehension, information retrieval (IR)-based bug localization techniques have been proposed to automatically search for and rank potential buggy code elements (i.e., classes or methods). However, these techniques do not leverage any dynamic execution information of buggy programs. In this paper, we perform the first systematic study on how dynamic execution information can help with static IR-based bug localization. More specifically, with the fixing patches and bug reports of 157 real bugs, we investigated the impact of various execution information (i.e. coverage, slicing, and spectrum) on three IR-based techniques: the baseline technique, BugLocator, and BLUiR. Our experiments demonstrate that both the coverage and slicing information of failed tests can effectively reduce the search space and improve IR-based techniques at both class and method levels. Using additional spectrum information can further improve bug localization at the method but not the class level. Some of our investigated ways of augmenting IR-based bug localization with execution information even outperform a state-of-the-art technique, which merges spectrum with an IR-based technique in a complicated way. Different from prior work, by investigating various easy-to-understand ways to combine execution information with IR-based techniques, this study shows for the first time that execution information can generally bring considerable improvement to IR-based bug localization.

Empirical Software Engineering | 2018

Towards reusing hints from past fixes: An exploratory study on thousands of real samples

Hao Zhong; Na Meng

With the usage of version control systems, many bug fixes have accumulated over the years. Researchers have proposed various automatic program repair (APR) approaches that reuse past fixes to fix new bugs. However, some fundamental questions, such as how new fixes overlap with old fixes, have not been investigated. Intuitively, the overlap between old and new fixes decides how APR approaches can construct new fixes with old ones. Based on this intuition, we systematically designed six overlap metrics, and performed an empirical study on 5,735 bug fixes to investigate the usefulness of past fixes when composing new fixes. For each bug fix, we created delta graphs (i.e., program dependency graphs for code changes), and identified how bug fixes overlap with each other in terms of the content, code structures, and identifier names of fixes. Our results show that if an APR approach knows all code name changes and composes new fixes by fully or partially reusing the content of past fixes, only 2.1% and 3.2% new fixes can be created from single or multiple past fixes in the same project, compared with 0.9% and 1.2% fixes created from past fixes across projects. However, if an APR approach knows all code name changes and composes new fixes by fully or partially reusing the code structures of past fixes, up to 41.3% and 29.7% new fixes can be created. By making the above observations and revealing other ten findings, we investigated the upper bound of reusable past fixes and composable new fixes, exploring the potential of existing and future APR approaches.

international conference on software maintenance | 2017

A Characterization Study of Repeated Bug Fixes

Ruru Yue; Na Meng; Qianxiang Wang

Programmers always fix bugs when maintaining software. Previous studies showed that developers apply repeated bug fixes—similar or identical code changes—to multiple locations. Based on the observation, researchers built tools to identify code locations in need of similar changes, or to suggest similar bug fixes to multiple code fragments. However, some fundamental research questions, such as what are the characteristics of repeated bug fixes, are still unexplored. In this paper, we present a comprehensive empirical study with 341,856 bug fixes from 3 open source projects to investigate repeated fixes in terms of their frequency, edit locations, and semantic meanings. Specifically, we sampled bug reports and retrieved the corresponding fixing patches in version history. Then we chopped patches into smaller fixes (edit fragments). Among all the fixes related to a bug, we identified repeated fixes using clone detection, and put a fix and its repeated ones into one repeated-fix group. With these groups, we characterized the edit locations, and investigated the common bug patterns as well as common fixes.Our study on Eclipse JDT, Mozilla Firefox, and LibreOffice shows that (1) 15-20% of bugs involved repeated fixes; (2) 73-92% of repeated-fix groups were applied purely to code clones; and (3) 39% of manually examined groups focused on bugs relevant to additions or deletions of whole if-structures. These results deepened our understanding of repeated fixes. They enabled us to assess the effectiveness of existing tools, and will further provide insights for future research directions in automatic software maintenance and program repair.

international conference on software engineering | 2017

An empirical study on using hints from past fixes: poster

Hao Zhong; Na Meng

With the usage of version control systems, many bugfixes have accumulated over the years. Researchers have proposedvarious approaches that reuse past fixes to fix new bugs. However, some fundamental questions, such as how new bug fixes can beconstructed from old fixes, have not been investigated. When anapproach reuses past fixes to fix a new bug, the new bug fixshould overlap with past fixes in terms of code structures and/orcode names. Based on this intuition, we systematically design sixoverlap metrics, and conduct an empirical study on 5,735 bugfixes to investigate the usefulness of past fixes. For each bug fix, we create delta dependency graphs, and identify how bug fixesoverlap with each other by detecting isomorphic subgraphs. Ourresults show Besides that above two major findings, we haveadditional ten findings, which can deepen the understanding onautomatic program repair.

international conference on software engineering | 2018

Secure coding practices in Java: challenges and vulnerabilities

Na Meng; Stefan Nagy; Danfeng Yao; Wenjie Zhuang; Gustavo Arango-Argoty

The Java platform and its third-party libraries provide useful features to facilitate secure coding. However, misusing them can cost developers time and effort, as well as introduce security vulnerabilities in software. We conducted an empirical study on StackOverflow posts, aiming to understand developers concerns on Java secure coding, their programming obstacles, and insecure coding practices. We observed a wide adoption of the authentication and authorization features provided by Spring Security—a third-party framework designed to secure enterprise applications. We found that programming challenges are usually related to APIs or libraries, including the complicated cross-language data handling of cryptography APIs, and the complex Java-based or XML-based approaches to configure Spring Security. In addition, we reported multiple security vulnerabilities in the suggested code of accepted answers on the StackOverflow forum. The vulnerabilities included disabling the default protection against Cross-Site Request Forgery (CSRF) attacks, breaking SSL/TLS security through bypassing certificate validation, and using insecure cryptographic hash functions. Our findings reveal the insufficiency of secure coding assistance and documentation, as well as the huge gap between security theory and coding practices.

international conference on software maintenance | 2017

CCLearner: A Deep Learning-Based Clone Detection Approach

Liuqing Li; He Feng; Wenjie Zhuang; Na Meng; Barbara G. Ryder

Programmers produce code clones when developing software. By copying and pasting code with or without modification, developers reuse existing code to improve programming productivity. However, code clones present challenges to software maintenance: they may require consistent application of the same or similar bug fixes or program changes to multiple code locations. To simplify the maintenance process, various tools have been proposed to automatically detect clones [1], [2], [3], [4], [5], [6]. Some tools tokenize source code, and then compare the sequence or frequency of tokens to reveal clones [1], [3], [4], [5]. Some other tools detect clones using tree-matching algorithms to compare the Abstract Syntax Trees (ASTs) of source code [2], [6]. In this paper, we present CCLEARNER, the first solely token-based clone detection approach leveraging deep learning. CCLEARNER extracts tokens from known method-level code clones and nonclones to train a classifier, and then uses the classifier to detect clones in a given codebase.To evaluate CCLEARNER, we reused BigCloneBench [7], an existing large benchmark of real clones. We used part of the benchmark for training and the other part for testing, and observed that CCLEARNER effectively detected clones. With the same data set, we conducted the first systematic comparison experiment between CCLEARNER and three popular clone detection tools. Compared with the approaches not using deep learning, CCLEARNER achieved competitive clone detection effectiveness with low time cost.

international conference on software engineering | 2018

Towards reusing hints from past fixes: an exploratory study on thousands of real samples

Hao Zhong; Na Meng

Researchers have recently proposed various automatic program repair (APR) approaches that reuse past fixes to fix new bugs. However, some fundamental questions, such as how new fixes overlap with old fixes, have not been investigated. Intuitively, the overlap between old and new fixes decides how APR approaches can construct new fixes with old ones. Based on this intuition, we systematically designed six overlap metrics, and performed an empirical study on 5,735 bug fixes to investigate the usefulness of past fixes when composing new fixes. For each bug fix, we created delta dependency graphs (i.e., program dependency graphs for code changes), and identified how bug fixes overlapped with each other in terms of the content, code structure, and identifier names of fixes. Our results show that if an APR approach composes new fixes by fully or partially reusing the content of past fixes, only 2.1% and 3.2% new fixes can be created from single or multiple past fixes in the same project, compared with 0.9% and 1.2% fixes created from past fixes across projects. However, if an APR approach composes new fixes by fully or partially reusing the code structure of past fixes, up to 41.3% and 29.7% new fixes can be created.

Journal of Systems and Software | 2018

Lascad : Language-agnostic software categorization and similar application detection

Doaa Altarawy; Hossameldin Shahin; Ayat Mohammed; Na Meng

Abstract Categorizing software and detecting similar programs are useful for various purposes including expertise sharing, program comprehension, and rapid prototyping. However, existing categorization and similar software detection tools are not sufficient. Some tools only handle applications written in certain languages or belonging to specific domains like Java or Android. Other tools require significant configuration effort due to their sensitivity to parameter settings, and may produce excessively large numbers of categories. In this paper, we present a more usable and reliable approach of Language-Agnostic Software Categorization and similar Application Detection ( Lascad ). Our approach applies Latent Dirichlet Allocation (LDA) and hierarchical clustering to programs’ source code in order to reveal which applications implement similar functionalities. Lascad is easier to use in cases when no domain-specific tool is available or when users want to find similar software in different programming languages. To evaluate Lascad ’s capability of categorizing software, we used three labeled data sets: two sets from prior work and one larger set that we created with 103 applications implemented in 19 different languages. By comparing Lascad with prior approaches on these data sets, we found Lascad to be more usable and outperform existing tools. To evaluate Lascad ’s capability of similar application detection, we reused our 103-application data set and a newly created unlabeled data set of 5220 applications. The relevance scores of the Top-1 retrieved applications within these two data sets were, separately, 70% and 71%. Overall, Lascad effectively categorizes and detects similar programs across languages.

Archive | 2016