Qing Mi

City University of Hong Kong

Publication

Featured research published by Qing Mi.

Evaluation and Assessment in Software Engineering | 2016

An empirical analysis of reopened bugs based on open source projects

Qing Mi; Jacky Keung

Background: Bug fixing is a long-term and time-consuming activity. A software bug typically goes through a life cycle from newly reported to finally closed by developers, but it may be reopened afterwards for further action for reasons such as an unclear description given by the bug reporter or developer negligence. Bug reopening is undesirable, yet it cannot be completely avoided in practice, and it tends to add unnecessary workload for already-busy developers. Aims: To the best of our knowledge, there has been little previous work on software bug reopening. To study this area further, we perform an empirical analysis that provides a comprehensive understanding of it. Method: Based on four open source projects from the Eclipse product family (CDT, JDT, PDE and Platform), we first quantitatively analyze reopened bugs in terms of proportion, impact and time distribution. After this initial exploration of their characteristics, we qualitatively summarize the root causes of bug reopening by investigating developer discussions recorded in Eclipse Bugzilla. Results: Results show that 6%–10% of all bugs are eventually reopened. Over 93% of reopened bugs have a serious impact on the normal operation of the system being developed. Several key reasons for bug reopening are identified in our empirical study. Conclusions: Although reopened bugs have significant impacts on both end users and developers, it is quite possible to reduce the bug reopening rate by adopting appropriate methods, such as promoting effective and efficient communication between bug reporters and developers, which is supported by empirical evidence in this study.
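A minimal sketch of the proportion analysis, assuming each bug's history is available as an ordered list of Bugzilla status values. REOPENED is a standard Bugzilla status; the helper names and the sample histories are hypothetical and are not taken from the study's data.

```python
from typing import Dict, List

# Hypothetical input format: bug id -> ordered list of Bugzilla status values.
BugHistories = Dict[int, List[str]]


def reopened_ids(histories: BugHistories) -> List[int]:
    """Return the ids of bugs whose history contains at least one REOPENED status."""
    return [bug_id for bug_id, statuses in histories.items() if "REOPENED" in statuses]


def reopen_rate(histories: BugHistories) -> float:
    """Fraction of all bugs that were reopened at least once (0.0 for no bugs)."""
    if not histories:
        return 0.0
    return len(reopened_ids(histories)) / len(histories)


if __name__ == "__main__":
    # Invented sample histories, for illustration only.
    sample = {
        101: ["NEW", "ASSIGNED", "RESOLVED", "VERIFIED", "CLOSED"],
        102: ["NEW", "RESOLVED", "REOPENED", "RESOLVED", "CLOSED"],
        103: ["NEW", "ASSIGNED", "RESOLVED", "CLOSED"],
        104: ["NEW", "RESOLVED", "CLOSED"],
    }
    print(reopened_ids(sample))  # [102]
    print(reopen_rate(sample))   # 0.25
```

Counting any history that contains REOPENED treats a bug reopened several times as a single reopened bug, which fits a per-bug proportion; counting individual reopen events would need a different tally.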

Predictive Models in Software Engineering | 2016

Measuring the Stylistic Inconsistency in Software Projects using Hierarchical Agglomerative Clustering

Qing Mi; Jacky Keung; Yang Yu

Background: Although many software engineering methodologies and guidelines are available, developers commonly apply their own programming styles to the source code they produce. These individually preferred styles are easier for their own authors to comprehend, but they may well conflict with each other. Thus, stylistic inconsistency is inevitable in a software development process involving multiple developers; the result is undesirable and significantly degrades program readability and maintainability. Aims: Given the limited understanding in this regard, we perform an empirical analysis to quantitatively measure the degree of programming style inconsistency within a software project team. Method: We first propose stylistic fingerprints, represented as a set of attribute-counting metrics, to characterize different programming styles. We then adopt the hierarchical agglomerative clustering (HAC) technique to quantitatively measure the proximity of programming styles in six C/C++ open source projects chosen from different application domains. Results: The empirical results demonstrate the feasibility and validity of our fingerprinting methodology. Moreover, the proposed clustering procedure, which uses the HAC algorithm with dendrograms, can effectively illustrate the degree of programming style inconsistency among source files, which is significant for future research. Conclusions: This study proposes an effective and efficient approach for analyzing programming style inconsistency, supported by a sound theoretical basis for dealing with such a problem. Ultimately, this helps improve program readability and thereby reduce the maintenance overhead of software projects.
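The clustering step can be sketched as naive hierarchical agglomerative clustering over per-file fingerprint vectors. The abstract does not specify the metrics, distance, or linkage, so this sketch assumes four hypothetical attribute-counting metrics, min-max normalization, Euclidean distance, and average linkage.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Fingerprints = Dict[str, List[float]]
Merge = Tuple[FrozenSet[str], FrozenSet[str], float]


def min_max_normalize(points: Fingerprints) -> Fingerprints:
    """Rescale each metric to [0, 1] so that large-valued metrics do not dominate."""
    names = list(points)
    dims = len(points[names[0]])
    lo = [min(points[n][i] for n in names) for i in range(dims)]
    hi = [max(points[n][i] for n in names) for i in range(dims)]
    return {
        n: [(points[n][i] - lo[i]) / (hi[i] - lo[i]) if hi[i] > lo[i] else 0.0
            for i in range(dims)]
        for n in names
    }


def euclidean(a: List[float], b: List[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hac_average_linkage(points: Fingerprints) -> List[Merge]:
    """Naive agglomerative clustering with average linkage.

    Starts with one cluster per file and repeatedly merges the closest pair.
    Each recorded distance is the height at which a dendrogram joins the two clusters.
    """
    clusters = [frozenset([name]) for name in points]
    merges: List[Merge] = []

    def linkage(c1: FrozenSet[str], c2: FrozenSet[str]) -> float:
        # Average linkage: mean distance over all cross-cluster pairs.
        dists = [euclidean(points[a], points[b]) for a in c1 for b in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((c1, c2, linkage(c1, c2)))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 | c2)
    return merges


if __name__ == "__main__":
    # Hypothetical fingerprints: [mean line length, comment-line ratio,
    # blank-line ratio, mean identifier length] per source file.
    raw = {
        "a.c": [42.0, 0.10, 0.08, 6.5],
        "b.c": [40.0, 0.12, 0.07, 6.8],
        "c.c": [78.0, 0.35, 0.20, 12.0],
    }
    for left, right, height in hac_average_linkage(min_max_normalize(raw)):
        print(sorted(left), sorted(right), round(height, 3))
```

The merge heights are what a dendrogram plots; under this setup, stylistically similar files join at low heights. The naive loop recomputes every linkage at each step, which is fine for a handful of files but would be slow for a large project.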

Journal of Systems and Software | 2018

On the value of a prioritization scheme for resolving Self-admitted technical debt

Solomon Mensah; Jacky Keung; Jeffery Svajlenko; Kwabena Ebo Bennin; Qing Mi

Programmers tend to leave incomplete or temporary workarounds and buggy code that require rework in software development; such a pitfall is referred to as Self-admitted Technical Debt (SATD). Previous studies have shown that SATD negatively affects software projects and incurs high maintenance overheads. In this study, we introduce a prioritization scheme consisting mainly of identification, examination and rework effort estimation of prioritized tasks, in order to make a final decision prior to software release. Using the proposed prioritization scheme, we perform an exploratory analysis on four open source projects to investigate how SATD can be minimized. Four prominent causes of SATD are identified, namely code smells (23.2%), complicated and complex tasks (22.0%), inadequate code testing (21.2%) and unexpected code performance (17.4%). Results show that, among all types of SATD, design debt is, on average, highly prone to software bugs across the four projects analysed. Our findings show that a rework effort of approximately 10 to 25 commented LOC per SATD source file is needed to address the highly prioritized SATD (vital few) tasks. The proposed prioritization scheme is a novel technique that will aid decision making prior to software release in an attempt to minimize high maintenance overheads.
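The identification step of such a scheme might look like the following sketch. It is not the paper's procedure: it assumes SATD can be flagged by a hypothetical keyword list in Java-style comments, and it approximates the "vital few" as the top 20% of affected files by SATD comment count, a Pareto-style cut chosen only for illustration.

```python
import re
from typing import Dict, List, Tuple

# Assumption: SATD is flagged by a small, hypothetical keyword list.
SATD_KEYWORDS = re.compile(r"\b(TODO|FIXME|HACK|XXX|workaround)\b", re.IGNORECASE)
# Java/C-style line comments and (possibly multi-line) block comments.
COMMENTS = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)


def satd_comments(source: str) -> List[str]:
    """Return the comments in `source` that contain an SATD keyword."""
    return [c for c in COMMENTS.findall(source) if SATD_KEYWORDS.search(c)]


def vital_few(files: Dict[str, str], share: float = 0.2) -> List[Tuple[str, int]]:
    """Rank files by SATD comment count and keep the top `share` of affected files."""
    counts = [(name, len(satd_comments(src))) for name, src in files.items()]
    affected = sorted((item for item in counts if item[1] > 0),
                      key=lambda item: item[1], reverse=True)
    if not affected:
        return []
    keep = max(1, round(len(affected) * share))  # always keep at least one file
    return affected[:keep]


if __name__ == "__main__":
    # Invented file contents, for illustration only.
    project = {
        "Parser.java": "int x; // TODO: handle overflow\n/* FIXME temporary hack */\n",
        "Lexer.java": "// HACK around JDK bug\nint y;\n",
        "Util.java": "int z = 0; // straightforward\n",
    }
    print(vital_few(project))  # [('Parser.java', 2)]
```

The regular expressions do not understand string literals, so text such as a URL inside a string would be misread as a comment; a real tool would rely on a parser.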

Information & Software Technology | 2018

Machine translation-based bug localization technique for bridging lexical gap

Yan Xiao; Jacky Keung; Kwabena Ebo Bennin; Qing Mi

Context: The challenge of locating bugs in software systems, most of which are large-scale, has led to the development of bug localization techniques. However, the lexical mismatch between bug reports and source code degrades the performance of existing information retrieval and machine learning-based approaches. Objective: To bridge the lexical gap and improve the effectiveness of localizing buggy files by leveraging semantic information extracted from bug reports and source code. Method: We present BugTranslator, a novel deep learning-based machine translation technique composed of an attention-based recurrent neural network (RNN) Encoder-Decoder with long short-term memory cells. One RNN encodes bug reports into several context vectors, which are decoded by another RNN into code tokens of buggy files. The technique learns and uses the relevance between the semantic information extracted from bug reports and from source files. Results: The experimental results show that BugTranslator outperforms a current state-of-the-art word embedding technique on three open-source projects, with higher MAP and MRR. On average, BugTranslator ranks the actual buggy files in second or third place. Conclusion: BugTranslator separates bug reports and source code into different symbolic classes and then extracts deep semantic similarity and relevance between bug reports and the corresponding buggy files to bridge the lexical gap at its source, thereby further improving the performance of bug localization.
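The attention step of an RNN Encoder-Decoder can be illustrated with a small sketch. The abstract does not state BugTranslator's scoring function, so this assumes dot-product scoring; the encoder and decoder states are plain lists of invented numbers standing in for LSTM hidden states.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_attention(decoder_state: Vector,
                          encoder_states: List[Vector]) -> Tuple[List[float], Vector]:
    """Score each encoder state against the decoder state, then average them.

    Returns (attention weights, context vector); the context vector is the
    attention-weighted sum of the encoder hidden states.
    """
    scores = [sum(d * h for d, h in zip(decoder_state, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * state[i] for w, state in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context


if __name__ == "__main__":
    # Toy hidden states for three bug-report tokens (invented numbers).
    encoder = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    weights, context = dot_product_attention([2.0, 0.0], encoder)
    print([round(w, 3) for w in weights], [round(c, 3) for c in context])
```

At each decoding step the context vector is recomputed, so the decoder can focus on different bug-report tokens while emitting different code tokens.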

Evaluation and Assessment in Software Engineering | 2018

An Inception Architecture-Based Model for Improving Code Readability Classification

Qing Mi; Jacky Keung; Yan Xiao; Solomon Mensah; Xiupei Mei

The process of classifying a piece of source code as Readable or Unreadable is referred to as Code Readability Classification. To build accurate classification models, existing studies focus on handcrafting features from different aspects that intuitively seem to correlate with code readability, and then exploring various machine learning algorithms based on the newly proposed features. In contrast, our work opens up a new way to tackle the problem by using deep learning. Specifically, we propose IncepCRM, a novel model based on the Inception architecture that can learn multi-scale features automatically from source code with little manual intervention. We use information about the human annotators as an auxiliary input for training IncepCRM and empirically verify the performance of IncepCRM on three publicly available datasets. The results show that: 1) annotator information is beneficial for model performance, as confirmed by robust statistical tests (i.e., the Brunner-Munzel test and Cliff's delta); 2) IncepCRM achieves improved accuracy over previously reported models across all datasets. The findings of our study confirm the feasibility and effectiveness of deep learning for code readability classification.
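Cliff's delta, one of the two statistics named above, is simple enough to sketch directly. The effect-size thresholds are commonly cited conventions rather than values from the paper, and the sample accuracies are invented.

```python
from typing import Sequence


def cliffs_delta(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Cliff's delta = (#{x > y} - #{x < y}) / (len(xs) * len(ys)), over all pairs."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


def magnitude(delta: float) -> str:
    """Label |delta| using commonly cited thresholds (0.147, 0.33, 0.474)."""
    d = abs(delta)
    if d < 0.147:
        return "negligible"
    if d < 0.33:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"


if __name__ == "__main__":
    # Invented accuracies with and without annotator information, for illustration only.
    with_annotators = [0.84, 0.86, 0.85]
    without_annotators = [0.80, 0.81, 0.86]
    delta = cliffs_delta(with_annotators, without_annotators)
    print(round(delta, 3), magnitude(delta))  # 0.444 medium
```

Because it only compares orderings of pairs, Cliff's delta makes no normality assumption, which suits small samples of per-fold accuracies.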

Evaluation and Assessment in Software Engineering | 2018

Bug Localization with Semantic and Structural Features using Convolutional Neural Network and Cascade Forest

Yan Xiao; Jacky Keung; Qing Mi; Kwabena Ebo Bennin

Background: Correctly localizing buggy files for bug reports, taking into account their semantic and structural information, is a crucial task that can substantially improve the accuracy of bug localization techniques. Aims: To empirically evaluate and demonstrate the effects of both semantic and structural information in bug reports and source files on bug localization performance, we propose CNN_Forest, which combines a convolutional neural network with an ensemble of random forests, both of which perform well at semantic parsing and structural information extraction. Method: We first employ a convolutional neural network with multiple filters and an ensemble of random forests with multi-grained scanning to extract semantic and structural features from the word vectors derived from bug reports and source files. A subsequent cascade forest (a cascade of ensembles of random forests) is then used to extract deeper features and capture the correlations between bug reports and source files. CNN_Forest is empirically evaluated on 10,754 bug reports extracted from the AspectJ, Eclipse UI, JDT, SWT, and Tomcat projects. Results: The experiments demonstrate the significance of including semantic and structural information in bug localization, and further show that CNN_Forest achieves higher Mean Average Precision and Mean Reciprocal Rank than the best results of four current state-of-the-art approaches (NPCNN, LR+WE, DNNLOC, and BugLocator). Conclusion: CNN_Forest is capable of capturing the correlations between bug reports and source files, and we empirically show that semantic and structural information in bug reports and source files is crucial for improving bug localization.
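Multi-grained scanning, the step borrowed from the deep forest (gcForest) approach, can be sketched as sliding windows of several sizes over a feature vector. Only the window extraction is shown; the forests trained on each window scale and the cascade are outside this sketch, and the window sizes and stride are assumptions.

```python
from typing import Dict, List


def multi_grained_scan(features: List[float], window_sizes: List[int],
                       stride: int = 1) -> Dict[int, List[List[float]]]:
    """Slide windows of several sizes over a 1-D feature vector.

    Each window becomes one training instance for the forests at that scale;
    when n >= w, a vector of length n yields (n - w) // stride + 1 windows of size w.
    Window sizes larger than the vector yield no windows.
    """
    instances: Dict[int, List[List[float]]] = {}
    for size in window_sizes:
        instances[size] = [features[start:start + size]
                           for start in range(0, len(features) - size + 1, stride)]
    return instances


if __name__ == "__main__":
    vec = [0.1, 0.4, 0.3, 0.9, 0.2]  # invented word-vector features
    for size, windows in multi_grained_scan(vec, [2, 3]).items():
        print(size, len(windows))
```

Using several window sizes lets later stages see local patterns at different granularities, which is the motivation for scanning at multiple grains rather than one.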

Information & Software Technology | 2018

Improving bug localization with word embedding and enhanced convolutional neural networks

Yan Xiao; Jacky Keung; Kwabena Ebo Bennin; Qing Mi

Context: Automatic localization of buggy files can speed up bug fixing and improve the efficiency and productivity of software quality assurance teams. Useful semantic information is available in bug reports and source code, but it is usually underutilized by existing bug localization approaches. Objective: To improve the performance of bug localization, we propose DeepLoc, a novel deep learning-based model that makes full use of semantic information. Method: DeepLoc is composed of an enhanced convolutional neural network (CNN) that considers bug-fixing recency and frequency, together with word-embedding and feature-detecting techniques. DeepLoc uses word embeddings to represent the words in bug reports and source files so that their semantic information is retained, and uses different CNNs to detect features from them. DeepLoc is evaluated on over 18,500 bug reports extracted from the AspectJ, Eclipse, JDT, SWT, and Tomcat projects. Results: The experimental results show that DeepLoc achieves 10.87%–13.4% higher MAP (mean average precision) than a conventional CNN. DeepLoc outperforms four current state-of-the-art approaches (DeepLocator, HyLoc, LR+WE, and BugLocator) in terms of Accuracy@k (the percentage of bug reports for which at least one real buggy file is located within the top k ranks), MAP, and MRR (mean reciprocal rank), while using less computation time. Conclusion: DeepLoc is capable of automatically connecting bug reports to the corresponding buggy files and achieves better performance than four state-of-the-art approaches, based on a deep understanding of the semantics in bug reports and source code.
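The ranking metrics used in these bug localization studies (Accuracy@k, MAP, MRR) can be computed as follows. This is a generic sketch, not the authors' evaluation code; average precision here divides by the number of truly buggy files, which some studies replace with the number of buggy files actually retrieved.

```python
from typing import List, Optional, Sequence, Set, Tuple

# One query = (files ranked by the model, set of files actually fixed for the bug).
Query = Tuple[Sequence[str], Set[str]]


def first_hit_rank(ranked: Sequence[str], relevant: Set[str]) -> Optional[int]:
    """1-based rank of the first truly buggy file, or None if none is ranked."""
    for rank, name in enumerate(ranked, start=1):
        if name in relevant:
            return rank
    return None


def accuracy_at_k(queries: List[Query], k: int) -> float:
    """Share of bug reports with at least one buggy file in the top k."""
    hits = sum(1 for ranked, relevant in queries if any(f in relevant for f in ranked[:k]))
    return hits / len(queries)


def mean_reciprocal_rank(queries: List[Query]) -> float:
    """Mean of 1/rank of the first buggy file (0 when no buggy file is ranked)."""
    total = 0.0
    for ranked, relevant in queries:
        rank = first_hit_rank(ranked, relevant)
        total += 1.0 / rank if rank is not None else 0.0
    return total / len(queries)


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Sum of precision@i at each rank i holding a buggy file, divided by the number of buggy files."""
    hits, precision_sum = 0, 0.0
    for rank, name in enumerate(ranked, start=1):
        if name in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(queries: List[Query]) -> float:
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)


if __name__ == "__main__":
    # Two invented bug reports and their rankings.
    queries = [
        (["A.java", "B.java", "C.java"], {"B.java"}),
        (["D.java", "E.java", "F.java"], {"D.java", "F.java"}),
    ]
    print(accuracy_at_k(queries, 1))                  # 0.5
    print(mean_reciprocal_rank(queries))              # 0.75
    print(round(mean_average_precision(queries), 4))  # 0.6667
```

MRR only rewards the first buggy file found, while MAP also accounts for where the remaining buggy files land, which matters when one bug touches several files.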

Information & Software Technology | 2018

Improving code readability classification using convolutional neural networks

Qing Mi; Jacky Keung; Yan Xiao; Solomon Mensah; Yujin Gao

Context: Code readability classification (the classification of a piece of source code as either readable or unreadable) has attracted increasing attention in academia and industry. To construct accurate classification models, previous studies depended mainly upon handcrafted features. However, manual feature engineering is usually labor-intensive and can capture only partial information about the source code, which is likely to limit model performance. Objective: To improve code readability classification, we propose the use of Convolutional Neural Networks (ConvNets). Method: We first introduce a representation strategy (with different granularities) to transform source code into integer matrices as the input to ConvNets. We then propose DeepCRM, a deep learning-based model for code readability classification. DeepCRM consists of three separate ConvNets with identical architectures that are trained on data preprocessed in different ways. We evaluate our approach against five state-of-the-art code readability models. Results: The experimental results show that DeepCRM outperforms previous approaches, with accuracy improvements ranging from 2.4% to 17.2%. Conclusions: By eliminating the need for manual feature engineering, DeepCRM provides improved performance, confirming the efficacy of deep learning techniques for code readability classification.
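One possible granularity of such a representation strategy, character-level encoding, is sketched below. The matrix size, padding value, and tab handling are assumptions; the paper's exact preprocessing for its three ConvNets is not reproduced.

```python
from typing import List


def code_to_char_matrix(code: str, rows: int = 50, cols: int = 80,
                        pad: int = 0) -> List[List[int]]:
    """Encode source code as a rows x cols matrix of character code points.

    One source line per row; lines are truncated or padded with `pad` to `cols`
    entries, and missing rows are filled with `pad`. Tabs are expanded using
    4-column tab stops before encoding.
    """
    matrix: List[List[int]] = []
    for line in code.expandtabs(4).splitlines()[:rows]:
        codes = [ord(ch) for ch in line[:cols]]
        matrix.append(codes + [pad] * (cols - len(codes)))
    while len(matrix) < rows:
        matrix.append([pad] * cols)
    return matrix


if __name__ == "__main__":
    snippet = "int add(int a, int b) {\n\treturn a + b;\n}"
    for row in code_to_char_matrix(snippet, rows=4, cols=16):
        print(row)
```

Keeping one source line per row preserves indentation and line-length patterns in the matrix layout, which is the kind of visual structure a ConvNet can pick up without handcrafted readability features.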

Asia-Pacific Software Engineering Conference | 2017