Publication


Featured research published by Jianling Sun.


Conference on Software Maintenance and Reengineering | 2004

Business rules extraction from large legacy systems

Xinyu Wang; Jianling Sun; Xiaohu Yang; Zhijun He; Srinivasa R. Maddineni

Business rules are a set of conditional operations attached to a given data result. On legacy systems, it is very difficult to extract business rules because of inconsistent documentation. Some techniques have been proposed for extracting business rules from legacy systems, but their usefulness is limited when they are applied to large, complex legacy systems. Such systems generally involve large amounts of code, domain variables, synonym variables and business rules, which makes business rule extraction even more difficult. This paper proposes a framework that offers distinct advantages over standard extraction solutions for large legacy systems. The framework consists of five steps: slicing the program, identifying domain variables, data analysis, presenting business rules, and business validation. It has been applied to a large, complex financial legacy system, where it proved successful.
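
As a toy illustration of the notion of a business rule as conditional operations attached to a data result, the sketch below scans source code for conditional statements whose body assigns a given result variable. It is not the paper's tooling (which targets large legacy systems, not Python); the function name and example are hypothetical.

# Toy illustration (not the paper's tooling): find conditional statements whose
# body assigns a given "result" variable, i.e. candidate business-rule sites.
# Real legacy systems are typically COBOL/PL-I; Python's ast module is used
# here only to keep the sketch self-contained.
import ast

def candidate_rule_sites(source_code: str, result_var: str):
    tree = ast.parse(source_code)
    sites = []
    for node in ast.walk(tree):
        if isinstance(node, ast.If):
            assigned = {
                t.id
                for stmt in ast.walk(node)
                if isinstance(stmt, ast.Assign)
                for t in stmt.targets
                if isinstance(t, ast.Name)
            }
            if result_var in assigned:
                sites.append((node.lineno, ast.unparse(node.test)))
    return sites

example = """
if balance < 0:
    fee = 25
elif account_type == 'premium':
    fee = 0
"""
print(candidate_rule_sites(example, "fee"))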


IEEE International Conference on Software Quality, Reliability and Security | 2015

Deep Learning for Just-in-Time Defect Prediction

Xinli Yang; David Lo; Xin Xia; Yun Zhang; Jianling Sun

Defect prediction is a meaningful topic, particularly at the change level. Change-level defect prediction, also referred to as just-in-time defect prediction, can not only help ensure software quality during development, but also allow developers to check and fix defects in time. Deep learning is currently a hot topic in the machine learning literature, yet whether it can improve the performance of just-in-time defect prediction remains uninvestigated. In this paper, to bridge this research gap, we propose Deeper, an approach that leverages deep learning techniques to predict defect-prone changes. We first build a set of expressive features from a set of initial change features by leveraging a deep belief network algorithm. Next, a machine learning classifier is built on the selected features. To evaluate the performance of our approach, we use datasets from six large open source projects, i.e., Bugzilla, Columba, JDT, Platform, Mozilla, and PostgreSQL, containing a total of 137,417 changes. We compare our approach with the approach proposed by Kamei et al. The experimental results show that, on average across the 6 projects, Deeper discovers 32.22% more bugs than Kamei et al.'s approach (51.04% versus 18.82% on average). In addition, Deeper achieves F1-scores of 0.22-0.63, which are statistically significantly higher than those of Kamei et al.'s approach on 4 out of the 6 projects.
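
The pipeline described above (learn expressive features from the initial change features with a deep belief network, then train a classifier on them) can be roughly approximated with scikit-learn's BernoulliRBM stacked in front of a logistic regression. This is not the authors' Deeper implementation; the data and parameters below are synthetic placeholders.

# Rough approximation of a DBN-style pipeline (not the authors' Deeper tool):
# an RBM learns higher-level features from the initial change metrics, and a
# classifier is trained on top. Data below is synthetic.
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X = rng.random((500, 14))            # e.g. 14 change metrics (lines added, entropy, ...)
y = rng.integers(0, 2, 500)          # 1 = defect-inducing change, 0 = clean

model = Pipeline([
    ("scale", MinMaxScaler()),                       # RBMs expect values in [0, 1]
    ("rbm", BernoulliRBM(n_components=20, learning_rate=0.05, n_iter=30, random_state=0)),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X, y)
print(model.predict_proba(X[:5])[:, 1])   # predicted defect-proneness of first 5 changes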


Computer Software and Applications Conference | 2015

An Empirical Study of Classifier Combination for Cross-Project Defect Prediction

Yun Zhang; David Lo; Xin Xia; Jianling Sun

To help developers better allocate testing and debugging efforts, many software defect prediction techniques have been proposed in the literature. These techniques can be used to predict classes that are more likely to be buggy based on the past history of buggy classes. They work well as long as a sufficient amount of data is available to train a prediction model. However, there is rarely enough training data for new software projects. To deal with this problem, cross-project defect prediction, which transfers a prediction model trained using data from one project to another, has been proposed and is regarded as a new challenge for defect prediction. So far, only a few cross-project defect prediction techniques have been proposed. To advance the state-of-the-art, in this work we investigate 7 composite algorithms, which integrate multiple machine learning classifiers, to improve cross-project defect prediction. To evaluate the performance of the composite algorithms, we perform experiments on 10 open source software systems from the PROMISE repository, which contain a total of 5,305 instances labeled as defective or clean. We compare the composite algorithms with CODEP Logistic, the latest cross-project defect prediction algorithm proposed by Panichella et al., in terms of two standard evaluation metrics: cost effectiveness and F-measure. Our experimental results show that several algorithms outperform CODEP Logistic: Max performs best in terms of F-measure, and its average F-measure exceeds that of CODEP Logistic by 36.88%. Bagging J48 performs best in terms of cost effectiveness, and its average cost effectiveness exceeds that of CODEP Logistic by 15.34%.
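
A minimal sketch of the best-performing composite, Max, which scores each target-project instance with the maximum defect probability produced by the underlying classifiers. The base learners and data below are illustrative choices, not the paper's exact setup.

# Minimal sketch of a "Max" composite: train several classifiers on the source
# project and score target-project instances with the maximum predicted defect
# probability. Base learners and data are illustrative, not the paper's setup.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X_src, y_src = rng.random((300, 20)), rng.integers(0, 2, 300)   # source project
X_tgt = rng.random((100, 20))                                   # target project

base_learners = [
    LogisticRegression(max_iter=1000),
    GaussianNB(),
    DecisionTreeClassifier(random_state=0),
]
probas = np.column_stack([
    clf.fit(X_src, y_src).predict_proba(X_tgt)[:, 1] for clf in base_learners
])
max_score = probas.max(axis=1)          # the "Max" combination
print((max_score > 0.5).astype(int)[:10])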


Conference on Software Maintenance and Reengineering | 2013

A Comparative Study of Supervised Learning Algorithms for Re-opened Bug Prediction

Xin Xia; David Lo; Xinyu Wang; Xiaohu Yang; Shanping Li; Jianling Sun

Bug fixing is a time-consuming and costly job that is performed throughout the whole life cycle of software development and maintenance. For many systems, bugs are managed in bug management systems such as Bugzilla. Generally, the status of a typical bug report in Bugzilla changes from new to assigned, verified and closed. However, some bugs have to be reopened. Reopened bugs increase the software development and maintenance cost, increase the workload of bug fixers, and might even delay the future delivery of software. Only a few studies investigate the phenomenon of reopened bug reports. In this paper, we evaluate the effectiveness of various supervised learning algorithms to predict whether a bug report will be reopened. We choose 7 state-of-the-art classical supervised learning algorithms from the machine learning literature, i.e., kNN, SVM, Simple Logistic, Bayesian Network, Decision Table, CART and LWL, and 3 ensemble learning algorithms, i.e., AdaBoost, Bagging and Random Forest, and evaluate their performance in predicting reopened bug reports. The experimental results show that among the 10 algorithms, Bagging and Decision Table (IDTM) achieve the best performance. They achieve accuracy scores of 92.91% and 92.80%, respectively, and F-measure scores on reopened bug reports of 0.735 and 0.732, respectively. These results improve on the reopened-bug-report F-measure of the state-of-the-art approaches proposed by Shihab et al. by up to 23.53%.
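
The evaluation protocol, cross-validating a set of classifiers on the same labelled bug reports and comparing their F-measure on the reopened class, can be mocked up as follows. The data is synthetic and the estimators are rough scikit-learn stand-ins for some of the algorithms named above, not the study's exact configurations.

# Sketch of the comparison protocol: cross-validate several classifiers on the
# same labelled bug reports and report F-measure for the reopened class.
# Data is synthetic; the estimators are rough scikit-learn stand-ins.
import numpy as np
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier, AdaBoostClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.random((400, 25))             # features from bug report text, history, owner, ...
y = rng.integers(0, 2, 400)           # 1 = reopened, 0 = not reopened

candidates = {
    "kNN": KNeighborsClassifier(),
    "Bagging": BaggingClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "RandomForest": RandomForestClassifier(random_state=0),
}
for name, clf in candidates.items():
    f1 = cross_val_score(clf, X, y, cv=10, scoring="f1").mean()
    print(f"{name}: mean F-measure (reopened class) = {f1:.3f}")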


International Conference on Software Maintenance | 2016

Inferring Links between Concerns and Methods with Multi-abstraction Vector Space Model

Yun Zhang; David Lo; Xin Xia; Tien-Duy B. Le; Giuseppe Scanniello; Jianling Sun

Concern localization refers to the process of locating code units that match a particular textual description. It takes as input textual documents such as bug reports and feature requests and outputs a list of candidate code units that are relevant to those bug reports or feature requests. Many information retrieval (IR) based concern localization techniques have been proposed in the literature. These techniques typically represent code units and textual descriptions as a bag of tokens at one level of abstraction, e.g., each token is a word, or each token is a topic. In this work, we propose a multi-abstraction concern localization technique named MULAB. MULAB represents a code unit and a textual description at multiple abstraction levels. The similarity between a textual description and a code unit is then computed by considering all these abstraction levels. We combine a vector space model and multiple topic models to compute the similarity, and apply a genetic algorithm to infer semi-optimal topic model configurations. We have evaluated our solution on 136 concerns from 8 open source Java software systems. The experimental results show that MULAB outperforms the state-of-the-art baseline PR, proposed by Scanniello et al., in terms of effectiveness and rank.
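
The core of the multi-abstraction idea, combining word-level and topic-level similarity between a concern description and each code unit's text, can be sketched as a weighted mix of TF-IDF cosine similarity and LDA topic-distribution similarity. The genetic-algorithm tuning of topic model configurations is omitted, and the documents, weights and parameters below are illustrative, not the paper's.

# Sketch of multi-abstraction similarity: combine word-level (TF-IDF) cosine
# similarity with topic-level (LDA) similarity between a concern description
# and each method's text. The GA that tunes topic-model configurations in the
# paper is omitted; documents, weights and parameters are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

methods = [
    "open connection to database and execute query",
    "render login page and validate user password",
    "parse configuration file and load plugin settings",
]
concern = ["user cannot log in with correct password"]

tfidf = TfidfVectorizer()
word_sim = cosine_similarity(tfidf.fit_transform(methods + concern))[-1, :-1]

counts = CountVectorizer()
dtm = counts.fit_transform(methods + concern)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topics = lda.fit_transform(dtm)
topic_sim = cosine_similarity(topics)[-1, :-1]

combined = 0.5 * word_sim + 0.5 * topic_sim      # one word level plus one topic level
for m, s in sorted(zip(methods, combined), key=lambda p: -p[1]):
    print(f"{s:.3f}  {m}")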


International Symposium on Software Reliability Engineering | 2016

Combining Word Embedding with Information Retrieval to Recommend Similar Bug Reports

Xinli Yang; David Lo; Xin Xia; Lingfeng Bao; Jianling Sun

Similar bugs are bugs that require handling of many common code files. Developers can often fix similar bugs in less time and with higher quality since they can focus on fewer code files. Therefore, similar bug recommendation is a meaningful task that can improve development efficiency. Rocha et al. proposed the first similar bug recommendation system, named NextBug. Although NextBug performs better than REP, a state-of-the-art duplicate bug detection technique, its performance is not optimal and more work is needed to improve its effectiveness. Technically, it is also rather simple, as it relies only on a standard information retrieval technique, i.e., cosine similarity. In this paper, we propose a novel approach to recommend similar bugs. The approach combines a traditional information retrieval technique with a word embedding technique, and takes bug titles and descriptions as well as bug product and component information into consideration. To evaluate the approach, we use datasets from two popular open-source projects, i.e., Eclipse and Mozilla, each of which contains bug reports whose bug IDs range from 1 to 400,000. The results show that our approach improves on the performance of NextBug statistically significantly and substantially for both projects.
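
The combination of a traditional IR similarity with a word-embedding similarity can be sketched by mixing TF-IDF cosine similarity with the cosine similarity of averaged word vectors. Gensim's Word2Vec is used below as a generic embedding; the corpus, preprocessing and equal weights are illustrative assumptions, not the paper's setup.

# Sketch of combining IR similarity (TF-IDF cosine) with word-embedding
# similarity (cosine of averaged Word2Vec vectors) between a query bug report
# and candidate reports. Corpus, weights and preprocessing are illustrative.
import numpy as np
from gensim.models import Word2Vec
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

reports = [
    "crash when opening large attachment in mail client",
    "editor freezes while saving large file to disk",
    "login dialog shows wrong error message on timeout",
]
query = "application hangs when writing a very large file"

docs = reports + [query]
tokenized = [d.split() for d in docs]

ir_sim = cosine_similarity(TfidfVectorizer().fit_transform(docs))[-1, :-1]

w2v = Word2Vec(tokenized, vector_size=50, min_count=1, seed=0)
def embed(tokens):
    return np.mean([w2v.wv[t] for t in tokens], axis=0)

vecs = np.vstack([embed(t) for t in tokenized])
emb_sim = cosine_similarity(vecs)[-1, :-1]

score = 0.5 * ir_sim + 0.5 * emb_sim
print(reports[int(np.argmax(score))])     # most similar candidate to the query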


IEEE International Conference on Software Analysis, Evolution and Reengineering | 2017

Detecting similar repositories on GitHub

Yun Zhang; David Lo; Pavneet Singh Kochhar; Xin Xia; Quanlai Li; Jianling Sun

GitHub contains millions of repositories, many of which are similar to one another (i.e., they have similar source code or implement similar functionality). Finding similar repositories on GitHub can be helpful for software engineers, as it can help them reuse source code, build prototypes, identify alternative implementations, explore related projects, find projects to contribute to, and discover code theft and plagiarism. Previous studies have proposed techniques to detect similar applications by analyzing API usage patterns and software tags. However, these prior studies either make use of a limited source of information or use information not available for projects on GitHub. In this paper, we propose a novel approach that can effectively detect similar repositories on GitHub. Our approach is designed around three heuristics leveraging two data sources (i.e., GitHub stars and readme files) that are not considered in previous works. The three heuristics are: repositories whose readme files contain similar content are likely to be similar to one another, repositories starred by users of similar interests are likely to be similar, and repositories starred together within a short period of time by the same user are likely to be similar. Based on these three heuristics, we compute three relevance scores (i.e., readme-based relevance, stargazer-based relevance, and time-based relevance) to assess the similarity between two repositories. By integrating the three relevance scores, we build a recommendation system called RepoPal to detect similar repositories. We compare RepoPal to CLAN, a prior state-of-the-art approach, using one thousand Java repositories on GitHub. Our empirical evaluation demonstrates that RepoPal achieves a higher success rate, precision, and confidence than CLAN.
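
The three heuristics map naturally onto three pairwise scores. The sketch below computes a readme-based, a stargazer-based, and a time-based score for a toy pair of repositories; the specific similarity measures and the plain product used to combine them are assumptions for illustration, not RepoPal's exact formulas.

# Rough sketch of the three relevance signals between two repositories:
# readme similarity, stargazer overlap, and closeness in star timestamps by
# shared users. Toy data; the product used to combine scores is an assumption.
from datetime import datetime
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

readme_a = "lightweight json parser for java with streaming api"
readme_b = "fast streaming json library for the jvm"
stars_a = {"alice": datetime(2016, 3, 1), "bob": datetime(2016, 5, 2), "carol": datetime(2016, 7, 9)}
stars_b = {"alice": datetime(2016, 3, 3), "dave": datetime(2016, 6, 1), "carol": datetime(2017, 1, 5)}

# 1. readme-based relevance
readme_rel = cosine_similarity(TfidfVectorizer().fit_transform([readme_a, readme_b]))[0, 1]

# 2. stargazer-based relevance (Jaccard overlap of stargazers)
common = set(stars_a) & set(stars_b)
star_rel = len(common) / len(set(stars_a) | set(stars_b))

# 3. time-based relevance: shared stargazers who starred both within 30 days
close = sum(abs((stars_a[u] - stars_b[u]).days) <= 30 for u in common)
time_rel = close / len(common) if common else 0.0

print(readme_rel * star_rel * time_rel)   # combined score (simple product, assumed)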


Journal of Computer Science and Technology | 2017

High-Impact Bug Report Identification with Imbalanced Learning Strategies

Xinli Yang; David Lo; Xin Xia; Qiao Huang; Jianling Sun

In practice, some bugs have more impact than others and thus deserve more immediate attention. Due to tight schedules and limited human resources, developers may not have enough time to inspect all bugs, so they often concentrate on those that are highly impactful. In the literature, high-impact bugs refer to bugs that appear at unexpected times or locations and bring more unexpected effects (i.e., surprise bugs), or that break pre-existing functionality and destroy the user experience (i.e., breakage bugs). Unfortunately, identifying high-impact bugs among thousands of bug reports in a bug tracking system is not an easy feat. An automated technique that can identify high-impact bug reports can therefore help developers become aware of them early, rectify them quickly, and minimize the damage they cause. Considering that only a small proportion of bugs are high-impact bugs, identifying high-impact bug reports is a difficult task. In this paper, we propose an approach to identify high-impact bug reports by leveraging imbalanced learning strategies. We investigate the effectiveness of various variants, each of which combines one particular imbalanced learning strategy with one particular classification algorithm. In particular, we choose four widely used strategies for dealing with imbalanced data and four state-of-the-art text classification algorithms, and conduct experiments on four datasets from four different open source projects. We mainly perform an analytical study on two types of high-impact bugs, i.e., surprise bugs and breakage bugs. The results show that different variants have different performance, and that the best-performing variants, SMOTE (synthetic minority over-sampling technique) + KNN (K-nearest neighbours) for surprise bug identification and RUS (random under-sampling) + NB (naive Bayes) for breakage bug identification, outperform the F1-scores of the two state-of-the-art approaches by Thung et al. and by Garcia and Shihab.
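
The two best-performing variants named above map directly onto imbalanced-learn pipelines: SMOTE followed by a KNN classifier, and random under-sampling followed by naive Bayes. The sketch below uses synthetic data and default parameters, so it shows the shape of the setup rather than the paper's exact configuration.

# The two best variants named in the abstract, expressed as imbalanced-learn
# pipelines: SMOTE + KNN and random under-sampling + naive Bayes.
# Synthetic data and default parameters; not the paper's exact configuration.
import numpy as np
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from imblearn.pipeline import Pipeline
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = rng.random((600, 30))                                   # bug-report text features
y = (rng.random(600) < 0.1).astype(int)                     # ~10% high-impact bugs

surprise_variant = Pipeline([("smote", SMOTE(random_state=0)),
                             ("knn", KNeighborsClassifier())])
breakage_variant = Pipeline([("rus", RandomUnderSampler(random_state=0)),
                             ("nb", GaussianNB())])

for name, model in [("SMOTE+KNN", surprise_variant), ("RUS+NB", breakage_variant)]:
    f1 = cross_val_score(model, X, y, cv=5, scoring="f1").mean()
    print(f"{name}: F1 = {f1:.3f}")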


World Congress on Intelligent Control and Automation | 2004

Application of information-flow relations algorithm on extracting business rules from legacy code

Xinyu Wang; Jianling Sun; Xiaohu Yang; Zhijun He; Srinivasa R. Maddineni

Business rules are a set of conditional operations attached to a given data result. Over time, systems that implement business rules are updated as organizations adapt their data and operations to changing business needs. On legacy systems, it is very difficult to extract business rules because of inconsistent documentation, so it is necessary to extract them from the source code. Identifying domain variables is a significant step in extracting business rules from source code. This paper proposes a solution to identify domain variables automatically from source code by applying an information-flow relations algorithm. The solution consists of three steps: identifying variables affected by input domain variables, identifying variables that affect output domain variables, and domain variable management. It has been applied to a large, complex financial legacy system, where it proved successful.
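
The propagation idea behind the first two steps can be sketched as forward and backward reachability over a variable dependence graph built from information-flow (def-use) relations. The hand-written graph and variable names below are toy data; the actual algorithm operates on legacy source code.

# Toy sketch of the propagation idea: given def-use edges "source -> target"
# (target's value depends on source), variables reachable forward from input
# domain variables or backward from output domain variables are candidates.
# The hand-written graph below is illustrative only.
from collections import defaultdict

edges = [("customer_id", "account"), ("account", "balance"),
         ("rate", "interest"), ("balance", "interest"),
         ("interest", "statement_total"), ("log_level", "trace_msg")]

fwd, bwd = defaultdict(set), defaultdict(set)
for src, dst in edges:
    fwd[src].add(dst)
    bwd[dst].add(src)

def reachable(seeds, graph):
    seen, stack = set(seeds), list(seeds)
    while stack:
        for nxt in graph[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

inputs, outputs = {"customer_id", "rate"}, {"statement_total"}
affected_by_inputs = reachable(inputs, fwd)       # step 1: affected by input domain variables
affecting_outputs = reachable(outputs, bwd)       # step 2: affecting output domain variables
print(affected_by_inputs & affecting_outputs)     # step 3: candidate domain variables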


Systems, Man and Cybernetics | 2004

Human factors in extracting business rules from legacy systems

Xinyu Wang; Jianling Sun; Xiaohu Yang; Zhijun He; Srinivasa R. Maddineni

Business rules are operational rules, based on data, that business organizations follow to perform various activities. Systems that implement business rules are updated when the organization changes to meet new business needs or to follow changes in laws. Over time, the documentation of such systems becomes inconsistent with system behaviour, so software code becomes a more reliable source for business rules than any available documentation. Currently, extracting business rules from legacy systems depends heavily on human interaction and steering; although there are tools to assist the extraction, it is not a fully automated process. It is therefore important to analyze the human factors involved, which have never been summarized or analyzed in this field. This work analyzes the human factors in every step of extracting business rules from legacy systems. These human factors have been validated in extracting business rules from a financial legacy system.

Collaboration


Dive into Jianling Sun's collaborations.

Top Co-Authors

David Lo

Singapore Management University
