Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Jifeng Xuan is active.

Publication


Featured research published by Jifeng Xuan.


international conference on software engineering | 2012

Developer prioritization in bug repositories

Jifeng Xuan; He Jiang; Zhilei Ren; Weiqin Zou

Developers build all the software artifacts in development. Existing work has studied the social behavior in software repositories. In one of the most important software repositories, a bug repository, developers create and update bug reports to support software development and maintenance. However, no prior work has considered the priorities of developers in bug repositories. In this paper, we address the problem of developer prioritization, which aims to rank the contributions of developers. We mainly explore two aspects, namely modeling the developer prioritization in a bug repository and assisting predictive tasks with our model. First, we model how to assign the priorities of developers based on a social network technique. Three problems are investigated, including the developer rankings in products, the evolution over time, and the tolerance of noisy comments. Second, we consider leveraging the developer prioritization to improve three prediction tasks in bug repositories, i.e., bug triage, severity identification, and reopened bug prediction. We empirically investigate the performance of our model and its applications in the bug repositories of Eclipse and Mozilla. The results indicate that the developer prioritization can provide knowledge of developer priorities to assist software tasks, especially the task of bug triage.
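The social-network ranking at the heart of this approach can be illustrated with a PageRank-style iteration over a developer comment graph. This is a minimal sketch under assumed data, not the paper's model; the developer names, edges, and damping factor are all illustrative.

```python
# PageRank-style ranking over a developer comment graph. An edge (a, b) means
# developer a commented on work by developer b, i.e. a "endorses" b.
# All names and parameters are illustrative, not from the paper.

def rank_developers(edges, damping=0.85, iters=50):
    nodes = sorted({n for e in edges for n in e})
    out_deg = {n: 0 for n in nodes}
    for a, _ in edges:
        out_deg[a] += 1
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for a, b in edges:
            new[b] += damping * score[a] / out_deg[a]
        # dangling developers (no outgoing edges) spread their score uniformly
        dangling = sum(score[n] for n in nodes if out_deg[n] == 0)
        for n in nodes:
            new[n] += damping * dangling / len(nodes)
        score = new
    return sorted(score, key=score.get, reverse=True)

edges = [("alice", "bob"), ("carol", "bob"), ("bob", "carol"), ("alice", "carol")]
print(rank_developers(edges))  # most endorsed developers first
```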


IEEE Transactions on Software Engineering | 2012

Solving the Large Scale Next Release Problem with a Backbone-Based Multilevel Algorithm

Jifeng Xuan; He Jiang; Zhilei Ren; Zhongxuan Luo

The Next Release Problem (NRP) aims to optimize customer profits and requirements selection for the software releases. The research on the NRP is restricted by the growing scale of requirements. In this paper, we propose a Backbone-based Multilevel Algorithm (BMA) to address the large scale NRP. In contrast to direct solving approaches, the BMA employs multilevel reductions to downgrade the problem scale and multilevel refinements to construct the final optimal set of customers. In both reductions and refinements, the backbone is built to fix the common part of the optimal customers. Since it is intractable to extract the backbone in practice, the approximate backbone is employed for the instance reduction while the soft backbone is proposed to augment the backbone application. In the experiments, to cope with the lack of open large requirements databases, we propose a method to extract instances from open bug repositories. Experimental results on 15 classic instances and 24 realistic instances demonstrate that the BMA can achieve better solutions on the large scale NRP instances than direct solving approaches. Our work provides a reduction approach for solving large scale problems in search-based requirements engineering.
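The NRP structure the BMA operates on can be sketched as a profit/cost selection problem: each customer yields a profit and demands a set of requirements, and the cost of the union of demanded requirements must stay within a budget. The greedy solver below is only a stand-in to make the model concrete; it is not the BMA, and the customers, requirement costs, and budget are invented.

```python
# Toy NRP instance solved greedily by descending profit. Selecting a customer
# requires paying for every demanded requirement not already funded.

def greedy_nrp(customers, req_cost, budget):
    # customers: list of (profit, set_of_requirement_ids)
    chosen, reqs, spent = [], set(), 0
    order = sorted(range(len(customers)),
                   key=lambda i: customers[i][0], reverse=True)
    for i in order:
        profit, demand = customers[i]
        extra = sum(req_cost[r] for r in demand - reqs)
        if spent + extra <= budget:
            chosen.append(i)
            reqs |= demand
            spent += extra
    return chosen, sum(customers[i][0] for i in chosen)

customers = [(10, {0, 1}), (6, {1}), (7, {2, 3})]
req_cost = {0: 4, 1: 3, 2: 5, 3: 6}
print(greedy_nrp(customers, req_cost, budget=8))
```

Customer 1 rides for free here because customer 0 already paid for requirement 1, which is exactly the requirement-sharing structure that makes large instances hard to decompose.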


IEEE Transactions on Knowledge and Data Engineering | 2015

Towards Effective Bug Triage with Software Data Reduction Techniques

Jifeng Xuan; He Jiang; Yan Hu; Zhilei Ren; Weiqin Zou; Zhongxuan Luo; Xindong Wu

Software companies spend over 45 percent of their cost on dealing with software bugs. An inevitable step of fixing bugs is bug triage, which aims to correctly assign a developer to a new bug. To decrease the time cost of manual work, text classification techniques are applied to conduct automatic bug triage. In this paper, we address the problem of data reduction for bug triage, i.e., how to reduce the scale and improve the quality of bug data. We combine instance selection with feature selection to simultaneously reduce the data scale on the bug dimension and the word dimension. To determine the order of applying instance selection and feature selection, we extract attributes from historical bug data sets and build a predictive model for a new bug data set. We empirically investigate the performance of data reduction on a total of 600,000 bug reports from two large open source projects, namely Eclipse and Mozilla. The results show that our data reduction can effectively reduce the data scale and improve the accuracy of bug triage. Our work provides an approach to leveraging techniques on data processing to form reduced and high-quality bug data in software development and maintenance.
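The two-dimensional reduction can be sketched as feature selection on the word dimension followed by instance selection on the bug dimension. The sketch below uses simple word-frequency and duplicate-removal heuristics in place of the paper's algorithms, and the bug reports are made up.

```python
# Two-step data reduction on bug data: keep only the most frequent words
# (feature selection), then drop reports left empty or duplicated
# (instance selection). Illustrative heuristics and data only.

from collections import Counter

def reduce_bug_data(reports, keep_words=3):
    # reports: list of (set_of_words, developer)
    freq = Counter(w for words, _ in reports for w in words)
    kept = {w for w, _ in freq.most_common(keep_words)}
    reduced, seen = [], set()
    for words, dev in reports:
        words = frozenset(words & kept)
        if words and (words, dev) not in seen:   # instance selection
            seen.add((words, dev))
            reduced.append((words, dev))
    return reduced

reports = [({"crash", "ui", "button"}, "alice"),
           ({"crash", "ui"}, "alice"),
           ({"crash", "ui", "button"}, "alice"),   # duplicate report
           ({"leak", "memory"}, "bob")]
print(len(reduce_bug_data(reports)))
```

Note that the order of the two steps matters, which is exactly what the paper's predictive model decides per data set.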


computer software and applications conference | 2011

Towards Training Set Reduction for Bug Triage

Weiqin Zou; Yan Hu; Jifeng Xuan; He Jiang

Bug triage is an important step in the process of bug fixing. The goal of bug triage is to assign a new-coming bug to the correct potential developer. The existing bug triage approaches are based on machine learning algorithms, which build classifiers from training sets of bug reports. In practice, these approaches suffer from large-scale and low-quality training sets. In this paper, we propose training set reduction with both feature selection and instance selection techniques for bug triage. We combine feature selection with instance selection to improve the accuracy of bug triage. The feature selection algorithm χ²-test, the instance selection algorithm Iterative Case Filter, and their combinations are studied in this paper. We evaluate the training set reduction on the bug data of Eclipse. For the training set, 70% of the words and 50% of the bug reports are removed after the training set reduction. The experimental results show that the new, smaller training sets can provide better accuracy than the original one.
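The χ²-test scores a word by how unevenly it is distributed between one developer's reports and everyone else's, using a 2x2 contingency table. A minimal sketch with an invented toy data set:

```python
# chi-square score of one word for one developer class, from the 2x2 table:
# counts of (word present/absent) x (report assigned to dev / to others).

def chi2_word(reports, word, dev):
    a = sum(1 for words, who in reports if word in words and who == dev)
    b = sum(1 for words, who in reports if word in words and who != dev)
    c = sum(1 for words, who in reports if word not in words and who == dev)
    d = sum(1 for words, who in reports if word not in words and who != dev)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

reports = [({"gui", "crash"}, "alice"), ({"gui"}, "alice"),
           ({"net"}, "bob"), ({"net", "crash"}, "bob")]
print(chi2_word(reports, "gui", "alice"))    # discriminative word: high score
print(chi2_word(reports, "crash", "alice"))  # evenly spread word: score 0
```

Words with low scores across all developer classes are the ones feature selection discards.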


electronic commerce | 2012

Hyper-heuristics with low level parameter adaptation

Zhilei Ren; He Jiang; Jifeng Xuan; Zhongxuan Luo

Recent years have witnessed the great success of hyper-heuristics applied to numerous real-world applications. Hyper-heuristics raise the generality of search methodologies by manipulating a set of low level heuristics (LLHs) to solve problems, and aim to automate the algorithm design process. However, those LLHs are usually parameterized, which may contradict the domain-independent motivation of hyper-heuristics. In this paper, we show how to automatically maintain low level parameters (LLPs) using a hyper-heuristic with LLP adaptation (AD-HH), and exemplify the feasibility of AD-HH by adaptively maintaining the LLPs for two hyper-heuristic models. Furthermore, aiming to tackle the search space expansion due to the LLP adaptation, we apply a heuristic space reduction (SAR) mechanism to improve the AD-HH framework. The integration of the LLP adaptation and the SAR mechanism is able to explore the heuristic space more effectively and efficiently. To evaluate the performance of the proposed algorithms, we choose the p-median problem as a case study. The empirical results show that with the adaptation of the LLPs and the SAR mechanism, the proposed algorithms are able to achieve competitive results over three heterogeneous classes of benchmark instances.
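The LLP adaptation idea can be sketched as a success-driven update of each low level heuristic's parameter: widen a move size when it improves the incumbent, narrow it otherwise. The toy 1-D objective, heuristics, and update factors below are illustrative assumptions, not AD-HH itself.

```python
import random

# Toy hyper-heuristic with low level parameter (LLP) adaptation: each low
# level heuristic carries a step-size parameter that grows on improvement
# and shrinks otherwise (a 1/5-success-style rule).

def adaptive_hh(f, x0, steps=200, seed=0):
    rng = random.Random(seed)
    llhs = {"up": +1.0, "down": -1.0}           # two trivial LLHs: move up/down
    llp = {name: 1.0 for name in llhs}          # one adaptable LLP per LLH
    x, best = x0, f(x0)
    for _ in range(steps):
        name = rng.choice(list(llhs))
        cand = x + llhs[name] * llp[name] * rng.random()
        if f(cand) < best:
            x, best = cand, f(cand)
            llp[name] *= 1.2                    # reward: widen the move
        else:
            llp[name] *= 0.85                   # penalise: narrow the move
    return x, best

x, best = adaptive_hh(lambda v: (v - 3.0) ** 2, x0=10.0)
print(x, best)
```

The point of the sketch is that no step size is fixed in advance: the parameters are maintained by the search itself, which is the domain-independent behaviour the paper argues for.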


IEEE Transactions on Systems, Man, and Cybernetics | 2014

New Insights Into Diversification of Hyper-Heuristics

Zhilei Ren; He Jiang; Jifeng Xuan; Yan Hu; Zhongxuan Luo

There has been a growing research trend of applying hyper-heuristics for problem solving, due to their ability to balance intensification and diversification with low level heuristics. Traditionally, the diversification mechanism is mostly realized by perturbing the incumbent solutions to escape from local optima. In this paper, we report our attempt toward providing a new diversification mechanism, which is based on the concept of instance perturbation. In contrast to existing approaches, the proposed mechanism achieves diversification by perturbing the instance being solved, rather than the solutions. To tackle the challenge of incorporating instance perturbation into hyper-heuristics, we also design a new hyper-heuristic framework, HIP-HOP (a recursive acronym: HIP-HOP is an instance perturbation-based hyper-heuristic optimization procedure), which employs a grammar guided high level strategy to manipulate the low level heuristics. With the expressive power of the grammar, constraints such as the feasibility of the output solution can be easily satisfied. Numerical results and statistical tests over both the Ising spin glass problem and the p-median problem instances show that HIP-HOP is able to achieve promising performance. Furthermore, runtime distribution analysis reveals that, although relatively slow at the beginning, HIP-HOP is able to achieve competitive solutions once given sufficient time.
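Instance perturbation as a diversification mechanism can be sketched as: jitter the instance data, search on the jittered copy to move the incumbent, then polish on the original instance. Everything below (the 1-D weighted objective, the noise level, the search routine) is an illustrative assumption, not HIP-HOP.

```python
import random

# Diversification by perturbing the instance rather than the solution:
# run a simple improving search on a jittered copy of the instance weights,
# then evaluate and refine the result on the original instance.

def local_search(f, x, steps=100, seed=1):
    rng = random.Random(seed)
    for _ in range(steps):
        cand = x + rng.uniform(-0.5, 0.5)
        if f(cand) < f(x):
            x = cand
    return x

def instance_perturbation_restart(weights, x, noise=0.3, seed=2):
    rng = random.Random(seed)
    jittered = [w * (1 + rng.uniform(-noise, noise)) for w in weights]
    f_pert = lambda v: sum(w * (v - i) ** 2 for i, w in enumerate(jittered))
    x = local_search(f_pert, x)      # escape by solving the perturbed instance
    f_orig = lambda v: sum(w * (v - i) ** 2 for i, w in enumerate(weights))
    return local_search(f_orig, x)   # then polish on the original instance

result = instance_perturbation_restart([1, 1, 4], x=0.0)
print(result)
```

Because the perturbed instance has a slightly different optimum, the search is pulled away from the incumbent without ever directly mutating the solution, which is the contrast the abstract draws with solution perturbation.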


Neurocomputing | 2013

Extracting elite pairwise constraints for clustering

He Jiang; Zhilei Ren; Jifeng Xuan; Xindong Wu

Semi-supervised clustering under pairwise constraints (i.e. must-links and cannot-links) has been a hot topic in the data mining community in recent years. Since pairwise constraints provided by distinct domain experts may conflict with each other, much research has been conducted to evaluate the effects of noise imposed on semi-supervised clustering. In this paper, we introduce elite pairwise constraints, including elite must-link (EML) and elite cannot-link (ECL) constraints. In contrast to traditional constraints, both EML and ECL constraints are required to be satisfied in every optimal partition (i.e. a partition with the minimum criterion function). Therefore, no conflict will be caused by these new constraints. First, we prove that it is NP-hard to obtain EML or ECL constraints. Then, a heuristic method named Limit Crossing is proposed to achieve a fraction of those new constraints. In practice, this new method can always retrieve many EML or ECL constraints. To evaluate the effectiveness of Limit Crossing, multi-partition based and distance based methods are also proposed in this paper to generate faux elite pairwise constraints. Extensive experiments have been conducted on both UCI and synthetic data sets using a semi-supervised clustering algorithm named COP-KMedoids. Experimental results demonstrate that COP-KMedoids under EML and ECL constraints generated by Limit Crossing can outperform those under either faux constraints or no constraints.
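On a tiny toy data set, the definition of an elite must-link can be checked by brute force: enumerate all k-partitions, keep the optimal ones, and intersect their within-cluster pairs. This only illustrates the definition; Limit Crossing exists precisely because this enumeration is intractable in general.

```python
from itertools import combinations

# Brute-force elite must-links (EML): a pair of points is an EML iff the two
# points share a cluster in EVERY optimal k-partition. 1-D toy points only.

def all_partitions(items):
    if not items:
        yield []
        return
    head, *rest = items
    for part in all_partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [part[i] + [head]] + part[i + 1:]
        yield [[head]] + part

def cost(partition, points):
    # sum of within-cluster squared distances
    return sum((points[a] - points[b]) ** 2
               for cluster in partition
               for a, b in combinations(cluster, 2))

def elite_must_links(points, k):
    parts = [p for p in all_partitions(list(range(len(points)))) if len(p) == k]
    best = min(cost(p, points) for p in parts)
    optimal = [p for p in parts if cost(p, points) == best]
    pairs = set(combinations(range(len(points)), 2))
    for p in optimal:
        together = {tuple(sorted((a, b)))
                    for c in p for a, b in combinations(c, 2)}
        pairs &= together
    return pairs

print(elite_must_links([0.0, 0.1, 5.0, 5.2], k=2))
```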


genetic and evolutionary computation conference | 2010

Approximate backbone based multilevel algorithm for next release problem

He Jiang; Jifeng Xuan; Zhilei Ren

The next release problem (NRP) aims to effectively select software requirements in order to acquire maximum customer profits. As an NP-hard problem in software requirement engineering, the NRP lacks efficient approximate algorithms for large scale instances. In recent years, the backbone has become a new tool for tackling large scale NP-hard problems. In this paper, we employ the backbone to design high performance approximate algorithms for large scale NRP instances. First, we show that it is NP-hard to obtain the backbone of the NRP. Then, we illustrate by fitness landscape analysis that the backbone can be well approximated by the shared common parts of local optimal solutions. Therefore, we propose an approximate backbone based multilevel algorithm (ABMA) to solve large scale NRP instances. This algorithm iteratively explores the search spaces by multilevel reductions and refinements. Experimental results demonstrate that ABMA outperforms existing algorithms on large instances in terms of solution quality and running time.
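The approximate backbone, i.e. the shared common part of a sample of local optimal solutions, can be sketched directly. The bit-tuple "local optima" below are invented, not NRP solutions; positions fixed by the approximate backbone are removed from the instance, which is the reduction step of the multilevel scheme.

```python
# Approximate backbone: the variable assignments shared by every solution
# in a sample of local optima (solutions as bit tuples over customers).

def approximate_backbone(solutions):
    """Return {index: value} for positions identical across all solutions."""
    n = len(solutions[0])
    return {i: solutions[0][i]
            for i in range(n)
            if all(s[i] == solutions[0][i] for s in solutions)}

local_optima = [(1, 0, 1, 1, 0),
                (1, 1, 1, 0, 0),
                (1, 0, 1, 1, 0)]
fixed = approximate_backbone(local_optima)
free = [i for i in range(5) if i not in fixed]
print(fixed, free)  # fixed part of the instance vs. the reduced search space
```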


international conference on software engineering | 2016

Revisit of automatic debugging via human focus-tracking analysis

Xiaoyuan Xie; Zicong Liu; Shuo Song; Zhenyu Chen; Jifeng Xuan; Baowen Xu

In many fields of software engineering, studies on human behavior have attracted a lot of attention; however, few such studies exist in automated debugging. Parnin and Orso conducted a pioneering study comparing the performance of programmers in debugging with and without a ranking-based fault localization technique, namely Spectrum-Based Fault Localization (SBFL). In this paper, we revisit the actual helpfulness of SBFL by addressing some major problems that were not resolved in Parnin and Orso's study. Our investigation involved 207 participants and 17 debugging tasks. A user-friendly SBFL tool was adopted. We found that SBFL tended not to be helpful in improving the efficiency of debugging. By tracking and analyzing programmers' focus of attention, we characterized their source code navigation patterns and provided in-depth explanations for the observations. Results indicated that (1) a short "first scan" of the source code tended to result in inefficient debugging; and (2) inspections of the pinpointed statements during the "follow-up browsing" were normally just quick skimming. Moreover, we found that the SBFL assistance may even slightly weaken programmers' abilities in fault detection. Our observations imply interference between the mechanism of automated fault localization and the actual assistance needed by programmers in debugging. To resolve this interference, we provide several insights and suggestions.
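For context, SBFL ranks statements by a suspiciousness score computed from per-test coverage. The sketch below uses the standard Ochiai formula on an invented coverage matrix; it is not the tool used in the study.

```python
# Spectrum-based fault localization with the Ochiai formula: statements
# covered by failing tests but few passing tests rank as most suspicious.

def ochiai(coverage, results):
    # coverage[t] = set of statements executed by test t; results[t] = passed?
    total_fail = sum(1 for passed in results if not passed)
    stmts = {s for cov in coverage for s in cov}
    scores = {}
    for s in stmts:
        ef = sum(1 for cov, passed in zip(coverage, results)
                 if s in cov and not passed)      # failing tests covering s
        ep = sum(1 for cov, passed in zip(coverage, results)
                 if s in cov and passed)          # passing tests covering s
        denom = (total_fail * (ef + ep)) ** 0.5
        scores[s] = ef / denom if denom else 0.0
    return sorted(stmts, key=lambda s: scores[s], reverse=True)

coverage = [{1, 2, 3}, {1, 3}, {2, 3}, {3}]
results = [False, True, True, True]   # only the first test fails
ranking = ochiai(coverage, results)
print(ranking)
```

Statement 3 is covered by every test, so its score is diluted by the passing runs; the study's question is whether handing programmers such a ranked list actually helps them debug.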


IEEE Transactions on Systems, Man, and Cybernetics | 2012

An Accelerated-Limit-Crossing-Based Multilevel Algorithm for the p-Median Problem

Zhilei Ren; He Jiang; Jifeng Xuan; Zhongxuan Luo

In this paper, we investigate how to design an efficient heuristic algorithm under the guideline of the backbone and the fat, in the context of the p-median problem. Given a problem instance, the backbone variables are defined as the variables shared by all optimal solutions, and the fat variables are defined as the variables that are absent from every optimal solution. Identification of the backbone (fat) variables is essential for the heuristic algorithms exploiting such structures. Since the existing exact identification method, i.e., limit crossing (LC), is time consuming and sensitive to the upper bounds, it is hard to incorporate LC into heuristic algorithm design. In this paper, we develop the accelerated-LC (ALC)-based multilevel algorithm (ALCMA). In contrast to LC which repeatedly runs the time-consuming Lagrangian relaxation (LR) procedure, ALC is introduced in ALCMA such that LR is performed only once, and every backbone (fat) variable can be determined in O(1) time. Meanwhile, the upper bound sensitivity is eliminated by a dynamic pseudo upper bound mechanism. By combining ALC with the pseudo upper bound, ALCMA can efficiently find high-quality solutions within a series of reduced search spaces. Extensive empirical results demonstrate that ALCMA outperforms existing heuristic algorithms in terms of the average solution quality.
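The limit-crossing test behind ALC can be illustrated in miniature: a facility is a backbone variable if forbidding it pushes the best achievable cost above the optimum, and a fat variable if forcing it open does. Brute force stands in for the Lagrangian bounds here, and the tiny 1-D p-median instance is invented.

```python
from itertools import combinations

# Limit-crossing in miniature on a toy p-median instance: classify each
# candidate facility site as backbone (opened by every optimum) or fat
# (opened by no optimum) by comparing restricted optima against the bound.

def pmedian_cost(facilities, points):
    return sum(min(abs(p - points[f]) for f in facilities) for p in points)

def classify(points, p):
    sites = range(len(points))
    best = min(pmedian_cost(c, points) for c in combinations(sites, p))
    backbone, fat = set(), set()
    for j in sites:
        without = min(pmedian_cost(c, points)
                      for c in combinations(sites, p) if j not in c)
        if without > best:
            backbone.add(j)              # forbidding j crosses the bound
        forced = min(pmedian_cost(c, points)
                     for c in combinations(sites, p) if j in c)
        if forced > best:
            fat.add(j)                   # forcing j crosses the bound
    return backbone, fat

points = [0.0, 0.2, 0.4, 10.0]
backbone, fat = classify(points, p=2)
print(backbone, fat)
```

Every classified variable shrinks the search space, which is why cheap per-variable tests (ALC's O(1) check after a single LR run) pay off at scale.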

Collaboration


Dive into Jifeng Xuan's collaborations.

Top Co-Authors

He Jiang (Dalian University of Technology)
Zhilei Ren (Dalian University of Technology)
Zhongxuan Luo (Dalian University of Technology)
Yan Hu (Dalian University of Technology)
Weiqin Zou (Dalian University of Technology)
Xindong Wu (University of Louisiana at Lafayette)
Jun Yan (Chinese Academy of Sciences)
Shuwei Zhang (Dalian University of Technology)
Xiaochen Li (Dalian University of Technology)