Zhilei Ren

Dalian University of Technology

Publication

Featured research published by Zhilei Ren.

International Conference on Software Engineering | 2012

Developer prioritization in bug repositories

Jifeng Xuan; He Jiang; Zhilei Ren; Weiqin Zou

Developers build all the software artifacts in development. Existing work has studied the social behavior in software repositories. In one of the most important software repositories, a bug repository, developers create and update bug reports to support software development and maintenance. However, no prior work has considered the priorities of developers in bug repositories. In this paper, we address the problem of developer prioritization, which aims to rank the contributions of developers. We mainly explore two aspects, namely modeling developer prioritization in a bug repository and assisting predictive tasks with our model. First, we model how to assign the priorities of developers based on a social network technique. Three problems are investigated: the developer rankings in products, the evolution over time, and the tolerance of noisy comments. Second, we consider leveraging developer prioritization to improve three predictive tasks in bug repositories, i.e., bug triage, severity identification, and reopened bug prediction. We empirically investigate the performance of our model and its applications in the bug repositories of Eclipse and Mozilla. The results indicate that developer prioritization can provide knowledge of developer priorities to assist software tasks, especially the task of bug triage.

IEEE Transactions on Software Engineering | 2012

Solving the Large Scale Next Release Problem with a Backbone-Based Multilevel Algorithm

Jifeng Xuan; He Jiang; Zhilei Ren; Zhongxuan Luo

The Next Release Problem (NRP) aims to optimize customer profits and requirements selection for the software releases. The research on the NRP is restricted by the growing scale of requirements. In this paper, we propose a Backbone-based Multilevel Algorithm (BMA) to address the large scale NRP. In contrast to direct solving approaches, the BMA employs multilevel reductions to downgrade the problem scale and multilevel refinements to construct the final optimal set of customers. In both reductions and refinements, the backbone is built to fix the common part of the optimal customers. Since it is intractable to extract the backbone in practice, the approximate backbone is employed for the instance reduction while the soft backbone is proposed to augment the backbone application. In the experiments, to cope with the lack of open large requirements databases, we propose a method to extract instances from open bug repositories. Experimental results on 15 classic instances and 24 realistic instances demonstrate that the BMA can achieve better solutions on the large scale NRP instances than direct solving approaches. Our work provides a reduction approach for solving large scale problems in search-based requirements engineering.

IEEE Transactions on Knowledge and Data Engineering | 2015

Towards Effective Bug Triage with Software Data Reduction Techniques

Jifeng Xuan; He Jiang; Yan Hu; Zhilei Ren; Weiqin Zou; Zhongxuan Luo; Xindong Wu

Software companies spend over 45 percent of their costs on dealing with software bugs. An inevitable step in fixing bugs is bug triage, which aims to correctly assign a developer to a new bug. To decrease the time cost of manual work, text classification techniques are applied to conduct automatic bug triage. In this paper, we address the problem of data reduction for bug triage, i.e., how to reduce the scale and improve the quality of bug data. We combine instance selection with feature selection to simultaneously reduce data scale on the bug dimension and the word dimension. To determine the order of applying instance selection and feature selection, we extract attributes from historical bug data sets and build a predictive model for a new bug data set. We empirically investigate the performance of data reduction on a total of 600,000 bug reports from two large open source projects, namely Eclipse and Mozilla. The results show that our data reduction can effectively reduce the data scale and improve the accuracy of bug triage. Our work provides an approach to leveraging data processing techniques to form reduced and high-quality bug data in software development and maintenance.

Evolutionary Computation | 2012

Hyper-heuristics with low level parameter adaptation

Zhilei Ren; He Jiang; Jifeng Xuan; Zhongxuan Luo

Recent years have witnessed the great success of hyper-heuristics applied to numerous real-world applications. Hyper-heuristics raise the generality of search methodologies by manipulating a set of low level heuristics (LLHs) to solve problems, and aim to automate the algorithm design process. However, those LLHs are usually parameterized, which may contradict the domain independent motivation of hyper-heuristics. In this paper, we show how to automatically maintain low level parameters (LLPs) using a hyper-heuristic with LLP adaptation (AD-HH), and exemplify the feasibility of AD-HH by adaptively maintaining the LLPs for two hyper-heuristic models. Furthermore, to tackle the search space expansion caused by the LLP adaptation, we apply a heuristic space reduction (SAR) mechanism to improve the AD-HH framework. The integration of the LLP adaptation and the SAR mechanism is able to explore the heuristic space more effectively and efficiently. To evaluate the performance of the proposed algorithms, we choose the p-median problem as a case study. The empirical results show that with the adaptation of the LLPs and the SAR mechanism, the proposed algorithms are able to achieve competitive results over the three heterogeneous classes of benchmark instances.

IEEE Transactions on Services Computing | 2016

Query Expansion Based on Crowd Knowledge for Code Search

Liming Nie; He Jiang; Zhilei Ren; Zeyi Sun; Xiaochen Li

As code search is a frequent developer activity in software development practices, improving the performance of code search is a critical task. In the text retrieval based search techniques employed in code search, the term mismatch problem is a critical language issue for retrieval effectiveness. By reformulating the queries, query expansion provides effective ways to solve the term mismatch problem. In this paper, we propose Query Expansion based on Crowd Knowledge (QECK), a novel technique to improve the performance of code search algorithms. QECK identifies software-specific expansion words from the high quality pseudo relevance feedback question and answer pairs on Stack Overflow to automatically generate the expansion queries. Furthermore, we incorporate QECK in the classic Rocchio's model, and propose the QECK-based code search method QECKRocchio. We conduct three experiments to evaluate our QECK technique and investigate QECKRocchio in a large-scale corpus containing real-world code snippets and a question and answer pair collection. The results show that QECK improves the performance of three code search algorithms by up to 64 percent in Precision, and 35 percent in NDCG. Meanwhile, compared with the state-of-the-art query expansion method, the improvement of QECKRocchio is 22 percent in Precision, and 16 percent in NDCG.
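
As a rough illustration of the Rocchio-style expansion mentioned in this abstract, the following minimal sketch adds the highest-weighted feedback terms to a query. It is a generic positive-feedback Rocchio variant, not the QECK weighting itself; the function name `rocchio_expand`, the parameters `alpha`, `beta`, and `top_k`, and the toy documents are all assumptions made for illustration.

```python
from collections import Counter


def rocchio_expand(query_terms, feedback_docs, alpha=1.0, beta=0.75, top_k=3):
    """Expand a query with terms from pseudo-relevant documents.

    Hypothetical helper: weights are alpha * query term frequency plus
    beta * average term frequency over the feedback documents.
    """
    weights = Counter({t: alpha * c for t, c in Counter(query_terms).items()})
    for doc in feedback_docs:
        for term, count in Counter(doc).items():
            weights[term] += beta * count / len(feedback_docs)

    original = set(query_terms)
    # Rank candidate expansion terms by weight, breaking ties alphabetically.
    candidates = [(t, w) for t, w in weights.items() if t not in original]
    candidates.sort(key=lambda tw: (-tw[1], tw[0]))
    return list(query_terms) + [t for t, _ in candidates[:top_k]]


# Toy feedback documents standing in for tokenized question and answer pairs.
query = ["read", "file"]
docs = [
    ["read", "file", "bufferedreader", "stream"],
    ["open", "file", "bufferedreader"],
]
expanded = rocchio_expand(query, docs, top_k=2)
```

The classic Rocchio formulation also subtracts a term for non-relevant documents; it is omitted here because pseudo relevance feedback supplies only documents assumed to be relevant.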

IEEE Transactions on Systems, Man, and Cybernetics | 2014

New Insights Into Diversification of Hyper-Heuristics

Zhilei Ren; He Jiang; Jifeng Xuan; Yan Hu; Zhongxuan Luo

There has been a growing research trend of applying hyper-heuristics for problem solving, due to their ability to balance intensification and diversification with low level heuristics. Traditionally, the diversification mechanism is mostly realized by perturbing the incumbent solutions to escape from local optima. In this paper, we report our attempt toward providing a new diversification mechanism, which is based on the concept of instance perturbation. In contrast to existing approaches, the proposed mechanism achieves diversification by perturbing the instance being solved, rather than the solutions. To tackle the challenge of incorporating instance perturbation into hyper-heuristics, we also design a new hyper-heuristic framework HIP-HOP (recursive acronym of HIP-HOP is an instance perturbation-based hyper-heuristic optimization procedure), which employs a grammar guided high level strategy to manipulate the low level heuristics. With the expressive power of the grammar, constraints such as the feasibility of the output solution can be easily satisfied. Numerical results and statistical tests over both the Ising spin glass problem and the p-median problem instances show that HIP-HOP is able to achieve promising performance. Furthermore, runtime distribution analysis reveals that, although relatively slow at the beginning, HIP-HOP is able to achieve competitive solutions once given sufficient time.

Neurocomputing | 2013

Extracting elite pairwise constraints for clustering

He Jiang; Zhilei Ren; Jifeng Xuan; Xindong Wu

Semi-supervised clustering under pairwise constraints (i.e. must-links and cannot-links) has been a hot topic in the data mining community in recent years. Since pairwise constraints provided by distinct domain experts may conflict with each other, a lot of research work has been conducted to evaluate the effects of noise imposed on semi-supervised clustering. In this paper, we introduce elite pairwise constraints, including elite must-link (EML) and elite cannot-link (ECL) constraints. In contrast to traditional constraints, both EML and ECL constraints are required to be satisfied in every optimal partition (i.e. a partition with the minimum criterion function). Therefore, no conflict will be caused by those new constraints. First, we prove that it is NP-hard to obtain EML or ECL constraints. Then, a heuristic method named Limit Crossing is proposed to achieve a fraction of those new constraints. In practice, this new method can always retrieve a lot of EML or ECL constraints. To evaluate the effectiveness of Limit Crossing, multi-partition based and distance based methods are also proposed in this paper to generate faux elite pairwise constraints. Extensive experiments have been conducted on both UCI and synthetic data sets using a semi-supervised clustering algorithm named COP-KMedoids. Experimental results demonstrate that COP-KMedoids under EML and ECL constraints generated by Limit Crossing can outperform those under either faux constraints or no constraints.
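
To make the must-link and cannot-link terminology concrete, here is a minimal sketch that checks which pairwise constraints a given partition violates. The function name `violated_constraints` and the list-of-labels encoding are assumptions for illustration; the sketch does not implement Limit Crossing or COP-KMedoids.

```python
def violated_constraints(labels, must_links, cannot_links):
    """Return the pairwise constraints that a partition fails to satisfy.

    labels[i] is the cluster id of point i (hypothetical encoding).
    A must-link (a, b) requires a and b to share a cluster;
    a cannot-link (a, b) requires them to be in different clusters.
    """
    violations = []
    for a, b in must_links:
        if labels[a] != labels[b]:
            violations.append(("must-link", a, b))
    for a, b in cannot_links:
        if labels[a] == labels[b]:
            violations.append(("cannot-link", a, b))
    return violations


# Four points split into two clusters.
labels = [0, 0, 1, 1]
violations = violated_constraints(
    labels,
    must_links=[(0, 1), (1, 2)],
    cannot_links=[(0, 3), (2, 3)],
)
```

Under the definition in the abstract, an elite constraint is one for which such a check returns no violation in every optimal partition.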

Genetic and Evolutionary Computation Conference | 2010

Approximate backbone based multilevel algorithm for next release problem

He Jiang; Jifeng Xuan; Zhilei Ren

The next release problem (NRP) aims to effectively select software requirements in order to acquire maximum customer profits. As an NP-hard problem in software requirement engineering, NRP lacks efficient approximate algorithms for large scale instances. The backbone is a new tool for tackling large scale NP-hard problems in recent years. In this paper, we employ the backbone to design high performance approximate algorithms for large scale NRP instances. Firstly we show that it is NP-hard to obtain the backbone of NRP. Then, we illustrate by fitness landscape analysis that the backbone can be well approximated by the shared common parts of local optimal solutions. Therefore, we propose an approximate backbone based multilevel algorithm (ABMA) to solve large scale NRP instances. This algorithm iteratively explores the search spaces by multilevel reductions and refinements. Experimental results demonstrate that ABMA outperforms existing algorithms on large instances in terms of solution quality and running time.
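
The idea that the backbone can be approximated by the shared common parts of local optimal solutions can be sketched as a set intersection. This is only an illustration under assumed encodings: each local optimum is represented as a set of selected customer ids, and the names `approximate_backbone` and `reduce_instance` are hypothetical rather than taken from the paper.

```python
def approximate_backbone(local_optima):
    """Return the items shared by every local optimum.

    Each local optimum is a collection of selected customer ids
    (an assumed encoding of an NRP solution).
    """
    if not local_optima:
        return set()
    return set.intersection(*(set(solution) for solution in local_optima))


def reduce_instance(all_customers, backbone):
    """Fix the backbone customers and return those still undecided."""
    return set(all_customers) - backbone


# Three local optima, e.g. from independent local search runs.
local_optima = [{1, 2, 5, 7}, {1, 2, 6, 7}, {1, 2, 7, 9}]
backbone = approximate_backbone(local_optima)
remaining = reduce_instance(range(1, 10), backbone)
```

Fixing the shared part shrinks the instance that a subsequent search has to explore, which is the intuition behind the reduction phase; the actual multilevel reductions and refinements involve more than this single step.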

IEEE International Conference on Software Analysis, Evolution, and Reengineering | 2016

A More Accurate Model for Finding Tutorial Segments Explaining APIs

He Jiang; Jingxuan Zhang; Xiaochen Li; Zhilei Ren; David Lo

Developers prefer to utilize third-party libraries when they implement functionality, and Application Programming Interfaces (APIs) are frequently used by them. Facing an unfamiliar API, developers tend to consult tutorials as learning resources. Unfortunately, the segments explaining a specific API are scattered across tutorials. Hence, it remains a challenging issue to find the relevant segments. In this study, we propose a more accurate model to find the exact tutorial fragments explaining APIs. This new model consists of a text classifier with domain specific features. More specifically, we discover two important indicators to complement traditional text based features, namely co-occurrence APIs and knowledge based API extensions. In addition, we incorporate Word2Vec, a semantic similarity metric, to enhance the new model. Extensive experiments over two publicly available tutorial datasets show that our new model could find up to 90% of the fragments explaining APIs and improve the state-of-the-art model by up to 30% in terms of F-measure.

Science in China Series F: Information Sciences | 2017

Mining authorship characteristics in bug repositories

He Jiang; Jingxuan Zhang; Hongjing Ma; Najam Nazar; Zhilei Ren

Bug reports are widely employed to facilitate software tasks in software maintenance. Since bug reports are contributed by people, the authorship characteristics of contributors may heavily impact the performance of resolving software tasks. Poorly written bug reports may delay developers when fixing bugs. However, no in-depth investigation has been conducted over the authorship characteristics. In this study, we first leverage byte-level N-grams to model the authorship characteristics and employ Normalized Simplified Profile Intersection (NSPI) to identify the similarity of the authorship characteristics. Then, we investigate a series of properties related to contributors' authorship characteristics, including the evolvement over time and the variation among distinct products in open source projects. Moreover, we show how to leverage the authorship characteristics to facilitate a well-known task in software maintenance, namely Bug Report Summarization (BRS). Experiments on open source projects validate that incorporating the authorship characteristics can effectively improve a state-of-the-art method in BRS. Our findings suggest that contributors should retain stable authorship characteristics and the authorship characteristics can assist in resolving software tasks.

Highlights: This paper uses byte-level N-grams to model the writing style of contributors in bug repositories and introduces NSPI to measure the similarity between two writing styles. It studies several properties of contributors' writing styles, including how they change over time and across different products. It then uses contributors' writing styles to help with a typical software maintenance task, namely bug report summarization. The experimental data of this paper have been made public. Experimental results show that leveraging developers' writing styles can effectively improve bug report summarization.
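
The byte-level N-gram profiles and profile intersection described in this abstract can be sketched roughly as follows. The profile size, the value of `n`, and in particular the normalization (dividing by the smaller profile size) are assumptions for illustration and may differ from the NSPI definition used in the paper; the function names are hypothetical.

```python
from collections import Counter


def byte_ngram_profile(text, n=3, size=100):
    """Return the `size` most frequent byte-level n-grams of a text."""
    data = text.encode("utf-8")
    counts = Counter(data[i:i + n] for i in range(len(data) - n + 1))
    return {gram for gram, _ in counts.most_common(size)}


def normalized_spi(profile_a, profile_b):
    """Profile intersection size scaled to [0, 1].

    Assumed normalization: divide by the smaller profile size.
    """
    if not profile_a or not profile_b:
        return 0.0
    shared = len(profile_a & profile_b)
    return shared / min(len(profile_a), len(profile_b))


same = normalized_spi(byte_ngram_profile("abcabc"), byte_ngram_profile("abcabc"))
different = normalized_spi(byte_ngram_profile("abcabc"), byte_ngram_profile("xyzxyz"))
```

Working on bytes rather than characters keeps the profile independent of language and tokenization, which suits bug reports that mix prose, stack traces, and code.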