Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Xiaohu Yang is active.

Publication


Featured research published by Xiaohu Yang.


Information & Software Technology | 2015

ELBlocker: Predicting blocking bugs with ensemble imbalance learning

Xin Xia; David Lo; Emad Shihab; Xinyu Wang; Xiaohu Yang

Context: Blocking bugs are bugs that prevent other bugs from being fixed. Previous studies show that blocking bugs take approximately two to three times longer to fix than non-blocking bugs.

Objective: Automatically predicting blocking bugs early on, so that developers are aware of them, can help reduce their impact or avoid them altogether. However, a major challenge in predicting blocking bugs is that only a small proportion of bugs are blocking bugs, i.e., there is an unequal distribution between blocking and non-blocking bugs. For example, in Eclipse and OpenOffice, only 2.8% and 3.0% of bugs are blocking bugs, respectively. We refer to this as the class imbalance phenomenon.

Method: In this paper, we propose ELBlocker to identify blocking bugs given training data. ELBlocker first randomly divides the training data into multiple disjoint sets and builds a classifier for each disjoint set. Next, it combines these classifiers and automatically determines an appropriate imbalance decision boundary to differentiate blocking bugs from non-blocking bugs. With this boundary, a bug report is classified as a blocking bug when its likelihood score exceeds the decision boundary, even if its likelihood score is low.

Results: To examine the benefits of ELBlocker, we perform experiments on 6 large open source projects, namely Freedesktop, Chromium, Mozilla, Netbeans, OpenOffice, and Eclipse, containing a total of 402,962 bugs. We find that ELBlocker achieves F1 and EffectivenessRatio@20% scores of up to 0.482 and 0.831, respectively. On average across the 6 projects, ELBlocker improves the F1 and EffectivenessRatio@20% scores over the state-of-the-art method proposed by Garcia and Shihab by 14.69% and 8.99%, respectively. Statistical tests show that the improvements are significant and the effect sizes are large.

Conclusion: ELBlocker can help deal with the class imbalance phenomenon and improve the prediction of blocking bugs. It achieves a substantial and statistically significant improvement over the state-of-the-art methods, i.e., Garcia and Shihab's method, SMOTE, OSS, and Bagging.
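The ensemble-plus-threshold idea the abstract describes can be sketched in a few lines of Python. This is an illustrative stand-in, not the authors' implementation: the stub classifier, the linear scoring, and the synthetic data are all invented for demonstration.

```python
import random
from statistics import mean

def train_stub(subset):
    # Stand-in for a real classifier: likelihood is the instance's single
    # feature scaled by (0.5 + the subset's blocking-bug rate).
    rate = mean(label for _, label in subset) or 0.01
    return lambda x: min(1.0, x * (0.5 + rate))

def f1(preds, labels):
    tp = sum(1 for p, y in zip(preds, labels) if p and y)
    fp = sum(1 for p, y in zip(preds, labels) if p and not y)
    fn = sum(1 for p, y in zip(preds, labels) if not p and y)
    if tp == 0:
        return 0.0
    prec, rec = tp / (tp + fp), tp / (tp + fn)
    return 2 * prec * rec / (prec + rec)

def elblocker_fit(data, n_subsets=5):
    # 1) split the training data into disjoint subsets, one classifier each
    random.shuffle(data)
    models = [train_stub(data[i::n_subsets]) for i in range(n_subsets)]
    score = lambda x: mean(m(x) for m in models)  # 2) combine classifiers
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    # 3) pick the imbalance decision boundary that maximizes F1 on the
    #    training data, so rare blocking bugs can still be flagged
    best_t = max((t / 100 for t in range(1, 100)),
                 key=lambda t: f1([score(x) > t for x in xs], ys))
    return lambda x: score(x) > best_t

random.seed(0)
# Synthetic imbalanced data: ~3% "blocking" bugs, echoing the paper's setting.
data = ([(random.uniform(0.6, 1.0), 1) for _ in range(9)] +
        [(random.uniform(0.0, 0.5), 0) for _ in range(291)])
predict = elblocker_fit(data)
```

The key design choice is that the threshold is tuned rather than fixed at 0.5, so an instance can be flagged as blocking even when its combined likelihood score is low.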


international conference on software maintenance | 2015

Who should review this change?: Putting text and file location analyses together for more accurate recommendations

Xin Xia; David Lo; Xinyu Wang; Xiaohu Yang

Software code review is a process in which developers inspect new code changes made by others to evaluate their quality and to identify and fix defects before integrating the changes into the main branch of a version control system. Modern Code Review (MCR), a lightweight, tool-based variant of conventional code review, is widely adopted in both open source and proprietary software projects. One challenge in MCR is assigning appropriate developers to review a code change. Given that a software project may have hundreds of potential code reviewers, picking suitable reviewers is not a straightforward task. A prior study by Thongtanunam et al. showed that the difficulty of selecting suitable reviewers may delay the review process by an average of 12 days. In this paper, to address the challenge of assigning suitable reviewers to changes, we propose a hybrid and incremental approach, Tie, which utilizes the advantages of both Text mIning and a filE location-based approach. To do this, Tie integrates an incremental text mining model, which analyzes the textual contents of a review request, with a similarity model, which measures the similarity between changed file paths and previously reviewed file paths. We perform a large-scale experiment on four open source projects, namely Android, OpenStack, QT, and LibreOffice, containing a total of 42,045 reviews. The experimental results show that on average Tie achieves top-1, top-5, and top-10 accuracies and a Mean Reciprocal Rank (MRR) of 0.52, 0.79, 0.85, and 0.64 across the four projects, improving on the state-of-the-art approach RevFinder, proposed by Thongtanunam et al., by 61%, 23%, 8%, and 37%, respectively.
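The combination of the two signals can be sketched as follows. This is a hypothetical illustration: the scoring functions, the `alpha` weight, and the toy review history are invented, not Tie's actual models.

```python
from collections import Counter
import math

def cosine(a, b):
    # Cosine similarity between two bags of words.
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def path_similarity(p1, p2):
    # Fraction of leading path components two file paths share.
    a, b = p1.split("/"), p2.split("/")
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return common / max(len(a), len(b))

def rank_reviewers(request_words, request_paths, history, alpha=0.5):
    # history: reviewer -> (words from past reviews, past reviewed file paths);
    # blend a text score and a file-location score per candidate, then rank.
    scores = {}
    for reviewer, (words, paths) in history.items():
        text = cosine(request_words, words)
        loc = max((path_similarity(p, q)
                   for p in request_paths for q in paths), default=0.0)
        scores[reviewer] = alpha * text + (1 - alpha) * loc
    return sorted(scores, key=scores.get, reverse=True)

history = {
    "alice": ("ui layout render".split(), ["src/ui/button.cpp"]),
    "bob": ("network socket retry".split(), ["src/net/tcp.cpp"]),
}
ranking = rank_reviewers("fix ui render glitch".split(),
                         ["src/ui/dialog.cpp"], history)
```

With the toy history above, the UI-focused request ranks `alice` first because both her past review text and her reviewed file paths are closer to the request than `bob`'s.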


pacific asia workshop on intelligence and security informatics | 2012

Information credibility on twitter in emergency situation

Xin Xia; Xiaohu Yang; Chao Wu; Shanping Li; Linfeng Bao

Twitter has shown great power of influence through its fast information diffusion. Previous research has shown that most of the tweets posted are truthful, but because some people post rumors and spam on Twitter in emergency situations, the direction of public opinion can be misled and even riots can be incited. In this paper, we focus on methods for assessing information credibility in emergency situations. More precisely, we build a novel Twitter monitor model to monitor Twitter online. Within this monitor model, an unsupervised learning algorithm is proposed to detect emergency situations. A training dataset that includes the tweets of typical events is gathered through the Twitter monitor. We then dispatch the dataset to experts who manually label each tweet into one of two classes: credible or not credible. From the labeled tweets, a number of features related to the user's social behavior, the tweet content, the tweet topic, and the tweet diffusion are extracted. A supervised method based on learning a Bayesian network is used to predict tweet credibility in emergency situations. Experiments with tweets on topics related to the UK riots show that our procedure achieves good classification performance compared with other state-of-the-art algorithms.
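The final classification step can be illustrated with a toy sketch. The paper learns a full Bayesian network over its features; here a naive-Bayes structure with invented binary features (`has_url`, `verified`) stands in for it, purely for demonstration.

```python
from collections import defaultdict
import math

def train_nb(rows):
    # rows: (feature dict, label); count feature values per label.
    counts = defaultdict(lambda: defaultdict(int))
    labels = defaultdict(int)
    for feats, y in rows:
        labels[y] += 1
        for f, v in feats.items():
            counts[(y, f)][v] += 1
    return labels, counts

def classify(model, feats):
    # Pick the label maximizing log P(y) + sum_f log P(f=v | y),
    # with Laplace smoothing for unseen feature values.
    labels, counts = model
    total = sum(labels.values())
    best, best_lp = None, -math.inf
    for y, n in labels.items():
        lp = math.log(n / total)
        for f, v in feats.items():
            c = counts[(y, f)]
            lp += math.log((c[v] + 1) / (n + 2))
        if lp > best_lp:
            best, best_lp = y, lp
    return best

# Toy expert-labeled tweets with two invented binary features.
rows = [({"has_url": 1, "verified": 0}, "not_credible"),
        ({"has_url": 1, "verified": 0}, "not_credible"),
        ({"has_url": 0, "verified": 1}, "credible"),
        ({"has_url": 0, "verified": 1}, "credible")]
model = train_nb(rows)
```

In the real pipeline the features would be the social-behavior, content, topic, and diffusion features the abstract describes, and the classifier a learned Bayesian network rather than this naive structure.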


ieee international conference on services computing | 2013

Reputation-Aware QoS Value Prediction of Web Services

Weiwei Qiu; Zibin Zheng; Xinyu Wang; Xiaohu Yang; Michael R. Lyu

QoS value prediction for Web services is an important research issue in service recommendation, selection, and composition. Collaborative Filtering (CF) is one of the most widely used methods; it employs QoS values contributed by similar users to make predictions. Historical QoS values contributed by different users can therefore have a great impact on prediction results. However, existing Web service QoS value prediction approaches do not take data credibility into consideration, which may hurt prediction accuracy. To address this problem, we propose a reputation-aware QoS value prediction approach, which first calculates the reputation of each user based on their contributed values, and then uses reputation-based ranking to exclude the values contributed by untrustworthy users. A CF-based QoS prediction approach is finally applied to predict the missing QoS values from the purified dataset. Experimental results show that our approach has higher prediction accuracy than other approaches.
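A minimal sketch of the reputation-then-filter idea, with illustrative names and formulas rather than the paper's exact model: score each user by how far their reported QoS values deviate from the per-service median, drop the least trustworthy users, then predict missing values from the remaining users' observations.

```python
from statistics import median, mean

def reputations(qos):
    # qos: user -> {service: observed value}.
    # Lower average deviation from the per-service median = higher reputation.
    services = {s for vals in qos.values() for s in vals}
    med = {s: median(v[s] for v in qos.values() if s in v) for s in services}
    return {u: -mean(abs(v - med[s]) for s, v in vals.items())
            for u, vals in qos.items()}

def predict(qos, service, drop=1):
    # Exclude the `drop` least reputable users, then average the rest
    # (a simple stand-in for a full CF prediction on the purified data).
    rep = reputations(qos)
    trusted = sorted(rep, key=rep.get, reverse=True)[:len(rep) - drop]
    obs = [qos[u][service] for u in trusted if service in qos[u]]
    return mean(obs)

qos = {
    "u1": {"s1": 0.20, "s2": 0.31},
    "u2": {"s1": 0.22, "s2": 0.29},
    "u3": {"s1": 0.90, "s2": 0.95},   # untrustworthy outlier
}
```

With this toy data, `u3`'s inflated reports are excluded, so the prediction for `s1` stays near the consensus of the trustworthy users.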


international conference on software maintenance | 2009

Business process recovery for system maintenance — An empirical approach

Zhengong Cai; Xiaohu Yang; Xinyu Wang

Understanding business processes is an important step in software maintenance. Approaches for recovering business processes are mostly based on source code analysis, including static analysis and dynamic analysis. These methods have proven effective in specific situations; however, they struggle with enterprise legacy systems that implement complex business processes triggered by external actors. In this paper, we introduce a new business process recovery approach that combines requirement reacquisition with dynamic and static program analysis. The approach has been applied to the maintenance of an equity trading system to demonstrate its effectiveness.


IEEE Transactions on Services Computing | 2014

Reliability-Based Design Optimization for Cloud Migration

Weiwei Qiu; Zibin Zheng; Xinyu Wang; Xiaohu Yang; Michael R. Lyu

The on-demand use, high scalability, and low maintenance cost of cloud computing have attracted more and more enterprises to migrate their legacy applications to the cloud environment. Although the cloud platform itself promises high reliability, ensuring high quality of service remains a major concern, since enterprise applications are usually complicated and consist of a large number of distributed components. Thus, improving the reliability of an application during cloud migration is a challenging and critical research problem. To address this problem, we propose a reliability-based optimization framework, named ROCloud, to improve application reliability through fault tolerance. ROCloud includes two ranking algorithms. The first ranks components for applications whose components will all be migrated to the cloud. The second ranks components for hybrid applications in which only some components are migrated to the cloud. Both algorithms employ the application's structural information as well as historical reliability information for component ranking. Based on the ranking result, an optimal fault-tolerance strategy is selected automatically for the most significant components with respect to their predefined constraints. The experimental results show that by refactoring a small number of error-prone components and tolerating faults in the most significant components, the reliability of the application can be greatly improved.
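The "structure plus history" ranking can be sketched roughly as follows. This is a hypothetical simplification, not ROCloud's actual algorithm: the significance formula and the toy call graph are invented to show how structural information (how often a component is invoked) can weight historical failure rates.

```python
def rank_components(invocations, failure_rate):
    # invocations: caller -> list of callees (application structure);
    # failure_rate: component -> historical failure rate.
    in_degree = {c: 0 for c in failure_rate}
    for callees in invocations.values():
        for c in callees:
            in_degree[c] += 1
    total = sum(in_degree.values()) or 1
    # Weight each component's failure rate by how central it is in the
    # call structure; the top-ranked components are the candidates for
    # fault-tolerance strategies.
    significance = {c: failure_rate[c] * (1 + in_degree[c] / total)
                    for c in failure_rate}
    return sorted(significance, key=significance.get, reverse=True)

invocations = {"web": ["auth", "db"], "auth": ["db"], "batch": ["db"]}
failure_rate = {"web": 0.01, "auth": 0.02, "db": 0.02, "batch": 0.005}
ranking = rank_components(invocations, failure_rate)
```

Here `db` ranks first: it has the same raw failure rate as `auth`, but every other component depends on it, so tolerating its faults pays off most.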


conference on software maintenance and reengineering | 2013

A Comparative Study of Supervised Learning Algorithms for Re-opened Bug Prediction

Xin Xia; David Lo; Xinyu Wang; Xiaohu Yang; Shanping Li; Jianling Sun

Bug fixing is a time-consuming and costly job performed throughout the life cycle of software development and maintenance. For many systems, bugs are managed in bug management systems such as Bugzilla. Generally, the status of a typical bug report in Bugzilla changes from new to assigned, verified, and closed. However, some bugs have to be reopened. Reopened bugs increase software development and maintenance costs, increase the workload of bug fixers, and might even delay the future delivery of software. Only a few studies have investigated the phenomenon of reopened bug reports. In this paper, we evaluate the effectiveness of various supervised learning algorithms in predicting whether a bug report will be reopened. We choose 7 state-of-the-art classical supervised learning algorithms from the machine learning literature, i.e., kNN, SVM, Simple Logistic, Bayesian Network, Decision Table, CART, and LWL, and 3 ensemble learning algorithms, i.e., AdaBoost, Bagging, and Random Forest, and evaluate their performance in predicting reopened bug reports. The experimental results show that among the 10 algorithms, Bagging and Decision Table (IDTM) achieve the best performance. They achieve accuracy scores of 92.91% and 92.80%, respectively, and reopened-bug F-Measure scores of 0.735 and 0.732, respectively. These results improve the reopened-bug F-Measure of the state-of-the-art approaches proposed by Shihab et al. by up to 23.53%.
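Bagging, the best performer in the study, is simple enough to sketch with the standard library alone: train each base learner on a bootstrap sample of the data and predict by majority vote. The decision-stump base learner and the synthetic one-feature data below are illustrative, not the study's setup (which used Weka-style learners on real bug-report features).

```python
import random

def train_stump(sample):
    # Trivial base learner: threshold on the single feature at the
    # midpoint between the two class means of the bootstrap sample.
    pos = [x for x, y in sample if y] or [1.0]
    neg = [x for x, y in sample if not y] or [0.0]
    t = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: x > t

def bagging_fit(data, n_learners=11):
    # Each learner sees a bootstrap sample (drawn with replacement);
    # the ensemble predicts by majority vote.
    models = [train_stump(random.choices(data, k=len(data)))
              for _ in range(n_learners)]
    return lambda x: sum(m(x) for m in models) > n_learners / 2

random.seed(1)
# Synthetic "reopened" (1) vs "not reopened" (0) reports, one feature each.
data = ([(random.gauss(0.3, 0.1), 0) for _ in range(50)] +
        [(random.gauss(0.7, 0.1), 1) for _ in range(50)])
predict = bagging_fit(data)
```

The bootstrap resampling is what gives bagging its variance reduction: each stump sees a slightly different sample, and the vote smooths out their individual errors.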


Frontiers of Computer Science in China | 2014

Model checking with fairness assumptions using PAT

Yuanjie Si; Jun Sun; Yang Liu; Jin Song Dong; Jun Pang; Shao Jie Zhang; Xiaohu Yang

Recent developments in distributed systems have shown that a variety of fairness constraints (some of which were only recently defined) play vital roles in designing self-stabilizing population protocols. Existing model checkers are deficient in verifying such systems, as only limited kinds of fairness are supported, with limited verification efficiency. In this work, we support model checking of distributed systems in the toolkit PAT (Process Analysis Toolkit) under a variety of fairness constraints (e.g., process-level weak/strong fairness, event-level weak/strong fairness, and strong global fairness). PAT performs on-the-fly verification against linear temporal properties. We show through empirical evaluation (on recent population protocols as well as benchmark systems) that PAT has an advantage in model checking with fairness. Previously unknown bugs have been revealed in systems designed to function only under strong global fairness.


international conference on web services | 2013

Geographic Location-Based Network-Aware QoS Prediction for Service Composition

Yuanhong Shen; Jianke Zhu; Xinyu Wang; Liang Cai; Xiaohu Yang; Bo Zhou

QoS-aware service composition aims to maximize the global QoS of a composite service while selecting candidate services from different providers under local and global QoS constraints. With more and more candidate services emerging from all over the world, network delays often greatly impact the performance of the composite service, and they are usually difficult to collect before composition. One remedy is to predict them for the composition. However, new issues arise in predicting network delay for composition, including prediction accuracy and on-demand measurement of new services, both of which affect the performance of network-aware composite services. To address these challenges, we take advantage of the geographic location information of candidate services. We propose a network-aware QoS (NQoS) model for the composite service. Based on it, we present a novel geographic location-based NQoS prediction approach applied before composition, and an NQoS re-prediction approach applied during the execution of the composite service. Extensive experiments are conducted on a real-world dataset collected from PlanetLab. Comparative experimental results reveal that our approach improves the accuracy and predictability of NQoS values, and increases the global NQoS of the composite service while ensuring its reliability constraints.
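The use of geographic location for delay prediction can be illustrated with an assumed model (not the paper's): estimate the delay to a new, unmeasured service by weighting the delays to already-measured services by geographic proximity. The inverse-distance weighting and the sample coordinates are invented for demonstration.

```python
import math

def haversine_km(a, b):
    # Great-circle distance in km between two (lat, lon) points in degrees.
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2 +
         math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))

def predict_delay(target_loc, measured):
    # measured: list of ((lat, lon), delay_ms).
    # Inverse-distance weighting: geographically nearer services
    # contribute more to the predicted delay of the new service.
    weights = [(1.0 / (1.0 + haversine_km(target_loc, loc)), d)
               for loc, d in measured]
    total = sum(w for w, _ in weights)
    return sum(w * d for w, d in weights) / total

measured = [((52.52, 13.40), 20.0),     # Berlin
            ((48.85, 2.35), 25.0),      # Paris
            ((37.77, -122.42), 150.0)]  # San Francisco
delay = predict_delay((50.11, 8.68), measured)  # new service near Frankfurt
```

Because the Frankfurt target is far closer to Berlin and Paris than to San Francisco, the prediction lands near the European delays rather than the transatlantic one.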


conference on information and knowledge management | 2011

RW.KNN: a proposed random walk KNN algorithm for multi-label classification

Xin Xia; Xiaohu Yang; Shanping Li; Chao Wu; Linlin Zhou

Multi-label classification refers to the problem of predicting, for each single instance, one or more labels from a set of associated labels. It is common in many real-world applications such as text categorization, functional genomics, and semantic scene classification. The main challenge in multi-label classification is predicting the labels of a new instance given the exponential number of possible label sets. Previous works mainly focus on transforming multi-label classification into single-label classification or on modifying existing traditional algorithms. In this paper, a novel algorithm that combines the advantages of the well-known KNN and random walk algorithms (RW.KNN) is proposed. A KNN-based link graph is built from the k-nearest neighbors of each instance. For an unseen instance, a random walk is performed on the link graph, and the final probability is computed from the random walk results. Finally, a novel algorithm based on minimizing Hamming loss is proposed to select the classification threshold.
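The graph-walk core can be sketched as follows. This is an illustrative reading of the abstract, not the paper's exact formulation: the restart probability, the seeding rule, and the 1-D toy instances are assumptions made for demonstration.

```python
def knn(idx, points, k):
    # Indices of the k nearest training instances to points[idx].
    dists = sorted((abs(points[j] - points[idx]), j)
                   for j in range(len(points)) if j != idx)
    return [j for _, j in dists[:k]]

def rwknn_scores(points, labelsets, query, k=2, steps=20, restart=0.15):
    n = len(points)
    graph = {i: knn(i, points, k) for i in range(n)}  # KNN link graph
    # Seed the walk at the query's k nearest training instances.
    seeds = sorted(range(n), key=lambda j: abs(points[j] - query))[:k]
    prob = [1.0 / k if i in seeds else 0.0 for i in range(n)]
    for _ in range(steps):
        # Random walk with restart over the link graph.
        nxt = [restart * (1.0 / k if i in seeds else 0.0) for i in range(n)]
        for i in range(n):
            for j in graph[i]:
                nxt[j] += (1 - restart) * prob[i] / len(graph[i])
        prob = nxt
    # Score each label by the total visit probability of nodes carrying it;
    # a threshold (tuned by minimizing Hamming loss in the paper) would
    # then decide which labels to emit.
    scores = {}
    for i, p in enumerate(prob):
        for label in labelsets[i]:
            scores[label] = scores.get(label, 0.0) + p
    return scores

points = [0.1, 0.15, 0.2, 0.8, 0.85]
labelsets = [{"a"}, {"a", "b"}, {"a"}, {"c"}, {"c"}]
scores = rwknn_scores(points, labelsets, query=0.12)
```

For the query near the left cluster, the walk's probability mass stays on the "a"/"b" nodes, so those labels dominate the score.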

Collaboration


Dive into Xiaohu Yang's collaborations.

Top Co-Authors

David Lo

Singapore Management University


Zibin Zheng

The Chinese University of Hong Kong
