Xiaozhen Xue
Texas Tech University
Publication
Featured research published by Xiaozhen Xue.
Computer Software and Applications Conference | 2014
Xiaozhen Xue; Yulei Pang; Akbar Siami Namin
Although empirical studies have demonstrated the usefulness of statistical fault localization based on code coverage, the effectiveness of these techniques may deteriorate under certain undesired circumstances, notably coincidental correctness, where one or more passing test cases exercise a faulty statement, making it difficult to decide whether the exercised statement is faulty. Coverage-based fault localization can be improved if all instances of coincidental correctness are identified and proper strategies are employed to deal with these troublesome test cases. We introduce a technique to effectively identify coincidentally correct test cases. The proposed technique combines support vector machines and ensemble learning to detect mislabeled test cases, i.e., coincidentally correct test cases. The ensemble-based support vector machine can then be used to trim the test suite or flip the status of the coincidentally correct test cases, thereby improving the effectiveness of fault localization.
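A minimal sketch of the ensemble idea described above, assuming each test case is represented by a statement-coverage vector with a pass/fail label; the coverage representation, bagging parameters, and voting threshold are illustrative assumptions rather than the authors' exact configuration:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.utils import resample

def flag_coincidentally_correct(coverage, labels, n_estimators=25, threshold=0.6):
    """Flag passing test cases that an ensemble of SVMs consistently
    classifies as failing (candidate coincidentally correct tests).

    coverage : (n_tests, n_statements) 0/1 matrix of statement coverage
    labels   : (n_tests,) array, 1 = failing test, 0 = passing test
    """
    coverage = np.asarray(coverage, dtype=float)
    labels = np.asarray(labels)
    votes = np.zeros(len(labels))
    for seed in range(n_estimators):
        # Train each base SVM on a stratified bootstrap sample of the suite.
        X_boot, y_boot = resample(coverage, labels,
                                  stratify=labels, random_state=seed)
        clf = SVC(kernel="rbf", gamma="scale").fit(X_boot, y_boot)
        votes += clf.predict(coverage)
    fail_ratio = votes / n_estimators
    # A passing test that most base learners consider "failing" is suspicious;
    # flagged tests can be trimmed or have their status flipped before
    # computing coverage-based suspiciousness scores.
    return np.where((labels == 0) & (fail_ratio >= threshold))[0]
```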
International Conference on Machine Learning and Applications | 2014
Yulei Pang; Feiya Xiao; Huaying Wang; Xiaozhen Xue
Group work is widely used in tertiary institutions because of the considerable advantages of collaborative learning, and previous studies indicate that group diversity has a positive influence on group achievement. How to achieve diversity within groups effectively and automatically is therefore an interesting question. In this paper we propose a novel clustering-based grouping model. The proposed technique first employs the balanced K-means algorithm to divide the students into several size-balanced clusters, such that students within the same cluster are more similar (in some sense) to each other than to those in other clusters, and then adopts a one-sample-each-cluster strategy to construct the groups. We evaluated the proposed technique in two small-scale case studies; the observed results suggest that the clustering-based grouping model is feasible and effective.
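A minimal sketch of the grouping pipeline, with plain k-means standing in for the balanced K-means variant used in the paper; the feature representation and cluster count are assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

def form_groups(features, n_clusters, random_state=0):
    """Cluster students on their feature vectors, then build diverse groups
    by drawing one student from each cluster per round (one-sample-each-cluster).

    features : (n_students, n_features) array of student attributes
    Returns a list of groups, each a list of student indices.
    """
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=random_state).fit_predict(features)
    # Bucket student indices by cluster.
    buckets = [list(np.where(labels == c)[0]) for c in range(n_clusters)]
    groups = []
    while any(buckets):
        # Each group takes at most one member from every non-empty cluster,
        # so members of a group come from different (dissimilar) clusters.
        groups.append([bucket.pop() for bucket in buckets if bucket])
    return groups
```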
International Conference on Machine Learning and Applications | 2013
Yulei Pang; Xiaozhen Xue; Akbar Siami Namin
Testing is the most time-consuming and expensive process in the software development life cycle. In order to reduce the cost of regression testing, we propose a test case classification methodology based on k-means clustering that classifies test cases into two groups: effective and non-effective. The clustering strategy is based on Hamming distances measured over the differences between the coverage information obtained for the current and previous releases of the program under test. Our empirical study shows that the clustering-based test case classification can identify effective test cases with high recall and considerable accuracy. The paper also investigates and compares the performance of the proposed clustering-based approach across various factors, including the coverage criteria and the weighting factor used in measuring distances.
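A minimal sketch of the clustering step under stated assumptions (binary statement-coverage matrices for both releases and an optional per-statement weight vector); the choice of k = 2 follows the effective/non-effective split described above, while the rest is illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

def classify_tests(cov_prev, cov_curr, weights=None):
    """Split a regression test suite into two clusters based on how much
    each test's coverage changed between releases.

    cov_prev, cov_curr : (n_tests, n_statements) 0/1 coverage matrices for
                         the previous and current release (same test order).
    weights            : optional per-statement weights applied to the diff.
    """
    prev = np.asarray(cov_prev, dtype=float)
    curr = np.asarray(cov_curr, dtype=float)
    # Per-test coverage difference; its (weighted) sum is a (weighted)
    # Hamming distance between the two releases for that test.
    diff = np.abs(curr - prev)
    if weights is not None:
        diff *= weights
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(diff)
    # Treat the cluster whose tests touch more changed statements as "effective".
    effective = int(np.argmax(
        [diff[labels == c].sum(axis=1).mean() for c in (0, 1)]))
    return labels == effective   # boolean mask of effective tests
```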
Empirical Software Engineering and Measurement | 2013
Xiaozhen Xue; Akbar Siami Namin
The effectiveness of coverage-based fault localization in the presence of multiple faults has been a major concern for the software testing research community. A commonly held belief is that fault localization techniques based on coverage statistics are less effective in the presence of multiple faults and that their performance deteriorates. The fault interference phenomenon refers to cases where the software under test contains multiple faults whose interactions hinder effective debugging. The immediate research question is to what extent these fault interactions are influential. This paper focuses on verifying the existence of the fault interference phenomenon in programs developed in programming languages with object-oriented features. The paper then statistically measures the influence and significance of fault interactions on the performance of coverage-based fault localization. The results verify that fault interference does occur; however, its impact on the performance of fault localization is negligible.
International Conference on Machine Learning and Applications | 2015
Yulei Pang; Xiaozhen Xue; Akbar Siami Namin
Vulnerabilities need to be detected and removed from software. Although previous studies have demonstrated the usefulness of prediction techniques in deciding about the vulnerability of software components, improving the accuracy and effectiveness of these prediction techniques remains a challenging research question. This paper proposes a hybrid technique that combines N-gram analysis and feature selection algorithms for predicting vulnerable software components, where features are defined as contiguous sequences of tokens in source code files, i.e., Java class files. Machine learning-based feature selection algorithms are then employed to reduce the feature and search space. We evaluated the proposed technique on several Java Android applications, and the results demonstrate that it can predict vulnerable classes, i.e., software components, with high precision, accuracy, and recall.
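A minimal sketch of an n-gram-plus-feature-selection pipeline using scikit-learn; the n-gram range, the number of selected features, and the linear SVM classifier are assumptions, not the paper's exact setup:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

def build_vulnerability_predictor(ngram_range=(1, 3), k_best=500):
    """Pipeline: token n-grams over source text -> statistical feature
    selection -> linear classifier predicting vulnerable / not vulnerable."""
    return Pipeline([
        # Treat each class file's token stream as a document; extract
        # contiguous token sequences (n-grams) as features.
        ("ngrams", CountVectorizer(ngram_range=ngram_range,
                                   token_pattern=r"\S+")),
        # Keep only the k n-grams most associated with the vulnerability label.
        ("select", SelectKBest(chi2, k=k_best)),
        ("clf", LinearSVC()),
    ])

# Usage (hypothetical data): `sources` is a list of tokenized class-file
# strings and `labels` marks classes with a known vulnerability.
# model = build_vulnerability_predictor().fit(sources, labels)
```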
International Conference on Machine Learning and Applications | 2014
Xiaozhen Xue; Yulei Pang; Akbar Siami Namin
Due to the complex causality of failures and the special characteristics of their test cases, faults in GUI (Graphical User Interface) applications are difficult to localize. This paper adapts feature selection algorithms to localize GUI-related faults in a given program, where features are defined as subsequences of the events executed. By employing statistical feature ranking techniques, events can be ranked by their suspiciousness of being responsible for the faulty behavior. The source code implementing the underlying events (i.e., the event handlers) is then ranked in order of suspiciousness. An evaluation of the proposed technique on several open source Java projects verified the effectiveness of this feature selection-based fault localization technique for GUI applications.
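A minimal sketch of ranking event subsequences by a statistical feature score, assuming each test case yields a sequence of executed event IDs and a pass/fail outcome; the chi-square score and the maximum subsequence length are illustrative choices:

```python
from collections import defaultdict
import numpy as np
from sklearn.feature_selection import chi2

def rank_event_subsequences(event_traces, outcomes, max_len=3):
    """Rank subsequences of GUI events by their association with failure.

    event_traces : list of event-ID sequences, one per test case
    outcomes     : list of 0/1 outcomes (1 = failing test)
    """
    # Build a binary occurrence matrix: test case x event subsequence.
    index = defaultdict(lambda: len(index))
    rows = []
    for trace in event_traces:
        present = set()
        for n in range(1, max_len + 1):
            for i in range(len(trace) - n + 1):
                present.add(index[tuple(trace[i:i + n])])
        rows.append(present)
    X = np.zeros((len(event_traces), len(index)))
    for r, present in enumerate(rows):
        X[r, list(present)] = 1
    # chi-square score: how strongly each subsequence co-occurs with failure.
    scores, _ = chi2(X, np.asarray(outcomes))
    subsequences = {v: k for k, v in index.items()}
    order = np.argsort(scores)[::-1]
    return [(subsequences[i], scores[i]) for i in order]
```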
International Conference on Reliable Software Technologies | 2013
Xiaozhen Xue; Akbar Siami Namin
The statistics captured while testing a faulty program are the primary source of information for effective fault localization. A typical ranking metric estimates the suspiciousness of executable statements and ranks them according to the estimated scores. Coverage-based ranking schemes, such as the metrics used in Tarantula and the Ochiai score, utilize the execution profile of each test case, including code coverage and the statistics associated with the numbers of failing and passing test cases. Although coverage-based fault localization metrics can be extended to hypothesis testing, and in particular to the chi-square test associated with crosstabs, also known as contingency tables, not all contingency table association metrics have been explored and studied.
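For reference, a small sketch of per-statement suspiciousness scores built from the 2x2 crosstab of coverage versus test outcome, covering the Tarantula, Ochiai, and chi-square formulations mentioned above (variable names are conventional, not taken from the paper):

```python
import math

def suspiciousness(n_cf, n_uf, n_cs, n_us):
    """Suspiciousness scores for one statement from its 2x2 contingency table.

    n_cf / n_cs : failing / passing tests that cover the statement
    n_uf / n_us : failing / passing tests that do not cover the statement
    """
    total_f, total_s = n_cf + n_uf, n_cs + n_us
    tarantula = ((n_cf / total_f) /
                 ((n_cf / total_f) + (n_cs / total_s))
                 if total_f and total_s and (n_cf or n_cs) else 0.0)
    ochiai = (n_cf / math.sqrt(total_f * (n_cf + n_cs))
              if total_f and (n_cf + n_cs) else 0.0)
    # Chi-square statistic of the same 2x2 crosstab (coverage vs. outcome).
    n = total_f + total_s
    cov, unc = n_cf + n_cs, n_uf + n_us
    chi_sq = 0.0
    if n:
        for obs, row, col in [(n_cf, cov, total_f), (n_cs, cov, total_s),
                              (n_uf, unc, total_f), (n_us, unc, total_s)]:
            exp = row * col / n
            if exp:
                chi_sq += (obs - exp) ** 2 / exp
    return {"tarantula": tarantula, "ochiai": ochiai, "chi_square": chi_sq}
```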
Proceedings of the 2017 International Conference on Deep Learning Technologies | 2017
Yulei Pang; Xiaozhen Xue; Huaying Wang
Vulnerabilities need to be detected and removed from software. Although previous studies have demonstrated the usefulness of prediction techniques in deciding about the vulnerability of software components, improving the effectiveness of these prediction techniques remains a challenging research question. This paper employs a technique based on a deep neural network with rectified linear units, trained with stochastic gradient descent and batch normalization, for predicting vulnerable software components. The features are defined as contiguous sequences of tokens in source code files. In addition, a statistical feature selection algorithm is employed to reduce the feature and search space. We evaluated the proposed technique on several Java Android applications, and the results demonstrate that it can predict vulnerable classes, i.e., software components, with high precision, accuracy, and recall.
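A minimal PyTorch sketch of such a network (ReLU units, batch normalization, mini-batch SGD); the layer widths, learning rate, and batch size are assumptions:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def build_model(n_features, hidden=(256, 64)):
    """Feed-forward network with batch normalization and ReLU units,
    emitting a single logit for the "vulnerable" class."""
    layers, width = [], n_features
    for h in hidden:
        layers += [nn.Linear(width, h), nn.BatchNorm1d(h), nn.ReLU()]
        width = h
    layers.append(nn.Linear(width, 1))
    return nn.Sequential(*layers)

def train(model, X, y, epochs=50, lr=0.01):
    """Mini-batch SGD training; X is a float tensor of selected n-gram
    counts, y a float tensor of 0/1 vulnerability labels."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = nn.BCEWithLogitsLoss()
    loader = DataLoader(TensorDataset(X, y), batch_size=32,
                        shuffle=True, drop_last=True)
    model.train()
    for _ in range(epochs):
        for xb, yb in loader:
            opt.zero_grad()
            loss = loss_fn(model(xb).squeeze(1), yb)
            loss.backward()
            opt.step()
    return model
```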
International Journal of Software Engineering and Knowledge Engineering | 2017
Yulei Pang; Xiaozhen Xue; Akbar Siami Namin
We introduce a novel application of feature ranking methods to the fault localization problem. We envision the problem of localizing the causes of failures as an instance of ranking a program's elements, where the elements are conceptualized as features. In this paper, we define features as a program's statements; however, in its fine-grained definition, the idea of program features can refer to any trait of a program. This paper proposes feature ranking-based algorithms. The algorithms analyze the execution traces of both passing and failing test cases and extract bug signatures from the failing test cases. The proposed procedure extracts possible combinations of program elements that are executed together from the bug signatures. The feature ranking-based algorithms then order statements according to the suspiciousness of the combinations. When viewed as sequences, the combinations of program elements produced and traced in bug signatures can be utilized to reason about the longest common subsequence. The longest common...
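A minimal sketch of the longest-common-subsequence reasoning, assuming bug signatures are given as sequences of statement IDs from failing runs; this representation is a simplification of the paper's bug signatures:

```python
from functools import reduce

def lcs(a, b):
    """Longest common subsequence of two statement-ID sequences
    (classic dynamic programming)."""
    dp = [[()] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + (x,)
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1], key=len)
    return list(dp[-1][-1])

def common_signature(failing_traces):
    """Statements shared, in order, by every failing execution trace:
    a simple stand-in for the common bug signature reasoned about above."""
    return reduce(lcs, failing_traces)

# Statements in the common signature that never appear in passing traces
# are prime candidates to be ranked as most suspicious.
```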
International Conference on Machine Learning and Applications | 2016
Yulei Pang; Xiaozhen Xue; Akbar Siami Namin
Software components that are vulnerable to exploitation need to be identified and patched. Employing prevention techniques designed to detect vulnerable software components in early stages can significantly reduce the expenses associated with the software testing process and thus help build a more reliable and robust software system. Although previous studies have demonstrated the effectiveness of adapting prediction techniques to vulnerability detection, the feasibility of those techniques is limited mainly by insufficient training data sets. This paper proposes a prediction technique targeting the early identification of potentially vulnerable software components. In the proposed scheme, potentially vulnerable components are viewed as mislabeled data that may contain true but not yet observed vulnerabilities. The proposed hybrid technique combines the support vector machine algorithm and an ensemble learning strategy to better identify potentially vulnerable components. The proposed vulnerability detection scheme is evaluated using several Java Android applications. The results demonstrate that the proposed hybrid technique can identify potentially vulnerable classes with high precision and relatively acceptable accuracy and recall.
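A minimal sketch of the hybrid idea using a bagged ensemble of SVMs over component feature vectors, treating "clean" components that the ensemble consistently votes vulnerable as possibly mislabeled; the feature matrix, number of estimators, and voting threshold are assumptions:

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC

def flag_potentially_vulnerable(X, y, n_estimators=30, threshold=0.7):
    """Flag components labeled 'not vulnerable' that a bagged ensemble of
    SVMs consistently predicts as vulnerable, i.e., possibly mislabeled
    components hiding true but not yet observed vulnerabilities.

    X : (n_components, n_features) feature matrix, e.g., selected n-gram counts
    y : (n_components,) labels, 1 = known vulnerable, 0 = assumed clean
    """
    y = np.asarray(y)
    ensemble = BaggingClassifier(SVC(kernel="linear"),
                                 n_estimators=n_estimators,
                                 random_state=0).fit(X, y)
    # Fraction of base SVMs voting "vulnerable" for each component.
    votes = np.mean([est.predict(X) for est in ensemble.estimators_], axis=0)
    return np.where((y == 0) & (votes >= threshold))[0]
```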