Shanping Li | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Shanping Li is active.

Explore More

Publication

Featured researches published by Shanping Li.

automated software engineering | 2016

Predicting semantically linkable knowledge in developer online forums via convolutional neural network

Bowen Xu; Deheng Ye; Zhenchang Xing; Xin Xia; Guibin Chen; Shanping Li

Consider a question and its answers in Stack Overflow as a knowledge unit. Knowledge units often contain semantically relevant knowledge, and thus linkable for different purposes, such as duplicate questions, directly linkable for problem solving, indirectly linkable for related information. Recognising different classes of linkable knowledge would support more targeted information needs when users search or explore the knowledge base. Existing methods focus on binary relatedness (i.e., related or not), and are not robust to recognize different classes of semantic relatedness when linkable knowledge units share few words in common (i.e., have lexical gap). In this paper, we formulate the problem of predicting semantically linkable knowledge units as a multiclass classification problem, and solve the problem using deep learning techniques. To overcome the lexical gap issue, we adopt neural language model (word embeddings) and convolutional neural network (CNN) to capture word- and document-level semantics of knowledge units. Instead of using human-engineered classifier features which are hard to design for informal user-generated content, we exploit large amounts of different types of user-created knowledge-unit links to train the CNN to learn the most informative wordlevel and document-level features for the multiclass classification task. Our evaluation shows that our deep-learning based approach significantly and consistently outperforms traditional methods using traditional word representations and human-engineered classifier features.

pacific asia workshop on intelligence and security informatics | 2012

Information credibility on twitter in emergency situation

Xin Xia; Xiaohu Yang; Chao Wu; Shanping Li; Linfeng Bao

Twitter has shown its greatest power of influence for its fast information diffusion. Previous research has shown that most of the tweets posted are truthful, but as some people post the rumors and spams on Twitter in emergence situation, the direction of public opinion can be misled and even the riots are caused. In this paper, we focus on the methods for the information credibility in emergency situation. More precisely, we build a novel Twitter monitor model to monitoring Twitter online. Within the novel monitor model, an unsupervised learning algorithm is proposed to detect the emergency situation. A collection of training dataset which includes the tweets of typical events is gathered through the Twitter monitor. Then we manually dispatch the dataset to experts who label each tweet into two classes: credibility or incredibility. With the classified tweets, a number of features related to the user social behavior, the tweet content, the tweet topic and the tweet diffusion are extracted. A supervised method using learning Bayesian Network is used to predict the tweets credibility in emergency situation. Experiments with the tweets of UK Riots related topics show that our procedure achieves good performance to classify the tweets compared with other state-of-art algorithms.

conference on software maintenance and reengineering | 2013

A Comparative Study of Supervised Learning Algorithms for Re-opened Bug Prediction

Xin Xia; David Lo; Xinyu Wang; Xiaohu Yang; Shanping Li; Jianling Sun

Bug fixing is a time-consuming and costly job which is performed in the whole life cycle of software development and maintenance. For many systems, bugs are managed in bug management systems such as Bugzilla. Generally, the status of a typical bug report in Bugzilla changes from new to assigned, verified and closed. However, some bugs have to be reopened. Reopened bugs increase the software development and maintenance cost, increase the workload of bug fixers, and might even delay the future delivery of a software. Only a few studies investigate the phenomenon of reopened bug reports. In this paper, we evaluate the effectiveness of various supervised learning algorithms to predict if a bug report would be reopened. We choose 7 state-of-the-art classical supervised learning algorithm in machine learning literature, i.e., kNN, SVM, Simple Logistic, Bayesian Network, Decision Table, CART and LWL, and 3 ensemble learning algorithms, i.e., AdaBoost, Bagging and Random Forest, and evaluate their performance in predicting reopened bug reports. The experiment results show that among the 10 algorithms, Bagging and Decision Table (IDTM) achieve the best performance. They achieve accuracy scores of 92.91% and 92.80%, respectively, and reopened bug reports F-Measure scores of 0.735 and 0.732, respectively. These results improve the reopened bug reports F-Measure of the state-of-the-art approaches proposed by Shihab et al. by up to 23.53%.

asia pacific symposium on internetware | 2010

A QoS ontology cooperated with feature models for non-functional requirements elicitation

Ting Wang; Yuanjie Si; Xiao Xuan; Xinyu Wang; Xiaohu Yang; Shanping Li; Aleksander J. Kavs

Non-functional requirements (NFRs) are often regarded as the key success factor in building high quality software. However, most of the requirements elicitation methods are centered on discovering functional requirements only. This paper presents a novel NFRs elicitation approach aiming at empowering requirements analysts with a knowledge repository that aids to the process of capturing precise NFRs during elicitation interviews. The knowledge repository is composed of two layers: the upper layer of feature models and the lower layer of the QoS ontology. The case study of the stock trading domain illustrates the relationships and cooperations of the two layers.

tools and algorithms for construction and analysis of systems | 2014

Are Timed Automata Bad for a Specification Language? Language Inclusion Checking for Timed Automata

Ting Wang; Jun Sun; Yang Liu; Xinyu Wang; Shanping Li

Given a timed automaton \(\cal P\) modeling an implementation and a timed automaton \(\cal S\) as a specification, language inclusion checking is to decide whether the language of \(\cal P\) is a subset of that of \(\cal S\). It is known that this problem is undecidable and “this result is an obstacle in using timed automata as a specification language” [2]. This undecidability result, however, does not imply that all timed automata are bad for specification. In this work, we propose a zone-based semi-algorithm for language inclusion checking, which implements simulation reduction based on Anti-Chain and LU-simulation. Though it is not guaranteed to terminate, we show that it does in many cases through both theoretical and empirical analysis. The semi-algorithm has been incorporated into the PAT model checker, and applied to multiple systems to show its usefulness and scalability.

acm symposium on applied computing | 2013

GCplace: geo-cloud based correlation aware data replica placement

Zhen Ye; Shanping Li; Xiaozhen Zhou

Cross datacenter data replication has been widely used in geo-cloud environment due to its ability to increase applications availability and improve the performance. However, with the large scale of cloud, it is difficult to determine the location of replicas among datacenters in order to minimize overall user access latency. The data correlation between each other makes replica placement problem more complex. To address these large scale and data correlation issues, we propose a two-step approach called GCplace. Before applying GCplace, a network coordinate system is used to predict the latency between all users and datacenter nodes. In the first step of GCplace, we introduce a stream based similarity clustering, which uses a small number of micro clusters to represent huge number of users and thus significantly reducing the cost of replica placement algorithm. In the second step, an iterative algorithm is proposed to get an approximation solution. We evaluated our approach by using a large scale real network latency dataset. Comprehensive experiments show that GCplace can reduce average user access latency significantly.

conference on information and knowledge management | 2011

RW.KNN: a proposed random walk KNN algorithm for multi-label classification

Xin Xia; Xiaohu Yang; Shanping Li; Chao Wu; Linlin Zhou

Multi-label classification refers to the problem that predicts each single instance to be one or more labels in a set of associated labels. It is common in many real-world applications such as text categorization, functional genomics and semantic scene classification. The main challenge for multi-label classification is predicting the labels of a new instance with the exponential number of possible label sets. Previous works mainly pay attention to transforming the multi-label classification to be single-label classification or modifying the existing traditional algorithm. In this paper, a novel algorithm which combines the advantage of the famous KNN and Random Walk algorithm (RW.KNN) is proposed. The KNN based link graph is built with the k-nearest neighbors for each instance. For an unseen instance, a random walk is performed in the link graph. The final probability is computed according to the random walk results. Lastly, a novel algorithm based on minimizing Hamming Loss to select the classification threshold is also proposed in this paper.

Empirical Software Engineering | 2018

Identifying self-admitted technical debt in open source projects using text mining

Qiao Huang; Emad Shihab; Xin Xia; David Lo; Shanping Li

Technical debt is a metaphor to describe the situation in which long-term code quality is traded for short-term goals in software projects. Recently, the concept of self-admitted technical debt (SATD) was proposed, which considers debt that is intentionally introduced, e.g., in the form of quick or temporary fixes. Prior work on SATD has shown that source code comments can be used to successfully detect SATD, however, most current state-of-the-art classification approaches of SATD rely on manual inspection of the source code comments. In this paper, we proposed an automated approach to detect SATD in source code comments using text mining. In our approach, we utilize feature selection to select useful features for classifier training, and we combine multiple classifiers from different source projects to build a composite classifier that identifies SATD comments in a target project. We investigate the performance of our approach on 8 open source projects that contain 212,413 comments. Our experimental results show that, on every target project, our approach outperforms the state-of-the-art and the baselines approaches in terms of F1-score. The F1-score achieved by our approach ranges between 0.518 - 0.841, with an average of 0.737, which improves over the state-of-the-art approach proposed by Potdar and Shihab by 499.19%. When compared with the text mining-based baseline approaches, our approach significantly improves the average F1-score by at least 58.49%. When compared with a natural language processing-based baseline, our approach also significantly improves its F1-score by 27.95%. Our proposed approach can be used by project personnel to effectively identify SATD with minimal manual effort.

international conference on engineering of complex computer systems | 2015

Combining Software Metrics and Text Features for Vulnerable File Prediction

Yun Zhang; David Lo; Xin Xia; Bowen Xu; Jianling Sun; Shanping Li

In recent years, to help developers reduce time and effort required to build highly secure software, a number of prediction models which are built on different kinds of features have been proposed to identify vulnerable source code files. In this paper, we propose a novel approach VULPREDICTOR to predict vulnerable files, it analyzes software metrics and text mining together to build a composite prediction model. VULPREDICTOR first builds 6 underlying classifiers on a training set of vulnerable and non-vulnerable files represented by their software metrics and text features, and then constructs a meta classifier to process the outputs of the 6 underlying classifiers. We evaluate our solution on datasets from three web applications including Drupal, PHPMyAdmin and Moodle which contain a total of 3,466 files and 223 vulnerabilities. The experiment results show that VULPREDICTOR can achieve F1 and EffectivenessRatio@20% scores of up to 0.683 and 75%, respectively. On average across the 3 projects, VULPREDICTOR improves the F1 and EffectivenessRatio@20% scores of the best performing state-of-the-art approaches proposed by Walden et al. by 46.53% and 14.93%, respectively.

acm symposium on applied computing | 2014

An empirical study of bugs in build process

Xiaoqiong Zhao; Xin Xia; Pavneet Singh Kochhar; David Lo; Shanping Li

Software build process translates source codes into executable programs, packages the programs, generates documents, and distributes products. In this paper, we perform an empirical study to characterize build process bugs. We analyze bugs in build process in 5 open-source systems under Apache namely CXF, Camel, Felix, Struts, and Tuscany. We compare build process bugs and other bugs across 3 different dimensions, i.e., bug severity, bug fix time, and the number of files modified to fix a bug. Our results show that the fraction of build process bugs which are above major severity level is lower than that of other bugs. However, the time effort required to fix a build process bug is around 2.03 times more than that of a non-build process bug, and the number of source files modified to fix a build process bug is around 2.34 times more than that modified for a non-build bug.

Explore More