Xiaofei Xu | Researchain

Publication

Featured researches published by Xiaofei Xu.

Database Systems for Advanced Applications | 2017

Memory-Enhanced Latent Semantic Model: Short Text Understanding for Sentiment Analysis

Fei Hu; Xiaofei Xu; Jingyuan Wang; Zhanbo Yang; Li Li

Short texts, such as tweets and reviews, are difficult to process with conventional methods because of their short length, irregular syntax and lack of statistical signals. Term dependencies can be used to alleviate the problem and to mine the latent semantics hidden in short texts. Long Short-Term Memory networks (LSTMs) can capture and remember term dependencies over long distances, and they have been widely used to mine the semantics of short texts. At the same time, by analyzing the text, we find that a number of key words contribute greatly to the semantics of the text. In this paper, we propose an LSTM-based model (MLSM) to enhance the memory of the key words in the short text. The proposed model is evaluated on two datasets: IMDB and SemEval-2016. Experimental results demonstrate that the proposed method is effective, with significant performance enhancement over the baseline LSTM and several other latent semantic models.

International Conference on Big Data and Cloud Computing | 2014

An Improved Latent Dirichlet Allocation Model for Hot Topic Extraction

Guolong Liu; Xiaofei Xu; Ying Zhu; Li Li

Microblogging is fast becoming a dominant medium in social media, and its impact is evident in our daily lives. A massive amount of information is produced on a daily basis. Detecting hot topics can help people get essential information quickly. However, because of short and sparse features, a flood of meaningless tweets and other characteristics of microblogs, traditional topic detection methods are unable to achieve a desirable level of performance. In this paper, we propose a multi-attribute latent Dirichlet allocation (MA-LDA) model, a topic analysis model in which the time and tag attributes of microblogs are incorporated into the LDA model. By introducing a time variable for the time attribute, the MA-LDA model can decide whether a word should appear in hot topics or not. Applying the tag attribute allows the MA-LDA model to rank the core words high in the results, so that the expressiveness of the outcomes is improved over the traditional LDA model. Empirical evaluation on real data sets demonstrates that our method is able to detect hot topics accurately and efficiently, with more terms found for each hot topic. Our study provides strong evidence of the importance of the temporal factor in hot topic extraction.

Web Age Information Management | 2017

Efficient Stance Detection with Latent Feature

Xiaofei Xu; Fei Hu; Peiwen Du; Jingyuan Wang; Li Li

Social platforms, such as Twitter, are becoming more and more popular. However, it is hard to identify the sentimental stance expressed on such social media. In this paper, an approach is proposed to identify the stance of an opinion. Digging out the latent factors of the given roughly processed information is essential because it has the potential to reveal different aspects of the known information, which eventually contributes to the advancement of stance analysis. We take a very large number of articles from Chinese Wikipedia as the corpus. The latent feature vectors are generated by word2vec. The HowNet sentiment dictionary (with positive and negative words) is applied to divide the items in the corpus into two parts. The two parts with sentiment polarity are used as the training set for an SVM model. Experimentation on the NLPCC 2016 Stance Detection dataset demonstrates that the proposed approach can outperform the baselines by about 10% in terms of precision.
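The lexicon-based split described in this abstract can be illustrated with a minimal sketch. It assumes that the "items" being divided are words and that each word already has a latent feature vector; the abstract does not confirm either point. The function name `build_training_set` and the toy data are hypothetical, and the real pipeline would obtain the vectors from word2vec and pass the result to an SVM.

```python
def build_training_set(embeddings, positive_words, negative_words):
    """Pair each word vector with a polarity label taken from a sentiment lexicon.

    embeddings: dict mapping word -> feature vector (hypothetical stand-in
                for vectors produced by word2vec).
    positive_words / negative_words: sets of lexicon entries (hypothetical
                stand-in for the HowNet sentiment dictionary).
    Words found in neither set are skipped. A word found in both sets is
    treated as positive, which is an arbitrary choice made for this sketch.
    """
    features, labels = [], []
    for word, vector in embeddings.items():
        if word in positive_words:
            features.append(vector)
            labels.append(1)
        elif word in negative_words:
            features.append(vector)
            labels.append(-1)
    return features, labels


# Toy usage: two lexicon words become training samples, the neutral word is dropped.
toy_embeddings = {"good": [0.9, 0.1], "bad": [-0.8, 0.2], "table": [0.0, 0.0]}
X, y = build_training_set(toy_embeddings, {"good"}, {"bad"})
```

The resulting `X` and `y` have the shape that a binary classifier such as an SVM expects, which is the only point this sketch is meant to show.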

Mathematical Problems in Engineering | 2017

Batch Image Encryption Using Generated Deep Features Based on Stacked Autoencoder Network

Fei Hu; Jingyuan Wang; Xiaofei Xu; Changjiu Pu; Tao Peng

Chaos-based algorithms have been widely adopted to encrypt images. However, previous chaos-based encryption schemes are not secure enough for batch image encryption, because the images are usually encrypted using a single sequence: once one encrypted image is cracked, all the others become vulnerable. In this paper, we propose a batch image encryption scheme in which a stacked autoencoder (SAE) network is introduced to generate two chaotic matrices. One is used to produce a total shuffling matrix that shuffles the pixel positions of each plain image, and the other produces a series of independent sequences, each of which is used to confuse the relationship between the permuted image and the encrypted image. The scheme is efficient because of the parallel computing advantages of the SAE, which lead to a significant reduction in run-time complexity; in addition, the hybrid application of shuffling and confusing enhances the encryption effect. To evaluate the efficiency of our scheme, we compared it with the prevalent "logistic map," and our scheme outperformed it in running time. The experimental results and analysis show that our scheme has a good encryption effect and is able to resist brute-force, statistical, and differential attacks.
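The two stages named in this abstract, shuffling pixel positions and then confusing pixel values with a per-image sequence, can be sketched as follows. This is not the paper's scheme: the chaotic values here come from the logistic map (the baseline the abstract mentions) instead of a stacked autoencoder, the confusion step is assumed to be a byte-wise XOR, and all function names and parameter values are hypothetical.

```python
def logistic_sequence(x0, r, n):
    """Return n successive values of the logistic map x -> r * x * (1 - x)."""
    values = []
    x = x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        values.append(x)
    return values


def permutation_from(sequence):
    """Rank the chaotic values to obtain a permutation of pixel positions."""
    return sorted(range(len(sequence)), key=lambda i: sequence[i])


def keystream_from(sequence):
    """Quantize chaotic values in (0, 1) to bytes in 0..255."""
    return [int(x * 256) % 256 for x in sequence]


def encrypt(pixels, perm, key):
    """Shuffle pixel positions with perm, then XOR each value with the key."""
    shuffled = [pixels[p] for p in perm]
    return [value ^ k for value, k in zip(shuffled, key)]


def decrypt(cipher, perm, key):
    """Undo the XOR, then put every pixel back at its original position."""
    shuffled = [c ^ k for c, k in zip(cipher, key)]
    plain = [0] * len(shuffled)
    for dst, src in enumerate(perm):
        plain[src] = shuffled[dst]
    return plain


# One shared shuffling permutation, plus an independent key for this image.
pixels = [10, 200, 33, 47, 0, 255, 128, 64]
perm = permutation_from(logistic_sequence(0.3141, 3.99, len(pixels)))
key = keystream_from(logistic_sequence(0.2718, 3.99, len(pixels)))
cipher = encrypt(pixels, perm, key)
restored = decrypt(cipher, perm, key)
```

In a batch setting, the permutation would be reused across images while each image receives its own key sequence, so that recovering one key does not expose the others. This toy version is for illustration only and makes no security claim.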

Journal of Computer Science and Technology | 2017

Emphasizing Essential Words for Sentiment Classification Based on Recurrent Neural Networks

Fei Hu; Li Li; Zili Zhang; Jingyuan Wang; Xiaofei Xu

With the explosion of online communication and publication, texts have become obtainable via forums, chat messages, blogs, book reviews and movie reviews. Usually, these texts are very short and noisy, without sufficient statistical signals or enough information for a good semantic analysis. Traditional natural language processing methods, such as bag-of-words (BOW) based probabilistic latent semantic models, fail to achieve high performance in the short text environment. Recent research has focused on the correlations between words, i.e., term dependencies, which can help to mine the latent semantics hidden in short texts and help people to understand them. A long short-term memory (LSTM) network can capture term dependencies and is able to remember information for long periods of time. LSTM has been widely used and has obtained promising results in a variety of problems involving the latent semantics of texts. At the same time, by analyzing the texts, we find that a number of keywords contribute greatly to the semantics of the texts. In this paper, we establish a keyword vocabulary and propose an LSTM-based model that is sensitive to the words in the vocabulary; hence, the keywords leverage the semantics of the full document. The proposed model is evaluated in a short-text sentiment analysis task on two datasets: IMDB and SemEval-2016. Experimental results demonstrate that our model outperforms the baseline LSTM by 1%~2% in terms of accuracy and is effective, with significant performance enhancement over several non-recurrent neural network latent semantic models (especially in dealing with short texts). We also incorporate the idea into a variant of LSTM named the gated recurrent unit (GRU) model and achieve good performance, which suggests that our method is general enough to improve different deep learning models.

Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing | 2016

A hybrid method for bilingual text sentiment classification based on deep learning

Guolong Liu; Xiaofei Xu; Bailong Deng; Siding Chen; Li Li

Text sentiment classification occupies a pivotal position in sentiment analysis research, as it offers important opinion mining functions. Nowadays, with the explosion of information, many researchers are focusing on sentiment classification over massive amounts of data. However, traditional machine learning methods cannot acquire the semantic information of text, and most research achievements concern a single language. In this paper, a hybrid method that integrates deep learning features and shallow learning features is proposed. The hybrid method can realize not only single-language text sentiment classification but bilingual text sentiment classification as well. Models such as recurrent neural networks (RNNs) with long short-term memory (LSTM), Naïve Bayes Support Vector Machine (NB-SVM), word vectors and bag-of-words are explored. First, these models are studied separately in the sentiment classification task. The paper then integrates the above methods as a whole to complete the task. Different combination strategies are discussed with regard to the contribution of each method. The experiments show that the accuracy can reach 89% and that the hybrid method performs much better than any individual method. The proposed method achieves a performance close to that of state-of-the-art methods based on hand-engineered features. What's more, the hybrid model can learn more linguistic phenomena, with growing accuracy in discriminating emotional tendency, when more background knowledge is available.

Knowledge Science, Engineering and Management | 2016

Analyzing Topic-Sentiment and Topic Evolution over Time from Social Media

Yan Hu; Xiaofei Xu; Li Li

Most online news websites have enabled users to annotate their sentiments while reading the news. Unlike traditional user feedback such as reviews or ratings, these annotations express the sentiment of the users more intuitively. Topic models have proved effective for analyzing text information; however, most existing topic models focus either on extracting static topic sentiment or on tracking topics over time while ignoring sentiment analysis. In this paper, we propose a joint topic-sentiment over time model (JTSoT) to detect the topic-sentiment shift and track the topic evolution over time. The critical challenge is how to balance the relationship among topic, sentiment and time. The topic is represented as a Beta distribution over time and a Dirichlet distribution with respect to the sentiment. We evaluate our method on a real-world news dataset. The experimental results show that we have achieved high correlation between topic and sentiment, more interpretable topic evolution, and better document sentiment classification results and perplexity.

Web Age Information Management | 2018

Hybrid Decision Based Chinese News Headline Classification

Yukun Cao; Xiaofei Xu; Ye Du; Jun He; Li Li

In recent years, short text classification has been attracting more attention. With the development of social platforms such as microblogging and WeChat, Chinese short text classification has a great impact on public opinion analysis and sentiment mining. Among social media texts, news headline classification has substantial influence on both academia and the Internet economy. Issues such as the semantic sparsity caused by the limited length of the texts, and their nonstandard grammar, have hindered classification performance. In this paper, a Chinese news headline classification method based on multi-model decision is proposed. First, an effective Convolutional Neural Network (CNN) is applied as one text classifier, and a Long Short-Term Memory (LSTM) network is used as another. The aim is to obtain both the abstract semantics of news headlines through the CNN and the context information between word sequences through the LSTM. Second, fastText, an efficient text categorization tool from Facebook, is introduced to obtain strong and balanced results. Finally, a decision model is proposed to favor the best classification performance. A simple but very effective voting system is proposed, and the result is very promising. Experiments on the dataset from NLPCC 2017 Task 2 demonstrate the efficiency of our method. Our method achieves much higher performance (\(F_{1}\) of 79%) than the baseline provided by NLPCC 2017.
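A voting system over the three classifiers named in this abstract could look like the minimal sketch below. The abstract does not state the actual voting rule, so plain majority voting with a fixed priority order for tie-breaking is an assumption, and the function name `majority_vote`, the classifier keys and the category labels are all hypothetical.

```python
from collections import Counter


def majority_vote(predictions, priority):
    """Pick the label predicted by the most classifiers.

    predictions: dict mapping classifier name -> predicted label.
    priority: classifier names ordered from most to least trusted; it is
              used only when several labels tie for the highest count.
    """
    counts = Counter(predictions.values())
    best = max(counts.values())
    winners = {label for label, n in counts.items() if n == best}
    if len(winners) == 1:
        return next(iter(winners))
    # Tie: fall back to the most trusted classifier whose label is among the winners.
    for name in priority:
        if predictions.get(name) in winners:
            return predictions[name]
    raise ValueError("priority list does not cover any tied classifier")


# Two of the three hypothetical classifiers agree, so their label wins.
votes = {"cnn": "sports", "lstm": "sports", "fasttext": "finance"}
label = majority_vote(votes, priority=["fasttext", "cnn", "lstm"])
```

When all three classifiers disagree, the priority list decides, so its order would normally reflect each model's accuracy on held-out data.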

Knowledge Science, Engineering and Management | 2018

P-DBL: A Deep Traffic Flow Prediction Architecture Based on Trajectory Data

Jingyuan Wang; Xiaofei Xu; Jun He; Li Li

Predicting traffic flow in large-scale transportation networks has become an important and challenging topic in recent decades. However, accurate traffic flow prediction is still hard to realize. Weather factors such as precipitation in residential areas and tourist destinations affect traffic flow on the surrounding roads. In this paper, we attempt to take the impact of precipitation into consideration when predicting traffic flow. To realize this idea, we propose a deep traffic flow prediction architecture that introduces a deep bi-directional long short-term memory model, precipitation information, residual connections, a regression layer and the dropout training method. The proposed model has a good ability to capture the deep features of traffic flow. In addition, it can take full advantage of time-aware traffic flow data and additional precipitation data. We evaluate the prediction architecture on taxi trajectory datasets from Chongqing and Beijing, with corresponding precipitation data from the China Meteorological Data Service Center (CMDC). The experimental results demonstrate that the proposed model obtains high accuracy in traffic flow prediction compared with other models.

Neural Computing and Applications | 2018

Opinion extraction by distinguishing term dependencies and digging deep text features

Fei Hu; Li Li; Xiaofei Xu; Jingyuan Wang; Jinjing Zhang

Opinion extraction from user reviews has been playing an important role in the academic and industrial fields, and much progress has been achieved with recurrent neural networks (RNNs). Compared with conventional bag-of-words-based models, RNNs can capture dependencies among words and are able to remember contextual information for long periods of time. However, RNNs assign uniformly weighted dependencies between pairs of words, which contradicts the fact that people pay attention to different words to varying degrees when reading a text. In this paper, we develop a deeply hierarchical bi-directional key-word emphasis model (DHBK) by introducing term dependencies, a human distinguishing memory mechanism, residual connections, deeply hierarchical networks and bi-directional information flow. This model is able to capture weighted term dependencies according to different words, mine deep text features, and thereby better extract the opinions of the user. Furthermore, we introduce the DHBK and the dropout training method into an opinion extraction task and advocate two novel frameworks: DHBK based on LSTM (DKL) and DHBK based on GRU (DKG). We evaluate the frameworks on two real-world datasets, IMDB and SemEval-2016 Task 4 Subtask A. Experimental results demonstrate that the improvements are effective, with a significant performance enhancement.
