Kan Xu

Dalian University of Technology

Publication

Featured research published by Kan Xu.

Scientometrics | 2014

Literature retrieval based on citation context

Shengbo Liu; Chaomei Chen; Kun Ding; Bo Wang; Kan Xu; Yuan Lin

While the citation context of a reference may provide detailed and direct information about the nature of a citation, few studies have specifically addressed the role of this information in retrieving relevant documents from the literature, primarily due to the lack of full-text databases. In this paper, we design a retrieval system based on the full texts in the PubMed Central database. The system consists of two modules: a reference retrieval module based on citation contexts, and a citation context retrieval module for searching the citation contexts of a specific paper. Comparisons show that the reference retrieval module performed better than Google Scholar and the PubMed database at finding appropriate references based on topic words extracted from citation contexts. It also performed very well at retrieving highly cited papers and classic papers. The citation context retrieval module visualizes the topics of citation contexts as tag clouds and classifies citation contexts based on the cue words they contain.

Scientometrics | 2015

How we collaborate: characterizing, modeling and predicting scientific collaborations

Xiaoling Sun; Hongfei Lin; Kan Xu; Kun Ding

The large number of bibliographic repositories publicly available on the web provides great opportunities to study the scientific behavior of scholars. This paper aims to study the way we collaborate, model the dynamics of collaborations, and predict future collaborations among authors. We investigate collaborations in three disciplines (physics, computer science, and information science) and examine different kinds of features that may influence the creation of collaborations. Path-based features are found to be particularly useful for predicting collaborations. Moreover, the combination of path-based and attribute-based features achieves almost the same performance as the combination of all features considered. Inspired by these findings, we propose an agent-based model to simulate the dynamics of collaborations. The model merges network structure and node attributes by leveraging a random walk mechanism and interest similarity. Empirical results show that the model can reproduce a number of realistic and critical network statistics and patterns. We further apply the model to predict collaborations in an unsupervised manner and compare it with several state-of-the-art approaches; it achieves the best predictive performance compared with the random baseline and the other approaches. The results suggest that both network structure and node attributes may play an important role in shaping the evolution of collaboration networks.
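The abstract singles out path-based features as strong predictors of future collaborations. As an illustration only, the sketch below computes two common path-based link-prediction features on a toy co-authorship graph: walk counts of length 2 and 3 between two authors, and a truncated Katz score. The feature choice, the damping factor `beta`, and the function names are assumptions for this example; this is not the paper's feature set or its agent-based model.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(papers):
    """Build an undirected co-authorship graph from per-paper author lists."""
    graph = defaultdict(set)
    for authors in papers:
        for a, b in combinations(sorted(set(authors)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def count_paths(graph, source, target, length):
    """Count walks of exactly `length` edges from source to target.

    Dynamic programming over walk counts; walks may revisit nodes.
    """
    counts = {source: 1}
    for _ in range(length):
        nxt = defaultdict(int)
        for node, c in counts.items():
            for neighbour in graph[node]:
                nxt[neighbour] += c
        counts = nxt
    return counts.get(target, 0)


def path_features(graph, a, b, max_len=3, beta=0.5):
    """Return walk counts for lengths 2..max_len and a truncated Katz score.

    `beta` is a hypothetical damping factor that down-weights longer walks.
    """
    lengths = range(2, max_len + 1)
    walks = [count_paths(graph, a, b, l) for l in lengths]
    katz = sum(beta ** l * w for l, w in zip(lengths, walks))
    return walks, katz
```

For instance, if four papers link authors A-B, B-C, A-D, and D-C, then A and C have never co-authored but are joined by two length-2 walks (via B and via D), so `path_features(g, "A", "C")` returns `([2, 0], 0.5)`. Such scores would then be fed, together with attribute-based features, to a classifier.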

Chinese National Conference on Social Media Processing | 2017

Social Annotation for Query Expansion Learning from Multiple Expansion Strategies

Yuan Lin; Bo Xu; Luying Li; Hongfei Lin; Kan Xu

User-generated content, such as web pages, is often annotated by users with free-text labels, called annotations, which can be an effective source of information for query formulation tasks. The implicit relationships between annotations can be important for selecting expansion terms. However, extracting such knowledge from social annotations presents many challenges, since annotations are often ambiguous, noisy, and uncertain. In addition, most research applies a single query expansion method and does not consider the attributes of annotations. In contrast, in this paper we propose a novel framework that optimizes the combination of three query expansion methods, which select expansion terms from social annotations under three strategies. Furthermore, we introduce learning-to-rank methods for phrase weighting and select features from the social annotation resource to train the ranking model. Experimental results on three TREC test collections show that our proposed method improves retrieval performance.

Neural Computing and Applications | 2018

Detecting adverse drug reactions from social media based on multi-channel convolutional neural networks

Chen Shen; Hongfei Lin; Kai Guo; Kan Xu; Zhihao Yang; Jian Wang

Adverse drug reactions, one of the most important subjects in the medical field, seriously affect patients' lives, health, and safety. Although many detection methods have been proposed, many important adverse drug reactions remain unknown because of the complexity of the detection process. Social media, such as medical forums and social networking services, collect a large amount of drug-use information from patients and are therefore important for adverse drug reaction mining. However, most existing studies use only a single source of data. This study automatically crawls information posted by users of the MedHelp Medical Forum and combines it with disease-related user posts obtained from Twitter. We combine different word embeddings and use a multi-channel convolutional neural network to address the challenge of representing data from multiple sources, and then identify text containing adverse drug reaction information. In particular, to enable the model to exploit the morphological and shape information of words, we use a convolutional channel that learns features from character-level embeddings of words. Experimental results show that the proposed method improves word representations and thus effectively detects adverse drug reactions in text.

International Symposium on Bioinformatics Research and Applications | 2017

Detecting Potential Adverse Drug Reactions Using Association Rules and Embedding Models

Kai Guo; Hongfei Lin; Bo Xu; Zhihao Yang; Jian Wang; Yuanyuan Sun; Kan Xu

Adverse drug reactions (ADRs) may occur following a single dose or prolonged administration of a drug, or may result from the combination of two or more drugs. Given the restrictions of traditional methods such as clinical trials, it is difficult to detect ADRs in a timely manner. Many countries have built spontaneous adverse drug event reporting systems, which provide a large number of adverse drug event reports for research purposes. In this paper, we use association rule mining to reconstruct the data from adverse drug event reports, and apply modified embedding models to calculate the relevance between drugs and adverse reactions in order to detect potential ADRs. We examine the effectiveness of the methods through experiments on two drugs, Gadoversetamide and Rofecoxib, finding 6 potential ADRs, which can be further verified with biomedical data.
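To make the association-rule step concrete, the sketch below scores generic drug → reaction rules by support, confidence, and lift over a list of reports. It assumes, for illustration, that each report is a pair of sets (drugs mentioned, reactions reported); the paper's data reconstruction and its modified embedding models are not reproduced, and the function name and thresholds are hypothetical.

```python
from itertools import product


def drug_reaction_rules(reports, min_support=0.0, min_confidence=0.0):
    """Score drug -> reaction association rules from spontaneous reports.

    `reports` is a list of (drugs, reactions) pairs of sets (assumed format).
    Returns (drug, reaction, support, confidence, lift) tuples, highest lift first.
    """
    n = len(reports)
    drug_count, reaction_count, pair_count = {}, {}, {}
    for drugs, reactions in reports:
        for d in drugs:
            drug_count[d] = drug_count.get(d, 0) + 1
        for r in reactions:
            reaction_count[r] = reaction_count.get(r, 0) + 1
        for d, r in product(drugs, reactions):
            pair_count[(d, r)] = pair_count.get((d, r), 0) + 1

    rules = []
    for (d, r), c in pair_count.items():
        support = c / n                                # P(drug and reaction)
        confidence = c / drug_count[d]                 # P(reaction | drug)
        lift = confidence / (reaction_count[r] / n)    # > 1: over-represented
        if support >= min_support and confidence >= min_confidence:
            rules.append((d, r, support, confidence, lift))
    # A reaction reported unusually often alongside a drug ranks first.
    return sorted(rules, key=lambda rule: rule[4], reverse=True)
```

Lift above 1 means a reaction appears with a drug more often than its overall reporting rate would suggest, which is one common way to flag candidate signals before any further verification.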

Conference on Information and Knowledge Management | 2017

Learning to Rank with Query-level Semi-supervised Autoencoders

Bo Xu; Hongfei Lin; Yuan Lin; Kan Xu

Learning to rank uses machine learning methods to solve ranking problems by constructing ranking models in a supervised way: it takes fixed-length feature vectors of documents as inputs and outputs ranking models learned by iteratively reducing a predefined ranking loss. Document features are typically extracted from classic textual statistics, and different features contribute differently to ranking performance. Given that well-defined features contribute more to retrieval performance, we investigate the use of autoencoders to enrich the feature representations of documents. Autoencoders, as basic building blocks of deep neural networks, have been successfully used in many text mining tasks to generate effective features. To enrich the feature space for learning to rank, we introduce supervision into the loss functions of autoencoders. Specifically, we first train a linear ranking model on the training data and then incorporate the learned weights into the reconstruction costs of an autoencoder. Meanwhile, we accumulate the costs of the documents for a given query under query-level constraints to produce more useful features. We evaluate the effectiveness of our model on three LETOR datasets and show that it generates effective document features that improve retrieval performance.
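One plausible reading of "incorporating the learned weights into the reconstruction costs" is sketched below: each feature's squared reconstruction error is scaled by the magnitude of that feature's weight in a previously trained linear ranker, and costs are averaged per query before being summed. Both the absolute-value weighting and the per-query averaging are assumptions made for illustration, not the paper's exact formulation; the encoder, decoder, and training loop are omitted, and the function names are hypothetical.

```python
def weighted_reconstruction_cost(x, x_hat, weights):
    """Squared error per feature, scaled by |w_j| from a linear ranker (assumed)."""
    return sum(abs(w) * (a - b) ** 2 for a, b, w in zip(x, x_hat, weights))


def query_level_cost(queries, weights):
    """Sum over queries of the mean weighted cost of that query's documents.

    `queries` maps a query id to a list of (features, reconstruction) pairs.
    Averaging within each query keeps queries with many judged documents
    from dominating the total (an assumed stand-in for query-level constraints).
    """
    total = 0.0
    for pairs in queries.values():
        costs = [weighted_reconstruction_cost(x, x_hat, weights) for x, x_hat in pairs]
        total += sum(costs) / len(costs)
    return total
```

Under this weighting, features the ranker relies on heavily are penalized more when reconstructed poorly, so the hidden layer is pushed to preserve ranking-relevant information.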

International Journal of Machine Learning and Cybernetics | 2017

Learning to rank using multiple loss functions

Yuan Lin; Jiajin Wu; Bo Xu; Kan Xu; Hongfei Lin

Learning to rank has attracted much attention in the domains of information retrieval and machine learning. Prior studies on learning to rank mainly focused on three types of methods: pointwise, pairwise, and listwise. Each of these paradigms focuses on a different aspect of the input instances sampled from the training dataset. This paper explores how to combine them to improve ranking performance. The basic idea is to incorporate the different loss functions into an enriched objective loss function. We present a flexible framework for incorporating multiple loss functions, based on which we propose three loss-weighting schemes. Moreover, to achieve good performance, we define several candidate loss functions and select among them experimentally. The three types of weighting schemes are compared on the LETOR 3.0 dataset; the results demonstrate that with a good weighting scheme, our method significantly outperforms baselines that use a single loss function and is at least comparable to state-of-the-art algorithms in most cases.
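The core idea of combining pointwise, pairwise, and listwise objectives can be sketched as a weighted sum of three standard losses for one query's document scores. The specific losses chosen here (squared error, pairwise logistic loss, and a ListNet-style top-one cross entropy) and the fixed equal weights are illustrative assumptions; the paper's candidate loss functions and its three weighting schemes are not reproduced.

```python
import math


def pointwise_loss(scores, labels):
    """Mean squared error between predicted scores and relevance labels."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(scores)


def pairwise_loss(scores, labels):
    """Mean logistic loss over pairs where document i is more relevant than j."""
    n = len(scores)
    losses = [math.log1p(math.exp(-(scores[i] - scores[j])))
              for i in range(n) for j in range(n) if labels[i] > labels[j]]
    return sum(losses) / len(losses) if losses else 0.0


def softmax(values):
    """Numerically stable softmax over a list of numbers."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    z = sum(exps)
    return [e / z for e in exps]


def listwise_loss(scores, labels):
    """ListNet-style top-one cross entropy between label and score distributions."""
    p_true, p_pred = softmax(labels), softmax(scores)
    return -sum(p * math.log(q) for p, q in zip(p_true, p_pred))


def combined_loss(scores, labels, alphas=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of the three losses; `alphas` is a hypothetical fixed scheme."""
    parts = (pointwise_loss(scores, labels),
             pairwise_loss(scores, labels),
             listwise_loss(scores, labels))
    return sum(a * loss for a, loss in zip(alphas, parts))
```

Setting `alphas` to `(1, 0, 0)`, `(0, 1, 0)`, or `(0, 0, 1)` recovers each single-loss baseline, which is the kind of comparison the abstract describes; a learned or adaptive weighting would replace the fixed tuple.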

China Conference on Information Retrieval | 2017

Tripartite-Replicated Softmax Model for Document Representations

Bo Xu; Hongfei Lin; Lin Wang; Yuan Lin; Kan Xu; Xiaocong Wei; Dong Huang

Text mining tasks based on machine learning require inputs to be represented as fixed-length vectors, and effective vector representations of words, phrases, sentences, and even documents may greatly improve the performance of these tasks. Recently, distributed word representations based on neural networks have been shown to be powerful in many tasks because they encode abundant semantic and linguistic information. However, document representation remains a great challenge because of the complex semantic structures of different documents. To meet this challenge, we propose two novel tripartite graphical models for document representations that incorporate word representations into the Replicated Softmax model, which we name the Tripartite-Replicated Softmax model (TRPS) and the directed Tripartite-Replicated Softmax model (d-TRPS), respectively. We also introduce several optimization strategies for training the proposed models to learn better document representations. The proposed models capture linear relationships among words and latent semantic information within documents simultaneously, thus learning both linear and nonlinear document representations. We evaluate the learned document representations on a document classification task and a document retrieval task. Experimental results show that the representations learned by our models outperform those of state-of-the-art models on both tasks.

BMC Medical Informatics and Decision Making | 2017

A multiple distributed representation method based on neural network for biomedical event extraction

Anran Wang; Jian Wang; Hongfei Lin; Jianhai Zhang; Zhihao Yang; Kan Xu

Background: Biomedical event extraction is one of the frontier domains in biomedical research. Its two main subtasks, trigger identification and argument detection, can both be treated as classification problems. However, traditional state-of-the-art methods are based on support vector machines (SVMs) with massive numbers of manually designed, one-hot represented features, which require enormous effort and lack semantic relations among words.

Methods: In this paper, we propose a multiple distributed representation method for biomedical event extraction. The method combines context, consisting of dependency-based word embeddings, with task-based features represented in a distributed way, and uses them as the input for training deep learning models. Finally, a softmax classifier labels the example candidates.

Results: Experimental results on the Multi-Level Event Extraction (MLEE) corpus show higher F-scores of 77.97% in trigger identification and 58.31% overall, compared with the state-of-the-art SVM method.

Conclusions: Our distributed representation method for biomedical event extraction avoids the semantic gap and the curse of dimensionality that affect traditional one-hot representation methods. The promising results demonstrate that the proposed method is effective for biomedical event extraction.

Asia Information Retrieval Symposium | 2016

Patent Retrieval Based on Multiple Information Resources

Kan Xu; Hongfei Lin; Yuan Lin; Bo Xu; Liang Yang; Shaowu Zhang

Query expansion methods have proven effective at improving the average performance of patent retrieval, but most query expansion methods use a single source of information to select expansion terms. In this paper, we propose a method that exploits external resources to improve patent retrieval. The Google search engine and the Derwent World Patents Index were used as external resources to enhance the performance of query expansion methods. LambdaRank was employed to improve patent retrieval performance by combining different query expansion methods with different text-field weighting strategies for the different resources. Experiments on TREC data sets showed that combining multiple information sources for query formulation was more effective at improving patent retrieval performance than using any single source.
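The idea of drawing expansion terms from several resources can be illustrated with a simple merge step. Note that the paper combines methods with LambdaRank; the sketch below swaps in a plain weighted linear interpolation of max-normalized term scores, then keeps the top-k candidates that are not already in the query. The resource keys (`"web"`, `"dwpi"`), weights, and function name are hypothetical.

```python
def combine_expansion_terms(source_scores, source_weights, k=5, exclude=()):
    """Merge candidate expansion terms scored by several resources.

    source_scores: dict resource name -> dict term -> non-negative score
    source_weights: dict resource name -> interpolation weight (assumed given)
    exclude: terms to skip, e.g. the original query terms
    Linear interpolation here stands in for the paper's LambdaRank combination.
    """
    combined = {}
    for source, scores in source_scores.items():
        weight = source_weights.get(source, 0.0)
        top = max(scores.values(), default=0.0)   # per-resource max-normalization
        for term, score in scores.items():
            if term in exclude:
                continue
            normalized = score / top if top > 0 else 0.0
            combined[term] = combined.get(term, 0.0) + weight * normalized
    # Highest combined score first; ties broken alphabetically for determinism.
    ranked = sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

For example, a term suggested moderately by web search results but strongly by a patent index can outrank a term that only one resource proposes, which is the intuition behind using multiple sources rather than one.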
