Yuan Ling
Drexel University
Publications
Featured research published by Yuan Ling.
bioinformatics and biomedicine | 2014
Mengwen Liu; Yuan Ling; Yuan An; Xiaohua Hu
We develop a novel distantly supervised model that integrates the results of open information extraction techniques to perform relation extraction from biomedical literature. Unlike state-of-the-art models for relation extraction in the biomedical domain, which are mainly based on supervised methods, our approach does not require manually labeled instances. In addition, our model incorporates a grouping strategy that takes into account the coordinating structure among entities that co-occur in one sentence. We apply our approach to extract gene expression relationships between genes and brain regions from the literature. Results show that our methods achieve promising performance over two baselines: a Transductive Support Vector Machine and a variant without the grouping strategy.
bioinformatics and biomedicine | 2014
Yuan Ling; Yuan An; Xiaohua Hu
Clinical notes are rich free-text data sources containing valuable symptom and medication information. Little research has been done on matching medication information with information on multiple symptoms. Such matching could provide valuable information for patients with multiple syndromes. We propose a Symptom-Medication (Symp-Med) matching framework to model symptom and medication relationships from clinical notes. After extracting symptom and medication concepts, we construct a weighted bipartite graph to represent the relationships between the two groups of concepts. The key is to efficiently answer users' symptom-medication queries using the graph. We formulate this problem as an Integer Linear Programming (ILP) problem. The objectives are to maximize the total edge weight and minimize the number of medication concepts. We first explore a Branch-and-Cut based algorithm. Then, we revise the combined objective and propose a Greedy-based algorithm for solving the Symp-Med problem. The Greedy-based algorithm performs better and significantly reduces the computational cost.
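The abstract does not spell out the Greedy-based algorithm, so the following is only a minimal sketch of one plausible greedy heuristic on a weighted bipartite graph, not the authors' method. It assumes a hypothetical adjacency dictionary (`toy_graph`, with made-up concept names and weights) and repeatedly picks the medication whose edges to still-uncovered query symptoms carry the most weight, which favors high total weight with few medications.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy data: medication concept -> {symptom concept: edge weight}.
Graph = Dict[str, Dict[str, float]]

toy_graph: Graph = {
    "med_a": {"s1": 0.75, "s2": 0.5},
    "med_b": {"s2": 0.5, "s3": 0.25},
    "med_c": {"s3": 0.125},
}


def greedy_match(graph: Graph, query: Set[str]) -> Tuple[List[str], float]:
    """Greedily select medications for the queried symptoms.

    At each step, take the medication with the largest total edge weight
    to symptoms that are still uncovered; stop when every query symptom is
    covered or no remaining medication adds any weight.
    """
    uncovered = set(query)
    chosen: List[str] = []
    total = 0.0
    while uncovered:
        best_med, best_gain = None, 0.0
        for med in sorted(graph):  # sorted for deterministic tie-breaking
            if med in chosen:
                continue
            gain = sum(w for s, w in graph[med].items() if s in uncovered)
            if gain > best_gain:
                best_med, best_gain = med, gain
        if best_med is None:  # nothing left that helps
            break
        chosen.append(best_med)
        total += best_gain
        uncovered -= set(graph[best_med])
    return chosen, total


print(greedy_match(toy_graph, {"s1", "s2", "s3"}))  # (['med_a', 'med_b'], 1.5)
```

A heuristic like this gives no optimality guarantee, unlike an exact ILP solver, which is the usual trade-off for its lower running time.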
bioinformatics and biomedicine | 2013
Yuan Ling; Yuan An; Mengwen Liu; Xiaohua Hu
We develop an error detection and tagging framework for reducing data entry errors in Electronic Medical Records (EMR) systems. We propose a taxonomy of data errors with three levels: Incorrect Format and Missing error, Out of Range error, and Inconsistent error. We aim to address the challenging problem of detecting erroneous input values that look statistically normal but are abnormal in a medical sense. Detecting such an error requires taking the patient's medical history and population data into consideration. In particular, we propose a probabilistic method based on the assumption that the input value for a field depends on the historical records of that field and is affected by other fields through dependency relationships. We evaluate our methods using data collected from an EMR system. The results show that the method is promising for automatic data entry error detection.
international symposium on neural networks | 2017
Yuan Ling; Yuan An; Mengwen Liu; Sadid A. Hasan; Yetian Fan; Xiaohua Hu
Word embedding in the NLP area has attracted increasing attention in recent years. The continuous bag-of-words model (CBOW) and the continuous Skip-gram model (Skip-gram) have been developed to learn distributed representations of words from a large amount of unlabeled text data. In this paper, we explore the idea of integrating extra knowledge into the CBOW and Skip-gram models and applying the new models to biomedical NLP tasks. The main idea is to construct a weighted graph from knowledge bases (KBs) to represent structured relationships among words/concepts. In particular, we propose a GCBOW model and a GSkip-gram model by integrating such a graph into the original CBOW and Skip-gram models, respectively, via graph regularization. Our experiments on four standard general-domain datasets show encouraging improvements with the new models. Further evaluations on two biomedical NLP tasks (a biomedical similarity/relatedness task and a biomedical information retrieval (IR) task) show that our methods outperform the baselines.
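The exact GCBOW and GSkip-gram objectives are not given in this abstract. As an illustration of what graph regularization commonly means, the sketch below computes a generic graph-Laplacian penalty that grows when concepts linked by heavy edges have distant embeddings. The function names, the symmetric weight matrix, and the toy values are all assumptions made for this example.

```python
import numpy as np


def graph_penalty(emb: np.ndarray, weights: np.ndarray) -> float:
    """Return 0.5 * sum_ij W_ij * ||e_i - e_j||^2 for embeddings emb (n x d)."""
    diff = emb[:, None, :] - emb[None, :, :]   # pairwise differences, n x n x d
    sq_dist = (diff ** 2).sum(axis=-1)         # squared distances, n x n
    return 0.5 * float((weights * sq_dist).sum())


def graph_penalty_laplacian(emb: np.ndarray, weights: np.ndarray) -> float:
    """Same quantity via the Laplacian L = D - W: trace(E^T L E).

    The two forms agree when the weight matrix is symmetric.
    """
    laplacian = np.diag(weights.sum(axis=1)) - weights
    return float(np.trace(emb.T @ laplacian @ emb))


# Hypothetical 3-concept graph: concepts 0 and 1 are strongly related.
toy_weights = np.array([[0.0, 1.0, 0.0],
                        [1.0, 0.0, 0.5],
                        [0.0, 0.5, 0.0]])
toy_emb = np.array([[0.0, 0.0],
                    [1.0, 0.0],
                    [1.0, 2.0]])
print(graph_penalty(toy_emb, toy_weights))
```

In a regularized embedding model, a term of this kind would typically be added to the CBOW or Skip-gram loss with a tunable coefficient, so that the knowledge graph pulls related concepts together during training.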
Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications | 2017
Yuan Ling; Yuan An; Sadid A. Hasan
This paper presents a novel approach to the task of automatically inferring the most probable diagnosis from a given clinical narrative. Structured Knowledge Bases (KBs) are useful for such complex tasks but not sufficient on their own. Hence, we integrate a vast amount of unstructured free text with structured KBs. The key ideas are building a concept graph from both structured and unstructured knowledge sources and ranking the diagnosis concepts using the enhanced word embedding vectors learned from the integrated sources. Experiments on the TREC CDS and HumanDx datasets show that our methods improve the results of clinical diagnosis inference.
bioinformatics and biomedicine | 2015
Yetian Fan; Xingpeng Jiang; Xiaohua Hu; Bo Song; Yuan Ling; Wei Wu
Visualization is an important method in microbiome data analysis, and dimensionality reduction is a necessary procedure to achieve it. Multidimensional Scaling (MDS) is a popular method for this purpose, but it requires computing the distance matrix. The UniFrac distance is very reasonable and biologically meaningful in the analysis of microbiome data. Due to the complexity of the phylogenetic tree and the high dimensionality of the data, MDS needs a large number of calculations to determine the distances between all pairs. In this paper, we propose a novel dimensionality reduction algorithm based on the Laplace matrix (DRLM) for the analysis of microbiome data. The experimental results indicate that, on both synthesized and microbiome data, DRLM not only clusters the data more clearly but also significantly reduces the computational cost.
IEEE Transactions on Nanobioscience | 2016
Yetian Fan; Xingpeng Jiang; Xiaohua Hu; Bo Song; Yuan Ling; Wei Wu
Visualization is an important method of data analysis in the study of the microbiome, with dimensionality reduction as a prerequisite for high-dimensional data. Multidimensional scaling (MDS), a popular method for data visualization, can provide a low-dimensional representation of the original data using its distance matrix. Meanwhile, the unique fraction metric (UniFrac) is a very reasonable and biologically meaningful metric for calculating distance matrices through a phylogenetic tree constructed from microbiome data. However, due to the complexity of the phylogenetic tree and the notably high dimensionality of the microbiome data, applying MDS with UniFrac would require costly calculations. In this paper, we propose a novel dimensionality reduction algorithm based on the Laplace matrix (DRLM) for microbiome data analysis. The experimental results from both synthesized and real microbiome data demonstrate that the proposed DRLM produces more distinct clusters while significantly reducing the computational cost of dimensionality reduction and visualization in microbiome data analysis.
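For context on the baseline mentioned above, here is a minimal sketch of classical (Torgerson) MDS, which recovers low-dimensional coordinates from a pairwise distance matrix. It is not the DRLM algorithm, whose details are not given in this abstract, and it assumes the distance matrix has already been computed (Euclidean distances stand in for UniFrac in the toy usage).

```python
import numpy as np


def classical_mds(dist: np.ndarray, k: int = 2) -> np.ndarray:
    """Embed n items in k dimensions from an n x n distance matrix."""
    n = dist.shape[0]
    # Double-center the squared distances to obtain a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the k largest.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:k]
    # Clip tiny negative eigenvalues caused by round-off or non-Euclidean input.
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)


# Toy usage: three points in the plane, described only by their distances.
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
embedding = classical_mds(dist, k=2)
print(embedding.shape)  # (3, 2)
```

The embedding matches the original points only up to rotation, reflection, and translation, but the pairwise distances are preserved. The eigendecomposition of the full n x n matrix, on top of building the distance matrix itself, is the kind of cost that motivates cheaper alternatives on large datasets.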
international conference on big data | 2014
Xiaoli Song; Yue Shang; Yuan Ling; Mengwen Liu; Xiaohua Hu
Topic modeling is a powerful tool for modeling documents and finding their underlying topics. However, the unstructured nature of raw text makes it hard to model the semantic relationships between text units, which may be words, phrases, or sentences, and thus even harder to model their corresponding underlying topics. In this work, we examine the pairwise relationships of the underlying topics through relation extraction. We first extract the entity pairs within one relation tuple from the raw text. Then, we model the relationship between the entity pairs by adding dependencies between entities and their corresponding topics. We propose six different versions of the Pairwise Topic Model (PTM) to simultaneously discover the latent topics and their pairwise relationships. Experiments on four data sets (AP news articles, DUC 2004 Task 2, Clinical Notes, and Neuroscience Papers) show that the PTMs are better-structured language models than the traditional topic model, Latent Dirichlet Allocation (LDA). Empirical results also show that the proposed PTMs can explicitly explain how two topics are related.
north american chapter of the association for computational linguistics | 2018
Reza Ghaeini; Sadid A. Hasan; Vivek V. Datla; Joey Liu; Kathy Lee; Ashequl Qadir; Yuan Ling; Aaditya Prakash; Xiaoli Z. Fern; Oladimeji Farri
text retrieval conference | 2015
Sadid A. Hasan; Yuan Ling; Joey Liu; Oladimeji Farri