Publication


Featured research published by Liwei Wang.


Journal of Biomedical Informatics | 2018

Clinical information extraction applications: A literature review

Yanshan Wang; Liwei Wang; Majid Rastegar-Mojarad; Sungrim Moon; Feichen Shen; Naveed Afzal; Sijia Liu; Yuqun Zeng; Saeed Mehrabi; Sunghwan Sohn; Hongfang Liu

BACKGROUND With the rapid adoption of electronic health records (EHRs), it is desirable to harvest information and knowledge from EHRs to support automated systems at the point of care and to enable secondary use of EHRs for clinical and translational research. One critical component used to facilitate the secondary use of EHR data is the information extraction (IE) task, which automatically extracts and encodes clinical information from text. OBJECTIVES In this literature review, we present a review of recently published research on clinical information extraction (IE) applications. METHODS A literature search was conducted for articles published from January 2009 to September 2016 based on Ovid MEDLINE In-Process & Other Non-Indexed Citations, Ovid MEDLINE, Ovid EMBASE, Scopus, Web of Science, and ACM Digital Library. RESULTS A total of 1917 publications were identified for title and abstract screening. Of these publications, 263 articles were selected and discussed in this review in terms of publication venues and data sources, clinical IE tools, methods, and applications in the areas of disease- and drug-related studies, and clinical workflow optimizations. CONCLUSIONS Clinical IE has been used for a wide range of applications; however, there is a considerable gap between clinical studies using EHR data and studies using clinical IE. This study enabled us to gain a more concrete understanding of the gap and to provide potential solutions to bridge this gap.


International Conference on Bioinformatics | 2017

Dependency and AMR Embeddings for Drug-Drug Interaction Extraction from Biomedical Literature

Yanshan Wang; Sijia Liu; Majid Rastegar-Mojarad; Liwei Wang; Feichen Shen; Fei Liu; Hongfang Liu

Drug-drug interaction (DDI) is an unexpected change in a drug's effect on the human body when the drug and a second drug are co-prescribed and taken together. As many DDIs are frequently reported in biomedical literature, it is important to mine DDI information from literature to keep DDI knowledge up to date. One of the SemEval challenges in 2011 and 2013 was designed to tackle this task, where the best system achieved an F1 score of 0.80. In this paper, we propose to utilize dependency embeddings and Abstract Meaning Representation (AMR) embeddings as features for extracting DDIs. Our contribution is two-fold. First, we employed dependency embeddings, previously shown effective for sentence classification, for DDI extraction. The dependency embeddings incorporated structural syntactic contexts into the embeddings, which were not present in the conventional word embeddings. Second, we proposed a novel syntactic embedding approach using AMR. AMR aims to abstract away from syntactic idiosyncrasies and attempts to capture only the core meaning of a sentence, which could potentially improve DDI extraction from sentences. Two classifiers (Support Vector Machine and Random Forest) taking these embedding features as input were evaluated on the DDIExtraction 2013 challenge corpus. The experimental results show the effectiveness of dependency and AMR embeddings in the DDI extraction task. The best performance was obtained by combining word, dependency and AMR embeddings (F1 score=0.84).


16th World Congress of Medical and Health Informatics: Precision Healthcare through Informatics, MedInfo 2017 | 2017

Phenotypic Analysis of Clinical Narratives Using Human Phenotype Ontology

Feichen Shen; Liwei Wang; Hongfang Liu

Phenotypes are defined as observable characteristics and clinical traits of diseases and organisms. As connectors between medical experimental findings and clinical practices, phenotypes play vital roles in translational medicine. To facilitate the translation between genotype and phenotype, Human Phenotype Ontology (HPO) was developed as a semantically computable vocabulary to capture phenotypic abnormalities found in human diseases discovered through biomedical research. The use of HPO in annotating phenotypic information in clinical practice remains unexplored. In this study, we investigated the use of HPO to annotate phenotypic information in the clinical domain by leveraging a corpus of 12.8 million clinical notes created from 2010 to 2015 for 729 thousand patients at Mayo Clinic Rochester campus and assessed the distribution of HPO terms in the corpus. We also analyzed the distributional difference of HPO terms among demographic groups. We further demonstrated the potential application of the annotated corpus to support knowledge discovery in precision medicine through Wilson’s Disease.


International Conference on Bioinformatics | 2016

Prioritizing Adverse Drug Reaction and Drug Repositioning Candidates Generated by Literature-Based Discovery

Majid Rastegar-Mojarad; Ravikumar Komandur Elayavilli; Liwei Wang; Rashmi Prasad; Hongfang Liu

Literature-based discovery (LBD) is a well-known paradigm to discover hidden knowledge in scientific literature. By identifying and utilizing reported findings in literature, LBD hypothesizes novel discoveries. Most often, LBD systems generate a long list of potential discoveries and it would be time-consuming and expensive to validate all of those discoveries. Preliminary validation or prioritization of the discoveries can improve the significance of LBD systems. In this study, we proposed a method utilizing information surrounding causal findings to prioritize discoveries generated by LBD systems. As a case study, we focused on discovering drug-disease relations, which have potential to identify drug repositioning candidates or adverse drug reactions. Our LBD system used drug-gene and gene-disease semantic predications in SemMedDB as causal findings and Swanson's ABC model to generate potential drug-disease relations. Using the sentences from which the causal findings were extracted, our ranking method trained a binary classifier to classify generated drug-disease relations into desired classes. We trained and tested our classifier for three different purposes: a) drug repositioning, b) adverse drug events, and c) drug-disease relation detection. The classifier obtained F-measures of 0.78, 0.86, and 0.83, respectively, for these tasks. The number of causal findings of each hypothesis that were classified as positive by the classifier is the main metric for ranking the hypotheses in the proposed method. To evaluate the ranking method, we counted and compared the number of true relations in the top 100 pairs ranked by our method and by a previous method. Out of 181 true relations in the test dataset, the proposed method ranked 20 of them in the top 100 relations, while this number was 13 for the other method.
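The generate-then-rank idea in this abstract can be illustrated with a minimal sketch. It is not the authors' system: the data are invented toy pairs rather than SemMedDB predications, each linking gene is treated as a single causal finding, and the `is_positive` callback is a hypothetical stand-in for the trained binary classifier.

```python
from collections import defaultdict


def abc_candidates(drug_gene, gene_disease, known_drug_disease=frozenset()):
    """Swanson-style ABC linking: drug (A) -> gene (B) -> disease (C).

    Returns {(drug, disease): [linking genes]} for pairs not already known.
    """
    diseases_by_gene = defaultdict(set)
    for gene, disease in gene_disease:
        diseases_by_gene[gene].add(disease)

    candidates = defaultdict(list)
    for drug, gene in drug_gene:
        for disease in sorted(diseases_by_gene.get(gene, ())):
            if (drug, disease) not in known_drug_disease:
                candidates[(drug, disease)].append(gene)
    return dict(candidates)


def rank_by_support(candidates, is_positive):
    """Rank hypotheses by how many linking findings are judged positive.

    `is_positive(drug, gene, disease)` is a hypothetical stand-in for a
    trained classifier applied to the evidence behind each finding.
    """
    scored = []
    for (drug, disease), genes in candidates.items():
        support = sum(1 for gene in genes if is_positive(drug, gene, disease))
        scored.append(((drug, disease), support))
    # Highest support first; ties broken alphabetically for determinism.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Toy, invented data (not taken from SemMedDB).
drug_gene = [("drugA", "G1"), ("drugA", "G2"), ("drugB", "G2")]
gene_disease = [("G1", "diseaseX"), ("G2", "diseaseX"), ("G2", "diseaseY")]

candidates = abc_candidates(drug_gene, gene_disease)
ranking = rank_by_support(candidates, lambda drug, gene, disease: True)
```

With a classifier that accepts every finding, the pair supported by two linking genes ranks first; a real classifier would lower the support of pairs whose evidence it rejects.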


International Conference on Bioinformatics | 2018

BioCreative/OHNLP Challenge 2018

Majid Rastegar-Mojarad; Sijia Liu; Yanshan Wang; Naveed Afzal; Liwei Wang; Feichen Shen; Sunyang Fu; Hongfang Liu

The application of Natural Language Processing (NLP) methods and resources to clinical and biomedical text has received growing attention over the past years, but progress has been limited by difficulties in accessing shared tools and resources, partially caused by patient privacy and data confidentiality constraints. Efforts to increase sharing and interoperability of the few existing resources are needed to facilitate the progress observed in the general NLP domain. Leveraging our research in corpus analysis and de-identification, we have created multiple synthetic data sets for a couple of NLP tasks based on real clinical sentences. We are organizing a challenge workshop to promote community efforts towards the advancement of clinical NLP. The challenge workshop will have two tasks: 1) Family History Information Extraction; and 2) Clinical Semantic Textual Similarity.


Journal of Medical Internet Research | 2017

Recommending Education Materials for Diabetic Questions Using Information Retrieval Approaches

Yuqun Zeng; Xusheng Liu; Yanshan Wang; Feichen Shen; Sijia Liu; Majid Rastegar-Mojarad; Liwei Wang; Hongfang Liu

Background Self-management is crucial to diabetes care and providing expert-vetted content for answering patients’ questions is crucial in facilitating patient self-management. Objective The aim is to investigate the use of information retrieval techniques in recommending patient education materials for patients’ diabetic questions. Methods We compared two retrieval algorithms, one based on Latent Dirichlet Allocation topic modeling (topic modeling-based model) and one based on semantic group (semantic group-based model), with a baseline retrieval model, the vector space model (VSM), in recommending diabetic patient education materials for diabetic questions posted on the TuDiabetes forum. The evaluation was based on a gold standard dataset consisting of 50 randomly selected diabetic questions where the relevancy of diabetic education materials to the questions was manually assigned by two experts. The performance was assessed using precision of top-ranked documents. Results We retrieved 7510 diabetic questions on the forum and 144 diabetic patient educational materials from the patient education database at Mayo Clinic. The mapping rate of words in each corpus mapped to the Unified Medical Language System (UMLS) was significantly different (P<.001). The topic modeling-based model outperformed the other retrieval algorithms. For example, for the top-retrieved document, the precision of the topic modeling-based, semantic group-based, and VSM models was 67.0%, 62.8%, and 54.3%, respectively. Conclusions This study demonstrated that topic modeling can mitigate the vocabulary difference and it achieved the best performance in recommending education materials for answering patients’ questions. One direction for future work is to assess the generalizability of our findings and to extend our study to other disease areas, other patient education material resources, and online forums.
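To make the baseline and the evaluation metric concrete, here is a minimal sketch of a vector space model ranked by cosine similarity and scored with precision at k. It rests on assumptions the abstract does not confirm: raw term frequencies without IDF weighting, whitespace tokenization, and invented toy documents and relevance labels.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (lowercased whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[term] * v[term] for term in u if term in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_documents(question, documents):
    """Return document ids ordered from most to least similar to the question."""
    q = tf_vector(question)
    scores = {doc_id: cosine(q, tf_vector(text)) for doc_id, text in documents.items()}
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked documents that are labeled relevant."""
    top = ranked_ids[:k]
    return sum(1 for doc_id in top if doc_id in relevant_ids) / k


# Toy, invented education materials and question.
documents = {
    "d1": "insulin dosing before meals",
    "d2": "foot care for diabetes",
    "d3": "counting carbohydrates in meals",
}
ranked = rank_documents("how much insulin before meals", documents)
p_at_1 = precision_at_k(ranked, {"d1"}, 1)
p_at_2 = precision_at_k(ranked, {"d1"}, 2)
```

A question that shares no words with a relevant document scores zero under this model, which is the vocabulary gap that the topic modeling-based approach is reported to mitigate.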


16th World Congress of Medical and Health Informatics: Precision Healthcare through Informatics, MedInfo 2017 | 2017

Using human phenotype ontology for phenotypic analysis of clinical notes

Feichen Shen; Liwei Wang; Hongfang Liu

Phenotypes are defined as observable characteristics of organisms. To facilitate the translation between genotype and phenotype, Human Phenotype Ontology (HPO) was developed as a semantically computable standardized vocabulary to capture phenotypic abnormalities found in humans. In this study, we investigated the use of HPO to annotate phenotypic information in the clinical domain by leveraging a corpus of 12.8 million clinical notes created from 2010 to 2015 for 729 thousand patients at Mayo Clinic Rochester campus.


Language Resources and Evaluation | 2018

MedSTS: a resource for clinical semantic textual similarity

Yanshan Wang; Naveed Afzal; Sunyang Fu; Liwei Wang; Feichen Shen; Majid Rastegar-Mojarad; Hongfang Liu

The adoption of electronic health records (EHRs) has enabled a wide range of applications leveraging EHR data. However, the meaningful use of EHR data largely depends on our ability to efficiently extract and consolidate information embedded in clinical text, where natural language processing (NLP) techniques are essential. Semantic textual similarity (STS), which measures the semantic similarity between text snippets, plays a significant role in many NLP applications. In the general NLP domain, STS shared tasks have made available a huge collection of text snippet pairs with manual annotations in various domains. In the clinical domain, STS can enable us to detect and eliminate redundant information, which may lead to a reduction in cognitive burden and an improvement in the clinical decision-making process. This paper elaborates our efforts to assemble a resource for STS in the medical domain, MedSTS. It consists of a total of 174,629 sentence pairs gathered from a clinical corpus at Mayo Clinic. A subset of MedSTS (MedSTS_ann) containing 1068 sentence pairs was annotated by two medical experts with semantic similarity scores of 0–5 (low to high similarity). We further analyzed the medical concepts in the MedSTS corpus, and tested four STS systems on the MedSTS_ann corpus. In the future, we will organize a shared task by releasing the MedSTS_ann corpus to motivate the community to tackle real-world clinical problems.


IEEE International Conference on Healthcare Informatics | 2018

Predicting Practice Setting Using Topic Modeling

Liwei Wang; Yanshan Wang; Feichen Shen; Majid Rastegar-Mojarad; Hongfang Liu

The implementation of problem lists in EHRs has the potential to help practitioners provide customized care to patients. However, how to leverage problem lists in different practice settings to provide tailored care remains an open question, and the bottleneck lies in the association between problem list and practice setting. In this study, we investigated this association and predicted practice setting based on problem list using topic modeling.


JMIR Medical Informatics | 2018

Utilization of Electronic Medical Records and Biomedical Literature to Support Rare Disease Diagnosis (Preprint)

Feichen Shen; Sijia Liu; Yanshan Wang; Andrew Wen; Liwei Wang; Hongfang Liu

Background In the United States, a rare disease is characterized as one affecting no more than 200,000 patients at a given time. Patients suffering from rare diseases are often either misdiagnosed or left undiagnosed, possibly due to insufficient knowledge or experience with the rare disease on the part of clinical practitioners. With an exponentially growing volume of electronically accessible medical data, a large volume of information on thousands of rare diseases and their potentially associated diagnostic information is buried in electronic medical records (EMRs) and medical literature. Objective This study aimed to leverage information contained in heterogeneous datasets to assist rare disease diagnosis. Phenotypic information on patients that exists in EMRs and biomedical literature could be fully leveraged to speed up diagnosis of diseases. Methods In our previous work, we advanced the use of a collaborative filtering recommendation system to support rare disease diagnostic decision making based on phenotypes derived solely from EMR data. However, the influence of using heterogeneous data with collaborative filtering was not discussed, which is an essential problem when facing large volumes of data from various resources. In this study, to further investigate the performance of collaborative filtering on heterogeneous datasets, we studied EMR data generated at Mayo Clinic as well as published article abstracts retrieved from the Semantic MEDLINE Database. Specifically, in this study, we designed different data fusion strategies from heterogeneous resources and integrated them with the collaborative filtering model. Results We evaluated performance of the proposed system using characterizations derived from various combinations of EMR data and literature, as well as with EMR data alone. We extracted nearly 13 million EMRs from the patient cohort generated between 2010 and 2015 at Mayo Clinic and retrieved all article abstracts from the semistructured Semantic MEDLINE Database that were published through the end of 2016. We applied a collaborative filtering model and compared the performance generated by different metrics. Log likelihood ratio similarity combined with k-nearest neighbor on heterogeneous datasets showed the optimal performance in patient recommendation with area under the precision-recall curve (PRAUC) 0.475 (string match), 0.511 (systematized nomenclature of medicine [SNOMED] match), and 0.752 (Genetic and Rare Diseases Information Center [GARD] match). Log likelihood ratio similarity also performed the best with mean average precision 0.465 (string match), 0.5 (SNOMED match), and 0.749 (GARD match). Performance of rare disease prediction was also demonstrated by using the optimal algorithm. Macro-average F-measures for string, SNOMED, and GARD match were 0.32, 0.42, and 0.63, respectively. Conclusions This study demonstrated potential utilization of heterogeneous datasets in a collaborative filtering model to support rare disease diagnosis. In addition to phenotypic-based analysis, in the future, we plan to further resolve the heterogeneity issue and reduce miscommunication between EMR and literature by mining genotypic information to establish a comprehensive disease-phenotype-gene network for rare disease diagnosis.
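The combination of log likelihood ratio similarity with k-nearest neighbor can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: patients are represented as plain phenotype sets, the similarity is squashed into [0, 1) with 1 - 1/(1 + LLR), pairs that overlap no more than chance would predict are scored zero (a choice made for this sketch), and all patient data and labels are invented.

```python
import math


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) statistic for a 2x2 contingency table."""
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1))
    g2 = 0.0
    for count, i, j in cells:
        if count > 0:  # a zero cell contributes nothing (0 * log 0 := 0)
            g2 += count * math.log(count * n / (rows[i] * cols[j]))
    return 2.0 * g2


def llr_similarity(a, b, n_phenotypes):
    """Similarity of two phenotype sets over a vocabulary of n_phenotypes."""
    k11 = len(a & b)
    k12 = len(a - b)
    k21 = len(b - a)
    k22 = n_phenotypes - k11 - k12 - k21
    # Assumption of this sketch: only reward overlap above chance level.
    if k11 * n_phenotypes <= len(a) * len(b):
        return 0.0
    return 1.0 - 1.0 / (1.0 + llr(k11, k12, k21, k22))


def recommend_diagnoses(query, patients, n_phenotypes, k=2):
    """Rank diagnoses by summed similarity over the k most similar patients.

    patients maps patient_id -> (phenotype_set, diagnosis).
    """
    sims = sorted(
        ((llr_similarity(query, phen, n_phenotypes), pid)
         for pid, (phen, _) in patients.items()),
        key=lambda item: (-item[0], item[1]),
    )
    votes = {}
    for sim, pid in sims[:k]:
        diagnosis = patients[pid][1]
        votes[diagnosis] = votes.get(diagnosis, 0.0) + sim
    return sorted(votes, key=lambda d: (-votes[d], d))


# Toy, invented patients over a vocabulary of 10 phenotype terms.
patients = {
    "p1": ({"ph1", "ph2", "ph3"}, "disease A"),
    "p2": ({"ph1", "ph2"}, "disease A"),
    "p3": ({"ph7", "ph8"}, "disease B"),
}
query = {"ph1", "ph2", "ph4"}
suggestions = recommend_diagnoses(query, patients, n_phenotypes=10, k=2)
```

In a setting like the one described, the phenotype sets could come from EMR notes, literature, or a fusion of both, which is where the data fusion strategies mentioned in the abstract would plug in.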

Collaboration


Dive into Liwei Wang's collaboration.

Top Co-Authors


Sijia Liu

University at Buffalo


Xusheng Liu

Guangzhou University of Chinese Medicine


Yuqun Zeng

Guangzhou University of Chinese Medicine
