
Publication


Featured research published by Yanshan Wang.


Journal of Biomedical Informatics | 2018

Clinical information extraction applications: A literature review

Yanshan Wang; Liwei Wang; Majid Rastegar-Mojarad; Sungrim Moon; Feichen Shen; Naveed Afzal; Sijia Liu; Yuqun Zeng; Saeed Mehrabi; Sunghwan Sohn; Hongfang Liu

BACKGROUND With the rapid adoption of electronic health records (EHRs), it is desirable to harvest information and knowledge from EHRs to support automated systems at the point of care and to enable secondary use of EHRs for clinical and translational research. One critical component used to facilitate the secondary use of EHR data is the information extraction (IE) task, which automatically extracts and encodes clinical information from text. OBJECTIVES In this literature review, we present a review of recently published research on clinical information extraction (IE) applications. METHODS A literature search was conducted for articles published from January 2009 to September 2016 based on Ovid MEDLINE In-Process & Other Non-Indexed Citations, Ovid MEDLINE, Ovid EMBASE, Scopus, Web of Science, and ACM Digital Library. RESULTS A total of 1917 publications were identified for title and abstract screening. Of these publications, 263 articles were selected and discussed in this review in terms of publication venues and data sources, clinical IE tools, methods, and applications in the areas of disease- and drug-related studies, and clinical workflow optimizations. CONCLUSIONS Clinical IE has been used for a wide range of applications; however, there is a considerable gap between clinical studies using EHR data and studies using clinical IE. This study enabled us to gain a more concrete understanding of the gap and to provide potential solutions to bridge this gap.


North American Chapter of the Association for Computational Linguistics | 2016

MayoNLP at SemEval-2016 task 1: Semantic textual similarity based on lexical semantic net and deep learning semantic model

Naveed Afzal; Yanshan Wang; Hongfang Liu

Given two sentences, participating systems assign a semantic similarity score in the range of 0-5. We applied two different techniques for the task: one is based on lexical semantic net (corresponding to run 1) and the other is based on deep learning semantic model (corresponding to run 2). We also combined these two runs linearly (corresponding to run 3). Our results indicate that the two techniques perform comparably while the combination outperforms the individual ones on four out of five datasets, namely answer-answer, headlines, plagiarism, and question-question, and on the overall weighted mean of STS 2016 and 2015 datasets.
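
The linear combination of two runs mentioned in this abstract can be illustrated with a minimal sketch. The function name `combine_scores`, the `weight` parameter, and the sample scores are assumptions made for illustration; the abstract does not state which weights the authors used.

```python
def combine_scores(run1, run2, weight=0.5):
    """Linearly combine two lists of similarity scores on the 0-5 scale.

    `weight` is a hypothetical mixing coefficient applied to run1;
    run2 receives (1 - weight).
    """
    if len(run1) != len(run2):
        raise ValueError("both runs must score the same sentence pairs")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * a + (1.0 - weight) * b for a, b in zip(run1, run2)]


# Made-up scores for three sentence pairs (not data from the paper).
lexical_run = [4.0, 1.0, 3.0]
deep_run = [5.0, 2.0, 2.0]
combined = combine_scores(lexical_run, deep_run, weight=0.5)
print(combined)
```

Because the weights sum to one, the combined score stays within the same 0-5 range as the inputs.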


Biomedical Informatics Insights | 2016

Toward a Learning Health-care System – Knowledge Delivery at the Point of Care Empowered by Big Data and NLP

Vinod Kaggal; Ravikumar Komandur Elayavilli; Saeed Mehrabi; Joshua J. Pankratz; Sunghwan Sohn; Yanshan Wang; Dingcheng Li; Majid Mojarad Rastegar; Sean P. Murphy; Jason L. Ross; Rajeev Chaudhry; James D. Buntrock; Hongfang Liu

The concept of optimizing health care by understanding and generating knowledge from previous evidence, ie, the Learning Health-care System (LHS), has gained momentum and now has national prominence. Meanwhile, the rapid adoption of electronic health records (EHRs) enables the data collection required to form the basis for facilitating LHS. A prerequisite for using EHR data within the LHS is an infrastructure that enables access to EHR data longitudinally for health-care analytics and real time for knowledge delivery. Additionally, significant clinical information is embedded in the free text, making natural language processing (NLP) an essential component in implementing an LHS. Herein, we share our institutional implementation of a big data-empowered clinical NLP infrastructure, which not only enables health-care analytics but also has real-time NLP processing capability. The infrastructure has been utilized for multiple institutional projects including the MayoExpertAdvisor, an individualized care recommendation solution for clinical care. We compared the advantages of big data over two other environments. Big data infrastructure significantly outperformed other infrastructure in terms of computing speed, demonstrating its value in making the LHS a possibility in the near future.


International Conference on Bioinformatics | 2015

A frequency-filtering strategy of obtaining PHI-free sentences from clinical data repository

Dingcheng Li; Majid Rastegar-Mojarad; Ravikumar Komandur Elayavilli; Yanshan Wang; Saeed Mehrabi; Yue Yu; Sunghwan Sohn; Yanpeng Li; Naveed Afzal; Hongfang Liu

Clinical natural language processing (NLP) has become indispensable in the secondary use of electronic medical records (EMRs). However, current clinical NLP tools face the problem of portability among different institutes. An ideal solution to this problem is cross-institutional data sharing. However, the legal prohibition on revealing protected health information (PHI) obstructs this practice even with the availability of state-of-the-art de-identification tools. In this paper, we investigated the use of a frequency-filtering approach to extract PHI-free sentences utilizing the Enterprise Data Trust (EDT), a large collection of EMRs at Mayo Clinic. Our approach is based on the assumption that sentences appearing frequently tend to contain no PHI. This assumption originates from the observation that there exist a large number of redundant descriptions of similar patient conditions in EDT. Both manual and automatic evaluations on the sentence set with frequencies higher than one show that no PHI was found. The promising results demonstrate the potential of sharing highly frequent sentences among institutes.
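
The frequency-filtering idea in this abstract can be sketched as follows. This is only an illustration: the function name `frequent_sentences`, the whitespace and lowercase normalization, and the sample sentences are assumptions, and the abstract does not describe how the authors tokenized or matched sentences.

```python
from collections import Counter


def frequent_sentences(sentences, min_count=2):
    """Return sentences that occur at least `min_count` times.

    min_count=2 mirrors the abstract's "frequencies higher than one".
    Normalizing by strip() and lower() is an assumption for this sketch.
    """
    counts = Counter(s.strip().lower() for s in sentences)
    return sorted(s for s, c in counts.items() if c >= min_count)


# Invented example sentences, not drawn from any clinical repository.
notes = [
    "No fever.",
    "no fever. ",
    "Patient denies chest pain.",
    "Seen by Dr. X on 1/2.",
    "Patient denies chest pain.",
]
print(frequent_sentences(notes))
```

The sentence that appears only once is dropped, which is the behavior the abstract's assumption relies on: one-off sentences are more likely to carry patient-specific details.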


Journal of the American Medical Informatics Association | 2018

Clinical documentation variations and NLP system portability: a case study in asthma birth cohorts across institutions

Sunghwan Sohn; Yanshan Wang; Chung Il Wi; Elizabeth Krusemark; Euijung Ryu; Mir H. Ali; Young J. Juhn; Hongfang Liu

Abstract Objective To assess clinical documentation variations across health care institutions using different electronic medical record systems and investigate how they affect natural language processing (NLP) system portability. Materials and Methods Birth cohorts from Mayo Clinic and Sanford Children’s Hospital (SCH) were used in this study (n = 298 for each). Documentation variations regarding asthma between the 2 cohorts were examined in various aspects: (1) overall corpus at the word level (ie, lexical variation), (2) topics and asthma-related concepts (ie, semantic variation), and (3) clinical note types (ie, process variation). We compared those statistics and explored NLP system portability for asthma ascertainment in 2 stages: prototype and refinement. Results There exist notable lexical variations (word-level similarity = 0.669) and process variations (differences in major note types containing asthma-related concepts). However, semantic-level corpora were relatively homogeneous (topic similarity = 0.944, asthma-related concept similarity = 0.971). The NLP system for asthma ascertainment had an F-score of 0.937 at Mayo, and produced 0.813 (prototype) and 0.908 (refinement) when applied at SCH. Discussion The criteria for asthma ascertainment are largely dependent on asthma-related concepts. Therefore, we believe that semantic similarity is important to estimate NLP system portability. As the Mayo Clinic and SCH corpora were relatively homogeneous at a semantic level, the NLP system, developed at Mayo Clinic, was imported to SCH successfully with proper adjustments to deal with the intrinsic corpus heterogeneity.


Bioinformatics and Biomedicine | 2017

Medical concept intersection between outside medical records and consultant notes: A case study in transferred cardiovascular patients

Sungrim Moon; Sijia Liu; Paul Kingsbury; David Chen; Yanshan Wang; Feichen Shen; Rajeev Chaudhry; Hongfang Liu

One of the promises of “meaningful use” of Electronic Health Records (EHRs) is to facilitate digital information exchange between healthcare providers through continuity of care documents. Despite such promise, outside medical records (OMRs) of referral patients, including clinical notes, lab test results, or diagnostic test reports, are frequently provided through fax or printout. Moreover, it is not clear how much information in those OMRs is utilized when providing care at the early stage. In this study, we collected clinical concepts automatically from OMRs through optical character recognition (OCR) technology and then performed a quantitative analysis of concepts presented in OMRs and concepts captured in clinical notes at Mayo Clinic. We also investigated information from OMRs not captured in initial consultant notes but presented over subsequent consultant notes. We found that 12.93% of concepts from OMRs were identified in clinical documents within three months. Among those overlapping concepts, 26.74% were not captured in initial consultant notes. Our study shows that clinical information from OMRs is important for patient care. Also, the delayed presence of information in clinical notes may indicate that important information from OMRs is not fully utilized earlier in the care.


IEEE International Conference on Healthcare Informatics | 2015

Retrieval of Semantically Similar Healthcare Questions in Healthcare Forums

Yanshan Wang; Saeed Mehrabi; Majid Rastegar Mojarad; Dingcheng Li; Hongfang Liu

Healthcare forums are popular platforms for patients to communicate with other patients who have similar conditions. Retrieving posts similar to a user's question is valuable, as the question might already have been answered in similar threads. ICHI 2015 organized a challenge with a corpus of ninety-five selected questions and a query set of ten questions from various diabetes-related online forums. The task in this challenge is, given the corpus, to develop a system that generates the three most similar questions for each question in the query set. To accomplish the challenge, we utilized Elasticsearch to build and search an index based on tokens, UMLS concepts, and semantic types, and finally combined the ranking results with LDI, a Latent Dirichlet Allocation (LDA) based ranking method. The experimental results showed that a mean average precision of 0.72 was achieved on the manually created gold standard.
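
The mean average precision reported in this abstract can be illustrated with a short sketch of the common textbook definition. The function names and the toy rankings are assumptions; the challenge may have used a variant of the metric that the abstract does not specify.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked list against a set of relevant items.

    Precision is taken at each rank where a relevant item appears, and
    the sum is divided by the total number of relevant items.
    """
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """Mean of average precision over (ranked_list, relevant_set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy queries with hypothetical question identifiers.
runs = [
    (["q1", "q2", "q3"], {"q1", "q3"}),  # AP = (1/1 + 2/3) / 2
    (["q9", "q4"], {"q4"}),              # AP = (1/2) / 1
]
print(round(mean_average_precision(runs), 4))
```

A system that places relevant questions nearer the top of each three-item list receives a higher score, which is why the metric suits a "three most similar questions" task.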


International Conference on Bioinformatics | 2018

BioCreative/OHNLP Challenge 2018

Majid Rastegar-Mojarad; Sijia Liu; Yanshan Wang; Naveed Afzal; Liwei Wang; Feichen Shen; Sunyang Fu; Hongfang Liu

The application of Natural Language Processing (NLP) methods and resources to clinical and biomedical text has received growing attention over the past years, but progress has been limited by difficulties in accessing shared tools and resources, partially caused by patient privacy and data confidentiality constraints. Efforts to increase sharing and interoperability of the few existing resources are needed to facilitate the progress observed in the general NLP domain. Leveraging our research in corpus analysis and de-identification, we have created multiple synthetic data sets for a couple of NLP tasks based on real clinical sentences. We are organizing a challenge workshop to promote community efforts towards the advancement of clinical NLP. The challenge workshop will have two tasks: 1) Family History Information Extraction; and 2) Clinical Semantic Textual Similarity.


Journal of Medical Internet Research | 2017

Recommending Education Materials for Diabetic Questions Using Information Retrieval Approaches

Yuqun Zeng; Xusheng Liu; Yanshan Wang; Feichen Shen; Sijia Liu; Majid Rastegar-Mojarad; Liwei Wang; Hongfang Liu

Background Self-management is crucial to diabetes care and providing expert-vetted content for answering patients’ questions is crucial in facilitating patient self-management. Objective The aim is to investigate the use of information retrieval techniques in recommending patient education materials for diabetic questions of patients. Methods We compared two retrieval algorithms, one based on Latent Dirichlet Allocation topic modeling (topic modeling-based model) and one based on semantic group (semantic group-based model), with the baseline retrieval model, the vector space model (VSM), in recommending diabetic patient education materials to diabetic questions posted on the TuDiabetes forum. The evaluation was based on a gold standard dataset consisting of 50 randomly selected diabetic questions where the relevancy of diabetic education materials to the questions was manually assigned by two experts. The performance was assessed using precision of top-ranked documents. Results We retrieved 7510 diabetic questions on the forum and 144 diabetic patient educational materials from the patient education database at Mayo Clinic. The mapping rate of words in each corpus mapped to the Unified Medical Language System (UMLS) was significantly different (P<.001). The topic modeling-based model outperformed the other retrieval algorithms. For example, for the top-retrieved document, the precision of the topic modeling-based, semantic group-based, and VSM models was 67.0%, 62.8%, and 54.3%, respectively. Conclusions This study demonstrated that topic modeling can mitigate the vocabulary difference and it achieved the best performance in recommending education materials for answering patients’ questions. One direction for future work is to assess the generalizability of our findings and to extend our study to other disease areas, other patient education material resources, and online forums.
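
The vector space model baseline named in this abstract can be sketched with a bag-of-words cosine ranking. This is a simplified illustration that assumes raw term frequencies and whitespace tokenization; the abstract does not describe the term weighting the authors used, and the document names and texts below are invented.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_documents(query, docs):
    """Rank document names by cosine similarity to the query, best first.

    Ties are broken alphabetically by name so the order is deterministic.
    """
    q = Counter(query.lower().split())
    scored = [
        (cosine(q, Counter(text.lower().split())), name)
        for name, text in docs.items()
    ]
    return [name for _, name in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Hypothetical education materials and patient question.
materials = {
    "insulin_guide": "how to store insulin safely",
    "diet_tips": "healthy diet tips for diabetes",
}
print(rank_documents("how should i store insulin", materials))
```

A model of this kind matches only on shared words, which is the vocabulary gap between patient questions and education materials that the abstract says topic modeling helped to mitigate.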


Database | 2017

Leveraging word embeddings and medical entity extraction for biomedical dataset retrieval using unstructured texts

Yanshan Wang; Majid Rastegar-Mojarad; Ravikumar Komandur-Elayavilli; Hongfang Liu

Abstract The recent movement towards open data in the biomedical domain has generated a large number of datasets that are publicly accessible. The Big Data to Knowledge data indexing project, biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE), has gathered these datasets in a one-stop portal aiming at facilitating their reuse for accelerating scientific advances. However, as the number of biomedical datasets stored and indexed increases, it becomes more and more challenging to retrieve the relevant datasets according to researchers’ queries. In this article, we propose an information retrieval (IR) system to tackle this problem and implement it for the bioCADDIE Dataset Retrieval Challenge. The system leverages the unstructured texts of each dataset including the title and description for the dataset, and utilizes a state-of-the-art IR model, medical named entity extraction techniques, query expansion with deep learning-based word embeddings and a re-ranking strategy to enhance the retrieval performance. In empirical experiments, we compared the proposed system with 11 baseline systems using the bioCADDIE Dataset Retrieval Challenge datasets. The experimental results show that the proposed system outperforms other systems in terms of inference Average Precision and inference normalized Discounted Cumulative Gain, implying that the proposed system is a viable option for biomedical dataset retrieval. Database URL: https://github.com/yanshanwang/biocaddie2016mayodata
