Publication


Featured research published by Sijia Liu.


Journal of Biomedical Informatics | 2018

Clinical information extraction applications: A literature review

Yanshan Wang; Liwei Wang; Majid Rastegar-Mojarad; Sungrim Moon; Feichen Shen; Naveed Afzal; Sijia Liu; Yuqun Zeng; Saeed Mehrabi; Sunghwan Sohn; Hongfang Liu

BACKGROUND: With the rapid adoption of electronic health records (EHRs), it is desirable to harvest information and knowledge from EHRs to support automated systems at the point of care and to enable secondary use of EHRs for clinical and translational research. One critical component used to facilitate the secondary use of EHR data is the information extraction (IE) task, which automatically extracts and encodes clinical information from text. OBJECTIVES: In this literature review, we present a review of recently published research on clinical information extraction (IE) applications. METHODS: A literature search was conducted for articles published from January 2009 to September 2016 based on Ovid MEDLINE In-Process & Other Non-Indexed Citations, Ovid MEDLINE, Ovid EMBASE, Scopus, Web of Science, and ACM Digital Library. RESULTS: A total of 1917 publications were identified for title and abstract screening. Of these publications, 263 articles were selected and discussed in this review in terms of publication venues and data sources, clinical IE tools, methods, and applications in the areas of disease- and drug-related studies, and clinical workflow optimizations. CONCLUSIONS: Clinical IE has been used for a wide range of applications; however, there is a considerable gap between clinical studies using EHR data and studies using clinical IE. This study enabled us to gain a more concrete understanding of the gap and to provide potential solutions to bridge this gap.


PLOS ONE | 2018

Systematic identification of latent disease-gene associations from PubMed articles

Yuji Zhang; Feichen Shen; Majid Rastegar Mojarad; Dingcheng Li; Sijia Liu; Cui Tao; Yue Yu; Hongfang Liu

Recent scientific advances have accumulated a tremendous amount of biomedical knowledge providing novel insights into the relationship between molecular and cellular processes and diseases. Literature mining is one of the commonly used methods to retrieve and extract information from scientific publications for understanding these associations. However, due to large data volume and complicated, noisy associations, the interpretability of such association data for semantic knowledge discovery is challenging. In this study, we describe an integrative computational framework aiming to expedite the discovery of latent disease mechanisms by dissecting 146,245 disease-gene associations from over 25 million PubMed-indexed articles. We take advantage of both Latent Dirichlet Allocation (LDA) modeling and network-based analysis for their capabilities of detecting latent associations and reducing noise in large-volume data, respectively. Our results demonstrate that (1) the LDA-based modeling is able to group similar diseases into disease topics; (2) the disease-specific association networks follow the scale-free network property; (3) certain subnetwork patterns were enriched in the disease-specific association networks; and (4) genes were enriched in topic-specific biological processes. Our approach offers promising opportunities for latent disease-gene knowledge discovery in biomedical research.


International Conference on Bioinformatics | 2017

Dependency and AMR Embeddings for Drug-Drug Interaction Extraction from Biomedical Literature

Yanshan Wang; Sijia Liu; Majid Rastegar-Mojarad; Liwei Wang; Feichen Shen; Fei Liu; Hongfang Liu

Drug-drug interaction (DDI) is an unexpected change in a drug's effect on the human body when the drug and a second drug are co-prescribed and taken together. As many DDIs are frequently reported in biomedical literature, it is important to mine DDI information from literature to keep DDI knowledge up to date. One of the SemEval challenges in 2011 and 2013 was designed to tackle this task, where the best system achieved an F1 score of 0.80. In this paper, we propose to utilize dependency embeddings and Abstract Meaning Representation (AMR) embeddings as features for extracting DDIs. Our contribution is two-fold. First, we employed dependency embeddings, previously shown effective for sentence classification, for DDI extraction. The dependency embeddings incorporated structural syntactic contexts into the embeddings, which were not present in the conventional word embeddings. Second, we proposed a novel syntactic embedding approach using AMR. AMR aims to abstract away from syntactic idiosyncrasies and attempts to capture only the core meaning of a sentence, which could potentially improve DDI extraction from sentences. Two classifiers (Support Vector Machine and Random Forest) taking these embedding features as input were evaluated on the DDIExtraction 2013 challenge corpus. The experimental results show the effectiveness of dependency and AMR embeddings in the DDI extraction task. The best performance was obtained by combining word, dependency and AMR embeddings (F1 score=0.84).


Bioinformatics and Biomedicine | 2017

Medical concept intersection between outside medical records and consultant notes: A case study in transferred cardiovascular patients

Sungrim Moon; Sijia Liu; Paul Kingsbury; David Chen; Yanshan Wang; Feichen Shen; Rajeev Chaudhry; Hongfang Liu

One of the promises of “meaningful use” of Electronic Health Records (EHRs) is to facilitate digital information exchange between healthcare providers through continuity of care documents. Despite such promise, outside medical records (OMRs) of referral patients, including clinical notes, lab test results or diagnostic test reports, are frequently provided through fax or printout. Moreover, it is not clear how much information in those OMRs is utilized when providing care at the early stage. In this study, we collected clinical concepts automatically from OMRs through optical character recognition (OCR) technology and then performed a quantitative analysis of concepts presented in OMRs and concepts captured in clinical notes at Mayo Clinic. We also investigated information from OMRs not captured in initial consultant notes but presented over subsequent consultant notes. We found that 12.93% of concepts from OMRs were identified in clinical documents within three months. Among those overlapping concepts, 26.74% were not captured in initial consultant notes. Our study shows that clinical information from OMRs is important for patient care. Also, the delayed presence of information in clinical notes may indicate that important information from OMRs is not fully utilized earlier in the care.


Journal of the Association for Information Science and Technology | 2017

Intrainstitutional EHR collections for patient-level information retrieval

Stephen T. Wu; Sijia Liu; Yanshan Wang; Tamara Timmons; Harsha Uppili; Steven Bedrick; William R. Hersh; Hongfang Liu

Research in clinical information retrieval has long been stymied by the lack of open resources. However, both clinical information retrieval research innovation and legitimate privacy concerns can be served by the creation of intrainstitutional, fully protected resources. In this article, we provide some principles and tools for information retrieval resource‐building in the unique problem setting of patient‐level information retrieval, following the tradition of the Cranfield paradigm. We further include an analysis of parallel information retrieval resources at Oregon Health & Science University and Mayo Clinic that were built on these principles.


International Conference on Bioinformatics | 2018

BioCreative/OHNLP Challenge 2018

Majid Rastegar-Mojarad; Sijia Liu; Yanshan Wang; Naveed Afzal; Liwei Wang; Feichen Shen; Sunyang Fu; Hongfang Liu

The application of Natural Language Processing (NLP) methods and resources to clinical and biomedical text has received growing attention over the past years, but progress has been limited by difficulties in accessing shared tools and resources, partially caused by patient privacy and data confidentiality constraints. Efforts to increase sharing and interoperability of the few existing resources are needed to facilitate the progress observed in the general NLP domain. Leveraging our research in corpus analysis and de-identification, we have created multiple synthetic data sets for a couple of NLP tasks based on real clinical sentences. We are organizing a challenge workshop to promote community efforts towards the advancement of clinical NLP. The challenge workshop will have two tasks: 1) Family History Information Extraction; and 2) Clinical Semantic Textual Similarity.


16th World Congress of Medical and Health Informatics: Precision Healthcare through Informatics, MedInfo 2017 | 2017

Aligned-layer text search in clinical notes

Stephen T. Wu; Andrew Wen; Yanshan Wang; Sijia Liu; Hongfang Liu

Search techniques in clinical text need to make fine-grained semantic distinctions, since medical terms may be negated, about someone other than the patient, or at some time other than the present. While natural language processing (NLP) approaches address these fine-grained distinctions, a task like patient cohort identification from electronic health records (EHRs) simultaneously requires a much more coarse-grained combination of evidence from the text and structured data of each patient’s health records. We thus introduce aligned-layer language models, a novel approach to information retrieval (IR) that incorporates the output of other NLP systems. We show that this framework is able to represent standard IR queries, formulate previously impossible multi-layered queries, and customize the desired degree of linguistic granularity.


Journal of Biomedical Informatics | 2018

Modeling asynchronous event sequences with RNNs

Stephen T. Wu; Sijia Liu; Sunghwan Sohn; Sungrim Moon; Chung-Il Wi; Young J. Juhn; Hongfang Liu

Sequences of events have often been modeled with computational techniques, but typical preprocessing steps and problem settings do not explicitly address the ramifications of timestamped events. Clinical data, such as is found in electronic health records (EHRs), typically comes with timestamp information. In this work, we define event sequences and their properties: synchronicity, evenness, and co-cardinality; we then show how asynchronous, uneven, and multi-cardinal problem settings can support explicit accountings of relative time. Our evaluation uses the temporally sensitive clinical use case of pediatric asthma, which is a chronic disease with symptoms (and lack thereof) evolving over time. We show several approaches to explicitly incorporating relative time into a recurrent neural network (RNN) model that improve the overall classification of patients into those with no asthma, those with persistent asthma, those in long-term remission, and those who have experienced relapse. We also compare and contrast these results with those in an inpatient intensive care setting.


Journal of Biomedical Informatics | 2018

A comparison of word embeddings for the biomedical natural language processing

Yanshan Wang; Sijia Liu; Naveed Afzal; Majid Rastegar-Mojarad; Liwei Wang; Feichen Shen; Paul Kingsbury; Hongfang Liu

BACKGROUND: Word embeddings have been prevalently used in biomedical Natural Language Processing (NLP) applications due to the ability of vector representations to capture useful semantic properties and linguistic relationships between words. Different textual resources (e.g., Wikipedia and biomedical literature corpus) have been utilized in biomedical NLP to train word embeddings, and these word embeddings have been commonly leveraged as feature input to downstream machine learning models. However, there has been little work on evaluating the word embeddings trained from different textual resources. METHODS: In this study, we empirically evaluated word embeddings trained from four different corpora, namely clinical notes, biomedical publications, Wikipedia, and news. For the former two resources, we trained word embeddings using unstructured electronic health record (EHR) data available at Mayo Clinic and articles (MedLit) from PubMed Central, respectively. For the latter two resources, we used publicly available pre-trained word embeddings, GloVe and Google News. The evaluation was done qualitatively and quantitatively. For the qualitative evaluation, we randomly selected medical terms from three categories (i.e., disorder, symptom, and drug), and manually inspected the five most similar words computed by embeddings for each term. We also analyzed the word embeddings through a 2-dimensional visualization plot of 377 medical terms. For the quantitative evaluation, we conducted both intrinsic and extrinsic evaluation. For the intrinsic evaluation, we evaluated the word embeddings' ability to capture medical semantics by measuring the semantic similarity between medical terms using four published datasets: Pedersen's dataset, Hliaoutakis's dataset, MayoSRS, and UMNSRS. For the extrinsic evaluation, we applied word embeddings to multiple downstream biomedical NLP applications, including clinical information extraction (IE), biomedical information retrieval (IR), and relation extraction (RE), with data from shared tasks. RESULTS: The qualitative evaluation shows that the word embeddings trained from EHR and MedLit can find more similar medical terms than those trained from GloVe and Google News. The intrinsic quantitative evaluation verifies that the semantic similarity captured by the word embeddings trained from EHR is closer to human experts' judgments on all four tested datasets. The extrinsic quantitative evaluation shows that the word embeddings trained on EHR achieved the best F1 score of 0.900 for the clinical IE task; no word embeddings improved the performance for the biomedical IR task; and the word embeddings trained on Google News had the best overall F1 score of 0.790 for the RE task. CONCLUSION: Based on the evaluation results, we can draw the following conclusions. First, the word embeddings trained from EHR and MedLit can capture the semantics of medical terms better, and find semantically relevant medical terms closer to human experts' judgments than those trained from GloVe and Google News. Second, there does not exist a consistent global ranking of word embeddings for all downstream biomedical NLP applications. However, adding word embeddings as extra features will improve results on most downstream tasks. Finally, the word embeddings trained from the biomedical domain corpora do not necessarily have better performance than those trained from the general domain corpora for any downstream biomedical NLP task.
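
The intrinsic evaluation described in this abstract can be illustrated with a minimal sketch. It assumes, as is common for this kind of benchmark but not stated in the abstract, that model similarity is the cosine between term vectors and that agreement with human ratings is summarized by Spearman rank correlation. The embeddings, term pairs, ratings, and function names below are hypothetical toy values, not data from the study.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of ranks."""
    return pearson(ranks(x), ranks(y))

def intrinsic_score(embeddings, rated_pairs):
    """Correlate model similarities with human ratings.

    Pairs with an out-of-vocabulary term are skipped (an assumption).
    """
    model, human = [], []
    for term_a, term_b, rating in rated_pairs:
        if term_a in embeddings and term_b in embeddings:
            model.append(cosine(embeddings[term_a], embeddings[term_b]))
            human.append(rating)
    return spearman(model, human)

# Hypothetical 2-dimensional embeddings and human similarity ratings.
toy_embeddings = {
    "term_a": [1.0, 0.0],
    "term_b": [1.0, 0.1],
    "term_c": [0.0, 1.0],
    "term_d": [1.0, 1.0],
}
toy_pairs = [
    ("term_a", "term_b", 4.0),
    ("term_a", "term_d", 2.5),
    ("term_a", "term_c", 1.0),
    ("term_a", "missing_term", 3.0),  # skipped: not in the vocabulary
]
toy_score = intrinsic_score(toy_embeddings, toy_pairs)
```

A score near 1 means the embedding ranks term pairs in the same order as the human raters; comparing this score across embeddings trained on different corpora is the kind of comparison the abstract reports.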


JMIR Medical Informatics | 2018

Utilization of Electronic Medical Records and Biomedical Literature to Support Rare Disease Diagnosis (Preprint)

Feichen Shen; Sijia Liu; Yanshan Wang; Andrew Wen; Liwei Wang; Hongfang Liu

Background: In the United States, a rare disease is characterized as one affecting no more than 200,000 patients at a given time. Patients suffering from rare diseases are often either misdiagnosed or left undiagnosed, possibly due to insufficient knowledge or experience with the rare disease on the part of clinical practitioners. With an exponentially growing volume of electronically accessible medical data, a large volume of information on thousands of rare diseases and their potentially associated diagnostic information is buried in electronic medical records (EMRs) and medical literature. Objective: This study aimed to leverage information contained in heterogeneous datasets to assist rare disease diagnosis. Phenotypic information about patients that exists in EMRs and biomedical literature could be fully leveraged to speed up diagnosis of diseases. Methods: In our previous work, we advanced the use of a collaborative filtering recommendation system to support rare disease diagnostic decision making based on phenotypes derived solely from EMR data. However, the influence of using heterogeneous data with collaborative filtering was not discussed, which is an essential problem when facing large volumes of data from various resources. In this study, to further investigate the performance of collaborative filtering on heterogeneous datasets, we studied EMR data generated at Mayo Clinic as well as published article abstracts retrieved from the Semantic MEDLINE Database. Specifically, we designed different data fusion strategies from heterogeneous resources and integrated them with the collaborative filtering model. Results: We evaluated performance of the proposed system using characterizations derived from various combinations of EMR data and literature, as well as with EMR data alone. We extracted nearly 13 million EMRs from the patient cohort generated between 2010 and 2015 at Mayo Clinic and retrieved all article abstracts from the semistructured Semantic MEDLINE Database that were published till the end of 2016. We applied a collaborative filtering model and compared the performance generated by different metrics. Log likelihood ratio similarity combined with k-nearest neighbor on heterogeneous datasets showed the optimal performance in patient recommendation with area under the precision-recall curve (PRAUC) 0.475 (string match), 0.511 (systematized nomenclature of medicine [SNOMED] match), and 0.752 (Genetic and Rare Diseases Information Center [GARD] match). Log likelihood ratio similarity also performed the best with mean average precision 0.465 (string match), 0.5 (SNOMED match), and 0.749 (GARD match). Performance of rare disease prediction was also demonstrated by using the optimal algorithm. Macro-average F-measure for string, SNOMED, and GARD match were 0.32, 0.42, and 0.63, respectively. Conclusions: This study demonstrated potential utilization of heterogeneous datasets in a collaborative filtering model to support rare disease diagnosis. In addition to phenotypic-based analysis, in the future, we plan to further resolve the heterogeneity issue and reduce miscommunication between EMR and literature by mining genotypic information to establish a comprehensive disease-phenotype-gene network for rare disease diagnosis.
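
Mean average precision, one of the metrics reported in this abstract, can be sketched as follows. This is a generic textbook formulation, not the study's evaluation code: it assumes each query yields a ranked list of recommendations and a set of correct answers, and it divides by the total number of correct answers (some variants divide by the number retrieved instead). The disease identifiers and function names are hypothetical.

```python
def average_precision(ranked, relevant):
    """Average of precision@k taken at each rank k holding a relevant item.

    Normalized by the total number of relevant items (an assumption;
    other conventions exist).
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant)

def mean_average_precision(queries):
    """Mean of average precision over (ranked_list, relevant_set) pairs."""
    if not queries:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)

# Hypothetical ranked recommendations for two patients.
toy_queries = [
    (["disease_1", "disease_2", "disease_3"], {"disease_1", "disease_3"}),
    (["disease_4", "disease_5"], {"disease_5"}),
]
toy_map = mean_average_precision(toy_queries)
```

In the first toy query the relevant items sit at ranks 1 and 3, so the precisions averaged are 1/1 and 2/3; placing correct answers higher in the list raises the score.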
