
Publication


Featured research published by Guoqian Jiang.


Journal of the American Medical Informatics Association | 2013

A semantic-web oriented representation of the clinical element model for secondary use of electronic health records data

Cui Tao; Guoqian Jiang; Thomas A. Oniki; Robert R. Freimuth; Qian Zhu; Deepak K. Sharma; Jyotishman Pathak; Stanley M. Huff; Christopher G. Chute

The clinical element model (CEM) is an information model designed for representing clinical information in electronic health record (EHR) systems across organizations. The current representation of CEMs does not support formal semantic definitions, and therefore it is not possible to perform reasoning and consistency checking on derived models. This paper introduces our efforts to represent the CEM specification using the Web Ontology Language (OWL). The CEM-OWL representation connects the CEM content with the Semantic Web environment, which provides authoring, reasoning, and querying tools. This work may also facilitate the harmonization of the CEMs with domain knowledge represented in terminology models as well as other clinical information models such as the openEHR archetype model. We have created the CEM-OWL meta ontology based on the CEM specification. A converter has been implemented in Java to automatically translate detailed CEMs from XML to OWL. A panel evaluation has been conducted, and the results show that the OWL modeling can faithfully represent both the CEM specification and patient data.
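
The paper describes a Java converter from detailed CEM XML to OWL. As a rough illustration of what one such translation step might look like, here is a minimal Python sketch using rdflib; the CEM element structure, namespace, and property names are invented for illustration and do not reproduce the actual CEM-OWL meta ontology.

```python
# Minimal sketch of an XML-to-OWL translation step with rdflib. The CEM
# element below and the namespace are hypothetical, not the paper's schema.
import xml.etree.ElementTree as ET
from rdflib import Graph, Namespace, RDF, RDFS, Literal
from rdflib.namespace import OWL

CEM = Namespace("http://example.org/cem#")  # hypothetical namespace

cem_xml = """
<cetype name="SystolicBloodPressureMeas" kind="statement">
  <key code="SystolicBloodPressure_KEY"/>
  <data type="PhysicalQuantity"/>
</cetype>
"""

def cem_to_owl(xml_str: str) -> Graph:
    g = Graph()
    g.bind("cem", CEM)
    root = ET.fromstring(xml_str)
    cls = CEM[root.attrib["name"]]
    # Each detailed CEM becomes an OWL class under a generic CEModel class.
    g.add((cls, RDF.type, OWL.Class))
    g.add((cls, RDFS.subClassOf, CEM.CEModel))
    g.add((cls, RDFS.label, Literal(root.attrib["name"])))
    # Child elements (key, data, qualifiers, ...) become object properties
    # whose domain is the new class.
    for child in root:
        prop = CEM[f"has_{child.tag}"]
        g.add((prop, RDF.type, OWL.ObjectProperty))
        g.add((prop, RDFS.domain, cls))
    return g

print(cem_to_owl(cem_xml).serialize(format="turtle"))
```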


Journal of the American Medical Informatics Association | 2009

Auditing the Semantic Completeness of SNOMED CT Using Formal Concept Analysis

Guoqian Jiang; Christopher G. Chute

OBJECTIVE This study sought to develop and evaluate an approach for auditing the semantic completeness of SNOMED CT content using a formal concept analysis (FCA)-based model. DESIGN We developed a model for formalizing the normal forms of SNOMED CT expressions using FCA. Anonymous nodes, identified through the analyses, were retrieved from the model for evaluation. Two quasi-Poisson regression models were developed: one to test whether anonymous nodes can evaluate the semantic completeness of SNOMED CT content (Model 1), and one to test whether such completeness differs between two clinical domains (Model 2). The data were randomly sampled from all the contexts that could be formed in the two largest domains: Procedure and Clinical Finding. Case studies (n = 4) were performed on randomly selected anonymous node samples for validation. MEASUREMENTS In Model 1, the outcome variable is the number of fully defined concepts within a context, while the explanatory variables are the number of lattice nodes and the number of anonymous nodes. In Model 2, the outcome variable is the number of anonymous nodes, and the explanatory variables are the number of lattice nodes and a binary category for domain (Procedure/Clinical Finding). RESULTS A total of 5,450 contexts from the two domains were collected for analysis. Our findings revealed that the number of anonymous nodes had a significant negative correlation with the number of fully defined concepts within a context (p < 0.001). Further, the Clinical Finding domain had fewer anonymous nodes than the Procedure domain (p < 0.001). Case studies demonstrated that anonymous nodes are an effective index for auditing SNOMED CT. CONCLUSION The anonymous nodes retrieved from FCA-based analyses are a candidate proxy for the semantic completeness of SNOMED CT content. Our novel FCA-based approach can be useful for auditing the semantic completeness of SNOMED CT, or any large ontology, within or across domains.
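
To make the FCA machinery concrete, the sketch below enumerates the formal concepts of a tiny made-up context and flags lattice intents that match no named concept's own attribute set, which is one plausible reading of the paper's "anonymous nodes". The toy context is invented; SNOMED CT normal forms are far richer than this.

```python
# Sketch of the FCA step: enumerate formal concepts of a toy context and flag
# "anonymous" intents carried by no named concept. Data is invented.
from itertools import combinations

# Formal context: named concepts (objects) -> defining attributes.
context = {
    "AppendicitisNOS":   {"inflammation", "appendix"},
    "AcuteAppendicitis": {"inflammation", "appendix", "acute"},
    "AcuteGastritis":    {"inflammation", "stomach", "acute"},
}

def intent_closure(objs, ctx):
    """Attributes shared by every object in objs (the intent)."""
    sets = [ctx[o] for o in objs]
    return set.intersection(*sets) if sets else set()

# Enumerate all closed intents by intersecting every subset of objects.
intents = set()
objs = list(context)
for r in range(1, len(objs) + 1):
    for combo in combinations(objs, r):
        intents.add(frozenset(intent_closure(combo, context)))

for intent in sorted(intents, key=len):
    named = [o for o, a in context.items() if a == set(intent)]
    # An intent realized by no fully defined concept is an anonymous node.
    tag = ", ".join(named) if named else "ANONYMOUS"
    print(sorted(intent), "->", tag)
```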


Journal of the American Medical Informatics Association | 2015

Desiderata for computable representations of electronic health records-driven phenotype algorithms

Huan Mo; William K. Thompson; Luke V. Rasmussen; Jennifer A. Pacheco; Guoqian Jiang; Richard C. Kiefer; Qian Zhu; Jie Xu; Enid Montague; David Carrell; Todd Lingren; Frank D. Mentch; Yizhao Ni; Firas H. Wehbe; Peggy L. Peissig; Gerard Tromp; Eric B. Larson; Christopher G. Chute; Jyotishman Pathak; Joshua C. Denny; Peter Speltz; Abel N. Kho; Gail P. Jarvik; Cosmin Adrian Bejan; Marc S. Williams; Kenneth M. Borthwick; Terrie Kitchner; Dan M. Roden; Paul A. Harris

Background Electronic health records (EHRs) are increasingly used for clinical and translational research through the creation of phenotype algorithms. Currently, phenotype algorithms are most commonly represented as noncomputable descriptive documents and knowledge artifacts that detail the protocols for querying diagnoses, symptoms, procedures, medications, and/or text-driven medical concepts, and are primarily meant for human comprehension. We present desiderata for developing a computable phenotype representation model (PheRM). Methods A team of clinicians and informaticians reviewed common features for multisite phenotype algorithms published in PheKB.org and existing phenotype representation platforms. We also evaluated well-known diagnostic criteria and clinical decision-making guidelines to encompass a broader category of algorithms. Results We propose 10 desired characteristics for a flexible, computable PheRM: (1) structure clinical data into queryable forms; (2) recommend use of a common data model, but also support customization for the variability and availability of EHR data among sites; (3) support both human-readable and computable representations of phenotype algorithms; (4) implement set operations and relational algebra for modeling phenotype algorithms; (5) represent phenotype criteria with structured rules; (6) support defining temporal relations between events; (7) use standardized terminologies and ontologies, and facilitate reuse of value sets; (8) define representations for text searching and natural language processing; (9) provide interfaces for external software algorithms; and (10) maintain backward compatibility. Conclusion A computable PheRM is needed for true phenotype portability and reliability across different EHR products and healthcare systems. These desiderata are a guide to inform the establishment and evolution of EHR phenotype algorithm authoring platforms and languages.
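
As a toy illustration of desiderata (4) through (6), the sketch below expresses a phenotype as structured rules whose results are combined with set operations, including one temporal relation. All records and rule names are hypothetical; this is not any existing PheRM implementation.

```python
# Toy illustration of desiderata (4)-(6): phenotype criteria as structured
# rules, combined with set algebra over patient IDs, plus one temporal
# relation. Everything here (records, codes, rules) is hypothetical.
from dataclasses import dataclass
from datetime import date

@dataclass
class Event:
    patient_id: str
    code: str        # e.g., an ICD or RxNorm code
    on: date

events = [
    Event("p1", "E11.9", date(2014, 1, 5)),     # type 2 diabetes dx
    Event("p1", "RX860975", date(2014, 2, 1)),  # metformin order
    Event("p2", "E11.9", date(2014, 3, 3)),
]

def with_code(code):
    """Structured rule: patients having an event with this code."""
    return {e.patient_id for e in events if e.code == code}

def code_after(first, second):
    """Temporal rule: second code occurs after first for the same patient."""
    return {a.patient_id for a in events for b in events
            if a.patient_id == b.patient_id
            and a.code == first and b.code == second and b.on > a.on}

# Phenotype = set algebra over rule results:
# diabetics who received metformin after diagnosis.
cases = with_code("E11.9") & code_after("E11.9", "RX860975")
print(cases)  # {'p1'}
```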


Journal of Biomedical Informatics | 2009

Formalizing ICD coding rules using Formal Concept Analysis

Guoqian Jiang; Jyotishman Pathak; Christopher G. Chute

BACKGROUND With the 11th revision of the International Classification of Diseases (ICD) being officially launched by the World Health Organization (WHO), the significance of a formal representation for ICD coding rules has emerged as a pragmatic concern. OBJECTIVES To explore the role of Formal Concept Analysis (FCA) in examining ICD-10 coding rules and to develop FCA-based auditing approaches for the formalization process. METHODS We propose a model for formalizing ICD coding rules underlying the ICD Index using FCA. The coding rules are generated from FCA models and represented in the Semantic Web Rule Language (SWRL). Two auditing approaches were developed, focusing on non-disjoint nodes and anonymous nodes manifest in the FCA model. The candidate domains (i.e., any three-character code with its sub-codes) of all 22 chapters of the ICD-10 2006 version were analyzed using the two auditing approaches. Case studies and a preliminary evaluation were performed for validation. RESULTS A total of 2044 formal contexts from the candidate domains of 22 ICD chapters were generated and audited. We identified 692 ICD codes having non-disjoint nodes across all chapters; chapters 19 and 21 had the highest proportion of candidate domains with non-disjoint nodes (61.9% and 45.6%). We also identified 6996 anonymous nodes from 1382 candidate domains. Chapters 7, 11, 13, and 17 had the highest proportion of candidate domains having anonymous nodes (97.5%, 95.4%, 93.6%, and 93.0%), while chapters 15 and 17 had the highest proportion of anonymous nodes among all chapters (45.5% and 44.0%). Case studies and a limited evaluation demonstrate that non-disjoint nodes and anonymous nodes arising from FCA are effective mechanisms for auditing ICD-10. CONCLUSION FCA-based models demonstrate a practical solution for formalizing ICD coding rules. FCA techniques can not only audit the completeness of ICD domain knowledge for a specific domain, but also provide a high-level auditing profile across all ICD chapters.
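
One of the two auditing ideas can be sketched directly: given a made-up table of codes and their index-term attributes, flag sibling codes whose attribute sets overlap without one subsuming the other, a plausible reading of the paper's "non-disjoint nodes". The real construction over the ICD-10 Index is considerably more involved.

```python
# Sketch of one auditing idea: flag "non-disjoint" sibling codes whose
# attribute sets overlap without containment. The code -> attributes table
# is invented, not taken from the ICD-10 Index.
ctx = {
    "A01.0": {"typhoid", "fever"},
    "A01.1": {"paratyphoid", "fever", "type_a"},
    "A01.4": {"paratyphoid", "fever"},
}

def non_disjoint_pairs(ctx):
    codes = sorted(ctx)
    pairs = []
    for i, a in enumerate(codes):
        for b in codes[i + 1:]:
            shared = ctx[a] & ctx[b]
            # An overlap that is neither empty nor full containment suggests
            # sibling codes whose meanings are not cleanly disjoint.
            if shared and not (ctx[a] <= ctx[b] or ctx[b] <= ctx[a]):
                pairs.append((a, b, shared))
    return pairs

for a, b, shared in non_disjoint_pairs(ctx):
    print(f"{a} vs {b}: shared attributes {sorted(shared)}")
```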


Journal of Biomedical Semantics | 2014

Network-based analysis reveals distinct association patterns in a semantic MEDLINE-based drug-disease-gene network

Yuji Zhang; Cui Tao; Guoqian Jiang; Asha Nair; Jian Su; Christopher G. Chute; Hongfang Liu

Background A huge number of associations among different biological entities (e.g., diseases, drugs, and genes) are scattered across millions of biomedical articles. Systematic analysis of such heterogeneous data can infer novel associations among different biological entities in the context of personalized medicine and translational research. Recently, network-based computational approaches have gained popularity for investigating such heterogeneous data, proposing novel therapeutic targets, and deciphering disease mechanisms. However, little effort has been devoted to investigating associations among drugs, diseases, and genes in an integrative manner. Results We propose a novel network-based computational framework to identify statistically over-represented subnetwork patterns, called network motifs, in an integrated disease-drug-gene network extracted from Semantic MEDLINE. The framework consists of two steps. The first step is to construct an association network by extracting pairwise associations between diseases, drugs, and genes in Semantic MEDLINE using a domain-pattern-driven strategy. A Resource Description Framework (RDF) linked data approach is used to reorganize the data to increase the flexibility of data integration, the interoperability within domain ontologies, and the efficiency of data storage. Unique associations among drugs, diseases, and genes are extracted for downstream network-based analysis. The second step is to apply a network-based approach to mine the local network structure of this heterogeneous network. Significant network motifs are then identified as the backbone of the network, and a simplified network based on those significant motifs is constructed to facilitate discovery. We implemented our computational framework and identified five network motifs, each of which corresponds to a specific biological meaning. Three case studies demonstrate that novel associations are derived from the network topology analysis of reconstructed networks of significant network motifs, further validated by expert knowledge and functional enrichment analyses. Conclusions We have developed a novel network-based computational approach to investigate the heterogeneous drug-gene-disease network extracted from Semantic MEDLINE. We demonstrate the power of this approach by prioritizing candidate disease genes, inferring potential disease relationships, and proposing novel drug targets within the context of the entire knowledge network. The results indicate that such an approach can facilitate the formulation of novel research hypotheses, which is critical for translational medicine research and personalized medicine.
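
The motif-significance step can be sketched as follows: count drug-gene-disease triangles in a toy association network with networkx and z-score the count against degree-preserving rewirings. The edges are invented, and the Semantic MEDLINE extraction step is not shown.

```python
# Sketch of the motif step: count drug-gene-disease triangles in a toy
# network and z-score them against degree-preserving rewirings.
import networkx as nx
from itertools import combinations
from statistics import mean, stdev

G = nx.Graph()
G.add_nodes_from(["aspirin", "ibuprofen"], kind="drug")
G.add_nodes_from(["PTGS2", "TNF"], kind="gene")
G.add_nodes_from(["inflammation", "pain"], kind="disease")
G.add_edges_from([("aspirin", "PTGS2"), ("PTGS2", "inflammation"),
                  ("aspirin", "inflammation"), ("ibuprofen", "PTGS2"),
                  ("TNF", "inflammation"), ("ibuprofen", "pain")])

def count_motif(g):
    """Triangles touching one drug, one gene, and one disease."""
    n = 0
    for tri in combinations(g.nodes, 3):
        kinds = {g.nodes[v]["kind"] for v in tri}
        if kinds == {"drug", "gene", "disease"} and all(
                g.has_edge(a, b) for a, b in combinations(tri, 2)):
            n += 1
    return n

observed = count_motif(G)
null = []
for seed in range(100):
    R = G.copy()
    # Degree-preserving rewiring gives a simple null model.
    nx.double_edge_swap(R, nswap=10, max_tries=1000, seed=seed)
    null.append(count_motif(R))
z = (observed - mean(null)) / (stdev(null) or 1.0)
print(f"observed={observed}, z={z:.2f}")
```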


Journal of Biomedical Semantics | 2014

Standardizing adverse drug event reporting data

Liwei Wang; Guoqian Jiang; Dingcheng Li; Hongfang Liu

Background The Adverse Event Reporting System (AERS) is an FDA database providing rich information on voluntary reports of adverse drug events (ADEs). Normalizing data in the AERS would improve the mining capacity of the AERS for drug safety signal detection and promote semantic interoperability between the AERS and other data sources. In this study, we normalize the AERS and build a publicly available normalized ADE data source. The drug information in the AERS is normalized to RxNorm, a standard terminology source for medications, using MedEx, a natural language processing medication extraction tool. Drug class information is then obtained from the National Drug File-Reference Terminology (NDF-RT) using a greedy algorithm. Adverse events are aggregated through mapping with the Preferred Term (PT) and System Organ Class (SOC) codes of the Medical Dictionary for Regulatory Activities (MedDRA). The performance of MedEx-based annotation was evaluated, and case studies were performed to demonstrate the usefulness of our approaches. Results Our study yields an aggregated knowledge-enhanced AERS data mining set (AERS-DM). In total, the AERS-DM contains 37,029,228 Drug-ADE records. Seventy-one percent (10,221/14,490) of normalized drug concepts in the AERS were classified into 9 classes in NDF-RT. After ADE aggregation, there were 4,639,613 unique pairs between RxNorm concepts and MedDRA PT codes and 205,725 unique pairs between RxNorm concepts and SOC codes. Conclusions We have built an open-source Drug-ADE knowledge resource with data normalized and aggregated using standard biomedical ontologies. The data resource has the potential to assist the mining of ADEs from the AERS for the data mining research community.
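
The paper normalizes drug strings with MedEx; as a stand-in, the sketch below resolves free-text drug names through the public RxNav REST API and then aggregates drug-ADE pairs. The endpoint is real, but the exact parameters and JSON shape used here should be treated as assumptions.

```python
# Sketch of one normalization sub-step: resolve free-text drug names to
# RxNorm identifiers via the RxNav REST API, then aggregate drug-ADE pairs.
# The response shape and the search parameter semantics are assumptions.
from collections import Counter
import requests

def rxnorm_id(name: str) -> str | None:
    resp = requests.get("https://rxnav.nlm.nih.gov/REST/rxcui.json",
                        params={"name": name, "search": 2},  # 2 ~ best match
                        timeout=10)
    resp.raise_for_status()
    ids = resp.json().get("idGroup", {}).get("rxnormId", [])
    return ids[0] if ids else None

# Raw (drug string, MedDRA PT) report pairs, as might come out of the AERS.
reports = [("aspirin", "Gastrointestinal haemorrhage"),
           ("Aspirin", "Gastrointestinal haemorrhage"),
           ("ibuprofen", "Renal failure acute")]

pairs = Counter()
for drug, pt in reports:
    rxcui = rxnorm_id(drug)
    if rxcui:  # count the normalized Drug-ADE pair
        pairs[(rxcui, pt)] += 1

print(pairs.most_common())
```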


Scientific Reports | 2017

Preoperative red cell distribution width and neutrophil-to-lymphocyte ratio predict survival in patients with epithelial ovarian cancer

Zheng Li; Na Hong; Melissa S. Robertson; Chen Wang; Guoqian Jiang

Several parameters of the preoperative complete blood count (CBC), and inflammation-associated blood cell markers derived from them, have been reported to correlate with prognosis in patients with epithelial ovarian cancer (EOC), but their prognostic importance and optimal cutoffs still need to be elucidated. Clinicopathological parameters, 5-year follow-up data, and preoperative CBC parameters were obtained retrospectively in 654 EOC patients who underwent primary surgery at Mayo Clinic. Cutoffs for the neutrophil-to-lymphocyte ratio (NLR), platelet-to-lymphocyte ratio (PLR), and monocyte-to-lymphocyte ratio (MLR) were optimized by receiver operating characteristic (ROC) curve analysis. Prognostic significance for overall survival (OS) and recurrence-free survival (RFS) was determined with Cox proportional hazards models and the Kaplan-Meier method. Associations of red cell distribution width (RDW) and NLR with clinicopathological parameters were analyzed using non-parametric tests. RDW with a cutoff of 14.5 and NLR with a cutoff of 5.25 had independent prognostic significance for OS, while combined RDW and NLR scores stratified patients into low (RDW-low and NLR-low), intermediate (RDW-high or NLR-high), and high-risk (RDW-high and NLR-high) groups, especially in patients with high-grade serous ovarian cancer (HGSOC). Moreover, high NLR was associated with poor RFS as well. Elevated RDW was strongly associated with age, whereas high NLR was strongly associated with stage, preoperative CA125 level, and ascites at surgery.
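
On synthetic data, the statistical pipeline might look like the sketch below: choose an NLR cutoff by the Youden index from an ROC curve (scikit-learn), then fit a Cox proportional hazards model on the dichotomized marker (lifelines). The numbers are simulated and bear no relation to the study's data.

```python
# Sketch on simulated data: Youden-optimal NLR cutoff from an ROC curve,
# then a Cox model on the dichotomized marker. Nothing here is study data.
import numpy as np
import pandas as pd
from sklearn.metrics import roc_curve
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 200
nlr = rng.lognormal(mean=1.3, sigma=0.5, size=n)
# Simulate survival worsening with NLR; censor at 60 months (5 years).
time = rng.exponential(scale=60 / (1 + 0.3 * nlr))
event = (time < 60).astype(int)
time = np.minimum(time, 60)

# ROC against 5-year death; Youden index J = sensitivity + specificity - 1.
fpr, tpr, thresholds = roc_curve(event, nlr)
cutoff = thresholds[np.argmax(tpr - fpr)]

df = pd.DataFrame({"time": time, "event": event,
                   "nlr_high": (nlr >= cutoff).astype(int)})
cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
print(f"cutoff={cutoff:.2f}")
cph.print_summary()
```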


Journal of the American Medical Informatics Association | 2015

Review and evaluation of electronic health records-driven phenotype algorithm authoring tools for clinical and translational research

Jie Xu; Luke V. Rasmussen; Pamela L Shaw; Guoqian Jiang; Richard C. Kiefer; Huan Mo; Jennifer A. Pacheco; Peter Speltz; Qian Zhu; Joshua C. Denny; Jyotishman Pathak; William K. Thompson; Enid Montague

OBJECTIVE To review and evaluate available software tools for electronic health record-driven phenotype authoring in order to identify gaps and needs for future development. MATERIALS AND METHODS Candidate phenotype authoring tools were identified through (1) a literature search in four publication databases (PubMed, Embase, Web of Science, and Scopus) and (2) a web search. A collection of tools was compiled and reviewed after the searches. A survey was designed and distributed to the developers of the reviewed tools to discover their functionalities and features. RESULTS Twenty-four different phenotype authoring tools were identified and reviewed. Developers of 16 of these tools completed the evaluation survey (67% response rate). The surveyed tools showed commonalities but also varied in their capabilities in algorithm representation, logic functions, data support and software extensibility, search functions, user interface, and data outputs. DISCUSSION Positive trends identified in the evaluation included that algorithms can be represented in both computable and human-readable formats, and that most tools offer a web interface for easy access. However, issues were also identified: many tools lacked advanced logic functions for authoring complex algorithms; the ability to construct queries that leverage unstructured data was not widely implemented; and many tools had limited support for plug-ins or external analytic software. CONCLUSIONS Existing phenotype authoring tools could enable clinical researchers to work with electronic health record data more efficiently, but gaps still exist in the functionalities of such tools. The present work can serve as a reference point for the future development of similar tools.


Conference on Information and Knowledge Management | 2012

Optimizing semantic MEDLINE for translational science studies using semantic web technologies

Cui Tao; Yuji Zhang; Guoqian Jiang; Matt-Mouley Bouamrane; Christopher G. Chute

Semantic MEDLINE provides comprehensive resources with structured annotations that have the potential to facilitate translational studies in the biomedical domain. It is computationally challenging, however, to perform queries directly on the data in the current Semantic MEDLINE database. In this research, we propose a domain-pattern-driven approach to optimize Semantic MEDLINE data organization and representation for translational science studies using the Resource Description Framework (RDF) and Semantic Web technologies.
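
A minimal sketch of the idea: store a few Semantic MEDLINE-style predications as RDF triples with rdflib and query them with SPARQL. The namespace and predicate names are invented, not the project's actual schema.

```python
# Sketch of the RDF/SPARQL idea: a few Semantic MEDLINE-style predications
# stored as triples and queried. Namespace and predicates are hypothetical.
from rdflib import Graph, Namespace

SEM = Namespace("http://example.org/semmed#")  # hypothetical namespace
g = Graph()
g.bind("sem", SEM)
g.add((SEM.aspirin, SEM.TREATS, SEM.inflammation))
g.add((SEM.aspirin, SEM.INHIBITS, SEM.PTGS2))
g.add((SEM.PTGS2, SEM.ASSOCIATED_WITH, SEM.inflammation))

# Drugs that treat a disease while inhibiting a gene tied to that disease.
q = """
PREFIX sem: <http://example.org/semmed#>
SELECT ?drug ?gene ?disease WHERE {
  ?drug sem:TREATS ?disease .
  ?drug sem:INHIBITS ?gene .
  ?gene sem:ASSOCIATED_WITH ?disease .
}
"""
for row in g.query(q):
    print(row.drug, row.gene, row.disease)
```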


Journal of the American Medical Informatics Association | 2012

Quality evaluation of value sets from cancer study common data elements using the UMLS semantic groups

Guoqian Jiang; Harold R. Solbrig; Christopher G. Chute

Objective The objective of this study is to develop an approach to evaluate the quality of terminological annotations on the value set (i.e., enumerated value domain) components of common data elements (CDEs) in the context of clinical research, using both Unified Medical Language System (UMLS) semantic types and groups. Materials and methods The CDEs of the National Cancer Institute (NCI) Cancer Data Standards Repository, the NCI Thesaurus (NCIt) concepts, and the UMLS semantic network were integrated using a semantic web-based framework for a SPARQL-enabled evaluation. First, the set of CDE permissible values with corresponding meanings in external controlled terminologies was isolated. The corresponding value meanings were then evaluated against their NCI- or UMLS-generated semantic network mapping to determine whether all of the meanings fell within the same semantic group. Results Of the enumerated CDEs in the Cancer Data Standards Repository, 3093 (26.2%) had elements drawn from more than one UMLS semantic group. A random sample (n=100) of this set of elements indicated that 17% of them were likely to have been misclassified. Discussion The use of existing semantic web tools can support a high-throughput mechanism for evaluating the quality of large CDE collections. This study demonstrates that the involvement of multiple semantic groups in an enumerated value domain of a CDE is an effective anchor to trigger an auditing point for quality evaluation activities. Conclusion This approach provides a useful quality assurance mechanism for a clinical study CDE repository.
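
The audit rule itself is simple to sketch: map each permissible value meaning to a semantic group and flag any CDE whose value set spans more than one group. The lookup table below is a toy stand-in for the study's SPARQL-based UMLS mapping.

```python
# Sketch of the audit rule: flag CDEs whose permissible value meanings span
# more than one UMLS semantic group. The lookup is a toy stand-in for the
# study's SPARQL-based UMLS mapping.
semantic_group = {           # value meaning -> UMLS semantic group
    "Aspirin": "Chemicals & Drugs",
    "Ibuprofen": "Chemicals & Drugs",
    "Headache": "Disorders",
}

cdes = {
    "AnalgesicAgent": ["Aspirin", "Ibuprofen"],
    "AdverseEventTerm": ["Aspirin", "Headache"],  # suspicious mixture
}

for cde, values in cdes.items():
    groups = {semantic_group.get(v, "UNKNOWN") for v in values}
    if len(groups) > 1:
        # Multiple semantic groups in one value set triggers an audit point.
        print(f"AUDIT {cde}: value set spans {sorted(groups)}")
```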

Collaboration


Dive into Guoqian Jiang's collaborations.

Top Co-Authors

Cui Tao

University of Texas Health Science Center at Houston


Eric Prud'hommeaux

Massachusetts Institute of Technology
