Publication


Featured research published by Shiqiang Tao.


International Conference on Big Data | 2014

MaPLE: A MapReduce Pipeline for Lattice-based Evaluation and its application to SNOMED CT

Guo-Qiang Zhang; Wei Zhu; Mengmeng Sun; Shiqiang Tao; Olivier Bodenreider; Licong Cui

Non-lattice fragments are often indicative of structural anomalies in ontological systems and, as such, represent possible areas of focus for subsequent quality assurance work. However, extracting the non-lattice fragments in large ontological systems is computationally expensive if not prohibitive, using a traditional sequential approach. In this paper we present a general MapReduce pipeline, called MaPLE (MapReduce Pipeline for Lattice-based Evaluation), for extracting non-lattice fragments in large partially ordered sets and demonstrate its applicability in ontology quality assurance. Using MaPLE in a 30-node Hadoop local cloud, we systematically extracted non-lattice fragments in 8 SNOMED CT versions from 2009 to 2014 (each containing over 300k concepts), with an average total computing time of less than 3 hours per version. With dramatically reduced time, MaPLE makes it feasible not only to perform exhaustive structural analysis of large ontological hierarchies, but also to systematically track structural changes between versions. Our change analysis showed that the average change rates on the non-lattice pairs are up to 38.6 times higher than the change rates of the background structure (concept nodes). This demonstrates that fragments around non-lattice pairs exhibit significantly higher rates of change in the process of ontological evolution.
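
As a rough illustration of what a non-lattice pair is, the sketch below scans a toy hierarchy sequentially. It assumes the usual order-theoretic reading, in which two incomparable concepts form a non-lattice pair when they share more than one maximal common descendant. The function names and the toy data are hypothetical, and this brute-force loop is not the MaPLE pipeline itself, which distributes the work with MapReduce.

```python
from itertools import combinations


def descendants(children, node):
    """Return all strict descendants of `node` in an acyclic is-a hierarchy."""
    seen = set()
    stack = list(children.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children.get(current, ()))
    return seen


def non_lattice_pairs(children):
    """Find incomparable pairs with more than one maximal common descendant.

    `children` maps each concept to its direct children (hypothetical format).
    """
    nodes = set(children) | {c for kids in children.values() for c in kids}
    desc = {n: descendants(children, n) for n in nodes}
    found = []
    for a, b in combinations(sorted(nodes), 2):
        # Comparable pairs always have a greatest lower bound, so skip them.
        if a in desc[b] or b in desc[a]:
            continue
        common = desc[a] & desc[b]
        # Keep only common descendants that lie below no other common descendant.
        maximal = {c for c in common if not any(c in desc[d] for d in common)}
        if len(maximal) > 1:
            found.append((a, b, sorted(maximal)))
    return found


# Toy hierarchy: A and B share two maximal common descendants, C and D.
toy = {
    "root": ["A", "B"],
    "A": ["C", "D"],
    "B": ["C", "D"],
    "C": ["E"],
    "D": ["E"],
}
pairs = non_lattice_pairs(toy)
print(pairs)
```

The pairwise loop is quadratic in the number of concepts, which is one way to see why a sequential scan becomes expensive for hierarchies with over 300k concepts.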


ACM Transactions on Knowledge Discovery From Data | 2016

Biomedical Ontology Quality Assurance Using a Big Data Approach

Licong Cui; Shiqiang Tao; Guo-Qiang Zhang

This article presents recent progress in using a scalable cloud computing environment, Hadoop and MapReduce, to perform ontology quality assurance (OQA), and points to areas of future opportunity. The standard sequential approach used for implementing OQA methods can take weeks if not months for exhaustive analyses of large biomedical ontological systems. With OQA methods newly implemented using massively parallel algorithms in the MapReduce framework, several orders of magnitude in speed-up can be achieved (e.g., from three months to three hours). Such dramatically reduced time makes it feasible not only to perform exhaustive structural analysis of large ontological hierarchies, but also to systematically track structural changes between versions for evolutional analysis. As an exemplar, progress is reported in using MapReduce to perform evolutional analysis and visualization on the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), a prominent clinical terminology system. Future opportunities in three areas are described. The first is to extend the scope of the MapReduce-based approach to existing OQA methods, especially for automated exhaustive structural analysis. The second is to apply our proposed MapReduce Pipeline for Lattice-based Evaluation (MaPLE) approach, demonstrated as an exemplar method for SNOMED CT, to other biomedical ontologies. The third is to develop interfaces for reviewing results obtained by OQA methods and for visualizing ontological alignment and evolution, which can also take advantage of cloud computing technology to systematically pre-compute computationally intensive jobs in order to increase performance during user interactions with the visualization interface. Advances in these directions are expected to better support the ontological engineering lifecycle.


Journal of the American Medical Informatics Association | 2017

Mining non-lattice subgraphs for detecting missing hierarchical relations and concepts in SNOMED CT

Licong Cui; Wei Zhu; Shiqiang Tao; James T. Case; Olivier Bodenreider; Guo-Qiang Zhang

Objective: Quality assurance of large ontological systems such as SNOMED CT is an indispensable part of the terminology management lifecycle. We introduce a hybrid structural-lexical method for scalable and systematic discovery of missing hierarchical relations and concepts in SNOMED CT. Materials and Methods: All non-lattice subgraphs (the structural part) in SNOMED CT are exhaustively extracted using a scalable MapReduce algorithm. Four lexical patterns (the lexical part) are identified among the extracted non-lattice subgraphs. Non-lattice subgraphs exhibiting such lexical patterns are often indicative of missing hierarchical relations or concepts. Each lexical pattern is associated with a potential specific type of error. Results: Applying the structural-lexical method to SNOMED CT (September 2015 US edition), we found 6801 non-lattice subgraphs that matched these lexical patterns, of which 2046 were amenable to visual inspection. We evaluated a random sample of 100 small subgraphs, of which 59 were reviewed in detail by domain experts. All the subgraphs reviewed contained errors confirmed by the experts. The most frequent type of error was missing is-a relations due to incomplete or inconsistent modeling of the concepts. Conclusions: Our hybrid structural-lexical method is innovative and proved effective not only in detecting errors in SNOMED CT, but also in suggesting remediation for these errors.


Journal of the American Medical Informatics Association | 2018

The National Sleep Research Resource: towards a sleep data commons

Guo-Qiang Zhang; Licong Cui; Remo Mueller; Shiqiang Tao; Matthew Kim; Michael Rueschman; Sara Mariani; Daniel Mobley; Susan Redline

Objective: The gold standard for diagnosing sleep disorders is polysomnography, which generates extensive data about biophysical changes occurring during sleep. We developed the National Sleep Research Resource (NSRR), a comprehensive system for sharing sleep data. The NSRR embodies elements of a data commons aimed at accelerating research to address critical questions about the impact of sleep disorders on important health outcomes. Approach: We used a metadata-guided approach, with a set of common sleep-specific terms enforcing uniform semantic interpretation of data elements across three main components: (1) annotated datasets; (2) user interfaces for accessing data; and (3) computational tools for the analysis of polysomnography recordings. We incorporated the process for managing dataset-specific data use agreements, evidence of Institutional Review Board review, and the corresponding access control in the NSRR web portal. The metadata-guided approach facilitates structural and semantic interoperability, ultimately leading to enhanced data reusability and scientific rigor. Results: The authors curated and deposited retrospective data from 10 large, NIH-funded sleep cohort studies, including several from the Trans-Omics for Precision Medicine (TOPMed) program, into the NSRR. The NSRR currently contains data on 26 808 subjects and 31 166 signal files in European Data Format. Since the NSRR was launched in April 2014, over 3000 registered users have downloaded over 130 terabytes of data. Conclusions: The NSRR offers a use case and an example for creating a full-fledged data commons. It provides a single point of access to analysis-ready physiological signals from polysomnography obtained from multiple sources, and a wide variety of clinical data to facilitate sleep research.


Cancer Informatics | 2014

Trial prospector: Matching patients with cancer research studies using an automated and scalable approach

Satya S. Sahoo; Shiqiang Tao; Andrew Parchman; Zhihui Luo; Licong Cui; Patrick Mergler; Robert Lanese; Jill S. Barnholtz-Sloan; Neal J. Meropol; Guo-Qiang Zhang

Cancer is responsible for approximately 7.6 million deaths per year worldwide. A 2012 survey in the United Kingdom found dramatic improvement in survival rates for childhood cancer because of increased participation in clinical trials. Unfortunately, overall patient participation in cancer clinical studies is low. A key logistical barrier to patient and physician participation is the time required to identify appropriate clinical trials for individual patients. We introduce the Trial Prospector tool, which supports end-to-end management of the cancer clinical trial recruitment workflow with (a) structured entry of trial eligibility criteria, (b) automated extraction of patient data from multiple sources, (c) a scalable matching algorithm, and (d) an interactive user interface (UI) for physicians that shows both matching results and a detailed explanation of the causes of ineligibility for available trials. We report the results from deployment of Trial Prospector at the National Cancer Institute (NCI)-designated Case Comprehensive Cancer Center (Case CCC), with 1,367 clinical trial eligibility evaluations performed with 100% accuracy.
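
The idea of pairing a match result with an explanation of ineligibility can be sketched as follows. This is only an illustrative assumption about how structured criteria might be represented: the `Criterion` class, the patient fields, and the example thresholds are all hypothetical and are not taken from Trial Prospector.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Criterion:
    """One structured eligibility criterion (hypothetical representation)."""
    description: str
    is_met: Callable[[dict], bool]


def evaluate_trial(patient, criteria):
    """Return (eligible, reasons), where reasons lists every unmet criterion."""
    unmet = [c.description for c in criteria if not c.is_met(patient)]
    return (not unmet, unmet)


# Hypothetical trial criteria and patient record, for illustration only.
criteria = [
    Criterion("Age 18 or older", lambda p: p["age"] >= 18),
    Criterion("Diagnosis of colon cancer", lambda p: p["diagnosis"] == "colon cancer"),
    Criterion("ECOG performance status of 0 or 1", lambda p: p["ecog"] <= 1),
]
patient = {"age": 54, "diagnosis": "colon cancer", "ecog": 2}

eligible, reasons = evaluate_trial(patient, criteria)
print(eligible, reasons)
```

Evaluating every criterion rather than stopping at the first failure is what allows the interface to report all causes of ineligibility at once.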


IEEE Journal of Biomedical and Health Informatics | 2018

HyCLASSS: A Hybrid Classifier for Automatic Sleep Stage Scoring

Xiaojin Li; Licong Cui; Shiqiang Tao; Jing Chen; Xiang Zhang; Guo-Qiang Zhang

Automatic identification of sleep stage is an important step in a sleep study. In this paper, we propose a hybrid automatic sleep stage scoring approach, named HyCLASSS, based on single channel electroencephalogram (EEG). HyCLASSS, for the first time, leverages both signal and stage transition features of human sleep for automatic identification of sleep stages. HyCLASSS consists of two parts: a random forest classifier and correction rules. The random forest classifier is trained using 30 EEG signal features, including temporal, frequency, and nonlinear features. The correction rules are constructed based on stage transition features, incorporating the continuity property of sleep and the characteristics of sleep stage transitions. Compared with the gold standard of manual scoring using the Rechtschaffen and Kales criteria, the overall accuracy and kappa coefficient on 198 subjects reached 85.95% and 0.8046, respectively, in our experiment. The performance of HyCLASSS compared favorably to previous work, and it could be integrated with sleep evaluation or sleep diagnosis systems in the future.
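
For readers unfamiliar with the two reported metrics, the sketch below computes overall accuracy and Cohen's kappa for a pair of stage sequences. It assumes the reported kappa is the standard Cohen's kappa between manual and automatic scoring; the function name and the toy epochs are hypothetical and unrelated to the study data.

```python
from collections import Counter


def accuracy_and_kappa(manual, predicted):
    """Return (accuracy, Cohen's kappa) for two equal-length label sequences."""
    if not manual or len(manual) != len(predicted):
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(manual)
    # Observed agreement: fraction of epochs where both scorings agree.
    observed = sum(m == p for m, p in zip(manual, predicted)) / n
    manual_counts = Counter(manual)
    predicted_counts = Counter(predicted)
    # Chance agreement: product of marginal label frequencies, summed over labels.
    expected = sum(manual_counts[s] * predicted_counts[s] for s in manual_counts) / (n * n)
    if expected == 1.0:
        # Both sequences use a single identical label; kappa is undefined.
        return observed, float("nan")
    return observed, (observed - expected) / (1.0 - expected)


# Toy example with eight epochs (not real data).
manual = ["W", "W", "N1", "N2", "N2", "REM", "REM", "N2"]
predicted = ["W", "N1", "N1", "N2", "N2", "REM", "N2", "N2"]
accuracy, kappa = accuracy_and_kappa(manual, predicted)
print(accuracy, kappa)
```

Kappa discounts the agreement expected by chance, which is why it is lower than raw accuracy whenever some stages dominate the recording.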


JMIR Research Protocols | 2018

Individualized Clinical Practice Guidelines for Pressure Injury Management: Development of an Integrated Multi-Modal Biomedical Information Resource

Kath M. Bogie; Guo-Qiang Zhang; Steven K. Roggenkamp; Ningzhou Zeng; Jacinta Seton; Shiqiang Tao; Arielle L Bloostein; Jiayang Sun

Background: Pressure ulcers (PU) and deep tissue injuries (DTI), collectively known as pressure injuries, are serious complications that cause staggering costs and human suffering, with over 200 reported risk factors from many domains. Primary pressure injury prevention seeks to prevent the first incidence, while secondary PU/DTI prevention aims to decrease chronic recurrence. Clinical practice guidelines (CPG) combine evidence-based practice and expert opinion to aid clinicians in achieving best practices for primary and secondary prevention. The correction of all risk factors can be both overwhelming and impractical to implement in clinical practice. There is a need to develop practical clinical tools to prioritize the multiple recommendations of CPG, but there is limited guidance on how to prioritize based on individual cases. Bioinformatics platforms enable the data management needed for clinical decision support and user-interface development for complex clinical challenges such as pressure injury prevention care planning. Objective: The central hypothesis of the study is that the individual's risk factor profile can provide the basis for adaptive, personalized care planning for PU prevention based on CPG prioritization. The study objective is to develop the Spinal Cord Injury Pressure Ulcer and Deep Tissue Injury (SCIPUD+) Resource to support personalized care planning for primary and secondary PU/DTI prevention. Methods: The study is employing a retrospective electronic health record (EHR) chart review of over 75 factors known to be relevant for pressure injury risk in individuals with a spinal cord injury (SCI) and routinely recorded in the EHR. We also perform tissue health assessments of a selected sub-group. A systems approach is being used to develop and validate the SCIPUD+ Resource incorporating the many risk factor domains associated with PU/DTI primary and secondary prevention, ranging from the individual's environment to local tissue health. Our multiscale approach will leverage the strength of bioinformatics applied to an established national EHR system. A comprehensive model is being used to relate the primary outcome of interest (PU/DTI development) with over 75 PU/DTI risk factors using a retrospective chart review of 5000 individuals selected from the study cohort of more than 36,000 persons with SCI. A Spinal Cord Injury Pressure Ulcer and Deep Tissue Injury Ontology (SCIPUDO) is being developed to enable robust text-mining for data extraction from free-form notes. Results: The results from this study are pending. Conclusions: PU/DTI remains a highly significant source of morbidity for individuals with SCI. Personalized interactive care plans may decrease both initial PU formation and readmission rates for high-risk individuals. The project is using established EHR data to build a comprehensive, structured model of environmental, social and clinical pressure injury risk factors. The comprehensive SCIPUD+ health care tool will be used to relate the primary outcome of interest (pressure injury development) with covariates including environmental, social, clinical, personal and tissue health profiles as well as possible interactions among some of these covariates. The study will result in a validated tool for personalized implementation of CPG recommendations and has great potential to change the standard of care for pressure injury clinical practice by enabling clinicians to provide personalized application of CPG priorities tailored to the needs of each at-risk individual with SCI. Registered Report Identifier: RR1-10.2196/10871


Bioinformatics and Biomedicine | 2015

RREV: Reconfigurable Rendering Engine for visualization of clinically annotated polysomnograms

Catherine P. Jayapandian; Wei Wang; Michael G. Morrical; Dennis A. Dean; Shiqiang Tao; Daniel Mobley; Matthew Kim; Michael Rueschman; Kenneth A. Loparo; Susan Redline; Guo-Qiang Zhang

In sleep medicine, clinical studies often use their own data dictionaries for capturing clinical sleep events using proprietary signal analysis software [1][2]. Visualization of polysomnograms and their associated events from multiple distinct studies, such as for the National Sleep Research Resource (NSRR) [3], is an unresolved issue. Currently, there is no known visualization software for the European Data Format (EDF) that can be dynamically configured to support rendering of sleep events for multiple vendor formats. To address this challenge, a domain ontology has been developed as part of NSRR to model all sleep medicine terms and concepts and to provide a common schema for addressing the structural and semantic heterogeneity of multiple vendor formats [4]. A Reconfigurable Rendering Engine using the Abstract Factory pattern [5] and the domain ontology provides a standard interface for accessing ontology-enabled clinical events for the visualization of electrophysiological signals. A total of 11,078 polysomnograms (8,444 SHHS, 860 CHAT, 591 HeartBEAT, 730 CFS, 453 SOF) [12] in EDF have been processed, resulting in 1.1 TB of web-accessible and reusable PSGs with NSRR standardized event annotations.
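
A minimal sketch of how an Abstract Factory can hide vendor differences behind one interface is given below. Everything in it is an assumption made for illustration: the vendor names, the label tables, the time encodings, and the class names are hypothetical and do not describe the RREV code or the NSRR ontology.

```python
from abc import ABC, abstractmethod


class EventMapper(ABC):
    """Maps a vendor-specific event label to a shared standardized term."""
    @abstractmethod
    def to_standard_term(self, label: str) -> str: ...


class TimeDecoder(ABC):
    """Converts a vendor-specific time field to seconds from recording start."""
    @abstractmethod
    def to_seconds(self, raw: str) -> float: ...


class TableMapper(EventMapper):
    def __init__(self, table):
        self._table = table

    def to_standard_term(self, label):
        return self._table.get(label, "unmapped")


class SecondsDecoder(TimeDecoder):
    def to_seconds(self, raw):
        return float(raw)


class EpochDecoder(TimeDecoder):
    """Assumes times are given as indices of 30-second epochs."""
    def to_seconds(self, raw):
        return int(raw) * 30.0


class VendorFactory(ABC):
    """Abstract factory: one family of compatible parts per vendor format."""
    @abstractmethod
    def create_mapper(self) -> EventMapper: ...

    @abstractmethod
    def create_time_decoder(self) -> TimeDecoder: ...


class VendorAFactory(VendorFactory):
    def create_mapper(self):
        return TableMapper({"Obs Apnea": "obstructive apnea"})

    def create_time_decoder(self):
        return SecondsDecoder()


class VendorBFactory(VendorFactory):
    def create_mapper(self):
        return TableMapper({"OA": "obstructive apnea"})

    def create_time_decoder(self):
        return EpochDecoder()


def standardize(factory, raw_events):
    """Turn (label, raw_time) pairs into (standard term, seconds) pairs."""
    mapper = factory.create_mapper()
    decoder = factory.create_time_decoder()
    return [(mapper.to_standard_term(l), decoder.to_seconds(t)) for l, t in raw_events]


print(standardize(VendorAFactory(), [("Obs Apnea", "12.5")]))
print(standardize(VendorBFactory(), [("OA", "3")]))
```

Client code such as `standardize` depends only on the abstract interfaces, so supporting another vendor format in this sketch would mean adding one more factory rather than changing the rendering path.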


American Medical Informatics Association Annual Symposium | 2014

A Semantic-based Approach for Exploring Consumer Health Questions Using UMLS

Licong Cui; Shiqiang Tao; Guo-Qiang Zhang


AMIA Joint Summits on Translational Science Proceedings | 2015

NEO: Systematic Non-Lattice Embedding of Ontologies for Comparing the Subsumption Relationship in SNOMED CT and in FMA Using MapReduce

Wei Zhu; Guo-Qiang Zhang; Shiqiang Tao; Mengmeng Sun; Licong Cui

Collaboration


Dive into Shiqiang Tao's collaboration.

Top Co-Authors

Licong Cui

University of Kentucky

Wei Zhu

University of Kentucky

Mengmeng Sun

Case Western Reserve University

Matthew Kim

Brigham and Women's Hospital

Michael Rueschman

Brigham and Women's Hospital

Olivier Bodenreider

National Institutes of Health

Susan Redline

Brigham and Women's Hospital

Xi Wu

University of Kentucky
