Shuai Zheng
Emory University
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Shuai Zheng.
International Journal of Semantic Computing | 2014
Shuai Zheng; Fusheng Wang; James J. Lu
While current biomedical ontology repositories offer primitive query capabilities, it is difficult or cumbersome to support ontology based semantic queries directly in semantically annotated biomedical databases. The problem may be largely attributed to the mismatch between the models of the ontologies and the databases, and the mismatch between the query interfaces of the two systems. To fully realize semantic query capabilities based on ontologies, we develop a system DBOntoLink to provide unified semantic query interfaces by extending database query languages. With DBOntoLink, semantic queries can be directly and naturally specified as extended functions of the database query languages without any programming needed. DBOntoLink is adaptable to different ontologies through customizations and supports major biomedical ontologies hosted at the NCBO BioPortal. We demonstrate the use of DBOntoLink in a real world biomedical database with semantically annotated medical image annotations.
JMIR medical informatics | 2017
Shuai Zheng; James J. Lu; Nima Ghasemzadeh; Salim Hayek; Arshed A. Quyyumi; Fusheng Wang
Background Extracting structured data from narrated medical reports is challenged by the complexity of heterogeneous structures and vocabularies and often requires significant manual effort. Traditional machine-based approaches lack the capability to take user feedbacks for improving the extraction algorithm in real time. Objective Our goal was to provide a generic information extraction framework that can support diverse clinical reports and enables a dynamic interaction between a human and a machine that produces highly accurate results. Methods A clinical information extraction system IDEAL-X has been built on top of online machine learning. It processes one document at a time, and user interactions are recorded as feedbacks to update the learning model in real time. The updated model is used to predict values for extraction in subsequent documents. Once prediction accuracy reaches a user-acceptable threshold, the remaining documents may be batch processed. A customizable controlled vocabulary may be used to support extraction. Results Three datasets were used for experiments based on report styles: 100 cardiac catheterization procedure reports, 100 coronary angiographic reports, and 100 integrated reports—each combines history and physical report, discharge summary, outpatient clinic notes, outpatient clinic letter, and inpatient discharge medication report. Data extraction was performed by 3 methods: online machine learning, controlled vocabularies, and a combination of these. The system delivers results with F1 scores greater than 95%. Conclusions IDEAL-X adopts a unique online machine learning–based approach combined with controlled vocabularies to support data extraction for clinical reports. The system can quickly learn and improve, thus it is highly adaptable.
Journal of Pathology Informatics | 2015
Shuai Zheng; James J. Lu; Christina L. Appin; Daniel J. Brat; Fusheng Wang
Background: Structural reporting enables semantic understanding and prompt retrieval of clinical findings about patients. While synoptic pathology reporting provides templates for data entries, information in pathology reports remains primarily in narrative free text form. Extracting data of interest from narrative pathology reports could significantly improve the representation of the information and enable complex structured queries. However, manual extraction is tedious and error-prone, and automated tools are often constructed with a fixed training dataset and not easily adaptable. Our goal is to extract data from pathology reports to support advanced patient search with a highly adaptable semi-automated data extraction system, which can adjust and self-improve by learning from a user′s interaction with minimal human effort. Methods : We have developed an online machine learning based information extraction system called IDEAL-X. With its graphical user interface, the system′s data extraction engine automatically annotates values for users to review upon loading each report text. The system analyzes users′ corrections regarding these annotations with online machine learning, and incrementally enhances and refines the learning model as reports are processed. The system also takes advantage of customized controlled vocabularies, which can be adaptively refined during the online learning process to further assist the data extraction. As the accuracy of automatic annotation improves overtime, the effort of human annotation is gradually reduced. After all reports are processed, a built-in query engine can be applied to conveniently define queries based on extracted structured data. Results: We have evaluated the system with a dataset of anatomic pathology reports from 50 patients. Extracted data elements include demographical data, diagnosis, genetic marker, and procedure. The system achieves F-1 scores of around 95% for the majority of tests. Conclusions: Extracting data from pathology reports could enable more accurate knowledge to support biomedical research and clinical diagnosis. IDEAL-X provides a bridge that takes advantage of online machine learning based data extraction and the knowledge from human′s feedback. By combining iterative online learning and adaptive controlled vocabularies, IDEAL-X can deliver highly adaptive and accurate data extraction to support patient search.
conference on information and knowledge management | 2012
Shuai Zheng; Fusheng Wang; James J. Lu; Joel H. Saltz
While current biomedical ontology repositories offer primitive query capabilities, it is difficult or cumbersome to support ontology based semantic queries directly in semantically annotated biomedical databases. The problem may be largely attributed to the mismatch between the models of the ontologies and the databases, and the mismatch between the query interfaces of the two systems. To fully realize semantic query capabilities based on ontologies, we develop a system DBOntoLink to provide unified semantic query interfaces by extending database query languages. With DBOntoLink, semantic queries can be directly and naturally specified as extended functions of the database query languages without any programming needed. DBOntoLink is adaptable to different ontologies through customizations and supports major biomedical ontologies hosted at the NCBO BioPortal. We demonstrate the use of DBOntoLink in a real world biomedical database with semantically annotated medical image annotations.
JMIR medical informatics | 2018
Shuai Zheng; Salma K. Jabbour; Shannon E O'Reilly; James J. Lu; Lihua Dong; Lijuan Ding; Ying Xiao; Ning J. Yue; Fusheng Wang; Wei Zou
Background In outcome studies of oncology patients undergoing radiation, researchers extract valuable information from medical records generated before, during, and after radiotherapy visits, such as survival data, toxicities, and complications. Clinical studies rely heavily on these data to correlate the treatment regimen with the prognosis to develop evidence-based radiation therapy paradigms. These data are available mainly in forms of narrative texts or table formats with heterogeneous vocabularies. Manual extraction of the related information from these data can be time consuming and labor intensive, which is not ideal for large studies. Objective The objective of this study was to adapt the interactive information extraction platform Information and Data Extraction using Adaptive Learning (IDEAL-X) to extract treatment and prognosis data for patients with locally advanced or inoperable non–small cell lung cancer (NSCLC). Methods We transformed patient treatment and prognosis documents into normalized structured forms using the IDEAL-X system for easy data navigation. The adaptive learning and user-customized controlled toxicity vocabularies were applied to extract categorized treatment and prognosis data, so as to generate structured output. Results In total, we extracted data from 261 treatment and prognosis documents relating to 50 patients, with overall precision and recall more than 93% and 83%, respectively. For toxicity information extractions, which are important to study patient posttreatment side effects and quality of life, the precision and recall achieved 95.7% and 94.5% respectively. Conclusions The IDEAL-X system is capable of extracting study data regarding NSCLC chemoradiation patients with significant accuracy and effectiveness, and therefore can be used in large-scale radiotherapy clinical data studies.
american medical informatics association annual symposium | 2013
Shuai Zheng; Fusheng Wang; James J. Lu
Proceedings of the 2nd international workshop on Managing interoperability and compleXity in health systems | 2012
Shuai Zheng; Fusheng Wang; James J. Lu
Medical Care | 2017
Raymund Dantes; Shuai Zheng; James J. Lu; Michele G. Beckman; Asha Krishnaswamy; Lisa C. Richardson; Sheri Chernetsky-Tejedor; Fusheng Wang
AMIA | 2015
Shuai Zheng; Raymund Dantes; James J. Lu; Sheri Chernetsky Tejedor; Michele G. Beckman; Asha Krishnaswamy; Lisa C. Richardson; Fusheng Wang
International Journal of Radiation Oncology Biology Physics | 2014
Shuai Zheng; Fusheng Wang; H. Gan; James J. Lu; Salma K. Jabbour; N Yue; W. Zou