Publication


Featured research published by Qing T. Zeng.


Journal of Biomedical Informatics | 2004

GLIF3: a representation format for sharable computer-interpretable clinical practice guidelines

Aziz A. Boxwala; Mor Peleg; Samson W. Tu; Omolola Ogunyemi; Qing T. Zeng; Dongwen Wang; Vimla L. Patel; Robert A. Greenes; Edward H. Shortliffe

The Guideline Interchange Format (GLIF) is a model for representation of sharable computer-interpretable guidelines. The current version of GLIF (GLIF3) is a substantial update and enhancement of the model since the previous version (GLIF2). GLIF3 enables encoding of a guideline at three levels: a conceptual flowchart, a computable specification that can be verified for logical consistency and completeness, and an implementable specification that is intended to be incorporated into particular institutional information systems. The representation has been tested on a wide variety of guidelines that are typical of the range of guidelines in clinical use. It builds upon GLIF2 by adding several constructs that enable interpretation of encoded guidelines in computer-based decision-support systems. GLIF3 leverages standards being developed in Health Level 7 in order to allow integration of guidelines with clinical information systems. The GLIF3 specification consists of an extensible object-oriented model and a structured syntax based on the Resource Description Framework (RDF). The ability of GLIF3 to generate appropriate recommendations has been validated empirically by executing encoded guidelines against actual patient data. GLIF3 is accordingly ready for broader experimentation and prototype use by organizations that wish to evaluate its ability to capture the logic of clinical guidelines, to implement them in clinical systems, and thereby to provide integrated decision support to assist clinicians.
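
To make the three-level encoding concrete, here is a minimal sketch of how a single guideline step might be refined from conceptual flowchart node to implementable specification. This is Python for illustration only; the class and field names are invented and are not the actual GLIF3 schema, which is an object-oriented model with an RDF-based syntax.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class GuidelineStep:
    """Level 1: a node in the conceptual flowchart."""
    name: str
    kind: str                                # e.g. "decision" or "action"
    next_steps: List[str] = field(default_factory=list)

@dataclass
class ComputableStep(GuidelineStep):
    """Level 2: adds a machine-checkable criterion that can be verified
    for logical consistency and completeness."""
    criterion: str = ""                      # expression over patient data items

@dataclass
class ImplementableStep(ComputableStep):
    """Level 3: binds the step to a particular institution's information
    system (e.g. a query for the needed data item)."""
    data_binding: Optional[str] = None

step = ImplementableStep(
    name="check_blood_pressure",
    kind="decision",
    next_steps=["start_treatment", "recheck_in_4_weeks"],
    criterion="systolic_bp > 140",
    data_binding="OBX|SYSTOLIC_BP",          # placeholder local binding
)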


Psychological Medicine | 2012

Using electronic medical records to enable large-scale studies in psychiatry: treatment resistant depression as a model

Roy H. Perlis; Dan V. Iosifescu; Victor M. Castro; Shawn N. Murphy; Vivian S. Gainer; Jessica Minnier; Tianxi Cai; Sergey Goryachev; Qing T. Zeng; Patience Gallagher; Maurizio Fava; Jeffrey B. Weilburg; Susanne Churchill; Isaac S. Kohane; Jordan W. Smoller

BACKGROUND Electronic medical records (EMR) provide a unique opportunity for efficient, large-scale clinical investigation in psychiatry. However, such studies will require development of tools to define treatment outcome. METHOD Natural language processing (NLP) was applied to classify notes from 127 504 patients with a billing diagnosis of major depressive disorder, drawn from out-patient psychiatry practices affiliated with multiple, large New England hospitals. Classifications were compared with results using billing data (ICD-9 codes) alone and with a clinical gold standard based on chart review by a panel of senior clinicians. These cross-sectional classifications were then used to define longitudinal treatment outcomes, which were compared with a clinician-rated gold standard. RESULTS Models incorporating NLP were superior to those relying on billing data alone for classifying current mood state (area under the receiver operating characteristic curve of 0.85-0.88 v. 0.54-0.55). When these cross-sectional visits were integrated to define longitudinal outcomes and incorporate treatment data, 15% of the cohort remitted with a single antidepressant treatment, while 13% were identified as failing to remit despite at least two antidepressant trials. Non-remitting patients were more likely to be non-Caucasian (p<0.001). CONCLUSIONS The application of bioinformatics tools such as NLP should enable accurate and efficient determination of longitudinal outcomes, enabling existing EMR data to be applied to clinical research, including biomarker investigations. Continued development will be required to better address moderators of outcome such as adherence and co-morbidity.
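
As a rough illustration of the cross-sectional classification step, the sketch below trains a bag-of-words classifier on toy note snippets and scores it by area under the ROC curve. The note texts, labels, and model choice are placeholders, not the paper's actual NLP features or classifier.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

notes = ["patient reports improved mood and sleep",
         "continues to endorse depressed mood and anhedonia",
         "mood euthymic, denies suicidal ideation",
         "worsening depression despite medication change"]
labels = [0, 1, 0, 1]                        # 1 = currently depressed (toy labels)

X_train, X_test, y_train, y_test = train_test_split(
    notes, labels, test_size=0.5, stratify=labels, random_state=0)

vec = TfidfVectorizer(ngram_range=(1, 2))    # unigram + bigram bag of words
clf = LogisticRegression().fit(vec.fit_transform(X_train), y_train)

probs = clf.predict_proba(vec.transform(X_test))[:, 1]
print("AUC:", roc_auc_score(y_test, probs))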


Journal of Biomedical Informatics | 2001

Toward a Representation Format for Sharable Clinical Guidelines

Aziz A. Boxwala; Samson W. Tu; Mor Peleg; Qing T. Zeng; Omolola Ogunyemi; Robert A. Greenes; Edward H. Shortliffe; Vimla L. Patel

Clinical guidelines are being developed for the purpose of reducing medical errors and unjustified variations in medical practice, and for basing medical practice on evidence. Encoding guidelines in a computer-interpretable format and integrating them with the electronic medical record can enable delivery of patient-specific recommendations when and where needed. Since great effort must be expended in developing high-quality guidelines, and in making them computer-interpretable, it is highly desirable to be able to share computer-interpretable guidelines (CIGs) among institutions. Adoption of a common format for representing CIGs is one approach to sharing. Factors that need to be considered in creating a format for sharable CIGs include (i) the scope of guidelines and their intended applications, (ii) the method of delivery of the recommendations, and (iii) the environment, consisting of the practice setting and the information system in which the guidelines will be applied. Several investigators have proposed solutions that improve the sharability of CIGs and, more generally, of medical knowledge. These approaches can be useful in the development of a format for sharable CIGs. Challenges in sharing CIGs also include the need to extend the traditional framework for disseminating guidelines to enable them to be integrated into practice. These extensions include processes for (i) local adaptation of recommendations encoded in shared generic guidelines and (ii) integration of guidelines into the institutional information systems.


International Conference on Biological and Medical Data Analysis | 2005

On sample size and classification accuracy: a performance comparison

Margarita Sordo; Qing T. Zeng

We investigate the dependency between sample size and classification accuracy for three classification techniques: Naive Bayes, Support Vector Machines, and Decision Trees, over a set of 8500 text excerpts extracted automatically from narrative reports from Brigham & Women's Hospital, Boston, USA. Each excerpt refers to the smoking status of a patient as current, past, never a smoker, or denies smoking. Our empirical results, consistent with [1], confirm that the size of the training set and the classification rate are indeed correlated. Even though these algorithms perform reasonably well with small datasets, as the number of cases increases, both SVM and Decision Trees show a substantial improvement in performance, suggesting a more consistent learning process. Unlike the majority of evaluations, ours were carried out specifically in a medical domain, where a limited amount of data is a common occurrence [13][14]. This study is part of the i2b2 project, Core 2.
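
A minimal sketch of this kind of sample-size experiment, substituting synthetic data for the 8,500 clinical excerpts:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the smoking-status excerpts.
X, y = make_classification(n_samples=8500, n_features=50, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {"NaiveBayes": GaussianNB(), "SVM": SVC(),
          "DecisionTree": DecisionTreeClassifier(random_state=0)}
for n in (100, 500, 2000, len(X_train)):     # growing training-set sizes
    accs = {name: m.fit(X_train[:n], y_train[:n]).score(X_test, y_test)
            for name, m in models.items()}
    print(n, {k: round(v, 3) for k, v in accs.items()})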


Journal of Biomedical Informatics | 2001

A knowledge-based, concept-oriented view generation system for clinical data

Qing T. Zeng; James J. Cimino

Information overload is a well-known problem for clinicians who must review large amounts of data in patient records. Concept-oriented views, which organize patient data around clinical concepts such as diagnostic strategies and therapeutic goals, may offer a solution to the problem of information overload. However, although concept-oriented views are desirable, they are difficult to create and maintain. We have developed a general-purpose, knowledge-based approach to the generation of concept-oriented views and have built a system to test this approach. The system creates concept-oriented views through automated identification of relevant patient data. The knowledge in the system is represented by both a semantic network and rules. The key function, identification of relevant data, is accomplished by a rule-based traversal of the semantic network. This paper focuses on the design and implementation of the system; an evaluation of the system is reported separately.
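
The core mechanism, identifying relevant data by rule-based traversal of a semantic network, can be sketched as follows; the concepts, link types, and rule are invented for illustration.

semantic_net = {
    "diabetes": [("has-finding", "hba1c"), ("treated-by", "insulin")],
    "insulin": [("monitored-by", "glucose")],
    "hba1c": [],
    "glucose": [],
}

def relevant_data(concept, allowed=("has-finding", "treated-by", "monitored-by")):
    """Rule-based traversal: follow only the link types the rule allows,
    collecting every concept reached along the way."""
    found, stack = set(), [concept]
    while stack:
        node = stack.pop()
        for link, target in semantic_net.get(node, []):
            if link in allowed and target not in found:
                found.add(target)
                stack.append(target)
    return found

# Build a diabetes-oriented view: which patient data items to display.
print(relevant_data("diabetes"))             # {'hba1c', 'insulin', 'glucose'}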


International Conference on Biological and Medical Data Analysis | 2005

A text corpora-based estimation of the familiarity of health terminology

Qing T. Zeng; Eunjung Kim; Jonathan Crowell; Tony Tse

In a pilot effort to improve health communication, we created a method for measuring the familiarity of various medical terms. To obtain term familiarity data, we recruited 21 volunteers who agreed to take medical terminology quizzes containing 68 terms. We then created predictive models for familiarity based on term occurrence in text corpora and readers' demographics. Although the sample size was small, our preliminary results indicate that predicting the familiarity of medical terms based on an analysis of their frequency in text corpora is feasible. Further, individualized familiarity assessment is feasible when demographic features are included as predictors.
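
A minimal sketch of the frequency-based familiarity model, with made-up terms, corpus counts, and quiz scores standing in for the study data:

import numpy as np
from sklearn.linear_model import LinearRegression

corpus_freq = np.array([120000, 3500, 90, 15])   # term occurrences in a corpus
quiz_score = np.array([0.95, 0.70, 0.30, 0.10])  # fraction answering correctly

# Familiarity is modeled as a function of log corpus frequency.
X = np.log10(corpus_freq).reshape(-1, 1)
model = LinearRegression().fit(X, quiz_score)

# Predict the familiarity of an unseen term from its frequency alone.
print(model.predict(np.log10([[400]])))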


Journal of Health and Medical Informatics | 2013

Characterizing Clinical Text and Sublanguage: A Case Study of the VA Clinical Notes

Qing T. Zeng; Doug Redd; Guy Divita; Samah Jarad; Cynthia Br; Jonathan R. Nebeker

Objective: To characterize text and sublanguage in medical records to better address challenges within natural language processing (NLP) tasks such as information extraction, word sense disambiguation, information retrieval, and text summarization. Text and sublanguage analysis is needed to scale up NLP development for large and diverse free-text clinical data sets. Design: This is a quantitative descriptive study that analyzes the text and sublanguage characteristics of a very large Veterans Affairs (VA) clinical note corpus (569 million notes) to guide the customization of NLP for VA notes. Methods: We randomly sampled 100,000 notes from the top 100 most frequently appearing document types. We examined surface features and used those features to identify sublanguage groups using unsupervised clustering. Results: Using the text features, we were able to characterize each of the 100 document types and identify 16 distinct sublanguage groups. The identified sublanguages reflect different clinical domains and types of encounters within the sample corpus. We also found considerable variance within each of the document types. Such characteristics will facilitate the tuning and crafting of NLP tools. Conclusion: Using a diverse and large sample of clinical text, we were able to show that there are a relatively large number of sublanguages and variance both within and between document types. These findings will guide NLP development toward more customizable and generalizable solutions across medical domains and sublanguages.
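
A rough sketch of the surface-feature extraction and unsupervised clustering step, with toy notes and illustrative feature choices (the study's actual feature set is richer):

import numpy as np
from sklearn.cluster import KMeans

notes = ["BP 120/80 HR 72 afebrile",
         "Patient seen for follow-up of diabetes. Doing well on metformin.",
         "WBC 9.1 HGB 13.2 PLT 250",
         "Veteran reports improved sleep and appetite since last visit."]

def surface_features(text):
    tokens = text.split()
    return [np.mean([len(t) for t in tokens]),                               # avg token length
            sum(any(c.isdigit() for c in t) for t in tokens) / len(tokens),  # numeric ratio
            sum(t.isupper() for t in tokens) / len(tokens)]                  # all-caps ratio

X = np.array([surface_features(n) for n in notes])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)   # structured/numeric notes should separate from narrative ones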


eGEMs (Generating Evidence & Methods to improve patient outcomes) | 2016

v3NLP Framework: Tools to Build Applications for Extracting Concepts from Clinical Text

Guy Divita; Marjorie E. Carter; Le-Thuy T. Tran; Doug Redd; Qing T. Zeng; Scott L. DuVall; Matthew H. Samore; Adi V. Gundlapalli

Introduction: Substantial amounts of clinically significant information are contained only within the narrative of the clinical notes in electronic medical records. The v3NLP Framework is a set of “best-of-breed” functionalities developed to transform this information into structured data for use in quality improvement, research, population health surveillance, and decision support. Background: MetaMap, cTAKES, and similar well-known natural language processing (NLP) tools do not have sufficient scalability out of the box. The v3NLP Framework evolved out of the necessity to scale these tools up and to provide a framework for customizing and tuning techniques to fit a variety of tasks, including document classification, tuned concept extraction for specific conditions, patient classification, and information retrieval. Innovation: Beyond scalability, several projects developed with the v3NLP Framework have been efficacy-tested and benchmarked. While the v3NLP Framework includes annotators, pipelines, and applications, its functionalities enable developers to create novel annotators and to place annotators into pipelines and scaled applications. Discussion: The v3NLP Framework has been successfully utilized in many projects, including general concept extraction, risk factors for homelessness among veterans, and identification of mentions of the presence of an indwelling urinary catheter. Projects as diverse as predicting colonization with methicillin-resistant Staphylococcus aureus and extracting references to military sexual trauma are being built using v3NLP Framework components. Conclusion: The v3NLP Framework is a set of functionalities and components that give Java developers the ability to create novel annotators and to place those annotators into pipelines and applications to extract concepts from clinical text. There are scale-up and scale-out functionalities to process large numbers of records.
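
The v3NLP Framework itself is written in Java; the Python sketch below only illustrates the general pipeline-of-annotators pattern it is built around, with invented annotator and class names.

from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Document:
    text: str
    annotations: Dict[str, List[str]] = field(default_factory=dict)

Annotator = Callable[[Document], Document]

def tokenizer(doc: Document) -> Document:
    doc.annotations["tokens"] = doc.text.split()
    return doc

def concept_extractor(doc: Document) -> Document:
    # Toy dictionary lookup standing in for tuned concept extraction.
    lexicon = {"catheter", "homeless"}
    doc.annotations["concepts"] = [t for t in doc.annotations["tokens"]
                                   if t.lower().strip(".,") in lexicon]
    return doc

def run_pipeline(doc: Document, annotators: List[Annotator]) -> Document:
    for annotate in annotators:              # each stage adds its annotations
        doc = annotate(doc)
    return doc

doc = run_pipeline(Document("Indwelling catheter noted on exam."),
                   [tokenizer, concept_extractor])
print(doc.annotations["concepts"])           # ['catheter']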


Computers in Biology and Medicine | 2014

Informatics can identify systemic sclerosis (SSc) patients at risk for scleroderma renal crisis

Doug Redd; Tracy M. Frech; Maureen A. Murtaugh; Julia Rhiannon; Qing T. Zeng

BACKGROUND Electronic medical records (EMR) provide an ideal opportunity for the detection, diagnosis, and management of systemic sclerosis (SSc) patients within the Veterans Health Administration (VHA). The objective of this project was to use informatics to identify potential SSc patients in the VHA who were on prednisone, in order to inform an outreach project to prevent scleroderma renal crisis (SRC). METHODS The electronic medical data for this study came from the Veterans Informatics and Computing Infrastructure (VINCI). For natural language processing (NLP) analysis, a set of retrieval criteria was developed for documents expected to have a high correlation with SSc. Two annotators reviewed the ratings to assemble a single adjudicated set of ratings, from which a support vector machine (SVM) based document classifier was trained. Any patient having at least one document positively classified for SSc was considered positive for SSc, and the use of prednisone ≥10 mg in the clinical document was reviewed to determine whether it was an active medication on the prescription list. RESULTS In the VHA, 4272 patients had a diagnosis of SSc as determined by the presence of an ICD-9 code. Of these patients, 1118 (21%) had use of prednisone ≥10 mg. Of these, 26 had a concurrent diagnosis of hypertension; these patients should not be on prednisone. Using NLP, an additional 16,522 patients were identified as possible SSc, highlighting that cases of SSc in the VHA may exist that are unidentified by ICD-9. A 10-fold cross-validation of the classifier resulted in a precision (positive predictive value) of 0.814, a recall (sensitivity) of 0.973, and an f-measure of 0.873. CONCLUSIONS Our study demonstrated that current clinical practice in the VHA includes the potentially dangerous use of prednisone for veterans with SSc. This study also suggests there may be many undetected cases of SSc, and that NLP can successfully identify these patients.
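
A sketch of the 10-fold cross-validation step, with synthetic data standing in for the adjudicated document ratings and the paper's actual SVM features:

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.svm import LinearSVC

# Synthetic stand-in for the adjudicated SSc document ratings.
X, y = make_classification(n_samples=500, n_features=100, random_state=0)

scores = cross_validate(LinearSVC(), X, y, cv=10,
                        scoring=("precision", "recall", "f1"))
for metric in ("test_precision", "test_recall", "test_f1"):
    print(metric, round(scores[metric].mean(), 3))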


International Conference on Biological and Medical Data Analysis | 2006

Data integration in multi-dimensional data sets: informational asymmetry in the valid correlation of subdivided samples

Qing T. Zeng; Juan P. Pratt; Jane Pak; Eun-Young Kim; Dino J. Ravnic; Harold T. Huss; Steven J. Mentzer

Background: Flow cytometry is the only currently available high-throughput technology that can measure multiple physical and molecular characteristics of individual cells. It is common in flow cytometry to measure a relatively large number of characteristics or features by performing separate experiments on subdivided samples. Correlating data from multiple experiments using certain shared features (e.g., cell size) could provide useful information on the combination pattern of the non-shared features. Such correlations, however, are not always reliable. Methods: We developed a method to assess correlation reliability by estimating the percentage of cells that can be unambiguously correlated between two samples. This method was evaluated using 81 pairs of subdivided samples of microspheres (artificial cells) with known molecular characteristics. Results: Strong correlation (R=0.85) was found between the estimated and actual percentage of unambiguous correlation. Conclusion: The correlation reliability measure we developed can be used to support data integration across experiments on subdivided samples.
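
An illustrative analog of the correlation-reliability estimate: match cells across two subdivided samples on a single shared feature and count the fraction whose match is unambiguous within a tolerance. The paper's actual estimator may differ.

import numpy as np

rng = np.random.default_rng(0)
size_a = rng.normal(10.0, 2.0, 1000)             # shared feature in sample A
size_b = size_a + rng.normal(0.0, 0.1, 1000)     # same cells re-measured in B

def unambiguous_fraction(a, b, tol=0.2):
    """Fraction of A-cells with exactly one B-cell within tol of their value."""
    b = np.sort(b)
    lo = np.searchsorted(b, a - tol, side="left")
    hi = np.searchsorted(b, a + tol, side="right")
    return np.mean((hi - lo) == 1)

print(unambiguous_fraction(size_a, size_b))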

Collaboration


Dive into Qing T. Zeng's collaborations.

Top Co-Authors

Tony Tse

National Institutes of Health
