Publication


Featured research published by Jingqi Wang.


International Conference on Computational Linguistics | 2014

UTH_CCB: A report for SemEval 2014 -- Task 7 Analysis of Clinical Text

Yaoyun Zhang; Jingqi Wang; Buzhou Tang; Yonghui Wu; Min Jiang; Yukun Chen; Hua Xu

This work describes the participation of the University of Texas Health Science Center at Houston (UTHealth) team in the SemEval 2014 Task 7 challenge on analysis of clinical text. The task consisted of two subtasks: (1) disorder entity recognition, recognizing mentions of disorder concepts; (2) disorder entity encoding, mapping each mention to a unique Concept Unique Identifier (CUI) defined in the Unified Medical Language System (UMLS). We developed three ensemble learning approaches for recognizing disorder entities and a Vector Space Model-based method for encoding. Our approaches achieved top rank in both subtasks, with the best F-measure of 0.813 for entity recognition and the best accuracy of 74.1% for encoding, indicating the proposed approaches are promising.
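
As a rough illustration of what a Vector Space Model-based encoding step can look like, the sketch below matches a mention to the dictionary name with the highest cosine similarity over bag-of-words vectors. It is not the UTHealth system: the concept IDs (`C_A`, `C_B`), the tiny dictionary, and the function names are hypothetical placeholders, and a real system would draw names from UMLS and likely use weighted terms.

```python
import math
from collections import Counter

# Hypothetical mini-dictionary: concept ID -> known names.
# These IDs are placeholders, not real UMLS CUIs.
CONCEPTS = {
    "C_A": ["heart attack", "myocardial infarction"],
    "C_B": ["kidney failure", "renal failure"],
}

def vectorize(text):
    """Bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def encode(mention):
    """Return (best concept ID, score), or (None, 0.0) when nothing overlaps."""
    query = vectorize(mention)
    best_id, best_score = None, 0.0
    for cid, names in CONCEPTS.items():
        for name in names:
            score = cosine(query, vectorize(name))
            if score > best_score:
                best_id, best_score = cid, score
    return best_id, best_score

print(encode("acute renal failure"))
```

With this toy dictionary, "acute renal failure" shares two of its three tokens with "renal failure", so it is assigned to the placeholder concept `C_B`; a mention with no overlapping token gets no concept at all.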


Journal of Cheminformatics | 2015

A comparison of conditional random fields and structured support vector machines for chemical entity recognition in biomedical literature

Buzhou Tang; Yudong Feng; Xiaolong Wang; Yonghui Wu; Yaoyun Zhang; Min Jiang; Jingqi Wang; Hua Xu

Background: Chemical compounds and drugs (together called chemical entities) embedded in scientific articles are crucial for many information extraction tasks in the biomedical domain. However, only a very limited number of chemical entity recognition systems are publicly available, probably due to the lack of large manually annotated corpora. To accelerate the development of chemical entity recognition systems, the Spanish National Cancer Research Center (CNIO) and The University of Navarra organized a challenge on Chemical and Drug Named Entity Recognition (CHEMDNER). The CHEMDNER challenge contains two individual subtasks: 1) Chemical Entity Mention recognition (CEM); and 2) Chemical Document Indexing (CDI). Our study proposes machine learning-based systems for the CEM task. Methods: The 2013 CHEMDNER challenge organizers provided 10,000 manually annotated, UTF8-encoded PubMed abstracts following a predefined annotation guideline: a training set of 3,500 abstracts, a development set of 3,500 abstracts and a test set of 3,000 abstracts. We developed machine learning-based systems, based on conditional random fields (CRF) and structured support vector machines (SSVM) respectively, for the CEM task on this data set. The effects of three types of word representation (WR) features, generated by Brown clustering, random indexing and skip-gram, on both machine learning-based systems were also investigated. The performance of our system was evaluated on the test set using scripts provided by the CHEMDNER challenge organizers. Primary evaluation measures were micro Precision, Recall, and F-measure. Results: Our best system was among the top ranked systems with an official micro F-measure of 85.05%. Fixing a bug caused by inconsistent features marginally improved the performance (micro F-measure of 85.20%) of the system. Conclusions: The SSVM-based CEM systems outperformed the CRF-based CEM systems when using the same features. Each type of WR feature was beneficial to the CEM task. Both the CRF-based and SSVM-based systems using all three types of WR features showed better performance than the systems using only one type of WR feature.


Database | 2016

CD-REST: a system for extracting chemical-induced disease relation in literature

Jun Xu; Yonghui Wu; Yaoyun Zhang; Jingqi Wang; Hee-Jin Lee; Hua Xu

Mining chemical-induced disease relations embedded in the vast biomedical literature could facilitate a wide range of computational biomedical applications, such as pharmacovigilance. In 2015, BioCreative V organized a Chemical Disease Relation (CDR) Track on chemical-induced disease relation extraction from biomedical literature. We participated in all subtasks of this challenge. In this article, we present our participating system, the Chemical Disease Relation Extraction SysTem (CD-REST), an end-to-end system for extracting chemical-induced disease relations in biomedical literature. CD-REST consists of two main components: (1) a chemical and disease named entity recognition and normalization module, which employs the Conditional Random Fields algorithm for entity recognition and a Vector Space Model-based approach for normalization; and (2) a relation extraction module that classifies both sentence-level and document-level candidate drug–disease pairs by support vector machines. Our system achieved the best performance on the chemical-induced disease relation extraction subtask in the BioCreative V CDR Track, demonstrating the effectiveness of our proposed machine learning-based approaches for automatic extraction of chemical-induced disease relations in biomedical literature. The CD-REST system provides web services via HTTP POST requests. The web services can be accessed from http://clinicalnlptool.com/cdr. The online CD-REST demonstration system is available at http://clinicalnlptool.com/cdr/cdr.html. Database URL: http://clinicalnlptool.com/cdr; http://clinicalnlptool.com/cdr/cdr.html


BMC Systems Biology | 2017

A systematic analysis of FDA-approved anticancer drugs

Jingchun Sun; Qiang Wei; Yubo Zhou; Jingqi Wang; Qi Liu; Hua Xu

Background: The discovery of novel anticancer drugs is critical for pharmaceutical research and development and for patient treatment. Repurposing existing drugs that may have unanticipated effects as potential candidates is one way to meet this important goal. Systematic investigation of efficient anticancer drugs could provide valuable insights into trends in the discovery of anticancer drugs, which may contribute to the systematic discovery of new anticancer drugs. Results: In this study, we collected and analyzed 150 anticancer drugs approved by the US Food and Drug Administration (FDA). Based on drug mechanism of action, these agents are divided into two groups: 61 cytotoxic-based drugs and 89 target-based drugs. We found that in recent years, the proportion of targeted agents tended to increase, and the targeted drugs tended to be delivered as signal drugs. For the 89 target-based drugs, we collected 102 effect-mediating drug targets in the human genome and found that most targets were located on the plasma membrane and most of them belonged to the enzyme class, especially tyrosine kinases. From the above 150 drugs, we built a drug-cancer network, which contained 183 nodes (150 drugs and 33 cancer types) and 248 drug-cancer associations. The network indicated that the cytotoxic drugs tended to be used to treat more cancer types than targeted drugs. From the 89 targeted drugs, we built a cancer-drug-target network, which contained 214 nodes (23 cancer types, 89 drugs, and 102 targets) and 313 edges (118 drug-cancer associations and 195 drug-target associations). Starting from the network, we discovered 133 novel drug-cancer associations among 52 drugs and 16 cancer types by applying the common target-based approach. Most novel drug-cancer associations (116, 87%) are supported by at least one clinical trial study. Conclusions: In this study, we provided a comprehensive data source, including anticancer drugs and their targets, and performed a detailed analysis in terms of historical tendency and networks. Its application to identify novel drug-cancer associations demonstrated that the data collected in this study is promising to serve as a foundation for anticancer drug repurposing and development.
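
The abstract does not spell out the common target-based approach, so the sketch below is one plausible reading rather than the authors' method: a drug becomes a candidate for a cancer type when it shares at least one target with another drug already associated with that cancer. All drug, target, and cancer names are hypothetical placeholders. The final assertions only check that the node, edge, and percentage figures quoted in the abstract are internally consistent.

```python
from collections import defaultdict

# Toy, hypothetical data; names are placeholders, not taken from the study.
drug_targets = {
    "drug1": {"T1", "T2"},
    "drug2": {"T2"},
    "drug3": {"T3"},
}
drug_cancers = {
    "drug1": {"cancerA"},
    "drug2": {"cancerB"},
    "drug3": {"cancerA"},
}

def candidate_associations(drug_targets, drug_cancers):
    """Propose (drug, cancer) pairs where the drug shares a target with
    another drug already associated with that cancer, and the pair is new."""
    target_to_drugs = defaultdict(set)
    for drug, targets in drug_targets.items():
        for target in targets:
            target_to_drugs[target].add(drug)

    candidates = set()
    for drug, targets in drug_targets.items():
        known = drug_cancers.get(drug, set())
        for target in targets:
            for other in target_to_drugs[target] - {drug}:
                for cancer in drug_cancers.get(other, set()) - known:
                    candidates.add((drug, cancer))
    return candidates

print(sorted(candidate_associations(drug_targets, drug_cancers)))

# Consistency checks on the figures reported in the abstract.
assert 61 + 89 == 150             # cytotoxic + target-based drugs
assert 150 + 33 == 183            # nodes in the drug-cancer network
assert 23 + 89 + 102 == 214       # nodes in the cancer-drug-target network
assert 118 + 195 == 313           # edges in the cancer-drug-target network
assert round(100 * 116 / 133) == 87  # share supported by a clinical trial
```

In the toy data, drug1 and drug2 share target T2, so each inherits the other's cancer type as a candidate, while drug3 shares no target and yields nothing.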


Journal of the American Medical Informatics Association | 2018

CLAMP - a toolkit for efficiently building customized clinical natural language processing pipelines

Ergin Soysal; Jingqi Wang; Min Jiang; Yonghui Wu; Serguei V. S. Pakhomov; Hongfang Liu; Hua Xu

Existing general clinical natural language processing (NLP) systems such as MetaMap and Clinical Text Analysis and Knowledge Extraction System have been successfully applied to information extraction from clinical text. However, end users often have to customize existing systems for their individual tasks, which can require substantial NLP skills. Here we present CLAMP (Clinical Language Annotation, Modeling, and Processing), a newly developed clinical NLP toolkit that provides not only state-of-the-art NLP components, but also a user-friendly graphic user interface that can help users quickly build customized NLP pipelines for their individual applications. Our evaluation shows that the CLAMP default pipeline achieved good performance on named entity recognition and concept encoding. We also demonstrate the efficiency of the CLAMP graphic user interface in building customized, high-performance NLP pipelines with 2 use cases, extracting smoking status and lab test values. CLAMP is publicly available for research use, and we believe it is a unique asset for the clinical NLP community.


North American Chapter of the Association for Computational Linguistics | 2016

UTHealth at SemEval-2016 Task 12: an End-to-End System for Temporal Information Extraction from Clinical Notes

Hee-Jin Lee; Hua Xu; Jingqi Wang; Yaoyun Zhang; Sungrim Moon; Jun Xu; Yonghui Wu

The 2016 Clinical TempEval challenge addresses temporal information extraction from clinical notes. The challenge is composed of six sub-tasks, each of which is to identify: (1) event mention spans, (2) time expression spans, (3) event attributes, (4) time attributes, (5) events’ temporal relations to the document creation times (DocTimeRel), and (6) narrative container relations among events and times. In this article, we present an end-to-end system that addresses all six sub-tasks. Our system achieved the best performance for all six sub-tasks when plain texts were given as input. It also performed best for narrative container relation identification when gold standard event/time annotations were given.


Database | 2016

Chemical named entity recognition in patents by domain knowledge and unsupervised feature learning

Yaoyun Zhang; Jun Xu; Hui Chen; Jingqi Wang; Yonghui Wu; Manu Prakasam; Hua Xu

Medicinal chemistry patents contain rich information about chemical compounds. Although much effort has been devoted to extracting chemical entities from scientific literature, limited numbers of patent mining systems are publicly available, probably due to the lack of large manually annotated corpora. To accelerate the development of information extraction systems for medicinal chemistry patents, the 2015 BioCreative V challenge organized a track on Chemical and Drug Named Entity Recognition from patent text (CHEMDNER patents). This track included three individual subtasks: (i) Chemical Entity Mention Recognition in Patents (CEMP), (ii) Chemical Passage Detection (CPD) and (iii) Gene and Protein Related Object task (GPRO). We participated in the two subtasks of CEMP and CPD using machine learning-based systems. Our machine learning-based systems employed the algorithms of conditional random fields (CRF) and structured support vector machines (SSVMs), respectively. To improve the performance of the NER systems, two strategies were proposed for feature engineering: (i) domain knowledge features of dictionaries, chemical structural patterns and semantic type information present in the context of the candidate chemical and (ii) unsupervised feature learning algorithms to generate word representation features by Brown clustering and a novel binarized word embedding to enhance the generalizability of the system. Further, the system output for the CPD task was generated from the patent titles and abstracts with chemicals recognized in the CEMP task. The effects of the proposed feature strategies on both the machine learning-based systems were investigated. 
Our best system achieved the second best performance among 21 participating teams in CEMP with a precision of 87.18%, a recall of 90.78% and an F-measure of 88.94% and was the top performing system among nine participating teams in CPD with a sensitivity of 98.60%, a specificity of 87.21%, an accuracy of 94.75%, a Matthews correlation coefficient (MCC) of 88.24%, a precision at full recall (P_full_R) of 66.57% and an area under the precision-recall curve (AUC_PR) of 0.9347. The SSVM-based CEMP systems outperformed the CRF-based CEMP systems when using the same features. Features generated from both the domain knowledge and unsupervised learning algorithms significantly improved the chemical NER task on patents. Database URL: http://database.oxfordjournals.org/content/2016/baw049
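
The reported CEMP F-measure can be reproduced from the reported precision and recall, assuming it is the balanced F1 score (the harmonic mean of the two). The function name and the `beta` parameter below are illustrative choices, not part of the challenge's evaluation scripts.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Precision and recall as reported for the CEMP subtask.
cemp_f1 = f_measure(0.8718, 0.9078)
print(f"{cemp_f1:.4f}")
```

The result rounds to 0.8894, which agrees with the 88.94% stated in the abstract.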


North American Chapter of the Association for Computational Linguistics | 2015

UTH-CCB: The Participation of the SemEval 2015 Challenge -- Task 14

Jun Xu; Yaoyun Zhang; Jingqi Wang; Yonghui Wu; Min Jiang; Ergin Soysal; Hua Xu

This paper describes the system developed by the University of Texas Health Science Center at Houston (UTHealth) for the 2015 SemEval shared task on “Analysis of Clinical Text” (Task 14). We participated in both sub-tasks: Task 1 for “Disorder Identification”, which aims to detect disorder entities and encode them to UMLS (Unified Medical Language System) CUIs (Concept Unique Identifiers), and Task 2 for “Disorder Slot Filling”, where the task is to identify normalized values for modifiers of disorders. For Task 1, we developed an ensemble approach that combined machine learning-based named entity recognition classifiers with MetaMap, an existing symbolic biomedical NLP system, to recognize disorder entities, and we used a general Vector Space Model-based approach for disorder encoding to UMLS CUIs. To identify modifiers of disorders (Task 2), we developed Support Vector Machines-based classifiers for each type of modifier, by exploring various types of features. Our system was ranked third for Task 1 and first for Task 2 (both 2A and 2B), demonstrating the effectiveness of machine learning-based approaches for extracting clinical entities and their modifiers from clinical narratives.


BMC Medical Informatics and Decision Making | 2017

An active learning-enabled annotation system for clinical named entity recognition

Yukun Chen; Thomas A. Lasko; Qiaozhu Mei; Qingxia Chen; Sungrim Moon; Jingqi Wang; Ky Nguyen; Tolulola Dawodu; Trevor Cohen; Joshua C. Denny; Hua Xu

Background: Active learning (AL) has shown promising potential to minimize the annotation cost while maximizing the performance in building statistical natural language processing (NLP) models. However, very few studies have investigated AL in a real-life setting in the medical domain. Methods: In this study, we developed the first AL-enabled annotation system for clinical named entity recognition (NER) with a novel AL algorithm. Besides the simulation study to evaluate the novel AL algorithm, we further conducted user studies with two nurses using this system to assess the performance of AL in real-world annotation processes for building clinical NER models. Results: The simulation results show that the novel AL algorithm outperformed the traditional AL algorithm and random sampling. However, the user study tells a different story: AL methods did not always perform better than random sampling for different users. Conclusions: We found that the increased information content of actively selected sentences is strongly offset by the increased time required to annotate them. Moreover, the annotation time was not considered in the querying algorithms. Our future work includes developing better AL algorithms with the estimation of annotation time and evaluating the system with a larger number of users.
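
To make the trade-off in the conclusions concrete, the sketch below contrasts plain least-confidence sampling with a cost-aware variant that divides uncertainty by an estimate of annotation time. It is a generic illustration, not the novel AL algorithm from the paper: the per-sentence probabilities are made up, and using token count as a proxy for annotation time is an assumption.

```python
def least_confidence(probabilities):
    """Uncertainty of one sentence: 1 minus the model's top label probability."""
    return 1.0 - max(probabilities)

def rank_for_annotation(pool, cost_aware=False):
    """Order unlabeled sentences, most worth annotating first.

    pool: list of (sentence, label_probabilities) pairs.
    With cost_aware=True, uncertainty is divided by the token count,
    used here as a crude, assumed proxy for annotation time.
    """
    def score(item):
        sentence, probs = item
        uncertainty = least_confidence(probs)
        if cost_aware:
            return uncertainty / max(len(sentence.split()), 1)
        return uncertainty
    return [sentence for sentence, _ in sorted(pool, key=score, reverse=True)]

# Hypothetical pool with made-up model confidences.
pool = [
    ("Patient denies chest pain .", [0.9, 0.1]),
    ("History of diabetes mellitus type 2 with chronic kidney disease stage 3 .", [0.5, 0.5]),
    ("No fever .", [0.7, 0.3]),
]

print(rank_for_annotation(pool)[0])
print(rank_for_annotation(pool, cost_aware=True)[0])
```

Plain uncertainty sampling picks the long, ambiguous sentence first, while the cost-aware ranking prefers the short sentence that is moderately uncertain but quick to annotate, which is the kind of shift the abstract's future work points toward.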


Journal of the American Medical Informatics Association | 2018

Toward a normalized clinical drug knowledge base in China—applying the RxNorm model to Chinese clinical drugs

Li Wang; Yaoyun Zhang; Min Jiang; Jingqi Wang; Jiancheng Dong; Yun Liu; Cui Tao; Guoqian Jiang; Yi Zhou; Hua Xu

Objective: In recent years, electronic health record systems have been widely implemented in China, making clinical data available electronically. However, little effort has been devoted to making drug information exchangeable among these systems. This study aimed to build a Normalized Chinese Clinical Drug (NCCD) knowledge base, by applying and extending the information model of RxNorm to Chinese clinical drugs. Methods: Chinese drugs were collected from 4 major resources (China Food and Drug Administration, China Health Insurance Systems, Hospital Pharmacy Systems, and China Pharmacopoeia) for integration and normalization in NCCD. Chemical drugs were normalized using the information model in RxNorm without much change. Chinese patent drugs (i.e., Chinese herbal extracts), however, were represented using an expanded RxNorm model to incorporate the unique characteristics of these drugs. A hybrid approach combining automated natural language processing technologies and manual review by domain experts was then applied to drug attribute extraction, normalization, and further generation of drug names at different specification levels. Lastly, we reported the statistics of NCCD, as well as the evaluation results using several sets of randomly selected Chinese drugs. Results: The current version of NCCD contains 16 976 chemical drugs and 2663 Chinese patent medicines, resulting in 19 639 clinical drugs, 250 267 unique concepts, and 2 602 760 relations. By manual review of 1700 chemical drugs and 250 Chinese patent drugs randomly selected from NCCD (about 10%), we showed that the hybrid approach could achieve an accuracy of 98.60% for drug name extraction and normalization. Using a collection of 500 chemical drugs and 500 Chinese patent drugs from other resources, we showed that NCCD achieved coverages of 97.0% and 90.0% for chemical drugs and Chinese patent drugs, respectively. Conclusion: Evaluation results demonstrated the potential to improve interoperability across various electronic drug systems in China.

Collaboration


Dive into Jingqi Wang's collaborations.

Top Co-Authors

Hua Xu

University of Texas Health Science Center at Houston

Yaoyun Zhang

University of Texas Health Science Center at Houston

Yonghui Wu

University of Texas Health Science Center at Houston

Jun Xu

University of Texas Health Science Center at Houston

Min Jiang

University of Texas Health Science Center at Houston

Sungrim Moon

University of Minnesota

Ergin Soysal

University of Texas Health Science Center at Houston

Cui Tao

University of Texas Health Science Center at Houston

Joshua C. Denny

Vanderbilt University Medical Center
