Publication


Featured research published by Shaodian Zhang.


Journal of Biomedical Informatics | 2013

Unsupervised biomedical named entity recognition

Shaodian Zhang; Noémie Elhadad

Named entity recognition is a crucial component of biomedical natural language processing, enabling information extraction and ultimately reasoning over and knowledge discovery from text. Much progress has been made in the design of rule-based and supervised tools, but they are often genre and task dependent. As such, adapting them to different genres of text or identifying new types of entities requires major effort in re-annotation or rule development. In this paper, we propose an unsupervised approach to extracting named entities from biomedical text. We describe a stepwise solution to tackle the challenges of entity boundary detection and entity type classification without relying on any handcrafted rules, heuristics, or annotated data. A noun phrase chunker followed by a filter based on inverse document frequency extracts candidate entities from free text. Classification of candidate entities into categories of interest is carried out by leveraging principles from distributional semantics. Experiments show that our system, especially the entity classification step, yields competitive results on two popular biomedical datasets of clinical notes and biological literature, and outperforms a baseline dictionary match approach. Detailed error analysis provides a road map for future work.
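
The candidate-filtering step described above can be illustrated with a small sketch. The abstract does not say how inverse document frequency (IDF) scores are aggregated or thresholded, so the mean-over-tokens rule, the `threshold` value, the zero score for unseen tokens, and both function names below are assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def inverse_document_frequency(documents):
    """Map each term to log(N / df), where df is the number of documents containing it."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def filter_candidates(candidates, idf, threshold):
    """Keep candidate phrases whose mean token IDF is at least `threshold`.

    Assumption: tokens missing from the IDF table score 0.0.
    """
    kept = []
    for phrase in candidates:
        scores = [idf.get(tok, 0.0) for tok in phrase.lower().split()]
        if scores and sum(scores) / len(scores) >= threshold:
            kept.append(phrase)
    return kept


# Toy corpus of tokenized sentences (hypothetical data).
docs = [
    ["the", "patient", "denies", "chest", "pain"],
    ["the", "patient", "reports", "nausea"],
    ["the", "patient", "has", "a", "history", "of", "asthma"],
]
idf = inverse_document_frequency(docs)
kept = filter_candidates(["the patient", "chest pain", "asthma"], idf, threshold=0.5)
```

In this toy corpus, "the" and "patient" occur in every document and receive an IDF of zero, so the phrase "the patient" is filtered out while the rarer phrases are retained.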


ACM Transactions on Intelligent Systems and Technology | 2013

Named entity recognition for tweets

Xiaohua Liu; Furu Wei; Shaodian Zhang; Ming Zhou

Two main challenges of Named Entity Recognition (NER) for tweets are the insufficient information in a tweet and the lack of training data. We propose a novel method consisting of three core elements: (1) normalization of tweets; (2) combination of a K-Nearest Neighbors (KNN) classifier with a linear Conditional Random Fields (CRF) model; and (3) a semisupervised learning framework. The tweet normalization preprocessing corrects common ill-formed words using a global linear model. The KNN-based classifier conducts prelabeling to collect global coarse evidence across tweets, while the CRF model conducts sequential labeling to capture fine-grained information encoded in a tweet. The semisupervised learning, together with the gazetteers, alleviates the lack of training data. Extensive experiments show the advantages of our method over the baselines as well as the effectiveness of normalization, KNN, and semisupervised learning.


Journal of the American Medical Informatics Association | 2016

Online cancer communities as informatics intervention for social support: conceptualization, characterization, and impact

Shaodian Zhang; Erin O’Carroll Bantum; Jason E. Owen; Suzanne Bakken; Noémie Elhadad

Objectives: The Internet and social media are revolutionizing how social support is exchanged and perceived, making online health communities (OHCs) one of the most exciting research areas in health informatics. This paper aims to provide a framework for organizing research on OHCs and to help identify questions to explore in future informatics research. Based on the framework, we conceptualize OHCs from a social support standpoint and identify variables of interest in characterizing community members. For the sake of this tutorial, we focus our review on online cancer communities. Target audience: The primary target audience is informaticists interested in understanding ways to characterize OHCs, their members, and the impact of participation, and in creating tools to facilitate outcome research of OHCs. OHC designers and moderators are also among the target audience for this tutorial. Scope: The tutorial provides an informatics point of view of online cancer communities, with social support as their leading element. We conceptualize OHCs according to 3 major variables: type of support, source of support, and setting in which the support is exchanged. We summarize current research and synthesize the findings for 2 primary research questions on online cancer communities: (1) the impact of using online social support on an individual's health, and (2) the characteristics of the community, its members, and their interactions. We discuss ways in which future research in informatics in social support and OHCs can ultimately benefit patients.


Journal of Biomedical Informatics | 2016

Speculation detection for Chinese clinical notes

Shaodian Zhang; Tian Kang; Xingting Zhang; Dong Wen; Noémie Elhadad; Jianbo Lei

Speculations represent uncertainty toward certain facts. In clinical texts, identifying speculations is a critical step of natural language processing (NLP). While it is a nontrivial task in many languages, detecting speculations in Chinese clinical notes can be particularly challenging because word segmentation may be necessary as an upstream operation. The objective of this paper is to construct a state-of-the-art speculation detection system for Chinese clinical notes and to investigate whether embedding features and word segmentations are worth exploiting toward this overall task. We propose a sequence-labeling-based system for speculation detection, which relies on features from bag of characters, bag of words, character embedding, and word embedding. We experiment on a novel dataset of 36,828 clinical notes with 5103 gold-standard speculation annotations on 2000 notes, and compare systems in which word embeddings are calculated from word segmentations given by general and by domain-specific segmenters, respectively. Our systems are able to reach performance as high as 92.2% measured by F score. We demonstrate that word segmentation is critical to producing high-quality word embeddings that facilitate downstream information extraction applications, and suggest that a domain-dependent word segmenter can be vital to such a clinical NLP task in Chinese.


Computer Methods and Programs in Biomedicine | 2017

Detecting negation and scope in Chinese clinical notes using character and word embedding

Tian Kang; Shaodian Zhang; Nanfang Xu; Dong Wen; Xingting Zhang; Jianbo Lei

BACKGROUND AND OBJECTIVES: Researchers have developed effective methods to index free-text clinical notes into structured databases, in which negation detection is a critical but challenging step. In Chinese clinical records, negation detection is particularly challenging because it may depend on upstream Chinese information processing components such as word segmentation [1]. Traditionally, negation detection was carried out mostly using rule-based methods, whose comprehensiveness and portability were usually limited. Our objectives in this paper are to: 1) construct a large Chinese clinical notes corpus with negation annotated; 2) develop a negation detection tool for Chinese clinical notes; 3) evaluate the performance of character and word embedding features in Chinese clinical natural language processing. METHODS: In this paper, we construct a Chinese clinical corpus consisting of admission and discharge summaries, and propose sequence-labeling-based systems for negation and scope detection. Our systems rely on features from bag of characters, bag of words, character embedding, and word embedding. For scopes, we introduce an additional feature to handle nested scopes with multiple negations. RESULTS: The two annotators reached an agreement of 0.79 measured by Kappa in manual annotation. In cue detection, our systems are able to achieve a performance as high as 99.0% measured by F score, which significantly outperforms the rule-based counterpart (79% F). The best system uses word embedding as features, which yields precision of 99.0% and recall of 99.1%. In scope detection, our system is able to achieve a performance of 94.6% measured by F score. CONCLUSIONS: Our study provides a state-of-the-art negation-detecting tool for Chinese clinical free-text notes. Experimental results demonstrate that word embedding is effective in identifying negations, and that nested scopes can be identified effectively by our method.
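
The cue-detection figures reported above are consistent with the balanced F score, the harmonic mean of precision and recall. A minimal sketch, assuming the abstract's "F score" is the balanced F1 measure (the abstract does not state the variant); the `f_score` helper is hypothetical and only illustrates the arithmetic:

```python
def f_score(precision, recall, beta=1.0):
    """Return the F-beta score; beta=1.0 gives the harmonic mean of precision and recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when both inputs are zero
    beta_sq = beta * beta
    return (1.0 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Precision and recall of the best cue-detection system, as reported in the abstract.
f1 = f_score(0.990, 0.991)  # approximately 0.9905
```

Under this assumption, a precision of 99.0% and a recall of 99.1% give an F1 of roughly 99.0%, matching the reported cue-detection performance.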


Journal of the American Medical Informatics Association | 2017

EliIE: An open-source information extraction system for clinical trial eligibility criteria

Tian Kang; Shaodian Zhang; Youlan Tang; Gregory W. Hruby; Alexander Rusanov; Noémie Elhadad; Chunhua Weng

Objective: To develop an open-source information extraction system called Eligibility Criteria Information Extraction (EliIE) for parsing and formalizing free-text clinical research eligibility criteria (EC) following Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) version 5.0. Materials and Methods: EliIE parses EC in 4 steps: (1) clinical entity and attribute recognition, (2) negation detection, (3) relation extraction, and (4) concept normalization and output structuring. Informaticians and domain experts were recruited to design an annotation guideline and generate a training corpus of annotated EC for 230 Alzheimer's clinical trials, which were represented as queries against the OMOP CDM and included 8008 entities, 3550 attributes, and 3529 relations. A sequence labeling-based method was developed for automatic entity and attribute recognition. Negation detection was supported by NegEx and a set of predefined rules. Relation extraction was achieved by a support vector machine classifier. We further performed terminology-based concept normalization and output structuring. Results: In task-specific evaluations, the best F1 score for entity recognition was 0.79, and for relation extraction was 0.89. The accuracy of negation detection was 0.94. The overall accuracy for query formalization was 0.71 in an end-to-end evaluation. Conclusions: This study presents EliIE, an OMOP CDM-based information extraction system for automatic structuring and formalization of free-text EC. According to our evaluation, machine learning-based EliIE outperforms existing systems and shows promise to improve.


Journal of Biomedical Informatics | 2017

A cascaded approach for Chinese clinical text de-identification with less annotation effort

Zhe Jian; Xusheng Guo; Shijian Liu; Handong Ma; Shaodian Zhang; Rui Zhang; Jianbo Lei

With the rapid adoption of Electronic Health Records (EHR) in China, an increasing amount of clinical data has become available to support clinical research. Secondary use of clinical data usually requires de-identification of personal information to protect patient privacy. Since manual de-identification of free clinical text requires a significant amount of human work, developing an automated de-identification system is necessary. While there are many de-identification systems available for English clinical text, designing a de-identification system for Chinese clinical text faces many challenges, such as the unavailability of necessary lexical resources and the sparsity of patient health information (PHI) in Chinese clinical text. In this paper, we design a de-identification pipeline that takes advantage of both rule-based and machine learning techniques. Our method, in particular, can effectively construct a data set with dense PHI information, which saves annotation time significantly for subsequent supervised learning. We experiment on a dataset of 3000 heterogeneous clinical documents to evaluate the annotation cost and the de-identification performance. Our approach can increase the efficiency of the annotation effort by over 60% while reaching performance of over 90% measured by F score. We demonstrate that combining rule-based and machine learning techniques is an effective way to reduce the annotation cost and achieve high performance in the Chinese clinical text de-identification task.


International World Wide Web Conferences | 2017

Cataloguing Treatments Discussed and Used in Online Autism Communities

Shaodian Zhang; Tian Kang; Lin Qiu; Weinan Zhang; Yong Yu; Noémie Elhadad

A large number of patients discuss treatments in online health communities (OHCs). One research question of interest to health researchers is whether treatments being discussed in OHCs are eventually used by community members in their real lives. In this paper, we rely on machine learning methods to automatically identify attributions of mentions of treatments from an online autism community. The context of our work is online autism communities, where parents exchange support for the care of their children with autism spectrum disorder. Our methods are able to distinguish discussions of treatments that are associated with patients, caregivers, and others, as well as identify whether a treatment is actually taken. We investigate treatments that are not just discussed but also used by patients according to two types of content analysis, cross-sectional and longitudinal. The treatments identified through our content analysis help create a catalogue of real-world treatments. The results of this study lay the foundation for future research to compare real-world drug usage with established clinical guidelines.


Meeting of the Association for Computational Linguistics | 2011

Recognizing Named Entities in Tweets

Xiaohua Liu; Shaodian Zhang; Furu Wei; Ming Zhou


American Medical Informatics Association Annual Symposium | 2014

Characterizing the sublanguage of online breast cancer forums for medications, symptoms, and emotions

Noémie Elhadad; Shaodian Zhang; Patricia Driscoll; Samuel Brody

Collaboration


Top Co-Authors

Weinan Zhang

Shanghai Jiao Tong University

Yong Yu

Shanghai Jiao Tong University

Jason E. Owen

VA Palo Alto Healthcare System

Bao-Liang Lu

Shanghai Jiao Tong University

Hai Zhao

Shanghai Jiao Tong University

Lin Qiu

Shanghai Jiao Tong University