Comprehend Medical: a Named Entity Recognition and Relationship Extraction Web Service
Parminder Bhatia, Busra Celikkaya, Mohammed Khalilia, Selvan Senthivel
Amazon, Seattle, Washington
[email protected]
Abstract—Comprehend Medical is a stateless and Health Insurance Portability and Accountability Act (HIPAA) eligible Named Entity Recognition (NER) and Relationship Extraction (RE) service launched under Amazon Web Services (AWS) and trained using state-of-the-art deep learning models. Contrary to many existing open-source tools, Comprehend Medical is scalable and does not require a steep learning curve, dependencies, pipeline configurations, or installations. Currently, Comprehend Medical performs NER in five medical categories: Anatomy, Medical Condition, Medication, Protected Health Information (PHI) and Treatment, Test and Procedure (TTP). Additionally, the service provides relationship extraction for the detected entities, as well as contextual information such as negation and temporality in the form of traits. Comprehend Medical provides two Application Programming Interfaces (API): 1) the NERe API, which returns all the extracted named entities, their traits and the relationships between them, and 2) the PHId API, which returns just the protected health information contained in the text. Furthermore, Comprehend Medical is accessible through the AWS Console and the Java and Python Software Development Kits (SDK), making it easy for both non-developers and developers to use.
Index Terms—Neural Networks, Multi-task Learning, Natural Language Processing, Clinical NLP, Named Entity Recognition, Relationship Extraction
I. INTRODUCTION
Electronic Health Records (EHR) contain a wealth of patient data, ranging from diagnoses, problems, treatments and medications to imaging and clinical narratives such as discharge summaries and progress reports. Structured data are important for billing, quality and outcomes. Narrative text, on the other hand, is more expressive, more engaging and captures the patient's story more accurately. Narrative notes may also convey the level of concern and uncertainty to others who are reviewing the note. Studies have shown that narrative notes contain more naturalistic prose, are more reliable in identifying patients with a given disease and are more understandable to healthcare providers reviewing those notes [1]–[5]. Therefore, to have a clear perspective on a patient's condition, narrative text should be analyzed. However, manual analysis of massive numbers of narrative notes is time consuming, labor intensive and prone to errors.

Many clinical Natural Language Processing (NLP) tools and systems have been published to help make sense of this valuable narrative text. For instance, the clinical Text Analysis and Knowledge Extraction System (cTAKES) [6] is an open-source NLP package based on the Unstructured Information Management Architecture (UIMA) framework [7] and the OpenNLP [8] natural language processing toolkit. cTAKES uses dictionary look-up, and each mention is mapped to a Unified Medical Language System (UMLS) concept [9]. MetaMap [10] is another open-source tool that aims at mapping mentions in biomedical text to UMLS concepts using dictionary look-up. MetaMap Lite [11] adds negation detection based on either ConText [12] or NegEx [13].

The Clinical Language Annotation, Modeling, and Processing toolkit (CLAMP) [14] is one of the most recent clinical NLP systems. CLAMP is motivated by the fact that existing clinical NLP systems need customization and must be tailored to one's task. For NER, CLAMP takes two approaches: a machine learning approach using Conditional Random Fields (CRF) [15] and a dictionary-based approach, which maps mentions to standardized ontologies. CLAMP also provides assertion and negation detection based on machine learning or the rule-based NegEx.

Many of the existing NLP systems rely on ConText [12] and NegEx [13] to detect assertions such as negation. ConText extracts three contextual features for medical conditions: negation, historical or hypothetical, and experienced by someone other than the patient. ConText is an extension of NegEx, which is based on regular expressions.

Most of the NLP systems discussed above perform linking of mentions to UMLS. They are based on pipelined components that are configurable, rely on dictionary look-up for NER and on regular expressions for assertion detection.

Recently, neural network models have been proposed to overcome some of the limitations of rule-based techniques. Feedforward and bidirectional Long Short Term Memory (BiLSTM) networks for generic negation scope detection were proposed in [16]. In [17], gated recurrent units (GRUs) are used to represent clinical relations and their context, along with an attention mechanism. Given text annotated with relations, the model classifies the presence and period of the relations. However, this approach is not end-to-end, as it does not predict the relations themselves. Additionally, these models generally require a large annotated corpus to achieve good performance, but clinical data is scarce.

Kernel-based approaches are also very common, especially in the 2010 i2b2/VA task of predicting assertions.
The state-of-the-art in that challenge applied support vector machines (SVM) to assertion prediction as a separate step after entity extraction [18]. They train classifiers to predict assertions for each concept word, and a separate classifier to predict the assertion of the whole entity. The Augmented Bag of Words Kernel (ABoW), which generates features based on NegEx rules along with bag-of-words features, was proposed in [19], and a CRF-based approach for classification of cues and scope detection was proposed in [20]. These machine learning based approaches often suffer in generalizability.

Once named entities are extracted, it is important to identify the relationships between them. Several end-to-end models have been proposed that jointly learn named entity recognition and relationship extraction [21]–[23]. Generally, relationship extraction models consist of an encoder followed by a relationship classification unit [24]–[26]. The encoder provides context-aware vector representations for both target entities, which are then merged or concatenated before being passed to the relation classification unit, where a two-layered neural network or multi-layer perceptron classifies the pair into different relation types.

Despite the existence of many clinical NLP systems, automatic information extraction from narrative clinical text has not achieved enough traction yet [27]. As reported by [27], there is a significant gap between clinical studies using Electronic Health Record (EHR) data and studies using clinical information extraction. Reasons for this gap can be attributed to the limited expertise of NLP experts in the clinical domain, the limited availability of clinical data sets due to HIPAA privacy rules, and the poor portability and generalizability of clinical NLP systems. Rule-based NLP systems require handcrafted rules, while machine learning based NLP systems require annotated datasets.

To narrow the clinical NLP adoption gap and to address some of the limitations in existing NLP systems, we present Comprehend Medical, a web service for clinical named entity recognition and relationship extraction. Our contributions are as follows:
• Named entity recognition, relationship extraction and trait detection encapsulated in one easy-to-use API.
• A web service that uses a deep learning multi-task approach [28] trained on labeled training data and requires no configuration or customization.
• Trait (negation, sign, symptom and diagnosis) detection for medical conditions and negation detection for medications.

The rest of the paper is organized as follows: section II presents the methods, section III describes the datasets and experimental settings, section IV contains the results for the NER and RE models, section V discusses the implementation details, section VI gives an overview of the supported entities, traits and relationships, section VII presents some of the use cases and we conclude in section VIII.

II. METHODS
In this section we briefly introduce the architectures for named entity recognition and trait detection proposed in [29] and the relation extraction using explicit context conditioning proposed in [30].
A. Named Entity Recognition Architecture
A sequence tagging problem such as NER can be formulated as maximizing the conditional probability distribution over tags $\mathbf{y}$ given an input sequence $\mathbf{x}$ and model parameters $\theta$:

$P(\mathbf{y} \mid \mathbf{x}, \theta) = \prod_{t=1}^{T} P(y_t \mid x_t, y_{t-1}, \theta)$   (1)

where $T$ is the length of the sequence and $y_{t-1}$ are the tags for the previous words. The architecture we use as a foundation is that of [31], [32]. The model consists of three main components: (i) a character encoder, (ii) a word encoder, and (iii) a decoder/tagger.
1) Encoders:
Given an input sequence $x \in \mathbb{N}^T$ whose coordinates indicate the words in the input vocabulary, we first encode the character-level representation for each word. For each $x_t$ the corresponding sequence $c^{(t)} \in \mathbb{R}^{L \times e_c}$ of character embeddings is fed into an encoder, where $L$ is the length of a given word and $e_c$ is the size of the character embedding. The character encoder employs two LSTM units which produce $\overrightarrow{h}^{(t)}_{1:l}$ and $\overleftarrow{h}^{(t)}_{1:l}$, the forward and backward hidden representations, respectively, where $l$ is the last timestep in both sequences. We concatenate the last timestep of each of these as the final encoded representation of $x_t$ at the character level, $h^{(t)}_c = [\overrightarrow{h}^{(t)}_l \| \overleftarrow{h}^{(t)}_l]$.

The output of the character encoder is concatenated with a pre-trained word embedding, $m_t = [h^{(t)}_c \| \mathrm{emb}_{word}(x_t)]$, which is used as the input to the word-level encoder. Using learned character embeddings alongside word embeddings has been shown to be useful for learning word-level morphology, as well as for mitigating the loss of representation for out-of-vocabulary words. Similar to the character encoder, we use a BiLSTM to encode the sequence at the word level. The word encoder does not lose resolution, meaning the output at each timestep is the concatenated output of both word LSTMs, $h_t = [\overrightarrow{h}_t \| \overleftarrow{h}_t]$.
2) Decoder and Tagger:
Finally, the concatenated output of the word encoder is used as input to the decoder, along with the label embedding of the previous timestep. During training we use teacher forcing [33] to provide the gold standard label as part of the input.

$o_t = \mathrm{LSTM}(o_{t-1}, [h_t \| \hat{y}_{t-1}])$   (2)
$\hat{y}_t = \mathrm{Softmax}(W o_t + b_s)$   (3)

where $W \in \mathbb{R}^{d \times n}$, $d$ is the number of hidden units in the decoder LSTM, and $n$ is the number of tags. The model is trained in an end-to-end fashion using a standard cross-entropy objective.
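To make the encoder-decoder concrete, the following is a minimal sketch of the tagger. We stress that this is an illustration in PyTorch, not the production implementation (which is built in MXNet, Section III-B); the class and variable names are ours, and the default sizes follow the settings reported in Section III-B.

```python
import torch
import torch.nn as nn

class CharWordTagger(nn.Module):
    """Sketch: character BiLSTM encoder -> word BiLSTM encoder -> LSTM
    decoder conditioned on the previous tag (teacher forcing), Eqs. (1)-(3)."""

    def __init__(self, n_chars, n_words, n_tags, char_dim=25, word_dim=100,
                 tag_dim=50, char_hidden=50, word_hidden=100, dec_hidden=50):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.word_emb = nn.Embedding(n_words, word_dim)   # GloVe-initialized in practice
        self.tag_emb = nn.Embedding(n_tags, tag_dim)
        self.char_lstm = nn.LSTM(char_dim, char_hidden,
                                 bidirectional=True, batch_first=True)
        self.word_lstm = nn.LSTM(word_dim + 2 * char_hidden, word_hidden,
                                 bidirectional=True, batch_first=True)
        self.decoder = nn.LSTMCell(2 * word_hidden + tag_dim, dec_hidden)
        self.out = nn.Linear(dec_hidden, n_tags)          # W o_t + b_s of Eq. (3)

    def forward(self, word_ids, char_ids, gold_tags):
        # word_ids: (B, T); char_ids: (B, T, L); gold_tags: (B, T)
        B, T, L = char_ids.shape
        # h_c^(t) = [fwd_l || bwd_l]: final states of the character BiLSTM
        chars = self.char_emb(char_ids).view(B * T, L, -1)
        _, (h_n, _) = self.char_lstm(chars)
        h_c = torch.cat([h_n[0], h_n[1]], dim=-1).view(B, T, -1)
        # m_t = [h_c^(t) || emb_word(x_t)] feeds the word-level BiLSTM
        h, _ = self.word_lstm(torch.cat([h_c, self.word_emb(word_ids)], dim=-1))
        # Decoder: o_t = LSTM(o_{t-1}, [h_t || y_{t-1}]), teacher-forced
        state, logits = None, []
        prev = self.tag_emb(word_ids.new_zeros(B))        # assumes tag index 0 = <start>
        for t in range(T):
            state = self.decoder(torch.cat([h[:, t], prev], dim=-1), state)
            logits.append(self.out(state[0]))
            prev = self.tag_emb(gold_tags[:, t])          # gold label as next input
        return torch.stack(logits, dim=1)                 # train with cross-entropy
```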
3) Named Entity Recognition Decoder Model:
Our decoder model provides more context to trait detection by adding an additional input: the softmax output from entity extraction. We refer to this architecture as the Conditional Softmax Decoder, shown in Fig. 1 [29]. The model thus learns more about the input as well as the label distribution from the entity extraction prediction. As an example, we use negation only for the problem entity in the i2b2 dataset. Providing the entity prediction distribution helps the negation model make better predictions. The negation model learns that if the prediction probability is not inclined towards the problem entity, then it should not predict negation, irrespective of the word representation.

$\hat{y}^{Entity}_t, \mathrm{SoftOut}^{Entity}_t = \mathrm{Softmax}_{Ent}(W_{Ent} o_t + b_s)$   (4)
$\hat{y}^{Neg}_t = \mathrm{Softmax}_{Neg}(W_{Neg} [o_t, \mathrm{SoftOut}^{Entity}_t] + b_s)$   (5)

where $\mathrm{SoftOut}^{Entity}_t$ is the softmax output of the entity at time step $t$. Readers are referred to [29] for a more detailed discussion of the conditional softmax decoder model.

Fig. 1. Conditional softmax decoder model
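A minimal sketch of Eqs. (4) and (5), again in PyTorch with names of our own choosing: the negation head consumes the decoder state together with the entity softmax distribution.

```python
import torch
import torch.nn as nn

class ConditionalSoftmaxHead(nn.Module):
    """Sketch of Eqs. (4)-(5): negation prediction conditioned on the
    entity softmax output. Sizes are placeholders."""

    def __init__(self, dec_hidden=50, n_entity_tags=13, n_neg_tags=2):
        super().__init__()
        self.w_ent = nn.Linear(dec_hidden, n_entity_tags)
        self.w_neg = nn.Linear(dec_hidden + n_entity_tags, n_neg_tags)

    def forward(self, o_t):
        # Eq. (4): entity distribution, reused as an input feature below
        soft_out_entity = torch.softmax(self.w_ent(o_t), dim=-1)
        # Eq. (5): negation head sees the decoder state AND the entity distribution
        y_neg = torch.softmax(
            self.w_neg(torch.cat([o_t, soft_out_entity], dim=-1)), dim=-1)
        return soft_out_entity, y_neg
```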
B. Relationship Extraction Architecture
The extracted entities are not very meaningful by themselves, especially in the healthcare domain. For instance, it is important to know whether a procedure was performed bilaterally, or on the left or right side. Knowing the correct location results in more accurate and reliable billing and reimbursement. Hence, it is important to identify the relationships among these clinical entities.

The RE model architecture is described in [30], but we reiterate some of the important details here. Relationships are defined between two entities, which we refer to as the head and tail entity. To extract such relationships we proposed relation extraction using explicit context conditioning, where the two target entities (head and tail) can be explicitly connected via a context token, also known as second-order relations. Similar to Bi-affine Relation Attention Networks (BRAN) [24], we first compute the representations for both the head, $e^{head}_i$, and tail, $e^{tail}_i$, entities, which are then passed through two multi-layer perceptrons (MLP-1) to obtain first-order relation scores, $\mathrm{score}^{(1)}(p_{head}, p_{tail})$, as shown in Fig. 2. We also pass $e^{head}_i$ and $e^{tail}_i$ through MLP-2 to obtain second-order relation scores, $\mathrm{score}^{(2)}(p_{head}, p_{tail})$, where $p_{head}$ and $p_{tail}$ are the indices of the head and tail entities. The motivation for adding MLP-2 was the need for representations focused on establishing relations with context tokens, as opposed to first-order relations. In the end, the final score for the relation between two entities is given as a weighted sum of the first- and second-order scores.

Fig. 2. Relationship extraction model
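The following sketch illustrates this scoring scheme under simplifying assumptions of our own: both MLPs feed bi-affine scorers, and second-order scores chain a head-to-context score with a context-to-tail score, taking a max over context positions. See [30] for the exact formulation.

```python
import torch
import torch.nn as nn

class SecondOrderRelationScorer(nn.Module):
    """Sketch: first-order scores from MLP-1 plus second-order scores from
    MLP-2 routed through a context token; the final score is their weighted
    sum. The max-aggregation over context positions is our assumption."""

    def __init__(self, enc_dim=256, mlp_dim=128, n_rels=7, alpha=0.5):
        super().__init__()
        mlp = lambda: nn.Sequential(nn.Linear(enc_dim, mlp_dim), nn.ReLU())
        self.head1, self.tail1 = mlp(), mlp()   # MLP-1: first-order relations
        self.head2, self.tail2 = mlp(), mlp()   # MLP-2: context conditioning
        self.bilinear = nn.Parameter(torch.randn(n_rels, mlp_dim, mlp_dim) * 0.01)
        self.alpha = alpha                      # mixing weight, tuned on dev data

    def biaffine(self, h, t):
        # h, t: (T, d) -> (T, T, R) score for every (head, tail) token pair
        return torch.einsum('id,rde,je->ijr', h, self.bilinear, t)

    def forward(self, enc):
        # enc: (T, enc_dim) token encodings from the transformer encoder
        s1 = self.biaffine(self.head1(enc), self.tail1(enc))   # score^(1)
        s = self.biaffine(self.head2(enc), self.tail2(enc))    # head->ctx, ctx->tail
        # score^(2)[i, j] = max_k (s[i, k] + s[k, j]): chain through context k.
        # Note: materializes a (T, T, T, R) tensor; fine for a sketch only.
        s2 = (s.unsqueeze(2) + s.unsqueeze(0)).amax(dim=1)
        return self.alpha * s1 + (1.0 - self.alpha) * s2       # weighted sum
```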
III. EXPERIMENTS
A. Dataset
We evaluated our model on two datasets. The first is the 2010 i2b2/VA challenge dataset for "test, treatment, problem" (TTP) entity extraction and assertion detection, herein referred to as i2b2. Unfortunately, only part of this dataset was made public after the challenge; therefore we cannot directly compare with the NegEx and ABoW results. We followed the original data split from [34] of 170 notes for training and 256 for testing. The second dataset is proprietary and consists of 4,200 de-identified clinical notes with medical conditions, herein referred to as DCN.

The i2b2 dataset contains six predefined relation types, including TrCP (Treatment Causes Problem), TrIP (Treatment Improves Problem) and TrWP (Treatment Worsens Problem), and one negative relation. The DCN dataset contains seven predefined relationship types, such as with dosage and every, and one negative relation. A summary of the datasets is presented in Table I.

TABLE I
OVERVIEW OF THE I2B2 AND DCN DATASETS

                 i2b2     DCN
Notes            426      4,200
Tokens           416K     1.5M
Entity Tags      13       37
Relations        3,653    270,000
Relation Types   6        7
B. NER Model Settings
Word, character and tag embeddings are 100, 25 and 50 dimensions, respectively. Word embeddings are initialized using GloVe, while character and tag embeddings are learned. The character and word encoders have 50 and 100 hidden units, respectively, while the decoder LSTM has a hidden size of 50. Dropout is used after every LSTM, as well as on the word embedding input. We use Adam as the optimizer. Our model is built using MXNet. Hyperparameters are tuned using Bayesian optimization [35].
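These settings map directly onto the constructor of the tagger sketch from Section II-A; the configuration dictionary below is purely illustrative (key names are ours, not the service's).

```python
# Hypothetical configuration mirroring the reported NER settings; keys
# match the CharWordTagger sketch in Section II-A.
NER_CONFIG = {
    "word_dim": 100,      # word embeddings, initialized from GloVe
    "char_dim": 25,       # character embeddings, learned
    "tag_dim": 50,        # tag embeddings, learned
    "char_hidden": 50,    # character encoder hidden units
    "word_hidden": 100,   # word encoder hidden units
    "dec_hidden": 50,     # decoder LSTM hidden size
}
# e.g. model = CharWordTagger(n_chars, n_words, n_tags, **NER_CONFIG)
# Dropout follows every LSTM and the word-embedding input; optimization
# uses Adam, with hyperparameters tuned via Bayesian optimization [35].
```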
C. RE Model Settings
Our final network had two encoder layers, with 8 attention heads in each multi-head attention sublayer and 256 filters for the convolution layers in the position-wise feedforward sublayer. We used dropout with probability 0.3 after the embedding layer, the head/tail MLPs and the output of each encoder sublayer. We also used word dropout with probability 0.15 before the embedding layer.

IV. RESULTS
A. NER and Trait Detection Results
We report the results for NER and negation detection on both the i2b2 and DCN datasets in Table II. We observe that our proposed conditional softmax decoder approach outperforms the best model [34] on the i2b2 challenge. We compare our models for negation detection against NegEx [13] and ABoW [19], which has the best results for the negation detection task on the i2b2 dataset. The conditional softmax decoder model outperforms both NegEx and ABoW (Table II). The low performance of NegEx and ABoW is mainly attributed to the fact that they use ontology look-up to index findings and negation regular expression search within a fixed scope. A similar trend was observed on the medical condition (DCN) dataset (Table II). An important thing to note is the low F1 score for NegEx, which can primarily be attributed to abbreviations and misspellings in clinical notes that cannot be handled well by rule-based systems.
TABLE II
TEST SET PERFORMANCE WITH MULTI-TASK MODEL ON I2B2 AND DCN DATASETS

Task          Data   Model                 Precision   Recall   F1
Named Entity  i2b2   LSTM:CRF [34]         0.844       0.834    0.839
                     Conditional Decoder   –           –        –
              DCN    LSTM:CRF [34]         0.82        0.84     0.83
                     Conditional Decoder   –           –        –
Negation      i2b2   NegEx [13]            0.896       0.799    0.845
                     ABoW Kernel [19]      0.899       0.900    0.900
                     Conditional Decoder   –           –        –
              DCN    NegEx [13]            0.403       0.932    0.563
                     Conditional Decoder   –           –        –
We also evaluated the conditional softmax decoder in low-resource settings, where we used a sample of our training data. We observed that the conditional decoder is more robust in low-resource settings than other approaches, as we reported in [29].
B. RE Results
To show the benefits of using second-order relations we compared our model's performance to BRAN. The two models differ in the weighted addition of second-order relation scores. We tuned this weight parameter on the dev set and observed an improvement in macro-F1 score from 0.712 to 0.734 on the DCN data and from 0.395 to 0.407 on the i2b2 data. For further comparison, a recently published model called the Hybrid Deep Learning Approach (HDLA) [36] reported a macro-F1 score of 0.388 on the same i2b2 dataset. It should be mentioned that HDLA used syntactic parsers for feature extraction, whereas we do not use any such external tools.

Table III summarizes the performance of our relationship model using second-order relations (+SOR) compared to BRAN and HDLA. We refer the readers to [30] for a more detailed analysis of our relationship extraction model.
TABLE III
TEST SET PERFORMANCE OF RELATION EXTRACTION ON I2B2 AND DCN DATASETS

Data   Model       Precision   Recall   F1
i2b2   HDLA [36]   0.378       0.422    0.388
       BRAN [24]   0.396       0.403    0.395
       +SOR        0.424       0.419    0.407
DCN    BRAN [24]   0.614       0.85     0.712
       +SOR        0.643       0.879    0.734
V. IMPLEMENTATION

Comprehend Medical APIs run in Amazon's proven, high-availability data centers, with the service stack replicated across three facilities in each AWS region to provide fault tolerance in the event of a server failure or Availability Zone outage. Additionally, Comprehend Medical ensures that system artifacts are encrypted in transit, and user data is passed through and not stored in any part of the system.

Comprehend Medical is available through a Graphical User Interface (GUI) within the AWS console and can be accessed using the Java and Python SDKs. Comprehend Medical offers two APIs: 1) the NERe API, which returns all the extracted named entities, their traits and the relationships between them, and 2) the PHId API, which returns just the protected health information contained in the text. Developers can easily integrate Comprehend Medical into their data processing pipelines, as shown in Fig. 4.

The only input needed by Comprehend Medical is the text to be analyzed. No configuration, customization or other parameters are needed, making Comprehend Medical easy to use by anyone who has access to AWS. Comprehend Medical outputs the results in JavaScript Object Notation (JSON), which contains the named entities, begin offsets, end offsets, traits, confidence scores and the relationships between the entities. Using the GUI (Fig. 3), users can quickly visualize their results.

Fig. 3. Rendering of entities, traits and relations by Comprehend Medical UI
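As a concrete illustration, the snippet below sketches how the two APIs can be called from the Python SDK (boto3). The operation names detect_entities and detect_phi correspond to the NERe and PHId APIs, respectively; the sample text and region are placeholders, and credentials are assumed to come from the standard AWS configuration.

```python
import json
import boto3

client = boto3.client("comprehendmedical", region_name="us-east-1")
text = "Patient denies chest pain. Started on aspirin 81 mg daily."

# NERe API: entities, traits and relationships
ner = client.detect_entities(Text=text)
for entity in ner["Entities"]:
    print(entity["Text"], entity["Category"], entity["Type"], entity["Score"])

# PHId API: protected health information only
phi = client.detect_phi(Text=text)
print(json.dumps(phi["Entities"], indent=2))
```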
VI. ENTITIES, TRAITS AND RELATIONSHIPS
A. Entities
Named entity mentions found in narrative notes are tagged with the entity types listed in Table IV. The entities are divided into five categories: Anatomy, Medical Condition, Medication, PHI and TTP. Comprehend Medical is HIPAA eligible and therefore supports HIPAA identifiers. Some of those identifiers are grouped under one identifier. For instance, Contact Point covers phone and fax numbers, and ID covers social security number, medical record number, account number, certificate or license number and vehicle or device number. An example input text is shown in Fig. 3.
TABLE IV
ENTITIES EXTRACTED BY COMPREHEND MEDICAL

Category           Entity
Anatomy            Direction, System Organ Site
Medical Condition  Dx Name, Acuity
Medication         Brand Name, Generic Name, Dosage, Duration,
                   Frequency, Form, Route or Mode, Strength, Rate
PHI                Age, Date, Name, Contact Point, Email, URL,
                   Identifier, Address, Profession
TTP                Test Name, Test Value, Test Unit,
                   Procedure Name, Treatment Name
B. Traits
Comprehend Medical covers four traits, listed in Table V. Negation asserts the presence or absence of a Dx Name and whether or not the individual is taking the medication. Dx Name has three additional traits: Diagnosis, Sign and Symptom. Diagnosis identifies an illness or a disease. A Sign is objective evidence of disease, a phenomenon detected by a physician or a nurse. A Symptom is subjective evidence of disease, a phenomenon observed by the individual affected by the disease. An example of traits is shown in Fig. 3.
TABLE V
TRAITS EXTRACTED BY COMPREHEND MEDICAL

Trait      Entity
Negation   Brand/Generic Name, Dx Name
Diagnosis  Dx Name
Sign       Dx Name
Symptom    Dx Name
C. Relationships
A relationship is defined between a pair of entities in the Medication and TTP categories (Table VI). One of the entities in a relationship is the head entity while the other is the tail entity. In Medication, Generic Name and Brand Name are head entities, which can have relationships to tail entities such as Strength and Dosage. An example of relations is shown in Fig. 3.
TABLE VI
RELATIONSHIPS EXTRACTED BY COMPREHEND MEDICAL

Head Entity         Tail Entity
Brand/Generic Name  Dosage, Duration, Frequency, Form,
                    Route or Mode, Strength
Test Name           Test Value, Test Unit
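To illustrate how these relationships surface in the service output, the sketch below walks the NERe JSON response and prints each head entity with its traits and related tail attributes. Field names (Entities, Traits, Attributes) follow the response format described in Section V; treat the snippet as illustrative rather than a definitive client.

```python
import boto3

client = boto3.client("comprehendmedical")
response = client.detect_entities(Text="Metformin 500 mg PO twice daily.")

for entity in response["Entities"]:
    traits = [t["Name"] for t in entity.get("Traits", [])]
    print(f"head: {entity['Text']} ({entity['Type']}), traits: {traits}")
    # Tail entities related to this head entity (e.g. Strength and Frequency
    # attached to a Generic Name) appear as nested Attributes.
    for attr in entity.get("Attributes", []):
        print(f"  -> {attr['Type']}: {attr['Text']} (score {attr['Score']:.2f})")
```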
VII. USE CASES

Comprehend Medical reduces the cost, time and effort of processing large amounts of unstructured medical text with high accuracy, making it possible to pursue use cases such as clinical trial management, clinical decision support and revenue cycle management.

Fig. 4. Integrating Comprehend Medical into a data processing pipeline
A. Clinical Trial Management
It can take about 10-15 years for a treatment to be developed from discovery to registration with the Food and Drug Administration (FDA). During that time, research organizations can spend six years on clinical trials. Despite the number of years it takes to design those clinical trials, 90% of all clinical trials fail to enroll patients within the targeted time and are forced to extend the enrollment period, 75% of trials fail to enroll the targeted number of patients and 27% fail to enroll any subjects [37].

Life sciences and clinical research organizations can speed up and optimize the process of recruiting patients into a clinical trial, as extractions from unstructured text and medical records can expedite the matching process. For instance, indexing patients based on medication, medical condition and treatments can help quickly identify the right participants for a lifesaving clinical trial.

Fred Hutchinson Cancer Research Center (FHCRC) utilized Comprehend Medical in their clinical trial management. FHCRC was spending 1.5 hours to annotate a single patient note and about 2.5 hours on manual chart abstraction per patient, and per day they could process charts for about three patients. By using Comprehend Medical, FHCRC was able to annotate 9,642 patient notes per hour.
B. Patient and Population Health Analytics
Population health focuses on the discovery of factors and conditions for the health of a population over time. It aims at identifying patterns of occurrence and knowledge discovery in order to develop policies and actions to improve the health of a group or population [38].

Examples of population health analytics include patient stratification, readmission prediction and mortality measurement. Automatically unlocking important information from narrative text is invaluable to organizations participating in value-based healthcare and population health. Structured medical records do not fully identify patients with a medical history of diabetes, which results in an underestimation of disease prevalence [39]. The inability to identify patient cohorts from structured data represents a problem for the development of population health and clinical management systems. It also negatively affects the accuracy of identifying high-risk and high-cost patients [40]. Ref. [41] identified three areas that may have an impact on readmission but are poorly documented in the EMR system, hence the need for NLP-based solutions to extract such information. Also, some symptoms and illness characteristics that are necessary to develop reliable predictors are missing in the coded billing data [42]. Ref. [43] performed mortality prediction and reported a 2% increase in the Area Under the Curve when using features from both structured data and concepts extracted from narrative notes, and [44] found that the predictive power of suicide risk factors found in EMR systems becomes asymptotic, leading them to incorporate analysis of clinical notes to predict the risk of suicide.

As seen from the examples above, NLP-based approaches can assist in identifying concepts that are incorrectly codified or missing in EMR systems. Population health platforms can expand their risk analytics to leverage unstructured clinical data for the prediction of high-risk patients and for epidemiologic studies on disease outbreaks.
C. Revenue Cycle Management
In healthcare, Revenue Cycle Management (RCM) is the process of collecting revenue and tracking claims from healthcare providers, including hospitals, outpatient clinics, nursing homes, dentist clinics and physician groups [45].

The RCM process has been inefficient, as most healthcare systems use rule-based approaches and manual audits of documents for billing and coding purposes [46]. Rule-based systems are time consuming, expensive to maintain, and require attention and frequent human intervention. Due to these ineffective processes, data coded at the point of care, which is the source of claims data, can contain errors and inconsistencies.

Coding is the process of encoding the details of patient encounters into standardized terminology [38]. A study by [47] found 48 errors in 38 of the 106 finished consultant episodes in urology, with 71% of these errors caused by inaccurate coding. Ref. [48] measured the consistency of coded data and found that some of these errors were significant enough to change the diagnostic related group.

RCM companies can use Comprehend Medical to enhance existing workflows around computer-assisted coding and to validate codes submitted by providers. In addition, claim audits, which often require finding text evidence for submitted claims and are done manually, could be done more accurately and faster.
D. Pharmacovigilance
The aim of pharmacovigilance is to monitor, detect and prevent adverse drug events (ADE) of medical drugs. An early system used for pharmacovigilance is the spontaneous reporting system (SRS), which provided safety information on drugs [49]. However, SRS databases are incomplete, inaccurate and contain biased reporting [49], [50]. A newer generation of databases was created that contains clinical information for large patient populations, such as the Intensive Medicines Monitoring Program (IMMP) and the General Practice Research Database (GPRD). Such databases include data from structured fields and forms, but only a very small amount of detail is stored in the structured fields. Researchers then started to look into EHR data for pharmacovigilance. However, the most valuable information in patient records is contained in the unstructured text.

Using NLP to extract information from narrative text has shown improvement in ADE detection and pharmacovigilance [51]. Refs. [50], [52] also reported that ADEs are underreported in EHR systems, and they used NLP techniques to enhance ADE detection.

VIII. CONCLUSION
Studies have shown that narrative notes are more expressive, more engaging and capture the patient's story more accurately than structured EHR data. They also contain more naturalistic prose, are more reliable in identifying patients with a given disease and are more understandable to healthcare providers reviewing those notes, which motivates the need for a more accurate, intuitive and easy-to-use NLP system. In this paper we presented Comprehend Medical, a HIPAA eligible Amazon Web Service for medical named entity recognition and relationship extraction. Comprehend Medical supports several entity types divided into five categories (Anatomy, Medical Condition, Medication, Protected Health Information, and Treatment, Test and Procedure) and four traits (Negation, Diagnosis, Sign, Symptom). Comprehend Medical uses state-of-the-art deep learning models and provides two APIs, the NERe and PHId APIs. Comprehend Medical also comes with four different interfaces (CLI, Java SDK, Python SDK and GUI), and contrary to many other existing clinical NLP systems, it does not require dependencies, configuration or pipelined component customization.

REFERENCES
[1] S. T. Rosenbloom, J. C. Denny, H. Xu, N. Lorenzi, W. W. Stead, and K. B. Johnson, "Data from clinical notes: a perspective on the tension between structure and flexible documentation," Journal of the American Medical Informatics Association, vol. 18, no. 2, pp. 181–186, Mar. 2011.
[2] K. M. Fox, M. Reuland, W. G. Hawkes, J. R. Hebel, J. Hudson, S. I. Zimmerman, J. Kenzora, and J. Magaziner, "Accuracy of medical records in hip fracture," Journal of the American Geriatrics Society, vol. 46, no. 6, pp. 745–750, Jun. 1998.
[3] K. A. Marill, E. S. Gauharou, B. K. Nelson, M. A. Peterson, R. L. Curtis, and M. R. Gonzalez, "Prospective, randomized trial of template-assisted versus undirected written recording of physician records in the emergency department," Annals of Emergency Medicine, vol. 33, no. 5, pp. 500–509, May 1999.
[4] A. M. van Ginneken, "The physician's flexible narrative," Methods of Information in Medicine, vol. 35, no. 2, pp. 98–100, Jun. 1996.
[5] A. J. Cawsey, B. L. Webber, and R. B. Jones, "Natural language generation in health care," Journal of the American Medical Informatics Association: JAMIA, vol. 4, no. 6, pp. 473–482, 1997.
[6] G. K. Savova, J. J. Masanz, P. V. Ogren, J. Zheng, S. Sohn, K. C. Kipper-Schuler, and C. G. Chute, "Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications," Journal of the American Medical Informatics Association: JAMIA, vol. 17, no. 5, pp. 507–513, Jan. 2010.
[7] D. Ferrucci and A. Lally, "UIMA: an architectural approach to unstructured information processing in the corporate research environment," Natural Language Engineering, vol. 10, no. 3-4, pp. 327–348, Sep. 2004.
[8] J. Baldridge, "The Apache OpenNLP Project," URL: https://opennlp.apache.org/, 2005.
[9] O. Bodenreider, "The Unified Medical Language System (UMLS): integrating biomedical terminology," Nucleic Acids Research, vol. 32, no. 90001, pp. 267D–270, Jan. 2004.
[10] A. R. Aronson and F.-M. Lang, "An overview of MetaMap: historical perspective and recent advances," Journal of the American Medical Informatics Association: JAMIA, vol. 17, no. 3, pp. 229–236, Jan. 2010.
[11] D. Demner-Fushman, W. J. Rogers, and A. R. Aronson, "MetaMap Lite: an evaluation of a new Java implementation of MetaMap," Journal of the American Medical Informatics Association, vol. 24, no. 4, p. ocw177, Jan. 2017.
[12] H. Harkema, J. N. Dowling, and T. Thornblade, "ConText: An algorithm for determining negation, experiencer, and temporal status from clinical reports," Journal of Biomedical Informatics, vol. 42, no. 5, pp. 839–851, Oct. 2009.
[13] W. W. Chapman, W. Bridewell, P. Hanbury, G. F. Cooper, and B. G. Buchanan, "A Simple Algorithm for Identifying Negated Findings and Diseases in Discharge Summaries," Journal of Biomedical Informatics, vol. 34, no. 5, pp. 301–310, Oct. 2001.
[14] E. Soysal, J. Wang, M. Jiang, Y. Wu, S. Pakhomov, H. Liu, and H. Xu, "CLAMP: a toolkit for efficiently building customized clinical natural language processing pipelines," Journal of the American Medical Informatics Association, vol. 25, no. 3, pp. 331–336, Mar. 2018.
[15] J. Lafferty, A. McCallum, and F. C. N. Pereira, "Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data," in Proceedings of the 18th International Conference on Machine Learning, vol. 951. Citeseer, 2001, pp. 282–289.
[16] F. Fancellu, A. Lopez, and B. Webber, "Neural networks for negation scope detection," in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), vol. 1, 2016, pp. 495–504.
[17] L. Rumeng, N. Jagannatha Abhyuday, and Y. Hong, "A hybrid neural network model for joint prediction of presence and period assertions of medical events in clinical notes," in AMIA Annual Symposium Proceedings, vol. 2017. American Medical Informatics Association, 2017, p. 1149.
[18] B. de Bruijn, C. Cherry, S. Kiritchenko, J. Martin, and X. Zhu, "Machine-learned solutions for three stages of clinical information extraction: the state of the art at i2b2 2010," Journal of the American Medical Informatics Association, vol. 18, no. 5, pp. 557–562, 2011.
[19] C. Shivade, M.-C. de Marneffe, E. Fosler-Lussier, and A. M. Lai, "Extending NegEx with kernel methods for negation detection in clinical text," in Proceedings of the Second Workshop on Extra-Propositional Aspects of Meaning in Computational Semantics (ExProM 2015), 2015, pp. 41–46.
[20] K. Cheng, T. Baldwin, and K. Verspoor, "Automatic negation and speculation detection in veterinary clinical text," in Proceedings of the Australasian Language Technology Association Workshop 2017, 2017, pp. 70–78.
[21] M. Miwa and M. Bansal, "End-to-end relation extraction using LSTMs on sequences and tree structures," in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), vol. 1, 2016, pp. 1105–1116.
[22] S. Zheng, F. Wang, H. Bao, Y. Hao, P. Zhou, and B. Xu, "Joint extraction of entities and relations based on a novel tagging scheme," in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Vancouver, Canada: Association for Computational Linguistics, Jul. 2017, pp. 1227–1236.
[23] H. Adel and H. Schütze, "Global normalization of convolutional neural networks for joint entity and relation classification," in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Copenhagen, Denmark: Association for Computational Linguistics, Sep. 2017, pp. 1723–1729.
[24] P. Verga, E. Strubell, and A. McCallum, "Simultaneously self-attending to all mentions for full-abstract biological relation extraction," in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). New Orleans, Louisiana: Association for Computational Linguistics, Jun. 2018, pp. 872–884.
[25] F. Christopoulou, M. Miwa, and S. Ananiadou, "A walk-based model on entity graphs for relation extraction," in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Melbourne, Australia: Association for Computational Linguistics, Jul. 2018, pp. 81–88.
[26] Y. Su, H. Liu, S. Yavuz, I. Gur, H. Sun, and X. Yan, "Global relation embedding for relation extraction," in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). New Orleans, Louisiana: Association for Computational Linguistics, Jun. 2018, pp. 820–830.
[27] Y. Wang, L. Wang, M. Rastegar-Mojarad, S. Moon, F. Shen, N. Afzal, S. Liu, Y. Zeng, S. Mehrabi, S. Sohn, and H. Liu, "Clinical information extraction applications: A literature review," Journal of Biomedical Informatics, vol. 77, pp. 34–49, Jan. 2018.
[28] P. Bhatia, K. Arumae, and E. B. Celikkaya, "Dynamic transfer learning for named entity recognition," in International Workshop on Health Intelligence. Springer, 2019, pp. 69–81.
[29] P. Bhatia, B. Celikkaya, and M. Khalilia, "Joint Entity Extraction and Assertion Detection for Clinical Text," in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence, Italy: Association for Computational Linguistics, 2019, pp. 954–959.
[30] G. Singh and P. Bhatia, "Relation Extraction using Explicit Context Conditioning," in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Minneapolis, Minnesota, USA: Association for Computational Linguistics, 2019, pp. 1442–1447.
[31] G. Lample, M. Ballesteros, S. Subramanian, K. Kawakami, and C. Dyer, "Neural architectures for named entity recognition," in Proceedings of NAACL-HLT, 2016, pp. 260–270.
[32] Z. Yang, R. Salakhutdinov, and W. Cohen, "Multi-task cross-lingual sequence tagging from scratch," arXiv preprint arXiv:1603.06270, 2016.
[33] R. J. Williams and D. Zipser, "A Learning Algorithm for Continually Running Fully Recurrent Neural Networks," Neural Computation, vol. 1, no. 2, pp. 270–280, Jun. 1989.
[34] R. Chalapathy, E. Z. Borzeshi, and M. Piccardi, "Bidirectional LSTM-CRF for Clinical Concept Extraction," in Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP), Osaka, Japan, 2016, pp. 7–12.
[35] J. Snoek, H. Larochelle, and R. P. Adams, "Practical Bayesian optimization of machine learning algorithms," in Advances in Neural Information Processing Systems, 2012, pp. 2951–2959.
[36] V. R. Chikka and K. Karlapalem, "A hybrid deep learning approach for medical relation extraction," arXiv preprint arXiv:1806.11189, 2018.
[37] R. B. Giffin, Y. Lebovitz, R. A. English, and others, Transforming Clinical Research in the United States: Challenges and Opportunities: Workshop Summary. National Academies Press, 2010.
[38] K. Giannangelo and S. Fenton, "EHR's effect on the revenue cycle management coding function," Journal of Healthcare Information Management: JHIM, vol. 22, no. 1, pp. 26–30, 2008.
[39] L. Zheng, Y. Wang, S. Hao, A. Y. Shin, B. Jin, A. D. Ngo, M. S. Jackson-Browne, D. J. Feller, T. Fu, K. Zhang, X. Zhou, C. Zhu, D. Dai, Y. Yu, G. Zheng, Y.-M. Li, D. B. McElhinney, D. S. Culver, S. T. Alfreds, F. Stearns, K. G. Sylvester, E. Widen, and X. B. Ling, "Web-based Real-Time Case Finding for the Population Health Management of Patients With Diabetes Mellitus: A Prospective Validation of the Natural Language Processing-Based Algorithm With Statewide Electronic Medical Records," JMIR Medical Informatics, vol. 4, no. 4, p. e37, Nov. 2016.
[40] D. W. Bates, S. Saria, L. Ohno-Machado, A. Shah, and G. Escobar, "Big Data In Health Care: Using Analytics To Identify And Manage High-Risk And High-Cost Patients," Health Affairs, vol. 33, no. 7, pp. 1123–1131, Jul. 2014.
[41] J. L. Greenwald, P. R. Cronin, V. Carballo, G. Danaei, and G. Choy, "A Novel Model for Predicting Rehospitalization Risk Incorporating Physical Function, Cognitive Status, and Psychosocial Support Using Natural Language Processing," Medical Care, vol. 55, no. 3, pp. 261–266, Mar. 2017.
[42] A. Rumshisky, M. Ghassemi, T. Naumann, P. Szolovits, V. M. Castro, T. H. McCoy, and R. H. Perlis, "Predicting early psychiatric readmission with natural language processing of narrative discharge summaries," Translational Psychiatry, vol. 6, no. 10, pp. e921–e921, Oct. 2016.
[43] M. Jin, M. T. Bahadori, A. Colak, P. Bhatia, B. Celikkaya, R. Bhakta, S. Senthivel, M. Khalilia, D. Navarro, B. Zhang, T. Doman, A. Ravi, M. Liger, and T. Kass-Hout, "Improving Hospital Mortality Prediction with Medical Named Entities and Multimodal Learning," Neural Information Processing Systems Workshop on Machine Learning for Health, 2018.
[44] C. Poulin, B. Shiner, P. Thompson, L. Vepstas, Y. Young-Xu, B. Goertzel, B. Watts, L. Flashman, and T. McAllister, "Predicting the Risk of Suicide by Analyzing the Text of Clinical Notes," PLoS ONE, vol. 9, no. 1, p. e85733, Jan. 2014.
[45] V. Mindel and L. Mathiassen, "Contextualist inquiry into IT-enabled hospital revenue cycle management: bridging research and practice," Journal of the Association for Information Systems, vol. 16, no. 12, p. 1016, 2015.
[46] P. Schouten, "Big data in health care: solving provider revenue leakage with advanced analytics," Healthcare Financial Management, vol. 67, no. 2, pp. 40–43, Feb. 2013.
[47] A. Ballaro, S. Oliver, and M. Emberton, "Do we do what they say we do? Coding errors in urology," BJU International, vol. 85, no. 4, pp. 389–391, Mar. 2000.
[48] D. P. Lorence and I. A. Ibrahim, "Benchmarking variation in coding accuracy across the United States," Journal of Health Care Finance, vol. 29, no. 4, pp. 29–42, 2003.
[49] X. Wang, G. Hripcsak, M. Markatou, and C. Friedman, "Active Computerized Pharmacovigilance Using Natural Language Processing, Statistics, and Electronic Health Records: A Feasibility Study," Journal of the American Medical Informatics Association, vol. 16, no. 3, pp. 328–337, May 2009.
[50] A. Henriksson, M. Kvist, H. Dalianis, and M. Duneld, "Identifying adverse drug event information in clinical notes with distributional semantic representations of context," Journal of Biomedical Informatics, vol. 57, pp. 333–349, Oct. 2015.
[51] Y. Luo, W. K. Thompson, T. M. Herr, Z. Zeng, M. A. Berendsen, S. R. Jonnalagadda, M. B. Carson, and J. Starren, "Natural Language Processing for EHR-Based Pharmacovigilance: A Structured Review," Drug Safety, vol. 40, no. 11, pp. 1075–1089, Nov. 2017.
[52] N. Shang, H. Xu, T. C. Rindflesch, and T. Cohen, "Identifying plausible adverse drug reactions using knowledge extracted from the literature,"