Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Chaitanya Shivade is active.

Publication


Featured research published by Chaitanya Shivade.


Journal of the American Medical Informatics Association | 2014

A review of approaches to identifying patient phenotype cohorts using electronic health records

Chaitanya Shivade; Preethi Raghavan; Eric Fosler-Lussier; Peter J. Embi; Noémie Elhadad; Stephen B. Johnson; Albert M. Lai

Objective To summarize literature describing approaches aimed at automatically identifying patients with a common phenotype. Materials and methods We performed a review of studies describing systems or reporting techniques developed for identifying cohorts of patients with specific phenotypes. Every full text article published in (1) Journal of American Medical Informatics Association, (2) Journal of Biomedical Informatics, (3) Proceedings of the Annual American Medical Informatics Association Symposium, and (4) Proceedings of Clinical Research Informatics Conference within the past 3 years was assessed for inclusion in the review. Only articles using automated techniques were included. Results Ninety-seven articles met our inclusion criteria. Forty-six used natural language processing (NLP)-based techniques, 24 described rule-based systems, 41 used statistical analyses, data mining, or machine learning techniques, while 22 described hybrid systems. Nine articles described the architecture of large-scale systems developed for determining cohort eligibility of patients. Discussion We observe that there is a rise in the number of studies associated with cohort identification using electronic medical records. Statistical analyses or machine learning, followed by NLP techniques, are gaining popularity over the years in comparison with rule-based systems. Conclusions There are a variety of approaches for classifying patients into a particular phenotype. Different techniques and data sources are used, and good performance is reported on datasets at respective institutions. However, no system makes comprehensive use of electronic medical records addressing all of their known weaknesses.


ACM Symposium on Applied Computing | 2012

Towards building large-scale distributed systems for twitter sentiment analysis

Vinh Ngoc Khuc; Chaitanya Shivade; Rajiv Ramnath; Jay Ramanathan

In recent years, social networks have become very popular. Twitter, a micro-blogging service, is estimated to have about 200 million registered users, and these users create approximately 65 million tweets a day. Twitter users usually express opinions about topics of interest to them. The challenge is that each tweet is limited to 140 characters and is hence very short. It may contain slang and misspelled words. Thus, it is difficult to apply traditional NLP techniques, which are designed for formal language, to the Twitter domain. Another challenge is that the total volume of tweets is extremely high and takes a long time to process. In this paper, we describe a large-scale distributed system for real-time Twitter sentiment analysis. Our system consists of two components: a lexicon builder and a sentiment classifier. These two components are capable of running on a large-scale distributed system since they are implemented using a MapReduce framework and a distributed database model. Thus, our lexicon builder and sentiment classifier scale with the number of machines and the size of the data. The experiments also show that our lexicon is of good quality for opinion extraction, and that the accuracy of the sentiment classifier can be improved by combining the lexicon with a machine learning technique.
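
To make the lexicon half of such a pipeline concrete, below is a minimal, single-machine sketch of lexicon-based tweet scoring. The seed lexicon, tokenizer, and example tweets are illustrative assumptions; the paper's system builds its lexicon automatically with MapReduce and a distributed database, which this sketch does not reproduce.

```python
import re

# Illustrative seed lexicon; the paper builds its lexicon automatically at
# scale with MapReduce, which this sketch does not attempt.
SEED_LEXICON = {"good": 1.0, "great": 1.5, "love": 1.2,
                "bad": -1.0, "terrible": -1.5, "hate": -1.2}

def tokenize(tweet):
    """Lowercase, strip URLs and @mentions, and split on non-letters."""
    tweet = re.sub(r"https?://\S+|@\w+", " ", tweet.lower())
    return re.findall(r"[a-z']+", tweet)

def lexicon_score(tweet):
    """Sum lexicon weights over tokens; the sign gives the predicted polarity."""
    return sum(SEED_LEXICON.get(tok, 0.0) for tok in tokenize(tweet))

if __name__ == "__main__":
    print(lexicon_score("I love this phone, battery life is great!"))   # > 0
    print(lexicon_score("terrible service, I hate waiting @airline"))   # < 0
```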


BMC Medical Informatics and Decision Making | 2014

Diagnosis-specific readmission risk prediction using electronic health data: a retrospective cohort study

Courtney Hebert; Chaitanya Shivade; Randi E. Foraker; Jared R. Wasserman; Caryn Roth; Hagop S. Mekhjian; Stanley Lemeshow; Peter J. Embi

Background Readmissions after hospital discharge are a common occurrence and are costly for both hospitals and patients. Previous attempts to create universal risk prediction models for readmission have not met with success. In this study we leveraged a comprehensive electronic health record to create readmission-risk models that were institution- and patient-specific in an attempt to improve our ability to predict readmission. Methods This is a retrospective cohort study performed at a large midwestern tertiary care medical center. All patients with a primary discharge diagnosis of congestive heart failure, acute myocardial infarction or pneumonia over a two-year time period were included in the analysis. The main outcome was 30-day readmission. Demographic, comorbidity, laboratory, and medication data were collected on all patients from a comprehensive information warehouse. Using multivariable analysis with stepwise removal we created three disease-specific risk prediction models and a combined model. These models were then validated on separate cohorts. Results 3572 patients were included in the derivation cohort. Overall there was a 16.2% readmission rate. The acute myocardial infarction and pneumonia readmission-risk models performed well on a random sample validation cohort (AUC range 0.73 to 0.76) but less well on a historical validation cohort (AUC 0.66 for both). The congestive heart failure model performed poorly on both validation cohorts (AUC 0.63 and 0.64). Conclusions The readmission-risk models for acute myocardial infarction and pneumonia validated well on a contemporary cohort, but not as well on a historical cohort, suggesting that models such as these need to be continuously trained and adjusted to respond to local trends. The poor performance of the congestive heart failure model may suggest that for chronic disease conditions social and behavioral variables are of greater importance, and improved documentation of these variables within the electronic health record should be encouraged.
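
As a rough illustration of the modeling step, the sketch below fits a logistic-regression readmission model and reports AUC on a held-out split using scikit-learn. The file name chf_cohort.csv, the feature columns, and the outcome column are hypothetical placeholders; the study's stepwise variable selection and its separate historical validation cohort are not reproduced here.

```python
# Minimal sketch: fit a readmission-risk model and check AUC on a held-out
# cohort. Feature names and the CSV layout are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("chf_cohort.csv")          # hypothetical derivation data
X = df[["age", "num_comorbidities", "creatinine", "num_meds"]]
y = df["readmitted_30d"]

X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
auc = roc_auc_score(y_valid, model.predict_proba(X_valid)[:, 1])
print(f"Validation AUC: {auc:.2f}")
```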


Journal of Biomedical Informatics | 2015

Comparison of UMLS terminologies to identify risk of heart disease using clinical notes

Chaitanya Shivade; Pranav Malewadkar; Eric Fosler-Lussier; Albert M. Lai

The second track of the 2014 i2b2 challenge asked participants to automatically identify risk factors for heart disease among diabetic patients using natural language processing techniques for clinical notes. This paper describes a rule-based system developed using a combination of regular expressions, concepts from the Unified Medical Language System (UMLS), and freely-available resources from the community. With a performance (F1=90.7) that is significantly higher than the median (F1=87.20) and close to the top performing system (F1=92.8), it was the best rule-based system of all the submissions in the challenge. We also used this system to evaluate the utility of different terminologies in the UMLS towards the challenge task. Of the 155 terminologies in the UMLS, 129 (76.78%) have no representation in the corpus. The Consumer Health Vocabulary had very good coverage of relevant concepts and was the most useful terminology for the challenge task. While segmenting notes into sections and lists has a significant impact on performance, identifying negations and the experiencer of the medical event results in negligible gain.
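
The following sketch shows the general shape of dictionary- and regex-based concept spotting of the kind described, assuming a small hand-written mapping from surface forms to risk-factor labels. The patterns and example note are illustrative; the actual system draws its vocabulary from UMLS terminologies and handles sections, lists, and many more concepts.

```python
import re

# Hypothetical surface forms mapped to risk-factor labels; in the paper these
# mappings come from UMLS terminologies such as the Consumer Health Vocabulary.
RISK_FACTOR_PATTERNS = {
    "HYPERTENSION": r"\b(hypertension|htn|high blood pressure)\b",
    "DIABETES":     r"\b(diabetes( mellitus)?|dm type [12]|t2dm)\b",
    "SMOKER":       r"\b(smok(er|ing)|tobacco use)\b",
}

def find_risk_factors(note):
    """Return the set of risk-factor labels whose pattern matches the note."""
    text = note.lower()
    return {label for label, pattern in RISK_FACTOR_PATTERNS.items()
            if re.search(pattern, text)}

print(find_risk_factors("Pt with HTN and T2DM, former smoker."))
```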


Journal of Biomedical Informatics | 2015

Textual inference for eligibility criteria resolution in clinical trials

Chaitanya Shivade; Courtney Hebert; Marcelo A. Lopetegui; Marie-Catherine de Marneffe; Eric Fosler-Lussier; Albert M. Lai

Clinical trials are essential for determining whether new interventions are effective. In order to determine the eligibility of patients to enroll into these trials, clinical trial coordinators often perform a manual review of clinical notes in the electronic health record of patients. This is a very time-consuming and exhausting task. Efforts in this process can be expedited if these coordinators are directed toward specific parts of the text that are relevant for eligibility determination. In this study, we describe the creation of a dataset that can be used to evaluate automated methods capable of identifying sentences in a note that are relevant for screening a patient's eligibility in clinical trials. Using this dataset, we also present results for four simple methods in natural language processing that can be used to automate this task. We found that this is a challenging task (maximum F-score=26.25), but it is a promising direction for further research.
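
One of the simple baselines such a task invites is a bag-of-words sentence classifier; the sketch below scores note sentences for relevance using TF-IDF features and logistic regression. The training sentences, labels, and the anticoagulation criterion are invented placeholders, not the paper's dataset or methods.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented example sentences; 1 = relevant to an anticoagulation criterion.
train_sentences = [
    "Patient is on warfarin for atrial fibrillation.",
    "Denies any history of anticoagulant use.",
    "Family history notable for colon cancer.",
    "The patient ambulates without assistance.",
]
train_labels = [1, 1, 0, 0]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(train_sentences, train_labels)

note_sentences = ["Currently taking apixaban 5 mg twice daily.",
                  "Lives at home with spouse."]
for sent, prob in zip(note_sentences, clf.predict_proba(note_sentences)[:, 1]):
    print(f"{prob:.2f}  {sent}")   # higher score = more likely relevant
```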


Meeting of the Association for Computational Linguistics | 2016

Addressing Limited Data for Textual Entailment Across Domains

Chaitanya Shivade; Preethi Raghavan; Siddharth Patwardhan

We seek to address the lack of labeled data (and high cost of annotation) for textual entailment in some domains. To that end, we first create (for experimental purposes) an entailment dataset for the clinical domain, and a highly competitive supervised entailment system, ENT, that is effective (out of the box) on two domains. We then explore self-training and active learning strategies to address the lack of labeled data. With self-training, we successfully exploit unlabeled data to improve over ENT by 15% F-score on the newswire domain, and 13% F-score on clinical data. On the other hand, our active learning experiments demonstrate that we can match (and even beat) ENT using only 6.6% of the training data in the clinical domain, and only 5.8% of the training data in the newswire domain.
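
The self-training idea can be sketched in a few lines: pseudo-label the unlabeled examples the current model is most confident about, add them to the training set, and retrain. The base classifier (TF-IDF plus logistic regression), the confidence threshold, and the number of rounds below are assumptions for illustration; ENT itself is a far richer supervised entailment system.

```python
import numpy as np
from scipy.sparse import vstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def self_train(labeled_texts, labels, unlabeled_texts, rounds=3, threshold=0.9):
    """Iteratively add confidently pseudo-labeled examples and retrain."""
    vec = TfidfVectorizer()
    X_lab = vec.fit_transform(labeled_texts)
    X_unlab = vec.transform(unlabeled_texts)
    y = np.asarray(labels)
    clf = LogisticRegression(max_iter=1000).fit(X_lab, y)
    for _ in range(rounds):
        proba = clf.predict_proba(X_unlab)
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break
        # Move confidently pseudo-labeled examples into the training set.
        X_lab = vstack([X_lab, X_unlab[confident]])
        y = np.concatenate([y, clf.classes_[proba[confident].argmax(axis=1)]])
        X_unlab = X_unlab[~confident]
        clf = LogisticRegression(max_iter=1000).fit(X_lab, y)
    return clf, vec
```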


Proceedings of the 5th International Workshop on Health Text Mining and Information Analysis (Louhi) | 2014

Precise Medication Extraction using Agile Text Mining

Chaitanya Shivade; James Cormack; David Milward

Agile text mining is widely used for commercial text mining in the pharmaceutical industry. It can be applied without building an annotated training corpus, so it is well suited to novel or one-off extraction tasks. In this work we wanted to see how efficiently it could be adapted for healthcare extraction tasks such as medication extraction. The aim was to identify medication names, associated dosage, route of administration, frequency, duration and reason, as specified in the 2009 i2b2 medication challenge. Queries were constructed based on 696 discharge summaries available as training data. Performance was measured on a test dataset of 251 unseen discharge summaries.
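
In the same spirit as query-based extraction, the sketch below shows a single hypothetical pattern that pulls a drug name, dose, and frequency out of free text. Real queries of this kind are far richer and are typically backed by drug dictionaries; the pattern and example sentence are illustrative only.

```python
import re

# A single illustrative pattern: drug name, then a dose, an optional route,
# and a frequency expression.
MED_PATTERN = re.compile(
    r"(?P<drug>[A-Za-z]+)\s+(?P<dose>\d+(?:\.\d+)?\s?(?:mg|mcg|g|units))"
    r"\s+(?:po|iv|im)?\s*(?P<freq>daily|bid|tid|qid|q\d+h|once daily|twice daily)",
    re.IGNORECASE,
)

text = "Discharged on metoprolol 25 mg PO bid and lisinopril 10 mg daily."
for m in MED_PATTERN.finditer(text):
    print(m.group("drug"), "|", m.group("dose"), "|", m.group("freq"))
```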


North American Chapter of the Association for Computational Linguistics | 2015

Corpus-based discovery of semantic intensity scales

Chaitanya Shivade; Marie-Catherine de Marneffe; Eric Fosler-Lussier; Albert M. Lai

Gradable terms such as brief, lengthy and extended illustrate varying degrees of a scale and can therefore participate in comparative constructs. Knowing the set of words that can be compared on the same scale and the associated ordering between them (brief < lengthy < extended) is very useful for a variety of lexical semantic tasks. Current techniques to derive such an ordering rely on WordNet to determine which words belong on the same scale and are limited to adjectives. Here we describe an extension to recent work: we investigate a fully automated pipeline to extract gradable terms from a corpus, group them into clusters reflecting the same scale and establish an ordering among them. This methodology reduces the amount of required handcrafted knowledge, and can infer gradability of words independent of their part of speech. Our approach infers an ordering for adjectives with comparable performance to previous work, but also for adverbs with an accuracy of 71%. We find that the technique is useful for inferring such rankings among words across different domains, and present an example using biomedical text.
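
A toy version of the ordering step might count "weak, if not strong" constructions in a corpus and rank terms by how often they occupy the stronger slot. The corpus, the single pattern, and the term cluster below are invented for illustration; the paper's pipeline also discovers and clusters the gradable terms automatically and uses more evidence than one pattern.

```python
import re
from collections import defaultdict
from itertools import permutations

corpus = ("the meeting was brief, if not lengthy by his standards. "
          "a lengthy, if not extended, recovery was expected. "
          "a brief, if not extended, delay occurred.")
terms = ["brief", "lengthy", "extended"]

# wins[t] accumulates how often t appears in the "strong" slot of the pattern.
wins = defaultdict(int)
for weak, strong in permutations(terms, 2):
    hits = len(re.findall(rf"\b{weak}\b,? if not \b{strong}\b", corpus))
    wins[strong] += hits

print(sorted(terms, key=lambda t: wins[t]))   # weakest ... strongest
```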


Proceedings of the Second Workshop on Extra-Propositional Aspects of Meaning in Computational Semantics (ExProM 2015) | 2015

Extending NegEx with Kernel Methods for Negation Detection in Clinical Text

Chaitanya Shivade; Marie-Catherine de Marneffe; Eric Fosler-Lussier; Albert M. Lai

NegEx is a popular rule-based system used to identify negated concepts in clinical notes. This system has been reported to perform very well by numerous studies in the past. In this paper, we demonstrate the use of kernel methods to extend the performance of NegEx. A kernel leveraging the rules of NegEx and its output as features performs as well as the rule-based system. An improvement in performance is achieved if this kernel is coupled with a bag of words kernel. Our experiments show that kernel methods outperform the rule-based system, when evaluated within and across two different open datasets. We also present the results of a semi-supervised approach to the problem, which improves performance on the data.
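
The feature-combination idea can be illustrated by concatenating bag-of-words features with a handful of NegEx-style negation-cue features and training a linear SVM, as in the sketch below. The cue list, training sentences, and labels are illustrative stand-ins, and this is not the kernel formulation or the datasets used in the paper.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline, make_union
from sklearn.preprocessing import FunctionTransformer
from sklearn.svm import LinearSVC

# Tiny stand-in for NegEx's trigger rules: one binary feature per cue.
NEG_CUES = ["no", "denies", "without", "negative for", "ruled out"]

def cue_features(sentences):
    """Binary indicator per negation cue (crude substring check)."""
    return np.array([[int(cue in s.lower()) for cue in NEG_CUES] for s in sentences])

sentences = ["Patient denies chest pain.",
             "Chest pain on exertion.",
             "No evidence of pneumonia.",
             "Pneumonia confirmed on imaging."]
labels = [1, 0, 1, 0]   # 1 = concept is negated

features = make_union(CountVectorizer(), FunctionTransformer(cue_features))
model = make_pipeline(features, LinearSVC()).fit(sentences, labels)
print(model.predict(["Denies shortness of breath."]))
```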


Meeting of the Association for Computational Linguistics | 2016

Identification, characterization, and grounding of gradable terms in clinical text.

Chaitanya Shivade; Marie-Catherine de Marneffe; Eric Fosler-Lussier; Albert M. Lai

Gradable adjectives are inherently vague and are used by clinicians to document medical interpretations (e.g., severe reaction, mild symptoms). We present a comprehensive study of gradable adjectives used in the clinical domain. We automatically identify gradable adjectives and demonstrate that they have a substantial presence in clinical text. Further, we show that there is a specific pattern associated with their usage, where certain medical concepts are more likely to be described using these adjectives than others. Interpretation of statements using such adjectives is a barrier in medical decision making. Therefore, we use a simple probabilistic model to ground their meaning based on their usage in context.
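
A minimal counting version of such grounding estimates how likely each gradable adjective is for a given concept from co-occurrence counts, as sketched below. The counts are invented; the paper estimates usage from clinical text, and its probabilistic model is richer than this maximum-likelihood estimate.

```python
from collections import Counter

# Invented (concept, adjective) co-occurrences standing in for corpus counts.
pairs = [("reaction", "severe"), ("reaction", "severe"), ("reaction", "mild"),
         ("symptoms", "mild"), ("symptoms", "mild"), ("symptoms", "moderate")]

counts = Counter(pairs)
concept_totals = Counter(concept for concept, _ in pairs)

def p_adjective_given_concept(concept, adjective):
    """P(adjective | concept) by maximum likelihood over the co-occurrence counts."""
    return counts[(concept, adjective)] / concept_totals[concept]

print(p_adjective_given_concept("reaction", "severe"))   # 2/3
print(p_adjective_given_concept("symptoms", "mild"))     # 2/3
```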

Collaboration


Dive into Chaitanya Shivade's collaborations.
