Vijay Garla
Yale University
Publication
Featured research published by Vijay Garla.
Journal of the American Medical Informatics Association | 2011
Vijay Garla; Vincent Lo Re; Zachariah Dorey-Stein; Farah Kidwai; Matthew Scotch; Julie A. Womack; Amy C. Justice; Cynthia Brandt
BACKGROUND Open-source clinical natural-language-processing (NLP) systems have lowered the barrier to the development of effective clinical document classification systems. Clinical natural-language-processing systems annotate the syntax and semantics of clinical text; however, feature extraction and representation for document classification pose technical challenges. METHODS The authors developed extensions to the clinical Text Analysis and Knowledge Extraction System (cTAKES) that simplify feature extraction, experimentation with various feature representations, and the development of both rule-based and machine-learning-based document classifiers. The authors describe and evaluate their system, the Yale cTAKES Extensions (YTEX), on the classification of radiology reports that contain findings suggestive of hepatic decompensation. RESULTS AND DISCUSSION The F1-score of the system for the retrieval of abdominal radiology reports was 96%, and was 79%, 91%, and 95% for the presence of liver masses, ascites, and varices, respectively. The authors released YTEX as open source, available at http://code.google.com/p/ytex.
BMC Bioinformatics | 2012
Vijay Garla; Cynthia Brandt
BACKGROUND Semantic similarity measures estimate the similarity between concepts, and play an important role in many text processing tasks. Approaches to semantic similarity in the biomedical domain can be roughly divided into knowledge based and distributional based methods. Knowledge based approaches utilize knowledge sources such as dictionaries, taxonomies, and semantic networks, and include path finding measures and intrinsic information content (IC) measures. Distributional measures utilize, in addition to a knowledge source, the distribution of concepts within a corpus to compute similarity; these include corpus IC and context vector methods. Prior evaluations of these measures in the biomedical domain showed that distributional measures outperform knowledge based path finding methods; but more recent studies suggested that intrinsic IC based measures exceed the accuracy of distributional approaches. Limitations of previous evaluations of similarity measures in the biomedical domain include their focus on the SNOMED CT ontology, and their reliance on small benchmarks not powered to detect significant differences between measure accuracy. There have been few evaluations of the relative performance of these measures on other biomedical knowledge sources such as the UMLS, and on larger, recently developed semantic similarity benchmarks. RESULTS We evaluated knowledge based and corpus IC based semantic similarity measures derived from SNOMED CT, MeSH, and the UMLS on recently developed semantic similarity benchmarks. Semantic similarity measures based on the UMLS, which contains SNOMED CT and MeSH, significantly outperformed those based solely on SNOMED CT or MeSH across evaluations. Intrinsic IC based measures significantly outperformed path-based and distributional measures. We released all code required to reproduce our results and all tools developed as part of this study as open source, available at http://code.google.com/p/ytex.
We provide a publicly accessible web service to compute semantic similarity, available at http://informatics.med.yale.edu/ytex.web/. CONCLUSIONS Knowledge based semantic similarity measures are more practical to compute than distributional measures, as they do not require an external corpus. Furthermore, knowledge based measures significantly and meaningfully outperformed distributional measures on large semantic similarity benchmarks, suggesting that they are a practical alternative to distributional measures. Future evaluations of semantic similarity measures should utilize benchmarks powered to detect significant differences in measure accuracy.
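To make the distinction between path finding and intrinsic IC measures concrete, the following is a minimal sketch, not the YTEX implementation. It assumes one common formulation (a Seco-style intrinsic IC combined with Lin similarity); the abstract does not state which formulas were evaluated. The taxonomy, concept names, and function names are all hypothetical.

```python
import math

# Hypothetical toy taxonomy (child -> parent); not taken from SNOMED CT, MeSH, or the UMLS.
PARENT = {
    "disease": None,
    "liver_disease": "disease",
    "heart_disease": "disease",
    "hepatitis": "liver_disease",
    "cirrhosis": "liver_disease",
}


def ancestors(concept):
    """Return the concept followed by its ancestors, nearest first, ending at the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def descendant_count(concept):
    """Count the concepts strictly below `concept` in the taxonomy."""
    return sum(1 for c in PARENT if c != concept and concept in ancestors(c))


def intrinsic_ic(concept):
    """Assumed Seco-style intrinsic IC: leaves score 1.0, the root scores 0.0."""
    return 1.0 - math.log(descendant_count(concept) + 1) / math.log(len(PARENT))


def lowest_common_subsumer(a, b):
    """Nearest ancestor shared by both concepts (single-parent tree assumed)."""
    b_chain = set(ancestors(b))
    return next(c for c in ancestors(a) if c in b_chain)


def lin_similarity(a, b):
    """Lin similarity computed from intrinsic IC; no corpus is needed."""
    denom = intrinsic_ic(a) + intrinsic_ic(b)
    if denom == 0:
        return 0.0
    return 2.0 * intrinsic_ic(lowest_common_subsumer(a, b)) / denom


def path_similarity(a, b):
    """Path finding measure: 1 / (number of edges between the concepts + 1)."""
    lcs = lowest_common_subsumer(a, b)
    edges = ancestors(a).index(lcs) + ancestors(b).index(lcs)
    return 1.0 / (edges + 1)


if __name__ == "__main__":
    print(lin_similarity("hepatitis", "cirrhosis"))
    print(path_similarity("hepatitis", "cirrhosis"))
```

Both functions depend only on the structure of the taxonomy, which is why knowledge based measures of this kind can be computed without an external corpus.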
Journal of Biomedical Informatics | 2013
Vijay Garla; Caroline Taylor; Cynthia Brandt
OBJECTIVE To compare linear and Laplacian SVMs on a clinical text classification task; to evaluate the effect of unlabeled training data on Laplacian SVM performance. BACKGROUND The development of machine-learning based clinical text classifiers requires the creation of labeled training data, obtained via manual review by clinicians. Due to the effort and expense involved in labeling data, training data sets in the clinical domain are of limited size. In contrast, electronic medical record (EMR) systems contain hundreds of thousands of unlabeled notes that are not used by supervised machine learning approaches. Semi-supervised learning algorithms use both labeled and unlabeled data to train classifiers, and can outperform their supervised counterparts. METHODS We trained support vector machines (SVMs) and Laplacian SVMs on a training reference standard of 820 abdominal CT, MRI, and ultrasound reports labeled for the presence of potentially malignant liver lesions that require follow up (positive class prevalence 77%). The Laplacian SVM used 19,845 randomly sampled unlabeled notes in addition to the training reference standard. We evaluated SVMs and Laplacian SVMs on a test set of 520 labeled reports. RESULTS The Laplacian SVM trained on labeled and unlabeled radiology reports significantly outperformed supervised SVMs (macro-F1 0.773 vs. 0.741, sensitivity 0.943 vs. 0.911, positive predictive value 0.877 vs. 0.883). Performance improved with the number of labeled and unlabeled notes used to train the Laplacian SVM (Pearson's ρ = 0.529 for correlation between number of unlabeled notes and macro-F1 score). These results suggest that practical semi-supervised methods such as the Laplacian SVM can leverage the large, unlabeled corpora that reside within EMRs to improve clinical text classification.
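The headline metric above, macro-F1, averages the per-class F1 scores so that the minority class counts as much as the majority class. The following is a minimal sketch of that calculation; the labels and predictions are hypothetical and are not data from the study, and the function names are illustrative.

```python
def f1_for_class(y_true, y_pred, label):
    """F1 score treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)  # also called positive predictive value
    recall = tp / (tp + fn)     # also called sensitivity
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, lab) for lab in labels) / len(labels)


if __name__ == "__main__":
    # Hypothetical gold labels and classifier output (1 = lesion needs follow up).
    gold = [1, 1, 1, 0, 0, 1]
    predicted = [1, 1, 0, 0, 1, 1]
    print(macro_f1(gold, predicted))
```

Because each class contributes equally to the average, a classifier cannot reach a high macro-F1 simply by predicting the majority class on an imbalanced data set.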
Clinical Lung Cancer | 2013
Susan Alsamarai; Xiaopan Yao; Hilary C. Cain; B.W. Chang; Herta H. Chao; Donna M. Connery; Yanhong Deng; Vijay Garla; Laura S. Hunnibell; Anthony W. Kim; J. Antonio Obando; Caroline Taylor; George Tellides; Michal G. Rose
BACKGROUND Timeliness of care improves patient satisfaction and might improve outcomes. The Cancer Care Coordination Program (CCCP) was established in November 2007 to improve timeliness of care of non-small cell lung cancer (NSCLC) at the Veterans Affairs Connecticut Healthcare System (VACHS). PATIENTS AND METHODS We performed a retrospective cohort analysis of patients diagnosed with NSCLC at VACHS between 2005 and 2010. We compared timeliness of care and stage at diagnosis before and after the implementation of the CCCP. RESULTS Data from 352 patients were analyzed: 163 with initial abnormal imaging between January 1, 2005 and October 31, 2007, and 189 with imaging conducted between November 1, 2007 and December 31, 2010. Variables associated with a longer interval between the initial abnormal image and the initiation of therapy were: (1) earlier stage (mean of 130 days for stages I/II vs. 87 days for stages III/IV; P < .0001); (2) lack of cancer-related symptoms (145 vs. 60 days; P < .0001); (3) presence of more than 1 medical comorbidity (123 vs. 82 days; P = .0002); and (4) depression (126 vs. 98 days; P = .029). The percentage of patients diagnosed at stages I/II increased from 32% to 48% (P = .006) after establishment of the CCCP. In a multivariate model adjusting for stage, histology, reason for imaging, and presence of primary care provider, implementation of the CCCP resulted in a mean reduction of 25 days between first abnormal image and the initiation of treatment (126 to 101 days; P = .015). CONCLUSION A centralized, multidisciplinary, hospital-based CCCP can improve timeliness of NSCLC care, and help ensure that early stage lung cancers are diagnosed and treated.
Journal of Biomedical Informatics | 2012
Vijay Garla; Cynthia Brandt
In this study we present novel feature engineering techniques that leverage the biomedical domain knowledge encoded in the Unified Medical Language System (UMLS) to improve machine-learning based clinical text classification. Critical steps in clinical text classification include identification of features and passages relevant to the classification task, and representation of clinical text to enable discrimination between documents of different classes. We developed novel information-theoretic techniques that utilize the taxonomical structure of the UMLS to improve feature ranking, and we developed a semantic similarity measure that projects clinical text into a feature space that improves classification. We evaluated these methods on the 2008 Integrating Informatics with Biology and the Bedside (I2B2) obesity challenge. The methods we developed improve upon the results of this challenge's top machine-learning based system, and may improve the performance of other machine-learning based clinical text classification systems. We have released all tools developed as part of this study as open source, available at http://code.google.com/p/ytex.
Journal of the American Medical Informatics Association | 2013
Vijay Garla; Cynthia Brandt
BACKGROUND Word sense disambiguation (WSD) methods automatically assign an unambiguous concept to an ambiguous term based on context, and are important to many text-processing tasks. In this study we developed and evaluated a knowledge-based WSD method that uses semantic similarity measures derived from the Unified Medical Language System (UMLS) and evaluated the contribution of WSD to clinical text classification. METHODS We evaluated our system on biomedical WSD datasets and determined the contribution of our WSD system to clinical document classification on the 2007 Computational Medicine Challenge corpus. RESULTS Our system compared favorably with other knowledge-based methods. Machine learning classifiers trained on disambiguated concepts significantly outperformed those trained using all concepts. CONCLUSIONS We developed a WSD system that achieves high disambiguation accuracy on standard biomedical WSD datasets and showed that our WSD system improves clinical document classification. DATA SHARING We integrated our WSD system with MetaMap and the clinical Text Analysis and Knowledge Extraction System, two popular biomedical natural language processing systems. All code required to reproduce our results and all tools developed as part of this study are released as open source, available at http://code.google.com/p/ytex.
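The general idea of similarity-based WSD can be sketched as follows: for an ambiguous term, score each candidate concept by its total similarity to the concepts found in the surrounding context, and keep the highest-scoring candidate. This is an illustrative sketch under stated assumptions, not the authors' scoring scheme, which the abstract does not describe. The similarity table, concept names, and function names are hypothetical; a real system would derive similarity from the UMLS.

```python
# Hypothetical pairwise similarity scores between concepts (assumption for illustration).
SIMILARITY = {
    frozenset(["cold_temperature", "fever"]): 0.1,
    frozenset(["common_cold", "fever"]): 0.8,
    frozenset(["cold_temperature", "cough"]): 0.05,
    frozenset(["common_cold", "cough"]): 0.9,
}


def similarity(a, b):
    """Symmetric lookup; unknown pairs are treated as unrelated."""
    if a == b:
        return 1.0
    return SIMILARITY.get(frozenset([a, b]), 0.0)


def disambiguate(candidates, context):
    """Return the candidate with the highest summed similarity to the context, plus all scores."""
    scores = {c: sum(similarity(c, ctx) for ctx in context) for c in candidates}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # The ambiguous term "cold" in a note that also mentions fever and cough.
    best, scores = disambiguate(["cold_temperature", "common_cold"], ["fever", "cough"])
    print(best, scores)
```

Discarding the losing candidates is what produces the "disambiguated concepts" feature set that the abstract contrasts with training on all concepts.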
Bioinformatics | 2011
Vijay Garla; Yong Kong; Sebastian Szpakowski; Michael Krauthammer
Motivation: Next-generation sequencing technologies enable the identification of sequence variation in the genome and transcriptome. Differences between the reference genome and transcript libraries complicate the determination of the effect of genomic sequence variants on protein products; similarly, these differences complicate the mapping of sequence variants found in transcripts to their respective genomic position. We have developed MU2A, a publicly available web service for variant annotation that reconciles differences between the genome and transcriptome, enabling the rapid and accurate determination of the effects of genomic variants on protein products, and the mapping of variants detected in transcripts to genomic coordinates. The MU2A web service is available at http://krauthammerlab.med.yale.edu/mu2a. We have released MU2A as open source, available at http://code.google.com/p/mu2a/. Supplementary information: Supplementary data are available at Bioinformatics online.
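The coordinate mapping problem mentioned above can be illustrated with a small sketch that converts positions between a spliced transcript and the genome using exon boundaries. This is not MU2A's algorithm and it ignores the genome/transcript sequence differences that MU2A reconciles. It assumes a forward-strand transcript with 1-based, inclusive exon coordinates; the exon values and function names are hypothetical.

```python
def transcript_to_genome(exons, tpos):
    """Map a 1-based transcript position to a genomic position.

    `exons` is a list of (start, end) genomic intervals, 1-based inclusive,
    sorted in transcript order on the forward strand (assumption).
    """
    if tpos < 1:
        raise ValueError("transcript positions are 1-based")
    remaining = tpos
    for start, end in exons:
        length = end - start + 1
        if remaining <= length:
            return start + remaining - 1
        remaining -= length  # skip this exon and continue into the next one
    raise ValueError("position lies beyond the end of the transcript")


def genome_to_transcript(exons, gpos):
    """Map a genomic position to a 1-based transcript position, or None if not exonic."""
    offset = 0
    for start, end in exons:
        if start <= gpos <= end:
            return offset + gpos - start + 1
        offset += end - start + 1
    return None  # intronic or outside the transcript


if __name__ == "__main__":
    exons = [(100, 109), (200, 204)]  # hypothetical two-exon transcript
    print(transcript_to_genome(exons, 11))
    print(genome_to_transcript(exons, 202))
```

A reverse-strand transcript would need the exon order and offsets flipped, which this sketch deliberately leaves out.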
Molecular Microbiology | 2014
Soheil Rastgou Talemi; Therese Jacobson; Vijay Garla; Clara Navarrete; Annemarie Wagner; Markus J. Tamás; Jörg Schaber
Arsenic has a dual role as causative and curative agent of human disease. Therefore, there is considerable interest in elucidating arsenic toxicity and detoxification mechanisms. By an ensemble modelling approach, we identified a best parsimonious mathematical model which recapitulates and predicts intracellular arsenic dynamics for different conditions and mutants, thereby providing novel insights into arsenic toxicity and detoxification mechanisms in yeast, which could partly be confirmed experimentally by dedicated experiments. Specifically, our analyses suggest that: (i) arsenic is mainly protein‐bound during short‐term (acute) exposure, whereas glutathione‐conjugated arsenic dominates during long‐term (chronic) exposure, (ii) arsenic is not stably retained, but can leave the vacuole via an export mechanism, and (iii) Fps1 is controlled by Hog1‐dependent and Hog1‐independent mechanisms during arsenite stress. Our results challenge glutathione depletion as a key mechanism for arsenic toxicity and instead suggest that (iv) increased glutathione biosynthesis protects the proteome against the damaging effects of arsenic and that (v) widespread protein inactivation contributes to the toxicity of this metalloid. Our work in yeast may prove useful to elucidate similar mechanisms in higher eukaryotes and have implications for the use of arsenic in medical therapy.
Journal of Clinical Oncology | 2012
Tamar H. Taddei; Laura S. Hunnibell; Anne DeLorenzo; Mirta Rosa; Donna M. Connery; Donna Vogel; Vijay Garla; Caroline Taylor; Michal G. Rose
BACKGROUND VA Connecticut Healthcare System has developed a web-based, EMR-linked Cancer Care Tracking System (CCTS) to facilitate tracking and follow-up of patients with imaging abnormalities concerning for lung or liver cancer. The tracker was developed to facilitate the efforts of a multidisciplinary team at the center of which is a cancer navigator. METHODS CCTS was first envisioned in 2007 when VACT hired a care navigator and implemented a radiology coding system to identify potential cancers. This created the need for a tool to process abnormal images and track the clinical steps required to reach a definitive diagnosis and treatment plan. CCTS was initially used for lung cancers and was expanded to track hepatocellular carcinoma (HCC) in 2009 with additional funding. In addition to case discovery, it offers easy access to patient information with live links to the VA EMR, a surveillance feature, and scheduling, alerting, and reporting functions. In 2011, the system was enhanced with a natural language processing (NLP) program that automatically identifies radiology reports describing potentially malignant lung or liver lesions. RESULTS CCTS has been in daily operation since February 2010, with 1,778 patients and 2,503 patients tracked in 2010 and 2011, respectively. Addition of NLP technology significantly increases the accuracy of identification of patients with lung or liver nodules. The NLP system identified 21% of all new cases with potential malignancies whose management could have been delayed through coding omissions or errors. Benefits of CCTS and our cancer care coordination program have included a decrease of 25 days in the time from abnormal image to treatment of lung cancer, a significant increase in the diagnosis of stage I/II lung cancers from 32% to 48%, and an increase in the incidence of liver cancer from 1% to 5% of all cancers at VACT.
CONCLUSIONS A web-based, EMR-linked cancer care tracking system (CCTS) improves cancer detection, prevents loss to follow-up, provides a safety net for radiology coding omissions or errors, and improves provider efficiency. CCTS is an innovative tool to support multidisciplinary cancer care and has broad applicability to any electronic medical record.
IEEE International Conference on Healthcare Informatics, Imaging and Systems Biology | 2012
Vijay Garla; Cynthia Brandt
Motivation: Word Sense Disambiguation (WSD) methods automatically assign an unambiguous concept to an ambiguous term based on context, and are important to many text processing tasks. In this study, we developed and evaluated a knowledge-based WSD method that uses semantic similarity measures derived from the Unified Medical Language System (UMLS), and we evaluated the contribution of WSD to clinical text classification. Results: We evaluated our system on biomedical WSD datasets; our system compares favorably to other knowledge-based methods. We evaluated the contribution of our WSD system to clinical document classification on the 2007 Computational Medicine Challenge corpus. Machine learning classifiers trained on disambiguated concepts significantly outperformed those trained using all concepts. Availability: We integrated our WSD system with MetaMap and cTAKES, two popular biomedical natural language processing systems. We released all code required to reproduce our results and all tools developed as part of this study as open source, available under http://code.google.com/p/ytex.