Publication


Featured research published by Prakash M. Nadkarni.


Journal of the American Medical Informatics Association | 2011

Natural language processing: an introduction

Prakash M. Nadkarni; Lucila Ohno-Machado; Wendy W. Chapman

OBJECTIVES To provide an overview and tutorial of natural language processing (NLP) and modern NLP-system design. TARGET AUDIENCE This tutorial targets the medical informatics generalist who has limited acquaintance with the principles behind NLP and/or limited knowledge of the current state of the art. SCOPE We describe the historical evolution of NLP, and summarize common NLP sub-problems in this extensive field. We then provide a synopsis of selected highlights of medical NLP efforts. After providing a brief description of common machine-learning approaches that are being used for diverse NLP sub-problems, we discuss how modern NLP architectures are designed, with a summary of the Apache Foundation's Unstructured Information Management Architecture. We finally consider possible future directions for NLP, and reflect on the possible impact of IBM Watson on the medical field.


Journal of the American Medical Informatics Association | 1999

Organization of Heterogeneous Scientific Data Using the EAV/CR Representation

Prakash M. Nadkarni; Luis N. Marenco; Roland Chen; Emmanouil Skoufos; Gordon M. Shepherd; Perry L. Miller

Entity-attribute-value (EAV) representation is a means of organizing highly heterogeneous data using a relatively simple physical database schema. EAV representation is widely used in the medical domain, most notably in the storage of data related to clinical patient records. Its potential strengths suggest its use in other biomedical areas, in particular research databases whose schemas are complex as well as constantly changing to reflect evolving knowledge in rapidly advancing scientific domains. When deployed for such purposes, the basic EAV representation needs to be augmented significantly to handle the modeling of complex objects (classes) as well as to manage interobject relationships. The authors refer to their modification of the basic EAV paradigm as EAV/CR (EAV with classes and relationships). They describe EAV/CR representation with examples from two biomedical databases that use it.
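The basic EAV idea described in this abstract can be illustrated with a minimal sketch. It assumes a single three-column table and uses Python's built-in `sqlite3` module; the table name `eav`, the helper names, and the sample rows are hypothetical and are not taken from the paper, and the sketch does not attempt the class and relationship extensions of EAV/CR.

```python
import sqlite3


def build_eav_demo():
    """Create an in-memory database with one generic EAV table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE eav ("
        " entity TEXT NOT NULL,"
        " attribute TEXT NOT NULL,"
        " value TEXT)"
    )
    # Each fact is one row, so a new kind of attribute needs no schema change.
    rows = [
        ("patient_1", "serum_potassium", "4.1"),
        ("patient_1", "blood_culture", "negative"),
        ("patient_2", "serum_potassium", "5.3"),
    ]
    conn.executemany("INSERT INTO eav VALUES (?, ?, ?)", rows)
    return conn


def entity_record(conn, entity):
    """Collect all attribute/value pairs stored for one entity into a dict."""
    cur = conn.execute(
        "SELECT attribute, value FROM eav WHERE entity = ?", (entity,)
    )
    return dict(cur.fetchall())


conn = build_eav_demo()
print(entity_record(conn, "patient_1"))
```

Only the attributes that actually apply to an entity occupy rows, which is why the representation suits sparse, heterogeneous data; the cost is that all values share one column and need metadata to be interpreted.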


Journal of the American Medical Informatics Association | 2011

Overcoming barriers to NLP for clinical text: the role of shared tasks and the need for additional creative solutions

Wendy W. Chapman; Prakash M. Nadkarni; Lynette Hirschman; Leonard W. D'Avolio; Guergana Savova; Özlem Uzuner

This issue of JAMIA focuses on natural language processing (NLP) techniques for clinical-text information extraction. Several articles are offshoots of the yearly ‘Informatics for Integrating Biology and the Bedside’ (i2b2) (http://www.i2b2.org) NLP shared-task challenge, introduced by Uzuner et al (see page 552) [1] and co-sponsored by the Veterans Administration for the last 2 years. This shared task follows long-running challenge evaluations in other fields, such as the Message Understanding Conference (MUC) for information extraction [2], TREC [3] for text information retrieval, and CASP [4] for protein structure prediction. Shared tasks in the clinical domain are recent and include annual i2b2 Challenges that began in 2006, a challenge for multi-label classification of radiology reports sponsored by Cincinnati Children's Hospital in 2007 [5], a 2011 Cincinnati Children's Hospital challenge on suicide notes [6], and the 2011 TREC information retrieval shared task involving retrieval of clinical cases from narrative records [7]. Although NLP research in the clinical domain has been active since the 1960s, progress in the development of NLP applications for clinical text has been slow and lags behind progress in the general NLP domain. There are several barriers to NLP development in the clinical domain, and shared tasks like the i2b2/VA Challenge address some of these barriers. Nevertheless, many barriers remain and unless the community takes a more active role in developing novel approaches for addressing the barriers, advancement and innovation will continue to be slow. Historically, there have been substantial barriers to NLP development in the clinical domain. These barriers are not unique to the clinical domain: they also occur in the fields of software engineering and general NLP.

### Lack of access to shared data

Because of concerns regarding patient privacy and worry about revealing unfavorable institutional practices, hospitals and clinics have been extremely reluctant to allow access to clinical data for researchers from outside …

Correspondence to Dr Wendy W Chapman, Department of Biomedical Informatics, University of California San Diego, 9500 Gilman Dr, Bldg 2 #0728, La Jolla, California, USA; wwchapman{at}ucsd.edu


Trends in Neurosciences | 1998

The Human Brain Project: neuroinformatics tools for integrating, searching and modeling multidisciplinary neuroscience data

Gordon M. Shepherd; Jason S. Mirsky; Matthew D. Healy; Michael S. Singer; Emmanouil Skoufos; Michael S. Hines; Prakash M. Nadkarni; Perry L. Miller

What is neuroinformatics? What is the Human Brain Project? Why should you care? Supported by a consortium of US funding agencies, the Human Brain Project aims to bring to the analysis of brain function the same advantages of Internet-accessible databases and database tools that have been crucial to the development of molecular biology and the Human Genome Project. The much greater complexity of neural data, however, makes this a far more challenging task. As a pilot project in this new initiative, we review some of the progress that has been made and indicate some of the problems, challenges and opportunities that lie ahead.


Journal of the American Medical Informatics Association | 2001

UMLS concept indexing for production databases: a feasibility study.

Prakash M. Nadkarni; Roland Chen; Cynthia Brandt

OBJECTIVES To explore the feasibility of using the National Library of Medicine's Unified Medical Language System (UMLS) Metathesaurus as the basis for a computational strategy to identify concepts in medical narrative text preparatory to indexing. To quantitatively evaluate this strategy in terms of true positives, false positives (spuriously identified concepts) and false negatives (concepts missed by the identification process). METHODS Using the 1999 UMLS Metathesaurus, the authors processed a training set of 100 documents (50 discharge summaries, 50 surgical notes) with a concept-identification program, whose output was manually analyzed. They flagged concepts that were erroneously identified and added new concepts that were not identified by the program, recording the reason for failure in such cases. After several refinements to both their algorithm and the UMLS subset on which it operated, they deployed the program on a test set of 24 documents (12 of each kind). RESULTS Of 8,745 matches in the training set, 7,227 (82.6 percent) were true positives, whereas of 1,701 matches in the test set, 1,298 (76.3 percent) were true positives. Matches other than true positive indicated potential problems in production-mode concept indexing. Examples of causes of problems were redundant concepts in the UMLS, homonyms, acronyms, abbreviations and elisions, concepts that were missing from the UMLS, proper names, and spelling errors. CONCLUSIONS The error rate was too high for concept indexing to be the only production-mode means of preprocessing medical narrative. Considerable curation needs to be performed to define a UMLS subset that is suitable for concept matching.
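The percentages reported in the RESULTS above can be rechecked with a short calculation. The function name `true_positive_share` is a hypothetical helper introduced here for illustration; only the four counts come from the abstract.

```python
def true_positive_share(true_positives, total_matches):
    """Return the percentage of matches that were true positives."""
    if total_matches <= 0:
        raise ValueError("total_matches must be positive")
    return 100.0 * true_positives / total_matches


# Counts as reported in the abstract.
training_share = true_positive_share(7227, 8745)
test_share = true_positive_share(1298, 1701)

# Matches that were not true positives, by simple subtraction.
training_other = 8745 - 7227
test_other = 1701 - 1298

print(f"training: {training_share:.1f} percent true positives, {training_other} other matches")
print(f"test: {test_share:.1f} percent true positives, {test_other} other matches")
```

Rounded to one decimal place, the two shares agree with the 82.6 and 76.3 percent given in the abstract.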


International Journal of Medical Informatics | 2007

Guidelines for the Effective Use of Entity-Attribute-Value Modeling for Biomedical Databases

Valentin Dinu; Prakash M. Nadkarni

PURPOSE To introduce the goals of EAV database modeling, to describe the situations where entity-attribute-value (EAV) modeling is a useful alternative to conventional relational methods of database modeling, and to describe the fine points of implementation in production systems. METHODS We analyze the following circumstances: (1) data are sparse and have a large number of applicable attributes, but only a small fraction will apply to a given entity; (2) numerous classes of data need to be represented, each class has a limited number of attributes, but the number of instances of each class is very small. We also consider situations calling for a mixed approach where both conventional and EAV design are used for appropriate data classes. RESULTS AND CONCLUSIONS In robust production systems, EAV-modeled databases trade a modest data sub-schema for a complex metadata sub-schema. The need to design the metadata effectively makes EAV design potentially more challenging than conventional design.


Journal of the American Medical Informatics Association | 2000

Exploring performance issues for a clinical database organized using an entity-attribute-value representation.

Roland Chen; Prakash M. Nadkarni; Luis N. Marenco; Forrest W. Levin; Joseph Erdos; Perry L. Miller

BACKGROUND The entity-attribute-value representation with classes and relationships (EAV/CR) provides a flexible and simple database schema to store heterogeneous biomedical data. In certain circumstances, however, the EAV/CR model is known to retrieve data less efficiently than conventionally based database schemas. OBJECTIVE To perform a pilot study that systematically quantifies performance differences for database queries directed at real-world microbiology data modeled with EAV/CR and conventional representations, and to explore the relative merits of different EAV/CR query implementation strategies. METHODS Clinical microbiology data obtained over a ten-year period were stored using both database models. Query execution times were compared for four clinically oriented attribute-centered and entity-centered queries operating under varying conditions of database size and system memory. The performance characteristics of three different EAV/CR query strategies were also examined. RESULTS Performance was similar for entity-centered queries in the two database models. Performance in the EAV/CR model was approximately three to five times less efficient than its conventional counterpart for attribute-centered queries. The differences in query efficiency became slightly greater as database size increased, although they were reduced with the addition of system memory. The authors found that EAV/CR queries formulated using multiple, simple SQL statements executed in batch were more efficient than single, large SQL statements. CONCLUSION This paper describes a pilot project to explore issues in and compare query performance for EAV/CR and conventional database representations. Although attribute-centered queries were less efficient in the EAV/CR model, these inefficiencies may be addressable, at least in part, by the use of more powerful hardware or more memory, or both.
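The distinction between entity-centered and attribute-centered queries can be sketched as follows. This is an illustration only: the table names, columns, and sample rows are hypothetical, the EAV table is a plain three-column simplification, not the EAV/CR schema, and the sketch does not reproduce the study's queries or its timing measurements. It uses Python's built-in `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE culture_conventional (
        culture_id INTEGER PRIMARY KEY, organism TEXT, specimen TEXT);
    CREATE TABLE culture_eav (
        culture_id INTEGER, attribute TEXT, value TEXT);
    """
)

# The same three (made-up) cultures are stored in both representations.
rows = [(1, "E. coli", "urine"), (2, "S. aureus", "blood"), (3, "E. coli", "blood")]
conn.executemany("INSERT INTO culture_conventional VALUES (?, ?, ?)", rows)
for culture_id, organism, specimen in rows:
    conn.executemany(
        "INSERT INTO culture_eav VALUES (?, ?, ?)",
        [(culture_id, "organism", organism), (culture_id, "specimen", specimen)],
    )


def attribute_centered_conventional(conn, organism, specimen):
    """Find cultures by attribute values: one WHERE clause on one table."""
    cur = conn.execute(
        "SELECT culture_id FROM culture_conventional"
        " WHERE organism = ? AND specimen = ? ORDER BY culture_id",
        (organism, specimen),
    )
    return [row[0] for row in cur]


def attribute_centered_eav(conn, organism, specimen):
    """Same question against EAV: each criterion needs its own pass over the table."""
    cur = conn.execute(
        "SELECT a.culture_id FROM culture_eav AS a"
        " JOIN culture_eav AS b ON a.culture_id = b.culture_id"
        " WHERE a.attribute = 'organism' AND a.value = ?"
        " AND b.attribute = 'specimen' AND b.value = ?"
        " ORDER BY a.culture_id",
        (organism, specimen),
    )
    return [row[0] for row in cur]


def entity_centered_eav(conn, culture_id):
    """Fetch everything known about one entity: a single filter on the entity key."""
    cur = conn.execute(
        "SELECT attribute, value FROM culture_eav WHERE culture_id = ?",
        (culture_id,),
    )
    return dict(cur.fetchall())


print(attribute_centered_conventional(conn, "E. coli", "blood"))
print(attribute_centered_eav(conn, "E. coli", "blood"))
print(entity_centered_eav(conn, 2))
```

In this sketch the attribute-centered EAV query needs a self-join per additional criterion, which is one plausible reason such queries cost more than their conventional counterparts, while the entity-centered lookup stays a single keyed filter in either representation.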


Journal of the American Medical Informatics Association | 2000

WebEAV: automatic metadata-driven generation of web interfaces to entity-attribute-value databases.

Prakash M. Nadkarni; Cynthia M. Brandt; Luis N. Marenco

The task of creating and maintaining a front end to a large institutional entity-attribute-value (EAV) database can be cumbersome when using traditional client-server technology. Switching to Web technology as a delivery vehicle solves some of these problems but introduces others. In particular, Web development environments tend to be primitive, and many features that client-server developers take for granted are missing. WebEAV is a generic framework for Web development that is intended to streamline the process of Web application development for databases having a significant EAV component. It also addresses some challenging user interface issues that arise when any complex system is created. The authors describe the architecture of WebEAV and provide an overview of its features with suitable examples.


Journal of the American Medical Informatics Association | 2000

Exploring the Degree of Concordance of Coded and Textual Data in Answering Clinical Queries from a Clinical Data Repository

H. David Stein; Prakash M. Nadkarni; Joseph Erdos; Perry L. Miller

OBJECTIVE To query a clinical data repository (CDR) for answers to clinical questions to determine whether different types of fields (coded and free text) would yield confirmatory, complementary, or conflicting information and to discuss the issues involved in producing the discrepancies between the fields. METHODS The appropriate data fields in a subset of a CDR (5,135 patient records) were searched for the answers to three questions related to surgical procedures. Each search included at least one coded data field and at least one free-text field. The identified free-text records were then searched manually to ensure correct interpretation. The fields were then compared to determine whether they agreed with each other, were supportive of each other, contained no entry (absence of data), or were contradictory. RESULTS The degree of concordance varied greatly according to the field and the question asked. Some fields were not granular enough to answer the question. The free-text fields often gave an answer that was not definitive. Absence of data was most logically interpreted in some cases as lack of completion of data and in others as a negative answer. Even with a question as specific as which side a hernia was on, contradictory data were found in 5 to 8 percent of the records. CONCLUSIONS Using the data in the CDR to answer clinical questions can yield significantly disparate results depending on the question and which data fields are searched. A database cannot just be queried in automated fashion and the results reported. Both coded and textual fields must be searched to obtain the fullest assessment. This can be expected to result in information that may be confirmatory, complementary, or conflicting. To yield the most accurate information possible, final answers to questions require human judgment and may require the gathering of additional information.


Community Genetics | 2003

The Cancer Genetics Network: Recruitment results and pilot studies

Hoda Anton-Culver; Argyrios Ziogas; Deborah J. Bowen; Dianne M. Finkelstein; Constance A. Griffin; James W. Hanson; Claudine Isaacs; Carol Kasten-Sportes; Geraldine P. Mineau; Prakash M. Nadkarni; Barbara K. Rimer; Joellen M. Schildkraut; Louise C. Strong; Barbara L. Weber; Deborah M. Winn; Robert A. Hiatt; Susan G. Nayfield

Objective: The National Cancer Institute established the Cancer Genetics Network (CGN) to support collaborative investigations into the genetic basis of cancer susceptibility, explore mechanisms to integrate this new knowledge into medical practice, and identify ways of addressing the associated psychosocial, ethical, legal, and public health issues. Subjects and Methods: The CGN has developed the complex infrastructure required to support the projects, including the establishment of guidelines and policies, uniform methods, standard questionnaires to be used by all of the centers, and a standard format for submission of data to the Informatics Center. Cancer patients and their family members have been invited to enroll and be included in a pool of potential study participants. The Information Technology Group is responsible for support of the design, implementation, and maintenance of the multicenter Network-wide research protocols. Results: As of January 2004, the CGN contained data on 23,995 probands (participants) and 425,798 family members. As a resource for cancer genetic studies, the CGN has a large number of probands and first-degree relatives with and without cancer and with multiple ethnicities. Different study designs can be used including case-control, case-case, and family studies. Conclusions: The unique resources of the CGN are available for studies on cancer genetic susceptibility, translational research, and behavioral research. The CGN is now at a point where approved collaborators may have access to enrolled patients and their families for special studies, as well as to the clinical, environmental and family cancer history data banked in the Informatics Center.

Collaboration



Top Co-Authors

Chiquito J. Crasto

University of Alabama at Birmingham
