Ritu Khare
Drexel University
Publications
Featured research published by Ritu Khare.
international conference on management of data | 2010
Ritu Khare; Yuan An; Il-Yeol Song
This paper presents a survey of the major approaches to search interface understanding. The Deep Web consists of data that exist on the Web but are inaccessible via text search engines. The traditional way to access these data, i.e., by manually filling out HTML forms on search interfaces, is not scalable given the growing size of the Deep Web. Automatic access to these data requires an automatic understanding of search interfaces. While it is easy for a human to perceive an interface, machine processing of an interface is challenging. During the last decade, several works addressed the automatic interface understanding problem, employing a variety of understanding strategies. This paper surveys those key works; to our knowledge, it is the first survey in the field of search interface understanding. Through an exhaustive analysis, we organize the works on a 2-D graph based on the underlying database information extracted and on the technique employed.
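As a hedged illustration of the simplest layer of "interface understanding" (the surveyed systems go much further, labeling and typing fields semantically), the sketch below lists the queryable fields a search form exposes using only Python's standard library. The form markup and field names are invented for the example.

```python
from html.parser import HTMLParser

class FormFieldExtractor(HTMLParser):
    """Collect (name, type) pairs for the named inputs of a search form."""
    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag in ("input", "select", "textarea"):
            attrs = dict(attrs)
            name = attrs.get("name")
            if name:  # unnamed controls (e.g. a bare submit button) carry no query data
                self.fields.append((name, attrs.get("type", tag)))

html_form = """
<form action="/search">
  <input type="text" name="title">
  <select name="year"><option>2010</option></select>
  <input type="submit" value="Go">
</form>
"""
parser = FormFieldExtractor()
parser.feed(html_form)
print(parser.fields)  # [('title', 'text'), ('year', 'select')]
```

Real Deep Web interfaces add the hard parts this skips: associating visible labels with controls, grouping related fields, and inferring the database attributes behind them.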
Briefings in Bioinformatics | 2016
Ritu Khare; Benjamin M. Good; Robert Leaman; Andrew I. Su; Zhiyong Lu
The use of crowdsourcing to solve important but complex problems in biomedical and clinical sciences is growing and encompasses a wide variety of approaches. The crowd is diverse and includes online marketplace workers, health information seekers, science enthusiasts and domain experts. In this article, we review and highlight recent studies that use crowdsourcing to advance biomedicine. We classify these studies into two broad categories: (i) mining big data generated from a crowd (e.g. search logs) and (ii) active crowdsourcing via specific technical platforms, e.g. labor markets, wikis, scientific games and community challenges. Through describing each study in detail, we demonstrate the applicability of different methods in a variety of domains in biomedical research, including genomics, biocuration and clinical research. Furthermore, we discuss and highlight the strengths and limitations of different crowdsourcing platforms. Finally, we identify important emerging trends, opportunities and remaining challenges for future crowdsourcing research in biomedicine.
data warehousing and olap | 2007
Il-Yeol Song; Ritu Khare; Bing Dai
The star schema is widely accepted as the de facto data model for data warehouse design. A popular approach for developing a star schema is to derive it from an entity-relationship diagram (ERD) with some heuristics. Most existing approaches analyze the semantics of an ERD to generate a star schema. In this paper, we present the SAMSTAR method, which semi-automatically generates star schemas from an ERD by analyzing its semantics as well as its structure. The novel features of SAMSTAR are (1) the use of the notion of Connection Topology Value (CTV) in identifying candidate facts and dimensions and (2) the use of Annotated Dimensional Design Patterns (A_DDP) as well as WordNet to extend the list of dimensions. We illustrate our method by applying it to examples from the existing literature. We prove that the outputs of our method are a superset of those of the existing methods. The SAMSTAR method simplifies the work of experienced designers and gives a smooth head start to novices.
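The abstract does not define CTV, so the following is only an illustrative stand-in for the underlying intuition: entities with many outgoing many-to-one connections behave like fact candidates, while the entities they point to behave like dimension candidates. The ERD edges below are invented; SAMSTAR's actual CTV also accounts for transitive connections and weighting.

```python
from collections import Counter

# Hypothetical ERD as (many side, one side) edges of many-to-one relationships.
edges = [
    ("OrderLine", "Order"),
    ("OrderLine", "Product"),
    ("OrderLine", "Date"),
    ("Order", "Customer"),
]

# Crude degree-count proxy for a connection-topology score.
fact_score = Counter(child for child, _ in edges)   # many outgoing edges -> fact-like
dim_score = Counter(parent for _, parent in edges)  # pointed to by others -> dimension-like

print(fact_score.most_common(1))  # [('OrderLine', 3)] -> fact candidate
print(sorted(dim_score))          # ['Customer', 'Date', 'Order', 'Product']
```

Under this proxy, OrderLine surfaces as the fact candidate because it sits at the "many" end of the most relationships, which matches the common shape of transactional fact tables.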
Journal of Biomedical Informatics | 2015
Robert Leaman; Ritu Khare; Zhiyong Lu
BACKGROUND Identifying key variables such as disorders within the clinical narratives in electronic health records has wide-ranging applications within clinical practice and biomedical research. Previous research has demonstrated lower performance of disorder named entity recognition (NER) and normalization (or grounding) in clinical narratives than in biomedical publications. In this work, we aim to identify the cause of this performance difference and introduce general solutions. METHODS We use closure properties to compare the richness of the vocabulary in clinical narrative text to that in biomedical publications. We approach both disorder NER and normalization using machine learning methodologies. Our NER methodology is based on linear-chain conditional random fields with a rich feature approach, and we introduce several improvements to enhance the lexical knowledge of the NER system. Our normalization method, never previously applied to clinical data, uses pairwise learning to rank to automatically learn term variation directly from the training data. RESULTS We find that while the size of the overall vocabulary is similar between clinical narrative and biomedical publications, clinical narrative uses a richer terminology to describe disorders than publications do. We apply our system, DNorm-C, to locate disorder mentions in the clinical narratives from the recent ShARe/CLEF eHealth Task. For NER (strict span-only), our system achieves precision=0.797, recall=0.713, f-score=0.753. For the normalization task (strict span+concept) it achieves precision=0.712, recall=0.637, f-score=0.672. The improvements described in this article increase the NER f-score by 0.039 and the normalization f-score by 0.036. We also describe a high-recall version of the NER system, which increases the normalization recall to as high as 0.744, albeit with reduced precision.
DISCUSSION We perform an error analysis, demonstrating that NER errors outnumber normalization errors by more than 4-to-1. Abbreviations and acronyms are found to be frequent causes of error, as are mentions that the annotators were not able to identify within the scope of the controlled vocabulary. CONCLUSION Disorder mentions in clinical narrative text use a rich vocabulary that results in high term variation, which we believe to be one of the primary causes of reduced performance in clinical narratives. We show that pairwise learning to rank offers high performance in this context, and we introduce several lexical enhancements, generalizable to other clinical NER tasks, that improve the ability of the NER system to handle this variation. DNorm-C is a high-performing, open-source system for disorders in clinical text, and a promising step toward NER and normalization methods that are trainable for a wide variety of domains and entities. (DNorm-C is open-source software and is available with a trained model at the DNorm demonstration website: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/tmTools/#DNorm.)
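The reported f-scores follow directly from the precision and recall figures: F1 is their harmonic mean, so the abstract's numbers can be reproduced with a two-line check.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (the F1 / f-score)."""
    return 2 * precision * recall / (precision + recall)

ner_f = f1(0.797, 0.713)   # strict span-only NER
norm_f = f1(0.712, 0.637)  # strict span+concept normalization
print(round(ner_f, 3), round(norm_f, 3))  # 0.753 0.672
```

Both values match the reported scores, which is a quick sanity check worth doing whenever a paper lists all three metrics.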
Journal of Biomedical Informatics | 2014
Ritu Khare; Jiao Li; Zhiyong Lu
Drug-disease treatment relationships, i.e., which drug(s) are indicated to treat which disease(s), are among the most frequently sought information in PubMed®. Such information is useful for feeding the Google Knowledge Graph, designing computational methods to predict novel drug indications, and validating clinical information in EMRs. Given the importance and utility of this information, there have been several efforts to create repositories of drugs and their indications. However, existing resources are incomplete. Furthermore, they neither label indications in a structured way nor differentiate them by drug-specific properties such as dosage form, and thus do not support computer processing or semantic interoperability. More recently, several studies have proposed automatic methods to extract structured indications from drug descriptions; however, their performance is limited by natural language challenges in disease named entity recognition and indication selection. In response, we report LabeledIn: a human-reviewed, machine-readable and source-linked catalog of labeled indications for human drugs. More specifically, we describe our semi-automatic approach to derive LabeledIn from drug descriptions through human annotations with aids from automatic methods. As the data source, we use the drug labels (or package inserts) submitted to the FDA by drug manufacturers and made available in DailyMed. Our machine-assisted human annotation workflow comprises: (i) a grouping method to remove redundancy and identify representative drug labels to be used for human annotation, (ii) an automatic method to recognize and normalize mentions of diseases in drug labels as candidate indications, and (iii) a two-round annotation workflow for human experts to judge the pre-computed candidates and deliver the final gold standard. 
In this study, we focused on 250 highly accessed drugs in PubMed Health, a newly developed public web resource for consumers and clinicians on the prevention and treatment of diseases. These 250 drugs corresponded to more than 8000 drug labels (500 unique) in DailyMed, in which 2950 candidate indications were pre-tagged by an automatic tool. After being reviewed independently by two experts, 1618 indications were selected, and an additional 97 (missed by the computer) were manually added, with an inter-annotator agreement of 88.35% as measured by the Kappa coefficient. Our final annotation results in LabeledIn consist of 7805 drug-disease treatment relationships, where drugs are represented as a triplet of ingredient, dose form, and strength. A systematic comparison of LabeledIn with an existing computer-derived resource revealed significant discrepancies, confirming the need to involve humans in the creation of such a resource. In addition, LabeledIn is unique in that it contains the detailed textual context of the selected indications in drug labels, making it suitable for the development of advanced computational methods for the automatic extraction of indications from free text. Finally, motivated by studies on drug nomenclature and medication errors in EMRs, we adopted a fine-grained drug representation scheme, which enables the automatic identification of drugs with indications specific to certain dose forms or strengths. Future work includes expanding our coverage to more drugs and integration with other resources. The LabeledIn dataset and the annotation guidelines are available at http://ftp.ncbi.nlm.nih.gov/pub/lu/LabeledIn/.
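The inter-annotator agreement above is measured with the Kappa coefficient; Cohen's kappa corrects raw agreement for the agreement expected by chance. The sketch below computes it from a contingency table of the two annotators' judgments. The 2x2 counts are invented for illustration; the paper does not publish the underlying table here.

```python
def cohens_kappa(table):
    """Cohen's kappa; table[i][j] counts items annotator A labeled i and B labeled j."""
    total = sum(sum(row) for row in table)
    po = sum(table[i][i] for i in range(len(table))) / total  # observed agreement
    pa = [sum(row) / total for row in table]                  # A's label marginals
    pb = [sum(col) / total for col in zip(*table)]            # B's label marginals
    pe = sum(a * b for a, b in zip(pa, pb))                   # chance agreement
    return (po - pe) / (1 - pe)

# Hypothetical indication yes/no judgments by two annotators:
table = [[80, 5],
         [5, 10]]
print(round(cohens_kappa(table), 3))  # 0.608
```

Note how 90% raw agreement shrinks to a kappa of about 0.61 once chance agreement on the dominant "yes" label is discounted, which is why kappa rather than raw agreement is the standard report.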
Methods of Molecular Biology | 2014
Ritu Khare; Robert Leaman; Zhiyong Lu
Biomedical and life sciences literature is unique because of its exponentially increasing volume and interdisciplinary nature. Biomedical literature access is essential for several types of users, including biomedical researchers, clinicians, database curators, and bibliometricians. In the past few decades, several online search tools and literature archives, generic as well as biomedicine-specific, have been developed. We present this chapter in the light of three consecutive steps of literature access: searching for citations, retrieving full text, and viewing the article. The first section presents the current state of practice of biomedical literature access, including an analysis of the search tools most frequently used, namely PubMed, Google Scholar, Web of Science, Scopus, and Embase, and a study on biomedical literature archives such as PubMed Central. The next section describes current research and state-of-the-art systems motivated by the challenges a user faces during query formulation and interpretation of search results. The research solutions are classified into five key areas: text and data mining, text similarity search, semantic search, query support, and relevance ranking and result clustering. Finally, the last section describes some predicted future trends for improving biomedical literature access, such as searching and reading articles on portable devices, and adoption of the open access policy.
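The first step in the pipeline above, searching for citations, is also scriptable: NCBI exposes PubMed search through its E-utilities HTTP interface. As a minimal sketch we only construct the request URL (no network call is made); the query string and retmax value are illustrative.

```python
from urllib.parse import urlencode

def pubmed_esearch_url(query, retmax=20):
    """Build an NCBI E-utilities esearch URL for a PubMed query."""
    base = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
    return base + "?" + urlencode({"db": "pubmed", "term": query, "retmax": retmax})

url = pubmed_esearch_url("drug indications text mining")
print(url)
```

Fetching this URL returns an XML list of PubMed IDs, which a follow-up efetch call can expand into citation records; programmatic access like this is what enables the text- and data-mining area the chapter surveys.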
international health informatics symposium | 2010
Ritu Khare; Yuan An; Il-Yeol Song; Xiaohua Hu
Clinicians are becoming increasingly dependent on health information technologies (HIT) in their daily activities, such as data collection. However, most current HITs are vendor-designed systems, which are often inconsistent and inflexible with respect to the needs of clinicians. Consequently, time and again, HITs are found to be unfit for the healthcare workflow. A better HIT design is to empower clinicians with the ability to modify system functionality to suit their needs. In this paper, we propose a flexible Electronic Health Record (fEHR) system, which allows clinicians to build new templates/forms for data collection over an existing EHR system through a user interface. The system automatically translates the forms into underlying databases while shielding the user from the technical details. A key contribution is that the generated databases are of high quality, with desirable properties. To test the system's usability, we conducted a user study with clinicians working in a nurse-managed health services center. The participants performed the given tasks with 100% effectiveness, within a short span of time, in all but one case, and exhibited an improvement in their understanding of the system. Our study demonstrates that the fEHR system has the potential to incorporate flexibility into HITs, making them more effective and efficient for healthcare. We show that the fEHR is a favorable environment for clinicians to develop and improve their need-modeling skills.
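The core form-to-database translation can be sketched in a few lines: a clinician-defined template becomes a table automatically, with the SQL hidden from the user. This is only an illustration of the idea, not the fEHR implementation; the form name, fields, and types below are hypothetical.

```python
import sqlite3

# A hypothetical clinician-defined template: field name -> SQL type.
form = {"name": "vitals",
        "fields": [("pulse", "INTEGER"),
                   ("temperature", "REAL"),
                   ("notes", "TEXT")]}

def create_table_from_form(conn, form):
    """Translate a form definition into a backing table (trusted input assumed)."""
    cols = ", ".join(f"{name} {sqltype}" for name, sqltype in form["fields"])
    conn.execute(f"CREATE TABLE {form['name']} (id INTEGER PRIMARY KEY, {cols})")

conn = sqlite3.connect(":memory:")
create_table_from_form(conn, form)
conn.execute("INSERT INTO vitals (pulse, temperature, notes) VALUES (72, 36.6, 'ok')")
row = conn.execute("SELECT pulse, temperature FROM vitals").fetchone()
print(row)  # (72, 36.6)
```

A production system must also handle the hard cases this skips, such as merging new forms into existing schemas and guaranteeing normalization, which is where the paper's claim about high-quality generated databases lies.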
Archive | 2013
Ritu Khare; Yuan An; Sandra Wolf; Paul Nyirjesy; Longjian Liu; Edgar Chou
An “EMR error” refers to any incorrect, incomplete, or inconsistent patient information entered into electronic medical records (EMRs). Currently, the administering clinicians “manually” resolve such errors. Designing automated error control algorithms is a significant, and yet under-explored, informatics problem. In this study, we assess the EMR error detection abilities of physicians, reveal their strategies, and draw implications for computational algorithm design. Focusing on gynecologic practice, we conducted an error simulation study by fabricating several “erroneous” patient visit notes. We presented these notes to 20 experienced gynecologists, and asked them to detect any errors. Despite devoting substantial time, the participants could detect <50% of the introduced errors. Nevertheless, the successful cases helped reveal the 5 kinds of automatable “triggers” that helped participants sense an error candidate. The participants were able to recognize these triggers because of their comprehensive gynecologic knowledge accumulated through experience and medical school training.
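One family of automatable "triggers" the study points toward can be illustrated as a plausibility range check on vitals. This is a hypothetical sketch, not one of the study's five triggers; the fields, ranges, and note values below are all invented.

```python
# Invented plausibility ranges for the sketch; real ranges need clinical vetting.
PLAUSIBLE = {"pulse": (30, 220), "systolic_bp": (60, 260), "temp_c": (30.0, 43.0)}

def flag_implausible(note):
    """Return the fields of a visit note whose values fall outside plausible ranges."""
    flags = []
    for field, value in note.items():
        lo, hi = PLAUSIBLE.get(field, (float("-inf"), float("inf")))
        if not lo <= value <= hi:
            flags.append(field)
    return flags

note = {"pulse": 72, "systolic_bp": 700, "temp_c": 36.8}  # fabricated data-entry error
print(flag_implausible(note))  # ['systolic_bp']
```

Range checks only catch values that are implausible in isolation; the harder errors the study describes, such as inconsistencies across fields or visits, require the kind of accumulated domain knowledge the participating gynecologists drew on.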
international conference on conceptual modeling | 2011
Yuan An; Ritu Khare; Il-Yeol Song; Xiaohua Hu
Forms are a standard way of gathering data into a database. Many applications need to support multiple users with evolving data gathering requirements. It is desirable to automatically link dynamic forms to the back-end database. We have developed the FormMapper system, a fully automatic solution that accepts user-created data entry forms, and maps and integrates them into an existing database in the same domain. The solution comprises two components: tree extraction and form integration. The tree extraction component leverages a probabilistic model, the Hidden Markov Model (HMM), for automatically extracting the semantic tree structure of a form. In the form integration component, we develop a merging procedure that maps and integrates a tree into an existing database and extends the database with desired properties. We conducted experiments evaluating the performance of the system on several large databases designed from a number of complex forms. Our experimental results show that the FormMapper system is promising: it generated databases that are highly similar (87% overlap) to those generated by human experts, given the same set of forms.
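Decoding an HMM over a sequence of form elements is typically done with the Viterbi algorithm, sketched generically below. The states ("group" vs. "field"), observations, and probabilities are all made up for the example and are not FormMapper's actual model.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for the observation sequence."""
    # V[t][s] = (probability of best path ending in state s at time t, predecessor)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p][0] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states)
            V[t][s] = (prob, prev)
    last = max(states, key=lambda s: V[-1][s][0])  # best final state
    path = [last]
    for t in range(len(obs) - 1, 0, -1):           # backtrack predecessors
        path.append(V[t][path[-1]][1])
    return list(reversed(path))

states = ("group", "field")
start_p = {"group": 0.6, "field": 0.4}
trans_p = {"group": {"group": 0.3, "field": 0.7},
           "field": {"group": 0.2, "field": 0.8}}
emit_p = {"group": {"heading": 0.7, "textbox": 0.3},
          "field": {"heading": 0.1, "textbox": 0.9}}

print(viterbi(["heading", "textbox", "textbox"], states, start_p, trans_p, emit_p))
# ['group', 'field', 'field']
```

Read as form elements, the decoded sequence says a heading opens a group that contains two data fields, which is exactly the kind of semantic tree structure the extraction component recovers.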
international health informatics symposium | 2012
Ritu Khare; Yuan An; Jiexun Li; Il-Yeol Song; Xiaohua Hu
The elements of clinical databases are usually named after the clinical terms used in various design artifacts. These terms are instinctively supplied by the users, and hence different users often use different terms to describe the same clinical concept. This term diversity makes future database integration and analysis a huge challenge. In this paper, we study the standardization of the terms used in a specific kind of user-designed artifact, the encounter form or template, using a popular clinical terminology, SNOMED CT. In particular, we focus on the problem of mapping the terms on an encounter form to SNOMED CT concepts. Existing term mapping techniques are based solely on syntactic string similarity. Such techniques are unable to disambiguate among terms that resemble one another linguistically and yet differ semantically. To improve on existing techniques, we consider the context of a term in the mapping process and propose a hybrid approach relying on linguistic as well as structural information. For a given form term, this approach (i) exploits the semantic structure of the form to derive the term's context, and (ii) maps the term to a linguistically matching SNOMED CT concept that is compatible with the derived context. We test the approach on over 900 clinician-specified terms used in 26 forms. This method achieves a 23% improvement in precision and a 38% improvement in recall over a purely linguistic approach. Our first contribution is that we introduce and address a new problem of mapping form terms to standard concepts. The second contribution is that the experimental evaluation confirms that structural information plays a major role in improving mapping performance and in addressing the key challenges associated with semantic mapping.
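The hybrid idea can be sketched as a two-part score: a string-similarity term plus a context-compatibility bonus that breaks linguistic ties. Here difflib stands in for the paper's linguistic matcher, and the candidate concepts and contexts are invented for the example.

```python
from difflib import SequenceMatcher

# Hypothetical candidate concepts, each tagged with the context it suits.
candidates = [
    {"concept": "Cold (finding)",  "context": "symptom"},
    {"concept": "Cold (disorder)", "context": "diagnosis"},
]

def score(term, term_context, cand):
    """Lexical similarity plus a fixed bonus when form context matches."""
    lexical = SequenceMatcher(None, term.lower(), cand["concept"].lower()).ratio()
    context_bonus = 0.2 if cand["context"] == term_context else 0.0
    return lexical + context_bonus

# "cold" typed into the diagnosis section of a form: the form's structure
# supplies the context that disambiguates between the two near-identical matches.
best = max(candidates, key=lambda c: score("cold", "diagnosis", c))
print(best["concept"])  # Cold (disorder)
```

On pure string similarity the two candidates are nearly tied (the "finding" sense even scores slightly higher), so the structural context is what makes the mapping land on the semantically correct concept, mirroring the paper's argument.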