Publication


Featured research published by Anupama E. Gururaj.


Nature Genetics | 2017

Finding useful data across multiple biomedical data repositories using DataMed

Lucila Ohno-Machado; Susanna-Assunta Sansone; George Alter; Ian Fore; Jeffrey S. Grethe; Hua Xu; Alejandra Gonzalez-Beltran; Philippe Rocca-Serra; Anupama E. Gururaj; Elizabeth A. Bell; Ergin Soysal; Nansu Zong; Hyeoneui Kim

The value of broadening searches for data across multiple repositories has been identified by the biomedical research community. As part of the US National Institutes of Health (NIH) Big Data to Knowledge initiative, we work with an international community of researchers, service providers and knowledge experts to develop and test a data index and search engine, which are based on metadata extracted from various data sets in a range of repositories. DataMed is designed to be, for data, what PubMed has been for the scientific literature. DataMed supports the findability and accessibility of data sets. These characteristics, along with interoperability and reusability, compose the four FAIR principles to facilitate knowledge discovery in today's big data–intensive science landscape.


Scientific Data | 2017

DATS, the data tag suite to enable discoverability of datasets

Susanna-Assunta Sansone; Alejandra Gonzalez-Beltran; Philippe Rocca-Serra; George Alter; Jeffrey S. Grethe; Hua Xu; Ian Fore; Jared Lyle; Anupama E. Gururaj; Xiaoling Chen; Hyeoneui Kim; Nansu Zong; Yueling Li; Ruiling Liu; I. Burak Ozyurt; Lucila Ohno-Machado

Today’s science increasingly requires effective ways to find and access existing datasets that are distributed across a range of repositories. For researchers in the life sciences, discoverability of datasets may soon become as essential as identifying the latest publications via PubMed. Through an international collaborative effort funded by the National Institutes of Health (NIH)’s Big Data to Knowledge (BD2K) initiative, we have designed and implemented the DAta Tag Suite (DATS) model to support the DataMed data discovery index. DataMed’s goal is to be for data what PubMed has been for the scientific literature. Akin to the Journal Article Tag Suite (JATS) used in PubMed, the DATS model enables submission of metadata on datasets to DataMed. DATS has a core set of elements, which are generic and applicable to any type of dataset, and an extended set that can accommodate more specialized data types. DATS is a platform-independent model also available as an annotated serialization in schema.org, which in turn is widely used by major search engines like Google, Microsoft, Yahoo and Yandex.
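To make the schema.org point concrete, here is a minimal sketch, assuming a simple flat input record whose keys (`title`, `description`, `identifier`, `keywords`, `creators`) are hypothetical placeholders, not DATS element names. It emits JSON-LD using real schema.org `Dataset` properties, but it is an illustration only, not the official DATS-to-schema.org mapping.

```python
import json


def to_schema_org_jsonld(record: dict) -> str:
    """Serialize a flat dataset record as schema.org Dataset JSON-LD.

    The input keys (title, description, identifier, keywords, creators) are
    hypothetical placeholders, not DATS element names.
    """
    doc = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": record["title"],
        "description": record["description"],
        "identifier": record["identifier"],
        # Optional fields default to empty lists when absent.
        "keywords": record.get("keywords", []),
        "creator": [
            {"@type": "Person", "name": name} for name in record.get("creators", [])
        ],
    }
    return json.dumps(doc, indent=2)


if __name__ == "__main__":
    example = {
        "title": "Example gene expression dataset",  # hypothetical record
        "description": "Toy record used only to illustrate the serialization.",
        "identifier": "EXAMPLE-0001",
        "keywords": ["gene expression"],
        "creators": ["Jane Doe"],
    }
    print(to_schema_org_jsonld(example))
```

Embedding output like this in a `<script type="application/ld+json">` tag on a dataset landing page is the usual way such markup is exposed to web search engines.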


Database | 2017

A publicly available benchmark for biomedical dataset retrieval: the reference standard for the 2016 bioCADDIE dataset retrieval challenge

Trevor Cohen; Kirk Roberts; Anupama E. Gururaj; Xiaoling Chen; Saeid Pournejati; George Alter; William R. Hersh; Dina Demner-Fushman; Lucila Ohno-Machado; Hua Xu

The rapid proliferation of publicly available biomedical datasets has provided abundant resources that are potentially of value as a means to reproduce prior experiments, and to generate and explore novel hypotheses. However, there are a number of barriers to the re-use of such datasets, which are distributed across a broad array of dataset repositories, focusing on different data types and indexed using different terminologies. New methods are needed to enable biomedical researchers to locate datasets of interest within this rapidly expanding information ecosystem, and new resources are needed for the formal evaluation of these methods as they emerge. In this paper, we describe the design and generation of a benchmark for information retrieval of biomedical datasets, which was developed and used for the 2016 bioCADDIE Dataset Retrieval Challenge. In the tradition of the seminal Cranfield experiments, and as exemplified by the Text Retrieval Conference (TREC), this benchmark includes a corpus (biomedical datasets), a set of queries, and relevance judgments relating these queries to elements of the corpus. This paper describes the process through which each of these elements was derived, with a focus on those aspects that distinguish this benchmark from typical information retrieval reference sets. Specifically, we discuss the origin of our queries in the context of a larger collaborative effort, the biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE) consortium, and the distinguishing features of biomedical dataset retrieval as a task. The resulting benchmark set has been made publicly available to advance research in the area of biomedical dataset retrieval.

Database URL: https://biocaddie.org/benchmark-data
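A Cranfield/TREC-style benchmark of this kind pairs queries with relevance judgments that can then score a system's ranked output. The sketch below assumes the common four-column TREC qrels layout (`query_id iteration doc_id relevance`) and treats any grade above 0 as relevant; both are assumptions, since the file layout and grading scheme of the released bioCADDIE benchmark are not stated here. The query and document IDs are hypothetical.

```python
from collections import defaultdict


def parse_qrels(lines):
    """Parse TREC-style qrels lines into {query_id: {doc_id: grade}}."""
    qrels = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        query_id, _iteration, doc_id, grade = parts
        qrels[query_id][doc_id] = int(grade)
    return dict(qrels)


def precision_at_k(ranked_doc_ids, judgments, k=10):
    """Fraction of the top-k ranked documents judged relevant (grade > 0).

    Unjudged documents count as non-relevant, and the denominator is always k,
    so a ranking shorter than k is penalized.
    """
    top_k = ranked_doc_ids[:k]
    hits = sum(1 for doc_id in top_k if judgments.get(doc_id, 0) > 0)
    return hits / k


if __name__ == "__main__":
    qrels_text = """q1 0 d1 2
q1 0 d2 0
q1 0 d3 1
q2 0 d7 1"""
    qrels = parse_qrels(qrels_text.splitlines())
    run = ["d1", "d2", "d3", "d4"]  # hypothetical system ranking for q1
    print(precision_at_k(run, qrels["q1"], k=4))  # 0.5
```

Here d1 and d3 carry positive grades, d2 is judged non-relevant and d4 is unjudged, so two of the top four results count, giving 0.5. Dividing by k rather than by the number of returned results is one common convention; other evaluation tools may handle short rankings differently.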


Journal of the American Medical Informatics Association | 2018

User needs analysis and usability assessment of DataMed – a biomedical data discovery index

Ram Dixit; Deevakar Rogith; Vidya Narayana; Mandana Salimi; Anupama E. Gururaj; Lucila Ohno-Machado; Hua Xu; Todd R. Johnson

Objective: To present user needs and usability evaluations of DataMed, a Data Discovery Index (DDI) that allows searching for biomedical data from multiple sources.

Materials and Methods: We conducted 2 phases of user studies. Phase 1 was a user needs analysis conducted before the development of DataMed, consisting of interviews with researchers. Phase 2 involved iterative usability evaluations of DataMed prototypes. We analyzed data qualitatively to document researchers’ information and user interface needs.

Results: Biomedical researchers’ information needs in data discovery are complex, multidimensional, and shaped by their context, domain knowledge, and technical experience. User needs analyses validate the need for a DDI, while usability evaluations of DataMed show that even though aggregating metadata into a common search engine and applying traditional information retrieval tools are promising first steps, there remain challenges for DataMed due to incomplete metadata and the complexity of data discovery.

Discussion: Biomedical data poses distinct problems for search when compared to websites or publications. Making data available is not enough to facilitate biomedical data discovery: new retrieval techniques and user interfaces are necessary for dataset exploration. Consistent, complete, and high-quality metadata are vital to enable this process.

Conclusion: While available data and researchers’ information needs are complex and heterogeneous, a successful DDI must meet those needs and fit into the processes of biomedical researchers. Research directions include formalizing researchers’ information needs, standardizing overviews of data to facilitate relevance judgments, implementing user interfaces for concept-based searching, and developing evaluation methods for open-ended discovery systems such as DDIs.


Journal of the American Medical Informatics Association | 2018

DataMed – an open source discovery index for finding biomedical datasets

Xiaoling Chen; Anupama E. Gururaj; Burak Ozyurt; Ruiling Liu; Ergin Soysal; Trevor Cohen; Firat Tiryaki; Yueling Li; Nansu Zong; Min Jiang; Deevakar Rogith; Mandana Salimi; Hyeoneui Kim; Philippe Rocca-Serra; Alejandra Gonzalez-Beltran; Claudiu Farcas; Todd R. Johnson; Ron Margolis; George Alter; Susanna-Assunta Sansone; Ian Fore; Lucila Ohno-Machado; Jeffrey S. Grethe; Hua Xu

Objective: Finding relevant datasets is important for promoting data reuse in the biomedical domain, but it is challenging given the volume and complexity of biomedical data. Here we describe the development of an open source biomedical data discovery system called DataMed, with the goal of promoting the building of additional data indexes in the biomedical domain.

Materials and Methods: DataMed, which can efficiently index and search diverse types of biomedical datasets across repositories, is developed through the National Institutes of Health–funded biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE) consortium. It consists of 2 main components: (1) a data ingestion pipeline that collects and transforms original metadata information to a unified metadata model, called DatA Tag Suite (DATS), and (2) a search engine that finds relevant datasets based on user-entered queries. In addition to describing its architecture and techniques, we evaluated individual components within DataMed, including the accuracy of the ingestion pipeline, the prevalence of the DATS model across repositories, and the overall performance of the dataset retrieval engine.

Results and Conclusion: Our manual review shows that the ingestion pipeline could achieve an accuracy of 90% and core elements of DATS had varied frequency across repositories. On a manually curated benchmark dataset, the DataMed search engine achieved an inferred average precision of 0.2033 and a precision at 10 (P@10, the proportion of relevant results among the top 10 search results) of 0.6022, by implementing advanced natural language processing and terminology services. We have made the DataMed system publicly available as an open source package for the biomedical community.
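The ingestion step described above, which maps each repository's native metadata onto one unified model, can be sketched as a per-source field mapping followed by a per-field coverage count, echoing the observation that core elements appear with varying frequency across repositories. Everything here is a hypothetical simplification: the repository names, native field names, and the three unified fields are invented for illustration, are far smaller than DATS, and are not DataMed's pipeline code.

```python
# Hypothetical per-repository mappings: native field name -> unified field name.
FIELD_MAPS = {
    "repo_a": {"dataset_title": "title", "summary": "description", "accession": "identifier"},
    "repo_b": {"name": "title", "abstract": "description", "doi": "identifier"},
}

# A deliberately tiny, hypothetical stand-in for a unified metadata model.
UNIFIED_FIELDS = ("title", "description", "identifier")


def ingest(source, raw):
    """Map one raw metadata record from `source` onto the unified fields.

    Fields the source record does not supply are kept as None so gaps stay visible.
    """
    mapping = FIELD_MAPS[source]
    unified = {field: None for field in UNIFIED_FIELDS}
    for native_field, unified_field in mapping.items():
        if native_field in raw:
            unified[unified_field] = raw[native_field]
    unified["source"] = source
    return unified


def field_coverage(records):
    """Return, for each unified field, the fraction of records where it is non-empty."""
    if not records:
        return {}
    return {
        field: sum(1 for r in records if r.get(field)) / len(records)
        for field in UNIFIED_FIELDS
    }


if __name__ == "__main__":
    records = [
        ingest("repo_a", {"dataset_title": "Cohort A", "accession": "A-001"}),
        # Hypothetical record; the DOI-like string is a placeholder.
        ingest("repo_b", {"name": "Imaging set B", "abstract": "MRI scans", "doi": "10.0000/example"}),
    ]
    print(field_coverage(records))
    # {'title': 1.0, 'description': 0.5, 'identifier': 1.0}
```

Keeping missing fields as explicit `None` values, rather than dropping them, is what makes a per-field coverage report like this possible.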


Database | 2017

Information retrieval for biomedical datasets: the 2016 bioCADDIE dataset retrieval challenge

Kirk Roberts; Anupama E. Gururaj; Xiaoling Chen; Saeid Pournejati; William R. Hersh; Dina Demner-Fushman; Lucila Ohno-Machado; Trevor Cohen; Hua Xu


AMIA | 2017

Information Retrieval for Biomedical Datasets: The 2016 bioCADDIE Challenge

Kirk Roberts; Anupama E. Gururaj; Xiaoling Chen; Saeid Pournejati; Trevor Cohen; William R. Hersh; Dina Demner-Fushman; Lucila Ohno-Machado; Hua Xu


AMIA | 2017

A Natural Language Processing System for Biomedical Dataset Retrieval

Xiaoling Chen; Jun Xu; Jingqi Wang; Anupama E. Gururaj; Lucila Ohno-Machado; Hua Xu


AMIA | 2017

Meeting User Needs for a Data Discovery Index of Biomedical Big Data

Ram Dixit; Deevakar Rogith; Vidya Narayana; Mandana Salimi; Anupama E. Gururaj; Lucila Ohno-Machado; Hua Xu; Todd R. Johnson


Archive | 2016

Metadata Mapping in bioCADDIE: Challenging Cases

Nansu Zong; Diana Guijiarro; Sze Nga Wong; Shao Ling Soh; Muhammad Khan; Hyeoneui Kim; Jeffrey S. Grethe; Burak Ozyurt; Hua Xu; Xiaoling Chen; Ruiling Liu; Anupama E. Gururaj; Ergin Soysal; Yueling Li; Claudiu Farcas; Alejandra Gonzalez-Beltran; Philippe Rocca-Serra; Ian Fore; Ronald Margolis; George Alter; Susanna-Assunta Sansone; Lucila Ohno-Machado

Collaboration


Dive into Anupama E. Gururaj's collaborations.

Top Co-Authors

Hua Xu

University of Texas Health Science Center at Houston

Xiaoling Chen

University of Texas Health Science Center at Houston

Hyeoneui Kim

University of California

Ian Fore

National Institutes of Health

Trevor Cohen

University of Texas Health Science Center at Houston
