Publication


Featured research published by Ayush Singhal.


PLOS Computational Biology | 2016

Text Mining Genotype-Phenotype Relationships from Biomedical Literature for Database Curation and Precision Medicine.

Ayush Singhal; Michael Simmons; Zhiyong Lu

The practice of precision medicine will ultimately require databases of genes and mutations for healthcare providers to reference in order to understand the clinical implications of each patient’s genetic makeup. Although the highest quality databases require manual curation, text mining tools can facilitate the curation process, increasing accuracy, coverage, and productivity. However, to date there are no available text mining tools that offer high-accuracy performance for extracting disease-gene-variant triplets from biomedical literature. In this paper we propose a high-performance machine learning approach to automate the extraction of disease-gene-variant triplets from biomedical literature. Our approach is unique because we identify the genes and protein products associated with each mutation from not just the local text content, but from a global context as well (from the Internet and from all literature in PubMed). Our approach also incorporates protein sequence validation and disease association using a novel text-mining-based machine learning approach. We extract disease-gene-variant triplets from all abstracts in PubMed related to a set of ten important diseases (breast cancer, prostate cancer, pancreatic cancer, lung cancer, acute myeloid leukemia, Alzheimer’s disease, hemochromatosis, age-related macular degeneration (AMD), diabetes mellitus, and cystic fibrosis). We then evaluate our approach in two ways: (1) a direct comparison with the state of the art using benchmark datasets; (2) a validation study comparing the results of our approach with entries in a popular human-curated database (UniProt) for each of the previously mentioned diseases. In the benchmark comparison, our full approach achieves a 28% improvement in F1-measure (from 0.62 to 0.79) over the state-of-the-art results. For the validation study with UniProt Knowledgebase (KB), we present a thorough analysis of the results and errors.
Across all diseases, our approach returned 272 triplets (disease-gene-variant) that overlapped with entries in UniProt and 5,384 triplets without overlap in UniProt. Analysis of the overlapping triplets and of a stratified sample of the non-overlapping triplets revealed accuracies of 93% and 80% for the respective categories (cumulative accuracy, 77%). We conclude that our process represents an important and broadly applicable improvement to the state of the art for curation of disease-gene-variant relationships.
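The F1-measure reported in the benchmark comparison above can be illustrated with a minimal sketch. It assumes that predicted and gold-standard triplets are compared by exact match; the abstract does not state the matching criteria used in the benchmark, and the function name and the placeholder triplets below are hypothetical.

```python
def precision_recall_f1(predicted, gold):
    """Return (precision, recall, F1) for two collections of triplets.

    Assumption: a predicted (disease, gene, variant) triplet counts as
    correct only if it exactly matches a gold-standard triplet.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Placeholder triplets for illustration only (not real curated data).
gold = {("disease A", "GENE1", "V1"),
        ("disease A", "GENE2", "V2"),
        ("disease B", "GENE3", "V3")}
predicted = {("disease A", "GENE1", "V1"),
             ("disease B", "GENE9", "V9")}

precision, recall, f1 = precision_recall_f1(predicted, gold)
print(precision, recall, f1)
```

F1 is the harmonic mean of precision and recall, so it rewards an extractor only when both are reasonably high.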


Journal of the American Medical Informatics Association | 2016

Text mining for precision medicine: automating disease-mutation relationship extraction from biomedical literature

Ayush Singhal; Michael Simmons; Zhiyong Lu

OBJECTIVE: Identifying disease-mutation relationships is a significant challenge in the advancement of precision medicine. The aim of this work is to design a tool that automates the extraction of disease-related mutations from biomedical text to advance database curation for the support of precision medicine.

MATERIALS AND METHODS: We developed a machine-learning (ML) based method to automatically identify the mutations mentioned in the biomedical literature related to a particular disease. In order to predict a relationship between the mutation and the target disease, several features, such as statistical features, distance features, and sentiment features, were constructed. Our ML model was trained with a pre-labeled dataset consisting of manually curated information about mutation-disease associations. The model was subsequently used to extract disease-related mutations from larger biomedical literature corpora.

RESULTS: The performance of the proposed approach was assessed using a benchmarking dataset. Results show that our proposed approach gains significant improvement over the previous state of the art and obtains F-measures of 0.880 and 0.845 for prostate and breast cancer mutations, respectively.

DISCUSSION: To demonstrate its utility, we applied our approach to all abstracts in PubMed for 3 diseases (including a non-cancer disease). The mutations extracted were then manually validated against human-curated databases. The validation results show that the proposed approach is useful in a real-world setting to extract uncurated disease mutations from the biomedical literature.

CONCLUSIONS: The proposed approach improves the state of the art for mutation-disease extraction from text. It is scalable and generalizable to identify mutations for any disease at a PubMed scale.


web intelligence | 2013

Leveraging Web Intelligence for Finding Interesting Research Datasets

Ayush Singhal; Ravindra Kasturi; Vidyashankar Sivakumar; Jaideep Srivastava

The problem of matching user interests to items is at the core of recommendation systems and search engines. This problem is well studied in different contexts such as item, document, music and movie recommendations. For the purpose of recommendation, these systems store the context or the meta-data information about the item of interest (e.g. user ratings for books, tags, price, etc.). However, the general approaches for finding relevant items for recommendation cannot be directly applied when the context or meta-data information about the item of interest is missing. In this paper we describe an algorithmic approach to handle this problem of missing context for items. In the proposed approach we have extended the context of the user's interest and developed an unsupervised algorithm to find the items of interest for the user. Finally, the items are ranked based on their relevance to the user's interest. We study this problem in the domain of dataset recommendation, where the meta-data information about the datasets is missing due to the lack of a coherent and complete repository for research datasets. We evaluate the performance of the proposed framework on a real-world dataset consisting of 20 user queries. We find that the proposed framework can recommend datasets for user queries with a recall of 90% in the top-4 recommendations. We also compared the performance of the dataset finding algorithm with the state-of-the-art supervised classification approach. We get a significant improvement of 36% using the proposed algorithm.
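The "recall in the top-4 recommendations" figure above can be read as recall at a cutoff k. The sketch below assumes the common definition (the fraction of a query's relevant items that appear among the first k ranked results, averaged over queries); the abstract does not spell out the exact definition used, and all names and sample data here are hypothetical.

```python
def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant items found in the first k ranked items."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("relevant set must be non-empty")
    top_k = set(ranked[:k])
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(queries, k):
    """Average recall@k over (ranked_list, relevant_set) pairs."""
    if not queries:
        raise ValueError("at least one query is required")
    return sum(recall_at_k(r, rel, k) for r, rel in queries) / len(queries)


# Hypothetical ranked dataset recommendations for two user queries.
queries = [
    (["d1", "d2", "d3", "d4", "d5"], {"d1", "d5"}),  # one of two hits in top 4
    (["d7", "d8", "d9", "d6", "d2"], {"d8"}),        # the only hit is in top 4
]
print(mean_recall_at_k(queries, k=4))
```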


Database | 2016

Pressing needs of biomedical text mining in biocuration and beyond: opportunities and challenges

Ayush Singhal; Robert Leaman; Natalie L. Catlett; Thomas Lemberger; Johanna McEntyre; Shawn W. Polson; Ioannis Xenarios; Cecilia N. Arighi; Zhiyong Lu

Text mining in the biomedical sciences is rapidly transitioning from small-scale evaluation to large-scale application. In this article, we argue that text-mining technologies have become essential tools in real-world biomedical research. We describe four large-scale applications of text mining, as showcased during a recent panel discussion at the BioCreative V Challenge Workshop. We draw on these applications as case studies to characterize common requirements for successfully applying text-mining techniques to practical biocuration needs. We note that system ‘accuracy’ remains a challenge and identify several additional common difficulties and potential research directions including (i) the ‘scalability’ issue due to the increasing need of mining information from millions of full-text articles, (ii) the ‘interoperability’ issue of integrating various text-mining systems into existing curation workflows and (iii) the ‘reusability’ issue of applying trained systems to text genres that were not seen during development. We then describe related efforts within the text-mining community, with a special focus on the BioCreative series of challenge workshops. We believe that focusing on the near-term challenges identified in this work will amplify the opportunities afforded by the continued adoption of text-mining tools. Finally, in order to sustain the curation ecosystem and have text-mining systems adopted for practical benefits, we call for increased collaboration between text-mining researchers and various stakeholders, including researchers, publishers and biocurators.


Advances in Experimental Medicine and Biology | 2016

Text Mining for Precision Medicine: Bringing Structure to EHRs and Biomedical Literature to Understand Genes and Health

Michael Simmons; Ayush Singhal; Zhiyong Lu

The key question of precision medicine is whether it is possible to find clinically actionable granularity in diagnosing disease and classifying patient risk. The advent of next-generation sequencing and the widespread adoption of electronic health records (EHRs) have provided clinicians and researchers a wealth of data and made possible the precise characterization of individual patient genotypes and phenotypes. Unstructured text, found in biomedical publications and clinical notes, is an important component of genotype and phenotype knowledge. Publications in the biomedical literature provide essential information for interpreting genetic data. Likewise, clinical notes contain the richest source of phenotype information in EHRs. Text mining can render these texts computationally accessible and support information extraction and hypothesis generation. This chapter reviews the mechanics of text mining in precision medicine and discusses several specific use cases, including database curation for personalized cancer medicine, patient outcome prediction from EHR-derived cohorts, and pharmacogenomic research. Taken as a whole, these use cases demonstrate how text mining enables effective utilization of existing knowledge sources and thus promotes increased value for patients and healthcare systems. Text mining is an indispensable tool for translating genotype-phenotype data into effective clinical care that will undoubtedly play an important role in the eventual realization of precision medicine.


web intelligence | 2013

Automating Document Annotation Using Open Source Knowledge

Ayush Singhal; Ravindra Kasturi; Jaideep Srivastava

Annotating documents with relevant and comprehensive keywords helps readers quickly get an overview of any document. The problem of document annotation is addressed in the literature under two broad classes of techniques, namely key phrase extraction and key phrase abstraction. In this paper, we propose a novel approach to generate summary phrases for research documents. Given the dynamic nature of scientific research, it has become important to incorporate new and popular scientific terminologies in document annotations. For this purpose, we have used crowd-sourced knowledge bases like Wikipedia and WikiCFP (an open source information source for calls for papers) for automating key phrase generation. Also, we have taken into account the lack of availability of the document's content (due to protective policies) and developed a global-context-based key phrase identification approach. We show that given only the title of a document, the proposed approach generates its global context information using academic search engines like Google Scholar. We evaluated the performance of the proposed approach on a real-world dataset obtained from a computer science research document corpus. We quantitatively evaluated the performance of the proposed approach and compared it with two baseline approaches.


advances in social networks analysis and mining | 2013

Dynamics of trust reciprocation in multi-relational networks

Ayush Singhal; Karthik Subbian; Jaideep Srivastava; Tamara G. Kolda; Ali Pinar

Understanding the dynamics of reciprocation is of great interest in sociology and computational social science. The recent growth of Massively Multi-player Online Games (MMOGs) has provided unprecedented access to large-scale data which enables us to study such complex human behavior in a more systematic manner. In this paper, we consider three different networks in the EverQuest2 game: chat, trade, and trust. The chat network has the highest level of reciprocation (33%) because there are essentially no barriers to it. The trade network has a lower rate of reciprocation (27%) because it has the obvious barrier of requiring goods or money for exchange; moreover, there is no clear benefit to returning a trade link except in terms of social connections. The trust network has the lowest reciprocation (14%) because this equates to sharing certain within-game assets such as weapons, and so there is a high barrier for such connections. In general, we observe that reciprocation rate is inversely related to the barrier level in these networks. We also note that reciprocation has connections across the heterogeneous networks. Our experiments indicate that players make use of the medium-barrier reciprocations to strengthen a relationship. We hypothesize that lower-barrier interactions are an important component to predicting higher-barrier ones. We verify our hypothesis using predictive models for trust reciprocations with features from trade interactions. Incorporating the number of trades (both before and after the initial trust link) boosts our ability to predict whether the trust will be reciprocated by up to 11% with respect to the AUC. More generally, we see strong correlations across the different networks and emphasize that network dynamics, such as reciprocation, cannot be studied in isolation on just a single type of connection.
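As a rough illustration of the reciprocation rates quoted above, the sketch below computes the fraction of directed links in a network that are returned in the opposite direction. This static definition is an assumption: the abstract does not give the paper's exact measure, which may account for the time order of the initial and returned links. The function name and the sample edges are hypothetical.

```python
def reciprocation_rate(edges):
    """Fraction of distinct directed edges (u, v) whose reverse (v, u) exists.

    Self-loops and duplicate edges are ignored. Returns 0.0 for an
    empty network.
    """
    edge_set = {(u, v) for u, v in edges if u != v}
    if not edge_set:
        return 0.0
    reciprocated = sum(1 for (u, v) in edge_set if (v, u) in edge_set)
    return reciprocated / len(edge_set)


# Hypothetical trust links between players: only the a<->b link is mutual.
trust_edges = [("a", "b"), ("b", "a"), ("a", "c"), ("d", "a")]
print(reciprocation_rate(trust_edges))
```

Computing the same quantity separately for the chat, trade and trust edge lists would give one rate per network, which is the kind of comparison the abstract describes.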


ACM Transactions on Internet Technology | 2017

Formation and Reciprocation of Dyadic Trust

Atanu Roy; Ayush Singhal; Jaideep Srivastava

This paper reports a detailed empirical study of interpersonal trust in a multi-relational online social network. This study addresses two main aspects of interpersonal trust: formation and reciprocation. Computational models developed for these processes using multi-relational networks provide interesting insights about online social interactions. Our findings for trust formation (initiation) indicate a strong role of lower-familiarity interactions before trust (a high-familiarity relationship) is formed. Similarly, trust reciprocation is not automatic, but strongly depends on enough lower-familiarity interactions. This study is the first quantification of the “scaffolding role” played by lower-familiarity interactions in the formation of high-familiarity relationships.


international conference on web intelligence mining and semantics | 2014

Generating Semantic Annotations For Research Datasets

Ayush Singhal; Jaideep Srivastava

Annotations are important for the description of any object. They give an understanding of the object in summary form. Annotations, unlike tags, are a structured form of meta-data information. The best structured information is prepared by humans. However, given the large volume and variety of objects like images, videos and documents, to name a few, it is practically impossible to annotate all the objects in the world. In such a situation, automated approaches to assigning semantically correct and structured annotations are extremely important. In this paper we propose a novel problem of semantic annotation of research datasets. The explosion in the usage of social media and various electronic devices has led to the collection of huge volumes of datasets for scientific research. Although most of the datasets are available online, the lack of semantic annotations/meta-data and the lack of a unified public repository have made it difficult for researchers to browse through the datasets even with popular search engines. In this work we propose an algorithmic approach to automate the task of annotating the datasets in a structured and semantic manner. We have used knowledge from the World Wide Web and organized knowledge bases such as DBpedia, YAGO, Freebase and WordNet to derive context and annotations for the research datasets. The proposed approach is evaluated on two real-world datasets, namely the UCI dataset repository and the SNAP dataset collections. Using various experimental setups we show that the proposed approach outperforms the baseline approaches. We also perform a case study to compare our results with the Google search engine. We find that using the semantic annotations the search accuracy increases by 18% over the normal search for datasets.


information reuse and integration | 2014

Leveraging the web for automating tag expansion for low-content items

Ayush Singhal; Jaideep Srivastava

Tags, as high-quality semantic descriptors, are used in categorization, clustering and efficient retrieval of various items in the web corpus. Images, videos, songs and similar multimedia items are the most common items which are tagged either manually or in a semiautomatic manner. However, the tagging process becomes complicated when the content structure of an item is not interpretable. Such a problem occurs in items like scientific research datasets or documents with very little text content. In this work, we propose a generalized approach to automate tag expansion for such low-content items. We leverage intelligence of the web to generate secondary content for such items for the tag expansion process. While automating tag expansion, we also address the problem of topic drift by automating removal of the noisy tags from the set of candidate new tags. The effectiveness of the proposed approach is tested on a real-world dataset. The performance of the proposed approach is compared with Wikipedia-based nearest neighbor tagging (WikiSem) and non-negative matrix factorization (NMF) based tag expansion approaches. Based on the Mean Reciprocal Rank (MRR) metric, the proposed approach was twice as accurate as the WikiSem baseline (0.27 vs 0.13) and at least 2.25 times as accurate as the NMF baselines (0.27 vs 0.12).
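The Mean Reciprocal Rank (MRR) metric used in the comparison above can be sketched as follows. This is the standard textbook definition (the average over items of 1/rank of the first correct suggestion, with 0 when no correct suggestion appears), not code from the paper; the function name and the example tags are hypothetical.

```python
def mean_reciprocal_rank(ranked_lists, relevant_sets):
    """Average of 1/rank of the first relevant entry in each ranked list.

    A list that contains no relevant entry contributes 0 to the average.
    """
    if len(ranked_lists) != len(relevant_sets):
        raise ValueError("need one relevant set per ranked list")
    if not ranked_lists:
        raise ValueError("at least one ranked list is required")
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for rank, tag in enumerate(ranked, start=1):
            if tag in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)


# Hypothetical expanded tags for three low-content items.
suggested = [["graph", "social", "web"], ["image", "vision"], ["audio", "speech"]]
correct = [{"social"}, {"image"}, {"text"}]
print(mean_reciprocal_rank(suggested, correct))
```

In this example the first correct tags sit at ranks 2 and 1, and the third item has none, so the three reciprocal ranks being averaged are 1/2, 1 and 0.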

Collaboration

Top Co-Authors

Jaideep Srivastava
Qatar Computing Research Institute

Zhiyong Lu
National Institutes of Health

Michael Simmons
National Institutes of Health

Tamara G. Kolda
Sandia National Laboratories

Ali Pinar
Sandia National Laboratories

Atanu Roy
University of Minnesota

Robert Leaman
National Institutes of Health