Asish Ghoshal | Researchain

Publication

Featured research published by Asish Ghoshal.

BMC Genomics | 2015

MicroRNA target prediction using thermodynamic and sequence curves

Asish Ghoshal; Raghavendran Shankar; Saurabh Bagchi; Somali Chaterji

MicroRNAs (miRNAs) are small regulatory RNAs that mediate RNA interference by binding to various mRNA target regions. Several computational methods have been developed to identify the target mRNAs of miRNAs. However, these methods represent all contributory features, primarily thermodynamic or sequence-based ones, as scalars. Further, most of them target only canonical sites, that is, sites with “seed” complementarity. Here, we present a machine-learning classification scheme, titled Avishkar, which captures the spatial profile of miRNA-mRNA interactions via smooth B-spline curves, separately for various input features such as thermodynamic and sequence features. Further, we use a principled approach to uniformly model canonical and non-canonical seed matches, using a novel seed enrichment metric. We demonstrate that a large number of seed-match patterns have high enrichment values that are conserved across species, and that the majority of miRNA binding sites involve non-canonical matches, corroborating recent findings. Using the spatial curves together with popular categorical features, such as target site length and location, we train a linear SVM model on experimental CLIP-seq data. Our model significantly outperforms all established methods for both canonical and non-canonical sites, while using a much larger candidate miRNA-mRNA interaction set than prior work. Evaluated using ROC curves, this efficient SVM-based model performs about 20% better than the state-of-the-art across species (human or mouse) and target types (canonical or non-canonical). To the best of our knowledge, we provide the first distributed framework for microRNA target prediction based on Apache Hadoop and Spark. All source code and data are publicly available at https://bitbucket.org/cellsandmachines/avishkar.
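The central representational idea above is to turn a per-position feature profile along a candidate binding site into a smooth curve rather than a single scalar. The following is a minimal sketch of one way to do that, assuming a clamped cubic B-spline with uniformly spaced knots fitted by lightly ridge-regularized least squares, with the fitted coefficients then used as a fixed-length feature vector for a linear classifier. The function names, the number of basis functions, and the ridge term are illustrative assumptions, not Avishkar's actual implementation (the repository linked above holds that).

```python
from typing import List, Sequence


def clamped_knots(n_basis: int, degree: int = 3) -> List[float]:
    """Clamped, uniformly spaced knot vector on [0, 1] for n_basis B-splines."""
    if n_basis < degree + 1:
        raise ValueError("need at least degree + 1 basis functions")
    n_interior = n_basis - degree - 1
    interior = [j / (n_interior + 1) for j in range(1, n_interior + 1)]
    return [0.0] * (degree + 1) + interior + [1.0] * (degree + 1)


def bspline_basis(i: int, k: int, t: float, knots: Sequence[float]) -> float:
    """Cox-de Boor recursion: value of the i-th degree-k B-spline at t."""
    if k == 0:
        if knots[i] <= t < knots[i + 1]:
            return 1.0
        # Close the last non-empty interval so that t == 1.0 is covered.
        if t == knots[-1] and knots[i] < knots[i + 1] == knots[-1]:
            return 1.0
        return 0.0
    value = 0.0
    left = knots[i + k] - knots[i]
    if left > 0:
        value += (t - knots[i]) / left * bspline_basis(i, k - 1, t, knots)
    right = knots[i + k + 1] - knots[i + 1]
    if right > 0:
        value += (knots[i + k + 1] - t) / right * bspline_basis(i + 1, k - 1, t, knots)
    return value


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def profile_to_curve_features(profile: Sequence[float], n_basis: int = 6,
                              degree: int = 3, ridge: float = 1e-6) -> List[float]:
    """Hypothetical helper: least-squares B-spline fit of a per-position profile.

    Positions are rescaled to [0, 1], so profiles of any length map to a
    vector of n_basis spline coefficients.
    """
    if len(profile) < 2:
        raise ValueError("profile needs at least two positions")
    knots = clamped_knots(n_basis, degree)
    ts = [r / (len(profile) - 1) for r in range(len(profile))]
    design = [[bspline_basis(j, degree, t, knots) for j in range(n_basis)] for t in ts]
    # Normal equations (B^T B + ridge * I) c = B^T y; the small ridge keeps them solvable.
    gram = [[sum(row[p] * row[q] for row in design) + (ridge if p == q else 0.0)
             for q in range(n_basis)] for p in range(n_basis)]
    rhs = [sum(row[p] * y for row, y in zip(design, profile)) for p in range(n_basis)]
    return solve(gram, rhs)


# Two made-up profiles of different lengths map to equally long vectors.
short = profile_to_curve_features([-1.2, -3.4, -5.0, -4.1, -2.2, -0.5, 0.1, 0.3])
long_ = profile_to_curve_features([-0.1 * i for i in range(30)])
print(len(short), len(long_))  # 6 6
```

Because every profile is resampled onto the same basis, sites of different lengths become directly comparable vectors, which is what a linear SVM needs; a library routine such as SciPy's least-squares spline fitting could replace the hand-written solver.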

Communication Systems and Networks | 2016

Fast training on large genomics data using distributed Support Vector Machines

Nawanol Theera-Ampornpunt; Seong Gon Kim; Asish Ghoshal; Saurabh Bagchi; Somali Chaterji

The field of genomics has seen an explosion of high-quality data, with tremendous strides made both in genomic sequencing instruments and in the computational genomics applications meant to make sense of the data. A common use case for genomics data is to determine whether a specific genetic signature is correlated with some disease manifestation. The Support Vector Machine (SVM) is a widely used classifier in the computational literature, and previous studies have used SVMs successfully for this use case. However, SVMs suffer from a widely recognized scalability problem in both memory use and computation time. Whether training such classifiers can scale to the massive sizes that characterize many genomics datasets is as yet an open question. We answer that question here for a specific dataset, where the task is to decipher whether a regulatory module with a particular combinatorial epigenetic “pattern” will regulate the expression of a gene. However, the specifics of the dataset are likely of less relevance to the claims of our work. We take a proposed theoretical technique for efficient SVM training, namely Cascade SVM, build our classifier, EP-SVM, on it, and empirically evaluate how it scales to the large genomics dataset. We implement Cascade SVM on the Apache Spark platform and open-source this implementation. Through our evaluation, we bring out the computational cost of each application process, the way the overall workload is distributed among multiple processes, which can potentially execute on different cores or different machines, and the cost of data transfer to different cores or machines. We believe we are the first to shed light on the computational and network costs of training an SVM on a multi-dimensional genomics dataset. We also evaluate the accuracy of the classifier as a function of the SVM model parameters.
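Cascade SVM trains on disjoint partitions independently and passes only each partition's support vectors up to the next layer, which is what makes the workload distributable across processes. The single-machine sketch below shows that data flow in plain Python. It is not EP-SVM or its Spark implementation: the round-robin partitioning, the Pegasos-style primal subgradient solver used as a stand-in trainer, the margin tolerance that decides which examples survive, and all function names are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# (feature vector, label in {-1, +1}); a constant 1.0 feature stands in for the bias.
Example = Tuple[List[float], int]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(w, x))


def train_linear_svm(data: Sequence[Example], lam: float = 0.01,
                     iters: int = 300) -> List[float]:
    """Stand-in trainer: full-batch Pegasos-style subgradient steps on the primal."""
    dim, n = len(data[0][0]), len(data)
    w = [0.0] * dim
    for t in range(1, iters + 1):
        eta = 1.0 / (lam * t)
        grad = [0.0] * dim
        for x, y in data:
            if y * dot(w, x) < 1.0:  # hinge loss is active for this example
                for d in range(dim):
                    grad[d] += y * x[d]
        w = [(1.0 - eta * lam) * w[d] + eta * grad[d] / n for d in range(dim)]
    return w


def support_vectors(data: Sequence[Example], w: Sequence[float],
                    tol: float = 0.25) -> List[Example]:
    """Examples on or inside the (tolerance-widened) margin survive to the next layer."""
    svs = [(x, y) for x, y in data if y * dot(w, x) <= 1.0 + tol]
    return svs if svs else list(data)  # never pass an empty set upward


def cascade_svm(data: List[Example], n_parts: int = 4, lam: float = 0.01,
                iters: int = 300) -> Tuple[List[float], int]:
    """Single-pass cascade: train per partition, keep support vectors,
    merge neighbouring pairs of survivors, repeat until one model remains.

    Returns the final weights and the size of the final training set.
    """
    layer = [part for part in (data[k::n_parts] for k in range(n_parts)) if part]
    while True:
        # In a distributed setting, each partition in a layer trains independently.
        models = [train_linear_svm(part, lam, iters) for part in layer]
        if len(layer) == 1:
            return models[0], len(layer[0])
        survivors = [support_vectors(part, w) for part, w in zip(layer, models)]
        layer = [survivors[k] + (survivors[k + 1] if k + 1 < len(survivors) else [])
                 for k in range(0, len(survivors), 2)]


# Toy separable data on a grid, labelled by the sign of x0 + x1.
toy = [([i / 5, j / 5, 1.0], 1 if i + j > 0 else -1)
       for i in range(-5, 6) for j in range(-5, 6) if i + j != 0]
w, n_final = cascade_svm(toy, n_parts=4)
accuracy = sum(1 for x, y in toy if y * dot(w, x) > 0) / len(toy)
print(n_final, len(toy), accuracy)
```

The final model only ever sees the examples that survived the lower layers, which is where the savings in memory and training time come from. Cascade SVM as usually described can also feed the final support vectors back into the first layer and iterate until they stop changing; this sketch stops after one pass.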

International Conference on Bioinformatics | 2015

An ensemble SVM model for the accurate prediction of non-canonical MicroRNA targets

Asish Ghoshal; Saurabh Bagchi; Somali Chaterji

Background: MicroRNAs are small non-coding endogenous RNAs responsible for the post-transcriptional regulation of genes. Given that large numbers of human genes are targeted by microRNAs, understanding the precise mechanism of microRNA action and accurately mapping their targets is of paramount importance; this will uncover the role of microRNAs in development, differentiation, and disease pathogenesis. However, current state-of-the-art computational methods for microRNA target prediction suffer from false-positive rates too high for them to be useful in practice.

Results: In this paper, we develop a suite of models for microRNA target prediction, under the banner Avishkar, that outperform state-of-the-art protocols. Specifically, our final model achieves an average true positive rate of more than 75% at a false positive rate of 20% for non-canonical microRNA target sites in humans. This is an improvement of over 150% in the true positive rate for non-canonical sites over the best competing protocol. We achieve this performance by representing the thermodynamic and sequence profiles of microRNA-mRNA interaction as curves, devising a novel seed-enrichment metric to model seed matches as well as all possible non-canonical matches, and learning an ensemble of microRNA family-specific non-linear SVM classifiers. We provide an easy-to-use system, built on top of Apache Spark, for large-scale interactive analysis and prediction of microRNA targets. All operations in our system, namely candidate set generation, feature generation and transformation, training, prediction, and computation of performance metrics, are fully distributed and scalable.

Availability: All source code and sample data are available at https://bitbucket.org/cellsandmachines/avishkar. We also provide scalable implementations of kernel SVM using Apache Spark, which can be used to solve large-scale non-linear binary classification problems, at https://bitbucket.org/cellsandmachines/kernelsvmspark.
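The headline result above is a true positive rate read off the ROC curve at a fixed false positive rate. A small sketch of that computation follows, assuming higher scores mean "more likely a target" and labels of 1 (target) and 0 (non-target). The function name and the 20% default are illustrative, and the paper's averaging across its own evaluation runs is not reproduced here.

```python
from typing import Sequence


def tpr_at_fpr(scores: Sequence[float], labels: Sequence[int],
               max_fpr: float = 0.2) -> float:
    """Best true positive rate over all score thresholds whose FPR <= max_fpr.

    Labels: 1 = positive (target site), anything else = negative.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both positive and negative examples")
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    tp = fp = 0
    best = 0.0
    k = 0
    while k < len(ranked):
        threshold = ranked[k][0]
        # Lower the threshold to this score; tied scores enter together.
        while k < len(ranked) and ranked[k][0] == threshold:
            if ranked[k][1] == 1:
                tp += 1
            else:
                fp += 1
            k += 1
        if fp / neg <= max_fpr:
            best = max(best, tp / pos)
    return best


scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]
print(tpr_at_fpr(scores, labels))  # 0.8
```

Handling tied scores as a group matters here: splitting a tie would report an operating point that no single threshold can actually achieve.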

Allerton Conference on Communication, Control, and Computing | 2016

From behavior to sparse graphical games: Efficient recovery of equilibria

Asish Ghoshal; Jean Honorio

In this paper we study the problem of exactly recovering the set of pure-strategy Nash equilibria (PSNE) of a graphical game from noisy observations of the players' joint actions alone. We consider sparse linear influence games, a parametric class of graphical games with linear payoffs, represented by directed graphs with n nodes (players) and in-degree at most k. We present an algorithm based on ℓ1-regularized logistic regression that recovers the PSNE set exactly and is both computationally efficient (it runs in polynomial time) and statistically efficient (its sample complexity is logarithmic in n). Specifically, we show that the number of samples sufficient for exact PSNE recovery scales as O(poly(k) log n). We also validate our theoretical results with synthetic experiments.
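To make the recovery target concrete, the sketch below enumerates the PSNE set of a small linear influence game by brute force. It assumes the standard formulation from the graphical-games literature, which may differ in details from the paper's: actions are in {-1, +1}, and player i's payoff is x_i times (the weighted sum of the other players' actions minus a threshold b_i), so row i of W holds the weights of i's parents and in-degree at most k means at most k nonzeros per row. The ℓ1-regularized logistic regression step that estimates W and b from observed joint actions is not shown, and `psne_set` is a hypothetical helper.

```python
from itertools import product
from typing import List, Sequence, Tuple


def psne_set(W: Sequence[Sequence[float]], b: Sequence[float]) -> List[Tuple[int, ...]]:
    """All pure-strategy Nash equilibria of a linear influence game (brute force).

    Assumed payoff for player i: x_i * (sum_{j != i} W[i][j] * x_j - b[i]),
    with x in {-1, +1}^n; the diagonal of W is ignored. A joint action is a
    PSNE when no player gains by flipping, i.e. every payoff is >= 0.
    """
    n = len(b)
    equilibria = []
    for x in product((-1, 1), repeat=n):  # 2**n joint actions: small n only
        if all(x[i] * (sum(W[i][j] * x[j] for j in range(n) if j != i) - b[i]) >= 0
               for i in range(n)):
            equilibria.append(x)
    return equilibria


# Two players who each want to match the other: both coordinated profiles are PSNE.
print(psne_set([[0, 1], [1, 0]], [0, 0]))  # [(-1, -1), (1, 1)]
```

Exact PSNE recovery means reproducing this set, not necessarily the weights themselves: different (W, b) can induce the same equilibria, which is why the guarantee is stated in terms of the PSNE set.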

International Conference on Artificial Intelligence and Statistics | 2018