Raziur Rahman | Researchain

Publication

Featured research published by Raziur Rahman.

PLOS ONE | 2015

A Copula Based Approach for Design of Multivariate Random Forests for Drug Sensitivity Prediction

Saad Haider; Raziur Rahman; Souparno Ghosh; Ranadip Pal

Modeling sensitivity to drugs based on genetic characterizations is a significant challenge in the area of systems medicine. Ensemble-based approaches such as Random Forests have been shown to perform well in both individual sensitivity prediction studies and team science-based prediction challenges. However, Random Forests generate a deterministic predictive model for each drug based on the genetic characterization of the cell lines and ignore the relationships between different drug sensitivities during model generation. This application motivates the development of multivariate ensemble learning techniques that can increase prediction accuracy and improve variable importance ranking by incorporating the relationships between different output responses. In this article, we propose a novel cost criterion that captures the dissimilarity in the output response structure between the training data and node samples as the difference between the two empirical copulas. We illustrate that copulas are suitable for capturing the multivariate structure of output responses independent of the marginal distributions, and that the copula-based multivariate random forest framework can provide more accurate predictions and improved variable selection. The proposed framework has been validated on the Genomics of Drug Sensitivity for Cancer and Cancer Cell Line Encyclopedia databases.
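The abstract does not give the cost criterion in computable form. The following is a minimal sketch under stated assumptions: pseudo-observations are normalized column ranks, the empirical copula is evaluated on a regular grid over the unit cube, and the "difference" between two copulas is taken as their mean squared gap over that grid. The function names and the `grid_size` parameter are hypothetical and not taken from the paper.

```python
from bisect import bisect_right
from itertools import product


def pseudo_observations(samples):
    """Map each column of an n x d sample to normalized ranks in (0, 1]."""
    n = len(samples)
    d = len(samples[0])
    sorted_cols = [sorted(row[j] for row in samples) for j in range(d)]
    # Rank of x in its column = number of column values <= x, divided by n.
    return [[bisect_right(sorted_cols[j], row[j]) / n for j in range(d)]
            for row in samples]


def empirical_copula(pseudo, u):
    """C_n(u): fraction of pseudo-observations that are <= u in every coordinate."""
    hits = sum(all(p[j] <= u[j] for j in range(len(u))) for p in pseudo)
    return hits / len(pseudo)


def copula_dissimilarity(train, node, grid_size=5):
    """Hypothetical criterion: mean squared difference of two empirical copulas on a grid."""
    d = len(train[0])
    p_train = pseudo_observations(train)
    p_node = pseudo_observations(node)
    ticks = [k / grid_size for k in range(1, grid_size + 1)]
    grid = list(product(ticks, repeat=d))
    gaps = ((empirical_copula(p_train, u) - empirical_copula(p_node, u)) ** 2
            for u in grid)
    return sum(gaps) / len(grid)
```

Because only ranks enter the computation, applying any strictly increasing transform to a response column leaves the result unchanged, which is the sense in which a copula captures dependence between responses separately from their marginal distributions.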

Bioinformatics | 2017

IntegratedMRF: random forest-based framework for integrating prediction from different data types

Raziur Rahman; John Otridge; Ranadip Pal

Summary: IntegratedMRF is an open-source R implementation for integrating drug response predictions from various genomic characterizations using univariate or multivariate random forests, with several options for error estimation. The integrated framework was developed following the superior performance of random forest-based methods in the NCI-DREAM drug sensitivity prediction challenge. The computational framework can be applied to estimate the mean and confidence interval of drug response prediction errors for ensemble approaches that take various combinations of genetic and epigenetic characterizations as inputs. The multivariate random forest implementation included in the package incorporates the correlations between output responses in the modeling and has been shown to perform better than existing approaches when the drug responses are correlated. A detailed analysis of the provided features is included in the Supplementary Material. Availability and Implementation: The framework has been implemented as an R package, IntegratedMRF, which can be downloaded from https://cran.r-project.org/web/packages/IntegratedMRF/index.html, where further explanation of the package is available. Supplementary information: Supplementary data are available at Bioinformatics online.

Algorithms | 2016

Algorithms for Drug Sensitivity Prediction

Carlos De Niz; Raziur Rahman; Xiangyuan Zhao; Ranadip Pal

Precision medicine entails the design of therapies that are matched to each individual patient. Thus, predictive modeling of drug responses for specific patients constitutes a significant challenge for personalized therapy. In this article, we review approaches that have been proposed to tackle the drug sensitivity prediction problem, especially with respect to personalized cancer therapy. We first discuss modeling approaches that are based on genomic characterizations alone and then extend the discussion to modeling techniques that integrate both genomic and functional information. The paper includes a comparative analysis of the prediction performance of four representative algorithms (elastic net, random forest, kernelized Bayesian multi-task learning and deep learning), reflecting the broad classes of regularized linear, ensemble, kernelized and neural network-based models, respectively. The review also considers the challenges that need to be addressed for successful implementation of the algorithms in clinical practice.

Scientific Reports | 2017

Heterogeneity Aware Random Forest for Drug Sensitivity Prediction

Raziur Rahman; Kevin Matlock; Souparno Ghosh; Ranadip Pal

Samples collected in pharmacogenomics databases typically belong to various cancer types. When designing a drug sensitivity predictive model from such a database, a natural question is whether a model trained on diverse, inter-tumor heterogeneous samples will perform similarly to a predictive model that takes the heterogeneity of the samples into consideration in model training and prediction. We explore this hypothesis and observe that ensemble model predictions obtained when the cancer type is known outperform predictions obtained when that information is withheld, even when the sample sizes for the former are considerably lower than the combined sample size. To incorporate the heterogeneity idea into the commonly used ensemble-based predictive model of Random Forests, we propose Heterogeneity Aware Random Forests (HARF), which assign weights to the trees based on the category of the sample. We treat heterogeneity as a latent class allocation problem and present a covariate-free class allocation approach based on the distribution of leaf nodes of the model ensemble. Applications on the CCLE and GDSC databases show that HARF outperforms traditional Random Forest when the average drug responses of cancer types are different.
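The abstract states that HARF weights trees according to the sample's category but does not describe how the weights or class allocations are computed. As a hedged illustration only, the sketch below assumes that class-specific tree weights are already available and that a sample's category is given as class-membership probabilities (a soft allocation; a known cancer type is the special case of probability 1). The function name and its inputs are hypothetical.

```python
def harf_style_prediction(tree_preds, class_weights, class_probs):
    """Hypothetical combination of tree outputs with class-specific tree weights.

    tree_preds:    per-tree predictions for one sample.
    class_weights: dict mapping class -> nonnegative tree weights (positive sum).
    class_probs:   dict mapping class -> probability that the sample is in that class.
    """
    total = 0.0
    for cls, prob in class_probs.items():
        weights = class_weights[cls]
        norm = sum(weights)  # normalize so each class's weights sum to 1
        class_pred = sum(w * y for w, y in zip(weights, tree_preds)) / norm
        total += prob * class_pred  # average class-level predictions by membership
    return total
```

With equal weights for every class this reduces to the ordinary Random Forest average, so any gain would have to come from how the class-specific weights are chosen.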

Cancer Informatics | 2015

Design of Probabilistic Random Forests with Applications to Anticancer Drug Sensitivity Prediction

Raziur Rahman; Saad Haider; Souparno Ghosh; Ranadip Pal

Random forests consisting of an ensemble of regression trees with equal weights are frequently used for the design of predictive models. In this article, we consider an extension of the methodology by representing the regression trees in the form of probabilistic trees and analyzing the nature of heteroscedasticity. The probabilistic tree representation allows for analytical computation of confidence intervals (CIs), and tree weight optimization is expected to provide tighter CIs with comparable performance in mean error. We approach the prediction of the ensemble of probabilistic trees from two perspectives: as a mixture distribution and as a weighted sum of correlated random variables. We applied our methodology to the drug sensitivity prediction problem on synthetic data and the Cancer Cell Line Encyclopedia dataset and illustrated that tree weights can be selected to reduce the average length of the CI without an increase in mean error.
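The two perspectives in the abstract lead to different variance formulas for the ensemble prediction. A minimal sketch, assuming each tree's prediction for a sample has a known mean and variance (for example from a Gaussian leaf model, which is our assumption) and that tree weights sum to 1: the mixture variance is the weighted second moment minus the squared mean, while the weighted-sum variance is the quadratic form of the weights with the trees' covariance matrix. Function names are hypothetical.

```python
import math


def mixture_moments(weights, means, variances):
    """Mean and variance of a finite mixture whose component weights sum to 1."""
    mean = sum(w * m for w, m in zip(weights, means))
    second_moment = sum(w * (v + m * m) for w, m, v in zip(weights, means, variances))
    return mean, second_moment - mean * mean


def weighted_sum_moments(weights, means, cov):
    """Mean and variance of sum_i w_i * Y_i, where cov is the covariance matrix of Y."""
    k = len(weights)
    mean = sum(w * m for w, m in zip(weights, means))
    var = sum(weights[i] * weights[j] * cov[i][j] for i in range(k) for j in range(k))
    return mean, var


def normal_ci(mean, var, z=1.96):
    """Symmetric interval mean +/- z*sd; only approximate for a non-Gaussian mixture."""
    half_width = z * math.sqrt(var)
    return mean - half_width, mean + half_width
```

In the weighted-sum view, stronger correlation between trees inflates the variance for the same weights, which is one reason the choice of tree weights can change the CI length.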

Bioinformatics | 2018

Sequential feature selection and inference using multi-variate random forests

Joshua Mayer; Raziur Rahman; Souparno Ghosh; Ranadip Pal

Motivation: Random forest (RF) has become a widely popular prediction-generating mechanism. Its strength lies in its flexibility, interpretability and ability to handle a large number of features, typically larger than the sample size. However, this methodology is of limited use if one wishes to identify statistically significant features. Several ranking schemes are available that provide information on the relative importance of the features, but there is a paucity of general inferential mechanisms, particularly in a multivariate setup. We use the conditional inference tree framework to generate an RF where features are deleted sequentially based on explicit hypothesis testing. The resulting sequential algorithm offers an inferentially justifiable, but model-free, variable selection procedure. Significant features are then used to generate a predictive RF. An added advantage of our methodology is that both variable selection and prediction are based on the conditional inference framework and hence are coherent. Results: We illustrate the performance of our Sequential Multi-Response Feature Selection approach through simulation studies and then apply this methodology to the Genomics of Drug Sensitivity for Cancer dataset to identify genetic characteristics that significantly impact drug sensitivities. The significant set of predictors obtained from our method is further validated from a biological perspective. Availability and implementation: https://github.com/jomayer/SMuRF Supplementary information: Supplementary data are available at Bioinformatics online.

BMC Bioinformatics | 2018

Investigation of model stacking for drug sensitivity prediction

Kevin Matlock; Carlos De Niz; Raziur Rahman; Souparno Ghosh; Ranadip Pal

Background: A significant problem in precision medicine is the prediction of drug sensitivity for individual cancer cell lines. Predictive models such as Random Forests have shown promising performance when predicting from individual genomic features such as gene expression. However, the availability of various other data types, including information on multiple tested drugs, calls for predictive models that incorporate these different data types. Results: We explore the predictive performance of model stacking and the effect of stacking on predictive bias and squared error. In addition, we discuss the analytical underpinnings supporting the advantages of stacking in reducing squared error and the inherent bias of random forests in predicting outliers. The framework is tested on a setup including gene expression, drug target, physical properties and drug response information for a set of drugs and cell lines. Conclusion: The performance of individual and stacked models is compared. We note that stacking models built on two heterogeneous datasets provides superior performance to stacking different models built on the same dataset. It is also noted that stacking provides a noticeable reduction in the bias of our predictors when the dominant eigenvalue of the principal axis of variation in the residuals is significantly higher than the remaining eigenvalues.

International Conference on Bioinformatics | 2017

Investigation of Model Stacking for Drug Sensitivity Prediction

Kevin Matlock; Carlos De Niz; Raziur Rahman; Souparno Ghosh; Ranadip Pal

A requirement for precision medicine is the accurate prediction of the sensitivity of an individual patient to a given drug. A very common method for this prediction is the use of Random Forests built on genomic features such as gene expression. However, effective drug sensitivity prediction requires the use of multiple heterogeneous datasets, and it is rare that all such information is available for all drug and cell line combinations. Effectively incorporating all available data into a drug sensitivity prediction problem requires stacking multiple models built using a variety of different methods and data. In this article we investigate stacking in the context of drug sensitivity prediction. First, we examine basic models used in drug sensitivity prediction, including the Random Forest, Neural Network and K-Nearest Neighbor approaches. To stack individual models, we use a linear stacking method built on a held-out validation set. We then investigate which form of stacking is most effective in improving accuracy. The two forms of stacking investigated are vertical stacking, in which the training samples are split and then stacked, and horizontal stacking, in which the features are split before stacking. From a theoretical standpoint, we show that horizontal stacking outperforms vertical stacking, especially as the sample size becomes large. The theory is then supported using both synthetically generated data and data extracted from the Cancer Cell Line Encyclopedia. Despite having the best individual predictor, vertical stacking consistently underperformed horizontal stacking. Finally, we build a stacked model using data extracted from a variety of sources. Gene expression data for individual cell lines are taken from the GDSC database, drug target data are mined from PubChem, and physical features are extracted using the PaDEL-Descriptor software.
Using these data, we aim to predict the Area Under the Curve value for individual drug-cell line combinations. We examine the performance of individual and stacked models. We note that stacking models built on two heterogeneous datasets provides superior performance to stacking different models built on the same dataset. It is also noted that stacking provides a noticeable reduction in the bias of our predictors when the dominant eigenvalue of the principal axis of variation in the residuals is significantly higher than the remaining eigenvalues.
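The abstract mentions a linear stacking method fitted on a held-out validation set without giving its exact form. One plausible version, sketched here as an assumption, is unconstrained least squares with no intercept: each validation sample contributes a row of base-model predictions, and the stacking weights solve the normal equations. The function names are hypothetical; the solver assumes the base-model prediction columns are not collinear.

```python
def fit_linear_stack(base_preds, y):
    """Least-squares stacking weights (no intercept) from held-out validation data.

    base_preds: list of rows; row i holds every base model's prediction for sample i.
    y:          observed responses for the same validation samples.
    """
    k = len(base_preds[0])
    # Normal equations A w = b with A = X^T X and b = X^T y.
    A = [[sum(r[i] * r[j] for r in base_preds) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * t for r, t in zip(base_preds, y)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting reduces A to a diagonal matrix.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                for c in range(col, k):
                    A[r][c] -= f * A[col][c]
                b[r] -= f * b[col]
    return [b[i] / A[i][i] for i in range(k)]


def stack_predict(weights, preds):
    """Combine one sample's base-model predictions with the fitted weights."""
    return sum(w * p for w, p in zip(weights, preds))
```

Adding a column of ones to each row would give the fit an intercept; whether the paper uses an intercept or constrains the weights is not stated in the abstract.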

IEEE EMBS International Conference on Biomedical and Health Informatics | 2016

Analyzing drug sensitivity prediction based on dose response curve characteristics

Raziur Rahman; Ranadip Pal

Precision medicine for cancer involves the design of drug sensitivity prediction models that can predict patient response to various drugs. The drug response is usually represented by a single feature, such as the Area Under the Curve (AUC) or IC50, derived from the experimental dose response curve. In this article, we consider the idea that predicting the dose response curve and then generating the curve features from it, instead of directly predicting the curve characteristics, can increase prediction accuracy. Using the Cancer Cell Line Encyclopedia database, we illustrate that predicting dose response curve points to calculate the AUC, instead of directly predicting the AUC, can reduce the prediction mean squared error and increase the correlation between experimental and predicted values.
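Computing the AUC from predicted curve points is a numerical integration step. The sketch below assumes the trapezoidal rule on a log10 concentration axis, with optional normalization by the log-concentration range so that responses expressed as fractions give an AUC on the same scale; the paper's exact integration and normalization conventions are not stated in the abstract, and the function name is hypothetical.

```python
import math


def auc_from_curve(concentrations, responses, normalize=True):
    """Trapezoidal area under a dose response curve on a log10 concentration axis."""
    if len(concentrations) != len(responses) or len(concentrations) < 2:
        raise ValueError("need at least two matching (concentration, response) points")
    points = sorted(zip(concentrations, responses))  # order by concentration
    xs = [math.log10(c) for c, _ in points]
    ys = [r for _, r in points]
    area = sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2 for i in range(len(xs) - 1))
    # Optionally divide by the width of the log-concentration range.
    return area / (xs[-1] - xs[0]) if normalize else area
```

Under this approach a model would predict the response at each tested concentration, and `auc_from_curve` would then turn those predicted points into a single AUC value for comparison with the experimental one.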

2016 IEEE Healthcare Innovation Point-Of-Care Technologies Conference (HI-POCT) | 2016

A mathematical framework for analyzing drug combination toxicity for personalized medicine applications

Raziur Rahman; Ranadip Pal

The use of drug combinations to increase efficacy and lower resistance to therapy in personalized cancer medicine is increasingly being recognized. Approaches have recently been designed to address the selection of drug combinations that can be highly effective across tumor cells, but limited research has been conducted on the toxicity of these unique drug combinations. In this article, we approach the problem of combination drug toxicity by analyzing drug synergy over in vitro normal cell lines and generating combination drug concentrations whose combined effect on normal cell lines is less than the maximum monotherapy effect at approved concentrations. We present a mathematical framework for combination response estimation among multiple cell cultures, along with a stochastic analysis of prediction uncertainty. Results indicate the ability of the proposed framework to generate feasible combination drug concentrations satisfying monotherapy toxicity constraints.
