Mohammadmahdi R. Yousefi
Ohio State University
Publication
Featured research published by Mohammadmahdi R. Yousefi.
Bioinformatics | 2010
Mohammadmahdi R. Yousefi; Jianping Hua; Chao Sima; Edward R. Dougherty
MOTIVATION: It is commonplace for authors to propose a new classification rule, either the operator construction part or feature selection, and demonstrate its performance on real data sets, which often come from high-dimensional studies, such as from gene-expression microarrays, with small samples. Owing to the variability in feature selection and error estimation, individual reported performances are highly imprecise. Hence, if only the best test results are reported, then these will be biased relative to the overall performance of the proposed procedure. RESULTS: This article characterizes reporting bias with several statistics and computes these statistics in a large simulation study using both modeled and real data. The results appear as curves giving the different reporting biases as functions of the number of samples tested when reporting only the best or second best performance. The study does this for two classification rules, linear discriminant analysis (LDA) and 3-nearest-neighbor (3NN), and for filter and wrapper feature selection (t-test and sequential forward search, respectively). These were chosen on account of their well-studied properties and because they were amenable to the extremely large amount of processing required for the simulations. The results across all the experiments are consistent: there is generally large bias overriding what would be considered a significant performance differential, when reporting the best or second best performing data set. We conclude that there needs to be a database of data sets and that, for those studies depending on real data, results should be reported for all data sets in the database. AVAILABILITY: Companion web site at http://gsp.tamu.edu/Publications/supplementary/yousefi09a/
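The reporting bias described in this abstract can be illustrated with a minimal sketch. It is not the authors' simulation: it assumes, purely for illustration, that every data set has the same true error, that each reported error is a holdout estimate on `n_test` independent test points, and that only the best (smallest) of `m` estimates is reported. The function name and all parameter values are hypothetical.

```python
import random


def simulate_reporting_bias(m, true_error=0.3, n_test=50, trials=2000, seed=0):
    """Estimate E[min of m error estimates] - true_error by Monte Carlo.

    Assumption (not from the paper): each estimate is a binomial holdout
    error rate on n_test points, and all m data sets share one true error.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        estimates = []
        for _ in range(m):
            # Count misclassified test points for one data set.
            errors = sum(1 for _ in range(n_test) if rng.random() < true_error)
            estimates.append(errors / n_test)
        # Only the best-looking (smallest) estimate is "reported".
        total += min(estimates)
    return total / trials - true_error


if __name__ == "__main__":
    for m in (1, 2, 5, 10):
        print(m, round(simulate_reporting_bias(m), 4))
```

Under these assumptions the bias is close to zero when a single data set is reported and becomes increasingly negative (optimistic) as more data sets are tried and only the best is kept.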
BMC Bioinformatics | 2013
Noushin Ghaffari; Mohammadmahdi R. Yousefi; Charles D. Johnson; Ivan Ivanov; Edward R. Dougherty
Background: A key goal of systems biology and translational genomics is to utilize high-throughput measurements of cellular states to develop expression-based classifiers for discriminating among different phenotypes. Recent developments of Next Generation Sequencing (NGS) technologies can facilitate classifier design by providing expression measurements for tens of thousands of genes simultaneously via the abundance of their mRNA transcripts. Because NGS technologies result in a nonlinear transformation of the actual expression distributions, their application can result in data that are less discriminative than would be the actual expression levels themselves, were they directly observable. Results: Using state-of-the-art distributional modeling for the NGS processing pipeline, this paper studies how that pipeline, via the resulting nonlinear transformation, affects classification and feature selection. The effects of different factors are considered and NGS-based classification is compared to SAGE-based classification and classification directly on the raw expression data, which is represented by a very high-dimensional model previously developed for gene expression. As expected, the nonlinear transformation resulting from NGS processing diminishes classification accuracy; however, owing to a larger number of reads, NGS-based classification outperforms SAGE-based classification. Conclusions: Having high numbers of reads can mitigate the degradation in classification performance resulting from the effects of NGS technologies. Hence, when performing an RNA-Seq analysis, using the highest possible coverage of the genome is recommended for the purposes of classification.
IEEE Transactions on Signal Processing | 2012
Mohammadmahdi R. Yousefi; Aniruddha Datta; Edward R. Dougherty
Intervention in gene regulatory networks in the context of Markov decision processes has usually involved finding an optimal one-transition policy, where a decision is made at every transition whether or not to apply treatment. In an effort to model dosing constraints, a cyclic approach to intervention has previously been proposed in which there is a sequence of treatment windows and treatment is allowed only at the beginning of each window. This protocol ignores two practical aspects of therapy. First, a treatment typically has some duration of action: a drug will be effective for some period, after which there can be a recovery phase. This, too, might involve a cyclic protocol; however, in practice, a physician might monitor a patient at every stage and decide whether to apply treatment, and if treatment is applied, then the patient will be under the influence of the drug for some duration, followed by a recovery period. This results in an acyclic protocol. In this paper we take a unified approach to both cyclic and acyclic control with duration of effectiveness by placing the problem in the general framework of multiperiod decision epochs with infinite-horizon discounted cost. The time interval between successive decision epochs can span multiple time units, where given the current state and the action taken, there is a joint probability distribution defined for the next state and the time when the next decision epoch will be called. Optimal control policies are derived, synthetic networks are used to investigate the properties of both cyclic and acyclic interventions with a fixed duration of effectiveness, and the methodology is applied to a mutated mammalian cell-cycle network.
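For orientation, the one-transition baseline that this abstract generalizes can be sketched as standard value iteration for an infinite-horizon discounted-cost Markov decision process. This is a textbook sketch, not the paper's multiperiod method: it assumes a decision at every transition, and the array layout (`P[a][s][t]`, `cost[s][a]`), the function name, and the default parameters are all hypothetical.

```python
def value_iteration(P, cost, discount=0.9, tol=1e-10, max_iter=10000):
    """Value iteration for a finite MDP with discounted cost.

    P[a][s][t]  : probability of moving from state s to state t under action a
    cost[s][a]  : immediate cost of taking action a in state s
    Returns (V, policy): approximate optimal cost-to-go and a greedy policy.
    """
    n_states = len(cost)
    n_actions = len(P)
    V = [0.0] * n_states
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(max_iter):
        # Bellman backup: immediate cost plus discounted expected future cost.
        Q = [
            [
                cost[s][a]
                + discount * sum(P[a][s][t] * V[t] for t in range(n_states))
                for a in range(n_actions)
            ]
            for s in range(n_states)
        ]
        V_new = [min(q) for q in Q]
        delta = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if delta < tol:
            break
    # Greedy policy: the cost-minimizing action in each state.
    policy = [min(range(n_actions), key=lambda a: Q[s][a]) for s in range(n_states)]
    return V, policy
```

As a toy usage, one might take two states (0 desirable, 1 undesirable) and two actions (0 = no treatment, 1 = treatment), with made-up transition matrices and a cost that adds a treatment charge to a per-state penalty; the multiperiod setting in the abstract would additionally model the random time until the next decision epoch.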
Eurasip Journal on Bioinformatics and Systems Biology | 2015
Mohammadmahdi R. Yousefi; Lori A. Dalton
Typically, a vast amount of experience and data is needed to successfully determine cancer prognosis in the face of (1) the inherent stochasticity of cell dynamics, (2) incomplete knowledge of healthy cell regulation, and (3) the inherently uncertain and evolving nature of cancer progression. There is hope that models of cell regulation could be used to predict disease progression and successful treatment strategies, but there has been little work focusing on the third source of uncertainty above. In this work, we investigate the impact of this kind of network uncertainty in predicting cancer prognosis. In particular, we focus on a scenario in which the precise aberrant regulatory relationships between genes in a patient are unknown, but the patient gene regulatory network is contained in an uncertainty class of possible mutations of some known healthy network. We optimistically assume that the probabilities of these abnormal networks are available, along with the best treatment for each network. Then, given a snapshot of the patient gene activity profile at a single moment in time, we study what can be said regarding the patient’s treatability and prognosis. Our methodology is based on recent developments in optimal control strategies for probabilistic Boolean networks and optimal Bayesian classification. We show that in some circumstances, prognosis prediction may be highly unreliable, even in this optimistic setting with perfect knowledge of healthy biological processes and ideal treatment decisions.
Bioinformatics | 2012
Mohammadmahdi R. Yousefi; Edward R. Dougherty
MOTIVATION: A common practice in biomarker discovery is to decide whether a large laboratory experiment should be carried out based on the results of a preliminary study on a small set of specimens. Consideration of the efficacy of this approach motivates the introduction of a probabilistic measure for whether a classifier showing promising results in a small-sample preliminary study will perform similarly on a large independent sample. Given the error estimate from the preliminary study, if the probability of reproducible error is low, then there is really no purpose in substantially allocating more resources to a large follow-on study. Indeed, if the probability of the preliminary study providing likely reproducible results is small, then why even perform the preliminary study? RESULTS: This article introduces a reproducibility index for classification, measuring the probability that a sufficiently small error estimate on a small sample will motivate a large follow-on study. We provide a simulation study based on synthetic distribution models that possess known intrinsic classification difficulties and emulate real-world scenarios. We also set up similar simulations on four real datasets to show the consistency of results. The reproducibility indices for different distributional models, real datasets and classification schemes are empirically calculated. The effects of reporting and multiple-rule biases on the reproducibility index are also analyzed. AVAILABILITY: We have implemented in C code the synthetic data distribution model, classification rules, feature selection routine and error estimation methods. The source code is available at http://gsp.tamu.edu/Publications/supplementary/yousefi12a/.
Eurasip Journal on Bioinformatics and Systems Biology | 2014
Mohammadmahdi R. Yousefi; Edward R. Dougherty
Perfect knowledge of the underlying state transition probabilities is necessary for designing an optimal intervention strategy for a given Markovian genetic regulatory network. However, in many practical situations, the complex nature of the network and/or identification costs limit the availability of such perfect knowledge. To address this difficulty, we propose to take a Bayesian approach and represent the system of interest as an uncertainty class of several models, each assigned some probability, which reflects our prior knowledge about the system. We define the objective function to be the expected cost relative to the probability distribution over the uncertainty class and formulate an optimal Bayesian robust intervention policy minimizing this cost function. The resulting policy may not be optimal for a fixed element within the uncertainty class, but it is optimal when averaged across the uncertainty class. Furthermore, starting from a prior probability distribution over the uncertainty class and collecting samples from the process over time, one can update the prior distribution to a posterior and find the corresponding optimal Bayesian robust policy relative to the posterior distribution. Therefore, the optimal intervention policy is essentially nonstationary and adaptive.
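The prior-to-posterior update mentioned in this abstract can be sketched for a finite uncertainty class as a plain application of Bayes' rule. This sketch covers only the belief update, not the optimal Bayesian robust policy itself, and it assumes, for simplicity, an uncontrolled chain whose candidate models are given as transition matrices. The function name and data layout are hypothetical.

```python
def posterior_over_models(prior, models, transitions):
    """Update a prior over candidate Markov chains from observed transitions.

    prior       : list of prior probabilities, one per candidate model
    models      : list of transition matrices, models[k][i][j] = P_k(i -> j)
    transitions : observed (state, next_state) pairs
    Returns the posterior probabilities, in the same order as `prior`.
    """
    weights = list(prior)
    for i, j in transitions:
        # Multiply each model's weight by the likelihood of the observed step.
        weights = [w * P[i][j] for w, P in zip(weights, models)]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("observations have zero probability under every model")
    return [w / total for w in weights]
```

Because the posterior changes as transitions accumulate, any policy optimized against it changes as well, which matches the abstract's remark that the resulting policy is nonstationary and adaptive.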
IEEE EMBS International Conference on Biomedical and Health Informatics | 2016
He Zhou; Jiang Hu; Sunil P. Khatri; Frank Liu; Cliff C. N. Sze; Mohammadmahdi R. Yousefi
A recently developed approach to precision medicine is the use of Markov Decision Processes (MDPs) on Gene Regulatory Networks (GRNs). Due to very limited information on the system dynamics of GRNs, the MDP must repeatedly conduct an exhaustive search for a non-stationary policy, and thus entails exponential computational complexity. This has hindered its practical application to date. With the goal of overcoming this obstacle, we investigate acceleration techniques using the Graphics Processing Unit (GPU) platform, which allows massive parallelism. Our GPU-based acceleration techniques are applied with two different MDP approaches: the optimal Bayesian robust (OBR) policy and the forward search sparse sampling (FSSS) method. Simulation results demonstrate that our techniques achieve a speedup of two orders of magnitude over sequential implementations. In addition, we present a study on the memory utilization and error trends of these techniques.
IEEE Global Conference on Signal and Information Processing | 2013
Mohammadmahdi R. Yousefi; Ivan Ivanov
Optimal control policies for Markovian gene regulatory networks assume that external intervention is 100% specific to control genes. In practice, however, this effect may be unpredictable in the sense that intervention may also target alternative genes. Our goal is to find an optimal control policy that performs well in such cases. We model this by an uncertainty class of controlled networks corresponding to different affected genes, governed by a probability distribution representing our confidence in the intervention specificity to each gene, and optimize relative to this uncertainty class.
Asilomar Conference on Signals, Systems and Computers | 2011
Mohammadmahdi R. Yousefi; Aniruddha Datta; Edward R. Dougherty
When implementing chemotherapy, it is possible to decrease the likelihood of visiting undesirable states by finding dose schedules that maximize the benefit-to-toxicity ratio of a delivered drug. In typical practice, therapies are administered in cycles. If treatment is applied at the beginning of a cycle, then the patient will be under the influence of the drug for some period of time, after which there can be a recovery phase. In this paper, we present a methodology to devise optimal intervention policies in Markovian genetic regulatory networks for the class of cyclic therapeutic methods where interventions have a fixed-length duration of effectiveness.
Eurasip Journal on Bioinformatics and Systems Biology | 2015
Lori A. Dalton; Mohammadmahdi R. Yousefi
A recently proposed optimal Bayesian classification paradigm addresses optimal error rate analysis for small-sample discrimination, including optimal classifiers, optimal error estimators, and error estimation analysis tools with respect to the probability of misclassification under binary classes. Here, we address multi-class problems and optimal expected risk with respect to a given risk function, which are common settings in bioinformatics. We present Bayesian risk estimators (BRE) under arbitrary classifiers, the mean-square error (MSE) of arbitrary risk estimators under arbitrary classifiers, and optimal Bayesian risk classifiers (OBRC). We provide analytic expressions for these tools under several discrete and Gaussian models and present a new methodology to approximate the BRE and MSE when analytic expressions are not available. Of particular note, we present analytic forms for the MSE under Gaussian models with homoscedastic covariances, which are new even in binary classification.