
Publication


Featured research published by Ali Foroughi pour.


IEEE Global Conference on Signal and Information Processing | 2014

Optimal Bayesian feature selection on high dimensional gene expression data

Ali Foroughi pour; Lori A. Dalton

Recent work proposes a Bayesian hierarchical model for feature selection in which priors are placed over the identity of each feature, as well as over the underlying feature-label distribution. Given data, Bayesian inference can be used to find a maximum posterior probability feature set. In this work, we examine the application of this theory to microarray data for biomarker discovery. A major challenge is in adapting the theory to very high-dimensional spaces, and we thus propose two suboptimal feature selection algorithms, based on optimal Bayesian feature selection theory, that perform very well with relatively low computational burden, making them ideal for molecular biomarker discovery. We demonstrate in a synthetic microarray model that the performance of the proposed methods is quite robust to deviations from modeling assumptions, and in fact they achieve outstanding performance relative to popular methods.


International Conference on Bioinformatics | 2015

Optimal Bayesian feature filtering

Ali Foroughi pour; Lori A. Dalton

Recent work proposes a Bayesian hierarchical model for feature selection in which a prior describes the identity of feature sets as well as their underlying class-conditional distribution. In this work, we consider this model under an independence assumption on both feature identities, and the underlying class-conditional distribution of each feature. This framework results in optimal Bayesian feature filtering. Closed form solutions, which are applicable to high dimensional data with low computation cost, can be found. In addition, this model may be used to provide feedback on the quality of any feature set via closed-form estimators of the expected number of good features missed and the expected number of bad features selected. Synthetic data simulations depict outstanding performance of the optimal Bayesian feature filter relative to other popular feature selection methods. We also observe robustness with respect to assumptions in the model, particularly the independence assumption, and robustness under non-informative priors on the underlying class-conditional distributions. Furthermore, application of the optimal Bayesian feature filter on gene expression microarray datasets provides a gene list in which markers with known links to the cancer in question are highly ranked.
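As an illustration of the kind of closed-form, per-feature computation described above, here is a minimal sketch (not the authors' implementation). It assumes two classes, scores one feature at a time, and places a standard Normal-Inverse-Gamma prior on each Gaussian's mean and variance, so every marginal likelihood is available in closed form; the hyperparameters and the prior probability of a feature being "good" are illustrative defaults.

```python
import math

def log_marginal(x, mu0=0.0, kappa0=1.0, alpha0=1.0, beta0=1.0):
    """Log marginal likelihood of samples x under a Gaussian model with a
    Normal-Inverse-Gamma prior on (mean, variance); standard closed form."""
    n = len(x)
    xbar = sum(x) / n
    ss = sum((v - xbar) ** 2 for v in x)  # centered sum of squares
    kappa_n = kappa0 + n
    alpha_n = alpha0 + n / 2.0
    beta_n = beta0 + 0.5 * ss + kappa0 * n * (xbar - mu0) ** 2 / (2.0 * kappa_n)
    return (-0.5 * n * math.log(2.0 * math.pi)
            + 0.5 * math.log(kappa0 / kappa_n)
            + math.lgamma(alpha_n) - math.lgamma(alpha0)
            + alpha0 * math.log(beta0) - alpha_n * math.log(beta_n))

def posterior_good(x0, x1, prior_good=0.5):
    """Posterior probability that a feature is 'good', i.e. that its
    class-conditional distributions differ, given that feature's samples
    x0 (class 0) and x1 (class 1)."""
    log_h1 = math.log(prior_good) + log_marginal(x0) + log_marginal(x1)
    log_h0 = math.log(1.0 - prior_good) + log_marginal(list(x0) + list(x1))
    m = max(log_h0, log_h1)  # stabilize the log-sum-exp
    return math.exp(log_h1 - m) / (math.exp(log_h0 - m) + math.exp(log_h1 - m))
```

Ranking features by `posterior_good` and keeping the top few is the filtering step; a feature with well-separated class means scores near 1, while one with nearly identical classes scores low.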


International Conference on Acoustics, Speech, and Signal Processing | 2017

Robust feature selection for block covariance Bayesian models

Ali Foroughi pour; Lori A. Dalton

Recent work proposes new algorithms for feature selection based on a Bayesian hierarchical model that places priors on both the identity of all features, and the identity-conditioned feature-label distribution. Given training data, Bayesian inference can be used to predict the feature identities. While algorithms developed in prior work rely on certain independence assumptions, in this work we present a new algorithm, with low computational complexity, designed for a family of Bayesian models that each assume different block covariance structures. We show the new algorithm, and the previous algorithm assuming independent features, have robust performance across the family of models under synthetic data, and provide results from real colon cancer microarray data.


BMC Bioinformatics | 2018

Heuristic algorithms for feature selection under Bayesian models with block-diagonal covariance structure

Ali Foroughi pour; Lori A. Dalton

Background: Many bioinformatics studies aim to identify markers, or features, that can be used to discriminate between distinct groups. In problems where strong individual markers are not available, or where interactions between gene products are of primary interest, it may be necessary to consider combinations of features as a marker family. To this end, recent work proposes a hierarchical Bayesian framework for feature selection that places a prior on the set of features we wish to select and on the label-conditioned feature distribution. While an analytical posterior under Gaussian models with block covariance structures is available, the optimal feature selection algorithm for this model remains intractable since it requires evaluating the posterior over the space of all possible covariance block structures and feature-block assignments. To address this computational barrier, in prior work we proposed a simple suboptimal algorithm, 2MNC-Robust, with robust performance across the space of block structures. Here, we present three new heuristic feature selection algorithms.

Results: The proposed algorithms outperform 2MNC-Robust and many other popular feature selection algorithms on synthetic data. In addition, enrichment analysis on real breast cancer, colon cancer, and leukemia data indicates they also output many of the genes and pathways linked to the cancers under study.

Conclusions: Bayesian feature selection is a promising framework for small-sample high-dimensional data, in particular biomarker discovery applications. When applied to cancer data, these algorithms identified many genes already shown to be involved in cancer as well as potentially new biomarkers. Furthermore, one of the proposed algorithms, SPM, outputs blocks of heavily correlated genes, which is particularly useful for studying gene interactions and gene networks.
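The computational barrier described above (evaluating the posterior over all block structures) is why heuristics are needed. The sketch below is an illustrative stand-in, not one of the paper's three algorithms: it greedily groups features into blocks by thresholding the absolute Pearson correlation with each block's seed feature, and the threshold of 0.8 is an arbitrary choice.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sample vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def greedy_blocks(features, threshold=0.8):
    """Greedily assign features (a list of sample vectors) to blocks.
    A feature joins the first block whose seed feature it correlates
    with above the threshold (in absolute value); otherwise it seeds a
    new block. Returns blocks as lists of feature indices."""
    blocks = []  # blocks[k][0] is block k's seed feature
    for i, f in enumerate(features):
        for blk in blocks:
            if abs(pearson(features[blk[0]], f)) >= threshold:
                blk.append(i)
                break
        else:
            blocks.append([i])
    return blocks
```

Each block could then be scored as a unit by a block-level posterior, instead of evaluating the posterior over every possible feature-block assignment.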


International Conference on Bioinformatics | 2016

Multiple Sclerosis Biomarker Discovery via Bayesian Feature Selection

Ali Foroughi pour; Lori A. Dalton

Recent work proposes a hierarchical Bayesian framework for feature selection, where a prior describes the identity of each feature set and the underlying distribution parameters. Assuming jointly Gaussian features, a posterior is found in closed form, and an approximation is presented to develop fast suboptimal algorithms. Applying this method to multiple sclerosis data, we find highly ranked genes and pathways suggested to be involved in multiple sclerosis.


IEEE Global Conference on Signal and Information Processing | 2016

Optimal Bayesian feature selection with missing data

Ali Foroughi pour; Lori A. Dalton

We present a framework for optimal Bayesian feature selection and missing value estimation. Based on this framework, we derive optimal algorithms under an independent Gaussian model, and provide fast sub-optimal methods with superb performance for a dependent Gaussian model.


International Conference on Bioinformatics | 2018

Bayesian Biomarker Discovery for RNAseq Data

Ali Foroughi pour; Lori A. Dalton

RNAseq has become a popular technology for biomarker discovery. However, in many applications, such as single-cell sequencing, zero counts comprise a considerable portion of the data. Here we propose a new RNAseq model that explicitly models zero counts, solve a previously proposed feature selection framework, called Optimal Bayesian Filter (OBF), under this model, and find the posterior probability that a feature has distributional differences across classes. As the posterior does not exist in closed form, we propose Sequence Approximation OBF (SA-OBF), a closed-form approximation based on log transformations of the non-zero reads. We use SA-OBF to study two breast cancer RNAseq datasets.
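As a loose illustration of the two ingredients above (explicit zero-count modeling, and a log transformation of the non-zero reads), here is a minimal sketch rather than the SA-OBF derivation. It scores only the zero-count pattern via closed-form Beta-Bernoulli marginal likelihoods; the hyperparameters `a` and `b` and the helper name `log_nonzero` are illustrative assumptions, not names from the paper.

```python
import math

def betaln(x, y):
    """Log of the Beta function, computed via log-gamma."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)

def zero_rate_evidence(z0, n0, z1, n1, a=1.0, b=1.0):
    """Log Bayes factor that the zero-count rates differ between two
    classes, where class c has z_c zero counts among n_c samples.
    Uses the closed-form Beta-Bernoulli marginal likelihood with a
    Beta(a, b) prior on each class's zero rate."""
    def log_marg(z, n):
        return betaln(a + z, b + n - z) - betaln(a, b)
    return log_marg(z0, n0) + log_marg(z1, n1) - log_marg(z0 + z1, n0 + n1)

def log_nonzero(counts):
    """Hypothetical helper for the second ingredient: keep the non-zero
    reads and log-transform them, so that Gaussian machinery (as in the
    approximation above) can then be applied to the transformed values."""
    return [math.log(c) for c in counts if c > 0]
```

A positive `zero_rate_evidence` favors different zero rates across the classes; a full feature score would combine this with a Gaussian-model comparison on the `log_nonzero` values.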


International Conference on Bioinformatics | 2018

Biomarker Discovery via Optimal Bayesian Feature Filtering for Structured Multiclass Data

Ali Foroughi pour; Lori A. Dalton

Biomarker discovery aims to find a shortlist of high-profile biomarkers that can be further verified and utilized in downstream analysis. Many biomarkers exhibit structured multiclass behavior, where groups of interest may be clustered into a small number of patterns such that groups assigned the same pattern share a common governing distribution. While several algorithms have been proposed for multiclass problems, to the best of our knowledge none can take such constraints on the group-pattern assignment, or structure, as input and output high-profile potential biomarkers along with the structure they satisfy. While post-hoc analyses may be used to infer the structure, ignoring such information prevents feature selection from fully taking advantage of experimental data. Recent work proposes a Bayesian framework for feature selection that places priors on the feature-label distribution and the label-conditioned feature distribution. Here we extend this framework to structured multiclass problems, solve the proposed model for the case of independent features, evaluate it in several synthetic simulations, apply it to two cancer datasets, and perform enrichment analysis. Many of the highly ranked genes and pathways are suggested to be affected in the cancers under study. We also find potentially new biomarkers. Not only do we detect biomarkers, but we also make inferences about the underlying distributional connections across classes, which provide additional insight into cancer biology.
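The inference over distributional connections across classes can be illustrated by brute force when there are only a few classes. The sketch below is a simplified stand-in for the framework, not the paper's algorithm: it treats one feature, models each pattern as a Gaussian with a Normal-Inverse-Gamma prior, assumes a uniform prior over structures, and exhaustively enumerates class partitions (feasible only for a handful of classes).

```python
import math

def log_marginal(x, mu0=0.0, kappa0=1.0, alpha0=1.0, beta0=1.0):
    """Closed-form log marginal likelihood of samples x under a Gaussian
    with a Normal-Inverse-Gamma prior on (mean, variance)."""
    n = len(x)
    xbar = sum(x) / n
    ss = sum((v - xbar) ** 2 for v in x)
    kappa_n, alpha_n = kappa0 + n, alpha0 + n / 2.0
    beta_n = beta0 + 0.5 * ss + kappa0 * n * (xbar - mu0) ** 2 / (2.0 * kappa_n)
    return (-0.5 * n * math.log(2.0 * math.pi)
            + 0.5 * math.log(kappa0 / kappa_n)
            + math.lgamma(alpha_n) - math.lgamma(alpha0)
            + alpha0 * math.log(beta0) - alpha_n * math.log(beta_n))

def set_partitions(items):
    """Yield every partition of `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part                      # `first` in its own block
        for i in range(len(part)):                  # or merged into block i
            yield part[:i] + [[first] + part[i]] + part[i + 1:]

def best_structure(class_data):
    """Pick the class partition (structure) with the highest evidence:
    classes in the same block are assumed to share one governing
    distribution, so their samples are pooled into one marginal."""
    best, best_score = None, -math.inf
    for part in set_partitions(list(range(len(class_data)))):
        score = sum(log_marginal([v for c in blk for v in class_data[c]])
                    for blk in part)
        if score > best_score:
            best, best_score = part, score
    return best, best_score
```

With three classes where two share a distribution, the highest-evidence partition pools those two classes into one pattern, which is the kind of structural inference the abstract describes.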


International Conference on Bioinformatics | 2017

Heuristic Algorithms for Feature Selection under Bayesian Models with Block-diagonal Covariance Structure

Ali Foroughi pour; Lori A. Dalton

Many bioinformatics studies aim to identify markers, or features, that can be used to discriminate between distinct groups. In problems where strong individual markers are not available, or where interactions between gene products are of primary interest, it may be necessary to consider combinations of features as a marker family. To this end, recent work proposes a hierarchical Bayesian framework for feature selection that places a prior on the set of features we wish to select and on the label-conditioned feature distribution. While an analytical posterior under Gaussian models with block covariance structures is available, the optimal feature selection algorithm for this model remains intractable since it requires evaluating the posterior over the space of all possible covariance block structures and feature-block assignments. To address this computational barrier, prior work proposes a simple suboptimal algorithm, 2MNC-Robust, with robust performance across the space of block structures. Here, we present three new heuristic feature selection algorithms that outperform 2MNC-Robust on synthetic data. Enrichment analysis on real cancer data indicates that they also output many of the genes and pathways linked to the cancers under study.


International Conference on Bioinformatics | 2017

Integrating Prior Information with Bayesian Feature Selection

Ali Foroughi pour; Lori A. Dalton

Biomarker discovery aims to find biomarkers involved in the biological mechanisms of a disease under study that can be further utilized for diagnosis, prognosis, drug development, etc. Although current high-throughput technologies provide a deluge of data per data point, research is usually constrained to small samples, impeding reliable biomarker discovery. Given the ongoing research on biomarker discovery over the past decades, there exists incredibly useful, but still limited, prior knowledge of cancer biology, such as small gene sets already known to be involved in cancer. This information, if properly integrated with feature selection, could potentially help to detect new biomarkers. However, most current methods used for biomarker discovery cannot easily be extended to account for such prior knowledge. Recent work proposes a hierarchical Bayesian framework for feature selection which places priors on both the identity of all features and the identity-conditioned feature distribution. Various models are obtained based on this framework, including the dependent good dependent bad (DGDB) model. An approximate solution of DGDB has been used with a set selection heuristic to successfully find genes involved in colon cancer and multiple sclerosis. While the approximate solution relies only on training data, we propose a new algorithm that takes advantage of previously known biomarkers to find additional biomarkers, hereafter called Informed Approximate 3MNC-DGDB (IA-3MNC). In three synthetic simulations we illustrate that (a) IA-3MNC outperforms many popular feature selection algorithms, and (b) prior knowledge helps to correctly detect additional biomarkers, particularly under small samples. We apply IA-3MNC to colon cancer and breast cancer datasets deposited in the Gene Expression Omnibus under accession numbers GSE1456 and GSE41850, respectively. Studying the top 20 genes selected by IA-3MNC and the top 10 enriched pathways, we find that many of the highly ranked genes and pathways are suggested to be involved in cancer.
