Fazel Famili | Researchain

Publication

Featured research published by Fazel Famili.

Information Sciences | 2014

Feature selection for high-dimensional class-imbalanced data sets using Support Vector Machines

Sebastián Maldonado; Richard Weber; Fazel Famili

Feature selection and classification of imbalanced data sets are two of the most interesting machine learning challenges, attracting growing attention from both industry and academia. Feature selection addresses the dimensionality reduction problem by determining a subset of available features to build a good model for classification or prediction, while the class-imbalance problem arises when the class distribution is too skewed. Both issues have been independently studied in the literature, and a plethora of methods to address high dimensionality as well as class imbalance have been proposed. The aim of this work is to explore both issues simultaneously, proposing a family of methods that select those attributes that are relevant for the identification of the target class in binary classification. We propose a backward elimination approach based on successive holdout steps, whose contribution measure is based on a balanced loss function obtained on an independent subset. Our experiments are based on six highly imbalanced microarray data sets, comparing our methods with well-known feature selection techniques, and obtaining a better prediction with consistently fewer relevant features.
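The backward elimination idea in this abstract can be illustrated with a minimal sketch. It is not the authors' method: a nearest-centroid classifier stands in for the Support Vector Machine, the balanced loss is assumed to be the mean of the per-class error rates, and the function names (`balanced_error`, `centroid_predict`, `backward_eliminate`) and the toy data are hypothetical.

```python
from statistics import mean

def balanced_error(y_true, y_pred):
    """Mean of the per-class error rates (labels are +1 / -1)."""
    rates = []
    for cls in (1, -1):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        if idx:
            rates.append(sum(y_pred[i] != cls for i in idx) / len(idx))
    return mean(rates)

def centroid_predict(X_train, y_train, X_test, feats):
    """Nearest-centroid classifier (assumed stand-in for the paper's SVM)."""
    cents = {}
    for cls in (1, -1):
        rows = [x for x, y in zip(X_train, y_train) if y == cls]
        cents[cls] = [mean(r[f] for r in rows) for f in feats]
    preds = []
    for x in X_test:
        dist = {c: sum((x[f] - m) ** 2 for f, m in zip(feats, cents[c]))
                for c in cents}
        preds.append(min(dist, key=dist.get))
    return preds

def backward_eliminate(X_train, y_train, X_hold, y_hold, n_keep):
    """Repeatedly drop the feature whose removal gives the lowest
    balanced error on the independent holdout subset."""
    feats = list(range(len(X_train[0])))
    while len(feats) > n_keep:
        def loss_without(f):
            rest = [g for g in feats if g != f]
            preds = centroid_predict(X_train, y_train, X_hold, rest)
            return balanced_error(y_hold, preds)
        feats.remove(min(feats, key=loss_without))
    return feats

# Toy imbalanced data: feature 0 separates the classes, feature 1 is noise.
X_train = [[0.0, 5.0], [0.2, 1.0], [0.1, 3.0], [0.3, 2.2], [1.0, 4.0], [0.9, 1.5]]
y_train = [-1, -1, -1, -1, 1, 1]
X_hold = [[0.1, 4.5], [0.2, 1.2], [0.0, 2.9], [0.95, 1.0], [1.1, 5.0]]
y_hold = [-1, -1, -1, 1, 1]

selected = backward_eliminate(X_train, y_train, X_hold, y_hold, n_keep=1)
```

Averaging the per-class error rates keeps the majority class from dominating the selection criterion, which is the motivation for a balanced loss on skewed data.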

IEEE Intelligent Systems & Their Applications | 1999

Data mining to predict aircraft component replacement

Sylvain Létourneau; Fazel Famili; Stan Matwin

Aircraft sensors generate vast amounts of data, much of which languishes in storage after its initial analysis. The authors have developed an approach for using this data to build models for predicting aircraft component failure. Their approach addresses several key data-mining issues.

BMC Bioinformatics | 2012

Mining biological information from 3D short time-series gene expression data: the OPTricluster algorithm

Alain B. Tchagang; Sieu Phan; Fazel Famili; Heather Shearer; Pierre R. Fobert; Yi Huang; Jitao Zou; Daiqing Huang; Adrian J. Cutler; Ziying Liu; Youlian Pan

Background: Nowadays, it is possible to collect expression levels of a set of genes from a set of biological samples during a series of time points. Such data have three dimensions: gene-sample-time (GST). Thus they are called 3D microarray gene expression data. To take advantage of the 3D data collected, and to fully understand the biological knowledge hidden in the GST data, novel subspace clustering algorithms have to be developed to effectively address the biological problem in the corresponding space. Results: We developed a subspace clustering algorithm called Order Preserving Triclustering (OPTricluster), for 3D short time-series data mining. OPTricluster is able to identify 3D clusters with coherent evolution from a given 3D dataset using a combinatorial approach on the sample dimension, and the order preserving (OP) concept on the time dimension. The fusion of the two methodologies allows one to study similarities and differences between samples in terms of their temporal expression profile. OPTricluster has been successfully applied to four case studies: immune response in mice infected by malaria (Plasmodium chabaudi), systemic acquired resistance in Arabidopsis thaliana, similarities and differences between inner and outer cotyledon in Brassica napus during seed development, and to Brassica napus whole seed development. These studies showed that OPTricluster is robust to noise and is able to detect the similarities and differences between biological samples. Conclusions: Our analysis showed that OPTricluster generally outperforms other well-known clustering algorithms such as TRICLUSTER, gTRICLUSTER and K-means; it is robust to noise and can effectively mine the biological knowledge hidden in the 3D short time-series gene expression data.
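A rough sketch of the two ingredients named in this abstract, the order preserving concept on the time dimension and a combinatorial search over the sample dimension, is given below. It is a simplified illustration and not the published OPTricluster implementation: tie handling, noise thresholds and any statistical filtering are omitted, and the function names and toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def order_pattern(series):
    """Time-point indices sorted by increasing expression (ties by time index)."""
    return tuple(sorted(range(len(series)), key=lambda t: (series[t], t)))

def op_triclusters(data, min_genes=2):
    """data[gene][sample] is a list of expression values over time.
    Returns {(sample_subset, pattern): genes} for genes that share one
    ordering of time points in every sample of the subset."""
    samples = sorted(next(iter(data.values())))
    clusters = {}
    for k in range(len(samples), 0, -1):
        for subset in combinations(samples, k):
            groups = defaultdict(list)
            for gene, by_sample in data.items():
                patterns = {order_pattern(by_sample[s]) for s in subset}
                if len(patterns) == 1:  # same ordering in all chosen samples
                    groups[patterns.pop()].append(gene)
            for pattern, genes in groups.items():
                if len(genes) >= min_genes:
                    clusters[(subset, pattern)] = sorted(genes)
    return clusters

# Toy gene-sample-time data with three time points.
data = {
    "g1": {"s1": [1.0, 2.0, 3.0], "s2": [0.5, 1.5, 4.0]},
    "g2": {"s1": [0.2, 0.9, 1.1], "s2": [2.0, 3.0, 5.0]},
    "g3": {"s1": [3.0, 2.0, 1.0], "s2": [1.0, 2.0, 3.0]},
}
clusters = op_triclusters(data)
```

In this toy example `g3` rises over time in sample `s2` but falls in `s1`, so it joins the other genes only in the cluster restricted to `s2`; this is the kind of between-sample difference the abstract refers to.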

BMC Bioinformatics | 2010

GOAL: A software tool for assessing biological significance of genes groups

Alain B. Tchagang; Alexander Gawronski; Hugo Bérubé; Sieu Phan; Fazel Famili; Youlian Pan

Background: Modern high throughput experimental techniques such as DNA microarrays often result in large lists of genes. Computational biology tools such as clustering are then used to group together genes based on their similarity in expression profiles. Genes in each group are probably functionally related. The functional relevance among the genes in each group is usually characterized by utilizing available biological knowledge in public databases such as Gene Ontology (GO), KEGG pathways, association between a transcription factor (TF) and its target genes, and/or gene networks. Results: We developed GOAL: Gene Ontology AnaLyzer, a software tool specifically designed for the functional evaluation of gene groups. GOAL implements and supports efficient and statistically rigorous functional interpretations of gene groups through its integration with available GO, TF-gene association data, and association with KEGG pathways. In order to facilitate more specific functional characterization of a gene group, we implement three GO-tree search strategies rather than one as in most existing GO analysis tools. Furthermore, GOAL offers flexibility in deployment. It can be used as a standalone tool, a plug-in to other computational biology tools, or a web server application. Conclusion: We developed a functional evaluation software tool, GOAL, to perform functional characterization of a gene group. GOAL offers three GO-tree search strategies and combines its strength in function integration, portability and visualization, and its flexibility in deployment. Furthermore, GOAL can be used to evaluate and compare gene groups as the output from computational biology tools such as clustering algorithms.

International Journal of Computer Mathematics | 2007

A novel pattern based clustering methodology for time-series microarray data

Sieu Phan; Fazel Famili; Zuojian Tang; Youlian Pan; Ziying Liu; Junjun Ouyang; Anne E.G. Lenferink; Maureen D. O'Connor

Identification of co-expressed genes sharing similar biological behaviours is an essential step in functional genomics. Traditional clustering techniques are generally based on overall similarity of expression levels and often generate clusters with mixed profile patterns. A novel pattern recognition method for selecting co-expressed genes based on rate of change and modulation status of gene expression at each time interval is proposed in this paper. This method is capable of identifying gene clusters consisting of highly similar shapes of expression profiles and modulation patterns. Furthermore, we develop a quality index based on the semantic similarity in gene annotations to assess the likelihood of a cluster being a co-regulated group. The effectiveness of the proposed methodology is demonstrated by applying it to the well-known yeast sporulation dataset and an in-house cancer genomics dataset.
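The grouping step described in this abstract can be sketched as follows. This is an assumed simplification rather than the paper's algorithm: the rate of change is taken as the plain difference between consecutive time points, the modulation status is one of up (`U`), down (`D`) or no change (`N`) relative to a hypothetical `threshold`, and genes are grouped when their status strings match exactly.

```python
from collections import defaultdict

def modulation_pattern(series, threshold=0.5):
    """Encode each time interval as U (up), D (down) or N (no change)."""
    symbols = []
    for prev, curr in zip(series, series[1:]):
        change = curr - prev  # assumed rate-of-change measure
        if change > threshold:
            symbols.append("U")
        elif change < -threshold:
            symbols.append("D")
        else:
            symbols.append("N")
    return "".join(symbols)

def group_by_pattern(profiles, threshold=0.5):
    """Group genes whose expression profiles share the same modulation pattern."""
    groups = defaultdict(list)
    for gene, series in profiles.items():
        groups[modulation_pattern(series, threshold)].append(gene)
    return dict(groups)

# Toy profiles: "a" and "b" differ in magnitude but share the same shape.
profiles = {
    "a": [1.0, 2.0, 2.1, 1.0],
    "b": [0.0, 3.0, 3.2, 0.5],
    "c": [2.0, 1.0, 1.0, 3.0],
}
groups = group_by_pattern(profiles)
```

Genes `a` and `b` have different expression levels, so a distance-based clustering might separate them, yet they fall in the same group here because only the shape of the profile is compared.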

Intelligent Data Analysis | 1999

Application of Rough Sets Algorithms to Prediction of Aircraft Component Failure

José M. Peña; Sylvain Létourneau; Fazel Famili

This paper presents an application of rough sets algorithms to the prediction of component failures in the aerospace domain. To achieve this, we first introduce a data preprocessing approach that consists of case selection, data labeling and attribute reduction. We also introduce a weight function to represent the importance of predictions as a function of time before the actual failure. We then build several models using rough set algorithms and reduce these models through a postprocessing phase. End results for failure prediction of a specific aircraft component are presented.

International Conference on Unmanned Aircraft Systems | 2014

Experimental evaluation of four feature detection methods for close range and distant airborne targets for Unmanned Aircraft Systems applications

Dan Tulpan; Nabil Belacel; Fazel Famili; Kristopher Ellis

Feature detection for Unmanned Aircraft Systems (UAS) sense and avoid scenarios is a crucial preliminary step for target detection. Its importance culminates when distant (pixel size) targets representing incoming aircraft are considered. This paper presents an experimental evaluation of four popular feature detection methods using flight test data and based on evaluation criteria such as first detection distance and percentage of frames with detected target features. Our results show that for close range targets all four methods have similar performance, while for distant (pixel-size) targets, the Shi and Tomasi method outperforms the other three methods (Harris-Stephens-Plessey, SUSAN and FAST).

Intelligent Data Analysis | 2010

CliDaPa: A new approach to combining clinical data with DNA microarrays

Santiago González; L. Guerra; Víctor Robles; José M. Peña; Fazel Famili

Traditionally, clinical data have been used as the only source of information to diagnose diseases. Nowadays, other types of information, such as various forms of omics data (e.g. DNA microarrays), are taken into account to improve diagnosis and even prognosis in many diseases. This paper proposes a new approach, called CliDaPa, for efficiently combining both sources of information, namely clinical data and gene expressions, in order to further improve estimations. In this approach, patients are firstly divided into different clusters (represented as a decision tree) depending on their clinical information. Thus, different groups of patients with similar behaviors are identified. Each individual group can be studied and classified separately, using only gene expression data, with different supervised classification methods, such as decision trees, Bayesian networks or lazy induction learning. To validate this method, two datasets based on Breast Cancer, a high social impact disease, have been used. For the proposed approach, internal (0.632 Bootstrap) and external validations have been carried out. Results have shown improvements in accuracy in the internal and external validation compared with the standard methods with clinical data and gene expression data separately. Thus, the CliDaPa algorithm fulfills our proposed objectives.

Knowledge Discovery and Data Mining | 2000

Data mining to detect abnormal behavior in aerospace data

José M. Peña; Fazel Famili; Sylvain Létourneau

The operation and maintenance of today's aircraft is a complex task. It requires use of some state-of-the-art data mining facilities that are not currently available. This paper is about the development and use of data mining techniques to detect abnormal situations in aircraft operation. Using historical sensor data that is normally generated during the operation of aircraft, we induce models to predict abnormal situations in aircraft engines. The method involves creating new features from raw data and identifying trends in particular parameters of interest. We describe how models generated from individual aircraft with abnormal situations can be combined to generate a single model. We evaluate our approach using over 5 years of historical data from the operation of engines of 34 Airbus A-320s.

Oncotarget | 2016

Computational selection of antibody-drug conjugate targets for breast cancer

François Fauteux; Jennifer J. Hill; Maria L. Jaramillo; Youlian Pan; Sieu Phan; Fazel Famili; Maureen O’Connor-McCourt

The selection of therapeutic targets is a critical aspect of antibody-drug conjugate research and development. In this study, we applied computational methods to select candidate targets overexpressed in three major breast cancer subtypes as compared with a range of vital organs and tissues. Microarray data corresponding to over 8,000 tissue samples were collected from the public domain. Breast cancer samples were classified into molecular subtypes using an iterative ensemble approach combining six classification algorithms and three feature selection techniques, including a novel kernel density-based method. This feature selection method was used in conjunction with differential expression and subcellular localization information to assemble a primary list of targets. A total of 50 cell membrane targets were identified, including one target for which an antibody-drug conjugate is in clinical use, and six targets for which antibody-drug conjugates are in clinical trials for the treatment of breast cancer and other solid tumors. In addition, 50 extracellular proteins were identified as potential targets for non-internalizing strategies and alternative modalities. Candidate targets linked with the epithelial-to-mesenchymal transition were identified by analyzing differential gene expression in epithelial and mesenchymal tumor-derived cell lines. Overall, these results show that mining human gene expression data has the power to select and prioritize breast cancer antibody-drug conjugate targets, and the potential to lead to new and more effective cancer therapeutics.
