A. Fazel Famili | Researchain

Publication

Featured research published by A. Fazel Famili.

Archive | 2005

Advances in Intelligent Data Analysis VI

A. Fazel Famili; Joost N. Kok; José M. Peña; Arno Siebes; Ad Feelders

Probabilistic Latent Clustering of Device Usage.- Condensed Nearest Neighbor Data Domain Description.- Balancing Strategies and Class Overlapping.- Modeling Conditional Distributions of Continuous Variables in Bayesian Networks.- Kernel K-Means for Categorical Data.- Using Genetic Algorithms to Improve Accuracy of Economical Indexes Prediction.- A Distance-Based Method for Preference Information Retrieval in Paired Comparisons.- Knowledge Discovery in the Identification of Differentially Expressed Genes in Tumoricidal Macrophage.- Searching for Meaningful Feature Interactions with Backward-Chaining Rule Induction.- Exploring Hierarchical Rule Systems in Parallel Coordinates.- Bayesian Networks Learning for Gene Expression Datasets.- Pulse: Mining Customer Opinions from Free Text.- Keystroke Analysis of Different Languages: A Case Study.- Combining Bayesian Networks with Higher-Order Data Representations.- Removing Statistical Biases in Unsupervised Sequence Learning.- Learning from Ambiguously Labeled Examples.- Learning Label Preferences: Ranking Error Versus Position Error.- FCLib: A Library for Building Data Analysis and Data Discovery Tools.- A Knowledge-Based Model for Analyzing GSM Network Performance.- Sentiment Classification Using Information Extraction Technique.- Extending the SOM Algorithm to Visualize Word Relationships.- Towards Automatic and Optimal Filtering Levels for Feature Selection in Text Categorization.- Block Clustering of Contingency Table and Mixture Model.- Adaptive Classifier Combination for Visual Information Processing Using Data Context-Awareness.- Self-poised Ensemble Learning.- Discriminative Remote Homology Detection Using Maximal Unique Sequence Matches.- From Local Pattern Mining to Relevant Bi-cluster Characterization.- Machine-Learning with Cellular Automata.- MDS polar: A New Approach for Dimension Reduction to Visualize High Dimensional Data.- Miner Ants Colony: A New Approach to Solve a Mine Planning Problem.- Extending the GA-EDA Hybrid Algorithm to Study Diversification and Intensification in GAs and EDAs.- Spatial Approach to Pose Variations in Face Verification.- Analysis of Feature Rankings for Classification.- A Mixture Model-Based On-line CEM Algorithm.- Reliable Hierarchical Clustering with the Self-organizing Map.- Statistical Recognition of Noun Phrases in Unrestricted Text.- Successive Restrictions Algorithm in Bayesian Networks.- Modelling the Relationship Between Streamflow and Electrical Conductivity in Hollin Creek, Southeastern Australia.- Biological Cluster Validity Indices Based on the Gene Ontology.- An Evaluation of Filter and Wrapper Methods for Feature Selection in Categorical Clustering.- Dealing with Data Corruption in Remote Sensing.- Regularized Least-Squares for Parse Ranking.- Bayesian Network Classifiers for Time-Series Microarray Data.- Feature Discovery in Classification Problems.- A New Hybrid NM Method and Particle Swarm Algorithm for Multimodal Function Optimization.- Detecting Groups of Anomalously Similar Objects in Large Data Sets.

Artificial Intelligence in Medicine | 2004

Data mining of gene expression changes in Alzheimer brain

P. Roy Walker; Brandon Smith; Qing Yan Liu; A. Fazel Famili; Julio J. Valdés; Z. Liu; Boleslaw Lach

Genome-wide transcription profiling is a powerful technique for studying the enormous complexity of cellular states. Moreover, when applied to disease tissue it may reveal quantitative and qualitative alterations in gene expression that give information on the context or underlying basis for the disease and may provide a new diagnostic approach. However, the data obtained from high-density microarrays are highly complex and pose considerable challenges in data mining. The data require care in both pre-processing and the application of data mining techniques. This paper addresses the problem of dealing with microarray data that come from two known classes (Alzheimer and normal). We have applied three separate techniques to discover genes associated with Alzheimer disease (AD). The 67 genes identified in this study included a total of 17 genes that are already known to be associated with Alzheimer's or other neurological diseases. This is higher than in any of the previously published Alzheimer's studies. Twenty known genes, not previously associated with the disease, have been identified, as well as 30 uncharacterized expressed sequence tags (ESTs). Given the success in identifying genes already associated with AD, we can have some confidence in the involvement of the latter genes and ESTs. From these studies we can attempt to define therapeutic strategies that would prevent the loss of specific components of neuronal function in susceptible patients or be in a position to stimulate the replacement of lost cellular function in damaged neurons. Although our study is based on a relatively small number of patients (four AD and five normal), we think our approach sets the stage for a major step in using gene expression data for disease modeling (i.e. classification and diagnosis). It can also contribute to the future of gene function identification, pathology, toxicogenomics, and pharmacogenomics.

Canadian Conference on Artificial Intelligence | 2006

Learning and evaluation in the presence of class hierarchies: application to text categorization

Svetlana Kiritchenko; Stan Matwin; Richard Nock; A. Fazel Famili

This paper deals with categorization tasks where categories are partially ordered to form a hierarchy. First, it introduces the notion of consistent classification which takes into account the semantics of a class hierarchy. Then, it presents a novel global hierarchical approach that produces consistent classification. This algorithm with AdaBoost as the underlying learning procedure significantly outperforms the corresponding “flat” approach, i.e. the approach that does not take into account the hierarchical information. In addition, the proposed algorithm surpasses the hierarchical local top-down approach on many synthetic and real tasks. For evaluation purposes, we use a novel hierarchical evaluation measure that has some attractive properties: it is simple, requires no parameter tuning, gives credit to partially correct classification and discriminates errors by both distance and depth in a class hierarchy.
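The abstract does not give the formula for the hierarchical evaluation measure, so the following is only a minimal sketch of one measure with the listed properties, assuming that both the true and the predicted label sets are extended with all of their ancestors before precision and recall are computed. The function names and the `parent` mapping (child category to parent category, with top-level categories absent from the mapping) are hypothetical and are not taken from the paper.

```python
def with_ancestors(labels, parent):
    """Return the labels together with all of their ancestors.

    `parent` maps a category to its parent; top-level categories are
    simply missing from the mapping (assumption: the root is excluded).
    """
    extended = set()
    for label in labels:
        node = label
        while node is not None:
            extended.add(node)
            node = parent.get(node)
    return extended


def hierarchical_prf(true_labels, pred_labels, parent):
    """Precision, recall and F1 computed on ancestor-extended label sets."""
    t = with_ancestors(true_labels, parent)
    p = with_ancestors(pred_labels, parent)
    overlap = len(t & p)
    precision = overlap / len(p) if p else 0.0
    recall = overlap / len(t) if t else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy hierarchy: A -> {A.1 -> {A.1.a}, A.2}, B -> {B.1}
parent = {"A.1": "A", "A.2": "A", "A.1.a": "A.1", "B.1": "B"}

near_miss = hierarchical_prf({"A.1.a"}, {"A.1"}, parent)   # parent of the true class
sibling = hierarchical_prf({"A.1.a"}, {"A.2"}, parent)     # same top-level branch
far_miss = hierarchical_prf({"A.1.a"}, {"B.1"}, parent)    # different branch
```

In this toy hierarchy, predicting the parent of the true class scores higher than predicting a class in a sibling subtree, which in turn scores higher than predicting a class in an unrelated branch. That ordering is the kind of partial credit the abstract describes.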

BMC Evolutionary Biology | 2007

Evolution of motif variants and positional bias of the cyclic-AMP response element

Brandon Smith; Hung Fang; Youlian Pan; P. Roy Walker; A. Fazel Famili; Marianna Sikorska

Background: Transcription factors regulate gene expression by interacting with their specific DNA binding sites. Some transcription factors, particularly those involved in transcription initiation, always bind close to transcription start sites (TSS). Others have no such preference and are functional on sites even tens of thousands of base pairs (bp) away from the TSS. The Cyclic-AMP response element (CRE) binding protein (CREB) binds preferentially to a palindromic sequence (TGACGTCA), known as the canonical CRE, and also to other CRE variants. CREB can activate transcription at CREs thousands of bp away from the TSS, but in mammals CREs are found far more frequently within 1 to 150 bp upstream of the TSS than in any other region. This property is termed positional bias. The strength of CREB binding to DNA is dependent on the sequence of the CRE motif. The central CpG dinucleotide in the canonical CRE (TGACGTCA) is critical for strong binding of CREB dimers. Methylation of the cytosine in the CpG can inhibit binding of CREB. Deamination of the methylated cytosines causes a C to T transition, resulting in a functional, but lower affinity CRE variant, TGATGTCA. Results: We performed genome-wide surveys of CREs in a number of species (from worm to human) and showed that only vertebrates exhibited a CRE positional bias. We performed pair-wise comparisons of human CREs with orthologous sequences in mouse, rat and dog genomes and found that canonical and TGATGTCA variant CREs are highly conserved in mammals. However, when orthologous sequences differ, canonical CREs in human are most frequently TGATGTCA in the other species and vice-versa. We have identified 207 human CREs showing such differences. Conclusion: Our data suggest that the positional bias of CREs likely evolved after the separation of urochordata and vertebrata. Although many canonical CREs are conserved among mammals, there are a number of orthologous genes that have canonical CREs in one species but the TGATGTCA variant in another. These differences are likely due to deamination of the methylated cytosines in the CpG and may contribute to differential transcriptional regulation among orthologous genes.
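As an illustration of what positional bias means in practice, the sketch below scans upstream sequences for the canonical CRE and its deamination variant and reports how many hits fall within 150 bp of the TSS. It is not the authors' survey pipeline. It assumes each input string is the upstream region written 5' to 3' and ending at the base immediately before the TSS, and all function names are hypothetical.

```python
CANONICAL = "TGACGTCA"  # canonical CRE (palindromic)
VARIANT = "TGATGTCA"    # C-to-T variant named in the abstract

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_distances(upstream, motif):
    """Distances (in bp) from the TSS to each motif hit on either strand.

    Assumption: `upstream` ends at position -1, so a hit whose last base
    is the final character of the string has distance 1.
    """
    seq = upstream.upper()
    k = len(motif)
    targets = {motif, revcomp(motif)}  # a set avoids double-counting palindromes
    return [
        len(seq) - i - k + 1
        for i in range(len(seq) - k + 1)
        if seq[i:i + k] in targets
    ]


def proximal_fraction(promoters, motif, window=150):
    """Fraction of all motif hits lying within `window` bp upstream of the TSS."""
    dists = [d for p in promoters for d in upstream_distances(p, motif)]
    if not dists:
        return float("nan")
    return sum(d <= window for d in dists) / len(dists)


# Made-up sequences: one proximal hit and one distal hit.
proximal = "A" * 200 + CANONICAL + "A" * 10
distal = CANONICAL + "A" * 300
fraction = proximal_fraction([proximal, distal], CANONICAL)
```

Comparing this fraction with the value obtained for background or shuffled sequences would be one simple way to quantify the bias. The abstract does not say which statistic the authors used.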

Bioinformatics | 2004

Evaluation and optimization of clustering in gene expression data analysis

A. Fazel Famili; Ganming Liu; Z. Liu

MOTIVATION: A measurement of cluster quality is needed to choose potential clusters of genes that contain biologically relevant patterns of gene expression. This is strongly desirable when a large number of gene expression profiles have to be analyzed and proper clusters of genes need to be identified for further analysis, such as the search for meaningful patterns, identification of gene functions or gene response analysis. RESULTS: We propose a new cluster quality method, called stability, by which unsupervised learning of gene expression data can be performed efficiently. The method takes into account a cluster's stability on partition. We evaluate this method and demonstrate its performance using four independent, real gene expression datasets and three simulated datasets. We demonstrate that our method outperforms other techniques listed in the literature. The method has applications in evaluating clustering validity as well as identifying stable clusters. AVAILABILITY: Please contact the first author.

Journal of Bioinformatics and Computational Biology | 2004

Discovery of Functional Genes for Systemic Acquired Resistance in Arabidopsis thaliana through Integrated Data Mining

Youlian Pan; Jeffrey D. Pylatuik; Junjun Ouyang; A. Fazel Famili; Pierre R. Fobert

Various data mining techniques combined with sequence motif information in the promoter region of genes were applied to discover functional genes that are involved in the defense mechanism of systemic acquired resistance (SAR) in Arabidopsis thaliana. A series of K-Means clustering with difference-in-shape as distance measure was initially applied. A stability measure was used to validate this clustering process. A decision tree algorithm with the discover-and-mask technique was used to identify a group of most informative genes. Appearance and abundance of various transcription factor binding sites in the promoter region of the genes were studied. Through the combination of these techniques, we were able to identify 24 candidate genes involved in the SAR defense mechanism. The candidate genes fell into 2 highly resolved categories, each category showing significantly unique profiles of regulatory elements in their promoter regions. This study demonstrates the strength of such integration methods and suggests a broader application of this approach.

Computational Systems Bioinformatics | 2004

Selection of putative cis-regulatory motifs through regional and global conservation

Youlian Pan; Brandon Smith; Hung Fang; A. Fazel Famili; Marianna Sikorska; P. Roy Walker

Cis-regulatory motifs are often overrepresented in promoters and may exhibit frequency biases in subpromoter regions (SPRs). Many probabilistic algorithms have been used to predict such motifs, but they tend to generate many false positives. We devised a novel algorithm, MotifFilter, that computes representation indices (RIs) for putative motifs. MotifFilter's RI is the ratio of the actual to the expected frequency of a motif in promoters, SPRs or random genomic DNA, taking into account the nucleotide probability distributions in these regions. This approach was applied to a genome-wide survey of putative cAMP-response elements (CREs) for motifs generated by a profile hidden Markov model. Twenty of 144 putative CRE motifs found in the survey were retained by the MotifFilter.
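The ratio described above can be sketched as follows. This is not the MotifFilter implementation. It assumes the expected frequency comes from a simple independence model in which each base is drawn from the region's own nucleotide frequencies, and that only the given strand is scanned. The function names are hypothetical.

```python
from collections import Counter


def nucleotide_probs(seqs):
    """Empirical A/C/G/T frequencies over a set of sequences."""
    counts = Counter()
    for s in seqs:
        counts.update(ch for ch in s.upper() if ch in "ACGT")
    total = sum(counts.values())
    return {base: counts[base] / total for base in "ACGT"}


def count_motif(seqs, motif):
    """Number of (possibly overlapping) exact occurrences of the motif."""
    motif = motif.upper()
    k = len(motif)
    return sum(
        1
        for s in seqs
        for i in range(len(s) - k + 1)
        if s[i:i + k].upper() == motif
    )


def representation_index(seqs, motif):
    """Observed count divided by the count expected under base independence."""
    probs = nucleotide_probs(seqs)
    k = len(motif)
    p_motif = 1.0
    for base in motif.upper():
        p_motif *= probs[base]
    positions = sum(max(len(s) - k + 1, 0) for s in seqs)
    expected = p_motif * positions
    if expected == 0:
        return float("nan")
    return count_motif(seqs, motif) / expected


# Toy input: "ACGT" occurs twice among 5 possible start positions.
ri = representation_index(["ACGTACGT"], "ACGT")
```

An RI well above 1 marks a motif that occurs more often than base composition alone would predict. Computing the index separately for promoters, SPRs and random genomic DNA would then allow the regional comparison the abstract mentions.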

Artificial Intelligence Applications and Innovations | 2005

An Approach to Automated Knowledge Discovery in Bioinformatics

Junjun Ouyang; A. Fazel Famili; Weiling Xu

Extensive data mining applications to bioinformatics research have shown that knowledge discovery requires repeated manual interventions, and that conglomerating and summarizing the results would be time-consuming and sometimes error-prone. To assist in efficiently applying data mining technologies in bioinformatics, we have developed Automation facilities in our data mining software suite. Experiences gained from case studies are extracted and presented as scenarios, which are sets of data processing and analysis operations for specific data mining objectives. Built as sequences of these predefined scenarios, procedures apply previously established data mining strategies to new data sets in an automated way. Automation also highlights the results particularly related to researchers’ own areas of interest. We present insights into our automated knowledge discovery and two example scenarios extracted from one case study to demonstrate the usefulness of our approach.

Industrial and Engineering Applications of Artificial Intelligence and Expert Systems | 2004

Knowledge discovery in hepatitis C virus transgenic mice

A. Fazel Famili; Junjun Ouyang; Marko Kryworuchko; Ikuri Alvarez-Maya; Brandon Smith; Francisco Diaz-Mitoma

One of the difficulties of using Artificial Neural Networks (ANNs) to estimate atmospheric temperature is the large number of potential input variables available. In this study, four different feature extraction methods were used to reduce the input vector to train four networks to estimate temperature at different atmospheric levels. The four techniques used were: genetic algorithms (GA), coefficient of determination (CoD), mutual information (MI) and simple neural analysis (SNA). The results demonstrate that of the four methods used for this data set, mutual information and simple neural analysis can generate networks that have a smaller input parameter set, while still maintaining a high degree of accuracy.

Iberoamerican Congress on Pattern Recognition | 2014

Searching for Patterns in Imbalanced Data: Methods and Alternatives with Case Studies in Life Sciences

A. Fazel Famili

The prime motivation for pattern discovery and machine learning research has been the collection and warehousing of large amounts of data in many domains, such as life sciences and industrial processes. Among the unique problems that have arisen are situations where the data is imbalanced. The class imbalance problem corresponds to situations where the majority of cases belong to one class and a small minority belongs to the other, which in many cases is equally or even more important. To deal with this problem, a number of approaches have been studied in the past. In this talk we provide an overview of some existing methods and present novel applications that are based on identifying the inherent characteristics of one class vs the other. We present the results of a number of studies focusing on real data from life science applications.
