Suhaila Zainudin
National University of Malaysia
Publication
Featured research published by Suhaila Zainudin.
Computational Biology and Chemistry | 2015
Faridah Hani Mohamed Salleh; Shereena M. Arif; Suhaila Zainudin; Mohd Firdaus-Raih
A gene regulatory network (GRN) is a large and complex network consisting of interacting elements that, over time, affect each other's state. The dynamics of complex gene regulatory processes are difficult to understand using intuitive approaches alone. To overcome this problem, we propose an algorithm for inferring regulatory interactions from knock-out data using a Gaussian model combined with the Pearson Correlation Coefficient (PCC). Several problems relating to GRN construction are outlined in this paper. We demonstrate the ability of our proposed method to predict (1) the presence of regulatory interactions between genes, (2) their directionality and (3) their states (activation or suppression). The algorithm was applied to networks of 10 and 50 genes from the DREAM3 datasets and networks of 10 genes from the DREAM4 datasets. The predicted networks were evaluated based on AUROC and AUPR. We discovered that high false positive values were generated by our GRN prediction methods because indirect regulations were wrongly predicted as true relationships. We achieved satisfactory results, as the majority of sub-networks achieved AUROC values above 0.5.
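As a rough illustration of the correlation part of the abstract above, the following Python sketch scores gene pairs by their Pearson Correlation Coefficient and reads the sign as a state. It is not the authors' algorithm: the Gaussian model and the use of knock-out data to obtain directionality are not covered, and the data layout, the `score_edges` helper, the 0.8 threshold and the sign-to-state mapping are all assumptions made for the example.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / math.sqrt(var_x * var_y)

def score_edges(expression, threshold=0.8):
    """Return (gene_a, gene_b, pcc, state) for pairs with |pcc| >= threshold.

    `expression` maps a gene name to its expression values across
    experiments (a hypothetical layout chosen for this sketch).
    """
    genes = sorted(expression)
    edges = []
    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            r = pearson(expression[a], expression[b])
            if abs(r) >= threshold:
                # Assumption: positive correlation is read as activation,
                # negative correlation as suppression.
                state = "activation" if r > 0 else "suppression"
                edges.append((a, b, r, state))
    return edges

# Toy profiles, invented for illustration only.
toy = {
    "G1": [1.0, 2.0, 3.0, 4.0],
    "G2": [2.0, 4.0, 6.0, 8.0],
    "G3": [4.0, 3.0, 2.0, 1.0],
    "G4": [1.0, 3.0, 2.0, 3.0],
}
edges = score_edges(toy)
for a, b, r, state in edges:
    print(f"{a} - {b}: r = {r:+.2f} ({state})")
```

Because PCC is symmetric, a score like this says nothing about which gene regulates which. It also cannot separate direct from indirect regulation, which is consistent with the false positives the abstract reports.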
science and information conference | 2015
Zohreh Madhoushi; Abdul Razak Hamdan; Suhaila Zainudin
The task of Sentiment Analysis (SA) is to label people's opinions in a given piece of text with categories such as positive and negative. Another task is to decide whether a given text is subjective, expressing the writer's opinions, or objective, expressing. These tasks are performed at different levels of analysis, ranging from the document level to the sentence and phrase level. Another task is aspect extraction, which originated from aspect-based sentiment analysis at the phrase level. All these tasks are under the umbrella of SA. In recent years, a large number of methods, techniques and enhancements have been proposed for the problem of SA in different tasks at different levels. This survey aims to categorize SA techniques in general, without focusing on a specific level or task, and to review the main research problems in recent articles presented in this field. We found that machine learning-based techniques (including supervised, unsupervised and semi-supervised learning), lexicon-based techniques and hybrid techniques are the most frequently used. The open problems are that recent techniques are still unable to work well across different domains; sentiment classification based on insufficient labeled data is still a challenging problem; there is a lack of SA research in languages other than English; and existing techniques are still unable to deal with complex sentences that require more than sentiment words and simple parsing.
Expert Systems With Applications | 2017
Nur Shazila Mohamed; Suhaila Zainudin; Zulaiha Ali Othman
Quality data mining analysis based on microarray gene expression data is a good approach for disease classification and other fields, such as pharmacology, as well as a useful tool for medical innovation. One of the challenges in classification is that microarrays involve high dimensionality and a large number of redundant and irrelevant features. Feature selection is the most popular method for determining the optimal number of features that will be used for classification. Feature selection also accelerates learning, since the data are represented only by the optimal feature subset. The current approach to microarray feature selection with filter methods is to simply select the top-ranked genes, i.e., keeping the 50 or 100 best-ranked genes. However, the current approach is determined by human intuition; it requires trial and error, and thus is time-consuming. Accordingly, this study proposes a metaheuristic approach for selecting the top n relevant genes in drug microarray data to enhance the minimum redundancy–maximum relevance (mRMR) filter method. Three metaheuristics are applied, namely, particle swarm optimization (PSO), cuckoo search (CS), and artificial bee colony (ABC). Subsequently, k-nearest neighbor and support vector machine are used as classifiers to evaluate classification performance. The experiment used a microarray gene dataset of liver xenobiotic and pharmacological responses. Experimental results show that the metaheuristics are more efficient approaches that reduce the complexity of the classifier. Furthermore, the results show that mRMR-CS exhibits the best performance compared with mRMR-PSO and mRMR-ABC.
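To make the mRMR filter mentioned in the abstract above more concrete, here is a minimal Python sketch of a common greedy formulation: at each step, pick the feature whose mutual information with the class is highest after subtracting its mean mutual information with the features already chosen. It assumes discretized features and a fixed `n_select`. The metaheuristic search for n (PSO, CS, ABC) that the study proposes is not shown, and all names and data are hypothetical.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )

def mrmr(features, labels, n_select):
    """Greedy mRMR selection over a dict of feature name -> discrete values."""
    relevance = {f: mutual_information(v, labels) for f, v in features.items()}
    selected = []
    remaining = sorted(features)

    def score(f):
        if not selected:
            return relevance[f]
        redundancy = sum(
            mutual_information(features[f], features[s]) for s in selected
        ) / len(selected)
        return relevance[f] - redundancy

    while remaining and len(selected) < n_select:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data, invented for illustration: geneB duplicates geneA, while geneC
# is less relevant on its own but adds information geneA lacks.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "geneA": [0, 0, 0, 1, 1, 1, 1, 1],
    "geneB": [0, 0, 0, 1, 1, 1, 1, 1],
    "geneC": [0, 1, 0, 0, 1, 1, 0, 1],
}
print(mrmr(features, labels, 2))  # ['geneA', 'geneC']
```

A plain top-2 ranking by relevance would keep both geneA and its duplicate geneB. The redundancy penalty is what lets geneC in instead.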
intelligent systems design and applications | 2010
Suhaila Zainudin; Nur Shazila Mohamed
Research in systems biology integrates experimental, theoretical, and modeling techniques to study and understand biological processes such as gene regulation. The genomic sequences for human and other model organisms such as yeast and bacteria are already established. The next major step is to discover the functional roles of genes whose functions are not yet known and to investigate how genes interact with each other to perform different biological processes. DNA microarray technology provides access to large-scale gene expression data, which are necessary for understanding the functional roles of genes and how genes interact on a global scale. Gene network reconstruction is one of the major research areas in Systems Biology. Modeling gene network systems will generate useful hypotheses about novel gene functions. Clustering of gene expression data is used to analyze the results of microarray studies. This method is often useful in understanding how a class of genes performs together during a biological process. Therefore, the purpose of this research is to investigate different clustering algorithms, including k-means clustering, fuzzy c-means and self-organizing maps (SOM). Clusters produced by these methods are then used to develop a graphical model using a Bayesian Network (BN). Experimental results from the clustering methods are statistically validated and then compared with each other. From our experiments, we found that SOM is better than k-means and fuzzy c-means since it produced the highest total number of clusters.
intelligent systems design and applications | 2008
Suhaila Zainudin; Safaai Deris
Gene network reconstruction is a multidisciplinary research area involving data mining, machine learning, statistics, ontologies and others. A reconstructed gene network allows us to understand how genes interact with each other. However, network construction is very complex due to the highly interactive nature of genes. One proposed approach to this complex problem is to cluster the genes according to similarity in their gene expression profiles. We applied k-means clustering with k = 10 to obtain ten clusters of genes. Then, we applied Bayesian Network structure learning with a hill-climbing search strategy and the Akaike Information Criterion score to search for the best network. We compared the inferred interactions to a reference dataset of positive interactions and found similarities between our inferred interactions and the reference. We further studied the gene interactions using Gene Ontology. From our findings, we conclude that the clustering step is essential in gene network reconstruction. Clustering produced better groups of genes for Bayesian Network learning. Larger clusters also produced more gene interactions. Gene Ontology can be combined with clustering to produce better quality clusters to improve gene network construction.
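The clustering step described in the abstract above can be sketched with a plain implementation of Lloyd's k-means. This is an illustrative sketch only: the abstract uses k = 10 on real expression profiles, whereas the example below uses k = 2 on invented two-dimensional points. The random initialization, the `seed` parameter and the squared Euclidean distance are assumptions, and the Bayesian Network learning step is not shown.

```python
import random

def kmeans(points, k, seed=0, max_iter=100):
    """Lloyd's k-means on a list of equal-length numeric tuples.

    Returns one cluster index per point.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k data points as initial centroids
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(
                range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
            )
            for point in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [pt for pt, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignment

# Toy "expression profiles" forming two well-separated groups.
profiles = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
]
cluster_of = kmeans(profiles, k=2)
print(cluster_of)
```

Each resulting cluster would then be passed separately to structure learning, which is how the abstract keeps the search space manageable.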
international conference on computational science and its applications | 2007
Suhaila Zainudin; Safaai Deris
A gene network is a representation of gene interactions. A gene collaborates with other genes in order to function. Past research has successfully inferred gene networks from gene expression microarray data. Gene expression microarray data represent different levels of gene expression for organisms during a biological activity such as the cell cycle. A framework for gene network inference is to normalize the gene expression data, discretize the data, learn the gene network and evaluate the gene interactions. This framework was used to learn the gene network for two S. cerevisiae gene expression datasets (Spellman Cell cycle and Gasch Yeast Stress). Gene interaction inference was also done on data contained in 8 major clusters found by Spellman. The inferred networks were compared to gene interaction data curated by Biogrid. Results from the comparison show that some of the inferred gene interactions agree with data contained in Biogrid. By referring to curated genetic interactions in Biogrid, we can understand the significance of computationally inferred gene interactions.
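The normalize, discretize and evaluate stages of the framework in the abstract above might look roughly like the following Python sketch. The abstract does not say which normalization or discretization was used, so the z-score normalization, the three-level discretization and its 0.5 cutoff are assumptions. The network-learning stage is omitted, and the gene pairs are placeholders rather than real interaction data.

```python
import statistics

def zscore(profile):
    """Normalize one gene's expression profile to zero mean, unit variance."""
    mean = statistics.fmean(profile)
    sd = statistics.pstdev(profile)
    if sd == 0:
        return [0.0] * len(profile)
    return [(v - mean) / sd for v in profile]

def discretize(profile, cutoff=0.5):
    """Map normalized values to -1 (low), 0 (baseline) or 1 (high)."""
    return [1 if v > cutoff else -1 if v < -cutoff else 0 for v in profile]

def agreement(inferred, reference):
    """Inferred gene pairs that also appear in the reference, ignoring order."""
    ref = {frozenset(pair) for pair in reference}
    return [pair for pair in inferred if frozenset(pair) in ref]

levels = discretize(zscore([1.0, 2.0, 3.0, 4.0, 5.0]))
print(levels)  # [-1, -1, 0, 1, 1]

# Placeholder pairs standing in for inferred and curated interactions.
inferred = [("A", "B"), ("C", "D")]
curated = [("B", "A"), ("E", "F")]
print(agreement(inferred, curated))  # [('A', 'B')]
```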
fuzzy systems and knowledge discovery | 2012
Yazan Alaya AL-Khassawneh; Azuraliza Abu Bakar; Suhaila Zainudin
Graph-based Association Rules Mining (ARM) is a research area that represents a transactional database as a graph structure to optimize the search for frequent item sets. Sub-graph search is the process of pruning the search by looking for the best representation of connected nodes in a graph to represent the fully connected graphs. The Triangle Counting Approach is one of the sub-graph search approaches used to find the most represented graph. This study aims to employ the Triangle Counting Approach for graph-based association rules mining. A triangle counting method for graph-based ARM is proposed to prune the graph in the search for frequent item sets. The triangle counting is integrated with one of the graph-based ARM methods. It consists of four important phases: data representation, triangle construction, bit vector representation, and triangle integration with the graph-based ARM method. The performance of the proposed method is compared with the original graph-based ARM. Experimental results show that the proposed method reduces the execution time of rule generation and produces fewer rules with higher confidence.
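The general idea of triangle counting over an item graph can be sketched in Python as below. This is a generic illustration under stated assumptions, not the method proposed in the paper: the bit vector representation and the integration with a specific graph-based ARM method are not reproduced, and the co-occurrence graph construction, the `min_support` parameter and the transactions are invented for the example.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_graph(transactions, min_support):
    """Link two items when they appear together in >= min_support transactions."""
    pair_counts = Counter()
    for transaction in transactions:
        for a, b in combinations(sorted(set(transaction)), 2):
            pair_counts[(a, b)] += 1
    graph = {}
    for (a, b), count in pair_counts.items():
        if count >= min_support:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph

def triangles(graph):
    """List every triangle once, as a sorted (a, b, c) tuple."""
    found = []
    for a in sorted(graph):
        for b in sorted(n for n in graph[a] if n > a):
            # A common neighbor of a and b closes the triangle.
            for c in sorted(n for n in graph[a] & graph[b] if n > b):
                found.append((a, b, c))
    return found

transactions = [
    ["bread", "milk", "eggs"],
    ["bread", "milk", "eggs"],
    ["bread", "milk"],
    ["milk", "tea"],
]
graph = cooccurrence_graph(transactions, min_support=2)
print(triangles(graph))  # [('bread', 'eggs', 'milk')]
```

In this toy setting, a triangle marks three items whose pairs are all frequent, which makes it a natural candidate for a frequent 3-item set. The item "tea" drops out because its only pair falls below the support threshold.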
international conference on electrical engineering and informatics | 2015
Nor Ashikin Mohamad Kamal; Azuraliza Abu Bakar; Suhaila Zainudin
Protein datasets have a high-dimensional feature space. These features may include noise and may not be related to protein function. Therefore, we need to select the appropriate features to improve the efficiency and performance of the classifier. Feature selection is an important step in any classification task. Filter methods are important for obtaining only the features relevant to the class and for avoiding redundancy, while wrapper methods are applied to obtain optimized features and better classification accuracy. This paper proposes a feature selection strategy for hierarchical classification of G-Protein-Coupled Receptors (GPCR) based on hybridization of correlation feature selection (CFS) filter and genetic algorithm (GA) wrapper methods. The optimum features were then classified using the K-nearest neighbor algorithm. These methods are capable of reducing the features while achieving comparable classification accuracy at every hierarchy level. The results also show that the integration of CFS and GA is capable of searching for the optimum features for hierarchical protein classification.
international conference on electrical engineering and informatics | 2011
Azuraliza Abu Bakar; Nurfathehah Idris; Abdul Razak Hamdan; Zalinda Othman; Mohd Zakri Ahmad Nazari; Suhaila Zainudin
This study aims to investigate data mining tasks and techniques, specifically sequential pattern mining, for outbreak detection in the area of oil and gas pollution. Sequential pattern mining can be treated as a classification problem if enough data for a certain sequence of time are available, as an association problem if a large number of related attributes are available, or as a deviation detection problem if the available data contain only a few rare patterns or outliers. In this paper, a decision tree is used for classification, and association rules mining is used for the outbreak detection task on an oil and gas air dataset. The study found that unsupervised clustering using the K-Means algorithm can potentially obtain the rare patterns in data distributed over several groups of pollutants, and that the average levels of supervised classification using the decision tree are slightly higher than those of association rules mining classification and are appropriate for classifying the data by contaminant. Association rules mining, on the other hand, produces several sequence rules of contaminants. This study has high potential for producing quality rules for outbreak detection.
ieee region 10 conference | 2001
Suhaila Zainudin; Abdul Razak Hamdan
The paper discusses the proposed design for a workflow engine. Workflow is the automation of procedures where documents, information or work is passed between several processing entities. Work contains activities with certain aims. Usually any work can be divided into smaller subworks. When the subworks have been executed, the original work is done. Each processing entity executes its own part before the work is passed on to the next processing entity. The paper focuses on the work database and two main engine modules. The modules are the work administrator and the user interface. The work administrator enables the engine to process the workflows. The user interface provides two-way communication between the engine and the processing entities. The engine enables the execution and scheduling of work by ensuring that it is conveyed to the appropriate entity during a suitable interval. The work database stores the work processing information. Initial results show that the prototype based on the design is capable of processing production-based workflow.
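A very small Python sketch can illustrate the relationship between the work database, the work administrator and the processing entities described in the abstract above. Every class, method and field name here is hypothetical. The real design's user interface module and its scheduling of work within "a suitable interval" are not modeled, and processing entities are reduced to plain callables.

```python
from dataclasses import dataclass, field

@dataclass
class Work:
    """A piece of work split into ordered subworks (hypothetical structure)."""
    name: str
    subworks: list                      # ordered (entity_name, task) pairs
    log: list = field(default_factory=list)
    next_step: int = 0

    @property
    def done(self):
        # The original work is done once every subwork has been executed.
        return self.next_step >= len(self.subworks)

class WorkAdministrator:
    """Keeps the work database and passes work from entity to entity."""

    def __init__(self):
        self.database = {}              # work database: work name -> Work

    def submit(self, work):
        self.database[work.name] = work

    def advance(self, name, entities):
        """Hand the next subwork to its entity; return False when finished."""
        work = self.database[name]
        if work.done:
            return False
        entity, task = work.subworks[work.next_step]
        work.log.append(entities[entity](task))
        work.next_step += 1
        return True

# Processing entities modeled as simple callables for the example.
entities = {
    "clerk": lambda task: f"clerk: {task}",
    "manager": lambda task: f"manager: {task}",
}
admin = WorkAdministrator()
admin.submit(Work("leave-request", [("clerk", "check form"), ("manager", "approve")]))
while admin.advance("leave-request", entities):
    pass
print(admin.database["leave-request"].log)
```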