Publication


Featured research published by Ilkka Havukkala.


Neural Computing and Applications | 2007

Classification consistency analysis for bootstrapping gene selection

Shaoning Pang; Ilkka Havukkala; Yingjie Hu; Nikola Kasabov

Consistency modelling for gene selection is a new topic emerging from recent cancer bioinformatics research. The result of operations such as classification, clustering, or gene selection on a training set is often found to be very different from the result of the same operations on a testing set, presenting a serious consistency problem. In practice, the inconsistency of microarray datasets prevents many typical gene selection methods from working properly for cancer diagnosis and prognosis. In an attempt to deal with this problem, this paper proposes a new concept of classification consistency and applies it to the microarray gene selection problem using a bootstrapping approach, with encouraging results.


Applied Soft Computing | 2008

Soft computing methods to predict gene regulatory networks: An integrative approach on time-series gene expression data

Zeke S. H. Chan; Ilkka Havukkala; Vishal Jain; Yingjie Hu; Nikola Kasabov

To unravel the controlling mechanisms of gene regulation, this paper applies soft computing methods to an important problem in bioinformatics: inferring gene regulatory networks (GRN) from time-series gene expression microarray data. The main questions addressed are: (a) what knowledge can be derived from different models, and (b) would an integrated approach be more suitable for revealing the controls of gene regulation? To reduce the number of genes, in addition to applying appropriate clustering methods, we also considered valuable inputs from biological experiments. To infer the GRN we applied three computational intelligence methods: Least Angle Regression (LARS), Expectation Maximization (EM) with Kalman Filter (KF), and an Evolving Fuzzy Neural Network (EFuNN). The methods were applied to time-series microarray data of Schizosaccharomyces pombe yeast cell-cycle genes. Each method reveals some new aspects of the problem, and we conclude that an integrative approach such as ours is better suited to inferring the GRN and understanding the processes behind gene regulation, because it uncovers new knowledge. For example, using LARS we hypothesize, first, that an exoglucanase gene, exg1, is tied to MCB cluster regulation and, second, a mannosidase with histone-linked mannoses. A new quantitative prediction is that the time delay of the interaction between two genes seems to be approximately 30 min, or 0.17 cell cycles. Using EM with KF, 25 cell cycle-regulated key genes were successfully clustered into three functionally co-regulated groups. We also identified two genes, Cdc22 and Suc22, that interact with each other and are potential candidates for controlling ribonucleotide reductase (RNR) activity. Based on the EFuNN results and integrating knowledge from the EM-KF method, we hypothesize that the interaction between Suc22, Cdc22 and Mrc1 may be mediated by two other genes, Cds1 and Spd1. The methods discussed and applied here can be used to analyze any kind of short time series of many interacting variables for inferring the regulatory network. Researchers should take such an integrative computational intelligence approach seriously to understand the complex phenomenon of gene regulation and thus to simulate the development of the cell.


BMC Evolutionary Biology | 2007

Large-scale genomic 2D visualization reveals extensive CG-AT skew correlation in bird genomes

Xuegong Deng; Ilkka Havukkala; Xuemei Deng

Background: Bird genomes have a very different compositional structure compared with other warm-blooded animals. The variation in the base skew rules in vertebrate genomes remains puzzling, but it must relate somehow to large-scale genome evolution. Current research is inclined to relate base skew to mutations and their fixation. Here we wish to explore base skew correlations in bird genomes, to develop methods for displaying and quantifying such correlations at different scales, and to discuss possible explanations for the peculiarities of bird genomes in skew correlation.

Results: We have developed a method called Base Skew Double Triangle (BSDT) for exhibiting the genome-scale change of AT/CG skew as a two-dimensional square picture, showing base skews at many scales simultaneously in a single image. By this method we found that most chicken chromosomes have high AT/CG skew correlation (symmetry in the 2D picture), except for some microchromosomes. No other organisms studied (18 species) show such high skew correlations. This visualized high correlation was validated by three kinds of quantitative calculations with overlapping and non-overlapping windows, all indicating that chicken and birds in general have a special genome structure. Similar features were also found in some of the mammal genomes, but clearly much weaker than in chickens. We presume that the skew correlation feature evolved near the time that birds separated from other vertebrate lineages. When we eliminated the repeat sequences from the genomes, the correlation between AT and CG skews increased for some mammal genomes, but was still clearly lower than in chickens.

Conclusion: Our results suggest that BSDT is an expressive visualization method for AT and CG skew and enabled the discovery of the very high skew correlation in bird genomes; this peculiarity is worth further study. Computational analysis indicated that this correlation might be a compositional characteristic, present not only in chickens, but also retained or developed in some mammals during evolution. Special aspects of bird metabolism related to e.g. flight may be the reason why birds evolved or retained the skew correlation. Our analysis also indicated that repetitive DNA sequence elements need to be taken into account in studying the evolution of the correlation between AT and CG skews.
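The windowed skew correlation described in this abstract can be illustrated with a minimal sketch. It assumes the conventional definitions AT skew = (A - T)/(A + T) and GC skew = (G - C)/(G + C) over non-overlapping windows, followed by a Pearson correlation; the sign convention, the window scheme, and the function names `window_skews` and `pearson` are assumptions for illustration, not the BSDT method itself.

```python
from math import sqrt


def window_skews(seq, window):
    """Return (at_skews, gc_skews) for non-overlapping windows of `seq`.

    Assumed definitions (not taken from the paper):
      AT skew = (A - T) / (A + T), GC skew = (G - C) / (G + C).
    Windows where either skew is undefined are skipped.
    """
    seq = seq.upper()
    at_skews, gc_skews = [], []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        a, t = chunk.count("A"), chunk.count("T")
        g, c = chunk.count("G"), chunk.count("C")
        if a + t == 0 or g + c == 0:
            continue  # skew undefined for this window
        at_skews.append((a - t) / (a + t))
        gc_skews.append((g - c) / (g + c))
    return at_skews, gc_skews


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length sequences with n >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for a constant series")
    return cov / sqrt(vx * vy)


# Toy sequence: AT and GC skew rise and fall together in every window.
at, gc = window_skews("AAGGTTCC" * 2, window=4)
r = pearson(at, gc)
```

Repeating the calculation for several window sizes gives a rough picture of how the correlation changes with scale, which is the kind of question the 2D visualization addresses in a single image.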


International Symposium on Neural Networks | 2006

Two-Class SVM trees (2-SVMT) for biomarker data analysis

Shaoning Pang; Ilkka Havukkala; Nikola Kasabov

High-dimensional two-class biomarker data (e.g. microarray and proteomics data with few samples but large numbers of variables) are often difficult to classify. Many currently used methods cannot easily deal with unbalanced datasets (when the numbers of samples in class 1 and class 2 are very different). This problem can be alleviated by the following new method: first, sample the data space by recursive partitions, then use a two-class support vector machine tree (2-SVMT) for classification. Recursive partitioning divides the feature space into more manageable portions, from which informative features are more easily found by 2-SVMT. Using two-class microarray and proteomics data for cancer diagnostics, we demonstrate that 2-SVMT results in higher classification accuracy and especially more consistent classification of various datasets than standard SVM, KNN or C4.5. The advantage of the method is its strong robustness for class-unbalanced datasets.


International Conference on Pattern Recognition | 2006

Image and fractal information processing for large-scale chemoinformatics, genomics analyses and pattern discovery

Ilkka Havukkala; Lubica Benuskova; Shaoning Pang; Vishal Jain; Rene Kroon; Nikola Kasabov

Two promising approaches for handling large-scale biodata are presented and illustrated in several new contexts: molecular structure bitmap image processing for chemoinformatics, and fractal visualization methods for genome analyses. It is suggested that two-dimensional structure databases of bioactive molecules (e.g. proteins, drugs, folded RNAs), transformed to bitmap image databases, can be analysed by a variety of image processing methods, with an example of human microRNA folded 2D structures processed by a Gabor filter. Another compact and efficient visualization method is the comparison of huge amounts of genomic and proteomic data through fractal representation, with an example of analyzing oligomer frequencies in a bacterial phytoplasma genome. Bitmap visualization of bioinformatics data seems promising for complex parallel pattern discovery and large-scale genome comparisons, as powerful modern image processing methods can be applied to the 2D images.
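One widely used fractal layout for oligomer frequencies is the frequency chaos game representation (FCGR), in which every k-mer maps to one cell of a 2^k by 2^k grid. Whether the paper uses exactly this scheme is an assumption; the sketch below only illustrates the general idea, and the corner assignment in `CORNERS` and the function name `fcgr` are illustrative choices.

```python
# Assumed corner convention for the chaos game square (x, y).
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(seq, k):
    """Count k-mers of `seq` on a 2**k x 2**k grid (rows = y, columns = x).

    In the chaos game each new base halves the distance to its corner,
    so the last base of a k-mer selects the coarsest quadrant and the
    first base the finest subdivision.
    """
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(base not in CORNERS for base in kmer):
            continue  # skip k-mers containing N or other symbols
        x = y = 0
        for pos, base in enumerate(kmer):
            cx, cy = CORNERS[base]
            x += cx << pos  # later bases get higher-order bits
            y += cy << pos
        grid[y][x] += 1
    return grid


# A 4 x 4 grid of dinucleotide counts for a toy sequence.
grid = fcgr("ACGTACGTTTGA", k=2)
```

The resulting grid can be rendered as a bitmap, so that two genomes are compared as images of the same size regardless of sequence length.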


Computational Intelligence in Biomedicine and Bioinformatics | 2008

Integrating Local and Personalised Modelling with Global Ontology Knowledge Bases for Biomedical and Bioinformatics Decision Support

Nikola Kasabov; Qun Song; Lubica Benuskova; Paulo C. M. Gottgtroy; Vishal Jain; Anju Verma; Ilkka Havukkala; Elaine Rush; Russel Pears; Alex Tjahjana; Yingjie Hu; Stephen G. MacDonell

A novel ontology-based decision support framework and a development platform are described, which allow for the creation of global knowledge representation for local and personalised modelling and decision support. The two main modules are an ontology module and a machine learning module. Both modules evolve through continuous learning from new data. Results from the machine learning procedures can be fed back into the ontology, thus enriching its knowledge base and facilitating new discoveries. This framework supports global, local and personalised modelling. The latter is a process of model creation for a single person, based on their personal data and the information available in the ontology. Several methods for local and personalised modelling, both traditional and new, are described. A case study is presented on brain-gene-disease ontology, where a set of 12 genes related to central nervous system cancer is revealed from existing data and local profiles of patients are derived. Through ontology analysis, these genes are found to be related to different functions, areas, and other diseases of the brain. Two other case studies discussed in the paper are chronic disease ontology and risk evaluation, and cancer gene ontology and prognosis.


Journal of Computational Biology | 2007

On the reliable identification of plant sequences containing a polyadenylation site.

Ilkka Havukkala; Stijn Vanderlooy

It is a challenging task to predict with high reliability whether plant genomic sequences contain a polyadenylation (polyA) site or not. In this paper, we solve the task by means of a systematic machine-learning procedure applied on a dataset of 1000 Arabidopsis thaliana sequences flanking polyA sites. Our procedure consists of three steps. In the first step, we extract informative features from the sequences using the highly informative k-mer windows approach. Experiments with five classifiers show that the best performance is approximately 83%. In the second step, we improve performance to 95% by reducing the number of features using linear discriminant analysis, followed by applying the linear discriminant classifier. In the third step, we apply the transductive confidence machines approach and the receiver operating characteristic isometrics approach. The resulting two classifiers enable presetting any desired performance by dealing carefully with sequences for which it is unclear whether they contain polyA sites or not. For example, in our case study, we obtain 99% performance by leaving 26% of the sequences unclassified, and 100% performance by leaving 40% of the sequences unclassified. This is clearly useful for experimental verification of putative polyA sites in the laboratory. The novel methods in our machine-learning procedure should find applications in several areas of bioinformatics.
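The two ideas in this abstract that lend themselves to a short illustration are k-mer feature extraction and trading coverage for accuracy by leaving uncertain sequences unclassified. The sketch below uses plain k-mer counts and a simple two-threshold reject rule as a stand-in; it is not the paper's k-mer windows approach, transductive confidence machines, or ROC isometrics, and the names `kmer_counts`, `classify_with_reject` and `accuracy_and_coverage` are hypothetical.

```python
from itertools import product


def kmer_counts(seq, k):
    """Return counts of all 4**k DNA k-mers in lexicographic order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # ignore k-mers with ambiguous bases
            counts[kmer] += 1
    return [counts[m] for m in kmers]


def classify_with_reject(score, low, high):
    """Map a score in [0, 1] to 1 (polyA), 0 (no polyA) or None (rejected).

    `score` is assumed to come from some upstream classifier; scores
    strictly between `low` and `high` are left unclassified.
    """
    if score >= high:
        return 1
    if score <= low:
        return 0
    return None


def accuracy_and_coverage(scores, labels, low, high):
    """Accuracy on the classified sequences and the fraction classified."""
    decisions = [classify_with_reject(s, low, high) for s in scores]
    decided = [(d, y) for d, y in zip(decisions, labels) if d is not None]
    coverage = len(decided) / len(scores)
    if not decided:
        return None, coverage
    accuracy = sum(d == y for d, y in decided) / len(decided)
    return accuracy, coverage


# Toy scores: widening the reject band raises accuracy but lowers coverage.
scores = [0.95, 0.9, 0.55, 0.45, 0.1, 0.2]
labels = [1, 1, 0, 1, 0, 0]
strict = accuracy_and_coverage(scores, labels, low=0.3, high=0.7)
lenient = accuracy_and_coverage(scores, labels, low=0.5, high=0.5)
```

On the toy data the wider reject band classifies fewer sequences but makes no errors on those it does classify, which mirrors the trade-off the abstract reports between performance and the share of sequences left unclassified.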


International Conference on Hybrid Intelligent Systems | 2006

A Novel Microarray Gene Selection Method Based on Consistency

Yingjie Hu; Shaoning Pang; Ilkka Havukkala

Consistency modeling for gene selection is a new topic emerging from recent cancer bioinformatics research. The result of classification or clustering on a training set is often found to be very different from the result of the same operations on a testing set. Here, we address this issue as a consistency problem. We propose a new concept of performance-based consistency and a novel gene selection method, the Genetic Algorithm Gene Selection method in terms of consistency (GAGSc). The proposed consistency concept and the GAGSc method were investigated on eight benchmark microarray and proteomic datasets. The experimental results show that different microarray datasets have different consistency characteristics, and that better consistency can lead to an unbiased and reproducible outcome with good disease prediction accuracy. More importantly, GAGSc has demonstrated that gene selection with the proposed consistency measurement is able to enhance reproducibility in microarray diagnosis experiments.


International Journal of Image and Graphics | 2013

Comparing bitmapped microRNA structure images using mutual symmetry

Arjan Kuijper; Ilkka Havukkala

We present a high-throughput method for analyzing large-scale bitmapped bio-data: processing of elongated molecular structures as 2D images and analyzing their shapes for chemoinformatics databases. Two-dimensional structure databases are transferred to bitmap images, a commonly used visualization that is widespread online. Then, an efficient clustering of the molecular structures is achieved by a mutual symmetry-based binary matrix representation of the shapes. We present a method to compute the difference between two such representations and evaluate its performance with respect to time and quality of matching. In our tests we use two bitmap databases, one containing true human microRNA folded 2D structures and one with claimed human microRNA folded 2D structures. We show the stability of the matching with respect to parameterization and orientation of the shapes. Our method enables a good automatic clustering of structures with high visual similarity.


Pattern Recognition in Bioinformatics | 2007

Strong GC and AT skew correlation in chicken genome

Xuegong Deng; Xuemei Deng; Ilkka Havukkala

Chicken genome AT and GC skews for individual chromosomes were visualized simultaneously using a novel method based on a 2-dimensional color-coded pixel matrix. The visualizations were compared to those of human, mouse and possum genomes. A strikingly strong correlation of AT and GC skew from small to large scale was found in the chicken genome, compared to the other vertebrates. Some local skew correlations were also found for the other vertebrates, but only at small genomic scale. Quantitative measures of correlation were developed, and confirmed the special characteristic of chicken chromosomes. Possible explanations for the uniqueness of birds in this respect are discussed. The phylogenetic distribution and evolutionary pressures responsible for this previously unreported skew correlation warrant further study.

Collaboration


Dive into Ilkka Havukkala's collaborations.

Top Co-Authors

Nikola Kasabov

Auckland University of Technology

Shaoning Pang

Unitec Institute of Technology

Yingjie Hu

Auckland University of Technology

Vishal Jain

Auckland University of Technology

Xuegong Deng

Northeastern University

Xuemei Deng

China Agricultural University

Alex Tjahjana

Auckland University of Technology

Anju Verma

Auckland University of Technology

Elaine Rush

Auckland University of Technology
