Ricco Rakotomalala | Researchain

Publication

Featured research published by Ricco Rakotomalala.

International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems | 1998

FUSINTER: a method for discretization of continuous attributes

Djamel A. Zighed; Sabine Rabaséda; Ricco Rakotomalala

In induction graph methods such as C4.5 or SIPINA, taking continuous attributes into account requires specific discretization procedures. In this paper, we propose, on the one hand, an axiomatic framework leading to a set of criteria that can be used to discretize continuous attributes and, on the other hand, a discretization method called FUSINTER. The results obtained by FUSINTER are compared with those obtained by the techniques of Fayyad and Irani and of Kerber, and they proved better for the majority of the examples studied.

Review of Scientific Instruments | 2005

Library design using genetic algorithms for catalyst discovery and optimization

Frederic Clerc; Mourad Lengliz; David Farrusseng; C. Mirodatos; Silvia R. M. Pereira; Ricco Rakotomalala

This study reports a detailed investigation of catalyst library design by genetic algorithm (GA). A methodology for assessing GA configurations is described. Operators that promote optimization speed while being robust to noise and outliers are identified through statistical studies. The genetic algorithms were implemented in a GA software platform called OptiCat, which enables the construction of custom-made workflows using a toolbox of operators. Two separate studies were carried out: (i) on a virtual benchmark and (ii) on a real surface response derived from HT screening. Additionally, we report a methodology to model a complex surface response by binning the search space into small zones that are then independently modeled by linear regression. In contrast to artificial neural networks, this approach yields an explicit model in an analogical form that can be further used in Excel or entered in OptiCat to perform simulations. While speeding the implementation of a hybrid algorithm...

International Conference on Information and Communication Technologies | 2004

Text mining, feature selection and data mining for protein classification

Faouzi Mhamdi; Mourad Elloumi; Ricco Rakotomalala

This study addresses the classification of proteins based on their primary structures. The protein sequences are collected in a file, and a text mining technique is proposed for extracting features from them. An algorithm is developed that extracts all the n-grams present in the data file and produces a learning file. The algorithm supplies three files: a Boolean file, which records whether or not each n-gram is present, a frequencies file and an occurrences file. Forward selection and backward elimination are then applied to obtain a learning file with an acceptable number of features.
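The n-gram extraction step described above can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the function names and the toy sequences are hypothetical, and only two of the three outputs are shown (a Boolean presence table and a raw count table), because the abstract does not say how the frequencies and occurrences files differ.

```python
from collections import Counter


def extract_ngrams(sequence, n):
    """Return every contiguous n-gram of a sequence, in order of appearance."""
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


def build_tables(sequences, n):
    """Build a Boolean (presence) table and a count table of n-grams.

    Rows follow the input order; columns follow the sorted n-gram vocabulary.
    """
    counts = [Counter(extract_ngrams(s, n)) for s in sequences]
    vocabulary = sorted(set().union(*counts))
    frequency = [[c[g] for g in vocabulary] for c in counts]
    boolean = [[1 if c[g] > 0 else 0 for g in vocabulary] for c in counts]
    return vocabulary, boolean, frequency


# Toy sequences, invented for illustration only.
vocabulary, boolean, frequency = build_tables(["MKVMK", "KVKV"], 2)
```

Each row of either table can then serve as the descriptor vector of one protein in the learning file.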

European Conference on Principles of Data Mining and Knowledge Discovery | 2000

Fast Feature Selection Using Partial Correlation for Multi-valued Attributes

Stéphane Lallich; Ricco Rakotomalala

We propose a fast feature selection method in supervised learning for multi-valued attributes. The main idea is to rewrite the multi-valued problem in the space of examples as a Boolean problem in the space of pairs of examples. On the basis of this approach, we can use the point correlation coefficient, which is null in the case of conditional independence and satisfies a formula connecting partial coefficients with marginal coefficients. This property considerably reduces computing time, because a single pass over the database suffices to compute all the coefficients. We test our algorithm on benchmark databases.
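The change of space described above can be illustrated with a small sketch. It assumes that the "point correlation coefficient" is the ordinary Pearson correlation between two Boolean vectors (the phi coefficient), and it enumerates all pairs explicitly, which is quadratic in the number of examples; the single-pass computation mentioned in the abstract is not reproduced here. All names and data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pairwise_indicator(values):
    """Map a multi-valued column to a Boolean column over pairs of examples.

    The indicator is 1 when the two examples share the same value, else 0.
    """
    return [1 if a == b else 0 for a, b in combinations(values, 2)]


def phi(u, v):
    """Pearson correlation between two Boolean vectors (phi coefficient)."""
    n = len(u)
    n11 = sum(1 for a, b in zip(u, v) if a and b)
    n1_, n_1 = sum(u), sum(v)
    denom = sqrt(n1_ * (n - n1_) * n_1 * (n - n_1))
    # A constant vector has no variance; report 0 rather than divide by zero.
    return (n * n11 - n1_ * n_1) / denom if denom else 0.0


# Toy data: one attribute matches the class exactly, the other does not.
labels = pairwise_indicator(["x", "x", "y", "y"])
relevant = phi(pairwise_indicator(["a", "a", "b", "b"]), labels)
irrelevant = phi(pairwise_indicator(["a", "b", "a", "b"]), labels)
```

In this toy case the attribute that partitions the examples exactly like the class gets a coefficient of 1, while the other gets a negative value.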

Information Sciences | 1996

A comparison of some contextual discretization methods

Sabine Rabaséda; Ricco Rakotomalala; Marc Sebban

In graph-based induction methods such as C4.5 or SIPINA, taking continuous attributes into account requires specific discretization procedures. In this paper, we propose, on the one hand, an axiom leading to a set of criteria that can be used to discretize continuous attributes and, on the other hand, a discretization method called FUSINTER. The results obtained by FUSINTER are compared with those obtained by techniques developed by other researchers, and they proved better for the majority of the examples studied. We also discuss other approaches based on statistical concepts that rely either on inertia criteria or on nonparametric tests, such as Mood's runs test.

Archive | 2001

Sampling Strategy for Building Decision Trees from Very Large Databases Comprising Many Continuous Attributes

Jean-Hugues Chauchat; Ricco Rakotomalala

We propose a fast and efficient sampling strategy to build decision trees from a very large database, even when there are many continuous attributes that must be discretized at each step. Successive samples are used, one at each tree node. After a brief description of two fast sequential simple random sampling methods, we apply elements of statistical theory to determine the sample size that is sufficient at each step to obtain a decision tree as efficient as one built on the whole database. Applying the method to a simulated database (of virtually infinite size) and to five standard benchmarks confirms that, when the database is large and contains many numerical attributes, our strategy of fast sampling at each node (with a sample size of about n = 300 or 500) speeds up the mining process while maintaining the accuracy of the classifier.
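The abstract does not name its two sequential sampling methods. As an illustration of what a sequential simple random sampling method looks like, here is a sketch of classical selection sampling (Knuth's Algorithm S), which draws exactly `size` records in one pass when the number of records is known in advance. The function name, parameters and toy data are hypothetical.

```python
import random


def sequential_sample(records, total, size, rng=random):
    """Draw a simple random sample of `size` records in one sequential pass.

    `total` must equal the number of records in the stream. Each record is
    kept with probability (still needed) / (still unseen).
    """
    chosen = []
    for seen, record in enumerate(records):
        needed = size - len(chosen)
        if needed == 0:
            break  # sample complete, no need to read further
        if rng.random() * (total - seen) < needed:
            chosen.append(record)
    return chosen


# Example: the records reaching a node, sampled down to 300 before splitting.
node_records = range(100_000)
sample = sequential_sample(node_records, 100_000, 300, random.Random(0))
```

In a node-level strategy like the one described, such a sample would be drawn from the records that reach each node before the split is evaluated.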

European Conference on Principles of Data Mining and Knowledge Discovery | 1999

Studying the Behavior of Generalized Entropy in Induction Trees Using a M-of-N Concept

Ricco Rakotomalala; Stéphane Lallich; S. Di Palma

This paper studies splitting criteria in decision trees from three original points of view. First, we propose a unified formalization of association measures based on entropy of type beta. This formalization includes popular measures such as the Gini index and Shannon entropy. Second, we generate artificial data from M-of-N concepts whose complexity and class distribution are controlled. Third, our experiment allows us to study the behavior of the measures on datasets of growing complexity. The results show that the differences in performance between measures, which are significant when there is no noise in the data, disappear as the level of noise increases.
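The abstract does not state the formula it uses. A minimal sketch, assuming the standard Daróczy definition of entropy of type beta, shows how a single expression covers both measures: beta = 2 gives twice the Gini index, and the limit beta → 1 gives Shannon entropy in bits. The function name is hypothetical.

```python
from math import log2


def entropy_beta(probs, beta):
    """Entropy of type beta of a discrete distribution (Daróczy's form).

    H_beta(p) = 2^(beta-1) / (2^(beta-1) - 1) * (1 - sum(p_i^beta)),
    with the Shannon entropy (base 2) as the limiting case beta = 1.
    """
    if abs(beta - 1.0) < 1e-12:
        return -sum(p * log2(p) for p in probs if p > 0)
    scale = 2 ** (beta - 1) / (2 ** (beta - 1) - 1)
    return scale * (1 - sum(p ** beta for p in probs))


dist = [0.9, 0.1]
gini = 1 - sum(p ** 2 for p in dist)
shannon = entropy_beta(dist, 1)
quadratic = entropy_beta(dist, 2)  # equals 2 * gini under this definition
```

The scaling factor normalizes the measure so that a uniform two-class distribution has entropy 1 for every beta.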

Computer Recognition Systems | 2005

Feature Ranking for Protein Classification

Faouzi Mhamdi; Ricco Rakotomalala; Mourad Elloumi

In this paper, a knowledge discovery framework is used for protein classification. The processing is carried out in three steps: feature extraction, feature ranking and feature selection. Inspired by text mining results, we use n-gram descriptors in the first step; in the second step, descriptors are ranked using chi-2 statistical indices; and in the final step, we select the subset of descriptors that minimizes the prediction error rate of a k-nearest neighbor classifier. Experiments show that this framework gives good results: the dimensionality reduction is effective and improves classifier performance.
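The ranking step can be sketched as follows. This is an illustrative implementation of the Pearson chi-2 statistic between one descriptor and the class label, not the authors' code; the descriptor names and the toy data are hypothetical, and the subsequent k-nearest neighbor selection step is not shown.

```python
from collections import Counter


def chi2_score(feature, labels):
    """Pearson chi-2 statistic of the contingency table feature x label."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    score = 0.0
    for f, nf in feature_totals.items():
        for c, nc in label_totals.items():
            expected = nf * nc / n  # count expected under independence
            score += (observed[(f, c)] - expected) ** 2 / expected
    return score


def rank_features(columns, labels):
    """Return descriptor names sorted by decreasing chi-2 score."""
    return sorted(columns, key=lambda name: chi2_score(columns[name], labels),
                  reverse=True)


# Toy Boolean n-gram descriptors for four proteins in two families.
labels = ["A", "A", "B", "B"]
columns = {"g1": [1, 1, 0, 0], "g2": [1, 0, 1, 0]}
ranking = rank_features(columns, labels)
```

A selection step would then walk down this ranking and keep the prefix of descriptors that gives the lowest classifier error.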

European Conference on Principles of Data Mining and Knowledge Discovery | 2000

Sampling Strategies for Targeting Rare Groups from a Bank Customer Database

Jean-Hugues Chauchat; Ricco Rakotomalala; Didier Robert

This paper presents various balanced sampling strategies for building decision trees in order to target rare groups. A new coefficient for comparing the targeting performance of various learning strategies is introduced. A real-life application, targeting a specific group of bank customers for marketing actions, is described. Results show that local sampling at the nodes while constructing the tree requires only small samples to match the performance obtained by processing the complete database, with dramatically reduced computing times.

Archive | 2000