Nguyen-Khang Pham | Researchain

Publication

Featured research published by Nguyen-Khang Pham.

knowledge discovery and data mining | 2008

A comparison of different off-centered entropies to deal with class imbalance for decision trees

Philippe Lenca; Stéphane Lallich; Thanh-Nghi Do; Nguyen-Khang Pham

In data mining, large differences in prior class probabilities, known as the class imbalance problem, have been reported to hinder the performance of classifiers such as decision trees. Dealing with imbalanced and cost-sensitive data has been recognized as one of the 10 most challenging problems in data mining research. In decision tree learning, many measures are based on the concept of Shannon's entropy. A major characteristic of these entropies is that they take their maximal value when the distribution of the modalities of the class variable is uniform. To deal with the class imbalance problem, we proposed an off-centered entropy which takes its maximum value for a distribution fixed by the user. This distribution can be the a priori distribution of the class variable modalities or a distribution taking into account the costs of misclassification. Other authors have proposed an asymmetric entropy. In this paper we present the concepts of the three entropies and compare their effectiveness on 20 imbalanced data sets. All our experiments are based on the C4.5 decision tree algorithm, in which only the entropy function is modified. The results are promising and show the value of off-centered entropies for dealing with the problem of class imbalance.
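The idea of an entropy whose maximum sits at a user-chosen distribution can be illustrated with a minimal two-class sketch. It assumes a piecewise-linear recentering that maps the chosen frequency `theta` to 1/2 before applying Shannon's entropy; the function names and this particular mapping are illustrative assumptions, and the exact definition used in the paper may differ.

```python
import math

def shannon_entropy(p):
    """Binary Shannon entropy in bits; maximal at p = 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

def off_centered_entropy(p, theta):
    """Binary entropy recentered so that its maximum is reached at p = theta.

    Assumption: p is the minority-class frequency in a node, theta is the
    user-fixed frequency (for example the class prior), and the recentering
    is piecewise linear: [0, theta] -> [0, 0.5] and [theta, 1] -> [0.5, 1].
    """
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    if p <= theta:
        pi = p / (2.0 * theta)
    else:
        pi = (p + 1.0 - 2.0 * theta) / (2.0 * (1.0 - theta))
    return shannon_entropy(pi)

# With a 10% minority class, a node that merely reproduces the prior is the
# least informative one, while a 50/50 node is already informative.
at_prior = off_centered_entropy(0.1, 0.1)
at_half = off_centered_entropy(0.5, 0.1)
```

Under this sketch, a node with the prior distribution scores as maximally uncertain, so a split is rewarded for moving away from the prior and not for moving away from the uniform distribution.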

advanced data mining and applications | 2010

High dimensional image categorization

François Poulet; Nguyen-Khang Pham

We are interested in varying the vocabulary size in the image categorization task with a bag-of-visual-words to investigate its influence on the classification accuracy in two cases: in the first, both the test set and the training set contain the same objects (with only different viewpoints in the test set); in the second, objects in the test set do not appear at all in the training set (only other objects from the same category appear). In order to perform these tasks, we need to scale up the algorithms used so that they can deal with millions of data points in hundreds of thousands of dimensions. We present k-means (used in the quantization step) and SVM (used in the classification step) algorithms extended to deal with very large datasets. These new incremental and parallel algorithms can be used on various distributed architectures, such as multithreaded computers, clusters or GPUs (graphics processing units). The efficiency of the approach is shown with the categorization of the 3D-Dataset from Savarese and Fei-Fei containing about 6700 images of 3D objects from 10 different classes. The resulting incremental and parallel SVM algorithm is several orders of magnitude faster than usual ones (like lib-SVM, SVM-perf or CB-SVM), and the incremental and parallel k-means is at least one order of magnitude faster than usual implementations.
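One common way to make k-means incremental is to stream the data in chunks and keep only per-cluster sums and counts in memory. The sketch below illustrates that idea in plain Python; it is not the authors' implementation, and the names `kmeans_chunked` and `make_chunks` are hypothetical.

```python
def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    best, best_dist = 0, float("inf")
    for j, center in enumerate(centers):
        dist = sum((a - b) ** 2 for a, b in zip(point, center))
        if dist < best_dist:
            best, best_dist = j, dist
    return best

def kmeans_chunked(make_chunks, centers, n_iter=10):
    """Lloyd iterations in which the data is read one chunk at a time.

    make_chunks is a callable returning a fresh iterable of chunks (lists of
    points), so the full dataset never has to be held in memory at once.
    """
    k, dim = len(centers), len(centers[0])
    for _ in range(n_iter):
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for chunk in make_chunks():
            for point in chunk:
                j = nearest(point, centers)
                counts[j] += 1
                for d in range(dim):
                    sums[j][d] += point[d]
        # Empty clusters keep their previous center.
        centers = [
            [s / counts[j] for s in sums[j]] if counts[j] else list(centers[j])
            for j in range(k)
        ]
    return centers

def make_chunks():
    # Two small chunks standing in for blocks read from disk.
    yield [(0.0, 0.0), (0.0, 2.0)]
    yield [(10.0, 0.0), (10.0, 2.0)]

final_centers = kmeans_chunked(make_chunks, [[0.0, 0.0], [10.0, 0.0]], n_iter=3)
```

Because the per-chunk sums and counts are simply added together, the chunks could in principle be processed by separate threads or machines and merged afterwards, which is the property a parallel variant would rely on.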

Revue d'Intelligence Artificielle | 2008

TreeView, exploration interactive des arbres de décision (TreeView, interactive exploration of decision trees)

Nguyen-Khang Pham; Thanh-Nghi Do; François Poulet; Annie Morin

We propose a graphical environment that combines a new radial tree layout and zoom/pan techniques with existing methods, including explorer-like hierarchical visualization and interactive techniques, to represent large decision trees in a graphical form that is more intuitive than the output of usual decision tree algorithms. The interactive exploration system presents a global view of the tree and also performs very well when the user focuses on a sub-tree of interest, offering simplicity, speed of task completion, ease of use and user understanding. The user can easily extract inductive rules and prune the tree in the post-processing stage. The numerical test results with real datasets show that the proposed methods give insight into decision tree results. They can guide the user in evaluating the models and in making more accurate decisions.
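A radial tree layout in general places the root at the center and each deeper level on a larger ring, giving every sub-tree an angular wedge. The sketch below is a generic version of that idea, assuming wedges proportional to leaf counts and a tree stored as nested dictionaries with unique `name` keys; it is not the specific layout introduced in the paper.

```python
import math

def count_leaves(node):
    """Number of leaves under node (a node without children is one leaf)."""
    children = node.get("children", [])
    return 1 if not children else sum(count_leaves(c) for c in children)

def radial_layout(node, start=0.0, end=2.0 * math.pi, depth=0, ring=1.0, out=None):
    """Map each node name to (x, y) coordinates.

    A node sits at the middle angle of its wedge [start, end], on the ring of
    radius depth * ring; children split the wedge by their leaf counts.
    """
    if out is None:
        out = {}
    angle = (start + end) / 2.0
    radius = depth * ring
    out[node["name"]] = (radius * math.cos(angle), radius * math.sin(angle))
    total = count_leaves(node)
    child_start = start
    for child in node.get("children", []):
        span = (end - start) * count_leaves(child) / total
        radial_layout(child, child_start, child_start + span, depth + 1, ring, out)
        child_start += span
    return out

tree = {
    "name": "root",
    "children": [{"name": "left"}, {"name": "right"}],
}
positions = radial_layout(tree)
```

Zooming and panning can then be handled as a separate transform applied to these coordinates, so the layout itself only needs to be computed once per tree.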

content based multimedia indexing | 2008

Boosting of factorial correspondence analysis for image retrieval

Nguyen-Khang Pham; Annie Morin; Patrick Gros

We are concerned with the use of factorial correspondence analysis (FCA) for image retrieval. FCA is designed for analysing contingency tables. In textual data analysis (TDA), FCA analyses a contingency table crossing terms/words and documents. To adapt FCA to images, we first define "visual words" computed from scale-invariant feature transform (SIFT) descriptors in images and use them for image quantization. At this step, we can build a contingency table crossing "visual words" as terms/words and images as documents. In spite of its successful applications in information retrieval, FCA suffers from a large-dimension problem because of the diagonalization of a large matrix. We propose a new algorithm, CABoost, which overcomes this large-dimension problem of FCA. The data are sampled by column (word) and an FCA is applied to the sample. After several samplings, we finally combine the separate results by a weighted principal component analysis (PCA). The numerical experiments show that our algorithm runs faster than the classical FCA without losing precision.

information technology interfaces | 2008

CAViz, an interactive graphical tool for image mining

Nguyen-Khang Pham; Annie Morin; Patrick Gros

We propose an interactive graphical tool, CAViz, which allows us to display and to extract knowledge from the results of a Correspondence Analysis (CA) on images. CA is a descriptive technique designed to analyze simple two-way and multi-way tables containing some measure of correspondence between the rows and columns. CA is very often used in Textual Data Analysis (TDA), where the contingency table crosses words and documents. In image mining, the first step is to define "visual" words in images (similar to words in texts). These words are constructed from local descriptors (SIFT, Scale Invariant Feature Transform) in images. Our tool CAViz is interactive, and it helps the user interpret the results and the graphs of CA. An application to the Caltech4 base (Sivic et al., 2005) illustrates the value of CAViz in image mining.

Applied Stochastic Model and Data Analysis International Conference, ASMDA'2007 | 2007