Khanh-Chuong Duong | Researchain

Publication

Featured research published by Khanh-Chuong Duong.

European Conference on Machine Learning | 2013

A declarative framework for constrained clustering

Thi-Bich-Hanh Dao; Khanh-Chuong Duong; Christel Vrain

In recent years, clustering has been extended to constrained clustering, so as to integrate knowledge on objects or on clusters, but adding such constraints generally requires developing new algorithms. We propose a declarative and generic framework, based on Constraint Programming, which makes it possible to design clustering tasks by specifying an optimization criterion and some constraints either on the clusters or on pairs of objects. In our framework, several classical optimization criteria are considered and they can be coupled with different kinds of constraints. Relying on Constraint Programming has two main advantages: declarativity, which makes it easy to add new constraints, and the ability to find an optimal solution satisfying all the constraints (when one exists). On the other hand, computation time depends on the constraints and on their ability to reduce the domains of the variables, thus avoiding an exhaustive search.
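The declarative idea in this abstract (state a criterion and some constraints, then let a generic search find an optimal solution) can be illustrated with a minimal Python sketch. It is not the authors' Constraint Programming model: plain exhaustive enumeration stands in for a CP solver, and the names `best_clustering`, `max_diameter`, `must_link`, `cannot_link` and the toy dissimilarity matrix `d` are all assumptions made for illustration. The criterion chosen here is the maximal cluster diameter.

```python
from itertools import product

def max_diameter(G, d):
    """Largest dissimilarity between two objects placed in the same cluster."""
    n = len(G)
    return max(
        (d[i][j] for i in range(n) for j in range(i + 1, n) if G[i] == G[j]),
        default=0,
    )

def best_clustering(d, k, must_link=(), cannot_link=()):
    """Enumerate every assignment of n objects to k clusters and keep the
    feasible one with the smallest maximal diameter (brute force, toy sizes only)."""
    n = len(d)
    best, best_cost = None, float("inf")
    for G in product(range(k), repeat=n):
        if len(set(G)) != k:                             # every cluster must be non-empty
            continue
        if any(G[i] != G[j] for i, j in must_link):      # must-link violated
            continue
        if any(G[i] == G[j] for i, j in cannot_link):    # cannot-link violated
            continue
        cost = max_diameter(G, d)
        if cost < best_cost:
            best, best_cost = G, cost
    return best, best_cost

# Hypothetical symmetric dissimilarity matrix over four objects.
d = [
    [0, 1, 4, 5],
    [1, 0, 3, 6],
    [4, 3, 0, 2],
    [5, 6, 2, 0],
]

print(best_clustering(d, k=2))
print(best_clustering(d, k=2, cannot_link=[(0, 1)]))
```

Adding a constraint only changes the arguments, not the search procedure, which is the point of the declarative formulation. A real CP solver would prune most assignments by propagation instead of enumerating them all.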

Artificial Intelligence | 2017

Constrained clustering by constraint programming

Thi-Bich-Hanh Dao; Khanh-Chuong Duong; Christel Vrain

Cluster analysis is an important task in Data Mining with hundreds of different approaches in the literature. Over the last decade, cluster analysis has been extended to constrained clustering, also called semi-supervised clustering, so as to integrate prior knowledge on data into clustering algorithms. In this dissertation, we explore Constraint Programming (CP) for solving the task of constrained clustering. The main principles in CP are: (1) users specify the problem declaratively as a Constraint Satisfaction Problem; (2) solvers find solutions by constraint propagation and search. Relying on CP has two main advantages: declarativity, which makes it easy to add new constraints, and the ability to find an optimal solution satisfying all the constraints (when one exists). We propose two models based on CP to address constrained clustering tasks. The models are flexible and general, and support instance-level constraints and different cluster-level constraints. They also allow users to choose among different optimization criteria. In order to improve efficiency, different aspects have been studied in the dissertation. Experiments on various classical datasets show that our models are competitive with other exact approaches. We show that our models can easily be embedded in a more general process and we illustrate this on the problem of finding the Pareto front of a bi-criterion optimization process.

Principles and Practice of Constraint Programming | 2015

Constrained Minimum Sum of Squares Clustering by Constraint Programming

Thi-Bich-Hanh Dao; Khanh-Chuong Duong; Christel Vrain

The Within-Cluster Sum of Squares (WCSS) is the most widely used criterion in cluster analysis. Optimizing this criterion has been proved to be NP-hard and has been studied by different communities. On the other hand, Constrained Clustering, which allows prior user knowledge to be integrated in the clustering process, has received much attention over the last decade. As far as we know, there is a single approach that aims at finding the optimal solution for the WCSS criterion and that integrates different kinds of user constraints. This method is based on integer linear programming and column generation. In this paper, we propose a global optimization constraint for this criterion and develop a filtering algorithm. It is integrated in our general and declarative Constraint Programming framework for Constrained Clustering. Experiments on classic datasets show that our approach outperforms the exact approach based on integer linear programming and column generation.
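For readers unfamiliar with the criterion, the WCSS of a given partition is the sum, over all clusters, of the squared Euclidean distances from each point to its cluster centroid. The Python sketch below only evaluates this value for a fixed assignment; it does not implement the global constraint or the filtering algorithm from the paper. The function name `wcss` and the toy data are assumptions made for illustration.

```python
def wcss(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    clusters = {}
    for p, c in zip(points, labels):
        clusters.setdefault(c, []).append(p)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # Centroid = coordinate-wise mean of the cluster members.
        centroid = [sum(p[k] for p in members) / len(members) for k in range(dim)]
        total += sum(
            sum((p[k] - centroid[k]) ** 2 for k in range(dim)) for p in members
        )
    return total

# Hypothetical 2-D points split into two clusters.
points = [(0, 0), (0, 2), (10, 0), (10, 4)]
labels = [0, 0, 1, 1]
print(wcss(points, labels))  # centroids (0, 1) and (10, 2): 1 + 1 + 4 + 4 = 10.0
```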

European Conference on Artificial Intelligence | 2016

Repetitive branch-and-bound using constraint programming for constrained minimum sum-of-squares clustering

Tias Guns; Thi-Bich-Hanh Dao; Christel Vrain; Khanh-Chuong Duong

Minimum sum-of-squares clustering (MSSC) is a widely studied task, and numerous approximate algorithms as well as a number of exact algorithms have been developed for it. Recently, the value of integrating prior knowledge in data mining has been shown, and much attention has gone into incorporating user constraints into clustering algorithms in a generic way. Exact methods for MSSC using integer linear programming or constraint programming have been shown to be able to incorporate a wide range of constraints. However, a better performing method for unconstrained exact clustering is the Repetitive Branch-and-Bound Algorithm (RBBA). In this paper we show that both approaches can be combined. The key idea is to replace the internal branch-and-bound of RBBA by a constraint programming solver, and use it to compute tight lower and upper bounds. To achieve this, we integrate the computed bounds into the solver using a novel constraint. Our method combines the best of both worlds: it is generic and performs better than other exact constrained methods. Furthermore, we show that our method can be used for multi-objective MSSC clustering, including constrained multi-objective clustering.

International Conference on Tools with Artificial Intelligence | 2013

A Filtering Algorithm for Constrained Clustering with Within-Cluster Sum of Dissimilarities Criterion

Thi-Bich-Hanh Dao; Khanh-Chuong Duong; Christel Vrain

Constrained clustering is an important task in Data Mining. In the last ten years, much work has been done to extend classical clustering algorithms to handle user-defined constraints, but each extension is restricted to one kind of user constraint. In a previous work [1], we proposed a declarative and generic framework, based on Constraint Programming, which makes it possible to design a clustering task by specifying an optimization criterion and different kinds of user constraints. One of the criteria is the within-cluster sum of dissimilarities, which is represented by a sum constraint and reified equality constraints: $V = \sum_{1 \le i < j \le n} (G[i] == G[j])\, a_{ij}$. A direct implementation using predefined constraints is not effective, as the propagation of these constraints is weak. In this paper, we consider this criterion as a global constraint and develop a filtering algorithm for it. This filtering significantly improves the performance of the model. Experiments on classical databases show the value of our approach.
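The criterion stated in this abstract can be read as follows: each reified equality G[i] == G[j] contributes the dissimilarity between objects i and j exactly when both objects are assigned to the same cluster. The Python sketch below evaluates V for a complete assignment. It is only an evaluation of the formula, not the filtering algorithm proposed in the paper, and the function name `wcsd` and the toy matrix `a` are assumptions made for illustration.

```python
from itertools import combinations

def wcsd(G, a):
    """Within-cluster sum of dissimilarities:
    V = sum of a[i][j] over all pairs i < j with G[i] == G[j]."""
    return sum(a[i][j] for i, j in combinations(range(len(G)), 2) if G[i] == G[j])

# Hypothetical symmetric dissimilarity matrix over four objects.
a = [
    [0, 1, 4, 5],
    [1, 0, 3, 6],
    [4, 3, 0, 2],
    [5, 6, 2, 0],
]

G = [0, 0, 1, 1]   # objects 0 and 1 in one cluster, 2 and 3 in the other
print(wcsd(G, a))  # a[0][1] + a[2][3] = 1 + 2 = 3
```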

Database and Expert Systems Applications | 2017

MapFIM: Memory Aware Parallelized Frequent Itemset Mining in Very Large Datasets

Khanh-Chuong Duong; Mostafa Bamha; Arnaud Giacometti; Dominique Haoyuan Li; Arnaud Soulet; Christel Vrain

Mining frequent itemsets in large datasets has received much attention in recent years, with many approaches relying on MapReduce programming models. Many well-known FIM algorithms have been parallelized in a MapReduce framework, such as Parallel Apriori, Parallel FP-Growth and Dist-Eclat. However, most papers focus on work partitioning and/or load balancing, and the resulting algorithms are not extensible because they rely on memory assumptions. A challenge in designing parallel FIM algorithms is thus finding ways to guarantee that the data structures used during mining always fit in the local memory of the processing nodes during all computation steps.

European Conference on Artificial Intelligence | 2016