Thanh N. Tran
Radboud University Nijmegen
Publication
Featured research published by Thanh N. Tran.
Computational Statistics & Data Analysis | 2006
Thanh N. Tran; Ron Wehrens; L.M.C. Buydens
Density-based clustering algorithms for multivariate data often have difficulties with high-dimensional data and clusters of very different densities. A new density-based clustering algorithm, called KNNCLUST, is presented in this paper that is able to tackle these situations. It is based on the combination of nonparametric k-nearest-neighbor (KNN) and kernel (KNN-kernel) density estimation. The KNN-kernel density estimation technique makes it possible to model clusters of different densities in high-dimensional data sets. Moreover, the number of clusters is identified automatically by the algorithm. KNNCLUST is tested using simulated data and applied to a multispectral compact airborne spectrographic imager (CASI) image of a floodplain in the Netherlands to illustrate the characteristics of the method.
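As a rough illustration of the k-nearest-neighbor density idea mentioned in the abstract above, the sketch below implements the classic KNN density estimate, k / (n · V), where V is the volume of the ball that reaches the k-th nearest sample. This is not the paper's KNN-kernel estimator or the KNNCLUST assignment rule, whose details are not given here; the function name `knn_density` and its parameters are hypothetical.

```python
import math


def knn_density(query, data, k):
    """Classic k-nearest-neighbor density estimate at `query`.

    Illustrative only: this is the textbook estimator, not the
    KNN-kernel estimator or the KNNCLUST rule from the paper.
    """
    n = len(data)
    d = len(query)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(data)")
    # Distances from the query to every sample, in ascending order.
    dists = sorted(math.dist(query, x) for x in data)
    r_k = dists[k - 1]  # radius reaching the k-th nearest sample
    if r_k == 0.0:
        return math.inf  # k coincident points: the estimate is unbounded
    # Volume of a d-dimensional ball of radius r_k.
    volume = math.pi ** (d / 2) / math.gamma(d / 2 + 1) * r_k ** d
    return k / (n * volume)
```

Because the radius adapts to the local spacing of the samples, the estimate is large where points are tightly packed and small where they are sparse. Note that if `query` is itself one of the samples, it counts as its own nearest neighbor at distance zero.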
IEEE Transactions on Geoscience and Remote Sensing | 2005
Thanh N. Tran; Ron Wehrens; L.M.C. Buydens
Markov random field (MRF) clustering, utilizing both spectral and spatial interpixel dependency information, often improves classification accuracy for remote sensing images, such as multichannel polarimetric synthetic aperture radar (SAR) images. However, it is highly sensitive to initial conditions such as the choice of the number of clusters and their parameters. In this paper, an initialization scheme for MRF clustering approaches is suggested for remote sensing images. The proposed method derives suitable initial cluster parameters from a set of homogeneous regions, and estimates the number of clusters using the pseudolikelihood information criterion (PLIC). The method works best for an image consisting of many large homogeneous regions, such as agricultural crop areas. It is illustrated using a well-known polarimetric SAR image of Flevoland in the Netherlands. The experiment shows a superior performance compared to several other methods, such as fuzzy C-means and iterated conditional modes (ICM) clustering.
Analytica Chimica Acta | 2003
Thanh N. Tran; Ron Wehrens; Lutgarde M. C. Buydens
Multispectral images such as multispectral chemical images or multispectral satellite images provide detailed data with information in both the spatial and spectral domains. Many segmentation methods for multispectral images are based on a per-pixel classification, which uses only spectral information and ignores spatial information. A clustering algorithm based on both spectral and spatial information would produce better results. In this work, spatial refinement clustering (SpaRef), a new clustering algorithm for multispectral images, is presented. Spatial information is integrated with partitional and agglomeration clustering processes. The number of clusters is automatically identified. SpaRef is compared with a set of well-known clustering methods on a compact airborne spectrographic imager (CASI) image of an area in the Klompenwaard, the Netherlands. The clusters obtained show improved results. Applying SpaRef to multispectral chemical images would be a straightforward step.
2003 2nd GRSS/ISPRS Joint Workshop on Remote Sensing and Data Fusion over Urban Areas | 2003
Thanh N. Tran; Ron Wehrens; Lutgarde M. C. Buydens
High-resolution, high-dimensional satellite images cause problems for clustering methods due to clusters of different sizes, shapes and densities. The most common clustering methods, e.g. K-means and ISODATA, do not work well for such datasets. In this work, density estimation techniques and density-based clustering methods are exploited. Density-based clustering is well known in data mining as a way to classify a data set based on its density parameters, where lower-density areas separate high-density areas, although it can only work with a simple data set in which cluster densities are not very different. Our contribution is to propose the k nearest neighbor (knn) density-based rule for high-dimensional datasets and to develop a new knn density-based clustering method (KNNCLUST) for such complex datasets. KNNCLUST is stable, clear and easy to understand and implement. The number of clusters is automatically determined. These properties are illustrated by the segmentation of a multispectral image of a floodplain in the Netherlands.
Journal of Chemometrics | 2017
Thanh N. Tran; Ewa Szymańska; Jan Gerretzen; Lutgarde M. C. Buydens; Nelson Lee Afanador; Lionel Blanchet
The selection of the optimal number of components remains a difficult but essential task in partial least squares (PLS). Randomization tests have the advantage of being automatic and they make use of the entire dataset, in contrast to the widely used cross‐validation approaches. Partial least squares modeling may include component(s) with a large amount of irrelevant data variation, and this might affect the model, depending on the assigned y‐loading (which is the regression coefficient in the latent domain). This has recently been indicated by us in the basic sequence framework with respect to the underlying theory of the PLS algorithm and presented to the chemometrics society. We will show in this work that this irrelevant data variation is the root cause of the difficulty in current methods for selecting the optimal number of components. For randomization tests, PLS models with nonsignificant components may result in false positive tests because of the incorrect assumption that “the components enter the model in a natural order”.
International Geoscience and Remote Sensing Symposium | 2004
Thanh N. Tran; Ron Wehrens; Lutgarde M. C. Buydens
Markov random field (MRF) clustering, utilizing both spectral and spatial inter-pixel dependency information, often provides higher accuracy for remote sensing images, such as polarimetric SAR images. However, it is highly sensitive to initial conditions, i.e. the initialization of parameters and the choice of the number of clusters. In this paper, an initialization scheme for MRF clustering approaches for polarimetric SAR images is suggested. The method takes into account spatial relations between pixels and provides a guideline for the choice of the number of clusters using the pseudolikelihood information criterion (PLIC). A well-known polarimetric SAR image of Flevoland in the Netherlands is given as an example, showing that this approach gives very good performance.
Analytica Chimica Acta | 2018
Yang Liu; Thanh N. Tran; G.J. Postma; Lutgarde M. C. Buydens; Jeroen J. Jansen
Principal Component Analysis (PCA) is widely used in analytical chemistry to reduce the dimensionality of a multivariate data set to a few Principal Components (PCs) that summarize the predominant patterns in the data. An accurate estimate of the number of PCs is indispensable to provide meaningful interpretations and extract useful information. We show how existing estimates for the number of PCs may fall short for datasets with considerable coherence, noise or outlier presence. We present here how Angle Distribution of the Loading Subspaces (ADLS) can be used to estimate the number of PCs based on the variability of the loading subspace across bootstrap resamples. Based on comprehensive comparisons with other well-known methods applied to simulated datasets, we show that ADLS may (1) quantify the stability of a PCA model with several numbers of PCs simultaneously; (2) better estimate the appropriate number of PCs when compared with the cross-validation and scree plot methods, specifically for coherent data; and (3) facilitate integrated outlier detection, which we introduce in this manuscript. In addition, we demonstrate how the analysis of different types of real-life spectroscopic datasets may benefit from these advantages of ADLS.
Remote Sensing | 2018
Jacopo Acquarelli; Elena Marchiori; Lutgarde M. C. Buydens; Thanh N. Tran; Twan van Laarhoven
Spectral-spatial classification of remotely sensed hyperspectral images has been the subject of many studies in recent years. Current methods achieve excellent performance on benchmark hyperspectral image labeling tasks when a sufficient number of labeled pixels is available. However, in the presence of only very few labeled pixels, such classification becomes a challenging problem. In this paper we propose to tackle this problem using convolutional neural networks (CNNs) and data augmentation. Our newly developed method relies on the assumption of spectral-spatial locality: nearby pixels in a hyperspectral image are related, in the sense that their spectra and their labels are likely to be similar. We exploit this assumption to develop 1) a new data augmentation procedure which adds new samples to the train set and 2) a tailored loss function which penalizes differences among weights of the network corresponding to nearby wavelengths of the spectra. We train a simple single-layer convolutional neural network with this loss function and augmented train set and use it to classify all unlabeled pixels of the given image. To assess the efficacy of our method, we used five publicly available hyperspectral images: Pavia Center, Pavia University, KSC, Indian Pines and Salina. On these images our method significantly outperforms other baselines. Notably, with just 1% of labeled pixels per class, on these datasets our method achieves an accuracy of 99.5%, etc. Furthermore, we show that our method improves over other baselines also in a supervised setting, when no overlap between train and test pixels is allowed. Overall, our investigation demonstrates that spectral-spatial locality can be easily embedded in a simple convolutional neural network through data augmentation and a tailored loss function.
Spectral-spatial classification of hyperspectral images has been the subject of many studies in recent years. In the presence of only very few labeled pixels, this task becomes challenging. In this paper we address the following two research questions: 1) Can a simple neural network with just a single hidden layer achieve state-of-the-art performance in the presence of few labeled pixels? 2) How is the performance of hyperspectral image classification methods affected when using disjoint train and test sets? We give a positive answer to the first question by using three tricks within a very basic shallow Convolutional Neural Network (CNN) architecture: a tailored loss function, and smooth- and label-based data augmentation. The tailored loss function enforces that neighboring wavelengths have similar contributions to the features generated during training. A new label-based technique proposed here favors selection of pixels in smaller classes, which is beneficial in the presence of very few labeled pixels and skewed class distributions. To address the second question, we introduce a new sampling procedure to generate disjoint train and test sets. The train set is then used to obtain the CNN model, which is applied to pixels in the test set to estimate their labels. We assess the efficacy of the simple neural network method on five publicly available hyperspectral images. On these images our method significantly outperforms the considered baselines. Notably, with just 1% of labeled pixels per class, on these datasets our method achieves an accuracy that ranges from 86.42% (challenging dataset) to 99.52% (easy dataset). Furthermore, we show that the simple neural network method improves over other baselines in the new challenging supervised setting. Our analysis substantiates the highly beneficial effect of using the entire image (so train and test data) for constructing a model.
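The tailored loss described above penalizes differences among network weights that correspond to nearby wavelengths. The abstract does not give its exact form, so the following is only a minimal sketch under assumptions: a squared-difference penalty over directly adjacent weights, scaled by a hypothetical `strength` factor, which would be added to an ordinary classification loss. The function name `smoothness_penalty` is likewise hypothetical.

```python
def smoothness_penalty(weights, strength=1.0):
    """Sum of squared differences between weights of adjacent wavelengths.

    Assumed form for illustration only; the paper's actual loss term
    is not specified in the abstract.
    """
    # Pair each weight with its right-hand neighbor along the spectral axis.
    adjacent_pairs = zip(weights, weights[1:])
    return strength * sum((b - a) ** 2 for a, b in adjacent_pairs)
```

A weight vector that varies smoothly along the spectral axis incurs a small penalty, while one that jumps between neighboring bands is penalized more heavily, which is the kind of behavior the abstract attributes to the tailored loss.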
Chemometrics and Intelligent Laboratory Systems | 2005
Thanh N. Tran; Ron Wehrens; Lutgarde M. C. Buydens
Chemometrics and Intelligent Laboratory Systems | 2014
Thanh N. Tran; Nelson Lee Afanador; Lutgarde M. C. Buydens; Lionel Blanchet