Carmen Lai
Delft University of Technology
Publication
Featured research published by Carmen Lai.
BMC Bioinformatics | 2006
Carmen Lai; Marcel J. T. Reinders; Laura J. Van't Veer; Lodewyk F. A. Wessels
Background: Gene selection is an important step when building predictors of disease state based on gene expression data. Gene selection generally improves performance and identifies a relevant subset of genes. Many univariate and multivariate gene selection approaches have been proposed. Frequently the claim is made that genes are co-regulated (due to pathway dependencies) and that multivariate approaches are therefore by definition more desirable than univariate selection approaches. Based on the published performances of all these approaches, a fair comparison of the available results cannot be made. This mainly stems from two factors. First, the results are often biased, since the validation set is in one way or another involved in training the predictor, resulting in optimistically biased performance estimates. Second, the published results are often based on a small number of relatively simple datasets. Consequently, no generally applicable conclusions can be drawn. Results: In this study we adopted an unbiased protocol to perform a fair comparison of frequently used multivariate and univariate gene selection techniques, in combination with a range of classifiers. Our conclusions are based on seven gene expression datasets, across several cancer types. Conclusion: Our experiments illustrate that, contrary to several previous studies, in five of the seven datasets univariate selection approaches yield consistently better results than multivariate approaches. The simplest multivariate selection approach, the Top Scoring method, achieves the best results on the remaining two datasets. We conclude that the correlation structures, if present, are difficult to extract due to the small number of samples, and that consequently, overly complex gene selection algorithms that attempt to extract these structures are prone to overtraining.
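The bias described in the abstract above (a validation set that takes part in training) is commonly avoided by repeating gene selection inside every cross-validation fold. The following is a minimal sketch of that idea, not the authors' actual protocol: the t-like score, the nearest-mean classifier, and the names `t_scores`, `nearest_mean_predict`, and `cv_error` are all illustrative assumptions.

```python
import random
import statistics


def t_scores(X, y):
    """Absolute t-like score per gene (column) for binary labels 0/1."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        spread = (statistics.pvariance(a) / len(a)
                  + statistics.pvariance(b) / len(b)) ** 0.5
        # Small constant guards against a zero denominator.
        scores.append(abs(statistics.fmean(a) - statistics.fmean(b)) / (spread + 1e-12))
    return scores


def nearest_mean_predict(X_train, y_train, X_test, genes):
    """Assign each test sample to the class with the closest mean (selected genes only)."""
    means = {}
    for lab in (0, 1):
        rows = [r for r, l in zip(X_train, y_train) if l == lab]
        means[lab] = [statistics.fmean(r[g] for r in rows) for g in genes]
    preds = []
    for r in X_test:
        dist = {lab: sum((r[g] - m) ** 2 for g, m in zip(genes, means[lab]))
                for lab in (0, 1)}
        preds.append(min(dist, key=dist.get))
    return preds


def cv_error(X, y, n_genes=5, n_folds=5, seed=0):
    """Cross-validated error with univariate selection redone on each training fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    errors = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        # Genes are ranked using training samples only; test samples never influence selection.
        scores = t_scores(X_tr, y_tr)
        genes = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:n_genes]
        preds = nearest_mean_predict(X_tr, y_tr, [X[i] for i in test_idx], genes)
        errors += sum(p != y[i] for p, i in zip(preds, test_idx))
    return errors / len(X)
```

Ranking the genes once on the full dataset and only then cross-validating the classifier would let the held-out samples influence the selection, which is the optimistic bias the abstract warns about.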
Clinical Cancer Research | 2010
Hugo M. Horlings; Carmen Lai; Dimitry S.A. Nuyten; Hans Halfwerk; Petra Kristel; Erik H. van Beers; Simon A. Joosse; Christiaan Klijn; Petra M. Nederlof; Marcel J. T. Reinders; Lodewyk F. A. Wessels; Marc J. van de Vijver
Purpose: Several prognostic gene expression profiles have been identified in breast cancer. In spite of this progress in prognostic classification, the underlying mechanisms that drive these gene expression patterns remain unknown. Specific genomic alterations, such as copy number alterations, are an important factor in tumor development and progression and are also associated with changes in gene expression. Experimental Design: We carried out array comparative genomic hybridization in 68 human breast carcinomas for which gene expression and clinical data were available. We used a two-class supervised algorithm, Supervised Identification of Regions of Aberration in aCGH data sets, for the identification of regions of chromosomal alterations that are associated with specific sample labeling. Using gene expression data from the same tumors, we identified genes in the altered regions for which the expression level is significantly correlated with the copy number and validated our results in publicly available data sets. Results: Specific chromosomal aberrations are related to clinicopathologic characteristics and prognostic gene expression signatures. The previously identified poor prognosis, 70-gene expression signature is associated with the gain of 3q26.33-27.1, 8q22.1-24.21, and 17q24.3-25.1; the 70-gene good prognosis profile is associated with the loss at 16q12.1-13 and 16q22.1-24.1; basal-like tumors are associated with the gain of 6p12.3-23, 8q24.21-22, and 10p12.33-14 and losses at 4p15.31, 5q12.3-13.1, 5q33.1, 10q23.33, 12q13.13-3, 15q15.1, and 15q21.1; HER2+ breast tumors show amplification at 17q11.1-12 and 17q21.31-23.2 (including the HER2 gene). Conclusions: There is a strong correlation between the different gene expression signatures and underlying genomic changes. These findings help to establish a link between genomic changes and gene expression signatures, enabling a better understanding of the tumor biology that causes poor prognosis.
Clin Cancer Res; 16(2); 651–63
International Journal of Pattern Recognition and Artificial Intelligence | 2004
Carmen Lai; David M. J. Tax; Robert P. W. Duin; Elzbieta Pekalska; Pavel Paclík
A flexible description of images is offered by a cloud of points in a feature space. In the context of image retrieval such clouds can be represented in a number of ways. Two approaches are considered here. The first approach is based on the assumption of a normal distribution, hence homogeneous clouds, while the second one focuses on the boundary description, which is more suitable for multimodal clouds. The images are then compared either by using the Mahalanobis distance or by the support vector data description (SVDD), respectively. The paper investigates some possibilities of combining the image clouds based on the idea that responses of several cloud descriptions may convey a pattern specific to semantically similar images. A ranking of image dissimilarities is used as a comparison for two image databases targeting image classification and retrieval problems. We show that combining the SVDD descriptions improves the retrieval performance with respect to ranking, in contrast to the Mahalanobis case. Surprisingly, it turns out that the ranking of the Mahalanobis distances also works well for inhomogeneous images.
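As a rough illustration of the first approach in the abstract above, the sketch below fits a normal distribution to a reference cloud and scores a query cloud by the Mahalanobis distance of its points. The exact cloud-to-cloud comparison used in the paper is not given here, so averaging the point distances in `cloud_dissimilarity` is an assumption, as are all function names.

```python
import math


def mean_and_cov(cloud):
    """Sample mean vector and covariance matrix of a cloud of feature vectors."""
    n, d = len(cloud), len(cloud[0])
    mu = [sum(p[j] for p in cloud) / n for j in range(d)]
    cov = [[sum((p[i] - mu[i]) * (p[j] - mu[j]) for p in cloud) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mu, cov


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.

    Assumes A is non-singular; a degenerate cloud would need regularization.
    """
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def mahalanobis(point, mu, cov):
    """Mahalanobis distance sqrt((x - mu)^T cov^{-1} (x - mu))."""
    diff = [a - b for a, b in zip(point, mu)]
    return math.sqrt(sum(d * s for d, s in zip(diff, solve(cov, diff))))


def cloud_dissimilarity(query, reference):
    """Assumed dissimilarity: mean Mahalanobis distance of query points to the reference cloud."""
    mu, cov = mean_and_cov(reference)
    return sum(mahalanobis(p, mu, cov) for p in query) / len(query)
```

Ranking database images by `cloud_dissimilarity` to a query would give one simple retrieval order under these assumptions. The boundary-based SVDD alternative mentioned in the abstract is not sketched here.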
Multiple Classifier Systems | 2002
Carmen Lai; David M. J. Tax; Robert P. W. Duin; Elzbieta Pekalska; Pavel Paclík
In image retrieval systems, images can be represented by single feature vectors or by clouds of points. A cloud of points offers a more flexible description but suffers from class overlap. We propose a novel approach for describing clouds of points based on support vector data description (SVDD). We show that combining SVDD-based classifiers improves the retrieval precision. We investigate the performance of the proposed retrieval technique on a database of 368 texture images and compare it to other methods.
BMC Bioinformatics | 2007
Carmen Lai; Hugo M. Horlings; Marc J. van de Vijver; Eric H. van Beers; Petra M. Nederlof; Lodewyk F. A. Wessels; Marcel J. T. Reinders
Background: Array comparative genome hybridization (aCGH) provides information about genomic aberrations. Alterations in the DNA copy number may cause the cell to malfunction, leading to cancer. Therefore, the identification of DNA amplifications or deletions across tumors may reveal key genes involved in cancer and improve our understanding of the underlying biological processes associated with the disease. Results: We propose a supervised algorithm for the analysis of aCGH data and the identification of regions of chromosomal alteration (SIRAC). We first determine the DNA probes that are important to distinguish the classes of interest, and then evaluate in a systematic and robust scheme whether these relevant DNA probes are closely located, i.e. form a region of amplification/deletion. SIRAC does not need any preprocessing of the aCGH datasets, and requires only a few intuitive parameters. Conclusion: We illustrate the features of the algorithm with the use of a simple artificial dataset. The results on two breast cancer datasets show promising outcomes that are in agreement with previous findings, but SIRAC better pinpoints the dissimilarities between the classes of interest.
Computer Recognition Systems | 2005
Elzbieta Pekalska; Artsiom Harol; Carmen Lai; Robert P. W. Duin
Learning from given patterns is realized by learning from their appropriate representations. This is usually practiced either by defining a set of features or by measuring proximities between pairs of objects. Both approaches are problem dependent and aim at the construction of some representation space, where discrimination functions can be defined.
Computational Systems Bioinformatics | 2005
Carmen Lai; Marcel J. T. Reinders; Lodewyk F. A. Wessels
When building predictors of disease state based on gene expression data, gene selection is performed in order to achieve good performance and to identify a relevant subset of genes. Although several gene selection algorithms have been proposed, a fair comparison of the available results is very problematic. This mainly stems from two factors. First, the results are often biased, since the test set is in one way or another involved in training the predictor, resulting in optimistically biased performance estimates. Second, the published results are often based on a small number of relatively simple datasets. Consequently, no generally applicable conclusions can be drawn. We therefore adopted an unbiased protocol to perform a fair comparison of state-of-the-art multivariate and univariate gene selection techniques, in combination with a range of classifiers. Our conclusions are based on seven gene expression datasets, across many cancer types. Surprisingly, we could not detect any significant improvement of multivariate feature selection techniques over univariate approaches. We speculate on the possible causes of this finding, ranging from the small sample size problem to the particular nature of the multivariate gene dependencies.
International Conference on Pattern Recognition | 2008
Pavel Paclík; Carmen Lai; Jana Novovicová; Robert P. W. Duin
Receiver operating characteristic (ROC) analysis enables fine-tuning of a trained classifier to a desired performance trade-off. An ROC estimated from a finite test set is, however, insufficient for classifier comparison, as it neglects performance variances. This research presents a practical algorithm for variance estimation at individual operating points of ROC curves or surfaces. It generalizes the threshold averaging of Fawcett et al. to arbitrary operating point definitions, including the weighting-based formulation used in multi-class ROC analysis. The statistical test comparing performance differences between operating points of the same curve is illustrated for two-class and multi-class ROC.
International Conference on Pattern Recognition | 2010
Pavel Paclík; Carmen Lai; Thomas C.W. Landgrebe; Robert P. W. Duin
Instead of solving complex pattern recognition problems using a single complicated classifier, it is often beneficial to leverage our prior knowledge and decompose the problem into parts. These may be tackled using specific feature subsets and simpler classifiers resulting in a hierarchical system. In this paper, we propose an efficient and scalable approach for cost-sensitive optimization of a general hierarchical classifier using ROC analysis. This allows the designer to view the hierarchy of trained classifiers as a system, and tune it according to the application needs.
Pattern Recognition Letters | 2006
Carmen Lai; Marcel J. T. Reinders; Lodewyk F. A. Wessels