Chris H. Q. Ding
University of Texas at Arlington
Publications
Featured research published by Chris H. Q. Ding.
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2005
Hanchuan Peng; Fuhui Long; Chris H. Q. Ding
Feature selection is an important problem for pattern classification systems. We study how to select good features according to the maximal statistical dependency criterion based on mutual information. Because of the difficulty in directly implementing the maximal dependency condition, we first derive an equivalent form, called minimal-redundancy-maximal-relevance criterion (mRMR), for first-order incremental feature selection. Then, we present a two-stage feature selection algorithm by combining mRMR and other more sophisticated feature selectors (e.g., wrappers). This allows us to select a compact set of superior features at very low cost. We perform extensive experimental comparison of our algorithm and other methods using three different classifiers (naive Bayes, support vector machine, and linear discriminant analysis) and four different data sets (handwritten digits, arrhythmia, NCI cancer cell lines, and lymphoma tissues). The results confirm that mRMR leads to promising improvement on feature selection and classification accuracy.
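A minimal sketch of the first-order incremental mRMR selection described above, assuming discrete features and labels. The histogram-based mutual-information estimator, the toy data, and the greedy loop are illustrative choices, not taken from the paper:

```python
import numpy as np

def mutual_information(x, y):
    """I(x; y) in nats for discrete 1-D arrays, estimated from the joint histogram."""
    xs, xi = np.unique(x, return_inverse=True)
    ys, yi = np.unique(y, return_inverse=True)
    joint = np.zeros((xs.size, ys.size))
    np.add.at(joint, (xi, yi), 1.0)          # joint count table
    p = joint / joint.sum()
    px, py = p.sum(1, keepdims=True), p.sum(0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / (px @ py)[nz])).sum())

def mrmr(X, y, k):
    """Greedy first-order incremental mRMR: maximize relevance minus mean redundancy."""
    relevance = np.array([mutual_information(X[:, j], y) for j in range(X.shape[1])])
    selected = [int(np.argmax(relevance))]   # start with the most relevant feature
    while len(selected) < k:
        best, best_score = -1, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            redundancy = np.mean([mutual_information(X[:, j], X[:, s]) for s in selected])
            if relevance[j] - redundancy > best_score:
                best, best_score = j, relevance[j] - redundancy
        selected.append(best)
    return selected

# Toy data: features 0 and 1 are identical copies of one half of the label,
# feature 2 carries the other, independent half.
y = np.array([0, 1, 2, 3, 0, 1, 2, 3])
X = np.stack([y // 2, y // 2, y % 2], axis=1)
print(mrmr(X, y, 2))  # [0, 2] -- the duplicate feature 1 is skipped as redundant
```

The key behavior is visible in the toy run: a plain relevance ranking would pick the duplicate feature second, while the redundancy term steers the selection toward the complementary one.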
Journal of Bioinformatics and Computational Biology | 2005
Chris H. Q. Ding; Hanchuan Peng
Selecting a small subset out of the thousands of genes in microarray data is important for accurate classification of phenotypes. Widely used methods typically rank genes according to their ...
international conference on machine learning | 2004
Chris H. Q. Ding; Xiaofeng He
Principal component analysis (PCA) is a widely used statistical technique for unsupervised dimension reduction. K-means clustering is a commonly used clustering method for unsupervised learning tasks. Here we prove that principal components are the continuous solutions to the discrete cluster membership indicators for K-means clustering. New lower bounds for the K-means objective function are derived: the total variance minus the sum of the K-1 largest eigenvalues of the data covariance matrix. These results indicate that unsupervised dimension reduction is closely related to unsupervised learning. Several implications are discussed. On dimension reduction, the result provides new insights into the observed effectiveness of PCA-based data reductions, beyond the conventional noise-reduction explanation that PCA, via singular value decomposition, provides the best low-dimensional linear approximation of the data. On learning, the result suggests effective techniques for K-means data clustering. DNA gene expression and Internet newsgroups are analyzed to illustrate our results. Experiments indicate that the new bounds are within 0.5-1.5% of the optimal values.
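The lower bound above can be checked numerically. A minimal sketch for K = 2, where the bound is the total sum of squares minus the largest eigenvalue of the centered scatter matrix; the synthetic data, Lloyd's iteration, and initialization are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated Gaussian blobs in 2-D.
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(4, 0.5, (50, 2))])
Xc = X - X.mean(0)

# Total sum of squares and eigenvalues of the scatter matrix Xc^T Xc.
tss = (Xc ** 2).sum()
eigvals = np.linalg.eigvalsh(Xc.T @ Xc)[::-1]   # descending order

# Lloyd's algorithm for K = 2, initialized from one point of each blob.
centers = X[[0, -1]].copy()
for _ in range(20):
    labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    centers = np.array([X[labels == k].mean(0) for k in range(2)])

J = sum(((X[labels == k] - centers[k]) ** 2).sum() for k in range(2))
bound = tss - eigvals[0]   # for K = 2: J >= TSS - lambda_1
print(f"bound {bound:.1f} <= J {J:.1f} <= TSS {tss:.1f}")
```

On data like this the within-cluster objective J sits between the PCA-derived bound and the total variance, as the theorem predicts.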
knowledge discovery and data mining | 2006
Chris H. Q. Ding; Tao Li; Wei Peng; Haesun Park
Currently, most research on nonnegative matrix factorization (NMF) focuses on 2-factor X = FG^T factorization. We provide a systematic analysis of 3-factor X = FSG^T NMF. While unconstrained 3-factor NMF is equivalent to unconstrained 2-factor NMF, constrained 3-factor NMF brings new features to constrained 2-factor NMF. We study the orthogonality constraint because it leads to a rigorous clustering interpretation. We provide new rules for updating F, S, G and prove the convergence of these algorithms. Experiments on 5 datasets and a real-world case study are performed to show the capability of bi-orthogonal 3-factor NMF in simultaneously clustering rows and columns of the input data matrix. We provide a new approach to evaluating the quality of clustering on words using class aggregate distribution and multi-peak distribution. We also provide an overview of various NMF extensions and examine their relationships.
international conference on data mining | 2001
Chris H. Q. Ding; Xiaofeng He; Hongyuan Zha; Ming Gu; Horst D. Simon
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2010
Chris H. Q. Ding; Tao Li; Michael I. Jordan
computational systems bioinformatics | 2003
Chris H. Q. Ding; Hanchuan Peng
international conference on machine learning | 2006
Chris H. Q. Ding; Ding Zhou; Xiaofeng He; Hongyuan Zha
international acm sigir conference on research and development in information retrieval | 2008
Dingding Wang; Tao Li; Shenghuo Zhu; Chris H. Q. Ding
Computational Statistics & Data Analysis | 2008
Chris H. Q. Ding; Tao Li; Wei Peng
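The 3-factor model X = FSG^T from the NMF tri-factorization abstract above can be sketched with plain multiplicative updates. This is not the paper's orthogonality-constrained algorithm; it is a minimal unconstrained sketch minimizing ||X - FSG^T||_F^2, with the data, ranks, and initialization assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((20, 12))       # nonnegative data matrix
r1, r2 = 4, 3                  # ranks for the row factor F and column factor G

F = rng.random((20, r1)) + 0.1
S = rng.random((r1, r2)) + 0.1
G = rng.random((12, r2)) + 0.1
eps = 1e-9                     # guard against division by zero

err0 = np.linalg.norm(X - F @ S @ G.T)
for _ in range(300):
    # Generic multiplicative rule: factor *= (negative gradient part) / (positive part).
    F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
    S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
    G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
err = np.linalg.norm(X - F @ S @ G.T)
print(f"reconstruction error: {err0:.3f} -> {err:.3f}")
```

Because every update multiplies a nonnegative factor by a ratio of nonnegative terms, nonnegativity is preserved automatically, and the reconstruction error decreases monotonically over the iterations.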