Is this you? Create Your Porfile

Eric C. Chi

North Carolina State University

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Eric C. Chi is active.

Explore More

Publication

Featured researches published by Eric C. Chi.

SIAM Journal on Matrix Analysis and Applications | 2012

On Tensors, Sparsity, and Nonnegative Factorizations

Eric C. Chi; Tamara G. Kolda

Tensors have found application in a variety of fields, ranging from chemometrics to signal processing and beyond. In this paper, we consider the problem of multilinear modeling of sparse count data. Our goal is to develop a descriptive tensor factorization model of such data, along with appropriate algorithms and theory. To do so, we propose that the random variation is best described via a Poisson distribution, which better describes the zeros observed in the data as compared to the typical assumption of a Gaussian distribution. Under a Poisson assumption, we fit a model to observed data using the negative log-likelihood score. We present a new algorithm for Poisson tensor factorization called CANDECOMP--PARAFAC alternating Poisson regression (CP-APR) that is based on a majorization-minimization approach. It can be shown that CP-APR is a generalization of the Lee--Seung multiplicative updates. We show how to prevent the algorithm from converging to non-KKT points and prove convergence of CP-APR under mil...

Journal of Computational and Graphical Statistics | 2015

Splitting Methods for Convex Clustering

Eric C. Chi; Kenneth Lange

Clustering is a fundamental problem in many scientific applications. Standard methods such as k-means, Gaussian mixture models, and hierarchical clustering, however, are beset by local minima, which are sometimes drastically suboptimal. Recently introduced convex relaxations of k-means and hierarchical clustering shrink cluster centroids toward one another and ensure a unique global minimizer. In this work, we present two splitting methods for solving the convex clustering problem. The first is an instance of the alternating direction method of multipliers (ADMM); the second is an instance of the alternating minimization algorithm (AMA). In contrast to previously considered algorithms, our ADMM and AMA formulations provide simple and unified frameworks for solving the convex clustering problem under the previously studied norms and open the door to potentially novel norms. We demonstrate the performance of our algorithm on both simulated and real data examples. While the differences between the two algorithms appear to be minor on the surface, complexity analysis and numerical experiments show AMA to be significantly more efficient. This article has supplementary materials available online.

Genome Research | 2013

Genotype imputation via matrix completion

Eric C. Chi; Hua Zhou; Gary K. Chen; Diego Ortega-Del Vecchyo; Kenneth Lange

Most current genotype imputation methods are model-based and computationally intensive, taking days to impute one chromosome pair on 1000 people. We describe an efficient genotype imputation method based on matrix completion. Our matrix completion method is implemented in MATLAB and tested on real data from HapMap 3, simulated pedigree data, and simulated low-coverage sequencing data derived from the 1000 Genomes Project. Compared with leading imputation programs, the matrix completion algorithm embodied in our program MENDEL-IMPUTE achieves comparable imputation accuracy while reducing run times significantly. Implementation in a lower-level language such as Fortran or C is apt to further improve computational efficiency.

Mathematical Programming | 2014

Distance majorization and its applications

Eric C. Chi; Hua Zhou; Kenneth Lange

The problem of minimizing a continuously differentiable convex function over an intersection of closed convex sets is ubiquitous in applied mathematics. It is particularly interesting when it is easy to project onto each separate set, but nontrivial to project onto their intersection. Algorithms based on Newton’s method such as the interior point method are viable for small to medium-scale problems. However, modern applications in statistics, engineering, and machine learning are posing problems with potentially tens of thousands of parameters or more. We revisit this convex programming problem and propose an algorithm that scales well with dimensionality. Our proposal is an instance of a sequential unconstrained minimization technique and revolves around three ideas: the majorization-minimization principle, the classical penalty method for constrained optimization, and quasi-Newton acceleration of fixed-point algorithms. The performance of our distance majorization algorithms is illustrated in several applications.

international symposium on biomedical imaging | 2013

Imaging genetics via sparse canonical correlation analysis

Eric C. Chi; Genevera I. Allen; Hua Zhou; Omid Kohannim; Kenneth Lange; Paul M. Thompson

The collection of brain images from populations of subjects who have been genotyped with genome-wide scans makes it feasible to search for genetic effects on the brain. Even so, multivariate methods are sorely needed that can search both images and the genome for relationships, making use of the correlation structure of both datasets. Here we investigate the use of sparse canonical correlation analysis (CCA) to home in on sets of genetic variants that explain variance in a set of images. We extend recent work on penalized matrix decomposition to account for the correlations in both datasets. Such methods show promise in imaging genetics as they exploit the natural covariance in the datasets. They also avoid an astronomically heavy statistical correction for searching the whole genome and the entire image for promising associations.

Biometrics | 2017

Convex biclustering: Convex Biclustering

Eric C. Chi; Genevera I. Allen; Richard G. Baraniuk

In the biclustering problem, we seek to simultaneously group observations and features. While biclustering has applications in a wide array of domains, ranging from text mining to collaborative filtering, the problem of identifying structure in high-dimensional genomic data motivates this work. In this context, biclustering enables us to identify subsets of genes that are co-expressed only within a subset of experimental conditions. We present a convex formulation of the biclustering problem that possesses a unique global minimizer and an iterative algorithm, COBRA, that is guaranteed to identify it. Our approach generates an entire solution path of possible biclusters as a single tuning parameter is varied. We also show how to reduce the problem of selecting this tuning parameter to solving a trivial modification of the convex biclustering problem. The key contributions of our work are its simplicity, interpretability, and algorithmic guarantees-features that arguably are lacking in the current alternative algorithms. We demonstrate the advantages of our approach, which includes stably and reproducibly identifying biclusterings, on simulated and real microarray data.

American Mathematical Monthly | 2014

A Look at the Generalized Heron Problem through the Lens of Majorization-Minimization

Eric C. Chi; Kenneth Lange

Abstract In a recent issue of this Monthly, Mordukhovich, Nam, and Salinas pose and solve an interesting non-differentiable generalization of the Heron problem in the framework of modern convex analysis. In the generalized Heron problem, we are given k + 1 closed convex sets in ℝd equipped with its Euclidean norm and asked to find the point in the last set such that the sum of the distances to the first k sets is minimal. In later work, the authors generalize the Heron problem even further, relax its convexity assumptions, study its theoretical properties, and pursue subgradient algorithms for solving the convex case. Here, we revisit the original problem solely from the numerical perspective. By exploiting the majorizationminimization (MM) principle of computational statistics and rudimentary techniques from differential calculus, we are able to construct a very fast algorithm for solving the Euclidean version of the generalized Heron problem.

The American Statistician | 2016

k-POD: A Method for k-Means Clustering of Missing Data

Jocelyn T. Chi; Eric C. Chi; Richard G. Baraniuk

The k-means algorithm is often used in clustering applications but its usage requires a complete data matrix. Missing data, however, are common in many applications. Mainstream approaches to clustering missing data reduce the missing data problem to a complete data formulation through either deletion or imputation but these solutions may incur significant costs. Our k-POD method presents a simple extension of k-means clustering for missing data that works even when the missingness mechanism is unknown, when external information is unavailable, and when there is significant missingness in the data. [Received November 2014. Revised August 2015.]

PLOS Computational Biology | 2015

Convex clustering: an attractive alternative to hierarchical clustering.

Gary K. Chen; Eric C. Chi; John Michael O. Ranola; Kenneth Lange

The primary goal in cluster analysis is to discover natural groupings of objects. The field of cluster analysis is crowded with diverse methods that make special assumptions about data and address different scientific aims. Despite its shortcomings in accuracy, hierarchical clustering is the dominant clustering method in bioinformatics. Biologists find the trees constructed by hierarchical clustering visually appealing and in tune with their evolutionary perspective. Hierarchical clustering operates on multiple scales simultaneously. This is essential, for instance, in transcriptome data, where one may be interested in making qualitative inferences about how lower-order relationships like gene modules lead to higher-order relationships like pathways or biological processes. The recently developed method of convex clustering preserves the visual appeal of hierarchical clustering while ameliorating its propensity to make false inferences in the presence of outliers and noise. The solution paths generated by convex clustering reveal relationships between clusters that are hidden by static methods such as k-means clustering. The current paper derives and tests a novel proximal distance algorithm for minimizing the objective function of convex clustering. The algorithm separates parameters, accommodates missing data, and supports prior information on relationships. Our program CONVEXCLUSTER incorporating the algorithm is implemented on ATI and nVidia graphics processing units (GPUs) for maximal speed. Several biological examples illustrate the strengths of convex clustering and the ability of the proximal distance algorithm to handle high-dimensional problems. CONVEXCLUSTER can be freely downloaded from the UCLA Human Genetics web site at http://www.genetics.ucla.edu/software/

Journal of Computational and Graphical Statistics | 2014

Robust Parametric Classification and Variable Selection by a Minimum Distance Criterion

Eric C. Chi; David W. Scott

We investigate a robust penalized logistic regression algorithm based on a minimum distance criterion. Influential outliers are often associated with the explosion of parameter vector estimates, but in the context of standard logistic regression, the bias due to outliers always causes the parameter vector to implode, that is, shrink toward the zero vector. Thus, using LASSO-like penalties to perform variable selection in the presence of outliers can result in missed detections of relevant covariates. We show that by choosing a minimum distance criterion together with an elastic net penalty, we can simultaneously find a parsimonious model and avoid estimation implosion even in the presence of many outliers in the important small n large p situation. Minimizing the penalized minimum distance criterion is a challenging problem due to its nonconvexity. To meet the challenge, we develop a simple and efficient MM (majorization–minimization) algorithm that can be adapted gracefully to the small n large p context. Performance of our algorithm is evaluated on simulated and real datasets. This article has supplementary materials available online.

Explore More