Shuheng Zhou
University of Michigan
Publication
Featured research published by Shuheng Zhou.
Machine Learning | 2010
Shuheng Zhou; John D. Lafferty; Larry Wasserman
Undirected graphs are often used to describe high-dimensional distributions. Under sparsity conditions, the graph can be estimated using ℓ1 penalization methods. However, current methods assume that the data are independent and identically distributed. If the distribution, and hence the graph, evolves over time, then the data are no longer identically distributed. In this paper we develop a nonparametric method for estimating time-varying graphical structure for multivariate Gaussian distributions using an ℓ1 regularization method, and show that, as long as the covariances change smoothly over time, we can estimate the covariance matrix well (in predictive risk) even when p is large.
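The estimator described here combines kernel smoothing over time with ℓ1-penalized covariance estimation. A minimal sketch of that two-step recipe, using scikit-learn's graphical_lasso solver; the Gaussian kernel, bandwidth h, and penalty alpha are illustrative choices, not the paper's tuned values:

```python
import numpy as np
from sklearn.covariance import graphical_lasso

def time_varying_precision(X, t0, h=0.1, alpha=0.2):
    """Estimate the precision matrix at time t0 from X (n rows, p columns),
    where row i is observed at time i/n. A sketch, not the paper's estimator."""
    n, p = X.shape
    times = np.arange(n) / n
    # Gaussian kernel weights centered at t0, normalized to sum to 1
    w = np.exp(-0.5 * ((times - t0) / h) ** 2)
    w /= w.sum()
    # Kernel-weighted empirical covariance (mean taken to be zero for simplicity)
    S = (X * w[:, None]).T @ X
    # l1-penalized Gaussian maximum likelihood around the smoothed covariance
    cov, prec = graphical_lasso(S, alpha=alpha)
    return prec
```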
Journal of the American Statistical Association | 2010
Larry Wasserman; Shuheng Zhou
One goal of statistical privacy research is to construct a data release mechanism that protects individual privacy while preserving information content. An example is a random mechanism that takes an input database X and outputs a random database Z according to a distribution Qn(⋅|X). Differential privacy is a particular privacy requirement developed by computer scientists in which Qn(⋅|X) is required to be insensitive to changes in one data point in X. This makes it difficult to infer from Z whether a given individual is in the original database X. We consider differential privacy from a statistical perspective. We consider several data-release mechanisms that satisfy the differential privacy requirement. We show that it is useful to compare these schemes by computing the rate of convergence of distributions and densities constructed from the released data. We study a general privacy method, called the exponential mechanism, introduced by McSherry and Talwar (2007). We show that the accuracy of this method is intimately linked to the rate at which the empirical distribution concentrates in a small ball around the true distribution.
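In the discrete case, the exponential mechanism of McSherry and Talwar takes only a few lines: given a utility function with sensitivity Δu, it samples an output z with probability proportional to exp(ε·u(z)/(2Δu)). A sketch with a toy candidate set and utility, both hypothetical placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

def exponential_mechanism(candidates, utility, sensitivity, epsilon):
    """Sample one candidate with P(z) proportional to exp(eps*u(z)/(2*Delta_u))."""
    scores = np.array([utility(z) for z in candidates], dtype=float)
    logits = epsilon * scores / (2.0 * sensitivity)
    # Subtract the max before exponentiating for numerical stability;
    # this shifts all logits equally and leaves the distribution unchanged.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return candidates[rng.choice(len(candidates), p=probs)]

# Toy use: privately release the index of the largest histogram bin.
counts = [10, 7, 3]                      # per-bin counts of a private database
best_bin = exponential_mechanism(
    candidates=[0, 1, 2],
    utility=lambda z: counts[z],         # counting utility has sensitivity 1
    sensitivity=1.0,
    epsilon=0.5,
)
```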
symposium on discrete algorithms | 2005
T.-H. Hubert Chan; Anupam Gupta; Bruce M. Maggs; Shuheng Zhou
We study the problem of routing in doubling metrics, and show how to perform hierarchical routing in such metrics with small stretch and compact routing tables (i.e., with a small amount of routing information stored at each vertex). We say that a metric (X, d) has doubling dimension dim(X) at most α if every set of diameter D can be covered by 2^α sets of diameter D/2. (A doubling metric is one whose doubling dimension dim(X) is a constant.) We show how to perform (1 + τ)-stretch routing on such metrics for any 0 < τ ≤ 1 with routing tables of size at most (α/τ)^{O(α)} log² Δ bits with only (α/τ)^{O(α)} log Δ entries, where Δ is the diameter of the graph; hence the number of routing table entries is just τ^{-O(1)} log Δ for doubling metrics. These results extend and improve on those of Talwar (2004). We also give better constructions of sparse spanners for doubling metrics than those obtained from the routing tables above; for τ > 0, we give algorithms to construct (1 + τ)-stretch spanners for a metric (X, d) with maximum degree at most (2 + 1/τ)^{O(dim(X))}, matching the results of Das et al. for Euclidean metrics.
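The spanner guarantee is easy to illustrate with the classical greedy construction (Althöfer et al.), which is not this paper's algorithm but builds a (1 + τ)-stretch spanner of any finite metric: examine pairs in increasing order of distance and add an edge only when the current spanner distance is already too long. A sketch:

```python
import heapq
from itertools import combinations

def greedy_spanner(points, d, tau):
    """Return the edges of a (1 + tau)-stretch spanner of the metric d."""
    adj = {p: [] for p in points}

    def spanner_dist(src, dst, cutoff):
        # Dijkstra over the edges added so far, pruning paths beyond cutoff
        dist = {src: 0.0}
        heap = [(0.0, src)]
        while heap:
            du, u = heapq.heappop(heap)
            if u == dst:
                return du
            if du > dist.get(u, float("inf")) or du > cutoff:
                continue
            for v, w in adj[u]:
                nd = du + w
                if nd < dist.get(v, float("inf")) and nd <= cutoff:
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
        return float("inf")

    edges = []
    for u, v in sorted(combinations(points, 2), key=lambda e: d(*e)):
        target = (1 + tau) * d(u, v)
        if spanner_dist(u, v, target) > target:   # stretch would be violated
            adj[u].append((v, d(u, v)))
            adj[v].append((u, d(u, v)))
            edges.append((u, v))
    return edges

# Example: a 1.5-stretch spanner of random points in the plane
import random
pts = [(random.random(), random.random()) for _ in range(30)]
euclid = lambda a, b: ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
print(len(greedy_spanner(pts, euclid, tau=0.5)))
```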
IEEE Transactions on Information Theory | 2013
Mark Rudelson; Shuheng Zhou
Random matrices are widely used in sparse recovery problems, and the relevant properties of matrices with i.i.d. entries are well understood. This paper discusses the recently introduced restricted eigenvalue (RE) condition, which is among the most general assumptions on the matrix that guarantee recovery. We prove a reduction principle showing that the RE condition can be guaranteed by checking the restricted isometry on a certain family of low-dimensional subspaces. This principle allows us to establish the RE condition for several broad classes of random matrices with dependent entries, including random matrices with sub-Gaussian rows and nontrivial covariance structure, as well as matrices with independent rows and uniformly bounded entries.
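For reference, the RE condition in the form of Bickel, Ritov, and Tsybakov (the version analyzed here, up to normalization) says that RE(s₀, k₀, X) holds with constant K(s₀, k₀, X) > 0 when

```latex
\min_{\substack{J \subseteq \{1,\dots,p\} \\ |J| \le s_0}}
\;\;
\min_{\substack{v \neq 0 \\ \|v_{J^c}\|_1 \le k_0 \|v_J\|_1}}
\;
\frac{\|X v\|_2}{\sqrt{n}\,\|v_J\|_2}
\;\ge\;
\frac{1}{K(s_0, k_0, X)},
```

i.e., X cannot annihilate any vector lying in the cone of approximately s₀-sparse directions.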
Electronic Journal of Statistics | 2011
Sara van de Geer; Peter Bühlmann; Shuheng Zhou
We revisit the adaptive Lasso as well as the thresholded Lasso with refitting, in a high-dimensional linear model, and study prediction error, ℓq-error (q ∈ {1, 2}), and number of false positive selections. Our theoretical results for the two methods are, at a rather fine scale, comparable. The differences only show up in terms of the (minimal) restricted and sparse eigenvalues, favoring thresholding over the adaptive Lasso. As regards prediction and estimation, the difference is virtually negligible, but our bound for the number of false positives is larger for the adaptive Lasso than for thresholding. Moreover, both two-stage methods add value to the one-stage Lasso in the sense that, under appropriate restricted and sparse eigenvalue conditions, they have prediction and estimation error similar to the one-stage Lasso, but substantially fewer false positives.
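The two-stage thresholded Lasso with refitting is straightforward to write down: fit the Lasso, discard coefficients below a threshold, then refit ordinary least squares on the surviving support. A sketch with scikit-learn; the penalty lam and threshold thr are illustrative, not the theoretically tuned values from the paper:

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

def thresholded_lasso(X, y, lam=0.1, thr=0.05):
    """Stage 1: Lasso fit. Stage 2: threshold small coefficients, refit OLS."""
    beta = Lasso(alpha=lam).fit(X, y).coef_
    support = np.flatnonzero(np.abs(beta) > thr)   # selected variables
    beta_refit = np.zeros_like(beta)
    if support.size:
        ols = LinearRegression().fit(X[:, support], y)
        beta_refit[support] = ols.coef_
    return beta_refit, support
```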
Annals of Statistics | 2014
Shuheng Zhou
SIAM Journal on Computing | 2010
Satish Rao; Shuheng Zhou
international symposium on information theory | 2009
Shuheng Zhou; Katrina Ligett; Larry Wasserman
international colloquium on automata languages and programming | 2006
Satish Rao; Shuheng Zhou
IEEE Transactions on Information Theory | 2009
Shuheng Zhou; John D. Lafferty; Larry Wasserman
Recent research has studied the role of sparsity in high-dimensional regression and signal reconstruction, establishing theoretical limits for recovering sparse models. This line of work shows that ℓ1-regularized least squares regression can accurately estimate a sparse linear model from noisy examples in high dimensions. We study a variant of this problem where the original n input variables are compressed by a random linear transformation to m ≪ n examples in p dimensions, and establish conditions under which a sparse linear model can be successfully recovered from the compressed data. A primary motivation for this compression procedure is to anonymize the data and preserve privacy by revealing little information about the original data. We characterize the number of projections that are required for ℓ1-regularized compressed regression to identify the nonzero coefficients in the true model with probability approaching one, a property called "sparsistence." We also show that ℓ1-regularized compressed regression asymptotically predicts as well as an oracle linear model, a property called "persistence." Finally, we characterize the privacy properties of the compression procedure, establishing upper bounds on the mutual information between the compressed and uncompressed data that decay to zero.
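The compression step itself is a single random projection applied to both the design matrix and the response before the usual ℓ1-regularized fit. A minimal sketch assuming a Gaussian projection; the dimensions and penalty are illustrative:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)

def compressed_lasso(X, y, m, lam=0.1):
    """Compress the n rows of (X, y) down to m << n rows with a random
    Gaussian matrix Phi, then run l1-regularized regression on the result.
    Only Phi @ X and Phi @ y would be released, never the raw records."""
    n = X.shape[0]
    Phi = rng.normal(size=(m, n)) / np.sqrt(m)   # random linear compression
    return Lasso(alpha=lam).fit(Phi @ X, Phi @ y).coef_
```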