Yingyu Liang
Princeton University
Publications
Featured research published by Yingyu Liang.
International Colloquium on Automata, Languages and Programming | 2012
Maria-Florina Balcan; Yingyu Liang
Motivated by the fact that distances between data points in many real-world clustering instances are often based on heuristic measures, Bilu and Linial [6] proposed analyzing objective-based clustering problems under the assumption that the optimum clustering to the objective is preserved under small multiplicative perturbations to distances between points. In this paper, we provide several results within this framework. For separable center-based objectives, we present an algorithm that can optimally cluster instances resilient to $(1 + \sqrt{2})$-factor perturbations, solving an open problem of Awasthi et al. [2]. For the k-median objective, we additionally give algorithms for a weaker, relaxed, and more realistic assumption in which we allow the optimal solution to change in a small fraction of the points after perturbation. We also provide positive results for min-sum clustering, which is a generally much harder objective than k-median (and also non-center-based). Our algorithms are based on new linkage criteria that may be of independent interest.
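For context, the perturbation-resilience assumption referenced in this abstract is usually formalized along the following lines; this is a generic statement of the Bilu-Linial condition, with notation chosen here for illustration rather than taken from the paper. A clustering instance $(S, d)$ with objective $\Phi$ is called $\alpha$-perturbation resilient (for some $\alpha \ge 1$) if for every perturbed distance function $d'$ satisfying

$$ d(x, y) \le d'(x, y) \le \alpha \cdot d(x, y) \quad \text{for all } x, y \in S, $$

the optimal clustering of $(S, d')$ under $\Phi$ is the same partition of $S$ as the optimal clustering of $(S, d)$. The result above concerns recovering this optimal clustering whenever $\alpha \ge 1 + \sqrt{2}$ for separable center-based objectives.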
SIAM International Conference on Data Mining | 2015
Aurélien Bellet; Yingyu Liang; Alireza Bagheri Garakani; Maria-Florina Balcan; Fei Sha
Learning sparse combinations is a frequent theme in machine learning. In this paper, we study its associated optimization problem in the distributed setting where the elements to be combined are not centrally located but spread over a network. We address the key challenges of balancing communication costs and optimization errors. To this end, we propose a distributed Frank-Wolfe (dFW) algorithm. We obtain theoretical guarantees on the optimization error $\epsilon$ and communication cost that do not depend on the total number of combining elements. We further show that the communication cost of dFW is optimal by deriving a lower bound on the communication cost required to construct an $\epsilon$-approximate solution. We validate our theoretical analysis with empirical studies on synthetic and real-world data, which demonstrate that dFW outperforms both baselines and competing methods. We also study the performance of dFW when the conditions of our analysis are relaxed, and show that dFW is fairly robust.
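As background for the Frank-Wolfe-based approach described in this abstract, the following is a minimal sketch of the classical, centralized Frank-Wolfe iteration for a sparsity-inducing constraint set (here an l1 ball). It only illustrates why each iteration adds at most one atom to the solution, which is the kind of sparsity a distributed variant can exploit to limit communication; it is not the authors' dFW algorithm, and the function name, step-size rule, and quadratic example are illustrative assumptions.

import numpy as np

def frank_wolfe_l1(grad_f, x0, radius=1.0, n_iters=100):
    """Classical Frank-Wolfe over the l1 ball {x : ||x||_1 <= radius}.

    grad_f: callable returning the gradient of a convex objective at x.
    Each iteration moves toward a single signed vertex of the l1 ball, so the
    iterate after t steps has at most t nonzero coordinates beyond those of x0.
    """
    x = x0.copy()
    for t in range(n_iters):
        g = grad_f(x)
        # Linear minimization oracle for the l1 ball: the minimizing vertex is
        # -radius * sign(g_i) * e_i for the coordinate i with largest |g_i|.
        i = np.argmax(np.abs(g))
        s = np.zeros_like(x)
        s[i] = -radius * np.sign(g[i])
        # Standard diminishing step size; a line search could be used instead.
        gamma = 2.0 / (t + 2.0)
        x = (1.0 - gamma) * x + gamma * s
    return x

# Illustrative usage: sparse least squares, min ||Ax - b||^2 subject to ||x||_1 <= 1.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((50, 200)), rng.standard_normal(50)
x_hat = frank_wolfe_l1(lambda x: 2 * A.T @ (A @ x - b), np.zeros(200))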
NeuroImage | 2017
Kiran Vodrahalli; Po-Hsuan Chen; Yingyu Liang; Christopher Baldassano; Janice Chen; Esther Yong; Christopher J. Honey; Uri Hasson; Peter J. Ramadge; Kenneth A. Norman; Sanjeev Arora
Several research groups have shown how to map fMRI responses to the meanings of presented stimuli. This paper presents new methods for doing so when only a natural language annotation is available as the description of the stimulus. We study fMRI data gathered from subjects watching an episode of BBC's Sherlock (Chen et al., 2017), and learn bidirectional mappings between fMRI responses and natural language representations. By leveraging data from multiple subjects watching the same movie, we were able to perform scene classification with 72% accuracy (random guessing would give 4%) and scene ranking with average rank in the top 4% (random guessing would give 50%). The key ingredients underlying this high level of performance are (a) the use of the Shared Response Model (SRM) and its variant SRM-ICA (Chen et al., 2015; Zhang et al., 2016) to aggregate fMRI data from multiple subjects, both of which are shown to be superior to standard PCA in producing low-dimensional representations for the tasks in this paper; (b) a sentence embedding technique adapted from the natural language processing (NLP) literature (Arora et al., 2017) that produces semantic vector representations of the annotations; (c) using previous timestep information in the featurization of the predictor data. These optimizations in how we featurize the fMRI data and text annotations provide a substantial improvement in classification performance, relative to standard approaches.
Highlights: We learn maps between fMRI data and fine-grained text annotations. The Shared Response Model highlights movie-related variance in the fMRI response. Semantic annotations can be featurized with weighted sums of word embeddings. Using previous timepoints helps with fMRI-to-text but hurts text-to-fMRI. Our methods attain high performance on scene classification and ranking tasks.
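Ingredient (b) above is, per the cited Arora et al. (2017) work, a weighted average of word vectors with a shared common component removed. The sketch below shows that general recipe under simplifying assumptions: pretrained word vectors and corpus unigram probabilities are assumed to be available as dictionaries, the weighting constant a = 1e-3 follows the cited paper, and the function name is ours; the paper's actual featurization pipeline may differ in its details.

import numpy as np

def sif_sentence_embeddings(sentences, word_vec, word_prob, a=1e-3):
    """Smooth-inverse-frequency sentence embeddings (in the style of Arora et al., 2017).

    sentences: list of token lists.
    word_vec:  dict mapping token -> word vector (np.ndarray, all of the same dimension).
    word_prob: dict mapping token -> unigram probability estimated from a large corpus.
    """
    dim = len(next(iter(word_vec.values())))
    X = np.zeros((len(sentences), dim))
    for i, sent in enumerate(sentences):
        tokens = [w for w in sent if w in word_vec]
        if not tokens:
            continue
        # Downweight frequent words: the weight a / (a + p(w)) is close to 1 for rare words.
        weights = np.array([a / (a + word_prob.get(w, 0.0)) for w in tokens])
        vecs = np.stack([word_vec[w] for w in tokens])
        X[i] = weights @ vecs / len(tokens)
    # Remove the projection onto the first singular vector (the shared "common component").
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    u = vt[0]
    return X - np.outer(X @ u, u)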
SIAM Journal on Computing | 2016
Maria-Florina Balcan; Yingyu Liang
Motivated by the fact that distances between data points in many real-world clustering instances are often based on heuristic measures, Bilu and Linial [Proceedings of the Symposium on Innovations in Computer Science, 2010] proposed analyzing objective-based clustering problems under the assumption that the optimum clustering to the objective is preserved under small multiplicative perturbations to distances between points. The hope is that by exploiting the structure in such instances, one can overcome worst-case hardness results. In this paper, we provide several results within this framework. For center-based objectives, we present an algorithm that can optimally cluster instances resilient to perturbations of factor
SIMBAD'13 Proceedings of the Second International Conference on Similarity-Based Pattern Recognition | 2013
Maria-Florina Balcan; Yingyu Liang
ACM Multimedia | 2009
Yingyu Liang; Jianmin Li; Bo Zhang
Conference on Innovations in Theoretical Computer Science | 2018
Maria-Florina Balcan; Yingyu Liang; David P. Woodruff; Hongyang Zhang
Knowledge Discovery and Data Mining | 2016
Maria-Florina Balcan; Yingyu Liang; Le Song; David P. Woodruff; Bo Xie
Conference on Multimedia Modeling | 2010
Yingyu Liang; Jianmin Li; Bo Zhang
Neural Information Processing Systems | 2014
Bo Dai; Bo Xie; Niao He; Yingyu Liang; Anant Raj; Maria-Florina Balcan; Le Song