Sewoong Oh

University of Illinois at Urbana–Champaign

Publication

Featured research published by Sewoong Oh.

IEEE Transactions on Information Theory | 2010

Matrix Completion From a Few Entries

Raghunandan H. Keshavan; Andrea Montanari; Sewoong Oh

Let M be an nα × n matrix of rank r, and assume that a uniformly random subset E of its entries is observed. We describe an efficient algorithm, which we call OptSpace, that reconstructs M from |E| = O(rn) observed entries with relative root mean square error RMSE ≤ C(α) (nr/|E|)^{1/2} with probability larger than 1 - 1/n^3. Further, if r = O(1) and M is sufficiently unstructured, then OptSpace reconstructs it exactly from |E| = O(n log n) entries with probability larger than 1 - 1/n^3. This settles (in the case of bounded rank) a question left open by Candes and Recht and improves over the guarantees for their reconstruction algorithm. The complexity of our algorithm is O(|E|r log n), which opens the way to its use for massive data sets. In the process of proving these statements, we obtain a generalization of a celebrated result by Friedman-Kahn-Szemeredi and Feige-Ofek on the spectrum of sparse random matrices.

International Symposium on Information Theory | 2009

Matrix completion from a few entries

Raghunandan H. Keshavan; Sewoong Oh; Andrea Montanari

Let M be an nα × n matrix of rank r ≪ n, and assume that a uniformly random subset E of its entries is observed. We describe an efficient algorithm that reconstructs M from |E| = O(rn) observed entries with relative root mean square error RMSE ≤ C(α) (nr/|E|)^{1/2}. Further, if r = O(1) and M is sufficiently unstructured, then it can be reconstructed exactly from |E| = O(n log n) entries. This settles (in the case of bounded rank) a question left open by Candes and Recht and improves over the guarantees for their reconstruction algorithm. The complexity of our algorithm is O(|E|r log n), which opens the way to its use for massive data sets. In the process of proving these statements, we obtain a generalization of a celebrated result by Friedman-Kahn-Szemeredi and Feige-Ofek on the spectrum of sparse random matrices.

Very Large Data Bases | 2012

Counting with the crowd

Adam Marcus; David R. Karger; Samuel Madden; Robert C. Miller; Sewoong Oh

In this paper, we address the problem of selectivity estimation in a crowdsourced database. Specifically, we develop several techniques for using workers on a crowdsourcing platform like Amazon's Mechanical Turk to estimate the fraction of items in a dataset (e.g., a collection of photos) that satisfy some property or predicate (e.g., photos of trees). We do this without explicitly iterating through every item in the dataset. This is important in crowdsourced query optimization to support predicate ordering and in query evaluation, when performing a GROUP BY operation with a COUNT or AVG aggregate. We compare sampling item labels, a traditional approach, to showing workers a collection of items and asking them to estimate how many satisfy some predicate. Additionally, we develop techniques to eliminate spammers and colluding attackers trying to skew selectivity estimates when using this count estimation approach. We find that for images, counting can be much more effective than sampled labeling, reducing the amount of work necessary to arrive at an estimate that is within 1% of the true fraction by up to an order of magnitude, with lower worker latency. We also find that sampled labeling outperforms count estimation on a text processing task, presumably because people are better at quickly processing large batches of images than they are at reading strings of text. Our spammer detection technique, which is applicable to both the label- and count-based approaches, can improve accuracy by up to two orders of magnitude.

Information Theory Workshop | 2010

Sensor network localization from local connectivity: Performance analysis for the MDS-MAP algorithm

Sewoong Oh; Andrea Montanari; Amin Karbasi

Sensor localization from only connectivity information is a highly challenging problem. To this end, our result for the first time establishes an analytic bound on the performance of the popular MDS-MAP algorithm based on multidimensional scaling. For a network consisting of n sensors positioned randomly on a unit square and a given radio range r = o(1), we show that the resulting error is bounded, decreasing at a rate that is inversely proportional to r, when only connectivity information is given. The same bound holds for the range-based model, when we have approximate measurements for the distances, and the same algorithm can be applied without any modification.

Operations Research | 2017

Rank centrality: Ranking from pairwise comparisons

Sahand Negahban; Sewoong Oh; Devavrat Shah

The question of aggregating pairwise comparisons to obtain a global ranking over a collection of objects has been of interest for a very long time: be it ranking of online gamers (e.g., MSR’s TrueSkill system) and chess players, aggregating social opinions, or deciding which product to sell based on transactions. In most settings, in addition to obtaining a ranking, finding ‘scores’ for each object (e.g., player’s rating) is of interest for understanding the intensity of the preferences. In this paper, we propose Rank Centrality, an iterative rank aggregation algorithm for discovering scores for objects (or items) from pairwise comparisons. The algorithm has a natural random walk interpretation over the graph of objects with an edge present between a pair of objects if they are compared; the score, which we call Rank Centrality, of an object turns out to be its stationary probability under this random walk. To study the efficacy of the algorithm, we consider the popular Bradley-Terry-Luce (BTL) model (equivalent to the Multinomial Logit (MNL) for pairwise comparisons) in which each object has an associated score that determines the probabilistic outcomes of pairwise comparisons between objects. In terms of the pairwise marginal probabilities, which is the main subject of this paper, the MNL model and the BTL model are identical. We bound the finite sample error rates between the scores assumed by the BTL model and those estimated by our algorithm. In particular, the number of samples required to learn the score well with high probability depends on the structure of the comparison graph. When the Laplacian of the comparison graph has a strictly positive spectral gap, e.g., each item is compared to a subset of randomly chosen items, this leads to dependence on the number of samples that is nearly order optimal. Experimental evaluations on synthetic data sets generated according to the BTL model show that our algorithm performs as well as the maximum likelihood estimator for that model and outperforms other popular ranking algorithms.
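
The random walk interpretation in this abstract can be illustrated with a minimal sketch. It assumes one common construction: from item i the walk moves to a compared item j with probability equal to the fraction of i-versus-j comparisons that j won, divided by the maximum number of comparison partners of any item. The function name `rank_centrality`, the `wins` input format, and the fixed iteration count are illustrative choices, not details taken from the abstract.

```python
from collections import defaultdict


def rank_centrality(n, wins, iters=1000):
    """Score n items from pairwise comparisons via a random walk.

    wins[(a, b)] is the number of times item a beat item b (assumed format).
    Returns the approximate stationary distribution of the walk.
    Assumes the comparison graph is connected.
    """
    beat = defaultdict(float)
    neighbors = defaultdict(set)
    for (winner, loser), count in wins.items():
        beat[(winner, loser)] += count
        neighbors[winner].add(loser)
        neighbors[loser].add(winner)

    # Normalizing by the maximum degree keeps every row a valid distribution.
    d_max = max(len(neighbors[i]) for i in range(n))

    # P[i][j]: probability of stepping from i to j (toward the item that wins).
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in neighbors[i]:
            total = beat[(i, j)] + beat[(j, i)]
            P[i][j] = (beat[(j, i)] / total) / d_max
        P[i][i] = 1.0 - sum(P[i])  # remaining mass stays at i

    # Power iteration: repeatedly apply pi <- pi P from the uniform start.
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


# Comparison counts in exact BTL proportions for hypothetical weights 1, 2, 3.
wins = {(0, 1): 1, (1, 0): 2, (0, 2): 1, (2, 0): 3, (1, 2): 2, (2, 1): 3}
scores = rank_centrality(3, wins)
```

With win counts in exact BTL proportions, the walk satisfies detailed balance with respect to the normalized weights, so the scores recover them up to numerical error.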

Allerton Conference on Communication, Control, and Computing | 2009

Low-rank matrix completion with noisy observations: A quantitative comparison

Raghunandan H. Keshavan; Andrea Montanari; Sewoong Oh

We consider a problem of significant practical importance, namely, the reconstruction of a low-rank data matrix from a small subset of its entries. This problem appears in many areas such as collaborative filtering, computer vision and wireless sensor networks. In this paper, we focus on the matrix completion problem in the case when the observed samples are corrupted by noise. We compare the performance of three state-of-the-art matrix completion algorithms (OptSpace, ADMiRA and FPCA) on a single simulation platform and present numerical results. We show that in practice these efficient algorithms can be used to reconstruct real data matrices, as well as randomly generated matrices, accurately.

Measurement and Modeling of Computer Systems | 2011

Gossip PCA

Satish Babu Korada; Andrea Montanari; Sewoong Oh

Eigenvectors of data matrices play an important role in many computational problems, ranging from signal processing to machine learning and control. For instance, algorithms that compute positions of the nodes of a wireless network on the basis of pairwise distance measurements require a few leading eigenvectors of the distances matrix. While eigenvector calculation is a standard topic in numerical linear algebra, it becomes challenging under severe communication or computation constraints, or in absence of central scheduling. In this paper we investigate the possibility of computing the leading eigenvectors of a large data matrix through gossip algorithms. The proposed algorithm amounts to iteratively multiplying a vector by independent random sparsification of the original matrix and averaging the resulting normalized vectors. This can be viewed as a generalization of gossip algorithms for consensus, but the resulting dynamics is significantly more intricate. Our analysis is based on controlling the convergence to stationarity of the associated Kesten-Furstenberg Markov chain.
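
The iteration described in this abstract (multiply by an independent random sparsification, normalize, average) can be sketched in a centralized toy form. This is an illustration under assumptions, not the paper's distributed protocol: the sparsification rule (keep each entry with probability `keep_prob` and rescale by `1/keep_prob`), the function names, and the step count are all hypothetical choices.

```python
import math
import random


def sparsify(matrix, keep_prob, rng):
    """Keep each entry independently with probability keep_prob, rescaled
    by 1/keep_prob so the sparsified matrix is unbiased (assumed rule)."""
    return [[(a / keep_prob if rng.random() < keep_prob else 0.0) for a in row]
            for row in matrix]


def gossip_pca_sketch(matrix, keep_prob=0.8, steps=500, seed=0):
    """Estimate the leading eigenvector by averaging normalized iterates."""
    rng = random.Random(seed)
    n = len(matrix)
    x = [1.0 / math.sqrt(n)] * n
    avg = [0.0] * n
    for t in range(1, steps + 1):
        s = sparsify(matrix, keep_prob, rng)  # fresh sparsification each step
        y = [sum(s[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        if norm > 0.0:  # skip the update if every relevant entry was dropped
            x = [v / norm for v in y]
        # Running average of the normalized vectors.
        avg = [a + (xi - a) / t for a, xi in zip(avg, x)]
    norm = math.sqrt(sum(v * v for v in avg))
    return [v / norm for v in avg]


# Toy nonnegative rank-one matrix with leading eigenvector (1, 2, 2) / 3,
# so the iterates keep a consistent sign.
v = [1.0, 2.0, 2.0]
M = [[a * b for b in v] for a in v]
estimate = gossip_pca_sketch(M)
cosine = sum(e * vi / 3.0 for e, vi in zip(estimate, v))
```

Here `cosine` measures alignment between the averaged iterate and the true leading eigenvector of the toy matrix.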

Measurement and Modeling of Computer Systems | 2015

Spy vs. Spy: Rumor Source Obfuscation

Giulia C. Fanti; Peter Kairouz; Sewoong Oh; Pramod Viswanath

Anonymous messaging platforms, such as Secret, Yik Yak and Whisper, have emerged as important social media for sharing one's thoughts without the fear of being judged by friends, family, or the public. Further, such anonymous platforms are crucial in nations with authoritarian governments; the right to free expression and sometimes the personal safety of the author of the message depend on anonymity. Whether for fear of judgment or personal endangerment, it is crucial to keep anonymous the identity of the user who initially posted a sensitive message. In this paper, we consider an adversary who observes a snapshot of the spread of a message at a certain time. Recent advances in rumor source detection show that the existing messaging protocols are vulnerable against such an adversary. We introduce a novel messaging protocol, which we call adaptive diffusion, and show that it spreads the messages fast and achieves a perfect obfuscation of the source when the underlying contact network is an infinite regular tree: all users with the message are nearly equally likely to have been the origin of the message. Experiments on a sampled Facebook network show that it effectively hides the location of the source even when the graph is finite, irregular and has cycles.

IEEE Journal of Selected Topics in Signal Processing | 2015

The Staircase Mechanism in Differential Privacy

Quan Geng; Peter Kairouz; Sewoong Oh; Pramod Viswanath

Adding Laplacian noise is a standard approach in differential privacy to sanitize numerical data before releasing it. In this paper, we propose an alternative noise-adding mechanism: the staircase mechanism, which is a geometric mixture of uniform random variables. The staircase mechanism can replace the Laplace mechanism in each instance in the literature and for the same level of differential privacy, the performance in each instance improves; the improvement is particularly stark in medium-low privacy regimes. We show that the staircase mechanism is the optimal noise-adding mechanism in a universal context, subject to a conjectured technical lemma (which we also prove to be true for one- and two-dimensional data).
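
To make "a geometric mixture of uniform random variables" concrete, here is a minimal sampling sketch for scalar noise. The parameterization is an assumption based on a commonly cited description of staircase-shaped noise, not something stated in this abstract: a random sign, a geometric step index with ratio exp(-epsilon), and a uniform offset within either the inner or the outer part of the step. The function name `staircase_noise`, the `gamma` step-split parameter, and its default value are likewise assumptions.

```python
import math
import random


def staircase_noise(epsilon, sensitivity=1.0, gamma=None, rng=random):
    """Draw one sample of staircase-shaped noise (assumed parameterization)."""
    if gamma is None:
        # Assumed default split of each step into inner and outer parts.
        gamma = 1.0 / (1.0 + math.exp(epsilon / 2.0))
    b = math.exp(-epsilon)

    sign = 1.0 if rng.random() < 0.5 else -1.0
    # Geometric step index: P(G = k) = (1 - b) * b**k for k = 0, 1, 2, ...
    g = int(math.log(1.0 - rng.random()) / -epsilon)
    u = rng.random()  # uniform offset within the chosen part of the step

    # Probability of landing in the inner (higher-density) part of the step.
    p_inner = gamma / (gamma + (1.0 - gamma) * b)
    if rng.random() < p_inner:
        magnitude = (g + gamma * u) * sensitivity
    else:
        magnitude = (g + gamma + (1.0 - gamma) * u) * sensitivity
    return sign * magnitude


# Usage: perturb a hypothetical query answer of 42.0 with epsilon = 1.
rng = random.Random(0)
noisy_answer = 42.0 + staircase_noise(1.0, sensitivity=1.0, rng=rng)
```

The noise is added to the true answer exactly as Laplace noise would be; only the sampling distribution differs.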

Sensor Array and Multichannel Signal Processing Workshop | 2010

On positioning via distributed matrix completion

Andrea Montanari; Sewoong Oh

The basic question in matrix completion is to infer a large low-rank matrix from a small subset of its entries. Positioning refers to the task of inferring the locations of n points from a subset of their distances. It turns out that positioning can be viewed as a matrix completion problem, although of a peculiar type. This paper discusses the applicability of distributed matrix completion algorithms to the positioning problem.
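
One standard way to see the connection, offered here as background rather than as the paper's own formulation, is that the matrix of squared distances between points in d dimensions has low rank. The symbols x_i, X, a, and D below are introduced only for this illustration.

```latex
% Points x_1, ..., x_n in R^d, stacked as rows of X (n x d).
% a is the vector of squared norms, a_i = ||x_i||^2, and 1 is the all-ones vector.
\[
  D_{ij} \;=\; \|x_i - x_j\|^2 \;=\; \|x_i\|^2 \;-\; 2\,x_i^{\top} x_j \;+\; \|x_j\|^2 ,
\]
\[
  D \;=\; a\,\mathbf{1}^{\top} \;-\; 2\,X X^{\top} \;+\; \mathbf{1}\,a^{\top},
  \qquad
  \operatorname{rank}(D) \;\le\; 1 + d + 1 \;=\; d + 2 .
\]
```

Recovering D from a subset of its entries is therefore a low-rank completion problem, with rank at most 4 for points in the plane.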
