Grigory Yaroslavtsev
Pennsylvania State University
Publications
Featured research published by Grigory Yaroslavtsev.
ACM Transactions on Database Systems | 2014
Vishesh Karwa; Sofya Raskhodnikova; Adam D. Smith; Grigory Yaroslavtsev
We present efficient algorithms for releasing useful statistics about graph data while providing rigorous privacy guarantees. Our algorithms work on datasets that consist of relationships between individuals, such as social ties or email communication. The algorithms satisfy edge differential privacy, which essentially requires that the presence or absence of any particular relationship be hidden. Our algorithms output approximate answers to subgraph counting queries. Given a query graph H, for example, a triangle, k-star, or k-triangle, the goal is to return the number of edge-induced isomorphic copies of H in the input graph. The special case of triangles was considered by Nissim et al. [2007] and a more general investigation of arbitrary query graphs was initiated by Rastogi et al. [2009]. We extend the approach of Nissim et al. to a new class of statistics, namely k-star queries. We also give algorithms for k-triangle queries using a different approach based on the higher-order local sensitivity. For the specific graph statistics we consider (i.e., k-stars and k-triangles), we significantly improve on the work of Rastogi et al.: our algorithms satisfy a stronger notion of privacy that does not rely on the adversary having a particular prior distribution on the data, and add less noise to the answers before releasing them. We evaluate the accuracy of our algorithms both theoretically and empirically, using a variety of real and synthetic datasets. We give explicit, simple conditions under which these algorithms add a small amount of noise. We also provide the average-case analysis in the Erdős-Rényi-Gilbert G(n,p) random graph model. Finally, we give hardness results indicating that the approach Nissim et al. used for triangles cannot easily be extended to k-triangles (hence justifying our development of a new algorithmic approach).
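The release mechanism behind such statistics can be illustrated with a small sketch. The following is a minimal, hypothetical example of the Laplace mechanism applied to a triangle count; the paper's actual algorithms calibrate noise far more carefully (e.g., via local and smooth sensitivity), and the `sensitivity` argument here is an assumed promised bound, not something the code derives.

```python
import itertools
import math
import random

def triangle_count(edges):
    """Exact number of triangles (edge-induced copies of K3) in a graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return sum(1 for a, b, c in itertools.combinations(sorted(adj), 3)
               if b in adj[a] and c in adj[a] and c in adj[b])

def laplace_noise(scale, rng):
    """Sample from Laplace(0, scale) via inverse CDF (stdlib only)."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_triangle_count(edges, epsilon, sensitivity, rng):
    """Release count + Laplace(sensitivity / epsilon) noise. `sensitivity`
    must upper-bound how much adding or removing one edge can change the
    count -- an assumed promise here, not derived by the code."""
    return triangle_count(edges) + laplace_noise(sensitivity / epsilon, rng)
```

For the complete graph K4, `triangle_count` returns 4; the noisy release then perturbs this by roughly sensitivity/ε on average, which is exactly the regime where the "small amount of noise" conditions of the paper matter.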
symposium on the theory of computing | 2014
Alexandr Andoni; Aleksandar Nikolov; Krzysztof Onak; Grigory Yaroslavtsev
We give algorithms for geometric graph problems in modern parallel models such as MapReduce. For example, for the Minimum Spanning Tree (MST) problem over a set of points in two-dimensional space, our algorithm computes a (1 + ε)-approximate MST. Our algorithms work in a constant number of rounds of communication, while using total space and communication proportional to the size of the data (linear-space and near-linear-time algorithms). In contrast, for general graphs, achieving the same result for MST (or even connectivity) remains a challenging open problem [9], despite drawing significant attention in recent years. We develop a general algorithmic framework that, besides MST, also applies to Earth-Mover Distance (EMD) and the transportation cost problem. Our algorithmic framework has implications beyond the MapReduce model. For example, it yields a new algorithm for computing the EMD cost in the plane in near-linear time, n^(1 + o_ε(1)). We note that while the authors of [33] recently developed a near-linear-time algorithm for (1 + ε)-approximating EMD, our algorithm is fundamentally different and, for example, also solves the transportation (cost) problem, raised as an open question in [33]. Furthermore, our algorithm immediately gives a (1 + ε)-approximation algorithm with n^δ space in the streaming-with-sorting model with (1/δ)^O(1) passes. As such, it is tempting to conjecture that the parallel models may also constitute a concrete playground in the quest for efficient algorithms for EMD (and other similar problems) in the vanilla streaming model, a well-known open problem.
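As a toy illustration of the geometric flavor of such algorithms, consider how snapping points to a coarse grid (a common building block of partition-based parallel frameworks) preserves the MST weight up to a small additive distortion. This is a hypothetical, sequential sketch for intuition only, not the paper's MapReduce algorithm; `diameter` is an assumed known bound on the point spread.

```python
import math

def mst_weight(points):
    """Prim's algorithm on the complete Euclidean graph (fine for small n)."""
    n = len(points)
    in_tree = [False] * n
    dist = [math.inf] * n
    dist[0] = 0.0
    total = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: dist[i])
        in_tree[u] = True
        total += dist[u]
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < dist[v]:
                    dist[v] = d
    return total

def snapped_mst_weight(points, eps, diameter):
    """Snap each point to a grid with cell size eps * diameter / n; each
    point moves by at most about eps * diameter / n, so the MST weight
    changes by at most O(eps * diameter) overall."""
    cell = eps * diameter / len(points)
    snapped = [(round(x / cell) * cell, round(y / cell) * cell)
               for (x, y) in points]
    return mst_weight(snapped)
```

The point of the snapping step is that a coarse grid makes the input summarizable within the per-machine space budget while barely changing the answer.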
international conference on data engineering | 2013
Grigory Yaroslavtsev; Graham Cormode; Cecilia M. Procopiuc; Divesh Srivastava
A central problem in releasing aggregate information about sensitive data is to do so accurately while providing a privacy guarantee on the output. Recent work focuses on the class of linear queries, which include basic counting queries, data cubes, and contingency tables. The goal is to maximize the utility of their output, while giving a rigorous privacy guarantee. Most results follow a common template: pick a “strategy” set of linear queries to apply to the data, then use the noisy answers to these queries to reconstruct the queries of interest. This entails either picking a strategy set that is hoped to be good for the queries, or performing a costly search over the space of all possible strategies. In this paper, we propose a new approach that balances accuracy and efficiency: we show how to improve the accuracy of a given query set by answering some strategy queries more accurately than others. This leads to an efficient optimal noise allocation for many popular strategies, including wavelets, hierarchies, Fourier coefficients and more. For the important case of marginal queries we show that this strictly improves on previous methods, both analytically and empirically. Our results also extend to ensuring that the returned query answers are consistent with an (unknown) data set at minimal extra cost in terms of time and noise.
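The "strategy" template can be made concrete with the classic hierarchical strategy for range-count queries: answer every dyadic-interval sum with independent noise, then reconstruct any range from O(log n) noisy answers. A hedged sketch follows, assuming n is a power of two and with noise injected via a caller-supplied function; the paper's actual contribution, an optimal allocation of noise across the strategy queries, is not implemented here.

```python
def hierarchical_range_sums(data, noise=lambda: 0.0):
    """Build noisy sums for every dyadic interval of `data` (length a power
    of two), and return a function answering range-sum queries [lo, hi)
    from the noisy tree nodes that exactly tile the range."""
    n = len(data)
    tree = {}

    def build(lo, hi):
        tree[(lo, hi)] = sum(data[lo:hi]) + noise()
        if hi - lo > 1:
            mid = (lo + hi) // 2
            build(lo, mid)
            build(mid, hi)

    build(0, n)

    def query(lo, hi):
        def go(a, b):
            if lo <= a and b <= hi:      # node fully inside the query range
                return tree[(a, b)]
            if b <= lo or hi <= a:       # node disjoint from the query range
                return 0.0
            m = (a + b) // 2
            return go(a, m) + go(m, b)
        return go(0, n)

    return query
```

Any range touches at most 2 log n tree nodes, so its answer accumulates noise from only O(log n) strategy queries rather than O(n) counts.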
conference on computational complexity | 2014
Eric Blais; Sofya Raskhodnikova; Grigory Yaroslavtsev
We show how the communication complexity method introduced in (Blais, Brody, Matulef 2012) can be used to prove lower bounds on the number of queries required to test properties of functions with non-hypercube domains. We use this method to prove strong, and in many cases optimal, lower bounds on the query complexity of testing fundamental properties of functions f : {1, …, n}^d → ℝ over hypergrid domains: monotonicity, the Lipschitz property, separate convexity, convexity and monotonicity of higher-order derivatives. There is a long line of work on upper and lower bounds for many of these properties that uses a diverse set of combinatorial techniques. Our method provides a unified treatment of lower bounds for all these properties based on Fourier analysis. A key ingredient in our new lower bounds is a set of Walsh functions, a canonical Fourier basis for the set of functions on the line {1, …, n}. The orthogonality of the Walsh functions lets us use a product construction to extend our method from properties of functions over the line to properties of functions over hypergrids. Our product construction applies to properties over hypergrids that can be expressed in terms of axis-parallel directional derivatives, such as monotonicity, the Lipschitz property and separate convexity. We illustrate the robustness of our method by making it work for convexity, which is the property that the Hessian matrix of second derivatives is positive semidefinite and thus cannot be described by axis-parallel directional derivatives alone. Such robustness contrasts with the state of the art in upper bounds for testing properties over hypergrids: methods that work for other properties are not applicable to testing convexity, for which no nontrivial upper bounds are known for d ≥ 2.
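The Walsh basis is easy to write down when n is a power of two, indexing the domain as {0, …, n−1}: w_k(x) = (−1)^popcount(k AND x). A small sketch follows; the paper works over the line {1, …, n} and builds hypergrid functions via products, while here we only verify orthogonality, the property the product construction relies on.

```python
def walsh(n):
    """The n Walsh functions on {0, ..., n-1} (n a power of two), as +/-1
    vectors: w_k(x) = (-1) ** popcount(k & x)."""
    return [[(-1) ** bin(k & x).count("1") for x in range(n)]
            for k in range(n)]
```

Orthogonality means distinct rows have inner product 0 (and each row has squared norm n), so products of Walsh functions taken along different axes remain orthogonal on the hypergrid.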
Proceedings of the National Academy of Sciences of the United States of America | 2016
Michael J. Kearns; Aaron Roth; Zhiwei Steven Wu; Grigory Yaroslavtsev
Motivated by tensions between data privacy for individual citizens and societal priorities such as counterterrorism and the containment of infectious disease, we introduce a computational model that distinguishes between parties for whom privacy is explicitly protected, and those for whom it is not (the "targeted" subpopulation). The goal is the development of algorithms that can effectively identify and take action upon members of the targeted subpopulation in a way that minimally compromises the privacy of the protected, while simultaneously limiting the expense of distinguishing members of the two groups via costly mechanisms such as surveillance, background checks, or medical testing. Within this framework, we provide provably privacy-preserving algorithms for targeted search in social networks. These algorithms are natural variants of common graph search methods, and ensure privacy for the protected by the careful injection of noise in the prioritization of potential targets. We validate the utility of our algorithms with extensive computational experiments on two large-scale social network datasets.
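One way to picture "noise in the prioritization of potential targets": a best-first graph search whose priority is a node statistic (here, degree) perturbed by Laplace noise. This is a hypothetical sketch for intuition only; it is not the paper's algorithm and carries no privacy guarantee as written, and both the function names and the choice of degree as the statistic are assumptions.

```python
import heapq
import math
import random

def laplace(scale, rng):
    """Laplace(0, scale) sample via inverse CDF (stdlib only)."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def noisy_targeted_search(adj, is_target, seed, budget, eps, rng):
    """Best-first search from `seed` that examines up to `budget` nodes,
    prioritized by degree + Laplace(1/eps) noise (largest first). Each
    examination models one costly check (surveillance, testing, ...)."""
    visited, found = set(), []
    heap = [(-(len(adj[seed]) + laplace(1.0 / eps, rng)), seed)]
    while heap and budget > 0:
        _, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        budget -= 1
        if is_target(node):
            found.append(node)
        for nb in adj[node]:
            if nb not in visited:
                heapq.heappush(
                    heap, (-(len(adj[nb]) + laplace(1.0 / eps, rng)), nb))
    return found
```

The noise scale 1/eps controls the tradeoff the abstract describes: more noise better hides any individual's influence on the examination order, at the cost of spending budget on lower-priority nodes.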
symposium on the theory of computing | 2014
Piotr Berman; Sofya Raskhodnikova; Grigory Yaroslavtsev
We initiate a systematic study of sublinear algorithms for approximately testing properties of real-valued data with respect to L_p distances for p = 1, 2. Such algorithms distinguish datasets which either have (or are close to having) a certain property from datasets which are far from having it with respect to L_p distance. For applications involving noisy real-valued data, using L_p distances allows algorithms to withstand noise of bounded L_p norm. While the classical property testing framework developed with respect to Hamming distance has been studied extensively, testing with respect to L_p distances has received little attention. We use our framework to design simple and fast algorithms for classic problems, such as testing monotonicity, convexity and the Lipschitz property, as well as distance approximation to monotonicity. In particular, for functions over the hypergrid domains [n]^d, the complexity of our algorithms for all these properties does not depend on the linear dimension n. This is impossible in the standard model. Most of our algorithms require minimal assumptions on the choice of sampled data: either uniform or easily samplable random queries suffice. We also show connections between the L_p-testing model and the standard framework of property testing with respect to Hamming distance. Some of our results improve existing bounds for Hamming distance.
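The underlying distance is simple to state in code. Below is a plain sketch of the normalized L_p distance between two functions on a size-n domain, with values given as lists; the normalization by n reflects the convention that distances between [0, 1]-valued functions lie in [0, 1].

```python
def lp_distance(f, g, p):
    """Normalized L_p distance between two functions on a size-n domain,
    given as equal-length lists of values: (average of |f - g|^p)^(1/p)."""
    n = len(f)
    return (sum(abs(a - b) ** p for a, b in zip(f, g)) / n) ** (1.0 / p)
```

A function is then ε-far from a property (say, monotonicity) if every function with that property is at L_p distance more than ε, which is the notion of "far" the testers above must detect.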
principles of distributed computing | 2014
Joshua Brody; Amit Chakrabarti; Ranganath Kondapally; David P. Woodruff; Grigory Yaroslavtsev
We consider the following fundamental communication problem: there is data distributed among servers, and the servers want to compute the intersection of their data sets, e.g., the common records in a relational database. They want to do this with as little communication and as few messages (rounds) as possible. They are willing to use randomization and to fail with a tiny probability. Given a protocol for computing the intersection, it can also be used to compute the exact Jaccard similarity, the rarity, the number of distinct elements, and joins between databases. Computing the intersection is at least as hard as the set disjointness problem, which asks whether the intersection is empty. Formally, in the two-server setting, the players hold subsets S, T ⊆ [n]. In many realistic scenarios, the sizes of S and T are significantly smaller than n, so we impose the constraint that |S|, |T| ≤ k. We study the minimum number of bits the parties need to communicate in order to compute the intersection set S ∩ T, given a certain number r of messages that are allowed to be exchanged. While O(k log(n/k)) bits is achieved trivially and deterministically with a single message, we ask what is possible with more than one message and with randomization. We give a smooth communication/round tradeoff which shows that with O(log* k) rounds, O(k) bits of communication is possible, which improves upon the trivial protocol by an order of magnitude. This is in contrast to other basic problems such as computing the union or symmetric difference, for which Ω(k log(n/k)) bits of communication is required for any number of rounds. For two players, known lower bounds for the easier problem of set disjointness imply our algorithms are optimal up to constant factors in communication and number of rounds. We extend our protocols to m-player protocols, obtaining an optimal O(mk) bits of communication with a similarly small number of rounds.
international workshop on approximation, randomization, and combinatorial optimization. algorithms and techniques | 2014
Joshua Brody; Amit Chakrabarti; Ranganath Kondapally; David P. Woodruff; Grigory Yaroslavtsev
Combinatorica | 2014
Piotr Berman; Arnab Bhattacharyya; Elena Grigorescu; Sofya Raskhodnikova; David P. Woodruff; Grigory Yaroslavtsev
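Returning to the set-intersection problem: to see why randomization helps beat the trivial O(k log(n/k))-bit single message, here is a hypothetical two-round fingerprinting sketch. Alice sends short hashes of her elements (about |S| · b bits for b-bit hashes, independent of n), and Bob returns his elements whose hashes match; the output contains S ∩ T, plus spurious elements only when hashes collide. The paper's O(log* k)-round protocol is substantially more refined, and the fixed cryptographic hash below stands in for a shared random hash function.

```python
import hashlib

def short_hash(x, bits, salt=0):
    """Deterministic b-bit fingerprint of an element (illustrative)."""
    digest = hashlib.sha256(f"{salt}:{x}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % (1 << bits)

def two_round_intersection(S, T, bits=20, salt=0):
    """Round 1 (Alice -> Bob): hashes of S, roughly len(S) * bits bits.
    Round 2 (Bob -> Alice): elements of T whose hash appears in round 1.
    The result contains S & T; extra elements arise only on collisions."""
    alice_msg = {short_hash(x, bits, salt) for x in S}
    return {y for y in T if short_hash(y, bits, salt) in alice_msg}
```

Note the contrast with union or symmetric difference: there the abstract's Ω(k log(n/k)) lower bound says no amount of hashing can avoid describing the elements themselves.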
international workshop on approximation, randomization, and combinatorial optimization. algorithms and techniques | 2012
Piotr Berman; Grigory Yaroslavtsev
The EQUALITY problem is usually one's first encounter with communication complexity and is one of the most fundamental problems in the field. Although its deterministic and randomized communication complexity were settled decades ago, we find several new things to say about the problem by focusing on three subtle aspects. The first is to consider the expected communication cost (at a worst-case input) for a protocol that uses limited interaction—i.e., a bounded number of rounds of communication—and whose error probability is zero or close to it. The second is to treat the false negative error rate separately from the false positive error rate. The third is to consider the information cost of such protocols. We obtain asymptotically optimal rounds-versus-cost tradeoffs for EQUALITY: both expected communication cost and information cost scale as Θ(log log … log n), with r − 1 logs, where r is the number of rounds. These bounds hold even when the false negative rate approaches 1. For the case of zero-error communication cost, we obtain essentially matching bounds, up to a tiny additive constant. We also provide some applications.
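The randomized upper bounds for EQUALITY are fingerprint-based. A one-round sketch follows (hedged: real protocols use shared random hash families and, for the multi-round tradeoff, compare very short fingerprints first and escalate only on a match; a fixed cryptographic hash stands in for the random hash here).

```python
import hashlib

def fingerprint(x: bytes, bits: int) -> int:
    """A `bits`-bit fingerprint of the input (stand-in for a random hash)."""
    return int.from_bytes(hashlib.sha256(x).digest(), "big") % (1 << bits)

def equality_one_round(x: bytes, y: bytes, bits: int = 32) -> bool:
    """One-message EQUALITY: Alice sends fingerprint(x); Bob accepts iff it
    equals fingerprint(y). There are never false negatives; false positives
    occur with probability about 2**-bits over the hash choice."""
    return fingerprint(x, bits) == fingerprint(y, bits)
```

Comparing a short fingerprint first and spending more bits only when it matches is, roughly, the intuition behind the Θ(log log … log n) expected-cost tradeoff as the number of rounds r grows.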