Flávio du Pin Calmon
Harvard University
Publications
Featured research published by Flávio du Pin Calmon.
Allerton Conference on Communication, Control, and Computing | 2012
Flávio du Pin Calmon; Nadia Fawaz
We propose a general statistical inference framework to capture the privacy threat incurred by a user who releases data to a passive but curious adversary, given utility constraints. We show that applying this general framework to the setting where the adversary uses the self-information cost function naturally leads to a non-asymptotic, information-theoretic approach for characterizing the best achievable privacy subject to utility constraints. Based on these results, we introduce two privacy metrics, namely average information leakage and maximum information leakage. We prove that, under both metrics, the resulting design problem of finding the optimal mapping from the user's data to a privacy-preserving output can be cast as a modified rate-distortion problem which, in turn, can be formulated as a convex program. Finally, we compare our framework with differential privacy.
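As a rough illustration of this kind of convex formulation, the sketch below minimizes the mutual-information leakage I(S;Y) over a randomized mapping p(y|x) subject to an expected-distortion constraint. It is a minimal sketch under stated assumptions (cvxpy and numpy available; toy distribution, Hamming distortion, and variable names are invented for illustration), not the paper's exact program.

```python
import numpy as np
import cvxpy as cp

# Toy joint distribution p(s, x): 2 private states, 3 data values (illustrative numbers).
P_sx = np.array([[0.20, 0.15, 0.05],
                 [0.05, 0.15, 0.40]])
p_s = P_sx.sum(axis=1)            # marginal of the private variable S
p_x = P_sx.sum(axis=0)            # marginal of the data X to be released
nx, ny = 3, 3                     # output alphabet of the same size as the input, for simplicity
dist = 1.0 - np.eye(nx, ny)       # Hamming distortion between x and y
D_max = 0.3                       # allowed expected distortion

Q = cp.Variable((nx, ny), nonneg=True)        # privacy mapping Q[x, y] = p(y | x)
P_sy = P_sx @ Q                               # joint p(s, y), affine in Q
p_y = cp.sum(P_sy, axis=0)                    # marginal p(y)
# Leakage I(S; Y) in nats: sum_{s,y} p(s,y) log( p(s,y) / (p(s) p(y)) )
denom = p_s.reshape(-1, 1) @ cp.reshape(p_y, (1, ny))
leakage = cp.sum(cp.rel_entr(P_sy, denom))

constraints = [cp.sum(Q, axis=1) == 1,                               # each row is a distribution
               cp.sum(cp.multiply(p_x[:, None] * dist, Q)) <= D_max] # utility (distortion) constraint
prob = cp.Problem(cp.Minimize(leakage), constraints)
prob.solve()
print("minimal leakage (nats):", leakage.value)
```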
IEEE Global Conference on Signal and Information Processing | 2013
Salman Salamatian; Amy Zhang; Flávio du Pin Calmon; Sandilya Bhamidipati; Nadia Fawaz; Branislav Kveton; Pedro Oliveira; Nina Taft
We propose a practical methodology to protect a user's private data when the user wishes to publicly release data that is correlated with this private data, in the hope of getting some utility. Our approach relies on a general statistical inference framework that captures the privacy threat under inference attacks, given utility constraints. Under this framework, data is distorted before it is released, according to a privacy-preserving probabilistic mapping. This mapping is obtained by solving a convex optimization problem, which minimizes information leakage under a distortion constraint. We address a practical challenge encountered when applying this theoretical framework to real-world data: the optimization may become intractable and face scalability issues when the data takes values in a large alphabet or is high-dimensional. Our work makes two major contributions. We first reduce the optimization size by introducing a quantization step, and show how to generate privacy mappings under quantization. Second, we evaluate our method on a dataset showing correlations between political views and TV viewing habits, and demonstrate that good privacy properties can be achieved with limited distortion, so as not to undermine the original purpose of the publicly released data, e.g., recommendations.
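A quantization step of this flavor can be prototyped along the following lines. This is a hedged sketch (assuming numpy and scikit-learn; data and names are invented), not the paper's exact procedure: data values whose posterior distributions over the private attribute are close are merged into a single quantized symbol before the convex program is solved.

```python
import numpy as np
from sklearn.cluster import KMeans

def quantize_alphabet(P_sx, n_clusters, seed=0):
    """Merge data values x whose posteriors p(S | X = x) are close, reducing the
    alphabet size before solving the optimization. Sketch only: cluster the
    posterior vectors with k-means and map each x to its cluster index."""
    p_x = P_sx.sum(axis=0)                      # marginal of X (columns index x)
    posteriors = (P_sx / p_x).T                 # row i = p(S | X = x_i)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(posteriors)
    return km.labels_                           # quantized symbol for each original x

# Example: 100 data values, 2 private states, quantized down to 5 symbols.
rng = np.random.default_rng(0)
P_sx = rng.random((2, 100)); P_sx /= P_sx.sum()
print(quantize_alphabet(P_sx, n_clusters=5))
```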
International Symposium on Information Theory | 2015
Flávio du Pin Calmon; Ali Makhdoumi; Muriel Médard
We investigate the problem of intentionally disclosing information about a set of measurement points X (useful information), while guaranteeing that little or no information is revealed about a private variable S (private information). Given that S and X are drawn from a finite set with joint distribution p_{S,X}, we prove that a non-trivial amount of useful information can be disclosed while not disclosing any private information if and only if the smallest principal inertia component of the joint distribution of S and X is 0. This fundamental result characterizes when useful information can be privately disclosed for any privacy metric based on statistical dependence. We derive sharp bounds for the tradeoff between disclosure of useful and private information, and provide explicit constructions of privacy-assuring mappings that achieve these bounds.
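The principal inertia components themselves are straightforward to compute from the joint distribution. The sketch below (numpy only; the example pmf is invented for illustration) obtains them as the squared non-trivial singular values of the normalized joint-distribution matrix, so the paper's condition amounts to checking whether the smallest of these values is zero.

```python
import numpy as np

def principal_inertia_components(P):
    """Principal inertia components of a joint pmf P (rows: S, columns: X):
    squared singular values of diag(p_S)^{-1/2} P diag(p_X)^{-1/2},
    excluding the trivial singular value equal to 1."""
    p_s, p_x = P.sum(axis=1), P.sum(axis=0)
    Q = np.diag(p_s ** -0.5) @ P @ np.diag(p_x ** -0.5)
    sv = np.linalg.svd(Q, compute_uv=False)
    return np.sort(sv[1:] ** 2)[::-1]           # largest first

# Example: the third row of p_{S,X} is a mixture of the first two, so the joint
# matrix is rank-deficient and the smallest principal inertia component is 0.
P = np.array([[0.200, 0.100, 0.100],
              [0.100, 0.100, 0.200],
              [0.075, 0.050, 0.075]])
pics = principal_inertia_components(P)
print(pics, "-> non-trivial private disclosure possible:", np.isclose(pics[-1], 0.0))
```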
Vehicular Technology Conference | 2013
Jason Cloud; Flávio du Pin Calmon; Weifei Zeng; Giovanni Pau; Linda Zeger; Muriel Médard
Existing mobile devices have the capability to use multiple network technologies simultaneously to help increase performance, but they rarely, if ever, use these technologies effectively in parallel. We first present empirical data to help understand the mobile environment when three heterogeneous networks are available to the mobile device (i.e., a WiFi network, a WiMax network, and an Iridium satellite network). We then propose a reliable multi-path protocol called Multi-Path TCP with Network Coding (MPTCP/NC) that utilizes each of these networks in parallel. An analytical model is developed and a mean-field approximation is derived that gives an estimate of the protocol's achievable throughput. Finally, a comparison between MPTCP and MPTCP/NC is presented using both the empirical data and the mean-field approximation. Our results show that network coding can provide users in mobile environments with a higher quality of service by enabling the use of multiple network technologies and the capability to overcome packet losses due to lossy wireless network connections.
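As a back-of-envelope illustration only (not the paper's mean-field model; the per-path parameters below are invented), one can compare the classic single-path TCP throughput estimate with a naive sum across heterogeneous paths to see why using WiFi, WiMax, and satellite links in parallel is attractive.

```python
import math

def tcp_throughput_estimate(mss_bytes, rtt_s, loss_rate):
    """Classic Mathis-style approximation: throughput ~ (MSS / RTT) * sqrt(3 / (2 p)).
    Returns bits per second; purely illustrative, not MPTCP/NC's analytical model."""
    return (mss_bytes * 8 / rtt_s) * math.sqrt(3.0 / (2.0 * loss_rate))

# Hypothetical per-path parameters: (MSS bytes, RTT seconds, packet-loss rate).
paths = {"WiFi":    (1460, 0.020, 0.02),
         "WiMax":   (1460, 0.060, 0.05),
         "Iridium": (1460, 1.500, 0.10)}

per_path = {name: tcp_throughput_estimate(*p) for name, p in paths.items()}
best_single = max(per_path.values())
aggregate = sum(per_path.values())   # optimistic estimate if all paths run in parallel
print({k: round(v / 1e6, 2) for k, v in per_path.items()}, "Mbit/s each")
print("best single path: %.2f Mbit/s, parallel aggregate: %.2f Mbit/s"
      % (best_single / 1e6, aggregate / 1e6))
```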
Allerton Conference on Communication, Control, and Computing | 2013
Flávio du Pin Calmon; Mayank Varia; Muriel Médard; Mark M. Christiansen; Ken R. Duffy; Stefano Tessaro
Lower bounds for the average probability of error of estimating a hidden variable X given an observation of a correlated random variable Y, and Fano's inequality in particular, play a central role in information theory. In this paper, we present a lower bound for the average estimation error based on the marginal distribution of X and the principal inertias of the joint distribution matrix of X and Y. Furthermore, we discuss an information measure based on the sum of the largest principal inertias, called k-correlation, which generalizes maximal correlation. We show that k-correlation satisfies the Data Processing Inequality and is convex in the conditional distribution of Y given X. Finally, we investigate how to answer a fundamental question in inference and privacy: given an observation Y, can we estimate a function f(X) of the hidden random variable X with an average error below a certain threshold? We provide a general method for answering this question using an approach based on rate-distortion theory.
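For concreteness, the sketch below (numpy, with an invented joint pmf) evaluates the classical Fano lower bound on the average estimation error and compares it with the exact error of the MAP estimator; the paper's principal-inertia-based bounds refine this kind of comparison but are not reproduced here.

```python
import numpy as np

def entropy_bits(p):
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def fano_bound_and_map_error(P_xy):
    """P_xy[i, j] = Pr(X = i, Y = j). Fano: H(Pe) + Pe*log2(|X|-1) >= H(X|Y),
    weakened here (H(Pe) <= 1) to Pe >= (H(X|Y) - 1) / log2(|X| - 1)."""
    p_y = P_xy.sum(axis=0)
    H_x_given_y = entropy_bits(P_xy.ravel()) - entropy_bits(p_y)
    n_x = P_xy.shape[0]
    fano = max(0.0, (H_x_given_y - 1.0) / np.log2(n_x - 1))
    map_error = 1.0 - P_xy.max(axis=0).sum()   # exact error of the MAP estimator
    return fano, map_error

# Toy joint pmf over |X| = 4, |Y| = 4 (illustrative numbers, rows sum to 0.25 each).
P = np.array([[0.13, 0.04, 0.04, 0.04],
              [0.04, 0.13, 0.04, 0.04],
              [0.04, 0.04, 0.13, 0.04],
              [0.04, 0.04, 0.04, 0.13]])
print("Fano lower bound vs MAP error:", fano_bound_and_map_error(P))
```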
Information Theory Workshop | 2014
Flávio du Pin Calmon; Mayank Varia; Muriel Médard
The principal inertia components of the joint distribution of two random variables X and Y are inherently connected to how an observation of Y is statistically related to a hidden variable X. In this paper, we explore this connection within an information theoretic framework. We show that, under certain symmetry conditions, the principal inertia components play an important role in estimating one-bit functions of X, namely f(X), given an observation of Y. In particular, the principal inertia components bear an interpretation as filter coefficients in the linear transformation of p_{f(X)|X} into p_{f(X)|Y}. This interpretation naturally leads to the conjecture that the mutual information between f(X) and Y is maximized when all the principal inertia components have equal value. We also study the role of the principal inertia components in the Markov chain B → X → Y → B̂, where B and B̂ are binary random variables. We illustrate our results for the setting where X and Y are binary strings and Y is the result of sending X through an additive noise binary channel.
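A small brute-force computation (numpy, exact for short strings; the choice of functions and parameters is illustrative) makes the filtering effect concrete: send a uniform 4-bit X through a memoryless binary symmetric channel and compare how much information Y retains about two one-bit functions of X, a single coordinate versus the parity of all bits.

```python
import itertools
import numpy as np

def mutual_information_bits(P):
    """I between the row and column variables of a joint pmf P, in bits."""
    p_r, p_c = P.sum(axis=1, keepdims=True), P.sum(axis=0, keepdims=True)
    mask = P > 0
    return float((P[mask] * np.log2(P[mask] / (p_r @ p_c)[mask])).sum())

def info_about_f(n, eps, f):
    """Exact I(f(X); Y) for uniform X in {0,1}^n sent through BSC(eps)."""
    P = np.zeros((2, 2 ** n))                       # joint pmf of (f(X), Y)
    for x in itertools.product([0, 1], repeat=n):
        for y in itertools.product([0, 1], repeat=n):
            d = sum(a != b for a, b in zip(x, y))
            P[f(x), int("".join(map(str, y)), 2)] += (eps ** d) * ((1 - eps) ** (n - d)) / 2 ** n
    return mutual_information_bits(P)

n, eps = 4, 0.1
print("I(X_1; Y)       =", info_about_f(n, eps, lambda x: x[0]))        # ~ 1 - h(eps)
print("I(parity(X); Y) =", info_about_f(n, eps, lambda x: sum(x) % 2))  # much smaller
```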
IEEE Journal of Selected Topics in Signal Processing | 2015
Salman Salamatian; Amy X. Zhang; Flávio du Pin Calmon; Sandilya Bhamidipati; Nadia Fawaz; Branislav Kveton; Pedro Oliveira; Nina Taft
We propose a practical methodology to protect a user's private data when the user wishes to publicly release data that is correlated with this private data in order to get some utility. Our approach relies on a general statistical inference framework that captures the privacy threat under inference attacks, given utility constraints. Under this framework, data is distorted before it is released, according to a probabilistic privacy mapping. This mapping is obtained by solving a convex optimization problem, which minimizes information leakage under a distortion constraint. We address practical challenges encountered when applying this theoretical framework to real-world data. On one hand, the design of optimal privacy mappings requires knowledge of the prior distribution linking private data and data to be released, which is often unavailable in practice. On the other hand, the optimization may become intractable when the data takes values in a large alphabet or is high-dimensional. Our work makes three major contributions. First, we provide bounds on the impact of a mismatched prior on the privacy-utility tradeoff. Second, we show how to reduce the optimization size by introducing a quantization step, and how to generate privacy mappings under quantization. Third, we evaluate our method on two datasets, including a new dataset that we collected, showing correlations between political convictions and TV viewing habits. We demonstrate that good privacy properties can be achieved with limited distortion, so as not to undermine the original purpose of the publicly released data, e.g., recommendations.
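The effect of a mismatched prior can be probed numerically: fix a mapping p(y|x) designed under an assumed joint distribution and evaluate the leakage I(S;Y) under the true one. The sketch below (numpy; the mapping and both priors are invented numbers) performs only the evaluation step; the paper's bounds quantify this gap analytically.

```python
import numpy as np

def leakage_bits(P_sx, Q):
    """I(S; Y) in bits when X is passed through the mapping Q[x, y] = p(y | x)."""
    P_sy = P_sx @ Q
    p_s, p_y = P_sy.sum(axis=1, keepdims=True), P_sy.sum(axis=0, keepdims=True)
    mask = P_sy > 0
    return float((P_sy[mask] * np.log2(P_sy[mask] / (p_s @ p_y)[mask])).sum())

# A fixed privacy mapping (rows sum to 1), e.g. one obtained from the convex program
# under the assumed prior; all numbers here are purely illustrative.
Q = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])
P_assumed = np.array([[0.30, 0.10, 0.10],
                      [0.10, 0.10, 0.30]])
P_true    = np.array([[0.40, 0.05, 0.05],
                      [0.05, 0.05, 0.40]])
print("leakage under assumed prior:", leakage_bits(P_assumed, Q))
print("leakage under true prior:   ", leakage_bits(P_true, Q))
```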
Allerton Conference on Communication, Control, and Computing | 2012
Flávio du Pin Calmon; Muriel Médard; Linda Zeger; João Barros; Mark M. Christiansen; Ken R. Duffy
We present a new information-theoretic definition and associated results, based on list decoding in a source coding setting. We begin by presenting list-source codes, which naturally map a key length (entropy) to list size. We then show that such codes can be analyzed in the context of a novel information-theoretic metric, ϵ-symbol secrecy, that encompasses both the one-time pad and traditional rate-based asymptotic metrics, but, like most cryptographic constructs, can be applied in non-asymptotic settings. We derive fundamental bounds for ϵ-symbol secrecy and demonstrate how these bounds can be achieved with MDS codes when the source is uniformly distributed. We discuss applications and implementation issues of our codes.
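A tiny brute-force check conveys the flavor of a syndrome-based construction (a toy instance only, using the binary [3,1,3] repetition code, which is MDS; not the paper's general scheme): publishing the syndrome of a uniform source leaves the adversary with a list of candidate sequences whose size is set by the code dimension, while each individual source symbol remains uniformly distributed.

```python
import itertools
import numpy as np

H = np.array([[1, 1, 0],     # parity-check matrix of the [3, 1, 3] repetition code
              [1, 0, 1]])    # (an MDS code); the published part is the syndrome H x over GF(2)

sources = list(itertools.product([0, 1], repeat=3))        # uniform 3-bit source
by_syndrome = {}
for x in sources:
    s = tuple(H.dot(x) % 2)                                # unencrypted, publicly released part
    by_syndrome.setdefault(s, []).append(x)

for s, candidates in by_syndrome.items():
    # List size 2: a single key bit resolves it. Each coordinate is still uniform on the
    # list, so no individual source symbol is revealed (symbol secrecy for single symbols).
    marginals = [sum(c[i] for c in candidates) / len(candidates) for i in range(3)]
    print("syndrome", s, "-> list", candidates, "per-symbol mean", marginals)
```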
International Symposium on Information Theory | 2015
Flávio du Pin Calmon; Yury Polyanskiy; Yihong Wu
This work presents strong data processing results for the power-constrained additive Gaussian channel. Explicit bounds on the amount of decrease of mutual information under convolution with Gaussian noise are shown. The analysis leverages the connection between information and estimation (I-MMSE) and the following estimation-theoretic result of independent interest. It is proved that any random variable for which there exists an almost optimal (in terms of the mean-squared error) linear estimator operating on the Gaussian-corrupted measurement must necessarily be almost Gaussian (in terms of the Kolmogorov-Smirnov distance).
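The estimation-theoretic statement can be sanity-checked numerically; the Monte Carlo sketch below (numpy; parameters chosen for illustration, not the paper's bound) shows that for a Gaussian input the optimal estimator from the Gaussian-corrupted observation is exactly linear, while for a markedly non-Gaussian input, here a ±1 symbol, the linear estimator is strictly suboptimal.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma2 = 1_000_000, 0.5                   # sample count and Gaussian-noise variance
noise = rng.normal(0.0, np.sqrt(sigma2), n)

# Gaussian input with unit variance: the MMSE estimator is linear, so the errors coincide.
xg = rng.normal(0.0, 1.0, n)
yg = xg + noise
lmmse_g = np.mean((xg - yg / (1.0 + sigma2)) ** 2)          # linear (Wiener) estimate

# Binary +/-1 input (also unit variance): the conditional mean is tanh(y / sigma2), not linear.
xb = rng.choice([-1.0, 1.0], n)
yb = xb + noise
mmse_b = np.mean((xb - np.tanh(yb / sigma2)) ** 2)
lmmse_b = np.mean((xb - yb / (1.0 + sigma2)) ** 2)

print("Gaussian input : linear error %.4f (theory %.4f)" % (lmmse_g, sigma2 / (1 + sigma2)))
print("Binary input   : MMSE %.4f  <  linear error %.4f" % (mmse_b, lmmse_b))
```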
IEEE Transactions on Information Theory | 2015
Mark M. Christiansen; Ken R. Duffy; Flávio du Pin Calmon; Muriel Médard
The Guesswork problem was originally motivated by a desire to quantify computational security for single-user systems. Leveraging recent results from its analysis, we extend the remit and utility of the framework to the quantification of computational security for multi-user systems. In particular, assume that V users independently select strings stochastically from a finite, but potentially large, list. An inquisitor who does not know which strings have been selected wishes to identify U of them. The inquisitor knows the selection probabilities of each user and is equipped with a method that enables the testing of each (user, string) pair, one at a time, for whether that string had been selected by that user. Here we establish that, unless U = V, there is no general strategy that minimizes the distribution of the number of guesses, but in the asymptote as the strings become long we prove the following: by construction, there is an asymptotically optimal class of strategies; the number of guesses required in an asymptotically optimal strategy satisfies a large deviation principle with a rate function, which is not necessarily convex, that can be determined from the rate functions of optimally guessing individual users' strings; if all users' selection statistics are identical, the exponential growth rate of the average guesswork as the string-length (