Mathias Humbert
École Polytechnique Fédérale de Lausanne
Publications
Featured research published by Mathias Humbert.
Computer and Communications Security | 2013
Mathias Humbert; Erman Ayday; Jean-Pierre Hubaux; Amalio Telenti
The rapid progress in human-genome sequencing is leading to a high availability of genomic data. This data is notoriously sensitive and stable over time. It is also highly correlated among relatives. A growing number of genomes are becoming accessible online (e.g., because of leakage, or after their posting on genome-sharing websites). What, then, are the implications for kin genomic privacy? We formalize the problem and detail an efficient reconstruction attack based on graphical models and belief propagation. With this approach, an attacker can infer the genomes of the relatives of an individual whose genome is observed, relying notably on Mendel's laws and statistical relationships between the nucleotides on the DNA sequence. Then, to quantify the level of genomic privacy as a result of the proposed inference attack, we discuss possible definitions of genomic-privacy metrics. Genomic data reveals Mendelian diseases and the likelihood of developing degenerative diseases such as Alzheimer's. We also introduce the quantification of health privacy, specifically the measure of how well the predisposition to a disease is concealed from an attacker. We evaluate our approach on actual genomic data from a pedigree and show the extent of the threat by combining data gathered from a genome-sharing website and from an online social network.
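To make the core inference step concrete, here is a minimal sketch (not the authors' code) of the Mendelian-inheritance factor such a reconstruction attack builds on: given the parents' genotype distributions at a single SNP, one belief-propagation-style message computes the child's distribution. The genotype encoding and the example allele frequency are assumptions for illustration.

```python
# Minimal sketch: one building block of a Mendelian-inheritance attack.
# Genotypes at a SNP are coded as the number of minor alleles: 0, 1, or 2.

def mendel(parent_gt):
    """Probability that a parent with this genotype transmits a minor allele."""
    return {0: 0.0, 1: 0.5, 2: 1.0}[parent_gt]

def child_distribution(p_mother, p_father):
    """Given marginal genotype distributions of the two parents
    (dicts {0,1,2} -> prob), return the child's genotype distribution."""
    p_child = {0: 0.0, 1: 0.0, 2: 0.0}
    for gm, pm in p_mother.items():
        for gf, pf in p_father.items():
            tm, tf = mendel(gm), mendel(gf)  # transmission probabilities
            p_child[0] += pm * pf * (1 - tm) * (1 - tf)
            p_child[1] += pm * pf * (tm * (1 - tf) + (1 - tm) * tf)
            p_child[2] += pm * pf * tm * tf
    return p_child

# Example: mother's genotype observed (heterozygous), father unknown
# (Hardy-Weinberg prior for an assumed minor-allele frequency of 0.3).
prior = {0: 0.49, 1: 0.42, 2: 0.09}
print(child_distribution({1: 1.0}, prior))
```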
IEEE Transactions on Mobile Computing | 2017
Alexandra-Mihaela Olteanu; Kévin Huguenin; Reza Shokri; Mathias Humbert; Jean-Pierre Hubaux
Co-location information about users is increasingly available online. For instance, mobile users increasingly report their co-locations with other users in the messages and pictures they post on social networking websites, by tagging the names of the friends they are with. The users' IP addresses also constitute a source of co-location information. Combined with (possibly obfuscated) location information, such co-locations can be used to improve the inference of the users' locations, thus further threatening their location privacy: as co-location information is taken into account, not only a user's reported locations and mobility patterns can be used to localize her, but also those of her friends (and the friends of their friends, and so on). In this paper, we study this problem by quantifying the effect of co-location information on location privacy, considering an adversary such as a social network operator that has access to such information. We formalize the problem and derive an optimal inference algorithm that incorporates such co-location information, yet at the cost of high complexity. We propose some approximate inference algorithms, including a solution that relies on the belief propagation algorithm executed on a general Bayesian network model, and we extensively evaluate their performance. Our experimental results show that, even in the case where the adversary considers co-locations of the targeted user with a single friend, the median location privacy of the user is decreased by up to 62 percent in a typical setting. We also study the effect of the different parameters (e.g., the settings of the location-privacy protection mechanisms) in different scenarios.
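A toy illustration of why co-location leaks information (an assumed independence model, not the paper's optimal algorithm): if the adversary holds separate beliefs about Alice's and Bob's locations and learns they are together, the fused posterior concentrates on regions plausible for both.

```python
# Toy sketch: fusing a reported co-location into the adversary's belief
# about Alice's location. Regions and priors are invented for illustration.
import numpy as np

regions = ["home", "office", "cafe", "park"]
p_alice = np.array([0.40, 0.30, 0.20, 0.10])  # prior belief about Alice
p_bob   = np.array([0.05, 0.70, 0.20, 0.05])  # prior belief about Bob

# If "Alice and Bob are co-located" and their locations were a priori
# independent, the posterior over their common region is proportional
# to the product of the two marginals.
posterior = p_alice * p_bob
posterior /= posterior.sum()

for r, p in zip(regions, posterior):
    print(f"{r:>6}: {p:.2f}")
# Bob's strong "office" evidence now dominates the belief about Alice.
```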
Decision and Game Theory for Security | 2010
Mathias Humbert; Mohammad Hossein Manshaei; Julien Freudiger; Jean-Pierre Hubaux
Users of mobile networks can change their identifiers in regions called mix zones in order to defeat the attempt of third parties to track their location. Mix zones must be deployed carefully in the network to limit the cost they impose on mobile users and to provide high location privacy. Unlike most previous works that assume a global adversary, we consider a local adversary equipped with multiple eavesdropping stations. We study the interaction between the local adversary deploying eavesdropping stations to track mobile users and mobile users deploying mix zones to protect their location privacy. We use a game-theoretic model to predict the strategies of both players. We derive the strategies at equilibrium in complete and incomplete information scenarios and propose an algorithm to compute the equilibrium in a large network. Finally, based on real road-traffic information, we numerically quantify the effect of complete and incomplete information on the strategy selection of mobile users and of the adversary. Our results enable system designers to predict the best response of mobile users with respect to a local adversary strategy, and thus to select the best deployment of countermeasures.
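As a sketch of the game-theoretic machinery (with made-up payoffs, not the paper's model), the following enumerates pure-strategy Nash equilibria of a small user-versus-adversary game by checking best responses; the paper's equilibria over real road networks are far richer.

```python
# Illustrative sketch with invented payoffs: finding pure-strategy Nash
# equilibria of a small user-vs-adversary game by best-response checks.
# Rows: user deploys a mix zone or not; columns: adversary eavesdrops or not.
# Entries: (user payoff, adversary payoff).
payoffs = {
    ("mix",    "eavesdrop"): ( 1,  1),
    ("mix",    "idle"):      ( 2,  0),
    ("no-mix", "eavesdrop"): (-3,  3),
    ("no-mix", "idle"):      ( 3,  0),
}
user_moves, adv_moves = ["mix", "no-mix"], ["eavesdrop", "idle"]

def pure_nash(payoffs):
    eq = []
    for u in user_moves:
        for a in adv_moves:
            pu, pa = payoffs[(u, a)]
            # A profile is an equilibrium if neither player can gain
            # by unilaterally deviating.
            best_u = all(payoffs[(u2, a)][0] <= pu for u2 in user_moves)
            best_a = all(payoffs[(u, a2)][1] <= pa for a2 in adv_moves)
            if best_u and best_a:
                eq.append((u, a))
    return eq

print(pure_nash(payoffs))  # [('mix', 'eavesdrop')] with these payoffs
```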
Privacy Enhancing Technologies | 2015
Mathias Humbert; Kévin Huguenin; Joachim Hugonot; Erman Ayday; Jean-Pierre Hubaux
People increasingly have their genomes sequenced and some of them share their genomic data online. They do so for various purposes, including to find relatives and to help advance genomic research. An individual's genome carries very sensitive, private information such as its owner's susceptibility to diseases, which could be used for discrimination. Therefore, genomic databases are often anonymized. However, an individual's genotype is also linked to visible phenotypic traits, such as eye or hair color, which can be used to re-identify users in anonymized public genomic databases, thus raising severe privacy issues. For instance, an adversary can identify a target's genome using her known phenotypic traits and subsequently infer her susceptibility to Alzheimer's disease. In this paper, we quantify, based on various phenotypic traits, the extent of this threat in several scenarios by implementing de-anonymization attacks on a genomic database of OpenSNP users sequenced by 23andMe. Our experimental results show that the proportion of correct matches reaches 23% with a supervised approach in a database of 50 participants. Our approach outperforms the baseline by a factor of four, in terms of the proportion of correct matches, in most scenarios. We also evaluate the adversary's ability to predict individuals' predisposition to Alzheimer's disease, and we observe that the inference error can be halved compared to the baseline. We also analyze the effect of the number of known phenotypic traits on the success rate of the attack. As progress is made in genomic research, especially for genotype-phenotype associations, the threat presented in this paper will become more serious.
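The matching idea can be sketched as follows (the probability tables and trait encoding are invented for illustration; the paper learns genotype-phenotype associations from data): score each genome in the database by the likelihood of the target's observed phenotypic traits and return the best match.

```python
# Toy sketch of phenotype-based de-anonymization: rank database genomes
# by how likely they are to express the target's observed traits.
import math

# Assumed P(trait is present | genotype at the associated SNP).
P_PHENO = {
    "blue_eyes":  {0: 0.01, 1: 0.10, 2: 0.72},
    "blond_hair": {0: 0.02, 1: 0.15, 2: 0.60},
}

def match_score(genotypes, observed_traits):
    """Log-likelihood of the observed traits for one genome.
    genotypes: dict trait -> minor-allele count at the relevant SNP."""
    return sum(math.log(P_PHENO[t][genotypes[t]]) for t in observed_traits)

database = {
    "user_a": {"blue_eyes": 2, "blond_hair": 2},
    "user_b": {"blue_eyes": 0, "blond_hair": 1},
}
target_traits = ["blue_eyes", "blond_hair"]
best = max(database, key=lambda u: match_score(database[u], target_traits))
print(best)  # user_a: the genome most consistent with the observed traits
```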
European Symposium on Research in Computer Security | 2013
Mathias Humbert; Théophile Studer; Matthias Grossglauser; Jean-Pierre Hubaux
In this paper, we introduce a navigation privacy attack, where an external adversary attempts to find a target user by exploiting publicly visible attributes of intermediate users. If such an attack is successful, it implies that a user cannot hide simply by excluding himself from a central directory or search function. The attack exploits the fact that most attributes (such as place of residence, age, or alma mater) tend to correlate with social proximity, which can be exploited as navigational cues while crawling the network. The problem is exacerbated by privacy policies where a user who keeps his profile private remains nevertheless visible in his friends' "friend lists"; such a user is still vulnerable to our navigation attack. Experiments with Facebook and Google+ show that the majority of users can be found efficiently using our attack, if a small set of attributes is known about the target as side information. Our results suggest that, in an online social network where many users reveal an (even limited) set of attributes, it is nearly impossible for a specific user to "hide in the crowd".
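A minimal sketch of the navigation idea (hypothetical graph and attributes, not the paper's crawler): greedily explore the social graph, always expanding the visible user whose public attributes best overlap the side information known about the target.

```python
# Minimal sketch: greedy attribute-guided navigation toward a target.
def attribute_overlap(profile, target_attrs):
    return len(set(profile) & set(target_attrs))

def navigate(graph, profiles, start, target_attrs, budget=100):
    """graph: user -> list of visible friends; profiles: user -> set of
    public attributes. Returns the best-matching user found."""
    frontier, visited, best = [start], {start}, start
    while frontier and budget > 0:
        # Expand the most promising user seen so far (greedy search).
        frontier.sort(key=lambda u: attribute_overlap(profiles[u], target_attrs))
        user = frontier.pop()
        budget -= 1
        if attribute_overlap(profiles[user], target_attrs) > \
           attribute_overlap(profiles[best], target_attrs):
            best = user
        for friend in graph.get(user, []):
            if friend not in visited:
                visited.add(friend)
                frontier.append(friend)
    return best

# Hypothetical four-user network; attributes act as navigational cues.
graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
profiles = {"a": {"lausanne"}, "b": {"lausanne", "epfl"},
            "c": {"zurich"}, "d": {"lausanne", "epfl", "1985"}}
print(navigate(graph, profiles, "a", {"lausanne", "epfl", "1985"}))  # d
```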
IEEE European Symposium on Security and Privacy | 2017
Florian Tramèr; Vaggelis Atlidakis; Roxana Geambasu; Daniel J. Hsu; Jean-Pierre Hubaux; Mathias Humbert; Ari Juels; Huang Lin
In a world where traditional notions of privacy are increasingly challenged by the myriad companies that collect and analyze our data, it is important that decision-making entities are held accountable for unfair treatments arising from irresponsible data usage. Unfortunately, a lack of appropriate methodologies and tools means that even identifying unfair or discriminatory effects can be a challenge in practice. We introduce the unwarranted associations (UA) framework, a principled methodology for the discovery of unfair, discriminatory, or offensive user treatment in data-driven applications. The UA framework unifies and rationalizes a number of prior attempts at formalizing algorithmic fairness. It uniquely combines multiple investigative primitives and fairness metrics with broad applicability, granular exploration of unfair treatment in user subgroups, and incorporation of natural notions of utility that may account for observed disparities. We instantiate the UA framework in FairTest, the first comprehensive tool that helps developers check data-driven applications for unfair user treatment. It enables scalable and statistically rigorous investigation of associations between application outcomes (such as prices or premiums) and sensitive user attributes (such as race or gender). Furthermore, FairTest provides debugging capabilities that let programmers rule out potential confounders for observed unfair effects. We report on the use of FairTest to investigate and in some cases address disparate impact, offensive labeling, and uneven rates of algorithmic error in four data-driven applications. As examples, our results reveal subtle biases against older populations in the distribution of error in a predictive health application and offensive racial labeling in an image tagger.
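The kind of check FairTest automates can be sketched as follows (illustrative data and metric, not FairTest's actual API): measure the statistical association between an application output and a sensitive attribute, here with mutual information over a hypothetical log.

```python
# Minimal sketch: flagging a possible output/attribute association.
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete series."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

# Hypothetical log of (user age group, price shown by the application).
ages   = ["young", "young", "old",  "old",  "old", "young"]
prices = ["low",   "low",   "high", "high", "low", "low"]
print(f"I(age; price) = {mutual_information(ages, prices):.3f} bits")
# A value near 0 suggests independence; larger values flag a possible
# disparity worth investigating (after ruling out legitimate confounders,
# as FairTest's debugging capabilities support).
```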
Financial Cryptography | 2015
Mathias Humbert; Erman Ayday; Jean-Pierre Hubaux; Amalio Telenti
Over the last few years, the vast progress in genome sequencing has greatly increased the availability of genomic data. Today, individuals can obtain their digital genomic sequences at reasonable prices from many online service providers. Individuals can store their data on personal devices, reveal it on public online databases, or share it with third parties. Yet, it has been shown that genomic data is very privacy-sensitive and highly correlated between relatives. Therefore, individuals' decisions about how to manage and secure their genomic data are crucial. People of the same family might have very different opinions about (i) how to protect and (ii) whether or not to reveal their genome. We study this tension by using a game-theoretic approach. First, we model the interplay between two purely selfish family members. We also analyze how the game evolves when relatives behave altruistically. We derive closed-form Nash equilibria in different settings. We then extend the game to N players by means of multi-agent influence diagrams that enable us to efficiently compute Nash equilibria. Our results notably demonstrate that altruism does not always lead to a more efficient outcome in genomic-privacy games. They also show that, if the discrepancy between the genome-sharing benefits that players perceive is too high, they will follow opposite sharing strategies, which has a negative impact on the familial utility.
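A sketch of the two-player setting (made-up payoff parameters, not the paper's closed forms): an altruism factor weights the relative's utility, and a large enough discrepancy in perceived sharing benefits pushes the two players to opposite strategies, as in the abstract's final observation.

```python
# Illustrative sketch: best responses in a two-relative sharing game.
# Strategies: share one's genome online (S) or not (N).
import itertools

benefit = {"p1": 5.0, "p2": 1.0}   # perceived benefit of sharing (discrepant)
privacy_loss = 3.0                 # loss a relative's sharing inflicts,
                                   # since kin genomes are correlated

def utility(me, other, s_me, s_other, altruism=0.0):
    selfish = (benefit[me] if s_me else 0.0) - privacy_loss * s_other
    other_selfish = (benefit[other] if s_other else 0.0) - privacy_loss * s_me
    return selfish + altruism * other_selfish

def pure_nash(altruism):
    eqs = []
    for s1, s2 in itertools.product([True, False], repeat=2):
        u1 = utility("p1", "p2", s1, s2, altruism)
        u2 = utility("p2", "p1", s2, s1, altruism)
        if u1 >= utility("p1", "p2", not s1, s2, altruism) and \
           u2 >= utility("p2", "p1", not s2, s1, altruism):
            eqs.append(("S" if s1 else "N", "S" if s2 else "N"))
    return eqs

print(pure_nash(altruism=0.0))  # [('S', 'S')]: selfish players both share
print(pure_nash(altruism=0.8))  # [('S', 'N')]: opposite sharing strategies
```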
Allerton Conference on Communication, Control, and Computing | 2011
Mathias Humbert; Mohammad Hossein Manshaei; Jean-Pierre Hubaux
Scrip is a generic term for any substitute for real currency; it can be converted into goods or services sold by the issuer. In the classic scrip system model, one agent is helped by another in return for one unit of scrip. In this paper, we present a generalized model, the one-to-n scrip system, where users need to find n agents to accomplish a single task. We provide a detailed analytical evaluation of this system based on a game-theoretic approach. We establish that a nontrivial Nash equilibrium exists in such systems under certain conditions. We study the effect of n on the equilibrium, on the distribution of scrip in the system and on its performance. Among other results, we show that the system designer should increase the average amount of scrip in the system when n increases in order to optimize its efficiency. We also explain how our new one-to-n scrip system can be applied to foster cooperation in two privacy-enhancing applications.
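A toy simulation of the one-to-n dynamics (assumed mechanics, not the paper's analytical model): each task drains n units from the requester and pays one unit to each of the n helpers, so the average amount of scrip in the system governs how often tasks can complete.

```python
# Toy sketch: a one-to-n scrip system where each task requires n helpers,
# each paid one unit of scrip. Total scrip is conserved.
import random

def simulate(agents=50, n=3, initial_scrip=5, rounds=10_000, seed=0):
    rng = random.Random(seed)
    scrip = {a: initial_scrip for a in range(agents)}
    completed = 0
    for _ in range(rounds):
        requester = rng.randrange(agents)
        if scrip[requester] < n:               # cannot pay n helpers
            continue
        others = [a for a in range(agents) if a != requester]
        for helper in rng.sample(others, n):   # one unit per helper
            scrip[helper] += 1
        scrip[requester] -= n
        completed += 1
    return completed

for avg in (3, 5, 10):  # the designer's lever: average scrip per agent
    print(f"avg scrip {avg:>2}: {simulate(initial_scrip=avg)} tasks completed")
```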
Computer and Communications Security | 2016
Michael Backes; Pascal Berrang; Mathias Humbert; Praveen Manoharan
The continuous decrease in cost of molecular profiling tests is revolutionizing medical research and practice, but it also raises new privacy concerns. One of the first attacks against privacy of biological data, proposed by Homer et al. in 2008, showed that, by knowing parts of the genome of a given individual and summary statistics of a genome-based study, it is possible to detect if this individual participated in the study. Since then, a lot of work has been carried out to further study the theoretical limits and to counter the genome-based membership inference attack. However, genomic data are by no means the only or the most influential biological data threatening personal privacy. For instance, whereas the genome informs us about the risk of developing some diseases in the future, epigenetic biomarkers, such as microRNAs, are directly and deterministically affected by our health condition, including most common severe diseases. In this paper, we show that the membership inference attack also threatens the privacy of individuals contributing their microRNA expressions to scientific studies. Our results on real and public microRNA expression data demonstrate that disease-specific datasets are especially prone to membership detection, offering a true-positive rate of up to 77% at a false-positive rate of less than 1%. We present two attacks: one relying on the L1 distance and the other based on the likelihood-ratio test. We show that the likelihood-ratio test provides the highest adversarial success and we derive a theoretical limit on this success. In order to mitigate the membership inference, we propose and evaluate both a differentially private mechanism and a hiding mechanism. We also consider two types of adversarial prior knowledge for the differentially private mechanism and show that, for relatively large datasets, this mechanism can protect the privacy of participants in miRNA-based studies against strong adversaries without degrading the data utility too much. Based on our findings and given the current number of miRNAs, we recommend releasing summary statistics only for datasets containing at least a couple of hundred individuals.
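The L1-distance test can be sketched on synthetic data as follows (illustrative data and no tuned threshold, in the spirit of Homer et al., not the paper's datasets): a profile closer to the released study statistics than to a reference population is flagged as a likely participant.

```python
# Minimal sketch of an L1-distance membership inference test on
# synthetic "expression profiles".
import numpy as np

rng = np.random.default_rng(0)
n_features, pool_size = 300, 25

population_mean = rng.uniform(0.2, 0.8, n_features)
def sample(k):  # noisy profiles around the population mean
    return np.clip(population_mean + rng.normal(0, 0.15, (k, n_features)), 0, 1)

pool = sample(pool_size)                     # study participants
pool_stats = pool.mean(axis=0)               # released summary statistics
reference = sample(pool_size).mean(axis=0)   # adversary's reference cohort

def statistic(profile):
    """Positive values suggest the profile is closer to the study pool
    than to the reference population, i.e., likely a participant."""
    return np.sum(np.abs(profile - reference) - np.abs(profile - pool_stats))

member, non_member = pool[0], sample(1)[0]
print(f"member:     {statistic(member):+.2f}")   # typically large positive
print(f"non-member: {statistic(non_member):+.2f}")  # typically near zero
```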
International World Wide Web Conferences | 2018
Yang Zhang; Mathias Humbert; Tahleen Rahman; Cheng Te Li; Jun Pang; Michael Backes
Hashtags have emerged as a widely used feature of popular culture and online campaigns, but their implications for people's privacy have not been investigated so far. In this paper, we present the first systematic analysis of privacy issues induced by hashtags. We concentrate in particular on location, which is recognized as one of the key privacy concerns in the Internet era. By relying on a random forest model, we show that we can infer a user's precise location from hashtags with an accuracy of 70% to 76%, depending on the city. To remedy this situation, we introduce a system called Tagvisor that systematically suggests alternative hashtags if the user-selected ones constitute a threat to location privacy. Tagvisor realizes this by means of three conceptually different obfuscation techniques and a semantics-based metric for measuring the consequent utility loss. Our findings show that obfuscating as few as two hashtags already provides a near-optimal trade-off between privacy and utility in our dataset. This in particular renders Tagvisor highly time-efficient, and thus practical in real-world settings.
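The inference side can be sketched with a bag-of-hashtags encoding (a tiny invented dataset; the paper's features, dataset, and evaluation are far richer): train a random forest to map hashtags to cities.

```python
# Minimal sketch: predicting a user's city from hashtags with a random forest.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

posts = [
    "#brooklynbridge #nypizza", "#centralpark #broadway",
    "#goldengate #missiondistrict", "#baybridge #fogcity",
]
cities = ["new_york", "new_york", "san_francisco", "san_francisco"]

vectorizer = CountVectorizer(token_pattern=r"#\w+")  # bag of hashtags
X = vectorizer.fit_transform(posts)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, cities)
print(clf.predict(vectorizer.transform(["#broadway #nypizza"])))
# An obfuscation system in the spirit of Tagvisor would replace or drop
# the hashtags contributing most to such predictions (e.g., guided by
# clf.feature_importances_) while bounding the semantic utility loss.
```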