Publication


Featured research published by Mingqiang Xue.


location and context awareness | 2009

Location Diversity: Enhanced Privacy Protection in Location Based Services

Mingqiang Xue; Panos Kalnis; Hung Keng Pung

Location-based Services are emerging as popular applications in pervasive computing. Spatial k-anonymity is used in Location-based Services to protect privacy, by hiding the association of a specific query with a specific user. Unfortunately, this approach fails in many practical cases such as: (i) personalized services, where the user identity is required, or (ii) applications involving groups of users (e.g., employees of the same company); in this case, associating a query with any member of the group violates privacy. In this paper, we introduce the concept of Location Diversity, which solves the above-mentioned problems. Location Diversity improves Spatial k-anonymity by ensuring that each query can be associated with at least *** different semantic locations (e.g., school, shop, hospital, etc.). We present an attack model that maps each observed query to a linear equation involving semantic locations, and we show that a necessary condition to preserve privacy is the existence of infinite solutions in the resulting system of linear equations. Based on this observation, we develop algorithms that generate groups of semantic locations, which preserve privacy and minimize the expected query processing and communication cost. The experimental evaluation demonstrates that our approach significantly reduces privacy threats while incurring minimal overhead.
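The necessary condition stated in the abstract (infinitely many solutions to the adversary's linear system) can be checked by comparing the rank of the coefficient matrix against the number of semantic locations. A minimal sketch of that check; the function names and the indicator-matrix encoding of observed queries are illustrative, not taken from the paper:

```python
from fractions import Fraction

def matrix_rank(matrix):
    """Rank of a matrix via Gaussian elimination over exact rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    rank, rows = 0, len(m)
    cols = len(m[0]) if m else 0
    for col in range(cols):
        pivot = next((r for r in range(rank, rows) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rows):
            if r != rank and m[r][col] != 0:
                f = m[r][col] / m[rank][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def privacy_preserved(equations, n_semantic_locations):
    """Necessary condition from the attack model: the adversary's system
    of linear equations must be underdetermined (infinitely many
    solutions), i.e. its rank must be below the number of unknowns."""
    return matrix_rank(equations) < n_semantic_locations
```

With three semantic locations, two observed-query equations leave the system underdetermined, while three independent equations pin every location down and break privacy.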


knowledge discovery and data mining | 2012

Anonymizing set-valued data by nonreciprocal recoding

Mingqiang Xue; Panagiotis Karras; Chedy Raïssi; Jaideep Vaidya; Kian-Lee Tan

Today there is a strong interest in publishing set-valued data in a privacy-preserving manner. Such data associate individuals to sets of values (e.g., preferences, shopping items, symptoms, query logs). In addition, an individual can be associated with a sensitive label (e.g., marital status, religious or political conviction). Anonymizing such data implies ensuring that an adversary should not be able to (1) identify an individual's record, and (2) infer a sensitive label, if such exists. Existing research on this problem either perturbs the data, publishes them in disjoint groups disassociated from their sensitive labels, or generalizes their values by assuming the availability of a generalization hierarchy. In this paper, we propose a novel alternative. Our publication method also puts data in a generalized form, but does not require that published records form disjoint groups and does not assume a hierarchy either; instead, it employs generalized bitmaps and recasts data values in a nonreciprocal manner; formally, the bipartite graph from original to anonymized records does not have to be composed of disjoint complete subgraphs. We configure our schemes to provide popular privacy guarantees while resisting attacks proposed in recent research, and demonstrate experimentally that we gain a clear utility advantage over the previous state of the art.
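As a toy illustration of nonreciprocal recoding (this ring construction is an assumption for exposition, not the paper's actual algorithm): generalizing each record together with the values of the next k-1 records makes the original-to-anonymized bipartite graph a single cycle rather than a union of disjoint complete subgraphs:

```python
def nonreciprocal_recoding(values, k):
    """Generalize each record with the values of the next k-1 records in
    a ring; the original-to-anonymized bipartite graph then forms one
    cycle instead of the disjoint cliques of reciprocal grouping."""
    n = len(values)
    return [sorted({values[(i + j) % n] for j in range(k)}) for i in range(n)]
```

For instance, `nonreciprocal_recoding(['a', 'b', 'c', 'd'], 2)` yields the overlapping generalized records `[['a','b'], ['b','c'], ['c','d'], ['a','d']]`: no two records share the same group, yet each original value is hidden among k candidates.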


conference on information and knowledge management | 2012

Delineating social network data anonymization via random edge perturbation

Mingqiang Xue; Panagiotis Karras; Raissi Chedy; Panos Kalnis; Hung Keng Pung

Social network data analysis raises concerns about the privacy of related entities or individuals. To address this issue, organizations can publish data after simply replacing the identities of individuals with pseudonyms, leaving the overall structure of the social network unchanged. However, it has been shown that attacks based on structural identification (e.g., a walk-based attack) enable an adversary to re-identify selected individuals in an anonymized network. In this paper we explore the capacity of techniques based on random edge perturbation to thwart such attacks. We theoretically establish that any kind of structural identification attack can effectively be prevented using random edge perturbation and show that, surprisingly, important properties of the whole network, as well as of subgraphs thereof, can be accurately calculated and hence data analysis tasks performed on the perturbed data, given that the legitimate data recipient knows the perturbation probability as well. Yet we also examine ways to enhance the walk-based attack, proposing a variant we call probabilistic attack. Nevertheless, we demonstrate that such probabilistic attacks can also be prevented under sufficient perturbation. Finally, we conduct a thorough theoretical study of the probability of success of any structural attack as a function of the perturbation probability. Our analysis provides a powerful tool for delineating the identification risk of perturbed social network data; our extensive experiments with synthetic and real datasets confirm our expectations.
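A minimal sketch of the core mechanism and of how a recipient who knows the perturbation probability can recover a graph property. The edge-count estimator below is a standard unbiasing argument, stated here as an assumption rather than a formula from the paper: if each potential edge flips with probability p, the expected observed count is E[m'] = m(1-p) + (M-m)p, which can be inverted for m when p ≠ 0.5:

```python
import random

def perturb(edges, n, p, seed=0):
    """Flip each potential edge of an n-node graph independently with
    probability p: present edges may vanish, absent ones may appear."""
    rng = random.Random(seed)
    all_pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    e = set(edges)
    # XOR: keep the pair exactly when its presence differs from the flip.
    return {pair for pair in all_pairs if (pair in e) != (rng.random() < p)}

def estimate_edge_count(m_observed, n, p):
    """Unbiased estimate of the original edge count for a recipient who
    knows p (p != 0.5): solve E[m'] = m(1-p) + (M-m)p for m."""
    M = n * (n - 1) // 2
    return (m_observed - M * p) / (1 - 2 * p)
```

With p = 0 the graph is untouched and the estimator returns the observed count exactly; for larger p the estimate remains correct in expectation even though individual edges are unreliable.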


computer and communications security | 2015

k-Anonymization by Freeform Generalization

Katerina Doka; Mingqiang Xue; Dimitrios Tsoumakos; Panagiotis Karras

Syntactic data anonymization strives to (i) ensure that an adversary cannot identify an individual's record from published attributes with high probability, and (ii) provide high data utility. These mutually conflicting goals can be expressed as an optimization problem with privacy as the constraint and utility as the objective function. Conventional research using the k-anonymity model has resorted to publishing data in homogeneous generalized groups. A recently proposed alternative does not create such cliques; instead, it recasts data values in a heterogeneous manner, aiming for higher utility. Nevertheless, such works never defined the problem in the most general terms; thus, the utility gains they achieve are limited. In this paper, we propose a methodology that achieves the full potential of heterogeneity and gains higher utility while providing the same privacy guarantee. We formulate the problem of maximal-utility k-anonymization by freeform generalization as a network flow problem. We develop an optimal solution for it using Mixed Integer Programming. Given the non-scalability of this solution, we develop an O(kn²) Greedy algorithm that has no time-complexity disadvantage vis-à-vis previous approaches, an O(kn² log n) enhanced version thereof, and an O(kn³) adaptation of the Hungarian algorithm; these algorithms build a set of k perfect matchings from original to anonymized data, a novel approach to the problem. Moreover, our techniques can resist adversaries who may know the employed algorithms. Our experiments with real-world data verify that our schemes achieve near-optimal utility (with gains of up to 41%), while they can exploit parallelism and data partitioning, gaining an efficiency advantage over simpler methods.
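To make the "set of k perfect matchings" idea concrete, here is the simplest possible instance, using cyclic shifts; this construction is an illustrative assumption, not the paper's optimized Greedy or Hungarian variant. Each shift is a permutation of the records, hence a perfect matching, and the k shifts together give every original record k distinct anonymized counterparts:

```python
def k_perfect_matchings(n, k):
    """Build k disjoint perfect matchings from n original records to n
    anonymized records via cyclic shifts: matching j pairs original i
    with anonymized (i + j) mod n. Their union is a k-regular bipartite
    graph, so each anonymized record blends the values of k originals."""
    return [[(i, (i + j) % n) for i in range(n)] for j in range(k)]
```

For n = 5 and k = 3, the union contains 15 distinct pairs and every matching touches each anonymized slot exactly once; the paper's algorithms choose the matchings to maximize utility instead of using fixed shifts.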


database systems for advanced applications | 2011

Distributed privacy preserving data collection

Mingqiang Xue; Panagiotis Papadimitriou; Chedy Raïssi; Panagiotis Kalnis; Hung Keng Pung

We study the distributed privacy preserving data collection problem: an untrusted data collector (e.g., a medical research institute) wishes to collect data (e.g., medical records) from a group of respondents (e.g., patients). Each respondent owns a multi-attributed record which contains both non-sensitive (e.g., quasi-identifiers) and sensitive information (e.g., a particular disease), and submits it to the data collector. Assuming T is the table formed by all the respondent data records, we say that the data collection process is privacy preserving if it allows the data collector to obtain a k-anonymized or l-diversified version of T without revealing the original records to the adversary. We propose a distributed data collection protocol that outputs an anonymized table by generalization of quasi-identifier attributes. The protocol employs cryptographic techniques such as homomorphic encryption, private information retrieval and secure multiparty computation to ensure the privacy goal in the process of data collection. Meanwhile, the protocol is designed to leak limited but noncritical information to achieve practicability and efficiency. Experiments show that the utility of the anonymized table derived by our protocol is on par with the utility achieved by traditional anonymization techniques.
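The protocol combines several cryptographic primitives; as a minimal illustration of just the additively homomorphic ingredient, here is a Paillier-style scheme with tiny hard-coded primes. This is a textbook construction chosen for exposition (the paper does not specify this scheme), and the parameters are toy-sized: it is not secure. Requires Python 3.9+ for `math.lcm` and the modular-inverse form of `pow`:

```python
from math import lcm

def toy_paillier():
    """Paillier-style additively homomorphic encryption with tiny
    hard-coded primes -- illustration only, not remotely secure."""
    p, q = 17, 19
    n = p * q
    n2 = n * n
    lam = lcm(p - 1, q - 1)
    g = n + 1                 # standard choice that simplifies decryption
    mu = pow(lam, -1, n)      # modular inverse of lambda mod n

    def enc(m, r):            # r must be coprime to n
        return (pow(g, m, n2) * pow(r, n, n2)) % n2

    def dec(c):
        x = pow(c, lam, n2)
        return ((x - 1) // n * mu) % n

    return enc, dec, n, n2

enc, dec, n, n2 = toy_paillier()
ca, cb = enc(30, 7), enc(12, 11)
# Multiplying ciphertexts adds the underlying plaintexts:
assert dec((ca * cb) % n2) == (30 + 12) % n
```

The additive property is what lets an untrusted collector aggregate respondents' encrypted contributions without ever decrypting an individual record.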


conference on information and knowledge management | 2012

Discretionary social network data revelation with a user-centric utility guarantee

Yi Song; Panagiotis Karras; Sadegh Nobari; Giorgos Cheliotis; Mingqiang Xue; Stéphane Bressan

The proliferation of online social networks has created intense interest in studying their nature and revealing information of interest to the end user. At the same time, such revelation raises privacy concerns. Existing research addresses this problem following an approach popular in the database community: a model of data privacy is defined, and the data is rendered in a form that satisfies the constraints of that model while aiming to maximize some utility measure. Still, there is no consensus on a clear and quantifiable utility measure over graph data. In this paper, we take a different approach: we define a utility guarantee, in terms of certain graph properties being preserved, that should be respected when releasing data, while otherwise distorting the graph to an extent desired for the sake of confidentiality. We propose a form of data release which builds on current practice in social network platforms: A user may want to see a subgraph of the network graph, in which that user as well as connections and affiliates participate. Such a snapshot should not allow malicious users to gain private information, yet provide useful information for benevolent users. We propose a mechanism to prepare data for user view under this setting. In an experimental study with real data, we demonstrate that our method preserves several properties of interest more successfully than methods that randomly distort the graph to an equal extent, while withstanding structural attacks proposed in the literature.
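A minimal sketch of such a user-centric view; reading "connections and affiliates" as a hop-limited neighborhood is an assumption made here for illustration, before any confidentiality-preserving distortion is applied:

```python
def user_view(adj, user, hops=2):
    """Extract the subgraph a user might be shown: the user plus everyone
    within `hops` connections, keeping only the edges among them."""
    frontier, seen = {user}, {user}
    for _ in range(hops):
        frontier = {w for v in frontier for w in adj.get(v, ())} - seen
        seen |= frontier
    return {v: sorted(seen & set(adj.get(v, ()))) for v in sorted(seen)}
```

On a path graph 0-1-2-3, the 2-hop view around node 0 contains nodes 0, 1 and 2 and drops the edge to node 3, which lies outside the snapshot.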


conference on information and knowledge management | 2011

Utility-driven anonymization in data publishing

Mingqiang Xue; Panagiotis Karras; Chedy Raïssi; Hung Keng Pung

Privacy-preserving data publication has been studied intensely in the past years. Still, all existing approaches transform data values by random perturbation or generalization. In this paper, we introduce a radically different data anonymization methodology. Our proposal aims to maintain certain patterns, defined in terms of a set of properties of interest that hold for the original data. Such properties are represented as linear relationships among data points. We present an algorithm that generates a set of anonymized data that strictly preserves these properties, thus maintaining specified patterns in the data. Extensive experiments with real and synthetic data show that our algorithm is efficient, and produces anonymized data that affords high utility in several data analysis tasks while safeguarding privacy.
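One concrete instance of a linear relationship that can be preserved exactly is a column's sum (and hence its mean): perturb the values with noise projected onto the zero-sum subspace, so the linear property holds strictly rather than only in expectation. A minimal sketch under that assumption, not the paper's algorithm:

```python
import random

def anonymize_preserving_mean(column, scale=1.0, seed=0):
    """Perturb values with noise that sums to exactly zero, so the
    column sum -- one example of a linear property of the data --
    is strictly preserved in the anonymized output."""
    rng = random.Random(seed)
    noise = [rng.uniform(-scale, scale) for _ in column]
    shift = sum(noise) / len(noise)
    noise = [z - shift for z in noise]    # project noise onto sum == 0
    return [x + z for x, z in zip(column, noise)]
```

Each individual value changes, yet any analysis that depends only on the preserved linear property (here, the total or the average) gives identical results on the anonymized data.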


international conference on big data | 2015

Heterogeneous k-anonymization with high utility

Katerina Doka; Mingqiang Xue; Dimitrios Tsoumakos; Panagiotis Karras; Alfredo Cuzzocrea; Nectarios Koziris

Among the privacy-preserving approaches that are known in the literature, k-anonymity remains the basis of more advanced models while still being useful as a stand-alone solution. Applying k-anonymity in practice, though, incurs severe loss of data utility, thus limiting its effectiveness and reliability in real-life applications and systems. However, such loss in utility does not necessarily arise from an inherent drawback of the model itself, but rather from the deficiencies of the algorithms used to implement the model. Conventional approaches rely on a methodology that publishes data in homogeneous generalized groups. An alternative modern data publishing scheme focuses on publishing the data in heterogeneous groups and achieves higher utility, while ensuring the same privacy guarantees. As conventional approaches cannot anonymize data following this heterogeneous scheme, innovative solutions are required for this purpose. Following this approach, in this paper we provide a set of algorithms that ensure high-utility k-anonymity, via solving an equivalent graph processing problem.


knowledge discovery and data mining | 2014

Identifying tourists from public transport commuters

Mingqiang Xue; Huayu Wu; Wei Chen; Wee Siong Ng; Gin Howe Goh


mobile data management | 2014

HipStream: A Privacy-Preserving System for Managing Mobility Data Streams

Huayu Wu; Shili Xiang; Wee Siong Ng; Wei Wu; Mingqiang Xue

Collaboration


Dive into Mingqiang Xue's collaborations.

Top Co-Authors

Wee Siong Ng (National University of Singapore)
Hung Keng Pung (National University of Singapore)
Kian-Lee Tan (National University of Singapore)
Shili Xiang (National University of Singapore)
Panos Kalnis (King Abdullah University of Science and Technology)
Katerina Doka (National Technical University of Athens)