Boxiang Dong
Stevens Institute of Technology
Publication
Featured research published by Boxiang Dong.
DBSec 2013 Proceedings of the 27th Annual IFIP WG 11.3 Conference on Data and Applications Security and Privacy XXVII - Volume 7964 | 2013
Boxiang Dong; Ruilin Liu; Hui Wendy Wang
The data-mining-as-a-service (DMaS) paradigm enables a data owner (client) that lacks expertise or computational resources to outsource its mining tasks to a third-party service provider (server). Outsourcing, however, raises a serious security issue: how can a client with weak computational power verify that the server returned the correct mining result? In this paper, we focus on the problem of frequent itemset mining, and propose efficient and practical probabilistic verification approaches to check whether the server has returned correct and complete frequent itemsets.
IEEE Transactions on Services Computing | 2016
Boxiang Dong; Ruilin Liu; Hui Wendy Wang
Cloud computing is popularizing the computing paradigm in which data is outsourced to a third-party service provider (server) for data mining. Outsourcing, however, raises a serious security issue: how can a client with weak computational power verify that the server returned the correct mining result? In this paper, we focus on the specific task of frequent itemset mining. We consider a server that is potentially untrusted and tries to escape verification by using its prior knowledge of the outsourced data. We propose efficient probabilistic and deterministic verification approaches to check whether the server has returned correct and complete frequent itemsets. Our probabilistic approach can catch incorrect results with high probability, while our deterministic approach measures the result correctness with 100 percent certainty. We also design efficient verification methods for both update cases: when the data is updated and when the mining setup is updated. We demonstrate the effectiveness and efficiency of our methods using an extensive set of empirical results on real datasets.
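As a rough illustration of what checking a returned result can involve, the sketch below recomputes the support of a random sample of returned itemsets on the client side. It is not the authors' verification method, which the abstract does not detail; the function names, the sampling strategy, and the toy data are assumptions, and the check covers correctness only, not completeness.

```python
import random

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)

def spot_check(returned, transactions, min_support, sample_size, seed=0):
    """Recompute support for a random sample of the returned itemsets.

    Hypothetical client-side check: returns the sampled itemsets whose
    true support is below `min_support`, i.e. results wrongly reported
    as frequent. It does not detect missing frequent itemsets.
    """
    rng = random.Random(seed)
    # Sort first so that sampling is reproducible for a fixed seed.
    candidates = sorted(returned, key=sorted)
    sample = rng.sample(candidates, min(sample_size, len(candidates)))
    return [s for s in sample if support(s, transactions) < min_support]

transactions = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("b")]
returned = {frozenset("ab"), frozenset("bc")}  # {b, c} is not actually frequent
bad = spot_check(returned, transactions, min_support=0.5, sample_size=2)
```

Sampling trades certainty for cost: the more itemsets the client rechecks, the higher the chance of catching an incorrect one, which is the intuition behind a probabilistic guarantee.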
Conference on Information and Knowledge Management | 2014
Boxiang Dong; Ruilin Liu; Wendy Hui Wang
The data-cleaning-as-a-service (DCaS) paradigm enables users to outsource their data and data cleaning needs to computationally powerful third-party service providers. It raises several security issues. One of the issues is how the client can protect the private information in the outsourced data. In this paper, we focus on data deduplication as the main data cleaning task, and design two efficient privacy-preserving data-deduplication methods for the DCaS paradigm. We analyze the robustness of our two methods against attacks that exploit the auxiliary frequency distribution and knowledge of the encoding algorithms. Our empirical study demonstrates the efficiency and effectiveness of our privacy-preserving approaches.
International Conference on Data Mining | 2013
Boxiang Dong; Ruilin Liu; Wendy Hui Wang
In this paper, we focus on the problem of result integrity verification for outsourcing of frequent itemset mining. We design efficient cryptographic approaches that verify whether the returned frequent itemset mining results are correct and complete with a deterministic guarantee. The key to our solution is that the service provider constructs cryptographic proofs of the mining results. Both correctness and completeness of the mining results are measured against the proofs. We optimize the verification by minimizing the number of proofs. Our empirical study demonstrates the efficiency and effectiveness of the verification approaches.
Information Reuse and Integration | 2016
Boxiang Dong; Wendy Hui Wang
In this paper, we consider the outsourcing model in which a third-party server provides data integration as a service. Identifying approximately duplicate records in databases is an essential step in information integration processes. Most existing approaches rely on estimating the similarity of potential duplicates. The service provider returns all records from the outsourced dataset that are similar according to specific distance metrics. A major security concern of this outsourcing paradigm is whether the service provider returns sound and complete near-duplicates. In this paper, we design ARM, an authentication system for outsourced record matching. The key idea of ARM is that, besides the similar record pairs, the server returns the verification object (VO) of these similar pairs to prove their correctness. First, we design an authenticated data structure named MB-tree for VO construction. Second, we design a lightweight authentication method that can catch the service provider's various cheating behaviors by utilizing VOs. We perform an extensive set of experiments on real-world datasets to demonstrate that ARM can verify the record matching results at low cost.
Data and Knowledge Engineering | 2018
Boxiang Dong; Hui Wang
Cloud computing enables end-users to outsource their dataset and data management needs to a third-party service provider. One of the major security concerns of the outsourcing paradigm is how to protect sensitive information in the outsourced dataset. In some applications, only partial values are considered sensitive. In general, the sensitive information can be protected by encryption. However, data dependency constraints (together with the unencrypted data) in the outsourced data may serve as adversary knowledge and bring security vulnerabilities to the encrypted data. In this paper, we focus on functional dependency (FD), an important type of data dependency constraint, and study the security threats posed by adversarial FDs. We design a practical scheme that can defend against the FD attack by encrypting a small amount of non-sensitive data (encryption overhead). We prove that finding the scheme that leads to the optimal encryption overhead is NP-complete, and design efficient heuristic algorithms for the presence of one or multiple FDs. We design a secure query rewriting scheme that enables the service provider to answer various types of queries on the encrypted data with a provable security guarantee. We extend our study to enforce security when there are conditional functional dependencies (CFDs) and data updates. We conduct an extensive set of experiments on two real-world datasets. The experimental results show that our heuristic approach incurs a small encryption overhead (at most 1% more than the optimal overhead), and enjoys a 10-fold speedup compared with the optimal solution. Besides, our approach can reduce up to 90% of the encryption overhead of the state-of-the-art solution.
International Conference on Distributed Computing Systems | 2017
Changjiang Cai; Haipei Sun; Boxiang Dong; Bo Zhang; Ting Wang; Hui Wang
Crowdsourced ranking algorithms ask the crowd to compare objects and infer the full ranking based on the crowdsourced pairwise comparison results. In this paper, we consider the setting in which the task requester has a limited budget that can afford only a small number of pairwise comparisons. To make the problem more complicated, the crowd may return noisy comparison answers. We propose an approach to obtain a good-quality full ranking from a small number of pairwise preferences in two steps, namely task assignment and result inference. In the task assignment step, we generate pairwise comparison tasks that produce a full ranking with high probability. In the result inference step, based on the transitive property of pairwise comparisons and truth discovery, we design an efficient heuristic algorithm to find the best full ranking from the potentially conflicting pairwise preferences. The experimental results demonstrate the effectiveness and efficiency of our approach.
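To make the role of transitivity concrete, the following sketch infers a ranking from a handful of noisy pairwise votes. It is a simplified stand-in, not the paper's algorithm: it resolves conflicts by plain majority vote (an assumption replacing truth discovery), closes the preference relation transitively, and ranks objects by how many others they beat. All names are hypothetical.

```python
from collections import Counter

def infer_ranking(objects, votes):
    """Infer a full ranking from noisy (winner, loser) votes.

    Assumed simplification: majority vote per pair, then transitive
    closure, then sort by the number of objects each one beats.
    """
    tally = Counter(votes)
    beats = {o: set() for o in objects}
    # Keep a > b only if it received strictly more votes than b > a.
    for (a, b), n in tally.items():
        if n > tally.get((b, a), 0):
            beats[a].add(b)
    # Transitive closure: if a > b and b > c, infer a > c.
    changed = True
    while changed:
        changed = False
        for a in objects:
            inferred = set().union(*(beats[b] for b in beats[a])) - beats[a] - {a}
            if inferred:
                beats[a] |= inferred
                changed = True
    return sorted(objects, key=lambda o: len(beats[o]), reverse=True)

votes = [("x", "y"), ("x", "y"), ("y", "x"), ("y", "z")]
ranking = infer_ranking(["x", "y", "z"], votes)
```

Here x and z are never compared directly, yet x is placed above z because x beats y and y beats z. That inference is what lets a small comparison budget still yield a full ranking.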
International Conference on Data Engineering | 2017
Boxiang Dong; Wendy Hui Wang
The cloud paradigm enables users to outsource their data to computationally powerful third-party service providers for data management. Many data management tasks rely on the data dependency in the outsourced data. This raises an important issue of how the data owner can protect the sensitive information in the outsourced data while preserving the data dependency. In this paper, we consider functional dependency (FD), an important type of data dependency. Although simple deterministic encryption schemes can preserve FDs, they may be vulnerable to the frequency analysis attack. We design a frequency-hiding, FD-preserving probabilistic encryption scheme, named F2, that enables the service provider to discover the FDs from the encrypted dataset. We consider two attacks, namely the frequency analysis (FA) attack and the FD-preserving chosen plaintext attack (FCPA), and show that the F2 encryption scheme can defend against both attacks with a formal, provable guarantee. Our empirical study demonstrates the efficiency and effectiveness of F2, as well as its security against both FA and FCPA attacks.
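The tension between preserving FDs and hiding frequencies can be seen in a small sketch. It does not implement F2; it only shows the baseline problem, using a keyed SHA-256 hash as a stand-in for deterministic encryption (an assumption: a hash is not decryptable, but it has the same "equal plaintexts give equal ciphertexts" property). Column names and data are hypothetical.

```python
import hashlib
from collections import Counter

def holds_fd(rows, lhs, rhs):
    """True if column `lhs` functionally determines column `rhs`."""
    seen = {}
    for row in rows:
        # setdefault stores the first rhs seen for this lhs value.
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True

def det_encrypt(value, key=b"demo-key"):
    """Stand-in for deterministic encryption: same input, same output."""
    return hashlib.sha256(key + value.encode("utf-8")).hexdigest()

def encrypt_table(rows):
    """Apply the deterministic stand-in to every cell."""
    return [{col: det_encrypt(val) for col, val in row.items()} for row in rows]

def frequency_profile(rows, column):
    """Value counts of a column, sorted descending (labels dropped)."""
    return sorted(Counter(row[column] for row in rows).values(), reverse=True)

rows = [
    {"zip": "07030", "city": "Hoboken"},
    {"zip": "07030", "city": "Hoboken"},
    {"zip": "10001", "city": "New York"},
]
enc = encrypt_table(rows)
```

On this toy table the FD zip → city holds both before and after encryption, but the frequency profile of the zip column is also identical in both tables. An attacker who knows the plaintext distribution can exploit that match, which is the frequency analysis weakness the abstract refers to.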
Information Reuse and Integration | 2017
Boxiang Dong; Hui Wang
Cloud computing enables the outsourcing of big data analytics, where a third-party server is responsible for data management and processing. In this paper, we consider the outsourcing model in which a third-party server provides record matching as a service. In particular, given a target record, the service provider returns all records from the outsourced dataset that match the target according to specific distance metrics. Identifying matching records in databases plays an important role in information integration and entity resolution. A major security concern of this outsourcing paradigm is whether the service provider returns the correct record matching results. To solve the problem, we design EARRING, an Efficient Authentication of outsouRced Record matchING framework. EARRING requires the service provider to construct the verification object (VO) of the record matching results. From the VO, the client is able to catch any incorrect result at low computational cost. Experimental results on real-world datasets demonstrate the efficiency of EARRING.
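The record matching task itself, as opposed to its authentication, can be sketched in a few lines. The abstract does not name the distance metric, so the example below assumes Levenshtein edit distance and a hypothetical `threshold` parameter. It shows only what an honest server would compute, not how EARRING builds or checks a VO.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]

def match_records(target, dataset, threshold):
    """Return all records within `threshold` edits of the target."""
    return [r for r in dataset if edit_distance(target, r) <= threshold]

dataset = ["john smith", "jon smyth", "jane doe"]
matches = match_records("jon smith", dataset, threshold=1)
```

A dishonest server could drop a true match or add a false one. The client cannot detect either without rescanning the whole dataset, which is the gap a verification object is meant to close.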
IFIP Annual Conference on Data and Applications Security and Privacy | 2017
Bo Zhang; Boxiang Dong; Wendy Hui Wang
When outsourcing data mining needs to an untrusted service provider in the Data-Mining-as-a-Service (DMaS) paradigm, it is important to verify whether the service provider (server) returns correct mining results (in the format of data mining objects). We consider the setting in which each data mining object is associated with a weight for its importance. Given a client who has a limited verification budget, the server selects a subset of mining results whose total verification cost does not exceed the given budget, while the total weight of the selected results is maximized. This maps to the well-known budgeted maximum coverage (BMC) problem, which is NP-hard. Therefore, the server may execute a heuristic algorithm to select a subset of mining results for verification. The server has financial incentives to cheat on the heuristic output, so that the client has to pay more for verification of the mining results that are less important. Our aim is to verify that the mining results selected by the server indeed satisfy the budgeted maximization requirement. It is challenging to verify the result integrity of the heuristic algorithms as the results are non-deterministic. We design a probabilistic verification method by including negative candidates (NCs) that are guaranteed to be excluded from the budgeted maximization result of the ratio-based BMC solutions. We perform extensive experiments on real-world datasets, and show that the NC-based verification approach can achieve a high guarantee with small overhead.
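A ratio-based selection heuristic of the kind mentioned above might look like the sketch below. It is a generic greedy rule (pick by descending weight-to-cost ratio while the budget allows), written under the assumption that all costs are positive. It is not claimed to be the exact heuristic the paper verifies, and the names and numbers are hypothetical.

```python
def greedy_select(objects, budget):
    """Select objects by descending weight/cost ratio within a budget.

    `objects` is a list of (name, weight, cost) tuples with cost > 0.
    Returns the chosen names in selection order.
    """
    chosen, spent = [], 0
    by_ratio = sorted(objects, key=lambda o: o[1] / o[2], reverse=True)
    for name, weight, cost in by_ratio:
        if spent + cost <= budget:  # skip anything that no longer fits
            chosen.append(name)
            spent += cost
    return chosen

objects = [("A", 10, 5), ("B", 6, 2), ("C", 4, 4), ("D", 3, 1)]
selected = greedy_select(objects, budget=7)
```

With this data, B and D (ratio 3) are taken first, A (ratio 2) no longer fits, and C (ratio 1) fills the remaining budget. An object with a very low ratio and a cost above the remaining budget would never be selected by such a rule, which gives some intuition for why planted negative candidates can expose a server that deviates from it.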