Yitao Duan
University of California, Berkeley
Publication
Featured research published by Yitao Duan.
international workshop on peer to peer systems | 2002
Ben Y. Zhao; Yitao Duan; Ling Huang; Anthony D. Joseph; John Kubiatowicz
Recent work such as Tapestry, Pastry, Chord and CAN provides efficient location utilities in the form of overlay infrastructures. These systems treat nodes as if they possessed uniform resources, such as network bandwidth and connectivity. In this paper, we propose a systemic design for a secondary overlay of super-nodes which can be used to deliver messages directly to the destination's local network, thus improving route efficiency. We demonstrate the potential performance benefits by proposing a name mapping scheme for a Tapestry-Tapestry secondary overlay, and show preliminary simulation results demonstrating significant routing performance improvement.
privacy enhancing technologies | 2004
Yitao Duan; John F. Canny
In a Ubiquitous Computing environment, sensors are actively collecting data, much of which can be very sensitive. Data will often be streaming at high rates (video and audio), and it must be dealt with in real time. Protecting the privacy of users is of central importance. Dealing with these issues will be a central challenge for ubicomp for some time to come. Here we propose some simple design principles which address several of these issues. We illustrate them through the design of a smart room capture system we are building. The main design principle is “data discretion”: users should have access to, and control over, data about them, and should be able to determine how it is used. We show how data discretion supports both personal and collaborative uses. In our implementation, the data discretion principle is enforced with cryptographic techniques. Unlike ACL-based access control systems, our scheme embeds the access rights of legitimate users within the data. An important property of the method is that it hides meta-information about data access: no user can determine who (else) has access to any given datum. Access information is sensitive because it discloses which users were in the room and when. We have implemented a prototype system in the smart room equipped with several cameras, and we give data throughput rates under various degrees of protection. Finally we describe ongoing work towards a trustworthy ubicomp environment whose discretion is realistically checkable.
conference on information and knowledge management | 2009
Yitao Duan
This paper presents several results on statistical database privacy. We first point out a serious vulnerability in a widely-accepted approach which perturbs query results with additive noise. We then show that for sum queries which aggregate across all records, when the dataset is sufficiently large, the inherent uncertainty associated with unknown quantities is enough to provide similar perturbation, so the same privacy can be obtained without external noise. The sum query is a surprisingly general primitive supporting a large number of data mining algorithms such as SVD, PCA, k-means, ID3, SVM, EM, and all the algorithms in the statistical query model. We derive privacy conditions for sum queries and provide the first mathematical proof, under a widely accepted notion of privacy, of the intuition that aggregates across a large number of individuals are private. We also show how the results can be used to construct simulatable query auditing algorithms with stronger privacy.
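The abstract names the sum query, which aggregates a per-record function over all records, as the primitive behind algorithms such as k-means. As a minimal sketch of that reduction (the names `sum_query`, `nearest`, and `kmeans_step` are hypothetical, and no privacy mechanism is shown), one k-means iteration can be phrased as a single sum query whose per-record output places the record, plus a count of 1, in the slot of its nearest centroid:

```python
from typing import Callable, List, Sequence

Record = Sequence[float]


def sum_query(db: List[Record], f: Callable[[Record], List[float]]) -> List[float]:
    """Return the element-wise sum of f(r) over every record r in db."""
    total: List[float] = []
    for r in db:
        v = f(r)
        if not total:
            total = [0.0] * len(v)
        total = [t + x for t, x in zip(total, v)]
    return total


def nearest(r: Record, centroids: List[List[float]]) -> int:
    """Index of the centroid closest to r in squared Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(r, centroids[j])))


def kmeans_step(db: List[Record], centroids: List[List[float]]) -> List[List[float]]:
    """One k-means update computed from a single sum query (hypothetical sketch)."""
    k, d = len(centroids), len(centroids[0])

    def slot(r: Record) -> List[float]:
        # Length k*(d+1) vector: r and a count of 1 in its cluster's slot, zeros elsewhere.
        out = [0.0] * (k * (d + 1))
        base = nearest(r, centroids) * (d + 1)
        out[base:base + d] = list(r)
        out[base + d] = 1.0
        return out

    s = sum_query(db, slot)
    if not s:  # empty database: nothing to update
        return [list(c) for c in centroids]
    new = []
    for j in range(k):
        base = j * (d + 1)
        count = s[base + d]
        # Keep the old centroid if no record was assigned to this cluster.
        new.append(list(centroids[j]) if count == 0
                   else [x / count for x in s[base:base + d]])
    return new
```

Only the aggregate `s` is needed to finish the update, which is what makes sum queries a natural point at which to apply a privacy analysis.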
computational intelligence and data mining | 2007
Yitao Duan; John F. Canny; Justin Zhan
In this paper we introduce a new practical framework, called P4P (peers for privacy), for privacy-preserving data mining. P4P features a hybrid architecture combining P2P and client-server paradigms and provides practical private protocols for user data validation and general computation. The architecture is guided by the natural incentives of the participants and allows the computation to be based on verifiable secret sharing (VSS) where arithmetic operations are done over small fields (e.g. 32 or 64 bits), so that private arithmetic operations have the same cost as normal arithmetic. Verification of user data, which uses large-field public-key arithmetic (1024 bits or more) and homomorphic computation, requires only a small number (constant or logarithmic in the size of user data) of large integer operations. The solution is extremely efficient: in experiments with our implementation, verification of a million-element vector takes a few seconds of server or client time on commodity PCs (in contrast, using standard techniques takes hours). This verification can be used in many privacy-preserving data mining tasks to detect cheating users who attempt to bias the computation by submitting exaggerated values as their inputs. As an example, we demonstrate how association rule mining can be done in the P4P model with near-optimal efficiency and provable privacy.
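To illustrate why small-field secret sharing lets private addition run at roughly the cost of ordinary arithmetic, here is a minimal sketch of two-party additive secret sharing of user vectors. It assumes a prime modulus of 2^31 - 1, exactly two share holders (a server and one peer), and nonnegative integer entries below the modulus, and it omits the verification layer entirely; `share`, `aggregate`, and `private_sum` are hypothetical names, not the P4P API.

```python
import secrets
from typing import List, Tuple

P = 2**31 - 1  # assumed small prime modulus (a Mersenne prime)


def share(vec: List[int]) -> Tuple[List[int], List[int]]:
    """Split a vector into two additive shares with a + b = vec (mod P)."""
    a = [secrets.randbelow(P) for _ in vec]          # uniformly random share
    b = [(v - r) % P for v, r in zip(vec, a)]        # complementary share
    return a, b


def aggregate(shares: List[List[int]]) -> List[int]:
    """A share holder sums the shares it received, element-wise mod P."""
    total = [0] * len(shares[0])
    for s in shares:
        total = [(t + x) % P for t, x in zip(total, s)]
    return total


def private_sum(user_vectors: List[List[int]]) -> List[int]:
    server_shares, peer_shares = [], []
    for v in user_vectors:
        a, b = share(v)
        server_shares.append(a)   # sent to the server
        peer_shares.append(b)     # sent to the peer
    # Each side sees only uniformly random-looking shares; only the two
    # partial sums are combined, revealing the aggregate and nothing else.
    s_server, s_peer = aggregate(server_shares), aggregate(peer_shares)
    return [(x + y) % P for x, y in zip(s_server, s_peer)]
```

Every operation here is a word-sized modular add, which is the point of working over a small field; the expensive large-field cryptography is confined to verification.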
Computational and Mathematical Organization Theory | 2005
Yitao Duan; Jingtao Wang; Matthew Kam; John F. Canny
Link analysis algorithms have been used successfully on hyperlinked data to identify authoritative documents and retrieve other information. They have also shown great potential in many new areas such as counterterrorism and surveillance. The emergence of new applications, and changes in existing ones, have created new opportunities, as well as difficulties, for them: (1) In many situations where link analysis is applicable, there may not be an explicit hyperlinked structure. (2) The system can be highly dynamic, resulting in constant updates to the graph, and it is often too expensive to rerun the algorithm for each update. (3) The application often relies heavily on client-side logging, and the information encoded in the graph can be very personal and sensitive, making privacy a major concern. Existing link analysis algorithms, and their traditional implementations, are not adequate in the face of these new challenges. In this paper we propose the use of a weighted graph to define and/or augment a link structure. We present a generalized HITS algorithm that is suitable for running in a dynamic environment. The algorithm uses the idea of “lazy update” to amortize cost across multiple updates while still providing accurate ranking to users in the meantime. We prove the convergence of the new algorithm and evaluate its benefit using the Enron email dataset. Finally, we devise a distributed implementation of the algorithm that preserves user privacy, thus making it socially acceptable in real-world applications.
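The abstract does not spell out how the “lazy update” works. The following sketch assumes one plausible reading: edge-weight changes are accumulated, and the weighted HITS power iteration is rerun, warm-started from the previous scores, only after a fixed number of pending updates. The class `LazyHITS` and its `batch` and `iters` parameters are hypothetical, and the paper's convergence proof applies to the authors' algorithm, not to this sketch.

```python
import math
from typing import Dict, List, Tuple


def _normalize(v: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v


class LazyHITS:
    """Weighted HITS with batched ('lazy') recomputation (hypothetical sketch)."""

    def __init__(self, n: int, batch: int = 10, iters: int = 50):
        self.n = n
        self.w: Dict[Tuple[int, int], float] = {}  # edge (i -> j) -> weight
        self.hub = [1.0] * n
        self.auth = [1.0] * n
        self.batch = batch    # pending updates tolerated before recomputing
        self.iters = iters    # power-iteration steps per refresh
        self.pending = 0

    def add_weight(self, i: int, j: int, dw: float) -> None:
        """Record a weight change on edge i -> j; recompute only every `batch` updates."""
        self.w[(i, j)] = self.w.get((i, j), 0.0) + dw
        self.pending += 1
        if self.pending >= self.batch:
            self.refresh()

    def refresh(self) -> None:
        # Warm-start from the previous scores: auth = W^T hub, hub = W auth.
        for _ in range(self.iters):
            auth = [0.0] * self.n
            for (i, j), wij in self.w.items():
                auth[j] += wij * self.hub[i]
            hub = [0.0] * self.n
            for (i, j), wij in self.w.items():
                hub[i] += wij * auth[j]
            self.auth, self.hub = _normalize(auth), _normalize(hub)
        self.pending = 0
```

Between refreshes, `hub` and `auth` keep their previous values, so the cost of recomputation is spread across `batch` updates.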
granular computing | 2006
Yitao Duan; John F. Canny
This paper introduces a new framework for privacy-preserving computation to the granular computing community. The framework is called P4P (Peers for Privacy) and features a unique architecture and practical protocols for user data validation and vector addition-based computation. It turns out that many non-trivial and non-linear computations can be done using an iterative algorithm with vector-addition aggregation steps. Examples include voting, summation, SVD, regression, and ANOVA. P4P allows them to be carried out while preserving users' privacy. To demonstrate its application in granular computing, we present two practical protocols that test the equality of user vectors in zero-knowledge. Our protocols involve only a constant number of public-key operations (independent of vector size) and are very efficient. These protocols can be used to perform granulation, which is a fundamental task of granular computing, in a privacy-preserving manner. They may also be of independent interest in other fields, such as data mining.
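The abstract states that the equality tests use a constant number of public-key operations regardless of vector size. One standard way to get that property, assumed here rather than taken from the paper, is to compress both vectors with a shared random linear combination and then compare a single field element. The plaintext sketch below shows only that compression step: the zero-knowledge proof is omitted, the modulus 2^61 - 1 is an assumption, entries are assumed to lie in [0, P), and `compress` and `probably_equal` are hypothetical names.

```python
import secrets
from typing import Sequence

P = 2**61 - 1  # assumed prime modulus (a Mersenne prime)


def compress(vec: Sequence[int], coeffs: Sequence[int]) -> int:
    """Reduce a vector to one field element: sum(c_i * x_i) mod P."""
    return sum(c * x for c, x in zip(coeffs, vec)) % P


def probably_equal(u: Sequence[int], v: Sequence[int]) -> bool:
    """Randomized equality test on vectors with entries in [0, P)."""
    if len(u) != len(v):
        return False
    coeffs = [secrets.randbelow(P) for _ in u]  # shared random challenge
    # Equal vectors always agree; for u != v, c . (u - v) = 0 mod P
    # holds with probability exactly 1/P over the random coefficients.
    return compress(u, coeffs) == compress(v, coeffs)
```

Because only one scalar per vector survives compression, any subsequent cryptographic comparison touches a constant amount of data, independent of vector length.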
ieee international conference computer and communications | 2007
Yitao Duan; John F. Canny
Many network applications are based on a group communications model where one party sends messages to a large number of authorized recipients and/or receives messages from multiple senders. In this paper we present a secure group communication scheme based on a new cryptosystem that admits a rigorous proof of security against adaptive chosen ciphertext attack (IND-CCA2). Our scheme is bi-directional, supporting both one-to-many and many-to-one communications. Compared with existing solutions, our scheme achieves the following improvements: (1) It guarantees data confidentiality and authenticity in both directions; (2) It is the most scalable solution so far among all existing schemes achieving (1). The group member storage overhead is constant while both the center storage and rekeying communication complexity are independent of group size. (3) It can be made to achieve a higher level of security and even hide information about the group dynamics. We show that this protection is more effective and more efficient than existing solutions.
the cryptographers track at the rsa conference | 2006
Yitao Duan; John F. Canny
In this paper we present a general framework for constructing efficient multicast cryptosystems with provable security and show that the schemes in a line of previous work on multicast encryption are all special cases of this general approach. We provide new methods for building such cryptosystems with various levels of security (e.g., IND-CPA, IND-CCA2). Our results enable the construction of a whole class of new multicast schemes with guaranteed security using a broader range of common primitives such as OAEP. Moreover, we show that multicast cryptosystems with a high level of security (e.g. IND-CCA2) can be based upon public key cryptosystems with weaker (e.g. CPA) security as long as decryption can be securely and efficiently “shared”. Our constructions feature truly constant-size decryption keys, while the lengths of both the encryption key and the ciphertext are independent of group size.
principles of distributed computing | 2007
Yitao Duan; John F. Canny
In this paper we explore private computation built on vector addition, which is a surprisingly general tool for implementing many useful analyses on user-provided data. Examples include both linear and non-linear algorithms such as singular value decomposition (SVD), regression, analysis of variance (ANOVA), and several machine learning algorithms based on Expectation Maximization (EM). The non-linear algorithms aggregate user data only in certain steps, such as conjugate gradient, which are linear in per-user data. We introduce a new and highly efficient VSS (Verifiable Secret-Sharing) protocol in a special but widely-applicable model that allows secret-shared arithmetic operations in such aggregation steps to be done over small fields (e.g. 32 or 64 bits), so that private arithmetic operations have the same cost as normal arithmetic. Verification of user data is required to prevent a malicious user from biasing the computation. We provide a random projection method for verification that uses a linear number of inexpensive small-field operations and only a logarithmic number of large-field (1024 bits or more) cryptographic operations. Our implementation shows that the approach can reduce running time by orders of magnitude compared with standard techniques (from hours to seconds) for large-scale problems (e.g. at the scale where the number of values per user is 10^6).
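Assuming the verification step enforces an L2-norm bound on each user's vector (a common way to reject exaggerated inputs, but an assumption here), the statistics behind random-projection checking fit in a few lines: for a vector c of independent random ±1 entries, E[(c·u)^2] = ||u||^2, so averaging squared projections estimates the squared norm. In the actual protocol the projections would be computed on committed or secret-shared data with cryptographic proofs; this plaintext sketch omits all of that, and `estimate_sq_norm`, `passes_bound`, and the parameter `k` are hypothetical.

```python
import random
from typing import Sequence


def estimate_sq_norm(u: Sequence[float], k: int, rng: random.Random) -> float:
    """Estimate ||u||^2 as the mean of (c . u)^2 over k random +/-1 vectors c."""
    total = 0.0
    for _ in range(k):
        # c . u with each c_i drawn independently and uniformly from {+1, -1}
        proj = sum(x if rng.random() < 0.5 else -x for x in u)
        total += proj * proj
    return total / k


def passes_bound(u: Sequence[float], bound: float, k: int = 50, seed: int = 0) -> bool:
    """Accept u only if its estimated L2 norm does not exceed `bound`."""
    rng = random.Random(seed)
    return estimate_sq_norm(u, k, rng) <= bound * bound
```

Since cross terms c_i c_j u_i u_j vanish in expectation, each squared projection is an unbiased estimate of ||u||^2, and k projections reduce its variance by a factor of k.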
Large-Scale Data Analytics | 2014
Yitao Duan; John F. Canny
In this chapter we investigate practical technologies for security and privacy in data analysis at large scale. We motivate our approach by discussing the challenges and opportunities in light of current and emerging analysis paradigms on large data sets. In particular, we present a framework for privacy-preserving distributed data analysis that is practical for many real-world applications. The framework is called Peers for Privacy (P4P) and features a novel heterogeneous architecture and a number of efficient tools for performing private computation and offering security at large scale. It maintains three key properties, which are essential for real-world applications: (i) provably strong privacy; (ii) adequate efficiency at reasonably large scale; and (iii) robustness against realistic adversaries. The framework gains its practicality by decomposing data mining algorithms into a sequence of vector addition steps, which can be privately evaluated using efficient cryptographic tools, namely verifiable secret sharing over small fields (e.g., 32 or 64 bits), whose operations have the same cost as regular, non-private arithmetic. This paradigm supports a large number of statistical learning algorithms, including SVD, PCA, k-means, ID3 and machine learning algorithms based on Expectation-Maximization, as well as all algorithms in the statistical query model (Kearns, Efficient noise-tolerant learning from statistical queries. In: STOC’93, San Diego, pp. 392–401, 1993). As a concrete example, we show how singular value decomposition, which is an extremely useful algorithm and the core of many data mining tasks, can be performed efficiently with privacy in P4P. Using real data, we demonstrate that P4P is orders of magnitude faster than other solutions.
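The chapter's central idea is that algorithms such as SVD reduce to repeated vector additions. A minimal sketch of that decomposition uses power iteration (a standard method chosen here for illustration, not necessarily the chapter's algorithm): each user holds one row a_i of the data matrix A and locally computes a_i (a_i · v); the sum of these contributions is A^T A v, so the only cross-user step is `vector_sum`, which is where a private aggregation protocol would slot in. The aggregation below runs in plaintext, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def user_contribution(row: Sequence[float], v: Sequence[float]) -> List[float]:
    """Computed locally by the user holding row a_i: a_i * (a_i . v)."""
    dot = sum(a * b for a, b in zip(row, v))
    return [a * dot for a in row]


def vector_sum(vectors: List[List[float]]) -> List[float]:
    """The only aggregation step; a private vector-addition protocol would go here."""
    return [sum(col) for col in zip(*vectors)]


def top_right_singular_vector(rows: List[Sequence[float]],
                              iters: int = 100) -> Tuple[List[float], float]:
    """Power iteration on A^T A using only per-user work plus vector sums."""
    d = len(rows[0])
    v = [1.0 / math.sqrt(d)] * d
    norm = 0.0
    for _ in range(iters):
        w = vector_sum([user_contribution(r, v) for r in rows])  # w = A^T A v
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return v, 0.0  # A^T A v vanished (e.g. all-zero data)
        v = [x / norm for x in w]
    # At convergence ||A^T A v|| = sigma_1^2, so sigma_1 = sqrt(norm).
    return v, math.sqrt(norm)
```

No user ever reveals its row; the server-side computation only ever needs the summed vector w at each iteration.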