Yin Yang | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Yin Yang is active.

Explore More

Publication

Featured researches published by Yin Yang.

international conference on distributed computing systems | 2015

DRS: Dynamic Resource Scheduling for Real-Time Analytics over Fast Streams

Tom Z. J. Fu; Jianbing Ding; Richard T. B. Ma; Marianne Winslett; Yin Yang; Zhenjie Zhang

In a data stream management system (DSMS), users register continuous queries, and receive result updates as data arrive and expire. We focus on applications with real-time constraints, in which the user must receive each result update within a given period after the update occurs. To handle fast data, the DSMS is commonly placed on top of a cloud infrastructure. Because stream properties such as arrival rates can fluctuate unpredictably, cloud resources must be dynamically provisioned and scheduled accordingly to ensure real-time response. It is essential, for the existing systems or future developments, to possess the ability of scheduling resources dynamically according to the current workload, in order to avoid wasting resources, or failing in delivering correct results on time. Motivated by this, we propose DRS, a novel dynamic resource scheduler for cloud-based DSMSs. DRS overcomes three fundamental challenges: (a) how to model the relationship between the provisioned resources and query response time (b) where to best place resources, and (c) how to measure system load with minimal overhead. In particular, DRS includes an accurate performance model based on the theory of Jackson open queueing networks and is capable of handling arbitrary operator topologies, possibly with loops, splits and joins. Extensive experiments with real data confirm that DRS achieves real-time response with close to optimal resource consumption.

Bioinformatics | 2015

Deterministic Identification of Specific Individuals from GWAS Results

Ruichu Cai; Zhifeng Hao; Marianne Winslett; Xiaokui Xiao; Yin Yang; Zhenjie Zhang; Shuigeng Zhou

MOTIVATIONnGenome-wide association studies (GWASs) are commonly applied on human genomic data to understand the causal gene combinations statistically connected to certain diseases. Patients involved in these GWASs could be re-identified when the studies release statistical information on a large number of single-nucleotide polymorphisms. Subsequent work, however, found that such privacy attacks are theoretically possible but unsuccessful and unconvincing in real settings.nnnRESULTSnWe derive the first practical privacy attack that can successfully identify specific individuals from limited published associations from the Wellcome Trust Case Control Consortium (WTCCC) dataset. For GWAS results computed over 25 randomly selected loci, our algorithm always pinpoints at least one patient from the WTCCC dataset. Moreover, the number of re-identified patients grows rapidly with the number of published genotypes. Finally, we discuss prevention methods to disable the attack, thus providing a solution for enhancing patient privacy.nnnAVAILABILITY AND IMPLEMENTATIONnProofs of the theorems and additional experimental results are available in the support online documents. The attack algorithm codes are publicly available at https://sites.google.com/site/zhangzhenjie/GWAS_attack.zip. The genomic dataset used in the experiments is available at http://www.wtccc.org.uk/ on request.

ACM Transactions on Database Systems | 2015

Optimizing Batch Linear Queries under Exact and Approximate Differential Privacy

Ganzhao Yuan; Zhenjie Zhang; Marianne Winslett; Xiaokui Xiao; Yin Yang; Zhifeng Hao

Differential privacy is a promising privacy-preserving paradigm for statistical query processing over sensitive data. It works by injecting random noise into each query result such that it is provably hard for the adversary to infer the presence or absence of any individual record from the published noisy results. The main objective in differentially private query processing is to maximize the accuracy of the query results while satisfying the privacy guarantees. Previous work, notably Li et al. [2010], has suggested that, with an appropriate strategy, processing a batch of correlated queries as a whole achieves considerably higher accuracy than answering them individually. However, to our knowledge there is currently no practical solution to find such a strategy for an arbitrary query batch; existing methods either return strategies of poor quality (often worse than naive methods) or require prohibitively expensive computations for even moderately large domains. Motivated by this, we propose a low-rank mechanism (LRM), the first practical differentially private technique for answering batch linear queries with high accuracy. LRM works for both exact (i.e., ε-) and approximate (i.e., (ε, Δ)-) differential privacy definitions. We derive the utility guarantees of LRM and provide guidance on how to set the privacy parameters, given the users utility expectation. Extensive experiments using real data demonstrate that our proposed method consistently outperforms state-of-the-art query processing solutions under differential privacy, by large margins.

knowledge discovery and data mining | 2016

Convex Optimization for Linear Query Processing under Approximate Differential Privacy

Ganzhao Yuan; Yin Yang; Zhenjie Zhang; Zhifeng Hao

Differential privacy enables organizations to collect accurate aggregates over sensitive data with strong, rigorous guarantees on individuals privacy. Previous work has found that under differential privacy, computing multiple correlated aggregates as a batch, using an appropriate strategy, may yield higher accuracy than computing each of them independently. However, finding the best strategy that maximizes result accuracy is non-trivial, as it involves solving a complex constrained optimization program that appears to be non-convex. Hence, in the past much effort has been devoted in solving this non-convex optimization program. Existing approaches include various sophisticated heuristics and expensive numerical solutions. None of them, however, guarantees to find the optimal solution of this optimization problem. This paper points out that under (ε, ཬ)-differential privacy, the optimal solution of the above constrained optimization problem in search of a suitable strategy can be found, rather surprisingly, by solving a simple and elegant convex optimization program. Then, we propose an efficient algorithm based on Newtons method, which we prove to always converge to the optimal solution with linear global convergence rate and quadratic local convergence rate. Empirical evaluations demonstrate the accuracy and efficiency of the proposed solution.

international conference on management of data | 2016

Elastic Pipelining in an In-Memory Database Cluster

Li Wang; Minqi Zhou; Zhenjie Zhang; Yin Yang; Aoying Zhou; Dina Bitton

An in-memory database cluster consists of multiple interconnected nodes with a large capacity of RAM and modern multi-core CPUs. As a conventional query processing strategy, pipelining remains a promising solution for in-memory parallel database systems, as it avoids expensive intermediate result materialization and parallelizes the data processing among nodes. However, to fully unleash the power of pipelining in a cluster with multi-core nodes, it is crucial for the query optimizer to generate good query plans with appropriate intra-node parallelism, in order to maximize CPU and network bandwidth utilization. A suboptimal plan, on the contrary, causes load imbalance in the pipelines and consequently degrades the query performance. Parallelism assignment optimization at compile time is nearly impossible, as the workload in each node is affected by numerous factors and is highly dynamic during query evaluation. To tackle this problem, we propose elastic pipelining, which makes it possible to optimize intra-node parallelism assignments in the pipelines based on the actual workload at runtime. It is achieved with the adoption of new elastic iterator model and a fully optimized dynamic scheduler. The elastic iterator model generally upgrades traditional iterator model with new dynamic multi-core execution adjustment capability. And the dynamic scheduler efficiently provisions CPU cores to query execution segments in the pipelines based on the light-weight measurements on the operators. Extensive experiments on real and synthetic (TPC-H) data show that our proposal achieves almost full CPU utilization on typical decision-making analytical queries, outperforming state-of-the-art open-source systems by a huge margin.

Computer Networks | 2016

Auction-based cloud service differentiation with service level objectives

Jianbing Ding; Zhenjie Zhang; Richard T. B. Ma; Yin Yang

We present a new study on service differentiation techniques for general cloud system. Our solution potentially opens new business models for cloud systems in the future, and enables ordinary users to exploit the benefits of clouds.We propose Abacus an auction based approach to cloud system resource allocation and scheduling, with enticing features such as incentive-compatibility, system stability and efficiency.We simplify the auction procedure by allowing the users to skip the utility function when the user is unsure or unaware of the exact utility model of his own repeated jobs.We implement Abacus by modifying the scheduling algorithm in Hadoop, and test it on a large-scale cloud platform. Our experimental results verify the truthfulness of our auction-based mechanism, system efficiency, as well as the accuracy of our utility prediction algorithm. The emergence of the cloud computing paradigm has greatly enabled innovative service models, such as Platform as a Service (PaaS), and distributed computing frameworks, such as MapReduce. However, most existing cloud systems fail to distinguish users with different preferences, or jobs of different natures. Consequently, they are unable to provide service differentiation, leading to inefficient allocations of cloud resources. Moreover, contentions on the resources exacerbate this inefficiency, when prioritizing crucial jobs is necessary, but impossible. Motivated by this, we propose Abacus, a generic resource management framework addressing this problem. Abacus interacts with users through an auction mechanism, which allows users to specify their priorities using budgets, and job characteristics via utility functions. Based on this information, Abacus computes the optimal allocation and scheduling of resources. Meanwhile, the auction mechanism in Abacus possesses important properties including incentive compatibility (i.e., the users best strategy is to simply bid their true budgets and job utilities) and monotonicity (i.e., users are motivated to increase their budgets in order to receive better services). In addition, when the user is unclear about her utility function, Abacus automatically learns this function based on statistics of her previous jobs. Extensive experiments, running Hadoop on a private cluster and Amazon EC2, demonstrate the high performance and other desirable properties of Abacus.

acm multimedia | 2015

LiveTraj: Real-Time Trajectory Tracking over Live Video Streams

Tom Z. J. Fu; Jianbing Ding; Richard T. B. Ma; Marianne Winslett; Yin Yang; Zhenjie Zhang; Yong Pei; Bingbing Ni

We present LiveTraj, a novel system for tracking trajectories in a live video stream in real time, backed by a cloud platform. Although trajectory tracking is a well-studied topic in computer vision, so far most attention has been devoted to improving the accuracy of trajectory tracking, rather than the efficiency. To our knowledge, LiveTraj is the first that achieves real-time efficiency in trajectory tracking, which can be a key enabler in many important applications such as video surveillance, action recognition and robotics. LiveTraj is based on a state-of-the-art approach to (offline) trajectory tracking; its main innovation is to adapt this base solution to run on an elastic cloud platform to achieve real-time tracking speed at an affordable cost. The video demo shows the offline base solution and LiveTraj side by side, both running on a video stream containing human actions. Besides demonstrating the real-time efficiency of LiveTraj, our video demo also exhibits important system parameters to the audience such as latency and cloud resource usage for different components of the system. Further, if the conference venue provides sufficiently fast Internet connection to our cloud platform, we also plan to demonstrate LiveTraj on-site, during which we will show LiveTraj identifying and tracking trajectories from a live video stream captured by a camera.

IEEE ACM Transactions on Networking | 2017

DRS: Auto-Scaling for Real-Time Stream Analytics

Tom Z. J. Fu; Jianbing Ding; Richard T. B. Ma; Marianne Winslett; Yin Yang; Zhenjie Zhang

In a stream data analytics system, input data arrive continuously and trigger the processing and updating of analytics results. We focus on applications with real-time constraints, in which, any data unit must be completely processed within a given time duration. To handle fast data, it is common to place the stream data analytics system on top of a cloud infrastructure. Because stream properties, such as arrival rates can fluctuate unpredictably, cloud resources must be dynamically provisioned and scheduled accordingly to ensure real-time responses. It is essential, for existing systems or future developments, to possess the ability of scaling resources dynamically according to the instantaneous workload, in order to avoid wasting resources or failing in delivering the correct analytics results on time. Motivated by this, we propose DRS, a dynamic resource scaling framework for cloud-based stream data analytics systems. DRS overcomes three fundamental challenges: 1) how to model the relationship between the provisioned resources and the application performance, 2) where to best place resources, and 3) how to measure the system load with minimal overhead. In particular, DRS includes an accurate performance model based on the theory of Jackson open queueing networks and is capable of handling arbitrary operator topologies, possibly with loops, splits, and joins. Extensive experiments with real data show that DRS is capable of detecting sub-optimal resource allocation and making quick and effective resource adjustment.

arXiv: Distributed, Parallel, and Cluster Computing | 2015