Ruichuan Chen | Researchain

Publication

Featured research published by Ruichuan Chen.

computer and communications security | 2012

Non-tracking web analytics

Istemi Ekin Akkus; Ruichuan Chen; Michaela Hardt; Paul Francis; Johannes Gehrke

Today, websites commonly use third-party web analytics services to obtain aggregate information about users that visit their sites. This information includes demographics and visits to other sites as well as user behavior within their own sites. Unfortunately, to obtain this aggregate information, web analytics services track individual user browsing behavior across the web. This violation of user privacy has been strongly criticized, resulting in tools that block such tracking as well as anti-tracking legislation and standards such as Do-Not-Track. These efforts, while improving user privacy, degrade the quality of web analytics. This paper presents the first design of a system that provides web analytics without tracking. The system gives users differential privacy guarantees, can provide better quality analytics than current services, requires no new organizational players, and is practical to deploy. This paper describes and analyzes the design, gives performance benchmarks, and presents our implementation and deployment across several hundred users.
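The abstract does not say which mechanism provides the differential privacy guarantee. As a generic illustration only, the sketch below applies the standard Laplace mechanism to a counting query. The function names and the choice of epsilon are assumptions made for this example and are not taken from the paper.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    # expovariate takes the rate (1 / mean), so each draw has mean `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(true_count, epsilon, rng):
    """Release a count with sensitivity 1 under epsilon-differential privacy."""
    return true_count + laplace_noise(1.0 / epsilon, rng)

if __name__ == "__main__":
    rng = random.Random()
    # Hypothetical query: how many users visited a given page.
    print(noisy_count(true_count=100, epsilon=0.5, rng=rng))
```

A smaller epsilon means more noise and stronger privacy. The noise has zero mean, so aggregate counts stay useful when many users contribute.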

acm special interest group on data communication | 2013

SplitX: high-performance private analytics

Ruichuan Chen; Istemi Ekin Akkus; Paul Francis

There is a growing body of research on mechanisms for preserving online user privacy while still allowing aggregate queries over private user data. A common approach is to store user data at users' devices, and to query the data in such a way that a differentially private noisy result is produced without exposing individual user data to any system component. A particular challenge is to design a system that scales well while limiting how much malicious users can distort the result. This paper presents SplitX, a high-performance analytics system for making differentially private queries over distributed user data. SplitX is typically two to three orders of magnitude more efficient in bandwidth, and from three to five orders of magnitude more efficient in computation than previous comparable systems, while operating under a similar trust model. SplitX accomplishes this performance by replacing public-key operations with exclusive-or operations. This paper presents the design of SplitX, analyzes its security and performance, and describes its implementation and deployment across 416 users.
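To show why exclusive-or operations can stand in for public-key encryption, the sketch below splits a message into two shares with XOR. Each share alone looks like random bytes, and XORing the two recovers the message. This is a generic XOR-splitting example, not the SplitX protocol itself; the function names are hypothetical.

```python
import secrets

def xor_split(message: bytes):
    """Split a message into two shares; each share alone reveals nothing."""
    pad = secrets.token_bytes(len(message))              # uniformly random share
    masked = bytes(m ^ p for m, p in zip(message, pad))  # message XOR pad
    return pad, masked

def xor_join(share_a: bytes, share_b: bytes) -> bytes:
    """Recombine two shares by XORing them byte by byte."""
    return bytes(a ^ b for a, b in zip(share_a, share_b))

if __name__ == "__main__":
    a, b = xor_split(b"yes")
    print(xor_join(a, b))
```

If the two shares go to parties that do not collude, neither party learns the message. Splitting and joining cost one XOR per byte.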

conference on emerging network experiment and technology | 2011

Address-based route reflection

Ruichuan Chen; Aman Shaikh; Jia Wang; Paul J. Francis

BGP Route Reflectors (RR), which are commonly used to help scale Internal BGP (iBGP), can produce oscillations, forwarding loops, and path inefficiencies. ISPs avoid these pitfalls through careful topology design, RR placement, and link-metric assignment. This paper presents Address-Based Route Reflection (ABRR): the first iBGP solution that completely solves all oscillation and looping problems, has no path inefficiencies, and puts no constraints on RR placement. ABRR does this by emulating the semantics of full-mesh iBGP, and thereby adopting the correctness and path efficiency properties of full-mesh iBGP. Both traditional Topology-Based Route Reflection (TBRR) and ABRR take a divide-and-conquer approach. While TBRR scales by making each RR responsible for all prefixes from some fraction of routers, ABRR scales by making each RR responsible for some fraction of prefixes from all routers. We have implemented a fully functional ABRR prototype. Using BGP data from a Tier-1 ISP, our analytical and implementation results show that ABRR's scaling and convergence properties compare positively with traditional TBRR.
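The sketch below illustrates dividing work by address rather than by topology. It maps a prefix to the route reflector that owns the enclosing address range, whichever router announced it. The two-way split of the IPv4 space and the names `rr1` and `rr2` are hypothetical; the abstract does not say how ABRR assigns ranges.

```python
import ipaddress

# Hypothetical assignment: each route reflector owns one address range.
ADDRESS_RANGES = {
    "rr1": ipaddress.ip_network("0.0.0.0/1"),
    "rr2": ipaddress.ip_network("128.0.0.0/1"),
}

def responsible_rr(prefix: str) -> str:
    """Return the reflector whose address range contains the prefix."""
    net = ipaddress.ip_network(prefix)
    for rr, owned in ADDRESS_RANGES.items():
        if net.subnet_of(owned):
            return rr
    raise ValueError(f"{prefix} spans more than one address range")

if __name__ == "__main__":
    for p in ("10.0.0.0/8", "192.0.2.0/24"):
        print(p, responsible_rr(p))
```

Under this split, every router sends routes for `10.0.0.0/8` to the same reflector, so that reflector sees all candidate paths for the prefix.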

Proceedings of the 18th ACM/IFIP/USENIX Middleware Conference | 2017

StreamApprox: approximate computing for stream analytics

Do Le Quoc; Ruichuan Chen; Pramod Bhatotia; Christof Fetzer; Volker Hilt; Thorsten Strufe

Approximate computing aims for efficient execution of workflows where an approximate output is sufficient instead of the exact output. The idea behind approximate computing is to compute over a representative sample instead of the entire input dataset. Thus, approximate computing, based on the chosen sample size, can make a systematic trade-off between the output accuracy and computation efficiency. Unfortunately, the state-of-the-art systems for approximate computing primarily target batch analytics, where the input data remains unchanged during the course of computation. Thus, they are not well-suited for stream analytics. This motivated the design of StreamApprox, a stream analytics system for approximate computing. To realize this idea, we designed an online stratified reservoir sampling algorithm to produce approximate output with rigorous error bounds. Importantly, our proposed algorithm is generic and can be applied to two prominent types of stream processing systems: (1) batched stream processing such as Apache Spark Streaming, and (2) pipelined stream processing such as Apache Flink. To showcase the effectiveness of our algorithm, we implemented StreamApprox as a fully functional prototype based on Apache Spark Streaming and Apache Flink. We evaluated StreamApprox using a set of microbenchmarks and real-world case studies. Our results show that Spark- and Flink-based StreamApprox systems achieve a speedup of 1.15× to 3× compared to the respective native Spark Streaming and Flink executions, with varying sampling fraction of 80% to 10%. Furthermore, we have also implemented an improved baseline in addition to the native execution baseline: a Spark-based approximate computing system leveraging the existing sampling modules in Apache Spark. Compared to the improved baseline, our results show that StreamApprox achieves a speedup of 1.1× to 2.4× while maintaining the same accuracy level.
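The sketch below shows the general idea of stratified reservoir sampling over a stream. Each stratum (for example, each sub-stream) keeps its own fixed-size reservoir, and sampled values are re-weighted by the number of items seen in that stratum. It uses textbook reservoir sampling per stratum and is not the StreamApprox algorithm. The class name, the capacity, and the sum estimator are assumptions made for illustration.

```python
import random
from collections import defaultdict

class StratifiedReservoir:
    """Keep at most `capacity` uniformly sampled items per stratum."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.rng = random.Random(seed)
        self.samples = defaultdict(list)  # stratum -> sampled items
        self.seen = defaultdict(int)      # stratum -> items observed so far

    def add(self, stratum, item):
        self.seen[stratum] += 1
        reservoir = self.samples[stratum]
        if len(reservoir) < self.capacity:
            reservoir.append(item)
        else:
            # Replace a random slot with probability capacity / seen.
            j = self.rng.randrange(self.seen[stratum])
            if j < self.capacity:
                reservoir[j] = item

    def estimate_sum(self):
        """Estimate the stream's total by scaling each stratum's sample sum."""
        total = 0.0
        for stratum, reservoir in self.samples.items():
            weight = self.seen[stratum] / len(reservoir)
            total += weight * sum(reservoir)
        return total
```

Sampling each stratum separately keeps rare sub-streams in the sample even when a high-volume sub-stream dominates the input.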

Proceedings of the 18th ACM/IFIP/USENIX Middleware Conference | 2017

Sieve: actionable insights from monitored metrics in distributed systems

Jörg Thalheim; Antonio Rodrigues; Istemi Ekin Akkus; Pramod Bhatotia; Ruichuan Chen; Bimal Viswanath; Lei Jiao; Christof Fetzer

Major cloud computing operators provide powerful monitoring tools to understand the current (and prior) state of the distributed systems deployed in their infrastructure. While such tools provide a detailed monitoring mechanism at scale, they also pose a significant challenge for the application developers/operators to transform the huge space of monitored metrics into useful insights. These insights are essential to build effective management tools for improving the efficiency, resiliency, and dependability of distributed systems. This paper reports on our experience with building and deploying Sieve, a platform to derive actionable insights from monitored metrics in distributed systems. Sieve builds on two core components: a metrics reduction framework, and a metrics dependency extractor. More specifically, Sieve first reduces the dimensionality of metrics by automatically filtering out unimportant metrics by observing their signal over time. Afterwards, Sieve infers metrics dependencies between distributed components of the system using a predictive-causality model by testing for Granger Causality. We implemented Sieve as a generic platform and deployed it for two microservices-based distributed systems: OpenStack and ShareLatex. Our experience shows that (1) Sieve can reduce the number of metrics by at least an order of magnitude (10× to 100×), while preserving the statistical equivalence to the total number of monitored metrics; (2) Sieve can dramatically improve existing monitoring infrastructures by reducing the associated overheads over the entire system stack (CPU: 80%, storage: 90%, and network: 50%); (3) lastly, Sieve can be effective in supporting a wide range of workflows in distributed systems. We showcase two such workflows: orchestration of autoscaling, and Root Cause Analysis (RCA).
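The abstract does not spell out the filter used in the metrics reduction step. As one simple, assumed stand-in, the sketch below drops metrics whose time series barely changes, since a constant signal carries no information about system behavior. The function name, the threshold, and the sample metric names are hypothetical.

```python
from statistics import pvariance

def keep_varying_metrics(series_by_metric, min_variance=1e-9):
    """Return the names of metrics whose variance exceeds the threshold."""
    return [
        name
        for name, values in series_by_metric.items()
        if pvariance(values) > min_variance
    ]

if __name__ == "__main__":
    metrics = {
        "cpu_usage": [1.0, 5.0, 2.0, 8.0],
        "open_ports": [3.0, 3.0, 3.0, 3.0],  # constant, so filtered out
    }
    print(keep_varying_metrics(metrics))
```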

symposium on cloud computing | 2018

ApproxJoin: Approximate Distributed Joins

Do Le Quoc; Istemi Ekin Akkus; Pramod Bhatotia; Spyros Blanas; Ruichuan Chen; Christof Fetzer; Thorsten Strufe

A distributed join is a fundamental operation for processing massive datasets in parallel. Unfortunately, computing an equi-join over such datasets is very resource-intensive, even when done in parallel. Given this cost, the equi-join operator becomes a natural candidate for optimization using approximation techniques, which allow users to trade accuracy for latency. Finding the right approximation technique for joins, however, is a challenging task. Sampling, in particular, cannot be directly used in joins; naïvely performing a join over a sample of the dataset will not preserve statistical properties of the query result. To address this problem, we introduce ApproxJoin. We interweave Bloom filter sketching and stratified sampling with the join computation in a new operator that preserves statistical properties of an aggregation over the join output. ApproxJoin leverages Bloom filters to avoid shuffling non-joinable data items around the network, and then applies stratified sampling to obtain a representative sample of the join output. We implemented ApproxJoin in Apache Spark, and evaluated it using microbenchmarks and real-world workloads. Our evaluation shows that ApproxJoin scales well and significantly reduces data movement, without sacrificing tight error bounds on the accuracy of the final results. ApproxJoin achieves a speedup of up to 9x over unmodified Spark-based joins with the same sampling ratio. Furthermore, the speedup is accompanied by a significant reduction in the shuffled data volume, which is up to 82x less than unmodified Spark-based joins.
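The sketch below shows how a Bloom filter can keep non-joinable items out of the shuffle. It builds a filter over the join keys of one input and uses it to drop records from the other input whose keys cannot match. This is a minimal single-machine illustration, not ApproxJoin's operator. The class, the filter size, and the helper `prefilter` are assumptions.

```python
import hashlib

class BloomFilter:
    """A small Bloom filter: false positives possible, no false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive several bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))

def prefilter(left, right):
    """Drop right-side records whose key cannot match any left-side key."""
    bloom = BloomFilter()
    for key, _ in left:
        bloom.add(key)
    return [(k, v) for k, v in right if bloom.might_contain(k)]
```

Because a Bloom filter never produces false negatives, every joinable record survives the filter. An occasional false positive only costs extra data movement and does not change the join result.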

conference on emerging network experiment and technology | 2017

Towards Reliable Application Deployment in the Cloud

Ruichuan Chen; Istemi Ekin Akkus; Bimal Viswanath; Ivica Rimac; Volker Hilt

A common practice to increase the reliability of a cloud application is to deploy redundant instances. Unfortunately, such redundancy efforts can be undermined if the application's instances share common dependencies. This paper presents ReCloud, a novel system that can efficiently find a reliable deployment plan for cloud applications. ReCloud considers and avoids common dependencies shared across application instances that may lead to correlated failures, and works with applications that even have complex internal structures. ReCloud utilizes various pieces of available dependency information (e.g., hardware, software and/or network dependencies) about the cloud infrastructure to quantitatively assess the reliability of the application's deployment plan with rigorous error bounds. This assessment further enables ReCloud to find a deployment plan that balances between reliability and other criteria such as application performance and resource utilization. We implemented a fully functional system. The experimental results show that, even in a large cloud environment with more than 27K hosts, ReCloud needs only 30 seconds to find a deployment plan that is one order of magnitude more reliable than the common practice.
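To illustrate why shared dependencies undermine redundancy, the sketch below estimates a deployment plan's reliability by random sampling. Each component fails independently with a given probability, and the application counts as up if at least one instance has all of its dependencies up. This is a generic Monte Carlo sketch under those assumptions, not ReCloud's assessment method. The function name and the component names are hypothetical.

```python
import random

def estimate_reliability(instances, fail_prob, trials=10000, seed=0):
    """Estimate the probability that at least one instance is fully up.

    instances: list of sets, each holding the components one instance needs.
    fail_prob: dict mapping each component to its failure probability.
    """
    rng = random.Random(seed)
    components = sorted({c for deps in instances for c in deps})
    up_count = 0
    for _ in range(trials):
        # Sample which components fail in this trial.
        failed = {c for c in components if rng.random() < fail_prob[c]}
        # An instance survives if none of its dependencies failed.
        if any(failed.isdisjoint(deps) for deps in instances):
            up_count += 1
    return up_count / trials

if __name__ == "__main__":
    probs = {"host1": 0.05, "host2": 0.05, "switchA": 0.05, "switchB": 0.05}
    shared = [{"host1", "switchA"}, {"host2", "switchA"}]
    separate = [{"host1", "switchA"}, {"host2", "switchB"}]
    print(estimate_reliability(shared, probs))
    print(estimate_reliability(separate, probs))
```

In the shared plan, a single failure of `switchA` takes down both instances. The second plan removes that common dependency.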

networked systems design and implementation | 2012