Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Ze Yu is active.

Publication


Featured research published by Ze Yu.


International Conference on Cloud Computing | 2012

IncMR: Incremental Data Processing Based on MapReduce

Cairong Yan; Xin Yang; Ze Yu; Min Li; Xiaolin Li

The MapReduce programming model is widely used for large-scale, one-time data-intensive distributed computing, but it lacks the flexibility and efficiency to process small increments of data. This paper proposes IncMR, a framework for incrementally processing new data added to a large data set: it takes prior state as implicit input and combines it with the new data. Map tasks are created for the new splits instead of the entire input, while reduce tasks fetch their inputs, including the saved state and the intermediate results of the new map tasks, from designated or local nodes. Data locality is treated as a primary optimization in job scheduling. IncMR is implemented on Hadoop, is compatible with the original MapReduce interfaces, and is transparent to users. Experiments show that non-iterative algorithms written for MapReduce can be migrated to IncMR without modification to obtain efficient incremental and continuous processing. IncMR is competitive and, in all studied cases, runs faster than reprocessing the entire data set.
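The core idea above, mapping only over the new splits and folding the result into previously saved state, can be sketched in a few lines. This is a toy word-count illustration, not IncMR's actual API; `map_phase` and `incremental_reduce` are hypothetical names:

```python
from collections import Counter

def map_phase(splits):
    """Run the map function only over the new splits."""
    intermediate = Counter()
    for split in splits:
        for word in split.split():
            intermediate[word] += 1
    return intermediate

def incremental_reduce(prev_state, new_intermediate):
    """Combine the saved state (the implicit input) with the new map output."""
    state = Counter(prev_state)
    state.update(new_intermediate)
    return state

# The first run processes the initial data; later runs see only new splits.
state = incremental_reduce({}, map_phase(["a b a"]))
state = incremental_reduce(state, map_phase(["b c"]))  # incremental run
```

Because the state is carried forward, the second run touches only the new split rather than the whole data set, which is the source of IncMR's speedup.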


Utility and Cloud Computing | 2012

CloudBay: Enabling an Online Resource Market Place for Open Clouds

Han Zhao; Ze Yu; Shivam Tiwari; Xing Mao; Kyungyong Lee; David Isaac Wolinsky; Xiaolin Li; Renato J. O. Figueiredo

This paper presents CloudBay, an online resource trading and leasing platform for multi-party resource sharing. Following a market-oriented design principle, CloudBay provides the abstraction of a shared virtual resource space across multiple administrative domains and features enhanced functionality for scalable, automatic resource management and efficient service provisioning. CloudBay distinguishes itself from existing research and contributes in several respects. First, it leverages scalable network virtualization and self-configurable virtual appliances to facilitate resource federation and parallel application deployment. Second, CloudBay adopts an eBay-style transaction model that supports differentiated services with different levels of job priority. For cost-sensitive users, CloudBay implements an efficient matchmaking algorithm based on auction theory and enables opportunistic resource access through preemptive service scheduling. The proposed CloudBay platform stands between HPC service sellers and buyers, offering a comprehensive solution for resource advertising and stitching, transaction management, and application-to-infrastructure mapping. In this paper, we present the design details of CloudBay and discuss lessons and challenges encountered during implementation. The proof-of-concept prototype of CloudBay is validated through experiments across multiple sites and through simulations.
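The auction-based matchmaking described above can be illustrated with a greedy double-auction sketch: sort asks ascending and bids descending, then match while a bid covers an ask. This is a minimal simplification, not CloudBay's actual algorithm; all names and prices are hypothetical:

```python
def match(asks, bids):
    """Greedy double-auction matchmaking: pair the cheapest ask with the
    highest bid while the bid covers the ask. Entries are (name, price)."""
    asks = sorted(asks, key=lambda a: a[1])                # cheapest first
    bids = sorted(bids, key=lambda b: b[1], reverse=True)  # highest first
    matches = []
    for (seller, ask), (buyer, bid) in zip(asks, bids):
        if bid >= ask:
            matches.append((buyer, seller, (ask + bid) / 2))  # split the surplus
    return matches

asks = [("s1", 3), ("s2", 7)]
bids = [("b1", 10), ("b2", 5)]
print(match(asks, bids))  # → [('b1', 's1', 6.5)]; b2's bid of 5 cannot cover s2's ask of 7
```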


International Conference on Cluster Computing | 2012

Affinity-aware Virtual Cluster Optimization for MapReduce Applications

Cairong Yan; Ming Zhu; Xin Yang; Ze Yu; Min Li; Youqun Shi; Xiaolin Li

Infrastructure-as-a-Service clouds are becoming ubiquitous for provisioning virtual machines on demand. Cloud service providers aim to deliver the best service with the least resources. As users frequently request virtual machines to build virtual clusters and run MapReduce-like jobs for big data processing, providers want to place these virtual machines close together to minimize network latency and thereby reduce data movement costs. In this paper we focus on the virtual machine placement problem of provisioning virtual clusters with minimum network latency in clouds. We define distance as the latency between virtual machines and use it to measure the affinity of a virtual cluster; this metric captures both virtual machine placement and the topology of the physical nodes. We then formulate the problem as a classical shortest-distance problem and solve it by modeling it as an integer program. A greedy virtual machine placement algorithm is designed to obtain a compact virtual cluster, and an improved heuristic algorithm is presented to achieve global resource optimization. Simulation results verify our algorithms, and experimental results validate the improvement achieved by our approaches.
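A minimal sketch of the greedy placement idea: given a pairwise latency matrix over physical hosts, seed the cluster with the closest pair, then repeatedly add the host with the smallest total distance to the hosts already chosen. The matrix and function below are illustrative, not the paper's implementation:

```python
def greedy_place(latency, k):
    """Greedily pick k hosts forming a compact virtual cluster.
    Start from the pair with the smallest latency, then repeatedly add the
    host with the least total latency to the already-chosen set."""
    n = len(latency)
    # Seed with the closest pair of hosts.
    best = min((latency[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    chosen = {best[1], best[2]}
    while len(chosen) < k:
        cand = min((h for h in range(n) if h not in chosen),
                   key=lambda h: sum(latency[h][c] for c in chosen))
        chosen.add(cand)
    return sorted(chosen)

# 4 hosts; hosts 0, 1, 2 share a rack (low latency), host 3 is remote.
lat = [[0, 1, 2, 9],
       [1, 0, 1, 9],
       [2, 1, 0, 9],
       [9, 9, 9, 0]]
print(greedy_place(lat, 3))  # → [0, 1, 2]
```

This captures the compactness goal only; the paper's integer-programming formulation and global heuristic go beyond this greedy pass.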


International Conference on Cloud Computing | 2014

Palantir: Reseizing Network Proximity in Large-Scale Distributed Computing Frameworks Using SDN

Ze Yu; Min Li; Xin Yang; Xiaolin Li

Parallel/distributed computing frameworks such as MapReduce and Dryad have been widely adopted to analyze massive data. Traditionally, these frameworks depend on manual configuration to acquire the network proximity information used to optimize data placement and task scheduling. However, this approach is cumbersome, inflexible, or even infeasible in large-scale deployments, for example across multiple datacenters. In this paper, we address this problem using the Software-Defined Networking (SDN) capability. We build Palantir, an SDN service for parallel/distributed computing frameworks that abstracts proximity information out of the network. Palantir frees framework developers and administrators from manual network configuration. In addition, Palantir is flexible because it allows different frameworks to define proximity according to framework-specific metrics. We design and implement a datacenter-aware MapReduce to demonstrate Palantir's usefulness. Our evaluation shows that, based on Palantir, datacenter-aware MapReduce achieves significant performance improvements.


Proceedings of the 2013 ACM Cloud and Autonomic Computing Conference | 2013

Mammoth: autonomic data processing framework for scientific state-transition applications

Xin Yang; Ze Yu; Min Li; Xiaolin Li

Scientific computing is becoming increasingly data-intensive, and more high-impact discoveries rely on efficient processing of big scientific data. Popular MapReduce frameworks such as Hadoop offer an alternative to conventional solutions (e.g., MPI or OpenMP). However, they perform only moderately well on state-transition applications, which pose three key challenges: (1) these applications generate inflated intermediate data that can saturate the network; (2) they can incur substantial synchronization overheads if not managed well; and (3) dynamically evolving scientific phenomena produce heterogeneous data distributions, causing significant computation skew. In this paper, we propose Mammoth, an autonomic parallel data processing framework for scientific state-transition applications. Mammoth features a MapReduce-style programming model familiar to users, further enhanced with a series of optimizations that parallelize the computation automatically and efficiently. We evaluate Mammoth on a weather prediction application with real-world datasets. The experimental evaluation demonstrates that Mammoth is competitive with an MPI-based solution and at least 30% faster than an optimized Hadoop-based solution.


Proceedings of the 2014 ACM International Workshop on Software-Defined Ecosystems | 2014

GatorCloud: a fine-grained and dynamic resource-sharing architecture for multiple cloud services

Ze Yu; Min Li; Yang Liu; Xiaolin Li

Cloud datacenters are incorporating diverse workloads with different service models, such as Infrastructure-as-a-Service (IaaS) and Platform-as-a-Service (PaaS). Multiplexing various cloud service models on the same cluster, instead of maintaining a separate cluster for each, is key to increasing resource utilization and lowering operating costs for cloud providers. However, it is challenging to achieve multi-resource sharing among different services while preserving performance isolation between them. To address this problem, we present GatorCloud, a new cloud resource management architecture that enables fine-grained and dynamic resource sharing among different service models such as IaaS and PaaS. GatorCloud exploits a new abstraction called a balloon to encapsulate both the execution context of a service and the resources associated with it. It uses a two-level architecture that delegates fine-grained resource management to the controller of each service, and it introduces an SDN-enabled control knob to directly manipulate network resources. We demonstrate the detailed design and implementation of a GatorCloud prototype that multiplexes an OpenStack-based IaaS service and two PaaS services (Hadoop and Spark).


High Performance Computing and Communications | 2013

MyCloud: On-Demand Virtual Cluster Provisioning on HPC Resources

Min Li; Xin Yang; Ze Yu; Xiaolin Li

Advances in industry are gaining attention from the scientific computing community. Programming models and frameworks such as MapReduce and Hadoop attract scientists with their easy-to-use interfaces and autonomic parallel processing abilities. Compared with conventional HPC execution environments, clouds offer desirable features such as administrative privilege and customizable software environments, and they are better able to support new frameworks and foster inter-organization collaboration. Scientists currently use isolated private clouds and public clouds, but extra administrative and monetary costs are inevitable in both cases. In this paper, we propose MyCloud, a resource management framework that integrates cloud techniques into HPC systems. MyCloud supports conventional HPC jobs and cloud-like on-demand virtual cluster provisioning on the same HPC cluster simultaneously. Dynamic resource sharing between the two environments benefits cluster utilization and the handling of bursty usage. The design of MyCloud is friendly to state-of-the-art Software-Defined Networking technologies and opens opportunities for fine-grained, application-aware network engineering.


International Conference on Cluster Computing | 2015

Taming Non-local Stragglers Using Efficient Prefetching in MapReduce

Ze Yu; Min Li; Xin Yang; Han Zhao; Xiaolin Li

MapReduce has been widely adopted as a programming model for processing big data. However, parallel jobs in MapReduce are prone to stragglers caused by non-local tasks, for two reasons: first, system logs from production clusters show that a non-local task can be two times slower than a local task; second, a job's completion time is bottlenecked by its slowest parallel tasks. As a result, even a single non-local task can become the straggler of the whole job and delay it significantly. In this paper, we propose to alleviate this problem by proactively prefetching input data for non-local tasks. Performing such prefetching efficiently in MapReduce is difficult, however, because it requires both application-level information to generate accurate prefetching requests at runtime and an appropriate network flow scheduling mechanism to guarantee the timeliness of prefetching flows. To address these challenges, we design and implement FlexFetch, which (1) leverages a novel mechanism called speculative scheduling to accurately generate prefetching flows, and (2) explicitly allocates network resources to prefetching flows using a criticality-aware, deadline-driven flow scheduling algorithm. We evaluate FlexFetch through both testbed experiments and large-scale simulations with production workloads. The results show that FlexFetch reduces completion time by 41.8% for small jobs and by 26.8% on average, compared with the default MapReduce implementation in Hadoop.
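The criticality-aware, deadline-driven ordering of prefetching flows can be approximated with an earliest-deadline-first queue that breaks ties by criticality. This is a simplified sketch, not FlexFetch's actual scheduler; the flow tuples below are hypothetical:

```python
import heapq

def schedule_flows(flows):
    """Order prefetch flows by (deadline, -criticality): earliest deadline
    first, breaking ties in favor of more critical flows.
    `flows` is a list of (name, deadline_s, criticality) tuples."""
    heap = [(deadline, -crit, name) for name, deadline, crit in flows]
    heapq.heapify(heap)
    order = []
    while heap:
        _deadline, _neg_crit, name = heapq.heappop(heap)
        order.append(name)
    return order

flows = [("taskA", 10.0, 1), ("taskB", 5.0, 2), ("taskC", 5.0, 5)]
print(schedule_flows(flows))  # → ['taskC', 'taskB', 'taskA']
```

A real flow scheduler would also reserve bandwidth so each flow finishes by its deadline; the ordering above only captures which flow is served first.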


International Conference on Cloud Computing | 2014

Taming Computation Skews of Block-Oriented Iterative Scientific Applications in MapReduce Systems

Xin Yang; Min Li; Ze Yu; Xiaolin Li

Nowadays, scientists are embracing big data techniques to extract significant discoveries from large volumes of scientific data quickly. Properly partitioning workloads is essential for fully exploiting parallelism, but is difficult for applications whose computations change iteratively, and computation skews are inevitable when executing block-oriented iterative scientific applications in MapReduce systems. This paper proposes iPart, an autonomic workload partitioning system for taming such skews. iPart introduces a workload control loop into the conventional execution of MapReduce jobs: workload estimates, expressed as execution times, are collected in the reduce phase and fed back to the partition phase to update the partitioning plan. Computation skews are detected and addressed by adapting the partitioning to computation changes iteratively. Two adaptive partitioning methods based on the binary partitioning method are presented. Experimental evaluations with two simulated applications on synthetic and real-world data show that iPart responds to computation changes and adapts its partitioning quickly and accurately.
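One iteration of the feedback loop described above can be sketched as follows: any block whose measured execution time exceeds a per-partition budget is split in half, in the spirit of the binary partitioning method. The data model is illustrative, not iPart's actual interface:

```python
def repartition(blocks, times, budget):
    """One iteration of the workload control loop: split any block whose
    measured execution time exceeds the per-partition budget in half.
    A block is a (start, end) index range over the data."""
    out = []
    for (start, end), t in zip(blocks, times):
        if t > budget and end - start > 1:
            mid = (start + end) // 2
            out += [(start, mid), (mid, end)]   # binary split of the hotspot
        else:
            out.append((start, end))            # block already fits the budget
    return out

blocks = [(0, 8), (8, 16)]
times = [4.0, 1.0]          # block 0 is a computation hotspot
print(repartition(blocks, times, budget=2.0))  # → [(0, 4), (4, 8), (8, 16)]
```

Running this after each reduce phase lets the partitioning track the evolving computation, which is the essence of the control loop.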


International Conference on Parallel and Distributed Systems | 2013

Apala: Adaptive Partitioning and Load Balancing for State-Transition Applications

Xin Yang; Ze Yu; Min Li; Xiaolin Li; Ming Xue; Sergiu Sanielevici; David O'Neal

This paper proposes a partitioning and load-balancing scheme for parallelizing state-transition applications on computer clusters. Existing schemes insufficiently balance both the computation of complex state-transition algorithms and the increasing volume of scientific data simultaneously. Apala addresses this problem by introducing a time metric to unify the workloads of computation and data. System profiles in terms of CPU and I/O speeds are considered for accurate workload estimation. Apala consists of two major components: (1) an adaptive decomposition scheme that uses the quad-tree structure to break up workloads and manage data dependencies, and (2) a decentralized scheme for distributing workloads across processors. Experimental results from real-world weather data demonstrate that Apala outperforms other partitioning schemes and can be readily ported to diverse systems with satisfactory performance.
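The quad-tree decomposition in the Apala abstract can be sketched as a recursive split: a region is divided into four quadrants until each quadrant's estimated workload, measured in time units, fits a budget. The cost estimator below is a hypothetical stand-in for Apala's unified time metric:

```python
def decompose(x, y, size, workload, budget):
    """Recursively split a square region into four quadrants until each
    quadrant's estimated workload fits the budget (quad-tree decomposition).
    `workload(x, y, size)` estimates the time cost of a region."""
    if workload(x, y, size) <= budget or size == 1:
        return [(x, y, size)]
    half = size // 2
    tiles = []
    for dx in (0, half):
        for dy in (0, half):
            tiles += decompose(x + dx, y + dy, half, workload, budget)
    return tiles

# Toy estimator: cost is concentrated in the lower-left corner of an 8x8 grid,
# mimicking a localized weather phenomenon.
def cost(x, y, size):
    return size * size * (4 if x < 4 and y < 4 else 1)

tiles = decompose(0, 0, 8, cost, budget=16)
print(tiles)  # the busy corner is cut into four 2x2 tiles; quiet quadrants stay 4x4
```

The resulting tiles are finer where the computation is heavy and coarser elsewhere, which is what lets the decentralized distributor balance load.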

Collaboration


Dive into Ze Yu's collaborations.

Top Co-Authors

Min Li

University of Florida

View shared research outputs

Xin Yang

University of Florida

View shared research outputs

Han Zhao

University of Florida

View shared research outputs