Network


Latest external collaboration at the country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Guanying Wang is active.

Publication


Featured research published by Guanying Wang.


Modeling, Analysis, and Simulation of Computer and Telecommunication Systems | 2009

A simulation approach to evaluating design decisions in MapReduce setups

Guanying Wang; Ali Raza Butt; Prashant Pandey; Karan Gupta

MapReduce has emerged as a model of choice for supporting modern data-intensive applications. The model is easy to use and promising in reducing time-to-solution. It is also a key enabler for cloud computing, which provides transparent and flexible access to a large number of compute, storage, and networking resources. Setting up and operating a large MapReduce cluster entails careful evaluation of various design choices and run-time parameters to achieve high efficiency. However, this design space has not been explored in detail. In this paper, we adopt a simulation approach to systematically understanding the performance of MapReduce setups. The resulting simulator, MRPerf, captures such aspects of these setups as node, rack and network configurations, disk parameters and performance, data layout and application I/O characteristics, among others, and uses this information to predict expected application performance. Specifically, we use MRPerf to explore the effect of several component inter-connect topologies, data locality, and software and hardware failures on overall application performance. MRPerf allows us to quantify the effect of these factors, and thus can serve as a tool for optimizing existing MapReduce setups as well as designing new ones.
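A back-of-the-envelope sketch in the spirit of what such a simulator estimates: map-phase duration as a function of cluster parameters and data locality. All function names and numbers below are illustrative assumptions, not MRPerf's actual model or API.

```python
# Hypothetical analytical model: map-phase time from disk/network rates
# and the fraction of map tasks that read their input from a local disk.

def map_phase_time(input_gb, nodes, map_slots_per_node,
                   disk_mbps, net_mbps, locality):
    """Estimate map-phase duration (seconds) for a MapReduce job.

    locality: fraction of map tasks reading from the local disk;
    non-local tasks are bottlenecked by the slower of disk and network.
    """
    data_mb = input_gb * 1024
    remote_rate = min(disk_mbps, net_mbps)
    slots = nodes * map_slots_per_node
    # Effective per-slot read rate, blended across local and remote reads.
    per_slot = locality * disk_mbps + (1 - locality) * remote_rate
    return data_mb / (slots * per_slot)

# Better locality-aware scheduling shortens the map phase:
fast = map_phase_time(1024, 40, 2, disk_mbps=80, net_mbps=40, locality=0.9)
slow = map_phase_time(1024, 40, 2, disk_mbps=80, net_mbps=40, locality=0.3)
assert fast < slow
```

Even this toy model captures why the paper's locality and topology experiments matter: once locality drops, the network link, not the disk, sets the effective read rate.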


ACM Workshop on Large-Scale System and Application Performance | 2009

Using realistic simulation for performance analysis of MapReduce setups

Guanying Wang; Ali Raza Butt; Prashant Pandey; Karan Gupta

Recently, there has been a huge growth in the amount of data processed by enterprises and the scientific computing community. Two promising trends ensure that applications will be able to deal with ever-increasing data volumes: first, the emergence of cloud computing, which provides transparent access to a large number of compute, storage, and networking resources; and second, the development of the MapReduce programming model, which provides a high-level abstraction for data-intensive computing. However, the design space of these systems has not been explored in detail. Specifically, the impact of various design choices and run-time parameters of a MapReduce system on application performance remains an open question. To this end, we embarked on systematically understanding the performance of MapReduce systems, but soon realized that understanding the effects of parameter tweaking in a large-scale setup with many variables was impractical. Consequently, in this paper, we present the design of an accurate MapReduce simulator, MRPerf, for facilitating exploration of the MapReduce design space. MRPerf captures various aspects of a MapReduce setup, and uses this information to predict expected application performance. In essence, MRPerf can serve as a design tool for MapReduce infrastructure, and as a planning tool for making MapReduce deployment far easier via a reduction in the number of parameters that currently have to be hand-tuned using rules of thumb. Our validation of MRPerf using data from medium-scale production clusters shows that it is able to predict application performance accurately, and thus can be a useful tool in enabling cloud computing. Moreover, an initial application of MRPerf to our test clusters running Hadoop revealed a performance bottleneck whose fix resulted in up to 28.05% performance improvement.


International Parallel and Distributed Processing Symposium | 2012

SAHAD: Subgraph Analysis in Massive Networks Using Hadoop

Zhao Zhao; Guanying Wang; Ali Raza Butt; Maleq Khan; V. S. Anil Kumar; Madhav V. Marathe

Relational subgraph analysis, e.g., finding labeled subgraphs in a network that are isomorphic to a template, is a key problem in many graph-related applications. It is computationally challenging for large networks and complex templates. In this paper, we develop SAHAD, an algorithm for relational subgraph analysis using Hadoop, in which the subgraph is in the form of a tree. SAHAD is able to solve a variety of problems closely related to subgraph isomorphism, including counting labeled/unlabeled subgraphs, finding supervised motifs, and computing graphlet frequency distributions. We prove that the worst-case work complexity of SAHAD is asymptotically very close to that of the best sequential algorithm. On a mid-size cluster with about 40 compute nodes, SAHAD scales to networks with up to 9 million nodes and a quarter billion edges, and templates with up to 12 nodes. To the best of our knowledge, SAHAD is the first such Hadoop-based subgraph/subtree analysis algorithm, and it performs significantly better than prior approaches for very large graphs and templates. Another unique aspect is that SAHAD runs quite easily on Amazon EC2, without the need for any system-level optimization.


International Conference on Big Data | 2013

On the use of shared storage in shared-nothing environments

K R Krish; Aleksandr Khasymski; Guanying Wang; Ali Raza Butt; Gaurav Makkar

Shared-nothing environments, exemplified by systems such as MapReduce and Hadoop, employ node-local storage to achieve high scalability. The exponential growth in application datasets, however, demands ever higher I/O throughput and disk capacity. Simply equipping individual nodes in a Hadoop cluster with more disks is not scalable, as it increases the per-node cost, increases the probability of storage failure at the node, and worsens node-failure recovery times. To this end, we propose dividing a Hadoop rack into several (small) sub-racks, and consolidating the disks of a sub-rack's compute nodes into a separate shared Localized Storage Node (LSN) within the sub-rack. Such a shared LSN is easier to manage and provision, and can offer an economically better solution by employing fewer disks overall at the LSN than the total across the sub-rack's individual nodes, while still achieving high I/O performance. In this paper, we provide a quantitative study of the impact of shared storage in Hadoop clusters. We run several typical Hadoop applications on a medium-sized cluster and via simulations. Our evaluation shows that: (i) staggered workloads allow our design to support the same number of compute nodes at comparable or better throughput using fewer total disks than in the node-local case, thus providing more efficient resource utilization; (ii) the impact of lost locality can be mitigated by better provisioning the LSN-node network interconnect and the number of disks in an LSN; and (iii) consolidating disks into an LSN is a viable and efficient alternative to the extant node-local storage design. Finally, we show that the LSN-based design can deliver up to 39% performance improvement over standard Hadoop.
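A hypothetical provisioning estimate of the LSN idea: with staggered workloads, a sub-rack's peak aggregate I/O demand stays below the sum of per-node peaks, so a shared LSN can meet it with fewer disks. All parameters below are illustrative assumptions, not measurements from the paper.

```python
# Compare node-local vs consolidated-LSN disk counts for one sub-rack.

import math

def disks_needed(nodes, disks_per_node, peak_node_mbps,
                 disk_mbps, overlap):
    """Return (node-local disk count, LSN disk count) for a sub-rack.

    overlap: fraction of nodes whose I/O peaks coincide (1.0 = worst case).
    """
    node_local = nodes * disks_per_node
    peak_demand = nodes * overlap * peak_node_mbps
    lsn = math.ceil(peak_demand / disk_mbps)
    return node_local, lsn

local, lsn = disks_needed(nodes=8, disks_per_node=2,
                          peak_node_mbps=100, disk_mbps=120, overlap=0.5)
# With only half the nodes peaking together, the LSN needs 4 disks vs 16.
```

This is the arithmetic behind claim (i) above: the savings come entirely from the `overlap` term, which is why staggered workloads are the favorable case.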


International Conference on Green Computing | 2010

A light-weight approach to reducing energy management delays in disks

Guanying Wang; Ali Raza Butt; Chris Gniady; Puranjoy Bhattacharjee

Today's enterprise computing systems routinely employ a large number of computers for tasks ranging from supporting daily business operations to mission-critical back-end applications. These computers consume a lot of energy, whose monetary cost accounts for a significant portion of an enterprise's operating budget. Consequently, enterprises employ energy-saving techniques such as turning machines off overnight and dynamic energy management during business hours. Unfortunately, dynamic energy management, especially that for disks, introduces delays when an accessed disk is in a low-power state and needs to be brought into an active state. Existing techniques mainly focus on reducing energy consumption and do not take advantage of enterprise-wide resources to mitigate the associated delays. Thus, system designers face a critical trade-off: saving energy reduces operating costs but may increase the delays exposed to users; conversely, reducing access latencies and making the system more responsive may preclude energy management techniques. In this paper, we propose System-wide Alternative Retrieval of Data (SARD), which exploits the large number of machines in an enterprise environment to transparently retrieve binaries from other nodes, thus avoiding access delays when the local disk is in a low-power mode. SARD uses a software-based approach to reduce spin-up delays while eliminating the need for major operating system changes, custom buffering, or shared memory infrastructure. The main goal of SARD is not to increase energy savings, but rather to reduce the delays associated with energy management techniques, which will encourage users to employ such techniques more frequently and realize the energy savings. Our evaluation of SARD using trace-driven simulations as well as an actual implementation in a real system shows over 71% average reduction in delays associated with energy management. Moreover, SARD achieves an additional 5.1% average reduction in energy consumption for typical desktop applications compared to the widely-used timeout-based disk energy management.
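A minimal sketch of SARD's core idea: when the local disk is in a low-power state, fetch a requested binary from a peer that has it, instead of paying the spin-up delay. The `Disk` and `Peer` stubs and the lookup interface are stand-ins invented here; none of this is SARD's actual implementation.

```python
# Prefer peer retrieval while the local disk sleeps; spin up only as
# a last resort when no peer holds the requested binary.

class Disk:
    def __init__(self, spun_up):
        self._up = spun_up
        self.spin_ups = 0          # count of delay-inducing spin-ups

    def is_spun_up(self):
        return self._up

    def spin_up(self):
        self._up = True
        self.spin_ups += 1

    def read(self, path):
        return b"local:" + path.encode()

class Peer:
    def __init__(self, files):
        self.files = files         # path -> bytes held by this peer

    def fetch(self, path):
        return self.files.get(path)

def read_binary(path, local_disk, peers):
    """Return file bytes, preferring peers while the local disk sleeps."""
    if local_disk.is_spun_up():
        return local_disk.read(path)
    for peer in peers:             # identical software deployments make
        data = peer.fetch(path)    # peer copies byte-for-byte equivalent
        if data is not None:
            return data
    # No peer had it: accept the spin-up delay as a last resort.
    local_disk.spin_up()
    return local_disk.read(path)
```

The design relies on the observation stated in the abstract: enterprise machines share largely identical software deployments, so a binary fetched from a peer is equivalent to the local copy.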


Modeling, Analysis, and Simulation of Computer and Telecommunication Systems | 2009

Mitigating disk energy management delays by exploiting peer memory

Guanying Wang; Ali Raza Butt; Chris Gniady

Modern enterprises employ hundreds of workstations for daily business operations, which consume a lot of energy and thus incur significant operating costs. To reduce such costs, dynamic energy management is often employed. However, dynamic energy management, especially that for disks, introduces delays when an accessed disk is in a low-power state and needs to be brought into an active state. In this paper, we propose System-wide Alternative Retrieval of Data (SARD), which exploits the large number of machines in an enterprise environment to transparently retrieve binaries from other nodes, thus avoiding access delays when the local disk is in a low-power mode. SARD uses a software-based approach to reduce spin-up delays while eliminating the need for major operating system changes, custom buffering, or shared memory infrastructure.


International Conference on Parallel and Distributed Systems | 2013

Towards Improving MapReduce Task Scheduling Using Online Simulation Based Predictions

Guanying Wang; Aleksandr Khasymski; K R Krish; Ali Raza Butt

MapReduce is the model of choice for processing emerging big-data applications, and is facing an ever-increasing demand for higher efficiency. In this context, we propose a novel task scheduling scheme that uses current task and system state information to drive online simulations concurrently within Hadoop, and predict with high accuracy future events, e.g., when a job will complete, or when task-specific data-local nodes will be available. These predictions can then be used to make more efficient resource scheduling decisions. Our framework consists of two components: (i) a Task Predictor that predicts task-level execution times based on historical data for tasks of the same type, and (ii) a Job Simulator that instantiates the real task scheduler in a simulated environment and predicts the expected scheduling decisions for all the tasks comprising a MapReduce job. Evaluation shows that our framework can achieve high prediction accuracy - 95% of the predicted task execution times are within 10% of the actual times - with negligible overhead (1.29%). Finally, we also present two realistic use cases, job data prefetching and a multi-strategy dynamic scheduler, which can benefit from integration of our prediction framework in Hadoop.
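The Task Predictor estimates a task's runtime from the history of tasks of the same type. A simple stand-in for that idea is the mean of recent samples keyed by (job type, task type); the keying scheme and window size here are assumptions, not the paper's actual predictor.

```python
# Sliding-window mean predictor over historical task execution times.

from collections import defaultdict, deque

class TaskPredictor:
    def __init__(self, window=20):
        # Keep only the most recent `window` samples per task type.
        self.history = defaultdict(lambda: deque(maxlen=window))

    def record(self, job_type, task_type, seconds):
        self.history[(job_type, task_type)].append(seconds)

    def predict(self, job_type, task_type, default=60.0):
        samples = self.history[(job_type, task_type)]
        return sum(samples) / len(samples) if samples else default

p = TaskPredictor()
for t in (10.0, 12.0, 11.0):
    p.record("wordcount", "map", t)
print(p.predict("wordcount", "map"))  # prints 11.0
```

A scheduler driving an online simulation would query such a predictor for each pending task to estimate when slots and data-local nodes become free.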


Journal of Parallel and Distributed Computing | 2013

On reducing energy management delays in disks

K R Krish; Guanying Wang; Puranjoy Bhattacharjee; Ali Raza Butt; Chris Gniady

Enterprise computing systems consume a large amount of energy, the cost of which contributes significantly to the operating budget. Consequently, dynamic energy management techniques are prevalent. Unfortunately, dynamic energy management for disks imposes delays associated with powering up the disks from a low-power state. System designers face a critical trade-off: saving energy reduces operating costs but may increase delays; conversely, reduced access latency makes the systems more responsive but may preclude energy management. In this paper, we propose a System-wide Alternative Retrieval of Data (SARD) scheme. SARD exploits the similarity in software deployment and configuration in enterprise computers to retrieve binaries transparently from other nodes, thus avoiding access delays when the local disk is in a low-power state. SARD uses a software-based approach to reduce spin-up delays while eliminating custom buffering, shared memory infrastructure, or the need for major changes in the operating system. SARD achieves over 71% reduction in delays in trace-driven simulations and in an actual implementation, which will encourage users to utilize energy management techniques more frequently. SARD also achieves an additional 5.1% average reduction in energy consumption for typical desktop applications compared to the widely-used timeout-based disk energy management.


Modeling, Analysis, and Simulation of Computer and Telecommunication Systems | 2011

Towards Synthesizing Realistic Workload Traces for Studying the Hadoop Ecosystem

Guanying Wang; Ali Raza Butt; Henry M. Monti; Karan Gupta


Archive | 2012

Evaluating MapReduce system performance: a simulation approach

Ali Raza Butt; Guanying Wang

Collaboration


Dive into Guanying Wang's collaboration.

Top Co-Authors