Publication


Featured research published by K R Krish.


cluster computing and the grid | 2014

hatS: A Heterogeneity-Aware Tiered Storage for Hadoop

K R Krish; Ali Anwar; Ali Raza Butt

Hadoop has become the de facto large-scale data processing framework for modern analytics applications. A major obstacle to sustaining high performance and scalability in Hadoop is managing data growth while meeting ever-higher I/O demands. To this end, a promising trend in storage systems is to utilize hybrid and heterogeneous devices, such as Solid State Disks (SSDs), RAM disks, and Network Attached Storage (NAS), which can help achieve very high I/O rates at acceptable cost. However, the Hadoop Distributed File System (HDFS) is unable to exploit such heterogeneous storage, because it assumes the underlying devices are homogeneous storage blocks and disregards their individual I/O characteristics, which leads to performance degradation. In this paper, we present hatS, a Heterogeneity-Aware Tiered Storage, which is a novel redesign of HDFS into a multi-tiered storage system that seamlessly integrates heterogeneous storage technologies into the Hadoop ecosystem. hatS also provides data placement and retrieval policies that improve the utilization of the storage devices based on their characteristics, such as I/O throughput and capacity. We evaluate hatS using an actual implementation on a medium-sized cluster consisting of HDDs and two types of SSDs (SATA SSD and PCIe SSD). Experiments show that hatS achieves 32.6% higher read bandwidth, on average, than HDFS for the test Hadoop jobs (such as Grep and TestDFSIO) by directing 64% of the I/O accesses to the SSD tiers. We also evaluate our approach with trace-driven simulations using synthetic Facebook workloads, and show that compared to the standard setup, hatS improves the average I/O rate by 36%, which results in a 26% improvement in job completion time.
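
The abstract does not spell out the placement policy itself. As a rough illustration of how a tiered file system might weigh per-device I/O throughput against remaining capacity, here is a minimal sketch; the tier names, numbers, and the `place_block` helper are hypothetical and are not hatS's actual implementation.

```python
# Illustrative sketch of heterogeneity-aware block placement (not hatS's actual policy).
# Each tier is described by measured throughput (MB/s) and remaining capacity (GB);
# a block is steered to the fastest tier that still has room, falling back to slower tiers.

from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    throughput_mbps: float   # measured sequential I/O throughput
    free_gb: float           # remaining capacity

def place_block(tiers, block_gb):
    """Pick the fastest tier with enough free space; return None if the block fits nowhere."""
    for tier in sorted(tiers, key=lambda t: t.throughput_mbps, reverse=True):
        if tier.free_gb >= block_gb:
            tier.free_gb -= block_gb
            return tier.name
    return None

# Hypothetical tiers loosely modeled on the paper's testbed (PCIe SSD, SATA SSD, HDD).
tiers = [Tier("hdd", 120, 4000), Tier("sata_ssd", 450, 400), Tier("pcie_ssd", 1200, 200)]
print(place_block(tiers, block_gb=0.128))   # -> "pcie_ssd" while its capacity lasts
```

A real tiered HDFS would additionally handle replica placement and movement between tiers as capacity fills, which this sketch omits.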


modeling, analysis, and simulation on computer and telecommunication systems | 2014

φSched: A Heterogeneity-Aware Hadoop Workflow Scheduler

K R Krish; Ali Anwar; Ali Raza Butt

Enterprise Hadoop applications now routinely comprise complex workflows that are managed by specialized workflow schedulers such as Oozie. These schedulers assume cluster resources to be homogeneous, and data locality is often the only scheduling constraint considered. However, the introduction of specialized architectures and regular system upgrades mean that Hadoop data center hardware is becoming increasingly heterogeneous, in that a data center may have several clusters, each with different characteristics. The workflow scheduler is not aware of such heterogeneity, and thus cannot ensure that a cluster selected based on data locality is also suitable for supporting the jobs efficiently in terms of execution time and resource consumption. In this paper, we adopt a quantitative approach where we first study the detailed behavior of various representative Hadoop applications running on four different hardware configurations. Next, we incorporate this information into a hardware-aware scheduler, φSched, to improve the resource-application match. To ensure that job-associated data is available locally (or nearby) to a cluster in a multi-cluster deployment, we configure a single Hadoop Distributed File System (HDFS) instance across all the participating clusters. We also design and implement region-aware data placement and retrieval for HDFS in order to reduce the network overhead and achieve cluster-level data locality. We evaluate our approach using experiments on Amazon EC2 with four clusters of eight homogeneous nodes each, where each cluster has a different hardware configuration. We find that φSched's optimized placement of applications across the test clusters reduces the execution time of the test applications by 18.7%, on average, compared to extant hardware-oblivious scheduling. Moreover, our HDFS enhancement increases the I/O throughput by up to 23% and the average I/O rate by up to 26% for the TestDFSIO benchmark.
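
To make the idea of a hardware-aware cluster choice concrete, the following is a minimal sketch assuming the scheduler already holds profiled per-(application, cluster) runtimes; the profile numbers, cluster names, and the locality penalty are hypothetical, not φSched's actual algorithm.

```python
# Minimal sketch of hardware-aware cluster selection (illustrative, not the paper's implementation).
# Given profiled execution times of an application on each hardware configuration, pick the
# cluster with the lowest expected runtime, mildly penalizing clusters that hold none of the input.

profiles = {  # hypothetical profiled runtimes in seconds, per (application, cluster)
    ("sort", "c3.large-cluster"): 820,
    ("sort", "m3.xlarge-cluster"): 610,
    ("grep", "c3.large-cluster"): 240,
    ("grep", "m3.xlarge-cluster"): 310,
}

def pick_cluster(app, clusters, data_local, locality_penalty=1.2):
    """Return the cluster with the lowest penalized runtime for `app`."""
    def cost(cluster):
        runtime = profiles[(app, cluster)]
        return runtime if cluster in data_local else runtime * locality_penalty
    return min(clusters, key=cost)

print(pick_cluster("sort", ["c3.large-cluster", "m3.xlarge-cluster"],
                   data_local={"c3.large-cluster"}))
# -> "m3.xlarge-cluster": the faster hardware wins even after the locality penalty
```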


international conference on cluster computing | 2014

On the use of microservers in supporting Hadoop applications

Ali Anwar; K R Krish; Ali Raza Butt

The use of economical, low-power microservers comprising embedded CPUs is on the rise in supporting a myriad of applications. State-of-the-art microservers can already match the performance of low-end traditional servers, and have been advocated as an energy-efficient alternative computing substrate for data centers as well. In this paper, we explore whether clusters comprising microservers can support the popular Hadoop platform. We conduct a quantitative study of six representative Hadoop applications on five hardware configurations. To compare the different clusters, we also define a comprehensive metric, PerfEC, which unifies the performance, energy consumption, and the acquisition and operating costs of the applications, and helps identify appropriate clusters for Hadoop applications. Experiments on our test clusters suggest that for applications such as TeraSort, RandomWriter, and Grep, microservers offer up to two orders of magnitude better efficiency in terms of PerfEC than traditional clusters. Similarly, a 3000-node cluster simulation driven by a real-world trace from Facebook shows that, on average, the studied microservers can match the performance of standard servers while providing up to 31% energy savings at only 60% of the acquisition cost. We also compare PerfEC to the extant Total Cost of Ownership (TCO) metric, and find that our approach better captures the trade-offs involved.
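
The abstract does not give PerfEC's formula. Purely to make the idea of a single score that unifies performance, energy, and cost concrete, here is one plausible, assumed composition; the function, weights, and numbers are illustrative only and are not the paper's definition of PerfEC.

```python
# Illustrative efficiency score combining throughput, energy, and amortized acquisition cost
# (an assumed formula for exposition only; not the paper's PerfEC definition).

def efficiency_score(jobs_per_hour, avg_power_watts, hours,
                     energy_price_per_kwh, acquisition_cost, amortized_hours):
    """Work delivered per dollar of energy plus amortized acquisition cost."""
    energy_cost = (avg_power_watts / 1000.0) * hours * energy_price_per_kwh
    capital_cost = acquisition_cost * (hours / amortized_hours)
    return (jobs_per_hour * hours) / (energy_cost + capital_cost)

# Hypothetical comparison over one month: a microserver node vs. a traditional server node,
# both amortized over three years.
micro = efficiency_score(2.0, 35, 720, 0.12, 600, 3 * 365 * 24)
server = efficiency_score(6.0, 300, 720, 0.12, 3000, 3 * 365 * 24)
print(f"microserver: {micro:.1f} jobs/$, traditional: {server:.1f} jobs/$")
```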


international conference on big data | 2014

VENU: Orchestrating SSDs in Hadoop storage

K R Krish; M. Safdar Iqbal; Ali Raza Butt

A major obstacle in sustaining high performance and scalability in the Hadoop data processing framework is managing the growing data and the need for very high I/O rates. Solid State Disks (SSDs) are promising and are being employed alongside the slower hard disk drives (HDDs) in emerging storage architectures. However, we observed that SSDs are not always a cost-effective option for all Hadoop workloads, and there is a critical need to identify the use cases where SSDs can help. To this end, we present VENU, a dynamic data management system for Hadoop. VENU aims to improve overall I/O throughput via effective use of SSDs as a cache for the slower HDDs, not for all data, but only for the workloads that are expected to benefit from SSDs. In addition, we design placement and retrieval schemes to use the SSD cache efficiently. We evaluate our implementation of VENU on a medium-sized cluster and show that it achieves an 11% improvement in application completion times when 10% of the available storage is provided by SSDs.
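
As a rough illustration of a benefit-aware cache admission decision of the kind described above, here is a minimal sketch; the field names, threshold, and workloads are hypothetical assumptions, not VENU's actual policy.

```python
# Sketch of benefit-aware SSD cache admission (illustrative; not VENU's actual policy).
# Only workloads whose predicted speedup on SSD exceeds a threshold are admitted,
# and only while the SSD cache has room; everything else stays on HDD.

def admit_to_ssd(workload, ssd_free_gb, speedup_threshold=1.3):
    """Return True if the workload's input should be cached on SSD."""
    predicted_speedup = workload["hdd_runtime_s"] / workload["ssd_runtime_s"]
    fits = workload["input_gb"] <= ssd_free_gb
    return fits and predicted_speedup >= speedup_threshold

# Hypothetical profiled workloads: an I/O-bound sort benefits, a CPU-bound wordcount does not.
sort_job = {"input_gb": 40, "hdd_runtime_s": 900, "ssd_runtime_s": 560}
wordcount = {"input_gb": 40, "hdd_runtime_s": 700, "ssd_runtime_s": 650}
print(admit_to_ssd(sort_job, ssd_free_gb=100))   # True: ~1.6x speedup
print(admit_to_ssd(wordcount, ssd_free_gb=100))  # False: marginal speedup
```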


international conference on cluster computing | 2013

AptStore: Dynamic storage management for Hadoop

K R Krish; Aleksandr Khasymski; Ali Raza Butt; Sameer Tiwari; Milind Bhandarkar

Typical Hadoop setups employ Direct Attached Storage (DAS) with compute nodes and uniform replication of data to sustain high I/O throughput and fault tolerance. However, not all data is accessed at the same time or rate. Thus, if a large replication factor is used to support higher throughput for popular data, it wastes storage by unnecessarily replicating unpopular data as well. Conversely, if less replication is used to conserve storage for the unpopular data, it means fewer replicas for even popular data and thus lower I/O throughput. We present AptStore, a dynamic data management system for Hadoop, which aims to improve overall I/O throughput while reducing storage cost. We design a tiered storage that uses standard DAS for popular data to sustain high I/O throughput, and network-attached enterprise filers for cost-effective, fault-tolerant, but lower-throughput storage of unpopular data. We design a file Popularity Predictor (PP) that analyzes file system audit logs and predicts the appropriate storage policy for each file, and uses this information for transparent data movement between tiers. Our evaluation of AptStore on a real cluster shows a 21.3% improvement in application execution time over standard Hadoop, while trace-driven simulations show a 23.7% increase in read throughput and a 43.4% reduction in the storage capacity requirement of the system.
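
The paper's Popularity Predictor is driven by file system audit logs. The sketch below illustrates one way such a predictor could turn logged access times into a tiering decision; the exponential decay, thresholds, and replica counts are assumptions for illustration, not AptStore's actual model.

```python
# Sketch of audit-log-driven file tiering (illustrative; not AptStore's actual predictor).
# Popularity is an exponentially decayed access count; hot files stay on DAS with extra
# replicas, cold files are demoted to the network-attached filer.

import math, time

def popularity(access_times, now=None, half_life_s=86_400):
    """Decayed access count: recent reads weigh more than old ones."""
    now = time.time() if now is None else now
    decay = math.log(2) / half_life_s
    return sum(math.exp(-decay * (now - t)) for t in access_times)

def storage_policy(access_times, hot_threshold=5.0):
    score = popularity(access_times)
    return ("DAS", 3) if score >= hot_threshold else ("filer", 1)  # (tier, replica count)

now = time.time()
print(storage_policy([now - 60 * i for i in range(10)]))  # recently hot file -> ("DAS", 3)
print(storage_policy([now - 86_400 * 30]))                # cold file        -> ("filer", 1)
```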


network aware data management | 2014

Towards energy awareness in Hadoop

K R Krish; M. Safdar Iqbal; M. Mustafa Rafique; Ali Raza Butt

With the rise in the use of data centers comprising commodity clusters for data-intensive applications, the energy efficiency of these setups is becoming a paramount concern for data center operators. Moreover, applications developed for the Hadoop framework, which has become the de facto implementation of MapReduce, now comprise complex workflows that are managed by specialized workflow schedulers, such as Oozie. These schedulers assume cluster resources to be homogeneous and often consider data locality to be the only scheduling constraint. However, this is increasingly not the case in modern data centers. The addition of low-power computing devices and regular hardware upgrades have made heterogeneity the norm, in that clusters are now composed of several logical sub-clusters, each with its own performance and energy profile. In this paper, we present oSched, a workflow scheduler that profiles the performance and energy characteristics of applications on each hardware sub-cluster in a heterogeneous cluster in order to improve the application-resource match while ensuring energy efficiency and performance-related Service Level Agreement (SLA) goals. oSched borrows from our earlier work, φSched, a hardware-aware scheduler that improves the resource-application match to improve application performance. We evaluate oSched on three clusters with different hardware configurations and energy profiles, where each sub-cluster comprises five homogeneous nodes. Our evaluation of oSched shows that application performance and power characteristics vary significantly across different hardware configurations. We show that hardware-aware scheduling can perform 12.8% faster, while saving 21% more power, than hardware-oblivious scheduling for the studied applications.
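
A minimal sketch of the energy-aware selection idea, assuming per-(application, sub-cluster) runtime and power profiles are already available: among the sub-clusters that meet an SLA deadline, pick the one consuming the least energy. The profiles, names, and fallback rule below are hypothetical, not the paper's measurements or algorithm.

```python
# Sketch of energy-aware sub-cluster selection under an SLA deadline
# (illustrative; profiles and names are hypothetical).

profiles = {  # per (application, sub-cluster): (runtime_s, avg_power_w)
    ("pagerank", "atom-subcluster"): (1400, 180),
    ("pagerank", "xeon-subcluster"): (600, 900),
}

def pick_subcluster(app, subclusters, sla_deadline_s):
    """Among sub-clusters that meet the deadline, pick the one using the least energy."""
    feasible = [c for c in subclusters if profiles[(app, c)][0] <= sla_deadline_s]
    if not feasible:  # nothing meets the SLA: fall back to the fastest option
        return min(subclusters, key=lambda c: profiles[(app, c)][0])
    return min(feasible, key=lambda c: profiles[(app, c)][0] * profiles[(app, c)][1])

print(pick_subcluster("pagerank", ["atom-subcluster", "xeon-subcluster"], sla_deadline_s=1800))
# -> "atom-subcluster": both meet the deadline, but 1400 s * 180 W < 600 s * 900 W
```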


cluster computing and the grid | 2016

On Efficient Hierarchical Storage for Big Data Processing

K R Krish; Bharti Wadhwa; M. Safdar Iqbal; M. Mustafa Rafique; Ali Raza Butt

A promising trend in storage management for big data frameworks, such as Hadoop and Spark, is the emergence of heterogeneous and hybrid storage systems that employ different types of storage devices, e.g., SSDs and RAM disks, alongside traditional HDDs. However, scheduling data accesses or requests to an appropriate storage device is non-trivial and depends on several factors, such as data locality, device performance, and the application's compute and storage resource utilization. To this end, we present DUX, an application-attuned dynamic data management system for data processing frameworks, which aims to improve overall application I/O throughput by using SSDs only for workloads that are expected to benefit from them, rather than the extant approach of storing a fraction of the overall workloads in SSDs. The novelty of DUX lies in profiling application performance on SSDs and HDDs, analyzing the resulting I/O behavior, and considering the available SSDs at runtime to dynamically place data in an appropriate storage tier. Evaluation of DUX with trace-driven simulations using synthetic Facebook workloads shows that even when using 5.5× fewer SSDs than an SSD-only solution, DUX incurs only a small (5%) performance overhead, and thus offers affordable and efficient storage tier management.
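
To illustrate what "application-attuned" placement under limited SSD capacity could look like, here is a small sketch that greedily grants SSD space to the applications with the highest profiled speedup per GB of input; the greedy rule, field names, and numbers are assumptions for illustration, not DUX's actual algorithm.

```python
# Sketch of application-attuned SSD allocation (illustrative; not DUX's actual algorithm).
# With limited SSD capacity, greedily grant SSD placement to the applications with the
# highest profiled speedup per GB of input; everything else falls back to HDD.

def assign_tiers(apps, ssd_capacity_gb):
    """apps: dicts with profiled 'speedup' on SSD and 'input_gb'. Returns name -> tier."""
    placement, free = {}, ssd_capacity_gb
    for app in sorted(apps, key=lambda a: a["speedup"] / a["input_gb"], reverse=True):
        if app["speedup"] > 1.0 and app["input_gb"] <= free:
            placement[app["name"]], free = "ssd", free - app["input_gb"]
        else:
            placement[app["name"]] = "hdd"
    return placement

apps = [  # hypothetical profiled workloads
    {"name": "sort", "speedup": 1.6, "input_gb": 80},
    {"name": "grep", "speedup": 1.4, "input_gb": 20},
    {"name": "wordcount", "speedup": 1.05, "input_gb": 60},
]
print(assign_tiers(apps, ssd_capacity_gb=100))
# -> {'grep': 'ssd', 'sort': 'ssd', 'wordcount': 'hdd'}
```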


international conference on big data | 2013

On the use of shared storage in shared-nothing environments

K R Krish; Aleksandr Khasymski; Guanying Wang; Ali Raza Butt; Gaurav Makkar

Shared-nothing environments, exemplified by systems such as MapReduce and Hadoop, employ node-local storage to achieve high scalability. The exponential growth in application datasets, however, demands ever-higher I/O throughput and disk capacity. Simply equipping individual nodes in a Hadoop cluster with more disks is not scalable, as it increases the per-node cost, increases the probability of storage failure at the node, and worsens node failure recovery times. To this end, we propose dividing a Hadoop rack into several (small) sub-racks, and consolidating the disks of a sub-rack's compute nodes into a separate shared Localized Storage Node (LSN) within the sub-rack. Such a shared LSN is easier to manage and provision, and can offer an economically better solution by employing fewer disks overall at the LSN than the total across the sub-rack's individual nodes, while still achieving high I/O performance. In this paper, we provide a quantitative study of the impact of shared storage in Hadoop clusters. We run several typical Hadoop applications on a medium-sized cluster and in simulations. Our evaluation shows that: (i) staggered workloads allow our design to support the same number of compute nodes at comparable or better throughput using fewer total disks than in the node-local case, thus providing more efficient resource utilization; (ii) the impact of lost locality can be mitigated by better provisioning the LSN-node network interconnect and the number of disks in an LSN; and (iii) the consolidation of disks into an LSN is a viable and efficient alternative to the extant node-local storage design. Finally, we show that the LSN-based design can deliver up to 39% performance improvement over standard Hadoop.
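
The economic argument rests on the observation that, with staggered I/O phases, an LSN only needs to be provisioned for the sub-rack's aggregate demand rather than the sum of per-node peaks. The back-of-the-envelope sketch below makes that arithmetic explicit; every number, including the assumed overlap fraction, is hypothetical and not taken from the paper.

```python
# Back-of-the-envelope sketch of disk consolidation in a sub-rack (illustrative numbers only).
# Node-local: every node is provisioned for its own peak bandwidth.
# LSN: one shared node is provisioned for the sub-rack's aggregate demand, which is lower
# than the sum of peaks because the nodes' I/O phases are staggered.

import math

def disks_node_local(nodes, peak_mbps_per_node, disk_mbps):
    return nodes * math.ceil(peak_mbps_per_node / disk_mbps)

def disks_lsn(nodes, peak_mbps_per_node, disk_mbps, overlap_fraction=0.5):
    # overlap_fraction: share of nodes assumed to hit peak I/O at the same time
    aggregate_mbps = nodes * peak_mbps_per_node * overlap_fraction
    return math.ceil(aggregate_mbps / disk_mbps)

print(disks_node_local(8, 300, 150))                   # 16 disks spread across 8 nodes
print(disks_lsn(8, 300, 150, overlap_fraction=0.5))    # 8 disks in one shared LSN
```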


international conference on parallel and distributed systems | 2013

Towards Improving MapReduce Task Scheduling Using Online Simulation Based Predictions

Guanying Wang; Aleksandr Khasymski; K R Krish; Ali Raza Butt

MapReduce is the model of choice for processing emerging big-data applications, and is facing an ever-increasing demand for higher efficiency. In this context, we propose a novel task scheduling scheme that uses current task and system state information to drive online simulations concurrently within Hadoop, and predict with high accuracy future events, e.g., when a job would complete, or when task-specific data-local nodes would be available. These predictions can then be used to make more efficient resource scheduling decisions. Our framework consists of two components: (i) Task Predictor that predicts task-level execution times based on historical data of the same type of tasks, and (ii) Job Simulator that instantiates the real task scheduler in a simulated environment, and predicts expected scheduling decisions for all the tasks comprising a MapReduce job. Evaluation shows that our framework can achieve high prediction accuracy - 95% of the predicted task execution times are within 10% of the actual times - with negligible overhead (1.29%). Finally, we also present two realistic use cases, job data prefetching and a multi-strategy dynamic scheduler, which can benefit from integration of our prediction framework in Hadoop.
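
The sketch below illustrates the Task Predictor idea of estimating a task's runtime from the history of completed tasks of the same type; the class, the seconds-per-MB rate model, and the window size are assumptions made for illustration, not the paper's implementation.

```python
# Sketch of history-based task runtime prediction (illustrative; not the paper's Task Predictor).
# A task's expected runtime is estimated from recently completed tasks of the same type,
# scaled by input size; a job simulator could then replay the scheduler with these estimates.

from collections import defaultdict, deque

class TaskPredictor:
    def __init__(self, window=20):
        self.history = defaultdict(lambda: deque(maxlen=window))  # (job, phase) -> recent rates

    def record(self, job, phase, input_mb, runtime_s):
        self.history[(job, phase)].append(runtime_s / max(input_mb, 1))

    def predict(self, job, phase, input_mb, default_rate=0.5):
        rates = self.history[(job, phase)]
        rate = sum(rates) / len(rates) if rates else default_rate  # seconds per MB
        return rate * input_mb

p = TaskPredictor()
for size, rt in [(128, 60), (128, 64), (256, 120)]:
    p.record("sort", "map", size, rt)
print(round(p.predict("sort", "map", input_mb=128), 1))  # roughly 61 s for a similar map task
```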

Collaboration


Dive into K R Krish's collaborations.

Top Co-Authors

Artur Barczyk
California Institute of Technology

Azher Mughal
California Institute of Technology