Network


Latest external collaborations at the country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Kai Ren is active.

Publication


Featured research published by Kai Ren.


Symposium on Cloud Computing | 2011

YCSB++: benchmarking and performance debugging advanced features in scalable table stores

Swapnil Patil; Milo Polte; Kai Ren; Wittawat Tantisiriroj; Lin Xiao; Julio Lopez; Garth A. Gibson; Adam Fuchs; Billie Rinaldi

Inspired by Google's BigTable, a variety of scalable, semi-structured, weak-semantic table stores have been developed and optimized for different priorities such as query speed, ingest speed, availability, and interactivity. As these systems mature, performance benchmarking will advance from measuring the rate of simple workloads to understanding and debugging the performance of advanced features such as ingest speed-up techniques and function shipping of filters from clients to servers. This paper describes YCSB++, a set of extensions to the Yahoo! Cloud Serving Benchmark (YCSB) to improve performance understanding and debugging of these advanced features. YCSB++ includes multi-tester coordination for increased load and eventual consistency measurement, multi-phase workloads to quantify the consequences of work deferment and the benefits of anticipatory configuration optimization such as B-tree pre-splitting or bulk loading, and abstract APIs for explicit incorporation of advanced features in benchmark tests. To enhance performance debugging, we customized an existing cluster monitoring tool to gather the internal statistics of YCSB++, table stores, system services like HDFS, and operating systems, and to offer easy post-test correlation and reporting of performance behaviors. YCSB++ features are illustrated in case studies of two BigTable-like table stores, Apache HBase and Accumulo, developed to emphasize high ingest rates and fine-grained security.
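
The function-shipping idea above is easiest to see in miniature: instead of shipping every row to the benchmark client and filtering there, the predicate is pushed into the table store so only matching rows cross the network. The sketch below is a minimal illustration of that contrast; the TableStore class and its methods are hypothetical stand-ins, not the actual YCSB++ extension API.

```python
# Minimal sketch of "function shipping": pushing a filter predicate to the
# server side of a table store instead of filtering rows at the client.
# TableStore and its methods are hypothetical, not the real YCSB++ API.

class TableStore:
    def __init__(self, rows):
        # rows: dict mapping row key -> dict of column -> value
        self.rows = rows

    def scan_client_filtered(self, predicate):
        # Baseline: ship every row to the client, filter locally.
        shipped = list(self.rows.items())          # all rows cross the "network"
        return [(k, v) for k, v in shipped if predicate(v)], len(shipped)

    def scan_server_filtered(self, predicate):
        # Function shipping: the predicate runs "inside" the store,
        # so only matching rows are returned to the benchmark client.
        matched = [(k, v) for k, v in self.rows.items() if predicate(v)]
        return matched, len(matched)


if __name__ == "__main__":
    store = TableStore({f"row{i}": {"temp": i % 100} for i in range(10_000)})
    hot = lambda row: row["temp"] > 95

    _, shipped_baseline = store.scan_client_filtered(hot)
    _, shipped_pushed = store.scan_server_filtered(hot)
    print(f"rows shipped without filter pushdown: {shipped_baseline}")
    print(f"rows shipped with filter pushdown:    {shipped_pushed}")
```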


IEEE International Conference on High Performance Computing, Data, and Analytics | 2014

IndexFS: scaling file system metadata performance with stateless caching and bulk insertion

Kai Ren; Qing Zheng; Swapnil Patil; Garth A. Gibson

The growing size of modern storage systems is expected to exceed billions of objects, making metadata scalability critical to overall performance. Many existing distributed file systems only focus on providing highly parallel fast access to file data, and lack a scalable metadata service. In this paper, we introduce a middleware design called IndexFS that adds support to existing file systems such as PVFS, Lustre, and HDFS for scalable high-performance operations on metadata and small files. IndexFS uses a table-based architecture that incrementally partitions the namespace on a per-directory basis, preserving server and disk locality for small directories. An optimized log-structured layout is used to store metadata and small files efficiently. We also propose two client-based storm-free caching techniques: bulk namespace insertion for creation-intensive workloads such as N-N checkpointing, and stateless consistent metadata caching for hot-spot mitigation. By combining these techniques, we have demonstrated IndexFS scaling to 128 metadata servers. Experiments show our out-of-core metadata throughput outperforming existing solutions such as PVFS, Lustre, and HDFS by 50% to two orders of magnitude.
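
To make the per-directory partitioning concrete, the sketch below keeps a small directory on a single metadata server and only spreads it over more servers once it passes a size threshold, in the spirit of the incremental splitting described above. The MetadataPartitioner class, the hash choice, and the threshold are illustrative assumptions, not the IndexFS implementation.

```python
# Sketch of incremental, per-directory namespace partitioning: a small
# directory lives on one metadata server; once it grows past a threshold it
# is spread over progressively more servers. Names and the threshold are
# illustrative assumptions.
import hashlib

NUM_SERVERS = 128
SPLIT_THRESHOLD = 2000   # entries before a directory starts splitting


def _hash(s: str) -> int:
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")


class MetadataPartitioner:
    def __init__(self):
        self.dir_sizes = {}  # directory id -> number of entries

    def server_for(self, dir_id: int, name: str) -> int:
        size = self.dir_sizes.get(dir_id, 0)
        if size < SPLIT_THRESHOLD:
            # Small directory: all entries on one "home" server,
            # preserving server and disk locality.
            return _hash(str(dir_id)) % NUM_SERVERS
        # Large directory: double the number of partitions as the directory
        # keeps growing, up to the full set of servers.
        partitions = min(NUM_SERVERS, 2 ** (size // SPLIT_THRESHOLD).bit_length())
        return (_hash(str(dir_id)) + _hash(name) % partitions) % NUM_SERVERS

    def create(self, dir_id: int, name: str) -> int:
        server = self.server_for(dir_id, name)
        self.dir_sizes[dir_id] = self.dir_sizes.get(dir_id, 0) + 1
        return server


if __name__ == "__main__":
    p = MetadataPartitioner()
    small = {p.create(42, f"file{i}") for i in range(100)}
    print("servers used by a small directory:", len(small))
    large = {p.create(7, f"ckpt{i}") for i in range(50_000)}
    print("servers used by a large directory:", len(large))
```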


Symposium on Cloud Computing | 2015

ShardFS vs. IndexFS: replication vs. caching strategies for distributed metadata management in cloud storage systems

Lin Xiao; Kai Ren; Qing Zheng; Garth A. Gibson

The rapid growth of cloud storage systems calls for fast and scalable namespace processing. While few commercial file systems offer anything better than federating individually non-scalable namespace servers, a recent academic file system, IndexFS, demonstrates scalable namespace processing based on client caching of directory entries and permissions (directory lookup state) with no per-client state in servers. In this paper we explore explicit replication of directory lookup state in all servers as an alternative to caching this information in all clients. Both approaches eliminate most repeated RPCs to different servers needed to resolve hierarchical permission tests. Our realization of server-replicated directory lookup state, ShardFS, employs a novel file-system-specific hybrid of optimistic and pessimistic concurrency control that favors single-object transactions over distributed transactions. Our experimentation suggests that if directory lookup state mutation is a fixed fraction of operations (strong scaling for metadata), server replication does not scale as well as client caching, but if directory lookup state mutation is proportional to the number of jobs rather than the number of processes per job (weak scaling for metadata), then server replication can scale more linearly than client caching and also provide lower 70th-percentile response times.
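
A rough way to see the strong-versus-weak scaling argument is to count RPC-equivalent messages under each strategy: replication pays one RPC per lookup but must apply every lookup-state mutation on all servers, while client caching pays one RPC per mutation but walks the path on a cache miss. The model below is a back-of-the-envelope sketch with made-up constants, not measurements from the paper.

```python
# Back-of-the-envelope RPC model for the replication-vs-caching tradeoff.
# The cost formulas and constants are illustrative assumptions only.

def rpc_cost_replication(lookups, mutations, num_servers):
    # ShardFS-style: lookup state is replicated on every server, so a lookup
    # needs one RPC, but each mutation must be applied on all servers.
    return lookups * 1 + mutations * num_servers


def rpc_cost_client_caching(lookups, mutations, path_depth, miss_rate):
    # IndexFS-style: clients cache lookup state; a mutation touches one
    # server, but a cache miss walks the path component by component.
    return lookups * (miss_rate * path_depth) + mutations * 1


if __name__ == "__main__":
    servers, depth, miss = 128, 8, 0.01
    for frac_mutations in (0.001, 0.01, 0.1):
        ops = 1_000_000
        muts = int(ops * frac_mutations)
        reads = ops - muts
        print(f"mutation fraction {frac_mutations:>5}: "
              f"replication={rpc_cost_replication(reads, muts, servers):>10,.0f} RPCs, "
              f"caching={rpc_cost_client_caching(reads, muts, depth, miss):>10,.0f} RPCs")
```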


Proceedings of the 9th Parallel Data Storage Workshop | 2014

BatchFS: scaling the file system control plane with client-funded metadata servers

Qing Zheng; Kai Ren; Garth A. Gibson

Parallel file systems are often characterized by a layered architecture that decouples metadata management from I/O operations, allowing file systems to facilitate fast concurrent access to file contents. However, metadata-intensive workloads are still likely to bottleneck at the file system control plane due to namespace synchronization, which taxes application performance through lock contention on directories, transaction serialization, and RPC overheads. In this paper, we propose a client-driven file system metadata architecture, BatchFS, that is optimized for non-interactive, or batch, workloads. To avoid metadata bottlenecks, BatchFS features a relaxed consistency model marked by lazy namespace synchronization and optimistic metadata verification. Capable of executing namespace operations on client-provisioned resources without contacting any metadata server, BatchFS clients are able to delay namespace synchronization until it is really needed. Our goal in this vision paper is to handle these delayed operations securely and efficiently with metadata verification and bulk insertion. Preliminary experiments demonstrate that our client-funded metadata architecture outperforms a traditional synchronous file system by orders of magnitude.
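
The defer-then-bulk-insert idea can be sketched as a client that executes creates against a private local namespace and later merges them into the global namespace in a single verified batch, compared with one RPC per create in the synchronous case. Class names and the conflict check are illustrative assumptions, not the BatchFS protocol.

```python
# Sketch of "defer, then bulk insert": a client runs creates against a
# private local namespace and synchronizes with the global namespace in one
# verified batch. Names and the conflict rule are illustrative assumptions.

class GlobalNamespace:
    def __init__(self):
        self.entries = {}          # path -> attributes
        self.rpcs = 0

    def create(self, path, attrs):
        # Traditional synchronous path: one RPC per create.
        self.rpcs += 1
        if path in self.entries:
            raise ValueError(f"exists: {path}")
        self.entries[path] = attrs

    def bulk_insert(self, batch):
        # One RPC-equivalent for the whole batch; verify before applying.
        self.rpcs += 1
        conflicts = [p for p in batch if p in self.entries]
        if conflicts:
            raise ValueError(f"namespace conflict on {conflicts[:3]} ...")
        self.entries.update(batch)


class BatchClient:
    def __init__(self):
        self.local = {}            # deferred, client-funded namespace

    def create(self, path, attrs):
        self.local[path] = attrs   # no server contact here

    def sync(self, namespace):
        namespace.bulk_insert(self.local)
        self.local = {}


if __name__ == "__main__":
    sync_ns, batch_ns = GlobalNamespace(), GlobalNamespace()
    for i in range(10_000):
        sync_ns.create(f"/out/rank{i}", {"mode": 0o644})
    client = BatchClient()
    for i in range(10_000):
        client.create(f"/out/rank{i}", {"mode": 0o644})
    client.sync(batch_ns)
    print("synchronous RPCs:", sync_ns.rpcs)
    print("batched RPCs:    ", batch_ns.rpcs)
```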


Proceedings of the Second International Workshop on MapReduce and Its Applications | 2011

Otus: resource attribution in data-intensive clusters

Kai Ren; Julio Lopez; Garth A. Gibson

Frameworks for large-scale data-intensive applications, such as Hadoop and Dryad, have gained tremendous popularity. Understanding the resource requirements of these frameworks and the performance characteristics of distributed applications is inherently difficult. We present an approach, based on resource attribution, that aims at facilitating performance analyses of distributed data-intensive applications. This approach is embodied in Otus, a monitoring tool to attribute resource usage to jobs and services in Hadoop clusters. Otus collects and correlates performance metrics from distributed components and provides views that display time series of these metrics filtered and aggregated using multiple criteria. Our evaluation shows that this approach can be deployed without incurring major overheads. Our experience with Otus in a production cluster suggests its effectiveness at helping users and cluster administrators with application performance analysis and troubleshooting.
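
Resource attribution, at its core, maps each raw per-process sample to the job or service that owns the process and then aggregates the samples into per-job time series. The sketch below shows that mapping-and-aggregation step on toy data; the sample format and owner table are assumptions for illustration, not Otus internals.

```python
# Sketch of resource attribution: raw per-process samples are mapped to the
# Hadoop job or service that owns each process, then aggregated into a
# per-job time series. The sample format is an illustrative assumption.
from collections import defaultdict

# (timestamp, host, pid, cpu_percent) samples from a node-level collector
samples = [
    (0, "node1", 101, 40.0), (0, "node1", 102, 10.0), (0, "node2", 201, 55.0),
    (1, "node1", 101, 60.0), (1, "node1", 102, 12.0), (1, "node2", 201, 35.0),
]

# Mapping from (host, pid) to the owning job or service, e.g. recovered from
# the task tracker and the node's process table.
owner = {
    ("node1", 101): "job_0007",
    ("node2", 201): "job_0007",
    ("node1", 102): "HDFS-DataNode",
}

# Aggregate: job -> timestamp -> total CPU across all of its processes.
series = defaultdict(lambda: defaultdict(float))
for ts, host, pid, cpu in samples:
    series[owner[(host, pid)]][ts] += cpu

for job, points in series.items():
    print(job, sorted(points.items()))
```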


High Performance Distributed Computing | 2010

DiscFinder: a data-intensive scalable cluster finder for astrophysics

Bin Fu; Kai Ren; Julio Lopez; Eugene Fink; Garth A. Gibson

DiscFinder is a scalable approach for identifying large-scale astronomical structures, such as galaxy clusters, in massive observational and simulation astrophysics datasets. It is designed to operate on datasets with tens of billions of astronomical objects, even when the dataset is much larger than the aggregate memory of the compute cluster used for processing.


Petascale Data Storage Workshop | 2015

DeltaFS: exascale file systems scale better without dedicated servers

Qing Zheng; Kai Ren; Garth A. Gibson; Bradley W. Settlemyer; Gary Grider

High performance computing fault tolerance depends on scalable parallel file system performance. For more than a decade, scalable bandwidth has been available from the object storage systems that underlie modern parallel file systems, and recently we have seen demonstrations of scalable parallel metadata using dynamic partitioning of the namespace over multiple metadata servers. But even these scalable parallel file systems require significant numbers of dedicated servers, and some workloads still experience bottlenecks. We envision exascale parallel file systems that do not have any dedicated server machines. Instead, a parallel job instantiates a file system namespace service in client middleware that operates on only scalable object storage and communicates with other jobs by sharing or publishing namespace snapshots. Experiments show that our serverless file system design, DeltaFS, performs metadata operations orders of magnitude faster than traditional file system architectures.
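
One way to picture a namespace service that lives in client middleware is as a copy-on-write overlay: a job records its own namespace changes locally, layered over an immutable parent snapshot published to shared object storage, and later publishes its merged view for other jobs to mount. The snapshot representation below is an illustrative assumption, not the DeltaFS format.

```python
# Sketch of the "no dedicated metadata servers" idea: each job runs its own
# namespace in client middleware, layered copy-on-write over a published
# parent snapshot kept in shared object storage. The snapshot format is an
# illustrative assumption, not the DeltaFS representation.

object_store = {}   # stand-in for shared, scalable object storage


def publish(snapshot_id, namespace):
    # Publish an immutable namespace snapshot so other jobs can mount it.
    object_store[snapshot_id] = dict(namespace)


class JobNamespace:
    def __init__(self, parent_snapshot=None):
        self.parent = object_store.get(parent_snapshot, {})
        self.delta = {}                      # this job's own changes

    def create(self, path, attrs):
        self.delta[path] = attrs             # no server RPC anywhere

    def lookup(self, path):
        return self.delta.get(path, self.parent.get(path))

    def snapshot(self):
        merged = dict(self.parent)
        merged.update(self.delta)
        return merged


if __name__ == "__main__":
    job1 = JobNamespace()
    job1.create("/run1/output.0", {"size": 4096})
    publish("snap-run1", job1.snapshot())

    job2 = JobNamespace(parent_snapshot="snap-run1")
    job2.create("/run2/output.0", {"size": 8192})
    print(job2.lookup("/run1/output.0"))     # visible via the parent snapshot
    print(job2.lookup("/run2/output.0"))     # visible via this job's delta
```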


IEEE International Conference on High Performance Computing, Data, and Analytics | 2012

A Case for Scaling HPC Metadata Performance through De-specialization

Swapnil Patil; Kai Ren; Garth A. Gibson

The lack of a highly scalable and parallel metadata service is the Achilles' heel for many cluster file system deployments in both the HPC world and the Internet services world. This is because most cluster file systems have focused on scaling the data path, i.e., providing high-bandwidth parallel I/O to files that are gigabytes in size. But with the proliferation of massively parallel applications that produce metadata-intensive workloads, such as large numbers of simultaneous file creates and large-scale storage management, cluster file systems also need to scale metadata performance. To realize these goals, this paper makes a case for a scalable metadata service middleware that layers on existing cluster file system deployments and distributes file system metadata, including the namespace tree, small directories, and large directories, across many servers. Our key idea is to effectively synthesize a concurrent indexing technique to distribute metadata with a tabular, on-disk representation of all file system metadata.
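
The tabular representation can be pictured as one row per directory entry, keyed by the parent directory's ID followed by the entry name, so that readdir becomes a prefix scan and create becomes a single-row insert that a distributed index can place on any server. The key encoding and attribute fields below are illustrative assumptions rather than the paper's exact schema.

```python
# Sketch of a tabular metadata representation: each directory entry is a row
# keyed by (parent directory id, name), so readdir becomes a prefix scan and
# create becomes a single-row insert. Encoding details are illustrative.
import struct


def row_key(parent_dir_id: int, name: str) -> bytes:
    # Fixed-width big-endian parent id keeps one directory's rows contiguous
    # in key order, followed by the entry name.
    return struct.pack(">Q", parent_dir_id) + name.encode()


# A sorted key-value table standing in for the on-disk store.
table = {}
ROOT = 0
table[row_key(ROOT, "home")] = {"type": "dir", "inode": 1}
table[row_key(1, "kai")] = {"type": "dir", "inode": 2}
table[row_key(2, "paper.tex")] = {"type": "file", "inode": 3, "size": 9000}
table[row_key(2, "data.csv")] = {"type": "file", "inode": 4, "size": 123456}


def readdir(parent_dir_id: int):
    prefix = struct.pack(">Q", parent_dir_id)
    for key in sorted(table):
        if key.startswith(prefix):
            yield key[8:].decode(), table[key]


if __name__ == "__main__":
    for name, attrs in readdir(2):
        print(name, attrs)
```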


Very Large Data Bases | 2017

SlimDB: a space-efficient key-value storage engine for semi-sorted data

Kai Ren; Qing Zheng; Joy Arulraj; Garth A. Gibson

Modern key-value stores often use write-optimized indexes and compact in-memory indexes to speed up read and write performance. One popular write-optimized index is the log-structured merge-tree (LSM-tree), which provides indexed access to write-intensive data. It has been increasingly used as a storage backbone for many services, including file system metadata management, graph processing engines, and machine learning feature storage engines. Existing LSM-tree implementations often exhibit high write amplification caused by compaction, and lack optimizations to maximize read performance on solid-state disks. The goal of this paper is to explore techniques that leverage common workload characteristics shared by many systems using key-value stores to reduce the read/write amplification overhead typically associated with general-purpose LSM-tree implementations. Our experiments show that by applying these design techniques, our new implementation of a key-value store, SlimDB, can be two to three times faster, use less memory to cache metadata indices, and show lower tail latency in read operations compared to popular LSM-tree implementations such as LevelDB and RocksDB.
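
For intuition about where the write amplification comes from: in a leveled LSM-tree each level is roughly a constant factor larger than the one above it, and a record is rewritten about that factor of times as it is compacted into each level, so total write amplification grows with the number of levels. The estimate below uses that textbook approximation with assumed sizes; it is not SlimDB's analysis or its parameters.

```python
# Rough write-amplification estimate for a leveled LSM-tree: each of L levels
# is about `fanout` times larger than the one above it, and a record is
# rewritten roughly `fanout` times as it is compacted into each level.
# This is a textbook approximation, not SlimDB's analysis.
import math


def leveled_write_amplification(data_size_gb, memtable_mb, fanout):
    levels = math.ceil(math.log(data_size_gb * 1024 / memtable_mb, fanout))
    return levels * fanout, levels


if __name__ == "__main__":
    for fanout in (4, 10):
        wa, levels = leveled_write_amplification(
            data_size_gb=1024, memtable_mb=64, fanout=fanout)
        print(f"fanout {fanout:>2}: ~{levels} levels, "
              f"write amplification ~{wa}x")
```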


Very Large Data Bases | 2013

Hadoop's adolescence: an analysis of Hadoop usage in scientific workloads

Kai Ren; YongChul Kwon; Magdalena Balazinska; Bill Howe

Collaboration


Dive into Kai Ren's collaborations.

Top Co-Authors

Garth A. Gibson (Carnegie Mellon University)

Qing Zheng (Carnegie Mellon University)

Bill Howe (University of Washington)

Julio Lopez (Carnegie Mellon University)

Swapnil Patil (Carnegie Mellon University)

Lin Xiao (Carnegie Mellon University)

Adam Fuchs (National Security Agency)

Billie Rinaldi (National Security Agency)