Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Biplob Debnath is active.

Publication


Featured research published by Biplob Debnath.


very large data bases | 2010

FlashStore: high throughput persistent key-value store

Biplob Debnath; Sudipta Sengupta; Jin Li

We present FlashStore, a high-throughput persistent key-value store that uses flash memory as a non-volatile cache between RAM and hard disk. FlashStore is designed to store the working set of key-value pairs on flash and use one flash read per key lookup. As the working set changes over time, space is made for the current working set by destaging recently unused key-value pairs to hard disk and recycling pages in the flash store. FlashStore organizes key-value pairs in a log structure on flash to exploit faster sequential write performance. It uses an in-memory hash table to index them, with hash collisions resolved by a variant of cuckoo hashing. The in-memory hash table stores compact key signatures instead of full keys to trade off RAM usage against false flash read operations. FlashStore can be used as a high-throughput persistent key-value storage layer for a broad range of server-class applications. We compare FlashStore with BerkeleyDB, an embedded key-value store, running on hard disk and flash separately, to bring out the performance gain of FlashStore not only in using flash as a cache above hard disk but also in its use of flash-aware algorithms. We use real-world data traces from two data center applications, namely the Xbox LIVE Primetime online multi-player game and inline storage deduplication, to drive and evaluate the design of FlashStore on traditional and low-power server platforms. FlashStore outperforms BerkeleyDB by up to 60x on throughput (ops/sec), up to 50x on energy efficiency (ops/Joule), and up to 85x on cost efficiency (ops/sec/dollar) on the evaluated datasets.
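The one-flash-read-per-lookup design described above can be illustrated with a small sketch. The class below is a hypothetical simplification (the name `SignatureIndex` and all sizes are illustrative, and the paper's cuckoo-hashing displacement is reduced to simple overwrite): the RAM index keeps only a 2-byte signature plus a log offset per key, values live in an append-only log standing in for flash, and a lookup touches the log at most once.

```python
import hashlib

class SignatureIndex:
    """Sketch: RAM keeps (signature, log offset) per bucket; values live in an
    append-only log standing in for flash. A lookup pays at most one log read."""

    def __init__(self):
        self.index = {}      # bucket -> (2-byte signature, offset into flash log)
        self.flash_log = []  # append-only list simulating a log-structured flash store

    def _hash(self, key):
        h = hashlib.sha1(key.encode()).digest()
        bucket = int.from_bytes(h[:4], "big")  # which hash-table slot
        signature = h[4:6]                     # compact signature, not the full key
        return bucket, signature

    def put(self, key, value):
        bucket, sig = self._hash(key)
        self.flash_log.append((key, value))    # sequential write to the log
        self.index[bucket] = (sig, len(self.flash_log) - 1)  # overwrite on collision

    def get(self, key):
        bucket, sig = self._hash(key)
        entry = self.index.get(bucket)
        if entry is None or entry[0] != sig:
            return None                        # rejected in RAM, no flash read
        stored_key, value = self.flash_log[entry[1]]  # the single flash read
        return value if stored_key == key else None   # weed out a signature collision
```

A signature mismatch filters out most lookups for absent keys entirely in RAM; a rare signature collision costs one wasted flash read, which is exactly the RAM-versus-false-read trade-off the abstract describes.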


ieee conference on mass storage systems and technologies | 2011

A Forest-structured Bloom Filter with flash memory

Guanlin Lu; Biplob Debnath; David Hung-Chang Du

A Bloom Filter (BF) is a probabilistic data structure that compactly represents a set of elements (keys). It is widely used to efficiently identify whether a key has been seen before while using a minimal amount of recording space, and it is heavily used in chunking-based data de-duplication. Traditionally, a BF is implemented as an in-RAM data structure; hence its size is limited by the available RAM space on the machine. For applications like data de-duplication that require a BF larger than the available RAM space, it becomes necessary to store the BF on a secondary storage device. Since BF operations are inherently random in nature, a magnetic disk performs poorly on the resulting random reads and writes and is not a good fit for storing a large BF. Flash memory based Solid State Drives (SSDs) are emerging storage devices with superior performance that can potentially replace disks as the preferred secondary storage devices. However, several special characteristics of flash memory make designing a flash memory based BF very challenging. In this paper, our goal is to design an efficient flash memory based BF that is fully aware of these physical characteristics. To this end, we propose a Forest-structured BF design (FBF). FBF uses a combination of RAM and flash memory: the BF is stored on flash, while RAM helps to mitigate the impact of flash memory's slow write performance. In addition, the in-flash BF is organized in a forest-like structure to improve lookup performance. Our experimental results show that the FBF design achieves 2 times faster processing speed with 50% fewer flash write operations compared with existing flash memory based BF designs.
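As background for the abstract above, here is the textbook Bloom filter that FBF arranges into a forest; this is a generic sketch, not FBF's layout, and the SHA-1-based hashing is just one convenient way to derive the k positions.

```python
import hashlib

class BloomFilter:
    """Textbook Bloom filter: k hash functions set/test k bits in an m-bit
    array. Membership tests have no false negatives and a tunable
    false-positive rate."""

    def __init__(self, m_bits, k_hashes):
        self.m, self.k = m_bits, k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, key):
        # derive k bit positions from salted SHA-1 digests
        for i in range(self.k):
            h = hashlib.sha1(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))
```

Note that each add or lookup touches k effectively random bits; that random-access pattern is what makes disk a poor home for a large BF and motivates flash-aware designs like FBF.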


international conference on distributed computing systems | 2011

BloomFlash: Bloom Filter on Flash-Based Storage

Biplob Debnath; Sudipta Sengupta; Jin Li; David J. Lilja; David Hung-Chang Du

The bloom filter is a probabilistic data structure that provides a compact representation of a set of elements. To keep false positive probabilities low, the size of the bloom filter must be dimensioned a priori to be linear in the maximum number of keys inserted, with the linearity constant typically ranging from one to a few bytes. A bloom filter is most commonly used as an in-memory data structure, hence its size is limited by the availability of RAM space on the machine. As datasets have grown over time to Internet scale, so have the RAM space requirements of bloom filters. If sufficient RAM space is not available, we advocate that flash memory may serve as a suitable medium for storing bloom filters, since it is about one-tenth the cost of RAM per GB while still providing access times orders of magnitude faster than hard disk. We present BLOOMFLASH, a bloom filter designed for flash memory based storage that provides a new dimension of trade-off between bloom filter access times and RAM space usage (and hence system cost). The simple design of a single flat bloom filter on flash suffers from many performance bottlenecks, including in-place bit updates that are inefficient on flash and multiple reads and random writes spread out across many flash pages for a single lookup or insert operation. To mitigate these bottlenecks, BLOOMFLASH leverages two key design innovations: (i) buffering bit updates in RAM and applying them in bulk to flash, which helps to reduce random writes to flash, and (ii) a hierarchical bloom filter design consisting of component bloom filters, stored one per flash page, which helps to localize reads and writes on flash. We use two real-world data traces taken from representative bloom filter applications to drive and evaluate our design. BLOOMFLASH achieves bloom filter access times in the range of a few tens of microseconds, thus allowing on the order of tens of thousands of operations per second.
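The two design innovations in the abstract can be sketched together. The code below is an illustrative reconstruction, not BLOOMFLASH itself (class name, page size, and hash derivation are invented): each key maps to a single per-page sub-filter so all its bits land on one page, and bit updates sit in a RAM buffer until a bulk flush.

```python
import hashlib

PAGE_BYTES = 4096            # assumed flash page size
PAGE_BITS = PAGE_BYTES * 8

class BufferedPagedBloom:
    """Sketch of two ideas from the abstract: per-page component bloom filters
    (localizing reads/writes) and RAM-buffered bit updates applied in bulk."""

    def __init__(self, num_pages, k=3):
        self.num_pages, self.k = num_pages, k
        self.pages = [bytearray(PAGE_BYTES) for _ in range(num_pages)]  # stand-in for flash
        self.pending = {}  # page index -> set of bit positions buffered in RAM

    def _page_and_bits(self, key):
        h = hashlib.sha1(str(key).encode()).digest()
        page = int.from_bytes(h[:4], "big") % self.num_pages  # all k bits on one page
        bits = [int.from_bytes(h[4 + 2*i:6 + 2*i], "big") % PAGE_BITS
                for i in range(self.k)]
        return page, bits

    def add(self, key):
        page, bits = self._page_and_bits(key)
        self.pending.setdefault(page, set()).update(bits)  # buffered: no flash write yet

    def flush(self):
        for page, bits in self.pending.items():  # one bulk write per dirty page
            buf = self.pages[page]
            for b in bits:
                buf[b // 8] |= 1 << (b % 8)
        self.pending.clear()

    def might_contain(self, key):
        page, bits = self._page_and_bits(key)
        buf, pend = self.pages[page], self.pending.get(page, set())
        return all(b in pend or buf[b // 8] & (1 << (b % 8)) for b in bits)
```

A lookup consults the RAM buffer plus exactly one page, so it never scatters reads across the device the way a single flat on-flash filter would.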


measurement and modeling of computer systems | 2010

CFTL: a convertible flash translation layer adaptive to data access patterns

Dong-Chul Park; Biplob Debnath; David Hung-Chang Du

The flash translation layer (FTL) is a software/hardware interface inside NAND flash memory. Since the FTL has a critical impact on the performance of NAND flash-based devices, a variety of FTL schemes have been proposed to improve their performance. In this paper, we propose a novel hybrid FTL scheme named the Convertible Flash Translation Layer (CFTL). Unlike existing FTLs that use static address mapping schemes, CFTL is adaptive to data access patterns, so it can dynamically switch its mapping scheme to either a read-optimized or a write-optimized mapping scheme. In addition to this convertible scheme, we propose an efficient caching strategy that further improves CFTL performance with only a simple hint. Consequently, both the convertible feature and the caching strategy empower CFTL to achieve good read performance as well as good write performance.
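The convertible idea can be sketched in a few lines. This is a toy rendering, not CFTL's actual mechanism: each logical block starts under a compact, read-optimized block-level mapping and is switched to a write-optimized page-level mapping once its write count crosses a threshold; the threshold and bookkeeping here are invented for illustration.

```python
class ConvertibleFTL:
    """Toy sketch of a convertible mapping policy: blocks begin under a
    block-level (read-optimized) mapping and convert to a page-level
    (write-optimized) mapping when they become write-hot."""

    def __init__(self, threshold=2):
        self.threshold = threshold
        self.mode = {}    # logical block -> "block" or "page"
        self.writes = {}  # logical block -> write count

    def access(self, block, is_write):
        mode = self.mode.setdefault(block, "block")  # start read-optimized
        if is_write:
            self.writes[block] = self.writes.get(block, 0) + 1
            if mode == "block" and self.writes[block] > self.threshold:
                self.mode[block] = "page"  # write-hot: convert mapping scheme
        return self.mode[block]
```

The payoff is that read-mostly data keeps the small block-level mapping table while write-hot data gets the update-friendly page-level one.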


international conference on data engineering | 2008

SARD: A statistical approach for ranking database tuning parameters

Biplob Debnath; David J. Lilja; Mohamed F. Mokbel

Traditionally, DBMSs are shipped with hundreds of configuration parameters. Since database performance depends heavily on the appropriate settings of these configuration parameters, DBAs spend much of their time and effort finding the best parameter values for tuning the performance of the application of interest. In many cases, they rely on their experience and some rules of thumb. However, time and effort may be wasted tuning parameters that have no or only marginal effects. Moreover, tuning effects also vary depending on the expertise of the DBAs, but skilled DBAs are increasingly rare and expensive to employ. To address these problems, we present a statistical approach for ranking database parameters (SARD), which is based on the Plackett & Burman statistical design methodology. SARD takes the query workload and the number of configuration parameters as inputs and, using only a linear number of experiments, generates a ranking of database parameters based on their relative impacts on DBMS performance. Preliminary experimental results using TPC-H and PostgreSQL show that the SARD-generated ranking can correctly identify critical configuration parameters.
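The Plackett & Burman methodology behind SARD can be sketched compactly. The code below is an illustrative reconstruction, not the paper's implementation: it builds the standard 8-run design for up to 7 two-level factors and ranks factors by the magnitude of their estimated main effects on a measured response (in SARD, the responses would be DBMS benchmark results).

```python
def plackett_burman_ranking(responses, design):
    """Rank factors by |main effect|: the effect of a factor is the sum of
    responses weighted by that factor's +1/-1 level, scaled by half the
    number of runs."""
    n_runs, n_factors = len(design), len(design[0])
    effects = [abs(sum(design[r][f] * responses[r] for r in range(n_runs)) / (n_runs / 2))
               for f in range(n_factors)]
    return sorted(range(n_factors), key=lambda f: -effects[f])

# Standard 8-run Plackett-Burman design for up to 7 factors:
# cyclic shifts of the generator row, plus a final all-minus run.
generator = [1, 1, 1, -1, 1, -1, -1]
design = [generator[i:] + generator[:i] for i in range(7)] + [[-1] * 7]
```

Because the design's columns are orthogonal, a factor that truly dominates the response rises to the top of the ranking with only 8 experiments instead of 2^7.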


modeling, analysis, and simulation on computer and telecommunication systems | 2009

Large Block CLOCK (LB-CLOCK): A write caching algorithm for solid state disks

Biplob Debnath; Sunil Subramanya; David Hung-Chang Du; David J. Lilja

Solid State Disks (SSDs) using NAND flash memory are increasingly being adopted in the high-end servers of datacenters to improve the performance of I/O-intensive applications. Compared to traditional enterprise-class hard disks, SSDs provide faster read performance, lower cooling cost, and higher power efficiency. However, the write performance of a flash-based SSD can be up to an order of magnitude slower than its read performance. Furthermore, frequent write operations degrade the lifetime of flash memory. A nonvolatile cache can greatly help to solve these problems. Although a RAM cache is relatively high in cost, it has successfully eliminated the performance gap between fast CPUs and slow magnetic disks. Similarly, a nonvolatile cache in an SSD can alleviate the disparity between flash memory's read and write performance. A small write cache that reduces the number of flash block erase operations can lead to substantial performance gains for write-intensive applications and can extend the overall lifetime of flash-based SSDs. This paper presents a novel write caching algorithm, the Large Block CLOCK (LB-CLOCK) algorithm, which considers 'recency' and 'block space utilization' metrics to make cache management decisions. LB-CLOCK dynamically varies the priority between these two metrics to adapt to changes in workload characteristics. Our simulation-based experimental results show that LB-CLOCK outperforms the best known existing flash caching algorithms for a wide range of workloads.
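The interplay of the two metrics can be sketched as follows. This is a simplified, hypothetical rendering of a CLOCK-style write cache, not the paper's algorithm: reference bits supply recency, and among unreferenced blocks the sketch evicts the one with the highest space utilization, on the reasoning that a fuller block can be flushed to flash with less merge work.

```python
class LBClockSketch:
    """Simplified CLOCK-style write cache: entries carry a reference bit
    (recency) and a space-utilization score; eviction prefers unreferenced
    blocks with the highest utilization. All details are illustrative."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = {}  # block id -> {"ref": bool, "util": float}

    def access(self, block, util):
        """Cache a block; returns the evicted block id, or None."""
        if block in self.blocks:
            self.blocks[block] = {"ref": True, "util": util}  # hit: refresh recency
            return None
        victim = self._evict() if len(self.blocks) >= self.capacity else None
        self.blocks[block] = {"ref": True, "util": util}
        return victim

    def _evict(self):
        while True:
            unref = [b for b, e in self.blocks.items() if not e["ref"]]
            if unref:
                victim = max(unref, key=lambda b: self.blocks[b]["util"])
                del self.blocks[victim]
                return victim
            for e in self.blocks.values():  # CLOCK sweep: clear all reference bits
                e["ref"] = False
```

LB-CLOCK itself goes further by dynamically shifting priority between the two metrics as the workload changes; this sketch fixes the policy for clarity.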


modeling analysis and simulation on computer and telecommunication systems | 2011

A Workload-Aware Adaptive Hybrid Flash Translation Layer with an Efficient Caching Strategy

Dong-Chul Park; Biplob Debnath; David Hung-Chang Du

In this paper, we propose a Convertible Flash Translation Layer (CFTL) for NAND flash-based storage systems. CFTL is a novel hybrid flash translation layer that adapts to workloads, dynamically switching its mapping scheme to either a page-level or a block-level mapping scheme to fully exploit the benefits of both. Moreover, we propose an efficient caching strategy to further improve CFTL performance. Consequently, both the convertible feature and the caching strategy empower CFTL to achieve good read performance as well as good write performance. Our experimental evaluation with various realistic workloads demonstrates that CFTL outperforms other FTL schemes. In particular, our new caching strategy remarkably improves cache hit ratios, by an average of 245%, and exhibits much higher hit ratios especially for randomly read-intensive workloads.


conference on information and knowledge management | 2007

Towards efficient search on unstructured data: an intelligent-storage approach

Aravindan Raghuveer; Meera Jindal; Mohamed F. Mokbel; Biplob Debnath; David Hung-Chang Du

Applications that create and consume unstructured data have grown both in the scale of their storage requirements and in the complexity of their search primitives. We consider two such applications: exhaustive search and the integration of structured and unstructured data. Current block-based storage systems are either incapable of addressing or inefficient at addressing the challenges brought forth by these applications. We propose a storage framework to efficiently store and search unstructured and structured data while controlling storage management costs. Experimental results based on our prototype show that the proposed system can provide impressive performance and feature benefits.


international conference on data engineering | 2013

TBF: A memory-efficient replacement policy for flash-based caches

Cristian Ungureanu; Biplob Debnath; Stephen Rago; Akshat Aranya

The performance and capacity characteristics of flash storage make it attractive to use as a cache. Recency-based cache replacement policies rely on an in-memory full index, typically a B-tree or a hash table, that maps each object to its recency information. Even though the recency information itself may take very little space, the full index for a cache holding N keys requires at least log N bits per key. This metadata overhead is undesirably high for very large flash-based caches, such as key-value stores with billions of objects. To solve this problem, we propose a new RAM-frugal cache replacement policy that approximates the least-recently-used (LRU) policy. It uses two in-memory Bloom sub-filters (TBF) to maintain the recency information and leverages an on-flash key-value store to cache objects. TBF requires only one byte of RAM per cached object, making it suitable for implementing very large flash-based caches. We evaluate TBF through simulation on traces from several block stores and key-value stores, and also in a real system implementation using the Yahoo! Cloud Serving Benchmark. Evaluation results show that TBF achieves cache hit rates and operations per second comparable to those of LRU in spite of its much smaller memory requirements.
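The two-sub-filter mechanism can be sketched compactly. The code below is an illustrative approximation (sizes, the rotation trigger, and the hashing are invented): accesses are recorded in the current sub-filter; after enough accesses, the older sub-filter is discarded and the roles rotate, so a key found in neither filter has not been touched recently and is an eviction candidate.

```python
import hashlib

class TwoBloomLRU:
    """Two in-memory Bloom sub-filters approximating LRU: 'current' records
    recent accesses; on rotation the old 'previous' epoch is dropped, aging
    out keys that were not touched again. Parameters are illustrative."""

    def __init__(self, m_bits=1 << 14, k=3, rotate_after=1000):
        self.m, self.k, self.rotate_after = m_bits, k, rotate_after
        self.count = 0
        self.current = bytearray(m_bits // 8)
        self.previous = bytearray(m_bits // 8)

    def _positions(self, key):
        for i in range(self.k):
            h = hashlib.sha1(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def touch(self, key):
        for p in self._positions(key):
            self.current[p // 8] |= 1 << (p % 8)
        self.count += 1
        if self.count >= self.rotate_after:   # rotate: drop the oldest epoch
            self.previous = self.current
            self.current = bytearray(self.m // 8)
            self.count = 0

    def recently_used(self, key):
        def hit(buf):
            return all(buf[p // 8] & (1 << (p % 8)) for p in self._positions(key))
        return hit(self.current) or hit(self.previous)
```

Because a Bloom filter stores no per-key pointers, the recency state costs a fixed few bits per object, which is how TBF reaches its one-byte-per-object budget.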


acm symposium on applied computing | 2012

HotDataTrap: a sampling-based hot data identification scheme for flash memory

Dong-Chul Park; Biplob Debnath; Youngjin Nam; David Hung-Chang Du; Young-Kyun Kim; Youngchul Kim

Hot data identification is an issue of paramount importance in flash-based storage devices, since it has a great impact on their overall performance and also has strong potential for application in many other fields. However, it has received comparatively little investigation. HotDataTrap is a novel online hot data identification scheme that adopts a sampling mechanism. This sampling-based approach enables HotDataTrap to discard some cold items early, reducing runtime overhead and wasted memory. Moreover, its two-level hierarchical hash indexing scheme helps HotDataTrap look up a requested item directly in the cache and further saves memory by exploiting spatial locality. Together, the sampling approach and the hierarchical hash indexing scheme empower HotDataTrap to identify hot data precisely and efficiently with a very limited memory space. Our extensive experiments with various realistic workloads demonstrate that HotDataTrap outperforms the state-of-the-art scheme by an average of 335%, and our two-level hash indexing scheme further improves HotDataTrap performance by up to 50.8%.
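The sampling idea can be sketched in a few lines. This is a hypothetical miniature, not HotDataTrap itself (it omits the two-level hash index and any counter decay): a first-seen address is admitted to the counter table only with some probability, so most cold, once-touched items never consume memory.

```python
import random

class HotDataSampler:
    """Sampling-based hot-data identification sketch: only a random fraction
    of first-seen addresses is admitted for counting; admitted addresses are
    then counted exactly. Rate and threshold values are illustrative."""

    def __init__(self, sample_rate=0.25, hot_threshold=4, rng=None):
        self.sample_rate = sample_rate
        self.hot_threshold = hot_threshold
        self.counts = {}
        self.rng = rng if rng is not None else random.Random()

    def record(self, lba):
        if lba in self.counts:
            self.counts[lba] += 1        # already tracked: count the access
        elif self.rng.random() < self.sample_rate:
            self.counts[lba] = 1         # admitted by sampling; most cold items never enter

    def is_hot(self, lba):
        return self.counts.get(lba, 0) >= self.hot_threshold
```

Truly hot addresses are accessed often enough to be admitted with high probability on one of their early accesses, while the long tail of once-written cold addresses is mostly never tracked at all.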

Collaboration


Dive into Biplob Debnath's collaborations.
