Song Jiang | Researchain

Publication

Featured research published by Song Jiang.

measurement and modeling of computer systems | 2002

LIRS: an efficient low inter-reference recency set replacement policy to improve buffer cache performance

Song Jiang; Xiaodong Zhang

Although the LRU replacement policy is commonly used in buffer cache management, it is well known for its inability to cope with access patterns with weak locality. Previous work, such as LRU-K and 2Q, attempts to enhance LRU by using additional history information about previous block references beyond the recency information used by LRU. These algorithms greatly increase complexity and/or cannot consistently improve performance. Many recently proposed policies, such as UBM and SEQ, improve replacement performance by exploiting regularities in references. However, they address LRU's problems only in specific, well-defined cases, such as sequential and looping access patterns. Motivated by the limits of previous studies, we propose an efficient buffer cache replacement policy called Low Inter-reference Recency Set (LIRS). LIRS effectively addresses the limits of LRU by using recency to evaluate Inter-Reference Recency (IRR) when making a replacement decision. This contrasts with LRU, which directly uses recency to predict the timing of the next reference. At the same time, LIRS retains nearly the same simple assumption as LRU for predicting the future access behavior of blocks. Our objectives are to address the limits of LRU for general purposes, to retain LRU's low overhead, and to outperform replacement policies that rely on detecting access regularities. Through simulations with a variety of traces and a wide range of cache sizes, we show that LIRS significantly outperforms LRU and outperforms other existing replacement algorithms in most cases. Furthermore, we show that the additional cost of implementing LIRS is trivial compared with LRU.
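The two quantities the abstract contrasts can be made concrete on a short reference trace. The sketch below assumes the definitions used for LIRS: a block's recency is the number of distinct other blocks referenced since its last reference, and its IRR is the number of distinct other blocks referenced between its last two consecutive references. The function name `recency_and_irr` is hypothetical, and the quadratic scan favors clarity over speed.

```python
def recency_and_irr(trace):
    """Return {block: (recency, irr)} as of the end of `trace`.

    recency: distinct other blocks referenced since the block's last reference.
    irr: distinct other blocks referenced between its last two references,
         or None if the block was referenced only once.
    """
    last_pos, prev_pos = {}, {}
    for i, block in enumerate(trace):
        if block in last_pos:
            prev_pos[block] = last_pos[block]
        last_pos[block] = i

    result = {}
    for block, last in last_pos.items():
        # The block cannot reappear after its last reference, so nothing to exclude.
        recency = len(set(trace[last + 1:]))
        if block in prev_pos:
            # The block itself does not occur between two consecutive references.
            irr = len(set(trace[prev_pos[block] + 1:last]))
        else:
            irr = None
        result[block] = (recency, irr)
    return result


stats = recency_and_irr(["A", "B", "C", "A", "D", "B"])
for block, (recency, irr) in sorted(stats.items()):
    print(block, "recency =", recency, "IRR =", irr)
```

In this trace B was referenced more recently than A, so recency alone ranks B higher; yet A's IRR (2) is smaller than B's (3), so a policy that ranks blocks by IRR would treat A as having the stronger locality.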

measurement and modeling of computer systems | 2012

Workload analysis of a large-scale key-value store

Berk Atikoglu; Yuehai Xu; Eitan Frachtenberg; Song Jiang; Mike Paleczny

Key-value stores are a vital component in many scale-out enterprises, including social networks, online retail, and risk analysis. Accordingly, they are receiving increased attention from the research community in an effort to improve their performance, scalability, reliability, cost, and power consumption. To be effective, such efforts require a detailed understanding of realistic key-value workloads. Yet little is known about these workloads outside of the companies that operate them. This paper aims to address this gap. To this end, we have collected detailed traces from Facebook's Memcached deployment, arguably the world's largest. The traces capture over 284 billion requests from five different Memcached use cases over several days. We analyze the workloads from multiple angles, including request composition, size, and rate; cache efficacy; temporal patterns; and application use cases. We also propose a simple model of the most representative trace to enable the community to generate more realistic synthetic workloads. Our analysis details many characteristics of the caching workload. It also reveals a number of surprises: a GET/SET ratio of 30:1, which is higher than assumed in the literature; some applications of Memcached behave more like persistent storage than a cache; strong locality metrics, such as keys accessed many millions of times a day, do not always suffice for a high hit rate; and there is still room for efficiency and hit rate improvements in Memcached's implementation. Toward the last point, we make several suggestions that address the exposed deficiencies.
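Headline metrics such as the GET/SET ratio and the GET hit rate reduce to simple counting over a request log. The sketch below assumes a hypothetical trace format of `(op, key, hit)` tuples, where `hit` is only meaningful for GETs; the abstract does not describe the format of the Facebook traces, and the function name `summarize` is illustrative.

```python
from collections import Counter


def summarize(requests):
    """Summarize a request log of (op, key, hit) tuples (hypothetical format)."""
    ops = Counter()
    key_counts = Counter()
    get_hits = 0
    for op, key, hit in requests:
        ops[op] += 1
        key_counts[key] += 1
        if op == "GET" and hit:
            get_hits += 1

    gets, sets = ops["GET"], ops["SET"]
    return {
        "get_set_ratio": gets / sets if sets else float("inf"),
        "get_hit_rate": get_hits / gets if gets else 0.0,
        "top_keys": key_counts.most_common(3),  # most frequently accessed keys
    }


log = [
    ("GET", "user:1", True),
    ("GET", "user:1", True),
    ("GET", "user:2", False),
    ("SET", "user:2", None),
    ("GET", "user:2", True),
    ("GET", "user:3", False),
]
print(summarize(log))
```

Per-key access counts like `top_keys` are a locality metric, while `get_hit_rate` measures cache efficacy; as the abstract notes, a key being hot does not by itself guarantee a high hit rate, so reporting both is useful.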

conference on high performance computing (supercomputing) | 2005

Transparent, Incremental Checkpointing at Kernel Level: a Foundation for Fault Tolerance for Parallel Computers

Roberto Gioiosa; José Carlos Sancho; Song Jiang; Fabrizio Petrini; Kei Davis

We describe the software architecture, technical features, and performance of TICK (Transparent Incremental Checkpointer at Kernel level), a system-level checkpointer implemented as a kernel thread and specifically designed to provide fault tolerance in Linux clusters. This implementation, based on the 2.6.11 Linux kernel, provides the essential functionality for transparent, highly responsive, and efficient fault tolerance based on full or incremental checkpointing at the system level. TICK is completely user-transparent and does not require any changes to user code or system libraries; it is highly responsive: an interrupt, such as a timer interrupt, can trigger a checkpoint in as little as 2.5µs; and it supports incremental and full checkpoints with minimal overhead (less than 6% with full checkpointing to disk performed as frequently as once per minute).
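The difference between full and incremental checkpoints can be sketched in user space. TICK itself runs as a kernel thread and needs no changes to user code; the hypothetical `IncrementalCheckpointer` class below instead tracks dirty pages through an explicit `write` call. That is a simplifying assumption, made only to show which pages each kind of checkpoint saves and how a restart combines them.

```python
class IncrementalCheckpointer:
    """Toy page store that records full and incremental checkpoints."""

    def __init__(self, num_pages, page_size=4096):
        self.page_size = page_size
        self.pages = [bytes(page_size) for _ in range(num_pages)]
        self.dirty = set()  # pages modified since the last checkpoint
        self.log = []       # list of (kind, {page_index: page_bytes})

    def write(self, index, data):
        # Assumption: writes go through this call so dirty pages can be tracked.
        if len(data) != self.page_size:
            raise ValueError("data must fill exactly one page")
        self.pages[index] = data
        self.dirty.add(index)

    def checkpoint(self, full=False):
        """Save every page (full) or only dirty pages (incremental); return pages saved."""
        indices = range(len(self.pages)) if full else sorted(self.dirty)
        # bytes objects are immutable, so storing references is a safe snapshot
        snapshot = {i: self.pages[i] for i in indices}
        self.log.append(("full" if full else "incremental", snapshot))
        self.dirty.clear()
        return len(snapshot)

    def restore(self):
        """Rebuild the last checkpointed state: latest full one plus later increments."""
        fulls = [i for i, (kind, _) in enumerate(self.log) if kind == "full"]
        if not fulls:
            raise RuntimeError("no full checkpoint to restart from")
        pages = {}
        for _, snapshot in self.log[fulls[-1]:]:
            pages.update(snapshot)
        return [pages[i] for i in range(len(self.pages))]


cp = IncrementalCheckpointer(num_pages=4, page_size=8)
print(cp.checkpoint(full=True))  # saves all 4 pages
cp.write(1, b"x" * 8)
print(cp.checkpoint())           # saves only the one dirty page
```

The incremental checkpoint's cost scales with the number of pages modified since the previous checkpoint rather than with the whole address space, which is why frequent incremental checkpoints can stay cheap.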

architectural support for programming languages and operating systems | 2014

SDF: software-defined flash for web-scale internet storage systems

Jian Ouyang; Shiding Lin; Song Jiang; Zhenyu Hou; Yong Wang; Yuanzheng Wang

In the last several years, hundreds of thousands of SSDs have been deployed in the data centers of Baidu, China's largest Internet search company. Currently, only 40% or less of the raw bandwidth of the flash memory in the SSDs is delivered by the storage system to the applications. Moreover, because of space over-provisioning in the SSD to accommodate non-sequential or random writes, and additionally parity coding across flash channels, typically only 50-70% of the raw capacity of a commodity SSD can be used for user data. Given the large scale of Baidu's data centers, making the most effective use of its SSDs is of great importance. Specifically, we seek to maximize both bandwidth and usable capacity. To achieve this goal, we propose software-defined flash (SDF), a hardware/software co-designed storage system that maximally exploits the performance characteristics of flash memory in the context of our workloads. SDF exposes individual flash channels to the host software and eliminates space over-provisioning. The host software, given direct access to the raw flash channels of the SSD, can effectively organize its data and schedule its data accesses to better realize the SSD's raw performance potential. Currently, more than 3000 SDFs have been deployed in Baidu's storage system that supports its web page and image repository services. Our measurements show that SDF can deliver approximately 95% of the raw flash bandwidth and provide 99% of the flash capacity for user data. SDF increases I/O bandwidth by 300% and reduces per-GB hardware cost by 50% on average compared with the commodity-SSD-based system used at Baidu.
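Exposing individual channels moves placement and scheduling decisions into host software. The abstract does not specify Baidu's policy, so the sketch below uses a simple assumed one: each write goes to the channel with the fewest bytes already assigned, which keeps the independent channels evenly busy. The function `assign_writes` and its greedy rule are illustrative assumptions, not SDF's actual scheduler.

```python
import heapq


def assign_writes(write_sizes, num_channels):
    """Greedily place each write on the least-loaded channel (assumed policy).

    Returns (placement, loads): placement[i] is the channel chosen for write i,
    loads[c] is the total bytes assigned to channel c.
    """
    heap = [(0, ch) for ch in range(num_channels)]  # (bytes assigned, channel id)
    loads = [0] * num_channels
    placement = []
    for size in write_sizes:
        load, ch = heapq.heappop(heap)  # ties go to the lower channel id
        placement.append(ch)
        loads[ch] = load + size
        heapq.heappush(heap, (loads[ch], ch))
    return placement, loads


placement, loads = assign_writes([8, 8, 8, 8, 4, 4], num_channels=4)
print(placement, loads)
```

A real host-side scheduler would also weigh read locality, erase operations, and garbage collection on each channel; the sketch only shows that, once channels are visible, the host can balance them directly.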

international symposium on low power electronics and design | 2006

SmartSaver: turning flash drive into a disk energy saver for mobile computers

Feng Chen; Song Jiang; Xiaodong Zhang

In a mobile computer, the hard disk consumes a considerable amount of energy. Existing dynamic power management policies usually take conservative approaches to saving disk energy, and disk energy consumption remains a serious issue. Meanwhile, the flash drive is becoming a must-have portable storage device for almost every laptop user who travels. In this paper, we propose another highly desirable use of the flash drive: saving disk energy. This is achieved by using the flash drive as a standby buffer for caching and prefetching disk data. Our design significantly extends disk idle times with careful and deliberate consideration of the particular characteristics of the flash drive. Trace-driven simulations show that up to 41% of disk energy can be saved with a relatively small amount of data written to the flash drive.
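The energy argument rests on one mechanism: every read the flash drive absorbs is a read the disk never sees, so the gaps between disk accesses, during which the disk can spin down, get longer. The sketch below models caching only, not prefetching, and assumes a hypothetical trace of `(time_seconds, block)` reads and a plain LRU flash cache; SmartSaver's actual policies account for flash characteristics that are not modeled here.

```python
from collections import OrderedDict


def disk_idle_gaps(trace, flash_blocks):
    """Return idle gaps (seconds) between consecutive disk accesses.

    trace: list of (time_seconds, block) reads, in time order (hypothetical format).
    flash_blocks: capacity of an LRU cache kept on the flash drive.
    """
    flash = OrderedDict()
    disk_times = []
    for t, block in trace:
        if block in flash:
            flash.move_to_end(block)  # served from flash; disk stays idle
            continue
        disk_times.append(t)          # miss: the disk must serve this read
        flash[block] = None
        if len(flash) > flash_blocks:
            flash.popitem(last=False)  # evict the least recently used block
    return [b - a for a, b in zip(disk_times, disk_times[1:])]


trace = [(0, "a"), (1, "b"), (2, "a"), (3, "b"), (10, "c")]
print(disk_idle_gaps(trace, flash_blocks=0))  # no flash: every read reaches the disk
print(disk_idle_gaps(trace, flash_blocks=2))  # repeated reads of a and b are absorbed
```

Fewer, longer gaps are what a disk power manager needs: a spin-down pays off only when the idle period exceeds the disk's break-even time.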

cluster computing and the grid | 2007

Exploiting Lustre File Joining for Effective Collective IO

Weikuan Yu; Jeffrey S. Vetter; Richard Shane Canon; Song Jiang

Lustre is a parallel file system that provides high aggregate IO bandwidth by striping file extents across many storage devices. However, our experiments indicate that excessively wide striping can cause performance degradation. Lustre supports an innovative file joining feature that joins files in place. To mitigate striping overhead and benefit collective IO, we propose two techniques: split writing and hierarchical striping. In split writing, a file is created as separate subfiles, each of which is striped across only a few storage devices. The subfiles are joined into a single file when the file is closed. Hierarchical striping builds on split writing and arranges the span of subfiles hierarchically to avoid overlap and achieve appropriate coverage of the storage devices. Together, these techniques avoid the overhead associated with a large stripe width while still combining the bandwidth available from many storage devices. We have prototyped these techniques in the ROMIO implementation of MPI-IO. Experimental results indicate that split writing and hierarchical striping can significantly improve the performance of Lustre collective IO in terms of both data transfer and management operations. On a Lustre file system configured with 46 object storage targets, our implementation improves the collective write performance of a 16-process job by as much as 220%.
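To make the layout concrete, the sketch below maps a logical file offset to a subfile and a storage target under one assumed hierarchical layout: each subfile is striped round-robin over a small group of `stripe_count` targets, and consecutive subfiles use consecutive, non-overlapping groups so that together they cover all targets. The function name and the group-assignment rule are hypothetical; the paper's ROMIO implementation may arrange subfiles differently.

```python
def locate(offset, subfile_size, stripe_size, stripe_count, total_osts):
    """Map a byte offset to (subfile index, storage target index).

    Assumes subfile_size is a multiple of stripe_size and total_osts is a
    multiple of stripe_count, so the target groups never overlap.
    """
    subfile = offset // subfile_size
    offset_in_subfile = offset % subfile_size
    # Round-robin striping within the subfile's small group of targets.
    stripe_in_group = (offset_in_subfile // stripe_size) % stripe_count
    # Consecutive subfiles use consecutive, disjoint groups, wrapping around.
    first_ost = (subfile * stripe_count) % total_osts
    return subfile, (first_ost + stripe_in_group) % total_osts


MiB = 1 << 20
for off in (0, 1 * MiB, 2 * MiB, 4 * MiB, 5 * MiB, 16 * MiB):
    print(off // MiB, "MiB ->", locate(off, 4 * MiB, 1 * MiB, 2, 8))
```

With eight targets and two targets per subfile, four consecutive subfiles together span all eight targets, while each subfile individually has a stripe width of only two: the narrow-stripe, wide-coverage combination the abstract describes.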

IEEE Transactions on Parallel and Distributed Systems | 2008

LightFlood: Minimizing Redundant Messages and Maximizing Scope of Peer-to-Peer Search

Song Jiang; Lei Guo; Xiaodong Zhang; Haodong Wang

Flooding is a fundamental file search operation in unstructured peer-to-peer (P2P) file sharing systems: a peer starts the search by broadcasting a query to its neighbors, which continue to propagate it to their neighbors. This procedure repeats until a time-to-live (TTL) counter is decremented to 0. Flooding can seriously limit system scalability, because the number of redundant query messages grows exponentially during propagation. Our study shows that more than 70 percent of the generated messages are redundant in a flooding with a TTL of 7 in a moderately connected Gnutella network. Existing efforts to address this issue have focused on limiting the use of the flooding operation. We propose a new flooding scheme, called LightFlood, with the objective of minimizing the number of redundant messages while retaining a message-propagation scope similar to that of standard flooding. In the scheme, each peer keeps track of the connectivity of its immediate neighbors and of its second-hop neighbors, which can be acquired locally. LightFlood identifies the neighbor with the highest connectivity and uses the link to that neighbor to form a suboverlay within the existing P2P overlay. In LightFlood, flooding is divided into two stages. The first stage is standard flooding with a limited number of TTL hops, in which a message can spread to a sufficiently large scope with a small number of redundant messages. In the second stage, messages propagate only along the suboverlay, significantly reducing the number of redundant messages. Our analysis and simulation experiments show that the LightFlood scheme provides a low-overhead broadcast facility that can be effectively used in P2P search. For example, compared with standard flooding with seven TTL hops, LightFlood with an additional two to three hops can reduce the number of flooding messages by up to 69 percent while retaining the same flooding scope. We believe that LightFlood can be widely used as a core mechanism for efficient message broadcasting in P2P systems due to its near-optimal performance.
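The two-stage scheme can be simulated on a small graph. In the sketch below, a peer's connectivity is taken to be its degree; each peer keeps the link to its highest-degree neighbor (ties broken by the smallest peer id) to form the suboverlay; and every transmitted copy of the query counts as a message, with duplicates dropped rather than forwarded. These rules and the function names `build_suboverlay` and `lightflood` are assumptions made for illustration.

```python
def build_suboverlay(graph):
    """Keep, for every peer, the link to its highest-degree neighbor (assumed rule)."""
    edges = {p: set() for p in graph}
    for p, nbrs in graph.items():
        if nbrs:
            # max() returns the first maximum, so sorting breaks ties by smallest id
            best = max(sorted(nbrs), key=lambda q: len(graph[q]))
            edges[p].add(best)
            edges[best].add(p)
    return edges


def lightflood(graph, source, stage1_ttl, total_ttl, overlay):
    """Return (messages_sent, peers_reached).

    Hops below stage1_ttl flood over the full graph; later hops use only overlay links.
    Setting stage1_ttl == total_ttl gives standard flooding.
    """
    reached = {source}
    frontier = [(source, None)]  # (peer, peer it received the query from)
    messages = 0
    for hop in range(total_ttl):
        links = graph if hop < stage1_ttl else overlay
        next_frontier = []
        for peer, sender in frontier:
            for nbr in links[peer]:
                if nbr == sender:
                    continue
                messages += 1
                if nbr not in reached:  # duplicates are counted but not forwarded
                    reached.add(nbr)
                    next_frontier.append((nbr, peer))
        frontier = next_frontier
    return messages, len(reached)


k4 = {p: {q for q in range(4) if q != p} for p in range(4)}  # four fully connected peers
overlay = build_suboverlay(k4)
print(lightflood(k4, 0, stage1_ttl=2, total_ttl=2, overlay=overlay))  # standard flooding
print(lightflood(k4, 0, stage1_ttl=1, total_ttl=2, overlay=overlay))  # two-stage
```

On this tiny, fully connected graph the first hop already reaches every peer, so all second-hop messages in standard flooding are redundant, and the suboverlay stage removes them without losing scope.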

international parallel and distributed processing symposium | 2005

Current practice and a direction forward in checkpoint/restart implementations for fault tolerance

José Carlos Sancho; Fabrizio Petrini; Kei Davis; Roberto Gioiosa; Song Jiang

Checkpoint/restart is a general idea for which particular implementations enable various functionalities in computer systems, including process migration, gang scheduling, hibernation, and fault tolerance. For fault tolerance, in current practice, implementations can be at user level or system level. User-level implementations are relatively easy to build and are portable, but suffer from a lack of transparency, flexibility, and efficiency, and in particular are unsuitable for the autonomic (self-managing) computing systems envisioned as the next revolutionary development in system management. In contrast, a system-level implementation can exhibit all of these desirable features, at the cost of a more sophisticated implementation, and is seen as an essential mechanism for the next generation of fault-tolerant, and ultimately autonomic, large-scale computing systems. Linux is becoming the operating system of choice for the largest-scale machines, but development of system-level checkpoint/restart mechanisms for Linux is still in its infancy, with all extant implementations exhibiting serious deficiencies for achieving transparent fault tolerance. This paper surveys extant implementations within a natural taxonomy, highlighting their strengths and inherent weaknesses.

IEEE Transactions on Computers | 2005

Making LRU friendly to weak locality workloads: a novel replacement algorithm to improve buffer cache performance

Song Jiang; Xiaodong Zhang

Although the LRU replacement algorithm has been widely used in buffer cache management, it is well known for its inability to cope with access patterns with weak locality. Previously proposed algorithms to improve LRU greatly increase complexity and/or cannot provide consistently improved performance. Some of these algorithms address LRU's problems only in certain specific, predefined cases. Motivated by the limitations of existing algorithms, we propose a general and efficient replacement algorithm called Low Inter-reference Recency Set (LIRS). LIRS effectively addresses the limitations of LRU by using recency to evaluate the Inter-Reference Recency (IRR) of accessed blocks when making a replacement decision. This contrasts with LRU, which directly uses recency to predict the next reference time. Meanwhile, LIRS mostly retains the simple assumption adopted by LRU for predicting future block access behavior. Through simulations with a variety of traces of different access patterns and a wide range of cache sizes, we show that LIRS significantly outperforms LRU and outperforms other existing replacement algorithms in most cases. Furthermore, we show that the additional cost of implementing LIRS is trivial compared with that of LRU. We also show that the LIRS algorithm can be extended into a family of replacement algorithms, of which LRU is a special member.
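A compact sketch of the LIRS bookkeeping follows, written from the commonly described form of the algorithm: a recency-ordered stack S holding LIR blocks and recently seen HIR blocks (resident or not), a queue Q of resident HIR blocks from which victims are taken, and stack pruning so that the bottom of S is always an LIR block. The class name `LIRSCache` and the `lir_size`/`hir_size` split are illustrative; this sketch does not bound the number of non-resident entries kept in S, which a production implementation would need to do.

```python
from collections import OrderedDict


class LIRSCache:
    """Sketch of LIRS replacement: lir_size LIR blocks plus hir_size resident HIR blocks."""

    def __init__(self, lir_size, hir_size):
        if lir_size < 1 or hir_size < 1:
            raise ValueError("lir_size and hir_size must both be at least 1")
        self.lir_size = lir_size
        self.hir_size = hir_size
        self.stack = OrderedDict()  # S: last entry is the top (most recent)
        self.queue = OrderedDict()  # Q: resident HIR blocks, first entry is the next victim
        self.lir = set()            # blocks currently in the LIR set
        self.resident = set()       # blocks whose data is in the cache
        self.hits = 0
        self.misses = 0

    def _push_top(self, key):
        self.stack.pop(key, None)
        self.stack[key] = None

    def _prune(self):
        # Stack pruning: drop HIR entries until an LIR block sits at the bottom.
        while self.stack:
            bottom = next(iter(self.stack))
            if bottom in self.lir:
                break
            del self.stack[bottom]

    def _demote_bottom_lir(self):
        # The LIR block at the bottom of S becomes a resident HIR block at the end of Q.
        bottom, _ = self.stack.popitem(last=False)
        self.lir.discard(bottom)
        self.queue[bottom] = None
        self._prune()

    def access(self, key):
        """Reference `key`; return True on a hit."""
        if key in self.lir:
            self.hits += 1
            self._push_top(key)
            self._prune()
            return True

        hit = key in self.resident
        if hit:
            self.hits += 1
        else:
            self.misses += 1
            if len(self.lir) < self.lir_size:
                # Warm-up: the first lir_size distinct blocks become LIR directly.
                self.lir.add(key)
                self.resident.add(key)
                self._push_top(key)
                return False
            if len(self.queue) >= self.hir_size:
                # Evict the front of Q; if it is still in S it stays there as non-resident HIR.
                victim, _ = self.queue.popitem(last=False)
                self.resident.discard(victim)
            self.resident.add(key)

        if key in self.stack:
            # Its IRR is below the recency of the bottom LIR block: promote it to LIR.
            self.queue.pop(key, None)
            self.lir.add(key)
            self._push_top(key)
            self._demote_bottom_lir()
        else:
            # Otherwise it stays HIR: record it on top of S and at the end of Q.
            self._push_top(key)
            self.queue.pop(key, None)
            self.queue[key] = None
        return hit


cache = LIRSCache(lir_size=2, hir_size=1)
for block in [1, 2, 3, 4] * 10:  # a loop slightly larger than the 3-block cache
    cache.access(block)
print(cache.hits, cache.misses)
```

An LRU cache of the same three blocks would miss on every access in this loop, because each block is evicted just before it is reused; the LIRS sketch instead keeps two blocks in its LIR set and hits on them in every pass after the first.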

international conference on distributed computing systems | 2004

ULC: a file block placement and replacement protocol to effectively exploit hierarchical locality in multi-level buffer caches

Song Jiang; Xiaodong Zhang

In a large client/server cluster system, file blocks are cached in a multilevel storage hierarchy. Existing file block placement and replacement schemes either operate on each level of the hierarchy independently or apply an LRU policy across more than one level. One major limitation of these schemes is that they ignore the hierarchical locality of file blocks with nonuniform strengths, resulting in many unnecessary block misses or additional communication overhead. To address this issue, we propose a client-directed, coordinated file block placement and replacement protocol, in which the nonuniform strengths of locality are dynamically identified at the client level to direct servers to place or replace file blocks accordingly on different levels of the buffer caches. In other words, the caching layout of the blocks in the hierarchy dynamically matches the locality of block accesses. The effectiveness of our proposed protocol comes from achieving the following three goals: (1) the multilevel cache retains the same hit rate as a single-level cache whose size equals the aggregate size of the multilevel caches; (2) the nonuniform locality strengths of blocks are fully exploited and ranked to fit into the physical multilevel caches; and (3) the communication overhead between caches is reduced.
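Goal (2) can be illustrated with a small placement function: rank blocks by an estimate of locality strength and fill the levels in order, closest level first. The sketch uses reuse distance as the locality measure (lower means stronger), which is an assumption, since the abstract does not say which measure the client uses; the client-to-server coordination messages are not modeled. The function name `assign_levels` is hypothetical.

```python
def assign_levels(block_scores, level_sizes):
    """Place blocks into cache levels by locality strength (assumed measure).

    block_scores: block -> reuse distance (lower = stronger locality).
    level_sizes: capacity of each level, level 1 (closest to the client) first.
    Returns block -> level number (1-based), or None if the block is not cached.
    """
    ranked = sorted(block_scores, key=lambda b: (block_scores[b], b))
    placement = {}
    level, used = 0, 0
    for block in ranked:
        # Move on to the next level once the current one is full.
        while level < len(level_sizes) and used >= level_sizes[level]:
            level, used = level + 1, 0
        if level == len(level_sizes):
            placement[block] = None  # weaker locality than everything cached
        else:
            placement[block] = level + 1
            used += 1
    return placement


print(assign_levels({"a": 1, "b": 5, "c": 2, "d": 9}, level_sizes=[1, 2]))
```

The number of cached blocks equals the aggregate capacity of the levels, and the strongest-locality blocks sit closest to the client, which is the intuition behind goal (1): the hierarchy as a whole behaves like a single cache of the combined size.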
