Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Yukun Zhou is active.

Publication


Featured research published by Yukun Zhou.


Proceedings of the IEEE | 2016

A Comprehensive Study of the Past, Present, and Future of Data Deduplication

Wen Xia; Hong Jiang; Dan Feng; Fred Douglis; Philip Shilane; Yu Hua; Min Fu; Yucheng Zhang; Yukun Zhou

Data deduplication, an efficient approach to data reduction, has gained increasing attention and popularity in large-scale storage systems due to the explosive growth of digital data. It eliminates redundant data at the file or subfile level and identifies duplicate content by its cryptographically secure hash signature (i.e., collision-resistant fingerprint), which is shown to be much more computationally efficient than the traditional compression approaches in large-scale storage systems. In this paper, we first review the background and key features of data deduplication, then summarize and classify the state-of-the-art research in data deduplication according to the key workflow of the data deduplication process. The summary and taxonomy of the state of the art on deduplication help identify and understand the most important design considerations for data deduplication systems. In addition, we discuss the main applications and industry trend of data deduplication, and provide a list of the publicly available sources for deduplication research and studies. Finally, we outline the open problems and future research directions facing deduplication-based storage systems.
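
To make the fingerprint-based workflow surveyed above concrete, here is a minimal sketch (our illustration, not code from the paper) that splits a stream into fixed-size chunks, identifies duplicates by SHA-256 fingerprints, and keeps only one copy of each unique chunk; real systems typically use content-defined chunking and persistent indexes.

```python
import hashlib

CHUNK_SIZE = 4096  # toy fixed-size chunking; production systems often use content-defined chunking

def deduplicate(data: bytes):
    """Return (recipe, store): the chunk-fingerprint sequence and the unique-chunk store."""
    store = {}   # fingerprint -> chunk bytes (the deduplicated pool)
    recipe = []  # ordered fingerprints needed to reconstruct the stream
    for off in range(0, len(data), CHUNK_SIZE):
        chunk = data[off:off + CHUNK_SIZE]
        fp = hashlib.sha256(chunk).hexdigest()  # collision-resistant fingerprint
        store.setdefault(fp, chunk)             # keep only the first copy of each chunk
        recipe.append(fp)
    return recipe, store

def restore(recipe, store) -> bytes:
    return b"".join(store[fp] for fp in recipe)

if __name__ == "__main__":
    payload = b"abc" * 10000
    recipe, store = deduplicate(payload)
    assert restore(recipe, store) == payload
    print(f"{len(recipe)} chunks referenced, {len(store)} unique chunks stored")
```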


IEEE Conference on Mass Storage Systems and Technologies | 2015

SecDep: A user-aware efficient fine-grained secure deduplication scheme with multi-level key management

Yukun Zhou; Dan Feng; Wen Xia; Min Fu; Fangting Huang; Yucheng Zhang; Chunguang Li

Nowadays, many customers and enterprises back up their data to cloud storage that performs deduplication to save storage space and network bandwidth. Hence, how to perform secure deduplication becomes a critical challenge for cloud storage. According to our analysis, the state-of-the-art secure deduplication methods are not suitable for cross-user fine-grained data deduplication. They either suffer from brute-force attacks that can recover files falling into a known set, or incur large computation (time) overheads. Moreover, existing approaches to convergent key management incur large space overheads because of the huge number of chunks shared among users. Our observation that cross-user redundant data come mainly from duplicate files motivates us to propose an efficient secure deduplication scheme, SecDep. SecDep employs User-Aware Convergent Encryption (UACE) and Multi-Level Key management (MLK) approaches. (1) UACE combines cross-user file-level and inside-user chunk-level deduplication, and exploits different security policies among and inside users to minimize computation overheads. Specifically, both file-level and chunk-level deduplication use variants of Convergent Encryption (CE) to resist brute-force attacks. The major difference is that the file-level CE keys are generated by using a server-aided method to ensure the security of cross-user deduplication, while the chunk-level keys are generated by using a user-aided method with lower computation overheads. (2) To reduce key space overheads, MLK uses a file-level key to encrypt chunk-level keys so that the key space does not increase with the number of sharing users. Furthermore, MLK splits the file-level keys into share-level keys and distributes them to multiple key servers to ensure the security and reliability of file-level keys. Our security analysis demonstrates that SecDep ensures data confidentiality and key security. Our experimental results based on several large real-world datasets show that SecDep is more time-efficient and key-space-efficient than the state-of-the-art secure deduplication approaches.
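
A minimal sketch of the convergent-encryption and key-wrapping ideas that SecDep builds on; the toy SHA-256 counter-mode cipher and all names here are our own illustrative choices, not the paper's implementation.

```python
import hashlib

def _keystream(key: bytes, n: int) -> bytes:
    # Toy deterministic keystream (SHA-256 in counter mode); illustrative only, not production crypto.
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def convergent_encrypt(chunk: bytes) -> tuple[bytes, bytes]:
    """Derive the key from the chunk itself, so duplicate chunks yield identical ciphertexts."""
    key = hashlib.sha256(chunk).digest()          # convergent (content-derived) key
    ct = bytes(a ^ b for a, b in zip(chunk, _keystream(key, len(chunk))))
    return key, ct

def wrap_chunk_keys(file_key: bytes, chunk_keys: list[bytes]) -> list[bytes]:
    """Encrypt chunk-level keys under a single file-level key (the multi-level key idea in miniature)."""
    return [bytes(a ^ b for a, b in zip(k, _keystream(file_key + i.to_bytes(4, "big"), len(k))))
            for i, k in enumerate(chunk_keys)]

if __name__ == "__main__":
    k1, c1 = convergent_encrypt(b"same chunk")
    k2, c2 = convergent_encrypt(b"same chunk")
    assert c1 == c2                               # duplicates remain detectable after encryption
    print(wrap_chunk_keys(hashlib.sha256(b"file").digest(), [k1])[0].hex()[:16])
```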


International Conference on Computer Communications | 2015

AE: An Asymmetric Extremum content defined chunking algorithm for fast and bandwidth-efficient data deduplication

Yucheng Zhang; Hong Jiang; Dan Feng; Wen Xia; Min Fu; Fangting Huang; Yukun Zhou

Data deduplication, a space-efficient and bandwidth-saving technology, plays an important role in bandwidth-efficient data transmission in various data-intensive network and cloud applications. Rabin-based and MAXP-based Content-Defined Chunking (CDC) algorithms, while robust in finding suitable cut-points for chunk-level redundancy elimination, face the key challenges of (1) low chunking throughput that renders the chunking stage the deduplication performance bottleneck and (2) large chunk-size variance that decreases deduplication efficiency. To address these challenges, this paper proposes a new CDC algorithm called the Asymmetric Extremum (AE) algorithm. The main idea behind AE is based on the observation that the extreme value in an asymmetric local range is not likely to be replaced by a new extreme value in dealing with the boundaries-shift problem, which motivates AE's use of an asymmetric (rather than symmetric, as in MAXP) local range to identify cut-points and simultaneously achieve high chunking throughput and low chunk-size variance. As a result, AE simultaneously addresses the problems of low chunking throughput in MAXP and Rabin and high chunk-size variance in Rabin. The experimental results based on four real-world datasets show that AE improves the throughput performance of the state-of-the-art CDC algorithms by 3x while attaining comparable or higher deduplication efficiency.
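
Below is a small sketch of the asymmetric-extremum idea as we read it from the abstract: cut a chunk once the running maximum byte value is followed by a fixed-size window of bytes that do not exceed it. The window size and byte-level comparison are our simplifications, not the paper's exact parameters.

```python
def ae_chunk_boundaries(data: bytes, window: int = 256):
    """Asymmetric Extremum CDC sketch: declare a cut-point once the running maximum byte
    is followed by `window` bytes that do not exceed it."""
    boundaries = []
    start = 0
    while start < len(data):
        max_val = data[start]
        max_pos = start
        cut = None
        for i in range(start + 1, len(data)):
            if data[i] > max_val:
                max_val, max_pos = data[i], i    # new extremum restarts the fixed right window
            elif i == max_pos + window:
                cut = i                          # extremum survived the asymmetric window
                break
        if cut is None:
            cut = len(data) - 1                  # tail of the stream becomes the last chunk
        boundaries.append(cut + 1)
        start = cut + 1
    return boundaries

if __name__ == "__main__":
    import os
    data = os.urandom(1 << 20)
    cuts = ae_chunk_boundaries(data)
    sizes = [b - a for a, b in zip([0] + cuts, cuts)]
    print(f"{len(sizes)} chunks, mean size {sum(sizes) / len(sizes):.0f} bytes")
```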


International Parallel and Distributed Processing Symposium | 2016

Security RBSG: Protecting Phase Change Memory with Security-Level Adjustable Dynamic Mapping

Fangting Huang; Dan Feng; Wen Xia; Wen Zhou; Yucheng Zhang; Min Fu; Chuntao Jiang; Yukun Zhou

As an emerging memory technology for building future main memory systems, Phase Change Memory (PCM) can increase memory capacity in a cost-effective and power-efficient way. However, PCM faces security threats due to its limited write endurance: a malicious adversary could wear out the cells and cause the whole PCM system to fail within a short period of time. To address this issue, several wear-leveling schemes have been proposed to evenly distribute write traffic in a security-aware manner. In this work, we present a new type of timing attack, named the Remapping Timing Attack (RTA), based on the asymmetry in the write time of PCM. Our analysis and experimental results show that the newly revealed RTA can make two state-of-the-art wear-leveling schemes (Region Based Start-Gap and Security Refresh) lose effectiveness, causing PCM protected by these two techniques to fail within several days (or even minutes). In order to defend against such attacks, we further propose a novel wear-leveling scheme called Security Region Based Start-Gap (Security RBSG), which employs a two-stage strategy and uses a dynamic Feistel Network to enhance the simple Start-Gap wear leveling with level-adjustable security assurance. The theoretical analysis and evaluation results show that the proposed Security RBSG is the most robust wear-leveling methodology so far, which not only better defends against the new RTA, but also performs well against the traditional malicious attacks, i.e., the Repeated Address Attack and the Birthday Paradox Attack.
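
For context, a toy model of the baseline Start-Gap remapping that Security RBSG hardens (as commonly described in the wear-leveling literature); the class below is our own simplification and omits the two-stage strategy and the dynamic Feistel Network that constitute the paper's contribution.

```python
class StartGap:
    """Toy Start-Gap wear leveling: N logical lines mapped onto N+1 physical lines.
    One spare line (the 'gap') slowly rotates through the array so that hot logical
    addresses do not keep landing on the same physical cells."""

    def __init__(self, n_lines: int, writes_per_move: int = 100):
        self.n = n_lines
        self.start = 0                # rotation offset
        self.gap = n_lines            # physical index of the spare (unused) line
        self.psi = writes_per_move    # the gap moves once every psi writes
        self.writes = 0

    def map(self, logical: int) -> int:
        pa = (logical + self.start) % self.n
        return pa + 1 if pa >= self.gap else pa   # skip over the spare line

    def write(self, logical: int) -> int:
        self.writes += 1
        if self.writes % self.psi == 0:
            self._move_gap()
        return self.map(logical)

    def _move_gap(self):
        if self.gap == 0:
            self.gap = self.n
            self.start = (self.start + 1) % self.n   # one full rotation completed
        else:
            self.gap -= 1                            # (real hardware copies line gap-1 into gap)

if __name__ == "__main__":
    sg = StartGap(8, writes_per_move=4)
    print([sg.write(7) for _ in range(16)])          # the same logical line drifts physically
```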


IEEE Transactions on Computers | 2017

A Fast Asymmetric Extremum Content Defined Chunking Algorithm for Data Deduplication in Backup Storage Systems

Yucheng Zhang; Dan Feng; Hong Jiang; Wen Xia; Min Fu; Fangting Huang; Yukun Zhou

Chunk-level deduplication plays an important role in backup storage systems. Existing Content-Defined Chunking (CDC) algorithms, while robust in finding suitable chunk boundaries, face the key challenges of (1) low chunking throughput that renders the chunking stage a serious deduplication performance bottleneck, (2) large chunk-size variance that decreases deduplication efficiency, and (3) being unable to find proper chunk boundaries in low-entropy strings and thus failing to deduplicate these strings. To address these challenges, this paper proposes a new CDC algorithm called the Asymmetric Extremum (AE) algorithm. The main idea behind AE is based on the observation that the extreme value in an asymmetric local range is not likely to be replaced by a new extreme value in dealing with the boundaries-shifting problem. As a result, AE has higher chunking throughput and smaller chunk-size variance than the existing CDC algorithms, and is able to find proper chunk boundaries in low-entropy strings. The experimental results based on real-world datasets show that AE improves the throughput performance of the state-of-the-art CDC algorithms by more than 2.3×, which is fast enough to remove the chunking-throughput performance bottleneck of deduplication, and accelerates the system throughput by more than 50 percent, while achieving comparable deduplication efficiency.


Future Generation Computer Systems | 2017

A similarity-aware encrypted deduplication scheme with flexible access control in the cloud

Yukun Zhou; Dan Feng; Yu Hua; Wen Xia; Min Fu; Fangting Huang; Yucheng Zhang

Data deduplication has been widely used in the cloud to reduce storage space. To protect data security, users encrypt data with message-locked encryption (MLE) to enable deduplication over ciphertexts. However, existing secure deduplication schemes suffer from security weaknesses (i.e., brute-force attacks) and fail to support flexible access control. The process of chunk-level MLE key generation and sharing also introduces potential privacy issues and heavy computation overheads. We propose EDedup, a similarity-aware encrypted deduplication scheme that supports flexible access control with revocation. Specifically, EDedup groups files into segments and performs server-aided MLE at the segment level, which exploits similarity via a representative hash (e.g., the min-hash) to reduce computation overheads. This nevertheless faces a new attack in which an attacker obtains keys by guessing the representative hash, and hence EDedup combines source-based similar-segment detection with target-based duplicate-chunk checking to resist such attacks and guarantee deduplication efficiency. Furthermore, EDedup generates message-derived file keys for duplicate files to manage metadata. EDedup encrypts file keys via proxy-based attribute-based encryption, which reduces metadata storage overheads and implements flexible access control with revocation. Evaluation results demonstrate that EDedup improves the speed of MLE by up to 10.9X and 0.36X compared with DupLESS-chunk and SecDep, respectively. EDedup reduces metadata storage overheads by 39.9%–65.7% relative to REED.
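
As one possible reading of the segment-level, similarity-aware key derivation described above, the sketch below (our construction; chunk and segment sizes, and the server-secret step, are assumptions) uses the minimum chunk fingerprint of a segment as its representative hash and derives one key per segment from it.

```python
import hashlib

CHUNK = 4096         # toy fixed-size chunking inside a segment
SEGMENT = 64 * 1024  # toy segment size

def representative_hash(segment: bytes) -> str:
    """Min-hash over the chunk fingerprints of a segment: similar segments, which share
    most chunks, are likely to share the same minimum fingerprint."""
    fps = [hashlib.sha256(segment[o:o + CHUNK]).hexdigest()
           for o in range(0, len(segment), CHUNK)]
    return min(fps)

def segment_keys(data: bytes, server_secret: bytes) -> list[bytes]:
    """Derive one key per segment from its representative hash plus a server-held secret
    (a stand-in for the server-aided MLE step described above)."""
    keys = []
    for off in range(0, len(data), SEGMENT):
        rep = representative_hash(data[off:off + SEGMENT])
        keys.append(hashlib.sha256(server_secret + rep.encode()).digest())
    return keys

if __name__ == "__main__":
    a = b"x" * 200_000
    b = b"x" * 199_000 + b"y" * 1_000     # slightly modified copy of a
    ka, kb = segment_keys(a, b"secret"), segment_keys(b, b"secret")
    print(sum(1 for x, y in zip(ka, kb) if x == y), "of", len(ka), "segment keys shared")
```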


International Conference on Computer Communications | 2017

BAC: Bandwidth-aware compression for efficient live migration of virtual machines

Chunguang Li; Dan Feng; Yu Hua; Wen Xia; Leihua Qin; Yue Huang; Yukun Zhou

Live migration of virtual machines (VMs) is one of the key features of virtualization for load balancing, system maintenance, power management, etc., in data centers or clusters. In order to reduce the data transferred and shorten the migration time, compression techniques have been widely used to accelerate VM migration. However, different compression approaches have different compression ratios and speeds. Because there is a trade-off between compression and transmission, the migration performance improvements obtained from different compression approaches differ, and the improvements vary with the network bandwidth. Besides, the compression window sizes used in most compression algorithms are typically much larger than a single page, so traditional single-page compression loses some potential compression benefits. In this paper, we design and implement a Bandwidth-Aware Compression (BAC) scheme for VM migration. BAC chooses a suitable compression approach according to the network bandwidth available to the migration process, and employs multi-page compression. These features allow BAC to obtain greater migration performance improvements from compression. Experiments under various network scenarios demonstrate that, compared with conventional compression approaches, BAC shortens the total migration time while achieving comparable performance for the total data transferred and the downtime.
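
A tiny sketch of the bandwidth-aware selection idea: pick the compressor that minimizes the slower of the compression and transmission stages for the currently available bandwidth. The candidate compressors and their throughput/ratio numbers are hypothetical, and this is not the paper's actual policy.

```python
from dataclasses import dataclass

@dataclass
class Compressor:
    name: str
    throughput_mbps: float   # assumed compression speed (MB/s), profiled offline
    ratio: float             # assumed compression ratio (compressed size / original size)

# Hypothetical candidates; real numbers would be measured per host.
CANDIDATES = [
    Compressor("none", float("inf"), 1.00),
    Compressor("lz-fast", 400.0, 0.60),
    Compressor("deflate", 120.0, 0.45),
    Compressor("heavy", 30.0, 0.35),
]

def pick_compressor(bandwidth_mbps: float) -> Compressor:
    """Choose the compressor minimizing estimated per-MB transfer time:
    max(compression time, send time) for a pipelined compress-and-send path."""
    def est_time(c: Compressor) -> float:
        compress = 0.0 if c.throughput_mbps == float("inf") else 1.0 / c.throughput_mbps
        send = c.ratio / bandwidth_mbps
        return max(compress, send)   # the slower stage dominates a pipelined migration
    return min(CANDIDATES, key=est_time)

if __name__ == "__main__":
    for bw in (10.0, 100.0, 1000.0):   # available migration bandwidth in MB/s
        print(f"{bw:>6.0f} MB/s -> {pick_compressor(bw).name}")
```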


International Conference on Networking, Architecture, and Storage | 2017

Reducing Chunk Fragmentation for In-Line Delta Compressed and Deduplicated Backup Systems

Yucheng Zhang; Dan Feng; Yu Hua; Yuchong Hu; Wen Xia; Min Fu; Xiaolan Tang; Zhikun Wang; Fangting Huang; Yukun Zhou

Chunk-level deduplication, while robust in removing duplicate chunks, introduces chunk fragmentation, which decreases restore performance. Rewriting algorithms have been proposed to reduce chunk fragmentation and accelerate restore speed. Delta compression can remove redundant data between non-duplicate but similar chunks, which cannot be eliminated by chunk-level deduplication. Some applications use delta compression as a complement to chunk-level deduplication to attain extra space and bandwidth savings. However, we observe that delta compression introduces a new type of chunk fragmentation stemming from delta-compressed chunks whose base chunks are fragmented. We refer to such delta-compressed chunks as base-fragmented chunks. We found that this new type of chunk fragmentation has a more severe impact on restore performance than the chunk fragmentation introduced by chunk-level deduplication and cannot be reduced by existing rewriting algorithms. In order to address the problem caused by base-fragmented chunks, we propose SDC, a scheme that selectively performs delta compression after chunk-level deduplication. The main idea behind SDC is to simulate a restore cache to identify the non-base-fragmented chunks and only perform delta compression for these chunks, thus avoiding the new type of chunk fragmentation. Due to the locality among backup streams, most of the non-base-fragmented chunks can be detected by the simulated restore cache. Experimental results based on real-world datasets show that SDC improves the restore performance of the delta-compressed and deduplicated backup system by 1.93X-7.48X, and achieves 95.5%-97.4% of its compression, while imposing negligible impact on the backup throughput.
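
One way to picture the simulated restore cache: replay, at backup time, the container accesses a future restore would make, and delta-compress a chunk only if its base chunk's container would still be cache-resident. The LRU policy, container granularity, and all names below are our assumptions rather than SDC's actual design.

```python
from collections import OrderedDict

class SimulatedRestoreCache:
    """LRU cache of container IDs, replayed at backup time to predict which base chunks
    a future restore would find cached (non-fragmented) vs. have to fetch (fragmented)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.containers = OrderedDict()

    def access(self, container_id: int) -> bool:
        """Record a container access; return True if it was already cached (a hit)."""
        hit = container_id in self.containers
        if hit:
            self.containers.move_to_end(container_id)
        else:
            self.containers[container_id] = None
            if len(self.containers) > self.capacity:
                self.containers.popitem(last=False)   # evict the least recently used container
        return hit

def should_delta_compress(base_container: int, cache: SimulatedRestoreCache) -> bool:
    # Delta-compress only when the base chunk's container would be resident during restore,
    # avoiding the base-fragmentation penalty described above.
    return cache.access(base_container)

if __name__ == "__main__":
    cache = SimulatedRestoreCache(capacity=2)
    stream = [1, 1, 2, 1, 3, 4, 1]   # hypothetical base-container IDs of similar chunks
    print([should_delta_compress(c, cache) for c in stream])
```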


International Conference on Parallel and Distributed Systems | 2016

DEC: An Efficient Deduplication-Enhanced Compression Approach

Zijin Han; Wen Xia; Yuchong Hu; Dan Feng; Yucheng Zhang; Yukun Zhou; Min Fu; Liang Gu



Performance Evaluation | 2014

Ddelta: A deduplication-inspired fast delta compression approach

Wen Xia; Hong Jiang; Dan Feng; Lei Tian; Min Fu; Yukun Zhou


Collaboration


Dive into Yukun Zhou's collaborations.

Top Co-Authors

Wen Xia (Huazhong University of Science and Technology)
Dan Feng (Huazhong University of Science and Technology)
Min Fu (Huazhong University of Science and Technology)
Yucheng Zhang (Huazhong University of Science and Technology)
Fangting Huang (Huazhong University of Science and Technology)
Yu Hua (Huazhong University of Science and Technology)
Hong Jiang (University of Texas at Arlington)
Yuchong Hu (Huazhong University of Science and Technology)
Chunguang Li (Huazhong University of Science and Technology)
Chuntao Jiang (Huazhong University of Science and Technology)