Nikolai Joukov | Researchain

Publication

Featured research published by Nikolai Joukov.

ACM Transactions on Storage | 2008

A nine year study of file system and storage benchmarking

Avishay Traeger; Erez Zadok; Nikolai Joukov; Charles P. Wright

Benchmarking is critical when evaluating performance, but is especially difficult for file and storage systems. Complex interactions between I/O devices, caches, kernel daemons, and other OS components result in behavior that is rather difficult to analyze. Moreover, systems have different features and optimizations, so no single benchmark is always suitable. The large variety of workloads that these systems experience in the real world also adds to this difficulty. In this article we survey 415 file system and storage benchmarks from 106 recent papers. We found that most popular benchmarks are flawed and many research papers do not provide a clear indication of true performance. We provide guidelines that we hope will improve future performance evaluations. To show how some widely used benchmarks can conceal or overemphasize overheads, we conducted a set of experiments. As a specific example, slowing down read operations on ext2 by a factor of 32 resulted in only a 2-5% wall-clock slowdown in a popular compile benchmark. Finally, we discuss future work to improve file system and storage benchmarking.

ACM Transactions on Storage | 2006

On incremental file system development

Erez Zadok; Rakesh Iyer; Nikolai Joukov; Gopalan Sivathanu; Charles P. Wright

Developing file systems from scratch is difficult and error-prone. Using layered, or stackable, file systems is a powerful technique to incrementally extend the functionality of existing file systems on commodity OSes at runtime. In this article, we analyze the evolution of layering from historical models to what is found in four different present-day commodity OSes: Solaris, FreeBSD, Linux, and Microsoft Windows. We classify layered file systems into five types based on their functionality and identify the requirements that each class imposes on the OS. We then present five major design issues that we encountered during our experience of developing over twenty layered file systems on four OSes. We discuss how we have addressed each of these issues on current OSes, and present insights into useful OS and VFS features that would provide future developers more versatile solutions for incremental file system development.

Workshop on Storage Security and Survivability | 2006

Secure deletion myths, issues, and solutions

Nikolai Joukov; Erez Zadok

This paper has three goals. (1) We try to debunk several widely held misconceptions about secure deletion: that encryption is an ideal solution for everybody, that existing data-overwriting tools work well, and that securely deleted files must be overwritten many times. (2) We discuss new and important issues that are often neglected: secure deletion consistency in case of power failures, handling versioning and journalling file systems, and metadata overwriting. (3) We present two solutions for on-demand secure deletion. First, we have created a highly portable and flexible system that performs only the minimal amount of work in kernel mode. Second, we present two in-kernel solutions in the form of Ext3 file system patches that can perform comprehensive data and metadata overwriting. We evaluated our proposed solutions and discuss the trade-offs involved.

IEEE Conference on Mass Storage Systems and Technologies | 2007

RAIF: Redundant Array of Independent Filesystems

Nikolai Joukov; Arun M. Krishnakumar; Chaitanya Patti; Abhishek Rai; Sunil Satnur; Avishay Traeger; Erez Zadok

Storage virtualization and data management are well-known problems for individual users as well as large organizations. Existing storage-virtualization systems either do not support a complete set of possible storage types, do not provide flexible data-placement policies, or do not support per-file conversion (e.g., encryption). This results in suboptimal utilization of resources, inconvenience, low reliability, and poor performance. We have designed a stackable file system called redundant array of independent filesystems (RAIF). It combines the data survivability and performance benefits of traditional RAID with the flexibility of composition and ease of development of stackable file systems. RAIF can be mounted on top of directories and thus on top of any combination of network, distributed, disk-based, and memory-based file systems. Individual files can be replicated, striped, or stored with erasure-correction coding on any subset of the underlying file systems. RAIF has similar performance to RAID. In configurations with parity, RAIF's write performance is better than the performance of driver-level and even entry-level hardware RAID systems. This is because RAIF has better control over the data and parity caching.

IBM Journal of Research and Development | 2008

Galapagos: model-driven discovery of end-to-end application-storage relationships in distributed systems

Kostas Magoutis; Murthy V. Devarakonda; Nikolai Joukov; Norbert G. Vogl

Modern business information systems are typically multitiered distributed systems comprising Web services, application services, databases, enterprise information systems, file systems, storage controllers, and other storage systems. In such environments, data is stored in different forms at multiple tiers, with each tier associated with some level of data abstraction. An information entity owned by an application generally maps to several data entities, logically associated across tiers and related to the application. Discovery of such relationships in a distributed system is a challenging problem, complicated by the widespread adoption of virtualization technologies and by the traditional tendency to manage each tier as an independent domain. In this paper, we present a system and methodology for model-driven discovery of end-to-end application-data relationships spanning multiple tiers, from the applications to the lowest levels of the storage hierarchy. The key to our methodology involves modeling how data is used and transformed by distributed software components. An important benefit of our system, which we call Galapagos, is the ability to reflect business decisions expressed at the application level to the level of storage.

European Conference on Computer Systems | 2008

GreenFS: making enterprise computers greener by protecting them better

Nikolai Joukov; Josef Sipek

Hard disks contain data, frequently an irreplaceable asset of high monetary and non-monetary value. At the same time, hard disks are mechanical devices that consume power, are noisy, and are fragile when their platters are rotating. In this paper we demonstrate that hard disks cause different kinds of problems for different types of computer systems and demystify several common misconceptions. We show that solutions developed to date are incapable of solving the power consumption, noise, and data reliability problems without sacrificing hard disk lifetime, data reliability, or user convenience. We considered data reliability, recovery, performance, user convenience, and hard disk-caused problems together at the enterprise scale. We have designed GreenFS: a fan-out stackable file system that offers all-time all-data run-time data protection, improves performance under typical user workloads, and allows hard disks to be kept off most of the time. As a result, GreenFS improves enterprise data protection, minimizes disk drive-related power consumption and noise, and increases the chances of disk drive survivability in case of unexpected external impacts.

Cluster Computing and the Grid | 2005

Increasing distributed storage survivability with a stackable RAID-like file system

Nikolai Joukov; Abhishek Rai; Erez Zadok

We have designed a stackable file system called Redundant Array of Independent Filesystems (RAIF). It combines the data survivability properties and performance benefits of traditional RAIDs with the unprecedented flexibility of composition, improved security, and ease of development of stackable file systems. RAIF can be mounted on top of any combination of other file systems including network, distributed, disk-based, and memory-based file systems. Existing encryption, compression, antivirus, and consistency checking stackable file systems can be mounted above and below RAIF, to efficiently cope with slow or insecure branches. Individual files can be distributed across branches, replicated, stored with parity, or stored with erasure correction coding to recover from failures on multiple branches. Per-file incremental recovery, storage type migration, and load-balancing are especially well suited for grid storage. In this paper, we describe the current RAIF design, provide preliminary performance results, and discuss current status and future directions.

Haifa Experimental Systems Conference | 2010

Application-storage discovery

Nikolai Joukov; Birgit Pfitzmann; HariGovind V. Ramasamy; Murthy V. Devarakonda

Discovering application dependency on data and storage is a key prerequisite for many storage optimization tasks such as data assignment to storage tiers, storage consolidation, virtualization, and handling unused data. However, in the real world these dependencies are rarely known, and discovering them is a challenge because of virtualization at various levels and the need for discovery methods to be non-intrusive. As a result, many optimization tasks are performed, if at all, without the full knowledge of application-to-storage dependencies. This paper presents a non-intrusive application-to-storage discovery method, and while it is built on our prior work, the storage discovery described here is entirely new. We used this discovery method in two production enterprise environments, consisting of about 323 servers, and we show how the discovered data enables three optimization tasks. First, we relate application criticality with storage tiers. Second, we find unused storage devices and we show how this information together with storage consolidation can be used to achieve power savings of up to two orders of magnitude. Third, we identify opportunities for database storage optimization.

IEEE International Conference on Services Computing | 2011

Migration to Multi-image Cloud Templates

Birgit Pfitzmann; Nikolai Joukov

IT management costs increasingly dominate the overall IT costs. The main hope for reducing them is to standardize software and processes, as this leads to economies of scale in the management services. A key vehicle by which enterprises hope to achieve this is cloud computing, and they start to show interest in clouds outside the initial sweet spot of development and test. As business applications typically contain multiple images with dependencies, one is starting to standardize on multi-image structures. Benefits are ease of deployment of the entire structure and consistent later management services for the business applications. Enterprises have huge investments in their existing business applications, e.g., their web design, special code, database schemas, and data. The promises of clouds can only be realized if a significant fraction of these existing applications can be migrated into the clouds. We therefore present analysis techniques for mapping existing IT environments to multi-image cloud templates. We propose multiple matching criteria, leading to tradeoffs between the number of matches and the migration overhead, and present efficient algorithms for these special graph matching problems. We also present results from analyzing an existing enterprise environment with about 1600 servers.

Workshop on Storage Security and Survivability | 2006

Using free web storage for data backup

Avishay Traeger; Nikolai Joukov; Josef Sipek; Erez Zadok

Backing up important data is crucial. A variety of causes can lead to data loss, such as disk failures, administration errors, virus infiltration, theft, and physical damage to equipment. Users and businesses have important information that is difficult to replace, such as financial records and contacts. Reliable backups are crucial because some data cannot be replaced, while recreating other data can be expensive in terms of time and money. We propose two methods which leverage various types of free Web storage to provide simple, reliable, and free backup solutions. The first method is based on the storage of data in the caches of Internet search engines. We have developed CrawlBackup, a tool which prepares and provides the data for Web crawlers and can then restore the data from the Internet even if all the data on the original computer is unavailable. The second method, called MailBackup, stores redundant copies of the important data in the mailboxes of Internet mail services. We have successfully used these backup systems since the middle of 2005. In this paper we discuss and compare these methods, their feasibility of deployment, their security, and their flexibility.
