Publication


Featured research published by Effi Ofer.


ACM International Conference on Systems and Storage | 2018

Keeping deep learning GPUs well fed using object storage

Or Ozeri; Effi Ofer; Ronen I. Kat

In recent years, machine learning and deep learning techniques such as deep neural networks and recurrent neural networks have found uses in diverse fields including computer vision, speech recognition, natural language processing, social network analysis, bioinformatics, and medicine, where they have produced results comparable to, and in some cases surpassing, those of human experts. Machine learning requires large amounts of data for training its models, and much of this data resides in object storage, an inexpensive and scalable data store. Deep learning also makes use of state-of-the-art processing capabilities from high-end GPUs and accelerators, such as Google Tensor Processing Units (TPUs), which enable parallel and efficient execution, and the throughput such GPUs can support is very high. This constitutes an impedance mismatch: object storage is not designed for high-performance data transfers, and standard practices for feeding deep learning models from object storage can result in poor training performance. Furthermore, the typical deep learning framework uses a file access interface, while object storage supports a REST-based interface with different APIs and semantics than a file system [2].

To take full advantage of these GPUs and operate at full utilization, frameworks such as TensorFlow, Caffe, and Torch need to deliver data as fast as possible to keep the GPUs busy. This becomes a significant challenge when the training data does not reside on the same machine as the GPUs, as is the case when using object storage, leaving the expensive processing units underutilized.

To resolve the impedance mismatch and keep the processing units fully utilized, we added a FUSE-based file system, S3fs [1], to our deep learning stack. S3fs translates POSIX file API requests into REST API calls against the object storage. It is an open source project which, as part of this work, we optimized so that read requests are translated into multiple concurrent range-read requests against the object storage. This yields higher throughput from the object storage than is possible with the naive approach. Reads are cached in memory and served back to the deep learning framework asynchronously; since deep learning frameworks often train over multiple epochs, the in-memory cache is highly beneficial. Our FUSE-based architecture has been implemented in the Deep Learning as a Service offering on the IBM Cloud, and our S3fs enhancements have been contributed to the S3fs project repository. Using our architecture we are able to speed up deep learning performance manyfold and keep expensive GPUs fully utilized.
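The concurrent range-read optimization described above lives inside the S3fs FUSE driver itself (C++). As a rough illustration of the idea only, the Python sketch below splits a single object GET into parallel HTTP Range requests using boto3; the chunk size, worker count, and object layout are illustrative assumptions, not S3fs's actual parameters.

    # Illustrative sketch: split one object read into concurrent
    # HTTP Range requests, then reassemble in offset order. This
    # mimics the S3fs optimization but is not its implementation.
    import concurrent.futures
    import boto3

    s3 = boto3.client("s3")
    CHUNK = 8 * 1024 * 1024  # 8 MiB per range read (tuning assumption)

    def read_range(bucket, key, start, end):
        """Fetch bytes [start, end] of one object via a Range header."""
        resp = s3.get_object(Bucket=bucket, Key=key,
                             Range=f"bytes={start}-{end}")
        return start, resp["Body"].read()

    def parallel_read(bucket, key, size, workers=8):
        """Issue the range reads concurrently: aggregate throughput is
        far higher than one sequential GET against the object store."""
        ranges = [(off, min(off + CHUNK, size) - 1)
                  for off in range(0, size, CHUNK)]
        buf = bytearray(size)
        with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
            futures = [pool.submit(read_range, bucket, key, s, e)
                       for s, e in ranges]
            for fut in concurrent.futures.as_completed(futures):
                start, data = fut.result()
                buf[start:start + len(data)] = data
        return bytes(buf)

In the real system the reassembled chunks would be cached in memory and handed back to the training framework asynchronously, so subsequent epochs are served without touching the object store again.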


ACM International Conference on Systems and Storage | 2018

Applying Deep Learning to Object Store Caching

Effi Ofer; Amir Epstein; Dafna Sadeh; Danny Harnik

Cache replacement policies are among the oldest and most researched topics in computer science, but recent advances in artificial intelligence and machine learning bring novel insights and new opportunities to prefetching and cache replacement policies. In recent years the capabilities of artificial intelligence based algorithms have vastly expanded: state-of-the-art systems can recognize images, predict human behavior, and beat the world champion at the ancient game of Go, by utilizing machine and deep learning techniques that identify patterns within vast quantities of data. While cache replacement algorithms such as LRU, ARC, LFU, and others have been extensively studied, utilizing machine learning techniques to identify what and when to prefetch into the cache remains an underexplored area.

In this paper we use machine learning techniques to implement pattern-based caching, an algorithm that identifies which objects to prefetch from a multi-tenant, cloud-based object storage service into a shared cache before they are requested. Our prefetching solution is based on a deep neural network that includes a word embedding phase and a deep learning phase. Word embedding is a technique for mapping words into vectors in a high-dimensional space; these vectors, which capture syntactic and semantic relationships between words, have been shown to boost natural language processing tasks. Rather than convert words to vectors, we use word embedding to convert objects in the object store into vectors. Object identifiers, like words in a text, have temporal relationships [1]. Thus, a sequence of object identifier requests can be analyzed in the same manner as a large corpus of text, using word2vec-type algorithms to produce object embeddings. Object vectors are positioned in the vector space such that objects with time correlations lie in close proximity to each other. Once we generate our object embeddings, we use recurrent neural networks (RNNs) to predict the relative likelihood of a sequence of objects. This provides a model with which we can predict the next object requests given a previous sequence of requests.

We tested our approach using simulations on traces taken from a real-world, publicly available, cloud-based multi-tenant object store: IBM Cloud Object Storage on the IBM Cloud. Object stores are uniquely suited to machine learning based prefetching since each object is available with its metadata, enabling machine learning algorithms to take advantage of the semantic relationships contained within. We implemented our algorithm and tested it on real-world data, studied the benefits and issues involved, and compared our results to other cache replacement policies. We built a simulation to compare its hit rate to that of an LRU-based cache and found that under the right conditions it outperforms LRU. In this poster we dive into the details of our neural network based algorithm and describe our insights into when machine learning based prefetching outperforms regular cache replacement algorithms and when it does worse.
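The embedding phase can be sketched with gensim's Word2Vec as a stand-in for the word2vec-style training the abstract describes. The toy trace format and the nearest-neighbor prefetch heuristic below are illustrative assumptions: the paper's system feeds the embeddings into an RNN to predict the next request rather than using raw cosine similarity.

    # Minimal sketch of object embedding over request traces.
    # Each "sentence" is a time-ordered window of object identifiers,
    # playing the role of a sentence in a text corpus.
    from gensim.models import Word2Vec

    traces = [
        ["img/0001", "img/0002", "labels/0001", "img/0003"],
        ["img/0002", "labels/0001", "img/0004"],
        # ... many more request windows from the trace ...
    ]

    # Skip-gram (sg=1) tends to handle rare tokens, i.e. infrequently
    # accessed objects, better than CBOW.
    model = Word2Vec(traces, vector_size=64, window=5, min_count=1, sg=1)

    def prefetch_candidates(obj_id, k=3):
        """Objects whose embeddings lie nearest to obj_id are those most
        often requested close in time to it -- candidates to pull into
        the shared cache before they are asked for. (Stand-in heuristic;
        the paper uses an RNN over the embeddings instead.)"""
        return [name for name, _ in model.wv.most_similar(obj_id, topn=k)]

    print(prefetch_candidates("img/0002"))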


Symposium on Cloud Computing | 2017

Stocator: an object store aware connector for Apache Spark

Gil Vernik; Michael Factor; Elliot K. Kolodner; Effi Ofer; Pietro Michiardi; Francesco Pace

Data is the natural resource of the 21st century. It is being produced at dizzying rates, e.g., for genomics, for media and entertainment, and for the Internet of Things. Object storage systems such as Amazon S3, Azure Blob Storage, and IBM Cloud Object Storage are highly scalable distributed storage systems that offer high-capacity, cost-effective storage. But it is not enough just to store data; we also need to derive value from it. Apache Spark is the leading big data analytics processing engine, combining MapReduce, SQL, streaming, and complex analytics. We present Stocator, a high performance storage connector that enables Spark to work directly on data stored in object storage systems, while providing the same correctness guarantees as Hadoop's original storage system, HDFS.

Current object storage connectors from the Hadoop community, e.g., for the S3 and Swift APIs, do not deal well with eventual consistency, which can lead to failure. These connectors assume file system semantics, which is natural given that their model of operation is based on interaction with HDFS. In particular, Spark and Hadoop achieve fault tolerance and enable speculative execution by creating temporary files, listing directories to identify these files, and then renaming them. This paradigm avoids interference between tasks doing the same work and thus writing output with the same name. However, with eventually consistent object storage, a container listing may not yet include a recently created object, so an object may not be renamed, leading to incomplete or incorrect results. Solutions such as EMRFS [1] from Amazon, S3mper [4] from Netflix, and S3Guard [2] attempt to overcome eventual consistency by requiring additional strongly consistent data storage. These solutions require multiple storage systems, are costly, and can introduce consistency issues between the stores. Current object storage connectors from the Hadoop community are also notorious for their poor performance on write workloads. This, too, stems from their use of the rename operation, which is not a native object storage operation; not only is it not atomic, it must be implemented with a costly copy operation followed by a delete. Others have tried to improve the performance of object storage connectors by eliminating rename, e.g., the DirectParquetOutputCommitter [5] for S3a introduced by Databricks, but failed to preserve fault tolerance and speculation.

Stocator takes advantage of object storage semantics to achieve both high performance and fault tolerance. It eliminates the rename paradigm by writing each output object to its final name. The name includes both the part number and the attempt number, so that multiple attempts to write the same part use different objects. Stocator proposes to extend an already existing success indicator object, written at the end of a Spark job, to include a manifest with the names of all the objects that compose the final output; this ensures that a subsequent job will correctly read the output without resorting to a list operation whose results may not be consistent. By leveraging the inherent atomicity of object creation and using a manifest, we obtain fault tolerance and enable speculative execution; by avoiding the rename paradigm we greatly decrease the complexity of the connector and the number of operations on the object storage.

We have implemented our connector and shared it in open source [3]. We compared its performance with the S3a and Hadoop Swift connectors over a range of workloads and found that it executes many fewer operations on the object storage, in some cases as few as one thirtieth as many. Since the price of an object storage service typically includes charges based on the number of operations executed, this reduction in operations lowers costs for clients in addition to reducing the load on client software. It also reduces costs and load for the object storage provider, which can serve more clients with the same amount of processing power. Stocator also substantially increases performance for Spark workloads running over object storage, especially write-intensive workloads, where it is as much as 18 times faster.
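The write path described above, final object names that embed part and attempt numbers plus a manifest in the success object, can be sketched schematically. The Python below is a toy illustration against an in-memory store, not the connector's actual Java implementation; all paths and names are hypothetical.

    # Schematic sketch of Stocator's rename-free commit idea.
    import json

    class InMemoryStore:
        """Toy stand-in for an object store; single-object PUTs are atomic."""
        def __init__(self):
            self.objects = {}

        def put(self, name, data):
            self.objects[name] = data

    def part_object_name(output_path, part, attempt):
        """Final object name for one task attempt: the part and attempt
        numbers are baked into the name, so competing speculative attempts
        for the same part write distinct objects and never need a rename."""
        return f"{output_path}/part-{part:05d}-attempt-{attempt}"

    def commit_job(store, output_path, winners):
        """On job commit, write the success marker with a manifest naming
        exactly one object per part. Readers consult the manifest instead
        of an (eventually consistent) container listing."""
        manifest = {str(part): part_object_name(output_path, part, attempt)
                    for part, attempt in winners.items()}
        store.put(f"{output_path}/_SUCCESS", json.dumps(manifest))

    # Example: two attempts raced on part 0; attempt 1 won.
    store = InMemoryStore()
    store.put(part_object_name("results/run1", 0, 0), "partial output")
    store.put(part_object_name("results/run1", 0, 1), "row data")
    store.put(part_object_name("results/run1", 1, 0), "row data")
    commit_job(store, "results/run1", {0: 1, 1: 0})
    print(store.objects["results/run1/_SUCCESS"])

Because each object is created atomically under its final name and the manifest pins the winning attempt per part, losing attempts are simply ignored, which is how the scheme preserves speculative execution without any copy-and-delete rename.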


ACM International Conference on Systems and Storage | 2017

Stocator: a high performance object store connector for Spark

Gil Vernik; Michael Factor; Elliot K. Kolodner; Effi Ofer; Pietro Michiardi; Francesco Pace

Data is the natural resource of the 21st century. It is being produced at dizzying rates, e.g., for genomics by sequencers, for media and entertainment in very high resolution formats, and for the Internet of Things (IoT) by multitudes of sensors. Object stores such as AWS S3, Azure Blob Storage, and IBM Cloud Object Storage are highly scalable distributed storage systems that offer high-capacity, cost-effective storage for this data. But it is not enough just to store data; we also need to derive value from it. Apache Spark is the leading big data analytics processing engine; it runs up to one hundred times faster than Hadoop MapReduce and combines SQL, streaming, and complex analytics. In this poster we present Stocator, a high performance storage connector that enables Spark to work directly on data stored in object storage systems.


Archive | 2004

Consistent reintegration of a failed primary instance

Kevin J. Cherkauer; Scott David Lashley; Steven R. Pearson; Effi Ofer; Xun Xue; Roger L. Q. Zheng


Archive | 2005

Log stream validation in log shipping data replication systems

Effi Ofer; David M. Mooney; Steven R. Pearson; Xun Xue; Kevin J. Cherkauer


Archive | 2003

Dropped database table recovery

Matthew A. Huras; Dale M. McInnis; Effi Ofer; Michael J. Winer; Roger L. Q. Zheng


Archive | 2004

Log shipping data replication with parallel log writing and log shipping at the primary site

Kevin J. Cherkauer; Scott David Lashley; Dale M. McInnis; Effi Ofer; Steven R. Pearson


Archive | 2003

Discriminatory replay of log files during table space recovery in a database management system

Effi Ofer; Matthew A. Huras; Michael J. Winer; Roger L. Q. Zheng; Dale M. McInnis


Archive | 2014

Handling failed cluster members when replicating a database between clusters

Kirill Bogdanov; Mark Dennehy; Diarmuid Flynn; Bruce M. Jackson; Marzia Mura; Effi Ofer; Jason Christopher Young; Roger L. Q. Zheng; Yuke Zhuge
