Kyumars Sheykh Esmaili

Publication

Featured research published by Kyumars Sheykh Esmaili.

international conference on big data | 2013

CORE: Cross-object redundancy for efficient data repair in storage systems

Kyumars Sheykh Esmaili; Lluis Pamies-Juarez; Anwitaman Datta

Erasure codes are an integral part of many distributed storage systems aimed at Big Data, since they provide high fault tolerance for low overheads. However, traditional erasure codes are inefficient at replenishing lost data (vital for long-term resilience) and at reading stored data in degraded environments (when nodes might be unavailable). Consequently, novel codes optimized to cope with the nuances of distributed storage systems are being vigorously researched. In this paper, we take an engineering alternative, exploring the use of simple and mature techniques: we juxtapose a standard erasure code with RAID-4-like parity to realize cross-object redundancy (CORE), and integrate it with HDFS. We benchmark the implementation in a proprietary cluster and in EC2. Our experiments show that for an extra 20% storage overhead (compared to traditional erasure codes), CORE yields up to 58% savings in bandwidth and is up to 76% faster when recovering a single failed node. The gains are 16% and 64%, respectively, for double node failures.
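The RAID-4-like parity across objects mentioned in this abstract can be illustrated with a minimal sketch. It is not the CORE implementation: the per-object erasure code is omitted, and the function names `xor_blocks`, `cross_object_parity`, and `repair_block` are hypothetical. It only shows how one XOR parity block per block position, computed across several objects, lets a single lost block be rebuilt from the blocks at the same position in the other objects.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of a non-empty list of equally sized blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("all blocks must have the same size")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def cross_object_parity(objects):
    """Return one parity block per block position, XORed across all objects.

    `objects` is a list of objects; each object is a list of blocks, and all
    objects are assumed to have the same number of blocks (an assumption of
    this sketch).
    """
    return [xor_blocks(list(column)) for column in zip(*objects)]


def repair_block(objects, parity, obj_idx, blk_idx):
    """Rebuild one lost block from the same position in the other objects."""
    survivors = [obj[blk_idx] for i, obj in enumerate(objects) if i != obj_idx]
    return xor_blocks(survivors + [parity[blk_idx]])


objects = [[b"ab", b"cd"], [b"ef", b"gh"], [b"ij", b"kl"]]
parity = cross_object_parity(objects)
recovered = repair_block(objects, parity, obj_idx=1, blk_idx=0)
```

In this toy setup the repair reads one block from each of the other objects plus one parity block, rather than decoding a whole erasure-coded object.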

distributed event-based systems | 2013

Ariadne: managing fine-grained provenance on data streams

Boris Glavic; Kyumars Sheykh Esmaili; Peter Fischer; Nesime Tatbul

Managing fine-grained provenance is a critical requirement for data stream management systems (DSMS), not only to address complex applications that require diagnostic capabilities and assurance, but also to provide advanced functionality such as revision processing or query debugging. This paper introduces a novel approach that uses operator instrumentation, i.e., modifying the behavior of operators, to generate and propagate fine-grained provenance through several operators of a query network. In addition to applying this technique to compute provenance eagerly during query execution, we also study how to decouple provenance computation from query processing to reduce run-time overhead and avoid unnecessary provenance retrieval. This includes computing a concise superset of the provenance, which allows lazily replaying a query network and reconstructing its provenance, as well as lazy retrieval to avoid unnecessary reconstruction of provenance. We develop stream-specific compression methods to reduce the computational and storage overhead of provenance generation and retrieval. Ariadne, our provenance-aware extension of the Borealis DSMS, implements these techniques. Our experiments confirm that Ariadne manages provenance with minor overhead and clearly outperforms query rewrite, the current state of the art.
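As a rough illustration of eager provenance propagation through instrumented operators, the sketch below attaches to every tuple the set of source-tuple IDs it derives from. It is a toy under stated assumptions, not Ariadne or the Borealis API: the names `Tup`, `source`, `select`, and `window_sum` are hypothetical, and the compression and lazy-replay techniques from the abstract are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tup:
    value: float
    prov: frozenset  # IDs of the source tuples this tuple was derived from


def source(values):
    """Wrap raw values; each source tuple's provenance is its own ID."""
    for i, v in enumerate(values):
        yield Tup(v, frozenset({i}))


def select(stream, pred):
    """Instrumented filter: passing tuples keep their provenance unchanged."""
    for t in stream:
        if pred(t.value):
            yield t


def window_sum(stream, size):
    """Instrumented tumbling-window sum.

    Each output tuple carries the union of the provenance of the tuples in
    its window. A trailing partial window is dropped in this sketch.
    """
    buf = []
    for t in stream:
        buf.append(t)
        if len(buf) == size:
            total = sum(x.value for x in buf)
            prov = frozenset().union(*(x.prov for x in buf))
            yield Tup(total, prov)
            buf = []


out = list(window_sum(select(source([1, 5, 2, 7, 9, 3]), lambda v: v > 2), 2))
```

Each result in `out` can be traced back to the exact input positions that contributed to it, even though the filter discarded some inputs before aggregation.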

2013 ACS International Conference on Computer Systems and Applications (AICCSA) | 2013

Building a Test Collection for Sorani Kurdish

Kyumars Sheykh Esmaili; Donya Eliassi; Shahin Salavati; Purya Aliabadi; Asrin Mohammadi; Somayeh Yosefi; Shownem Hakimi

Despite having a large number of speakers, Sorani, one of the two principal branches of the Kurdish language, is among the less-resourced languages. This paper reports on the outcomes of a project aimed at providing the essential resources for processing Sorani texts. The primary output of this project is Pewan, the first standard test collection for evaluating Sorani Information Retrieval systems. The other language resources that we have constructed in this project are: (i) a light stemmer, (ii) a list of affixes, and (iii) a list of stopwords. We also used these newly built resources to study the effectiveness of basic IR strategies on Sorani documents. Our experimental results show that normalization and, to a lesser extent, stemming can greatly improve the performance of Sorani IR systems.

ACM Transactions on Internet Technology | 2014

Efficient Stream Provenance via Operator Instrumentation

Boris Glavic; Kyumars Sheykh Esmaili; Peter Fischer; Nesime Tatbul

Managing fine-grained provenance is a critical requirement for data stream management systems (DSMS), not only for addressing complex applications that require diagnostic capabilities and assurance, but also for providing advanced functionality, such as revision processing or query debugging. This article introduces a novel approach that uses operator instrumentation, that is, modifying the behavior of operators, to generate and propagate fine-grained provenance through several operators of a query network. In addition to applying this technique to compute provenance eagerly during query execution, we also study how to decouple provenance computation from query processing to reduce runtime overhead and avoid unnecessary provenance retrieval. Our proposals include computing a concise superset of the provenance (to allow lazily replaying a query and reconstructing its provenance) as well as lazy retrieval (to avoid unnecessary reconstruction of provenance). We develop stream-specific compression methods to reduce the computational and storage overhead of provenance generation and retrieval. Ariadne, our provenance-aware extension of the Borealis DSMS, implements these techniques. Our experiments confirm that Ariadne manages provenance with minor overhead and clearly outperforms query rewrite, the current state of the art.

international conference on big data | 2013

Efficient updates in cross-object erasure-coded storage systems

Kyumars Sheykh Esmaili; Aatish Chiniah; Anwitaman Datta

In the past few years, erasure codes have been increasingly embraced by distributed storage systems as an alternative to replication, since they provide high fault tolerance for low overheads. Erasure codes, however, have a few shortcomings that need to be addressed to make them a complete solution for networked storage systems. Lack of support for efficient data repair and for efficient data update are the two most notable shortcomings. We recently proposed using a 2-dimensional product code (Reed-Solomon coding per object and simple XORing across objects) and showed that, at a reasonable storage overhead, it can greatly reduce the repair cost. In this paper we propose an efficient approach to handling data updates in cross-object erasure-coded storage systems. Our proposed solution has been implemented and experimentally evaluated. Our results show that, compared to the naive approach (re-encoding the data), our proposed scheme can considerably decrease the update cost, especially when the number of updated blocks is small.
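The contrast between re-encoding and a cheaper update can be sketched for the XOR dimension alone. The example below uses the standard XOR delta-update identity (new parity = old parity XOR old block XOR new block); it is an assumption that this resembles the paper's scheme, which the abstract does not spell out, and the Reed-Solomon dimension is not modeled. The function names are hypothetical.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equally sized blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same size")
    return bytes(x ^ y for x, y in zip(a, b))


def naive_parity(blocks):
    """Naive approach: recompute the parity by reading every block."""
    parity = bytes(len(blocks[0]))  # all-zero block of the right size
    for block in blocks:
        parity = xor_bytes(parity, block)
    return parity


def delta_update(parity, old_block, new_block):
    """Update the parity from the changed block only.

    Reads the old parity and the old and new versions of the one updated
    block; the unchanged blocks are never touched.
    """
    return xor_bytes(parity, xor_bytes(old_block, new_block))


blocks = [b"aa", b"bb", b"cc"]
parity = naive_parity(blocks)
new_block = b"zz"
updated = delta_update(parity, blocks[1], new_block)
recomputed = naive_parity([blocks[0], new_block, blocks[2]])
```

Both `updated` and `recomputed` hold the same parity, but the delta path touches a number of blocks proportional to the number of updated blocks rather than to the total, which is consistent with the abstract's observation that the gain is largest when few blocks change.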

ACM Transactions on Asian Language Information Processing | 2014

Towards Kurdish Information Retrieval

Kyumars Sheykh Esmaili; Shahin Salavati; Anwitaman Datta

The Kurdish language is an Indo-European language spoken in Kurdistan, a large geographical region in the Middle East. Despite having a large number of speakers, Kurdish is among the less-resourced languages and has not received much attention from the IR and NLP research communities. This article reports on the outcomes of a project aimed at providing essential resources for processing Kurdish texts. A principal output of this project is Pewan, the first standard test collection for evaluating Kurdish Information Retrieval systems. The other language resources that we have built include a lightweight stemmer and a list of stopwords. Our second principal contribution is using these newly built resources to conduct a thorough experimental study on Kurdish documents. Our experimental results show that normalization and, to a lesser extent, stemming can greatly improve the performance of Kurdish IR systems.

asia information retrieval symposium | 2013

Stemming for Kurdish Information Retrieval

Shahin Salavati; Kyumars Sheykh Esmaili; Fardin Akhlaghian

Resource scarcity and diversity (in both dialect and script) are the two primary challenges in Kurdish language processing. In this paper we address these two problems by building stemmers for the two main dialects of the Kurdish language (i.e., Sorani and Kurmanji) and investigating their effectiveness for Kurdish Information Retrieval.

arXiv: Distributed, Parallel, and Cluster Computing | 2013