Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Kave Eshghi is active.

Publication


Featured research published by Kave Eshghi.


Modeling, Analysis, and Simulation of Computer and Telecommunication Systems | 2009

Extreme Binning: Scalable, parallel deduplication for chunk-based file backup

Deepavali Bhagwat; Kave Eshghi; Darrell D. E. Long; Mark David Lillibridge

Data deduplication is an essential and critical component of backup systems. Essential, because it reduces storage space requirements, and critical, because the performance of the entire backup operation depends on its throughput. Traditional backup workloads consist of large data streams with high locality, which existing deduplication techniques require to provide reasonable throughput. We present Extreme Binning, a scalable deduplication technique for non-traditional backup workloads that are made up of individual files with no locality among consecutive files in a given window of time. Due to lack of locality, existing techniques perform poorly on these workloads. Extreme Binning exploits file similarity instead of locality, and makes only one disk access for chunk lookup per file, which gives reasonable throughput. Multi-node backup systems built with Extreme Binning scale gracefully with the amount of input data; more backup nodes can be added to boost throughput. Each file is allocated using a stateless routing algorithm to only one node, allowing for maximum parallelization, and each backup node is autonomous with no dependency across nodes, making data management tasks robust with low overhead.
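The routing step described above can be sketched in Python. This is a minimal illustration only: it uses fixed-size chunks and SHA-1 chunk IDs for simplicity (Extreme Binning uses content-defined chunking), and the function names are ours, not the paper's. The key idea is that the target bin is a pure function of file content, so routing is stateless and each node stays autonomous.

```python
import hashlib

def chunk_hashes(data: bytes, chunk_size: int = 4096) -> list:
    """Hash fixed-size chunks; a real system would use content-defined chunking."""
    return [hashlib.sha1(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]

def representative_chunk_id(hashes: list) -> str:
    """The minimum chunk hash: by Broder's min-hash argument, similar
    files are likely to share it, so it can serve as the bin key."""
    return min(hashes)

def route_to_bin(data: bytes, num_bins: int) -> int:
    """Stateless routing: the bin (and hence backup node) is derived
    solely from file content, so no cross-node coordination is needed."""
    rep = representative_chunk_id(chunk_hashes(data))
    return int(rep, 16) % num_bins
```

Because the representative chunk ID is computed per file, only one bin is consulted per file, matching the single disk access per chunk lookup described in the abstract.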


Knowledge Discovery and Data Mining | 2005

Finding similar files in large document repositories

George Forman; Kave Eshghi; Stephane Chiocchetti

Hewlett-Packard has many millions of technical support documents in a variety of collections. As part of content management, such collections are periodically merged and groomed. In the process, it becomes important to identify and weed out support documents that are largely duplicates of newer versions. Doing so improves the quality of the collection, eliminates chaff from search results, and improves customer satisfaction. The technical challenge is that, through workflow and human processes, the knowledge of which documents are related is often lost. We required a method that could identify similar documents based on their content alone, without relying on metadata, which may be corrupt or missing. We present an approach for finding similar files that scales up to large document repositories. It is based on chunking the byte stream to find unique signatures that may be shared by multiple files. An analysis of the file-chunk graph yields clusters of related files. An optional bipartite graph partitioning algorithm can be applied to greatly increase scalability.
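The file-chunk graph analysis can be sketched as connected components over files linked by shared chunk signatures. This is a toy version under stated assumptions: fixed-size chunks stand in for the paper's byte-stream chunking, and all names are ours; the optional bipartite partitioning step is omitted.

```python
import hashlib
from collections import defaultdict

def chunk_signatures(data: bytes, size: int = 64) -> set:
    """Signatures of fixed-size chunks (a stand-in for content-based
    chunking of the byte stream)."""
    return {hashlib.sha1(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)}

def cluster_similar_files(files: dict) -> list:
    """Connected components of the file-chunk graph: files that share
    at least one chunk signature land in the same cluster."""
    chunk_to_files = defaultdict(list)
    for name, data in files.items():
        for sig in chunk_signatures(data):
            chunk_to_files[sig].append(name)
    parent = {name: name for name in files}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for members in chunk_to_files.values():
        for other in members[1:]:
            parent[find(members[0])] = find(other)
    groups = defaultdict(set)
    for name in files:
        groups[find(name)].add(name)
    return list(groups.values())
```

In practice a threshold on the number of shared signatures, rather than "any shared chunk", would separate near-duplicates from files with incidental overlap.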


Knowledge Discovery and Data Mining | 2008

Locality sensitive hash functions based on concomitant rank order statistics

Kave Eshghi; Shyam Sundar Rajaram

Locality Sensitive Hash (LSH) functions are invaluable tools for approximate near neighbor problems in high dimensional spaces. In this work, we focus on LSH schemes where the similarity metric is the cosine measure. The contribution of this work is a new class of locality sensitive hash functions for the cosine similarity measure based on the theory of concomitants, which arises in order statistics. Consider n i.i.d. sample pairs, {(X_1, Y_1), (X_2, Y_2), ..., (X_n, Y_n)}, obtained from a bivariate distribution f(X, Y). Concomitant theory captures the relation between the order statistics of X and Y in the form of a rank distribution given by Prob(Rank(Y_i) = j | Rank(X_i) = k). We exploit properties of the rank distribution to develop a locality sensitive hash family that has excellent collision rate properties for the cosine measure. The computational cost of the basic algorithm is high for large hash lengths. We introduce several approximations, based on the properties of concomitant order statistics and discrete transforms, that perform almost as well with significantly reduced computational cost. We demonstrate the practical applicability of our algorithms by using them to find similar images in an image repository.
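For context, the classic LSH family for the cosine measure that schemes like this one improve upon is random-hyperplane hashing (often attributed to Charikar). The sketch below shows that baseline, not the paper's concomitant-based construction; all names are ours.

```python
import random

def simhash(v, planes):
    """Random-hyperplane LSH for the cosine measure: one bit per random
    Gaussian direction, set by the sign of the dot product. Two vectors
    at angle theta agree on each bit with probability 1 - theta/pi, so
    high cosine similarity means frequent hash collisions."""
    return [1 if sum(p_i * v_i for p_i, v_i in zip(p, v)) > 0 else 0
            for p in planes]

# 16-bit hashes for 4-dimensional vectors.
random.seed(0)
planes = [[random.gauss(0, 1) for _ in range(4)] for _ in range(16)]
```

Note the hash depends only on the direction of the input vector, which is exactly what a cosine-similarity LSH requires; the cost the abstract refers to grows with the hash length, motivating the paper's cheaper approximations.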


Operating Systems Review | 2009

Efficient detection of large-scale redundancy in enterprise file systems

George Forman; Kave Eshghi; Jaap Suermondt

In order to catch and reduce waste in the exponentially increasing demand for disk storage, we have developed very efficient technology to detect approximate duplication of large directory hierarchies. Such duplication can be caused, for example, by unnecessary mirroring of repositories by uncoordinated employees or departments. Identifying these duplicate or near-duplicate hierarchies allows appropriate action to be taken at a high level. For example, one could coordinate and consolidate multiple copies in one location.


International Workshop on Advanced Issues of E-Commerce and Web-Based Information Systems (WECWIS) | 2002

WebMon: A performance profiler for web transactions

Thomas Gschwind; Kave Eshghi; Pankaj K. Garg; Klaus Wurster

We describe WebMon, a tool for correlated, transaction-oriented performance monitoring of web services. Data collected with WebMon can be analyzed from a variety of perspectives: business, client, transaction, or systems. Maintainers of web services can use such analysis to better understand and manage the performance of their services. Moreover, WebMon's data will enable the construction of more accurate performance prediction models for web services. Current web logging techniques create a log file per server, making it difficult to correlate data from the log files with respect to a given transaction. Additionally, data about the quality of service perceived by the client is missing entirely. WebMon overcomes these limitations by providing heterogeneous instrumentation sensors and HTTP cookie-based correlators. In this paper, we present the design and implementation of WebMon and our experience in applying it to an HP Library web service.


International Conference on Distributed Computing Systems | 2002

Intrinsic references in distributed systems

Kave Eshghi

The notion of intrinsic references, i.e. references based on the hash digest of the referent, is introduced and contrasted with that of physical references, where the referent is defined relative to the state of a physical system. A retrieval mechanism using intrinsic references, the Elephant Store, is presented. The use of intrinsic references in hierarchical data structures is discussed, and the advantages regarding version management, consistency and distributed storage are argued.
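The core mechanism of an intrinsic-reference store can be illustrated with a toy content-addressable map. This is our own minimal sketch, not the Elephant Store's actual design; the class and method names are assumptions for illustration.

```python
import hashlib

class ContentStore:
    """Toy content-addressable store: objects are retrieved by the hash
    digest of their content (an intrinsic reference). Unlike a physical
    reference, such a reference can never silently point at changed
    data, which is what makes versioning and consistency simple."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        ref = hashlib.sha256(data).hexdigest()
        self._objects[ref] = data
        return ref

    def get(self, ref: str) -> bytes:
        data = self._objects[ref]
        # Integrity is verifiable: the recomputed digest must match.
        assert hashlib.sha256(data).hexdigest() == ref
        return data
```

In a hierarchical data structure, a parent object would store the intrinsic references of its children, so any change to a child yields a new parent reference, which is the versioning advantage the abstract argues for.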


Integrated Network Management | 1995

Managing in a distributed world

Adrian Pell; Kave Eshghi; Jean-Jacques Moreau; Simon Towers

The task of networked systems management has become increasingly complex in recent years. Reducing this complexity and permitting easy management are major challenges to the acceptance of networked systems and applications. This paper introduces a language for describing these systems and applications and gives an example of its use.


IEEE Transactions on Multimedia | 2014

Discrete Cosine Transform Locality-Sensitive Hashes for Face Retrieval

Mehran Kafai; Kave Eshghi; Bir Bhanu

Descriptors such as local binary patterns perform well for face recognition. Searching large databases with such descriptors has been problematic due to the cost of linear search and the inadequate performance of existing indexing methods. We present Discrete Cosine Transform (DCT) hashing for creating index structures for face descriptors. Hashes play the role of keywords: an index is created and then queried to find the images most similar to the query image. Common hash suppression is used to improve retrieval efficiency and accuracy. Results are shown on a combination of six publicly available face databases (LFW, FERET, FEI, BioID, Multi-PIE, and RaFD). DCT hashing achieves significantly better retrieval accuracy and is more efficient than other popular state-of-the-art hash algorithms.
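The "hashes as keywords" retrieval step can be sketched as an inverted index with collision voting. This is a generic sketch under our own names; the DCT hash computation itself and the common-hash suppression step (dropping hashes that occur in too many images) are omitted.

```python
from collections import Counter, defaultdict

def build_index(image_hashes: dict) -> dict:
    """Inverted index mapping each hash value to the images containing
    it; hashes play the same role keywords play in text retrieval."""
    index = defaultdict(set)
    for image, hashes in image_hashes.items():
        for h in hashes:
            index[h].add(image)
    return index

def query(index: dict, query_hashes: set, top_k: int = 5) -> list:
    """Rank database images by how many hashes they share with the
    query image; only posting lists for the query's hashes are touched,
    avoiding a linear scan of the whole database."""
    votes = Counter()
    for h in query_hashes:
        for image in index.get(h, ()):
            votes[image] += 1
    return [image for image, _ in votes.most_common(top_k)]
```

Suppressing very common hashes before indexing would shorten the longest posting lists, which is where both the efficiency and accuracy gains mentioned in the abstract come from.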


Knowledge Discovery and Data Mining | 2007

Content-based document routing and index partitioning for scalable similarity-based searches in a large corpus

Deepavali Bhagwat; Kave Eshghi; Pankaj Mehra

We present a document routing and index partitioning scheme for scalable similarity-based search of documents in a large corpus. We consider the case when similarity-based search is performed by finding documents that have features in common with the query document. While it is possible to store all the features of all the documents in one index, this suffers from obvious scalability problems. Our approach is to partition the feature index into multiple smaller partitions that can be hosted on separate servers, enabling scalable and parallel search execution. When a document is ingested into the repository, a small number of partitions are chosen to store its features. Likewise, at query time only a small number of partitions are consulted. Our approach is stateless and incremental: the decision as to which partitions a document's features should be routed to (for storage at ingestion time and for similarity-based search at query time) is based solely on the features of the document. Our approach scales very well. We show that executing similarity-based searches over such a partitioned search space has minimal impact on the precision and recall of search results, even though every search consults less than 3% of the total number of partitions.
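One way to realize such stateless, content-only routing is to send a document's features to the partitions selected by its smallest feature hashes, since similar documents tend to share their minimum-hash features. This is our own illustrative sketch of that idea, not the paper's exact scheme, and all names and the choice of k are assumptions.

```python
import hashlib

def route_features(features: set, num_partitions: int, k: int = 2) -> list:
    """Stateless routing: pick the partitions determined by the k
    smallest feature hashes. Similar documents share low-hash features,
    so their features land in overlapping partitions, and a query only
    needs to probe k of the num_partitions servers."""
    hashed = sorted(int(hashlib.sha1(f.encode()).hexdigest(), 16)
                    for f in features)
    return sorted({h % num_partitions for h in hashed[:k]})
```

Because the decision depends only on the document's own features, ingestion and query use the identical routing function, and adding documents never requires repartitioning state elsewhere.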


Lecture Notes in Computer Science | 2002

Enabling Network Caching of Dynamic Web Objects

Pankaj K. Garg; Kave Eshghi; Thomas Gschwind; Boudewijn R. Haverkort; Katinka Wolter

The World Wide Web is an important infrastructure for enabling modern information-rich applications. Businesses can lose value due to lack of timely employee communication, poor employee coordination, or poor brand image with slow or unresponsive web applications. In this paper, we analyze the responsiveness of an Intranet web application, i.e., an application within the corporate firewall. Using a new web monitoring tool called WebMon, we found, contrary to our initial expectations, substantial variations in the responsiveness for different users of the Intranet web application. As on the Internet, traditional caching approaches can improve the responsiveness of an Intranet web application as far as static objects are concerned. We provide a solution to enable network caching of dynamic web objects, which ordinarily would not be cached by clients and proxies. Overall, our solution significantly improved the performance of the web application and reduced the variance in the response times by three orders of magnitude. Our cache-enabling architecture can be used in other web applications.

Collaboration


Dive into Kave Eshghi's collaborations.

Top Co-Authors
