
Publication


Featured research published by Dana Shapira.


Discrete Applied Mathematics | 2016

Random access to Fibonacci encoded files

Shmuel T. Klein; Dana Shapira

A Wavelet tree is a data structure adjoined to a file that has been compressed by a variable-length encoding; it allows direct access into the underlying file, so that the compressed file itself is no longer needed. In this paper we adapt the Wavelet tree to Fibonacci codes, so that in addition to supporting direct access to the Fibonacci encoded file, we also increase the compression savings compared to the original Fibonacci compressed file. The improvement is achieved by means of a new pruning technique.
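As a concrete illustration of the encoding involved, here is a minimal sketch of the Fibonacci code itself (the names `fib_encode`/`fib_decode` are ours, not the paper's): each codeword is the Zeckendorf representation of a positive integer, with bits for the Fibonacci numbers 1, 2, 3, 5, ... and an appended stop bit, so every codeword ends in "11".

```python
def fib_encode(n):
    """Fibonacci code of a positive integer n: Zeckendorf bits from the
    smallest Fibonacci number upward, plus a terminating '1'."""
    assert n >= 1
    fibs = [1, 2]
    while fibs[-1] < n:
        fibs.append(fibs[-1] + fibs[-2])
    bits = []
    for f in reversed(fibs):                 # greedy Zeckendorf selection
        bits.append('1' if f <= n else '0')
        if f <= n:
            n -= f
    # reverse so the smallest Fibonacci number comes first, drop unused
    # high positions, append the stop bit
    return ''.join(reversed(bits)).rstrip('0') + '1'

def fib_decode(bits):
    """Decode a stream of concatenated Fibonacci codewords."""
    fibs = [1, 2]
    out, val, i, prev = [], 0, 0, False
    for b in bits:
        if b == '1' and prev:                # "11" terminates a codeword
            out.append(val)
            val, i, prev = 0, 0, False
            continue
        if b == '1':
            while i >= len(fibs):
                fibs.append(fibs[-1] + fibs[-2])
            val += fibs[i]
        i += 1
        prev = (b == '1')
    return out
```

The terminating "11" pair, which cannot occur inside a codeword, is what makes synchronization and random access into a Fibonacci encoded file feasible.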


Journal of Discrete Algorithms | 2017

A space efficient direct access data structure

Gilad Baruch; Shmuel T. Klein; Dana Shapira

In previous work we suggested a data structure based on pruning a Huffman shaped Wavelet tree according to the underlying skeleton Huffman tree. This pruned Wavelet tree was designed to support faster random access and reduce memory storage, at the price of less efficient rank and select operations, compared to the original Huffman shaped Wavelet tree. In this paper we improve the pruning procedure and give empirical evidence that when memory storage is the main concern, our suggested data structure outperforms other direct access techniques, such as those due to Külekci, DACs and sampling, at the price of a slowdown relative to DACs and fixed-length encoding.
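For readers unfamiliar with the baseline structure, here is a minimal wavelet tree supporting `access(i)`. This is an illustrative sketch only: it is balanced rather than Huffman shaped, is unpruned, and answers rank queries by a naive linear scan where a real structure would use an o(n)-bit rank index.

```python
class WaveletTree:
    """Plain balanced wavelet tree supporting access(i)."""
    def __init__(self, seq, alphabet=None):
        self.alphabet = sorted(set(seq)) if alphabet is None else alphabet
        if len(self.alphabet) > 1:
            mid = len(self.alphabet) // 2
            left = set(self.alphabet[:mid])
            # one bit per symbol: 0 = routed to left child, 1 = to right
            self.bits = [0 if s in left else 1 for s in seq]
            self.left = WaveletTree([s for s in seq if s in left],
                                    self.alphabet[:mid])
            self.right = WaveletTree([s for s in seq if s not in left],
                                     self.alphabet[mid:])

    def access(self, i):
        """Return seq[i] without storing seq itself."""
        if len(self.alphabet) == 1:
            return self.alphabet[0]
        b = self.bits[i]
        # rank of position i among equal bits = position in the child
        r = sum(1 for x in self.bits[:i] if x == b)
        return (self.right if b else self.left).access(r)
```

The bit-vectors of the nodes fully replace the original sequence; pruning, as in the paper, removes nodes whose answers can be read off in blocks.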


Theoretical Computer Science | 2016

Compressed matching for feature vectors

Shmuel T. Klein; Dana Shapira

We investigate the problem of compressing a large collection of feature vectors so that object identification can be performed on the compressed form of the features. The idea is to match a query image against an image database directly on the compressed form of the descriptor vectors, without decompression. Specifically, we concentrate on the Scale Invariant Feature Transform (SIFT), a well-known object detection method, as well as on Dense SIFT and PHOW features, which contain, for each image, about 300 times as many vectors as the original SIFT. Given two feature vectors, we suggest compressing them losslessly by means of a Fibonacci code, for which pairwise matching can be done directly on the compressed files. In our experiments, this approach improves the processing time and incurs only a small loss in compression efficiency relative to standard compressors, which require a decoding phase.
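A minimal sketch of the matching-in-compressed-form idea, restricted to exact matches of vectors with positive integer components (the paper's SIFT setting with similarity matching is more elaborate, and all function names here are ours): since the Fibonacci code is uniquely decodable, two vectors are equal exactly when their concatenated encodings are equal, so comparisons never need a decoding phase.

```python
def fib_encode(n):
    """Fibonacci code: Zeckendorf bits (for 1, 2, 3, 5, ...) plus a
    terminating '1', so every codeword ends in '11'."""
    fibs = [1, 2]
    while fibs[-1] < n:
        fibs.append(fibs[-1] + fibs[-2])
    bits = []
    for f in reversed(fibs):
        bits.append('1' if f <= n else '0')
        if f <= n:
            n -= f
    return ''.join(reversed(bits)).rstrip('0') + '1'

def compress(vector):
    """Concatenated Fibonacci codewords of one feature vector."""
    return ''.join(fib_encode(v) for v in vector)

def match_compressed(compressed_db, query):
    """Indices of database vectors equal to `query`, decided purely by
    comparing compressed strings."""
    q = compress(query)
    return [i for i, c in enumerate(compressed_db) if c == q]
```

Usage: build the database once as `db = [compress(v) for v in vectors]` and answer all subsequent queries on the compressed strings.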


The Computer Journal | 2018

Context Sensitive Rewriting Codes for Flash Memory

Shmuel T. Klein; Dana Shapira

Writing data on flash memory is asymmetric: it is possible to change a 0-bit into a 1, but erasing a 1 back to 0 is much more expensive and can only be done in entire blocks. This has triggered the development of rewriting codes, in which new data can overwrite the old, subject to the constraint of never changing a 1 into a 0. We introduce the notion of context-sensitive rewriting codes and analyze the compression performance of a family of such codes based on generalizations of the Fibonacci sequence. The analysis is then compared with experimental results.
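The Fibonacci based codes analyzed in the paper are more involved, but the basic rewriting-code idea can be illustrated with the classic Rivest–Shamir write-once-memory code, which writes a 2-bit value twice into 3 cells using only 0-to-1 transitions (this is a stand-in example, not the paper's construction):

```python
# First-generation codewords for the four 2-bit values; the second
# generation is simply the bitwise complement of the first.
FIRST = {0b00: 0b000, 0b01: 0b100, 0b10: 0b010, 0b11: 0b001}
DECODE1 = {c: v for v, c in FIRST.items()}

def write(cells, value):
    """Return new contents of the 3 cells encoding `value`, using only
    0 -> 1 transitions; raises once the cells are exhausted."""
    cand = FIRST[value]
    if cells & cand == cells:            # first generation still reachable
        return cand
    cand = (~FIRST[value]) & 0b111       # second generation
    if cells & cand != cells:
        raise ValueError("cells exhausted, block erase needed")
    return cand

def read(cells):
    """Low-weight words are first generation; high-weight words are
    complemented second generation."""
    if bin(cells).count('1') <= 1:
        return DECODE1[cells]
    return DECODE1[(~cells) & 0b111]
```

The invariant checked in `write` (old bits stay set in the new word) is exactly the flash constraint that motivates such codes.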


Archive | 2018

Applying Compression to Hierarchical Clustering

Gilad Baruch; Shmuel T. Klein; Dana Shapira

Hierarchical Clustering is widely used in Machine Learning and Data Mining. It stores bit-vectors in the nodes of a k-ary tree, usually without trying to compress them. We suggest a data compression application of hierarchical clustering that makes double use of the XOR operations defining the Hamming distance of the clustering process: the same operation also transforms the vector in a node into a more compressible form, as a function of the vector in its parent node. Compression is then achieved by run-length encoding, followed by optional Huffman coding, and we show how the compressed file may be processed directly, without decompression.
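The double use of XOR described above can be sketched as follows, on plain 0/1 strings (the function names are illustrative):

```python
from itertools import groupby

def xor_transform(parent, child):
    """XOR a node's bit-vector with its parent's; vectors in nearby
    cluster nodes are similar, so the result is mostly zeros."""
    return ''.join('1' if p != c else '0' for p, c in zip(parent, child))

def rle(bits):
    """Run-length encode as (bit, run length) pairs; the paper follows
    this with optional Huffman coding of the lengths."""
    return [(b, len(list(g))) for b, g in groupby(bits)]
```

Because XOR is its own inverse, `xor_transform(parent, transformed)` restores the child, so decompression simply replays the transform top-down through the tree.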


String Processing and Information Retrieval | 2017

Optimal Skeleton Huffman Trees

Shmuel T. Klein; Tamar C. Serebro; Dana Shapira

A skeleton Huffman tree is a Huffman tree from which all complete subtrees of depth h ≥ 1 have been pruned. Skeleton Huffman trees are used to save storage and enhance processing time in several applications such as decoding, compressed pattern matching and Wavelet trees for random access. However, the straightforward way of basing the construction of a skeleton tree on a canonical Huffman tree does not necessarily yield the least number of nodes. The notion of optimal skeleton trees is introduced, and an algorithm for achieving such trees is investigated. The resulting more compact trees can be used to further enhance the time and space complexities of the corresponding algorithms.
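A minimal sketch of the skeleton idea (not the paper's optimal construction, and with names of our choosing): a prefix becomes a skeleton leaf once all codewords below it share one common length, because a full Huffman subtree with uniform leaf depth is complete; from that point the decoder can read the remaining bits of a codeword in a single block.

```python
def find_skeleton(codes):
    """Map each skeleton-leaf prefix to the common length of all
    codewords below it."""
    skeleton = {}
    def visit(prefix):
        lengths = {len(c) for c in codes.values() if c.startswith(prefix)}
        if len(lengths) == 1:            # complete subtree: prune here
            skeleton[prefix] = lengths.pop()
        else:
            visit(prefix + '0')
            visit(prefix + '1')
    visit('')
    return skeleton

def decode(bits, codes):
    """Walk bit by bit only down to a skeleton leaf, then consume the
    rest of the codeword as one block."""
    skeleton = find_skeleton(codes)
    inv = {c: s for s, c in codes.items()}
    out, i = [], 0
    while i < len(bits):
        prefix = ''
        while prefix not in skeleton:
            prefix += bits[i]
            i += 1
        rest = skeleton[prefix] - len(prefix)
        out.append(inv[prefix + bits[i:i + rest]])
        i += rest
    return out
```

The fewer nodes the skeleton has, the earlier the bit-by-bit walk stops; the paper's contribution is choosing the Huffman tree so that this node count is minimized.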


Language and Automata Theory and Applications | 2017

Integrated Encryption in Dynamic Arithmetic Compression

Shmuel T. Klein; Dana Shapira

A compression cryptosystem based on adaptive arithmetic coding is proposed, in which the updates of the frequency tables for the underlying alphabet are done selectively, according to some secret key K. We give empirical evidence that the compression performance is not hurt, and also discuss aspects of using the system as an encryption method.
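The selective-update idea can be sketched as follows; the class, and the SHA-256 expansion of the key into a bit stream, are our illustrative choices rather than the paper's construction, and the arithmetic coder itself is omitted.

```python
import hashlib
from itertools import cycle

class KeyedAdaptiveModel:
    """Adaptive frequency model with key-gated updates: after coding a
    symbol, its count is incremented only when the next key bit is 1,
    so only a decoder holding the same key stays synchronized."""
    def __init__(self, alphabet, key):
        self.freq = {s: 1 for s in alphabet}
        # expand the secret key into a repeating pseudo-random bit stream
        digest = hashlib.sha256(key).digest()
        self.bits = cycle(''.join(format(b, '08b') for b in digest))

    def update(self, symbol):
        if next(self.bits) == '1':       # the selective update
            self.freq[symbol] += 1

    def interval(self, symbol):
        """Sub-interval of [0, 1) an arithmetic coder would assign to
        `symbol` under the current counts."""
        total = sum(self.freq.values())
        lo = sum(f for s, f in sorted(self.freq.items()) if s < symbol)
        return lo / total, (lo + self.freq[symbol]) / total
```

Without the key, an attacker cannot reproduce the sequence of models, so the interval boundaries, and hence the arithmetic-coded output, cannot be tracked.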


International Conference on Technologies and Applications of Artificial Intelligence | 2014

Expert-Based Fusion Algorithm of an Ensemble of Anomaly Detection Algorithms

Esther David; Guy Leshem; Michal Chalamish; Alvin Chiang; Dana Shapira

Data fusion systems are widely used in areas such as sensor networks, robotics, video and image processing, and intelligent system design. Data fusion is a technology that enables combining information from several sources in order to form a unified picture or decision. Today, anomaly detection algorithms (ADAs) are used in a wide variety of applications, e.g. cyber security systems. In this research we focus on the process of integrating the output of multiple ADAs that operate within a particular domain. More specifically, we propose a two-stage fusion process based on the expertise of the individual ADAs, which is derived in the first stage. The main idea of the proposed method is to identify multiple types of outliers and to find a set of expert outlier detection algorithms for each type, using semi-supervised methods. Preliminary experiments for the single-type outlier case show that our method outperforms other benchmark methods from the literature.
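The two-stage scheme can be sketched as follows, using plain validation accuracy as a stand-in for the paper's semi-supervised expertise derivation (all names are hypothetical):

```python
def derive_expertise(detectors, X_val, y_val):
    """Stage 1: derive a per-detector expertise weight from a small
    labeled validation set; here, accuracy of the 0/1 verdicts."""
    expertise = []
    for detect in detectors:
        hits = sum(detect(x) == y for x, y in zip(X_val, y_val))
        expertise.append(hits / len(y_val))
    return expertise

def fuse(detectors, expertise, x, threshold=0.5):
    """Stage 2: expertise-weighted vote of all detectors on a point."""
    total = sum(expertise)
    score = sum(w * detect(x) for w, detect in zip(expertise, detectors))
    return score / total >= threshold
```

With one expertise vector per outlier type, as in the paper, stage 2 would consult the weights of whichever type best explains the candidate point.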


Discrete Applied Mathematics | 2018

Direct merging of delta encoded files

Dana Shapira


Discrete Applied Mathematics | 2018

Dynamic determination of variable sizes of chunks in a deduplication system

Michael Hirsch; Shmuel T. Klein; Dana Shapira; Yair Toaff

Collaboration


Dana Shapira's main co-authors.

Top Co-Authors

Esther David

Ashkelon Academic College

Guy Leshem

Ashkelon Academic College

Alvin Chiang

National Taiwan University of Science and Technology
