Oded Margalit | Researchain

Publication

Featured research published by Oded Margalit.

IEEE Conference on Mass Storage Systems and Technologies | 2012

Estimation of deduplication ratios in large data sets

Danny Harnik; Oded Margalit; Dalit Naor; Dmitry Sotnikov; Gil Vernik

We study the problem of accurately estimating the data reduction ratio achieved by deduplication and compression on a specific data set. This turns out to be a challenging task: it has been shown both empirically and analytically that essentially all of the data at hand needs to be inspected in order to come up with an accurate estimation when deduplication is involved. Moreover, even when permitted to inspect all the data, there are challenges in devising an efficient, yet accurate, method. Efficiency in this case refers to the demanding CPU, memory and disk usage associated with deduplication and compression. Our study focuses on what can be done when scanning the entire data set. We present a novel two-phased framework for such estimations. Our techniques are provably accurate, yet run with very low memory requirements and avoid overheads associated with maintaining large deduplication tables. We give formal proofs of the correctness of our algorithm, compare it to existing techniques from the database and streaming literature, and evaluate our technique on a number of real-world workloads. For example, we estimate the data reduction ratio of a 7 TB data set with accuracy guarantees of at most a 1% relative error while using as little as 1 MB of RAM (and no additional disk access). In the interesting case of full-file deduplication, our framework readily accepts optimizations that allow estimation on a large data set without reading most of the actual data. For one of the workloads we used in this work we achieved an accuracy guarantee of 2% relative error while reading only 27% of the data from disk. Our technique is practical, simple to implement, and useful for multiple scenarios, including estimating the number of disks to buy, choosing a deduplication technique, deciding whether or not to dedupe, and conducting large-scale academic studies related to deduplication ratios.
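To make the idea of a two-phase, low-memory estimate concrete, here is a minimal sketch of one sample-then-scan scheme. It is an illustration under stated assumptions, not necessarily the paper's algorithm: chunks are treated as equal-sized, the ratio is defined as distinct chunks over total chunks, and the names `chunk_hash` and `estimate_dedup_ratio` are hypothetical. The estimate relies on the identity that the number of distinct chunks equals the sum, over all chunks, of 1 divided by that chunk's multiplicity.

```python
import hashlib
import random
from collections import Counter


def chunk_hash(chunk: bytes) -> str:
    """Fingerprint a chunk (SHA-256 is an illustrative choice)."""
    return hashlib.sha256(chunk).hexdigest()


def estimate_dedup_ratio(chunks, sample_size, seed=0):
    """Estimate (distinct chunks / total chunks) with a sample pass and a scan pass."""
    rng = random.Random(seed)
    n = len(chunks)
    # Phase 1 (sample): pick chunk positions uniformly at random, with replacement.
    positions = [rng.randrange(n) for _ in range(sample_size)]
    sampled = [chunk_hash(chunks[i]) for i in positions]
    tracked = set(sampled)
    # Phase 2 (scan): read everything once, but count only the sampled hashes,
    # so memory grows with the sample size rather than with the data set.
    counts = Counter()
    for chunk in chunks:
        h = chunk_hash(chunk)
        if h in tracked:
            counts[h] += 1
    # Averaging 1/multiplicity over uniformly sampled chunks estimates distinct/total.
    return sum(1.0 / counts[h] for h in sampled) / sample_size
```

With ten identical chunks every sampled chunk has multiplicity 10, so the estimate is 0.1; with ten distinct chunks it is 1.0. On real data the accuracy depends on the sample size, which this sketch does not analyze.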

Theory and Applications of Satisfiability Testing | 2012

Perfect hashing and CNF encodings of cardinality constraints

Yael Ben-Haim; Alexander Ivrii; Oded Margalit; Arie Matsliah

We study the problem of encoding cardinality constraints (threshold functions) on Boolean variables into CNF. Specifically, we propose new encodings based on (perfect) hashing that are efficient in terms of the number of clauses, auxiliary variables, and propagation strength. We compare the properties of our encodings to known ones, and provide experimental results evaluating their practical effectiveness.
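For readers unfamiliar with the problem, the sketch below shows the naive (binomial) CNF encoding of an "at most k" constraint, which uses no auxiliary variables but a number of clauses that grows combinatorially. It is only a baseline for comparison, not the hashing-based encoding proposed in the paper; the function names and the DIMACS-style integer literals are illustrative assumptions.

```python
from itertools import combinations


def at_most_k(variables, k):
    """Naive CNF for 'at most k of the variables are true'.

    Variables are positive integers; a negative integer denotes a negated
    literal. Every subset of k + 1 variables must contain a false one.
    """
    return [tuple(-v for v in subset) for subset in combinations(variables, k + 1)]


def satisfies(clauses, true_vars):
    """Check whether the assignment (set of true variables) satisfies all clauses."""
    true_vars = set(true_vars)
    return all(
        any((lit > 0 and lit in true_vars) or (lit < 0 and -lit not in true_vars)
            for lit in clause)
        for clause in clauses
    )
```

For example, `at_most_k([1, 2, 3], 1)` yields the three clauses `(-1, -2)`, `(-1, -3)`, `(-2, -3)`. The clause count is C(n, k+1), which is what more compact encodings aim to avoid.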

IBM Journal of Research and Development | 2011

Smarter log analysis

Ehud Aharoni; Shai Fine; Yaara Goldschmidt; Ofer Lavi; Oded Margalit; Michal Rosen-Zvi; Lavi Shpigelman

Modern computer systems generate an enormous number of logs. IBM Mining Effectively Large Output Data Yield (MELODY) is a unique and innovative solution for handling these logs and filtering out the anomalies and failures. MELODY can detect system errors early on and avoid subsequent crashes by identifying the root causes of such errors. By analyzing the logs leading up to a problem, MELODY can pinpoint when and where things went wrong and visually present them to the user, ensuring that corrections are accurately and effectively done. We present the MELODY solution and describe its architecture, algorithmic components, functions, and benefits. After being trained on a large portion of relevant data, MELODY provides alerts of abnormalities in newly arriving log files or in streams of logs. The solution is being used by IBM services groups that support IBM xSeries® servers on a regular basis. MELODY was recently tested with ten large IBM customers who use zSeries® machines and was found to be extremely useful for the information technology experts in those companies. They found that the solution's ability to reduce extremely large log data to manageable sets of highlighted messages saved them time and helped them make better use of the data.

Information Theory and Applications | 2014

On the riddle of coding equality function in the garden hose model

Oded Margalit

Recently, Harry Buhrman et al. introduced a novel communication complexity model, called "garden-hose". This model sprouted from research on using quantum properties to allow for position-based cryptography. SAT is one of the fundamental NP-complete problems: finding a satisfying assignment to a CNF (Conjunctive Normal Form) formula, or proving that none exists. We will not get into the details of the quantum physics, nor will we explain the internal workings of SAT solvers. Instead we will start from the mathematical garden-hose model; describe the way we used a SAT solver as a tool; give some lower and upper bounds on implementing the equality function; and conclude with open questions.

International Conference on Software Testing, Verification and Validation Workshops | 2013

Better Bounds for Event Sequencing Testing

Oded Margalit

A permutation of a sequence of events is a common construction in many testing environments. Covering all possible permutations clearly has exponential cost, so one can ask for a partial (easier) requirement: to cover all possible orders, that is, the permutations induced on every small-cardinality subset of elements. In our paper we show better (both lower and upper) bounds on this event sequencing testing problem. We also discuss another variant of the problem where we impose restrictions on the permutations. In this case we show how to achieve an exponential growth in the complexity of the problem. We also give solutions for specific cases of the problem.
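The coverage requirement described above can be stated as a small checker: a test suite (a set of event sequences) is adequate for strength t if every ordering of every t events appears, in that relative order, in at least one sequence. The sketch below is a brute-force verifier only, not a construction achieving the paper's bounds; `covers_all_orders` is a hypothetical name.

```python
from itertools import permutations


def covers_all_orders(sequences, events, t):
    """True if every ordering of every t events occurs as a subsequence of some sequence.

    Each sequence is assumed to be a permutation of `events`.
    """
    # Precompute the position of each event in each sequence.
    positions = [{e: i for i, e in enumerate(seq)} for seq in sequences]
    for ordered in permutations(events, t):
        # The ordering is covered if some sequence places its events in this order.
        covered = any(
            all(pos[a] < pos[b] for a, b in zip(ordered, ordered[1:]))
            for pos in positions
        )
        if not covered:
            return False
    return True
```

For instance, a sequence together with its reverse covers every pairwise order (t = 2) for any number of events, but the same two sequences do not cover all orderings of three events.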

ACM Special Interest Group on Data Communication | 2017

Cluster-Based Load Balancing for Better Network Security

Gal Frishman; Yaniv Ben-Itzhak; Oded Margalit

In the big-data era, the amount of traffic is rapidly increasing. Therefore, scaling methods are commonly used, for instance an appliance composed of several instances (the scaled-out method) together with a load balancer that distributes incoming traffic among them. While the most common way of load balancing is based on round robin, some approaches optimize the load across instances according to the appliance-specific functionality, for instance load balancing for a scaled-out proxy server that increases the cache hit ratio. In this paper, we present a novel load-balancing approach for machine-learning-based security appliances. Our proposed load balancer uses a clustering method while keeping the load balanced across all of the network security appliance's instances. We demonstrate that our approach is scalable and improves the machine-learning performance of the instances, as compared to traditional load balancers.

Archive | 2014

Pareto Landscapes Analyses via Graph-Based Modeling for Interactive Decision-Making

Ofer Shir; Shahar Chen; David Amid; Oded Margalit; Michael Masin; Ateret Anaby-Tavor; David Boaz

We consider two complementary tasks for consuming optimization results of a given multiobjective problem by decision-makers. The underpinning in both exploratory tasks is analyzing Pareto landscapes, and we propose in both cases discrete graph-based reductions. Firstly, we introduce interactive navigation from a given suboptimal reference solution to Pareto efficient solution-points. The proposed traversal mechanism is based upon landscape improvement-transitions from the reference towards Pareto-dominating solutions in a baby-steps fashion – accepting relatively small variations in the design-space. The Efficient Frontier and the archive of Pareto suboptimal points are to be obtained by population-based multiobjective solvers, such as Evolutionary Multiobjective Algorithms. Secondly, we propose a framework for automatically recommending a preferable subset of points belonging to the Frontier that accounts for the decision-maker’s tendencies. We devise a line of action that activates one of two approaches: either recommending the top offensive team – the gain-prone subset of points, or the top defensive team – the loss-averse subset of points. We describe the entire recommendation process and formulate mixed-integer linear programs for solving its combinatorial graph-based problems.

International Symposium on Cyber Security Cryptography and Machine Learning | 2018

Brief Announcement: Adversarial Evasion of an Adaptive Version of Western Electric Rules.

Oded Margalit

The Western Electric rules are among the earliest and most widely used anomaly detection rules. In this paper we describe an adaptive scenario using these rules and show how a malicious player can optimally fabricate data that deceives the algorithm into enlarging the standard deviation of the data while avoiding detection.
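As background, the four Western Electric zone rules, as commonly stated, can each be expressed as "at least k of the last n points lie beyond a threshold, in sigmas, on the same side of the center line". The sketch below implements that static form with a fixed mean and standard deviation; it does not model the adaptive variant or the evasion strategy from the paper, and the function name is an illustrative assumption.

```python
def western_electric_violations(data, mean, sigma):
    """Return a sorted list of (index, rule_number) pairs where a rule fires.

    Rules as (rule number, k, window length n, threshold in sigmas):
      1: 1 of 1 beyond 3 sigma      2: 2 of 3 beyond 2 sigma
      3: 4 of 5 beyond 1 sigma      4: 8 of 8 on one side of the center line
    """
    rules = [(1, 1, 1, 3.0), (2, 2, 3, 2.0), (3, 4, 5, 1.0), (4, 8, 8, 0.0)]
    z = [(x - mean) / sigma for x in data]  # standardized deviations
    hits = []
    for rule, k, n, threshold in rules:
        for end in range(n - 1, len(z)):
            window = z[end - n + 1:end + 1]
            above = sum(1 for v in window if v > threshold)
            below = sum(1 for v in window if v < -threshold)
            # Points must fall on the same side, so count each side separately.
            if above >= k or below >= k:
                hits.append((end, rule))
    return sorted(hits)
```

A single point at 3.5 sigma triggers rule 1 at its index, while eight consecutive points at 0.5 sigma trigger only rule 4 at the eighth point.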

Haifa Verification Conference | 2014

Generating Modulo-2 Linear Invariants for Hardware Model Checking

Gadi Aleksandrowicz; Alexander Ivrii; Oded Margalit; Dan Rasin

We present an algorithm to automatically extract inductive modulo-2 linear invariants from a design. This algorithm makes use of basic linear algebra and is realized on top of an incremental SAT solver. The experimental results demonstrate that a large number of designs possess linear invariants that can be efficiently found by our method. We study how these invariants can be helpful in the contexts of model checking and synthesis.
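The linear-algebra ingredient can be illustrated with Gaussian elimination over GF(2). The sketch below assumes, purely for illustration, that candidate invariants are guessed from a set of sampled states: each state is a bit-vector (stored as a Python integer, with one extra bit fixed to 1 to allow a constant term), and every null-space vector describes a subset of bits whose XOR is 0 on all samples. This is not claimed to be the paper's procedure, and such candidates would still need to be proved inductive, for example with a SAT solver; `gf2_nullspace` is a hypothetical name.

```python
def gf2_nullspace(rows, width):
    """Basis of all bit-vectors a with parity(a & r) == 0 for every row r.

    Rows and results are integers used as bit-vectors of length `width`.
    """
    basis = []  # (pivot_column, row) pairs kept in reduced row echelon form
    for r in rows:
        # Eliminate existing pivot columns from the incoming row.
        for p, b in basis:
            if (r >> p) & 1:
                r ^= b
        if r:
            p = r.bit_length() - 1  # choose the highest set bit as the new pivot
            # Clear the new pivot column from earlier rows to stay fully reduced.
            basis = [(q, b ^ r if (b >> p) & 1 else b) for q, b in basis]
            basis.append((p, r))
    pivot_cols = {p for p, _ in basis}
    null = []
    for free in range(width):
        if free in pivot_cols:
            continue
        v = 1 << free  # set one free variable to 1, the others to 0
        for p, b in basis:
            if (b >> free) & 1:
                v |= 1 << p  # the pivot variable must cancel the free one
        null.append(v)
    return null
```

With two state bits that are always equal and a constant bit in position 2, the samples `0b100` and `0b111` give the single relation `0b011`, that is, bit 0 XOR bit 1 equals 0.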

Algorithmic Learning Theory | 2008

Finding the Rare Cube

Shlomo Hoory; Oded Margalit

In this paper we investigate the problem of active learning the partition of the $n$-dimensional hypercube into $m$ cubes, where the $i$-th cube has color $i$. The model we are using is exact learning via color evaluation queries, without equivalence queries, as proposed by the work of Fine and Mansour. We give a randomized algorithm solving this problem in $O(m \log n)$ expected number of queries, which is tight, while its expected running time is $O(m^2 n \log n)$. Furthermore, we generalize the problem to allow partitions of the cube into $m$ monochromatic parts, where each part is the union of $p$ cubes. We give two randomized algorithms for the generalized problem. The first uses $O(m p^2 2^p \log n)$ expected number of queries, which is almost tight with the lower bound. However, its naive implementation requires an exponential running time in $n$. The second, more practical, algorithm achieves a better running time complexity of
