Ran Wolff | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Ran Wolff is active.

Explore More

Publication

Featured researches published by Ran Wolff.

IEEE Internet Computing | 2006

Distributed Data Mining in Peer-to-Peer Networks

Souptik Datta; Kanishka Bhaduri; Chris Giannella; Ran Wolff; Hillol Kargupta

Peer-to-peer (P2P) networks are gaining popularity in many applications such as file sharing, e-commerce, and social networking, many of which deal with rich, distributed data sources that can benefit from data mining. P2P networks are, in fact, well-suited to distributed data mining (DDM), which deals with the problem of data analysis in environments with distributed data, computing nodes, and users. This article offers an overview of DDM applications and algorithms for P2P environments, focusing particularly on local algorithms that perform data analysis by using computing primitives with limited communication overhead. The authors describe both exact and approximate local P2P data mining algorithms that work in a decentralized and communication-efficient manner

Knowledge and Information Systems | 2013

In-network outlier detection in wireless sensor networks

Joel W. Branch; Chris Giannella; Boleslaw K. Szymanski; Ran Wolff; Hillol Kargupta

To address the problem of unsupervised outlier detection in wireless sensor networks, we develop an approach that (1) is flexible with respect to the outlier definition, (2) computes the result in-network to reduce both bandwidth and energy consumption, (3) uses only single-hop communication, thus permitting very simple node failure detection and message reliability assurance mechanisms (e.g., carrier-sense), and (4) seamlessly accommodates dynamic updates to data. We examine performance by simulation, using real sensor data streams. Our results demonstrate that our approach is accurate and imposes reasonable communication and power consumption demands.

international conference on management of data | 2001

Communication-efficient distributed mining of association rules

Assaf Schuster; Ran Wolff

Mining for associations between items in large transactional databases is a central problem in the field of knowledge discovery. When the database is partitioned among several share-nothing machines, the problem can be addressed using distributed data mining algorithms. One such algorithm, called CD, was proposed by Agrawal and Shafer in [1] and was later enhanced by the FDM algorithm of Cheung, Han et al. [5]. The main problem with these algorithms is that they do not scale well with the number of partitions. They are thus impractical for use in modern distributed environments such as peer-to-peer systems, in which hundreds or thousands of computers may interact. In this paper we present a set of new algorithms that solve the Distributed Association Rule Mining problem using far less communication. In addition to being very efficient, the new algorithms are also extremely robust. Unlike existing algorithms, they continue to be efficient even when the data is skewed or the partition sizes are imbalanced. We present both experimental and theoretical results concerning the behavior of these algorithms and explain how they can be implemented in different settings.

very large data bases | 2008

Providing k-anonymity in data mining

Arik Friedman; Ran Wolff; Assaf Schuster

In this paper we present extended definitions of k-anonymity and use them to prove that a given data mining model does not violate the k-anonymity of the individuals represented in the learning examples. Our extension provides a tool that measures the amount of anonymity retained during data mining. We show that our model can be applied to various data mining problems, such as classification, association rule mining and clustering. We describe two data mining algorithms which exploit our extension to guarantee they will generate only k-anonymous output, and provide experimental results for one of them. Finally, we show that our method contributes new and efficient ways to anonymize data and preserve patterns during anonymization.

international conference on data mining | 2003

A high-performance distributed algorithm for mining association rules

Assaf Schuster; Ran Wolff; Dan Trock

We present a new distributed association rule mining (D-ARM) algorithm that demonstrates superlinear speed-up with the number of computing nodes. The algorithm is the first D-ARM algorithm to perform a single scan over the database. As such, its performance is unmatched by any previous algorithm. Scale-up experiments over standard synthetic benchmarks demonstrate stable run time regardless of the number of computers. Theoretical analysis reveals a tighter bound on error probability than the one shown in the corresponding sequential algorithm. As a result of this tighter bound and by utilizing the combined memory of several computers, the algorithm generates far fewer candidates than comparable sequential algorithms—the same order of magnitude as the optimum.

international conference on data mining | 2003

Association rule mining in peer-to-peer systems

Ran Wolff; Assaf Schuster

We extend the problem of association rule mining-a key data mining problem-to systems in which the database is partitioned among a very large number of computers that are dispersed over a wide area. Such computing systems include grid computing platforms, federated database systems, and peer-to-peer computing environments. The scale of these systems poses several difficulties, such as the impracticality of global communications and global synchronization, dynamic topology changes of the network, on-the-fly data updates, the need to share resources with other applications, and the frequent failure and recovery of resources. We present an algorithm by which every node in the system can reach the exact solution, as if it were given the combined database. The algorithm is entirely asynchronous, imposes very little communication overhead, transparently tolerates network topology changes and node failures, and quickly adjusts to changes in the data as they occur. Simulation of up to 10 000 nodes show that the algorithm is local: all rules, except for those whose confidence is about equal to the confidence threshold, are discovered using information gathered from a very small vicinity, whose size is independent of the size of the system.We extend the problem of association rule mining - a key data mining problem - to systems in which the database is partitioned among a very large number of computers that are dispersed over a wide area. Such computing systems include GRID computing platforms, federated database systems, and peer-to-peer computing environments. The scale of these systems poses several difficulties, such as the impracticality of global communications and global synchronization, dynamic topology changes of the network, on-the-fly data updates, the need to share resources with other applications, and the frequent failure and recovery of resources. We present an algorithm by which every node in the system can reach the exact solution, as if it were given the combined database. The algorithm is entirely asynchronous, imposes very little communication overhead, transparently tolerates network topology changes and node failures, and quickly adjusts to changes in the data as they occur. Simulation of up to 10000 nodes show that the algorithm is local: all rules, except for those whose confidence is about equal to the confidence threshold, are discovered using information gathered from a very small vicinity, whose size is independent of the size of the system.

knowledge discovery and data mining | 2004

k-TTP: a new privacy model for large-scale distributed environments

Bobi Gilburd; Assaf Schuster; Ran Wolff

Secure multiparty computation allows parties to jointly compute a function of their private inputs without revealing anything but the output. Theoretical results [2] provide a general construction of such protocols for any function. Protocols obtained in this way are, however, inefficient, and thus, practically speaking, useless when a large number of participants are involved.The contribution of this paper is to define a new privacy model -- k-privacy -- by means of an innovative, yet natural generalization of the accepted trusted third party model. This allows implementing cryptographically secure efficient primitives for real-world large-scale distributed systems.As an example for the usefulness of the proposed model, we employ k-privacy to introduce a technique for obtaining knowledge -- by way of an association-rule mining algorithm -- from large-scale Data Grids, while ensuring that the privacy is cryptographically secure.

IEEE Transactions on Knowledge and Data Engineering | 2009

A Generic Local Algorithm for Mining Data Streams in Large Distributed Systems

Ran Wolff; Kanishka Bhaduri; Hillol Kargupta

In a large network of computers or wireless sensors, each of the components (henceforth, peers) has some data about the global state of the system. Much of the systems functionality such as message routing, information retrieval and load sharing relies on modeling the global state. We refer to the outcome of the function (e.g., the load experienced by each peer) as the \emph{model} of the system. Since the state of the system is constantly changing, it is necessary to keep the models up-to-date. Computing global data mining models e.g. decision trees, k-means clustering in large distributed systems may be very costly due to the scale of the system and due to communication cost, which may be high. The cost further increases in a dynamic scenario when the data changes rapidly. In this paper we describe a two step approach for dealing with these costs. First, we describe a highly efficient \emph{local} algorithm which can be used to monitor a wide class of data mining models. Then, we use this algorithm as a feedback loop for the monitoring of complex functions of the data such as its k-means clustering. The theoretical claims are corroborated with a thorough experimental analysis.

distributed computing in sensor systems | 2005

A local facility location algorithm for sensor networks

Denis Krivitski; Assaf Schuster; Ran Wolff

In this paper we address a well-known facility location problem (FLP) in a sensor network environment. The problem deals with finding the optimal way to provide service to a (possibly) very large number of clients. We show that a variation of the problem can be solved using a local algorithm. Local algorithms are extremely useful in a sensor network scenario. This is because they allow the communication range of the sensor to be restricted to the minimum, they can operate in routerless networks, and they allow complex problems to be solved on the basis of very little information, gathered from nearby sensors. The local facility location algorithm we describe is entirely asynchronous, seamlessly supports failures and changes in the data during calculation, poses modest memory and computational requirements, and can provide an anytime solution which is guaranteed to converge to the exact same one that would be computed by a centralized algorithm given the entire data.

IEEE Transactions on Knowledge and Data Engineering | 2005

Hierarchical decision tree induction in distributed genomic databases

Amir Bar-or; Daniel Keren; Assaf Schuster; Ran Wolff

Classification based on decision trees is one of the important problems in data mining and has applications in many fields. In recent years, database systems have become highly distributed, and distributed system paradigms, such as federated and peer-to-peer databases, are being adopted. In this paper, we consider the problem of inducing decision trees in a large distributed network of genomic databases. Our work is motivated by the existence of distributed databases in healthcare and in bioinformatics, and by the emergence of systems which automatically analyze these databases, and by the expectancy that these databases will soon contain large amounts of highly dimensional genomic data. Current decision tree algorithms require high communication bandwidth when executed on such data, which are large-scale distributed systems. We present an algorithm that sharply reduces the communication overhead by sending just a fraction of the statistical data. A fraction which is nevertheless sufficient to derive the exact same decision tree learned by a sequential learner on all the data-in the network. Extensive experiments using standard synthetic SNP data show that the algorithm utilizes the high dependency among attributes, typical to genomic data, to reduce communication overhead by up to 99 percent. Scalability tests show that the algorithm scales well with both the size of the data set, the dimensionality of the data, and the size of the distributed system.

Explore More