Arik Friedman | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Arik Friedman is active.

Explore More

Publication

Featured researches published by Arik Friedman.

knowledge discovery and data mining | 2010

Data mining with differential privacy

Arik Friedman; Assaf Schuster

We consider the problem of data mining with formal privacy guarantees, given a data access interface based on the differential privacy framework. Differential privacy requires that computations be insensitive to changes in any particular individuals record, thereby restricting data leaks through the results. The privacy preserving interface ensures unconditionally safe access to the data and does not require from the data miner any expertise in privacy. However, as we show in the paper, a naive utilization of the interface to construct privacy preserving data mining algorithms could lead to inferior data mining results. We address this problem by considering the privacy and the algorithmic requirements simultaneously, focusing on decision tree induction as a sample application. The privacy mechanism has a profound effect on the performance of the methods chosen by the data miner. We demonstrate that this choice could make the difference between an accurate classifier and a completely useless one. Moreover, an improved algorithm can achieve the same level of accuracy and privacy as the naive implementation but with an order of magnitude fewer learning samples.

very large data bases | 2008

Providing k-anonymity in data mining

Arik Friedman; Ran Wolff; Assaf Schuster

In this paper we present extended definitions of k-anonymity and use them to prove that a given data mining model does not violate the k-anonymity of the individuals represented in the learning examples. Our extension provides a tool that measures the amount of anonymity retained during data mining. We show that our model can be applied to various data mining problems, such as classification, association rule mining and clustering. We describe two data mining algorithms which exploit our extension to guarantee they will generate only k-anonymous output, and provide experimental results for one of them. Finally, we show that our method contributes new and efficient ways to anonymize data and preserve patterns during anonymization.

conference on recommender systems | 2015

Applying Differential Privacy to Matrix Factorization

Arnaud Berlioz; Arik Friedman; Mohamed Ali Kaafar; Roksana Boreli; Shlomo Berkovsky

Recommender systems are increasingly becoming an integral part of on-line services. As the recommendations rely on personal user information, there is an inherent loss of privacy resulting from the use of such systems. While several works studied privacy-enhanced neighborhood-based recommendations, little attention has been paid to privacy preserving latent factor models, like those represented by matrix factorization techniques. In this paper, we address the problem of privacy preserving matrix factorization by utilizing differential privacy, a rigorous and provable privacy preserving method. We propose and study several approaches for applying differential privacy to matrix factorization, and evaluate the privacy-accuracy trade-offs offered by each approach. We show that input perturbation yields the best recommendation accuracy, while guaranteeing a solid level of privacy protection.

internet measurement conference | 2014

Censorship in the Wild: Analyzing Internet Filtering in Syria

Abdelberi Chaabane; Terence Chen; Mathieu Cunche; Emiliano De Cristofaro; Arik Friedman; Mohamed Ali Kaafar

Internet censorship is enforced by numerous governments worldwide, however, due to the lack of publicly available information, as well as the inherent risks of performing active measurements, it is often hard for the research community to investigate censorship practices in the wild. Thus, the leak of 600GB worth of logs from 7 Blue Coat SG-9000 proxies, deployed in Syria to filter Internet traffic at a country scale, represents a unique opportunity to provide a detailed snapshot of a real-world censorship ecosystem. This paper presents the methodology and the results of a measurement analysis of the leaked Blue Coat logs, revealing a relatively stealthy, yet quite targeted, censorship. We find that traffic is filtered in several ways: using IP addresses and domain names to block subnets or websites, and keywords or categories to target specific content. We show that keyword-based censorship produces some collateral damage as many requests are blocked even if they do not relate to sensitive content. We also discover that Instant Messaging is heavily censored, while filtering of social media is limited to specific pages. Finally, we show that Syrian users try to evade censorship by using web/socks proxies, Tor, VPNs, and BitTorrent. To the best of our knowledge, our work provides the first analytical look into Internet filtering in Syria.

european conference on principles of data mining and knowledge discovery | 2006

k -Anonymous Decision Tree Induction

Arik Friedman; Assaf Schuster; Ran Wolff

In this paper we explore an approach to privacy preserving data mining that relies on the k-anonymity model. The k-anonymity model guarantees that no private information in a table can be linked to a group of less than k individuals. We suggest extended definitions of k-anonymity that allow the k-anonymity of a data mining model to be determined. Using these definitions, we present decision tree induction algorithms that are guaranteed to maintain k-anonymity of the learning examples. Experiments show that embedding anonymization within the decision tree induction process provides better accuracy than anonymizing the data first and inducing the tree later.

Recommender systems handbook | 2015

Privacy Aspects of Recommender Systems

Arik Friedman; Bart P. Knijnenburg; Kris Vanhecke; Luc Martens; Shlomo Berkovsky

The popularity of online recommender systems has soared; they are deployed in numerous websites and gather tremendous amounts of user data that are necessary for recommendation purposes. This data, however, may pose a severe threat to user privacy, if accessed by untrusted parties or used inappropriately. Hence, it is of paramount importance for recommender system designers and service providers to find a sweet spot, which allows them to generate accurate recommendations and guarantee the privacy of their users. In this chapter we overview the state of the art in privacy enhanced recommendations. We analyze the risks to user privacy imposed by recommender systems, survey the existing solutions, and discuss the privacy implications for the users of recommenders. We conclude that a considerable effort is still required to develop practical recommendation solutions that provide adequate privacy guarantees, while at the same time facilitating the delivery of high-quality recommendations to their users.

User Modeling and User-adapted Interaction | 2016

A differential privacy framework for matrix factorization recommender systems

Arik Friedman; Shlomo Berkovsky; Mohamed Ali Kaafar

Recommender systems rely on personal information about user behavior for the recommendation generation purposes. Thus, they inherently have the potential to hamper user privacy and disclose sensitive information. Several works studied how neighborhood-based recommendation methods can incorporate user privacy protection. However, privacy preserving latent factor models, in particular, those represented by matrix factorization techniques, the state-of-the-art in recommender systems, have received little attention. In this paper, we address the problem of privacy preserving matrix factorization by utilizing differential privacy, a rigorous and provable approach to privacy in statistical databases. We propose a generic framework and evaluate several ways, in which differential privacy can be applied to matrix factorization. By doing so, we specifically address the privacy-accuracy trade-off offered by each of the algorithms. We show that, of all the algorithms considered, input perturbation results in the best recommendation accuracy, while guaranteeing a solid level of privacy protection against attacks that aim to gain knowledge about either specific user ratings or even the existence of these ratings. Our analysis additionally highlights the system aspects that should be addressed when applying differential privacy in practice, and when considering potential privacy preserving solutions.

workshop on privacy in the electronic society | 2014

Secure Evaluation Protocol for Personalized Medicine

Mentari Djatmiko; Arik Friedman; Roksana Boreli; Felix Lawrence; Brian Thorne; Stephen Hardy

The increasing availability and use of genome data for applications like personalized medicine have created opportunities for the improved diagnosis and treatment of various medical conditions. However, it has a potential to be used for discrimination, thereby presenting a set of serious challenges in privacy and security. We propose a secure evaluation algorithm to compute genomic tests that are based on a linear combination of genome data values (we use the Warfarin dosing algorithm as a representative example). Our proposal relies on a combination of partially homomorphic Paillier encryption and private information retrieval. We implement a prototype system that includes the Paillier encryption part of our protocol. Our initial evaluation demonstrates a good potential for real time use in a physician-patient scenario, with a response time of around 200ms in a Wi-Fi communications environment.

IEEE Communications Magazine | 2013

Collaborative network outage troubleshooting with secure multiparty computation

Mentari Djatmiko; Dominik Schatzmann; Xenofontas A. Dimitropoulos; Arik Friedman; Roksana Boreli

Troubleshooting network outages is a complex and time-consuming process. Network administrators are typically overwhelmed with large volumes of monitoring data, like SMTP and NetFlow measurements, from which it is very hard to separate between actionable and non-actionable events. In addition, they can only debug network problems using very basic tools, like ping and traceroute. In this context, intelligent correlation of measurements from different Internet locations is essential for analyzing the root cause of outages. However, correlating measurements across domains raises privacy concerns and hence is largely avoided. A possible solution to the privacy barrier is secure multi-party computation (MPC), that is, a set of cryptographic methods that enable a number of parties to aggregate private data without revealing sensitive information. In this article, we propose a distributed mechanism based on MPC for privacy-preserving correlation of NetFlow measurements from multiple ISPs, which helps in the diagnosis of network outages. We first outline an MPC protocol that can be used to analyze the scope (local, global, or semi-global) and severity of network outages across multiple ISPs. Then we use NetFlow data from a medium-sized ISP to evaluate the performance of our protocol. Our findings indicate that correlating data from several dozens of ISPs is feasible in near real time, with a delay of just a few seconds. This demonstrates the scalability and potential for real-world deployment of MPC-based schemes. Finally, as a case study we demonstrate how our scheme helped analyze, from multiple domains, the impact that Hurricane Sandy had on Internet connectivity in terms of scope and severity.

conference on emerging network experiment and technology | 2013

Federated flow-based approach for privacy preserving connectivity tracking

Mentari Djatmiko; Dominik Schatzmann; Xenofontas A. Dimitropoulos; Arik Friedman; Roksana Boreli

Network outages are an important issue for Internet Service Providers (ISPs) and, more generally, online service providers, as they can result in major financial losses and negatively impact relationships with their customers. Troubleshooting network outages is a complex and time-consuming process. Network administrators are overwhelmed with large volumes of monitoring data and are limited to using very basic tools for debugging, e.g., ping and traceroute. Intelligent correlation of measurements from different Internet locations is very useful for analyzing the root cause of outages. However, correlating measurements of user traffic across domains is largely avoided as it raises privacy concerns. A possible solution is secure multi-party computation (MPC), a set of cryptographic methods that enable a number of parties to aggregate data in a privacy-preserving manner. In this work, we describe a novel system that helps diagnose network outages by correlating passive measurements from multiple ISPs in a privacy-preserving manner. We first show how MPC can be used to compute the scope (local, global, or semi-global) and severity (number of affected hosts) of network outages. To meet near-real-time monitoring guarantees, we then present an efficient protocol for MPC multiset union that uses counting Bloom filters (CBF) to drastically accelerate MPC comparison operations. Finally, we demonstrate the utility of our scheme using real-world traffic measurements from a national ISP and we discuss the trade-offs of the CBF-based computation.

Explore More