István Hegedüs | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where István Hegedüs is active.

Explore More

Publication

Featured researches published by István Hegedüs.

international conference on parallel processing | 2011

Asynchronous peer-to-peer data mining with stochastic gradient descent

Róbert Ormándi; István Hegedüs; Márk Jelasity

Fully distributed data mining algorithms build global models over large amounts of data distributed over a large number of peers in a network, without moving the data itself. In the area of peer-to-peer (P2P) networks, such algorithms have various applications in P2P social networking, and also in trackerless BitTorrent communities. The difficulty of the problem involves realizing good quality models with an affordable communication complexity, while assuming as little as possible about the communication model. Here we describe a conceptually simple, yet powerful generic approach for designing efficient, fully distributed, asynchronous, local algorithms for learning models of fully distributed data. The key idea is that many models perform a random walk over the network while being gradually adjusted to fit the data they encounter, using a stochastic gradient descent search. We demonstrate our approach by implementing the support vector machine (SVM) method and by experimentally evaluating its performance in various failure scenarios over different benchmark datasets. Our algorithm scheme can implement a wide range of machine learning methods in an extremely robust manner.

self-adaptive and self-organizing systems | 2012

Gossip-Based Learning under Drifting Concepts in Fully Distributed Networks

István Hegedüs; Róbert Ormándi; Márk Jelasity

In fully distributed networks data mining is an important tool for monitoring, control, and for offering personalized services to users. The underlying data model can change as a function of time according to periodic (daily, weakly) patterns, sudden changes, or long term transformations of the environment or the system itself. For a large space of the possible models for this dynamism-when the network is very large but only a few training samples can be obtained at all nodes locally-no efficient fully distributed solution is known. Here we present an approach, that is able to follow concept drift in very large scale and fully distributed networks. The algorithm does not collect data to a central location, instead it is based on online learners taking random walks in the network. To achieve adaptivity the diversity of the learners is controlled by managing the life spans of the models. We demonstrate through a thorough experimental analysis, that in a well specified range of feasible models of concept drift, where there is little data available locally in a large network, our algorithm outperforms known methods from related work.

international symposium on intelligent systems and informatics | 2012

Detecting concept drift in fully distributed environments

István Hegedüs; Lehel Nyers; Róbert Ormándi

Applying sophisticated machine learning techniques on fully distributed data is increasingly important in many applications like distributed recommender systems or spam filters. In this type of networked environment the data model can change dynamically over time (concept drift). Identifying when concept drift occurred is a key for several drift handling techniques and important in numerous scenarios. However drift handling approaches exist, no efficient solution for detecting the drift is known in very large scale networks. Here, we propose an approach that can detect the concept drift in large scale and fully distributed networks. In our approach, the learning is performed by applying online learners that take random walks in the network while updating themselves using the samples available at the nodes. The drift detection is based on an adaptive mechanism which uses the historical performances of the models. Through empirical evaluations we demonstrate that our approach handles the drifting concept while additionally detects the occurrence of the concept drift with high accuracy.

workshops on enabling technologies: infrastracture for collaborative enterprises | 2010

Towards Inferring Ratings from User Behavior in BitTorrent Communities

Róbert Ormándi; István Hegedüs; Kornél Csernai; Márk Jelasity

Peer-to-peer file-sharing has been increasingly popular in the last decade. In most cases file-sharing communities provide only minimal functionality, such as search and download. Extra features such as recommendation are difficult to implement because users are typically unwilling to provide sufficient rating information for the items they download. For this reason, it would be desirable to utilize user behavior to infer implicit ratings. For example, if a user deletes a file after downloading it, we could infer that the rating is low, or if the user is seeding the file for a long time, the rating is high. In this paper we demonstrate that it is indeed possible to infer implicit ratings from user behavior. We work with a large trace of Filelist.org, a BitTorrent-based private community, and demonstrate that we can identify a binary like/dislike distinction over the set of files users are downloading, using dynamic features of swarm membership. The resulting database containing the inferred ratings will be published online publicly and it can be used as a benchmark for P2P recommender systems.

parallel, distributed and network-based processing | 2016

Dimension Reduction Methods for Collaborative Mobile Gossip Learning

Árpád Berta; István Hegedüs; Márk Jelasity

Decentralized learning algorithms are very sensitive to the size of the raw data records due to the resulting large communication cost. This can, in the worst case, even make decentralized learning infeasible. Dimension reduction is a key technique to compress data and to obtain small models. In this paper, we propose a number of robust and efficient decentralized approaches to dimension reduction in the system model where each network node holds only one data record. These algorithms build on searching for good random projections. We present a thorough experimental comparison of the proposed algorithms and compare them with a variant of distributed singular value decomposition (SVD), a state-of-the-art algorithm for dimension reduction. We base our experiments on a trace of real mobile phone usage. We conclude that our method based on selecting good random projections is preferable and provides good quality results when the output is required on a very short timescale, within tens of minutes. We also present a hybrid method that combines the advantages of random projections and SVD. We demonstrate that the hybrid method offers good performance over all timescales.

international conference on peer to peer computing | 2014

Fully distributed robust singular value decomposition

István Hegedüs; Márk Jelasity; Levente Kocsis; András A. Benczúr

Low-rank matrix approximation is an important tool in data mining with a wide range of applications including recommender systems, clustering, and identifying topics in documents. The problem we tackle is implementing singular value decomposition (SVD)-a popular method for low rank approximation in large fully distributed P2P systems in a robust and scalable manner. We assume that the matrix to be approximated is stored in a large network where each node knows one row of the matrix (personal attributes, documents, media ratings, etc). In our P2P model, we do not allow this personal information to leave the node, yet we want the nodes to collaboratively compute the SVD. Methods applied in large scale distributed systems such as synchronized parallel gradient search or distributed iterative methods are not preferable in our system model due to their requirements of synchronized rounds or their inherent issues with load balancing. Our approach overcomes these limitations with the help of a distributed stochastic gradient search in which the personal part of the decomposition remains local, and the global part (e.g., movie features) converges at all nodes to the correct value. We present a theoretical derivation of our algorithm, as well as a thorough experimental evaluation of real and synthetic data as well. We demonstrate that the convergence speed of our method is competitive while not relying on synchronization and being robust to extreme failure scenarios.

text speech and dialogue | 2010

Opinion mining by transformation-based domain adaptation

Róbert Ormándi; István Hegedüs; Richárd Farkas

Here we propose a novel approach for the task of domain adaptation for Natural Language Processing. Our approach captures relations between the source and target domains by applying a model transformation mechanism which can be learnt by using labeled data of limited size taken from the target domain. Experimental results on several Opinion Mining datasets show that our approach significantly outperforms baselines and published systems when the amount of labeled data is extremely small.

parallel, distributed and network-based processing | 2016

Distributed Differentially Private Stochastic Gradient Descent: An Empirical Study

István Hegedüs; Márk Jelasity

In fault-prone large-scale distributed environments stochastic gradient descent (SGD) is a popular approach to implement machine learning algorithms. Data privacy is a key concern in such environments, which is often addressed within the framework of differential privacy. The output quality of differentially private SGD implementations as a function of design choices has not yet been thoroughly evaluated. In this study, we examine this problem experimentally. We assume that every data record is stored by an independent node, which is a typical setup in networks of mobile devices or Internet of things (IoT) applications. In this model we identify a set of possible distributed differentially private SGD implementations. In these implementations all the sensitive computations are strictly local, and any public information is protected by differentially private mechanisms. This means that personal information can leak only if the corresponding node is directly compromised. We then perform a set of experiments to evaluate these implementations over several machine learning problems with both logistic regression and support vector machine (SVM) loss functions. Depending on the parameter setting and the choice of the algorithm, the performance of the noise-free algorithm can be closely approximated by differentially private variants.

international conference on machine learning | 2013