Bálint Zoltán Daróczy
Hungarian Academy of Sciences
Publication
Featured research published by Bálint Zoltán Daróczy.
cross language evaluation forum | 2008
Thomas Deselaers; Allan Hanbury; Ville Viitaniemi; András A. Benczúr; Mátyás Brendel; Bálint Zoltán Daróczy; Hugo Jair Escalante Balderas; Theo Gevers; Carlos Arturo Hernández Gracidas; Steven C. H. Hoi; Jorma Laaksonen; Mingjing Li; Heidy Marisol Marin Castro; Hermann Ney; Xiaoguang Rui; Nicu Sebe; Julian Stöttinger; Lei Wu
We describe the object retrieval task of ImageCLEF 2007, give an overview of the methods of the participating groups, and present and discuss the results. The task was based on the widely used PASCAL object recognition data to train object recognition methods and on the IAPR TC-12 benchmark dataset from which images of objects of the ten different classes bicycles, buses, cars, motorbikes, cats, cows, dogs, horses, sheep, and persons had to be retrieved. Seven international groups participated using a wide variety of methods. The results of the evaluation show that the task was very challenging and that different methods for relevance assessment can have a strong influence on the results of an evaluation.
vehicular technology conference | 2015
Bálint Zoltán Daróczy; P Vaderna; András A. Benczúr
Abnormal bearer session release (i.e., bearer session drop) in cellular telecommunication networks may seriously impact the quality of experience of mobile users. The latest mobile technologies enable high-granularity, real-time reporting of all conditions of individual sessions, which makes it possible to apply data analytics methods to process and monetize this data for network optimization. One example of such analytics is Machine Learning (ML) to predict session drops well before the end of the session. In this paper a novel ML method is presented that predicts session drops with higher accuracy than traditional models. The method is applied and tested offline on live LTE data. The high-accuracy predictor can be part of a SON function in order to eliminate session drops or mitigate their effects.
Proceedings of the 2014 Recommender Systems Challenge on | 2014
Róbert Pálovics; Frederick Ayala-Gómez; Balázs Csikota; Bálint Zoltán Daróczy; Levente Kocsis; Dominic Spadacene; András A. Benczúr
In this paper we give our solution to the RecSys Challenge 2014. In our ensemble we use (1) a mix of binary classification methods for predicting nonzero engagement, including logistic regression and SVM; (2) regression methods for directly predicting the engagement, including linear regression and gradient boosted trees; (3) matrix factorization and factorization machines over the user-movie matrix, by using user and movie features as side information. For most of the methods, we use the GraphLab Create implementation. Our current nDCG@10 achieves 0.874. We release our experiments as IPython Notebooks.
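For readers unfamiliar with the nDCG@10 metric quoted above, the following is a minimal sketch of how it is commonly computed. It assumes linear gain and a log2 rank discount; the abstract does not state which gain variant the challenge used, and the function names are illustrative only.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k items of a ranked list."""
    # rank is 0-based, so the discount for the top item is log2(2) = 1.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

Here `relevances` lists the true engagement values in the order the system ranked the items; a ranking already sorted by relevance scores 1.0.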
Proceedings of the 2nd Joint WICOW/AIRWeb Workshop on Web Quality | 2012
Dávid Siklósi; Bálint Zoltán Daróczy; András A. Benczúr
In this paper we improve trust, bias and factuality classification over Web data on the domain level. Unlike the majority of literature in this area, which aims at extracting opinion and handling short text on the micro level, we aim to aid a researcher or an archivist in obtaining a large collection that, at a high level, originates from unbiased and trustworthy sources. Our method generates features as Jensen-Shannon distances from centers in a host-term biclustering. On top of the distance features, we apply kernel methods and also combine with baseline text classifiers. We test our method on the ECML/PKDD Discovery Challenge data set DC2010. Our method improves over the best achieved text classification NDCG results by 3-10% for neutrality, bias and trustworthiness. The fact that the ECML/PKDD Discovery Challenge 2010 participants reached an AUC only slightly above 0.5 indicates the hardness of the task.
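The distance features mentioned in this abstract can be illustrated with a small sketch. It assumes that each host and each cluster center is a term probability distribution over the same vocabulary, and that "distance" means the square root of the Jensen-Shannon divergence (base 2); the abstract confirms neither detail, and the function names are hypothetical.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_distance(p, q):
    """Square root of the Jensen-Shannon divergence between two distributions."""
    # The mixture m is positive wherever p or q is, so the KL terms are finite.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(jsd, 0.0))  # clamp tiny negative rounding errors

def distance_features(host_terms, centers):
    """One feature per cluster center: the JS distance from the host to it."""
    return [js_distance(host_terms, center) for center in centers]
```

The resulting vector has one entry per center and could then be passed to a kernel method or combined with other classifiers.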
cross language evaluation forum | 2008
András A. Benczúr; István Bíró; Mátyás Brendel; Károly Csalogány; Bálint Zoltán Daróczy; Dávid Siklósi
We describe our approach to the ImageCLEFphoto 2007 task. The novelty of our method consists of biclustering image segments and annotation words. Given the query words, it is possible to select the image segment clusters that have the strongest co-occurrence with the corresponding word clusters. These image segment clusters act as the selected segments relevant to a query. We rank text hits by our own tf.idf-based information retrieval system and image similarities by using a 20-dimensional vector describing the visual content of an image segment. Relevant image segments were selected by the biclustering procedure. Images were segmented by graph-based segmentation. We used neither query expansion nor relevance feedback; queries were generated automatically from the title and the description words. The latter were weighted by 0.1.
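The automatic query generation described here, with description words down-weighted by 0.1, might look like the sketch below. The title weight of 1.0, whitespace tokenization, and the plain tf.idf scoring formula are assumptions for illustration; the abstract does not describe the authors' retrieval system at this level of detail.

```python
import math
from collections import Counter

def weighted_query(title, description, description_weight=0.1):
    """Map each query term to a weight: 1.0 per title hit, 0.1 per description hit."""
    weights = Counter()
    for term in title.lower().split():
        weights[term] += 1.0  # assumed title weight
    for term in description.lower().split():
        weights[term] += description_weight
    return dict(weights)

def tfidf_score(query_weights, doc_tokens, doc_freq, n_docs):
    """Sum of query weight * term frequency * inverse document frequency."""
    tf = Counter(doc_tokens)
    score = 0.0
    for term, weight in query_weights.items():
        if tf[term] and doc_freq.get(term):
            score += weight * tf[term] * math.log(n_docs / doc_freq[term])
    return score
```

A term that appears in both the title and the description accumulates both weights, so title terms always dominate the ranking.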
Internet Mathematics | 2014
Miklós Erdélyi; András A. Benczúr; Bálint Zoltán Daróczy; András Garzó; Tamás Kiss; Dávid Siklósi
In this article we give a comprehensive overview of features devised for web spam detection and investigate how much various classes, some requiring very high computational effort, add to the classification accuracy. We collect and handle a large number of features based on recent advances in web spam filtering, including temporal ones; in particular, we analyze the strength and sensitivity of linkage change. We propose new, temporal link-similarity-based features and show how to compute them efficiently on large graphs. We show that machine learning techniques, including ensemble selection, LogitBoost, and random forest, significantly improve accuracy. We conclude that, with appropriate learning techniques, a simple and computationally inexpensive feature subset outperforms all previous results published so far on our dataset and can be further improved only slightly by computationally expensive features. We test our method on three major publicly available datasets: the Web Spam Challenge 2008 dataset WEBSPAM-UK2007, the ECML/PKDD Discovery Challenge dataset DC2010, and the Waterloo Spam Rankings for ClueWeb09. Our classifier ensemble sets the strongest classification benchmark compared to participants of the Web Spam and ECML/PKDD Discovery Challenges as well as the TREC Web track. To foster research in the area, we make several feature sets and source code public (https://datamining.sztaki.hu/en/download/web-spam-resources), including the temporal features of eight .uk crawl snapshots that include WEBSPAM-UK2007 as well as the Web Spam Challenge features for the labeled part of ClueWeb09.
cross language evaluation forum | 2009
Bálint Zoltán Daróczy; István Petrás; András A. Benczúr; Zsolt Fekete; Dávid Márk Nemeskey; Dávid Siklósi; Zsuzsa Weiner
Our approach to the ImageCLEF 2009 tasks is based on image segmentation, SIFT keypoints and Okapi BM25-based text retrieval. We use feature vectors to describe the visual content of an image segment, a keypoint or the entire image. The features include color histograms, a shape descriptor as well as a 2D Fourier transform of a segment and an orientation histogram of detected keypoints. We trained a Gaussian Mixture Model (GMM) to cluster the feature vectors extracted from the image segments and keypoints independently. The normalized Fisher gradient vector computed from GMM of SIFT descriptors is a well known technique to represent an image with only one vector. Novel to our method is the combination of Fisher vectors for keypoints with those of the image segments to improve classification accuracy. We introduced correlation-based combining methods to further improve classification quality.
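As background for the text retrieval component named in this abstract, here is a minimal sketch of Okapi BM25 scoring. The parameter defaults (k1 = 1.2, b = 0.75) and the non-negative IDF variant with 1 added inside the logarithm are common choices assumed here, not values taken from the paper; the sketch also assumes a non-empty collection of pre-tokenized documents.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer-than-average documents are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Sorting documents by these scores in descending order gives the text ranking, which could then be combined with visual similarity scores.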
web science | 2017
Frederick Ayala-Gómez; Bálint Zoltán Daróczy; Michael Mathioudakis; András A. Benczúr; Aristides Gionis
Location-Based Social Networks (LBSNs) enable their users to share with their friends the places they go to and whom they go with. Additionally, they provide users with recommendations for Points of Interest (POI) they have not visited before. This functionality is of great importance for users of LBSNs, as it allows them to discover interesting places in populous cities that are not easy to explore. For this reason, previous research has focused on providing recommendations to LBSN users. Nevertheless, while most existing work focuses on recommendations for individual users, techniques to provide recommendations to groups of users are scarce. In this paper, we consider the problem of recommending a list of POIs to a group of users in the areas that the group frequents. Our data consist of activity on Swarm, a social networking app by Foursquare, and our results demonstrate that our proposed Geo-Group-Recommender (GGR), a class of hybrid recommender systems that combine the group geographical preferences (estimated using Kernel Density Estimation), category and location features, and group check-ins, outperforms a large number of other recommender systems. Moreover, we find evidence that user preferences differ both in venue category and in location between individual and group activities. We also show that combining individual recommendations using group aggregation strategies is not as good as building a profile for a group. Our experiments show that GGR outperforms the baselines in terms of precision and recall at different cutoffs.
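The use of Kernel Density Estimation for geographical preferences can be sketched as follows. This is an illustration only: it assumes an isotropic Gaussian kernel, a hand-picked bandwidth, and check-in coordinates treated as points on a plane, none of which are specified in the abstract.

```python
import math

def gaussian_kde_2d(points, bandwidth):
    """Return a density function estimated from 2D points with a Gaussian kernel."""
    # Normalizing constant of an isotropic 2D Gaussian, averaged over all points.
    norm = 1.0 / (len(points) * 2 * math.pi * bandwidth ** 2)

    def density(x, y):
        total = 0.0
        for px, py in points:
            sq_dist = (x - px) ** 2 + (y - py) ** 2
            total += math.exp(-sq_dist / (2 * bandwidth ** 2))
        return norm * total

    return density
```

Given a group's past check-in coordinates, `density(x, y)` is high near places the group frequents, so it could serve as one geographic feature when scoring a candidate venue.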
international world wide web conferences | 2015
Bálint Zoltán Daróczy; Dávid Siklósi; Róbert Pálovics; András A. Benczúr
We compare machine learning methods to predict quality aspects of the C3 dataset collected as a part of the Reconcile project. We give methods for automatically assessing the credibility, presentation, knowledge, intention and completeness by extending the attributes in the C3 dataset by the page textual content. We use Gradient Boosted Trees and recommender methods over the evaluator, site, evaluation triplets and their metadata and combine with text classifiers. In our experiments best results can be reached by the theoretically justified normalized SVM kernel. The normalization can be derived by using the Fisher information matrix of the text content. As the main contribution, we describe the theory of the Fisher matrix and show that SVM may be particularly suitable for difficult text classification tasks.
cross language evaluation forum | 2008
Bálint Zoltán Daróczy; Zsolt Fekete; Mátyás Brendel; Simon Rácz; András A. Benczúr; Dávid Siklósi; Attila Pereszlényi
We describe our image processing system used in the ImageCLEF 2008 Photo Retrieval and Visual Concept Detection tasks. Our method consists of image segmentation followed by feature generation over the segments based on color, shape and texture. In the paper we elaborate on the importance of choices in the segmentation procedure with emphasis on edge detection. We also measure the relative importance of the visual features as well as the right choice of the distance function. Finally, given a very large number of parameters in our image processing system, we give a method for parameter optimization by measuring how well the similarity measures separate sample images of the same topic from those of different topics.