Publication


Featured research published by Jácint Szabó.


Adversarial Information Retrieval on the Web | 2008

Latent Dirichlet allocation in web spam filtering

István Bíró; Jácint Szabó; András A. Benczúr

Latent Dirichlet allocation (LDA) (Blei, Ng, Jordan 2003) is a fully generative statistical language model on the content and topics of a corpus of documents. In this paper we apply a modification of LDA, the novel multi-corpus LDA technique, to web spam classification. We create a bag-of-words document for every Web site and run LDA separately on the corpus of sites labeled as spam and on the corpus of sites labeled as non-spam. In this way collections of spam and non-spam topics are created in the training phase. In the test phase we take the union of these collections, and an unseen site is deemed spam if its total spam topic probability is above a threshold. As far as we know, this is the first web retrieval application of LDA. We test this method on the WEBSPAM-UK2007 corpus and reach a relative improvement of 11% in F-measure by a logistic-regression-based combination with strong link and content baseline classifiers.
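The test-phase decision rule described in this abstract can be sketched in a few lines of Python. This is only an illustration of the thresholding step, assuming the topic distribution of an unseen site has already been inferred over the union of spam and non-spam topics; the function names, the topic labels, and the numbers are hypothetical and not taken from the paper.

```python
def total_spam_probability(topic_probs, spam_topics):
    """Sum the inferred probability mass that falls on spam topics."""
    return sum(p for topic, p in topic_probs.items() if topic in spam_topics)


def is_spam(topic_probs, spam_topics, threshold):
    """Deem a site spam if its total spam topic probability exceeds the threshold."""
    return total_spam_probability(topic_probs, spam_topics) > threshold


# Hypothetical topic distribution of one unseen site (values sum to 1).
site = {"spam_0": 0.5, "spam_1": 0.25, "ham_0": 0.125, "ham_1": 0.125}
spam_topics = {"spam_0", "spam_1"}

print(total_spam_probability(site, spam_topics))
print(is_spam(site, spam_topics, threshold=0.6))
```

The threshold is a free parameter here; how it was chosen in the paper is not stated in the abstract.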


Adversarial Information Retrieval on the Web | 2009

Linked latent Dirichlet allocation in web spam filtering

István Bíró; Dávid Siklósi; Jácint Szabó; András A. Benczúr

Latent Dirichlet allocation (LDA) (Blei, Ng, Jordan 2003) is a fully generative statistical language model on the content and topics of a corpus of documents. In this paper we apply an extension of LDA to web spam classification. Our linked LDA technique also takes linkage into account: topics are propagated along links in such a way that the linked document directly influences the words in the linking document. The inferred LDA model can be applied to classification as a dimensionality reduction, similarly to latent semantic indexing. We test linked LDA on the WEBSPAM-UK2007 corpus. Using a BayesNet classifier, we achieve a 3% improvement in AUC over plain LDA with BayesNet, and 8% over the public link features with C4.5. Adding this method to a log-odds-based combination of strong link and content baseline classifiers results in a 3% improvement in AUC. Our method even slightly improves over the best Web Spam Challenge 2008 result.


European Conference on Machine Learning | 2009

Latent Dirichlet Allocation for Automatic Document Categorization

István Bíró; Jácint Szabó

In this paper we introduce and evaluate a technique for applying latent Dirichlet allocation to supervised semantic categorization of documents. In our setup, every category is assigned its own collection of topics, and for a labeled training document only topics from its category are sampled. Thus, compared to classical LDA, which processes the entire corpus at once, we essentially build a separate LDA model for each category with category-specific topics, and these topic collections are then put together to form a unified LDA model. For an unseen document, the inferred topic distribution gives an estimate of how well the document fits each category. We use this method for Web document classification. Our key results are a 46% decrease in 1-AUC over tf.idf with SVM and a 43% decrease over the plain LDA baseline with SVM. Using a careful vocabulary selection method and a heuristic that handles the effect that similar topics may arise in distinct categories, the improvement is 83% over tf.idf with SVM and 82% over LDA with SVM in 1-AUC.
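One way to read the statement that the inferred topic distribution estimates how well a document fits a category is to add up the topic mass belonging to each category. The Python sketch below illustrates that reading only; the mapping from topics to categories, the category names, and the probabilities are hypothetical, and the paper's actual classifier may use the topic distribution differently.

```python
from collections import defaultdict


def category_scores(topic_probs, topic_category):
    """Aggregate inferred topic probabilities by the category that owns each topic."""
    scores = defaultdict(float)
    for topic, p in topic_probs.items():
        scores[topic_category[topic]] += p
    return dict(scores)


def best_category(topic_probs, topic_category):
    """Return the category with the largest aggregated topic mass."""
    scores = category_scores(topic_probs, topic_category)
    return max(scores, key=scores.get)


# Hypothetical setup: four topics, two per category.
topic_category = {0: "sports", 1: "sports", 2: "science", 3: "science"}
# Hypothetical inferred topic distribution of one unseen document.
doc = {0: 0.125, 1: 0.125, 2: 0.5, 3: 0.25}

print(category_scores(doc, topic_category))
print(best_category(doc, topic_category))
```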


Journal of Combinatorial Theory | 2012

A proof of Cunningham's conjecture on restricted subgraphs and jump systems

Yusuke Kobayashi; Jácint Szabó; Kenjiro Takazawa

For an undirected graph and a fixed integer k, a 2-matching is said to be k-restricted if it has no cycle of length k or less. The problem of finding a maximum cardinality k-restricted 2-matching is polynomially solvable when k ≤ 3 and NP-hard when k ≥ 5. On the other hand, the degree sequences of the k-restricted 2-matchings form a jump system for k ≤ 3 and do not always form a jump system for k ≥ 5, which is consistent with the polynomial solvability of the maximization problem. In 2002, Cunningham conjectured that the degree sequences of 4-restricted 2-matchings form a jump system and that the maximum cardinality 4-restricted 2-matching can be found in polynomial time. In this paper, we show that the first conjecture is true, that is, the degree sequences of 4-restricted 2-matchings form a jump system. We also show that the maximum weight 4-restricted 2-matchings in a bipartite graph induce an M-concave function on the jump system if and only if the weight function is vertex-induced on every square. This result is also consistent with the polynomial solvability of the maximum weight 4-restricted 2-matching problem in bipartite graphs.
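The definition of a k-restricted 2-matching can be checked mechanically. The Python sketch below assumes a simple 2-matching, that is, an edge set of a simple graph in which every vertex has degree at most 2, so that every connected component is a path or a cycle. The function name and the edge-list input format are illustrative choices, not taken from the paper.

```python
from collections import defaultdict


def is_k_restricted_2_matching(edges, k):
    """Return True if `edges` has maximum degree 2 and no cycle of length k or less."""
    adj = defaultdict(set)
    for u, v in edges:
        if u == v or v in adj[u]:
            raise ValueError("expected a simple graph: no loops or parallel edges")
        adj[u].add(v)
        adj[v].add(u)

    # A 2-matching allows at most two chosen edges at every vertex.
    if any(len(neighbours) > 2 for neighbours in adj.values()):
        return False

    seen = set()
    for start in adj:
        if start in seen:
            continue
        # Collect the connected component containing `start`.
        component, stack = [], [start]
        seen.add(start)
        while stack:
            u = stack.pop()
            component.append(u)
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        # With maximum degree 2, a component is a cycle exactly when every
        # vertex in it has degree 2; its length equals its number of vertices.
        if all(len(adj[u]) == 2 for u in component) and len(component) <= k:
            return False
    return True


square = [(1, 2), (2, 3), (3, 4), (4, 1)]
print(is_k_restricted_2_matching(square, 3))
print(is_k_restricted_2_matching(square, 4))
```

A square is allowed for k = 3 but forbidden for k = 4, which is the case addressed by Cunningham's conjecture.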


Latin American Web Congress | 2008

A Comparative Analysis of Latent Variable Models for Web Page Classification

István Bíró; András A. Benczúr; Jácint Szabó; Ana Gabriela Maguitman

A main challenge for Web content classification is how to model the input data. This paper discusses the application of two text modeling approaches, latent semantic analysis (LSA) and latent Dirichlet allocation (LDA), in the Web page classification task. We report results on a comparison of these two approaches using different vocabularies consisting of links and text. Both models are evaluated using different numbers of latent topics. Finally, we evaluate a hybrid latent variable model that combines the latent topics resulting from both LSA and LDA. This new approach turns out to be superior to the basic LSA and LDA models. In our experiments with categories and pages obtained from the ODP Web directory the hybrid model achieves an averaged F-measure value of 0.852 and an averaged ROC value of 0.96.


Discrete Mathematics | 2008

Note: A note on degree-constrained subgraphs

András Frank; Lap Chi Lau; Jácint Szabó

Elementary proofs are presented for two graph theoretic results, originally proved by H. Shirazi and J. Verstraete using the combinatorial Nullstellensatz.


Archive | 2006

On the Generalization of the Matroid Parity Problem

András Recski; Jácint Szabó

Let T_1, T_2, ..., T_t be disjoint κ-element sets and let their union be denoted by S. Let A be a subset of {0, 1, 2, ..., κ}. For an integer 0 ≤ c ≤ t, a subset X ⊆ S is called (≥ c)-legal if |X ∩ T_i| ∈ A holds for at least c subscripts i, and it is called c-legal if |X ∩ T_i| ∈ A holds for exactly c subscripts i. Let M be a matroid on S. In this paper we study problems like “Does there exist a (≥ c)-legal (or c-legal) independent set of given cardinality in M?” Observe that if A = {0, κ} and c = t then both problems reduce to the matroid κ-parity problem (in particular, to the classical matroid parity problem for κ = 2). The problems have some motivations from engineering applications and are also related to the more recent theory of jump systems.
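The two legality notions defined above translate directly into code. The Python sketch below only tests legality of a given set X; it does not test independence in the matroid M. The function names and the sample sets are hypothetical.

```python
def count_legal_blocks(X, blocks, A):
    """Count the subscripts i for which |X ∩ T_i| lies in A."""
    X = set(X)
    return sum(1 for T in blocks if len(X & set(T)) in A)


def is_at_least_c_legal(X, blocks, A, c):
    """(≥ c)-legal: the size condition holds for at least c blocks."""
    return count_legal_blocks(X, blocks, A) >= c


def is_c_legal(X, blocks, A, c):
    """c-legal: the size condition holds for exactly c blocks."""
    return count_legal_blocks(X, blocks, A) == c


# Hypothetical instance with t = 3 disjoint blocks of size κ = 2.
blocks = [{1, 2}, {3, 4}, {5, 6}]
A = {0, 2}  # A = {0, κ}: each block is taken entirely or not at all

print(count_legal_blocks({1, 2, 5}, blocks, A))
print(is_c_legal({1, 2, 5, 6}, blocks, A, c=3))
```

With A = {0, κ} and c = t, a legal set is a union of whole blocks, which matches the reduction to the matroid parity problem mentioned in the abstract.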


Journal of Combinatorial Theory | 2009

Good characterizations for some degree constrained subgraphs

Jácint Szabó

The degree constrained subgraph problem is to find a subgraph of a graph with degrees as close to a given collection of degree prescriptions as possible. This problem is NP-complete in general, but for the case when no prescription contains two consecutive gaps, Lovász gave a structural description, and Cornuéjols gave a polynomial algorithm. However, compact good characterizations are known only in some special cases, such as parity intervals or general antifactors. The main result of the present paper is a simple good characterization for the special case when for every prescription it holds that all gaps have the same parity. This class contains most cases where compact good characterizations were known. The technique we apply is replacing the vertices by certain subgraphs, called gadgets, a method developed by Tutte for showing how the simple b-matching problem can be reduced to classical matchings. For this class, using a result of Pap, this approach yields the polynomiality of the edge weighted degree constrained subgraph problem.
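The two conditions on prescriptions mentioned in this abstract can be illustrated with a short Python sketch. It rests on an assumption that the abstract does not spell out: a prescription is taken to be a finite set H of allowed degrees, and a gap of H is an integer that is missing from H but lies strictly between its smallest and largest element. The function names and sample prescriptions are hypothetical.

```python
def gaps(prescription):
    """Integers missing from the prescription between its minimum and maximum.

    Assumed definition of a gap; the abstract does not define it.
    """
    H = set(prescription)
    if not H:
        return []
    return [h for h in range(min(H) + 1, max(H)) if h not in H]


def has_two_consecutive_gaps(prescription):
    """True if some h and h + 1 are both gaps (the generally NP-complete case)."""
    g = gaps(prescription)
    return any(b - a == 1 for a, b in zip(g, g[1:]))


def gaps_share_parity(prescription):
    """True if all gaps are even or all gaps are odd (vacuously true with no gaps)."""
    return len({h % 2 for h in gaps(prescription)}) <= 1


print(gaps([1, 3, 5]), gaps_share_parity([1, 3, 5]))  # a parity interval
print(gaps([0, 3]), has_two_consecutive_gaps([0, 3]))
```

Under this reading, a prescription whose gaps all share one parity can never contain two consecutive gaps, since consecutive integers have different parity.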


Graphs and Combinatorics | 2009

Elementary Graphs with Respect to f-Parity Factors

Mikio Kano; Gyula Y. Katona; Jácint Szabó

This note concerns the f-parity subgraph problem, i.e., we are given an undirected graph G and a positive integer value function


Combinatorica | 2009

A note on parity constrained orientations

Tamás Király; Jácint Szabó

Collaboration


Dive into Jácint Szabó's collaborations.

Top Co-Authors

István Bíró

Hungarian Academy of Sciences

Zsolt Fekete

Hungarian Academy of Sciences

Dávid Siklósi

Hungarian Academy of Sciences

Márton Makai

Eötvös Loránd University

Gyula Pap

Eötvös Loránd University

Miklós Kurucz

Hungarian Academy of Sciences

Simon Rácz

Hungarian Academy of Sciences

Adrienn Szabó

Hungarian Academy of Sciences

András Frank

Eötvös Loránd University
