T István Nagy | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where T István Nagy is active.

Explore More

Publication

Featured researches published by T István Nagy.

ACM Transactions on Speech and Language Processing | 2013

Learning to detect english and hungarian light verb constructions

Veronika Vincze; T István Nagy; János Zsibrita

Light verb constructions consist of a verbal and a nominal component, where the noun preserves its original meaning while the verb has lost it (to some degree). They are syntactically flexible and their meaning can only be partially computed on the basis of the meaning of their parts, thus they require special treatment in natural language processing. For this purpose, the first step is to identify light verb constructions. In this study, we present our conditional random fields-based tool—called FXTagger—for identifying light verb constructions. The flexibility of the tool is demonstrated on two, typologically different, languages, namely, English and Hungarian. As earlier studies labeled different linguistic phenomena as light verb constructions, we first present a linguistics-based classification of light verb constructions and then show that FXTagger is able to identify different classes of light verb constructions in both languages. Different types of texts may contain different types of light verb constructions; moreover, the frequency of light verb constructions may differ from domain to domain. Hence we focus on the portability of models trained on different corpora, and we also investigate the effect of simple domain adaptation techniques to reduce the gap between the domains. Our results show that in spite of domain specificities, out-domain data can also contribute to the successful LVC detection in all domains.

Acta Cybernetica | 2012

Person attribute extraction from the textual parts of web pages

T István Nagy

We present the RGAI systems which participated in the third Web People Search Task challenge. The chief characteristics of our approach are that we focus on the raw textual parts of the Web pages instead of the structured parts, we group similar attribute classes together and we explicitly handle their interdependencies. The RGAI systems achieved top results on the attribute extraction subtask, and average results on the clustering subtask.

text speech and dialogue | 2008

Web-Based Lemmatisation of Named Entities

Richárd Farkas; Veronika Vincze; T István Nagy; Róbert Ormándi; György Szarvas; Attila Almási

Identifying the lemma of a Named Entity is important for many Natural Language Processing applications like Information Retrieval. Here we introduce a novel approach for Named Entity lemmatisation which utilises the occurrence frequencies of each possible lemma. We constructed four corpora in English and Hungarian and trained machine learning methods using them to obtain simple decision rules based on the web frequencies of the lemmas. In experiments our web-based heuristic achieved an average accuracy of nearly 91%.

text speech and dialogue | 2014

Document Classification with Deep Rectifier Neural Networks and Probabilistic Sampling

Tamás Grósz; T István Nagy

Deep learning is regarded by some as one of the most important technological breakthroughs of this decade. In recent years it has been shown that using rectified neurons, one can match or surpass the performance achieved using hyperbolic tangent or sigmoid neurons, especially in deep networks. With rectified neurons we can readily create sparse representations, which seems especially suitable for naturally sparse data like the bag of words representation of documents. To test this, here we study the performance of deep rectifier networks in the document classification task. Like most machine learning algorithms, deep rectifier nets are sensitive to class imbalances, which is quite common in document classification. To remedy this situation we will examine the training scheme called probabilistic sampling, and show that it can improve the performance of deep rectifier networks. Our results demonstrate that deep rectifier networks generally outperform other typical learning algorithms in the task of document classification.

text, speech and dialogue | 2011

On positive and unlabeled learning for text classification

T István Nagy; Richárd Farkas; János Csirik

In this paper we present a slightly modified machine learning approach for text classification working exclusively from positive and unlabeled samples. Our method can assure that the positive class is not underrepresented during the iterative training process and it can achieve 30% better F-value when the amount of positive examples is low.

text speech and dialogue | 2013

English Nominal Compound Detection with Wikipedia-Based Methods

T István Nagy; Veronika Vincze

Nominal compounds (NCs) are lexical units that consist of two or more elements that exist on their own, function as a noun and have a special added meaning. Here, we present the results of our experiments on how the growth of Wikipedia added to the performance of our dictionary labeling methods to detecting NCs. We also investigated how the size of an automatically generated silver standard corpus can affect the performance of our machine learning-based method. The results we obtained demonstrate that the bigger the dataset, the better the performance will be.

language resources and evaluation | 2014