
Publication


Featured research published by András Kornai.


Meeting of the Association for Computational Linguistics | 2007

HunPos: an open source trigram tagger

Péter Halácsy; András Kornai; Csaba Oravecz

In the world of non-proprietary NLP software, the standard, and perhaps the best, HMM-based POS tagger is TnT (Brants, 2000). We argue here that some of the criticism aimed at HMM performance on languages with rich morphology should more properly be directed at TnT's peculiar license, free but not open source, since it is precisely the implementation details hidden from the user that hold the key to improved POS tagging across a wider variety of languages. We present HunPos, a free and open source (LGPL-licensed) alternative that users can tune to fully exploit the potential of HMM architectures, offering performance comparable to more complex models while preserving the ease and speed of training and tagging.
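Like TnT, HunPos is a second-order (trigram) HMM tagger: each tag is conditioned on the two preceding tags, and the most probable tag sequence is recovered with Viterbi decoding. The sketch below shows only that decoding step over toy, hand-set probabilities. It is not HunPos's code: the tag set and the `trans`/`emit` tables are hypothetical, and it omits smoothing, unknown-word handling, and the user-tunable options that distinguish HunPos.

```python
import math
from typing import Dict, List, Tuple

START = "<s>"  # padding tag for the two positions before the sentence


def viterbi_trigram(
    words: List[str],
    tags: List[str],
    trans: Dict[Tuple[str, str, str], float],  # P(t_i | t_{i-2}, t_{i-1})
    emit: Dict[Tuple[str, str], float],  # P(w_i | t_i)
    floor: float = 1e-12,  # crude stand-in for real smoothing
) -> List[str]:
    """Most probable tag sequence under a trigram HMM (no end-of-sentence term)."""

    def lp(p: float) -> float:
        return math.log(max(p, floor))

    # Viterbi state = (previous tag, current tag).
    best: Dict[Tuple[str, str], float] = {(START, START): 0.0}
    back: List[Dict[Tuple[str, str], Tuple[str, str]]] = []
    for w in words:
        new_best: Dict[Tuple[str, str], float] = {}
        ptr: Dict[Tuple[str, str], Tuple[str, str]] = {}
        for (t2, t1), score in best.items():
            for t in tags:
                s = score + lp(trans.get((t2, t1, t), 0.0)) + lp(emit.get((w, t), 0.0))
                if (t1, t) not in new_best or s > new_best[(t1, t)]:
                    new_best[(t1, t)] = s
                    ptr[(t1, t)] = (t2, t1)
        best = new_best
        back.append(ptr)
    # Follow back-pointers from the best final state.
    state = max(best, key=best.get)
    out: List[str] = []
    for ptr in reversed(back):
        out.append(state[1])
        state = ptr[state]
    return list(reversed(out))


# Toy model (hypothetical numbers, for illustration only).
tags = ["DET", "NOUN", "VERB"]
trans = {
    (START, START, "DET"): 0.8, (START, START, "NOUN"): 0.2,
    (START, "DET", "NOUN"): 0.9, (START, "DET", "VERB"): 0.1,
    ("DET", "NOUN", "VERB"): 0.8, ("DET", "NOUN", "NOUN"): 0.2,
}
emit = {
    ("the", "DET"): 0.7, ("dog", "NOUN"): 0.4, ("dog", "VERB"): 0.01,
    ("barks", "VERB"): 0.3, ("barks", "NOUN"): 0.05,
}
print(viterbi_trigram(["the", "dog", "barks"], tags, trans, emit))
```

Working in log space avoids numeric underflow on long sentences; the pair-of-tags state is what makes the model second-order.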


PLOS ONE | 2012

Dynamics of Conflicts in Wikipedia

Taha Yasseri; Robert Sumi; András Rung; András Kornai; János Kertész

In this work we study the dynamical features of editorial wars in Wikipedia (WP). Using our previously established algorithm, we build samples of controversial and peaceful articles and analyze the temporal characteristics of the activity in these samples. On short time scales, we show that there is a clear correspondence between conflict and the burstiness of activity patterns, and that memory effects play an important role in controversies. On long time scales, we identify three distinct developmental patterns for the overall behavior of the articles. We are able to distinguish cases that eventually lead to consensus from those where a compromise is far from achievable. Finally, we analyze discussion networks and conclude that edit wars are mainly fought by only a few editors.
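Burstiness and memory are commonly quantified with the Goh-Barabási coefficients computed on inter-event times τ: B = (σ − μ)/(σ + μ), which is −1 for perfectly periodic activity, about 0 for a Poisson process, and approaches 1 for highly bursty activity, and M, the correlation between consecutive inter-event times. The minimal sketch below assumes edit timestamps in seconds; it illustrates these standard measures and is not claimed to reproduce the paper's exact computation.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def inter_event_times(timestamps: Sequence[float]) -> List[float]:
    """Gaps between consecutive events (timestamps are sorted first)."""
    ts = sorted(timestamps)
    return [b - a for a, b in zip(ts, ts[1:])]


def burstiness(taus: Sequence[float]) -> float:
    """Goh-Barabási B = (sigma - mu) / (sigma + mu)."""
    mu, sigma = mean(taus), pstdev(taus)
    return (sigma - mu) / (sigma + mu)


def memory(taus: Sequence[float]) -> float:
    """Pearson correlation of (tau_i, tau_{i+1}); needs non-constant gaps."""
    a, b = taus[:-1], taus[1:]
    ma, mb, sa, sb = mean(a), mean(b), pstdev(a), pstdev(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) * sa * sb)


# Hypothetical edit times: short bursts separated by long pauses.
edits = [0, 1, 2, 3, 100, 101, 102, 300]
taus = inter_event_times(edits)
print(burstiness(taus), memory(taus))
```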


Meeting of the Association for Computational Linguistics | 2005

Hunmorph: Open Source Word Analysis

Viktor Trón; György Gyepesi; Péter Halácsy; András Kornai; László Németh; Dániel Varga

Common tasks involving orthographic words include spellchecking, stemming, morphological analysis, and morphological synthesis. To enable significant reuse of the language-specific resources across all such tasks, we have extended the functionality of the open source spellchecker MySpell, yielding a generic word analysis library, the runtime layer of the hunmorph toolkit. We added an offline resource management component, hunlex, which complements the efficiency of our runtime layer with a high-level description language and a configurable precompiler.
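To illustrate the kind of lookup a MySpell-style runtime performs, here is a minimal sketch of suffix-rule analysis: each rule removes an appended ending, restores any characters stripped from the stem, and accepts the result only if the stem is listed in the dictionary with the rule's flag and satisfies the rule's condition. The `SuffixRule` structure, flags, and toy English entries are hypothetical simplifications, not hunmorph's actual data structures or resource format.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class SuffixRule:
    flag: str       # dictionary flag that licenses this rule
    strip: str      # characters removed from the stem before appending
    append: str     # characters added to form the surface word
    condition: str  # regex the stem must match at its end
    tag: str        # morphological label reported on success


def analyze(word: str, lexicon: Dict[str, Set[str]],
            rules: List[SuffixRule]) -> List[Tuple[str, str]]:
    """Return all (stem, tag) analyses of `word`."""
    results: List[Tuple[str, str]] = []
    if word in lexicon:
        results.append((word, "BASE"))
    for r in rules:
        if r.append and not word.endswith(r.append):
            continue
        # Undo the rule: drop the appended ending, restore stripped chars.
        base = word[: len(word) - len(r.append)] + r.strip
        if r.flag in lexicon.get(base, set()) and re.search(r.condition + "$", base):
            results.append((base, r.tag))
    return results


# Hypothetical toy resources.
lexicon = {"city": {"S"}, "dog": {"S"}, "bake": {"D"}, "walk": {"D"}}
rules = [
    SuffixRule("S", "", "s", "[^y]", "PLURAL"),         # dog -> dogs
    SuffixRule("S", "y", "ies", "[^aeiou]y", "PLURAL"),  # city -> cities
    SuffixRule("D", "", "ed", "[^e]", "PAST"),           # walk -> walked
    SuffixRule("D", "", "d", "e", "PAST"),               # bake -> baked
]
for w in ["cities", "citys", "dogs", "baked", "walked"]:
    print(w, analyze(w, lexicon, rules))
```

Because the same rules drive both directions, one set of language-specific resources can serve spellchecking (is there any analysis?), stemming (which stem?), and analysis (which tag?).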


Discrete Applied Mathematics | 1992

Narrowness, pathwidth, and their application in natural language processing

András Kornai; Zsolt Tuza

In the syntactic theory of Tesnière (1959), the structural descriptions of sentences are given as graphs. We discuss how the graph-theoretic concept of pathwidth is relevant to this approach. In particular, we point out the importance of graphs with pathwidth ≤ 6 in connection with natural language processing, and give a short proof of the characterization theorem for trees with pathwidth k.
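For very small graphs, pathwidth can be computed through its known equivalence with the vertex separation number: for a linear ordering of the vertices, count at each position how many already-placed vertices still have a neighbour further right; the pathwidth is the minimum, over all orderings, of the largest such count. The brute-force sketch below is exponential in the number of vertices and is meant only to make the definition concrete; it is not the method of the paper.

```python
from itertools import permutations
from typing import Dict, Hashable, Sequence, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected graph as adjacency sets


def vertex_separation(graph: Graph, order: Sequence[Hashable]) -> int:
    """Max number of placed vertices that still have an unplaced neighbour."""
    pos = {v: i for i, v in enumerate(order)}
    worst = 0
    for i in range(len(order)):
        cut = sum(1 for v in order[: i + 1] if any(pos[u] > i for u in graph[v]))
        worst = max(worst, cut)
    return worst


def pathwidth(graph: Graph) -> int:
    """Exact pathwidth by trying every ordering (tiny graphs only)."""
    return min(vertex_separation(graph, p) for p in permutations(graph))


def undirected(edges) -> Graph:
    g: Graph = {}
    for a, b in edges:
        g.setdefault(a, set()).add(b)
        g.setdefault(b, set()).add(a)
    return g


# A "spider" with three legs of length 2: the smallest tree of pathwidth 2.
spider = undirected([("c", "a1"), ("a1", "a2"), ("c", "b1"),
                     ("b1", "b2"), ("c", "d1"), ("d1", "d2")])
print(pathwidth(spider))
```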


WAC '06 Proceedings of the 2nd International Workshop on Web as Corpus | 2006

Web-based frequency dictionaries for medium density languages

András Kornai; Péter Halácsy; Viktor Nagy; Csaba Oravecz; Viktor Trón; Dániel Varga

Frequency dictionaries play an important role both in psycholinguistic experiment design and in language technology. The paper describes a new, freely available, web-based frequency dictionary of Hungarian that is being used for both purposes, and the language-independent techniques used for creating it.


Applied Imagery Pattern Recognition Workshop | 1999

Robust language-independent OCR system

Zhidong A. Lu; Issam Bazzi; András Kornai; John Makhoul; Premkumar Natarajan; Richard M. Schwartz

We present a language-independent optical character recognition system that is capable, in principle, of recognizing printed text from most of the world's languages. For each new language or script, the system requires sample training data along with ground truth at the text-line level; there is no need to specify the location of either the lines or the words and characters. The system uses hidden Markov modeling technology to model each character. In addition to language independence, the technology enhances performance on degraded data, such as fax, by using unsupervised adaptation techniques. Thus far, we have demonstrated the language independence of this approach for Arabic, English, and Chinese. Recognition results are presented in this paper, including results on faxed data.


PLOS ONE | 2012

A Practical Approach to Language Complexity: A Wikipedia Case Study

Taha Yasseri; András Kornai; János Kertész

In this paper we present a statistical analysis of English texts from Wikipedia. We address the issue of language complexity empirically by comparing the Simple English Wikipedia (Simple) to comparable samples of the main English Wikipedia (Main). Simple is supposed to use simplified language with a limited vocabulary, and editors are explicitly requested to follow this guideline, yet in practice the vocabulary richness of both samples is at the same level. Detailed analysis of longer units (n-grams of words and part-of-speech tags) shows that the language of Simple is less complex than that of Main primarily due to the use of shorter sentences, as opposed to drastically simplified syntax or vocabulary. Comparing the two language varieties by the Gunning readability index supports this conclusion. We also report on the topical dependence of language complexity, namely that the language is more advanced in conceptual articles than in person-based (biographical) and object-based articles. Finally, we investigate the relation between conflict and language complexity by analyzing the content of the talk pages associated with controversial and peacefully developing articles, concluding that controversy has the effect of reducing language complexity.
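The Gunning (fog) readability index is conventionally defined as 0.4 × (average sentence length in words + percentage of "complex" words, i.e. words of three or more syllables), so shorter sentences lower the score directly, which is consistent with the finding that Simple differs from Main mainly in sentence length. The sketch below is a simplified version under stated assumptions: the vowel-group syllable counter is a crude heuristic, Gunning's exclusions (proper nouns, compounds, common suffixes) are ignored, and the paper's actual tokenization is not reproduced.

```python
import re
from typing import List


def count_syllables(word: str) -> int:
    """Crude heuristic: count vowel runs (y included), drop a silent final 'e'."""
    w = word.lower()
    n = len(re.findall(r"[aeiouy]+", w))
    if w.endswith("e") and not w.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def gunning_fog(text: str) -> float:
    """0.4 * (words per sentence + 100 * complex words / words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words: List[str] = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    return 0.4 * (len(words) / len(sentences) + 100 * complex_words / len(words))


print(gunning_fog("The cat sat. The dog ran."))
print(gunning_fog("Readability matters."))
```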


PLOS ONE | 2013

Digital Language Death

András Kornai

Of the approximately 7,000 languages spoken today, some 2,500 are generally considered endangered. Here we argue that this consensus figure vastly underestimates the danger of digital language death, in that fewer than 5% of all languages can still ascend to the digital realm. We present evidence of a massive die-off caused by the digital divide.


Systems, Man and Cybernetics | 1995

An HMM-based legal amount field OCR system for checks

András Kornai; K.M. Mohiuddin; Scott D. Connell

The system described in this paper applies hidden Markov technology to the task of recognizing the handwritten legal amount on personal checks. We argue that the most significant source of error in handwriting recognition is the segmentation process. In traditional handwriting OCR systems, recognition is performed at the character level, using the output of an independent segmentation step. By operating on a series of vertical slices taken from the image at a fixed step size, the HMM system described in this paper avoids making segmentation decisions early in the recognition process.


Conference of the European Chapter of the Association for Computational Linguistics | 1985

Natural languages and the Chomsky hierarchy

András Kornai

The central claim of the paper is that NL stringsets are regular. Three independent arguments are offered in favor of this position: one based on parsimony considerations, one employing the McCulloch-Pitts (1943) model of neurons, and a purely linguistic one. It is possible to derive explicit upper bounds on the number of (live) states in NL acceptors: the results show that finite-state NL parsers can be implemented on present-day computers. The position of NL stringsets within the regular family is also investigated: it is proved that NLs are counter-free, but not locally testable.

Collaboration

Top Co-Authors

Péter Halácsy, Budapest University of Technology and Economics
Dániel Varga, Budapest University of Technology and Economics
Dávid Márk Nemeskey, Hungarian Academy of Sciences
Gábor András Recski, Hungarian Academy of Sciences
András Rung, Budapest University of Technology and Economics
Judit Ács, Hungarian Academy of Sciences
Márton Makrai, Hungarian Academy of Sciences
Viktor Trón, University of Edinburgh
Csaba Oravecz, Hungarian Academy of Sciences