Zoltán Alexin | Researchain

Publication

Featured research published by Zoltán Alexin.

inductive logic programming | 1999

Application of Different Learning Methods to Hungarian Part-of-Speech Tagging

Tamás Horváth; Zoltán Alexin; Tibor Gyimóthy; Stefan Wrobel

From the point of view of computational linguistics, Hungarian is a difficult language due to its complex grammar and rich morphology. This means that even a common task such as part-of-speech tagging presents a new challenge for learning when applied to Hungarian, especially given that the language has fairly free word order. In this paper we therefore present a case study designed to illustrate the potential and limits of current ILP and non-ILP algorithms on the Hungarian POS-tagging task. We have selected the popular C4.5 and Progol systems as propositional and ILP representatives, adding experiments with our own methods AGLEARN, a C4.5 preprocessor based on attribute grammars, and the ILP approaches PHM and RIBL. The systems were compared on the Hungarian version of the multilingual morphosyntactically annotated MULTEXT-East TELRI corpus, which consists of about 100,000 tokens. Experimental results indicate that Hungarian POS-tagging is indeed a challenging task for learning algorithms, that even simple background knowledge leads to large differences in accuracy, and that instance-based methods are promising approaches to POS-tagging for Hungarian as well. The paper also includes experiments with several cascade connections of the taggers.

conference of the european chapter of the association for computational linguistics | 2003

Manually annotated Hungarian corpus

Zoltán Alexin; Tibor Gyimóthy; Csaba Hatvani; László Tihanyi; János Csirik; Károly Bibok; Gabor Proszeky

This paper presents the results of a two-year project during which a consortium of the University of Szeged and MorphoLogic Ltd., Budapest, developed a morpho-syntactically parsed and annotated (disambiguated) corpus for Hungarian. For morpho-syntactic encoding, the Hungarian version of MSD (Morpho-Syntactic Description) has been used. The corpus contains texts from five different topic areas: schoolchildren's compositions, fiction, computer-related texts, news, and legal texts. During annotation, linguists checked the morpho-syntactic parsing of each word. The researchers of the consortium also studied how to find part-of-speech tagging (disambiguation) rules with machine learning algorithms. Because the corpus reaches up to 1 million text words, not counting punctuation characters, it may serve as a reference source for numerous future research applications. The corpus can be obtained freely via the Internet for research and educational purposes.

inductive logic programming | 1997

Learning Phonetic Rules in a Speech Recognition System

Zoltán Alexin; János Csirik; Tibor Gyimóthy; Márk Jelasity; László Tóth

Current speech recognition systems can be categorized into two broad classes: the knowledge-based approach and the stochastic one. In this paper we present a rule-based method for the recognition of Hungarian vowels. A spectrogram model was used as a front-end module, and some acoustic features (e.g. locations, intensities and shapes of local maxima) were extracted from spectrograms using a genetic algorithm. On the basis of these features we developed a rule set for the recognition of isolated Hungarian vowels. These rules, represented by Prolog clauses, were refined by the IMPUT Inductive Logic Programming method.

artificial intelligence in medicine in europe | 1997

Application of Inductive Logic Programming for Learning ECG Waveforms

Gabriella Kókai; Zoltán Alexin; Tibor Gyimóthy

In this paper a learning system is presented which integrates an ECG waveform classifier (called PECG) with an interactive learner (called IMPUT). The PECG system is based on an attribute grammar specification of ECGs that has been transformed to Prolog. The IMPUT system combines the interactive debugging technique IDT with the unfolding algorithm introduced in SPECTRE. Using the IMPUT system we can effectively assist in preparing the correct description of the basic structures of ECG waveforms.

intelligent data analysis | 1997

IMPUT: An Interactive Learning Tool Based on Program Specialization

Zoltán Alexin; Tibor Gyimóthy; Henrik Boström

The algorithm SPECTRE specializes logic programs with respect to positive and negative examples by applying the transformation rule unfolding together with clause removal. The method IMPUT presented in this paper gives a modified version of this algorithm by integrating the algorithmic debugging system IDTS with SPECTRE. The main idea of the IMPUT method is that the choice of the clause to be unfolded is crucial to the effectiveness of the specialization process. The debugging system IDTS is used to identify this buggy clause.
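
As a rough illustration of the unfolding transformation mentioned in this abstract, the following Python sketch unfolds one clause of a propositional (variable-free) logic program. It is a simplified toy under stated assumptions, not the SPECTRE or IMPUT implementation: real logic programs need unification, and the names `Clause`, `unfold`, and the example program are hypothetical.

```python
from typing import List, Tuple

# Assumption: a clause is (head, body) over propositional atoms only,
# so no unification is needed. A fact has an empty body.
Clause = Tuple[str, List[str]]


def unfold(program: List[Clause], clause: Clause, atom: str) -> List[Clause]:
    """Replace `clause` by its resolvents on `atom`.

    Each clause in `program` whose head is `atom` yields one resolvent,
    obtained by substituting that clause's body for `atom` in `clause`.
    """
    head, body = clause
    if atom not in body:
        raise ValueError(f"{atom!r} does not occur in the clause body")
    i = body.index(atom)
    resolvents = [
        (head, body[:i] + def_body + body[i + 1:])
        for def_head, def_body in program
        if def_head == atom
    ]
    # Drop the original clause and append the resolvents.
    return [c for c in program if c != clause] + resolvents


# Toy program:  p :- q, r.   q :- a.   q :- b.   r.
toy_program: List[Clause] = [
    ("p", ["q", "r"]),
    ("q", ["a"]),
    ("q", ["b"]),
    ("r", []),
]

# Unfolding p :- q, r on q gives p :- a, r and p :- b, r.
unfolded = unfold(toy_program, toy_program[0], "q")
for h, b in unfolded:
    print(h, ":-", ", ".join(b) if b else "true")
```

After unfolding, the clause for `p` is split into one clause per definition of `q`. Clause removal can then delete only the resolvent that covers a negative example, which was not possible while the two cases were hidden behind the single literal `q`.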

inductive logic programming | 1996

Analyzing and Learning ECG Waveforms

Gabriella Kókai; Zoltán Alexin; Tibor Gyimóthy

In this paper we present a system which integrates an ECG waveform classifier (called PECG) with an interactive learner (called IMPUT). The PECG system is based on an attribute grammar specification of ECGs that has been transformed to Prolog. The IMPUT system combines the interactive debugging technique IDT with the unfolding algorithm introduced in SPECTRE. The main result achieved in the new version of the PECG system is that an ILP method can be used to improve the effectiveness of a real-size Prolog application. Applying the IMPUT method, the extended PECG system is able to suggest a correct solution to the user to replace the buggy clause recognized during the debugging process.

text speech and dialogue | 2012

A Manually Annotated Corpus of Pharmaceutical Patents

Márton Kiss; Ágoston Nagy; Veronika Vincze; Attila Almási; Zoltán Alexin; János Csirik

The language of patent claims differs from ordinary language to a great extent, so patent processing requires tools specially adapted to patent language. In order to evaluate these tools, manually annotated patent corpora are necessary. Thus, we constructed a corpus of English-language pharmaceutical patents belonging to the class A61K, on which several layers of manual annotation (such as named entities, keys, NucleusNPs, quantitative expressions, heads and complements, perdurants) were carried out and on which tools for patent processing can be evaluated.

language resources and evaluation | 2010