Publication


Featured research published by Beáta Megyesi.


empirical methods in natural language processing | 2007

Single Malt or Blended? A Study in Multilingual Parser Optimization

Johan Hall; Jens Nilsson; Joakim Nivre; Gülşen Eryiğit; Beáta Megyesi; Mattias Nilsson; Markus Saers

We describe a two-stage optimization of the MaltParser system for the ten languages in the multilingual track of the CoNLL 2007 shared task on dependency parsing. The first stage consists in tuning a single-parser system for each language by optimizing parameters of the parsing algorithm, the feature model, and the learning algorithm. The second stage consists in building an ensemble system that combines six different parsing strategies, extrapolating from the optimal parameter settings for each language. When evaluated on the official test sets, the ensemble system significantly outperformed the single-parser system and achieved the highest average labeled attachment score of all systems participating in the shared task.


meeting of the association for computational linguistics | 2006

A Study on Automatically Extracted Keywords in Text Categorization

Anette Hulth; Beáta Megyesi

This paper presents a study of whether and how automatically extracted keywords can be used to improve text categorization. In summary, we show that higher performance, as measured by micro-averaged F-measure on a standard text categorization collection, is achieved when the full-text representation is combined with the automatically extracted keywords. The combination is obtained by giving higher weights to words in the full texts that are also extracted as keywords. We also present results for experiments in which the keywords are the only input to the categorizer, either represented as unigrams or intact. Of these two experiments, the unigrams have the best performance, although neither performs as well as headlines only.
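The weighting idea in this abstract can be illustrated with a minimal sketch. The function name `weighted_term_counts` and the `boost` value are assumptions made for illustration; the abstract does not state the weighting scheme or the factor the authors used.

```python
from collections import Counter


def weighted_term_counts(tokens, keywords, boost=2.0):
    """Return term weights for one document.

    Each term starts from its raw frequency in the full text. Terms that
    were also extracted as keywords are multiplied by `boost`
    (a hypothetical factor, not taken from the paper).
    """
    counts = Counter(tokens)
    return {
        term: freq * (boost if term in keywords else 1.0)
        for term, freq in counts.items()
    }


doc_tokens = ["parser", "the", "parser", "tree"]
doc_keywords = {"parser"}
weights = weighted_term_counts(doc_tokens, doc_keywords)
```

The resulting dictionary could then be fed to any bag-of-words categorizer in place of plain term frequencies.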


sighum workshop on language technology for cultural heritage social sciences and humanities | 2014

A Multilingual Evaluation of Three Spelling Normalisation Methods for Historical Text

Eva Pettersson; Beáta Megyesi; Joakim Nivre

We present a multilingual evaluation of approaches for spelling normalisation of historical text based on data from five languages: English, German, Hungarian, Icelandic, and Swedish. Three different normalisation methods are evaluated: a simplistic filtering model, a Levenshtein-based approach, and a character-based statistical machine translation approach. The evaluation shows that the machine translation approach often gives the best results, but also that all approaches improve over the baseline and that no single method works best for all languages.
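As a rough illustration of what a Levenshtein-based normaliser does, the sketch below maps a historical spelling to the closest entry in a modern word list. The function names, the `max_distance` threshold, and the alphabetical tie-breaking are assumptions; the abstract does not describe the authors' actual configuration.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalise(word, lexicon, max_distance=2):
    """Return the closest lexicon entry, or the word itself if none is close.

    `max_distance` is a hypothetical cut-off; ties are broken alphabetically.
    """
    if not lexicon:
        return word
    best = min(lexicon, key=lambda w: (levenshtein(word, w), w))
    return best if levenshtein(word, best) <= max_distance else word


modern = ["old", "gold", "cold"]
result = normalise("olde", modern)
```

Leaving the word unchanged when no candidate falls within the threshold keeps the normaliser from forcing unrelated replacements.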


Nordic Journal of Linguistics | 2014

Professional language in Swedish clinical text: Linguistic characterization and comparative studies

Kelly Smith; Beáta Megyesi; Sumithra Velupillai; Maria Kvist

This study investigates the linguistic characteristics of Swedish clinical text in radiology reports and doctors' daily notes from electronic health records (EHRs) in comparison to general Swedish and biomedical journal text. We quantify linguistic features through a comparative register analysis to determine how the free text of EHRs differs from general and biomedical Swedish text in terms of lexical complexity, word and sentence composition, and common sentence structures. The linguistic features are extracted using state-of-the-art computational tools: a tokenizer, a part-of-speech tagger, and scripts for statistical analysis. Results show that technical terms and abbreviations are more frequent in clinical text, and lexical variance is low. Moreover, clinical text frequently omits subjects, verbs, and function words, resulting in shorter sentences. Clinical text differs not only from general Swedish but also internally, across its sub-domains; for example, sentences lacking verbs are significantly more frequent in radiology reports. These results provide a foundation for future development of automatic methods for EHR simplification or clarification.


conference of the european chapter of the association for computational linguistics | 2014

EACL - Expansion of Abbreviations in CLinical text

Lisa Tengstrand; Beáta Megyesi; Aron Henriksson; Martin Duneld; Maria Kvist

In the medical domain, especially in clinical texts, non-standard abbreviations are prevalent, which impairs readability for patients. To ease the understanding of the physicians’ notes, abbreviations need to be identified and expanded to their original forms. We present a distributional semantic approach to find candidates for the original form of the abbreviation, and combine this with Levenshtein distance to choose the correct candidate among the semantically related words. We apply the method to radiology reports and medical journal texts, and compare the results to general Swedish. The results show that the correct expansion of the abbreviation can be found in 40% of the cases, an improvement of 24 percentage points compared to the baseline (0.16), and an increase of 22 percentage points compared to using word space models alone (0.18).


text speech and dialogue | 2000

Ensemble of Classifiers for Noise Detection in PoS Tagged Corpora

Harald Berthelsen; Beáta Megyesi

In this paper we apply the ensemble approach to the identification of incorrectly annotated items (noise) in a training set. In a controlled experiment, memory-based, decision tree-based and transformation-based classifiers are used as a filter to detect and remove noise deliberately introduced into a manually tagged corpus. The results indicate that the method can be successfully applied to automatically detect errors in a corpus.
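The filtering step can be sketched as follows, assuming each classifier's predicted tags are already available as a list aligned with the manually assigned tags. The function name `flag_noise` and the `min_disagree` parameter are hypothetical; the abstract does not say whether the authors required consensus or a majority, so the sketch defaults to consensus and lets the threshold be lowered.

```python
def flag_noise(gold_tags, predictions, min_disagree=None):
    """Return indices of items whose annotated tag looks suspicious.

    gold_tags:    list of annotated tags, one per token.
    predictions:  list of per-classifier tag lists, each aligned with gold_tags.
    min_disagree: how many classifiers must disagree with the annotation
                  before an item is flagged (default: all of them).
    """
    if min_disagree is None:
        min_disagree = len(predictions)
    flagged = []
    for i, gold in enumerate(gold_tags):
        disagreements = sum(1 for tags in predictions if tags[i] != gold)
        if disagreements >= min_disagree:
            flagged.append(i)
    return flagged


gold = ["DT", "NN", "VB", "NN"]
classifier_outputs = [
    ["DT", "NN", "NN", "NN"],  # e.g. a memory-based tagger
    ["DT", "VB", "NN", "NN"],  # e.g. a decision-tree tagger
    ["DT", "NN", "NN", "JJ"],  # e.g. a transformation-based tagger
]
suspects = flag_noise(gold, classifier_outputs)
```

A stricter threshold flags fewer items and so removes less correct data, at the cost of missing some errors.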


Archive | 2002

Data-driven syntactic analysis: Methods and applications for Swedish

Beáta Megyesi


Journal of Machine Learning Research | 2002

Shallow parsing with PoS taggers and linguistic features

Beáta Megyesi


empirical methods in natural language processing | 1999

Improving Brill's PoS Tagger for an Agglutinative Language

Beáta Megyesi


Proceedings of Fonetik 2002, Stockholm, May 2002 | 2002

Boundaries and groupings: the structuring of speech in different communicative situations. A description of the GROG project

Rolf Carlson; Björn Granström; Mattias Heldner; David House; Beáta Megyesi; Eva Strangert; Marc Swerts

Collaboration


Dive into Beáta Megyesi's collaborations.

Top Co-Authors


Kevin Knight

University of Southern California
