Amália Mendes
University of Lisbon
Publication
Featured research published by Amália Mendes.
processing of the portuguese language | 2012
Michel Généreux; Iris Hendrickx; Amália Mendes
We present a newly available on-line resource for Portuguese: a corpus of 310 million words, a new version of the Reference Corpus of Contemporary Portuguese, now searchable via a user-friendly web interface. Here we report on work carried out on the corpus prior to its publication on-line. We focus on the processes and tools involved in cleaning, preparing and annotating the corpus to make it suitable for linguistic inquiry.
processing of the portuguese language | 2014
Paulo Quaresma; Amália Mendes; Iris Hendrickx; Teresa Gonçalves
In this paper we present an experiment in automatically tagging a set of Portuguese modal verbs with modal information. Modality is the expression of the speaker’s (or the subject’s) attitude towards the content of the sentence and may be marked with lexical clues such as verbs, adverbs and adjectives, but also by mood and tense. Here we focus exclusively on 9 verbal clues that are frequent in Portuguese and that may have more than one modal meaning. As our gold data set we use a manually annotated corpus of 160,000 tokens, annotated according to a modality annotation scheme for Portuguese. We apply a machine learning approach to predict the modal meaning of a verb in context. This modality tagger takes into consideration all the features available from the parsed data (POS, syntactic and semantic). The results show that the tagger improved on the baseline for all verbs and reached macro-average F-measures between 35 and 81%, depending on the modal verb and on the modal value.
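As an illustration of the evaluation metric named in this abstract, the following is a minimal sketch of a macro-averaged F-measure: an F1 score is computed for each label separately and the scores are then averaged with equal weight. The function name `macro_f1` and the label names `epistemic` and `deontic` are assumptions made for the example; they are not taken from the paper's annotation scheme or code.

```python
from collections import Counter


def macro_f1(gold, predicted):
    """Average the per-label F1 scores, giving each label equal weight."""
    labels = sorted(set(gold) | set(predicted))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # the predicted label is a false positive
            fn[g] += 1  # the gold label was missed
    scores = []
    for label in labels:
        predicted_total = tp[label] + fp[label]
        gold_total = tp[label] + fn[label]
        precision = tp[label] / predicted_total if predicted_total else 0.0
        recall = tp[label] / gold_total if gold_total else 0.0
        denominator = precision + recall
        scores.append(2 * precision * recall / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


# Hypothetical modal values for four occurrences of one verb.
gold = ["epistemic", "deontic", "deontic", "epistemic"]
predicted = ["epistemic", "deontic", "epistemic", "epistemic"]
score = macro_f1(gold, predicted)
```

Because every label counts equally, a rare modal value that is tagged poorly lowers the macro average as much as a frequent one would.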
processing of the portuguese language | 2006
Sandra Antunes; Maria Fernanda Bacelar do Nascimento; João Miguel Casteleiro; Amália Mendes; Luísa Pereira; Tiago Sá
This presentation focuses on an ongoing project which aims at the creation of a large lexical database of Portuguese multiword (MW) units, automatically extracted through the analysis of a balanced 50 million word corpus, statistically interpreted with lexical association measures and validated by hand. This database covers different types of MW units, such as named entities, and lexical associations ranging from sets of favoured co-occurring forms to strongly lexicalized expressions. This new resource has a two-fold objective: to be an important research tool which supports the development of typologies of MW units, and to be of major help in developing and evaluating language processing tools capable of dealing with MW expressions.
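The abstract does not say which lexical association measures were used. As one common example of such a measure, the sketch below scores adjacent word pairs with pointwise mutual information (PMI), estimated from raw counts. The function name `bigram_pmi` and the toy token list are hypothetical and are not drawn from the project.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Return a dict mapping each adjacent word pair to its PMI in bits."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_pair = count / n_bigrams
        p_w1 = unigrams[w1] / n_unigrams
        p_w2 = unigrams[w2] / n_unigrams
        # PMI compares the observed pair probability with the probability
        # expected if the two words occurred independently of each other.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Toy input; a real run would use a tokenized corpus.
tokens = "a b a b c".split()
scores = bigram_pmi(tokens)
```

A high PMI marks a pair that co-occurs more often than chance would predict, which makes it a candidate multiword unit for manual validation. PMI is known to overrate very rare pairs, so a minimum frequency threshold is usually applied alongside it.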
PROPOR | 2018
Amália Mendes; Iria del Río
We describe two new resources that have been prepared for European Portuguese and how they are used for discourse parsing: the Portuguese subpart of the TED-MDB corpus, a multilingual corpus of TED Talks that has been annotated in the PDTB style, and the Lexicon of Discourse Markers for Portuguese (LDM-PT). Both lexicon and corpus are used in a preliminary experiment for discourse connective identification in texts. This includes, in many cases, the difficult task of disambiguating between connective and non-connective uses. We annotated the PT-TED-MDB corpus with POS, lemma and syntactic constituency and focus on the 10 most frequent connectives in the corpus. The best approach considers word-form+POS+syntactic annotation and leads to 85% precision.
conference on intelligent text processing and computational linguistics | 2016
João Sequeira; Teresa Gonçalves; Paulo Quaresma; Amália Mendes; Iris Hendrickx
This paper presents a study in a field that is poorly explored for the Portuguese language: modality and its automatic tagging. Our main goal was to find a set of attributes for the creation of automatic taggers with improved performance over the bag-of-words (bow) approach. Performance was measured using precision, recall and \(F_1\). Because it is a relatively unexplored field, the study covers the creation of the corpus (composed of eleven verbs), the use of a parser to extract syntactic and semantic information from the sentences, and a machine learning approach to identify modality values. Based on three different sets of attributes (from the trigger itself, from the trigger’s path in the parse tree, and from the context), the system creates a tagger for each verb, achieving for almost every verb an improvement in \(F_1\) when compared to the traditional bow approach.
Pluricentric Languages and Non-Dominant Varieties Worldwide: Volume 2: The pluricentricity of Portuguese and Spanish: New concepts and descriptions | 2016
Amália Mendes; Antónia Estrela; Fernanda Bacelar do Nascimento; Luísa Pereira; Sandra Antunes
The present study focuses on the language attitudes of Russian L2 learners of Greek who reside in Cyprus towards Cypriot Greek (CG) and Standard Modern Greek (SMG), in the light of pluricentricity theory (Clyne, 1992; Muhr, 2003, 2005; Muhr, 2012). The matched-guise technique (Lambert, 1960; 1967; Evripidou, 2011) was implemented, and 50 L1 Russian participants were asked to evaluate the personal qualities of bi-dialectal speakers using the Likert-scale questionnaire and the recordings from Evripidou’s study (2013). Participants completed the semantic differential scale and assessed the recorded passages of the same speakers in two different guises: CG and SMG. The results showed that L2 learners of Greek with an L1 Russian background tend to have a more positive attitude towards SMG than towards CG. Overall, people who speak SMG are considered to be kinder, more sincere, educated, attractive, friendly, modern, hard-working and intelligent, and to have a better sense of humour, than speakers of CG. When compared and contrasted with the results of Evripidou’s study (ibid), these findings appear to be mainly in disagreement. In general, Russian L2 learners of Greek who live in Cyprus have a negative attitude towards CG, an unofficial variety of the pluricentric language (Greek), while they seem to favour SMG, the official variety.
language resources and evaluation | 2006
Florbela Barreto; António Branco; Eduardo Ferreira; Amália Mendes; Maria Fernanda Bacelar do Nascimento; Filipe Nunes; João Ricardo Silva
language resources and evaluation | 2012
Iris Hendrickx; Amália Mendes; Silvia Mencarelli
linguistic annotation workshop | 2010
Iris Hendrickx; Amália Mendes; Sílvia Afonso Pereira; Anabela Gonçalves; Inês Duarte
language resources and evaluation | 2006
Amália Mendes; Sandra Antunes; Maria Fernanda Bacelar do Nascimento; João Miguel Casteleiro; Luísa Pereira; Tiago Sá