Amália Mendes | Researchain

Publication

Featured research published by Amália Mendes.

processing of the Portuguese language | 2012

A large Portuguese corpus on-line: cleaning and preprocessing

Michel Généreux; Iris Hendrickx; Amália Mendes

We present a newly available on-line resource for Portuguese: a corpus of 310 million words that is a new version of the Reference Corpus of Contemporary Portuguese and is now searchable via a user-friendly web interface. Here we report on the work carried out on the corpus prior to its publication on-line. We focus on the processes and tools involved in cleaning, preparing and annotating the corpus to make it suitable for linguistic inquiry.

processing of the Portuguese language | 2014

Tagging and Labelling Portuguese Modal Verbs

Paulo Quaresma; Amália Mendes; Iris Hendrickx; Teresa Gonçalves

In this paper we present an experiment in automatically tagging a set of Portuguese modal verbs with modal information. Modality is the expression of the speaker’s (or the subject’s) attitude towards the content of the sentence and may be marked by lexical clues such as verbs, adverbs and adjectives, but also by mood and tense. Here we focus exclusively on 9 verbal clues that are frequent in Portuguese and that may have more than one modal meaning. As our gold data set we use a corpus of 160,000 tokens, manually annotated according to a modality annotation scheme for Portuguese. We apply a machine learning approach to predict the modal meaning of a verb in context. This modality tagger takes into consideration all the features available from the parsed data (POS, syntactic and semantic). The results show that the tagger improved on the baseline for all verbs and reached macro-average F-measures between 35% and 81%, depending on the modal verb and on the modal value.
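As a rough illustration of the evaluation metric mentioned in this abstract, the sketch below computes a macro-averaged F-measure: F1 is calculated separately for each label and the per-label scores are then averaged without weighting. The function name `macro_f1` and the label names are hypothetical and are not taken from the paper.

```python
from collections import Counter


def macro_f1(gold, predicted):
    """Macro-averaged F1: F1 per label, then the unweighted mean over labels."""
    labels = sorted(set(gold) | set(predicted))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label was wrong
            fn[g] += 1  # gold label was missed
    scores = []
    for label in labels:
        predicted_total = tp[label] + fp[label]
        gold_total = tp[label] + fn[label]
        precision = tp[label] / predicted_total if predicted_total else 0.0
        recall = tp[label] / gold_total if gold_total else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical modal values assigned to four occurrences of one verb.
gold = ["epistemic", "deontic", "epistemic", "deontic"]
pred = ["epistemic", "epistemic", "epistemic", "deontic"]
score = macro_f1(gold, pred)
```

Because every label counts equally, a rare modal value that is tagged poorly lowers the macro average as much as a frequent one would.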

processing of the Portuguese language | 2006

A lexical database of Portuguese multiword expressions

Sandra Antunes; Maria Fernanda Bacelar do Nascimento; João Miguel Casteleiro; Amália Mendes; Luísa Pereira; Tiago Sá

This presentation focuses on an ongoing project that aims to create a large lexical database of Portuguese multiword (MW) units, automatically extracted through the analysis of a balanced 50-million-word corpus, statistically interpreted with lexical association measures and validated by hand. The database covers different types of MW units, such as named entities, and lexical associations ranging from sets of favoured co-occurring forms to strongly lexicalized expressions. This new resource has a two-fold objective: to be an important research tool that supports the development of typologies of MW units, and to be of major help in developing and evaluating language processing tools capable of dealing with MW expressions.
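The abstract does not say which lexical association measures were used. As one common possibility, the sketch below scores adjacent word pairs with pointwise mutual information (PMI); the choice of PMI, the function name `bigram_pmi` and the toy sentence are all assumptions made for illustration.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Score each adjacent word pair with PMI = log2(p(x, y) / (p(x) * p(y)))."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bigrams
        p_x = unigrams[w1] / n_unigrams
        p_y = unigrams[w2] / n_unigrams
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Hypothetical toy input; a real run would use a large tokenized corpus.
tokens = "o fim de semana foi bom e o fim de semana passou".split()
scores = bigram_pmi(tokens)
# Candidates ranked by score would then be validated by hand.
ranked = sorted(scores, key=scores.get, reverse=True)
```

PMI tends to overrate rare pairs, so in practice a frequency threshold or a second measure is usually combined with it before manual validation.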

PROPOR | 2018

Using a Discourse Bank and a Lexicon for the Automatic Identification of Discourse Connectives

Amália Mendes; Iria del Río

We describe two new resources that have been prepared for European Portuguese and how they are used for discourse parsing: the Portuguese subpart of the TED-MDB corpus, a multilingual corpus of TED Talks that has been annotated in the PDTB style, and the Lexicon of Discourse Markers for Portuguese (LDM-PT). Both lexicon and corpus are used in a preliminary experiment for discourse connective identification in texts. This includes, in many cases, the difficult task of disambiguating between connective and non-connective uses. We annotated the PT-TED-MDB corpus with POS, lemma and syntactic constituency and focus on the 10 most frequent connectives in the corpus. The best approach considers word-form+POS+syntactic annotation and leads to 85% precision.

conference on intelligent text processing and computational linguistics | 2016

Using Syntactic and Semantic Features for Classifying Modal Values in the Portuguese Language

João Sequeira; Teresa Gonçalves; Paulo Quaresma; Amália Mendes; Iris Hendrickx

This paper presents a study in a field that remains poorly explored for the Portuguese language: modality and its automatic tagging. Our main goal was to find a set of attributes for building automatic taggers that perform better than the bag-of-words (bow) approach. Performance was measured using precision, recall and \(F_1\). Because the field is relatively unexplored, the study covers the creation of the corpus (covering eleven verbs), the use of a parser to extract syntactic and semantic information from the sentences, and a machine learning approach to identify modality values. Based on three different sets of attributes (from the trigger itself, from the trigger’s path in the parse tree, and from the context), the system creates a tagger for each verb, achieving an improvement in \(F_1\) for almost every verb when compared to the traditional bow approach.
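For readers unfamiliar with the bag-of-words baseline mentioned in this abstract, the following minimal sketch turns each sentence into a vector of word counts over a shared vocabulary. The function names and the example sentences are hypothetical; the paper's own feature extraction is not described in enough detail here to reproduce.

```python
from collections import Counter


def build_vocabulary(sentences):
    """Map every distinct lowercased token to a fixed column index."""
    vocab = sorted({tok for s in sentences for tok in s.lower().split()})
    return {tok: i for i, tok in enumerate(vocab)}


def bow_vector(sentence, vocab):
    """Count how often each vocabulary word occurs; word order is discarded."""
    counts = Counter(sentence.lower().split())
    return [counts.get(tok, 0) for tok in vocab]


# Hypothetical sentences containing modal verbs.
sentences = ["ele pode sair", "ele deve sair agora"]
vocab = build_vocabulary(sentences)
vectors = [bow_vector(s, vocab) for s in sentences]
```

Such a representation ignores syntax entirely, which is why features drawn from the trigger, its parse-tree path and its context can add information that word counts alone do not capture.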

Pluricentric Languages and Non-Dominant Varieties Worldwide: Volume 2: The pluricentricity of Portuguese and Spanish: New concepts and descriptions | 2016

New words, old suffixes: Nominal derivation in the African varieties of Portuguese compared to European Portuguese

Amália Mendes; Antónia Estrela; Fernanda Bacelar do Nascimento; Luísa Pereira; Sandra Antunes

The present study focuses on the language attitudes of Russian L2 learners of Greek who reside in Cyprus towards Cypriot Greek (CG) and Standard Modern Greek (SMG) in the light of pluricentricity theory (Clyne, 1992; Muhr, 2003, 2005; Muhr, 2012). The matched-guise technique (Lambert, 1960; 1967; Evripidou, 2011) was implemented, and 50 L1 Russian participants were asked to evaluate the personal qualities of bi-dialectal speakers using the Likert-scale questionnaire and the recordings from Evripidou’s study (2013). Participants completed the semantic differential scale and assessed the recorded passages of the same speakers in two different guises: CG and SMG. The results showed that L2 learners of Greek with an L1 Russian background tend to have a more positive attitude towards SMG than towards CG. Overall, people who speak SMG are considered to be kinder, more sincere, educated, attractive, friendly, modern, hard-working and intelligent, and to have a better sense of humour than speakers of CG. When these results are compared and contrasted with those of Evripidou’s study (ibid), they appear to be mainly in disagreement. In general, Russian L2 learners of Greek who live in Cyprus have a negative attitude towards CG, an unofficial variety of the pluricentric language (Greek), while they seem to favour SMG, the official variety.

language resources and evaluation | 2006