Publication


Featured research published by Paula Carvalho.


Conference on Information and Knowledge Management | 2009

Clues for detecting irony in user-generated contents: oh...!! it's "so easy" ;-)

Paula Carvalho; Luís Sarmento; Mário J. Silva; Eugénio de Oliveira

We investigate the accuracy of a set of surface patterns in identifying ironic sentences in comments submitted by users to an on-line newspaper. The initial focus is on identifying irony in sentences containing positive predicates since these sentences are more exposed to irony, making their true polarity harder to recognize. We show that it is possible to find ironic sentences with relatively high precision (from 45% to 85%) by exploring certain oral or gestural clues in user comments, such as emoticons, onomatopoeic expressions for laughter, heavy punctuation marks, quotation marks and positive interjections. We also demonstrate that clues based on deeper linguistic information are relatively inefficient in capturing irony in user-generated content, which points to the need for exploring additional types of oral clues.
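As a rough illustration of what surface clues of this kind might look like in practice, the sketch below matches one regular expression per clue family named in the abstract (emoticons, laughter onomatopoeia, heavy punctuation, quotation marks, positive interjections). The patterns, the word lists and the `find_irony_clues` helper are assumptions made for this example; they are not the patterns evaluated in the paper.

```python
import re

# Hypothetical patterns, one per clue family named in the abstract.
# They are illustrative only and far simpler than a real pattern set.
CLUE_PATTERNS = {
    "emoticon": re.compile(r"[:;=]-?[)D(pP]"),
    "laughter": re.compile(r"\b(?:a?(?:ha){2,}h?|(?:he){2,}|lol)\b", re.IGNORECASE),
    "heavy_punctuation": re.compile(r"[!?]{2,}|\.{3,}"),
    "quotation_marks": re.compile(r"\"[^\"]+\"|“[^”]+”"),
    "interjection": re.compile(r"\b(?:bravo|viva|parabéns|congratulations)\b", re.IGNORECASE),
}


def find_irony_clues(sentence):
    """Return the sorted names of the clue families that match the sentence."""
    return sorted(
        name for name, pattern in CLUE_PATTERNS.items() if pattern.search(sentence)
    )


if __name__ == "__main__":
    # The paper's own title is a handy test sentence.
    print(find_irony_clues('oh...!! it\'s "so easy" ;-)'))
```

A matched clue only flags a sentence as a candidate. Per the abstract, precision varies widely across clue types (45% to 85%), so each family would need to be evaluated separately.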


Cross Language Evaluation Forum | 2008

GeoCLEF 2008: the CLEF 2008 cross-language geographic information retrieval track overview

Thomas Mandl; Paula Carvalho; Giorgio Maria Di Nunzio; Fredric C. Gey; Ray R. Larson; Diana Santos; Christa Womser-Hacker

GeoCLEF is an evaluation task running under the scope of the Cross Language Evaluation Forum (CLEF). The purpose of GeoCLEF is to test and evaluate cross-language geographic information retrieval (GIR). The GeoCLEF 2008 task presented twenty-five geographically challenging search topics for English, German and Portuguese. Eleven participants submitted 131 runs, based on a variety of approaches, including sample documents, named entity extraction and ontology-based retrieval. The evaluation methodology and results are presented in the paper.


Conference on Information and Knowledge Management | 2009

Automatic creation of a reference corpus for political opinion mining in user-generated content

Luís Sarmento; Paula Carvalho; Mário J. Silva; Eugénio de Oliveira

We propose and evaluate a method for automatically creating a reference corpus for training text classification procedures for mining political opinions in user-generated content. The process starts by compiling a collection of highly opinionated comments posted by users on an on-line newspaper. Then, we define and use a set of manually crafted high-precision rules supported by a large sentiment lexicon in order to identify sentences in each comment expressing opinions about political entities. Finally, the opinions found are propagated to the remaining sentences of the comment mentioning the same entities, thus increasing the number and variety of opinion-bearing sentences. Results show that most of the rules can identify negative opinions with very high precision, and these can be safely propagated to the remaining sentences in the comment in almost 100% of the cases. Due to problems arising from irony, the precision of identification drops for positive opinions, but several rules still reach high precision. Propagation of positive opinions is correct in about 77% of the cases, and most errors at this stage result from irony and polarity inversion throughout the comment.


Processing of the Portuguese Language | 2012

Building a sentiment lexicon for social judgement mining

Mário J. Silva; Paula Carvalho; Luís Sarmento

We present a methodology for automatically enlarging a Portuguese sentiment lexicon for mining social judgments from text, i.e., detecting opinions on human entities. Starting from publicly available language resources, the identification of human adjectives is performed through the combination of a linguistic-based strategy, for extracting human adjective candidates from corpora, and machine learning for filtering the human adjectives from the candidate list. We then create a graph of the synonymic relations among the human adjectives, which is built from multiple open thesauri. The graph provides distance features for training a model for polarity assignment. Our initial evaluation shows that this method produces results at least as good as the best that have been reported for this task.
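One way to picture distance features on a synonym graph is sketched below: a multi-source breadth-first search computes, for every adjective, the hop count to the nearest positive seed and to the nearest negative seed. The abstract does not specify which distance features were used, so the feature definition, the toy English graph and the names `distances_from` and `distance_features` are all assumptions made for illustration.

```python
from collections import deque


def distances_from(graph, seeds):
    """Multi-source BFS: fewest synonym hops from any seed to each reachable word."""
    dist = {seed: 0 for seed in seeds if seed in graph}
    queue = deque(dist)
    while queue:
        word = queue.popleft()
        for neighbour in graph.get(word, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[word] + 1
                queue.append(neighbour)
    return dist


def distance_features(graph, positive_seeds, negative_seeds):
    """Map each word to (hops to nearest positive seed, hops to nearest negative seed).

    A value of None means no seed of that polarity is reachable.
    """
    pos = distances_from(graph, positive_seeds)
    neg = distances_from(graph, negative_seeds)
    return {word: (pos.get(word), neg.get(word)) for word in graph}


# Hypothetical toy synonym graph (undirected, stored as adjacency sets).
SYNONYMS = {
    "honest": {"sincere"},
    "sincere": {"honest", "frank"},
    "frank": {"sincere", "blunt"},
    "blunt": {"frank", "rude"},
    "rude": {"blunt"},
}

features = distance_features(SYNONYMS, {"honest"}, {"rude"})
```

Pairs such as these could then serve as input columns for a polarity classifier; the choice of seeds and of the classifier is left open here.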


Conference on Computational Natural Language Learning | 2016

Modelling Context with User Embeddings for Sarcasm Detection in Social Media

Silvio Amir; Byron C. Wallace; Hao Lyu; Paula Carvalho; Mário J. Silva

We introduce a deep neural network for automated sarcasm detection. Recent work has emphasized the need for models to capitalize on contextual features, beyond lexical and syntactic cues present in utterances. For example, different speakers will tend to employ sarcasm regarding different subjects and, thus, sarcasm detection models ought to encode such speaker information. Current methods have achieved this by way of laborious feature engineering. By contrast, we propose to automatically learn and then exploit user embeddings, to be used in concert with lexical signals to recognize sarcasm. Our approach does not require elaborate feature engineering (and concomitant data scraping); fitting user embeddings requires only the text from their previous posts. The experimental results show that our model outperforms a state-of-the-art approach leveraging an extensive set of carefully crafted features.


Environmental Earth Sciences | 2012

Assessment of the potential mobility and toxicity of metals and metalloids in soils contaminated by old Sb–Au and As–Au mines (NW Portugal)

Paula Carvalho; A.M.R. Neiva; M.M.V.G. Silva

The main purpose of this study is to assess arsenic and antimony availability in soils, as well as Co, Cr, Cu, Fe, Mn, Ni, Pb and Zn availability in soils derived from the schist–metagraywacke complex close to old Sb–Au mines and in soils developed from Ordovician slates and close to an old As–Au mine in Portugal. The availability was determined using a European certified sequential extraction procedure (BCR). The results demonstrated that metalloids are not readily bioavailable, because they are mainly associated with the residual fraction. Arsenic and antimony proportions in exchangeable fractions are up to 3 and 1%, respectively. However, arsenic is up to 24% in oxy-hydroxide fractions, while antimony is up to 4% in them, demonstrating the higher bioavailability of arsenic compared to that of antimony, as metalloids are weakly bound to the soils in that fraction. Therefore, arsenic tends to be more toxic than antimony in all soils studied. However, the pseudo-total contents show that both metalloids are above the Italian and Dutch guidelines. Therefore, if physico-chemical changes occur, arsenic and antimony will show higher potential environmental risk than evidenced by Co, Cr, Cu, Fe, Mn, Ni, Pb and Zn.


Processing of the Portuguese Language | 2008

Second HAREM: New Challenges and Old Wisdom

Diana Santos; Cláudia Freitas; Hugo Gonçalo Oliveira; Paula Carvalho

Discussion of the Second HAREM: changes to the guidelines, introduction of new tracks, improvement of evaluation measures and description of the new evaluation resources.


Geochemistry: Exploration, Environment, Analysis | 2009

Geochemistry of soils, stream sediments and waters close to abandoned W–Au–Sb mines at Sarzedas, Castelo Branco, central Portugal

Paula Carvalho; A.M.R. Neiva; M.M.V.G. Silva

In the Sarzedas area, central Portugal, the Cambrian schist–metagreywacke complex predominates and is intersected by W–Au–Sb quartz veins and Sb–Au-bearing felsitic dykes, which were exploited for W, Au and Sb at the Gatas–Santa and Pomar–Galdins mines. The soils from the Gatas–Santa and Pomar–Galdins areas contain up to 11 000 and 21 600 ppm Sb, respectively. In the Sarzedas area, the stream sediments contain up to 840 ppm Sb. Only some surface waters contain Sb, which reaches 2.6 mg l−1. The waters have pH values of 5.5–7.3; they are neutralized by carbonate intercalations in the schist–metagreywacke complex. The mine wastes contain ferberite and sulphides, and have goethite and ferrihydrite coatings, which have up to 8.10 wt% Sb2O5 and 2.26 wt% As2O5 and contain inclusions of stibnite, arsenopyrite and pyrite. The highest Sb and As concentrations in soils, stream sediments and surface waters are related to the mineralized veins and dykes and mine wastes. The good correlation between W, Sb and As contents in soils from Gatas–Santa and good correlation coefficients for W–Sb, W–Pb and Sb–Pb in stream sediments from Gatas–Santa and Pomar–Galdins are due to the weathering of ferberite, stibnite, arsenopyrite and galena from mineralized veins and dykes and mine wastes. Soils, stream sediments and waters from the Sarzedas area are contaminated, which is consistent with findings for comparable effects in historical Sb mine sites elsewhere in Europe and Australasia, but the Sarzedas waters contain the highest Sb concentration and the lowest As concentration.


Proceedings of the 14th International Academic MindTrek Conference on Envisioning Future Media Environments | 2010

VIRUS: video information retrieval using subtitles

Thibault Langlois; Teresa Chambel; Eva Oliveira; Paula Carvalho; Gonçalo Marques; André O. Falcão

Video is a very rich medium that is becoming increasingly dominant. A massive amount of video information is available, but very difficult to access if not adequately indexed: a challenging task to accomplish. We describe a Video Information Retrieval system, under development, that operates on a database composed of subtitled documents. The simultaneous analysis of video, subtitles and audio streams is performed in order to index, visualize and retrieve excerpts of video documents that share a certain emotional or semantic property.


ElectricDict '04 Proceedings of the Workshop on Enhancing and Using Electronic Dictionaries | 2004

Multiword lexical acquisition and dictionary formalization

Cristina Mota; Paula Carvalho; Elisabete Ranchhod

In this paper, we present the current state of development of a large-scale lexicon built at LabEL for Portuguese. We will concentrate on multiword expressions (MWE), particularly on multiword nouns, (i) illustrating their most relevant morphological features, and (ii) pointing out the methods and techniques adopted to generate the inflected forms from lemmas. Moreover, we describe a corpus-based approach for the acquisition of new multiword nouns, which led to a significant enlargement of the existing lexicon. Evaluation results concerning lexical coverage in the corpus are also discussed.

Collaboration


Dive into Paula Carvalho's collaborations.

Top Co-Authors

Cristina Mota

Technical University of Lisbon
