Luís Sarmento
University of Porto
Publication
Featured research published by Luís Sarmento.
conference on information and knowledge management | 2009
Paula Carvalho; Luís Sarmento; Mário J. Silva; Eugénio de Oliveira
We investigate the accuracy of a set of surface patterns in identifying ironic sentences in comments submitted by users to an on-line newspaper. The initial focus is on identifying irony in sentences containing positive predicates since these sentences are more exposed to irony, making their true polarity harder to recognize. We show that it is possible to find ironic sentences with relatively high precision (from 45% to 85%) by exploring certain oral or gestural clues in user comments, such as emoticons, onomatopoeic expressions for laughter, heavy punctuation marks, quotation marks and positive interjections. We also demonstrate that clues based on deeper linguistic information are relatively inefficient in capturing irony in user-generated content, which points to the need for exploring additional types of oral clues.
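The surface clues listed in the abstract above (emoticons, laughter onomatopoeia, heavy punctuation, quotation marks) can be illustrated with a few regular expressions. This is only a minimal sketch: the abstract does not give the paper's actual patterns, so the regexes, the `CLUES` table, and the `irony_clues` function are hypothetical.

```python
import re

# Hypothetical clue patterns; the actual patterns used in the paper
# are not given in the abstract.
CLUES = {
    "emoticon": re.compile(r"[:;]-?[)DPp]"),
    "laughter": re.compile(r"\b(?:ha|he|hi){2,}h?\b|\blol\b", re.IGNORECASE),
    "heavy_punctuation": re.compile(r"[!?]{2,}|\.{3,}"),
    "quotation_marks": re.compile(r"\"[^\"]+\"|“[^”]+”"),
}


def irony_clues(sentence: str) -> list[str]:
    """Return the names of the surface clues that fire on a sentence."""
    return [name for name, pattern in CLUES.items() if pattern.search(sentence)]


if __name__ == "__main__":
    print(irony_clues('What a "brilliant" minister!!! hahaha :)'))
```

A sentence that triggers none of the patterns yields an empty list. Measuring precision, as the paper does, would additionally require a manually annotated sample.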
international world wide web conferences | 2012
Matko Bošnjak; Eduardo Oliveira; José Martins; Eduarda Mendes Rodrigues; Luís Sarmento
Modern social network analysis relies on vast quantities of data to infer new knowledge about human relations and communication. In this paper we describe TwitterEcho, an open source Twitter crawler for supporting this kind of research, which is characterized by a modular distributed architecture. Our crawler enables researchers to continuously collect data from particular user communities, while respecting Twitter's imposed limits. We present the core modules of the crawling server, some of which were specifically designed to focus the crawl on the Portuguese Twittosphere. Additional modules can be easily implemented, thus changing the focus to a different community. Our evaluation of the system shows high crawling performance and coverage.
conference on information and knowledge management | 2007
Luís Sarmento; Valentin Jijkoun; Maarten de Rijke; Eugénio C. Oliveira
We present a corpus-based approach to the class expansion task. For a given set of seed entities we use co-occurrence statistics taken from a text collection to define a membership function that is used to rank candidate entities for inclusion in the set. We describe an evaluation framework that uses data from Wikipedia. The performance of our class extension method improves as the size of the text collection increases.
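The idea of ranking candidates by their co-occurrence with the seed entities can be sketched as follows. The abstract does not define the membership function, so the scoring used here (mean co-occurrence with each seed, normalised by the candidate's own frequency) is an assumption, as are the `rank_candidates` name and the representation of the collection as lists of entity mentions.

```python
from collections import Counter
from itertools import combinations


def rank_candidates(contexts, seeds):
    """Rank non-seed entities by average normalised co-occurrence with the seeds.

    contexts: iterable of entity lists, one per sentence or document.
    seeds: non-empty collection of seed entities.
    """
    seeds = set(seeds)
    freq = Counter()   # number of contexts each entity appears in
    pair = Counter()   # number of contexts each unordered pair shares
    for ctx in contexts:
        entities = set(ctx)
        freq.update(entities)
        for a, b in combinations(sorted(entities), 2):
            pair[(a, b)] += 1

    def cooc(a, b):
        return pair[(a, b)] if a < b else pair[(b, a)]

    scores = {}
    for cand in freq:
        if cand in seeds:
            continue
        # Hypothetical membership function, not the one from the paper.
        scores[cand] = sum(cooc(cand, s) / freq[cand] for s in seeds) / len(seeds)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))


if __name__ == "__main__":
    contexts = [
        ["Porto", "Lisbon", "Braga"],
        ["Porto", "Braga"],
        ["Lisbon", "Braga", "fado"],
        ["fado", "guitar"],
    ]
    print(rank_candidates(contexts, ["Porto", "Lisbon"]))
```

In this toy collection, "Braga" ranks first because it shares most of its contexts with both seeds.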
conference on information and knowledge management | 2009
Luís Sarmento; Paula Carvalho; Mário J. Silva; Eugénio de Oliveira
We propose and evaluate a method for automatically creating a reference corpus for training text classification procedures for mining political opinions in user-generated content. The process starts by compiling a collection of highly opinionated comments posted by users on an on-line newspaper. Then, we define and use a set of manually crafted high-precision rules supported by a large sentiment lexicon in order to identify sentences in each comment expressing opinions about political entities. Finally, the opinions found are propagated to the remaining sentences of the comment mentioning the same entities, thus increasing the number and variety of opinion-bearing sentences. Results show that most of the rules can identify negative opinions with very high precision, and these can be safely propagated to the remaining sentences in the comment in almost 100% of the cases. Due to problems arising from irony, the precision of identification drops for positive opinions, but several rules still reach high precision. Propagation of positive opinions is correct in about 77% of the cases, and most errors at this stage result from irony and polarity inversion throughout the comment.
adaptive agents and multi-agents systems | 2003
Eugénio C. Oliveira; Luís Sarmento
During the last two decades, researchers have collected a substantial amount of experimental evidence about the fundamental role of Emotion in cognitive processing. Emotional phenomena have been correlated with effective decision-making processes, memory, learning and other high-level cognitive capabilities and skills (e.g. risk assessment). In this paper we describe ongoing work that aims to design new Agent Architectures influenced by what has been learned in psychology and neurosciences about Emotion-cognition interaction. We present an Agent architecture that includes several emotional-like mechanisms, namely: emotional evaluation functions, Emotion-biased processing, emotional tagging and mood congruent memory. These mechanisms are intended to increase the performance and adaptability of Agents operating in real-time environments. We also introduce Pyrosim, a MAS platform we have developed to serve as an appropriate test-bed for Emotional-based Architectures, which simulates a forest fire in a complex 3D environment.
processing of the portuguese language | 2012
Mário J. Silva; Paula Carvalho; Luís Sarmento
We present a methodology for automatically enlarging a Portuguese sentiment lexicon for mining social judgments from text, i.e., detecting opinions on human entities. Starting from publicly available language resources, the identification of human adjectives is performed through the combination of a linguistic-based strategy, for extracting human adjective candidates from corpora, and machine learning for filtering the human adjectives from the candidate list. We then create a graph of the synonymic relations among the human adjectives, which is built from multiple open thesauri. The graph provides distance features for training a model for polarity assignment. Our initial evaluation shows that this method produces results at least as good as the best that have been reported for this task.
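One plausible reading of "distance features" from a synonym graph is the shortest-path distance from each adjective to a set of adjectives with known polarity. The sketch below computes such distances with breadth-first search. The abstract does not define the exact features, so the function names, the seed sets, and the English example words are all hypothetical.

```python
from collections import deque


def distances_from(graph, sources):
    """Breadth-first search: fewest hops from any source to each reachable node."""
    dist = {s: 0 for s in sources if s in graph}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def distance_features(graph, positive_seeds, negative_seeds):
    """Map each word to (distance to positive seeds, distance to negative seeds).

    A distance is None when no seed of that polarity is reachable.
    """
    pos = distances_from(graph, positive_seeds)
    neg = distances_from(graph, negative_seeds)
    return {word: (pos.get(word), neg.get(word)) for word in graph}


if __name__ == "__main__":
    # Undirected synonym graph as an adjacency dict (illustrative words only).
    graph = {
        "honest": ["sincere"],
        "sincere": ["honest", "frank"],
        "frank": ["sincere", "blunt"],
        "blunt": ["frank", "rude"],
        "rude": ["blunt"],
    }
    print(distance_features(graph, ["honest"], ["rude"]))
```

The resulting pairs could then be fed, alongside other features, to any standard classifier for polarity assignment.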
portuguese conference on artificial intelligence | 2005
David Pereira; Eugénio C. Oliveira; Nelma Moreira; Luís Sarmento
In this paper we present the emotional-BDI architecture, an extension to the BDI architecture supporting artificial emotions and including internal representations for agents' capabilities and resources. The architecture we present here is conceptual, defining which components should exist so that emotional-BDI agents can use effective capabilities as well as effective resources in order to better cope with highly dynamic environments.
machine learning and data mining in pattern recognition | 2009
Luís Sarmento; Alexander Kehlenbeck; Eugénio C. Oliveira; Lyle H. Ungar
We present a multi-pass clustering approach to large scale, wide-scope named-entity disambiguation (NED) on collections of web pages. Our approach uses name co-occurrence information to cluster and hence disambiguate entities, and is designed to handle NED on the entire web. We show that on web collections, NED becomes increasingly difficult as the corpus size increases, not only because of the challenge of scaling the NED algorithm, but also because new and surprising facets of entities become visible in the data. This effect limits the potential benefits for data-driven approaches of processing larger data-sets, and suggests that efficient clustering-based disambiguation methods for the web will require extracting more specialized information from documents.
processing of the portuguese language | 2006
Luís Sarmento
In this paper we describe SIEMES, a named-entity recognition system for Portuguese that bases its classification procedure on a set of similarity rules. These rules try to obtain soft matches between candidate entities found in text and instances contained in a wide-scope gazetteer, and avoid the need for coding large sets of rules by exploiting lexical similarities. Using this matching procedure, SIEMES generates a set of classification hypotheses based solely on internal evidence, which may be disambiguated in a later step by relatively simple rules based on contextual clues. We explain the SIEMES architecture and its named-entity identification and classification procedure. We also briefly discuss the results of the participation of SIEMES in HAREM, the named-entity evaluation contest for Portuguese, and describe future work.
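Soft matching of a candidate entity against a gazetteer can be sketched with a generic string-similarity measure. The abstract does not specify the similarity rules SIEMES uses, so this example substitutes `difflib.SequenceMatcher` from the Python standard library as a stand-in. The `soft_match` function, the threshold, and the toy gazetteer are hypothetical.

```python
from difflib import SequenceMatcher


def soft_match(candidate, gazetteer, threshold=0.8):
    """Return (entry, category, similarity) hypotheses above the threshold, best first.

    gazetteer: dict mapping a known entity name to its category.
    """
    hypotheses = []
    for entry, category in gazetteer.items():
        # Generic character-level similarity; a stand-in for SIEMES's own rules.
        score = SequenceMatcher(None, candidate.lower(), entry.lower()).ratio()
        if score >= threshold:
            hypotheses.append((entry, category, score))
    return sorted(hypotheses, key=lambda h: -h[2])


if __name__ == "__main__":
    gazetteer = {
        "Universidade do Porto": "ORGANIZATION",
        "Porto": "LOCATION",
        "Rio Douro": "LOCATION",
    }
    for entry, category, score in soft_match("Universidade de Porto", gazetteer):
        print(f"{entry} -> {category} ({score:.2f})")
```

When several hypotheses survive the threshold, a later step would pick among them using contextual clues, as the abstract describes.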
processing of the portuguese language | 2006
Luís Sarmento; Ana Sofia Pinto; Luís Miguel Cabral
In this paper we describe REPENTINO, a publicly available gazetteer intended to help the development of named entity recognition systems for Portuguese. REPENTINO aims to minimize the problems developers face due to the limited availability of this type of lexical-semantic resource for Portuguese. The data stored in REPENTINO was mostly extracted from corpora and from the web using simple semi-automated methods. Currently, REPENTINO stores nearly 450k instances of named entities divided into more than 100 categories and subcategories covering a much wider set of domains than those usually included in traditional gazetteers. We present some figures regarding the current content of the gazetteer and describe future work regarding the evaluation of this resource and its enrichment with additional information.