Vitor Rocio
Universidade Aberta
Publication
Featured research published by Vitor Rocio.
Grammars | 2001
Vitor Rocio; Gabriel Pereira Lopes; Éric Villemonte de la Clergerie
Efficient partial parsing systems (chunkers) are urgently required by various natural language application areas because these parsers always produce partially parsed text, even when the text does not fully fit existing lexica and grammars. The availability of partially parsed corpora is essential for extracting various kinds of information that may then be fed into those systems, thereby increasing their processing power. In this paper, we propose an efficient partial parsing scheme, based on chart parsing, that is flexible enough to support both normal parsing tasks and the diagnosis, in previously obtained partial parses, of the possible causes (kinds of faults) that led to partial rather than complete parses. Using the built-in tabulation capabilities of the DyALog system, we implemented a partial parser that runs as fast as the best non-deterministic parsers. We elaborate on the implementation of two different grammar formalisms: Definite Clause Grammars (DCG) extended with head declarations, and Bound Movement Grammars (BMG).
Archive | 2003
Vitor Rocio; Mário Amado Alves; J. Gabriel Pereira Lopes; Maria Francisca Xavier; Graça Vicente
The growing trend towards corpus-based linguistics has led researchers to manually annotate large quantities of text. The human effort involved in this task is often enormous and requires highly specialised, linguistically trained manpower. In our view, a different approach should be followed, using this highly trained manpower in other, more rewarding and creative activities, in a constructive dialogue among the various kinds of expertise needed to overcome our ignorance about languages. As an experiment, we used tools and linguistic resources previously built for Contemporary Portuguese to partially automate the partial annotation of a Medieval Portuguese corpus. In this paper, we describe the tools used (POS tagger, lexical analyser and partial parser) and demonstrate that the similarities between a language at two different time periods are sufficient for bootstrapping and acquiring lexical knowledge from the partially parsed, automatically annotated corpus.
conference on information and knowledge management | 2007
Gracinda Carvalho; David Martins de Matos; Vitor Rocio
Question Answering (QA) has been an area of interest for researchers, in part motivated by the international QA evaluation forums, namely the Text REtrieval Conference (TREC) and, more recently, the Cross Language Evaluation Forum (CLEF) through QA@CLEF, which has included the Portuguese language since 2004. In these forums, a collection of written documents is provided, as well as a set of questions to be answered by the participating systems. Each system is evaluated as a whole by its capacity to answer the questions, and relatively few published results focus on the performance of its different components and their influence on overall system performance. That is the case for the Information Retrieval (IR) component, which is widely used in QA systems. Our work concentrates on the different options for preprocessing Portuguese text before feeding it to the IR component, evaluating their impact on IR performance in the specific context of QA, so that we can make a well-founded choice among them. From this work we conclude that the basic preprocessing techniques, case folding and removal of punctuation marks, offer a clear advantage. Of the other techniques considered, stop word removal enhanced the performance of the IR system, but stemming and lemmatization did not.
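The three preprocessing steps that the abstract above reports as beneficial (case folding, punctuation removal and stop word removal) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function name `preprocess`, the tiny `STOP_WORDS` set and the whitespace tokenisation are all assumptions made for the example.

```python
import string

# Hypothetical, deliberately tiny stop-word list for illustration only;
# a real system would use a full Portuguese stop-word list.
STOP_WORDS = {"a", "o", "de", "do", "da", "e", "em", "que"}

# Map every ASCII punctuation character to a space so that tokens stay separated.
# Simplification: this also splits hyphenated forms such as "disse-me".
_PUNCT_TABLE = str.maketrans({ch: " " for ch in string.punctuation})


def preprocess(text, remove_stop_words=True):
    """Case-fold, strip ASCII punctuation, and optionally drop stop words."""
    folded = text.casefold()                      # case folding
    tokens = folded.translate(_PUNCT_TABLE).split()  # punctuation removal + whitespace tokenisation
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


print(preprocess("Qual é a capital de Portugal?"))
# prints ['qual', 'é', 'capital', 'portugal']
```

Note that accented forms are kept as they are, so "é" (the verb) is not confused with the stop word "e" (the conjunction).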
cross language evaluation forum | 2008
Gracinda Carvalho; David Martins de Matos; Vitor Rocio
IdSay is an open domain Question Answering (QA) system for Portuguese. Its current version can be considered a baseline version, using mainly techniques from the area of Information Retrieval (IR). The only external information it uses besides the text collections is lexical information for Portuguese. It was submitted for the first time to the monolingual Portuguese task of the QA track of the Cross-Language Evaluation Forum 2008 (QA@CLEF), where it answered 65 of the 200 questions correctly with its first answer, and 85 when considering the three answers that could be returned per question. Generally, the types of questions that the IdSay system answers best are measure factoids, count factoids and definitions, but there is still work to be done in these areas, as well as in the treatment of time. List questions and location and people/organization factoids are the question types with the most room for improvement.
processing of the portuguese language | 2010
Gracinda Carvalho; David Martins de Matos; Vitor Rocio
IdSay is a Question Answering system for Portuguese that participated in QA@CLEF 2008 with a baseline version (IdSayBL). Despite the encouraging results, there was still much room for improvement. The participation of six systems in the Portuguese task, with very good results either individually or in a hypothetical combination run, provided a valuable source of information. We analysed all the answers submitted by all systems to identify their strengths and weaknesses. We used the conclusions of that analysis to guide our improvements, keeping in mind the two key characteristics we want for the system: efficiency in terms of response time and robustness in handling different types of data. As a result, an improved version of IdSay was developed, including, as the most important enhancement, the introduction of semantic information. We obtained significantly better results, from a first-answer accuracy of 32.5% in IdSayBL to 50.5% in IdSay, without degradation of response time.
portuguese conference on artificial intelligence | 2007
Vitor Rocio; Joaquim Ferreira da Silva; Gabriel Pereira Lopes
Automatic morphosyntactic tagging of corpora is usually imperfect. Wrong or strange tagging may be repeated automatically following certain patterns. It is usually hard to detect all these errors manually, as corpora may contain millions of tags. This paper presents an approach to detecting sequences of part-of-speech tags that have internal cohesiveness in corpora. Some sequences correspond to syntactic chunks or correct sequences, but some are strange or incorrect, usually due to systematically wrong tagging. The amount of time spent separating incorrect bigrams and trigrams from correct ones is very small, but it allows us to detect 70% of all tagging errors in the corpus.
processing of the portuguese language | 2012
Gracinda Carvalho; Isabel Falé; David Martins de Matos; Vitor Rocio
A mixed corpus of Portuguese is one in which texts of different origins produce different spelling variants for the same word. A new norm, which will bring together the written texts produced in both Portugal and Brazil, giving them a more uniform orthography, has been in effect since 2009. But what happens, from the perspective of search, to corpora created before the norm came into practice, or during the transition period? Is the information they contain outdated and worthless? Do they need to be converted to the new norm? In the present work we analyse these questions.
portuguese conference on artificial intelligence | 2005
Gabriel Pereira Lopes; Joaquim Ferreira da Silva; Vitor Rocio; Paulo Quaresma
This chapter includes the short papers presented at the workshop on Text Mining and Applications (TeMA 2005), organized in the framework of the Portuguese Association for Artificial Intelligence conference (EPIA). The workshop aimed to attract quality papers and enhance knowledge in this area. The first paper addresses the extraction of semantic relations from text corpora. The second measures the impact of cognates on parallel text alignment. The third presents two methods of using long first-order frequent patterns in text mining.
TAPD | 1998
Vitor Rocio; José Gabriel Lopes
Archive | 2009
Vitor Rocio; José Coelho; Alda Pereira