João Ricardo Silva
University of Lisbon
Publication
Featured research published by João Ricardo Silva.
conference of the european chapter of the association for computational linguistics | 2006
António Branco; João Ricardo Silva
In this paper we present LX-Suite, a set of tools for the shallow processing of Portuguese. This suite comprises several modules, namely: a sentence chunker, a tokenizer, a POS tagger, featurizers and lemmatizers.
processing of the portuguese language | 2010
João Ricardo Silva; António Branco; Sérgio Castro; Ruben Reis
In this paper we assess to what extent the available Portuguese treebanks and available probabilistic parsers are suitable for out-of-the-box robust parsing of Portuguese. We also announce the release of the best parser coming out of this exercise, which is, to the best of our knowledge, the first robust parser widely available for Portuguese.
processing of the portuguese language | 2003
António Branco; João Ricardo Silva
Ambiguous strings are strings of non-whitespace characters, typically coinciding with orthographic contractions of word forms, that, depending on the specific occurrence, are to be considered as consisting of one or more than one token. Strings of this sort are shown to raise the problem of undesired circularity between tokenization and tagging. This paper presents a strategy to resolve ambiguous strings and dissolve such circularity.
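The circularity described in this abstract can be illustrated with a minimal sketch. It is one possible arrangement, not necessarily the strategy the paper proposes: the tokenizer leaves ambiguous strings intact, a tagger decides per occurrence, and a post-tagging step expands only the occurrences tagged as contractions. The lexicon `AMBIGUOUS`, the tag names `CONTR`, `V` and `X`, and the `toy_tagger` heuristic are all hypothetical stand-ins for illustration.

```python
# Hypothetical lexicon: strings that are either one token (a verb form)
# or a contraction of two tokens, depending on the occurrence.
AMBIGUOUS = {
    "deste": ("de", "este"),  # "of this" vs. a form of the verb "dar"
}

def tokenize(text):
    """Whitespace tokenizer that deliberately leaves ambiguous strings unsplit."""
    return text.split()

def toy_tagger(tokens):
    """Stand-in for a real POS tagger (assumption, not a real model).

    Tags an ambiguous string as a verb ("V") when it follows a subject
    pronoun, and as a contraction ("CONTR") otherwise.
    """
    tagged = []
    for i, tok in enumerate(tokens):
        if tok.lower() in AMBIGUOUS:
            after_pronoun = i > 0 and tokens[i - 1].lower() in {"eu", "tu"}
            tagged.append((tok, "V" if after_pronoun else "CONTR"))
        else:
            tagged.append((tok, "X"))
    return tagged

def expand(tagged):
    """Post-tagging step: split only the occurrences tagged as contractions."""
    out = []
    for tok, tag in tagged:
        if tag == "CONTR" and tok.lower() in AMBIGUOUS:
            out.extend(AMBIGUOUS[tok.lower()])
        else:
            out.append(tok)
    return out

def resolve(text):
    return expand(toy_tagger(tokenize(text)))

verb_reading = resolve("tu deste o livro")
contraction_reading = resolve("gosto deste livro")
```

Because the split happens after tagging, the tokenizer never needs tag information, which is the dependency that would otherwise make the two steps circular.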
processing of the portuguese language | 2016
João António Rodrigues; António Branco; Steven Neale; João Ricardo Silva
In this article we describe the creation and distribution of the first publicly available word embeddings for Portuguese. Our embeddings are evaluated on their own and also compared with the original English models on a well-known analogy task. We gathered a large Portuguese corpus of 1.7 billion tokens, developed the first distributional semantic analogies test set for Portuguese, and proceeded with the first parametrization and evaluation of Portuguese word embedding models.
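For readers unfamiliar with analogy tasks of the kind this abstract mentions, the following is a minimal sketch of the usual vector-offset formulation ("a is to b as c is to ?"). The three-dimensional vectors in `toy` are invented for illustration and are not taken from the paper's embeddings; the function names are likewise hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def solve_analogy(emb, a, b, c):
    """Return the word whose vector is closest to emb[b] - emb[a] + emb[c].

    The three query words are excluded from the candidates, as is
    customary in this kind of evaluation.
    """
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = (w for w in emb if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(emb[w], target))

# Invented toy vectors (assumption): not real embeddings.
toy = {
    "homem":  [1.0, 0.0, 0.0],
    "mulher": [1.0, 1.0, 0.0],
    "rei":    [1.0, 0.0, 1.0],
    "rainha": [1.0, 1.0, 1.0],
    "mesa":   [0.0, 0.0, 0.2],
}

answer = solve_analogy(toy, "homem", "mulher", "rei")
```

An analogy test set is then scored by the fraction of questions for which the returned word matches the expected one.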
processing of the portuguese language | 2014
António Branco; João António Rodrigues; Francisco Costa; João Ricardo Silva; Rui Vaz
This paper is concerned with a tool that supports human experts in their task of classifying text excerpts suitable to be used in quizzes for learning materials and as items of exams that are aimed at assessing and certifying the language level of students taking courses of Portuguese as a second language.
meeting of the association for computational linguistics | 2016
Rosa Del Gaudio; Gorka Labaka; Eneko Agirre; Petya Osenova; Kiril Simov; Martin Popel; Dieke Oele; Gertjan van Noord; Luís Gomes; João António Rodrigues; Steven Neale; João Ricardo Silva; Andreia Querido; Nuno Rendeiro; António Branco
This paper presents the description of 12 systems submitted to the WMT16 IT-task, covering six different languages, namely Basque, Bulgarian, Dutch, Czech, Portuguese and Spanish. All these systems were developed under the scope of the QTLeap project, presenting a common strategy. For each language two different systems were submitted, namely a phrase-based MT system built using Moses, and a system exploiting deep language engineering approaches, which in all the languages but Bulgarian was implemented using TectoMT. For 4 of the 6 languages, the TectoMT-based system performs better than the Moses-based one.
text speech and dialogue | 2012
João Ricardo Silva; António Branco
Deep linguistic grammars provide complex grammatical representations of sentences, capturing, for instance, long-distance dependencies and returning semantic representations, making them suitable for advanced natural language processing. However, they lack robustness in that they do not gracefully handle words missing from the lexicon of the grammar. Several approaches have been taken to handle this problem, one of which consists in pre-annotating the input to the grammar with shallow processing machine-learning tools. This is usually done to speed up parsing (supertagging), but it can also be used as a way of handling unknown words in the input. These pre-processing tools, however, must be able to cope with the vast tagset required by a deep grammar. We investigate the training and evaluation of several supertaggers for a deep linguistic processing grammar and report on it in this paper.
processing of the portuguese language | 2006
António Branco; João Ricardo Silva
A widespread assumption about the analysis of inflection features is that this task is to be performed by a tagger with an extended tagset. This typically leads to a POS precision drop due to the data-sparseness problem. In this paper we tackle this problem by addressing inflection tagging as a dedicated task, separated from that of POS tagging. More specifically, this paper describes and evaluates a rule-based approach to the tagging of Gender, Number and Degree inflection of open nominal morphosyntactic categories. This approach achieves a better F-measure than the typical approach of inflection analysis via stochastic state-of-the-art tagging.
meeting of the association for computational linguistics | 2009
António Branco; Francisco Costa; Eduardo Ferreira; Pedro Martins; Filipe Nunes; João Ricardo Silva; Sara Silveira
This is a paper supporting the demonstration of the LX-Center at ACL-IJCNLP-09. LX-Center is a web center of online linguistic services aimed both at demonstrating a range of language technology tools and at fostering education, research and development in natural language science and technology.
processing of the portuguese language | 2008
António Branco; Lino Rodrigues; João Ricardo Silva; Sara Silveira
This paper describes XisQue (http://xisque.di.fc.ul.pt), an online service for real-time, open-domain question answering (QA) on the Portuguese Web.