João Ricardo Silva | Researchain

Publication

Featured research published by João Ricardo Silva.

conference of the european chapter of the association for computational linguistics | 2006

A suite of shallow processing tools for Portuguese: LX-suite

António Branco; João Ricardo Silva

In this paper we present LX-Suite, a set of tools for the shallow processing of Portuguese. This suite comprises several modules, namely: a sentence chunker, a tokenizer, a POS tagger, featurizers and lemmatizers.

processing of the portuguese language | 2010

Out-of-the-box robust parsing of Portuguese

João Ricardo Silva; António Branco; Sérgio Castro; Ruben Reis

In this paper we assess to what extent the available Portuguese treebanks and available probabilistic parsers are suitable for out-of-the-box robust parsing of Portuguese. We also announce the release of the best parser coming out of this exercise, which is, to the best of our knowledge, the first robust parser widely available for Portuguese.

processing of the portuguese language | 2003

Contractions: breaking the tokenization-tagging circularity

António Branco; João Ricardo Silva

Ambiguous strings are strings of non-whitespace characters, typically coinciding with orthographic contractions of word forms, that, depending on the specific occurrence, are to be considered as consisting of one token or of more than one. Strings of this sort are shown to raise the problem of undesired circularity between tokenization and tagging. This paper presents a strategy to resolve ambiguous strings and dissolve such circularity.

processing of the portuguese language | 2016

LX-DSemVectors: Distributional Semantics Models for Portuguese

João António Rodrigues; António Branco; Steven Neale; João Ricardo Silva

In this article we describe the creation and distribution of the first publicly available word embeddings for Portuguese. Our embeddings are evaluated on their own and also compared with the original English models on a well-known analogy task. We gathered a large Portuguese corpus of 1.7 billion tokens, developed the first distributional semantic analogies test set for Portuguese, and proceeded with the first parametrization and evaluation of Portuguese word embeddings models.

processing of the portuguese language | 2014

Rolling out Text Categorization for Language Learning Assessment Supported by Language Technology

António Branco; João António Rodrigues; Francisco Costa; João Ricardo Silva; Rui Vaz

This paper presents a tool that supports human experts in classifying text excerpts suitable for use in quizzes for learning materials and as exam items aimed at assessing and certifying the language level of students taking courses of Portuguese as a second language.

meeting of the association for computational linguistics | 2016

SMT and Hybrid systems of the QTLeap project in the WMT16 IT-task

Rosa Del Gaudio; Gorka Labaka; Eneko Agirre; Petya Osenova; Kiril Simov; Martin Popel; Dieke Oele; Gertjan van Noord; Luís Gomes; João António Rodrigues; Steven Neale; João Ricardo Silva; Andreia Querido; Nuno Rendeiro; António Branco

This paper describes 12 systems submitted to the WMT16 IT-task, covering six languages: Basque, Bulgarian, Dutch, Czech, Portuguese and Spanish. All these systems were developed within the QTLeap project and follow a common strategy. For each language two systems were submitted: a phrase-based MT system built using Moses, and a system exploiting deep language engineering approaches, which in all languages but Bulgarian was implemented using TectoMT. For 4 of the 6 languages, the TectoMT-based system performs better than the Moses-based one.

text speech and dialogue | 2012

Assigning Deep Lexical Types

João Ricardo Silva; António Branco

Deep linguistic grammars provide complex grammatical representations of sentences, capturing, for instance, long-distance dependencies and returning semantic representations, making them suitable for advanced natural language processing. However, they lack robustness in that they do not gracefully handle words missing from the lexicon of the grammar. Several approaches have been taken to handle this problem, one of which consists in pre-annotating the input to the grammar with shallow processing machine-learning tools. This is usually done to speed up parsing (supertagging), but it can also be used as a way of handling unknown words in the input. These pre-processing tools, however, must be able to cope with the vast tagset required by a deep grammar. We investigate the training and evaluation of several supertaggers for a deep linguistic processing grammar and report on it in this paper.

processing of the portuguese language | 2006

Dedicated nominal featurization of Portuguese

António Branco; João Ricardo Silva

A widespread assumption about the analysis of inflection features is that this task is to be performed by a tagger with an extended tagset. This typically leads to a POS precision drop due to the data-sparseness problem. In this paper we tackle this problem by addressing inflection tagging as a dedicated task, separated from that of POS tagging. More specifically, this paper describes and evaluates a rule-based approach to the tagging of Gender, Number and Degree inflection of open nominal morphosyntactic categories. This approach achieves a better F-measure than the typical approach of inflection analysis via stochastic state-of-the-art tagging.

meeting of the association for computational linguistics | 2009

LX-Center: a center of online linguistic services

António Branco; Francisco Costa; Eduardo Ferreira; Pedro Martins; Filipe Nunes; João Ricardo Silva; Sara Silveira

This is a paper supporting the demonstration of the LX-Center at ACL-IJCNLP-09. LX-Center is a web center of online linguistic services aimed both at demonstrating a range of language technology tools and at fostering education, research and development in natural language science and technology.

processing of the portuguese language | 2008