António Branco
University of Lisbon
Publication
Featured research published by António Branco.
Conference of the European Chapter of the Association for Computational Linguistics | 2006
António Branco; João Ricardo Silva
In this paper we present LX-Suite, a set of tools for the shallow processing of Portuguese. This suite comprises several modules, namely: a sentence chunker, a tokenizer, a POS tagger, featurizers and lemmatizers.
Computational Linguistics archive | 2002
António Branco
Binding constraints form one of the most robust modules of grammatical knowledge. Despite their crosslinguistic generality and practical relevance for anaphor resolution, they have resisted full integration into grammar processing. The ultimate reason for this is to be found in the original exhaustive coindexation rationale for their specification and verification. As an alternative, we propose an approach which, while permitting a unification-based specification of binding constraints, allows for a verification methodology that helps to overcome previous drawbacks. This alternative approach is based on the rationale that anaphoric nominals can be viewed as binding machines.
Processing of the Portuguese Language | 2010
João Ricardo Silva; António Branco; Sérgio Castro; Ruben Reis
In this paper we assess to what extent the available Portuguese treebanks and available probabilistic parsers are suitable for out-of-the-box robust parsing of Portuguese. We also announce the release of the best parser coming out of this exercise, which is, to the best of our knowledge, the first robust parser widely available for Portuguese.
Processing of the Portuguese Language | 2010
Francisco Costa; António Branco
In this paper we present LXGram, a general purpose grammar for the deep linguistic processing of Portuguese that delivers high precision grammatical analysis and detailed meaning representations. We present the main design features and evaluation results on the grammar’s coverage as well as its ability to produce correct grammatical analyses.
Archive | 2009
Iris Hendrickx; Sobha Lalitha Devi; António Branco; Ruslan Mitkov
Resolution Methodology: Why Would a Robot Make Use of Pronouns? An Evolutionary Investigation of the Emergence of Pronominal Anaphora; Automatic Recognition of the Function of Singular Neuter Pronouns in Texts and Spoken Data; A Deeper Look into Features for Coreference Resolution.
Computational Applications: Coreference Resolution on Blogs and Commented News; Identification of Similar Documents Using Coherent Chunks.
Language Analysis: Binding without Identity: Towards a Unified Semantics for Bound and Exempt Anaphors; The Doubly Marked Reflexive in Chinese.
Human Processing: Definiteness Marking Shows Late Effects during Discourse Processing: Evidence from ERPs; Pronoun Resolution to Commanders and Recessors: A View from Event-Related Brain Potentials; Effects of Anaphoric Dependencies and Semantic Representations on Pronoun Interpretation.
Applications of Natural Language to Data Bases | 2012
Sara Silveira; António Branco
This paper presents a method for extractive multi-document summarization that explores a two-phase clustering approach. First, sentences are clustered by similarity, and one sentence per cluster is selected, to reduce redundancy. Then, in order to group them according to topics, those sentences are clustered considering the collection of keywords. Additionally, the summarization process includes a sentence simplification step, which aims not only to create simpler and more incisive sentences, but also to make room for as much relevant content as possible in the summary.
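The first clustering phase described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Jaccard word-overlap measure, the greedy single-pass clustering, the 0.5 threshold, and the choice of the shortest sentence as the cluster representative are all assumptions made for the example.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of words."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_by_similarity(sentences, threshold=0.5):
    """Greedy single-pass clustering (assumed technique): a sentence joins the
    first cluster whose seed sentence is similar enough, else starts a new one."""
    clusters = []  # each cluster is a list of sentences; cluster[0] is its seed
    for sentence in sentences:
        words = set(sentence.lower().split())
        for cluster in clusters:
            seed_words = set(cluster[0].lower().split())
            if jaccard(words, seed_words) >= threshold:
                cluster.append(sentence)
                break
        else:
            # No existing cluster was similar enough.
            clusters.append([sentence])
    return clusters


def pick_representatives(clusters):
    """Keep one sentence per cluster (here: the shortest) to reduce redundancy."""
    return [min(cluster, key=len) for cluster in clusters]


sentences = [
    "the storm hit the coast on monday",
    "on monday the storm hit the coast",
    "schools will reopen next week",
]
clusters = cluster_by_similarity(sentences)
representatives = pick_representatives(clusters)
print(representatives)
```

The two reorderings of the storm sentence share all their words and fall into one cluster, so only one of them survives into the second, topic-oriented phase.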
Natural Language Engineering | 2014
Rosa Del Gaudio; Gustavo E. A. P. A. Batista; António Branco
This paper addresses the task of automatic extraction of definitions by thoroughly exploring an approach that relies solely on machine learning techniques, and by focusing on the issue of the imbalance of relevant datasets. We obtained a breakthrough in the automatic extraction of definitions by extensively and systematically experimenting with different sampling techniques and their combination, as well as a range of different types of classifiers. Performance consistently scored in the range of 0.95–0.99 of area under the receiver operating characteristic curve, a notable improvement of between 17 and 22 percentage points over the baseline of 0.73–0.77, for datasets with different rates of imbalance. Thus, the present paper also contributes to the seminal work in natural language processing that points toward the importance of applying sampling techniques to mitigate the bias induced by highly imbalanced datasets, thereby greatly improving the performance of a large range of tools that rely on them.
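To make the two ingredients of this abstract concrete, the sketch below shows one sampling technique and the evaluation metric. The abstract does not name the sampling techniques used, so random oversampling is an assumption chosen for illustration; the function names and toy data are hypothetical as well.

```python
import random


def random_oversample(examples, labels, seed=0):
    """Duplicate randomly chosen minority-class examples until both classes
    are the same size. Labels must be 0 or 1. (Assumed technique.)"""
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for example, label in zip(examples, labels):
        by_class[label].append(example)
    minority = min(by_class, key=lambda c: len(by_class[c]))
    majority = 1 - minority
    if not by_class[minority]:
        raise ValueError("cannot oversample an empty class")
    deficit = len(by_class[majority]) - len(by_class[minority])
    extra = [rng.choice(by_class[minority]) for _ in range(deficit)]
    balanced = [(x, majority) for x in by_class[majority]]
    balanced += [(x, minority) for x in by_class[minority] + extra]
    rng.shuffle(balanced)
    return balanced


def roc_auc(scores, labels):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count one half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Toy data: one definition (label 1) among five non-definitions (label 0).
balanced = random_oversample(["s1", "s2", "s3", "s4", "s5", "s6"],
                             [1, 0, 0, 0, 0, 0])
auc = roc_auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0])
print(len(balanced), auc)
```

An area of 0.5 corresponds to random ranking and 1.0 to a perfect one, which is the scale on which the reported 0.73–0.77 baseline and 0.95–0.99 results are measured.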
Processing of the Portuguese Language | 2003
António Branco; João Ricardo Silva
Ambiguous strings are strings of non-whitespace characters, typically coinciding with orthographic contractions of word forms, that, depending on the specific occurrence, are to be considered as consisting of one or more than one token. Strings of this sort are shown to raise the problem of undesired circularity between tokenization and tagging. This paper presents a strategy to resolve ambiguous strings and dissolve such circularity.
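The circularity can be illustrated with the Portuguese string "deste", which is either the contraction of "de" + "este" (two tokens) or a form of the verb "dar" (one token). The sketch below is not the strategy proposed in the paper, which the abstract does not detail; it only shows that the tokenizer needs a decision that would normally come from a later tagging step. The lookup table, the `is_contraction` callback, and the toy heuristic are all hypothetical.

```python
# Hypothetical lookup table: strings that may be a contraction or a single word.
AMBIGUOUS = {
    "deste": ["de", "este"],  # "of this" vs. a form of the verb "dar"
}


def tokenize(text, is_contraction):
    """Split on whitespace, expanding an ambiguous string only when the
    supplied decision function says this occurrence is a contraction."""
    words = text.split()
    tokens = []
    for position, string in enumerate(words):
        parts = AMBIGUOUS.get(string.lower())
        if parts and is_contraction(words, position):
            tokens.extend(parts)
        else:
            tokens.append(string)
    return tokens


def toy_decision(words, position):
    """Toy stand-in for a tagger: treat the string as a verb form when it
    directly follows the subject pronoun "tu", and as a contraction otherwise."""
    return position == 0 or words[position - 1].lower() != "tu"


as_contraction = tokenize("o autor deste livro", toy_decision)
as_verb = tokenize("tu deste o livro", toy_decision)
print(as_contraction)
print(as_verb)
```

The decision function is where the tagging knowledge enters, which is exactly the dependency that makes a strict tokenize-then-tag pipeline circular for such strings.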
Processing of the Portuguese Language | 2016
João António Rodrigues; António Branco; Steven Neale; João Ricardo Silva
In this article we describe the creation and distribution of the first publicly available word embeddings for Portuguese. Our embeddings are evaluated on their own and also compared with the original English models on a well-known analogy task. We gathered a large Portuguese corpus of 1.7 billion tokens, developed the first distributional semantic analogies test set for Portuguese, and proceeded with the first parametrization and evaluation of Portuguese word embedding models.
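Analogy test sets for word embeddings are commonly scored with the vector-offset method, and the sketch below assumes that setup; the abstract does not spell out the scoring procedure. The two-dimensional vectors are made up for the example and are not taken from the released embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def solve_analogy(embeddings, a, b, c):
    """Answer "a is to b as c is to ?" by finding the word whose vector is
    closest to b - a + c, excluding the three query words."""
    target = [vb - va + vc
              for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])]
    candidates = (w for w in embeddings if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(embeddings[w], target))


# Made-up toy vectors: first dimension ~ royalty, second ~ gender.
toy_embeddings = {
    "homem": [0.0, 1.0],
    "mulher": [0.0, -1.0],
    "rei": [1.0, 1.0],
    "rainha": [1.0, -1.0],
    "mesa": [-1.0, 0.0],
}
answer = solve_analogy(toy_embeddings, "homem", "rei", "mulher")
print(answer)
```

Accuracy on an analogy test set is then the share of questions for which the returned word matches the expected one.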
Processing of the Portuguese Language | 2012
Francisco Costa; António Branco
This paper reports on experiments with the extraction of temporal information from Portuguese texts and presents LX-TimeAnalyzer, a tool that annotates a text with the temporal information it conveys. This tool is the first of its kind to be reported for Portuguese, and its performance is similar to the state of the art for other languages.