António Branco | Researchain

Publication

Featured research published by António Branco.

Conference of the European Chapter of the Association for Computational Linguistics | 2006

A suite of shallow processing tools for Portuguese: LX-suite

António Branco; João Ricardo Silva

In this paper we present LX-Suite, a set of tools for the shallow processing of Portuguese. This suite comprises several modules, namely: a sentence chunker, a tokenizer, a POS tagger, featurizers and lemmatizers.

Computational Linguistics archive | 2002

Binding machines

António Branco

Binding constraints form one of the most robust modules of grammatical knowledge. Despite their crosslinguistic generality and practical relevance for anaphor resolution, they have resisted full integration into grammar processing. The ultimate reason for this is to be found in the original exhaustive coindexation rationale for their specification and verification. As an alternative, we propose an approach which, while permitting a unification-based specification of binding constraints, allows for a verification methodology that helps to overcome previous drawbacks. This alternative approach is based on the rationale that anaphoric nominals can be viewed as binding machines.

Processing of the Portuguese Language | 2010

Out-of-the-box robust parsing of Portuguese

João Ricardo Silva; António Branco; Sérgio Castro; Ruben Reis

In this paper we assess to what extent the available Portuguese treebanks and available probabilistic parsers are suitable for out-of-the-box robust parsing of Portuguese. We also announce the release of the best parser coming out of this exercise, which is, to the best of our knowledge, the first robust parser widely available for Portuguese.

Processing of the Portuguese Language | 2010

LXGram: a deep linguistic processing grammar for Portuguese

Francisco Costa; António Branco

In this paper we present LXGram, a general purpose grammar for the deep linguistic processing of Portuguese that delivers high precision grammatical analysis and detailed meaning representations. We present the main design features and evaluation results on the grammar’s coverage as well as its ability to produce correct grammatical analyses.

Archive | 2009

Anaphora Processing and Applications

Iris Hendrickx; Sobha Lalitha Devi; António Branco; Ruslan Mitkov

Resolution Methodology: Why Would a Robot Make Use of Pronouns? An Evolutionary Investigation of the Emergence of Pronominal Anaphora; Automatic Recognition of the Function of Singular Neuter Pronouns in Texts and Spoken Data; A Deeper Look into Features for Coreference Resolution.
Computational Applications: Coreference Resolution on Blogs and Commented News; Identification of Similar Documents Using Coherent Chunks.
Language Analysis: Binding without Identity: Towards a Unified Semantics for Bound and Exempt Anaphors; The Doubly Marked Reflexive in Chinese.
Human Processing: Definiteness Marking Shows Late Effects during Discourse Processing: Evidence from ERPs; Pronoun Resolution to Commanders and Recessors: A View from Event-Related Brain Potentials; Effects of Anaphoric Dependencies and Semantic Representations on Pronoun Interpretation.

Applications of Natural Language to Data Bases | 2012

Extracting multi-document summaries with a double clustering approach

Sara Silveira; António Branco

This paper presents a method for extractive multi-document summarization based on a two-phase clustering approach. First, sentences are clustered by similarity and one sentence per cluster is selected, to reduce redundancy. Then, to group the selected sentences by topic, they are clustered again using the collection of keywords. The summarization process also includes a sentence simplification step, which aims not only to produce simpler and more incisive sentences but also to free up room so that as much relevant content as possible fits in the summary.
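The two clustering phases described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: bag-of-words cosine similarity, greedy single-pass clustering, the thresholds, picking the shortest sentence as each cluster's representative, and the hand-supplied keyword set are all assumptions made for illustration, and the sentence simplification step is omitted.

```python
import math
from collections import Counter


def bow(sentence):
    """Bag-of-words vector: lowercased tokens with surrounding punctuation stripped."""
    words = (w.strip(".,;:!?").lower() for w in sentence.split())
    return Counter(w for w in words if w)


def cosine(a, b):
    """Cosine similarity between two sparse Counter vectors."""
    dot = sum(count * b[w] for w, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def greedy_cluster(items, vectorize, threshold):
    """Single-pass clustering: add each item to the first cluster whose seed
    (first member) is at least `threshold` similar, else open a new cluster."""
    clusters = []
    for item in items:
        vec = vectorize(item)
        for cluster in clusters:
            if cosine(vec, vectorize(cluster[0])) >= threshold:
                cluster.append(item)
                break
        else:
            clusters.append([item])
    return clusters


def summarize(sentences, keywords, sim_threshold=0.5, topic_threshold=0.3):
    # Phase 1: cluster by overall similarity and keep one sentence per cluster
    # (here, hypothetically, the shortest) to reduce redundancy.
    redundancy_clusters = greedy_cluster(sentences, bow, sim_threshold)
    selected = [min(cluster, key=len) for cluster in redundancy_clusters]

    # Phase 2: represent each kept sentence by its keywords only and cluster
    # again, so that the resulting groups approximate topics.
    def keyword_vector(sentence):
        return Counter({w: c for w, c in bow(sentence).items() if w in keywords})

    return greedy_cluster(selected, keyword_vector, topic_threshold)


sentences = [
    "The storm hit the coast on Monday.",
    "On Monday the storm hit the coast.",
    "Rescue teams reached the coast.",
    "Markets fell sharply on Monday.",
]
keywords = {"storm", "coast", "rescue", "markets"}
for topic in summarize(sentences, keywords):
    print(topic)
```

With these toy inputs, the two reworded storm sentences collapse into one in the first phase; the second phase then groups the storm and rescue sentences into one topic, while the market sentence forms its own.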

Natural Language Engineering | 2014

Coping with highly imbalanced datasets: a case study with definition extraction in a multilingual setting

Rosa Del Gaudio; Gustavo E. A. P. A. Batista; António Branco

This paper addresses the automatic extraction of definitions through a thorough exploration of an approach that relies solely on machine learning techniques, with a focus on the imbalance of the relevant datasets. By extensively and systematically experimenting with different sampling techniques and their combinations, as well as with a range of classifier types, we obtained a breakthrough in the automatic extraction of definitions. Performance consistently reached an area under the receiver operating characteristic curve (AUC) of 0.95 to 0.99, a notable improvement of 17 to 22 percentage points over the baseline of 0.73–0.77, for datasets with different rates of imbalance. The paper thus also contributes to the pioneering work in natural language processing that points to the importance of applying sampling techniques to mitigate the bias induced by highly imbalanced datasets, thereby greatly improving the performance of the wide range of tools that rely on such datasets.
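As a rough illustration of the sampling techniques and the evaluation metric mentioned above, the Python sketch below implements random oversampling and random undersampling, the two simplest ways to rebalance a dataset, together with AUC computed as the probability that a randomly chosen positive example is scored above a randomly chosen negative one. The abstract does not say which sampling techniques or classifiers were combined, so these functions and the toy data are assumptions, not the paper's setup.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen examples of each smaller class until every
    class has as many examples as the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of every class, sized like the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        pool = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(pool, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly;
    ties count as half a correct ranking."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    correct = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return correct / (len(pos) * len(neg))


# Toy data: 2 definitions (label 1) among 10 sentences; features are placeholders.
X = list(range(10))
y = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(Counter(random_oversample(X, y)[1]))
print(Counter(random_undersample(X, y)[1]))
print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

On the toy data the oversampled set has eight examples of each class, the undersampled set two of each, and the four toy scores give an AUC of 0.75 (three of the four positive/negative pairs are ranked correctly). Oversampling keeps every original example but repeats minority ones, while undersampling discards majority examples.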

Processing of the Portuguese Language | 2003

Contractions: breaking the tokenization-tagging circularity

António Branco; João Ricardo Silva

Ambiguous strings are strings of non-whitespace characters, typically coinciding with orthographic contractions of word forms, that, depending on the specific occurrence, must be considered as consisting of either one token or more than one token. Strings of this sort are shown to raise the problem of an undesired circularity between tokenization and tagging: deciding how to split such a string requires knowing its tag, while tagging presupposes that the text has already been split into tokens. This paper presents a strategy to resolve ambiguous strings and dissolve this circularity.
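The circularity can be made concrete with a small Python sketch of one way to break it, not necessarily the strategy the paper presents: the tokenizer expands unambiguous contractions such as do (de + o) right away, but leaves strings such as deste (the contraction de + este, or a form of the verb dar) whole, annotated with their possible expansion, so that a later step with access to context can decide. The mini-lexicon and the predicate passed to `resolve`, which stands in for a POS tagger, are hypothetical.

```python
# Hypothetical mini-lexicon: contracted string -> its component tokens.
CONTRACTIONS = {
    "do": ["de", "o"],
    "da": ["de", "a"],
    "na": ["em", "a"],
    "pelo": ["por", "o"],      # also a noun ("hair")
    "deste": ["de", "este"],   # also a verb form of "dar" ("you gave")
    "consigo": ["com", "si"],  # also a verb form of "conseguir" ("I manage")
}
# Strings whose tokenization depends on the occurrence.
AMBIGUOUS = {"pelo", "deste", "consigo"}


def tokenize(text):
    """Return (token, possible_expansion) pairs; expansion is None when settled."""
    tokens = []
    for word in text.split():
        key = word.lower()
        if key in AMBIGUOUS:
            # Defer the decision: keep the string whole, remember the alternative.
            tokens.append((word, CONTRACTIONS[key]))
        elif key in CONTRACTIONS:
            tokens.extend((part, None) for part in CONTRACTIONS[key])
        else:
            tokens.append((word, None))
    return tokens


def resolve(tokens, is_contraction):
    """Expand each deferred string only if `is_contraction` (a stand-in for a
    tagger that sees the surrounding tokens) says this occurrence is one."""
    out = []
    for i, (word, expansion) in enumerate(tokens):
        if expansion is not None and is_contraction(i, tokens):
            out.extend(expansion)
        else:
            out.append(word)
    return out


# Toy heuristic, for illustration only: treat the string as a contraction
# when another word follows it.
def followed_by_word(i, tokens):
    return i + 1 < len(tokens)


print(resolve(tokenize("Gosto deste livro"), followed_by_word))
print(resolve(tokenize("Tu deste"), followed_by_word))
```

Here "Gosto deste livro" ("I like this book") comes out as Gosto, de, este, livro, while in "Tu deste" ("you gave") the string stays whole. A real system would replace the toy heuristic with the tagger's decision.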

Processing of the Portuguese Language | 2016

LX-DSemVectors: Distributional Semantics Models for Portuguese

João António Rodrigues; António Branco; Steven Neale; João Ricardo Silva

In this article we describe the creation and distribution of the first publicly available word embeddings for Portuguese. Our embeddings are evaluated on their own and also compared with the original English models on a well-known analogy task. We gathered a large Portuguese corpus of 1.7 billion tokens, developed the first distributional semantic analogies test set for Portuguese, and proceeded with the first parametrization and evaluation of Portuguese word embeddings models.
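Analogy test sets of the kind mentioned above pose questions of the form "a is to b as c is to ?". A common way to answer them is the vector-offset method, often called 3CosAdd: pick the vocabulary word whose vector is most cosine-similar to b − a + c, excluding the three query words. The Python sketch below implements that method over a toy three-dimensional embedding table; the vectors are made up for illustration, and the abstract does not state which answering method the evaluation used.

```python
import math


def unit(vector):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def analogy(embeddings, a, b, c):
    """Answer 'a is to b as c is to ?' with the vector-offset (3CosAdd) method."""
    normed = {word: unit(vec) for word, vec in embeddings.items()}
    target = [xb - xa + xc for xa, xb, xc in zip(normed[a], normed[b], normed[c])]
    best_word, best_score = None, -math.inf
    for word, vec in normed.items():
        if word in (a, b, c):
            continue  # the query words themselves are never valid answers
        # Dot product with unit vectors ranks candidates exactly as cosine
        # similarity to `target` would, since the target's norm is fixed.
        score = sum(t * v for t, v in zip(target, vec))
        if score > best_score:
            best_word, best_score = word, score
    return best_word


# Made-up toy vectors; real embeddings have hundreds of dimensions.
toy = {
    "homem": [0.0, 1.0, 0.0],   # man
    "mulher": [0.0, 0.0, 1.0],  # woman
    "rei": [1.0, 1.0, 0.0],     # king
    "rainha": [1.0, 0.0, 1.0],  # queen
    "casa": [0.5, 0.5, 0.5],    # house
}
print(analogy(toy, "homem", "rei", "mulher"))
```

With these toy vectors, "homem is to rei as mulher is to ?" is answered with rainha.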

Processing of the Portuguese Language | 2012

Extracting temporal information from portuguese texts

Francisco Costa; António Branco

This paper reports on experiments with the extraction of temporal information from Portuguese texts and presents LX-TimeAnalyzer, a tool that annotates a text with the temporal information it conveys. This tool is the first of its kind reported for Portuguese, and its performance is similar to the state of the art for other languages.
