João Graça
INESC-ID
Publications
Featured research published by João Graça.
Computational Linguistics | 2010
João Graça; Kuzman Ganchev; Ben Taskar
Word-level alignment of bilingual text is a critical resource for a growing variety of tasks. Probabilistic models for word alignment present a fundamental trade-off between richness of captured constraints and correlations versus efficiency and tractability of inference. In this article, we use the Posterior Regularization framework (Graça, Ganchev, and Taskar 2007) to incorporate complex constraints into probabilistic models during learning without changing the efficiency of the underlying model. We focus on the simple and tractable hidden Markov model, and present an efficient learning algorithm for incorporating approximate bijectivity and symmetry constraints. Models estimated with these constraints produce a significant boost in performance as measured by both precision and recall of manually annotated alignments for six language pairs. We also report experiments on two different tasks where word alignments are required: phrase-based machine translation and syntax transfer, and show promising improvements over standard methods.
Journal of Artificial Intelligence Research | 2011
João Graça; Kuzman Ganchev; Luísa Coheur; Fernando Pereira; Benjamin Taskar
We consider the problem of fully unsupervised learning of grammatical (part-of-speech) categories from unlabeled text. The standard maximum-likelihood hidden Markov model for this task performs poorly, because of its weak inductive bias and large model capacity. We address this problem by refining the model and modifying the learning objective to control its capacity via parametric and non-parametric constraints. Our approach enforces word-category association sparsity, adds morphological and orthographic features, and eliminates hard-to-estimate parameters for rare words. We develop an efficient learning algorithm that is not much more computationally intensive than standard training. We also provide an open-source implementation of the algorithm. Our experiments on five diverse languages (Bulgarian, Danish, English, Portuguese, Spanish) achieve significant improvements compared with previous methods for the same task.
The Prague Bulletin of Mathematical Linguistics | 2009
João Graça; Kuzman Ganchev; Ben Taskar
PostCAT - Posterior Constrained Alignment Toolkit. In this paper we present a new open-source toolkit for statistical word alignment, the Posterior Constrained Alignment Toolkit (PostCAT). The toolkit implements three well-known word alignment algorithms (IBM M1, IBM M2, HMM) as well as six new models. In addition to the usual Viterbi decoding scheme, the toolkit provides posterior decoding with several options for tuning the threshold. The toolkit also provides an implementation of alignment symmetrization heuristics and a set of utilities for analyzing and pretty-printing alignments. The new models have already been shown to improve intrinsic alignment metrics and to lead to better translations when integrated into a state-of-the-art machine translation system. The toolkit is developed in Java and is available in source form at its website. We encourage other researchers to build on our work by modifying the toolkit and using it for their research.
International Symposium on Circuits and Systems | 2015
Diogo M. Caetano; Moisés Piedade; João Graça; Jorge R. Fernandes; Luis S. Rosado; Tiago Costa
Non-destructive testing (NDT) based on eddy currents (EC) is commonly used to detect defects in conductive materials. Usually the system includes an emitter coil and either one receiver coil or one magnetoresistive (MR) sensor. In this work we added an interface ASIC that pre-amplifies and filters the signal from an array of MR sensors. This demo will present a new version based on the work presented at the ECNDT 2014 conference in a paper entitled “A CMOS ASIC for Precise Reading of a Magnetoresistive Sensor Array for NDT”. Since this is ongoing work, improvements have been made, namely the reduction of the system thermal noise to 30 nV/√Hz, the development of a multigain amplifier, and the application of the same concept and circuit to a multichannel parallel signal acquisition system. Detection of surface and buried defects will be demonstrated in different material mock-ups.
Journal of Machine Learning Research | 2010
Kuzman Ganchev; João Graça; Jennifer Gillenwater; Ben Taskar
Meeting of the Association for Computational Linguistics | 2008
Kuzman Ganchev; João Graça; Ben Taskar
Empirical Methods in Natural Language Processing | 2012
Shen Li; João Graça; Ben Taskar
Uncertainty in Artificial Intelligence | 2008
Kuzman Ganchev; João Graça; John Blitzer; Ben Taskar
Empirical Methods in Natural Language Processing | 2007
Mark Dredze; John Blitzer; Partha Pratim Talukdar; Kuzman Ganchev; João Graça; Fernando Pereira
Meeting of the Association for Computational Linguistics | 2010
Jennifer Gillenwater; Kuzman Ganchev; João Graça; Fernando Pereira; Ben Taskar