Publication


Featured research published by Toni Badia.


Machine Translation | 2008

METIS-II: low resource machine translation

Michael Carl; Maite Melero; Toni Badia; Vincent Vandeghinste; Peter Dirix; Ineke Schuurman; Stella Markantonatou; Sokratis Sofianopoulos; Marina Vassiliou; Olga Yannoutsou

METIS-II was an EU-FET MT project running from October 2004 to September 2007, which aimed at translating free text input without resorting to parallel corpora. The idea was to use “basic” linguistic tools and representations and to link them with patterns and statistics from the monolingual target-language corpus. The METIS-II project had four partners, translating from their “home” languages Greek, Dutch, German, and Spanish into English. The paper outlines the basic ideas of the project, their implementation, the resources used, and the results obtained. It also gives examples of how METIS-II has continued beyond its lifetime and the original scope of the project. On the basis of the results and experiences obtained, we believe that the approach is promising and offers the potential for development in various directions.


WAC '06 Proceedings of the 2nd International Workshop on Web as Corpus | 2006

CUCWeb: a Catalan corpus built from the web

Gemma Boleda; Stefan Bott; Rodrigo Meza; Carlos Castillo; Toni Badia; Vicente López

This paper presents CUCWeb, a 166-million-word corpus for Catalan built by crawling the Web. The corpus has been annotated with NLP tools and made available to language users through a flexible web interface. The architecture developed is general enough to be used to create corpora for other languages.


International Conference on Computational Linguistics | 2004

Acquisition of semantic classes for adjectives from distributional evidence

Gemma Boleda; Toni Badia; Eloi Batlle

In this paper, we present a clustering experiment directed at the acquisition of semantic classes for adjectives in Catalan, using only shallow distributional features. We define a broad-coverage classification for adjectives based on Ontological Semantics. We classify along two parameters (number of arguments and ontological kind of denotation), achieving reliable agreement results among human judges. The clustering procedure achieves a comparable agreement score for one of the parameters, and a slightly lower one for the other.
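The abstract reports agreement among human judges, and between the clustering and the judges, without naming the measure. As an illustration only, assuming a chance-corrected statistic such as Cohen's kappa, agreement between two annotators can be computed as below. The class names and judgements are hypothetical toy data, not taken from the paper.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement: chance that both pick the same label independently.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # degenerate case: both used the same single label throughout
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical judgements on the "ontological kind of denotation" parameter.
judge_1 = ["property", "object", "property", "event", "property", "object"]
judge_2 = ["property", "object", "event", "event", "property", "property"]
print(round(cohen_kappa(judge_1, judge_2), 3))  # prints 0.478
```

Raw agreement here is 4/6, but kappa discounts the agreement expected by chance from each judge's label frequencies, which is why it comes out lower.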


Computational Linguistics | 2012

Modeling Regular Polysemy: A Study on the Semantic Classification of Catalan Adjectives

Gemma Boleda; Sabine Schulte im Walde; Toni Badia

We present a study on the automatic acquisition of semantic classes for Catalan adjectives from distributional and morphological information, with particular emphasis on polysemous adjectives. The aim is to distinguish and characterize broad classes, such as qualitative (gran ‘big’) and relational (pulmonar ‘pulmonary’) adjectives, as well as to identify polysemous adjectives such as econòmic (‘economic ∣ cheap’). We specifically aim at modeling regular polysemy, that is, types of sense alternations that are shared across lemmata. To date, both semantic classes for adjectives and regular polysemy have only been sparsely addressed in empirical computational linguistics. Two main specific questions are tackled in this article. First, what is an adequate broad semantic classification for adjectives? We provide empirical support for the qualitative and relational classes as defined in theoretical work, and uncover one type of adjective that has not received enough attention, namely, the event-related class. Second, how is regular polysemy best modeled in computational terms? We present two models, and argue that the second one, which models regular polysemy in terms of simultaneous membership in multiple basic classes, is both theoretically and empirically more adequate than the first one, which attempts to identify independent polysemous classes. Our best classifier achieves 69.1% accuracy, against a 51% baseline.
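The contrast between the two models can be pictured as two label representations: atomic polysemous labels (first model) versus a set of basic classes per adjective (second model). The sketch below assumes a classifier that outputs one score per basic class and a hypothetical 0.5 threshold; the scores for econòmic are invented for illustration and this is not the article's classifier.

```python
BASIC_CLASSES = ("qualitative", "relational", "event-related")

def atomic_to_multilabel(atomic_label):
    """Turn an atomic polysemous label such as 'qualitative+relational'
    (first-model style) into a set of basic classes (second-model style)."""
    classes = frozenset(atomic_label.split("+"))
    unknown = classes - set(BASIC_CLASSES)
    if unknown:
        raise ValueError(f"unknown classes: {sorted(unknown)}")
    return classes

def predict_multilabel(scores, threshold=0.5):
    """Assign every basic class whose score reaches the threshold, so one
    adjective can belong to several basic classes at once."""
    chosen = {c for c in BASIC_CLASSES if scores.get(c, 0.0) >= threshold}
    if not chosen:
        # Fall back to the single best-scoring class so nothing is unlabelled.
        chosen = {max(BASIC_CLASSES, key=lambda c: scores.get(c, 0.0))}
    return frozenset(chosen)

# Hypothetical per-class scores for econòmic ('economic | cheap').
economic_scores = {"qualitative": 0.7, "relational": 0.8, "event-related": 0.1}
print(sorted(predict_multilabel(economic_scores)))  # prints ['qualitative', 'relational']
```

Under the second representation, a polysemous adjective needs no dedicated class of its own: it is simply an item with more than one basic class switched on.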


Meeting of the Association for Computational Linguistics | 2005

Morphology vs. Syntax in Adjective Class Acquisition

Gemma Boleda; Toni Badia; Sabine Schulte im Walde

This paper discusses the role of morphological and syntactic information in the automatic acquisition of semantic classes for Catalan adjectives, using decision trees as a tool for exploratory data analysis. We show that a simple mapping from the derivational type to the semantic class achieves 70.1% accuracy; syntactic function reaches a slightly higher accuracy of 73.5%. Although the accuracy scores are quite similar for the two resulting classifications, the kinds of mistakes are qualitatively very different. Morphology can be used as a baseline classification, and syntax can be used as a clue when there are mismatches between morphology and semantics.
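A "simple mapping from the derivational type to the semantic class" is a one-feature classifier. The sketch below learns such a mapping by taking the most frequent class for each derivational type; that learning rule, the type names, and the toy data are assumptions for illustration (the paper's mapping may instead be fixed by hand from theory), and the numbers are not the paper's.

```python
from collections import Counter, defaultdict

def fit_majority_mapping(feature_values, classes):
    """Map each feature value (here, a derivational type) to its most frequent class."""
    counts = defaultdict(Counter)
    for value, cls in zip(feature_values, classes):
        counts[value][cls] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}

def mapping_accuracy(mapping, feature_values, classes, default):
    """Share of items whose mapped class matches the gold class."""
    if not classes:
        raise ValueError("no items to score")
    predictions = [mapping.get(v, default) for v in feature_values]
    return sum(p == c for p, c in zip(predictions, classes)) / len(classes)

# Hypothetical toy data, not from the paper.
derivation = ["denominal"] * 3 + ["deverbal"] * 3 + ["non-derived"] * 3
gold = ["relational", "relational", "qualitative",
        "event-related", "event-related", "qualitative",
        "qualitative", "qualitative", "relational"]

mapping = fit_majority_mapping(derivation, gold)
# Scored on the same items the mapping was fitted on, so this is optimistic.
print(round(mapping_accuracy(mapping, derivation, gold, default="qualitative"), 3))  # prints 0.667
```

The errors such a mapping makes are exactly the items whose semantics do not follow their morphology, which is where the abstract suggests syntax can help.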


Language Resources and Evaluation | 2005

Automatic acquisition of syntactic verb classes with basic resources

Laia Mayol; Gemma Boleda; Toni Badia

This paper describes a methodology aimed at grouping Catalan verbs according to their syntactic behavior. Our goal is to acquire a small number of basic classes with a high level of accuracy, using minimal resources. Information on syntactic class, expensive and slow to compile by hand, is useful for any NLP task requiring specific lexical information. We show that it is possible to acquire this kind of information using only a POS-tagged corpus. We perform two clustering experiments. The first one aims at classifying verbs into transitive, intransitive and verbs alternating with a se-construction. Our system achieves an average F-score of 0.84, for a task with a 0.33 baseline. The second experiment aims at further distinguishing between pure intransitives and verbs bearing a prepositional object. The baseline for the task is 0.51 and the upper bound 0.98. The system achieves an average F-score of 0.88.
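The abstract reports an "average F-score" without specifying the averaging. Assuming a macro average (the unweighted mean of per-class F-scores), and assuming clusters have already been mapped to class labels, the computation looks like this; the gold and predicted labels are hypothetical toy data.

```python
def macro_f1(gold, predicted):
    """Unweighted mean of per-class F1 over all labels seen in gold or predicted."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("need two non-empty label lists of equal length")
    labels = sorted(set(gold) | set(predicted))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, predicted))
        fp = sum(g != label and p == label for g, p in zip(gold, predicted))
        fn = sum(g == label and p != label for g, p in zip(gold, predicted))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)

# Hypothetical verbs, already assigned to classes (cluster-to-class mapping assumed).
gold = ["transitive", "transitive", "intransitive", "intransitive",
        "se-alternating", "se-alternating"]
pred = ["transitive", "transitive", "intransitive", "se-alternating",
        "se-alternating", "se-alternating"]
print(round(macro_f1(gold, pred), 3))  # prints 0.822
```

Per class this gives 1.0 (transitive), 0.667 (intransitive) and 0.8 (se-alternating), so a single misclassified verb pulls down two classes at once.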


International Conference on Computational Science | 2014

A Hybrid Recommender Combining User, Item and Interaction Data

Jens Grivolla; Diego Campo; Miquel Sonsona; Jose-Miguel Pulido; Toni Badia

While collaborative filtering often yields very good recommendation results, in many real-world recommendation scenarios cold-start and data sparseness remain important problems. This paper presents a hybrid recommender system that integrates user demographics and item characteristics around a collaborative filtering core based on user-item interactions. The recommender system is evaluated on MovieLens data (including genre information and user data) as well as real-world data from a discount coupon provider. We show that the inclusion of additional item and user information can have a great impact on recommendation quality, especially in settings where little interaction data is available.
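The abstract does not say how the collaborative filtering core and the demographic/item information are combined. One common way to address cold start in a hybrid is to shrink toward a content-based score when a user has few interactions; the sketch below does that with a hypothetical weight alpha = n / (n + k). The formula, the constant k and the scores are assumptions, not the paper's method.

```python
def blended_score(cf_score, content_score, n_interactions, k=10.0):
    """Mix a collaborative-filtering score with a content/demographic score.

    The weight on CF is alpha = n / (n + k): a user with no interactions gets
    the content score alone, and CF takes over as interactions accumulate.
    The shrinkage form and the constant k are hypothetical choices.
    """
    if n_interactions < 0:
        raise ValueError("n_interactions must be non-negative")
    if k <= 0:
        raise ValueError("k must be positive")
    alpha = n_interactions / (n_interactions + k)
    return alpha * cf_score + (1 - alpha) * content_score

print(blended_score(4.5, 3.0, 0))   # prints 3.0 (cold start: content score only)
print(blended_score(4.5, 3.0, 10))  # prints 3.75 (equal weights when n == k)
```

A larger k trusts the interaction data later; in practice it would be tuned on held-out data.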


Conference on Applied Natural Language Processing | 1997

CATMORF: Multi two-level steps for Catalan morphology

Toni Badia; Angels Egea; Antoni Tuells

In computational morphology, the two-level paradigm is regarded as a standard. In this paper we describe CATMORF, the first wide-coverage multi two-level steps morphological analyzer for Catalan, which has been implemented in SEGMORF, a two-level morphological formalism. We discuss the benefits and drawbacks of applying the two-level paradigm to Catalan morphology and compare our results to those obtained for other Romance languages such as Spanish. We therefore put forward a slightly different two-level framework, the multi two-level steps framework, for dealing with Catalan and Spanish morphology. The paper also illustrates the acquisition of lexical entries for the analyzer's lexicon from a machine-readable dictionary (MRD), and reports what made this task less than trivial.


2008 International Conference on Automated Solutions for Cross Media Content and Multi-Channel Distribution | 2008

ESEDA: A Tool for Enhanced Speech Emotion Detection and Analysis

Julia Sidorova; Toni Badia

This demo paper presents a speech emotion recognition tool based on standard supervised machine learning methods and enhanced with an additional block of classification error analysis and fixing. The fixing part incorporates two optimisations: classification decomposition and treatment of the minority class problem. Experimental results demonstrate the validity of this enhancement. The presentation will show the capabilities of the tool described in this paper.
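The abstract does not specify how ESEDA treats the minority class problem. Random oversampling of the training data is one common option and is swapped in here purely for illustration; the feature values and emotion labels below are hypothetical.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Duplicate randomly drawn examples of the smaller classes until every
    class is as large as the largest one (plain random oversampling)."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    if not labels:
        return [], []
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for label, count in counts.items():
        pool = [s for s, y in zip(samples, labels) if y == label]
        extra = rng.choices(pool, k=target - count)
        out_samples.extend(extra)
        out_labels.extend([label] * len(extra))
    return out_samples, out_labels

# Hypothetical acoustic feature vectors (mean pitch in Hz, energy); toy values only.
features = [(180.0, 0.42), (175.0, 0.40), (190.0, 0.45), (185.0, 0.41), (260.0, 0.80)]
emotions = ["neutral", "neutral", "neutral", "neutral", "angry"]

X, y = random_oversample(features, emotions)
print(Counter(y))
```

Oversampling should be applied to the training split only; duplicating test items would inflate the evaluation.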


International Conference on Computational Linguistics | 2014

EUMSSI: a Platform for Multimodal Analysis and Recommendation using UIMA

Jens Grivolla; Maite Melero; Toni Badia; Cosmin Cabulea; Yannick Estève; Eelco Herder; Jean-Marc Odobez; Susanne Preuss; Raúl Marín

The EUMSSI project (Event Understanding through Multimodal Social Stream Interpretation) aims at developing technologies for aggregating data presented as unstructured information in sources of very different nature. The multimodal analytics will help organize, classify and cluster cross-media streams by enriching their associated metadata in an interactive manner, so that the data resulting from analysing one medium helps reinforce the aggregation of information from other media, in a cross-modal semantic representation framework. Once all the available descriptive information has been collected, an interpretation component will dynamically reason over the semantic representation in order to derive implicit knowledge. Finally, the enriched information will be fed to a hybrid recommendation system, which will be at the basis of two well-motivated use cases. In this paper we give a brief overview of EUMSSI’s main goals and how we are approaching its implementation using UIMA to integrate and combine various layers of annotations coming from different sources.

Collaboration


Top Co-Authors

Gemma Boleda
Pompeu Fabra University

Maite Melero
Pompeu Fabra University

Roser Saurí
Pompeu Fabra University

Oriol Valentín
Polytechnic University of Catalonia

Martí Quixal
University of Texas at Austin