David Tomás | Researchain

Publication

Featured research published by David Tomás.

Journal of Web Semantics | 2011

The QALL-ME Framework: A specifiable-domain multilingual Question Answering architecture

Óscar Ferrández; Christian Spurk; Milen Kouylekov; Iustin Dornescu; Sergio Ferrández; Matteo Negri; Rubén Izquierdo; David Tomás; Constantin Orasan; Guenter Neumann; Bernardo Magnini; José L. Vicedo

This paper presents the QALL-ME Framework, a reusable architecture for building multi- and cross-lingual Question Answering (QA) systems working on structured data modelled by an ontology. It is released as free open source software with a set of demo components and extensive documentation, which makes it easy to use and adapt. The main characteristics of the QALL-ME Framework are: (i) its domain portability, achieved by an ontology modelling the target domain; (ii) the context awareness regarding space and time of the question; (iii) the use of textual entailment engines as the core of the question interpretation; and (iv) an architecture based on Service Oriented Architecture (SOA), which is realized using interchangeable web services for the framework components. Furthermore, we present a running example to clarify how the framework processes questions as well as a case study that shows a QA application built as an instantiation of the QALL-ME Framework for cinema/movie events in the tourism domain.

cross language evaluation forum | 2005

AliQAn, Spanish QA system at CLEF-2005

Sandra Roger; Sergio Ferrández; Antonio Ferrández; Jesús Peral; Fernando Llopis; Antonia Aguilar; David Tomás

Question Answering is a major research topic at the University of Alicante. For this reason, this year two groups participated in the QA@CLEF track using different approaches. In this paper we describe the work of the Alicante 2 group. This paper describes AliQAn, a monolingual open-domain Question Answering (QA) System developed in the Department of Language Processing and Information Systems at the University of Alicante for the CLEF-2005 Spanish monolingual QA evaluation task. Our approach is based fundamentally on the use of syntactic pattern recognition in order to identify possible answers. Besides this, Word Sense Disambiguation (WSD) is applied to improve the system. The results achieved (overall accuracy of 33%) are shown and discussed in the paper.

Knowledge and Information Systems | 2013

Minimally supervised question classification on fine-grained taxonomies

David Tomás; José L. Vicedo

This article presents a minimally supervised approach to question classification on fine-grained taxonomies. We have defined an algorithm that automatically obtains lists of weighted terms for each class in the taxonomy, thus identifying which terms are highly related to the classes and are highly discriminative between them. These lists have then been applied to the task of question classification. Our approach is based on the divergence of probability distributions of terms in plain text retrieved from the Web. A corpus of questions with which to train the classifier is not therefore necessary. As the system is based purely on statistical information, it does not require additional linguistic resources or tools. The experiments were performed on English questions and their Spanish translations. The results reveal that our system surpasses current supervised approaches in this task, obtaining a significant improvement in the experiments carried out.

international conference natural language processing | 2006

Automatic feature extraction for question classification based on dissimilarity of probability distributions

David Tomás; José L. Vicedo; Empar Bisbal; Lidia Moreno

Question classification is one of the first tasks carried out in a Question Answering system. In this paper we present a multilingual question classification system based on machine learning techniques. We use Support Vector Machines to classify the questions. All the features needed to train and test this method are automatically extracted through statistical information in an unsupervised way, comparing Poisson distributions of single words in two plain corpora of questions and documents. Thus, we need nothing but plain text to train the system, obtaining a flexible approach easy to adapt to new languages and domains. We have tested it on a bilingual corpus of questions in English and Spanish.

mexican international conference on artificial intelligence | 2005

A multilingual SVM-based question classification system

Empar Bisbal; David Tomás; Lidia Moreno; José L. Vicedo; Armando Suárez

Question Classification (QC) is usually the first stage in a Question Answering system. This paper presents a multilingual SVM-based question classification system aiming to be language and domain independent. For this purpose, we use only surface text features. The system has been tested on the TREC QA track question set, obtaining encouraging results.

International Journal of Virtual Communities and Social Networking | 2015

Dynamic Social and Media Content Syndication for Second Screen

Andreas Menychtas; David Tomás; Marco Tiemann; Christina Santzaridou; Alexandros Psychas; Dimosthenis Kyriazis; Juan Vicente Vidagany Espert; Stuart Campbell

Today's generation of Internet devices has changed how users interact with media, from passive and unidirectional to proactive and interactive. Users can use these devices to comment on or rate a TV show and search for related information regarding characters, facts or personalities. This phenomenon is known as second screen. This paper describes SAM, an EU-funded research project that focuses on developing an advanced digital media delivery platform based on second screen interaction and content syndication within a social media context, providing open and standardised ways of characterising, discovering and syndicating digital assets. This work provides an overview of the project and its main objectives, focusing on the NLP challenges to be faced and the technologies developed so far.

international conference on computational linguistics | 2009

A Parallel Corpus Labeled Using Open and Restricted Domain Ontologies

Ester Boldrini; Sergio Ferrández; Rubén Izquierdo; David Tomás; José L. Vicedo

The analysis and creation of annotated corpora is fundamental for implementing natural language processing solutions based on machine learning. In this paper we present a parallel corpus of 4500 questions in Spanish and English on the touristic domain, obtained from real users. With the aim of training a question answering system, the questions were labeled with the expected answer type, according to two different ontologies. The first one is an open domain ontology based on Sekine's Extended Named Entity Hierarchy, while the second one is a restricted domain ontology, specific for the touristic field. Due to the use of two ontologies with different characteristics, we had to solve many problematic cases and adjust our annotation to the characteristics of each one. We present the analysis of the domain coverage of these ontologies and the results of the inter-annotator agreement. Finally we use a question classification system to evaluate the labeling of the corpus.

conference on human system interactions | 2009

A proposal of Expected Answer Type and Named Entity annotation in a Question Answering context

Ester Boldrini; Sergio Ferrández; Rubén Izquierdo; David Tomás; Óscar Ferrández; José L. Vicedo

This paper presents our research related to automatic Expected Answer Type and Named Entity annotation tasks in a Question Answering context. We present the initial step of our research, in which we created the annotation guidelines. We therefore show and justify the tag set employed in the annotation of a collection of questions, and finally, different evaluations in order to test the consistency of the labelled corpus are also presented.

text speech and dialogue | 2007

Multiple-taxonomy question classification for category search on faceted information

David Tomás; Jose-Luis Vicedo

In this paper we present a novel multiple-taxonomy question classification system, facing the challenge of assigning categories in multiple taxonomies to natural language questions. We applied our system to category search on faceted information. The system provides a natural language interface to faceted information, detecting the categories requested by the user and narrowing down the document search space to those documents pertaining to the facet values identified. The system was developed in the framework of language modeling, and the models to detect categories are inferred directly from the corpus of documents.

acm international conference on interactive experiences for tv and online video | 2015

SAM: Dynamic and Social Content Delivery for Second Screen Interaction

Atta Badii; Marco Tiemann; Andreas Menychtas; Christina Santzaridou; Alexandros Psychas; David Tomás; Stuart Campbell; Juan Vicente Vidagany Espert

Social media services offer a wide range of opportunities for businesses and developers to exploit the vast amount of information and user-generated content produced via social media. In addition, the notion of TV second screen usage -- the interleaved usage of TV and smart devices such as smartphones -- appears ever more prominent, with viewers continuously seeking further information and deeper engagement while watching movies, TV shows or event coverage. In this work-in-progress contribution, we present SAM, an innovative platform that combines social media and content syndication and targets second screen usage to enhance media content provisioning and advance the user experience. SAM incorporates modern technologies and novel features in the areas of content management, dynamic social media, social mining, semantic annotation and multi-device representation to facilitate an advanced business environment for broadcasters, content and metadata providers and editors to better exploit their assets and increase revenues.
