Publication


Featured research published by María Jesús Aranzabe.


International Conference on Computational Linguistics | 2010

EusPropBank: Integrating Semantic Information in the Basque Dependency Treebank

Izaskun Aldezabal; María Jesús Aranzabe; Arantza Díaz de Ilarraza; Ainara Estarrona; Larraitz Uria

This paper deals with theoretical problems found in the work that is being carried out for annotating semantic roles in the Basque Dependency Treebank (BDT). We will present the resources used and the way the annotation is being done. Following the model proposed in the PropBank project, we will show the problems found in the annotation process and decisions we have taken. The representation of the semantic tag has been established and detailed guidelines for the annotation process have been defined, although it is a task that needs continuous updating. Besides, we have adapted AbarHitz, a tool used in the construction of the BDT, to this task.


International Conference on Computational Linguistics | 2009

Evaluation of the Syntactic Annotation in EPEC, the Reference Corpus for the Processing of Basque

Larraitz Uria; Ainara Estarrona; Izaskun Aldezabal; María Jesús Aranzabe; Arantza Díaz de Ilarraza; Mikel Iruskieta

The aim of this work is to evaluate the dependency-based annotation of EPEC (the Reference Corpus for the Processing of Basque) by means of an experiment: two annotators have syntactically tagged a sample of the mentioned corpus in order to evaluate the agreement-rate between them and to identify those issues that have to be improved in the syntactic annotation process. In this article we present the quantitative and qualitative results of this evaluation.
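As an illustration of the kind of agreement measure such an experiment might report, here is a minimal Python sketch that computes the raw agreement rate and Cohen's kappa for two annotators. The abstract does not say which statistic the authors used, so the choice of kappa is an assumption, and the dependency labels in the example are hypothetical placeholders rather than the actual EPEC tagset.

```python
from collections import Counter


def observed_agreement(labels_a, labels_b):
    """Fraction of items on which both annotators chose the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty sample")
    matches = sum(x == y for x, y in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement corrected for chance agreement."""
    n = len(labels_a)
    p_o = observed_agreement(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability that both annotators pick the same
    # label if each draws independently from their own label distribution.
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical dependency labels assigned by two annotators to four tokens.
annotator_1 = ["ncsubj", "ncobj", "ncmod", "ncsubj"]
annotator_2 = ["ncsubj", "ncmod", "ncmod", "ncsubj"]

rate = observed_agreement(annotator_1, annotator_2)
kappa = cohen_kappa(annotator_1, annotator_2)
```

Raw agreement is easy to interpret, while kappa discounts matches that would be expected by chance when a few labels dominate the sample.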


Corpus Linguistics and Linguistic Theory | 2009

Syntactic annotation in the Reference Corpus for the Processing of Basque (EPEC): Theoretical and practical issues

Izaskun Aldezabal; María Jesús Aranzabe; Jose Mari Arriola; Arantza Díaz de Ilarraza

In this paper, we will describe some theoretical and practical issues raised during the construction of the Basque Dependency Treebank (BDT): the syntactic annotation of EPEC (Reference Corpus for the Processing of Basque). EPEC is a 300,000 word corpus of standard written Basque whose purpose is to be a training corpus for the development and improvement of several NLP (Natural Language Processing) tools for Basque. BDT will be the first corpus for the Basque language tagged at syntactic level. We will also present the dependency-based annotation hierarchy that we have established for the syntactic tagging. Decisions made during design of the annotation hierarchy are based on the description of Basque grammar made by Euskaltzaindia (Academy for the Basque Language). When describing dependency relations, we consider lexical units as syntactic heads. This will open up a way for us to work with semantics.


Digital Scholarship in the Humanities | 2016

A Methodology for the Semiautomatic Annotation of EPEC-RolSem, a Basque Corpus Labeled at Predicate Level following the PropBank-VerbNet Model

Ainara Estarrona; Izaskun Aldezabal; Arantza Díaz de Ilarraza; María Jesús Aranzabe

In this article we describe the methodology developed for the semiautomatic annotation of EPEC-RolSem, a Basque corpus labeled at predicate level that follows the PropBank-VerbNet model. The methodology presented is the product of detailed theoretical study of the semantic nature of verbs in Basque and of their similarities and differences with verbs in other languages. As part of the proposed methodology, we are creating a Basque lexicon on the PropBank-VerbNet model that we have named the Basque Verb Index (BVI). Our work thus dovetails with the general trend toward building lexicons from tagged corpora that is clear in work conducted for other languages. EPEC-RolSem and BVI are two important resources for the computational semantic processing of Basque; as far as the authors are aware, they are also the first resources of their kind developed for Basque. In addition, each entry in BVI is linked to the corresponding verb-entry in well-known resources like PropBank, VerbNet, WordNet, FrameNet, and Levin’s classification. We have also implemented several automatic processes to aid in creating and annotating the BVI, including processes designed to facilitate the task of manual annotation.


International Conference on Computational Linguistics | 2013

Detecting Apposition for Text Simplification in Basque

Itziar Gonzalez-Dios; María Jesús Aranzabe; Arantza Díaz de Ilarraza; Ander Soraluze

In this paper we present a study of apposition in Basque and a tool that automatically identifies these structures. Detecting and encoding these structures is necessary for advanced NLP applications. In our case, we plan to use the Apposition Detector in our Automatic Text Simplification system. The detector applies a grammar written in the Constraint Grammar formalism. The grammar is based on, among other things, morphological features and linguistic information obtained from a named entity recogniser. We present the evaluation of that grammar and, based on an error analysis, propose a method to improve the results. We also combine our results with those obtained by a Mention Detection System to improve performance.


Language Resources and Evaluation | 2018

The corpus of Basque simplified texts (CBST)

Itziar Gonzalez-Dios; María Jesús Aranzabe; Arantza Díaz de Ilarraza

In this paper we present the corpus of Basque simplified texts. This corpus compiles 227 original sentences from the science popularisation domain and two simplified versions of each sentence. The simplified versions have been created following different approaches: the structural, by a court translator who considers easy-to-read guidelines, and the intuitive, by a teacher based on her experience. The aim of this corpus is to make a comparative analysis of simplified text. To that end, we also present the annotation scheme we have created to annotate the corpus. The annotation scheme is divided into eight macro-operations: delete, merge, split, transformation, insert, reordering, no operation and other. These macro-operations can be classified into different operations. We also relate our work and results to other languages. This corpus will be used to corroborate the decisions taken and to improve the design of the automatic text simplification system for Basque.
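The eight macro-operations named in the abstract could be represented and counted as in the Python sketch below. Only the eight operation names come from the abstract; the `MacroOperation` enum, the `tally` helper, and the sample annotation lists are hypothetical and do not reflect the corpus's actual file format or contents.

```python
from collections import Counter
from enum import Enum


class MacroOperation(Enum):
    """The eight macro-operations listed in the CBST annotation scheme."""
    DELETE = "delete"
    MERGE = "merge"
    SPLIT = "split"
    TRANSFORMATION = "transformation"
    INSERT = "insert"
    REORDERING = "reordering"
    NO_OPERATION = "no operation"
    OTHER = "other"


def tally(labels):
    """Count macro-operations; raises ValueError on a label outside the scheme."""
    return Counter(MacroOperation(label) for label in labels)


# Hypothetical annotations for the two simplified versions of some sentences.
structural = ["split", "delete", "split", "reordering"]
intuitive = ["transformation", "delete", "no operation"]

structural_counts = tally(structural)
intuitive_counts = tally(intuitive)
```

Comparing the two counters side by side is one simple way to contrast the structural and intuitive approaches the abstract describes.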


Conference on Intelligent Text Processing and Computational Linguistics | 2016

Adapting TimeML to Basque: Event Annotation

Begoña Altuna; María Jesús Aranzabe; Arantza Díaz de Ilarraza

In this paper we present an event annotation effort following EusTimeML, a temporal mark-up language for Basque based on TimeML. For this, we first describe events and their main ontological and grammatical features. We base our analysis on Basque grammars and TimeML mark-up language classification of events. Annotation guidelines have been created to address the event information annotation for Basque and an annotation experiment has been conducted. A first round has served to evaluate the preliminary guidelines and decisions on event annotation have been taken according to annotations and inter-annotator agreement results. Then a guideline tuning period has followed. In the second round, we have created a manually-annotated gold standard corpus for event annotation in Basque. Event analysis and annotation experiment are part of a complete temporal information analysis and corpus creation work.


Proceedings of the Workshop on Automatic Text Simplification - Methods and Applications in the Multilingual Society (ATS-MA 2014) | 2014

Making Biographical Data in Wikipedia Readable: A Pattern-based Multilingual Approach

Itziar Gonzalez-Dios; María Jesús Aranzabe; Arantza Díaz de Ilarraza

In this paper we present Biografix, a pattern-based tool that simplifies parenthetical structures with biographical information, whose aim is to create simple, readable and accessible sentences. To that end, we analysed the parenthetical structures that appear in the first paragraph of the Basque Wikipedia, and concentrated on biographies. Although it has been designed and developed for Basque, we adapted it to and evaluated it on five other languages. We also perform an extrinsic evaluation with a question generation system to see if Biografix improves its results.


Procesamiento Del Lenguaje Natural | 2013

Transforming Complex Sentences using Dependency Trees for Automatic Text Simplification in Basque

María Jesús Aranzabe; Arantza Díaz de Ilarraza; Itziar Gonzalez-Dios


Language Resources and Evaluation | 2010

Building the Basque PropBank.

Izaskun Aldezabal; María Jesús Aranzabe; Arantza Díaz de Ilarraza Sánchez; Ainara Estarrona

Collaboration


Dive into María Jesús Aranzabe's collaborations.

Top Co-Authors

Arantza Díaz de Ilarraza

University of the Basque Country

Itziar Gonzalez-Dios

University of the Basque Country

Izaskun Aldezabal

University of the Basque Country

Ainara Estarrona

University of the Basque Country

Begoña Altuna

University of the Basque Country

Jose Maria Arriola

University of the Basque Country

Mikel Iruskieta

University of the Basque Country

Ander Soraluze

University of the Basque Country

Itziar Aduriz

University of the Basque Country
