Nerea Ezeiza

University of the Basque Country

Publication

Featured research published by Nerea Ezeiza.

MWE '04 Proceedings of the Workshop on Multiword Expressions: Integrating Processing | 2004

Representation and treatment of multiword expressions in Basque

Iñaki Alegria; Olatz Ansa; Xabier Artola; Nerea Ezeiza; Koldo Gojenola; Ruben Urizar

This paper describes the representation of Basque Multiword Lexical Units and the automatic processing of Multiword Expressions. After discussing and stating which kind of multiword expressions we consider to be processed at the current stage of the work, we present the representation schema of the corresponding lexical units in a general-purpose lexical database. Due to its expressive power, the schema can deal not only with fixed expressions but also with morphosyntactically flexible constructions. It also allows us to lemmatize word combinations as a unit and yet to parse the components individually if necessary. Moreover, we describe HABIL, a tool for the automatic processing of these expressions, and we give some evaluation results. This work must be placed in a general framework of written Basque processing tools, which currently ranges from the tokenization and segmentation of single words up to the syntactic tagging of general texts.

Iberoamerican Congress on Pattern Recognition | 2003

Selection of Lexical Units for Continuous Speech Recognition of Basque

K. López de Ipiña; Manuel Graña; Nerea Ezeiza; M. Hernández; Ekaitz Zulueta; Aitzol Ezeiza; C. Tovar

The selection of appropriate Lexical Units (LUs) is an important issue in the development of Continuous Speech Recognition (CSR) systems. Words have classically been used as the recognition unit in most of them. However, proposals of non-word units are beginning to arise. Basque is an agglutinative language with some structure inside words, for which non-word, morpheme-like units could be an appropriate choice. In this work a statistical analysis of units obtained after morphological segmentation has been carried out. This analysis shows a potential increase in confusion rates in CSR systems, due to the growth of the set of acoustically similar and short morphemes. Thus, several proposals of Lexical Units are analysed to deal with the problem. Measures of Phonetic Perplexity and Speech Recognition rates have been computed using different sets of units and, based on these measures, a set of alternative non-word units has been selected.

International Conference on Implementation and Application of Automata | 2001

Using Finite State Technology in Natural Language Processing of Basque

Iñaki Alegria; Maxux J. Aranzabe; Nerea Ezeiza; Aitzol Ezeiza; Ruben Urizar

This paper describes the components used in the design and implementation of NLP tools for Basque. These components are based on finite state technology and are devoted to the morphological analysis of Basque, an agglutinative pre-Indo-European language. We think that our design can be interesting for the treatment of other languages. The main components developed are a general and robust morphological analyser/generator and a spelling checker/corrector for Basque named Xuxen. The analyser is a basic tool for current and future work on NLP of Basque, such as the lemmatiser/tagger Euslem, an Intranet search engine or an assistant for verse-making.

Conference of the European Chapter of the Association for Computational Linguistics | 1993

A morphological analysis based method for spelling correction

Itziar Aduriz; Eneko Agirre; Iñaki Alegria; Xabier Arregi; Jose Mari Arriola; Xabier Artola; A. Díaz de Ilarraza; Nerea Ezeiza; Montse Maritxalar; Kepa Sarasola; Miriam Urkia

Xuxen is a spelling checker/corrector for Basque which is going to be commercialized next year. The checker recognizes a word-form if a correct morphological breakdown is allowed. The morphological analysis is based on two-level morphology. The correction method distinguishes between orthographic errors and typographical errors.
• Typographical errors (or mistypings) are non-cognitive errors which do not follow linguistic criteria.
• Orthographic errors are cognitive errors which occur when the writer does not know or has forgotten the correct spelling of a word. They are more persistent because of their cognitive nature, they leave a worse impression and, finally, their treatment is an interesting application for language standardization purposes.

Text, Speech and Dialogue | 2011

Semantic relatedness for named entity disambiguation using a small Wikipedia

Izaskun Fernández; Iñaki Alegria; Nerea Ezeiza

Performing Named Entity Disambiguation with a small knowledge base makes the task more challenging. Concretely, we present an evaluation of the state-of-the-art methods in this task for Basque NE disambiguation based on the Basque Wikipedia. We have used MFS, VSM, ESA and UKB for linking each ambiguous surface NE form occurrence in a text with its corresponding entry in the Basque Wikipedia. We have analysed their performance with different corpora and, as expected, most of them perform worse than when using big Wikipedias such as the English version, but we think these results are more realistic for less-resourced languages. We propose a new normalization factor for ESA to minimise the effect of the knowledge base size.

Proceedings of 2002 IEEE Workshop on Speech Synthesis, 2002. | 2002

Morphological segmentation for speech processing in Basque

K. López de Ipiña; Nerea Ezeiza; Germán Bordel; Manuel Graña

Morphological information is traditionally used to develop high quality text to speech (TTS) and automatic speech recognition (ASR) systems. The use of this information improves the naturalness and intelligibility of the TTS synthesis and provides an appropriate way to select lexical units (LU) for ASR. Basque is an agglutinative language with a complex structure inside the words, and the morphological information is essential both in TTS and ASR. In this work an automatic morphological segmentation tool oriented to TTS and ASR tasks is presented.

Iberoamerican Congress on Pattern Recognition | 2003

Decision tree-based context dependent sublexical units for Continuous Speech Recognition of Basque

K. López de Ipiña; Manuel Graña; Nerea Ezeiza; M. Hernández; Ekaitz Zulueta; Aitzol Ezeiza

This paper presents a new methodology, based on classical decision trees (DT), to obtain a suitable set of context dependent sublexical units for Basque Continuous Speech Recognition (CSR). The original method proposed by Bahl [1] was applied as the benchmark. Then two new features were added: a data massaging step to emphasise the data and a fast and efficient Growing and Pruning algorithm for DT construction. In addition, the use of the new context dependent units to build word models was addressed. The benchmark Bahl approach gave recognition rates clearly outperforming those of context independent phone-like units. Finally, the new methodology improves over the benchmark DT approach.

Text, Speech and Dialogue | 2016

A Modular Chain of NLP Tools for Basque

Arantxa Otegi; Nerea Ezeiza; Iakes Goenaga; Gorka Labaka

This work describes the initial stage of designing and implementing a modular chain of Natural Language Processing tools for Basque. The main characteristic of this chain is the deep morphosyntactic analysis carried out by the first tool of the chain and the use of these morphologically rich annotations by the subsequent linguistic processing tools of the chain. It is designed following a modular approach, which makes its processors easy to use. Two tools have been adapted and integrated into the chain so far, and are ready to use and freely available, namely the morphosyntactic analyzer and PoS tagger, and the dependency parser. We have evaluated these tools and obtained competitive results. Furthermore, we have tested the robustness of the tools on an extensive processing of Basque documents in various research projects.

Meeting of the Association for Computational Linguistics | 1998