Maite Oronoz
University of the Basque Country
Publication
Featured research published by Maite Oronoz.
Conference on Intelligent Text Processing and Computational Linguistics | 2004
Itziar Aduriz; Maxux J. Aranzabe; Jose Maria Arriola; Arantza Díaz de Ilarraza; Koldo Gojenola; Maite Oronoz; Larraitz Uria
This article presents a robust syntactic analyser for Basque and the modules it contains. Each module is structured in analysis layers, where each layer takes the information provided by the previous layer as its input, thus building a gradually deeper syntactic analysis in cascade. The analysis is carried out using the Constraint Grammar (CG) formalism. The article also describes the standardisation of the parsing formats using XML.
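The cascade idea in the abstract above, where each layer consumes the output of the previous one, can be illustrated with a toy pipeline. This is only a sketch under stated assumptions: the layer names (`tokenise`, `tag`, `chunk`) and the capitalisation-based tags are hypothetical placeholders, and nothing here implements Constraint Grammar or the authors' analyser.

```python
from functools import reduce
from itertools import groupby
from typing import Any, Callable, Dict, List

Analysis = Dict[str, Any]
Layer = Callable[[Analysis], Analysis]


def tokenise(analysis: Analysis) -> Analysis:
    # Layer 1 (hypothetical): split the raw text on whitespace.
    return {**analysis, "tokens": analysis["text"].split()}


def tag(analysis: Analysis) -> Analysis:
    # Layer 2 (hypothetical): a toy tag based only on capitalisation.
    tags = ["CAP" if tok[0].isupper() else "LOW" for tok in analysis["tokens"]]
    return {**analysis, "tags": tags}


def chunk(analysis: Analysis) -> Analysis:
    # Layer 3 (hypothetical): group consecutive tokens that share a tag.
    pairs = zip(analysis["tags"], analysis["tokens"])
    chunks = [
        (label, [tok for _, tok in group])
        for label, group in groupby(pairs, key=lambda pair: pair[0])
    ]
    return {**analysis, "chunks": chunks}


def run_cascade(text: str, layers: List[Layer]) -> Analysis:
    # Each layer receives everything produced by the layers before it.
    return reduce(lambda acc, layer: layer(acc), layers, {"text": text})


result = run_cascade("Maite lives in Donostia", [tokenise, tag, chunk])
print(result["chunks"])
```

Because every layer returns an enriched copy of its input, later layers can rely on earlier annotations without the layers knowing about each other.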
Journal of Biomedical Informatics | 2015
Maite Oronoz; Koldo Gojenola; Alicia Pérez; Arantza Díaz de Ilarraza; Arantza Casillas
The advances achieved in Natural Language Processing make it possible to automatically mine information from electronically created documents. Many Natural Language Processing methods that extract information from texts make use of annotated corpora, but these are scarce in the clinical domain due to legal and ethical issues. In this paper we present the creation of the IxaMed-GS gold standard composed of real electronic health records written in Spanish and manually annotated by experts in pharmacology and pharmacovigilance. The experts mainly annotated entities related to diseases and drugs, but also relationships between entities indicating adverse drug reaction events. To help the experts in the annotation task, we adapted a general corpus linguistic analyzer to the medical domain. The quality of the annotation process in the IxaMed-GS corpus has been assessed by measuring the inter-annotator agreement, which was 90.53% for entities and 82.86% for events. In addition, the corpus has been used for the automatic extraction of adverse drug reaction events using machine learning.
Iberoamerican Congress on Pattern Recognition | 2013
Maite Oronoz; Arantza Casillas; Koldo Gojenola; Alicia Pérez
This paper presents an annotation tool that detects entities in the biomedical domain. By enriching the lexica of the Freeling analyzer with biomedical terms extracted from dictionaries and ontologies such as SNOMED CT, the system is able to automatically detect medical terms in texts. An evaluation has been performed against a manually tagged corpus, focusing on entities referring to pharmaceutical drug names, substances and diseases. The results show that a good annotation tool would help to leverage subsequent processes such as data mining or pattern recognition tasks in the biomedical domain.
Expert Systems With Applications | 2016
Arantza Casillas; Alicia Pérez; Maite Oronoz; Koldo Gojenola; Sara Santiso
Inference of a prediction model able to deal with a skewed classification problem. Hybrid medical event extraction combining knowledge-based and inferred classifiers. Detection of cause-effect relations between drugs and diseases. Analysis of Electronic Health Records written in Spanish. Objective: To tackle the extraction of adverse drug reaction events in electronic health records. The challenge lies in inferring a robust prediction model from highly unbalanced data. According to our manually annotated corpus, only 6% of the drug-disease entity pairs trigger a positive adverse drug reaction event, and this low ratio makes machine learning difficult. Method: We present a hybrid system utilising a self-developed morpho-syntactic and semantic analyser for medical texts in Spanish. It performs named entity recognition of drugs and diseases and adverse drug reaction event extraction. The event extraction stage operates using rule-based and machine learning techniques. Results: We assess both the base classifiers, namely a knowledge-based model and an inferred classifier, and also the resulting hybrid system. Moreover, for the machine learning approach, an analysis of each particular bio-cause triggering the adverse drug reaction is carried out. Conclusions: One of the contributions of the machine learning based system is its ability to deal with both intra-sentence and inter-sentence events in a highly skewed classification environment. Moreover, the knowledge-based and the inferred model are complementary in terms of precision and recall. While the former provides high precision and low recall, the latter is the other way around. As a result, an appropriate hybrid approach seems to be able to benefit from both approaches and also improve on them. This is the underlying motivation for selecting the hybrid approach. In addition, this is the first system dealing with real electronic health records in Spanish.
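The complementarity described in the abstract above (a high-precision, low-recall knowledge-based model next to a higher-recall inferred classifier) can be made concrete with a small worked example. The pair identifiers and predictions below are invented for illustration, and the union rule is only one possible way to combine two systems; the abstract does not state how the authors' hybrid is built.

```python
from typing import Set, Tuple


def precision_recall(predicted: Set[str], gold: Set[str]) -> Tuple[float, float]:
    # Precision: share of predictions that are correct.
    # Recall: share of gold events that were found.
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical drug-disease pairs annotated as adverse drug reaction events.
gold = {"p1", "p2", "p3", "p4"}

# Hypothetical outputs of the two base systems.
rule_based = {"p1", "p2"}                    # few predictions, all correct
inferred = {"p2", "p3", "p4", "p5", "p6"}    # more coverage, some errors

# Assumed combination rule: accept a pair if either system proposes it.
hybrid = rule_based | inferred

for name, system in [("rules", rule_based), ("inferred", inferred), ("hybrid", hybrid)]:
    p, r = precision_recall(system, gold)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

In this toy data the union recovers every gold event while keeping precision above that of the inferred classifier alone, which is the kind of benefit the abstract attributes to a hybrid approach.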
International Conference on Computational Linguistics | 2014
Koldo Gojenola; Maite Oronoz; Alicia Pérez; Arantza Casillas
This paper presents the results of the IxaMed team at the SemEval-2014 Shared Task 7 on Analyzing Clinical Texts. We have developed three different systems based on: a) exact match, b) a general-purpose morphosyntactic analyzer enriched with the SNOMED CT terminology content, and c) a perceptron sequential tagger based on a Global Linear Model. The three individual systems achieve similar F-scores but differ in precision and recall. We have also tried direct combinations of the individual systems, obtaining considerable improvements in performance.
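As a rough illustration of what an exact-match system such as (a) might look like, the sketch below tags token spans by greedy longest-match lookup in a term lexicon. The lexicon entries, the `DISORDER` label, and the `max_len` window are assumptions made for the example; the abstract does not describe the team's actual matching procedure.

```python
from typing import Dict, List, Tuple

Lexicon = Dict[Tuple[str, ...], str]


def exact_match(tokens: List[str], lexicon: Lexicon, max_len: int = 4) -> List[Tuple[int, int, str]]:
    """Return (start, end, label) spans found by greedy longest-match lookup."""
    spans = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate window first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tok.lower() for tok in tokens[i:i + n])
            if key in lexicon:
                spans.append((i, i + n, lexicon[key]))
                i += n
                break
        else:
            # No lexicon entry starts at this token; move on.
            i += 1
    return spans


# Hypothetical lexicon; a real one would be derived from a terminology resource.
lexicon = {
    ("chest", "pain"): "DISORDER",
    ("pain",): "DISORDER",
    ("fever",): "DISORDER",
}

tokens = "Patient reports chest pain and fever".split()
spans = exact_match(tokens, lexicon)
print(spans)
```

Preferring the longest match keeps "chest pain" as one mention instead of tagging "pain" on its own.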
International Conference on Computational Linguistics | 2005
Arantza Díaz de Ilarraza; Koldo Gojenola; Maite Oronoz
This paper presents the design and development of a system for the detection and correction of syntactic errors in free texts. The system is composed of three main modules: a) a robust syntactic analyser, b) a compiler that will translate error processing rules, and c) a module that coordinates the results of the analyser, applying different combinations of the already compiled error rules. The use of the syntactic analyser (a) and the rule processor (b) is independent and not necessarily sequential. The specification language used for the description of the error detection/correction rules is abstract, general, declarative, and based on linguistic information.
BMC Medical Informatics and Decision Making | 2015
Olatz Perez-de-Viñaspre; Maite Oronoz
Background: The Systematized Nomenclature of Medicine - Clinical Terms (SNOMED CT) is officially released in English and Spanish. In the Basque Autonomous Community two languages, Spanish and Basque, are official. This paper presents the first attempt to semi-automatically translate the SNOMED CT terminology content to Basque, a less resourced language. Methods: A translation algorithm based on Natural Language Processing methods has been designed and partially implemented. The algorithm comprises four phases, of which the first two have been implemented and quantitatively evaluated. Results: Results are promising, as we obtained the equivalents in Basque of 21.41% of the disorder terms of the English SNOMED CT release. As the methods developed are focused on that hierarchy, the results in other hierarchies are lower (12.57% for body structure descriptions, 8.80% for findings and 3% for procedures). Conclusions: We are on the way to reaching two of our objectives when translating SNOMED CT to Basque: to use our language to access rich multilingual resources and to strengthen the use of the Basque language in the biomedical area.
Proceedings of the 5th International Workshop on Health Text Mining and Information Analysis (Louhi) | 2014
Sara Santiso; Arantza Casillas; Alicia Pérez; Maite Oronoz; Koldo Gojenola
The aim of this work is to infer a model able to extract cause-effect relations between drugs and diseases. A two-level system is proposed. The first level carries out a shallow analysis of Electronic Health Records (EHRs) in order to identify medical concepts such as drug brand names, substances, diseases, etc. Next, all the pairs formed by a concept from the group of drugs (drugs and substances) and a concept from the group of diseases (diseases and symptoms) are characterised through a set of 57 features. A supervised classifier inferred on those features is in charge of deciding whether that pair represents a cause-effect type of event.
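The pairing step described in the abstract above, combining every concept from the drug group with every concept from the disease group, can be sketched as a Cartesian product. The `Concept` fields and the two features shown are hypothetical stand-ins: the abstract mentions 57 features but does not list them.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Concept:
    text: str
    kind: str       # "drug", "substance", "disease" or "symptom"
    sentence: int   # index of the sentence containing the concept
    position: int   # token offset within the document


DRUG_GROUP = {"drug", "substance"}
DISEASE_GROUP = {"disease", "symptom"}


def candidate_pairs(concepts: List[Concept]) -> List[Tuple[Concept, Concept]]:
    # Every drug-group concept is paired with every disease-group concept.
    drugs = [c for c in concepts if c.kind in DRUG_GROUP]
    diseases = [c for c in concepts if c.kind in DISEASE_GROUP]
    return list(product(drugs, diseases))


def features(drug: Concept, disease: Concept) -> Dict[str, Union[bool, int]]:
    # Two illustrative features only (assumed, not taken from the paper).
    return {
        "same_sentence": drug.sentence == disease.sentence,
        "token_distance": abs(drug.position - disease.position),
    }


concepts = [
    Concept("ibuprofen", "drug", sentence=0, position=3),
    Concept("rash", "symptom", sentence=0, position=7),
    Concept("asthma", "disease", sentence=1, position=15),
]

pairs = candidate_pairs(concepts)
vectors = [features(drug, disease) for drug, disease in pairs]
print(vectors)
```

Each feature dictionary would then be passed to a supervised classifier that decides whether the pair represents a cause-effect event.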
Journal of Biomedical Informatics | 2017
Alicia Pérez; Rebecka Weegar; Arantza Casillas; Koldo Gojenola; Maite Oronoz; Hercules Dalianis
OBJECTIVE: The goal of this study is to investigate entity recognition within Electronic Health Records (EHRs) focusing on Spanish and Swedish. Of particular importance is a robust representation of the entities. In our case, we utilized unsupervised methods to generate such representations. METHODS: The significance of this work lies in its experimental layout. The experiments were carried out under the same conditions for both languages. Several classification approaches were explored: maximum probability, CRF, Perceptron and SVM. The classifiers were enhanced by means of ensembles of semantic spaces and ensembles of Brown trees. In order to mitigate sparsity of data, without a significant increase in the dimension of the decision space, we propose the use of clustered approaches of the hierarchical Brown clustering represented by trees and vector quantization for each semantic space. RESULTS: The results showed that the semi-supervised approaches significantly improved on standard supervised techniques for both languages. Moreover, clustering the semantic spaces contributed to the quality of the entity recognition while keeping the dimension of the feature space two orders of magnitude lower than when directly using the semantic spaces. CONCLUSIONS: The contributions of this study are: (a) a set of thorough experiments that enable comparisons regarding the influence of different types of features on different classifiers, exploring two languages other than English; and (b) the use of ensembles of clusters of Brown trees and semantic spaces on EHRs to tackle the problem of scarcity of available annotated data.
Proceedings of the 5th International Workshop on Health Text Mining and Information Analysis (Louhi) | 2014
Olatz Perez-de-Viñaspre; Maite Oronoz
This paper presents the first attempt to semi-automatically translate SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms) terminology content to Basque, a less resourced language. This would make it possible to build a new clinical healthcare terminology for Basque. We have designed a four-phase translation algorithm and implemented its first two phases, which feed the SNOMED CT terminology content. The goal of the translation is twofold: to strengthen the use of Basque in the bio-sanitary area and to provide access to a rich multilingual resource in our language.