Arantza Casillas | Researchain

Publication

Featured research published by Arantza Casillas.

Text, Speech and Dialogue | 2003

Document Clustering into an Unknown Number of Clusters Using a Genetic Algorithm

Arantza Casillas; M. T. González de Lena; Raquel Martínez

We present a genetic algorithm for document clustering. The algorithm approximates the optimum value of k and finds the best grouping of the documents into these k clusters. We have evaluated it on sets of documents returned by a search engine in response to a query. The experiments show that, most of the time, our genetic algorithm obtains better values of the fitness function than the well-known Calinski and Harabasz stopping rule, and takes less time.
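The abstract compares against the Calinski and Harabasz stopping rule but does not give its formula or the paper's fitness function. As a minimal sketch, assuming the usual definition of the Calinski-Harabasz index (between-cluster dispersion over within-cluster dispersion, each scaled by its degrees of freedom), the score for one candidate clustering could be computed as follows. The function name and the toy points are illustrative and not taken from the paper.

```python
from typing import Sequence


def calinski_harabasz(points: Sequence[Sequence[float]],
                      labels: Sequence[int]) -> float:
    """Calinski-Harabasz index of a clustering (higher is better).

    Assumes the standard definition: (B / (k - 1)) / (W / (n - k)),
    where B is between-cluster and W is within-cluster dispersion.
    """
    n = len(points)
    clusters = sorted(set(labels))
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("the index is defined only for 2 <= k < n")
    dim = len(points[0])
    overall = [sum(p[d] for p in points) / n for d in range(dim)]

    between = 0.0
    within = 0.0
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroid = [sum(p[d] for p in members) / len(members)
                    for d in range(dim)]
        # Squared distance from the cluster centroid to the overall mean,
        # weighted by cluster size.
        between += len(members) * sum(
            (centroid[d] - overall[d]) ** 2 for d in range(dim))
        # Squared distances from each member to its own centroid.
        within += sum(
            sum((p[d] - centroid[d]) ** 2 for d in range(dim))
            for p in members)

    if within == 0.0:
        return float("inf")  # every point coincides with its centroid
    return (between / (k - 1)) / (within / (n - k))


# Toy data: two well-separated pairs of points.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_score = calinski_harabasz(toy_points, [0, 0, 1, 1])
bad_score = calinski_harabasz(toy_points, [0, 1, 0, 1])
```

A stopping rule of this kind evaluates the index for several values of k and keeps the k with the highest score; on the toy data the natural grouping scores far higher than the mixed one.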

Meeting of the Association for Computational Linguistics | 2006

Multilingual Document Clustering: An Heuristic Approach Based on Cognate Named Entities

Soto Montalvo; Raquel Martínez; Arantza Casillas; Víctor Fresno

This paper presents an approach to Multilingual Document Clustering in comparable corpora. The algorithm is heuristic in nature, and its only evidence for clustering is the identification of cognate named entities between both sides of the comparable corpora. One of the main advantages of this approach is that it does not depend on bilingual or multilingual resources. However, it does depend on the possibility of identifying cognate named entities between the languages used in the corpus. An additional advantage is that the approach does not need any information about the right number of clusters; the algorithm calculates it. We have tested this approach on a comparable corpus of news written in English and Spanish. In addition, we have compared the results with those of a system that translates selected document features. The results obtained are encouraging.
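The abstract does not say how cognate named entities are identified. One common way to approximate cognate detection is a normalized edit-distance threshold, sketched below under that assumption. The function names, the 0.8 threshold, and the sample entities are all hypothetical and are not the paper's method.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def is_cognate(a: str, b: str, threshold: float = 0.8) -> bool:
    """Treat two entity strings as cognates if they are similar enough.

    The threshold is an assumed value chosen for illustration only.
    """
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return False
    similarity = 1.0 - levenshtein(a, b) / longest
    return similarity >= threshold


def shared_cognates(entities_en: List[str], entities_es: List[str],
                    threshold: float = 0.8) -> List[Tuple[str, str]]:
    """Pairs of entities, one per language, judged to be cognates."""
    return [(e, s) for e in entities_en for s in entities_es
            if is_cognate(e, s, threshold)]


pairs = shared_cognates(["Brazil", "Germany"], ["Brasil", "Alemania"])
```

Two documents in different languages could then be linked when they share enough such pairs, which is the kind of evidence the abstract describes; entities whose names differ substantially across languages (Germany and Alemania here) are missed by this purely orthographic test.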

Journal of Biomedical Informatics | 2015

On the creation of a clinical gold standard corpus in Spanish

Maite Oronoz; Koldo Gojenola; Alicia Pérez; Arantza Díaz de Ilarraza; Arantza Casillas

The advances achieved in Natural Language Processing make it possible to automatically mine information from electronically created documents. Many Natural Language Processing methods that extract information from texts make use of annotated corpora, but these are scarce in the clinical domain due to legal and ethical issues. In this paper we present the creation of the IxaMed-GS gold standard composed of real electronic health records written in Spanish and manually annotated by experts in pharmacology and pharmacovigilance. The experts mainly annotated entities related to diseases and drugs, but also relationships between entities indicating adverse drug reaction events. To help the experts in the annotation task, we adapted a general corpus linguistic analyzer to the medical domain. The quality of the annotation process in the IxaMed-GS corpus has been assessed by measuring the inter-annotator agreement, which was 90.53% for entities and 82.86% for events. In addition, the corpus has been used for the automatic extraction of adverse drug reaction events using machine learning.

Iberoamerican Congress on Pattern Recognition | 2013

Automatic Annotation of Medical Records in Spanish with Disease, Drug and Substance Names

Maite Oronoz; Arantza Casillas; Koldo Gojenola; Alicia Pérez

This paper presents an annotation tool that detects entities in the biomedical domain. By enriching the lexica of the Freeling analyzer with biomedical terms extracted from dictionaries and ontologies such as SNOMED CT, the system is able to automatically detect medical terms in texts. An evaluation has been performed against a manually tagged corpus, focusing on entities referring to pharmaceutical drug names, substances and diseases. The results show that a good annotation tool would help subsequent processes such as data mining or pattern recognition tasks in the biomedical domain.

Pattern Recognition Letters | 2007

Multilingual news clustering: Feature translation vs. identification of cognate named entities

Soto Montalvo; Raquel Martínez; Arantza Casillas; Víctor Fresno

In this paper we evaluate the influence of different document representations on the results of multilingual news clustering. We aim to determine whether named entities alone are a good source of knowledge for multilingual news clustering. We compare two approaches: one based on feature translation, and another based on cognate identification. Our main contribution is using only some categories of cognate named entities as document representation features to perform multilingual news clustering, without the need for translation resources. The results show that using cognate named entities as the only type of feature to represent news leads to good multilingual clustering performance, comparable to that obtained with the feature translation approach.

Text, Speech and Dialogue | 2007

Bilingual news clustering using named entities and fuzzy similarity

Soto Montalvo; Raquel Martínez; Arantza Casillas; Víctor Fresno

This paper focuses on discovering bilingual news clusters in a comparable corpus. In particular, we deal with the representation of the news and with the calculation of the similarity between documents. As representative features of the news we use the cognate named entities they contain. One of our main goals is to determine whether named entities alone are a good source of knowledge for multilingual news clustering. In the vector representation of the news we take into account the category of the named entities. To determine the similarity between two documents, we propose a new approach based on a fuzzy system, with a knowledge base that tries to incorporate human knowledge about the importance of each named-entity category in the news. We have compared our approach with a traditional one, obtaining better results on a comparable corpus of news in Spanish and English.

Expert Systems With Applications | 2016

Learning to extract adverse drug reaction events from electronic health records in Spanish

Arantza Casillas; Alicia Pérez; Maite Oronoz; Koldo Gojenola; Sara Santiso

Inference of a prediction model able to deal with a skewed classification problem. Hybrid medical event extraction combining knowledge-based and inferred classifiers. Detection of cause-effect relations between drugs and diseases. Analysis of Electronic Health Records written in Spanish.

Objective: To tackle the extraction of adverse drug reaction events in electronic health records. The challenge lies in inferring a robust prediction model from highly unbalanced data. According to our manually annotated corpus, only 6% of the drug-disease entity pairs trigger a positive adverse drug reaction event, and this low ratio makes machine learning difficult.

Method: We present a hybrid system utilising a self-developed morpho-syntactic and semantic analyser for medical texts in Spanish. It performs named entity recognition of drugs and diseases and adverse drug reaction event extraction. The event extraction stage operates using rule-based and machine learning techniques.

Results: We assess both the base classifiers, namely a knowledge-based model and an inferred classifier, and also the resulting hybrid system. Moreover, for the machine learning approach, an analysis of each particular bio-cause triggering the adverse drug reaction is carried out.

Conclusions: One of the contributions of the machine learning based system is its ability to deal with both intra-sentence and inter-sentence events in a highly skewed classification environment. Moreover, the knowledge-based and the inferred model are complementary in terms of precision and recall. While the former provides high precision and low recall, the latter is the other way around. As a result, an appropriate hybrid approach seems to be able to benefit from both approaches and also improve them. This is the underlying motivation for selecting the hybrid approach. In addition, this is the first system dealing with real electronic health records in Spanish.

International Conference on Computational Linguistics | 2014

IxaMed: Applying Freeling and a Perceptron Sequential Tagger at the Shared Task on Analyzing Clinical Texts

Koldo Gojenola; Maite Oronoz; Alicia Pérez; Arantza Casillas

This paper presents the results of the IxaMed team at the SemEval-2014 Shared Task 7 on Analyzing Clinical Texts. We have developed three different systems based on: a) exact match, b) a general-purpose morphosyntactic analyzer enriched with the SNOMED CT terminology content, and c) a perceptron sequential tagger based on a Global Linear Model. The three individual systems achieve similar F-scores but differ in precision and recall. We have also tried direct combinations of the individual systems, obtaining considerable improvements in performance.
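The abstract does not specify how the individual systems were combined. As a rough illustration of why combining systems with different precision/recall profiles can change the F-score, the sketch below scores the union and the intersection of two hypothetical systems' predicted spans against a toy gold standard. The spans, the two set operations, and the function name are assumptions made for the example, not the team's actual combination scheme or data.

```python
from typing import Set, Tuple

Span = Tuple[int, int]  # (start offset, end offset) of a predicted entity


def precision_recall_f1(predicted: Set[Span],
                        gold: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 of predicted spans."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented toy data: four gold entities and two imperfect systems.
gold = {(0, 5), (10, 18), (25, 30), (40, 47)}
system_a = {(0, 5), (10, 18)}             # precise, but misses entities
system_b = {(10, 18), (25, 30), (50, 55)}  # finds more, with one error

union_scores = precision_recall_f1(system_a | system_b, gold)
intersection_scores = precision_recall_f1(system_a & system_b, gold)
```

On this toy data the union trades some precision for recall and ends up with the higher F1, while the intersection is perfectly precise but recovers only one gold entity.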

Meeting of the Association for Computational Linguistics | 1998

Bitext Correspondences through Rich Mark-up

Raquel Martínez; Joseba Abaitua; Arantza Casillas

Rich mark-up can considerably benefit the process of establishing bitext correspondences, that is, the task of providing correct identification and alignment methods for text segments that are translation equivalents of each other in a parallel corpus. We present a sentence alignment algorithm that, by taking advantage of previously annotated texts, obtains accuracy rates close to 100%. The algorithm evaluates the similarity of the linguistic and extralinguistic mark-up on both sides of a bitext. Given that annotations are neutral with respect to typological, grammatical and orthographical differences between languages, rich mark-up becomes an optimal foundation to support bitext correspondences. The main originality of this approach is that it makes maximal use of annotations, which is a very sensible and efficient method for the exploitation of parallel corpora when annotations exist.

Proceedings of the 5th International Workshop on Health Text Mining and Information Analysis (Louhi) | 2014

Adverse Drug Event prediction combining shallow analysis and machine learning

Sara Santiso; Arantza Casillas; Alicia Pérez; Maite Oronoz; Koldo Gojenola

The aim of this work is to infer a model able to extract cause-effect relations between drugs and diseases. A two-level system is proposed. The first level carries out a shallow analysis of Electronic Health Records (EHRs) in order to identify medical concepts such as drug brand names, substances, diseases, etc. Next, every pair formed by a concept from the group of drugs (drugs and substances) and a concept from the group of diseases (diseases and symptoms) is characterised through a set of 57 features. A supervised classifier inferred on those features is in charge of deciding whether that pair represents a cause-effect type of event.
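The pairing step described above can be sketched as a Cartesian product between the two concept groups. The `Concept` class, its fields, and the sample record are hypothetical; the 57 features are not listed in the abstract, so the sketch stops at generating the candidate pairs that a classifier would later label.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Tuple

# Group membership as described in the abstract.
DRUG_KINDS = {"drug", "substance"}
DISEASE_KINDS = {"disease", "symptom"}


@dataclass(frozen=True)
class Concept:
    """A medical concept found by the shallow analysis (hypothetical shape)."""
    text: str
    kind: str  # e.g. "drug", "substance", "disease", "symptom"


def candidate_pairs(concepts: List[Concept]) -> List[Tuple[Concept, Concept]]:
    """All (drug-group, disease-group) combinations found in one record."""
    drugs = [c for c in concepts if c.kind in DRUG_KINDS]
    diseases = [c for c in concepts if c.kind in DISEASE_KINDS]
    return list(product(drugs, diseases))


# Invented example record, for illustration only.
record = [
    Concept("ibuprofen", "drug"),
    Concept("paracetamol", "substance"),
    Concept("gastritis", "disease"),
    Concept("nausea", "symptom"),
    Concept("2014", "date"),  # ignored: belongs to neither group
]
pairs = candidate_pairs(record)
```

Because every drug-group concept is paired with every disease-group concept, most candidates are negative examples, which is consistent with the skewed class distribution that the related work on this page reports.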
