Publication


Featured research published by Roland Roller.


Cross Language Evaluation Forum | 2014

Self-supervised Relation Extraction Using UMLS

Roland Roller; Mark Stevenson

Self-supervised relation extraction uses a knowledge base to automatically annotate a training corpus which is then used to train a classifier. This approach has been successfully applied to different domains using a range of knowledge bases. This paper applies the approach to the biomedical domain using UMLS, a large biomedical knowledge base containing millions of concepts and relations among them. The approach is evaluated using two different techniques. The presented results are promising and indicate that UMLS is a useful resource for semi-supervised relation extraction.


Proceedings of the 5th International Workshop on Health Text Mining and Information Analysis (Louhi) | 2014

Applying UMLS for Distantly Supervised Relation Detection

Roland Roller; Mark Stevenson

This paper describes first results using the Unified Medical Language System (UMLS) for distantly supervised relation extraction. UMLS is a large knowledge base which contains information about millions of medical concepts and relations between them. Our approach is evaluated using existing relation extraction data sets that contain relations that are similar to some of those in UMLS.


International Joint Conference on Natural Language Processing | 2015

Improving distant supervision using inference learning

Roland Roller; Eneko Agirre; Aitor Soroa; Mark Stevenson

Distant supervision is a widely applied approach to automatic training of relation extraction systems and has the advantage that it can generate large amounts of labelled data with minimal effort. However, this data may contain errors and consequently systems trained using distant supervision tend not to perform as well as those based on manually labelled data. This work proposes a novel method for detecting potential false negative training examples using a knowledge inference method. Results show that our approach improves the performance of relation extraction systems trained using distantly supervised data.


Empirical Methods in Natural Language Processing | 2015

Held-out versus Gold Standard: Comparison of Evaluation Strategies for Distantly Supervised Relation Extraction from Medline abstracts

Roland Roller; Mark Stevenson

Distant supervision is a useful technique for creating relation classifiers in the absence of labelled data. The approaches are often evaluated using a held-out portion of the distantly labelled data, thereby avoiding the need for labelled data entirely. However, held-out evaluation means that systems are tested against noisy data, making it difficult to determine their true accuracy. This paper examines the effectiveness of using held-out data to evaluate relation extraction systems by comparing the results that are produced with those generated using manually labelled versions of the same data. We train classifiers to detect two UMLS Metathesaurus relations (may-treat and may-prevent) in Medline abstracts. A new evaluation data set for these relations is made available. We show that evaluation against a distantly labelled gold standard tends to overestimate performance and that no direct connection can be found between improved performance against distantly and manually labelled gold standards.


Recent Advances in Natural Language Processing | 2017

Annotation of Entities and Relations in Spanish Radiology Reports.

Viviana Cotik; Darío Filippo; Roland Roller; Hans Uszkoreit; Feiyu Xu

Radiology reports express the results of a radiology study and contain information about anatomical entities, findings, measures and impressions of the medical doctor. The use of information extraction techniques can help physicians to access this information in order to understand data and to infer further knowledge. Supervised machine learning methods are very popular to address information extraction, but are usually domain and language dependent. To train new classification models, annotated data is required. Moreover, annotated data is also required as an evaluation resource of information extraction algorithms. However, one major drawback of processing clinical data is the low availability of annotated datasets. For this reason we performed a manual annotation of radiology reports written in Spanish. This paper presents the corpus, the annotation schema, the annotation guidelines and further insight of the data.


Proceedings of BioNLP 15 | 2015

Making the most of limited training data using distant supervision

Roland Roller; Mark Stevenson

Automatic recognition of relationships between key entities in text is an important problem which has many applications. Supervised machine learning techniques have proved to be the most effective approach to this problem. However, they require labelled training data which may not be available in sufficient quantity (or at all) and is expensive to produce. This paper proposes a technique that can be applied when only limited training data is available. The approach uses a form of distant supervision but does not require an external knowledge base. Instead, it uses information from the training set to acquire new labelled data and combines it with manually labelled data. The approach was tested on an adverse drug data set using a limited amount of manually labelled training data and shown to outperform a supervised approach.


International Conference of the German Society for Computational Linguistics and Language Technology | 2017

Detecting Named Entities and Relations in German Clinical Reports

Roland Roller; Nils Rethmeier; Philippe Thomas; Marc Hübner; Hans Uszkoreit; Oliver Staeck; Klemens Budde; Fabian Halleck; Danilo Schmidt

Clinical notes and discharge summaries are commonly used in the clinical routine and contain patient related information such as well-being, findings and treatments. Information is often described in text form and presented in a semi-structured way. This makes it difficult to access the highly valuable information for patient support or clinical studies. Information extraction can help clinicians to access this information. However, most methods in the clinical domain focus on English data. This work aims at information extraction from German nephrology reports. We present on-going work in the context of detecting named entities and relations. Underlying to this work is a currently generated corpus annotation which includes a large set of different medical concepts, attributes and relations. At the current stage we apply a number of classification techniques to the existing dataset and achieve promising results for most of the frequent concepts and relations.


International Conference on Computational Linguistics | 2016

A fine-grained corpus annotation schema of German nephrology records.

Roland Roller; Hans Uszkoreit; Feiyu Xu; Laura Seiffe; Michael Mikhailov; Oliver Staeck; Klemens Budde; Fabian Halleck; Danilo Schmidt


International Conference on Computational Linguistics | 2016

Negation Detection in Clinical Reports Written in German.

Viviana Cotik; Roland Roller; Feiyu Xu; Hans Uszkoreit; Klemens Budde; Danilo Schmidt


Meeting of the Association for Computational Linguistics | 2013

Identification of Genia Events using Multiple Classifiers

Roland Roller; Mark Stevenson

Collaboration


Dive into Roland Roller's collaborations.

Top Co-Authors

Viviana Cotik

University of Buenos Aires

Philippe Thomas

Humboldt University of Berlin

Ulf Leser

Humboldt University of Berlin
