Rebecka Weegar
Stockholm University
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Rebecka Weegar.
empirical methods in natural language processing | 2015
Rebecka Weegar; Hercules Dalianis
National cancer registries collect cancer related information from multiple sources and make it available for research. Part of this information originates from pathology reports, and in this pre-study the possibility of a system for automatic extraction of information from Norwegian pathology reports is investigated. A set of 40 pathology reports describing breast cancer tissue samples has been used to develop a rule based system for information extraction. To validate the performance of this system its output has been compared to the data produced by experts doing manual encoding of the same pathology reports. On average, a precision of 80%, a recall of 98% and an F-score of 86% has been achieved, showing that such a system is indeed feasible.
Journal of Biomedical Informatics | 2017
Alicia Pérez; Rebecka Weegar; Arantza Casillas; Koldo Gojenola; Maite Oronoz; Hercules Dalianis
OBJECTIVE The goal of this study is to investigate entity recognition within Electronic Health Records (EHRs) focusing on Spanish and Swedish. Of particular importance is a robust representation of the entities. In our case, we utilized unsupervised methods to generate such representations. METHODS The significance of this work stands on its experimental layout. The experiments were carried out under the same conditions for both languages. Several classification approaches were explored: maximum probability, CRF, Perceptron and SVM. The classifiers were enhanced by means of ensembles of semantic spaces and ensembles of Brown trees. In order to mitigate sparsity of data, without a significant increase in the dimension of the decision space, we propose the use of clustered approaches of the hierarchical Brown clustering represented by trees and vector quantization for each semantic space. RESULTS The results showed that the semi-supervised approaches significantly improved standard supervised techniques for both languages. Moreover, clustering the semantic spaces contributed to the quality of the entity recognition while keeping the dimension of the feature-space two orders of magnitude lower than when directly using the semantic spaces. CONCLUSIONS The contributions of this study are: (a) a set of thorough experiments that enable comparisons regarding the influence of different types of features on different classifiers, exploring two languages other than English; and (b) the use of ensembles of clusters of Brown trees and semantic spaces on EHRs to tackle the problem of scarcity of available annotated data.
international conference on pattern recognition | 2014
Agnes Tegen; Rebecka Weegar; Linus Hammarlund; Magnus Oskarsson; Fangyuan Jiang; Dennis Medved; Pierre Nugues; Kalle Åström
In this paper we investigate the problem of segmenting images using the information in text annotations. In contrast to the general image understanding problem, this type of annotation guided segmentation is less ill-posed in the sense that for the output there is higher consensus among human annotations. In the paper we present a system based on a combined visual and semantic pipeline. In the visual pipeline, a list of tentative figure-ground segmentations is first proposed. Each such segmentation is classified into a set of visual categories. In the natural language processing pipeline, the text is parsed and chunked into objects. Each chunk is then compared with the visual categories and the relative distance is computed using the word-net structure. The final choice of segments and their correspondence to the chunked objects are then obtained using combinatorial optimization. The output is compared to manually annotated ground-truth images. The results are promising and there are several interesting avenues for continued research.
recent advances in natural language processing | 2017
Rebecka Weegar; Jan F. Nygård; Hercules Dalianis
In this article we present a system that extracts information from pathology reports. The reports are written in Norwegian and contain free text describing prostate biopsies. Currently, these reports are manually coded for research and statistical purposes by trained experts at the Cancer Registry of Norway where the coders extract values for a set of predefined fields that are specific for prostate cancer. The presented system is rule based and achieves an average F-score of 0.91 for the fields Gleason grade, Gleason score, the number of biopsies that contain tumor tissue, and the orientation of the biopsies. The system also identifies reports that contain ambiguity or other content that should be reviewed by an expert. The system shows potential to encode the reports considerably faster, with less resources, and similar high quality to the manual encoding.
Procesamiento Del Lenguaje Natural | 2018
Rebecka Weegar; Alicia Pérez; Hercules Dalianis; Koldo Gojenola; Arantza Casillas; Maite Oronoz
Health records are a valuable source of clinical knowledge and Natural Language Processing techniques have previously been applied to the text in health records for a number of applications. Often, a first step in clinical text processing is clinical entity recognition; identifying, for example, drugs, disorders, and body parts in clinical text. However, most of this work has focused on records in English.Therefore, this work aims to improve clinical entity recognition for languages other than English by comparing the same methods on two different languages, specifically by employing ensemble methods. Models were created for Spanish and Swedish health records using SVM, Perceptron, and CRF and four different feature sets, including unsupervised features. Finally, the models were combined in ensembles. Weighted voting was applied according to the models individual F-scores. In conclusion, the ensembles improved the overall performance for Spanish and the precision for Swedish.
conference on computational natural language learning | 2015
Rebecka Weegar; Kalle rAström; Pierre Nugues
This paper describes a set of methods to link entities across images and text. As a corpus, we used a data set of images, where each image is commented by a short caption and where the regions in the images are manually segmented and labeled with a category. We extracted the entity mentions from the captions and we computed a semantic similarity between the mentions and the region labels. We also measured the statistical associations between these mentions and the labels and we combined them with the semantic similarity to produce mappings in the form of pairs consisting of a region label and a caption entity. In a second step, we used the syntactic relationships between the mentions and the spatial relationships between the regions to rerank the lists of candidate mappings. To evaluate our methods, we annotated a test set of 200 images, where we manually linked the im- age regions to their corresponding mentions in the captions. Eventually, we could match objects in pictures to their correct mentions for nearly 89 percent of the segments, when such a matching exists. (Less)
conference on advanced information systems engineering | 2015
Hercules Dalianis; Aron Henriksson; Maria Kvist; Sumithra Velupillai; Rebecka Weegar
american medical informatics association annual symposium | 2015
Rebecka Weegar; Maria Kvist; Karin Sundström; Søren Brunak; Hercules Dalianis
national conference on artificial intelligence | 2014
Rebecka Weegar; Linus Hammarlund; Agnes Tegen; Magnus Oskarsson
language and technology conference | 2014
Juri Pyykkö; Rebecka Weegar; Pierre Nugues