Publication


Featured research published by Seid Muhie Yimam.


Meeting of the Association for Computational Linguistics | 2014

Automatic Annotation Suggestions and Custom Annotation Layers in WebAnno

Seid Muhie Yimam; Chris Biemann; Richard Eckart de Castilho; Iryna Gurevych

In this paper, we present a flexible approach to the efficient and exhaustive manual annotation of text documents. For this purpose, we extend WebAnno (Yimam et al., 2013), an open-source web-based annotation tool. While it was previously limited to specific annotation layers, our extension allows adding and configuring an arbitrary number of layers through a web-based UI. These layers can be annotated separately or simultaneously, and support most types of linguistic annotations such as spans, semantic classes, dependency relations, lexical chains, and morphology. Further, we tightly integrate a generic machine learning component for automatic annotation suggestions of span annotations. In two case studies, we show that automatic annotation suggestions, combined with our split-pane UI concept, significantly reduce annotation time.


Brain Informatics | 2016

An adaptive annotation approach for biomedical entity and relation recognition

Seid Muhie Yimam; Chris Biemann; Ljiljana Majnarić; Šefket Šabanović; Andreas Holzinger

In this article, we demonstrate the impact of interactive machine learning: we develop a biomedical entity recognition dataset using a human-in-the-loop approach. In contrast to classical machine learning, human-in-the-loop approaches do not operate on predefined training or test sets, but assume that human input regarding system improvement is supplied iteratively. Here, during annotation, a machine learning model is built on previous annotations and used to propose labels for subsequent annotation. To demonstrate that such interactive and iterative annotation speeds up the development of quality dataset annotation, we conduct three experiments. In the first experiment, we carry out an iterative annotation experimental simulation and show that only a handful of medical abstracts need to be annotated to produce suggestions that increase annotation speed. In the second experiment, clinical doctors conduct a case study in annotating medical terms in documents relevant for their research. The third experiment explores the annotation of semantic relations with relation instance learning across documents. The experiments validate our method qualitatively and quantitatively, and give rise to a more personalized, responsive information extraction technology.
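The iterative loop described in this abstract (a model built on previous annotations proposes labels for the next document) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the `SuggestionModel` class, the `simulate` function, and the toy documents are hypothetical, and the most-frequent-label lookup merely stands in for whatever learner the authors actually use.

```python
from collections import Counter, defaultdict


class SuggestionModel:
    """Toy stand-in for the learner: remembers the most frequent label per token."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def update(self, tokens, labels):
        # Add one annotated document to the training data.
        for token, label in zip(tokens, labels):
            self.counts[token.lower()][label] += 1

    def suggest(self, tokens):
        # Propose a label for each token, or None if the token is unseen.
        suggestions = []
        for token in tokens:
            key = token.lower()
            if key in self.counts:
                suggestions.append(self.counts[key].most_common(1)[0][0])
            else:
                suggestions.append(None)
        return suggestions


def simulate(documents):
    """Annotate documents in order; return, per document, the share of
    tokens whose suggested label already matched the annotator's label."""
    model = SuggestionModel()
    match_rates = []
    for tokens, gold in documents:
        suggestions = model.suggest(tokens)
        hits = sum(s == g for s, g in zip(suggestions, gold))
        match_rates.append(hits / len(tokens))
        # The annotator's final labels become training data for the next round.
        model.update(tokens, gold)
    return match_rates


# Hypothetical toy data, not taken from the paper.
docs = [
    (["Insulin", "treats", "diabetes"], ["DRUG", "O", "DISEASE"]),
    (["Metformin", "treats", "diabetes"], ["DRUG", "O", "DISEASE"]),
    (["Insulin", "and", "metformin"], ["DRUG", "O", "DRUG"]),
]
rates = simulate(docs)
```

In this toy run the first document receives no usable suggestions, while later documents benefit from labels learned earlier, which is the effect the simulation experiment in the abstract measures at a realistic scale.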


International Conference on Brain Informatics and Health | 2015

Interactive and Iterative Annotation for Biomedical Entity Recognition

Seid Muhie Yimam; Chris Biemann; Ljiljana Majnarić; Šefket Šabanović; Andreas Holzinger

In this paper, we demonstrate the impact of interactive machine learning for the development of a biomedical entity recognition dataset using a human-in-the-loop approach: during annotation, a machine learning model is built on previous annotations and used to propose labels for subsequent annotation. To demonstrate that such interactive and iterative annotation speeds up the development of quality dataset annotation, we conduct two experiments. In the first experiment, we carry out an iterative annotation experimental simulation and show that only a handful of medical abstracts need to be annotated to produce suggestions that increase annotation speed. In the second experiment, clinical doctors conduct a case study in annotating medical terms in documents relevant for their research. The experiments validate our method qualitatively and quantitatively, and give rise to a more personalized, responsive information extraction technology.


Meeting of the Association for Computational Linguistics | 2016

new/s/leak - Information Extraction and Visualization for Investigative Data Journalists.

Seid Muhie Yimam; Heiner Ulrich; Tatiana von Landesberger; Marcel Rosenbach; Michaela Regneri; Alexander Panchenko; Franziska Lehmann; Uli Fahrer; Chris Biemann; Kathrin Ballweg

We present new/s/leak, a novel tool developed for and with the help of journalists, which enables the automatic analysis and discovery of newsworthy stories from large textual datasets. We rely on different NLP preprocessing steps such as named entity tagging, extraction of time expressions, entity networks, relations and metadata. The system features an intuitive web-based user interface based on network visualization combined with data exploring methods and various search and faceting mechanisms. We report the current state of the software and exemplify it with the WikiLeaks PlusD (Cablegate) data.


North American Chapter of the Association for Computational Linguistics | 2015

Narrowing the Loop: Integration of Resources and Linguistic Dataset Development with Interactive Machine Learning

Seid Muhie Yimam

This thesis proposal sheds light on the role of interactive machine learning and implicit user feedback for manual annotation tasks and semantic writing aid applications. First, we focus on the cost-effective annotation of training data using an interactive machine learning approach by conducting a sequence tagging experiment on German named entity recognition. To show the effectiveness of the approach, we further carry out Amharic part-of-speech tagging as a sequence tagging task and are able to significantly reduce the time used for annotation. The second research direction is to systematically integrate different NLP resources for our new semantic writing aid tool, again using an interactive machine learning approach, to provide contextual paraphrase suggestions. We develop a baseline system where three lexical resources are combined to provide paraphrasing in context and show that combining resources is a promising direction.


Recent Advances in Natural Language Processing | 2017

Multilingual and Cross-Lingual Complex Word Identification.

Seid Muhie Yimam; Sanja Štajner; Martin Riedl; Chris Biemann

Complex Word Identification (CWI) is an important task in lexical simplification and text accessibility. Due to the lack of CWI datasets, previous works largely depend on Simple English Wikipedia and edit histories for obtaining ‘gold standard’ annotations, which are of doubtful quality and limited to English. We collect complex words/phrases (CP) for English, German and Spanish, annotated by both native and non-native speakers, and propose language-independent features that can be used to train multilingual and cross-lingual CWI models. We show that the performance of cross-lingual CWI systems (using a model trained on one language and applying it to the other languages) is comparable to the performance of monolingual CWI systems.


Social Informatics | 2018

New/s/leak 2.0 – Multilingual Information Extraction and Visualization for Investigative Journalism

Gregor Wiedemann; Seid Muhie Yimam; Chris Biemann

In recent years, investigative journalism has been confronted with two major challenges: (1) vast amounts of unstructured data originating from large text collections such as leaks or answers to Freedom of Information requests, and (2) multi-lingual data due to intensified global cooperation and communication in politics, business and civil society. Faced with these challenges, journalists are increasingly cooperating in international networks. To support such collaborations, we present new/s/leak 2.0, the new version of our open-source software for content-based searching of leaks. It includes three novel main features: (1) automatic language detection and language-dependent information extraction for 40 languages, (2) entity and keyword visualization for efficient exploration, and (3) decentralized deployment for analysis of confidential data in various formats. We illustrate the new analysis capabilities with an exemplary case study.


RANLP 2017 - Biomedical NLP Workshop | 2017

Entity-Centric Information Access with the Human-in-the-Loop for the Biomedical Domains

Seid Muhie Yimam; Steffen Remus; Alexander Panchenko; Andreas Holzinger; Chris Biemann

In this paper, we describe the concept of entity-centric information access for the biomedical domain. With entity recognition technologies approaching acceptable levels of accuracy, we put forward a paradigm of document browsing and searching where the entities of the domain and their relations are explicitly modeled to give users the possibility of collecting exhaustive information on relations of interest. We describe three working prototypes along these lines: NEW/S/LEAK, which was developed for investigative journalists who need a quick overview of large leaked document collections; STORYFINDER, which is a personalized organizer for information found in web pages that allows adding entities as well as relations, and is capable of personalized information management; and adaptive annotation capabilities of WEBANNO, which is a general-purpose linguistic annotation tool. We will discuss future steps towards the adaptation of these tools to biomedical data, which is the subject of a recently started project on biomedical knowledge acquisition. A key difference to other approaches is the centering around the user in a Human-in-the-Loop machine learning approach, where users define and extend categories and enable the system to improve via feedback and interaction.


Archive | 2017

Collaborative Web-Based Tools for Multi-layer Text Annotation

Chris Biemann; Kalina Bontcheva; Richard Eckart de Castilho; Iryna Gurevych; Seid Muhie Yimam

Effectively managing the collaboration of many annotators is a crucial ingredient for the success of larger annotation projects. For collaboration, web-based tools offer a low-barrier way of gathering annotations from distributed contributors. While the management structure of annotation tools is more or less stable across projects, the kinds of annotations vary widely between projects. The challenge for web-based tools for multi-layer text annotation is to combine ease of use and availability through the web with maximal flexibility regarding the types and layers of annotations. In this chapter, we outline requirements for web-based annotation tools in detail and review a variety of tools with respect to these requirements. Further, we discuss two web-based multi-layer annotation tools in detail: GATE Teamware and WebAnno. While differing in some aspects, both tools largely fulfill the requirements for today’s web-based annotation tools. Finally, we point out further directions, such as increased schema flexibility and tighter integration of automation for annotation suggestions.


EuroVA@EuroVis | 2017

Guidance for Multi-Type Entity Graphs from Text Collections.

Martin Müller; Kathrin Ballweg; Tatiana von Landesberger; Seid Muhie Yimam; Uli Fahrer; Chris Biemann; Marcel Rosenbach; Michaela Regneri; Heiner Ulrich

The visual exploration of graphs encoding relationships between entities of multiple types (e.g., persons, locations, ...) supports journalists in finding newsworthy information in large text collections. Journalists may have interest in certain entity types or their relations such as locations or person-person relations. This interest may change during the exploration process. The exploration of such large graphs is often supported by guidance using a degree-of-interest (DOI) function. Although many DOIs exist, they do not differentiate entity types, rely on additional data, or require complex settings that overburden the journalists. We present a novel DOI for graphs with multiple types of entities. We show the interesting subgraph around the focal node and offer information about possible further steps. The user can interactively set her interest in entity types and entity relations. We apply our approach to a graph extracted from WikiLeaks PlusD Cablegate documents and report on journalists’ feedback.
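As a rough illustration of what a type-aware DOI function can look like, the sketch below uses the classic fisheye-style form (a priori interest minus distance from the focus) and lets a user-set weight per entity type play the role of the a priori interest. This is not the DOI proposed in the paper, whose definition is not given in the abstract; the function names, the graph, and the weights are all hypothetical.

```python
from collections import deque


def hop_distances(adjacency, focus):
    """Breadth-first search: number of hops from the focal node to each reachable node."""
    dist = {focus: 0}
    queue = deque([focus])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def doi_scores(adjacency, node_types, type_interest, focus):
    """Assumed scoring rule: user interest in the node's type minus hop distance."""
    dist = hop_distances(adjacency, focus)
    return {
        node: type_interest.get(node_types[node], 0.0) - d
        for node, d in dist.items()
    }


def interesting_subgraph(adjacency, node_types, type_interest, focus, k):
    """Return the k highest-scoring nodes (ties broken by node name)."""
    scores = doi_scores(adjacency, node_types, type_interest, focus)
    return sorted(scores, key=lambda n: (-scores[n], n))[:k]


# Hypothetical entity graph with two entity types.
adjacency = {
    "person_a": ["city_x", "person_b"],
    "city_x": ["person_a", "city_y"],
    "person_b": ["person_a", "city_z"],
    "city_y": ["city_x"],
    "city_z": ["person_b"],
}
node_types = {
    "person_a": "PER", "person_b": "PER",
    "city_x": "LOC", "city_y": "LOC", "city_z": "LOC",
}

# A user mostly interested in persons.
top_nodes = interesting_subgraph(
    adjacency, node_types, {"PER": 2.0, "LOC": 0.5}, focus="person_a", k=3
)
```

Changing the `type_interest` weights re-ranks the neighborhood without touching the graph, which mirrors the interactive setting of interest in entity types described in the abstract.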

Collaboration

Top Co-Authors

Iryna Gurevych
Technische Universität Darmstadt

Martin Riedl
Technische Universität Darmstadt

Richard Eckart de Castilho
Technische Universität Darmstadt

Tatiana von Landesberger
Technische Universität Darmstadt

Kathrin Ballweg
Technische Universität Darmstadt

Sanja Štajner
University of Wolverhampton

Darina Benikova
Technische Universität Darmstadt