Anca Dumitrache
VU University Amsterdam
Publications
Featured research published by Anca Dumitrache.
KSII Transactions on Internet and Information Systems | 2018
Anca Dumitrache; Lora Aroyo; Chris Welty
Cognitive computing systems require human labeled data for evaluation and often for training. The standard practice used in gathering this data minimizes disagreement between annotators, and we have found this results in data that fails to account for the ambiguity inherent in language. We have proposed the CrowdTruth method for collecting ground truth through crowdsourcing, which reconsiders the role of people in machine learning based on the observation that disagreement between annotators provides a useful signal for phenomena such as ambiguity in the text. We report on using this method to build an annotated data set for medical relation extraction for the cause and treat relations, and how this data performed in a supervised training experiment. We demonstrate that by modeling ambiguity, labeled data gathered from crowd workers can (1) reach the level of quality of domain experts for this task while reducing the cost, and (2) provide better training data at scale than distant supervision. We further propose and validate new weighted measures for precision, recall, and F-measure, which account for ambiguity in both human and machine performance on this task.
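The CrowdTruth idea of treating inter-annotator disagreement as signal can be illustrated with a small sketch. This is a minimal illustration assuming the aggregation style described in the abstract (graded labels from multiple workers rather than majority vote); the function names and the toy data are hypothetical, not the paper's actual implementation.

```python
import math

def sentence_vector(worker_annotations, relations):
    # Sum the workers' per-sentence annotation vectors.
    # Each worker annotation is the set of relations that worker selected.
    return [sum(1 for ann in worker_annotations if r in ann) for r in relations]

def sentence_relation_score(worker_annotations, relations, relation):
    # Cosine of the aggregate sentence vector with the unit vector of one
    # relation: a graded (0..1) confidence instead of a hard label, so
    # ambiguous sentences get intermediate scores rather than being forced
    # to one side.
    vec = sentence_vector(worker_annotations, relations)
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0:
        return 0.0
    return vec[relations.index(relation)] / norm

# Toy example: 5 workers annotate one sentence for the cause/treat task.
relations = ["cause", "treat", "none"]
workers = [{"cause"}, {"cause"}, {"cause", "treat"}, {"treat"}, {"cause"}]
score = sentence_relation_score(workers, relations, "cause")
```

A score like this can then weight each example's contribution to precision, recall, and F-measure, which is the spirit of the ambiguity-aware measures the abstract proposes.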
european semantic web conference | 2015
Anca Dumitrache
This paper proposes an approach to gathering semantic annotation, which rejects the notion that human interpretation can have a single ground truth, and is instead based on the observation that disagreement between annotators can signal ambiguity in the input text, as well as how the annotation task has been designed. The purpose of this research is to investigate whether disagreement-aware crowdsourcing is a scalable approach to gather semantic annotation across various tasks and domains. We propose a methodology for answering this question that involves, for each task and domain: defining the crowdsourcing setup, experimental data collection, and evaluating both the setup and the results. We present initial results for the task of medical relation extraction, and propose an evaluation plan for crowdsourcing semantic annotation for several tasks and domains.
international conference on e-science | 2015
Carlos Martinez-Ortiz; Lora Aroyo; Oana Inel; Stavros Champilomatis; Anca Dumitrache; Benjamin Timmermans
Crowdsourcing has proved to be a feasible way of harnessing human computation for solving complex problems. However, crowdsourcing frequently faces various challenges: data handling, task reusability, and platform selection. Domain scientists rely on eScientists to find solutions for these challenges. CrowdTruth is a framework that builds on existing crowdsourcing platforms and provides an enhanced way to manage crowdsourcing tasks across platforms, offering solutions to commonly faced challenges. Provenance modeling provides a means for documenting and examining scientific workflows. CrowdTruth keeps a provenance trace of the data flow through the framework, making it possible to trace how data was transformed, and by whom, to reach its final state. In this way, eScientists have a tool to determine the impact that crowdsourcing has on enhancing their data.
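The provenance trace described above can be sketched as an append-only log attached to each data item. This is a hedged illustration of the general pattern, not CrowdTruth's actual API; all class and field names here are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ProvenanceRecord:
    # One step in a data item's history: what happened and who did it.
    activity: str   # e.g. "crowd_annotation", "aggregation"
    agent: str      # worker id, platform name, or processing component
    timestamp: str

@dataclass
class DataItem:
    content: object
    trace: list = field(default_factory=list)

    def transform(self, activity, agent, new_content):
        # Apply a transformation and append a provenance record, so the
        # full chain of transformations and agents stays inspectable.
        self.trace.append(ProvenanceRecord(
            activity, agent, datetime.now(timezone.utc).isoformat()))
        self.content = new_content
        return self

# A sentence flows through annotation and aggregation, leaving a trace
# of who touched it at each stage.
item = DataItem("ANTIBIOTICS treat INFECTION")
item.transform("crowd_annotation", "worker_42", {"treat": 1})
item.transform("aggregation", "metrics_component", {"treat": 0.9})
```

Replaying such a trace is what lets an eScientist see how crowdsourcing changed the data between its raw and final states.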
web science | 2013
Anca Dumitrache; Paul T. Groth; Peter van den Besselaar
Metrics play a key part in the assessment of scholars. These metrics are primarily computed using bibliometric data collected in offline procedures. In this work, we compare the usage of a publication database based on a Web crawl and a traditional publication database for computing scholarly metrics. We focus on metrics that determine the independence of researchers from their supervisor, which are used to assess the growth of young researchers. We describe two types of graphs that can be constructed from online data: the co-author network of the young researcher, and the combined topic network of the young researcher and their supervisor, together with a series of network properties that describe these graphs. Finally, we show that, for the purpose of discovering emerging talent, dynamic online publication resources provide better coverage than more traditional datasets, and more importantly, lead to very different results.
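One simple indicator in the spirit of the co-author networks described above is the overlap between the young researcher's and the supervisor's co-author sets. The Jaccard measure below is an assumption for illustration, not necessarily the paper's exact network property, and the names are made up.

```python
def coauthor_overlap(researcher_coauthors, supervisor_coauthors):
    # Jaccard overlap between two co-author sets: a rough proxy for how
    # entangled the young researcher's network still is with the
    # supervisor's. Lower overlap suggests greater independence.
    researcher = set(researcher_coauthors)
    supervisor = set(supervisor_coauthors)
    union = researcher | supervisor
    if not union:
        return 0.0
    return len(researcher & supervisor) / len(union)

# Toy example with hypothetical names:
young = ["A. Smith", "B. Jones", "C. Lee"]
senior = ["B. Jones", "D. Kim", "E. Park", "C. Lee"]
overlap = coauthor_overlap(young, senior)  # 2 shared of 5 distinct = 0.4
```

Computing such a property from a Web-crawled publication database versus a traditional one is exactly where the abstract reports the two sources diverge.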
international semantic web conference | 2014
Oana Inel; Khalid Khamkham; Tatiana Cristea; Anca Dumitrache; Arne Rutjes; Jelle van der Ploeg; Lukasz Romaszko; Lora Aroyo; Robert-Jan Sips
CrowdSem'13 Proceedings of the 1st International Conference on Crowdsourcing the Semantic Web - Volume 1030 | 2013
Anca Dumitrache; Lora Aroyo; Chris Welty; Robert-Jan Sips; Anthony Levas
BDM2I@ISWC | 2015
Anca Dumitrache; Lora Aroyo; Chris Welty
CEUR Workshop Proceedings | 2015
Anca Dumitrache; Lora Aroyo; Chris Welty
Crowdsourcing the Semantic Web | 2013
Anca Dumitrache; Lora Aroyo; Chris Welty; Robert-Jan Sips; Anthony Levas
arXiv: Human-Computer Interaction | 2018
Anca Dumitrache; Oana Inel; Benjamin Timmermans; Carlos Martinez-Ortiz; Robert-Jan Sips; Lora Aroyo; Chris Welty