Anca Dumitrache
VU University Amsterdam
Publications
Featured research published by Anca Dumitrache.
KSII Transactions on Internet and Information Systems | 2018
Anca Dumitrache; Lora Aroyo; Chris Welty
Cognitive computing systems require human labeled data for evaluation and often for training. The standard practice used in gathering this data minimizes disagreement between annotators, and we have found this results in data that fails to account for the ambiguity inherent in language. We have proposed the CrowdTruth method for collecting ground truth through crowdsourcing, which reconsiders the role of people in machine learning based on the observation that disagreement between annotators provides a useful signal for phenomena such as ambiguity in the text. We report on using this method to build an annotated data set for medical relation extraction for the cause and treat relations, and how this data performed in a supervised training experiment. We demonstrate that by modeling ambiguity, labeled data gathered from crowd workers can (1) reach the level of quality of domain experts for this task while reducing the cost, and (2) provide better training data at scale than distant supervision. We further propose and validate new weighted measures for precision, recall, and F-measure, which account for ambiguity in both human and machine performance on this task.
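The CrowdTruth idea of treating inter-annotator disagreement as signal can be illustrated with a small sketch. This is a minimal illustration assuming the aggregation style described in the abstract (graded labels from multiple workers rather than majority vote); the function names and the toy data are hypothetical, not the paper's actual implementation.

```python
import math

def sentence_vector(worker_annotations, relations):
    # Sum the workers' per-sentence annotation vectors.
    # Each worker annotation is the set of relations that worker selected.
    return [sum(1 for ann in worker_annotations if r in ann) for r in relations]

def sentence_relation_score(worker_annotations, relations, relation):
    # Cosine of the aggregate sentence vector with the unit vector of one
    # relation: a graded (0..1) confidence instead of a hard label, so
    # ambiguous sentences get intermediate scores rather than being forced
    # to one side.
    vec = sentence_vector(worker_annotations, relations)
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0:
        return 0.0
    return vec[relations.index(relation)] / norm

# Toy example: 5 workers annotate one sentence for the cause/treat task.
relations = ["cause", "treat", "none"]
workers = [{"cause"}, {"cause"}, {"cause", "treat"}, {"treat"}, {"cause"}]
score = sentence_relation_score(workers, relations, "cause")
```

A score like this can then weight each example's contribution to precision, recall, and F-measure, which is the spirit of the ambiguity-aware measures the abstract proposes.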
european semantic web conference | 2015
Anca Dumitrache
This paper proposes an approach to gathering semantic annotation, which rejects the notion that human interpretation can have a single ground truth, and is instead based on the observation that disagreement between annotators can signal ambiguity in the input text, as well as how the annotation task has been designed. The purpose of this research is to investigate whether disagreement-aware crowdsourcing is a scalable approach to gather semantic annotation across various tasks and domains. We propose a methodology for answering this question that involves, for each task and domain: defining the crowdsourcing setup, experimental data collection, and evaluating both the setup and the results. We present initial results for the task of medical relation extraction, and propose an evaluation plan for crowdsourcing semantic annotation for several tasks and domains.
international conference on e-science | 2015
Carlos Martinez-Ortiz; Lora Aroyo; Oana Inel; Stavros Champilomatis; Anca Dumitrache; Benjamin Timmermans
Crowdsourcing has proved to be a feasible way of harnessing human computation for solving complex problems. However, crowdsourcing frequently faces various challenges: data handling, task reusability, and platform selection. Domain scientists rely on eScientists to find solutions for these challenges. CrowdTruth is a framework that builds on existing crowdsourcing platforms and provides an enhanced way to manage crowdsourcing tasks across platforms, offering solutions to commonly faced challenges. Provenance modeling provides a means for documenting and examining scientific workflows. CrowdTruth keeps a provenance trace of the data flow through the framework, making it possible to trace how data was transformed, and by whom, to reach its final state. In this way, eScientists have a tool to determine the impact that crowdsourcing has on enhancing their data.
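The provenance trace described above can be sketched as an append-only log attached to each data item. This is a hedged illustration of the general pattern, not CrowdTruth's actual API; all class and field names here are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ProvenanceRecord:
    # One step in a data item's history: what happened and who did it.
    activity: str   # e.g. "crowd_annotation", "aggregation"
    agent: str      # worker id, platform name, or processing component
    timestamp: str

@dataclass
class DataItem:
    content: object
    trace: list = field(default_factory=list)

    def transform(self, activity, agent, new_content):
        # Apply a transformation and append a provenance record, so the
        # full chain of transformations and agents stays inspectable.
        self.trace.append(ProvenanceRecord(
            activity, agent, datetime.now(timezone.utc).isoformat()))
        self.content = new_content
        return self

# A sentence flows through annotation and aggregation, leaving a trace
# of who touched it at each stage.
item = DataItem("ANTIBIOTICS treat INFECTION")
item.transform("crowd_annotation", "worker_42", {"treat": 1})
item.transform("aggregation", "metrics_component", {"treat": 0.9})
```

Replaying such a trace is what lets an eScientist see how crowdsourcing changed the data between its raw and final states.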
web science | 2013
Anca Dumitrache; Paul T. Groth; Peter van den Besselaar
Metrics play a key part in the assessment of scholars. These metrics are primarily computed using bibliometric data collected in offline procedures. In this work, we compare the usage of a publication database based on a Web crawl and a traditional publication database for computing scholarly metrics. We focus on metrics that determine the independence of researchers from their supervisor, which are used to assess the growth of young researchers. We describe two types of graphs that can be constructed from online data: the co-author network of the young researcher, and the combined topic network of the young researcher and their supervisor, together with a series of network properties that describe these graphs. Finally, we show that, for the purpose of discovering emerging talent, dynamic online publication resources provide better coverage than more traditional datasets, and more importantly, lead to very different results.
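One simple indicator in the spirit of the co-author networks described above is the overlap between the young researcher's and the supervisor's co-author sets. The Jaccard measure below is an assumption for illustration, not necessarily the paper's exact network property, and the names are made up.

```python
def coauthor_overlap(researcher_coauthors, supervisor_coauthors):
    # Jaccard overlap between two co-author sets: a rough proxy for how
    # entangled the young researcher's network still is with the
    # supervisor's. Lower overlap suggests greater independence.
    researcher = set(researcher_coauthors)
    supervisor = set(supervisor_coauthors)
    union = researcher | supervisor
    if not union:
        return 0.0
    return len(researcher & supervisor) / len(union)

# Toy example with hypothetical names:
young = ["A. Smith", "B. Jones", "C. Lee"]
senior = ["B. Jones", "D. Kim", "E. Park", "C. Lee"]
overlap = coauthor_overlap(young, senior)  # 2 shared of 5 distinct = 0.4
```

Computing such a property from a Web-crawled publication database versus a traditional one is exactly where the abstract reports the two sources diverge.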
international semantic web conference | 2014
Oana Inel; Khalid Khamkham; Tatiana Cristea; Anca Dumitrache; Arne Rutjes; Jelle van der Ploeg; Lukasz Romaszko; Lora Aroyo; Robert-Jan Sips
CrowdSem'13 Proceedings of the 1st International Conference on Crowdsourcing the Semantic Web - Volume 1030 | 2013
Anca Dumitrache; Lora Aroyo; Chris Welty; Robert-Jan Sips; Anthony Levas
BDM2I@ISWC | 2015
Anca Dumitrache; Lora Aroyo; Chris Welty
CEUR Workshop Proceedings | 2015
Anca Dumitrache; Lora Aroyo; Chris Welty
Crowdsourcing the Semantic Web | 2013
Anca Dumitrache; Lora Aroyo; Chris Welty; Robert-Jan Sips; Anthony Levas
arXiv: Human-Computer Interaction | 2018
Anca Dumitrache; Oana Inel; Benjamin Timmermans; Carlos Martinez-Ortiz; Robert-Jan Sips; Lora Aroyo; Chris Welty