Michaela Regneri
Saarland University
Publication
Featured research published by Michaela Regneri.
European Conference on Computer Vision | 2012
Marcus Rohrbach; Michaela Regneri; Mykhaylo Andriluka; Sikandar Amin; Manfred Pinkal; Bernt Schiele
State-of-the-art human activity recognition methods build on discriminative learning, which requires a representative training set for good performance. This leads to scalability issues for the recognition of large sets of highly diverse activities. In this paper we leverage the fact that many human activities are compositional and that the essential components of the activities can be obtained from textual descriptions or scripts. To share and transfer knowledge between composite activities, we model them by a common set of attributes corresponding to basic actions and object participants. This attribute representation allows us to incorporate script data that delivers new variations of a composite activity or even describes unseen composite activities. In our experiments on 41 composite cooking tasks, we found that script data successfully captures the high variability of composite activities. We show improvements in a supervised case where training data for all composite cooking tasks is available, and we are also able to recognize unseen composites using only script data, without any manual video annotation.
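The idea of recognizing an unseen composite activity from script text alone can be illustrated with a deliberately simplified sketch. It assumes that per-attribute scores for a video are already available (for example from hypothetical attribute detectors), derives an attribute vector for each composite by plain word matching over its scripts, and picks the composite with the highest cosine similarity. The word matching, the cosine scoring and all names here are illustrative assumptions, not the model described in the paper.

```python
import math
from collections import Counter


def script_attribute_vector(scripts, attributes):
    """Fraction of scripts that mention each attribute (basic action or object)."""
    counts = Counter()
    for script in scripts:
        tokens = set(script.lower().split())
        for attr in attributes:
            if attr in tokens:
                counts[attr] += 1
    return [counts[a] / len(scripts) for a in attributes]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recognize(video_attr_scores, composite_scripts, attributes):
    """Return the composite whose script-derived vector best matches the video's attribute scores.

    No video training data per composite is needed: only its textual scripts.
    """
    best, best_sim = None, -1.0
    for name, scripts in composite_scripts.items():
        sim = cosine(video_attr_scores, script_attribute_vector(scripts, attributes))
        if sim > best_sim:
            best, best_sim = name, sim
    return best


attributes = ["cut", "peel", "stir", "cucumber", "egg", "pan"]
composite_scripts = {
    "prepare cucumber": ["take cucumber", "peel cucumber", "cut cucumber"],
    "scramble egg": ["crack egg", "stir egg", "heat pan", "stir egg in pan"],
}
print(recognize([0.9, 0.7, 0.1, 0.8, 0.0, 0.1], composite_scripts, attributes))
```

Adding a new composite in this sketch only requires adding its scripts to `composite_scripts`; no attribute scorer has to be retrained.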
International Journal of Computer Vision | 2016
Marcus Rohrbach; Anna Rohrbach; Michaela Regneri; Sikandar Amin; Mykhaylo Andriluka; Manfred Pinkal; Bernt Schiele
Activity recognition has shown impressive progress in recent years. However, the challenges of detecting fine-grained activities and understanding how they are combined into composite activities have been largely overlooked. In this work we approach both tasks and present a dataset which provides detailed annotations to address them. The first challenge is to detect fine-grained activities, which are defined by low inter-class variability and are typically characterized by fine-grained body motions. We explore how human pose and hands can help to approach this challenge by comparing two pose-based and two hand-centric features with state-of-the-art holistic features. To attack the second challenge, recognizing composite activities, we leverage the fact that these activities are compositional and that the essential components of the activities can be obtained from textual descriptions or scripts. We show the benefits of our hand-centric approach for fine-grained activity classification and detection. For composite activity recognition we find that decomposition into attributes allows sharing information across composites and is essential to attack this hard task. Using script data we can recognize novel composites without having training data for them.
Proceedings of the 2014 Workshop on the Use of Computational Methods in the Study of Endangered Languages | 2014
Guy Emerson; Liling Tan; Susanne Fertmann; Alexis Palmer; Michaela Regneri
A broad-coverage corpus such as the Human Language Project envisioned by Abney and Bird (2010) would be a powerful resource for the study of endangered languages. Existing corpora are limited in the range of languages covered, in standardisation, or in machine-readability. In this paper we present SeedLing, a seed corpus for the Human Language Project. We first survey existing efforts to compile cross-linguistic resources, then describe our own approach. To build the foundation text for a Universal Corpus, we crawl and clean texts from several web sources that contain data from a large number of languages, and convert them into a standardised form consistent with the guidelines of Abney and Bird (2011). The resulting corpus is more easily accessible and machine-readable than any of the underlying data sources, and, with data from 1451 languages covering 105 language families, represents a significant base corpus for researchers to draw on and add to in the future. To demonstrate the utility of SeedLing for cross-lingual computational research, we use our data in the test application of detecting similar languages.
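The abstract does not say how language similarity was computed. One common, simple approach, shown here only as an assumed illustration, compares character trigram profiles of sample texts with cosine similarity; `trigram_profile`, `most_similar` and the toy inputs are hypothetical names, not part of SeedLing.

```python
import math
from collections import Counter


def trigram_profile(text):
    """Character trigram counts; two padding spaces mark the text boundaries."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(p, q):
    """Cosine similarity between two sparse count profiles."""
    dot = sum(count * q[gram] for gram, count in p.items())  # missing grams count as 0
    norm_p = math.sqrt(sum(c * c for c in p.values()))
    norm_q = math.sqrt(sum(c * c for c in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def most_similar(target, corpus):
    """Return the language code whose sample text is closest to the target's sample."""
    target_profile = trigram_profile(corpus[target])
    scores = {
        lang: cosine(target_profile, trigram_profile(text))
        for lang, text in corpus.items()
        if lang != target
    }
    return max(scores, key=scores.get)
```

Called as `most_similar(code, samples)` with one sample text per language code, it returns the code with the closest trigram profile. With real data, longer samples per language give more stable profiles.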
Meeting of the Association for Computational Linguistics | 2008
Michaela Regneri; Markus Egg; Alexander Koller
Underspecification-based algorithms for processing partially disambiguated discourse structure must cope with extremely high numbers of readings. Based on previous work on dominance graphs and weighted tree grammars, we provide the first method for computing an underspecified discourse description and a best discourse representation efficiently enough to process even the longest discourses in the RST Discourse Treebank.
Meeting of the Association for Computational Linguistics | 2016
Seid Muhie Yimam; Heiner Ulrich; Tatiana von Landesberger; Marcel Rosenbach; Michaela Regneri; Alexander Panchenko; Franziska Lehmann; Uli Fahrer; Chris Biemann; Kathrin Ballweg
We present new/s/leak, a novel tool developed for and with the help of journalists, which enables the automatic analysis and discovery of newsworthy stories from large textual datasets. We rely on different NLP preprocessing steps such as named entity tagging, extraction of time expressions, entity networks, relations and metadata. The system features an intuitive web-based user interface based on network visualization combined with data exploration methods and various search and faceting mechanisms. We report the current state of the software and exemplify it with the WikiLeaks PlusD (Cablegate) data.
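To make the notion of an entity network concrete, the following sketch assumes that named entities have already been extracted per document and weights an edge by the number of documents in which two entities co-occur. Document-level co-occurrence is an assumed weighting chosen for illustration; the abstract does not specify how new/s/leak builds its networks, and `entity_network` and `neighbors` are hypothetical helpers.

```python
from collections import Counter
from itertools import combinations


def entity_network(documents):
    """Undirected weighted edges: weight = number of documents where both entities appear.

    documents: list of entity lists, one list per document (entities assumed pre-extracted).
    """
    edges = Counter()
    for entities in documents:
        # Sorting makes each pair a canonical (a, b) key regardless of mention order.
        for a, b in combinations(sorted(set(entities)), 2):
            edges[(a, b)] += 1
    return edges


def neighbors(edges, entity, k=3):
    """The k strongest neighbours of an entity, ties broken alphabetically."""
    scored = [(w, b if a == entity else a) for (a, b), w in edges.items() if entity in (a, b)]
    return [name for _, name in sorted(scored, key=lambda x: (-x[0], x[1]))[:k]]


docs = [["Berlin", "Alice", "Bob"], ["Alice", "Bob"], ["Berlin", "Paris"]]
edges = entity_network(docs)
print(neighbors(edges, "Alice"))
```

A visual front end would then render `edges` as a graph and let users expand a node through calls like `neighbors`.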
Archive | 2013
Michaela Regneri
This thesis proposes new techniques for mining scripts. Scripts are essential pieces of common sense knowledge that contain information about everyday scenarios (like going to a restaurant), namely the events that usually happen in a scenario (entering, sitting down, reading the menu...), their typical order (ordering happens before eating), and the participants of these events (customer, waiter, food...). Because many conventionalized scenarios are shared common sense knowledge and thus are usually not described in standard texts, we propose to elicit sequential descriptions of typical scenario instances via crowdsourcing over the internet. This approach overcomes the implicitness problem and, at the same time, is scalable to large data collections. To generalize over the input data, we need to mine event and participant paraphrases from the textual sequences. For this task we make use of the structural commonalities in the collected sequential descriptions, which yields much more accurate paraphrases than approaches that do not take structural constraints into account. We further apply the algorithm we developed for event paraphrasing to parallel standard texts for extracting sentential paraphrases and paraphrase fragments. In this case we consider the discourse structure in a text as a sequential event structure. As for event paraphrasing, the structure-aware paraphrasing approach clearly outperforms systems that do not consider discourse structure. As a multimodal application, we develop a new resource in which textual event descriptions are grounded in videos, which enables new investigations on action description semantics and a more accurate modeling of event description similarities. This grounding approach also opens up new possibilities for applying the computed script knowledge for automated event recognition in videos.
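How structural commonalities help paraphrase mining can be illustrated with a small global sequence alignment over two event sequences. The sketch assumes pairwise Needleman-Wunsch alignment, Jaccard word overlap as the event similarity, and arbitrary `gap` and `offset` values; the abstract does not give the thesis's actual alignment setup or similarity measure, so these are illustrative choices. Events that end up aligned are treated as paraphrase candidates, even when they share no words.

```python
def jaccard(a, b):
    """Word-overlap similarity between two event descriptions."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def align(seq1, seq2, gap=-0.2, offset=0.2):
    """Needleman-Wunsch global alignment of two event sequences.

    A match scores jaccard(...) - offset, a skipped event scores `gap`.
    Returns the aligned (event1, event2) pairs in sequence order.
    """
    n, m = len(seq1), len(seq2)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0], back[i][0] = i * gap, "up"
    for j in range(1, m + 1):
        score[0][j], back[0][j] = j * gap, "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [
                (score[i - 1][j - 1] + jaccard(seq1[i - 1], seq2[j - 1]) - offset, "diag"),
                (score[i - 1][j] + gap, "up"),
                (score[i][j - 1] + gap, "left"),
            ]
            # max() keeps the first option on ties, so matches are preferred.
            score[i][j], back[i][j] = max(options, key=lambda o: o[0])
    # Trace back from the bottom-right cell, collecting matched pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        move = back[i][j]
        if move == "diag":
            pairs.append((seq1[i - 1], seq2[j - 1]))
            i, j = i - 1, j - 1
        elif move == "up":
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


seq1 = ["enter restaurant", "sit down", "read menu", "order food", "eat food", "pay bill"]
seq2 = ["walk into restaurant", "take a seat", "look at menu", "order meal", "eat", "pay the bill"]
for pair in align(seq1, seq2):
    print(pair)
```

Here "sit down" and "take a seat" share no words, but their positions between well-matched neighbours align them, which is the kind of structural evidence a purely lexical comparison cannot use.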
EuroVA@EuroVis | 2017
Martin Müller; Kathrin Ballweg; Tatiana von Landesberger; Seid Muhie Yimam; Uli Fahrer; Chris Biemann; Marcel Rosenbach; Michaela Regneri; Heiner Ulrich
The visual exploration of graphs encoding relationships between entities of multiple types (e.g., persons, locations, ...) supports journalists in finding newsworthy information in large text collections. Journalists may have interest in certain entity types or their relations, such as locations or person-person relations. This interest may change during the exploration process. The exploration of such large graphs is often supported by guidance using a degree-of-interest (DOI) function. Although many DOIs exist, they do not differentiate entity types, rely on additional data, or require complex settings that overburden the journalists. We present a novel DOI for graphs with multiple types of entities. We show the interesting subgraph around the focal node and offer information about possible further steps. The user can interactively set her interest in entity types and entity relations. We apply our approach to a graph extracted from WikiLeaks PlusD Cablegate documents and report on journalists’ feedback.
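A type-aware degree-of-interest function can be sketched as follows. The formula is a hypothetical stand-in, not the DOI defined in the paper: each node reached by breadth-first search from the focal node scores the user's interest in its entity type, times the interest in the relation type of the edge that reached it, times a distance decay. The top-k nodes form the interesting subgraph; changing the interest settings re-ranks it.

```python
from collections import deque


def doi_subgraph(graph, node_type, focus, type_interest, relation_interest,
                 k=3, decay=0.5, max_depth=2):
    """Rank nodes around `focus` by a hypothetical degree-of-interest score.

    graph: node -> list of (neighbor, relation_type)
    DOI(v) = type_interest[type(v)] * relation_interest[relation] * decay ** depth,
    where relation is the edge through which BFS first reached v.
    """
    doi, seen = {}, {focus}
    queue = deque([(focus, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for neighbor, relation in graph.get(node, []):
            if neighbor in seen:
                continue
            seen.add(neighbor)
            doi[neighbor] = (type_interest.get(node_type[neighbor], 0.0)
                             * relation_interest.get(relation, 0.0)
                             * decay ** (depth + 1))
            queue.append((neighbor, depth + 1))
    return sorted(doi, key=lambda v: (-doi[v], v))[:k]


graph = {
    "Alice": [("Bob", "person-person"), ("Berlin", "person-location"),
              ("ACME", "person-organization")],
    "Bob": [("Paris", "person-location")],
}
node_type = {"Alice": "person", "Bob": "person", "Berlin": "location",
             "ACME": "organization", "Paris": "location"}
relation_interest = {"person-person": 1.0, "person-location": 0.5, "person-organization": 0.8}
print(doi_subgraph(graph, node_type, "Alice",
                   {"person": 1.0, "location": 0.2, "organization": 0.5}, relation_interest))
```

Raising the interest in locations in this sketch pulls "Berlin" ahead of "ACME" without any change to the graph.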
Proceedings of the 7th Workshop on Cognitive Aspects of Computational Language Learning | 2016
Michaela Regneri; Diane King
We present a study about automated discourse analysis of oral narrative language in adolescents with autistic spectrum disorder (ASD). The basis of this evaluation is an existing dataset of fictional narrations of individuals with ASD and two matched comparison groups. We use three robust measures for quantifying different aspects of text cohesion on this corpus. These measures and several combinations of them correlate strongly with human cohesion annotations. Our evaluation will show which of these also distinguish the ASD group from the two comparison groups, which do not, and which differences are related to language competence rather than to factors specific to ASD.
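The abstract does not name its three cohesion measures. As an assumed example of what an automatic cohesion measure can look like, the sketch below computes mean word overlap between consecutive sentences, a simple local-cohesion proxy; it is not claimed to be one of the measures used in the study.

```python
import re


def adjacent_overlap(text):
    """Mean Jaccard word overlap between consecutive sentences.

    Higher values mean adjacent sentences repeat more vocabulary,
    a crude proxy for local lexical cohesion.
    """
    sentences = [s for s in re.split(r"[.!?]+\s*", text.lower()) if s.strip()]
    words = [set(re.findall(r"[a-z']+", s)) for s in sentences]
    if len(words) < 2:
        return 0.0
    scores = [len(a & b) / len(a | b) if a | b else 0.0
              for a, b in zip(words, words[1:])]
    return sum(scores) / len(scores)


print(adjacent_overlap("The frog jumped. The frog was happy."))
```

A measure like this ignores pronouns and synonyms, which is one reason studies of cohesion typically combine several measures and validate them against human annotations.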
Meeting of the Association for Computational Linguistics | 2014
Anjana Sofia Vakil; Max Paulus; Alexis Palmer; Michaela Regneri
This paper describes lex4all, an open-source PC application for the generation and evaluation of pronunciation lexicons in any language. With just a few minutes of recorded audio and no expert knowledge of linguistics or speech technology, individuals or organizations seeking to create speech-driven applications in low-resource languages can build lexicons enabling the recognition of small vocabularies (roughly up to 100 terms) in the target language using an existing recognition engine designed for a high-resource source language (e.g. English). To build such lexicons, we employ an existing method for cross-language phoneme-mapping. The application also offers a built-in audio recorder that facilitates data collection, a significantly faster implementation of the phoneme-mapping technique, and an evaluation module that expedites research on small-vocabulary speech recognition for low-resource languages.
Proceedings of the 2014 Workshop on the Use of Computational Methods in the Study of Endangered Languages | 2014
Alexis Palmer; Michaela Regneri
This paper describes a local effort to bridge the gap between computational and documentary linguistics by teaching students and young researchers in computational linguistics about doing research and developing systems for low-resource languages. We describe four student software projects developed within one semester. The projects range from a front-end for building small-vocabulary speech recognition systems, to a broad-coverage (more than 1000 languages) language identification system, to language-specific systems: a lemmatizer for the Mayan language Uspanteko and named entity recognition systems for both Slovak and Persian. Teaching efforts such as these are an excellent way to develop not only tools for low-resource languages, but also computational linguists well-equipped to work on endangered and low-resource languages.