Klemens Muthmann
Dresden University of Technology
Publications
Featured research published by Klemens Muthmann.
European Conference on Web Services | 2009
Marius Feldmann; Tobias Nestler; Klemens Muthmann; Uwe Jugel; Gerald Hübsch; Alexander Schill
Developing service-based interactive applications is a time-consuming and nontrivial task. The idea of annotating Web Services with information fragments used to automatically derive parts of interactive applications promises to simplify this task, enabling the creation of service-based interactive applications by end-users without any implementation skills. The paper discusses a model-driven approach for generating executable service-based interactive applications directly from the output of a visual authoring tool. Besides introducing details of the model-driven methodology, this paper makes two central contributions: first, it presents technical details of the developed end-user-enabled authoring tool; second, it describes the meta-model that serves both as the serialization format of the authoring tool and as input for the application generation approach.
International Database Engineering and Applications Symposium | 2009
Klemens Muthmann; Wojciech M. Barczynski; Falk Brauer; Alexander Löser
Current forum search technologies lack the ability to identify threads with near-duplicate content and to group these threads in the search results. As a consequence, forum users are overloaded with duplicated search results and prefer to create new threads rather than searching for existing ones. In this paper we therefore identify common reasons leading to near-duplicates and develop a new near-duplicate detection algorithm for forum threads. The algorithm is evaluated in a large case study of a real-world forum serving more than one million users. We compare this work with current algorithms, similar to [4, 5], for detecting near-duplicates on machine-generated web pages. Our preliminary results show that we significantly outperform these algorithms and that we are able to group forum threads with a precision of 74%.
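The abstract does not reproduce the algorithm, but the general idea of text-level near-duplicate detection can be illustrated with a minimal word-shingling/Jaccard sketch. All names, the window size, and the threshold below are illustrative, not the authors' implementation:

```python
def shingles(text, w=3):
    """Split text into overlapping word w-grams (shingles)."""
    words = text.lower().split()
    if len(words) < w:
        return {tuple(words)}
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}

def jaccard(a, b):
    """Jaccard similarity of two shingle sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def near_duplicate_pairs(threads, threshold=0.4, w=3):
    """Return index pairs of threads whose shingle overlap reaches the threshold."""
    sets = [shingles(t, w) for t in threads]
    return [(i, j)
            for i in range(len(sets))
            for j in range(i + 1, len(sets))
            if jaccard(sets[i], sets[j]) >= threshold]
```

In practice a forum-scale system would avoid the pairwise comparison, e.g. via MinHash signatures; the sketch only shows the similarity notion itself.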
Document Recognition and Retrieval | 2012
Daniel Esser; Daniel Schuster; Klemens Muthmann; Michael Stübert Berger; Alexander Schill
Archiving official written documents such as invoices, reminders, and account statements is becoming increasingly important in both business and private settings. Creating appropriate index entries for document archives, such as the sender's name, creation date, or document number, is tedious manual work. We present a novel approach to automatic indexing of documents based on generic positional extraction of index terms. For this purpose we use the knowledge of document templates, stored in a common full-text search index, to find index positions that were successfully extracted in the past.
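The positional idea can be sketched roughly as follows: reuse the field positions of the most similar previously seen template on a new document of the same layout. The template texts, the (line, word) position encoding, and the overlap score are made up for illustration and are not the authors' implementation:

```python
def token_overlap(a, b):
    """Crude document similarity: Jaccard overlap of word sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def extract_by_position(document, templates):
    """templates: list of (template_text, {field: (line, word_index)}) pairs.
    Picks the most similar known template and reuses its stored positions."""
    _, positions = max(templates, key=lambda t: token_overlap(document, t[0]))
    lines = [line.split() for line in document.splitlines()]
    return {field: lines[li][wi]
            for field, (li, wi) in positions.items()
            if li < len(lines) and wi < len(lines[li])}
```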
International Conference on Document Analysis and Recognition | 2013
Daniel Schuster; Klemens Muthmann; Daniel Esser; Alexander Schill; Michael Stübert Berger; Christoph Weidling; Kamil Aliyev; Andreas Hofmeier
Automatic information extraction from scanned business documents is especially valuable in the application domain of document archiving. However, current systems for automated document processing still require considerable configuration work that can only be done by experienced users or administrators. We present an approach to information extraction that builds purely on end-user-provided training examples and intentionally omits efficient, well-known extraction techniques, such as rule-based extraction, that require intense training and/or information extraction expertise. Our evaluation on a large corpus of business documents shows competitive results of above 85% F1-measure on 10 commonly used fields such as document type, sender, receiver, and date. The system is deployed and used inside the commercial document management system DocuWare.
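For reference, an F1-measure over extracted fields can be computed as in the generic micro-averaged sketch below; this is one common convention and is not necessarily the paper's exact evaluation protocol:

```python
def field_f1(predicted, gold):
    """Micro-averaged F1 over a document's extracted fields: a prediction
    counts as correct only if field name and value both match the gold data."""
    tp = sum(1 for field, value in predicted.items() if gold.get(field) == value)
    fp = len(predicted) - tp
    fn = sum(1 for field in gold if predicted.get(field) != gold[field])
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)
```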
International Conference on Move to Meaningful Internet Systems | 2010
Klemens Muthmann; Alexander Löser
A web forum is a large database of community knowledge, containing information on the most recent events and developments. Unfortunately, this knowledge is presented in a format easily understood by humans but not automatically by machines. However, observing several forums over a long period suggests that there are several distinct types of postings and relations between them. One frequently occurring and very annoying relation between two contributions is the near-duplicate relation. In this paper we propose work to detect and utilize contribution relations, concentrating on near-duplication. We propose ideas on how to calculate similarity, build groups of similar threads, and thus make near-duplicates in forums evident. One of the core theses is that it is possible to use information from forum and thread structure to improve existing near-duplicate detection approaches. In addition, the proposed work shows the qualitative and quantitative results of applying such principles, thereby identifying which features are really useful in the near-duplicate detection process. Also proposed are several sample applications that benefit from forum near-duplicate detection.
International Conference on Enterprise Information Systems | 2014
Daniel Esser; Daniel Schuster; Klemens Muthmann; Alexander Schill
The automatic extraction of relevant information from business documents (sender, recipient, date, etc.) is a valuable task in the application domain of document management and archiving. Although current scientific and commercial self-learning solutions for document classification and extraction work quite well, they still require substantial on-site configuration by domain experts and administrators. Small office/home office (SOHO) users and private individuals often do not benefit from such systems. Low extraction efficiency, especially in the starting period due to the small number of initially available example documents, and the high effort needed to annotate new documents drastically lower their acceptance of a self-learning information extraction system. We therefore present a solution for information extraction that fits the requirements of these users. It adapts the idea of one-shot learning from computer vision to the domain of business document processing and requires only a minimal number of training documents to reach competitive extraction efficiency. Our evaluation on a document set of 12,500 documents following 399 different layouts/templates shows extraction results of 88% F1 score on 10 commonly used fields such as document type, sender, recipient, and date. We already reach an F1 score of 78% with only one document of each template in the training set.
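The one-shot flavour of the problem can be illustrated with a toy nearest-neighbour classifier: given a single labelled example per template, a new document is assigned to the template whose one example it most resembles. The labels, example texts, and word-overlap measure are purely illustrative assumptions, not the paper's method:

```python
def classify_template(document, examples):
    """examples: {template_label: single_example_text}. Assigns the document
    to the template whose one known example shares the most words with it."""
    def overlap(a, b):
        ta, tb = set(a.lower().split()), set(b.lower().split())
        return len(ta & tb) / len(ta | tb) if ta | tb else 0.0
    return max(examples, key=lambda label: overlap(document, examples[label]))
```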
IEEE/ACM International Conference on Utility and Cloud Computing | 2014
Tenshi Hara; Thomas Springer; Klemens Muthmann; Alexander Schill
In the course of the last few years, crowdsourcing has received growing research attention due to its concept of solving complex tasks with the help of a flexible group of contributors, each of whom only needs to contribute a simpler part of the task. The crowd can thus contribute by collecting data from distributed locations, completing map information, voting on product ideas, et cetera. However, even though the participation of large numbers of users with heterogeneous devices is a necessary conceptual feature, generic infrastructures for crowdsourcing can hardly be found. For example, the management of users, mobile devices, and contributed data has to be re-implemented in every new project. To ease the development of crowdsourcing applications, in this paper we propose a generic platform for simplified crowdsourcing deployment that supports diverse crowdsourcing scenarios, the ability to handle large numbers of users, and the involvement of heterogeneous mobile devices. The focus is put on the deployment process. Hence, the evaluation is based on an actual deployment, namely the migration of Cyface, an existing crowdsourcing project built from scratch, to our proposed infrastructure.
Document Recognition and Retrieval | 2013
Daniel Schuster; Marcel Hanke; Klemens Muthmann; Daniel Esser
Current systems for automatic extraction of index terms from business documents take either a rule-based or a training-based approach. As both approaches have their advantages and disadvantages, it seems natural to combine both methods to get the best of both worlds. We present a combination method with the steps selection, normalization, and combination, based on comparable scores produced during extraction. Furthermore, novel evaluation metrics are developed to support the assessment of each step in an existing extraction system. Our methods were evaluated on an example extraction system with three individual extractors and a corpus of 12,000 scanned business documents.
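A minimal sketch of the normalization-and-combination idea: rescale each extractor's raw scores so they become comparable, then keep the best-scoring candidate per field. Min-max normalization and a per-field maximum are one plausible choice here, assumed for illustration; the paper's actual scoring is not reproduced:

```python
def min_max_normalize(scores):
    """Rescale one extractor's raw scores into [0, 1] so that scores
    from different extractors become comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {k: (v - lo) / span for k, v in scores.items()}

def combine(extractor_outputs):
    """extractor_outputs: list of {field: (candidate, raw_score)} dicts,
    one per extractor. Keeps the candidate with the highest normalized
    score for each field."""
    best = {}
    for output in extractor_outputs:
        norm = min_max_normalize({f: s for f, (_, s) in output.items()})
        for field, (candidate, _) in output.items():
            if field not in best or norm[field] > best[field][1]:
                best[field] = (candidate, norm[field])
    return {field: candidate for field, (candidate, _) in best.items()}
```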
Document Engineering | 2013
Daniel Esser; Klemens Muthmann; Daniel Schuster
Businesses and large organizations currently prefer scanners for incorporating paper documents into their electronic document archives. While cameras integrated into mobile devices such as smartphones and tablets are commonly available, it is still unclear how using a mobile device for document capture influences document content recognition. This is especially important for information extraction carried out on documents captured in a mobile scenario. This paper therefore presents a set of experiments comparing automatic index data extraction from business documents in a static and in a mobile setting. The paper shows which decline in extraction quality one can expect, explains the reasons, and gives a short overview of possible solutions.
Information Integration and Web-based Applications & Services | 2011
Sandro Reichert; David Urbansky; Klemens Muthmann; Philipp Katz; Matthias Wauer; Alexander Schill
Web feeds allow users to retrieve new content from pages on the World Wide Web. Feeds are offered by a multitude of web pages, ranging from conventional news sites to pages with user-generated content such as wikis, forums, or personal blogs. They notify interested readers of new content and are therefore interesting for information retrieval tasks. Unfortunately, no comprehensive dataset of feeds is publicly available, making it difficult for researchers to work with this kind of data and, more importantly, to compare their research results on a common dataset. In this work we present an extensive real-world dataset of 200,000 diversified feeds, as well as an analysis thereof. The dataset was collected over a time span of four weeks, yielding over 54 million entries and 100 GB of compressed data. One important outcome of the analysis is that feeds show different activity patterns that should be considered by aggregators, such as feed reader software, to improve polling strategies. The dataset has been made publicly available for use by research communities around the world.
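One way an aggregator could exploit such activity patterns is to adapt its polling interval to a feed's observed publication rate. The sketch below (timestamps in seconds, averaging of recent gaps, and the clamping bounds are all illustrative assumptions, not a strategy from the paper) shows the basic idea:

```python
def next_poll_interval(entry_timestamps, min_interval=60, max_interval=86400):
    """Adaptive polling sketch: poll a feed at roughly the average gap
    (in seconds) between its recent entries, clamped to sane bounds."""
    if len(entry_timestamps) < 2:
        return max_interval  # no activity pattern known yet
    ts = sorted(entry_timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    average_gap = sum(gaps) / len(gaps)
    return max(min_interval, min(max_interval, average_gap))
```

A production aggregator would additionally use HTTP conditional requests (ETag / If-Modified-Since) to keep even the polls it does make cheap.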