Michael Piotrowski | Researchain

Publication

Featured research published by Michael Piotrowski.

Synthesis Lectures on Human Language Technologies | 2012

Natural Language Processing for Historical Texts

Michael Piotrowski

More and more historical texts are becoming available in digital form. Digitization of paper documents is motivated by the aim of preserving cultural heritage and making it more accessible, both to laypeople and scholars. As digital images cannot be searched for text, digitization projects increasingly strive to create digital text, which can be searched and otherwise automatically processed, in addition to facsimiles. Indeed, the emerging field of digital humanities heavily relies on the availability of digital text for its studies. Together with the increasing availability of historical texts in digital form, there is a growing interest in applying natural language processing (NLP) methods and tools to historical texts. However, the specific linguistic properties of historical texts -- the lack of standardized orthography, in particular -- pose special challenges for NLP. This book aims to give an introduction to NLP for historical texts and an overview of the state of the art in this field. The book starts with an overview of methods for the acquisition of historical texts (scanning and OCR), discusses text encoding and annotation schemes, and presents examples of corpora of historical texts in a variety of languages. The book then discusses specific methods, such as creating part-of-speech taggers for historical languages or handling spelling variation. A final chapter analyzes the relationship between NLP and the digital humanities. Certain recently emerging textual genres, such as SMS, social media, and chat messages, or newsgroup and forum postings share a number of properties with historical texts, for example, nonstandard orthography and grammar, and profuse use of abbreviations. The methods and techniques required for the effective processing of historical texts are thus also of interest for research in other domains.

technical symposium on computer science education | 2006

EduComponents: experiences in e-assessment in computer science education

Mario Amelung; Michael Piotrowski; Dietmar F. Rösner

To reduce the workload of teachers and to improve the effectiveness of face-to-face courses, it is desirable to supplement them with Web-based tools. This paper presents our approach for supporting computer science education with software components which support the creation, management, submission, and assessment of assignments and tests, including the automatic assessment of programming exercises. These components are integrated into a general-purpose content management system (CMS) and can be combined with other components to create tailor-made learning environments. We describe the design and implementation of these components, and we report on our practical experience with deploying the software in our courses.

international conference on computational linguistics | 2008

Linguistic support for revising and editing

Cerstin Mahlow; Michael Piotrowski

Revising and editing are important parts of the writing process. In fact, multiple revision and editing cycles are crucial for the production of high-quality texts. However, revising and editing are also tedious and error-prone, since changes may introduce new errors. Grammar checkers, as offered by some word processors, are not a solution. Besides the fact that they are only available for a few languages, and regardless of the questionable quality, their conceptual approach is not suitable for experienced writers, who actively create their texts. Word processors offer few, if any, functions for handling text on the same cognitive level as the author: While the author is thinking in high-level linguistic terms, editors and word processors mostly provide low-level character-oriented functions. Mapping the intended outcome to these low-level operations is distracting for the author, who now has to focus for a long time on small parts of the text. This results in a loss of global overview of the text and in typical revision errors (duplicate verbs, extraneous conjunctions, etc.). We therefore propose functions for text processors that work on the conceptual level of writers. These functions operate on linguistic elements, not on lines and characters. We describe how these functions can be implemented by making use of NLP methods and linguistic resources.

geographic information retrieval | 2010

Towards mapping of alpine route descriptions

Michael Piotrowski; Samuel Läubli; Martin Volk

We describe a corpus of historic mountaineering accounts and on-going work on geocoding toponyms and route descriptions in these accounts. Mountaineering accounts contain a wealth of geographic information but its extraction for purposes of geographic information retrieval poses specific challenges, in particular the distinction between toponyms pertinent to route descriptions and those mentioned in descriptions of panoramas. We describe some preliminary considerations for natural language cues to distinguish between these two types of occurrences.

document engineering | 2010

Document conversion for cultural heritage texts: FrameMaker to HTML revisited

Michael Piotrowski

Many large-scale digitization projects are currently under way that intend to preserve the cultural heritage contained in paper documents (in particular books) and make it available on the Web. Typically OCR is used to produce searchable electronic texts from books. For newer books, approximately from the late 1980s onwards, digital text may already exist in the form of typesetting data. For applications that require a higher level of accuracy than OCR can deliver, the conversion of typesetting data can thus be an alternative to manual keying. In this paper, we describe a tool for converting typesetting data in FrameMaker format to XHTML+CSS developed for a collection of source editions of medieval and early modern documents. Even though the books of the Collection are typeset in good quality and in modern typefaces, OCR is unusable, since the text is in various historical forms of German, French, Italian, Rhaeto-Romanic, and Latin. The conversion of typesetting data produces fully reliable text free from OCR errors and thus also provides a basis for the construction of language resources for the processing of historical texts.

document engineering | 2016

Future Publishing Formats

Michael Piotrowski

The familiar PDF-based scholarly publishing workflow, which emulates even earlier paper-based workflows, has been surprisingly resistant to change. However, it is becoming increasingly clear that it no longer meets the requirements of a quickly evolving scholarly, technical, and political environment, which includes the trend towards open access publishing, reproducible research, mobile devices, linked open data, and many other developments. This workshop approaches scholarly publishing from a document engineering perspective and focuses on the question of document formats for submission, review, publication, and archival of scholarly publications. We will discuss the current state of scholarly publishing from a document engineering point of view, with the explicit goal of identifying potential alternatives to the current workflow.

geographic information retrieval | 2010

Leveraging back-of-the-book indices to enable spatial browsing of a historical document collection

Michael Piotrowski

We describe ongoing work on detecting toponyms in back-of-the-book indices to geocode historical documents not available in full text; the goal is specifically to provide spatial browsing for the Collection of Swiss Law Sources. We discuss some of the peculiarities of handcrafted indices and approaches for coping with them.

document engineering | 2009

Linguistic editing support

Michael Piotrowski; Cerstin Mahlow

Unlike programmers, authors get very little support from their writing tools, i.e., their word processors and editors. Current editors are unaware of the objects and structures of natural languages and only offer character-based operations for manipulating text. Writers thus have to execute complex sequences of low-level functions to achieve their rhetorical or stylistic goals while composing. Software requiring long and complex sequences of operations causes users to make slips. In the case of editing and revising, these slips result in typical revision errors, such as sentences without a verb, agreement errors, or incorrect word order. In the LingURed project, we are developing language-aware editing functions to prevent errors. These functions operate on linguistic elements, not characters, thus shortening the command sequences writers have to execute. This paper describes the motivation and background of the LingURed project and shows some prototypical language-aware functions.

Polibits: Computer Science and Computer Engineering with Applications, 39(6):41-48 | 2009

SMM: Detailed, Structured Morphological Analysis for Spanish

Cerstin Mahlow; Michael Piotrowski

We present a morphological analyzer for Spanish called SMM. SMM is implemented in the grammar development framework Malaga, which is based on the formalism of Left-Associative Grammar. We briefly present the Malaga framework, describe the implementation decisions for some interesting morphological phenomena of Spanish, and report on the evaluation results from the analysis of corpora. SMM was originally only designed for analyzing word forms; in this article we outline two approaches for using SMM and the facilities provided by Malaga to also generate verbal paradigms. SMM can also be embedded into applications by making use of the Malaga programming interface; we briefly discuss some application scenarios.

Archive | 2011

Systems and Frameworks for Computational Morphology: Second International Workshop, SFCM 2011, Zurich, Switzerland, August 26, 2011, Proceedings

Cerstin Mahlow; Michael Piotrowski

This book constitutes the refereed proceedings of the Second International Workshop on Systems and Frameworks for Computational Morphology, SFCM 2011, held in Zurich, Switzerland in August 2011. The eight revised full papers presented together with one invited paper were carefully reviewed and selected from 13 submissions. The papers address various topics in computational morphology and the relevance of morphology to computational linguistics more broadly.
