Greta Franzini
University of Göttingen
Publications
Featured research published by Greta Franzini.
EuroVis (STARs) | 2015
Stefan Jänicke; Greta Franzini; Muhammad Faisal Cheema; Gerik Scheuermann
We present an overview of the last ten years of research on visualizations that support close and distant reading of textual data in the digital humanities. We look at various works published within both the visualization and digital humanities communities. We provide a taxonomy of applied methods for close and distant reading, and illustrate approaches that combine both reading techniques to provide a multifaceted view of the data. Furthermore, we list toolkits and potentially beneficial visualization approaches for research in the digital humanities. Finally, we summarize collaboration experiences when developing visualizations for close and distant reading, and give an outlook on future challenges in that research area.
Computer Graphics Forum | 2017
Stefan Jänicke; Greta Franzini; Muhammad Faisal Cheema; Gerik Scheuermann
In 2005, Franco Moretti introduced Distant Reading to analyse entire literary text collections. This was a rather revolutionary idea compared to the traditional Close Reading, which focuses on the thorough interpretation of an individual work. Both reading techniques are the primary means of Visual Text Analysis. We present an overview of the research conducted since 2005 on supporting text analysis tasks with close and distant reading visualizations in the digital humanities. To this end, we classify the observed papers according to a taxonomy of text analysis tasks, categorize the close and distant reading techniques applied to support the investigation of these tasks, and illustrate approaches that combine both reading techniques in order to provide a multi-faceted view of the textual data. In addition, we take a look at the text sources used and at the typical data transformation steps required for the proposed visualizations. Finally, we summarize collaboration experiences when developing visualizations for close and distant reading, and we give an outlook on future challenges in that research area.
ACM Journal on Computing and Cultural Heritage | 2016
Maria Moritz; Barbara Pavlek; Greta Franzini; Gregory Crane
We present an approach to shortening Ancient Greek sentences by using the morpho-syntactic information attached to each word in a sentence. This work underpins the content of our eLearning application, AncientGeek, whose unique teaching technique draws from primary Greek sources. By applying a technique that skips the clausal dependents of a main verb, 89% of the shortened sentences remained well formed.
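The core idea of the shortening technique described above can be sketched in a few lines. The sketch below is illustrative, not the paper's implementation: the token fields and dependency-relation labels (Universal Dependencies-style names such as `ccomp` and `advcl`) are assumptions, and the pruning simply removes each clausal dependent of the root verb together with its subtree.

```python
# Illustrative sketch: shorten a sentence by dropping the clausal
# dependents of the main verb from its dependency tree.
# Relation labels are UD-style assumptions, not the paper's own.

CLAUSAL_RELATIONS = {"advcl", "ccomp", "xcomp", "acl", "csubj"}

def shorten(tokens):
    """tokens: list of dicts with 'id', 'head', 'deprel', 'form'.
    Returns the surface forms kept after pruning every clausal
    dependent of the root verb, along with its whole subtree."""
    children = {}
    for t in tokens:
        children.setdefault(t["head"], []).append(t)
    root = next(t for t in tokens if t["head"] == 0)  # main verb

    dropped = set()
    def mark(tok):                      # drop a token and its subtree
        dropped.add(tok["id"])
        for c in children.get(tok["id"], []):
            mark(c)

    for c in children.get(root["id"], []):
        if c["deprel"] in CLAUSAL_RELATIONS:
            mark(c)

    return [t["form"] for t in tokens if t["id"] not in dropped]

sentence = [
    {"id": 1, "head": 2, "deprel": "nsubj", "form": "he"},
    {"id": 2, "head": 0, "deprel": "root",  "form": "said"},
    {"id": 3, "head": 5, "deprel": "mark",  "form": "that"},
    {"id": 4, "head": 5, "deprel": "nsubj", "form": "she"},
    {"id": 5, "head": 2, "deprel": "ccomp", "form": "left"},
]
print(shorten(sentence))  # the ccomp subtree "that she left" is removed
```

For real Ancient Greek treebanks the same pruning would operate on richer annotations, but the tree walk is the same.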
Archive | 2014
Marco Büchler; Philip R. Burns; Martin Müller; Emily Franzini; Greta Franzini
Text re-use describes the spoken and written repetition of information. Historical text re-use, with its longer time span, embraces a larger set of morphological, linguistic, syntactic, semantic and copying variations, thus complicating text re-use detection. Furthermore, it increases the chances of redundancy in a Digital Library. In Natural Language Processing it is crucial to remove these redundancies before applying any kind of machine learning techniques to the text. In the Humanities, these redundancies are the foundation of textual criticism and allow scholars to identify lines of transmission. This chapter investigates two aspects of the historical text re-use detection process, based on seven English editions of the Holy Bible. First, we measure the performance of several techniques. For this purpose, when considering a verse, such as the book of Genesis, Chapter 1, Verse 1, that is present in two editions, one verse is always understood as a paraphrase of the other. It is worth noting that paraphrasing is considered a hyponym of text re-use. Depending on the intention with which the new version was created, verses tend to differ significantly in the wording, but not in the meaning. Secondly, this chapter explains and evaluates a way of extracting paradigmatic relations. As regards historical languages, however, there is a lack of language resources (for example, WordNet) that makes non-literal text re-use and paraphrases much more difficult to identify. These differences are present in the form of replacements, corrections, varying writing styles, etc. For this reason, we introduce both the aforementioned and other correlated steps as a method to identify text re-use, including language acquisition to detect changes that we call paradigmatic relations. The chapter concludes with the recommendation to move from a "single run" detection to an iterative process by using the acquired relations to run a new task.
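As a minimal illustration of what a literal text re-use score between two verse editions looks like, the sketch below compares word bigrams with Jaccard overlap, a common baseline; it is not the chapter's actual pipeline, and the sample verses are only stand-ins for two editions of the same passage.

```python
# Illustrative baseline for literal text re-use scoring:
# word-bigram sets compared with the Jaccard coefficient.

def ngrams(text, n=2):
    """Return the set of word n-grams of a text (lowercased)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def reuse_score(a, b, n=2):
    """Jaccard overlap of the two texts' word n-gram sets, in [0, 1]."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)

# Two renderings of the same verse, as in parallel Bible editions:
kjv = "In the beginning God created the heaven and the earth"
web = "In the beginning God created the heavens and the earth"
print(reuse_score(kjv, web))
```

Such a surface measure catches literal re-use but, as the chapter notes, fails on non-literal paraphrase in historical languages, which is why paradigmatic relations are needed on top of it.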
Frontiers in Digital Humanities | 2018
Greta Franzini; Mike Kestemont; Gabriela Rotari; Melina Jander; Jeremi K. Ochab; Emily Franzini; Joanna Byszuk; Jan Rybicki
This article presents the results of a multidisciplinary project aimed at better understanding the impact of different digitization strategies on computational text analysis. More specifically, it describes an effort to automatically discern the authorship of Jacob and Wilhelm Grimm in a body of uncorrected correspondence processed by HTR (Handwritten Text Recognition) and OCR (Optical Character Recognition), reporting on the effect this noise has on the analyses necessary to computationally identify the different writing styles of the two brothers. In summary, our findings show that OCR digitization serves as a reliable proxy for the more painstaking process of manual digitization, at least when it comes to authorship attribution. Our results suggest that attribution is viable even when using training and test sets from different digitization pipelines. With regard to HTR, this research demonstrates that even though automated transcription significantly increases the risk of text misclassification when compared to OCR, a transcription cleanliness above ≈ 20% is already sufficient to achieve a higher-than-chance probability of correct binary attribution.
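The principle behind binary attribution of the kind described above can be sketched with character n-gram profiles and cosine similarity. This is a simplified illustration under assumed toy inputs, not the study's actual feature set or classifier; character n-grams are, however, a standard stylometric choice precisely because they degrade gracefully under OCR/HTR noise.

```python
# Simplified sketch of binary authorship attribution:
# character-trigram frequency profiles compared by cosine similarity.

from collections import Counter
from math import sqrt

def profile(text, n=3):
    """Character n-gram frequency profile of a text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(p, q):
    """Cosine similarity between two frequency profiles."""
    num = sum(p[g] * q[g] for g in set(p) & set(q))
    den = sqrt(sum(v * v for v in p.values())) * \
          sqrt(sum(v * v for v in q.values()))
    return num / den if den else 0.0

def attribute(unknown, author_a, author_b):
    """Assign the unknown text to the closer reference profile."""
    pu = profile(unknown)
    sim_a = cosine(pu, profile(author_a))
    sim_b = cosine(pu, profile(author_b))
    return "A" if sim_a >= sim_b else "B"

# Toy reference samples standing in for each brother's known writing:
author_a = "the quick brown fox jumps over the lazy dog " * 3
author_b = "colorless green ideas sleep furiously " * 3
print(attribute("quick brown fox over the lazy dog", author_a, author_b))
```

In the study's setting, the reference profiles would be built from cleanly transcribed letters of each brother, and the unknown text from the noisy HTR or OCR output.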
IEEE International Conference on Big Data | 2016
Marco Büchler; Greta Franzini; Emily Franzini; Thomas Eckart
From 2004 to 2016, the Leipzig Linguistic Services (LLS) existed as a SOAP-based cyberinfrastructure of atomic micro-services for the Wortschatz project, which covered textual corpora of different sizes in more than 230 languages. The LLS were developed in 2004 and went live in 2005 in order to provide a webservice-based API to these corpus databases. In 2006, the LLS infrastructure began to systematically log and store requests made to the text collection, and in August 2016 the LLS were shut down. This article summarises the experience of the past ten years of running such a cyberinfrastructure, with a total of nearly one billion requests. It includes an explanation of the technical decisions and limitations, but also provides an overview of how the services were used.
Digital Scholarship in the Humanities | 2015
Stefan Jänicke; Annette Geßner; Greta Franzini; Melissa Terras; Simon Mahony; Gerik Scheuermann
Journal of the Text Encoding Initiative | 2014
Monica Berti; Bridget Almas; David Dubin; Greta Franzini; Simona Stoyanova; Gregory R. Crane
Language Resources and Evaluation | 2014
Frederik Baumgardt; Giuseppe G. A. Celano; Gregory R. Crane; Stella Dee; Maryam Foradi; Emily Franzini; Greta Franzini; Monica Lent; Maria Moritz; Simona Stoyanova
DH | 2018
Marco Büchler; Greta Franzini; Mike Kestemont; Enrique Manjavacas