Publication


Featured research published by Jan Rybicki.


Literary and Linguistic Computing | 2006

Burrowing into Translation: Character Idiolects in Henryk Sienkiewicz's Trilogy and its Two English Translations

Jan Rybicki

Character idiolects in Henryk Sienkiewicz's trilogy were studied in the original and in two English translations by Jeremiah Curtin and W. S. Kuniczak. The method used was Burrows's technique of multivariate analysis of correlation matrices of relative frequencies of the most frequent words in the dialogue. The aim of the study was to verify the intuitions of traditional interpretations, to acquire a more comprehensive view of the phenomenon, and to obtain new insights into the nature of idiolect differentiation in Sienkiewicz. Multidimensional scaling plots for the original yielded patterns of idiolect differentiation by nationality, social status, gender, and age. Corresponding plots for the two translations preserved many of these patterns and exhibited strong similarities to each other. More studies including modified methods (including Burrows's Delta) are needed to observe further and explain why exactly patterns of similarity/difference between character idiolects are so strongly preserved in translation.


Literary and Linguistic Computing | 2013

Do birds of a feather really flock together, or how to choose training samples for authorship attribution

Maciej Eder; Jan Rybicki

This study investigates the problem of appropriate choice of texts for the training set in machine-learning classification techniques. Although intuition suggests picking the most typical texts (whatever ‘typical’ means) by the authors studied, any arbitrary choice might substantially affect the final results. Thus, to eschew cherry-picking, we introduce a method of verification of the choice of ‘typical’ samples, inspired by k-fold cross-validation procedures. Namely, we use a bootstrap-like approach to choose randomly, in 500 iterations, the samples for the training and the test sets. Next, we examine the obtained 500 attribution accuracy scores: if the density function shows widespread results, the corpus is assumed to be very sensitive to the permutations of the training set. To test this methodology empirically, we have selected roughly similar corpora in five languages: English, French, German, Italian and Polish. The results show considerable resistance of the English corpus to permutations, while the other corpora turned out to be more dependent on the choice of the samples; the Polish corpus produces both accuracy and consistency below any acceptable standards.
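The resampling idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy nearest-neighbour classifier, the choice of one training text per author, the function names, and the toy data are all assumptions made for illustration; only the 500 random iterations and the inspection of the resulting accuracy scores come from the abstract.

```python
import random
import statistics


def nearest_author(sample, training):
    """Toy classifier (an assumption): the author whose training vector is
    closest to the sample by Manhattan distance."""
    def dist(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))
    return min(training, key=lambda author: dist(sample, training[author]))


def accuracy_distribution(corpus, iterations=500, seed=0):
    """corpus maps author -> list of feature vectors (one per text).

    In each iteration one randomly chosen text per author forms the training
    set and all remaining texts form the test set. Returns the list of
    per-iteration attribution accuracies.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(iterations):
        training, test = {}, []
        for author, texts in corpus.items():
            pick = rng.randrange(len(texts))
            training[author] = texts[pick]
            test += [(author, t) for i, t in enumerate(texts) if i != pick]
        hits = sum(nearest_author(vec, training) == author for author, vec in test)
        scores.append(hits / len(test))
    return scores


# Hypothetical toy data: two authors, three texts each, two word frequencies per text.
toy_corpus = {
    "A": [[0.10, 0.02], [0.11, 0.03], [0.09, 0.02]],
    "B": [[0.02, 0.10], [0.03, 0.11], [0.02, 0.09]],
}
scores = accuracy_distribution(toy_corpus)
# A wide spread of scores would suggest sensitivity to the choice of training texts.
spread = statistics.pstdev(scores)
```

With a real corpus, the list of scores would be inspected as a distribution (for example, with a density plot) rather than summarised by a single number.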


Literary and Linguistic Computing | 2013

The stylistics and stylometry of collaborative translation: Woolf’s Night and Day in Polish

Jan Rybicki; Magda Heydel

The study investigates to what extent traditional stylistics and nontraditional stylometry can co-operate in the study of translations in terms of translatorial style. Stylistic authorship attribution methods based on a multivariate analysis of most-frequent-word frequencies are used in attempts at identifying translators. While these methods usually identify the author of the original rather than the translator, a case study is presented of the Polish translation of a single novel by Virginia Woolf, Night and Day, in which one translator took over from the other; the point of this takeover has been successfully identified with the above-mentioned methods.


Archive | 2012

The great mystery of the (almost) invisible translator: Stylometry in translation

Jan Rybicki

Machine-learning stylometric distance methods based on most-frequent-word frequencies are well-accepted and successful in authorship attribution. This study investigates the results of one of these methods, Burrows’s Delta, when applied to translations. Basing the empirical results on a number of corpora of literary translations, it shows that, except for a few highly adaptive translations, Delta usually fails to identify the translator and identifies the author of the original instead.
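For readers unfamiliar with Burrows's Delta, the following is a minimal sketch of the measure as it is commonly defined: the mean absolute difference between the z-scores of the most frequent words in two texts. It is not code from the study. The function names, the toy texts, the use of pooled raw counts to pick the word list, and the use of the population standard deviation are assumptions made for illustration.

```python
from collections import Counter
import statistics


def z_scores(texts, n_words=3):
    """texts maps label -> list of tokens.

    Returns (vocabulary, z) where vocabulary holds the n_words most frequent
    words in the pooled corpus and z maps each label to the z-scores of its
    relative frequencies for those words.
    """
    pooled = Counter()
    for tokens in texts.values():
        pooled.update(tokens)
    vocabulary = [word for word, _ in pooled.most_common(n_words)]

    freqs = {}
    for label, tokens in texts.items():
        counts = Counter(tokens)
        freqs[label] = [counts[word] / len(tokens) for word in vocabulary]

    z = {label: [] for label in texts}
    for i in range(len(vocabulary)):
        column = [freqs[label][i] for label in texts]
        mean = statistics.mean(column)
        sd = statistics.pstdev(column) or 1.0  # guard against zero variance
        for label in texts:
            z[label].append((freqs[label][i] - mean) / sd)
    return vocabulary, z


def burrows_delta(z_a, z_b):
    """Mean absolute difference between two z-score vectors."""
    return statistics.mean([abs(a - b) for a, b in zip(z_a, z_b)])


# Hypothetical toy texts; real studies use hundreds of words and long texts.
toy_texts = {
    "known_A": "the cat and the dog and the bird".split(),
    "known_B": "a cat of a dog of a bird".split(),
    "disputed": "the fox and the hen and the owl".split(),
}
vocabulary, z = z_scores(toy_texts)
delta_to_a = burrows_delta(z["disputed"], z["known_A"])
delta_to_b = burrows_delta(z["disputed"], z["known_B"])
```

The disputed text is attributed to the candidate with the smaller Delta. Note that content words such as "fox" or "cat" play no role here; only the shared function words do, which is consistent with the abstract's point that the measure tracks the original author's signal.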


Studies in Polish Linguistics | 2015

Success Rates in Most-frequent-word-based Authorship Attribution. A Case Study of 1000 Polish Novels from Ignacy Krasicki to Jerzy Pilch

Jan Rybicki

The success rate of authorship attribution by multivariate analysis of most-frequent-word frequencies is studied in a 1000-novel corpus of Polish literary works from the late 18th


Frontiers in Digital Humanities | 2018

Attributing authorship in the noisy digitized correspondence of Jacob and Wilhelm Grimm

Greta Franzini; Mike Kestemont; Gabriela Rotari; Melina Jander; Jeremi K. Ochab; Emily Franzini; Joanna Byszuk; Jan Rybicki

This article presents the results of a multidisciplinary project aimed at better understanding the impact of different digitization strategies in computational text analysis. More specifically, it describes an effort to automatically discern the authorship of Jacob and Wilhelm Grimm in a body of uncorrected correspondence processed by HTR (Handwritten Text Recognition) and OCR (Optical Character Recognition), reporting on the effect this noise has on the analyses necessary to computationally identify the different writing styles of the two brothers. In summary, our findings show that OCR digitization serves as a reliable proxy for the more painstaking process of manual digitization, at least when it comes to authorship attribution. Our results suggest that attribution is viable even when using training and test sets from different digitization pipelines. With regard to HTR, this research demonstrates that even though automated transcription significantly increases the risk of text misclassification when compared to OCR, a cleanliness above ≈ 20% is already sufficient to achieve a higher-than-chance probability of correct binary attribution.


Digital Scholarship in the Humanities | 2016

Multi-Retranslation Corpora: Visibility, Variation, Value, and Virtue

Tom Cheesman; Kevin Flanagan; Stephan Thiel; Jan Rybicki; Robert S. Laramee; Jonathan Hope; Avraham Roos

Variation among human translations is usually invisible, little understood, and under-valued. Previous statistical research finds that translations vary most where the source items are most semantically significant or express most ‘attitude’ (affect, evaluation, ideology). Understanding how and why translations vary is important for translator training and translation quality assessment, for cultural research, and for machine translation development. Our experimental project began with the intuition that quantitative variation in a corpus of historical retranslations might be used to project quasi-qualitative annotations onto the translated text. We present a web-based system which enables users to create parallel, segment-aligned multi-version corpora, and provides visual interfaces for exploring multiple translations, with their variation projected onto a base text. The system can support any corpus of variant versions. We report experiments using our tools (and stylometric analysis) to investigate a corpus of forty German versions of a work by Shakespeare. Initial findings lead to more questions than answers.


Digital Scholarship in the Humanities | 2016

Vive la différence: Tracing the (authorial) gender signal by multivariate analysis of word frequencies

Jan Rybicki

Multivariate analysis of word frequencies is used to identify the gender of authors in a corpus of 18th- and early 19th-century English sentimentalist and Gothic fiction. Results obtained with most frequent words are compared to those produced with medium-frequency Burrows's Zeta words characteristic of both genders. Gender-sensitive words from two periods (18th/19th c. and 19th/20th c.) are compared in terms of their usefulness for gender identification in literary texts.


Przekładaniec. Półrocznik Katedry UNESCO do Badań nad Przekładem i Komunikacją Międzykulturową UJ | 2013

Stylometryczna niewidzialność tłumacza

Jan Rybicki

Translator’s Stylometric Invisibility

In a corpus of the writings of several authors, each author being represented by several texts, it is usually enough to compare the similarities between the frequencies of some 100 most frequent words (obviously, these usually include various function words rather than content words) in these texts to group the texts correctly by the authors. This paper investigates the phenomenon that translated texts also tend to be grouped by the original author rather than by the translator despite the fact that the most frequent words in a corpus of translations in no way maintain a one-to-one relationship with those in the original corpus. This is illustrated with examples of experiments performed on a variety of parallel sets of literary texts in English, French and Polish.


Rocznik Przekładoznawczy | 2008

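Stylometria komputerowa w służbie tłumacza (na przykładzie własnych przekładów) [Computational stylometry in the service of the translator (on the example of the author's own translations)]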

Jan Rybicki

This paper presents a stylometric analysis of two “most literary spy novels” by John le Carré, A Perfect Spy (1986) and Absolute Friends (2003). Written 17 years apart, they were translated into Polish by the author of this paper within months of each other, in 2003 and 2004. From the very start, it was evident to the translator that the two novels would be an interesting subject of study because they are built according to a very similar model, especially where characterization is concerned. Both feature a slightly foolish British agent (le Carré’s famous trademark), his highly intellectual yet physically handicapped East German nemesis, the British agent’s boss/friend, etc. Since these two very similar works shared their Polish translator, who continued to experience a very strong feeling of déjà vu while working on the two novels, this case seemed perfect for a study of stylistic relationships between original and translation. The main effect observed in this study was that, of the three above-mentioned pairs of corresponding characters, two are similar, as expected, while one (the two East German double agents) is not. The similarity of that pair is “regained” in the translation, an interesting corroboration of the translator’s “intuitive” suspicion during his work on the Polish version. These results show that, at least in this very special case, the accuracy of studies performed by Multidimensional Scaling of correlation matrices of relative frequencies of the most frequent words is quite considerable when applied to translation. This is true despite the disquieting fact that, like previous statistical authorship attribution techniques, this correspondence lacks any compelling theoretical justification. The tentative explanations proposed so far by van Leuven-Zwart (the postulate of microstructural changes influencing the text’s macrostructure, 1995) or by McKenna, Burrows and Antonia are certainly not enough.
Since overlapping semantic fields of the most frequent words of texts and divergent linguistic systems make one-to-one correspondences impossible, a more general underlying mechanism must be found. At the same time, empirical studies hinting at the existence of such a mechanism have still been very few. This is why more are needed to explain the compelling yet somewhat mysterious successes of Burrows’s “old” method.

Collaboration


Dive into Jan Rybicki's collaborations.

Top Co-Authors

Maciej Eder

Pedagogical University

Andrzej Kulig

Polish Academy of Sciences

Jaroslaw Kwapien

Polish Academy of Sciences

Magda Heydel

Jagiellonian University

Pawel Oswiecimka

Polish Academy of Sciences

S. Drozdz

Polish Academy of Sciences
