Piotr Wierzchoń | Researchain

Publication

Featured research published by Piotr Wierzchoń.

Language and Technology Conference | 2015

RetroC - A Corpus for Evaluating Temporal Classifiers.

Filip Graliński; Piotr Wierzchoń

We present a corpus for training and evaluating systems for the dating of Polish texts. A number of baselines (using year references, knowledge of spelling reforms and birth years) are given for the temporal classification task. We also show that the problem can be viewed as a regression problem and a standard supervised learning tool (Vowpal Wabbit) can be applied. So far, the best result has been achieved with supervised learning with word tokens and character 5-grams as features. In addition, an error analysis of the results obtained with the best solution is presented in this paper.
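As a rough illustration of two ideas named in this abstract, the sketch below shows a year-reference baseline and character 5-gram feature extraction. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the rule of taking the latest year mentioned, and the fallback year are all hypothetical choices made for this example.

```python
import re
from collections import Counter

# Four-digit numbers that look like years (1000-2099). The range is an assumption.
YEAR_RE = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")


def char_ngrams(text, n=5):
    """Count overlapping character n-grams (5-grams by default)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def year_reference_baseline(text, default=1900):
    """Guess the publication year as the latest year mentioned in the text.

    Hypothetical rule: a text cannot mention a year later than its own
    publication, so the maximum mentioned year is used as the estimate.
    Falls back to `default` (an arbitrary value) when no year is found.
    """
    years = [int(y) for y in YEAR_RE.findall(text)]
    return max(years) if years else default
```

The n-gram counts could serve as sparse features for a regression learner, with the publication year as the target value.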

Computational Methods in Science and Technology | 2018

Re-research.pl: where Humanities Meet Computer Science

Daniel Dzienisiewicz; Łukasz Borchmann; Piotr Wierzchoń; Filip Graliński

The article discusses selected projects from the field of digital humanities realised by the Re-research.pl group. The group consists of researchers from the Institute of Linguistics and the Department of Natural Language Processing at Adam Mickiewicz University, Poznań, Poland. The projects discussed include National Photocorpus of Polish, Discovermat, Korea, Koreans and ‘Koreanity’ in the digitised Polish press of the 20th century, Biography of the Nation, 100,000 ministories, Gonito.net and 50,000 words. Domain and chronologisation index. However, the main focus of the article is the interdisciplinary popular-scientific blog Re-research.pl. The daily blog posts include texts on a variety of subjects, ranging from linguistics, history and folklore to computer science. Selected posts and categories of posts are discussed, such as chronologisational challenges, texts devoted to folklore and materials on the structure of text files. Apart from providing daily analyses, the blog promotes other projects and serves as a dialogue platform for representatives of various fields.

Asian Conference on Intelligent Information and Database Systems | 2017

The Great National Photocorpus of 20th-Century Vietnamese. Origins, Assumptions and Goals

Piotr Wierzchoń

Lexicography is the science and practice of making dictionaries. Its development has led to new techniques for the visual presentation of lexicographic entries. This article focuses on the technique of photodocumentation, which enables a textual quotation to be shown in its natural context. We aim to present a technological system which will make it possible, relatively cheaply, to produce a monolingual dictionary together with quotations and chronologisation—that is, the date at which a given word first appears. We consider the example of Vietnamese. As a preliminary database of material we selected just over 100 books, which we scanned and from which we excerpted quotations to illustrate the natural use of the headwords.

Proceedings of the 2nd International Conference on Digital Access to Textual Cultural Heritage | 2017

The RetroC challenge: how to guess the publication year of a text?

Filip Graliński; Rafał Jaworski; Łukasz Borchmann; Piotr Wierzchoń

This article describes research in automatic content-based temporal classification of texts. Experiments are carried out on a set of texts from Polish digital libraries, dating between the years 1814 and 2013. Following successful research in the field of temporal classification, this work aims at creating an automatic dating mechanism to be used in situations where the publication date of the text is unknown. Automatic publication date assessment by a computer system can prove useful for researchers from various fields of the humanities, such as history (incl. history of language), culture-historical archaeology, sociology or anthropology.

Text, Speech and Dialogue | 2016

Vive la Petite Différence

Filip Graliński; Rafał Jaworski; Łukasz Borchmann; Piotr Wierzchoń

This article describes a series of experiments on gender attribution of Polish texts. The research was conducted on the publicly available corpus called “He Said She Said”, consisting of a large number of short texts from the Polish version of Common Crawl. As opposed to other experiments on gender attribution, this research takes on the task of classifying relatively short texts authored by many different people.

Asian Conference on Intelligent Information and Database Systems | 2016

Big Data in Contemporary Linguistic Research. In Search of Optimum Methods for Language Chronologization

Piotr Wierzchoń

The paper will concern the theoretical and practical problems of analysing the mass of linguistic data which has arisen in conjunction with the development of many fields of life. Moreover, the universe of texts is growing every day – both forwards and backwards. Forwards because every new article, book, blog, e-mail or text message expands the set of existing texts; and backwards because the same set is also expanded whenever a scan is made of another historical text. Our knowledge about past times is growing by leaps and bounds. We are therefore particularly interested in the analysis of historical texts that can be carried out in the second decade of the 21st century.

Investigationes Linguisticae | 2018

Automatic Diachronic Normalization of Polish Texts

Krzysztof Jassem; Filip Graliński; Tomasz Obrębski; Piotr Wierzchoń

Electronic lexicography in the 21st century: Proceedings of eLex 2017 conference, 2017, pp. 680-702 | 2017

From Printed Materials to Electronic Demonstrative Dictionary – the Story of the National Photocorpus of Polish and its Korean and Vietnamese Descendants

Łukasz Borchmann; Daniel Dzienisiewicz; Piotr Wierzchoń

Text, Speech and Dialogue | 2016

Vive la Petite Différence! - Exploiting Small Differences for Gender Attribution of Short Texts.

Filip Graliński; Rafał Jaworski; Łukasz Borchmann; Piotr Wierzchoń

Language Resources and Evaluation | 2016

He Said She Said ― a Male/Female Corpus of Polish.

Filip Graliński; Łukasz Borchmann; Piotr Wierzchoń

Collaboration

Dive into Piotr Wierzchoń's collaborations.

Top Co-Authors

Filip Graliński

Adam Mickiewicz University in Poznań

Rafał Jaworski

Adam Mickiewicz University in Poznań

Łukasz Borchmann

Adam Mickiewicz University in Poznań
