Jan Rygl
Masaryk University
Publication
Featured research published by Jan Rygl.
Text, Speech, and Dialogue | 2014
Jan Rygl
Many Internet users encounter anonymous documents and texts with counterfeit authorship. The number of questionable documents exceeds the capacity of human experts, so a universal automated authorship identification system supporting all types of documents is needed. In this paper, five predominant document types are analysed in the context of authorship verification: books, blogs, discussions, comments and tweets. A method for automatically selecting authors’ stylometric features using double-layer machine learning is proposed and evaluated. Experiments are conducted on ten disjoint train and test sets, and a method for efficiently training a large number of machine learning models is introduced (163,700 models were trained).
Text, Speech, and Dialogue | 2016
Jan Švec; Jan Rygl
Authorship recognition, machine translation detection, pedophile identification and other stylometry techniques are used daily in applications for the most widely used languages. Under-represented languages, on the other hand, lack data sources usable for stylometry research. In this paper, we propose a novel algorithm for building corpora that contain the meta-information required for stylometry experiments (author information, publication time, document heading, document borders) and introduce our tool, Authorship Corpora Builder (ACB). We modify data-cleaning techniques for the purposes of stylometry and add a heuristic layer to detect and extract valuable meta-information.
Text, Speech, and Dialogue | 2012
Jan Rygl; Aleš Horák
In the traditional authorship attribution task, forensic linguistic specialists analyse and compare documents to determine who their (real) author was. Nowadays, the number of anonymous documents is growing ceaselessly because of the expansion of the Internet. That is why the manual part of the authorship attribution process needs to be replaced with automatic methods. Specialized algorithms (SA) such as delta-score and word-length statistics were developed to quantify the similarity between documents, but the currently prevailing techniques build upon the machine learning (ML) approach.
RASLAN | 2012
Jan Rygl; Kristýna Zemková; Vojtěch Kovář
Archive | 2016
Jan Rygl; Petr Sojka; Michal Růžička; Radim Řehůřek
RASLAN | 2015
Jan Švec; Jan Rygl
RASLAN | 2011
Jan Rygl; Aleš Horák
Meeting of the Association for Computational Linguistics | 2017
Jan Rygl; Jan Pomikálek; Radim Řehůřek; Michal Růžička; Vít Novotný; Petr Sojka
RASLAN | 2016
Jan Rygl; Petr Sojka; Michal Růžička; Radim Řehůřek
Archive | 2015
Jan Rygl