Publication


Featured research published by Jan Rygl.


Text, Speech and Dialogue | 2014

Automatic Adaptation of Author’s Stylometric Features to Document Types

Jan Rygl

Many Internet users encounter anonymous documents and texts with counterfeit authorship. The number of questionable documents exceeds the capacity of human experts; therefore, a universal automated authorship identification system supporting all types of documents is needed. In this paper, five predominant document types are analysed in the context of authorship verification: books, blogs, discussions, comments and tweets. A method for automatically selecting authors’ stylometric features using double-layer machine learning is proposed and evaluated. Experiments are conducted on ten disjoint train and test sets, and a method for efficiently training a large number of machine learning models is introduced (163,700 models were trained).


Text, Speech and Dialogue | 2016

Building Corpora for Stylometric Research

Jan Švec; Jan Rygl

Authorship recognition, machine translation detection, pedophile identification and other stylometry techniques are used daily in applications for the most widely used languages. Under-represented languages, on the other hand, lack data sources usable for stylometry research. In this paper, we propose a novel algorithm to build corpora containing the meta-information required for stylometry experiments (author information, publication time, document heading, document borders) and introduce our tool, Authorship Corpora Builder (ACB). We modify data-cleaning techniques for the purposes of the stylometry field and add a heuristic layer to detect and extract valuable meta-information.


Text, Speech and Dialogue | 2012

Authorship Attribution: Comparison of Single-Layer and Double-Layer Machine Learning

Jan Rygl; Aleš Horák

In the traditional authorship attribution task, forensic linguistic specialists analyse and compare documents to determine who their (real) author was. Nowadays, the number of anonymous documents is growing ceaselessly because of the expansion of the Internet. That is why the manual part of the authorship attribution process needs to be replaced with automatic methods. Specialized algorithms (SA) such as delta-score and word-length statistics were developed to quantify the similarity between documents, but the currently prevailing techniques build upon the machine learning (ML) approach.


RASLAN | 2012

Authorship Verification based on Syntax Features

Jan Rygl; Kristýna Zemková; Vojtěch Kovář


Archive | 2016

ScaleText: The Design of a Scalable, Adaptable and User-Friendly Document System for Similarity Searches: Digging for Nuggets of Wisdom in Text

Jan Rygl; Petr Sojka; Michal Růžička; Radim Řehůřek


RASLAN | 2015

Slavonic Corpus for Stylometry Research.

Jan Švec; Jan Rygl


RASLAN | 2011

A Framework for Authorship Identification in the Internet Environment

Jan Rygl; Aleš Horák


Meeting of the Association for Computational Linguistics | 2017

Semantic Vector Encoding and Similarity Search Using Fulltext Search Engines

Jan Rygl; Jan Pomikálek; Radim Řehůřek; Michal Růžička; Vít Novotný; Petr Sojka


RASLAN | 2016

ScaleText: The Design of a Scalable, Adaptable and User-Friendly Document System for Similarity Searches.

Jan Rygl; Petr Sojka; Michal Růžička; Radim Řehůřek


Archive | 2015

Style & Identity Recognition

Jan Rygl
