Miloš Jakubíček | Researchain

Publication

Featured research published by Miloš Jakubíček.

language and technology conference | 2009

Syntactic analysis using finite patterns: a new parsing system for Czech

Vojtěch Kovář; Aleš Horák; Miloš Jakubíček

Syntactic analysis of natural languages is considered to be one of the basic steps towards advanced natural language processing, such as logical analysis or information retrieval with natural language texts. The Czech language can be characterized as a morphologically rich language with a relatively free word order, which further complicates the problem of syntactic analysis. Current parsing systems for Czech struggle with many problems, including low precision or high ambiguity of the parser output. In this paper, we show a new approach to syntactic analysis of free-word-order languages based on the idea of pattern matching linking rules. The system, named SET, is currently being developed and tested with the Czech language as a representative of free-word-order languages with a very rich morphological system. We briefly mention current approaches and parsing systems for Czech. Then we describe the basic ideas as well as details of SET's prototype implementation of the pattern matching approach to syntactic analysis. We also offer a preliminary analysis of the system's parsing precision and discuss the advantages and disadvantages of this approach.

text speech and dialogue | 2009

Mining Phrases from Syntactic Analysis

Miloš Jakubíček; Aleš Horák; Vojtěch Kovář

In this paper we describe the exploitation of the syntactic parser synt to obtain information about syntactic structures (such as noun or verb phrases) of common sentences in Czech. From the analysis point of view, these phrases/structures are usually identical to nonterminals in the grammar used by the parser to find possible valid derivations of the given sentence. The parser has been extended in such a way that enables its highly ambiguous output to be used for mining those phrases unambiguously, and it offers several ways to identify them. To achieve this, some previously unused results of syntactic analysis have been evolved, leading to more precise morphological analysis and hence also to a deeper distinction among various syntactic (sub)structures. Finally, an application for shallow valency extraction and punctuation correction is presented.

Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers | 2016

English-French Document Alignment Based on Keywords and Statistical Translation.

Marek Medveď; Vojtěch Kovář; Miloš Jakubíček

In this paper we present our approach to the Bilingual Document Alignment Task (WMT16), where the main goal was to reach the best recall on extracting aligned pages within the provided data. Our approach consists of three main parts: data preprocessing, keyword extraction and text pair scoring based on keyword matching. For text preprocessing we use the TreeTagger pipeline that contains the Unitok tool (Michelfeit et al., 2014) for tokenization and the TreeTagger morphological analyzer (Schmid, 1994). After keyword extraction from the texts according to TF-IDF scoring, our system searches for comparable English-French pairs. Using a statistical dictionary created from a large English-French parallel corpus, the system is able to find comparable documents. At the end, this procedure is combined with the baseline algorithm and the best one-to-one pairing is selected. The result reaches 91.6% recall on the provided training data. After a deep error analysis (see section 5) the recall reached 97.4%.
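The keyword-matching idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `tfidf_keywords` and `align`, the `top_n` parameter, the plain overlap score, the greedy pairing and the toy one-to-one `dictionary` are all assumptions made for illustration, and the preprocessing step (Unitok, TreeTagger) and the combination with the baseline algorithm are left out.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_n=3):
    """Return the top_n highest-scoring TF-IDF terms of each tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    keywords = []
    for doc in docs:
        tf = Counter(doc)
        scores = {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords.append(set(ranked[:top_n]))
    return keywords


def align(en_docs, fr_docs, dictionary, top_n=3):
    """Greedily pair English and French documents one-to-one by keyword overlap.

    `dictionary` is a hypothetical English-to-French word mapping standing in
    for the statistical dictionary mentioned in the abstract.
    """
    en_kw = tfidf_keywords(en_docs, top_n)
    fr_kw = tfidf_keywords(fr_docs, top_n)
    candidates = []
    for i, kws in enumerate(en_kw):
        translated = {dictionary[k] for k in kws if k in dictionary}
        for j, fr in enumerate(fr_kw):
            score = len(translated & fr)
            if score > 0:
                candidates.append((score, i, j))
    # Best-scoring candidates first; each document is used at most once.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    used_en, used_fr, pairs = set(), set(), []
    for _, i, j in candidates:
        if i not in used_en and j not in used_fr:
            pairs.append((i, j))
            used_en.add(i)
            used_fr.add(j)
    return sorted(pairs)
```

Called with lists of token lists for each language, `align` returns index pairs `(english_index, french_index)`; documents with no shared translated keyword remain unpaired.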

international conference on computational linguistics | 2013

Enhancing Czech parsing with verb valency frames

Miloš Jakubíček; Vojtěch Kovář

In this paper an exploitation of the verb valency lexicons for the Czech parsing system Synt is presented and an effective implementation is described that uses the syntactic information in the complex valency frames to resolve some of the standard parsing ambiguities, thereby improving the analysis results. We discuss the implementation in detail and provide an evaluation showing improvements in parsing accuracy on the Brno Phrasal Treebank.

conference on intelligent text processing and computational linguistics | 2016

Adam Kilgarriff’s Legacy to Computational Linguistics and Beyond

Roger Evans; Alexander Gelbukh; Gregory Grefenstette; Patrick Hanks; Miloš Jakubíček; Diana McCarthy; Martha Palmer; Ted Pedersen; Michael Rundell; Pavel Rychlý; Serge Sharoff; David Tugwell

The 2016 CICLing conference was dedicated to the memory of Adam Kilgarriff, who died the year before. Adam leaves behind a tremendous scientific legacy, and those working in computational linguistics, other fields of linguistics and lexicography are indebted to him. This paper is a summary review of some of Adam’s main scientific contributions. It is not and cannot be exhaustive. It is written by only a small selection of his large network of collaborators. Nevertheless, we hope this will provide a useful summary for readers wanting to know more about the origins of work, events and software that are so widely relied upon by scientists today, and undoubtedly will continue to be so in the foreseeable future.

text, speech and dialogue | 2011

Effective parsing using competing CFG rules

Miloš Jakubíček

In this paper a new pruning method for a rule-based parser is described that relies on separating the underlying grammar rules into several mutually competing levels. This method has been developed and exploited for Czech in the syntactic parser Synt to reduce the number of possible output derivation trees. The underlying algorithm operates on a so-called packed forest of trees, a compressing data structure used for internal representation of parallel analyses, and thus performs very effectively. An evaluation of its contribution has been performed on the Brno Phrasal Treebank, showing that the algorithm significantly prunes the resulting tree space while preserving perspective parses.
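One possible reading of level-based pruning on a packed forest is sketched below. The abstract does not specify how the levels compete, so the rule used here (within each forest node, only the alternatives whose rule has the lowest level number survive) is an assumption, as are the dictionary-based forest encoding and the names `prune`, `count_trees` and `rule_level`. It is a toy illustration, not the algorithm implemented in Synt.

```python
def prune(forest, rule_level):
    """Keep, for every node, only the alternatives built by a best-level rule.

    forest: dict mapping a node to a list of (rule_name, children) alternatives.
    rule_level: dict mapping a rule name to its level (lower wins; an assumption).
    """
    pruned = {}
    for node, alternatives in forest.items():
        best = min(rule_level[rule] for rule, _ in alternatives)
        pruned[node] = [(r, ch) for r, ch in alternatives if rule_level[r] == best]
    return pruned


def count_trees(forest, node):
    """Count the derivation trees rooted at node; nodes absent from the forest are leaves."""
    alternatives = forest.get(node)
    if not alternatives:
        return 1
    total = 0
    for _, children in alternatives:
        product = 1
        for child in children:
            product *= count_trees(forest, child)
        total += product
    return total
```

Because the pruning works node by node on the shared structure, it never has to enumerate individual trees, which is the property the abstract credits for the method's efficiency.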

Lexicography ASIALEX | 2014