Martin Holub
Charles University in Prague
Publications
Featured research published by Martin Holub.
Text, Speech and Dialogue | 2001
Eva Hajičová; Jan Hajič; Barbora Hladká; Martin Holub; Petr Pajas; Veronika Řezníčková; Petr Sgall
The Prague Dependency Treebank (PDT) project is conceived of as a many-layered scenario: in its stratal annotation scheme, in its division of labor, and in the level of detail captured at the highest, tectogrammatical layer. The following aspects of the present status of the PDT are discussed in detail: the now-available PDT version 1.0, annotated manually at the morphemic and analytic layers, including recent experience with post-annotation checking; the ongoing effort of tectogrammatical-layer annotation, with specific attention to the so-called model collection; and two different areas of exploitation of the PDT, for linguistic research purposes and for information retrieval applications.
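The layered design described above can be illustrated with a minimal data structure: each token carries morphemic-layer attributes (lemma, morphological tag) alongside analytic-layer attributes (governing token, analytic function). This is only a simplified sketch; the actual PDT file formats and attribute inventories are richer, and the example sentence and tags are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Token:
    """One token with morphemic-layer attributes (form, lemma, tag)
    and analytic-layer attributes (head index, analytic function)."""
    form: str
    lemma: str
    tag: str    # positional morphological tag (simplified)
    head: int   # 1-based index of the governing token; 0 = sentence root
    afun: str   # analytic function, e.g. "Sb", "Pred", "Obj"

@dataclass
class Sentence:
    tokens: list = field(default_factory=list)

    def dependents(self, idx):
        """Return 1-based indices of tokens governed by token idx."""
        return [i + 1 for i, t in enumerate(self.tokens) if t.head == idx]

# Toy example: "Petr čte knihu" (Petr reads a book)
s = Sentence([
    Token("Petr", "Petr", "NNMS1", 2, "Sb"),
    Token("čte", "číst", "VB3S", 0, "Pred"),
    Token("knihu", "kniha", "NNFS4", 2, "Obj"),
])
print(s.dependents(2))  # → [1, 3]: subject and object hang on the predicate
```

A tectogrammatical layer would add a third, deeper annotation on top of this, which is omitted here for brevity.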
Recent Advances in Natural Language Processing | 2000
Martin Holub; Alena Böhmová
In recent years, natural language processing (NLP) has produced some very interesting and promising results. In the field of information retrieval (IR), however, these significant advances have not yet been applied in an optimal way. The author argues that traditional IR methods, i.e., methods that deal with individual terms without considering their relations, can be surpassed using NLP procedures. The reason for this expectation is that NLP methods can detect the relations among terms in sentences, and that the information obtained can be stored and used for searching. Features of word senses and the significance of word contexts are analysed, and the possibility of searching based on word senses instead of mere words is examined. The core part of the paper focuses on analysing Czech sentences and extracting the context relations among their words. Building on lemmatisation and morphological and syntactic tagging of Czech texts, the author proposes a method for constructing dependency word microcontexts fully automatically extracted from texts, together with several ways to exploit these microcontexts to increase retrieval performance.
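The microcontext idea can be sketched as follows, under the assumption that a word's microcontext is the set of lemmas it is directly linked to in the dependency tree; the paper's exact definition and weighting scheme may differ, and the parsed example is invented for illustration.

```python
from collections import defaultdict

def microcontexts(tokens):
    """Collect dependency word microcontexts: for each lemma, the set of
    lemmas it is directly linked to in the dependency tree.
    `tokens` is a list of (lemma, head) pairs; heads are 1-based, 0 = root."""
    ctx = defaultdict(set)
    for i, (lemma, head) in enumerate(tokens, start=1):
        if head == 0:
            continue
        gov = tokens[head - 1][0]
        ctx[gov].add(lemma)   # governor records its dependent
        ctx[lemma].add(gov)   # dependent records its governor
    return dict(ctx)

# "Petr čte zajímavou knihu" (Petr reads an interesting book), pre-parsed
parsed = [("Petr", 2), ("číst", 0), ("zajímavý", 4), ("kniha", 2)]
print(sorted(microcontexts(parsed)["číst"]))  # → ['Petr', 'kniha']
```

Microcontexts extracted this way can then be stored in the index alongside plain terms, so that a query can match on word-to-word relations rather than isolated terms.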
Language and Linguistics Compass | 2015
Barbora Hladká; Martin Holub
We present a gentle introduction to machine learning in natural language processing. Our goal is to navigate readers through basic machine learning concepts and experimental techniques. As an illustrative example, we practically address the task of word sense disambiguation using the R software system. We focus especially on students and junior researchers who are not yet experienced in machine learning experimentation and want to start. To some extent, the machine learning process is independent of both the task addressed and the software system used; therefore, readers who deal with tasks from other research areas or who prefer different software systems will gain useful knowledge as well.
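To make the task concrete: supervised word sense disambiguation learns from labeled example contexts and assigns a sense to a new context. The tutorial uses R; the following is an analogous, deliberately simplified sketch in Python, using a toy overlap-based classifier and invented sense labels rather than any method from the paper.

```python
from collections import Counter

# Tiny labeled data for the ambiguous word "bank" (sense tags are assumptions)
train = [
    ("river bank covered with grass", "bank/GEO"),
    ("fishing on the bank of the stream", "bank/GEO"),
    ("the bank approved the loan", "bank/FIN"),
    ("deposit money at the bank", "bank/FIN"),
]

def sense_profiles(examples):
    """Aggregate a bag-of-words profile per sense from labeled contexts."""
    profiles = {}
    for text, sense in examples:
        profiles.setdefault(sense, Counter()).update(text.split())
    return profiles

def disambiguate(context, profiles):
    """Pick the sense whose word profile overlaps the new context most."""
    words = set(context.split())
    return max(profiles, key=lambda s: sum(profiles[s][w] for w in words))

profiles = sense_profiles(train)
print(disambiguate("she walked along the bank of the river", profiles))
# → bank/GEO
```

A real experiment would add proper feature extraction, a trained classifier, and held-out evaluation, which is exactly the workflow the tutorial walks through.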
Conference on Current Trends in Theory and Practice of Informatics | 2000
Martin Holub
This paper focuses on two problems that are crucial for retrieval performance in information retrieval (IR) systems: the loss of information caused by document pre-processing and the difficulty caused by homonymous and synonymous words in natural language. The author argues that traditional IR methods, i.e., methods that deal with individual terms without considering their relations, can be surpassed using natural language processing (NLP). In order to detect the relations among terms in sentences, and making use of lemmatisation and morphological and syntactic tagging of Czech texts, the author proposes a method for constructing dependency word microcontexts fully automatically extracted from texts, together with several ways to exploit these microcontexts to increase retrieval performance.
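One way such microcontexts could raise retrieval performance is to match dependency pairs from the query against pairs extracted from documents, so that "reading a book" retrieves documents where those words are actually syntactically related. This is only an illustrative scoring sketch, not the paper's formula; the documents, pairs, and ranking criterion are assumptions.

```python
def rank_by_microcontext(query_pairs, docs):
    """Rank documents by overlap between the query's dependency pairs
    (governor lemma, dependent lemma) and each document's pairs.
    `docs` maps a document id to its set of pairs."""
    q = set(query_pairs)
    scores = {d: len(q & pairs) for d, pairs in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)

docs = {
    "d1": {("číst", "kniha"), ("číst", "Petr")},     # Petr reads a book
    "d2": {("koupit", "kniha"), ("koupit", "Jana")}, # Jana buys a book
}
query = [("číst", "kniha")]  # query about reading a book
print(rank_by_microcontext(query, docs))  # → ['d1', 'd2']
```

Both documents mention "kniha", but only d1 contains the queried relation, so relation-aware matching separates them where plain term matching would not.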
Meeting of the Association for Computational Linguistics | 2004
Martin Holub; Jiří Semecký; Jiří Diviš
We describe an original method that automatically finds specific topics in a large collection of texts. Each topic is first identified as a specific cluster of texts and then represented as a virtual concept, which is a weighted mixture of words. Our intention is to employ these virtual concepts in document indexing. In this paper we show some preliminary experimental results and discuss directions of future work.
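A weighted mixture of words can be sketched as follows, assuming the text clusters have already been found by some clustering step: the virtual concept is simply the cluster's most frequent words with their relative weights. The weighting and the toy cluster are assumptions for illustration, not the paper's method.

```python
from collections import Counter

def virtual_concept(cluster_texts, top=3):
    """Represent a cluster of texts as a weighted word mixture:
    relative frequencies of the cluster's most frequent words."""
    counts = Counter(w for t in cluster_texts for w in t.split())
    total = sum(counts.values())
    return [(w, c / total) for w, c in counts.most_common(top)]

cluster = [
    "neural network training data",
    "training deep neural models",
    "network models and data",
]
print(virtual_concept(cluster))
```

For indexing, a document can then be scored against each virtual concept by summing the weights of the concept words it contains, giving a soft, topic-level match in addition to exact term matches.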
Text, Speech and Dialogue | 2010
Silvie Cinková; Martin Holub; Pavel Rychlý; Lenka Smejkalová; Jana Šindlerová
Corpus Pattern Analysis (CPA) [1], devised and implemented by Hanks as the Pattern Dictionary of English Verbs (PDEV) [2], appears to be the only deliberate and consistent implementation of Sinclair's concept of the Lexical Item [3]. In his theoretical inquiries [4], Hanks hypothesizes that the pattern repository produced by CPA can also support the word sense disambiguation task. Although more than 670 verb entries have already been compiled in PDEV, no systematic evaluation of this ambitious project has been reported yet. Assuming that the Sinclairian concept of the Lexical Item is correct, we began to examine PDEV closely with its possible NLP applications in mind. The experiments presented in this paper were performed on a pilot sample of English verbs to provide a first reliable view of whether humans can agree in assigning PDEV patterns to verbs in a corpus. In conclusion, we suggest procedures for the future development of PDEV.
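Whether "humans can agree in assigning PDEV patterns" is typically quantified with a chance-corrected agreement statistic. The sketch below computes Cohen's kappa for two annotators; the pattern labels and instance counts are hypothetical, and the paper may use a different agreement measure.

```python
from collections import Counter

def cohen_kappa(a, b):
    """Cohen's kappa for two annotators' label sequences of equal length."""
    assert len(a) == len(b)
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n     # observed agreement
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[k] * cb[k] for k in ca) / (n * n)  # agreement by chance
    return (po - pe) / (1 - pe)

# Hypothetical pattern labels assigned to 8 corpus instances of one verb
ann1 = ["p1", "p1", "p2", "p2", "p1", "p3", "p2", "p1"]
ann2 = ["p1", "p1", "p2", "p1", "p1", "p3", "p2", "p2"]
print(round(cohen_kappa(ann1, ann2), 3))  # → 0.579
```

Kappa discounts the agreement two annotators would reach by guessing from their label distributions, which matters here because a few frequent patterns could inflate raw agreement.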
Conference of the European Chapter of the Association for Computational Linguistics | 2012
Silvie Cinková; Martin Holub; Vincent Kříž
Language Resources and Evaluation | 2012
Silvie Cinková; Martin Holub; Adam Rambousek; Lenka Smejkalová
ISICT '03: Proceedings of the 1st International Symposium on Information and Communication Technologies | 2003
Martin Holub
International Conference on Computational Linguistics | 2012
Martin Holub; Vincent Kříž; Silvie Cinková; Eckhard Bick