Martin Holub | Researchain

Publication

Featured research published by Martin Holub.

text speech and dialogue | 2001

The Current Status of the Prague Dependency Treebank

Eva Hajičová; Jan Hajič; Barbora Hladká; Martin Holub; Petr Pajas; Veronika Řezníčková; Petr Sgall

The Prague Dependency Treebank (PDT) project is conceived of as a many-layered scenario from the point of view of the stratal annotation scheme, from the division-of-labor point of view, and with regard to the level of detail captured at the highest, tectogrammatical layer. The following aspects of the present status of the PDT are discussed in detail: the now-available PDT version 1.0, annotated manually at the morphemic and analytic layers, including the recent experience with post-annotation checking; the ongoing effort of tectogrammatical layer annotation, with specific attention to the so-called model collection; and two different areas of exploitation of the PDT, for linguistic research purposes and for information retrieval application purposes.

recent advances in natural language processing | 2000

Use of Dependency Tree Structures for the Microcontext Extraction

Martin Holub; Alena Böhmová

In recent years, natural language processing (NLP) has brought some very interesting and promising outcomes. In the field of information retrieval (IR), however, these significant advances have not yet been applied in an optimal way. The author argues that traditional IR methods, i.e. methods based on dealing with individual terms without considering their relations, can be overcome using NLP procedures. The reason for this expectation is that NLP methods are able to detect the relations among terms in sentences and that the information obtained can be stored and used for searching. Features of word senses and the significance of word contexts are analysed, and the possibility of searching based on word senses instead of mere words is examined. The core part of the paper focuses on analysing Czech sentences and extracting the context relations among words from them. In order to make use of lemmatisation and morphological and syntactic tagging of Czech texts, the author proposes a method for constructing dependency word microcontexts that are extracted from texts fully automatically, and several ways to exploit the microcontexts to increase retrieval performance.
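As a rough illustration of what extracting context relations from a dependency tree can look like, here is a minimal Python sketch. It is not the paper's method: the `Token` structure, the `extract_microcontexts` function, the "head"/"dependent" direction tags, and the English toy sentence are all assumptions made for this example, and the abstract does not give the paper's own definition of a microcontext.

```python
from collections import defaultdict
from typing import NamedTuple


class Token(NamedTuple):
    """One word of a parsed sentence (hypothetical structure for this sketch)."""
    lemma: str   # lemmatised form of the word
    head: int    # index of the governing token, -1 for the root
    deprel: str  # syntactic label of the edge to the governor


def extract_microcontexts(sentence):
    """Collect, for every lemma, the lemmas it is directly linked to in the tree."""
    contexts = defaultdict(set)
    for tok in sentence:
        if tok.head < 0:
            continue  # the root has no governor
        governor = sentence[tok.head]
        # Record the relation from both ends of the dependency edge.
        contexts[tok.lemma].add(("head", tok.deprel, governor.lemma))
        contexts[governor.lemma].add(("dependent", tok.deprel, tok.lemma))
    return dict(contexts)


# Toy sentence "small dog barks", already lemmatised and parsed (assumed input).
sentence = [
    Token("small", 1, "Atr"),
    Token("dog", 2, "Sb"),
    Token("bark", -1, "Pred"),
]
microcontexts = extract_microcontexts(sentence)
```

In a retrieval setting, such relation triples could be indexed alongside the plain terms so that a query can match a word together with its syntactic neighbours rather than the word alone.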

Language and Linguistics Compass | 2015

A Gentle Introduction to Machine Learning for Natural Language Processing: How to Start in 16 Practical Steps

Barbora Hladká; Martin Holub

We present a gentle introduction to machine learning in natural language processing. Our goal is to navigate readers through basic machine learning concepts and experimental techniques. As an illustrative example, we practically address the task of word sense disambiguation using the R software system. We focus especially on students and junior researchers who are not yet trained in experimenting with machine learning and who want to start. To some extent, the machine learning process is independent of both the task addressed and the software system used. Therefore, readers who deal with tasks from different research areas or who prefer different software systems will gain useful knowledge as well.

conference on current trends in theory and practice of informatics | 2000

Use of Dependency Microcontexts in Information Retrieval

Martin Holub

This paper focuses especially on two problems that are crucial for retrieval performance in information retrieval (IR) systems: the lack of information caused by document pre-processing and the difficulty caused by homonymous and synonymous words in natural language. The author argues that traditional IR methods, i.e. methods based on dealing with individual terms without considering their relations, can be overcome using natural language processing (NLP). In order to detect the relations among terms in sentences and make use of lemmatisation and morphological and syntactic tagging of Czech texts, the author proposes a method for constructing dependency word microcontexts that are extracted from texts fully automatically, and several ways to exploit the microcontexts to increase retrieval performance.

meeting of the association for computational linguistics | 2004

Searching for topics in a large collection of texts

Martin Holub; Jiří Semecký; Jiří Diviš

We describe an original method that automatically finds specific topics in a large collection of texts. Each topic is first identified as a specific cluster of texts and then represented as a virtual concept, which is a weighted mixture of words. Our intention is to employ these virtual concepts in document indexing. In this paper we show some preliminary experimental results and discuss directions of future work.
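To make the idea of a topic represented as a weighted mixture of words more concrete, here is a minimal Python sketch. It assumes the clustering step has already produced a cluster of texts, and the weighting used here (length-normalised word frequencies, summed over the cluster and rescaled to sum to one) is an assumption chosen for illustration, not the weighting described in the paper. The function name `virtual_concept` and the `top_n` parameter are hypothetical.

```python
from collections import Counter


def virtual_concept(cluster, top_n=5):
    """Summarise a cluster of texts as a word -> weight mapping."""
    totals = Counter()
    for text in cluster:
        counts = Counter(text.lower().split())
        length = sum(counts.values())
        for word, n in counts.items():
            # Relative frequency, so long texts do not dominate the mixture.
            totals[word] += n / length
    norm = sum(totals.values())
    weights = {word: value / norm for word, value in totals.items()}
    # Keep only the top_n heaviest words as the topic description.
    ranked = sorted(weights.items(), key=lambda kv: -kv[1])[:top_n]
    return dict(ranked)


# Toy cluster of two short "documents" (assumed input).
cluster = ["treebank annotation treebank", "treebank syntax"]
concept = virtual_concept(cluster)
```

A mapping of this kind could then be compared against the word distribution of a new document to decide whether the document should be indexed under that topic.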

text speech and dialogue | 2010

Can corpus pattern analysis be used in NLP?

Silvie Cinková; Martin Holub; Pavel Rychlý; Lenka Smejkalová; Jana Šindlerová

Corpus Pattern Analysis (CPA) [1], coined and implemented by Hanks as the Pattern Dictionary of English Verbs (PDEV) [2], appears to be the only deliberate and consistent implementation of Sinclair's concept of Lexical Item [3]. In his theoretical inquiries [4], Hanks hypothesizes that the pattern repository produced by CPA can also support the word sense disambiguation task. Although more than 670 verb entries have already been compiled in PDEV, no systematic evaluation of this ambitious project has been reported yet. Assuming that the Sinclairian concept of the Lexical Item is correct, we started to closely examine PDEV with its possible NLP application in mind. The experiments presented in this paper were performed on a pilot sample of English verbs to provide a first reliable view on whether humans can agree in assigning PDEV patterns to verbs in a corpus. In conclusion, we suggest procedures for the future development of PDEV.

conference of the european chapter of the association for computational linguistics | 2012