Karel Pala
Masaryk University
Publication
Featured research published by Karel Pala.
meeting of the association for computational linguistics | 2007
Karel Pala; Dana Hlaváčková
In this paper we describe the enrichment of Czech WordNet with derivational relations, which in highly inflectional languages like Czech form typical derivational nests (or subnets). Derivational relations are mostly semantic in nature, and their regularity in Czech allows us to add them to the WordNet almost automatically. For this purpose we have used the derivational version of the morphological analyzer Ajka, which handles the basic and most productive derivational relations in Czech. Using a special derivational interface developed in our NLP Lab, we have explored the semantic nature of selected noun derivational suffixes and established a set of semantically labeled derivational relations, presently 14. We have added them to the Czech WordNet and in this way enriched it with approx. 30,000 new Czech synsets. A similar enrichment of Princeton WordNet has been reported in its recently released version 3.0; we comment on the partial similarities and differences.
conference on current trends in theory and practice of informatics | 1997
Karel Pala; Pavel Rychlý; Pavel Smrž
This paper deals with the disambiguated Czech corpus DESAM, a tagged corpus that has been manually disambiguated and can be used in various applications. We discuss the structure of the corpus, the tools used for managing it, its linguistic applications, and the possible use of machine learning techniques relying on the disambiguated data. Possible ways of developing procedures for fully automatic disambiguation are considered.
language resources and evaluation | 2010
Karel Pala; Pavel Rychlý; Pavel Šmerk
Law texts, including the constitution, acts, public notices and court judgements, form a huge database of texts. As with many texts from small domains, the sublanguage used is partially restricted and differs from the general language (Czech). As a starting collection of data, the legal database Lexis, containing approx. 50,000 Czech law documents, has been chosen. Our attention is concentrated mostly on noun groups, which are the main candidates for law terms. We were able to recognize 3992 different noun groups of this kind in the selected text samples. The paper also presents the results of morphological analysis, lemmatization, tagging, disambiguation, and basic syntactic analysis of Czech law texts, as these tasks are crucial for any more sophisticated natural language processing. The verbs in legal texts have been given a preliminary exploration as well. In this respect, we explore how linguistic analysis can help identify the semantic nature of law terms.
text speech and dialogue | 2001
Shun Ha Sylvia Wong; Karel Pala
In this paper we compare a selected collection of Chinese radicals and their meanings with the Top Ontology developed in the framework of the EuroWordNet 1, 2 project (EWN). The main question is whether there are interesting relations between them and, if so, whether knowledge about them can be employed in building more adequate descriptions of natural language semantics. The results show that Chinese organizes concepts in a very different manner from the EWN Top Ontology (EWN TO). We discuss the potential implications this organization may have for the future development of EWN.
international conference on computational linguistics | 2005
Karel Pala; Radek Sedláček
In this paper, we deal with derivational (word formation) relations as they are handled by the Czech morphological module Ajka. First, we show that they represent empirically well-founded semantic relations forming small semantic networks, and then we address the problem of how to integrate them into a lexical database such as (Czech) WordNet. In this respect we examine the relation between the derivational relations and the semantic roles (deep cases) defined as Internal Language Relations in EuroWordNet. An attempt is made to match the inventory of semantic roles in EWN with the derivational (semantic) relations. We also use a tool called SAFT, which processes raw (corpus) text and uses the module Ajka to find links relating the WordNet senses to the noun and verb lemmata obtained from that text. This technique allows us to enrich Czech WordNet with the derivational subnets and represent them in an XML format. The outcome is a new kind of semantic network consisting of two layers, upper and lower, and a more powerful and efficient resource for applications such as WSD tools, web searching or information extraction.
text speech and dialogue | 2003
Karel Pala; Pavel Rychlý; Pavel Smrž
This paper presents a description of a Czech text corpus (Chyby) containing various kinds of errors: spelling, typographical, grammatical, stylistic and lexical. We explain how Chyby has been built and how the errors in it have been discovered, marked and annotated. The classification of the errors is presented and statistics on the types of errors are given. The tools for annotating the errors are also described. To the best of our knowledge, this is the first text corpus of this sort prepared for Czech.
conference of the european chapter of the association for computational linguistics | 2003
Karel Pala; Radek Sedláček; Marek Veber
One of the main goals of this paper is to describe a formal procedure linking inflectional and derivational processes in Czech and to indicate that they can be applied to other Slavonic languages if appropriate tools and resources are used. The tools developed at the NLP Laboratory FI MU have been used, particularly the morphological analyser ajka and the program I_par for processing and maintaining the morphological database.
text speech and dialogue | 1999
Karel Pala
In this paper we deal with the issue of semantic tagging of (Czech) corpus texts. An attempt has been made to take advantage of the grammatical tagging and relabel some of the tags as semantic and pragmatic. Then the notion of the enriched valency frame is introduced, which we call the lexical valency frame.
text speech and dialogue | 1999
Eva Žáčková; Karel Pala
In this paper we present a method for extracting general structures of verb groups from a tagged and fully disambiguated corpus and the subsequent use of these structures for building a formal grammar in the Prolog DCG fashion. Our goal is to apply them as rules for the analysis of Czech verb groups in non-disambiguated, grammatically tagged Czech corpus texts. The problem of recognizing discontinuous verb constituents in Czech is also addressed, and the statistical data obtained are presented.
text speech and dialogue | 2015
Karel Pala; Pavel Šmerk
The paper describes a new tool, Derivancze, which provides information on derivational relations between Czech words. After a summary of linguistic descriptions of Czech derivation, we present the structure of our data and the types of derivational relations we use. We compare our approach and results with the Czech lexical network DeriNet and, in particular, discuss the many differences between the two approaches. Our tool presently works with Czech data only, but the solution is general and can also be used for other languages.