Ján Macutek
Comenius University in Bratislava
Publication
Featured research published by Ján Macutek.
Glottotheory | 2009
Gabriel Altmann; Peter Grzybek; Bijapur D. Jayaram; Reinhard Köhler; Viktor Krupa; Ján Macutek; Regina Pustet; Ludmilla Uhlirová; Matummal N. Vidya; Ioan-Iovitz Popescu
Word frequency plays a prominent role in many scientific and applied fields. The book presents innovative research methods and new results important for language and text characterization. Based on a general theory, surprising interrelations are shown between word frequency and other linguistic properties. Interrelations between previously known methods and new characteristics such as the h-point and other measures developed in the book are investigated. Furthermore, new statistical tests are introduced.
Journal of Quantitative Linguistics | 2010
Radek Čech; Petr Pajas; Ján Macutek
Abstract The aim of the article is to introduce a new approach to verb valency analysis. This approach – full valency – observes properties of verbs which occur solely in actual language usage. The term “full valency” means that all arguments, without distinguishing complements (obligatory arguments governed by the verb) and adjuncts (optional arguments directly dependent on the predicate verb), are taken into account. Because of an expectation that full valency reflects some mechanism which governs verb behaviour in a language, hypotheses concerning (1) the distribution of full valency frames, (2) the relationship between the number of valency frames and the frequency of the verb, and (3) the relationship between the number of valency frames and verb length were tested empirically. To test the hypotheses, a Czech syntactically annotated corpus – the Prague Dependency Treebank – was used.
Journal of Quantitative Linguistics | 2013
Ján Macutek; Gejza Wimmer
Abstract The paper questions the use of the Pearson chi-square goodness-of-fit test for discrete models in linguistics. It is argued that stochastic independence, one of the necessary conditions for a correct application of the test, is not realistic for linguistic data. Several alternatives (computational and empirical approaches) are suggested. Advantages and drawbacks of the alternatives are discussed.
Journal of Quantitative Linguistics | 2007
Ján Macutek; Gabriel Altmann
Abstract We derive a mathematically based method for switching from continuous to discrete linguistic models and back. Several examples are presented. A general algorithmic approach is suggested.
Journal of Quantitative Linguistics | 2007
Agnieszka Kułacka; Ján Macutek
Abstract The aim of the article is to present the most recent outcomes of investigation into the Menzerath-Altmann law. We address the formula of the law introduced by Gabriel Altmann and discuss its limits, which lead to a formula defined for discrete data. We also compare the results of research on the law in Polish syntax obtained with the formulae for discrete data and for continuous data.
Archive | 2015
Arjuna Tuzzi; Martina Benesová; Ján Macutek
Quantitative Linguistics is a rapidly developing discipline covering more and more areas of linguistic and textological research. The book represents an overview of the state of the art in Quantitative Linguistics, its scope and reach. Some of the topics: linguistic laws, frequency analyses, synergetic models of language, networks, part-of-speech systems, authorship attribution, polyfunctionality and polysemy, and opinion target identification.
Journal of Quantitative Linguistics | 2016
Michaela Koščová; Ján Macutek; Emmerich Kelih
Abstract The Ord graph is a simple graphical method for displaying frequency distributions of data or theoretical distributions in the two-dimensional plane. Its coordinates are proportions of the first three moments, either empirical or theoretical. A modification of the Ord graph based on proportions of indices of qualitative variation is presented. Such a modification makes the graph applicable also to categorical data. In addition, the indices are normalized with values between 0 and 1, which enables comparison of data files divided into different numbers of categories. Both the original and the new graph are used to display grapheme frequencies in eleven Slavic languages. As the original Ord graph requires an assignment of numbers to the categories, graphemes are ordered by decreasing frequency. Data are taken from parallel corpora; in the present instance these are grapheme frequencies from a Russian novel and its translations into ten other Slavic languages. Cluster analysis is then applied to the graph coordinates. While the original graph yields results which are not linguistically interpretable, its modification reveals meaningful relations among the languages.
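The original Ord graph described above can be illustrated with a short sketch. It assumes the usual form of Ord's coordinates, I = m2/m1 and S = m3/m2, where m1 is the mean rank and m2, m3 are the second and third central moments, with moments taken as population moments (division by the total count). The function name ord_coordinates is hypothetical, and the modified graph based on indices of qualitative variation is not reproduced here.

```python
def ord_coordinates(frequencies):
    """Return Ord's (I, S) for a list of category frequencies.

    Assumption: categories are ranked by decreasing frequency and
    numbered 1..K, as the abstract describes for graphemes.
    """
    ranked = sorted(frequencies, reverse=True)
    total = sum(ranked)
    if total == 0:
        raise ValueError("frequencies must not sum to zero")

    # Mean rank (first raw moment).
    mean = sum(r * f for r, f in enumerate(ranked, start=1)) / total
    # Second and third central moments (population versions).
    m2 = sum(f * (r - mean) ** 2 for r, f in enumerate(ranked, start=1)) / total
    m3 = sum(f * (r - mean) ** 3 for r, f in enumerate(ranked, start=1)) / total
    if m2 == 0:
        raise ValueError("variance is zero; S is undefined")

    return m2 / mean, m3 / m2


# Toy frequencies, not data from the paper.
I, S = ord_coordinates([1, 3, 2])
print(I, S)
```

Each language would contribute one (I, S) point, and the points could then be passed to a cluster analysis.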
Glottotheory | 2010
Ioan-Iovitz Popescu; Ján Macutek; Gabriel Altmann
Computing the vocabulary richness of a text has always been beset by two serious difficulties: the indicators usually depended on N, the text size (= number of tokens in the text), and the variance of the vocabulary (Var(V)) could not be derived and computed. There have been several attempts to overcome this difficulty, for example in Ejiri, Smith (1993), who normalized the relationship between V and N and proposed a useful but non-testable indicator. In our previous work (cf. Popescu, Mačutek, Altmann 2009; Popescu et al. 2010) we tried to bring into the computations the arc length formed between the individual frequencies of the rank-frequency distribution. In this case the vocabulary of word-forms or lemmas represents simply the inventory (= number of types). Since the maximal value of the arc length can easily be computed, the arc length can be relativized and its variance can be derived. An asymptotic test is possible. Nevertheless, the arc length also increases with increasing text length, so ultimately one must try to normalize it in such a way that N plays only the role of a constant; in such a case an indicator based on the rank-frequency distribution can be used both for measuring style differences and for measuring differences between languages, i.e. for typological purposes. In the past, there were frequent attempts to stabilize or normalize the TTR and to create an indicator of vocabulary richness, but the impact of text size could not be eliminated. The problems connected with these endeavours are shown in Wimmer, Altmann (1999). In the present paper we do not touch either TTR or vocabulary richness but concentrate on the rank-frequency distribution alone. Let V be the vocabulary of word-forms in a text, {f1,f2,...fV} be the sequence of ranked frequencies of word forms, N the text length (= number of tokens). Let the arc length be defined as
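The abstract breaks off here, before the formula. As a hedged illustration only, the sketch below assumes that the arc length is the sum of Euclidean distances between consecutive points (r, f_r) of the ranked frequency sequence, i.e. the sum over r = 1..V-1 of sqrt((f_r - f_{r+1})^2 + 1). This definition is an assumption and is not taken from the text above; the function name arc_length is hypothetical.

```python
import math


def arc_length(frequencies):
    """Arc length of a rank-frequency sequence.

    Assumption: consecutive ranks are one unit apart, so each segment
    has length sqrt((f_r - f_{r+1})**2 + 1).
    """
    ranked = sorted(frequencies, reverse=True)
    return sum(
        math.sqrt((a - b) ** 2 + 1)
        for a, b in zip(ranked, ranked[1:])
    )


# Toy frequencies, not data from the paper.
print(arc_length([1, 3, 2]))
```

A text with a single word type has no segments, so the function returns 0 in that case.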
Theory of Probability and Its Applications | 2006
Ján Macutek
Let random variables X^{\ast}, X
Archive | 2018
Radek Čech; Jiří Milička; Ján Macutek; Michaela Koščová; Markéta Lopatková