Václav Novák
Charles University in Prague
Publication
Featured research published by Václav Novák.
international workshop/conference on parsing technologies | 2005
Keith B. Hall; Václav Novák
We present a corrective model for recovering non-projective dependency structures from trees generated by state-of-the-art constituency-based parsers. The continuity constraint of these constituency-based parsers makes it impossible for them to posit non-projective dependency trees. Analysis of the types of dependency errors made by these parsers on a Czech corpus shows that the correct governor is likely to be found within a local neighborhood of the governor proposed by the parser. Our model, based on a MaxEnt classifier, improves overall dependency accuracy by 0.7% (a 4.5% reduction in error) with over 50% accuracy for non-projective structures.
text speech and dialogue | 2007
Václav Novák; Zdeněk Žabokrtský
In this paper we present the results of our experiments with modifications of the feature set used in the Czech adaptation of the Maximum Spanning Tree parser. First, we show how new feature templates improve parsing accuracy; second, we reduce the dimensionality of the feature space to make parsing more efficient without sacrificing accuracy.
workshop on statistical machine translation | 2009
Ondřej Bojar; David Mareček; Václav Novák; Martin Popel; Jan Ptáček; Jan Rouš; Zdeněk Žabokrtský
We describe two systems for English-to-Czech machine translation that took part in the WMT09 translation task. One of the systems is a tuned phrase-based system and the other one is based on a linguistically motivated analysis-transfer-synthesis approach.
annual meeting of the special interest group on discourse and dialogue | 2009
Giang Linh Ngụy; Václav Novák; Zdeněk Žabokrtský
In this paper we compare two machine learning approaches to the task of pronominal anaphora resolution: a conventional classification system based on C5.0 decision trees, and a novel perceptron-based ranker. We use coreference links annotated in the Prague Dependency Treebank 2.0 for training and evaluation. The perceptron system achieves an f-score of 79.43% on recognizing coreference of personal and possessive pronouns, which clearly outperforms the classifier and is the best result reported on this data set so far.
linguistic annotation workshop | 2009
Václav Novák; Magda Razímová
We present a new method for automated discovery of inconsistencies in complex manually annotated corpora. The proposed technique is based on the Apriori algorithm for mining association rules from datasets. By setting appropriate parameters for the algorithm, we were able to automatically infer highly reliable annotation rules, and we then searched for records that violate the inferred rules. We show that the violations found by this simple technique are often caused by an annotation error. We evaluate the technique on the hand-annotated corpus PDT 2.0, present the error analysis, and show that of the first 100 detected nodes, 20 contained an annotation error.
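The rule-then-violation idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it mines only single-antecedent rules with support and confidence thresholds (the first level of an Apriori-style search, not the full level-wise algorithm), and the attribute names, values, and thresholds below are hypothetical toy data.

```python
from collections import Counter
from itertools import permutations


def mine_rules(records, min_support=3, min_confidence=0.9):
    """Return rules (antecedent, consequent, confidence) over (attr, value) items.

    Simplified assumption: one item on each side of the rule.
    """
    item_counts = Counter()
    pair_counts = Counter()
    for rec in records:
        items = sorted(rec.items())
        item_counts.update(items)
        # Ordered pairs of distinct items that co-occur in one record.
        pair_counts.update(permutations(items, 2))
    rules = []
    for (antecedent, consequent), both in pair_counts.items():
        confidence = both / item_counts[antecedent]
        if both >= min_support and confidence >= min_confidence:
            rules.append((antecedent, consequent, confidence))
    return rules


def find_violations(records, rules):
    """Return (record index, rule) pairs where the antecedent holds but the consequent does not."""
    violations = []
    for idx, rec in enumerate(records):
        for (a_attr, a_val), (c_attr, c_val), _ in rules:
            if rec.get(a_attr) == a_val and rec.get(c_attr) != c_val:
                violations.append((idx, ((a_attr, a_val), (c_attr, c_val))))
    return violations


# Hypothetical toy corpus: nine consistent records and one suspicious one.
records = [{"pos": "verb", "label": "PRED"} for _ in range(9)]
records.append({"pos": "verb", "label": "ACT"})

rules = mine_rules(records, min_support=3, min_confidence=0.85)
suspects = find_violations(records, rules)
```

In this toy run the only flagged record is the last one, which breaks the rule `pos=verb -> label=PRED`. As the abstract notes, a flagged record is only a candidate error and still needs manual review.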
north american chapter of the association for computational linguistics | 2007
Václav Novák
We present a demonstration of an annotation tool designed to annotate texts into a semantic network formalism called MultiNet. The tool is based on a Java Swing GUI and allows the annotators to edit nodes and relations in the network, as well as links between the nodes in the network and the nodes from the previous layer of annotation. The data processed by the tool in this presentation are from the English version of the Wall Street Journal.
Trends in Parsing Technology | 2010
Keith B. Hall; Václav Novák
This chapter presents a discriminative modeling technique which corrects the errors made by an automatic parser. The model is similar to reranking; however, it does not require the generation of k-best lists as in McDonald et al. (2005), McDonald and Pereira (2006), Charniak and Johnson (2005), and Hall (2007). The corrective strategy employed by our technique is to explore a set of candidate parses which are constructed by making structurally local perturbations to an automatically generated parse tree. We train a model which makes local, corrective decisions in order to optimize for parsing performance. The technique is independent of the parser generating the first set of parses. We show in this chapter that the only requirement for this technique is the ability to define a local neighborhood in which a large number of the errors occur.
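The candidate-generation step can be sketched in Python as follows. This is an illustration under stated assumptions, not the chapter's method: the neighborhood is assumed to be all nodes within `max_dist` tree edges of the proposed governor, the `heads` list encoding and the function name are hypothetical, and the trained model that chooses among the candidates is omitted.

```python
from collections import deque


def candidate_governors(heads, node, max_dist=2):
    """Return nodes within max_dist tree edges of node's proposed governor.

    heads[i] is the proposed governor of node i; the root has governor -1.
    The node itself and its descendants are excluded, so reattaching the
    node to any candidate still yields a tree.
    """
    n = len(heads)
    adjacent = [[] for _ in range(n)]   # undirected tree edges
    children = [[] for _ in range(n)]
    for child, head in enumerate(heads):
        if head >= 0:
            adjacent[child].append(head)
            adjacent[head].append(child)
            children[head].append(child)

    # Collect the subtree rooted at `node`; these can never be its governor.
    blocked = set()
    stack = [node]
    while stack:
        cur = stack.pop()
        blocked.add(cur)
        stack.extend(children[cur])

    start = heads[node]
    if start < 0:
        return []  # the root has no proposed governor to correct

    # Breadth-first search outward from the proposed governor.
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if dist[cur] == max_dist:
            continue
        for nxt in adjacent[cur]:
            if nxt not in dist and nxt not in blocked:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return sorted(dist)


# Toy tree: node 0 is the root; 1 and 4 attach to 0; 2 and 3 attach to 1.
heads = [-1, 0, 1, 1, 0]
candidates = candidate_governors(heads, node=2, max_dist=2)
```

A corrective classifier would then score each candidate for the node and reattach it to the best one. Because the candidates depend only on the proposed tree, the step works with any first-stage parser.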
north american chapter of the association for computational linguistics | 2009
Václav Novák; Sven Hartrumpf; Keith B. Hall
We introduce a large-scale semantic-network annotation effort based on the MultiNet formalism. Annotation is achieved via a process which incorporates several independent tools including a MultiNet graph editing tool, a semantic concept lexicon, a user-editable knowledge-base for semantic concepts, and a MultiNet parser. We present an evaluation metric for these semantic networks, allowing us to determine the quality of annotations in terms of inter-annotator agreement. We use this metric to report the agreement rates for a pilot annotation effort involving three annotators.
Proceedings of the Workshop on Frontiers in Linguistically Annotated Corpora 2006 | 2006
Václav Novák
We present a comparison of two formalisms for representing natural language utterances, namely the deep-syntactic Tectogrammatical Layer of Functional Generative Description (FGD) and a semantic formalism, MultiNet. We discuss the possible position of MultiNet in the FGD framework and present a preliminary mapping of the representational means of these two formalisms.
Archive | 2008
David Mareček; Václav Novák