Petr Pajas | Researchain

Publication

Featured research published by Petr Pajas.

workshop on statistical machine translation | 2008

TectoMT: Highly Modular MT System with Tectogrammatics Used as Transfer Layer

Zdenek Zabokrtsky; Jan Ptáček; Petr Pajas

We present a new English→Czech machine translation system combining linguistically motivated layers of language description (as defined in the Prague Dependency Treebank annotation scenario) with statistical NLP approaches.

international conference on computational linguistics | 2008

Recent Advances in a Feature-Rich Framework for Treebank Annotation

Petr Pajas; Jan Štěpánek

This paper presents recent advances in an established treebank annotation framework comprising an abstract XML-based data format, a fully customizable editor of tree-based annotations, a toolkit for all kinds of automated data processing with support for cluster computing, and a work-in-progress database-driven search engine with a graphical user interface built into the tree editor.

text speech and dialogue | 2001

The Current Status of the Prague Dependency Treebank

Eva Hajičová; Jan Hajic; Barbora Hladká; Martin Holub; Petr Pajas; Veronika Reznícková; Petr Sgall

The Prague Dependency Treebank (PDT) project is conceived of as a many-layered scenario from the point of view of the stratal annotation scheme, from the division-of-labor point of view, and with regard to the level of detail captured at the highest, tectogrammatical layer. The following aspects of the present status of the PDT are discussed in detail: the now-available PDT version 1.0, annotated manually at the morphemic and analytic layers, including the recent experience with post-annotation checking; the ongoing effort of tectogrammatical layer annotation, with specific attention to the so-called model collection; and two different areas of exploitation of the PDT, for linguistic research purposes and for information retrieval application purposes.

Journal of Quantitative Linguistics | 2010

Full Valency. Verb Valency without Distinguishing Complements and Adjuncts

Radek Čech; Petr Pajas; Ján Macutek

The aim of the article is to introduce a new approach to verb valency analysis. This approach – full valency – observes properties of verbs which occur solely in actual language usage. The term “full valency” means that all arguments, without distinguishing complements (obligatory arguments governed by the verb) and adjuncts (optional arguments directly dependent on the predicate verb), are taken into account. Because of an expectation that full valency reflects some mechanism which governs verb behaviour in a language, hypotheses concerning (1) the distribution of full valency frames, (2) the relationship between the number of valency frames and the frequency of the verb, and (3) the relationship between the number of valency frames and verb length were tested empirically. To test the hypotheses, a Czech syntactically annotated corpus – the Prague Dependency Treebank – was used.

linguistic annotation workshop | 2009

The Coding Scheme for Annotating Extended Nominal Coreference and Bridging Anaphora in the Prague Dependency Treebank

Anna Nedoluzhko; Jiří Mírovský; Petr Pajas

The present paper outlines an ongoing project of annotation of the extended nominal coreference and the bridging anaphora in the Prague Dependency Treebank. We describe the annotation scheme with respect to the linguistic classification of coreferential and bridging relations and focus also on details of the annotation process from the technical point of view. We present methods of helping the annotators: pre-annotation and several useful features implemented in the annotation tool. Our measurement of inter-annotator agreement is focused on the improvement of the annotation guidelines; we present results of three subsequent measurements of the agreement.

spoken language technology workshop | 2008

PDTSL: An annotated resource for speech reconstruction

Jan Hajic; Silvie Cinková; Marie Mikulová; Petr Pajas; Jan Ptáček; Josef Toman; Zdenka Uresová

We present a description of a new resource (Prague Dependency Treebank of Spoken Language) being created for English and Czech to be used for the task of speech understanding, broad natural language analysis for dialog systems and other speech-related tasks, including speech editing. The resources we have created so far contain audio and a standard transcription of spontaneous speech, but as a novel layer, we add an edited (“reconstructed”) version of the spoken utterances. These edits go beyond the scope of current speech reconstruction efforts in that we allow, on top of the usual deletions of speech artifacts, fillers, etc., also for word modifications, insertions and word order changes. We have used both monologue and dialogue recordings in English and Czech to verify the feasibility of such transcription. We have also assessed the quality of the resulting annotation since the relative freedom of the editing raises an issue of what a “correct” annotation is.

Glottotheory | 2009

Pitfalls of the Transitivity Hypothesis: Transitivity in Conversation and Written Language in Czech

Radek Čech; Petr Pajas

The aim of the article is to test empirically predictions formulated in the Transitivity Hypothesis framework. Methodological problems of the original approach are discussed and some solutions are offered. For the testing of the hypotheses two corpora of Czech were used (Prague Spoken Corpus and Prague Dependency Treebank). The results question both the predicted impact of the language form on transitivity and, more importantly, the concept of the Transitivity Hypothesis in general.

text speech and dialogue | 2017

PDTSC 2.0 - Spoken Corpus with Rich Multi-layer Structural Annotation

Marie Mikulová; Jiří Mírovský; Anja Nedoluzhko; Petr Pajas; Jan Štěpánek; Jan Hajic

We present a richly annotated spoken language resource, the Prague Dependency Treebank of Spoken Czech 2.0, the primary purpose of which is to serve for speech-related NLP tasks. The treebank features several novel annotation schemas close to the audio and transcript, and the morphological, syntactic and semantic annotation corresponds to the family of Prague Dependency Treebanks; it could thus also be used for linguistic studies, including comparative studies regarding text and speech. The most novel feature is our approach to syntactic annotation, which differs from other similar corpora such as Treebank-3 [8] in that it does not attempt to impose syntactic structure over the input; instead, it includes one more layer which edits the literal transcript to fluent Czech while keeping the original transcript explicitly aligned with the edited version. This allows the morphological, syntactic and semantic annotation to be deterministically and fully mapped back to the transcript and audio. It brings new possibilities for modeling morphology, syntax and semantics in spoken language – either at the original transcript with mapped annotation, or at the new layer after (automatic) editing. The corpus is publicly and freely available.

text speech and dialogue | 2000

Evaluation of Tectogrammatical Annotation of PDT

Eva Hajičová; Petr Pajas

Two phases of an evaluation of annotating a Czech text corpus on an underlying syntactic level are described and the results are compared and analysed.

language resources and evaluation | 2006