Publication


Featured research published by Filip Graliński.


Computational Linguistics - Applications | 2013

PSI-Toolkit: A Natural Language Processing Pipeline

Filip Graliński; Krzysztof Jassem; Marcin Junczys-Dowmunt

The paper presents the main ideas and the architecture of the open-source PSI-Toolkit, a set of linguistic tools being developed within a project financed by the Polish Ministry of Science and Higher Education. The toolkit is intended for experienced language engineers as well as casual users without a technical background. The former group is provided with a set of libraries that may be included in their Perl, Python or Java applications. The needs of the latter group should be satisfied by a user-friendly web interface. The main feature of the toolkit is its data structure, the so-called PSI-lattice, which assembles annotations delivered by all PSI tools. This cohesive architecture allows the user to invoke a series of processes with one command. The command has the form of a pipeline of instructions resembling the shell command pipelines known from Linux-based systems.


International Multiconference on Computer Science and Information Technology | 2010

Matura Evaluation experiment based on human evaluation of machine translation

Aleksandra Wojak; Filip Graliński

A Web-based system for human evaluation of machine translation is presented in this paper. The system is based on comprehension tests similar to the ones used in Polish matura (secondary school-leaving) examinations. The results of preliminary experiments for Polish-English and English-Polish machine translation evaluation are presented and discussed.


Language and Technology Conference | 2015

RetroC - A Corpus for Evaluating Temporal Classifiers.

Filip Graliński; Piotr Wierzchoń

We present a corpus for training and evaluating systems for the dating of Polish texts. A number of baselines (using year references, knowledge of spelling reforms and birth years) are given for the temporal classification task. We also show that the problem can be viewed as a regression problem and that a standard supervised learning tool (Vowpal Wabbit) can be applied. So far, the best result has been achieved with supervised learning using word tokens and character 5-grams as features. In addition, an error analysis of the results obtained with the best solution is presented in this paper.
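The abstract names a year-reference baseline and character 5-gram features without spelling out either. The sketch below is only an illustration of what they might look like, not the authors' implementation. The rule "guess the latest plausible year mentioned in the text", the function names, and the default year bounds are all assumptions made for this example.

```python
import re
from typing import List, Optional

# Four-digit numbers that are not part of a longer digit run.
YEAR_PATTERN = re.compile(r"(?<!\d)(\d{4})(?!\d)")


def guess_year_from_references(
    text: str, earliest: int = 1800, latest: int = 2020
) -> Optional[int]:
    """Hypothetical baseline: a text cannot predate the latest year it mentions.

    The bounds `earliest` and `latest` are illustrative assumptions.
    """
    years = [int(y) for y in YEAR_PATTERN.findall(text)]
    plausible = [y for y in years if earliest <= y <= latest]
    return max(plausible) if plausible else None


def char_ngrams(text: str, n: int = 5) -> List[str]:
    """Return all overlapping character n-grams (5-grams by default)."""
    return [text[i : i + n] for i in range(len(text) - n + 1)]
```

Under these assumptions, a text mentioning 1918 and 1920 would be dated 1920, and a text with no plausible year yields `None`, at which point a learned regressor over features such as `char_ngrams` output would have to take over.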


Text, Speech and Dialogue | 2012

Mining the Web for Idiomatic Expressions Using Metalinguistic Markers

Filip Graliński

In this paper, methods for the identification and delimitation of idiomatic expressions in large Web corpora are presented. The proposed methods are based on the observation that idiomatic expressions are sometimes accompanied by metalinguistic expressions, e.g. the word “proverbial”, the expression “as they say” or quotation marks. Even though the frequency of such idiom-related metalinguistic markers is not very high, it is possible to identify new idiomatic expressions given a sufficiently large corpus (only type identification of idiomatic expressions is discussed here, not token identification). We propose to combine infrequent but reliable idiom-related markers (such as the word “proverbial”) with frequent but unreliable markers (such as quotation marks). The former can be used for the identification of idiom candidates, the latter for their delimitation. Experiments estimating the upper bound on the recall of the proposed methods are also presented. Even though the paper is concerned with the identification and delimitation of Polish idiomatic expressions, the approaches proposed here should also be feasible for other languages with sufficiently large Web corpora, English in particular.
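The combination of a reliable marker with an unreliable one can be illustrated with a small sketch. This is a simplified reading of the idea for English, not the method from the paper. It assumes that the marker "proverbial" is immediately followed by a quoted phrase, and the function and pattern names are invented for the example.

```python
import re
from typing import List

# Reliable marker ("proverbial") identifies a candidate;
# the quotation marks that follow delimit it.
MARKER_THEN_QUOTE = re.compile(
    r'\bproverbial\s+["“„]([^"”“]+)["”“]', flags=re.IGNORECASE
)


def idiom_candidates(text: str) -> List[str]:
    """Return quoted phrases that directly follow the word 'proverbial'."""
    return [m.group(1).strip() for m in MARKER_THEN_QUOTE.finditer(text)]
```

Quoted phrases that are not preceded by the marker are ignored here, which reflects the point that quotation marks alone are too unreliable for identification. A real system would have to handle many more marker types and positions.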


Language and Technology Conference | 2009

Acquiring bilingual lexica from keyword listings

Filip Graliński; Krzysztof Jassem; Roman Kurc

In this paper, we present a new method for acquiring bilingual dictionaries from on-line text corpora. The method merges rule-based techniques for obtaining dictionaries from structured data, such as paper dictionaries (in electronic form) or on-line glossaries, with methods used by alignment tools, such as GIZA. The basic idea is to search for anchor words such as “abstract” or “keywords” followed by their equivalents in another language. Text fragments that follow anchor words are likely to supply new entries for bilingual lexica.
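As a rough illustration of the anchor-word idea, the sketch below looks for two keyword listings in a document and pairs their items. It is not the method described in the paper: the positional pairing is a naive assumption that stands in for the alignment step, and the anchor strings "Keywords" and "Słowa kluczowe" as well as all function names are chosen only for this example.

```python
import re
from typing import List, Tuple


def listing_after(anchor: str, text: str) -> List[str]:
    """Return the comma- or semicolon-separated items that follow `anchor:`."""
    match = re.search(
        re.escape(anchor) + r"\s*:\s*(.+)", text, flags=re.IGNORECASE
    )
    if not match:
        return []
    items = [part.strip().rstrip(".") for part in re.split(r"[;,]", match.group(1))]
    return [item for item in items if item]


def pair_keywords(
    text: str, src_anchor: str = "Keywords", tgt_anchor: str = "Słowa kluczowe"
) -> List[Tuple[str, str]]:
    """Pair items positionally; assumes both listings use the same order."""
    src = listing_after(src_anchor, text)
    tgt = listing_after(tgt_anchor, text)
    if not src or len(src) != len(tgt):
        return []  # no safe positional pairing available
    return list(zip(src, tgt))
```

Listings of unequal length are rejected outright in this sketch. In practice authors reorder or omit keywords, which is presumably where statistical alignment becomes necessary.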


International Multiconference on Computer Science and Information Technology | 2009

Looking for new words out there

Filip Graliński; Marcin Walas

This paper presents methods for the automatic extraction of new lexemes from Web corpora in order to obtain a comprehensive list of Polish words. We present the following methods: Reverse Derivation, Compound Formation, List Extraction, extraction of adjectives from addresses, and Polonisation of English words. We then describe the process of correcting errors that arise from the application of automated methods. A quantitative evaluation of the project and a presentation of its results are given.


Text, Speech and Dialogue | 2006

Some methods of describing discontinuity in Polish and their cost-effectiveness

Filip Graliński

The aim of this paper is to present some methods of handling discontinuity (and freer word order in general) within a medium-level grammatical framework. A context-free formalism and the “backbone” set of rules for verbal phrases are presented as the background for this paper. The main result consists in showing how discontinuous infinitive phrases and discontinuous noun phrases (interrogative phrases included) can be theoretically covered within the introduced formalism and similar grammatical frameworks. The second result reported in this paper is the cost-effectiveness analysis of introducing discontinuity rules into a medium-level grammatical framework: it turns out that attempting to cover some types of discontinuity may be unprofitable within a given grammatical framework. Although only examples from the Polish language are discussed, the described solutions are likely to be relevant for other languages with similar word order properties.


Computational Methods in Science and Technology | 2018

Re-research.pl: where Humanities Meet Computer Science

Daniel Dzienisiewicz; Łukasz Borchmann; Piotr Wierzchoń; Filip Graliński

The article discusses selected projects from the field of digital humanities realised by the Re-research.pl group. The group consists of researchers from the Institute of Linguistics and the Department of Natural Language Processing at Adam Mickiewicz University, Poznań, Poland. The projects discussed include: National Photocorpus of Polish; Discovermat; Korea, Koreans and ‘Koreanity’ in the digitised Polish press of the 20th century; Biography of the Nation; 100,000 ministories; Gonito.net; and 50,000 words. Domain and chronologisation index. However, the main focus of the article is the interdisciplinary popular-science blog Re-research.pl. The daily blog posts include texts on a variety of subjects, ranging from linguistics, history and folklore to computer science. Selected posts and categories of posts are discussed, such as chronologisation challenges, texts devoted to folklore and materials on the structure of text files. Apart from providing daily analyses, the blog promotes other projects and serves as a dialogue platform for representatives of various fields.


Proceedings of the 2nd International Conference on Digital Access to Textual Cultural Heritage | 2017

The RetroC challenge: how to guess the publication year of a text?

Filip Graliński; Rafał Jaworski; Łukasz Borchmann; Piotr Wierzchoń

This article describes research in automatic content-based temporal classification of texts. Experiments are carried out on a set of texts from Polish digital libraries, dated between 1814 and 2013. Following successful research in the field of temporal classification, this work aims at creating an automatic dating mechanism to be used in situations where the publication date of a text is unknown. Automatic publication date assessment by a computer system can prove useful for researchers from various fields of the humanities, such as history (incl. history of language), culture-historical archaeology, sociology or anthropology.


Text, Speech and Dialogue | 2016

Vive la Petite Différence

Filip Graliński; Rafał Jaworski; Łukasz Borchmann; Piotr Wierzchoń

This article describes a series of experiments on gender attribution of Polish texts. The research was conducted on the publicly available corpus called “He Said She Said”, consisting of a large number of short texts from the Polish version of Common Crawl. As opposed to other experiments on gender attribution, this research takes on the task of classifying relatively short texts authored by many different people.

Collaboration


Dive into Filip Graliński's collaborations.

Top Co-Authors

Piotr Wierzchoń

Adam Mickiewicz University in Poznań


Krzysztof Jassem

Adam Mickiewicz University in Poznań


Marcin Junczys-Dowmunt

Adam Mickiewicz University in Poznań


Rafał Jaworski

Adam Mickiewicz University in Poznań


Łukasz Borchmann

Adam Mickiewicz University in Poznań


Tomasz Kowalski

Adam Mickiewicz University in Poznań


A. Wagner

Adam Mickiewicz University in Poznań


Aleksandra Wojak

Adam Mickiewicz University in Poznań


Marcin Walas

Adam Mickiewicz University in Poznań
