Marcin Kuta

AGH University of Science and Technology

Publications

Featured research published by Marcin Kuta.

Soft Computing | 2010

Clustering Polish Texts with Latent Semantic Analysis

Marcin Kuta; Jacek Kitowski

Document clustering is an important technique in Natural Language Processing (NLP). The paper presents the performance of partitional and agglomerative algorithms applied to clustering a large number of Polish newspaper articles. We investigate different document representations. The focus of the paper is on the applicability of Latent Semantic Analysis (LSA) to such clustering for Polish.
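
The abstract does not say how the LSA space was computed. As a minimal sketch, assuming raw term counts (no tf-idf weighting) and whitespace tokenization, the code below derives k-dimensional document vectors from the eigenvectors of AᵀA, whose eigenvalues are the squared singular values of the term-document matrix A. A partitional or agglomerative clusterer would then run on these vectors, typically with cosine similarity. Power iteration stands in for a proper SVD routine only to keep the example dependency-free; all function names and the toy documents are hypothetical.

```python
import math
import random
from collections import Counter


def term_document_matrix(docs):
    """Terms x documents matrix of raw counts (whitespace tokenization)."""
    counts = [Counter(doc.split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[t] for c in counts] for t in vocab]


def top_eigenpairs(g, k, iters=1000, seed=0):
    """Top-k eigenpairs of a symmetric PSD matrix by power iteration with deflation."""
    n = len(g)
    g = [row[:] for row in g]  # copy: deflation modifies the matrix
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        v = [rng.random() for _ in range(n)]
        for _ in range(iters):
            w = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left: the matrix has rank < k
                return pairs
            v = [x / norm for x in w]
        lam = sum(v[i] * g[i][j] * v[j] for i in range(n) for j in range(n))
        pairs.append((lam, v))
        for i in range(n):  # deflate so the next pass finds the next eigenpair
            for j in range(n):
                g[i][j] -= lam * v[i] * v[j]
    return pairs


def lsa_document_vectors(docs, k):
    """k-dimensional LSA coordinates of each document (rows of V_k * Sigma_k)."""
    _, a = term_document_matrix(docs)
    n = len(docs)
    # The eigenvalues of A^T A are the squared singular values of A,
    # and its eigenvectors are the right singular vectors (columns of V).
    gram = [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]
    pairs = top_eigenpairs(gram, k)
    return [[math.sqrt(max(lam, 0.0)) * v[d] for lam, v in pairs] for d in range(n)]


def cosine(x, y):
    dot = sum(p * q for p, q in zip(x, y))
    nx, ny = math.sqrt(sum(p * p for p in x)), math.sqrt(sum(q * q for q in y))
    return dot / (nx * ny) if nx and ny else 0.0


docs = [
    "tagger corpus tagset tagger",
    "corpus tagset tagging accuracy",
    "tape storage drive tape",
    "storage drive replica",
]
vecs = lsa_document_vectors(docs, k=2)
print(round(cosine(vecs[0], vecs[1]), 3))  # same topic: close to 1
print(round(cosine(vecs[0], vecs[2]), 3))  # different topics: close to 0
```

In a real setting the vocabulary would be lemmatized (important for an inflective language like Polish) and k would be far larger than 2.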

International Conference on Computational Science | 2008

Accuracy of Baseline and Complex Methods Applied to Morphosyntactic Tagging of Polish

Marcin Kuta; Michał Wrzeszcz; Paweł Chrząszcz; Jacek Kitowski

The paper presents baseline and complex part-of-speech taggers applied to the modified corpus of the Frequency Dictionary of Contemporary Polish. The accuracy of 5 baseline part-of-speech taggers is reported. Based on these results, complex methods are developed: thematic split and attribute split methods are proposed and evaluated. Finally, the tagging accuracy of voting methods is evaluated. The most accurate baseline taggers are SVMTool (for a simple tagset) and fnTBL (for a complex tagset). The voting method called Total Precision achieves the highest accuracy among all examined methods.
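
The abstract names Total Precision without defining it. A minimal sketch, assuming that under Total Precision each tagger's vote is weighted by its overall accuracy on held-out data (an assumption, not a definition taken from the paper); the tagger names, accuracies and tags are invented:

```python
from collections import defaultdict


def weighted_vote(proposals, weights):
    """Pick one tag for a token from several taggers' proposals.

    proposals: tagger name -> tag proposed for this token
    weights:   tagger name -> weight of that tagger's vote
    Ties are broken alphabetically so the result is deterministic.
    """
    scores = defaultdict(float)
    for tagger, tag in proposals.items():
        scores[tag] += weights[tagger]
    return max(sorted(scores), key=lambda tag: scores[tag])


# Hypothetical held-out accuracies of three taggers (not figures from the paper).
accuracy = {"A": 0.93, "B": 0.91, "C": 0.90}
equal = {name: 1.0 for name in accuracy}

token = {"A": "subst", "B": "adj", "C": "ger"}  # three-way disagreement
print(weighted_vote(token, equal))     # adj (plain voting: a tie, broken alphabetically)
print(weighted_vote(token, accuracy))  # subst (follows the most accurate tagger)
```

With equal weights the function reduces to simple majority voting, which makes it easy to compare the two schemes on the same tagger outputs.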

International Conference on Artificial Intelligence and Soft Computing | 2014

Optimisation of Character n-gram Profiles Method for Intrinsic Plagiarism Detection

Marcin Kuta; Jacek Kitowski

The focus of the paper is on improving intrinsic plagiarism detection. The paper investigates and improves the performance of the character n-gram profiles method proposed by Stamatatos by tuning its parameter settings and proposing new modifications and rich feature sets. Results are reported on the PAN-PC09 and PAN-PC11 corpora, which are especially well suited for this task and were previously used in the Plagiarism Analysis, Authorship Identification, and Near-Duplicate Detection (PAN) competitions. We raised the overall plagdet score from 24.67% to 33.41% for the PAN-PC09 corpus and from 18.83% to 26.66% for the PAN-PC11 corpus.
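
The abstract does not restate the baseline method. As a rough sketch, assuming it compares the character trigram profile of each sliding window with the profile of the whole document using the normalized dissimilarity below, and flags windows whose score exceeds mean + a·std, the core could look like this. Window length, step, n and a are hypothetical defaults, not the tuned settings or the new features reported in the paper.

```python
from collections import Counter
from statistics import mean, pstdev


def ngram_profile(text, n=3):
    """Normalized frequencies of the character n-grams of text."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()}


def dissimilarity(window_prof, doc_prof):
    """Normalized dissimilarity of a window profile from the document profile.

    Sums (2 (fa - fb) / (fa + fb))^2 over the window's n-grams and divides
    by 4 times their number. Every term is at most 4, so the result is in [0, 1].
    """
    total = 0.0
    for g, fa in window_prof.items():
        fb = doc_prof.get(g, 0.0)
        total += (2 * (fa - fb) / (fa + fb)) ** 2
    return total / (4 * len(window_prof))


def style_change(text, window=200, step=100, n=3, a=2.0):
    """Score each sliding window against the whole document; flag outliers.

    A window is flagged when its score exceeds mean + a * standard deviation.
    """
    doc = ngram_profile(text, n)
    scores = [
        (start, dissimilarity(ngram_profile(text[start:start + window], n), doc))
        for start in range(0, max(len(text) - window, 0) + 1, step)
    ]
    values = [s for _, s in scores]
    threshold = mean(values) + a * pstdev(values)
    return scores, [start for start, s in scores if s > threshold]


normal = "the cat sat on the mat. " * 40  # 960 characters
odd = "QWXZ JKVP QWXZ JKVP " * 10  # 200 characters in a very different "style"
text = normal + odd + normal
scores, suspicious = style_change(text, a=1.0)
print(suspicious)  # start offsets of windows that overlap the inserted passage
```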

Computer Science | 2011

Benchmarking High Performance Architectures with Natural Language Processing Algorithms

Marcin Kuta; Jacek Kitowski

Natural Language Processing algorithms are resource demanding, especially when they must be tuned to an inflective language like Polish. The paper presents the time and memory requirements of part-of-speech tagging and clustering algorithms applied to two corpora of the Polish language. The algorithms are benchmarked on three high performance platforms of different architectures. Additionally, sequential versions and OpenMP implementations of clustering algorithms are compared.

Parallel Processing and Applied Mathematics | 2005

Data Access Time Estimation for the CASTOR HSM System

Marcin Kuta; Darin Nikolow; Renata Slota; Jacek Kitowski

The paper presents a system for estimating the access time for data stored on the CASTOR Hierarchical Storage Management (HSM) system developed at CERN. The estimation is based on a gray-box approach. The system consists of two modules: a Monitor and a Simulator. The Monitor obtains information about the current state of the HSM system via CASTOR API functions. The Simulator is based on event-driven simulation of the HSM system, with special attention paid to the queuing algorithm. Implementation details and test results are presented.
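
The abstract does not describe the simulator's internals. A deliberately simplified sketch, assuming requests are served first-come-first-served by identical tape drives and that each request's mount, positioning and transfer times are already known and lumped into one service time; CASTOR's real queuing algorithm and timing model are not reproduced, and all names and numbers are hypothetical. A full event-driven simulator keeps a time-ordered list of arrival and completion events; under FIFO service on identical drives that list reduces to the heap of drive-idle times used here.

```python
import heapq


def simulate_fifo(requests, n_drives):
    """Simulate FIFO service of tape read requests by identical drives.

    requests: (arrival_time, service_time) pairs in arrival order; service_time
              lumps together mounting, positioning and transfer.
    Returns the completion time of each request, in the same order.
    """
    free_at = [0.0] * n_drives  # min-heap of times at which drives become idle
    completions = []
    for arrival, service in requests:
        earliest = heapq.heappop(free_at)  # the drive that frees up first
        start = max(arrival, earliest)     # wait for both the request and a drive
        finish = start + service
        heapq.heappush(free_at, finish)
        completions.append(finish)
    return completions


def estimate_access_time(queued, n_drives, arrival, service):
    """Predicted time from a new request's arrival until its data is available."""
    return simulate_fifo(queued + [(arrival, service)], n_drives)[-1] - arrival


# Hypothetical state: 2 drives, three long reads queued at t=0, a short read at t=5.
queued = [(0, 10), (0, 10), (0, 10)]
print(simulate_fifo(queued + [(5, 2)], n_drives=2))  # [10, 10, 20, 12]
print(estimate_access_time(queued, 2, arrival=5, service=2))  # 7
```

The short read waits 5 time units for a drive even though its own service takes only 2, which is exactly the queuing effect an access-time estimator has to capture.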

International Conference on Artificial Intelligence and Soft Computing | 2016

Authorship Attribution of Polish Newspaper Articles

Marcin Kuta; Bartłomiej Puto; Jacek Kitowski

This paper examines a machine learning approach to authorship attribution of articles in the Polish language. The focus is on the effect of data volume, number of authors and thematic homogeneity on authorship attribution quality. We study the impact of feature selection under various feature selection criteria, mainly the chi-square and information gain measures, as well as the effect of combining features of different types. Results are reported for the Rzeczpospolita corpus in terms of the \(F_1\) measure.
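
The abstract names the selection criteria and the evaluation measure without giving formulas. A minimal sketch, assuming binary (presence/absence) features and the textbook definitions of chi-square, information gain and F1; how per-author chi-square scores are combined across authors (for example by their maximum) before keeping the top-ranked features is also an assumption. The toy corpus and labels are made up.

```python
import math
from collections import Counter


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for one term and one author (2x2 contingency table).

    n11: the author's documents containing the term
    n10: other authors' documents containing the term
    n01: the author's documents without the term
    n00: other authors' documents without the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    return n * (n11 * n00 - n10 * n01) ** 2 / denom if denom else 0.0


def entropy(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def information_gain(docs, labels, term):
    """Drop in author-label entropy (bits) from knowing whether term occurs."""
    h = entropy(Counter(labels).values())
    for present in (True, False):
        part = [lab for d, lab in zip(docs, labels) if (term in d) == present]
        if part:
            h -= len(part) / len(labels) * entropy(Counter(part).values())
    return h


def f1(tp, fp, fn):
    """F1 = harmonic mean of precision and recall = 2 TP / (2 TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Made-up toy corpus: each document is the set of its word types.
docs = [{"wszakże", "rząd", "ustawa"}, {"wszakże", "sejm", "ustawa"},
        {"bowiem", "mecz", "ustawa"}, {"bowiem", "liga", "gol"}]
labels = ["A", "A", "B", "B"]
print(chi_square(2, 0, 0, 2), information_gain(docs, labels, "wszakże"))  # 4.0 1.0
print(chi_square(2, 1, 0, 1), information_gain(docs, labels, "ustawa"))
```

"wszakże" separates the two authors perfectly and scores highest on both criteria, while "ustawa", used by both authors, scores lower.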

Parallel Processing and Applied Mathematics | 2009

Application of Stacked Methods to Part-of-Speech Tagging of Polish

Marcin Kuta; Wojciech Wójcik; Michał Wrzeszcz; Jacek Kitowski

We compare the accuracy of several single and combined part-of-speech tagging methods applied to Polish and evaluated on the modified corpus of the Frequency Dictionary of Contemporary Polish (m-FDCP). Three well-known combination methods (weighted voting, distributed voting, and stacking) are analyzed, and two new weighted voting methods, MorphCatPrecision and AmbClassPrecision, are proposed. The MorphCatPrecision method achieves the highest accuracy among all considered weighted voting methods. The best combination method achieves an 11.9% error reduction with respect to the best baseline tagger. We also report the statistical significance of the differences in accuracy between the methods, measured by means of the McNemar test. Because most of these algorithms have high time and memory requirements, the best algorithms were selected on a multiprocessor supercomputer.
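
The abstract reports an error reduction and McNemar significance tests without the formulas. A minimal sketch under the textbook definitions, assuming the continuity-corrected form of McNemar's statistic and a token-level comparison of two taggers on a shared test set; the tags and accuracies below are invented, not results from the paper.

```python
import math


def mcnemar(gold, pred_a, pred_b):
    """McNemar's test with continuity correction for two taggers on the same tokens.

    Only discordant tokens matter: b = A correct and B wrong, c = A wrong and B correct.
    Returns the chi-square statistic (1 degree of freedom) and its p-value.
    """
    triples = list(zip(gold, pred_a, pred_b))
    b = sum(1 for g, x, y in triples if x == g and y != g)
    c = sum(1 for g, x, y in triples if x != g and y == g)
    if b + c == 0:
        return 0.0, 1.0  # the taggers never disagree in correctness
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # For 1 degree of freedom, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def error_reduction(acc_baseline, acc_combined):
    """Relative reduction of the error rate (1 - accuracy) versus a baseline."""
    return (acc_combined - acc_baseline) / (1 - acc_baseline)


# Invented toy data: tagger A is right on all 10 tokens, tagger B on only 4.
gold = ["subst"] * 10
a_tags = ["subst"] * 10
b_tags = ["subst"] * 4 + ["adj"] * 6
stat, p = mcnemar(gold, a_tags, b_tags)
print(round(stat, 3), round(p, 3))
print(error_reduction(0.90, 0.92))  # about 0.2: error rate falls from 10% to 8%
```

Tokens on which both taggers are right, or both wrong, do not affect the statistic at all, which is why McNemar's test suits paired comparisons on one test corpus.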

Parallel Processing and Applied Mathematics | 2009

Replica Management for National Data Storage

Renata Slota; Darin Nikolow; Marcin Kuta; Mariusz Kapanowski; Kornel Skałkowski; Marek Pogoda; Jacek Kitowski

National Data Storage is a distributed data storage system intended to provide high-quality backup, archiving and data access services. By using replication techniques, these services guarantee a high level of data protection as well as high performance of data storage and retrieval. This paper presents conceptual and implementation details of a Prediction and Load Balancing Subsystem for replica management. Preliminary test results from the real system are also shown.

Text, Speech, and Dialogue | 2017

Sentiment Analysis with Tree-Structured Gated Recurrent Units

Marcin Kuta; Mikołaj Morawiec; Jacek Kitowski

Advances in neural network models and deep learning have had a great impact on sentiment analysis, where models based on recursive or convolutional neural networks achieve state-of-the-art results, outperforming non-neural models such as SVMs and traditional lexicon-based approaches. We present a Tree-Structured Gated Recurrent Unit network, which is simpler than the current state of the art in sentiment analysis, the Tree-Structured LSTM model.
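
The abstract does not give the Tree-GRU equations. As an illustration only, here is one plausible way to carry GRU gating over to trees, loosely modeled on the child-sum idea of the Tree-LSTM: the update gate interpolates between the averaged children's states and a candidate state, with one reset gate per child. The cell name, the averaging, the zero input at phrase nodes, the omitted biases and the random initialization are all assumptions, not the paper's architecture; training and the sentiment classifier on top are left out.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def add(*vectors):
    return [sum(xs) for xs in zip(*vectors)]


class TreeGRUCell:
    """Hypothetical tree-structured GRU cell (illustrative, not the paper's equations).

    For a node with input x and child states h_1..h_k (k = 0 for leaves):
      s   = mean_k h_k                                (zero vector for leaves)
      z   = sigmoid(Wz x + Uz s)                      update gate
      r_k = sigmoid(Wr x + Ur h_k)                    one reset gate per child
      c   = tanh(Wh x + Uh sum_k (r_k * h_k))         candidate state
      h   = z * s + (1 - z) * c
    Bias terms are omitted for brevity.
    """

    def __init__(self, in_dim, hid_dim, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):  # small random weights; training is not shown
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.in_dim, self.hid_dim = in_dim, hid_dim
        self.Wz, self.Wr, self.Wh = (mat(hid_dim, in_dim) for _ in range(3))
        self.Uz, self.Ur, self.Uh = (mat(hid_dim, hid_dim) for _ in range(3))

    def node(self, x, children):
        zero = [0.0] * self.hid_dim
        s = [v / len(children) for v in add(*children)] if children else zero
        z = [sigmoid(v) for v in add(matvec(self.Wz, x), matvec(self.Uz, s))]
        gated = zero
        for h in children:
            r = [sigmoid(v) for v in add(matvec(self.Wr, x), matvec(self.Ur, h))]
            gated = add(gated, [ri * hi for ri, hi in zip(r, h)])
        c = [math.tanh(v) for v in add(matvec(self.Wh, x), matvec(self.Uh, gated))]
        return [zi * si + (1 - zi) * ci for zi, si, ci in zip(z, s, c)]

    def encode(self, tree, embed):
        """tree is a word (leaf) or a tuple of subtrees; embed maps words to vectors."""
        if isinstance(tree, str):
            return self.node(embed[tree], [])
        children = [self.encode(t, embed) for t in tree]
        return self.node([0.0] * self.in_dim, children)  # phrase nodes get zero input


rng = random.Random(1)
embed = {w: [rng.uniform(-1, 1) for _ in range(4)] for w in ["not", "a", "good", "movie"]}
cell = TreeGRUCell(in_dim=4, hid_dim=3)
root = cell.encode(("not", ("a", ("good", "movie"))), embed)
print(len(root))  # 3: the root state, which a softmax sentiment classifier would consume
```

Averaging rather than summing the children keeps every state strictly inside (-1, 1), since each node's state is a convex combination of values in that range.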

International Conference on Parallel Processing | 2017

Actor Model of a New Functional Language - Anemone

Paweł Batko; Marcin Kuta

This paper describes the actor system of a new functional language called Anemone and compares it with the actor systems of Scala and Erlang. Implementation details of the actor system are described, and performance is evaluated on sequential and concurrent programs.
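
The abstract does not show Anemone syntax, so the sketch below illustrates only the generic actor-model semantics that such actor systems share: asynchronous sends to a private mailbox, one message processed at a time, and no shared mutable state. It uses Python with one thread and one `queue.Queue` per actor. It is not Anemone's, Scala's or Erlang's implementation; the class names, the message format and the `None` stop signal are hypothetical.

```python
import queue
import threading


class Actor:
    """Minimal actor: a private mailbox drained by one thread, one message at a time."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def send(self, message):
        """Asynchronous send: enqueue and return immediately (like `!` in Erlang or Akka)."""
        self._mailbox.put(message)

    def stop(self):
        self._mailbox.put(None)  # hypothetical stop signal, processed after earlier messages
        self._thread.join()

    def _loop(self):
        while (message := self._mailbox.get()) is not None:
            self.receive(message)

    def receive(self, message):
        raise NotImplementedError


class CounterActor(Actor):
    """Its state is touched only by its own thread, so no locks are needed."""

    def __init__(self):
        self.count = 0  # set before the base class starts the thread
        super().__init__()

    def receive(self, message):
        kind, payload = message
        if kind == "add":
            self.count += payload
        elif kind == "get":
            payload.put(self.count)  # reply via a queue supplied by the sender


counter = CounterActor()
for i in range(1, 101):
    counter.send(("add", i))
reply = queue.Queue()
counter.send(("get", reply))
print(reply.get())  # 5050: one sender's messages are handled in the order sent
counter.stop()
```

Replying through a queue passed in the message mimics the request/reply pattern that actor languages usually express by sending the caller's own address along with the request.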