Tomasz Walkowiak
University of Science and Technology, Sana'a
Publications
Featured research published by Tomasz Walkowiak.
DepCoS-RELCOMEX | 2017
Tomasz Walkowiak
The paper presents Language Processing Modelling Notation (LPMN), a formal language used to orchestrate a set of NLP microservices. LPMN allows complex workflows of language and machine learning tools to be modelled and run. Scalability was achieved by using message-oriented middleware. LPMN is used to develop text mining applications with a web-based interface and to perform research experiments that require NLP and machine learning tools.
International Conference on Artificial Intelligence and Soft Computing | 2018
Tomasz Walkowiak; Szymon Datko; Henryk Maciejewski
In this work we evaluate two different methods for deriving features for subject classification of text documents. The first method uses the standard Bag-of-Words (BoW) approach, which represents documents as vectors of frequencies of selected terms appearing in them. This method relies heavily on natural language processing (NLP) tools to preprocess text in a grammar- and inflection-aware way. The second approach is based on the word-embedding technique recently proposed by Mikolov and does not require any NLP preprocessing. In this method words are represented as vectors in a continuous space, and this representation is used to construct the documents' feature vectors. We evaluate these fundamentally different approaches on the task of classifying Polish-language Wikipedia articles into 34 subject areas. Our study suggests that word-embedding-based features tend to outperform standard NLP-based features, provided a sufficiently large training dataset is available.
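The two feature types compared above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `bow_vector` and `embedding_vector` are hypothetical names, the tokens stand in for lemmas that an NLP tagger would produce, the toy two-dimensional vectors stand in for trained word embeddings, and averaging word vectors is only one common way of building a document vector (the abstract does not say which construction was used).

```python
from collections import Counter

def bow_vector(tokens, vocabulary):
    """Bag-of-Words: relative frequency of each selected term in the document."""
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty documents
    return [counts[term] / total for term in vocabulary]

def embedding_vector(tokens, embeddings, dim):
    """Document vector as the mean of the word vectors found in `embeddings`."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim  # no known words: fall back to the zero vector
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

# Toy data (hypothetical); a real BoW pipeline would use lemmas from a tagger
tokens = ["kot", "i", "pies", "kot"]
print(bow_vector(tokens, ["kot", "pies", "dom"]))  # [0.5, 0.25, 0.0]
toy_embeddings = {"kot": [1.0, 0.0], "pies": [0.0, 1.0]}
print(embedding_vector(tokens, toy_embeddings, 2))
```

The BoW vector depends on a chosen vocabulary of terms, whereas the embedding vector has a fixed dimension regardless of vocabulary size, which is one reason the two representations behave differently as training data grows.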
International Conference on Artificial Intelligence and Soft Computing | 2017
Maciej Baj; Tomasz Walkowiak
The aim of the paper is to compare stylometric methods for recognizing the author, the author's gender and the literary period of texts in Polish. Different feature selection and classification methods were analysed. Feature sets include common words (the most common, the rarest and all words) and grammatical class frequencies, as well as simple statistics of selected characters, words and sentences. Because Polish is a highly inflected language, common-word features are calculated as frequencies of the lexemes obtained by a morphosyntactic tagger for Polish. Nine different classifiers were analysed. The authors tested the proposed methods on a set of Polish novels, performing recognition on whole novels and on text chunks. The experiments showed that the best results are obtained for features based on all words. For ill-defined problems (with low recognition accuracy) the random forest classifier gave the best results; in the other cases (tasks with medium or high recognition accuracy) the multilayer perceptron and linear regression learned by stochastic gradient descent performed best. Moreover, the paper includes an analysis of the statistical importance of the features used.
DepCoS-RELCOMEX | 2017
Dariusz Caban; Tomasz Walkowiak
A class of Systems-of-Systems (SoS) is considered in which systems are hierarchically composed of subsystems. The structure of the system changes during its lifetime, i.e. component subsystems are moved to other parents. Each system has its own configurable parameters, and a configuration change may lead to conflicts in the configuration of its components. Constraints on the configurations of component systems are not limited to the systems themselves, or even to their ancestors in the hierarchy. A domain-specific language is proposed to describe constraints in the SoS. A specification consists of a list of assertions that the SoS configuration must satisfy; each assertion is a logical expression scoped to a specific subset of component systems.
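A rough sketch of how scoped assertions over a system hierarchy might be checked, assuming a tree of systems that each carry a parameter dictionary. The `System` class, the `check` function and the example "same-protocol" constraint are hypothetical; the actual syntax and semantics of the proposed DSL are not reproduced here. Each assertion pairs a scope (a function selecting a subset of component systems) with a logical predicate evaluated over that subset.

```python
from dataclasses import dataclass, field

@dataclass
class System:
    name: str
    params: dict
    children: list = field(default_factory=list)

    def descendants(self):
        """Yield every component system below this one in the hierarchy."""
        for child in self.children:
            yield child
            yield from child.descendants()

def check(root, assertions):
    """Return the names of assertions that the current configuration violates."""
    violated = []
    for name, scope, predicate in assertions:
        if not predicate(list(scope(root))):
            violated.append(name)
    return violated

# Hypothetical configuration: two components of one parent system
web = System("web", {"protocol": "v2"})
db = System("db", {"protocol": "v1"})
root = System("root", {}, [web, db])

# Hypothetical assertion: all descendants of root use the same protocol version
same_protocol = (
    "same-protocol",
    lambda r: r.descendants(),
    lambda systems: len({s.params["protocol"] for s in systems}) <= 1,
)
print(check(root, [same_protocol]))  # ['same-protocol']
```

Because scopes are arbitrary functions of the whole tree, an assertion can reach beyond a system and its ancestors, and re-running `check` after a component is moved to another parent re-validates the new structure.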
DepCoS-RELCOMEX | 2016
Tomasz Walkowiak
The paper presents an online system for clustering and classifying texts in the Polish language. It allows complex workflows of language and machine learning tools to be run. High throughput and low latency were achieved through an asynchronous programming style and the use of message-oriented middleware (RabbitMQ). The authors discuss the architectural assumptions, the language processing modelling notation used to define workflows, and the system architecture. Moreover, a sample Single Page Application is presented that clusters uploaded corpora and shows the results online.
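The asynchronous, queue-based style described above can be illustrated with an in-process analogue. In this minimal sketch `asyncio.Queue` stands in for RabbitMQ queues, and `tokenize` and `count_tokens` are hypothetical placeholders for real language tools; it is not the system's actual code or workflow notation. Each worker consumes messages from one queue and publishes results to the next, so the stages run concurrently.

```python
import asyncio

def tokenize(text):
    """Hypothetical first tool: lowercase whitespace tokenizer."""
    return text.lower().split()

def count_tokens(tokens):
    """Hypothetical second tool: number of tokens."""
    return len(tokens)

async def worker(tool, inbox, outbox):
    """Consume (task_id, payload) messages, apply `tool`, publish the result."""
    while True:
        task_id, payload = await inbox.get()
        await outbox.put((task_id, tool(payload)))
        inbox.task_done()

async def run_pipeline(texts):
    # One queue per stage stands in for a message-broker queue
    to_tokenize, to_count, results_q = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    workers = [
        asyncio.create_task(worker(tokenize, to_tokenize, to_count)),
        asyncio.create_task(worker(count_tokens, to_count, results_q)),
    ]
    for task_id, text in enumerate(texts):
        await to_tokenize.put((task_id, text))
    results = {}
    for _ in texts:
        task_id, value = await results_q.get()
        results[task_id] = value  # task ids restore the submission order
    for w in workers:
        w.cancel()  # stop the idle workers blocked on get()
    await asyncio.gather(*workers, return_exceptions=True)
    return [results[i] for i in range(len(texts))]

print(asyncio.run(run_pipeline(["Ala ma kota", "To jest test tekstu"])))  # [3, 4]
```

With a real broker, throughput would typically be raised by attaching more workers to the same queue; the sketch keeps one worker per stage for brevity.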
International Conference on Artificial Intelligence and Soft Computing | 2018
Tomasz Walkowiak; Maciej Piasecki
In this work we compare different methods for deriving text representation features in two stylometric tasks: gender recognition and author recognition. The first group of methods uses the Bag-of-Words (BoW) approach, which represents documents as vectors of frequencies of selected features occurring in them. We analyze features such as the 1000 most frequent lemmas, word forms, all lemmas, selected (content-insensitive) lemmas, bigrams of grammatical classes, and a mixture of grammatical-class bigrams, selected lemmas and punctuation. Moreover, an approach based on the recently proposed fastText algorithm (a vector-based representation of text) is also applied. We evaluate these approaches on two publicly available collections of Polish literary texts from the late 19th and early 20th centuries: one consisting of 99 novels by 33 authors and the other of 888 novels by 58 authors. Our study suggests that, depending on the corpus, the best results are achieved either by style features (grammatical bigrams) or by semantic features (1000 lemmas extracted from the training set). We also note the importance of properly dividing corpora into training and test sets.
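Two of the BoW feature types listed above, the most frequent lemmas and grammatical-class bigrams, might be computed as follows. This is a sketch under assumptions: the input is taken to be already lemmatized and tagged (the class names `subst`, `fin` and `adj` follow the NKJP tagset convention but are only illustrative), and the function names are hypothetical. Building the lemma vocabulary from the training documents alone reflects the abstract's point about properly separating training and test data.

```python
from collections import Counter

def top_n_vocabulary(training_docs, n):
    """The n most frequent lemmas, counted on the training documents only."""
    counts = Counter(lemma for doc in training_docs for lemma in doc)
    return [lemma for lemma, _ in counts.most_common(n)]

def grammatical_bigram_features(tags, bigram_vocabulary):
    """Relative frequencies of adjacent grammatical-class pairs in one document."""
    bigrams = Counter(zip(tags, tags[1:]))
    total = sum(bigrams.values()) or 1  # documents with fewer than two tags
    return [bigrams[b] / total for b in bigram_vocabulary]

# Hypothetical toy inputs
train = [["być", "kot", "być"], ["kot", "być", "dom"]]
print(top_n_vocabulary(train, 2))  # ['być', 'kot']

tags = ["subst", "fin", "subst", "fin", "adj"]
vocab = [("subst", "fin"), ("fin", "adj"), ("adj", "subst")]
print(grammatical_bigram_features(tags, vocab))  # [0.5, 0.25, 0.0]
```

Grammatical bigrams capture sentence structure largely independently of topic, which is why they are usually grouped with style features rather than semantic ones.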
International Conference on Dependability and Complex Systems | 2018
Tomasz Walkowiak; Szymon Datko; Henryk Maciejewski
This paper deals with classifying Polish-language documents by subject category. We compare four state-of-the-art approaches to this task, which differ primarily in how documents are represented by feature vectors. Two of the methods use frequency-of-words or frequency-of-topics representations of the documents and rely on Natural Language Processing (NLP) technology to preprocess the raw text. The other two methods do not involve NLP: they construct feature vectors from vector representations of words (the Word2Vec method) or from the frequency of topics derived from the raw text. The four approaches are evaluated on 3 corpora with 5, 34 and 25 subject categories respectively, each with a different level of class discrimination. The results suggest that no single method outperforms the others in all tests; however, tests with a large number of training observations seem to favour the NLP-free Word2Vec methods.
International Conference on Dependability and Complex Systems | 2018
Dariusz Caban; Tomasz Walkowiak
A class of Systems-of-Systems (SoS) is considered in which systems are hierarchically composed of subsystems. If a system component fails, the system configuration must change to a new valid state to tolerate the fault. Such fault-driven reconfigurations significantly improve SoS dependability. Two methods are proposed to estimate this improvement. The first relies on determining the minimal configurations and applying the k-out-of-n reliability model. The second is based on state-transition analysis, in which a valid configuration is searched for in each state. As discussed, the latter is the more appropriate approach, though it is computationally more complex. Both approaches require an efficient tool for pre-validating configurations; the domain-specific language proposed in [4] is shown to be useful for this.
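The k-out-of-n model mentioned above has a standard closed form when components are independent and identical: with component reliability p, the probability that at least k of n components work is R = Σ_{i=k..n} C(n, i) p^i (1 − p)^(n − i). The sketch below evaluates that formula; how the minimal configurations of a particular SoS map onto k and n is an assumption not specified here, and the function name is hypothetical.

```python
from math import comb

def k_out_of_n_reliability(k, n, p):
    """P(at least k of n independent components work), each with reliability p."""
    if not 0 <= k <= n:
        raise ValueError("require 0 <= k <= n")
    # Sum the binomial probabilities of exactly i working components, i = k..n
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

print(k_out_of_n_reliability(2, 3, 0.9))  # approximately 0.972
```

The two extremes recover familiar cases: k = n is a series system (p^n) and k = 1 is a parallel system (1 − (1 − p)^n).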
International Conference on Reliability and Statistics in Transportation and Communication | 2017
Artur Zochniak; Tomasz Walkowiak
Websites serving dynamic content must access remotely stored data to present it in a browser, but first an appropriate document has to be built. This process of building a WWW document is done entirely on the client side, which means its duration can be measured and compared across different solutions. To ensure the best possible user experience, it should be as short as possible. The authors present a performance comparison of observer design pattern implementations in two JavaScript frameworks, AngularJS and EmberJS, and describe different types of data observer implementations. They implemented an example application in both frameworks and tested its performance as a function of input data size. The fastest solution is identified and the reasons for the differences are analyzed. The presented results can help in building fast-responding web pages.
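Two common ways of observing data changes can be contrasted in a short sketch. The push style notifies subscribers inside the setter, while the pull style runs a digest loop that re-reads every watched value and compares it with the last one seen; these correspond loosely to Ember's observers and AngularJS's dirty checking, respectively. The sketch uses Python rather than JavaScript and hypothetical class names; it does not reproduce either framework's code.

```python
class Observable:
    """Push style: assigning a new value notifies subscribers immediately."""

    def __init__(self, value):
        self._value = value
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        if new_value != self._value:  # notify only on an actual change
            self._value = new_value
            for callback in self._subscribers:
                callback(new_value)


class DirtyChecker:
    """Pull style: a digest loop re-reads every watched value and compares."""

    def __init__(self, max_rounds=10):
        self._watchers = []  # entries: [getter, last_seen_value, callback]
        self._max_rounds = max_rounds

    def watch(self, getter, callback):
        self._watchers.append([getter, getter(), callback])

    def digest(self):
        """Repeat passes until no watched value changes; return the pass count."""
        for rounds in range(1, self._max_rounds + 1):
            changed = False
            for watcher in self._watchers:
                current = watcher[0]()
                if current != watcher[1]:
                    watcher[1] = current
                    watcher[2](current)
                    changed = True
            if not changed:
                return rounds
        raise RuntimeError("digest did not stabilise")
```

In the pull style every digest pass costs time proportional to the number of watchers even when little has changed, which is one plausible source of performance differences as the input data grows.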
International Conference on Reliability and Statistics in Transportation and Communication | 2017
Marcin Pol; Tomasz Walkowiak; Maciej Piasecki
The paper presents new functionality of the CLARIN-PL Language Technology Centre (LTC). The LTC Platform is being developed as a research environment for processing, visualizing and depositing language data. It can connect and support the research workflow, enabling scientists to increase the efficiency and effectiveness of their research in connection with CLARIN services. The platform is a free, open-source web application that researchers can use to collaborate on, document, archive, share and register research projects, materials and data.