Julian Szymański
Gdańsk University of Technology
Publication
Featured research published by Julian Szymański.
networked digital technologies | 2010
Julian Szymański
The paper concerns the problem of automatically creating a category system for a set of documents connected by references. The presented approach has been evaluated on the Polish Wikipedia, where two graphs, the Wikipedia category graph and the article graph, have been analyzed. The links between Wikipedia articles have been used to create a new category graph with weighted edges. We compare the created category graph with the original Wikipedia category graph, testing its quality in terms of coverage.
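The idea of deriving a weighted category graph from article links can be pictured with the minimal sketch below. It assumes, purely for illustration, that each article carries a set of category names and that an edge weight is the raw count of article links crossing between two distinct categories; the abstract does not state the actual weighting, and the function name and toy data are hypothetical.

```python
from collections import Counter


def weighted_category_graph(article_links, article_categories):
    """Count article-to-article links between each pair of distinct categories.

    article_links: iterable of (source_article, target_article) pairs.
    article_categories: dict mapping article -> set of category names.
    The counting rule is an illustrative assumption, not the paper's method.
    """
    weights = Counter()
    for src, dst in article_links:
        for c_src in article_categories.get(src, ()):
            for c_dst in article_categories.get(dst, ()):
                if c_src != c_dst:  # skip links that stay inside one category
                    weights[(c_src, c_dst)] += 1
    return weights


# Toy data (hypothetical): two city articles linking to one sea article.
links = [("Gdańsk", "Baltic Sea"), ("Sopot", "Baltic Sea"), ("Gdańsk", "Sopot")]
cats = {"Gdańsk": {"Cities"}, "Sopot": {"Cities"}, "Baltic Sea": {"Seas"}}
graph = weighted_category_graph(links, cats)
```

Here `graph` holds one directed edge from `Cities` to `Seas`, because the link between the two city articles stays within a single category and is skipped.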
Cybernetics and Systems | 2014
Julian Szymański
In our work, we review and empirically evaluate five different raw methods of text representation that allow automatic processing of Wikipedia articles. The main contribution of the article—evaluation of approaches to text representation for machine learning tasks—indicates that the text representation is fundamental for achieving good categorization results. The analysis of the representation methods creates a baseline that cannot be compensated for even by sophisticated machine learning algorithms. It confirms the thesis that proper data representation is a prerequisite for achieving high-quality results of data analysis. Evaluation of the text representations was performed within the Wikipedia repository by examination of classification parameters observed during automatic reconstruction of human-made categories. For that purpose, we use a classifier based on a support vector machines method, extended with multilabel and multiclass functionalities. During classifier construction we observed parameters such as learning time, representation size, and classification quality that allow us to draw conclusions about text representations. For the experiments presented in the article, we use data sets created from Wikipedia dumps. We describe our software, called Matrix’u, which allows a user to build computational representations of Wikipedia articles. The software is the second contribution of our research, because it is a universal tool for converting Wikipedia from a human-readable form to a form that can be processed by a machine. Results generated using Matrix’u can be used in a wide range of applications that involve usage of Wikipedia data.
asian conference on intelligent information and database systems | 2011
Julian Szymański
The article presents an approach to the automated organization of textual data. The experiments have been performed on a selected subset of Wikipedia. The Vector Space Model representation based on terms has been used to build groups of similar articles extracted from Kohonen Self-Organizing Maps with DBSCAN clustering. To ensure efficient data processing, we performed linear dimensionality reduction of the raw data using Principal Component Analysis. We introduce a hierarchical organization of the categorized articles by changing the granularity of the SOM network. The categorization method has been used to implement a system that clusters the results of keyword-based search in the Polish Wikipedia.
international conference on neural information processing | 2008
Pawel Majewski; Julian Szymański
Most text categorization research exploits the bag-of-words text representation. However, such a representation makes it very hard to capture semantic similarity between text documents that share very little or even no vocabulary. In this paper we present preliminary results obtained with a novel approach that combines well-established kernel text classifiers with external contextual commonsense knowledge. We propose a method for computing semantic similarity between words as the result of a diffusion process in the ConceptNet semantic space. Evaluation on a Reuters dataset shows an improvement in classification precision.
international conference on distributed computing and internet technology | 2013
Julian Szymański
In the article we evaluate different text representation methods used for the task of categorizing Wikipedia articles. We present the Matrix’u application used for creating computational datasets of Wikipedia articles. The representations have been evaluated with SVM classifiers used to reconstruct human-made categories.
international conference on computational collective intelligence | 2012
Julian Szymański
In the article we present an approach to improving information retrieval from large text collections using word context vectors. The vectors have been created by analyzing the English Wikipedia with the Hyperspace Analogue to Language model of word similarity. For test phrases we evaluate retrieval with direct user queries as well as retrieval with the context vectors of these queries. The results indicate that the proposed method cannot replace retrieval based on direct user queries, but it can be used for refining the search results.
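As a rough illustration of how Hyperspace Analogue to Language (HAL) style context vectors can be built, the sketch below slides a window over a token list and weights each co-occurrence by closeness. It is a simplification under stated assumptions: the original HAL model keeps separate counts for preceding and following words, whereas this version merges them symmetrically, and the window size, function names and toy sentence are illustrative rather than taken from the paper.

```python
import math
from collections import defaultdict


def hal_vectors(tokens, window=3):
    """Build simplified, symmetric HAL-style co-occurrence vectors.

    A pair of words at distance d (1 <= d <= window) contributes
    weight window - d + 1, so closer words count more.
    """
    vectors = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for d in range(1, window + 1):
            j = i + d
            if j >= len(tokens):
                break
            weight = window - d + 1
            vectors[word][tokens[j]] += weight
            vectors[tokens[j]][word] += weight
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * v.get(key, 0.0) for key, value in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


tokens = "the cat sat on the mat".split()
vectors = hal_vectors(tokens, window=2)
similarity = cosine(vectors["cat"], vectors["mat"])
```

Context vectors for the words of a query could then be compared with `cosine` against vectors of other words, which is one plausible way to refine search results as the abstract describes.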
ISMIS Industrial Session | 2011
Karol Draszawka; Julian Szymański
This article addresses the problem of validating the results of nested (as opposed to “flat”) clusterings. It shows that standard external validation indices used for validating partitioning clusterings, such as the Rand statistic, the Hubert Γ statistic or the F-measure, are not applicable to nested clustering. In addition to earlier work, in which the F-measure was adapted to hierarchical classification as the hF-measure, methods for obtaining the desired hRand and hΓ indices for nested clustering are presented here. The introduced measures are evaluated and, as an example application, a validation of nested clustering methods for Wikipedia articles using the OPTICS algorithm is shown.
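For reference, the standard Rand index for flat partitions, which the abstract says does not carry over to nested clusterings, can be sketched as follows. This is the textbook flat index only; it does not implement the hRand or hΓ measures proposed in the article, whose definitions are not given here, and the function name is an illustrative choice.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two flat partitions agree.

    A pair agrees when both partitions put the two items in the same
    cluster, or both put them in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0  # fewer than two items: trivially identical
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


same_partition = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed_partition = rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Because each item receives exactly one label, the function has no way to express that a cluster is contained in another, which is the limitation that motivates hierarchical variants.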
international symposium on neural networks | 2007
Julian Szymański; Włodzisław Duch
Many language-oriented problems cannot be solved without semantic memory containing descriptions of concepts at different levels of detail. Automatic creation of semantic memories is a great challenge even for the simplest knowledge representation methods based on relations between concepts and keywords. Semantic memory based on such a simple knowledge representation facilitates the implementation of quite interesting linguistic competences that have not yet been demonstrated by more sophisticated rule-based or frame-based knowledge bases, for example CYC. These linguistic abilities include word games, such as the twenty questions game, that may be implemented using semantic memory built on a relational model for knowledge representation. Creation of a large-scale knowledge base for semantic memory involves mining structured information sources (ontologies, dictionaries, encyclopedic entries) and free texts (textbooks and Internet sources). The quality of this knowledge may be improved using collaborative projects in which systems that already possess some linguistic competence actively interact with human users, mining their knowledge. In this article three dialog scenarios for mining human knowledge are introduced, and the data acquired into semantic memory structures through such interaction is described.
Sensors | 2017
Higinio Mora; David Gil; Rafael Muñoz Terol; Jorge Azorin-Lopez; Julian Szymański
The new Internet of Things paradigm allows small devices with sensing, processing and communication capabilities to be designed, which enables the development of sensors, embedded devices and other ‘things’ ready to understand the environment. In this paper, a distributed framework based on the Internet of Things paradigm is proposed for monitoring human biomedical signals in activities involving physical exertion. The main advantage and novelty of the proposed system is the flexibility in computing the health application by using resources from available devices inside the body area network of the user. The proposed framework can be applied to other mobile environments, especially those where intensive data acquisition and high processing needs take place. Finally, to validate our proposal, we present a case study that consists of monitoring footballers’ heart rates during a football match. The real-time data acquired by these devices serves a clear social objective: being able to predict not only situations of sudden death but also possible injuries.
2015 IEEE 2nd International Conference on Cybernetics (CYBCONF) | 2015
Pawel Czarnul; Pawel Rosciszewski; Mariusz R. Matuszek; Julian Szymański
The paper presents our approach to implementing a similarity measure for big data analysis in a parallel environment. We describe the algorithm for parallelisation of the computations. We provide results from a real MPI application for computing similarity measures as well as results achieved with our simulation software. The simulation environment allows us to model parallel systems of various sizes with various components such as CPUs, GPUs and network interconnects, and to model parallel applications in a meta language. The simulations allow us to determine in detail how computations will be performed on particular hardware. They also allow us to predict the shapes of time curves beyond the range where empirical results can be obtained, given limited computational resources such as memory capacity.