Martin Sarnovsky | Researchain

Publication

Featured research published by Martin Sarnovsky.

symposium on applied computational intelligence and informatics | 2013

Cloud-based clustering of text documents using the GHSOM algorithm on the GridGain platform

Martin Sarnovsky; Z. Ulbrik

This paper gives an overview of our research on the efficient use of distributed computing for text-mining tasks. It describes the GHSOM (Growing Hierarchical Self-Organizing Maps) algorithm for clustering text documents and proposes the design and implementation of a distributed version of this approach. The implementation uses the JBOWL framework as the text-mining base. For distribution we used the MapReduce paradigm as implemented in the GridGain framework, which served as the cloud application platform. Experiments were performed on the standard Reuters dataset, using a simple private cloud infrastructure for testing.

high performance computing and communications | 2006

Distributed classification of textual documents on the grid

Ivan Janciak; Martin Sarnovsky; A Min Tjoa; Peter Brezany

Efficient access to information, integration of information from various sources, and turning this information into knowledge are currently major challenges in life science research. However, a large fraction of this information is available only from scientific articles stored in huge document databases in free-text format, or from the Web, where it is available in semi-structured format. Text mining provides methods (e.g., classification and clustering) that can automatically extract relevant knowledge patterns from free-text data. Including Grid text-mining services in a Grid-based knowledge discovery system can significantly support problem-solving processes based on such a system. The motivation for the research presented in this paper is to use the Grid's computational, storage, and data access capabilities for text-mining tasks, and text classification in particular. Text classification methods are time-consuming, and using the Grid infrastructure can bring significant benefits. Implementing text-mining techniques in a distributed environment allows us to access different geographically distributed data collections and to perform text-mining tasks in a parallel/distributed fashion.

international conference on intelligent engineering systems | 2012

Cloud computing as a platform for distributed fuzzy FCA approach in data analysis

Martin Sarnovsky; Peter Butka; Jana Pócsová

In this paper we describe the use of a cloud computing platform to support the distributed creation of conceptual models based on the FCA (Formal Concept Analysis) framework. FCA is one of the approaches that can be applied in conceptual data analysis. An extension of classical FCA (which works on binary table data) is the one-sided fuzzy version, which works with different types of lattice-based attributes (binary, ordinal, interval-based, etc.) in the object-attribute table. This extension, the so-called generalized one-sided concept lattices, lets a researcher or data analyst apply fuzzy FCA to object-attribute tables without the specific unified pre-processing that is usually expected in practical data mining or online analytical tools. The computational complexity of creating concept lattices from large contexts (data tables) is considerable, and huge concept lattices are also hard to interpret. Therefore, we also propose a solution for creating a simple hierarchy of smaller FCA models. The starting data table is decomposed into smaller sets of objects, and one concept lattice is then built for every subset using the generalized one-sided concept lattice. Such small FCA-based models are easier to interpret and can be combined into one hierarchy of models using simple hierarchical clustering based on the descriptions of the particular models (as weighted vectors of attributes), which a data analyst can search in an analytical tool. The cloud infrastructure is then used to increase computational effectiveness, because the particular models are built in a parallel/distributed way. This cloud module can be part of a more complex data analytical system, which is also presented at the end of the paper.
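
The decomposition idea in this abstract can be illustrated with a minimal sketch. It assumes the classical binary case only (not the generalized one-sided fuzzy lattices the paper uses), enumerates concepts naively, and runs in one process rather than on a cloud platform. The function names `extent`, `intent`, `concepts`, and `local_models` are hypothetical and not taken from the paper.

```python
from itertools import combinations

def extent(context, attrs):
    """Objects that have every attribute in attrs."""
    return frozenset(o for o, has in context.items() if attrs <= has)

def intent(context, objs, all_attrs):
    """Attributes shared by every object in objs (all attributes if objs is empty)."""
    shared = set(all_attrs)
    for o in objs:
        shared &= context[o]
    return frozenset(shared)

def concepts(context):
    """Naive enumeration of formal concepts; only suitable for tiny tables."""
    all_attrs = frozenset().union(*context.values())
    found = set()
    for size in range(len(all_attrs) + 1):
        for subset in combinations(sorted(all_attrs), size):
            objs = extent(context, frozenset(subset))
            found.add((objs, intent(context, objs, all_attrs)))
    return found

def local_models(context, chunk_size):
    """Split the objects into chunks and build one small concept set per chunk.
    Each chunk is independent, so the calls could run in parallel."""
    names = sorted(context)
    return [
        concepts({o: context[o] for o in names[i:i + chunk_size]})
        for i in range(0, len(names), chunk_size)
    ]

# Toy object-attribute table (illustrative data only).
table = {"d1": {"a", "b"}, "d2": {"b", "c"}, "d3": {"b"}}
models = local_models(table, chunk_size=2)
```

Because each chunk yields its own small lattice, the per-chunk work is independent, which is the property a parallel or distributed build relies on.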

international symposium on applied machine intelligence and informatics | 2014

RDF vs. NoSQL databases for the semantic web applications

Peter Bednár; Martin Sarnovsky; Viktor Demko

The main objective of the presented paper is to compare and analyze the performance of semantic and NoSQL storage on selected datasets. The paper provides a theoretical analysis of the problem and details the performance testing of selected semantic repositories and NoSQL databases. In the practical part, our main aim was to simulate multiple querying, covering a diverse set of queries with different criteria. Results of the performed experiments are reported and analyzed.

international symposium on applied machine intelligence and informatics | 2014

Distributed boosting algorithm for classification of text documents

Martin Sarnovsky; Michal Vronc

The presented paper focuses on the analysis and classification of textual documents. We present document classification based on the boosting method applied to the decision tree algorithm. The main objective of the paper is to present an implementation of a distributed boosting algorithm based on the MapReduce paradigm. We used the GridGain framework as the platform for distributed data processing and tested the implemented solution on two different datasets within our testing environment.
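
As a rough illustration of how boosting can be phrased in map and reduce steps, the sketch below runs AdaBoost with one-level decision trees (stumps) in a single process. It is an assumption-laden simplification, not the paper's implementation: the split of work (one map task per feature, a reduce step that keeps the lowest weighted error) and all function names are hypothetical, and no GridGain API is used.

```python
import math

def stump_predict(stump, x):
    """A stump votes `polarity` when the feature exceeds the threshold."""
    feature, threshold, polarity = stump
    return polarity if x[feature] > threshold else -polarity

def map_best_stump(feature, X, y, w):
    """Map step: best (weighted error, stump) for a single feature."""
    best = (float("inf"), None)
    for threshold in sorted({x[feature] for x in X}):
        for polarity in (1, -1):
            stump = (feature, threshold, polarity)
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if stump_predict(stump, xi) != yi)
            if err < best[0]:
                best = (err, stump)
    return best

def reduce_best_stump(candidates):
    """Reduce step: keep the candidate with the lowest weighted error."""
    return min(candidates, key=lambda c: c[0])

def adaboost(X, y, rounds=5):
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        # Each feature is an independent map task in this sketch.
        candidates = [map_best_stump(f, X, y, w) for f in range(len(X[0]))]
        err, stump = reduce_best_stump(candidates)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified documents, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Toy term-count vectors with labels +1 / -1 (illustrative data only).
docs = [(3, 0), (2, 1), (0, 3), (1, 2)]
labels = [1, 1, -1, -1]
model = adaboost(docs, labels, rounds=3)
```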

symposium on applied computational intelligence and informatics | 2011

IT service management supported by semantic technologies

Martin Sarnovsky; Karol Furdík

The main objective of this paper is to present the idea of using semantic structures to support IT Service Management (ITSM) processes. Combining the principles of the Semantic Web and knowledge technologies with standardized IT service management processes can provide a framework for designing and maintaining interoperable service-based applications. In the presented paper we describe a conceptual framework for developing a semantic model of ITIL, one of the ITSM frameworks, and we also discuss future work, including the implementation and the planned testing environment.

international symposium on applied machine intelligence and informatics | 2017

Twitter data analysis and visualizations using the R language on top of the Hadoop platform

Martin Sarnovsky; Peter Butka; Andrea Huzvarova

The main objective of the work presented in this paper was to design and implement a system for Twitter data analysis and visualization in the R environment using big data processing technologies. Our focus was to leverage existing big data processing frameworks, with their storage and computational capabilities, to support analytical functions implemented in the R language. We decided to build the backend on top of the Apache Hadoop framework, with Hadoop HDFS as the distributed filesystem and MapReduce as the distributed computation paradigm. RHadoop packages were then used to connect the R environment to the processing layer and to design and implement the analytical functions in a distributed manner. Visualizations were implemented on top of the solution as an RShiny application.
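
The MapReduce pattern mentioned in this abstract can be sketched in a few lines. The example below counts hashtags in memory; it is written in Python rather than R, does not use Hadoop or RHadoop, and the function names and the choice of hashtag counting as the analytical task are assumptions made for illustration.

```python
import re
from collections import defaultdict

def map_hashtags(tweet):
    """Map step: emit a (hashtag, 1) pair for every hashtag in one tweet."""
    return [(tag.lower(), 1) for tag in re.findall(r"#(\w+)", tweet)]

def shuffle(pairs):
    """Shuffle step: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_counts(groups):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}

def count_hashtags(tweets):
    pairs = [pair for tweet in tweets for pair in map_hashtags(tweet)]
    return reduce_counts(shuffle(pairs))

# Illustrative input only.
sample = ["Loving #Hadoop and #R", "#hadoop jobs are slow today", "no tags here"]
counts = count_hashtags(sample)
```

In a real cluster the map calls would run on separate data blocks and the framework would perform the shuffle; here all three steps run sequentially to show the data flow.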

international symposium on applied machine intelligence and informatics | 2017

Building environment analysis based on clustering methods from sensor data on top of the Hadoop platform

Martin Sarnovsky; David Bajus

The presented paper describes the use of clustering methods in a building environment analysis task. The approach is based on modeling sensor data containing humidity and temperature readings. Such models are then used to describe the comfort level of a particular environment. The K-means clustering algorithm was used to create those models. The paper then presents and describes a method of user interaction with the environment model. User feedback represents how the user feels in the current environment. The feedback is collected and evaluated. Based on the feedback, the models can trigger a change of the current environment or, over time, re-compute themselves in order to provide a more precise representation of the building environment. Our solution was based on real sensor data obtained from university buildings, and it was implemented on top of a Hadoop cluster using the Mahout library for machine learning.
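
To make the clustering step concrete, here is a minimal single-machine sketch of K-means (Lloyd's algorithm) on (temperature, humidity) pairs. It does not use Hadoop or Mahout, the readings are invented for illustration, and the function names are hypothetical.

```python
import random

def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: (point[0] - centroids[i][0]) ** 2
                      + (point[1] - centroids[i][1]) ** 2,
    )

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns final centroids and a label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        updated = []
        for i, members in enumerate(clusters):
            if members:
                updated.append((
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                ))
            else:
                updated.append(centroids[i])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels

# Invented (temperature in degrees C, relative humidity in %) readings.
readings = [(20.0, 40.0), (21.0, 42.0), (20.5, 41.0),
            (28.0, 70.0), (29.0, 72.0), (28.5, 71.0)]
centers, labels = kmeans(readings, k=2)
```

Each resulting centroid can be read as a typical environment state, and a new reading is assigned to a state with `nearest`.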

ISAT (2) | 2017

Social-Media Data Analysis Using Tessera Framework in the Hadoop Cluster Environment

Martin Sarnovsky; Peter Butka; Jakub Paulina

The presented paper describes the design and implementation of R functions for Twitter feed analysis and visualization, combining analytical technologies with big data processing tools. The main idea was to use the Hadoop processing framework and its storage and computational capabilities in analytical tasks designed and implemented in the R language. For this purpose, we decided to use Hadoop HDFS for storage and MapReduce v2 for the processing logic, connected via the Tessera framework to analytical functions written in R. The results of the analysis were presented as graph visualizations. Visualizations were implemented using the Trelliscope framework, which provides fast, flexible visualization of large, complex data in the R environment.

international symposium on applied machine intelligence and informatics | 2008