Piotr Gawrysiak | Researchain

Publication

Featured research published by Piotr Gawrysiak.

Bioinformatics | 2014

SparkSeq: fast, scalable and cloud-ready tool for the interactive genomic data analysis with nucleotide precision

Antonio Messina; Alicja Elzbieta Pacholewska; Sergio Maffioletti; Piotr Gawrysiak; Michal J. Okoniewski

Many time-consuming analyses of next-generation sequencing data can be addressed with modern cloud computing. Apache Hadoop-based solutions have become popular in genomics because of their scalability in a cloud infrastructure. So far, most of these tools have been used for batch data processing rather than interactive data querying. The SparkSeq software has been created to take advantage of a new MapReduce framework, Apache Spark, for next-generation sequencing data. SparkSeq is a general-purpose, flexible and easily extendable library for genomic cloud computing. It can be used to build genomic analysis pipelines in Scala and run them interactively. SparkSeq opens up the possibility of customized ad hoc secondary analyses and iterative machine learning algorithms. This article demonstrates its scalability and overall fast performance by running analyses of sequencing datasets. Tests of SparkSeq also show that cache use and HDFS block size can be tuned for optimal performance on multiple worker nodes. Availability and implementation: available under the open-source Apache 2.0 license at https://bitbucket.org/mwiewiorka/sparkseq/.
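
SparkSeq itself is a Scala library running on Apache Spark. As a rough illustration only, the Python sketch below shows the map/reduce shape of one nucleotide-precision query, per-base read coverage, without any cluster framework. The read tuple layout `(chromosome, start, length)`, the ungapped-alignment simplification and all function names are assumptions for illustration and are not SparkSeq's API.

```python
from collections import Counter
from functools import reduce
from typing import Iterable, Iterator, Tuple

# Hypothetical read record: (chromosome, 0-based start, aligned length).
# Simplification: ungapped alignments; real BAM records would need CIGAR parsing.
Read = Tuple[str, int, int]
Key = Tuple[str, int]  # (chromosome, position)

def map_read(read: Read) -> Iterator[Tuple[Key, int]]:
    """Map step: emit ((chromosome, position), 1) for every base the read covers."""
    chrom, start, length = read
    for pos in range(start, start + length):
        yield (chrom, pos), 1

def add_pair(acc: Counter, pair: Tuple[Key, int]) -> Counter:
    """Reduce step: sum the emitted counts per (chromosome, position) key."""
    key, value = pair
    acc[key] += value
    return acc

def per_base_coverage(reads: Iterable[Read]) -> Counter:
    pairs = (pair for read in reads for pair in map_read(read))
    return reduce(add_pair, pairs, Counter())

reads = [("chr1", 100, 5), ("chr1", 102, 5), ("chr2", 0, 3)]
coverage = per_base_coverage(reads)
print(coverage[("chr1", 103)])  # 2: both chr1 reads cover position 103
```

On Spark, the same two steps would roughly correspond to an RDD `flatMap` followed by `reduceByKey`; this sketch simply runs them in a single process.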

International Symposium on Methodologies for Intelligent Systems | 2008

Text Onto Miner: a semi-automated ontology building system

Piotr Gawrysiak; Grzegorz Protaziuk; Henryk Rybinski; Alexandre Delteil

This paper presents an overview of the results of a project undertaken by the Institute of Computer Science at Warsaw University of Technology as part of a research agreement with France Telecom. The project's goal was to create a set of tools, both software and methods, that could speed up and improve the process of creating ontologies. In the course of the project, a new ontology building methodology was devised, new text mining algorithms optimized for extracting ontology-relevant information from text corpora were proposed, and a universal text mining toolkit, the TOM Platform, was implemented.

Emerging Intelligent Technologies in Industry | 2011

Emerging Intelligent Technologies in Industry

Dominik Ryżko; Henryk Rybinski; Piotr Gawrysiak; Marzena Kryszkiewicz

Intelligent technologies are essential drivers of innovation and enable industry to overcome technological limitations and explore new frontiers. It is therefore necessary for scientists and practitioners to cooperate, inspire each other, and use the latest research results in creating new designs and products. The idea for this book grew out of the industrial workshop organized at the ISMIS conference in Warsaw in 2011. The book covers several applications of emerging intelligent technologies in various branches of industry. The contributions describe modern intelligent tools, algorithms and architectures that have the potential to solve real problems experienced by practitioners in various industry sectors. We hope this volume will show new directions for cooperation between science and industry and will facilitate efficient transfer of knowledge in the area of intelligent information systems.

Atlantic Web Intelligence Conference | 2007

The Analysis and Visualization of Entries in Wiki Services

Jakub Gawryjołek; Piotr Gawrysiak

The use of online collaboration environments has become exceptionally widespread over the past decade. One of the most popular forms of collaboration is the “wiki” website, which has attracted attention because of its policy of letting anyone become an editor. This paper presents a technique for the analysis and visualization of Wikipedia, the largest wiki in existence, concentrating on the activity patterns of its contributors. First, a new visualization and analysis tool named JWikiVis is presented. Second, some interesting user behaviors observed with this software are described. Finally, text classification algorithms are applied to identify patterns in individual wiki pages as well as in the entire service.

International Conference on Applications of Declarative Programming and Knowledge Management | 2001

Mining multi-dimensional quantitative associations

Michal J. Okoniewski; Łukasz Gancarz; Piotr Gawrysiak

The new form of quantitative, multi-dimensional association rules presented here, unlike other approaches, does not require discretization of real-valued attributes as a preprocessing step. Instead, associations are discovered with data-driven algorithms. Such rules may therefore be a good tool for learning useful and precise knowledge from scientific, spatial or multimedia data, because data-driven algorithms work well with any sampling method. This paper presents the complete methodology for automatic discovery of the new rules, including theoretical background, algorithms, complexity analysis and postprocessing techniques. The methodology was designed for a specific telecom research problem, but it is expected to have a wide range of applications.

ICMMI | 2011

The Mobile Personal Augmented Reality Navigation System

Jakub Królewski; Piotr Gawrysiak

In this paper we present a prototype of a novel augmented reality (AR) navigation system that provides information about a city's public transport system through an AR interface on a mobile device. The system provides point-to-point directions, including directions to tram and bus stops, real-time monitoring of public transport schedules, and monitoring of the user's transit. Unlike other existing AR solutions, the system described here constantly monitors the user's journey and can ‘guide’ them in an urban setting with minimal interaction. The functionality of the system has been tested in an urban environment. The system's AR engine and the data interfacing engine that collects and processes public transport information were created from scratch for this system.

International Symposium on Methodologies for Intelligent Systems | 2012

Using web mining for discovering spatial patterns and hot spots for spatial generalization

Jan Burdziej; Piotr Gawrysiak

In this paper we propose a novel approach to spatial data generalization in which information about web user behavior influences the generalization and mapping process. Our approach combines usage information from web resources such as Wikipedia with search engine index statistics to determine an importance score for geographical objects, which is then used during map preparation.
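
The abstract does not state how the two signals are combined. The sketch below assumes, purely for illustration, a weighted sum of log-scaled, max-normalized Wikipedia page views and search-engine hit counts, followed by a threshold filter that decides which objects survive generalization at a given map scale. The `GeoObject` fields, the equal weights and the threshold are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class GeoObject:
    name: str
    wiki_views: int   # hypothetical: page views of the object's Wikipedia article
    search_hits: int  # hypothetical: search-engine hit count for the object's name

def normalized_log(values: List[int]) -> List[float]:
    """Log-scale the (heavily skewed) counts, then divide by the maximum."""
    logs = [math.log1p(v) for v in values]
    top = max(logs) or 1.0  # avoid dividing by zero when every count is 0
    return [x / top for x in logs]

def importance_scores(objs: List[GeoObject], w_wiki: float = 0.5,
                      w_search: float = 0.5) -> Dict[str, float]:
    """Weighted sum of both normalized signals (assumed weights); in [0, 1] when weights sum to 1."""
    wiki = normalized_log([o.wiki_views for o in objs])
    search = normalized_log([o.search_hits for o in objs])
    return {o.name: w_wiki * a + w_search * b for o, a, b in zip(objs, wiki, search)}

def keep_for_scale(objs: List[GeoObject], threshold: float) -> List[str]:
    """Hypothetical generalization rule: keep objects scoring at least `threshold`, most important first."""
    scores = importance_scores(objs)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [name for name in ranked if scores[name] >= threshold]

objs = [GeoObject("Warsaw", 100_000, 5_000_000),
        GeoObject("Village", 10, 200),
        GeoObject("Town", 1_000, 20_000)]
print(keep_for_scale(objs, 0.5))  # ['Warsaw', 'Town']
```

Log scaling is one plausible choice here because popularity counts are heavily skewed; without it, a single very famous place would push every other score toward zero.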

Intelligent Information Systems | 2003

Automatic Classification of Executable Code for Computer Virus Detection

Pawel Kierski; Michal J. Okoniewski; Piotr Gawrysiak

Automatic knowledge discovery methodologies have proved to be a very strong tool and are currently widely used to analyze the large datasets produced by organizations worldwide. However, this analysis is mostly done for relatively simple, structured data, such as transactional or financial records. The real frontier for current KDD research seems to be the analysis of unstructured data, such as free-form text, web pages and images. In this paper we present the results of applying KDD methodology to one kind of unstructured data, namely computer machine code. We show that it is possible to construct an automatic classification system that can distinguish “good” computer code from malicious code (in our case, the code of computer viruses) and which could therefore act as an intelligent virus scanner. In our approach we use methods originating from the text mining field, treating CPU instructions as a kind of natural language.
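
The abstract does not name the features or the classifier used. A minimal sketch of the "instructions as words" idea, assuming opcode mnemonics have already been extracted by a disassembler: consecutive opcodes form bigram terms, and a multinomial naive Bayes classifier with Laplace smoothing (a standard text classifier, not necessarily the authors' choice) labels a sequence. The training sequences are toy placeholders, not real virus code.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

def ngrams(opcodes: Sequence[str], n: int = 2) -> List[Tuple[str, ...]]:
    """Treat consecutive instructions like words and join them into n-gram terms."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]

class NaiveBayes:
    """Multinomial naive Bayes over opcode n-grams, with Laplace (add-one) smoothing."""

    def fit(self, docs: List[Sequence[str]], labels: List[str]) -> "NaiveBayes":
        self.counts: Dict[str, Counter] = {label: Counter() for label in set(labels)}
        for doc, label in zip(docs, labels):
            self.counts[label].update(ngrams(doc))
        # Log prior: fraction of training programs carrying each label.
        self.log_priors = {label: math.log(labels.count(label) / len(labels))
                           for label in self.counts}
        self.vocab = set().union(*self.counts.values())
        return self

    def predict(self, opcodes: Sequence[str]) -> str:
        """Return the label with the highest log posterior for this opcode sequence."""
        def log_posterior(label: str) -> float:
            counter = self.counts[label]
            denom = sum(counter.values()) + len(self.vocab)
            return self.log_priors[label] + sum(
                math.log((counter[term] + 1) / denom) for term in ngrams(opcodes))
        return max(self.counts, key=log_posterior)

# Toy training data (hypothetical opcode traces, not real programs).
clean = [["push", "mov", "call", "pop", "ret"], ["push", "mov", "add", "pop", "ret"]]
viral = [["xor", "xor", "jmp", "xor", "int"], ["xor", "jmp", "xor", "int", "int"]]
model = NaiveBayes().fit(clean + viral, ["clean"] * 2 + ["virus"] * 2)
print(model.predict(["push", "mov", "call", "ret"]))  # clean
print(model.predict(["xor", "jmp", "xor", "int"]))    # virus
```

Bigrams are used rather than single opcodes because common instructions such as `mov` appear in almost every program; short sequences carry more of the "phrasing" that distinguishes code families.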

Intelligent Information Systems | 2001

Regression - Yet Another Clustering Method

Piotr Gawrysiak; Michal J. Okoniewski; Henryk Rybinski

The paper describes a new clustering methodology that partitions a data set into clusters such that the regression indetermination coefficient for the data in each cluster is minimized. A clustering algorithm that implements this methodology using genetic programming is presented, together with some experimental results. The application of the algorithm to planning cellular telephone networks is discussed.
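
The paper's algorithm uses genetic programming; the sketch below swaps in a simpler alternating heuristic (fit one least-squares line per cluster, then move each point to the line with the smallest squared residual) to illustrate the objective. It reads the "regression indetermination coefficient" as 1 - R^2 of a cluster's fitted line, which is an assumption. Like k-means, this local search can stop in a poor local optimum, which is one reason a global search such as genetic programming may be preferred.

```python
from typing import List, Optional, Sequence, Tuple

Line = Tuple[float, float]  # (slope, intercept) of y = slope * x + intercept

def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Line:
    """Ordinary least-squares fit of y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0  # all x values identical: fall back to a flat line
    return a, my - a * mx

def indetermination(xs: Sequence[float], ys: Sequence[float], line: Line) -> float:
    """1 - R^2: the share of the variance of y that the line leaves unexplained."""
    a, b = line
    my = sum(ys) / len(ys)
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return ss_res / ss_tot if ss_tot else 0.0

def regression_clustering(xs: Sequence[float], ys: Sequence[float],
                          init: List[int], k: int, max_iter: int = 50) -> List[int]:
    """Alternate between fitting one line per cluster and reassigning each
    point to the line with the smallest squared residual (a local search)."""
    labels = list(init)
    for _ in range(max_iter):
        lines: List[Optional[Line]] = []
        for c in range(k):
            idx = [i for i, lab in enumerate(labels) if lab == c]
            lines.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]) if idx else None)
        new_labels = []
        for x, y in zip(xs, ys):
            res = [(y - (ln[0] * x + ln[1])) ** 2 if ln is not None else float("inf")
                   for ln in lines]
            new_labels.append(res.index(min(res)))
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
    return labels

# Toy data: two parallel lines; one point of the lower line starts mislabeled.
xs = [0, 1, 2, 3, 4, 5] * 2
ys = list(range(6)) + [x + 100 for x in range(6)]
init = [0, 0, 0, 0, 0, 1] + [1] * 6
labels = regression_clustering(xs, ys, init, k=2)
print(labels)
```

After convergence each cluster's points lie on a single line, so both clusters reach an indetermination coefficient of 0 in this toy case.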

International Symposium on Methodologies for Intelligent Systems | 2011

Extracting product descriptions from Polish e-commerce websites using classification and clustering

Piotr Kołaczkowski; Piotr Gawrysiak

A novel method for extracting product descriptions from e-commerce websites is presented. The algorithm consists of three major steps: (1) extracting descriptions of appropriate length from source documents related to the search query using shallow text analysis methods; (2) assigning each description to one of the predefined categories by means of text classification; and (3) grouping the results with a text clustering algorithm and returning the descriptions found in the highest-quality clusters. The recall and precision of the search are examined using a set of queries for laptops currently sold on popular shopping sites. It is shown that, although extraction based purely on classification and extraction based purely on clustering both give acceptable results, the highest precision is achieved when the two are combined. It was also observed that examining roughly the first 20 sites returned by Google is sufficient to obtain high-quality descriptions of popular products.
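
The three steps can be sketched as a toy pipeline. Every concrete choice below is an assumption for illustration, not the authors' method: lines of 5 to 40 words stand in for the shallow length filter, keyword overlap with hypothetical category vocabularies stands in for the text classifier, greedy Jaccard clustering stands in for the clustering algorithm, and the largest cluster is taken as the highest-quality one.

```python
import re
from typing import Dict, List, Set

def tokens(text: str) -> Set[str]:
    """Lower-cased word set used for both classification and clustering."""
    return set(re.findall(r"\w+", text.lower()))

def extract_candidates(page: str, min_words: int = 5, max_words: int = 40) -> List[str]:
    """Step 1 stand-in (shallow analysis): keep lines whose length fits a description."""
    lines = [line.strip() for line in page.split("\n") if line.strip()]
    return [line for line in lines if min_words <= len(line.split()) <= max_words]

def classify(text: str, categories: Dict[str, Set[str]]) -> str:
    """Step 2 stand-in: pick the category whose keyword set overlaps the text most."""
    words = tokens(text)
    return max(categories, key=lambda c: len(words & categories[c]))

def jaccard(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster(texts: List[str], threshold: float = 0.3) -> List[List[str]]:
    """Step 3 stand-in: greedy single-pass clustering against each cluster's first member."""
    clusters: List[List[str]] = []
    for text in texts:
        for group in clusters:
            if jaccard(tokens(text), tokens(group[0])) >= threshold:
                group.append(text)
                break
        else:  # no sufficiently similar cluster: start a new one
            clusters.append([text])
    return clusters

def best_descriptions(pages: List[str], categories: Dict[str, Set[str]],
                      wanted: str) -> List[str]:
    candidates = [c for page in pages for c in extract_candidates(page)]
    on_topic = [c for c in candidates if classify(c, categories) == wanted]
    clusters = cluster(on_topic)
    # Assumption: the cluster that most pages agree on (the largest) is the best one.
    return max(clusters, key=len) if clusters else []

# Hypothetical keyword sets and pages for a laptop query.
categories = {
    "description": {"laptop", "screen", "processor", "ram", "battery", "display"},
    "shipping": {"delivery", "shipping", "days", "courier", "order"},
}
pages = [
    "Buy now!\nThis laptop has a 15 inch screen, fast processor and 16 GB RAM.\n"
    "Free delivery within 2 days by courier on every order.",
    "Menu\nThis laptop has a 15 inch screen, a fast processor and 16 GB RAM.",
    "Great laptop with long battery life and a bright display for work.",
]
for line in best_descriptions(pages, categories, "description"):
    print(line)
```

In this toy run the classifier removes the shipping notice, and clustering then prefers the description repeated across two pages over the one that appears only once, mirroring the abstract's point that combining classification with clustering raises precision.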
