Pawel Boinski
Poznań University of Technology
Publication
Featured research published by Pawel Boinski.
data warehousing and knowledge discovery | 2012
Pawel Boinski; Maciej Zakrzewicz
We consider the problem of executing collocation pattern queries in limited memory environments. Our experiments show that if the memory size is not sufficient to hold all internal data structures used by the iCPI-tree algorithm, its performance decreases dramatically. We present a new method to efficiently process collocation pattern queries using a materialized improved candidate pattern instance tree (iCPI-tree). We have implemented and tested this solution and shown that it can significantly improve the performance of the iCPI-tree algorithm.
intelligent information systems | 2014
Pawel Boinski; Maciej Zakrzewicz
The rapid growth of spatial datasets requires methods for (semi-)automatic discovery of spatial knowledge from these sets. Spatial collocation patterns represent subsets of spatial features whose instances are frequently located together in a spatial neighborhood. In recent years, efficient methods for collocation discovery have been developed; however, none of them assumes a limited size of the operational memory or limited access to memory with short access times. Such restrictions are especially important given the large size of the data structures required for efficient identification of collocation instances. In this work we present and compare three algorithms for collocation pattern mining in a limited memory environment. The first algorithm is based on the well-known joinless method introduced by Shekhar and Yoo. The second and third algorithms are inspired by a tree structure (iCPI-tree) presented by Wang et al. In our experimental evaluation, we have compared the efficiency of the algorithms on both synthetic and real-world datasets.
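To make the notion of "frequently located together in a spatial neighborhood" concrete, the following is a minimal brute-force sketch for a two-feature pattern. It is not any of the three algorithms compared in the paper. It assumes a Euclidean distance threshold as the neighbor relation and uses the participation index, a prevalence measure common in the collocation literature but not named in the abstract. The function name `participation_index` and the tuple layout `(feature, x, y)` are hypothetical.

```python
from math import dist


def participation_index(points, feat_a, feat_b, max_dist):
    """Prevalence of the pattern {feat_a, feat_b} (illustrative, brute force).

    points   -- list of (feature, x, y) tuples (assumed layout)
    max_dist -- two instances are neighbors if their Euclidean
                distance is at most max_dist (assumed neighbor relation)
    """
    a_all = [i for i, p in enumerate(points) if p[0] == feat_a]
    b_all = [i for i, p in enumerate(points) if p[0] == feat_b]
    if not a_all or not b_all:
        return 0.0

    # Collect the instances of each feature that take part in at least
    # one neighboring (feat_a, feat_b) pair.
    a_in, b_in = set(), set()
    for i in a_all:
        for j in b_all:
            if dist(points[i][1:], points[j][1:]) <= max_dist:
                a_in.add(i)
                b_in.add(j)

    # Participation ratio per feature; the index is the minimum of them.
    return min(len(a_in) / len(a_all), len(b_in) / len(b_all))


if __name__ == "__main__":
    sample = [("A", 0, 0), ("A", 10, 10), ("B", 0, 1), ("B", 0.5, 0), ("C", 50, 50)]
    print(participation_index(sample, "A", "B", max_dist=2))
```

The nested loop compares every pair of instances. Avoiding that cost, and fitting the resulting neighbor information into limited memory, is what structures such as the iCPI-tree are used for.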
advances in databases and information systems | 2013
Witold Andrzejewski; Pawel Boinski
Collocation pattern discovery is an active field of data mining in spatial databases. It consists in searching for types of spatial objects that are frequently located together in a spatial neighborhood. Application domains of such patterns include, but are not limited to, biology, geography, marketing and meteorology. To cope with the huge volumes of data involved, programmable high-performance graphics cards (GPUs) can be used. GPUs have recently been proven to be extremely efficient in accelerating many existing algorithms. In this paper we present GPU-CM, a GPU-accelerated version of the iCPI-tree based algorithm for the collocation discovery problem. To achieve the best performance, we introduce specially designed structures and processing methods that make the best use of the SIMD execution model. In the experimental evaluation we compare our GPU implementation with a parallel CPU implementation of the iCPI-tree method. The collected results show order-of-magnitude speedups over the CPU version of the algorithm.
data warehousing and knowledge discovery | 2006
Pawel Boinski; Marek Wojciechowski; Maciej Zakrzewicz
We consider the problem of concurrent execution of multiple frequent itemset queries. If such data mining queries operate on overlapping parts of the database, then their overall I/O cost can be reduced by integrating their dataset scans. The integration requires that data structures of many data mining queries are present in memory at the same time. If the memory size is not sufficient to hold all the data mining queries, then the queries must be scheduled into multiple phases of loading and processing. Since finding the optimal assignment of queries to phases is infeasible for large batches of queries due to the size of the search space, heuristic algorithms have to be applied. In this paper we formulate the problem of assigning the queries to phases as a particular case of hypergraph partitioning. To solve the problem, we propose and experimentally evaluate two greedy optimization algorithms.
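The scheduling problem described above can be illustrated with a small sketch. This is not either of the two greedy algorithms proposed in the paper, only a simple pairwise-merging heuristic under stated assumptions: each query reads a known set of database partitions, a phase scans each partition it needs exactly once, and the memory limit is simplified to a maximum number of queries per phase. All names (`phase_cost`, `total_cost`, `greedy_schedule`, `query_parts`, `part_size`) are hypothetical.

```python
from itertools import combinations


def phase_cost(phase, query_parts, part_size):
    """I/O cost of one phase: every partition needed by any query
    in the phase is scanned once."""
    needed = set().union(*(query_parts[q] for q in phase))
    return sum(part_size[p] for p in needed)


def total_cost(phases, query_parts, part_size):
    """Total I/O cost of a schedule (sum over its phases)."""
    return sum(phase_cost(ph, query_parts, part_size) for ph in phases)


def greedy_schedule(query_parts, part_size, max_queries_per_phase):
    """Start with one phase per query, then repeatedly merge the pair of
    phases whose merge saves the most I/O, while the merged phase still
    respects the (simplified) memory limit."""
    phases = [frozenset([q]) for q in query_parts]
    while True:
        best = None
        for a, b in combinations(phases, 2):
            if len(a) + len(b) > max_queries_per_phase:
                continue
            saving = (phase_cost(a, query_parts, part_size)
                      + phase_cost(b, query_parts, part_size)
                      - phase_cost(a | b, query_parts, part_size))
            if saving > 0 and (best is None or saving > best[0]):
                best = (saving, a, b)
        if best is None:
            return phases
        _, a, b = best
        phases = [p for p in phases if p not in (a, b)] + [a | b]


if __name__ == "__main__":
    query_parts = {"q1": {"A", "B"}, "q2": {"B", "C"}, "q3": {"D"}}
    part_size = {"A": 10, "B": 20, "C": 5, "D": 7}
    schedule = greedy_schedule(query_parts, part_size, max_queries_per_phase=2)
    print(schedule, total_cost(schedule, query_parts, part_size))
```

In this toy input, `q1` and `q2` share partition `B`, so placing them in the same phase avoids scanning it twice; `q3` shares nothing and gains nothing from being merged.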
advances in databases and information systems | 2014
Witold Andrzejewski; Pawel Boinski
In spatial databases, collocation pattern discovery is one of the most interesting fields of data mining. It consists in searching for types of spatial objects that are frequently located together in a spatial neighborhood. With the advent of data gathering techniques, huge volumes of spatial data are being collected. To cope with processing of such datasets, a GPU-accelerated version of the collocation pattern mining algorithm has been proposed recently [3]. However, the method assumes that a supporting structure that contains information about neighborhoods (called iCPI-tree) is given in advance. In this paper we present a GPU-based version of the iCPI-tree generation algorithm for the collocation pattern discovery problem. In an experimental evaluation we compare our GPU implementation with a parallel CPU implementation of the iCPI-tree generation method. The collected results show that the proposed solution is multiple times faster than the CPU version of the algorithm.
intelligent information systems | 2006
Pawel Boinski; Konrad Jozwiak; Marek Wojciechowski; Maciej Zakrzewicz
Frequent itemset mining is often regarded as advanced querying where a user specifies the source dataset and pattern constraints using a given constraint model. Recently, a new problem of optimizing processing of batches of frequent itemset queries has been considered. The best technique for this problem proposed so far is Common Counting, which consists in concurrent processing of frequent itemset queries and integrating their database scans. Common Counting requires that data structures of several queries are stored in main memory at the same time. Since in practice memory is limited, the crucial problem is scheduling the queries to Common Counting phases so that the I/O cost is optimized. According to our previous studies, the best algorithm for this task, applicable to large batches of queries, is CCAgglomerative. In this paper we present a novel query scheduling method CCAgglomerativeNoise, built around CCAgglomerative, increasing its chances of finding an optimal solution.
Journal of Database Management | 2015
Witold Andrzejewski; Pawel Boinski
This article tackles the problem of efficient construction of iCPI-trees, frequently used in co-location pattern discovery in spatial databases. It discusses methods for parallelizing the iCPI-tree construction and plane-sweep algorithms used in state-of-the-art algorithms for co-location pattern mining. The main contribution of this paper is threefold: (1) a general algorithm for parallel iCPI-tree construction is presented, (2) two variants of a parallel plane-sweep algorithm, which can be used in conjunction with the aforementioned iCPI-tree construction algorithm, are introduced, and (3) all three algorithms are implemented on the CUDA GPU platform and their performance is tested against an efficient multithreaded parallel implementation of iCPI-tree construction on CPU. Experiments show that our solutions allow for large speedups over the CPU version of the algorithm. This paper is an extension of the conference paper (Andrzejewski & Boinski, 2014).
advances in databases and information systems | 2013
Pawel Boinski; Maciej Zakrzewicz
Collocation pattern mining is one of the latest data mining techniques applied in Spatial Knowledge Discovery. We consider the problem of executing collocation pattern queries in a limited memory environment. In this paper we introduce a new method based on iCPI-tree materialization and spatial partitioning to efficiently discover collocation patterns. We have implemented this new solution and conducted a series of experiments. The results show a significant improvement in processing times on both synthetic and real-world datasets.
Expert Systems With Applications | 2018
Witold Andrzejewski; Pawel Boinski
In this paper, we investigate Co-location Pattern Mining (CPM) from big spatial datasets. CPM consists in searching for types of objects that are frequently located together in a spatial neighborhood. Knowledge about such patterns is very important in fields like biology, environmental sciences, epidemiology, etc. However, CPM is computationally challenging, mainly due to the large number of pattern instances hidden in spatial data. In this work, we propose a new solution that can utilize the power of multiple GPUs to increase the performance of CPM. The proposed solution is also capable of coping with GPU memory limits by dividing the work into multiple packages and compressing internal data structures. Experiments performed on large synthetic and real-world datasets show that we can achieve order-of-magnitude speedups in comparison to an efficient multithreaded CPU implementation. Our solution can greatly improve the performance of data analysis using widely available and energy-efficient graphics cards. As a result, CPM in large datasets is more viable for university researchers as well as smaller companies and organizations.
Archive | 2012
Marek Wojciechowski; Maciej Zakrzewicz; Pawel Boinski
Frequent itemset mining is often regarded as advanced querying where a user specifies the source dataset and pattern constraints using a given constraint model. In this chapter we address the problem of processing sets of frequent itemset queries, which brings the ideas of multiple-query optimization to the domain of data mining. The most attractive method of solving the problem with respect to possible practical applications is Common Counting, which consists in concurrent execution of the queries using Apriori, integrating the scans of the parts of the database shared among the queries. The major advantage of Common Counting over its alternatives is its applicability to arbitrarily large batches of queries. If the memory structures of all the queries to be processed by Common Counting do not fit together in main memory, the set of queries has to be partitioned into subsets processed in several phases. We formalize the problem of dividing the set of queries for Common Counting as a specific case of hypergraph partitioning and provide a comprehensive overview of query set partitioning algorithms proposed so far.