Rafal Malczok
Silesian University of Technology
Publications
Featured research published by Rafal Malczok.
Data Warehousing and Knowledge Discovery | 2005
Marcin Gorawski; Rafal Malczok
In this paper we present a solution called Materialized Aggregate List, designed for efficiently storing and processing long aggregate lists. An aggregate list contains aggregates calculated from the data stored in the database. In our approach, once created, the aggregates are materialized for further use. The list structure contains a table divided into pages. We present three different page-filling algorithms used when the list is browsed. We present test results and use them to estimate the best combination of the configuration parameters: the number of pages, the size of a single page, and the number of available database connections. The Materialized Aggregate List can be applied at every aggregation level of various indexing structures, such as an aR-tree.
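To make the paged layout concrete, here is a minimal Python sketch, assuming a hypothetical `compute_aggregate(i)` callback in place of the database query. The class `PagedAggregateList` is illustrative only: it fills a page on first access and keeps it materialized, but it does not model the paper's three page-filling algorithms, the limit on the number of pages, or database connections.

```python
from typing import Callable, Dict, Iterator, List


class PagedAggregateList:
    """Hypothetical sketch of a paged, materialized aggregate list.

    `compute_aggregate(i)` stands in for the database query producing the
    i-th aggregate; it is an assumption, not part of the original design.
    """

    def __init__(self, length: int, page_size: int,
                 compute_aggregate: Callable[[int], float]) -> None:
        self.length = length
        self.page_size = page_size
        self.compute_aggregate = compute_aggregate
        # Materialized pages keyed by page number; filled on first access.
        self.pages: Dict[int, List[float]] = {}
        self.computations = 0  # number of calls to compute_aggregate

    def _fill_page(self, page_no: int) -> List[float]:
        start = page_no * self.page_size
        end = min(start + self.page_size, self.length)
        page = []
        for i in range(start, end):
            page.append(self.compute_aggregate(i))
            self.computations += 1
        return page

    def __getitem__(self, index: int) -> float:
        if not 0 <= index < self.length:
            raise IndexError(index)
        page_no = index // self.page_size
        if page_no not in self.pages:
            self.pages[page_no] = self._fill_page(page_no)
        return self.pages[page_no][index % self.page_size]

    def __iter__(self) -> Iterator[float]:
        # Browsing the list fills pages one at a time, in order.
        for i in range(self.length):
            yield self[i]
```

Re-browsing the list reuses the materialized pages, so `computations` stops growing once every page has been filled.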
Lecture Notes in Computer Science | 2006
Marcin Gorawski; Rafal Malczok
Spatial information processing is an active research field in database technology. Spatial databases store information about the position of individual objects in space [6]. Our current research is focused on providing an efficient caching structure for a telemetric data warehouse. We cluster spatial objects when creating the levels of the structure, using a density-based clustering algorithm. The algorithm requires a user-defined parameter, Eps. As we cannot obtain Eps from the user for every level of the structure, we propose a heuristic approach for calculating it. The Automatic Eps Calculation (AEC) algorithm analyzes pairs of points, defining two quantities: the distance between the points and the density of the stripe between them. In this paper we describe in detail the operation of the algorithm and the interpretation of its results. The AEC algorithm was implemented in one centralized and two distributed versions. The included test results demonstrate the algorithm's correctness and efficiency on various datasets.
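The two per-pair quantities can be computed as below. This is a hedged sketch: the stripe is assumed to be a rectangle of width `stripe_width` centred on the segment between the two points, and density is assumed to mean points per unit area. The function name is hypothetical, and the way AEC turns these values into an Eps estimate is not reproduced.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def pair_quantities(points: List[Point], i: int, j: int,
                    stripe_width: float) -> Tuple[float, float]:
    """Return (distance, stripe density) for the pair points[i], points[j].

    Assumption: the stripe is the rectangle of width `stripe_width`
    centred on segment AB; density = other points inside it / its area.
    """
    (ax, ay), (bx, by) = points[i], points[j]
    dx, dy = bx - ax, by - ay
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return 0.0, 0.0
    inside = 0
    for k, (px, py) in enumerate(points):
        if k in (i, j):
            continue  # the pair itself is not counted
        # Projection parameter along AB: 0 at A, 1 at B.
        t = ((px - ax) * dx + (py - ay) * dy) / (dist * dist)
        # Perpendicular distance from P to the line through A and B.
        perp = abs((px - ax) * dy - (py - ay) * dx) / dist
        if 0.0 <= t <= 1.0 and perp <= stripe_width / 2:
            inside += 1
    return dist, inside / (dist * stripe_width)
```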
Advances in Databases and Information Systems | 2006
Marcin Gorawski; Rafal Malczok
Many real-life applications use various kinds of clustering algorithms. Applications dealing with spatial data, such as online map services or traffic tracking systems, are particularly popular and interesting. A very important branch of spatial systems is telemetry. Our current research is focused on providing an efficient caching structure that will accelerate the evaluation of spatial queries and improve the ways of storing and processing aggregates. We use a density-based clustering algorithm to create the structure levels. The clustering algorithm we use is fast and efficient, but it requires a user-defined Eps parameter. As we cannot obtain the Eps parameter from the user for every level of the structure, we propose an Automatic Eps Calculation (AEC) algorithm which, based on the characteristics of the point distribution, is able to estimate the value of Eps. The algorithm is not limited to telemetry-specific data and can be applied to any set of points located in a two-dimensional space. We describe in detail the operation of the algorithm, the test results, and possible improvements.
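For readers unfamiliar with the role of Eps, the sketch below shows a DBSCAN-style clustering in Python. The abstract says only that the algorithm is density based, so DBSCAN and its second parameter `min_pts` are assumptions used for illustration; the brute-force neighbourhood search stands in for whatever spatial index a real implementation would use.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]
NOISE = -1


def dbscan(points: List[Point], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels: List[Optional[int]] = [None] * len(points)

    def neighbours(i: int) -> List[int]:
        # Brute-force Eps-neighbourhood query (includes the point itself).
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbours(j)
            if len(more) >= min_pts:
                queue.extend(more)  # j is a core point: keep expanding
        cluster += 1
    return labels
```

A larger `eps` merges nearby groups, while a smaller one splits them or marks points as noise, which is why a per-level estimate of Eps matters.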
Advances in Intelligent Information and Database Systems | 2010
Marcin Gorawski; Rafal Malczok
As data warehouse solutions are adapted for use in many areas of everyday life, data warehouses are used to store and process many, often far from standard, kinds of data, such as maps, videos, and clickstreams, to name a few. A new type of data, stream data, generated by systems such as traffic monitoring or telemetry systems, motivated a new concept: the stream data warehouse. In this paper we address the problem of indexing spatial objects that generate data streams with a spatial indexing structure. Based on our motivating example, a telemetric system of integrated meter readings, and on the results of our previous work, we extend the solution we created for processing long but finite aggregate lists so that it can process data streams. We then describe how a spatial indexing structure is adapted for use in a stream data warehouse by modifying both the structure of the index nodes and the operation of the algorithm answering range-aggregate queries. The paper also contains an experimental evaluation of the proposed solution.
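The range-aggregate query that the paper adapts can be illustrated with the classic aR-tree idea: each node stores the bounding rectangle and the aggregate of its subtree, so a node lying entirely inside the query range contributes its stored aggregate without being descended. The sketch below is hypothetical and static; it does not include the stream-specific node changes the paper describes.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def contains(outer: Rect, inner_r: Rect) -> bool:
    return (outer[0] <= inner_r[0] and outer[1] <= inner_r[1]
            and inner_r[2] <= outer[2] and inner_r[3] <= outer[3])


def intersects(a: Rect, b: Rect) -> bool:
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])


@dataclass
class Node:
    mbr: Rect
    aggregate: float  # e.g. total consumption of all objects in the subtree
    children: List["Node"] = field(default_factory=list)


def leaf(x: float, y: float, value: float) -> Node:
    return Node((x, y, x, y), value)


def inner(*children: Node) -> Node:
    # Bounding rectangle and aggregate are derived from the children.
    mbr = (min(c.mbr[0] for c in children), min(c.mbr[1] for c in children),
           max(c.mbr[2] for c in children), max(c.mbr[3] for c in children))
    return Node(mbr, sum(c.aggregate for c in children), list(children))


def range_sum(node: Node, query: Rect) -> float:
    if not intersects(node.mbr, query):
        return 0.0
    if contains(query, node.mbr):
        return node.aggregate  # whole subtree inside: no need to descend
    return sum(range_sum(c, query) for c in node.children)
```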
Parallel Processing and Applied Mathematics | 2007
Marcin Gorawski; Rafal Malczok
Data processing computer systems store and process large volumes of data. The volumes tend to grow very quickly, especially in data warehouse systems. A few years ago data warehouses were used only to support strictly business decisions, but nowadays they are applied in many domains of everyday life. Stream data warehousing is a new and very demanding field. Car traffic monitoring, cell phone tracking, and integrated utility meter reading systems generate stream data. In a stream data warehouse the ETL process is continuous. Stream data processing poses many new challenges to memory management and data processing algorithms, the most important concerning the efficiency and scalability of the designed solutions. In this paper we present an example of a stream data warehouse and then, based on this example and the results of our previous work, we discuss a solution for parallel processing of stream data. We also show how to integrate the presented solution with a spatial aggregating index.
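A minimal sketch of one parallel step, assuming readings arrive in micro-batches of `(meter_id, timestamp, value)` tuples: the batch is partitioned by meter so each worker owns disjoint meters, each worker sums readings per fixed time window, and the partial results are combined. All names and the windowing scheme are assumptions; in CPython, threads illustrate the structure but do not speed up CPU-bound work.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple

Reading = Tuple[str, int, float]  # (meter_id, timestamp in seconds, value)
Key = Tuple[str, int]             # (meter_id, window start)


def aggregate_partition(readings: Iterable[Reading],
                        window: int) -> Dict[Key, float]:
    """Sum the readings of one partition per (meter, window start)."""
    sums: Dict[Key, float] = defaultdict(float)
    for meter, ts, value in readings:
        sums[(meter, ts - ts % window)] += value
    return dict(sums)


def parallel_aggregate(batch: List[Reading], workers: int,
                       window: int) -> Dict[Key, float]:
    # Partition by meter id so each meter is handled by exactly one worker.
    parts: List[List[Reading]] = [[] for _ in range(workers)]
    for r in batch:
        parts[hash(r[0]) % workers].append(r)
    result: Dict[Key, float] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(aggregate_partition, parts,
                                [window] * workers):
            result.update(partial)  # partitions are disjoint by meter
    return result
```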
International Conference on Parallel Processing | 2003
Marcin Gorawski; Rafal Malczok
Data warehouses are used to store large amounts of data. A data model makes it possible to separate data categories and establish relations between them. In this paper we introduce for the first time the concept of a distributed spatial data warehouse based on a multidimensional data model called the cascaded star schema [1]. To fully exploit the capabilities of the cascaded star, we use a new aggregation tree that indexes our model. After a detailed discussion of the cascaded star schema and the aggregation tree, we introduce the idea of distributing a data warehouse based on the cascaded star schema. Using Java, we implemented both a system running on a single computer and a distributed system. We then carried out tests whose results allow us to compare the performance of both systems. The test results show that distribution can improve the performance of a spatial data warehouse.
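Distribution pays off for aggregates that can be computed per data fragment and then merged. As a hedged illustration (not the paper's distribution scheme, which the abstract does not detail), the hypothetical `Partial` type below merges SUM, COUNT, MIN and MAX computed on several fragments and derives AVG as SUM/COUNT.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, List


@dataclass(frozen=True)
class Partial:
    """Partial aggregate computed by one server over its data fragment."""
    total: float
    count: int
    minimum: float
    maximum: float

    def merge(self, other: "Partial") -> "Partial":
        return Partial(self.total + other.total, self.count + other.count,
                       min(self.minimum, other.minimum),
                       max(self.maximum, other.maximum))

    @property
    def mean(self) -> float:
        return self.total / self.count


def local_aggregate(values: Iterable[float]) -> Partial:
    vals = list(values)  # fragment must be non-empty
    return Partial(sum(vals), len(vals), min(vals), max(vals))


def coordinator(fragments: List[List[float]]) -> Partial:
    # Each fragment would live on a separate server; here they are lists.
    return reduce(Partial.merge, (local_aggregate(f) for f in fragments))
```

Merging partials from any split of the data gives the same result as aggregating all of it on one machine. Each fragment must be non-empty here, since `min` and `max` of an empty list are undefined.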
Parallel, Distributed and Network-Based Processing | 2005
Marcin Gorawski; Rafal Malczok
In this paper we present a spatial telemetric data warehouse (STDW) system that we use for aggregating and analyzing huge amounts of spatial data. The data is generated by utility meters communicating via radio. To provide sufficient efficiency, we propose data and workload distribution as well as advanced indexing techniques. As the data model we use the cascaded star model, a spatial extension of the standard star schema. We use a dynamic indexing structure called the aggregation tree, whose operation is tightly integrated with the spatial character of the data. In this paper we address a few interesting details concerning the aggregation tree: materialization of the indexing structure, the memory management algorithm, and tree updating. The system is written in Java, and the database is Oracle 9i. Based on the test results, we show that the distributed system is significantly more efficient than the centralized version. We also show that selective materialization of indexing structure fragments strongly increases system efficiency.
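The abstract does not describe the memory management algorithm, so the sketch below swaps in a common policy for illustration: keep at most `capacity` materialized node aggregates and evict the least recently used one. `MaterializedNodeCache` and the `compute` callback are hypothetical.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class MaterializedNodeCache:
    """Keeps at most `capacity` node aggregates in memory (LRU eviction).

    Hypothetical sketch: `compute` stands in for recomputing a node's
    aggregate from the database; `capacity` stands in for a memory budget.
    """

    def __init__(self, capacity: int,
                 compute: Callable[[Hashable], float]) -> None:
        self.capacity = capacity
        self.compute = compute
        self.cache: "OrderedDict[Hashable, float]" = OrderedDict()
        self.misses = 0

    def get(self, node_id: Hashable) -> float:
        if node_id in self.cache:
            self.cache.move_to_end(node_id)  # mark as most recently used
            return self.cache[node_id]
        self.misses += 1
        value = self.compute(node_id)
        self.cache[node_id] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used
        return value
```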
Lecture Notes in Computer Science | 2004
Marcin Gorawski; Rafal Malczok
In this paper we present a spatial data warehouse system designed for aggregating and analyzing a wide range of spatial information. The data is generated by utility meters working in a telemetric system. The data warehouse is based on a new model called the cascaded star model, a spatial extension of the standard star schema. We use an indexing structure called the aggregation tree. Materialization of the aggregation tree and an integrated mechanism for estimating available memory greatly improve the efficiency of the system. The presented test results confirm the theoretical considerations.
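One way to picture what materializing an aggregation tree buys: precompute the aggregate of every node once, so a query for any node becomes a lookup instead of a scan of its leaves. The sketch assumes a hypothetical region, district, meter hierarchy given as path tuples; it is not the paper's tree layout.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Path = Tuple[str, ...]  # e.g. ("north", "d1", "m1"); () is the root


def materialize_rollup(facts: List[Tuple[Path, float]]) -> Dict[Path, float]:
    """Precompute the aggregate of every node of a hierarchy.

    Each fact is attached to a leaf path; every prefix of that path is an
    ancestor node whose aggregate includes the fact.
    """
    totals: Dict[Path, float] = defaultdict(float)
    for path, value in facts:
        for depth in range(len(path) + 1):
            totals[path[:depth]] += value
    return dict(totals)
```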
Database Systems for Advanced Applications | 2010
Marcin Gorawski; Rafal Malczok
Computer systems today process various types of data, such as images, videos, maps, and data streams, to name a few. In this paper we focus on the problem of answering range-aggregate queries over objects generating data streams. Our motivating example is a network of meters monitoring utility consumption and continuously reporting the readings to central gathering points. The answer to a range-aggregate query is a merged stream of aggregates that allows analysis of utility consumption in a given region. To calculate the answer, we integrate MAL (Materialized Aggregates List) with a spatial aggregating index, e.g. an aR-tree. The result is a spatial aggregating index capable of answering range queries over objects generating data streams. The index is embedded in an experimental stream data warehouse system implemented in Java. The implementation allowed us to demonstrate the operation of the index and to carry out a number of tests.
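The merged stream of aggregates can be sketched with Python's `heapq.merge`, which lazily merges already sorted iterators. This is a simplified, hypothetical version: meters are selected with a plain rectangle test instead of an aR-tree, each meter's stream is assumed to yield `(window_timestamp, value)` pairs in timestamp order, and values sharing a timestamp are summed.

```python
import heapq
from itertools import groupby
from typing import Dict, Iterator, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Sample = Tuple[int, float]                # (window timestamp, value)


def in_rect(x: float, y: float, r: Rect) -> bool:
    return r[0] <= x <= r[2] and r[1] <= y <= r[3]


def merged_region_stream(meters: Dict[str, Tuple[float, float]],
                         streams: Dict[str, Iterator[Sample]],
                         query: Rect) -> Iterator[Sample]:
    """Yield (timestamp, sum over meters inside `query`) in time order."""
    # Select the streams of meters located in the query rectangle.
    selected = [streams[m] for m, (x, y) in meters.items()
                if in_rect(x, y, query)]
    # Lazily merge the sorted per-meter streams by timestamp.
    merged = heapq.merge(*selected, key=lambda s: s[0])
    # Sum all samples that share a timestamp.
    for ts, group in groupby(merged, key=lambda s: s[0]):
        yield ts, sum(v for _, v in group)
```

Because both `heapq.merge` and `itertools.groupby` are lazy, the function also works on unbounded per-meter generators, yielding each window's total once a sample from the next window has been read.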
Parallel Processing and Applied Mathematics | 2005
Marcin Gorawski; Rafal Malczok
In this paper we present a solution called Materialized Aggregate List, designed for storing and processing long aggregate lists. An aggregate list contains aggregates calculated from the data stored in the database. In our approach, once created, the aggregates are materialized for further use. The list structure contains a table divided into pages. We present three different multithreaded page-filling algorithms used when the list is browsed. The Materialized Aggregate List can be applied as a node component at every aggregation level of indexing structures such as an aR-tree. We present test results evaluating the efficiency of the proposed solution.
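The abstract does not spell out the three multithreaded page-filling algorithms. As an assumption-labeled alternative, the sketch below prefetches up to `lookahead` pages ahead of the reader in a thread pool, so the next pages are usually ready when browsing reaches them. `browse_with_prefetch` and `fill_page` are hypothetical names.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, Iterator, List


def browse_with_prefetch(num_pages: int,
                         fill_page: Callable[[int], List[float]],
                         lookahead: int = 2,
                         workers: int = 2) -> Iterator[float]:
    """Iterate over all aggregates, filling pages ahead in worker threads.

    `fill_page(n)` is a hypothetical stand-in for the database query that
    computes page n; it must be thread-safe.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending: Dict[int, Future] = {}
        for n in range(num_pages):
            # Schedule page n and up to `lookahead` following pages.
            for m in range(n, min(n + lookahead + 1, num_pages)):
                if m not in pending:
                    pending[m] = pool.submit(fill_page, m)
            page = pending.pop(n).result()  # blocks only if not ready yet
            yield from page
```

Each page is submitted exactly once, and pages are yielded in order regardless of which worker finishes first.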