Marcin Gorawski

Silesian University of Technology

Publications

Featured research published by Marcin Gorawski.

Data Warehousing and Knowledge Discovery | 2005

On efficient storing and processing of long aggregate lists

Marcin Gorawski; Rafal Malczok

In this paper we present a solution called the Materialized Aggregate List, designed for efficiently storing and processing long aggregate lists. An aggregate list contains aggregates calculated from the data stored in the database. In our approach, once created, the aggregates are materialized for further use. The list structure contains a table divided into pages. We present three different page-filling algorithms used when the list is browsed. We present test results and use them to estimate the best combination of configuration parameters: the number of pages, the size of a single page, and the number of available database connections. The Materialized Aggregate List can be applied at every aggregation level of various indexing structures, such as an aR-tree.
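The paging idea above can be illustrated with a small in-memory sketch. It is not the authors' implementation: the class name MaterializedAggregateList, the `compute` callback standing in for a database aggregation query, the dictionary used as materialized storage, and least-recently-used page eviction are all assumptions made for illustration. Only a simple on-demand page-filling strategy is shown, not the three algorithms evaluated in the paper.

```python
from collections import OrderedDict
from typing import Callable, Dict, Iterator, List


class MaterializedAggregateList:
    """Hypothetical sketch: a long list of aggregates held as fixed-size pages."""

    def __init__(self, compute: Callable[[int], float], length: int,
                 page_size: int, max_pages: int) -> None:
        self.compute = compute        # stands in for a database aggregation query
        self.length = length
        self.page_size = page_size
        self.max_pages = max_pages    # pages kept in the in-memory table (>= 1)
        self.pages: "OrderedDict[int, List[float]]" = OrderedDict()
        self.store: Dict[int, List[float]] = {}  # stands in for materialized storage
        self.computations = 0         # number of aggregates actually computed

    def _fill_page(self, page_no: int) -> List[float]:
        """On-demand page filling: reuse a materialized page or compute it once."""
        if page_no in self.store:
            return self.store[page_no]
        start = page_no * self.page_size
        end = min(start + self.page_size, self.length)
        page = [self.compute(i) for i in range(start, end)]
        self.computations += end - start
        self.store[page_no] = page    # materialize for further use
        return page

    def __getitem__(self, i: int) -> float:
        if not 0 <= i < self.length:
            raise IndexError(i)
        page_no = i // self.page_size
        if page_no in self.pages:
            self.pages.move_to_end(page_no)       # mark as recently used
        else:
            self.pages[page_no] = self._fill_page(page_no)
            if len(self.pages) > self.max_pages:
                self.pages.popitem(last=False)    # evict least recently used page
        return self.pages[page_no][i - page_no * self.page_size]

    def __len__(self) -> int:
        return self.length

    def __iter__(self) -> Iterator[float]:
        for i in range(self.length):
            yield self[i]
```

For example, `MaterializedAggregateList(lambda i: float(i * i), length=10, page_size=4, max_pages=2)` computes each aggregate once; browsing the list a second time reuses the materialized pages, so `computations` stays at 10 even though only two pages fit in memory at a time.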

Database Systems for Advanced Applications | 2008

Collecting data streams from a distributed radio-based measurement system

Marcin Gorawski; Pawel Marks; Michal Gorawski

Processing rapid data streams that represent real-time events, such as large-scale financial transfers, road or network traffic, and sensor data, is becoming increasingly popular. Analysis of data streams enables new capabilities: intrusions can be detected while they are happening, and road traffic can be predicted from past and current vehicle flow. We addressed the problem of real-time analysis of stream data from a radio-based measurement system consisting of a large number of water, gas, and electricity meters. Our work focuses on delivering data from the meters to the stream data warehouse as quickly as possible, even when transmission failures occur. The system we designed is intended to significantly increase system reliability and availability. In this demonstration we present an example of the system's capabilities.

Availability, Reliability and Security | 2007

Protecting Private Information by Data Separation in Distributed Spatial Data Warehouse

Marcin Gorawski; Jakub Bularz

Both transactional and analytical systems store data that, if accessible to unauthorized persons, may lead to privacy violations. This issue has become especially important due to more restrictive legislation on personal data protection and data privacy. We introduce relation decomposition as a method of preserving data confidentiality in distributed spatial data warehouses. Separating data between the nodes of a distributed system can protect data privacy without requiring sensitive data to be encrypted. Relation decomposition strongly reduces the possibility of disclosing private information contained in the data warehouse. The article presents how a specified security policy can be implemented in the data warehouse system and how analytical applications can retrieve protected data from the database. Finally, we present test results verifying the efficiency of the latter operations, including a comparison between relation decomposition and the most popular method of preserving data privacy, i.e., data encryption using symmetric encryption algorithms.
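The general idea of vertical relation decomposition can be sketched on a hypothetical meter-reading relation: identifying attributes and sensitive attributes are stored as separate fragments, linked only by a random surrogate key, so neither fragment alone reveals whom a measurement belongs to. The function names, the column names, and the use of a random surrogate key are assumptions; the paper's actual decomposition rules and security-policy mechanism are not reproduced here.

```python
import secrets
from typing import Dict, List, Tuple

Row = Dict[str, object]
Fragment = Dict[str, Row]  # surrogate key -> partial row


def decompose(rows: List[Row], identifying: List[str],
              sensitive: List[str]) -> Tuple[Fragment, Fragment]:
    """Split each row vertically into two fragments meant for different nodes."""
    node_a: Fragment = {}
    node_b: Fragment = {}
    for row in rows:
        key = secrets.token_hex(8)  # random link key, derived from no attribute
        node_a[key] = {c: row[c] for c in identifying}
        node_b[key] = {c: row[c] for c in sensitive}
    return node_a, node_b


def reconstruct(node_a: Fragment, node_b: Fragment) -> List[Row]:
    """Authorized access only: rejoin the fragments on the surrogate key."""
    return [{**node_a[k], **node_b[k]} for k in node_a if k in node_b]
```

In a deployment, the two fragments would live on different nodes with separate access rights; only a query authorized to reach both nodes can perform the join in `reconstruct`.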

Database and Expert Systems Applications | 2006

Fault-Tolerant Distributed Stream Processing System

Marcin Gorawski; Pawel Marks

Real-time data processing systems are increasingly popular. Data warehouses not only collect terabytes of data; they also process endless data streams. To support this, data extraction must also become a continuous process, which raises the problem of failure resistance. It is important to process data on time, but it is even more important not to lose any data when a failure occurs. We achieve this by applying redundant distributed stream processing. In this paper, we present a fault-tolerant system designed for processing data streams originating from geographically distributed sources.
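One way to realize redundant stream processing is to send every tuple along several independent paths and deduplicate at the receiver by a (source, sequence number) pair, so that the failure of one path loses no data. The following is a minimal sketch under that assumption; the record layout and the names RedundantMerger and merge are hypothetical and do not describe the system's actual protocol.

```python
from typing import Dict, Iterable, List, Set, Tuple

Record = Tuple[str, int, float]  # (source_id, sequence_number, value)


class RedundantMerger:
    """Forwards the first copy of each record and drops copies from other paths."""

    def __init__(self) -> None:
        self.seen: Dict[str, Set[int]] = {}  # sequence numbers seen per source

    def accept(self, record: Record) -> bool:
        source, seq, _ = record
        seen = self.seen.setdefault(source, set())
        if seq in seen:
            return False  # duplicate delivered by another redundant path
        seen.add(seq)
        return True


def merge(arrivals: Iterable[Record]) -> List[Record]:
    """Merge records from all paths, in arrival order, without duplicates."""
    merger = RedundantMerger()
    return [r for r in arrivals if merger.accept(r)]
```

If one path delivers sequence numbers 0 to 2 and then fails while another delivers 0 to 4, `merge` still yields each of the five records exactly once. The per-source set grows without bound here; a long-running system would need to bound that state, for example by tracking only a contiguous-prefix watermark per source.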

Parallel Processing and Applied Mathematics | 2007

Modified R-MVB tree and BTV algorithm used in a distributed spatio-temporal data warehouse

Marcin Gorawski; Michal Gorawski

Structural and software modifications of the MVB-tree (reverse pointers, aggregations), made at the cost of higher space consumption, allow timestamp and time-aggregated queries to be answered quickly and easily. The software extensions are new algorithms that accelerate query processing. This paper briefly describes temporal data and the ways of handling them in the modified R-MVB tree, and presents the distributed system in which this index was tested, together with the load-balancing algorithm used in this solution.

SET | 2006

Checkpoint-based resumption in data warehouses

Marcin Gorawski; Pawel Marks

In this paper we focus on the problem of efficiently handling failures of ETL processes. During such a process, a data warehouse is filled with data. Because large amounts of data need to be processed, the whole process takes a long time, and after a failure there may be no time to restart it from the beginning. In such a situation a resumption algorithm should be applied. We present a new approach to checkpoint-based resumption that combines checkpointing with the Design-Resume algorithm. This combination is expected to work more efficiently than pure checkpointing. Moreover, not all ETL application modules need to implement checkpointing. We present the basic idea of the algorithm, its requirements, and the necessary definitions. The proposed solution is then compared with other resumption methods, and the results obtained are discussed.
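For comparison with the combined method described above, here is a minimal sketch of pure checkpointing, the baseline the paper improves on; Design-Resume is not modelled. It assumes that loaded rows and the input position are committed together every `interval` rows, so rows processed after the last checkpoint are simply redone after a failure. The class CheckpointedLoad and its parameters are hypothetical.

```python
from typing import Callable, List, Optional


class CheckpointedLoad:
    """Hypothetical sketch of pure checkpoint-based resumption (no Design-Resume)."""

    def __init__(self, interval: int) -> None:
        self.interval = interval          # rows between checkpoints
        self.warehouse: List[int] = []    # committed (loaded) rows
        self.checkpoint = 0               # committed position in the input

    def run(self, source: List[int], transform: Callable[[int], int],
            fail_at: Optional[int] = None) -> None:
        pending: List[int] = []           # transformed rows not yet committed
        pos = self.checkpoint             # resume from the last checkpoint
        while pos < len(source):
            if pos == fail_at:
                # uncommitted rows in `pending` are lost and will be redone
                raise RuntimeError(f"simulated failure at input row {pos}")
            pending.append(transform(source[pos]))
            pos += 1
            if len(pending) == self.interval or pos == len(source):
                # rows and position are committed together
                # (one database transaction in a real system)
                self.warehouse.extend(pending)
                self.checkpoint = pos
                pending = []
```

With `interval=4`, a run that fails at input row 6 leaves `checkpoint == 4`; calling `run` again processes rows 4 to 9, and the warehouse ends up with every row exactly once. The cost this baseline pays is the periodic commit during normal operation, which is the kind of overhead a Design-Resume-based approach aims to reduce.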

Lecture Notes in Computer Science | 2006

AEC algorithm: a heuristic approach to calculating density-based clustering Eps parameter

Marcin Gorawski; Rafal Malczok

Spatial information processing is an active research field in database technology. Spatial databases store information about the position of individual objects in space [6]. Our current research focuses on providing an efficient caching structure for a telemetric data warehouse. We cluster spatial objects when creating the levels of the structure, using a density-based clustering algorithm. The algorithm requires a user-defined parameter, Eps. Because we cannot obtain Eps from the user for every level of the structure, we propose a heuristic approach for calculating it. The Automatic Eps Calculation (AEC) algorithm analyzes pairs of points, defining two quantities: the distance between the points and the density of the stripe between them. In this paper we describe in detail the algorithm's operation and the interpretation of its results. The AEC algorithm was implemented in one centralized and two distributed versions. The included test results demonstrate the algorithm's correctness and efficiency on various datasets.
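The two per-pair quantities that AEC analyzes can be computed as below. This is only a sketch of those inputs: the stripe is assumed here to be a rectangle of a chosen `width` centred on the segment between the two points, density is taken as points per unit area of that rectangle, and the rule by which AEC derives Eps from these values is not given in the abstract, so it is not shown. The function names are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Tuple

Point = Tuple[float, float]


def stripe_count(p: Point, q: Point, others: List[Point], width: float) -> int:
    """Count points inside the rectangle of the given width spanning segment pq."""
    (px, py), (qx, qy) = p, q
    dx, dy = qx - px, qy - py
    length = math.hypot(dx, dy)  # caller guarantees p != q
    count = 0
    for x, y in others:
        # projection parameter along pq (0 at p, 1 at q) and perpendicular distance
        t = ((x - px) * dx + (y - py) * dy) / (length * length)
        perp = abs((x - px) * dy - (y - py) * dx) / length
        if 0.0 <= t <= 1.0 and perp <= width / 2:
            count += 1
    return count


def pair_statistics(points: List[Point], width: float) -> List[Tuple[float, float]]:
    """For every pair of distinct points: (distance, stripe density)."""
    stats = []
    for i, j in combinations(range(len(points)), 2):
        p, q = points[i], points[j]
        dist = math.dist(p, q)
        if dist == 0:
            continue  # coincident points define no stripe
        others = [points[k] for k in range(len(points)) if k not in (i, j)]
        density = stripe_count(p, q, others, width) / (dist * width)
        stats.append((dist, density))
    return stats
```

For the points (0, 0) and (4, 0) with `width=1.0`, the points (1, 0.1) and (2, -0.1) fall inside the stripe while (10, 10) does not, giving a distance of 4.0 and a density of 2 / (4 × 1) = 0.5. This brute-force version costs O(n³) for n points; a practical implementation would use a spatial index.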

Intelligent Data Engineering and Automated Learning | 2009

Extended cascaded star schema and ECOLAP operations for spatial data warehouse

Marcin Gorawski

This paper presents several new aspects of spatial data warehouse modeling. The extended cascaded star schema for the spatial telemetric data warehouse SDW(t) is defined. Research has shown a strong need for building many extended cascaded star schemas for SDWs as the outcome of separate spatio-temporal conceptual models. For one of these new data schemas, definitions of cascaded ECOLAP operations are presented. These operations are based on relational algebra and make it possible to execute ad-hoc queries.

International Conference on Dependability of Computer Systems | 2007

Towards Reliability and Fault-Tolerance of Distributed Stream Processing System

Marcin Gorawski; Pawel Marks

Not long ago, data warehouses were used to process data sets loaded periodically, and two kinds of ETL processes could be distinguished: full and incremental. Now we often have to process real-time data and analyse it almost on the fly, so that analyses are always up to date. There are many possible applications for real-time data warehouses. In most cases two features are important: delivering data to the warehouse as quickly as possible, and not losing any tuple when failures occur. In this paper we propose an architecture for gathering and processing data from geographically distributed data sources. We present a theoretical analysis, a mathematical model of a data source, and some rules for configuring the system modules. At the end of the paper our future plans are briefly described.

Database and Expert Systems Applications | 2005

High Efficiency of Hybrid Resumption in Distributed Data Warehouses

Marcin Gorawski; Pawel Marks

ETL processes are sometimes interrupted by a failure. In such a case, an algorithm for resuming the interrupted extraction is usually applied. In this paper we present a modified Design-Resume (DR) algorithm extended to handle ETL processes containing many loading nodes, and we use it to resume a distributed data warehouse load process. The key feature of DR is that it imposes no additional overhead on the normal ETL process. We modify the algorithm to work with more than one loading node and combine it with a staging technique, which increases the efficiency of the resumption process. We call the combined algorithm the hybrid resumption algorithm. Based on the results of the tests performed, the benefits of our improvements are discussed.
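According to the abstract, DR adds no overhead to the normal ETL process, which implies that the work of identifying already-loaded data happens only after a failure. A greatly simplified illustration of that idea for several loading nodes follows: assuming each node's staged input is sorted by a unique, increasing key and was loaded in that order, the already-loaded prefix can be skipped by comparing against the largest key found in the warehouse. This is a hypothetical sketch, not the Design-Resume or hybrid algorithm; the function name, the use of bare integer keys as rows, and the dictionaries standing in for the staging area and the warehouse are assumptions.

```python
from typing import Dict, List


def resume_load(staged: Dict[str, List[int]],
                warehouse: Dict[str, List[int]]) -> None:
    """Resume every loading node by skipping the prefix it already loaded.

    Assumes each node's staged rows are sorted by a unique, increasing key
    and were loaded in that order before the failure.
    """
    for node, rows in staged.items():
        loaded = warehouse.setdefault(node, [])
        # largest key that reached the warehouse before the failure, if any
        high_water = loaded[-1] if loaded else None
        for key in rows:
            if high_water is None or key > high_water:
                loaded.append(key)  # only rows not yet loaded are inserted
```

Because the staged input survives the failure, nothing has to be re-extracted from the sources, and nothing extra is recorded while loading normally; the only resumption cost is reading one high-water key per node.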