Marcin Gorawski
Silesian University of Technology
Publication
Featured research published by Marcin Gorawski.
data warehousing and knowledge discovery | 2005
Marcin Gorawski; Rafal Malczok
In this paper we present a solution called the Materialized Aggregate List, designed for efficient storage and processing of long aggregate lists. An aggregate list contains aggregates calculated from the data stored in the database. In our approach, once created, the aggregates are materialized for further use. The list structure contains a table divided into pages. We present three different page-filling algorithms used when the list is browsed. We present test results and use them to estimate the best combination of the configuration parameters: number of pages, size of a single page, and number of available database connections. The Materialized Aggregate List can be applied at every aggregation level in various indexing structures, such as an aR-tree.
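The paging idea in this abstract can be illustrated with a minimal sketch. It is not one of the three page-filling algorithms from the paper: the class name, the `compute` callback (standing in for a database query) and the `fills` counter are assumptions made only for illustration. A page of aggregates is filled the first time one of its entries is read and is then kept for later reads.

```python
from typing import Callable, Dict, List


class MaterializedAggregateList:
    """Toy paged list: aggregates are computed one page at a time and kept for reuse."""

    def __init__(self, compute: Callable[[int], float], length: int, page_size: int):
        self._compute = compute      # hypothetical stand-in for a database aggregate query
        self._length = length
        self._page_size = page_size
        self._pages: Dict[int, List[float]] = {}  # materialized pages, keyed by page number
        self.fills = 0               # number of pages filled so far (illustration only)

    def _fill_page(self, page_no: int) -> List[float]:
        # Compute every aggregate that belongs to this page in one go.
        start = page_no * self._page_size
        end = min(start + self._page_size, self._length)
        self.fills += 1
        return [self._compute(i) for i in range(start, end)]

    def __len__(self) -> int:
        return self._length

    def __getitem__(self, index: int) -> float:
        if not 0 <= index < self._length:
            raise IndexError(index)
        page_no, offset = divmod(index, self._page_size)
        if page_no not in self._pages:
            # First access to this page: fill it and keep it (materialization).
            self._pages[page_no] = self._fill_page(page_no)
        return self._pages[page_no][offset]
```

Reading two entries from the same page triggers a single fill; the parameters the abstract tunes (number of pages, page size, database connections) would govern how many pages are kept and how they are filled, which this sketch does not model.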
database systems for advanced applications | 2008
Marcin Gorawski; Pawel Marks; Michal Gorawski
Processing rapid data streams that represent real-time events, such as large-scale financial transfers, road or network traffic, and sensor data, is becoming increasingly popular. Analysis of data streams enables new capabilities: it is possible to detect an intrusion while it is happening, or to predict road traffic based on an analysis of past and current vehicle flow. We address the problem of real-time analysis of stream data from a radio-based measurement system. The system consists of a large number of water, gas and electricity meters. Our work focuses on delivering data from the meters to the stream data warehouse as quickly as possible, even if transmission failures occur. The system we designed is intended to significantly increase system reliability and availability. In this demonstration we present an example of the system's capabilities.
availability, reliability and security | 2007
Marcin Gorawski; Jakub Bularz
Both transactional and analytical systems store data whose exposure to unauthorized persons may result in a privacy violation. This issue has become especially important due to more restrictive legislation concerning personal data protection and data privacy. We introduce relation decomposition as a method of preserving data confidentiality in distributed spatial data warehouses. Separating data between the nodes of a distributed system can easily protect data privacy without requiring encryption of sensitive data. Using relation decomposition strongly reduces the possibility of disclosing private information contained in the data warehouse. The article presents how a specified security policy can be implemented in the data warehouse system, as well as how analytical applications can retrieve protected data from the database. Finally, we present test results verifying the efficiency of the latter operations, including a comparison between relation decomposition and the most popular method of preserving data privacy, i.e., data encryption using symmetric encryption algorithms.
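As a rough illustration of the decomposition idea, the sketch below splits a relation vertically into fragments that share only a key, so that no single fragment holds all sensitive columns together. The function names, the column grouping and the use of a shared key for rejoining are assumptions for illustration; the abstract does not specify how the paper's method partitions or rejoins data.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def decompose(rows: List[Row], key: str, groups: Sequence[Sequence[str]]) -> List[List[Row]]:
    """Split each row into one fragment per column group; every fragment keeps the key."""
    fragments: List[List[Row]] = [[] for _ in groups]
    for row in rows:
        for fragment, columns in zip(fragments, groups):
            part: Row = {key: row[key]}
            part.update({column: row[column] for column in columns})
            fragment.append(part)
    return fragments


def reconstruct(fragments: List[List[Row]], key: str) -> List[Row]:
    """Rejoin fragments on the shared key (what an authorized application would do)."""
    joined: Dict[object, Row] = {}
    for fragment in fragments:
        for part in fragment:
            joined.setdefault(part[key], {}).update(part)
    return list(joined.values())
```

In a distributed setting each fragment would be stored on a different node, so that compromising one node reveals only part of each record.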
database and expert systems applications | 2006
Marcin Gorawski; Pawel Marks
Real-time data processing systems are increasingly popular. Data warehouses not only collect terabytes of data, they also process endless data streams. To support this, data extraction must also become a continuous process. This raises the problem of failure resistance: it is important to process a set of data on time, but it is even more important not to lose any data when a failure occurs. We achieve this by applying redundant distributed stream processing. In this paper, we present a fault-tolerant system designed for processing data streams originating from geographically distributed sources.
parallel processing and applied mathematics | 2007
Marcin Gorawski; Michal Gorawski
Structural and software modifications of the MVB-tree (reverse pointers, aggregations) make it possible, in exchange for higher space consumption, to answer timestamp and time-aggregated queries quickly and easily. The software extensions are new algorithms that accelerate query processing. This paper contains a brief description of temporal data and ways of handling them in the modified R-MVB tree, and presents the distributed system in which the above-mentioned index was tested, along with the load balancing algorithm used in this solution.
SET | 2006
Marcin Gorawski; Pawel Marks
In this paper we focus on the problem of efficiently handling ETL process failures. During such a process, a data warehouse is filled with data. Because large amounts of data need to be processed, the whole process takes a long time. After a failure there may be no time to restart the process from the beginning. In such a situation a resumption algorithm should be applied. We present a new approach to the checkpoint-based resumption method: we combine checkpointing with the Design-Resume algorithm. Such a combination is expected to work more efficiently than pure checkpointing. Moreover, not all ETL application modules must implement checkpointing. We present the basic idea of the algorithm, its requirements and the necessary definitions. The proposed solution is then compared to other resumption methods and the obtained results are discussed.
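For readers unfamiliar with checkpoint-based resumption, the following minimal sketch shows pure checkpointing only. It does not implement the Design-Resume algorithm or the combination proposed in the paper. The function name, the in-memory `checkpoint` dictionary and the `fail_at` failure injection are assumptions for illustration.

```python
from typing import Callable, Dict, List, Optional


def run_load(source: List[int], target: List[int], checkpoint: Dict[str, int],
             transform: Callable[[int], int], every: int = 3,
             fail_at: Optional[int] = None) -> None:
    """Load transformed rows into target, saving the position every `every` rows."""
    position = checkpoint.get("position", 0)
    # Roll back rows loaded after the last checkpoint so they are not duplicated.
    del target[position:]
    for i in range(position, len(source)):
        if i == fail_at:
            raise RuntimeError(f"simulated failure at row {i}")
        target.append(transform(source[i]))
        if (i + 1) % every == 0:
            checkpoint["position"] = i + 1  # rows [0, i] are now safely loaded
```

After a failure, calling `run_load` again with the same `checkpoint` resumes from the last saved position instead of from the first row; only the rows processed since that checkpoint are repeated.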
Lecture Notes in Computer Science | 2006
Marcin Gorawski; Rafal Malczok
Spatial information processing is an active research field in database technology. Spatial databases store information about the position of individual objects in space [6]. Our current research is focused on providing an efficient caching structure for a telemetric data warehouse. We perform spatial object clustering when creating the levels of the structure. For this purpose we employ a density-based clustering algorithm. The algorithm requires a user-defined parameter, Eps. As we cannot obtain Eps from the user for every level of the structure, we propose a heuristic approach to calculating it. The Automatic Eps Calculation (AEC) algorithm analyzes pairs of points, defining two quantities: the distance between the points and the density of the stripe between the points. In this paper we describe in detail the operation of the algorithm and the interpretation of its results. The AEC algorithm was implemented in one centralized and two distributed versions. The included test results demonstrate the algorithm's correctness and efficiency on various datasets.
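The two quantities named in this abstract can be sketched for a single pair of points. This is not the AEC algorithm itself. The abstract does not define the stripe, so the sketch assumes it is a rectangle of a given `width` centered on the segment between the two points, and takes density as the number of points inside it divided by its area. The function name and the `width` parameter are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def stripe_stats(a: Point, b: Point, points: List[Point], width: float) -> Tuple[float, float]:
    """Return (distance between a and b, point density of the stripe between them)."""
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0 or width <= 0:
        raise ValueError("points must differ and width must be positive")
    inside = 0
    for px, py in points:
        # Position of the projection along the segment (0 at a, 1 at b).
        t = ((px - ax) * dx + (py - ay) * dy) / (length * length)
        # Perpendicular distance from the line through a and b.
        offset = abs((px - ax) * dy - (py - ay) * dx) / length
        if 0 <= t <= 1 and offset <= width / 2:
            inside += 1
    return length, inside / (length * width)
```

A heuristic in the spirit of the abstract could compare these two values across many point pairs to choose Eps; how the paper actually combines them is not stated here.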
intelligent data engineering and automated learning | 2009
Marcin Gorawski
In this paper several new aspects of spatial data warehouse modeling are presented. The extended cascaded star schema for the spatial telemetric data warehouse SDW(t) is defined. Research has shown that there is a strong need to build many SDW extended cascaded star schemas as the outcome of separate spatio-temporal conceptual models. For one of these new data schemas, definitions of cascaded ECOLAP operations are presented. These operations are based on relational algebra and make it possible to execute ad hoc queries.
international conference on dependability of computer systems | 2007
Marcin Gorawski; Pawel Marks
Not so long ago data warehouses were used to process data sets loaded periodically. We could distinguish two kinds of ETL processes: full and incremental. Now we often have to process real-time data and analyse them almost on the fly, so that the analyses are always up to date. There are many possible applications for real-time data warehouses. In most cases two features are important: delivering data to the warehouse as quickly as possible, and not losing any tuple in case of failure. In this paper we propose an architecture for gathering and processing data from geographically distributed data sources. We present a theoretical analysis, a mathematical model of a data source, and some rules for configuring system modules. At the end of the paper our future plans are briefly described.
database and expert systems applications | 2005
Marcin Gorawski; Pawel Marks
ETL processes are sometimes interrupted by a failure. In such a case, one of the algorithms for resuming an interrupted extraction is usually used. In this paper we present a modified Design-Resume (DR) algorithm extended to handle ETL processes containing many loading nodes. We use the DR algorithm to resume a distributed data warehouse load process. The key feature of this algorithm is that it does not impose additional overhead on the normal ETL process. In our work we modify the algorithm to work with more than one loading node, and combine it with a staging technique, which increases the efficiency of the resumption process. We call the combined algorithm the hybrid resumption algorithm. Based on the results of the tests performed, the benefits of our improvements are discussed.