Pawel Marks
Silesian University of Technology
Publication
Featured research published by Pawel Marks.
database systems for advanced applications | 2008
Marcin Gorawski; Pawel Marks; Michal Gorawski
Processing rapid data streams that represent real-time events, such as large-scale financial transfers, road or network traffic, and sensor data, is becoming increasingly popular. Analysis of data streams enables new capabilities: intrusion detection can be performed while an intrusion is happening, and road traffic can be predicted based on an analysis of past and current vehicle flow. We address the problem of real-time analysis of stream data from a radio-based measurement system. The system consists of a large number of water, gas and electricity meters. Our work focuses on delivering data from the meters to the stream data warehouse as quickly as possible, even if transmission failures occur. The system we designed is intended to significantly increase reliability and availability. In this demonstration we present an example of the system's capabilities.
database and expert systems applications | 2006
Marcin Gorawski; Pawel Marks
Real-time data processing systems are increasingly popular. Data warehouses not only collect terabytes of data, they also process endless data streams. To support this, data extraction must also become a continuous process. This raises the problem of failure resistance: it is important to process a set of data on time, but it is even more important not to lose any data when a failure occurs. We achieve this by applying redundant distributed stream processing. In this paper, we present a fault-tolerant system designed for processing data streams originating from geographically distributed sources.
SET | 2006
Marcin Gorawski; Pawel Marks
In this paper we focus on the problem of efficiently handling ETL process failures. During an ETL process, a data warehouse is filled with data. Because large amounts of data need to be processed, the whole process takes a long time, and after a failure there may be no time to restart it from the beginning. In such a situation a resumption algorithm should be applied. We present a new approach to the checkpoint-based resumption method that combines checkpointing with the Design-Resume algorithm. Such a combination is expected to work more efficiently than pure checkpointing. Moreover, not all ETL application modules have to implement checkpointing. We present the basic idea of the algorithm, its requirements and the necessary definitions. The proposed solution is then compared to other resumption methods, and the obtained results are discussed.
international conference on dependability of computer systems | 2007
Marcin Gorawski; Pawel Marks
Not so long ago data warehouses were used to process data sets loaded periodically. We could distinguish two kinds of ETL processes: full and incremental. Now we often have to process real-time data and analyse them almost on-the-fly, so that the analyses are always up to date. There are many possible applications for real-time data warehouses. In most cases two features are important: delivering data to the warehouse as quickly as possible, and not losing any tuple in case of failures. In this paper we propose an architecture for gathering and processing data from geographically distributed data sources. We present a theoretical analysis, a mathematical model of a data source, and some rules for configuring the system modules. At the end of the paper our future plans are described briefly.
database and expert systems applications | 2005
Marcin Gorawski; Pawel Marks
ETL processes are sometimes interrupted by a failure. In such a case, one of the interrupted extraction resumption algorithms is usually used. In this paper we present a modified Design-Resume (DR) algorithm extended to handle ETL processes containing many loading nodes. We use the DR algorithm to resume a distributed data warehouse load process. The key feature of this algorithm is that it does not impose additional overhead on the normal ETL process. In our work we modify the algorithm to work with more than one loading node and combine it with a staging technique, which increases the efficiency of the resumption process. We call the combined algorithm the hybrid resumption algorithm. Based on the results of the performed tests, the benefits of our improvements are discussed.
database systems for advanced applications | 2008
Marcin Gorawski; Pawel Marks
Not so long ago data warehouses were used to process data sets loaded periodically during the ETL (Extraction, Transformation and Loading) process. We could distinguish two kinds of ETL processes: full and incremental. Now we often have to process real-time data and analyse them almost on-the-fly, so that the analyses are always up to date. There are many possible applications for real-time data warehouses. In most cases two features are important: delivering data to the warehouse as quickly as possible, and not losing any tuple in case of failures. In this paper we describe an architecture for gathering and processing data from geographically distributed data sources, and we define a method for analysing the properties of the connection structure and finding its weakest points in case of single and multiple node failures. At the end of the paper our future plans are described briefly.
availability, reliability and security | 2007
Marcin Gorawski; Pawel Marks
Not so long ago data warehouses were used to process data sets loaded periodically during the ETL (extraction, transformation and loading) process. We could distinguish two kinds of ETL processes: full and incremental. Now we often have to process real-time data and analyse them almost on-the-fly, so that the analyses are always up to date. There are many possible applications for real-time data warehouses. In most cases two features are important: delivering data to the warehouse as quickly as possible, and not losing any tuple in case of failures. In this paper we propose an architecture for gathering and processing data from geographically distributed data sources. We present a theoretical analysis, a mathematical model of a data source, some rules for configuring the system modules, and the results of experiments. At the end of the paper our future plans are described briefly.
very large data bases | 2005
Marcin Gorawski; Pawel Marks
A data warehouse is filled with data during the extraction process. Such a process is sometimes interrupted by a failure. After a failure the warehouse contains an incomplete data set: part of the set is missing. To load the missing part of the data, one of the interrupted extraction resumption algorithms is usually used. In this paper we analyze the influence of data balancing used in a distributed data warehouse on the efficiency of the extraction and resumption processes. For resumption we rely on the Design-Resume algorithm, which imposes no additional overhead on an uninterrupted extraction process. We present how the balancing is done and examine its influence on the efficiency of the ETL process. Finally, based on the results of the performed tests, we discuss the advantages and disadvantages of balancing with respect to the ETL process.
Computer Networks and ISDN Systems | 2013
Marcin Gorawski; Pawel Marks; Michal Gorawski
In recent years the energy market has changed. Consumers in many countries are free to buy energy from any of the available providers. This requires continuous reading of a huge number of energy meters to evaluate the amount of energy bought from a particular provider. In this paper we present a fault-tolerant distributed stream processing system for continuous meter readings. The main goal of the system is to store the readings in a stream data warehouse for further analysis. We focus on modeling the data stream intensity in order to estimate the size of the buffers in the network of components that make up the system. We present both the mathematical model of the intensity and simulation results that confirm the correctness of the theoretical analysis.
Lecture Notes in Computer Science | 2006
Marcin Gorawski; Pawel Marks