Maciej Zakrzewicz
Poznań University of Technology
Publication
Featured research published by Maciej Zakrzewicz.
european conference on principles of data mining and knowledge discovery | 2000
Tadeusz Morzy; Marek Wojciechowski; Maciej Zakrzewicz
Data mining is a useful decision support technique, which can be used to find trends and regularities in warehouses of corporate data. A serious problem of its practical applications is long processing time required by data mining algorithms. Current systems consume minutes or hours to answer simple queries. In this paper we present the concept of materialized data mining views. Materialized data mining views store selected patterns discovered in a portion of a database, and are used for query rewriting, which transforms a data mining query into a query accessing a materialized view. Since the transformation is transparent to a user, materialized data mining views can be created and used like indexes.
pacific asia conference on knowledge discovery and data mining | 2001
Tadeusz Morzy; Marek Wojciechowski; Maciej Zakrzewicz
Data clustering methods have many applications in the area of data mining. Traditional clustering algorithms deal with quantitative or categorical data points. However, there exist many important databases that store categorical data sequences, where significant knowledge is hidden behind sequential dependencies between the data. In this paper we introduce a problem of clustering categorical data sequences and present an efficient scalable algorithm to solve the problem. Our algorithm implements the general idea of agglomerative hierarchical clustering and uses frequently occurring subsequences as features describing data sequences. The algorithm not only discovers a set of high quality clusters containing similar data sequences but also provides descriptions of the discovered clusters.
Lecture Notes in Computer Science | 2002
Marek Wojciechowski; Maciej Zakrzewicz
Many data mining techniques consist in discovering patterns frequently occurring in the source dataset. Typically, the goal is to discover all the patterns whose frequency in the dataset exceeds a user-specified threshold. However, very often users want to restrict the set of patterns to be discovered by adding extra constraints on the structure of patterns. Data mining systems should be able to exploit such constraints to speed up the mining process. In this paper, we focus on improving the efficiency of constraint-based frequent pattern mining by using dataset filtering techniques. Dataset filtering conceptually transforms a given data mining task into an equivalent one operating on a smaller dataset. We present transformation rules for various classes of patterns: itemsets, association rules, and sequential patterns, and discuss implementation issues regarding integration of dataset filtering with well-known pattern discovery algorithms.
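The dataset filtering idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes one simple constraint (every reported itemset must contain a given item), an absolute support count as the threshold, and a brute-force counter in place of a real pattern discovery algorithm. The function names `frequent_itemsets` and `mine_with_required_item` are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Brute-force frequent itemset counting (illustration only, exponential cost)."""
    counts = {}
    for t in transactions:
        items = sorted(t)
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                counts[combo] = counts.get(combo, 0) + 1
    return {s: c for s, c in counts.items() if c >= min_count}


def mine_with_required_item(transactions, min_count, required):
    """Mine itemsets that must contain `required`, using dataset filtering.

    Assumed constraint: the pattern contains `required`. A transaction
    without that item cannot support any such pattern, so it is dropped
    before mining. `min_count` is an absolute count, so shrinking the
    dataset does not change the meaning of the threshold.
    """
    filtered = [t for t in transactions if required in t]
    result = frequent_itemsets(filtered, min_count)
    # Keep only the patterns that satisfy the constraint.
    return {s: c for s, c in result.items() if required in s}


transactions = [{"a", "b", "c"}, {"a", "c"}, {"b", "d"}, {"a", "b"}, {"c", "d"}]
filtered_result = mine_with_required_item(transactions, 2, "a")
```

In this sketch the miner sees three transactions instead of five, and the supports of the surviving patterns are the same as those obtained by mining the full dataset and filtering afterwards.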
advances in databases and information systems | 1999
Tadeusz Morzy; Marek Wojciechowski; Maciej Zakrzewicz
Clustering is a data mining method that consists in discovering interesting data distributions in very large databases. The applications of clustering cover customer segmentation, catalog design, store layout, stock market segmentation, etc. In this paper, we consider the problem of discovering similarity-based clusters in a large database of event sequences. We introduce a hierarchical algorithm that uses sequential patterns found in the database to efficiently generate both the clustering model and data clusters. The algorithm iteratively merges smaller, similar clusters into bigger ones until the requested number of clusters is reached. In the absence of a well-defined metric space, we propose a similarity measure that is used in cluster merging. The advantage of the proposed measure is that no additional access to the source database is needed to evaluate the inter-cluster similarities.
data warehousing and knowledge discovery | 2000
Tadeusz Morzy; Marek Wojciechowski; Maciej Zakrzewicz
The most popular data mining techniques consist in searching databases for frequently occurring patterns, e.g. association rules, sequential patterns. We argue that in contrast to today's loosely-coupled tools, data mining should be regarded as advanced database querying and supported by Database Management Systems (DBMSs). In this paper we describe our research prototype system, which logically extends DBMS functionality, offering extensive support for pattern discovery, storage and management. We focus on the system architecture and novel SQL-based data mining query language, which serves as the user interface to the system.
advances in databases and information systems | 2003
Marek Wojciechowski; Maciej Zakrzewicz
Data mining queries are often submitted concurrently to the data mining system. The data mining system should take advantage of overlapping of the mined datasets. In this paper we focus on frequent itemset mining and we discuss and experimentally evaluate the implementation of the Common Counting method on top of the Apriori algorithm. The general idea of Common Counting is to reduce the number of times the common parts of the source datasets are scanned during the processing of the set of frequent pattern queries.
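A rough Python sketch of the general idea follows. It is an in-memory illustration under assumptions, not the evaluated implementation: the dataset is assumed to be already split into disjoint partitions, each query is described by the set of partitions it covers plus an absolute support count, and the names `common_counting` and `next_candidates` are hypothetical. In every Apriori iteration each partition is scanned once, and the candidates of all queries covering it are counted during that scan.

```python
from itertools import combinations


def next_candidates(frequent, k):
    """Apriori-style step: (k+1)-itemsets whose k-subsets are all frequent."""
    items = sorted({i for s in frequent for i in s})
    return {
        frozenset(c)
        for c in combinations(items, k + 1)
        if all(frozenset(sub) in frequent for sub in combinations(c, k))
    }


def common_counting(partitions, queries):
    """partitions: name -> list of transactions (sets of items).
    queries: name -> (set of partition names, absolute min_count).
    Returns (query name -> {itemset: count}, number of partition scans).
    """
    all_items = {i for part in partitions.values() for t in part for i in t}
    candidates = {q: {frozenset([i]) for i in all_items} for q in queries}
    results = {q: {} for q in queries}
    partition_scans = 0
    k = 1
    while any(candidates.values()):
        counts = {q: dict.fromkeys(candidates[q], 0) for q in queries}
        for name, part in partitions.items():
            # Queries that cover this partition and still have candidates.
            active = [q for q, (parts, _) in queries.items()
                      if name in parts and candidates[q]]
            if not active:
                continue
            partition_scans += 1  # one shared scan serves all active queries
            for t in part:
                for q in active:
                    for c in candidates[q]:
                        if c <= t:
                            counts[q][c] += 1
        for q, (_, min_count) in queries.items():
            frequent = {c for c, n in counts[q].items() if n >= min_count}
            results[q].update({c: counts[q][c] for c in frequent})
            candidates[q] = next_candidates(frequent, k)
        k += 1
    return results, partition_scans


partitions = {
    "p1": [{"a", "b"}, {"a", "c"}],
    "p2": [{"a", "b"}, {"b", "c"}],
    "p3": [{"b", "c"}, {"c"}],
}
queries = {"q1": ({"p1", "p2"}, 2), "q2": ({"p2", "p3"}, 2)}
results, scans = common_counting(partitions, queries)
```

In this example the shared partition `p2` is read once per iteration rather than once per query per iteration, which is where the saving comes from.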
advances in databases and information systems | 1998
Marek Wojciechowski; Maciej Zakrzewicz
Mining association rules is an important data mining problem. Association rules are usually mined repeatedly in different parts of a database. Current algorithms for mining association rules work in two steps. First, the most frequently occurring sets of items are discovered, then the sets are used to generate the association rules. The first step usually requires repeated passes over the analyzed database and determines the overall performance. In this paper, we present a new method that addresses the issue of discovering the most frequently occurring sets of items. Our method consists in materializing precomputed sets of items discovered in logical database partitions. We show that the materialized sets can be repeatedly used to efficiently generate the most frequently occurring sets of items. Using this approach, required association rules can be mined with only one scan of the database. Our experiments show that the proposed method significantly outperforms the well-known algorithms.
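The abstract does not spell out the method, so the following Python sketch only illustrates one plausible reading, based on the well-known partition property: an itemset that is frequent over a union of partitions must be locally frequent in at least one of them. Locally frequent itemsets are assumed to be precomputed per partition with a relative support threshold, and a later query with the same or a higher threshold counts the union of those sets in a single scan. All function names are hypothetical, and the brute-force local miner stands in for a real algorithm.

```python
from itertools import combinations


def local_frequent(partition, min_support):
    """Brute-force itemsets with relative support >= min_support in one partition."""
    counts = {}
    for t in partition:
        items = sorted(t)
        for k in range(1, len(items) + 1):
            for c in combinations(items, k):
                key = frozenset(c)
                counts[key] = counts.get(key, 0) + 1
    threshold = min_support * len(partition)
    return {s for s, n in counts.items() if n >= threshold}


def materialize(partitions, min_support):
    """Precompute and store the locally frequent itemsets of every partition."""
    return {name: local_frequent(p, min_support) for name, p in partitions.items()}


def mine_from_materialized(partitions, materialized, selected, min_support):
    """Answer a query over the `selected` partitions with one scan of the data.

    Assumes `min_support` is not lower than the threshold used in materialize().
    """
    candidates = set().union(*(materialized[n] for n in selected))
    counts = dict.fromkeys(candidates, 0)
    total = 0
    for n in selected:  # the single scan over the selected partitions
        for t in partitions[n]:
            total += 1
            for c in candidates:
                if c <= t:
                    counts[c] += 1
    return {c: n for c, n in counts.items() if n >= min_support * total}


partitions = {
    "p1": [{"a", "b"}, {"a", "b"}, {"c"}],
    "p2": [{"a"}, {"b", "c"}, {"b", "c"}],
}
materialized = materialize(partitions, 0.5)
frequent = mine_from_materialized(partitions, materialized, ["p1", "p2"], 0.5)
```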
knowledge discovery and data mining | 2005
Marek Wojciechowski; Maciej Zakrzewicz
Traditional multiple query optimization methods focus on identifying common subexpressions in sets of relational queries and on constructing their global execution plans. In this paper we consider the problem of optimizing sets of data mining queries submitted to a Knowledge Discovery Management System. We describe the problem of data mining query scheduling and we introduce a new algorithm called CCAgglomerative to schedule data mining queries for frequent itemset discovery.
Information & Software Technology | 2003
Alexandros Nanopoulos; Maciej Zakrzewicz; Tadeusz Morzy; Yannis Manolopoulos
The number of patterns discovered by data mining can become tremendous, in some cases exceeding the size of the original database. Therefore, there is a requirement for querying previously generated mining results or for querying the database against discovered patterns. In this paper, we focus on developing methods for the storage and querying of large collections of sequential patterns. We describe a family of algorithms that address the problem of considering the ordering among elements, which is crucial when dealing with sequential patterns. Moreover, we take into account the fact that the distribution of elements within sequential patterns is highly skewed, to propose a novel approach for the effective encoding of patterns. Experimental results, which examine a variety of factors, illustrate the efficiency of the proposed method.
database and expert systems applications | 2002
Bogdan D. Czejdo; Mikolaj Morzy; Marek Wojciechowski; Maciej Zakrzewicz
Data mining is an interactive and iterative process. It is highly probable that a user will issue a series of similar queries until he or she receives satisfying results. Currently available mining algorithms suffer from long processing times depending mainly on the size of the dataset. As the pattern discovery takes place mainly in the data warehouse environment, such long processing times are unacceptable from the point of view of interactive data mining. On the other hand, the results of consecutive data mining queries are usually very similar. This observation leads to the idea of reusing materialized results of previous data mining queries in order to improve performance of the system. In this paper we present the concept of materialized data mining views and we show how the results stored in these views can be used to accelerate processing of data mining queries. We demonstrate the use of materialized views in the domains of association rules discovery and sequential pattern search.
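As a hedged illustration of reusing materialized results, the Python sketch below handles only the simplest case one might assume: a view stores frequent patterns mined from some source with a minimum support, and a new query over the same source with an equal or higher minimum support is answered by filtering the view instead of mining again. The classes `MiningQuery` and `MaterializedView` and the function `answer` are hypothetical names, not part of any system described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MiningQuery:
    source: str          # identifier of the mined dataset
    min_support: float   # relative minimum support


@dataclass
class MaterializedView:
    definition: MiningQuery
    patterns: dict       # pattern (frozenset) -> support


def answer(query, views, mine):
    """Return (patterns, used_view).

    A view is usable when it was mined from the same source with a
    minimum support not higher than the one the query asks for; the
    answer is then a filter over the stored patterns. Otherwise the
    query falls back to `mine`, a caller-supplied mining function.
    """
    for view in views:
        d = view.definition
        if d.source == query.source and d.min_support <= query.min_support:
            kept = {p: s for p, s in view.patterns.items()
                    if s >= query.min_support}
            return kept, True
    return mine(query), False


view = MaterializedView(
    MiningQuery("sales", 0.2),
    {frozenset("a"): 0.6, frozenset("ab"): 0.5, frozenset("c"): 0.25},
)
patterns, used_view = answer(MiningQuery("sales", 0.4), [view], mine=lambda q: {})
```

A query with a lower threshold than the view's cannot be answered this way, because the view does not contain the patterns that fall between the two thresholds.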