Mikolaj Morzy
Poznań University of Technology
Publications
Featured research published by Mikolaj Morzy.
Machine Learning and Data Mining in Pattern Recognition | 2007
Mikolaj Morzy
Advances in wireless and mobile technology flood us with amounts of moving object data that preclude all means of manual data processing. The volume of data gathered from position sensors of mobile phones, PDAs, or vehicles, defies human ability to analyze the stream of input data. On the other hand, vast amounts of gathered data hide interesting and valuable knowledge patterns describing the behavior of moving objects. Thus, new algorithms for mining moving object data are required to unearth this knowledge. An important function of the mobile objects management system is the prediction of the unknown location of an object. In this paper we introduce a data mining approach to the problem of predicting the location of a moving object. We mine the database of moving object locations to discover frequent trajectories and movement rules. Then, we match the trajectory of a moving object with the database of movement rules to build a probabilistic model of object location. Experimental evaluation of the proposal reveals prediction accuracy close to 80%. Our original contribution includes the elaboration on the location prediction model, the design of an efficient mining algorithm, introduction of movement rule matching strategies, and a thorough experimental evaluation of the proposed model.
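The abstract describes mining movement rules from stored trajectories and matching a partial trajectory against them. The following is only a minimal sketch of that general idea, not the paper's algorithm: it assumes trajectories are sequences of grid-cell labels, uses first-order rules (one cell predicts the next), and the names `mine_movement_rules` and `predict_next` are hypothetical.

```python
from collections import Counter, defaultdict

def mine_movement_rules(trajectories, min_support=2):
    """Count cell-to-cell transitions and keep those seen at least min_support times."""
    pair_counts = Counter()   # how often "here -> nxt" was observed
    cell_counts = Counter()   # how often "here" was left for any next cell
    for traj in trajectories:
        for here, nxt in zip(traj, traj[1:]):
            pair_counts[(here, nxt)] += 1
            cell_counts[here] += 1
    rules = defaultdict(dict)
    for (here, nxt), n in pair_counts.items():
        if n >= min_support:
            rules[here][nxt] = n / cell_counts[here]  # rule confidence
    return dict(rules)

def predict_next(rules, trajectory):
    """Return candidate next cells for the last known cell, ranked by confidence."""
    candidates = rules.get(trajectory[-1], {})
    return sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)

# Toy data (illustrative only): four trajectories over cells A..E.
trajectories = [
    ["A", "B", "C"],
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["E", "B", "C"],
]
rules = mine_movement_rules(trajectories, min_support=2)
prediction = predict_next(rules, ["E", "B"])
print(prediction)
```

Here the transition B -> D is dropped because it occurs only once, so an object last seen in cell B is predicted to move to C with the confidence of the B -> C rule.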
Workshop on Internet and Network Economics | 2006
Mikolaj Morzy; Adam Wierzbicki
A reliable mechanism for scoring the reputation of sellers is crucial for the development of a successful environment for customer-to-customer e-commerce. Unfortunately, most C2C environments utilize simple feedback-based reputation systems that not only fail to offer sufficient protection from fraud, but also tend to overestimate the reputation of sellers by introducing a strong bias toward maximizing the volume of sales at the expense of the quality of service. In this paper we present a method that avoids the unfavorable phenomenon of overestimating the reputation of sellers by using implicit feedbacks. We introduce the notion of an implicit feedback and we propose two strategies for discovering implicit feedbacks. We perform a twofold evaluation of our proposal. First, to demonstrate the existence of the implicit feedback and to propose an advanced method of implicit feedback discovery, we conduct experiments on a large volume of real-world data acquired from an online auction site. Next, a game-theoretic approach is presented that uses simulation to show that the use of the implicit feedback can improve a simple reputation system such as the one used by eBay. Both the results of the simulation and the results of experiments prove the validity and importance of using implicit feedbacks in reputation scoring.
International Conference on Conceptual Modeling | 2012
Bartosz Bębel; Mikolaj Morzy; Tadeusz Morzy; Zbyszko Królikowski; Robert Wrembel
Current business intelligence technologies mainly support the analysis of set-oriented data, without considering order dependencies between data. Few approaches to analyzing data of sequential order have been proposed so far. Moreover, for storing and manipulating sequential data these approaches use either the relational data model or its extensions. We argue that in order to fully support the analysis of sequential data, a dedicated new data model is needed. In this paper, we propose a formal model for time point-based sequential data with operations for constructing sequences of events, organizing them in an OLAP-like manner, and analyzing them. To the best of our knowledge, this is the first formal model and query language for this class of data.
Advances in Social Networks Analysis and Mining | 2011
Krzysztof Jędrzejewski; Mikolaj Morzy
In this paper we discuss the role and importance of social networks as preferred environments for opinion mining, and for sentiment analysis in particular. We begin by briefly describing selected properties of social networks that are relevant to opinion mining and we outline the general relationships between the two disciplines. We present the related work and provide basic definitions used in opinion mining. Then, we introduce our original method of opinion classification and we test the presented algorithm on real-world datasets acquired from popular Polish social networks, reporting on the results. The results are promising and soundly support the main thesis of the paper, namely, that social networks exhibit properties that make them very suitable for opinion mining activities.
Database and Expert Systems Applications | 2002
Bogdan D. Czejdo; Mikolaj Morzy; Marek Wojciechowski; Maciej Zakrzewicz
Data mining is an interactive and iterative process. It is highly probable that a user will issue a series of similar queries until he or she receives satisfying results. Currently available mining algorithms suffer from long processing times depending mainly on the size of the dataset. As the pattern discovery takes place mainly in the data warehouse environment, such long processing times are unacceptable from the point of view of interactive data mining. On the other hand, the results of consecutive data mining queries are usually very similar. This observation leads to the idea of reusing materialized results of previous data mining queries in order to improve performance of the system. In this paper we present the concept of materialized data mining views and we show how the results stored in these views can be used to accelerate processing of data mining queries. We demonstrate the use of materialized views in the domains of association rules discovery and sequential pattern search.
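One simple case of reusing a materialized result can be sketched as follows: a stored set of frequent itemsets mined at a low support threshold can answer a later query with a higher threshold by filtering, without rescanning the data. This is a minimal illustration under that assumption only; the class `MaterializedMiningView` and the brute-force miner are hypothetical and are not the paper's implementation.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Brute-force mining: support of every itemset that occurs in some transaction."""
    counts = {}
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    n = len(transactions)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}

class MaterializedMiningView:
    """Stores the result of one mining query together with its threshold."""

    def __init__(self, transactions, min_support):
        self.min_support = min_support
        self.itemsets = frequent_itemsets(transactions, min_support)

    def can_answer(self, min_support):
        # A stricter (higher) threshold selects a subset of the stored result.
        return min_support >= self.min_support

    def answer(self, min_support):
        if not self.can_answer(min_support):
            raise ValueError("view is too selective; the data must be mined again")
        return {s: sup for s, sup in self.itemsets.items() if sup >= min_support}

# Toy data (illustrative only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
view = MaterializedMiningView(transactions, min_support=0.25)
from_view = view.answer(0.5)                         # answered from the stored result
from_scratch = frequent_itemsets(transactions, 0.5)  # full mining, for comparison
print(from_view == from_scratch)
```

The brute-force miner enumerates every subset of each transaction, so it is only suitable for tiny examples; the point of the sketch is the `answer` path, which touches the stored itemsets and never the transactions.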
International Symposium on Computer and Information Sciences | 2004
Maciej Zakrzewicz; Mikolaj Morzy; Marek Wojciechowski
One of the classic data mining problems is the discovery of frequent itemsets. This problem particularly attracts the database community as it resembles traditional database querying. In this paper we consider a data mining system which supports storing previous query results in the form of materialized data mining views. While numerous works have shown that reusing results of previous frequent itemset queries can significantly improve the performance of data mining query processing, a thorough study of possible differences between the current query and a materialized view has not been presented yet. In this paper we classify possible differences into six classes, provide I/O cost analysis for all the classes, and experimentally evaluate the most promising one.
ACM Transactions on Internet Technology | 2017
Michal Ciesielczyk; Andrzej Szwabe; Mikolaj Morzy; Pawel Misiorek
The vector space model is undoubtedly among the most popular data representation models used in the processing of large networks. Unfortunately, the vector space model suffers from the so-called curse of dimensionality, a phenomenon where data become extremely sparse due to an exponential growth of the data space volume caused by a large number of dimensions. Thus, dimensionality reduction techniques are necessary to make large networks represented in the vector space model available for analysis and processing. Most dimensionality reduction techniques tend to focus on principal components present in the data, effectively disregarding local relationships that may exist between objects. This behavior is a significant drawback of current dimensionality reduction techniques, because these local relationships are crucial for maintaining high accuracy in many network analysis tasks, such as link prediction or community detection. To rectify the aforementioned drawback, we propose Progressive Random Indexing, a new dimensionality reduction technique. Built upon Reflective Random Indexing, our method significantly reduces the dimensionality of the vector space model while retaining all important local relationships between objects. The key element of the Progressive Random Indexing technique is the use of the gain value at each reflection step, which determines how much information about local relationships should be included in the space of reduced dimensionality. Our experiments indicate that when applied to large real-world networks (Facebook social network, MovieLens movie recommendations), Progressive Random Indexing outperforms state-of-the-art methods in link prediction tasks.
Data Warehousing and Knowledge Discovery | 2005
Mikolaj Morzy; Marek Wojciechowski; Maciej Zakrzewicz
Discovery of frequent patterns is a very important data mining problem with numerous applications. Frequent pattern mining is often regarded as advanced querying where a user specifies the source dataset and pattern constraints using a given constraint model. A significant amount of research on efficient processing of frequent pattern queries has been done in recent years, focusing mainly on constraint handling and reusing results of previous queries. In this paper we tackle the problem of optimizing a sequence of frequent pattern queries, submitted to the system as a batch. Our solutions are based on previously proposed techniques of reusing results of previous queries, and exploit the fact that knowing a sequence of queries a priori gives the system a chance to schedule and/or adjust the queries so that they can use results of queries executed earlier. We begin with simple query scheduling and then consider other transformations of the original batch of queries.
Entropy | 2016
Tomasz Kajdanowicz; Mikolaj Morzy
Over the years, several theoretical graph generation models have been proposed. Among the most prominent are the Erdős–Rényi random graph model, the Watts–Strogatz small world model, the Albert–Barabási preferential attachment model, and the Price citation model. Often, researchers working with real-world data are interested in understanding the generative phenomena underlying their empirical graphs. They want to know which of the theoretical graph generation models would most probably generate a particular empirical graph. In other words, they expect some similarity assessment between the empirical graph and graphs artificially created from theoretical graph generation models. Usually, in order to assess the similarity of two graphs, centrality measure distributions are compared. For a theoretical graph model this means comparing the empirical graph to a single realization of a theoretical graph model, where the realization is generated from the given model using an arbitrary set of parameters. The similarity between centrality measure distributions can be measured using standard statistical tests, e.g., the Kolmogorov–Smirnov test of distances between cumulative distributions. However, this approach is error-prone and leads to incorrect conclusions, as we show in our experiments. Therefore, we propose a new method for graph comparison and type classification by comparing the entropies of centrality measure distributions (degree centrality, betweenness centrality, closeness centrality). We demonstrate that our approach can help assign the empirical graph to the most similar theoretical model using a simple unsupervised learning method.
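To make the quantity being compared concrete, the sketch below computes the Shannon entropy of a degree centrality distribution for two toy graphs. It is a minimal illustration, not the paper's procedure: it assumes simple undirected graphs given as edge lists with no self-loops or duplicate edges, treats each distinct centrality value as one outcome, and the function names are hypothetical. How the paper bins values or performs the classification step is not specified in the abstract.

```python
import math
from collections import Counter

def degree_centralities(edges):
    """Degree centrality of every node that has an edge: degree divided by (n - 1)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)
    return {node: d / (n - 1) for node, d in degree.items()}

def distribution_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of the values."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Toy graphs (illustrative only): a star and a ring on five nodes.
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]

star_entropy = distribution_entropy(degree_centralities(star).values())
ring_entropy = distribution_entropy(degree_centralities(ring).values())
print(star_entropy, ring_entropy)
```

In the ring every node has the same degree, so the distribution has a single outcome and zero entropy, while the star mixes one hub with four leaves and has positive entropy.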
World Wide Web | 2015
Mikolaj Morzy
Ideas of open access, open data and open science are transforming the world of scientific inquiry as we speak. Every day thousands of ordinary citizens are engaging in data collection and data processing, giving rise to the new field of citizen science. Never before has technology enabled scientists to reach out to such vast numbers of collaborators and show their work to the public. From pattern recognition in Hubble space telescope images of distant galaxies to field observations of migration patterns of birds in the rural areas of the United States, the possibilities are countless. Certainly this new trend poses important problems and challenges, but it is also obvious that wide acceptance of citizen science can lead not only to great scientific results, but also to the popularization of the scientific method among the public. In this paper we examine the current state of citizen science, we outline some of the most interesting and difficult challenges in leading scientific projects on such a scale, and we present typologies of citizen science projects. We also provide a survey of ICT tools available for citizen science projects.