Katsiaryna Mirylenka
University of Trento
Publications
Featured research published by Katsiaryna Mirylenka.
Very Large Data Bases | 2012
Michele Dallachiesa; Besmira Nushi; Katsiaryna Mirylenka; Themis Palpanas
In recent years there has been a considerable increase in the availability of continuous sensor measurements in a wide range of application domains, such as Location-Based Services (LBS), medical monitoring systems, manufacturing plants and engineering facilities (to ensure efficiency, product quality and safety), hydrologic and geologic observing systems, pollution management, and others. Due to the inherent imprecision of sensor observations, many investigations have recently turned to querying, mining and storing uncertain data. Uncertainty can also be due to data aggregation, privacy-preserving transforms, and error-prone mining algorithms. In this study, we survey the techniques that have been proposed specifically for modeling and processing uncertain time series, an important model for temporal data. We provide an analytical evaluation of the alternatives that have been proposed in the literature, highlighting the advantages and disadvantages of each approach, and further compare these alternatives with two additional techniques that have been carefully studied before. We conduct an extensive experimental evaluation with 17 real datasets, and discuss some surprising results, which suggest that a fruitful research direction is to take into account the temporal correlations in the time series. Based on our evaluations, we also provide guidelines useful for practitioners in the field.
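Several of the surveyed techniques model each uncertain point as a probability distribution. As a hedged illustration (not one of the surveyed algorithms), if each point of a series is an independent Gaussian with known mean and variance, the expected squared Euclidean distance between two uncertain series has a simple closed form, since E[(X - Y)^2] = (mu_x - mu_y)^2 + var_x + var_y for independent X and Y:

```python
def expected_sq_euclidean(mu_x, var_x, mu_y, var_y):
    """Expected squared Euclidean distance between two uncertain series.

    Each point i is modeled as an independent Gaussian N(mu_i, var_i), so
    E[(X_i - Y_i)^2] = (mu_xi - mu_yi)^2 + var_xi + var_yi.
    Function name and interface are illustrative.
    """
    return sum((mx - my) ** 2 + vx + vy
               for mx, vx, my, vy in zip(mu_x, var_x, mu_y, var_y))

# Two 2-point uncertain series: means differ by 1 at each point,
# all variance is on the first series.
d = expected_sq_euclidean([0, 0], [1, 1], [1, 1], [0, 0])  # 4.0
```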
Frontiers in Computational Neuroscience | 2011
Aliaksandr Birukou; Joseph Rushton Wakeling; Claudio Bartolini; Fabio Casati; Maurizio Marchese; Katsiaryna Mirylenka; Nardine Osman; Azzurra Ragone; Carles Sierra; Aalam Wassef
In this paper we review several novel approaches to research evaluation. We start with a brief overview of peer review, its controversies, and metrics for assessing the efficiency and overall quality of peer review. We then discuss five approaches, including reputation-based ones, that come out of the research carried out by the LiquidPub project and by research groups that collaborated with LiquidPub. These approaches are alternative or complementary to traditional peer review. We discuss the pros and cons of the proposed approaches and conclude with a vision for the future of research evaluation, arguing that no single system can suit all stakeholders in various communities.
Scientometrics | 2013
Azzurra Ragone; Katsiaryna Mirylenka; Fabio Casati; Maurizio Marchese
In this paper we focus on the analysis of peer reviews and reviewers' behaviour in a number of different review processes. More specifically, we report on the development, definition and rationale of a theoretical model for peer review processes that supports the identification of appropriate metrics to assess the main characteristics of these processes, in order to render peer review more transparent and understandable. Together with known metrics and techniques, we introduce new ones to assess the overall quality (i.e., reliability, fairness, validity) and efficiency of peer review processes, e.g. the robustness of the process, the degree of agreement/disagreement among reviewers, or positive/negative bias in the reviewers' decision making. We also check the ability of peer review to assess the impact of papers in subsequent years. We apply the proposed model and analysis framework to a large review dataset from ten different conferences in computer science, for a total of ca. 9,000 reviews on ca. 2,800 submitted contributions. We discuss the implications of the results and their potential use toward improving the analysed peer review processes. A number of interesting results were found, in particular: (1) a low correlation between the peer review outcome and the impact over time of the accepted contributions; (2) the influence of the assessment scale on the way reviewers gave marks; (3) the effect and impact of rating bias, i.e. reviewers who consistently give lower/higher marks than all other reviewers; (4) the effectiveness of statistical approaches to optimize some process parameters (e.g., the number of papers per reviewer) to improve overall process quality while keeping the overall effort under control. Based on the lessons learned, we suggest ways to improve the overall quality of peer review through procedures that can be easily implemented in current editorial management systems.
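One of the process metrics mentioned in the abstract, the degree of agreement/disagreement among reviewers, can be sketched as a mean pairwise difference of marks for a single submission. This formulation and the function name are illustrative assumptions, not the paper's exact metric:

```python
from itertools import combinations

def mean_pairwise_disagreement(marks):
    """Mean absolute difference over all pairs of reviewer marks for one
    submission; 0.0 means the reviewers agree perfectly.

    Illustrative sketch only; the paper defines its own agreement metrics.
    """
    pairs = list(combinations(marks, 2))
    if not pairs:
        return 0.0
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

# Three reviewers gave marks 3, 5, 4 on the same scale:
# pairs differ by 2, 1, 1, so the mean disagreement is 4/3.
d = mean_pairwise_disagreement([3, 5, 4])
```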
Quest | 2011
Michele Dallachiesa; Besmira Nushi; Katsiaryna Mirylenka; Themis Palpanas
In recent years there has been a considerable increase in the availability of continuous sensor measurements in a wide range of application domains, such as Location-Based Services (LBS), medical monitoring systems, manufacturing plants and engineering facilities (to ensure efficiency, product quality and safety), hydrologic and geologic observing systems, pollution management, and others. Due to the inherent imprecision of sensor observations, many investigations have recently turned to querying, mining and storing uncertain data. Uncertainty can also be due to data aggregation, privacy-preserving transforms, and error-prone mining algorithms. In this study, we survey the techniques that have been proposed specifically for modeling and processing uncertain time series, an important model for temporal data. We provide an analytical evaluation of the alternatives that have been proposed in the literature, highlighting the advantages and disadvantages of each approach. We additionally conduct an extensive experimental evaluation with 17 real datasets, and discuss some surprising results. Based on our evaluations, we also provide guidelines useful for practitioners in the field.
Very Large Data Bases | 2015
Katsiaryna Mirylenka; Graham Cormode; Themis Palpanas; Divesh Srivastava
The notion of heavy hitters—items that make up a large fraction of the population—has been successfully used in a variety of applications across sensor and RFID monitoring, network data analysis, event mining, and more. Yet this notion often fails to capture the semantics we desire when we observe data in the form of correlated pairs. Here, we are interested in items that are conditionally frequent: when a particular item is frequent within the context of its parent item. In this work, we introduce and formalize the notion of conditional heavy hitters to identify such items, with applications in network monitoring and Markov chain modeling. We explore the relationship between conditional heavy hitters and other related notions in the literature, and show analytically and experimentally the usefulness of our approach. We introduce several algorithm variations that allow us to efficiently find conditional heavy hitters for input data with very different characteristics, and provide analytical results for their performance. Finally, we perform experimental evaluations with several synthetic and real datasets to demonstrate the efficacy of our methods and to study the behavior of the proposed algorithms for different types of data.
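The definition of a conditional heavy hitter can be sketched with a toy exact counter over (parent, child) pairs. This is illustrative only: the paper's contribution is approximate, memory-bounded streaming algorithms, which this sketch does not implement, and the threshold name `phi` is an assumption:

```python
from collections import defaultdict

class ConditionalHeavyHitters:
    """Toy exact counter illustrating the conditional heavy hitter notion.

    A pair (parent, child) is a conditional heavy hitter when the child's
    frequency within the context of its parent,
    count(parent, child) / count(parent), reaches the threshold phi.
    """
    def __init__(self, phi):
        self.phi = phi
        self.pair_counts = defaultdict(int)
        self.parent_counts = defaultdict(int)

    def update(self, parent, child):
        self.pair_counts[(parent, child)] += 1
        self.parent_counts[parent] += 1

    def query(self):
        return {
            (p, c): n / self.parent_counts[p]
            for (p, c), n in self.pair_counts.items()
            if n / self.parent_counts[p] >= self.phi
        }

# Stream of (previous item, current item) pairs, as in Markov chain modeling.
chh = ConditionalHeavyHitters(phi=0.5)
for p, c in [("a", "b"), ("a", "b"), ("a", "c"), ("d", "e"), ("a", "b")]:
    chh.update(p, c)
result = chh.query()  # {('a', 'b'): 0.75, ('d', 'e'): 1.0}
```

Note that ("a", "b") qualifies through its conditional frequency (3 of 4 observations of parent "a") even though it would not necessarily be a heavy hitter of the whole stream.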
International Conference on Data Engineering | 2013
Katsiaryna Mirylenka; Themis Palpanas; Graham Cormode; Divesh Srivastava
The notion of heavy hitters (items that make up a large fraction of the population) has been successfully used in a variety of applications across sensor and RFID monitoring, network data analysis, event mining, and more. Yet this notion often fails to capture the semantics we desire when we observe data in the form of correlated pairs. Here, we are interested in items that are conditionally frequent: when a particular item is frequent within the context of its parent item. In this work, we introduce and formalize the notion of Conditional Heavy Hitters to identify such items, with applications in network monitoring and Markov chain modeling. We introduce several streaming algorithms that allow us to find conditional heavy hitters efficiently, and provide analytical results. Different algorithms are successful for different input characteristics. We perform experimental evaluations to demonstrate the efficacy of our methods, and to study which algorithms are most suited to different types of data.
Data Mining and Knowledge Discovery | 2017
Katsiaryna Mirylenka; George Giannakopoulos; Le Minh Do; Themis Palpanas
Machine learning algorithms perform differently in settings with varying levels of training set mislabeling noise. Therefore, the choice of the right algorithm for a particular learning problem is crucial. This paper contributes to two dual problems: first, comparing algorithm behavior; and second, choosing learning algorithms for noisy settings. We present the "sigmoid rule" framework, which can be used to choose the most appropriate learning algorithm depending on the properties of noise in a classification problem. The framework uses an existing model of the expected performance of learning algorithms as a sigmoid function of the signal-to-noise ratio in the training instances. We study the characteristics of the sigmoid function using five representative non-sequential classifiers, namely, Naïve Bayes, kNN, SVM, a decision tree classifier, and a rule-based classifier, and three widely used sequential classifiers based on hidden Markov models, conditional random fields and recursive neural networks. Based on the sigmoid parameters we define a set of intuitive criteria that are useful for comparing the behavior of learning algorithms in the presence of noise. Furthermore, we show that there is a connection between these parameters and the characteristics of the underlying dataset, showing that we can estimate an expected performance over a dataset regardless of the underlying algorithm. The framework is applicable to concept drift scenarios, including modeling user behavior over time, and mining of noisy, evolving time series.
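The sigmoid model underlying such a framework can be sketched as follows. The parameterization and the parameter names (c1, c2, a, b) are illustrative assumptions rather than the paper's exact formulation:

```python
import math

def sigmoid_performance(snr, c1, c2, a, b):
    """Expected classifier accuracy as a sigmoid of the signal-to-noise
    ratio (SNR) of the training set.

    c1: minimum performance (pure noise); c1 + c2: noise-free performance;
    a: slope, i.e. how sharply the classifier degrades around b;
    b: SNR at the inflection point.
    Illustrative parameterization, not the paper's exact one.
    """
    return c1 + c2 / (1.0 + math.exp(-a * (snr - b)))

# Two hypothetical classifiers: A has a higher ceiling, B degrades more
# gracefully (smaller slope a), which such parameters make comparable.
clf_A = dict(c1=0.50, c2=0.45, a=3.0, b=1.0)
clf_B = dict(c1=0.55, c2=0.35, a=1.0, b=1.0)
curve_A = [sigmoid_performance(s, **clf_A) for s in (0.0, 1.0, 3.0)]
curve_B = [sigmoid_performance(s, **clf_B) for s in (0.0, 1.0, 3.0)]
```

At the inflection point snr = b the model yields exactly c1 + c2 / 2, which makes the parameters easy to read off a fitted curve.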
Statistical and Scientific Database Management | 2017
Katsiaryna Mirylenka; Michele Dallachiesa; Themis Palpanas
The increased availability of unprecedented amounts of sequential data (generated by the Internet of Things, as well as scientific applications) has led in the past few years to renewed interest in the field of data series processing and analysis. Data series collections are processed and analyzed using a large variety of techniques, most of which are based on the computation of some distance function. In this study, we revisit this basic operation of data series distance calculation. We observe that the popular distance measures are oblivious to the correlations inherent in neighboring values in a data series. Therefore, we evaluate the plausibility and benefit of incorporating into the distance function measures of correlation, which enable us to capture the associations among neighboring values in the sequence. We propose four such measures, inspired by statistical and probabilistic approaches, which can effectively model these correlations. We analytically and experimentally demonstrate the benefits of the new measures using the 1NN classification task, and discuss the lessons learned. Finally, we propose future research directions for enabling the proposed measures to be used in practice.
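The idea of making a distance sensitive to associations among neighboring values can be sketched with a toy blend of a value distance and a first-difference (trend) distance. This blend is an illustrative assumption, not one of the four measures proposed in the paper:

```python
import math

def euclidean(x, y):
    """Plain Euclidean distance; oblivious to neighboring-value order."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def diff(x):
    """First differences, a crude proxy for local trend/correlation."""
    return [b - a for a, b in zip(x, x[1:])]

def corr_aware_distance(x, y, alpha=0.5):
    """Blend of value distance and trend distance (illustrative sketch).

    alpha weights the trend term: alpha=0 recovers plain Euclidean,
    alpha=1 compares only the local trends of the two series.
    """
    return (1 - alpha) * euclidean(x, y) + alpha * euclidean(diff(x), diff(y))

# A series and its vertical shift have identical trends, so the trend
# term vanishes and only the value term contributes.
x, y = [0, 1, 2, 3], [1, 2, 3, 4]
d = corr_aware_distance(x, y)  # 0.5 * 2.0 + 0.5 * 0.0 = 1.0
```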
Archive | 2010
Fabio Casati; Maurizio Marchese; Katsiaryna Mirylenka; Azzurra Ragone
Extending Database Technology | 2016
Katsiaryna Mirylenka; Vassilis Christophides; Themis Palpanas; Ioannis Pefkianakis; Martin May