Indrė Žliobaitė
Helsinki Institute for Information Technology
Publication
Featured research published by Indrė Žliobaitė.
ACM Computing Surveys | 2014
João Gama; Indrė Žliobaitė; Albert Bifet; Mykola Pechenizkiy; Abdelhamid Bouchachia
Concept drift primarily refers to an online supervised learning scenario when the relation between the input data and the target variable changes over time. Assuming a general knowledge of supervised learning in this article, we characterize adaptive learning processes; categorize existing strategies for handling concept drift; overview the most representative, distinct, and popular techniques and algorithms; discuss evaluation methodology of adaptive algorithms; and present a set of illustrative applications. The survey covers the different facets of concept drift in an integrated way to reflect on the existing scattered state of the art. Thus, it aims at providing a comprehensive introduction to the concept drift adaptation for researchers, industry analysts, and practitioners.
European Conference on Machine Learning | 2011
Indrė Žliobaitė; Albert Bifet; Bernhard Pfahringer; Geoff Holmes
In learning to classify streaming data, obtaining the true labels may require major effort and may incur excessive cost. Active learning focuses on learning an accurate model with as few labels as possible. Streaming data poses additional challenges for active learning, since the data distribution may change over time (concept drift) and classifiers need to adapt. Conventional active learning strategies concentrate on querying the most uncertain instances, which are typically concentrated around the decision boundary. If changes do not occur close to the boundary, they will be missed and classifiers will fail to adapt. In this paper we develop two active learning strategies for streaming data that explicitly handle concept drift. They are based on uncertainty, dynamic allocation of labeling efforts over time and randomization of the search space. We empirically demonstrate that these strategies react well to changes that can occur anywhere in the instance space and unexpectedly.
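To make the combination of uncertainty, a labeling budget, and randomization more concrete, here is a minimal Python sketch of a querying rule in that spirit. It is an illustration under stated assumptions, not the authors' exact algorithm: the class name `RandomizedUncertaintySampler`, the parameters `step` and `noise_std`, and their default values are hypothetical. The idea is that the uncertainty threshold is multiplied by a random factor, so confident instances far from the decision boundary still get queried occasionally, while the budget caps the fraction of labels requested.

```python
import random


class RandomizedUncertaintySampler:
    """Illustrative label-query rule for a data stream (not the paper's exact method)."""

    def __init__(self, budget=0.1, threshold=1.0, step=0.01, noise_std=1.0, seed=0):
        self.budget = budget        # maximum fraction of instances to label
        self.threshold = threshold  # adaptive uncertainty threshold
        self.step = step            # assumed threshold adjustment rate
        self.noise_std = noise_std  # assumed spread of the random factor
        self.rng = random.Random(seed)
        self.seen = 0
        self.labeled = 0

    def should_query(self, max_posterior):
        """Decide whether to request the true label of the current instance.

        max_posterior is the classifier's highest class probability for it.
        """
        self.seen += 1
        # Budget check: stop querying while the spent fraction is at the limit.
        if self.labeled / self.seen >= self.budget:
            return False
        # Randomize the threshold so that confident regions are also sampled.
        randomized = self.threshold * self.rng.gauss(1.0, self.noise_std)
        if max_posterior < randomized:
            self.labeled += 1
            self.threshold *= 1.0 - self.step  # queried: become more selective
            return True
        self.threshold *= 1.0 + self.step      # not queried: become less selective
        return False
```

A stream learner would call `should_query` once per incoming instance and train on the instance only when it returns `True`.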
Archive | 2016
Indrė Žliobaitė; Mykola Pechenizkiy; João Gama
In most challenging data analysis applications, data evolve over time and must be analyzed in near real time. Patterns and relations in such data often evolve as well, so models built for analyzing such data quickly become obsolete. In machine learning and data mining this phenomenon is referred to as concept drift. The objective is to deploy models that diagnose themselves and adapt to changing data over time. This chapter provides an application-oriented view of concept drift research, with a focus on supervised learning tasks. First we overview and categorize application tasks for which the problem of concept drift is particularly relevant. Then we construct a reference framework for positioning application tasks within a spectrum of problems related to concept drift. Finally, we discuss some promising research directions from the application perspective, and present recommendations for application-driven concept drift research and development.
European Conference on Machine Learning | 2013
Albert Bifet; Jesse Read; Indrė Žliobaitė; Bernhard Pfahringer; Geoffrey Holmes
Data stream classification plays an important role in modern data analysis, where data arrives in a stream and needs to be mined in real time. In the data stream setting the underlying distribution from which this data comes may be changing and evolving, and so classifiers that can update themselves during operation are becoming the state-of-the-art. In this paper we show that data streams may have an important temporal component, which currently is not considered in the evaluation and benchmarking of data stream classifiers. We demonstrate how a naive classifier considering the temporal component only outperforms a lot of current state-of-the-art classifiers on real data streams that have temporal dependence, i.e. data is autocorrelated. We propose to evaluate data stream classifiers taking into account temporal dependence, and introduce a new evaluation measure, which provides a more accurate gauge of data stream classifier performance. In response to the temporal dependence issue we propose a generic wrapper for data stream classifiers, which incorporates the temporal component into the attribute space.
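The point about temporal dependence can be illustrated with a small Python sketch. It assumes that the naive classifier simply repeats the previous true label, and that a classifier is scored with a kappa-style normalization against that baseline. The function names are hypothetical, and the measure shown is one plausible form of such an evaluation, not necessarily the exact definition used in the paper.

```python
def no_change_predictions(labels, first_guess=None):
    """Predict each label as the previous true label (naive temporal baseline)."""
    preds = []
    prev = first_guess
    for y in labels:
        preds.append(prev)
        prev = y
    return preds


def accuracy(preds, labels):
    """Fraction of positions where the prediction equals the true label."""
    if not labels:
        raise ValueError("labels must not be empty")
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def temporal_kappa(clf_preds, labels):
    """Score a classifier relative to the no-change baseline.

    1.0 means perfect, 0.0 means no better than repeating the last label,
    and negative values mean worse than that baseline.
    """
    p_clf = accuracy(clf_preds, labels)
    p_base = accuracy(no_change_predictions(labels), labels)
    if p_base == 1.0:
        raise ValueError("score is undefined when the baseline is perfect")
    return (p_clf - p_base) / (1.0 - p_base)
```

On an autocorrelated stream such as `[0, 0, 0, 1, 1, 1, 0, 0]`, a classifier that always predicts the majority class reaches the same accuracy as the no-change baseline, so this score assigns it 0 even though its plain accuracy looks respectable.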
Intelligent Data Analysis | 2011
Indrė Žliobaitė
Concept drift is a challenge in supervised learning for sequential data. It describes a phenomenon in which the data distributions change over time. In such a case, the accuracy of a classifier benefits from selective sampling for training. We develop a method for training set selection that is particularly relevant when the expected drift is gradual. Training set selection at each time step is based on the distance to the target instance. The distance function combines similarity in space and in time. The method determines an optimal training set size online at every time step using cross-validation. It is a wrapper approach, so different base classifiers can be plugged in. The proposed method shows the best accuracy in its peer group on real and artificial drifting data. The complexity of the method is reasonable for field applications.
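The selection step can be sketched in Python as follows. This is a minimal illustration under assumptions: the distance is taken to be Euclidean distance in feature space plus a weighted time gap, and the names `space_time_distance`, `select_training_set`, and `time_weight` are hypothetical. The online choice of the training set size by cross-validation is omitted, so `size` is passed in directly.

```python
import math


def space_time_distance(x, t, x_target, t_target, time_weight=1.0):
    """Combine distance in feature space with distance in time.

    The additive form and the time_weight parameter are assumptions.
    """
    spatial = math.dist(x, x_target)  # Euclidean distance between feature vectors
    temporal = abs(t_target - t)      # how long ago the instance was observed
    return spatial + time_weight * temporal


def select_training_set(history, x_target, t_target, size, time_weight=1.0):
    """Return the `size` past instances closest to the target in space and time.

    history is a list of (features, timestamp, label) tuples.
    """
    ranked = sorted(
        history,
        key=lambda rec: space_time_distance(
            rec[0], rec[1], x_target, t_target, time_weight
        ),
    )
    return ranked[:size]
```

With `time_weight=0` the selection reduces to a nearest-neighbor choice in feature space, while a large `time_weight` approaches a sliding window over the most recent instances.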
SIGKDD Explorations | 2010
Mykola Pechenizkiy; Jorn Bakker; Indrė Žliobaitė; Andriy Ivannikov; Tommi Kärkkäinen
Fuel feeding and inhomogeneity of fuel typically cause fluctuations in the circulating fluidized bed (CFB) process. If control systems fail to compensate for the fluctuations, the whole plant will suffer from dynamics that are reinforced by the closed-loop controls. This phenomenon reduces efficiency and the lifetime of process components. In this paper we address the problem of online mass flow prediction, which is a part of control. In particular, we consider the problem of learning an accurate predictor with explicit detection of abrupt concept drift and noise handling mechanisms. We emphasize the importance of having domain knowledge concerning the considered case and constructing the ground truth for facilitating the quantitative evaluation of different approaches. We demonstrate the performance of change detection methods and show their effect on the accuracy of the online mass flow prediction with real datasets collected from the experimental laboratory-scale CFB boiler.
Philosophical Transactions of the Royal Society B | 2016
Mikael Fortelius; Indrė Žliobaitė; Ferhat Kaya; Faysal Bibi; René Bobe; Louise N. Leakey; Meave G. Leakey; David Patterson; Janina Rannikko; Lars Werdelin
Although ecometric methods have been used to analyse fossil mammal faunas and environments of Eurasia and North America, such methods have not yet been applied to the rich fossil mammal record of eastern Africa. Here we report results from analysis of a combined dataset spanning east and west Turkana from Kenya between 7 and 1 million years ago (Ma). We provide temporally and spatially resolved estimates of temperature and precipitation and discuss their relationship to patterns of faunal change, and propose a new hypothesis to explain the lack of a temperature trend. We suggest that the regionally arid Turkana Basin may between 4 and 2 Ma have acted as a ‘species factory’, generating ecological adaptations in advance of the global trend. We show a persistent difference between the eastern and western sides of the Turkana Basin and suggest that the wetlands of the shallow eastern side could have provided additional humidity to the terrestrial ecosystems. Pending further research, a transient episode of faunal change centred at the time of the KBS Member (1.87–1.53 Ma), may be equally plausibly attributed to climate change or to a top-down ecological cascade initiated by the entry of technologically sophisticated humans. This article is part of the themed issue ‘Major transitions in human evolution’.
Discovery Science | 2013
Dino Ienco; Albert Bifet; Indrė Žliobaitė; Bernhard Pfahringer
Data labeling is an expensive and time-consuming task. Choosing which labels to use is becoming increasingly important. In the active learning setting, a classifier is trained by asking for labels for only a small fraction of all instances. While many works deal with this issue in non-streaming scenarios, few address the data stream setting. In this paper we propose a new active learning approach for evolving data streams based on a pre-clustering step for selecting the most informative instances for labeling. We consider a batch incremental setting: when a new batch arrives, we first cluster the examples and then select the best instances to train the learner. The clustering approach makes it possible to cover the whole data space and avoids oversampling examples from only a few areas. We compare our method with state-of-the-art active learning strategies on real datasets. The results highlight the improvement in performance of our proposal. Experiments on parameter sensitivity are also reported.
Knowledge Discovery and Data Mining | 2009
Jorn Bakker; Mykola Pechenizkiy; Indrė Žliobaitė; Andriy Ivannikov; Tommi Kärkkäinen
In this paper we consider an application of data mining technology to the analysis of time series data from a pilot circulating fluidized bed (CFB) reactor. We focus on the problem of online mass prediction in CFB boilers. We present a framework based on regression models that are switched in response to perceived changes in the data. We analyze three alternatives for change detection. Additionally, noise canceling, state determination, and windowing mechanisms are used to improve the robustness of online prediction. We validate our ideas on real data collected from the pilot CFB boiler.
Nature Ecology and Evolution | 2018
Ferhat Kaya; Faysal Bibi; Indrė Žliobaitė; Jussi T. Eronen; Tang Hui; Mikael Fortelius
Despite much interest in the ecology and origins of the extensive grassland ecosystems of the modern world, the biogeographic relationships of savannah palaeobiomes of Africa, India and mainland Eurasia have remained unclear. Here we assemble the most recent data from the Neogene mammal fossil record in order to map the biogeographic development of Old World mammalian faunas in relation to palaeoenvironmental conditions. Using genus-level faunal similarity and mean ordinated hypsodonty in combination with palaeoclimate modelling, we show that savannah faunas developed as a spatially and temporally connected entity that we term the Old World savannah palaeobiome. The Old World savannah palaeobiome flourished under the influence of middle and late Miocene global cooling and aridification, which resulted in the spread of open habitats across vast continental areas. This extensive biome fragmented into Eurasian and African branches due to increased aridification in North Africa and Arabia during the late Miocene. Its Eurasian branches had mostly disappeared by the end of the Miocene, but the African branch survived and eventually contributed to the development of Plio–Pleistocene African savannah faunas, including their early hominins. The modern African savannah fauna is thus a continuation of the extensive Old World savannah palaeobiome. Savannah faunas developed in a spatially and temporally connected palaeobiome that flourished in the mid Miocene, before fragmenting into Eurasian and African branches in the late Miocene.