Ammar Shaker
University of Marburg
Publications
Featured researches published by Ammar Shaker.
SIGKDD Explorations | 2014
Georg Krempl; Indre Žliobaite; Dariusz Brzezinski; Eyke Hüllermeier; Vincent Lemaire; Tino Noack; Ammar Shaker; Sonja Sievi; Myra Spiliopoulou; Jerzy Stefanowski
Every day, huge volumes of sensory, transactional, and web data are continuously generated as streams, which need to be analyzed online as they arrive. Streaming data can be considered as one of the main sources of what is called big data. While predictive modeling for data streams and big data has received a lot of attention over the last decade, many research approaches are typically designed for well-behaved controlled problem settings, overlooking important challenges imposed by real-world applications. This article presents a discussion on eight open challenges for data stream mining. Our goal is to identify gaps between current research and meaningful applications, highlight open problems, and define new application-relevant research directions for data stream mining. The identified challenges cover the full cycle of knowledge discovery and involve such problems as: protecting data privacy, dealing with legacy systems, handling incomplete and delayed information, analysis of complex data, and evaluation of stream mining algorithms. The resulting analysis is illustrated by practical applications and provides general suggestions concerning lines of future research in data stream mining.
Evolving Systems | 2012
Ammar Shaker; Eyke Hüllermeier
This paper presents an approach to learning on data streams called IBLStreams. More specifically, we introduce the main methodological concepts underlying this approach and discuss its implementation under the MOA software framework. IBLStreams is an instance-based algorithm that can be applied to classification and regression problems. In comparison to model-based methods for learning on data streams, it is conceptually simple. Moreover, as an algorithm for learning in dynamically evolving environments, it has a number of desirable properties that are not, at least not as a whole, shared by currently existing alternatives. Our experimental validation provides evidence for its flexibility and ability to adapt to changes of the environment quickly, a point of utmost importance in the data stream context. At the same time, IBLStreams turns out to be competitive with state-of-the-art methods in terms of prediction accuracy. Moreover, due to its robustness, it is applicable to streams with different characteristics.
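The instance-based idea can be illustrated with a minimal sketch: keep a buffer of recent labeled instances and classify a query by a nearest-neighbor vote. This is not IBLStreams itself, which maintains its case base adaptively rather than with a fixed window; the class and parameter names below are illustrative only.

```python
from collections import deque

class SlidingWindowKNN:
    """Minimal instance-based stream learner: store the last `window`
    labeled instances and predict by majority vote among the k nearest
    neighbors (a simplification of the instance-based approach)."""

    def __init__(self, k=3, window=100):
        self.k = k
        self.buffer = deque(maxlen=window)  # stores (x, y) pairs

    def learn_one(self, x, y):
        self.buffer.append((x, y))          # oldest instance drops out

    def predict_one(self, x):
        if not self.buffer:
            return None
        dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5
        nearest = sorted(self.buffer, key=lambda xy: dist(xy[0], x))[: self.k]
        labels = [y for _, y in nearest]
        return max(set(labels), key=labels.count)  # majority vote
```

Because old instances fall out of the buffer automatically, such a learner adapts to distribution changes without any explicit drift detection, which is one reason instance-based methods are attractive in the stream setting.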
Evolving Systems | 2014
Ammar Shaker; Edwin Lughofer
In this paper, we deal with a new concept for handling drifts in data streams during on-line, evolving modeling processes in a regression context. Drifts require specific attention in evolving modeling methods, as they usually change the underlying data distribution, making previously learnt model parameters and structure outdated. Our approach comes with three new stages for appropriate drift handling: (1) drifts are not only detected, but also quantified, with a new extended version of the Page-Hinkley test; (2) we integrate an adaptive forgetting factor that changes over time and steers the degree of forgetting depending on the current drift intensity in the data stream; (3) we introduce local forgetting factors that address different local regions of the feature space with different forgetting intensities; this is achieved by using a fuzzy model architecture within stream learning, whose structural components (fuzzy rules) provide a local partitioning of the feature space and ensure smooth transitions of the drift handling topology between neighboring regions. Additionally, our approach includes an early drift recognition variant, which relies on divergence measures indicating the degree of divergence in local parts of the feature space separately, before the global model error starts to rise significantly; it can thus be seen as an attempt at drift prevention on the global model level. The new approach is successfully evaluated and compared with fixed forgetting and no forgetting on high-dimensional real-world data streams, including different types of drifts.
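The basic (non-extended) Page-Hinkley test referred to above can be sketched as follows; the paper's extension additionally quantifies drift intensity and stabilizes the indicator, which is not shown here. Parameter names are conventional, not taken from the paper.

```python
class PageHinkley:
    """Standard Page-Hinkley change detector for a stream of values.
    Signals a drift when the stream mean increases by more than
    `delta`, using detection threshold `lam`."""

    def __init__(self, delta=0.005, lam=50.0):
        self.delta, self.lam = delta, lam
        self.n, self.mean = 0, 0.0
        self.cum, self.cum_min = 0.0, 0.0

    def update(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n        # running mean
        self.cum += x - self.mean - self.delta       # cumulative deviation
        self.cum_min = min(self.cum_min, self.cum)   # smallest value so far
        return (self.cum - self.cum_min) > self.lam  # True => drift detected
```

The test statistic is the gap between the cumulative deviation and its historical minimum, so it stays near zero under a stable mean and grows quickly once the mean shifts upward.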
Information Sciences | 2013
Ammar Shaker; Robin Senge; Eyke Hüllermeier
Fuzzy pattern trees (FPTs) have recently been introduced as a novel model class for machine learning. In this paper, we consider the problem of learning fuzzy pattern trees for binary classification from data streams. Apart from its practical relevance, this problem is also interesting from a methodological point of view. First, the aspect of efficiency plays an important role in the context of data streams, since learning has to be accomplished under hard time (and memory) constraints. Moreover, a learning algorithm should be adaptive in the sense that an up-to-date model is offered at any time, taking new data items into consideration as soon as they arrive and perhaps forgetting old ones that have become obsolete due to a change of the underlying data generating process. To meet these requirements, we develop an evolving version of fuzzy pattern tree learning, in which model adaptation is realized by anticipating possible local changes of the current model, and confirming these changes through statistical hypothesis testing. In experimental studies, we compare our method to a state-of-the-art tree-based classifier for learning from data streams, showing that evolving pattern trees are competitive in terms of performance while typically producing smaller and more compact models.
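To make the model class concrete: a fuzzy pattern tree maps fuzzified attribute values in [0, 1] to a class score in [0, 1] by aggregating them with fuzzy operators at the inner nodes. The sketch below evaluates such a tree with three common operators; the tree structure and operator set are a simplified illustration, not the paper's learning algorithm.

```python
def evaluate(node, x):
    """Recursively evaluate a fuzzy pattern tree on instance x,
    a dict mapping attribute name -> membership degree in [0, 1]."""
    op, children = node
    if op == "leaf":
        return x[children]                 # children = attribute name
    vals = [evaluate(c, x) for c in children]
    if op == "min":                        # fuzzy AND
        return min(vals)
    if op == "max":                        # fuzzy OR
        return max(vals)
    if op == "avg":                        # compensatory aggregation
        return sum(vals) / len(vals)
    raise ValueError(op)

# Example tree: max( min(a, b), c ) -- a score in [0, 1] for one class
tree = ("max", [("min", [("leaf", "a"), ("leaf", "b")]), ("leaf", "c")])
```

An evolving learner, as described above, would propose small local edits to such a tree (swapping an operator, expanding a leaf) and accept them only after a statistical test confirms the improvement.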
Neurocomputing | 2015
Ammar Shaker; Eyke Hüllermeier
The extension of machine learning methods from static to dynamic environments has received increasing attention in recent years; in particular, a large number of algorithms for learning from so-called data streams have been developed. An important property of dynamic environments is non-stationarity, i.e., the assumption of an underlying data generating process that may change over time. Correspondingly, the ability to properly react to so-called concept change is considered as an important feature of learning algorithms. In this paper, we propose a new type of experimental analysis, called recovery analysis, which is aimed at assessing the ability of a learner to discover a concept change quickly, and to take appropriate measures to maintain the quality and generalization performance of the model. We develop recovery analysis for two types of supervised learning problems, namely classification and regression. Moreover, as a practical application, we make use of recovery analysis in order to compare model-based and instance-based approaches to learning on data streams.
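The quantities a recovery analysis cares about can be sketched from a recorded loss sequence: the pre-change baseline error, the peak error after the change, and the time until performance returns near the baseline. The function below is an illustrative sketch of these measurements, not the paper's protocol (which constructs controlled recovery curves from pairs of "pure" streams).

```python
def recovery_profile(errors, change_at, window=50):
    """Compute a windowed error curve from a 0/1 loss sequence, plus the
    pre-change baseline, the post-change peak, and the number of steps
    until the curve returns to within 10% of the baseline (None if it
    never does within the observed horizon)."""
    smoothed = [
        sum(errors[max(0, t - window + 1): t + 1]) /
        len(errors[max(0, t - window + 1): t + 1])
        for t in range(len(errors))
    ]
    baseline = smoothed[change_at - 1]
    after = smoothed[change_at:]
    peak = max(after)
    recovery = next(
        (t for t, e in enumerate(after) if e <= 1.1 * baseline + 1e-9),
        None,
    )
    return baseline, peak, recovery
```

A fast-adapting learner shows a small peak and a short recovery time; comparing these numbers across learners is exactly the kind of comparison the abstract describes for model-based versus instance-based methods.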
Archive | 2012
Ammar Shaker; Eyke Hüllermeier
In order to be useful and effectively applicable in dynamically evolving environments, machine learning methods have to meet several requirements, including the ability to analyze incoming data in an online, incremental manner, to observe tight time and memory constraints, and to appropriately respond to changes of the data characteristics and underlying distributions. This paper advocates an instance-based learning algorithm for that purpose, both for classification and regression problems. This algorithm has a number of desirable properties that are not, at least not as a whole, shared by currently existing alternatives. Notably, our method is very flexible and thus able to adapt to an evolving environment quickly, a point of utmost importance in the data stream context. At the same time, the algorithm is relatively robust and thus applicable to streams with different characteristics.
2013 IEEE Conference on Evolving and Adaptive Intelligent Systems (EAIS) | 2013
Ammar Shaker; Edwin Lughofer
In this paper, we present new concepts for dealing with drifts in data streams during the run of on-line modeling processes for regression problems in the context of evolving fuzzy systems. As opposed to the nominal case based on conventional life-long learning, drifts require a specific treatment in the modeling phase, as they refer to changes in the underlying data distribution or target concepts, which make older learned concepts obsolete. Our approach comes with three new stages for appropriate drift handling: 1.) drifts are not only detected, but also quantified, with a new extended version of the Page-Hinkley test, which overcomes some instabilities during downtrends of the indicator; 2.) based on the current quantification of drift intensity, the necessary degree of forgetting (weak to strong) is extracted (adaptive forgetting); 3.) the latter is achieved by two variants: a.) a single forgetting factor value, accounting for global drifts, and b.) a forgetting factor vector with different entries for separate regions of the feature space, accounting for local drifts. Forgetting factors are integrated into the learning scheme of both the antecedent and consequent parts of the evolving fuzzy systems. The new approach is evaluated on high-dimensional data streams, where the results show that 1.) our adaptive forgetting strategy outperforms the use of fixed forgetting factors throughout the learning process, and 2.) forgetting in local regions may outperform global forgetting when drifts appear locally.
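The core mechanism of adaptive forgetting can be sketched on the simplest possible estimator, a recursive weighted mean: the forgetting factor shrinks (so the estimator forgets faster) when a drift-intensity signal is high. This is a toy illustration of the principle, not the paper's method, which applies forgetting inside the antecedent and consequent learning of an evolving fuzzy system; all parameter names are made up.

```python
def adaptive_forgetting_mean(stream, intensity, lam_min=0.9, lam_max=1.0):
    """Track a recursive weighted mean whose forgetting factor is driven
    by a per-step drift intensity in [0, 1] (e.g. derived from a
    quantified Page-Hinkley statistic): strong drift -> small factor
    -> old observations are down-weighted quickly."""
    est, weight = 0.0, 0.0
    out = []
    for x, s in zip(stream, intensity):
        lam = lam_max - s * (lam_max - lam_min)  # intensity 0 -> no forgetting
        weight = lam * weight + 1.0              # effective sample size
        est = est + (x - est) / weight           # exponentially weighted mean
        out.append(est)
    return out
```

With intensity identically zero the recursion reduces to the plain running mean; a burst of intensity around a drift lets the estimate jump to the new level much faster.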
IEEE Transactions on Software Engineering | 2017
Marie Christin Platenius; Ammar Shaker; Matthias Becker; Eyke Hüllermeier; Wilhelm Schäfer
Today, software components are provided by global markets in the form of services. In order to optimally satisfy service requesters and service providers, adequate techniques for automatic service matching are needed. However, a requester's requirements may be vague and the information available about a provided service may be incomplete. As a consequence, fuzziness is induced into the matching procedure. The contribution of this paper is the development of a systematic matching procedure that leverages concepts and techniques from fuzzy logic and possibility theory, based on our formal distinction between different sources and types of fuzziness in the context of service matching. In contrast to existing methods, our approach is able to deal with imprecision and incompleteness in service specifications and to inform users about the extent of induced fuzziness in order to improve users' decision-making. We demonstrate our approach on the example of specifications for service reputation based on ratings given by previous users. Our evaluation based on real service ratings shows the utility and applicability of our approach.
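The flavor of fuzzy matching can be conveyed by a toy example: each required property is compared by a membership function yielding a degree in [0, 1], and per-property degrees are combined by a t-norm. The property names, the triangular membership, and the handling of missing information below are all illustrative assumptions; the paper's procedure is considerably richer (possibility theory, fuzziness propagation, reputation from ratings).

```python
def match_degree(required, provided, tnorm=min):
    """Toy fuzzy service matching: `required` maps property -> (target,
    tolerance); `provided` maps property -> offered value. Returns a
    matching degree in [0, 1], combining properties with a t-norm."""
    degrees = []
    for prop, (want, tol) in required.items():
        have = provided.get(prop)
        if have is None:
            degrees.append(0.0)  # incomplete specification: no support
            continue
        # triangular membership: 1 at `want`, 0 beyond `tol` away
        degrees.append(max(0.0, 1.0 - abs(have - want) / tol))
    d = degrees[0]
    for g in degrees[1:]:
        d = tnorm(d, g)          # fuzzy AND over all required properties
    return d
```

A graded result like 0.5, instead of a hard yes/no, is what lets the requester rank partially matching services and see how much fuzziness entered the decision.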
International Journal of Applied Mathematics and Computer Science | 2014
Ammar Shaker; Eyke Hüllermeier
In this paper, we introduce a method for survival analysis on data streams. Survival analysis (also known as event history analysis) is an established statistical method for the study of temporal "events" or, more specifically, questions regarding the temporal distribution of the occurrence of events and their dependence on covariates of the data sources. To make this method applicable in the setting of data streams, we propose an adaptive variant of a model that is closely related to the well-known Cox proportional hazard model. Adopting a sliding window approach, our method continuously updates its parameters based on the event data in the current time window. As a proof of concept, we present two case studies in which our method is used for different types of spatio-temporal data analysis, namely, the analysis of earthquake data and Twitter data. In an attempt to explain the frequency of events by the spatial location of the data source, both studies use the location of the source as a covariate.
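The sliding-window ingredient can be sketched in isolation: maintain only the events inside the most recent time horizon and report a windowed event rate. A Cox-type model would additionally regress this rate on covariates such as location; here only the windowed updating is shown, and the function name and parameters are illustrative.

```python
from collections import deque

def stream_event_rate(events, horizon):
    """Given an increasing sequence of event timestamps, return, at
    each event, the event rate (events per time unit) within the last
    `horizon` time units -- the sliding-window state a streaming
    survival model would keep up to date."""
    window = deque()
    rates = []
    for t in events:
        window.append(t)
        while window and window[0] < t - horizon:  # drop expired events
            window.popleft()
        rates.append(len(window) / horizon)
    return rates
```

Because expired events are discarded as the window slides, memory stays bounded and the parameter updates always reflect the current regime, which is what makes the approach viable on open-ended streams.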
European Conference on Machine Learning | 2017
Ammar Shaker; Waleri Heldt; Eyke Hüllermeier
Learning from data streams has received increasing attention in recent years, not only in the machine learning community but also in other research fields, such as computational intelligence and fuzzy systems. In particular, several rule-based methods for the incremental induction of regression models have been proposed. In this paper, we develop a method that combines the strengths of two existing approaches rooted in different learning paradigms. Our method induces a set of fuzzy rules, which, compared to conventional rules with Boolean antecedents, has the advantage of producing smooth regression functions. To do so, it makes use of an induction technique inspired by AMRules, a very efficient and effective algorithm for learning rules from data streams that can be seen as the state of the art for this task. We conduct a comprehensive experimental study showing that a combination of the expressiveness of fuzzy rules with the algorithmic concepts of AMRules yields a learning system with superb performance.
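Why fuzzy rules yield smooth regression functions can be seen in a minimal sketch: each rule carries a soft (here Gaussian) membership over the input and a linear consequent, and the prediction blends the rule outputs by membership weight, in the style of a Takagi-Sugeno system. The rule set and membership choice below are illustrative assumptions; the paper's induction method differs in its details.

```python
import math

def ts_predict(rules, x):
    """Membership-weighted average of local linear models.
    Each rule is (center, width, (a, b)): a Gaussian membership
    around `center` and a linear consequent y = a*x + b."""
    num = den = 0.0
    for center, width, (a, b) in rules:
        mu = math.exp(-((x - center) ** 2) / (2 * width ** 2))  # membership
        num += mu * (a * x + b)   # rule's local linear prediction
        den += mu
    return num / den

# Two rules: near x=0 predict y ~ x, near x=10 predict y ~ 2x - 5
rules = [(0.0, 2.0, (1.0, 0.0)), (10.0, 2.0, (2.0, -5.0))]
```

Between the rule centers the memberships overlap, so the output interpolates continuously between the two local models instead of jumping at a crisp rule boundary, as it would with Boolean antecedents.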