Pedro A. C. Sousa
Universidade Nova de Lisboa
Publication
Featured research published by Pedro A. C. Sousa.
Physics Reports | 2016
Massimiliano Zanin; David Papo; Pedro A. C. Sousa; Ernestina Menasalvas; Andrea Nicchi; Elaine Kubik; Stefano Boccaletti
The increasing power of computer technology does not dispense with the need to extract meaningful information out of data sets of ever growing size, and indeed typically exacerbates the complexity of this task. To tackle this general problem, two methods have emerged, at chronologically different times, that are now commonly used in the scientific community: data mining and complex network theory. Not only do complex network analysis and data mining share the same general goal, that of extracting information from complex systems to ultimately create a new compact quantifiable representation, but they also often address similar problems. In the face of that, a surprisingly low number of researchers turn out to resort to both methodologies. One may then be tempted to conclude that these two fields are either largely redundant or totally antithetic. The starting point of this review is that this state of affairs should be put down to contingent rather than conceptual differences, and that these two fields can in fact advantageously be used in a synergistic manner. An overview of both fields is first provided, some fundamental concepts of which are illustrated. A variety of contexts in which complex network theory and data mining have been used in a synergistic manner are then presented. Contexts in which the appropriate integration of complex network metrics can lead to improved classification rates with respect to classical data mining algorithms and, conversely, contexts in which data mining can be used to tackle important issues in complex network theory applications are illustrated. Finally, ways to achieve a tighter integration between complex networks and data mining, and open lines of research are discussed.
mobile data management | 2012
João Bártolo Gomes; Shonali Krishnaswamy; Mohamed Medhat Gaber; Pedro A. C. Sousa; Ernestina Menasalvas
Mobile activity recognition focuses on inferring the current activities of a mobile user by leveraging the sensory data that is available on today's smartphones. The state of the art in mobile activity recognition uses traditional classification techniques. Thus, the learning process typically involves: i) collection of labelled sensory data that is transferred and collated in a centralised repository, ii) model building where the classification model is trained and tested using the collected data, iii) a model deployment stage where the learnt model is deployed on-board a mobile device for identifying activities based on new sensory data. In this paper, we demonstrate the Mobile Activity Recognition System (MARS) where for the first time the model is built and continuously updated on-board the mobile device itself using data stream mining. The advantages of the on-board approach are that it allows model personalisation and increased privacy as the data is not sent to any external site. Furthermore, when the user or their activity profile changes, MARS enables quick model adaptation. One of the standout features of MARS is that training/updating the model takes less than 30 seconds per activity. MARS has been implemented on the Android platform to demonstrate that it can achieve accurate mobile activity recognition. Moreover, we can show in practice that MARS quickly adapts to user profile changes while at the same time being scalable and efficient in terms of consumption of the device resources.
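The contrast between batch training in a central repository and incremental on-board learning can be illustrated with a small sketch. The abstract does not name the learner used inside MARS, so the nearest-centroid classifier below is only a stand-in chosen for brevity; the class name, method names, and the two-feature sensor readings are all hypothetical. What it shows is the general idea: each labelled reading updates the model in place, so no raw data needs to leave the device.

```python
from collections import defaultdict


class IncrementalCentroidClassifier:
    """Hypothetical stand-in for an on-board stream learner (not the MARS algorithm)."""

    def __init__(self):
        self.counts = defaultdict(int)  # label -> number of readings seen
        self.means = {}                 # label -> running mean feature vector

    def update(self, features, label):
        """Fold one labelled reading into the running mean of its class."""
        self.counts[label] += 1
        n = self.counts[label]
        mean = self.means.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            mean[i] += (x - mean[i]) / n  # incremental mean update

    def predict(self, features):
        """Return the label whose centroid is closest, or None if untrained."""
        if not self.means:
            return None

        def squared_distance(label):
            return sum((x - m) ** 2 for x, m in zip(features, self.means[label]))

        return min(self.means, key=squared_distance)
```

Because the model state is just one running mean per activity, memory use stays constant per class no matter how many readings arrive, which is the property that makes this style of learner plausible on a phone.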
acm symposium on applied computing | 2011
João Bártolo Gomes; Ernestina Menasalvas; Pedro A. C. Sousa
The dynamic and unstable nature observed in real world applications influences learning systems through changes in data, context and resource availability. Data stream mining systems must be aware of and adapt to such changes so that incoming data can continuously be classified with high accuracy. Ensemble approaches have been shown to be successful in dealing with concept changes. Despite their success in learning under concept changes, context information has not yet been exploited by ensemble approaches in data stream scenarios where concepts reappear. Under these circumstances, context information appropriately integrated with learned concepts would make it possible to anticipate recurring changes in concepts. In this work, we present an ensemble based approach for the problem of detecting concept changes in data streams where concepts reappear, that dynamically adds and removes weighted classifiers in response to changes not only in concepts but also in context. We identify stable concepts using a change detection method, based on the error-rate of the learning process. Context information is used in the adaptation to recurring concepts and in the management of knowledge from previously learned concepts while adapting to resource constraints. Consequently, proper representation and storage of context and concepts is a major issue dealt with in the paper. We present and discuss preliminary experimental results with synthetic and real datasets.
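Change detection based on the error rate of the learning process can be sketched as follows. The abstract does not say which detector is used, so this example assumes a rule in the style of the well-known Drift Detection Method (DDM): track the running error rate p and its standard deviation s, remember the lowest p + s seen so far, and signal a warning or a drift when the current p + s rises two or three standard deviations above that minimum. The class name and the default parameter values are illustrative assumptions.

```python
import math


class ErrorRateDriftDetector:
    """DDM-style detector (an assumed stand-in, not the paper's exact method)."""

    def __init__(self, min_samples=30, warn_factor=2.0, drift_factor=3.0):
        self.min_samples = min_samples
        self.warn_factor = warn_factor
        self.drift_factor = drift_factor
        self.reset()

    def reset(self):
        """Forget all statistics, e.g. after a drift has been signalled."""
        self.n = 0
        self.errors = 0
        self.p_min = float("inf")
        self.s_min = float("inf")

    def add(self, is_error):
        """Record one prediction outcome and return 'stable', 'warning' or 'drift'."""
        self.n += 1
        self.errors += int(is_error)
        p = self.errors / self.n
        s = math.sqrt(p * (1.0 - p) / self.n)
        if self.n < self.min_samples:
            return "stable"
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s
        # Strict comparisons avoid a spurious signal when s is exactly zero.
        if p + s > self.p_min + self.drift_factor * self.s_min:
            self.reset()
            return "drift"
        if p + s > self.p_min + self.warn_factor * self.s_min:
            return "warning"
        return "stable"
```

A period during which the detector reports "stable" can be treated as one concept; a "drift" signal marks the point at which a classifier would be added to, or retrieved for, the ensemble.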
IEEE Transactions on Neural Networks | 2014
João Bártolo Gomes; Mohamed Medhat Gaber; Pedro A. C. Sousa; Ernestina Menasalvas
Most data stream classification techniques assume that the underlying feature space is static. However, in real-world applications the set of features and their relevance to the target concept may change over time. In addition, when the underlying concepts reappear, reusing previously learnt models can enhance the learning process in terms of accuracy and processing time at the expense of manageable memory consumption. In this paper, we propose mining recurring concepts in a dynamic feature space (MReC-DFS), a data stream classification system to address the challenges of learning recurring concepts in a dynamic feature space while simultaneously reducing the memory cost associated with storing past models. MReC-DFS is able to detect and adapt to concept changes using the performance of the learning process and contextual information. To handle recurring concepts, stored models are combined in a dynamically weighted ensemble. Incremental feature selection is performed to reduce the combined feature space. This contribution allows MReC-DFS to store only the features most relevant to the learnt concepts, which in turn increases the memory efficiency of the technique. In addition, an incremental feature selection method is proposed that dynamically determines the threshold between relevant and irrelevant features. Experimental results demonstrating the high accuracy of MReC-DFS compared with state-of-the-art techniques on a variety of real datasets are presented. The results also show the superior memory efficiency of MReC-DFS.
data warehousing and knowledge discovery | 2012
João Bártolo Gomes; Shonali Krishnaswamy; Mohamed Medhat Gaber; Pedro A. C. Sousa; Ernestina Menasalvas
Mobile activity recognition focuses on inferring the current activities of a mobile user by leveraging the rich sensory data that is available on today's smartphones and other wearable sensors. The state of the art in mobile activity recognition research has focused on traditional classification learning techniques. In this paper, we propose the Mobile Activity Recognition System (MARS) where for the first time the classifier is built on-board the mobile device itself through ubiquitous data stream mining in an incremental manner. The advantages of on-board data stream mining for mobile activity recognition are: i) personalisation of models built to individual users; ii) increased privacy as the data is not sent to an external site; iii) adaptation of the model as the user's activity profile changes. In our extensive experimental results using a recent benchmarking activity recognition dataset, we show that MARS can achieve similar accuracy when compared with traditional classifiers for activity recognition, while at the same time being scalable and efficient in terms of the mobile device's resource consumption. MARS has been implemented on the Android platform for empirical evaluation.
Proceedings of the First International Workshop on Novel Data Stream Pattern Mining Techniques | 2010
João Bártolo Gomes; Ernestina Menasalvas; Pedro A. C. Sousa
Drift detection methods in data streams can detect changes in incoming data so that learned models can be used to represent the underlying population. In many real-world scenarios context information is available and could be exploited to improve existing approaches, by detecting or even anticipating recurring concepts in the underlying population. Several applications, among them health-care or recommender systems, lend themselves to the use of such information, as data from sensors is available but is not being used. Nevertheless, new challenges arise when integrating context with drift detection methods. Modeling and comparing context information, representing the context-concepts history and storing previously learned concepts for reuse are some of the critical problems. In this work, we propose the Context-aware Learning from Data Streams (CALDS) system to improve existing drift detection methods by exploiting available context information. Our enhancement is seamless: we use the association between context information and learned concepts to improve detection and adaptation to drift when concepts reappear. We present and discuss our preliminary experimental results with synthetic and real datasets.
intelligent data analysis | 2012
João Bártolo Gomes; Pedro A. C. Sousa; Ernestina Menasalvas
The problem of recurring concepts in data stream classification is a special case of concept drift where concepts may reappear. Although several existing methods are able to learn in the presence of concept drift, few consider contextual information when tracking recurring concepts. Nevertheless, in many real-world scenarios context information is available and can be exploited to improve existing approaches in the detection or even anticipation of recurring concepts. In this work, we propose the extension of existing approaches to deal with the problem of recurring concepts by reusing previously learned decision models in situations where concepts reappear. The different underlying concepts are identified using an existing drift detection method, based on the error-rate of the learning process. A method to associate context information and learned decision models is proposed to improve the adaptation to recurring concepts. The method also addresses the challenge of retrieving the most appropriate concept for a particular context. Finally, to deal with situations of memory scarcity, an intelligent strategy to discard models is proposed. The experiments conducted so far, using synthetic and real datasets, show promising results and make it possible to analyze the trade-off between the accuracy gains and the storage cost of the learned models.
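The three ingredients named above (associating context with learned models, retrieving the most appropriate model for a context, and discarding models under memory scarcity) can be pictured with a small sketch. The abstract does not specify the similarity measure or the discard strategy, so both choices here are assumptions made for illustration: contexts are numeric vectors compared by Euclidean distance, and the least-retrieved model is evicted when the store is full. The class and field names are hypothetical.

```python
import math


class ContextModelStore:
    """Hypothetical bounded store that maps context vectors to learned models."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = []  # each entry: {"context": tuple, "model": object, "hits": int}

    def add(self, context, model):
        """Store a model under its context, evicting the least-used entry if full."""
        if len(self.entries) >= self.capacity:
            # Assumed discard strategy: drop the model retrieved least often.
            least_used = min(self.entries, key=lambda e: e["hits"])
            self.entries.remove(least_used)
        self.entries.append({"context": tuple(context), "model": model, "hits": 0})

    def retrieve(self, context):
        """Return the model whose stored context is nearest, or None if empty."""
        if not self.entries:
            return None
        query = tuple(context)
        best = min(self.entries, key=lambda e: math.dist(e["context"], query))
        best["hits"] += 1
        return best["model"]
```

For example, after storing one model for a "home" context and one for an "office" context, a drift signal accompanied by a context reading close to "home" would retrieve the home model instead of learning it again from scratch.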
intelligent data analysis | 2011
João Bártolo Gomes; Mohamed Medhat Gaber; Pedro A. C. Sousa; Ernestina Menasalvas
Recent advances in ubiquitous devices open an opportunity to apply new data stream mining techniques to support intelligent decision making in the next generation of ubiquitous applications. This paper motivates and describes a novel Context-aware Collaborative data stream mining system CC-Stream that allows intelligent mining and classification of time-changing data streams on-board ubiquitous devices. CC-Stream explores the knowledge available in other ubiquitous devices to improve local classification accuracy. Such knowledge is associated with context information that captures the system state for a particular underlying concept. CC-Stream uses an ensemble method where the classifiers are selected and weighted based on their local accuracy for different partitions of the instance space and their context similarity in relation to the current context.
International Journal of Information Technology and Decision Making | 2013
João Bártolo Gomes; Mohamed Medhat Gaber; Pedro A. C. Sousa; Ernestina Menasalvas
In ubiquitous data stream mining, different devices often aim to learn concepts that are similar to some extent. In many applications, such as spam filtering or news recommendation, the concept underlying the data stream (e.g., interesting mail/news) is likely to change over time. Therefore, the resultant model must be continuously adapted to such changes. This paper presents a novel Collaborative Data Stream Mining (Coll-Stream) approach that explores the similarities in the knowledge available from other devices to improve local classification accuracy. Coll-Stream integrates the community knowledge using an ensemble method where the classifiers are selected and weighted based on their local accuracy for different partitions of the feature space. We evaluate Coll-Stream classification accuracy in situations with concept drift, noise, partition granularity and concept similarity in relation to the local underlying concept. The experimental results show that the resultant Coll-Stream model achieves stability and accuracy in a variety of situations using both synthetic and real-world datasets.
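Weighting classifiers by their local accuracy in different partitions of the feature space can be sketched as below. This is a simplified illustration under stated assumptions, not the Coll-Stream implementation: classifiers are plain callables, the partition function is supplied by the caller, unseen (classifier, region) pairs get a neutral weight of 0.5, and the class and parameter names are hypothetical.

```python
from collections import defaultdict


class PartitionWeightedEnsemble:
    """Hypothetical ensemble that weights each vote by per-region accuracy."""

    def __init__(self, classifiers, partition):
        self.classifiers = classifiers  # dict: name -> callable(x) returning a label
        self.partition = partition      # callable(x) returning a hashable region id
        self.stats = defaultdict(lambda: [0, 0])  # (name, region) -> [correct, total]

    def accuracy(self, name, region):
        """Observed accuracy of one classifier in one region (0.5 if unseen)."""
        correct, total = self.stats[(name, region)]
        return correct / total if total else 0.5

    def predict(self, x):
        """Weighted vote using each classifier's accuracy in the region of x."""
        region = self.partition(x)
        votes = defaultdict(float)
        for name, classify in self.classifiers.items():
            votes[classify(x)] += self.accuracy(name, region)
        return max(votes, key=votes.get)

    def update(self, x, y):
        """Record whether each classifier was right on labelled instance (x, y)."""
        region = self.partition(x)
        for name, classify in self.classifiers.items():
            stat = self.stats[(name, region)]
            stat[0] += int(classify(x) == y)
            stat[1] += 1
```

The point of keeping accuracy per region rather than globally is that a model received from another device may match the local concept in only part of the feature space; it then contributes strongly there and weakly elsewhere.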
mobile data management | 2010
João Bártolo Gomes; Ernestina Menasalvas; Pedro A. C. Sousa
Advances in data mining, particularly in anytime anywhere data stream mining, make on-board data analysis possible in mobile devices with resource constraints. In this work, we propose a data stream mining service to support knowledge discovery in ubiquitous applications while addressing resource constraints on mobile devices. As the basis for our service we describe a general mechanism, which autonomously adapts the execution of the data stream mining process to each situation, using context and resource awareness. We describe the main components to achieve adaptability and propose a decision mechanism based on machine learning to support the configuration selection task, as we consider this to be a key element to achieve autonomy and adaptation of the mining service. We then present an instantiation of the proposed approach for the particular case of classification using the VFDT algorithm and analyze which factors influence it. Experimental results show how the adaptable data stream mining service improves resource consumption while increasing the quality of the anytime mining model.