Rui Sarmento
University of Porto
Publication
Featured research published by Rui Sarmento.
Social Network Analysis and Mining | 2016
Mário Cordeiro; Rui Sarmento; João Gama
The amount and variety of data generated by today’s online social and telecommunication network services are changing the way researchers analyze social networks. Coping with fast-evolving networks with millions of nodes and edges is, among other factors, the main challenge. Community detection algorithms must also be updated or improved under these conditions. Previous state-of-the-art algorithms based on modularity optimization (e.g., the Louvain algorithm) provide fast, efficient and robust community detection on large static networks. Nonetheless, due to the high computational complexity of these algorithms, using batch techniques on dynamic networks requires running community detection on the whole network at each evolution step. This proves to be computationally expensive and unstable in terms of community tracking. Our contribution is a novel technique that keeps the community structure up to date following the addition or removal of nodes and edges. The proposed algorithm performs a local modularity optimization that maximizes the modularity gain function only for those communities where nodes and edges were edited, keeping the rest of the network unchanged. The effectiveness of our algorithm is demonstrated by comparison with other state-of-the-art community detection algorithms with respect to Newman’s Modularity, Modularity with Split Penalty, Modularity Density, number of detected communities and running time.
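For reference, Newman's modularity, the quantity that Louvain-style methods optimize, is commonly written as shown here. The notation is the standard one from the literature and is an assumption, not taken from this abstract: A_ij is the adjacency matrix entry, k_i the degree of node i, m the total number of edges, and c_i the community assigned to node i.

```latex
Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)
```

Because the delta term is zero for node pairs in different communities, Q splits into a sum of per-community contributions. This is consistent with the idea of re-optimizing only the communities touched by an edit, although m and the degrees also change whenever an edge is added or removed.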
Archive | 2016
Rui Sarmento; Márcia E. Oliveira; Mário Cordeiro; Shazia Tabassum; João Gama
Mobile phones are powerful tools to connect people. The streams of Call Detail Records (CDRs) generated by these devices provide a powerful abstraction of social interactions between individuals, representing social structures. Call graphs can be deduced from these CDRs, where nodes represent subscribers and edges represent the phone calls made. These graphs may easily reach millions of nodes and billions of edges. Besides being large-scale and generated in real time, the underlying social networks are inherently complex and, thus, difficult to analyze. Conventional data analysis performed by telecom operators is slow, done on request, and implies heavy data warehouse costs. In the face of these challenges, real-time streaming analysis is an ever-increasing need for mobile operators, since it enables them to quickly detect important network events and optimize business operations. Sampling, together with visualization techniques, is required for online exploratory data analysis and event detection in such networks. In this chapter, we report on the burgeoning body of research in network sampling, visualization of streaming social networks, stream analysis and the solutions proposed so far.
International Conference on Enterprise Information Systems | 2015
Rui Sarmento; Mário Cordeiro; João Gama
The combination of a top-K network representation of the data stream with community detection is a novel approach to streaming network sampling. The method keeps an always up-to-date sample of the full network, and its advantage over previous methods is that it preserves larger communities and the original network distribution. Empirically, it is also shown that these techniques, in conjunction with community detection, provide effective ways to perform sampling and analysis of large-scale streaming networks with power law distributions.
ACM Symposium on Applied Computing | 2015
Rui Sarmento; Mário Cordeiro; João Gama
Streaming and visualization of large-scale social networks has been a hot topic in recent research. Researchers strive to achieve efficient streaming methods and to be able to gather knowledge from the results. Moreover, treating the data as a continuous real-time flow is required for an immediate response to events in daily life. Our contribution is to treat the data as a continuous stream and represent it by streaming the egocentric networks (Ego-Networks) for particular nodes. We propose a non-standard node forgetting factor in the representation of the network data stream. Thus, this representation is sensitive to recent events in users' networks and less sensitive to past node events. The aim of these techniques is the visualization of large-scale Ego-Networks from telecommunications social networks with power law distributions.
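The effect of a forgetting factor on a streamed Ego-Network can be illustrated with a minimal sketch. It uses plain exponential decay of edge weights, which is an assumption: the abstract proposes a non-standard, node-based factor whose exact form is not given here. The function name, the `alpha` and `threshold` parameters, and the restriction to edges incident to the ego are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def ego_network_with_forgetting(calls, ego, alpha=0.5, threshold=0.1):
    """Stream (time_step, caller, callee) records sorted by time_step.

    Keeps only edges incident to `ego`. Each elapsed time step multiplies
    every stored edge weight by `alpha`; edges whose weight drops below
    `threshold` are forgotten. (Illustrative scheme, not the paper's.)
    """
    weights = defaultdict(float)
    current_step = None
    for step, caller, callee in calls:
        if current_step is not None and step > current_step:
            decay = alpha ** (step - current_step)
            for edge in list(weights):
                weights[edge] *= decay
                if weights[edge] < threshold:
                    del weights[edge]  # old interaction is forgotten
        current_step = step
        if ego in (caller, callee):
            # Undirected edge, stored with a canonical endpoint order.
            weights[tuple(sorted((caller, callee)))] += 1.0
    return dict(weights)


calls = [(0, "a", "b"), (0, "a", "c"), (1, "a", "b"), (4, "a", "d")]
print(ego_network_with_forgetting(calls, ego="a"))
```

In this toy stream the contact with "c" is never renewed, so it decays below the threshold and disappears, while the repeated contact with "b" survives with a reduced weight.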
International Conference on Enterprise Information Systems | 2018
Rui Sarmento; Mário Cordeiro; Pavel Brazdil; João Gama
Text Mining and NLP techniques are a hot topic nowadays. Researchers strive to develop new and faster algorithms to cope with larger amounts of data. In particular, interest in text data analysis has been increasing due to the growth of social network media. Given this, the development of new algorithms and/or the upgrade of existing ones is now a crucial task to deal with text mining problems under this new scenario. In this paper, we present an update to TextRank, a well-known implementation used for automatic keyword extraction from text, adapted to deal with streams of text. In addition, we present results for this implementation and compare them with the batch version. The major improvement is lower computation time for the processing of the same text data in a streaming environment, both in sliding window and incremental setups. The speedups obtained in the experimental results are significant. Therefore, the approach was considered valid and useful to the research community.
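As background for the batch baseline, TextRank keyword extraction can be sketched as a PageRank-style iteration over a word co-occurrence graph. This is a minimal batch sketch only; the streaming update described in the abstract is not reproduced. The function name, the parameter defaults and the absence of any part-of-speech filtering are assumptions made for illustration.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_n=3):
    """Rank words by a PageRank-style score on an undirected co-occurrence graph.

    Two words are linked if they appear within `window` positions of each
    other (window=2 links adjacent words only).
    """
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        # Each word receives a share of the score of every neighbor,
        # divided by that neighbor's number of links.
        scores = {
            word: (1 - damping) + damping * sum(
                scores[n] / len(neighbors[n]) for n in neighbors[word]
            )
            for word in neighbors
        }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


tokens = ["graph", "stream", "graph", "mining", "graph", "keyword"]
print(textrank_keywords(tokens, top_n=1))
```

A batch run like this rebuilds the graph and reruns the iteration for every new chunk of text, which is the cost a streaming variant aims to avoid.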
International Journal of Social and Organizational Dynamics in IT (IJSODIT) | 2017
Rui Sarmento
Nowadays, treating the data as a continuous real-time flow is a requirement explained by the need for immediate response to events in daily life. We study the data as an ongoing data stream and represent it by streaming egocentric networks (Ego-Networks) of the particular nodes under study. We use a non-standard node forgetting factor in the representation of the network data stream, as previously introduced in the related literature. This way, the representation is sensitive to recent events in users’ networks and less sensitive to past node events. We study this method with large-scale Ego-Networks taken from telecommunications social networks with power law distribution. We aim to compare and analyze some reference Ego-Network metrics, and their variation with or without the forgetting factor.
Keywords: Data Stream Analysis, Ego-Networks, Real-Time Applications, Social Network Stream Mining, Telecommunication Networks
International Conference on Complex Networks and their Applications | 2017
Rui Sarmento; Mário Cordeiro; Pavel Brazdil; João Gama
Social Network Analysis (SNA) is an important research area. It originated in sociology but has spread to other areas of research, including anthropology, biology, information science, organizational studies, political science, and computer science. This has stimulated research on how to support SNA with the development of new algorithms. One of the critical areas involves the calculation of different centrality measures. The challenge is how to do this fast, as increasingly large datasets become available. Our contribution is an incremental version of the Laplacian Centrality measure that can be applied not only to large graphs but also to dynamically changing networks. We have conducted several tests with different types of evolving networks. We show that our incremental version can process a given large network faster than the corresponding batch version in both incremental and fully dynamic network setups.
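Why Laplacian Centrality lends itself to incremental computation can be seen in a small sketch. For an unweighted, undirected simple graph, the drop in Laplacian energy caused by removing a node v reduces to d_v^2 + d_v + 2 * (sum of the degrees of v's neighbors), so the score depends only on v's immediate neighborhood. The code below assumes that setting and is not the algorithm from the paper: the function names and the adjacency-dictionary representation are hypothetical.

```python
def laplacian_centrality(adj, v):
    """Laplacian centrality of v in an unweighted, undirected simple graph.

    adj maps each node to the set of its neighbors.
    """
    d_v = len(adj[v])
    return d_v * d_v + d_v + 2 * sum(len(adj[u]) for u in adj[v])


def all_centralities(adj):
    """Batch version: recompute the score of every node."""
    return {v: laplacian_centrality(adj, v) for v in adj}


def add_edge_incremental(adj, scores, a, b):
    """Insert edge (a, b) and refresh only the scores that can change.

    Only the degrees of a and b change, so only a, b and their neighbors
    need to be recomputed. Returns the set of refreshed nodes.
    """
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)
    affected = {a, b} | adj[a] | adj[b]
    for v in affected:
        scores[v] = laplacian_centrality(adj, v)
    return affected


adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
scores = all_centralities(adj)
touched = add_edge_incremental(adj, scores, "c", "d")
print(sorted(touched), scores)
```

In this example node "a" is left untouched by the update, yet the incrementally maintained scores match a full recomputation.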
International Journal of Social and Organizational Dynamics in IT (IJSODIT) | 2016
Rui Sarmento; Luís Trigo; Liliana Fonseca
Forecasting enterprise bankruptcy is a major concern for managers, investors, financial institutions and government agencies. It enables the sustainability assessment of critical suppliers and clients, as well as competitors and the business environment. Throughout the 20th and 21st centuries, advances in statistics and computer science enabled the development of different trends in financial distress assessment that co-exist today. However, recent Data Mining (DM) techniques are regarded as being the most precise. IT expertise requirements in the constantly evolving DM field may have been a major obstacle to the adoption of these techniques by decision makers. Furthermore, the DM software tools that are now widespread offer a broad spectrum of Artificial Intelligence algorithms, and the most difficult task may be selecting the appropriate algorithm. Hence, the adoption of a good workflow method for data processing and analysis is critical for obtaining fast and reliable results. This work presents an overview of the available bankruptcy prediction techniques and provides a comprehensive case study exploring the latest Data Mining techniques.
International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management | 2015
Luís Trigo; Martin Víta; Rui Sarmento; Pavel Brazdil
We present an Information Retrieval tool that helps users search for particular information of interest to them. Our system processes a given set of documents to produce a graph, where nodes represent documents and links represent the similarities between them. The aim is to offer the user a tool to navigate this space easily. It is possible to collapse and expand nodes. Our case study shows affinity groups based on the similarities of the text production of researchers. This goes beyond the already established communities revealed by co-authorship. The system characterizes the activity of each author by a set of automatically generated keywords and by membership of a particular affinity group. The importance of each author is highlighted visually by the size of the node, which corresponds to the number of publications and different measures of centrality. Regarding the validation of the method, we analyse the impact of using different combinations of titles, abstracts and keywords on capturing the similarity between researchers.
NFMCP'14 Proceedings of the 3rd International Conference on New Frontiers in Mining Complex Patterns | 2014
Rui Sarmento; Mário Cordeiro; João Gama
Regular services in telecommunications produce massive volumes of relational data. In this work, the data produced in telecommunications is seen as a streaming network, where clients are the nodes and phone calls are the edges. Visualization techniques are required for exploratory data analysis and event detection. In social network visualization and analysis, the goal is to get more information from the data, taking into account actors at the individual level. Previous methods relied on aggregating communities, k-Core decompositions and matrix feature representations to visualize and analyse the massive network data. Our contribution is a group visualization and analysis technique for influential actors in the network, based on sampling the full network with a top-k representation of the network data stream.
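One common way to maintain a top-k summary of a stream in bounded memory is the Space-Saving algorithm, sketched below. Whether the work above uses this exact algorithm is an assumption; the function name and the choice of counting callers (rather than edges) are hypothetical and for illustration only.

```python
def space_saving_top_k(stream, k):
    """Approximate the k most frequent items of a stream with k counters.

    When a new item arrives and all counters are taken, the item with the
    smallest count is evicted and the newcomer inherits that count plus
    one, so counts may overestimate the true frequency.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k:
            counters[item] = 1
        else:
            victim = min(counters, key=counters.get)
            counters[item] = counters.pop(victim) + 1
    return counters


# Each element stands for the caller of one phone call in the stream.
callers = ["a", "a", "a", "b", "a", "c", "a", "b"]
print(space_saving_top_k(callers, k=2))
```

A sampled network could then retain only the calls whose endpoints are currently in the summary, which keeps the most active actors while discarding the long tail.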