Arian Bär
University of Erlangen-Nuremberg
Publication
Featured research published by Arian Bär.
international conference on big data | 2014
Arian Bär; Alessandro Finamore; Pedro Casas; Lukasz Golab; Marco Mellia
The complexity of the Internet has rapidly increased, making it more important and challenging to design scalable network monitoring tools. Network monitoring typically requires rolling data analysis, i.e., continuously and incrementally updating (rolling-over) various reports and statistics over high-volume data streams. In this paper, we describe DBStream, which is an SQL-based system that explicitly supports incremental queries for rolling data analysis. We also present a performance comparison of DBStream with a parallel data processing engine (Spark), showing that, in some scenarios, a single DBStream node can outperform a cluster of ten Spark nodes on rolling network monitoring workloads. Although our performance evaluation is based on network monitoring data, our results can be generalized to other Big Data problems with high volume and velocity.
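The rolling analysis described in this abstract can be illustrated with a minimal Python sketch. It is not DBStream's implementation or its query language; the class name `RollingCounter` and the window layout are assumptions made purely for illustration. The point shown is that a report over the last N time windows can be updated by adding the newest window and subtracting the expired one, instead of recomputing from all raw data.

```python
from collections import deque


class RollingCounter:
    """Keeps a sum over the most recent `size` windows (hypothetical helper)."""

    def __init__(self, size):
        self.size = size
        self.windows = deque()  # per-window partial sums, oldest first
        self.total = 0

    def roll(self, window_values):
        """Add one new window of values and drop the oldest one if needed."""
        partial = sum(window_values)
        self.windows.append(partial)
        self.total += partial
        if len(self.windows) > self.size:
            # Incremental step: subtract the expired window, no full rescan.
            self.total -= self.windows.popleft()
        return self.total


counter = RollingCounter(size=3)
totals = [counter.roll(batch) for batch in ([1, 2], [3], [4, 5], [6])]
print(totals)  # [3, 6, 15, 18]
```

Each call to `roll` touches only the new window and one stored partial sum, which is what keeps the update cost independent of the total amount of history.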
conference on computer communications workshops | 2013
Pierdomenico Fiadino; Arian Bär; Pedro Casas
The popularity of web-based services and applications like YouTube and Facebook has taken HTTP back to the pole position on end-user traffic consumption. We present HTTPTag, a flexible on-line HTTP classification system based on pattern matching and tagging. HTTPTag recognizes and tracks the evolution of more than 280 HTTP applications on the fly. These applications are responsible for about 70% of the HTTP traffic in the investigated operational 3G network. HTTPTag improves the network traffic visibility of an operator, performing tasks such as top-services ranking, long-term monitoring of application popularity, and trend analysis, among others.
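As a rough illustration of classification by pattern matching and tagging, the Python sketch below assigns a tag to each HTTP flow based on its hostname and then ranks tags by traffic volume. The rules, the function names `tag_host` and `bytes_per_tag`, and the sample flows are all hypothetical; the abstract does not describe HTTPTag's actual patterns or data model.

```python
import re

# Hypothetical tagging rules: (compiled hostname pattern, application tag).
RULES = [
    (re.compile(r"(^|\.)youtube\.com$"), "YouTube"),
    (re.compile(r"(^|\.)facebook\.com$"), "Facebook"),
]


def tag_host(hostname):
    """Return the tag of the first rule matching the hostname, else 'unknown'."""
    host = hostname.lower()
    for pattern, tag in RULES:
        if pattern.search(host):
            return tag
    return "unknown"


def bytes_per_tag(flows):
    """Sum transferred bytes per tag for (hostname, byte_count) pairs."""
    volume = {}
    for hostname, size in flows:
        tag = tag_host(hostname)
        volume[tag] = volume.get(tag, 0) + size
    return volume


# Made-up sample flows, for illustration only.
flows = [("www.youtube.com", 700), ("m.facebook.com", 200), ("example.org", 100)]
volume = bytes_per_tag(flows)
ranking = sorted(volume.items(), key=lambda item: item[1], reverse=True)
print(ranking)  # [('YouTube', 700), ('Facebook', 200), ('unknown', 100)]
```

Keeping the rules in a plain list means new applications can be covered by appending a pattern and a tag, without touching the matching logic.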
international conference on wireless communications and mobile computing | 2014
Arian Bär; Pedro Casas; Lukasz Golab; Alessandro Finamore
Network traffic monitoring systems generate high volumes of heterogeneous data streams which have to be processed and analyzed with different time constraints for daily network management operations. Some monitoring applications such as anomaly detection, performance tracking and alerting require fast processing of specific incoming real-time data. Other applications like fault diagnosis and trend analysis need to process historical data and perform deep analysis on generally heterogeneous sources of data. The Data Stream Warehousing (DSW) paradigm provides the means to handle both types of monitoring applications within a single system, providing fast and rich data analysis capabilities as well as data persistence. In this paper, we introduce DBStream, a novel online traffic monitoring system based on the DSW paradigm, which allows fast and flexible analysis across multiple heterogeneous data sources. DBStream provides a novel stream processing language for implementing data processing modules, as well as aggregation, filtering, and storage capabilities for further data analysis. We show multiple traffic monitoring applications running on DBStream, processing real traffic from operational ISPs.
IEEE Communications Magazine | 2014
Brian Trammell; Pedro Casas; Dario Rossi; Arian Bär; Zied Ben Houidi; Ilias Leontiadis; Tivadar Szemethy; Marco Mellia
The Internet's universality is based on its decentralization and diversity. However, its distributed nature leads to operational brittleness and difficulty in identifying the root causes of performance and availability issues, especially when the involved systems span multiple administrative domains. The first step to address this fragmentation is coordinated measurement: we propose to complement the current Internet's data and control planes with a measurement plane, or mPlane for short. mPlane's distributed measurement infrastructure collects and analyzes traffic measurements at a wide variety of scales to monitor the network status. Its architecture is centered on a flexible control interface, allowing the incorporation of existing measurement tools through lightweight mPlane proxy components, and offering dynamic support for new capabilities. A focus on automated, iterative measurement makes the platform well-suited to troubleshooting support. This is supported by a reasoning system, which applies machine learning algorithms to learn from success and failure in drilling down to the root cause of a problem. This article describes the mPlane architecture and shows its applicability to several distributed measurement problems involving content delivery networks and Internet service providers. A first case study presents the tracking and iterative analysis of cache selection policies in Akamai, while a second example focuses on the cooperation between Internet service providers and content delivery networks to better orchestrate their traffic engineering decisions and jointly improve their performance.
european conference on networks and communications | 2014
Pedro Casas; Pierdomenico Fiadino; Arian Bär; Alessandro D'Alconzo; Alessandro Finamore; Marco Mellia
YouTube is the most popular service in today's Internet. Its own success forces Google to constantly evolve its functioning to cope with the ever-growing number of users watching YouTube. Understanding the characteristics of YouTube's traffic as well as the way YouTube flows are served from the massive Google CDN is paramount for ISPs, especially for mobile operators, who must handle the huge surge of traffic with the capacity constraints of mobile networks. This paper presents a characterization of the YouTube traffic accessed through mobile and fixed-line networks. The analysis focuses in particular on YouTube content provisioning, studying the characteristics of the hosting servers as seen from both types of networks. To the best of our knowledge, this is the first paper presenting such a simultaneous characterization from mobile and fixed-line vantage points.
conference on network and service management | 2014
Pedro Casas; Alessandro D'Alconzo; Pierdomenico Fiadino; Arian Bär; Alessandro Finamore
YouTube is the most popular service in today's Internet. Google relies on its massive Content Delivery Network (CDN) to push YouTube videos as close as possible to the end-users to improve their Quality of Experience (QoE), using dynamic server selection strategies. Such traffic delivery policies can have a relevant impact on the traffic routed through the Internet Service Providers (ISPs) providing the access, but most importantly, they can have negative effects on the end-user QoE. In this paper we shed light on the problem of diagnosing QoE-based performance degradation events in YouTube's traffic. Through the analysis of one month of YouTube flow traces collected at the network of a large European ISP, we identify and drill down into a Google CDN server selection policy that negatively impacted the watching experience of YouTube users during several days at peak-load times. The analysis combines both the user-side perspective and the CDN perspective of the end-to-end YouTube delivery service to diagnose the problem. The main contributions of the paper are threefold: firstly, we provide a large-scale characterization of the YouTube service in terms of traffic characteristics and provisioning behavior of the Google CDN servers. Secondly, we introduce simple yet effective QoE-based KPIs to monitor YouTube videos from the end-user perspective. Finally, and most importantly, we analyze and provide evidence of the occurrence of QoE-based YouTube anomalies induced by CDN server selection policies, which are normally hidden from the end-user. This is a major issue for ISPs, who see their reputation degrade when such events occur, even if Google is the culprit.
passive and active network measurement | 2014
Pedro Casas; Pierdomenico Fiadino; Arian Bär
Today's Internet is dominated by HTTP services and Content Delivery Networks (CDNs). Popular web services like Facebook and YouTube are hosted by highly distributed CDNs like Akamai and Google. Understanding this new complex Internet scenario is paramount for network operators, to control the traffic on their networks and to improve the quality experienced by their customers, especially when something goes wrong. This paper studies the most popular HTTP services and their underlying hosting networks, through the analysis of a full week of HTTP traffic traces collected at an operational mobile network.
international conference on communications | 2015
Arian Bär; Philippe Svoboda; Pedro Casas
Machine-to-Machine (M2M) network traffic is becoming highly relevant in today's cellular networks. The ever-increasing number of M2M devices is heavily modifying the traffic patterns observed in cellular networks, and the interest in discovering and tracking these devices is rapidly growing among operators. In this paper we introduce MTRAC, a complete approach for M2M TRAffic Classification, capable of discovering M2M devices from coarse-grained measurements. MTRAC uses different Machine Learning (ML) algorithms to unveil M2M devices in cellular networks. It relies on very simple traffic descriptors to characterize the communication patterns of each device. These descriptors are robust to traffic encryption techniques, and improve the portability of the MTRAC approach to other network scenarios. MTRAC is implemented on top of DBStream, a novel Data Stream Warehouse which allows M2M devices to be classified on-line, using different temporal and logical traffic aggregations. We study the performance of MTRAC in the on-line classification of more than two months of traffic observed in an operational, nationwide cellular network, comparing different ML algorithms and different traffic aggregation techniques. To the best of our knowledge, MTRAC is the first ML-based approach for automatic M2M device classification in operational cellular networks.
international conference on data engineering | 2015
Arian Bär; Lukasz Golab; Stefan Ruehrup; Mirko Schiavone; Pedro Casas
Shared workload optimization is feasible if the set of tasks to be executed is known in advance, as is the case in updating a set of materialized views or executing an extract-transform-load workflow. In this paper, we consider data-intensive workloads with precedence constraints arising from data dependencies. While there has been previous work on identifying common subexpressions and task re-ordering to enable shared scans, in this paper we solve the problem of scheduling shared data-intensive workloads in a cache-oblivious way. Our solution relies on a novel formulation of precedence constrained scheduling with the additional constraint that once a data item is in the cache, all tasks that require this item should execute as soon as possible thereafter. We give an optimal algorithm using A* search over the space of possible orderings, and we propose efficient and effective heuristics that obtain nearly-optimal schedules in much less time. We present experimental results on real-life data warehouse workloads and the TPC-DS benchmark to validate our claims.
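The scheduling constraint in this abstract can be made concrete with a small Python sketch. This is a simplified greedy heuristic written for illustration only, not the paper's A* algorithm or any of its heuristics. It assumes each task reads exactly one data item and that the cache holds only the item read by the previous task; the function name `greedy_schedule` and the sample workload are hypothetical.

```python
def greedy_schedule(tasks, deps):
    """Order tasks so dependencies come first, preferring cache reuse.

    tasks: dict mapping task name -> name of the data item it reads.
    deps:  dict mapping task name -> set of tasks that must run before it.
    """
    remaining = {t: set(deps.get(t, ())) for t in tasks}
    order = []
    cached = None  # data item read by the most recently scheduled task
    while remaining:
        # Tasks whose prerequisites have all been scheduled already.
        ready = sorted(t for t, pre in remaining.items() if not pre)
        if not ready:
            raise ValueError("cyclic dependencies")
        # Prefer a ready task that reads the item assumed to be in the cache.
        chosen = next((t for t in ready if tasks[t] == cached), ready[0])
        order.append(chosen)
        cached = tasks[chosen]
        del remaining[chosen]
        for pre in remaining.values():
            pre.discard(chosen)
    return order


# Hypothetical workload: a and c read item X, b and d read item Y, d needs b.
tasks = {"a": "X", "b": "Y", "c": "X", "d": "Y"}
deps = {"d": {"b"}}
schedule = greedy_schedule(tasks, deps)
print(schedule)  # ['a', 'c', 'b', 'd']
```

A plain alphabetical topological order would give a, b, c, d and alternate between X and Y; the greedy choice groups the two readers of X together while still running b before d.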
IEEE Transactions on Network and Service Management | 2014
Pedro Casas; Alessandro D'Alconzo; Pierdomenico Fiadino; Arian Bär; Alessandro Finamore; Tanja Zseby