Publication


Featured research published by Marc Ph. Stoecklin.


IEEE Transactions on Network and Service Management | 2009

Histogram-based traffic anomaly detection

Andreas Kind; Marc Ph. Stoecklin; Xenofontas A. Dimitropoulos

Identifying network anomalies is essential in enterprise and provider networks for diagnosing events, like attacks or failures, that severely impact performance, security, and Service Level Agreements (SLAs). Feature-based anomaly detection models (ab)normal network traffic behavior by analyzing different packet header features, like IP addresses and port numbers. In this work, we describe a new approach to feature-based anomaly detection that constructs histograms of different traffic features, models histogram patterns, and identifies deviations from the created models. We assess the strengths and weaknesses of many design options, like the utility of different features, the construction of feature histograms, the modeling and clustering algorithms, and the detection of deviations. Compared to previous feature-based anomaly detection approaches, our work differs by constructing detailed histogram models, rather than using coarse entropy-based distribution approximations. We evaluate histogram-based anomaly detection and compare it to previous approaches using collected network traffic traces. Our results demonstrate the effectiveness of our technique in identifying a wide range of anomalies. The assessed technical details are generic and, therefore, we expect that the derived insights will be useful for similar future research efforts.
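
To make the general idea concrete, here is a minimal sketch (not the authors' implementation) of histogram-based detection on a single traffic feature: per-window histograms of destination ports are compared against baseline histograms, and windows whose closest baseline is farther away than a threshold are flagged. The feature choice, the L1 distance, and the threshold are assumptions of this example; the paper evaluates several feature, modeling, and clustering options.

```python
# Toy sketch of histogram-based anomaly detection on one traffic feature.
# Assumptions: each window is a list of destination ports; the L1 distance
# and a fixed threshold stand in for the modeling/clustering in the paper.
from collections import Counter

def normalized_histogram(values, bins=1024):
    counts = Counter(v % bins for v in values)   # map feature values into bins
    total = sum(counts.values()) or 1
    return {b: c / total for b, c in counts.items()}

def l1_distance(h1, h2):
    keys = set(h1) | set(h2)
    return sum(abs(h1.get(k, 0.0) - h2.get(k, 0.0)) for k in keys)

def detect(windows, baseline_windows, threshold=0.5):
    """Return indices of windows whose feature histogram deviates from the baseline."""
    baselines = [normalized_histogram(w) for w in baseline_windows]
    anomalies = []
    for i, w in enumerate(windows):
        h = normalized_histogram(w)
        # distance to the closest baseline histogram (a crude stand-in for clustering)
        if min(l1_distance(h, b) for b in baselines) > threshold:
            anomalies.append(i)
    return anomalies

# Example: baseline windows dominated by ports 80/443, a test window hit by a port scan.
baseline = [[80, 443, 80, 443, 80] * 20 for _ in range(5)]
scan_window = list(range(1, 500))           # many distinct destination ports
print(detect([baseline[0], scan_window], baseline))   # -> [1]
```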


Passive and Active Network Measurement | 2009

On the 95-Percentile Billing Method

Xenofontas A. Dimitropoulos; Paul Hurley; Andreas Kind; Marc Ph. Stoecklin

The 95-percentile method is widely used for billing ISPs and websites. In this work, we characterize important aspects of the 95-percentile method using a large set of traffic traces. We first study how the 95-percentile depends on the aggregation window size. We observe that the computed value often follows a noisy decreasing trend along a convex curve as the window size increases. We provide theoretical justification for this dependence using the self-similar model for Internet traffic and discuss more complex dependencies we observed, in which the 95-percentile increases with the window size. Secondly, we quantify how variations in the window size affect the computed 95-percentile. In our experiments, we find that reasonable differences in the window size can account for an increase between 4.1% and 42.5% in the monthly bill of medium- and low-volume sites. In contrast, for sites with average traffic rates above 10 Mbps, the fluctuation of the 95-percentile is below 2.9%. Next, we focus on the use of flow data in hosting environments for billing individual sites. We describe the byte-shifting effect introduced by flow aggregation and quantify how it can affect the computed 95-percentile. We find that in our traces it can both decrease and increase the computed 95-percentile, with the largest change being a decrease of 9.3%.
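
For reference, a minimal sketch of the 95th-percentile computation the paper analyzes: traffic is aggregated into fixed-size windows, the per-window rates are sorted, the top 5% of samples are discarded, and the bill is based on the highest remaining sample. The 5-minute window and the sample data below are assumptions of this example; the paper's point is precisely that the result shifts with the window size.

```python
# Minimal sketch of the 95th-percentile billing computation.
# Assumptions: `byte_counts` holds bytes transferred per aggregation window;
# the conventional 5-minute window is only one possible choice.
def percentile_95(byte_counts, window_seconds=300):
    # per-window average rate in bits per second
    rates = sorted(8 * b / window_seconds for b in byte_counts)
    # drop the top 5% of samples, bill on the highest remaining one
    index = max(int(0.95 * len(rates)) - 1, 0)
    return rates[index]

# Example: one month of 5-minute samples with a short traffic spike;
# the spike falls within the discarded top 5% and does not raise the bill.
samples = [10_000_000] * 8500 + [500_000_000] * 140
print(f"{percentile_95(samples) / 1e6:.2f} Mbit/s")
```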


Passive and Active Network Measurement | 2008

A two-layered anomaly detection technique based on multi-modal flow behavior models

Marc Ph. Stoecklin; Jean-Yves Le Boudec; Andreas Kind

We present a novel technique to detect traffic anomalies based on network flow behavior in different traffic features. Based on the observation that a network has multiple behavior modes, we estimate the modes in each feature component and extract their model parameters during a learning phase. Observed network behavior is then compared to the baseline models by means of a two-layered distance computation: first, component-wise anomaly indices and, second, a global anomaly index for each traffic feature enable effective detection of aberrant behavior. Our technique supports on-line detection and incorporation of administrator feedback and does not make use of explicit prior knowledge about normal and abnormal traffic. We expect the chosen modeling and detection strategy to reliably expose abnormal events of diverse nature at both detection layers while remaining resilient to seasonal effects. Experiments on simulated and real network traces confirm our expectations in detecting true anomalies without increasing the false positive rate. A comparison of our technique with entropy- and histogram-based approaches demonstrates its ability to reveal anomalies that disappear in the background noise of output signals from these techniques.
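
As a rough illustration of the two-layered idea (not the paper's multi-modal behavior models), the sketch below computes a per-component anomaly index against a learned baseline and then combines the component indices into a global index for the feature; a simple z-score stands in for the per-component models.

```python
# Illustrative two-layer anomaly scoring, not the paper's multi-modal models.
# Layer 1: per-component anomaly indices (simple z-scores against a learned
# mean/stddev per component).
# Layer 2: a global anomaly index for the traffic feature combining them.
import math

def learn_baseline(training):
    """training: list of component vectors observed during the learning phase."""
    n, dims = len(training), len(training[0])
    means = [sum(v[d] for v in training) / n for d in range(dims)]
    stds = [max(math.sqrt(sum((v[d] - means[d]) ** 2 for v in training) / n), 1e-6)
            for d in range(dims)]
    return means, stds

def anomaly_indices(observation, baseline):
    means, stds = baseline
    component = [abs(x - m) / s for x, m, s in zip(observation, means, stds)]
    global_index = math.sqrt(sum(c * c for c in component) / len(component))
    return component, global_index

# Example: components could be the shares of a few port groups in a feature histogram.
baseline = learn_baseline([[0.60, 0.30, 0.10], [0.58, 0.31, 0.11], [0.62, 0.29, 0.09]])
print(anomaly_indices([0.20, 0.30, 0.50], baseline))
```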


Computer Networks | 2008

The eternal sunshine of the sketch data structure

Xenofontas A. Dimitropoulos; Marc Ph. Stoecklin; Paul Hurley; Andreas Kind

In recent years, there has been significant research on developing compact data structures for summarizing large data streams. One such family of data structures is the so-called sketches. Sketches bear similarities to the well-known Bloom filters [B.H. Bloom, Space/time trade-offs in hash coding with allowable errors, Communications of the ACM, 13 (7) (1970), 422-426] and employ hashing techniques to approximate the count associated with an arbitrary key in a data stream using fixed memory resources. One limitation of sketches is that when used for summarizing long data streams, they gradually saturate, resulting in a potentially large error in estimated key counts. In this work, we introduce two techniques to address this problem based on the observation that real-world data streams often have many transient keys that appear for short time periods and do not re-appear later on. After entering the data structure, these keys contribute to hashing collisions and thus reduce the estimation accuracy of sketches. Our techniques use a limited amount of additional memory to detect transient keys and to periodically remove their hashed values from the sketch. In this manner, the number of keys hashed into a sketch decreases, and as a result the frequency of hashing collisions and the estimation error are reduced. Our first technique in effect slows down the saturation process of a sketch, whereas our second technique completely prevents a sketch from saturating. We demonstrate the performance improvements of our techniques analytically as well as experimentally. Our evaluation results using real network traffic traces show a reduction in the collision rate ranging between 26.1% and 98.2% and even higher savings in terms of estimation accuracy compared to a state-of-the-art sketch data structure. To our knowledge, this is the first work to look into the problem of improving the accuracy of sketches by mitigating their saturation process.
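
The following toy sketch illustrates the underlying idea of removing transient keys from a count-min-style sketch. Unlike the paper's techniques, it tracks every key exactly in auxiliary memory (which forfeits the memory advantage of a sketch); it only shows how subtracting the hashed contributions of idle keys keeps collisions down.

```python
# Toy count-min-style sketch with periodic removal of "transient" keys.
# Unlike the paper's techniques, this version remembers every key exactly
# (self.key_counts); it only illustrates the cleanup idea.
import hashlib

class CleanableSketch:
    def __init__(self, depth=4, width=2048):
        self.depth, self.width = depth, width
        self.counts = [[0] * width for _ in range(depth)]
        self.last_seen = {}     # key -> epoch in which the key was last updated
        self.key_counts = {}    # key -> total count contributed so far
        self.epoch = 0

    def _cells(self, key):
        for row in range(self.depth):
            digest = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8).digest()
            yield row, int.from_bytes(digest, "big") % self.width

    def update(self, key, count=1):
        for row, col in self._cells(key):
            self.counts[row][col] += count
        self.last_seen[key] = self.epoch
        self.key_counts[key] = self.key_counts.get(key, 0) + count

    def estimate(self, key):
        return min(self.counts[row][col] for row, col in self._cells(key))

    def cleanup(self, max_idle_epochs=2):
        """Remove contributions of keys not updated for max_idle_epochs epochs."""
        self.epoch += 1
        for key, seen in list(self.last_seen.items()):
            if self.epoch - seen > max_idle_epochs:
                for row, col in self._cells(key):
                    self.counts[row][col] -= self.key_counts[key]
                del self.last_seen[key], self.key_counts[key]

# Usage: call cleanup() periodically so one-off keys stop colliding with hot keys.
```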


IBM Journal of Research and Development | 2016

Security 360°: Enterprise security for the cognitive era

Josyula R. Rao; Suresh Chari; Dimitrios Pendarakis; Reiner Sailer; Marc Ph. Stoecklin; Wilfried Teiken; Andreas Wespi

The shift in the IT industry to a computing platform characterized by data-driven insights, cloud-based computing, and engagement via mobile and social platforms heralds the dawn of the cognitive era in computing. Historically, during the transition from the mainframe to the PC and web eras, we have witnessed how security innovation has followed the industry platform shift, and the shift to a cognitive platform is no exception. Enterprises are looking for a new security model and paradigm that will enable them to confront and cope with the multitude of security challenges facing them. Based on our experiences with using big data techniques for security intelligence (starting in 2007), especially in customer environments, we have developed an operational model of security for securing enterprises, cloud environments, and the critical infrastructure. In this paper, we introduce Security 360°, a contextual, cognitive, and adaptive approach to protecting mission-critical assets, and illustrate a top-down approach to security that forms the basis of security research at IBM today.


International Conference on Wireless Communications and Mobile Computing | 2010

A flow trace generator using graph-based traffic classification techniques

Peter Siska; Marc Ph. Stoecklin; Andreas Kind; Torsten Braun

We propose a novel methodology to generate realistic network flow traces to enable systematic evaluation of network monitoring systems in various traffic conditions. Our technique uses a graph-based approach to model the communication structure observed in real-world traces and to extract traffic templates. By combining extracted and user-defined traffic templates, realistic network flow traces that comprise normal traffic and customized conditions are generated in a scalable manner. A proof-of-concept implementation demonstrates the utility and simplicity of our method to produce a variety of evaluation scenarios. We show that the extraction of templates from real-world traffic leads to a manageable number of templates that still enable accurate re-creation of the original communication properties on the network flow level.
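
As a heavily simplified illustration (the paper's graph-based template extraction is much richer and also preserves communication structure, which this sketch does not attempt), the code below reduces a "template" to the empirical distribution of flow attributes observed in a trace and generates synthetic flows by sampling from it; the attribute choice and address scheme are invented for this example.

```python
# Heavily simplified template extraction and flow generation.
import random
from collections import Counter

def extract_template(flows):
    """Here a 'template' is just the empirical distribution of (dst_port, bytes)."""
    return Counter((dst_port, size) for _src, _dst, dst_port, size in flows)

def generate(template, n_flows, n_hosts=50):
    attributes, weights = zip(*template.items())
    synthetic = []
    for _ in range(n_flows):
        dst_port, size = random.choices(attributes, weights=weights)[0]
        src = f"10.0.0.{random.randrange(1, n_hosts)}"   # invented address scheme
        dst = f"10.0.1.{random.randrange(1, n_hosts)}"
        synthetic.append((src, dst, dst_port, size))
    return synthetic

observed = [("a", "b", 443, 1500), ("a", "c", 443, 1500), ("b", "c", 53, 120)]
print(generate(extract_template(observed), 5))
```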


Conference on Emerging Networking Experiments and Technologies | 2006

Anomaly detection by finding feature distribution outliers

Marc Ph. Stoecklin

In our project, we are developing a technique to detect traffic anomalies based on network flow behavior. We estimate baseline distributions for meaningful traffic features and derive measures of legitimate deviations thereof. Observed network behavior is then compared to the baseline behavior by means of a symmetrized version of the Kullback-Leibler divergence. The achieved dimension reduction enables effective outlier detection to flag deviations from the legitimate behavior with high precision. Our technique supports online training, provides enough information to efficiently classify observed anomalies, and allows in-depth analysis on demand. First measurements confirm its resilience to seasonal effects while detecting abnormal behavior reliably.
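
The comparison step can be illustrated with the symmetrized Kullback-Leibler divergence D(P||Q) + D(Q||P) between a baseline feature distribution P and an observed distribution Q. The binning, smoothing constant, and example values below are assumptions of this sketch, not taken from the paper.

```python
# Comparing an observed feature distribution to a baseline with a symmetrized
# Kullback-Leibler divergence.
import math

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def symmetrized_kl(p, q, eps=1e-9):
    # smooth and renormalize so neither distribution has zero-probability bins
    p = [x + eps for x in p]
    q = [x + eps for x in q]
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    return kl(p, q) + kl(q, p)

baseline = [0.70, 0.20, 0.08, 0.02]   # e.g. traffic shares over four port groups
observed = [0.30, 0.10, 0.10, 0.50]
print(symmetrized_kl(baseline, observed))   # a large value flags an outlier
```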


International Conference on Data Engineering | 2015

FCCE: Highly scalable distributed Feature Collection and Correlation Engine for low latency big data analytics

Douglas Lee Schales; Xin Hu; Jiyong Jang; Reiner Sailer; Marc Ph. Stoecklin; Ting Wang

In this paper, we present the design, architecture, and implementation of a novel analysis engine, called Feature Collection and Correlation Engine (FCCE), that finds correlations across a diverse set of data types spanning over large time windows with very small latency and with minimal access to raw data. FCCE scales well to collecting, extracting, and querying features from geographically distributed large data sets. FCCE has been deployed in a large production network with over 450,000 workstations for 3 years, ingesting more than 2 billion events per day and providing low latency query responses for various analytics. We explore two security analytics use cases to demonstrate how we utilize the deployment of FCCE on large diverse data sets in the cyber security domain: 1) detecting fluxing domain names of potential botnet activity and identifying all the devices in the production network querying these names, and 2) detecting advanced persistent threat infection. Both evaluation results and our experience with real-world applications show that FCCE yields superior performance over existing approaches, and excels in the challenging cyber security domain by correlating multiple features and deriving security intelligence.
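
To convey the flavor of a feature-correlation query (the store layout and query below are invented for this illustration, not FCCE's API), the sketch keeps features extracted from different data sources under (feature, key) pairs and joins them without touching raw data, for example to list devices that resolved a flagged domain.

```python
# Toy illustration of feature collection and correlation: features from
# different data sources are stored under keys and joined at query time.
# The schema and query are invented for this example, not FCCE's API.
from collections import defaultdict

class FeatureStore:
    def __init__(self):
        self._store = defaultdict(set)   # (feature_name, key) -> observed values

    def put(self, feature, key, value):
        self._store[(feature, key)].add(value)

    def get(self, feature, key):
        return self._store.get((feature, key), set())

store = FeatureStore()
# Ingest DNS events: which device queried which domain.
store.put("dns.queried_by", "evil-dga-domain.biz", "10.1.2.3")
store.put("dns.queried_by", "evil-dga-domain.biz", "10.1.7.9")
# Ingest threat intel: domains flagged as DGA-generated.
store.put("intel.dga_domains", "all", "evil-dga-domain.biz")

# Correlation query: all devices that resolved any flagged domain.
infected = set()
for domain in store.get("intel.dga_domains", "all"):
    infected |= store.get("dns.queried_by", domain)
print(infected)   # {'10.1.2.3', '10.1.7.9'}
```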


International Conference on Distributed Computing Systems | 2016

BotMeter: Charting DGA-Botnet Landscapes in Large Networks

Ting Wang; Xin Hu; Jiyong Jang; Shouling Ji; Marc Ph. Stoecklin; Teryl Taylor

Recent years have witnessed a rampant use of domain generation algorithms (DGAs) in major botnet crimeware, which tremendously strengthens a botnet's capability to evade detection or takedown. Despite a plethora of existing studies on detecting DGA-generated domains in DNS traffic, remediating such threats still relies on vetting the DNS behavior of each individual device. Yet, in large networks featuring complicated DNS infrastructures, we often lack the capability or the resources to exhaustively investigate every part of the network to identify infected devices in a timely manner. It is therefore of great interest to first assess the population distribution of DGA-bots inside the networks and to prioritize the remediation efforts. In this paper, we present BotMeter, a novel tool that accurately charts the DGA-bot population landscapes in large networks. Specifically, we embrace the prevalent yet challenging setting of hierarchical DNS infrastructures with caching and forwarding mechanisms enabled, where DNS traffic is observable only at certain upper-level vantage points. We establish a new taxonomy of DGAs that captures their characteristic DNS dynamics. This allows us to develop a rich library of rigorous analytical models to describe the complex relationships between bot populations and DNS lookups observed at vantage points. We provide results from extensive empirical studies using both synthetic data and real DNS traces to validate the efficacy of BotMeter.
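
A deliberately simplified toy model (not BotMeter's analytics) shows why bot populations can be related to lookups seen above a caching resolver: if each bot queries k domains drawn uniformly from a DGA pool of N daily domains, the expected number of distinct domains observed grows with the bot count B and can be inverted to estimate B. The pool size and per-bot query count below are assumptions.

```python
# Toy model relating DGA-bot population to distinct domains observed upstream
# of a caching resolver (not the paper's analytical models).
import math

def expected_distinct(bots, pool_size, queries_per_bot):
    # probability a given domain is queried by no bot, then expected distinct count
    miss = (1 - queries_per_bot / pool_size) ** bots
    return pool_size * (1 - miss)

def estimate_bots(observed_distinct, pool_size, queries_per_bot):
    # invert the model above to recover the bot count
    frac_unseen = 1 - observed_distinct / pool_size
    return math.log(frac_unseen) / math.log(1 - queries_per_bot / pool_size)

N, k = 1000, 50                        # assumed DGA pool size and per-bot query count
print(expected_distinct(20, N, k))     # ~642 distinct domains for 20 bots
print(estimate_bots(642, N, k))        # ~20 bots estimated back from the observation
```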


Information Reuse and Integration | 2014

Stream computing for large-scale, multi-channel cyber threat analytics

Douglas Lee Schales; Mihai Christodorescu; Xin Hu; Jiyong Jang; Josyula R. Rao; Reiner Sailer; Marc Ph. Stoecklin; Wietse Z. Venema; Ting Wang

The cyber threat landscape, controlled by organized crime and nation states, is evolving rapidly towards evasive, multi-channel attacks, as impressively shown by malicious operations such as GhostNet, Aurora, Stuxnet, Night Dragon, or APT1. As threats blend across diverse data channels, their detection requires scalable distributed monitoring and cross-correlation with a substantial amount of contextual information. With threats evolving more rapidly, the classical defense life cycle of post-mortem detection, analysis, and signature creation becomes less effective. In this paper, we present a highly scalable, dynamic cybersecurity analytics platform that is extensible at runtime. It is specifically designed and implemented to deliver generic capabilities as a basis for future cybersecurity analytics that effectively detect threats across multiple data channels while recording relevant context information, and that support automated learning and mining for new and evolving malware behaviors. Our implementation is based on stream computing middleware with proven high scalability that enables cross-correlation and analysis of millions of events per second with millisecond latency. We report the lessons we have learned from applying stream computing to monitoring malicious activity across multiple data channels (e.g., DNS, NetFlow, ARP, DHCP, HTTP) in a production network of about fifteen thousand nodes.
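
As a minimal illustration of cross-channel correlation in a streaming style (event formats invented for this example; the paper's platform is built on stream computing middleware, not on this sketch), DNS answers are retained as context so that subsequent NetFlow records to a resolved IP can be attributed to the domain that was looked up.

```python
# Toy cross-channel correlation: keep DNS answers as context, then attribute
# later NetFlow connections to the domain that was resolved. Event formats
# are invented for this illustration.
def correlate(events):
    dns_context = {}                     # resolved IP -> (querying host, domain)
    for event in events:
        if event["channel"] == "dns":
            dns_context[event["answer_ip"]] = (event["host"], event["domain"])
        elif event["channel"] == "netflow" and event["dst_ip"] in dns_context:
            _host, domain = dns_context[event["dst_ip"]]
            yield {"host": event["src_ip"], "domain": domain, "dst_ip": event["dst_ip"]}

stream = [
    {"channel": "dns", "host": "10.0.0.5", "domain": "bad.example", "answer_ip": "203.0.113.9"},
    {"channel": "netflow", "src_ip": "10.0.0.5", "dst_ip": "203.0.113.9"},
]
print(list(correlate(stream)))
```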

Collaboration


Dive into Marc Ph. Stoecklin's collaboration.

Top Co-Authors


Reiner Sailer

University of California


Teryl Taylor

University of North Carolina at Chapel Hill
