Andrey Sapegin | Researchain

Publication

Featured research published by Andrey Sapegin.

Information Assurance and Security | 2013

Hierarchical object log format for normalisation of security events

Andrey Sapegin; David Jaeger; Amir Azodi; Marian Gawron; Feng Cheng; Christoph Meinel

Differences in the log file formats used by various services and applications remain a problem for security analysts and developers of intrusion detection systems. The proposed solution, i.e. the use of common log formats, has seen limited adoption within existing security management solutions. In our paper, we reveal the reasons for this limitation. We show the disadvantages of existing common log formats for normalisation of security events. To deal with them, we have created a new log format that is suited to intrusion detection purposes and can be extended easily. Taking previous work into account, we propose the new format as an extension to existing common log formats, rather than as a standalone specification.

Computer Communications | 2013

On the extent of correlation in BGP updates in the Internet and what it tells us about locality of BGP routing events

Andrey Sapegin; Steve Uhlig

The Border Gateway Protocol (BGP) is the core routing protocol in the Internet. It maintains reachability information towards IP networks, called prefixes. The adoption of BGP has come at a price: a steady growth in the routing table size (Meng et al., 2005) [1] as well as in the number of BGP updates (Cittadini et al., 2010) [2]. In this work, we take a different look at BGP updates, by quantifying the amount of prefix correlation in the BGP updates received by different routers in the Internet. We design a method to classify sets of BGP updates, called spikes, as either correlated or non-correlated, by comparing streams of BGP updates from multiple vantage points. Based on publicly available data, we show that a significant fraction of all BGP updates are correlated. Most of these correlated spikes contain updates for only a few BGP prefixes. When studying the topological scope of the correlated spikes, we find that they are relatively global given the limited AS hop distance between most ASs in the Internet, i.e., they propagate at least 2 or 3 AS hops away. Most BGP updates visible from publicly available vantage points are therefore related to small events that propagate across multiple AS hops in the Internet, while a limited fraction of the BGP updates appear in large bursts that stay mostly localised. Our results shed light on a fundamental yet often misunderstood aspect of BGP, namely the correlation between BGP updates and how it impacts our beliefs about the share of local and global BGP events in the Internet. Our work differs from the literature in that we try as much as possible to account explicitly in our methodology for the visibility of BGP vantage points, and its implications for the actual claims that can be made from the data.
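
The idea of comparing update streams from several vantage points can be illustrated with a small sketch. The abstract does not give the actual classification rule, so everything here is an assumption made for illustration: the function names, the fixed time window, and the `min_overlap` threshold are hypothetical and are not the authors' method.

```python
from typing import Iterable, Set, Tuple

# Hypothetical record: (timestamp in seconds, announced or withdrawn prefix).
Update = Tuple[float, str]


def prefixes_in_window(updates: Iterable[Update], start: float, end: float) -> Set[str]:
    """Collect the prefixes that one vantage point saw updates for in [start, end]."""
    return {prefix for timestamp, prefix in updates if start <= timestamp <= end}


def is_correlated(
    spike_prefixes: Set[str],
    other_updates: Iterable[Update],
    start: float,
    end: float,
    min_overlap: float = 0.5,  # assumed threshold, not taken from the paper
) -> bool:
    """Call a spike correlated if another vantage point saw updates for
    at least `min_overlap` of its prefixes within the same time window."""
    if not spike_prefixes:
        return False
    seen_elsewhere = prefixes_in_window(other_updates, start, end)
    shared = len(spike_prefixes & seen_elsewhere)
    return shared / len(spike_prefixes) >= min_overlap


# Usage: a two-prefix spike at one vantage point, checked against a second one.
second_vantage_point = [
    (10.0, "10.0.0.0/8"),
    (12.0, "192.0.2.0/24"),
    (500.0, "198.51.100.0/24"),
]
spike = {"10.0.0.0/8", "192.0.2.0/24"}
print(is_correlated(spike, second_vantage_point, 0.0, 60.0))
```

A set-overlap test is only one possible notion of prefix correlation; it ignores update type, AS path, and vantage point visibility, all of which the paper says it takes into account.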

International Conference on Passwords | 2014

Gathering and Analyzing Identity Leaks for Security Awareness

David Jaeger; Hendrik Graupner; Andrey Sapegin; Feng Cheng; Christoph Meinel

The number of identity data leaks has been increasing drastically in recent times. Not only smaller web services, but also established technology companies are affected. However, it is not commonly known that incidents covered by the media are just the tip of the iceberg. Accordingly, a more detailed investigation of not just the publicly accessible parts of the web but also the deep web is imperative to gain greater insight into the large number of data leaks. This paper presents the methods and experiences of our deep web analysis. We give insight into commonly used platforms for data exposure, formats of identity-related data leaks, and the methods of our analysis. On the one hand, there is a lack of security implementations among Internet service providers, and on the other hand, users still tend to generate and reuse weak passwords. By publishing our results, we aim to increase awareness on both sides and to encourage the establishment of countermeasures.

Concurrency and Computation: Practice and Experience | 2017

Evaluation of in‐memory storage engine for machine learning analysis of security events

Andrey Sapegin; Marian Gawron; David Jaeger; Feng Cheng; Christoph Meinel

Modern security information and event management systems should be capable of storing and processing large numbers of events or log messages in different formats and from different sources. This requirement often prevents such systems from using computationally heavy algorithms for security analysis. To deal with this issue, we built our system on an in-memory database with an integrated machine learning library, namely, SAP HANA. Three approaches, that is, (1) deep normalisation of log messages, (2) storing data in the main memory and (3) running data analysis directly in the database, allow us to increase processing speed in such a way that machine learning analysis of security events becomes possible nearly in real time. Besides that, we developed a universal anomaly detection algorithm, which uses a vector space model to represent and cluster textual log messages. Together with the deep normalisation approach, this algorithm solves the problem of correlation for heterogeneous security events containing many text fields. To prove our concepts, we measured the processing speed of the developed system on data generated using an Active Directory testbed, compared it with a classical system architecture based on a PostgreSQL database and showed the efficiency of our approach for high-speed analysis of security events.
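
To make the vector space model concrete, the following minimal sketch turns log messages into term-frequency vectors and compares them by cosine similarity, which is the kind of distance a clustering step could build on. It is not the algorithm from the paper: whitespace tokenisation, raw term counts, and the helper names `to_vector` and `cosine_similarity` are assumptions chosen for brevity.

```python
import math
from collections import Counter


def to_vector(message: str) -> Counter:
    """Represent a log message as a bag of lower-cased, whitespace-separated terms."""
    return Counter(message.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Usage: two similar authentication messages and one unrelated message.
first = to_vector("Logon failure for user alice")
second = to_vector("Logon failure for user bob")
third = to_vector("Disk quota exceeded")
print(cosine_similarity(first, second) > cosine_similarity(first, third))
```

Messages that share most of their terms end up close together, so a clustering algorithm working on these vectors would tend to group events of the same type regardless of their source format.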

International Symposium on Parallel and Distributed Computing | 2015

High-Speed Security Analytics Powered by In-Memory Machine Learning Engine

Andrey Sapegin; Marian Gawron; David Jaeger; Feng Cheng; Christoph Meinel

Modern Security Information and Event Management systems should be capable of storing and processing large numbers of events or log messages in different formats and from different sources. This requirement often prevents such systems from using computationally heavy algorithms for security analysis. To deal with this issue, we built our system on an in-memory database with an integrated machine learning library, namely SAP HANA. Three approaches, i.e. (1) deep normalisation of log messages, (2) storing data in the main memory and (3) running data analysis directly in the database, allow us to increase processing speed in such a way that machine learning analysis of security events becomes possible nearly in real time. To prove our concepts, we measured the processing speed of the developed system on data generated using an Active Directory testbed and showed the efficiency of our approach for high-speed analysis of security events.

Computers & Security | 2017

Towards a system for complex analysis of security events in large-scale networks

Andrey Sapegin; David Jaeger; Feng Cheng; Christoph Meinel

After almost two decades of development, modern Security Information and Event Management (SIEM) systems still face issues with normalisation of heterogeneous data sources, a high number of false positive alerts and long analysis times, especially in large-scale networks with high volumes of security events. In this paper, we present our own prototype of a SIEM system, which is capable of dealing with these issues. For efficient data processing, our system employs in-memory data storage (SAP HANA) and our own technologies from previous work, such as the Object Log Format (OLF) and high-speed event normalisation. We analyse normalised data using a combination of three different approaches for security analysis: misuse detection, query-based analytics, and anomaly detection. Compared to the previous work, we have significantly improved our unsupervised anomaly detection algorithms. Most importantly, we have developed a novel hybrid outlier detection algorithm that returns ranked clusters of anomalies. It lets an operator of a SIEM system concentrate on the few top-ranked anomalies, instead of digging through an unsorted bundle of suspicious events. We propose to use anomaly detection in combination with signatures and queries, applied to the same data, rather than as a full replacement for misuse detection. In this case, the majority of attacks will be captured by misuse detection, whereas anomaly detection will highlight previously unknown behaviour or attacks. We also propose that only the most suspicious event clusters need to be checked by an operator, whereas other anomalies, including false positive alerts, do not need to be explicitly checked if they have a lower ranking. We have validated our concepts and algorithms on a dataset of 160 million events from a network segment of a large multinational company and suggest that our approach and methods are highly relevant for modern SIEM systems.

International Conference on Health Informatics | 2015

Implementation of Data Security Requirements in a Web-based Application for Interactive Medical Documentation

Anja Perlich; Andrey Sapegin; Christoph Meinel

Keeping data confidential is a deeply rooted requirement in medical documentation. However, there are increasing calls for patient transparency in medical record documentation. With Tele-Board MED, an interactive system for joint documentation by doctor and patient is being developed. This web-based application, designed for digital whiteboards, will be tested in treatment sessions with psychotherapy patients and therapists. In order to ensure the security of patient data, security measures were implemented, and they are illustrated in this paper. We followed the major information security objectives: confidentiality, integrity, availability and accountability. In addition to technical aspects, such as data encryption, access restriction through firewall and password, and measures for remote maintenance, we address issues at organizational and infrastructural levels as well (e.g., patients’ access to notes). With this paper we want to increase awareness of information security and promote security design from the beginning of health software research projects. The measures described in this paper can serve as an example for other health software applications dealing with sensitive patient data, from the early user testing phases onwards.

MSPN 2015 Selected Papers of the First International Conference on Mobile, Secure, and Programmable Networking - Volume 9395 | 2015

Poisson-Based Anomaly Detection for Identifying Malicious User Behaviour

Andrey Sapegin; Aragats Amirkhanyan; Marian Gawron; Feng Cheng; Christoph Meinel

Nowadays, malicious user behaviour that does not trigger an access violation or a data leak alert is difficult to detect. Using stolen login credentials, an intruder engaged in espionage will first try to stay undetected: silently collecting data from the company network and using only resources he is authorised to access. To deal with such cases, a Poisson-based anomaly detection algorithm is proposed in this paper. Two extra measures make it possible to achieve high detection rates while reducing the number of false positive alerts: (1) checking the probability first for the group, and then for single users, and (2) selecting the threshold automatically. To validate the proposed approach, we developed a special simulation testbed that emulates user behaviour in a virtual network environment. The proof-of-concept implementation has been integrated into our prototype of a SIEM system, the Real-time Event Analysis and Monitoring System, where the emulated Active Directory logs from a Microsoft Windows domain are extracted and normalised into the Object Log Format for further processing and anomaly detection. The experimental results show that our algorithm was able to detect all events related to malicious activity and produced zero false positive results. Designed as a module for our self-developed SIEM system based on the SAP HANA in-memory database, our solution is capable of processing high volumes of data and shows high efficiency on the experimental dataset.
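
A Poisson-based check of this kind can be sketched in a few lines. The sketch below assumes that event counts per time interval follow a Poisson distribution whose rate has been estimated from past data, and it flags a count only when it is improbable for the group and for the individual user. The abstract does not specify how the two checks are combined or how the threshold is selected automatically, so the fixed `threshold`, the combination rule, and the function names are hypothetical.

```python
import math


def poisson_tail(observed: int, rate: float) -> float:
    """P(X >= observed) for X ~ Poisson(rate), computed from the complement of the CDF."""
    if observed <= 0:
        return 1.0
    cdf = sum(math.exp(-rate) * rate**i / math.factorial(i) for i in range(observed))
    return max(0.0, 1.0 - cdf)


def is_anomalous(
    observed: int,
    group_rate: float,
    user_rate: float,
    threshold: float = 1e-3,  # assumed fixed value; the paper selects it automatically
) -> bool:
    """Flag a count only if it is unlikely for the group and then also for the user."""
    if poisson_tail(observed, group_rate) >= threshold:
        return False  # plausible for the group, so no need to check the user
    return poisson_tail(observed, user_rate) < threshold


# Usage: 30 logon events in an interval where roughly 2 are expected.
print(is_anomalous(30, group_rate=2.0, user_rate=2.0))
print(is_anomalous(3, group_rate=2.0, user_rate=2.0))
```

Checking the group first means that a busy but normal period for a whole department does not raise one alert per user, which is one plausible way the two-stage check could reduce false positives.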

MSPN 2015 Selected Papers of the First International Conference on Mobile, Secure, and Programmable Networking - Volume 9395 | 2015

Leveraging Event Structure for Adaptive Machine Learning on Big Data Landscapes

Amir Azodi; Marian Gawron; Andrey Sapegin; Feng Cheng; Christoph Meinel

Modern machine learning techniques have been applied to many aspects of network analytics in order to discover patterns that can clarify or better demonstrate the behavior of users and systems within a given network. Often the information to be processed has to be converted to a different type in order for machine learning algorithms to be able to process it. To accurately process the information generated by systems within a network, the true intention and meaning behind the information must be observed. In this paper we propose different approaches for mapping network information such as IP addresses to integer values that attempt to keep the relations present in the original format of the information intact. With one exception, all of the proposed mappings result in outputs of at most 64 bits in order to allow atomic operations on CPUs with 64-bit registers. The mapping output size is restricted in the interest of performance. Additionally, we demonstrate the benefits of the new mappings for one specific machine learning algorithm, k-means, and compare the algorithm's results for datasets with and without the proposed transformations.
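
As an illustration of a structure-preserving mapping, the sketch below converts an IPv4 address to its 32-bit big-endian integer using Python's standard `ipaddress` module. This is the plain textbook conversion, shown here as an assumption; the abstract does not describe the mappings actually proposed in the paper. The helper `shared_prefix_bits` is a hypothetical name used to show that addresses in the same subnet stay related after the mapping.

```python
import ipaddress


def ipv4_to_int(address: str) -> int:
    """Map a dotted-quad IPv4 address to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(address))


def shared_prefix_bits(a: str, b: str) -> int:
    """Number of leading bits two IPv4 addresses have in common (0 to 32)."""
    difference = ipv4_to_int(a) ^ ipv4_to_int(b)
    return 32 - difference.bit_length()


# Usage: two hosts in the same /24 share at least 24 leading bits.
print(ipv4_to_int("10.0.0.1"))
print(shared_prefix_bits("192.168.1.10", "192.168.1.200"))
```

The result fits comfortably in a 64-bit register, and numerically close values correspond to addresses in the same network, which is the kind of relation a distance-based algorithm such as k-means can exploit.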

International Performance Computing and Communications Conference | 2015

Parallel and distributed normalization of security events for instant attack analysis

David Jaeger; Andrey Sapegin; Martin Ussath; Feng Cheng; Christoph Meinel

When looking at media reports nowadays, major security breaches at big companies and governments seem to be a normal situation. An important step in the investigation or even prevention of these breaches is to normalize and analyze security-related log events from various systems in the target network. However, the number of log events produced in big IT landscapes can grow to multiple billions per day. Current log management solutions, e.g., Security Information and Event Management (SIEM), come nowhere near normalizing such huge amounts of data and therefore prevent the tracking of attacks in real time, which means that the log data remains unusable for attack analysis. In this paper, we present an approach to fully normalize event logs at high speed by making use of established high-performance inter-thread messaging in conjunction with a hierarchical knowledge base of log formats and parallel processing on multiple low-end systems. Using our approach, we are able to process more than 250,000 events/sec on relatively low-profile machines and can therefore easily handle more than 20 billion events/day, which is enough to handle average and peak loads of log events from big enterprise networks.
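
The daily figure follows directly from the per-second rate. The short calculation below assumes the stated throughput is sustained for a full 24 hours; the function name is only illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def events_per_day(events_per_second: int) -> int:
    """Events processed in one day at a constant per-second rate."""
    return events_per_second * SECONDS_PER_DAY


# 250,000 events/sec sustained for a day gives 21.6 billion events.
print(f"{events_per_day(250_000):,}")
```

At 250,000 events/sec this gives 21.6 billion events per day, consistent with the "more than 20 billion events/day" quoted in the abstract.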
