Yingbo Song | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Yingbo Song is active.

Explore More

Publication

Featured researches published by Yingbo Song.

network and distributed system security symposium | 2009

Spectrogram: A Mixture-of-Markov-Chains Model for Anomaly Detection in Web Traffic

Yingbo Song; Angelos D. Keromytis; Salvatore J. Stolfo

We present Spectrogram, a machine learning based statistical anomaly detection (AD) sensor for defense against web-layer code-injection attacks. These attacks include PHP file inclusion, SQL-injection and cross-sitescripting; memory-layer exploits such as buffer overflows are addressed as well. Statistical AD sensors offer the advantage of being driven by the data that is being protected and not by malcode samples captured in the wild. While models using higher order statistics can often improve accuracy, trade-offs with false-positive rates and model efficiency remain a limiting usability factor. This paper presents a newmodel and sensor framework that offers a favorable balance under this constraint and demonstrates improvement over some existing approaches. Spectrogram is a network situated sensor that dynamically assembles packets to reconstruct content flows and learns to recognize legitimate web-layer script input. We describe an efficient model for this task in the form of a mixture of Markovchains and derive the corresponding training algorithm. Our evaluations show significant detection results on an array of real world web layer attacks, comparing favorably against other AD approaches.

european conference on machine learning | 2007

Spectral Clustering and Embedding with Hidden Markov Models

Tony Jebara; Yingbo Song; Kapil Thadani

Clustering has recently enjoyed progress via spectral methods which group data using only pairwise affinities and avoid parametric assumptions. While spectral clustering of vector inputs is straightforward, extensions to structured data or time-series data remain less explored. This paper proposes a clustering method for time-series data that couples non-parametric spectral clustering with parametric hidden Markov models (HMMs). HMMs add some beneficial structural and parametric assumptions such as Markov properties and hidden state variables which are useful for clustering. This article shows that using probabilistic pairwise kernel estimates between parametric models provides improved experimental results for unsupervised clustering and visualization of real and synthetic datasets. Results are compared with a fully parametric baseline method (a mixture of hidden Markov models) and a non-parametric baseline method (spectral clustering with non-parametric time-series kernels).

recent advances in intrusion detection | 2009

Brave New World: Pervasive Insecurity of Embedded Network Devices

Ang Cui; Yingbo Song; Pratap V. Prabhu; Salvatore J. Stolfo

Embedded network devices have become an ubiquitous fixture in the modern home, office as well as in the global communication infrastructure. Devices like routers, NAS appliances, home entertainment appliances, wifi access points, web cams, VoIP appliances, print servers and video conferencing units reside on the same networks as our personal computers and enterprise servers and together form our world-wide communication infrastructure. Widely deployed and often misconfigured, they constitute highly attractive targets for exploitation. In this study we present the results of a vulnerability assessment of embedded network devices within the worlds largest ISPs and civilian networks, spanning North America, Europe and Asia. The observed data confirms the intuition that these devices are indeed vulnerable to trivial attacks and that such devices can be found throughout the world in large numbers.

Machine Learning | 2010

On the infeasibility of modeling polymorphic shellcode

Yingbo Song; Michael E. Locasto; Angelos Stavrou; Angelos D. Keromytis; Salvatore J. Stolfo

Current trends demonstrate an increasing use of polymorphism by attackers to disguise their exploits. The ability for malicious code to be easily, and automatically, transformed into semantically equivalent variants frustrates attempts to construct simple, easily verifiable representations for use in security sensors. In this paper, we present a quantitative analysis of the strengths and limitations of shellcode polymorphism, and describe the impact that these techniques have in the context of learning-based IDS systems. Our examination focuses on dual problems: shellcode encryption-based evasion methods and targeted “blending” attacks. Both techniques are currently being used in the wild, allowing real exploits to evade IDS sensors. This paper provides metrics to measure the effectiveness of modern polymorphic engines and provide insights into their designs. We describe methods to evade statistics-based IDS sensors and present suggestions on how to defend against them. Our experimental results illustrate that the challenge of modeling self-modifying shellcode by signature-based methods, and certain classes of statistical models, is likely an intractable problem.

ieee symposium on security and privacy | 2013

System Level User Behavior Biometrics using Fisher Features and Gaussian Mixture Models

Yingbo Song; Malek Ben Salem; Shlomo Hershkop; Salvatore J. Stolfo

We propose a machine learning-based method for biometric identification of user behavior, for the purpose of masquerade and insider threat detection. We designed a sensor that captures system-level events such as process creation, registry key changes, and file system actions. These measurements are used to represent a users unique behavior profile, and are refined through the process of Fisher feature selection to optimize their discriminative significance. Finally, a Gaussian mixture model is trained for each user using these features. We show that this system achieves promising results for user behavior modeling and identification, and surpasses previous works in this area.

Archive | 2007

On the Infeasibility of Modeling Polymorphic Shellcode for Signature Detection

Yingbo Song; Michael E. Locasto; Angelos Stavrou; Angelos D. Keromytis; Salvatore J. Stolfo

Polymorphic malcode remains one of the most troubling threats for information security and intrusion defense systems. The ability for malcode to be automatically transformed into to a semantically equivalent variant frustrates attempts to construct a single, simple, easily verifiable representation. We present a quantitative analysis of the strengths and limitations of shellcode polymorphism and consider the impact of this analysis on the current practices in intrusion detection. Our examination focuses on the nature of shellcode decoding routines, and the empirical evidence we gather illustrates our main result: that the challenge of modeling the class of self-modifying code is likely intractable – even when the size of the instruction sequence (i.e., the decoder) is relatively small. We develop metrics to gauge the power of polymorphic engines and use them to provide insight into the strengths and weaknesses of some popular engines. We believe this analysis supplies a novel and useful way to understand the limitations of the current generation of signature-based techniques. We analyze some contemporary polymorphic techniques, explore ways to improve them in order to forecast the nature of future threats, and present our suggestions for countermeasures. Our results indicate that the class of polymorphic behavior is too greatly spread and varied to model effectively. We conclude that modeling normal content is ultimately a more promising defense mechanism than modeling malicious or abnormal content.

ieee international conference on technologies for homeland security | 2011

Behavior-based network traffic synthesis

Yingbo Song; Salvatore J. Stolfo; Tony Jebara

Modern network security research has demonstrated a clear necessity for open sharing of traffic datasets between organizations - a need that has so far been superseded by the challenges of removing sensitive content from the data beforehand. Network Data Anonymization is an emerging field dedicated to solving this problem, with a main focus on removal of identifiable artifacts that might pierce privacy, such as usernames and IP addresses. However, recent research has demonstrated that more subtle statistical artifacts may yield fingerprints that are just as differen-tiable as the former. This result highlights certain shortcomings in current anonymization frameworks; particularly, ignoring the behavioral idiosyncrasies of network protocols, applications, and users. Network traffic synthesis (or simulation) is a closely related complimentary approach which, while more difficult to execute accurately, has the potential for far greater flexibility. This paper leverages the statistical-idiosyncrasies of network behavior to augment anonymization and traffic-synthesis techniques through machine-learning models specifically designed to capture host-level behavior. We present the design of a system that can automatically learn models for network host behavior across time, then use these models to replicate the original behavior, to interpolate across gaps in the original traffic, and demonstrate how to generate new diverse behaviors. Further, we measure the similarity of the synthesized data to the original, providing us with a quantifiable estimate of data fidelity.

Archive | 2011

Markov Models for Network-Behavior Modeling and Anonymization

Yingbo Song; Salvatore J. Stolfo; Tony Jebara

Modern network security research has demonstrated a clear need for open sharing of traffic datasets between organizations, a need that has so far been superseded by the challenge of removing sensitive content beforehand. Network Data Anonymization (NDA) is emerging as a field dedicated to this problem, with its main direction focusing on removal of identifiable artifacts that might pierce privacy, such as usernames and IP addresses. However, recent research has demonstrated that more subtle statistical artifacts, also present, may yield fingerprints that are just as differentiable as the former. This result highlights certain shortcomings in current anonymization frameworks – particularly, ignoring the behavioral idiosyncrasies of network protocols, applications, and users. Recent anonymization results have shown that the extent to which utility and privacy can be obtained is mainly a function of the information in the data that one is aware and not aware of. This paper leverages the predictability of network behavior in our favor to augment existing frameworks through a new machine-learning-driven anonymization technique. Our approach uses the substitution of individual identities with group identities where members are divided based on behavioral similarities, essentially providing anonymityby-crowds in a statistical mix-net. We derive time-series models for network traffic behavior which quantifiably models the discriminative features of network ”behavior” and introduce a kernelbased framework for anonymity which fits together naturally with network-data modeling.

mobile computing applications and services | 2016

You Are What You Use: An Initial Study of Authenticating Mobile Users via Application Usage

Jonathan Voris; Yingbo Song; Malek Ben Salem; Salvatore J. Stolfo

Mobile smartphone devices are vulnerable to masquerade attacks because they can be easily lost or stolen. This paper introduces a technique for detecting unauthorized users by modeling the legitimate users typical behavior when using their mobile phone. The users behavior model augments typical authentication mechanisms (such as PINs or fingerprints) to provide continuous authentication while a device is in use. A preliminary human user study was conducted in order to assess the viability of our application usage oriented authentication approach. The results of our initial experiment demonstrate that our system is capable of detecting an unauthorized user within 2 minutes.

Archive | 2012

A behavior-based approach towards statistics-preserving network trace anonymization

Salvatore J. Stolfo; Yingbo Song

In modern network measurement research, there exists a clear and demonstrable need for open sharing of large-scale network traffic datasets between organizations. Beyond network measurement, many security-related fields, such as those focused on detecting new exploits or worm outbreaks, stand to benefit given the ability to easily correlate information between several different sources. Currently, the primary factor limiting such sharing is the risk of disclosing private information. While prior anonymization work has focused on traffic content, analysis based on statistical behavior patterns within network traffic has, so far, been under-explored. This thesis proposes a new behavior-based approach towards network trace source-anonymization, motivated by the concept of anonymity-by-crowds, and conditioned on the statistical similarity in host behavior. Novel time-series models for network traffic and kernel metrics for similarity are derived, and the problem is framed such that anonymity and statistics-preservation are congruent objectives in an unsupervised-learning problem. Source-anonymity is connected directly to the group size and homogeneity under this approach, and metrics for these properties are derived. Optimal segmentation of the population into anonymized groups is approximated with a graph-partitioning problem where maximization of this anonymity metric is an intrinsic property of the solution. Algorithms that guarantee a minimum anonymity-set size are presented, as well as novel techniques for behavior visualization and compression. Empirical evaluations on a range of network traffic datasets show significant advantages in both accuracy and runtime over similar solutions.

Explore More