Jiyong Jang | Researchain


Publication

Featured research published by Jiyong Jang.

international conference on data engineering | 2015

FCCE: Highly scalable distributed Feature Collection and Correlation Engine for low latency big data analytics

Douglas Lee Schales; Xin Hu; Jiyong Jang; Reiner Sailer; Marc Ph. Stoecklin; Ting Wang

In this paper, we present the design, architecture, and implementation of a novel analysis engine, called Feature Collection and Correlation Engine (FCCE), that finds correlations across a diverse set of data types spanning over large time windows with very small latency and with minimal access to raw data. FCCE scales well to collecting, extracting, and querying features from geographically distributed large data sets. FCCE has been deployed in a large production network with over 450,000 workstations for 3 years, ingesting more than 2 billion events per day and providing low latency query responses for various analytics. We explore two security analytics use cases to demonstrate how we utilize the deployment of FCCE on large diverse data sets in the cyber security domain: 1) detecting fluxing domain names of potential botnet activity and identifying all the devices in the production network querying these names, and 2) detecting advanced persistent threat infection. Both evaluation results and our experience with real-world applications show that FCCE yields superior performance over existing approaches, and excels in the challenging cyber security domain by correlating multiple features and deriving security intelligence.

international conference on distributed computing systems | 2016

BotMeter: Charting DGA-Botnet Landscapes in Large Networks

Ting Wang; Xin Hu; Jiyong Jang; Shouling Ji; Marc Ph. Stoecklin; Teryl Taylor

Recent years have witnessed a rampant use of domain generation algorithms (DGAs) in major botnet crimeware, which tremendously strengthens a botnet's capability to evade detection or takedown. Despite a plethora of existing studies on detecting DGA-generated domains in DNS traffic, remediating such threats still relies on vetting the DNS behavior of each individual device. Yet, in large networks featuring complicated DNS infrastructures, we often lack the capability or the resources to exhaustively investigate every part of the networks to identify infected devices in a timely manner. It is therefore of great interest to first assess the population distribution of DGA-bots inside the networks and to prioritize the remediation efforts. In this paper, we present BotMeter, a novel tool that accurately charts the DGA-bot population landscapes in large networks. Specifically, we embrace the prevalent yet challenging setting of hierarchical DNS infrastructures with caching and forwarding mechanisms enabled, whereas DNS traffic is observable only at certain upper-level vantage points. We establish a new taxonomy of DGAs that captures their characteristic DNS dynamics. This allows us to develop a rich library of rigorous analytical models to describe the complex relationships between bot populations and DNS lookups observed at vantage points. We provide results from extensive empirical studies using both synthetic data and real DNS traces to validate the efficacy of BotMeter.

international symposium on information theory | 2015

Rateless and pollution-attack-resilient network coding

Wentao Huang; Ting Wang; Xin Hu; Jiyong Jang; Theodoros Salonidis

Consider the problem of reliable multicast over a network in the presence of adversarial errors. In contrast to traditional network error correction codes designed for a given network capacity and a given number of errors, we study an arguably more realistic setting in which prior knowledge of the network and adversary parameters is not available. For this setting we propose efficient and throughput-optimal error correction schemes, provided that the source and terminals share randomness that is secret from the adversary. We discuss an application of cryptographic pseudorandom generators to efficiently produce the secret randomness, provided that a short key is shared between the source and terminals. Finally we present a secure key distribution scheme for our network setting.
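The idea of stretching a short shared key into a long stream of secret randomness can be illustrated with a minimal sketch. It is not the construction from the paper: it assumes HMAC-SHA256 in counter mode as the pseudorandom generator, and the function name `expand_key` and the `context` label are hypothetical.

```python
import hashlib
import hmac


def expand_key(key: bytes, nbytes: int, context: bytes = b"shared-randomness") -> bytes:
    """Deterministically expand a short key into nbytes of pseudorandom output.

    Assumption: HMAC-SHA256 in counter mode stands in for the cryptographic
    pseudorandom generator; the paper's actual scheme may differ.
    """
    out = bytearray()
    counter = 0
    while len(out) < nbytes:
        # Each block is keyed by the shared secret and a fresh counter value.
        message = context + counter.to_bytes(8, "big")
        out.extend(hmac.new(key, message, hashlib.sha256).digest())
        counter += 1
    return bytes(out[:nbytes])
```

Because the expansion is deterministic, a source and a terminal holding the same key derive identical byte streams without exchanging them, while a party without the key cannot reproduce the output.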

information reuse and integration | 2014

Stream computing for large-scale, multi-channel cyber threat analytics

Douglas Lee Schales; Mihai Christodorescu; Xin Hu; Jiyong Jang; Josyula R. Rao; Reiner Sailer; Marc Ph. Stoecklin; Wietse Z. Venema; Ting Wang

The cyber threat landscape, controlled by organized crime and nation states, is evolving rapidly towards evasive, multi-channel attacks, as impressively shown by malicious operations such as GhostNet, Aurora, Stuxnet, Night Dragon, or APT1. As threats blend across diverse data channels, their detection requires scalable distributed monitoring and cross-correlation with a substantial amount of contextual information. With threats evolving more rapidly, the classical defense life cycle of post-mortem detection, analysis, and signature creation becomes less effective. In this paper, we present a highly-scalable, dynamic cybersecurity analytics platform extensible at runtime. It is specifically designed and implemented to deliver generic capabilities as a basis for future cybersecurity analytics that effectively detect threats across multiple data channels while recording relevant context information, and that support automated learning and mining for new and evolving malware behaviors. Our implementation is based on stream computing middleware that has proven high scalability, and that enables cross-correlation and analysis of millions of events per second with millisecond latency. We report the lessons we have learned from applying stream computing to monitoring malicious activity across multiple data channels (e.g., DNS, NetFlow, ARP, DHCP, HTTP) in a production network of about fifteen thousand nodes.

ieee international conference computer and communications | 2016

Hunting for invisibility: Characterizing and detecting malicious web infrastructures through server visibility analysis

Jialong Zhang; Xin Hu; Jiyong Jang; Ting Wang; Guofei Gu; Marc Ph. Stoecklin

Nowadays, cyber criminals often build web infrastructures rather than a single server to conduct their malicious activities. In order to continue their malevolent activities without being detected, cyber criminals make efforts to conceal the core servers (e.g., C&C servers, exploit servers, and drop-zone servers) in the malicious web infrastructure. Such deliberate invisibility of those concealed malicious servers, however, makes them particularly distinguishable from benign web servers that are usually promoted to be public. In this paper, we conduct the first large-scale measurement study to investigate the visibility of both malicious and benign servers. From our intensive analysis of over 100,000 benign servers, 45,000 malicious servers and 40,000 redirections, we identify a set of distinct features of malicious web infrastructures from their locations, structures, roles, and relationships perspectives, and propose a lightweight yet effective detection system called VisHunter. VisHunter identifies malicious redirections from visible servers to invisible servers at the entryway of malicious web infrastructures. We evaluate VisHunter on both online public data and large-scale enterprise network traffic, and demonstrate that VisHunter can achieve an average 96.2% detection rate with only 0.9% false positive rate on the real enterprise network traffic.

dependable systems and networks | 2016

BAYWATCH: Robust Beaconing Detection to Identify Infected Hosts in Large-Scale Enterprise Networks

Xin Hu; Jiyong Jang; Marc Ph. Stoecklin; Ting Wang; Douglas Lee Schales; Dhilung Kirat; Josyula R. Rao

Sophisticated cyber security threats, such as advanced persistent threats, rely on infecting end points within a targeted security domain and embedding malware. Typically, such malware periodically reaches out to the command and control infrastructures controlled by adversaries. Such callback behavior, called beaconing, is challenging to detect as (a) detection requires long-term temporal analysis of communication patterns at several levels of granularity, (b) malware authors employ various strategies to hide beaconing behavior, and (c) it is also employed by legitimate applications (such as update checks). In this paper, we develop a comprehensive methodology to identify stealthy beaconing behavior from network traffic observations. We use an 8-step filtering approach to iteratively refine and eliminate legitimate beaconing traffic and pinpoint malicious beaconing cases for in-depth investigation and takedown. We provide a systematic evaluation of our core beaconing detection algorithm and conduct a large-scale evaluation of web proxy data (more than 30 billion events) collected over a 5-month period at a corporate network comprising over 130,000 end-user devices. Our findings indicate that our approach reliably exposes malicious beaconing behavior, which may be overlooked by traditional security mechanisms.
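To make the notion of beaconing concrete, the sketch below flags a series of request timestamps as periodic when the gaps between them are nearly constant. It is a deliberately simple illustration, not the BAYWATCH algorithm or its 8-step filtering; the function names and the thresholds `max_cv` and `min_events` are assumptions chosen for the example.

```python
from statistics import mean, pstdev


def interval_regularity(timestamps):
    """Coefficient of variation of inter-arrival gaps (lower means more periodic).

    Returns None when there are too few gaps or the mean gap is zero.
    """
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if len(gaps) < 2:
        return None
    average = mean(gaps)
    if average == 0:
        return None
    return pstdev(gaps) / average


def looks_like_beacon(timestamps, max_cv=0.1, min_events=5):
    """Hypothetical rule: enough events and nearly constant spacing."""
    if len(timestamps) < min_events:
        return False
    cv = interval_regularity(timestamps)
    return cv is not None and cv <= max_cv
```

A rule this simple would also flag legitimate periodic traffic such as update checks and would miss beacons that add random jitter, which is why the abstract describes long-term analysis and successive filtering.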

IBM Journal of Research and Development | 2016

Scalable malware classification with multifaceted content features and threat intelligence

Xin Hu; Jiyong Jang; Ting Wang; Z. Ashraf; Marc Ph. Stoecklin; Dhilung Kirat

Recent years have witnessed a rapid increase in both the volume and sophistication of malware programs. Malware authors invest heavily in technologies and capabilities to streamline the process of building and mutating existing malware programs to evade traditional protection. One major challenge currently faced by the antivirus industry is to efficiently process the vast amount of incoming suspicious samples. Since most new malware is a variation of an existing malware family with the same forms of malicious behavior, automatic clustering and classification of malware programs into families have become valuable tools for malware analysts. Such grouping criteria not only allow analysts to prioritize the allocation of their investigation efforts but may also be applied to detect new malware samples based on their association with existing families. In this paper, we address the multi-class malware classification challenge from a scalability perspective. We present the design, development, and evaluation of a novel machine learning classifier trained on multifaceted content features (e.g., instruction sequences, strings, section information, and other malware features) as well as threat intelligence gathered from external sources (e.g., antivirus output). Our experiments on a dataset of 21,741 malware samples demonstrate the efficacy and precision of the proposed algorithm and also provide insights into the utility of various features.

recent advances in intrusion detection | 2018

Error-Sensor: Mining Information from HTTP Error Traffic for Malware Intelligence

Jialong Zhang; Jiyong Jang; Guofei Gu; Marc Ph. Stoecklin; Xin Hu

Malware often encounters network failures when it launches malicious activities, such as connecting to compromised servers that have already been taken down, connecting to malicious servers that are blocked based on access control policies in enterprise networks, or scanning/exploiting vulnerable web pages. To overcome such failures and improve resilience, malware authors have employed various strategies, e.g., connecting to multiple backup servers or connecting to benign servers for initial network connectivity checks. These network failures and recovery strategies lead to distinguishing traits, which are newly discovered and thoroughly studied in this paper. We note that network failures caused by malware are quite different from the failures caused by benign users/software in terms of their failure patterns and recovery behavior patterns.

computer and communications security | 2018

Threat Intelligence Computing

Xiaokui Shu; Frederico Araujo; Douglas Lee Schales; Marc Ph. Stoecklin; Jiyong Jang; Heqing Huang; Josyula R. Rao

Cyber threat hunting is the process of proactively and iteratively formulating and validating threat hypotheses based on security-relevant observations and domain knowledge. To facilitate threat hunting tasks, this paper introduces threat intelligence computing as a new methodology that models threat discovery as a graph computation problem. It enables efficient programming for solving threat discovery problems, equipping threat hunters with a suite of potent new tools for agile codifications of threat hypotheses, automated evidence mining, and interactive data inspection capabilities. A concrete realization of a threat intelligence computing platform is presented through the design and implementation of a domain-specific graph language with interactive visualization support and a distributed graph database. The platform was evaluated in a two-week DARPA competition for threat detection on a test bed comprising a wide variety of systems monitored in real time. During this period, sub-billion records were produced, streamed, and analyzed, dozens of threat hunting tasks were dynamically planned and programmed, and attack campaigns with diverse malicious intent were discovered. The platform exhibited strong detection and analytics capabilities coupled with high efficiency, resulting in a leadership position in the competition. Additional evaluations on comprehensive policy reasoning are outlined to demonstrate the versatility of the platform and the expressiveness of the language.
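Modeling threat discovery as a graph computation can be sketched with a toy example: system events become labeled edges, and a threat hypothesis becomes a query over them. This is not the domain-specific graph language described in the paper; the edge data, the event labels, and the functions `build_graph`, `descendants`, and `suspicious` are all hypothetical.

```python
from collections import deque

# Hypothetical labeled edges: (source entity, event type, target entity).
EDGES = [
    ("proc:browser", "spawn", "proc:dropper"),
    ("proc:dropper", "read", "file:/etc/shadow"),
    ("proc:dropper", "connect", "ip:203.0.113.7"),
    ("proc:editor", "read", "file:/home/a/notes.txt"),
]


def build_graph(edges):
    """Adjacency list mapping each source to its (label, target) pairs."""
    graph = {}
    for src, label, dst in edges:
        graph.setdefault(src, []).append((label, dst))
    return graph


def descendants(graph, root):
    """Processes reachable from root over spawn edges, including root."""
    seen, queue = {root}, deque([root])
    while queue:
        node = queue.popleft()
        for label, dst in graph.get(node, []):
            if label == "spawn" and dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return seen


def suspicious(graph, root, sensitive_prefix="file:/etc/"):
    """Hypothesis: a descendant process reads a sensitive file and also
    opens a network connection."""
    hits = []
    for proc in sorted(descendants(graph, root)):
        targets = graph.get(proc, [])
        reads = [d for l, d in targets if l == "read" and d.startswith(sensitive_prefix)]
        conns = [d for l, d in targets if l == "connect"]
        if reads and conns:
            hits.append((proc, reads, conns))
    return hits
```

Calling `suspicious(build_graph(EDGES), "proc:browser")` surfaces the dropper process together with the file it read and the address it contacted, which is the kind of evidence a hunter would then inspect interactively.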

conference on data and application security and privacy | 2016