Xiaoyong Yuan | Researchain

Publication

Featured research published by Xiaoyong Yuan.

IEEE International Conference on Smart Computing | 2017

DeepDefense: Identifying DDoS Attack via Deep Learning

Xiaoyong Yuan; Chuanhuang Li; Xiaolin Li

Distributed Denial of Service (DDoS) attacks have grown rapidly and become one of the most serious threats to the Internet. Automatically detecting DDoS attack packets is one of the main defense mechanisms. Conventional solutions monitor network traffic and distinguish attack activities from legitimate traffic based on statistical divergence. Machine learning is another way to improve detection performance based on statistical features. However, conventional machine learning techniques are limited by their shallow representation models. In this paper, we propose a deep learning based DDoS attack detection approach (DeepDefense). Deep learning can automatically extract high-level features from low-level ones and gain powerful representation and inference capabilities. We design a recurrent deep neural network to learn patterns from sequences of network traffic and trace network attack activities. The experimental results show that our model performs better than conventional machine learning models: on the larger data set, it reduces the error rate from 7.517% to 2.103%.
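
To make "learning patterns from sequences of network traffic" concrete, here is a minimal, untrained sketch in plain Python: a packet stream is sliced into fixed-length windows, and a single-layer Elman recurrent network folds each window into a hidden state and emits an attack probability. The per-packet features, window and stride sizes, and the names `make_windows` and `ElmanRNN` are hypothetical; DeepDefense's actual architecture, features, and training procedure are not described in this abstract.

```python
import math
import random
from typing import List, Sequence

# Hypothetical per-packet feature vector (e.g. normalized length, protocol flag).
Feature = List[float]


def make_windows(packets: Sequence[Feature], window: int, stride: int) -> List[List[Feature]]:
    """Slice a packet stream into fixed-length, possibly overlapping sequences."""
    return [list(packets[i:i + window])
            for i in range(0, len(packets) - window + 1, stride)]


class ElmanRNN:
    """Single-layer recurrent network with a sigmoid output (untrained sketch)."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def init(rows: int, cols: int) -> List[List[float]]:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.w_xh = init(n_hidden, n_in)      # input -> hidden weights
        self.w_hh = init(n_hidden, n_hidden)  # hidden -> hidden (recurrent) weights
        self.b_h = [0.0] * n_hidden
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
        self.b_out = 0.0

    def predict(self, sequence: Sequence[Feature]) -> float:
        """Return an attack probability for one window of packets."""
        h = [0.0] * len(self.b_h)
        for x in sequence:
            # h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h); the whole new list is
            # built from the previous h before it is reassigned.
            h = [math.tanh(sum(w * xi for w, xi in zip(self.w_xh[j], x))
                           + sum(w * hk for w, hk in zip(self.w_hh[j], h))
                           + self.b_h[j])
                 for j in range(len(h))]
        logit = sum(w * hj for w, hj in zip(self.w_out, h)) + self.b_out
        return 1.0 / (1.0 + math.exp(-logit))
```

For example, `make_windows(packets, window=4, stride=2)` over a 10-packet stream yields four overlapping windows, each of which `ElmanRNN(n_in=2, n_hidden=8).predict(w)` maps to a probability. In practice the weights would be learned from labeled traffic, and a deep-learning framework would replace the hand-written loops.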

IEEE International Conference on Smart Computing | 2017

PhD Forum: Deep Learning-Based Real-Time Malware Detection with Multi-Stage Analysis

Xiaoyong Yuan

Protecting computer systems is a critical and ongoing problem, given that real-time malware detection is hard. The state of the art in defense cannot keep pace with the increasing sophistication of malware. The industry, for instance, relies heavily on anti-virus technology for threat detection, which is effective for malware with known signatures but is not sustainable given the massive number of malware samples released daily, as well as its inefficacy against zero-day and polymorphic/metamorphic malware (practical detection rates range from 25% to 50%). Behavior-based approaches attempt to identify malware behaviors using instruction sequences, computation trace logic, and system (or API) call sequences. These solutions have mostly been based on conventional machine learning (ML) models with hand-crafted features, such as K-nearest neighbor, SVM, and decision tree algorithms. However, current ML-based solutions suffer from high false-positive rates, mainly because of (i) the complexity and diversity of current software and malware, which are hard to capture during the learning phase of the algorithms, (ii) sub-optimal feature extraction, and (iii) limited or outdated datasets. Since malware continuously evolves, existing protection mechanisms do not cope well with the increased sophistication and complexity of these attacks, especially those performed by advanced persistent threats (APT), which are multi-module, stealthy, and target-focused. Furthermore, malware campaigns are not homogeneous: malware sophistication varies depending on the target, the type of service exploited as part of the attack (e.g., Internet Banking, relationship sites), the attack spreading source (e.g., phishing, drive-by downloads), and the location of the target. The accuracy of malware classification depends on gaining sufficient context information and extracting meaningful abstractions of behavior.
In problems of detecting malicious behavior from sequences of system calls, longer sequences likely contain more information. However, classical ML-based detectors (e.g., Random Forest, Naive Bayes) often use short windows of system calls during the decision process and may not be able to extract enough features for accurate detection over a long-term window. The main drawback of such approaches is therefore their limited accuracy: it is difficult to analyze complex and long sequences of malicious behavior with limited window sizes, especially when malicious and benign behaviors are interleaved. In contrast, Deep Learning models can analyze longer sequences of system calls and make better decisions through higher-level information extraction and semantic knowledge learning. However, Deep Learning requires more computation time to estimate the probability of detection when the model needs to be retrained incrementally, a common requirement for malware detection because new variants and samples are frequently added to the training set. The trade-off is challenging: fast but less accurate detection (classical ML methods) versus time-consuming but accurate detection (emerging Deep Learning methods). Our proposal is to leverage the best of both worlds with Spectrum, a practical multi-stage malware-detection system operating in collaboration with the operating system (OS).
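
The fast-versus-accurate trade-off suggests a two-stage cascade, sketched below under stated assumptions: a cheap classifier scores only a short window of recent system calls, confident verdicts are returned immediately, and borderline scores are escalated to a slower model that sees the whole sequence. The thresholds, the window size, the `cascade_detect` function, and the toy scorers are hypothetical stand-ins; this illustrates the general multi-stage idea, not Spectrum's actual design.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# A scorer maps a system-call sequence to P(malicious) in [0, 1].
Scorer = Callable[[Sequence[str]], float]


@dataclass
class CascadeDecision:
    label: str    # "benign" or "malicious"
    stage: int    # 1 = decided by the fast model, 2 = escalated to the deep model
    score: float  # score of the model that made the decision


def cascade_detect(syscalls: Sequence[str], fast_model: Scorer, deep_model: Scorer,
                   short_window: int = 10, low: float = 0.3, high: float = 0.7,
                   deep_threshold: float = 0.5) -> CascadeDecision:
    # Stage 1: the cheap classical model sees only the most recent short window.
    s1 = fast_model(syscalls[-short_window:])
    if s1 <= low:
        return CascadeDecision("benign", 1, s1)
    if s1 >= high:
        return CascadeDecision("malicious", 1, s1)
    # Borderline: the expensive model analyzes the full, longer sequence.
    s2 = deep_model(syscalls)
    return CascadeDecision("malicious" if s2 >= deep_threshold else "benign", 2, s2)


# Toy scorer (hypothetical): fraction of calls drawn from a "suspicious" set.
SUSPICIOUS = {"ptrace", "mprotect"}


def toy_score(seq: Sequence[str]) -> float:
    return sum(call in SUSPICIOUS for call in seq) / max(len(seq), 1)
```

Passing `toy_score` as both models is only for demonstration: a trace of 10 `read` calls followed by 10 calls alternating `ptrace`/`read` scores 0.5 on the short window, is escalated, and is labeled benign by the full-sequence scorer (5/20 = 0.25). The design choice is that only borderline cases pay the deep model's cost.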

IEEE Conference on Dependable and Secure Computing | 2017

The dose makes the poison — Leveraging uncertainty for effective malware detection

Ruimin Sun; Xiaoyong Yuan; Andrew R. Lee; Matt Bishop; Donald E. Porter; Xiaolin Li; André Grégio; Daniela A. S. de Oliveira

Malware has become sophisticated, and organizations don't have a Plan B when standard lines of defense fail. These failures have devastating consequences for organizations, such as the exfiltration of sensitive information. A promising avenue for improving the effectiveness of behavior-based malware detectors is to combine fast (usually not highly accurate) traditional machine learning (ML) detectors with high-accuracy but time-consuming deep learning (DL) models. The main idea is to place software that receives borderline classifications from traditional ML methods in an environment where uncertainty is added, while the software is analyzed by time-consuming DL models. The goal of uncertainty is to rate-limit the actions of potential malware during deep analysis. In this paper, we describe Chameleon, a Linux-based framework that implements this uncertain environment. Chameleon offers two environments for its OS processes: standard, for software identified as benign by traditional ML detectors, and uncertain, for software that received borderline classifications from ML methods. The uncertain environment hinders software execution through random perturbations applied probabilistically to selected system calls. We evaluated Chameleon with 113 applications from common benchmarks and 100 malware samples for Linux. Our results show that at a threshold of 10%, intrusive and non-intrusive strategies caused approximately 65% of malware to fail to accomplish their tasks, while approximately 30% of the analyzed benign software experienced various levels of disruption (crashed or hampered). We also found that I/O-bound software was three times more affected by uncertainty than CPU-bound software.
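
The core mechanism, random perturbations applied probabilistically to selected system calls, can be illustrated with a user-space simulation. In this sketch a hypothetical `UncertainEnvironment` wraps each call; for targeted call names it fails the call with probability `perturb_prob`, returning -1 as a failed libc wrapper would. Chameleon itself is a Linux framework operating at the OS level, and its actual perturbation strategies (intrusive and non-intrusive) are not reproduced here; the class, its parameters, and the single "fail the call" perturbation are assumptions for illustration.

```python
import random
from typing import Callable, Iterable, Optional


class UncertainEnvironment:
    """Hypothetical user-space simulation of probabilistic syscall perturbation."""

    def __init__(self, perturb_prob: float, targeted: Iterable[str],
                 seed: Optional[int] = None) -> None:
        self.perturb_prob = perturb_prob      # probability of perturbing a targeted call
        self.targeted = frozenset(targeted)   # names of system calls eligible for perturbation
        self.rng = random.Random(seed)        # seeded RNG for reproducible experiments
        self.perturbed = 0                    # number of calls perturbed so far

    def invoke(self, name: str, real_call: Callable[[], int]) -> int:
        """Run real_call, unless this targeted call is randomly chosen for perturbation."""
        if name in self.targeted and self.rng.random() < self.perturb_prob:
            self.perturbed += 1
            return -1  # perturbation: report failure instead of executing the call
        return real_call()
```

With `perturb_prob=0.0` every call passes through unchanged, and with `1.0` every targeted call fails. Intermediate values rate-limit a process statistically, which is why I/O-heavy programs that issue many targeted calls would be hit more often than CPU-bound ones.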

Network and Parallel Computing | 2014

Scheduling Cloud Platform Managed Live-Migration Operations to Minimize the Makespan

Xiaoyong Yuan; Ying Li; Yanqi Wang; Kewei Sun

Live migration of virtual machines (VMs) has become an indispensable management operation in cloud platforms. Cloud platforms need to migrate multiple co-located, live VMs from one physical node to another for power saving, load balancing, and maintenance. Such live-migration operations are critical to the running services and thus should be completed as fast as possible. State-of-the-art live-migration techniques optimize the migration performance of single or multiple VMs by concentrating on the Virtual Machine Monitor (VMM); little attention has been given to the cloud platforms that control and schedule multiple migration operations. In this paper, we consider the problem of scheduling migration operations to minimize the makespan.
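
To illustrate what "scheduling migration operations to minimize the makespan" means, here is a simple greedy heuristic under explicit assumptions: each migration has a known duration, runs without preemption, and each physical node can take part in at most one migration at a time (as source or destination). Migrations are placed longest-first, each starting as soon as both of its endpoints are free. This longest-processing-time-first greedy is a generic textbook-style heuristic chosen for illustration, not the algorithm proposed in the paper, and it is not guaranteed to be optimal.

```python
from typing import Dict, Iterable, List, Tuple

# (vm_id, source_node, destination_node, estimated_duration) -- hypothetical model
Migration = Tuple[str, str, str, float]


def greedy_schedule(migrations: Iterable[Migration]) -> Tuple[List[Tuple[str, float, float]], float]:
    """Return [(vm_id, start, end), ...] and the resulting makespan."""
    free_at: Dict[str, float] = {}  # time at which each node becomes free
    schedule: List[Tuple[str, float, float]] = []
    # Longest migrations first, so short ones do not delay them at the end.
    for vm, src, dst, dur in sorted(migrations, key=lambda m: m[3], reverse=True):
        # A migration occupies both endpoints, so wait until both are free.
        start = max(free_at.get(src, 0.0), free_at.get(dst, 0.0))
        end = start + dur
        free_at[src] = free_at[dst] = end
        schedule.append((vm, start, end))
    makespan = max((end for _, _, end in schedule), default=0.0)
    return schedule, makespan
```

For instance, migrations A→B (5 time units) and C→D (3) share no node and finish at time 5, while A→B (5) and A→C (3) share node A, so the second must wait and the makespan grows to 8.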

International Journal of Distributed Systems and Technologies | 2018

Analysis of Frequently Failing Tasks and Rescheduling Strategy in the Cloud System

Hongyan Tang; Ying Li; Tong Jia; Xiaoyong Yuan; Zhonghai Wu

To better understand task failures in cloud computing systems, the authors analyze the failure frequency of tasks based on the Google cluster dataset and find some frequently failing tasks that suffer from long-term failures and repeated rescheduling. These are called killer tasks because they can be a major concern for cloud systems, so there is a need to analyze killer tasks thoroughly and recognize them precisely. In this article, the authors first investigate the resource usage pattern of killer tasks and analyze their rescheduling strategies in the Google cluster, finding that repeated rescheduling causes a large amount of resource waste. Based on these observations, they then propose an online killer task recognition service that recognizes killer tasks at a very early stage of their occurrence so as to avoid unnecessary resource waste. The experimental results show that the proposed service achieves, on average, 93.6% accuracy in recognizing killer tasks, with an 87% timing advance and 86.6% resource saving for the cloud system.
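
The idea of recognizing killer tasks online, before they waste resources through repeated rescheduling, can be sketched with the simplest possible rule: count failures per task as events stream in and flag a task the moment its count reaches a threshold. The `KillerTaskRecognizer` class, the threshold, and the string event labels are hypothetical placeholders (a real trace would need its own event encoding mapped onto them); the service in the article achieves its reported accuracy with a recognition method that this abstract does not describe.

```python
from collections import Counter
from typing import Dict


class KillerTaskRecognizer:
    """Hypothetical online rule: flag a task once it has failed fail_threshold times."""

    def __init__(self, fail_threshold: int = 3) -> None:
        self.fail_threshold = fail_threshold
        self.failures: Counter = Counter()  # task_id -> failures observed so far
        self.flagged: Dict[str, int] = {}   # task_id -> index of the event that flagged it

    def observe(self, index: int, task_id: str, event_type: str) -> bool:
        """Process one event; return True only when this event newly flags the task."""
        if event_type != "FAIL" or task_id in self.flagged:
            return False
        self.failures[task_id] += 1
        if self.failures[task_id] >= self.fail_threshold:
            self.flagged[task_id] = index
            return True
        return False
```

Recording the event index at which each task is flagged makes it possible to measure how early recognition happens relative to the task's full failure history, which is the kind of quantity a "timing advance" metric would be computed from.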

International Congress on Big Data | 2016

DeepSky: Identifying Absorption Bumps via Deep Learning

Xiaoyong Yuan; Min Li; Sudeep Gaddam; Xiaolin Li; Yinan Zhao; Jingzhe Ma; Jian Ge

Pervasive interstellar grains provide significant insights into the formation and evolution of stars, planetary systems, and galaxies, and could potentially lead us to the secret of the origin of life. One of the most effective ways to analyze the dust is via its interaction with and interference on background light. The observable extinction curves and spectral features carry information about the size and composition of the dust. Among these features, the broad 2175 Å absorption bump is one of the most significant spectroscopic interstellar extinction features. Traditionally, astronomers apply conventional statistical and signal processing techniques to detect the existence of absorption bumps. These approaches require labor-intensive preprocessing and the co-existence of other reference features to alleviate the influence of noise. Conventional approaches not only involve substantial labor costs in complicated workflows but also demand well-trained expertise to make subtle and error-prone conditional decisions. In this paper, we propose to leverage deep learning to automate the detection workflow without detailed manual feature engineering. We design and analyze deep convolutional neural networks for detecting absorption bumps. We further propose a framework of deep learning mechanisms and models (collectively called DeepSky) for scientific discovery in astronomy. The DeepSky prototype demonstrates efficient and effective results using limited labeled data. With well-designed data augmentation, our trained model achieved about 99% accuracy in prediction on real-world data.
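
Two building blocks mentioned above, a convolutional layer scanning a 1-D spectrum and data augmentation, can be shown in a few lines of plain Python. In this sketch a single hand-set, zero-mean kernel acts as a matched filter for a dip in a normalized flux array, followed by ReLU and global max pooling; the augmentation applies a small random shift with edge padding plus Gaussian noise. The kernel values, noise level, shift range, and function names are hypothetical: a real CNN such as those in DeepSky learns many kernels from labeled spectra, and the paper's augmentation scheme is not specified in this abstract.

```python
import random
from typing import List, Sequence


def conv1d_valid(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """'Valid' 1-D cross-correlation, the operation deep-learning 'conv' layers compute."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def dip_score(flux: Sequence[float],
              kernel: Sequence[float] = (1.0, 0.0, -2.0, 0.0, 1.0)) -> float:
    """Conv -> ReLU -> global max pool. The zero-mean kernel gives 0 on a flat spectrum."""
    return max((max(0.0, v) for v in conv1d_valid(flux, kernel)), default=0.0)


def augment(flux: Sequence[float], rng: random.Random,
            noise_std: float = 0.01, max_shift: int = 2) -> List[float]:
    """Randomly shift the spectrum (edge-padded) and add Gaussian noise."""
    n = len(flux)
    s = rng.randint(-max_shift, max_shift)
    shifted = [flux[min(max(i - s, 0), n - 1)] for i in range(n)]
    return [v + rng.gauss(0.0, noise_std) for v in shifted]
```

A flat spectrum scores exactly 0, while a spectrum with a shallow dip produces a positive response where the kernel aligns with it. Augmented copies from `augment(flux, random.Random(seed))` keep the same length, which is one way to stretch a small labeled set into more training examples.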

arXiv: Learning | 2017