Xiaokui Shu | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Xiaokui Shu is active.

Explore More

Publication

Featured researches published by Xiaokui Shu.

Computers & Security | 2015

Profiling user-trigger dependence for Android malware detection

Karim O. Elish; Xiaokui Shu; Danfeng Yao; Barbara G. Ryder; Xuxian Jiang

As mobile computing becomes an integral part of the modern user experience, malicious applications have infiltrated open marketplaces for mobile platforms. Malware apps stealthily launch operations to retrieve sensitive user or device data or abuse system resources. We describe a highly accurate classification approach for detecting malicious Android apps. Our method statically extracts a data-flow feature on how user inputs trigger sensitive API invocations, a property referred to as the user-trigger dependence. Our evaluation with 1433 malware apps and 2684 free popular apps gives a classification accuracy (2.1% false negative rate and 2.0% false positive rate) that is better than, or at least competitive against, the state-of-the-art. Our method also discovers new malicious apps in the Google Play market that cannot be detected by virus scanning tools. Our thesis in this mobile app classification work is to advocate the approach of benign property enforcement, i.e., extracting unique behavioral properties from benign programs and designing corresponding classification policies.

international conference on security and privacy in communication systems | 2012

Data Leak Detection as a Service

Xiaokui Shu; Danfeng Yao

We describe a network-based data-leak detection (DLD) technique, the main feature of which is that the detection does not reveal the content of the sensitive data. Instead, only a small amount of specialized digests are needed. Our technique – referred to as the fuzzy fingerprint detection – can be used to detect accidental data leaks due to human errors or application flaws. The privacy-preserving feature of our algorithms minimizes the exposure of sensitive data and enables the data owner to safely delegate the detection to others (e.g., network or cloud providers). We describe how cloud providers can offer their customers data-leak detection as an add-on service with strong privacy guarantees. We perform extensive experimental evaluation on our techniques with large datasets. Our evaluation results under various data-leak scenarios and setups show that our method can support accurate detection with very small number of false alarms, even when the presentation of the data has been transformed.

computer and communications security | 2015

Unearthing Stealthy Program Attacks Buried in Extremely Long Execution Paths

Xiaokui Shu; Danfeng Yao; Naren Ramakrishnan

Modern stealthy exploits can achieve attack goals without introducing illegal control flows, e.g., tampering with non-control data and waiting for the modified data to propagate and alter the control flow legally. Existing program anomaly detection systems focusing on legal control flow attestation and short call sequence verification are inadequate to detect such stealthy attacks. In this paper, we point out the need to analyze program execution paths and discover event correlations in large-scale execution windows among millions of instructions. We propose an anomaly detection approach with two-stage machine learning algorithms to recognize diverse normal call-correlation patterns and detect program attacks at both inter- and intra-cluster levels. We implement a prototype of our approach and demonstrate its effectiveness against three real-world attacks and four synthetic anomalies with less than 0.01% false positive rates and 0.1~1.3 ms analysis overhead per behavior instance (1k to 50k function or system calls).

IEEE Transactions on Information Forensics and Security | 2015

Privacy-Preserving Detection of Sensitive Data Exposure

Xiaokui Shu; Danfeng Yao; Elisa Bertino

Statistics from security firms, research institutions and government organizations show that the number of data-leak instances have grown rapidly in recent years. Among various data-leak cases, human mistakes are one of the main causes of data loss. There exist solutions detecting inadvertent sensitive data leaks caused by human mistakes and to provide alerts for organizations. A common approach is to screen content in storage and transmission for exposed sensitive information. Such an approach usually requires the detection operation to be conducted in secrecy. However, this secrecy requirement is challenging to satisfy in practice, as detection servers may be compromised or outsourced. In this paper, we present a privacy-preserving data-leak detection (DLD) solution to solve the issue where a special set of sensitive data digests is used in detection. The advantage of our method is that it enables the data owner to safely delegate the detection operation to a semihonest provider without revealing the sensitive data to the provider. We describe how Internet service providers can offer their customers DLD as an add-on service with strong privacy guarantees. The evaluation results show that our method can support accurate detection with very small number of false alarms under various data-leak scenarios.

conference on data and application security and privacy | 2015

Privacy-Preserving Scanning of Big Content for Sensitive Data Exposure with MapReduce

Fang Liu; Xiaokui Shu; Danfeng Yao; Ali Raza Butt

The exposure of sensitive data in storage and transmission poses a serious threat to organizational and personal security. Data leak detection aims at scanning content (in storage or transmission) for exposed sensitive data. Because of the large content and data volume, such a screening algorithm needs to be scalable for a timely detection. Our solution uses the MapReduce framework for detecting exposed sensitive content, because it has the ability to arbitrarily scale and utilize public resources for the task, such as Amazon EC2. We design new MapReduce algorithms for computing collection intersection for data leak detection. Our prototype implemented with the Hadoop system achieves 225 Mbps analysis throughput with 24 nodes. Our algorithms support a useful privacy-preserving data transformation. This transformation enables the privacy-preserving technique to minimize the exposure of sensitive data during the detection. This transformation supports the secure outsourcing of the data leak detection to untrusted MapReduce and cloud providers.

IEEE Transactions on Information Forensics and Security | 2016

Fast Detection of Transformed Data Leaks

Xiaokui Shu; Jing Zhang; Danfeng Daphne Yao; Wu-chun Feng

The leak of sensitive data on computer systems poses a serious threat to organizational security. Statistics show that the lack of proper encryption on files and communications due to human errors is one of the leading causes of data loss. Organizations need tools to identify the exposure of sensitive data by screening the content in storage and transmission, i.e., to detect sensitive information being stored or transmitted in the clear. However, detecting the exposure of sensitive information is challenging due to data transformation in the content. Transformations (such as insertion and deletion) result in highly unpredictable leak patterns. In this paper, we utilize sequence alignment techniques for detecting complex data-leak patterns. Our algorithm is designed for detecting long and inexact sensitive data patterns. This detection is paired with a comparable sampling algorithm, which allows one to compare the similarity of two separately sampled sequences. Our system achieves good detection accuracy in recognizing transformed leaks. We implement a parallelized version of our algorithms in graphics processing unit that achieves high analysis throughput. We demonstrate the high multithreading scalability of our data leak detection method required by a sizable organization.

recent advances in intrusion detection | 2015

A Formal Framework for Program Anomaly Detection

Xiaokui Shu; Danfeng Daphne Yao; Barbara G. Ryder

Program anomaly detection analyzes normal program behaviors and discovers aberrant executions caused by attacks, misconfigurations, program bugs, and unusual usage patterns. The merit of program anomaly detection is its independence from attack signatures, which enables proactive defense against new and unknown attacks. In this paper, we formalize the general program anomaly detection problem and point out two of its key properties. We present a unified framework to present any program anomaly detection method in terms of its detection capability. We prove the theoretical accuracy limit for program anomaly detection with an abstract detection machine. We show how existing solutions are positioned in our framework and illustrate the gap between state-of-the-art methods and the theoretical accuracy limit. We also point out some potential modeling features for future program anomaly detection evolution.

conference on computer communications workshops | 2015

Rapid and parallel content screening for detecting transformed data exposure

Xiaokui Shu; Jing Zhang; Danfeng Yao; Wu-chun Feng

The leak of sensitive data on computer systems poses a serious threat to organizational security. Organizations need to identify the exposure of sensitive data by screening the content in storage and transmission, i.e., to detect sensitive information being stored or transmitted in the clear. However, detecting the exposure of sensitive information is challenging due to data transformation in the content. Transformations (such as insertion, deletion) result in highly unpredictable leak patterns. Existing automata-based string matching algorithms are impractical for detecting transformed data leaks because of its formidable complexity when modeling the required regular expressions. We design two new algorithms for detecting long and inexact data leaks. Our system achieves high detection accuracy in recognizing transformed leaks compared with the state-of-the-art inspection methods. We parallelize our prototype on graphics processing unit and demonstrate the strong scalability of our data leak detection solution analyzing big data.

ACM Transactions on Privacy and Security (TOPS) | 2017

Long-Span Program Behavior Modeling and Attack Detection

Xiaokui Shu; Danfeng Yao; Naren Ramakrishnan; Trent Jaeger

Intertwined developments between program attacks and defenses witness the evolution of program anomaly detection methods. Emerging categories of program attacks, e.g., non-control data attacks and data-oriented programming, are able to comply with normal trace patterns at local views. This article points out the deficiency of existing program anomaly detection models against new attacks and presents long-span behavior anomaly detection (LAD), a model based on mildly context-sensitive grammar verification. The key feature of LAD is its reasoning of correlations among arbitrary events that occurred in long program traces. It extends existing correlation analysis between events at a stack snapshot, e.g., paired call and ret, to correlation analysis among events that historically occurred during the execution. The proposed method leverages specialized machine learning techniques to probe normal program behavior boundaries in vast high-dimensional detection space. Its two-stage modeling/detection design analyzes event correlation at both binary and quantitative levels. Our prototype successfully detects all reproduced real-world attacks against sshd, libpcre, and sendmail. The detection procedure incurs 0.1 ms to 1.3 ms overhead to profile and analyze a single behavior instance that consists of tens of thousands of function call or system call events.

conference on data and application security and privacy | 2015

Rapid Screening of Transformed Data Leaks with Efficient Algorithms and Parallel Computing

Xiaokui Shu; Jing Zhang; Danfeng Yao; Wu-chun Feng

The leak of sensitive data on computer systems poses a serious threat to organizational security. Organizations need to identify the exposure of sensitive data by screening the content in storage and transmission, i.e., to detect sensitive information being stored or transmitted in the clear. However, detecting the exposure of sensitive information is challenging due to data transformation in the content. Transformations (such as insertion, deletion) result in highly unpredictable leak patterns. Existing automata-based string matching algorithms are impractical for detecting transformed data leaks, because of its formidable complexity when modeling the required regular expressions. We design two new algorithms for detecting long and transformed data leaks. Our system achieves high detection accuracy in recognizing transformed leaks compared to the state-of-the-art inspection methods. We parallelize our prototype on graphics processing unit and demonstrate the strong scalability of our detection solution required by a sizable organization.

Explore More