Wei-Jen Li | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Wei-Jen Li is active.

Explore More

Publication

Featured researches published by Wei-Jen Li.

systems man and cybernetics | 2005

Fileprints: identifying file types by n-gram analysis

Wei-Jen Li; Ke Wang; Salvatore J. Stolfo; Benjamin Herzog

We propose a method to analyze files to categorize their type using efficient 1-gram analysis of their binary contents. Our aim is to be able to accurately identify the true type of an arbitrary file using statistical analysis of their binary contents without parsing. Consequently, we may determine the type of a file if its name does not announce its true type. The method represents each file type by a compact representation we call a fileprint, effectively a simple means of representing all members of the same file type by a set of statistical 1-gram models. The method is designed to be highly efficient so that files can be inspected with little or no buffering, and on a network appliance operating in high bandwidth environment or when streaming the file from or to disk.

international conference on detection of intrusions and malware and vulnerability assessment | 2007

A Study of Malcode-Bearing Documents

Wei-Jen Li; Salvatore J. Stolfo; Angelos Stavrou; Elli Androulaki; Angelos D. Keromytis

By exploiting the object-oriented dynamic composability of modern document applications and formats, malcode hidden in otherwise inconspicuous documents can reach third-party applications that may harbor exploitable vulnerabilities otherwise unreachable by network-level service attacks. Such attacks can be very selective and difficult to detect compared to the typical network worm threat, owing to the complexity of these applications and data formats, as well as the multitude of document-exchange vectors. As a case study, this paper focuses on Microsoft Word documents as malcode carriers. We investigate the possibility of detecting embedded malcode in Word documents using two techniques: static content analysis using statistical models of typical document content, and run-time dynamic tests on diverse platforms. The experiments demonstrate these approaches can not only detect known malware, but also most zero-day attacks. We identify several problems with both approaches, representing both challenges in addressing the problem and opportunities for future research.

Malware Detection | 2007

Towards Stealthy Malware Detection

Salvatore J. Stolfo; Ke Wang; Wei-Jen Li

Malcode can be easily hidden in document files and go undetected by standard technology. We demonstrate this opportunity of stealthy malcode insertion in several experiments using a standard COTS Anti-Virus (AV) scanner. Furthermore, in the case of zero-day malicious exploit code, signature- based AV scanners would fail to detect such malcode even if the scanner knew where to look. We propose the use of statistical binary content analysis of files in order to detect suspicious anomalous file segments that may suggest insertion of malcode. Experiments are performed to determine whether the approach of n-gram analysis may provide useful evidence of a tainted file that would subsequently be subjected to further scmtiny. We further perform tests to determine whether known malcode can be easily distinguished from otherwise “normal” Windows executables, and whether self-encrypted files may be easy to spot. Our goal is to develop an efficient means by static content analysis of detecting suspect infected files. This approach may have value for scanning a large store of collected information, such as a database of shared documents. The preliminary experiments suggest the problem is quite hard requiring new research to detect stealthy malcode.

visualization for computer security | 2004

Email archive analysis through graphical visualization

Wei-Jen Li; Shlomo Hershkop; Salvatore J. Stolfo

The analysis of the vast storehouse of email content accumulated or produced by individual users has received relatively little attention other than for specific tasks such as spam and virus filtering. Current email analysis in standard client applications consists of keyword based matching techniques for filtering and expert driven manual exploration of email files. We have implemented a tool, called the Email Mining Toolkit (EMT) for analyzing email archives which includes a graphical display to explore relationships between users and groups of email users. The chronological flow of an email message can be analyzed by EMT. Our design goal is to embed the technology into standard email clients, such as Outlook, revealing far more information about a users own email history than is otherwise now possible. In this paper we detail the visualization techniques implemented in EMT. We show the utility of these tools and underlying models for detecting email misuse such as viral propagation, and spam spread as examples.

Archive | 2009

Thwarting Attacks in Malcode-Bearing Documents by Altering Data Sector Values

Wei-Jen Li; Salvatore J. Stolfo

Embedding malcode within documents provides a convenient means of attacking systems. Such attacks can be very targeted and difficult to detect to stop due to the multitude of document-exchange vectors and the vulnerabilities in modern document processing applications. Detecting malcode embedded in a document is difficult owing to the complexity of modern document formats that provide ample opportunity to embed code in a myriad of ways. We focus on Microsoft Word documents as malcode carriers as a case study in this paper. To detect stealthy embedded malcode in documents, we develop an arbitrary data transformation technique that changes the value of data segments in documents in such a way as to purposely damage any hidden malcode that may be embedded in those sections. Consequently, the embedded malcode will not only fail but also introduce a system exception that would be easily detected. The method is intended to be applied in a safe sandbox, the transformation is reversible after testing a document, and does not require any learning phase. The method depends upon knowledge of the structure of the document binary format to parse a document and identify the specific sectors to which the method can be safely applied for malcode detection. The method can be implemented in MS Word as a security feature to enhance the safety of Word documents.

Archive | 2009