Is this you? Create Your Porfile

Zhendong Su

University of California, Davis

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Zhendong Su is active.

Explore More

Publication

Featured researches published by Zhendong Su.

symposium on principles of programming languages | 2006

The essence of command injection attacks in web applications

Zhendong Su; Gary Wassermann

Web applications typically interact with a back-end database to retrieve persistent data and then present the data to the user as dynamically generated output, such as HTML web pages. However, this interaction is commonly done through a low-level API by dynamically constructing query strings within a general-purpose programming language, such as Java. This low-level interaction is ad hoc because it does not take into account the structure of the output language. Accordingly, user inputs are treated as isolated lexical entities which, if not properly sanitized, can cause the web application to generate unintended output. This is called a command injection attack, which poses a serious threat to web application security. This paper presents the first formal definition of command injection attacks in the context of web applications, and gives a sound and complete algorithm for preventing them based on context-free grammars and compiler parsing techniques. Our key observation is that, for an attack to succeed, the input that gets propagated into the database query or the output document must change the intended syntactic structure of the query or document. Our definition and algorithm are general and apply to many forms of command injection attacks. We validate our approach with SqlCheckS, an implementation for the setting of SQL command injection attacks. We evaluated SqlCheckS on real-world web applications with systematically compiled real-world attack data as input. SqlCheckS produced no false positives or false negatives, incurred low runtime overhead, and applied straightforwardly to web applications written in different languages.

international conference on software engineering | 2007

DECKARD: Scalable and Accurate Tree-Based Detection of Code Clones

Lingxiao Jiang; Ghassan Misherghi; Zhendong Su; Stéphane Glondu

Detecting code clones has many software engineering applications. Existing approaches either do not scale to large code bases or are not robust against minor code modifications. In this paper, we present an efficient algorithm for identifying similar subtrees and apply it to tree representations of source code. Our algorithm is based on a novel characterization of subtrees with numerical vectors in the Euclidean space Rnmiddot and an efficient algorithm to cluster these vectors w.r.t. the Euclidean distance metric. Subtrees with vectors in one cluster are considered similar. We have implemented our tree similarity algorithm as a clone detection tool called DECKARD and evaluated it on large code bases written in C and Java including the Linux kernel and JDK. Our experiments show that DECKARD is both scalable and accurate. It is also language independent, applicable to any language with a formally specified grammar.

ieee symposium on security and privacy | 2006

FIREMAN: a toolkit for firewall modeling and analysis

Lihua Yuan; Hao Chen; Jianning Mai; Chen-Nee Chuah; Zhendong Su; Prasant Mohapatra

Security concerns are becoming increasingly critical in networked systems. Firewalls provide important defense for network security. However, misconfigurations in firewalls are very common and significantly weaken the desired security. This paper introduces FIREMAN, a static analysis toolkit for firewall modeling and analysis. By treating firewall configurations as specialized programs, FIREMAN applies static analysis techniques to check misconfigurations, such as policy violations, inconsistencies, and inefficiencies, in individual firewalls as well as among distributed firewalls. FIREMAN performs symbolic model checking of the firewall configurations for all possible IP packets and along all possible data paths. It is both sound and complete because of the finite state nature of firewall configurations. FIREMAN is implemented by modeling firewall rules using binary decision diagrams (BDDs), which have been used successfully in hardware verification and model checking. We have experimented with FIREMAN and used it to uncover several real misconfigurations in enterprise networks, some of which have been subsequently confirmed and corrected by the administrators of these networks

programming language design and implementation | 2007

Sound and precise analysis of web applications for injection vulnerabilities

Gary Wassermann; Zhendong Su

Web applications are popular targets of security attacks. One common type of such attacks is SQL injection, where an attacker exploits faulty application code to execute maliciously crafted database queries. Bothstatic and dynamic approaches have been proposed to detect or prevent SQL injections; while dynamic approaches provide protection for deployed software, static approaches can detect potential vulnerabilities before software deployment. Previous static approaches are mostly based on tainted information flow tracking and have at least some of the following limitations: (1) they do not model the precise semantics of input sanitization routines; (2) they require manually written specifications, either for each query or for bug patterns; or (3) they are not fully automated and may require user intervention at various points in the analysis. In this paper, we address these limitations by proposing a precise, sound, and fully automated analysis technique for SQL injection. Our technique avoids the need for specifications by consideringas attacks those queries for which user input changes the intended syntactic structure of the generated query. It checks conformance to this policy byconservatively characterizing the values a string variable may assume with a context free grammar, tracking the nonterminals that represent user-modifiable data, and modeling string operations precisely as language transducers. We have implemented the proposed technique for PHP, the most widely-used web scripting language. Our tool successfully discovered previously unknown and sometimes subtle vulnerabilities in real-world programs, has a low false positive rate, and scales to large programs (with approx. 100K loc).

international conference on software engineering | 2008

Static detection of cross-site scripting vulnerabilities

Gary Wassermann; Zhendong Su

Web applications support many of our daily activities, but they often have security problems, and their accessibility makes them easy to exploit. In cross-site scripting (XSS), an attacker exploits the trust a Web client (browser) has for a trusted server and executes injected script on the browser with the servers privileges. In 2006, XSS constituted the largest class of newly reported vulnerabilities making it the most prevalent class of attacks today. Web applications have XSS vulnerabilities because the validation they perform on untrusted input does not suffice to prevent that input from invoking a browsers JavaScript interpreter, and this validation is particularly difficult to get right if it must admit some HTML mark-up. Most existing approaches to finding XSS vulnerabilities are taint-based and assume input validation functions to be adequate, so they either miss real vulnerabilities or report many false positives. This paper presents a static analysis for finding XSS vulnerabilities that directly addresses weak or absent input validation. Our approach combines work on tainted information flow with string analysis. Proper input validation is difficult largely because of the many ways to invoke the JavaScript interpreter; we face the same obstacle checking for vulnerabilities statically, and we address it by formalizing a policy based on the W3C recommendation, the Firefox source code, and online tutorials about closed-source browsers. We provide effective checking algorithms based on our policy. We implement our approach and provide an extensive evaluation that finds both known and unknown vulnerabilities in real-world web applications.

international conference on software engineering | 2012

On the naturalness of software

Abram Hindle; Earl T. Barr; Zhendong Su; Mark Gabel; Premkumar T. Devanbu

Natural languages like English are rich, complex, and powerful. The highly creative and graceful use of languages like English and Tamil, by masters like Shakespeare and Avvaiyar, can certainly delight and inspire. But in practice, given cognitive constraints and the exigencies of daily life, most human utterances are far simpler and much more repetitive and predictable. In fact, these utterances can be very usefully modeled using modern statistical methods. This fact has led to the phenomenal success of statistical approaches to speech recognition, natural language translation, question-answering, and text mining and comprehension. We begin with the conjecture that most software is also natural, in the sense that it is created by humans at work, with all the attendant constraints and limitations — and thus, like natural language, it is also likely to be repetitive and predictable. We then proceed to ask whether a) code can be usefully modeled by statistical language models and b) such models can be leveraged to support software engineers. Using the widely adopted n-gram model, we provide empirical evidence supportive of a positive answer to both these questions. We show that code is also very repetitive, and in fact even more so than natural languages. As an example use of the model, we have developed a simple code completion engine for Java that, despite its simplicity, already improves Eclipses built-in completion capability. We conclude the paper by laying out a vision for future research in this area.

ACM Transactions on Software Engineering and Methodology | 2007

Static checking of dynamically generated queries in database applications

Gary Wassermann; Carl Gould; Zhendong Su; Premkumar T. Devanbu

Many data-intensive applications dynamically construct queries in response to client requests and execute them. Java servlets, e.g., can create string representations of SQL queries and then send the queries, using JDBC, to a database server for execution. The servlet programmer enjoys static checking via Javas strong type system. However, the Java type system does little to check for possible errors in the dynamically generated SQL query strings. Thus, a type error in a generated selection query (e.g., comparing a string attribute with an integer) can result in an SQL runtime exception. Currently, such defects must be rooted out through careful testing, or (worse) might be found by customers at runtime. In this paper, we present a sound, static, program analysis technique to verify the correctness of dynamically generated query strings. We describe our analysis technique and provide soundness results for our static analysis algorithm. We also describe the details of a prototype tool based on the algorithm and present several illustrative defects found in senior software-engineering student-team projects, online tutorial examples, and a real-world purchase order system written by one of the authors.

international conference on software engineering | 2008

Scalable detection of semantic clones

Mark Gabel; Lingxiao Jiang; Zhendong Su

Several techniques have been developed for identifying similar code fragments in programs. These similar fragments, referred to as code clones, can be used to identify redundant code, locate bugs, or gain insight into program design. Existing scalable approaches to clone detection are limited to finding program fragments that are similar only in their contiguous syntax. Other, semantics-based approaches are more resilient to differences in syntax, such as reordered statements, related statements interleaved with other unrelated statements, or the use of semantically equivalent control structures. However, none of these techniques have scaled to real world code bases. These approaches capture semantic information from Program Dependence Graphs (PDGs), program representations that encode data and control dependencies between statements and predicates. Our definition of a code clone is also based on this representation: we consider program fragments with isomorphic PDGs to be clones. In this paper, we present the first scalable clone detection algorithm based on this definition of semantic clones. Our insight is the reduction of the difficult graph similarity problem to a simpler tree similarity problem by mapping carefully selected PDG subgraphs to their related structured syntax. We efficiently solve the tree similarity problem to create a scalable analysis. We have implemented this algorithm in a practical tool and performed evaluations on several million-line open source projects, including the Linux kernel. Compared with previous approaches, our tool locates significantly more clones, which are often more semantically interesting than simple copied and pasted code fragments.

programming language design and implementation | 1998

Partial online cycle elimination in inclusion constraint graphs

Manuel Fähndrich; Jeffrey S. Foster; Zhendong Su; Alex Aiken

Many program analyses are naturally formulated and implemented using inclusion constraints. We present new results on the scalable implementation of such analyses based on two insights: first, that online elimination of cyclic constraints yields orders-of-magnitude improvements in analysis time for large problems; second, that the choice of constraint representation affects the quality and efficiency of online cycle elimination. We present an analytical model that explains our design choices and show that the models predictions match well with results from a substantial experiment.

computer and communications security | 2005

On deriving unknown vulnerabilities from zero-day polymorphic and metamorphic worm exploits

Jedidiah R. Crandall; Zhendong Su; S. Felix Wu; Frederic T. Chong

Vulnerabilities that allow worms to hijack the control flow of each host that they spread to are typically discovered months before the worm outbreak, but are also typically discovered by third party researchers. A determined attacker could discover vulnerabilities as easily and create zero-day worms for vulnerabilities unknown to network defenses. It is important for an analysis tool to be able to generalize from a new exploit observed and derive protection for the vulnerability.Many researchers have observed that certain predicates of the exploit vector must be present for the exploit to work and that therefore these predicates place a limit on the amount of polymorphism and metamorphism available to the attacker. We formalize this idea and subject it to quantitative analysis with a symbolic execution tool called DACODA. Using DACODA we provide an empirical analysis of 14 exploits (seven of them actual worms or attacks from the Internet, caught by Minos with no prior knowledge of the vulnerabilities and no false positives observed over a period of six months) for four operating systems.Evaluation of our results in the light of these two models leads us to conclude that 1) single contiguous byte string signatures are not effective for content filtering, and token-based byte string signatures composed of smaller substrings are only semantically rich enough to be effective for content filtering if the vulnerability lies in a part of a protocol that is not commonly used, and that 2) practical exploit analysis must account for multiple processes, multithreading, and kernel processing of network data necessitating a focus on primitives instead of vulnerabilities.

Explore More