Juan Zhai | Researchain

Publication

Featured research published by Juan Zhai.

International Conference on Software Engineering | 2016

Automatic model generation from documentation for Java API functions

Juan Zhai; Jianjun Huang; Shiqing Ma; Xiangyu Zhang; Lin Tan; Jianhua Zhao; Feng Qin

Modern software systems are becoming increasingly complex, relying on a lot of third-party library support. Library behaviors are hence an integral part of software behaviors. Analyzing them is as important as analyzing the software itself. However, analyzing libraries is highly challenging due to the lack of source code, implementation in different languages, and complex optimizations. We observe that many Java library functions provide excellent documentation, which concisely describes the functionalities of the functions. We develop a novel technique that can construct models for Java API functions by analyzing the documentation. These models are simpler implementations in Java compared to the original ones and hence easier to analyze. More importantly, they provide the same functionalities as the original functions. Our technique successfully models 326 functions from 14 widely used Java classes. We also use these models in static taint analysis on Android apps and dynamic slicing for Java programs, demonstrating the effectiveness and efficiency of our models.

Foundations of Software Engineering | 2017

LAMP: data provenance for graph based machine learning algorithms through derivative computation

Shiqing Ma; Yousra Aafer; Zhaogui Xu; Wen-Chuan Lee; Juan Zhai; Yingqi Liu; Xiangyu Zhang

Data provenance tracking determines the set of inputs related to a given output. It enables quality control and problem diagnosis in data engineering. Most existing techniques work by tracking program dependencies. They cannot quantitatively assess the importance of related inputs, which is critical to machine learning algorithms, in which an output tends to depend on a huge set of inputs while only some of them are of importance. In this paper, we propose LAMP, a provenance computation system for machine learning algorithms. Inspired by automatic differentiation (AD), LAMP quantifies the importance of an input for an output by computing the partial derivative. LAMP separates the original data processing and the more expensive derivative computation to different processes to achieve cost-effectiveness. In addition, it allows quantifying importance for inputs related to discrete behavior, such as control flow selection. The evaluation on a set of real world programs and data sets illustrates that LAMP produces more precise and succinct provenance than program dependence based techniques, with much less overhead. Our case studies demonstrate the potential of LAMP in problem diagnosis in data engineering.
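
As a rough illustration of the derivative idea in this abstract (not LAMP's actual implementation), the Python sketch below uses forward-mode automatic differentiation with dual numbers to score how much each input influences an output. The names `Dual`, `weighted_score`, and `input_importance` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value paired with its derivative with respect to one chosen input."""
    value: float
    grad: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.value + other.value, self.grad + other.grad)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (u * v)' = u * v' + u' * v
        return Dual(self.value * other.value,
                    self.value * other.grad + self.grad * other.value)

    __rmul__ = __mul__


def _lift(x):
    """Wrap a plain number as a constant (zero derivative)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def weighted_score(inputs, weights):
    """Toy computation standing in for a learning algorithm's output."""
    total = Dual(0.0)
    for x, w in zip(inputs, weights):
        total = total + x * w
    return total


def input_importance(f, values):
    """Absolute partial derivative of f's output with respect to each input."""
    importances = []
    for i in range(len(values)):
        # Seed input i with derivative 1 and every other input with 0.
        seeded = [Dual(v, 1.0 if j == i else 0.0) for j, v in enumerate(values)]
        importances.append(abs(f(seeded).grad))
    return importances


weights = [0.5, 0.0, 2.0]
scores = input_importance(lambda xs: weighted_score(xs, weights), [3.0, 4.0, 5.0])
```

In this toy setting, dependence tracking would report all three inputs as related to the output, whereas the derivative scores show that the second input has no influence and the third has the most.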

SETTA 2015 Proceedings of the First International Symposium on Dependable Software Engineering: Theories, Tools, and Applications - Volume 9409 | 2015

Assertion-Directed Precondition Synthesis for Loops over Data Structures

Juan Zhai; Hanfei Wang; Jianhua Zhao

Program verification typically generates verification conditions for a program to be proven and then uses a theorem prover to prove their correctness. These verification conditions are normally generated by means of weakest-precondition calculus. Nevertheless, the weakest-precondition calculus faces a big challenge when dealing with loops. In this paper, we propose a framework that automatically generates preconditions for loops that iterate over commonly-used data structures. The preconditions are generated based on given assertions of loops and they are proved to be strong enough to ensure those given assertions hold. The data structures dealt with in our framework include one-dimensional arrays, acyclic singly-linked lists, doubly-linked lists and static lists. Such loops usually achieve their final results by focusing on one element in each iteration. In many such cases, the given assertion and the corresponding precondition of the loop separately reflect the part and the whole or vice versa. Inspired by this, our framework automatically generates precondition candidates for loops by transforming a given assertion. Then the framework uses the SMT solver Z3 and the weakest-precondition calculator for non-loop statements provided in the interactive code-verification tool Accumulator to check whether they are strong enough to prove the given assertion. The framework has been integrated into the tool Accumulator to generate suitable preconditions for loops, which greatly relieves the burden of manually providing preconditions for loops.
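
The part-and-whole relationship can be sketched in Python under simplifying assumptions. Here the in-loop assertion `a[i] != 0` (the part) is generalized into the candidate precondition "every element of `a` is nonzero" (the whole). The framework described above checks candidates with Z3 and Accumulator; this sketch swaps in bounded exhaustive testing over small arrays, which is far weaker than a proof. All function names are hypothetical.

```python
from itertools import product


def assertion_holds_in_every_iteration(a):
    """Simulate the loop; the in-loop assertion is a[i] != 0 (the 'part')."""
    for i in range(len(a)):
        if a[i] == 0:       # the assertion a[i] != 0 is violated
            return False
        _ = 10 // a[i]      # loop body that relies on the assertion
    return True


def whole_candidate(a):
    """Candidate obtained by generalizing the assertion over all indices."""
    return all(x != 0 for x in a)


def first_only_candidate(a):
    """A deliberately weak candidate that constrains only the first element."""
    return len(a) == 0 or a[0] != 0


def strong_enough(precondition, max_len=3, domain=(-1, 0, 1)):
    """Bounded check: no small array satisfies the candidate yet breaks the assertion."""
    for n in range(max_len + 1):
        for a in product(domain, repeat=n):
            if precondition(a) and not assertion_holds_in_every_iteration(a):
                return False
    return True
```

On this example the generalized candidate passes the bounded check, while the weaker candidate is rejected by a counterexample such as `(1, 0)`.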

Automated Software Engineering | 2018

Dual-force: understanding WebView malware via cross-language forced execution

Zhenhao Tang; Juan Zhai; Minxue Pan; Yousra Aafer; Shiqing Ma; Xiangyu Zhang; Jianhua Zhao

Modern Android malware tends to use advanced techniques to hide its malicious behavior. It usually features multi-staged, condition-guarded, and environment-specific payloads. A growing share of it uses WebView, particularly the two-way communication between Java and JavaScript, to evade detection and analysis by existing techniques. We propose Dual-Force, a forced execution technique that simultaneously forces both the Java and JavaScript code of WebView applications to execute along various paths without requiring any environment setup or manually provided inputs. As such, the hidden payloads of WebView malware are forcefully exposed. The technique features a novel execution model that allows forced execution to suppress exceptions and continue execution. Experimental results show that Dual-Force precisely exposes malicious payloads in 119 out of 150 WebView malware samples. Compared to the state of the art, Dual-Force can expose 23% more malicious behaviors.

2017 IEEE International Conference on Software Quality, Reliability and Security (QRS) | 2017

Are Your Classes Well-Encapsulated? Encapsulation Analysis for Java

Zhenhao Tang; Juan Zhai; Bin Li; Jianhua Zhao

Encapsulation is one of the basic characteristics of object-oriented programming. However, the access modifiers provided by common object-oriented languages do not help much because they only encapsulate the member references rather than the objects pointed to by them. Bad encapsulation makes object-oriented programs difficult to understand and reason about, thus concealing potential software vulnerabilities. We present in this paper the encapsulation analysis technique, which is an expression-based dataflow analysis, to statically compute the runtime memory layouts of object-oriented programs. The analysis results can help developers gain an intuitive understanding of code quality with respect to the encapsulation of classes. The results of experiments on various open-source Java projects and libraries show that our approach is both effective (25.76% of the classes are reported as not fully encapsulated) and efficient (2.15 KLOC/s and 27.35 classes/s) in finding potential encapsulation problems. We also give common guidance on how to achieve better encapsulation for object-oriented programs.
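
The point that hiding a reference does not hide the object it points to can be shown with a small Python sketch. Python has no enforced access modifiers, so the leading underscore stands in for Java's `private` here; the classes `LeakyStack` and `EncapsulatedStack` are hypothetical examples, not taken from the paper.

```python
class LeakyStack:
    """The field is 'private', but the getter hands out the internal list itself."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def items(self):
        return self._items          # callers receive an alias to internal state


class EncapsulatedStack:
    """Same interface, but the getter returns a defensive copy."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def items(self):
        return list(self._items)    # callers can only modify their own copy


leaky = LeakyStack()
leaky.push(1)
leaky.items().clear()               # mutates the stack's internal list

safe = EncapsulatedStack()
safe.push(1)
safe.items().clear()                # mutates only the returned copy
```

After these calls the leaky stack is empty even though `push` was the only intended way to change it, while the encapsulated stack still holds its element.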

2016 IEEE International Conference on Software Quality, Reliability and Security (QRS) | 2016

Automatic Invariant Synthesis for Arrays in Simple Programs

Bin Li; Zhenhao Tang; Juan Zhai; Jianhua Zhao

This paper proposes a way of using abstract interpretation for discovering properties about array contents in programs which manipulate arrays by sequential traversal. The method summarizes an array property as a universally quantified property. It directly treats invariant properties (including universally quantified formulas and atomic formulas) as abstract domains. Our method is sound and converges in finite time, and it is flexible. The method has been used to automatically discover nontrivial invariants for several examples. In particular, the method can represent and process multidimensional array properties.

2016 IEEE International Conference on Software Quality, Reliability and Security (QRS) | 2016

Precondition Calculation for Loops Iterating over Data Structures

Juan Zhai; Bin Li; Zhenhao Tang; Jianhua Zhao; Xuandong Li

Precondition calculation is a fundamental program verification technique. Many previous works have tried to solve this problem, but their capability is limited by loop statements. We conducted a survey on loops manipulating commonly-used data structures occurring in several real-world open-source programs, and found that about 80% of such loops iterate over elements of a data structure, indicating that automatic calculation of preconditions with respect to post-conditions of these loops would cover a great number of real-world programs and greatly ease code verification tasks. In this paper, we specify the execution effect of a program statement using the memories modified by the statement and the new values stored in these memories after executing the statement. Thus, conditional statements and loop statements can be uniformly reduced to a sequence of assignments. We also present an approach to calculate preconditions with respect to given post-conditions of various program statements, including loops that iterate over elements of commonly-used data structures (e.g., acyclic singly-linked lists), based on the execution effects of these statements. With execution effects, post-conditions and loop invariants can also be generated. Our approach handles various types of data including numeric, boolean, arrays and user-defined structures. We have implemented the approach and integrated it into the code verification tool, Accumulator. We also evaluated the approach with a variety of programs, and the results show that our approach is able to calculate preconditions for different kinds of post-conditions, including linear ones and universally quantified ones. Preconditions generated with our approach can ease the verification task by reducing the burden of providing loop invariants and preconditions of loop statements manually, which raises the level of automation and efficiency and makes the verification less error-prone.
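
Once statements are reduced to a sequence of assignments, a precondition can be computed by working backwards from the post-condition. The Python sketch below shows only the textbook substitution rule for straight-line assignments; it does not reproduce the paper's execution-effect representation or its handling of loops. States are modeled as dictionaries, predicates as functions over states, and the names `assign` and `wp` are hypothetical.

```python
def assign(var, expr):
    """An assignment `var := expr(state)`, described by what it writes."""
    return (var, expr)


def wp(statements, post):
    """Precondition of a straight-line assignment sequence for `post`."""
    pre = post
    for var, expr in reversed(statements):
        # Evaluate the later predicate in the state updated by this assignment.
        # The outer lambda binds q, v, e so each closure keeps its own statement.
        pre = (lambda q, v, e: lambda s: q({**s, v: e(s)}))(pre, var, expr)
    return pre


# Program: x := x + 1; y := x * 2      Post-condition: y > 10
program = [
    assign("x", lambda s: s["x"] + 1),
    assign("y", lambda s: s["x"] * 2),
]
post = lambda s: s["y"] > 10
pre = wp(program, post)
```

For this program the computed precondition is equivalent to `(x + 1) * 2 > 10`, so it holds in a state with `x = 5` and fails with `x = 4`.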

Asia-Pacific Symposium on Internetware | 2015

Analyzing Inductively Defined Properties for Recursive Data Structures

Zhenhao Tang; Hanfei Wang; Bin Li; Juan Zhai; Jianhua Zhao; Xuandong Li

This paper proposes a framework that facilitates the analysis of inductively defined properties for recursive data structures. Our work has three main parts. First, it helps simplify the analysis of heap-manipulating programs by classifying inductive properties of recursive data structures into two classes, each of which is handled with observed patterns. Second, we propose a technique called slicing and splicing to track and specify how data structures are manipulated by programs, in which data structures are first sliced into several parts and these parts are further spliced into new data structures. Third, this work presents a property-directed interprocedural analysis, together with an algorithm to check the boundaries of modified procedure-local heaps regarding the recursive data structures pointed to by the parameters passed to the procedures.

2015 IEEE International Conference on Software Quality, Reliability and Security | 2015

Node-Set Analysis for Linked Recursive Data Structures

Zhenhao Tang; Hanfei Wang; Bin Li; Juan Zhai; Jianhua Zhao; Xuandong Li

The technique presented in this paper concerns the problem of how to automatically obtain the specification of the resulting set of reachable nodes after destructive operations over a data structure. Our work has two main contributions. First, we represent the node sets by expressions at pre-states, which facilitates the process of proving complex formulas efficiently. Second, we propose an improved version of the fold/unfold technique, which handles more general cases. Our approach is based on the observation that structure-update statements manipulate recursive data structures in a slice-and-splice way: the original data structures are sliced into several disjoint segments and these segments are further spliced into new data structures. Case studies on typical data structures show that our approach is effective and practical.

Network and Distributed System Security Symposium | 2018