Wei Le | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Wei Le is active.

Explore More

Publication

Featured researches published by Wei Le.

IEEE Transactions on Software Engineering | 2005

Software assurance by bounded exhaustive testing

David Coppit; Jinlin Yang; Sarfraz Khurshid; Wei Le; Kevin J. Sullivan

Bounded exhaustive testing (BET) is a verification technique in which software is automatically tested for all valid inputs up to specified size bounds. A particularly interesting case of BET arises in the context of systems that take structurally complex inputs. Early research suggests that the BET approach can reveal faults in small systems with inputs of low structural complexity, but its potential utility for larger systems with more complex input structures remains unclear. We set out to test its utility on one such system. We used Alloy and TestEra to generate inputs to test the Galileo dynamic fault tree analysis tool, for which we already had both a formal specification of the input space and a test oracle. An initial attempt to generate inputs using a straightforward translation of our specification to Alloy did not work well. The generator failed to generate inputs to meaningful bounds. We developed an approach in which we factored the specification, used TestEra to generate abstract inputs based on one factor, and passed the results through a postprocessor that reincorporated information from the second factor. Using this technique, we were able to generate test inputs to meaningful bounds, and the inputs revealed nontrivial faults in the Galileo implementation, our specification, and our oracle. Our results suggest that BET, combined with specification abstraction and factoring techniques, could become a valuable addition to our verification toolkit and that further investigation is warranted.

foundations of software engineering | 2010

Path-based fault correlations

Wei Le; Mary Lou Soffa

Although a number of automatic tools have been developed to detect faults, much of the diagnosis is still being done manually. To help with the diagnostic tasks, we formally introduce fault correlation, a causal relationship between faults. We statically determine correlations based on the expected dynamic behavior of a fault. If the occurrence of one fault causes another fault to occur, we say they are correlated. With the identification of the correlated faults, we can better understand fault behaviors and the risks of faults. If one fault is uniquely correlated with another, we know fixing the first fault will fix the other. Correlated faults can be grouped, enabling prioritization of diagnoses of the fault groups. In this paper, we develop an interprocedural, path-sensitive, and scalable algorithm to automatically compute correlated faults in a program. In our approach, we first statically detect faults and determine their error states. By propagating the effects of the error state along a path, we detect the correlation of pairs of faults. We automatically construct a correlation graph which shows how correlations occur among multiple faults and along different paths. Guided by a correlation graph, we can reduce the number of faults required for diagnosis to find root causes. We implemented our correlation algorithm and found through experimentation that faults involved in the correlations can be of different types and located in different procedures. Using correlation information, we are able to automate diagnostic tasks that previously had to be done manually.

international conference on software engineering | 2014

Patch verification via multiversion interprocedural control flow graphs

Wei Le; Shannon D. Pattison

Software development is inherently incremental; however, it is challenging to correctly introduce changes on top of existing code. Recent studies show that 15%-24% of the bug fixes are incorrect, and the most important yet hard-to-acquire information for programming changes is whether this change breaks any code elsewhere. This paper presents a framework, called Hydrogen, for patch verification. Hydrogen aims to automatically determine whether a patch correctly fixes a bug, a new bug is introduced in the change, a bug can impact multiple software releases, and the patch is applicable for all the impacted releases. Hydrogen consists of a novel program representation, namely multiversion interprocedural control flow graph (MVICFG), that integrates and compares control flow of multiple versions of programs, and a demand-driven, path-sensitive symbolic analysis that traverses the MVICFG for detecting bugs related to software changes and versions. In this paper, we present the definition, construction and applications of MVICFGs. Our experimental results show that Hydrogen correctly builds desired MVICFGs and is scalable to real-life programs such as libpng, tightvnc and putty. We experimentally demonstrate that MVICFGs can enable efficient patch verification. Using the results generated by Hydrogen, we have found a few documentation errors related to patches for a set of open-source programs.

international symposium on software testing and analysis | 2011

Generating analyses for detecting faults in path segments

Wei Le; Mary Lou Soffa

Although static bug detectors are extensively applied, there is a cost in using them. One challenge is that static analysis often reports a large number of false positives but little diagnostic information. Also, individual bug detectors need to be built in response to new types of faults, and tuning a static tool for precision and scalability is time-consuming. This paper presents a novel frame-work that automatically generates scalable, interprocedural, path-sensitive analyses to detect user-specified faults. The framework consists of a specification technique that expresses faults and information needed for their detection, a scalable, path-sensitive algorithm, and a generator that unifies the two. The analysis produced identifies not only faults but also the path segments where the root causes of a fault are located. The generality of the framework is accomplished for both data- and control-centric faults. We implemented our framework and generated fault detectors for identifying buffer overflows, integer violations, null-pointer dereferences and memory leaks. We experimentally demonstrate that the generated analyses scales to large deployed software, and its detection capability is comparable to tools that target a specific type of fault. In our experiments, we identify a total of 146 faults of the four types. While the length of path segments for the majority of faults is 1--4 procedures, we are able to detect faults deeply embedded in the code across 35 procedures.

mining software repositories | 2014

A code clone oracle

Daniel E. Krutz; Wei Le

Code clones are functionally equivalent code segments. De- tecting code clones is important for determining bugs, fixes and software reuse. Code clone detection is also essential for developing fast and precise code search algorithms. How- ever, the challenge of such research is to evaluate that the clones detected are indeed functionally equivalent, consider- ing the majority of clones are not textual or even syntacti- cally identical. The goal of this work is to generate a set of method level code clones with a high confidence to help to evaluate future code clone detection and code search tools to evaluate their techniques. We selected three open source programs, Apache, Python and PostgreSQL, and randomly sampled a total of 1536 function pairs. To confirm whether or not these function pairs indicate a clone and what types of clones they belong to, we recruited three experts who have experience in code clone research and four students who have experience in programming for manual inspection. For confi- dence of the data, the experts consulted multiple code clone detection tools to make the consensus. To assist manual inspection, we built a tool to automatically load function pairs of interest and record the manual inspection results. We found that none of the 66 pairs are textual identical type- 1 clones, and 9 pairs are type-4 clones. Our data is available at: http://phd.gccis.rit.edu/weile/data/cloneoracle/.

international conference on software engineering | 2013

Segmented symbolic analysis

Wei Le

Symbolic analysis is indispensable for software tools that require program semantic information at compile time. However, determining symbolic values for program variables related to loops and library calls is challenging, as the computation and data related to loops can have statically unknown bounds, and the library sources are typically not available at compile time. In this paper, we propose segmented symbolic analysis, a hybrid technique that enables fully automatic symbolic analysis even for the traditionally challenging code of library calls and loops. The novelties of this work are threefold: 1) we flexibly weave symbolic and concrete executions on the selected parts of the program based on demand; 2) dynamic executions are performed on the unit tests constructed from the code segments to infer program semantics needed by static analysis; and 3) the dynamic information from multiple runs is aggregated via regression analysis. We developed the Helium framework, consisting of a static component that performs symbolic analysis and partitions a program, a dynamic analysis that synthesizes unit tests and automatically infers symbolic values for program variables, and a protocol that enables static and dynamic analyses to be run interactively and concurrently. Our experimental results show that by handling loops and library calls that a traditional symbolic analysis cannot process, segmented symbolic analysis detects 5 times more buffer overflows. The technique is scalable for real-world programs such as putty, tightvnc and snort.

foundations of software engineering | 2016

Proteus: computing disjunctive loop summary via path dependency analysis

Xiaofei Xie; Bihuan Chen; Yang Liu; Wei Le; Xiaohong Li

Loops are challenging structures for program analysis, especially when loops contain multiple paths with complex interleaving executions among these paths. In this paper, we first propose a classification of multi-path loops to understand the complexity of the loop execution, which is based on the variable updates on the loop conditions and the execution order of the loop paths. Secondly, we propose a loop analysis framework, named Proteus, which takes a loop program and a set of variables of interest as inputs and summarizes path-sensitive loop effects on the variables. The key contribution is to use a path dependency automaton (PDA) to capture the execution dependency between the paths. A DFS-based algorithm is proposed to traverse the PDA to summarize the effect for all feasible executions in the loop. The experimental results show that Proteus is effective in three applications: Proteus can 1) compute a more precise bound than the existing loop bound analysis techniques; 2) significantly outperform state-of-the-art tools for loop verification; and 3) generate test cases for deep loops within one second, while KLEE and Pex either need much more time or fail.

international symposium on software testing and analysis | 2015

S-looper: automatic summarization for multipath string loops

Xiaofei Xie; Yang Liu; Wei Le; Xiaohong Li; Hongxu Chen

Loops are important yet most challenging program constructs to analyze for various program analysis tasks. Existing loop analysis techniques mainly handle well loops that contain only integer variables with a single path in the loop body. The key challenge in summarizing a multiple-path loop is that a loop traversal can yield a large number of possibilities due to the different execution orders of these paths located in the loop; when a loop contains a conditional branch related to string content, we potentially need to track every character in the string for loop summarization, which is expensive. In this paper, we propose an approach, named S-Looper, to automatically summarize a type of loops related to a string traversal. This type of loops can contain multiple paths, and the branch conditions in the loop can be related to string content. Our approach is to identify patterns of the string based on the branch conditions along each path in the loop. Based on such patterns, we then generate a loop summary that describes the path conditions of a loop traversal as well as the symbolic values of each variable at the exit of a loop. Combined with vulnerability conditions, we are thus able to generate test inputs that traverse a loop in a specific way and lead to exploitation. Our experiments show that handling such string loops can largely improve the buffer overflow detection capabilities of the existing symbolic analysis tool. We also compared our techniques with KLEE and PEX, and show that we can generate test inputs more effectively and efficiently.

formal methods | 2013

Marple: Detecting faults in path segments using automatically generated analyses

Wei Le; Mary Lou Soffa

Generally, a fault is a property violation at a program point along some execution path. To obtain the path where a fault occurs, we can either run the program or manually identify the execution paths through code inspection. In both of the cases, only a very limited number of execution paths can be examined for a program. This article presents a static framework, Marple, that automatically detects path segments where a fault occurs at a whole program scale. An important contribution of the work is the design of a demand-driven analysis that effectively addresses scalability challenges faced by traditional path-sensitive fault detection. The techniques are made general via a specification language and an algorithm that automatically generates path-based analyses from specifications. The generality is achieved in handling both data- and control-centric faults as well as both liveness and safety properties, enabling the exploitation of fault interactions for diagnosis and efficiency. Our experimental results demonstrate the effectiveness of our techniques in detecting path segments of buffer overflows, integer violations, null-pointer dereferences, and memory leaks. Because we applied an interprocedural, path-sensitive analysis, our static fault detectors generally report better precision than the tools available for comparison. Our demand-driven analyses are shown scalable to deployed applications such as apache, putty, and ffmpeg.

Proceedings of the 2nd Workshop on Software Engineering for Sensor Network Applications | 2011

Lazy preemption to enable path-based analysis of interrupt-driven code

Wei Le; Jing Yang; Mary Lou Soffa; Kamin Whitehouse

One of the important factors in ensuring the correct functionality of wireless sensor networks (WSNs) is the reliability of the software running on individual sensor nodes. Research has shown that path-sensitive static analysis is effective for bug detection and fault diagnosis; however, path-sensitive analysis is prohibitively expensive when applied to a WSN application due to the large state space caused by arbitrary interrupt preemptions. In this paper, we propose a new execution model called lazy preemption that reduces this state space by restricting interrupt handlers to a set of predetermined preemption points, if possible. This execution model allows us to represent the program with an inter-interrupt control flow graph (IICFG), which is easier to analyze than the original CFGs with arbitrary interrupt preemptions.

Explore More