Is this you? Create Your Porfile

Junghee Lim

University of Wisconsin-Madison

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Junghee Lim is active.

Explore More

Publication

Featured researches published by Junghee Lim.

architectural support for programming languages and operating systems | 2011

ConSeq: detecting concurrency bugs through sequential errors

Wei Zhang; Junghee Lim; Ramya Olichandran; Joel Scherpelz; Guoliang Jin; Shan Lu; Thomas W. Reps

Concurrency bugs are caused by non-deterministic interleavings between shared memory accesses. Their effects propagate through data and control dependences until they cause software to crash, hang, produce incorrect output, etc. The lifecycle of a bug thus consists of three phases: (1) triggering, (2) propagation, and (3) failure. Traditional techniques for detecting concurrency bugs mostly focus on phase (1)--i.e., on finding certain structural patterns of interleavings that are common triggers of concurrency bugs, such as data races. This paper explores a consequence-oriented approach to improving the accuracy and coverage of state-space search and bug detection. The proposed approach first statically identifies potential failure sites in a program binary (i.e., it first considers a phase (3) issue). It then uses static slicing to identify critical read instructions that are highly likely to affect potential failure sites through control and data dependences (phase (2)). Finally, it monitors a single (correct) execution of a concurrent program and identifies suspicious interleavings that could cause an incorrect state to arise at a critical read and then lead to a software failure (phase (1)). ConSeqs backwards approach, (3)!(2)!(1), provides advantages in bug-detection coverage and accuracy but is challenging to carry out. ConSeq makes it feasible by exploiting the empirical observationthat phases (2) and (3) usually are short and occur within one thread. Our evaluation on large, real-world C/C++ applications shows that ConSeq detects more bugs than traditional approaches and has a much lower false-positive rate.

embedded software | 2004

Compiler-assisted demand paging for embedded systems with flash memory

Chanik Park; Junghee Lim; Kiwon Kwon; Jaejin Lee; Sang Lyul Min

In this paper, we propose a novel, application specific demand paging mechanism for low-end embedded systems with flash memory as secondary storage. These systems are not equipped with virtual memory. A small memory space called an execution buffer is allocated to page an application. An application-specific page manager manages the buffer. The manager is generated by a compiler post-pass and combined with the application image. Our compiler post-pass analyzes the ELF executable image of an application and transforms function call/return instructions into calls to the page manager. As a result, each function of the code can be loaded into memory on demand at run time. To minimize the overhead of demand paging, code clustering algorithms are also presented. We evaluate our techniques with five embedded applications. We show that our approach can reduce the code memory size by 33% on average with reasonable performance degradation (8-20%) and energy consumption (10% more on average) for low-end embedded systems.

partial evaluation and semantic-based program manipulation | 2006

Intermediate-representation recovery from low-level code

Thomas W. Reps; Gogul Balakrishnan; Junghee Lim

The goal of our work is to create tools that an analyst can use to understand the workings of COTS components, plugins, mobile code, and DLLs, as well as memory snapshots of worms and virus-infected code. This paper describes how static analysis provides techniques that can be used to recover intermediate representations that are similar to those that can be created for a program written in a high-level language.

computer aided verification | 2010

Directed proof generation for machine code

Aditya V. Thakur; Junghee Lim; Akash Lal; Amanda Burton; Evan Driscoll; Matt Elder; Tycho Andersen; Thomas W. Reps

We present the algorithms used in McVeto (Machine-Code VErification TOol), a tool to check whether a stripped machine-code program satisfies a safety property The verification problem that McVeto addresses is challenging because it cannot assume that it has access to (i) certain structures commonly relied on by source-code verification tools, such as control-flow graphs and call-graphs, and (ii) meta-data, such as information about variables, types, and aliasing It cannot even rely on out-of-scope local variables and return addresses being protected from the programs actions What distinguishes McVeto from other work on software model checking is that it shows how verification of machine-code can be performed, while avoiding conventional techniques that would be unsound if applied at the machine-code level.

working conference on reverse engineering | 2006

Extracting Output Formats from Executables

Junghee Lim; Thomas W. Reps; Ben Liblit

We describe the design and implementation of FFE/x86 (File-Format Extractor for x86), an analysis tool that works on stripped executables (i.e., neither source code nor debugging information need be available) and extracts output data formats, such as file formats and network packet formats. We first construct a hierarchical finite state machine (HFSM) that over-approximates the output data format. An HFSM defines a language over the operations used to generate output data. We use value-set analysis (VSA) and aggregate structure identification (ASI) to annotate HFSMs with information that partially characterizes some of the output data values. VSA determines an over-approximation of the set of addresses and integer values that each data object can hold at each program point, and ASI analyzes memory accesses in the program to recover information about the structure of aggregates. A series of filtering operations is performed to over-approximate an HFSM with a finite-state machine, which can result in a final answer that is easier to understand. Our experiments with FFE/x86 uncovered a possible bug in the image-conversion utility png2ico

computer aided verification | 2005

Model checking x86 executables with codesurfer/x86 and WPDS++

Gogul Balakrishnan; Thomas W. Reps; Nicholas Kidd; Akash Lal; Junghee Lim; David Melski; Radu Gruian; Suan Hsi Yong; Chi-Hua Chen; Tim Teitelbaum

This paper presents a toolset for model checking x86 executables. The members of the toolset are CodeSurfer/x86, WPDS++, and the Path Inspector. CodeSurfer/x86 is used to extract a model from an executable in the form of a weighted pushdown system. WPDS++ is a library for answering generalized reachability queries on weighted pushdown systems. The Path Inspector is a software model checker built on top of CodeSurfer and WPDS++ that supports safety queries about the programs possible control configurations.

compiler construction | 2008

A system for generating static analyzers for machine instructions

Junghee Lim; Thomas W. Reps

This paper describes the design and implementation of a language for specifying the semantics of an instruction set, along with a run-time system to support the static analysis of executables written in that instruction set. The work advances the state of the art by creating multiple analysis phases from a specification of the concrete operational semantics of the language to be analyzed.

ACM Transactions on Programming Languages and Systems | 2013

TSL: A System for Generating Abstract Interpreters and its Application to Machine-Code Analysis

Junghee Lim; Thomas W. Reps

This article describes the design and implementation of a system, called TSL (for Transformer Specification Language), that provides a systematic solution to the problem of creating retargetable tools for analyzing machine code. TSL is a tool generator---that is, a metatool---that automatically creates different abstract interpreters for machine-code instruction sets. The most challenging technical issue that we faced in designing TSL was how to automate the generation of the set of abstract transformers for a given abstract interpretation of a given instruction set. From a description of the concrete operational semantics of an instruction set, together with the datatypes and operations that define an abstract domain, TSL automatically creates the set of abstract transformers for the instructions of the instruction set. TSL advances the state-of-the-art in program analysis because it provides two dimensions of parameterizability: (i) a given analysis component can be retargeted to different instruction sets; (ii) multiple analysis components can be created automatically from a single specification of the concrete operational semantics of the language to be analyzed. TSL is an abstract-transformer-generator generator. The article describes the principles behind TSL, and discusses how one uses TSL to develop different abstract interpreters.

asian symposium on programming languages and systems | 2005

A next-generation platform for analyzing executables

Thomas W. Reps; Gogul Balakrishnan; Junghee Lim; Tim Teitelbaum

In recent years, there has been a growing need for tools that an analyst can use to understand the workings of COTS components, plugins, mobile code, and DLLs, as well as memory snapshots of worms and virus-infected code. Static analysis provides techniques that can help with such problems; however, there are several obstacles that must be overcome: – For many kinds of potentially malicious programs, symbol-table and debugging information is entirely absent. Even if it is present, it cannot be relied upon. – To understand memory-access operations, it is necessary to determine the set of addresses accessed by each operation. This is difficult because While some memory operations use explicit memory addresses in the instruction (easy), others use indirect addressing via address expressions (difficult). Arithmetic on addresses is pervasive. For instance, even when the value of a local variable is loaded from its slot in an activation record, address arithmetic is performed. There is no notion of type at the hardware level, so address values cannot be distinguished from integer values. Memory accesses do not have to be aligned, so word-sized address values could potentially be cobbled together from misaligned reads and writes. We have developed static-analysis algorithms to recover information about the contents of memory locations and how they are manipulated by an executable. By combining these analyses with facilities provided by the IDAPro and CodeSurfer toolkits, we have created CodeSurfer/x86, a prototype tool for browsing, inspecting, and analyzing x86 executables. From an x86 executable, CodeSurfer/x86 recovers intermediate representations that are similar to what would be created by a compiler for a program written in a high-level language. CodeSurfer/x86 also supports a scripting language, as well as several kinds of sophisticated pattern-matching capabilities. These facilities provide a platform for the development of additional tools for analyzing the security properties of executables.

ACM Transactions on Programming Languages and Systems | 2014

Abstract Domains of Affine Relations

Matt Elder; Junghee Lim; Tushar Sharma; Tycho Andersen; Thomas W. Reps

This article considers some known abstract domains for affine-relation analysis (ARA), along with several variants, and studies how they relate to each other. The various domains represent sets of points that satisfy affine relations over variables that hold machine integers and are based on an extension of linear algebra to modules over a ring (in particular, arithmetic performed modulo 2w, for some machine-integer width w). We show that the abstract domains of Müller-Olm/Seidl (MOS) and King/Søndergaard (KS) are, in general, incomparable. However, we give sound interconversion methods. In other words, we give an algorithm to convert a KS element vKS to an overapproximating MOS element vMOS—that is, γ (vKS) ⊆ γ (vMOS—as well as an algorithm to convert an MOS element wMOS to an overapproximating KS element wKS—that is, γ (wMOS) ⊆ γ (wKS). The article provides insight on the range of options that one has for performing ARA in a program analyzer: —We describe how to perform a greedy, operator-by-operator abstraction method to obtain KS abstract transformers. —We also describe a more global approach to obtaining KS abstract transformers that considers the semantics of an entire instruction, basic block, or other loop-free program fragment. The latter method can yield best abstract transformers, and hence can be more precise than the former method. However, the latter method is more expensive. We also explain how to use the KS domain for interprocedural program analysis using a bit-precise concrete semantics, but without bit blasting.

Explore More