Is this you? Create Your Porfile

Pei Wang

Pennsylvania State University

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Pei Wang is active.

Explore More

Publication

Featured researches published by Pei Wang.

international conference on software engineering | 2017

LibD: scalable and precise third-party library detection in android markets

Menghao Li; Wei Wang; Pei Wang; Shuai Wang; Dinghao Wu; Jian Liu; Rui Xue; Wei Huo

With the thriving of the mobile app markets, third-party libraries are pervasively integrated in the Android applications. Third-party libraries provide functionality such as advertisements, location services, and social networking services, making multi-functional app development much more productive. However, the spread of vulnerable or harmful third-party libraries may also hurt the entire mobile ecosystem, leading to various security problems. The Android platform suffers severely from such problems due to the way its ecosystem is constructed and maintained. Therefore, third-party Android library identification has emerged as an important problem which is the basis of many security applications such as repackaging detection and malware analysis. According to our investigation, existing work on Android library detection still requires improvement in many aspects, including accuracy and obfuscation resilience. In response to these limitations, we propose a novel approach to identifying third-party Android libraries. Our method utilizes the internal code dependencies of an Android app to detect and classify library candidates. Different from most previous methods which classify detected library candidates based on similarity comparison, our method is based on feature hashing and can better handle code whose package and method names are obfuscated. Based on this approach, we have developed a prototypical tool called LibD and evaluated it with an update-to-date and large-scale dataset. Our experimental results on 1,427,395 apps show that compared to existing tools, LibD can better handle multi-package third-party libraries in the presence of name-based obfuscation, leading to significantly improved precision without the loss of scalability.

Proceedings of the 4th International Workshop on Managing Technical Debt | 2013

Generating precise dependencies for large software

Pei Wang; Jingiu Yang; Lin Tan; Robert Kroeger; J. David Morgenthaler

Intra- and inter-module dependencies can be a significant source of technical debt in the long-term software development, especially for large software with millions of lines of code. This paper designs and implements a precise and scalable tool that extracts code dependencies and their utilization for large C/C++ software projects. The tool extracts both symbol-level and module-level dependencies of a software system and identifies potential underutilized and inconsistent dependencies. Such information points to potential refactoring opportunities and help developers perform large-scale refactoring tasks.

ieee international conference on software analysis evolution and reengineering | 2016

UROBOROS: Instrumenting Stripped Binaries with Static Reassembling

Shuai Wang; Pei Wang; Dinghao Wu

Software instrumentation techniques are widely used in program analysis tasks such as program profiling, vulnerability discovering, and security-oriented transforming. In this paper, we present an instrumentation tool called UROBOROS, which supports static instrumentation on stripped binaries. Due to the lack of relocation and debug information, reverse engineering of stripped binaries is challenging. Compared with the previous work, UROBOROS can provide complete, easy-to-use, transparent, and efficient static instrumentation on stripped binaries. UROBOROS supports complete instrumentation by statically recovering the relocatable program (including both code and data sections) and the control flow structures from binary code. UROBOROS provides a rich API to access and manipulate different levels of the program tructure. The instrumentation facilities of UROBOROS are easy-to-use, users with no binary rewriting and patching skills can directly manipulate stripped binaries to perform smooth program transformations. Distinguished from most instrumentation tools that need to patch the instrumentation code as new sections, UROBOROS can directly inline the instrumentation code into the disassembled program, which provides transparent instrumentation on stripped binaries. For efficiency, in the rewritten output of existing tools, frequent control transfers between the attached and original sections can incur a considerable performance penalty. However, the output from UROBOROS incurs no extra cost because the original and instrumentation code are connected by fall-through transfers. We perform comparative evaluations between UROBOROS and the state-of-the-art binary instrumentation tools, including DynInst and Pin. To demonstrate the versatility of UROBOROS, we also implement two real-world reengineering tasks which could be challenging for other instrumentation tools to accomplish. Our experimental results show that UROBOROS outperforms theexisting binary instrumentation tools with better performance, lower labor cost, and a broader scope of applications.

computer and communications security | 2016

CREDAL: Towards Locating a Memory Corruption Vulnerability with Your Core Dump

Jun Xu; Dongliang Mu; Ping Chen; Xinyu Xing; Pei Wang; Peng Liu

After a program has crashed and terminated abnormally, it typically leaves behind a snapshot of its crashing state in the form of a core dump. While a core dump carries a large amount of information, which has long been used for software debugging, it barely serves as informative debugging aids in locating software faults, particularly memory corruption vulnerabilities. A memory corruption vulnerability is a special type of software faults that an attacker can exploit to manipulate the content at a certain memory. As such, a core dump may contain a certain amount of corrupted data, which increases the difficulty in identifying useful debugging information (e.g. , a crash point and stack traces). Without a proper mechanism to deal with this problem, a core dump can be practically useless for software failure diagnosis. In this work, we develop CREDAL, an automatic tool that employs the source code of a crashing program to enhance core dump analysis and turns a core dump to an informative aid in tracking down memory corruption vulnerabilities. Specifically, CREDAL systematically analyzes a core dump potentially corrupted and identifies the crash point and stack frames. For a core dump carrying corrupted data, it goes beyond the crash point and stack trace. In particular, CREDAL further pinpoints the variables holding corrupted data using the source code of the crashing program along with the stack frames. To assist software developers (or security analysts) in tracking down a memory corruption vulnerability, CREDAL also performs analysis and highlights the code fragments corresponding to data corruption. To demonstrate the utility of CREDAL, we use it to analyze 80 crashes corresponding to 73 memory corruption vulnerabilities archived in Offensive Security Exploit Database. We show that, CREDAL can accurately pinpoint the crash point and (fully or partially) restore a stack trace even though a crashing program stack carries corrupted data. In addition, we demonstrate CREDAL can potentially reduce the manual effort of finding the code fragment that is likely to contain memory corruption vulnerabilities.

international conference on software maintenance | 2017

Semantics-Aware Machine Learning for Function Recognition in Binary Code

Shuai Wang; Pei Wang; Dinghao Wu

Function recognition in program binaries serves as the foundation for many binary instrumentation and analysis tasks. However, as binaries are usually stripped before distribution, function information is indeed absent in most binaries. By far, identifying functions in stripped binaries remains a challenge. Recent research work proposes to recognize functions in binary code through machine learning techniques. The recognition model, including typical function entry point patterns, is automatically constructed through learning. However, we observed that as previous work only leverages syntax-level features to train the model, binary obfuscation techniques can undermine the pre-learned models in real-world usage scenarios. In this paper, we propose FID, a semantics-based method to recognize functions in stripped binaries. We leverage symbolic execution to generate semantic information and learn the function recognition model through well-performing machine learning techniques.FID extracts semantic information from binary code and, therefore, is effectively adapted to different compilers and optimizations. Moreover, we also demonstrate that FID has high recognition accuracy on binaries transformed by widely-used obfuscation techniques. We evaluate FID with over four thousand test cases. Our evaluation shows that FID is comparable with previous work on normal binaries and it notably outperforms existing tools on obfuscated code.

international conference on software maintenance | 2017

Composite Software Diversification

Shuai Wang; Pei Wang; Dinghao Wu

Many techniques of software vulnerability exploitation rely on deep and comprehensive analysis of vulnerable program binaries. If a copy of the vulnerable software is available to attackers, they can compose their attack scripts and payloads by studying the sample copy and launch attacks on other copies of the same software in deployment. By transforming software into different forms before deployment, software diversification is considered as an effective mitigation of attacks originated from malicious binary analyses.Essentially, developing a software diversification transformation is nontrivial because it has to preserve the original functionality, provide strong enough unpredictability, and introduce negligible cost. Enlightened by research in other areas, we seek to apply different diversification transformations to the same program for a synergy effect such that the resulting hybrid transformations can have boosted diversification effects with modest cost. We name this approach the composite software diversification.Although the concept is straightforward, it becomes challenging when searching for satisfactory compositions of primitive transformations that maximize the synergy effect and make a balance between effectiveness and cost. In this work, we undertake an in-depth study and develop a reasonably well working selection strategy to find a transformation composition that performs better than any single transformation used in the composition. We believe our work can provide guidelines for practitioners who would like to improve the design of diversification tools in the future.

international conference on software engineering | 2018

Software protection on the go: a large-scale empirical study on mobile app obfuscation

Pei Wang; Qinkun Bao; Li Wang; Shuai Wang; Zhaofeng Chen; Tao Wei; Dinghao Wu

The prosperity of smartphone markets has raised new concerns about software security on mobile platforms, leading to a growing demand for effective software obfuscation techniques. Due to various differences between the mobile and desktop ecosystems, obfuscation faces both technical and non-technical challenges when applied to mobile software. Although there have been quite a few software security solution providers launching their mobile app obfuscation services, it is yet unclear how real-world mobile developers perform obfuscation as part of their software engineering practices. Our research takes a first step to systematically studying the deployment of software obfuscation techniques in mobile software development. With the help of an automated but coarse-grained method, we computed the likelihood of an app being obfuscated for over a million app samples crawled from Apple App Store. We then inspected the top 6600 instances and managed to identify 601 obfuscated versions of 539 iOS apps. By analyzing this sample set with extensive manual effort, we made various observations that reveal the status quo of mobile obfuscation in the real world, providing insights into understanding and improving the situation of software protection on mobile platforms.

international conference on software engineering | 2017

Protecting million-user iOS apps with obfuscation: motivations, pitfalls, and experience

Pei Wang; Dinghao Wu; Zhaofeng Chen; Tao Wei

In recent years, mobile apps have become the infrastructure of many popular Internet services. It is now fairly common that a mobile app serves a large number of users across the globe. Different from web-based services whose important program logic is mostly placed on remote servers, many mobile apps require complicated client-side code to perform tasks that are critical to the businesses. The code of mobile apps can be easily accessed by any party after the software is installed on a rooted or jailbroken device. By examining the code, skilled reverse engineers can learn various knowledge about the design and implementation of an app. Real-world cases have shown that the disclosed critical information allows malicious parties to abuse or exploit the app-provided services for unrightful profits, leading to significant financial losses for app vendors. One of the most viable mitigations against malicious reverse engineering is to obfuscate the software before release. Despite that security by obscurity is typically considered to be an unsound protection methodology, software obfuscation can indeed increase the cost of reverse engineering, thus delivering practical merits for protecting mobile apps. In this paper, we share our experience of applying obfuscation to multiple commercial iOS apps, each of which has millions of users. We discuss the necessity of adopting obfuscation for protecting modern mobile business, the challenges of software obfuscation on the iOS platform, and our efforts in overcoming these obstacles. Our report can benefit many stakeholders in the iOS ecosystem, including developers, security service providers, and Apple as the administrator of the ecosystem.

Proceedings of the 2017 Workshop on Forming an Ecosystem Around Software Transformation | 2017

Binary Code Retrofitting and Hardening Using SGX

Shuai Wang; Wenhao Wang; Qinkun Bao; Pei Wang; XiaoFeng Wang; Dinghao Wu

Trusted Execution Environment (TEE) is designed to deliver a safe execution environment for software systems. Intel Software Guard Extensions (SGX) provides isolated memory regions (i.e., SGX enclaves) to protect code and data from adversaries in the untrusted world. While existing research has proposed techniques to execute entire executable files inside enclave instances by providing rich sets of OS facilities, one notable limitation of these techniques is the unavoidably large size of Trusted Computing Base (TCB), which can potentially break the principle of least privilege. In this work, we describe techniques that provide practical and efficient protection of security sensitive code components in legacy binary code. Our technique dissects input binaries into multiple components which are further built into SGX enclave instances. We also leverage deliberately-designed binary editing techniques to retrofit the input binary code and preserve the original program semantics. Our tentative evaluations on hardening AES encryption and decryption procedures demonstrate the practicability and efficiency of the proposed technique.

networked systems design and implementation | 2013