Publication


Featured research published by Shiliang Hu.


international symposium on computer architecture | 2013

QuickRec: prototyping an Intel Architecture extension for record and replay of multithreaded programs

Gilles Pokam; Klaus Danne; Cristiano Pereira; Rolf Kassa; Tim Kranich; Shiliang Hu; Justin E. Gottschlich; Nima Honarmand; Nathan Dautenhahn; Samuel T. King; Josep Torrellas

There has been significant interest in hardware-assisted deterministic Record and Replay (RnR) systems for multithreaded programs on multiprocessors. However, no proposal has implemented this technique in a hardware prototype with full operating system support. Such an implementation is needed to assess RnR practicality. This paper presents QuickRec, the first multicore Intel Architecture (IA) prototype of RnR for multithreaded programs. QuickRec is based on QuickIA, an Intel emulation platform for rapid prototyping of new IA extensions. QuickRec is composed of a Xeon server platform with FPGA-emulated second-generation Pentium cores, and Capo3, a full software stack for managing the recording hardware from within a modified Linux kernel. This paper's focus is understanding and evaluating the implementation issues of RnR on a real platform. Our effort leads to some lessons learned, as well as to some pointers for future research. We demonstrate that RnR can be implemented efficiently on a real multicore IA system. In particular, we show that the rate of memory log generation is insignificant, and that the recording hardware has negligible performance overhead. However, the software stack incurs an average recording overhead of nearly 13%, which must be reduced to enable always-on use of RnR.


international symposium on microarchitecture | 2011

CoreRacer: a practical memory race recorder for multicore x86 TSO processors

Gilles Pokam; Cristiano Pereira; Shiliang Hu; Ali-Reza Adl-Tabatabai; Justin E. Gottschlich; Jungwoo Ha; Youfeng Wu

Shared memory multiprocessors are difficult to program because of the non-deterministic ways in which the memory operations from different threads interleave. To address this issue, many hardware-based memory race recorders have been proposed that efficiently log an ordering of the shared memory interleavings between threads for deterministic replay. These approaches are challenging to integrate into current processors because they change the cache subsystem or the coherence protocol, and they mostly support a sequentially consistent memory model. In this paper, we describe CoreRacer, a chunk-based memory race recorder architecture for multicore x86 TSO processors. CoreRacer does not modify the cache subsystem and yet it still integrates into the x86 TSO memory model. We show that by leveraging a specific x86 feature, the invariant timestamp, CoreRacer maintains ordering among chunks without piggybacking on cache coherence messages. We provide a detailed implementation and evaluation of CoreRacer on a cycle-accurate x86 simulator. We show that its integration cost into x86 is minimal and its overhead has negligible effect on performance.


programming language design and implementation | 2016

Remix: online detection and repair of cache contention for the JVM

Ariel Eizenberg; Shiliang Hu; Gilles Pokam; Joseph Devietti

As ever more computation shifts onto multicore architectures, it is increasingly critical to find effective ways of dealing with multithreaded performance bugs like true and false sharing. Previous approaches to fixing false sharing in unmanaged languages have employed highly-invasive runtime program modifications. We observe that managed language runtimes, with garbage collection and JIT code compilation, present unique opportunities to repair such bugs directly, mirroring the techniques used in manual repairs. We present Remix, a modified version of the Oracle HotSpot JVM which can detect cache contention bugs and repair false sharing at runtime. Remix's detection mechanism leverages recent performance counter improvements on Intel platforms, which allow for precise, unobtrusive monitoring of cache contention at the hardware level. Remix can detect and repair known false sharing issues in the LMAX Disruptor high-performance inter-thread messaging library and the Spring Reactor event-processing framework, automatically providing 1.5-2x speedups over unoptimized code and matching the performance of hand-optimization. Remix also finds a new false sharing bug in SPECjvm2008, and uncovers a true sharing bug in the HotSpot JVM that, when fixed, improves the performance of three NAS Parallel Benchmarks by 7-25x. Remix incurs no statistically-significant performance overhead on other benchmarks that do not exhibit cache contention, making Remix practical for always-on use.


high-performance computer architecture | 2016

LASER: Light, Accurate Sharing dEtection and Repair

Liang Luo; Akshitha Sriraman; Brooke Fugate; Shiliang Hu; Gilles Pokam; Chris J. Newburn; Joseph Devietti

Contention for shared memory, in the forms of true sharing and false sharing, is a challenging performance bug to discover and to repair. Understanding cache contention requires global knowledge of the program's actual sharing behavior, and can even arise invisibly in the program due to the opaque decisions of the memory allocator. Previous schemes have focused only on false sharing, and impose significant performance penalties or require non-trivial alterations to the operating system or runtime system environment. This paper presents the Light, Accurate Sharing dEtection and Repair (LASER) system, which leverages new performance counter capabilities available on Intel's Haswell architecture that identify the source of expensive cache coherence events. Using records of these events generated by the hardware, we build a system for online contention detection and repair that operates with low performance overhead and does not require any invasive program, compiler or operating system changes. Our experiments show that LASER imposes just 2% average runtime overhead on the Phoenix, Parsec and Splash2x benchmarks. LASER can automatically improve the performance of programs by up to 19% on commodity hardware.


international symposium on microarchitecture | 2017

TMI: thread memory isolation for false sharing repair

Christian DeLozier; Ariel Eizenberg; Shiliang Hu; Gilles Pokam; Joseph Devietti

Cache contention in the form of false sharing and true sharing arises when threads overshare cache lines at high frequency. Such oversharing can reduce or negate the performance benefits of parallel execution. Prior systems for detecting and repairing cache contention lack efficiency in detection or repair, contain subtle memory consistency flaws, or require invasive changes to the program environment. In this paper, we introduce a new way to combat cache line oversharing via the Thread Memory Isolation (TMI) system. TMI operates completely in userspace, leveraging performance counters and the Linux ptrace mechanism to tread lightly on monitored applications, intervening only when necessary. TMI’s compatible-by-default design allows it to scale to real-world workloads, unlike previous proposals. TMI introduces a novel code-centric consistency model to handle cross-language memory consistency issues. TMI exploits the flexibility of code-centric consistency to efficiently repair false sharing while preserving strong consistency model semantics when necessary. TMI has minimal impact on programs without oversharing, slowing their execution by just 2% on average. We also evaluate TMI on benchmarks with known false sharing, and manually inject a false sharing bug into the leveldb key-value store from Google. For these programs, TMI provides an average speedup of 5.2x and achieves 88% of the speedup possible with manual source code fixes.

CCS CONCEPTS • Computer systems organization → Multicore architectures; • Software and its engineering → Runtime environments


Archive | 2012

PROCESSOR WITH MEMORY RACE RECORDER TO RECORD THREAD INTERLEAVINGS IN MULTI-THREADED SOFTWARE

Tim Kranich; Gilles Pokam; Justin E. Gottschlich; Klaus Danne; Rolf Kassa; Shiliang Hu; Cristiano Pereira



Archive | 2013

Methods and systems for performing a replay execution

Justin E. Gottschlich; Klaus Danne; Cristiano Pereira; Gilles Pokam; Rolf Kassa; Shiliang Hu; Tim Kranich



Archive | 2014

Processor With Transactional Capability and Logging Circuitry To Report Transactional Operations

Rolf Kassa; Justin E. Gottschlich; Shiliang Hu; Gilles Pokam; Robert C. Knauerhase



Archive | 2013

TECHNIQUES FOR DETECTING RACE CONDITIONS

Shiliang Hu; Gilles Pokam; Cristiano Pereira; Justin E. Gottschlich



Archive | 2012

Methods and systems to identify and reproduce concurrency violations in multi-threaded programs using expressions

Youfeng Wu; Justin E. Gottschlich; Gilles Pokam; Shiliang Hu; Ali-Reza Adl-Tabatabai; Cristiano Pereira

