Xianhua Liu | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Xianhua Liu is active.

Explore More

Publication

Featured researches published by Xianhua Liu.

field programmable gate arrays | 2010

Bit-level optimization for high-level synthesis and FPGA-based acceleration

Jiyu Zhang; Zhiru Zhang; Sheng Zhou; Mingxing Tan; Xianhua Liu; Xu Cheng; Jason Cong

Automated hardware design from behavior-level abstraction has drawn wide interest in FPGA-based acceleration and configurable computing research field. However, for many high-level programming languages, such as C/C++, the description of bitwise access and computation is not as direct as hardware description languages, and high-level synthesis of algorithmic descriptions may generate suboptimal implementations for bitwise computation-intensive applications. In this paper we introduce a bit-level transformation and optimization approach to assisting high-level synthesis of algorithmic descriptions. We introduce a bit-flow graph to capture bit-value information. Analysis and optimizing transformations can be performed on this representation, and the optimized results are transformed back to the standard data-flow graphs extended with a few instructions representing bitwise access. This allows high-level synthesis tools to automatically generate circuits with higher quality. Experiments show that our algorithm can reduce slice usage by 29.8% on average for a set of real-life benchmarks on Xilinx Virtex-4 FPGAs. In the meantime, the clock period is reduced by 13.6% on average, with an 11.4% latency reduction.

international conference on supercomputing | 2012

CVP: an energy-efficient indirect branch prediction with compiler-guided value pattern

Mingxing Tan; Xianhua Liu; Tong Tong; Xu Cheng

Indirect branch prediction is becoming increasingly important in modern high-performance processors. However, previous indirect branch predictors either require a significant amount of hardware storage and complexity, or heavily rely on the expensive manual profiling. In this paper, we propose the Compiler-Guided Value Pattern (CVP) prediction, an energy-efficient and accurate indirect branch prediction via compiler-microarchitecture cooperation. The key of CVP prediction is to use the compiler-guided value pattern as the correlated information to hint the dynamic predictor. The value pattern reflects the pattern regularity of the value correlation, and thus significantly improves the prediction accuracy even in the case of deep pipeline stage or long memory latency. CVP prediction relies on the compiler to automatically identify the primary value correlation based on three high-level program substructures: virtual function calls, switch-case statements and function pointer calls. The compiler-identified information is then fed back to the dynamic predictor and is further used to hint the indirect branch prediction at runtime. We show that CVP prediction can be implemented in modern processors with little extra hardware support. Evaluations show that CVP prediction can significantly improve the prediction accuracy by 46% over the traditional BTB-based prediction, leading to the performance improvement of 20%. Compared with the state-of-the-art aggressive ITTAGE and VBBI predictors, CVP prediction can improve the performance by 5.5% and 4.2% respectively.

design, automation, and test in europe | 2012

Energy-efficient branch prediction with compiler-guided history stack

Mingxing Tan; Xianhua Liu; Zichao Xie; Dong Tong; Xu Cheng

Branch prediction is critical in exploring instruction level parallelism for modern processors. Previous aggressive branch predictors generally require significant amount of hardware storage and complexity to pursue high prediction accuracy. This paper proposes the Compiler-guided History Stack (CHS), an energy-efficient compiler-microarchitecture cooperative technique for branch prediction. The key idea is to track very-long-distance branch correlation using a low-cost compiler-guided history stack. It relies on the compiler to identify branch correlation based on two program substructures: loop and procedure, and feed the information to the predictor by inserting guiding instructions. At runtime, the processor dynamically saves and restores the global history using a low-cost history stack structure according to the compiler-guided information. The modification on the global history enables the predictor to track very-long-distance branch correlation and thus improves the prediction accuracy. We show that CHS can be combined with most of existing branch predictors and it is especially effective with small and simple predictors. Our evaluations show that the CHS technique can reduce the average branch mispredictions by 28.7% over gshare predictor, resulting in average performance improvement of 10.4%. Furthermore, it can also improve those aggressive perceptron, OGEHL and TAGE predictors.

international conference on parallel processing | 2015

An Energy-Efficient Branch Prediction with Grouped Global History

Mingkai Huang; Dan He; Xianhua Liu; Mingxing Tan; Xu Cheng

Branch prediction has been playing an increasingly important role in improving the performance and energy efficiency for modern microprocessors. The state-of-the-art branch predictors, such as the perceptron and TAGE predictors, leverage novel prediction algorithms to explore longer branch history for higher prediction accuracy. We observe that as the branch history is becoming longer, the efficiency of global history is degraded by the interference of different branch instructions. In order to mitigate the excessive influence of the branch history interference, we propose the Grouped Global History (GGH) based branch predictor, a lightweight yet efficient branch predictor. Unlike existing branch predictors that make use of a unified global history for prediction, GGH divides the global history into a set of subgroups such that the interference resulted by frequently executed branch instructions could be restricted. With subgroups of global history, GGH also enables us to track even longer effective branch correlation without introducing hardware storage overhead. Our experimental results based on SPEC CINT 2006 workloads demonstrate that our approach can significantly reduce the branch mispredictions per kilo instructions (MPKI) by 4.76 over the baseline perceptron predictor, with a simple control logic extension.

virtual execution environments | 2017

Content Look-Aside Buffer for Redundancy-Free Virtual Disk I/O and Caching

Chun Yang; Xianhua Liu; Xu Cheng

Storage consolidation in a virtualized environment introduces numerous duplications in virtual disks and imposes considerable pressure on disk I/O and caching. In this paper, we present a content look-aside buffer (CLB) approach for simultaneously providing redundancy-free virtual disk I/O and caching. CLB attaches persistent fingerprints to virtual disk blocks, which enables detection of I/O redundancy before disk access. At run time, CLB exploits content pages already present in the guest disk caches to service the redundant reads through page sharing, thus eliminating both redundant I/O requests and redundant disk cache copies. For write requests, CLB uses a group invalidating writeback protocol for updating fingerprints to support crash consistency while minimizing disk write overhead. By implementing and evaluating a CLB prototype on KVM hypervisor, we demonstrate that CLB delivers considerably improved I/O performance with realistic workloads. Our CLB prototype improves the throughput of sequential and random read on duplicate data by 4.1x and 26.2x, respectively. For typical read-intensive workloads, such as booting VM and launching application, CLBs I/O deduplication and cache deduplication eliminates 94.9%--98.5% of read requests and saves 50%--100% cache memory in each VM, respectively. Compared with the QEMUs raw virtual disk format, CLB improves the per-disk VM density by 8x--16x. For mixed read-write workloads, the cost of on-line fingerprint updating offsets the read benefit; nevertheless, CLB substantially improves overall performance.

international conference on computer design | 2016

MFAP: Fair Allocation between fully backlogged and non-fully backlogged applications

Yan Sui; Chun Yang; Dong Tong; Xianhua Liu; Xu Cheng

In this paper, we consider the problem of ensuring fairness in systems serving a mixture of fully backlogged applications, which continuously demand resources, and non-fully backlogged applications. We introduce a fairness metric, called interference fairness, the basic idea underlying which is that the interference caused by application A for another application B should be equal to that caused by B for A. To effectively and efficiently guarantee this fairness metric, we propose Mutual Fair Allocation Policy (MFAP), a simple and powerful resource sharing policy, and show how it guarantees interference fairness between any pair of applications. We also show that MFAP, unlike other viable policies, satisfies several highly desirable properties, including some from game theory, as well as common sense intuitions. As a use case, we implemented MFAP on a disk scheduling framework. The experimental results based on synthetic and real workloads show how our implementation achieved interference fairness and improved non-fully backlogged applications performance.

international conference on algorithms and architectures for parallel processing | 2015

Exploration of the Relationship Between Just-in-Time Compilation Policy and Number of Cores

Mingkai Huang; Xianhua Liu; Tingyu Zhang; Xu Cheng

Just-in-Time (JIT) compilation is a key technique for programs written in managed languages, such as Java and JavaScript. Traditionally, a conservative JIT compilation policy is used without impacting application threads too much on single-core machines. Nowadays, modern machines provide more and more processor cores, which are abundant computing resources. Modern virtual machines also have the ability to use an aggressive compilation policy, such as spawning multiple concurrent compiler threads, which is suitable to multicore situation. However, the suitable JIT compilation policy varies with the number of microprocessor cores. The goal of this work is to explore the relationship between the number of microprocessor cores on modern machines and the suitable JIT compilation policies that can enable existing as well as future VMs to realize better program performance.

international conference on embedded software and systems | 2007

NISD: A Framework for Automatic Narrow Instruction Set Design

Xianhua Liu; Jiyu Zhang; Xu Cheng

Code size is becoming an important design factor in the embedded domain. To deal with this problem, many embedded RISC processors support a dual-width instruction set. Mixed code generation is also introduced in expectation of achieving both higher code density from the narrow instruction set (usually 16 bits) and good performance from the normal one (usually 32 bits), with little extra cost. To a certain application domain, processors can combine an efficient general purpose instruction set and a narrow instruction set tailored to the particular applications. Since the design of instruction set is highly related to the compiler and the application programs, a feedback driven technique will be a good choice. In this paper, we introduce a framework of automatic narrow instruction set design. The instructions are described in our Instruction Set Description Template (ISDT). Given a set of application programs, the design tool will iteratively use the suggested narrow instruction set represented in ISDT to do mixed-code generation and to update the narrow instruction set according to the evaluation feedback, thus to get an ultimate fine narrow instruction set without human designers involvement. We describe our method in detail by example of designing narrow instruction set for UniCore with the mediabench as the application set, and show its usefulness through the experiments.

Archive | 2011