Network


Latest external collaboration at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Minxuan Zhang is active.

Publication


Featured research published by Minxuan Zhang.


annual computer security applications conference | 2006

An architectural leakage power reduction method for instruction cache in ultra deep submicron microprocessors

Chengyi Zhang; Hongwei Zhou; Minxuan Zhang; Zuocheng Xing

Leakage power will exceed dynamic power in microprocessors as feature size shrinks, especially for on-chip caches. Besides developing low-leakage processes and circuits, controlling leakage power at the architectural level is worth studying. In this paper, a PDSR (Periodically Drowsy Speculatively Recover) algorithm and an extended adaptive version are proposed to reduce instruction cache leakage power dissipation. SPEC CPU2000 simulation results show that, with negligible performance loss, PDSR aggressively decreases the leakage power dissipation of the instruction cache. Compared with other existing methods, PDSR and adaptive PDSR achieve higher and more robust energy efficiency.
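The abstract does not give pseudocode; the following is a minimal sketch of the general periodic-drowsy idea with speculative recovery, assuming that cache lines are put into a low-leakage "drowsy" state on a fixed interval and woken ahead of time when the fetch unit predicts it will touch them. The interval, penalty, and recovery heuristic are illustrative assumptions, not the paper's actual algorithm.

```python
# Illustrative sketch of a periodically-drowsy instruction cache with
# speculative recovery. The policy parameters are hypothetical; only the
# general drowsy-cache idea (low-leakage state + wake-up penalty) is real.

DROWSY_INTERVAL = 2000   # cycles between global "drowse" sweeps (assumed)
WAKEUP_PENALTY = 1       # extra cycles to read a drowsy line (assumed)

class DrowsyICache:
    def __init__(self, num_lines):
        self.drowsy = [False] * num_lines   # True = line in low-leakage state
        self.cycle = 0

    def tick(self):
        """Advance one cycle; periodically put every line to sleep."""
        self.cycle += 1
        if self.cycle % DROWSY_INTERVAL == 0:
            self.drowsy = [True] * len(self.drowsy)

    def speculative_recover(self, predicted_lines):
        """Wake lines the fetch unit predicts it will need soon."""
        for idx in predicted_lines:
            self.drowsy[idx] = False

    def fetch(self, idx):
        """Return the access latency in cycles for a fetch to line idx."""
        penalty = WAKEUP_PENALTY if self.drowsy[idx] else 0
        self.drowsy[idx] = False           # an access always wakes the line
        return 1 + penalty
```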


international conference on parallel processing | 2007

Hardware-Based Multicast with Global Load Balance on k-ary n-trees

Quanbao Sun; Minxuan Zhang; Liquan Xiao

The multicast operation is commonly used in parallel applications and can support several other collective communication operations. A significant performance improvement can be achieved by supporting multicast operations at the hardware level. In this paper, we propose two parent-selection strategies that use global information to reduce conflicts among different multicast operations on k-ary n-trees. We first define an equivalence relation that divides the switches at each stage into several equivalence classes. We then prove that switches at the same stage that are passed through by the same multicast tree belong to the same equivalence class. Based on this analysis, two least-loaded parent-selection strategies are developed. The proposed strategies are evaluated through simulation experiments, and the results indicate that they significantly lower the multicast latency and increase the multicast throughput.
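For intuition, here is a minimal sketch of the least-loaded selection step only: among the candidate parent switches reachable through a switch's upward links, pick the one currently serving the fewest multicast trees. The load bookkeeping and names are assumptions; the paper's equivalence-class analysis is not reproduced.

```python
# Minimal sketch of a least-loaded parent-selection step for building a
# multicast tree on a k-ary n-tree. The load table is hypothetical.

def select_parent(candidate_parents, load):
    """Pick the candidate parent switch with the smallest current load.

    candidate_parents : list of switch ids reachable via upward links
    load              : dict switch id -> number of multicast trees using it
    """
    return min(candidate_parents, key=lambda sw: load.get(sw, 0))

# Example: a switch has upward links to candidate parents 12, 13, 14, 15.
load = {12: 3, 13: 1, 14: 1, 15: 4}
parent = select_parent([12, 13, 14, 15], load)
load[parent] = load.get(parent, 0) + 1     # register the new multicast tree
print(parent)                              # -> 13 (ties broken by list order)
```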


annual computer security applications conference | 2006

Enhancing ICOUNT2.8 fetch policy with better fairness for SMT processors

Caixia Sun; Hongwei Tang; Minxuan Zhang

In Simultaneous Multithreading (SMT) processors, the instruction fetch policy implicitly determines how shared resources are allocated among the co-scheduled threads, and consequently affects both throughput and fairness. However, prior work on fetch policies focuses almost exclusively on throughput optimization; fairness between threads in their progress rates is rarely studied. In this paper, we take fairness as the optimization goal and propose an enhanced version of ICOUNT2.8 with better fairness, called ICOUNT2.8-fairness. Results show that with ICOUNT2.8-fairness, RPRrange (a fairness metric defined in this paper) is less than 5% for all types of workloads, while the degradation of overall throughput is no more than 7%. In particular, for the two-thread MIX workload, ICOUNT2.8-fairness outperforms ICOUNT2.8 in throughput while also achieving better fairness.
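The exact definition of RPRrange is in the paper; the sketch below assumes one natural reading, namely that a thread's relative progress rate is its co-scheduled IPC divided by its single-threaded IPC, and that RPRrange is the spread of that ratio across threads. This is an assumption for illustration only.

```python
# Hedged sketch of a fairness metric in the spirit of RPRrange. We ASSUME
# "relative progress rate" = IPC when co-scheduled / IPC when run alone,
# and RPRrange = max - min of that ratio across the co-scheduled threads.

def relative_progress_rates(ipc_smt, ipc_alone):
    """Per-thread progress relative to single-threaded execution."""
    return [smt / alone for smt, alone in zip(ipc_smt, ipc_alone)]

def rpr_range(ipc_smt, ipc_alone):
    rpr = relative_progress_rates(ipc_smt, ipc_alone)
    return max(rpr) - min(rpr)

# Example: two co-scheduled threads with similar relative progress.
print(rpr_range(ipc_smt=[1.1, 0.9], ipc_alone=[2.0, 1.5]))  # -> 0.05
```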


international conference on embedded software and systems | 2005

Detecting memory access errors with flow-sensitive conditional range analysis

Yimin Xia; Jun Luo; Minxuan Zhang

Accessing an out-of-bounds memory address can lead to nondeterministic behavior or elusive crashes. Static analysis can detect memory access errors in program source code without runtime overhead, but existing techniques are either very imprecise or have exponential cost. This paper proposes a precise and effective method to detect memory access errors. First, it generates a state for each statement with a flow-sensitive, inter-procedural algorithm; a state includes not only range constraints, as in traditional range analysis, but also the conditions under which those constraints occur. Second, it solves the states of memory access statements to evaluate the bounds of the accessed memory. Both state generation and state resolution have polynomial cost. We have implemented a prototype of the analysis method. Applied to 7 popular programs, the prototype found 40 memory access errors with a high precision of 80%.
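To make the idea of a conditional range state concrete, here is a simplified sketch: each variable fact carries a value range plus the path condition under which that range holds, and a bounds check compares the range against the accessed buffer's size. The representation and the check are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch of a per-statement state for conditional range
# analysis: a range constraint plus the condition under which it holds.

from dataclasses import dataclass

@dataclass
class RangeFact:
    lo: int          # inclusive lower bound of the variable's value
    hi: int          # inclusive upper bound
    condition: str   # path condition under which this range holds

def check_access(index_fact, buffer_size):
    """Report whether an access buf[index] may be out of bounds."""
    if index_fact.lo < 0 or index_fact.hi >= buffer_size:
        return f"possible out-of-bounds access when {index_fact.condition}"
    return "access provably in bounds"

# Example: inside `if (n > 8)` the analysis knows the index stays in [0, 15],
# so it may exceed the last valid slot of an 8-element buffer.
fact = RangeFact(lo=0, hi=15, condition="n > 8")
print(check_access(fact, buffer_size=8))
```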


advanced parallel programming technologies | 2005

A fetch policy maximizing throughput and fairness for two-context SMT processors

Caixia Sun; Hongwei Tang; Minxuan Zhang

In Simultaneous Multithreading (SMT) processors, co-scheduled threads share the processor's resources but at the same time compete for them. A thread missing in the L2 cache may hold a large number of resources that other threads could be using to make forward progress, degrading the overall performance of the processor. Many instruction fetch policies address this problem, but none is perfect and each has its own disadvantages. In particular, these policies are designed for processors with an arbitrary number of hardware contexts, and their drawbacks can become more serious in two-context SMT processors. In this paper, we propose a novel fetch policy called RG-FP (Resource Gating based on Fetch Priority), which is specifically designed for two-context SMT processors. RG-FP combines reducing fetch priority with controlling shared resource allocation to prevent the negative effects caused by loads missing in the L2 cache. Simulation results show that RG-FP outperforms previously proposed fetch policies for all types of workloads in both throughput and fairness, especially for memory-bound workloads. The results also show different degrees of improvement over other fetch policies; the gain over PDG is the largest, reaching 41.8% in throughput and 50.0% in Hmean on average.
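The following is a hedged sketch of the flavor of policy the abstract describes for a two-context core: a thread with an outstanding L2 miss is demoted in fetch priority and gated once it occupies too large a share of the shared resources. The threshold, the per-thread bookkeeping, and the tie-breaking are assumptions, not the published RG-FP mechanism.

```python
# Sketch of a two-context fetch arbiter combining priority reduction with
# resource gating on L2 misses. Thresholds and structure are illustrative.

RESOURCE_CAP_ON_L2_MISS = 0.25   # assumed cap on shared-resource occupancy

def choose_fetch_thread(threads):
    """threads: list of dicts with keys
       'id', 'icount' (instrs in pipeline), 'l2_miss' (bool),
       'occupancy' (fraction of shared resources held)."""
    eligible = []
    for t in threads:
        # Gate a thread that is both missing in L2 and hogging resources.
        if t['l2_miss'] and t['occupancy'] > RESOURCE_CAP_ON_L2_MISS:
            continue
        eligible.append(t)
    if not eligible:
        return None                      # stall fetch this cycle
    # Among eligible threads, prefer the one with the fewest in-flight
    # instructions (ICOUNT-style), demoting L2-missing threads.
    return min(eligible, key=lambda t: (t['l2_miss'], t['icount']))['id']

print(choose_fetch_thread([
    {'id': 0, 'icount': 12, 'l2_miss': True,  'occupancy': 0.4},
    {'id': 1, 'icount': 20, 'l2_miss': False, 'occupancy': 0.3},
]))  # -> 1: thread 0 is gated while its L2 miss is outstanding
```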


international conference on embedded software and systems | 2004

Dual-stack return address predictor

Caixia Sun; Minxuan Zhang

Most current return address predictors share the same architecture: a return address stack and a top-of-stack pointer, sometimes enhanced with repair mechanisms. The disadvantage of this type of return address predictor is that either the prediction accuracy is low or the hardware cost is high. In this paper, we present a novel return address prediction structure called the Dual-Stack Return Address Predictor (DSRAP), which contains two return address stacks: RAS_PRED and RAS_WRB. Like the return address stack in current predictors, RAS_PRED provides predicted target addresses for procedure returns. RAS_WRB provides the data for repairing RAS_PRED when a branch misprediction is detected. Results show that DSRAP achieves a 100% hit rate if mispredictions caused by unmatched call/return sequences or stack overflow are ignored. Furthermore, DSRAP is very easy to design.
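A minimal sketch of the dual-stack idea follows: RAS_PRED is updated speculatively at fetch and supplies predictions, while RAS_WRB is updated only by committed calls and returns, so it can restore RAS_PRED after a misprediction. Anything beyond this division of roles (sizing, overflow handling, copy mechanism) is a simplifying assumption.

```python
# Sketch of the dual-stack return address predictor idea: a speculative
# prediction stack repaired from a commit-time stack on misprediction.

class DSRAP:
    def __init__(self):
        self.ras_pred = []   # speculative stack used for prediction
        self.ras_wrb = []    # non-speculative stack updated at commit

    # --- fetch-time (speculative) operations ---
    def predict_call(self, return_addr):
        self.ras_pred.append(return_addr)

    def predict_return(self):
        return self.ras_pred.pop() if self.ras_pred else None

    # --- commit-time (non-speculative) operations ---
    def commit_call(self, return_addr):
        self.ras_wrb.append(return_addr)

    def commit_return(self):
        if self.ras_wrb:
            self.ras_wrb.pop()

    # --- recovery when a branch misprediction is detected ---
    def repair(self):
        self.ras_pred = list(self.ras_wrb)

p = DSRAP()
p.predict_call(0x400); p.commit_call(0x400)
p.predict_call(0x800)            # call fetched down a wrong path
p.repair()                       # misprediction detected: copy RAS_WRB
print(hex(p.predict_return()))   # -> 0x400
```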


international symposium on parallel and distributed processing and applications | 2007

A Parallel infrastructure on dynamic EPIC SMT and its speculation optimization

Qingying Deng; Minxuan Zhang; Jiang Jiang

SMT (simultaneous multithreading) processors execute instructions from different threads in the same cycle and have the unique ability to exploit ILP (instruction-level parallelism) and TLP (thread-level parallelism) simultaneously. EPIC (explicitly parallel instruction computing) emphasizes the importance of synergy between the compiler and the hardware, and compiler optimizations are often driven by specific assumptions about the underlying architecture and implementation of the target machine. Control and data speculation are effective ways to improve instruction-level parallelism. In this paper, we present our efforts to design and implement a parallel environment that includes an optimizing, portable parallel compiler, OpenUH, and an SMT architecture, EDSMT, based on IA-64. We also reexamine speculation in this environment.


international conference on embedded software and systems | 2007

A Unified Compressed Cache Hierarchy Using Simple Frequent Pattern Compression and Partial Cache Line Prefetching

Xinhua Tian; Minxuan Zhang

In this paper, we propose a novel compressed cache hierarchy that uses a unified compression algorithm, called Simple Frequent Pattern Compression (S-FPC), in both the L1 data cache and the L2 cache. This scheme increases the capacity of the L1 data cache and the L2 cache without sacrificing L1 cache access latency. The layout of compressed data in the L1 data cache enables partial cache line prefetching without introducing prefetch buffers or increasing cache pollution and memory traffic. Compared to a baseline cache hierarchy that does not support cache compression, our design increases the average L1 cache capacity (in terms of the average number of valid words in cache per cycle) by about 33%, reduces the data cache miss rate by 21%, and speeds up program execution by 13% on average.
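For intuition, here is a hedged sketch of frequent-pattern-style word compression: each 32-bit word is replaced by a short tag plus a compact encoding when it matches a frequent pattern (all zeros, a small sign-extended value), and is stored verbatim otherwise. The pattern set, tag width, and encoding sizes are assumptions, not the S-FPC encoding defined in the paper.

```python
# Sketch of frequent-pattern word compression for a cache line of
# 32-bit words. Patterns and field sizes are illustrative assumptions.

def compress_word(word):
    """Return (pattern_tag, stored_bits) for one 32-bit word."""
    word &= 0xFFFFFFFF
    if word == 0:
        return ("zero", 0)                      # nothing stored besides the tag
    signed = word - (1 << 32) if word & 0x80000000 else word
    if -128 <= signed <= 127:
        return ("sext8", 8)                     # 8-bit sign-extended value
    if -32768 <= signed <= 32767:
        return ("sext16", 16)                   # 16-bit sign-extended value
    return ("uncompressed", 32)

def compressed_line_bits(words, tag_bits=2):
    """Total storage for a cache line of 32-bit words, plus per-word tags."""
    return sum(tag_bits + compress_word(w)[1] for w in words)

line = [0, 0, 5, 0xFFFFFF80, 0x12345678, 0, 42, 0xFFFF8000]
print(compressed_line_bits(line), "bits vs", 32 * len(line), "uncompressed")
```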


advanced parallel programming technologies | 2007

Look-ahead adaptive routing on k-ary n-trees

Quanbao Sun; Liquan Xiao; Minxuan Zhang

Supporting multicast at the hardware level is a future trend in interconnection networks, and the latency of hardware-based multicast is sensitive to network conflicts. In this paper, we study how to forecast and reduce conflicts between unicast and multicast traffic on k-ary n-trees. We first derive a switch grouping method to describe the relationship among the switches passed through by a unicast routing path or a multicast tree. We then analyze the sufficient condition for conflict-free routing. Based on these observations, a look-ahead adaptive routing strategy for unicast packets is proposed. The simulation results indicate that the proposed strategy can lower the multicast latency and the unicast latency simultaneously.
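The following sketch illustrates the look-ahead flavor of the idea only: instead of picking an upward output port blindly, the router consults how busy each candidate next-stage switch is with multicast traffic and prefers a conflict-free one. The load table, the cost function, and the tie-breaking are hypothetical; the paper's grouping-based conflict condition is not reproduced.

```python
# Sketch of a look-ahead upward-port choice for a unicast packet on a
# k-ary n-tree. All bookkeeping here is assumed for illustration.

def lookahead_route_up(up_ports, next_switch_of, multicast_load):
    """up_ports        : candidate upward output ports of this switch
       next_switch_of  : dict port -> next-stage switch id it leads to
       multicast_load  : dict switch id -> multicast trees passing through"""
    def cost(port):
        return multicast_load.get(next_switch_of[port], 0)
    conflict_free = [p for p in up_ports if cost(p) == 0]
    if conflict_free:
        return conflict_free[0]          # no unicast/multicast clash ahead
    return min(up_ports, key=cost)       # otherwise, least-loaded next switch

print(lookahead_route_up(
    up_ports=[0, 1],
    next_switch_of={0: "S2_0", 1: "S2_1"},
    multicast_load={"S2_0": 2, "S2_1": 0},
))  # -> 1
```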


parallel and distributed computing: applications and technologies | 2006

Controlling Performance of a Time-Critical Thread in SMT Processors by Instruction Fetch Policy

Caixia Sun; Hongwei Tang; Minxuan Zhang

In simultaneous multithreading (SMT) processors, the instruction fetch policy affects both the speed at which each thread runs and the overall throughput. However, current fetch policies focus almost exclusively on overall throughput optimization and provide no control over how fast individual threads run. As a result, the performance of a thread varies with the fetch policy and the workload it is executed with, which makes the execution time of a thread unpredictable. Relying only on the operating system (OS) thread scheduler to guarantee the execution time constraint of a time-critical thread is therefore insufficient and may even fail; the hardware must ensure that the performance of the time-critical thread is predictable in any timeslice. In this paper, we propose a novel fetch policy to control the performance of a time-critical thread in SMT processors. We evaluate our policy using many different workloads, and the results show that for more than 94% of all cases measured, our policy achieves the desired performance; for the failing cases, the average variance is within 1.25%. Furthermore, our policy does not severely sacrifice overall throughput: compared to throughput-oriented fetch policies such as ICOUNT, the average degradation of overall throughput is less than 3%. In particular, our policy tries to maximize the throughput of all co-scheduled threads other than the time-critical one, and achieves 98.25% of the throughput of ICOUNT on average.
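As a rough illustration of how a fetch policy can make one thread's performance predictable, here is a sketch of a feedback loop that monitors the time-critical thread's IPC over an interval and adjusts the number of fetch slots reserved for it, leaving the remaining slots to the other threads. The interval, step size, slack, and fetch width are illustrative assumptions, not the paper's mechanism.

```python
# Sketch of a feedback controller for the time-critical thread's fetch
# share. All parameters below are assumed for illustration.

FETCH_WIDTH = 8                       # fetch slots per cycle (assumed)

def adjust_reserved_slots(reserved, measured_ipc, target_ipc, step=1):
    """Adjust the fetch slots guaranteed to the time-critical thread."""
    if measured_ipc < target_ipc:
        reserved = min(FETCH_WIDTH, reserved + step)   # give it more fetch slots
    elif measured_ipc > target_ipc * 1.05:             # 5% slack before backing off
        reserved = max(0, reserved - step)             # return slots to the others
    return reserved

reserved = 2
for ipc in [0.8, 0.9, 1.2, 1.3]:      # measured IPC per interval (example)
    reserved = adjust_reserved_slots(reserved, ipc, target_ipc=1.0)
    print(reserved)                    # -> 3, 4, 3, 2
```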

Collaboration


Dive into Minxuan Zhang's collaborations.

Top Co-Authors

Caixia Sun, National University of Defense Technology
Hongwei Tang, National University of Defense Technology
Jiang Jiang, National University of Defense Technology
Liquan Xiao, National University of Defense Technology
Qingying Deng, National University of Defense Technology
Quanbao Sun, National University of Defense Technology
Chengyi Zhang, National University of Defense Technology
Hongwei Zhou, National University of Defense Technology
Jingfei Jiang, National University of Defense Technology
Jun Luo, National University of Defense Technology