Network


Latest external collaboration on country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Xiaoxia Wu is active.

Publication


Featured researches published by Xiaoxia Wu.


international symposium on computer architecture | 2009

Hybrid cache architecture with disparate memory technologies

Xiaoxia Wu; Jian Li; Lixin Zhang; Evan Speight; Ramakrishnan Rajamony; Yuan Xie

Caching techniques have been an efficient mechanism for mitigating the effects of the processor-memory speed gap. Traditional multi-level SRAM-based cache hierarchies, especially in the context of chip multiprocessors (CMPs), present many challenges in area requirements, core-to-cache balance, power consumption, and design complexity. New advancements in technology enable caches to be built from other technologies, such as Embedded DRAM (EDRAM), Magnetic RAM (MRAM), and Phase-change RAM (PRAM), in both 2D chips or 3D stacked chips. Caches fabricated in these technologies offer dramatically different power and performance characteristics when compared with SRAM-based caches, particularly in the areas of access latency, cell density, and overall power consumption. In this paper, we propose to take advantage of the best characteristics that each technology offers, through the use of Hybrid Cache Architecture (HCA) designs. We discuss and evaluate two types of hybrid cache architectures: inter cache Level HCA (LHCA), in which the levels in a cache hierarchy can be made of disparate memory technologies; and intra cache level or cache Region based HCA (RHCA), where a single level of cache can be partitioned into multiple regions, each of a different memory technology. We have studied a number of different HCA architectures and explored the potential of hardware support for intra-cache data movement and power consumption management within HCA caches. Utilizing a full-system simulator that has been validated against real hardware, we demonstrate that an LHCA design can provide a geometric mean 7% IPC improvement over a baseline 3-level SRAM cache design under the same area constraint across a collection of 25 workloads. A more aggressive RHCA-based design provides 12% IPC improvement over the baseline. Finally, a 2-layer 3D cache stack (3DHCA) of high density memory technology within the same chip footprint gives 18% IPC improvement over the baseline. Furthermore, up to 70% reduction in power consumption over a baseline SRAM-only design is achieved.


design, automation, and test in europe | 2009

Power and performance of read-write aware hybrid caches with non-volatile memories

Xiaoxia Wu; Jian Li; Lixin Zhang; Evan Speight; Yuan Xie

Caches made of non-volatile memory technologies, such as Magnetic RAM (MRAM) and Phase-change RAM (PRAM), offer dramatically different power-performance characteristics when compared with SRAM-based caches, particularly in the areas of static/dynamic power consumption, read and write access latency and cell density. In this paper, we propose to take advantage of the best characteristics that each technology has to offer through the use of read-write aware Hybrid Cache Architecture (RWHCA) designs, where a single level of cache can be partitioned into read and write regions, each of a different memory technology with disparate read and write characteristics. We explore the potential of hardware support for intra-cache data movement within RWHCA caches. Utilizing a full-system simulator that has been validated against real hardware, we demonstrate that a RWHCA design with a conservative setup can provide a geometric mean 55% power reduction and yet 5% IPC improvement over a baseline SRAM cache design across a collection of 30 workloads. Furthermore, a 2-layer 3D cache stack (3DRWHCA) of high density memory technology with the same chip footprint still gives 10% power reduction and boost performance by 16% IPC improvement over the baseline.


international conference on computer aided design | 2007

Variation-aware task allocation and scheduling for MPSoC

Feng Wang; Chrysostomos Nicopoulos; Xiaoxia Wu; Yuan Xie; Narayanan Vijaykrishnan

As technology scales, the delay uncertainty caused by process variations has become increasingly pronounced in deep sub-micron designs. As a result, a paradigm shift from deterministic to statistical design methodology at all levels of the design hierarchy is inevitable [1]. In this paper, we propose a variation-aware task allocation and scheduling algorithm for Multiprocessor System-on-Chip (MPSoC) architectures to mitigate the impact of parameter variations. A new design metric, called performance yield and defined as the probability of the assigned schedule meeting the predefined performance constraints, is used to guide the task allocation and scheduling procedure. An efficient yield computation method for task scheduling complements and significantly improves the effectiveness of the proposed variation-aware scheduling algorithm. Experimental results show that our variation-aware scheduler achieves significant yield improvements. On average, 45% and 34% yield improvements over worst-case and nominal-case deterministic schedulers, respectively, can be obtained across the benchmarks by using the proposed variation-aware scheduler.


international conference on computer design | 2007

Scan chain design for three-dimensional integrated circuits (3D ICs)

Xiaoxia Wu; Paul Falkenstern; Yuan Xie

Scan chains are widely used to improve the testability of IC designs. In traditional 2D IC designs, various design techniques on the construction of scan chains have been proposed to facilitate DFT (Design-For-Test). Recently, three-dimensional (3D) technologies have been proposed as a promising solution to continue technology scaling. In this paper, we study the scan chain construction for 3D ICs, examining the impact of 3D technologies on scan chain ordering. Three different 3D scan chain design approaches (namely, VIA3D, MAP3D, and OPT3D) are proposed and compared, with the experimental results for ISCAS89 benchmark circuits. The advantages as well as disadvantages for each approach are discussed. The results show that both MAP3D and VIA3D approaches require no changes of 2D scan chain algorithms, but OPT3D can achieve the best wire length reduction for the scan chain design. The average scan chain wire length of six ISCAS89 benchmarks obtained from OPT3D has 46.0% reduction compared to the 2D scan chain design. To the best of our knowledge, this is the first study on scan chain design for 3D integrated circuits.


international conference on computer aided design | 2006

Guaranteeing performance yield in high-level synthesis

Wei-Lun Hung; Xiaoxia Wu; Yuan Xie

Meeting timing constraint is one of the most important issues for modern design automation tools. This situation is exacerbated with the existence of process variation. Current high-level synthesis tools, performing task scheduling, resource allocation and binding, may result in unexpected performance discrepancy due to the ignorance of the impact of process variation, which requires a shift in the design paradigm, from todays deterministic design to statistical or probabilistic design. In this paper, we present a variation-aware performance yield-guaranteed high-level synthesis algorithm. The proposed approach integrates high-level synthesis and statistical static timing analysis into a simulated annealing engine to simultaneously explore solution space while meeting design objectives. Our results show that the area reduction is in the average of 14% when 95% performance yield is imposed with the same total completion time constraint


asia and south pacific design automation conference | 2008

Variability-driven module selection with joint design time optimization and post-silicon tuning

Feng Wang; Xiaoxia Wu; Yuan Xie

Increasing delay and power variation are significant challenges to the designers as technology scales to the deep sub-micron (DSM) regime. Traditional module selection techniques in high level synthesis use worst case delay/power information to perform the optimization, and therefore may be too pessimistic such that extra resources are used to guarantee design requirements. Parametric yield, which is defined as the probability of the synthesized hardware meeting the performance/power constraints, can be used to guide design space exploration. The parametric yield can be effectively improved by combining both design-time variation-aware optimization and post silicon tuning techniques (such as adaptive body biasing (ABB)). In this paper, we propose a module selection algorithm that combines design-time optimization with post- silicon tuning (using ABB) to maximize design yield. A variation-aware module selection algorithm based on efficient performance and power yield gradient computation is developed. The post silicon optimization is formulated as an efficient sequential conic program to determine the optimal body bias distribution, which in turn affects design-time module selection. The experiment results show that significant yield can be achieved compared to traditional worst-case driven module selection technique. To the best of our knowledge, this is the first variability-driven high level synthesis technique that considers post-silicon tuning during design time optimization.


ACM Transactions on Architecture and Code Optimization | 2010

Design exploration of hybrid caches with disparate memory technologies

Xiaoxia Wu; Jian Li; Lixin Zhang; Evan Speight; Ramakrishnan Rajamony; Yuan Xie

Traditional multilevel SRAM-based cache hierarchies, especially in the context of chip multiprocessors (CMPs), present many challenges in area requirements, core--to--cache balance, power consumption, and design complexity. New advancements in technology enable caches to be built from other technologies, such as Embedded DRAM (EDRAM), Magnetic RAM (MRAM), and Phase-change RAM (PRAM), in both 2D chips or 3D stacked chips. Caches fabricated in these technologies offer dramatically different power-performance characteristics when compared with SRAM-based caches, particularly in the areas of access latency, cell density, and overall power consumption. In this article, we propose to take advantage of the best characteristics that each technology has to offer through the use of Hybrid Cache Architecture (HCA) designs. We discuss and evaluate two types of hybrid cache architectures: intercache-Level HCA (LHCA), in which the levels in a cache hierarchy can be made of disparate memory technologies; and intracache-level or cache-Region-based HCA (RHCA), where a single level of cache can be partitioned into multiple regions, each of a different memory technology. We have studied a number of different HCA architectures and explored the potential of hardware support for intracache data movement and power consumption management within HCA caches. Utilizing a full-system simulator that has been validated against real hardware, we demonstrate that an LHCA design can provide a geometric mean 6% IPC improvement over a baseline 3-level SRAM cache design under the same area constraint across a collection of 30 workloads. A more aggressive RHCA-based design provides 10% IPC improvement over the baseline. A 2-layer 3D cache stack (3DHCA) of high density memory technology within the same chip footprint gives 16% IPC improvement over the baseline. We also achieve up to a 72% reduction in power consumption over a baseline SRAM-only design. Energy-delay and thermal evaluation for 3DHCA are also presented. In addition to the fast-slow region based RHCA, we further evaluate read-write region based RHCA designs.


IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2011

Variation-Aware Task and Communication Mapping for MPSoC Architecture

Feng Wang; Yibo Chen; Chrysostomos Nicopoulos; Xiaoxia Wu; Yuan Xie; Narayanan Vijaykrishnan

As technology scales, the delay uncertainty caused by process variations has become increasingly pronounced in deep submicrometer designs. As a result, a paradigm shift from deterministic to statistical design methodology at all levels of the design hierarchy is inevitable. In this paper, we propose a variation-aware task and communication mapping methodology for multiprocessor system-on-chips that uses network-on-chip communication architecture so that the impact of parameter variations can be mitigated. Our mapping scheme accounts for variability in both the processing cores and the communication links to ensure a complete and accurate model of the entire system. A new design metric, called performance yield and defined as the probability of the assigned schedule meeting the predefined performance constraints, is used to guide both the task scheduling and the routing path allocation procedure. An efficient yield computation method for this mapping complements and significantly improves the effectiveness of the proposed variation-aware mapping algorithm. Experimental results show that our variation-aware mapper achieves significant yield improvements over worst-case and nominal-case deterministic mapper.


Microelectronics Journal | 2010

Test-access mechanism optimization for core-based three-dimensional SOCs

Xiaoxia Wu; Yibo Chen; Krishnendu Chakrabarty; Yuan Xie

Embedded cores in a core-based system-on-chip (SOC) are not easily accessible via chip I/O pins. Test-access mechanisms (TAMs) and test wrappers (e.g., the IEEE Standard 1500 wrapper) have been proposed for the testing of embedded cores in a core-based SOC in a modular fashion. We show that such a modular testing approach can also be used for emerging three-dimensional integrated circuits based on through-silicon vias (TSVs). Core-based SOCs based on 3D IC technology are being advocated as a means to continue technology scaling and overcome interconnect-related bottlenecks. We present an optimization technique for minimizing the post-bond test time for 3D core-based SOCs under constraints on the number of TSVs, the TAM bitwidth, and thermal limits. The proposed optimization method is based on a combination of integer linear programming, LP-relaxation, and randomized rounding. It considers the Test Bus and TestRail architectures, and incorporates wire-length constraints in test-access optimization. Simulation results are presented for the ITC 02 SOC Test Benchmarks and the test times are compared to that obtained when methods developed earlier for two-dimensional ICs are applied to 3D ICs. The test time dependence on various 3D parameters (e.g. 3D placement, the number of layers, thermal constraints, and the number of TSVs) is also studied.


international symposium on low power electronics and design | 2009

Exploration of 3D stacked L2 cache design for high performance and efficient thermal control

Guangyu Sun; Xiaoxia Wu; Yuan Xie

The three-dimensional (3D) integration enables stacking large memory on top of chip-multi-processors (CMPs). Compared to the 2D case, the extra dimension and high bandwidth provide more options for the design of on-chip memory such as L2 caches. In this work, we study the design of 3D stacked set-associative L2 caches through managing the placement of cache ways. The evaluation results show that the placement has an impact on the performance. In addition, we propose a technique of shadow tag to dynamically adjust the working size of the 3D cache in order to save power and reduce the peak temperature. Evaluation results show that the proposed inter-layer core-based-distribution placement of 3D cache ways is the best design option, when both the performance and thermal management are considered.

Collaboration


Dive into the Xiaoxia Wu's collaboration.

Top Co-Authors

Avatar

Yuan Xie

University of California

View shared research outputs
Top Co-Authors

Avatar

Lixin Zhang

Chinese Academy of Sciences

View shared research outputs
Top Co-Authors

Avatar

Feng Wang

Pennsylvania State University

View shared research outputs
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar

Yibo Chen

Pennsylvania State University

View shared research outputs
Top Co-Authors

Avatar

Paul Falkenstern

Pennsylvania State University

View shared research outputs
Researchain Logo
Decentralizing Knowledge