Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Yanqi Zhou is active.

Publication


Featured research published by Yanqi Zhou.


Architectural Support for Programming Languages and Operating Systems (ASPLOS) | 2016

OpenPiton: An Open Source Manycore Research Framework

Jonathan Balkind; Michael McKeown; Yaosheng Fu; Tri M. Nguyen; Yanqi Zhou; Alexey Lavrov; Mohammad Shahrad; Adi Fuchs; Samuel Payne; Xiaohua Liang; Matthew Matl; David Wentzlaff

Industry is building larger, more complex manycore processors on the back of strong institutional knowledge, but academic projects face difficulties in replicating that scale. To alleviate these difficulties and to develop and share knowledge, the community needs open architecture frameworks for simulation, synthesis, and software exploration which support extensibility, scalability, and configurability, alongside an established base of verification tools and supported software. In this paper, we present OpenPiton, an open source framework for building scalable architecture research prototypes from 1 core to 500 million cores. OpenPiton is the world's first open source, general-purpose, multithreaded manycore processor and framework. OpenPiton leverages the industry-hardened OpenSPARC T1 core with modifications and builds upon it with a scratch-built, scalable uncore, creating a flexible, modern manycore design. In addition, OpenPiton provides synthesis and backend scripts for ASIC and FPGA to enable other researchers to bring their designs to implementation. OpenPiton provides a complete verification infrastructure of over 8,000 tests, is supported by mature software tools, runs full-stack multiuser Debian Linux, and is written in industry-standard Verilog. Multiple implementations of OpenPiton have been created, including a taped-out 25-core implementation in IBM's 32nm process and multiple Xilinx FPGA prototypes.


Architectural Support for Programming Languages and Operating Systems (ASPLOS) | 2014

The sharing architecture: sub-core configurability for IaaS clouds

Yanqi Zhou; David Wentzlaff

Businesses and academics are increasingly turning to Infrastructure as a Service (IaaS) clouds such as Amazon's Elastic Compute Cloud (EC2) to fulfill their computing needs. Unfortunately, current IaaS systems provide a severely restricted palette of rentable computing options which do not optimally fit the workloads they execute. We address this challenge by proposing and evaluating a manycore architecture, called the Sharing Architecture, specifically optimized for IaaS systems by being reconfigurable on a sub-core basis. The Sharing Architecture enables better matching of workloads to micro-architectural resources by replacing static cores with Virtual Cores, which can be dynamically reconfigured to have different numbers of ALUs and amounts of cache. This reconfigurability provides many of the same benefits as heterogeneous multicores, but in a homogeneous fabric, and enables the reuse and resale of resources on a per-ALU or per-KB-of-cache basis. The Sharing Architecture leverages Distributed ILP techniques, but is designed to be independent of recompilation. In addition, we introduce an economic model which is enabled by the Sharing Architecture and show how different users with varying needs can be better served by such a flexible architecture. We evaluate the Sharing Architecture across a benchmark suite of Apache, SPECint, and parts of PARSEC, and find that it can achieve up to a 5× more economically efficient market when compared to static-architecture multicores. We implemented the Sharing Architecture in Verilog and present area overhead results.
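The abstract's per-ALU and per-KB-of-cache pricing idea can be sketched in a few lines. The class names, prices, configurations, and utility function below are invented for illustration only; they are not taken from the paper, which evaluates a hardware design, not software.

```python
from dataclasses import dataclass

@dataclass
class VirtualCore:
    alus: int       # number of ALU slices assigned (illustrative)
    cache_kb: int   # KB of cache assigned (illustrative)

def rental_cost(core, alu_price=1.0, cache_price_per_kb=0.01):
    """Per-unit pricing: customers pay per ALU and per KB of cache."""
    return core.alus * alu_price + core.cache_kb * cache_price_per_kb

def best_config(configs, utility):
    """Pick the configuration with the highest utility per unit cost."""
    return max(configs, key=lambda c: utility(c) / rental_cost(c))

configs = [VirtualCore(1, 64), VirtualCore(2, 128), VirtualCore(4, 512)]
# Toy utility: diminishing returns in ALUs, mild benefit from extra cache.
utility = lambda c: c.alus ** 0.5 * (1 + c.cache_kb / 1024)
print(best_config(configs, utility))
```

A market of such customers, each renting only the resources that pay off for its workload, is the intuition behind the paper's economic-efficiency comparison against fixed static cores.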


International Symposium on Computer Architecture (ISCA) | 2016

CASH: supporting IaaS customers with a sub-core configurable architecture

Yanqi Zhou; Henry Hoffmann; David Wentzlaff

Infrastructure as a Service (IaaS) Clouds have grown increasingly important. Recent architecture designs support IaaS providers through fine-grain configurability, allowing providers to orchestrate low-level resource usage. Little work, however, has been devoted to supporting IaaS customers who must determine how to use such fine-grain configurable resources to meet quality-of-service (QoS) requirements while minimizing cost. This is a difficult problem because the multiplicity of configurations creates a non-convex optimization space. In addition, this optimization space may change as customer applications enter and exit distinct processing phases. In this paper, we overcome these issues by proposing CASH: a fine-grain configurable architecture co-designed with a cost-optimizing runtime system. The hardware architecture enables configurability at the granularity of individual ALUs and L2 cache banks and provides unique interfaces to support low-overhead, dynamic configuration and monitoring. The runtime uses a combination of control theory and machine learning to configure the architecture such that QoS requirements are met and cost is minimized. Our results demonstrate that the combination of fine-grain configurability and non-convex optimization provides tremendous cost savings (70% savings) compared to coarse-grain heterogeneity and heuristic optimization. In addition, the system is able to customize configurations to particular applications, respond to application phases, and provide near optimal cost for QoS targets.
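The core of the runtime's decision, as the abstract describes it, is to choose the cheapest configuration predicted to meet the QoS target, re-deciding as the application's phase (and thus the performance model) changes. The sketch below shows only that selection step; the configuration names, costs, and performance numbers are invented, and the paper's actual runtime uses control theory and machine learning rather than a fixed lookup.

```python
def cheapest_meeting_qos(configs, predict_perf, qos_target):
    """configs: list of (name, cost); predict_perf: name -> predicted perf.
    Returns the minimum-cost configuration predicted to meet the QoS target."""
    feasible = [(cost, name) for name, cost in configs
                if predict_perf(name) >= qos_target]
    if not feasible:
        # No configuration meets the target: fall back to the fastest one.
        return max(configs, key=lambda c: predict_perf(c[0]))[0]
    return min(feasible)[1]

# Hypothetical ALU/cache-bank configurations with toy costs and a toy
# learned performance model (a dict standing in for the ML predictor).
configs = [("1alu-1bank", 1.0), ("2alu-2bank", 2.2), ("4alu-4bank", 5.0)]
perf_model = {"1alu-1bank": 0.9, "2alu-2bank": 1.6, "4alu-4bank": 2.8}
print(cheapest_meeting_qos(configs, perf_model.get, qos_target=1.5))
# prints 2alu-2bank
```

Because the real optimization space is non-convex and phase-dependent, the paper's runtime replaces this exhaustive scan with learned models and feedback control, but the objective being optimized is the same.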


IEEE Micro | 2017

Piton: A Manycore Processor for Multitenant Clouds

Michael McKeown; Yaosheng Fu; Tri M. Nguyen; Yanqi Zhou; Jonathan Balkind; Alexey Lavrov; Mohammad Shahrad; Samuel Payne; David Wentzlaff

The shared cloud-based computing paradigm has experienced enormous growth. Multitenant clouds are conventionally built atop datacenters that utilize commodity hardware connected hierarchically with standard network protocols. Piton is a 25-core manycore processor that takes a different perspective, rethinking the architecture of datacenters and specializing processor architecture for Infrastructure as a Service (IaaS) clouds. The tile-based manycore processor is designed not only as a single chip, but as a large-scale system. Up to 8,192 chips (204,800 cores) can be seamlessly connected in a flat topology, maintaining a packet-switched network fabric both on and off chip. Shared memory is supported across arbitrary cores in the system, both intrachip and interchip, enabling flexibility and fine-grained resource allocation in shared systems. Piton also targets energy efficiency, critical to datacenters, using a modified multithreaded OpenSPARC T1 core enhanced with an energy-efficient drafting mode. To further facilitate sharing and increase utility for IaaS users and providers, a novel memory traffic shaper partitions bandwidth among cores or applications. Piton reimagines the datacenter architecture, breaking down barriers between chips, nodes, and racks, and enables flexibility, performance, and energy efficiency at scale. Piton has also been open sourced as a research platform called OpenPiton to enable practical full-system manycore research.


International Symposium on Computer Architecture (ISCA) | 2016

MITTS: memory inter-arrival time traffic shaping

Yanqi Zhou; David Wentzlaff

Memory bandwidth severely limits the scalability and performance of multicore and manycore systems. Application performance can be very sensitive to both the delivered memory bandwidth and latency. In multicore systems, a memory channel is usually shared by multiple cores. Having the ability to precisely provision, schedule, and isolate memory bandwidth and latency on a per-core basis is particularly important when different memory guarantees are needed on a per-customer, per-application, or per-core basis. Infrastructure as a Service (IaaS) Cloud systems, and even general-purpose multicores optimized for application throughput or fairness, all benefit from the ability to control and schedule memory access on a fine-grain basis. In this paper, we propose MITTS (Memory Inter-arrival Time Traffic Shaping), a simple, distributed hardware mechanism which limits memory traffic at the source (core or LLC). MITTS shapes memory traffic based on memory request inter-arrival time, enabling fine-grain bandwidth allocation. In an IaaS system, MITTS enables Cloud customers to express their memory distribution needs and pay commensurately. For instance, MITTS enables charging customers that have bursty memory traffic more than customers with uniform memory traffic for the same aggregate bandwidth. Beyond IaaS systems, MITTS can also be used to optimize for throughput or fairness in a general-purpose multi-program workload. MITTS uses an online genetic algorithm to configure hardware bins, which can adapt for program phases and variable input sets. We have implemented MITTS in Verilog and have taped out the design in a 25-core 32nm processor, and find that MITTS requires less than 0.9% of core area. We evaluate across SPECint, PARSEC, Apache, and bhm Mail Server workloads, and find that MITTS achieves an average 1.18× performance gain compared to the best static bandwidth allocation, a 2.69× average performance/cost advantage in an IaaS setting, and up to 1.17× better throughput and 1.52× better fairness when compared to conventional memory bandwidth provisioning techniques.
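The inter-arrival-time binning idea can be sketched in software: each bin holds credits for requests arriving with a given gap since the previous request, so bursty traffic drains the small-gap bins and stalls while uniform traffic proceeds. The bin edges, credit counts, and cycle model below are illustrative assumptions, not the paper's hardware parameters (which are tuned by an online genetic algorithm).

```python
class InterArrivalShaper:
    def __init__(self, bin_edges, credits_per_period):
        self.bin_edges = bin_edges          # upper edge of each bin, in cycles
        self.initial = list(credits_per_period)
        self.credits = list(credits_per_period)
        self.last_request = None

    def _bin_for(self, gap):
        for i, edge in enumerate(self.bin_edges):
            if gap <= edge:
                return i
        return len(self.bin_edges) - 1      # largest bin catches long gaps

    def allow(self, now):
        """Return True if a memory request issued at cycle `now` may proceed."""
        if self.last_request is None:
            self.last_request = now
            return True
        b = self._bin_for(now - self.last_request)
        if self.credits[b] > 0:
            self.credits[b] -= 1
            self.last_request = now
            return True
        return False                        # request stalls at the source

    def replenish(self):
        """Refill all bins at the start of each accounting period."""
        self.credits = list(self.initial)

# A back-to-back burst exhausts the small-gap bin and is throttled,
# while a widely spaced request draws from a roomier large-gap bin.
shaper = InterArrivalShaper(bin_edges=[4, 16, 64], credits_per_period=[2, 4, 8])
burst = [shaper.allow(t) for t in (0, 1, 2, 3)]
print(burst)                                # the fourth request stalls
```

Charging more for credits in the small-gap bins is what lets a provider price bursty traffic above uniform traffic at the same aggregate bandwidth.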


Neural Information Processing Systems (NIPS) | 2017

Deep Voice 2: Multi-Speaker Neural Text-to-Speech

Andrew Gibiansky; Sercan Ömer Arik; Gregory F. Diamos; John J. Miller; Kainan Peng; Wei Ping; Jonathan Raiman; Yanqi Zhou


arXiv: Learning | 2017

Deep Learning Scaling is Predictable, Empirically.

Joel Hestness; Sharan Narang; Newsha Ardalani; Gregory F. Diamos; Heewoo Jun; Hassan Kianinejad; Md. Mostofa Ali Patwary; Yang Yang; Yanqi Zhou


arXiv: Neural and Evolutionary Computing | 2018

Resource-Efficient Neural Architect.

Yanqi Zhou; Siavash Ebrahimi; Sercan Ömer Arik; Haonan Yu; Hairong Liu; Greg Diamos


IEEE International Symposium on High-Performance Computer Architecture (HPCA) | 2018

Power and Energy Characterization of an Open Source 25-Core Manycore Processor

Michael McKeown; Alexey Lavrov; Mohammad Shahrad; Paul J. Jackson; Yaosheng Fu; Jonathan Balkind; Tri M. Nguyen; Katie Lim; Yanqi Zhou; David Wentzlaff


IEEE International Symposium on High-Performance Computer Architecture (HPCA) | 2017

Camouflage: Memory Traffic Shaping to Mitigate Timing Attacks

Yanqi Zhou; Sameer Wagh; Prateek Mittal; David Wentzlaff

Collaboration


Dive into Yanqi Zhou's collaborations.

Top Co-Authors
