Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Michael Haselman is active.

Publication


Featured research published by Michael Haselman.


International Symposium on Computer Architecture (ISCA) | 2014

A reconfigurable fabric for accelerating large-scale datacenter services

Andrew Putnam; Adrian M. Caulfield; Eric S. Chung; Derek Chiou; Kypros Constantinides; John Demme; Hadi Esmaeilzadeh; Jeremy Fowers; Gopi Prashanth Gopal; Jan Gray; Michael Haselman; Scott Hauck; Stephen Heil; Amir Hormati; Joo-Young Kim; Sitaram Lanka; James R. Larus; Eric C. Peterson; Simon Pope; Aaron Smith; Jason Thong; Phillip Yi Xiao; Doug Burger

Datacenter workloads demand high computational capabilities, flexibility, power efficiency, and low cost. It is challenging to improve all of these factors simultaneously. To advance datacenter capabilities beyond what commodity server designs can provide, we have designed and built a composable, reconfigurable fabric to accelerate portions of large-scale software services. Each instantiation of the fabric consists of a 6×8 2-D torus of high-end Stratix V FPGAs embedded into a half-rack of 48 machines. One FPGA is placed into each server, accessible through PCIe, and wired directly to other FPGAs with pairs of 10 Gb SAS cables. In this paper, we describe a medium-scale deployment of this fabric on a bed of 1,632 servers, and measure its efficacy in accelerating the Bing web search engine. We describe the requirements and architecture of the system, detail the critical engineering challenges and solutions needed to make the system robust in the presence of failures, and measure the performance, power, and resilience of the system when ranking candidate documents. Under high load, the large-scale reconfigurable fabric improves the ranking throughput of each server by a factor of 95% for a fixed latency distribution, or, while maintaining equivalent throughput, reduces the tail latency by 29%.


IEEE Micro | 2015

A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services

Andrew Putnam; Adrian M. Caulfield; Eric S. Chung; Derek Chiou; Kypros Constantinides; John Demme; Hadi Esmaeilzadeh; Jeremy Fowers; Gopi Prashanth Gopal; Jan Gray; Michael Haselman; Scott Hauck; Stephen Heil; Amir Hormati; Joo-Young Kim; Sitaram Lanka; James R. Larus; Eric C. Peterson; Simon Pope; Aaron Smith; Jason Thong; Phillip Yi Xiao; Doug Burger

To advance datacenter capabilities beyond what commodity server designs can provide, the authors designed and built a composable, reconfigurable fabric to accelerate large-scale software services. Each instantiation of the fabric consists of a 6×8 2-D torus of high-end field-programmable gate arrays (FPGAs) embedded into a half-rack of 48 servers. The authors deployed the reconfigurable fabric in a bed of 1,632 servers and FPGAs in a production datacenter and successfully used it to accelerate the ranking portion of the Bing Web search engine by nearly a factor of two.


International Symposium on Computer Architecture (ISCA) | 2018

A configurable cloud-scale DNN processor for real-time AI

Jeremy Fowers; Kalin Ovtcharov; Michael Papamichael; Todd Massengill; Ming Liu; Daniel Lo; Shlomi Alkalay; Michael Haselman; Logan Adams; Mahdi Ghandi; Stephen Heil; Prerak Patel; Adam Sapek; Gabriel Weisz; Lisa Woods; Sitaram Lanka; Steven K. Reinhardt; Adrian M. Caulfield; Eric S. Chung; Doug Burger

Interactive AI-powered services require low-latency evaluation of deep neural network (DNN) models—aka real-time AI. The growing demand for computationally expensive, state-of-the-art DNNs, coupled with diminishing performance gains of general-purpose architectures, has fueled an explosion of specialized Neural Processing Units (NPUs). NPUs for interactive services should satisfy two requirements: (1) execution of DNN models with low latency, high throughput, and high efficiency, and (2) flexibility to accommodate evolving state-of-the-art models (e.g., RNNs, CNNs, MLPs) without costly silicon updates. This paper describes the NPU architecture for Project Brainwave, a production-scale system for real-time AI. The Brainwave NPU achieves more than an order of magnitude improvement in latency and throughput over state-of-the-art GPUs on large RNNs at a batch size of 1. The NPU attains this performance using a single-threaded SIMD ISA paired with a distributed microarchitecture capable of dispatching over 7M operations from a single instruction. The spatially distributed microarchitecture, scaled up to 96,000 multiply-accumulate units, is supported by hierarchical instruction decoders and schedulers coupled with thousands of independently addressable high-bandwidth on-chip memories, and can transparently exploit many levels of fine-grain SIMD parallelism. When targeting an FPGA, microarchitectural parameters such as native datapaths and numerical precision can be synthesis specialized to models at compile time, enabling atypically high FPGA performance competitive with hardened NPUs. When running on an Intel Stratix 10 280 FPGA, the Brainwave NPU achieves performance ranging from ten to over thirty-five teraflops, with no batching, on large, memory-intensive RNNs.


International Symposium on Field-Programmable Gate Arrays (FPGA) | 2016

Agile Co-Design for a Reconfigurable Datacenter

Shlomi Alkalay; Hari Angepat; Adrian M. Caulfield; Eric S. Chung; Oren Firestein; Michael Haselman; Stephen Heil; Kyle Holohan; Matt Humphrey; Tamás Juhász; Puneet Kaur; Sitaram Lanka; Daniel Lo; Todd Massengill; Kalin Ovtcharov; Michael Papamichael; Andrew Putnam; Raja Seera; Rimon Tadros; Jason Thong; Lisa Woods; Derek Chiou; Doug Burger

In 2015, a team of software and hardware developers at Microsoft shipped the world's first commercial search engine accelerated using FPGAs in the datacenter. During the sprint to production, new algorithms in the Bing ranking service were ported into FPGAs and deployed to a production bed within several weeks of conception, leading to significant gains in latency and throughput. The fast turnaround time of new features demanded by an agile software culture would not have been possible without a disciplined and effective approach to co-design in the datacenter. This talk will describe some of the lessons learned and best practices developed from this unique experience.


International Symposium on Microarchitecture (MICRO) | 2016

A cloud-scale acceleration architecture

Adrian M. Caulfield; Eric S. Chung; Andrew Putnam; Hari Angepat; Jeremy Fowers; Michael Haselman; Stephen Heil; Matt Humphrey; Puneet Kaur; Joo-Young Kim; Daniel Lo; Todd Massengill; Kalin Ovtcharov; Michael Papamichael; Lisa Woods; Sitaram Lanka; Derek Chiou; Doug Burger


IEEE Micro | 2018

Serving DNNs in Real Time at Datacenter Scale with Project Brainwave

Eric S. Chung; Jeremy Fowers; Kalin Ovtcharov; Michael Papamichael; Adrian M. Caulfield; Todd Massengill; Ming Liu; Daniel Lo; Shlomi Alkalay; Michael Haselman; Maleen Abeydeera; Logan Adams; Hari Angepat; Christian Boehn; Derek Chiou; Oren Firestein; Alessandro Forin; Kang Su Gatlin; Mahdi Ghandi; Stephen Heil; Kyle Holohan; Ahmad M. El Husseini; Tamás Juhász; Kara Kagi; Ratna Kovvuri; Sitaram Lanka; Friedel van Megen; Dima Mukhortov; Prerak Patel; Brandon Perez


IEEE Micro | 2017

Configurable Clouds

Adrian M. Caulfield; Eric S. Chung; Andrew Putnam; Hari Angepat; Daniel Firestone; Jeremy Fowers; Michael Haselman; Stephen Heil; Matt Humphrey; Puneet Kaur; Joo-Young Kim; Daniel Lo; Todd Massengill; Kalin Ovtcharov; Michael Papamichael; Lisa Woods; Sitaram Lanka; Derek Chiou; Doug Burger


Archive | 2016

Changing Between Different Roles at Acceleration Components

Andrew Putnam; Douglas C. Burger; Michael Haselman; Stephen Heil; Yi Xiao; Sitaram Lanka


Archive | 2016

Allocating Acceleration Component Functionality for Supporting Services

Douglas C. Burger; Andrew Putnam; Stephen Heil; Michael Haselman; Sitaram Lanka; Yi Xiao


IEEE Micro | 2016

A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services (IEEE MICRO Top Pick)

Andrew Putnam; Adrian M. Caulfield; Eric S. Chung; Derek Chiou; Kypros Constantinides; John Demme; Hadi Esmaeilzadeh; Jeremy Fowers; Gopi Prashanth Gopal; Jan Gray; Michael Haselman; Scott Hauck; Stephen Heil; Amir Hormati; Joo-Young Kim; Sitaram Lanka; James R. Larus; Eric C. Peterson; Simon Pope; Aaron Smith; Jason Thong; Phillip Yi Xiao; Doug Burger

Collaboration


Dive into Michael Haselman's collaborations.

Top Co-Authors


Derek Chiou

University of Texas at Austin
