Hari Angepat | Researchain

Publications

Featured research published by Hari Angepat.

International Symposium on Microarchitecture | 2007

FPGA-Accelerated Simulation Technologies (FAST): Fast, Full-System, Cycle-Accurate Simulators

Derek Chiou; Dam Sunwoo; Joonsoo Kim; Nikhil A. Patil; William H. Reinhart; Darrel Eric Johnson; Jebediah Keefe; Hari Angepat

This paper describes FAST, a novel simulation methodology that can produce simulators that (i) are orders of magnitude faster than comparable simulators, (ii) are cycle-accurate, (iii) model the entire system running unmodified applications and operating systems, (iv) provide visibility with minimal simulation performance impact, and (v) are capable of running current instruction sets such as x86. It achieves these capabilities by partitioning simulators into a speculative functional model component that simulates the instruction set architecture and a timing model component that predicts performance. The speculative functional model enables the simulator to be parallelized, with the timing model implemented in FPGA hardware for speed and the functional model implemented using a modified full-system simulator. We currently achieve an average simulation speed of 1.2 MIPS running x86 applications on x86 Linux and Windows XP and expect to achieve 10 MIPS over time. Such simulators are useful to virtually all computer system simulator users, ranging from architects through RTL designers and verifiers to software developers. Sharing a common simulation/design infrastructure could foster better communication between these groups, potentially resulting in better system designs.
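The split between a functional model and a timing model can be illustrated with a toy, software-only sketch. The ISA, the register count, the latencies and the class names below are all hypothetical. In FAST the timing model runs in FPGA hardware and the two components run in parallel. Here they simply alternate in one Python process. The functional model computes architectural state and streams a trace. The timing model never computes values: it only reads the trace and predicts cycles. An undo log stands in for the rollback that the speculative functional model must support when the timing model disagrees with the path it took.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical toy ISA, for illustration only:
#   ("addi", rd, rs, imm)   rd = rs + imm
#   ("mul",  rd, rs1, rs2)  rd = rs1 * rs2
#   ("bnez", rs, target)    branch to target if rs != 0


@dataclass
class TraceEntry:
    pc: int
    op: str
    dest: Optional[int]
    srcs: Tuple[int, ...]


class FunctionalModel:
    """Executes the ISA and streams one trace entry per instruction.
    An undo log lets the timing model force a rollback."""

    def __init__(self, program, nregs=8):
        self.program = program
        self.regs = [0] * nregs
        self.pc = 0
        self.undo: List[Tuple[int, Optional[int], int]] = []

    def done(self) -> bool:
        return self.pc >= len(self.program)

    def step(self) -> TraceEntry:
        op, *args = self.program[self.pc]
        old_pc, dest, srcs, old_val = self.pc, None, (), 0
        if op == "addi":
            rd, rs, imm = args
            dest, srcs, old_val = rd, (rs,), self.regs[rd]
            self.regs[rd] = self.regs[rs] + imm
            self.pc += 1
        elif op == "mul":
            rd, rs1, rs2 = args
            dest, srcs, old_val = rd, (rs1, rs2), self.regs[rd]
            self.regs[rd] = self.regs[rs1] * self.regs[rs2]
            self.pc += 1
        elif op == "bnez":
            rs, target = args
            srcs = (rs,)
            self.pc = target if self.regs[rs] != 0 else self.pc + 1
        else:
            raise ValueError(f"unknown opcode {op!r}")
        self.undo.append((old_pc, dest, old_val))
        return TraceEntry(old_pc, op, dest, srcs)

    def rollback(self, n: int) -> None:
        """Undo the last n instructions, restoring registers and the PC."""
        for _ in range(n):
            old_pc, dest, old_val = self.undo.pop()
            if dest is not None:
                self.regs[dest] = old_val
            self.pc = old_pc


class TimingModel:
    """Predicts cycles for a simple in-order, single-issue pipeline."""

    LATENCY = {"addi": 1, "mul": 3, "bnez": 1}  # hypothetical latencies

    def __init__(self, nregs=8):
        self.ready = [0] * nregs  # cycle at which each register becomes available
        self.cycle = 0            # next free issue slot

    def consume(self, e: TraceEntry) -> None:
        # Stall issue until every source register is ready (RAW hazards).
        issue = max([self.cycle] + [self.ready[r] for r in e.srcs])
        if e.dest is not None:
            self.ready[e.dest] = issue + self.LATENCY[e.op]
        self.cycle = issue + 1

    def finish(self) -> int:
        return max([self.cycle] + self.ready)


def simulate(program):
    fm, tm = FunctionalModel(program), TimingModel()
    count = 0
    while not fm.done():
        tm.consume(fm.step())  # FAST overlaps these; this sketch alternates them
        count += 1
    return fm, count, tm.finish()


PROGRAM = [
    ("addi", 1, 0, 3),   # r1 = 3 (loop counter)
    ("addi", 2, 0, 2),   # r2 = 2
    ("mul", 3, 2, 2),    # r3 = r2 * r2
    ("addi", 1, 1, -1),  # r1 -= 1
    ("bnez", 1, 2),      # loop back to the mul while r1 != 0
]

if __name__ == "__main__":
    fm, n, cycles = simulate(PROGRAM)
    print(n, "instructions,", cycles, "predicted cycles")
```

Swapping in a different `TimingModel` (for example, one with a different latency table) changes the predicted cycles without touching the functional model. That separation is what the partitioning buys.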

IEEE Computer Architecture Letters | 2014

An FPGA-based In-Line Accelerator for Memcached

Maysam Lavasani; Hari Angepat; Derek Chiou

We present a method for accelerating server applications using a hybrid CPU+FPGA architecture and demonstrate its advantages by accelerating Memcached, a distributed key-value system. The accelerator, implemented on the FPGA fabric, processes request packets directly from the network, avoiding the CPU in most cases. The accelerator is created by profiling the application to determine the most commonly executed trace of basic blocks which are then extracted. Traces are executed speculatively within the FPGA. If the control flow exits the trace prematurely, the side effects of the computation are rolled back and the request packet is passed to the CPU. When compared to the best reported software numbers, the Memcached accelerator is 9.15× more energy efficient for common case requests.
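The control structure described above has three parts: speculatively executing an extracted hot trace, discarding its side effects when control flow leaves the trace, and handing the request to the CPU. It can be mimicked in software. The sketch below is only an analogy. The real accelerator is FPGA logic operating on network packets. The request format is a simplified, hypothetical text protocol rather than the actual Memcached protocol. The names `hot_trace`, `cpu_handler` and the per-key hit counter are illustrative assumptions.

```python
from typing import Tuple


class TraceExit(Exception):
    """Raised when control flow leaves the extracted common-case trace."""


class SpeculativeStore:
    """Buffers writes so a trace's side effects can be committed or discarded."""

    def __init__(self, backing: dict):
        self.backing = backing
        self.log: dict = {}

    def get(self, key, default=None):
        if key in self.log:
            return self.log[key]
        return self.backing.get(key, default)

    def put(self, key, value) -> None:
        self.log[key] = value  # invisible to the backing table until commit()

    def commit(self) -> None:
        self.backing.update(self.log)
        self.log.clear()


def hot_trace(req: str, store: SpeculativeStore) -> str:
    """Hypothetical extracted trace: the common case is a GET that hits."""
    cmd, _, key = req.partition(" ")
    # Speculative side effect, executed before the guards below are resolved.
    store.put(("hits", key), store.get(("hits", key), 0) + 1)
    if cmd != "get":
        raise TraceExit("not a GET")
    value = store.get(key)
    if value is None:
        raise TraceExit("miss")
    return "VALUE " + value


def cpu_handler(req: str, table: dict) -> str:
    """Full software path for everything the trace does not cover."""
    parts = req.split(" ", 2)
    if parts[0] == "get" and len(parts) == 2:
        value = table.get(parts[1])
        return "END" if value is None else "VALUE " + value
    if parts[0] == "set" and len(parts) == 3:
        table[parts[1]] = parts[2]
        return "STORED"
    return "ERROR"


def handle_request(req: str, table: dict) -> Tuple[str, str]:
    spec = SpeculativeStore(table)
    try:
        resp = hot_trace(req, spec)
    except TraceExit:
        # Dropping spec.log is the rollback; the CPU then redoes the request.
        return cpu_handler(req, table), "cpu"
    spec.commit()
    return resp, "accelerator"
```

Buffering writes until the trace completes keeps rollback cheap: an early exit just discards the log, and the CPU never sees partial state.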

IEEE Computer Architecture Letters | 2009

Accurate Functional-First Multicore Simulators

Derek Chiou; Hari Angepat; Nikhil A. Patil; Dam Sunwoo

Fast and accurate simulation of multicore systems requires a parallelized simulator. This paper describes a novel method to build cycle-accurate-capable and parallelizable functional-first simulators of multicore targets.

Field-Programmable Logic and Applications | 2010

NIFD: Non-intrusive FPGA Debugger -- Debugging FPGA 'Threads' for Rapid HW/SW Systems Prototyping

Hari Angepat; Gage Eads; Christopher Craik; Derek Chiou

Debugging hardware has always been difficult when compared to debugging software, in large part due to a lack of convenient visibility. This paper describes the open NIFD framework that provides software-like debugging facilities to both pure FPGA and hybrid FPGA/software platforms, allowing a designer to treat the hardware logic like a specialized remote software debug target. NIFD provides features such as single stepping, breakpoints, and examination of the full hardware state from a standard debug console such as GDB. The framework leverages built-in readback support to enable non-intrusive, transparent debugging with full observability and controllability. This technique is not only useful for debugging, but can also be used in production environments for infrequent events such as the slow sampling of counters.
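The debugging model described above treats user logic as a remote target that can be single-stepped, stopped at breakpoints and inspected. The sketch below shows it with a software stand-in for the hardware. The design class, the breakpoint predicates and the method names are hypothetical. Gating the clock is modeled as calling `clock()` once per step. Configuration readback is modeled as copying the state, so that observation does not perturb the design. NIFD itself exposes these operations through a standard console such as GDB; this sketch does not implement that interface.

```python
from typing import Callable, Dict, List, Optional

State = Dict[str, int]


class AccumulatorDesign:
    """Hypothetical stand-in for FPGA user logic; clock() is one rising edge."""

    def __init__(self):
        self.state: State = {"count": 0, "acc": 0}

    def clock(self) -> None:
        s = self.state
        s["acc"] += s["count"]
        s["count"] += 1


class Debugger:
    """GDB-like control over a clock-gated design."""

    def __init__(self, design):
        self.design = design
        self.cycle = 0
        self.breakpoints: List[Callable[[State], bool]] = []

    def readback(self) -> State:
        # Return a copy: observing state must not perturb the design.
        return dict(self.design.state)

    def step(self, n: int = 1) -> State:
        for _ in range(n):
            self.design.clock()  # release the gated clock for exactly one edge
            self.cycle += 1
        return self.readback()

    def add_breakpoint(self, pred: Callable[[State], bool]) -> None:
        self.breakpoints.append(pred)

    def cont(self, max_cycles: int = 1_000_000) -> Optional[State]:
        """Run until any breakpoint predicate holds; None if max_cycles elapse first."""
        for _ in range(max_cycles):
            snap = self.step()
            if any(bp(snap) for bp in self.breakpoints):
                return snap
        return None


if __name__ == "__main__":
    dbg = Debugger(AccumulatorDesign())
    dbg.add_breakpoint(lambda s: s["acc"] >= 10)
    print("stopped at cycle", dbg.cycle, "with state", dbg.cont())
```

Breakpoints here are predicates over the full state snapshot, so they can fire on any register condition, not just on a particular cycle count.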

International Parallel and Distributed Processing Symposium | 2008

Parallelizing computer system simulators

Derek Chiou; Dam Sunwoo; Hari Angepat; Joonsoo Kim; Nikhil A. Patil; William H. Reinhart; Darrel Eric Johnson

This paper describes NSF-supported work in parallelized computer system simulators being done in the Electrical and Computer Engineering Department at the University of Texas at Austin. Our work currently follows two paths: (i) the FAST simulation methodology [9, 11, 10], which is capable of simulating complex systems accurately and quickly (currently about 1.2 MIPS executing the x86 ISA, modeling an out-of-order superscalar processor and booting Windows XP and Linux), and (ii) the RAMP-White (White) [1, 22] platform, which will soon be capable of simulating very large systems of around 1000 cores. We plan to combine the projects to provide fast and accurate simulation of multicore systems.

Field-Programmable Gate Arrays | 2016

HGum: Messaging Framework for Hardware Accelerators (Abstract Only)

Sizhuo Zhang; Hari Angepat; Derek Chiou

Software messaging frameworks help avoid errors and reduce engineering effort in building distributed systems by (i) providing an interface definition language (IDL) to precisely specify the structure of the message (the message schema) and (ii) automatically generating the serialization and deserialization functions that transform user data structures into binary data for sending across the network and vice versa. Similarly, a hardware-accelerated system that consists of host software and multiple FPGAs could also benefit from a messaging framework to handle messages both between software and FPGA and between different FPGAs. The key challenge for a hardware messaging framework is that it must support large messages with complex schemas while meeting critical constraints such as clock frequency, area, and throughput. We present HGum, a messaging framework for hardware accelerators that meets all the above requirements. HGum is able to generate high-performance and low-cost hardware logic by employing a novel design that algorithmically parses the message schema to perform serialization and deserialization. Our evaluation of HGum shows that it not only significantly reduces engineering effort but also generates hardware of comparable quality to manual implementations.
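Schema-driven serialization can be illustrated in software: one generic routine walks the message schema, instead of separate code being hand-written for each message type. HGum performs this walk in generated hardware logic. The recursive Python interpreter below is only an analogy. The schema encoding (`u32`, `struct`, `list` with a length prefix) and the example `DOC` message are hypothetical and are not HGum's actual IDL or wire format.

```python
import struct
from typing import Any, Tuple

# Hypothetical schema encoding (not HGum's actual IDL):
#   ("u32",)                            32-bit unsigned little-endian integer
#   ("struct", ((name, schema), ...))   fields serialized in order
#   ("list", elem_schema)               u32 element count, then the elements
U32 = ("u32",)


def struct_of(*fields):
    return ("struct", fields)


def list_of(elem):
    return ("list", elem)


def serialize(schema, value, out: bytearray) -> bytearray:
    """Walk the schema and append the binary encoding of value to out."""
    kind = schema[0]
    if kind == "u32":
        out.extend(struct.pack("<I", value))
    elif kind == "struct":
        for name, sub in schema[1]:
            serialize(sub, value[name], out)
    elif kind == "list":
        out.extend(struct.pack("<I", len(value)))
        for item in value:
            serialize(schema[1], item, out)
    else:
        raise ValueError(f"unknown schema kind {kind!r}")
    return out


def deserialize(schema, buf: bytes, pos: int = 0) -> Tuple[Any, int]:
    """Walk the same schema to rebuild the value; returns (value, next offset)."""
    kind = schema[0]
    if kind == "u32":
        (v,) = struct.unpack_from("<I", buf, pos)
        return v, pos + 4
    if kind == "struct":
        obj = {}
        for name, sub in schema[1]:
            obj[name], pos = deserialize(sub, buf, pos)
        return obj, pos
    if kind == "list":
        (n,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        items = []
        for _ in range(n):
            item, pos = deserialize(schema[1], buf, pos)
            items.append(item)
        return items, pos
    raise ValueError(f"unknown schema kind {kind!r}")


# Hypothetical message: a document id plus a variable-length list of (term, freq) pairs.
DOC = struct_of(("id", U32), ("terms", list_of(struct_of(("term", U32), ("freq", U32)))))
```

For example, `serialize(DOC, {"id": 7, "terms": [{"term": 1, "freq": 2}]}, bytearray())` yields 16 bytes: 4 for the id, 4 for the list length, and 8 for the single pair. Adding a new message type means writing a new schema, not a new serializer.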

Field-Programmable Gate Arrays | 2016

Agile Co-Design for a Reconfigurable Datacenter

Shlomi Alkalay; Hari Angepat; Adrian M. Caulfield; Eric S. Chung; Oren Firestein; Michael Haselman; Stephen Heil; Kyle Holohan; Matt Humphrey; Tamás Juhász; Puneet Kaur; Sitaram Lanka; Daniel Lo; Todd Massengill; Kalin Ovtcharov; Michael Papamichael; Andrew Putnam; Raja Seera; Rimon Tadros; Jason Thong; Lisa Woods; Derek Chiou; Doug Burger

In 2015, a team of software and hardware developers at Microsoft shipped the world's first commercial search engine accelerated using FPGAs in the datacenter. During the sprint to production, new algorithms in the Bing ranking service were ported into FPGAs and deployed to a production bed within several weeks of conception, leading to significant gains in latency and throughput. The fast turnaround time for new features demanded by an agile software culture would not have been possible without a disciplined and effective approach to co-design in the datacenter. This talk will describe some of the lessons learned and best practices developed from this unique experience.

International Symposium on Microarchitecture | 2016

A cloud-scale acceleration architecture

Adrian M. Caulfield; Eric S. Chung; Andrew Putnam; Hari Angepat; Jeremy Fowers; Michael Haselman; Stephen Heil; Matt Humphrey; Puneet Kaur; Joo-Young Kim; Daniel Lo; Todd Massengill; Kalin Ovtcharov; Michael Papamichael; Lisa Woods; Sitaram Lanka; Derek Chiou; Doug Burger

IEEE Micro | 2018

Serving DNNs in Real Time at Datacenter Scale with Project Brainwave

Eric S. Chung; Jeremy Fowers; Kalin Ovtcharov; Michael Papamichael; Adrian M. Caulfield; Todd Massengill; Ming Liu; Daniel Lo; Shlomi Alkalay; Michael Haselman; Maleen Abeydeera; Logan Adams; Hari Angepat; Christian Boehn; Derek Chiou; Oren Firestein; Alessandro Forin; Kang Su Gatlin; Mahdi Ghandi; Stephen Heil; Kyle Holohan; Ahmad M. El Husseini; Tamás Juhász; Kara Kagi; Ratna Kovvuri; Sitaram Lanka; Friedel van Megen; Dima Mukhortov; Prerak Patel; Brandon Perez

Synthesis Lectures on Computer Architecture | 2014