Hari Angepat
University of Texas at Austin
Publication
Featured research published by Hari Angepat.
international symposium on microarchitecture | 2007
Derek Chiou; Dam Sunwoo; Joonsoo Kim; Nikhil A. Patil; William H. Reinhart; Darrel Eric Johnson; Jebediah Keefe; Hari Angepat
This paper describes FAST, a novel simulation methodology that can produce simulators that (i) are orders of magnitude faster than comparable simulators, (ii) are cycle-accurate, (iii) model the entire system running unmodified applications and operating systems, (iv) provide visibility with minimal simulation performance impact and (v) are capable of running current instruction sets such as x86. It achieves its capabilities by partitioning simulators into a speculative functional model component that simulates the instruction set architecture and a timing model component that predicts performance. The speculative functional model enables the simulator to be parallelized, implementing the timing model in FPGA hardware for speed and the functional model using a modified full-system simulator. We currently achieve an average simulation speed of 1.2 MIPS running x86 applications on x86 Linux and Windows XP and expect to achieve 10 MIPS over time. Such simulators are useful to virtually all computer system simulator users, ranging from architects, through RTL designers and verifiers, to software developers. Sharing a common simulation/design infrastructure could foster better communication between these groups, potentially resulting in better system designs.
IEEE Computer Architecture Letters | 2014
Maysam Lavasani; Hari Angepat; Derek Chiou
We present a method for accelerating server applications using a hybrid CPU+FPGA architecture and demonstrate its advantages by accelerating Memcached, a distributed key-value system. The accelerator, implemented on the FPGA fabric, processes request packets directly from the network, avoiding the CPU in most cases. The accelerator is created by profiling the application to determine the most commonly executed trace of basic blocks which are then extracted. Traces are executed speculatively within the FPGA. If the control flow exits the trace prematurely, the side effects of the computation are rolled back and the request packet is passed to the CPU. When compared to the best reported software numbers, the Memcached accelerator is 9.15× more energy efficient for common case requests.
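The speculate-then-roll-back scheme described in this abstract can be sketched in software. The following is a minimal illustration only, not the paper's accelerator: the `KVStore` class, the `TraceExit` exception, the `hits` counter, and the choice of "a GET that hits" as the common-case trace are all hypothetical stand-ins. The fast path plays the role of the extracted trace, and the slow path plays the role of the CPU.

```python
class TraceExit(Exception):
    """Raised when control flow leaves the common-case trace (hypothetical)."""


class KVStore:
    def __init__(self):
        self.data = {}
        self.hits = 0  # side-effect state that the trace updates speculatively

    def slow_path(self, request):
        """Stand-in for the full software handler running on the CPU."""
        op, key, *rest = request
        if op == "get":
            value = self.data.get(key)
            if value is not None:
                self.hits += 1
            return ("cpu", value)
        if op == "set":
            self.data[key] = rest[0]
            return ("cpu", "STORED")
        raise ValueError(f"unsupported operation: {op}")

    def fast_path(self, request):
        """Stand-in for the extracted trace: handles only a GET that hits."""
        op, key, *rest = request
        undo = []  # undo log of (attribute name, old value)
        try:
            if op != "get":
                raise TraceExit
            undo.append(("hits", self.hits))
            self.hits += 1  # speculative side effect
            if key not in self.data:
                raise TraceExit  # a miss leaves the trace
            return ("accel", self.data[key])
        except TraceExit:
            # Roll back every speculative side effect, newest first.
            for name, old in reversed(undo):
                setattr(self, name, old)
            return None

    def handle(self, request):
        """Try the trace first; fall back to the CPU path if it exits early."""
        result = self.fast_path(request)
        return result if result is not None else self.slow_path(request)


store = KVStore()
store.handle(("set", "a", "1"))        # not in the trace, handled by slow path
hit = store.handle(("get", "a"))       # stays in the trace
miss = store.handle(("get", "b"))      # exits the trace, rolled back
```

In this sketch the undo log is what makes the fallback safe: the slow path always sees the state as it was before the speculative attempt.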
IEEE Computer Architecture Letters | 2009
Derek Chiou; Hari Angepat; Nikhil A. Patil; Dam Sunwoo
Fast and accurate simulation of multicore systems requires a parallelized simulator. This paper describes a novel method to build cycle-accurate-capable and parallelizable functional-first simulators of multicore targets.
field-programmable logic and applications | 2010
Hari Angepat; Gage Eads; Christopher Craik; Derek Chiou
Debugging hardware has always been difficult when compared to debugging software, in large part due to a lack of convenient visibility. This paper describes the open NIFD framework that provides software-like debugging facilities to both pure FPGA and hybrid FPGA/software platforms, allowing a designer to treat the hardware logic like a specialized remote software debug target. NIFD provides features such as single stepping, breakpoints, and examination of the full hardware state from a standard debug console such as GDB. The framework leverages built-in readback support to enable non-intrusive, transparent debugging with full observability and controllability. This technique is not only useful for debugging, but can also be used in production environments for infrequent events such as the slow sampling of counters.
international parallel and distributed processing symposium | 2008
Derek Chiou; Dam Sunwoo; Hari Angepat; Joonsoo Kim; Nikhil A. Patil; William H. Reinhart; Darrel Eric Johnson
This paper describes NSF-supported work in parallelized computer system simulators being done in the Electrical and Computer Engineering Department at the University of Texas at Austin. Our work is currently following two paths: (i) the FAST simulation methodology [9, 11, 10] that is capable of simulating complex systems accurately and quickly (currently about 1.2 MIPS executing the x86 ISA, modeling an out-of-order superscalar processor and booting Windows XP and Linux) and (ii) the RAMP-White (White) [1, 22] platform that will soon be capable of simulating very large systems of around 1000 cores. We plan to combine the projects to provide fast and accurate simulation of multicore systems.
field programmable gate arrays | 2016
Sizhuo Zhang; Hari Angepat; Derek Chiou
Software messaging frameworks help avoid errors and reduce engineering effort in building distributed systems by (i) providing an interface definition language (IDL) to precisely specify the structure of the message (the message schema) and (ii) automatically generating the serialization and deserialization functions that transform user data structures into binary data for sending across the network and vice versa. Similarly, a hardware-accelerated system that consists of host software and multiple FPGAs could also benefit from a messaging framework to handle messages both between software and FPGAs and between different FPGAs. The key challenge for a hardware messaging framework is that it must be able to support large messages with complex schemas while meeting critical constraints such as clock frequency, area, and throughput. We present HGum, a messaging framework for hardware accelerators that meets all the above requirements. HGum is able to generate high-performance and low-cost hardware logic by employing a novel design that algorithmically parses the message schema to perform serialization and deserialization. Our evaluation of HGum shows that it not only significantly reduces engineering effort but also generates hardware with comparable quality to manual implementation.
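The general idea of deriving serialization and deserialization from a message schema can be illustrated in software. This is a rough sketch under stated assumptions, not HGum's IDL, wire format, or hardware design: the schema notation (type-name strings, `("list", T)` tuples, and dicts of named fields), the `PRIMS` table, and the 16-bit little-endian length prefix are all invented for the example.

```python
import struct

# Hypothetical schema notation (not HGum's IDL):
#   "u8" / "u16" / "u32"  -> unsigned little-endian integers
#   ("list", T)           -> 16-bit length prefix followed by elements of T
#   {name: T, ...}        -> struct with fields in declaration order
PRIMS = {"u8": "B", "u16": "H", "u32": "I"}


def serialize(schema, value):
    """Walk the schema and emit the value's binary encoding."""
    if isinstance(schema, str):
        return struct.pack("<" + PRIMS[schema], value)
    if isinstance(schema, tuple) and schema[0] == "list":
        header = struct.pack("<H", len(value))
        return header + b"".join(serialize(schema[1], v) for v in value)
    return b"".join(serialize(t, value[name]) for name, t in schema.items())


def deserialize(schema, buf, pos=0):
    """Walk the same schema to decode; returns (value, next position)."""
    if isinstance(schema, str):
        fmt = "<" + PRIMS[schema]
        (v,) = struct.unpack_from(fmt, buf, pos)
        return v, pos + struct.calcsize(fmt)
    if isinstance(schema, tuple) and schema[0] == "list":
        (count,) = struct.unpack_from("<H", buf, pos)
        pos += 2
        items = []
        for _ in range(count):
            item, pos = deserialize(schema[1], buf, pos)
            items.append(item)
        return items, pos
    out = {}
    for name, t in schema.items():
        out[name], pos = deserialize(t, buf, pos)
    return out, pos


schema = {"id": "u32", "tags": ("list", "u16"), "point": {"x": "u8", "y": "u8"}}
message = {"id": 7, "tags": [1, 2, 3], "point": {"x": 4, "y": 5}}
wire = serialize(schema, message)
decoded, consumed = deserialize(schema, wire)
```

Because both directions are driven by the same schema walk, adding a field to the schema changes the encoder and decoder together, which is the error-avoidance benefit the abstract attributes to messaging frameworks.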
field programmable gate arrays | 2016
Shlomi Alkalay; Hari Angepat; Adrian M. Caulfield; Eric S. Chung; Oren Firestein; Michael Haselman; Stephen Heil; Kyle Holohan; Matt Humphrey; Tamás Juhász; Puneet Kaur; Sitaram Lanka; Daniel Lo; Todd Massengill; Kalin Ovtcharov; Michael Papamichael; Andrew Putnam; Raja Seera; Rimon Tadros; Jason Thong; Lisa Woods; Derek Chiou; Doug Burger
In 2015, a team of software and hardware developers at Microsoft shipped the world's first commercial search engine accelerated using FPGAs in the datacenter. During the sprint to production, new algorithms in the Bing ranking service were ported into FPGAs and deployed to a production bed within several weeks of conception, leading to significant gains in latency and throughput. The fast turnaround time of new features demanded by an agile software culture would not have been possible without a disciplined and effective approach to co-design in the datacenter. This talk will describe some of the learnings and best practices developed from this unique experience.
international symposium on microarchitecture | 2016
Adrian M. Caulfield; Eric S. Chung; Andrew Putnam; Hari Angepat; Jeremy Fowers; Michael Haselman; Stephen Heil; Matt Humphrey; Puneet Kaur; Joo-Young Kim; Daniel Lo; Todd Massengill; Kalin Ovtcharov; Michael Papamichael; Lisa Woods; Sitaram Lanka; Derek Chiou; Doug Burger
IEEE Micro | 2018
Eric S. Chung; Jeremy Fowers; Kalin Ovtcharov; Michael Papamichael; Adrian M. Caulfield; Todd Massengill; Ming Liu; Daniel Lo; Shlomi Alkalay; Michael Haselman; Maleen Abeydeera; Logan Adams; Hari Angepat; Christian Boehn; Derek Chiou; Oren Firestein; Alessandro Forin; Kang Su Gatlin; Mahdi Ghandi; Stephen Heil; Kyle Holohan; Ahmad M. El Husseini; Tamás Juhász; Kara Kagi; Ratna Kovvuri; Sitaram Lanka; Friedel van Megen; Dima Mukhortov; Prerak Patel; Brandon Perez
Synthesis Lectures on Computer Architecture | 2014
Hari Angepat; Derek Chiou; Eric S. Chung; James C. Hoe