Publication


Featured research published by Kalin Ovtcharov.


field-programmable custom computing machines | 2014

A High Memory Bandwidth FPGA Accelerator for Sparse Matrix-Vector Multiplication

Jeremy Fowers; Kalin Ovtcharov; Karin Strauss; Eric S. Chung; Greg Stitt

Sparse matrix-vector multiplication (SMVM) is a crucial primitive used in a variety of scientific and commercial applications. Despite having significant parallelism, SMVM is a challenging kernel to optimize due to its irregular memory access characteristics. Numerous studies have proposed the use of FPGAs to accelerate SMVM implementations. However, most prior approaches focus on parallelizing multiply-accumulate operations within a single row of the matrix (which limits parallelism if rows are small) and/or make inefficient use of the memory system when fetching matrix and vector elements. In this paper, we introduce an FPGA-optimized SMVM architecture and a novel sparse matrix encoding that explicitly exposes parallelism across rows, while keeping the hardware complexity and on-chip memory usage low. This system compares favorably with prior FPGA SMVM implementations. For the over 700 University of Florida sparse matrices we evaluated, it also performs within about two thirds of CPU SMVM performance on average, even though it has 2.4× lower DRAM memory bandwidth, and within almost one third of GPU SMVM performance on average, even at 9× lower memory bandwidth. Additionally, it consumes only 25 W, for power efficiencies 2.6× and 2.3× higher than CPU and GPU, respectively, based on maximum device power.

In this paper, we describe a novel technique to optimize the longest common subsequence (LCS) algorithm for the one-to-many matching problem on GPUs by transforming the computation into bit-wise operations and a post-processing step. The former can be highly optimized and achieves more than a trillion operations (cell updates) per second (CUPS), a first for LCS algorithms. The latter is more efficiently done on CPUs, in a fraction of the bit-wise computation time. The bit-wise step promises to be a foundational step and a fundamentally new approach to developing algorithms for increasingly popular heterogeneous environments that could dramatically increase the applicability of hybrid CPU-GPU environments.

Network centric core avionics attempts to solve the question of simplicity and dependability in computing by means of a fault-tolerant and robust architecture called middleware. By this means software services can be distributed via nodes which are non-dependable and which can migrate in the case of node failure, thus creating a reliable network of unreliable parts.
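
For readers unfamiliar with the SMVM kernel named in the first abstract above, the following is a minimal reference sketch of sparse matrix-vector multiplication. It assumes the standard compressed sparse row (CSR) layout, not the novel encoding the paper introduces, and the function name `csr_spmv` and the example matrix are purely illustrative. It shows why memory access is irregular: each row reads the vector `x` at positions dictated by the stored column indices.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    values  : nonzero entries, row by row
    col_idx : column index of each entry in `values`
    row_ptr : row_ptr[i]..row_ptr[i+1] delimits row i inside `values`
    """
    num_rows = len(row_ptr) - 1
    y = [0.0] * num_rows
    for row in range(num_rows):
        acc = 0.0
        # Multiply-accumulate over the nonzeros of this row only.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # x is indexed indirectly, which makes accesses irregular.
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Illustrative 3x3 matrix (an assumption, not data from the paper):
# [[2, 0, 1],
#  [0, 3, 0],
#  [4, 0, 5]]
values = [2.0, 1.0, 3.0, 4.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
x = [1.0, 2.0, 3.0]

y = csr_spmv(values, col_idx, row_ptr, x)
print(y)
```

The outer loop iterations are independent of one another, which is the row-level parallelism the abstract refers to; the inner loop is the within-row multiply-accumulate that becomes a bottleneck when rows hold few nonzeros.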


field programmable gate arrays | 2016

Agile Co-Design for a Reconfigurable Datacenter

Shlomi Alkalay; Hari Angepat; Adrian M. Caulfield; Eric S. Chung; Oren Firestein; Michael Haselman; Stephen Heil; Kyle Holohan; Matt Humphrey; Tamás Juhász; Puneet Kaur; Sitaram Lanka; Daniel Lo; Todd Massengill; Kalin Ovtcharov; Michael Papamichael; Andrew Putnam; Raja Seera; Rimon Tadros; Jason Thong; Lisa Woods; Derek Chiou; Doug Burger

In 2015, a team of software and hardware developers at Microsoft shipped the world's first commercial search engine accelerated using FPGAs in the datacenter. During the sprint to production, new algorithms in the Bing ranking service were ported into FPGAs and deployed to a production bed within several weeks of conception, leading to significant gains in latency and throughput. The fast turnaround time of new features demanded by an agile software culture would not have been possible without a disciplined and effective approach to co-design in the datacenter. This talk will describe some of the learnings and best practices developed from this unique experience.


Archive | 2015

Accelerating Deep Convolutional Neural Networks Using Specialized Hardware

Kalin Ovtcharov; Olatunji Ruwase; Joo-Young Kim; Jeremy Fowers; Karin Strauss; Eric S. Chung


international symposium on microarchitecture | 2016

A cloud-scale acceleration architecture

Adrian M. Caulfield; Eric S. Chung; Andrew Putnam; Hari Angepat; Jeremy Fowers; Michael Haselman; Stephen Heil; Matt Humphrey; Puneet Kaur; Joo-Young Kim; Daniel Lo; Todd Massengill; Kalin Ovtcharov; Michael Papamichael; Lisa Woods; Sitaram Lanka; Derek Chiou; Doug Burger


ieee hot chips symposium | 2015

Toward accelerating deep learning at scale using specialized hardware in the datacenter

Kalin Ovtcharov; Olatunji Ruwase; Joo-Young Kim; Jeremy Fowers; Karin Strauss; Eric S. Chung


IEEE Micro | 2018

Serving DNNs in Real Time at Datacenter Scale with Project Brainwave

Eric S. Chung; Jeremy Fowers; Kalin Ovtcharov; Michael Papamichael; Adrian M. Caulfield; Todd Massengill; Ming Liu; Daniel Lo; Shlomi Alkalay; Michael Haselman; Maleen Abeydeera; Logan Adams; Hari Angepat; Christian Boehn; Derek Chiou; Oren Firestein; Alessandro Forin; Kang Su Gatlin; Mahdi Ghandi; Stephen Heil; Kyle Holohan; Ahmad M. El Husseini; Tamás Juhász; Kara Kagi; Ratna Kovvuri; Sitaram Lanka; Friedel van Megen; Dima Mukhortov; Prerak Patel; Brandon Perez


Archive | 2014

Sparse matrix data structure

Karin Strauss; Jeremy Fowers; Kalin Ovtcharov


networked systems design and implementation | 2018

Azure Accelerated Networking: SmartNICs in the Public Cloud.

Daniel Firestone; Andrew Putnam; Sambrama Mundkur; Derek Chiou; Alireza Dabagh; Mike Andrewartha; Hari Angepat; Vivek Bhanu; Adrian M. Caulfield; Eric S. Chung; Harish Kumar Chandrappa; Somesh Chaturmohta; Matt Humphrey; Jack Lavier; Norman Lam; Fengfen Liu; Kalin Ovtcharov; Jitu Padhye; Gautham Popuri; Shachar Raindel; Tejas Sapre; Mark Q. Shaw; Gabriel Silva; Madhan Sivakumar; Nisheeth Srivastava; Anshuman Verma; Qasim Zuhair; Deepak Bansal; Doug Burger; Kushagra Vaid


Archive | 2016

Convolutional neural networks on hardware accelerators

Eric S. Chung; Karin Strauss; Kalin Ovtcharov; Joo-Young Kim; Olatunji Ruwase


international symposium on computer architecture | 2018

A configurable cloud-scale DNN processor for real-time AI

Jeremy Fowers; Kalin Ovtcharov; Michael Papamichael; Todd Massengill; Ming Liu; Daniel Lo; Shlomi Alkalay; Michael Haselman; Logan Adams; Mahdi Ghandi; Stephen Heil; Prerak Patel; Adam Sapek; Gabriel Weisz; Lisa Woods; Sitaram Lanka; Steven K. Reinhardt; Adrian M. Caulfield; Eric S. Chung; Doug Burger

Collaboration


Dive into Kalin Ovtcharov's collaborations.

Top Co-Authors

Hari Angepat

University of Texas at Austin
