Ilya K. Ganusov
Xilinx
Publication
Featured research published by Ilya K. Ganusov.
IEEE Micro | 2016
Sagheer Ahmad; Vamsi Boppana; Ilya K. Ganusov; Vinod K. Kathail; Vidya Rajagopalan; Ralph D. Wittig
This article presents the Zynq UltraScale+ MPSoC (multiprocessor system on chip), which builds on the Zynq-7000 family. Compared to the first-generation Zynq, MPSoC increases performance and power efficiency while significantly improving the integration level between the SoC and the field-programmable gate array (FPGA). It also further raises the programming abstraction with the introduction of a new heterogeneous system-wide compiler. At the hardware level, system-wide coherency and shared virtual memory bridge across the processor subsystem into the programmable logic array. The new SDSoC (software-designed SoC) environment combines the ARM compiler with a high-level synthesis technology-based FPGA compiler and a full-system optimizing compiler to target all elements of the heterogeneous SoC from a common program source.
Field Programmable Logic and Applications | 2016
Ilya K. Ganusov; Henri Fraisse; Aaron N. Ng; Rafael Trapani Possignolo; Sabya Das
This paper describes the methodology and algorithms behind extra pipeline analysis tools released in the Xilinx Vivado Design Suite version 2015.3. Extra pipelining is one of the most effective ways to improve performance of FPGA applications. Manual pipelining, however, often requires significant effort from FPGA designers, who need to explore various changes in the RTL and re-run the flow iteratively. The automatic pipelining approach described in this paper, in contrast, allows FPGA users to explore latency vs. performance trade-offs of their designs before investing time and effort into modifying RTL. We describe the algorithms behind these tools, which use simple cut heuristics to maximize performance improvement while minimizing additional latency and register overhead. To demonstrate the effectiveness of the proposed approach, we analyse a set of 93 commercial FPGA applications and IP blocks mapped to Xilinx UltraScale+ and UltraScale generations of FPGAs. The results show that extra pipelining can provide from 18% to 29% potential Fmax improvement on average. They also show that the distribution of improvements is bimodal, with almost half of the designs in the benchmark suite showing no improvement due to the presence of large loops. Finally, we demonstrate that highly pipelined designs map well to UltraScale+ and UltraScale FPGA architectures. Our approach demonstrates 19% and 20% Fmax improvement potential for the UltraScale+ and UltraScale architectures respectively, with the majority of applications reaching their loop limit through pipelining.
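The latency vs. performance trade-off and the loop limit mentioned in this abstract can be illustrated with a deliberately simplified model. This is only a sketch and not the Vivado algorithm or the paper's cut heuristics: the function names, the assumption that added registers split a feed-forward path evenly, and the assumption that loops cannot absorb extra registers are all introduced here for illustration.

```python
def achievable_period(ff_delay_ns, loop_delay_ns, extra_stages):
    """Idealized clock period after adding `extra_stages` pipeline registers.

    Assumption: registers split the feed-forward path evenly, and loops
    cannot take extra registers, so the loop delay is a hard floor.
    """
    pipelined = ff_delay_ns / (extra_stages + 1)
    return max(pipelined, loop_delay_ns)


def sweep(ff_delay_ns, loop_delay_ns, max_stages):
    """Return (extra_stages, period_ns, fmax_mhz) for each latency level."""
    rows = []
    for k in range(max_stages + 1):
        period = achievable_period(ff_delay_ns, loop_delay_ns, k)
        rows.append((k, period, 1000.0 / period))
    return rows
```

In this model a design whose loop delay already equals its critical path delay gains nothing from extra stages, which is one way to picture why designs dominated by large loops show no improvement.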
Field Programmable Logic and Applications | 2016
Ilya K. Ganusov; Benjamin S. Devlin
This paper presents enhancements to the Xilinx UltraScale+ clocking architecture to support fine-grain time-borrowing. Time borrowing improves performance by redistributing timing slack between fast and slow paths. The UltraScale+ architecture introduces programmable hardware delays and pulse generators embedded in the clocking tree to support time-borrowing based on both clock skew scheduling and pulsed latches. This programmable hardware allows borrowing from a few picoseconds to multiple nanoseconds between sequential pipeline stages without any changes to RTL, placement or routing. Vivado algorithms automatically determine when to skew flip-flop clocks or convert the flip-flops to pulsed latches to achieve the highest possible performance. Using the default Vivado flow, this programmable time-borrowing platform delivers a 5.5% Fmax increase on average over a suite of 89 industrial designs. It is especially effective on high-speed applications, delivering up to a 13.7% Fmax increase on individual designs. We also demonstrate that using non-default features, such as delay cascades or increased hold margin, can increase average performance gains to 7.4% and 8.5%, respectively. This platform incurs minimal area overhead (less than 0.1% of total chip area) while staying robust in the presence of tight hold constraints and increasing process variation.
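The idea of redistributing slack between a slow and a fast path can be shown with a two-stage example. This is a minimal sketch under stated assumptions, not the Vivado implementation: the function names are hypothetical, only setup constraints are modeled, and hold checks, which the abstract notes are a real constraint, are ignored.

```python
def min_period(d_first_ns, d_second_ns, skew_ns=0.0):
    """Minimum clock period for two back-to-back pipeline stages.

    Delaying the middle register's clock by `skew_ns` gives the first
    stage T + skew to settle and leaves the second stage T - skew.
    Setup constraints only; hold checks are ignored in this sketch.
    """
    return max(d_first_ns - skew_ns, d_second_ns + skew_ns)


def best_skew(d_first_ns, d_second_ns, max_skew_ns):
    """Skew that balances the two stages, clamped to the available range."""
    ideal = (d_first_ns - d_second_ns) / 2.0
    return max(-max_skew_ns, min(max_skew_ns, ideal))
```

For example, with stage delays of 4 ns and 2 ns, no skew gives a 4 ns period, while a 1 ns skew on the middle register balances both stages at 3 ns. If the hypothetical hardware range is limited to 0.5 ns, the period only drops to 3.5 ns.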
IEEE Hot Chips Symposium | 2015
Vamsi Boppana; Sagheer Ahmad; Ilya K. Ganusov; Vinod K. Kathail; Vidya Rajagopalan; Ralph D. Wittig
Archive | 2016
Benjamin S. Devlin; Ilya K. Ganusov
Archive | 2015
Ilya K. Ganusov; Benjamin S. Devlin
Archive | 2013
Ilya K. Ganusov; Manu Jose
Archive | 2013
Ilya K. Ganusov; Brian C. Gaide
Archive | 2017
Ilya K. Ganusov; Benjamin S. Devlin
Archive | 2017
Jindrich Zejda; Atul Srinivasan; Ilya K. Ganusov; Walter A. Manaker; Benjamin S. Devlin; Satish Sivaswamy