Ilya K. Ganusov | Researchain

Publications

Featured publications by Ilya K. Ganusov.

IEEE Micro | 2016

A 16-nm Multiprocessing System-on-Chip Field-Programmable Gate Array Platform

Sagheer Ahmad; Vamsi Boppana; Ilya K. Ganusov; Vinod K. Kathail; Vidya Rajagopalan; Ralph D. Wittig

This article presents the Zynq UltraScale+ MPSoC (multiprocessor system on chip), which builds on the Zynq-7000 family. Compared to the first-generation Zynq, the MPSoC increases performance and power efficiency while significantly improving the level of integration between the SoC and the field-programmable gate array (FPGA). It also raises the programming abstraction further with a new heterogeneous, system-wide compiler. At the hardware level, system-wide coherency and shared virtual memory extend from the processor subsystem into the programmable logic array. The new SDSoC (software-defined SoC) environment combines the ARM compiler with an FPGA compiler based on high-level synthesis and a full-system optimizing compiler to target all elements of the heterogeneous SoC from a common program source.

Field Programmable Logic and Applications | 2016

Automated extra pipeline analysis of applications mapped to Xilinx UltraScale+ FPGAs

Ilya K. Ganusov; Henri Fraisse; Aaron N. Ng; Rafael Trapani Possignolo; Sabya Das

This paper describes the methodology and algorithms behind the extra pipeline analysis tools released in the Xilinx Vivado Design Suite version 2015.3. Extra pipelining is one of the most effective ways to improve the performance of FPGA applications. Manual pipelining, however, often requires significant effort from FPGA designers, who must explore various RTL changes and re-run the flow iteratively. The automatic pipelining approach described in this paper, in contrast, allows FPGA users to explore latency vs. performance trade-offs of their designs before investing time and effort in modifying RTL. We describe the algorithms behind these tools, which use simple cut heuristics to maximize performance improvement while minimizing additional latency and register overhead. To demonstrate the effectiveness of the proposed approach, we analyze a set of 93 commercial FPGA applications and IP blocks mapped to the Xilinx UltraScale+ and UltraScale generations of FPGAs. The results show that extra pipelining can provide 18% to 29% potential Fmax improvement on average. They also show that the distribution of improvements is bimodal, with almost half of the benchmark designs showing no improvement due to the presence of large loops. Finally, we demonstrate that highly pipelined designs map well to the UltraScale+ and UltraScale FPGA architectures. Our approach demonstrates 19% and 20% Fmax improvement potential for the UltraScale+ and UltraScale architectures, respectively, with the majority of applications reaching their loop limit through pipelining.
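The abstract does not detail the cut heuristics, but the loop limit that caps the gains can be sketched. The Python below is a minimal illustration under stated assumptions, not the Vivado algorithm: the design is modeled as a hypothetical timing graph whose edges carry a combinational delay in ns and a register count, every cycle is assumed to contain at least one register, and extra registers are assumed to be addable only on feed-forward paths (adding them inside a loop would change its cycle behavior). Under those assumptions the best reachable clock period is the largest delay-to-register ratio over any cycle, found here by binary search with Bellman-Ford negative-cycle detection. The names `has_violating_loop`, `loop_limit` and `EXAMPLE_EDGES` are illustrative.

```python
from typing import List, Tuple

# Hypothetical timing-graph edge: (src, dst, delay_ns, registers_on_edge)
Edge = Tuple[int, int, float, int]


def has_violating_loop(n: int, edges: List[Edge], period: float) -> bool:
    """True if some cycle has total delay > period * total registers.

    Bellman-Ford on weights (period * regs - delay): a negative cycle in
    these weights is exactly a loop that cannot run at `period`.
    """
    dist = [0.0] * n  # same as a virtual source with 0-weight edges to all nodes
    for _ in range(n):
        changed = False
        for u, v, delay, regs in edges:
            w = period * regs - delay
            if dist[u] + w < dist[v] - 1e-12:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return False  # converged, so no negative cycle
    return True  # still relaxing after n rounds: a negative cycle exists


def loop_limit(n: int, edges: List[Edge], tol: float = 1e-6) -> float:
    """Smallest period reachable by feed-forward pipelining alone: the
    maximum delay/register ratio over all cycles (about 0 if acyclic)."""
    lo = 0.0
    # Any cycle has delay <= sum of all delays and >= 1 register (assumed).
    hi = sum(delay for _, _, delay, _ in edges) + 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if has_violating_loop(n, edges, mid):
            lo = mid
        else:
            hi = mid
    return hi


# Hypothetical example: a 2-node loop (0 <-> 1) plus a long feed-forward edge 1 -> 2.
EXAMPLE_EDGES: List[Edge] = [
    (0, 1, 2.0, 1),
    (1, 0, 1.0, 1),
    (1, 2, 4.0, 1),
]

if __name__ == "__main__":
    print(f"loop limit: {loop_limit(3, EXAMPLE_EDGES):.3f} ns")
```

In this toy graph the two-node loop holds the period at 1.5 ns even though the 4.0 ns feed-forward edge could be split by extra pipeline registers, which mirrors why designs dominated by large loops show little or no improvement from extra pipelining.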

Field Programmable Logic and Applications | 2016

Time-borrowing platform in the Xilinx UltraScale+ family of FPGAs and MPSoCs

Ilya K. Ganusov; Benjamin S. Devlin

This paper presents enhancements to the Xilinx UltraScale+ clocking architecture that support fine-grain time borrowing. Time borrowing improves performance by redistributing timing slack between fast and slow paths. The UltraScale+ architecture introduces programmable hardware delays and pulse generators embedded in the clock tree to support time borrowing based on both clock skew scheduling and pulsed latches. This programmable hardware allows borrowing from a few picoseconds to multiple nanoseconds between sequential pipeline stages without any changes to RTL, placement, or routing. Vivado algorithms automatically determine when to skew flip-flop clocks or convert flip-flops to pulsed latches to achieve the highest possible performance. Using the default Vivado flow, this programmable time-borrowing platform delivers a 5.5% average Fmax increase over a suite of 89 industrial designs. It is especially effective on high-speed applications, delivering up to a 13.7% Fmax increase on individual designs. We also demonstrate that using non-default features, such as delay cascades or an increased hold margin, can raise the average performance gain to 7.4% and 8.5%, respectively. The platform incurs minimal area overhead (less than 0.1% of total chip area) while remaining robust in the presence of tight hold constraints and increasing process variation.
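To make the clock-skew half of this concrete, here is a minimal Python sketch of bounded clock skew scheduling. It assumes each flip-flop's clock can be delayed by a programmable amount in [0, max_skew] and that setup must hold on every timing arc, s[u] + delay <= s[v] + period. It ignores hold, setup and clock-to-out times, clock uncertainty and the pulsed-latch mode, and the function names, arcs and delay values are hypothetical; it is not the Vivado algorithm. Finding skews for a given period is a system of difference constraints, solved here with Bellman-Ford, and the minimum period is found by binary search.

```python
from typing import List, Optional, Tuple

# Hypothetical timing arc: (launch_ff, capture_ff, path_delay_ns)
Arc = Tuple[int, int, float]


def skew_schedule(n_ff: int, arcs: List[Arc], period: float,
                  max_skew: float) -> Optional[List[float]]:
    """Clock delays s[i] in [0, max_skew] such that every arc meets setup,
    s[u] + delay <= s[v] + period, or None if no such schedule exists.

    Solved as a system of difference constraints with Bellman-Ford;
    node n_ff is a reference node z whose clock delay is 0.
    """
    z = n_ff
    # (a, b, w) encodes the constraint x[b] - x[a] <= w.
    cons: List[Tuple[int, int, float]] = []
    for u, v, delay in arcs:
        cons.append((v, u, period - delay))  # s[u] - s[v] <= period - delay
    for i in range(n_ff):
        cons.append((z, i, max_skew))        # s[i] <= max_skew
        cons.append((i, z, 0.0))             # s[i] >= 0
    dist = [0.0] * (n_ff + 1)
    for _ in range(n_ff + 1):
        changed = False
        for a, b, w in cons:
            if dist[a] + w < dist[b] - 1e-12:
                dist[b] = dist[a] + w
                changed = True
        if not changed:
            # Shortest distances satisfy every constraint; shift so z sits at 0.
            return [dist[i] - dist[z] for i in range(n_ff)]
    return None  # negative cycle: constraints are infeasible


def min_period(n_ff: int, arcs: List[Arc], max_skew: float,
               tol: float = 1e-6) -> float:
    """Smallest clock period reachable with skews limited to [0, max_skew]."""
    lo = 0.0
    hi = max(delay for _, _, delay in arcs)  # zero skew always meets this
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if skew_schedule(n_ff, arcs, mid, max_skew) is None:
            lo = mid
        else:
            hi = mid
    return hi


# Hypothetical 2-stage loop: FF0 -> FF1 takes 3.0 ns, FF1 -> FF0 takes 2.0 ns.
LOOP_ARCS: List[Arc] = [(0, 1, 3.0), (1, 0, 2.0)]

if __name__ == "__main__":
    for limit in (0.0, 0.2, 1.0):
        print(f"max skew {limit:.1f} ns -> period {min_period(2, LOOP_ARCS, limit):.3f} ns")
```

With no skew the 3.0 ns stage sets the period; a 0.2 ns delay budget brings it to about 2.8 ns, and a 1.0 ns budget reaches about 2.5 ns, the average of the two stage delays, beyond which the loop itself is the bound. Hold checks, which the paper identifies as a key constraint, would add further difference constraints of the same form and are omitted here.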

IEEE Hot Chips Symposium | 2015