Andrey Ayupov | Researchain

Publication

Featured research published by Andrey Ayupov.

international symposium on physical design | 2012

The ISPD-2012 discrete cell sizing contest and benchmark suite

Muhammet Mustafa Ozdal; Chirayu S. Amin; Andrey Ayupov; Steven M. Burns; Gustavo R. Wilke; Cheng Zhuo

Circuit optimization is essential to minimize power consumption of designs while satisfying timing constraints. The CAD problem addressed in the ISPD-2012 Contest is simultaneous gate sizing and threshold voltage assignment. In this paper, we give an overview of the contest objectives and the provided benchmark suite. Furthermore, some details are provided on the standard cell library, timing models, and the evaluation metrics of the ISPD-2012 Contest.

international symposium on physical design | 2013

An improved benchmark suite for the ISPD-2013 discrete cell sizing contest

Muhammet Mustafa Ozdal; Chirayu S. Amin; Andrey Ayupov; Steven M. Burns; Gustavo R. Wilke; Cheng Zhuo

Gate sizing and threshold voltage selection is an important step in the VLSI design process to optimize power and performance of a given netlist. In this paper, we provide an overview of the ISPD-2013 Discrete Cell Sizing Contest. Compared to the ISPD-2012 Contest, we propose improvements in terms of the benchmark suite and the timing models utilized. We briefly describe the contest and provide some details about the standard cell library, benchmark suite, timing infrastructure, and the evaluation metrics.

international symposium on computer architecture | 2016

Energy efficient architecture for graph analytics accelerators

Muhammet Mustafa Ozdal; Serif Yesil; Taemin Kim; Andrey Ayupov; John Greth; Steven M. Burns; Ozcan Ozturk

Specialized hardware accelerators can significantly improve the performance and power efficiency of compute systems. In this paper, we focus on hardware accelerators for graph analytics applications and propose a configurable architecture template that is specifically optimized for iterative vertex-centric graph applications with irregular access patterns and asymmetric convergence. The proposed architecture addresses the limitations of the existing multi-core CPU and GPU architectures for these types of applications. The SystemC-based template we provide can be customized easily for different vertex-centric applications by inserting application-level data structures and functions. After that, a cycle-accurate simulator and RTL can be generated to model the target hardware accelerators. In our experiments, we study several graph-parallel applications, and show that the hardware accelerators generated by our template can outperform a 24-core high-end server CPU system by up to 3x in terms of performance. We also estimate the area requirement and power consumption of these hardware accelerators through physical-aware logic synthesis, and show up to 65x better power consumption with significantly smaller area.

international conference on computer aided design | 2015

A Polyhedral-based SystemC Modeling and Generation Framework for Effective Low-power Design Space Exploration

Wei Zuo; Warren Kemmerer; Jong Bin Lim; Louis-Noël Pouchet; Andrey Ayupov; Taemin Kim; Kyungtae Han; Deming Chen

With the prevalence of Systems-on-Chip there is a growing need for automation and acceleration of the design process. A classical approach is to take a C/C++ specification of the application, convert it to a SystemC (or equivalent) description of hardware implementing this application, and perform successive refinement of the description to improve various design metrics. In this work, we present an automated SystemC generation and design space exploration flow alleviating several productivity and design time issues encountered in the current design process. We first automatically convert a subset of C/C++, namely affine program regions, into a full SystemC description through polyhedral model-based techniques while performing powerful data locality and parallelism transformations. We then leverage key properties of affine computations to design a fast and accurate latency and power characterization flow. Using this flow, we build analytical models of power and performance that can quickly prune away a large number of inferior design points and generate Pareto-optimal solution points. Experimental results show that (1) our SystemC models can evaluate system performance and power that is only 0.57% and 5.04% away from gate-level evaluation results, respectively; (2) our latency and power analytical models are 3.24% and 5.31% away from the actual Pareto points generated by SystemC simulation, with 2091x faster design-space exploration time on average. The generated Pareto-optimal points provide effective low-power design solutions given different latency constraints.
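
To make the notion of pruning inferior design points concrete, here is a minimal sketch of Pareto filtering over (latency, power) pairs, both to be minimized. It is not the authors' flow: the function name `pareto_front` and the sample numbers are assumptions chosen only for illustration.

```python
from typing import List, Tuple

# A design point is (latency, power); lower is better for both.
# The names and values here are illustrative assumptions, not from the paper.
Point = Tuple[float, float]


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the points that no other point dominates."""
    front: List[Point] = []
    best_power = float("inf")
    # Sort by latency (then power) and sweep: a point survives only if its
    # power is strictly lower than every point with lower or equal latency.
    for latency, power in sorted(set(points)):
        if power < best_power:
            front.append((latency, power))
            best_power = power
    return front


candidates = [(10, 5.0), (8, 7.0), (12, 4.0), (9, 8.0), (10, 6.0)]
front = pareto_front(candidates)
print(front)
```

The sweep costs one sort plus a linear pass, which is why a cheap analytical model paired with a filter like this can discard most of a design space before any detailed simulation is run.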

international conference on computer aided design | 2015

Architectural Requirements for Energy Efficient Execution of Graph Analytics Applications

Muhammet Mustafa Ozdal; Serif Yesil; Taemin Kim; Andrey Ayupov; Steven M. Burns; Ozcan Ozturk

Intelligent data analysis has become more important in the last decade especially because of the significant increase in the size and availability of data. In this paper, we focus on the common execution models and characteristics of iterative graph analytics applications. We show that the features that improve work efficiency can lead to significant overheads on existing systems. We identify the opportunities for custom hardware implementation, and outline the desired architectural features for energy efficient computation of graph analytics applications.

design automation conference | 2017

Accurate High-level Modeling and Automated Hardware/Software Co-design for Effective SoC Design Space Exploration

Wei Zuo; Louis-Noël Pouchet; Andrey Ayupov; Taemin Kim; Chung-Wei Lin; Shinichi Shiraishi; Deming Chen

A desirable feature of a development tool for SoC design is that, given the important applications in the domain to be targeted by the SoC, a powerful hardware-software partitioning engine is available to determine which function(s) shall be mapped to hardware. However, to provide high-quality partitioning, this engine must be able to consider a rich design space of possible alternate hardware and software implementations for each program region that is a candidate for hardware acceleration. This in turn makes the task of finding the optimal mapping very difficult, given the number of design points to consider and the need for accurate modeling of latency, power and area. In this work we propose a novel framework to enable hardware acceleration of performance-critical parts of an application, by addressing the problem of hardware/software partitioning under power and area constraints to minimize the overall program latency. Our flow is based on the LLVM compiler, and focuses on building a scalable compile-time partitioning algorithm while considering large sets of alternative hardware and software implementations for a particular region. To this end we develop a hybrid approach based on mixing semi-random selection of hardware design points and an Integer Linear Programming formulation of the mapping decision, along with iterative refinements of the solution. Experimental results demonstrate the capability of our approach to consider complex designs and yet output near-optimal partitioning decisions. Our package is named RIP (Randomized ILP-based Partitioning), and is open source to benefit the research community.

east-west design and test symposium | 2008

A novel timing-driven placement algorithm using smooth timing analysis

Andrey Ayupov; Leonid Kraginskiy

This work proposes a timing-driven placement algorithm that uses a new type of timing analysis, which we call smooth timing analysis. It constructs the timing cost function as a smooth function of cell placement. In addition, for net modeling the algorithm uses a companion net routing that provides more accurate wire delay. The placement task is then formulated as a non-linear optimization problem. Experiments show that the proposed method can build timing-driven placement solutions for designs with thousands of critical cells. Experimental results on blocks from recent microprocessor designs show a 65% average improvement in total negative slack compared to a leading industrial flow.
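
The abstract does not say how the timing cost is smoothed. One common way to make arrival-time propagation differentiable is to replace the hard `max` over gate inputs with a log-sum-exp approximation; the sketch below assumes that choice, and the names `smooth_max` and `gamma` are hypothetical.

```python
import math
from typing import Sequence


def smooth_max(values: Sequence[float], gamma: float = 0.1) -> float:
    """Log-sum-exp approximation of max(values).

    Assumed smoothing, not taken from the paper. The result lies in
    [max(values), max(values) + gamma * log(len(values))].
    """
    m = max(values)  # subtract the max first for numerical stability
    return m + gamma * math.log(sum(math.exp((v - m) / gamma) for v in values))


# Arrival time at a gate output: smooth max over the input arrivals,
# plus an illustrative gate delay.
input_arrivals = [1.0, 2.0, 3.0]
gate_delay = 0.5
output_arrival = smooth_max(input_arrivals) + gate_delay
print(output_arrival)
```

A smaller `gamma` tracks the true maximum more closely but makes the gradient sharper, so a non-linear optimizer typically has to trade accuracy against conditioning.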

international conference on computer aided design | 2015

Hardware Accelerator Design for Data Centers

Serif Yesil; Muhammet Mustafa Ozdal; Taemin Kim; Andrey Ayupov; Steven M. Burns; Ozcan Ozturk

As the size of available data is increasing, it is becoming inefficient to scale the computational power of traditional systems. To overcome this problem, customized application-specific accelerators are becoming integral parts of modern system on chip (SOC) architectures. In this paper, we summarize existing hardware accelerators for data centers and discuss the techniques to implement and embed them along with the existing SOCs.

international conference on computer aided design | 2011

A trace compression algorithm targeting power estimation of long benchmarks

Andrey Ayupov; Steven M. Burns

This paper presents an algorithm for compressing long traces generated using RTL or other fast simulation. The compressed traces can be used by power analysis tools to estimate power on the original traces. We show that the length of the compressed trace is independent of the length of the original trace and is a function of the size of the circuit (more precisely, of its active part) for which the trace was generated. Our experiments show compression ratios of up to 578× on several long RTL traces (up to 320,000 clock transitions) used for power analysis on three industrial blocks (4K, 114K and 202K gates). This leads to significant runtime improvement, especially when the traces are reused over multiple power analysis runs. The dynamic power estimated using compressed traces is within 5% of the power analysis on original traces.

great lakes symposium on vlsi | 2008

An analytical approach to placement legalization

Andrey Ayupov; Alexander Marchenko; Vladimir Tiourin

We present a method to achieve nearly legal placement while optimizing the traditional metrics in an analytical placement framework. A legalization penalty term is added to the cost function of the placer. The purpose of this term is to remove overlaps and place cells into rows. The new term kicks in when global spreading cannot resolve overlaps any further. We study how this legalization term helps to achieve better final placements when it is used in combination with wire-length driven analytical placement. Experimental results show that the additional legalization cost term reduces the wire-length degradation after discrete detailed placement from 7.6% to 0.7%. Optimization of wire-length along with the legalization term in placement shows 6% improvement in total wire-length on average, which if translated into timing is 48% of total negative slack. A further feature to control cell density helps reduce congestion by 33%.
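
As a rough illustration of a penalty that pulls cells into rows, the sketch below charges each cell the squared distance from its y-coordinate to the nearest row. The abstract does not give the form of the penalty, so this function, the name `row_penalty`, and the uniform `row_height` are assumptions; the overlap-removal part of the term is not modeled here.

```python
from typing import Sequence


def row_penalty(ys: Sequence[float], row_height: float) -> float:
    """Sum of squared distances from each cell's y to its nearest row.

    Rows are assumed to sit at integer multiples of row_height
    (a hypothetical layout, not taken from the paper).
    """
    total = 0.0
    for y in ys:
        nearest_row_y = round(y / row_height) * row_height
        total += (y - nearest_row_y) ** 2
    return total


# Cells already on rows cost nothing; off-row cells are penalized.
print(row_penalty([0.0, 2.0, 4.0], row_height=2.0))
print(row_penalty([2.5, 1.0], row_height=2.0))
```

In an analytical placer such a term would be added to the wire-length objective with a weight that grows over the iterations, so that it only dominates once spreading has stalled.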
