Alireza S. Kaviani
Xilinx
Publication
Featured research published by Alireza S. Kaviani.
field programmable gate arrays | 2002
Li Shang; Alireza S. Kaviani; Kusuma Bathala
This paper analyzes the dynamic power consumption in the fabric of Field Programmable Gate Arrays (FPGAs) by taking advantage of both simulation and measurement. Our target device is the Xilinx Virtex™-II family, which contains the most recent and largest programmable fabric. We identify important resources in the FPGA architecture and obtain their utilization, using a large set of real designs. Then, using a number of representative case studies, we calculate the switching activity corresponding to each resource. Finally, we combine the effective capacitance of each resource with its utilization and switching activity to estimate its share of power consumption. According to our results, the power dissipation shares of routing, logic, and clocking resources are 60%, 16%, and 14%, respectively. We also conclude that the dynamic power dissipation of a Virtex-II CLB is 5.9 μW per MHz for typical designs, but it may vary significantly depending on the switching activity.
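The last step of the abstract above (combining effective capacitance, utilization, and switching activity into a per-resource power share) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' model: it assumes the common dynamic-power relation P = 0.5 · C · V² · α · f, and the `Resource` class, the function names, the supply voltage, and all numeric values are hypothetical placeholders rather than data from the paper.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """One class of FPGA fabric resource (all values are hypothetical)."""
    name: str
    c_eff_pf: float     # effective capacitance per used instance, in pF
    utilization: float  # average number of used instances
    activity: float     # average transitions per clock cycle


def power_per_mhz(res: Resource, vdd: float) -> float:
    """Dynamic power per MHz in uW, assuming P = 0.5 * C * V^2 * activity * f.

    pF * V^2 gives pJ per cycle, and 1 pJ per cycle at 1 MHz equals 1 uW.
    """
    return 0.5 * res.c_eff_pf * vdd ** 2 * res.activity * res.utilization


def power_shares(resources, vdd=1.5):
    """Return ({name: fraction of total power}, total uW per MHz)."""
    per_resource = {r.name: power_per_mhz(r, vdd) for r in resources}
    total = sum(per_resource.values())
    return {name: p / total for name, p in per_resource.items()}, total


if __name__ == "__main__":
    # Placeholder inputs chosen only to exercise the code.
    fabric = [
        Resource("routing", c_eff_pf=4.0, utilization=10.0, activity=0.2),
        Resource("logic", c_eff_pf=1.0, utilization=8.0, activity=0.2),
        Resource("clocking", c_eff_pf=2.0, utilization=1.0, activity=2.0),
    ]
    shares, total = power_shares(fabric)
    for name, share in shares.items():
        print(f"{name}: {share:.1%}")
    print(f"total: {total:.2f} uW/MHz")
```

Because the frequency factor is common to every resource, the shares are independent of clock rate; only the per-resource capacitance, utilization, and activity determine the breakdown.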
IEEE Design & Test of Computers | 1999
Alireza S. Kaviani; Stephen Dean Brown
The authors propose a new architecture that combines two existing technologies: lookup-table-based FPGAs and complex programmable logic devices based on PLA-like blocks. Their mapping results indicate that on average LUT-based FPGAs require 78% more area than their hybrid FPGA, while providing roughly the same circuit depth.
field programmable gate arrays | 2000
Alireza S. Kaviani; Stephen Dean Brown
In this paper we present new technology mapping algorithms for use in a programmable logic device (PLD) that contains both lookup tables (LUTs) and PLA-like blocks. The technology mapping algorithms partially collapse circuits to reduce either area or depth, and pack the circuits into a minimum number of LUTs and PLA-like blocks. Since no other technology mapping algorithm for this problem has been previously published, we cannot compare our approach to others. Instead, to illustrate the importance of this problem we use our algorithms to investigate the benefits provided by a PLD architecture with both LUTs and PLA-like blocks compared to a traditional LUT-based FPGA. The experimental results indicate that our mixed PLD architecture is more area-efficient than LUT-based FPGAs by up to 29%, or more depth-efficient by up to 75%.
symposium on asynchronous circuits and systems | 2004
Alireza S. Kaviani
This work presents the circuit design for phase alignment in a digital frequency synthesizer (DFS), taking advantage of asynchronous level-mode state machines. An example of a real case asynchronous design is presented that provides superior results to alternative solutions. The designs are implemented in the Xilinx Spartan™-III family, a field programmable device in the 90 nm technology. We explain the specific clock management application and the circuits for our designs, followed by a summary of the final results. Our silicon results indicate functionality improvement, area decrease, and jitter reduction compared to alternatives. In addition, taking advantage of novel asynchronous circuits saves engineering effort during silicon characterization and design of future generations of products.
field programmable gate arrays | 2016
Henri Fraisse; Abhishek Joshi; Dinesh D. Gaitonde; Alireza S. Kaviani
Boolean Satisfiability (SAT)-based routing offers a unique advantage over conventional routing algorithms by providing an exhaustive approach to find a solution. Despite that advantage, commercial FPGA CAD tools rarely use SAT-based routers due to scalability issues. In this paper, we revisit SAT-based routing and propose two SAT formulations independent of routing architecture. We then demonstrate that SAT-based routing using either formulation dramatically outperforms conventional routing algorithms in both runtime and robustness for the clock routing of Xilinx UltraScale devices. Finally, we experimentally show that one of the proposed SAT formulations yields routing that is 18x faster and produces formulas 20x more compact than the other. This framework has been implemented in Vivado and is now used in production.
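To make the idea of routing as a satisfiability problem concrete, here is a toy sketch. It is not either of the two formulations proposed in the paper, which the abstract does not describe; it assumes a simple path-selection encoding in which each net has a pre-enumerated list of candidate paths, and it uses a brute-force solver in place of a real SAT engine. All names (`build_cnf`, `solve`, `route`, the net and wire labels) are hypothetical.

```python
from itertools import combinations, product


def build_cnf(candidates):
    """Encode path selection as CNF.

    candidates maps each net to a list of candidate paths, where a path is
    the set of routing resources it occupies. One Boolean variable is created
    per (net, path index); clauses are lists of signed variable ids.
    """
    var = {}
    for net, paths in candidates.items():
        for i in range(len(paths)):
            var[(net, i)] = len(var) + 1

    clauses = []
    for net, paths in candidates.items():
        ids = [var[(net, i)] for i in range(len(paths))]
        clauses.append(ids)                                        # at least one path
        clauses.extend([-a, -b] for a, b in combinations(ids, 2))  # at most one path
    # Paths of different nets may not share any routing resource.
    for (n1, i1), (n2, i2) in combinations(var, 2):
        if n1 != n2 and candidates[n1][i1] & candidates[n2][i2]:
            clauses.append([-var[(n1, i1)], -var[(n2, i2)]])
    return var, clauses


def solve(num_vars, clauses):
    """Exhaustive search: return a satisfying assignment or None."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in c) for c in clauses):
            return bits
    return None


def route(candidates):
    """Return {net: chosen path index}, or None if provably unroutable."""
    var, clauses = build_cnf(candidates)
    model = solve(len(var), clauses)
    if model is None:
        return None
    return {net: i for (net, i), v in var.items() if model[v - 1]}


if __name__ == "__main__":
    nets = {
        "clk_a": [{"w1", "w2"}, {"w3"}],
        "clk_b": [{"w2"}, {"w3", "w4"}],
    }
    print(route(nets))
```

The exhaustive nature of the search is what lets this style of router report that no legal routing exists, rather than merely failing to find one; a production flow would hand the clauses to a real SAT solver instead of enumerating assignments.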
field-programmable technology | 2015
Elias Vansteenkiste; Alireza S. Kaviani; Henri Fraisse
The pinnacle of success for academic work is often achieved by having impact on commercial products. In order to have a successful transfer bridge, academic evaluation flows need to provide representative results of similar quality to commercial flows. A majority of publications in FPGA research use the same set of known academic CAD tools and benchmarks to evaluate new architecture and tool ideas. However, it is not clear whether the claims in academic publications based on these tools and benchmarks translate to real benefits in commercial products. In this work we compare the latest Xilinx commercial tools and products with these well-known academic tools to identify the gap in the major figures of merit. Our results show that there is a significant 2.2X gap in speed-performance for similar process technology. We have also identified the area-efficiency and runtime divide between commercial and academic tools to be 5% and 2.2X, respectively. We show that it is possible to improve portions of the academic flow, such as ABC logic optimization, to match the quality of commercial tools at the expense of additional runtime. Our results also show that depth reduction, which is often used as the main figure of merit for logic optimization papers, does not translate to post-routing timing improvements. We finally discuss the differences between academic and commercial benchmark designs. We explain the main differences and trends that may influence the topic choice and conclusions of academic research. This work emphasizes how difficult it is to identify the relevant FPGA academic work that can provide meaningful benefits for commercial products.
field programmable logic and applications | 2002
Alireza S. Kaviani
This paper presents and analyzes a methodology for improving the quality of results in Field Programmable Gate Arrays (FPGAs) by taking advantage of the design hierarchy. We use a representative case study, which is a real design, to demonstrate how taking advantage of the hierarchy may lead to higher area-efficiency and better speed-performance. According to our results, an area saving of 18% along with a speedup of 15% is achievable; these area and speed improvements may result in a cost saving of a factor of two for volume production. Our analysis also shows that the above savings will not have a negative impact on routability and power consumption.
international symposium on multiple valued logic | 1994
Alireza S. Kaviani; Zvonko G. Vranesic
Proposes an algorithm for processor partitioning in multiprocessor systems. The algorithm is based on fuzzy logic. It takes crisp inputs that represent the remaining amount of work and the efficiency of an application and produces the output that determines the number of processors that are to be allocated to the application during a given reallocation period. Simulation results are presented that indicate the effectiveness of the proposed scheme.
reconfigurable computing and fpgas | 2015
Pongstorn Maidee; Alireza S. Kaviani
FPGA capacity has grown rapidly and emerging large applications comprise a large number of hard and soft modules. The communication among these modules places high demand on the fabric interconnect, causing routing congestion and performance degradation. This problem will be more pronounced with process scaling since the technology is not improving wire resistance. A general technique to reduce interconnect demand is sharing the wires; Network-on-Chip (NoC) is a systematic method for sharing wires. Several NoC implementations have been proposed for FPGAs in the literature, but most are designed with assumptions carried over from ASIC NoCs. In this work we examine these assumptions and modify them when necessary to customize a soft NoC for FPGAs. We developed a NoC that is tuned for FPGAs and compared it to existing NoCs in the literature. The proposed soft NoC provides 12% to 58% higher throughput per link depending on the settings. This additional throughput comes with a 5% to 19% reduction in area.
symposium on cloud computing | 2008
Alireza S. Kaviani; Tao Pi; Declan Kelly
A programmable digital deskewing and phase shifting architecture is presented, supporting a wide range of frequencies with a fine granularity of shift control. The proposed design is based on a digital delay line that is distinguished from previous work by area and power efficiency. The proposed design takes less than 0.05 mm² in 90 nm and dissipates 54 μW/MHz. Silicon results show three times area saving and four times wider operation frequency compared to alternative implementations.