Balasubramanian Sethuraman
University of Cincinnati
Publication
Featured research published by Balasubramanian Sethuraman.
great lakes symposium on vlsi | 2005
Balasubramanian Sethuraman; Prasun Bhattacharya; Jawad Khan; Ranga Vemuri
Present-day ASIC technology supports Networks-on-Chip (NoC) designs with up to 100 million gates on a single chip. The latest FPGAs can support only about 10 million gates to accommodate all logic and the associated routing. To implement a competitive NoC architecture in FPGAs, the area occupied by the network should be kept to a minimum. This ensures that the maximum area can be utilized by the logic while maintaining the performance of the router network. Reducing area also reduces the power consumption. In this paper, we implement a parallel router that can support five simultaneous routing requests with an area overhead of only 352 Xilinx Virtex-II Pro FPGA slices (2.57% of XC2VP30). We introduce optimizations in the XY routing and decoding logic, thereby gaining in area and performance. The header overhead is 8 bits per packet and the packet size can vary between 16 and 128 bits. We also implement a 3 x 3 mesh network with a total area overhead of 28%, leaving 72% of the area available for the logic in a Virtex-II Pro XC2VP30 device. We characterize the router and several mesh networks for power and performance parameters.
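The XY routing mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a 2D mesh with (x, y) router coordinates and five ports per router (East, West, North, South, Local); the port names and the functions `next_port` and `xy_route` are hypothetical and are not taken from the paper, whose optimized decoding logic is not reproduced here.

```python
# Minimal sketch of dimension-ordered (XY) routing on a 2D mesh.
# Assumption: routers are addressed by (x, y); port names are illustrative.

STEP = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}

def next_port(cur, dst):
    """Pick the output port at router `cur` for a packet headed to `dst`."""
    cx, cy = cur
    dx, dy = dst
    if dx > cx:
        return "E"      # correct the X offset first
    if dx < cx:
        return "W"
    if dy > cy:
        return "N"      # then correct the Y offset
    if dy < cy:
        return "S"
    return "L"          # arrived: deliver to the local port

def xy_route(src, dst):
    """Return the list of routers visited from `src` to `dst`."""
    path = [src]
    cur = src
    while (port := next_port(cur, dst)) != "L":
        sx, sy = STEP[port]
        cur = (cur[0] + sx, cur[1] + sy)
        path.append(cur)
    return path

print(xy_route((0, 0), (2, 1)))
```

Because the X offset is always resolved before the Y offset, every source-destination pair has exactly one path.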
design, automation, and test in europe | 2006
Balasubramanian Sethuraman; Ranga Vemuri
The networks-on-chip (NoC) approach to system design was introduced to overcome the communication and performance bottlenecks of bus-based system design. Area is at a premium in FPGAs. In this research, we propose to reduce the network area overhead by reducing the number of routers, making each router handle multiple logic cores. We implement an improved multi-local port router design with a variable number of local ports. In addition to substantial area savings, we observe significant performance improvement. We discuss the issues involved in the use of multi-local port routers for NoC design in FPGAs. We observe an average of 36% area savings (maximum of 47.5%) on an XC2VP30 FPGA and a significant performance gain (30% on average compared to the single-local port version) with a multi-local port router. Mapping cores onto such a non-traditional NoC architecture is a complex task. We present an algorithm that optimally maps the cores based on a given set of objectives. For a given task graph and set of constraints, the algorithm finds the optimal number of routers, the configuration of each router, the optimal mesh topology and the final mapping. We test the algorithm on a wide variety of benchmarks and report the results.
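To illustrate why attaching several cores to one router can reduce traffic in the mesh, the sketch below evaluates a mapping with a simple bandwidth-weighted hop-count cost. This cost model, the function `comm_cost`, and the example task graph are assumptions made for illustration only; the abstract does not specify the objective function the mapping algorithm actually uses.

```python
# Sketch of a bandwidth-weighted hop-count cost for a core-to-router mapping.
# Assumption: cost = sum over task-graph edges of bandwidth * Manhattan distance
# between the routers hosting the two cores (hypothetical model).

def comm_cost(edges, placement):
    """edges: (src_core, dst_core, bandwidth); placement: core -> (x, y) router."""
    total = 0
    for src, dst, bw in edges:
        (sx, sy), (dx, dy) = placement[src], placement[dst]
        total += bw * (abs(sx - dx) + abs(sy - dy))
    return total

edges = [("a", "b", 100), ("b", "c", 50)]

# One core per router (traditional single-local-port mesh).
single_port = {"a": (0, 0), "b": (1, 0), "c": (1, 1)}

# Cores a and b share one multi-local-port router, so their traffic
# never enters the mesh links.
multi_port = {"a": (0, 0), "b": (0, 0), "c": (1, 0)}

print(comm_cost(edges, single_port), comm_cost(edges, multi_port))
```

Under this model, heavily communicating cores placed on the same router contribute zero network cost, which is the intuition behind using fewer routers with more local ports.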
field-programmable logic and applications | 2006
Balasubramanian Sethuraman; Ranga Vemuri
Modern FPGAs provide increased gate count with decreased power consumption. Several IP cores, along with embedded processors and memory, provide a great opportunity for implementing system-on-chip (SoC) designs on configurable devices. Networks-on-Chip (NoC) is an emerging style of SoC design, introduced to overcome the communication and performance bottlenecks of a shared-bus approach. Multi local port routers (MLPRs) present a novel design alternative to the traditional NoC design. This new methodology offers numerous advantages, including bandwidth optimization and reduced network area and power consumption, resulting eventually in improved performance of the NoC system. Unlike in bus-based systems, communication in NoCs has until now been between pairs of cores, with no scope for multicasting. In this research, we advance a step further in the pursuit of a high-performance FPGA-based NoC system. We exploit the multicasting nature present in various application system task graphs and present a novel and improved MLPR architecture with broadcast capability. We present the modified architecture, the decoding scheme and the stripped-down crosspoint matrix, resulting in reduced logic usage and increased performance. We report the synthesis and the simulation results.
international conference on vlsi design | 2007
Balasubramanian Sethuraman; Ranga Vemuri
Networks-on-chip (NoC) is an emerging style of system design introduced to overcome the communication and performance bottlenecks of a shared-bus design. Departing from the traditional NoC mesh design, the multi local port router (MLPR) has been introduced as a design alternative to improve the bandwidth, reduce the network area (36% average area savings) and, eventually, improve the overall performance of the NoC system. In this research, we present a fast mapping tool (cMap) for generating NoC architectures using MLPRs. The algorithm exploits the advantages offered by MLPRs and starts with a minimum-dimension mesh. After an initial nearest-neighbor placement based on bandwidth and communication cost, it uses a force-directed approach to iteratively expand the mesh as long as the cost is reduced. The algorithm introduces the concept of folding to improve the NoC design. Unlike the earlier exhaustive-search-based optiMap algorithm, cMap can handle a task graph of any size, producing near-optimal results (average cost difference between 3% and 10%) in a couple of seconds. We experiment with a rich set of 22 benchmarks and report the results.
international symposium on low power electronics and design | 2007
Balasubramanian Sethuraman; Ranga Vemuri
Networks-on-Chip (NoC) is an emerging alternative for system integration that is projected to meet the growing communication demands of future Systems-on-Chip. Compared to bus-based systems, traditional NoCs do not have versatile data transfer capabilities such as broadcasting. The Multi2 Router is a Multi Local Port Router (MLPR) architecture that has a multicast feature built into the router elements of an MLPR-based NoC. In this research, we present an NoC configuration generation approach exploiting the multicast feature. Compared to traditional single-port unicast transfers, we observe an average of 50% packet reduction (maximum of 74% using a 9 Local Port (LP) router, in benchmark p3) across a set of benchmarks. On average, when compared to the traditional 1 LP unicast router, there is a 16% reduction in the execution time and a 35% reduction (maximum of 67% in benchmark p4) in total power consumption. The results show the promise of the proposed scheme and thus help to realize power-efficient Networks-on-Chip.
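The packet reduction reported above can be made concrete with a small counting sketch. It assumes a simplified model in which a transfer from one source to k destinations costs k packets under unicast and a single packet under multicast; the function `packet_counts` and the example transfers are hypothetical and do not reproduce the paper's benchmarks or measurement method.

```python
# Sketch: count injected packets under unicast vs. multicast.
# Assumption: one multicast packet replaces one unicast packet per destination.

def packet_counts(transfers):
    """transfers: source core -> list of destination cores for one message."""
    unicast = sum(len(dsts) for dsts in transfers.values())
    multicast = sum(1 for dsts in transfers.values() if dsts)
    return unicast, multicast

transfers = {"a": ["b", "c", "d"], "b": ["e"]}
unicast, multicast = packet_counts(transfers)
reduction = 1 - multicast / unicast
print(unicast, multicast, reduction)
```

In this toy case four unicast packets collapse to two multicast packets; real savings depend on how many destinations share a router and on the traffic pattern of the application.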
field-programmable logic and applications | 2006
Balasubramanian Sethuraman
Emerging platform FPGAs with embedded soft and hard processor cores can be used for system-on-chip (SoC) designs. SoC systems represent a complex interconnection of various functional elements. Existing bus-based interconnect architectures do not present a scalable solution to the existing communication problems. Networks-on-chip (NoC) has been proposed as a new design paradigm to solve the communication bottlenecks of bus-based system design (Benini and Micheli, 2002; Dally and Towles, 2001). The basic idea is to interconnect the various intellectual property (IP) cores using on-chip networks (Hemani et al., 2000; Dally and Towles, 2004; Duato et al., 1998). Exploiting the advantages of NoC in FPGAs for implementing SoC designs is an active area of research. Modern FPGAs can support up to 10 million gates to accommodate all logic and the associated routing. Thus, logic area is at a premium in FPGAs. To implement a competitive NoC architecture in FPGAs, the area occupied by the network logic should be kept to a minimum. This ensures maximum area utilization by the logic while maintaining the performance of the on-chip network. Area reduction results in increased performance and reduced power consumption of the overall system (Sethuraman et al., 2005; Sethuraman et al., 2004). First, an area reduction was achieved by designing a lightweight router for FPGAs. Then, novel router architectures (Multi Local Port Routers and Multi2 Routers) were proposed that provide ample opportunity to optimize the data traffic, thereby improving both power and performance. This is primarily because of the reduction in the number of packets flowing in the main networks-on-chip mesh. This research work also presented efficient NoC configuration generation strategies.
great lakes symposium on vlsi | 2009
Balasubramanian Sethuraman; Ranga Vemuri
To avoid bandwidth violations, the NoC architecture generation phase must consider the bandwidth variation along various links. In this paper, we analyze the impact of the dynamic nature of task graphs on bandwidth requirements and present an algorithm to find the Minimum BandWidth Guarantee along the various links of the NoC architecture.
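As a rough illustration of per-link bandwidth requirements, the sketch below accumulates flow bandwidths along XY routes and takes, for each link, the worst case over several traffic phases. The phase model, the use of XY routing, and the functions `xy_links` and `min_link_bandwidth` are assumptions made for this example; this is not the algorithm from the paper, which the abstract does not describe in detail.

```python
# Sketch: worst-case bandwidth per directed mesh link across traffic phases.
# Assumptions: flows follow X-then-Y routing; flows within one phase are
# active simultaneously; phases do not overlap in time.
from collections import defaultdict

def xy_links(src, dst):
    """Directed (router, router) links used by X-then-Y routing."""
    links = []
    x, y = src
    while x != dst[0]:
        nx = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dst[1]:
        ny = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links

def min_link_bandwidth(phases):
    """phases: list of phases, each a list of (src, dst, bandwidth) flows."""
    need = defaultdict(int)
    for flows in phases:
        load = defaultdict(int)
        for src, dst, bw in flows:
            for link in xy_links(src, dst):
                load[link] += bw          # flows in a phase share the link
        for link, total in load.items():
            need[link] = max(need[link], total)  # keep the worst phase
    return dict(need)

phases = [
    [((0, 0), (2, 0), 10), ((1, 0), (2, 0), 5)],
    [((0, 0), (1, 0), 20)],
]
print(min_link_bandwidth(phases))
```

Each link must be provisioned for its heaviest phase, so a link that is lightly used on average can still set the bandwidth requirement if one phase concentrates traffic on it.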
symposium on cloud computing | 2004
Balasubramanian Sethuraman; Jawad Khan; Ranga Vemuri
Archive | 2007
Balasubramanian Sethuraman
ERSA | 2004
Jawad Khan; Balasubramanian Sethuraman; Ranga Vemuri