Publication


Featured research published by Ajay Joshi.


High Performance Interconnects | 2008

Building Manycore Processor-to-DRAM Networks with Monolithic Silicon Photonics

Christopher Batten; Ajay Joshi; Jason S. Orcutt; Anatol Khilo; Benjamin Moss; Charles W. Holzwarth; Miloš A. Popović; Hanqing Li; Henry I. Smith; Judy L. Hoyt; Franz X. Kärtner; Rajeev J. Ram; Vladimir Stojanovic; Krste Asanovic

We present a new monolithic silicon photonics technology suited for integration with standard bulk CMOS processes, which reduces costs and improves opto-electrical coupling compared to previous approaches. Our technology supports dense wavelength-division multiplexing with dozens of wavelengths per waveguide. Simulation and experimental results reveal an order of magnitude better energy efficiency than electrical links in the same technology generation. Exploiting key features of our photonics technology, we have developed a processor-memory network architecture for future manycore systems based on an opto-electrical global crossbar. We illustrate the advantages of the proposed network architecture using analytical models and simulations with synthetic traffic patterns. For a power-constrained system with 256 cores connected to 16 DRAM modules using an opto-electrical crossbar, aggregate network throughput can be improved by approximately 8-10x compared to an optimized purely electrical network.


International Symposium on Microarchitecture | 2009

Building Many-Core Processor-to-DRAM Networks with Monolithic CMOS Silicon Photonics

Christopher Batten; Ajay Joshi; Jason S. Orcutt; Anatol Khilo; Benjamin Moss; Charles W. Holzwarth; Miloš A. Popović; Hanqing Li; Henry I. Smith; Judy L. Hoyt; Franz X. Kärtner; Rajeev J. Ram; Vladimir Stojanovic; Krste Asanovic

Silicon photonics is a promising technology for addressing memory bandwidth limitations in future many-core processors. This article first introduces a new monolithic silicon-photonic technology, which uses a standard bulk CMOS process to reduce costs and improve energy efficiency, and then explores the logical and physical implications of leveraging this technology in processor-to-memory networks.


International Symposium on Computer Architecture | 2010

Re-architecting DRAM memory systems with monolithically integrated silicon photonics

Scott Beamer; Chen Sun; Yong-Jin Kwon; Ajay Joshi; Christopher Batten; Vladimir Stojanovic; Krste Asanovic

The performance of future manycore processors will only scale with the number of integrated cores if there is a corresponding increase in memory bandwidth. Projected scaling of electrical DRAM architectures appears unlikely to suffice, being constrained by processor and DRAM pin-bandwidth density and by total DRAM chip power, including off-chip signaling, cross-chip interconnect, and bank access energy. In this work, we redesign the DRAM main memory system using a proposed monolithically integrated silicon photonics technology and show that our photonically interconnected DRAM (PIDRAM) provides a promising solution to all of these issues. Photonics can provide high aggregate pin-bandwidth density through dense wavelength-division multiplexing. Photonic signaling provides energy-efficient communication, which we exploit to not only reduce chip-to-chip interconnect power but to also reduce cross-chip interconnect power by extending the photonic links deep into the actual PIDRAM chips. To complement these large improvements in interconnect bandwidth and power, we decrease the number of bits activated per bank to improve the energy efficiency of the PIDRAM banks themselves. Our most promising design point yields approximately a 10x power reduction for a single-chip PIDRAM channel with similar throughput and area as a projected future electrical-only DRAM. Finally, we propose optical power guiding as a new technique that allows a single PIDRAM chip design to be used efficiently in several multi-chip configurations that provide either increased aggregate capacity or bandwidth.


IEEE Journal on Emerging and Selected Topics in Circuits and Systems | 2012

Designing Chip-Level Nanophotonic Interconnection Networks

Christopher Batten; Ajay Joshi; Vladimir Stojanovic; Krste Asanovic

Technology scaling will soon enable high-performance processors with hundreds of cores integrated onto a single die, but the success of such systems could be limited by the corresponding chip-level interconnection networks. There have been many recent proposals for nanophotonic interconnection networks that attempt to provide improved performance and energy efficiency compared to electrical networks. This paper discusses the approach we have used when designing such networks, and provides a foundation for designing new networks. We begin by briefly reviewing the basic silicon-photonic device technology before outlining design issues and surveying previous nanophotonic network proposals at the architectural level, the microarchitectural level, and the physical level. In designing our own networks, we use an iterative process that moves between these three levels of design to meet application requirements given our technology constraints. We use our ongoing work on leveraging nanophotonics in an on-chip tile-to-tile network, processor-to-main-memory network, and dynamic random-access memory (DRAM) channel to illustrate this design process.


IEEE Transactions on Very Large Scale Integration Systems | 2014

Design and Optimization of Nonvolatile Multibit 1T1R Resistive RAM

Mahmoud Zangeneh; Ajay Joshi

Memristor-based random access memory (RAM) is being explored as a potential replacement for flash memory to sustain the historic trends in the improvement of density, access time, and energy consumption of nonvolatile memory. In this paper, we present the detailed functionality of multibit one-transistor one-memristor (1T1R) cell-based memory arrays, and propose circuit-level performance and energy models for an individual memory cell and the memory array as a whole. We consider titanium dioxide (TiO2)- and hafnium oxide (HfOx)-based memristors, and for these technologies, there is a sub-10% difference between energy and performance computed using our models and HSPICE simulations. Using a performance-driven design approach, the energy-optimized TiO2-based resistive RAM (RRAM) array consumes the least write (4.06 pJ/b) and read energy (188 fJ/b) when storing 3 b/cell for 100-ns write and 1-ns read access times. Similarly, the HfOx-based RRAM array consumes the least write (365 fJ/b) and read energy (173 fJ/b) when storing 3 b/cell for 1-ns write and 200-ns read access times. We also present a detailed analysis of the implications of process, voltage, and temperature variations on the performance and energy consumption of a multibit RRAM cell.
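As a rough illustration of the figures quoted in this abstract, the short Python sketch below converts the reported per-bit energies into per-cell energies at 3 b/cell. It assumes that per-cell energy is simply the per-bit energy multiplied by the number of bits stored, which the abstract does not state explicitly; the variable names are illustrative only.

```python
# Per-bit energies quoted in the abstract, converted to picojoules.
BITS_PER_CELL = 3

# A cell storing n bits must distinguish 2**n resistance levels.
levels_per_cell = 2 ** BITS_PER_CELL

energy_pj_per_bit = {
    ("TiO2", "write"): 4.06,   # 4.06 pJ/b
    ("TiO2", "read"): 0.188,   # 188 fJ/b
    ("HfOx", "write"): 0.365,  # 365 fJ/b
    ("HfOx", "read"): 0.173,   # 173 fJ/b
}

# Assumption: energy per cell access = energy per bit * bits per cell.
energy_pj_per_cell = {
    key: round(value * BITS_PER_CELL, 3)
    for key, value in energy_pj_per_bit.items()
}

for (material, operation), energy in sorted(energy_pj_per_cell.items()):
    print(f"{material} {operation}: {energy} pJ per {BITS_PER_CELL}-bit cell")
```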


IEEE Journal of Selected Topics in Quantum Electronics | 2013

Runtime Management of Laser Power in Silicon-Photonic Multibus NoC Architecture

Chao Chen; Ajay Joshi

Silicon-photonic links have been proposed to replace electrical links for global on-chip communication in future many-core processors. Silicon-photonic links have the advantage of lower data-dependent power and higher bandwidth density, but the high laser power can more than offset these advantages. We propose a solution to manage the laser power of a silicon-photonic network-on-chip (NoC) in a many-core system. We present a silicon-photonic multibus NoC architecture between private L1 caches and distributed L2 cache banks, which uses weighted time-division multiplexing to distribute the laser power across multiple buses based on the runtime variations in the bandwidth requirements within and across applications to maximize energy efficiency. The multibus NoC architecture also harnesses the opportunities to switch OFF laser sources at runtime, during periods of low bandwidth demand, to reduce laser power consumption. Using detailed system-level simulations, we evaluate the multibus NoC architecture and the runtime laser power management technique on a 64-core system running the NAS Parallel Benchmark suite. The silicon-photonic multibus NoC architecture provides more than two times better performance than silicon-photonic Clos and butterfly NoC architectures, while consuming the same laser power. Using the runtime laser power management technique, the average laser power is reduced by more than 49% with minimal impact on system performance.
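The idea of weighted time-division multiplexing can be sketched as follows. This is only a minimal illustration, not the policy from the paper: it assumes a fixed number of time slots per frame and divides them among buses in proportion to a measured bandwidth demand, using largest-remainder rounding. The function name `allocate_slots` and the demand values are hypothetical.

```python
from typing import List


def allocate_slots(demands: List[float], slots_per_frame: int) -> List[int]:
    """Split a frame's time slots across buses in proportion to demand.

    Assumes non-negative demands. A bus with zero demand receives no slots,
    which is where a laser source could in principle be switched off.
    """
    total = sum(demands)
    if total <= 0:
        return [0] * len(demands)

    # Ideal (fractional) share of the frame for each bus.
    exact = [d / total * slots_per_frame for d in demands]
    slots = [int(x) for x in exact]

    # Hand out slots lost to truncation, largest fractional part first.
    leftover = slots_per_frame - sum(slots)
    order = sorted(
        range(len(demands)),
        key=lambda i: exact[i] - slots[i],
        reverse=True,
    )
    for i in order[:leftover]:
        slots[i] += 1
    return slots


# Hypothetical demands for four buses; the last bus is idle.
example = allocate_slots([4.0, 2.0, 2.0, 0.0], slots_per_frame=8)
print(example)
```

Recomputing the allocation at each interval from fresh demand measurements would let the shares follow runtime variation; how the demand is measured and how often the weights are updated are left open here.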


Design, Automation, and Test in Europe | 2014

Thermal management of manycore systems with silicon-photonic networks

Tiansheng Zhang; José L. Abellán; Ajay Joshi; Ayse Kivilcim Coskun

Silicon-photonic networks-on-chip (NoCs) provide high bandwidth density; therefore, they are promising candidates to replace electrical NoCs in manycore systems. Silicon-photonic NoCs, however, are sensitive to the temperature gradients that typically occur on the chip, and hence require proactive thermal management. This paper first provides a design space exploration of silicon-photonic networks in manycore systems and quantifies the performance impact of the temperature gradients for various network bandwidths. The paper then introduces a novel job allocation technique that minimizes the temperature gradients among the ring modulators/filters to improve the application performance. Experimental results for a single-chip 256-core system demonstrate that our policy is able to maintain the maximum network bandwidth. Compared to existing workload allocation policies, the proposed policy improves system performance by up to 26.1% when running a single application and 18.3% for multi-program scenarios.


Networks on Chips | 2014

Sharing and placement of on-chip laser sources in silicon-photonic NoCs

Chao Chen; Tiansheng Zhang; Pietro Contu; Jonathan Klamkin; Ayse Kivilcim Coskun; Ajay Joshi

Silicon-photonic links are projected to replace the electrical links for global on-chip communications in future manycore systems. The use of off-chip laser sources to drive these silicon-photonic links can lead to higher link losses, thermal mismatch between laser source and on-chip photonic devices, and packaging challenges. Therefore, on-chip laser sources are being evaluated as candidates to drive the on-chip photonic links. In this paper, we first explore the power, efficiency and temperature tradeoffs associated with an on-chip laser source. Using a 3D stacked system that integrates a manycore chip with the optical devices and laser sources, we explore the design space for laser source sharing (among waveguides) and placement to minimize laser power by simultaneously considering the network bandwidth requirements, thermal constraints, and physical layout constraints. As part of this exploration we consider Clos and crossbar logical topologies, U-shaped and W-shaped physical layouts, and various sharing/placement strategies: locally-placed dedicated laser sources for waveguides, locally-placed shared laser sources, and shared laser sources placed remotely along the chip edges. Our analysis shows that logical topology, physical layout, and photonic device losses strongly drive the laser source sharing and placement choices to minimize laser power.


Great Lakes Symposium on VLSI | 2011

Run-time energy management of manycore systems through reconfigurable interconnects

Jie Meng; Chao Chen; Ayse Kivilcim Coskun; Ajay Joshi

The active on-chip network channel width has a direct impact on the cache and memory access latency in manycore processors. A good choice of channel width improves the application performance and energy efficiency. In manycore systems, where workload patterns change significantly over time, setting the network channel width statically for the average or worst-case traffic gives sub-optimal energy efficiency. This paper proposes a novel, low-cost method to reconfigure the network channel width at run time to maximize energy efficiency of applications. We analyze the effect of channel width choices for two commonly used cache hierarchies, private and distributed L2 caches, on manycore systems with a bus or crossbar architecture running parallel workloads. The proposed reconfiguration policy predicts the energy-delay product (EDP) for the currently running application at various channel widths and chooses the best fitting width to minimize EDP. The experimental results show that in systems with private and distributed L2 caches our policy reduces EDP by 49.3% and 23.9%, and 65.5% and 20.6% on average with bus and crossbar, respectively, in comparison to statically setting the channel width.
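The selection step described in this abstract (predict the energy-delay product at each candidate channel width, then pick the minimum) can be sketched in a few lines of Python. The predictor below is a made-up toy model used only to make the example runnable; the paper's actual EDP prediction method is not given in the abstract, and the names `pick_channel_width` and `toy_predictor` are hypothetical.

```python
from typing import Callable, Iterable, Tuple


def pick_channel_width(
    widths: Iterable[int],
    predict_energy_delay: Callable[[int], Tuple[float, float]],
) -> int:
    """Return the width whose predicted energy-delay product is smallest."""
    best_width, best_edp = None, float("inf")
    for width in widths:
        energy, delay = predict_energy_delay(width)
        edp = energy * delay  # EDP = energy * delay
        if edp < best_edp:
            best_width, best_edp = width, edp
    if best_width is None:
        raise ValueError("no candidate widths supplied")
    return best_width


def toy_predictor(width: int) -> Tuple[float, float]:
    # Hypothetical model: wider channels cost more energy but reduce delay.
    energy = 1.0 + 0.02 * width
    delay = 10.0 + 640.0 / width
    return energy, delay


chosen = pick_channel_width([16, 32, 64, 128], toy_predictor)
print(chosen)
```

With this toy model neither the narrowest nor the widest channel wins, which mirrors the abstract's point that a static worst-case setting can be sub-optimal.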


High Performance Interconnects | 2009

Designing Energy-Efficient Low-Diameter On-Chip Networks with Equalized Interconnects

Ajay Joshi; Byungsub Kim; Vladimir Stojanovic

In a power and area constrained multicore system, the on-chip communication network needs to be carefully designed to maximize the system performance and programmer productivity while minimizing energy and area. In this paper, we explore the design of energy-efficient low-diameter networks (flattened butterfly and Clos) using equalized on-chip interconnects. These low-diameter networks are attractive as they can potentially provide uniformly high throughput and low latency across various traffic patterns, but require efficient global communication channels. In our case study, for a 64-tile system, the use of equalization for the wire channels in low-diameter networks provides 2x reduction in power with no loss in system performance compared to repeater-inserted wire channels. The use of virtual channels in routers further reduces the power of the network by 25-50% and wire area by 2x.

Collaboration


Dive into Ajay Joshi's collaborations.

Top Co-Authors

Vladimir Stojanovic

Massachusetts Institute of Technology

Krste Asanovic

University of California

Scott Beamer

University of California

Jeffrey A. Davis

Georgia Institute of Technology

Yong-Jin Kwon

University of California