Publication


Featured research published by Venkatesh Akella.


Architectures for Networking and Communications Systems | 2010

DOS: a scalable optical switch for datacenters

Xiaohui Ye; Paul Mejia; Yawei Yin; Roberto Proietti; S. J. B. Yoo; Venkatesh Akella

This paper discusses the architecture and performance studies of Datacenter Optical Switch (DOS) designed for scalable and high-throughput interconnections within a data center. DOS exploits wavelength routing characteristics of a switch fabric based on an Arrayed Waveguide Grating Router (AWGR) that allows contention resolution in the wavelength domain. Simulation results indicate that DOS exhibits lower latency and higher throughput even at high input loads compared with electronic switches or previously proposed optical switch architectures such as OSMOSIS [4, 5] and Data Vortex [6, 7]. Such characteristics, together with a very high port count on a single switch fabric, make DOS attractive for data center applications where the traffic patterns are known to be bursty with high temporary peaks [13]. DOS exploits the unique characteristics of the AWGR fabric to reduce the delay and complexity of arbitration. We present a detailed analysis of DOS using a cycle-accurate network simulator. The results show that the latency of DOS is almost independent of the number of input ports and does not saturate even at very high (approximately 90%) input load. Furthermore, we show that even with 2 to 4 wavelengths, the performance of DOS is significantly better than an electrical switch network based on state-of-the-art flattened butterfly topology.


IEEE Journal on Selected Areas in Communications | 2003

High-performance optical-label switching packet routers and smart edge routers for the next-generation Internet

S. J. B. Yoo; Fei Xue; Y. Bansal; J. Taylor; Zhong Pan; Jing Cao; Minyong Jeon; T. Nady; G. Goncher; K. Boyer; K. Okamoto; Shin Kamei; Venkatesh Akella

This paper discusses the architecture, protocol, analysis, and experimentation of optical packet switching routers incorporating optical-label switching (OLS) technologies and electronic edge routers with traffic shaping capabilities. The core optical router incorporates all-optical switching with contention resolution in wavelength, time, and space domains. It is also capable of accommodating traffic of any protocol and format, and supports packet, flow, burst, and circuit traffic. The edge router is designed to achieve traffic shaping with consideration for quality of service and priority-based class of service. Simulation results show packet loss rates below 0.3% at load 0.7 and jitter values below 18 μs. The traffic shaping reduces the packet loss rate by a factor of ∼5 while adding negligible additional latency. The OLS core routers and the electronic edge routers are constructed with field-programmable gate arrays that incorporate the wavelength-aware forwarding and contention resolution algorithms. The experiment shows optical-label-based packet switching with a packet loss rate near 0.2%.


IEEE Journal of Selected Topics in Quantum Electronics | 2013

LIONS: An AWGR-Based Low-Latency Optical Switch for High-Performance Computing and Data Centers

Yawei Yin; Roberto Proietti; Xiaohui Ye; Christopher Nitta; Venkatesh Akella; S. J. B. Yoo

This paper discusses the architecture of an arrayed waveguide grating router (AWGR)-based low-latency interconnect optical network switch called LIONS, and its different loopback buffering schemes. A proof of concept is demonstrated with a 4 × 4 experimental testbed. A simulator was developed to model the LIONS architecture and was validated by comparing experimentally obtained statistics such as average end-to-end latency with the results produced by the simulator. Considering the complexity and cost in implementing loopback buffers in LIONS, we propose an all-optical negative acknowledgement (AO-NACK) architecture in order to remove the need for loopback buffers. Simulation results for LIONS with AO-NACK architecture and distributed loopback buffer architecture are compared with the performance of the flattened butterfly electrical switching network.


Journal of Lightwave Technology | 2003

End-to-end contention resolution schemes for an optical packet switching network with enhanced edge routers

Fei Xue; Zhong Pan; Y. Bansal; Jing Cao; Minyong Jeon; K. Okamoto; Shin Kamei; Venkatesh Akella; S. J. B. Yoo

This paper investigates contention resolution schemes for optical packet switching networks from an end-to-end perspective, where the combined exploitation of both core routers and edge routers is highlighted. For the optical-core network, we present the architecture of an optical router to achieve contention resolution in wavelength, time, and space domains. Complementing the solution involving only the core router intelligence, we propose performance enhancement schemes at the network edge, including a traffic-shaping function at the ingress edge and a proper dimensioning of the drop port number at the egress edge. Both schemes prove effective in reducing networkwide packet-loss rates. In particular, scalability performance simulations demonstrate that a considerably low packet-loss rate (0.0001% at load 0.6) is achieved in a 16-wavelength network by incorporating the performance enhancement schemes at the edge with the contention resolution schemes in the core. Further, we develop a field-programmable gate-array (FPGA)-based switch controller and integrate it with enabling optical devices to demonstrate the packet-by-packet contention resolution. Proof-of-principle experiments involving the prototype core router achieve an error-free low-latency contention resolution.


International Conference on Computer Aided Design | 1992

SHILPA: a high-level synthesis system for self-timed circuits

Venkatesh Akella; Ganesh Gopalakrishnan

SHILPA is a system for the high-level synthesis of self-timed circuits. It takes behavioral descriptions in a process+functional language called hopCP and produces a netlist for the Actel field-programmable gate array (FPGA), supported by the VIEWlogic tools. hopCP descriptions are initially translated into an intermediate form based on hypergraphs called HFGs. SHILPA then applies action refinement, which is a technique for transforming HFGs into asynchronous hardware by a series of graph-based transformation rules. Action refinement is characterized by incremental resource allocation and control decomposition. The major contributions of the proposed work are given.


International Symposium on Computer Architecture | 2004

Synchroscalar: A Multiple Clock Domain, Power-Aware, Tile-Based Embedded Processor

John Y. Oliver; Ravishankar Rao; Paul Sultana; Jedidiah R. Crandall; Erik Czernikowski; Leslie W. Jones; Diana Franklin; Venkatesh Akella; Frederic T. Chong

We present Synchroscalar, a tile-based architecture for embedded processing that is designed to provide the flexibility of DSPs while approaching the power efficiency of ASICs. We achieve this goal by providing high parallelism and voltage scaling while minimizing control and communication costs. Specifically, Synchroscalar uses columns of processor tiles organized into statically-assigned frequency-voltage domains to minimize power consumption. Furthermore, while columns use SIMD control to minimize overhead, data-dependent computations can be supported by extremely flexible statically-scheduled communication between columns. We provide a detailed evaluation of Synchroscalar including SPICE simulation, wire and device models, synthesis of key components, cycle-level simulation, and compiler- and hand-optimized signal processing applications. We find that the goal of meeting, not exceeding, performance targets with data-parallel applications leads to designs that depart significantly from our intuitions derived from general-purpose microprocessor design. In particular, synchronous design and substantial global interconnect are desirable in the low-frequency, low-power domain. This global interconnect supports parallelization and reduces processor idle time, which are critical to energy efficient implementations of high bandwidth signal processing. Overall, Synchroscalar provides programmability while achieving power efficiencies within 8-30× of known ASIC implementations, which is 10-60× better than conventional DSPs. In addition, frequency-voltage scaling in Synchroscalar provides between 3-32% power savings in our application suite.


Journal of Lightwave Technology | 2003

Demonstration of all-optical packet switching routers with optical label swapping and 2R regeneration for scalable optical label switching network applications

Min Yong Jeon; Zhong Pan; Jing Cao; Y. Bansal; J. Taylor; Zubin Wang; Venkatesh Akella; K. Okamoto; Shin Kamei; J. Pan; S. J. B. Yoo

This paper investigates comprehensive operation and experimentation of an all-optical packet switching router with optical label swapping and reamplification and reshaping (2R) regeneration, capable of multihop operation and Internet protocol (IP)-client interoperability. In particular, the experiment demonstrates successful packet switching and transport up to 11 hops with 10⁻⁹ bit-error rate and error-free up to four hops. Furthermore, this paper demonstrates the optical label switching (OLS) core router and edge routers working together to support IP-client-to-IP-client packet transport and switching across the optical label switching network. The edge router generates an optical label based on the IP header content of the packet and generates an optical label encoded packet, which subsequently ingresses into the OLS network. The optical label switching router (OLSR) forwards the packet with all-optical label swapping at each hop with 2R regeneration. The 2R regeneration leads to an experimentally measured negative penalty and a successful experimental demonstration of multihop cascaded OLSR operation with the edge routers interfacing with IP clients. The successful IP-client-to-IP-client packet forwarding via the edge routers and the cascaded multihop OLSR with all-optical label swapping indicate the viability of OLS in the scalable and transparent IP-over-optical Internet.


Journal of Lightwave Technology | 2004

Design and experimental demonstration of a variable-length optical packet routing system with unified contention resolution

Fei Xue; Zhong Pan; Haijun Yang; Jinqiang Yang; Jing Cao; K. Okamoto; Shin Kamei; Venkatesh Akella; S. J. B. Yoo

This paper presents theoretical design, network simulation, implementation, and experimental studies of optical packet routing systems supporting variable-length packets. The optical packet switching network exploits unified contention resolution in core routers in three optical domains (wavelength, time, and space) and in edge routers by traffic shaping. The optical router controller and lookup table, implemented in a field-programmable gate array (FPGA), effectively incorporate the contention resolution scheme with pipelined arbitration of asynchronously arriving variable-length packets. In addition, real-time performance monitoring based on the strong correlation between the bit-error rates of the optical label and those of the data payload indicates its application in optical time-to-live detection for loop mitigation. Successful systems integration resulted in experimental demonstration of the all-optical packet switching system with contention resolution for variable-size packets.


IEEE Transactions on Circuits and Systems | 2011

Memory System Optimization for FPGA-Based Implementation of Quasi-Cyclic LDPC Codes Decoders

Xiaoheng Chen; Jingyu Kang; Shu Lin; Venkatesh Akella

Designers are increasingly relying on field-programmable gate array (FPGA)-based emulation to evaluate the performance of low-density parity-check (LDPC) codes empirically down to bit-error rates of 10⁻¹² and below. This requires decoding architectures that can take advantage of the unique characteristics of a modern FPGA to maximize the decoding throughput. This paper presents two specific optimizations called vectorization and folding to take advantage of the configurable data-width and depth of embedded memory in an FPGA to improve the throughput of a decoder for quasi-cyclic LDPC codes. With folding it is shown that quasi-cyclic LDPC codes with a very large number of circulants can be implemented on FPGAs with a small number of embedded memory blocks. A synthesis tool called QCSyn is described, which takes the H matrix of a quasi-cyclic LDPC code and the resource characteristics of an FPGA and automatically synthesizes a vector or folded architecture that maximizes the decoding throughput for the code on the given FPGA by selecting the appropriate degree of folding and/or vectorization. This helps not only in reducing the design time to create a decoder but also in quickly retargeting the implementation to a different (perhaps new) FPGA or a different emulation board.


IEEE Computer | 1997

Asynchronous processor survey

Tony Werner; Venkatesh Akella

Virtually all computers today are synchronous. As systems grow increasingly large and complex, the clock can cause big problems with clock skew, a timing delay that can create havoc with the overall design. It can also increase the circuit silicon and power dissipation, which can affect overheating and power supplies. Computer architecture researchers are actively considering asynchronous processor design. Asynchronous architectures permit modular design. Each subsystem or functional block can be optimized without being synchronized to a global clock, which simplifies interfacing. Moreover, an asynchronous system exhibits the average performance of all the individual components, rather than the synchronous system's worst-case performance of a single component. Furthermore, asynchronous processors may yet prove to offer reduced power dissipation by inherently shutting down unused portions of the circuit. This article examines the key architecture issues that concern designers and compares six developmental asynchronous architectures: CAP, the Caltech Asynchronous Processor; FAM, the Fully Asynchronous Microprocessor; NSR, the Nonsynchronous RISC; CFPP, the Counterflow Pipeline Processor; Strip, a Self-Timed RISC Processor; and Amulet 1.

Collaboration


Dive into Venkatesh Akella's collaborations.

Top Co-Authors

S. J. B. Yoo

University of California

Yawei Yin

University of California

Zhong Pan

University of California

Xiaohui Ye

University of California

John Y. Oliver

University of California
