Ahmed Louri
George Washington University
Publication
Featured research published by Ahmed Louri.
Journal of Lightwave Technology | 1994
Ahmed Louri; Hongki Sung
A new interconnection network for massively parallel computing is introduced. This network is called an optical multi-mesh hypercube (OMMH) network. The OMMH integrates positive features of both hypercube (small diameter, high connectivity, symmetry, simple control and routing, fault tolerance, etc.) and mesh (constant node degree and scalability) topologies and at the same time circumvents their limitations (e.g., the lack of scalability of hypercubes, and the large diameter of meshes). The OMMH can maintain a constant node degree regardless of the increase in the network size. In addition, the flexibility of the OMMH network makes it well suited for optical implementations. This paper presents the OMMH topology, analyzes its architectural properties and potentials for massively parallel computing, and compares it to the hypercube. Moreover, it also presents a three-dimensional optical design methodology based on free-space optics. The proposed optical implementation has totally space-invariant connection patterns at every node, which enables the OMMH to be highly amenable to optical implementation using simple and efficient large space-bandwidth product space-invariant optical elements.
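The trade-off named in the abstract above (small hypercube diameter versus constant mesh degree) can be made concrete with a few lines of arithmetic. This is a minimal sketch using the standard textbook formulas for a binary hypercube and a square 2-D mesh without wraparound; the function names are illustrative and do not come from the paper.

```python
def hypercube_properties(num_nodes):
    """Degree and diameter of a binary hypercube with num_nodes nodes."""
    dim = num_nodes.bit_length() - 1
    if num_nodes < 1 or (1 << dim) != num_nodes:
        raise ValueError("a binary hypercube needs a power-of-two node count")
    # Every node has one link per address bit; the farthest node differs in all bits.
    return {"degree": dim, "diameter": dim}


def mesh_properties(side):
    """Maximum degree and diameter of a side x side 2-D mesh (no wraparound)."""
    if side < 3:
        raise ValueError("this sketch assumes side >= 3 so interior nodes exist")
    # Interior nodes have four neighbors; the longest path runs corner to corner.
    return {"degree": 4, "diameter": 2 * (side - 1)}


if __name__ == "__main__":
    # Both networks below have 1024 nodes.
    print("hypercube:", hypercube_properties(1024))
    print("mesh 32x32:", mesh_properties(32))
```

For 1024 nodes the hypercube needs 10 links per node to reach a diameter of 10, while the mesh keeps 4 links per node at the cost of a diameter of 62, which is the tension the OMMH is designed to ease.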
Optical Engineering | 1989
Kai Hwang; Ahmed Louri
The modified-signed-digit (MSD) number system offers parallel addition and subtraction of any two numbers, with carry propagation constrained only between two adjacent digits. Based on MSD addition, parallel algorithms for multiplication and division are developed in this paper. The optical implementations of these MSD arithmetic algorithms are developed on the basis of symbolic substitution (SS). The space-invariant nature of SS matches well with the parallel nature of the MSD arithmetic algorithms presented. The potential advantages of using these algorithms for optical computing include the significant increase in speed, full exploitation of parallelism, and higher system throughput compared with existing electronic arithmetic processors. The performance of the proposed optical arithmetic system is analyzed and compared with that of state-of-the-art electronic counterparts.
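The carry-limited addition that the abstract above relies on can be sketched in software. The block below uses one well-known formulation of radix-2 signed-digit addition with digits in {-1, 0, 1}: each position produces a transfer digit and an interim sum, chosen by looking at the next lower position so that the final digit never overflows. This is an assumption made for illustration; it is not the symbolic substitution rule set from the paper, and the function names are hypothetical.

```python
def msd_to_int(digits):
    """Value of a signed-digit number given least-significant digit first."""
    return sum(d * (1 << i) for i, d in enumerate(digits))


def msd_add(x, y):
    """Add two signed-digit numbers (digits in {-1, 0, 1}, LSB first).

    Each result digit depends only on a fixed window of nearby input digits,
    so every position could be computed in parallel.
    """
    n = max(len(x), len(y))
    x = list(x) + [0] * (n - len(x))
    y = list(y) + [0] * (n - len(y))
    transfer = [0] * (n + 1)  # transfer[i] is the digit sent into position i
    interim = [0] * (n + 1)   # interim[i] is the local sum kept at position i
    for i in range(n):
        p = x[i] + y[i]
        # If both lower digits are non-negative, the incoming transfer is >= 0;
        # otherwise it is <= 0. This decides how to split p = 2*t + w.
        lower_nonneg = i == 0 or (x[i - 1] >= 0 and y[i - 1] >= 0)
        if p == 2:
            t, w = 1, 0
        elif p == 1:
            t, w = (1, -1) if lower_nonneg else (0, 1)
        elif p == 0:
            t, w = 0, 0
        elif p == -1:
            t, w = (0, -1) if lower_nonneg else (-1, 1)
        else:  # p == -2
            t, w = -1, 0
        transfer[i + 1] = t
        interim[i] = w
    # Final digits: interim sum plus incoming transfer, guaranteed in {-1, 0, 1}.
    return [interim[i] + transfer[i] for i in range(n + 1)]


if __name__ == "__main__":
    seven = [1, 1, 1]  # 1 + 2 + 4
    one = [1, 0, 0]
    total = msd_add(seven, one)
    print(total, msd_to_int(total))
```

Adding 7 and 1 this way yields the digits `[0, 0, 0, 1]` (value 8) without a carry rippling through all positions, which is the property that makes the number system attractive for space-invariant parallel hardware.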
Applied Optics | 2000
Jacques Henri Collet; Daniel Litaize; Jan Van Campenhout; Chris R. Jesshope; Marc Phillipe Yves Desmulliez; Hugo Thienpont; James R. Goodman; Ahmed Louri
The relevance of introducing optical interconnects (OIs) in monoprocessors and multiprocessors is studied from an architectural point of view. We show that perhaps the major explanation for why optical technologies have nearly been unable to penetrate into computers is that OIs generally do not shorten the memory-access time, which is the most critical issue for today's stored-program machines. In monoprocessors the memory-access time is dominated by the electronic latency of the memory itself. Thus implementing OIs inside the memory hierarchy without changing the memory architecture cannot dramatically improve the global performance. In strongly coupled multiprocessors the node-bypass latency dominates. Therefore the higher the connectivity (possibly with optics), the shorter the path to another node, but the more expensive the network and the more complex the structure of electronic nodes. This relation leaves the choice of the best network open in terms of simplicity and latency reduction. The bottlenecks resulting from and the benefits of implementing OIs are discussed with respect to symmetric multiprocessors, rings, and distributed shared-memory supercomputers.
High Performance Interconnects | 2004
Avinash Karanth Kodi; Ahmed Louri
The paper proposes a highly connected optical interconnect based architecture that maximizes the channel availability for future scalable parallel computers, such as distributed shared memory (DSM) multiprocessors and cluster networks. As the system size increases, various messages (requests, responses and acknowledgments) increase in the network resulting in contention. This results in increasing the remote memory access latency and significantly affects the performance of these parallel computers. As a solution, we propose an architecture called RAPID (reconfigurable and scalable all-photonic interconnect for distributed-shared memory), that provides low remote memory access latency by providing fast and efficient unicast, multicast and broadcast capabilities using a combination of aggressively designed WDM, TDM and SDM techniques. We evaluated RAPID based on network characteristics and by simulation using synthetic traffic workloads and compared it against other networks such as electrical ring, torus, mesh and hypercube networks. We found that RAPID outperforms all networks and satisfies most of the requirements of parallel computer design such as low latency, high bandwidth, high connectivity, and easy scalability.
IEEE Journal of Selected Topics in Quantum Electronics | 2011
Avinash Karanth Kodi; Ahmed Louri
Optical interconnects are becoming ubiquitous for short-range communication within boards and racks due to higher communication bandwidth at lower power dissipation when compared to metallic interconnects. Efficient multiplexing techniques (wavelengths, time, and space) allow bandwidths to scale; however, static or predetermined resource allocation of wavelengths can be detrimental to network performance for nonuniform (adversarial) workloads. Dynamic bandwidth reallocation (DBR) based on actual traffic pattern can lead to improved network performance by utilizing idle resources. While DBR techniques can alleviate interconnection bottlenecks, power consumption also increases considerably with increase in bit rate and channels. In this paper, we propose to improve the performance of optical interconnects using DBR techniques and simultaneously optimize the power consumption using dynamic power management (DPM) techniques. DBR reallocates idle channels to busy channels (wavelengths) for improving throughput, and DPM regulates the bit rates and supply voltages for the individual channels. A reconfigurable optoelectronic architecture and a performance adaptive algorithm for implementing DBR and DPM are proposed in this paper. Our proposed reconfiguration algorithm achieves a significant reduction in power consumption and considerable improvement in throughput, with a marginal increase in latency for synthetic and real (Splash-2) traffic traces.
Design Automation Conference | 2010
Xiang Zhang; Ahmed Louri
Multi-core chips or chip multiprocessors (CMPs) are becoming the de facto architecture for scaling up performance and taking advantage of the increasing transistor count on the chip within reasonable power consumption levels. The projected increase in the number of cores in future CMPs is putting stringent demands on the design of the on-chip network (or network-on-chip, NOC). Nanophotonic interconnects have recently emerged as a viable alternate technology solution for the design of NOC because of their higher communication bandwidth, much reduced power consumption and wiring simplification. Several photonic NOC approaches have recently been proposed. A common feature of almost all of these approaches is the integration of the entire optical network onto a single silicon waveguide layer. However, keeping the entire network on a single layer has a serious implication for power losses and design complexity due to the large amount of waveguide crossings. In this paper, we propose MPNOC: a multilayer photonic networks-on-chip. MPNOC combines the recent advances in silicon photonics and three-dimensional (3D) stacking technology with architectural innovations in an integrated architecture that provides ample bandwidth, low latency, and energy efficient on-chip communications for future CMPs. Simulation results show MPNOC can achieve 81.92 TFLOP/s peak bandwidth and an energy savings up to 23% compared to other proposed planar photonic NOC architectures.
Applied Optics | 1994
Ahmed Louri; Hongki Sung
Two important parameters of a network for massively parallel computers are scalability and modularity. Scalability has two aspects: size and time (or generation). Size scalability refers to the property that the size of the network can be increased with nominal effect on the existing configuration. Also, the increase in size is expected to result in a linear increase in performance. Time scalability implies that the communication capabilities of a network should be large enough to support the evolution of processing elements through generations. A modular network enables the construction of a large network out of many smaller ones. The lack of these two important parameters has limited the use of certain types of interconnection networks in the area of massively parallel computers. We present a new modular optical interconnection network, called an optical multimesh hypercube (OMMH), which is both size and time scalable. The OMMH combines positive features of both the hypercube (small diameter, high connectivity, symmetry, simple routing, and fault tolerance) and the torus (constant node degree and size scalability) networks. Also presented is a three-dimensional optical implementation of the OMMH network. A basic building block of the OMMH network is a hypercube module that is constructed with free-space optics to provide compact and high-density localized hypercube connections. The OMMH network is then constructed by the connection of such basic building blocks with multiwavelength optical fibers to realize torus connections. The proposed implementation methodology is intended to exploit the advantages of both space-invariant free-space and multiwavelength fiber-based optical interconnect technologies. The analysis of the proposed implementation shows that such a network is optically feasible in terms of the physical size and the optical power budget.
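The construction described above, hypercube modules joined by torus links, can be illustrated with a small neighbor function. This is a sketch under stated assumptions: nodes are addressed by a hypothetical triple `(i, j, k)`, where `(i, j)` selects a module in an `l x m` torus and `k` is an `n`-bit address inside the module, and torus links join nodes with the same `k` in adjacent modules. The addressing scheme and names are chosen for illustration and are not quoted from the paper.

```python
def ommh_neighbors(node, l, m, n):
    """Neighbors of node (i, j, k) in an assumed l x m torus of n-cubes.

    Assumes l >= 3 and m >= 3 so the four torus neighbors are distinct.
    """
    i, j, k = node
    # Hypercube links: flip one bit of the in-module address k.
    hypercube_links = [(i, j, k ^ (1 << bit)) for bit in range(n)]
    # Torus links: same k, adjacent module, with wraparound in both directions.
    torus_links = [
        ((i + 1) % l, j, k),
        ((i - 1) % l, j, k),
        (i, (j + 1) % m, k),
        (i, (j - 1) % m, k),
    ]
    return hypercube_links + torus_links


if __name__ == "__main__":
    # Growing the torus adds modules but leaves the node degree at n + 4.
    for l, m in [(4, 4), (16, 16)]:
        degree = len(set(ommh_neighbors((0, 0, 0), l, m, 3)))
        print(f"{l}x{m} torus of 3-cubes: degree {degree}")
```

Under these assumptions every node has `n + 4` links regardless of how many modules are added, which matches the size scalability with constant node degree that the abstract claims for the OMMH.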
IEEE Transactions on Parallel and Distributed Systems | 2000
Brian Webb; Ahmed Louri
A class of highly scalable interconnect topologies called the Scalable Optical Crossbar-Connected Interconnection Networks (SOCNs) is proposed. This proposed class of networks combines the use of tunable Vertical Cavity Surface Emitting Lasers (VCSELs), Wavelength Division Multiplexing (WDM) and a scalable, hierarchical network architecture to implement large-scale optical crossbar based networks. A free-space and optical waveguide-based crossbar interconnect utilizing tunable VCSEL arrays is proposed for interconnecting processor elements within a local cluster. A similar WDM optical crossbar using optical fibers is proposed for implementing intercluster crossbar links. The combination of the two technologies produces large-scale optical fan-out switches that could be used to implement relatively low cost, large scale, high bandwidth, low latency, fully connected crossbar clusters supporting up to hundreds of processors. An extension of the crossbar network architecture is also proposed that implements a hybrid network architecture that is much more scalable. This could be used to connect thousands of processors in a multiprocessor configuration while maintaining a low latency and high bandwidth. Such an architecture could be very suitable for constructing relatively inexpensive, highly scalable, high bandwidth, and fault-tolerant interconnects for large-scale, massively parallel computer systems. This paper presents a thorough analysis of two example topologies, including a comparison of the two topologies to other popular networks. In addition, an overview of a proposed optical implementation and power budget is presented, along with analysis of proposed media access control protocols and corresponding optical implementation.
International Symposium on Microarchitecture | 1991
Ahmed Louri
A 3-D optical architecture currently under investigation is described. This model, a single-instruction, multiple-data (SIMD) system, exploits spatial parallelism and processes 2-D binary images as fundamental computational entities using symbolic substitution logic. This system effectively implements highly structured data-parallel algorithms, such as signal and image processing, partial differential equations, multidimensional numerical transforms, and numerical supercomputing. The model includes a hierarchical mapping technique that helps design the algorithms and maps them onto the proposed optical architecture. The symbolic substitution logic and the mapping of data-parallel algorithms are discussed. The theoretical performance of the optical system was estimated and compared with that of electronic SIMD array processors. Preliminary results show that the system provides greater computational throughput and efficiency than its electronic counterparts.
IEEE Transactions on Parallel and Distributed Systems | 1998
Ahmed Louri; Brent Weech; Costas Neocleous
A new, scalable interconnection topology called the Spanning Multichannel Linked Hypercube (SMLH) is proposed. This proposed network is very suitable to massively parallel systems and is highly amenable to optical implementation. The SMLH uses the hypercube topology as a basic building block and connects such building blocks using two-dimensional multichannel links (similar to spanning buses). In doing so, the SMLH combines positive features of both the hypercube (small diameter, high connectivity, symmetry, simple routing, and fault tolerance) and the spanning bus hypercube (SBH) (constant node degree, scalability, and ease of physical implementation), while at the same time circumventing their disadvantages. The SMLH topology supports many communication patterns found in different classes of computation, such as bus-based, mesh-based, and tree-based problems, as well as hypercube-based problems. A very attractive feature of the SMLH network is its ability to support a large number of processors with the possibility of maintaining a constant degree and a constant diameter. Other positive features include symmetry, incremental scalability, and fault tolerance. It is shown that the SMLH network provides better average message distance, average traffic density, and queuing delay than many similar networks, including the binary hypercube, the SBH, etc. Additionally, the SMLH has comparable performance to other high-performance hypercubic networks, including the Generalized Hypercube and the Hypermesh. An optical implementation methodology is proposed for SMLH. The implementation methodology combines both the advantages of free space optics with those of wavelength division multiplexing techniques. A detailed analysis of the feasibility of the proposed network is also presented.