Ahmed El-Amawy
Louisiana State University
Publications
Featured research published by Ahmed El-Amawy.
IEEE Transactions on Parallel and Distributed Systems | 1991
Ahmed El-Amawy; Shahram Latifi
A new hypercube-type structure, the folded hypercube (FHC), which is basically a standard hypercube with some extra links established between its nodes, is proposed and analyzed. The hardware overhead is almost 1/n, n being the dimensionality of the hypercube, which is negligible for large n. For this new design, optimal routing algorithms are developed and proven to be remarkably more efficient than those of the conventional n-cube. For one-to-one communication, each node can reach any other node in the network in at most $\lceil n/2 \rceil$ hops (each hop corresponds to the traversal of a single link), as opposed to n hops in the standard hypercube. One-to-all communication (broadcasting) can also be performed in only $\lceil n/2 \rceil$ steps, yielding a 50% improvement in broadcasting time over that of the standard hypercube. All routing algorithms are simple and easy to implement. Correctness proofs for the algorithms are given. For the proposed architecture, communication parameters such as average distance, message traffic density, and communication time delay are derived. In addition, some fault tolerance capabilities of this architecture are quantified and compared to those of the standard cube. It is shown that this structure offers substantial improvement over existing hypercube-type networks in terms of the above-mentioned network parameters.
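The one-to-one hop bound can be illustrated with a minimal Python sketch. It assumes the usual FHC definition, in which the extra link of each node goes to its bitwise complement; the function names `fhc_neighbors` and `fhc_route` are hypothetical, and the routing rule is a straightforward shortest-path choice, not necessarily the algorithm given in the paper.

```python
def fhc_neighbors(node, n):
    """Neighbors of `node` in an n-dimensional folded hypercube (assumed definition)."""
    mask = (1 << n) - 1
    nbrs = [node ^ (1 << i) for i in range(n)]  # the n ordinary hypercube links
    nbrs.append(node ^ mask)                    # extra link to the complementary node
    return nbrs


def fhc_route(src, dst, n):
    """Return one shortest path from src to dst as a list of node labels."""
    mask = (1 << n) - 1
    diff = src ^ dst
    h = bin(diff).count("1")  # Hamming distance between src and dst
    path = [src]
    cur = src
    # Taking the complementary link first costs 1 + (n - h) hops in total;
    # use it only when that is strictly shorter than h direct hops.
    if n + 1 - h < h:
        cur ^= mask
        path.append(cur)
        diff = cur ^ dst
    # Correct the remaining differing bits one dimension at a time.
    for i in range(n):
        if (diff >> i) & 1:
            cur ^= 1 << i
            path.append(cur)
    return path


print(fhc_route(0b0000, 0b0111, 4))  # [0, 15, 7]
```

The path length is min(h, n + 1 - h) for Hamming distance h, which never exceeds $\lceil n/2 \rceil$. In the example, a route that needs three hops in the standard 4-cube takes two hops via the complementary link.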
IEEE Transactions on Computers | 1989
Ahmed El-Amawy
An array that inverts an $n \times n$ dense matrix in $5n-1$ time units, including I/O time, is presented. The inversion algorithm consists of three phases and assumes that Gaussian elimination without pivoting can be applied. The array, which consists of $2n^2-n$ simple processing elements, implements and overlaps the execution of all three phases without any need for intermediate I/O or reconfiguration. An efficient data-steering technique which is well suited for feedback recurrences is utilized.
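As a point of reference for the assumption that elimination without pivoting applies, the following is a minimal sequential Python sketch. It uses plain Gauss-Jordan elimination on the augmented matrix and is not the paper's three-phase algorithm or its array mapping. The function names are hypothetical; the two helper formulas simply restate the figures quoted in the abstract.

```python
from fractions import Fraction


def invert_no_pivot(a):
    """Invert a square matrix by Gauss-Jordan elimination without pivoting."""
    n = len(a)
    # Build the augmented matrix [A | I] with exact rational arithmetic.
    m = [
        [Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
        for i, row in enumerate(a)
    ]
    for k in range(n):
        pivot = m[k][k]
        if pivot == 0:
            # No row exchange is attempted: the method simply does not apply.
            raise ZeroDivisionError("zero pivot: elimination without pivoting fails")
        m[k] = [x / pivot for x in m[k]]
        for i in range(n):
            if i != k and m[i][k] != 0:
                f = m[i][k]
                m[i] = [x - f * y for x, y in zip(m[i], m[k])]
    return [row[n:] for row in m]


def array_time_units(n):
    """Time units quoted in the abstract for an n x n inversion, I/O included."""
    return 5 * n - 1


def array_pe_count(n):
    """Number of processing elements quoted in the abstract."""
    return 2 * n * n - n


print(invert_no_pivot([[2, 1], [1, 1]]))
print(array_time_units(4), array_pe_count(4))  # 19 28
```

A matrix such as [[0, 1], [1, 0]] is invertible but has a zero leading pivot, so it falls outside the no-pivoting assumption and the sketch raises an error for it.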
Neural Networks | 1997
Behnam S. Arad; Ahmed El-Amawy
This paper presents an extensive study of fault tolerant training of feedforward artificial neural networks. We present several versions of a very robust training algorithm and report the results of their simulations. Our algorithm is shown to outperform all existing training algorithms in its ability to tolerate different fault types and a larger number of hidden unit failures. We show that the generalization ability of the proposed algorithm is substantially better than that of the standard backpropagation algorithm and is comparable with that of other existing fault tolerant algorithms. The algorithm is based on the backpropagation algorithm with built-in measures for extensive fault tolerant training. A novel concept presented in this paper is that of training the network for fault types beyond the limits of the activation function. We demonstrate that training for such unrealistic fault types enables the network to be more tolerant to realistic fault types within the limits of the activation function. Further, tradeoffs between training time, enhanced fault tolerance, and generalization properties are studied.
IEEE Transactions on Parallel and Distributed Systems | 1993
Ahmed El-Amawy
A scheme for global synchronization of arbitrarily large computing structures such that clock skew between any two communicating cells is bounded above by a constant is described. The scheme utilizes clock nodes that perform simple processing on clock signals to maintain a constant skew bound irrespective of the size of the computing structure. Among the salient features of the scheme is the interdependence between network topology, skew upper bound, and maximum clocking rate achievable. A 2-D mesh framework is used to present the concepts, introduce three network designs, and prove some basic results. For each network, the (constant) upper bound on clock skew between any two communicating processors is established, and its independence of network size is shown. Simulations were carried out to verify correctness and to check the workability of the scheme. A 4 × 4 network was built and successfully tested for stability. Such issues as node design, clocking of nonplanar structures such as hypercubes, and the concept of fuse programmed clock networks are addressed.
IEEE Transactions on Computers | 1996
Priyalal Kulasinghe; Ahmed El-Amawy
We develop a formal and systematic methodology for designing an optimal multiple bus system (MBS) realizing a set of interconnection functions whose graphical representation (denoted as IFG) is symmetric. The problem of constructing an optimal MBS for a given IFG is NP-hard. In this paper, we show that polynomial time solutions exist when the IFG is vertex symmetric. This is the case of interest for the vast majority of important interconnection function sets. We present a particular partition (which can be found in polynomial time) on the edge set of a vertex symmetric IFG that produces a symmetric MBS with the minimum number of buses as well as the minimum number of interfaces. We demonstrate several advantages of such an MBS over a direct-link architecture realizing the same IFG, in terms of the number of ports per processor, the number of neighbors per processor, and the diameter.
International Parallel Processing Symposium | 1999
Martin Feldman; Ahmed El-Amawy
In this paper we consider the use of optical slab waveguides as buses in a parallel computing environment. We show that slab buses can connect to many more elements than conventional electrical or fiber optic buses. We also introduce a novel multiplexing scheme called mode division multiplexing that vastly increases the number of independent channels that a single slab can support. We show that optical slab waveguides have, in principle, capacities of over a million independent channels (distinguished by about 1000 “out-of-plane modes” and about 1000 wavelengths) in a single physical medium, with each channel capable of sustaining a load of over 1000. This becomes comparable to the high capacity of a free space optical system, but with the ability to broadcast each light source to many physically separated locations. Preliminary experiments on the “sawtooth slab bus” point to the feasibility of practical slab buses. We also present a bus arbitration example that uses the high capacity and loading of slab buses to achieve sublogarithmic arbitration time.
IEEE Transactions on Parallel and Distributed Systems | 1997
Ahmed El-Amawy; Priyalal Kulasinghe
This paper addresses the problem of mapping a feedforward ANN onto a multiple bus system (MBS) with p processors and b buses so as to minimize the total execution time. We present an algorithm which assigns the nodes of a given computational layer (c-layer) to processors such that the computation lower bound $[N^l/p]\,t_p^l$ and the communication lower bound $[N^l/b]\,t_c$ are achieved simultaneously, where $N^l$ is the number of nodes in the mapped c-layer $l$ and $t_p^l$ and $t_c$ are the computation and communication times, respectively, associated with a node in the layer. When computation and communication are not overlapped, we show that the optimal number of processors needed is either 1 or p, depending on the ratio $t_p^l/t_c$. When computation and communication are overlapped, we show that the optimal number of processors needed is either 1 or $([t_p^l/t_c])\,b$. We show that there is a unique arrangement of interfaces such that the total number of interfaces is minimum and the optimal time is reached. Finally, we compare the relative merits of the MBS simulating ANNs over the recently introduced checkerboarding scheme.
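The two per-layer lower bounds are simple enough to evaluate directly. The sketch below assumes that the square brackets in the bounds denote ceilings, which the extracted text does not confirm; the function name and argument names are hypothetical, and the sketch only evaluates the bounds without performing the node-to-processor assignment described in the paper.

```python
def layer_lower_bounds(num_nodes, p, b, t_p, t_c):
    """Return (computation bound, communication bound) for one c-layer.

    Assumption: the bracket in the abstract is a ceiling, so a layer of
    num_nodes nodes needs ceil(num_nodes / p) computation rounds on p
    processors and ceil(num_nodes / b) transfer rounds on b buses.
    """
    comp_rounds = -(-num_nodes // p)  # integer ceiling of num_nodes / p
    comm_rounds = -(-num_nodes // b)  # integer ceiling of num_nodes / b
    return comp_rounds * t_p, comm_rounds * t_c


# Example: 10 nodes, 4 processors, 3 buses, t_p = 2 and t_c = 1 time units.
print(layer_lower_bounds(10, 4, 3, 2, 1))  # (6, 4)
```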
IEEE Transactions on Industry Applications | 1987
Ali Mirbod; Ahmed El-Amawy
The performance analysis of a novel fast-response microprocessor-based firing and control scheme for a phase-controlled rectifier is presented. Controller performance and stability analysis are emphasized, particularly when the converter operates in a closed loop fed by a weak ac system. The firing angle control scheme, which relies on real-time projection of the firing delay angle, has been implemented and tested. In this implementation the controller responds within 20 µs to a change in the desired output voltage. The controller is synchronized to the line through a software phase-locked loop (PLL), which adapts to large variations in the line frequency. With this controller the bridge rectifier can operate properly, with a constant steady-state open-loop gain, even when a weak ac system with unregulated frequency feeds the converter. The operation and performance of the microprocessor-based converter with a current feedback loop are described. Two simple stability analyses, one based on system function simplification and the other based on exact system representation, are discussed and experimentally verified. The generation of subharmonics caused by the line inductance and the reference zero-crossing detection circuit is also studied. Experimental results are reported.
IEEE Transactions on Industrial Electronics | 1986
Ali Mirbod; Ahmed El-Amawy
A novel general-purpose microprocessor-based control circuit for a three-phase controlled rectifier is presented. The performance exhibited by the controller is superior to that of previously presented circuits. The firing angle is smoothly controlled in the range of 0 to 180° with a fast response and a constant open-loop gain, even for the cases where the converter is fed by a weak ac system of unregulated frequency. The synchronization between the line and VCO is implemented by an efficient software-controlled PLL. The effect of source impedance in delaying the synchronization signal has been properly compensated for by the control circuit. The implementation of the compensation circuit is also presented. The hardware and software control circuit implementation built around an 8086 microprocessor is discussed, and the experimental results are given.
IEEE Transactions on Computers | 1995
Priyalal Kulasinghe; Ahmed El-Amawy
This paper addresses the combinatorial problem of constructing a minimal-cost bused interconnection among a set of modules (or processors). Although some work has been reported on bused interconnection between modules, the computational complexity of the problem has not been previously addressed. We show that the optimization problem of finding a minimal-cost interconnection among modules to realize a certain set of data transfers is NP-hard.