Abdulla Bataineh | Researchain

Publication

Featured research published by Abdulla Bataineh.

IEEE International Conference on High Performance Computing Data and Analytics | 2012

Cray cascade: a scalable HPC system based on a Dragonfly network

Greg Faanes; Abdulla Bataineh; Duncan Roweth; Tom Court; Edwin L. Froese; Bob Alverson; Timothy J. Johnson; Joe Kopnick; Mike Higgins; James Reinhard

Higher global bandwidth requirements for many applications and lower network cost have motivated the use of the Dragonfly network topology for high performance computing systems. In this paper we present the architecture of the Cray Cascade system, a distributed memory system based on the Dragonfly [1] network topology. We describe the structure of the system, its Dragonfly network and the routing algorithms. We describe a set of advanced features supporting both mainstream high performance computing applications and emerging global address space programming models. We present a combination of performance results from prototype systems and simulation data for large systems. We demonstrate the value of the Dragonfly topology and the benefits obtained through extensive use of adaptive routing.

Conference on High Performance Computing (Supercomputing) | 2007

The Cray BlackWidow: a highly scalable vector multiprocessor

Dennis Abts; Abdulla Bataineh; Steve Scott; Greg Faanes; Jim Schwarzmeier; Eric P. Lundberg; Timothy J. Johnson; Mike Bye; Gerald A. Schwoerer

This paper describes the system architecture of the Cray BlackWidow scalable vector multiprocessor. The BlackWidow system is a distributed shared memory (DSM) architecture that is scalable to 32K processors, each with a 4-way dispatch scalar execution unit and an 8-pipe vector unit capable of 20.8 Gflops for 64-bit operations and 41.6 Gflops for 32-bit operations at the prototype operating frequency of 1.3 GHz. Global memory is directly accessible with processor loads and stores and is globally coherent. The system supports thousands of outstanding references to hide remote memory latencies, and provides a rich suite of built-in synchronization primitives. Each BlackWidow node is implemented as a 4-way SMP with up to 128 Gbytes of DDR2 main memory capacity. The system supports common programming models such as MPI and OpenMP, as well as global address space languages such as UPC and CAF. We describe the system architecture and microarchitecture of the processor, memory controller, and router chips. We give preliminary performance results and discuss design tradeoffs.
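The peak rates quoted in this abstract can be cross-checked with simple arithmetic. The sketch below uses only the figures stated above (1.3 GHz, 8 vector pipes, 20.8 and 41.6 Gflops, 32K processors). Reading "32K" as 32 × 1024 and the even split of work across the pipes are assumptions made for illustration, not statements from the paper.

```python
# Figures taken from the abstract.
CLOCK_GHZ = 1.3          # prototype operating frequency
VECTOR_PIPES = 8         # pipes in the vector unit
PEAK_64_GFLOPS = 20.8    # 64-bit peak per processor
PEAK_32_GFLOPS = 41.6    # 32-bit peak per processor
MAX_PROCESSORS = 32 * 1024  # assumption: "32K" read as 32 * 1024


def flops_per_cycle(peak_gflops, clock_ghz):
    """Floating-point operations per clock cycle implied by a peak rate."""
    return peak_gflops / clock_ghz


def system_peak_tflops(peak_gflops, processors):
    """Aggregate peak in Tflops for a given processor count."""
    return peak_gflops * processors / 1000.0


per_cycle_64 = flops_per_cycle(PEAK_64_GFLOPS, CLOCK_GHZ)
per_cycle_32 = flops_per_cycle(PEAK_32_GFLOPS, CLOCK_GHZ)

print(f"64-bit: {per_cycle_64:.1f} flops/cycle, "
      f"{per_cycle_64 / VECTOR_PIPES:.1f} per pipe")
print(f"32-bit: {per_cycle_32:.1f} flops/cycle, "
      f"{per_cycle_32 / VECTOR_PIPES:.1f} per pipe")
print(f"Full-scale 64-bit peak: "
      f"{system_peak_tflops(PEAK_64_GFLOPS, MAX_PROCESSORS):.1f} Tflops")
```

The 64-bit figure works out to about 16 operations per cycle, or 2 per pipe per cycle, and the 32-bit figure to twice that. The abstract does not say how those operations are issued, so the per-pipe breakdown is only a consistency check.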

IEEE Transactions on Parallel and Distributed Systems | 1993

Balanced parallel sort on hypercube multiprocessors

Bülent Abali; Füsun Özgüner; Abdulla Bataineh

A parallel sorting algorithm for sorting n elements evenly distributed over 2^d = p nodes of a d-dimensional hypercube is presented. The average running time of the algorithm is O((n log n)/p + p log^2 n). The algorithm maintains a perfect load balance in the nodes by determining the (kn/p)th elements (k = 1, ..., p-1) of the final sorted list in advance. These p-1 keys are used to partition the sorted sublists in each node to redistribute data to the nodes to be merged in parallel. The nodes finish the sort with an equal number of elements (n/p) regardless of the data distribution. A parallel selection algorithm for determining the balanced partition keys in O(p log^2 n) time is presented. The speed of the sorting algorithm is further enhanced by the distance-d communication capability of the iPSC/2 hypercube computer and a novel conflict-free routing algorithm. Experimental results on a 16-node hypercube computer show that the sorting algorithm is competitive with the previous algorithms and faster for skewed data distributions.
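The partition-then-merge structure described in this abstract can be illustrated with a sequential simulation. This is a minimal sketch, not the authors' implementation: the function name `balanced_sort` is hypothetical, the partition keys are found with a plain global merge rather than the paper's parallel selection algorithm, and hypercube communication and routing are not modelled. Elements are tagged with their origin so that duplicate keys still split into exactly n/p per node, which is an assumption about tie handling.

```python
import heapq
from bisect import bisect_left


def balanced_sort(blocks):
    """Simulate a load-balanced sort over p 'nodes' (one list per node).

    Returns p sorted lists of equal length whose concatenation is the
    globally sorted sequence.
    """
    p = len(blocks)
    n = sum(len(b) for b in blocks)
    assert n % p == 0, "sketch assumes n is a multiple of p"
    size = n // p

    # Step 1: each node sorts its own sublist. Tagging every key with
    # (node, index) makes all elements distinct, so ties split exactly.
    local = [sorted((x, node, i) for i, x in enumerate(block))
             for node, block in enumerate(blocks)]

    # Step 2: find the (k*n/p)-th elements of the final order.
    # Stand-in for the parallel selection step: a global merge.
    merged = list(heapq.merge(*local))
    splitters = [merged[k * size] for k in range(1, p)]

    # Step 3: each node cuts its sorted sublist at the p-1 splitters
    # and "sends" piece k to destination node k.
    inbox = [[] for _ in range(p)]
    for sub in local:
        cuts = [0] + [bisect_left(sub, s) for s in splitters] + [len(sub)]
        for k in range(p):
            inbox[k].append(sub[cuts[k]:cuts[k + 1]])

    # Step 4: each destination merges the sorted pieces it received.
    return [[x for x, _, _ in heapq.merge(*pieces)] for pieces in inbox]


if __name__ == "__main__":
    skewed = [[5, 5, 5, 5], [5, 1, 5, 5], [9, 5, 0, 5], [5, 5, 5, 7]]
    for node, result in enumerate(balanced_sort(skewed)):
        print(node, result)
```

Because the splitters are chosen by rank rather than by sampled value, every destination receives the same number of elements even when most keys are identical, which is the property the abstract emphasises for skewed distributions.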

Distributed Memory Computing Conference | 1990

Load balanced sort on hypercube multiprocessors

Bülent Abali; Füsun Özgüner; Abdulla Bataineh

A parallel algorithm for sorting n elements evenly distributed over 2^d = p nodes of a d-dimensional hypercube is given. The algorithm ensures that the nodes always receive an equal number of elements (n/p) at the end, regardless of the skew in data distribution.

International Conference on Computer Aided Design | 1992

Parallel logic and fault simulation algorithms for shared memory vector machines

Abdulla Bataineh; Füsun Özgüner; Imre Szauter

Algorithms for logic and fault simulation, developed and implemented on the Cray Y-MP supercomputer, a general-purpose shared-memory parallel machine with vector processors, are presented. The parallel-and-vector version of the event-driven logic simulation algorithm achieves a speedup of 52 on the Cray Y-MP with eight processors, with a maximum performance of about 2 million events per second. These results are comparable to the performance of hardware simulation engines and can be implemented on other parallel machines without major modifications. The second algorithm is a parallel and vector version of the parallel fault simulation algorithm. Experimental results on benchmark circuits show that very high evaluation rates (20 to 32 × 10^9 evaluations/s) can be achieved. Speedup factors of 45 to 69 are observed between the scalar and the parallel-and-vector execution of the fault simulator.
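For readers unfamiliar with the term, event-driven logic simulation re-evaluates a gate only when one of its inputs changes. The sketch below is a generic, scalar, zero-delay illustration of that idea for an acyclic circuit of two-input gates. It is not the vectorized or parallel algorithm from the paper, and the netlist format and the names `GATES` and `simulate` are hypothetical.

```python
from collections import deque

# Two-input gate functions on 0/1 values.
GATES = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "NAND": lambda a, b: 1 - (a & b),
}


def simulate(netlist, values, changes):
    """Propagate input changes through an acyclic, zero-delay circuit.

    netlist: {output_signal: (gate_type, input_a, input_b)}
    values:  {signal: 0 or 1}, a consistent state; updated in place
    changes: {primary_input: new_value}
    Returns the number of gate evaluations performed.
    """
    # Build the fanout list: which gates read each signal.
    fanout = {}
    for out, (_, a, b) in netlist.items():
        fanout.setdefault(a, []).append(out)
        fanout.setdefault(b, []).append(out)

    # An "event" is a signal whose value has just changed.
    events = deque()
    for sig, new in changes.items():
        if values[sig] != new:
            values[sig] = new
            events.append(sig)

    evaluations = 0
    while events:
        sig = events.popleft()
        for out in fanout.get(sig, []):
            kind, a, b = netlist[out]
            new = GATES[kind](values[a], values[b])
            evaluations += 1
            if new != values[out]:  # schedule only real changes
                values[out] = new
                events.append(out)
    return evaluations


if __name__ == "__main__":
    netlist = {"s": ("XOR", "a", "b"),
               "c": ("AND", "a", "b"),
               "z": ("OR", "c", "d")}
    state = {"a": 0, "b": 0, "d": 1, "s": 0, "c": 0, "z": 1}
    count = simulate(netlist, state, {"a": 1})
    print(count, state)
```

In the usage example, changing `a` triggers evaluation of only the two gates that read it. Gate `z` is never touched because `c` does not change, which is the saving that event-driven simulation aims for.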

Parallel Algorithms and Applications | 1999

A parallel and vector implementation of circuit simulation on Cray supercomputers

Abdulla Bataineh; Mike Aamodt

This paper reports the results of vectorizing and parallelizing the circuit simulator HSPICE on the Cray C90 supercomputer. The results show that significant speedup of circuit simulation is achievable when the transistor model evaluation and the Jacobian matrix update are vectorized and parallelized efficiently. A speedup of 40 times on 16 vector processors was achieved for the MOSFET transistor model evaluation component. Furthermore, matrix update time was reduced by one order of magnitude and the solver time was reduced by a factor of 2 to 5 for the four circuits simulated. As a result, a total simulation speedup of about 12 times on 16 vector processors was achieved.

Simulation | 1993

Parallel and Vector Logic and Fault Simulation Algorithms on the Cray Y-MP Supercomputer

Abdulla Bataineh; Füsun Özgüner; Imre Szauter

In this paper, we present parallel algorithms for logic and fault simulation, developed for and implemented on the Cray Y-MP supercomputer, a general-purpose shared-memory parallel machine with vector processors. The parallel-and-vector version of the event-driven logic simulation algorithm achieves a speedup of 52 on the Cray Y-MP with 8 processors, with a maximum performance of about 2 million events per second. These results are comparable to the performance of hardware simulation engines and can be implemented on other parallel machines without major modifications. The second algorithm is a parallel and vector version of the parallel fault simulation algorithm. Experimental results on benchmark circuits [1] show that very high evaluation rates (20 to 32 × 10^9 evaluations/s) can be achieved. Speedup factors of 45 to 69 are observed between the scalar and the parallel-and-vector execution of the fault simulator.

Conference on High Performance Computing (Supercomputing) | 1992

Parallel-and-vector implementation of the event-driven logic simulation algorithm on the Cray Y-MP supercomputer

Abdulla Bataineh; Füsun Özgüner

The authors propose logic simulation techniques using parallel and vector machines to reduce the simulation time of large digital circuits. Three algorithms for logic simulation have been developed and implemented on the Cray Y-MP supercomputer, a general-purpose shared-memory parallel machine with vector processors. The first is a vector version of the event-driven algorithm that achieves a speedup of 13.6 on a single Cray Y-MP processor. The second is a parallel version of the event-driven algorithm that achieves a speedup of 6.3 with eight processors. The third is a complete parallel and vector version of the event-driven algorithm that achieves a speedup of 52 on the Cray Y-MP with eight processors. The proposed techniques are very general so that they can be implemented on other computers without major modifications. Comparisons between the three algorithms and commercial logic simulators are included.
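The three speedups reported here can be put side by side with a little arithmetic. The sketch below uses only the numbers in the abstract (13.6, 6.3, 52, eight processors). Treating the product of the vector-only and parallel-only speedups as an idealized reference point is an assumption made for illustration and is not a claim from the paper.

```python
VECTOR_ONLY = 13.6     # vector version, one processor
PARALLEL_ONLY = 6.3    # parallel version, eight processors
COMBINED = 52.0        # parallel-and-vector version, eight processors
PROCESSORS = 8


def parallel_efficiency(speedup, processors):
    """Fraction of ideal linear scaling achieved by a parallel speedup."""
    return speedup / processors


def realized_fraction(combined, vector_only, parallel_only):
    """Combined speedup relative to the naive product of the two parts.

    The product is only an idealized reference (an assumption here).
    """
    return combined / (vector_only * parallel_only)


print(f"Parallel efficiency: "
      f"{parallel_efficiency(PARALLEL_ONLY, PROCESSORS):.0%}")
print(f"Naive product of speedups: {VECTOR_ONLY * PARALLEL_ONLY:.1f}")
print(f"Combined speedup as a share of that product: "
      f"{realized_fraction(COMBINED, VECTOR_ONLY, PARALLEL_ONLY):.0%}")
```

Under that assumption the combined version reaches roughly 60 percent of the naive product, which suggests the vector and parallel gains do not compose perfectly. The abstract itself gives no breakdown of where the difference arises.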

Computers & Structures | 1992

Structural analysis of shallow shells on the Cray Y-MP supercomputer

M.S. Qatu; Abdulla Bataineh

Structural analysis of shallow shells is performed and relatively accurate displacements and stresses are obtained. An energy method, which is an extension of the Ritz method, is used in the analysis. Algebraic polynomials are used as displacement functions. The numerical problems which resulted in inaccurate stresses in previous publications are reduced by making use of symmetry and performing the computations on the Cray Y-MP supercomputer, which has 29-digit double-precision arithmetic. Curvature effects upon deflections and stress resultants of shallow shells with cantilever and ‘semi-cantilever’ boundaries are studied.

Archive | 2006