Wolfgang E. Nagel | Researchain

Publication

Featured research published by Wolfgang E. Nagel.

Parallel Computing | 1990

Parallelizing QCD with dynamical fermions on a CRAY multiprocessor system

Siegfried Knecht; Edwin Laermann; Wolfgang E. Nagel

Quantum Chromodynamics, the theory of the nuclear forces, is simulated on a space-time lattice by means of a Hybrid Monte Carlo algorithm which incorporates dynamical quarks exactly. The large computational needs of the program necessitate the use of modern supercomputers like the CRAY Y-MP, which achieve their high computing speed by using both vector and parallel hardware. This paper gives a short introduction to the CRAY Y-MP system architecture and the autotasking concept, which offers a way of automatically finding parallelism in existing FORTRAN programs. In detail, the paper describes the application as well as the optimization and transformation process that was needed to improve performance. After some code modifications, a speedup of nearly 7 on a CRAY Y-MP8/832 was achieved, leading to a performance of about 1.65 GFLOPS.
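To put the reported speedup in context, the following minimal sketch computes the parallel efficiency and the serial fraction implied by a given speedup. It is illustrative only: the abstract says "nearly 7", so the value 7.0 and the processor count of 8 (read from the Y-MP8 model name) are assumptions, and the function names are hypothetical rather than taken from the paper.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Efficiency = speedup divided by the number of processors."""
    return speedup / processors


def implied_serial_fraction(speedup: float, processors: int) -> float:
    """Karp-Flatt metric: the serial fraction that would explain
    the observed speedup under Amdahl's law."""
    return (1.0 / speedup - 1.0 / processors) / (1.0 - 1.0 / processors)


def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Amdahl's law: upper bound on speedup for a given serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


if __name__ == "__main__":
    # Assumed illustrative values: speedup 7.0 on 8 processors.
    s, p = 7.0, 8
    f = implied_serial_fraction(s, p)
    print(f"efficiency      = {parallel_efficiency(s, p):.3f}")
    print(f"serial fraction = {f:.4f}")
    print(f"Amdahl check    = {amdahl_speedup(f, p):.3f}")
```

Under these assumed values the efficiency is 0.875, and only about 2% of the runtime would need to be serial to cap the speedup at 7 on 8 processors.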

International Conference on Supercomputing | 1988

Three-dimensional numerical simulations of the Czochralski bulk flow on a CRAY X-MP multiprocessor architecture

Wolfgang E. Nagel; Kurt Wingerath

Numerical simulation is increasingly used to obtain detailed information about crystal growth processes. In Czochralski crystal growth from electrically conductive melts, the application of an external magnetic field has become a very useful technique for improving crystal quality. Integrating the magnetic field into the simulation program leads to a system of coupled partial differential equations, i.e. the incompressible Navier-Stokes equations and convective heat equations in combination with an external magnetic field, which have to be solved. Because of its regular data access, this problem is well suited to vector computers. The CRAY X-MP provides vectorization and multiprocessing capabilities. With multitasking it is possible to run different parts of one program in parallel. Both multitasking concepts, macrotasking and microtasking, were examined to evaluate their potential to speed up the Czochralski crystal growth simulation program implemented on a CRAY X-MP. We give a survey of the application and describe the concepts of macrotasking and microtasking on the CRAY X-MP architecture as well as the integration of these concepts into the simulation program. Timing results provide information about the user benefit obtained by using these techniques.

International Conference on Supercomputing | 1989

A comparison of parallel processing on CRAY X-MP and IBM 3090 VF multiprocessors

Ferenc Szelenyi; Wolfgang E. Nagel

Modern supercomputers like the CRAY X-MP and IBM 3090 VF achieve their high computing speed by using both vector and parallel hardware. The available multitasking concepts supporting concurrent execution of tasks within a single application have been designed for different purposes: owing to the small dispatching overhead, fine-grain parallelism allows parallelization of small units of computation, usually chunks of a DO loop. Larger units of computation, such as arithmetic-intensive subroutines, may be processed independently using coarse-grain parallelism. This paper gives an introduction to the concepts of CRAY macro- and microtasking, and of the IBM Multitasking Facility (MTF), the ECSEC microtasking prototype, and Parallel FORTRAN. Basic parallelization using fine-grain as well as coarse-grain techniques has been applied to linear algebra kernels, consisting of matrix multiplication and LU decomposition, and to an application program simulating a Czochralski bulk flow describing a crystal growing system. Depending on the problem, it can be shown that a parallel speedup of nearly four (on the CRAY X-MP/416) and nearly six (on the IBM 3090-600E) can be achieved for the implementation of the matrix multiplication. All other kernels and the application program were limited by serialization overheads arising from memory conflicts (bank and section conflicts on CRAY, cache coherence on IBM) and multitasking primitive overheads. However, with a careful implementation a parallel efficiency of more than 0.9 can be obtained on both multiprocessors.
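The idea of fine-grain parallelism over "chunks of a DO loop" can be sketched for matrix multiplication as follows. This is not the paper's FORTRAN code or any CRAY or IBM tasking API; it is a hypothetical Python illustration of the decomposition only, in which the outer row loop is split into contiguous chunks that are handed to a pool of workers. All function names are assumptions, and because of CPython's global interpreter lock the threads show the structure rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(n, chunks):
    """Split the iteration range 0..n into at most `chunks` contiguous pieces."""
    base, extra = divmod(n, chunks)
    bounds, start = [], 0
    for c in range(chunks):
        end = start + base + (1 if c < extra else 0)
        if end > start:  # skip empty chunks when n < chunks
            bounds.append((start, end))
        start = end
    return bounds


def matmul_rows(a, b, row_start, row_end):
    """Compute rows row_start..row_end-1 of the product a * b."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(row_start, row_end)
    ]


def parallel_matmul(a, b, workers=4):
    """Distribute chunks of the outer row loop across a worker pool."""
    bounds = chunk_bounds(len(a), workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda se: matmul_rows(a, b, se[0], se[1]), bounds)
        # pool.map preserves chunk order, so rows come back in sequence.
        return [row for part in parts for row in part]


if __name__ == "__main__":
    a = [[1, 2], [3, 4], [5, 6]]
    b = [[7, 8], [9, 10]]
    print(parallel_matmul(a, b))
```

Each chunk writes a disjoint set of result rows, so no synchronization is needed beyond collecting the pieces, which is one reason matrix multiplication parallelizes more easily than kernels with stronger data dependencies.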

Parallel Computing | 1991

Benchmarking parallel programs in a multiprogramming environment: The PAR-bench system

Wolfgang E. Nagel; Markus A. Linn

While the efficiency of multitasking has been proven for parallel programs running in a dedicated environment, this paper presents a new approach to the assessment of multitasking. It describes the benchmark generation environment PAR-Bench, which enables measurement of the effects introduced by parallel programs running in a multiprogramming mode. The PAR-Bench system is implemented on Cray multiprocessor systems under the operating systems COS and UNICOS. Using PAR-Bench, the benchmark process is divided into two parts. In a first step, according to user-supplied parameters such as MFLOPS rate, memory and I/O activities, and CPU time, the PAR-Bench system generates synthetic benchmark programs by using the hardware performance monitor HPM. These programs can be used to simulate a given site's workload in a flexible way. In a second step, the system can be used to run this workload several times with varied parameters; the substantial work to be done is fixed, but dynamic changes of program parameters such as memory size and priority, as well as variations in the degree of parallelism, are supported. The PAR-Bench system provides information about the mutual influences of parallel programs and background load, making it possible to evaluate different multitasking implementations, different operating systems, and different computer hardware. Because of the abundance of data concerning characteristic program parameters and program timings, the graphical analysis system GRANSYS was developed as a further component of PAR-Bench, providing automatic analysis features for benchmark data, including data interpretation and visualization.

International Conference on Supercomputing | 1991

Parallel programs and background load: efficiency studies with the PAR-Bench system

Wolfgang E. Nagel; Markus A. Linn

Today most multiprocessor supercomputer systems are still used within a multiprogramming environment, where the individual processors execute different jobs which are totally independent. All programs compete for the available resources. With multitasking, processors may also execute different parts of one program in parallel. The behaviour of parallel programs within a multiprogramming environment and the influence of these programs on the overall workload are of great interest for rating multitasking concepts with respect to their efficiency in practice. While the efficiency of multitasking has been proven for parallel programs running in a dedicated environment, this paper leads to a new approach to the assessment of multitasking. It briefly describes the benchmark generation environment PAR-Bench, which is implemented on a Cray Y-MP under the UNICOS operating system; based on full information about system activities, this system enables reasonable measurements of parallel programs in a multiprogramming mode. Additionally, a few typical results obtained from first measurements

Archive | 1996

Software-Werkzeuge für Parallelrechner: Entwicklungen im Forschungszentrum Jülich

Wolfgang E. Nagel

The Zentralinstitut für Angewandte Mathematik (ZAM) has been operating parallel computers for more than 10 years. Since 1987, part of the computing capacity has been made available, within the framework of the Höchstleistungsrechenzentrum (HLRZ), to more than 200 user groups that are distributed across Germany and use these central supercomputer resources. At present, in addition to an IBM mainframe (ES/9000), ZAM operates a workstation cluster (IBM SP 2), two shared-memory parallel computers (CRAY Y-MP8, CRAY Y-MP M94), and a massively parallel distributed-memory system (Intel Paragon XP/S 10, 140 processors). For spring 1996, the two Cray machines are scheduled to be replaced by a new system complex consisting of a CRAY T90/12 vector computer, two CRAY J90 systems with a total of 20 processors, and a massively parallel CRAY T3E with 512 processors.

11. GI-Fachtagung über Rechenzentren | 1995

Effektive Nutzung von Parallelrechnern in Rechenzentrumsumgebungen

Wolfgang E. Nagel

The Zentralinstitut für Angewandte Mathematik (ZAM) of Forschungszentrum Jülich (KFA) has been operating parallel computers for more than 10 years. Since 1987, part of the computing capacity has been made available, within the framework of the Höchstleistungsrechenzentrum (HLRZ), to more than 200 user groups that are distributed across Germany and use these central parallel computing resources. Our experience shows that the effective use of such machines depends on the quality of the services offered. Alongside the classical tasks, a new and important component of the range of services must be mentioned here: the development and provision of software tools, both to support programming and for performance analysis and optimization, which are becoming increasingly important for this new class of computers.

GI - 20. Jahrestagung I, Informatik auf dem Weg zum Anwender | 1990

Prinzipien der Parallelverarbeitung auf Rechnern mit gemeinsamem Speicher

Wolfgang E. Nagel

Supercomputers such as the CRAY X-MP and CRAY Y-MP achieve their high processing speed by using both vector and parallel processing. Using the CRAY Y-MP vector computer as an example, a brief introduction is given to the system architecture of a multiprocessor machine with shared main memory. For machines of this kind, the parallelization concepts that are already available today and are used efficiently in production environments are described. The fundamental differences are presented and discussed on the basis of the current implementations on CRAY systems. Application examples document the performance of the parallel concepts both for small linear algebra kernels and for large application programs.

Computer benchmarks | 1993