Rolf Hoffmann
Technische Universität Darmstadt
Publication
Featured research published by Rolf Hoffmann.
international parallel and distributed processing symposium | 2004
Mathias Halbach; Rolf Hoffmann
Summary form only given. We searched for a high-performance platform to run billions of simulations in the cellular automata model for optimizing applications. The question was how much speed-up could be gained by using FPGA technology compared to optimized software. We implemented two cellular automata rules in software on a PC and in hardware. On our low-end experimental platform we reached a speed-up of 3 for a medium-complexity rule and 22 for a complex rule. With the latest high-end FPGA technology, speed-ups of many thousands are realistic. A cluster of thousands of workstations would be necessary to reach the same performance, which is much more costly than the FPGA solution.
cellular automata for research and industry | 2006
Mathias Halbach; Rolf Hoffmann; Lars Both
The goal of our investigation is to automatically find the absolutely best rule for a moving creature in a cellular field. The task of the creature is to visit all empty cells with a minimum number of steps. We call this problem the creature's exploration problem. The behaviour was modelled using a variable state machine represented by a state table. Input to the state table is the current state and the neighbour's state in front of the creature's moving direction. The problem is that the search space for the possible rules grows exponentially with the number of states, inputs and outputs. We could solve the problem for six states, two inputs and two outputs with the aid of a parallel hardware platform (FPGA technology). The set of all possible n-state algorithms was first reduced by discarding equivalent, reducible and not strongly connected ones. The algorithms which showed a certain performance for five initial configurations during simulation were extracted by the hardware and sent to the host PC. Additional tests for robustness and the behaviour of several creatures were carried out in software. One creature with the best algorithm can visit 99.92 % of the empty cells of 26 test configurations. Several creatures (up to 16) can perform the task more efficiently for the tested initial configuration.
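The exponential growth of the search space can be illustrated with a small counting sketch. It assumes (the abstract does not spell this out) that the state table has one entry per combination of state and input value, and that each entry independently selects a next state and an output value. The function name `table_count` is hypothetical.

```python
def table_count(n_states: int, n_inputs: int, n_outputs: int) -> int:
    """Count all state tables under the assumption stated above.

    n_inputs and n_outputs are the numbers of distinct input and
    output values (two each in the six-state case of the abstract).
    """
    entries = n_states * n_inputs    # rows of the state table
    choices = n_states * n_outputs   # (next state, output) pairs per row
    return choices ** entries


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, table_count(n, 2, 2))
```

Under this assumption the six-state case already yields 12^12 candidate tables before equivalent, reducible and not strongly connected ones are discarded, which suggests why an exhaustive search in software is costly.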
International Journal of Parallel, Emergent and Distributed Systems | 2009
Johannes Jendrsczok; Patrick Ediger; Rolf Hoffmann
The global cellular automata model (GCA) is a massively parallel computation model which extends the classical cellular automata model (CA) with dynamic global neighbors. We present for that model a data parallel architecture which is scalable in the number of parallel pipelines and which uses application specific operators (adapted operators). The instruction set consists of control and RULE instructions. A RULE computes the next cell contents for each cell in the destination object. The machine consists of P pipelines. Each pipeline has an associated primary memory bank and has access to the global memory (real or emulated multiport memory). The diffusion of particles was used as an example in order to demonstrate the adaptive operators, the machine programming and its performance. Particles which point to each other within a defined neighborhood search space are interchanged. The pointers are modified in each generation by a pseudo-random function. The machine with up to 32 pipelines was synthesized for an Altera FPGA for that application.
parallel computing technologies | 2001
Rolf Hoffmann; Klaus-Peter Völkmann; S. Waldschmidt; Wolfgang Heenes
A model called global cellular automata (GCA) will be introduced. The new model preserves the good features of the cellular automata but overcomes its restrictions. In the GCA the cell state consists of a data field and additional pointers. Via these pointers, each cell has read access to any other cell in the cell field, and the pointers may be changed from generation to generation. Compared to the cellular automata the neighbourhood is dynamic and differs from cell to cell. For many applications parallel algorithms can be found in a straightforward way and can be mapped directly onto this model. As the model is also massively parallel in a simple way, it can be supported efficiently by hardware.
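A minimal software sketch of one GCA generation may make the model more concrete. It assumes a single pointer per cell and a cell state stored as a `(data, pointer)` tuple; the names `gca_step` and `rank_rule` are hypothetical, and pointer jumping on a linked list is used only as an illustrative rule, not as an algorithm taken from the paper.

```python
from typing import Callable, List, Tuple

Cell = Tuple[int, int]  # (data field, pointer to another cell)


def gca_step(cells: List[Cell], rule: Callable[[Cell, Cell], Cell]) -> List[Cell]:
    """Compute one synchronous generation.

    Every cell reads its own old state and the old state of the cell
    its pointer refers to; no cell ever writes to another cell.
    """
    return [rule(cell, cells[cell[1]]) for cell in cells]


def rank_rule(own: Cell, neighbor: Cell) -> Cell:
    """Pointer jumping: add the neighbor's distance, adopt its pointer."""
    data, _ = own
    neighbor_data, neighbor_pointer = neighbor
    return (data + neighbor_data, neighbor_pointer)


if __name__ == "__main__":
    # Linked list 0 -> 1 -> 2 -> 3 -> 4; the last cell points to itself.
    field = [(1, 1), (1, 2), (1, 3), (1, 4), (0, 4)]
    for _ in range(3):  # about log2(n) generations suffice
        field = gca_step(field, rank_rule)
    print([data for data, _ in field])
```

After three generations each data field holds the cell's distance to the end of the list. The pointers change from generation to generation and differ from cell to cell, which is the dynamic neighbourhood described above.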
international conference / workshop on embedded computer systems: architectures, modeling and simulation | 2009
Christian Schäck; Wolfgang Heenes; Rolf Hoffmann
The GCA (Global Cellular Automata) model consists of a collection of cells which change their states synchronously depending on the states of their neighbors, as in the classical CA (Cellular Automata) model. Unlike in the CA model, the neighbors are not fixed and local but variable and global. The GCA model is applicable to a wide range of parallel algorithms. In this paper a general purpose multiprocessor architecture for the massively parallel GCA model is presented. In contrast to a special purpose implementation of a GCA algorithm, the multiprocessor system allows the implementation in a flexible way through programming. The architecture mainly consists of a set of processors (Nios II) and a network. The Nios II features a general-purpose RISC CPU architecture designed to address a wide range of applications. The network is a well-known omega network. Only read accesses through the network are necessary in the GCA model, leading to a simplified structure. A system with up to 32 processors was implemented as a prototype on an FPGA. The analysis and implementation results have shown that the performance of the system scales with the number of processors.
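The abstract does not say how read requests are routed through the omega network of the prototype. As a hedged illustration, the sketch below uses the textbook destination-tag scheme: each of the log2(N) stages applies a perfect shuffle (a left rotation of the address bits) and then a 2x2 switch sets the lowest bit to the next destination bit. The function name `omega_route` is hypothetical.

```python
from typing import List


def omega_route(source: int, destination: int, address_bits: int) -> List[int]:
    """Return the line positions visited after each of the stages.

    Textbook destination-tag routing in an omega network with
    2**address_bits inputs; this is an assumption, not the routing
    logic documented for the prototype.
    """
    mask = (1 << address_bits) - 1
    position = source
    path = []
    for stage in range(address_bits):
        # Perfect shuffle: rotate the address left by one bit.
        position = ((position << 1) | (position >> (address_bits - 1))) & mask
        # Switch setting: take the destination bit, most significant first.
        bit = (destination >> (address_bits - 1 - stage)) & 1
        position = (position & ~1) | bit
        path.append(position)
    return path


if __name__ == "__main__":
    print(omega_route(source=5, destination=2, address_bits=3))
```

After the last stage the request is at the destination line for any source, because every source bit has been rotated out and replaced by a destination bit.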
international parallel and distributed processing symposium | 2003
Rolf Hoffmann; Klaus-Peter Völkmann; Wolfgang Heenes
We have previously introduced the massively parallel global cellular automata (GCA) model. Parallel algorithms derived from applications can be mapped straightforwardly onto this model. In this model a cell in the cell field is dynamically connected (access pattern, dynamic neighbourhood) to other cells. The model can be implemented by pointers stored in the cell state. Via these pointers, each cell has read access to any other cell in the cell field, and the pointers may be changed from generation to generation. We have investigated different types of the model in order to minimize the hardware/software implementation cost. We have therefore classified the GCA into types with respect to the space, time or data dependency of the access pattern. We have investigated a number of different GCA algorithms and found that in most cases a time-dependent access pattern is sufficient. To assess the usefulness of the data-dependent access pattern we constructed a sophisticated merge sort algorithm in which the target addresses are computed, in contrast to classical algorithms where the data elements are moved. It turned out that we could not achieve the speed-up we expected compared to an algorithm implemented on the simpler time-dependent model. This is another confirmation that it is sufficient to implement only the time and space dependent model and thus reduce the hardware/software implementation cost.
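To illustrate what an access pattern that depends only on time and cell position can look like, the sketch below lets cell i read cell i XOR 2^t in generation t. The example (a butterfly-style maximum over all cells) and the name `time_dependent_max` are assumptions chosen for illustration; they are not one of the algorithms investigated in the paper.

```python
from typing import List


def time_dependent_max(values: List[int]) -> List[int]:
    """Spread the global maximum to every cell in log2(n) generations.

    The neighbor of cell i in generation t is i XOR 2**t, so the access
    pattern is a function of time and position only, never of the data.
    """
    n = len(values)
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("the number of cells must be a power of two")
    cells = list(values)
    t = 0
    while (1 << t) < n:
        # Synchronous update: every cell reads the old generation.
        cells = [max(cells[i], cells[i ^ (1 << t)]) for i in range(n)]
        t += 1
    return cells


if __name__ == "__main__":
    print(time_dependent_max([3, 9, 1, 4, 7, 2, 8, 5]))
```

Because the neighbor address does not depend on cell contents, it could in principle be generated by a counter rather than stored per cell, which is one way to read the cost argument made above.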
international conference on high performance computing and simulation | 2010
Patrick Ediger; Rolf Hoffmann; Dominique Désérable
Given is a triangular grid of N cells (communication nodes) with toroidal connections. The goal was to solve the routing problem with N/2 agents, each agent having the task of transporting a message from a source to a target. This task is also known as multiple target searching. The agents shall behave according to a control algorithm implemented as a finite state machine (FSM). Using a genetic procedure (island genetic algorithm), algorithms were evolved that could successfully solve all the test cases under consideration. For comparison, intelligent random walkers were defined, which directly try to move to the target, or deviate from their way with a certain probability. It turned out that the evolved agents perform the task 22% faster than the intelligent random walkers.
cellular automata for research and industry | 2004
Mathias Halbach; Wolfgang Heenes; Rolf Hoffmann; Jan Tisje
We have investigated a problem where the goal is to automatically find the best rule for a cell in the cellular automata model. The cells are either of type OBSTACLE, EMPTY or CREATURE. Only CREATURE can move around in the cell space in one changeable direction and can perform four actions: if the path to the next cell is blocked, turn left or right; if the path is free, i.e. the neighbor cell is of type EMPTY, move ahead and simultaneously turn left or right. The task of the creature is to cross all empty cells with a minimum number of steps.
field programmable logic and applications | 2000
Rolf Hoffmann; Bernd Ulmann; Klaus-Peter Völkmann; S. Waldschmidt
Stream processing is a very efficient method to process large amounts of data. In contrast to vector architectures, stream processing involves instruction streams which are associated with data streams, instead of a single instruction operating on data streams (vectors), thus facilitating individual processing of stream elements. Furthermore, operators in the arithmetic/logic unit can be configured to meet special processing requirements of an application. In the following article an architecture which can be configured as a stream processor is described.
parallel computing technologies | 2005
Mathias Halbach; Rolf Hoffmann
The goal of our investigation is to automatically find the best rule for a cell in the cellular automata model. The cells are either of type Obstacle, Empty or Creature. Only Creature can move around in the cell space and can perform one of four actions: if the path to the next cell is blocked, turn left or right; if the path is free, move ahead and simultaneously turn left or right. The task of the creature is to cross all empty cells with a minimum number of steps. The behavior was modeled using a variable state machine represented by a state table. Input to the state table is the neighbor's state in front of the creature's moving direction. The goal is to find the absolutely best rule in the set of all possible rules. The search space grows exponentially with the number of states. As simulating, testing and evaluating the quality are very time consuming in software, the migration of the problem to a parallel hardware platform is a promising solution. In order to reduce the computation time, (1) the search procedure was implemented in hardware, (2) solutions which are equivalent under state permutations were not generated, and (3) solutions which show or are expected to show bad or trivial behavior were excluded as early as possible in a preselection phase. Exactly six different five-state algorithms could be detected which allow the creature to cross all empty cells for all the given initial configurations. We described this model in Verilog HDL and in AHDL. A hardware synthesizing tool transforms the description into a configuration file which was loaded into a field programmable gate array (FPGA). Hardware implementation offers a significant speed-up of many thousands compared to software.
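The creature model described above can be sketched in software as follows. This is a minimal sketch under stated assumptions: the grid is a list of strings with `#` for Obstacle and `.` for Empty, it is fully bordered by obstacles, and the state table maps `(state, front_blocked)` to `(next_state, turn)`. The names `simulate`, `EXAMPLE_GRID` and `EXAMPLE_TABLE` are hypothetical, and the example table is not one of the five-state algorithms found by the authors.

```python
from typing import Dict, List, Tuple

# Directions: 0 = north, 1 = east, 2 = south, 3 = west (row, column offsets).
OFFSETS = [(-1, 0), (0, 1), (1, 0), (0, -1)]

# (state, front_blocked) -> (next_state, turn), with turn "L" or "R".
StateTable = Dict[Tuple[int, bool], Tuple[int, str]]


def simulate(grid: List[str], table: StateTable, start: Tuple[int, int],
             direction: int, steps: int) -> int:
    """Run one creature and return the number of distinct cells visited."""
    row, col = start
    state = 0
    visited = {start}
    for _ in range(steps):
        d_row, d_col = OFFSETS[direction]
        front = (row + d_row, col + d_col)
        blocked = grid[front[0]][front[1]] == "#"
        state, turn = table[(state, blocked)]
        if not blocked:
            # Path is free: move ahead, the turn below happens simultaneously.
            row, col = front
            visited.add(front)
        # Blocked or not, the creature turns left or right.
        direction = (direction + (1 if turn == "R" else -1)) % 4
    return len(visited)


# Hypothetical two-state table and a three-cell corridor for illustration.
EXAMPLE_GRID = ["#####",
                "#...#",
                "#####"]
EXAMPLE_TABLE: StateTable = {
    (0, False): (1, "L"), (0, True): (0, "R"),
    (1, False): (0, "R"), (1, True): (0, "R"),
}

if __name__ == "__main__":
    print(simulate(EXAMPLE_GRID, EXAMPLE_TABLE, start=(1, 1), direction=1, steps=3))
```

An exhaustive search in the spirit of the abstract would run such a simulation for every candidate table and every initial configuration, which is the part the authors moved into FPGA hardware.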