Montse Peiron
Polytechnic University of Catalonia
Publication
Featured research published by Montse Peiron.
international symposium on computer architecture | 1992
Mateo Valero; Tomás Lang; José M. Llabería; Montse Peiron; Eduard Ayguadé; Juan J. Navarro
Address transformation schemes, such as skewing and linear transformations, have been proposed to achieve conflict-free vector access for some strides in vector processors with multi-module memories. In this paper, we extend these schemes to achieve this conflict-free access for a larger number of strides. The basic idea is to perform an out-of-order access to vectors of fixed length, equal to that of the vector registers of the processor. Both matched and unmatched memories are considered: we show that the number of strides is even larger for the latter case. The hardware for address calculations and access control is described and shown to be of similar complexity to that required for in-order access.
IEEE Transactions on Computers | 1995
Mateo Valero; Tomás Lang; Montse Peiron; Eduard Ayguadé
Address transformation schemes, such as skewing and linear transformations, have been proposed to achieve conflict-free access for streams with constant stride. However, this is achieved only for some strides. In this paper, we extend these schemes to achieve this conflict-free access for a larger number of strides. The basic idea is to perform an out-of-order access to a stream of fixed length. This stream is then stored in a local memory and used in subsequent instructions. This mode of operation is suitable for vector processors and for processors with decoupled access. The scheme and mode of operation proposed produce the largest possible number of conflict-free strides. Memory systems with any ratio between the number of memory modules and memory latency are considered. The hardware for address calculations and access control is described and shown to be of similar complexity to that required for in-order access.
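The out-of-order idea described in this abstract can be illustrated with a small simulation. The sketch below is not the authors' scheme or hardware. It assumes 8 memory modules, a module busy time of 8 cycles (a matched memory), a simple row-rotation skewing function, and a round-robin reordering, all chosen only for illustration. It compares the stall cycles of an in-order stride-2 access with those of a reordered access to the same 64 elements.

```python
# Illustrative simulation only; the skewing function, the parameters and the
# reordering policy are assumptions, not the scheme proposed in the paper.

def skewed_module(addr, num_modules):
    """Map an address to a module using a simple row-rotation skew (assumed)."""
    return (addr + addr // num_modules) % num_modules


def stall_cycles(addresses, num_modules, busy_time):
    """Issue one request per cycle; wait whenever the target module is busy."""
    free_at = [0] * num_modules  # first cycle at which each module is free
    cycle = 0
    stalls = 0
    for addr in addresses:
        m = skewed_module(addr, num_modules)
        if free_at[m] > cycle:
            stalls += free_at[m] - cycle
            cycle = free_at[m]
        free_at[m] = cycle + busy_time
        cycle += 1
    return stalls


def reorder_round_robin(addresses, num_modules):
    """Reorder requests so that modules are visited in a fixed cyclic order."""
    queues = [[] for _ in range(num_modules)]
    for addr in addresses:
        queues[skewed_module(addr, num_modules)].append(addr)
    order = []
    while any(queues):
        for q in queues:
            if q:
                order.append(q.pop(0))
    return order


NUM_MODULES = 8   # assumed number of memory modules
BUSY_TIME = 8     # assumed module busy time in cycles (matched memory)
LENGTH = 64       # assumed stream length (e.g. a vector register)
STRIDE = 2

in_order = [i * STRIDE for i in range(LENGTH)]
reordered = reorder_round_robin(in_order, NUM_MODULES)

if __name__ == "__main__":
    print("in-order stalls: ", stall_cycles(in_order, NUM_MODULES, BUSY_TIME))
    print("reordered stalls:", stall_cycles(reordered, NUM_MODULES, BUSY_TIME))
```

Under these assumptions the in-order sequence revisits a module before it is free again, while the reordered sequence visits each module exactly once every 8 cycles. Because the whole fixed-length stream is collected in a local memory before use, the order in which its elements arrive does not affect the instructions that consume it.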
international symposium on computer architecture | 1995
Montse Peiron; Mateo Valero; Eduard Ayguadé; Tomás Lang
The high latency of memory accesses is one of the factors that reduce the performance of current vector supercomputers. Conflicts in the memory modules, together with collisions in the interconnection network in the case of multiprocessors, significantly increase the execution time of applications. In this work we propose a memory access method for vector uniprocessors and multiprocessors that performs stream accesses with the smallest possible latency in the majority of cases. The basic idea is to arbitrate the memory access by defining the order in which the memory modules are visited. The stream elements are requested out of order. In addition, the access method also reduces the cost of the interconnection network.
international conference on supercomputing | 1994
Montse Peiron; Mateo Valero; Eduard Ayguadé
The synchronized and simultaneous access to several vectors that form a single stream is typical in SIMD vector multiprocessors as well as in MIMD superscalar multiprocessors with decoupled access. In this paper we propose a block-interleaved storage scheme and an out-of-order access mechanism that allows conflict-free access to streams with an arbitrary initial address and constant stride between elements. The memory system can have any degree of unmatchness and we consider the use of either a crossbar or a multistage interconnection network. A maximal number of conflict-free families including the most commonly used strides can be obtained. We describe the hardware for address calculation and control and show that their additional costs are minimal compared with the cost of the hardware for in-order access. Finally, we evaluate the applicability of this technique to real loops from some programs of the Perfect Club and SPEC suites.
Parallel Processing Letters | 1991
Mateo Valero; Tomás Lang; José M. Llabería; Montse Peiron; Juan J. Navarro; Eduard Ayguadé
Address transformation schemes, such as skewing and linear transformations, have been proposed to achieve conflict-free access to one family of strides in vector processors with matched memories. In this paper, we extend these schemes to achieve this conflict-free access for several families. The basic idea is to perform an out-of-order access to vectors of fixed length, equal to that of the vector registers of the processor. The hardware required is similar to that for in-order access.
joint international conference on vector and parallel processing | 1994
Mateo Valero; Montse Peiron; Eduard Ayguadé
In vector multiprocessor systems, collisions in the interconnection network and conflicts in the memory modules are the main causes of performance degradation. In this work we propose to synchronize the access to the memory system so that streams can be accessed with the minimum achievable latency if their elements are requested out of order. The mechanism uses a block-interleaved storage scheme and works for strides belonging to the most common families of strides found in real programs. The hardware required is also described and its complexity is shown to be equivalent to the complexity of the address generator when the processors request the elements in order.
Microprocessing and Microprogramming | 1993
Montse Peiron; Mateo Valero; Eduard Ayguadé; Tomás Lang
The simultaneous access to several vectors is typical in vector multiprocessors. When these accesses are performed in an asynchronous manner, collisions in the network and conflicts in the memory modules produce high latencies that reduce the efficiency of the system. In this paper we propose a block-interleaved storage scheme to store streams as well as a synchronized out-of-order access mechanism to the vectors that compose the stream, so that no access conflicts occur for several families of strides.
euromicro workshop on parallel and distributed processing | 1994
Mateo Valero; Montse Peiron; Eduard Ayguadé
The poor bandwidth obtained from memory when conflicts arise in the modules or in the interconnection network degrades the performance of computers. Address transformation schemes, such as interleaving, skewing and linear transformations, have been proposed to achieve conflict-free access for streams with constant stride. However, this is achieved only for some strides. In this paper, we summarize a mechanism that requests the elements out of order, which achieves conflict-free access for a larger number of strides. We study the cases of a single vector processor and of a vector multiprocessor system. For this latter case, we propose a synchronous mode of accessing memory that can be applied in SIMD machines or in MIMD systems with decoupled access and execution.
Parallel Processing Letters | 1994
Mateo Valero; Eduard Ayguadé; Montse Peiron
In vector multiprocessor systems, collisions in the interconnection network and conflicts in the memory modules are the main causes of performance degradation. In this work we use a synchronized interconnection network, and propose an interleaved storage scheme and an out-of-order access to the elements of the stream that allow conflict-free access. The streams are generated by the different processors in an asynchronous manner. The mechanism works for the most common strides found in real programs. The hardware required is also described and its complexity is shown to be equivalent to the complexity when the processor requests the elements in order.
euromicro workshop on parallel and distributed processing | 1993
Mateo Valero; Montse Peiron; Eduard Ayguadé
When accessing streams in vector multiprocessor machines, degradation in the interconnection network and conflicts in the memory modules are the factors that reduce the efficiency of the system. In this paper, we present a synchronous access mechanism that allows conflict-free access to streams in a SIMD vector multiprocessor system. Each processor accesses the corresponding elements out of order, in such a way that in each cycle the requested elements do not collide in the interconnection network. Moreover, memory modules are accessed so that conflicts are avoided. The use of the proposed mechanism in present-day architectures would allow conflict-free access to streams with the most common strides that appear in real applications. The additional hardware is described and is shown to be of similar complexity to that required for in-order access.