Tor Sørevik | Researchain

Publication

Featured research published by Tor Sørevik.

Parallel Computing | 2005

Load balancing and OpenMP implementation of nested parallelism

Ragnhild Blikberg; Tor Sørevik

Many problems have multiple layers of parallelism. The outer level may consist of a few coarse-grained tasks. Each of these tasks may itself be rich in parallelism and be split into a number of fine-grained tasks, which again may consist of even finer subtasks, and so on. Here we argue and demonstrate by examples that utilizing multiple layers of parallelism may give much better scaling than restricting oneself to a single level of parallelism. Two non-trivial issues for multi-level parallelism are load balancing and implementation. In this paper we provide an algorithm for finding good distributions of threads to tasks and discuss how to implement nested parallelism in OpenMP.
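As a rough illustration of what distributing threads to tasks can mean, the sketch below uses a simple greedy rule: every outer-level task gets one thread, and each remaining thread goes to the task whose per-thread load is currently largest. This is not the algorithm from the paper; the function name `distribute_threads`, the work estimates, and the assumption that a task with work `w` on `t` threads takes time `w / t` are all illustrative.

```python
import heapq


def distribute_threads(work, num_threads):
    """Greedy sketch: return threads[i], the number of threads given to task i.

    Assumes (hypothetically) that task i with work[i] units of work runs in
    time work[i] / threads[i], i.e. perfect scaling inside each task.
    """
    if num_threads < len(work):
        raise ValueError("need at least one thread per task")
    threads = [1] * len(work)
    # Max-heap on per-thread load (negated, since heapq is a min-heap).
    heap = [(-w, i) for i, w in enumerate(work)]
    heapq.heapify(heap)
    for _ in range(num_threads - len(work)):
        _, i = heapq.heappop(heap)      # task with the largest current load
        threads[i] += 1
        heapq.heappush(heap, (-work[i] / threads[i], i))
    return threads


if __name__ == "__main__":
    # Four outer tasks with unequal work, eight threads in total.
    print(distribute_threads([8, 4, 2, 2], 8))
```

With work estimates `[8, 4, 2, 2]` and eight threads, the heaviest task ends up with four threads and every team has the same per-thread load.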

SIAM Journal on Matrix Analysis and Applications | 1992

Efficient matrix multiplication on SIMD computers

Petter E. Bjørstad; Fredrik Manne; Tor Sørevik; M. Vajteršic

Efficient algorithms are described for matrix multiplication on SIMD computers. SIMD implementations of Winograd’s algorithm are considered in the case where additions are faster than multiplications. Classical kernels and the use of Strassen’s algorithm are also considered. Actual performance figures using the MasPar family of SIMD computers are presented and discussed.
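For readers unfamiliar with the trade-off mentioned here, the following sketch shows one way to trade multiplications for additions, assuming the abstract refers to Winograd's 1968 inner-product scheme. It is a plain sequential Python illustration, not the SIMD implementation from the paper; the function name `winograd_matmul` and the restriction to an even inner dimension are choices made for this example.

```python
def winograd_matmul(A, B):
    """Multiply A (m x n) by B (n x p) using Winograd's inner-product identity.

    For even n:
        sum_k a[2k]*b[2k] + a[2k+1]*b[2k+1]
          = sum_k (a[2k] + b[2k+1]) * (a[2k+1] + b[2k]) - xi - eta
    where xi depends only on the row of A and eta only on the column of B,
    so both can be precomputed once and reused.
    """
    m, n, p = len(A), len(B), len(B[0])
    if n % 2 != 0:
        raise ValueError("this sketch assumes an even inner dimension")
    half = n // 2
    # Row factors xi_i and column factors eta_j, computed once each.
    xi = [sum(A[i][2 * k] * A[i][2 * k + 1] for k in range(half)) for i in range(m)]
    eta = [sum(B[2 * k][j] * B[2 * k + 1][j] for k in range(half)) for j in range(p)]
    C = [[0] * p for _ in range(m)]
    for i in range(m):
        for j in range(p):
            acc = sum(
                (A[i][2 * k] + B[2 * k + 1][j]) * (A[i][2 * k + 1] + B[2 * k][j])
                for k in range(half)
            )
            C[i][j] = acc - xi[i] - eta[j]
    return C


if __name__ == "__main__":
    print(winograd_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Each entry of the product then needs about n/2 multiplications instead of n, at the price of extra additions, which pays off only when additions are the cheaper operation.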

Mathematics of Computation | 2004

Four-dimensional lattice rules generated by skew-circulant matrices

James N. Lyness; Tor Sørevik

We introduce the class of skew-circulant lattice rules. These are s-dimensional lattice rules that may be generated by the rows of an s × s skew-circulant matrix. (This is a minor variant of the familiar circulant matrix.) We present briefly some of the underlying theory of these matrices and rules. We are particularly interested in finding rules of specified trigonometric degree d. We describe some of the results of computer-based searches for optimal four-dimensional skew-circulant rules. Besides determining optimal rules for δ = d + 1 ≤ 47, we have constructed an infinite sequence of rules Q(4, δ) that has a limit rho index of 27/34 ≈ 0.79. This index is an efficiency measure, which cannot exceed 1, and is inversely proportional to the abscissa count.
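To make the "minor variant of the circulant matrix" concrete, here is a minimal sketch that builds a skew-circulant matrix from its first row, using the usual convention that each row is the previous one shifted right by one position, with the entry that wraps around changing sign. The function name `skew_circulant` is illustrative; how the paper turns such a matrix into a lattice rule is not reproduced here.

```python
def skew_circulant(first_row):
    """Return the s x s skew-circulant matrix with the given first row.

    Entry (i, j) is c[j - i] on and above the diagonal, and -c[s + j - i]
    below it (the wrapped-around entries are negated).
    """
    c = list(first_row)
    s = len(c)
    return [
        [c[j - i] if j >= i else -c[s + j - i] for j in range(s)]
        for i in range(s)
    ]


if __name__ == "__main__":
    for row in skew_circulant([1, 2, 3, 4]):
        print(row)
```

An ordinary circulant matrix is obtained by dropping the minus sign in the wrapped-around entries.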

Parallel Computing | 1996

Partitioning an Array onto a Mesh of Processors

Fredrik Manne; Tor Sørevik

Achieving an even load balance with a low communication overhead is a fundamental task in parallel computing. In this paper we consider the problem of partitioning an array into a number of blocks such that the maximum amount of work in any block is as low as possible. We review different proposed schemes for this problem and the complexity of their communication patterns. We present new approximation algorithms for computing a well-balanced generalized block distribution as well as an algorithm for computing an optimal semi-generalized block distribution. The various algorithms are tested and compared on a number of different matrices.

Journal of Algorithms | 1995

Optimal partitioning of sequences

Fredrik Manne; Tor Sørevik

The problem of partitioning a sequence of n real numbers into p intervals is considered. The goal is to find a partition such that the cost of the most expensive interval measured with a cost function ƒ is minimized. An efficient algorithm which solves the problem in time O((n − p)p log p) is developed. The algorithm is based on finding a sequence of feasible nonoptimal partitions, each having only one way it can be improved to get a better partition. Finally a number of related problems are considered and shown to be solvable by slight modifications of our main algorithm.
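The problem statement can be illustrated with a textbook dynamic program. This is not the O((n − p)p log p) algorithm of the paper; it is a slower O(n²p) sketch that assumes the cost of an interval is simply the sum of its elements and that every interval is non-empty. The function name `min_bottleneck_partition` is illustrative.

```python
from itertools import accumulate


def min_bottleneck_partition(values, p):
    """Split values into p non-empty consecutive intervals, minimizing the
    largest interval sum. Returns (bottleneck, boundaries), where interval k
    is values[boundaries[k]:boundaries[k + 1]].
    """
    n = len(values)
    if not 1 <= p <= n:
        raise ValueError("need 1 <= p <= len(values)")
    prefix = [0] + list(accumulate(values))   # prefix[i] = sum(values[:i])
    inf = float("inf")
    # best[k][i]: smallest bottleneck when values[:i] is cut into k intervals.
    best = [[inf] * (n + 1) for _ in range(p + 1)]
    cut = [[0] * (n + 1) for _ in range(p + 1)]
    best[0][0] = -inf
    for k in range(1, p + 1):
        for i in range(k, n + 1):
            for j in range(k - 1, i):         # last interval is values[j:i]
                cand = max(best[k - 1][j], prefix[i] - prefix[j])
                if cand < best[k][i]:
                    best[k][i], cut[k][i] = cand, j
    # Walk the stored cut points backwards to recover the interval boundaries.
    boundaries, i = [n], n
    for k in range(p, 0, -1):
        i = cut[k][i]
        boundaries.append(i)
    boundaries.reverse()
    return best[p][n], boundaries


if __name__ == "__main__":
    print(min_bottleneck_partition([2, 3, 5, 1, 4, 6], 3))
```

For the sequence `[2, 3, 5, 1, 4, 6]` and p = 3 no partition has a largest interval sum below 10, so the routine reports a bottleneck of 10 together with one set of boundaries that achieves it.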

Computing | 1991

A search program for finding optimal integration lattices

J. N. Lyness; Tor Sørevik

In this paper we describe some of the salient features of our search program for finding good lattices. The reciprocals of these lattices are used in lattice integration rules, of which number-theoretic rules form a major subset. We describe algorithms for ϱ(⋎), the Zaremba index (or figure of merit) of an integer lattice ⋎. We describe a search algorithm that finds ϱ(N), the maximum of ϱ(⋎) over lattices of order N. One feature of our search is that it can exploit the symmetry of ϱ without significantly slowing down the program to list symmetric copies. We have also developed other interactions between the search algorithm and the algorithm for ϱ(⋎) that have a significant effect on the speed of the program. The paper is theoretical, providing the mathematical basis for these algorithms. However, we give a list of all the three-dimensional good lattices of order not exceeding N = 4,000. This list has 68 entries, 40 of which are new.
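As a small illustration of the quantity being searched for, the sketch below computes a Zaremba index by brute force. It assumes the common definition, the minimum over nonzero lattice points h of the product of max(1, |h_i|), and it restricts attention to the special case of an integer lattice given by a single congruence h·z ≡ 0 (mod N), as arises for rank-1 (number-theoretic) rules. The names `zaremba_index`, `z` and `N` are illustrative, and nothing here reflects the paper's far more efficient algorithms.

```python
from itertools import product
from math import prod


def zaremba_index(z, N):
    """Brute-force Zaremba index of the lattice {h : h . z = 0 (mod N)}.

    (N, 0, ..., 0) always lies in this lattice, so the index is at most N,
    and any point with some |h_i| >= N cannot beat that bound. It therefore
    suffices to scan the box |h_i| <= N - 1.
    """
    s = len(z)
    best = N
    for h in product(range(-(N - 1), N), repeat=s):
        if any(h) and sum(hi * zi for hi, zi in zip(h, z)) % N == 0:
            best = min(best, prod(max(1, abs(hi)) for hi in h))
    return best


if __name__ == "__main__":
    # Two-dimensional Fibonacci-type example: N = 13, z = (1, 8).
    print(zaremba_index((1, 8), 13))
```

The scan visits (2N − 1)^s points, so it is only usable for very small N and s; it is meant to show the definition, not a practical search.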

Mathematics of Computation | 2006

Five-dimensional K-optimal lattice rules

James N. Lyness; Tor Sørevik

A major search program is described that has been used to determine a set of five-dimensional K-optimal lattice rules of enhanced trigonometric degrees up to 12. The program involved a distributed search, in which approximately 190 CPU-years were shared between more than 1,400 computers in many parts of the world.

Mathematics of Computation | 1991

Notes on integration and integer sublattices

J. N. Lyness; Tor Sørevik; P. Keast

A lattice rule is a quadrature rule over an s-dimensional hypercube, using N abscissas located on an integration lattice. In this paper the authors study sublattices and superlattices of integration lattices and of integer lattices. They exploit the properties of generator matrices of a lattice to provide an easy and elegant description of the relation between a lattice and a sublattice of given order. They also obtain necessary and sufficient criteria for existence of sublattices and information about the number of these.
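One standard way to see how generator matrices describe sublattices of a given order is through Hermite normal forms. The sketch below lists the two-dimensional case only, under the assumption that a sublattice of the integer lattice Z² is represented by the rows of an upper-triangular integer matrix [[a, b], [0, d]] with a·d equal to the index and 0 ≤ b < d. The function name `sublattices_2d` is illustrative, and this is not claimed to be the paper's construction.

```python
def sublattices_2d(k):
    """List row generator matrices, in Hermite normal form, of all
    sublattices of Z^2 with index k: ((a, b), (0, d)), a * d = k, 0 <= b < d.
    """
    if k < 1:
        raise ValueError("index must be a positive integer")
    return [
        ((a, b), (0, k // a))
        for a in range(1, k + 1)
        if k % a == 0
        for b in range(k // a)
    ]


if __name__ == "__main__":
    for gen in sublattices_2d(4):
        print(gen)
```

Because each sublattice has exactly one such normal form, the length of the list is the number of sublattices of that index; in two dimensions this equals the sum of the divisors of k.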

BIT Numerical Mathematics | 1989

A discussion of a new error estimate for adaptive quadrature

Terje O. Espelid; Tor Sørevik

D. P. Laurie previously introduced a new and sharper error estimate for adaptive quadrature routines, with the attractive property that the error is guaranteed to lie in a small interval if some constraints are satisfied. In this paper we discuss how to test whether or not the constraints are satisfied, and we report a selection of results from our tests with one-dimensional integrals to see how the error estimate works in practice. It turns out that we get a more economical routine using this error estimate, but the loss in reliability, even with the new tests, can be catastrophic.

Scientific Programming | 2001

Nested parallelism: Allocation of threads to tasks and OpenMP implementation

Ragnhild Blikberg; Tor Sørevik

In this paper we discuss the use of nested parallelism. Our claim is that if the problem naturally possesses multiple levels of parallelism, then applying parallelism to all levels may significantly enhance the scalability of the algorithm. This claim is supported by numerical experiments. We also discuss how to implement multi-level parallelism using OpenMP. We find current OpenMP implementations, based on version 1.0, to have severe limitations for implementing nested parallelization. We then show how this can be circumvented by explicitly assigning tasks to threads. Load balancing issues become more complicated with two (or more) levels of parallelism. To handle this problem, we have designed a distribution algorithm which groups threads into teams, each team being responsible for one coarse-grained outer-level task. This algorithm is proven to produce the optimal load balance, under given assumptions.
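The idea of explicitly assigning tasks to threads within a single flat parallel region can be sketched as index arithmetic: each thread works out from its own id which team it belongs to and which slice of the inner loop it owns. The sketch below is plain Python rather than OpenMP, and the helper names `team_of` and `inner_range` as well as the contiguous block split are assumptions for illustration, not the paper's code.

```python
from bisect import bisect_right
from itertools import accumulate


def team_of(thread_id, team_sizes):
    """Map a flat thread id to (team index, rank inside that team).

    team_sizes[k] is the number of threads serving outer-level task k;
    threads are numbered consecutively, team after team.
    """
    ends = list(accumulate(team_sizes))       # one past the last id of each team
    if not 0 <= thread_id < ends[-1]:
        raise ValueError("thread id out of range")
    team = bisect_right(ends, thread_id)
    return team, thread_id - (ends[team] - team_sizes[team])


def inner_range(n_iterations, team_size, rank):
    """Contiguous block of inner-loop iterations owned by one team member."""
    base, extra = divmod(n_iterations, team_size)
    start = rank * base + min(rank, extra)
    return range(start, start + base + (1 if rank < extra else 0))


if __name__ == "__main__":
    sizes = [4, 2, 1, 1]                      # eight threads in four teams
    for tid in range(sum(sizes)):
        team, rank = team_of(tid, sizes)
        print(tid, team, rank, list(inner_range(10, sizes[team], rank)))
```

In an OpenMP setting the same arithmetic would be evaluated by every thread from its thread number, so no nested parallel region is required.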