Per-Åke Larson | Researchain

Publication

Featured research published by Per-Åke Larson.

international conference on data engineering | 2001

B-tree indexes and CPU caches

Goetz Graefe; Per-Åke Larson

Since many existing techniques for exploiting CPU caches in the implementation of B-tree indexes have not been discussed in the literature, most of them are surveyed. Rather than providing a detailed performance evaluation for one or two of them on some specific contemporary hardware, the purpose is to survey and to make widely available this heretofore-folkloric knowledge in order to enable, structure, and hopefully stimulate future research.

ACM Transactions on Database Systems | 1982

Performance analysis of linear hashing with partial expansions

Per-Åke Larson

Linear hashing with partial expansions is a new file organization primarily intended for files which grow and shrink dynamically. This paper presents a mathematical analysis of the expected performance of the new scheme. The following measures are considered: length of successful and unsuccessful searches, accesses required to insert or delete a record, and the size of the overflow area. The performance is cyclical. For all performance measures, the necessary formulas are derived for computing the expected performance at any point of a cycle and the average over a cycle. Furthermore, the expected worst case in connection with searching is analyzed. The overall performance depends on several file parameters. The numerical results show that for many realistic parameter combinations the performance is expected to be extremely good. Even the longest search is expected to be of quite reasonable length.

international conference on management of data | 1991

Multi-disk B-trees

Bernhard Seeger; Per-Åke Larson

Dept. of Computer Science, University of Waterloo, Waterloo, Ontario, Canada, N2L 3G1

In this paper, we consider how to exploit multiple disks to improve the performance of B-tree structured files. Attention is paid both to the response time of individual operations and to the throughput of the system in a multi-user environment. We begin with a survey of three different approaches to designing multi-disk B-trees: distributing records among disks, using large multi-disk pages, and distributing pages among disks. For each approach, several alternatives are discussed and their main advantages and disadvantages are identified. We then propose a new scheme, based on page distribution, that is intended to provide a better local balancing of the request load than previous schemes. Preliminary performance results confirm that this improves both response time and throughput.

ACM Transactions on Database Systems | 1988

Linear hashing with separators—a dynamic hashing scheme achieving one-access retrieval

Per-Åke Larson

A new dynamic hashing scheme is presented. Its most outstanding feature is that any record can be retrieved in exactly one disk access. This is achieved by using a small amount of supplemental internal storage that stores enough information to uniquely determine the current location of any record. The amount of internal storage required is small: typically one byte for each page of the file. The necessary address computation, insertion, and expansion algorithms are presented and the performance is studied by means of simulation. The new method is the first practical method offering one-access retrieval for large dynamic files.

IEEE Transactions on Knowledge and Data Engineering | 1989

Performance of B+-trees with partial expansions

Ricardo A. Baeza-Yates; Per-Åke Larson

The authors mathematically analyze the behavior of the B+-tree with partial expansions file structure under random insertions, focusing on the expected storage utilization and the expected cost of insertions. The model can be used for studying both the asymptotic and dynamic behavior. The accuracy of the model is confirmed by simulation. Disk space management is found to be more difficult than for standard B+-trees. Two simple space-management schemes specifically designed for handling buckets of two different sizes are investigated. It is found that an overall storage utilization of 81% can be achieved in practice.

foundations of computer science | 1985

Robin hood hashing

Pedro Celis; Per-Åke Larson; J. Ian Munro

This paper deals with hash tables in which conflicts are resolved by open addressing. The initial contribution is a very simple insertion procedure which (in comparison to the standard approach) has the effect of dramatically reducing the variance of the number of probes required for a search. This leads to a new search procedure which requires only a constant number of probes, on average, even for full tables. Finally, an extension to these methods yields a new, simple way of performing deletions and subsequent insertions. Experimental results strongly indicate little degeneration in search time. In particular, deletions and successful searches appear to require constant time (< 2.57 probes) and insertions and unsuccessful searches, O(log n).
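The insertion rule described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' procedure: it assumes linear probing and stores each key's probe distance next to the key, neither of which the abstract specifies, and the names `rh_insert` and `rh_contains` are hypothetical. The lookup is a plain early-exit scan, not the constant-probe search procedure the paper proposes.

```python
def rh_insert(table, key):
    """Insert key into an open-addressing table using the Robin Hood rule.

    Each slot holds None or a (key, distance) pair, where distance is the
    number of probes the key is away from its home slot.
    Assumption: linear probing; the abstract does not fix a probe sequence.
    """
    n = len(table)
    pos = hash(key) % n
    dist = 0
    for _ in range(n):
        slot = table[pos]
        if slot is None:
            table[pos] = (key, dist)
            return True
        resident, resident_dist = slot
        if resident == key:
            return True  # already present
        if resident_dist < dist:
            # The resident is closer to home than the incoming key, so the
            # incoming key takes the slot and the resident moves on.
            table[pos] = (key, dist)
            key, dist = resident, resident_dist
        pos = (pos + 1) % n
        dist += 1
    return False  # table is full


def rh_contains(table, key):
    """Return True if key is in the table (simple early-exit lookup)."""
    n = len(table)
    pos = hash(key) % n
    for dist in range(n):
        slot = table[pos]
        # A key stored here with a smaller distance means our key would
        # have displaced it, so our key cannot be further along.
        if slot is None or slot[1] < dist:
            return False
        if slot[0] == key:
            return True
        pos = (pos + 1) % n
    return False


table = [None] * 8
for k in (0, 1, 8):  # 0 and 8 share home slot 0; 1 has home slot 1
    rh_insert(table, k)
```

In this small run, key 8 displaces key 1 from slot 1, so both end up one probe from home instead of leaving one key two probes away. That evening-out of probe distances is the variance reduction the abstract refers to.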

international conference on management of data | 1998

Memory management during run generation in external sorting

Per-Åke Larson; Goetz Graefe

If replacement selection is used in an external mergesort to generate initial runs, individual records are deleted and inserted in the sort operation's workspace. Variable-length records introduce the need for possibly complex memory management and extra copying of records. As a result, few systems employ replacement selection, even though it produces longer runs than commonly used algorithms. We experimentally compared several algorithms and variants for managing this workspace. We found that the simple best fit algorithm achieves memory utilization of 90% or better and run lengths over 1.8 times workspace size, with no extra copying of records and very little other overhead, for widely varying record sizes and for a wide range of memory sizes. Thus, replacement selection is a viable algorithm for commercial database systems, even for variable-length records. Efficient memory management also enables an external sort algorithm that degrades gracefully when its input is only slightly larger than or a small multiple of the available memory size. This is not the case with the usual implementations of external sorting, which incur I/O for the entire input even if it is as little as one record larger than memory. Thus, in some cases, our techniques may reduce I/O volume by a factor of 10 compared to traditional database sort algorithms. Moreover, the gradual rather than abrupt growth in I/O volume for increasing input sizes significantly eases design and implementation of intra-query memory management policies.
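For readers unfamiliar with replacement selection, the following Python sketch shows the basic run-generation loop. It is a simplification under stated assumptions: the workspace is measured in number of records rather than bytes, so it sidesteps the variable-length memory management (best fit) that the paper actually studies, and the function name `replacement_selection` is hypothetical.

```python
import heapq
from itertools import islice


def replacement_selection(records, workspace_size):
    """Split records into sorted runs using replacement selection.

    Assumption: the workspace holds a fixed number of records; the paper's
    byte-level memory management for variable-length records is not modeled.
    """
    if workspace_size < 1:
        raise ValueError("workspace_size must be at least 1")
    it = iter(records)
    # Heap entries are (run number, record): records tagged for a later run
    # sort after every record that still belongs to the current run.
    heap = [(0, rec) for rec in islice(it, workspace_size)]
    heapq.heapify(heap)
    exhausted = object()  # marker for end of input
    runs, run, current_run = [], [], 0
    while heap:
        run_no, rec = heapq.heappop(heap)
        if run_no != current_run:
            runs.append(run)
            run, current_run = [], run_no
        run.append(rec)
        nxt = next(it, exhausted)
        if nxt is not exhausted:
            # A record smaller than the one just written would break the
            # sort order of the current run, so it waits for the next run.
            target = run_no if nxt >= rec else run_no + 1
            heapq.heappush(heap, (target, nxt))
    if run:
        runs.append(run)
    return runs


runs = replacement_selection([5, 1, 9, 2, 8, 3, 7], workspace_size=3)
# runs == [[1, 2, 5, 8, 9], [3, 7]]
```

With a workspace of three records, the first run here contains five records, which illustrates why replacement selection tends to produce runs longer than the workspace.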

IEEE Transactions on Knowledge and Data Engineering | 1996

Speeding up external mergesort

LuoQuan Zheng; Per-Åke Larson

External mergesort is normally implemented so that each run is stored contiguously on disk and blocks of data are read exactly in the order they are needed during merging. We investigate two ideas for improving the performance of external mergesort: interleaved layout and a new reading strategy. Interleaved layout places blocks from different runs in consecutive disk addresses. This is done in the hope that interleaving will reduce seek overhead during merging. The new reading strategy precomputes the order in which data blocks are to be read according to where they are located on disk and when they are needed for merging. Extra buffer space makes it possible to read blocks in an order that reduces seek overhead, instead of reading them exactly in the order they are needed for merging. A detailed simulation model was used to compare the two layout strategies and three reading strategies. The effects of using multiple work disks were also investigated. We found that, in most cases, interleaved layout does not improve performance, but that the new reading strategy consistently performs better than double buffering and forecasting.

international conference on data engineering | 2002

Data reduction by partial preaggregation

Per-Åke Larson

Partial preaggregation is a simple data reduction operator that can be applied to aggregation queries. Whenever we group and aggregate on a column set G, we can preaggregate on any column set that functionally determines G. Preaggregation can be used, for example, to reduce the input size to a join. Regular aggregation reduces the input to one record per group. Partial preaggregation exploits the fact that preaggregation need not be complete: if multiple records happen to be output for a group, they will be combined into the same group by the final aggregation. This paper describes a straightforward hash-based algorithm for partial preaggregation, discusses where it can be applied, and derives a mathematical model for estimating the output size. The effectiveness of the technique and the accuracy of the model are shown on both artificial and real data. It is also shown how to reduce memory requirements by combining partial preaggregation with the input phase of a subsequent join or sort operator. Partial preaggregation has been implemented, in part, in Microsoft SQL Server.
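The idea that preaggregation need not be complete can be illustrated with a small hash-based sketch in Python. This is not the paper's algorithm: the eviction policy (output the oldest group when the table is full), the SUM aggregate, and the names `partial_preaggregate` and `final_aggregate` are all assumptions made for the example.

```python
def partial_preaggregate(rows, max_groups):
    """Yield (key, partial_sum) pairs using a hash table of bounded size.

    Assumptions: SUM is the aggregate, and when the table is full the
    oldest group is output to make room (Python dicts keep insertion order).
    """
    if max_groups < 1:
        raise ValueError("max_groups must be at least 1")
    table = {}
    for key, value in rows:
        if key in table:
            table[key] += value          # combine with the group in memory
        else:
            if len(table) >= max_groups:
                oldest = next(iter(table))
                yield oldest, table.pop(oldest)   # emit a partial result
            table[key] = value
    yield from table.items()             # flush what is left at end of input


def final_aggregate(partials):
    """Combine partial results so each group appears exactly once."""
    totals = {}
    for key, value in partials:
        totals[key] = totals.get(key, 0) + value
    return totals


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("a", 5), ("b", 6)]
partials = list(partial_preaggregate(rows, max_groups=2))
totals = final_aggregate(partials)
```

Here the six input rows shrink to five partial rows even though only two groups fit in memory, and groups "a" and "b" each appear twice in the partial output. The final aggregation merges those duplicates, so the result matches a full aggregation over the original rows.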

international conference on management of data | 1989

A file structure supporting traversal recursion

Per-Åke Larson; Vinay Deshpande

Traversal recursion is a class of recursive queries where the evaluation of the query involves traversal of a graph or a tree. This limited type of recursion arises in many applications. In this report we investigate a simple file structure that efficiently supports traversal recursion over large, acyclic graphs. The nodes of the graph are sorted in topological order and stored in a B-tree. Hence, traversal of the graph can be done in a single scan. Nodes and edges can also be inserted, deleted, and modified efficiently.
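The single-scan property can be sketched as follows in Python. This is an illustration only: a plain list stands in for the B-tree (an assumption), each entry is a hypothetical (node, successors) pair, and the function name `descendants` is made up for the example. The list must already be in topological order, so every edge points from an earlier entry to a later one.

```python
def descendants(nodes_in_topo_order, start):
    """Return start and every node reachable from it, in one forward scan.

    Assumption: nodes_in_topo_order is a list of (node, successors) pairs
    sorted topologically; it stands in for the B-tree of the abstract.
    """
    reached = {start}
    result = []
    for node, successors in nodes_in_topo_order:
        if node in reached:
            # All predecessors of this node were seen earlier in the scan,
            # so its membership in `reached` is already final.
            result.append(node)
            reached.update(successors)
    return result


# Edges: a->b, a->c, b->d, c->d, e->d
graph = [
    ("a", ["b", "c"]),
    ("e", ["d"]),
    ("b", ["d"]),
    ("c", ["d"]),
    ("d", []),
]
from_a = descendants(graph, "a")
from_b = descendants(graph, "b")
```

Because predecessors always precede successors in the stored order, the scan never has to revisit an earlier entry, which is what makes a single pass sufficient on an acyclic graph.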
