Niels Nes

Centrum Wiskunde & Informatica

Publication

Featured research published by Niels Nes.

international conference on data engineering | 2006

Super-Scalar RAM-CPU Cache Compression

Marcin Zukowski; Sándor Héman; Niels Nes; Peter A. Boncz

High-performance data-intensive query processing tasks like OLAP, data mining or scientific data analysis can be severely I/O bound, even when high-end RAID storage systems are used. Compression can alleviate this bottleneck only if encoding and decoding speeds significantly exceed RAID I/O bandwidth. For this purpose, we propose three new versatile compression schemes (PDICT, PFOR, and PFOR-DELTA) that are specifically designed to extract maximum IPC from modern CPUs. We compare these algorithms with compression techniques used in (commercial) database and information retrieval systems. Our experiments on the MonetDB/X100 database system, using both DSM and PAX disk storage, show that these techniques strongly accelerate TPC-H performance to the point that the I/O bottleneck is eliminated.
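The abstract names the schemes without describing them. As a rough illustration of the patched frame-of-reference idea behind PFOR, the sketch below codes each value as a small offset from a base and keeps values that do not fit as exceptions, which are patched in after a loop that decodes every slot without testing for exceptions. The function names, the separate exception list, and the fixed `base` and `bits` parameters are assumptions made for this example; the paper's actual storage layout and bit packing are not reproduced here.

```python
from typing import List, Tuple

Exception_ = Tuple[int, int]  # (position, original value); hypothetical layout


def pfor_encode(values: List[int], base: int, bits: int) -> Tuple[List[int], List[Exception_]]:
    """Code each value as an offset from `base` that fits in `bits` bits.

    Values whose offset does not fit are recorded as exceptions and get a
    placeholder code of 0.
    """
    limit = 1 << bits
    codes: List[int] = []
    exceptions: List[Exception_] = []
    for pos, value in enumerate(values):
        offset = value - base
        if 0 <= offset < limit:
            codes.append(offset)
        else:
            codes.append(0)  # placeholder, overwritten during patching
            exceptions.append((pos, value))
    return codes, exceptions


def pfor_decode(codes: List[int], exceptions: List[Exception_], base: int) -> List[int]:
    """Decode in two phases: decode every slot, then patch the exceptions."""
    # Phase 1: every slot is decoded the same way, with no per-value test.
    out = [base + code for code in codes]
    # Phase 2: overwrite the (hopefully few) exception positions.
    for pos, value in exceptions:
        out[pos] = value
    return out
```

Splitting decoding into a uniform loop and a separate patch loop keeps data-dependent branches out of the hot path, which is the kind of CPU-friendly behavior the abstract refers to.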

international conference on management of data | 2002

Efficient k-NN search on vertically decomposed data

Arjen P. de Vries; Nikos Mamoulis; Niels Nes; Martin L. Kersten

Applications like multimedia retrieval require efficient support for similarity search on large data collections. Yet nearest neighbor search is a difficult problem in high-dimensional spaces, rendering efficient applications hard to realize: index structures degrade rapidly with increasing dimensionality, while sequential search is not an attractive solution for repositories with millions of objects. This paper approaches the problem from a different angle. A solution is sought in an unconventional storage scheme that opens up a new range of techniques for processing k-NN queries, especially suited for high-dimensional spaces. The suggested (physical) database design accommodates a novel variant of branch-and-bound search that quickly reduces the high-dimensional space to a small candidate set. The paper provides insight into applying this idea to k-NN search using two similarity metrics commonly encountered in image database applications, and discusses techniques for its implementation in relational database systems. The effectiveness of the proposed method is evaluated empirically on both real and synthetic data sets, showing significant improvements in response time.
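A minimal sketch of how branch-and-bound pruning can work when each dimension is stored as its own column. It assumes squared Euclidean distance and feature values normalized to [0, 1]; both are assumptions made for this example, as is the function name `knn_vertical`, and the code is not the paper's algorithm. After each column is scanned, an object's partial distance is a lower bound on its final distance, and the partial distance plus the worst-case contribution of the unread columns is an upper bound. Any object whose lower bound exceeds the k-th smallest upper bound can be dropped.

```python
import heapq
from typing import Dict, List


def knn_vertical(columns: List[List[float]], query: List[float], k: int) -> List[int]:
    """Return the ids of the k nearest objects, scanning one column at a time.

    columns[j][i] is the value of object i in dimension j, assumed in [0, 1].
    """
    n = len(columns[0])
    d = len(columns)
    # Worst-case squared difference per dimension for values in [0, 1].
    worst = [max(q * q, (1.0 - q) * (1.0 - q)) for q in query]
    # remaining[j] = worst-case contribution of dimensions j .. d-1.
    remaining = [0.0] * (d + 1)
    for j in range(d - 1, -1, -1):
        remaining[j] = remaining[j + 1] + worst[j]

    partial: Dict[int, float] = {i: 0.0 for i in range(n)}
    for j, column in enumerate(columns):
        # Only surviving candidates are touched in this column.
        for i in partial:
            diff = column[i] - query[j]
            partial[i] += diff * diff
        if len(partial) > k:
            # k-th smallest upper bound on the final distance.
            kth_upper = heapq.nsmallest(k, partial.values())[-1] + remaining[j + 1]
            # Drop objects whose lower bound already exceeds that bound.
            partial = {i: p for i, p in partial.items() if p <= kth_upper}
    return sorted(partial, key=lambda i: (partial[i], i))[:k]
```

The candidate set can only shrink as more columns are read, so later columns are accessed for fewer objects, which is where the vertical layout pays off.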

international conference on management of data | 2010

Positional update handling in column stores

Sándor Héman; Marcin Zukowski; Niels Nes; Lefteris Sidirourgos; Peter A. Boncz

In this paper we investigate techniques that allow for on-line updates to columnar databases, leaving intact their high read-only performance. Rather than keeping differential structures organized by the table key values, the core proposition of this paper is that this can better be done by keeping track of the tuple position of the modifications. Not only does this minimize the computational overhead of merging differences into read-only queries, but it also makes the differential structure oblivious of the value of the order keys, allowing it to avoid disk I/O for retrieving the order keys in read-only queries that otherwise do not need them - a crucial advantage for a column-store. We describe a new data structure for maintaining such positional updates, called the Positional Delta Tree (PDT), and describe detailed algorithms for PDT/column merging, updating PDTs, and for using PDTs in transaction management. In experiments with a columnar DBMS, we perform microbenchmarks on PDTs, and show in a TPC-H workload that PDTs allow quick on-line updates, yet significantly reduce their performance impact on read-only queries compared with classical value-based differential methods.
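To make the positional idea concrete, the sketch below merges a list of pending updates into a scan of a single column using tuple positions only, so no key values are read or compared. It uses a flat, position-sorted list rather than a tree and is therefore not the PDT itself; the function name `merge_scan`, the delta tuple layout, and the `'ins'`/`'del'`/`'mod'` tags are assumptions made for this example.

```python
from typing import Any, List, Optional, Tuple

# (stable position, kind, value); kind is 'ins', 'del' or 'mod'.
# An 'ins' at position p places the new value before the stored tuple at p.
Delta = Tuple[int, str, Optional[Any]]


def merge_scan(column: List[Any], deltas: List[Delta]) -> List[Any]:
    """Apply position-sorted deltas while scanning the stored column once."""
    out: List[Any] = []
    next_delta = 0
    for pos, value in enumerate(column):
        deleted = False
        # Consume every delta registered at this position.
        while next_delta < len(deltas) and deltas[next_delta][0] == pos:
            _, kind, new_value = deltas[next_delta]
            if kind == "ins":
                out.append(new_value)
            elif kind == "del":
                deleted = True
            elif kind == "mod":
                value = new_value
            next_delta += 1
        if not deleted:
            out.append(value)
    # Inserts positioned past the last stored tuple are appended at the end.
    while next_delta < len(deltas):
        _, kind, new_value = deltas[next_delta]
        if kind == "ins":
            out.append(new_value)
        next_delta += 1
    return out
```

Because the merge step only compares integer positions, a query that reads one column never has to fetch the sort-key columns just to apply updates.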

International Feminist Journal of Politics | 2011

SciQL, a query language for science applications

Martin L. Kersten; Ying Zhang; Milena Ivanova; Niels Nes

Scientific applications are still poorly served by contemporary relational database systems. At best, the system provides a bridge towards an external library using user-defined functions, explicit import/export facilities or linked-in Java/C# interpreters. The time has come to rectify this with SciQL, a SQL query language for scientific applications with arrays as first-class citizens. It provides a seamless symbiosis of array-, set-, and sequence-interpretation using a clear separation of the mathematical object from its underlying implementation. A key innovation is to extend value-based grouping in SQL:2003 with structural grouping, i.e., fixed-sized and unbounded groups based on explicit relationships between their dimension attributes. It leads to a generalization of window-based query processing with wide applicability in science domains. This paper is focused on the language features, extensively illustrated with examples of its intended use.

statistical and scientific database management | 2007

MonetDB/SQL Meets SkyServer: the Challenges of a Scientific Database

Milena Ivanova; Niels Nes; Romulo Goncalves; Martin L. Kersten

This paper presents our experiences in porting the Sloan Digital Sky Survey (SDSS)/SkyServer to the state-of-the-art open source database system MonetDB/SQL. SDSS acts as a well-documented benchmark for scientific database management. We have achieved a fully functional prototype for the personal SkyServer, to be downloaded from our site. The lessons learned are 1) the column store approach of MonetDB demonstrates a great potential in the world of scientific databases. However, the application also challenged the functionality of our implementation and revealed that a fully operational SQL environment is needed, e.g. including persistent stored modules; 2) the initial performance is competitive with the reference platform, MS SQL Server 2005, and 3) the analysis of SDSS query traces hints at several techniques to boost performance by utilizing repetitive behavior and zoom-in/zoom-out access patterns, that are currently not captured by the system.

international conference on data engineering | 2008

Adaptive Segmentation for Scientific Databases

Milena Ivanova; Martin L. Kersten; Niels Nes

In this paper we explore database segmentation in the context of a column-store DBMS targeted at a scientific database. We present a novel hardware- and scheme-oblivious segmentation algorithm, which learns and adapts to the workload immediately. The approach taken is to capitalize on (intermediate) query results, such that future queries benefit from a more appropriate data layout. The algorithm is implemented as an extension of a complete DBMS and evaluated against a real-life workload. It demonstrates significant performance gains without DBA assistance.

european conference on parallel processing | 2001

Macro- and Micro-parallelism in a DBMS

Martin L. Kersten; Stefan Manegold; Peter A. Boncz; Niels Nes

Large memories have become an affordable storage medium for databases involving hundreds of Gigabytes on multi-processor systems. In this short note, we review our research on building relational engines to exploit this major shift in hardware perspective. It illustrates that key design issues related to parallelism pose architectural problems at all levels of a system architecture, and that their impact is not easily predictable. The sheer size/complexity of a relational DBMS and the sliding requirements of frontier applications indicate that a substantial research agenda remains wide open.

Publications of the Astronomical Society of the Pacific | 2016

Column store for GWAC: a high-cadence, high-density, large-scale astronomical light curve pipeline and distributed shared-nothing database

Meng Wan; Chao Wu; Jing Wang; Y.-L. Qiu; L. P. Xin; Sjoerd Mullender; Hannes Mühleisen; Bart Scheers; Ying Zhang; Niels Nes; Martin L. Kersten; Yongpan Huang; J. S. Deng; Jian-Yan Wei

The ground-based wide-angle camera array (GWAC), a part of the SVOM space mission, will search for various types of optical transients by continuously imaging a field of view (FOV) of 5000 square degrees every 15 s. Each exposure consists of 36 × 4k × 4k pixels, typically resulting in 36 × ~175,600 extracted sources. For a modern time-domain astronomy project like GWAC, which produces massive amounts of data with a high cadence, it is challenging to search for short timescale transients in both real-time and archived data, and to build long-term light curves for variable sources. Here, we develop a high-cadence, high-density light curve pipeline (HCHDLP) to process the GWAC data in real-time, and design a distributed shared-nothing database to manage the massive amount of archived data which will be used to generate a source catalog with more than 100 billion records during 10 years of operation. First, we develop HCHDLP based on the column-store DBMS of MonetDB, taking advantage of MonetDB's high performance when applied to massive data processing. To realize the real-time functionality of HCHDLP, we optimize the pipeline in its source association function, including both time and space complexity from outside the database (SQL semantic) and inside (RANGE-JOIN implementation), as well as in its strategy of building complex light curves. The optimized source association function is accelerated by three orders of magnitude. Second, we build a distributed database using a two-level time partitioning strategy via the MERGE TABLE and REMOTE TABLE technology of MonetDB. Intensive tests validate that our database architecture is able to achieve both linear scalability in response time and concurrent access by multiple users. In summary, our studies provide guidance for a solution to GWAC in real-time data processing and management of massive data.

very large data bases | 2008