André Brinkmann | Researchain

Publication

Featured research published by André Brinkmann.

Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference on | 2009

Multi-level comparison of data deduplication in a backup scenario

Dirk Meister; André Brinkmann

Data deduplication systems detect redundancies between data blocks to either reduce storage needs or to reduce network traffic. A class of deduplication systems splits the data stream into data blocks (chunks) and then finds exact duplicates of these blocks. This paper compares the influence of different chunking approaches on multiple levels. On a macroscopic level, we compare the chunking approaches based on real-life user data in a weekly full backup scenario, both at a single point in time as well as over several weeks. In addition, we analyze how small changes affect the deduplication ratio for different file types on a microscopic level for chunking approaches and delta encoding. An intuitive assumption is that small semantic changes on documents cause only small modifications in the binary representation of files, which would imply a high ratio of deduplication. We will show that this assumption is not valid for many important file types and that application-specific chunking can help to further decrease storage capacity demands.
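The effect of the chunking approach on small edits can be illustrated with a minimal Python sketch. It is not the paper's implementation: the window hash (BLAKE2b over the last 16 bytes), the cut mask, the chunk sizes, and all function names are assumptions chosen for brevity. It compares fixed-size chunking with a simple content-defined chunking on random data after a one-byte insertion.

```python
import hashlib
import random


def fixed_chunks(data: bytes, size: int = 64) -> list:
    """Split data at fixed offsets, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_defined_chunks(data: bytes, window: int = 16,
                           mask: int = 0x3F, min_size: int = 16) -> list:
    """Cut after any position where a hash of the last `window` bytes
    has its low bits equal to zero (illustrative parameters, not tuned)."""
    assert min_size >= window
    chunks, start = [], 0
    for end in range(1, len(data) + 1):
        if end - start < min_size:
            continue  # enforce a minimum chunk size
        digest = hashlib.blake2b(data[end - window:end], digest_size=4).digest()
        if int.from_bytes(digest, "big") & mask == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])  # trailing remainder
    return chunks


def shared_fraction(chunker, old: bytes, new: bytes) -> float:
    """Fraction of the new version's chunks already present in the old one."""
    known = {hashlib.sha256(c).digest() for c in chunker(old)}
    new_chunks = chunker(new)
    hits = sum(hashlib.sha256(c).digest() in known for c in new_chunks)
    return hits / len(new_chunks)


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(8192))
edited = original[:100] + b"!" + original[100:]  # one-byte insertion

print("fixed-size:     ", shared_fraction(fixed_chunks, original, edited))
print("content-defined:", shared_fraction(content_defined_chunks, original, edited))
```

With fixed-size chunks, the insertion shifts every later chunk boundary, so almost no chunk of the edited version matches a stored one. The content-defined boundaries depend only on nearby bytes and realign shortly after the edit.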

International Conference on Cluster Computing | 2001

Simple routing strategies for adversarial systems

Baruch Awerbuch; Petra Berenbrink; André Brinkmann; Christian Scheideler

In this paper we consider the problem of delivering dynamically changing input streams in dynamically changing networks where both the topology and the input streams can change in an unpredictable way. In particular, we present two simple distributed balancing algorithms (one for packet injections and one for flow injections) and show that for the case of a single receiver these algorithms will always ensure that the number of packets or flow in the system is bounded at any time step, even for an injection process that completely saturates the capacities of the available edges and even if the network topology changes in a completely unpredictable way. We also show that the maximum number of packets or flow that can be in the system at any time is essentially best possible by providing a lower bound that holds for any online algorithm, whether distributed or not. Interestingly, our balancing algorithms behave well not only in a completely adversarial setting. We also show that in the other extreme of a static network and a static injection pattern the algorithms will converge to a point at which they achieve an average routing time that is close to the best possible average routing time that can be achieved by any strategy. This demonstrates that there are simple algorithms that can be efficient for very different scenarios.

IEEE Conference on Mass Storage Systems and Technologies | 2010

dedupv1: Improving deduplication throughput using solid state drives (SSD)

Dirk Meister; André Brinkmann

Data deduplication systems discover and remove redundancies between data blocks. The search for redundant data blocks is often based on hashing the content of a block and comparing the resulting hash value with already stored entries inside an index. The limited random IO performance of hard disks limits the overall throughput of such systems if the index does not fit into main memory. This paper presents the architecture of the dedupv1 deduplication system that uses solid-state drives (SSDs) to improve its throughput compared to disk-based systems. dedupv1 is designed to use the sweet spots of SSD technology (random reads and sequential operations), while avoiding random writes inside the data path. This is achieved by using a hybrid deduplication design. It is an inline deduplication system as it performs chunking and fingerprinting online and only stores new data, but it is able to delay much of the processing as well as IO operations.

Design, Automation, and Test in Europe | 2010

Non-intrusive virtualization management using libvirt

Matthias Bolte; Michael Sievers; Georg Birkenheuer; Oliver Niehörster; André Brinkmann

The success of server virtualization has led to the deployment of a huge number of virtual machines in today's data centers, making manual virtualization management very labor-intensive. The development of appropriate management solutions is hindered by the various management interfaces of different hypervisors. Therefore, uniform management can be simplified by a layer abstracting from these dedicated hypervisor interfaces. The libvirt management library provides such an interface to different hypervisors. Unfortunately, remote hypervisor management using libvirt has not been possible without altering the managed servers. To overcome this limitation, we have integrated remote hypervisor management facilities into the libvirt driver infrastructure for VMware ESX and Microsoft Hyper-V. This paper presents the resulting architecture as well as experiences gained during the implementation process.

Journal of Chemical Theory and Computation | 2014

The MoSGrid Science Gateway - A Complete Solution for Molecular Simulations

Jens Krüger; Richard Grunzke; Sandra Gesing; Sebastian Breuers; André Brinkmann; Luis de la Garza; Oliver Kohlbacher; Martin Kruse; Wolfgang E. Nagel; Lars Packschies; Ralph Müller-Pfefferkorn; Patrick Schäfer; Charlotta Schärfe; Thomas Steinke; Tobias Schlemmer; Klaus Warzecha; Andreas Zink; Sonja Herres-Pawlis

The MoSGrid portal offers an approach to carry out high-quality molecular simulations on distributed compute infrastructures to scientists with all kinds of background and experience levels. A user-friendly Web interface guarantees the ease-of-use of modern chemical simulation applications well established in the field. The usage of well-defined workflows annotated with metadata largely improves the reproducibility of simulations in the sense of good lab practice. The MoSGrid science gateway supports applications in the domains quantum chemistry (QC), molecular dynamics (MD), and docking. This paper presents the open-source MoSGrid architecture as well as lessons learned from its design.

IEEE International Conference on High Performance Computing Data and Analytics | 2012

A study on data deduplication in HPC storage systems

Dirk Meister; Jürgen Kaiser; André Brinkmann; Toni Cortes; Michael Kuhn; Julian M. Kunkel

Deduplication is a storage saving technique that is highly successful in enterprise backup environments. On a file system, a single data block might be stored multiple times across different files, for example, multiple versions of a file might exist that are mostly identical. With deduplication, this data replication is localized and redundancy is removed -- by storing data just once, all files that use identical regions refer to the same unique data. The most common approach splits file data into chunks and calculates a cryptographic fingerprint for each chunk. By checking if the fingerprint has already been stored, a chunk is classified as redundant or unique. Only unique chunks are stored. This paper presents the first study on the potential of data deduplication in HPC centers, which belong to the most demanding storage producers. We have quantitatively assessed this potential for capacity reduction for 4 data centers (BSC, DKRZ, RENCI, RWTH). In contrast to previous deduplication studies focusing mostly on backup data, we have analyzed over one PB (1212 TB) of online file system data. The evaluation shows that typically 20% to 30% of this online data can be removed by applying data deduplication techniques, peaking up to 70% for some data sets. This reduction can only be achieved by a subfile deduplication approach, while approaches based on whole-file comparisons only lead to small capacity savings.
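The chunk-and-fingerprint approach described in this abstract can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the tooling used in the study: fixed-size chunks, SHA-256 as the fingerprint, an in-memory dictionary as the index, and the function names are all chosen here for simplicity.

```python
import hashlib


def deduplicate(files: dict, chunk_size: int = 4096):
    """Store each unique chunk once; return (chunk store, per-file recipes)."""
    store = {}    # fingerprint -> chunk bytes (the "index" plus the data)
    recipes = {}  # file name -> ordered list of fingerprints
    for name, data in files.items():
        recipe = []
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            fp = hashlib.sha256(chunk).hexdigest()
            if fp not in store:      # unique chunk: store it
                store[fp] = chunk
            recipe.append(fp)        # redundant or not, the file refers to it
        recipes[name] = recipe
    return store, recipes


def restore(name: str, store: dict, recipes: dict) -> bytes:
    """Rebuild a file from its recipe."""
    return b"".join(store[fp] for fp in recipes[name])


def savings(files: dict, store: dict) -> float:
    """Fraction of the logical data that does not need to be stored."""
    logical = sum(len(d) for d in files.values())
    stored = sum(len(c) for c in store.values())
    return 1 - stored / logical


# Two versions of a file that differ in one of four 4 KiB blocks.
block = lambda k: bytes([k]) * 4096
files = {
    "v1": block(1) + block(2) + block(3) + block(4),
    "v2": block(1) + block(2) + block(9) + block(4),
}

store, recipes = deduplicate(files)
print("sub-file savings:  ", savings(files, store))

# Using a chunk as large as the file approximates whole-file comparison.
whole_store, _ = deduplicate(files, chunk_size=4 * 4096)
print("whole-file savings:", savings(files, whole_store))
```

In this toy input, five of the eight logical blocks are unique, so sub-file deduplication avoids storing three of them, while whole-file comparison finds nothing to remove because the two versions are not identical.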

ACM Symposium on Parallel Algorithms and Architectures | 2000

Efficient, distributed data placement strategies for storage area networks (extended abstract)

André Brinkmann; Kay Salzwedel; Christian Scheideler

In the last couple of years a dramatic growth of enterprise data storage capacity can be observed. As a result, new strategies have been sought that allow servers and storage to be centralized to better manage the explosion of data and the overall cost of ownership. Nowadays, a common approach is to combine storage devices into a dedicated network that is connected to LANs and/or servers. Such networks are usually called storage area networks (SAN). A very important aspect for these networks is scalability. If a SAN undergoes changes (for instance, due to insertions or removals of disks), it may be necessary to replace data in order to allow an efficient use of the system. To keep the influence of data replacements on the performance of the SAN small, this should be done as efficiently as possible. In this paper, we investigate the problem of evenly distributing and efficiently locating data in dynamically changing SANs. We consider two scenarios: (1) all disks have the same capacity, and (2) the capacities of the disks are allowed to be arbitrary. For both scenarios, we present placement strategies that are capable of locating blocks efficiently and that are able to quickly adjust the data placement to insertions or removals of disks or data blocks. Furthermore, we study how the performance of our placement strategies changes if we allow a certain amount of the disks' capacity to be wasted.

ACM Symposium on Parallel Algorithms and Architectures | 2002

Compact, adaptive placement schemes for non-uniform requirements

André Brinkmann; Kay Salzwedel; Christian Scheideler

In this paper we study the problem of designing compact, adaptive strategies for the distribution of objects among a heterogeneous set of servers. Ideally, such a strategy should allow the computation of the position of an object with a low time and space complexity, and it should be able to adapt with a near-minimum amount of replacements of objects to changes in the capabilities of the servers so that objects are always distributed among the servers according to their capabilities. Previous techniques are able to handle these requirements only in part. For example, standard hashing techniques can be used to achieve a non-uniform distribution of objects among a set of servers and the time and space efficient computation of the position of the objects, but they usually do not adapt well to a change in the capabilities. We present two strategies based on hashing that achieve all of the goals above. Furthermore, we give a list of applications for these strategies demonstrating that they can be used efficiently for distributed data management, web caches, and adaptive random graphs, which may be of interest for peer-to-peer networks.
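To make the stated goals concrete (placement proportional to server capabilities, cheap position computation, and few object movements when the server set changes), the following Python sketch uses weighted rendezvous hashing. This is a different, well-known hashing technique swapped in for illustration; it is not one of the two strategies presented in the paper. The server names, capacities, and helper names are hypothetical.

```python
import hashlib
import math


def _unit_hash(obj: str, server: str) -> float:
    """Deterministic pseudo-random value strictly between 0 and 1."""
    digest = hashlib.sha256(f"{server}|{obj}".encode()).digest()
    u = int.from_bytes(digest[:8], "big") >> 11  # keep 53 bits
    return (u + 1) / (2**53 + 1)


def place(obj: str, capacities: dict) -> str:
    """Return the server with the highest weighted score for obj.

    score = -capacity / ln(h), with h in (0, 1); larger capacities
    win proportionally more often.
    """
    return max(capacities,
               key=lambda s: -capacities[s] / math.log(_unit_hash(obj, s)))


# Hypothetical servers; s2 has twice the capacity of the others.
capacities = {"s1": 1.0, "s2": 2.0, "s3": 1.0}
objects = [f"obj{i}" for i in range(10000)]

before = {o: place(o, capacities) for o in objects}
share_s2 = sum(1 for s in before.values() if s == "s2") / len(objects)
print("share on s2:", share_s2)  # close to 2 / (1 + 2 + 1)

# Remove s3: only objects that lived on s3 should move.
remaining = {s: c for s, c in capacities.items() if s != "s3"}
after = {o: place(o, remaining) for o in objects}
moved = [o for o in objects if before[o] != after[o]]
print("moved objects:", len(moved))
```

The position of an object is computed from the object name and the capacity table alone, so no per-object directory is needed. Removing a server relocates only the objects that were stored on it.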

Grid Computing | 2012

A Single Sign-On Infrastructure for Science Gateways on a Use Case for Structural Bioinformatics

Sandra Gesing; Richard Grunzke; Jens Krüger; Georg Birkenheuer; Martin Wewior; Patrick Schäfer; Bernd Schuller; Johannes Schuster; Sonja Herres-Pawlis; Sebastian Breuers; Ákos Balaskó; Miklos Kozlovszky; Anna Szikszay Fabri; Lars Packschies; Péter Kacsuk; Dirk Blunk; Thomas Steinke; André Brinkmann; Gregor Fels; Ralph Müller-Pfefferkorn; René Jäkel; Oliver Kohlbacher

Structural bioinformatics applies computational methods to analyze and model three-dimensional molecular structures. There is a huge number of applications available to work with structural data on a large scale. Using these tools on distributed computing infrastructures (DCIs), however, is often complicated due to a lack of suitable interfaces. The MoSGrid (Molecular Simulation Grid) science gateway provides an intuitive user interface to several widely-used applications for structural bioinformatics, molecular modeling, and quantum chemistry. It ensures the confidentiality, integrity, and availability of data via a granular security concept, which covers all layers of the infrastructure. The security concept applies SAML (Security Assertion Markup Language) and allows trust delegation from the user interface layer across the high-level middleware layer and the Grid middleware layer down to the HPC facilities. SAML assertions had to be integrated into the MoSGrid infrastructure in several places: the workflow-enabled Grid portal WS-PGRADE (Web Services Parallel Grid Runtime and Developer Environment), the gUSE (Grid User Support Environment) DCI services, and the cloud file system XtreemFS. The presented security infrastructure allows a single sign-on process to all involved DCI components and, therefore, lowers the hurdle for users to utilize large HPC infrastructures for structural bioinformatics.

Parallel Distributed and Network Based Processing | 2002

Dynamically reconfigurable system-on-programmable-chip

Heiko Kalte; Dominik Langen; Erik Vonnahme; André Brinkmann; Ulrich Rückert

Today's high-density FPGAs and intellectual property (IP) components enable the integration of complex systems in one programmable chip. New design strategies and concepts have to be developed in order to utilize the new system-level integration facilities. The approach introduced in this paper describes the implementation of a communication infrastructure that provides a number of on-chip IP-sockets. By using the FPGA feature of partial dynamic reconfiguration, different IP components can be plugged into these sockets at run-time. This leads to a reconfigurable system that can be adapted to varying demands. In this context, we designed a 32-bit RISC processor and an AMBA (Advanced Microcontroller Bus Architecture) on-chip interconnection bus. Finally, we mapped these components onto a reconfigurable system-level FPGA. The resulting hardware sizes and the utilization of the FPGA's resources are presented.
