Publication


Featured research published by Julian M. Kunkel.


International Parallel and Distributed Processing Symposium | 2009

Small-file access in parallel file systems

Philip H. Carns; Samuel Lang; Robert B. Ross; Murali Vilayannur; Julian M. Kunkel; Thomas Ludwig

Today's computational science demands have resulted in ever larger parallel computers, and storage systems have grown to match these demands. Parallel file systems used in this environment are increasingly specialized to extract the highest possible performance for large I/O operations, at the expense of other potential workloads. While some applications have adapted to I/O best practices and can obtain good performance on these systems, the natural I/O patterns of many applications result in the generation of many small files. These applications are not well served by current parallel file systems at very large scale. This paper describes five techniques for optimizing small-file access in parallel file systems for very large scale systems. These five techniques are all implemented in a single parallel file system (PVFS) and then systematically assessed on two test platforms. A microbenchmark and the mdtest benchmark are used to evaluate the optimizations at an unprecedented scale. We observe as much as a 905% improvement in small-file create rates, 1,106% improvement in small-file stat rates, and 727% improvement in small-file removal rates, compared to a baseline PVFS configuration on a leadership computing platform using 16,384 cores.
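
The reported create, stat, and removal rates are metadata-operation throughputs over large numbers of tiny files. As a rough illustration of what such a small-file microbenchmark measures (this is not the benchmark used in the paper; the target directory and file count below are placeholder assumptions), a minimal sketch in Python:

```python
# Minimal sketch of a small-file metadata microbenchmark (not the benchmark
# used in the paper): measures create, stat, and removal rates for many tiny
# files in `target_dir`, which would normally point at the parallel file
# system under test (e.g. a PVFS mount); a local placeholder is used here.
import os
import time

def rate(op, paths):
    """Apply `op` to every path and return operations per second."""
    start = time.perf_counter()
    for p in paths:
        op(p)
    return len(paths) / (time.perf_counter() - start)

def create(path):
    with open(path, "wb") as f:
        f.write(b"x")  # one-byte payload: metadata cost dominates

def run(target_dir="smallfile_bench", n_files=10_000):
    os.makedirs(target_dir, exist_ok=True)
    paths = [os.path.join(target_dir, f"f{i:06d}") for i in range(n_files)]
    print(f"create: {rate(create, paths):10.0f} files/s")
    print(f"stat:   {rate(os.stat, paths):10.0f} files/s")
    print(f"remove: {rate(os.unlink, paths):10.0f} files/s")

if __name__ == "__main__":
    run()
```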


IEEE International Conference on High Performance Computing, Data and Analytics | 2012

A study on data deduplication in HPC storage systems

Dirk Meister; Jürgen Kaiser; André Brinkmann; Toni Cortes; Michael Kuhn; Julian M. Kunkel

Deduplication is a storage saving technique that is highly successful in enterprise backup environments. On a file system, a single data block might be stored multiple times across different files, for example, multiple versions of a file might exist that are mostly identical. With deduplication, this data replication is localized and redundancy is removed -- by storing data just once, all files that use identical regions refer to the same unique data. The most common approach splits file data into chunks and calculates a cryptographic fingerprint for each chunk. By checking if the fingerprint has already been stored, a chunk is classified as redundant or unique. Only unique chunks are stored. This paper presents the first study on the potential of data deduplication in HPC centers, which belong to the most demanding storage producers. We have quantitatively assessed this potential for capacity reduction for 4 data centers (BSC, DKRZ, RENCI, RWTH). In contrast to previous deduplication studies focusing mostly on backup data, we have analyzed over one PB (1212 TB) of online file system data. The evaluation shows that typically 20% to 30% of this online data can be removed by applying data deduplication techniques, peaking up to 70% for some data sets. This reduction can only be achieved by a subfile deduplication approach, while approaches based on whole-file comparisons only lead to small capacity savings.
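
The chunk-and-fingerprint approach described above can be sketched in a few lines. The following is an illustration only: it assumes simple fixed-size chunking and SHA-256 fingerprints, whereas the study also considers content-defined chunking and operates on far larger data volumes.

```python
# Minimal sketch of fingerprint-based deduplication analysis, assuming simple
# fixed-size chunking. Estimates how much of a file tree is redundant.
import hashlib
import os

CHUNK_SIZE = 8 * 1024  # 8 KiB chunks; real systems often use larger or variable chunks

def dedup_stats(root):
    seen = set()          # fingerprints of chunks already stored
    total = unique = 0
    for dirpath, _, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    while chunk := f.read(CHUNK_SIZE):
                        total += len(chunk)
                        fp = hashlib.sha256(chunk).digest()
                        if fp not in seen:       # unique chunk: must be stored
                            seen.add(fp)
                            unique += len(chunk)
            except OSError:
                continue                         # skip unreadable files
    return total, unique

if __name__ == "__main__":
    total, unique = dedup_stats(".")
    if total:
        saved = 100 * (1 - unique / total)
        print(f"deduplication would remove {saved:.1f}% of {total} bytes")
```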


Physics in Medicine and Biology | 2013

Performance-optimized clinical IMRT planning on modern CPUs.

Peter Ziegenhein; C P Kamerling; Mark Bangert; Julian M. Kunkel; Uwe Oelfke

Intensity modulated treatment plan optimization is a computationally expensive task. The feasibility of advanced applications in intensity modulated radiation therapy, such as everyday treatment planning, frequent re-planning for adaptive radiation therapy, and large-scale planning research, depends heavily on the runtime of the plan optimization implementation. Modern computational systems are built as parallel architectures to yield high performance. The use of GPUs, as one class of parallel systems, has become very popular in the field of medical physics. In contrast, we utilize the multi-core central processing unit (CPU), which is the heart of every modern computer and does not have to be purchased additionally. In this work we present an ultra-fast, high precision implementation of the inverse plan optimization problem using a quasi-Newton method on pre-calculated dose influence data sets. We redefined the classical optimization algorithm to achieve a minimal runtime and high scalability on CPUs. Using the proposed methods, a total plan optimization process can be carried out in only a few seconds on a low-cost CPU-based desktop computer at clinical resolution and quality. We have shown that our implementation uses the CPU hardware resources efficiently, with runtimes comparable to GPU implementations at lower cost.
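
The core of such an inverse planning step is a bound-constrained quasi-Newton optimization over beamlet weights, with the dose computed from a pre-calculated dose influence matrix. The sketch below only illustrates that structure with a toy quadratic objective and random placeholder data; the clinical objective, problem sizes, and hand-optimized CPU kernels of the paper are far more elaborate.

```python
# Minimal sketch of fluence-map optimization on a pre-calculated dose influence
# matrix D (dose = D @ x), solved with a quasi-Newton method (L-BFGS-B).
# All data below are random placeholders, not clinical values.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n_voxels, n_beamlets = 2000, 400
D = rng.random((n_voxels, n_beamlets)) * 0.01   # dose influence matrix (placeholder)
d_presc = np.full(n_voxels, 2.0)                # prescribed dose per voxel (placeholder)
w = np.ones(n_voxels)                           # per-voxel penalty weights

def objective(x):
    dev = D @ x - d_presc                       # dose deviation in every voxel
    return 0.5 * np.sum(w * dev**2)

def gradient(x):
    return D.T @ (w * (D @ x - d_presc))

x0 = np.zeros(n_beamlets)
res = minimize(objective, x0, jac=gradient, method="L-BFGS-B",
               bounds=[(0, None)] * n_beamlets)  # beamlet weights must be non-negative
print("final objective:", res.fun)
```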


Biotechnology Journal | 2010

From experimental setup to bioinformatics: An RNAi screening platform to identify host factors involved in HIV-1 replication

Kathleen Börner; Johannes Hermle; Christoph Sommer; Nigel P. Brown; Bettina Knapp; Bärbel Glass; Julian M. Kunkel; Gloria Torralba; Jürgen Reymann; Nina Beil; Jürgen Beneke; Rainer Pepperkok; Reinhard Schneider; Thomas Ludwig; Michael Hausmann; Fred A. Hamprecht; Holger Erfle; Lars Kaderali; Hans-Georg Kräusslich; Maik J. Lehmann

RNA interference (RNAi) has emerged as a powerful technique for studying loss‐of‐function phenotypes by specific down‐regulation of gene expression, allowing the investigation of virus‐host interactions by large‐scale high‐throughput RNAi screens. Here we present a robust and sensitive small interfering RNA screening platform consisting of an experimental setup, single‐cell image and statistical analysis as well as bioinformatics. The workflow has been established to elucidate host gene functions exploited by viruses, monitoring both suppression and enhancement of viral replication simultaneously by fluorescence microscopy. The platform comprises a two‐stage procedure in which potential host factors are first identified in a primary screen and afterwards re‐tested in a validation screen to confirm true positive hits. Subsequent bioinformatics allows the identification of cellular genes participating in metabolic pathways and cellular networks utilised by viruses for efficient infection. Our workflow has been used to investigate host factor usage by the human immunodeficiency virus‐1 (HIV‐1), but can also be adapted to other viruses. Importantly, we expect that the description of the platform will guide further screening approaches for virus‐host interactions. The ViroQuant‐CellNetworks RNAi Screening core facility is an integral part of the recently founded BioQuant centre for systems biology at the University of Heidelberg and will provide service to external users in the near future.


Computer Science - Research and Development | 2010

Simulation of power consumption of energy efficient cluster hardware

Timo Minartz; Julian M. Kunkel; Thomas Ludwig

In recent years the power consumption of high-performance computing clusters has become a growing problem because the number and size of cluster installations have been rising. The high power consumption of clusters is a consequence of their design goal: high performance. With low utilization, cluster hardware consumes nearly as much energy as when it is fully utilized. Theoretically, in these low-utilization phases cluster hardware can be turned off or switched to a lower power-consuming state. We designed a model to estimate the power consumption of hardware based on its utilization. Applications are instrumented to create utilization trace files for a simulator realizing this model. Different hardware components can be simulated using multiple estimation strategies. An optimal strategy determines an upper bound of energy savings for existing hardware without affecting the time-to-solution. Additionally, the simulator can estimate the power consumption of efficient hardware that is energy-proportional. This way, the minimum power consumption can be determined for a given application. Naturally, this minimal power consumption provides an upper bound for any power saving strategy. After evaluating the correctness of the simulator, several different strategies and energy-proportional hardware are compared.
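
A minimal sketch of a utilization-based power model of this kind is shown below; the linear node-level model and the wattages are placeholder assumptions, whereas the simulator described in the paper models individual hardware components and power-saving states.

```python
# Minimal sketch of estimating energy from a utilization trace, comparing
# conventional hardware with an idealized energy-proportional variant.
# P_IDLE and P_MAX are assumed placeholder values, not measured numbers.

P_IDLE, P_MAX = 150.0, 300.0   # node power in watts at 0% / 100% utilization

def power_real(u):
    """Conventional hardware: high baseline power even when idle."""
    return P_IDLE + u * (P_MAX - P_IDLE)

def power_proportional(u):
    """Idealized energy-proportional hardware: power scales with utilization."""
    return u * P_MAX

def energy(trace, power_model, dt=1.0):
    """Integrate power over a utilization trace sampled every `dt` seconds."""
    return sum(power_model(u) * dt for u in trace)

# toy trace: one minute busy, four minutes nearly idle (per-second samples)
trace = [1.0] * 60 + [0.05] * 240
print("conventional:        %.0f J" % energy(trace, power_real))
print("energy-proportional: %.0f J" % energy(trace, power_proportional))
```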


International Supercomputing Conference | 2013

Evaluating Lossy Compression on Climate Data

Nathanael Hübbe; Al Wegener; Julian M. Kunkel; Yi Ling; Thomas Ludwig

While the amount of data used by today's high-performance computing (HPC) codes is huge, HPC users have not broadly adopted data compression techniques, apparently because of a fear that compression will either unacceptably degrade data quality or be too slow to be worth the effort. In this paper, we examine the effects of three lossy compression methods (GRIB2 encoding, GRIB2 using JPEG 2000 and LZMA, and the commercial Samplify APAX algorithm) on decompressed data quality, compression ratio, and processing time. A careful evaluation of selected lossy and lossless compression methods is conducted, assessing their influence on data quality, storage requirements and performance. The differences between input and decoded datasets are described and compared for the GRIB2 and APAX compression methods. Performance is measured using the compressed file sizes and the time spent on compression and decompression. The test data consists of both 9 synthetic datasets that expose compression behavior and 123 climate variables output from a climate model. The benefits of lossy compression for HPC systems are described and related to our findings on data quality.
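
The evaluation criteria named above (compression ratio and the deviation between input and decoded data) can be illustrated with a small sketch. It uses uniform quantization followed by LZMA as a stand-in lossy scheme and synthetic data, since the GRIB2 and APAX codecs from the paper are not reproduced here.

```python
# Minimal sketch of evaluating a lossy compression method: compression ratio
# plus maximum absolute error and RMSE between input and decoded data.
# The quantize-then-LZMA scheme below is a simple stand-in, not GRIB2 or APAX.
import lzma
import numpy as np

def lossy_roundtrip(data, abs_tolerance=1e-3):
    # quantize to a fixed step so the absolute error is bounded by tolerance / 2
    q = np.round(data / abs_tolerance).astype(np.int64)
    compressed = lzma.compress(q.tobytes())
    decoded = np.frombuffer(lzma.decompress(compressed), dtype=np.int64) * abs_tolerance
    return compressed, decoded.reshape(data.shape)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # synthetic stand-in for a climate variable (e.g. temperature in kelvin)
    field = rng.normal(280.0, 5.0, size=(100, 100)).astype(np.float64)
    blob, rec = lossy_roundtrip(field)
    print("compression ratio:", field.nbytes / len(blob))
    print("max abs error:    ", np.max(np.abs(field - rec)))
    print("RMSE:             ", np.sqrt(np.mean((field - rec) ** 2)))
```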


Computer Science - Research and Development | 2013

Reducing the HPC-datastorage footprint with MAFISC--Multidimensional Adaptive Filtering Improved Scientific data Compression

Nathanael Hübbe; Julian M. Kunkel

Large HPC installations today also include large data storage installations. Data compression can significantly reduce the amount of data, and one of our goals was to find out how much compression can do for climate data. The price of compression is, of course, the need for additional computational resources, so our second goal was to relate the savings of compression to the costs it necessitates. In this paper we present the results of our analysis of typical climate data. A lossless algorithm based on these insights is developed and its compression ratio is compared to that of standard compression tools. As it turns out, this algorithm is general enough to be useful for a large class of scientific data, which is why we speak of MAFISC as a method for scientific data compression. A numeric problem for lossless compression of scientific data is identified and a possible solution is given. Finally, we discuss the economics of data compression in HPC environments using the example of the German Climate Computing Center.
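
The underlying idea of filter-then-compress schemes such as MAFISC can be sketched as follows: apply cheap, reversible filters to expose redundancy, hand the result to a standard lossless compressor, and keep the best variant. The filters below are illustrative only, not MAFISC's actual filter set, and the sketch only measures compressibility; exact bit-for-bit reconstruction of floating-point data additionally requires care with the numeric problem mentioned in the abstract.

```python
# Minimal sketch of the filter-then-compress idea: try a few filters (here only
# "no filter" and a difference filter along each array dimension) and keep
# whichever yields the smallest LZMA output. Illustrative only, not MAFISC.
import lzma
import numpy as np

def filtered_candidates(data):
    yield "raw", data
    for axis in range(data.ndim):
        # difference filter: neighbouring values in smooth scientific fields are
        # similar, so differences compress better. Note that floating-point
        # differencing is not exactly invertible; a real lossless codec must
        # filter an integer representation instead (the "numeric problem").
        yield f"diff-axis{axis}", np.diff(data, axis=axis, prepend=0)

def best_compression(data):
    best = None
    for name, candidate in filtered_candidates(data):
        blob = lzma.compress(np.ascontiguousarray(candidate).tobytes())
        if best is None or len(blob) < len(best[1]):
            best = (name, blob)
    return best

if __name__ == "__main__":
    x = np.linspace(0, 20, 256)
    field = np.sin(x)[None, :] + np.cos(x)[:, None]   # smooth synthetic 2D field
    name, blob = best_compression(field)
    print(f"best filter: {name}, ratio: {field.nbytes / len(blob):.1f}")
```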


Computer Science - Research and Development | 2013

Towards I/O analysis of HPC systems and a generic architecture to collect access patterns

Marc C. Wiedemann; Julian M. Kunkel; Michaela Zimmer; Thomas Ludwig; Michael M. Resch; Thomas Bönisch; Xuan Wang; Andriy Chut; Alvaro Aguilera; Wolfgang E. Nagel; Michael Kluge; Holger Mickler

In high-performance computing applications, a high-level I/O call will trigger activities on a multitude of hardware components. These are massively parallel systems supported by huge storage systems and internal software layers. Their complex interplay currently makes it impossible to identify the causes for and the locations of I/O bottlenecks. Existing tools indicate when a bottleneck occurs but provide little guidance in identifying the cause or improving the situation. We have thus initiated the Scalable I/O for Extreme Performance (SIOX) project to find solutions for this problem. To achieve this goal, we will build a system in SIOX to record access information on all layers and components, to recognize access patterns, and to characterize the I/O system. The system will ultimately be able to recognize the causes of I/O bottlenecks and propose optimizations for the I/O middleware that can improve I/O performance, such as throughput rate and latency. Furthermore, the SIOX system will be able to support decision making while planning new I/O systems. In this paper, we introduce the SIOX system and describe its current status: We first outline our approach for collecting the required access information. We then provide the architectural concept, the methods for reconstructing the I/O path and an excerpt of the interface for data collection. This paper focuses especially on the architecture, which collects and combines the relevant access information along the I/O path, and which is responsible for the efficient transfer of this information. An abstract modelling approach allows us to better understand the complexity of the analysis of the I/O activities on parallel computing systems, and an abstract interface allows us to adapt the SIOX system to various HPC file systems.
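
What "recording access information on all layers" amounts to can be illustrated with a toy activity log. The record fields and the wrapper below are placeholder assumptions chosen for illustration; they are not the SIOX data model or interface.

```python
# Minimal sketch of per-layer I/O activity recording: each intercepted call is
# stored as a small record (layer, operation, offset, size, duration).
import os
import time
from dataclasses import dataclass, field
from typing import List

@dataclass
class Activity:
    layer: str        # e.g. "POSIX", "MPI-IO", "HDF5"
    operation: str    # e.g. "read", "write"
    offset: int
    size: int
    duration_s: float

@dataclass
class ActivityLog:
    records: List[Activity] = field(default_factory=list)

    def record(self, **kw):
        self.records.append(Activity(**kw))

def instrumented_write(fd, data, log, layer="POSIX"):
    """Wrap an os.write call and log one activity record for it."""
    offset = os.lseek(fd, 0, os.SEEK_CUR)
    start = time.perf_counter()
    written = os.write(fd, data)
    log.record(layer=layer, operation="write", offset=offset,
               size=written, duration_s=time.perf_counter() - start)
    return written

if __name__ == "__main__":
    log = ActivityLog()
    fd = os.open("trace_demo.bin", os.O_CREAT | os.O_WRONLY | os.O_TRUNC)
    for _ in range(4):
        instrumented_write(fd, b"x" * 4096, log)
    os.close(fd)
    for a in log.records:
        print(a)
```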


European Conference on Parallel Processing | 2008

Directory-Based Metadata Optimizations for Small Files in PVFS

Michael Kuhn; Julian M. Kunkel; Thomas Ludwig

Modern file systems maintain extensive metadata about stored files. While this is usually useful, there are situations in which the additional overhead of such a design becomes a performance problem. This is especially true for parallel and cluster file systems, where, by design, every metadata operation is even more expensive. In this paper several changes made to the parallel cluster file system PVFS are presented. The changes are targeted at optimizing workloads with large numbers of small files. To improve metadata performance, PVFS was modified such that unnecessary metadata is no longer managed. Several tests with a large quantity of files were performed to measure the benefits of these changes. The tests have shown that common file system operations can be sped up by a factor of two even with relatively few changes.


International Conference on Supercomputing | 2014

The SIOX Architecture – Coupling Automatic Monitoring and Optimization of Parallel I/O

Julian M. Kunkel; Michaela Zimmer; Nathanael Hübbe; Alvaro Aguilera; Holger Mickler; Xuan Wang; Andriy Chut; Thomas Bönisch; Jakob Lüttgau; Roman Michel; Johann Weging

Performance analysis and optimization of high-performance I/O systems is a daunting task. Mainly, this is due to the overwhelmingly complex interplay of the involved hardware and software layers. The Scalable I/O for Extreme Performance (SIOX) project provides a versatile environment for monitoring I/O activities and learning from this information. The goal of SIOX is to automatically suggest and apply performance optimizations, and to assist in locating and diagnosing performance problems. In this paper, we present the current status of SIOX. Our modular architecture covers instrumentation of POSIX, MPI and other high-level I/O libraries; the monitoring data is recorded asynchronously into a global database, and recorded traces can be visualized. Furthermore, we offer a set of primitive plug-ins with additional features to demonstrate the flexibility of our architecture: a surveyor plug-in to keep track of the observed spatial access patterns; an fadvise plug-in for injecting hints to achieve read-ahead for strided access patterns; and an optimizer plug-in which monitors the performance achieved with different MPI-IO hints, automatically supplying the best known hint set when no hints were explicitly set. The presentation of the technical status is accompanied by a demonstration of some of these features on our 20-node cluster. In additional experiments, we analyze the overhead for concurrent access, for MPI-IO's four levels of access, and for an instrumented climate application. While our prototype is not yet full-featured, it demonstrates the potential and feasibility of our approach.
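
As an illustration of the read-ahead idea behind the fadvise plug-in mentioned above, the sketch below detects a constant stride in recent read offsets and advises the kernel to prefetch the next expected regions. The stride detection, prefetch depth, and demo file are simplistic placeholder choices and the call is Unix-specific; this is not the SIOX plug-in itself.

```python
# Minimal sketch of stride detection plus posix_fadvise-based read-ahead.
# Placeholder heuristics; not the SIOX fadvise plug-in.
import os

def detect_stride(offsets):
    """Return the constant stride if recent accesses are equally spaced, else None."""
    if len(offsets) < 3:
        return None
    deltas = {b - a for a, b in zip(offsets, offsets[1:])}
    return deltas.pop() if len(deltas) == 1 else None

def readahead_hint(fd, offsets, length, depth=4):
    """Issue POSIX_FADV_WILLNEED for the next `depth` expected strided accesses."""
    stride = detect_stride(offsets)
    if stride is None:
        return
    for i in range(1, depth + 1):
        os.posix_fadvise(fd, offsets[-1] + i * stride, length,
                         os.POSIX_FADV_WILLNEED)

if __name__ == "__main__":
    fd = os.open("/etc/hostname", os.O_RDONLY)   # any readable file for the demo
    seen = []
    for off in (0, 4096, 8192):                  # strided 1 KiB reads
        os.pread(fd, 1024, off)
        seen.append(off)
        readahead_hint(fd, seen, 1024)
    os.close(fd)
```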

Collaboration


Dive into Julian M. Kunkel's collaborations.

Top Co-Authors

Alvaro Aguilera

Dresden University of Technology


Holger Mickler

Dresden University of Technology
