Publications


Featured research published by Jakob Blomer.


Journal of Physics: Conference Series | 2011

Distributing LHC application software and conditions databases using the CernVM file system

Jakob Blomer; Carlos Aguado-Sánchez; P. Buncic; Artem Harutyunyan

The CernVM File System (CernVM-FS) is a read-only file system designed to deliver high energy physics (HEP) experiment analysis software onto virtual machines and Grid worker nodes in a fast, scalable, and reliable way. CernVM-FS decouples the underlying operating system from the experiment-defined software stack. Files and file metadata are aggressively cached and downloaded on demand. By designing the file system specifically for the use case of HEP software repositories and experiment conditions data, several typically hard problems for (distributed) file systems can be solved in an elegant way. For the distribution of files, we use a standard HTTP transport, which allows exploitation of a variety of web caches, including commercial content delivery networks. We ensure data authenticity and integrity over possibly untrusted caches and connections while keeping all distributed data cacheable. On the small scale, we developed an experimental extension that allows multiple CernVM-FS instances in a computing cluster to discover each other and to share their file caches.
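
The integrity guarantee over untrusted caches rests on content-addressed storage: each object is stored under a name derived from the cryptographic hash of its content, so a client can verify any download no matter which cache delivered it. The following Python sketch illustrates the principle only; the URL layout and the use of plain SHA-1 over the raw bytes are simplifying assumptions, not the exact CernVM-FS wire format.

```python
import hashlib
import urllib.request

def fetch_verified(base_url: str, object_hash: str) -> bytes:
    """Download a content-addressed object and verify it on arrival.

    Because the object's name is the hash of its content, tampering
    by an untrusted cache or proxy is detected by the client itself.
    (Illustrative only: CernVM-FS additionally compresses objects
    and cryptographically signs its catalogs.)
    """
    url = f"{base_url}/data/{object_hash[:2]}/{object_hash[2:]}"
    data = urllib.request.urlopen(url).read()
    if hashlib.sha1(data).hexdigest() != object_hash:
        raise IOError(f"integrity check failed for object {object_hash}")
    return data
```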


Journal of Physics: Conference Series | 2010

CernVM – a virtual software appliance for LHC applications

P. Buncic; C. Aguado Sánchez; Jakob Blomer; L Franco; A Harutyunian; P. Mato; Y Yao

CernVM is a Virtual Software Appliance capable of running physics applications from the LHC experiments at CERN. It aims to provide a complete and portable environment for developing and running LHC data analysis on any end-user computer (laptop, desktop) as well as on the Grid, independently of operating system platform (Linux, Windows, MacOS). The experiment application software and its specific dependencies are built independently from CernVM and delivered to the appliance just in time by means of the CernVM File System (CVMFS), specifically designed for efficient software distribution. The procedures for building, installing, and validating software releases remain under the control and responsibility of each user community. We provide a mechanism to publish pre-built and configured experiment software releases to a central distribution point, from where they find their way to the running CernVM instances via a hierarchy of proxy servers or content delivery networks. In this paper, we present the current state of the CernVM project, compare the performance of CVMFS to that of traditional network file systems such as AFS, and discuss possible scenarios that could further improve its performance and scalability.
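
Because delivery happens over plain HTTP, a client can walk a hierarchy of caches before contacting the origin server. Below is a minimal sketch of such failover, assuming hypothetical proxy addresses; the real CVMFS client implements considerably more elaborate proxy grouping and retry logic.

```python
import urllib.request

# Cache hierarchy: site proxy first, then a regional cache, then a
# direct connection to the origin (None). Addresses are hypothetical.
PROXIES = ["http://site-proxy.example:3128",
           "http://regional-cache.example:3128",
           None]

def fetch_via_hierarchy(url: str) -> bytes:
    """Try each cache tier in turn, falling through on failure."""
    last_error = None
    for proxy in PROXIES:
        handler = urllib.request.ProxyHandler({"http": proxy} if proxy else {})
        opener = urllib.request.build_opener(handler)
        try:
            return opener.open(url, timeout=10).read()
        except OSError as err:
            last_error = err  # this tier is unavailable, try the next one
    raise last_error
```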


arXiv: Distributed, Parallel, and Cluster Computing | 2014

Micro-CernVM: slashing the cost of building and deploying virtual machines

Jakob Blomer; D. Berzano; P. Buncic; Ioannis Charalampidis; G. Ganis; Georgios Lestaris; René Meusel; V. Nicolaou

The traditional virtual machine building and deployment process is centered around the virtual machine hard disk image. The packages comprising the VM operating system are carefully selected, hard disk images are built for a variety of different hypervisors, and images have to be distributed and decompressed in order to instantiate a virtual machine. Within the HEP community, the CernVM File System has been established in order to decouple the distribution of the experiment software from the building and distribution of the VM hard disk images. We show how to get rid of such pre-built hard disk images altogether. Due to the high requirements on POSIX compliance imposed by HEP application software, CernVM-FS can also be used to host and boot a Linux operating system. This allows the use of a tiny bootable CD image that comprises only a Linux kernel, while the rest of the operating system is provided on demand by CernVM-FS. This approach speeds up the initial instantiation time and reduces virtual machine image sizes by an order of magnitude. Furthermore, security updates can be distributed instantaneously through CernVM-FS. By leveraging the fact that CernVM-FS is a versioning file system, a historic analysis environment can easily be re-spawned by selecting the corresponding CernVM-FS file system snapshot.
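
Re-spawning a historic analysis environment amounts to picking the newest file system snapshot that is not newer than the date of interest. A small sketch of that selection follows, with a hypothetical snapshot catalog; the real CernVM-FS catalog and tagging machinery is more involved.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical snapshot catalog: publication date -> root hash of the
# file system tree as it existed on that date.
SNAPSHOTS = sorted({
    date(2013, 6, 1): "hash-a",
    date(2013, 11, 15): "hash-b",
    date(2014, 3, 2): "hash-c",
}.items())

def snapshot_as_of(when: date) -> str:
    """Return the newest snapshot not newer than `when`, i.e. the
    environment a historic analysis would have seen on that date."""
    dates = [d for d, _ in SNAPSHOTS]
    i = bisect_right(dates, when)
    if i == 0:
        raise ValueError("no snapshot exists before the requested date")
    return SNAPSHOTS[i - 1][1]

# Example: snapshot_as_of(date(2014, 1, 10)) returns "hash-b".
```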


Journal of Physics: Conference Series | 2012

CernVM Co-Pilot: an Extensible Framework for Building Scalable Computing Infrastructures on the Cloud

Artem Harutyunyan; Jakob Blomer; P. Buncic; I Charalampidis; Francois Grey; A Karneyeu; D Larsen; D Lombraña González; J Lisec; Ben Segal; Peter Skands

CernVM Co-Pilot is a framework for instantiating an ad-hoc computing infrastructure on top of managed or unmanaged computing resources. Co-Pilot can either be used to create a stand-alone computing infrastructure or to integrate new computing resources into existing infrastructures (such as Grid or batch). Unlike traditional middleware systems, Co-Pilot components communicate using the Extensible Messaging and Presence Protocol (XMPP). This allows the system to be scaled easily in case of high load, and it also simplifies the development of new components. In this contribution we present the latest developments and the current status of the framework, discuss how it can be extended to suit the needs of a particular community, and describe the operational experience of using the framework in the LHC@home 2.0 volunteer computing project.
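
XMPP carries opaque message bodies, so framework components can exchange structured job-brokering messages as, for example, JSON payloads inside message stanzas. The sketch below shows an illustrative message shape in Python; the field names, classes, and addresses are hypothetical and do not reproduce the actual Co-Pilot protocol.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class JobRequest:
    """A pilot agent asking the job manager for work.
    Field names are illustrative, not the actual Co-Pilot schema."""
    agent_jid: str      # XMPP address of the requesting agent
    platform: str       # e.g. "x86_64-slc5"
    free_disk_mb: int

@dataclass
class JobAssignment:
    """The job manager's reply carrying one unit of work."""
    job_id: str
    input_url: str
    command: str

def to_stanza_body(msg) -> str:
    """Serialize a message for the body of an XMPP <message/> stanza;
    XMPP treats the body as opaque text, so JSON rides inside it."""
    return json.dumps({"type": type(msg).__name__, **asdict(msg)})

# Example exchange (hypothetical addresses):
print(to_stanza_body(JobRequest("agent1@xmpp.example", "x86_64-slc5", 10240)))
```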


Journal of Physics: Conference Series | 2011

CernVM Co-Pilot: a Framework for Orchestrating Virtual Machines Running Applications of LHC Experiments on the Cloud

Artem Harutyunyan; C. Aguado Sánchez; Jakob Blomer; P. Buncic

CernVM Co-Pilot is a framework for the delivery and execution of workloads on remote computing resources. It consists of components developed to ease the integration of geographically distributed resources (such as commercial or academic computing clouds, or the machines of users participating in volunteer computing projects) into existing computing grid infrastructures. The Co-Pilot framework can also be used to build an ad-hoc computing infrastructure on top of distributed resources. In this paper we present the architecture of the Co-Pilot framework and describe how it is used to execute the jobs of the ALICE and ATLAS experiments, as well as to run the Monte Carlo simulation application of the CERN Theoretical Physics Group.


Journal of Physics: Conference Series | 2011

Volunteer Clouds and Citizen Cyberscience for LHC Physics

Carlos Aguado Sanchez; Jakob Blomer; P. Buncic; G.M. Chen; John Ellis; David Garcia Quintas; Artem Harutyunyan; Francois Grey; Daniel Lombraña González; M.A. Marquina; P. Mato; Jarno Rantala; Holger Schulz; Ben Segal; Archana Sharma; Peter Skands; David J. Weir; Jie Wu; Wenjing Wu; Rohit Yadav

Computing for the LHC, and for HEP more generally, is traditionally viewed as requiring specialized infrastructure and software environments, and therefore as incompatible with the recent trend in volunteer computing, where volunteers supply free processing time on ordinary PCs and laptops via standard Internet connections. In this paper, we demonstrate that with the use of virtual machine technology, at least some standard LHC computing tasks can be tackled with volunteer computing resources. Specifically, by presenting volunteer computing resources to HEP scientists as a volunteer cloud, essentially identical to a Grid or dedicated cluster from a job submission perspective, LHC simulations can be processed effectively. This article outlines both the technical steps required for such a solution and the implications for LHC computing, as well as for LHC public outreach and for participation by scientists from developing regions in LHC research.


Computing in Science and Engineering | 2015

The Evolution of Global Scale Filesystems for Scientific Software Distribution

Jakob Blomer; P. Buncic; René Meusel; G. Ganis; I. Sfiligoi; Douglas Thain

Delivering complex software across a worldwide distributed system is a major challenge in high-throughput scientific computing. The problem arises at different scales for many scientific communities that use grids, clouds, and distributed clusters to satisfy their computing needs. For high-energy physics (HEP) collaborations dealing with large amounts of data that rely on hundreds of thousands of cores spread around the world for data processing, the challenge is particularly acute. To serve the needs of the HEP community, several iterations were made to create a scalable, user-level filesystem that delivers software worldwide on a daily basis. The implementation was designed in 2006 to serve the needs of one experiment running on thousands of machines. Since that time, this idea evolved into a new production global-scale filesystem serving the needs of multiple science communities on hundreds of thousands of machines around the world.


arXiv: Distributed, Parallel, and Cluster Computing | 2014

PROOF as a Service on the Cloud: a Virtual Analysis Facility based on the CernVM ecosystem

D. Berzano; Jakob Blomer; P. Buncic; Ioannis Charalampidis; G. Ganis; Georgios Lestaris; René Meusel

PROOF, the Parallel ROOT Facility, is a ROOT-based framework which enables interactive parallelism for event-based tasks on a cluster of computing nodes. Although PROOF can be used simply from within a ROOT session with no additional requirements, deploying and configuring a PROOF cluster used to be far from straightforward. Recently, great effort has been spent on provisioning generic PROOF analysis facilities with zero configuration, with the added advantage of positively affecting both stability and scalability, making deployment feasible even for the end user. Since a growing share of large-scale computing resources is nowadays made available by cloud providers in virtualized form, we have developed the Virtual PROOF-based Analysis Facility: a cluster appliance combining the solid CernVM ecosystem and PoD (PROOF on Demand), ready to be deployed on the cloud and leveraging cloud features such as elasticity. We show how this approach is effective both for sysadmins, who have little or no configuration to do to run it on their clouds, and for end users, who are ultimately in full control of their PROOF cluster and can even easily restart it by themselves in the unfortunate event of a major failure. We also show how elasticity leads to more efficient and uniform usage of cloud resources.


Journal of Physics: Conference Series | 2011

CernVM: Minimal maintenance approach to virtualization

P. Buncic; Carlos Aguado-Sánchez; Jakob Blomer; Artem Harutyunyan

CernVM is a virtual software appliance designed to support the development cycle and provide a runtime environment for the LHC experiments. It consists of three key components that differentiate it from more traditional virtual machines: a minimal Linux Operating System (OS), a specially tuned file system designed to deliver application software on demand, and contextualization tools that provide a means to easily customize and configure CernVM instances for different tasks and user communities. In this contribution we briefly describe the most important use cases for virtualization in High Energy Physics (HEP) and the key CernVM components, and discuss how end-to-end systems corresponding to these use cases can be realized using CernVM.
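
Contextualization injects site- and user-specific settings into an otherwise generic appliance at boot time. Here is a minimal sketch of a boot hook reading an INI-style context file, assuming hypothetical section and key names; the actual CernVM contextualization mechanisms are richer.

```python
import configparser

def read_context(path: str) -> dict:
    """Parse an INI-style context file into settings a boot hook would
    apply. Section and key names here are hypothetical."""
    cfg = configparser.ConfigParser()
    cfg.read(path)
    repos = cfg.get("cernvm", "repositories", fallback="")
    return {
        "organisation": cfg.get("cernvm", "organisation", fallback="none"),
        "repositories": [r.strip() for r in repos.split(",") if r.strip()],
        "http_proxy": cfg.get("cernvm", "proxy", fallback="DIRECT"),
    }
```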


Journal of Physics: Conference Series | 2011

Studying ROOT I/O performance with PROOF-Lite

C Aguado-Sanchez; Jakob Blomer; P. Buncic; Ioannis Charalampidis; G Ganis; M Nabozny; F Rademakers

Parallelism aims to improve computing performance by executing a set of computations concurrently. Since the advent of today's many-core machines, the full exploitation of the available CPU power has been one of the main challenges. In High Energy Physics (HEP) final data analysis, the bottleneck is not only the available CPU but also the available I/O bandwidth. Most of today's HEP analysis frameworks depend on ROOT I/O. In this paper we discuss the results obtained studying ROOT I/O performance using PROOF-Lite, a parallel multi-process approach whose results can be directly applied to the generic case of many jobs running concurrently on the same machine. We also discuss the impact of running the applications in virtual machines.
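
The measurement idea generalizes beyond ROOT: spawn N concurrent readers on one machine and watch aggregate throughput saturate as I/O, rather than CPU, becomes the bottleneck. A rough Python analogue using multiprocessing, with a hypothetical data file and chunk size:

```python
import multiprocessing as mp
import time

CHUNK = 64 * 1024 * 1024  # bytes read per worker; hypothetical size

def timed_read(path: str, offset: int) -> float:
    """Read one chunk at the given offset and return the wall time."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        f.seek(offset)
        f.read(CHUNK)
    return time.perf_counter() - start

def aggregate_mib_per_s(path: str, n_workers: int) -> float:
    """Approximate aggregate read bandwidth with n_workers concurrent
    readers; sweeping n_workers shows where the I/O saturates."""
    with mp.Pool(n_workers) as pool:
        args = [(path, i * CHUNK) for i in range(n_workers)]
        times = pool.starmap(timed_read, args)
    return n_workers * CHUNK / (1024 ** 2) / max(times)

if __name__ == "__main__":
    # Sweep worker counts on a hypothetical data file:
    for n in (1, 2, 4, 8):
        print(n, aggregate_mib_per_s("/data/events.bin", n))
```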
