P. Kreuzer
RWTH Aachen University
Publication
Featured research published by P. Kreuzer.
Journal of Grid Computing | 2010
A. Fanfani; Anzar Afaq; Jose Afonso Sanches; Julia Andreeva; Giuseppe Bagliesi; L. A. T. Bauerdick; Stefano Belforte; Patricia Bittencourt Sampaio; K. Bloom; Barry Blumenfeld; D. Bonacorsi; C. Brew; Marco Calloni; Daniele Cesini; Mattia Cinquilli; G. Codispoti; Jorgen D’Hondt; Liang Dong; Danilo N. Dongiovanni; Giacinto Donvito; David Dykstra; Erik Edelmann; R. Egeland; P. Elmer; Giulio Eulisse; D Evans; Federica Fanzago; F. M. Farina; Derek Feichtinger; I. Fisk
The CMS experiment expects to manage several petabytes of data each year during the LHC programme, distributing them over many computing sites around the world and enabling data access at those centers for analysis. CMS has identified the distributed sites as the primary location for physics analysis, supporting a wide community with thousands of potential users. This represents an unprecedented experimental challenge in terms of the scale of distributed computing resources and the number of users. An overview of the computing architecture, the software tools and the distributed infrastructure is reported. Summaries of the experience in establishing efficient and scalable operations in preparation for CMS distributed analysis are presented, followed by the user experience of current analysis activities.
Journal of Physics: Conference Series | 2014
J Adelman; S. Alderweireldt; J Artieda; G. Bagliesi; D Ballesteros; S. Bansal; L. A. T. Bauerdick; W Behrenhof; S. Belforte; K. Bloom; B. Blumenfeld; S. Blyweert; D. Bonacorsi; C. Brew; L Contreras; A Cristofori; S Cury; D da Silva Gomes; M Dolores Saiz Santos; J Dost; David Dykstra; E Fajardo Hernandez; F Fanzago; I. Fisk; J Flix; A Georges; M. Giffels; G. Gomez-Ceballos; S. J. Gowdy; Oliver Gutsche
During the first run, CMS collected and processed more than 10 billion data events and simulated more than 15 billion events. Up to 100,000 processor cores were used simultaneously, and 100 PB of storage was managed. Each month petabytes of data were moved and hundreds of users accessed data samples. In this document we discuss the operational experience from this first run. We present the workflows and data flows that were executed, the tools and services developed, and the operations and shift models used to sustain the system. Many of the techniques followed the original computing plan, while others were developed in reaction to difficulties and opportunities. We also address the lessons learned from an operational perspective, and how this is shaping our thoughts for 2015.
17th International Conference on Computing in High Energy and Nuclear Physics (CHEP) | 2010
O. Buchmuller; D. Bonacorsi; F. Fanzago; S. J. Gowdy; P. Kreuzer; L. Malgeri; Rainer Mankel; S. Metson; B Panzer-Steindel; J Afonso Sanches; U Schwickerath; D. Spiga; D Teodoro; Rainer Többicke
The CMS CERN Analysis Facility (CAF) was primarily designed to host a large variety of latency-critical workflows. These break down into alignment and calibration, detector commissioning and diagnosis, and high-interest physics analysis requiring fast turnaround. In addition to the low-latency requirement on the batch farm, another mandatory condition is efficient access to the RAW detector data stored at the CERN Tier-0 facility. The CMS CAF also foresees resources for interactive login by a large number of CMS collaborators located at CERN, as an entry point for their day-to-day analysis. These resources will run on a separate partition in order to protect the high-priority use cases described above. While the CMS CAF represents only a modest fraction of the overall CMS resources on the WLCG Grid, an appropriately sized user-support service needs to be provided. We will describe the building, commissioning and operation of the CMS CAF during the year 2008. The facility was heavily and routinely used by almost 250 users during multiple commissioning and data challenge periods. It reached a CPU capacity of 1.4 MSI2K and a disk capacity at the petabyte scale. In particular, we will focus on the performance in terms of networking, disk access and job efficiency, and extrapolate prospects for the first year of LHC data taking. We will also present the experience gained and the limitations observed in operating such a large facility, in which well-controlled workflows are combined with more chaotic analysis activity by a large number of physicists.
Journal of Physics: Conference Series | 2008
J. M. Hernandez; P. Kreuzer; Ajit Mohapatra; N D Filippis; S D Weirdt; C. Hof; S. Wakefield; W Guan; A. Khomitch; A. Fanfani; D. Evans; A. Flossdorf; J. Maes; P v Mulders; I. Villella; A. Pompili; S. My; M. Abbrescia; G. Maggi; Giacinto Donvito; J. Caballero; J A Sanches; C. Kavka; F v Lingen; W. Bacchi; G. Codispoti; P. Elmer; G. Eulisse; C. Lazaridis; S. Kalini
Monte Carlo production in CMS has received a major boost in performance and scale since the previous CHEP06 conference. The production system has been re-engineered in order to incorporate the experience gained in running the previous system and to integrate production with the new CMS event data model, data management system and data processing framework. The system is interfaced to the two major computing Grids used by CMS, the LHC Computing Grid (LCG) and the Open Science Grid (OSG). Operational experience and integration aspects of the new CMS Monte Carlo production system are presented, together with an analysis of production statistics. The new system automatically handles job submission, resource monitoring, job queuing, job distribution according to the available resources, data merging, and registration of data into the data bookkeeping, data location, data transfer and placement systems. Compared to the previous production system, automation, reliability and performance have been considerably improved. A more efficient use of computing resources and better handling of the inherent Grid unreliability have resulted in an increase of the production scale by about an order of magnitude, with the system capable of running on the order of ten thousand jobs in parallel and yielding more than two million events per day.
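A minimal sketch of the kind of automated production chain described above (job splitting, submission to LCG/OSG, merging of outputs, and registration); all class and function names here are illustrative assumptions, not the actual CMS production system code.

```python
# Illustrative sketch of an automated Monte Carlo production chain.
# Names and interfaces are hypothetical; they only mirror the steps
# listed in the abstract (splitting, submission, merging, registration).

from dataclasses import dataclass


@dataclass
class ProductionRequest:
    dataset: str
    events_requested: int
    events_per_job: int = 1000


def split_into_jobs(request):
    """Split a request into independently submittable jobs."""
    n_jobs = -(-request.events_requested // request.events_per_job)  # ceiling division
    return [{"job_id": i, "dataset": request.dataset} for i in range(n_jobs)]


def submit(job, grid):
    """Pretend to submit one job to whichever Grid has free slots."""
    return {"job_id": job["job_id"], "grid": grid, "status": "done",
            "output": f"{job['dataset']}_{job['job_id']}.root"}


def run_production(request):
    # 1. split and submit, alternating between the two Grids
    jobs = split_into_jobs(request)
    results = [submit(j, "LCG" if j["job_id"] % 2 else "OSG") for j in jobs]
    # 2. merge the small outputs of successful jobs into larger files
    merged = [r["output"] for r in results if r["status"] == "done"]
    # 3. register the merged data in the bookkeeping/placement systems
    print(f"registered {len(merged)} merged files for {request.dataset}")


run_production(ProductionRequest("MinBias_MC", events_requested=5000))
```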
Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment | 2005
J.M. Hernández; D. Ressing; V. Rybnikov; Francisca Javiela Munoz Sanchez; A. Amorim; M. Medinnis; P. Kreuzer; U. Schwanke
This paper describes the architecture and implementation of the HERA-B framework for online calibration and alignment. At HERA-B the performance of all trigger levels, including the online reconstruction, strongly depends on using the appropriate calibration and alignment constants, which might change during data taking. A system to monitor, recompute and distribute those constants to online processes has been integrated in the data acquisition and trigger systems.
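A toy sketch of the monitor/recompute/distribute cycle described above; the constant names, thresholds and process names are purely illustrative assumptions, not the HERA-B implementation.

```python
# Hypothetical illustration of keeping online calibration constants up
# to date: detect drift, recompute the constants, push them to the
# online processes that consume them.

current_constants = {"drift_velocity": 1.00, "alignment_shift_mm": 0.02}


def needs_update(data_quality_ratio, tolerance=0.05):
    """Flag recomputation when the monitored quality drifts too far from 1."""
    return abs(data_quality_ratio - 1.0) > tolerance


def recompute(constants, correction):
    return {name: value * correction for name, value in constants.items()}


def distribute(constants, online_processes):
    """Push the updated constants to the online reconstruction processes."""
    for process in online_processes:
        print(f"updating {process} with {constants}")


if needs_update(data_quality_ratio=0.93):
    current_constants = recompute(current_constants, correction=1.07)
    distribute(current_constants, ["trigger_level_2", "online_reco"])
```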
Journal of Physics: Conference Series | 2012
Barry Blumenfeld; Dave Dykstra; P. Kreuzer; Ran Du; Weizhen Wang
The Frontier framework is used in the CMS experiment at the LHC to deliver conditions data to processing clients worldwide, including calibration, alignment, and configuration information. Each central server at CERN, called a Frontier Launchpad, uses Tomcat as a servlet container to establish the communication between clients and the central Oracle database. HTTP-proxy Squid servers, located close to the clients, cache the responses to queries in order to provide high-performance data access and to reduce the load on the central Oracle database. Each Frontier Launchpad also has its own reverse-proxy Squid for caching. The three central servers have been delivering about 5 million responses every day since the LHC startup, containing about 40 GB of data in total, to more than one hundred Squid servers located worldwide, with an average response time on the order of 10 milliseconds. The Squid caches deployed worldwide process many more requests per day, over 700 million, and deliver over 40 TB of data. Several monitoring tools have been developed to track the Tomcat log files, the accesses to the Squids on the central Launchpad servers, and the availability of the remote Squids, in order to guarantee the performance of the service and keep the system easily maintainable. Following a brief introduction to the Frontier framework, we describe the performance of this highly reliable and stable system, detail monitoring concerns and their deployment, and discuss the overall operational experience from the first two years of LHC data-taking.
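A minimal sketch of the access pattern described above: worker-node clients fetch conditions data over HTTP through a nearby Squid proxy, so identical queries are served from cache rather than by the central Oracle database. The endpoint URL and query encoding below are hypothetical, not the real Frontier client protocol.

```python
# Conditions access through a caching HTTP proxy (illustrative only).

import urllib.parse
import urllib.request

FRONTIER_LAUNCHPAD = "http://frontier.example.cern.ch:8000/Frontier"  # hypothetical endpoint
LOCAL_SQUID = "http://localhost:3128"                                  # site-local cache


def fetch_conditions(query):
    """Send the query via the local Squid so repeated requests hit its cache."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": LOCAL_SQUID}))
    url = f"{FRONTIER_LAUNCHPAD}?query={urllib.parse.quote(query)}"
    with opener.open(url, timeout=10) as response:
        return response.read()


# Example (requires a reachable server and proxy):
# payload = fetch_conditions("SELECT * FROM alignment WHERE run = 123456")
```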
Journal of Physics: Conference Series | 2012
G. Bagliesi; K. Bloom; C. Brew; J Flix; P. Kreuzer; A. Sciaba
The CMS experiment has adopted a computing system where resources are distributed worldwide across more than 50 sites. The operation of the system requires stable and reliable behaviour of the underlying infrastructure. CMS has established procedures to extensively test all relevant aspects of a site and its capability to sustain the various CMS computing workflows at the required scale. The Site Readiness monitoring infrastructure has been instrumental in understanding how the system as a whole was improving towards LHC operations, measuring the reliability of sites when running CMS activities, and providing sites with the information they need to troubleshoot any problem. This contribution reviews the complete automation of the Site Readiness program, with a description of the monitoring tools and their inclusion into the Site Status Board (SSB), the performance checks, the use of tools such as HammerCloud, and the impact on improving the overall reliability of the Grid from the point of view of the CMS computing system. These results are used by CMS to select good sites for conducting workflows, in order to maximize workflow efficiency. The performance of the sites against these tests during the first years of LHC running is also reviewed.
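An illustrative sketch of how daily test results could be folded into a readiness flag, in the spirit of the Site Readiness program described above; the metric names and thresholds are assumptions, not the actual CMS criteria.

```python
# Turn a site's daily test metrics into a READY / NOT READY decision.
# Metric names and cut values are hypothetical.

def site_readiness(metrics):
    """Return 'READY' only if every monitored aspect passes its threshold."""
    checks = [
        metrics["hammercloud_success_rate"] >= 0.90,  # analysis test jobs
        metrics["service_availability"] >= 0.80,      # functional service tests
        metrics["transfer_quality"] >= 0.80,          # data transfer link quality
        metrics["open_downtime"] is False,            # no scheduled downtime
    ]
    return "READY" if all(checks) else "NOT READY"


print(site_readiness({
    "hammercloud_success_rate": 0.95,
    "service_availability": 0.92,
    "transfer_quality": 0.88,
    "open_downtime": False,
}))  # -> READY
```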
Journal of Physics: Conference Series | 2011
H Riahi; S. J. Gowdy; P. Kreuzer; J Bakken; M Cinquilli; D Evans; S Foulkes; R Kaselis; S. Metson; D. Spiga; E Vaandering
While the majority of CMS data analysis activities rely on the distributed computing infrastructure of the WLCG Grid, dedicated local computing facilities have been deployed to address particular requirements in terms of latency and scale. The CMS CERN Analysis Facility (CAF) was primarily designed to host a large variety of latency-critical workflows. These break down into alignment and calibration, detector commissioning and diagnosis, and high-interest physics analysis requiring fast turnaround. In order to reach the goal for fast-turnaround tasks, the Workload Management group has designed a CRABServer-based system to meet two main needs: to provide a simple, familiar interface to the user (as used in the CRAB Analysis Tool[7]) and to allow an easy transition to the Tier-0 system. While the CRABServer component had initially been designed for Grid analysis by CMS end-users, with a few modifications it turned out to also be a very powerful service for managing and monitoring local submissions on the CAF. The transition to the Tier-0 system is guaranteed by the use of WMCore, a library developed by CMS as the common core of its workload management tools, for handling data-driven workflow dependencies. This system is now being used for the first use cases, and important experience is being acquired. In addition to the CERN CAF facility, FNAL provides CMS-dedicated analysis resources at the FNAL LHC Physics Center (LPC). In the first few years of data collection FNAL has been able to accept a large fraction of CMS data. The remote centre is not well suited to the extremely low-latency work expected of the CAF, but the presence of substantial analysis resources, a large resident community, and a large fraction of the data make the LPC a strong facility for resource-intensive analysis. We present the building, commissioning and operation of these dedicated analysis facilities in the first year of LHC collisions; we also present the specific developments to our software needed to allow the use of these computing facilities in the special use cases of fast-turnaround analyses.
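A toy sketch of the data-driven workflow dependency idea mentioned above: a processing step is released for submission only once the data it depends on exists. The step names and data tiers are illustrative; this is not the WMCore implementation.

```python
# Release workflow steps as their input data becomes available.

workflow = {
    "skim":  {"needs": ["RAW"],    "produces": "SKIM"},
    "merge": {"needs": ["SKIM"],   "produces": "MERGED"},
    "plots": {"needs": ["MERGED"], "produces": "PLOTS"},
}

available_data = {"RAW"}  # data already present when the workflow starts
done = set()

while True:
    # A step is runnable when all of its inputs exist and it has not run yet.
    runnable = [name for name, step in workflow.items()
                if name not in done and set(step["needs"]) <= available_data]
    if not runnable:
        break
    for name in runnable:
        print(f"running step: {name}")
        available_data.add(workflow[name]["produces"])
        done.add(name)
```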
Journal of Physics: Conference Series | 2012
J Molina-Perez; D Bonacorsi; O Gutsche; A Sciabà; J Flix; P. Kreuzer; E Fajardo; T Boccali; M Klute; D Gomes; R Kaselis; R Du; N Magini; I Butenas; W Wang
The CMS offline computing system is composed of roughly 80 sites (including the most experienced T3s) and a number of central services to distribute, process and analyze data worldwide. A high level of stability and reliability is required from the underlying infrastructure and services, partially covered by local or automated monitoring and alarming systems such as Lemon and SLS; the former collects metrics from sensors installed on computing nodes and triggers alarms when values are out of range, while the latter measures the quality of service and warns managers when service is affected. CMS has established computing shift procedures with personnel operating worldwide from remote Computing Centers, under the supervision of the Computing Run Coordinator at CERN. These dedicated 24/7 computing shifters help to detect and react in a timely manner to any unexpected error, and hence ensure that CMS workflows are carried out efficiently and in a sustained manner. Synergy among all the involved actors is exploited to ensure the 24/7 monitoring, alarming and troubleshooting of the CMS computing sites and services. We review the deployment of the monitoring and alarming procedures, and report on the experience gained throughout the first two years of LHC operation. We describe the efficiency of the communication tools employed, the coherent monitoring framework, the proactive alarming systems and the proficient troubleshooting procedures that helped the CMS computing facilities and infrastructure to operate at high reliability levels.
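A minimal sketch of the threshold-based alarming pattern described above (a sensor value is compared against its allowed range and an alarm is raised when it falls outside); the metric names and ranges are illustrative assumptions, not actual Lemon or SLS configuration.

```python
# Raise an alarm when a monitored metric leaves its allowed range.

ALLOWED_RANGES = {
    "tier0_backlog_jobs": (0, 5000),     # hypothetical thresholds
    "transfer_quality":   (0.80, 1.00),
}


def check(metric, value):
    low, high = ALLOWED_RANGES[metric]
    if not (low <= value <= high):
        # In production this would notify the computing shifter / CRC.
        print(f"ALARM: {metric}={value} outside [{low}, {high}]")
        return False
    return True


check("tier0_backlog_jobs", 7200)   # out of range: triggers an alarm
check("transfer_quality", 0.95)     # within range: no alarm
```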
Journal of Physics: Conference Series | 2012
Kenneth Bloom; I. Fisk; P. Kreuzer; Gonzalo Merino
In the large LHC experiments the majority of computing resources are provided by the participating countries. These resource pledges account for more than three quarters of the total available computing. The experiments are asked to give indications of their requests three years in advance and to evolve these as the details and constraints become clearer. In this paper we discuss the resource planning techniques used in CMS to predict the computing resources needed several years in advance. We discuss how we attempt to implement the activities of the computing model in spreadsheets and formulas to calculate the needs. We describe how those needs are reflected in the 2012 running, and how the planned long shutdown of the LHC in 2013 and 2014 impacts the planning process and its outcome. Finally, we speculate on the computing needs for the second major run of the LHC.
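A worked toy example of the spreadsheet-style estimate mentioned above, translating planned event counts into CPU and storage needs; all of the input numbers are illustrative assumptions, not actual CMS planning figures.

```python
# Back-of-the-envelope resource estimate (illustrative numbers only).

events_per_year       = 5e9      # planned data + simulation events
cpu_seconds_per_event = 50       # processing time on a reference core
event_size_gb         = 0.5e-3   # ~0.5 MB per reconstructed event
seconds_per_year      = 3.15e7
cpu_efficiency        = 0.75     # slots are never 100% busy

cores_needed = events_per_year * cpu_seconds_per_event / (seconds_per_year * cpu_efficiency)
storage_pb   = events_per_year * event_size_gb / 1e6

print(f"cores needed : {cores_needed:,.0f}")   # ~10,600 cores
print(f"storage (PB) : {storage_pb:.1f}")      # ~2.5 PB
```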