David Colling | Researchain

Publication

Featured research published by David Colling.

Journal of Grid Computing | 2004

The DataGrid Workload Management System: Challenges and Results

G. Avellino; S. Beco; B. Cantalupo; A. Maraschini; F. Pacini; M. Sottilaro; A. Terracina; David Colling; F. Giacomini; Elisabetta Ronchieri; A. Gianelle; M. Mazzucato; R. Peluso; M. Sgaravatto; Andrea Guarise; R. Piro; Albert Werbrouck; Daniel Kouřil; Aleš Křenek; Ludek Matyska; Miloš Mulač; Jan Pospíšil; Miroslav Ruda; Zdeněk Salvet; Jiří Sitera; Jiří Škrabal; Michal Voců; M. Mezzadri; F. Prelz; S. Monforte

The workload management task of the DataGrid project was mandated to define and implement a suitable architecture for distributed scheduling and resource management in a Grid environment. The result was the design and implementation of a Grid Workload Management System, a super-scheduler with the distinguishing property of being able to take data access requirements into account when scheduling jobs to the available Grid resources. Many novel issues were faced in fields such as resource management, resource reservation and co-allocation, and Grid accounting. In this paper, the architecture and the functionality provided by the DataGrid Workload Management System are presented.

Philosophical Transactions of the Royal Society A | 2009

GridPP: the UK grid for particle physics

D. Britton; A.J. Cass; P.E.L. Clarke; J. Coles; David Colling; A.T. Doyle; N.I. Geddes; J.C. Gordon; R.W.L. Jones; D.P. Kelsey; S.L. Lloyd; R.P. Middleton; G.N. Patrick; R.A. Sansum; S.E. Pearce

The start-up of the Large Hadron Collider (LHC) at CERN, Geneva, presents a huge challenge in processing and analysing the vast amounts of scientific data that will be produced. The architecture of the worldwide grid that will handle 15 PB of particle physics data annually from this machine is based on a hierarchical tiered structure. We describe the development of the UK component (GridPP) of this grid from a prototype system to a full exploitation grid for real data analysis. This includes the physical infrastructure, the deployment of middleware, operational experience and the initial exploitation by the major LHC experiments.

Journal of Grid Computing | 2010

Optimization of jobs submission on the EGEE production grid: modeling faults using workload

Diane Lingrand; Johan Montagnat; Janusz Martyniak; David Colling

It is commonly observed that production Grids are inherently unreliable. The aim of this work is to improve Grid application performance by tuning the job submission system. A stochastic model, capturing the behavior of a complex Grid workload management system, is proposed. To instantiate the model, detailed statistics are extracted from dense Grid activity traces. The model is exploited for optimizing a simple job resubmission strategy. It provides quantitative inputs to improve job submission performance and it enables the impact of faults and outliers on Grid operations to be quantified.
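As a rough illustration of what optimizing a simple resubmission strategy can look like, the sketch below assumes one particular strategy that the abstract does not spell out: a job is cancelled and resubmitted whenever it has not started within a fixed timeout. With p the fraction of traced jobs that start within the timeout, the expected total wait is timeout * (1 - p) / p for the failed attempts plus the mean latency of the jobs that did start in time. The function names and the use of None to mark outliers are hypothetical, and this is not claimed to be the model used in the paper.

```python
from typing import Optional, Sequence


def expected_latency(latencies: Sequence[Optional[float]], timeout: float) -> float:
    """Expected total wait (seconds) under cancel-and-resubmit at `timeout`.

    `latencies` is a trace of observed start-up latencies; None marks an
    outlier job that never started (assumed encoding, for illustration only).
    """
    if not latencies:
        raise ValueError("empty trace")
    in_time = [x for x in latencies if x is not None and x <= timeout]
    p = len(in_time) / len(latencies)  # probability that one attempt succeeds
    if p == 0:
        return float("inf")  # no attempt ever succeeds within this timeout
    # Each failed attempt costs the full timeout; the final, successful
    # attempt costs its own latency (mean over the jobs that made it).
    return timeout * (1 - p) / p + sum(in_time) / len(in_time)


def best_timeout(latencies: Sequence[Optional[float]]) -> float:
    """Pick the observed latency that minimizes the expected total wait."""
    candidates = sorted({x for x in latencies if x is not None})
    if not candidates:
        raise ValueError("trace contains no completed jobs")
    return min(candidates, key=lambda t: expected_latency(latencies, t))


if __name__ == "__main__":
    trace = [10.0, 20.0, 30.0, 1000.0, None]  # made-up numbers, not real EGEE data
    print(best_timeout(trace), expected_latency(trace, best_timeout(trace)))
```

Only observed latencies are tried as candidate timeouts, because the empirical success probability changes only at those values. The made-up trace shows the effect of faults and outliers: a timeout long enough to catch the slow job costs far more on average than resubmitting early.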

Job Scheduling Strategies for Parallel Processing | 2009

Analyzing the EGEE Production Grid Workload: Application to Jobs Submission Optimization

Diane Lingrand; Johan Montagnat; Janusz Martyniak; David Colling

Grid reliability remains an order of magnitude below that of clusters on production infrastructures. This work aims at improving grid application performance by improving the job submission system. A stochastic model, capturing the behavior of a complex grid workload management system, is proposed. To instantiate the model, detailed statistics are extracted from dense grid activity traces. The model is exploited in a simple job resubmission strategy. It provides quantitative inputs to improve job submission performance and it enables the impact of faults and outliers on grid operations to be quantified.

Archive | 2009

On Quality of Service Support for Grid Computing

David Colling; T. Ferrari; Y. Hassoun; C. Huang; C. Kotsokalis; Andrew Stephen McGough; E. Ronchieri; Y. Patel; Panayiotis Tsanakas

Computing Grids are hardware and software infrastructures that support secure sharing and concurrent access to distributed services by a large number of competing users from different virtual organizations. Concurrency can easily lead to overload and resource shortcomings in large-scale Grid infrastructures, as today’s Grids do not offer differentiated services. We propose a framework for supporting quality of service guarantees via both reservation and discovery of best-effort services based on the matchmaking of application requirements and quality of service performance profiles of the candidate services. We illustrate the middleware components needed to support both strict and loose guarantees and the performance assessment techniques for the discovery of suitable services.

Workflows in Support of Large-Scale Science | 2007

GRIDCC: real-time workflow system

Andrew Stephen McGough; Asif Akram; Li Guo; Marko Krznaric; Luke Dickens; David Colling; Janusz Martyniak; Roger Powell; P. Kyberd; Constantinos Kotsokalis

The Grid is a concept which allows the sharing of resources between distributed communities, allowing each to progress towards potentially different goals. As adoption of the Grid increases, so does the range of activities that people wish to conduct through it. The GRIDCC project is a European Union funded project addressing the issues of integrating instruments into the Grid. This increases the need for workflows, and for Quality of Service on those workflows, as many of these instruments have real-time requirements. In this paper we present the workflow management service within the GRIDCC project, which is tasked with optimising the workflows and ensuring that they meet the pre-defined QoS requirements specified for them.

Journal of Physics: Conference Series | 2010

Real Time Monitor of Grid job executions

David Colling; Janusz Martyniak; Andrew Stephen McGough; Aleš Křenek; Jiří Sitera; Miloš Mulač; František Dvořák

In this paper we describe the architecture and operation of the Real Time Monitor (RTM), developed by the Grid team in the HEP group at Imperial College London. This is arguably the most popular dissemination tool within the EGEE [1] Grid, having been used on many occasions, including the GridFest and LHC inauguration events held at CERN in October 2008. The RTM gathers information from EGEE sites hosting Logging and Bookkeeping (LB) services. Information is cached locally at a dedicated server at Imperial College London and made available for clients to use in near real time. The system consists of three main components: the RTM server, the enquirer and an Apache web server which is queried by clients. The RTM server queries the LB servers at fixed time intervals, collecting job-related information and storing it in a local database. Job-related data includes not only job state (i.e. Scheduled, Waiting, Running or Done) along with timing information but also other attributes such as Virtual Organization and Computing Element (CE) queue, if known. The job data stored in the RTM database is read by the enquirer every minute and converted to an XML format which is stored on the web server. This decouples the RTM server database from the clients, removing the bottleneck caused by many clients simultaneously accessing the database. This information can be visualized through either a 2D or 3D Java-based client, with live job data either overlaid on a two-dimensional map of the world or rendered in three dimensions over a globe map using OpenGL.
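The decoupling described in this abstract (a poller writes to a database, and a separate exporter periodically turns the database into a static XML snapshot that clients read) can be sketched in a few lines. This is only an illustration of the pattern, not the RTM code: the table layout, the function names, the XML element and attribute names and the use of SQLite are all assumptions made for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET


def open_store() -> sqlite3.Connection:
    """Create an in-memory job table (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE jobs (job_id TEXT PRIMARY KEY, state TEXT, "
        "vo TEXT, ce_queue TEXT, updated REAL)"
    )
    return db


def store_poll(db: sqlite3.Connection, records) -> None:
    """Server role: record the result of one polling pass.

    Each record is (job_id, state, vo, ce_queue_or_None, timestamp);
    a newer record for the same job replaces the older one.
    """
    db.executemany("INSERT OR REPLACE INTO jobs VALUES (?, ?, ?, ?, ?)", records)
    db.commit()


def export_snapshot(db: sqlite3.Connection) -> str:
    """Enquirer role: dump the current job table as one XML document.

    Clients would fetch this static document from a web server instead
    of querying the database directly.
    """
    root = ET.Element("jobs")
    rows = db.execute(
        "SELECT job_id, state, vo, ce_queue, updated FROM jobs ORDER BY job_id"
    )
    for job_id, state, vo, ce_queue, updated in rows:
        job = ET.SubElement(root, "job", id=job_id, state=state, vo=vo)
        job.set("updated", str(updated))
        if ce_queue is not None:  # the CE queue is only recorded if known
            job.set("ce_queue", ce_queue)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    db = open_store()
    store_poll(db, [("j1", "Scheduled", "cms", None, 1.0)])
    store_poll(db, [("j1", "Running", "cms", "ce.example.org/long", 2.0)])
    print(export_snapshot(db))
```

Because clients only ever read the exported document, the number of simultaneous clients has no effect on database load. The price is that clients see data that can be up to one export interval old.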

Philosophical Transactions of the Royal Society A | 2012

RAPPORT: running scientific high-performance computing applications on the cloud

Jeremy Cohen; Ioannis Filippis; Mark Woodbridge; Daniela Bauer; Neil Chue Hong; Mike Jackson; Sarah Butcher; David Colling; John Darlington; Brian Fuchs; M. J. Harvey

Cloud computing infrastructure is now widely used in many domains, but one area where there has been more limited adoption is research computing, in particular for running scientific high-performance computing (HPC) software. The Robust Application Porting for HPC in the Cloud (RAPPORT) project took advantage of existing links between computing researchers and application scientists in the fields of bioinformatics, high-energy physics (HEP) and digital humanities, to investigate running a set of scientific HPC applications from these domains on cloud infrastructure. In this paper, we focus on the bioinformatics and HEP domains, describing the applications and target cloud platforms. We conclude that, while there are many factors that need consideration, there is no fundamental impediment to the use of cloud infrastructure for running many types of HPC applications and, in some cases, there is potential for researchers to benefit significantly from the flexibility offered by cloud platforms.

IEEE Nuclear Science Symposium | 2003

Scalability tests of R-GMA based grid job monitoring system for CMS Monte Carlo data production

D. Bonacorsi; David Colling; L Field; Sm Fisher; C. Grandi; P.R. Hobson; P. Kyberd; B. C. MacEvoy; J. J. Nebrensky; H Tallini; S. Traylen

High-energy physics experiments, such as the Compact Muon Solenoid (CMS) at the Large Hadron Collider (LHC), have large-scale data processing computing requirements. The grid has been chosen as the solution. One important challenge when using the grid for large-scale data processing is the ability to monitor the large numbers of jobs that are being executed simultaneously at multiple remote sites. The Relational Grid Monitoring Architecture (R-GMA) is a monitoring and information management service for distributed resources based on the GMA of the Global Grid Forum. We report on the first measurements of R-GMA as part of a monitoring architecture to be used for batch submission of multiple Monte Carlo simulation jobs running on a CMS-specific LHC computing grid test bed. Monitoring information was transferred in real time from remote execution nodes back to the submitting host and stored in a database. In scalability tests, the job submission rates supported by successive releases of R-GMA improved significantly, approaching those expected in full-scale production.

Grid Computing | 2004

HEP Applications and Their Experience with the Use of DataGrid Middleware

S. Burke; F. J. Harris; Ian Stokes-Rees; I. Augustin; F. Carminati; J. Closier; E. van Herwijnen; A. Sciaba; D Boutigny; J. J. Blaising; Vincent Garonne; A. Tsaregorodtsev; Paolo Capiluppi; A. Fanfani; C. Grandi; R. Barbera; E. Luppi; Guido Negri; L. Perini; S. Resconi; M. Reale; A. De Salvo; S. Bagnasco; P. Cerello; Kors Bos; D.L. Groep; W. van Leeuwen; Jeffrey Templon; Oxana Smirnova; O. J. E. Maroney

An overview is presented of the characteristics of HEP computing and its mapping to the Grid paradigm. This is followed by a synopsis of the main experiences and lessons learned by HEP experiments in their use of DataGrid middleware using both the EDG application testbed and the LCG production service. Particular reference is made to experiment ‘data challenges’, and a forward look is given to necessary developments in the framework of the EGEE project.