David Britton
University of Glasgow
Publication
Featured research published by David Britton.
Journal of Physics: Conference Series | 2015
Gareth Roy; A. Washbrook; D. R. M. Crooks; Gang Qin; Samuel Cadellin Skipsey; Gordon Stewart; David Britton
In this paper the emerging technology of Linux containers is examined and evaluated for use in the High Energy Physics (HEP) community. Key technologies required to enable containerisation will be discussed along with emerging technologies used to manage container images. An evaluation of the requirements for containers within HEP will be made and benchmarking will be carried out to assess performance over a range of HEP workflows. The use of containers will be placed in a broader context and recommendations on future work will be given.
Journal of Physics: Conference Series | 2014
D. R. M. Crooks; Mark Mitchell; Stuart Purdie; Gareth Roy; Samuel Cadellin Skipsey; David Britton
The monitoring of a grid cluster (or of any piece of reasonably scaled IT infrastructure) is a key element in the robust and consistent running of that site. There are several factors which are important to the selection of a useful monitoring framework, which include ease of use, reliability, data input and output. It is critical that data can be drawn from different instrumentation packages and collected in the framework to allow for a uniform view of the running of a site. It is also very useful to allow different views and transformations of this data to allow its manipulation for different purposes, perhaps unknown at the initial time of installation. In this context, we present the findings of an investigation of the Graphite monitoring framework and its use at the ScotGrid Glasgow site. In particular, we examine the messaging system used by the framework and means to extract data from different tools, including the existing framework Ganglia which is in use at many sites, in addition to adapting and parsing data streams from external monitoring frameworks and websites.
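As a rough illustration of adapting data from another tool for Graphite, the sketch below formats readings as lines of Graphite's plaintext protocol (`<metric path> <value> <timestamp>`), one of the inputs accepted by its Carbon daemon. The helper name, metric prefix and sample readings are hypothetical and are not taken from the paper.

```python
import time


def to_graphite_lines(prefix, readings, timestamp=None):
    """Format a dict of readings as Graphite plaintext-protocol lines."""
    ts = int(time.time() if timestamp is None else timestamp)
    lines = []
    for name, value in sorted(readings.items()):
        # One metric per line: "<dotted.path> <value> <unix timestamp>"
        lines.append(f"{prefix}.{name} {value} {ts}")
    return "\n".join(lines) + "\n"


# Hypothetical readings, e.g. as scraped from another monitoring tool.
payload = to_graphite_lines(
    "scotgrid.glasgow.node001",
    {"load_one": 0.75, "mem_free_kb": 204800},
    timestamp=1400000000,
)
print(payload, end="")
```

In a real deployment the payload would be written to a socket connected to the Carbon listener; that step is omitted here so that the sketch runs without a Graphite server.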
Journal of Physics: Conference Series | 2017
Samuel Cadellin Skipsey; A. Dewhurst; D. R. M. Crooks; Ewan MacMahon; Gareth Roy; Oliver Smith; Kashif Mohammed; Chris Brew; David Britton
Operational and other pressures have led to WLCG experiments moving increasingly to a stratified model for Tier-2 resources, where "fat" Tier-2s ("T2Ds") and "thin" Tier-2s ("T2Cs") provide different levels of service. In the UK, this distinction is also encouraged by the terms of the current GridPP5 funding model. In anticipation of this, testing has been performed on the implications, and potential implementation, of such a distinction in our resources. In particular, this presentation reports the results of testing storage T2Cs, where the "thin" nature is expressed by the site having either no local data storage, or only a thin caching layer; data is streamed or copied from a "nearby" T2D when needed by jobs. In OSG, this model has been adopted successfully for CMS AAA sites, but the network topology and capacity in the USA are significantly different from those in the UK (and much of Europe). We present the results of several operational tests: the in-production University College London (UCL) site, which runs ATLAS workloads using storage at the Queen Mary University of London (QMUL) site; the Oxford site, which has had scaling tests performed against T2Ds in various locations in the UK (to test network effects); and the Durham site, which has been testing the specific ATLAS caching solution of "Rucio Cache" integration with ARC's caching layer.
Journal of Physics: Conference Series | 2016
G. Qin; Gareth Roy; D. R. M. Crooks; Samuel Cadellin Skipsey; Gordon Stewart; David Britton
The Linux kernel feature Control Groups (cgroups) has been used to gather metrics on the resource usage of single and eight-core ATLAS workloads. It has been used to study the effects on performance of a reduction in the amount of physical memory. The results were used to optimise cluster performance, and consequently increase cluster throughput by up to 10%.
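As a hedged sketch of how such metrics can be read, cgroups expose per-group memory counters as plain `key value` lines in a `memory.stat` file. The Python below parses text in that format; the sample values are invented, and which counters the authors actually collected is not stated in the abstract.

```python
def parse_memory_stat(text):
    """Parse 'key value' lines, as found in a cgroup memory.stat file."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


# Illustrative content only; on a real host this text would be read from
# the memory.stat file of the cgroup that contains the job.
SAMPLE = """cache 1048576
rss 2097152
swap 0
"""

stats = parse_memory_stat(SAMPLE)
rss_mib = stats["rss"] / (1024 * 1024)
print(f"rss = {rss_mib:.1f} MiB")
```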
arXiv: Distributed, Parallel, and Cluster Computing | 2015
Samuel Cadellin Skipsey; Paulin Todev; David Britton; D. R. M. Crooks; Gareth Roy
The state of the art in Grid-style data management is to achieve increased resilience of data via multiple complete replicas of data files across multiple storage endpoints. While this is effective, it is not the most space-efficient approach to resilience, especially when the reliability of individual storage endpoints is sufficiently high that only a few will be inactive at any point in time. We report on work performed as part of GridPP, extending the Dirac File Catalogue and file management interface to allow the placement of erasure-coded files: each file distributed as N identically-sized chunks of data striped across a vector of storage endpoints, encoded such that any M chunks can be lost and the original file can be reconstructed. The tools developed are transparent to the user and, as well as allowing uploading and downloading of data to and from Grid storage, also provide the possibility of parallelising access across all of the distributed chunks at once, improving data transfer and IO performance. We expect this approach to be of most interest to smaller VOs, who have tighter bounds on the storage available to them, but larger (WLCG) VOs may be interested as their total data increases during Run 2. We provide an analysis of the costs and benefits of the approach, along with future development and implementation plans in this area. In general, overheads for multiple file transfers are the largest obstacle to the competitiveness of this approach at present.
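To make the idea concrete, here is a minimal sketch of the simplest possible erasure code, a single XOR parity chunk (so M = 1). It is not the coding scheme used in the work described, which the abstract does not specify; all function names are hypothetical. The `storage_overhead` helper assumes an ideal code in which N chunks carry N - M chunks' worth of data.

```python
from functools import reduce


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data, k):
    """Split data into k equal-size chunks (zero-padded) plus one XOR parity chunk."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    chunks = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, chunks)
    return chunks + [parity]


def recover(chunks, missing):
    """Rebuild the chunk at index `missing` by XOR-ing all the other chunks."""
    others = [c for i, c in enumerate(chunks) if i != missing]
    return reduce(xor_bytes, others)


def storage_overhead(n, m):
    """Stored bytes per byte of data for n chunks tolerating m losses (ideal code)."""
    return n / (n - m)


data = b"erasure-coded grid file"
stored = encode(data, 4)          # N = 5 chunks in total, M = 1
lost = stored[2]
damaged = list(stored)
damaged[2] = None                 # simulate one unavailable storage endpoint
assert recover(damaged, 2) == lost
print(storage_overhead(5, 1))     # compare with 2.0 for two full replicas
```

Under these assumptions, five chunks tolerating one loss cost 1.25 times the file size, whereas surviving one loss with full replicas requires two copies.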
arXiv: Computational Physics | 2015
Samuel Cadellin Skipsey; Shaun De Witt; A. Dewhurst; David Britton; Gareth Roy; D. R. M. Crooks
The Object Store model has quickly become the basis of most commercially successful mass storage infrastructure, backing so-called Cloud storage such as Amazon S3, but also underlying the implementation of most parallel distributed storage systems. Many of the assumptions in Object Store design are similar, but not identical, to concepts in the design of Grid Storage Elements, although the requirement for POSIX-like filesystem structures on top of SEs makes the disjunction seem larger. As modern Object Stores provide many features that most Grid SEs do not (block level striping, parallel access, automatic file repair, etc.), it is of interest to see how easily we can provide interfaces to typical Object Stores via plugins and shims for Grid tools, and how well experiments can adapt their data models to them. We present an evaluation of, and first-deployment experiences with, (for example) Xrootd-Ceph interfaces for direct object-store access, as part of an initiative within GridPP hosted at RAL. Additionally, we discuss the tradeoffs and experience of developing plugins for the currently-popular Ceph parallel distributed filesystem for the GFAL2 access layer, at Glasgow.
Journal of Physics: Conference Series | 2015
A. Washbrook; D. R. M. Crooks; Gareth Roy; Samuel Cadellin Skipsey; G. Qin; Gordon Stewart; David Britton
The field of analytics, the process of analysing data to visualise meaningful patterns and trends, has become increasingly important in scientific computing as the volume and variety of data available to process has significantly increased. There is now ongoing work in the High Energy Physics (HEP) community in this area, for example in the augmentation of systems management at WLCG computing sites. We report on work evaluating the feasibility of distributed site-oriented analytics using the Elasticsearch, Logstash and Kibana software stack and demonstrate functionality by the application of two workflows that give greater insight into site operations.
Journal of Physics: Conference Series | 2014
Gareth Roy; D. R. M. Crooks; Lena Mertens; Mark Mitchell; Stuart Purdie; Samuel Cadellin Skipsey; David Britton
With the current trend towards On Demand Computing in big data environments it is crucial that the deployment of services and resources becomes increasingly automated. Deployment based on cloud platforms is available for large scale data centre environments but these solutions can be too complex and heavyweight for smaller, resource constrained WLCG Tier-2 sites. Along with a greater desire for bespoke monitoring and collection of Grid related metrics, a more lightweight and modular approach is desired. In this paper we present a model for a lightweight automated framework which can be used to build WLCG grid sites, based on off-the-shelf software components. As part of the research into an automation framework the use of both IPMI and SNMP for physical device management will be included, as well as the use of SNMP as a monitoring/data sampling layer such that more comprehensive decision making can take place and potentially be automated. This could lead to reduced down times and better performance as services are recognised to be in a non-functional state by autonomous systems.
Journal of Physics: Conference Series | 2014
Samuel Cadellin Skipsey; Stuart Purdie; David Britton; Mark Mitchell; W. Bhimji; David Smith
Of the three most widely used implementations of the WLCG Storage Element specification, Disk Pool Manager[1, 2] (DPM) has the simplest implementation of file placement balancing (StoRM does not attempt this, leaving it up to the underlying filesystem, which can be very sophisticated in itself). DPM uses a round-robin algorithm (with optional filesystem weighting) for placing files across filesystems and servers. This does a reasonable job of evenly distributing files across the storage array provided to it. However, it does not offer any guarantees of the evenness of distribution of that subset of files associated with a given dataset (which often maps onto a directory in the DPM namespace (DPNS)). It is useful to consider a concept of balance, where an optimally balanced set of files indicates that the files are distributed evenly across all of the pool nodes. The best-case performance of the round-robin algorithm is to maintain balance; it has no mechanism to improve balance. In the past year or more, larger DPM sites have noticed load spikes on individual disk servers, and suspected that these were exacerbated by excesses of files from popular datasets on those servers. We present here a software tool which analyses file distribution for all datasets in a DPM SE, providing a measure of the poorness of file location in this context. Further, the tool provides a list of file movement actions which will improve dataset-level file distribution, and can action those file movements itself. We present results of such an analysis on the UKI-SCOTGRID-GLASGOW Production DPM.
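The notion of dataset-level balance can be sketched as follows. The abstract does not give the tool's actual metric or algorithm, so this example assumes a simple measure (the difference between the largest and smallest number of a dataset's files on any server) and a greedy rebalancing step; the function names, server names and file names are all hypothetical.

```python
from collections import Counter


def spread(placement, servers):
    """Max minus min number of a dataset's files per server (0 or 1 is balanced)."""
    counts = Counter(placement.values())
    per_server = [counts.get(s, 0) for s in servers]
    return max(per_server) - min(per_server)


def plan_moves(placement, servers):
    """Return (file, source, destination) moves that reduce the spread to at most 1."""
    placement = dict(placement)  # work on a copy; the caller's mapping is untouched
    moves = []
    while spread(placement, servers) > 1:
        counts = Counter(placement.values())
        src = max(servers, key=lambda s: counts.get(s, 0))  # most loaded server
        dst = min(servers, key=lambda s: counts.get(s, 0))  # least loaded server
        name = sorted(f for f, s in placement.items() if s == src)[0]
        placement[name] = dst
        moves.append((name, src, dst))
    return moves


servers = ["disk01", "disk02", "disk03"]
# One dataset whose files have clustered on a single disk server.
placement = {"f1": "disk01", "f2": "disk01", "f3": "disk01",
             "f4": "disk01", "f5": "disk02"}
moves = plan_moves(placement, servers)
for move in moves:
    print(move)
```

The greedy step always moves a file from the most loaded to the least loaded server, so the spread cannot grow and the loop terminates once no pair of servers differs by more than one file.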
Archive | 2015
Samuel Cadellin Skipsey; Paulin Todev; David Britton; D. R. M. Crooks; Gareth Roy