Publication


Featured research published by Jon Hill.


BMC Bioinformatics | 2008

SPRINT: A new parallel framework for R

Jon Hill; Matthew Hambley; Thorsten Forster; Muriel Mewissen; Terence Sloan; Florian Scharinger; Arthur Trew; Peter Ghazal

Background: Microarray analysis allows the simultaneous measurement of thousands to millions of genes or sequences across tens to thousands of different samples. The analysis of the resulting data tests the limits of existing bioinformatics computing infrastructure. A solution to this issue is to use High Performance Computing (HPC) systems, which contain many processors and more memory than desktop computer systems. Many biostatisticians use R to process the data gleaned from microarray analysis and there is even a dedicated group of packages, Bioconductor, for this purpose. However, to exploit HPC systems, R must be able to utilise the multiple processors available on these systems. There are existing modules that enable R to use multiple processors, but these are either difficult to use for the HPC novice or cannot be used to solve certain classes of problems. A method of exploiting HPC systems using R, but without recourse to mastering parallel programming paradigms, is therefore necessary to analyse genomic data to its fullest. Results: We have designed and built a prototype framework that allows the addition of parallelised functions to R to enable the easy exploitation of HPC systems. The Simple Parallel R INTerface (SPRINT) is a wrapper around such parallelised functions. Their use requires very little modification to existing sequential R scripts and no expertise in parallel computing. As an example we created a function that carries out the computation of a pairwise calculated correlation matrix. This performs well with SPRINT. When executed using SPRINT on an HPC resource of eight processors, this computation completes more than three times faster than R does on one processor. Conclusion: SPRINT allows the biostatistician to concentrate on the research problems rather than the computation, while still allowing exploitation of HPC systems. It is easy to use and with further development will become more useful as more functions are added to the framework.
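
The pairwise correlation example mentioned in the abstract can be illustrated with a minimal sketch. This is not SPRINT's code or API (SPRINT is an R package and the abstract does not describe its internals); it is a plain Python illustration, under the assumption that the rows of the correlation matrix are independent and can be divided among workers. The names `pearson`, `correlation_rows` and `parallel_correlation` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Assumes neither sequence is constant (non-zero variance).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_rows(rows, data):
    """Compute the requested rows of the correlation matrix."""
    return [(i, [pearson(data[i], other) for other in data]) for i in rows]


def parallel_correlation(data, workers=4):
    """Split matrix rows among workers and assemble the full matrix.

    Hypothetical helper: each worker receives every `workers`-th row,
    so no two workers compute the same row.
    """
    n = len(data)
    chunks = [range(start, n, workers) for start in range(workers)]
    result = [None] * n
    with ThreadPoolExecutor(max_workers=workers) as pool:
        blocks = pool.map(lambda rows: correlation_rows(rows, data), chunks)
        for block in blocks:
            for i, row in block:
                result[i] = row
    return result


if __name__ == "__main__":
    samples = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
    for row in parallel_correlation(samples, workers=2):
        print(row)
```

Threads are used here only to keep the sketch short; in CPython they do not speed up CPU-bound work, so a real run would replace the thread pool with processes or a message-passing layer while keeping the same row decomposition.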


Computers & Geosciences | 2009

Modeling shallow marine carbonate depositional systems

Jon Hill; Daniel Tetzlaff; Andrew Curtis; Rachel Wood

Geological Process Models (GPMs) have been used in the past to simulate the distinctive stratigraphies formed in carbonate sediments, and to explore the interaction of controls that produce heterogeneity. Previous GPMs have only indirectly included the supersaturation of calcium carbonate in seawater, a key physicochemical control on carbonate production in reef and lagoon environments, by modifying production rates based on the distance from open marine sources. We here use the residence time of water in the lagoon and reef areas as a proxy for the supersaturation state of carbonate in a new process model, Carbonate GPM. Residence times in the model are calculated using a particle-tracking algorithm. Carbonate production is also controlled by water depth and wave power dissipation. Once deposited, sediment can be eroded, transported and re-deposited via both advective and diffusive processes. We show that using residence time as a control on production might explain the formation of non-ordered, three-dimensional carbonate stratigraphies by lateral shifts in the locus of carbonate deposition on timescales comparable to so-called 5th-order sea-level oscillations. We also show that representing supersaturation as a function of distance from open marine sources, as in previous models, cannot correctly predict the supersaturation distribution over a lagoon due to the intricacies of the flow regime.


Biodiversity Data Journal | 2014

The Supertree Toolkit 2: a new and improved software package with a Graphical User Interface for supertree construction.

Jon Hill; Katie E. Davis

Building large supertrees involves the collection, storage, and processing of thousands of individual phylogenies to create large phylogenies with thousands to tens of thousands of taxa. Such large phylogenies are useful for macroevolutionary studies, comparative biology and in conservation and biodiversity. No easy-to-use and fully integrated software package currently exists to carry out this task. Here, we present a new Python-based software package that uses a well-defined XML schema to manage both data and metadata. It builds on previous versions by 1) including new processing steps, such as Safe Taxonomic Reduction, 2) using a user-friendly GUI that guides the user to complete at least the minimum information required and includes context-sensitive documentation, and 3) adopting a revised storage format that integrates both tree- and meta-data into a single file. These data can then be manipulated according to a well-defined, but flexible, processing pipeline using either the GUI or a command-line based tool. Processing steps include standardising names, deleting or replacing taxa, ensuring adequate taxonomic overlap, ensuring data independence, and safe taxonomic reduction. This software has been successfully used to store and process data consisting of over 1000 trees ready for analyses using standard supertree methods. This software makes large supertree creation a much easier task and provides far greater flexibility for further work.


BMC Research Notes | 2010

The Supertree Tool Kit

Katie E. Davis; Jon Hill

Background: Large phylogenies are crucial for many areas of biological research. One method of creating such large phylogenies is the supertree method, but creating supertrees containing thousands of taxa, and hence providing a comprehensive phylogeny, requires hundreds or even thousands of source input trees. Managing and processing these data in a systematic and error-free manner is challenging and will become even more so as supertrees contain ever increasing numbers of taxa. Protocols for processing input source phylogenies have been proposed to ensure data quality, but no robust software implementations of these protocols as yet exist. Findings: The aim of the Supertree Tool Kit (STK) is to aid in the collection, storage and processing of input source trees for use in supertree analysis. It is therefore invaluable when creating supertrees containing thousands of taxa and hundreds of source trees. The STK is a Perl module with executable scripts to carry out various steps in the processing protocols. In order to aid processing we have added meta-data, via XML, to each tree which contains information such as the bibliographic source information for the tree and how the data were derived, for instance the character data used to carry out the original analysis. These data are essential parts of previously proposed protocols. Conclusions: The STK is a bioinformatics tool designed to make it easier to process source phylogenies for inclusion in supertree analysis from hundreds or thousands of input source trees, whilst reducing potential errors and enabling easy sharing of such datasets. It has been successfully used to create the largest known supertree to date containing over 5000 taxa from over 700 source phylogenies.


Computer Physics Communications | 2008

Performance of a Lattice Quantum Chromodynamics kernel on the Cell processor

J. Spray; Jon Hill; Arthur Trew

The implementation of a proof-of-concept Lattice Quantum Chromodynamics kernel on the Cell processor is described in detail, illustrating issues encountered in the porting process. The resulting code performs up to 45 GFlop/s per socket (without inter-node parallel communications), indicating that the Cell processor is likely to be a good platform for future Lattice QCD calculations.


Nature Communications | 2016

Global cooling as a driver of diversification in a major marine clade

Katie E. Davis; Jon Hill; Tim I. Astrop; Matthew A. Wills

Climate is a strong driver of global diversity and will become increasingly important as human influences drive temperature changes at unprecedented rates. Here we investigate diversification and speciation trends within a diverse group of aquatic crustaceans, the Anomura. We use a phylogenetic framework to demonstrate that speciation rate is correlated with global cooling across the entire tree, in contrast to previous studies. Additionally, we find that marine clades continue to show evidence of increased speciation rates with cooler global temperatures, while the single freshwater clade shows the opposite trend with speciation rates positively correlated to global warming. Our findings suggest that both global cooling and warming lead to diversification and that habitat plays a role in the responses of species to climate change. These results have important implications for our understanding of how extant biota respond to ongoing climate change and are of particular importance for conservation planning of marine ecosystems.


Database | 2015

BioAcoustica: a free and open repository and analysis platform for bioacoustics.

Edward Baker; Ben W. Price; Simon D. Rycroft; Jon Hill; Vincent S. Smith

We describe an online open repository and analysis platform, BioAcoustica (http://bio.acousti.ca), for recordings of wildlife sounds. Recordings can be annotated using a crowdsourced approach, allowing voice introductions and sections with extraneous noise to be removed from analyses. This system is based on the Scratchpads virtual research environment, the BioVeL portal and the Taverna workflow management tool, which allows for analysis of recordings using a grid computing service. At present the analyses include spectrograms, oscillograms and dominant frequency analysis. Further analyses can be integrated to meet the needs of specific researchers or projects. Researchers can upload and annotate their recordings to supplement traditional publication. Database URL: http://bio.acousti.ca


Concurrency and Computation: Practice and Experience | 2011

Optimization of a parallel permutation testing function for the SPRINT R package

Savvas Petrou; Terence Sloan; Muriel Mewissen; Thorsten Forster; Michal Piotrowski; Bartosz Dobrzelecki; Peter Ghazal; Arthur Trew; Jon Hill

The statistical language R and its Bioconductor package are favoured by many biostatisticians for processing microarray data. The amount of data produced by some analyses has reached the limits of many common bioinformatics computing infrastructures. High Performance Computing systems offer a solution to this issue. The Simple Parallel R Interface (SPRINT) is a package that provides biostatisticians with easy access to High Performance Computing systems and allows the addition of parallelized functions to R. Previous work has established that the SPRINT implementation of an R permutation testing function has close to optimal scaling on up to 512 processors on a supercomputer. Access to supercomputers, however, is not always possible, and so the work presented here compares the performance of the SPRINT implementation on a supercomputer with benchmarks on a range of platforms including cloud resources and a common desktop machine with multiprocessing capabilities.


Nature Communications | 2017

Tidal dynamics and mangrove carbon sequestration during the Oligo–Miocene in the South China Sea

Daniel S. Collins; Alexandros Avdis; Peter A. Allison; Howard D. Johnson; Jon Hill; Matthew D. Piggott; Meor Hakif Amir Hassan; Abdul Razak Damit

Modern mangroves are among the most carbon-rich biomes on Earth, but their long-term (≥10⁶ years) impact on the global carbon cycle is unknown. The extent, productivity and preservation of mangroves are controlled by the interplay of tectonics, global sea level and sedimentation, including tide, wave and fluvial processes. The impact of these processes on mangrove-bearing successions in the Oligo–Miocene of the South China Sea (SCS) is evaluated herein. Palaeogeographic reconstructions, palaeotidal modelling and facies analysis suggest that elevated tidal range and bed shear stress optimized mangrove development along tide-influenced tropical coastlines. Preservation of mangrove organic carbon (OC) was promoted by high tectonic subsidence and fluvial sediment supply. Lithospheric storage of OC in peripheral SCS basins potentially exceeded 4,000 Gt (equivalent to 2,000 p.p.m. of atmospheric CO₂). These results highlight the crucial impact of tectonic and oceanographic processes on mangrove OC sequestration within the global carbon cycle on geological timescales.


Methods of Information in Medicine | 2012

Exploiting Parallel R in the Cloud with SPRINT

Michal Piotrowski; Gary A. McGilvary; Terence Sloan; Muriel Mewissen; Ashley D. Lloyd; Thorsten Forster; Lawrence Mitchell; Peter Ghazal; Jon Hill

BACKGROUND Advances in DNA Microarray devices and next-generation massively parallel DNA sequencing platforms have led to an exponential growth in data availability but the arising opportunities require adequate computing resources. High Performance Computing (HPC) in the Cloud offers an affordable way of meeting this need. OBJECTIVES Bioconductor, a popular tool for high-throughput genomic data analysis, is distributed as add-on modules for the R statistical programming language but R has no native capabilities for exploiting multi-processor architectures. SPRINT is an R package that enables easy access to HPC for genomics researchers. This paper investigates: setting up and running SPRINT-enabled genomic analyses on Amazon's Elastic Compute Cloud (EC2), the advantages of submitting applications to EC2 from different parts of the world, and whether resource underutilization can improve application performance. METHODS The SPRINT parallel implementations of correlation, permutation testing, partitioning around medoids and the multi-purpose papply have been benchmarked on data sets of various sizes on Amazon EC2. Jobs have been submitted from both the UK and Thailand to investigate monetary differences. RESULTS It is possible to obtain good, scalable performance but the level of improvement is dependent upon the nature of the algorithm. Resource underutilization can further improve the time to result. End-user location affects costs due to factors such as local taxation. CONCLUSIONS Although not designed to satisfy HPC requirements, Amazon EC2 and cloud computing in general provide an interesting alternative and offer new possibilities for smaller organisations with limited funds.

Collaboration


Dive into Jon Hill's collaborations.

Top Co-Authors

Arthur Trew

University of Edinburgh

Peter Ghazal

University of Edinburgh