Michal Piotrowski
University of Edinburgh
Publications
Featured research published by Michal Piotrowski.
Proceedings of the Second International Workshop on Emerging Computational Methods for the Life Sciences | 2011
Lawrence Mitchell; Terence Sloan; Muriel Mewissen; Peter Ghazal; Thorsten Forster; Michal Piotrowski; Arthur Trew
The statistical language R is favoured by many biostatisticians for processing microarray data. In recent times, the quantity of data that can be obtained in experiments has risen significantly, making previously fast analyses time consuming, or even not possible at all with the existing software infrastructure. High Performance Computing (HPC) systems offer a solution to these problems, but at the expense of increased complexity for the end user. The Simple Parallel R Interface (SPRINT) is a library for R that aims to reduce the complexity of using HPC systems by providing biostatisticians with drop-in parallelized replacements of existing R functions. In this paper we describe the implementation of a parallel version of the Random Forest classifier in the SPRINT library.
Concurrency and Computation: Practice and Experience | 2014
Lawrence Mitchell; Terence Sloan; Muriel Mewissen; Peter Ghazal; Thorsten Forster; Michal Piotrowski; Arthur Trew
The statistical language R is favoured by many biostatisticians for processing microarray data. In recent times, the quantity of data that can be obtained in experiments has risen significantly, making previously fast analyses time consuming or even not possible at all with the existing software infrastructure. High performance computing (HPC) systems offer a solution to these problems but at the expense of increased complexity for the end user. The Simple Parallel R Interface is a library for R that aims to reduce the complexity of using HPC systems by providing biostatisticians with drop‐in parallelised replacements of existing R functions. In this paper we describe parallel implementations of two popular techniques: exploratory clustering analyses using the random forest classifier and feature selection through identification of differentially expressed genes using the rank product method.
Methods of Information in Medicine | 2012
Michal Piotrowski; Gary A. McGilvary; Terence Sloan; Muriel Mewissen; Ashley D. Lloyd; Thorsten Forster; Lawrence Mitchell; Peter Ghazal; Jon Hill
BACKGROUND: Advances in DNA microarray devices and next-generation massively parallel DNA sequencing platforms have led to an exponential growth in data availability, but the arising opportunities require adequate computing resources. High Performance Computing (HPC) in the Cloud offers an affordable way of meeting this need. OBJECTIVES: Bioconductor, a popular tool for high-throughput genomic data analysis, is distributed as add-on modules for the R statistical programming language, but R has no native capabilities for exploiting multi-processor architectures. SPRINT is an R package that enables easy access to HPC for genomics researchers. This paper investigates setting up and running SPRINT-enabled genomic analyses on Amazon's Elastic Compute Cloud (EC2), the advantages of submitting applications to EC2 from different parts of the world, and whether resource underutilization can improve application performance. METHODS: The SPRINT parallel implementations of correlation, permutation testing, partitioning around medoids and the multi-purpose papply have been benchmarked on data sets of various sizes on Amazon EC2. Jobs have been submitted from both the UK and Thailand to investigate monetary differences. RESULTS: It is possible to obtain good, scalable performance, but the level of improvement is dependent upon the nature of the algorithm. Resource underutilization can further improve the time to result. The end-user's location impacts on costs due to factors such as local taxation. CONCLUSIONS: Although not designed to satisfy HPC requirements, Amazon EC2 and cloud computing in general provide an interesting alternative and new possibilities for smaller organisations with limited funds.
International Conference on High Performance Computing and Simulation | 2011
Michal Piotrowski; Thorsten Forster; Bartosz Dobrzelecki; Terence Sloan; Lawrence Mitchell; Peter Ghazal; Muriel Mewissen; Savvas Petrou; Arthur Trew; Jon Hill
R is a free statistical programming language commonly used for the analysis of high-throughput microarray and other data. It is currently unable to easily utilise multiprocessor architectures without substantial changes to existing R scripts. Further, working with large volumes of data often leads to slow processing and even memory allocation faults. A recent survey highlighted clustering algorithms as both computation and data intensive bottlenecks in post-genomic data analyses. These algorithms aim to sort numeric vectors (such as gene expression profiles) into groups by minimising vector distances within groups and maximising them between groups. This paper describes the optimisation and parallelisation of a popular clustering algorithm, partitioning around medoids (PAM), for the Simple Parallel R INTerface (SPRINT). SPRINT allows R users to exploit high performance computing systems without expert knowledge of such systems. This paper reports on a serial optimisation of the original code and a subsequent parallel implementation. The parallel implementation enables the processing of data sets that exceed the available physical memory and can yield, depending on the data set, an over 100-fold increase in performance.
Grid and Cloud Database Management | 2011
Bartosz Dobrzelecki; Amrey Krause; Michal Piotrowski; Neil Chue Hong
Database management techniques using distributed processing services have evolved to address the issues of distributed, heterogeneous data collections held across dynamic, virtual organisations [1-3]. These techniques, originally developed for data grids in domains such as high-energy particle physics [4], have been adapted to make use of the emerging cloud infrastructures [5].
Philosophical Transactions of the Royal Society A | 2009
Jeremy Nowell; Charaka Palansuriya; Michal Piotrowski; Florian Scharinger; Paul Graham; Bartosz Dobrzelecki; Arthur Trew
As large grid infrastructures, such as Enabling Grids for E-sciencE, mature, they are being used by scientists around the world in their daily work, running thousands of concurrent computational jobs and transferring large amounts of data. The successful and sustainable operation of such grid infrastructures is only possible through the use of monitoring tools. The underlying networks upon which grid infrastructures are built are critical to their operation; therefore, network monitoring becomes an important part of the overall grid monitoring strategy. In this paper, the design and implementation of a set of tools for providing access to federated network monitoring data are presented, based on standards developed within the Open Grid Forum Network Measurements Working Group (NM-WG). These tools give access to data collected by heterogeneous, NM-WG compliant network monitoring tools.
High Performance Distributed Computing | 2010
Savvas Petrou; Terence Sloan; Muriel Mewissen; Thorsten Forster; Michal Piotrowski; Bartosz Dobrzelecki
The statistical language R and the Bioconductor package are favoured by many biostatisticians for processing microarray data. The amount of data produced by these analyses has reached the limits of many common bioinformatics computing infrastructures. High Performance Computing (HPC) systems offer a solution to this issue. The Simple Parallel R INTerface (SPRINT) is a package that provides biostatisticians with easy access to HPC systems and allows the addition of parallelized functions to R. This paper presents how we added a parallelized permutation testing function in R using SPRINT and how this function performs on a supercomputer for executions of up to 512 processes.
Archive | 2009
Michal Piotrowski
Clusters of shared memory nodes have become a system of choice for many research and enterprise projects. Mixed mode programming is a combination of shared and distributed programming models and naturally matches the SMP cluster architecture. It can potentially exploit features of the system by replacing the message exchanges within a node with faster direct reads and writes from memory, using message passing only to exchange information between the nodes.
arXiv: Computation | 2014
Terence Sloan; Michal Piotrowski; Thorsten Forster; Peter Ghazal
Archive | 2011
Michal Piotrowski; Muriel Mewissen; Terence Sloan; Thorsten Forster; Lawrence Mitchell; Peter Ghazal