Muriel Mewissen
University of Edinburgh
Publications
Featured research published by Muriel Mewissen.
BMC Genomics | 2004
Kevin L. Garwood; Thomas McLaughlin; Chris Garwood; Scott Joens; Norman Morrison; Chris F. Taylor; Kathleen M. Carroll; Caroline A. Evans; Anthony D. Whetton; Sarah R. Hart; David Stead; Zhikang Yin; Alistair J. P. Brown; Andrew Hesketh; Keith F. Chater; Lena Hansson; Muriel Mewissen; Peter Ghazal; Julie Howard; Kathryn S. Lilley; Simon J. Gaskell; Andy Brass; Simon J. Hubbard; Stephen G. Oliver; Norman W. Paton
Background: Proteomics is rapidly evolving into a high-throughput technology, in which substantial and systematic studies are conducted on samples from a wide range of physiological, developmental, or pathological conditions. Reference maps from 2D gels are widely circulated. However, there is, as yet, no formally accepted standard representation to support the sharing of proteomics data, and little systematic dissemination of comprehensive proteomic data sets. Results: This paper describes the design, implementation and use of a Proteome Experimental Data Repository (PEDRo), which makes comprehensive proteomics data sets available for browsing, searching and downloading. It also serves to extend the debate on the level of detail at which proteomics data should be captured, the sorts of facilities that should be provided by proteome data management systems, and the techniques by which such facilities can be made available. Conclusions: The PEDRo database provides access to a collection of comprehensive descriptions of experimental data sets in proteomics. Not only are these data sets interesting in and of themselves, they also provide a useful early validation of the PEDRo data model, which has served as a starting point for the ongoing standardisation activity through the Proteome Standards Initiative of the Human Proteome Organisation.
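To make the level-of-detail debate concrete, here is a minimal sketch of the kind of nested record such a data model captures, assuming the four top-level sections commonly described for the PEDRo model (sample generation, sample processing, mass spectrometry, and MS results analysis); the field names are hypothetical illustrations, not the repository's actual schema.

```r
# Illustrative sketch only: a nested list mirroring the four top-level
# sections commonly described for the PEDRo data model. Field names are
# hypothetical, not the repository's actual schema.
experiment <- list(
  sample_generation = list(organism  = "Saccharomyces cerevisiae",
                           condition = "heat shock"),
  sample_processing = list(technique = "2D gel electrophoresis",
                           gel_id    = "gel-001"),
  mass_spectrometry = list(instrument = "MALDI-TOF",
                           spectra    = "spectra-001.mzXML"),
  ms_results_analysis = list(search_engine = "Mascot",
                             identified    = c("P00330", "P00331"))
)
str(experiment, max.level = 2)  # inspect the record's structure
```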
BMC Bioinformatics | 2008
Jon Hill; Matthew Hambley; Thorsten Forster; Muriel Mewissen; Terence Sloan; Florian Scharinger; Arthur Trew; Peter Ghazal
Background: Microarray analysis allows the simultaneous measurement of thousands to millions of genes or sequences across tens to thousands of different samples. The analysis of the resulting data tests the limits of existing bioinformatics computing infrastructure. A solution to this issue is to use High Performance Computing (HPC) systems, which contain many processors and more memory than desktop computer systems. Many biostatisticians use R to process the data gleaned from microarray analysis, and there is even a dedicated group of packages, Bioconductor, for this purpose. However, to exploit HPC systems, R must be able to utilise the multiple processors available on these systems. There are existing modules that enable R to use multiple processors, but these are either difficult to use for the HPC novice or cannot be used to solve certain classes of problems. A method of exploiting HPC systems, using R, but without recourse to mastering parallel programming paradigms, is therefore necessary to analyse genomic data to its fullest. Results: We have designed and built a prototype framework that allows the addition of parallelised functions to R to enable the easy exploitation of HPC systems. The Simple Parallel R INTerface (SPRINT) is a wrapper around such parallelised functions. Their use requires very little modification to existing sequential R scripts and no expertise in parallel computing. As an example, we created a function that computes a pairwise correlation matrix, and it performs well with SPRINT: executed on an HPC resource with eight processors, the computation completes in less than a third of the time R takes on a single processor. Conclusion: SPRINT allows the biostatistician to concentrate on the research problems rather than the computation, while still allowing exploitation of HPC systems. It is easy to use, and with further development it will become more useful as more functions are added to the framework.
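The drop-in nature of SPRINT is easiest to see in a script. A minimal sketch, assuming SPRINT's pcor function and its pterminate() shutdown call; the exact API, matrix orientation, and launch syntax should be checked against the package documentation.

```r
library("sprint")  # assumes SPRINT and an MPI runtime are installed

# A genes-by-samples expression matrix; cor() would compute the pairwise
# correlations serially on a single processor
expr <- matrix(rnorm(1000 * 20), nrow = 1000)

# Drop-in parallel replacement: same call shape as the sequential version,
# but the correlations are computed across MPI processes
corr <- pcor(expr)

pterminate()  # shut down the SPRINT/MPI environment
quit()
```

Such a script is typically started through an MPI launcher (for example, something like mpiexec -n 8 R -f script.R), though the exact invocation depends on the local MPI installation.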
Conference on High Performance Computing (Supercomputing) | 1995
Nigel P. Topham; Alasdair Rawsthorne; Callum McLean; Muriel Mewissen; Peter L. Bird
Decoupled architectures provide a key to the problem of sustained supercomputer performance through their ability to hide large memory latencies. When a program executes in a decoupled mode, the perceived memory latency at the processor is zero; effectively, the entire physical memory has an access time equivalent to the processor's register file, and latency is completely hidden. However, the asynchronous functional units within a decoupled architecture must occasionally synchronize, incurring a high penalty. The goal of compiling and optimizing for decoupled architectures is to partition the program between the asynchronous functional units in such a way that latencies are hidden but synchronization events are executed infrequently. This paper describes a model for decoupled compilation and explains the effectiveness of compilation for decoupled systems. A number of new compiler optimizations are introduced and evaluated quantitatively using the Perfect Club scientific benchmarks. We show that, with a suitable repertoire of optimizations, it is possible to hide large latencies most of the time for most of the programs in the Perfect Club.
Bioinformatics | 2006
Graeme Grimes; Ted Wen; Muriel Mewissen; Rob Baxter; Stuart L. Moodie; John S. Beattie; Peter Ghazal
Summary: PDQ Wizard automates the process of interrogating biomedical references using large lists of genes, proteins or free text. Using the principle of linkage through co-citation, biologists can mine PubMed with these proteins or genes to identify relationships within a biological field of interest. In addition, PDQ Wizard provides novel features to define more specific relationships, highlight key publications describing those activities and relationships, and enhance protein queries. PDQ Wizard also outputs a metric that can be used to prioritise genes and proteins for further research. Availability: PDQ Wizard is freely available from http://www.gti.ed.ac.uk/pdqwizard/.
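The co-citation principle underlying the tool is simple to state in code. A minimal sketch (an illustration of the idea, not PDQ Wizard's actual implementation): each retrieved PubMed record is reduced to the set of query genes it mentions, and two genes are linked by the number of records mentioning both.

```r
# Illustration of co-citation linkage (not PDQ Wizard's implementation).
# The PMIDs and gene symbols below are made-up example data.
records <- list(
  pmid1 = c("TLR4", "MYD88"),
  pmid2 = c("TLR4", "IRF3"),
  pmid3 = c("TLR4", "MYD88", "IRF3")
)
genes <- sort(unique(unlist(records)))
cocite <- matrix(0L, length(genes), length(genes),
                 dimnames = list(genes, genes))
for (rec in records) {
  for (g1 in rec) for (g2 in rec) {
    if (g1 != g2) cocite[g1, g2] <- cocite[g1, g2] + 1L
  }
}
cocite  # higher counts suggest stronger linkage in the literature
```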
Proceedings of the Second International Workshop on Emerging Computational Methods for the Life Sciences | 2011
Lawrence Mitchell; Terence Sloan; Muriel Mewissen; Peter Ghazal; Thorsten Forster; Michal Piotrowski; Arthur Trew
The statistical language R is favoured by many biostatisticians for processing microarray data. In recent times, the quantity of data that can be obtained in experiments has risen significantly, making previously fast analyses time-consuming, or even impossible with the existing software infrastructure. High Performance Computing (HPC) systems offer a solution to these problems, but at the expense of increased complexity for the end user. The Simple Parallel R Interface (SPRINT) is a library for R that aims to reduce the complexity of using HPC systems by providing biostatisticians with drop-in parallelised replacements of existing R functions. In this paper we describe the implementation of a parallel version of the Random Forest classifier in the SPRINT library.
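As with the other SPRINT functions, the intent is a drop-in replacement for the sequential call. A minimal sketch, assuming the parallel classifier is exposed as prandomForest with randomForest-style arguments; the exact name and signature should be checked against the SPRINT documentation.

```r
library("sprint")        # assumes SPRINT built against an MPI library
library("randomForest")  # the sequential classifier being parallelised

# Toy data: 100 samples x 50 genes, two classes
x <- matrix(rnorm(100 * 50), nrow = 100)
y <- factor(rep(c("case", "control"), each = 50))

# Assumed drop-in for randomForest(): the ntree trees are grown in
# parallel across MPI processes rather than sequentially
fit <- prandomForest(x = x, y = y, ntree = 500)

pterminate()  # shut down the SPRINT/MPI environment
quit()
```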
BMC Genomics | 2005
Graeme Grimes; Stuart L. Moodie; John S. Beattie; Marie Craigon; Paul Dickinson; Thorsten Forster; Andrew D. Livingston; Muriel Mewissen; Kevin Robertson; Alan J. Ross; Garwin Sing; Peter Ghazal
Background: Macrophages play an integral role in the host immune system, bridging innate and adaptive immunity. As such, they are finely attuned to extracellular and intracellular stimuli and respond by rapidly initiating multiple signalling cascades with diverse effector functions. The macrophage cell is therefore an experimentally and clinically amenable biological system for the mapping of biological pathways. The goal of the macrophage expression atlas is to systematically investigate the pathway biology and interaction network of macrophages challenged with a variety of insults, in particular via infection and activation with key inflammatory mediators. As an important first step towards this, we present a single searchable database resource containing high-throughput macrophage gene expression studies. Description: The GPX Macrophage Expression Atlas (GPX-MEA) is an online resource for gene expression based studies of a range of macrophage cell types following treatment with pathogens and immune modulators. GPX-MEA follows the MIAME standard and includes an objective quality score with each experiment. It places special emphasis on rigorously capturing the experimental design and enables the searching of expression data from different microarray experiments. Studies may be queried on the basis of experimental parameters, sample information and quality assessment score. The ability to compare the expression values of individual genes across multiple experiments is provided. In addition, the database offers access to experimental annotation and analysis files and includes experiments and raw data previously unavailable to the research community. Conclusion: GPX-MEA is the first example of a quality-scored gene expression database focussed on a macrophage cellular system that allows efficient identification of transcriptional patterns. The resource will provide novel insights into the phenotypic response of macrophages to a variety of benign, inflammatory, and pathogen insults. GPX-MEA is available through the GPX website at http://www.gti.ed.ac.uk/GPX.
Journal of Bioinformatics and Computational Biology | 2010
Mizanur Khondoker; Till T. Bachmann; Muriel Mewissen; Paul Dickinson; Bartosz Dobrzelecki; Colin J. Campbell; Andrew R. Mount; Anthony J. Walton; Jason Crain; Holger Schulze; Gerard Giraud; Alan J. Ross; Ilenia Ciani; Stuart W. J. Ember; Chaker Tlili; Jonathan G. Terry; Eilidh Grant; Nicola McDonnell; Peter Ghazal
Machine learning and statistical model based classifiers have increasingly been used with more complex and high-dimensional biological data obtained from high-throughput technologies. Understanding the impact of the various factors associated with large and complex microarray datasets on the predictive performance of classifiers is computationally intensive and under-investigated, yet vital for determining the optimal number of biomarkers for classification purposes aimed at improved detection, diagnosis, and therapeutic monitoring of diseases. We investigate the impact of microarray-based data characteristics on predictive performance for various classification rules using simulation studies. Our investigation using Random Forest, Support Vector Machines, Linear Discriminant Analysis and k-Nearest Neighbour shows that the predictive performance of classifiers is strongly influenced by training set size, biological and technical variability, replication, fold change and correlation between biomarkers. The optimal number of biomarkers for a classification problem should therefore be estimated taking account of all these factors. A database of average generalization errors is built for various combinations of these factors; it can be used to estimate the optimal number of biomarkers for given levels of predictive accuracy as a function of these factors. Examples show that curves from actual biological data resemble those of simulated data with corresponding levels of data characteristics. An R package, optBiomarker, implementing the method is freely available for academic use from the Comprehensive R Archive Network (http://www.cran.r-project.org/web/packages/optBiomarker/).
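The simulation idea is straightforward to sketch in plain R (this illustrates the method's logic, not the optBiomarker API): simulate two groups whose biomarker features differ by a fixed fold change, train a classifier, and estimate the generalisation error as the number of biomarkers grows.

```r
# Sketch of the paper's simulation idea, not the optBiomarker API:
# estimate classification error as a function of the number of biomarkers,
# for a fixed fold change and within-group variability.
library(randomForest)

set.seed(1)
err_for <- function(n_biom, n_train = 40, n_test = 200, fold = 1, sdW = 1) {
  # two groups; all n_biom features differ by 'fold' on the log scale
  make <- function(n) {
    y <- factor(rep(c("A", "B"), each = n / 2))
    x <- matrix(rnorm(n * n_biom, sd = sdW), nrow = n)
    x[y == "B", ] <- x[y == "B", ] + fold
    list(x = x, y = y)
  }
  tr <- make(n_train); te <- make(n_test)
  fit <- randomForest(tr$x, tr$y)
  mean(predict(fit, te$x) != te$y)  # generalisation error estimate
}
sapply(c(1, 5, 10, 20), err_for)    # error typically falls as biomarkers rise
```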
Concurrency and Computation: Practice and Experience | 2014
Lawrence Mitchell; Terence Sloan; Muriel Mewissen; Peter Ghazal; Thorsten Forster; Michal Piotrowski; Arthur Trew
The statistical language R is favoured by many biostatisticians for processing microarray data. In recent times, the quantity of data that can be obtained in experiments has risen significantly, making previously fast analyses time-consuming, or even impossible with the existing software infrastructure. High performance computing (HPC) systems offer a solution to these problems, but at the expense of increased complexity for the end user. The Simple Parallel R Interface is a library for R that aims to reduce the complexity of using HPC systems by providing biostatisticians with drop-in parallelised replacements of existing R functions. In this paper we describe parallel implementations of two popular techniques: exploratory clustering analyses using the random forest classifier, and feature selection through identification of differentially expressed genes using the rank product method.
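The rank product statistic that the second implementation parallelises can be sketched in a few lines of plain R (this shows the technique itself, not the SPRINT implementation): rank the genes by fold change within each replicate, then combine the ranks across replicates by their geometric mean, so that consistently top-ranked genes receive the smallest rank products.

```r
# Plain-R sketch of the rank product statistic on simulated data;
# this is not the SPRINT implementation itself.
set.seed(1)
n_genes <- 1000; n_reps <- 4
# log fold changes per gene per replicate; gene 1 is truly up-regulated
fc <- matrix(rnorm(n_genes * n_reps), nrow = n_genes)
fc[1, ] <- fc[1, ] + 3

# rank genes within each replicate (rank 1 = largest fold change),
# then combine ranks across replicates by their geometric mean
ranks <- apply(-fc, 2, rank)
rank_product <- exp(rowMeans(log(ranks)))
head(order(rank_product), 5)  # smallest rank products = strongest candidates
```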
Methods of Information in Medicine | 2012
Michal Piotrowski; Gary A. McGilvary; Terence Sloan; Muriel Mewissen; Ashley D. Lloyd; Thorsten Forster; Lawrence Mitchell; Peter Ghazal; Jon Hill
Background: Advances in DNA microarray devices and next-generation massively parallel DNA sequencing platforms have led to an exponential growth in data availability, but the arising opportunities require adequate computing resources. High Performance Computing (HPC) in the cloud offers an affordable way of meeting this need. Objectives: Bioconductor, a popular tool for high-throughput genomic data analysis, is distributed as add-on modules for the R statistical programming language, but R has no native capabilities for exploiting multi-processor architectures. SPRINT is an R package that enables easy access to HPC for genomics researchers. This paper investigates: setting up and running SPRINT-enabled genomic analyses on Amazon's Elastic Compute Cloud (EC2); the advantages of submitting applications to EC2 from different parts of the world; and whether resource underutilization can improve application performance. Methods: The SPRINT parallel implementations of correlation, permutation testing, partitioning around medoids and the multi-purpose papply have been benchmarked on data sets of various sizes on Amazon EC2. Jobs have been submitted from both the UK and Thailand to investigate monetary differences. Results: It is possible to obtain good, scalable performance, but the level of improvement is dependent upon the nature of the algorithm. Resource underutilization can further improve the time to result. End users' locations affect costs due to factors such as local taxation. Conclusions: Although not designed to satisfy HPC requirements, Amazon EC2 and cloud computing in general provide an interesting alternative and new possibilities for smaller organisations with limited funds.
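Of the four benchmarked functions, the multi-purpose papply is the most general. A minimal sketch of the kind of script submitted to an EC2-hosted cluster, assuming papply takes a list and a function in the style of lapply; its exact argument conventions are an assumption to check against the SPRINT documentation.

```r
library("sprint")  # assumes a SPRINT installation on the cluster image

# One vector of summary statistics per list element, computed across
# MPI processes; papply's argument conventions are assumed here
rows <- lapply(1:1000, function(i) rnorm(5000))
res  <- papply(rows, function(r) c(mean = mean(r), sd = sd(r)))

pterminate()  # shut down the SPRINT/MPI environment
quit()
```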
High Performance Distributed Computing | 2010
Savvas Petrou; Terence Sloan; Muriel Mewissen; Thorsten Forster; Michal Piotrowski; Bartosz Dobrzelecki
The statistical language R and the Bioconductor packages are favoured by many biostatisticians for processing microarray data. The amount of data produced by these analyses has reached the limits of many common bioinformatics computing infrastructures. High Performance Computing (HPC) systems offer a solution to this issue. The Simple Parallel R INTerface (SPRINT) is a package that provides biostatisticians with easy access to HPC systems and allows the addition of parallelized functions to R. This paper presents how we added a parallelized permutation testing function to R using SPRINT, and how this function performs on a supercomputer for executions of up to 512 processes.
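A minimal sketch of how such a permutation test might be called from a SPRINT script, assuming the function is the parallel counterpart of the multtest package's mt.maxT; the name pmaxT and the argument shape are assumptions to check against the SPRINT documentation.

```r
library("sprint")  # assumes SPRINT and an MPI runtime are installed

# Toy two-class experiment: 500 genes x 20 arrays, 10 arrays per class
x <- matrix(rnorm(500 * 20), nrow = 500)
classlabel <- rep(0:1, each = 10)

# Assumed call shape, modelled on multtest's mt.maxT; the B permutations
# are distributed across the MPI processes
res <- pmaxT(x, classlabel, B = 10000)

pterminate()  # shut down the SPRINT/MPI environment
quit()
```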