Maria De Iorio
University College London
Publications
Featured research published by Maria De Iorio.
Journal of the American Statistical Association | 2004
Peter Müller; Bruno Sansó; Maria De Iorio
We consider decision problems defined by a utility function and an underlying probability model for all unknowns. The utility function quantifies the decision maker's preferences over consequences. The optimal decision maximizes the expected utility function, where the expectation is taken with respect to all unknowns, that is, future data and parameters. In many problems, the solution is not analytically tractable. For example, the utility function might involve moments that can be computed only by numerical integration or simulation. Also, the nature of the decision space (i.e., the set of all possible actions) might have a shape or dimension that complicates the maximization. The motivating application for this discussion is the choice of a monitoring network when the optimization is performed over the high-dimensional set of all possible locations of monitoring stations, possibly including choice of the number of locations. We propose an approach to optimal Bayesian design based on inhomogeneous Markov chain simulation. We define a chain such that the limiting distribution identifies the optimal solution. The approach is closely related to simulated annealing. Standard simulated annealing algorithms assume that the target function can be evaluated for any given choice of the variable with respect to which we wish to optimize. For optimal design problems, the target function (i.e., expected utility) is in general not available for efficient evaluation and might require numerical integration. We overcome the problem by defining an inhomogeneous Markov chain on an appropriately augmented space. The proposed inhomogeneous Markov chain Monte Carlo method addresses both problems within one simulation: evaluation of the expected utility and its maximization.
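The baseline that the paper improves on can be sketched as plain simulated annealing over a Monte Carlo estimate of expected utility. The code below is that simpler baseline, not the authors' augmented-space sampler; the coverage utility (mean distance from random demand points to the nearest station), the cooling schedule and the proposal scale are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical expected utility: negative mean distance from random demand
# points to the nearest monitoring station, estimated by Monte Carlo.
def estimated_utility(stations, n_mc=200):
    pts = rng.uniform(0, 1, size=(n_mc, 2))
    d = np.linalg.norm(pts[:, None, :] - stations[None, :, :], axis=2)
    return -d.min(axis=1).mean()          # better coverage = higher utility

def simulated_annealing(k=3, n_iter=2000):
    design = rng.uniform(0, 1, size=(k, 2))   # k candidate station locations
    u = estimated_utility(design)
    for t in range(1, n_iter + 1):
        temp = 1.0 / t                        # cooling schedule (assumption)
        cand = design.copy()
        i = rng.integers(k)                   # perturb one station at a time
        cand[i] = np.clip(cand[i] + rng.normal(0, 0.1, 2), 0, 1)
        u_cand = estimated_utility(cand)
        if np.log(rng.uniform()) < (u_cand - u) / temp:
            design, u = cand, u_cand
    return design, u
```

Because the utility here is only a noisy Monte Carlo estimate, acceptance decisions are made on noisy values; this is exactly the difficulty that motivates the augmented-space inhomogeneous chain of the paper.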
Genetics | 2007
Clive J. Hoggart; Marc Chadeau-Hyam; Taane G. Clark; Riccardo Lampariello; John C. Whittaker; Maria De Iorio; David J. Balding
Simulation is an invaluable tool for investigating the effects of various population genetics modeling assumptions on resulting patterns of genetic diversity, and for assessing the performance of statistical techniques, for example those designed to detect and measure the genomic effects of selection. It is also used to investigate the effectiveness of various design options for genetic association studies. Backward-in-time simulation methods are computationally efficient and have become widely used since their introduction in the 1980s. The forward-in-time approach has substantial advantages in terms of accuracy and modeling flexibility, but at greater computational cost. We have developed flexible and efficient simulation software and a rescaling technique to aid computational efficiency that together allow the simulation of sequence-level data over large genomic regions in entire diploid populations under various scenarios for demography, mutation, selection, and recombination, the latter including hotspots and gene conversion. Our forward evolution of genomic regions (FREGENE) software is freely available from www.ebi.ac.uk/projects/BARGEN together with an ancillary program to generate phenotype labels, either binary or quantitative. In this article we discuss limitations of coalescent-based simulation, introduce the rescaling technique that makes large-scale forward-in-time simulation feasible, and demonstrate the utility of various features of FREGENE, many not previously available.
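A minimal illustration of the forward-in-time approach and of the rescaling idea, assuming a single biallelic Wright-Fisher locus under selection; FREGENE itself simulates sequence-level diploid data and uses a more careful rescaling, so this is a conceptual sketch only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Forward-in-time Wright-Fisher simulation of one biallelic locus.
def wright_fisher(N, s, p0, generations):
    p = p0
    for _ in range(generations):
        w = p * (1 + s) / (p * (1 + s) + (1 - p))   # selection step
        p = rng.binomial(2 * N, w) / (2 * N)        # drift (binomial sampling)
    return p

# Rescaling: shrink N by lam, inflate s by lam, shorten time by lam.
# This approximately preserves the compound parameters N*s and N*generations
# while cutting the computational cost by roughly lam^2.
def rescaled(N, s, p0, generations, lam):
    return wright_fisher(N // lam, s * lam, p0, generations // lam)
```

Run many replicates of each and the distributions of final allele frequencies are close, which is why rescaling makes large-scale forward simulation feasible.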
Nature Protocols | 2014
Jie Hao; Manuel Liebeke; William Astle; Maria De Iorio; Jacob G. Bundy; Timothy M. D. Ebbels
Data processing for 1D NMR spectra is a key bottleneck for metabolomic and other complex-mixture studies, particularly where quantitative data on individual metabolites are required. We present a protocol for automated metabolite deconvolution and quantification from complex NMR spectra by using the Bayesian automated metabolite analyzer for NMR (BATMAN) R package. BATMAN models resonances on the basis of a user-controllable set of templates, each of which specifies the chemical shifts, J-couplings and relative peak intensities for a single metabolite. Peaks are allowed to shift position slightly between spectra, and peak widths are allowed to vary by user-specified amounts. NMR signals not captured by the templates are modeled non-parametrically by using wavelets. The protocol covers setting up user template libraries, optimizing algorithmic input parameters, improving prior information on peak positions, quality control and evaluation of outputs. The outputs include relative concentration estimates for named metabolites together with associated Bayesian uncertainty estimates, as well as the fit of the remainder of the spectrum using wavelets. Graphical diagnostics allow the user to examine the quality of the fit for multiple spectra simultaneously. This approach offers a workflow to analyze large numbers of spectra and is expected to be useful in a wide range of metabolomics studies.
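The template idea can be sketched in a few lines: a metabolite's template fixes chemical shifts and relative peak intensities, and only an overall scale (the relative concentration) is estimated. The triplet shifts, weights and linewidth below are hypothetical values, and BATMAN additionally allows small positional shifts, varying widths and a wavelet background, none of which this sketch includes.

```python
import numpy as np

def lorentzian(x, center, gamma):
    return gamma**2 / ((x - center)**2 + gamma**2)

# Hypothetical template: a 1:2:1 triplet with fixed shifts and relative
# intensities, in the spirit of a BATMAN metabolite template.
def template(x, shifts=(1.17, 1.19, 1.21), weights=(1, 2, 1), gamma=0.004):
    w = np.array(weights) / sum(weights)
    return sum(wi * lorentzian(x, c, gamma) for wi, c in zip(w, shifts))

# Least-squares scale of the template against the spectrum gives a
# relative concentration estimate for that metabolite.
def estimate_concentration(x, spectrum):
    t = template(x)
    return float(t @ spectrum / (t @ t))
```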
Analytical Chemistry | 2012
Joram M. Posma; Isabel Garcia-Perez; Maria De Iorio; John C. Lindon; Paul Elliott; Elaine Holmes; Timothy M. D. Ebbels; Jeremy K. Nicholson
We describe a new multivariate statistical approach to recover metabolite structure information from multiple ¹H NMR spectra in population sample sets. Subset optimization by reference matching (STORM) was developed to select subsets of ¹H NMR spectra that contain specific spectroscopic signatures of biomarkers differentiating between different human populations. STORM aims to improve the visualization of structural correlations in spectroscopic data by using these reduced spectral subsets containing smaller numbers of samples than the number of variables (n ≪ p). We have used statistical shrinkage to limit the number of false positive associations and to simplify the overall interpretation of the autocorrelation matrix. The STORM approach has been applied to findings from an ongoing human metabolome-wide association study on body mass index to identify a biomarker metabolite present in a subset of the population. Moreover, we have shown how STORM improves the visualization of more abundant NMR peaks compared to a previously published method (statistical total correlation spectroscopy, STOCSY). STORM is a useful new tool for biomarker discovery in the omic sciences that has widespread applicability. It can be applied to any type of data, provided that there is interpretable correlation among variables, and can also be applied to data with more than one dimension (e.g., 2D NMR spectra).
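The core subset idea can be sketched as follows: compute correlations to a driver peak only over spectra in which the biomarker signal is present, rather than over the whole population. Selecting the subset by ranking on driver intensity is a simplification of STORM's iterative reference matching, and the shrinkage step is omitted; both are assumptions of this sketch.

```python
import numpy as np

# STORM-style idea (simplified): structural correlations to a driver peak
# are sharper when computed on the subset of spectra carrying the signal.
def storm_correlations(X, driver_idx, subset_size):
    order = np.argsort(-X[:, driver_idx])   # rank spectra by driver intensity
    sub = X[order[:subset_size]]            # crude stand-in for reference matching
    d = sub[:, driver_idx]
    Xc = sub - sub.mean(0)
    dc = d - d.mean()
    denom = np.sqrt((Xc**2).sum(0) * (dc**2).sum())
    return Xc.T @ dc / denom                # Pearson correlation per variable
```

On data where a biomarker appears in only a minority of samples, the subset correlations between the biomarker's peaks stand out clearly, whereas full-population correlations are diluted.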
Genetic Epidemiology | 2013
Erika Cule; Maria De Iorio
To date, numerous genetic variants have been identified as associated with diverse phenotypic traits. However, identified associations generally explain only a small proportion of trait heritability and the predictive power of models incorporating only known‐associated variants has been small. Multiple regression is a popular framework in which to consider the joint effect of many genetic variants simultaneously. Ordinary multiple regression is seldom appropriate in the context of genetic data, due to the high dimensionality of the data and the correlation structure among the predictors. There has been a resurgence of interest in the use of penalised regression techniques to circumvent these difficulties. In this paper, we focus on ridge regression, a penalised regression approach that has been shown to offer good performance in multivariate prediction problems. One challenge in the application of ridge regression is the choice of the ridge parameter that controls the amount of shrinkage of the regression coefficients. We present a method to determine the ridge parameter based on the data, with the aim of good performance in high‐dimensional prediction problems. We establish a theoretical justification for our approach, and demonstrate its performance on simulated genetic data and on a real data example. Fitting a ridge regression model to hundreds of thousands to millions of genetic variants simultaneously presents computational challenges. We have developed an R package, ridge, which addresses these issues. Ridge implements the automatic choice of ridge parameter presented in this paper, and is freely available from CRAN.
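The ridge estimator and a data-driven choice of the ridge parameter can be sketched compactly. The selection rule below is generalized cross-validation (GCV), a standard generic criterion used here for illustration; the ridge package implements the paper's own semi-automatic rule, which is different, and the grid of candidate values is an assumption.

```python
import numpy as np

# Closed-form ridge estimate for penalty lam.
def ridge_fit(X, y, lam):
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Generalized cross-validation over a candidate grid (illustrative rule,
# not the automatic choice implemented in the ridge package).
def gcv_lambda(X, y, grid):
    n = len(y)
    best, best_score = None, np.inf
    for lam in grid:
        H = X @ np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T)
        resid = y - H @ y
        score = n * (resid @ resid) / (n - np.trace(H)) ** 2
        if score < best_score:
            best, best_score = lam, score
    return best
```

Note that forming the hat matrix H is only feasible for modest n; fitting hundreds of thousands of variants requires the kind of computational shortcuts the package provides.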
Journal of the American Statistical Association | 2012
William Astle; Maria De Iorio; Sylvia Richardson; David A. Stephens; Timothy M. D. Ebbels
Nuclear magnetic resonance (NMR) spectra are widely used in metabolomics to obtain profiles of metabolites dissolved in biofluids such as cell supernatants. Methods for estimating metabolite concentrations from these spectra are presently confined to manual peak fitting and to binning procedures for integrating resonance peaks. Extensive information on the patterns of spectral resonance generated by human metabolites is now available in online databases. By incorporating this information into a Bayesian model, we can deconvolve resonance peaks from a spectrum and obtain explicit concentration estimates for the corresponding metabolites. Spectral resonances that cannot be deconvolved in this way may also be of scientific interest, so we model them jointly using wavelets. We describe a Markov chain Monte Carlo algorithm that allows us to sample from the joint posterior distribution of the model parameters, using specifically designed block updates to improve mixing. The strong prior on resonance patterns allows the algorithm to identify peaks corresponding to particular metabolites automatically, eliminating the need for manual peak assignment. We assess our method for peak alignment and concentration estimation. Except in cases when the target resonance signal is very weak, alignment is unbiased and precise. We compare the Bayesian concentration estimates with those obtained from a conventional numerical integration method and find that our point estimates have six-fold lower mean squared error. Finally, we apply our method to a spectral dataset taken from an investigation of the metabolic response of yeast to recombinant protein expression. We estimate the concentrations of 26 metabolites and compare with manual quantification by five expert spectroscopists. We discuss the reason for discrepancies and the robustness of our method's concentration estimates. This article has supplementary materials online.
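The posterior sampling step can be sketched for the simplest possible case: a single Lorentzian resonance of known width in Gaussian noise, sampled by random-walk Metropolis on position and log-amplitude under a flat prior. The paper's sampler instead uses designed block updates over full metabolite templates plus a wavelet component for unassigned signal, so everything below (linewidth, proposal scales, initialization) is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(3)

def lorentzian(x, c, gamma=0.005):
    return gamma**2 / ((x - c)**2 + gamma**2)

# Random-walk Metropolis for one resonance's position and amplitude.
def metropolis(x, y, n_iter=4000, sigma=0.05):
    c = float(x[np.argmax(y)])            # initialise at the tallest point
    a = float(y.max())
    def loglik(c_, a_):
        r = y - a_ * lorentzian(x, c_)
        return -0.5 * (r @ r) / sigma**2
    ll = loglik(c, a)
    samples = []
    for _ in range(n_iter):
        c_new = c + rng.normal(0, 0.002)          # symmetric walk on position
        a_new = a * np.exp(rng.normal(0, 0.05))   # symmetric walk on log-amplitude
        ll_new = loglik(c_new, a_new)
        if np.log(rng.uniform()) < ll_new - ll:   # flat prior on (c, log a)
            c, a, ll = c_new, a_new, ll_new
        samples.append((c, a))
    return np.array(samples)
```

Posterior means over the retained samples give both a concentration-like amplitude estimate and a Bayesian uncertainty, which is the key output the abstract compares against numerical integration.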
Statistics in Medicine | 2016
Menelaos Pavlou; Gareth Ambler; Shaun R. Seaman; Maria De Iorio; Rumana Z. Omar
Risk prediction models are used to predict a clinical outcome for patients using a set of predictors. We focus on predicting low‐dimensional binary outcomes typically arising in epidemiology, health services and public health research where logistic regression is commonly used. When the number of events is small compared with the number of regression coefficients, model overfitting can be a serious problem. An overfitted model tends to demonstrate poor predictive accuracy when applied to new data. We review frequentist and Bayesian shrinkage methods that may alleviate overfitting by shrinking the regression coefficients towards zero (some methods can also provide more parsimonious models by omitting some predictors). We evaluated their predictive performance in comparison with maximum likelihood estimation using real and simulated data. The simulation study showed that maximum likelihood estimation tends to produce overfitted models with poor predictive performance in scenarios with few events, and penalised methods can offer improvement. Ridge regression performed well, except in scenarios with many noise predictors. Lasso performed better than ridge in scenarios with many noise predictors and worse in the presence of correlated predictors. Elastic net, a hybrid of the two, performed well in all scenarios. Adaptive lasso and smoothly clipped absolute deviation performed best in scenarios with many noise predictors; in other scenarios, their performance was inferior to that of ridge and lasso. Bayesian approaches performed well when the hyperparameters for the priors were chosen carefully. Their use may aid variable selection, and they can be easily extended to clustered‐data settings and to incorporate external information.
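One comparison from the review can be sketched with numpy alone: maximum likelihood versus an L2 (ridge) penalty for logistic regression, where the penalty shrinks coefficients towards zero. Plain gradient descent is an implementation assumption here, not something the paper prescribes, and real use would also require choosing the penalty by cross-validation.

```python
import numpy as np

# Logistic regression with an optional L2 (ridge) penalty, fitted by
# gradient descent; lam=0 recovers (approximate) maximum likelihood.
def logistic_ridge(X, y, lam, lr=0.1, n_iter=2000):
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        mu = 1 / (1 + np.exp(-X @ beta))          # predicted probabilities
        grad = X.T @ (mu - y) / n + lam * beta    # penalised score
        beta -= lr * grad
    return beta
```

With few events relative to the number of coefficients, the unpenalised fit inflates coefficients (overfitting); the penalised fit keeps them smaller, which is the shrinkage effect the simulation study evaluates.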
Hypertension | 2013
Jeremiah Stamler; Ian J. Brown; Ivan K. S. Yap; Queenie Chan; Anisha Wijeyesekera; Isabel Garcia-Perez; Marc Chadeau-Hyam; Timothy M. D. Ebbels; Maria De Iorio; Joram M. Posma; Martha L. Daviglus; Mercedes R. Carnethon; Elaine Holmes; Jeremy K. Nicholson; Paul Elliott
Black compared with non-Hispanic white Americans have higher systolic and diastolic blood pressure and rates of prehypertension/hypertension. Reasons for these adverse findings remain obscure. Analyses here focused on relations of foods/nutrients/urinary metabolites and higher black blood pressure for 369 black compared with 1190 non-Hispanic white Americans aged 40 to 59 years from 8 population samples. Multiple linear regression, standardized data from four 24-hour dietary recalls per person, two 24-hour urine collections, and 8 blood pressure measurements were used to quantitate the role of foods, nutrients, and metabolites in higher black blood pressure. Compared with non-Hispanic white Americans, blacks' average systolic/diastolic pressure was higher by 4.7/3.4 mm Hg (men) and 9.0/4.8 mm Hg (women). Control for the higher body mass index of black women reduced excess black systolic/diastolic pressure to 6.8/3.8 mm Hg. Lesser intake of vegetables, fruits, grains, vegetable protein, glutamic acid, starch, fiber, minerals, and potassium, and higher intake of processed meats, pork, eggs, and sugar-sweetened beverages, along with higher cholesterol intake and a higher Na/K ratio, related to higher black blood pressure. Control for 11 nutrient and 10 non-nutrient correlates reduced higher black systolic/diastolic pressure to 2.3/2.3 mm Hg (52% and 33% reduction in men) and to 5.3/2.8 mm Hg (21% and 27% reduction in women). Control for foods/urinary metabolites had little further influence on higher black blood pressure. Less favorable multiple nutrient intake by blacks than non-Hispanic white Americans accounted, at least in part, for higher black blood pressure. Improved dietary patterns can contribute to prevention/control of more adverse black blood pressure levels.
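The "control for correlates" logic in this abstract is covariate adjustment in a linear model: the coefficient on a group indicator shrinks once variables lying on the pathway are added as regressors. The sketch below uses synthetic data, not the study's measurements; the variable names and effect sizes are purely illustrative.

```python
import numpy as np

# Coefficient on a group indicator from OLS, optionally adjusting for
# covariates (e.g., dietary variables). Toy illustration of the logic.
def group_coefficient(y, group, covariates=None):
    cols = [np.ones_like(y), group]
    if covariates is not None:
        cols.extend(covariates.T)
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]                     # coefficient on the group indicator
```

When part of the group difference in the outcome operates through the covariate, the adjusted group coefficient is smaller than the unadjusted one, mirroring the reductions reported above.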
BMC Urology | 2015
Kiren Gill; Harry Horsley; Anthony Kupelian; Gianluca Baio; Maria De Iorio; Sanchutha Sathiananamoorthy; Rajvinder Khasriya; Jennifer Rohn; Scott S.P. Wildman; James Malone-Lee
Background: Adenosine-5′-triphosphate (ATP) is a neurotransmitter and inflammatory cytokine implicated in the pathophysiology of lower urinary tract disease. ATP additionally reflects microbial biomass and thus has potential as a surrogate marker of urinary tract infection (UTI). The optimum clinical sampling method for ATP urinalysis has not been established. We tested the potential of urinary ATP in the assessment of lower urinary tract symptoms, infection and inflammation, and validated sampling methods for clinical practice. Methods: A prospective, blinded, cross-sectional observational study of adult patients presenting with lower urinary tract symptoms (LUTS) and asymptomatic controls was conducted between October 2009 and October 2012. Urinary ATP was assayed by a luciferin-luciferase method, pyuria was counted by microscopy of fresh unspun urine and symptoms were assessed using validated questionnaires. The sample collection, storage and processing methods were also validated. Results: 75 controls and 340 patients with LUTS were grouped as without pyuria (n = 100), pyuria 1-9 wbc μl⁻¹ (n = 120) and pyuria ≥10 wbc μl⁻¹ (n = 120). Urinary ATP was higher in association with female gender, voiding symptoms, pyuria greater than 10 wbc μl⁻¹ and negative MSU culture. ROC curve analysis showed no evidence of diagnostic test potential. The urinary ATP signal decayed with storage at 23°C, but this was prevented by immediate freezing at ≤ −20°C, without boric acid preservative and without the need to centrifuge urine prior to freezing. Conclusions: Urinary ATP may have a role as a research tool but is unconvincing as a surrogate, clinical diagnostic marker.
Metabolomics | 2016
Gregory D. Tredwell; Jacob G. Bundy; Maria De Iorio; Timothy M. D. Ebbels
Introduction: Despite the use of buffering agents, the ¹H NMR spectra of biofluid samples in metabolic profiling investigations typically suffer from extensive peak frequency shifting between spectra. These chemical shift changes are mainly due to differences in pH and divalent metal ion concentrations between the samples. This frequency shifting results in a correspondence problem: it can be hard to register the same peak as belonging to the same molecule across multiple samples. The problem is especially acute for urine, which can have a wide range of ionic concentrations between different samples. Objectives: To investigate the acid, base and metal ion dependent ¹H NMR chemical shift variations and limits of the main metabolites in a complex biological mixture. Methods: Urine samples from five different individuals were collected, pooled and pre-treated with Chelex-100 ion exchange resin. Urine samples were treated with either HCl or NaOH, or were supplemented with various concentrations of CaCl2, MgCl2, NaCl or KCl, and their ¹H NMR spectra were acquired. Results: Nonlinear fitting was used to derive acid dissociation constants and acid and base chemical shift limits for peaks from 33 identified metabolites. Peak pH titration curves for a further 65 unidentified peaks were also obtained for future reference. Furthermore, the peak variations induced by the main metal ions present in urine (Na+, K+, Ca2+ and Mg2+) were also measured. Conclusion: These data will be a valuable resource for ¹H NMR metabolite profiling experiments and for the development of automated metabolite alignment and identification algorithms for ¹H NMR spectra.
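The nonlinear fit of a pH titration curve can be sketched with the standard Henderson-Hasselbalch form, delta(pH) = delta_base + (delta_acid − delta_base) / (1 + 10^(pH − pKa)). The paper does not specify its fitting algorithm, so the grid search over pKa with linear least squares for the two shift limits below is an assumed, simple implementation.

```python
import numpy as np

# Fit delta(pH) = delta_base + (delta_acid - delta_base)/(1 + 10**(pH - pKa))
# by grid search over pKa; for each candidate pKa the two chemical shift
# limits enter linearly, so they are solved by least squares.
def fit_titration(ph, delta, pka_grid=np.linspace(2, 12, 1001)):
    best = None
    for pka in pka_grid:
        f = 1 / (1 + 10 ** (ph - pka))       # protonated fraction at this pKa
        A = np.column_stack([1 - f, f])      # columns: [delta_base, delta_acid]
        coef, *_ = np.linalg.lstsq(A, delta, rcond=None)
        r = delta - A @ coef
        sse = r @ r
        if best is None or sse < best[0]:
            best = (sse, pka, coef[0], coef[1])
    _, pka, d_base, d_acid = best
    return pka, d_acid, d_base
```

Applied per peak, this recovers the acid dissociation constant and the acid/base chemical shift limits that the paper tabulates for the 33 identified metabolites.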