Publication


Featured research published by Elon Correa.


Analytica Chimica Acta | 2015

A tutorial review: Metabolomics and partial least squares-discriminant analysis – a marriage of convenience or a shotgun wedding

Piotr S. Gromski; Howbeer Muhamadali; David I. Ellis; Yun Xu; Elon Correa; Michael L. Turner; Royston Goodacre

The predominance of partial least squares-discriminant analysis (PLS-DA) used to analyze metabolomics datasets (indeed, it is the most well-known tool to perform classification and regression in metabolomics), can be said to have led to the point that not all researchers are fully aware of alternative multivariate classification algorithms. This may in part be due to the widespread availability of PLS-DA in most of the well-known statistical software packages, where its implementation is very easy if the default settings are used. In addition, one of the perceived advantages of PLS-DA is that it has the ability to analyze highly collinear and noisy data. Furthermore, the calibration model is known to provide a variety of useful statistics, such as prediction accuracy as well as scores and loadings plots. However, this method may provide misleading results, largely due to a lack of suitable statistical validation, when used by non-experts who are not aware of its potential limitations when used in conjunction with metabolomics. This tutorial review aims to provide an introductory overview to several straightforward statistical methods such as principal component-discriminant function analysis (PC-DFA), support vector machines (SVM) and random forests (RF), which could very easily be used either to augment PLS or as alternative supervised learning methods to PLS-DA. These methods can be said to be particularly appropriate for the analysis of large, highly-complex data sets which are common output(s) in metabolomics studies where the numbers of variables often far exceed the number of samples. In addition, these alternative techniques may be useful tools for generating parsimonious models through feature selection and data reduction, as well as providing more propitious results. We sincerely hope that the general reader is left with little doubt that there are several promising and readily available alternatives to PLS-DA, to analyze large and highly complex data sets.


Numerical Algorithms | 2004

A Genetic Algorithm for Solving a Capacitated p-Median Problem

Elon Correa; Maria Teresinha Arns Steiner; Alex Alves Freitas; Celso Carnieri

Facility-location problems have several applications, such as telecommunications, industrial transportation and distribution. One of the most well-known facility-location problems is the p-median problem. This work addresses an application of the capacitated p-median problem to a real-world problem. We propose a genetic algorithm (GA) to solve the capacitated p-median problem. The proposed GA uses not only conventional genetic operators, but also a new heuristic “hypermutation” operator suggested in this work. The proposed GA is compared with a tabu search algorithm.
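As a rough illustration of what a GA for this problem has to evaluate, the sketch below scores one candidate solution (a set of median indices) by assigning each customer to the nearest median that still has spare capacity. The greedy assignment rule, the function name `assign_customers`, and the toy instance are assumptions made for illustration; the abstract does not describe the paper's fitness function or its "hypermutation" operator, so neither is reproduced here.

```python
import math

def assign_customers(customers, demands, medians, capacity):
    """Greedy assignment of customers to capacitated medians.

    customers: list of (x, y) points; medians: indices into customers.
    Returns (total_distance, assignment), or (math.inf, None) when some
    customer cannot be served. Illustrative heuristic only.
    """
    remaining = {m: capacity for m in medians}
    assignment = {}
    total = 0.0
    # Serve the largest demands first so they are less likely to be stranded.
    order = sorted(range(len(customers)), key=lambda i: -demands[i])
    for i in order:
        # Try medians from nearest to farthest.
        ranked = sorted(medians, key=lambda m: math.dist(customers[i], customers[m]))
        for m in ranked:
            if remaining[m] >= demands[i]:
                remaining[m] -= demands[i]
                assignment[i] = m
                total += math.dist(customers[i], customers[m])
                break
        else:
            # No median had enough spare capacity for this customer.
            return math.inf, None
    return total, assignment

# Hypothetical toy instance: five points on a line, unit demands.
points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
demand = [1, 1, 1, 1, 1]
cost, plan = assign_customers(points, demand, medians=[1, 3], capacity=3)
```

A GA could use the returned distance as the value to minimize, treating infeasible candidates (infinite cost) as the worst possible individuals.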


Chemical Science | 2014

Simultaneous detection and quantification of three bacterial meningitis pathogens by SERS

Kirsten Gracie; Elon Correa; Samuel Mabbott; Jennifer A. Dougan; Duncan Graham; Royston Goodacre; Karen Faulds

Bacterial meningitis is well known for its rapid onset and high mortality rates; therefore, rapid detection of bacteria found in cerebral spinal fluid (CSF) and subsequent effective treatment are crucial. A new quantitative assay for detection of three pathogens that result in bacterial meningitis using a combination of lambda exonuclease (λ-exonuclease) and surface enhanced Raman scattering (SERS) is reported. SERS challenges current fluorescent-based detection methods in terms of both sensitivity and, more importantly, the detection of multiple components in a mixture, which is becoming increasingly desirable for clinical diagnostics. λ-Exonuclease is a processive enzyme that digests one strand of double stranded DNA bearing a terminal 5′-phosphate group. The new assay format involves the simultaneous hybridisation of two complementary DNA probes (one containing a SERS active dye) to a target sequence followed by λ-exonuclease digestion of double stranded DNA and SERS detection of the digestion product. Three meningitis pathogens were successfully quantified in a multiplexed test with calculated limits of detection in the pico-molar range, eliminating the need for time consuming culture based methods that are currently used for analysis. Quantification of each individual pathogen in a mixture using SERS is complex; however, this is the first report that this is possible using the unique spectral features of the SERS signals combined with partial least squares (PLS) regression. This is a powerful demonstration of the ability of this SERS assay to be used for analysis of clinically relevant targets with significant advantages over existing approaches and offers the opportunity for future deployment in healthcare applications.


Genetic and Evolutionary Computation Conference | 2006

A new discrete particle swarm algorithm applied to attribute selection in a bioinformatics data set

Elon Correa; Alex Alves Freitas; Colin G. Johnson

Many data mining applications involve the task of building a model for predictive classification. The goal of such a model is to classify examples (records or data instances) into classes or categories of the same type. The use of variables (attributes) not related to the classes can reduce the accuracy and reliability of a classification or prediction model. Superfluous variables can also increase the costs of building a model, particularly on large data sets. We propose a discrete Particle Swarm Optimization (PSO) algorithm designed for attribute selection. The proposed algorithm deals with discrete variables, and its population of candidate solutions contains particles of different sizes. The performance of this algorithm is compared with the performance of a standard binary PSO algorithm on the task of selecting attributes in a bioinformatics data set. The criteria used for comparison are: (1) maximizing predictive accuracy; and (2) finding the smallest subset of attributes.
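For orientation, the baseline named in the abstract, a standard binary PSO, can be sketched as follows. This is the textbook sigmoid-based formulation, not the discrete PSO proposed in the paper (whose particles have varying sizes and are not described here in enough detail to reproduce). The parameter values, the velocity clamp of ±4, and the toy fitness function, which rewards two hypothetical "relevant" attributes and penalizes subset size, are all assumptions.

```python
import math
import random

def binary_pso(fitness, n_bits, n_particles=10, n_iter=30,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize fitness over bit vectors; bit d = 1 means attribute d is selected."""
    rng = random.Random(seed)
    xs = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_particles)]
    vs = [[0.0] * n_bits for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pbest_fit = [fitness(x) for x in xs]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_bits):
                # Velocity update: inertia + pull toward personal and global bests.
                vs[i][d] = (w * vs[i][d]
                            + c1 * rng.random() * (pbest[i][d] - xs[i][d])
                            + c2 * rng.random() * (gbest[d] - xs[i][d]))
                vs[i][d] = max(-4.0, min(4.0, vs[i][d]))  # clamp velocity
                # The sigmoid of the velocity is the probability that the bit is 1.
                prob = 1.0 / (1.0 + math.exp(-vs[i][d]))
                xs[i][d] = 1 if rng.random() < prob else 0
            f = fitness(xs[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = xs[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = xs[i][:], f
    return gbest, gbest_fit

# Hypothetical toy problem: attributes 0 and 2 are relevant, the rest are noise.
RELEVANT = {0, 2}

def toy_fitness(bits):
    # Reward relevant attributes, charge a small cost per selected attribute.
    return sum(bits[i] for i in RELEVANT) - 0.1 * sum(bits)

best, best_fit = binary_pso(toy_fitness, n_bits=6)
```

In a real attribute-selection setting the fitness would be a classifier's estimated predictive accuracy on the selected attributes, which mirrors the two comparison criteria listed above (accuracy and subset size).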


Metabolites | 2014

Influence of Missing Values Substitutes on Multivariate Analysis of Metabolomics Data

Piotr S. Gromski; Yun Xu; Helen L. Kotze; Elon Correa; David I. Ellis; Emily G. Armitage; Michael L. Turner; Royston Goodacre

Missing values are known to be problematic for the analysis of gas chromatography-mass spectrometry (GC-MS) metabolomics data. Typically these values cover about 10%–20% of all data and can originate from various backgrounds, including analytical, computational, as well as biological. Currently, the most well known substitute for missing values is a mean imputation. In fact, some researchers consider this aspect of data analysis in their metabolomics pipeline as so routine that they do not even mention using this replacement approach. However, this may have a significant influence on the data analysis output(s) and might be highly sensitive to the distribution of samples between different classes. Therefore, in this study we have analysed different substitutes of missing values namely: zero, mean, median, k-nearest neighbours (kNN) and random forest (RF) imputation, in terms of their influence on unsupervised and supervised learning and, thus, their impact on the final output(s) in terms of biological interpretation. These comparisons have been demonstrated both visually and computationally (classification rate) to support our findings. The results show that the selection of the replacement methods to impute missing values may have a considerable effect on the classification accuracy; if performed incorrectly, this may negatively influence the biomarkers selected for an early disease diagnosis or identification of cancer related metabolites. In the case of GC-MS metabolomics data studied here, our findings recommend that RF should be favored as an imputation of missing values over the other tested methods. This approach displayed excellent results in terms of classification rate for both supervised methods, namely principal components-linear discriminant analysis (PC-LDA) (98.02%) and partial least squares-discriminant analysis (PLS-DA) (97.96%), outperforming other imputation methods.
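To make the simpler substitutes concrete, here is a minimal sketch of zero, mean, median and kNN imputation on toy data. It is not the pipeline used in the study: the function names, the use of `None` to mark missing values, and the distance used for kNN (root mean squared difference over features observed in both rows) are assumptions, and RF imputation is omitted because it needs a full random forest implementation.

```python
import math
import statistics

def impute_column(values, strategy="mean"):
    """Replace None entries in one feature column with a single fill value."""
    observed = [v for v in values if v is not None]
    if strategy == "zero":
        fill = 0.0
    elif strategy == "mean":
        fill = statistics.fmean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]

def knn_impute(rows, k=2):
    """Fill each missing entry with the mean of that feature over the k nearest rows.

    Raises statistics.StatisticsError if no other row has the feature observed.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        # Normalize by the number of shared features so rows stay comparable.
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows where feature j was observed.
            donors = [(distance(row, other), other[j])
                      for n, other in enumerate(rows)
                      if n != i and other[j] is not None]
            donors.sort(key=lambda d: d[0])
            filled[i][j] = statistics.fmean(v for _, v in donors[:k])
    return filled

# Hypothetical data: one column with an outlier, and a small sample-by-feature table.
column = [1.0, None, 3.0, 100.0]
by_mean = impute_column(column, "mean")
by_median = impute_column(column, "median")
table = [[1.0, 2.0], [1.1, None], [9.0, 20.0], [1.2, 2.2]]
by_knn = knn_impute(table, k=2)
```

On this toy column the mean fill is dragged toward the outlier while the median fill is not, and the kNN fill borrows from the two most similar samples; this is the kind of sensitivity to the substitute that the abstract reports.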


Metabolomics | 2016

Data standards can boost metabolomics research, and if there is a will, there is a way

Philippe Rocca-Serra; Reza M. Salek; Masanori Arita; Elon Correa; Saravanan Dayalan; Alejandra Gonzalez-Beltran; Timothy M. D. Ebbels; Royston Goodacre; Janna Hastings; Kenneth Haug; Albert Koulman; Macha Nikolski; Matej Orešič; Susanna-Assunta Sansone; Daniel Schober; J. Smith; Christoph Steinbeck; Mark R. Viant; Steffen Neumann

Thousands of articles using metabolomics approaches are published every year. With the increasing amounts of data being produced, mere description of investigations as text in manuscripts is not sufficient to enable re-use anymore: the underlying data needs to be published together with the findings in the literature to maximise the benefit from public and private expenditure and to take advantage of an enormous opportunity to improve scientific reproducibility in metabolomics and cognate disciplines. Reporting recommendations in metabolomics started to emerge about a decade ago and were mostly concerned with inventories of the information that had to be reported in the literature for consistency. In recent years, metabolomics data standards have developed extensively, to include the primary research data, derived results and the experimental description and importantly the metadata in a machine-readable way. This includes vendor independent data standards such as mzML for mass spectrometry and nmrML for NMR raw data that have both enabled the development of advanced data processing algorithms by the scientific community. Standards such as ISA-Tab cover essential metadata, including the experimental design, the applied protocols, association between samples, data files and the experimental factors for further statistical analysis. Altogether, they pave the way for both reproducible research and data reuse, including meta-analyses. Further incentives to prepare standards compliant data sets include new opportunities to publish data sets, but also require a little “arm twisting” in the author guidelines of scientific journals to submit the data sets to public repositories such as the NIH Metabolomics Workbench or MetaboLights at EMBL-EBI. In the present article, we look at standards for data sharing, investigate their impact in metabolomics and give suggestions to improve their adoption.


Analytica Chimica Acta | 2014

A comparative investigation of modern feature selection and classification approaches for the analysis of mass spectrometry data.

Piotr S. Gromski; Yun Xu; Elon Correa; David I. Ellis; Michael L. Turner; Royston Goodacre

Many analytical approaches such as mass spectrometry generate large amounts of data (input variables) per sample analysed, and not all of these variables are important or related to the target output of interest. The selection of a smaller number of variables prior to sample classification is a widespread task in many research studies, where attempts are made to seek the lowest possible set of variables that are still able to achieve a high level of prediction accuracy; in other words, there is a need to generate the most parsimonious solution when the number of input variables is huge but the number of samples/objects are smaller. Here, we compare several different variable selection approaches in order to ascertain which of these are ideally suited to achieve this goal. All variable selection approaches were applied to the analysis of a common set of metabolomics data generated by Curie-point pyrolysis mass spectrometry (Py-MS), where the goal of the study was to classify the Gram-positive bacteria Bacillus. These approaches include stepwise forward variable selection, used for linear discriminant analysis (LDA); variable importance for projection (VIP) coefficient, employed in partial least squares-discriminant analysis (PLS-DA); support vector machines-recursive feature elimination (SVM-RFE); as well as the mean decrease in accuracy and mean decrease in Gini, provided by random forests (RF). Finally, a double cross-validation procedure was applied to minimize the consequence of overfitting. The results revealed that RF with its variable selection techniques and SVM combined with SVM-RFE as a variable selection method, displayed the best results in comparison to other approaches.
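The double cross-validation mentioned above can be pictured as two nested loops: the inner loop chooses variables or tuning parameters, and the outer loop estimates accuracy on samples the inner loop never saw. The sketch below only generates the index splits. The fold counts, the function names, and the simple interleaved fold assignment (no shuffling or class stratification) are assumptions, not the procedure used in the paper.

```python
def k_fold_indices(indices, k):
    """Split a list of indices into k disjoint, nearly equal folds."""
    return [indices[i::k] for i in range(k)]

def double_cv_splits(n_samples, outer_k=3, inner_k=2):
    """Yield (inner_splits, outer_train, outer_test) for nested cross-validation.

    inner_splits is a list of (inner_train, inner_val) pairs drawn only from
    outer_train, so the outer test fold never influences model selection.
    """
    outer_folds = k_fold_indices(list(range(n_samples)), outer_k)
    for o, outer_test in enumerate(outer_folds):
        outer_train = [i for f, fold in enumerate(outer_folds) if f != o for i in fold]
        inner_folds = k_fold_indices(outer_train, inner_k)
        inner_splits = []
        for v, inner_val in enumerate(inner_folds):
            inner_train = [i for f, fold in enumerate(inner_folds)
                           if f != v for i in fold]
            inner_splits.append((inner_train, inner_val))
        yield inner_splits, outer_train, outer_test

splits = list(double_cv_splits(6, outer_k=3, inner_k=2))
```

Variable selection would be run inside each inner split, and the selected variables evaluated once on the corresponding outer test fold, which is what limits the optimistic bias that overfitting would otherwise introduce.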


Analytical Chemistry | 2013

Optimization of Parameters for the Quantitative Surface-Enhanced Raman Scattering Detection of Mephedrone Using a Fractional Factorial Design and a Portable Raman Spectrometer.

Samuel Mabbott; Elon Correa; David P. Cowcher; J. William Allwood; Royston Goodacre

A new optimization strategy for the SERS detection of mephedrone using a portable Raman system has been developed. A fractional factorial design was employed, and the number of statistically significant experiments (288) was greatly reduced from the actual total number of experiments (1722), which minimized the workload while maintaining the statistical integrity of the results. A number of conditions were explored in relation to mephedrone SERS signal optimization including the type of nanoparticle, pH, and aggregating agents (salts). Through exercising this design, it was possible to derive the significance of each of the individual variables, and we discovered four optimized SERS protocols for which the reproducibility of the SERS signal and the limit of detection (LOD) of mephedrone were established. Using traditional nanoparticles with a combination of salts and pHs, it was shown that the relative standard deviations of mephedrone-specific Raman peaks were as low as 0.51%, and the LOD was estimated to be around 1.6 μg/mL (9.06 × 10⁻⁶ M), a detection limit well beyond the scope of conventional Raman and extremely low for an analytical method optimized for quick and uncomplicated in-field use.
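The idea behind a fractional factorial design, running a structured subset of all factor combinations and still estimating each factor's effect, can be shown with the smallest textbook case: a two-level half fraction. This is not the design used in the study, which involved multi-level factors such as nanoparticle type, pH and salts; the function names, the response model and its coefficients are hypothetical.

```python
from itertools import product

def full_factorial(n_factors):
    """All 2^n combinations of low (-1) and high (+1) levels."""
    return [list(run) for run in product((-1, 1), repeat=n_factors)]

def half_fraction(n_factors):
    """2^(n-1) design: the last factor's level is the product of the others."""
    runs = []
    for base in product((-1, 1), repeat=n_factors - 1):
        last = 1
        for level in base:
            last *= level
        runs.append(list(base) + [last])
    return runs

def main_effect(runs, responses, factor):
    """Mean response at the high level minus mean response at the low level."""
    hi = [y for run, y in zip(runs, responses) if run[factor] == 1]
    lo = [y for run, y in zip(runs, responses) if run[factor] == -1]
    return sum(hi) / len(hi) - sum(lo) / len(lo)

design = half_fraction(3)  # 4 runs instead of the 8 in full_factorial(3)

# Hypothetical noise-free response with no interactions: y = 10 + 2A + 0B - 1C.
responses = [10 + 2 * a + 0 * b - 1 * c for a, b, c in design]
effects = [main_effect(design, responses, f) for f in range(3)]
```

Each estimated main effect is twice the corresponding coefficient because the levels are coded as -1 and +1. The price of halving the runs is aliasing: in this three-factor half fraction each main effect is confounded with the interaction of the other two factors, so the estimates are only trustworthy when such interactions are negligible.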


Metabolomics | 2015

COordination of Standards in MetabOlomicS (COSMOS): facilitating integrated metabolomics data access

Reza M. Salek; Steffen Neumann; Daniel Schober; Jan Hummel; Kenny Billiau; Joachim Kopka; Elon Correa; Theo H. Reijmers; Antonio Rosato; Leonardo Tenori; Paola Turano; Silvia Marin; Catherine Deborde; Daniel Jacob; Dominique Rolin; Benjamin Dartigues; Pablo Conesa; Kenneth Haug; Philippe Rocca-Serra; Steve O’Hagan; Jie Hao; Michael van Vliet; Marko Sysi-Aho; Christian Ludwig; Jildau Bouwman; Marta Cascante; Timothy M. D. Ebbels; Julian L. Griffin; Annick Moing; Macha Nikolski

Metabolomics has become a crucial phenotyping technique in a range of research fields including medicine, the life sciences, biotechnology and the environmental sciences. This necessitates the transfer of experimental information between research groups, as well as potentially to publishers and funders. After the initial efforts of the metabolomics standards initiative, minimum reporting standards were proposed which included the concepts for metabolomics databases. Built by the community, standards and infrastructure for metabolomics are still needed to allow storage, exchange, comparison and re-utilization of metabolomics data. The Framework Programme 7 EU Initiative ‘coordination of standards in metabolomics’ (COSMOS) is developing a robust data infrastructure and exchange standards for metabolomics data and metadata. This is to support workflows for a broad range of metabolomics applications within the European metabolomics community and the wider metabolomics and biomedical communities’ participation. Here we announce our concepts and efforts asking for re-engagement of the metabolomics community, academics and industry, journal publishers, software and hardware vendors, as well as those interested in standardisation worldwide (addressing missing metabolomics ontologies, complex-metadata capturing and XML based open source data exchange format), to join and work towards updating and implementing metabolomics standards.


BMC Bioinformatics | 2011

A genetic algorithm-Bayesian network approach for the analysis of metabolomics and spectroscopic data: application to the rapid identification of Bacillus spores and classification of Bacillus species

Elon Correa; Royston Goodacre

Background: The rapid identification of Bacillus spores and bacterial identification are paramount because of their implications in food poisoning, pathogenesis and their use as potential biowarfare agents. Many automated analytical techniques such as Curie-point pyrolysis mass spectrometry (Py-MS) have been used to identify bacterial spores, giving rise to large amounts of analytical data. This high number of features makes interpretation of the data extremely difficult. We analysed Py-MS data from 36 different strains of aerobic endospore-forming bacteria encompassing seven different species. These bacteria were grown axenically on nutrient agar and vegetative biomass and spores were analyzed by Curie-point Py-MS. Results: We develop a novel genetic algorithm-Bayesian network algorithm that accurately identifies and selects a small subset of key relevant mass spectra (biomarkers) to be further analysed. Once identified, this subset of relevant biomarkers was then used to identify Bacillus spores successfully and to identify Bacillus species via a Bayesian network model specifically built for this reduced set of features. Conclusions: This final compact Bayesian network classification model is parsimonious, computationally fast to run and its graphical visualization allows easy interpretation of the probabilistic relationships among selected biomarkers. In addition, we compare the features selected by the genetic algorithm-Bayesian network approach with the features selected by partial least squares-discriminant analysis (PLS-DA). The classification accuracy results show that the set of features selected by the GA-BN is far superior to that selected by PLS-DA.

Collaboration


Dive into Elon Correa's collaborations.

Top Co-Authors

Yun Xu

University of Manchester

David I. Ellis

University of Manchester

David M. Lee

University of Manchester
