Advancing Standards-Free Methods for the Identification of Small Molecules in Complex Samples

Abstract

The current gold standard for unambiguous identification in metabolomics analysis is based on comparing two or more orthogonal properties from the analysis of authentic, pure reference materials (standards) to experimental data acquired in the same laboratory with the same analytical methods. This represents a significant limitation for comprehensive chemical identification of small molecules in complex samples since this process is time-consuming and costly, and the majority of molecules are not yet represented by standards, leading to a need for standards-free identification. To address this need, we are advancing chemical property calculations and developing multi-attribute scoring and matching algorithms to utilize data from multiple analytical platforms through the utilization and creation of the in silico Chemical Library Engine (ISiCLE) and the Multi-Attribute Matching Engine (MAME). Here, we describe our results in a blinded analysis of synthetic chemical mixtures as part of the U.S. Environmental Protection Agency's (EPA) Non-Targeted Analysis Collaborative Trial (ENTACT). The blinded false negative rate (FNR), false discovery rate (FDR), and accuracy were 57%, 77%, and 91%, respectively. For high confidence identifications, the FDR was 35%. After unblinding of the sample compositions, we improved our approach by optimizing the scoring parameters used to increase confidence. The final FNR, FDR, and accuracy were 67%, 53%, and 96%, respectively. For high confidence identifications, the FDR was 10%. This study demonstrates that standards-free small molecule identification and multi-attribute matching methods can significantly reduce reliance on standards.


Jamie R. Nuñez, Sean M. Colby, Dennis G. Thomas, Malak M. Tfaily, Nikola Tolic, Elin M. Ulrich, Jon R. Sobus, Thomas O. Metz, Justin G. Teeguarden, Ryan S. Renslow

Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, WA, USA. U.S. Environmental Protection Agency, Office of Research and Development, National Exposure Research Laboratory, Research Triangle Park, NC, USA. Department of Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR, USA.

* [email protected]


INTRODUCTION

Conventional metabolomics and small molecule identification approaches have demonstrated immense value for disease diagnosis, evaluation of environmental exposures, and discovery of novel molecules. This success is reflected in the large number of recent biomedical research, environmental exposure studies, and soil and ecology publications employing metabolomics approaches.

In contrast to genetic and proteomic information available from rapid genome sequencing and proteome characterization, far less is understood about the totality of human exposure and small molecules found in the environment.

Furthermore, driven by a broader interest in understanding biological impacts of chemical exposures, biomonitoring is undergoing a significant evolution.

Traditional biomonitoring approaches, either targeted (seeking to identify specific compounds) or non-targeted (seeking to identify as many compounds as possible), and using either low or high resolution mass spectrometry, rely on authentic, pure reference materials (standards) for unambiguous chemical identification, and are therefore limited to the subset of molecules for which these standards exist. A wealth of information about human exposure continues to emerge from these methods for a subset of chemical space confined to a priority list of molecules represented by standards. The Centers for Disease Control and Prevention (CDC) National Health and Nutrition Examination Survey (NHANES) program and the National Institutes of Health (NIH) Children's Health Exposure Analysis Resource (CHEAR) centers have provided such data, leading to examples of successful applications of these methods.

Recent proposals to characterize the whole metabolome and exposome (the aggregate of all exposures) are driving a shift from traditional quantitative analytical chemistry and the typical strictures for chemical identification to new methods applicable to the discovery of molecules for which there are no standards. The vast chemical space of the metabolome and exposome together includes endogenous (e.g., molecular transducers and microbiomes) and exogenous (e.g., xenobiotics, industrial chemicals, consumer products, and transformation products of these) chemicals. There are not enough authentic reference materials for the preponderance of these molecules. For example, using an automated script, we found that only 17% of compounds in the Human Metabolome Database (HMDB) and less than 2% of compounds in exposure chemical databases like the EPA DSSTox (comptox.epa.gov) can be purchased in pure form. Without chemical standards, unambiguous chemical identification is limited to the small number of molecules amenable to nuclear magnetic resonance spectroscopy- or crystallography-based structural elucidation, while the vast majority is left as chemical “dark matter”. The need for more comprehensive and unambiguous chemical identification in these studies is driving innovations in analytical chemistry, computational chemistry, and cheminformatics.

For example, new targeted and non-targeted methods have emerged as adaptations to traditional analytical chemistry. We are advancing standards-free metabolomics, the identification of small molecules without reliance upon standards, through the use of calculated chemical properties and associated matching using multiple experimental attributes (i.e., multi-attribute matching). Our approach currently relies on multiple experimental data types, including accurate mass, isotopic distribution, and collisional cross section (CCS), and comparison of these values to entries in in silico libraries, leveraging instrumental and computational innovations developed at Pacific Northwest National Laboratory (PNNL).

To evaluate our methodology, we participated in the U.S. Environmental Protection Agency’s (EPA) Non-Targeted Analysis Collaborative Trial (ENTACT), an inter-laboratory challenge established to provide a consistent set of verified, blinded synthetic mixtures for the objective testing of non-targeted analytical chemistry methods.

We performed blinded analysis of 10 synthetic mixtures, each containing a number of substances that was unknown at the time, as part of the multi-laboratory challenge. Accurate mass, isotopic signature, and CCS measurements were collected using ion-mobility spectrometry-mass spectrometry (IMS-MS) and ultra-high resolution 21-Tesla Fourier transform ion cyclotron resonance-mass spectrometry (FTICR-MS) (Figure 1). These properties were also calculated for each molecule in our processed form of the EPA Toxicity Forecaster (ToxCast) library, allowing us to match observed features (e.g., peaks characterized by a measured mass and intensity, or a measured mass, CCS, and intensity) to library entries. After unblinding, we performed statistical analysis on the results to determine how well our method performed compared to others. Scoring parameters were then optimized to improve our method and to better understand the importance of each parameter in our scoring algorithm. Our findings demonstrate the potential of standards-free small molecule identification methods, particularly the value of using calculated, orthogonal properties, such as CCS and accurate mass, and multi-attribute matching to increase confidence in compound identification and significantly reduce reliance on standards.

MATERIALS AND METHODS

Ten synthetic mixtures were provided by the EPA. Each mixture contained an unknown number of substances (later revealed as 95-365 substances) in dimethylsulfoxide (DMSO), with all substances selected from EPA’s ToxCast chemical library. Further details on ENTACT are outlined in Sobus et al. (2017) and Ulrich et al. (2018).

Briefly, samples were analyzed using both an Agilent 6560 drift tube ion mobility spectrometry-quadrupole time-of-flight mass spectrometer (IMS-MS) and a 21-Tesla Fourier transform-ion cyclotron resonance spectrometer coupled to a Velos Pro dual linear quadrupole mass spectrometer (FTICR-MS) in both positive (+) and negative (-) ionization modes (Figure 1). IMS-MS samples were analyzed using electrospray ionization (ESI) and atmospheric pressure photoionization (APPI). FTICR-MS samples were analyzed using ESI only. Samples were analyzed in triplicate using IMS-MS and in singlet using FTICR-MS. This resulted in 14 disparate experimental data sets per sample. Appropriate sample blanks provided by the EPA were also analyzed using each instrument method. Extensive details regarding the experimental protocol for sample preparation and both mass spectrometry methods are provided in Supporting Information (SI) 1.0-3.0.

Mass, CCS, and isotopic signature were calculated for the [M+H]+, [M+Na]+, and [M-H]- adducts of each entry in the suspect library. We recently developed an automated high-accuracy method for calculating CCS and other chemical properties called the in silico Chemical Library Engine (ISiCLE), only requiring chemical structure information (e.g., as provided by the InChI). The ISiCLE module for calculating IMS CCS for molecules has three methods for calculating CCS (Standard, Lite, and AIMD-based), of which the Standard and Lite methods were used for this study. Complete details regarding CCS calculation methods are provided in SI 4.0. At its current stage of development, the Standard method has an average error of 3.2% and the Lite method has an average error of 6.7% (SI Table 1); however, the Lite method is much less computationally intensive, making it more than 200 times faster. The Standard method was used for calculating the CCS of all three adducts from a selected subset of 1,000 molecules that showed significant evidence (early in our analysis) of being present in the mixtures. The Lite method was then used for the remaining molecules, as an appropriate tradeoff between accuracy and computational cost based on the scope of the project. The CCS calculation method and results for each entry in the suspect library are provided in the Supplemental Data. Details on how we processed the ToxCast library to generate our suspect library are provided in SI 5.1.

Figure 1. Project overview. Detailed project flow, starting from the blinded mixtures and ToxCast Library (the given suspect screening library). After instrumental analysis of the mixtures and computational property calculations for library entries, our multi-attribute scoring algorithm was used for assigning confidence and identifying compounds likely to be present in each mixture. Note substances can be composed of one or more molecules that separate upon solvation in liquid. Molecules are single molecular structures.

Ecipex was used to calculate the isotopic signature of each adduct of each molecule in the suspect library. Once high-mass resolution data was collected using FTICR-MS, there was evidence for a significant presence of chlorinated compounds, leading to the additional calculation of chlorinated library entries (giving a total of four adducts per molecule with calculated isotopic signatures). Formularity was then used to match calculated and observed isotopic signatures. Note, Formularity was not used on the IMS-MS data sets due to instrumental error being too high to reliably assign formulae to potential isotopic signatures. More details on how Formularity was used are provided in SI 3.2.

We developed a comprehensive identification package, the Multi-Attribute Matching Engine (MAME), which includes feature downselection and a scoring system to provide confidence scores (broken into low, medium, and high confidence). The confidence scores are increased by the number and quality of experimental features that match to those in the in silico library for a given entry and provide increasing evidence for the presence of the molecule in the sample. We scored the confidence of each suspect library entry being in each mixture using our weighting method.

Note that the method described here does not label specific features as belonging to a specific compound (i.e., directly linking a feature arising due to instrument response to a specific compound), which is common in the literature.
Instead, our scoring system considers all evidence indicating the presence of a compound, where multiple features consistent with possible instrument responses of a compound increase the probability of that compound’s presence. The focus is to connect the experimental evidence to the presence of specific compounds, rather than attempt to prove that specific features resulted from specific compounds. This is an important distinction as it is not always possible to label a feature as belonging to a particular compound, especially in the case of complex samples. Instead, we use multiple experimental features to lend confidence to the presence of a compound within a sample, without attempting to label individual features.

Downselection of candidate features and molecular library entries and confidence scoring were performed using MAME, which processed all 14 disparate raw data sets per sample to achieve standards-free, multi-attribute, aggregate evidence-based molecular identification. A set of parameter cutoffs was used for data pre-processing (Table 1). For example, for an IMS-MS feature to be counted toward the confidence score of a molecule, it needed to (i) be observed in all three technical replicates, (ii) have a signal intensity ≥ 1000 (arbitrary units), (iii) have a mass measurement error ≤ ±6 ppm, and (iv) not have been observed in more than one blank (which also had three technical replicates). For an FTICR-MS feature to be counted toward the confidence score of a molecule, it needed to (i) have a mass measurement error ≤ ±1.5 ppm, and (ii) not have been observed in the blank run. Initially, these cutoff criteria were chosen based on expert domain knowledge.

Once all analytical features were processed and matched to corresponding entries in the suspect library, we scored the confidence of each library entry being in each mixture using MAME, which uses a total of 11 independent scoring parameters (Table 2).
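
The feature-level cutoffs described above (Table 1) can be illustrated with a minimal Python sketch. This is not the MAME implementation: the `Feature` fields and the function names are hypothetical, and the only values taken from the text are the intensity, mass error, replicate, and blank criteria.

```python
from dataclasses import dataclass

# Cutoffs quoted in the text (Table 1); everything else here is illustrative.
IMS_MIN_INTENSITY = 1000.0   # arbitrary units
IMS_MAX_PPM = 6.0            # allowed mass error magnitude for IMS-MS
FTICR_MAX_PPM = 1.5          # allowed mass error magnitude for FTICR-MS


@dataclass
class Feature:
    """Hypothetical container for one observed feature."""
    mz: float                 # measured m/z
    intensity: float          # signal intensity (a.u.)
    replicates_observed: int  # technical replicates containing the feature
    blanks_observed: int      # blank runs containing the feature


def ppm_error(measured_mz: float, calculated_mz: float) -> float:
    """Signed mass measurement error in parts per million."""
    return (measured_mz - calculated_mz) / calculated_mz * 1e6


def keep_ims_feature(f: Feature, calculated_mz: float) -> bool:
    """IMS-MS criteria (i)-(iv) from the text."""
    return (
        f.replicates_observed == 3
        and f.intensity >= IMS_MIN_INTENSITY
        and abs(ppm_error(f.mz, calculated_mz)) <= IMS_MAX_PPM
        and f.blanks_observed <= 1
    )


def keep_fticr_feature(f: Feature, calculated_mz: float) -> bool:
    """FTICR-MS criteria (i)-(ii) from the text (single replicate, single blank)."""
    return (
        abs(ppm_error(f.mz, calculated_mz)) <= FTICR_MAX_PPM
        and f.blanks_observed == 0
    )
```

A feature that passes these checks would then count toward the confidence score of the matched library entry; features that fail are simply ignored rather than penalized in this sketch.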
The scoring parameters were initially selected based on expert domain knowledge in our group, since this type of study had not been pursued previously. A library entry was labeled as “present” in the mixture if its confidence score was 6.0 or more. We decided evidence amounting to this score (e.g., observing a high intensity FTICR-MS feature (4 points) for a library entry with a unique mass (2 points)) was enough to earn this label. Confidence scores of 6.0-11.0, 11.0-19.0, and 19.0+ were labelled as low, medium, and high confidence, respectively. In addition, we apply the level system developed by Schymanski et al., which can be used to evaluate the level of confidence based on evidence provided by orthogonal features. Based on the given definitions, we use Level 2a to indicate identification based on mass and CCS, Level 4 for mass and isotopic signature, and Level 5 for identification based on mass alone. A more detailed description of MAME is included in SI 5.2-5.3 and the full software package is available upon request. As an example, Figure 2 shows how pioglitazone was scored and correctly labeled as present in one of the mixtures.

As metrics to quantify success, we used false discovery rate (FDR, the percentage of false positives out of the total number of compounds labeled as present), false negative rate (FNR, also known as the miss rate, the percentage of false negatives out of how many compounds were spiked in by the EPA), and accuracy (the percentage of correct labels). Equations for each of these metrics are provided in SI 5.4. The overall goal of our method is to minimize FDR and FNR, while maximizing overall accuracy. When reporting these values here, we use the average across the ten mixtures. These metrics (and more) are broken down for each mixture in the Supplemental Data.

It is important to note our analysis and statistics were based on identifying which structures were observed, not on identifying the correct parent compound. For example, cyclohexylamine and cyclohexylamine hydrochloride were both suspects provided in the ToxCast library. Since these are indistinguishable in solution, they were grouped into a single entry within our suspect library. We define successful identification of these compounds to mean we report both parent compounds as potential candidates when one or both was spiked into a mixture.

Table 1. Parameter cutoffs used for scoring and downselection.

Category             Parameter                 Cutoff
IMS-MS               Intensity                 ≥ 1000 a.u. (a)
                     Mass Error (Magnitude)    ≤ ±6 ppm
FTICR-MS             Intensity                 ≥ 1 a.u.
                     Mass Error (Magnitude)    ≤ ±1.5 ppm
IMS-MS & FTICR-MS    High Intensity            ≥ 30th %ile
                     Low Intensity             < 30th %ile
Library              Unique Mass (b)           > ±6 ppm
                     Large Mass                ≥ 200 Da

(a) Arbitrary units.
(b) A library entry’s mass is considered unique if its nearest neighbor library entry is more than 6 ppm away.

Table 2. Initial scoring criteria and their associated weights.

Category             Index   Criteria                Weight
IMS-MS               1       High Intensity          2.0
                     2       Low Intensity           1.0
                     3       Low CCS Error           3.0
FTICR-MS             4       High Intensity          4.0
                     5       Low Intensity           2.0
                     6       Isotopic Signature      3.0 (a)
IMS-MS & FTICR-MS    7       Additional Adducts      1.0
                     8       Additional Features     0.5
                     9       Detected by Both MS     2.0
Library              10      Unique Mass             4.0
                     11      Large Mass              1.0

(a) Earned a maximum of one time per library entry.
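
To make the weighting scheme concrete, the following minimal Python sketch sums the Table 2 weights for a library entry and maps the total to the confidence bands given above. It is an illustration under stated assumptions, not the MAME code: the dictionary keys and function names are hypothetical, each count is assumed to contribute its weight once per occurrence (except the isotopic signature, which Table 2 caps at once per entry), and band boundaries are assumed to be inclusive at the lower end.

```python
from typing import Mapping

# Initial weights as listed in Table 2 (keys are hypothetical names).
WEIGHTS = {
    "ims_high_intensity": 2.0,
    "ims_low_intensity": 1.0,
    "ims_low_ccs_error": 3.0,
    "fticr_high_intensity": 4.0,
    "fticr_low_intensity": 2.0,
    "fticr_isotopic_signature": 3.0,
    "additional_adducts": 1.0,
    "additional_features": 0.5,
    "detected_by_both_ms": 2.0,
    "unique_mass": 4.0,
    "large_mass": 1.0,
}

# Table 2 footnote: earned a maximum of one time per library entry.
EARNED_ONCE = {"fticr_isotopic_signature"}


def confidence_score(counts: Mapping[str, int]) -> float:
    """Sum weight * count over all criteria met for one library entry."""
    total = 0.0
    for criterion, n in counts.items():
        if criterion in EARNED_ONCE:
            n = min(n, 1)
        total += WEIGHTS[criterion] * n
    return total


def confidence_label(score: float) -> str:
    """Map a score to the bands in the text (lower bounds assumed inclusive)."""
    if score >= 19.0:
        return "high"
    if score >= 11.0:
        return "medium"
    if score >= 6.0:
        return "low"
    return "not present"
```

For instance, `confidence_label(confidence_score({"fticr_high_intensity": 1, "unique_mass": 1}))` evaluates the single-feature case using the Table 2 weights as printed.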

Figure 2. Example scoring of pioglitazone, a true positive evaluated using our multi-attribute scoring system. Note, pioglitazone hydrochloride was in the ToxCast library, then changed to pioglitazone (the structure present in solution) in our processed library. a) Library entry for pioglitazone, a phenol ether drug (sold as Actos) used to control high blood sugar in patients with type 2 diabetes, with calculated CCS (using standard ISiCLE) for the three adduct types and its calculated isotopic signature. b-e) IMS-MS features observed within a ±6 ppm mass error window of a given adduct mass. A magnified view is provided, centered around the calculated mass and CCS, with the mass and CCS ranges extending 6 ppm and 20 Å², respectively, on either side of this average. Percentages are in respect to the calculated CCS. Red points indicate the experimental feature closest to our prediction. f) Combined scoring of all features. The number of features matching a specified criterion, or whether the criterion was met, is provided in the “Observed” column.

After unblinding, we used Monte Carlo and particle swarm optimization (PSO, via PySwarm) methods, implemented in Python scripts, to select new weights for each scoring criterion using an objective function to maximize the area under the precision-recall curve (AUPR). The precision-recall curve is generated by determining precision and recall, which can be derived directly from FDR and FNR, respectively (precision = 1 - FDR, recall = 1 - FNR), parameterized by a minimum confidence score cutoff. This enables performance of the scoring weights to be assessed without an explicit cutoff selection for the score, which is a nontrivial decision with implications beyond the scope of this work. AUPR was also selected as the objective function due to its relatively good performance compared to other classifier performance metrics when dealing with imbalanced datasets (i.e., significantly more true negatives compared to true positives). Further details are provided in SI 5.5.
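
The Monte Carlo part of this optimization can be sketched in a few lines of standard-library Python. This is a simplified stand-in, not the scripts used in the study: it draws random weight vectors uniformly from an assumed range, scores each library entry as a weighted sum of hypothetical per-criterion evidence counts, and keeps the weights with the highest average precision, a common step-wise estimate of AUPR. The names `evidence`, `max_weight`, and `n_trials` are assumptions, and PSO is not shown.

```python
import random
from typing import List, Sequence, Tuple


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Step-wise estimate of the area under the precision-recall curve.

    Entries are ranked by descending score; precision is averaged over the
    ranks at which a true positive (label 1) occurs. Tied scores keep their
    input order, which is a simplification.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        return 0.0
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / n_pos


def score_entries(evidence: Sequence[Sequence[float]],
                  weights: Sequence[float]) -> List[float]:
    """Confidence score per library entry: weighted sum of evidence counts."""
    return [sum(w * x for w, x in zip(weights, row)) for row in evidence]


def monte_carlo_weights(evidence: Sequence[Sequence[float]],
                        labels: Sequence[int],
                        n_trials: int = 500,
                        max_weight: float = 5.0,
                        seed: int = 0) -> Tuple[List[float], float]:
    """Random search for the weight vector that maximizes average precision."""
    rng = random.Random(seed)
    n_criteria = len(evidence[0])
    best_w: List[float] = []
    best_ap = -1.0
    for _ in range(n_trials):
        w = [rng.uniform(0.0, max_weight) for _ in range(n_criteria)]
        ap = average_precision(score_entries(evidence, w), labels)
        if ap > best_ap:
            best_w, best_ap = w, ap
    return best_w, best_ap
```

Because the objective ranks entries rather than thresholding them, no score cutoff has to be chosen during the search, which mirrors the motivation given above for using AUPR.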

RESULTS AND DISCUSSION

The foundation of our standards-free approach is the in silico construction of a library of chemical properties used to characterize experimental data collected for each sample. Our method operates by considering the consistency between the library of predicted properties and the observed analytical features, and subsequently quantifying and weighting their similarity. Calculated scores based on the evaluation of experimental features matched to library entries allow us to determine a single confidence score for each library entry and, ultimately, whether there is enough evidence to indicate a given compound is present in a sample.

As part of this challenge, the EPA provided the ToxCast library as the suspect library (mixtures were only spiked with ToxCast substances). We processed all substances within this library as described in SI 5.1, which led to a suspect library of 4,348 total compounds that are theoretically observable by mass spectrometry. Approximately 50% of this library was not identifiable based on mass alone with an experimental mass error of ±6 ppm (Figure 3a). Further, 47% of library entries have at least one other formula conflict within the ToxCast library and over 13% had five or more conflicts. Even with perfect mass accuracy, collision rates are high in nearly all chemical libraries, and thus high-resolution mass instruments alone are inadequate for high-accuracy identification without complementary, orthogonal data.

CCS is a chemical property that provides additional information with which to increase the uniqueness of each library entry (Figure 3b). This is especially powerful when considering the CCS of each adduct as independent information, effectively adding corroborating dimensions of data for each adduct with a known CCS.
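
The kind of collision count summarized in Figure 3a, and the effect of adding CCS as a second dimension, can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the function name and the `ccs_tol_pct` parameter are hypothetical, and the CCS tolerance in the usage note is an arbitrary value chosen only for demonstration.

```python
from typing import List, Optional, Sequence, Tuple


def count_neighbors(entries: Sequence[Tuple[float, float]],
                    mass_tol_ppm: float = 6.0,
                    ccs_tol_pct: Optional[float] = None) -> List[int]:
    """For each (m/z, CCS) entry, count the other entries it collides with.

    Two entries collide when their m/z values differ by no more than
    mass_tol_ppm and, if ccs_tol_pct is given, their CCS values also differ
    by no more than that percentage. Tolerances are taken relative to the
    entry being examined.
    """
    counts = []
    for i, (mz_i, ccs_i) in enumerate(entries):
        n = 0
        for j, (mz_j, ccs_j) in enumerate(entries):
            if i == j:
                continue
            if abs(mz_j - mz_i) / mz_i * 1e6 > mass_tol_ppm:
                continue
            if ccs_tol_pct is not None and abs(ccs_j - ccs_i) / ccs_i * 100.0 > ccs_tol_pct:
                continue
            n += 1
        counts.append(n)
    return counts
```

As a usage example with made-up values, three entries near m/z 357.3005 with CCS of 190, 205, and 220 each have two neighbors when only mass is considered (`count_neighbors(entries)`), and none once a hypothetical 3% CCS tolerance is also required (`count_neighbors(entries, ccs_tol_pct=3.0)`). The pairwise loop is quadratic in library size, which is acceptable for a library of a few thousand entries.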
In addition to supplying a dimension beyond mass, CCS can also increase confidence of a molecule being present in the mixture by providing additional evidence and Schymanski Level 2a confidence rather than Level 4 or 5. Ultimately, the addition of CCS increased the confidence score of 90% of compounds that were correctly determined to be present in the samples.

A total of 14 data sets were successfully generated per sample (plus additional data for blanks). The raw IMS-MS data included ~200,000 total m/z-CCS features observed across triplicate analyses and ~475,000 total m/z features observed across FTICR-MS analyses (Figures S1-2). This data showed evidence of many different adducts, as indicated by the high number of features, and a significant presence of multimers, as indicated by frequent observations of features with extremely high CCS:m/z ratios (Figures S3-6; see Figure 3b for the much tighter m/z-CCS distribution of the library when considering [M+H]+ features). To help reduce noise and low-level contamination, we downselected to a subset of features using constraints based on feature intensity and presence across technical replicates (and absence in blanks), applying the cutoffs described in Table 1.

For IMS-MS, an intensity cutoff of 1,000 was set for all features in addition to requiring that each feature must have been observed across all three technical replicates and no more than once across the corresponding blank replicates. This removed 94% of features and improved confidence that those remaining were from molecules present in the sample. There was still significant evidence of multimer formation in our positive mode ESI-IMS-MS analyses (Figure S3b) but, with no way to ensure these were removed without losing possible overlapping monomer features, we decided to move forward, understanding that most of the suspected multimer features would not match the CCS values of library entries.
For FTICR-MS, we did not find a reliable method to apply an intensity cutoff, so the intensity cutoff was trivially set to 1. Since there was only a single replicate for each condition (due to limited sample), any feature seen at any intensity in the blank was removed, leading to 9% of features being removed.

Before unblinding the true compositions of the mixtures, we performed multi-attribute matching by comparing the measured properties of downselected experimental features to our in silico library of calculated properties, and scoring each putative match using values given in Table 2. Note, for IMS-MS, the high intensity cutoff (i.e., the 30th percentile value of downselected features) was 2,123 and 2,174 for positive and negative mode, respectively. For FTICR-MS, the high intensity cutoff was 3,358 and 110 for positive and negative mode, respectively. An example of our multi-attribute scoring method is demonstrated in Figure 2. This same analysis was performed for all library entries, taking into consideration all 14 disparate data sets (and blanks), using our Python module, MAME, resulting in a confidence score for each library entry for each mixture. We submitted the list of compounds labeled as present in each mixture (and their associated confidence scores and confidence levels) to the EPA, who then unblinded the samples by returning the sample key to enable the assessment of our approach.

Figure 3. CCS is a chemical property that increases each library entry’s uniqueness. (a) Number of molecule entries in the ToxCast library (shown as a percentage of total library size) whose protonated masses fall within ±6 ppm of another entry. Zero (black bar) indicates no neighbors within this mass range (a molecule that can be resolved with mass alone, 2,216 total). The grey bars represent molecules (2,130 total) that cannot be distinguished based on mass alone, within an instrumental error of ±6 ppm. (b) Calculated CCS vs. m/z for the protonated forms of each molecule in the ToxCast library. The inset shows the example of m/z 357.3005, where 3 molecules lie within ±6 ppm of one another. When adding the property of CCS, all 3 molecules are predicted to become analytically unique within our specified parameter thresholds.

An overview of the results is shown in Figure 4a-d. Our overall FDR, FNR, and accuracy were 77%, 57%, and 91%, respectively. For high confidence (confidence score of 19.0 or more) Schymanski Level 2a (probable structure provided by mass and CCS) identifications, FDR was 35%. Additionally, FDR had a smooth inverse trend with the magnitude of confidence score assigned by our algorithm (Figure 4b). We also showed the capability to distinguish between compounds with the same mass (including isomers) (Figure S7).

One major issue, which caused a high FDR, was that we routinely determined 300-500 molecules to be present in sample mixtures designed to contain 95-365 substances (Figure 4c).
We hypothesized the high occurrence of false positives was attributable to one or more of the following factors: (i) noise present in raw data; (ii) low confidence score cutoff; (iii) detection of molecules that were in the suspect library, but unintentionally present in the samples due to reactions occurring in the highly concentrated mixtures; and/or (iv) multimer formation during the ionization process due to high sample concentrations. In the case of multimers, we hypothesized these formed during ESI, remained as multimers upon entry and flight through the IMS drift tube, and then dissociated to the constituent monomer prior to arriving at the MS detector. Support for this hypothesis was provided by much higher observed CCS values than expected with corresponding m/z values that were consistent with monomers. Because our criteria for labeling a compound as present required associated experimental features to be observed across all three technical replicates (in the case of IMS-MS features), and minimal presence (observed once at the most) in blanks, it seems unlikely that low levels of contamination were the cause of the high FDR.

Chemical reactions that produce molecules found in the library, such as hydroxylation, are possible at the high concentrations found in the mixture. The EPA confirmed that each molecule was spiked in at approximately 0.05 mM. As a clear example, Figure 5 shows tamoxifen and 4-hydroxytamoxifen (a hydroxylated form of tamoxifen), molecules both found in the suspect library and both receiving high confidence scores (45.5 and 25, respectively) in the same sample. However, only tamoxifen was classified as a true positive since it was spiked into the mixture, whereas 4-hydroxytamoxifen was not. It is possible 4-hydroxytamoxifen may not be a genuine false positive and instead could have been formed in situ, due to reactions within the mixture.
Additionally, the choice of cutoff (i.e., minimum score to be labeled as present) is important and should reflect the desired balance between false positives and false negatives. For example, during a forensics study, it may be desirable to decrease the number of false positives and therefore a higher cutoff would be needed. This would decrease FDR but also increase FNR. In our case, increasing the cutoff from 6 to 19 leads to an FDR of 35%, FNR of 81%, and accuracy of 96%. We then optimized the cutoff by finding the one that yielded the highest F1 score (a function of FNR and FDR, equation provided in SI 5.5). We found a cutoff of 9.5 (using the same set of weights as our blinded approach) decreased FDR by 14 percentage points (to 63%), increased FNR by 9 percentage points (to 66%), and increased accuracy by 4 percentage points (to 95%) (Figure S8).
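
The metrics and the cutoff selection described above can be sketched in Python. The exact equations used in the study are in SI 5.4-5.5 and are not reproduced here; this sketch assumes the standard definitions, with F1 computed as the harmonic mean of precision (1 - FDR) and recall (1 - FNR). All function names are hypothetical.

```python
from typing import Dict, Sequence


def f1_from_rates(fdr: float, fnr: float) -> float:
    """Standard F1 score expressed through FDR and FNR."""
    precision, recall = 1.0 - fdr, 1.0 - fnr
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def evaluate_cutoff(scores: Sequence[float],
                    spiked: Sequence[bool],
                    cutoff: float) -> Dict[str, float]:
    """Label entries with score >= cutoff as present and compute the metrics."""
    tp = fp = fn = tn = 0
    for score, is_spiked in zip(scores, spiked):
        labeled_present = score >= cutoff
        if labeled_present and is_spiked:
            tp += 1
        elif labeled_present:
            fp += 1
        elif is_spiked:
            fn += 1
        else:
            tn += 1
    fdr = fp / (tp + fp) if (tp + fp) else 0.0   # false positives / labeled present
    fnr = fn / (tp + fn) if (tp + fn) else 0.0   # false negatives / spiked
    accuracy = (tp + tn) / (tp + fp + fn + tn)   # correct labels / all labels
    return {"fdr": fdr, "fnr": fnr, "accuracy": accuracy,
            "f1": f1_from_rates(fdr, fnr)}


def best_cutoff(scores: Sequence[float],
                spiked: Sequence[bool],
                cutoffs: Sequence[float]) -> float:
    """Return the candidate cutoff with the highest F1 score."""
    return max(cutoffs, key=lambda c: evaluate_cutoff(scores, spiked, c)["f1"])
```

Under these assumed definitions, the averaged rates reported for the 9.5 cutoff (FDR 63%, FNR 66%) give a higher F1 than those for the blinded cutoff of 6 (FDR 77%, FNR 57%), in line with the text. Because the reported rates are averages across ten mixtures, an F1 computed from them is only an approximation of the per-mixture values.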

Figure 4. Results using our standards-free multi-attribute matching methods. (a-d) Blinded method results. (e-h) Optimized (weights chosen using Monte Carlo) results. (a, e) AUPR curve, with red dot showing our cutoff (a total confidence score of 6.0 and 11.2 for blinded and optimized approach, respectively). Please refer to the SI for details on the highest F1 score. (b, f) FDR as a function of confidence score. (c, g) Comparison between the number of molecules identified compared to the number of molecules spiked into each mixture. (d, h) FDR for each of the mixtures individually, split by confidence levels.

Figure 5. Tamoxifen and 4-hydroxytamoxifen. Both were identified with high confidence in the same mixture but only tamoxifen was actually present.

Based on these initial results, we concluded our approach worked well, but would likely be improved by optimizing our scoring parameters and cutoff ranges for each confidence category. Beyond finding which false positives were present due to the reasons stated earlier, this was the most powerful way to improve our overall results and learn more about our algorithm before broader application.

To determine the importance of each scoring parameter and to increase the accuracy of our approach, we set out to optimize our scoring method and subsequent confidence level cutoffs. The results of the Monte Carlo and particle swarm optimization methods are provided in the SI (SI 5.7, Figures S9-10). Optimization results were used to better understand the effect of each parameter and to update weights (Table S2), ultimately decreasing our combined FNR and FDR (Figure 4e-h).

CONCLUSIONS

The capability to routinely measure and identify even a modest fraction of biologically, environmentally, or medically important chemicals within all of chemical space remains one of the grand challenges in science. The vast majority of molecules are not rep-resented by standards. Furthermore, data for even fewer molecules have been added to reference libraries for use in identification (li-braries currently cover much less than 1% of chemical space). This limit has remained a major constraint for decades in the global search for chemical biomarkers of disease, toxin exposure, and af-filiated efforts in the search for new drug candidates and attempts to sequence the complete metabolome. It is clear that relying on a single instrument and slow, costly establishment of reference li-braries in the laboratory, restricted to standards available for pur-chase, is not a viable approach for identifying the tens-to-hundreds of thousands of small molecules in complex biological or environ-mental samples. Through advances in instrumentation, computa-tion, and data integration, there has been a push for a shift in metab-olomics and exposomics toward standards-free, multi-attribute identification, in which the use of multiple molecular properties, accurately predicted computationally and consistently measured experimentally, are used for comprehensive identification of small molecules without the need for standards. Our findings, both pre- and post-optimization, show great value in using standards-free, multi-attribute based identification meth-ods. Furthermore, the addition of CCS increased confidence for true positives and was able to distinguish between isomers, even with our team’s most rapid and least accurate CCS calculation method used for most molecules. To improve our results in the fu-ture, we will need to add additional capabilities that can be pre-dicted or calculated. 
This indicates the value of adding further identification “dimensions” in the future, such as MS/MS fragmentation patterns, chromatographic retention time, more accurate prediction of adduct formation (e.g., additional metal ion adducts not considered here), and infrared or Raman spectra. Complete standards-free identification, even for large library sizes and potentially for the complete molecular universe, may become possible through the use of multiple accurately measured and calculated chemical properties. Increasing the accuracy of analytical and computational methods is important; however, adding orthogonal chemical properties that all researchers in the field can use will aid in the identification of small molecules and will be essential for addressing major challenges within metabolomics. As additional chemical properties are added to this pipeline, the “distance” between the features of each library entry will become dramatically larger, thereby requiring a lower resolution for each property. The so-called “curse of dimensionality” can be used to our benefit, turning each library entry into a unique or nearly unique set of chemical properties with no overlapping neighbors. As metabolomics evolves and computational libraries are used more frequently, associated methods could eventually challenge the field’s current definition of, and requirements for, identification.

While it is not possible to measure values such as accuracy in real (i.e., non-synthetic) complex mixtures, the approach described here was developed using blinded results. In future studies, we plan to validate this approach again using the optimized scoring parameters on other synthetic mixtures and on real samples in which molecules have already been identified with standards. Consistently low FNR and FDR, together with high accuracy, under the same scoring system will demonstrate the utility and reliability of our method for identification in complex samples.
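The point above about the “distance” between library entries growing as properties are added can be checked numerically with a small sketch. It assumes a hypothetical library whose properties are independent and uniformly distributed on [0, 1] after normalization; real chemical properties are partly correlated, so the gain from each added property would likely be smaller than shown here. The function name is invented for this example.

```python
import math
import random


def mean_nearest_neighbor_distance(n_entries, n_properties, seed=0):
    """Average Euclidean distance from each library entry to its closest neighbor."""
    rng = random.Random(seed)
    # Hypothetical library: each entry is a vector of normalized property values.
    library = [
        [rng.random() for _ in range(n_properties)] for _ in range(n_entries)
    ]
    total = 0.0
    for i, entry in enumerate(library):
        nearest = min(
            math.dist(entry, other) for j, other in enumerate(library) if j != i
        )
        total += nearest
    return total / n_entries


if __name__ == "__main__":
    # Same library size, increasing number of properties per entry.
    for n_properties in (1, 2, 3, 4, 6):
        spacing = mean_nearest_neighbor_distance(200, n_properties)
        print(f"{n_properties} properties: mean nearest-neighbor distance {spacing:.4f}")
```

For a fixed library size, the typical spacing between neighboring entries grows with the number of properties, which is why a coarser match tolerance per property can still single out one entry.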

ASSOCIATED CONTENT

The Supporting Information is available free of charge on the ACS Publications website.
SupportingInformation.pdf: Includes further detailed methods and additional figures.
SupportingData.xlsx: Includes our suspect library, property predictions, and results broken down for each mixture in this challenge. Table captions are provided in SupportingInformation.pdf.

AUTHOR INFORMATION

Ryan S. Renslow - [email protected]
Justin G. Teeguarden - [email protected]
Thomas O. Metz - [email protected]

The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript.

ACKNOWLEDGMENTS

This research was partially supported by the Genomic Science Program (GSP), Office of Biological and Environmental Research (OBER), U.S. Department of Energy (DOE), and is a contribution of the Pacific Northwest National Laboratory (PNNL) Metabolic and Spatial Interactions in Communities (MOSAIC) Scientific Focus Area (SFA). The Multi-Attribute Matching Engine (MAME) was fully developed under MOSAIC funding. Portions of this research were also supported by the United States Environmental Protection Agency (Interagency Agreement DW-089-92452001-0 in support of DOE Project No. 68955A), the National Cancer Institute (grant R03CA222443), and a PNNL Laboratory Directed Research and Development program, the Microbiomes in Transition (MinT) Initiative. This work was performed in the W. R. Wiley Environmental Molecular Sciences Laboratory (EMSL), a DOE national scientific user facility at PNNL. The NWChem calculations were performed using the Cascade supercomputer at EMSL. PNNL is operated by Battelle for the DOE under contract DE-AC05-76RL01830.
