Ensemble Deep Learning on Large, Mixed-Site fMRI Datasets in Autism and Other Tasks

Abstract

Deep learning models for MRI classification face two recurring problems: they are typically limited by low sample size, and are abstracted by their own complexity (the "black box problem"). In this paper, we train a convolutional neural network (CNN) with the largest multi-source, functional MRI (fMRI) connectomic dataset ever compiled, consisting of 43,858 datapoints. We apply this model to a cross-sectional comparison of autism (ASD) vs typically developing (TD) controls that has proved difficult to characterise with inferential statistics. To contextualise these findings, we additionally perform classifications of gender and task vs rest. Employing class-balancing to build a training set, we trained 3 × 300 modified CNNs in an ensemble model to classify fMRI connectivity matrices with overall AUROCs of 0.6774, 0.7680, and 0.9222 for ASD vs TD, gender, and task vs rest, respectively. Additionally, we aim to address the black box problem in this context using two visualization methods. First, class activation maps show which functional connections of the brain our models focus on when performing classification. Second, by analyzing maximal activations of the hidden layers, we were also able to explore how the model organizes a large and mixed-centre dataset, finding that it dedicates specific areas of its hidden layers to processing different covariates of data (depending on the independent variable analyzed), and other areas to mix data from different sources. Our study finds that deep learning models that distinguish ASD from TD controls focus broadly on temporal and cerebellar connections, with a particularly high focus on the right caudate nucleus and paracentral sulcus.


May 28, 2020 1:6 ws-ijns

ENSEMBLE DEEP LEARNING ON LARGE, MIXED-SITE FMRI DATASETS IN AUTISM AND OTHER TASKS

MATTHEW LEMING

JUAN MANUEL GÓRRIZ

JOHN SUCKLING

Keywords: Autism; Big data; Functional connectivity; Deep learning.

1. Introduction

The characterization of brain differences in autism spectrum disorder (ASD) is an ongoing challenge. Although the consensus is that there are widespread structural and functional differences, the direction and spatial patterns of differences are not reliably observed and overlap with inter-individual variability in the neurotypical population.

Estimates of grey matter volume with voxel-based morphometry (VBM) have been the most commonly used methodology to assess brain structure, but have resulted in discrepancies amongst meta-analytic findings, at least partly explained by the small sample sizes that are a prevalent feature of the primary literature [4, 5]. To address variations in data acquisition and processing that make between-study comparisons less powerful, publicly available large-sample datasets are now pivotal to imaging research. The ABIDE multi-centre initiative has made available over 2000 images in two releases, but cross-sectional VBM analyses have failed to observe significant differences [6, 7].

Other morphological properties of the cortex may yield greater sensitivity, and recent findings using estimates of cortical thickness from the ENIGMA working group suggest a complex pattern of differences relative to neurotypical controls that varies across the lifespan. Other databases, such as the National Database for Autism Research (NDAR), act as aggregates of MRI data for different smaller-scale studies, though centre differences complicate conventional analyses on these data as a whole.

ASD has been consistently associated with differences in brain function [10, 11]. This is often studied in the context of EEG, for which several studies have been conducted to achieve automated diagnosis, and fMRI. The measurement of correlation, or "functional connectivity", between time-series of blood oxygenation level dependent (BOLD) endogenous contrast estimated from brain regions during resting wakefulness has been demonstrated as a reproducible measurement on an individual basis. Functional connectivity (FC) matrices are estimates of the connectivities between all brain regions that can be represented as undirected graphs (connectomes) of nodes (brain regions) and edges (connectivity strengths). They show promise in localising characteristic differences for ASD in resting activity to specific large-scale brain networks. Whilst there is cautionary evidence using the ABIDE dataset and others, it would appear that statistically significant differences in connectivity are generally observable but, like measurements of brain structure, are variable in their presentation. With consistent and localised changes remaining elusive, a number of studies have characterised ASD as exhibiting under-connectivity in certain areas of the brain, while others show evidence of over-connectivity. A recent review posited that ASD is likely a mix of these traits.

In other fields, computing power and access to large datasets have led to a resurgence in the popularity of neural networks (NNs) as a tool for data classification. NNs are especially adept at classifying complex data which parametric inferential statistics may fail to fully characterize due to their inherent assumptions. Given that brain function in ASD has been consistently found to be different but in different ways, such a model may be a sensible approach for a comprehensive representation. In parallel, because of their wide applicability in representing complex data such as proteins and social networks, functional connectomes have undergone significant development in terms of global and local topological descriptions. Some recent work has used NNs for processing connectomes, including whole-graph classification, clustering into sub-graphs, and node-wise classification. Previous efforts to classify functional connectivity in ASD on smaller datasets have achieved accuracy rates that have been described as "modest to conservatively good", though these methods have had trouble replicating on different data. More recently, the application of CNNs to ABIDE data has achieved classification accuracies of 68% to 77.3%.

In this article, we leverage publicly available datasets to amass and automatically pre-process a total of 43,838 functional MRIs from nine different collections. To test the application of CNNs to imaging data, we first classify autistic individuals from typically developing (TD) controls. To validate the proposed models, we then classify functional connectivity matrices based on gender and task vs resting state. All classifications were undertaken using a CNN that uniquely encodes multi-layered connectivity matrices, using an original deep learning architecture partially inspired by Kawahara et al. (2017). We opted to use these connectivity matrices as opposed to full fMRI datasets both for memory management purposes (the average fMRI dataset in our collection is 176 MB per file, while the connectivity matrix is just under 500 KB) and for interpretability, as connectivity matrices allow for the direct analysis of both localized areas and connections between areas. Due to the stochastic properties of NNs and set divisions, we use a standard stratified cross-validation strategy, performing each of our tests across 300 independent models using different subsamples and divisions of the total dataset. To incentivise the model to classify based on phenotypic differences rather than centre differences, class-balancing techniques across participant age and collection were used when building the training and test sets, and compared against the fully-inclusive samples.

Key outputs of the CNN are class activation maps that highlight areas of the connectome the model preferentially focuses on when performing its classification, and activation maximization of a hidden layer that visualizes how the model partitioned the dataset as a whole following classification. We suggest an index to quantify the output of activation maximisation.

In attempting to classify components of this accumulated dataset, we sought to address the following questions: (1) How effective is our machine learning paradigm at classifying FC in ASD, gender, and resting-state/task? (2) Which areas or networks of the brain do models focus on when undertaking classifications? (3) How does the model partition large datasets during different classification tasks? (4) Can the model effectively classify FCs taken from multiple sources without relying explicitly on centre differences to do so?

2. Methods

2.1. Datasets and preprocessing

Datasets were acquired from OpenFMRI [58, 59]; the Alzheimer's Disease Neuroimaging Initiative (ADNI); ABIDE; ABIDE II; the Adolescent Brain Cognitive Development (ABCD) Study; the NIMH Data Archive, including the Research Domain Criteria Database (RDoCdb), the National Database for Clinical Trials (NDCT), and, predominantly, the National Database for Autism Research (NDAR); the 1000 Functional Connectomes Project; the International Consortium for Brain Mapping database (ICBM); and the UK Biobank. We refer to each of these sets as collections. OpenFMRI, NDAR, ICBM, and the 1000 Functional Connectomes Project are collections that comprise different datasets submitted from unrelated research groups. ADNI, ABIDE, ABIDE II, ABCD, and the UK Biobank are collections that were acquired as part of a larger research initiative.

These data were pre-processed using the fMRI Signal Processing Toolbox (SPT). Following skull-stripping, motion correction was accomplished using SpeedyPP version 2.0, which utilized AFNI tools and wavelet despiking [65, 66], with a low-bandpass filter of 0.01 Hz, in addition to motion and motion derivative regression. Both functional and structural datasets were non-linearly registered to MNI space and parcellated using the 116-area automated anatomical labeling (AAL) template, which includes subcortical regions. Extracted time series were the means of each AAL region. Each dataset was transformed into N × 116 × 116 connectivity matrices, using edges weighted by the Pearson correlation of the wavelet coefficients of the pre-processed time-series in each of four frequency scales: 0.1-0.2 Hz, 0.05-0.1 Hz, 0.03-0.05 Hz, and 0.01-0.03 Hz. Wavelet correlation estimates were adjusted for the repetition time (TR) to equalize the frequency ranges across different collections. Pre-processing was accomplished on a computing cluster over a period of several weeks. Due to the volume of datasets, individualized quality control was not possible. The proportion of datasets failing pre-processing varied by collection.

Across all collections, 70,284 potential datasets were identified, of which 67,396 contained suitable functional and structural datasets. Of these, 52,396 succeeded pre-processing to parcellation. However, datasets with regional dropout of greater than 10% were omitted from the analyses, and redundant datasets across collections were also discarded, along with those data with a TR outside of the desired range. In total, 43,838 connectomes from 17,614 unique participants were available for analysis with the NN. Multiple instances of connectomes from the same individuals were used, though they were not shared between the training, validation, and test sets. The numbers of participants and total numbers of datasets used, as well as phenotypic distributions, are shown in Table 1.
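As a rough illustration of the matrix construction described above, the sketch below builds a 4 × 116 × 116 array of Pearson correlations from regional time series. It is a simplification and not the authors' pipeline: a crude FFT band-pass stands in for the wavelet decomposition used in the paper, and the function names, the TR of 2 s, and the random input are assumptions made only for this example.

```python
import numpy as np

# Frequency bands (Hz) listed in the text, one per wavelet scale.
BANDS_HZ = [(0.1, 0.2), (0.05, 0.1), (0.03, 0.05), (0.01, 0.03)]

def bandpass_fft(ts, tr, low, high):
    """Keep only Fourier components within [low, high) Hz.

    ts has shape (timepoints, regions). This is a stand-in for the wavelet
    decomposition in the paper, used here only to keep the sketch short.
    """
    freqs = np.fft.rfftfreq(ts.shape[0], d=tr)
    spectrum = np.fft.rfft(ts, axis=0)
    spectrum[(freqs < low) | (freqs >= high)] = 0.0
    return np.fft.irfft(spectrum, n=ts.shape[0], axis=0)

def connectivity_stack(ts, tr):
    """Return Pearson correlation matrices, shape (n_bands, regions, regions)."""
    mats = []
    for low, high in BANDS_HZ:
        filtered = bandpass_fft(ts, tr, low, high)
        mats.append(np.corrcoef(filtered, rowvar=False))
    return np.stack(mats)

rng = np.random.default_rng(0)
time_series = rng.standard_normal((300, 116))  # hypothetical: 300 volumes, 116 AAL regions
fc = connectivity_stack(time_series, tr=2.0)   # hypothetical TR of 2 s
print(fc.shape)
```

Each slice of `fc` is symmetric with a unit diagonal, which is why a connectome can be treated as an undirected weighted graph.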

2.2. Neural Network Model and Training

Figure 1. The structure of the neural network. These were applied in an ensemble model, so the outputs of 300 independently-trained neural networks were averaged in a cross-validation scheme.

Table 1. Average populations present for successfully-preprocessed datasets. Some datasets were not labeled with respect to one or more covariates, so counts may not sum to the listed total.

| Collection | Subjs | Conns | Rest | Task | Age Min | Age Max | Age Mean | Age Stddev | Sex F | Sex M | Autism |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1000 FC | 764 | 764 | 764 | 0 | 7.88 | 85.00 | 25.76 | 10.18 | 443 | 321 | 0 |
| ABCD | 1319 | 9205 | 4043 | 5162 | 0.42 | 11.08 | 10.08 | 0.65 | 4339 | 4866 | 113 |
| Abide | 193 | 193 | 193 | 0 | 9.00 | 50.00 | 17.81 | 6.69 | 21 | 172 | 94 |
| Abide II | 720 | 761 | 761 | 0 | 5.22 | 55.00 | 14.44 | 7.45 | 174 | 587 | 375 |
| ADNI | 141 | 261 | 261 | 0 | 56.00 | 95.00 | 73.57 | 7.32 | 146 | 115 | 0 |
| BioBank | 11811 | 16970 | 9937 | 7033 | 40.00 | 70.00 | 55.23 | 7.51 | 8752 | 8218 | 8 |
| ICBM | 112 | 381 | 29 | 352 | 19.00 | 74.00 | 43.53 | 14.83 | 188 | 193 | 0 |
| NDAR | 1123 | 8569 | 5952 | 2617 | 0.25 | 55.83 | 18.65 | 7.82 | 4165 | 4404 | 994 |
| Open fMRI | 1443 | 6655 | 1169 | 5486 | 5.89 | 78.00 | 27.22 | 10.40 | 2768 | 3133 | 127 |
| All | 17614 | 43838 | 23109 | 20650 | 0.25 | 95.00 | 33.05 | 20.68 | 20996 | 22009 | 1711 |

The data used for training and testing the CNN were 4 × 116 × 116 (4 wavelet scales and 116 nodes) symmetric FC (wavelet coefficient correlation) matrices, with values linearly scaled from [-1, 1] to [0, 1] for easier use in a NN.

To classify the data, we employed a CNN with vertical convolutional filters on the first layer followed by horizontal convolutional filters on the second layer, effectively reducing the matrices to single values to allow the network to train on connectivity matrices (Figure 1). This approach was partially inspired by the cross-shaped filters described in Kawahara et al. (2017), though previous tests with that architecture resulted in a number of failed models with no apparent increase in accuracy over the simpler architecture proposed here. We implemented this architecture using Keras, a popular machine learning library, leveraging the advantages of supporting software libraries. Additionally, this implementation includes multiple channels in the inputs, as opposed to single-input connectivity matrices.

The CNN was constructed with: 24 edge-to-node vertical convolutional filters; 24 node-to-graph horizontal convolutional filters; 3 fully-connected layers, each with 64 nodes; and a final softmax layer. Separating each layer were batch normalization, rectified linear unit (ReLU), and dropout layers, with the dropout being 0.3 in the convolutional layers and 0.7 in the dense layers. The layer structures and ordering followed the advice offered in [69]. Specifications are shown in Figure 1. No pooling layers were used, and all strides were of length 1. The model was trained using an Adam optimizer with batch sizes of 64. Otherwise, Keras defaults were used. Models were trained for 200 epochs, and the epoch with the highest validation accuracy was selected.

To obtain a reliable average, we trained 300 models independently for each classification, which were then combined in an ensemble model. In each training instance, a subset of the total available data was taken.
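To make the shapes in this architecture concrete, the following NumPy sketch traces one forward pass through the input scaling and the two convolutional stages only. It is not the authors' Keras code: the weights are random, batch normalization, dropout, the dense layers and the softmax are omitted, and the names `scale_to_unit`, `edge_to_node` and `node_to_graph` are invented for illustration.

```python
import numpy as np

def scale_to_unit(fc):
    """Linearly map correlation values from [-1, 1] to [0, 1]."""
    return (fc + 1.0) / 2.0

def relu(x):
    return np.maximum(x, 0.0)

def edge_to_node(x, w):
    """Vertical filters: each spans one full column of every channel.

    x: (channels, nodes, nodes); w: (filters, channels, nodes).
    Returns (nodes, filters), one value per node and filter.
    """
    return np.einsum("cij,fci->jf", x, w)

def node_to_graph(h, w):
    """Horizontal filters: each spans all nodes and all incoming filters.

    h: (nodes, filters_in); w: (filters_out, nodes, filters_in).
    Returns (filters_out,), one value per filter.
    """
    return np.einsum("jf,gjf->g", h, w)

rng = np.random.default_rng(0)
fc = rng.uniform(-1.0, 1.0, size=(4, 116, 116))  # placeholder input, not real data
x = scale_to_unit(fc)

w1 = rng.standard_normal((24, 4, 116)) * 0.01    # 24 edge-to-node filters
w2 = rng.standard_normal((24, 116, 24)) * 0.01   # 24 node-to-graph filters

h1 = relu(edge_to_node(x, w1))    # shape (116, 24)
h2 = relu(node_to_graph(h1, w2))  # shape (24,), would feed the dense layers
print(h1.shape, h2.shape)
```

The intermediate 116 × 24 array is the bottleneck that makes per-region, per-filter inspection possible, since each entry is tied to one brain region.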
A holdout test and validation set were not used; instead, a division of the data was performed for each model in a stratified cross-validation scheme, subject to the rules detailed below.

2.3. Set division

Data were divided into three sets: a training set, comprising two-thirds of the data and used to train the model; a validation set, comprising one-sixth of the data and used to select the epoch at which training stopped; and a test set, comprising one-sixth of the data and used to assess the trained classifier's performance. The approximate total number of images used by each model was 10,000 for the gender and resting-state classifications, and 4,000 (limited by sample size) for the ASD classification. For all classifications, balancing was used such that each class comprised approximately half of the datasets. To account for covariates, classes were additionally balanced such that the distributions of different collections and ages were equal between classes. For collection balancing, equal numbers of datasets were used from each collection. For continuous age values, distributions of age between classes were made to fail a Mann-Whitney U-test (p > 0.05), that is, to show no significant difference. We used standard stratified cross-validation rather than a holdout division across the 300 runs.

Because of the collection balancing procedure, many data were excluded from certain classification tasks; for instance, BioBank included only eight subjects with ASD. Due to the class balancing, set divisions were not precise in each instance.

2.4. Test set evaluation

2.4.1. Inter-data classification

Following the training of the models, the accuracy and the area under the receiver operating characteristic curve (AUROC) were calculated as measures of machine learning performance on the test set. This was to determine whether one group in the classification outperformed the other in training, leading to a biasing of the overall accuracy.

2.4.2. Activation Maximization

Activation maximization is a technique for determining which inputs from the test set maximally activate the hidden units of the CNN layers following training. Activation maximization was applied to the 116 × 24 second layer of our network (Figure 1), as this convolutional layer acts as a bottleneck and is thus easier to interpret and visualize. This layer is naturally stratified by 24 filters, each with 116 nodes (brain regions). To offset the influence of spurious maximizations, we opted to record the 10 datasets that maximally activated each hidden unit, obtaining their mode with respect to collection, gender, and whether it was task/rest; for example, if six connectomes that maximally activated a unit were from Collection A and four were from Collection B, Collection A would be recorded as maximally activating that hidden unit.

For each covariate, this method yields a 116 × 24 × 300 array (nodes × filters × models). We opted to measure the stratification of the different convolutional filters in our models by measuring whether each was maximally activated primarily by one source of data, or whether it was activated by a mixed population. With this in mind, we calculated for each filter a diversity coefficient, which is 0 if the filter is maximally activated by only one class of data and 1 if the classes that maximally activate it are proportional to their share of the whole layer. Given K possible classes, with F_k (k = 1, ..., K) indicating the proportion of each class in a given filter and T_k (k = 1, ..., K) indicating the proportion of each class across all filters, we calculated the diversity coefficient for each filter as:

$$
D_i = \frac{\tan^{-1}\left(\ln \dfrac{1 - \sqrt{\sum_{k=1}^{K} F_k^{2}}}{\sqrt{\sum_{k=1}^{K} \frac{(F_k - T_k)^{2}}{2}}}\right) + \frac{\pi}{2}}{\pi} \tag{1}
$$

Briefly, the justification for this equation is that the summation $\sum_{k=1}^{K} (F_k - T_k)^2$ equals 0 if the distribution of the filter's population is equal to the population of the whole layer; that is, the distribution is ideally diverse. The denominator then vanishes, which pulls the logarithm towards $+\infty$ and in turn pulls the inverse tangent towards $\pi/2$. Conversely, $1 - \sqrt{\sum_{k=1}^{K} F_k^2}$ tends towards 0 if the individual filter is composed of only a single class, pulling the logarithm towards $-\infty$ and the inverse tangent towards $-\pi/2$. The diversity coefficient is normalized to be between 0 and 1. Its value is indeterminate if only a single class is present globally.

This equation is a more complex version of other diversity coefficients, such as the Herfindahl-Hirschman or Simpson diversity indices. However, the proposed index better accounts for overall populations in the hidden layer activations and thus makes it easier to compare across different classification tasks and independent variables. While the Herfindahl-Hirschman and Simpson indices both approach their maxima when the measured population is completely homogeneous, their lower extrema vary depending on the number of distinct populations present. This is problematic in comparing across indices, because the number of populations varies depending on the application, and because those indices assume that the expected (i.e., most diverse) distribution occurs when different populations are present in equal proportions. The proposed index defines the most diverse population as that which has distributions proportional to the overall population, at which point the index is one.

In practice, low diversity coefficients indicate that the ensemble models stratified data by the covariate. This allows us to measure the degree to which individual covariates (such as collection) were taken into account by the CNNs. We found the diversity coefficient of each of the 24 filters of our hidden 116 × 24 convolutional layers, then sorted these values to show which filters were primarily activated by a few covariates and which were activated maximally by many covariates.
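The behaviour of the diversity coefficient at its two extremes can be checked numerically. The sketch below assumes Equation (1) reads D = (arctan(ln(a / b)) + π/2) / π, with a = 1 − sqrt(Σ F_k²) and b = sqrt(Σ (F_k − T_k)² / 2), and that F_k and T_k are fractions summing to 1; the function name and the explicit handling of the limiting cases are choices made for this example rather than details given in the text.

```python
import math

def diversity_coefficient(filter_props, layer_props):
    """Diversity coefficient of one filter under the reading of Eq. (1) stated above.

    filter_props: class proportions F_k within the filter (fractions summing to 1).
    layer_props:  class proportions T_k across all filters (fractions summing to 1).
    """
    numerator = 1.0 - math.sqrt(sum(f * f for f in filter_props))
    denominator = math.sqrt(
        sum((f - t) ** 2 for f, t in zip(filter_props, layer_props)) / 2.0
    )
    if numerator <= 0.0 and denominator <= 0.0:
        return float("nan")  # only one class present globally: indeterminate
    if numerator <= 0.0:
        return 0.0           # single-class filter: ln -> -inf, arctan -> -pi/2
    if denominator <= 0.0:
        return 1.0           # filter matches the layer: ln -> +inf, arctan -> +pi/2
    return (math.atan(math.log(numerator / denominator)) + math.pi / 2.0) / math.pi

layer = [0.5, 0.5]
for props in ([1.0, 0.0], [0.9, 0.1], [0.6, 0.4], [0.5, 0.5]):
    print(props, diversity_coefficient(props, layer))
```

Moving a filter's class mix from a single class towards the layer-wide mix raises the coefficient monotonically from 0 to 1 in this example.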

Table 2. The ensemble and averaged AUROCs and accuracies for 300 models.

| | Autism | Gender | Rest v Task |
|---|---|---|---|
| Ensemble AUROC | 0.6774 | 0.7680 | 0.9222 |
| Ensemble Acc. | 67.0253% | 69.7063% | 85.1996% |
| Average AUROC | 0.6133 | 0.6858 | 0.9231 |
| Average Acc. | 57.1150% | 63.3398% | 84.3153% |
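The distinction between the "ensemble" and "average" rows of Table 2 can be illustrated with synthetic scores. The sketch below assumes, for simplicity, that every model scores the same items (in the paper each model has its own test division) and uses hypothetical weak classifiers whose errors are independent; the names and numbers are illustrative only.

```python
import random

def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

random.seed(0)
n_models, n_items = 30, 200
labels = [i % 2 for i in range(n_items)]

# Hypothetical weak classifiers: a small class signal buried in independent noise.
model_scores = [
    [0.3 * y + random.gauss(0.0, 1.0) for y in labels] for _ in range(n_models)
]

# "Average" metric: score each model separately, then average the AUROCs.
average_auroc = sum(auroc(labels, s) for s in model_scores) / n_models

# "Ensemble" metric: average the model outputs per item, then score once.
ensemble_scores = [sum(col) / n_models for col in zip(*model_scores)]
ensemble_auroc = auroc(labels, ensemble_scores)

print(round(average_auroc, 3), round(ensemble_auroc, 3))
```

Averaging outputs before scoring cancels part of the model-specific noise, which is one plausible reason an ensemble can outperform the mean of its members.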

Class Activation Maps

We used class activation maps (CAMs) and a previous Keras implementation to display the parts of the connectivity matrix the CNN emphasised in its classification of the test sets. CAMs operate by taking the derivative of the CNN classification function (approximated as a first-order Taylor expansion, estimated via back-propagation) with respect to an input matrix, with the output having the same dimensions as the input [54, 56]. Class activation maps as originally proposed were later improved into the commonly used method known as Gradient Class Activation Maps (Grad-CAMs). CAMs are particularly advantageous when applied to connectivity matrices because, unlike typical 2D images, these matrices are spatially static (i.e., each part of the matrix represents the same connection in the brain across all datasets). Thus, global tendencies of the model can be visualized by averaging many CAMs. CAMs for each connectivity matrix were averaged, maximised across the four wavelet frequency domains, and displayed to show which aspects of the connectome the CNN focused on. To simplify the analysis, CAMs were taken with respect to the input data's predicted output, rather than two output classes.

Experiments

We trained the models on class- and age-balanced datasets to classify gender, task vs rest, and ASD vs TD controls in separate analyses. We then analysed the averaged CAMs with respect to their output prediction. We also recorded the diversity coefficient with respect to gender, collection, and rest v task.
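The balancing and set division behind these experiments can be sketched in a few lines. The example below is a simplified illustration under stated assumptions: records are plain dictionaries with hypothetical keys, classes are balanced within each collection by random downsampling, whole subjects are assigned to one set so that no individual is shared between sets, and the age-matching step (the Mann-Whitney criterion) is omitted.

```python
import random
from collections import defaultdict

def balance_by_collection(records, rng):
    """Downsample so that both classes are equally frequent within each collection."""
    groups = defaultdict(lambda: {0: [], 1: []})
    for rec in records:
        groups[rec["collection"]][rec["label"]].append(rec)
    balanced = []
    for by_label in groups.values():
        n = min(len(by_label[0]), len(by_label[1]))
        for label in (0, 1):
            balanced.extend(rng.sample(by_label[label], n))
    return balanced

def split_by_subject(records, rng):
    """Assign whole subjects to train (about 2/3), validation (1/6) and test (1/6)."""
    subjects = sorted({rec["subject"] for rec in records})
    rng.shuffle(subjects)
    n_train = (2 * len(subjects)) // 3
    n_val = len(subjects) // 6
    assignment = {}
    for i, subj in enumerate(subjects):
        if i < n_train:
            assignment[subj] = "train"
        elif i < n_train + n_val:
            assignment[subj] = "val"
        else:
            assignment[subj] = "test"
    sets = {"train": [], "val": [], "test": []}
    for rec in records:
        sets[assignment[rec["subject"]]].append(rec)
    return sets

rng = random.Random(0)
# Hypothetical data: 3 collections, 60 subjects each, 2 scans per subject,
# with the positive class under-represented (20 of 60 subjects).
records = [
    {"subject": f"{c}-{s}", "collection": c, "label": 1 if s < 20 else 0}
    for c in ("A", "B", "C") for s in range(60) for _ in range(2)
]
balanced = balance_by_collection(records, rng)
sets = split_by_subject(balanced, rng)
print({name: len(items) for name, items in sets.items()})
```

Because subjects contribute different numbers of scans, splitting by subject gives set sizes that only approximate the nominal fractions, in line with the remark that set divisions were not precise in each instance.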

Figure 2. Histograms of all AUROCs for 300 independent models, using different, stratified samples of the whole dataset.

3. Results

Table 2 shows the accuracies for the 300 models tested. The AUROCs for the individual models across all data (Figure 2) were averaged to give 0.6858, 0.9231, and 0.6133 for gender, task vs rest, and ASD vs TD classifications, respectively, while the average accuracies were 63.34%, 84.32%, and 57.12%. In nearly all cases, however, as shown in Table 2, the ensemble AUROC and accuracies were substantially higher. The ROC curves of ensemble models with respect to collections are shown in Figures 3C, 4C, and 5C.

The results in Figures 3B, 4B, and 5B display the histogram of diversity indices across all models' activation maximisation values. This indicates the tendency of models to use particular filters to sequester data by different covariates, especially if the model was attempting to classify by that variable; thus, a diversity index of 0 indicates that all nodes within a particular filter were maximally activated by one or a small number of collections (e.g., BioBank or Open fMRI). The covariates measured are gender, rest/task, and collection site; ASD was not included as a covariate because of the relatively small percentage of ASD data overall.

The diversity index of the activation maximization of the second hidden layer revealed that filters were in many cases sorted into two distinct groups, as shown by peaks on the lower and upper ends of the histograms in Figures 3B, 4B, and 5B: stratified layers (i.e., with a diversity index close to 0), which were wholly maximally activated by one type of dataset, and mixed layers (i.e., with a diversity index close to 1), which integrated data from different sources.


Figure 3. Results for autism classification. (A) The 100 strongest connections of the mean class activation maps, with the maximum value taken across wavelet correlations. (B) The distribution of the diversity index of maximal activations across all filters over 300 models, showing how much filters in general were dedicated to particular phenotypes. (C) The overall classification AUROC and the AUROC of individual data collections in the model, showing the overall and relative success of the model.

While gender and task vs rest each had a proportion of their filters wholly activated by a single collection, the majority of filters were activated by a variety of different collections, indicating the effective synthesis of data from different sources. ASD, however, had a large proportion of data with a diversity index close to zero; this is expected for the gender and resting-state covariates, given that the datasets were mainly from males, but the low diversity indices for collection indicate that ASD classification models sequestered data based on collection, and thus many datasets were considered independently.
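The labels behind these histograms come from the top-10 rule described in the Methods: each hidden unit is assigned the most common covariate value among the datasets that activate it most strongly, and each filter is then summarised by its class proportions. A minimal sketch follows, with function names and toy data that are assumptions for illustration.

```python
from collections import Counter

def modal_label_of_top_k(activations, labels, k=10):
    """Most common label among the k datasets that activate one hidden unit most strongly.

    activations: one activation value per dataset for a single hidden unit.
    labels: covariate value (for example, collection name) of each dataset.
    """
    top = sorted(range(len(activations)), key=lambda i: activations[i], reverse=True)[:k]
    return Counter(labels[i] for i in top).most_common(1)[0][0]

def class_proportions(unit_labels, classes):
    """Fraction of a filter's units assigned to each class (the F_k of one filter)."""
    counts = Counter(unit_labels)
    return [counts[c] / len(unit_labels) for c in classes]

# Toy example: the ten strongest activations include six scans from A and four from B.
activations = [12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
labels = ["A", "B", "A", "B", "A", "B", "A", "B", "A", "A", "B", "B"]
print(modal_label_of_top_k(activations, labels))

# Proportions for a hypothetical filter whose four units were labelled A, A, B, A.
print(class_proportions(["A", "A", "B", "A"], ["A", "B"]))
```

Taking the mode over ten datasets rather than the single strongest one makes the label less sensitive to an individual outlying scan.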

ASD vs TD Controls

With class balancing, the ensemble performance for ASD v TD controls across test sets was AUROC=0.6774 (Figure 3). ASD classifications were highly dependent on the collection used, although the final AUROCs were above chance for all collections. Class balancing was particularly necessary for this scheme, as data from autistic individuals comprised less than 10% overall.

Class activation was strongest for ASD in the limbic system, cerebellum, temporal lobe, and frontal middle orbital lobe, but overwhelmingly emphasised in the right caudate nucleus and paracentral lobule (Figure 3A). Findings of the caudate nucleus are consistent with historical findings in developmental ASD, with both aberrant FC frequently associated with that area and the presence of volume differences. As stated above, activation maximization saw high stratification with regards to gender and resting-state (Figure 3B). Collection also saw a mix of filters that were both highly stratified and highly diverse, indicating the dual use of convolutional filters. Given the phenotypic differences in our ASD datasets (with ABCD consisting largely of children and ABIDE adolescents, for instance), it is likely that the models considered parts of them independently during classification.

Gender

The ensemble classification of gender yielded 0.7680 AUROC, with comparable AUROCs across different collections (Figure 4C).

On average, CAMs in gender classifications showed more differences around areas in the corpus callosum and the frontal lobe (especially the medial left frontal lobe), as well as parietal areas, with very few subcortical differences (Figure 4A).

In activation maximization (Figure 4B), most of the filters mixed data from different genders and rest/task. A proportion were maximally activated by individual collections, but for the most part, this was mixed as well. Among the three classification tasks in this study, gender integrated the most data from different sources. As gender distributions are likely the most homogeneous variable tracked across datasets (with the exception of ABIDE I and II), the stratification with respect to individual collections was appropriately lower than expected when classifying other variables.

Task vs Rest

Task v rest classification had an ensemble AUROC of 0.9222 (Figure 5C), by far the highest of any classification task. BioBank rest/task classification was nearly perfect, while other collections that contributed substantial amounts to both resting-state and task participants, that is, NDAR, ABCD, and Open fMRI, had comparable performance. The CAM focused on the default mode network, largely in the left hemisphere, and its connection to the right frontal medial orbital area. The highly emphasised areas include the supplementary motor area, the left parietal lobe, the bilateral middle and inferior occipital lobe, the left precentral gyrus, and the bilateral thalamus, representing the wide range of areas activated in task fMRI.

In activation maximization, stratification was found with respect to task (the target covariate), somewhat on collection, and very little with respect to gender. A degree of collection stratification may be expected due to the different tasks found in different collections; for instance, BioBank consisted almost entirely of an emotional faces recognition task, while Open fMRI contains a medley of different tasks.
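The averaged class activation maps reported in this section (mean over datasets, maximum across the four wavelet scales, then the 100 strongest connections shown in panel A of Figures 3 to 5) can be sketched as follows. The input array, the function name, and the symmetrisation step are assumptions made for this example; computing the per-dataset CAMs themselves requires the trained network and is not shown.

```python
import numpy as np

def strongest_connections(cams, n_top=100):
    """Average CAMs over datasets, take the maximum over wavelet scales, rank edges.

    cams: array of shape (datasets, scales, nodes, nodes).
    Returns (node_i, node_j, value) tuples for the n_top largest upper-triangle entries.
    """
    mean_cam = cams.mean(axis=0)                  # (scales, nodes, nodes)
    collapsed = mean_cam.max(axis=0)              # (nodes, nodes)
    collapsed = (collapsed + collapsed.T) / 2.0   # assumption: treat edges as undirected
    rows, cols = np.triu_indices(collapsed.shape[0], k=1)
    values = collapsed[rows, cols]
    order = np.argsort(values)[::-1][:n_top]
    return [(int(rows[i]), int(cols[i]), float(values[i])) for i in order]

rng = np.random.default_rng(0)
cams = rng.random((20, 4, 116, 116))  # placeholder CAMs, not real model output
cams[:, 2, 5, 40] += 10.0             # plant one strong edge to check the ranking
cams[:, 2, 40, 5] += 10.0
top = strongest_connections(cams)
print(top[0][:2], len(top))
```

Averaging is meaningful here because every matrix entry refers to the same pair of brain regions in every dataset, which is not true of pixels in natural images.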

4. Discussion

This work describes how large and diverse imaging data might be analyzed by deep learning models, encouraging the aggregation of publicly available collections. Data were partitioned based on clear and logical features of the images and, even with imperfect classification accuracies, deep learning models were capable of recognizing complex patterns in large datasets, many consistent with previous work.

The neuroscientific objective of this study was to use available imaging data with deep learning to describe the pattern of functional brain changes that distinguishes ASD from TD. With the absence of any gold standard in this cross-sectional comparison, we also undertook classifications of gender and rest v task, which have more secure, robust findings in the extant literature, to confirm the veracity of the developed methods.

We used CAMs [54, 56] to identify connections and areas that had a pronounced influence on the classifications by the model. This method has previously been used in deep learning on functional connectivity [53, 55] as an effective way of dissecting NNs. However, a caveat to this is that CAMs, while indicators of areas of importance in the data, may not give a complete depiction of its distinguishing features. Without further tests, CAMs cannot indicate whether a particular set of edges is over-connected or under-connected, or whether the areas of high class activation are independent or components of a more complex pattern.

Figure 4. Results for gender classification. (A) The 100 strongest connections of the mean class activation maps, with the maximum value taken across wavelet correlations. (B) The distribution of the diversity index of maximal activations across all filters over 300 models, showing how much filters in general were dedicated to particular phenotypes. (C) The overall classification AUROC and the AUROC of individual data collections in the model, showing the overall and relative success of the model.


Figure 5. Results for resting-state/task classification. (A) The 100 strongest connections of the mean class activation maps, with the maximum value taken across wavelet correlations. (B) The distribution of the diversity index of maximal activations across all filters over 300 models, showing how much filters in general were dedicated to particular phenotypes. (C) The overall classification AUROC and the AUROC of individual data collections in the model, showing the overall and relative success of the model.

When classifying gender, the model was influenced by diffuse areas connected to the frontal lobe (Figure 4A). This is consistent with previous findings in gender comparisons of functional imaging, which did not find differences in brain activity in specific areas, but rather differences in local FC over large areas of the cortex.

Task vs rest FC classifications, as expected, identified the major components of the well-known default mode network (Figure 5A), a set of bilateral and symmetric regions that is suppressed during exogenous stimulation, as well as visual processing areas (the occipital lobe) and the supplementary motor area. Together with the comparison of gender, the agreement of these results with those expected from the extant literature gives confidence in the accuracy of classification by the CNN as well as the specificity of the visualization method used.

The paracentral lobule and right caudate nucleus, as well as connections to the cerebellum and vermis, were identified as salient to the comparison of ASD vs TD (Figure 3). This finding is largely substantiated by previous studies that have found both FC and volume differences between autistic and healthy individuals in the caudate nucleus, though these studies disagree on the exact nature of those differences. Much of the literature on functional connectivity in ASD, however, concerns network-wide differences rather than the localized differences captured by the CAMs.

Another of the key methods we used to interrogate the results from our deep learning model was activation maximization. Previously, activation maximization has been used for intuiting the internal configuration of NNs rather than for quantitative interpretation, which, to our knowledge, has not been attempted before, particularly across many different independent models.
Many of the filters in our models were wholly activated by datasets from a single group, while others utilised a mixture of datasets. We sought to quantify this effect through a diversity index, leading to two general observations. First, across models, a few filters were entirely activated by a single collection (i.e., had a diversity index of 0), though which collection remained inconsistent and was not apparently proportional to the amount of data contributed by that particular dataset. Second, across models, the diversity index was not normally distributed but often had two peaks, one at the low end of the spectrum (indicating stratification of the filters) and one at the high end (indicating a highly diverse, or close to random, distribution of the filter). In ASD, a disproportionately high number of filters were activated by a single collection, indicating that the NN split data internally more than in the other classification tasks.

In ASD, model accuracy was lower than the highest rates reported in the literature, although this result should be viewed with several caveats. The dataset used in this analysis was larger and more varied than any previously analyzed, consisting of many collections. Direct comparisons of machine learning classification methods are difficult, as there is no universally accepted scheme for dividing collections into training and test sets (unlike standardized competitions in other fields, such as the ImageNet Large Scale Visual Recognition Challenge (ILSVRC)). Furthermore, our exclusion criteria differed and, because we opted to use multiple scanning sessions from single subjects during training, we also used follow-up data in ABIDE not employed in previous studies. Class balancing may also have significantly affected the classification accuracy.
However, this was necessary to avoid spuriously large accuracies due to the highly skewed ratios of ASD-to-TD individuals. Lastly, preprocessing methods and exclusion criteria are not typically shared across collections, and thus technical and demographic differences in the input data cannot be discounted.

In this study (and in all previous large sample-size studies of ASD classification), the classification accuracy for ASD vs TD datasets does not approach the standards of clinical diagnosis, but the result remains pertinent. First, the intention of the models is to encourage further research and analysis in this field. Second, FC data may simply lack discrete, distinguishing signals indicative of ASD, making perfect classification impossible, in which case deep learning ought to be viewed as an advanced statistical model rather than a potential diagnostic tool. Third, ASD is a spectrum and not binary (unlike resting-state/task and, in the vast majority of cases, biological gender), and these labels were applied with varying diagnostic standards. While we are simply using the information available, we recognise that the problem itself may be ill-formed. This is also a potential explanation for the variance in model accuracies seen in Figure 2, compared to the other classification problems addressed. Fourth, due to the influence of confounding factors, high accuracy in machine learning for scientific applications should be viewed with skepticism; for instance, we used several stringent motion-regression algorithms in preprocessing, which likely mitigated the effects of group differences in motion that have previously been observed between autistic and non-autistic subjects.

Finally, our deep learning model provides several advantages and unique features. First, it employed multichannel input. Although this has long been the standard in 2D image classification (for instance, RGB images), it has not been utilized before in the classification of connectomes. Theoretically, this provides an advantage since it encodes more information about the underlying time-series. In supplementary tests, multichannel inputs generally increased the accuracy of our model by 23% over single-channel Pearson correlation input, though this was not tested extensively. Second, it used vertical filters to encode matrices. In initial versions of this study, we opted to copy the framework of Kawahara et al. 2017, which used cross-shaped filters, although this was found not to increase accuracy over vertical filters and caused the model to sometimes fail. Vertical filters were found to be more compatible with the frameworks of modern deep learning libraries, even though they sacrifice the theoretical advantage of encoding edge-to-edge connections.

In our training scheme, we also found substantial accuracy increases with the use of “ensemble” models in machine learning (Table 2); that is, using many independent NNs to vote on a single datapoint. This idea is not new in machine learning, but it is notable because the ensemble showed a substantial increase in AUROC and accuracy over the sum of the individual models, and thus in this context it was an effective method of smoothing out unexpected behaviour in models for potential real-world applications. Additionally, it is an effective way to evaluate the performance of a model across the entirety of a dataset. Combined with the attractiveness of evaluating and averaging models independently to reduce variance in class activation, this makes a good case for classifying functional connectomes using many independent models rather than one.
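The effect of ensembling on AUROC described above can be illustrated with a small sketch. It assumes soft voting, that is, averaging the output probabilities of independent models, which may differ from the voting rule used in the study. The helper names `auroc` and `ensemble_scores` are hypothetical, and the toy scores are constructed for illustration only; they are not results from this work.

```python
import numpy as np

def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a pair.
    """
    labels = np.asarray(labels).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative")
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

def ensemble_scores(per_model_scores):
    """Soft vote: average each datapoint's score over all models."""
    return np.asarray(per_model_scores, dtype=float).mean(axis=0)

# Toy example: 3 models scoring 4 datapoints (2 negative, 2 positive).
labels = [0, 0, 1, 1]
per_model = np.array([
    [0.2, 0.6, 0.50, 0.9],
    [0.7, 0.1, 0.80, 0.6],
    [0.3, 0.4, 0.35, 0.8],
])
individual = [auroc(labels, m) for m in per_model]
ensemble = ensemble_scores(per_model)
combined = auroc(labels, ensemble)
```

In this constructed example each model misranks a different pair, so every individual AUROC is 0.75 while the averaged scores rank all pairs correctly. Real models make correlated errors, so the gain from averaging is generally smaller.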

5. Conclusion

Our investigation was the first to amass an exceedingly large and diverse collection of fMRI data and then apply big data methods. We opted to present three important classification tasks and focus on the one that is both most interesting and least understood. With careful class-balancing, we show that deep learning models are capable of good-quality classifications across mixed collections, detecting differences in brain networks, in the functions of localized structures, or in FCs over large areas. CAMs highlighted key spatial elements of the classification, and our results were largely validated by prior findings of specific phenotypic differences. Activation maximisation gave insights into the types of features on which the CNN based its classification. While the deep learning model in its present form should not be viewed as a diagnostic tool, it is an example of the apparatus needed to statistically analyse large and publicly accessible volumes of data.

The classification of ASD, on average, pointed overwhelmingly to two key areas (the right caudate nucleus and the right paracentral lobule), which is consistent with many previous studies of ASD. However, it should be noted that the final AUROC was well below the standard for clinical diagnosis, and the variation of model accuracies across our ensemble was very high, especially in relation to the other two categorical classifications. Thus, the areas observed are unlikely to fully characterise ASD. This variation across our very mixed dataset is related to the difficulties of diagnosing ASD in different contexts, and a binary label applied to a spectrum disorder may make for an ill-formed machine learning problem.

The most salient future direction of the present work is to focus on one of the classification problems presented and analyse how class activation maps activate differently for different sorts of data. We can also take advantage of several aspects of the technique not explored in the present work, such as comparing the class activations with respect to different input classes. While this was outside the scope of the present study and would have complicated the analysis significantly, it is one of many possible directions in which to take future endeavours.

Acknowledgments

This paper is an extension of our previous work at IWINAC 2019. This study used publicly available datasets (acknowledgements below). This research was co-funded by the NIHR Cambridge Biomedical Research Centre and Marmaduke Sheild. This work was partly supported by the MINECO/FEDER under the RTI2018-098913-B100 project. Matthew Leming is supported by a Gates Cambridge Scholarship from the University of Cambridge.

ADNI Acknowledgement

Data used in the preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been

ICBM Acknowledgement

Data collection and sharing for this project was provided by the International Consortium for Brain Mapping (ICBM; Principal Investigator: John Mazziotta, MD, PhD). ICBM funding was provided by the National Institute of Biomedical Imaging and BioEngineering. ICBM data are disseminated by the Laboratory of Neuro Imaging at the University of Southern California.

NDAR Acknowledgement

Data and/or research tools used in the preparation of this manuscript were obtained from the NIH-supported National Database for Autism Research (NDAR). NDAR is a collaborative informatics system created by the National Institutes of Health to provide a national resource to support and accelerate research in ASD. Dataset identifier(s): [NIMH Data Archive Collection ID(s) or NIMH Data Archive Digital Object Identifier (DOI)]. This manuscript reflects the views of the authors and may not reflect the opinions or views of the NIH or of the Submitters submitting original data to NDAR.

ABCD Acknowledgement

Data used in the preparation of this article were obtained from the Adolescent Brain Cognitive Development (ABCD) Study (https://abcdstudy.org), held in the NIMH Data Archive (NDA). This is a multisite, longitudinal study designed to recruit more than 10,000 children age 9-10 and follow them over 10 years into early adulthood. The ABCD Study is supported by the National Institutes of Health and additional federal partners under award numbers U01DA041022, U01DA041028, U01DA041048, U01DA041089, U01DA041106, U01DA041117, U01DA041120, U01DA041134, U01DA041148, U01DA041156, U01DA041174, U24DA041123, and U24DA041147. A full list of supporters is available at https://abcdstudy.org/federal-partners.html. A listing of participating sites and a complete listing of the study investigators can be found at https://abcdstudy.org/Consortium Members.pdf. ABCD consortium investigators designed and implemented the study and/or provided data but did not necessarily participate in analysis or writing of this report. This manuscript reflects the views of the authors and may not reflect the opinions or views of the NIH or ABCD consortium investigators.

UK Biobank Acknowledgement

This research has been conducted using the UK Biobank Resource [project ID 20904], co-funded by the NIHR Cambridge Biomedical Research Centre and a Marmaduke Sheild grant to Richard A.I. Bethlehem and Varun Warrier. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care.

Other database Acknowledgements

We would also like to thank the 1000 Functional Connectomes Project, ABIDE I and II, and Open fMRI.

Bibliography

1. F. Cauda, E. Geda, K. Sacco, F. D’Agata, S. Duca, G. Geminiani and R. Keller, Grey matter abnormality in autism spectrum disorder: an activation likelihood estimation meta-analysis study, J Neurol Neurosurg Psychiatry (2011) 1304–1313.
2. T. DeRamus and R. Kana, Anatomical likelihood estimation meta-analysis of grey and white matter anomalies in autism spectrum disorders, NeuroImage: Clin. (2015) 525–536.
3. J. Yang and J. Hofmann, Action observation and imitation in autism spectrum disorders: an ALE meta-analysis of fMRI studies, Brain Imaging and Behavior (2015) 960–969.
4. K. Button, J. Ioannidis, C. Mokrysz, B. Nosek, J. Flint, E. J. Robinson and M. Munafo, Power failure: why small sample size undermines the reliability of neuroscience, Nat Rev Neurosci (2013) 365–376.
5. C. Nord, V. Valton, J. Wood and J. Roiser, Power-up: A reanalysis of 'power failure' in neuroscience using mixture modeling, J Neurosci (2017) 8051–8061.
6. S. Haar, S. Berman, M. Behrmann and I. Dinstein, Anatomical abnormalities in autism?, Cereb Cortex (2016) 1440–1452.
7. W. Zhang, W. Groen, M. Mennes, C. Greven, J. Buitelaar and N. Rommelse, Revisiting subcortical brain volume correlates of autism in the ABIDE dataset: effects of age and sex, Psychological Medicine (2018) 654–668.
8. B. Khundrakpam, J. Lewis, P. Kostopoulos, F. Carbonell and A. Evans, Cortical thickness abnormalities in autism spectrum disorders through late childhood, adolescence, and adulthood: A large-scale MRI study, Cereb Cortex (2017) 1721–1731.
9. D. van Rooij, E. Anagnostou, C. Arango, G. Auzias, M. Behrmann, G. Busatto, S. Calderoni, E. Daly, C. Deruelle, A. Di Martino et al., Cortical and subcortical brain morphometry differences between patients with autism spectrum disorder and healthy individuals across the lifespan: Results from the ENIGMA ASD working group, American Journal of Psychiatry (2017) 359–369.
10. E. Müller, A. Schuler and G. Yates, Social challenges and supports from the perspective of individuals with Asperger syndrome and other autism spectrum disabilities, Autism (2008) 173–190.
11. T. Simas, S. Chattopadhyay, C. Hagan, P. Kundu, A. Patel, R. Holt, D. Floris, J. Graham, C. Ooi, R. Tait et al., Semi-metric topology of the human connectome: Sensitivity and specificity to autism and major depressive disorder, PLoS One (2015).
12. M. Ahmadlou, H. Adeli and A. Adeli, Fractality and a wavelet-chaos-neural network methodology for EEG-based diagnosis of autistic spectrum disorder, Journal of Clinical Neurophysiology (2010) 328–333.
13. M. Ahmadlou, H. Adeli and A. Adeli, Fuzzy synchronization likelihood-wavelet methodology for diagnosis of autism spectrum disorder, Journal of Neuroscience Methods (2012) 203–209.
14. S. Bhat, U. Acharya, H. Adeli, G. Muralidhar Bairy and A. Adeli, Automated diagnosis of autism: In search of a mathematical marker, Reviews in the Neurosciences (2014) 851–861.
15. S. Bhat, U. Acharya, G. Adeli, Muralidhar Bairy, and A. Adeli, Autism: Cause factors, early diagnosis and therapies, Reviews in the Neurosciences (2014) 841–850.
16. A. Antoniades, L. Spyrou, D. Martin-Lopez, A. Valentin, G. Alarcon, S. Sanei and C. Took, Deep neural architectures for mapping scalp to intracranial EEG, Intl J Neur Sys (2018).
17. C. Hua, H. Wang, H. Wang, S. Lu, C. Liu, and S. Khalid, A novel method of building functional brain network using deep learning algorithm with application in proficiency detection, Intl J Neur Sys (2019) p. 1850015.
18. A. Ansari, P. Cherian, A. Caicedo, G. Naulaers, M. De Vos and S. Van Huffel, Neonatal seizure detection using deep convolutional neural networks, Intl J Neur Sys (2019) p. 1850011.
19. F. Schaper, Y. Zhao, M. Janssen, G. Wagner, A. Colon, D. Hilkman, E. Gommer, M. Vlooswijk, G. Hoogland, L. Ackermans et al., Single cell recordings to target the anterior nucleus of the thalamus in deep brain stimulation for patients with refractory epilepsy, Intl J Neur Sys (2019).
20. U. Acharya, S. Oh, Y. Hagiwara, J. Tan, H. Adeli and D. Subha, Automated EEG-based screening of depression using deep convolutional neural network, Comp Meth and Progr in Biomed (2018) 103–113.
21. U. Acharya, S. Oh, Y. Hagiwara, J. Tan, and H. Adeli, Deep convolutional neural network for the automated detection of seizure using EEG signals, Computers in Biology and Medicine (2018) 270–278.
22. E. Finn, X. Shen, D. Scheinost, M. Rosenberg, J. Huang, M. Chun, X. Papademetris and R. Constable, Functional connectome fingerprinting: Identifying individuals using patterns of brain connectivity, Nat Neurosci. (2015) 1664–1671.
23. W. Wang, J. Liu, S. Shi, T. Liu, L. Ma, X. Ma, J. Tian, Q. Gong and M. Wang, Altered resting-state functional activity in patients with autism spectrum disorder: A quantitative meta-analysis, Front. Neurol. (2018) p. 556.
24. M. Plitt, K. Barnes and A. Martin, Functional connectivity classification of autism identifies highly predictive brain features but falls short of biomarker standards, NeuroImage Clinical (2015) 359–66.
25. M. Just, V. Cherkassky, T. Keller and N. Minshew, Cortical activation and synchronization during sentence comprehension in high-functioning autism: evidence of underconnectivity, Brain (2004) 1811–21.
26. V. Cherkassky, R. Kana, T. Keller and M. Just, Functional connectivity in a baseline resting-state network in autism, Neuroreport (2006) 1687–90.
27. D. Kennedy and E. Courchesne, The intrinsic functional organization of the brain is altered in autism, NeuroImage (2008) 1877–85.
28. M. Assaf, K. Jagannathan, V. Calhoun, L. Miller, M. Stevens, R. Sahl, J. O’Boyle, R. Schultz and G. Pearlson, Abnormal functional connectivity of default mode sub-networks in autism spectrum disorder patients, NeuroImage (2010) 247–256.
29. T. Jones, P. Bandettini, L. Kenworthy, L. Case, S. Milleville, A. Martin and R. Birn, Sources of group differences in functional connectivity: an investigation applied to autism spectrum disorder, NeuroImage (2010) 401–414.
30. S. Weng, J. Wiggins, S. Peltier, M. Carrasco, S. Risi, C. Lord and C. Monk, Alterations of resting state functional connectivity in the default network in adolescents with autism spectrum disorders, Brain Res (2010) 202–214.
31. L. Cerliani, M. Mennes, R. Thomas, A. Martino, M. Thioux and C. Keysers, Increased functional connectivity between subcortical and cortical resting-state networks in autism spectrum disorder, JAMA Psychiatry (2015) 767–777.
32. H. Chien, H. Lin, M. Lai, S. Gau and W. Tseng, Hyperconnectivity of the right posterior temporo-parietal junction predicts social difficulties in boys with autism spectrum disorder, Autism Res (2015) 427–441.
33. S. Delmonte, L. O’Gallagher, E. Hanlon, J. McGrath and J. Balsters, Functional and structural connectivity of frontostriatal circuitry in autism spectrum disorder, Front Hum Neurosci (2013) p. 430.
34. A. Di Martino, C. Kelly, R. Grzadzinski, X. Zuo, M. Mennes, M. Mairena, C. Lord, F. Castellanos and M. Milham, Aberrant striatal functional connectivity in children with autism, Biol Psychiatry (2011) 847–856.
35. M. Nebel, A. Eloyan, A. Barber and S. Mostofsky, Precentral gyrus functional connectivity signatures of autism, Front Syst Neurosci (2014) p. 80.
36. M. Nebel, S. Joel, J. Muschelli, A. Barber, B. Caffo, J. Pekar and S. Mostofsky, Disruption of functional organization within the primary motor cortex in children with autism, Hum Brain Mapp (2014) 567–580.
37. J. Hull, L. Dokovna, Z. Jacokes, C. Torgerson, A. Irimia and J. van Horn, Resting-state functional connectivity in autism spectrum disorders: A review, Front Psychiatry (2017).
38. Y. LeCun, P. Haffner, L. Bottou and Y. Bengio, Object recognition with gradient-based learning, Lecture Notes in Computer Science (1999) 319–345.
39. G. Hinton, S. Osindero and Y.-H. Teh, A fast learning algorithm for deep belief nets, Neural Computation (2006) 1527–1554.
40. A. Krizhevsky, I. Sutskever and G. Hinton, ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems (2012).
41. J. Bruna, W. Zaremba, A. Szlam and Y. LeCun, Spectral networks and locally connected networks on graphs, ICLR (2014).
42. M. Defferrard, P. Bresson and X. Vandergheynst, Convolutional neural networks on graphs with fast localized spectral filtering, NIPS (2016) 3844–3852.
43. W. Hamilton, R. Ying and J. Leskovec, Representation learning on graphs: Methods and applications, Bulletin of the IEEE Computer Society Technical Committee on Data Engineering (2017).
44. Y. Hechtlinger, P. Chakravarti and J. Qin, A generalization of convolutional neural networks to graph-structured data, arXiv (2017).
45. T. Kipf and M. Welling, Semi-supervised classification with graph convolutional neural networks, ICLR 2017 (2017).
46. G. Nikolentzos, P. Meladianos, A. Tixier, K. Skianis and M. Vazirgiannis, Kernel graph convolutional neural networks, ICANN 2018 (2017).
47. M. Jung, H. Kosaka, D. Saito, M. Ishitobi, T. Morita, K. Inohara, M. Asano, S. Arai, T. Munesue, A. Tomoda et al., Default mode network in young male adults with autism spectrum disorder: relationship with autism spectrum traits, Mol Autism (2014) p. 35.
48. T. Price, C. Wee, W. Gao and D. Shen, Multiple-network classification of childhood autism using functional connectivity dynamics, Med Image Comput Comput Assist Interv (2014) 177–184.
49. T. Iidaka, Resting state functional magnetic resonance imaging and neural network classified autism and control, Cortex (2015) 55–67.
50. V. Subbaraju, M. Suresh, S. Sundaram and S. Narasimhan, Identifying differences in brain activities and an accurate detection of autism spectrum disorder using resting state functional-magnetic resonance imaging: A spatial filtering approach, Med Image Anal (2017) 375–389.
51. C. Brown, J. Kawahara and G. Hamarneh, Connectome priors in deep neural networks to predict autism, ISBI 2018 (2018).
52. A. Heinsfeld, A. Franco, R. Craddock, A. Buchweitz and F. Meneguzzia, Identification of autism spectrum disorder using deep learning and the ABIDE dataset, NeuroImage: Clinical (2018) 16–23.
53. M. Khosla, K. Jamison, A. Kuceyeski and M. Sabuncu, 3D convolutional neural networks for classification of functional connectomes, MICCAI 2018 (2018).
54. K. Simonyan, A. Vedaldi and A. Zisserman, Deep inside convolutional networks: Visualising image classification models and saliency maps, Workshop at International Conference on Learning Representations, 2014.
55. J. Kawahara, C. Brown, S. Miller, B. Booth, V. Chau, R. Grunau, J. Zwicker and G. Hamarneh, BrainNetCNN: Convolutional neural networks for brain networks; towards predicting neurodevelopment, NeuroImage (2017) 1038–1049.
56. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh and D. Batra, Grad-CAM: Visual explanations from deep networks via gradient-based localization (2017).
57. D. Erhan, Y. Bengio, A. Courville, and P. Vincent, Visualizing higher-layer features of a deep network, technical report 1341, University of Montreal (2009).
58. R. Poldrack, D. Barch, J. Mitchell, T. Wager, A. Wagner, J. Devlin, C. Cumba, O. Koyejo and M. Milham, Toward open sharing of task-based fMRI data: the OpenfMRI project, Front Neuroinform (2013).
59. R. Poldrack and K. Gorgolewski, OpenfMRI: Open sharing of task fMRI data, NeuroImage (2017) 259–261.
60. A. Di Martino, C. Yan, Q. Li, E. Denio, F. Castellanos, K. Alaerts, J. Anderson, M. Assaf, S. Bookheimer, M. Dapretto et al., The autism brain imaging data exchange: towards a large-scale evaluation of the intrinsic brain architecture in autism, Mol Psychiatry (2014) 659–67.
61. A. Di Martino, D. O’Connor, B. Chen, K. Alaerts, J. Anderson, M. Assaf, J. Balsters, L. Baxter, A. Beggiato, S. Bernaerts et al., Enhancing studies of the connectome in autism using the autism brain imaging data exchange II, Sci Data (2017) p. 170010.
62. B. Casey and A. Dale, The adolescent brain cognitive development (ABCD) study: Imaging acquisition across 21 sites, Dev Cog Neuro (2018) 43–54.
63. D. Hall, M. Huerta, M. McAuliffe and G. Farber, Sharing heterogeneous data: The national database for autism research, Neuroinf. (2012) 331–339.
64. E. Dolgin, This is your brain online: the functional connectomes project, Nat. Med. (2010) p. 351.
65. A. Patel, P. Kundu, M. Rubinov, P. Jones, P. Vertes, K. Ersche, J. Suckling and E. Bullmore, A wavelet method for modeling and despiking motion artifacts from resting-state fMRI time series, NeuroImage (2014) 287–304.
66. A. Patel and E. Bullmore, A wavelet-based estimator of the degrees of freedom in denoised fMRI time series for probabilistic testing of functional connectivity and brain graphs, NeuroImage (2016) 14–26.
67. N. Tzourio-Mazoyer, B. Landeau, D. Papathanassiou, F. Crivello, O. Etard, N. Delcroix, B. Mazoyer and M. Joliot, Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain, NeuroImage (2002) 273–289.
68. F. Chollet, keras, https://github.com/fchollet/keras (2015).
69. S. Ioffe and C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, arXiv (2015).
70. R. Kohavi, A study of cross-validation and bootstrap for accuracy estimation and model selection, Intelligence - Volume 2, IJCAI95, Morgan Kaufmann Publishers Inc., San Francisco, C (1995) 1137–1143.
71. R. Kotikalapudi and contributors, keras-vis, https://github.com/raghakot/keras-vis (2017).
72. A. Qiu, T. Anh, Y. Li, H. Chen, A. Rifkin-Graboi, B. Broekman, K. Kwek, S. Saw, Y. Chong, P. Gluckman et al., Prenatal maternal depression alters amygdala functional connectivity in 6-month-old infants, Transl Psychiatry (2015) p. e508.
73. L. Sears, C. Vest, S. Mohamed, J. Bailey, B. Ranson and J. Piven, An MRI study of the basal ganglia in autism, Prog Neuropsychopharmacol Biol Psychiatry (1999) 613–624.
74. G. McAlonan, E. Daly, V. Kumari, H. Critchley, T. van Amelsvoort, J. Suckling, A. Simmons, T. Sigmundsson, K. Greenwood, A. Russell et al., Brain anatomy and sensorimotor gating in Asperger’s syndrome, Brain (2002) 1594–1606.
75. P. Brambilla, A. Hardan, S. di Nemi, J. Perez, J. Soares and F. Barale, Brain anatomy and development in autism: review of structural MRI studies, Brain Res Bull (2003) 557–569.
76. E. Hollander, E. Anagnostou, W. Chaplin, K. Esposito, M. Haznedar, E. Licalzi, S. Wasserman, L. Soorya and M. Buchsbaum, Striatal volume on magnetic resonance imaging and repetitive behaviors in autism, Biol Psychiatry (2005) 226–232.
77. L. O’Dwyer, C. Tanner, E. van Dongen, C. Greven, J. Bralten, M. Zwiers, B. Franke, D. Heslenfeld, J. Oosterlaan, P. Hoekstra et al., Decreased left caudate volume is associated with increased severity of autistic-like symptoms in a cohort of ADHD patients and their unaffected siblings, PLoS One (2016) p. e0165620.
78. D. Rojas, E. Peterson, E. Winterrowd, M. Reite, S. Rogers and J. Tregellas, Regional gray matter volumetric changes in autism associated with social and repetitive behavior symptoms, BMC Psych (2006) p. 56.
79. K. Turner, L. Frost, D. Linsenbardt, J. McIlroy and R. Müller, Atypically diffuse functional connectivity between caudate nuclei and cerebral cortex in autism, Behav Brain Funct (2006).
80. T. Qiu, C. Chang, Y. Li, L. Qian, C. Xiao, T. Xiao, X. Xiao, Y. Xiao, K. Chu et al., Two years changes in the development of caudate nucleus are involved in restricted repetitive behaviors in 2–5-year-old children with autism spectrum disorder, Developmental Cognitive Neuroscience (2016) 137–143.
81. D. Tomasi and N. Volkow, Gender differences in brain functional connectivity density, Hum Brain Mapp. (2013) 849–860.
82. M. Raichle, A. MacLeod, A. Snyder, W. Powers, D. Gusnard and G. Shulman, A default mode of brain function, PNAS (2001) 676–682.
83. M. Greicius, B. Krasnow, A. Reiss and V. Menon, Functional connectivity in the resting brain: A network analysis of the default mode hypothesis, PNAS (2003) 253–258.
84. O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla et al., ImageNet large scale visual recognition challenge, International Journal of Computer Vision (2015) 211–252.
85. M. Ribeiro, S. Singh and C. Guestrin, “Why should I trust you?” Explaining the predictions of any classifier, Knowledge Discovery and Data Mining (KDD) (2016).
86. J. Cook, S. Blakemore and C. Press, Atypical basic movement kinematics in autism spectrum conditions, Brain (2013) 2816–2824.
87. D. Opitz and R. Maclin, Popular ensemble methods: An empirical study,