[PDF] Bayesian Spatial Models for Voxel-wise Prostate Cancer Classification Using Multi-parametric MRI Data

Abstract

Multi-parametric magnetic resonance imaging (mpMRI) plays an increasingly important role in the diagnosis of prostate cancer. Various computer-aided detection algorithms have been proposed for automated prostate cancer detection by combining information from various mpMRI data components. However, there exist other features of mpMRI, including the spatial correlation between voxels and between-patient heterogeneity in the mpMRI parameters, that have not been fully explored in the literature but could potentially improve cancer detection if leveraged appropriately. This paper proposes novel voxel-wise Bayesian classifiers for prostate cancer that account for the spatial correlation and between-patient heterogeneity in mpMRI. Modeling the spatial correlation is challenging due to the extreme high dimensionality of the data, and we consider three computationally efficient approaches using Nearest Neighbor Gaussian Process (NNGP), knot-based reduced-rank approximation, and a conditional autoregressive (CAR) model, respectively. The between-patient heterogeneity is accounted for by adding a subject-specific random intercept on the mpMRI parameter model. Simulation results show that properly modeling the spatial correlation and between-patient heterogeneity improves classification accuracy. Application to in vivo data illustrates that classification is improved by spatial modeling using NNGP and reduced-rank approximation but not the CAR model, while modeling the between-patient heterogeneity does not further improve our classifier. Among our proposed models, the NNGP-based model is recommended considering its robust classification accuracy and high computational efficiency.

Full PDF

BBayesian Spatial Models for Voxel-wise ProstateCancer Classiﬁcation Using Multi-parametric MRIData

Jin Jin , ∗ , Lin Zhang , Ethan Leng ,Gregory J. Metzger , and Joseph S. Koopmeiners ∗ [email protected] Department of Biostatistics, Johns Hopkins BloombergSchool of Public Health, Baltimore, MD, USA Division of Biostatistics, School of Public Health, University of Minnesota,Minneapolis, MN, USA Department of Biomedical Engineering, University of Minnesota,Minneapolis, MN, USA Department of Radiology, Center for Magnetic Resonance Research,University of Minnesota, Minneapolis, MN, USA

Abstract

Multi-parametric magnetic resonance imaging (mpMRI) plays an increasingly importantrole in the diagnosis of prostate cancer. Various computer-aided detection algorithms havebeen proposed for automated prostate cancer detection by combining information from var-ious mpMRI data components. However, there exist other features of mpMRI, including thespatial correlation between voxels and between-patient heterogeneity in the mpMRI param-eters, that have not been fully explored in the literature but could potentially improve cancerdetection if leveraged appropriately. This paper proposes novel voxel-wise Bayesian classi-ﬁers for prostate cancer that account for the spatial correlation and between-patient hetero-geneity in mpMRI. Modeling the spatial correlation is challenging due to the extreme highdimensionality of the data, and we consider three computationally efﬁcient approaches usingNearest Neighbor Gaussian Process (NNGP), knot-based reduced-rank approximation, anda conditional autoregressive (CAR) model, respectively. The between-patient heterogene-ity is accounted for by adding a subject-speciﬁc random intercept on the mpMRI parametermodel. Simulation results show that properly modeling the spatial correlation and between-patient heterogeneity improves classiﬁcation accuracy. Application to in vivo data illustratesthat classiﬁcation is improved by spatial modeling using NNGP and reduced-rank approxi-mation but not the CAR model, while modeling the between-patient heterogeneity does not a r X i v : . [ s t a t . A P ] J a n urther improve our classiﬁer. Among our proposed models, the NNGP-based model is rec-ommended considering its robust classiﬁcation accuracy and high computational efﬁciency. Keywords—

Bayesian hierarchical models; Between-patient heterogeneity; MpMRI; Multi-imagespatial modeling; NNGP; Voxel-wise prostate cancer classiﬁcation.

Multi-parametric magnetic resonance imaging (mpMRI) has continued to play an important role in thedetection of prostate cancer (Kurhanewicz et al., 2008; Dickinson et al., 2011). Conventional manualdiagnosis using mpMRI does not realize its full potential due to demonstrated radiologist and experience-dependent variability (Rosenkrantz et al., 2013; Garcia-Reyes et al., 2015). To avoid such shortcomings,various computer-aided detection (CAD) algorithms have been proposed, seeking automated cancer de-tection using different methods to combine information from various MRI parameters, such as linear andnonlinear regressions, clustering methods, support vector machine, ensemble learning approaches andnaive Bayes (Niaf et al., 2012; Shah et al., 2012; Vos et al., 2012; Peng et al., 2013; Khalvati, Wong andHaider, 2015). However, these existing methods have been found insufﬁcient, either lacking the ﬂexibilityto model important features of the mpMRI data, or are overly ﬂexible, resulting in poor performance withthe sample sizes found in prostate cancer imaging studies. Previously, it has been observed that there existssubstantial difference in both the distribution of the observed mpMRI parameters and cancer prevalencebetween the two regions of the prostate: the peripheral zone (PZ) and central gland (CG). Recently, Jin etal. (2018) proposed to model the regional heterogeneity using a Bayesian hierarchical modeling frame-work, which has a ﬂexible structure allowing complex distributional assumptions for both the predictors(mpMRI parameters) and the outcome (cancer status) that cannot be easily accounted for by the existingCAD algorithms.In addition to the regional variability, mpMRI data also show substantial spatial correlation in thevoxel-wise cancer status, as well as between-patient heterogeneity in the mpMRI parameters which ispossibly due to registration error or variability in patients’ physical conditions. Although Jin et al. (2018)proposed to account for spatial correlation between voxels using post hoc spatial smoothing, systematicmodeling of these various mpMRI features has not been formally investigated in the context of fully au-tomated, voxel-wise prostate cancer detection. The major challenge associated with modeling the spatialcorrelation is the extreme high dimensionality of the data. Typically, each mpMRI image has thousandsof voxels, which must be modeled simultaneously over multiple images for model development. Spatialmodeling requires inverting large spatial covariance matrices that typically requires ∼ n ﬂoating pointoperations and storage of order n , with n denoting the number of voxels in an image, which is computa-tionally infeasible for our motivating data set.There are two general approaches for modeling large spatial data sets. One is sparse approximation,which introduces sparsity in the spatial covariance or precision matrix. Methods include covariance taper-ing assuming compactly supported covariance functions (e.g. Kaufman, Schervish and Nychka, 2008),and approximating the likelihood by the product of lower dimensional conditional densities (e.g. Vec-chia, 1988; Stein, Chi and Welty, 2004), Markov random ﬁelds (e.g. Rue and Held, 2005), or compositelikelihoods (e.g. Eidsvik et al., 2014). More recently, Datta et al. (2016) proposed a Nearest NeighborGaussian Process (NNGP) for fully process-based modeling of large spatial data sets, which approximates he likelihood of a spatial process by the product of conditional densities between nearest neighbors. Theother general approach is reduced-rank approximation, which constructs spatial processes on a lower-dimensional subspace. Methods include predictive process models (Banerjee et al., 2008; Finley, Banerjeeand McRoberts, 2009), low rank splines or basis functions (Cressie and Johannesson, 2008) and kernelconvolutions (Higdon, 2002).This paper aims to improve voxel-wise classiﬁcation of prostate cancer by systematically modelingthe between-patient variability in the mpMRI parameters and spatial correlation in the voxel-wise cancerstatus via Bayesian hierarchical modeling, which can potentially better localize and determine the extentof cancer lesions, providing enhanced guidance for needle biopsy and focal therapy in clinical practice.Our proposed models build on the work of Jin et al. (2018), which accounts for regional variability in thevarious data components, and will serve as the “baseline” to evaluate the performance of our proposedmodels. The between-patient heterogeneity is incorporated by specifying a subject speciﬁc random in-tercept for the mpMRI parameters. Unlike the post hoc spatial smoothing technique used by Jin et al.(2018), we propose to formally model the spatial correlation structure in the voxel-wise cancer status viathree scalable spatial modeling approaches: a sparse approximation approach using the NNGP proposedby Datta et al. (2016), a reduced-rank approximation approach using the knot-based method in Baner-jee et al. (2008), and Gaussian Markov random ﬁelds implemented via the conditional autoregressive(CAR) model. Our simulation results illustrate that the proposed models substantially improve classi-ﬁcation accuracy both by incorporating the between-patient heterogeneity and by modeling the spatialcorrelation. Application to our motivating data set demonstrates improvement due to modeling the spatialcorrelation using the NNGP and reduced-rank approximation but not the CAR model, while modelingthe between-patient heterogeneity does not further improve classiﬁcation accuracy. Among our proposedscalable spatial modeling approaches, the NNGP-based approach is recommended considering its robustclassiﬁcation accuracy with respect to spatial correlation pattern and high computational efﬁciency.The remainder of the paper proceeds as follows. In Section 2, we describe our motivating data set. InSection 3, we discuss methods development and implementation. In Section 4, we present simulation re-sults illustrating the performance of the various models discussed in Section 3, and in Section 5, we assessmodel performance on our motivating data set. We conclude with a discussion of the model propertiesand potential extensions in Section 6. Our methods development was motivated by a unique mpMRI data set obtained from patients diagnosedwith prostate cancer at the University of Minnesota, the collection procedure for which was previouslydescribed (Metzger et al., 2016). Brieﬂy, the data were collected on a clinical 3T scanner. The diffusion-weighted and contrast-enhanced images (DWI and DCE-MRI, respectively) were acquired in accordancewith Prostate Imaging – Reporting and Data System (PI-RADS) v2 guidelines (Weinreb et al., 2016).Maps of the quantitative MRI parameters were then calculated from these data: apparent diffusion co-efﬁcient (ADC) maps were calculated from the DWI data acquired using multiple diffusion-encodingb-values; the maps of DCE-MRI parameters, including the area under the gadolinium concentration timecurve at 90 seconds (AUGC90), the forward volume transfer constant (K trans ), and reﬂux rate constant(k ep ), were generated using a modiﬁed Tofts model (Tofts, 1997). Maps of these quantitative parameterswere then manually co-registered. Patients that were imaged subsequently underwent radical prostatec-tomy, and the ex vivo prostate specimens were collected and processed after surgery. The histopathologyslides were annotated for cancer by trained pathologists, then co-registered with the quantitative MR maps sing a registration method in Kalavagunta et al. (2015). The data contains 34 prostate images, each has ∼ voxels and is from a different patient/subject. Figure 1 presents an example of a prostateimage showing the two regions of the prostate and areas of cancer and non-cancer. Figure 1: An example of the manually guided segmentation of the T2-weighted anatomic prostateimage in the mpMRI data set. The prostate gland is the area within the green curve. Theblue curve demarcates the division between the peripheral zone (PZ) and central gland (CG).Histopathologically identiﬁed cancer and noncancer regions are indicated as the white and blackareas, respectively, in the second sub-ﬁgure.

The general notations for the various data components are as follows. Assume there are N images(subjects), with n i voxels in the i th image, i = 1 , , . . . , N . For the j th voxel in the i th image, wehave a d × vector of observed mpMRI parameters, y ij = ( y ij, , ..., y ij,d ) T , which, for our data set,includes the aforementioned ADC, AUGC90, K trans and k ep . Note that the ADC values are approximatelynormally distributed, while the right-skewed AUGC90, K trans and k ep values are log transformed to havean approximately normal distribution. Each voxel has a binary cancer indicator obtained from the co-registered pathology data, which we call c ij , where c ij = 1 indicates cancer and indicates noncancer.In addition, each voxel is identiﬁed by a set of location information, including r ij , a binary indicator ofprostate region, with r ij = 1 indicating PZ and indicating CG, and s ij , a standardized 2-D coordinate.Unlike the brain for which a common template already exists, the size of the prostate can vary substantiallyacross patients, making it difﬁcult to develop a common template for the prostate. Therefore, we linearlyrescaled the original 2-D coordinates of the voxels so that they have a common support of ( − , × ( − , , with (0 , being the center of each image. We denote the set of data components as Y = { y ij | i = 1 , . . . , N, j = 1 , . . . , n i } , c = { c ij | i = 1 , . . . , N, j = 1 , . . . , n i } , R = { r ij | i = 1 , . . . , N, j =1 , . . . , n i } and S = { s ij | i = 1 , . . . , N, j = 1 , . . . , n i } . Finally, we use Y ∗ , R ∗ and S ∗ to denote theset of mpMRI parameters, region information and coordinates, respectively, for voxels in a new prostateimage whose cancer status c ∗ are unobserved and must be predicted. Method

The classiﬁcation models proposed in this paper build upon the work of Jin et al. (2018), which focusedon modeling the regional heterogeneity in the mpMRI data, i.e. the difference in both the distributionof mpMRI parameters and cancer prevalence between the two regions of the prostate, PZ and CG. Jin etal. (2018) proposed to model the region-speciﬁc joint density of the mpMRI parameters, Y , and cancerstatus, c , given region information, R , via a Bayesian hierarchical modeling framework. The unknowncancer status for the voxels in a new prostate image, c ∗ , are classiﬁed using posterior predictive cancerprobabilities given Y ∗ and R ∗ . Speciﬁcally, the joint distribution of Y and c is assumed to depend on R , which can be deﬁned hierarchically as f ( Y , c | R ) = f ( Y | c , R ) p ( c | R ) . For f ( Y | c , R ) : y ij ’s areassumed to follow independent multivariate normal distributions with mean and covariance matrix varyingby cancer status c ij ∈ { , } and region indicator r ij ∈ { , } . For p ( c | R ) : c ij is assumed to follow aBernoulli distribution, Ber ( p r ij ) , with cancer probability p r ij varying by region indicator r ij ∈ { , } : y ij | c ij , r ij iid ∼ MVN ( µ c ij ,r ij , Γ c ij ,r ij ) ,c ij | r ij iid ∼ Ber ( p r ij ) . (1)Model (1) is the basis for the development of our proposed models, and will be used as the baseline forevaluating the performance of the models deﬁned below. In this sub-section, we introduce our general approach for modeling the between-patient heterogeneity inmpMRI parameters and spatial correlation in the voxel-wise cancer status.

Building on the hierarchical structure of the baseline model (1), the between-patient heterogeneity inthe mpMRI parameters can be incorporated by introducing a random intercept, δ i ∼ MVN ( , Σ ) , on { y ij | j = 1 , , . . . , n i } , so that y ij | c ij , r ij , δ i iid ∼ MVN ( µ c ij ,r ij + δ i , Γ c ij ,r ij ) . In our setting, δ i repre-sents the subject-speciﬁc random shift of the i th patient from the overall mean, with respect to the mpMRIparameters. The baseline model (1) assumes that all voxels are independent, with voxel-wise cancer probability p r ij only depending on the region indicator of the voxel. In reality, however, substantial spatial correlation hasbeen observed in the voxel-wise cancer status within each image. To account for this feature, we insteadassume that the voxel-wise cancer probabilities in the i th image, which we now denote as p ij ’s, vary bythe location/coordinate of the voxels, s ij ’s, and are spatially correlated according to the distance between s ij ’s. Note that this only implies spatial correlation between voxels within an image and that voxels fromdifferent images/patients are assumed independent. For Gaussian distributed geostatistical data, Gaussian rocess models are widely used to model the spatial correlation assuming correlated Gaussian spatialrandom effects. The p ij ’s, however, are restricted to the unit interval, and Gaussian spatial random effectscannot be directly applied. Instead, we conduct a probit transformation on p ij and obtain q ij = Φ − ( p ij ) ,where Φ − denotes the probit function, which has a support of ( −∞ , + ∞ ) , then assume a vector ofspatially correlated Gaussian random effects w i = ( w i , w i , . . . , w i,n i ) T on q i = ( q i , q i , . . . , q i,n i ) T .Under the probit model structure, we introduce a latent variable, κ ij ∼ N ( q r ij , + w ij , , where q r ij , denotes the probit of the cancer prevalence in region r ij ∈ { , } . The cancer status, c ij , is then deﬁnedas I ( κ ij > . The full model that accounts for both between-patient heterogeneity and spatial correlationbetween voxels becomes: y ij iid ∼ MVN ( µ c ij ,r ij + δ i , Γ c ij ,r ij ) , δ i ∼ MVN ( , Σ ) ,c ij = I ( κ ij > , κ ij ∼ N ( q r ij , + w ij , , w i ∼MVN ( , C ( S i , S i | θ )) , (2)where S i = ( s i , s i , . . . , s i,n i ) T , C ( S i , S i | θ ) denotes the spatial covariance matrix of w i , and θ de-notes the set of spatial parameters shared by all images. For the construction of C ( S i , S i | θ ) , we as-sume that w i is the realization of a zero-mean Gaussian process GP ( , C ( · , ·| θ )) on S i . We deﬁne C ( S i , S i | θ ) = σ ρ ( S i , S i | ζ ) , where ρ ( · , ·| ζ ) is a correlation function with a set of correlation parame-ters ζ , σ denotes the spatial variance, and θ = { σ , ζ } . In this paper, we employ a Matérn correlationfunction (Stein, 2012), which is one of the most widely used correlation functions in spatial statistics thatcovers various types of stationary spatial correlation patterns. Given the Matérn correlation function, thespatial correlation between two locations s , t ∈ R is deﬁned as: ρ ( s , t | ζ ) = 12 ν − Γ( ν ) (cid:16) ν / (cid:107) s − t (cid:107) φ (cid:17) ν J ν (cid:16) ν / (cid:107) s − t (cid:107) φ (cid:17) , (3)where ζ = { φ, ν } , φ is the spatial range parameter with larger φ indicating larger-scale spatial correlation, ν is the smoothness parameter controlling the degree of differentiability, with larger ν indicating smoothercorrelation, Γ( · ) is the gamma function, J ν ( · ) is the modiﬁed Bessel function of the second kind with order ν , and (cid:107)·(cid:107) denotes the Euclidean distance. The full spatial process model (2) becomes computationally infeasible on our motivating data set, wherethere are a large number of images, each involving a separate spatial process over thousands of voxels.In this sub-section, we propose three different approaches that ensure scalable spatial modeling of thempMRI data.

Our ﬁrst scalable approach to modeling spatial correlation is based on sparse approximation via NNGP.Most sparse approximation approaches do not necessarily deﬁne a valid spatial process, and prediction isthrough interpolation from a different spatial process that may not reﬂect the true predictive uncertainty.To deal with issue, Datta et al. (2016) proposed NNGP for fully process-based modeling of large spatialdata sets, which was shown to signiﬁcantly outperform competing approaches in terms of inference andscalability.The construction of NNGP was discussed in detail by Datta et al. (2016), with applications tospatially-correlated, normally distributed observations on a single map in the linear regression model ramework. In our setting, we extend NNGP to Bayesian hierarchical modeling of multi-image data withspatially-correlated binary outcomes. We apply a separate NNGP to each image. Take the i th image asan example: the joint density of w i (i.e. the vector of spatial random effects on q i ) can be written as theproduct of conditional densities: f ( w i ) = n i (cid:89) j =1 f ( w ij | w i , w i , . . . , w i,j − ) . (4)To reduce the computational burden, we replace each large conditioning set { w i , w i , . . . , w i,j − } witha smaller set of size at most m on N ( s ij ) ⊆ S i \ { s ij } , where m (cid:28) min i n i , to construct an alternativedensity: (cid:101) f ( w i ) = n i (cid:89) j =1 f ( w ij | w N ( s ij ) ) , (5)where w N ( s ij ) denotes the vector of spatial random effects on N ( s ij ) , the m nearest neighbors of s ij .For each image i , we view { S i , N S i } as a directed graph G , with S i being the set of nodes and N S i the setof directed edges. It has been proven that if G is a directed acyclic graph, then (cid:101) f ( w i ) in (5) will be a propermultivariate joint density. Speciﬁcally, let C s ij denote the variance of w ij , C N ( s ij ) the m × m covariancematrix of w N ( s ij ) , and C s ij ,N ( s ij ) the × m cross-covariance matrix between w ij and w N ( s ij ) . Wecan show that (cid:101) f ( w i ) = n i (cid:89) j =1 N ( w ij | B s ij w N ( s ij ) , F s ij ) , (6)where B s ij = C s ij ,N ( s ij ) C − N ( s ij ) , F s ij = C s ij − C s ij ,N ( s ij ) C − N ( s ij ) C N ( s ij ) ,s ij . In fact, (cid:101) f ( w i ) isthe probability density function of a multivariate normal distribution, which we denote as MVN ( , (cid:101) C S i ) .Given that each N ( s ij ) has at most m ( m (cid:28) min i n i ) members, it can be shown that the precision matrix (cid:101) C − S i is sparse with at most m ( m + 1) n i / non-zero entries.To construct { N ( s ij ) | j = 1 , . . . , n i } , i.e. the neighbor sets in the i th image, we ﬁrst order the voxelsby x-coordinate then y-coordinate, and denote the ordered voxels as s i , s i , ..., s i,n i , then deﬁne N ( s ij ) as the set of m voxels in { s i , s i , ..., s i,j − } that have the shortest Euclidean distance from s ij . Theordering of voxels has been shown to have no discernible impact on the approximation of (4) by (5).The choice of m can follow standard model comparison metrics such as DIC, but typically a small valuebetween ∼ can provide inference almost indistinguishable to full spatial models for an image withthousands of voxels (Datta et al., 2016). Our proposed model with NNGP for spatial modeling still followsthe structure of the full model (2), except that we replace MVN ( , C S i ) , the original prior for w i , by theNNGP prior MVN ( , (cid:101) C S i ) . The NNGP-based approach proposed in Section 3.3.1 reduces computational intensity by inducing spar-sity in the large spatial precision matrix, which has been proven to perform well in capturing local spatialdependence structures. An alternative approach is through reduced-rank approximation, which is betterequipped to capture large-scale, global spatial dependency (Finley et al., 2009). mong the various reduced-rank approximation techniques, we considered a knot-based method pro-posed by Banerjee et al. (2008), which regresses the original process on its realizations over a smallerset of locations, referred to as “knots”. Take the i th image as an example: we ﬁrst select a set of a knots, S ∗ i = { s ∗ i, , ..., s ∗ i,a } ⊂ S i , where a (cid:28) min i n i , with corresponding spatial random effects w ∗ i = ( w s ∗ i, , ..., w s ∗ i,a ) T . The original Gaussian process in model (2) yields w ∗ i ∼ MVN ( , C S ∗ i ) ,where C S ∗ i = C ( S ∗ i , S ∗ i | θ ) . Using w ∗ i as the basis, for any single site s ik , the corresponding spatialinterpolant is given by (cid:101) w ( s ik ) = E [ w ( s ik ) | w ∗ i ] = C s ik ,S ∗ i C − S ∗ i w ∗ i , where C s ik ,S ∗ i = C ( s ik , S ∗ i | θ ) isthe × a cross-covariance matrix between w s ik and w ∗ i . This single-site interpolator deﬁnes a new spa-tial process w ∼∼ ( S i ) ∼ GP ( , C ∼∼ ( · , ·| θ , S ∗ i )) , where C ∼∼ ( S i , S i | θ , S ∗ i ) = C S i ,S ∗ i C − S ∗ i C S ∗ i ,S i , and we canreplace the original spatial random effects w i by w ∼∼ i = E [ w i | w ∗ i ] = C S i ,S ∗ i C − S ∗ i w ∗ i . Since the resultingcovariance matrix has a ﬁxed rank a much smaller than min i n i , faster computation can be achieved byavoiding inverting spatial covariance matrices of size larger than a × a . The ﬁnal approach we consider uses Gaussian Markov random ﬁelds (GMRF). Different from the NNGP-based and reduced-rank approaches, GMRF does not specify a spatial correlation function. Instead, spatialdependency is introduced by specifying the conditional distributions { f ( w ij | w i, − j ) | j = 1 , . . . , n i } . Inthis paper, we apply a popular application of GMRF: the conditional autoregressive (CAR) model, on w i : w ij | w i, − j ∼ N ( (cid:88) k (cid:54) = j b ijk w ik , σ ) , (7)where b ijk is the weight of w ik on w ij that can be speciﬁed by the user, and we deﬁne b ijk = d − ijk (cid:80) l (cid:54) = j d − ijl ,with d ijk denoting the Euclidean distance between voxel j and k in the i th image. Using Brook’s lemma(Rue and Held, 2005), we can obtain that w i ∼ MVN ( , σ ( I − B i ) − ) , where B i = [ b ijk ] n i j,k =1 . Theprecision matrix of w i has a closed form σ − ( I − B i ) , therefore we have avoided the need to invert largecovariance matrices. However, since GMRF does not allow inference on the underlying spatial process,this model may not reveal the true spatial dependence structure, which may limit accuracy of the resultingclassiﬁer. Bayesian inference and classiﬁcation of the various models were implemented using MCMC algorithmsvia Gibbs sampler with a Metropolis-Hastings sampling step. In particular, the non-spatial parameters,as well as w ij ’s and k ij ’s, were updated via Gibbs sampling, while the spatial parameters in θ wereupdated in block via Metropolis-Hastings sampling. Our initial simulation results showed that when thespatial variance σ was large (e.g. 50) and the spatial range φ was large (e.g. 5), given a ﬂat prior, q r, ’s, the probit of the region-speciﬁc cancer prevalences, converged to values close to 0 or 1, which werefar from their true values. This was possibly due to the large-scale spatial correlation and large spatialvariance, which resulted in a small effective sample size for estimating the overall cancer prevalence.Therefore, instead of updating q r, ’s along with the other parameters, we set them equal to the sampleprevalence Φ − ( (cid:80) Ni =1 (cid:80) j : r ij = r c ij / (cid:80) Ni =1 (cid:80) j : r ij = r , r = 0 , . Our simulation results indicate that his substitution does not degrade our ability to appropriately model the other model parameters. Tofacilitate computational efﬁciency, we conducted a partial parallelization strategy in updating the voxel-level spatial components, including w ij ’s, k ij ’s, and the corresponding spatial matrices involved in theirfull conditional distributions, given the conditional independence between images. Details of the MCMCalgorithm are provided in the Web Appendix. We conducted simulation studies to evaluate the performance of the models discussed in Section 3. Tosimulate a prostate image, a mask (shape of the prostate image, including the voxel-wise region indicators r ij ’s and 2-D coordinates s ij ’s) was sampled with replacement from the prostate images in our motivatingdata set. Voxel-wise mpMRI parameters and cancer status were then simulated according to the full model(2), with a spatial correlation structure following the Matérn correlation function (3). We then applied theproposed models, and compared their performance with the baseline model (1) to evaluate the improve-ment in classiﬁcation due to modeling the between-patient heterogeneity and spatial correlation. We alsocompared our proposed models with the full model (2), to evaluate their performance in approximatingthe classiﬁcation accuracy of full spatial modeling. Since the full model is computationally infeasibleon the original-size images, we generated reduced-size images by taking every third row and column ofthe full-size images, resulting in about ∼ voxels per image. The general model parameters,including the mean, within-patient and between-patient covariances of the mpMRI parameters, and theregion-speciﬁc cancer prevalences, were set equal to the estimates from the motivating data set. We set σ (spatial variance) to be 1, 5 or 20, φ (range parameter) to be 0.1, 0.25, 0.5 or 2, and ν (smoothnessparameter) to be 0.5 or 1.5. Figure 2 shows the spatial correlation pattern given different values of φ and ν . In each simulation, we trained the classiﬁers using 34 training images, and evaluated their performanceon 10 test images. Table 1 reports the performance of seven candidate models, including M-base: the baseline model; M-sse:M-base plus subject speciﬁc effects (SSE) accounting for between-patient heterogeneity in the mpMRIparameters; M-sse-nngp, M-sse-rr, M-sse-car and M-sse-full: M-sse plus spatial modeling using NNGP,reduced-rank approximation, CAR model, and full model, respectively; and M-smooth: the “Msmooth”model proposed by Jin et al. (2018), which applies a spatial Gaussian kernel smoother to the posteriorpredictive cancer probabilities of M-base to account for spatial correlation between voxels, but withoutaccounting for between-patient variability. We set m (the number of nearest neighbors in NNGP) and a (the number of knots in reduced-rank approximation) both equal to 10 to ensure a fair comparison betweenthe NNGP and reduced-rank models. The nearest neighbor sets for NNGP were constructed following theprocedure introduced in Section 3.3.1, and for the reduced-rank model, the knots were selected as a setof equally spaced grids in each image. All samplers were programmed using the R package “Rcpp”(Eddelbuettel et al., 2011).From Table 1, we observed that the area under the ROC curve (AUC) was substantially improvedby formally modeling the patient heterogeneity in the mpMRI parameters (M-sse versus M-base or M-smooth, which uses post-hoc spatial smoothing) in all scenarios. We observed substantial improvement σ (spatial variance), φ (spatial range parameter), and ν (spatial smoothness parameter). “M-base”: the baseline model(1) in Section 3.1; “M-sse”: the baseline model plus subject-speciﬁc effects (SSE) accountingfor between-patient heterogeneity in the mpMRI parameters; “M-sse-nngp”, “M-sse-rr”, “M-sse-car” and “M-sse-full”: models that account for patient heterogeneity, as well as spatial correlationusing NNGP, reduced-rank model, CAR model and full spatial model, respectively. “M-smooth”:the “Msmooth” model in Jin et al. (2018), which is M-base plus a post hoc spatial Gaussiankernel smoother applied to the posterior predictive cancer probabilities. Bayesian inference andclassiﬁcation were based on two chains of 25000 MCMC iterations after a burn-in stage of 5000iterations. AUCs are summarized as means with standard deviations in the parentheses, whichwere obtained from 100 simulations for each data scenario. The “Time” row lists the averagecomputation time in hours for each simulation. Parameters AUC σ φ ν M-base M-sse M-sse-nngp M-sse-rr M-sse-car M-sse-full M-smooth1 2.0 0.5 .755 (.023) .820 (.015) .847 (.011) .851 (.012) .835 (.010) .852 (.011) .562 (.058)1.5 .750 (.032) .823 (.020) .833 (.012) .851 (.014) .837 (.013) .851 (.013) .520 (.064)0.5 0.5 .766 (.022) .819 (.017) .859 (.011) .852 (.011) .834 (.010) .861 (.011) .611 (.046)1.5 .763 (.021) .819 (.019) .849 (.011) .872 (.011) .838 (.011) .876 (.010) .636 (.048)0.25 0.5 .770 (.015) .815 (.018) .849 (.010) .800 (.019) .830 (.010) .850 (.011) .598 (.036)1.5 .767 (.019) .820 (.011) .858 (.010) .822 (.016) .833 (.010) .865 (.010) .622 (.039)0.1 0.5 .769 (.016) .817 (.013) .834 (.007) .627 (.014) .827 (.008) .818 (.008) .553 (.032)1.5 .768 (.015) .815 (.017) .837 (.008) .600 (.015) .828 (.008) .831 (.009) .564 (.035)5 2.0 0.5 .760 (.013) .819 (.014) .888 (.011) .888 (.013) .842 (.009) .892 (.012) .675 (.052)1.5 .751 (.027) .809 (.027) .821 (.023) .883 (024) .832 (.020) .882 (.024) .639 (.067)0.5 0.5 .770 (.017) .807 (.018) .892 (.010) .885 (.012) .835 (.009) .896 (.010) .722 (.036)1.5 .767 (.025) .808 (.020) .906 (.012) .919 (.011) .841 (.010) .925 (.011) .778 (.039)0.25 0.5 .776 (.015) .812 (.011) .876 (.005) .848 (.010) .835 (.005) .880 (.005) .693 (.024)1.5 .777 (.013) .808 (.021) .901 (.008) .873 (.009) .835 (.008) .907 (.008) .755 (.028)0.1 0.5 .777 (.016) .803 (.017) .837 (.007) .688 (.017) .824 (.006) .839 (.008) .607 (.019)1.5 .778 (.011) .810 (.009) .851 (.007) .688 (.015) .827 (.007) .853 (.007) .644 (.018)20 2.0 0.5 .765 (.012) .811 (.010) .902 (.009) .906 (.010) .839 (.009) .910 (.009) .750 (.054)1.5 .755 (.027) .811 (.024) .843 (.022) .923 (025) .843 (.018) .924 (.026) .754 (.076)0.5 0.5 .771 (.015) .802 (.014) .903 (.010) .897 (.013) .836 (.008) .914 (.010) .770 (.032)1.5 .770 (.021) .805 (.020) .940 (.010) .945 (.011) .844 (.010) .952 (.010) .843 (.034)0.25 0.5 .782 (.009) .801 (.016) .882 (.009) .856 (.013) .833 (.006) .893 (.009) .733 (.025)1.5 .761 (.026) .830 (.014) .924 (.012) .918 (.015) .849 (.012) .925 (.014) .790 (.027)0.1 0.5 .782 (.013) .804 (.014) .837 (.009) .698 (.021) .824 (.008) .845 (.010) .629 (.015)1.5 .786 (.009) .808 (.011) .852 (.008) .710 (.018) .826 (.007) .860 (.008) .673 (.017)Time 0.31 0.35 2.39 3.75 0.77 0.34Note: M-sse-full is computationally intensive for simulation studies if conducted under a fully Bayesian framework.Instead, we ﬁxed the spatial parameters to their true values, and therefore the corresponding computation time is notlisted here for comparison. φ (range pa-rameter, larger φ indicates larger-scale correlation) and ν (smoothness parameter, smaller ν in-dicates larger differentiability). The difference in x-axes between two neighboring vertical linesis equal to 0.04, which is the average distance between two neighboring voxels in the motivatingdata set. The range of the x-axes is (0 , , which is the scale of the pairwise distances betweenvoxels in the motivating data set. in AUC due to modeling the spatial correlation between voxels (the four models with spatial modelingversus M-sse) given large σ (large spatial variance), large φ (large-scale correlation) and large ν (littledifferentiability). Relative performance of the spatial models varied by scenario (see columns 6-9), with σ having little effect but φ and ν having a noticeable effect on the performance. The M-sse-nngp modelhad an average AUC close to that of the full model (M-sse-full) when φ equaled . and . (small-scale,local correlation) regardless of ν , and also performed well when φ equaled . and with the exception ofthe scenario where ν = 1 . . The M-sse-rr model, on the other hand, had an average AUC close to that ofM-sse-full when φ equaled . or (larger-scale correlation), but performed poorly when φ equaled . or . (local correlation), having worse performance than M-base in some scenarios (e.g. σ = 1 , φ = 0 . and ν = 1 . ). Both M-sse-nngp and M-sse-rr had an average AUC closer to that of M-sse-full when ν was smaller (larger differentiability). In general, M-sse-nngp approximated the AUC of M-sse-full wellunder most data scenarios except when φ = 2 and ν = 1 . , where there was smooth and very strongcorrelation across the whole image (see Figure 2); in contrast, M-sse-nngp did not approximate the AUCof M-sse-full well unless the spatial correlation had a large scale (e.g. φ = 2 ). One possible explanationfor M-sse-rr having poor performance in some scenarios is that 10 knots is not adequate for the reduced-rank approximation to capture the underlying spatial structure unless there is strong, large-scale spatialcorrelation across the whole image. In fact, our simulation results indicated that a much larger numberof knots ( a ≥ ) was required for good performance, which would lead to a much longer computationtime and thus was not conducted as part of our simulation study. The M-sse-nngp model, on the otherhand, demonstrated its advantage in that a much smaller number of nearest neighbors, in other words,much less computation time, is required, to approximate the classiﬁcation accuracy of the full model. he M-sse-car model had improved AUC compared with M-sse, but overall did not approximate the fullmodel well. The performance of M-sse-car was not affected by σ , φ or ν since the CAR model assumesa ﬁxed spatial structure with no unknown spatial correlation parameters. In summary, the M-sse-nngpmodel demonstrated robust performance, offering higher classiﬁcation accuracy and better approximationto the full model than M-sse-rr and M-sse-car in the majority of the simulated data scenarios.We also investigated the performance of the classiﬁers under a more complex spatial pattern, wherethe spatial covariance follows the average of three different Matérn covariance functions with ( σ , φ, ν ) equal to (20 , . , . , (20 , , and (20 , , . , respectively. Following the same simulation procedure,the mean and standard deviation (in the parentheses) of AUC were 0.765 (0.021) for M-base, 0.803 (0.022)for M-sse, 0.892 (0.017) for M-sse-nngp, 0.880 (0.018) for M-sse-rr, 0.834 (0.013) for M-sse-car, 0.902(0.016) for M-sse-full, and 0.769 (0.038) for M-smooth. This relative performance between models issimilar to that when generating data from a Matérn correlation structure, thus illustrating the robustnessof our results to model mis-speciﬁcation. We next illustrate the performance of the candidate models on the motivating mpMRI data set described inSection 2. Our preliminary analyses suggest that the between-patient variation in mpMRI parameters hasa complex pattern that might be hard to estimate with a limited sample (34 patients/subjects). Therefore,we applied each proposed spatial modeling approach both with and without subject speciﬁc effects (SSE)accounting for the between-patient variability, which leads to M-nnngp and M-sse-nngp: spatial modelingusing NNGP without and with SSE; M-rr and M-sse-rr: spatial modeling using reduce-rank approximationwithout and with SSE; M-car and M-sse-car: spatial modeling using the CAR model without and withSSE. Summaries of the ROC curve were obtained using 5-fold Cross-Validation to account for over-ﬁttingdue to training and evaluating the model on the same data set (Friedman, Hastie and Tibshirani, 2001).Table 2 summarizes the performance of the candidate models using AUC, S80 (sensitivity corre-sponding to 80% speciﬁcity), posterior mean of the spatial parameters and computation time in hoursto complete the 5-fold Cross-Validation. Spatial modeling with NNGP and reduced-rank approximationdemonstrated improvement in the AUC and S80 both with and without SSE compared with M-base: theAUC increased from 0.763 to 0.808 (M-nngp), 0.785 (M-sse-nngp), 0.807 (M-rr) and 0.774 (M-sse-rr),and the S80 increased from 0.615 to 0.673 (M-nngp), 0.641 (M-sse-nngp), 0.680 (M-rr) and 0.637 (M-sse-rr). This suggests that spatial modeling improved the classiﬁcation accuracy by successfully modelingthe spatial correlation in the data. The M-car and M-sse-car models, however, had performance essen-tially equivalent to M-base both without SSE (AUC: 0.764, S80: 0.607) and with SSE (AUC: 0.765, S80:0.605). Adding SSE to account for between-patient heterogeneity on top of spatial modeling did not resultin additional improvement in AUC or S80, and, in fact, lowered the AUC and S80 of M-nngp and M-rr.One possible explanation is that the SSE might have a complex distribution, with features that cannot bereﬂected by the assumed multivariate normal distribution.The three proposed spatial modeling approaches lead to different posterior estimates for the spatialparameters. For the NNGP-based models, the posterior mean of the spatial range ( (cid:98) φ ), smoothness ( (cid:98) ν ) andvariance ( (cid:99) σ ) were 1.67, 0.88 and 657.9, respectively, without SSE, and 1.64, 0.89 and 632.2, respectively,with SSE. For the models using reduced-rank approximation, (cid:98) φ , (cid:98) ν and (cid:99) σ were 0.71, 15.89 and 765.6,respectively, without SSE, and 0.76, 16.03 and 720.5, respectively, with SSE. For the CAR-based models,there is no assumed φ or ν , and the posterior mean of σ was 4.7 without SSE and 4.5 with SSE. We a b l e : M od e l p e rf o r m a n ce on t h e m o ti v a ti ng m p M R I d a t a s e t . T h eca nd i d a t e m od e l s i n c l ud e : “ M - b a s e” :t h e b a s e li n e m od e l ( ) i n S ec ti on3 . , “ M - nnngp ”a nd “ M - ss e - nngp ” : s p a ti a l m od e li ngu s i ng NNG P w it hou t a nd w it h SS E , r e s p ec ti v e l y , “ M -rr ”a nd “ M - ss e -rr ” : s p a ti a l m od e li ngu s i ng r e du ce -r a nk a pp r ox i m a ti on w it hou t a nd w it h SS E , r e s p ec ti v e l y , a nd “ M - ca r ”a nd “ M - ss e - ca r ” : s p a ti a l m od e li ngu s i ng t h e C A R m od e l w it hou t a nd w it h SS E , r e s p ec ti v e l y . B a y e s i a n i n f e r e n cea nd c l a ss i ﬁ ca ti on w e r e b a s e don t w o c h a i n s o f M C M C it e r a ti on s , a f t e r a bu r n - i n s t a g e o f it e r a ti on s . T h e“ AU C ” r o w li s t s t h ea v e r a g e AU C ob t a i n e d fr o m -f o l d C r o ss - V a li d a ti on , t h e“ S ” r o w li s t s t h e s e n s iti v it y c o rr e s pond i ng t o80 % s p ec i ﬁ c it y , a nd t h e“ T i m e” r o w li s t s t h e c o m pu t a ti on ti m e i nhou r s t o c o m p l e t e -f o l d C r o ss - V a li d a ti on . M - b a s e NNG P R e du ce d -r a nk C A R M - nngp M - ss e - nngp M -rr M - ss e -rr M - ca r M - ss e - ca r AU C . . . . . . . S . . . . . . . φ . ( . , . ) . ( . , . ) . ( . , . ) . ( . , . ) ν . ( . , . ) . ( . , . ) . ( . , . ) . ( . , . ) σ . ( . , . ) . ( . , . ) . ( . , . ) . ( . , . ) . ( . , . ) . ( . , . ) T i m e . . . . . . . N o t e : R e s u lt s f o r t h e NNG P a nd r e du ce d -r a nk m od e l s w e r e ob t a i n e du s i ng10n ea r e s t n e i ghbo r s a nd10kno t s , r e s p ec ti v e l y . T h e NNG P u s i ng m o r e t h a n10n e i ghbo r s g a v e s i m il a r AU C a nd S . T h e r e du ce d -r a nk m od e l c ou l dpo t e n ti a ll yh a v e i m p r ov e d c l a ss i ﬁ ca ti on i f w e u s e m u c h m o r e kno t s ( a = , f o r e x a m p l e ) . H o w e v e r , t h e M C M C a l go r it h m b ec o m e s m u c h m o r ec o m pu t a ti on a ll y i n t e n s i v e , a nd i n f ac ti s un r ea li s ti c t o a pp l y t o t h e d a t a . an observe that, for each spatial modeling approach, modeling with or without SSE resulted in similarestimated spatial parameters. For the models without SSE, M-nngp and M-rr both gave large (cid:99) σ , indicatingstrong spatial variation among voxels within each image. Although (cid:98) φ differed (1.67 for M-nngp and . for M-rr), both indicated large-scale spatial dependency (see Figure 2). The M-nngp model gavea smaller (cid:98) ν of 0.88, possibly because the NNGP tends to capture local spatial dependencies that areless smooth. The M-rr model, on the other hand, gave a larger (cid:98) ν of 15.89, indicating a much smoothercorrelation structure, possibly because reduced-rank approximation tends to capture large-scale, smoothspatial dependencies. In contrast, the CAR model assumed a ﬁxed spatial dependence structure that wasdetermined by the distance between voxels, which was quite different from the estimated spatial structuresusing NNGP and reduced-rank approximation, and thus M-car gave a quite different estimated spatialvariance (cid:99) σ = 4 . . There was also substantial variability in the computational intensity. For the 5-foldCross-Validation with 75000 MCMC iterations after a burn-in stage of 5000 iterations, the NNGP-basedmodels were approximately 3.1 times slower than the CAR-based models, and the models using reduced-rank approximation were approximately 4.9 times slower than the CAR-based models. Computation timefor the various models are reported in Table 2.Figures 3 and 4 show heatmaps and classiﬁcation maps, respectively, for two representative prostateimages. In the heatmaps, warmer color indicates higher posterior predictive cancer probability, and inthe classiﬁcation maps, red, yellow, grey and blue indicate correctly identiﬁed cancer regions (true pos-itive), incorrectly identiﬁed cancer regions (false positive), correctly identiﬁed noncancer regions (truenegative), and incorrectly identiﬁed noncancer regions (false negative), respectively, using the probabilitycut-off corresponding to 80% speciﬁcity. Compared with M-base which generated diffuse areas of highposterior cancer probabilities scattered throughout the images, all spatial models identiﬁed larger and moreintegrated true positive areas. The models using NNGP and reduced-rank approximation produced pre-diction maps with clearly distinguished cancer and noncancer regions; this is likely because the estimated φ and ν for the two models indicate strong, large-scale spatial correlation in the voxel-wise cancer proba-bilities, which leads to strongly clustered regions of cancer or noncancer voxels. Prediction maps from theCAR-based models are much more smooth, and it is hard to distinguish the cancer and noncancer regionsby eye. Using a probability cut-off that corresponds to 80% speciﬁcity, however, we can see from Figure 4that the CAR-based models had classiﬁcation results similar to those of M-base. In addition, the modelsthat did not account for patient heterogeneity (no SSE) generated prediction maps closer to the groundtruth than models with SSE: compared with M-nngp and M-rr, M-sse-nngp and M-sse-rr identiﬁed moreareas of high posterior cancer probabilities which lead to more true positive areas, but also larger areasof false positives around true positives. We have also applied several widely used machine-learning tech-niques for developing MR classiﬁers for prostate cancer to our in vivo data, including the random forest(RF), K-nearest neighbors algorithm (KNN), Naive Bayes, and quadratic discriminant analysis (QDA).The corresponding average AUCs obtained from 5-fold Cross-Validation were 0.711, 0.711, 0.724 and0.729, respectively, which were lower than those presented in Table 2, indicating the superiority of ourproposed method. The advantage of our proposed modeling approach is to be expected: these machine-learning approaches offer tremendous ﬂexibility but likely lead to over-ﬁtting for the small sample sizestypically found in mpMRI studies (Lemaître et al., 2015). In contrast, our approach provides the structurethat is needed to develop an accurate classiﬁer with small sample sizes, while building in the ﬂexibilityneeded to target speciﬁc features of the data that could improve classiﬁcation. i gu r e : M a p s o f t w o r e p r e s e n t a ti v e p r o s t a t e s : g r ound t r u t h ( c o l u m n1 , r e d r e g i on s r e p r e s e n tt h e r e g i s t e r e d a r ea s o f ca n ce r) , h ea t m a p s o f po s t e r i o r p r e d i c ti v eca n ce r p r ob a b iliti e s ob t a i n e d fr o m M - b a s e ( b a s e li n e m od e l ) , M - nngp ( NNG P w it hou t SS E ) , M -rr (r e du ce d -r a nk a pp r ox i m a ti on w it hou t SS E ) , M - ca r( C A R w it hou t SS E ) , M - ss e - nngp ( NNG P w it h SS E ) , M - ss e -rr(r e du ce d -r a nk a pp r ox i m a ti on w it h SS E ) , M - ss e - ca r( C A R w it h SS E ) , r e s p ec ti v e l y ( c o l u m n s - , w h e r e w a r m e r c o l o r i nd i ca t e s h i gh e r po s t e r i o r p r e d i c ti v eca n ce r p r ob a b ilit y , a nd t h ec o l o r w a ss ca l e dby t h e r a ng e o f t h e po s t e r i o r p r e d i c ti v eca n ce r p r ob a b iliti e s w it h i n eac h i m a g e ) . T h e w h it e do t s i n t h e h ea t m a p s i nd i ca t e m i ss i ngv a l u e s f o r a tl ea s t on e m p M R I p a r a m e t e rf o r t h e vox e l s . i gu r e : C l a ss i ﬁ ca ti on m a p s o f t w o r e p r e s e n t a ti v e p r o s t a t e s t h a t ca t e go r i ze vox e l s i n t o t r u e po s iti v e (r e d ) , f a l s e po s iti v e ( y e ll o w ) , t r u e n e g a ti v e ( g r e y ) a nd f a l s e n e g a ti v e ( b l u e ) , u s i ng t h e p r ob a b ilit y c u t - o ff c o rr e s pond i ng t o80 % s p ec i ﬁ c it y . M od e l s i n c l ud e M - b a s e ( b a s e li n e m od e l ) , M - nngp ( NNG P w it hou t SS E ) , M -rr(r e du ce d -r a nk a pp r ox i m a ti on w it hou t SS E ) , M - ca r( C A R w it hou t SS E ) , M - ss e - nngp ( NNG P w it h SS E ) , M - ss e -rr(r e du ce d -r a nk a pp r ox i m a ti on w it h SS E ) , M - ss e - ca r( C A R w it h SS E ) , r e s p ec ti v e l y ( c o l u m n s - ) . Discussion

This paper proposes Bayesian hierarchical models for high-resolution mpMRI data, which aim to improvethe voxel-wise classiﬁcation of prostate cancer by modeling the spatial correlation in the voxel-wise cancerstatus as well as the between-patient variability in the mpMRI parameters. To our best knowledge, ourmethod is the ﬁrst to investigate the modeling of spatial correlation in high-dimensional MR classiﬁcation.This enhanced voxel-wise cancer classiﬁcation can be utilized as a ﬁrst step of cancer lesion identiﬁcationfor improving prostate cancer management. Initially, the method would greatly impact the clinic througha reader-independent method for targeting biopsies from mpMRI data, with the potential to increase theconﬁdence with which clinically signiﬁcant cancer is diagnosed thus impacting treatment decisions. Inthe future, such method may play a role in improved guidance for focal therapies.Conventional spatial modeling becomes computationally infeasible for voxel-level cancer classiﬁca-tion due to the large size of the multi-image mpMRI data. We consider three scalable approaches usingsparse approximation via NNGP, reduced-rank approximation through a knot-based approach and GMRFwith a CAR model. Simulation results indicate that classiﬁcation can be substantially improved by mod-eling both the patient heterogeneity in the mpMRI parameters and spatial correlation structure within animage. The proposed M-sse-nngp model performed better under more local, smaller-scale spatial de-pendency, but was robust to the true spatial structure and had the best overall performance. The proposedM-sse-rr model outperformed M-sse-nngp only when there was strong correlation across the whole image.Otherwise, it performed poorly and even worse than non-spatial models when there was only small-scalecorrelation. The proposed M-sse-car model had higher AUC than the non-spatial models, but performedworse than M-sse-nngp in all simulated data scenarios. Application to in vivo data showed that spatialmodeling using NNGP and reduced-rank approximation improved the average AUC and S80 of M-base,while spatial modeling using the CAR model did not. Modeling between-patient heterogeneity on top ofspatial modeling did not further improve the average AUC and S80, and, in some cases, actually decreasedperformance. This is possibly because the between-patient variation has a complex structure that is hardto estimate only using information from the 34 patients. In fact, our preliminary analysis suggests that thedistribution of the subject speciﬁc effects may be bimodal, and there might be outliers. We did investigatemore complex modeling strategies, which, however, did not show better performance, suggesting that ourcurrent data might be insufﬁcient for estimating a more complex random effects distribution. However, wedo note that between-patient variation is an important feature of mpMRI, and further investigation shouldbe conducted once more data are available.The major contribution of the paper is to investigate scalable spatial modeling strategies, which ad-dress the computational challenge brought by simultaneously modeling and classifying multiple high-dimensional images with spatial correlation structures. Among the proposed spatial modeling approaches,CAR demonstrated the highest computational efﬁciency but low classiﬁcation accuracy. This is likely dueto the ﬁxed spatial correlation structure used in the CAR model, which might be far from the truth. Al-though the NNGP-based and reduced-rank classiﬁers demonstrated similar classiﬁcation accuracy on invivo data, the NNGP-based classiﬁer was approximately 1.6 times faster given equal numbers of nearestneighbors and knots ( m = a = 10 ). The computation time increases linearly with the increase in m forNNGP, and increases slightly faster with the increase in a for reduced-rank approximation. Applicationresults showed that the NNGP-based classiﬁer using 10 neighbors gave AUC and S80 almost as high asusing more than 10. The reduced-rank classiﬁer, however, could potentially have improved accuracy ifusing many more knots (e.g. a = 100 ), but the MCMC algorithm will be too computationally intensivefor our data. As a general conclusion, the NNGP-based classiﬁer is recommended for application, con- idering its robust performance with respect to the spatial correlation pattern, high classiﬁcation accuracy,and the small number of nearest neighbors required that ensures scalability for high-dimensional spatialmodeling of the between-voxel correlation. The NNGP-based classiﬁers completed 5-fold Cross Valida-tion with 80000 MCMC iterations per fold in approximately 60 hours on in vivo data. The computationtime increases linearly as the number of voxels per image increases, and therefore could be a computa-tional burden in practice. However, this issue could be alleviated by applying alternative approximationalgorithms, such as HMC-NUTS (Hoffman and Gelman, 2014), for the posterior distribution.We currently conduct spatial modeling assuming a stationary correlation structure, while in realitythe correlation structure can be more complex. A future extension is to conduct non-stationary spatialmodeling. For example, the spatial correlation may change as a function of the location in the prostate,and properly modeling this non-stationary structure could improve performance. Another future directionis cancer lesion identiﬁcation using voxel-wise mpMRI data, which has so far received limited attention inthe literature (Litjens et al., 2014, Leng et al., 2018). We are currently discussing voxel-wise classiﬁcationof prostate cancer, which is the focus for the majority of the existing quantitative mpMRI classiﬁers.However, clinical practice requires that the results are eventually translated into detection of cancer lesionswhich could better guide the decisions for clinical treatment, and determining how best to identify lesionsusing voxel-wise data is worthy of investigation.Our proposed method was motivated by the speciﬁc features of prostate mpMRI. However, similarchallenges will arise in settings that consider other organs or imaging modalities, as well, which alsoinclude high-dimensional spatial modeling. In this case, our speciﬁc classiﬁers cannot be directly applied,but our general modeling approach and evaluation of the various approaches to scalable modeling arerelevant. Among the three proposed scalable spatial modeling approaches, we recommend the NNGP-based approach considering its robustness with respect to spatial correlation pattern, high classiﬁcationaccuracy and scalability. Note that the mpMRI parameters should be transformed to ensure the validity ofnormality assumption, or assumed to have different distributions, which, however, might complicate themodeling procedure. Finally, our method was developed for 2-D mpMRI due to the technical challengesrelated to co-registering the MRI parameters and histopathology in 3-D. However, its extension to 3-D canbe easily accomplished by replacing the 2-D coordinates with their 3-D analogs, where the computationalcomplexity only increases linearly with the increase in the number of voxels per image. Acknowledgements

This work was supported by NCIR01 CA155268, NCIP30 CA077598, NIBIBP41 EB015894, and theAssistant Secretary of Defense for Health affairs, through the Prostate Cancer Research Program underAward No. W81XWH-15-1-0478. Opinions, interpretations, conclusions, and recommendations are thoseof the author and are not necessarily endorsed by the Department of Defense.

References

Banerjee, S., Gelfand, A.E., Finley, A.O. and Sang, H. (2008). Gaussian predictive process models forlarge spatial data sets.

Journal of the Royal Statistical Society: Series B (Statistical Methodology) , 825-848. ressie, N. and Johannesson, G. (2008). Fixed rank kriging for very large spatial data sets. Journal of theRoyal Statistical Society: Series B (Statistical Methodology) , 209-226.Datta, A., Banerjee, S., Finley, A.O. and Gelfand, A.E. (2016). Hierarchical nearest-neighbor Gaus-sian process models for large geostatistical datasets.

Journal of the American Statistical Association , 800-812.Dickinson, L., Ahmed, H.U., Allen, C., Barentsz, J.O., Carey, B., Futterer, J.J., et al. (2011). Magneticresonance imaging for the detection, localisation, and characterization of prostate cancer: recommen-dations from a European consensus meeting.

European urology , 477-494.Eddelbuettel, D., François, R., Allaire, J., Ushey, K., Kou, Q., Russel, N., et al. (2011). Rcpp: Seamless Rand C++ integration.

Journal of Statistical Software , 1-18.Eidsvik, J., Shaby, B.A., Reich, B.J., Wheeler, M. and Niemi, J. (2014). Estimation and prediction inspatial models with block composite likelihoods.

Journal of Computational and Graphical Statistics , 295-315.Finley, A.O., Banerjee, S. and McRoberts, R.E. (2009). Hierarchical spatial models for predicting treespecies assemblages across large domains.

The annals of applied statistics , 1052.Friedman, J., Hastie, T. and Tibshirani, R. (2001). The elements of statistical learning.

New York, NY,USA:: Springer series in statistics .Garcia-Reyes, K., Passoni, N.M., Palmeri, M.L., Kauffman, C.R., Choudhury, K.R., Polascik, T.J., etal. (2015). Detection of prostate cancer with multiparametric MRI (mpMRI): effect of dedicated readereducation on accuracy and conﬁdence of index and anterior cancer diagnosis.

Abdominal imaging ,134-142.Higdon, D. (2002). Space and space-time modeling using process convolutions.

Quantitative methods forcurrent environmental issues , 37-56. Springer, London.Hoffman, M. D. and Gelman, A. (2014). The No-U-Turn sampler: adaptively setting path lengths inHamiltonian Monte Carlo.

Journal of Machine Learning Research , , 1593-1623.Jin, J., Zhang L., Leng E., Metzger G.J. and Koopmeiners, J.S. (2018). Detection of prostate cancer withmultiparametric MRI utilizing the anatomic structure of the prostate. Statistics in Medicine , 3214–3229.Kalavagunta, C., Zhou, X., Schmechel, S.C. and Metzger, G.J. (2015). Registration of in vivo prostateMRI and pseudo-whole mount histology using Local Afﬁne Transformations guided by Internal Struc-tures (LATIS). Journal of Magnetic Resonance Imaging , 1104-1114.Kaufman, C.G., Schervish, M.J. and Nychka, D.W. (2008). Covariance tapering for likelihood-based esti-mation in large spatial data sets.

Journal of the American Statistical Association , 1545-1555.Khalvati, F., Wong, A. and Haider, M.A. (2015). Automated prostate cancer detection via comprehensivemulti-parametric magnetic resonance imaging texture feature models.

BMC medical imaging , 27. urhanewicz, J., Vigneron, D., Carroll, P. and Coakley, F. (2008). Multiparametric magnetic resonanceimaging in prostate cancer: present and future. Current opinion in urology , 71.Lemaître, G., Martí, R., Freixenet, J., Vilanova, J.C., Walker, P.M. and Meriaudeau, F. (2015). Computer-aided detection and diagnosis for prostate cancer based on mono and multi-parametric MRI: a review.

Computers in biology and medicine , 8-31.Leng, E., Spilseth, B., Zhang, L., Jin, J., Koopmeiners, J.S. and Metzger, G.J. (2018). Development of ameasure for evaluating lesion-wise performance of CAD algorithms in the context of mpMRI detectionof prostate cancer. CMedical physics , 2076-2088.Litjens, G., Debats, O., Barentsz, J., Karssemeijer, N. and Huisman, H. (2014). Computer-aided detectionof prostate cancer in MRI.

IEEE transactions on medical imaging , 1083-1092.Metzger, G.J., Kalavagunta, C., Spilseth, B., Bolan, P.J., Li, X., Hutter, D., et al. (2016). Detection ofprostate cancer: quantitative multiparametric MR imaging models developed using registered correla-tive histopathology.

Radiology , 805-816.Niaf, E., Rouvière, O., Mège-Lechevallier, F., Bratan, F. and Lartizien, C. (2012). Computer-aided di-agnosis of prostate cancer in the peripheral zone using multiparametric MRI.

Physics in Medicine &Biology , 3833.Peng, Y., Jiang, Y., Antic, T., Giger, M.L., Eggener, S. and Oto, A. (2013). A study of T 2-weightedMR image texture features and diffusion-weighted MR image features for computer-aided diagnosisof prostate cancer. In

Medical Imaging 2013: Computer-Aided Diagnosis , 86701H. InternationalSociety for Optics and Photonics.Rosenkrantz, A.B., Kim, S., Lim, R.P., Hindman, N., Deng, F.M., Babb, J.S., et al. (2013). Prostate cancerlocalization using multiparametric MR imaging: comparison of Prostate Imaging Reporting and DataSystem (PI-RADS) and Likert scales.

Radiology , 482-492.Rue, H. and Held, L. (2005). Gaussian Markov random ﬁelds: theory and applications.

CRC press .Shah, V., Turkbey, B., Mani, H., Pang, Y., Pohida, T., Merino, M.J., et al. (2012). Decision support systemfor localizing prostate cancer based on multiparametric magnetic resonance imaging.

Medical physics , 4093-4103.Stein, M.L. (2012). Interpolation of spatial data: some theory for kriging.

Springer Science & BusinessMedia .Stein, M.L., Chi, Z. and Welty, L.J. (2004). Approximating likelihoods for large spatial data sets.

Journalof the Royal Statistical Society: Series B (Statistical Methodology) , 275-296.Tofts, P.S. (1997). Modeling tracer kinetics in dynamic Gd-DTPA MR imaging.

Journal of magneticresonance imaging , 91-101.Vecchia, A.V. (1988). Estimation and model identiﬁcation for continuous spatial processes.

Journal of theRoyal Statistical Society. Series B (Methodological) , 297-312. os, P.C., Barentsz, J.O., Karssemeijer, N. and Huisman, H.J. (2012). Automatic computer-aided detectionof prostate cancer based on multiparametric magnetic resonance image analysis. Physics in Medicine& Biology , 1527.Weinreb, J.C., Barentsz, J.O., Choyke, P.L., Cornud, F., Haider, M.A., Macura, K.J., et al. (2016). PI-RADS prostate imaging–reporting and data system: 2015, version 2.

European urology , 16-40.

Supporting Information