Knowledge-Based Three-Dimensional Dose Prediction for Tandem-And-Ovoid Brachytherapy

Abstract

Purpose: To develop a knowledge-based voxel-wise dose prediction system using a convolutional neural network for high-dose-rate brachytherapy cervical cancer treatments with a tandem-and-ovoid (T&O) applicator. Methods: A 3D U-Net was used to output dose predictions from the organ-at-risk (OAR), high-risk clinical target volume (HRCTV), and possible source location geometry. A sample of previous T&O treatments comprising 395 cases (273 training:61 validation:61 test), with HRCTV and OAR (bladder/rectum/sigmoid) contours, was used. Structures and dose were interpolated to a 1×1×2.5 mm³ dose grid with two input channels (source positions, voxel identification) and one output channel for dose. We evaluated the mean and standard deviation of the dose difference $\delta D_{xyz} = D_{actual}(x,y,z) - D_{predicted}(x,y,z)$ and isodose Dice similarity coefficients in all cohorts across the clinically relevant dose range (20-130% of prescription). We also examined discrete DVH metrics used for T&O plan quality assessment, HRCTV $D_{90\%}$ (minimum dose to the hottest 90% of the volume) and OAR $D_{2cc}$, with $\Delta D_x = D_{x,actual} - D_{x,predicted}$. The Pearson correlation coefficient, standard deviation, and mean quantified model performance on the clinical metrics. Results: For all voxels in the 20-130% dose range, the mean dose difference $\overline{\delta D}$ and its standard deviation ranged from -0.3%±2.0% to +1.0%±12.0% in training (-0.1%±4.0% to +4.0%±26.0% in test). Isodose Dice similarity coefficients for 20-130% dose ranged from 0.96 to 0.91 in training (0.94 to 0.87 in test). DVH metric prediction errors in the training (test) set were HRCTV $\Delta D_{90\%}$ = -0.19±0.55 Gy (-0.09±0.67 Gy), bladder $\Delta D_{2cc}$ = -0.06±0.54 Gy (-0.17±0.67 Gy), rectum $\Delta D_{2cc}$ = -0.03±0.36 Gy (-0.04±0.46 Gy), and sigmoid $\Delta D_{2cc}$ = -0.01±0.34 Gy (0.00±0.44 Gy). Conclusion: 3D knowledge-based dose predictions for T&O brachytherapy provide accurate voxel-level and DVH metric estimates.



Running title: 3D Dose Prediction for GYN Brachytherapy

Katherina G. Cortes, Aaron Simon, Karoline Kallis, Jyoti Mayadev, Sandra Meyers, and Kevin L. Moore*

Department of Radiation Medicine and Applied Sciences, University of California San Diego, La Jolla, CA 92093

*Corresponding author: [email protected]

Abstract

Purpose:

The purpose of this work was to develop a knowledge-based voxel-wise dose prediction system using a convolutional neural network (CNN) for high-dose-rate brachytherapy cervical cancer treatments with a tandem-and-ovoid applicator.

Methods:

A 3D U-Net CNN was used to output voxel-wise dose predictions based on organ-at-risk (OAR), high-risk clinical target volume (HRCTV), and possible source location geometry. The available dataset was a five-year sample of previously treated tandem-and-ovoid cases comprising 395 plans (273 training: 61 validation: 61 test), each with HRCTV and OAR (bladder, rectum, sigmoid) contours. Structures and dose were interpolated to a 1 mm × 1 mm × 2.5 mm dose grid with two input channels, one for dose-emitting structures (possible source positions) and the other for voxel identification (OAR, HRCTV, or unspecified), and a single output channel for dose. To assess 3D voxel prediction accuracy, we evaluated the mean and standard deviation of the dose difference $\delta D_{xyz,ij} = D_{actual,ij}(x,y,z) - D_{predicted,ij}(x,y,z)$, as well as isodose Dice similarity coefficients, in all cohorts across the clinically relevant dose range for cervical cancer brachytherapy (20-130% of prescription). We also examined discrete DVH metrics used for tandem-and-ovoid plan quality assessment: HRCTV $D_{90\%}$ (minimum dose to the hottest 90% of the volume) and bladder/rectum/sigmoid $D_{2cc}$, with $\Delta D_x = D_{x,actual} - D_{x,predicted}$. The Pearson correlation coefficient, standard deviation, and mean quantified model performance on the clinical metrics.

Results:

In the 20-130% dose range inside the contoured volumes, the mean voxel-wise dose difference $\overline{\delta D}$ and its standard deviation $\sigma$ for training (test) ranged as follows: all voxels [-0.3%±2.0% to +1.0%±12.0%] ([-0.1%±4.0% to +4.0%±26.0%]), HRCTV [-3.5%±5.1% to -1.7%±12.8%] ([-3.5%±4.8% to -2.6%±18.9%]), bladder [-0.7%±2.4% to +3.2%±12.0%] ([-2.5%±3.6% to +0.8%±12.7%]), rectum [-0.7%±2.4% to +15.5%±11.0%] ([-0.9%±3.2% to +27.8%±11.6%]), and sigmoid [-0.7%±2.3% to +10.7%±15.0%] ([-0.4%±3.0% to +18.4%±11.4%]). Isodose Dice similarity coefficients for 20-130% dose ranged from 0.96 to 0.91 for the training cohort and from 0.94 to 0.87 for the test cohort. DVH metric prediction errors $\overline{\Delta D} \pm \sigma_{\Delta D}$ in the training (test) set were HRCTV $D_{90\%}$: -0.19±0.55 Gy (-0.09±0.67 Gy), bladder $D_{2cc}$: -0.06±0.54 Gy (-0.17±0.67 Gy), rectum $D_{2cc}$: -0.03±0.36 Gy (-0.04±0.46 Gy), and sigmoid $D_{2cc}$: -0.01±0.34 Gy (0.00±0.44 Gy).

Conclusion:

3D knowledge-based dose predictions for brachytherapy provide accurate voxel-level and DVH metric estimates that could be used for treatment plan quality control and potentially for fully automated plan generation.

Introduction

Brachytherapy (BT) is considered the standard of care for definitive radiation treatment of locally advanced cervical cancer and often utilizes an intracavitary device such as a tandem-and-ovoid (T&O) applicator. The dose is customized to each patient's anatomy to ensure adequate dose to the high-risk clinical target volume (HRCTV) while sparing the organs-at-risk (OARs). High-quality brachytherapy is an essential component of cervical cancer treatment, linked to improved pelvic control and disease-free survival, but it can be challenging and highly labor-intensive. Recent advancements in 3D imaging and treatment planning for intact cervix BT have enabled new techniques for shaping the dose distribution, permitting better sparing of OARs and greater dose escalation to the target. These techniques, which include increased use of interstitial applicators in addition to standard intracavitary applicators, increase not only the technical difficulty of BT implantation but also the resource requirements of BT treatment planning. Intensity-modulated radiation therapy (IMRT) has traditionally presented a similarly labor-intensive and practitioner-dependent planning challenge, where inter-patient anatomical variations coupled with the subjectivity of human planners lead to widely documented plan quality variations that can put patients at risk for increased complications. Knowledge-based planning, a method that uses inferred correlations between patient anatomical variations and final plan dosimetry, has been shown to reduce these plan variations. How variable BT treatment plans are is unknown, and the use of knowledge-based dose prediction techniques for locally advanced cervical cancer BT has been limited.
Published knowledge-based models for gynecologic BT are based on OAR dose-volume histogram (DVH) estimates, whereas current development of external beam knowledge-based models focuses on three-dimensional dose distribution prediction. A three-dimensional estimate of the expected BT dose distribution could highlight inconsistencies in present planning practices and help standardize treatment planning for a standard-of-care therapy in cervical cancer. In this manuscript, we describe a method for three-dimensional knowledge-based dose distribution prediction for T&O BT using a convolutional neural network. The resulting system uses spatial information about the OARs, the HRCTV, and the applicator geometry to predict, voxel by voxel, the expected dose of a high-quality T&O treatment. We then quantify the model accuracy across the dose distribution as well as on clinically important BT quality metrics. In addition to serving as an objective measure of BT plan quality, three-dimensional dose distribution prediction in BT has applications ranging from real-time quality control to fully automated knowledge-based BT treatment planning.

Materials and Methods

Clinical datasets

A total of 395 treatment plans from 126 patients with cervical cancer over a 6-year period (2012-2018, UCSD IRB Project

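| Feature | Training (N = 273) | Validation (N = 61) | Test (N = 61) |
|---|---|---|---|
| Stage | | | |
| T1 | 90 (33%) | 18 (30%) | 16 (26%) |
| T2 | 135 (49%) | 28 (46%) | 28 (46%) |
| T3 | 47 (17%) | 15 (25%) | 14 (23%) |
| T4 | 1 (0%) | 0 (0%) | 3 (5%) |
| Physician | | | |
| A | 122 (45%) | 28 (46%) | 20 (33%) |
| B | 55 (20%) | 9 (15%) | 14 (23%) |
| C | 73 (27%) | 8 (13%) | 26 (43%) |
| D | 12 (4%) | 4 (7%) | 1 (2%) |
| E | 11 (4%) | 12 (20%) | 0 (0%) |
| Fraction | | | |
| 1 | 65 (24%) | 16 (26%) | 12 (20%) |
| 2 | 63 (23%) | 15 (25%) | 13 (21%) |
| 3 | 60 (22%) | 13 (21%) | 13 (21%) |
| 4 | 53 (19%) | 12 (20%) | 14 (23%) |
| 5 | 32 (12%) | 5 (8%) | 9 (15%) |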

Table 1.

Breakdown of the training, validation, and test datasets by stage, treating physician, and fraction number of each case.

Image Processing

All patients were planned in the BrachyVision treatment planning system (Varian Medical Systems, Inc., Palo Alto, CA). All planning CT scans had a 2.5 mm slice thickness, with transverse pixel sizes ranging from 0.893 to 1.367 mm. Dose grids were calculated at 1.0×1.0×2.5 mm resolution. The associated DICOM-RT objects (CT, RT-STRUCT, RT-DOSE) for each plan were exported from BrachyVision and imported into MATLAB (MathWorks Inc., Natick, MA) using CERR. In MATLAB, the contours in the RT-STRUCT object were converted to image masks. To standardize the image matrices, the CT intensity values and structure masks were interpolated to the dose grid resolution of 1.0×1.0×2.5 mm. The brachytherapy applicator is critical to both the dose distribution and its prediction because it dictates the possible source positions. The applicators used were from the Varian titanium Fletcher-Suit-Delclos tandem-and-ovoid set (1AL07522000 Titanium intrauterine tandem angle 15°, 1AL07522001 Titanium intrauterine tandem angle 30°, or 1AL07522002 Titanium intrauterine tandem angle 45°). Because the high-density applicator is clearly visible on CT, image features alone can be used, without the RT-PLAN object containing the digitized source positions; digitized source positions were deliberately not used so that the method relies only on image-based features and does not require applicator digitization. To identify the titanium applicator elements (one tandem and two ovoids), voxels exceeding a minimum threshold of 3000 HU were input to a Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm. The DBSCAN parameters were tuned to maximize applicator identification accuracy; the best performance was obtained with epsilon = 3.5 and a minimum number of points of 30. The tandem was identified as the one of these three clusters that included the HRCTV center of mass.
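A minimal sketch of this applicator-identification step follows, written in Python/NumPy rather than the MATLAB used in this work. Several details are assumptions not stated in the text: epsilon is interpreted in millimetres on physical voxel coordinates, the three largest clusters are taken to be the applicator, "included the HRCTV center of mass" is approximated as "the cluster passing closest to that point", and the output codes (1 = tandem, 2/3 = ovoids) are hypothetical. The brute-force neighbour search is for clarity only; a production version would use a spatial index or an established implementation such as scikit-learn's `DBSCAN`.

```python
import numpy as np
from collections import deque


def dbscan(points, eps, min_samples):
    """Minimal brute-force DBSCAN. Returns one label per point; -1 marks noise."""
    n = len(points)
    labels = np.full(n, -1, dtype=int)
    visited = np.zeros(n, dtype=bool)

    def region_query(i):
        # Indices of all points within eps of point i (including i itself).
        return np.flatnonzero(np.linalg.norm(points - points[i], axis=1) <= eps)

    cluster = 0
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        seeds = region_query(i)
        if len(seeds) < min_samples:
            continue  # not a core point; may still become a border point later
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # claim an unassigned (border or core) point
            if visited[j]:
                continue
            visited[j] = True
            neighbors = region_query(j)
            if len(neighbors) >= min_samples:
                queue.extend(neighbors)  # j is a core point: keep expanding
        cluster += 1
    return labels


def identify_applicator(ct_hu, spacing_mm, hrctv_mask,
                        hu_threshold=3000.0, eps=3.5, min_samples=30):
    """Label volume: 0 = background, 1 = tandem, 2/3 = ovoids (hypothetical codes)."""
    spacing = np.asarray(spacing_mm, dtype=float)
    idx = np.argwhere(ct_hu >= hu_threshold)       # candidate titanium voxels
    pts = idx * spacing                             # voxel indices -> mm
    labels = dbscan(pts, eps, min_samples)
    ids = [c for c in np.unique(labels) if c >= 0]
    # Assumption: the three largest clusters are the tandem and two ovoids.
    ids = sorted(ids, key=lambda c: np.count_nonzero(labels == c), reverse=True)[:3]
    if len(ids) != 3:
        raise ValueError(f"expected 3 applicator clusters, found {len(ids)}")
    com = np.argwhere(hrctv_mask).mean(axis=0) * spacing
    # Assumption: the tandem is the cluster passing closest to the HRCTV center of mass.
    dist = [np.linalg.norm(pts[labels == c] - com, axis=1).min() for c in ids]
    tandem = ids[int(np.argmin(dist))]
    ovoids = [c for c in ids if c != tandem]
    out = np.zeros(ct_hu.shape, dtype=np.int8)
    for code, c in zip((1, 2, 3), [tandem] + ovoids):
        out[tuple(idx[labels == c].T)] = code
    return out
```

Keeping only the three largest clusters is a simple guard against small high-HU fragments (for example, residual noise points) being mistaken for applicator elements.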
The remaining two applicator clusters were labeled as ovoids. To precondition the planning data for neural network learning, we constructed two input channels (applicator mask and anatomical mask) and one output channel (dose matrix). The voxel matrices were bounded by a 10×10×10 cm volume centered on the HRCTV center of mass to ensure a similar field of view, a uniform matrix size for each plan, and computationally efficient neural network training on a clinically relevant region of interest. Each plan was thus represented by a 3D matrix of 100×100×40 voxels with three channels. In the anatomical input channel, each structure was represented by a different integer (1 for HRCTV, 2 for bladder, 3 for rectum, 4 for sigmoid); the applicator channel likewise used a different integer for the tandem and for each ovoid. The output channel consisted of the continuous dose values, normalized to the prescription dose so that predictions are comparable between cases with different prescription doses.

Dose Transformation

Unlike in EBRT, brachytherapy dose distributions diverge rapidly near the applicator due to the inverse square law. Prediction errors in this high-gradient, high-dose region can contribute disproportionately to a typical neural network loss function (e.g., mean squared error). Notably, prediction accuracy in these extreme regions of the dose distribution does not contribute to standard clinical BT quality metrics, which are evaluated at the boundary of the HRCTV and in the OARs. To prevent voxels very near the applicator from dominating the loss function and degrading accuracy in regions with clinically relevant dose values, we applied a dose transformation to the dose channel before using it as the network training target. An arbitrary dose transformation can be represented by an invertible monotonic function $f$, whereby the network learns the scaled distribution $D' = f(D)$ and the predicted dose is recovered as $D = f^{-1}(D')$. The value of this approach is that the invertible function can be easily adjusted to tune network performance without changing the functional operation of the network. For this work, we found that compressing doses above 150% with a square root function provided unbiased predictions in the dose region below 150%:

$$D' = f(D) = \begin{cases} D, & D \le 150\% \\ 150\% + \sqrt{D - 150\%}, & D > 150\% \end{cases} \qquad (1)$$

which is trivially inverted by:

$$D = f^{-1}(D') = \begin{cases} D', & D' \le 150\% \\ 150\% + (D' - 150\%)^2, & D' > 150\% \end{cases} \qquad (2)$$

Neural Network Modeling

Because brachytherapy dose distributions have large dose gradients in all directions, and voxel dose can be strongly affected by applicator positions offset craniocaudally from the voxel (e.g., for a voxel just superior to an ovoid, the dose is dominated by a feature not contained in the same z-plane), we employed a convolutional neural network known as a 3D U-Net. The network was trained on the Comet system at the San Diego Supercomputer Center using 16 GB NVIDIA P100 GPU nodes, with 16 input filters, Leaky ReLU activation, a dropout layer with parameter 0.3, an initial kernel size of 3, a mean squared error (MSE) loss function, and the Adam optimizer. To prevent overfitting, the model was trained in 20-epoch blocks until the validation loss stopped decreasing, for a total of 100 epochs. The model at each epoch was saved, and the final model was chosen from these checkpoints based on when the validation loss stopped decreasing. Figure 1 depicts the entire image processing and neural network training pipeline.

Fig. 1.

Flowchart detailing image processing, neural network training and dose prediction workflow.
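The dose transformation of Eqs. (1)-(2) can be sketched as a pair of NumPy functions. This sketch assumes Eq. (1) has the piecewise form $D' = D$ for $D \le 150\%$ and $D' = 150\% + \sqrt{D - 150\%}$ above it, and that dose is expressed in percent of prescription, so that, for example, 250% maps to 150% + √100 = 160%; if dose were expressed as a fraction of prescription instead, the compression would behave differently. Under these assumptions, the network would be trained against `transform_dose(dose)` and its outputs mapped back with `inverse_transform_dose`.

```python
import numpy as np

THRESHOLD = 150.0  # percent of prescription; doses above this are compressed


def transform_dose(dose_pct):
    """Eq. (1): identity up to 150%, square-root compression above it."""
    d = np.asarray(dose_pct, dtype=float)
    excess = np.clip(d - THRESHOLD, 0.0, None)  # clip avoids sqrt of negatives
    return np.where(d <= THRESHOLD, d, THRESHOLD + np.sqrt(excess))


def inverse_transform_dose(dose_prime_pct):
    """Eq. (2): map a (predicted) transformed dose back to percent of prescription."""
    dp = np.asarray(dose_prime_pct, dtype=float)
    excess = np.clip(dp - THRESHOLD, 0.0, None)
    return np.where(dp <= THRESHOLD, dp, THRESHOLD + excess ** 2)
```

Because both branches meet at 150% and each is strictly increasing, the mapping is continuous and invertible, so the threshold or the compressing function can be changed without touching the network itself.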

Evaluating Model Performance

Model performance was evaluated in two ways: (I) quantifying the voxel-level accuracy of the 3D network predictions and (II) evaluating how well the network predicted clinically relevant quality metrics for gynecologic brachytherapy. To evaluate voxel-wise performance, the primary quantity was the dose difference:

$$\delta D_{xyz,ij} = D_{actual,ij}(x,y,z) - D_{predicted,ij}(x,y,z) \qquad (3)$$

where $(x, y, z)$ is the Cartesian position of a voxel in the $i$-th case and $j$-th structure, expressed as a percentage of the prescription dose for the case. Given a dose range $D_l$ to $D_u$, let $S_C(x,y,z,i,j; D_l, D_u)$ represent the set of all points $(x,y,z)$ in the $j$-th structure of each case $i$ in cohort $C$ such that $D_l < D_{actual,ij}(x,y,z) \le D_u$. Structures were defined as the HRCTV, bladder, rectum, sigmoid, and all voxels irrespective of structure type. To visualize prediction accuracy across the cohorts $C$ (training, validation, and test), we examined the voxel-wise dose difference in 10%-wide dose bins using the average dose difference:

$$\overline{\delta D}_{j,C}[D_l, D_u] = \frac{1}{|S_C|} \sum_{(x,y,z,i) \in S_C} \delta D_{xyz,ij} \qquad (4)$$

and its standard deviation:

$$\sigma_{j,C}[D_l, D_u] = \sqrt{\frac{1}{|S_C|} \sum_{(x,y,z,i) \in S_C} \left(\delta D_{xyz,ij} - \overline{\delta D}_{j,C}[D_l, D_u]\right)^2} \qquad (5)$$

Because these metrics include no spatial uncertainty terms, $\sigma$ is expected to generally increase with dose: dose gradients steepen at higher doses (closer to the applicator), so a small spatial offset produces a larger voxel dose difference. We therefore evaluated $\overline{\delta D}$ and $\sigma$ over the clinical range 20%-130% at 10% intervals to understand the dose estimation accuracy (bias and uncertainty) across individual structures, cohorts $C$, and different ranges of the dose distribution.

Another metric that has been used to quantitatively evaluate 3D dose estimates is the Dice similarity coefficient (DSC) computed on isodose volumes. The DSC is defined as $DSC = \frac{2|A \cap B|}{|A| + |B|}$, with values ranging from 0 to 1, where 1 is a perfect prediction of the particular isodose volume.
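The binned statistics of Eqs. (4)-(5) and the isodose DSC can be sketched as follows. Assumptions not stated in the text: dose arrays are in percent of prescription, a cohort is pooled by concatenating the flattened voxels of all its cases (optionally restricted to one structure with a boolean mask), the standard deviation uses the population (1/N) normalization, and two empty isodose volumes count as perfect agreement (DSC = 1). The function names are illustrative.

```python
import numpy as np


def binned_dose_difference(actual_pct, predicted_pct, mask=None,
                           lower=20.0, upper=130.0, width=10.0):
    """Bias and SD of delta D = actual - predicted per actual-dose bin (Eqs. 3-5).

    Returns a list of (D_l, D_u, mean, sd); empty bins give NaN.
    """
    actual = np.asarray(actual_pct, dtype=float)
    predicted = np.asarray(predicted_pct, dtype=float)
    if mask is None:
        mask = np.ones(actual.shape, dtype=bool)  # the "all voxels" structure
    a = actual[mask]
    delta = a - predicted[mask]                    # Eq. (3)
    edges = np.arange(lower, upper + width / 2, width)
    stats = []
    for d_l, d_u in zip(edges[:-1], edges[1:]):
        in_bin = (a > d_l) & (a <= d_u)            # D_l < D_actual <= D_u
        if in_bin.any():
            # Eq. (4) mean and Eq. (5) population SD over the bin's voxels
            stats.append((d_l, d_u, delta[in_bin].mean(), delta[in_bin].std()))
        else:
            stats.append((d_l, d_u, np.nan, np.nan))
    return stats


def isodose_dsc(actual_pct, predicted_pct, level_pct):
    """Dice coefficient between actual and predicted volumes receiving > level_pct."""
    a = np.asarray(actual_pct) > level_pct
    b = np.asarray(predicted_pct) > level_pct
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both isodose volumes empty: treated as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / denom


def cohort_mean_dsc(cases, level_pct):
    """Average DSC over a cohort given as (actual, predicted) array pairs."""
    return float(np.mean([isodose_dsc(a, p, level_pct) for a, p in cases]))
```

Binning on the actual dose (not the predicted dose) matches the definition of $S_C$, so a bin's bias describes how the model behaves for voxels that truly received that dose.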
For the purposes of this study, $A_i(D)$ is the set of all voxels in the actual dose distribution with dose $> D$ for the $i$-th case, $B_i(D)$ is the set of all voxels in the predicted dose distribution with dose $> D$ for the $i$-th case, and the average

$$\overline{DSC}_{C,D} = \frac{1}{N_C} \sum_{i \in C} DSC_{i,D}, \qquad DSC_{i,D} = \frac{2|A_i(D) \cap B_i(D)|}{|A_i(D)| + |B_i(D)|}$$

is computed for isodose levels between 25% and 125% at 10% intervals (to align with the $\overline{\delta D}_{j,C}$ and $\sigma_{j,C}$ dose bins) and averaged across the $N_C$ plans in the respective cohort $C$.

To add clinical context to the dose estimation performance, we also used common quality metrics employed in brachytherapy plan evaluation. For the bladder, rectum, and sigmoid OARs, we used $D_{2cc}$ (the minimum dose to the hottest 2 cc volume), and for the HRCTV we used $D_{90\%}$ (the minimum dose to the hottest 90% of the volume). Model performance on these parameters was quantified using the difference $\Delta D_x = D_{x,actual} - D_{x,predicted}$. Across each cohort $C$, model prediction of $D_x$ was evaluated with the Pearson correlation coefficient $R$, the mean difference $\overline{\Delta D}_x = \frac{1}{N_C}\sum_{i \in C} \Delta D_{x,i}$, and the standard deviation of the $\Delta D_{x,C}$ distribution ($\sigma_{\Delta D}$). All $D_x$ were reported in absolute dose (Gy), as this is more commonly used to evaluate OARs during treatment planning. Good performance on discrete quality metric prediction is indicated by $|\overline{\Delta D}_x| \ll \sigma_{\Delta D} \ll D_{Rx}$, where the nominal prescription dose $D_{Rx}$ is 6 Gy (used in 62% of available cases; prescription doses ranged from 5.5 to 8 Gy across the cohorts). The out-of-sample test set was used as the primary comparison to the training set because it was fully independent of the training process; model performance on the validation cases is shown for completeness.

Results

Model Training

In total, the model took 75 hours to train on two GPU nodes for 100 epochs. Training and validation loss were calculated at each epoch. Using the per-epoch loss data, we selected the model from epoch 80 because the training loss was still decreasing but the validation loss was not (Figure 1). Predictions on an individual dataset took ~1 minute using one GPU.
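The checkpoint selection described above can be expressed as a small rule over the per-epoch validation losses. This is a sketch of a common patience-based early-stopping heuristic, not necessarily the exact criterion applied in this work; `patience` and `min_delta` are hypothetical parameters.

```python
def select_epoch(val_losses, patience=10, min_delta=0.0):
    """Return the 1-based epoch with the best validation loss, scanning until the
    loss has failed to improve by more than min_delta for `patience` epochs."""
    best_epoch, best_loss, epochs_since_best = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_epoch, best_loss, epochs_since_best = epoch, loss, 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:
                break  # validation loss has plateaued or is rising
    return best_epoch
```

Because every epoch's weights were saved, a rule like this only needs the loss history to choose which checkpoint to reload.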

3D Dose Prediction

Figure 2 shows a randomly selected patient from the test set as an example of dose estimation performance. The input contour outlines are overlaid to show the dose distribution inside the OARs in the axial, sagittal, and coronal planes. To visualize voxel-by-voxel performance, $\delta D_{xyz}$ is shown in the same viewing planes, with the 100% prescription isodose of the actual plan shown as a dashed line. As expected, large dose differences are confined to the region near the applicator, while agreement outside the 100% isodose surface is good.

Fig. 2.

Axial (top row), sagittal (middle row), and coronal (bottom row) slices showing the actual dose, predicted dose, and dose difference for a randomly chosen plan from the test set. Solid lines indicate contour outlines (R = rectum, B = bladder, S = sigmoid, H = HRCTV, A = applicator); the dotted line represents the actual 100% isodose line.

In Figure 3, the average dose differences $\overline{\delta D}_{j,C}[D_l, D_u]$ and standard deviations $\sigma_{j,C}[D_l, D_u]$ are plotted for each structure in the training, validation, and test sets. The all-voxel error plots in Figure 3(a-c) show minimal bias, with the standard deviation increasing as dose increases. From 20% to 130%, the training set had a mean $\overline{\delta D}$ ranging from -0.3% to +1.0% and a standard deviation increasing from 2% at 20% to 12% at 130%. Across the same range, the test set bias was more variable than that of the training set, spanning -0.1% to +4.0%, indicating slightly lower voxel doses in the model-predicted distributions. The test set standard deviation also increased more quickly (4% to 26%). Dice coefficients for isodoses 25% through 125% at 10% intervals are tabulated in Figure 3(a-c) under the corresponding dose bins. The isodose Dice coefficient is highest at 25% (training: 0.96, test: 0.94) and decreases slightly with dose to 125% (training: 0.91, test: 0.87).

Within the HRCTV (Figure 3(d-f)), the training cohort error plots show a mean error $\overline{\delta D}_{j,C}$ spanning -1.7% to -3.5% across the 40%-130% range, indicating that predicted doses inside the HRCTV were higher than the actual doses on average (in contrast to the analysis across all voxels). Also in contrast to the all-voxel analysis, the prediction error was lower inside the HRCTV, with $\sigma$ ranging from 5% to 13% across the 40%-130% span. The test set had a slightly higher bias, ranging from -2.6% to -3.4%, with a $\sigma$ of 5% to 19% across the 50%-130% span (insufficient test set cases were available below the 50% prescription isodose). In all cohorts, proportionally fewer cases had HRCTV minimum doses at low isodoses contributing to the voxel error plots.

Inside the bladder (Figure 3(g-i)), the training cohort error plots show a mean bias $\overline{\delta D}$ spanning -0.7% to 3% across the 20%-130% range, indicating that network-predicted doses inside the bladder were lower than the actual doses (in contrast to the HRCTV). Compared to the analysis across all voxels, the prediction error was also lower inside the bladder, with $\sigma$ ranging from 2.4% to 13% across the 20%-130% span. The test set had a slightly higher bias, ranging from -2.5% to 0.8%, with a $\sigma$ of 3.5% to 13% across the 20%-130% span.

Fig. 3.

Average model bias $\overline{\delta D}_{j,C}$ and error $\sigma_{j,C}$ across all voxels (a-c), the HRCTV (d-f), and the OARs (g-o) for each cohort (columns: training, validation, test). The solid black line denotes $\overline{\delta D}_{j,C}$ and grey error bands denote the standard deviation $\sigma_{j,C}$. Dice similarity coefficients for the respective isodose range are shown below the all-voxel plots (a-c). The number of plans containing voxels in each dose region is shown below each structure-specific plot (d-o).

The bias and error for the rectum and sigmoid are shown for all cohorts in Figure 3(j-l) and Figure 3(m-o), respectively. Both the rectum and sigmoid displayed similarly low bias for the 20%-70% isodoses, ranging from -0.7% to 0.5% and from 2.4% to 5.5%, respectively. Beyond the 70% prescription isodose, the rectum and sigmoid $\overline{\delta D}$ both increased steadily with dose across all cohorts. Notably, the bias begins to increase precisely where the number of plans with voxels in these isodose ranges begins to decrease. Within the rectum, the training cohort error plots show a mean error $\overline{\delta D}$ increasing from 1.6% to 15.5% across the 70%-130% range, with $\sigma$ ranging from 6.4% to 11%. Across the same range, the test set had a higher bias, increasing from 2.9% to 28%, with a $\sigma$ of 8.2% to 11.5%. As for the bladder, a positive mean bias $\overline{\delta D}$ indicates that predicted doses inside the rectum and sigmoid were lower than the actual doses.

Clinical metric prediction exhibited the expected behavior $|\overline{\Delta D}_x| \ll \sigma_{\Delta D} \ll D_{Rx}$ for all metrics in the training and test cohorts (Figure 4). The HRCTV $D_{90\%}$ Pearson correlation coefficients, $R$, for the training and test sets were 0.83 and 0.71, respectively. The corresponding $\overline{\Delta D} \pm \sigma_{\Delta D}$ were -0.19±0.55 Gy for the training set and -0.09±0.67 Gy for the test set. The slight negative bias in both cohorts implies higher predicted $D_{90\%}$ coverage than observed in the actual plans. Model-predicted $D_{2cc}$ values for the OARs were substantially better than the HRCTV $D_{90\%}$ predictions across all OARs and all cohorts. OAR $D_{2cc}$ Pearson correlation coefficients for the bladder, rectum, and sigmoid were 0.91, 0.94, and 0.93 in the training set and 0.88, 0.90, and 0.88 in the test set. $\overline{\Delta D} \pm \sigma_{\Delta D}$ for the bladder were -0.06±0.54 Gy in the training set and -0.17±0.67 Gy in the test set; for the rectum, -0.03±0.36 Gy and -0.04±0.46 Gy; and for the sigmoid, -0.01±0.34 Gy and 0.00±0.44 Gy.

Fig. 4.

Actual vs. predicted clinical dose metrics for the training, test, and validation datasets. The plots include the Pearson correlation coefficient ($R$), the standard deviation ($\sigma_{\Delta D}$), and the mean ($\overline{\Delta D}$) of the clinical metric differences. Blue lines indicate perfect model predictions. Closed dots on the test set plots (third column) indicate the test plan displayed in Fig. 2. The red x indicates the plan examined in the Discussion.
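For reference, the metrics compared in Figure 4 can be approximated from voxel doses as sketched below. This is a simplified sketch under stated assumptions: every voxel has equal volume (1 × 1 × 2.5 mm = 0.0025 cc, the grid used here), $D_x$ is taken as the minimum dose among the hottest voxels making up the requested volume without the DVH interpolation a treatment planning system would apply, and $\sigma_{\Delta D}$ uses the sample (N - 1) normalization. The function names are illustrative.

```python
import math
import numpy as np


def dose_to_hottest_voxels(doses_gy, n_voxels):
    """Minimum dose among the n hottest voxels (a simple D_x estimate)."""
    d = np.sort(np.asarray(doses_gy, dtype=float).ravel())[::-1]  # hottest first
    n = min(max(int(n_voxels), 1), d.size)
    return float(d[n - 1])


def d90_percent(target_doses_gy):
    """HRCTV D90%: minimum dose to the hottest 90% of the target volume."""
    n_total = np.asarray(target_doses_gy).size
    # round() guards against floating-point noise before taking the ceiling
    return dose_to_hottest_voxels(target_doses_gy, math.ceil(round(0.9 * n_total, 6)))


def d2cc(oar_doses_gy, voxel_volume_cc=0.0025):
    """OAR D2cc: minimum dose to the hottest 2 cc (equal-volume voxels assumed)."""
    n_vox = math.ceil(round(2.0 / voxel_volume_cc, 6))  # 800 voxels at 0.0025 cc
    return dose_to_hottest_voxels(oar_doses_gy, n_vox)


def summarize_metric(actual_gy, predicted_gy):
    """Pearson R, mean of Delta D_x = actual - predicted, and its sample SD."""
    actual = np.asarray(actual_gy, dtype=float)
    predicted = np.asarray(predicted_gy, dtype=float)
    diff = actual - predicted
    r = float(np.corrcoef(actual, predicted)[0, 1])
    return r, float(diff.mean()), float(diff.std(ddof=1))
```

Applied per cohort, `summarize_metric` returns the three quantities used in the criterion $|\overline{\Delta D}_x| \ll \sigma_{\Delta D} \ll D_{Rx}$.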

Discussion

In an effort to standardize cervical cancer brachytherapy, we demonstrate the first knowledge-based 3D dose estimation for cervical cancer brachytherapy, using spatial information on the OARs, the HRCTV, and the applicator to create an accurate voxel-level prediction model. While this work focused on the widely used T&O applicator, the method is likely extendable to other applicators and interstitial cases. The prediction system operates without the need for source position digitization; if desired, digitized source positions could easily be incorporated by replacing the applicator channel with them and retraining the network. While 3D dose prediction in external beam radiotherapy has been demonstrated across multiple disease sites, the comparatively steep dose gradients and highly inhomogeneous dose distributions in brachytherapy make comparisons between these modalities difficult. The closest external beam analog would likely be stereotactic radiosurgery (SRS), for which Shiraishi et al reported a voxel prediction error of ~10%, similar to the performance of this model shown in Figure 3. Because direct voxel error includes no spatial uncertainty terms, it is the most stringent evaluation criterion. Prior 3D dose prediction studies in prostate radiotherapy have used metrics that account for spatial uncertainty, such as gamma analysis or isodose Dice similarity coefficients. McIntosh and Purdie reported an average Dice coefficient of 0.88 (range 0.82 to 0.93), while Nguyen et al reported average DSCs of 0.91 and 0.95 for their test and cross-validation sets, respectively. Our model performs similarly, with a range of 0.84 to 0.96 across all datasets, with the caveat that external beam and brachytherapy dose distributions are not directly comparable because of fundamental differences in their spatial variation and isodose span.
The visualization of prediction errors across structures and isodose domains in Figure 3 offers insight into the performance of the system. Noticeably, the rectum and sigmoid pick up substantial bias at isodoses near the prescription dose, and this bias increases steadily as the dose rises past 100%. In the most extreme case in this group, one test set plan had a mean error of 32% for rectum voxels receiving 120-130% of the prescription dose. The large voxel bias at high doses for this plan can be explained by the proximity of the rectum to the ovoids and the very sharp dose gradients in that region. Notably, the prediction errors for this plan's HRCTV $D_{90\%}$ and bladder and sigmoid $D_{2cc}$ were within their respective $\sigma_{\Delta D}$, and its rectum $D_{2cc}$ error of 0.50 Gy was only just outside the test cohort $\sigma_{\Delta D}$ of 0.46 Gy, as shown in Figure 4(c,f,i,l).

From a clinical perspective, the $D_{90\%}$ and $D_{2cc}$ predictions are less biased, indicating that voxel-wise bias at high doses is less problematic when decisions are based on these clinical metrics. In addition, the DSCs for this plan ranged from 0.88 to 0.96, which is better than the test cohort's overall average. This implies that, despite the error in the rectum and sigmoid, isodose similarity for this plan was better than for the average test plan. The non-zero bias at higher, sparsely occupied isodoses in the OAR structures, even in the training cohort, can be understood from the fact that all training cases contribute to the dose prediction model at all isodoses, but only the hottest plans have OAR voxels at near-prescription isodoses. There are two explanations for the diminishing occupation at higher isodose lines: one purely geometric (some cases have OAR voxels close to the applicator, but all cases have distant voxels) and one dosimetric (OAR voxel occupation at higher isodose lines is biased towards cases where planning did not fully pull the isodoses away from these structures, potentially a sign of suboptimal planning). Both could be at work here. The dosimetric explanation, however, would produce an increasing bias at high isodoses: all cases contribute to the dose prediction, but the remaining sample occupying the higher isodoses is biased towards cases in which planning did not carve the dose away sufficiently. Figure 3(g-l) is suggestive of this interpretation, but a definitive demonstration would require a full replanning study across a sufficiently large sample, which is the focus of future work.
While knowledge-based dose estimation is relatively new to brachytherapy and cervical cancer, our clinical metric predictions can be compared to those of Yusufaly et al, where a knowledge-based OAR DVH estimation model was applied to the same patient cohort as this work (though with different training, validation, and test splits). OAR $D_{2cc}$ predictions from the 3D neural network and the 1D DVH estimates show similar accuracy: bladder $\sigma_{\Delta D}$ was 0.67 Gy vs. 0.61 Gy, rectum $\sigma_{\Delta D}$ … There are some limitations to our current approach: our data are restricted to one institution and include only T&O cases. The current approach also requires the applicator to be inserted before predictions can be made; however, it operates without the need for source position digitization. Despite these limitations, the metrics analyzed demonstrate that the model could serve as a useful quality control tool for T&O brachytherapy and, potentially, as the basis for automated knowledge-based planning.

Conclusion

This study demonstrates the ability to predict accurate 3D dose distributions for T&O brachytherapy using a deep learning framework.

Acknowledgements

We thank Kelly Kisling, Xenia Ray, and Aaron Babier for helpful discussions. This work was supported by Pedal the Cause and the Agency for Healthcare Research and Quality (AHRQ R01HS025440).

Conflict of Interest Statement

KLM reports honoraria and personal fees from Varian Medical Systems, outside the submitted work. KLM holds two patents (Developing Predictive Dose-Volume Relationships for a Radiotherapy Treatment and Knowledge-based Prediction of Three-Dimensional Dose Distributions), licensed to Varian Medical Systems. AS reports personal fees from Courage Health, Inc., outside the submitted work. JM reports personal fees from AstraZeneca, grants from NRG Oncology, grants from GOG Foundation, and personal fees from Varian Medical Systems, outside the submitted work.

References

1. Mayadev J, Viswanathan A, Liu Y, et al. American Brachytherapy Task Group Report: A pooled analysis of clinical outcomes for high-dose-rate brachytherapy for cervical cancer. Brachytherapy. 2017;16(1):22-43. doi:10.1016/j.brachy.2016.03.008
2. Hanks GE, Herring DF, Kramer S. Patterns of Care Outcome Studies Results of the National Practice in Cancer of the Cervix.
3. Eifel PJ, Moughan J, Erickson B, Iarocci T, Grant D, Owen J. Patterns of radiotherapy practice for patients with carcinoma of the uterine cervix: A patterns of care study. Int J Radiat Oncol Biol Phys. 2004;60(4):1144-1153. doi:10.1016/j.ijrobp.2004.04.063
4. Mayadev J, Dieterich S, Harse R, et al. A failure modes and effects analysis study for gynecologic high-dose-rate brachytherapy. Brachytherapy. 2015;14(6):866-875. doi:10.1016/j.brachy.2015.06.007
5. Nelms BE, Tomé WA, Robinson G, Wheeler J. Variations in the Contouring of Organs at Risk: Test Case From a Patient With Oropharyngeal Cancer. Int J Radiat Oncol Biol Phys. 2012;82(1):368-378. doi:10.1016/j.ijrobp.2010.10.019
6. Pötter R, Tanderup K, Kirisits C, et al. The EMBRACE II study: The outcome and prospect of two decades of evolution within the GEC-ESTRO GYN working group and the EMBRACE studies. Clin Transl Radiat Oncol. 2018;9:48-60. doi:10.1016/j.ctro.2018.01.001
7. Tanderup K, Ménard C, Polgar C, Lindegaard JC, Kirisits C, Pötter R. Advancements in brachytherapy. Adv Drug Deliv Rev. 2017;109:15-25. doi:10.1016/j.addr.2016.09.002
8. Lombe D, Crook J, Bachand F, et al. The addition of interstitial needles to intracavitary applicators in the treatment of locally advanced cervical cancer: Why is this important and how to implement in low- and middle-income countries? Brachytherapy. 2020;19(3):316-322. doi:10.1016/j.brachy.2020.02.004
9. Walter F, Maihöfer C, Schüttrumpf L, et al. Combined intracavitary and interstitial brachytherapy of cervical cancer using the novel hybrid applicator Venezia: Clinical feasibility and initial results. Brachytherapy. 2018;17(5):775-781. doi:10.1016/j.brachy.2018.05.009
10. Nomden CN, de Leeuw AAC, Moerland MA, Roesink JM, Tersteeg RJHA, Jürgenliemk-Schulz IM. Clinical Use of the Utrecht Applicator for Combined Intracavitary/Interstitial Brachytherapy Treatment in Locally Advanced Cervical Cancer. Int J Radiat Oncol Biol Phys. 2012;82(4):1424-1430. doi:10.1016/j.ijrobp.2011.04.044
11. Moore KL, Schmidt R, Moiseenko V, et al. Quantifying Unnecessary Normal Tissue Complication Risks due to Suboptimal Planning: A Secondary Study of RTOG 0126. Int J Radiat Oncol Biol Phys. 2015;92(2):228-235. doi:10.1016/j.ijrobp.2015.01.046
12. Wu B, Ricchetti F, Sanguineti G, et al. Patient geometry-driven information retrieval for IMRT treatment plan quality control. Med Phys. 2009;36(12):5497-5505. doi:10.1118/1.3253464
13. Zhu X, Ge Y, Li T, Thongphiew D, Yin F-F, Wu QJ. A planning quality evaluation tool for prostate adaptive IMRT based on machine learning. Med Phys. 2011;38(2):719-726. doi:10.1118/1.3539749
14. Appenzoller LM, Michalski JM, Thorstad WL, Mutic S, Moore KL. Predicting dose-volume histograms for organs-at-risk in IMRT planning. Med Phys. 2012;39(12):7446-7461. doi:10.1118/1.4761864
15. Moore KL, Brame RS, Low DA, Mutic S. Experience-Based Quality Control of Clinical Intensity-Modulated Radiotherapy Planning.

Int J Radiat Oncol . 2011;81(2):545-551. doi:10.1016/j.ijrobp.2010.11.030 16. Cornell M, Kaderka R, Hild SJ, et al. Noninferiority Study of Automated Knowledge-Based Planning Versus Human-Driven Optimization Across Multiple Disease Sites.

Int J Radiat Oncol Biol Phys . 2020;106(2):430-439. doi:10.1016/j.ijrobp.2019.10.036 17. Yusufaly TI, Kallis K, Simon A, et al. A knowledge-based organ dose prediction tool for brachytherapy treatment planning of patients with cervical cancer.

Brachytherapy . 2020;19(5):624-634. doi:10.1016/j.brachy.2020.04.008 18. Shiraishi S, Moore KL. Knowledge-based prediction of three-dimensional dose distributions for external beam radiotherapy.

Med Phys . 2016;43(1):378-387. doi:10.1118/1.4938583 19. McIntosh C, Purdie TG. Voxel-based dose prediction with multi-patient atlas selection for automated radiotherapy treatment planning.

Phys Med Biol . 2017;62(2):415-431. doi:10.1088/1361-6560/62/2/415 20. McIntosh C, Purdie TG. Contextual Atlas Regression Forests: Multiple-Atlas-Based Automated Dose Prediction in Radiation Therapy.

IEEE Trans Med Imaging . 2016;35(4). doi:10.1109/TMI.2015.2505188 21. Babier A, Mahmood R, McNiven AL, Diamant A, Chan TCY. Knowledge-based automated planning with three-dimensional generative adversarial networks.

Med Phys . 2020;47(2):297-306. doi:10.1002/mp.13896 22. Apte AP.

Walk-Through of CERR Capabilities CERR: Computational Environment for Radiological Research . http://cerr.info/wiki Comput Sci Eng . 2014;16(5). doi:10.1109/MCSE.2014.80 27. Nguyen D, Long T, Jia X, et al. A feasibility study for predicting optimal radiation therapy dose distributions of prostate cancer patients from patient anatomy using deep learning.

Sci Rep . 2019;9(1):1-10. doi:10.1038/s41598-018-37741-x 28. Welch ML, Mcintosh C, Purdie TG, et al. Automatic classification of dental artifact status for efficient image veracity checks: effects of image resolution and convolutional neural network depth.