A machine learning approach to using Quality-of-Life patient scores in guiding prostate radiation therapy dosing
Zhijian Yang, Daniel Olszewski, Chujun He, Giulia Pintea, Jun Lian, Tom Chou, Ronald Chen, Blerta Shtylla
AA machine learning approach to using Quality-of-Life patient scores inguiding prostate radiation therapy dosing
Zhijian Yang a,b , Daniel Olszewski c,d , Chujun He e , Giulia Pintea f,g , Jun Lian h , Tom Chou i ,Ronald Chen j , Blerta Shtylla k a New York University, New York NY 10012 b Current Address: Applied Mathematics and Computational Science Program, University of Pennsylvania,Philadelphia, PA 19104 c Carroll College, Helena MT 59625 d Current Address: Computer, Information Science and Engineering Department, University of Florida, GainesvilleFL 32611 e Smith College, Northampton MA 01063 f Simmons University, Boston MA g Current Address: Department of Psychology, Tufts University, Boston MA 02111 h Department of Radiation Oncology, The University of North Carolina, Chapel Hill, NC 27599 i Depts. of Computational Medicine and Mathematics, UCLA, Los Angeles, CA 90095-1766 j Department of Radiation Oncology, University of Kansas Medical Center, Kansas City, KS 66160 k Department of Mathematics, Pomona College, Claremont CA 91711
Abstract
Thanks to advancements in diagnosis and treatment, prostate cancer patients have high long-termsurvival rates. Currently, an important goal is to preserve quality-of-life during and after treat-ment. The relationship between the radiation a patient receives and the subsequent side effects heexperiences is complex and difficult to model or predict. Here, we use machine learning algorithmsand statistical models to explore the connection between radiation treatment and post-treatmentgastro-urinary function. Since only a limited number of patient datasets are currently available,we used image flipping and curvature-based interpolation methods to generate more data in orderto leverage transfer learning. Using interpolated and augmented data, we trained a convolutionalautoencoder network to obtain near-optimal starting points for the weights. A convolutional neu-ral network then analyzed the relationship between patient-reported quality-of-life and radiation.We also used analysis of variance and logistic regression to explore organ sensitivity to radiationand develop dosage thresholds for each organ region. Our findings show no connection betweenthe bladder and quality-of-life scores. However, we found a connection between radiation appliedto posterior and anterior rectal regions to changes in quality-of-life. Finally, we estimated radia-tion therapy dosage thresholds for each organ. Our analysis connects machine learning methodswith organ sensitivity, thus providing a framework for informing cancer patient care using patientreported quality-of-life metrics.
Keywords:
Machine Learning, Convolutional Neural Network, Radiation Therapy, OrganSensitivity, Prostate Cancer
1. Introduction
Approximately 175 thousand new cases of prostate cancer were reported in 2019 in the UnitedStates [18]. Depending on age and cancer stage, first-line treatments include prostatectomy, radia-tion treatment, and androgen ablation. Each of these treatments carries different side effects. For
Preprint submitted to Elsevier May 25, 2020 a r X i v : . [ q - b i o . Q M ] M a y ome patients, a prostatectomy is followed by radiation treatment to minimize the possibility ofrecurrence. Radiation planning for each patient begins with a CT scan, which is followed by thedemarcation of the prostate (radiation target) and the surrounding organs (bladder and rectum)by a physician. Each plan is customized to a patient as there is some flexibility in the spatialdosing of radiation with the primary consideration being delivery of sufficient dosage of radiationto the target organ without overexposing and damaging surrounding organs and structures. Ra-diation treatment (RT) plans are developed using Dose Volume Histograms (DVH). DVH discardall organ-specific spatial information and they are usually based on a single planning CT scan thatdoes not account for anatomical variations over the course of several weeks of therapy [10]. Variousmetrics have been developed in order to translate the information from a DVH into a computedprobability of uncomplicated tumor control using normal tissue complication probability models(NTCP) [10]. These efforts are necessary as the relationship between exposure to radiation of thesurrounding organs/structures and the severity and probability of toxicity (urinary and bowel) isstill not fully understood. From a physiological perspective, it is not clear how radiation dosageaffects tissues, organs, as well as their control and function.In order to develop a radiation plan that minimizes patient side effects, one needs to quan-tify these side effects post-radiation. Traditionally, physician-assessed scoring systems have beenwidely used for measuring patient side effects following cancer treatment [3]. However, more recentevidence indicates that clinicians can downgrade the frequency and severity of patients’ treatment-related symptoms [3, 14]. Patient-reported health-related quality-of-life (QOL) surveys are becom-ing an important tool in measuring outcomes after cancer treatment [17, 20]. For example, a studyby K. Diao et al. [4] explored urinary and bowel symptom development during treatment usingpatient-reported QOL scores (from 1, indicating no symptoms, to 5, indicating high frequency ofsymptoms). An IRB waiver was received at the University of North Carolina for this retrospectivestudy using anonymized data. The results showed that average scores progressively increased frombaseline throughout treatment, but all symptoms resolved to baseline levels by follow-up. In thecontext of RT, NTCP models can be correlated to patient-reported QOL data, as was done in [11].However their analysis focused only on urinary symptoms during post-prostatectomy radiotherapyand NTCP models rely on already reduced DVH information. Given the rich organ-specific in-formation obtained during treatment planning, it could be desirable to more directly connect 3Dpatient CT scans and dosing with QOL scores.Machine learning methods have become state of the art in many applications with impressiveresults; deep convolutional networks have many times outperformed traditional methods for diag-nosis [15] or visual recognition tools [8]. In biomedical imaging, deep convolutional neural network(CNN) algorithms have been applied to a wide array of problems. Of particular interest in ourcontext are medical image segmentation machine learning algorithms, such as the U-net [16] archi-tecture that focuses on semantic segmentation of biomedical images. The U-net architecture hasappealing features that we will employ in our approach such as it uses a large number of max-pooling operations to allow for the identification of global, non-local features and up-convolutionto return images to their original size. In the work of Nguyen et al. [13], a modified U-net ar-chitecture was shown to accurately predict voxel-level dose distributions for intensity-modulatedradiation therapy for prostate cancer patients. These prior studies indicate tremendous promise ofguidance of radiation treatment planning with artificial intelligence-based algorithms. Nevertheless,these algorithms can be of significant use in treatment planning if they can also incorporate QOLpredictions that can provide immediate guidance for the dosimetrist during clinical plan optimiza-tion. To our knowledge, there are no prior approaches that integrate QOL in machine learningalgorithms in the context of RT for prostate cancer.In the department of Radiation Oncology at the University of North Carolina, QOL scores were2ollected using a validated questionnaire [19] administered during weekly treatment visits as partof the routine clinical work-flow for prostate cancer patients. While QOL data have been studied post prostate cancer radiation treatment, data collected during the course of treatment can conveyimportant information about symptom development, which can be a fertile ground for the use ofquantitative modeling to guide optimal RT dosing. Accordingly, in this paper we analyze data froma 14-question quality-of-life prostate cancer patient survey that was collected over the span of fiveyears (2010-2015). The data we examined tracked patient urinary and bowel side effects before andalong the course of their treatment (about seven weeks). Associated with each patient’s QOL, wealso examined associated anatomical CT scans and radiation dosing patterns for approximately 50patients.In this paper, we propose a CNN algorithm to explore the connection between the spatialdistribution of the RT dose and the QOL outcomes reported from patients in our data set. Asignificant problem with CNN algorithms in our context is the need for a sufficiently large dataset; to resolve this issue, we augmented our patient data sets using interpolation algorithms thatgenerated synthetic patients by combining existing patient data. In addition, we used transferlearning in order to improve the performance of our CNN algorithm and implemented steps toavoid overfitting the problem. A key goal for our study was to generate insight into the mostradiation-sensitive tissue regions.As a comparative alternative to the CNN approach, we also used analysis of variance and logisticregression to explore organ sensitivity to radiation and develop dosage thresholds for each organregion. We identified regions of the rectum that were highly correlated with changes in individualpatient symptoms. Finally, we estimated radiation therapy dosage thresholds to determine howhigh radiation therapy dosage needed to be in order to trigger collateral symptoms. Combiningresults from machine learning and direct analyses of organ sensitivity provides a powerful frameworkto inform patient care in the quality-of-life context.This paper is organized as follows. In the Methods section, we formulate convolutional neuralnetwork algorithms and statistical models. In the Results section, we demonstrate that the CNNalgorithm we developed can identify correlations between bowel-related symptoms and radiation.Furthermore, we support these findings through statistical analyses that explore organ sensitivityto radiation dosage. Finally, in the Discussion and Conclusions section, we compare our resultswith those of previous studies.
2. Methods
Our patient-reported data were extracted from a patient quality-of-life survey containing an-swers to 14 questions, seven questions pertaining to urinary symptoms and seven concerned withbowel symptoms. The specific survey questions and possible responses are shown in Appendix A.Patients took the survey before undergoing radiation therapy, once a week during the seven-week-long treatment, and after completing therapy. Answers by patient j to question i at time point t = { , , , , , , , } , a ij ( t ), were scored on a discrete scale ranging from 1 to 5 (a score of 1 indi-cating the least severity in symptoms, and a score of 5 indicating very high severity in symptoms).We used this quality-of-life survey and de-identified CT scans and treatment plans for a total of 52patients (note that 57 patients were provided, however, five patients had incomplete data and werediscarded in our analysis).The answers a ij ( t = 0) to the first survey taken before treatment were used as the baseline ofsymptoms before radiation therapy. Subsequent answers a ij ( t ≥
1) provide information on patient j ’s symptoms associated with question i . In order to reduce the dimensionality of the data set3ollected over several weeks, we developed a single score for urinary-related symptoms and bowel-related symptoms as follows.For each patient, we divided the questions and associated answers into those concerning urinaryor bowel function and then identified the worst (maximum) score a ∗ ij ≡ max t a ij ( t ) a patient reportedthroughout the multi-week treatment for each question. A single total difference score for eachpatient j , ∆ j , was computed as the difference a ∗ ij − a ij (0), summed over the urinary- or bowel-subset of answers i : ∆ u j = (cid:88) i =1 (cid:2) a ∗ ij − a ij (0) (cid:3) , ∆ b j = (cid:88) i =8 (cid:2) a ∗ ij − a ij (0) (cid:3) . (1)Here, we have ordered questions and corresponding answers associated with urinary function as i = { , , , , , , } and the bowel-associated questions as i = { , , , , , , } . The totaldifference score ∆ u j and ∆ b j represent the total change in the quality of life associated with urinaryand bowel function, respectively.Next, we used a cut-off (or threshold) value of 6 to convert the patient’s quality-of-life responsesinto a binary classifier defined by the discrete Heaviside function y j ≡ H (∆ j −
6) = (cid:40) j < , j ≥ . (2)Score thresholding can be visualized in Figure 1 that shows the urinary total difference scores foreach patient overlaid with colored boxes that mark the score cut-off value. The threshold of 6 wasinitially arbitrarily chosen using the fact that it is more than half of greatest sum of changes forpatients; however, we tested other thresholds in order to discern issues with the binary classifier asdetailed in the Results section. Total difference score, ∆ uj (urinary) P a t i en t Figure 1: Total difference scores in urinary symptoms for patients 1-54. In blue, we mark patients who were classifiedas having a significant change in symptoms with ∆ u j ≥ .2. Image augmentation Previous biomedical studies focusing on image processing have used data sets as small as 85data points [7] and as large as 100,000 data points [15]. Since the amount of data points we have iseven smaller than 85, we decided to enrich our dataset prior to training our algorithms. We usedthe Fischer-Modersitzki (FM) curvature-based image registration technique [5] in order to generatenew in silico patients as interpolations of existing patient data.The FM approach was developed in the context of image registration. For completeness, weoutline the core ideas of image registration algorithms here. Let d -dimensional images be repre-sented by compactly supported mappings T, R : Ω → R where Ω =]0 , d . Specifically, the quantity T ( x ) is the intensity or image grey value at the spatial position x ∈ Ω. Given a reference image R ( x ) and a deformable template image T ( x ), a registration algorithm outputs a deformation, ordisplacement field, u : R d → R d such that when applied to the template image, T ( x − u ( x )) aresulting modified template more closely matches the reference, R ( x ). The problem is then how tofind a desired deformation u = ( u , . . . , u d ). This becomes an optimization problem as one tries tominimize the difference between the deformed template T u := T ( x − u ( x )) and the reference R ( x ).Variations in registration methods arise when one employs a particular optimization techniqueand one must define a metric for measuring the goodness of a deformation. Let D be the distancemeasure between the reference R and deformed template T , and S a measure of the smoothness ofthe deformation u . The FM approach consists of finding u by minimizing the joint functional J [ u ], J [ u ] := D [ R, T ; u ] + αS [ u ] . (3)The regularization parameter α is used to control the strength of the smoothness of the displacementversus the similarity of the images. The difference-squared measure D is given by D [ R, T ; u ] := 12 (cid:107) R − T u (cid:107) = 12 (cid:90) Ω ( R ( x ) − T ( x − u ( x ))) d x , (4)and the curvature-based smoothness S [ u ] := 12 d (cid:88) j =1 (cid:90) Ω (∆ u j ) d x , (5)with Neumann boundary conditions defined by ∇ u j ( x ) = 0 , x ∈ ∂ Ω , j = 1 , . . . , d. (6)We obtain a minimizer u by first ensuring that the Gˆateaux derivative of the objective functionvanishes. The resulting Euler-Lagrange equations are f ( x , u ( x )) + αA [ u ]( x ) = 0 , x ∈ Ω . (7)with f ( x , u ( x )) = ( R − T u ) · ∇ T u ( x ) = ( R ( x ) − T ( x − u ( x ))) · ∇ T ( x − u ( x )) , (8) A [ u ]( x ) = ∆ u . (9)The above semi-linear partial differential equations (PDE) are known as the Navier-Lame bihar-monic and diffusion equations. 5he Euler-Lagrange PDE’s can be solved using the following fixed-point iteration αA [ u k +1 ]( x , t ) = − f ( x , u k ( x , t )) , k ≥ , (10) u = 0 . (11)Since the computational domain Ω is of a simple geometry in this case, a finite differenceapproximation for the derivatives can be used. This yields a linear system of equations that aresolved in each iteration step to obtain the deformation u ; more details of the discretization schemewe implemented to solve for u are given in [5].Returning to our problem, the goal is to take each CT slice of patient A and interpolate itwith each CT slice of patient B creating a new stack of CT image slices for a new “patient” C.We achieved this interpolation by applying the FM registration approach and selecting one stackof images to serve as the reference and another to serve as the template. We implemented FM in Matlab , using the discretization approach outlined above and also the implementation outlinedin [9]. A new interpolated patient C was obtained by applying the deformation to the template CTimage stacks. We performed this procedure on the 52 patients’ cropped CT image stacks for thebladder and the rectum respectively, thus producing 1,326 new images for each organ. An exampleof the interpolated CT images we obtained using this method is shown in Figure 2. (a) Reference (b) Template (c) Interpolation
Figure 2: Interpolation of CT images of the bladder using the FM algorithm. The interpolated image (c) is thedeformed version of the template image (b) against the reference image (a).
Radiation plans are overlayed on a baseline CT scan and they give the spatial distribution ofthe doses that will be given to a patient over the course of treatment. When the plan is mappedonto a CT image, high radiation dosage regions are denoted by high pixel intensity (white) whilelow dosages are represented by darker pixels. A cross-section of a representative radiation plan isshown in Figure 3. We applied FM interpolation onto the radiation plan images for each in silico patient. Specifically, we interpolate between radiation plans of patient A and patient B to create anew stack of radiation plans for a new “patient” C.
Next, we constructed a 3-level 3D convolutional neural network (CNN) model for each organ:one processing data related to urinary symptoms and one processing data related to rectal symp-toms. The architecture of the model is illustrated in Figure 4.The convolutional layers use filters of size 3 × × ×
48 pixels down to 16 igure 3: A representative radiation plan. The highest dosage corresponds to the greatest pixel intensity (in white).Black pixels correspond to no radiation.Figure 4: Architecture used for the CNN classification model. There are three layers that have convolution, activation,and pooling. The last convolution layer is connected by fully connected layers and a drop-out layer which drops out40% information to avoid over-fitting. Finally the model output two nodes which tell the predicted probability thata patient will or will not manifest change in (urinary or bowel) symptoms throughout treatment pixel. We chose to use max pooling for our pooling method so that the filters captured the strongest(and thus highest) pixel value for each stride (we use strides of 1 for each filter and a pooling sizeof [2,2,2]). In addition, each convolutional layer is followed by an exponential linear unit (ELU)activation function defined as f ( x ) = (cid:40) e x − x < ,x x ≥ . (12)We used batch normalization (BN) after the convolution and ELU operations, which have beenshown to update weights equally throughout the CNN, resulting in faster convergence [12]. Inaddition, we used drop-out (40 percent rate) to reduce overfitting since our dataset was small.Because our model is a classifier, we use a cross-entropy loss function that the network minimizes7hrough back-propagation:Loss( k ) = − N N (cid:88) j =1 y j log p k ( y j ) + (1 − y j ) log(1 − p k ( y j )) , (13)where k is the step number, y j = { , } classifier label of the j th patient, and p k ( y j ) represents thepredicted probability for the corresponding label at the k th iteration.Our CNN had 2 channels as part of its input layer: one to process information related to thepatient CT scans, and one to process information related to the patient RT plans. The input dataconsisted of 52 patient CT scan stacks and RT plans cropped around the organ of interest (eitherthe bladder or the rectum) with the respective organ doctor annotated contours for each CT scan.The CT scans and RT plans were cropped because the information outside of the organs of interestwas not useful for the purposes of this study. Cropping also minimized the computational powerneeded to run our algorithm. The final layer of the model consisted of two nodes: one providingthe predicted probability that a patient would manifest a change in (urinary or bowel) symptomsthroughout treatment; and one giving the predicted probability that a patient would not manifesta change in (urinary or bowel) symptoms throughout treatment. In addition, the CNN produceda confusion matrix (for either the urinary or bowel symptoms) outlining how many patients itaccurately predicted from the testing set.To assess the overall performance of our model, the CNN trained on 39 patients with a batchsize of 20 and learning rate of 0.001 for approximately 12 hours, and was then tested against theremaining 13 patients. Patients were shuffled and randomly assigned to the training and testingsets to avoid bias. The CNN also employed a 5-fold cross-validation procedure on the training set,similar to the approach in Jiang et al. [6]. Each of the 5 folds split the training set into 31 trainingpatients and 8 validation patients, respectively. Every fold initialized a classifier (for a total of 5classifiers), from which we could select the model that performed best, based on its accuracy andnumber of true-positives, and evaluated it on the testing set. We used the number of true-positivesas a criterion for the best-performing model in order to avoid only predicting a lack of a change insymptoms. The cross-validation model and corresponding loss functions used can be visualized inFigure 5. Figure 5: (A) Model cross-validation. We initialized 5 different models. For each model there is a validation setthat moves across the data. First, we trained on the training set, then checked each of the models on the differentvalidation sets, providing a general idea of which model would work the best. (B) Training (orange) and validation(blue) loss for the best model chosen through cross-validation. As shown in orange, the training loss becomes verysmall, but the validation loss stays at about 0 . .5. Autoencoder We employed a convolutional autoencoder network that uses a portion of the CNN architecture.Specifically, instead of connecting to a fully connected layer after all the convolution layers as wedid with the CNN, the autoencoder was used to pre-train the network on unlabeled informationby reconstructing the original images. The convolutional autoencoder network architecture is illus-trated in Figure 6 and is similar to the U-net architecture used in related segmentation problems[7, 2]. After the autoencoder was trained, we truncated the network at the start of the deconvolu-tion layers and connected it to a fully connected layer that served as the output for our new CNNmodel. (A) (B)
TestTraining
Steps x Lo ss Figure 6: Convolutional Autoencoder. (A) Schematic of the architecture of the convolutional autoencoder network.Three layers have convolution, activation, and pooling. The network deconvolves with activation and pooling forthree more layers. The network targets to reconstruct the input image with the goal to learn the key features of theCT scans and RT plans. (B) Autoencoder loss while training on the bladder data. The blue curve shows the loss forthe test set and the orange curve shows the loss for the training set. Both decrease with the number of iterations ofthe autoencoder.
Training of the autoencoder allowed us to implement a transfer learning approach where wefirst trained the autoencoder network to reconstruct patient images and assigned the near-optimalweights obtained from autoencoder training as initial conditions for the CNN network with thebinary classifier. This pre-training was necessary due to the small size of of the dataset.
In order to further explore the relationship between QOL scores and RT dosage, we investigatedwhether or not there were any correlations between a certain organ region’s RT dosage and whetheror not the patient experienced a change in symptoms related to that organ. We accomplished thisthrough the use of analysis of variance (ANOVA) and logistic regression.We first built an algorithm to obtain the RT dosage on the rectum and bladder contours. We fedthe cropped image of the organ of interest (with overlaid contours from the earlier data preparationprocess) into the algorithm and obtained a new image of the organ with thicker contours. The9lgorithm was able to do this by locating the center of the organ and obtaining a radius thattraces around the organ’s original (doctor annotated) contour to obtain the new, thicker contour.This allowed us to obtain more information on the RT dosage located around the contour of theorgan. We were then able to obtain the RT dosage corresponding to the area occupied by the new,thicker organ contours. Had we used the original organ contours, we may not have had the mostcomplete information on the prescribed amount of RT. The new, thick bladder contours with thecorresponding RT dosage are shown in Figure 7.Following this procedure, we combined all CT images of one patient into a three dimensionalcube and separated it into either top and bottom, or front and back regions. We split the bladderinto top and bottom halves and the rectum into front and back regions using the following rationale.Since the bottom of the bladder is closer to the prostate, we speculated this region would be exposedto more radiation and thus be associated with a higher incidence of collateral urinary symptoms.Therefore, we split the bladder into top and bottom. As for the rectum, we split it between frontand back, since the front is closer to the prostate and we thus anticipated it would be exposed toa higher radiation dosage. Organ regions are illustrated in Figure 7(C).
Figure 7: Organ contouring and organ regions. (A) A CT slice of the rectum of patient 40, showing the newthick contour. (B) Same as (A) but showing the thick contour with the corresponding RT dosage. (C) Lateralview diagram showing how the bladder (yellow) and rectum (red) were split for spatial RT analyses. Image from http://libcat.org/anatomy-of-prostate-cancer . Finally, we define a quantity d lj as the total RT dosage measured in cGy for a specified organregion, l and patient, j . Total dosages were computed for: ( l = 1) top of the bladder, ( l = 2)bottom of the bladder, ( l = 3) total bladder, ( l = 4) front of the rectum, ( l = 5) back of therectum, and ( l = 6) total rectum. Organ sensitivity.
We first conducted a paired 2-tailed t-test to evaluate whether or not thetotal average dosages were significantly different across all the regions outlined above. We foundthat all regions had significantly different total RT dosages ( p < . y dij ≡ H (( a ∗ ij − a ij (0)) −
2) = (cid:40) a ∗ ij − a ij (0) < , a ∗ ij − a ij (0) ≥ . (14)Using this approach, we obtained a binary value y dij for each survey question i and patient j sothat we could identify which symptoms were significantly affected by the corresponding RT dosage.Then, the binary data were used in one-way ANOVA, which compared the distribution of RTdosage for a given organ region ( l = 1 , . . . ,
6) between the two patient groups ( y dij = 1 vs y dij = 0)for a given symptom question i . Recall that using our convention urinary questions, i = 1 − l = 1 − i = 8 −
14 correspond to regions10 = 4 −
6. This analysis allowed us to identify which specific symptoms were significantly affectedby the corresponding RT dosage for a given organ region.
Dosage Thresholding.
Since the previous analysis identified organ regions that were associatedwith significant changes in symptoms, we next investigated the ranges of RT dosage that couldtrigger such changes. We used logistic regression applied to all the original binary patient classifiers y j (from Eq. (2)) and region total doses, d lj in order to predict whether a patient would experiencechanges in symptoms given the corresponding region dose, d lj . The goal of this analysis was toidentify the lowest RT dose for patients with changes in symptoms ( y j = 1) and the highest RTdose for patients that did not have a change in symptoms ( y j = 0).For each logistic model, a tunable RT dose threshold parameter, θ determined to which categorypatients were assigned based on the model’s prediction, ˆ y j . For each region l , we varied θ untilwe obtained a θ p corresponding to no false positives and a θ n corresponding to no false negativeswhen comparing y j with the predicted logistic classifier ˆ y j . Once we obtained the thresholdingparameters for each region, l , we recorded the corresponding RT dosage ranges and used them toinfer how high RT dosage had to be in order to trigger collateral symptoms.
3. Results and Discussion
We evaluated the performance of the CNN network using a measure of accuracy defined to bethe number of patients with correctly predicted outcomes over the total number of patients. Weestimated accuracy results for the bladder and the rectum symptoms by running our algorithm 10times and averaging the results. We did not find conclusive results for the bladder, as we obtainedan average accuracy of 38% with a range of 23% to 53%. This indicated that the CNN model, usingthe available data, did not find any patterns to classify the patients for bladder symptoms. Forthe patients with a change in bladder symptoms, there was a lot of variability in model predictionsranging anywhere from 0% to 50% with an average of 27%.In contrast, promising results were obtained for classification of rectal symptoms. Our modelwith cross-validation accurately predicted an average of 74% changes in rectal symptoms with arange of 62% to 84.5%. The results are visualized in Figure 8. For patients with a change insymptoms, the model was on average accurately predicting the change in symptoms 56% of thetime with a range from 25% to 100%. In Table 1, we give the confusion matrix for the rectum modelwith the validation set that resulted in an 84.6% accuracy. Of the 10 patients without a changein symptoms, 9 of them were accurately predicted. Of the 3 patients with a change in symptoms,2 were accurately predicted. The result is promising because the model is not always predictingone of the classes; it is picking up some patterns from the patients’ data so that it can classify thepatients in either category.We will show later that our results are rather insensitive to the threshold delineating quality-of-life changes.
Confusion Matrix (Rectum Symptoms)Actual Predicted
No Change ChangeNo Change 9 1Change 1 2
Table 1: Confusion matrix for rectum model. Table shows the accuracy for one completely validated model. Weshow the actual classification of the patient, and what the model predicted. igure 8: Accuracy for our trained classification model. The overall accuracy for the bladder and rectum and themodel accuracy within patients who experienced symptoms (denoted with symp). As we can see with the bladder andbladder symptom accuracy, there were no significant differences, as results sit below the 50% line. For the rectum,the overall accuracy for predicting rectum symptoms exceed the 50% line. Other than classifying patients based on their quality-of-life score with a cut-off value of 6, wealso used thresholds of 5 and 7 and used the reclassified data to train the classification model.Unsurprisingly, the accuracy rates are similar to those found using a threshold of 6.For the data reclassified with a threshold of 5, our model with cross-validation accuratelypredicted an average of 69% changes in rectal symptoms.For the data reclassified with a threshold of 7, we found an average accuracy rate of 69%. Asthere are no considerable differences in results when we change the threshold, reinforcing that ouroriginal choice of a cut-off-value, the half of the greatest sum of changes, is a reasonable way toclassify patients based on their quality-of-life scores.
Based on our ANOVA analyses for the bladder data, it does not appear that there were anysignificant differences between the two patient groups in regards to average spatial RT dosing forthe bottom and top of the bladder, as shown in Table 2.
ANOVA Dose Difference ResultsQoL Question P-Value (Bottom of Bladder) P-Value (Top of Bladder)1(Flow Easiness) 0.5331 0.07632 (Frequency at Night) 0.2405 0.86523 (General Frequency) 0.58571 0.6874 (Pain) 0.6682 0.18955 (Urgency) 0.5272 0.0946 (Control) 0.9656 0.17677 (Leak) 0.7949 0.8284All Questions 0.6172 0.6667
Table 2: P-values corresponding to the ANOVA analysis of each urinary symptom and the average RT dosage in thetop and bottom of the bladder. ∗ indicates p < .
05 which we interpret as statistically significant. NOVA Dose Difference ResultsQuestion P-Value (Front of Rectum) P-Value (Back of Rectum)1 (Diarrhea) 0.0018* 0.0002*2 (Urgency) 0.0025* 0.0006*3 (Pain) 0.0023* 0.0008*4 (Bleeding) 0.6563 0.41425 (Cramping) 0.0545 0.0015*6 (Pass Mucus) 0.0387* 0.0185*7 (Nothing to Pass) 0.0541 0.0504All Questions 0.0123* 0.0112*
Table 3: P-values corresponding to the ANOVA analysis of each rectal symptom and the average RT dosage in thefront and back of the rectum. ∗ indicates p < .
05 which we interpret as statistically significant.
According to our ANOVA analyses on rectum data, those who experienced changes in symptomsof diarrhea, urgency, pain, and passing mucus had significantly higher average RT dosage in thefront of the rectum. Furthermore, those who experienced changes across all rectal symptoms hada significantly higher average RT dosage in the front of the rectum. We found the same results forthe back of the rectum (with a difference in cramping which turned out to be significant for theback but not the front). The corresponding p-values for both organ regions are listed in Table 3.The results indicate that the rectum is more sensitive to radiation therapy, matching the resultsfrom our CNN model.According to our logistic threshold analysis, we found that patients tend to develop bowel-related symptoms throughout the treatment if they are receiving doses more than 2,500 cGy in therectum. Correspondingly, in Figure 9(A), we also found that the safe doses to the rectum rangefrom 1,250 to 2,950 cGy, implying that a dose greater than 2,950 cGy will trigger the developmentof collateral symptoms. If we further assume the radiation dosages to the different parts of therectum are uncorrelated, we can also independently find dosage thresholds for the front and backof the rectum. We observed that the front of the rectum could tolerate a higher range of absolutedosage (2,100-4,300 cGy) than the back of the rectum (400-1,700 cGy). These threshold values arelisted in Figure 9(A) and similar results for the top, bottom, and total bladder regions are shownin Appendix B.While this conclusion is consistent with the correlation found between the rectal symptomsand rectum dosages in the interval 2500-4200 cGy [1], we cannot rule out that it could resultfrom a possible collinear effect in which patients received high doses to both front and back of therectum. Since the symptoms cannot be associated with excess radiation to the front of back of therectum, the region-dependent dosage thresholds are likely to depend on the dosage experienced inother regions. The dosage-symptom instances are shown in a scatter plot in Figure 9(B). Here,symptoms are typically associated with larger overall dosages. The data are not sufficient to resolveindependent region-specific thresholds.For the bladder, the dosage thresholds for the top and bottom of the bladder showed significantoverlap: the top of the bladder could tolerate a dosage of 0-4,665 cGy while the bottom couldtolerate a dosage of 3,183-5,387 cGy (see Appendix B). The corresponding scatter plot shows nodiscernible correlation between symptoms and sampled dosages.
4. Summary and Conclusions
With the lowering of the prostate cancer mortality rate, an emphasis has been placed on in-creasing the quality-of-life for patients undergoing radiation treatment. Utilizing machine learningalgorithms and statistical methods, we provide an in-depth analysis on the spatial dosage providedto each patient. By analyzing a patient’s anatomical CT image and the radiation therapy dosing,13
False Positives 0 FalseNegatives
Model Threshold 0.08 0.62Dosage Range (cGy) 2100 4300
Dosage Thresholds for the Front of the Rectum0 False Positives 0 FalseNegatives
Model Threshold 0.13 0.53Dosage Range (cGy) 400 1700
Dosage Thresholds for the Back of the Rectum0 False Positives 0 FalseNegatives
Model Threshold 0.12 0.62Dosage Range (cGy) 1250 2950
Dosage Thresholds for the Total Rectum (A)
Rectum Front Radiation Dose R e c t u m B a ck R ad i a t i on D o s e SymptomNo Symptoms (B)
Figure 9: Logistic model thresholds and corresponding RT dosages for each rectum region. (A) Computed thresholdsassuming independence. (B) Scatter plot for patients and their corresponding front and back RT doses. Patients withsymptoms and without symptoms are shown in red and blue, respectively. The distribution of sampled RT doses arejust broad enough to observe the that higher doses lead to symptoms. The mean front and back radiation doses ofpatients with and without symptoms are indicated by the thick red and blue bars on the x - and y -axes, respectively.The current data are insufficient to resolve anything other than a total dosage thresholding effect. we were able to connect understanding of how radiation influences secondary symptoms. We wereable to do this by using a convolutional neural network that analyzed the CT image and associatedradiation dosage. Our second method used ANOVA analysis on summarized spatial information.Using a brute-force technique, we were able to identify that splitting the bladder into a top regionand bottom and the rectum into a front and back region was the best approach. Our outcomes fromANOVA agreed with our convolutional neural network and also provided dosage thresholds for eachregion. These results for the dosage thresholds for the rectum and bladder align with the resultswe obtained from our CNN prediction model, but should be interpreted with care. The thresholdsacross different regions should not be thought of as independent parameters because the dosagesapplied in the patient samples are correlated and the binary, whole patient symptom indicators arenot attributed to any region. Moreover, the number of patients and the range of radiation dosesthey received are not large enough clearly resolve sharper thresholds. This explains the wide rangefor the bladder dosage thresholds and the overlap we observed between the top and bottom of thebladder. On the other hand, our CNN prediction model found that the radiation dosage (and theCT scan features) do in fact play a large role in explaining the differences in symptom developmentacross patients.In conclusion, we developed a deep learning framework and complementary statistical methodsto identify the connection between spatial dosage and symptoms caused by prostate radiationtherapy. A strength of machine learning is that it can produce accurate predictions if presentedwith sufficiently large data sets; however, the underlying mechanisms or specific features are difficultto discern in these approaches. In our application, it has the potential not only to accurately predictpatient side effects, but also to learn what regions of the organs might be responsible for specific sideeffects. As is significant interest in integrating machine learning approaches with more traditionalmodeling approaches, we also found that classical statistical approaches was also useful in ourproblem. We expect that our CNN results will be much more accurate upon subsequent trainingon larger patient data sets and can be extended to predicting specific question scores to furtherrefine treatment planning. 14 . Acknowledgements This research was funded by The Jayne Koskinas Ted Giovanis Foundation for Health andPolicy and the Breast Cancer Research Foundation. TC acknowledges support from the ArmyResearch Office through grant W911NF-18-1-0345 and the National Institutes of Health throughgrant R01HL146552 (TC). The authors also thank the Institute for Pure and Applied Mathematicsat UCLA for hosting this project under the Research in Industrial Projects for Students program.
6. ReferencesReferences [1] Massoud Al-Abany, Asgeir R. Helgason, Anna-Karin Agren Cronqvist, Bengt Lind, PanayiotisMavroidis, Peter Wersll, Helena Lind, Eva Qvanta, and Gunnar Steineck. Toward a definitionof a threshold for harmless doses to the anal-sphincter region and the rectum.
InternationalJournal of Radiation Oncology · Biology cdot
Physics , 61(4):1035–1044, 2005.[2] Anjali Balagopal, Samaneh Kazemifar, Dan Nguyen, Mu-Han Lin, Raquibul Hannan, AmirOwrangi, and Steve Jiang. Fully automated organ segmentation in male pelvic CT images.
Physics in Medicine & Biology , 63(24):245015, 2018.[3] Ethan Basch. The missing voice of patients in drug-safety reporting.
New England Journal ofMedicine , 362(10):865–869, 2010. PMID: 20220181.[4] Kevin Diao, Emily A. Lobos, Eda Yirmibesoglu, Ram Basak, Laura H. Hendrix, Brittney Bar-bosa, Seth M. Miller, Kevin A. Pearlstein, Gregg H. Goldin, Andrew Z. Wang, and Ronald C.Chen. Patient-reported quality of life during definitive and postprostatectomy image-guidedradiation therapy for prostate cancer.
Practical Radiation Oncology , 7(2):e117–e124, 2017.[5] Bernd Fischer and Jan Modersitzki. A unified approach to fast image registration and a newcurvature based registration technique.
Linear Algebra and its Applications , 380:107–124, 2004.[6] Z. Jiang, J.-F. Witz, P. Lecomte-Grosbras, J. Dequidt, C. Duriez, M. Cosson, S. Cotin, andM. Brieu. B-spline Based Multi-organ Detection in Magnetic Resonance Imaging.
Strain ,51(3):235–247, 2015.[7] Samaneh Kazemifar, Anjali Balagopal, Dan Nguyen, Sarah McGuire, Raquibul Hannan, SteveJiang, and Amir Owrangi. Segmentation of the prostate and organs at risk in male pelvic CTimages using deep learning.
Biomedical Physics & Engineering Express , 4(5):055003, 2018.[8] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. Imagenet classification with deepconvolutional neural networks.
Commun. ACM , 60(6):84–90, 2017.[9] Stuart Alexander MacGillivray. Curvature-based image registration: Review and extensions.mathesis, University of Waterloo, 2009.[10] Lawrence B. Marks, Ellen D. Yorke, Andrew Jackson, Randall K. Ten Haken, Louis S. Cons-tine, Avraham Eisbruch, Søren M. Bentzen, Jiho Nam, and Joseph O. Deasy. Use of normaltissue complication probability models in the clinic.
International Journal of Radiation On-cology *Biology *Physics , 76(3):S10–S19, 2010.1511] Panayiotis Mavroidis, Kevin A. Pearlstein, John Dooley, Jasmine Sun, Srinivas Saripalli,Shiva K. Das, Andrew Z. Wang, and Ronald C. Chen. Fitting ntcp models to bladder dosesand acute urinary symptoms during post-prostatectomy radiotherapy.
Radiation Oncology ,13(1):17, 2018.[12] Dan Nguyen, Troy Long, Xun Jia, Weiguo Lu, Xuejun Gu, Zohaib Iqbal, and Steve Jiang. DosePrediction with U-net: A Feasibility Study for Predicting Dose Distributions from Contoursusing Deep Learning on Prostate IMRT Patients. arXiv , pages 1–18, 2018.[13] Dan Nguyen, Troy Long, Xun Jia, Weiguo Lu, Xuejun Gu, Zohaib Iqbal, and Steve Jiang. Afeasibility study for predicting optimal radiation therapy dose distributions of prostate cancerpatients from patient anatomy using deep learning.
Scientific Reports , 9:1076, 2019.[14] Serguei V Pakhomov, Steven J Jacobsen, Christopher G Chute, and Veronique L Roger. Agree-ment between patient-reported symptoms and their documentation in the medical record.
TheAmerican journal of managed care , 14(8):530–539, 2008.[15] Pranav Rajpurkar, Jeremy Irvin, Kaylie Zhu, Brandon Yang, Hershel Mehta, Tony Duan, DaisyDing, Aarti Bagul, Curtis Langlotz, Katie Shpanskaya, Matthew P. Lungren, and Andrew Y.Ng. CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning.
ArXiv , pages 1–7, 2017.[16] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks forbiomedical image segmentation. In Nassir Navab, Joachim Hornegger, William M. Wells, andAlejandro F. Frangi, editors,
Medical Image Computing and Computer-Assisted Intervention– MICCAI 2015 , pages 234–241, Cham, 2015. Springer International Publishing.[17] Jeff A. Sloan, Lawrence Berk, Joseph Roscoe, Michael J. Fisch, Edward G. Shaw, Gwen Wy-att, Gary R. Morrow, and Amylou C. Dueck. Integrating patient-reported outcomes intocancer symptom management clinical trials supported by the national cancer institutespon-sored clinical trials networks.
Journal of Clinical Oncology , 25(32):5070–5077, 2007. PMID:17991923.[18] American Cancer Society. American cancer society facts & figures 2019, 2019. [Online; postedJanuary 8 2019].[19] James A. Talcott, Jack A. Clark, Judith Manola, and Sonya P. Mitchell. Bringing prostatecancer quality of life research back to the bedside: Translating numbers into a format thatpatients can understand.
Journal of Urology , 176(4):1558–1564, 2006.[20] Lynne I. Wagner, Lari Wenzel, Edward Shaw, and David Cella. Patient-Reported Outcomesin Phase II Cancer Clinical Trials: Lessons Learned and Future Directions.
Journal of ClinicalOncology , 25(32):5058–5062, 2007. PMID: 17991921.16 ppendix A. Quality-of-Life Survey
Figure A.10: Quality-of-Life surveys given to patients before and weekly after RT. ppendix B. Bladder region thresholds Model Threshold 0.49 0.63Dosage Range (cGy) 0 4665
Dosage Thresholds for the Top of the Bladder0 False Positives 0 FalseNegatives
Model Threshold 0.43 0.62Dosage Range (cGy) 3183 5387
Dosage Thresholds for the Bottom of the Bladder0 False Positives 0 FalseNegatives
Model Threshold 0.4 0.65Dosage Range (cGy) 1640 4500
Dosage Thresholds for the Total Bladder (A)
Bladder Bottom Radiation Dose B l adde r U p R ad i a t i on D o s e SymptomNo Symptoms (B)