[PDF] A machine learning approach to using Quality-of-Life patient scores in guiding prostate radiation therapy dosing

Abstract

Thanks to advancements in diagnosis and treatment, prostate cancer patients have high long-term survival rates. Currently, an important goal is to preserve quality-of-life during and after treatment. The relationship between the radiation a patient receives and the subsequent side effects he experiences is complex and difficult to model or predict. Here, we use machine learning algorithms and statistical models to explore the connection between radiation treatment and post-treatment gastro-urinary function. Since only a limited number of patient datasets are currently available, we used image flipping and curvature-based interpolation methods to generate more data in order to leverage transfer learning. Using interpolated and augmented data, we trained a convolutional autoencoder network to obtain near-optimal starting points for the weights. A convolutional neural network then analyzed the relationship between patient-reported quality-of-life and radiation. We also used analysis of variance and logistic regression to explore organ sensitivity to radiation and develop dosage thresholds for each organ region. Our findings show no connection between the bladder and quality-of-life scores. However, we found a connection between radiation applied to posterior and anterior rectal regions to changes in quality-of-life. Finally, we estimated radiation therapy dosage thresholds for each organ. Our analysis connects machine learning methods with organ sensitivity, thus providing a framework for informing cancer patient care using patient reported quality-of-life metrics.

Full PDF

AA machine learning approach to using Quality-of-Life patient scores inguiding prostate radiation therapy dosing

Zhijian Yang a,b , Daniel Olszewski c,d , Chujun He e , Giulia Pintea f,g , Jun Lian h , Tom Chou i ,Ronald Chen j , Blerta Shtylla k a New York University, New York NY 10012 b Current Address: Applied Mathematics and Computational Science Program, University of Pennsylvania,Philadelphia, PA 19104 c Carroll College, Helena MT 59625 d Current Address: Computer, Information Science and Engineering Department, University of Florida, GainesvilleFL 32611 e Smith College, Northampton MA 01063 f Simmons University, Boston MA g Current Address: Department of Psychology, Tufts University, Boston MA 02111 h Department of Radiation Oncology, The University of North Carolina, Chapel Hill, NC 27599 i Depts. of Computational Medicine and Mathematics, UCLA, Los Angeles, CA 90095-1766 j Department of Radiation Oncology, University of Kansas Medical Center, Kansas City, KS 66160 k Department of Mathematics, Pomona College, Claremont CA 91711

Abstract

Thanks to advancements in diagnosis and treatment, prostate cancer patients have high long-termsurvival rates. Currently, an important goal is to preserve quality-of-life during and after treat-ment. The relationship between the radiation a patient receives and the subsequent side eﬀects heexperiences is complex and diﬃcult to model or predict. Here, we use machine learning algorithmsand statistical models to explore the connection between radiation treatment and post-treatmentgastro-urinary function. Since only a limited number of patient datasets are currently available,we used image ﬂipping and curvature-based interpolation methods to generate more data in orderto leverage transfer learning. Using interpolated and augmented data, we trained a convolutionalautoencoder network to obtain near-optimal starting points for the weights. A convolutional neu-ral network then analyzed the relationship between patient-reported quality-of-life and radiation.We also used analysis of variance and logistic regression to explore organ sensitivity to radiationand develop dosage thresholds for each organ region. Our ﬁndings show no connection betweenthe bladder and quality-of-life scores. However, we found a connection between radiation appliedto posterior and anterior rectal regions to changes in quality-of-life. Finally, we estimated radia-tion therapy dosage thresholds for each organ. Our analysis connects machine learning methodswith organ sensitivity, thus providing a framework for informing cancer patient care using patientreported quality-of-life metrics.

Keywords:

Machine Learning, Convolutional Neural Network, Radiation Therapy, OrganSensitivity, Prostate Cancer

1. Introduction

Approximately 175 thousand new cases of prostate cancer were reported in 2019 in the UnitedStates [18]. Depending on age and cancer stage, ﬁrst-line treatments include prostatectomy, radia-tion treatment, and androgen ablation. Each of these treatments carries diﬀerent side eﬀects. For

Preprint submitted to Elsevier May 25, 2020 a r X i v : . [ q - b i o . Q M ] M a y ome patients, a prostatectomy is followed by radiation treatment to minimize the possibility ofrecurrence. Radiation planning for each patient begins with a CT scan, which is followed by thedemarcation of the prostate (radiation target) and the surrounding organs (bladder and rectum)by a physician. Each plan is customized to a patient as there is some ﬂexibility in the spatialdosing of radiation with the primary consideration being delivery of suﬃcient dosage of radiationto the target organ without overexposing and damaging surrounding organs and structures. Ra-diation treatment (RT) plans are developed using Dose Volume Histograms (DVH). DVH discardall organ-speciﬁc spatial information and they are usually based on a single planning CT scan thatdoes not account for anatomical variations over the course of several weeks of therapy [10]. Variousmetrics have been developed in order to translate the information from a DVH into a computedprobability of uncomplicated tumor control using normal tissue complication probability models(NTCP) [10]. These eﬀorts are necessary as the relationship between exposure to radiation of thesurrounding organs/structures and the severity and probability of toxicity (urinary and bowel) isstill not fully understood. From a physiological perspective, it is not clear how radiation dosageaﬀects tissues, organs, as well as their control and function.In order to develop a radiation plan that minimizes patient side eﬀects, one needs to quan-tify these side eﬀects post-radiation. Traditionally, physician-assessed scoring systems have beenwidely used for measuring patient side eﬀects following cancer treatment [3]. However, more recentevidence indicates that clinicians can downgrade the frequency and severity of patients’ treatment-related symptoms [3, 14]. Patient-reported health-related quality-of-life (QOL) surveys are becom-ing an important tool in measuring outcomes after cancer treatment [17, 20]. For example, a studyby K. Diao et al. [4] explored urinary and bowel symptom development during treatment usingpatient-reported QOL scores (from 1, indicating no symptoms, to 5, indicating high frequency ofsymptoms). An IRB waiver was received at the University of North Carolina for this retrospectivestudy using anonymized data. The results showed that average scores progressively increased frombaseline throughout treatment, but all symptoms resolved to baseline levels by follow-up. In thecontext of RT, NTCP models can be correlated to patient-reported QOL data, as was done in [11].However their analysis focused only on urinary symptoms during post-prostatectomy radiotherapyand NTCP models rely on already reduced DVH information. Given the rich organ-speciﬁc in-formation obtained during treatment planning, it could be desirable to more directly connect 3Dpatient CT scans and dosing with QOL scores.Machine learning methods have become state of the art in many applications with impressiveresults; deep convolutional networks have many times outperformed traditional methods for diag-nosis [15] or visual recognition tools [8]. In biomedical imaging, deep convolutional neural network(CNN) algorithms have been applied to a wide array of problems. Of particular interest in ourcontext are medical image segmentation machine learning algorithms, such as the U-net [16] archi-tecture that focuses on semantic segmentation of biomedical images. The U-net architecture hasappealing features that we will employ in our approach such as it uses a large number of max-pooling operations to allow for the identiﬁcation of global, non-local features and up-convolutionto return images to their original size. In the work of Nguyen et al. [13], a modiﬁed U-net ar-chitecture was shown to accurately predict voxel-level dose distributions for intensity-modulatedradiation therapy for prostate cancer patients. These prior studies indicate tremendous promise ofguidance of radiation treatment planning with artiﬁcial intelligence-based algorithms. Nevertheless,these algorithms can be of signiﬁcant use in treatment planning if they can also incorporate QOLpredictions that can provide immediate guidance for the dosimetrist during clinical plan optimiza-tion. To our knowledge, there are no prior approaches that integrate QOL in machine learningalgorithms in the context of RT for prostate cancer.In the department of Radiation Oncology at the University of North Carolina, QOL scores were2ollected using a validated questionnaire [19] administered during weekly treatment visits as partof the routine clinical work-ﬂow for prostate cancer patients. While QOL data have been studied post prostate cancer radiation treatment, data collected during the course of treatment can conveyimportant information about symptom development, which can be a fertile ground for the use ofquantitative modeling to guide optimal RT dosing. Accordingly, in this paper we analyze data froma 14-question quality-of-life prostate cancer patient survey that was collected over the span of ﬁveyears (2010-2015). The data we examined tracked patient urinary and bowel side eﬀects before andalong the course of their treatment (about seven weeks). Associated with each patient’s QOL, wealso examined associated anatomical CT scans and radiation dosing patterns for approximately 50patients.In this paper, we propose a CNN algorithm to explore the connection between the spatialdistribution of the RT dose and the QOL outcomes reported from patients in our data set. Asigniﬁcant problem with CNN algorithms in our context is the need for a suﬃciently large dataset; to resolve this issue, we augmented our patient data sets using interpolation algorithms thatgenerated synthetic patients by combining existing patient data. In addition, we used transferlearning in order to improve the performance of our CNN algorithm and implemented steps toavoid overﬁtting the problem. A key goal for our study was to generate insight into the mostradiation-sensitive tissue regions.As a comparative alternative to the CNN approach, we also used analysis of variance and logisticregression to explore organ sensitivity to radiation and develop dosage thresholds for each organregion. We identiﬁed regions of the rectum that were highly correlated with changes in individualpatient symptoms. Finally, we estimated radiation therapy dosage thresholds to determine howhigh radiation therapy dosage needed to be in order to trigger collateral symptoms. Combiningresults from machine learning and direct analyses of organ sensitivity provides a powerful frameworkto inform patient care in the quality-of-life context.This paper is organized as follows. In the Methods section, we formulate convolutional neuralnetwork algorithms and statistical models. In the Results section, we demonstrate that the CNNalgorithm we developed can identify correlations between bowel-related symptoms and radiation.Furthermore, we support these ﬁndings through statistical analyses that explore organ sensitivityto radiation dosage. Finally, in the Discussion and Conclusions section, we compare our resultswith those of previous studies.

2. Methods

Our patient-reported data were extracted from a patient quality-of-life survey containing an-swers to 14 questions, seven questions pertaining to urinary symptoms and seven concerned withbowel symptoms. The speciﬁc survey questions and possible responses are shown in Appendix A.Patients took the survey before undergoing radiation therapy, once a week during the seven-week-long treatment, and after completing therapy. Answers by patient j to question i at time point t = { , , , , , , , } , a ij ( t ), were scored on a discrete scale ranging from 1 to 5 (a score of 1 indi-cating the least severity in symptoms, and a score of 5 indicating very high severity in symptoms).We used this quality-of-life survey and de-identiﬁed CT scans and treatment plans for a total of 52patients (note that 57 patients were provided, however, ﬁve patients had incomplete data and werediscarded in our analysis).The answers a ij ( t = 0) to the ﬁrst survey taken before treatment were used as the baseline ofsymptoms before radiation therapy. Subsequent answers a ij ( t ≥

1) provide information on patient j ’s symptoms associated with question i . In order to reduce the dimensionality of the data set3ollected over several weeks, we developed a single score for urinary-related symptoms and bowel-related symptoms as follows.For each patient, we divided the questions and associated answers into those concerning urinaryor bowel function and then identiﬁed the worst (maximum) score a ∗ ij ≡ max t a ij ( t ) a patient reportedthroughout the multi-week treatment for each question. A single total diﬀerence score for eachpatient j , ∆ j , was computed as the diﬀerence a ∗ ij − a ij (0), summed over the urinary- or bowel-subset of answers i : ∆ u j = (cid:88) i =1 (cid:2) a ∗ ij − a ij (0) (cid:3) , ∆ b j = (cid:88) i =8 (cid:2) a ∗ ij − a ij (0) (cid:3) . (1)Here, we have ordered questions and corresponding answers associated with urinary function as i = { , , , , , , } and the bowel-associated questions as i = { , , , , , , } . The totaldiﬀerence score ∆ u j and ∆ b j represent the total change in the quality of life associated with urinaryand bowel function, respectively.Next, we used a cut-oﬀ (or threshold) value of 6 to convert the patient’s quality-of-life responsesinto a binary classiﬁer deﬁned by the discrete Heaviside function y j ≡ H (∆ j −

6) = (cid:40) j < , j ≥ . (2)Score thresholding can be visualized in Figure 1 that shows the urinary total diﬀerence scores foreach patient overlaid with colored boxes that mark the score cut-oﬀ value. The threshold of 6 wasinitially arbitrarily chosen using the fact that it is more than half of greatest sum of changes forpatients; however, we tested other thresholds in order to discern issues with the binary classiﬁer asdetailed in the Results section. Total difference score, ∆ uj (urinary) P a t i en t Figure 1: Total diﬀerence scores in urinary symptoms for patients 1-54. In blue, we mark patients who were classiﬁedas having a signiﬁcant change in symptoms with ∆ u j ≥ .2. Image augmentation Previous biomedical studies focusing on image processing have used data sets as small as 85data points [7] and as large as 100,000 data points [15]. Since the amount of data points we have iseven smaller than 85, we decided to enrich our dataset prior to training our algorithms. We usedthe Fischer-Modersitzki (FM) curvature-based image registration technique [5] in order to generatenew in silico patients as interpolations of existing patient data.The FM approach was developed in the context of image registration. For completeness, weoutline the core ideas of image registration algorithms here. Let d -dimensional images be repre-sented by compactly supported mappings T, R : Ω → R where Ω =]0 , d . Speciﬁcally, the quantity T ( x ) is the intensity or image grey value at the spatial position x ∈ Ω. Given a reference image R ( x ) and a deformable template image T ( x ), a registration algorithm outputs a deformation, ordisplacement ﬁeld, u : R d → R d such that when applied to the template image, T ( x − u ( x )) aresulting modiﬁed template more closely matches the reference, R ( x ). The problem is then how toﬁnd a desired deformation u = ( u , . . . , u d ). This becomes an optimization problem as one tries tominimize the diﬀerence between the deformed template T u := T ( x − u ( x )) and the reference R ( x ).Variations in registration methods arise when one employs a particular optimization techniqueand one must deﬁne a metric for measuring the goodness of a deformation. Let D be the distancemeasure between the reference R and deformed template T , and S a measure of the smoothness ofthe deformation u . The FM approach consists of ﬁnding u by minimizing the joint functional J [ u ], J [ u ] := D [ R, T ; u ] + αS [ u ] . (3)The regularization parameter α is used to control the strength of the smoothness of the displacementversus the similarity of the images. The diﬀerence-squared measure D is given by D [ R, T ; u ] := 12 (cid:107) R − T u (cid:107) = 12 (cid:90) Ω ( R ( x ) − T ( x − u ( x ))) d x , (4)and the curvature-based smoothness S [ u ] := 12 d (cid:88) j =1 (cid:90) Ω (∆ u j ) d x , (5)with Neumann boundary conditions deﬁned by ∇ u j ( x ) = 0 , x ∈ ∂ Ω , j = 1 , . . . , d. (6)We obtain a minimizer u by ﬁrst ensuring that the Gˆateaux derivative of the objective functionvanishes. The resulting Euler-Lagrange equations are f ( x , u ( x )) + αA [ u ]( x ) = 0 , x ∈ Ω . (7)with f ( x , u ( x )) = ( R − T u ) · ∇ T u ( x ) = ( R ( x ) − T ( x − u ( x ))) · ∇ T ( x − u ( x )) , (8) A [ u ]( x ) = ∆ u . (9)The above semi-linear partial diﬀerential equations (PDE) are known as the Navier-Lame bihar-monic and diﬀusion equations. 5he Euler-Lagrange PDE’s can be solved using the following ﬁxed-point iteration αA [ u k +1 ]( x , t ) = − f ( x , u k ( x , t )) , k ≥ , (10) u = 0 . (11)Since the computational domain Ω is of a simple geometry in this case, a ﬁnite diﬀerenceapproximation for the derivatives can be used. This yields a linear system of equations that aresolved in each iteration step to obtain the deformation u ; more details of the discretization schemewe implemented to solve for u are given in [5].Returning to our problem, the goal is to take each CT slice of patient A and interpolate itwith each CT slice of patient B creating a new stack of CT image slices for a new “patient” C.We achieved this interpolation by applying the FM registration approach and selecting one stackof images to serve as the reference and another to serve as the template. We implemented FM in Matlab , using the discretization approach outlined above and also the implementation outlinedin [9]. A new interpolated patient C was obtained by applying the deformation to the template CTimage stacks. We performed this procedure on the 52 patients’ cropped CT image stacks for thebladder and the rectum respectively, thus producing 1,326 new images for each organ. An exampleof the interpolated CT images we obtained using this method is shown in Figure 2. (a) Reference (b) Template (c) Interpolation

Figure 2: Interpolation of CT images of the bladder using the FM algorithm. The interpolated image (c) is thedeformed version of the template image (b) against the reference image (a).

Radiation plans are overlayed on a baseline CT scan and they give the spatial distribution ofthe doses that will be given to a patient over the course of treatment. When the plan is mappedonto a CT image, high radiation dosage regions are denoted by high pixel intensity (white) whilelow dosages are represented by darker pixels. A cross-section of a representative radiation plan isshown in Figure 3. We applied FM interpolation onto the radiation plan images for each in silico patient. Speciﬁcally, we interpolate between radiation plans of patient A and patient B to create anew stack of radiation plans for a new “patient” C.

Next, we constructed a 3-level 3D convolutional neural network (CNN) model for each organ:one processing data related to urinary symptoms and one processing data related to rectal symp-toms. The architecture of the model is illustrated in Figure 4.The convolutional layers use ﬁlters of size 3 × × ×

48 pixels down to 16 igure 3: A representative radiation plan. The highest dosage corresponds to the greatest pixel intensity (in white).Black pixels correspond to no radiation.Figure 4: Architecture used for the CNN classiﬁcation model. There are three layers that have convolution, activation,and pooling. The last convolution layer is connected by fully connected layers and a drop-out layer which drops out40% information to avoid over-ﬁtting. Finally the model output two nodes which tell the predicted probability thata patient will or will not manifest change in (urinary or bowel) symptoms throughout treatment pixel. We chose to use max pooling for our pooling method so that the ﬁlters captured the strongest(and thus highest) pixel value for each stride (we use strides of 1 for each ﬁlter and a pooling sizeof [2,2,2]). In addition, each convolutional layer is followed by an exponential linear unit (ELU)activation function deﬁned as f ( x ) = (cid:40) e x − x < ,x x ≥ . (12)We used batch normalization (BN) after the convolution and ELU operations, which have beenshown to update weights equally throughout the CNN, resulting in faster convergence [12]. Inaddition, we used drop-out (40 percent rate) to reduce overﬁtting since our dataset was small.Because our model is a classiﬁer, we use a cross-entropy loss function that the network minimizes7hrough back-propagation:Loss( k ) = − N N (cid:88) j =1 y j log p k ( y j ) + (1 − y j ) log(1 − p k ( y j )) , (13)where k is the step number, y j = { , } classiﬁer label of the j th patient, and p k ( y j ) represents thepredicted probability for the corresponding label at the k th iteration.Our CNN had 2 channels as part of its input layer: one to process information related to thepatient CT scans, and one to process information related to the patient RT plans. The input dataconsisted of 52 patient CT scan stacks and RT plans cropped around the organ of interest (eitherthe bladder or the rectum) with the respective organ doctor annotated contours for each CT scan.The CT scans and RT plans were cropped because the information outside of the organs of interestwas not useful for the purposes of this study. Cropping also minimized the computational powerneeded to run our algorithm. The ﬁnal layer of the model consisted of two nodes: one providingthe predicted probability that a patient would manifest a change in (urinary or bowel) symptomsthroughout treatment; and one giving the predicted probability that a patient would not manifesta change in (urinary or bowel) symptoms throughout treatment. In addition, the CNN produceda confusion matrix (for either the urinary or bowel symptoms) outlining how many patients itaccurately predicted from the testing set.To assess the overall performance of our model, the CNN trained on 39 patients with a batchsize of 20 and learning rate of 0.001 for approximately 12 hours, and was then tested against theremaining 13 patients. Patients were shuﬄed and randomly assigned to the training and testingsets to avoid bias. The CNN also employed a 5-fold cross-validation procedure on the training set,similar to the approach in Jiang et al. [6]. Each of the 5 folds split the training set into 31 trainingpatients and 8 validation patients, respectively. Every fold initialized a classiﬁer (for a total of 5classiﬁers), from which we could select the model that performed best, based on its accuracy andnumber of true-positives, and evaluated it on the testing set. We used the number of true-positivesas a criterion for the best-performing model in order to avoid only predicting a lack of a change insymptoms. The cross-validation model and corresponding loss functions used can be visualized inFigure 5. Figure 5: (A) Model cross-validation. We initialized 5 diﬀerent models. For each model there is a validation setthat moves across the data. First, we trained on the training set, then checked each of the models on the diﬀerentvalidation sets, providing a general idea of which model would work the best. (B) Training (orange) and validation(blue) loss for the best model chosen through cross-validation. As shown in orange, the training loss becomes verysmall, but the validation loss stays at about 0 . .5. Autoencoder We employed a convolutional autoencoder network that uses a portion of the CNN architecture.Speciﬁcally, instead of connecting to a fully connected layer after all the convolution layers as wedid with the CNN, the autoencoder was used to pre-train the network on unlabeled informationby reconstructing the original images. The convolutional autoencoder network architecture is illus-trated in Figure 6 and is similar to the U-net architecture used in related segmentation problems[7, 2]. After the autoencoder was trained, we truncated the network at the start of the deconvolu-tion layers and connected it to a fully connected layer that served as the output for our new CNNmodel. (A) (B)

TestTraining

Steps x Lo ss Figure 6: Convolutional Autoencoder. (A) Schematic of the architecture of the convolutional autoencoder network.Three layers have convolution, activation, and pooling. The network deconvolves with activation and pooling forthree more layers. The network targets to reconstruct the input image with the goal to learn the key features of theCT scans and RT plans. (B) Autoencoder loss while training on the bladder data. The blue curve shows the loss forthe test set and the orange curve shows the loss for the training set. Both decrease with the number of iterations ofthe autoencoder.

Training of the autoencoder allowed us to implement a transfer learning approach where weﬁrst trained the autoencoder network to reconstruct patient images and assigned the near-optimalweights obtained from autoencoder training as initial conditions for the CNN network with thebinary classiﬁer. This pre-training was necessary due to the small size of of the dataset.

In order to further explore the relationship between QOL scores and RT dosage, we investigatedwhether or not there were any correlations between a certain organ region’s RT dosage and whetheror not the patient experienced a change in symptoms related to that organ. We accomplished thisthrough the use of analysis of variance (ANOVA) and logistic regression.We ﬁrst built an algorithm to obtain the RT dosage on the rectum and bladder contours. We fedthe cropped image of the organ of interest (with overlaid contours from the earlier data preparationprocess) into the algorithm and obtained a new image of the organ with thicker contours. The9lgorithm was able to do this by locating the center of the organ and obtaining a radius thattraces around the organ’s original (doctor annotated) contour to obtain the new, thicker contour.This allowed us to obtain more information on the RT dosage located around the contour of theorgan. We were then able to obtain the RT dosage corresponding to the area occupied by the new,thicker organ contours. Had we used the original organ contours, we may not have had the mostcomplete information on the prescribed amount of RT. The new, thick bladder contours with thecorresponding RT dosage are shown in Figure 7.Following this procedure, we combined all CT images of one patient into a three dimensionalcube and separated it into either top and bottom, or front and back regions. We split the bladderinto top and bottom halves and the rectum into front and back regions using the following rationale.Since the bottom of the bladder is closer to the prostate, we speculated this region would be exposedto more radiation and thus be associated with a higher incidence of collateral urinary symptoms.Therefore, we split the bladder into top and bottom. As for the rectum, we split it between frontand back, since the front is closer to the prostate and we thus anticipated it would be exposed toa higher radiation dosage. Organ regions are illustrated in Figure 7(C).

Figure 7: Organ contouring and organ regions. (A) A CT slice of the rectum of patient 40, showing the newthick contour. (B) Same as (A) but showing the thick contour with the corresponding RT dosage. (C) Lateralview diagram showing how the bladder (yellow) and rectum (red) were split for spatial RT analyses. Image from http://libcat.org/anatomy-of-prostate-cancer . Finally, we deﬁne a quantity d lj as the total RT dosage measured in cGy for a speciﬁed organregion, l and patient, j . Total dosages were computed for: ( l = 1) top of the bladder, ( l = 2)bottom of the bladder, ( l = 3) total bladder, ( l = 4) front of the rectum, ( l = 5) back of therectum, and ( l = 6) total rectum. Organ sensitivity.

We ﬁrst conducted a paired 2-tailed t-test to evaluate whether or not thetotal average dosages were signiﬁcantly diﬀerent across all the regions outlined above. We foundthat all regions had signiﬁcantly diﬀerent total RT dosages ( p < . y dij ≡ H (( a ∗ ij − a ij (0)) −

2) = (cid:40) a ∗ ij − a ij (0) < , a ∗ ij − a ij (0) ≥ . (14)Using this approach, we obtained a binary value y dij for each survey question i and patient j sothat we could identify which symptoms were signiﬁcantly aﬀected by the corresponding RT dosage.Then, the binary data were used in one-way ANOVA, which compared the distribution of RTdosage for a given organ region ( l = 1 , . . . ,

6) between the two patient groups ( y dij = 1 vs y dij = 0)for a given symptom question i . Recall that using our convention urinary questions, i = 1 − l = 1 − i = 8 −

14 correspond to regions10 = 4 −

6. This analysis allowed us to identify which speciﬁc symptoms were signiﬁcantly aﬀectedby the corresponding RT dosage for a given organ region.

Dosage Thresholding.

Since the previous analysis identiﬁed organ regions that were associatedwith signiﬁcant changes in symptoms, we next investigated the ranges of RT dosage that couldtrigger such changes. We used logistic regression applied to all the original binary patient classiﬁers y j (from Eq. (2)) and region total doses, d lj in order to predict whether a patient would experiencechanges in symptoms given the corresponding region dose, d lj . The goal of this analysis was toidentify the lowest RT dose for patients with changes in symptoms ( y j = 1) and the highest RTdose for patients that did not have a change in symptoms ( y j = 0).For each logistic model, a tunable RT dose threshold parameter, θ determined to which categorypatients were assigned based on the model’s prediction, ˆ y j . For each region l , we varied θ untilwe obtained a θ p corresponding to no false positives and a θ n corresponding to no false negativeswhen comparing y j with the predicted logistic classiﬁer ˆ y j . Once we obtained the thresholdingparameters for each region, l , we recorded the corresponding RT dosage ranges and used them toinfer how high RT dosage had to be in order to trigger collateral symptoms.

3. Results and Discussion

We evaluated the performance of the CNN network using a measure of accuracy deﬁned to bethe number of patients with correctly predicted outcomes over the total number of patients. Weestimated accuracy results for the bladder and the rectum symptoms by running our algorithm 10times and averaging the results. We did not ﬁnd conclusive results for the bladder, as we obtainedan average accuracy of 38% with a range of 23% to 53%. This indicated that the CNN model, usingthe available data, did not ﬁnd any patterns to classify the patients for bladder symptoms. Forthe patients with a change in bladder symptoms, there was a lot of variability in model predictionsranging anywhere from 0% to 50% with an average of 27%.In contrast, promising results were obtained for classiﬁcation of rectal symptoms. Our modelwith cross-validation accurately predicted an average of 74% changes in rectal symptoms with arange of 62% to 84.5%. The results are visualized in Figure 8. For patients with a change insymptoms, the model was on average accurately predicting the change in symptoms 56% of thetime with a range from 25% to 100%. In Table 1, we give the confusion matrix for the rectum modelwith the validation set that resulted in an 84.6% accuracy. Of the 10 patients without a changein symptoms, 9 of them were accurately predicted. Of the 3 patients with a change in symptoms,2 were accurately predicted. The result is promising because the model is not always predictingone of the classes; it is picking up some patterns from the patients’ data so that it can classify thepatients in either category.We will show later that our results are rather insensitive to the threshold delineating quality-of-life changes.

Confusion Matrix (Rectum Symptoms)Actual Predicted

No Change ChangeNo Change 9 1Change 1 2

Table 1: Confusion matrix for rectum model. Table shows the accuracy for one completely validated model. Weshow the actual classiﬁcation of the patient, and what the model predicted. igure 8: Accuracy for our trained classiﬁcation model. The overall accuracy for the bladder and rectum and themodel accuracy within patients who experienced symptoms (denoted with symp). As we can see with the bladder andbladder symptom accuracy, there were no signiﬁcant diﬀerences, as results sit below the 50% line. For the rectum,the overall accuracy for predicting rectum symptoms exceed the 50% line. Other than classifying patients based on their quality-of-life score with a cut-oﬀ value of 6, wealso used thresholds of 5 and 7 and used the reclassiﬁed data to train the classiﬁcation model.Unsurprisingly, the accuracy rates are similar to those found using a threshold of 6.For the data reclassiﬁed with a threshold of 5, our model with cross-validation accuratelypredicted an average of 69% changes in rectal symptoms.For the data reclassiﬁed with a threshold of 7, we found an average accuracy rate of 69%. Asthere are no considerable diﬀerences in results when we change the threshold, reinforcing that ouroriginal choice of a cut-oﬀ-value, the half of the greatest sum of changes, is a reasonable way toclassify patients based on their quality-of-life scores.

Based on our ANOVA analyses for the bladder data, it does not appear that there were anysigniﬁcant diﬀerences between the two patient groups in regards to average spatial RT dosing forthe bottom and top of the bladder, as shown in Table 2.

ANOVA Dose Diﬀerence ResultsQoL Question P-Value (Bottom of Bladder) P-Value (Top of Bladder)1(Flow Easiness) 0.5331 0.07632 (Frequency at Night) 0.2405 0.86523 (General Frequency) 0.58571 0.6874 (Pain) 0.6682 0.18955 (Urgency) 0.5272 0.0946 (Control) 0.9656 0.17677 (Leak) 0.7949 0.8284All Questions 0.6172 0.6667

Table 2: P-values corresponding to the ANOVA analysis of each urinary symptom and the average RT dosage in thetop and bottom of the bladder. ∗ indicates p < .

05 which we interpret as statistically signiﬁcant. NOVA Dose Diﬀerence ResultsQuestion P-Value (Front of Rectum) P-Value (Back of Rectum)1 (Diarrhea) 0.0018* 0.0002*2 (Urgency) 0.0025* 0.0006*3 (Pain) 0.0023* 0.0008*4 (Bleeding) 0.6563 0.41425 (Cramping) 0.0545 0.0015*6 (Pass Mucus) 0.0387* 0.0185*7 (Nothing to Pass) 0.0541 0.0504All Questions 0.0123* 0.0112*

Table 3: P-values corresponding to the ANOVA analysis of each rectal symptom and the average RT dosage in thefront and back of the rectum. ∗ indicates p < .

05 which we interpret as statistically signiﬁcant.

According to our ANOVA analyses on rectum data, those who experienced changes in symptomsof diarrhea, urgency, pain, and passing mucus had signiﬁcantly higher average RT dosage in thefront of the rectum. Furthermore, those who experienced changes across all rectal symptoms hada signiﬁcantly higher average RT dosage in the front of the rectum. We found the same results forthe back of the rectum (with a diﬀerence in cramping which turned out to be signiﬁcant for theback but not the front). The corresponding p-values for both organ regions are listed in Table 3.The results indicate that the rectum is more sensitive to radiation therapy, matching the resultsfrom our CNN model.According to our logistic threshold analysis, we found that patients tend to develop bowel-related symptoms throughout the treatment if they are receiving doses more than 2,500 cGy in therectum. Correspondingly, in Figure 9(A), we also found that the safe doses to the rectum rangefrom 1,250 to 2,950 cGy, implying that a dose greater than 2,950 cGy will trigger the developmentof collateral symptoms. If we further assume the radiation dosages to the diﬀerent parts of therectum are uncorrelated, we can also independently ﬁnd dosage thresholds for the front and backof the rectum. We observed that the front of the rectum could tolerate a higher range of absolutedosage (2,100-4,300 cGy) than the back of the rectum (400-1,700 cGy). These threshold values arelisted in Figure 9(A) and similar results for the top, bottom, and total bladder regions are shownin Appendix B.While this conclusion is consistent with the correlation found between the rectal symptomsand rectum dosages in the interval 2500-4200 cGy [1], we cannot rule out that it could resultfrom a possible collinear eﬀect in which patients received high doses to both front and back of therectum. Since the symptoms cannot be associated with excess radiation to the front of back of therectum, the region-dependent dosage thresholds are likely to depend on the dosage experienced inother regions. The dosage-symptom instances are shown in a scatter plot in Figure 9(B). Here,symptoms are typically associated with larger overall dosages. The data are not suﬃcient to resolveindependent region-speciﬁc thresholds.For the bladder, the dosage thresholds for the top and bottom of the bladder showed signiﬁcantoverlap: the top of the bladder could tolerate a dosage of 0-4,665 cGy while the bottom couldtolerate a dosage of 3,183-5,387 cGy (see Appendix B). The corresponding scatter plot shows nodiscernible correlation between symptoms and sampled dosages.

4. Summary and Conclusions

With the lowering of the prostate cancer mortality rate, an emphasis has been placed on in-creasing the quality-of-life for patients undergoing radiation treatment. Utilizing machine learningalgorithms and statistical methods, we provide an in-depth analysis on the spatial dosage providedto each patient. By analyzing a patient’s anatomical CT image and the radiation therapy dosing,13

False Positives 0 FalseNegatives

Model Threshold 0.08 0.62Dosage Range (cGy) 2100 4300

Dosage Thresholds for the Front of the Rectum0 False Positives 0 FalseNegatives

Model Threshold 0.13 0.53Dosage Range (cGy) 400 1700

Dosage Thresholds for the Back of the Rectum0 False Positives 0 FalseNegatives

Model Threshold 0.12 0.62Dosage Range (cGy) 1250 2950

Dosage Thresholds for the Total Rectum (A)

Rectum Front Radiation Dose R e c t u m B a ck R ad i a t i on D o s e SymptomNo Symptoms (B)

Figure 9: Logistic model thresholds and corresponding RT dosages for each rectum region. (A) Computed thresholdsassuming independence. (B) Scatter plot for patients and their corresponding front and back RT doses. Patients withsymptoms and without symptoms are shown in red and blue, respectively. The distribution of sampled RT doses arejust broad enough to observe the that higher doses lead to symptoms. The mean front and back radiation doses ofpatients with and without symptoms are indicated by the thick red and blue bars on the x - and y -axes, respectively.The current data are insuﬃcient to resolve anything other than a total dosage thresholding eﬀect. we were able to connect understanding of how radiation inﬂuences secondary symptoms. We wereable to do this by using a convolutional neural network that analyzed the CT image and associatedradiation dosage. Our second method used ANOVA analysis on summarized spatial information.Using a brute-force technique, we were able to identify that splitting the bladder into a top regionand bottom and the rectum into a front and back region was the best approach. Our outcomes fromANOVA agreed with our convolutional neural network and also provided dosage thresholds for eachregion. These results for the dosage thresholds for the rectum and bladder align with the resultswe obtained from our CNN prediction model, but should be interpreted with care. The thresholdsacross diﬀerent regions should not be thought of as independent parameters because the dosagesapplied in the patient samples are correlated and the binary, whole patient symptom indicators arenot attributed to any region. Moreover, the number of patients and the range of radiation dosesthey received are not large enough clearly resolve sharper thresholds. This explains the wide rangefor the bladder dosage thresholds and the overlap we observed between the top and bottom of thebladder. On the other hand, our CNN prediction model found that the radiation dosage (and theCT scan features) do in fact play a large role in explaining the diﬀerences in symptom developmentacross patients.In conclusion, we developed a deep learning framework and complementary statistical methodsto identify the connection between spatial dosage and symptoms caused by prostate radiationtherapy. A strength of machine learning is that it can produce accurate predictions if presentedwith suﬃciently large data sets; however, the underlying mechanisms or speciﬁc features are diﬃcultto discern in these approaches. In our application, it has the potential not only to accurately predictpatient side eﬀects, but also to learn what regions of the organs might be responsible for speciﬁc sideeﬀects. As is signiﬁcant interest in integrating machine learning approaches with more traditionalmodeling approaches, we also found that classical statistical approaches was also useful in ourproblem. We expect that our CNN results will be much more accurate upon subsequent trainingon larger patient data sets and can be extended to predicting speciﬁc question scores to furtherreﬁne treatment planning. 14 . Acknowledgements This research was funded by The Jayne Koskinas Ted Giovanis Foundation for Health andPolicy and the Breast Cancer Research Foundation. TC acknowledges support from the ArmyResearch Oﬃce through grant W911NF-18-1-0345 and the National Institutes of Health throughgrant R01HL146552 (TC). The authors also thank the Institute for Pure and Applied Mathematicsat UCLA for hosting this project under the Research in Industrial Projects for Students program.

6. ReferencesReferences [1] Massoud Al-Abany, Asgeir R. Helgason, Anna-Karin Agren Cronqvist, Bengt Lind, PanayiotisMavroidis, Peter Wersll, Helena Lind, Eva Qvanta, and Gunnar Steineck. Toward a deﬁnitionof a threshold for harmless doses to the anal-sphincter region and the rectum.

InternationalJournal of Radiation Oncology · Biology cdot

Physics , 61(4):1035–1044, 2005.[2] Anjali Balagopal, Samaneh Kazemifar, Dan Nguyen, Mu-Han Lin, Raquibul Hannan, AmirOwrangi, and Steve Jiang. Fully automated organ segmentation in male pelvic CT images.

Physics in Medicine & Biology , 63(24):245015, 2018.[3] Ethan Basch. The missing voice of patients in drug-safety reporting.

New England Journal ofMedicine , 362(10):865–869, 2010. PMID: 20220181.[4] Kevin Diao, Emily A. Lobos, Eda Yirmibesoglu, Ram Basak, Laura H. Hendrix, Brittney Bar-bosa, Seth M. Miller, Kevin A. Pearlstein, Gregg H. Goldin, Andrew Z. Wang, and Ronald C.Chen. Patient-reported quality of life during deﬁnitive and postprostatectomy image-guidedradiation therapy for prostate cancer.

Practical Radiation Oncology , 7(2):e117–e124, 2017.[5] Bernd Fischer and Jan Modersitzki. A uniﬁed approach to fast image registration and a newcurvature based registration technique.

Linear Algebra and its Applications , 380:107–124, 2004.[6] Z. Jiang, J.-F. Witz, P. Lecomte-Grosbras, J. Dequidt, C. Duriez, M. Cosson, S. Cotin, andM. Brieu. B-spline Based Multi-organ Detection in Magnetic Resonance Imaging.

Strain ,51(3):235–247, 2015.[7] Samaneh Kazemifar, Anjali Balagopal, Dan Nguyen, Sarah McGuire, Raquibul Hannan, SteveJiang, and Amir Owrangi. Segmentation of the prostate and organs at risk in male pelvic CTimages using deep learning.

Biomedical Physics & Engineering Express , 4(5):055003, 2018.[8] Alex Krizhevsky, Ilya Sutskever, and Geoﬀrey E. Hinton. Imagenet classiﬁcation with deepconvolutional neural networks.

Commun. ACM , 60(6):84–90, 2017.[9] Stuart Alexander MacGillivray. Curvature-based image registration: Review and extensions.mathesis, University of Waterloo, 2009.[10] Lawrence B. Marks, Ellen D. Yorke, Andrew Jackson, Randall K. Ten Haken, Louis S. Cons-tine, Avraham Eisbruch, Søren M. Bentzen, Jiho Nam, and Joseph O. Deasy. Use of normaltissue complication probability models in the clinic.

International Journal of Radiation On-cology *Biology *Physics , 76(3):S10–S19, 2010.1511] Panayiotis Mavroidis, Kevin A. Pearlstein, John Dooley, Jasmine Sun, Srinivas Saripalli,Shiva K. Das, Andrew Z. Wang, and Ronald C. Chen. Fitting ntcp models to bladder dosesand acute urinary symptoms during post-prostatectomy radiotherapy.

Radiation Oncology ,13(1):17, 2018.[12] Dan Nguyen, Troy Long, Xun Jia, Weiguo Lu, Xuejun Gu, Zohaib Iqbal, and Steve Jiang. DosePrediction with U-net: A Feasibility Study for Predicting Dose Distributions from Contoursusing Deep Learning on Prostate IMRT Patients. arXiv , pages 1–18, 2018.[13] Dan Nguyen, Troy Long, Xun Jia, Weiguo Lu, Xuejun Gu, Zohaib Iqbal, and Steve Jiang. Afeasibility study for predicting optimal radiation therapy dose distributions of prostate cancerpatients from patient anatomy using deep learning.

Scientiﬁc Reports , 9:1076, 2019.[14] Serguei V Pakhomov, Steven J Jacobsen, Christopher G Chute, and Veronique L Roger. Agree-ment between patient-reported symptoms and their documentation in the medical record.

TheAmerican journal of managed care , 14(8):530–539, 2008.[15] Pranav Rajpurkar, Jeremy Irvin, Kaylie Zhu, Brandon Yang, Hershel Mehta, Tony Duan, DaisyDing, Aarti Bagul, Curtis Langlotz, Katie Shpanskaya, Matthew P. Lungren, and Andrew Y.Ng. CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning.

ArXiv , pages 1–7, 2017.[16] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks forbiomedical image segmentation. In Nassir Navab, Joachim Hornegger, William M. Wells, andAlejandro F. Frangi, editors,

Medical Image Computing and Computer-Assisted Intervention– MICCAI 2015 , pages 234–241, Cham, 2015. Springer International Publishing.[17] Jeﬀ A. Sloan, Lawrence Berk, Joseph Roscoe, Michael J. Fisch, Edward G. Shaw, Gwen Wy-att, Gary R. Morrow, and Amylou C. Dueck. Integrating patient-reported outcomes intocancer symptom management clinical trials supported by the national cancer institutespon-sored clinical trials networks.

Journal of Clinical Oncology , 25(32):5070–5077, 2007. PMID:17991923.[18] American Cancer Society. American cancer society facts & ﬁgures 2019, 2019. [Online; postedJanuary 8 2019].[19] James A. Talcott, Jack A. Clark, Judith Manola, and Sonya P. Mitchell. Bringing prostatecancer quality of life research back to the bedside: Translating numbers into a format thatpatients can understand.

Journal of Urology , 176(4):1558–1564, 2006.[20] Lynne I. Wagner, Lari Wenzel, Edward Shaw, and David Cella. Patient-Reported Outcomesin Phase II Cancer Clinical Trials: Lessons Learned and Future Directions.

Journal of ClinicalOncology , 25(32):5058–5062, 2007. PMID: 17991921.16 ppendix A. Quality-of-Life Survey

Figure A.10: Quality-of-Life surveys given to patients before and weekly after RT. ppendix B. Bladder region thresholds Model Threshold 0.49 0.63Dosage Range (cGy) 0 4665

Dosage Thresholds for the Top of the Bladder0 False Positives 0 FalseNegatives

Model Threshold 0.43 0.62Dosage Range (cGy) 3183 5387

Dosage Thresholds for the Bottom of the Bladder0 False Positives 0 FalseNegatives

Model Threshold 0.4 0.65Dosage Range (cGy) 1640 4500

Dosage Thresholds for the Total Bladder (A)

Bladder Bottom Radiation Dose B l adde r U p R ad i a t i on D o s e SymptomNo Symptoms (B)