An Amalgamation of Classical and Quantum Machine Learning For the Classification of Adenocarcinoma and Squamous Cell Carcinoma Patients
Siddhant Jain, Jalal Ziauddin, Paul Leonchyk, Joseph Geraci
University of Toronto; NetraMark Corp; Queen's University, Molecular Medicine
October 30, 2018
Abstract
The ability to accurately classify disease subtypes is of vital importance, especially in oncology where this capability could have a life saving impact. Here we report a classification between two subtypes of non-small cell lung cancer, namely Adenocarcinoma vs Squamous cell carcinoma. The data consists of approximately 20,000 gene expression values for each of 104 patients. The data was curated from [1][2]. We used an amalgamation of classical and quantum machine learning models to successfully classify these patients. We utilized feature selection methods based on univariate statistics in addition to XGBoost [3]. A novel and proprietary data representation method developed by one of the authors, called QCrush, was also used, as it was designed to incorporate a maximal amount of information under the size constraints of the D-Wave quantum annealing computer. The machine learning was performed by a Quantum Boltzmann Machine. This paper will report our results, the various classical methods, and the quantum machine learning approach we utilized.

1 Introduction
The primary purpose of this paper is to elaborate on the process we used to classify two specific lung cancers, namely Adenocarcinoma and Squamous Cell Carcinoma, from a 20,000 gene expression set using a combination of classical and quantum machine learning models. In the process of achieving consistent classification we utilized a process of iterative hyper-tuning. This paper will focus largely on the algorithms and procedures used to complete computations with the Quantum Processing Unit (QPU) provided by D-Wave, harnessing its unique ability to enhance specific operations used for our classification of lung cancers.

In the spirit of the work in [4], we would like to disclose that there were no attempts at demonstrating some type of quantum supremacy here. It is believed that classical machine learning will dominate the landscape of data science for many years while serious quantum computational hardware challenges are met. However, it is believed that near term devices will be able to provide impressive advantages for certain tasks and that hybrid classical and quantum methods will be a way forward. With that in mind, we utilize the ability of quantum annealers to generate complex probability distributions to aid a machine learning protocol in the arena of medicine.
The idea that complex disorders like Alzheimer's disease, and many cancers, are in fact syndromes consisting of several sub-diseases is not new. However, our ability to better stratify patient populations for these disorders arrived with machine learning. Oncology led the way with what we refer to as biomarkers [5], which are sets of variables that one can measure from a patient, like gene expression or cholesterol level, that can be used to make a prediction or determination about a person's health. For example, will a patient respond to some specific treatment, or is a patient at risk of having a heart attack in the near future? Terms like personalized and precision medicine are usually used interchangeably, but a preference in general has emerged for precision medicine. This is simply because the term personalized seems to imply a treatment designed for one person specifically, which may occur in the future, whereas the term precision is based on the idea that we could understand a true subtype that a patient lies within and for whom a particular treatment may be optimal. Machine learning is making this a reality, and models already exist that influence cancer and other treatments [6].

In addition to the very important work which focuses on making treatments for patients more precise, precision medicine through machine learning is also influencing drug design. The reason for wanting to do this is simple: if one says that they are going to design a drug for Alzheimer's, what do they really mean? What mechanism of action is this drug going to have? Machine learning is beginning to reveal that Alzheimer's and other disorders have multiple distinct manifestations that may likely require different interventions. Thus, work is commencing to precisely define these subtypes and what molecular machinery drives them, and this knowledge is going to influence the next generation of drug design.
As quantum computers mature, quantum machine learning will play a role at both ends of this process. On the one hand, quantum computers provide a natural computational environment for the exploration of molecules, being that molecules are quantum mechanical structures. On the other hand, their ability to utilize quantum parallelism may be used to understand patient subpopulations. In the meanwhile, the machine learning protocols available for systems like the D-Wave 2000Q are allowing researchers to create patient response protocols that may one day influence clinical decisions. We present a unique effort in this paper by utilizing the D-Wave 2000Q computer to create a cancer biomarker, but there has already been some effort in the medical space, including the work in [7], which attempts to explore the potential impact that quantum machine learning may have on a computational biology problem. In addition to this, there have been various efforts in utilizing quantum annealers to model drugs and their interactions with proteins. One may refer to [8] to learn about the many, albeit potentially premature yet exciting, commercial efforts commencing in this space.
At the heart of a quantum computer is the Quantum Processing Unit (QPU), which provides a way to encode information in what are referred to as quantum bits (qubits) [9]. A quantum bit, like a classical bit, contains information that the machine uses to conduct operations. Unlike a classical bit, the quantum bit does not default to a 0 or 1; rather, it occupies a state that is in what is referred to as a superposition [10] of 0 and 1 with an underlying probability [11]. Once the qubit is measured by an observation, the qubit finds itself in a definite state of either 0 or 1, like an ordinary classical computer. The fact that the qubit, or more accurately, that the set of qubits within the computer, is in a superposition allows the machine to use quantum mechanics to evolve its state. This property, in addition to entanglement [10], produces a situation where the amount of information that can be simultaneously represented in one state of the quantum computer is exponentially greater than what can be represented in a classical machine. In other words, if you have two qubits, a quantum computer can be in all 4 states |00⟩, |01⟩, |10⟩, and |11⟩ simultaneously, while a classical computer can only be in one of these four states in any moment. This does mean that a quantum computing programmer has access to quantum mechanical properties and can evolve a state in a way that classical computers cannot, and thus perform some novel computations through quantum algorithms. This does not mean that quantum computers are in general exponentially faster, though they can perform some computations more efficiently. To be mathematically rigorous, if BQP is the class of problems that quantum computers can make a decision about in polynomial time, then the actual relationship between BQP and NP is not known [10].

Ultimately, the advent of quantum computers like the D-Wave computer represents the culmination of years of effort, and the achievement is remarkable.
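The exponential growth in representable states described above can be made concrete with a small NumPy sketch. This is an illustration of the state-vector formalism only, not D-Wave code: a 2-qubit register is described by 2² = 4 complex amplitudes, one per basis state.

```python
import numpy as np

# A 2-qubit register has 2**2 = 4 complex amplitudes,
# one per basis state |00>, |01>, |10>, |11>.
n_qubits = 2
dim = 2 ** n_qubits

# Uniform superposition: every basis state equally likely.
state = np.ones(dim, dtype=complex) / np.sqrt(dim)

# Measurement probabilities are squared amplitude magnitudes.
probs = np.abs(state) ** 2
print(dim)     # 4
print(probs)   # [0.25 0.25 0.25 0.25]
```

Doubling the number of qubits squares the number of amplitudes, which is the exponential growth referred to in the text.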
Nevertheless, these modern quantum computers are still not superior in any practical way; however, our ability to model lung cancer with such a machine indicates that as these machines scale, we will be able to access them to understand patient populations in a way not possible with classical computers. We will now review the D-Wave machine briefly.

Figure 1: D-Wave Quantum Processing Unit Arranged by Unit Cells

The architecture of the Quantum Processing Unit used by D-Wave is a chimera graph, as shown in figure 1. The paradigm of quantum computation that the D-Wave computer uses is known as annealing Quantum Computation or Quantum Annealing (QA) [12]. Essentially, one first needs to understand that a problem can be encoded in an energy functional called a Hamiltonian. Next, note that in the D-Wave system one can start with a simple Hamiltonian in its ground state and then allow the technology to change its internal configuration in such a way that it will end up in the ground state of the Hamiltonian that encodes the problem of interest. This has to be done in a very precise and time sensitive way, but is clearly possible [13]. This allows us to solve a certain Hamiltonian required to learn about our cancer patients through Quantum Boltzmann Machines. See [7] for more information on the QA paradigm.

A Boltzmann Machine is a generative machine learning model that is trained to encode the underlying distribution of a given dataset [14]. A Restricted Boltzmann Machine (RBM), a subset of Boltzmann Machines, is a bipartite graph that is segmented into visible and hidden neurons. The neurons of the visible layer are strongly connected to those in the hidden layer, but there exist no adjacent edges between neurons belonging to the same layer. We chose to use an RBM in our experiment because its training is less computationally expensive than that of a standard Boltzmann Machine [15]. See figure 2.

Figure 2: A Simple Restricted Boltzmann Machine Consisting of 4 Visible and 3 Hidden Nodes.
Note the absence of intralayer edges.

Once trained, RBMs produce reconstructions of the provided data [15]. One method of training (optimizing the weights of) an RBM is Contrastive Divergence, which is achieved by Gibbs sampling and gradient descent [16]. It is important to understand the workings of a Restricted Boltzmann Machine, as it provides the foundation for understanding the training of Boltzmann machines on the D-Wave quantum annealer via the quantum sampler. Keep in mind that the idea to utilize this paradigm of machine learning was natural, as the fundamental architecture of the Quantum Processing Unit closely resembles the graphical structure of an RBM [17].
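To make the Contrastive Divergence idea concrete, the following is a minimal NumPy sketch of a single CD-1 update for a classical binary RBM. The function and variable names are ours, chosen for illustration; this stands in for, and does not reproduce, any library trainer.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, a, b, lr=0.1):
    """One Contrastive Divergence (CD-1) update for a binary RBM
    with visible bias a, hidden bias b, and weight matrix W."""
    # Positive phase: hidden activations driven by the data.
    ph0 = sigmoid(b + v0 @ W)                     # P(h = 1 | v0)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one Gibbs step down to the visible layer and back up.
    pv1 = sigmoid(a + h0 @ W.T)                   # P(v = 1 | h0)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(b + v1 @ W)
    # Update: data-driven statistics minus reconstruction statistics.
    W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
    a += lr * (v0 - v1)
    b += lr * (ph0 - ph1)
    return W, a, b

# Toy run: 4 visible and 3 hidden units, as in figure 2.
W = 0.01 * rng.standard_normal((4, 3))
a, b = np.zeros(4), np.zeros(3)
v = np.array([1.0, 0.0, 1.0, 0.0])
W, a, b = cd1_step(v, W, a, b)
```

Repeating this step over many data vectors nudges the model toward reconstructing the training distribution.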
An RBM contains three parameters that encode the entire model:

i) the bias vector for the visible layer (length n)
ii) the bias vector for the hidden layer (length m)
iii) the weights matrix that represents the edge weight for each connection between the visible and hidden layer (m by n)

Once these three parameters are trained, the RBM should be able to create reliable reconstructions of the data that it is provided. The significant hyper-parameters of an RBM are the learning rate and the number of hidden neurons. The learning rate simply determines the magnitude of change made to the parameters, and the number of hidden neurons determines the degree of information compression that occurs. By tuning the hyper-parameters you should be able to create a model that reconstructs the data provided with a high degree of accuracy.

The Boltzmann equation determines the energy of a system. In the context of an RBM this means the equation is as follows:

E(v, h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_i \sum_j v_i w_{i,j} h_j    (1)

The parameters that we seek to tune with training are the weights matrix and the two bias vectors a and b. In matrix form:

E(v, h) = -a^T v - b^T h - v^T W h    (2)

To put this in perspective, consider that energy based methods are designed to minimize an energy functional, as given by E(v, h) above, for example. The proposed system is essentially trying to capture relationships between variables. Thus, if we train this system to minimize E(v, h), then lower energy configurations will be favored by the system. This is because a standard way of computing the joint probability between v and h is given by

P(v, h) = \frac{\exp(-E(v, h))}{\sum_{v,h} \exp(-E(v, h))}    (3)

Thus, because we minimize E(v, h), we end up maximizing

P(v) = \frac{\sum_h \exp(-E(v, h))}{\sum_{v,h} \exp(-E(v, h))}    (4)

Once this maximization occurs, we will have a system that will reflect the distribution of the training set.
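Equations (2) and (3) can be checked numerically with a small sketch. We use a toy RBM with all parameters set to zero, for which every configuration has the same energy and the Boltzmann distribution is uniform; the helper names are ours.

```python
import numpy as np
from itertools import product

def rbm_energy(v, h, W, a, b):
    """Matrix form of equation (2): E(v, h) = -a^T v - b^T h - v^T W h."""
    return -a @ v - b @ h - v @ W @ h

def joint_prob(v, h, W, a, b):
    """Equation (3): Boltzmann probability of one configuration,
    normalized by brute force over all binary (v, h) pairs."""
    Z = sum(
        np.exp(-rbm_energy(np.array(vv, float), np.array(hh, float), W, a, b))
        for vv in product([0, 1], repeat=len(a))
        for hh in product([0, 1], repeat=len(b))
    )
    return np.exp(-rbm_energy(v, h, W, a, b)) / Z

# Toy RBM with 4 visible and 3 hidden units and all parameters zero:
# every configuration has energy 0, so the distribution is uniform
# over the 2**(4+3) = 128 configurations.
a, b, W = np.zeros(4), np.zeros(3), np.zeros((4, 3))
v = np.array([1.0, 0.0, 1.0, 1.0])
h = np.array([0.0, 1.0, 1.0])
print(rbm_energy(v, h, W, a, b) == 0.0)   # True
print(joint_prob(v, h, W, a, b))          # 0.0078125 = 1/128
```

Training moves probability mass away from this uniform starting point and toward configurations resembling the data.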
We will not go into the technicalities of how this is accomplished here; however, there are standard references available [18]. The essential idea for the reader to understand is that this system generalizes well on the D-Wave machine due to the architecture of the system. More specifically, the connectivity of the QPU given in figure 1 can be utilized to define a system that is the quantum analogue of the Restricted Boltzmann Machines just described [19].

Being able to reconstruct gene expression data is useful, yet it isn't immediately apparent how it can be used to classify different subtypes of lung cancer. As RBMs are not ordinary neural networks with well defined cost functions and conventional back-propagation, they cannot be trained simply by employing usual methods. One method of classification is to use a neural network classifier that uses the hidden variables as the input. Please see figure 3.

Another method, and the one used for our experiment, is to clamp visible neurons with the class in which they belong and train the RBM to be able to reconstruct those clamped values [20]. For example, if a dataset contains 100 patients that are either male or female, and 50 variables: one would construct an RBM with 50 visible neurons + 2 visible neurons that will contain the class information (the clamp), for a total of 52 visible neurons. When training, the 50 variables would be fed into the RBM as usual, with [1, 0] as the clamp value for male and [0, 1] for female. Thus the RBM, which is agnostic to the order of the data being fed in, would learn the distribution of data in relation to the clamp.
In essence, it would learn that when the 50 variables are distributed in a certain way, the clamp is distributed in a unique way as well (particularly either [1, 0] or [0, 1]).

Once the model is trained with a clamp (see figure 4), its accuracy can be validated by feeding in a new patient vector with a neutral clamp [1, 1] and evaluating the result by collapsing the larger of the clamp values to a 1 and the other to a 0. E.g. if a 50 variable patient vector is fed into the RBM with a neutral clamp [1, 1], the output would be a fuzzy reconstruction of the patient vector with a clamp value such as [0.23, 0.48], which we would collapse to [0, 1], i.e., the prediction would be female.

Figure 3: Model of a Possible Architecture Used to Classify Samples Consisting of an RBM and Classifier

It is important to note that you need as many clamped neurons as there are classes present in the dataset in order to provide a one-hot encoded clamp.
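The clamping scheme described above can be sketched as follows. The helper names and label strings are ours, purely for illustration of the append-and-collapse mechanics.

```python
import numpy as np

# Hypothetical labels mapped to the one-hot clamp codes from the text.
CLAMPS = {"class_a": [1.0, 0.0], "class_b": [0.0, 1.0]}
NEUTRAL_CLAMP = [1.0, 1.0]   # used at prediction time

def with_clamp(patient_vec, label):
    """Training input: patient variables followed by the one-hot clamp."""
    return np.concatenate([patient_vec, CLAMPS[label]])

def collapse(reconstructed_clamp):
    """Collapse a fuzzy reconstructed clamp to a one-hot prediction."""
    one_hot = np.zeros_like(reconstructed_clamp)
    one_hot[np.argmax(reconstructed_clamp)] = 1.0
    return one_hot

# 50 variables + 2 clamp neurons = 52 visible neurons, as in the text.
print(with_clamp(np.zeros(50), "class_a").size)   # 52
print(collapse(np.array([0.23, 0.48])))           # [0. 1.]
```

The collapse step is what turns the generative reconstruction into a discrete class prediction.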
With a better understanding of an RBM, we can begin to understand the basis of how we utilize the D-Wave machine to train the RBM. At the simplest level, the chimera structure allows one to map the classical RBM to the QPU. A difference now is that the Hamiltonian of this new system consists of an energy where the spin terms in E(v, h) above are replaced by operators [21]; however, the same three parameters exist for this mapped system as for the classical RBM we defined.

Figure 4: Method of Classifying Samples Using a Clamp

In essence, to train a generative model you must have some insight into the underlying probability distribution of the data. Since determining the true distribution is a computationally expensive task, we sample from the distribution instead. For example, if you want to know the underlying distribution of the sum of two dice throws, you can sample from the distribution to gain further insight. Samples would be like: 2, 6, 7, 9, 7, 11, 3, 6, etc. Given enough samples you would be able to better construct the true probability distribution of the sum of two dice throws.

This process of sampling scales to larger, more complex multivariate probability distributions. The QPU has an ability to sample from a more complex distribution, and this is due to the fact that the quantum mechanics that drives the QPU inevitably introduces fluctuations that act as perturbations that can keep the system out of its ground state. This 'restlessness' provides a complex landscape from which to sample and thus to enrich the training of the RBM.

We will use the QPU to conduct the classification of lung cancer, but first we must reduce the dataset from roughly 20,000 gene expressions down to a more manageable amount. We independently used three main methods of reducing data, namely univariate variable reduction, XGBoost, and QCrush.
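Returning to the dice example above, estimating a distribution by sampling can be sketched in a few lines:

```python
import random
from collections import Counter

random.seed(7)

# Estimate the distribution of the sum of two dice by sampling.
n = 100_000
samples = [random.randint(1, 6) + random.randint(1, 6) for _ in range(n)]
freq = Counter(samples)

# With enough samples the empirical frequency approaches the true
# probability, e.g. P(sum = 7) = 6/36 ≈ 0.1667.
print(abs(freq[7] / n - 6 / 36) < 0.01)   # True
```

The QPU plays the role of the dice here: it is a physical source of samples from a far more complex distribution.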
The primary reason for reducing the number of variables is the physical constraints of the QPU. In addition, we do so to reduce noise in the set and to be able to train solely with the most significant variables/indicators.

Figure 5: Histogram of the Sums of Two Dice Being Thrown

Feature selection can be a very complex and important part of the machine learning process, and there are many methods that are generally considered either filter, wrapper, or embedded methods [22], and then combinations of these, which we refer to as ensemble feature selection methods [23]. For this project we used filter, wrapper, and embedded methods; specifically, we experimented with the Fisher score, XGBoost [3], and LASSO. For this experiment this made little difference, and the results we report used the Fisher score. This reduced the number of genes from approximately 20,000 to three different sets with 10, 20, and 50 genes. The results reported below are derived from the set with 10 variables.

An interesting approach that we developed internally was referred to above as QCrush. We will review this briefly, as it will be the subject of a future publication after we perform more experiments with it. Essentially, what QCrush does is compress many variables into a representation for each patient. This enriches how much information can be used to represent an object that is to be modeled, which in this case is a patient. The reason we created such an algorithm is due to limitations in the architecture of the D-Wave QPU, which limit how many variables can be used to create a model. Similar approaches, including autoencoders [24], have been used to deal with this issue. QCrush has an advantage in that the encoding can be visualized in terms of how patients relate to each other.
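As a sketch of the Fisher-score filter mentioned above, the following ranks features by one common form of the score (squared difference of class means over the sum of within-class variances). The data is synthetic, standing in for the actual expression matrix.

```python
import numpy as np

def fisher_scores(X, y):
    """Fisher score per feature for a two-class problem:
    (mean difference)^2 / (sum of within-class variances).
    A small constant guards against division by zero."""
    X0, X1 = X[y == 0], X[y == 1]
    num = (X0.mean(axis=0) - X1.mean(axis=0)) ** 2
    den = X0.var(axis=0) + X1.var(axis=0) + 1e-12
    return num / den

rng = np.random.default_rng(0)
# Toy stand-in for the expression matrix: 104 patients x 6 genes,
# where only gene 0 actually separates the two classes.
y = np.array([0] * 52 + [1] * 52)
X = rng.standard_normal((104, 6))
X[y == 1, 0] += 3.0

ranked = np.argsort(fisher_scores(X, y))[::-1]
print(int(ranked[0]))   # 0 -- the informative gene ranks first
```

Keeping the top 10, 20, or 50 ranked features reproduces the kind of reduced gene sets described in the text.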
However, the point is that one can use a method like QCrush to reduce the size of the data to ensure that an optimal amount of information is utilized for the learning phase of the RBM on the QPU.

2 Training a Quantum Boltzmann Machine
Once the number of features has been reduced, there are a set of steps that need to be taken to train the QBM. Steps 6 through 9 were repeated thrice for each set of hyper-parameters for greater precision.

1. Partition the dataset. In our case we will train on 80 samples, validate on 10, and test on 14.
2. Normalize the dataset using a standard linear scaling to ensure every datapoint is between 0 and 1.
3. Determine the number of hidden variables and number of samples to use.
4. Initialize the quantum sampler.
5. Binarize the dataset.
6. Clamp the desired result.
7. Pass the batches through for training.
8. Evaluate results on the validation set.
9. Score results on the test set.

These steps lay out the basic procedure by which the Quantum Boltzmann Machine is trained. For our procedure we will focus on the dataset created after performing an XGBoost reduction.
The dataset created via XGBoost contains 104 patients with 3 significant variables. In order to accurately validate the models' performance, the patients were partitioned into training, validation, and testing sets with an 80:10:14 split of the patients.
Figure 6: Partitioning the Patient Dataset Into 80 Training, 10 Validation, and 14 Testing

Restricted Boltzmann Machines are probabilistic models that seek to encode a complex probability distribution. By performing a linear normalization of the dataset we are able to utilize this probabilistic property of the model. Furthermore, when we binarize the dataset in order to represent it with binary samples, to be compatible with the QPU, it will be crucial that the values be contained within [0, 1]. This is because the algorithm that is utilized during this process by the QPU is known as quadratic unconstrained binary optimization (QUBO) [25], and the vector quantities must be binary. The D-Wave QPU is designed to excel at solving these types of problems. We normalized in the following way:

(current element - minimum element) / (maximum element - minimum element)

The hyper-parameters for the QBM that we are expressly concerned with are:

i) the number of samples
ii) the number of hidden nodes (i.e. data compression)
iii) the learning rate

With a rudimentary form of hyper-tuning, our approach was simply to iterate over every number of hidden neurons from 1 to 3, learning rates from 0.25 to 1.25 with step size 0.25, and samples from 1 to 2048 in powers of 2. In the section dedicated to hyper-tuning we will speak to the results from our hyper-tuning.
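The hyper-parameter sweep described above amounts to a simple grid; a sketch of its enumeration (training itself is elided):

```python
from itertools import product

hidden_nodes = range(1, 4)                         # 1 to 3 hidden neurons
learning_rates = [0.25, 0.50, 0.75, 1.00, 1.25]    # step size 0.25
num_samples = [2 ** k for k in range(12)]          # 1 to 2048 in powers of 2

grid = list(product(hidden_nodes, learning_rates, num_samples))
print(len(grid))   # 180 hyper-parameter combinations

for n_hidden, lr, n_samples in grid:
    pass  # train and validate one QBM per combination (three repeats each)
```

With three repeats per combination, this grid implies 540 training runs, which motivates the time-sharing constraints discussed later.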
Create an instance of the QBM class with the hyper-parameters that were determined. In addition, the QBM is initialized with the number of visible nodes (number of features + number of classes; refer to the section on clamping for the reasoning).

2.5 Binarize the Dataset
In order to utilize the Quantum Sampler, the input data must contain only binary samples of the dataset, yet the data that is available is not binary. Once normalized, each patient is a vector of floating point values between 0 and 1. To binarize this vector of floats, we broadcast a single vector into a set of one thousand vectors as follows (illustrated in figure 7 with a broadcast to just ten vectors):

Figure 7: Binarization of a Normalized Vector to a Set of Binary Vectors

Every value in a column vector is expanded from being a single value to a vector of one thousand 1s and 0s, where the number of 1s in the new vector corresponds to the probability encoded by the single value. E.g. if the first variable of a patient's column vector is normalized to the value 0.7, the new binary vector representation of that single variable will be a thousand element vector with 700 ones and 300 zeros in a random ordering. These 1000 column vectors are treated as a single batch when training the QBM, meaning each batch contains a single patient represented by 1000 binary vectors.
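The normalization and binarization steps can be sketched together as follows. The function names are ours, and the random ordering of ones and zeros mirrors the description above.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(x):
    """Linear min-max scaling into [0, 1], as described above."""
    return (x - x.min()) / (x.max() - x.min())

def binarize(patient, n_copies=1000):
    """Broadcast one normalized patient vector into a batch of binary
    vectors; a value p becomes a column with about p * n_copies ones,
    shuffled into a random ordering."""
    n_ones = np.round(patient * n_copies).astype(int)
    batch = np.zeros((n_copies, patient.size))
    for j, k in enumerate(n_ones):
        col = np.concatenate([np.ones(k), np.zeros(n_copies - k)])
        batch[:, j] = rng.permutation(col)
    return batch

# Toy patient with 3 variables, matching the reduced dataset size.
patient = normalize(np.array([5.0, 12.0, 9.0]))
batch = binarize(patient)
print(batch.shape)           # (1000, 3)
print(batch.mean(axis=0))    # column means recover the normalized values
```

Averaging each column of the batch recovers the normalized value it encodes, which is what makes this a faithful probabilistic representation.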
Adenocarcinoma and Squamous Cell Carcinoma are the two classes that the patient data belongs to. Thus we have a clamp of size 2, with [1, 0] representing Adenocarcinoma and [0, 1] representing Squamous Cell Carcinoma. Once we append these clamp values to the patient vector we can begin to train.
Every binarized batch of 1000 vectors is trained using the D-Wave quantum computer, and the weight matrices are updated to be able to produce more accurate representations of the dataset. The operations of creating a batch of binarized data and determining the hyper-parameters are in essence the only major requirements to train the QBM. Once the QBM is initialized and the dataset is processed into the aforementioned batches, D-Wave's libraries handle the task of sampling from the QPU and using the gathered samples to train the model.
To validate the classification accuracy of the QBM we compute an error. This error is the squared Euclidean distance between the vector representing the true clamp value, e.g. [1, 0], and the predicted clamp value. In layman's terms, we determine an error by summing the squares of the differences between each corresponding vector value of the predicted and true clamp. The lowest consistent error value indicates the best trained model with optimal hyper-parameters. To determine the predicted classification we repeat the same process for normalizing and preprocessing as before, yet instead of providing the clamp with the actual classification we provide a neutral clamp, i.e., [1, 1]. Then we simply feed the batch forward, and back, and calculate the error.
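The validation error can be sketched as follows; the fuzzy reconstruction below is an illustrative value of ours, while the [1, 0] true clamp comes from the text.

```python
import numpy as np

def clamp_error(true_clamp, predicted_clamp):
    """Sum of squared differences (squared Euclidean distance)
    between the true and reconstructed clamp values."""
    diff = np.asarray(true_clamp) - np.asarray(predicted_clamp)
    return float(np.sum(diff ** 2))

# True class [1, 0]; hypothetical reconstruction from a neutral-clamp pass.
print(round(clamp_error([1, 0], [0.81, 0.27]), 4))   # 0.109
```

A perfect reconstruction gives an error of 0, and consistently low errors across the validation set signal well-chosen hyper-parameters.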
Once validated, we can test the QBM's classification accuracy by repeating the same process as validation. The one difference between validation and testing is that we will not compute an error score, but rather a raw count of successes/failures in prediction. If the models we create are over-fit, we should observe a great disparity between the error found when validating and the actual test results.

• The results of the experiment showed that a learning rate of 0.75, 3 hidden neurons, and 1024 samples produced raw scores of 13, 14, and 13 (out of 14).
• The average score was 95.2%.
• We observed an increase in prediction accuracy with greater samples, as expected, and an increase in accuracy with a greater number of hidden nodes, with diminishing returns.

Figure: Histogram of Raw Score with Learning Rates 0.25, 0.50, 0.75, 1.00, 1.25 (y-axis: Frequency of Raw Score on Test Dataset)
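The reported average follows directly from the raw scores over the 14 test patients:

```python
# Raw scores: correct predictions out of 14 test patients, three repeats.
raw_scores = [13, 14, 13]
accuracy = sum(raw_scores) / (len(raw_scores) * 14)
print(f"{accuracy:.1%}")   # 95.2%
```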
Due to the limitations of time-sharing the D-Wave computer, in addition to constraints of the service itself, we were unable to test a greater number of hyper-parameters and conditions. In the future we hope to further explore the capabilities and potential optimizations of the D-Wave quantum system. That being said, we have already constructed interesting models to predict mild vs aggressive cases of Chronic Lymphocytic Leukemia (CLL). However, a hope that we have, and see as a potential avenue to achieve true quantum supremacy in the near term, involves the ability to explore the molecular and genetic landscape of patient populations.

It is already believed that quantum supremacy can be achieved on near term quantum devices in the area of drug design. This is due to the fact that molecules are quantum entities, and quantum computers provide a computational space to perform simulations that is obviously more natural for such a venture. This effort is underway by the authors of this paper, but we propose a related space where quantum supremacy may be possible: the aforementioned molecular and genetic landscape of patient populations. To elucidate, consider that one can capture hundreds of miRNA measurements + 20,000 mRNAs + a large number of methylation data (hundreds of thousands) + single nucleotide polymorphisms (millions) + microbiome data. This is a monstrous amount of data about each individual, and the challenge is to create algorithms that will help us understand which subset of these variables actually characterizes a patient or person in a meaningful way. For example, is there a relatively small number of these variables, say approximately 10-30, that could define a specific sub-type of human that would respond particularly well to a new treatment of pancreatic cancer? This not only defines a subtype of human, but perhaps a specific manifestation of the disease.
In this way, one would even be able to direct the activities of drug designers, and thus a truly personalized approach to medicine may be possible. A hopeful perspective, but one tempered by the monstrous complexity involved with this kind of variable reduction. Approaches that are deemed black boxes cannot help with this task, and thus one will need methods that hand over the subsets of variables and effectively 'explain' themselves. Quantum computation may be able to play a significant role here because of our ability to utilize quantum parallelism. Work in this direction that utilized the QUBO [25] algorithm on a quantum annealer can be found in a white paper [26]. Much progress has to be made in quantum computational technologies before problems like this can be truly addressed; however, near term machines may be able to make real progress in this direction by allowing for a more complex and complete sampling space. This will allow novel algorithms a deeper reach into the space of these variable subsets. We are beginning to explore this direction through work with the D-Wave technology.
We utilized a quantum annealer, namely the D-Wave 2000Q, to create a model capable of predicting whether non-small cell lung cancer patients have adenocarcinoma or squamous cell carcinoma, two different varieties of this deadly disease. The ability to know this can have life altering treatment consequences, especially as cancer treatment matures and becomes more personalized. The variables used to train the model were gene expression data derived from tumor samples. Considering that the machine had access to 104 patients in total, the machine performed very well, and we have reason to believe that the models are robust, as classical methods performed similarly and replicated well. The effort, however, was not made to demonstrate quantum supremacy of any kind, but to explore how precision medicine may be impacted as near term devices become more powerful. The ability to sample from complex distributions, something that quantum annealers are able to readily provide, is allowing generative based machine learning models, like ours, to become a reality. In [4], various approaches like this are explored, but an important goal of that paper was to suggest that near term quantum computers can be utilized to eventually outperform classical computers, but only for certain types of problems and utilizing certain approaches. Our work here was performed in the same spirit and is an initiation point for an ongoing effort to explore how this new paradigm of machine learning will impact the medical space. Precision medicine is the hope of the future, where treatment protocols will be tailored for specific subpopulations of patients. In order for this to become a reality, we will need to utilize machine learning protocols to understand disease from the perspective of patient populations and the various manifestations these classically identified illnesses take.
In addition to the new disease definitions that this effort will bring forth, a new understanding that will influence treatment paradigms will emerge. We believe that quantum machine learning, even on near term devices, will have an impact.
The authors would like to thank D-Wave for allowing us access to their technology and for the excellent training provided by their team. Thank you Dr. Peter Wittek, Colin J.E. Lupton, and Dr. Abhi Rampal. We would also like to thank Creative Destruction Labs for having us participate in the 2017 CDL Quantum Machine Learning stream. Thank you to Songeun You for providing the figures. A thank you to Queen's University and Dr. Harriet Feillotter for advice and data sources. Finally, a special thank you to the whole NetraMark Corp team and all investors who helped make this endeavor a reality.
References

[1] R. Kuner. GEO lung cancer data set - GSE10245. 2009.
[2] M. C. Golumbic. GEO lung cancer data set - GSE18842. 2010.
[3] T. Chen and C. Guestrin. XGBoost: A scalable tree boosting system. 2016.
[4] Alejandro Perdomo-Ortiz, Marcello Benedetti, John Realpe-Gómez, and Rupak Biswas. Opportunities and challenges for quantum-assisted machine learning in near-term quantum computers. Quantum Science and Technology, 3(3):030502, 2018.
[5] K. Strimbu and J. A. Tavel. What are biomarkers? Current Opinion in HIV and AIDS, pages 463-466, 2010. http://doi.org/10.1097/COH.0b013e32833ed177
[6] B. Mesko. The role of artificial intelligence in precision medicine. Expert Review of Precision Medicine and Drug Development, 2(5):239-241, 2017.
[7] Richard Y. Li, Rosa Di Felice, Remo Rohs, and Daniel A. Lidar. Quantum annealing versus classical machine learning applied to a simplified computational biology problem. npj Quantum Information, 4(1):14, 2018.
[8] https://quantumcomputingreport.com/players/privatestartup.
[9] D-Wave. 2018.
[10] M.A. Nielsen and I.L. Chuang. Quantum Computation and Quantum Information: 10th Anniversary Edition. Cambridge University Press, 2011.
[11] B. Schumacher. Quantum coding. Phys. Rev. A, 51:2738-2747, 1995.
[12] Tadashi Kadowaki and Hidetoshi Nishimori. Quantum annealing in the transverse Ising model. Phys. Rev. E, 58:5355, 1998.
[13] R. Harris, M. W. Johnson, T. Lanting, A. J. Berkley, J. Johansson, P. Bunyk, E. Tolkacheva, E. Ladizinsky, N. Ladizinsky, T. Oh, F. Cioata, I. Perminov, P. Spear, C. Enderud, C. Rich, S. Uchaikin, M. C. Thom, E. M. Chapple, J. Wang, B. Wilson, M. H. S. Amin, N. Dickson, K. Karimi, B. Macready, C. J. S. Truncik, and G. Rose. Experimental investigation of an eight-qubit unit cell in a superconducting optimization processor. Phys. Rev. B, 82:024511, Jul 2010.
[14] R. Salakhutdinov, A. Mnih, and G. Hinton. Restricted Boltzmann machines for collaborative filtering. https://dl.acm.org/citation.cfm?id=1273596, 2007.
[15] A. Fischer and C. Igel. An introduction to restricted Boltzmann machines. https://link.springer.com/chapter/, 2012.
[16] M. Carreira-Perpinan and G. Hinton. On contrastive divergence learning. 2005.
[17] S. Adachi. Application of quantum annealing to training of deep neural networks. https://arxiv.org/abs/1510.06356, 2015.
[18] Geoffrey E. Hinton. A Practical Guide to Training Restricted Boltzmann Machines, pages 599-619. Springer Berlin Heidelberg, Berlin, Heidelberg, 2012.
[19] Mohammad H. Amin, Evgeny Andriyash, Jason Rolfe, Bohdan Kulchytskyy, and Roger Melko. Quantum Boltzmann machine. Phys. Rev. X, 8:021050, May 2018.
[20] H. Larochelle and Y. Bengio. Classification using discriminative restricted Boltzmann machines. 2008.
[21] M.H. Amin, E. Andriyash, J. Rolfe, B. Kulchytskyy, and R. Melko. Quantum Boltzmann machine. 2016.
[22] J. Li, K. Cheng, S. Wang, F. Morstatter, R. Trevino, J. Tang, and H. Liu. Feature selection: A data perspective. 2018.
[23] Donghai Guan, Weiwei Yuan, Young-Koo Lee, Kamran Najeebullah, and Mostofa Kamal Rasel. A review of ensemble learning based feature selection. IETE Technical Review, 31(3):190-198, 2014.
[24] D. Kingma and M. Welling. Auto-encoding variational Bayes. 2013.
[25] E. Boros, P. Hammer, and G. Tavares. Local search heuristics for quadratic unconstrained binary optimization (QUBO).