A Digital Quantum Algorithm for Jet Clustering in High-Energy Physics


Diogo Pires, Pedrame Bargassa, João Seixas, and Yasser Omar

Instituto Superior Técnico, Universidade de Lisboa, Portugal
Instituto de Telecomunicações, Physics of Information and Quantum Technologies Group, Lisbon, Portugal
Laboratório de Instrumentação e Física Experimental de Partículas (LIP), Lisbon, Portugal
Centro de Física e Engenharia de Materiais Avançados (CeFEMA), Instituto Superior Técnico, Av. Rovisco Pais 1, 1049-001 Lisboa, Portugal

† [email protected] ‡ [email protected] †† [email protected] ‡‡ [email protected]

ABSTRACT

Experimental High-Energy Physics (HEP), especially the Large Hadron Collider (LHC) programme at the European Organization for Nuclear Research (CERN), is one of the most computationally intensive activities in the world. This demand is set to increase significantly with the upcoming High-Luminosity LHC (HL-LHC), and even more in future machines, such as the Future Circular Collider (FCC). As a consequence, event reconstruction, and in particular jet clustering, is bound to become an even more daunting problem, thus challenging present day computing resources. In this work, we present the first digital quantum algorithm to tackle jet clustering, opening the way for digital quantum processors to address this challenging problem. Furthermore, we show that, at present and future collider energies, our algorithm has comparable, yet generally lower complexity relative to the classical state-of-the-art $k_t$ clustering algorithm.

arXiv [physics.data-an]

In a world where big data is inevitably becoming the norm of the everyday technological landscape, computational tasks are bound to become increasingly intense. Following the trend, data processing and analysis in experimental high-energy physics, and in particular at the LHC, presents some of the most computationally challenging tasks worldwide. Given the small production cross section of the events of interest associated with New Physics (NP), it is necessary to analyze an enormous number of events, often very complex in structure. The situation regarding computational resources is therefore bound to become drastically more demanding after the HL-LHC upgrade currently under way, with event sizes expected to increase ∼10-fold. Consequently, event reconstruction, and in particular jet clustering, is bound to become an even more daunting combinatorial problem, thus challenging present day computing resources.

A jet algorithm maps the momenta $\{\vec{p}_i\}$ of N collimated final-state particles into the momenta $\{\vec{j}_k\}$ of K clusters called jets, dependent on the collision conditions and the subsequent particles' distribution, in an approximate attempt to reverse-engineer the quantum mechanical processes of fragmentation and hadronisation as a way of probing the underlying Quantum Chromodynamics (QCD) processes (see Figure 1). Recently, quantum algorithmic approaches for jet clustering have been proposed, namely in a quantum annealing formulation. In this work, we develop a digital quantum algorithm for multijet clustering. Namely, we propose a new, modified version of the quantum k-means algorithm to address the jet clustering problem. Furthermore, we implement and classically simulate it through the use of IBM's Qiskit package. Finally, we benchmark the performance of our quantum algorithm against the classical state-of-the-art $k_t$ jet clustering algorithm, namely in terms of its scaling, as well as in terms of its clustering efficiency.

Figure 1. Example of a dijet event (e.g. from an $e^+e^-$ collision), where a quark-antiquark pair is produced, later giving origin to colorless bound states through hadronisation, and resulting in two back-to-back jets.

k-means

The classical k-means algorithm (see Figure 2), applied to jet clustering for the first time in 2006 and subsequently in 2012 and 2015, receives as input a set of N D-dimensional data points and outputs K centroids, calculated through the mean of each group of data points, thus defining K clusters. To be assigned to any particular cluster, a data point needs to be closer to that cluster's centroid than to any other centroid in the data set.
In order to converge to the final set of centroids, the algorithm iteratively alternates between assigning the data points to K clusters based on the current centroids and choosing the centroids based on the current assignment of the data points to clusters. It has a complexity of O(KND), which corresponds to the dominating step where the KN distances between all D-dimensional data points and all centroids are computed.

The four panels of Figure 2 illustrate the procedure:

1. Randomly generate K initial centroids within the data domain (here K = 4, represented by triangles).
2. Assign every point (represented by circles) to the corresponding nearest centroid (assignment represented through colors).
3. Recalculate the new K centroids by computing the mean of each cluster of points.
4. Repeat steps 2 and 3 until the centroids stabilize and convergence has been reached.
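The four steps of the classical procedure can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the implementation used in this work: the function names and the `max_iter` and `seed` parameters are hypothetical, and, as a simplification, the initial centroids are drawn from the data points themselves rather than generated anywhere within the data domain.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Classical k-means on a list of D-dimensional tuples.

    Returns (centroids, labels), where labels[i] is the index of the
    centroid assigned to points[i].
    """
    rng = random.Random(seed)
    # Step 1 (simplified): use K distinct data points as initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest centroid.
        # This is the O(KND) step: K*N distances, each costing O(D).
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Step 4: stop once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: recompute each centroid as the mean of its cluster.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
```

The nested loop over points and centroids in step 2 is the part the quantum version targets, since it dominates the cost.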

Figure 2. Overview of the procedure relative to the classical k-means algorithm.

k-means

In order to construct our digital quantum jet clustering algorithm based on quantum k-means, one needs to start by identifying which of the classical algorithm's steps has the most potential to yield a quantum advantage. Here, this step corresponds to the distance calculation between the N data points and the K centroids.

In order to compute the distances on a quantum circuit, the SwapTest quantum sub-routine (see Figure 3) is used. The SwapTest measures the overlap $\langle\psi|\phi\rangle$ between two quantum states $|\psi\rangle$ and $|\phi\rangle$, based on the probability of measuring the control qubit in state $|0\rangle$,

$$P(|0\rangle) = \frac{1}{2} + \frac{1}{2}\,|\langle\psi|\phi\rangle|^2 .$$

It is used here to calculate the squared Euclidean distance $\|\vec{p}_i - \mu_k\|^2$ between a particle's momentum vector $\vec{p}_i$ and a given jet cluster's centroid $\mu_k$. To that purpose it performs the following steps:

1. State Preparation: Prepare two quantum states,

$$|\psi\rangle = \frac{1}{\sqrt{2}}\left(|0,\vec{p}_i\rangle + |1,\mu_k\rangle\right), \qquad |\phi\rangle = \frac{1}{\sqrt{Z}}\left(\|\vec{p}_i\|\,|0\rangle - \|\mu_k\|\,|1\rangle\right), \qquad \text{with } Z = \|\vec{p}_i\|^2 + \|\mu_k\|^2 \tag{1}$$

2. Find Overlap: Compute the overlap $|\langle\psi|\phi\rangle|^2$ through the SwapTest sub-routine.

3. Compute Squared Euclidean Distance: Get the desired squared Euclidean distance through the following equation,

$$\|\vec{p}_i - \mu_k\|^2 = 2Z\,|\langle\psi|\phi\rangle|^2 \tag{2}$$

where the qubit registers $|\vec{p}_i\rangle$ and $|\mu_k\rangle$ are prepared using Amplitude Encoding, or can be loaded directly from Quantum Random Access Memory (QRAM).

Furthermore, the search for the closest jet centroid for each particle is also typically performed via the Grover Optimization quantum sub-routine, based on the original Grover search algorithm. However, given that this step does not affect the overall complexity of the algorithm or its proof-of-concept application to jet clustering, we have chosen not to implement it. Consequently, the quantum algorithm's complexity is of the order of O(KN log D), resulting from the fact that only log D qubits are needed to encode both the particles' momentum vectors and the jet centroids, $|\vec{p}_i\rangle$ and $|\mu_k\rangle$.

[Circuit diagram: a control qubit $|0\rangle$ with a Hadamard gate, a controlled-SWAP acting on $|\psi\rangle$ and $|\phi\rangle$, and a second Hadamard gate on the control qubit.]

Figure 3. Quantum circuit corresponding to the SwapTest routine.
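The relation between the overlap and the squared Euclidean distance in equations (1) and (2) can be checked numerically without any quantum hardware. The sketch below is a classical calculation, not a circuit simulation: it assumes real, non-zero vectors, evaluates the overlap of $|\phi\rangle$ with the flag qubit of $|\psi\rangle$ directly, and returns both the resulting distance and the control-qubit probability an ideal SwapTest would yield. The function name is hypothetical.

```python
import math


def swap_test_distance(p, mu):
    """Classically evaluate equations (1) and (2) for real vectors p and mu.

    Returns (squared_distance, p0), where p0 is the ideal probability of
    measuring the SwapTest control qubit in |0>.
    """
    norm_p = math.sqrt(sum(x * x for x in p))
    norm_mu = math.sqrt(sum(x * x for x in mu))
    z = norm_p ** 2 + norm_mu ** 2

    # Amplitudes of |phi> on the single flag qubit, equation (1).
    phi0 = norm_p / math.sqrt(z)
    phi1 = -norm_mu / math.sqrt(z)

    # Projecting the flag qubit of |psi> onto |phi> leaves the vector
    # (phi0 * p/|p| + phi1 * mu/|mu|) / sqrt(2) in the data register.
    residual = [
        (phi0 * a / norm_p + phi1 * b / norm_mu) / math.sqrt(2)
        for a, b in zip(p, mu)
    ]
    overlap_sq = sum(r * r for r in residual)

    p0 = 0.5 + 0.5 * overlap_sq          # ideal SwapTest statistics
    return 2.0 * z * overlap_sq, p0      # equation (2)


distance_sq, p0 = swap_test_distance((1.0, 2.0, 2.0), (0.0, 0.0, 1.0))
```

On a real or simulated device the overlap would instead be estimated from the measured frequency of the control qubit being in $|0\rangle$, so the distance would carry a statistical uncertainty that depends on the number of shots.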

The k-means clustering algorithm usually aggregates points according to how distant they are from each other. For this purpose, it makes use of the squared Euclidean distance, so that points closer to each other tend to be clustered together. When applied to jet clustering, however, this tends to aggregate soft particles emitted in opposite directions, despite the fact that they belong to opposing jets. This happens due to the small magnitude of their momenta. Furthermore, we know that if any two final-state particles $\vec{p}_i$ and $\vec{p}_j$ belong to the same jet, the angle $\theta_{ij}$ between them will tend to be much smaller than the angle to any particle belonging to another jet. This is expected, since the large momenta of the produced quarks result in highly collimated jets, such that $\theta(\vec{p}_i,\vec{p}_j) \ll \pi$ for any two particles $\vec{p}_i$ and $\vec{p}_j$ in the same jet. We therefore rescale every particle's momentum to some multiple of the unit sphere, such that there is a correspondence between the angle $\theta_{ij}$ between any two particles $\vec{p}_i$ and $\vec{p}_j$ and the distance $\|\vec{p}_i - \vec{p}_j\|$ between them.

K

In order to run the k-means algorithm, one needs to provide the expected number K of clusters. In the case of jet clustering, however, the number K of jets produced is not known a priori. Nevertheless, one does know the expected range of K values as a function of the center-of-mass energy $\sqrt{s}$ and of which particles are being collided. We therefore propose to run the algorithm a small number of times over the expected range of K, so that the most adequate number of jets can be inferred. We choose the value of K which produces the highest quality clustering. For this work, we chose the Silhouette Index as a figure of merit for clustering quality.
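The rescaling of the momenta to a common sphere, described earlier in this section, can be illustrated with a short sketch. It assumes non-zero three-momenta and a hypothetical `radius` parameter for the chosen multiple of the unit sphere. Once two momenta sit on the same sphere, their distance depends only on the angle between them, through the chord length $2r\sin(\theta/2)$.

```python
import math


def rescale_to_sphere(p, radius=1.0):
    """Project a non-zero momentum vector onto a sphere of the given radius."""
    norm = math.sqrt(sum(x * x for x in p))
    if norm == 0.0:
        raise ValueError("cannot rescale a zero momentum vector")
    return tuple(radius * x / norm for x in p)


def chord_from_angle(theta, radius=1.0):
    """Distance between two points on the sphere separated by angle theta."""
    return 2.0 * radius * math.sin(theta / 2.0)


# Two soft particles emitted in opposite hemispheres and one hard particle.
soft_up = (0.0, 0.1, 0.3)
soft_down = (0.0, 0.1, -0.3)
hard_up = (0.0, 5.0, 40.0)

raw_soft_pair = math.dist(soft_up, soft_down)   # small: soft momenta
raw_same_side = math.dist(soft_up, hard_up)     # large: very different |p|

u, d, h = (rescale_to_sphere(v) for v in (soft_up, soft_down, hard_up))
scaled_soft_pair = math.dist(u, d)              # now large: opposite sides
scaled_same_side = math.dist(u, h)              # now small: similar direction

# After rescaling, distance and opening angle carry the same information.
theta_uh = math.acos(sum(a * b for a, b in zip(u, h)))
```

In the raw momenta the two soft particles look like close neighbours, which is the failure mode described above. After rescaling, the ordering of the two distances is reversed.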
A quick complexity analysis of a given clustering's Silhouette calculation, however, shows that its computational cost is of the order of $O(N^2 D)$, thus surpassing that of the algorithm itself. For this reason, a simplified Silhouette figure of merit is used, composed of the similarity measure $a(\vec{p}_i)$, the dissimilarity measure $b(\vec{p}_i)$, and the Silhouette index $s(\vec{p}_i)$ for each of the clustered particles:

$$a(\vec{p}_i) = d(\vec{p}_i, \mu_i), \qquad b(\vec{p}_i) = \min_{C_k \neq C_i} d(\vec{p}_i, \mu_k), \qquad s(\vec{p}_i) = \begin{cases} \dfrac{b(\vec{p}_i) - a(\vec{p}_i)}{\max\{a(\vec{p}_i),\, b(\vec{p}_i)\}}, & \text{if } |C_i| > 1, \\ 0, & \text{if } |C_i| = 1, \end{cases} \tag{3}$$

where $C_i$ represents the jet cluster to which particle $\vec{p}_i$ belongs. This way we have managed to reduce its computational cost to $O(N(K-1))$, which scales slower than the overall algorithm. The overall clustering's Silhouette is then obtained by computing the mean of all N particles' Silhouette values:

$$S_K = \frac{1}{N} \sum_i s(\vec{p}_i) . \tag{4}$$

The observables used in the clustering process are the three-momentum vectors of the particles. The dimensionality of the problem is thus constant, with D = 3. Since D is constant, so is log D, and this factor drops out of the calculation of the algorithmic complexity. The computational cost of the algorithm is thus simply O(KN).

As for the classical $k_t$ algorithm benchmark, despite the possible naive $O(N^3)$ or even $O(N^2)$ implementations, it can be cleverly implemented in $O(N \log N)$ by exploiting some of its geometrical and minimum-finding aspects. We now look at the complexities of both the proposed algorithm and the $k_t$ benchmark, in order to understand how the two compare. The newly proposed method becomes of interest only in the regime where the number of reconstructed jets satisfies $K \leq \log N$. Furthermore, it is important to notice that the fixed dimensionality D = 3 affects not only the quantum k-means algorithm, but also its classical counterpart. Indeed, by dropping the D factor, we obtain a complexity of O(KN) for the classical k-means algorithm. Since this is equivalent to that of its quantum analog, the use of this quantum algorithm for real day-to-day jet clustering analysis becomes relevant only if one is able to exploit its dimensionality advantage of log D versus D relative to the classical version. Although no such exploitation is proposed in this work, it should not be discarded, as interesting synergies with other stages of the jet clustering process are a strong possibility.

When measuring the algorithm's jet clustering efficiency, the ideal would be to compare it to the true jet regrouping for any given generated event, giving us information on the parenthood of each final-state particle and enabling us to know which particles should be clustered together. Unfortunately, such Monte-Carlo truth is not available by design. Consequently, as mentioned above, we have chosen to measure the algorithm's performance against that of the classical $k_t$ algorithm.
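Before turning to the efficiency metric itself, the simplified Silhouette of equations (3) and (4), which is used to choose K, can be made concrete with a short sketch. It is an illustration under stated assumptions rather than the code used in this work: the function name is hypothetical, and the distance d is taken to be the plain Euclidean distance, which the text does not specify.

```python
import math


def simplified_silhouette(points, labels, centroids):
    """Mean simplified Silhouette S_K, following equations (3) and (4).

    Each particle is compared with the K centroids only, so the cost is
    proportional to N*K rather than to N^2 as in the full Silhouette.
    """
    k = len(centroids)
    sizes = [labels.count(j) for j in range(k)]
    total = 0.0
    for p, own in zip(points, labels):
        if k < 2 or sizes[own] <= 1:
            s = 0.0  # singleton cluster: s = 0 by definition
        else:
            a = math.dist(p, centroids[own])            # own centroid
            b = min(math.dist(p, centroids[j])          # nearest other one
                    for j in range(k) if j != own)
            s = (b - a) / max(a, b) if max(a, b) > 0.0 else 0.0
        total += s
    return total / len(points)


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

# A sensible grouping (left pair, right pair) ...
good = simplified_silhouette(points, [0, 0, 1, 1], [(0.0, 1.0), (10.0, 1.0)])
# ... versus a poor one (bottom pair, top pair).
poor = simplified_silhouette(points, [0, 1, 0, 1], [(5.0, 0.0), (5.0, 2.0)])
```

Scanning the expected range of K then amounts to clustering once per candidate K and keeping the value with the largest $S_K$.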
For a given clustering output, where the N final-state particles have been sorted into K jets, we compare both algorithms' clustering results on a particle-by-particle basis according to the following efficiency metric, ε:

[Equation (5): definition of ε as a ratio whose numerator and denominator both refer to the $k_t$ clustering; the expression is not recoverable from the extracted text.]

In order to identify the physically meaningful jets out of all the jets found by the $k_t$ algorithm, we apply a minimum transverse momentum ($p_T$) jet cutoff, such that any jet with transverse momentum lower than the set cutoff is discarded.

K versus log N

We have used the PYTHIA Monte-Carlo (version 8.3) to generate the events on which the clustering should be performed (see Appendix for more details on event generation). To study the events' scaling of K versus log N, we have generated both $e^+e^- \to Z \to q\bar{q}$ collision events at a center-of-mass energy of $\sqrt{s} = m_Z$ = […] ± […] GeV/$c^2$, as well as pp collision events at center-of-mass energies of $\sqrt{s}$ = […] TeV and $\sqrt{s}$ = […] TeV. We have also explored pp collision events involving t quarks, given their high jet multiplicity. We have performed clustering on 1000 generated events of each kind, storing both the number of meaningful jets found, K, and the logarithm of the corresponding event's number of final-state particles, log N.

We present the resulting plots in Figures 4 and 5. The blue line dividing each of the plots in half represents the limit where K = log N. Since we are hoping that $K \leq \log N$, ideally one would find the majority of events (red dots) below the blue line, indicating that the number of jets found is indeed smaller than the logarithm of the number of final-state particles involved in the clustering process. From the four plots of Figure 4 it is clear that, regardless of the chosen jet $p_T$ cutoff, the majority of events (red points) already fall below the blue line, thus being in the regime where there is an advantage. Moreover, as expected, the higher we set the jet $p_T$ cutoff, the more events fall below the blue line, since we are allowing only higher-$p_T$ jets to be accounted for, thus lowering the total number of jets found in each event. From the plots of Figure 5, similarly to $e^+e^-$ collisions, the majority of events also fall below the blue line, thus finding themselves within the region of interest. As expected, both of the plots where t quarks are generated show higher jet multiplicities overall, while nevertheless having most of the events fall below the line as well.
It is also important to note that for a higher jet $p_T$ cutoff, higher K levels would become less populated, moving down towards lower K levels. This would lower the overall event multiplicity for all plots, thus resulting in an even higher portion of events falling within the region of $K \leq \log N$. We therefore conclude that a significant majority of events will yield a number of jets $K \leq \log N$, thus confirming our algorithmic complexity advantage.

Figure 4. Plots of the number K of found jets against the logarithm of the number of final-state particles, log N, for four different jet $p_T$ cutoff scenarios in $e^+e^-$ collision generated events. Each red point represents a generated event, where K jets have been found for the logarithm of N particles.

Ideally, when applying the algorithm to the problem of jet clustering, one would want to explore its performance relative to both $e^+e^-$ and pp collision events. However, given its unavoidably rudimentary implementation, the computational load of clustering pp events is simply too high for the local CPU being used. As a result, we postpone these important test scenarios to future work, when a more robust version of the algorithm is likely to be developed.

We therefore ran the proposed algorithm on the same 1000 $e^+e^-$ events as above, studying both its clustering efficiency according to equation (5) and its jet finding distribution, comparing the latter with the one obtained with the $k_t$ algorithm (see Appendix for more details on the $k_t$ clustering parameters and the quantum k-means implementation). The results for a jet $p_T$ cutoff of 8 GeV are presented in Figure 6. From the left histogram, it can be seen that in the overwhelming majority of the clustered events the quantum k-means algorithm found the same jet configurations as the $k_t$ benchmark, with a decreasing fraction of events for lower clustering efficiencies. The overall jet finding efficiency with respect to the $k_t$ algorithm is ε = […]. In the right histogram, the number of jets found by the $k_t$ algorithm ranges from 0 to 3, while the number found by the proposed quantum k-means algorithm ranges between 2 and 5.
This is expected given that the high transverse momentum jet cutoff of $p_T$ = 8 GeV has been applied only to the $k_t$ algorithm, thus resulting in a lower number of overall found jets.

Figure 5. Plots of the number of jets found, K, against the logarithm of the number of final-state particles, log N, for two different center-of-mass energies in pp collision generated events. Each red point represents a generated event, where K jets have been found for the logarithm of N particles.

To better understand the relation between the two algorithms, the applied jet $p_T$ cutoff was lowered to $p_T$ = 1 GeV, with the purpose of artificially imposing a near-zero barrier to the number of meaningful jets found by the $k_t$ algorithm. The resulting plots can be found in Figure 7. As before, a very high efficiency of ε = […].2% has been obtained, showing that even for a significantly larger number of jets found by the $k_t$ algorithm (see right plot), the clustering efficiency has remained nearly as high. Regarding the distribution of the number of jets found, we can now see from both the right-hand histogram and the heatmap plot that there is a strong correlation between the numbers of jets found by the two algorithms.

Figure 6. Histograms of the obtained efficiencies ε (on the left) and number of found meaningful jets (on the right) for a jet $p_T$ cutoff of 8 GeV.

Figure 7. Histograms of the obtained efficiencies ε (on the left), number of found meaningful jets K (middle), and heatmap of the number of jets found by the proposed algorithm against those found by the $k_t$ benchmark, for a jet $p_T$ cutoff of 1 GeV.

In this work, we have introduced the first digital quantum algorithm for jet clustering in high-energy physics, representing an alternative to the existing quantum annealing state-of-the-art, and opening the doors to harnessing the power of future digital quantum processors to address this increasingly complex problem. Our algorithm yields jet reconstruction efficiencies of the order of 93%. Furthermore, we have found a clear correlation with the results of the $k_t$ algorithm in the number of jets found, further validating our quantum algorithm.

It is interesting to note the differences between the two algorithms in the number of events for each K (see Figure 7, right plot), where a larger number of events with smaller K (mostly K = 2) has been found by the quantum algorithm. This can be explained by the choice of the Silhouette figure of merit: the proposed algorithm showed a tendency to systematically output a smaller K. This can be improved by a suitable choice of a figure of merit adequate to the nature of the task at hand. It is also important to realize that, given the efficiency metric used, it is possible that for a portion of the events the quantum algorithm performs better than the $k_t$ algorithm benchmark, which lowers its efficiency relative to the latter when in fact it should be increasing. Although unfortunate, this effect cannot be quantified.

On the other hand, despite being currently on an equal footing with its classical counterpart, the quantum algorithm has been shown to benefit from the fact that K < log N for the majority of events, thus scaling better than its $k_t$ algorithm benchmark. It is nevertheless clear that there is great potential to improve the jet clustering process by exploiting the dimensionality advantage of log D versus D of our quantum algorithm.

Furthermore, our digital quantum algorithm motivates investigating other challenges such as outlier particles (particle remnants that do not belong to any jet) and pileup. The results of these investigations will be reported in future publications.

References

Brüning, O. S. et al. LHC Design Report. CERN Yellow Reports: Monographs (CERN, Geneva, 2004).
Apollinari, G. et al. High-Luminosity Large Hadron Collider (HL-LHC): Technical Design Report V. 0.1. CERN Yellow Reports: Monographs (CERN, Geneva, 2017).
Wei, A. Y., Naik, P., Harrow, A. W. & Thaler, J. Quantum algorithms for jet clustering. Phys. Rev. D, DOI: 10.1103/physrevd.101.094015 (2020).
Pires, D., Omar, Y. & Seixas, J. Adiabatic quantum algorithm for multijet clustering in high energy physics (2020). 2012.14514.
Kopczyk, D. Quantum machine learning for data scientists (2018). 1804.10068.
Abraham, H. et al. Qiskit: An open-source framework for quantum computing, DOI: 10.5281/zenodo.2562110 (2019).
Catani, S., Dokshitzer, Y. L., Seymour, M. & Webber, B. Longitudinally invariant $K_t$ clustering algorithms for hadron hadron collisions. Nucl. Phys. B, 187–224, DOI: 10.1016/0550-3213(93)90166-M (1993).
Lloyd, S. Least squares quantization in PCM. IEEE Transactions on Inf. Theory, 129–137, DOI: 10.1109/TIT.1982.1056489 (1982).
Chekanov, S. A new jet algorithm based on the k-means clustering for the reconstruction of heavy states from jets. The Eur. Phys. J. C, 611–616, DOI: 10.1140/epjc/s2006-02618-3 (2006).
Thaler, J. & Van Tilburg, K. Maximizing boosted top identification by minimizing n-subjettiness. J. High Energy Phys., DOI: 10.1007/jhep02(2012)093 (2012).
Stewart, I. W., Tackmann, F. J., Thaler, J., Vermilion, C. K. & Wilkason, T. F. Xcone: N-jettiness as an exclusive cone jet algorithm. J. High Energy Phys., DOI: 10.1007/jhep11(2015)072 (2015).
Aïmeur, E., Brassard, G. & Gambs, S. Machine learning in a quantum world. In Proceedings of the 19th International Conference on Advances in Artificial Intelligence: Canadian Society for Computational Studies of Intelligence, AI'06, 431–442, DOI: 10.1007/11766247_37 (Springer-Verlag, Berlin, Heidelberg, 2006).
Lloyd, S., Mohseni, M. & Rebentrost, P. Quantum algorithms for supervised and unsupervised machine learning (2013). 1307.0411.
Möttönen, M., Vartiainen, J. J., Bergholm, V. & Salomaa, M. M. Transformation of quantum states using uniformly controlled rotations. Quantum Info. Comput., 467–473 (2005).
Niemann, P., Datta, R. & Wille, R. Logic synthesis for quantum state generation. In […], 247–252, DOI: 10.1109/ISMVL.2016.30 (2016).
Giovannetti, V., Lloyd, S. & Maccone, L. Quantum random access memory. Phys. Rev. Lett., DOI: 10.1103/physrevlett.100.160501 (2008).
Durr, C. & Hoyer, P. A quantum algorithm for finding the minimum (1996). quant-ph/9607014.
Grover, L. K. A fast quantum mechanical algorithm for database search. In Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing, STOC '96, 212–219, DOI: 10.1145/237814.237866 (Association for Computing Machinery, New York, NY, USA, 1996).
Rousseeuw, P. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math., 53–65, DOI: 10.1016/0377-0427(87)90125-7 (1987).
Cacciari, M. & Salam, G. P. Dispelling the $N^3$ myth for the $k_t$ jet-finder. Phys. Lett. B, 57–61, DOI: 10.1016/j.physletb.2006.08.037 (2006).
Sjöstrand, T. et al. An introduction to PYTHIA 8.2. Comput. Phys. Commun., 159–177, DOI: 10.1016/j.cpc.2015.01.024 (2015). 1410.3012.
Particle Data Group. Review of Particle Physics. Prog. Theor. Exp. Phys., DOI: 10.1093/ptep/ptaa104 (2020). 083C01, https://academic.oup.com/ptep/article-pdf/2020/8/083C01/34461960/ptaa104.pdf.
Cacciari, M., Salam, G. P. & Soyez, G. FastJet User Manual. Eur. Phys. J. C, 1896, DOI: 10.1140/epjc/s10052-012-1896-2 (2012). 1111.6097.

Acknowledgements

The authors would like to thank Akshat Kumar, Duarte Magano, Adam Glos and Jesse Thaler for their valuable feedback. YO thanks the support from Fundação para a Ciência e a Tecnologia (Portugal), namely through project UIDB/50008/2020, as well as from projects TheBlinQC and QuantHEP supported by the EU H2020 QuantERA ERA-NET Cofund in Quantum Technologies and by FCT (QuantERA/0001/2017 and QuantERA/0001/2019, respectively), and from the EU H2020 Quantum Flagship project QMiCS (820505). JS would like to thank the support of FCT under contracts CERN/FIS-COM/0036/2019 and UIDB/04540/2020.

Appendix

PYTHIA Event Generation

For the event generation, as already mentioned, we have used PYTHIA, where different event generation settings have been used for $e^+e^-$ and pp collisions. Starting with $e^+e^-$ collision events, we have studied events of the type $e^+e^- \to Z \to q\bar{q}$, such that all Z decays have been switched off, with only those to quarks having been manually switched on. As implied by the center-of-mass energy used, $\sqrt{s} = m_Z$, only $q\bar{q} \in \{u\bar{u}, d\bar{d}, c\bar{c}, s\bar{s}, b\bar{b}\}$ decays have been allowed, since the t quark is too massive for the center-of-mass energy used here ($m_t > \sqrt{s} = m_Z$).

For pp collisions, all hard QCD processes were switched on through the flag HardQCD:all = on. Moreover, a minimum invariant $p_T$ threshold of 200 GeV was set through the parameter PhaseSpace:pTHatMin = 200. For the t quark processes studied, all hard QCD processes were turned off, while t quark processes were switched on through the flag Top:all = on. No further constraints or settings were imposed besides the collision's center-of-mass energy definition of $\sqrt{s} \in \{[…], […]\}$ TeV, thus resembling LHC conditions.

$k_t$ Clustering Parameters

The $k_t$ clustering algorithm has been implemented and used through the FastJet software package. By using the JetDefinition jet_def(kt_algorithm, R), the $k_t$ clustering algorithm has been selected and chosen to run with an R parameter of R = […].8. It is of interest to note that for values of R ∈ [[…], […]], the clustering efficiency of the quantum algorithm has remained constant. The $k_t$ algorithm received as input the momentum coordinates of the final-state particles of the PYTHIA generated events, plus their corresponding energy. When outputting the jets found, jet $p_T$ cutoffs of $p_T$ ∈ [[…], […]] GeV were applied by selecting only those jets with $p_T$ higher than the defined cutoff, again without much variation in the resulting efficiencies.

Quantum k-means Implementation

Taking into account the current cloud-available Quantum Processing Unit (QPU) hardware constraints, we have chosen to use the local CPU in order to simulate QPU behaviour with the help of Qiskit's appropriate software packages, Terra and Aer. The proposed quantum k-means algorithm should be seen as a hybrid algorithm, meaning that only a portion of it is performed on a digital quantum computer, with the rest of it running on a classical machine. In this particular case, only the distance computation between particles and centroids has been implemented to run under Qiskit's QPU behaviour simulation package, while the remainder of the algorithm has been coded from scratch to be classical.

Given the inherently large nature of jet clustering problems in HEP and the fact that IBM's qasm_simulator feature (which allows QPU behaviour simulation on the local CPU) is prepared for the simulation of a maximum of 32 qubits, we have chosen to implement one single 5-qubit SwapTest circuit, running […] ∼24 qubits on the classical machine that is being used (15-inch, 2017 MacBook Pro, with a 3.1 GHz Intel quad-core i7 CPU and 16 GB of RAM).

Even though it is trivial to initialize the control qubit, given that it corresponds simply to a one-qubit register $|0\rangle$, the same cannot be said of the qubit registers $|\psi\rangle$ and $|\phi\rangle$. Indeed, given their definitions in (1), one has to put a bit of effort into their preparation. Fortunately, Qiskit allows for a very straightforward and useful initial state definition, simply by feeding it each of the states' amplitudes. Let us start with the simpler case of the quantum state $|\phi\rangle$. Given its definition, we can write:

$$|\phi\rangle = \frac{1}{\sqrt{Z}}\left(\|\vec{p}_i\|\,|0\rangle - \|\mu_k\|\,|1\rangle\right) = \frac{\|\vec{p}_i\|}{\sqrt{Z}}\begin{bmatrix}1\\0\end{bmatrix} - \frac{\|\mu_k\|}{\sqrt{Z}}\begin{bmatrix}0\\1\end{bmatrix} = \frac{1}{\sqrt{Z}}\begin{bmatrix}\|\vec{p}_i\|\\ -\|\mu_k\|\end{bmatrix} \tag{6}$$

On the other hand, assuming Amplitude Encoding, when it comes to the quantum state $|\psi\rangle$, one first needs to adequately express the quantum states corresponding to particle $\vec{p}_i$ and jet centroid $\mu_k$, $|\vec{p}_i\rangle$ and $|\mu_k\rangle$. We thus write, for an arbitrary data point $a$:

$$|a\rangle = \frac{1}{|a|}\sum_{i=1}^{D} a_i\,|i\rangle = \frac{a_x}{|a|}\,|1\rangle + \frac{a_y}{|a|}\,|2\rangle + \frac{a_z}{|a|}\,|3\rangle = \frac{1}{|a|}\begin{bmatrix}a_x\\a_y\\a_z\end{bmatrix} \tag{7}$$

As such, for the quantum state $|\psi\rangle$, we can now write:

$$|\psi\rangle = \frac{1}{\sqrt{2}}\left(|0,\vec{p}_i\rangle + |1,\mu_k\rangle\right) = \frac{1}{\sqrt{2}}\left(\begin{bmatrix}1\\0\end{bmatrix} \otimes \frac{1}{\|\vec{p}_i\|}\begin{bmatrix}p_x\\p_y\\p_z\end{bmatrix} + \begin{bmatrix}0\\1\end{bmatrix} \otimes \frac{1}{\|\mu_k\|}\begin{bmatrix}\mu_{kx}\\\mu_{ky}\\\mu_{kz}\end{bmatrix}\right) = \frac{1}{\sqrt{2}}\begin{bmatrix}\dfrac{p_x}{\|\vec{p}_i\|} & \dfrac{p_y}{\|\vec{p}_i\|} & \dfrac{p_z}{\|\vec{p}_i\|} & \dfrac{\mu_{kx}}{\|\mu_k\|} & \dfrac{\mu_{ky}}{\|\mu_k\|} & \dfrac{\mu_{kz}}{\|\mu_k\|}\end{bmatrix}^{T} \tag{8}$$

Now, through equations (6) and (8), one can simply take the respective vectors' elements and feed them to the Qiskit initialize() function, thus successfully generating the desired quantum states $|\psi\rangle$ and $|\phi\rangle$.
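As a rough illustration of how the amplitude lists of equations (6) and (8) might be assembled before being handed to Qiskit, the sketch below builds them in plain Python. The function names are hypothetical. One assumption goes beyond the equations as written: each normalized three-vector is padded with a zero so that $|\psi\rangle$ has $2^3 = 8$ amplitudes, because a register of n qubits needs a state vector of length $2^n$.

```python
import math


def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def phi_amplitudes(p, mu):
    """Amplitudes of |phi> from equation (6): a single-qubit state."""
    z = norm(p) ** 2 + norm(mu) ** 2
    return [norm(p) / math.sqrt(z), -norm(mu) / math.sqrt(z)]


def psi_amplitudes(p, mu):
    """Amplitudes of |psi> from equation (8), zero-padded to length 8.

    Assumption: each normalized 3-vector is padded with one zero so that it
    fits a 2-qubit register; the first four entries then belong to the
    flag value 0 (particle) and the last four to flag value 1 (centroid).
    """
    p_hat = [x / norm(p) for x in p] + [0.0]
    mu_hat = [x / norm(mu) for x in mu] + [0.0]
    return [x / math.sqrt(2) for x in p_hat + mu_hat]


p_i = (1.0, 2.0, 2.0)    # illustrative particle momentum, |p| = 3
mu_k = (0.0, 3.0, 4.0)   # illustrative centroid, |mu| = 5

phi = phi_amplitudes(p_i, mu_k)
psi = psi_amplitudes(p_i, mu_k)
```

Both lists are normalized, which is what a state-initialization routine expects. With these sizes, one control qubit, three qubits for $|\psi\rangle$ and one for $|\phi\rangle$ add up to the 5-qubit circuit mentioned above. When passing `psi` to a Qiskit circuit's `initialize` method, note that Qiskit orders qubits little-endian, so with this layout the flag that distinguishes particle from centroid would be the highest-index qubit of the three-qubit register.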