Deep Neural Network Discrimination of Multiplexed Superconducting Qubit States
Benjamin Lienhard, Antti Vepsäläinen, Luke C. G. Govia, Cole R. Hoffer, Jack Y. Qiu, Diego Ristè, Matthew Ware, David Kim, Roni Winik, Alexander Melville, Bethany Niedzielski, Jonilyn Yoder, Guilhem J. Ribeill, Thomas A. Ohki, Hari K. Krovi, Terry P. Orlando, Simon Gustavsson, William D. Oliver
Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Quantum Engineering and Computing Group, Raytheon BBN Technologies, Cambridge, MA 02138, USA
MIT Lincoln Laboratory, Lexington, MA 02421, USA
(Dated: February 26, 2021)
Demonstrating the quantum computational advantage will require high-fidelity control and readout of multi-qubit systems. As system size increases, multiplexed qubit readout becomes a practical necessity to limit the growth of resource overhead. Many contemporary qubit-state discriminators presume single-qubit operating conditions or require considerable computational effort, limiting their potential extensibility. Here, we present multi-qubit readout using neural networks as state discriminators. We compare our approach to contemporary methods employed on a quantum device with five superconducting qubits and frequency-multiplexed readout. We find that fully-connected feedforward neural networks increase the qubit-state-assignment fidelity for our system. Relative to contemporary discriminators, the assignment error rate is reduced by up to 25 % due to the compensation of system-dependent nonidealities such as readout crosstalk, which is reduced by up to one order of magnitude. Our work demonstrates a potentially extensible building block for high-fidelity readout relevant to both near-term devices and future fault-tolerant systems.
I. INTRODUCTION
Quantum computers hold the promise to solve particular computational tasks substantially faster than conventional computers [1, 2]. Depending on the computational task, such quantum devices need to be composed of hundreds to millions of high-fidelity qubits. An increase from a few to many qubits is generally accompanied by the challenge of maintaining low error rates for qubit control and readout.
Over the past two decades, superconducting qubits have emerged as a leading quantum computing platform [3, 4]. Today, individual qubits with coherence times exceeding 100 µs [5], gate times of a few tens of nanoseconds [6], and single- and two-qubit gate operation fidelities above the most lenient thresholds for quantum error correction have been demonstrated for devices with up to 50 qubits [6, 7]. However, considerable work is still needed to retain and even further improve these fidelities as systems increase in size and complexity [8].
Errors arise during all stages of the circuit model: initialization [9, 10], computation [11, 12], and readout [13]. In many implementations, qubit readout plays a key role beyond merely measuring the computational output. For example, quantum error correction protocols require repeated readout of syndrome qubits [8, 14, 15]. Even without error correction, many of the noisy intermediate-scale quantum (NISQ) [16] era algorithms involve an iterative optimization that generates a target quantum state based on prior trial-state measurements of qubits [17, 18]. In addition, diagnosing qubit-readout errors in post-processing requires computationally expensive statistical analyses of repeated computation and measurement [6, 19, 20]. Developing accurate and resource-efficient qubit-state readout is key to realizing useful quantum information processing tasks.
In this work, we present machine-learning-enabled qubit-state discrimination.
We evaluate the qubit-state discrimination performance of deep neural networks (DNN) relative to contemporary methods used for superconducting qubits. Nonlinear filters such as DNNs can better cope with system-dependent nonidealities, such as readout crosstalk. To evaluate these different qubit-state discriminator techniques, we use a quantum system comprising five frequency-tunable transmon qubits read out simultaneously via a common feedline using a standard frequency multiplexing approach. In contrast to single-qubit readout, such a multi-qubit system is subject to nonidealities, such as readout crosstalk, that may benefit from more sophisticated discriminators. We show that a DNN classifier can efficiently converge to a higher-performing multi-qubit discriminator with sufficient training. In our five-qubit system, we show that qubit-state assignment errors are reduced by up to 25 % for multi-qubit architectures sharing a readout transmission line [6, 21, 22]. By examining the qubit-state assignment performance using a confusion matrix and the cross-fidelity metric, we attribute the reduction to the DNN compensating for crosstalk.
It has been shown that neural networks can learn the quantum evolution of a single superconducting qubit using merely measurement data and without introducing the rules of quantum mechanics [23]. Statistical learning algorithms have been applied to superconducting qubit readout in the form of support vector machines [20], hidden Markov models [24], or a reservoir computing approach [25]. Using DNNs, improved single-qubit readout fidelity has previously been demonstrated for trapped ions and spin qubits [26–28]. In this manuscript, we extend the application of neural networks to superconducting qubit readout and, more generally, to dispersive qubit readout. Furthermore, we demonstrate readout discrimination using a DNN of multiple simultaneously read out qubits on a single feedline. While we apply our methods to a superconducting qubit system, we anticipate that they will generalize to other platforms.
FIG. 1. Measurement Setup and Chip. (a) Schematic of superconducting qubit control and readout. The control and readout pulses, generated by an arbitrary waveform generator (AWG) and up-converted to GHz frequencies using a local oscillator (LO), are sent through attenuated signal lines to the readout resonator on the five-qubit chip. The transmitted readout signal is amplified by a Josephson traveling-wave parametric amplifier (JTWPA), a high-electron-mobility transistor (HEMT), and a room-temperature amplifier. Subsequently, the signal is down-converted to MHz frequencies and digitized into in-phase $I_{\mathrm{IF}}[n]$ and quadrature $Q_{\mathrm{IF}}[n]$ sequences at intermediate frequencies (IF). (b) Colored optical micrograph and (c) circuit schematic of the chip comprising five superconducting transmon qubits. The qubit transition frequencies are tuned via a global flux bias. Each qubit is capacitively coupled to a quarter-wave readout resonator that couples inductively to a bandpass (Purcell) filtered feedline. (d) The resonator frequencies $\omega_{\mathrm{Res}}/2\pi$ are near 7 GHz with $\chi/\kappa_{\mathrm{eff}}$ ratios ranging from 0.12 to 0.19, where $\chi$ and $\kappa_{\mathrm{eff}}$ are respectively the dispersive shift and the effective resonator decay rate through the feedline. Table of the qubit lifetimes ($T_1$) and operating frequencies ($\omega_{\mathrm{Qubit}}/2\pi$). Qubit colors indicate the qubit operating frequency: red (purple) → lowest (highest) operating frequency.
Qubit                          1      2      3      4      5
$\omega_{\mathrm{Res}}/2\pi$ (GHz)      7.06   7.10   7.15   7.20   7.25
$\omega_{\mathrm{Qubit}}/2\pi$ (GHz)    5.09   4.40   5.00   4.30   5.17
$T_1$ (µs)                     40.8   6.40   21.4   11.8   23.4
II. SUPERCONDUCTING QUBIT READOUT
Superconducting qubit readout is generally performed today under the paradigm of circuit quantum electrodynamics (cQED) in the dispersive regime [29]. Here, the qubit is coupled to a far-detuned resonator, such that their interaction can be treated perturbatively. The leading-order effect on the resonator is a qubit-state-dependent frequency shift $\hat{H}_{\mathrm{disp}} = \chi \hat{a}^{\dagger} \hat{a} \hat{\sigma}_z$, where $\hat{a}$ is the resonator lowering operator, $\hat{\sigma}_z$ the Pauli-Z operator describing the qubit state, and $\chi$ the dispersive frequency shift. As a result, a coherent microwave signal incident on the resonator acquires a qubit-state-dependent phase shift upon transmission or reflection. The readout resonator population has to remain below a critical photon number, typically tens to hundreds of photons, to remain in the dispersive readout regime. Low-noise cryogenic preamplification, consisting of a Josephson traveling-wave parametric amplifier (JTWPA) [30] at the mixing chamber (20 mK) and a high-electron-mobility transistor (HEMT) at 3 K, is used to improve the signal-to-noise ratio (SNR). Subsequent heterodyne detection and digitization of the amplified signal imprints the information of the qubit state in the in-phase ($I$) and quadrature ($Q$) components of the output signal, as depicted in Fig. 1(a).
FIG. 2. Measurement Data Processing and Discrimination. (a) Superconducting qubit-state discrimination can be accomplished using a single-qubit matched filter (MF) with kernel $k_i[n]$, which serves as a windowing function that projects the readout signals onto a single axis, followed by discriminator threshold optimization (no pulse applied, denoted by ∅, qubit initialized in the ground state: ∅ → $|0\rangle$, labeled as 0; π-pulse applied, denoted by π, qubit initialized in the excited state: π → $|1\rangle$, labeled as 1). We analyze (b) single-qubit linear support vector machines (SQ-LSVM), (c) multi-qubit LSVMs (MQ-LSVM), and (d) fully-connected feedforward neural networks (NN) as alternatives to MFs. The qubit-state-assignment fidelity of the MF and LSVM is maximized if the intermediate-frequency signal ($z_{\mathrm{IF}}[n] = I_{\mathrm{IF}}[n] + jQ_{\mathrm{IF}}[n]$) is digitally demodulated (e.g., for resonator 1: $z_{\mathrm{IF}}[n] \mathbin{.*} e^{-j\omega_{\mathrm{IF1}} n} = I[n] + jQ[n]$, with $.*$ indicating an element-wise multiplication). The training data is relabelled to train five parallel single-qubit discriminators (MF, SQ-LSVM). The training data can either be limited to measurements during which spectator qubits are kept in their ground state (denoted by ∅) or include all combinations of the ground and excited states (symbolized by ∗). The MQ-LSVM as a single multi-qubit discriminator requires the digitally demodulated data to be stacked and concatenated to form a single data block. The feedforward NN does not require any digital demodulation or preprocessing.
For multi-qubit systems, there are three main qubit-state-readout approaches. First, each qubit can be measured with a separate readout resonator, feedline, and amplifier chain, a resource-intensive approach with minimal crosstalk. Alternatively, more resource-efficient readout architectures have several qubits coupled to a single readout resonator [31] or use frequency-multiplexed readout signals from multiple readout resonators [32] sharing a single feedline and amplifier chain [33]. In many contemporary architectures, Purcell filters are added to further reduce residual off-resonant energy decay from the qubits to the resonators [34, 35].
For a qubit with static coupling to its readout resonator, energy decay and excitation during the readout are typically the primary sources of qubit measurement errors. In addition, a frequency-multiplexed readout signal contains state information on multiple qubits and is susceptible to crosstalk-induced qubit-state-readout errors.
Such crosstalk errors occur due to intrinsic interactions between the qubits themselves, qubits coupling parasitically to the readout resonators associated with other qubits, or insufficient spectral separation between readout frequencies [21].
As a result of crosstalk, state transitions due to decoherence, and other nonidealities [36], multi-qubit heterodyne signals are more complicated than for single qubits, making state discrimination more challenging. There has been significant progress in reducing error rates and measurement times for both single- and multi-qubit devices [21, 37]. However, managing, classifying, and extracting useful information from the measured signal remains an important challenge in light of the complex error mechanisms, such as crosstalk, introduced by multiplexed readout at scale.
Here, we focus on multiple frequency-tunable transmon qubits [38] arranged in a linear array with operating frequencies $\omega_{\mathrm{Qubit}}/2\pi$ between 4.3 GHz and 5.2 GHz and lifetimes $T_1$ ranging from 7 µs to 40 µs (see supplementary information [39] for additional details). The qubits are connected via individual co-planar waveguide resonators to the same Purcell-filtered feedline, as depicted in Fig. 1(b,c). The frequency-multiplexed readout tone comprises superposed baseband signals at intermediate frequencies (IF) between 10 MHz and 150 MHz, up-converted to the individual readout resonator frequencies $\omega_{\mathrm{Res}}$. After passing the feedline, the transmitted and phase-shifted tones are down-converted to IF. Up- and down-conversion is conducted with a shared local oscillator at 7.127 GHz. Lastly, the down-converted $I$- and $Q$-components of the signal are digitized with a 2 ns sampling period. The resulting sequences, $I_{\mathrm{IF}}[n]$ and $Q_{\mathrm{IF}}[n]$, are subsequently digitally processed, which is the focus of this work, to extract the individual qubit states.
III. QUBIT-STATE DISCRIMINATION
We employ supervised machine learning methods to improve superconducting qubit-state readout. This requires a classifier capable of distinguishing the qubit-state-dependent phase shift encoded in the discrete-time $I_{\mathrm{IF}}[n]$ and $Q_{\mathrm{IF}}[n]$ sequences. This section also reviews the current approaches to state discrimination, which we use as comparative benchmarks.
Boxcar filters average, with equal weights, the digitally demodulated elements of the $I_{\mathrm{IF}}[n]$ and $Q_{\mathrm{IF}}[n]$ discrete-time readout signal. The digital demodulation employed here is further elaborated in the supplementary material [39]. Each boxcar-filtered, digitally demodulated sequence $I[n]$ and $Q[n]$ results in a single two-dimensional data point in the $IQ$-plane [4]. Subsequently, the resulting data set can be further processed and discriminated, for example with a support vector machine (see supplementary materials [39]).
Matched filter (MF) windows are generalized windowing functions with each element optimized to maximize the SNR within a given system noise model [40]. The boxcar window is the simplest example of a filter in the absence of such a noise model. For additive stationary noise independent of the qubit state and diagonal Gaussian covariance matrices, the optimal filter in terms of the SNR uses a "window" or "kernel" proportional to the difference between the mean ground- and excited-state readout signals. This filter is referred to as a "matched filter" in Ref. [41], a "mode matched filter" in Ref. [21], or "Fisher's linear discriminant" in the context of statistics and machine learning [42]. Applying such a matched filter reduces each single-shot readout measurement to a single one-dimensional value dependent on the qubit-state-dependent phase, allowing the qubit states to be discriminated by a simple threshold classifier.
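The demodulation, kernel, and threshold steps described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the synthetic traces, the phase offsets of ±0.4 rad, the intermediate frequency `OMEGA_IF`, and all function names are assumptions made for illustration, and the kernel is left unnormalized, which corresponds to assuming identical noise on every sample.

```python
import cmath
import random


def demodulate(z_if, omega_if):
    """Digitally demodulate: multiply element-wise by exp(-j * omega_if * n)."""
    return [z * cmath.exp(-1j * omega_if * n) for n, z in enumerate(z_if)]


def mean_trace(traces):
    """Element-wise mean over a list of equal-length complex traces."""
    count = len(traces)
    return [sum(column) / count for column in zip(*traces)]


def matched_filter_kernel(ground_traces, excited_traces):
    """Kernel proportional to the difference of the mean excited and ground traces."""
    mean_g = mean_trace(ground_traces)
    mean_e = mean_trace(excited_traces)
    return [e - g for g, e in zip(mean_g, mean_e)]


def project(trace, kernel):
    """Reduce one single-shot trace to a scalar: Re(sum_n conj(k[n]) * z[n])."""
    return sum((k.conjugate() * z).real for k, z in zip(kernel, trace))


def best_threshold(scores_g, scores_e):
    """Scan candidate thresholds and keep the one maximizing the assignment fidelity."""
    best_t, best_f = 0.0, -1.0
    for t in sorted(scores_g + scores_e):
        p1_given_ground = sum(s >= t for s in scores_g) / len(scores_g)
        p0_given_excited = sum(s < t for s in scores_e) / len(scores_e)
        fidelity = 1.0 - 0.5 * (p1_given_ground + p0_given_excited)
        if fidelity > best_f:
            best_t, best_f = t, fidelity
    return best_t, best_f


def synthetic_trace(rng, phase, omega_if, n_samples=50, noise=1.0):
    """Hypothetical readout trace: an IF tone with a state-dependent phase plus noise."""
    return [
        cmath.exp(1j * (omega_if * n + phase))
        + complex(rng.gauss(0.0, noise), rng.gauss(0.0, noise))
        for n in range(n_samples)
    ]


rng = random.Random(0)
OMEGA_IF = 0.3  # assumed intermediate frequency, in radians per sample
ground = [demodulate(synthetic_trace(rng, -0.4, OMEGA_IF), OMEGA_IF) for _ in range(200)]
excited = [demodulate(synthetic_trace(rng, +0.4, OMEGA_IF), OMEGA_IF) for _ in range(200)]

kernel = matched_filter_kernel(ground, excited)
scores_g = [project(trace, kernel) for trace in ground]
scores_e = [project(trace, kernel) for trace in excited]
threshold, fidelity = best_threshold(scores_g, scores_e)
print(f"threshold = {threshold:.2f}, training fidelity = {fidelity:.3f}")
```

The fidelity printed here is evaluated on the same traces used to build the kernel, so it is a training figure only; a held-out test set would be needed for a fair estimate.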
Here, we refer to a discriminator composed of a matched filter [41] and subsequently optimized threshold as MF.
While MFs are computationally efficient and provably optimal (for stationary noise) for single qubits, the computational complexity to derive multi-qubit MFs scales exponentially in the number of qubits, $N$ [43]. Consequently, in practice, multi-qubit readout is conducted per qubit with individually optimized single-qubit MFs. This approach, used in many contemporary single- and multi-qubit readout schemes [6, 21, 41, 44, 45], does not account for noise sources and nonidealities present in multi-qubit systems.
The MF kernel $k_i[n]$ is equal to the difference between the mean ground- and excited-state readout signal normalized by its standard deviation, which must be measured experimentally using calibration runs with known qubit states. In our setup, the highest qubit-state-assignment fidelity for MFs is achieved using time traces recorded with the other qubits (spectator qubits) initialized in their ground states, as depicted in Fig. 2(a). This is a consequence of the simple noise model presumed for the MF, and thus, the MF discriminator does not capture multi-qubit readout crosstalk. In this paper, we use the MF as a baseline against which to compare the following methods (see the supplementary materials [39] for other variations of all the methods).
Support vector machines (SVM) are quadratic programs [46, 47] with the objective to maximize the distance between each data point and a decision boundary, a learned hyperplane separating two distinct classes. SVMs are a purely geometric approach to discrimination. For a single superconducting qubit, it has been reported that SVMs generate decision boundaries superior to those of MFs, as realistic noise deviates from the simple single-qubit noise model assumed for the MF [20].
Similar to the MF approach, multi-qubit-state discrimination can be conducted using an SVM classifier per qubit-readout signal.
In contrast to our MF tune-up, we find that the highest assignment fidelity is achieved when the SVMs are trained using qubit-state measurement traces with the spectator qubits prepared in all combinations of ground and excited states.
Alternatively, multi-qubit states can be discriminated by a single SVM composed of several hyperplanes that partition the full multidimensional $IQ$-space, shown in Fig. 2(c). Such a multi-qubit SVM can be tuned using a "one-versus-all" strategy. We solve $2^N$ ($N$, the number of qubits) two-class discrimination problems with a single qubit state as one class and the remaining qubit states as the other. In our analysis, linear SVMs (LSVM) used as parallel single- and multi-qubit discriminators outperform their nonlinear counterparts in robustness, computational efficiency, and assignment fidelity [39].
Deep neural networks (DNN) are mapping functions composed of arbitrarily connected nodes arranged in layers [48]. Depending on the layer organization and the functions governing the connections between nodes, different neural network archetypes can be generated. Here, we investigate three of the most common and successful DNNs: fully-connected feedforward neural networks, convolutional neural networks, and recurrent neural networks. We find that a fully-connected feedforward neural network (FNN), implemented in PyTorch [49], outperforms the other network architectures in qubit-state-assignment fidelity. Our FNN architecture is composed of three hidden layers (the 1st, 2nd, and 3rd layers consist of 1000, 500, and 250 nodes, respectively) that use SELU activation functions [50], and a softmax applied to the $2^N$-node output layer. The network is trained (validation-training set ratio of 0.35) using the Adam optimizer [51] with categorical cross-entropy as the loss function.
In contrast to the MF and LSVM, the FNN can directly discriminate the frequency-multiplexed multi-qubit readout sequences $I_{\mathrm{IF}}[n]$ and $Q_{\mathrm{IF}}[n]$ without demodulation or filtering.
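To make the stated layer sizes concrete, the following plain-Python sketch tallies the trainable parameters of such a network and spells out the SELU and softmax functions. It is illustrative only and does not reproduce the PyTorch model: the input width assumes that the $I$ and $Q$ samples of a 1 µs trace at a 2 ns sampling period (500 samples each) are simply concatenated, and the mapping from an output-node index to a bit string is a hypothetical labeling convention.

```python
import math

# SELU constants from the self-normalizing network definition.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def selu(x):
    """Scaled exponential linear unit applied to a single value."""
    return SELU_SCALE * (x if x > 0 else SELU_ALPHA * (math.exp(x) - 1.0))


def softmax(logits):
    """Numerically stable softmax over a list of output-layer values."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def layer_sizes(n_qubits, n_samples):
    """Input (I and Q concatenated, an assumption), three hidden layers, 2^N outputs."""
    return [2 * n_samples, 1000, 500, 250, 2 ** n_qubits]


def count_parameters(sizes):
    """Weights plus biases of consecutive fully-connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def label_to_bits(label, n_qubits):
    """Hypothetical convention: class index written as an N-bit string."""
    return format(label, f"0{n_qubits}b")


sizes = layer_sizes(n_qubits=5, n_samples=500)
print("layer sizes:", sizes)
print("trainable parameters:", count_parameters(sizes))

probabilities = softmax([0.1 * k for k in range(sizes[-1])])
assigned = max(range(len(probabilities)), key=probabilities.__getitem__)
print("assigned configuration:", label_to_bits(assigned, 5))
```

Under these assumptions the first layer dominates the parameter count, so shortening the measurement window reduces the model size almost proportionally.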
Training the network directly on the multiplexed readout signal bypasses the need for further preprocessing stages, suggesting a more efficient use of the measurement output, as illustrated in Fig. 2(d). In addition, fewer independent operations in the readout chain may reduce the possibility of systematic errors.
IV. RESULTS
We now present our five-qubit readout experiment results, comparing the performance of parallelized single-qubit MFs, parallelized single-qubit LSVMs (SQ-LSVM), multi-qubit LSVM (MQ-LSVM), and FNN approaches. The same qubit-readout sequences $I_{\mathrm{IF}}[n]$ and $Q_{\mathrm{IF}}[n]$, with varying amounts of preprocessing [Fig. 2], are used for all approaches. We compare the discrimination results, a five-bit string with each bit representing the assigned state of a qubit. The qubit-state-assignment fidelity for qubit $i$ is
$$\mathcal{F}_i = 1 - \left[ P(0_i | \pi_i) + P(1_i | \varnothing_i) \right] / 2, \qquad (1)$$
where $P(0_i | \pi_i)$ is the conditional probability of assigning the ground state with label 0 to qubit $i$ when prepared in the excited state with a π-pulse applied. $P(1_i | \varnothing_i)$ is the conditional probability of assigning the excited state with label 1 to qubit $i$ when prepared in the ground state (no pulse applied: ∅).
The data to train and evaluate the discriminator performance was acquired using the five-qubit chip introduced in Fig. 1(b,c). For five qubits, all 32 qubit-state permutations are sequentially initialized and the measurement output is recorded. The generated data set contains 50,000 single-shot sequences $I_{\mathrm{IF}}[n]$ and $Q_{\mathrm{IF}}[n]$ recorded over 2 µs for each qubit-state configuration. The recorded data set is subsequently divided into a randomized training and test set (15,000 traces per qubit-state configuration for training and 35,000 for testing). All of the following results are evaluated using 35,000 single-shot measurements per qubit-state configuration.
We quantify the assignment fidelity per qubit using the geometric mean assignment fidelity,
$$\mathcal{F}_{\mathrm{GM}} = \left( \mathcal{F}_1 \mathcal{F}_2 \mathcal{F}_3 \mathcal{F}_4 \mathcal{F}_5 \right)^{1/5}, \qquad (2)$$
with each qubit-state-assignment fidelity defined by Eq. 1. Both SVM approaches improve the assignment fidelity relative to the MF, with the parallelized single-qubit SVM outperforming the multi-qubit approach by 0.[…] […] µs-measurement time.
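As a worked illustration of Eqs. 1 and 2, the sketch below computes a per-qubit assignment fidelity from prepared and assigned labels and then a geometric mean over five qubits. The label lists and the five fidelity values are hypothetical numbers chosen only to exercise the formulas; they are not measurement data from this work.

```python
import math


def assignment_fidelity(prepared, assigned):
    """Eq. (1): F = 1 - [P(0 | pi) + P(1 | no pulse)] / 2 for a single qubit.

    `prepared` and `assigned` are equal-length lists of 0 (ground) and 1 (excited).
    """
    n_excited = sum(1 for p in prepared if p == 1)
    n_ground = len(prepared) - n_excited
    wrong_excited = sum(1 for p, a in zip(prepared, assigned) if p == 1 and a == 0)
    wrong_ground = sum(1 for p, a in zip(prepared, assigned) if p == 0 and a == 1)
    return 1.0 - 0.5 * (wrong_excited / n_excited + wrong_ground / n_ground)


def geometric_mean_fidelity(fidelities):
    """Eq. (2): the N-th root of the product of the per-qubit fidelities."""
    return math.prod(fidelities) ** (1.0 / len(fidelities))


# Hypothetical single-qubit record: 100 ground and 100 excited preparations.
prepared = [0] * 100 + [1] * 100
assigned = [0] * 97 + [1] * 3 + [0] * 8 + [1] * 92
single_qubit_fidelity = assignment_fidelity(prepared, assigned)
print(f"F_i = {single_qubit_fidelity:.3f}")

# Hypothetical per-qubit fidelities with one weak qubit.
example_fidelities = [0.97, 0.74, 0.96, 0.95, 0.98]
f_gm = geometric_mean_fidelity(example_fidelities)
print(f"F_GM = {f_gm:.3f}")
```

Because the geometric mean never exceeds the arithmetic mean, a single low-fidelity qubit lowers $\mathcal{F}_{\mathrm{GM}}$ more than it would lower a simple average.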
For multi-class discriminators such as the MQ-LSVM, geometric constraints result in ambiguous regions without a unique class assigned [52], which leads to poor performance relative to the other approaches. After a 1 µs-long measurement time, the FNN, compared to the MF, increases the qubit-state-assignment fidelity from 0.885 to 0.913 and thus reduces the single-qubit assignment error, $1 - (1 - \mathcal{F}_{\mathrm{FNN}})/(1 - \mathcal{F}_{\mathrm{MF}})$, by 0.244. Compared to the SQ-LSVM, the FNN increases the qubit-state-assignment fidelity from 0.905 to 0.913 and thus reduces the single-qubit assignment error by 0.[…] […] µs-measurement time and 10,000 training samples per qubit-state configuration.
FIG. 3. Qubit-State-Assignment Fidelity. (a) Geometric mean qubit-state-assignment fidelity $\mathcal{F}_{\mathrm{GM}}$ (Eq. 2) for five qubits versus measurement time for the matched filter (MF), single-qubit linear support vector machine (SQ-LSVM), multi-qubit linear SVM (MQ-LSVM), and the fully-connected feedforward neural network (FNN). (b) $\mathcal{F}_{\mathrm{GM}}$ versus the number of training instances for each of the 32 qubit-state configurations evaluated after a measurement time of 1 µs [vertical dashed-dotted line in (a)]. (c) Achievable assignment fidelity $\mathcal{F}_{\mathrm{assignment}}$ per qubit when $N = \{1, 2, \ldots, 5\}$ qubits are simultaneously discriminated after a 1 µs-measurement time. For each $N$-qubit discrimination task, the spectator qubits are initialized in their ground state. Single-qubit discrimination ($N = 1$): the first data point of each of the five panels represents the single-qubit $\mathcal{F}_{\mathrm{assignment}}$ defined by Eq. 1, while the states of the four spectator qubits are not discriminated and initialized in their ground state. When employed as single-qubit discriminators, all methods perform similarly. Two-qubit discrimination ($N = 2$): the following four data points show $\mathcal{F}_{\mathrm{assignment}}$ when the state of each panel's qubit is simultaneously discriminated with the state of one other qubit. $N$-qubit discrimination ($N >$ […] $N -$ […]). For each $N$-qubit discrimination task, the non-spectator qubits are indicated with a colored square at the graph bottom.
The assignment fidelity per qubit, discriminated individually and in parallel with up to $N = 5$ qubits, is presented in Fig. 3(c). For $N$-qubit discrimination tasks with $N > 2$, the FNN starts outperforming its discriminator alternatives. Except for qubit 2, the per-qubit assignment fidelity decreases with an increasing number of discriminated qubits. We observe a more substantial assignment fidelity decrease if the resonators involved in the discrimination are proximal in frequency, suggesting the occurrence of readout crosstalk. In addition to readout crosstalk, qubit 3 reveals control crosstalk with qubits 1 and 5, the qubits closest in frequency. Under the assumption of additive stationary noise independent of the qubit state and diagonal Gaussian covariance matrices, the estimated upper qubit-state-assignment fidelity bounds per qubit for MFs [20], including the label confidence [39], are $\mathcal{F}^{\mathrm{MF}}_1 \approx$ […], $\mathcal{F}^{\mathrm{MF}}_2 \approx$ […], $\mathcal{F}^{\mathrm{MF}}_3 \approx$ […], $\mathcal{F}^{\mathrm{MF}}_4 \approx 0.95$, and $\mathcal{F}^{\mathrm{MF}}_5 \approx$ […]. $\mathcal{F}^{\mathrm{MF}}_2$ is primarily reduced due to $T_1$-events and limited qubit-state separation in the $IQ$-plane (see supplementary information [39] for additional details). When tasked to discriminate a single qubit, the different discriminators yield a similar assignment fidelity within a few tenths of a percent of the upper MF assignment fidelity bound, except for qubit 2, where it is off by a few percent, as shown in Tab. I. The small discrepancy between this upper bound and the achieved assignment fidelity suggests that the noise sources affecting single-qubit readout in our devices are reasonably well approximated by additive stationary noise independent of the qubit state and diagonal Gaussian covariance matrices. As the number of simultaneously discriminated qubits increases, the assignment fidelity increasingly deviates from $\mathcal{F}^{\mathrm{MF}}_i$, revealing system dynamics unaccounted for by the Gaussian noise model.
TABLE I. Qubit-assignment fidelity if discriminated individually (ind.) and in parallel with all other qubits (par.). The last five columns present the assignment fidelity for an $N$-qubit discrimination process with $N = \{1, 2, \ldots, 5\}$. $\langle \mathcal{F}_{N\mathrm{Q}} \rangle$ represents the mean assignment fidelity of all qubit permutations. The single-qubit assignment fidelity is similar for all discriminator approaches. For a two-qubit discrimination task, the SQ-LSVM and FNN outperform the MF and MQ-LSVM. For $N$-qubit discrimination tasks with $N > 2$, the FNN outperforms all other methods.
           Qubit 1         Qubit 2         Qubit 3         Qubit 4         Qubit 5
           ind.    par.    ind.    par.    ind.    par.    ind.    par.    ind.    par.    ⟨F_1Q⟩   ⟨F_2Q⟩   ⟨F_3Q⟩   ⟨F_4Q⟩   ⟨F_5Q⟩
MF         0.971   0.968   0.740   0.719   0.962   0.914   0.946   0.934   0.976   0.967   0.9185   0.9100   0.9042   0.8993   0.8946
SQ-LSVM    0.970   0.969   0.740   0.744   0.963   0.924   0.951   0.943   0.976   0.968   […]
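The relative error reductions quoted in this section follow from the expression $1 - (1 - \mathcal{F}_{\mathrm{new}})/(1 - \mathcal{F}_{\mathrm{ref}})$. The short sketch below evaluates it for the rounded geometric-mean fidelities stated in the text (0.885 for the MF, 0.905 for the SQ-LSVM, and 0.913 for the FNN). The function name is illustrative, and because the inputs are rounded to three digits, the results can differ in the last digit from values computed on unrounded data.

```python
def relative_error_reduction(f_new, f_ref):
    """Fraction by which the assignment error (1 - F) shrinks when going from f_ref to f_new."""
    return 1.0 - (1.0 - f_new) / (1.0 - f_ref)


F_MF = 0.885       # geometric-mean fidelity of the matched filter, as quoted in the text
F_SQ_LSVM = 0.905  # geometric-mean fidelity of the single-qubit linear SVM, as quoted
F_FNN = 0.913      # geometric-mean fidelity of the feedforward network, as quoted

reduction_vs_mf = relative_error_reduction(F_FNN, F_MF)
reduction_vs_svm = relative_error_reduction(F_FNN, F_SQ_LSVM)
print(f"FNN vs MF:      {reduction_vs_mf:.3f}")
print(f"FNN vs SQ-LSVM: {reduction_vs_svm:.3f}")
```

With these rounded inputs, the first value comes out at roughly 0.24, in line with the quoted reduction of 0.244, and the second at roughly 0.08.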
The confusion matrix, a matrix $P_{\mathrm{assign}}$ with the qubit-state-assignment probability distribution for each prepared qubit-state configuration as rows, provides further insight into the underlying error mechanisms. The confusion matrix is the identity matrix if each prepared state is correctly labeled and assigned. In practice, in addition to misclassification, the preparation of states can be imperfect. We estimate the mean state preparation fidelities for each qubit [39]: $\mathcal{F}^{\mathrm{prep}}_1 \approx$ […], $\mathcal{F}^{\mathrm{prep}}_2 \approx$ […], $\mathcal{F}^{\mathrm{prep}}_3 \approx$ […], $\mathcal{F}^{\mathrm{prep}}_4 \approx$ […], $\mathcal{F}^{\mathrm{prep}}_5 \approx$ […]. […] $P^{\mathrm{FNN}}_{\mathrm{assign}}$ and $P^{\mathrm{MF}}_{\mathrm{assign}}$, shown in Fig. 4(a). The FNN generally reduces the erroneous off-diagonal assignment probabilities relative to the MF. The most significant exception is the lower off-diagonal elements corresponding to decay of qubit 2, as presented in Fig. 4(b).
Deviations from the ideal confusion matrix occur due to initialization errors, state transitions during the measurement, or readout crosstalk. Typically, the qubit-state misclassifications in the lower off-diagonal block outweigh those of the upper off-diagonal block due to the greater likelihood of decay events at cryogenic temperatures. Here, for a 1 µs-long measurement, qubit 2, the qubit with the shortest lifetime, has a 15 % probability of $T_1$-decay, such that for a significant portion of the training measurements with qubit 2 excited, the final state of qubit 2 is the ground state.
As shown in Fig. 4(b), the FNN is more likely to assign a ground-state label to qubit 2 than an excited-state label, whereas the MF reveals the reverse trend. This suggests that the assignment probabilities of the FNN agree better with the expected error model. However, we can attribute the pattern of the MF assignment probability to a training bias.
Since measurements with qubit 2 prepared in the excited state and corrupted by a $T_1$-decay have integrated signals similar to measurements with qubit 2 prepared in the ground state, the threshold optimizer overcompensates to correctly classify $T_1$-decay-corrupted excited-state measurements at the cost of misclassification of ground-state measurements. This results in the misclassification pattern seen in Fig. 4(b) for $P^{\mathrm{MF}}_{\mathrm{assign}}$.
TABLE II. Mean absolute value, $\langle |\cdot| \rangle$, of the qubit-state-assignment correlations between readout resonators $i$ and $j$ ($i \neq j$) extracted from the cross-fidelity matrix $\mathcal{F}^{\mathrm{CF}}$ when using a MF or FNN discriminator.
        j = i ± 1   j = i ± 2   j = i ± 3   j = i ± 4
MF      0.020       0.015       0.006       ∼ […]
FNN     […]
From the confusion matrix, we can further extract the probability distribution of the non-zero Hamming distance. This is the probability distribution describing the number of misassigned qubits per qubit-state configuration. The assignment errors of the FNN (MF) are 85.[…] […] $\mathcal{F}^{\mathrm{CF}}_{ij}$ is defined as
$$\mathcal{F}^{\mathrm{CF}}_{ij} = \left\langle 1 - \left[ P(1_i | \varnothing_j) + P(0_i | \pi_j) \right] \right\rangle, \qquad (3)$$
where $\varnothing_j$ ($\pi_j$) represents the preparation of qubit $j$ in the ground (excited) state and $0_i$ ($1_i$) the subsequent assignment to the ground (excited) state ($\langle f \rangle$ denotes the mean value of a function $f$). A positive (negative) off-diagonal element indicates a correlation (anti-correlation) between the two qubits. Such correlations can occur due to readout crosstalk. The off-diagonal entries for the FNN are all less than one percent and are drastically reduced relative to the MF. Relative to the MF, the mean cross-fidelity, $\langle |\mathcal{F}^{\mathrm{CF}}_{ij}| \rangle$, for nearest neighbors ($j = i \pm 1$) is reduced by one order of magnitude from $\langle |\mathcal{F}^{\mathrm{MF,CF}}_{j = i \pm 1}| \rangle = 0.$[…] to $\langle |\mathcal{F}^{\mathrm{FNN,CF}}_{j = i \pm 1}| \rangle = 0.$[…] […] for all $j \neq i$, as presented in Tab. II. The FNN's reduction of assignment correlations by up to one order of magnitude corroborates the claim that the FNN diminishes readout-crosstalk-related discrimination errors.
FIG. 4. Assignment Fidelity Analysis. (a) Difference between the confusion (assignment probability) matrix of the feedforward neural network (FNN) $P^{\mathrm{FNN}}_{\mathrm{assign}}$ and of the matched filter (MF) $P^{\mathrm{MF}}_{\mathrm{assign}}$. The rows of the confusion matrix encompass the discriminator's probability distribution to assign each of the 32 qubit-state configurations to the row's prepared qubit-state configuration (no pulse applied, qubit initialized in the ground state: ∅ → […]; π-pulse applied, qubit initialized in the excited state: π → […]). […]
V. CONCLUSION
We have demonstrated an approach to multi-qubit readout using neural networks as multi-qubit state discriminators that is more crosstalk-resilient than other contemporary approaches. We find that a fully-connected FNN increases the readout assignment fidelity for a multi-qubit system compared to contemporary methods. We observe that the FNN compensates for system nonidealities such as readout crosstalk more effectively than alternatives such as matched filters (MFs) or support vector machines (SVMs). The assignment error rate is diminished by up to 25 % and crosstalk-induced discrimination errors are suppressed by up to one order of magnitude. The relative assignment fidelity improvement of the FNN over its contemporary alternatives grows as the number of simultaneously read out and multiplexed qubits increases.
While FNNs are initially more resource-intensive in training, their re-calibration can be significantly more efficient due to transfer learning [53]. Periodic re-calibration of control and readout parameters is necessary as quantum systems drift in time. For a marginal drift, neural networks can be updated at a fraction of the initial resource requirements. Furthermore, to speed up qubit readout, the techniques developed here can be transitioned to dedicated hardware such as field-programmable gate arrays (FPGA) [27].
We have tested our FNN multi-qubit-state discrimination approach on a quantum system with five superconducting qubits and frequency-multiplexed readout. While the readout fidelity of qubit 2 was relatively marginal, four qubits revealed multi-qubit readout fidelities comparable with contemporary multi-qubit systems, albeit with measurement times around 1 µs (see supplementary information [39] for additional details), much longer than the state of the art of 100 ns for single-qubit systems [37]. We demonstrated an improvement using the FNN for all qubits.
The next step is to test the performance of FNNs on higher-fidelity multi-qubit systems with measurement times below 100 ns to assess whether the advantage is retained on already high-performing devices. FNNs offer a readout-state discrimination approach tailored to the underlying system. They can be readily applied to more general discrimination tasks than we have considered here, such as multi-level readout in a qudit architecture [54–57]. This work presents a potential building block for scaling quantum processors while maintaining high-fidelity readout.

ACKNOWLEDGEMENTS
We want to express our appreciation to Mirabella Pulido and Chihiro Watanabe for administrative assistance. This research was funded in part by the DARPA Polyplexus grant No. HR00112010001; by the U.S. Army Research Office (ARO) Multidisciplinary University Research Initiative (MURI) W911NF-18-1-0218; and by the Department of Defense via Lincoln Laboratory under Air Force Contract No. FA8721-05-C-0002. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of DARPA or the US Government.

[1] L. K. Grover, A fast quantum mechanical algorithm for database search, Proceedings, 28th Annual ACM Symposium on the Theory of Computing, 212 (1996).
[2] P. Shor, Polynomial-Time Algorithms for Prime Factorization and Discrete Logarithms on a Quantum Computer, Proceedings of the 37th Annual Symposium on Foundations of Computer Science (FOCS) (IEEE Press, Burlington, VT, 1996).
[3] X. Gu, A. F. Kockum, A. Miranowicz, Y.-X. Liu, and F. Nori, Microwave photonics with superconducting quantum circuits, Physics Reports, 1 (2017).
[4] P. Krantz, M. Kjaergaard, F. Yan, T. P. Orlando, S. Gustavsson, and W. D. Oliver, A quantum engineer's guide to superconducting qubits, Appl. Phys. Rev., 021318 (2019).
[5] X. Y. Jin, A. Kamal, A. P. Sears, T. Gudmundsen, D. Hover, J. Miloshi, R. Slattery, F. Yan, J. Yoder, T. P. Orlando, S. Gustavsson, and W. D. Oliver, Thermal and residual excited-state population in a 3D transmon qubit, Phys. Rev. Lett., 240501 (2015).
[6] F. Arute, K. Arya, R. Babbush, et al., Quantum supremacy using a programmable superconducting processor, Nature, 505 (2019).
[7] M. Kjaergaard, M. E. Schwartz, J. Braumüller, P. Krantz, J. I.-J. Wang, S. Gustavsson, and W. D. Oliver, Superconducting qubits: Current state of play, Annual Review of Condensed Matter Physics (2019).
[8] J. M. Gambetta, J. M. Chow, and M. Steffen, Building logical qubits in a superconducting quantum computing system, npj Quantum Inf., 350 (2017).
[9] D. Ristè, J. G. van Leeuwen, H.-S. Ku, K. W. Lehnert, and L. DiCarlo, Initialization by Measurement of a Superconducting Quantum Bit Circuit, Phys. Rev. Lett. (2012).
[10] J. E. Johnson, C. Macklin, D. H. Slichter, R. Vijay, E. B. Weingarten, J. Clarke, and I. Siddiqi, Heralded State Preparation in a Superconducting Qubit, Phys. Rev. Lett. (2012).
[11] M. D. Reed, L. DiCarlo, S. E. Nigg, L. Sun, L. Frunzio, S. M. Girvin, and R. J. Schoelkopf, Realization of three-qubit quantum error correction with superconducting circuits, Nature, 382 (2012).
[12] R. Barends, J. Kelly, A. Megrant, A. Veitia, D. Sank, E. Jeffrey, T. C. White, J. Mutus, A. G. Fowler, B. Campbell, et al., Superconducting quantum circuits at the surface code threshold for fault tolerance, Nature, 500 (2014).
[13] P. Krantz, A. Bengtsson, M. Simoen, S. Gustavsson, V. Shumeiko, W. D. Oliver, C. M. Wilson, P. Delsing, and J. Bylander, Single-shot read-out of a superconducting qubit using a Josephson parametric oscillator, Nat. Commun., 1 (2016).
[14] D. P. DiVincenzo, Fault tolerant architectures for superconducting qubits, Phys. Scr. T137, 014020 (2009).
[15] A. G. Fowler, M. Mariantoni, J. M. Martinis, and A. N. Cleland, Surface codes: Towards practical large-scale quantum computation, Phys. Rev. A, 032324 (2012).
[16] J. Preskill, Quantum Computing in the NISQ era and beyond, Quantum (2018).
[17] A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q. Zhou, P. J. Love, A. Aspuru-Guzik, and J. L. O'Brien, A variational eigenvalue solver on a photonic quantum processor, Nat. Commun. (2014).
[18] E. Farhi, J. Goldstone, and S. Gutmann, A quantum approximate optimization algorithm (2014), arXiv:1411.4028.
[19] F. B. Maciejewski, Z. Zimborás, and M. Oszmaniec, Mitigation of readout noise in near-term quantum devices by classical post-processing based on detector tomography, Quantum, 257 (2020).
[20] E. Magesan, J. M. Gambetta, A. Córcoles, and J. M. Chow, Machine Learning for Discriminating Quantum Measurement Trajectories and Improving Readout, Phys. Rev. Lett., 200501 (2015).
[21] J. Heinsoo, C. K. Andersen, A. Remm, S. Krinner, T. Walter, Y. Salathé, S. Gasparinetti, J. C. Besse, A. Potočnik, A. Wallraff, and C. Eichler, Rapid High-fidelity Multiplexed Readout of Superconducting Qubits, Phys. Rev. Appl., 1 (2018).
[22] C. C. Bultink, T. E. O'Brien, R. Vollmer, N. Muthusubramanian, M. W. Beekman, M. A. Rol, X. Fu, B. Tarasinski, V. Ostroukh, B. Varbanov, A. Bruno, and L. DiCarlo, Protecting quantum entanglement from leakage and qubit errors via repetitive parity measurements, Science Advances (2020).
[23] E. Flurin, L. S. Martin, S. Hacohen-Gourgy, and I. Siddiqi, Using a recurrent neural network to reconstruct quantum dynamics of a superconducting qubit from physical observations, Phys. Rev. X, 011006 (2020).
[24] L. A. Martinez, Y. J. Rosen, and J. L. DuBois, Improving qubit readout with hidden Markov models, Phys. Rev. A, 062426 (2020).
[25] G. Angelatos, S. Khan, and H. E. Türeci, Reservoir computing approach to quantum state measurement (2020), arXiv:2011.09652 [quant-ph].
[26] A. Seif, K. A. Landsman, N. M. Linke, C. Figgatt, C. Monroe, and M. Hafezi, Machine learning assisted readout of trapped-ion qubits, Journal of Physics B: Atomic, Molecular and Optical Physics, 174006 (2018).
[27] Z.-H. Ding, J.-M. Cui, Y.-F. Huang, C.-F. Li, T. Tu, and G.-C. Guo, Fast High-Fidelity Readout of a Single Trapped-Ion Qubit via Machine-Learning Methods, Phys. Rev. Applied, 014038 (2019).
[28] Y. Matsumoto, T. Fujita, A. Ludwig, A. D. Wieck, K. Komatani, and A. Oiwa, Noise-robust classification of single-shot electron spin readouts using a deep neural network (2020), arXiv:2012.10841 [quant-ph].
[29] A. Blais, R. S. Huang, A. Wallraff, S. M. Girvin, and R. J. Schoelkopf, Cavity quantum electrodynamics for superconducting electrical circuits: An architecture for quantum computation, Phys. Rev. A - At. Mol. Opt. Phys., 1 (2004).
[30] C. Macklin, K. O'Brien, D. Hover, M. E. Schwartz, V. Bolkhovsky, X. Zhang, W. D. Oliver, and I. Siddiqi, A near–quantum-limited Josephson traveling-wave parametric amplifier, Science, 307 (2015).
[31] L. DiCarlo, M. D. Reed, L. Sun, B. R. Johnson, J. M. Chow, J. M. Gambetta, L. Frunzio, S. M. Girvin, M. H. Devoret, and R. J. Schoelkopf, Preparation and measurement of three-qubit entanglement in a superconducting circuit, Nature (2010).
[32] M. Jerger, S. Poletto, P. Macha, U. Hübner, E. Il'ichev, and A. V. Ustinov, Frequency division multiplexing readout and simultaneous manipulation of an array of flux qubits, Appl. Phys. Lett., 042604 (2012).
[33] E. Jeffrey, D. Sank, J. Y. Mutus, T. C. White, J. Kelly, R. Barends, Y. Chen, Z. Chen, B. Chiaro, A. Dunsworth, et al., Fast Accurate State Measurement with Superconducting Qubits, Phys. Rev. Lett., 190504 (2014).
[34] E. A. Sete, J. M. Martinis, and A. N. Korotkov, Quantum theory of a bandpass Purcell filter for qubit readout, Phys. Rev. A, 012325 (2015).
[35] C. Neill, P. Roushan, K. Kechedzhi, S. Boixo, S. V. Isakov, V. Smelyanskiy, A. Megrant, B. Chiaro, A. Dunsworth, K. Arya, R. Barends, et al., A blueprint for demonstrating quantum supremacy with superconducting qubits, Science, 195 (2018).
[36] L. C. G. Govia and F. K. Wilhelm, Unitary-feedback-improved qubit initialization in the dispersive regime, Phys. Rev. Applied, 054001 (2015).
[37] T. Walter, P. Kurpiers, S. Gasparinetti, P. Magnard, A. Potočnik, Y. Salathé, M. Pechal, M. Mondal, M. Oppliger, C. Eichler, and A. Wallraff, Rapid High-Fidelity Single-Shot Dispersive Readout of Superconducting Qubits, Phys. Rev. Appl., 1 (2017).
[38] J. Koch, T. M. Yu, J. Gambetta, A. A. Houck, D. I. Schuster, J. Majer, A. Blais, M. H. Devoret, S. M. Girvin, and R. J. Schoelkopf, Charge-insensitive qubit design derived from the Cooper pair box, Phys. Rev. A, 042319 (2007).
[39] Supplementary Information.
[40] G. Turin, An introduction to matched filters, IRE Transactions on Information Theory, 311 (1960).
[41] C. A. Ryan, B. R. Johnson, J. M. Gambetta, J. M. Chow, M. P. Da Silva, O. E. Dial, and T. A. Ohki, Tomography via correlation of noisy measurement records, Phys. Rev. A - At. Mol. Opt. Phys., 1 (2015).
[42] C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics) (Springer-Verlag, Berlin, Heidelberg, 2006).
[43] K. Fukunaga, Introduction to Statistical Pattern Recognition (2nd Ed.) (Academic Press Professional, Inc., USA, 1990).
[44] N. T. Bronn, B. Abdo, K. Inoue, S. Lekuch, A. D. Córcoles, J. B. Hertzberg, M. Takita, L. S. Bishop, J. M. Gambetta, and J. M. Chow, Fast, high-fidelity readout of multiple qubits, J. Phys.: Conf. Ser., 012003 (2017).
[45] C. C. Bultink, B. Tarasinski, N. Haandbæk, S. Poletto, N. Haider, D. J. Michalak, A. Bruno, and L. DiCarlo, General method for extracting the quantum efficiency of dispersive qubit readout in circuit QED, Appl. Phys. Lett., 092601 (2018).
[46] B. E. Boser, I. M. Guyon, and V. N. Vapnik, A training algorithm for optimal margin classifiers, in Proceedings of the Fifth Annual Workshop on Computational Learning Theory, COLT '92 (Association for Computing Machinery, New York, NY, USA, 1992) pp. 144–152.
[47] C. Cortes and V. Vapnik, Support-vector networks, Machine Learning, 273 (1995).
[48] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (The MIT Press, 2016).
[49] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al., PyTorch: An imperative style, high-performance deep learning library, in Advances in Neural Information Processing Systems 32 (Curran Associates, Inc., 2019) pp. 8024–8035.
[50] G. Klambauer, T. Unterthiner, A. Mayr, and S. Hochreiter, Self-normalizing neural networks, in Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS'17 (Curran Associates Inc., Red Hook, NY, USA, 2017) pp. 972–981.
[51] D. P. Kingma and J. Ba, Adam: A method for stochastic optimization (2017), arXiv:1412.6980.
[52] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis (John Wiley & Sons, New York, 1973).
[53] Y. Bengio, Deep learning of representations for unsupervised and transfer learning, in Proceedings of ICML Workshop on Unsupervised and Transfer Learning, Proceedings of Machine Learning Research, Vol. 27, edited by I. Guyon, G. Dror, V. Lemaire, G. Taylor, and D. Silver (JMLR Workshop and Conference Proceedings, Bellevue, Washington, USA, 2012) pp. 17–36.
[54] P. Kurpiers, P. Magnard, T. Walter, B. Royer, M. Pechal, J. Heinsoo, Y. Salathé, A. Akin, S. Storz, J.-C. Besse, S. Gasparinetti, A. Blais, and A. Wallraff, Deterministic quantum state transfer and remote entanglement using microwave photons, Nature, 1476 (2018).
[55] S. S. Elder, C. S. Wang, P. Reinhold, C. T. Hann, K. S. Chou, B. J. Lester, S. Rosenblum, L. Frunzio, L. Jiang, and R. J. Schoelkopf, High-fidelity measurement of qubits encoded in multilevel superconducting circuits, Phys. Rev. X, 011001 (2020).
[56] M. A. Yurtalan, J. Shi, G. J. K. Flatt, and A. Lupascu, Characterization of multi-level dynamics and decoherence in a high-anharmonicity capacitively shunted flux circuit (2020), arXiv:2008.00593 [quant-ph].
[57] C. Wang, M.-C. Chen, C.-Y. Lu, and J.-W. Pan, Optimal readout of superconducting qubits exploiting high-level states, Fundamental Research, 16 (2021).
[58] B. Lienhard, J. Braumüller, W. Woods, D. Rosenberg, G. Calusine, S. Weber, A. Vepsäläinen, K. O'Brien, T. P. Orlando, S. Gustavsson, and W. D. Oliver, Microwave packaging for superconducting qubits, in (2019) pp. 275–278.
[59] S. Huang, B. Lienhard, G. Calusine, A. Vepsäläinen, J. Braumüller, D. K. Kim, A. J. Melville, B. M. Niedzielski, J. L. Yoder, B. Kannan, T. P. Orlando, S. Gustavsson, and W. D. Oliver, Microwave package design for superconducting quantum processors (2020), arXiv:2012.01438.
[60] F. Yan, S. Gustavsson, A. Kamal, J. Birenbaum, A. P. Sears, D. Hover, T. J. Gudmundsen, D. Rosenberg, G. Samach, S. Weber, J. L. Yoder, T. P. Orlando, J. Clarke, A. J. Kerman, and W. D. Oliver, The flux qubit revisited to enhance coherence and reproducibility, Nat. Commun., 12964 (2016).
[61] A. Blais, A. L. Grimsmo, S. M. Girvin, and A. Wallraff, Circuit quantum electrodynamics (2020), arXiv:2005.12667 [quant-ph].
[62] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, 2278 (1998).
[63] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding (2019), arXiv:1810.04805 [cs.CL].
[64] D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, T. Lillicrap, K. Simonyan, and D. Hassabis, A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play, Science, 1140 (2018).
[65] R. A. Fisher, The use of multiple measurements in taxonomic problems, Annals of Eugenics, 179 (1936).
[66] L. Buitinck, G. Louppe, M. Blondel, F. Pedregosa, A. Mueller, O. Grisel, V. Niculae, P. Prettenhofer, A. Gramfort, J. Grobler, R. Layton, J. Vanderplas, A. Joly, B. Holt, and G. Varoquaux, API design for machine learning software: experiences from the scikit-learn project (2013), arXiv:1309.0238 [cs.LG].
[67] V. Nair and G. E. Hinton, Rectified linear units improve restricted Boltzmann machines, in Proceedings of the 27th International Conference on Machine Learning, ICML'10 (Omnipress, Madison, WI, USA, 2010) pp. 807–814.
[68] Y. Bengio, Practical Recommendations for Gradient-Based Training of Deep Architectures, in Neural Networks: Tricks of the Trade: Second Edition (Springer Berlin Heidelberg, Berlin, Heidelberg, 2012) pp. 437–478.

Appendix A: Measurement Setup
Qubit control and readout pulses (envelopes with cosine-shaped rising and falling edges encompassing a plateau) are programmed in Labber. They are created using three Keysight M3202A PXI arbitrary waveform generators (AWGs), two for control and one for readout, with a sampling rate of 1 GSa s$^{-1}$. The in-phase ($I$) and quadrature ($Q$) components of the signals at MHz frequencies are up-converted to the qubit transition frequency using an $IQ$-mixer and a local oscillator (LO) (Rohde and Schwarz SGS100A) per AWG. The control and readout tones are combined and sent to the qubit chip in the dilution refrigerator via a single microwave line attenuated by 60 dB.

The qubit chip is mounted in a microwave package following design principles as reported in Refs. [58, 59]. A coil, centered above the qubit chip, is mounted in the device package. A global flux bias $\Phi$ is applied through that coil to the superconducting quantum interference devices (SQUIDs) of the qubits using a Yokogawa GS200.

The readout signal, upon acquisition of a qubit-state-dependent phase shift, is first amplified using a Josephson traveling-wave parametric amplifier (JTWPA) with near quantum-limited performance over a bandwidth of more than 2 GHz and a 1 dB compression point of approximately −100 dBm [30]. An Agilent E8267D signal generator provides the pump tone for the JTWPA. The microwave line carrying the pump tone is attenuated by 50 dB and fed into the JTWPA via a set of directional couplers and isolators located in the mixing chamber of the refrigerator. The signal is further amplified by a high-electron-mobility transistor (HEMT) amplifier that is thermally anchored to the 3 K stage.

At room temperature, the readout signal is amplified, $IQ$-mixed with the LO at 7.127 GHz, and fed into a heterodyne detector. The $I$- and $Q$-components of the readout signal are digitized with a Keysight M3102A PXI analog-to-digital converter (ADC) at a sampling rate of 500 MSa s$^{-1}$. The subsequent digital signal processing to distinguish qubit states is the focus of this manuscript.

Appendix B: Five-Qubit Chip
The quantum system comprising five superconducting qubits is fabricated on a (001) silicon substrate (> […] Ω cm) by dry etching a molecular-beam epitaxy (MBE) grown aluminum film in an optical lithography process before being diced into 5 × […] chips, as described in [60].

The superconducting chip consists of coplanar waveguides and five frequency-tunable transmon qubits [38]. The target qubit transition frequencies alternate ([…] → operating frequency) to limit qubit-qubit and control crosstalk. The capacitive nearest-neighbor (next-nearest-neighbor) qubit-qubit coupling rate, $J_{\mathrm{nn}}$ ($J_{\mathrm{nnn}}$), is designed (using COMSOL Multiphysics®) to be $J_{\mathrm{nn}}/2\pi \approx 14$ MHz ($J_{\mathrm{nnn}}/2\pi <$ […] $< 0.01$ MHz) [61]. Each qubit couples capacitively to a quarter-wave resonator that couples inductively to a shared bandpass (Purcell) filtered feedline. Neighboring readout resonator frequencies differ by ∼50 MHz. The qubit and resonator operation parameters are included in Tab. IS and Tab. IIS.

TABLE IS. Chip comprising five superconducting frequency-tunable transmon qubits with alternating transition frequencies. A normalized magnetic flux bias $\Phi/\Phi_0$ (magnetic flux quantum $\Phi_0$) detunes the qubits from their idling to their operating frequency. The qubit anharmonicities $\alpha$ are in the moderate transmon regime. The qubit lifetimes $T_1$, Ramsey coherence times, and spin-echo relaxation times are measured at the qubit operating frequency.

Qubit   ω_Qubit/2π idle   ω_Qubit/2π biased   Bias Φ/Φ_0   α/2π    T_1     T_2 (Ramsey)   T_2 (echo)
        (GHz)             (GHz)                            (MHz)   (µs)    (µs)           (µs)
1       5.249             5.092               0.124        -212    40.8    1.3            7.4
2       4.708             4.404               0.160        -216    6.4     0.6            4.1
3       5.202             5.000               0.166        -204    21.4    1.0            7.2
4       4.560             4.309               0.154        -214    11.8    0.8            5.4
5       5.196             5.165               0.085        -200    23.4    7.6            31.8

TABLE IIS. Chip comprising five superconducting readout resonators at bare resonance frequencies ∼ […] $\omega_{\mathrm{LO}}/2\pi = 7.127$ GHz. Each resonator couples to a designated qubit with strength $g$, leading to a dispersive shift $\chi$. The effective resonator decay rate through the Purcell filter is $\kappa_{\mathrm{eff}}$. The qubit-resonator interaction remains in the dispersive regime for readout resonator photon populations below the critical photon number $n_{\mathrm{crit}}$.

Resonator   ω_Res/2π   ω_IF/2π   g/2π    χ/2π    κ_eff/2π   n_crit
            (GHz)      (MHz)     (MHz)   (MHz)   (MHz)
1           7.06       -65       116.3   0.83    4.29       33.8
2           7.10       -26       143.3   0.51    4.25       55.3
3           7.15       24        125.7   0.77    4.41       34.9
4           7.20       70        133.1   0.49    3.33       56.9
5           7.25       127       125.4   0.80    6.90       33.0
Appendix C: Qubit-State Discriminators
The study of computational algorithms with the ability to improve through experience is typically referred to as machine learning [42]. These algorithms strive to identify patterns in sample data, called training data, and create an approximate model of an underlying decision process without explicit instructions. While many machine learning ideas are several decades old, they only recently became widely applicable due to the development of sufficient computational resources and are applied today in image processing [62], natural language processing [63], or playing advanced games such as chess [64].

Machine learning can be broadly divided into three categories: unsupervised, supervised, and reinforcement learning. Here, we focus on supervised learning methods that learn an input-output mapping function using a trusted set of input-output pairs (training set). Typically, the input-output pairs for training are acquired by the "supervisor," hence the terminology. The quality of the learned mapping function can be probed utilizing an additional set of trusted input-output pairs (test set). The performance of a supervised learning method on the test set relative to its performance on the training set is referred to as generalization.
1. Matched Filter (MF) Threshold Discriminator
A matched filter applied to a single-shot readout measurement (the average of the element-wise product of a readout signal and a kernel optimized in terms of signal-to-noise ratio, SNR) projects the multi-dimensional input data to a single dimension such that the data can be linearly partitioned [40]. For stationary noise and if the two classes are symmetric and Gaussian distributed, a kernel proportional to the difference between the mean ground- and excited-state signals is optimal [41, 42]. For a system with […]. The optimized discriminator threshold is then located at 0, the axis origin [41]. While such classifiers are typically not attributed to classical learning algorithms, the filter tune-up and threshold optimization require a "training" step.

For superconducting qubits, the optimal kernel is equal to the difference between the mean ground- and excited-state-readout signal normalized by the signal variance, which must be measured experimentally using calibration runs with known qubit states, as described and termed "matched filter" in Ref. [41], "mode matched filter" in Ref. [21], or "Fisher's linear discriminant" in Ref. [42]. In our implementation, as illustrated in Fig. 1S(a), each matched filter kernel (following the terminology in Ref. [41]) is multiplied with a rectangular window to limit the impact of nonidealities such as qubit-energy decay. Summing up the element-wise product of the windowed matched filter kernel $k_i[n]$ and the readout signal, $I_i[n]$ and $Q_i[n]$, yields a distribution along a single dimension (here, along $I_i$). A threshold optimized with a linear support vector machine (or another optimizer) partitions the one-dimensional projection into a ground- and an excited-state class, depicted in Fig. 1S(b). Finally, the concatenation of the one-bit labels assigned by each single-qubit discriminator results in the assigned five-qubit-state label. Note that the demodulation step at intermediate frequencies using $e^{-j\omega_{\mathrm{IF},i} n}$, with $\omega_{\mathrm{IF},i}$ defined in Tab. IIS (as described in Ref. [4]), can be incorporated in the kernel tune-up.

Under the assumption of symmetric noise, the achievable assignment fidelity depends on the separation $R$ between the ground- and excited-state-readout signals, $S_0$ and $S_1$, referred to as the Fisher criterion [65]. The separation $R$ is defined as

$$R = \frac{\left(\langle S_0 \rangle - \langle S_1 \rangle\right)^2}{\mathrm{var}(S)}, \tag{C1}$$

with a symmetric variance, $\mathrm{var}(S) = \mathrm{var}(S_0) = \mathrm{var}(S_1)$ ($\langle f \rangle$ denotes the mean value of $f$). For Gaussian distributed states and diagonal covariance matrices, $R$ can be maximized using the introduced matched filter kernel $k \propto \langle S_0 - S_1 \rangle / [\mathrm{var}(S_0) + \mathrm{var}(S_1)]$ [41, 42]. For a system with additive stationary noise independent of the qubit state and diagonal Gaussian covariance matrices, the maximally achievable assignment fidelity is

$$F_{\mathrm{ach}} = \frac{1}{2}\left[1 + \mathrm{erf}\left(\sqrt{R/8}\right)\right], \tag{C2}$$

with $\mathrm{erf}(z)$, the Gauss error function of $z$.

The qubit-readout-state histograms that result after the matched filter are fit with Gaussian functions, shown in Fig. 1S(b). For the fit functions, the variances of the ground and excited state are kept identical to evaluate $F_{\mathrm{ach}}$, as presented in Tab. IIIS. Fitting the ground state with a bimodal and the excited state with a trimodal Gaussian function reveals aspects of the state transition dynamics such as thermal excitations or qubit-energy decays. The product of the label fidelity $F_{\mathrm{label}}$ and the achievable fidelity $F_{\mathrm{ach}}$ provides an estimate of the upper bound for the matched filter (MF) discriminator qubit-state-assignment fidelity $F_{\mathrm{MF}}$, as shown in the last column of Tab. IIIS.
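The matched-filter procedure of this subsection can be sketched numerically. The following is a minimal illustration under stated assumptions, not the implementation used for the results reported here: the traces are synthetic (a constant state-dependent offset plus white Gaussian noise), the threshold is simplified to the midpoint of the two projected means instead of being optimized with a linear SVM, and the helper names `synthetic_traces`, `matched_filter_kernel`, `project`, and `achievable_fidelity` are hypothetical.

```python
import math
import numpy as np

rng = np.random.default_rng(seed=0)
SHOTS, BINS = 2000, 500  # assumed: 500 time bins of 2 ns for a 1 us readout


def synthetic_traces(offset):
    """Assumed toy model: constant complex offset plus white Gaussian noise."""
    noise = rng.normal(size=(SHOTS, BINS)) + 1j * rng.normal(size=(SHOTS, BINS))
    return offset + noise


def matched_filter_kernel(ground, excited, window=None):
    """k[n] proportional to <S_0 - S_1> / [var(S_0) + var(S_1)] per time bin."""
    kernel = (ground.mean(axis=0) - excited.mean(axis=0)) / (
        ground.var(axis=0) + excited.var(axis=0)
    )
    # Optional rectangular window (array of 0s and 1s) that masks time bins.
    return kernel if window is None else kernel * window


def project(traces, kernel):
    """Sum of the element-wise product of kernel and trace (one value per shot)."""
    return np.real(traces @ np.conj(kernel))


def achievable_fidelity(s0, s1):
    """Eqs. (C1) and (C2), with the two variances averaged to a common value."""
    variance = 0.5 * (s0.var() + s1.var())
    fisher_r = (s0.mean() - s1.mean()) ** 2 / variance
    return 0.5 * (1.0 + math.erf(math.sqrt(fisher_r / 8.0)))


# Calibration ("training") traces with known states, then independent test traces.
kernel = matched_filter_kernel(synthetic_traces(+0.05), synthetic_traces(-0.05))
s0 = project(synthetic_traces(+0.05), kernel)  # prepared ground state
s1 = project(synthetic_traces(-0.05), kernel)  # prepared excited state

threshold = 0.5 * (s0.mean() + s1.mean())  # simplification: midpoint threshold
# Ground-state shots fall to the right of the threshold, excited-state shots left.
measured_fidelity = 0.5 * ((s0 > threshold).mean() + (s1 <= threshold).mean())
predicted_fidelity = achievable_fidelity(s0, s1)
print(f"measured {measured_fidelity:.3f}, predicted {predicted_fidelity:.3f}")
```

Because the toy noise is Gaussian and state-independent, the measured fidelity should lie close to the value predicted by Eq. (C2); for real readout data, qubit-energy decay and thermal excitation break this assumption, which is why the label fidelity enters the bound on $F_{\mathrm{MF}}$.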
2. Support Vector Machine (SVM)
Support vector machines (SVMs)—known for their ro-bustness and good generalization—are fundamental two-class discriminators that draw a single decision boundary,called a hyperplane, in a supervised learning scheme [46,47]. The margin between the classes and the hyperplanecan be maximized by penalizing misclassified data pointsand data points within the margin boundaries. Thepenalty for data points within the margin boundaries canbe varied using a regularization term. A lenient penaltyresults in a so-called soft-margin SVM which can bettercope with problems that are not linearly-separable.The hyperplane dimension is equal to the one less thanthe number of features–the dimensions of the measure-ment data. The location of a new data point relativeto the hyperplane decides on the associated label. Thisdeterministic decision process is not probabilistic, andthe information on the probability of label associationis thus not directly accessible. While hyperplane sep-arations only work for linearly-separable data, nonlin-ear SVMs use the kernel trick to map the data pointsto higher dimensions via a nonlinear transformation andfind a hyperplane in that higher-order feature space.Several SVMs can be trained in concert for multi-classdiscrimination to divide the feature space into areas asso-ciated with distinct classes [52]. For an N -class ( N >
TABLE IIIS. Numerical values extracted from Gaussian fits to readout data distribution after a 1 µ s-measurement time using amatched filter, as illustrated in Fig. 1S(a,b). The peak ratio of bimodal Gaussian fits (with equal variance) to the readout-traceshistograms of qubits initialized in the ground state (no pulse applied: ∅ ) provide insight in the thermal excitation probability P (1 |∅ ). Comparing the peak ratios for trimodal Gaussian fits to the readout-traces histograms of qubits initialized in theexcited state ( π -pulse applied: π ) indicate the conditional probability for qubit-energy decays P (0 | π ) and second-excited statepopulation P (2 | π ). F label = 1 − ( P (1 |∅ ) + P (0 | π )) / F π represents the fitted π -pulse fidelities. (cid:104) S (cid:105) , (cid:104) S (cid:105) , and var( S ) arethe mean ground state, mean excited state, and variance of both states used to derive the Fisher criterion R and achievableassignment fidelity F ach (see Eq. C1, C2). F MF , the product of F label and F ach , is an estimate for an upper qubit-state-assignment fidelity bound for a classifier composed of a matched filter and the subsequent optimized threshold, here referredto as MF.Qubit P (1 i |∅ i ) P (2 i |∅ i ) P (0 i | π i ) P (2 i | π i ) F label F π (cid:104) S (cid:105) (cid:104) S (cid:105) var( S ) R F ach F MF (cid:28) (cid:28) (cid:28) (cid:28) (cid:28) (a)(b)(c) Qubit 1 Qubit 2 Qubit 3 Qubit 4 Qubit 5 N o r m a li ze d C oun t s Measurement Time ( μ s) T i m e - B i n W e i gh t ( a . u . ) MFMF. RW − − − − − − − − − − π threshold02 π − bound a r y − − − − − -Quadrature (a.u.) I - Q u a d r a t u r e ( a . u . ) Q -Quadrature (a.u.) I * FIG. 1S. Readout Data Statistics. (a) Magnitude of the time-bin weights of the qubit-specific matched filter shapes derivedusing prepared ground and excited states. 
A rectangular window (RW) is applied to each matched filter kernel to reduce theimpact of qubit-energy decays and maximize qubit-state-assignment fidelities. The resulting matched filter windows are shadedin gray. (b) Shown are the histograms of the qubit-state-readout single-shot traces after applying the optimized 1 µ s-longmatched filter. The dashed lines represent the optimized thresholds with the states to the right attributed to the ground stateand left to the excited state. Using bimodal Gaussian fit functions for the ground state (green) and trimodal Gaussian fitfunctions for the excited state (blue) provides insight into the underlying dynamics such as thermal excitation or qubit-energydecays (see Tab. IIIS). (c) Plotted are boxcar filtered single-shot traces of ground (black) and excited states (gray) in the IQ -plane. A linear support vector machine trained on the two-dimensional data generates the qubit-specific colored discriminationboundary. G M t e s t G M t r a i n /
100 200 3000.91.0 − − − L ea r n i ng R a t e , η Epoch (a)(b) − − − − − − − − − − − − (c) Output LayerInput Layer
N=5 N=1N=2N=3N=4maximum bx n w n f(z) HiddenLayersNode l–1 x n+1l–1 x ml f(z)x ml f(z)=--w-x- +b n n nl–1 SELULayer l Output:Layer l Inputs:
FIG. 2S. Architecture and Training of Fully-Connected Feed-forward Neural Network (FNN). (a) The FNN architectureused here comprises an input layer, three hidden layers, andan output layer. For a 1 µ s-long measurement time, the inputlayer consists of 1,000 nodes. 1,000, 500, and 250 nodes formthe first, second, and third hidden layer. The output layerscales as 2 N (N, the number of qubits). For five qubits, theoutput layer encompasses 32 nodes. (b) The nodes composingthe hidden layer l are functions that depend on the followingparameter inputs: the output values x l − n of the prior layer l − b . The output value x lm of node m corresponds to the weighted (weights w n ) sum of the in-puts x l − n and the bias b after passing through an activationfunction, here a scaled exponential linear unit (SELU), shownin orange. (c) Shown is the training performance for an FNNtasked to discriminate N qubits with N = 1 , , . . . ,
5. Thegeneralization—the ratio of the geometric mean test F testGM andtraining qubit-state-assignment fidelity F trainGM —as the num-ber of epochs increases is shown in black using the left y-axis.The associated standard deviation of the generalization is in-dicated in gray. The number of epochs to achieve the max-imum qubit-state-assignment fidelity is indicated with a redvertical bar. The learning rate η , shown in blue and usingthe right y-axis, is gradually reduced as the number of epochsincreases. classification task, the number of necessary hyperplanesis at least N − Measurement Time ( μ s) a ss i gn m e n t FIG. 3S. Qubit-State-Assignment Fidelity. Matched filter dis-criminator for each qubit versus measurement time. The max-imum assignment fidelity F i ( t i ) for each qubit i is reachedafter t = 1 µ s, t = 2 µ s, t = 0 . µ s, t = 0 . µ s, and t = 0 . µ s. be associated with a single class [42].Here, we use scikit-learn library to implement single-qubit and multi-qubit linear and nonlinear SVMs inPython [66]. We employ the LinearSVC implementa-tion for linear and SVC for nonlinear soft-margin SVMswith regularization parameters optimized per discrimi-nator to deliver the maximally achievable qubit-state-assignment fidelity. In general, the training wall-clock-time for an SVM implemented using LinearSVC is sig-nificantly reduced relative to the training time requiredfor SVC SVMs. Nonlinear SVMs can only be imple-mented in SVC, as LinearSVC does not offer the ker-nel trick. In addition to the resulting unfavorable scal-ing of the training wall-clock-time of nonlinear SVMs,the multi-dimensional optimization problem, if taskedto discriminate multiple qubit states, mostly resulted innon-optimal hyperplanes (for five qubits, nonlinear SVMsachieved an average qubit-state-assignment fidelity about10 % worse than the one achieved by its linear counter-part). 
We limit the study of nonlinear SVMs to a basic characterization due to their lack of robustness in qubit-state-assignment fidelity and their training-time requirements (more than one day for five qubits). Henceforth, we focus on linear soft-margin SVMs as parallel single-qubit or multi-qubit discriminators (in the one-versus-all mode).

FIG. 4S. Measurement Data Processing and Discrimination. (a) Processing of the $M$-dimensional data $z_{\mathrm{IF}}[n]$ for single-qubit (SQ) and multi-qubit (MQ) discrimination. For single-qubit discrimination, $z_{\mathrm{IF}}[n]$ is digitally demodulated at the intermediate frequency of a resonator $i$. The resulting signal $z_i[n]$ can be simplified with a boxcar filter (BF) [$\frac{1}{M}\sum_n z_i[n] = \bar{I}_i + j\bar{Q}_i$] or kept as sequences $I_i[n]$ and $Q_i[n]$. The discriminators can be trained either with the spectator qubits exclusively in their ground state (denoted by ∅) or with the spectator qubits in either their ground or excited state (denoted by ∗). For multi-qubit discriminators, the digitally demodulated signals $z_i[n]$ at all resonator frequencies $i$ are stacked. The resulting data block is subsequently used for the discriminator training. Alternatively, the discriminator can be tasked to discriminate $z_{\mathrm{IF}}[n]$ directly without any digital preprocessing. (b) Comparison of the geometric mean qubit-state-assignment fidelity for five qubits after a 1 µs-long measurement and 10,000 training instances per qubit-state configuration. All single-qubit discriminators are evaluated using training data with the spectator qubits either in the ground state or in all combinations of ground and excited states. The matched filter (MF) threshold discriminator [the matched filter is part of the discriminator and thus not shown in (a)] is shown in two configurations: the threshold set to 0 and the threshold optimized. The linear support vector machine (SVM) is applied to boxcar-filtered (BF) and time-trace data of $I_i[n]$ and $Q_i[n]$. The multi-qubit discriminators are evaluated using digitally demodulated and unprocessed data. Shown are a multi-qubit linear SVM, a recurrent neural network (NN), a convolutional NN, and a feedforward NN.
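The preprocessing paths of Fig. 4S(a) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the processing chain used for the experiment: the intermediate frequencies, the complex amplitudes, the noiseless two-tone test signal, and the demodulation sign convention $z_i[n] = z_{\mathrm{IF}}[n]\,e^{-j 2\pi f_i n \Delta t}$ are all hypothetical choices, and the function names `demodulate` and `boxcar` are made up for this example. Only the 2 ns time bin and the 1 µs measurement length are taken from the text.

```python
import numpy as np


def demodulate(z_if, f_if, dt):
    """Shift the tone at intermediate frequency f_if down to DC (assumed sign convention)."""
    n = np.arange(z_if.shape[-1])
    return z_if * np.exp(-2j * np.pi * f_if * n * dt)


def boxcar(z):
    """Boxcar filter: average over the M samples, giving I_bar + j * Q_bar."""
    return z.mean(axis=-1)


dt = 2e-9                                  # 2 ns time bins
n_samples = 500                            # 1 us measurement
freqs = [50e6, 80e6]                       # hypothetical intermediate frequencies (Hz)
amplitudes = [1.0 + 0.5j, -0.3 + 0.8j]     # hypothetical resonator responses

# Synthetic multiplexed signal z_IF[n]: one complex tone per resonator, no noise.
n = np.arange(n_samples)
z_if = sum(a * np.exp(2j * np.pi * f * n * dt) for a, f in zip(amplitudes, freqs))

# Single-qubit path: demodulate at one resonator frequency, then boxcar filter.
iq_points = [boxcar(demodulate(z_if, f, dt)) for f in freqs]

# Multi-qubit path: stack the demodulated traces into a single data block.
data_block = np.stack([demodulate(z_if, f, dt) for f in freqs])
```

With these particular frequencies, each unwanted tone completes an integer number of periods within the window, so the boxcar average recovers the chosen amplitudes up to floating-point error. In real data, noise and non-integer periods leave residual crosstalk.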
3. Neural Networks (NN)
Typically, a neural network consists of an input layer composed of several nodes (the number of nodes depends on the input data dimension) and an output layer that contains the computed output values. Between the input and output layers are layers of neurons, so-called hidden layers because their output values are not directly accessible, with unique tasks per layer. The input and output channels of a neuron are called edges, illustrated in Fig. 2S(a). Each neuron can be described as a mathematical function of incoming weighted parameters, typically output values of other neurons, and external parameters. The function output generally passes through a nonlinear filter before it can serve as an input to other neurons, as depicted in Fig. 2S(b). Varying the connectivity, the neuron functions, and the nonlinear function at each neuron output provides a flexible toolset to engineer a broad spectrum of neural network types. Supervised training of such a network optimizes the weights for each neuron input and external parameter so that the network can approximate almost any function.

We have examined various neural network architectures to determine the most useful one for improving the qubit-state-assignment fidelity and measurement time of multi-qubit devices. We have explored fully-connected feedforward neural networks (FNN), among the most elementary neural networks; convolutional neural networks (CNN), among the most successful image classification methods in use today; and long short-term memory recurrent neural networks (LSTM), among the most successful architectures in language processing. The fully-connected FNN with three hidden layers excelled in assignment fidelity compared to the other neural network types.

[Figure 5S: two confusion-matrix panels with axes Prepared State and Assigned State and color scale Assignment Probability; panel titles $F_N^{\mathrm{MF}} = 0.644$ and $F_N^{\mathrm{FNN}} = 0.691$.]
FIG. 5S. Qubit-State-Assignment Fidelity Analysis. Confusion (assignment probability) matrix of the feedforward neural network (FNN) (a) and matched filter (MF) (b). Each row of the confusion matrix contains the probability distribution with which the discriminator assigns each of the 32 qubit-state configurations to the row's prepared qubit-state configuration (no pulse applied, qubit initialized in the ground state: ∅ → 0; π-pulse applied, qubit initialized in the excited state: π → 1). $F_N$, introduced in Eq. D2, is a metric that indicates the overlap between the confusion matrix and an identity matrix (the ideal confusion matrix). $F_N = 1$ if the confusion matrix is an identity matrix.

Implemented in PyTorch [49], the FNN architecture that yields the highest assignment fidelity for five qubits is composed of three hidden layers. The number of nodes composing the input layer depends on the measurement time and the size of the discrete time bins, here 2 ns. For a 1 µs-long measurement time, the input layer contains 1,000 nodes (500 time bins, each contributing one in-phase and one quadrature value), with the in-phase and quadrature components alternating. The dimension of the first hidden layer is equal to, the second hidden layer is half of, and the third hidden layer is a quarter of the input layer dimension. Finally, the output layer consists of $2^N$ nodes, with $N$ being the number of qubits (32 for the five-qubit readout we focus on here). The activation function, the nonlinear filter acting on the hidden layer nodes, is a scaled exponential linear unit (SELU) [50] instead of the common rectified linear unit (ReLU) [67], due to its improved robustness and learning rate.
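The layer dimensions described above translate into a short PyTorch module. The following is a minimal sketch under stated assumptions rather than the authors' code: the function name `build_fnn`, the default weight initialization, the random input batch, and the choice to apply the softmax outside the module are all hypothetical. Only the layer sizes (1,000, 1,000, 500, 250, 32) and the SELU activation come from the text.

```python
import torch
from torch import nn


def build_fnn(n_inputs: int = 1000, n_qubits: int = 5) -> nn.Sequential:
    """Fully-connected FNN: hidden layers of size n, n/2, n/4 and 2**N outputs."""
    return nn.Sequential(
        nn.Linear(n_inputs, n_inputs),        # first hidden layer: same size as input
        nn.SELU(),
        nn.Linear(n_inputs, n_inputs // 2),   # second hidden layer: half
        nn.SELU(),
        nn.Linear(n_inputs // 2, n_inputs // 4),  # third hidden layer: quarter
        nn.SELU(),
        nn.Linear(n_inputs // 4, 2 ** n_qubits),  # one output node per qubit-state configuration
    )


model = build_fnn()

# Random stand-in for 8 readout traces with alternating I and Q samples.
traces = torch.randn(8, 1000)

# The module returns raw scores; a softmax turns them into assignment probabilities.
probabilities = torch.softmax(model(traces), dim=1)
assigned_state = probabilities.argmax(dim=1)
```

Keeping the softmax outside the module is a common design choice because `nn.CrossEntropyLoss` expects raw scores during training. Whether the original implementation is organized this way is not stated in the text.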
The output layer is filtered using a softmax function, $\mathrm{softmax}(x_i) = \exp(x_i) / \sum_j \exp(x_j)$.

Multiple training cycles, referred to as epochs, are required for the discriminator output to converge to the maximum qubit-state-assignment fidelity. The number of epochs needed to reach a convergence plateau depends on the correction factor per cycle, the learning rate. We start with a more aggressive learning rate of 0.[...]

Appendix D: Result Analysis
In addition to the specific choice of discriminator, the to-be-discriminated data can be prepared in different ways. Typically, the discrete-time readout signals at intermediate frequency, $z_{\mathrm{IF}}[n] = I_{\mathrm{IF}}[n] + jQ_{\mathrm{IF}}[n]$, are digitally demodulated following the steps outlined in Fig. 4S(a) and Ref. [4]. The signal components $I_i[n] = \Re(z_i[n])$ and $Q_i[n] = \Im(z_i[n])$ can be boxcar filtered [4] or kept as sequences $I_i[n]$ and $Q_i[n]$. For digitally demodulated data and multi-qubit discrimination, $z_{\mathrm{IF}}[n]$ is demodulated at each intermediate frequency. The resulting digitally demodulated time traces need to be stacked to form a single data block before being used as the input to the multi-qubit discriminator.

Furthermore, the training data set can be composed of either all permutations of the qubit states or a specific subset. Here, we train discriminators with the qubits not involved in the training process, the spectator qubits, either in all combinations of the ground and excited state (indicated as ∗) or kept in the ground state (denoted by ∅).

We evaluate the comparison for a measurement time of 1 µs, after which four out of five qubits have reached their maximum assignment fidelity for matched filters, as shown in Fig. 3S. For five qubits, a 1 µs-long measurement time, and 10,000 training instances, we show a comparison of the qubit-state-assignment fidelity of the single- and multi-qubit discriminator approaches introduced above in Fig. 4S(b). Optimizing the threshold of MFs and using training data with the spectator qubits in the ground state increases the qubit-state-assignment fidelity. Single-qubit linear SVMs perform best if tasked to discriminate vectorized digitally demodulated data and trained with a data set in which all qubit-state combinations are represented.

Multi-qubit linear SVMs appear to perform better if tasked to discriminate digitally demodulated readout signals. In contrast, the neural networks perform best if unprocessed data is used. The feedforward neural network outperforms its counterparts, the recurrent and convolutional neural networks, in the achieved qubit-state-assignment fidelity.

In the main part of the manuscript, we focus on the best-performing discriminator approach of each category: matched filter, single-qubit linear SVM, multi-qubit linear SVM, and neural networks.

Next, we analyze the qubit-state-assignment probabilities using confusion matrices. Fig. 5S illustrates the confusion matrices for the FNN and MF discriminators. In the ideal case, with every prepared state agreeing with the assigned state, the confusion matrix is an identity matrix. To evaluate the overlap between an identity matrix (entries represented as a Kronecker delta $\delta_{ij}$, with $i$ and $j$ the row and column indices) and a confusion matrix (with entries $c_{ij}$), we propose the following metric based on the Frobenius norm:

$$\|A\|_F = \sqrt{\sum_i \sum_j |c_{ij} - \delta_{ij}|^2}. \tag{D1}$$

To bound the Frobenius norm between 0 and 1, we normalize it by the maximum value of Eq. D1, $\sqrt{2^{N+1}}$. The normalized Frobenius norm is equal to 0 if the confusion matrix is exactly an identity matrix. An alternative representation of this metric, more closely related to fidelities with the best outcome at 1, is

$$F_N = 1 - \frac{\|A\|_F}{\sqrt{2^{N+1}}}. \tag{D2}$$

Using Eq. D2, and as shown in Fig. 5S, the MF achieves $F_N = 0.644$ and the FNN achieves $F_N = 0.691$.
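The normalized Frobenius metric of Eq. D2 can be checked numerically on small examples. The sketch below assumes that each row of the confusion matrix is a probability distribution and that the normalization constant is $\sqrt{2^{N+1}}$, the largest value Eq. D1 can take for a $2^N \times 2^N$ matrix of that kind (every prepared state assigned with certainty to a wrong state). The function name `frobenius_fidelity` and the test matrices are hypothetical and are not data from this work.

```python
import numpy as np


def frobenius_fidelity(confusion):
    """F_N = 1 - ||C - I||_F / sqrt(2 * dim), with dim = 2**N rows and columns."""
    confusion = np.asarray(confusion, dtype=float)
    dim = confusion.shape[0]
    distance = np.linalg.norm(confusion - np.eye(dim), ord="fro")
    return 1.0 - distance / np.sqrt(2.0 * dim)


n_states = 2 ** 5  # 32 qubit-state configurations for five qubits

# Ideal discriminator: the confusion matrix is the identity, so F_N = 1.
ideal = frobenius_fidelity(np.eye(n_states))

# Worst case: every prepared state is assigned with certainty to a wrong state,
# modeled here by cyclically shifting the identity by one column, so F_N = 0.
worst = frobenius_fidelity(np.roll(np.eye(n_states), shift=1, axis=1))

# A made-up single-qubit confusion matrix lies between the two extremes.
partial = frobenius_fidelity([[0.9, 0.1], [0.2, 0.8]])
```

Because the metric squares every deviation from the identity, a few large misassignment probabilities lower $F_N$ more than many small ones with the same total weight.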