Deep Neural Network Discrimination of Multiplexed Superconducting Qubit States

Abstract

Demonstrating a quantum computational advantage will require high-fidelity control and readout of multi-qubit systems. As system size increases, multiplexed qubit readout becomes a practical necessity to limit the growth of resource overhead. Many contemporary qubit-state discriminators presume single-qubit operating conditions or require considerable computational effort, limiting their potential extensibility. Here, we present multi-qubit readout using neural networks as state discriminators. We compare our approach to contemporary methods employed on a quantum device with five superconducting qubits and frequency-multiplexed readout. We find that fully-connected feedforward neural networks increase the qubit-state-assignment fidelity for our system. Relative to contemporary discriminators, the assignment error rate is reduced by up to 25% due to the compensation of system-dependent nonidealities such as readout crosstalk which is reduced by up to one order of magnitude. Our work demonstrates a potentially extensible building block for high-fidelity readout relevant to both near-term devices and future fault-tolerant systems.


Benjamin Lienhard, Antti Vepsäläinen, Luke C. G. Govia, Cole R. Hoffer, Jack Y. Qiu, Diego Ristè, Matthew Ware, David Kim, Roni Winik, Alexander Melville, Bethany Niedzielski, Jonilyn Yoder, Guilhem J. Ribeill, Thomas A. Ohki, Hari K. Krovi, Terry P. Orlando, Simon Gustavsson, and William D. Oliver

Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Quantum Engineering and Computing Group, Raytheon BBN Technologies, Cambridge, MA 02138, USA
MIT Lincoln Laboratory, Lexington, MA 02421, USA

(Dated: February 26, 2021)

I. INTRODUCTION

Quantum computers hold the promise to solve particular computational tasks substantially faster than conventional computers [1, 2]. Depending on the computational task, such quantum devices need to be composed of hundreds to millions of high-fidelity qubits. An increase from a few to many qubits is generally accompanied by the challenge of maintaining low error rates for qubit control and readout.

Over the past two decades, superconducting qubits have emerged as a leading quantum computing platform [3, 4]. Today, individual qubits with coherence times exceeding 100 µs [5], gate times of a few tens of nanoseconds [6], and single- and two-qubit gate operation fidelities above the most lenient thresholds for quantum error correction have been demonstrated for devices with up to 50 qubits [6, 7]. However, considerable work is still needed to retain and even further improve these fidelities as systems increase in size and complexity [8].

Errors arise during all stages of the circuit model: initialization [9, 10], computation [11, 12], and readout [13]. In many implementations, qubit readout plays a key role beyond merely measuring the computational output. For example, quantum error correction protocols require repeated readout of syndrome qubits [8, 14, 15]. Even without error correction, many of the noisy intermediate-scale quantum (NISQ) [16] era algorithms involve an iterative optimization that generates a target quantum state based on prior trial-state measurements of qubits [17, 18]. In addition, diagnosing qubit-readout errors in post-processing requires computationally expensive statistical analyses of repeated computation and measurement [6, 19, 20]. Developing accurate and resource-efficient qubit-state readout is key to realizing useful quantum information processing tasks.

In this work, we present machine-learning-enabled qubit-state discrimination. We evaluate the qubit-state discrimination performance of deep neural networks (DNN) relative to contemporary methods used for superconducting qubits. Nonlinear filters such as DNNs can better cope with system-dependent nonidealities, such as readout crosstalk. To evaluate these different qubit-state discriminator techniques, we use a quantum system comprising five frequency-tunable transmon qubits read out simultaneously via a common feedline using a standard frequency multiplexing approach. In contrast to single-qubit readout, such a multi-qubit system is subject to nonidealities, such as readout crosstalk, that may benefit from more sophisticated discriminators. We show that a DNN classifier can efficiently converge to a higher-performing multi-qubit discriminator with sufficient training. In our five-qubit system, we show that qubit-state assignment errors are reduced by up to 25 % for multi-qubit architectures sharing a readout transmission line [6, 21, 22]. By examining the qubit-state assignment performance using a confusion matrix and the cross-fidelity metric, we attribute the reduction to the DNN compensating for crosstalk.

It has been shown that neural networks can learn the quantum evolution of a single superconducting qubit using merely measurement data and without introducing the rules of quantum mechanics [23]. Statistical learning algorithms have been applied to superconducting qubit readout in the form of support vector machines [20], hidden Markov models [24], or a reservoir computing approach [25]. Using DNNs, improved single-qubit readout fidelity has previously been demonstrated for trapped ions and spin qubits [26–28]. In this manuscript, we extend the application of neural networks to superconducting qubit readout and, more generally, to dispersive qubit readout. Furthermore, we demonstrate DNN-based readout discrimination of multiple qubits read out simultaneously on a single feedline. While we apply our methods to a superconducting qubit system, we anticipate that they will generalize to other platforms.

FIG. 1. Measurement Setup and Chip. (a) Schematic of superconducting qubit control and readout. The control and readout pulses, generated by an arbitrary waveform generator (AWG) and up-converted to GHz frequencies using a local oscillator (LO), are sent through attenuated signal lines to the readout resonator on the five-qubit chip. The transmitted readout signal is amplified by a Josephson traveling-wave parametric amplifier (JTWPA), a high-electron-mobility transistor (HEMT), and a room-temperature amplifier. Subsequently, the signal is down-converted to MHz frequencies and digitized into in-phase $I_{IF}[n]$ and quadrature $Q_{IF}[n]$ sequences at intermediate frequencies (IF). (b) Colored optical micrograph and (c) circuit schematic of the chip comprising five superconducting transmon qubits. The qubit transition frequencies are tuned via a global flux bias. Each qubit is capacitively coupled to a quarter-wave readout resonator that couples inductively to a bandpass (Purcell) filtered feedline. (d) The resonator frequencies $\omega_{Res}/2\pi$ are near 7 GHz with $\chi/\kappa_{eff}$ ratios ranging from 0.12 to 0.19, where $\chi$ and $\kappa_{eff}$ are respectively the dispersive shift and the effective resonator decay rate through the feedline. Table of the qubit lifetimes ($T_1$) and operating frequencies ($\omega_{Qubit}/2\pi$). Qubit colors indicate the qubit operating frequency: red (purple) → lowest (highest) operating frequency.

Resonator / qubit        1       2       3       4       5
ω_Res/2π (GHz)           7.06    7.10    7.15    7.20    7.25
ω_Qubit/2π (GHz)         5.09    4.40    5.00    4.30    5.17
T_1 (µs)                 40.8    6.40    21.4    11.8    23.4

II. SUPERCONDUCTING QUBIT READOUT

Superconducting qubit readout is generally performed today under the paradigm of circuit quantum electrodynamics (cQED) in the dispersive regime [29]. Here, the qubit is coupled to a far-detuned resonator, such that their interaction can be treated perturbatively. The leading-order effect on the resonator is a qubit-state-dependent frequency shift $\hat{H}_{disp} = \chi \hat{a}^{\dagger} \hat{a} \hat{\sigma}_z$, where $\hat{a}$ is the resonator lowering operator, $\hat{\sigma}_z$ the Pauli-Z operator describing the qubit state, and $\chi$ the dispersive frequency shift. As a result, a coherent microwave signal incident on the resonator acquires a qubit-state-dependent phase shift upon transmission or reflection. The readout resonator population has to remain below a critical photon number, typically tens to hundreds of photons, to remain in the dispersive readout regime. Low-noise cryogenic preamplification, using a Josephson traveling-wave parametric amplifier (JTWPA) [30] at the mixing chamber (20 mK) and a high-electron-mobility transistor (HEMT) at 3 K, is used to improve the signal-to-noise ratio (SNR). Subsequent heterodyne detection and digitization of the amplified signal imprints the information of the qubit state in the in-phase ($I$) and quadrature ($Q$) components of the output signal, as depicted in Fig. 1(a).

FIG. 2. Measurement Data Processing and Discrimination. (a) Superconducting qubit-state discrimination can be accomplished using a single-qubit matched filter (MF) with kernel $k_i[n]$, which serves as a windowing function that projects the readout signals onto a single axis, followed by discriminator threshold optimization (no pulse applied, denoted by ∅, qubit initialized in the ground state: ∅ → |0⟩ and labeled as 0; π-pulse applied, denoted by π, qubit initialized in the excited state: π → |1⟩ and labeled as 1). We analyze (b) single-qubit linear support vector machines (SQ-LSVM), (c) multi-qubit LSVMs (MQ-LSVM), and (d) fully-connected feedforward neural networks (NN) as alternatives to MFs. The qubit-state-assignment fidelity of the MF and LSVM is maximized if the intermediate-frequency signal ($z_{IF}[n] = I_{IF}[n] + j Q_{IF}[n]$) is digitally demodulated (e.g., for resonator 1: $z_{IF}[n] \mathbin{.*} e^{-j \omega_{IF1} n} = I_1[n] + j Q_1[n]$, with .* indicating an element-wise multiplication). The training data is relabeled to train five parallel single-qubit discriminators (MF, SQ-LSVM). The training data can either be limited to measurements during which spectator qubits are kept in their ground state (denoted by ∅) or include all combinations of the ground and excited states (symbolized by ∗). The MQ-LSVM as a single multi-qubit discriminator requires the digitally demodulated data to be stacked and concatenated to form a single data block. The feedforward NN does not require any digital demodulation or preprocessing.

For multi-qubit systems, there are three main qubit-state-readout approaches. First, each qubit can be measured with a separate readout resonator, feedline, and amplifier chain, a resource-intensive approach with minimal crosstalk. Alternatively, more resource-efficient readout architectures have several qubits coupled to a single readout resonator [31] or use frequency-multiplexed readout signals from multiple readout resonators [32] sharing a single feedline and amplifier chain [33]. In many contemporary architectures, Purcell filters are added to further reduce residual off-resonant energy decay from the qubits to the resonators [34, 35].

For a qubit with static coupling to its readout resonator, energy decay and excitation during the readout are typically the primary sources of qubit measurement errors. In addition, a frequency-multiplexed readout signal contains state information on multiple qubits and is susceptible to crosstalk-induced qubit-state-readout errors.
Such crosstalk errors occur due to intrinsic interactions between the qubits themselves, qubits coupling parasitically to the readout resonators associated with other qubits, or insufficient spectral separation between readout frequencies [21].

As a result of crosstalk, state transitions due to decoherence, and other nonidealities [36], multi-qubit heterodyne signals are more complicated than those of single qubits, making state discrimination more challenging. There has been significant progress in reducing error rates and measurement times for both single- and multi-qubit devices [21, 37]. However, managing, classifying, and extracting useful information from the measured signal remains an important challenge in light of the complex error mechanisms, such as crosstalk, introduced by multiplexed readout at scale.

Here, we focus on multiple frequency-tunable transmon qubits [38] arranged in a linear array with operating frequencies $\omega_{Qubit}/2\pi$ between 4.3 GHz and 5.2 GHz and lifetimes $T_1$ ranging from 7 µs to 40 µs (see supplementary information [39] for additional details). The qubits are connected via individual coplanar waveguide resonators to the same Purcell-filtered feedline, as depicted in Fig. 1(b,c). The frequency-multiplexed readout tone comprises superposed baseband signals at intermediate frequencies (IF) between 10 MHz and 150 MHz, up-converted to the individual readout resonator frequencies $\omega_{Res}$. After passing the feedline, the transmitted and phase-shifted tones are down-converted to IF. Up- and down-conversion is conducted with a shared local oscillator at 7.127 GHz. Lastly, the down-converted $I$- and $Q$-components of the signal are digitized with a 2 ns sampling period. The resulting sequences, $I_{IF}[n]$ and $Q_{IF}[n]$, are subsequently digitally processed (the focus of this work) to extract the individual qubit states.

III. QUBIT-STATE DISCRIMINATION

We employ supervised machine learning methods to improve superconducting qubit-state readout. This requires a classifier capable of distinguishing the qubit-state-dependent phase shift encoded in the discrete-time $I_{IF}[n]$ and $Q_{IF}[n]$ sequences. This section also reviews the current approaches to state discrimination, which we use as comparative benchmarks.

Boxcar filters average the equally weighted, digitally demodulated elements of the $I_{IF}[n]$ and $Q_{IF}[n]$ discrete-time readout signal. The digital demodulation employed here is further elaborated in the supplementary material [39]. Each boxcar-filtered, digitally demodulated sequence $I[n]$ and $Q[n]$ results in a single two-dimensional data point in the $IQ$-plane [4]. Subsequently, the resulting data set can be further processed and discriminated, for example with a support vector machine (see supplementary materials [39]).

Matched filter (MF) windows are generalized windowing functions with each element optimized to maximize the SNR within a given system noise model [40]. The boxcar window is the simplest example of a filter in the absence of such a noise model. For additive stationary noise independent of the qubit state and diagonal Gaussian covariance matrices, the optimal filter in terms of the SNR uses a “window” or “kernel” proportional to the difference between the mean ground- and excited-state readout signals, referred to as a “matched filter” in Ref. [41], a “mode matched filter” in Ref. [21], or “Fisher’s linear discriminant” in the context of statistics and machine learning [42]. Applying such a matched filter reduces each single-shot readout measurement to a single one-dimensional value dependent on the qubit-state-dependent phase, allowing the qubit states to be discriminated by a simple threshold classifier.
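
The processing chain reviewed above (digital demodulation, boxcar averaging, and a matched-filter projection followed by a threshold) can be sketched in a few lines. The example below is a minimal, dependency-free illustration on synthetic single-qubit traces and is not the code used in this work. The function names (`demodulate`, `boxcar`, `mf_kernel`, `mf_project`), the toy IF frequency, the assumed phase shift of ±0.4 rad, the noise level, and the midpoint threshold are all assumptions made for illustration. The kernel is taken proportional to the difference of the mean traces; no variance normalization is applied because the synthetic noise is stationary.

```python
import cmath
import random

def demodulate(z_if, omega_if):
    """Element-wise multiply by exp(-j*omega_if*n) to shift the tone at omega_if to DC."""
    return [z * cmath.exp(-1j * omega_if * n) for n, z in enumerate(z_if)]

def boxcar(z):
    """Equal-weight average: one complex point (I + jQ) per trace."""
    return sum(z) / len(z)

def mean_trace(traces):
    """Sample-wise mean over a list of equal-length traces."""
    return [sum(col) / len(traces) for col in zip(*traces)]

def mf_kernel(ground_traces, excited_traces):
    """Kernel proportional to the mean excited- minus mean ground-state trace."""
    g, e = mean_trace(ground_traces), mean_trace(excited_traces)
    return [ei - gi for gi, ei in zip(g, e)]

def mf_project(z, kernel):
    """Project a demodulated trace onto the kernel, giving a single real number."""
    return sum((k.conjugate() * zi).real for k, zi in zip(kernel, z))

def synthetic_trace(state, rng, n=200, omega_if=0.3, noise=1.0):
    """Toy IF trace: the qubit state sets the phase of the tone (assumed +/-0.4 rad)."""
    phase = 0.4 if state else -0.4
    return [cmath.exp(1j * (omega_if * k + phase))
            + complex(rng.gauss(0, noise), rng.gauss(0, noise)) for k in range(n)]

rng = random.Random(0)
OMEGA = 0.3  # assumed toy intermediate frequency in rad/sample

# Calibration runs with known qubit states (0: no pulse, 1: pi-pulse).
train = {s: [demodulate(synthetic_trace(s, rng), OMEGA) for _ in range(200)]
         for s in (0, 1)}
kernel = mf_kernel(train[0], train[1])

# Midpoint between the projected class means, a simple stand-in for the
# optimized threshold described in the text.
mean0 = sum(mf_project(z, kernel) for z in train[0]) / len(train[0])
mean1 = sum(mf_project(z, kernel) for z in train[1]) / len(train[1])
threshold = 0.5 * (mean0 + mean1)

def classify(z_if):
    """Demodulate, apply the matched filter, and threshold: returns label 0 or 1."""
    return int(mf_project(demodulate(z_if, OMEGA), kernel) > threshold)

test = [(s, synthetic_trace(s, rng)) for s in (0, 1) for _ in range(200)]
accuracy = sum(classify(z) == s for s, z in test) / len(test)
print("boxcar IQ point of one excited-state shot:", boxcar(train[1][0]))
print(f"toy assignment accuracy: {accuracy:.3f}")
```

The boxcar output is a single complex number per shot (a point in the $IQ$-plane), whereas the matched-filter projection is a single real number per shot, which is why a scalar threshold suffices in the second case.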
Here, we refer to a discriminator composed of a matched filter [41] and a subsequently optimized threshold as MF.

While MFs are computationally efficient and provably optimal (for stationary noise) for single qubits, the computational complexity to derive multi-qubit MFs scales exponentially in the number of qubits, $N$ [43]. Consequently, in practice, multi-qubit readout is conducted per qubit with individually optimized single-qubit MFs. This approach is used for many contemporary single- and multi-qubit readout schemes [6, 21, 41, 44, 45], and it does not account for noise sources and nonidealities present in multi-qubit systems.

The MF kernel $k_i[n]$ is equal to the difference between the mean ground- and excited-state readout signal normalized by its standard deviation, which must be measured experimentally using calibration runs with known qubit states. In our setup, the highest qubit-state-assignment fidelity for MFs is achieved using time traces recorded with the other qubits (spectator qubits) initialized in their ground states, as depicted in Fig. 2(a). This is a consequence of the simple noise model presumed for the MF, and thus the MF discriminator does not capture multi-qubit readout crosstalk. In this paper, we use the MF as a baseline against which to compare the following methods (see the supplementary materials [39] for other variations of all the methods).

Support vector machines (SVM) are quadratic programs [46, 47] with the objective to maximize the distance between each data point and a decision boundary, a learned hyperplane separating two distinct classes. SVMs are a purely geometric approach to discrimination. For a single superconducting qubit, it has been reported that SVMs generate decision boundaries superior to those of MFs, as realistic noise deviates from the simple single-qubit noise model assumed for the MF [20].

Similar to the MF approach, multi-qubit-state discrimination can be conducted using an SVM classifier per qubit-readout signal.
In contrast to our MF tune-up, we find that the highest assignment fidelity is achieved when the SVMs are trained using qubit-state measurement traces with the spectator qubits prepared in all combinations of ground and excited states.

Alternatively, multi-qubit states can be discriminated by a single SVM composed of several hyperplanes that partition the full multidimensional $IQ$-space, shown in Fig. 2(c). Such a multi-qubit SVM can be tuned using a “one-versus-all” strategy. We solve $2^N$ ($N$, the number of qubits) two-class discrimination problems with a single qubit state as one class and the remaining qubit states as the other. In our analysis, linear SVMs (LSVM) used as parallel single- and multi-qubit discriminators outperform their nonlinear counterparts in robustness, computational efficiency, and assignment fidelity [39].

Deep neural networks (DNN) are mapping functions composed of arbitrarily connected nodes arranged in layers [48]. Depending on the layer organization and the functions governing the connections between nodes, different neural network archetypes can be generated. Here, we investigate three of the most common and successful DNNs: fully-connected feedforward neural networks, convolutional neural networks, and recurrent neural networks. We find that a fully-connected feedforward neural network (FNN), implemented in PyTorch [49], outperforms the other network architectures in qubit-state-assignment fidelity. Our FNN architecture is composed of three hidden layers (the 1st, 2nd, and 3rd layers consist of 1000, 500, and 250 nodes, respectively) that use SELU activation functions [50], and a softmax applied to the $2^N$-node output layer. The network is trained (validation-training set ratio of 0.35) using the Adam optimizer [51] with categorical cross-entropy as the loss function.

In contrast to the MF and LSVM, the FNN can directly discriminate the frequency-multiplexed multi-qubit readout sequences $I_{IF}[n]$ and $Q_{IF}[n]$ without demodulation or filtering.
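
To make the layer structure described above concrete, the following sketch implements the forward pass of a fully-connected network with SELU hidden activations and a softmax output in plain Python. It is an illustration only: the network in this work is implemented in PyTorch and trained with Adam and a cross-entropy loss, neither of which is reproduced here. The input layout (the $I$ and $Q$ samples concatenated into one vector of length twice the number of samples), the weight initialization, and all function names are assumptions, and the weights below are random rather than trained. The forward pass is run on a deliberately shrunken network to keep the example fast.

```python
import math
import random

# Standard SELU constants.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def selu(x):
    return SELU_SCALE * (x if x > 0 else SELU_ALPHA * (math.exp(x) - 1.0))

def softmax(v):
    m = max(v)  # subtract the maximum for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def layer_sizes(n_samples, n_qubits, hidden=(1000, 500, 250)):
    """Input: I and Q samples concatenated (assumed layout); output: 2**N classes."""
    return [2 * n_samples, *hidden, 2 ** n_qubits]

def count_parameters(sizes):
    """Weights plus biases of a fully-connected network with these layer sizes."""
    return sum(a * b + b for a, b in zip(sizes, sizes[1:]))

def init_network(sizes, rng):
    """Random weights with std 1/sqrt(fan_in) and zero biases (an assumed choice)."""
    return [([[rng.gauss(0, 1 / math.sqrt(a)) for _ in range(a)] for _ in range(b)],
             [0.0] * b) for a, b in zip(sizes, sizes[1:])]

def forward(network, x):
    """SELU on every hidden layer, softmax on the output layer."""
    for idx, (weights, biases) in enumerate(network):
        x = [sum(w * xi for w, xi in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if idx < len(network) - 1:
            x = [selu(v) for v in x]
    return softmax(x)

def to_bitstring(class_index, n_qubits):
    """Map an output-node index to an N-bit qubit-state label."""
    return format(class_index, f"0{n_qubits}b")

# Layer sizes for five qubits and 1 us of data at a 2 ns sampling period.
full = layer_sizes(n_samples=500, n_qubits=5)
print(full, count_parameters(full))

# Forward pass on a shrunken, untrained network (two qubits, eight samples).
rng = random.Random(1)
tiny = layer_sizes(n_samples=8, n_qubits=2, hidden=(6, 5, 4))
net = init_network(tiny, rng)
probs = forward(net, [rng.gauss(0, 1) for _ in range(tiny[0])])
label = to_bitstring(max(range(len(probs)), key=probs.__getitem__), n_qubits=2)
print(probs, label)
```

Under the assumed input layout, the full-size network has roughly 1.6 million trainable parameters, most of them in the first hidden layer.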
Training the network directly on the multiplexed readout signal bypasses the need for further preprocessing stages, suggesting a more efficient use of the measurement output, as illustrated in Fig. 2(d). In addition, fewer independent operations in the readout chain may reduce the possibility of systematic errors.

IV. RESULTS

We now present our five-qubit readout experiment results, comparing the performance of parallelized single-qubit MFs, parallelized single-qubit LSVMs (SQ-LSVM), multi-qubit LSVM (MQ-LSVM), and FNN approaches. The same qubit-readout sequences $I_{IF}[n]$ and $Q_{IF}[n]$, with varying amounts of preprocessing [Fig. 2], are used for all approaches. We compare the discrimination results, a five-bit string with each bit representing the assigned state of a qubit. The qubit-state-assignment fidelity for qubit $i$ is

$$\mathcal{F}_i = 1 - [P(0_i | \pi_i) + P(1_i | \emptyset_i)]/2, \qquad (1)$$

where $P(0_i | \pi_i)$ is the conditional probability of assigning the ground state with label 0 to qubit $i$ when prepared in the excited state with a π-pulse applied, and $P(1_i | \emptyset_i)$ is the conditional probability of assigning the excited state with label 1 to qubit $i$ when prepared in the ground state (no pulse applied: ∅).

The data to train and evaluate the discriminator performance was acquired using the five-qubit chip introduced in Fig. 1(b,c). For five qubits, all 32 qubit-state permutations are sequentially initialized and the measurement output is recorded. The generated data set contains 50,000 single-shot sequences $I_{IF}[n]$ and $Q_{IF}[n]$ recorded over 2 µs for each qubit-state configuration. The recorded data set is subsequently divided into a randomized training and test set (15,000 traces per qubit-state configuration for training and 35,000 for testing). All of the following results are evaluated using 35,000 single-shot measurements per qubit-state configuration.

We quantify the assignment fidelity per qubit using the geometric mean assignment fidelity,

$$\mathcal{F}_{GM} = (\mathcal{F}_1 \mathcal{F}_2 \mathcal{F}_3 \mathcal{F}_4 \mathcal{F}_5)^{1/5}, \qquad (2)$$

with each qubit-state-assignment fidelity defined by Eq. 1. Both SVM approaches improve the assignment fidelity relative to the MF, with the parallelized single-qubit SVM outperforming the multi-qubit approach by 0.[…] […] µs-measurement time.
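
Equations 1 and 2 translate directly into a few lines of code. The sketch below estimates the two conditional probabilities of Eq. 1 from lists of prepared and assigned bitstrings and then forms the geometric mean of Eq. 2. The function names and the eight hand-written two-qubit shots are hypothetical and serve only to make the bookkeeping explicit; they are not data from this experiment.

```python
import math

def assignment_fidelity(prepared, assigned, qubit):
    """Eq. 1 for one qubit: 1 - [P(0|pi) + P(1|no pulse)] / 2."""
    n_pi = n_0 = wrong_pi = wrong_0 = 0
    for p, a in zip(prepared, assigned):
        if p[qubit] == "1":          # qubit prepared with a pi-pulse
            n_pi += 1
            wrong_pi += a[qubit] == "0"
        else:                        # qubit prepared without a pulse
            n_0 += 1
            wrong_0 += a[qubit] == "1"
    return 1.0 - 0.5 * (wrong_pi / n_pi + wrong_0 / n_0)

def geometric_mean_fidelity(fidelities):
    """Eq. 2 generalized to any number of qubits."""
    return math.prod(fidelities) ** (1.0 / len(fidelities))

# Hypothetical two-qubit shots: each string holds one bit per qubit.
prepared = ["00", "00", "01", "01", "10", "10", "11", "11"]
assigned = ["00", "00", "01", "00", "10", "00", "11", "11"]

per_qubit = [assignment_fidelity(prepared, assigned, q) for q in range(2)]
f_gm = geometric_mean_fidelity(per_qubit)
print(per_qubit, f_gm)
```

In this toy data set each qubit is misassigned once out of four excited-state preparations and never out of four ground-state preparations, so both per-qubit fidelities and their geometric mean equal 0.875.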
For multi-class discriminators such as the MQ-LSVM, geometric constraints result in ambiguous regions without a unique class assigned [52], which leads to poor performance relative to the other approaches. After a 1 µs-long measurement time, the FNN, compared to the MF, increases the qubit-state-assignment fidelity from 0.885 to 0.913 and thus reduces the single-qubit assignment error $[1 - (1 - \mathcal{F}_{FNN})/(1 - \mathcal{F}_{MF})]$ by 0.244. Compared to the SQ-LSVM, the FNN increases the qubit-state-assignment fidelity from 0.905 to 0.913 and thus reduces the single-qubit assignment error by 0.[…] […] µs-measurement time and 10,000 training samples per qubit-state configuration.

The assignment fidelity per qubit, discriminated individually and in parallel with up to $N = 5$ qubits, is presented in Fig. 3(c). For $N$-qubit discrimination tasks with $N > 2$, the FNN starts outperforming its discriminator alternatives. Except for qubit 2, the per-qubit assignment fidelity decreases with an increasing number of discriminated qubits. We observe a more substantial assignment fidelity decrease if the resonators involved in the discrimination are proximal in frequency, suggesting the occurrence of readout crosstalk. In addition to readout crosstalk, qubit 3 reveals control crosstalk with qubits 1 and 5, the qubits closest in frequency.

Under the assumption of additive stationary noise independent of the qubit state and diagonal Gaussian covariance matrices, the estimated upper qubit-state-assignment fidelity bounds per qubit for MFs [20], including the label confidence [39], are $\mathcal{F}^{MF}_1 \approx$ […], $\mathcal{F}^{MF}_2 \approx$ […], $\mathcal{F}^{MF}_3 \approx$ […], $\mathcal{F}^{MF}_4 \approx 0.95$, and $\mathcal{F}^{MF}_5 \approx$ […]. $\mathcal{F}^{MF}_2$ is primarily reduced due to $T_1$-events and limited qubit-state separation in the $IQ$-plane (see supplementary information [39] for additional details). When tasked to discriminate a single qubit, the different discriminators yield a similar assignment fidelity within a few tenths of a percent of the upper MF assignment fidelity bound (except for qubit 2, where it is off by a few percent), as shown in Tab. I. The small discrepancy between this upper bound and the achieved assignment fidelity suggests that the noise sources affecting single-qubit readout in our devices are reasonably well approximated by additive stationary noise independent of the qubit state and diagonal Gaussian covariance matrices. As the number of simultaneously discriminated qubits increases, the assignment fidelity increasingly deviates from $\mathcal{F}^{MF}_i$, revealing system dynamics unaccounted for by the Gaussian noise model.

FIG. 3. Qubit-State-Assignment Fidelity. (a) Geometric mean qubit-state-assignment fidelity $\mathcal{F}_{GM}$ (Eq. 2) for five qubits versus measurement time for the matched filter (MF), single-qubit linear support vector machine (SQ-LSVM), multi-qubit linear SVM (MQ-LSVM), and the fully-connected feedforward neural network (FNN). (b) $\mathcal{F}_{GM}$ versus the number of training instances for each of the 32 qubit-state configurations evaluated after a measurement time of 1 µs [vertical dashed-dotted line in (a)]. (c) Achievable assignment fidelity $\mathcal{F}_{assignment}$ per qubit when $N = \{1, 2, \ldots, 5\}$ qubits are simultaneously discriminated after a 1 µs-measurement time. For each $N$-qubit discrimination task, the spectator qubits are initialized in their ground state. Single-qubit discrimination ($N = 1$): the first data point of each of the five panels represents the single-qubit $\mathcal{F}_{assignment}$ defined by Eq. 1, while the states of the four spectator qubits are not discriminated and initialized in their ground state. When employed as single-qubit discriminators, all methods perform similarly. Two-qubit discrimination ($N = 2$): the following four data points show $\mathcal{F}_{assignment}$ when the state of each panel’s qubit is simultaneously discriminated with the state of one other qubit. $N$-qubit discrimination ($N > 2$): […] For each $N$-qubit discrimination task, the non-spectator qubits are indicated with a colored square at the graph bottom.

TABLE I. Qubit-assignment fidelity if discriminated individually (first value per qubit, "ind.") and in parallel with all other qubits (second value per qubit, "par."). The last five columns present the assignment fidelity for an $N$-qubit discrimination process with $N = \{1, 2, \ldots, 5\}$. $\langle \mathcal{F}_{NQ} \rangle$ represents the mean assignment fidelity of all qubit permutations. The single-qubit assignment fidelity is similar for all discriminator approaches. For a two-qubit discrimination task, the SQ-LSVM and FNN outperform the MF and MQ-LSVM. For $N$-qubit discrimination tasks with $N > 2$, the FNN outperforms all other methods.

           Qubit 1        Qubit 2        Qubit 3        Qubit 4        Qubit 5
           ind.   par.    ind.   par.    ind.   par.    ind.   par.    ind.   par.    ⟨F_1Q⟩  ⟨F_2Q⟩  ⟨F_3Q⟩  ⟨F_4Q⟩  ⟨F_5Q⟩
MF         0.971  0.968   0.740  0.719   0.962  0.914   0.946  0.934   0.976  0.967   0.9185  0.9100  0.9042  0.8993  0.8946
SQ-LSVM    0.970  0.969   0.740  0.744   0.963  0.924   0.951  0.943   0.976  0.968   […]
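
The relative error reduction quoted in this section, $1 - (1 - \mathcal{F}_{FNN})/(1 - \mathcal{F}_{MF})$, can be checked against the fidelities stated in the text. The short sketch below does this arithmetic; the function name is hypothetical, and the inputs are the three-digit rounded geometric-mean fidelities after a 1 µs measurement (0.885 for the MF, 0.905 for the SQ-LSVM, 0.913 for the FNN), so the results carry the same rounding uncertainty.

```python
def relative_error_reduction(f_new, f_ref):
    """Fraction of the reference assignment error removed: 1 - (1 - f_new)/(1 - f_ref)."""
    return 1.0 - (1.0 - f_new) / (1.0 - f_ref)

# Rounded fidelities quoted in the text for a 1 us measurement time.
F_MF, F_SQ_LSVM, F_FNN = 0.885, 0.905, 0.913

vs_mf = relative_error_reduction(F_FNN, F_MF)
vs_svm = relative_error_reduction(F_FNN, F_SQ_LSVM)
print(f"FNN vs MF: {vs_mf:.3f}, FNN vs SQ-LSVM: {vs_svm:.3f}")
```

With these rounded inputs the reduction relative to the MF comes out near 0.24, in line with the quoted 0.244 once the rounding of the fidelities is taken into account, and the reduction relative to the SQ-LSVM comes out near 0.08.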

The confusion matrix, a matrix $P_{assign}$ with the qubit-state-assignment probability distribution for each prepared qubit-state configuration as rows, provides further insight into the underlying error mechanisms. The confusion matrix is the identity matrix if each prepared state is correctly labeled and assigned. In practice, in addition to misclassification, the preparation of states can be imperfect. We estimate the mean state preparation fidelities for each qubit [39]: $\mathcal{F}^{prep}_1 \approx$ […], $\mathcal{F}^{prep}_2 \approx$ […], $\mathcal{F}^{prep}_3 \approx$ […], $\mathcal{F}^{prep}_4 \approx$ […], $\mathcal{F}^{prep}_5 \approx$ […]. […] $P^{FNN}_{assign}$ and $P^{MF}_{assign}$, shown in Fig. 4(a). The FNN generally reduces the erroneous off-diagonal assignment probabilities relative to the MF. The most significant exception is the set of lower off-diagonal elements corresponding to decay of qubit 2, as presented in Fig. 4(b).

Deviations from the ideal confusion matrix occur due to initialization errors, state transitions during the measurement, or readout crosstalk. Typically, the qubit-state misclassifications in the lower off-diagonal block outweigh those of the upper off-diagonal block due to the greater likelihood of decay events at cryogenic temperatures. Here, for a 1 µs-long measurement, qubit 2, the qubit with the shortest lifetime, has a 15 % probability of $T_1$-decay, such that for a significant portion of the training measurements with qubit 2 excited, the final state of qubit 2 is the ground state.

As shown in Fig. 4(b), the FNN is more likely to assign a ground-state label to qubit 2 than an excited-state label, whereas the MF reveals the reverse trend. This suggests that the assignment probabilities of the FNN agree better with the expected error model. However, we can attribute the pattern of the MF assignment probability to a training bias.
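
As a concrete illustration of the confusion matrix and of the decay probability quoted for qubit 2, the sketch below builds a row-normalized confusion matrix from prepared and assigned bitstrings and evaluates $1 - e^{-t/T_1}$. The function names and the sixteen two-qubit shots are hypothetical; the simple exponential-decay model is an assumption, and the lifetime of 6.40 µs is the value listed for qubit 2 in Fig. 1(d).

```python
import math
from itertools import product

def confusion_matrix(prepared, assigned, n_qubits):
    """Rows: prepared configuration; columns: assigned configuration (probabilities)."""
    labels = ["".join(bits) for bits in product("01", repeat=n_qubits)]
    index = {label: k for k, label in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for p, a in zip(prepared, assigned):
        counts[index[p]][index[a]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return labels, matrix

def decay_probability(t_meas, t1):
    """Probability of a T1 decay within t_meas, assuming simple exponential decay."""
    return 1.0 - math.exp(-t_meas / t1)

# Hypothetical two-qubit shots, four per prepared configuration.
prepared = ["00"] * 4 + ["01"] * 4 + ["10"] * 4 + ["11"] * 4
assigned = ["00", "00", "00", "01",
            "01", "01", "01", "00",
            "10", "10", "00", "10",
            "11", "11", "11", "10"]

labels, matrix = confusion_matrix(prepared, assigned, n_qubits=2)
for label, row in zip(labels, matrix):
    print(label, row)

p_decay = decay_probability(t_meas=1.0, t1=6.40)  # times in microseconds
print(f"decay probability within 1 us: {p_decay:.3f}")
```

A perfect discriminator with perfect state preparation would give the identity matrix; here every row has 0.75 on the diagonal. Under the assumed exponential model the decay probability for a 6.40 µs lifetime and a 1 µs measurement is about 14.5 %, consistent with the 15 % quoted above.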
The training bias arises because measurements with qubit 2 prepared in the excited state and corrupted by a $T_1$-decay have integrated signals similar to measurements with qubit 2 prepared in the ground state. The threshold optimizer therefore overcompensates to correctly classify $T_1$-decay-corrupted excited-state measurements at the cost of misclassifying ground-state measurements. This results in the misclassification pattern seen in Fig. 4(b) for $P^{MF}_{assign}$.

From the confusion matrix, we can further extract the probability distribution of the non-zero Hamming distance. This is the probability distribution describing the number of misassigned qubits per qubit-state configuration. The assignment errors of the FNN (MF) are 85.[…] […]

[…] $\mathcal{F}^{CF}_{ij}$ is defined as

$$\mathcal{F}^{CF}_{ij} = \langle 1 - [P(1_i | \emptyset_j) + P(0_i | \pi_j)] \rangle, \qquad (3)$$

where $\emptyset_j$ ($\pi_j$) represents the preparation of qubit $j$ in the ground (excited) state and $0_i$ ($1_i$) the subsequent assignment of qubit $i$ to the ground (excited) state ($\langle f \rangle$ denotes the mean value of a function $f$). A positive (negative) off-diagonal entry indicates a correlation (anti-correlation) between the two qubits. Such correlations can occur due to readout crosstalk. The off-diagonal entries for the FNN are all less than one percent and are drastically reduced relative to the MF. Relative to the MF, the mean cross-fidelity, $\langle |\mathcal{F}^{CF}_{ij}| \rangle$, for nearest neighbors ($j = i \pm 1$) is reduced by one order of magnitude from $\langle |\mathcal{F}^{MF,CF}_{j = i \pm 1}| \rangle = 0.020$ to $\langle |\mathcal{F}^{FNN,CF}_{j = i \pm 1}| \rangle = 0.$[…] […] for all $j \neq i$, as presented in Tab. II. The FNN’s reduction of assignment correlations by up to one order of magnitude corroborates the claim that the FNN diminishes readout-crosstalk-related discrimination errors.

TABLE II. Mean absolute value, $\langle | \cdot | \rangle$, of the qubit-state-assignment correlations between readout resonators $i$ and $j$ ($i \neq j$) extracted from the cross-fidelity matrix $\mathcal{F}^{CF}$ when using a MF or FNN discriminator.

        ⟨|F^CF_{j=i±1}|⟩   ⟨|F^CF_{j=i±2}|⟩   ⟨|F^CF_{j=i±3}|⟩   ⟨|F^CF_{j=i±4}|⟩
MF      0.020              0.015              0.006              ∼[…]
[…]

FIG. 4. Assignment Fidelity Analysis. (a) Difference between the confusion (assignment probability) matrix of the feedforward neural network (FNN) $P^{FNN}_{assign}$ and of the matched filter (MF) $P^{MF}_{assign}$. The rows of the confusion matrix encompass the discriminator’s probability distribution to assign each of the 32 qubit-state configurations to the row’s prepared qubit-state configuration (no pulse applied, qubit initialized in the ground state: ∅ → 0; π-pulse applied, qubit initialized in the excited state: π → 1). […] Remaining panel labels: (b) assigned state versus prepared state; (c) cross-fidelity by prepared and assigned qubit for the FNN and MF.

V. CONCLUSION

We have demonstrated an approach to multi-qubit readout using neural networks as multi-qubit state discriminators that is more crosstalk-resilient than other contemporary approaches. We find that a fully-connected FNN increases the readout assignment fidelity for a multi-qubit system compared to contemporary methods. We observe that the FNN compensates for system nonidealities such as readout crosstalk more effectively than alternatives such as matched filters (MFs) or support vector machines (SVMs). The assignment error rate is diminished by up to 25 % and crosstalk-induced discrimination errors are suppressed by up to one order of magnitude. The relative assignment fidelity improvement of the FNN over its contemporary alternatives grows as the number of simultaneously read out and multiplexed qubits increases.

While FNNs are initially more resource-intensive in training, their re-calibration can be significantly more efficient due to transfer learning [53]. Periodic re-calibration of control and readout parameters is necessary as quantum systems drift in time. For a marginal drift, neural networks can be updated at a fraction of the initial resource requirements. Furthermore, to speed up qubit readout, the techniques developed here can be transitioned to dedicated hardware such as field-programmable gate arrays (FPGAs) [27].

We have tested our FNN multi-qubit-state discrimination approach on a quantum system with five superconducting qubits and frequency-multiplexed readout. While the readout fidelity of Qubit 2 was relatively marginal, four qubits revealed multi-qubit readout fidelities comparable with contemporary multi-qubit systems, albeit with measurement times around 1 µs (see supplementary information [39] for additional details), much longer than the state of the art of 100 ns for single-qubit systems [37]. We demonstrated an improvement using the FNN for all qubits.

The next step is to test the performance of FNNs on higher-fidelity multi-qubit systems with measurement times below 100 ns to assess whether the advantage is retained on already high-performing devices. FNNs offer a readout-state discrimination approach tailored to the underlying system. They can be readily applied to more general discrimination tasks than we have considered here, such as multi-level readout in a qudit architecture [54–57]. This work presents a potential building block for scaling quantum processors while maintaining high-fidelity readout.

ACKNOWLEDGEMENTS

We want to express our appreciation to Mirabella Pulido and Chihiro Watanabe for administrative assistance. This research was funded in part by the DARPA Polyplexus grant No. HR00112010001; by the U.S. Army Research Office (ARO) Multidisciplinary University Research Initiative (MURI) W911NF-18-1-0218; and by the Department of Defense via Lincoln Laboratory under Air Force Contract No. FA8721-05-C-0002. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of DARPA or the US Government.

[1] L. K. Grover, A fast quantum mechanical algorithm for database search, Proceedings, 28th Annual ACM Symposium on the Theory of Computing, 212 (1996).
[2] P. Shor, Polynomial-Time Algorithms for Prime Factorization and Discrete Logarithms on a Quantum Computer, Proceedings of the 37'th Annual Symposium on Foundations of Computer Science (FOCS) (IEEE Press, Burlington, VT, 1996).
[3] X. Gu, A. F. Kockum, A. Miranowicz, Y.-X. Liu, and F. Nori, Microwave photonics with superconducting quantum circuits, Physics Reports, 1 (2017).
[4] P. Krantz, M. Kjaergaard, F. Yan, T. P. Orlando, S. Gustavsson, and W. D. Oliver, A quantum engineer's guide to superconducting qubits, Appl. Phys. Rev., 021318 (2019).
[5] X. Y. Jin, A. Kamal, A. P. Sears, T. Gudmundsen, D. Hover, J. Miloshi, R. Slattery, F. Yan, J. Yoder, T. P. Orlando, S. Gustavsson, and W. D. Oliver, Thermal and residual excited-state population in a 3d transmon qubit, Phys. Rev. Lett., 240501 (2015).
[6] F. Arute, K. Arya, R. Babbush, et al., Quantum supremacy using a programmable superconducting processor, Nature, 505 (2019).
[7] M. Kjaergaard, M. E. Schwartz, J. Braumüller, P. Krantz, J. I.-J. Wang, S. Gustavsson, and W. D. Oliver, Superconducting qubits: Current state of play, Annual Review of Condensed Matter Physics (2019).
[8] J. M. Gambetta, J. M. Chow, and M. Steffen, Building logical qubits in a superconducting quantum computing system, npj Quantum Inf., 350 (2017).
[9] D. Ristè, J. G. van Leeuwen, H.-S. Ku, K. W. Lehnert, and L. DiCarlo, Initialization by Measurement of a Superconducting Quantum Bit Circuit, Phys. Rev. Lett. (2012).
[10] J. E. Johnson, C. Macklin, D. H. Slichter, R. Vijay, E. B. Weingarten, J. Clarke, and I. Siddiqi, Heralded State Preparation in a Superconducting Qubit, Phys. Rev. Lett. (2012).
[11] M. D. Reed, L. DiCarlo, S. E. Nigg, L. Sun, L. Frunzio, S. M. Girvin, and R. J. Schoelkopf, Realization of three-qubit quantum error correction with superconducting circuits, Nature, 382 (2012).
[12] R. Barends, J. Kelly, A. Megrant, A. Veitia, D. Sank, E. Jeffrey, T. C. White, J. Mutus, A. G. Fowler, B. Campbell, et al., Superconducting quantum circuits at the surface code threshold for fault tolerance, Nature, 500 (2014).
[13] P. Krantz, A. Bengtsson, M. Simoen, S. Gustavsson, V. Shumeiko, W. D. Oliver, C. M. Wilson, P. Delsing, and J. Bylander, Single-shot read-out of a superconducting qubit using a Josephson parametric oscillator, Nat. Commun., 1 (2016).
[14] D. P. DiVincenzo, Fault tolerant architectures for superconducting qubits, Phys. Scr. T 137, 014020 (2009).
[15] A. G. Fowler, M. Mariantoni, J. M. Martinis, and A. N. Cleland, Surface codes: Towards practical large-scale quantum computation, Phys. Rev. A, 032324 (2012).
[16] J. Preskill, Quantum Computing in the NISQ era and beyond, Quantum (2018).
[17] A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q. Zhou, P. J. Love, A. Aspuru-Guzik, and J. L. O'Brien, A variational eigenvalue solver on a photonic quantum processor, Nat. Commun. (2014).
[18] E. Farhi, J. Goldstone, and S. Gutmann, A quantum approximate optimization algorithm (2014), arXiv:1411.4028.
[19] F. B. Maciejewski, Z. Zimborás, and M. Oszmaniec, Mitigation of readout noise in near-term quantum devices by classical post-processing based on detector tomography, Quantum, 257 (2020).
[20] E. Magesan, J. M. Gambetta, A. Córcoles, and J. M. Chow, Machine Learning for Discriminating Quantum Measurement Trajectories and Improving Readout, Phys. Rev. Lett., 200501 (2015).
[21] J. Heinsoo, C. K. Andersen, A. Remm, S. Krinner, T. Walter, Y. Salathé, S. Gasparinetti, J. C. Besse, A. Potočnik, A. Wallraff, and C. Eichler, Rapid High-fidelity Multiplexed Readout of Superconducting Qubits, Phys. Rev. Appl., 1 (2018).
[22] C. C. Bultink, T. E. O'Brien, R. Vollmer, N. Muthusubramanian, M. W. Beekman, M. A. Rol, X. Fu, B. Tarasinski, V. Ostroukh, B. Varbanov, A. Bruno, and L. DiCarlo, Protecting quantum entanglement from leakage and qubit errors via repetitive parity measurements, Science Advances (2020).
[23] E. Flurin, L. S. Martin, S. Hacohen-Gourgy, and I. Siddiqi, Using a recurrent neural network to reconstruct quantum dynamics of a superconducting qubit from physical observations, Phys. Rev. X, 011006 (2020).
[24] L. A. Martinez, Y. J. Rosen, and J. L. DuBois, Improving qubit readout with hidden markov models, Phys. Rev. A, 062426 (2020).
[25] G. Angelatos, S. Khan, and H. E. Türeci, Reservoir computing approach to quantum state measurement (2020), arXiv:2011.09652 [quant-ph].
[26] A. Seif, K. A. Landsman, N. M. Linke, C. Figgatt, C. Monroe, and M. Hafezi, Machine learning assisted readout of trapped-ion qubits, Journal of Physics B: Atomic, Molecular and Optical Physics, 174006 (2018).
[27] Z.-H. Ding, J.-M. Cui, Y.-F. Huang, C.-F. Li, T. Tu, and G.-C. Guo, Fast High-Fidelity Readout of a Single Trapped-Ion Qubit via Machine-Learning Methods, Phys. Rev. Applied, 014038 (2019).
[28] Y. Matsumoto, T. Fujita, A. Ludwig, A. D. Wieck, K. Komatani, and A. Oiwa, Noise-robust classification of single-shot electron spin readouts using a deep neural network (2020), arXiv:2012.10841 [quant-ph].
[29] A. Blais, R. S. Huang, A. Wallraff, S. M. Girvin, and R. J. Schoelkopf, Cavity quantum electrodynamics for superconducting electrical circuits: An architecture for quantum computation, Phys. Rev. A - At. Mol. Opt. Phys., 1 (2004).
[30] C. Macklin, K. O'Brien, D. Hover, M. E. Schwartz, V. Bolkhovsky, X. Zhang, W. D. Oliver, and I. Siddiqi, A near–quantum-limited Josephson traveling-wave parametric amplifier, Science, 307 (2015).
[31] L. DiCarlo, M. D. Reed, L. Sun, B. R. Johnson, J. M. Chow, J. M. Gambetta, L. Frunzio, S. M. Girvin, M. H. Devoret, and R. J. Schoelkopf, Preparation and measurement of three-qubit entanglement in a superconducting circuit, Nature (2010).
[32] M. Jerger, S. Poletto, P. Macha, U. Hübner, E. Il'ichev, and A. V. Ustinov, Frequency division multiplexing readout and simultaneous manipulation of an array of flux qubits, Appl. Phys. Lett., 042604 (2012).
[33] E. Jeffrey, D. Sank, J. Y. Mutus, T. C. White, J. Kelly, R. Barends, Y. Chen, Z. Chen, B. Chiaro, A. Dunsworth, et al., Fast Accurate State Measurement with Superconducting Qubits, Phys. Rev. Lett., 190504 (2014).
[34] E. A. Sete, J. M. Martinis, and A. N. Korotkov, Quantum theory of a bandpass purcell filter for qubit readout, Phys. Rev. A, 012325 (2015).
[35] C. Neill, P. Roushan, K. Kechedzhi, S. Boixo, S. V. Isakov, V. Smelyanskiy, A. Megrant, B. Chiaro, A. Dunsworth, K. Arya, R. Barends, et al., A blueprint for demonstrating quantum supremacy with superconducting qubits, Science, 195 (2018).
[36] L. C. G. Govia and F. K. Wilhelm, Unitary-feedback-improved qubit initialization in the dispersive regime, Phys. Rev. Applied, 054001 (2015).
[37] T. Walter, P. Kurpiers, S. Gasparinetti, P. Magnard, A. Potočnik, Y. Salathé, M. Pechal, M. Mondal, M. Oppliger, C. Eichler, and A. Wallraff, Rapid High-Fidelity Single-Shot Dispersive Readout of Superconducting Qubits, Phys. Rev. Appl., 1 (2017).
[38] J. Koch, T. M. Yu, J. Gambetta, A. A. Houck, D. I. Schuster, J. Majer, A. Blais, M. H. Devoret, S. M. Girvin, and R. J. Schoelkopf, Charge-insensitive qubit design derived from the cooper pair box, Phys. Rev. A, 042319 (2007).
[39] Supplementary Information.
[40] G. Turin, An introduction to matched filters, IRE Transactions on Information Theory, 311 (1960).
[41] C. A. Ryan, B. R. Johnson, J. M. Gambetta, J. M. Chow, M. P. Da Silva, O. E. Dial, and T. A. Ohki, Tomography via correlation of noisy measurement records, Phys. Rev. A - At. Mol. Opt. Phys., 1 (2015).
[42] C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics) (Springer-Verlag, Berlin, Heidelberg, 2006).
[43] K. Fukunaga, Introduction to Statistical Pattern Recognition (2nd Ed.) (Academic Press Professional, Inc., USA, 1990).
[44] N. T. Bronn, B. Abdo, K. Inoue, S. Lekuch, A. D. Córcoles, J. B. Hertzberg, M. Takita, L. S. Bishop, J. M. Gambetta, and J. M. Chow, Fast, high-fidelity readout of multiple qubits, J. Phys.: Conf. Ser., 012003 (2017).
[45] C. C. Bultink, B. Tarasinski, N. Haandbæk, S. Poletto, N. Haider, D. J. Michalak, A. Bruno, and L. DiCarlo, General method for extracting the quantum efficiency of dispersive qubit readout in circuit QED, Appl. Phys. Lett., 092601 (2018).
[46] B. E. Boser, I. M. Guyon, and V. N. Vapnik, A training algorithm for optimal margin classifiers, in Proceedings of the Fifth Annual Workshop on Computational Learning Theory, COLT '92 (Association for Computing Machinery, New York, NY, USA, 1992) p. 144–152.
[47] C. Cortes and V. Vapnik, Support-vector networks, Machine Learning, 273 (1995).
[48] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (The MIT Press, 2016).
[49] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al., Pytorch: An imperative style, high-performance deep learning library, in Advances in Neural Information Processing Systems 32 (Curran Associates, Inc., 2019) pp. 8024–8035.
[50] G. Klambauer, T. Unterthiner, A. Mayr, and S. Hochreiter, Self-normalizing neural networks, in Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS'17 (Curran Associates Inc., Red Hook, NY, USA, 2017) p. 972–981.
[51] D. P. Kingma and J. Ba, Adam: A method for stochastic optimization (2017), arXiv:1412.6980.
[52] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis (John Wiley & Sons, New York, 1973).
[53] Y. Bengio, Deep learning of representations for unsupervised and transfer learning, in Proceedings of ICML Workshop on Unsupervised and Transfer Learning, Proceedings of Machine Learning Research, Vol. 27, edited by I. Guyon, G. Dror, V. Lemaire, G. Taylor, and D. Silver (JMLR Workshop and Conference Proceedings, Bellevue, Washington, USA, 2012) pp. 17–36.
[54] P. Kurpiers, P. Magnard, T. Walter, B. Royer, M. Pechal, J. Heinsoo, Y. Salathé, A. Akin, S. Storz, J.-C. Besse, S. Gasparinetti, A. Blais, and A. Wallraff, Deterministic quantum state transfer and remote entanglement using microwave photons, Nature, 1476 (2018).
[55] S. S. Elder, C. S. Wang, P. Reinhold, C. T. Hann, K. S. Chou, B. J. Lester, S. Rosenblum, L. Frunzio, L. Jiang, and R. J. Schoelkopf, High-fidelity measurement of qubits encoded in multilevel superconducting circuits, Phys. Rev. X, 011001 (2020).
[56] M. A. Yurtalan, J. Shi, G. J. K. Flatt, and A. Lupascu, Characterization of multi-level dynamics and decoherence in a high-anharmonicity capacitively shunted flux circuit (2020), arXiv:2008.00593 [quant-ph].
[57] C. Wang, M.-C. Chen, C.-Y. Lu, and J.-W. Pan, Optimal readout of superconducting qubits exploiting high-level states, Fundamental Research, 16 (2021).
[58] B. Lienhard, J. Braumüller, W. Woods, D. Rosenberg, G. Calusine, S. Weber, A. Vepsäläinen, K. O'Brien, T. P. Orlando, S. Gustavsson, and W. D. Oliver, Microwave packaging for superconducting qubits, in (2019) pp. 275–278.
[59] S. Huang, B. Lienhard, G. Calusine, A. Vepsäläinen, J. Braumüller, D. K. Kim, A. J. Melville, B. M. Niedzielski, J. L. Yoder, B. Kannan, T. P. Orlando, S. Gustavsson, and W. D. Oliver, Microwave package design for superconducting quantum processors (2020), arXiv:2012.01438.
[60] F. Yan, S. Gustavsson, A. Kamal, J. Birenbaum, A. P. Sears, D. Hover, T. J. Gudmundsen, D. Rosenberg, G. Samach, S. Weber, J. L. Yoder, T. P. Orlando, J. Clarke, A. J. Kerman, and W. D. Oliver, The flux qubit revisited to enhance coherence and reproducibility, Nat. Commun., 12964 (2016).
[61] A. Blais, A. L. Grimsmo, S. M. Girvin, and A. Wallraff, Circuit quantum electrodynamics (2020), arXiv:2005.12667 [quant-ph].
[62] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, 2278 (1998).
[63] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, Bert: Pre-training of deep bidirectional transformers for language understanding (2019), arXiv:1810.04805 [cs.CL].
[64] D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, T. Lillicrap, K. Simonyan, and D. Hassabis, A general reinforcement learning algorithm that masters chess, shogi, and go through self-play, Science, 1140 (2018).
[65] R. A. Fisher, The use of multiple measurements in taxonomic problems, Annals of Eugenics, 179 (1936).
[66] L. Buitinck, G. Louppe, M. Blondel, F. Pedregosa, A. Mueller, O. Grisel, V. Niculae, P. Prettenhofer, A. Gramfort, J. Grobler, R. Layton, J. Vanderplas, A. Joly, B. Holt, and G. Varoquaux, Api design for machine learning software: experiences from the scikit-learn project (2013), arXiv:1309.0238 [cs.LG].
[67] V. Nair and G. E. Hinton, Rectified linear units improve restricted boltzmann machines, in Proceedings of the 27th International Conference on International Conference on Machine Learning, ICML'10 (Omnipress, Madison, WI, USA, 2010) p. 807–814.
[68] Y. Bengio, Practical Recommendations for Gradient-Based Training of Deep Architectures, in Neural Networks: Tricks of the Trade: Second Edition (Springer Berlin Heidelberg, Berlin, Heidelberg, 2012) pp. 437–478.

Appendix A: Measurement Setup

Qubit control and readout pulses (envelopes with cosine-shaped rising and falling edges encompassing a plateau) are programmed in Labber. They are created using three (two for control and one for readout) Keysight M3202A PXI arbitrary waveform generators (AWG) with a sampling rate of 1 GSa s⁻¹. The in-phase ($I$) and quadrature ($Q$) components of the signals at MHz frequencies are up-converted to the qubit transition frequency using an $IQ$-mixer and a local oscillator (LO) (Rohde and Schwarz SGS100A) per AWG. The control and readout tones are combined and sent to the qubit chip in the dilution refrigerator via a single microwave line attenuated by 60 dB.

The qubit chip is mounted in a microwave package following design principles as reported in Refs. [58, 59]. A coil, centered above the qubit chip, is mounted in the device package. A global flux bias $\Phi$ is applied through that coil to the superconducting quantum interference devices (SQUID) of the qubits using a Yokogawa GS200. The readout signal, upon acquisition of a qubit-state-dependent phase shift, is first amplified using a Josephson traveling-wave parametric amplifier (JTWPA) with near quantum-limited performance over a bandwidth of more than 2 GHz and a 1 dB compression point of approximately −100 dBm [30]. An Agilent E8267D signal generator provides the pump tone for the JTWPA. The microwave line carrying the pump tone is attenuated by 50 dB and fed into the JTWPA via a set of directional couplers and isolators located in the mixing chamber of the refrigerator. The signal is further amplified by a high-electron-mobility transistor (HEMT) amplifier that is thermally anchored to the 3 K stage.

At room temperature, the readout signal is amplified, IQ-mixed with the LO at 7.127 GHz, and fed into a heterodyne detector. The $I$- and $Q$-components of the readout signal are digitized with a Keysight M3102A PXI Analog to Digital Converter (ADC) at a sampling rate of 500 MSa s⁻¹. The subsequent digital signal processing to distinguish qubit states is the focus of this manuscript.

Appendix B: Five-Qubit Chip

The quantum system of five superconducting qubits is fabricated on a (001) silicon substrate (> [...] Ω cm) by dry etching a molecular-beam epitaxy (MBE) grown aluminum film in an optical lithography process before being diced into 5 × [...] chips, as described in [60].

The superconducting chip consists of coplanar waveguides and five frequency-tunable transmon qubits [38]. The target qubit transition frequencies alternate between 4.[...] ([...] → operating frequency) to limit qubit-qubit and control crosstalk. The capacitive nearest-neighbor (next-nearest-neighbor) qubit-qubit coupling rate, $J_{nn}$ ($J_{nnn}$), is designed (using COMSOL Multiphysics®) to be $J_{nn}/2\pi \approx 14$ MHz ($J_{nnn}/2\pi$ < [...] < [...].01 MHz) [61]. Each qubit couples capacitively to a quarter-wave resonator that couples inductively to a shared bandpass (Purcell) filtered feedline. Neighboring readout resonator frequencies differ by ∼50 MHz. The qubit and resonator operation parameters are included in Tab. IS and Tab. IIS.

TABLE IS. Chip comprising five superconducting frequency-tunable transmon qubits with alternating transition frequencies. A normalized magnetic flux bias $\Phi/\Phi_0$ (magnetic flux quantum $\Phi_0$) detunes the qubits from their idling to their operating frequency. The qubit anharmonicities $\alpha$ are in the moderate transmon regime. The qubit lifetimes $T_1$, Ramsey coherence times $T_2$ (Ramsey), and spin-echo relaxation times $T_2$ (echo) are measured at the qubit operating frequency.

Qubit   ω_Qubit/2π (GHz)     Bias     α/2π    T1      T2 (Ramsey)   T2 (echo)
        Idle      Biased     Φ/Φ0     (MHz)   (µs)    (µs)          (µs)
1       5.249     5.092      0.124    -212    40.8    1.3           7.4
2       4.708     4.404      0.160    -216    6.4     0.6           4.1
3       5.202     5.000      0.166    -204    21.4    1.0           7.2
4       4.560     4.309      0.154    -214    11.8    0.8           5.4
5       5.196     5.165      0.085    -200    23.4    7.6           31.8

TABLE IIS. Chip comprising five superconducting readout resonators at bare resonance frequencies ∼[...] $\omega_{\mathrm{LO}}/2\pi = 7.127$ GHz. Each resonator couples to a designated qubit with strength $g$, leading to a dispersive shift $\chi$. The effective resonator decay rate through the Purcell filter is $\kappa_{\mathrm{eff}}$. The qubit-resonator interaction remains in the dispersive regime for readout resonator photon populations below the critical photon number $n_{\mathrm{crit}}$.

Resonator   ω_Res/2π   ω_IF/2π   g/2π     χ/2π     κ_eff/2π   n_crit
            (GHz)      (MHz)     (MHz)    (MHz)    (MHz)
1           7.06       -65       116.3    0.83     4.29       33.8
2           7.10       -26       143.3    0.51     4.25       55.3
3           7.15       24        125.7    0.77     4.41       34.9
4           7.20       70        133.1    0.49     3.33       56.9
5           7.25       127       125.4    0.80     6.90       33.0
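As a quick consistency check on Tab. IIS, the tabulated intermediate frequencies can be compared with the detuning of each bare resonator frequency from the local oscillator. The sketch below assumes the convention ω_IF = ω_Res − ω_LO, which is not spelled out in this section, and tolerates a few MHz of deviation because the resonator frequencies are quoted to 10 MHz precision. All variable and function names are illustrative.

```python
# Values copied from Tab. IIS; names are illustrative, not taken from the authors' code.
F_LO_GHZ = 7.127
F_RES_GHZ = [7.06, 7.10, 7.15, 7.20, 7.25]
F_IF_MHZ_TABLE = [-65, -26, 24, 70, 127]


def intermediate_frequencies_mhz(f_res_ghz, f_lo_ghz):
    """Detuning of each resonator from the LO in MHz (assumed f_IF = f_res - f_LO)."""
    return [(f - f_lo_ghz) * 1e3 for f in f_res_ghz]


f_if_mhz = intermediate_frequencies_mhz(F_RES_GHZ, F_LO_GHZ)

# Absolute deviation between the computed detuning and the tabulated value.
deviations_mhz = [abs(calc - tab) for calc, tab in zip(f_if_mhz, F_IF_MHZ_TABLE)]

# Spacing between neighboring (rounded) resonator frequencies in MHz.
spacings_mhz = [round((b - a) * 1e3) for a, b in zip(F_RES_GHZ, F_RES_GHZ[1:])]
```

With the rounded table entries, the computed detunings agree with the tabulated intermediate frequencies to within a few MHz.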

Appendix C: Qubit-State Discriminators

The study of computational algorithms with the ability to improve through experience is typically referred to as machine learning [42]. These algorithms strive to identify patterns in sample data, called training data, and create an approximate model of an underlying decision process without explicit instructions. While many machine learning ideas are several decades old, they only recently became widely applicable due to the development of sufficient computational resources and are applied today in image processing [62], natural language processing [63], or playing advanced games such as chess [64].

Machine learning can be broadly divided into three categories: unsupervised, supervised, and reinforcement learning. Here, we focus on supervised learning methods that learn an input-output mapping function using a trusted set of input-output pairs (training set). Typically, the input-output pairs for training are acquired by the "supervisor," hence the terminology. The quality of the learned mapping function can be probed utilizing an additional set of trusted input-output pairs (test set). The comparison of the performance of a supervised learning method on the training set with its performance on the test set is referred to as generalization.
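The training-set versus test-set comparison described above can be illustrated with a minimal sketch. It assumes synthetic two-dimensional Gaussian data standing in for integrated IQ points and uses a nearest-mean classifier purely for illustration; neither is one of the discriminators studied here. The ratio `generalization` mirrors the test-to-training fidelity ratio used in Fig. 2S(c), although there the geometric mean over qubits is used.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_shots(n_per_class):
    """Synthetic IQ points: class 0 centered at (+1, 0), class 1 at (-1, 0)."""
    x0 = rng.normal(loc=(1.0, 0.0), scale=0.7, size=(n_per_class, 2))
    x1 = rng.normal(loc=(-1.0, 0.0), scale=0.7, size=(n_per_class, 2))
    return np.vstack([x0, x1]), np.repeat([0, 1], n_per_class)


def fit_nearest_mean(x, y):
    """'Training': store the mean point of each class."""
    return np.stack([x[y == c].mean(axis=0) for c in (0, 1)])


def predict(means, x):
    """Assign each point to the class with the closest mean."""
    distances = np.linalg.norm(x[:, None, :] - means[None, :, :], axis=2)
    return distances.argmin(axis=1)


def fidelity(means, x, y):
    """Fraction of correctly assigned labels."""
    return float((predict(means, x) == y).mean())


x_train, y_train = make_shots(2000)   # trusted input-output pairs (training set)
x_test, y_test = make_shots(2000)     # independent trusted pairs (test set)

means = fit_nearest_mean(x_train, y_train)
f_train = fidelity(means, x_train, y_train)
f_test = fidelity(means, x_test, y_test)
generalization = f_test / f_train     # close to 1 when the model does not overfit
```

A ratio well below 1 would indicate that the model has adapted to peculiarities of the training set rather than to the underlying distribution.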

1. Matched Filter (MF) Threshold Discriminator

A matched filter applied to a readout single-shot measurement, the average of the element-wise product of a readout signal and an optimized kernel (optimized in terms of signal-to-noise ratio, SNR), projects the multi-dimensional input data to a single dimension such that the data can be linearly partitioned [40]. For stationary noise and if the two classes are symmetric and Gaussian distributed, a kernel proportional to the difference between the mean ground- and excited-state signals is optimal [41, 42]. For a system with [...] The optimized discriminator threshold is then located at 0, the axis origin [41]. While such classifiers are typically not attributed to classical learning algorithms, the filter tune-up and threshold optimization require a "training" step.

For superconducting qubits, the optimal kernel is equal to the difference between the mean ground- and excited-state-readout signal normalized by the signal variance, which must be measured experimentally using calibration runs with known qubit states, as described and termed "matched filter" in Ref. [41], "mode matched filter" in Ref. [21], or "Fisher's linear discriminant" in Ref. [42]. In our implementation, as illustrated in Fig. 1S(a), each matched filter kernel (following the terminology in Ref. [41]) is multiplied with a rectangular window to limit the impact of nonidealities such as qubit-energy decay. Summing up the element-wise product of the windowed matched filter kernel $k_i[n]$ and the readout signal, $I_i[n]$ and $Q_i[n]$, yields a distribution along a single dimension (here, along $I_i$). A threshold optimized with a linear support vector machine (or another optimizer) partitions the one-dimensional projection into a ground- and excited-state class, depicted in Fig. 1S(b). Finally, the concatenation of the one-bit labels assigned by each single-qubit discriminator results in the assigned five-qubit-state label. Note that the demodulation step at intermediate frequencies using $e^{-j\omega^{\mathrm{IF}}_i n}$, with $\omega^{\mathrm{IF}}_i$ defined in Tab. IIS (as described in Ref. [4]), can be incorporated in the kernel tune-up.

Under the assumption of symmetric noise, the achievable assignment fidelity depends on the separation $R$ between the ground- and excited-state-readout signals, $S_0$ and $S_1$, referred to as the Fisher criterion [65]. The separation $R$ is defined as

$$R = \frac{\left(\langle S_0 \rangle - \langle S_1 \rangle\right)^2}{\mathrm{var}(S)}, \qquad \text{(C1)}$$

with a symmetric variance, $\mathrm{var}(S) = \mathrm{var}(S_0) = \mathrm{var}(S_1)$ ($\langle f \rangle$ denotes the mean value of $f$). For Gaussian distributed states and diagonal covariance matrices, $R$ can be maximized using the introduced matched filter kernel $k \propto \langle S_0 - S_1 \rangle / [\mathrm{var}(S_0) + \mathrm{var}(S_1)]$ [41, 42]. For a system with additive stationary noise independent of the qubit state and diagonal Gaussian covariance matrices, the maximally achievable assignment fidelity is

$$F_{\mathrm{ach}} = \frac{1}{2}\left[1 + \mathrm{erf}\left(\sqrt{R/8}\right)\right], \qquad \text{(C2)}$$

with $\mathrm{erf}(z)$ the Gauss error function of $z$.

The qubit-readout-state histograms that result after the matched filter are fit with Gaussian functions, shown in Fig. 1S(b). For the fit functions, the variances of the ground and excited state are kept identical to evaluate $F_{\mathrm{ach}}$, as presented in Tab. IIIS. Fitting the ground state with a bimodal and the excited state with a trimodal Gaussian reveals aspects of the state transition dynamics such as thermal excitations or qubit-energy decays. The product of the label fidelity $F_{\mathrm{label}}$ and the achievable fidelity $F_{\mathrm{ach}}$ provides an estimate of the upper bound for the matched filter (MF) discriminator qubit-state-assignment fidelity $F_{\mathrm{MF}}$, as shown in the last column of Tab. IIIS.
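The kernel tune-up and Eqs. (C1) and (C2) can be sketched numerically. The example below assumes a toy model (constant mean traces of opposite sign plus white Gaussian noise) and omits the rectangular window and the demodulation step; the threshold is simply placed midway between the projected means rather than optimized with an SVM. Function and variable names are hypothetical.

```python
import math
import numpy as np

rng = np.random.default_rng(1)
n_shots, n_samples = 4000, 100

# Assumed toy calibration data: mean traces at +0.2 (ground) and -0.2 (excited)
# with unit-variance white noise on every time bin.
traces0 = 0.2 + rng.normal(0.0, 1.0, size=(n_shots, n_samples))
traces1 = -0.2 + rng.normal(0.0, 1.0, size=(n_shots, n_samples))


def matched_filter_kernel(t0, t1):
    """k ~ <S0 - S1> / [var(S0) + var(S1)], evaluated per time bin."""
    return (t0.mean(axis=0) - t1.mean(axis=0)) / (t0.var(axis=0) + t1.var(axis=0))


def project(kernel, traces):
    """Average of the element-wise product of kernel and trace (one value per shot)."""
    return traces @ kernel / kernel.size


def fisher_separation(s0, s1):
    """Eq. (C1), using the mean of both class variances as the shared var(S)."""
    shared_var = 0.5 * (s0.var() + s1.var())
    return (s0.mean() - s1.mean()) ** 2 / shared_var


def achievable_fidelity(r):
    """Eq. (C2): F_ach = [1 + erf(sqrt(R / 8))] / 2."""
    return 0.5 * (1.0 + math.erf(math.sqrt(r / 8.0)))


kernel = matched_filter_kernel(traces0, traces1)
s0, s1 = project(kernel, traces0), project(kernel, traces1)

threshold = 0.5 * (s0.mean() + s1.mean())   # close to 0 for symmetric classes
r = fisher_separation(s0, s1)
f_ach = achievable_fidelity(r)

# Measured fidelity: ground-state shots fall above, excited-state shots below the threshold.
f_measured = 0.5 * ((s0 > threshold).mean() + (s1 <= threshold).mean())
```

For Gaussian data without state transitions, `f_measured` should track `f_ach` closely; in the experiment, thermal excitation and qubit-energy decay reduce the measured value further, which is what the label fidelity in Tab. IIIS accounts for.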

2. Support Vector Machine (SVM)

Support vector machines (SVMs)—known for their ro-bustness and good generalization—are fundamental two-class discriminators that draw a single decision boundary,called a hyperplane, in a supervised learning scheme [46,47]. The margin between the classes and the hyperplanecan be maximized by penalizing misclassiﬁed data pointsand data points within the margin boundaries. Thepenalty for data points within the margin boundaries canbe varied using a regularization term. A lenient penaltyresults in a so-called soft-margin SVM which can bettercope with problems that are not linearly-separable.The hyperplane dimension is equal to the one less thanthe number of features–the dimensions of the measure-ment data. The location of a new data point relativeto the hyperplane decides on the associated label. Thisdeterministic decision process is not probabilistic, andthe information on the probability of label associationis thus not directly accessible. While hyperplane sep-arations only work for linearly-separable data, nonlin-ear SVMs use the kernel trick to map the data pointsto higher dimensions via a nonlinear transformation andﬁnd a hyperplane in that higher-order feature space.Several SVMs can be trained in concert for multi-classdiscrimination to divide the feature space into areas asso-ciated with distinct classes [52]. For an N -class ( N >

TABLE IIIS. Numerical values extracted from Gaussian ﬁts to readout data distribution after a 1 µ s-measurement time using amatched ﬁlter, as illustrated in Fig. 1S(a,b). The peak ratio of bimodal Gaussian ﬁts (with equal variance) to the readout-traceshistograms of qubits initialized in the ground state (no pulse applied: ∅ ) provide insight in the thermal excitation probability P (1 |∅ ). Comparing the peak ratios for trimodal Gaussian ﬁts to the readout-traces histograms of qubits initialized in theexcited state ( π -pulse applied: π ) indicate the conditional probability for qubit-energy decays P (0 | π ) and second-excited statepopulation P (2 | π ). F label = 1 − ( P (1 |∅ ) + P (0 | π )) / F π represents the ﬁtted π -pulse ﬁdelities. (cid:104) S (cid:105) , (cid:104) S (cid:105) , and var( S ) arethe mean ground state, mean excited state, and variance of both states used to derive the Fisher criterion R and achievableassignment ﬁdelity F ach (see Eq. C1, C2). F MF , the product of F label and F ach , is an estimate for an upper qubit-state-assignment ﬁdelity bound for a classiﬁer composed of a matched ﬁlter and the subsequent optimized threshold, here referredto as MF.Qubit P (1 i |∅ i ) P (2 i |∅ i ) P (0 i | π i ) P (2 i | π i ) F label F π (cid:104) S (cid:105) (cid:104) S (cid:105) var( S ) R F ach F MF (cid:28) (cid:28) (cid:28) (cid:28) (cid:28) (a)(b)(c) Qubit 1 Qubit 2 Qubit 3 Qubit 4 Qubit 5 N o r m a li ze d C oun t s Measurement Time ( μ s) T i m e - B i n W e i gh t ( a . u . ) MFMF. RW − − − − − − − − − − π threshold02 π − bound a r y − − − − − -Quadrature (a.u.) I - Q u a d r a t u r e ( a . u . ) Q -Quadrature (a.u.) I ＊ FIG. 1S. Readout Data Statistics. (a) Magnitude of the time-bin weights of the qubit-speciﬁc matched ﬁlter shapes derivedusing prepared ground and excited states. A rectangular window (RW) is applied to each matched ﬁlter kernel to reduce theimpact of qubit-energy decays and maximize qubit-state-assignment ﬁdelities. 
The resulting matched ﬁlter windows are shadedin gray. (b) Shown are the histograms of the qubit-state-readout single-shot traces after applying the optimized 1 µ s-longmatched ﬁlter. The dashed lines represent the optimized thresholds with the states to the right attributed to the ground stateand left to the excited state. Using bimodal Gaussian ﬁt functions for the ground state (green) and trimodal Gaussian ﬁtfunctions for the excited state (blue) provides insight into the underlying dynamics such as thermal excitation or qubit-energydecays (see Tab. IIIS). (c) Plotted are boxcar ﬁltered single-shot traces of ground (black) and excited states (gray) in the IQ -plane. A linear support vector machine trained on the two-dimensional data generates the qubit-speciﬁc colored discriminationboundary.  G M t e s t G M t r a i n /

100 200 3000.91.0 − − − L ea r n i ng R a t e , η Epoch (a)(b) − − − − − − − − − − − − (c) Output LayerInput Layer

N=5 N=1N=2N=3N=4maximum bx n w n f(z) HiddenLayersNode l–1 x n+1l–1 x ml f(z)x ml f(z)=--w-x- +b n n nl–1 SELULayer l Output:Layer l Inputs:

FIG. 2S. Architecture and Training of Fully-Connected Feed-forward Neural Network (FNN). (a) The FNN architectureused here comprises an input layer, three hidden layers, andan output layer. For a 1 µ s-long measurement time, the inputlayer consists of 1,000 nodes. 1,000, 500, and 250 nodes formthe ﬁrst, second, and third hidden layer. The output layerscales as 2 N (N, the number of qubits). For ﬁve qubits, theoutput layer encompasses 32 nodes. (b) The nodes composingthe hidden layer l are functions that depend on the followingparameter inputs: the output values x l − n of the prior layer l − b . The output value x lm of node m corresponds to the weighted (weights w n ) sum of the in-puts x l − n and the bias b after passing through an activationfunction, here a scaled exponential linear unit (SELU), shownin orange. (c) Shown is the training performance for an FNNtasked to discriminate N qubits with N = 1 , , . . . ,

5. Thegeneralization—the ratio of the geometric mean test F testGM andtraining qubit-state-assignment ﬁdelity F trainGM —as the num-ber of epochs increases is shown in black using the left y-axis.The associated standard deviation of the generalization is in-dicated in gray. The number of epochs to achieve the max-imum qubit-state-assignment ﬁdelity is indicated with a redvertical bar. The learning rate η , shown in blue and usingthe right y-axis, is gradually reduced as the number of epochsincreases. classiﬁcation task, the number of necessary hyperplanesis at least N − Measurement Time ( μ s)  a ss i gn m e n t FIG. 3S. Qubit-State-Assignment Fidelity. Matched ﬁlter dis-criminator for each qubit versus measurement time. The max-imum assignment ﬁdelity F i ( t i ) for each qubit i is reachedafter t = 1 µ s, t = 2 µ s, t = 0 . µ s, t = 0 . µ s, and t = 0 . µ s. be associated with a single class [42].Here, we use scikit-learn library to implement single-qubit and multi-qubit linear and nonlinear SVMs inPython [66]. We employ the LinearSVC implementa-tion for linear and SVC for nonlinear soft-margin SVMswith regularization parameters optimized per discrimi-nator to deliver the maximally achievable qubit-state-assignment ﬁdelity. In general, the training wall-clock-time for an SVM implemented using LinearSVC is sig-niﬁcantly reduced relative to the training time requiredfor SVC SVMs. Nonlinear SVMs can only be imple-mented in SVC, as LinearSVC does not oﬀer the ker-nel trick. In addition to the resulting unfavorable scal-ing of the training wall-clock-time of nonlinear SVMs,the multi-dimensional optimization problem, if taskedto discriminate multiple qubit states, mostly resulted innon-optimal hyperplanes (for ﬁve qubits, nonlinear SVMsachieved an average qubit-state-assignment ﬁdelity about10 % worse than the one achieved by its linear counter-part). 
We limit the study of nonlinear SVMs to a basic characterization due to their lack of robustness in qubit-state-assignment fidelity and their training-time requirements (more than one day for five qubits). Henceforth, we focus on linear soft-margin SVMs as parallel single-qubit or multi-qubit discriminators (in the one-versus-all mode).

FIG. 4S. Measurement Data Processing and Discrimination. (a) M-dimensional data ($z_{\mathrm{IF}}[n]$) processing for single-qubit (SQ) and multi-qubit (MQ) discrimination. For single-qubit discrimination, $z_{\mathrm{IF}}[n]$ is digitally demodulated at the intermediate frequency of a resonator $i$. The resulting signal $z_i[n]$ can be simplified with a boxcar filter (BF) [$\frac{1}{M}\sum_n z_i[n] = \bar{I}_i + j\bar{Q}_i$] or kept as sequences $I_i[n]$ and $Q_i[n]$. The discriminators can be trained either with the spectator qubits exclusively in their ground state (denoted by ∅) or, alternatively, in either their ground or excited state (denoted by ∗). For multi-qubit discriminators, the digitally demodulated signals $z_i[n]$ at all resonator frequencies $i$ are stacked up. The resulting data block is subsequently used for the discriminator training. Alternatively, the discriminator can be tasked to discriminate $z_{\mathrm{IF}}[n]$ directly without any digital preprocessing. (b) Comparison of the geometric mean qubit-state-assignment fidelity for five qubits after a 1 µs-long measurement and 10,000 training instances per qubit-state configuration. All single-qubit discriminators are evaluated using training data with the spectator qubits in the ground state as well as in all combinations of ground and excited state. The matched filter (MF) threshold discriminator [the matched filter is part of the discriminator and thus not shown in (a)] is shown in two configurations: the threshold set to 0 and the threshold optimized. The linear support vector machine (SVM) is applied to boxcar-filtered (BF) and time-trace data of $I_i[n]$ and $Q_i[n]$. The multi-qubit discriminators are evaluated utilizing digitally demodulated and unprocessed data. Shown are a multi-qubit linear SVM, a recurrent neural network (NN), a convolutional NN, and a feedforward NN.
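
The preprocessing chain of Fig. 4S(a) can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the function names, the sign convention of the demodulation phasor $e^{-j 2\pi f_i n \Delta t}$, and the example frequencies and amplitudes are assumptions chosen for the demonstration. Only the boxcar average $\frac{1}{M}\sum_n z_i[n]$ and the stacking of demodulated traces are taken from the caption.

```python
import cmath
import math


def demodulate(z_if, f_if_hz, dt_s):
    """Shift the tone at f_if_hz to zero frequency (assumed sign convention)."""
    return [z * cmath.exp(-2j * math.pi * f_if_hz * n * dt_s)
            for n, z in enumerate(z_if)]


def boxcar(z_i):
    """Boxcar filter: (1/M) * sum_n z_i[n] = I_bar + j*Q_bar."""
    return sum(z_i) / len(z_i)


def stack_demodulated(z_if, f_list_hz, dt_s):
    """Multi-qubit input: I/Q traces demodulated at every resonator
    frequency, concatenated into one flat data block."""
    block = []
    for f in f_list_hz:
        for z in demodulate(z_if, f, dt_s):
            block.extend((z.real, z.imag))
    return block


# Synthetic 1 us trace sampled every 2 ns, carrying two multiplexed tones.
DT = 2e-9
M = 500
FREQS = [50e6, 100e6]              # hypothetical intermediate frequencies
AMPS = [0.3 + 0.4j, -0.2 + 0.1j]   # hypothetical resonator responses
z_if = [sum(a * cmath.exp(2j * math.pi * f * n * DT)
            for a, f in zip(AMPS, FREQS)) for n in range(M)]

# One boxcar-filtered (I_bar, Q_bar) point per resonator.
iq_points = [boxcar(demodulate(z_if, f, DT)) for f in FREQS]
# One stacked block for a multi-qubit discriminator.
block = stack_demodulated(z_if, FREQS, DT)
```

In this synthetic case the two assumed tones are 50 MHz apart and the trace spans 1 µs, so the spectator tone completes an integer number of cycles and averages out under the boxcar. Each entry of `iq_points` therefore recovers the amplitude assigned to its resonator. Real readout traces include noise and crosstalk that this toy signal omits.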

3. Neural Networks (NN)

Typically, a neural network consists of an input layer composed of several nodes (the number of nodes depends on the input data dimension) and an output layer that contains the computed output values. In between the input and output layers are layers of neurons, so-called hidden layers as their output values are not directly accessible, with unique tasks per layer. The input and output channels of a neuron are called edges, illustrated in Fig. 2S(a). Each neuron can be described as a mathematical function of incoming weighted parameters, typically output values of other neurons, and external parameters. The function output generally passes through a nonlinear filter before it can serve as an input to other neurons, depicted in Fig. 2S(b). Varying the connectivity, neuron functions, and the nonlinear function at each neuron output provides a flexible toolset to engineer a broad spectrum of neural network types. Supervised training of such a network can optimize the weights for each neuron input and external parameter to approximate almost any function arbitrarily well.

We have examined various neural network architectures to determine the most useful one in improving the qubit-state-assignment fidelity and measurement time of multi-qubit devices. We have explored fully-connected feedforward neural networks (FNN), among the most elementary neural networks; convolutional neural networks (CNN), among the most successful image classification methods in use today; and long short-term memory recurrent neural networks (LSTM), among the most successful architectures in language processing. The fully-connected FNN with three hidden layers excelled in assignment fidelity compared to the other neural network types.

[Figure: confusion matrices; axes: Prepared State (∅, π) versus Assigned State (0, 1); color scale: Assignment Probability; panel titles: $F_N^{\mathrm{MF}} = 0.644$ and $F_N^{\mathrm{FNN}} = 0.691$]

FIG. 5S. Qubit-State-Assignment Fidelity Analysis. Confusion (assignment probability) matrix of the feedforward neural network (FNN) (a) and matched filter (MF) (b). The rows of the confusion matrix encompass the probability distribution of the discriminator to assign each of the 32 qubit-state configurations to the row's prepared qubit-state configuration (no pulse applied, qubit initialized in the ground state: ∅ → 0; π-pulse applied, qubit initialized in the excited state: π → 1). $F_N$, introduced in Eq. D2, represents a metric to indicate the overlap between the confusion matrix and an identity matrix (the ideal confusion matrix). $F_N = 1$ if the confusion matrix is an identity matrix.

Implemented in PyTorch [49], the FNN architecture that yields the highest assignment fidelity for five qubits is composed of three hidden layers. The number of nodes composing the input layer depends on the measurement time and the size of the discrete time bins, here 2 ns. For a 1 µs-long measurement time, the input layer contains 1,000 nodes with the in-phase and quadrature components alternating. The dimension of the first hidden layer is equal to the input layer dimension, that of the second hidden layer is half of it, and that of the third hidden layer is a quarter of it. Finally, the output layer consists of $2^N$ nodes, with N being the number of qubits (32 for the five-qubit readout we focus on here). The activation function, the nonlinear filter acting on the hidden layer nodes, is a scaled exponential linear unit (SELU) [50] instead of the common rectified linear unit (ReLU) [67], due to its improved robustness and learning rate.
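
As a rough illustration of the architecture described above, the following pure-Python sketch computes the layer sizes and runs a forward pass with SELU on the hidden layers and a softmax on the output layer. It is not the authors' PyTorch implementation. The function names, the random untrained weights, and the reading of the 1,000 input nodes as 500 time bins times two quadratures are assumptions. The SELU constants are the standard published values, not numbers quoted in this text.

```python
import math
import random

# Standard SELU constants (assumed; not quoted in the text above).
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def fnn_layer_sizes(measurement_time_ns, bin_ns, n_qubits):
    """Return [input, hidden1, hidden2, hidden3, output] node counts."""
    n_bins = measurement_time_ns // bin_ns   # discrete time bins
    n_in = 2 * n_bins                        # I and Q samples alternate
    return [n_in, n_in, n_in // 2, n_in // 4, 2 ** n_qubits]


def selu(x):
    """Scaled exponential linear unit applied to one value."""
    return SELU_SCALE * (x if x > 0 else SELU_ALPHA * (math.exp(x) - 1.0))


def softmax(values):
    """Softmax with the maximum subtracted for numerical stability."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense(x, weights, bias):
    """One fully-connected layer: weights[m][n] connects input n to node m."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def forward(x, layers):
    """Apply SELU after every hidden layer and softmax after the last one."""
    for weights, bias in layers[:-1]:
        x = [selu(v) for v in dense(x, weights, bias)]
    weights, bias = layers[-1]
    return softmax(dense(x, weights, bias))


def random_layers(sizes, seed=0):
    """Hypothetical untrained weights, only to exercise the forward pass."""
    rng = random.Random(seed)
    return [([[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
              for _ in range(n_out)], [0.0] * n_out)
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


# Node counts for a 1 us measurement, 2 ns bins, and five qubits.
paper_sizes = fnn_layer_sizes(1000, 2, 5)

# Small toy network (16 ns trace, two qubits) to keep the demo fast.
toy_sizes = fnn_layer_sizes(16, 2, 2)
toy_probs = forward([0.1] * toy_sizes[0], random_layers(toy_sizes))
```

Here `paper_sizes` evaluates to `[1000, 1000, 500, 250, 32]`, matching the node counts quoted for a 1 µs measurement and five qubits. The entries of `toy_probs` sum to one and can be read as assignment probabilities over the $2^N$ qubit-state configurations.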
The output layer is filtered using a softmax function, $\mathrm{softmax}(x_i) = \exp(x_i) / \sum_j \exp(x_j)$.

Multiple training cycles, referred to as epochs, are required to ensure that the discriminator output converges to the maximum qubit-state-assignment fidelity. The number of epochs needed to reach a convergence plateau depends on the correction factor per cycle, the learning rate. We start with a more aggressive learning rate of 0 .

Appendix D: Result Analysis

In addition to a specific choice of discriminator, the to-be-discriminated data can be prepared in different ways. Typically, the discrete-time readout signals at intermediate frequency, $z_{\mathrm{IF}}[n] = I_{\mathrm{IF}}[n] + jQ_{\mathrm{IF}}[n]$, are digitally demodulated following the steps outlined in Fig. 4S(a) and Ref. [4]. The signal components $I_i[n] = \Re(z_i[n])$ and $Q_i[n] = \Im(z_i[n])$ can be boxcar filtered [4] or kept as sequences $I_i[n]$ and $Q_i[n]$. For digitally demodulated data and multi-qubit discrimination, $z_{\mathrm{IF}}[n]$ is demodulated at each intermediate frequency. The resulting digitally demodulated time traces need to be stacked up to form a single data block before being used as the input to the multi-qubit discriminator.

Furthermore, the training data set can be composed of either all permutations of the qubit states or a specific subset. Here, we focus on training discriminators with the qubits not involved in the training process, the spectator qubits, either in all combinations of the ground and excited state (indicated as ∗) or kept in the ground state (denoted by ∅).

We evaluate the comparison for a measurement time of 1 µs, after which four out of five qubits have reached their maximum assignment fidelity for matched filters, as shown in Fig. 3S. For five qubits, a 1 µs-long measurement time, and 10,000 training instances, we show a comparison of the qubit-state-assignment fidelity of the single- and multi-qubit discriminator approaches introduced above in Fig. 4S(b). Optimizing the threshold of MFs and using training data with the spectator qubits in the ground state increases the qubit-state-assignment fidelity. Single-qubit linear SVMs perform best if tasked to discriminate vectorized digitally demodulated data and trained with a data set in which all qubit-state combinations are represented.

Multi-qubit linear SVMs appear to perform better if tasked to discriminate digitally demodulated readout signals. In contrast, the neural networks perform best if unprocessed data is used.
The feedforward neural network outperforms its counterparts, the recurrent and convolutional neural networks, in the achieved qubit-state-assignment fidelity.

In the main part of the manuscript, we focus on the best-performing discriminator approach of each category: matched filter, single-qubit linear SVM, multi-qubit linear SVM, and neural networks.

Next, we analyze the qubit-state-assignment probabilities using confusion matrices. Fig. 5S illustrates the confusion matrix for the FNN and MF discriminators. In the ideal case, with all prepared states agreeing with the assigned states, the confusion matrix is an identity matrix. To evaluate the overlap between an identity matrix (entries represented as a Kronecker delta $\delta_{ij}$, with $i$ and $j$ representing the indices of the matrix row and column) and a confusion matrix (with entries $c_{ij}$), we propose the following metric based on the Frobenius norm:

$$\|A\|_F = \sqrt{\sum_i \sum_j |c_{ij} - \delta_{ij}|^2}. \tag{D1}$$

To bound the Frobenius norm between 0 and 1, we normalize it by the maximum value of Eq. D1, $\sqrt{2^{N+1}}$. The normalized Frobenius norm is equal to 0 if the confusion matrix is exactly an identity matrix. An alternative representation of this metric, more closely related to fidelities with the best outcome at 1, is

$$F_N = 1 - \frac{\|A\|_F}{\sqrt{2^{N+1}}}. \tag{D2}$$

Using Eq. D2 and shown in Fig. 5S, the MF achieves $F_N$ = 0 . $F_N$ = 0 . ..
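
Eqs. D1 and D2 translate directly into code. The sketch below is an assumed helper (the name `normalized_frobenius_fidelity` is not from the paper) that takes a row-normalized confusion matrix as nested lists. The normalization can be checked by hand: each row contributes at most 2 to the squared norm, which happens when all probability sits in a single wrong column, so the maximum over $2^N$ rows is $\sqrt{2^{N+1}}$.

```python
import math


def normalized_frobenius_fidelity(confusion):
    """F_N = 1 - ||C - I||_F / sqrt(2^(N+1)) for a 2^N x 2^N confusion matrix.

    confusion[i][j] is the probability of assigning state j when state i
    was prepared, so every row is assumed to sum to one.
    """
    dim = len(confusion)                      # dim = 2^N
    squared = sum((c - (1.0 if i == j else 0.0)) ** 2
                  for i, row in enumerate(confusion)
                  for j, c in enumerate(row))
    return 1.0 - math.sqrt(squared) / math.sqrt(2.0 * dim)   # 2*dim = 2^(N+1)


# Single-qubit toy examples (hypothetical numbers, not measured data).
ideal = [[1.0, 0.0], [0.0, 1.0]]      # perfect discriminator
swapped = [[0.0, 1.0], [1.0, 0.0]]    # every state assigned wrongly
noisy = [[0.9, 0.1], [0.2, 0.8]]

f_ideal = normalized_frobenius_fidelity(ideal)
f_swapped = normalized_frobenius_fidelity(swapped)
f_noisy = normalized_frobenius_fidelity(noisy)
```

The ideal matrix gives $F_N = 1$ and the fully swapped one gives $F_N = 0$, which are the two bounds of the metric. For the noisy toy matrix the squared norm is 0.10, so $F_N = 1 - \sqrt{0.10}/2 \approx 0.842$.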