Mixed State Entanglement Classification using Artificial Neural Networks

Abstract

Reliable methods for the classification and quantification of quantum entanglement are fundamental to understanding its exploitation in quantum technologies. One such method, known as Separable Neural Network Quantum States (SNNS), employs a neural network inspired parameterisation of quantum states whose entanglement properties are explicitly programmable. Combined with generative machine learning methods, this ansatz allows for the study of very specific forms of entanglement which can be used to infer/measure entanglement properties of target quantum states. In this work, we extend the use of SNNS to mixed, multipartite states, providing a versatile and efficient tool for the investigation of intricately entangled quantum systems. We illustrate the effectiveness of our method through a number of examples, such as the computation of novel tripartite entanglement measures, and the approximation of ultimate upper bounds for qudit channel capacities.


Cillian Harney, Mauro Paternostro, and Stefano Pirandola
Computer Science and York Centre for Quantum Technologies, University of York, York YO10 5GH, United Kingdom
Centre for Theoretical Atomic, Molecular and Optical Physics, School of Mathematics and Physics, Queen's University Belfast, Belfast BT7 1NN, United Kingdom
(Dated: February 12, 2021)

The core tasks of entanglement classification [1–3] and quantification [4–6] are essential for future quantum technologies, and ask the seemingly straightforward questions: Given a quantum state $\rho$, is it entangled? If so, by how much is it entangled? As the system size or dimension of a quantum system grows, these questions become highly non-trivial, and in general there are no universal criteria or methods to provide answers. The most popular mathematical recipe for classification, the Positive Partial Transposition (PPT) criterion (or Peres-Horodecki criterion) [7, 8], applies only to $(2 \otimes 2)$ or $(2 \otimes 3)$ [...] $d$-dimensional quantum states. We show how SNNS can be used to perform highly specific entanglement classification, and approximate entanglement measures to a very high degree of accuracy. The ability to implicitly characterise the space of separable states is extremely valuable, and allows one to compute entanglement measures that are otherwise extremely difficult to evaluate, such as the Relative Entropy of Entanglement (REE) [28].

This paper is structured as follows: In Section I we review the NNS architecture and its variants for pure and mixed states. Section II overviews separable architectures, and shows how specific forms of entanglement can be guaranteed. In Section III the methods of classification and quantification using SNNS are discussed. Section IV provides numerical evidence for their utility through a number of relevant examples, with interesting applications in the study of noisy tripartite entanglement, bound entanglement, and quantum channel capacities. Finally, conclusions and future directions are addressed in Section V.

I. NEURAL NETWORK QUANTUM STATES

A. Pure states

The simplest neural network model we can introduce is the positive, real NNS. This model uses a real-valued restricted Boltzmann machine (RBM) architecture, with $n_v$ visible units $s = \{s_1, \ldots, s_{n_v}\}$ representing the qudits being modelled within the target quantum system, fully interconnected with $n_h$ hidden units $h = \{h_1, \ldots, h_{n_h}\}$. The visible units are typically binary valued to study $d = 2$ dimensional systems, $s_i \in \{-1, 1\}$, as are the hidden units $h_j \in \{-1, 1\}$; however, this depends on the system being modelled. This network architecture allows us to capture the correlations of the objective quantum system through network parameters:

$$\Pi = \{a_k, b_j, W_{kj}\} \quad \text{for } k \in [1, n_v],\ j \in [1, n_h], \tag{1}$$

$$a \in \mathbb{R}^{n_v}, \quad b \in \mathbb{R}^{n_h}, \quad W \in \mathbb{R}^{n_v \times n_h}, \tag{2}$$

where $a$ are visible biases, $b$ are hidden biases, and $W$ is the network weight matrix. The total number of parameters is $|\Pi| = n_h \cdot n_v + n_h + n_v$ (see Fig. 1).

The inherent advantage offered by the RBM architecture for generative modelling is that there are no intra-layer connections (i.e. there are no connections between adjacent visible units or hidden units). This allows the hidden units to be summed out analytically, giving an ansatz that is independent of the activations of the hidden state space. Thus, one can define a positive NNS wavefunction as [12]

$$\Psi_\Pi(s) = e^{\sum_{k=1}^{n_v} a_k s_k} \prod_{j=1}^{n_h} 2\cosh\Big(\sum_k W_{kj} s_k + b_j\Big), \tag{3}$$

and therefore the NNS is $|\Psi_\Pi\rangle = \sum_s \Psi_\Pi(s)\,|s\rangle$.

Whilst NNS have typically been applied to qubit systems using binary visible units, one can extend the modelling to $d$-dimensional qudits by using a set of visible binary neurons that collectively represent a single qudit [17]. One may choose to encode $d$-dimensional states using a collection of $\tilde{d}$ visible, binary neurons via an encoding function $C$, i.e.

$$|s\rangle \mapsto C(s) = \{g_1, g_2, \ldots, g_{\tilde{d}}\} = g. \tag{4}$$

The $n_v$ qudit visible-layer can then be encoded into $\tilde{n}_v = \tilde{d}\, n_v > n_v$ visible neurons,

$$s = \{s_1, s_2, \ldots, s_{n_v}\} \mapsto \{g_1, g_2, \ldots, g_{\tilde{n}_v}\}. \tag{5}$$

We may identically define the qudit decoding function $\bar{C}$ such that $\bar{C}(g) = |s\rangle$. One may encode qudits into binary codes on the visible-layer, $|s\rangle \mapsto \mathrm{bin}(s)$, requiring $\tilde{n}_v = \lceil \log_2 d \rceil\, n_v$ visible binary neurons, which however requires $d = 2^r$ for some integer $r$ in order to admit a complete basis set. For arbitrary $d$ it may be more useful to utilise one-hot encoding such that $|s\rangle \mapsto \mathrm{onehot}(s) = e^d_s$, where $e^d_s$ is a $d$-length vector that is zero at all indices except index $s$.

In order to study non-positive quantum states one can introduce complex network parameters. Letting $a_k = \alpha_k + i\beta_k$, $b_j = \gamma_j + i\lambda_j$, and $W_{kj} = \Gamma_{kj} + i\Lambda_{kj}$, then the NNS wavefunction is

$$\Psi_\Pi(s) = e^{\sum_{k=1}^{n_v} (\alpha_k + i\beta_k) s_k} \prod_{j=1}^{n_h} 2\cosh\big(\theta^\gamma_j + i\theta^\lambda_j\big), \tag{6}$$

where $\theta^\gamma_j = \sum_k \Gamma_{kj} s_k + \gamma_j$ and $\theta^\lambda_j = \sum_k \Lambda_{kj} s_k + \lambda_j$. Thus the NNS can exhibit phase properties of quantum states. The network parameter set extends to $\Pi = \{a_k, b_j, W_{kj}\} \in \mathbb{C}$.

Alternatively one can preserve reality of network parameters by restructuring the nature of the NNS ansatz itself. In particular we can construct an ansatz that uses two RBMs that unify to represent a complete state. Defining a variational phase state $\Phi_\Xi(s)$ and amplitude state $\Psi_\Pi(s)$, this network ansatz is given as [14]

$$|\Psi_{\Pi,\Xi}\rangle = \sum_s e^{i \log \Phi_\Xi(s)}\, \Psi_\Pi(s)\, |s\rangle. \tag{7}$$

Therefore both the variational phase and amplitude networks need only be real valued, since the complex/phase properties of the state are managed through the complex exponential. The state is now defined by two parameter sets, $\Pi = \{a_k, b_j, W_{kj}\} \in \mathbb{R}$ and $\Xi = \{c_k, d_j, U_{kj}\} \in \mathbb{R}$.

Figure 1. Neural network quantum state architectures for the simulation of pure-states. Panel (a), "NNS Qudit Architecture", illustrates the standard NNS construction for $n$ qudits. The visible-layer consists of $n_v \times \tilde{d}$ units which encode the accessible basis states of the target system; here $\tilde{d}$ is the number of visible units required to encode a single qudit state, where $C(\cdot)$ is some encoding function such that $C(|d\rangle) = \{g_i\}_{i=1}^{\tilde{d}}$ and its inverse $\bar{C}(\{g_i\}_{i=1}^{\tilde{d}}) = |d\rangle$. Correlations between qudits are captured by an $n_h$ unit hidden-layer with interconnected weights and biases. Panel (b), "Amplitude/Phase NNS", illustrates the amplitude/phase machine that uses two hidden-layers and only real valued parameters.

B. Mixed States
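The mixed-state construction in this subsection reuses the RBM factors of Eqs. (3) and (7), so a small numerical sketch of the pure-state ansatz may be a useful reference point. The sketch assumes qubits with $s_i \in \{-1, 1\}$, randomly drawn parameters, and the standard RBM Boltzmann weight $\exp(a \cdot s + b \cdot h + s^T W h)$ for the brute-force sum over hidden units; the function names and that energy form are illustrative assumptions, not taken from the text.

```python
import itertools
import numpy as np

def nns_wavefunction(s, a, b, W):
    """Positive NNS amplitude of Eq. (3) for one visible configuration s."""
    return np.exp(a @ s) * np.prod(2.0 * np.cosh(s @ W + b))

def brute_force_marginal(s, a, b, W):
    """Sum the assumed Boltzmann weight exp(a.s + b.h + s.W.h) over all h in {-1, 1}^n_h."""
    total = 0.0
    for h in itertools.product([-1.0, 1.0], repeat=len(b)):
        h = np.array(h)
        total += np.exp(a @ s + b @ h + s @ W @ h)
    return total

rng = np.random.default_rng(0)
n_v, n_h = 3, 2  # illustrative sizes
a, b, W = rng.normal(size=n_v), rng.normal(size=n_h), rng.normal(size=(n_v, n_h))
c, d_bias, U = rng.normal(size=n_v), rng.normal(size=n_h), rng.normal(size=(n_v, n_h))

# All 2^n_v visible configurations, first qubit as the most significant index.
configs = np.array(list(itertools.product([-1.0, 1.0], repeat=n_v)))
amplitude = np.array([nns_wavefunction(s, a, b, W) for s in configs])       # Psi_Pi(s)
phase_net = np.array([nns_wavefunction(s, c, d_bias, U) for s in configs])  # Phi_Xi(s)

# Eq. (7): real amplitude network times the phase factor exp(i log Phi_Xi(s)).
state = amplitude * np.exp(1j * np.log(phase_net))
state /= np.linalg.norm(state)
```

Under these assumptions the closed-form product of $2\cosh$ factors agrees with the explicit sum over hidden configurations, which is what makes the absence of intra-layer connections convenient.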

To extend the variational ansatz to mixed states requires the addition of a hidden mixing-layer with $n_m$ hidden units, capable of encoding the classical probability distribution of the mixed quantum state [19–21]. The network state can be constructed from two sets of variational network parameters: $\Pi = \{c_p, U_{kp}\}$, with $c \in \mathbb{R}^{n_m}$ and $U \in \mathbb{C}^{n_v \times n_m}$, encoding the mixing probabilities [29], and the previously defined $\Xi = \{a_k, b_j, W_{kj}\} \in \mathbb{C}$, which encodes the pure-state probability distribution. Let the density-matrix row and column degrees of freedom be described by basis vectors $\{\alpha, \beta\}$ respectively. As these parameter sets are independent, we may describe a density-matrix element as a contribution from a classical mixing state $P_\Pi$ and a pure-state $\sigma_\Xi$. The contribution from the classical mixing network is given by

$$P^{\alpha,\beta}_\Pi = \prod_{p=1}^{n_m} \cosh\big(\phi_p(\alpha, \beta)\big), \tag{8}$$

$$\phi_p(\alpha, \beta) = c_p + \sum_k \big( U_{kp}\,\alpha_k + U^*_{kp}\,\beta_k \big), \tag{9}$$

where $x^*$ denotes complex conjugation. Meanwhile the pure-state contribution is

$$\sigma^{\alpha,\beta}_\Xi = e^{\omega(\alpha,\beta)} \prod_{j=1}^{n_h} \cosh\big(\theta_j(\alpha)\big) \cosh\big(\theta^*_j(\beta)\big), \tag{10}$$

$$\omega(\alpha, \beta) = \sum_k \big( a_k \alpha_k + a^*_k \beta_k \big), \tag{11}$$

$$\theta_j(x) = b_j + \sum_k W_{kj}\, x_k. \tag{12}$$

The complete variational state can therefore be constructed as a sum over all density-matrix elements,

$$\rho_{\Pi,\Xi} = \sum_{\alpha,\beta} \sigma^{\alpha,\beta}_\Xi \cdot P^{\alpha,\beta}_\Pi\, |\alpha\rangle\langle\beta| = P_\Pi \odot \sigma_\Xi, \tag{13}$$

where $\odot$ is the Hadamard product. It is important to emphasise that the classical mixing state $P_\Pi$ cannot capture quantum correlations, only classical correlations. Hence the pure-state $\sigma_\Xi$ alone simulates the quantum correlations within the network state. This architecture is presented in Fig. 2.

The network parameters in this ansatz are necessarily complex, but one can create a reformulated ansatz in order to use only real parameters. One could use the NNS of Eq. (7) to learn a vectorised density-matrix $\vec{\rho} = |\rho_{\Pi,\Xi}\rangle$.
Whilst optimal convergence towards the target vectorised mixed state is possible in this way, the ansatz itself is neither Hermitian nor positive semi-definite under reshaping to a density-matrix, i.e. $\rho = \mathrm{vec}^{-1}(\vec{\rho})$ is not a valid density-matrix.

Figure 2. A restricted Boltzmann machine architecture for the simulation of (generally entangled) density matrices using complex parameters.

Instead one can restructure the mixed state ansatz in order to take a closer form to the complex exponential format utilised in the previous sections. Let the real parameter sets $\Xi, \Pi$ be used to describe the pure-state phase and amplitude networks respectively, and the complex parameter set $\Omega$ be used to describe the mixing network. Recall a pure state wavefunction in complex exponential form, $\Psi_{\Pi,\Xi}(\alpha) = e^{i \log \phi_\Xi(\alpha)}\, \sigma_\Pi(\alpha)$. It is useful to define the following functions of our pure density-matrix phase/amplitude wavefunctions

$$\Phi^{\alpha,\beta}_\Xi = \frac{\phi_\Xi(\alpha)}{\phi_\Xi(\beta)}, \qquad \Gamma^{\alpha,\beta}_\Pi = \sigma_\Pi(\alpha)\, \sigma_\Pi(\beta). \tag{14}$$

In order to incorporate the classical mixing we need a mixing-layer that takes a similar vectorised form. Omitting the visible biases, which are already possessed by the pure-states, the mixing-layer takes the form

$$P^{\alpha,\beta}_\Omega = \prod_{p=1}^{n_m} \cosh(\mu_p + i\psi_p) = \prod_{p=1}^{n_m} r^{\alpha,\beta}_p\, e^{i \log \vartheta^{\alpha,\beta}_p}, \tag{15}$$

$$\mu_p(\alpha, \beta) = c_p + \sum_k R_{kp}\, (\alpha_k + \beta_k), \tag{16}$$

$$\psi_p(\alpha, \beta) = \sum_k I_{kp}\, (\alpha_k - \beta_k), \tag{17}$$

where $R_{kp} = \mathrm{Re}(U_{kp})$ and $I_{kp} = \mathrm{Im}(U_{kp})$ denote the real and imaginary components of the mixing network respectively.
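A quick numerical check may help confirm that the real-parameter form of Eqs. (15)-(17) reproduces the complex mixing layer of Eqs. (8)-(9). The sketch below assumes binary visible units in $\{-1, 1\}$ and randomly drawn parameters; the sizes, seed, and function names are illustrative choices rather than anything specified in the text.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n_v, n_m = 2, 3                                   # illustrative sizes
c = rng.normal(size=n_m)                          # real mixing biases c_p
U = rng.normal(size=(n_v, n_m)) + 1j * rng.normal(size=(n_v, n_m))  # complex U_kp
R, Im = U.real, U.imag                            # R_kp and I_kp of Eqs. (16)-(17)

configs = np.array(list(itertools.product([-1.0, 1.0], repeat=n_v)))

def mixing_complex(alpha, beta):
    """Eqs. (8)-(9): prod_p cosh(c_p + sum_k U_kp alpha_k + U*_kp beta_k)."""
    phi = c + alpha @ U + beta @ U.conj()
    return np.prod(np.cosh(phi))

def mixing_real(alpha, beta):
    """Eqs. (15)-(17): prod_p cosh(mu_p + i psi_p) with real R and I."""
    mu = c + (alpha + beta) @ R
    psi = (alpha - beta) @ Im
    return np.prod(np.cosh(mu + 1j * psi))

P_complex = np.array([[mixing_complex(x, y) for y in configs] for x in configs])
P_real = np.array([[mixing_real(x, y) for y in configs] for x in configs])

same_matrix = np.allclose(P_complex, P_real)            # both forms agree
is_hermitian = np.allclose(P_real, P_real.conj().T)     # P^{alpha,beta} = (P^{beta,alpha})*
```

The agreement follows from writing $U_{kp} = R_{kp} + i I_{kp}$, so that $U_{kp}\alpha_k + U^*_{kp}\beta_k = R_{kp}(\alpha_k + \beta_k) + i I_{kp}(\alpha_k - \beta_k)$.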
One can then construct the following amplitude and phase functions for the classical mixing

$$r^{\alpha,\beta}_\Omega = \prod_{p=1}^{n_m} \sqrt{\cosh(\mu_p + i\psi_p)\, \cosh(\mu_p - i\psi_p)}, \tag{18}$$

$$\vartheta^{\alpha,\beta}_\Omega = \prod_{p=1}^{n_m} \exp\left[ -\frac{i}{2} \log\left( \frac{\cosh(\mu_p + i\psi_p)}{\cosh(\mu_p - i\psi_p)} \right) \right], \tag{19}$$

such that the vectorised mixing state takes the form $e^{i \log |\vartheta_\Omega\rangle}\, |r_\Omega\rangle$. This allows for any element of the complete mixed state to be expressed according to

$$\rho^{\alpha,\beta}_{\Omega,\Pi,\Xi} = e^{i \log\left( \Phi^{\alpha,\beta}_\Xi\, \vartheta^{\alpha,\beta}_\Omega \right)}\, \Gamma^{\alpha,\beta}_\Pi\, r^{\alpha,\beta}_\Omega. \tag{20}$$

II. SEPARABLE NEURAL NETWORK ARCHITECTURES

A. Separable Pure Network States

Through restrictions on the connectivity of the weight matrix $W_{kj}$, one can guarantee separability of the generative network state. Let us define $\mathcal{K}$ as a collection of $K$ disjoint subsets $\mathcal{K} = \{k_l\}_{l=1}^{K}$ that collect qudit indices from an $n$-qudit system. More precisely,

$$\mathcal{K} = \bigcup_{l=1}^{K} k_l, \quad \text{s.t. } \{1, \ldots, n\} \subseteq \mathcal{K}, \tag{21}$$

$$k_m \cap k_l = \emptyset, \quad \forall\, m \neq l \in \{1, \ldots, K\}. \tag{22}$$

In Eq. (21) we have demanded that the global partition set necessarily contains all $n$ qudits in the system, and in Eq. (22) that subsets of qudits are disjoint. Hence, an $n$-qudit pure-state $|\Psi\rangle$ is defined to be $\mathcal{K}$-separable if it can be expressed as a tensor-product of sub-states $|\Psi\rangle = \bigotimes_{k \in \mathcal{K}} |\psi_k\rangle$, i.e. it is separable with respect to the partition set $\mathcal{K}$. This is a very precise form of separability, as it specifies exactly the arrangement of entangled parties. If we were to disregard specific party orderings we would refer to $(|\mathcal{K}| = K)$-separability.

Disjointedness in this definition of $\mathcal{K}$-separability ensures that each qudit is only entangled with respect to a single subset of the quantum system. This provides a specific level of detail to the entanglement structure, while also degenerating many forms of entanglement that we may not be interested in. For example, genuine tripartite entanglement under disjoint $\mathcal{K}$-separability allows for only a single set $\mathcal{K} = \{k_1\} = \{1, 2, 3\}$ with no partitions. We may then define non-disjoint $\mathcal{K}$-separability as an extension of the previous definition simply by removing the conditions in Eq. (22). Using this non-disjoint definition, genuine tripartite entanglement allows for many more definitions, $\mathcal{K} = \{1, 2, 3\}, \{1, 2 \,|\, 2, 3\}, \{1, 2 \,|\, 2, 3 \,|\, 1, 3\}, \ldots$, which is studied in later sections.

To strictly impose either type of separability on an NNS, the goal is to express the wavefunction of the network state in the following form

$$\Psi_\Pi(s) = \prod_{l=1}^{K} \psi^{k_l}_\Pi(s), \tag{23}$$

where $\psi^{k_l}_\Pi$ are separable sub-wavefunctions that describe the behaviour of qudits in the partition $k_l$. We may then construct an analogous hidden-layer partition set $\mathcal{H} = \{h_l\}_{l=1}^{K}$, which assigns a subset of hidden units to each visible subset of entangled qudits $\mathcal{K} = \{k_l\}_{l=1}^{K}$. By segmenting the layer of hidden units into these $K$ subsets and applying the following restriction to the weight matrix

$$W_{ij} = 0 \quad \text{for } i \in k_l,\ j \notin h_l,\ \forall\, l \in \{1, \ldots, K\}, \tag{24}$$

this condition then provides the complete, $\mathcal{K}$-separable network state

$$\Psi_{\Pi|\mathcal{K}}(s) = \prod_{l=1}^{K} e^{\tilde{\omega}_l(s)} \prod_{j \in h_l} 2\cosh\big(\theta^j_l(s)\big), \qquad \theta^j_l(s) = \sum_{i \in k_l} W_{ij}\, s_i + b_j, \qquad \tilde{\omega}_l(s) = \sum_{i \in k_l} a_i s_i. \tag{25}$$

Figure 3. Different pure-state network architectures used to simulate genuine tripartite entanglement. Panel (a), "GHZ-type entanglement", depicts a form of GHZ-type entanglement according to the partition set $\mathcal{K}_{\mathrm{GHZ}} = \{1, 2 \,|\, 2, 3\}$. Notice that qudits 1 and 3 do not possess a direct connection, but may relay correlations through qudit 2. Panel (b), "W-type entanglement", illustrates a non-disjoint, W-type entanglement structure according to $\mathcal{K} = \{1, 2 \,|\, 2, 3 \,|\, 1, 3\}$.

B. Separable Neural Network Density Matrices
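The mixed-state constructions in this subsection reuse the weight restriction of Eq. (24), so a concrete check of that restriction on a pure state may be helpful first. The sketch assumes three qubits (indexed from 0 in the code), the disjoint partition $\{1, 2 \,|\, 3\}$, a hypothetical assignment of four hidden units, and random real parameters; it compares the Schmidt rank across the cut for a masked and an unmasked network.

```python
import itertools
import numpy as np

def separable_mask(visible_parts, hidden_parts, n_v, n_h):
    """Mask that is 1 only where visible unit i in k_l meets hidden unit j in h_l (Eq. (24))."""
    mask = np.zeros((n_v, n_h))
    for k_l, h_l in zip(visible_parts, hidden_parts):
        mask[np.ix_(k_l, h_l)] = 1.0
    return mask

def nns_amplitudes(a, b, W):
    """Eq. (3) evaluated on all configurations s in {-1, 1}^n_v (first qubit most significant)."""
    configs = np.array(list(itertools.product([-1.0, 1.0], repeat=len(a))))
    theta = configs @ W + b
    return np.exp(configs @ a) * np.prod(2.0 * np.cosh(theta), axis=1)

def schmidt_rank(psi, tol=1e-10):
    """Rank of the amplitude matrix across the cut (qubits 0,1 | qubit 2)."""
    sv = np.linalg.svd(psi.reshape(4, 2), compute_uv=False)
    return int(np.sum(sv > tol * sv[0]))

rng = np.random.default_rng(0)
n_v, n_h = 3, 4
a, b = rng.normal(size=n_v), rng.normal(size=n_h)
W = rng.normal(size=(n_v, n_h))

# Partition {1,2 | 3} in the paper's labelling; hidden units split as {0,1,2 | 3} (an assumption).
mask = separable_mask([[0, 1], [2]], [[0, 1, 2], [3]], n_v, n_h)

rank_separable = schmidt_rank(nns_amplitudes(a, b, W * mask))  # masked weights
rank_free = schmidt_rank(nns_amplitudes(a, b, W))              # fully connected weights
```

With the mask applied the wavefunction factorises as in Eq. (25), so the Schmidt rank across the cut is one, while generic fully connected weights give a higher rank.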

Whilst pure-states are $\mathcal{K}$-separable when they can be expressed as the tensor product of $|\mathcal{K}| = K$ local sub-states, a mixed state possesses a form of separability iff it can be expressed as a convex combination of local sub-states $\rho_{\{k_l\}_{l=1}^{K}}$. It is now useful to define two distinct forms of separability: consistent and inconsistent mixed-multipartite separability.

A state is consistently $\mathcal{K}$-separable if it can be expressed as a convex combination of states which all admit an identical form of separability,

$$\rho_{\mathcal{K}} = \sum_j p_j \bigotimes_{k \in \mathcal{K}} \rho^k_j. \tag{26}$$

On the contrary, a state is inconsistently $\{\mathcal{K}_j\}$-separable if it is a mixture of states with different entanglement properties,

$$\rho_{\{\mathcal{K}_j\}} = \sum_j p_j \bigotimes_{k \in \mathcal{K}_j} \rho^k_j, \tag{27}$$

so its entanglement properties are defined by a combination of constituent $\mathcal{K}_j$-separabilities. Precise classification methods are much more difficult for mixed states; however, there are still some very useful approaches that can be introduced using NNS.

Consistently $\mathcal{K}$-separable states require a direct application of the separability conditions given by Eq. (24) onto the pure-state of the NNS. Since the mixing state cannot capture quantum correlations, it is already separable and requires no restrictions. Applying Eq. (24) to the pure-states of the mixed NNS therefore restricts the capacity of the neural network to simulate quantum correlations. Enforcing separability on the pure density-matrix in this way,

$$\sigma^{\alpha,\beta}_{\Xi|\mathcal{K}} = \prod_{l=1}^{K} e^{\omega_l(\alpha,\beta)} \prod_{j \in h_l} \cosh\big(\theta^j_l(\alpha)\big) \cosh\big(\theta^{j*}_l(\beta)\big), \qquad \omega_l(\alpha, \beta) = \sum_{i \in k_l} \big( a_i \alpha_i + a^*_i \beta_i \big), \tag{28}$$

thus provides a NNS guaranteed to be consistently $\mathcal{K}$-separable

$$\rho^{\mathcal{K}}_{\Pi,\Xi} = P_\Pi \odot \sigma_{\Xi|\mathcal{K}}. \tag{29}$$

If one wishes to enforce complete separability such that for an $n$-qudit state $\rho = \sum_j p_j \bigotimes_{m=1}^{n} \rho^m_j$, one can of course just apply consistent separability onto the network state via the separability set $\mathcal{K} = \{1 \,|\, 2 \,|\, \ldots \,|\, n\}$ in an identical manner as before. However, as the state is completely separable, there are no quantum correlations and the pure-states in the network ansatz are not necessary for simulation of the state. It can then be simplified to $\rho_\Pi = P_\Pi$, and we can simulate completely separable mixed quantum systems using an RBM with a classical mixing-layer only [30]

$$\rho^{\mathrm{Sep}}_\Pi = \sum_{\alpha,\beta} e^{\omega(\alpha,\beta)} \prod_{p=1}^{n_m} \cosh\big(\phi_p(\alpha, \beta)\big)\, |\alpha\rangle\langle\beta|. \tag{30}$$

Unfortunately, it is not possible to strictly classify an inconsistently separable mixed state according to the ansätze discussed in this Section. Take the tripartite example

$$\rho = \sum_j p_j\, \rho_j^{\{1,2|3\}} + \sum_k p_k\, \rho_k^{\{1|2,3\}} + \sum_m p_m\, \rho_m^{\{1,3|2\}}, \tag{31}$$

which can be thought of as a "cheap" genuine tripartite entangled state. We can certainly define an NNS that can reconstruct a state of this form (trivially, one can utilise a fully connected NNS that can reconstruct $\rho$); however, we cannot specify all three forms of separability in $\rho$ without also allowing the NNS to potentially manifest genuine, pure tripartite entanglement. One can instead utilise independent consistently separable NNS according to the partitions $\{1, 2 \,|\, 3\}$, $\{2, 3 \,|\, 1\}$ and $\{1, 3 \,|\, 2\}$ in order to quantify the amount of entanglement in the target state with respect to each partition.

III. CLASSIFYING AND QUANTIFYING ENTANGLEMENT

A. Learning of Quantum States

We present a learning protocol for a pure NNS $|\Psi_{\Pi,\Xi}\rangle$ to reconstruct a target state $|\phi\rangle$ using the ansatz from Eq. (7), which is then extendible to mixed states. We employ a unified learning approach, where the variational state optimises the global, vectorised fidelity with a target state, rather than separate phase and amplitude fidelities. We may define the loss function as the negative logarithmic fidelity between two pure-states as a function of our set of variational parameters

$$\mathcal{L} = -\log \sqrt{ \frac{ |\langle \Psi_{\Pi,\Xi} | \phi \rangle|^2 }{ \langle \Psi_{\Pi,\Xi} | \Psi_{\Pi,\Xi} \rangle\, \langle \phi | \phi \rangle } }. \tag{32}$$

Splitting these wavefunctions into respective phase and amplitude functions,

$$\Psi_{\Pi,\Xi}(s) = \psi_\Pi(s)\, e^{i \log(\varphi_\Xi(s))}, \qquad \phi(s) = \lambda(s)\, e^{i \log(\xi(s))}, \tag{33}$$

we wish to compute the derivatives of the unified cost function with respect to the parameter sets $\{\Pi, \Xi\}$. Since these wavefunctions utilise only real parameters, it is expedient to compute the derivatives using the following chain rule formulation,

$$\nabla^{\psi}_{\Pi_k} \mathcal{L} = \frac{\partial \mathcal{L}}{\partial |\psi_\Pi\rangle} \cdot \frac{\partial |\psi_\Pi\rangle}{\partial \Pi_k}, \qquad \nabla^{\varphi}_{\Xi_k} \mathcal{L} = \frac{\partial \mathcal{L}}{\partial |\varphi_\Xi\rangle} \cdot \frac{\partial |\varphi_\Xi\rangle}{\partial \Xi_k}. \tag{34}$$

Computing these gradients will provide the necessary parameter update rules at the $m$th iteration to the $k$th network parameter by gradient descent, taking the form

$$\Pi^{m+1}_k = \Pi^m_k - \eta\, \nabla^{\psi}_{\Pi_k} \mathcal{L}, \qquad \Xi^{m+1}_k = \Xi^m_k - \eta\, \nabla^{\varphi}_{\Xi_k} \mathcal{L}, \tag{35}$$

where $\eta$ is some learning rate small enough such that the network state converges to the target state over sufficient iterations of the learning scheme.

Defining the quantity

$$\Delta(s) = \langle \Psi_{\Pi,\Xi} | \phi \rangle^{-1}\, e^{i \log\left( \frac{\varphi_\Xi(s)}{\xi(s)} \right)}, \tag{36}$$

complete gradients with respect to variational parameters can therefore be computed as

$$\nabla^{\psi}_{\Pi_k} \mathcal{L} = \sum_s \left[ \frac{\psi_\Pi(s)}{|\Psi_{\Pi,\Xi}|^2} - \lambda(s)\, \mathrm{Re}\big[\Delta(s)\big] \right] O_{\Pi_k} |\psi_\Pi\rangle, \tag{37}$$

$$\nabla^{\varphi}_{\Xi_k} \mathcal{L} = -\sum_s \left[ \frac{\lambda(s)\, \psi_\Pi(s)}{\varphi_\Xi(s)}\, \mathrm{Im}\big[\Delta(s)\big] \right] O_{\Xi_k} |\varphi_\Xi\rangle, \tag{38}$$

where $O_{\Pi_k} = \mathrm{diag}\left( \partial_{\Pi_k} \log |\psi_\Pi\rangle \right)$ and $O_{\Xi_k} = \mathrm{diag}\left( \partial_{\Xi_k} \log |\varphi_\Xi\rangle \right)$ denote diagonal matrices containing the logarithmic derivatives of the network state with respect to the $k$th amplitude and phase network parameters respectively. Utilising Eqs. (37) and (38) in the update rule given by Eq. (35), the phase and amplitude properties will optimise in a unified manner, maximising the fidelity between the network and the target state endowed with non-trivial phase structure.

Fortunately this learning procedure is readily extended to mixed states via the ansatz in Eq. (20). Since the variational state is in a complex exponential format, one then formulates a cost function based on the fidelity between the vectorised density-matrix and the vectorised target state. The extension is straightforward and explained in Appendix A.

As shown in Ref. [27], separable neural network states can be used to perform entanglement classification and provide entanglement measures of pure, two-dimensional quantum states.
Using qudit sub-encoding and the mixed state architectures discussed in the previous sections, these ideas can be extended to classification of more complex quantum systems.

Let us devise a precise decision rule for classification. Consider a target $n$-qudit state $\sigma$, a $\mathcal{K}$-separable learner $\rho^{\mathcal{K}}_\Omega$, and a free, entangled learner $\rho^{\mathrm{Ent}}_\Omega$ which have both been optimised with respect to reconstructing $\sigma$. Using the Bures fidelity, $F(\sigma, \rho) = \mathrm{Tr}\sqrt{\sqrt{\sigma}\, \rho\, \sqrt{\sigma}}$, we denote the reconstruction fidelity of a learning process as the final/optimal fidelity achieved after a given number of learning iterations. A target $\sigma$ is learnable via $\rho^{\mathrm{Ent}}_\Omega$ iff its reconstruction fidelity satisfies

$$F(\sigma, \rho^{\mathrm{Ent}}_\Omega) \geq F_{\mathrm{opt}} = 1 - \epsilon, \tag{39}$$

for a sufficiently small threshold $\epsilon$. The choice of $F_{\mathrm{opt}}$ determines the reliability of classification, and in our numerical experiments we fix $\epsilon \leq$ [...]. The accuracy of this reconstruction via free learning also benchmarks the satisfactory computational resources required in the network, informing the separable reconstruction.

One can reliably infer that a target state is $\mathcal{K}$-separable if it is learnable by both a free NNS ($\rho^{\mathrm{Ent}}_\Omega$) and a $\mathcal{K}$-separable NNS ($\rho^{\mathcal{K}}_\Omega$). Then the NNS reconstruction fidelities must satisfy

$$F(\sigma, \rho^{\mathcal{K}}_\Omega) \geq F(\sigma, \rho^{\mathrm{Ent}}_\Omega) \geq F_{\mathrm{opt}}. \tag{40}$$

Otherwise, the state is entangled to a higher degree. One may then quantify the entanglement content of the target by investigating the distance between $\sigma$ and an approximation to the closest $\mathcal{K}$-separable state.

B. Quantifying Entanglement
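Since the measures in this subsection build on the decision rule of Eqs. (39) and (40), a minimal numerical sketch of that rule is given here first. It assumes the two learners have already been optimised and only their density matrices (or final fidelities) are available; the helper names are hypothetical, and the rule is simplified to "learnable by both learners", i.e. both fidelities at or above $F_{\mathrm{opt}}$.

```python
import numpy as np

def psd_sqrt(m):
    """Matrix square root of a positive semi-definite Hermitian matrix via eigendecomposition."""
    m = 0.5 * (m + m.conj().T)              # remove numerical asymmetry
    vals, vecs = np.linalg.eigh(m)
    vals = np.clip(vals, 0.0, None)         # clip tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.conj().T

def bures_fidelity(sigma, rho):
    """F(sigma, rho) = Tr sqrt( sqrt(sigma) rho sqrt(sigma) )."""
    root = psd_sqrt(sigma)
    return float(np.real(np.trace(psd_sqrt(root @ rho @ root))))

def classify(f_ent, f_sep, eps):
    """Simplified reading of Eqs. (39)-(40) with threshold F_opt = 1 - eps."""
    f_opt = 1.0 - eps
    if f_ent < f_opt:
        return "not learnable"              # free learner failed, no inference possible
    return "K-separable" if f_sep >= f_opt else "entangled beyond K"

# Illustrative inputs: a two-qubit Bell state and the maximally mixed state.
bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2.0)
sigma = np.outer(bell, bell)
mixed = np.eye(4) / 4.0

f_same = bures_fidelity(sigma, sigma)
f_mixed = bures_fidelity(sigma, mixed)
```

In practice `f_ent` and `f_sep` would be the reconstruction fidelities reached by $\rho^{\mathrm{Ent}}_\Omega$ and $\rho^{\mathcal{K}}_\Omega$; the threshold `eps` is left as a free argument here.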

The most difficult aspect of quantifying entanglement stems from the complicated nature of characterising the space of separable quantum states. Thanks to the implicit guarantee of specific separability, SNNS offer an extremely useful tool to help with this, and provide the opportunity to study a variety of entanglement measures that are otherwise much too difficult to explore.

Let us consider measures $E$ that satisfy the general properties of a valid entanglement measure [31]. Many important types of $E$ are constructed as a geometric optimisation problem with respect to the space of all fully separable states $\mathcal{D}_{\mathrm{Sep}}$. That is, given a target state $\sigma$ and a distance measure (possibly quasi-distance measure) $f$,

$$E(\sigma) = \min_{\rho \in \mathcal{D}_{\mathrm{Sep}}} f(\sigma, \rho), \tag{41}$$

$$\text{if } \sigma \in \mathcal{D}_{\mathrm{Sep}} \implies E(\sigma) = 0, \tag{42}$$

$$\text{if } \sigma \notin \mathcal{D}_{\mathrm{Sep}} \implies E(\sigma) > 0. \tag{43}$$

These are entanglement measures which are computed by locating the Closest Separable State (CSS) $\sigma^\star$ to $\sigma$, with respect to the distance measure $f$. For such measures, the employment of SNNS to parameterise the separable states $\rho_\Omega \in \mathcal{D}_{\mathrm{Sep}}$ is extremely useful, as it offers an efficient way to perform this optimisation. Furthermore, since SNNS are inherently separable, they will always approximate an upper bound on $E$, since they are certifiably limited in the quantum correlations that they are able to simulate. That is,

$$E(\sigma) \leq E_\Omega(\sigma) = \min_{\rho_\Omega \in \mathcal{D}_{\mathrm{Sep}}} f(\sigma, \rho_\Omega). \tag{44}$$

To generalise, we may construct a measure $E_{\mathcal{K}}$ which is analogous to $E$, but is defined with respect to the space of all states which are at most $\mathcal{K}$-separable. Defining the set of all states that are $\mathcal{K}$-separable as $\mathcal{D}_{\mathcal{K}}$, then the set of all states that are at most $\mathcal{K}$-separable is given by [32]

$$\tilde{\mathcal{D}}_{\mathcal{K}} = \mathcal{D}_{\mathcal{K}} \bigcup_{|\mathcal{K}'| > |\mathcal{K}|} \mathcal{D}_{\mathcal{K}'}. \tag{45}$$

Assuming a measure of the form Eq. (41), then we can define

$$E_{\mathcal{K}}(\sigma) = \min_{\rho \in \tilde{\mathcal{D}}_{\mathcal{K}}} f(\sigma, \rho) \leq E^{\mathcal{K}}_\Omega(\sigma), \tag{46}$$

$$\text{if } \sigma \in \tilde{\mathcal{D}}_{\mathcal{K}} \implies E_{\mathcal{K}}(\sigma) = 0, \tag{47}$$

$$\text{if } \sigma \notin \tilde{\mathcal{D}}_{\mathcal{K}} \implies E_{\mathcal{K}}(\sigma) > 0. \tag{48}$$

$E_{\mathcal{K}}$ satisfies all the general properties of an entanglement measure, but now with respect to $\tilde{\mathcal{D}}_{\mathcal{K}}$, and is therefore able to classify/quantify more complex forms of entanglement.

Let us specify some important entanglement measures which SNNS can utilise, starting from the Geometric Measure of Entanglement (GME) [33]. For pure-states, the GME is the maximum fidelity that can be obtained between a target state $|\sigma\rangle$ and the set of pure, at most $\mathcal{K}$-separable states $\tilde{\mathcal{B}}_{\mathcal{K}}$,

$$E_G(|\sigma\rangle) = \max_{|\phi\rangle \in \tilde{\mathcal{B}}_{\mathcal{K}}} F(|\sigma\rangle, |\phi\rangle). \tag{49}$$

For more sophisticated mixed state approaches, it is expedient to employ any number of density-matrix distance measures. Several important examples include the trace distance

$$E_C(\sigma) = \frac{1}{2} \min_{\rho \in \mathcal{D}_{\mathrm{Sep}}} \| \sigma - \rho \|_1, \tag{50}$$

where $\|X\|_1 = \mathrm{Tr}\sqrt{X^\dagger X}$, or the Bures metric

$$E_B(\sigma) = \min_{\rho \in \mathcal{D}_{\mathrm{Sep}}} \big[ 1 - F(\sigma, \rho) \big], \tag{51}$$

where $F$ is the Bures fidelity as before. These quantities are readily approximated via SNNS, and easily specified to different forms of $\mathcal{K}$-separability.

Of particular interest is the Relative Entropy of Entanglement (REE) [28], an entanglement measure that has many applications in quantum communications and channel capacities [34]. The REE is based on the quantum relative entropy (QRE), a kind of distance measure between two quantum states where

$$S(\rho \| \sigma) = \mathrm{Tr}\left[ \rho\, (\log \rho - \log \sigma) \right], \tag{52}$$

such that $S(\rho \| \sigma) \in [0, +\infty)$. Due to its asymmetry and the fact that it is infinite on pure-states, it is not a true metric; however, it is nonetheless extremely useful. Defining the REE then follows

$$E_R(\rho) = \min_{\sigma \in \mathcal{D}_{\mathrm{Sep}}} S(\rho \| \sigma), \tag{53}$$

which can be readily employed with respect to parameterised NNS. This can of course generalise to $E^{\mathcal{K}}_R(\sigma)$ given a form of separability. Interestingly, the REE is sub-additive and in general

$$E_R(\rho \otimes \sigma) \leq E_R(\rho) + E_R(\sigma). \tag{54}$$

This lets us define a regularised $n$-shot REE

$$E^n_R(\rho) = \frac{1}{n} \min_{\sigma \in \mathcal{D}_{\mathrm{Sep}}} S(\rho^{\otimes n} \| \sigma) \leq E_R(\rho). \tag{55}$$

The single-shot, standard REE alone is an extremely difficult quantity to compute, largely due to the characterisation of $\mathcal{D}_{\mathrm{Sep}}$ and the unruliness of the QRE. Its computation has recently been explored using an active learning strategy [35], in which the authors use active learning to compress $\mathcal{D}_{\mathrm{Sep}}$ into a more relevant subset of the separable state space that contributes strongly to the REE. Thanks to the implicit separability of NNS, we may choose an alternative approach where it is possible to optimise some other cost function, such as fidelity or trace distance, that will simultaneously minimise the QRE towards the optimal REE. In doing so, SNNS should allow for the accurate and efficient approximation of $E_R$, and of previously unexplored REEs with respect to other forms of separability $E^{\mathcal{K}}_R$.
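To make the distance measures of Eqs. (50) and (52) concrete, the following sketch evaluates the trace distance and the quantum relative entropy for given density matrices. It is only the inner cost evaluation, not the minimisation over separable states; the base-2 logarithm, the numerical tolerance, and the function names are assumptions made for illustration.

```python
import numpy as np

def trace_distance(sigma, rho):
    """(1/2) ||sigma - rho||_1 for Hermitian inputs, via eigenvalues of the difference."""
    return 0.5 * float(np.sum(np.abs(np.linalg.eigvalsh(sigma - rho))))

def relative_entropy(rho, sigma, tol=1e-12):
    """S(rho || sigma) = Tr[rho (log2 rho - log2 sigma)]; returns inf on a support mismatch."""
    p, u = np.linalg.eigh(rho)
    q, v = np.linalg.eigh(sigma)
    keep = p > tol                                   # use the convention 0 log 0 = 0
    overlap = np.abs(u.conj().T @ v) ** 2            # |<u_i|v_j>|^2
    weight = p[keep] @ overlap[keep]                 # weight[j] = sum_i p_i |<u_i|v_j>|^2
    if np.any(weight[q <= tol] > tol):
        return np.inf                                # rho has weight outside supp(sigma)
    pos = q > tol
    tr_rho_log_rho = np.sum(p[keep] * np.log2(p[keep]))
    tr_rho_log_sigma = np.sum(weight[pos] * np.log2(q[pos]))
    return float(tr_rho_log_rho - tr_rho_log_sigma)

# Illustrative single-qubit inputs.
rho = np.diag([0.75, 0.25])
sigma = np.eye(2) / 2.0
s_value = relative_entropy(rho, sigma)
s_swapped = relative_entropy(sigma, rho)             # differs from s_value: the QRE is asymmetric
```

In an SNNS setting, `sigma` would be replaced by the separable learner's density matrix and the value monitored during optimisation as an upper bound on $E_R$.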


Figure 4. The classification and entanglement quantification of a $d = 5$ Werner state $\varrho_{\eta,d}$, defined in Eq. (56), for $\eta = -0.75$ [...] $\epsilon <$ [...] precision of the known analytical value $E_R(\varrho_{\eta,d}) \approx$ [...]. The approximate CSS $\rho^{\mathrm{Sep}}_\Omega \approx \varrho^\star_{\eta,d}$ and target state approximations are also shown.

IV. APPLICATIONS AND RESULTS

A. Mixed States in d-dimensions

The most substantial generalisation of the methods introduced in Ref. [27] is the ability to classify and quantify entanglement in mixed, $d$-dimensional states. To illustrate this improvement, consider the $d$-dimensional Werner state, parameterised by

$$\varrho_{\eta,d} = \frac{(d - \eta)\, I_d^{\otimes 2} + (d\eta - 1)\, F_d}{d\,(d^2 - 1)}, \tag{56}$$

where $F_d = \sum_{i,j=0}^{d-1} |ij\rangle\langle ji|$ is the two-qudit flip operator, $I_d$ is the $d$-dimensional identity operator, and $\eta$ characterises the entanglement properties of the state. For $\eta \in [-1, 0]$ the state is entangled, and we can easily quantify this entanglement using the analytically known REE [36],

$$E_R(\varrho_{\eta,d}) = \frac{1 + \eta}{2} \log_2(1 + \eta) + \frac{1 - \eta}{2} \log_2(1 - \eta). \tag{57}$$

In Fig. 4 we display an optimisation procedure for $d = 5$, $\eta = -0.75$ using an entangled learner $\rho^{\mathrm{Ent}}_\Omega$ and a fully separable learner $\rho^{\mathrm{Sep}}_\Omega$. The free, entangled learner is able to reconstruct the target Werner state with ease and to an extremely high fidelity, while the fully separable learner correctly classifies the target as entangled.

Beyond the obvious entanglement classification, the SNNS is able to quantify the REE of the state by monitoring the relative entropy $E^\Omega_R(\varrho_{\eta,d}) = S(\varrho_{\eta,d} \| \rho^{\mathrm{Sep}}_\Omega)$ throughout the learning process. As the optimisation converges, $E^\Omega_R \to E_R$, we gather an approximation to the REE of the state. Indeed, under typical optimisation settings, the REE is approximated to within $\epsilon <$ [...] precision of the known analytical value $E_R(\varrho_{-0.75,5}) \approx$ [...].

B. Classification of Bound Entangled States
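As a numerical companion to the Werner example of Eq. (56), and a point of contrast for the PPT discussion that follows, the sketch below builds $\varrho_{\eta,d}$, checks that it is a valid state, and inspects its partial transpose. The closed form used in `werner_ree` is the reading of Eq. (57) with base-2 logarithms, which is an assumption where the source is not explicit; all function names are illustrative.

```python
import numpy as np

def werner_state(eta, d):
    """Eq. (56): ((d - eta) I + (d eta - 1) F) / (d (d^2 - 1)) on two qudits."""
    ident = np.eye(d * d)
    flip = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            flip[i * d + j, j * d + i] = 1.0         # |ij><ji|
    return ((d - eta) * ident + (d * eta - 1.0) * flip) / (d * (d**2 - 1))

def partial_transpose(rho, d):
    """Transpose the second qudit: rho[(i,j),(k,l)] -> rho[(i,l),(k,j)]."""
    return rho.reshape(d, d, d, d).transpose(0, 3, 2, 1).reshape(d * d, d * d)

def werner_ree(eta):
    """Assumed closed form of Eq. (57) in bits, intended for -1 < eta <= 0."""
    return 0.5 * ((1 + eta) * np.log2(1 + eta) + (1 - eta) * np.log2(1 - eta))

d, eta = 5, -0.75
rho = werner_state(eta, d)

trace = float(np.trace(rho))                                   # should equal 1
min_eig = float(np.linalg.eigvalsh(rho).min())                 # non-negative for a valid state
min_eig_pt = float(np.linalg.eigvalsh(partial_transpose(rho, d)).min())
ree_reference = float(werner_ree(eta))                         # target value for E_R
```

For $\eta < 0$ the partial transpose acquires a negative eigenvalue, so the Werner state is detected by the PPT criterion; the bound entangled states considered below are precisely those for which this test is inconclusive.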

The positivity of a partially transposed quantum system can be a signature of separability. However it is not universal, and there exist classes of states which are PPT but are entangled, known as bound entangled (BE) states. Here we consider the following two-qutrit state,

$$\sigma_+ = \frac{1}{3}\big( |01\rangle\langle 01| + |12\rangle\langle 12| + |20\rangle\langle 20| \big), \qquad \sigma_- = \frac{1}{3}\big( |10\rangle\langle 10| + |21\rangle\langle 21| + |02\rangle\langle 02| \big),$$

$$\sigma_\alpha = \frac{2}{7} |\Phi^+\rangle\langle\Phi^+| + \frac{\alpha}{7}\, \sigma_+ + \frac{5 - \alpha}{7}\, \sigma_-, \tag{58}$$

where $|\Phi^+\rangle = \frac{1}{\sqrt{3}}\big( |00\rangle + |11\rangle + |22\rangle \big)$ is a $d = 3$ dimensional Bell state. This state is known to satisfy the following entanglement properties [37]:

$$\sigma_\alpha \text{ is } \begin{cases} \text{Separable} & \text{if } 2 \leq \alpha \leq 3, \\ \text{Bound Entangled} & \text{if } 3 < \alpha \leq 4, \\ \text{Free Entangled} & \text{if } 4 < \alpha \leq 5. \end{cases} \tag{59}$$

Here we investigate the target state in the bound entangled region, and show that this bipartite state cannot be optimally reconstructed via SNNS. Fig. 5 depicts the employment of entangled learners $\rho^{\mathrm{Ent}}_\Omega$ (blue) and fully separable learners $\rho^{\mathrm{Sep}}_\Omega$ (red) to reconstruct $\sigma_\alpha$ across the domain $3 < \alpha \leq 4$. For each $\alpha$, $\rho^{\mathrm{Ent}}_\Omega$ is able to reconstruct the state to a high degree of precision such that the trace distance is $\| \sigma_\alpha - \rho^{\mathrm{Ent}}_\Omega \| \leq$ [...]. However, the separable learners are unable to reach this level of reconstruction accuracy. Hence, since $\sigma_\alpha$ are learnable via free NNS, the inability of $\rho^{\mathrm{Sep}}_\Omega$ to reconstruct $\sigma_\alpha$ implies that these states are entangled in this region. Since they are also PPT in this region, we have successfully shown the ability of SNNS to classify bound entanglement.

During each constrained optimisation we gather an upper bound on the distance between the target bound entangled state and its CSS. As said before, this is an upper bound since $\rho^{\mathrm{Sep}}_\Omega$ offers an approximation to the CSS, and is potentially loose. Nonetheless the inferred classification is informative. Fig. 5 plots the trace distance $\| \sigma_\alpha - \rho^{\mathrm{Sep}}_\Omega \|$, shown to steadily rise as $\alpha$ increases, which is expected as $\sigma_\alpha$ becomes freely entangled for $4 < \alpha \leq 5$.

Figure 5. Bound entangled state classification. Entangled learners $\rho^{\mathrm{Ent}}_\Omega$ (blue) are used to confirm the learnability of the target bound entangled state via NNS. Separable learners $\rho^{\mathrm{Sep}}_\Omega$ (red) are then used to classify the target state, and approximate an upper bound on the trace distance $E_C(\sigma_\alpha)$ from the CSS, $\sigma^\star_\alpha$. Here we illustrate density matrices of the approximate CSS, and the target state for $\alpha = 3.$[...]

C. Detection and Measurement of Multipartite Entanglement

The versatility of the $\mathcal{K}$-separable state design means that we can explore entanglement classification and quantification methods that are otherwise very difficult. In particular, we may construct an NNS protocol that is able to witness W/GHZ-state entanglement, and measure W/GHZ-type correlations in both pure and mixed quantum states. Consider the three-qubit W and GHZ states respectively [38, 39],

$$|W\rangle = \frac{1}{\sqrt{3}}\left(|001\rangle + |010\rangle + |100\rangle\right), \qquad |\text{GHZ}\rangle = \frac{1}{\sqrt{2}}\left(|000\rangle + |111\rangle\right).$$

These are both maximally entangled three-party states. However, they possess two inequivalent forms of tripartite entanglement, such that $|W\rangle$ cannot be transformed into $|\text{GHZ}\rangle$ by means of LOCC (local operations and classical communication) strategies. The key difference between these forms of entanglement is their robustness: when a party is removed from a GHZ state the remaining state is separable, whilst a W-state remains entangled. Therefore a W-state possesses strict bipartite entanglement between all three parties, whereas GHZ entanglement can be achieved via "relayed entanglement" [40].

To classify between these states, we must define a partition set that is capable of capturing GHZ correlations, but only incompletely captures W-type correlations. The non-disjoint separability set

$$\mathcal{K}_W = \{1,2 \,|\, 2,3 \,|\, 1,3\}, \tag{60}$$

is capable of learning both W and GHZ entangled states, as it strictly specifies bipartite entanglement between all parties. However, one can construct the partition set

$$\mathcal{K}_{\text{GHZ}} = \{i,j \,|\, i,k\}, \quad i \neq j \neq k \in \{1,2,3\}, \tag{61}$$

which is any possible permutation of two subsets of $\mathcal{K}_W$. Programming an NNS according to $\mathcal{K}_{\text{GHZ}}$ does not allow the network to capture direct correlations between qubits $j$ and $k$, and will therefore provide an insufficient ansatz to reconstruct W-states. This forms a witness for W-type entanglement: if a target state is learnable via an NNS endowed with $\mathcal{K}_W$-separability, but is not learnable via $\mathcal{K}_{\text{GHZ}}$-separability, then the state is verified as possessing W-type entanglement. Furthermore, by constructing entanglement measures $E^{\mathcal{K}_{\text{GHZ}}}_\Omega$ we are able to measure the amount of W-type correlations within a target state.

[Figure 6: (a) Classification of $|W\rangle$: $E_G(|W\rangle)$ against iterations for $|\Psi^{S_{\text{GHZ}}}_\Omega\rangle$ and $|\Psi^{S_W}_\Omega\rangle$. (b) $\sigma^p_W$ / $\sigma^p_{\text{GHZ}}$ for $p = [\ldots]$: $E_C(\sigma^p_{W/\text{GHZ}})$ against iterations for $\|\sigma^p_W - \rho^{S_W}_\Omega\|$, $\|\sigma^p_W - \rho^{S_{\text{GHZ}}}_\Omega\|$, $\|\sigma^p_{\text{GHZ}} - \rho^{S_{\text{GHZ}}}_\Omega\|$ and $\|\sigma^p_{\text{GHZ}} - \rho^{S_W}_\Omega\|$. (c) REE for $\mathcal{E}_D(\sigma^p_W)$: $E_R(\sigma^p_W)$ against $p$ for $E^W_R$, $E^{\text{Gen}}_R$ and $E_R$.]

Figure 6. Classification and quantification of $d = 2$ W/GHZ type entanglement using NNS. Panel (a) shows the classification of W-type entanglement using two NNS designed according to the partition sets $\mathcal{K}_{\text{GHZ}} = \{1,2 \,|\, 2,3\}$ and $\mathcal{K}_W = \{1,2 \,|\, 2,3 \,|\, 1,3\}$. If a variational state endowed with $\mathcal{K}_W$-separability can optimally reconstruct a target that $\mathcal{K}_{\text{GHZ}}$ cannot, then it must possess W-type entanglement. In turn, we locate the closest GHZ-entangled state to $|W\rangle$. In Panel (b) this is extended to mixed, depolarised W/GHZ-states for $p = [\ldots]$. Panel (c) depicts different versions of the REE upper bounds on a depolarised W-state $\sigma^p_W$ with respect to depolarising probability. Here we plot three types of REE: the fully separable REE $E_R$ (red), the genuine tripartite REE $E^{\text{Gen}}_R$ (green) and the strictly W-type entanglement REE $E^W_R$ (blue).

Figure 6(a) shows the pure-state classification of a three-qubit W-state, where the non-disjoint network architectures perform classification easily.
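The robustness difference between W and GHZ entanglement described above can be checked directly, independently of any neural network ansatz. The following is a minimal NumPy sketch (the helper names `ket`, `reduce_to_first_two` and `negativity` are illustrative, not from the paper): it traces out the third qubit of each state and evaluates the negativity of the remaining two-qubit state. For two qubits a PPT state is separable, so a vanishing negativity certifies separability of the reduced GHZ state.

```python
import numpy as np

def ket(bits):
    """Computational-basis ket for a bit string such as '001'."""
    v = np.zeros(2 ** len(bits))
    v[int(bits, 2)] = 1.0
    return v

def reduce_to_first_two(psi):
    """Trace out the third qubit of a three-qubit pure state."""
    m = psi.reshape(4, 2)            # rows: qubits 1 and 2, columns: qubit 3
    return m @ m.conj().T

def negativity(rho):
    """Sum of |negative eigenvalues| of the partial transpose of a two-qubit state."""
    r = rho.reshape(2, 2, 2, 2)      # indices (a, b, a', b')
    pt = r.transpose(0, 3, 2, 1).reshape(4, 4)   # swap b and b': transpose qubit 2
    eig = np.linalg.eigvalsh(pt)
    return float(-eig[eig < 0].sum())

w = (ket('001') + ket('010') + ket('100')) / np.sqrt(3)
ghz = (ket('000') + ket('111')) / np.sqrt(2)

neg_w = negativity(reduce_to_first_two(w))      # strictly positive: still entangled
neg_ghz = negativity(reduce_to_first_two(ghz))  # zero up to rounding: separable

print(f"reduced W negativity:   {neg_w:.4f}")
print(f"reduced GHZ negativity: {neg_ghz:.4f}")
```

The reduced W-state keeps a non-zero negativity, while the reduced GHZ state does not, which is the property the $\mathcal{K}_W$ / $\mathcal{K}_{\text{GHZ}}$ partition sets are designed to exploit.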
Note that these three-qubit partitions can be analogously embedded into larger, $n$-qudit systems in order to study more complex forms of entanglement.

Realistically, multipartite entangled resources for future quantum communication/computing protocols will be noisy and imperfect. Generating and distributing multipartite entanglement over noisy quantum channels is fundamental for many future quantum technologies, particularly for secure communications and quantum networks [41–48]. Therefore it is a more interesting challenge to consider the classification and quantification of tripartite entanglement subject to decoherence. For instance, one can consider versions of $|W\rangle$ / $|\text{GHZ}\rangle$ in which each qudit has been passed through a depolarising channel

$$\mathcal{E}_D(\rho) = (1-p)\rho + \frac{p}{d^n} I_d^{\otimes n}, \tag{62}$$

where $n$ denotes the number of qudits being acted on (in this case $n = 3$). We denote these noisy, three-qubit states as

$$\sigma^p_W = (1-p)|W\rangle\langle W| + \frac{p}{8} I_2^{\otimes 3}, \tag{63}$$
$$\sigma^p_{\text{GHZ}} = (1-p)|\text{GHZ}\rangle\langle\text{GHZ}| + \frac{p}{8} I_2^{\otimes 3}. \tag{64}$$

Using mixed NNS programmed with different separabilities, we may then easily distinguish between the entanglement properties of noisy W/GHZ-states subject to depolarising channels. Indeed, Fig. 6(b) shows that for $p = [\ldots]$ we can perform this classification. Given two learners $\rho^{\mathcal{K}_W}_\Omega$ and $\rho^{\mathcal{K}_{\text{GHZ}}}_\Omega$, it is clear that both are able to optimally reconstruct the noisy GHZ-state, whilst only $\rho^{\mathcal{K}_W}_\Omega$ is able to optimally reconstruct the noisy W-state, completing the classification.

This is taken a step further in Fig. 6(c), where different versions of the REE of $\sigma^p_W$ are monitored for various depolarising probabilities. This plot describes three forms of REE:

• The standard $E_R$ (red), defined on the space of all fully separable states (using the partition set $\mathcal{K}_{\text{FS}} = \{1|2|3\}$), which measures the amount of any entanglement present.

• The genuine tripartite entangled REE, $E^{\text{Gen}}_R$ (green), using the bi-separable partition sets $\mathcal{K}_{\text{BS}} = \{i,j \,|\, k\}$, $i \neq j \neq k \in \{1,2,3\}$, which measures the amount of genuine tripartite entanglement in the state (W or GHZ correlations).

• The W-REE, $E^W_R$ (blue), using the partition set $\mathcal{K}_{\text{GHZ}}$ in Eq. (61), which measures the amount of genuine, tripartite, strictly W-type entanglement within the state.

By employing more complex separable architectures, we may study how different forms of entanglement behave with respect to environmental properties, such as depolarisation. By measuring $E^{\text{Gen}}_R$ and $E^W_R$, for instance, we may monitor the decoherence of genuine tripartite entanglement specifically, rather than of any entanglement, as is done by $E_R$. Such characterisations could prove very useful in communication/networking scenarios, where genuine multipartite entanglement is critical to performance.

It is important to remind the reader that these are upper bounds. The standard REE upper bound is expected to be tight, as fully separable NNS architectures precisely capture full separability. However, $\mathcal{K}_{\text{BS}}$ and $\mathcal{K}_{\text{GHZ}}$ are degenerate, e.g. $\mathcal{K}_{\text{BS}} = \{i,j \,|\, k\}$ has 3 unique forms. Since mixed SNNS are restricted to consistent separabilities, there may be convex combinations of states of these separabilities that produce tighter bounds. It is unknown whether this is the case; nonetheless, $E^{\text{Gen}}_R$ and $E^W_R$ provide informative upper bounds on these unique entanglement measures.

D. Ultimate Limits for Channel Capacities

We may provide a more practical example for the use of SNNS in the realm of quantum communications, using them to approximate upper bounds on quantum channel capacities. Introduced in Ref. [34], the Pirandola-Laurenza-Ottaviani-Banchi (PLOB) bound is an ultimate upper bound on the two-way assisted quantum (and secret-key) capacity $\mathcal{C}(\mathcal{E})$ for a given quantum channel $\mathcal{E}$. Its derivation is based on the techniques of channel simulation and teleportation stretching, which have proven to be extremely versatile in a number of settings [42, 49–54].

An essential class of quantum channels is those which are teleportation covariant, meaning that they satisfy the condition

$$\mathcal{E}(U\rho U^\dagger) = V\mathcal{E}(\rho)V^\dagger, \tag{65}$$

for some pair of teleportation unitaries $\{U, V\}$. Let us define the Choi matrix of a $d$-dimensional channel $\mathcal{E}$ as the result of passing one mode of a maximally entangled state $\Phi^+$ through $\mathcal{E}$, and the other through an identity channel $\mathcal{I}$,

$$\rho_{\mathcal{E}} = \mathcal{I} \otimes \mathcal{E}[\Phi^+], \tag{66}$$

where the maximally entangled state may take the form $\Phi^+ = \frac{1}{d}\sum_{i,j=0}^{d-1}|ii\rangle\langle jj|$. For teleportation covariant channels, the ultimate channel capacity can then be upper bounded in a remarkably simple way [34],

$$\mathcal{C}(\mathcal{E}) \leq E^n_R(\rho_{\mathcal{E}}) \leq E_R(\rho_{\mathcal{E}}), \tag{67}$$

where $E_R$ is the standard relative entropy of entanglement (and $E^n_R$ its $n$-shot version). SNNS can be used to approximate upper bounds on these channel capacities, via constrained reconstruction of the Choi state of the desired quantum channel.

We consider two important, teleportation covariant, $d$-dimensional quantum channels in an effort to illustrate the effectiveness of our approach: the depolarising channel considered in Eq. (62), and the Holevo-Werner channel [55–57]. The Choi states of these channels are the classes of isotropic states and Werner states respectively, whose REE bounds are known analytically. Therefore, we can compare the numerical performance of computing the REE via SNNS with the known, exact bounds.

[Figure 7: (a) Depolarising Channel: $\mathcal{C}(\mathcal{E}_D)$ against $p$ for $d = 2, 3, 4$. (b) Holevo-Werner Channel: $\mathcal{C}(\mathcal{E}_{\text{HW}})$ against $\eta$ for $d = 3$ with $n = 1$ and $n = 2$; inset shows $E_R$ against iterations, SNNS versus exact.]

Figure 7. PLOB channel capacity upper bounds computed via separable neural network states. Continuous plots are exact, while the scatter plots are SNNS data. Panel (a) displays the communication capacities for $d = 2, 3, 4$ across depolarising probabilities $p$, using mixed, qudit SNNS ansatzes. Panel (b) depicts the capacity for Holevo-Werner (HW) qutrit channels. The network states approximate the REE to a typical accuracy of $\epsilon < 10^{-[\ldots]}$, hence reproducing the capacities to a very high degree of precision.

Fig. 7(a) reports REE bounds on the capacity of depolarising channels for dimensions $d = 2, 3, 4$. Approximating these bounds via separable network states requires the targeted reconstruction of the isotropic state,

$$\rho_{\mathcal{E}_D} = (1-p)\Phi^+ + \frac{p}{d^2} I_d^{\otimes 2}. \tag{68}$$

Using a bipartite SNNS $\rho^{\text{Sep}}_\Omega$ and attempting to learn the target Choi state leads to an approximation of the REE of said state. Performing this optimisation for many depolarising probabilities $p$ produces the results in Fig. 7(a). This can be achieved to a very high degree of accuracy, reproducing the analytical bounds with an average error $\sim \epsilon < 10^{-[\ldots]}$. Furthermore, these bounds can be computed very efficiently by performing each optimisation sequentially, initialising the network parameters using the results of previous optimisations [58].

In Fig. 7(b) we give REE upper bounds for the HW channel, which takes the form

$$\mathcal{E}^{\eta,d}_{\text{HW}}(\rho) = \frac{(d-\eta) I_d + (d\eta - 1)\rho^T}{d^2 - 1}, \tag{69}$$

where the superscript $T$ denotes transposition. The Choi state of the HW channel is the $d$-dimensional Werner state, introduced in Eq. (56). The single-shot REE bounds for the HW channel are analytically known and given in Eq. (57), and are independent of dimension $d$. Again, this single-shot bound is approximated to a good precision, as shown in the results.

For Werner states of dimension $d > 2$, the REE is known to be strictly sub-additive when $\eta < -2/d$, and previous studies have explored the two-shot REE for these Choi states [56], which can therefore be used to tighten these upper bounds. For instance, in Fig. 7(b) the two-shot capacity can be seen to significantly tighten the bounds for $d = 3$. In order to compute these tighter bounds, one must modify the definition of the $n$-shot quantities slightly: the minimisation is now performed with respect to the space of all locally bi-separable states. Consider the $n$-copy Werner state, and let us label each copy with the indices of its modes $\{i, j\}$,

$$\varrho^{\otimes n}_{\eta,d} = \varrho^{\{1,2\}}_{\eta,d} \otimes \varrho^{\{3,4\}}_{\eta,d} \otimes \ldots \otimes \varrho^{\{2n-1,2n\}}_{\eta,d}. \tag{70}$$

The goal is now to find the CSS that possesses the following bi-separability,

$$\sigma^n = \sigma^{\{1,3,5,\ldots,2n-1\}}_a \otimes \sigma^{\{2,4,6,\ldots,2n\}}_b, \tag{71}$$

where we have permuted the labels into a bi-separable decomposition such that each state belongs to exclusively even or odd mode labels. This corresponds to a situation where two users each possess $n$ local modes, and their goal is to produce the closest state to $\varrho^{\otimes n}_{\eta,d}$ that is bi-separable between them. In general this is a very difficult task and, while beyond the scope of this paper, it poses an interesting future application for SNNS.

V. CONCLUSIONS AND OUTLOOK

We have generalised the concept of NNS with programmable separability to mixed, $d$-dimensional quantum states. We discussed a number of neural network architectures for the description of quantum states, and detailed how their entanglement properties may be controlled via constraints placed on network connectivity. It was shown that network connectivity controls entanglement structure on a very specific level, requiring distinctions between certain forms of entanglement. Outlining one of many possible optimisation protocols, we developed methods of classification and quantification via SNNS and applied them in a number of important settings. We then studied a practical application of these tools in the bounding of ultimate quantum channel capacities, showing that they can reproduce the PLOB bounds for DV channels with high precision.

There are a number of valuable future directions in which SNNS may be explored and expanded. While an optimisation scheme based on the vectorised fidelity is effective for a variety of applications (as shown in this work), more sophisticated optimisation protocols could enhance performance for more specific entanglement measures. In particular, a gradient descent method that directly minimises the relative entropy (or some variant thereof) would provide a more effective computation of the REE for complex states. This would also lend itself well to the study of $n$-shot REE quantities with applications in quantum channel capacities, and to the characterisation of more complex bound entangled states (such as those constructed from un-extendible product bases). Combining these tools with those from practical quantum tomography could also be extremely useful, e.g. where SNNS may be used to certify the effectiveness of an entanglement distribution protocol.

ACKNOWLEDGMENTS

C.H. acknowledges funding from the EPSRC via a Doctoral Training Partnership (EP/R513386/1). M.P. acknowledges the H2020-FETOPEN-2018-2020 project TEQ (grant nr. 766900), the DfE-SFI Investigator Programme (grant 15/IA/2864), COST Action CA15220, the Royal Society Wolfson Research Fellowship (RSWF\R3\

[1] O. Gühne and G. Tóth, Physics Reports , 1 (2009).
[2] R. Horodecki, P. Horodecki, M. Horodecki, and K. Horodecki, Rev. Mod. Phys. , 865 (2009).
[3] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information: 10th Anniversary Edition, 10th ed. (Cambridge University Press, USA, 2011).
[4] V. Vedral, M. B. Plenio, M. A. Rippin, and P. L. Knight, Phys. Rev. Lett. , 2275 (1997).
[5] V. Vedral and M. B. Plenio, Phys. Rev. A , 1619 (1998).
[6] M. B. Plenio and S. Virmani, Quant. Inf. Comput. , 1 (2007).
[7] A. Peres, Phys. Rev. Lett. , 1413 (1996).
[8] P. Horodecki, Physics Letters A , 333 (1997).
[9] G. Carleo, I. Cirac, K. Cranmer, L. Daudet, M. Schuld, N. Tishby, L. Vogt-Maranto, and L. Zdeborová, Rev. Mod. Phys. , 045002 (2019).
[10] K. Bharti, T. Haug, V. Vedral, and L.-C. Kwek, AVS Quantum Science , 034101 (2020).
[11] J. Carrasquilla, Advances in Physics: X , 1797528 (2020).
[12] G. Carleo and M. Troyer, Science , 602 (2017).
[13] R. G. Melko, G. Carleo, J. Carrasquilla, and J. I. Cirac, Nature Physics , 887 (2019).
[14] G. Torlai, G. Mazzola, J. Carrasquilla, M. Troyer, R. Melko, and G. Carleo, Nature Physics , 447 (2018).
[15] G. Torlai and R. G. Melko, Phys. Rev. Lett. , 240503 (2018).
[16] E. S. Tiunov, V. V. T. (Vyborova), A. E. Ulanov, A. I. Lvovsky, and A. K. Fedorov, Optica , 448 (2020).
[17] I. J. S. De Vlugt, D. Iouchtchenko, E. Merali, P.-N. Roy, and R. G. Melko, Phys. Rev. B , 035108 (2020).
[18] D. Yuan, H. Wang, Z. Wang, and D.-L. Deng, arXiv:2009.00019 (2020).
[19] M. J. Hartmann and G. Carleo, Phys. Rev. Lett. , 250502 (2019).
[20] A. Nagy and V. Savona, Phys. Rev. Lett. , 250501 (2019).
[21] F. Vicentini, A. Biella, N. Regnault, and C. Ciuti, Phys. Rev. Lett. , 250503 (2019).
[22] N. Yoshioka and R. Hamazaki, Phys. Rev. B , 214306 (2019).
[23] B. Jónsson, B. Bauer, and G. Carleo, arXiv:1808.05232 (2018).
[24] M. Medvidovic and G. Carleo, arXiv:2009.01760 (2020).
[25] G. Torlai, G. Mazzola, G. Carleo, and A. Mezzacapo, Phys. Rev. Research , 022060 (2020).
[26] D.-L. Deng, X. Li, and S. Das Sarma, Phys. Rev. X , 021021 (2017).
[27] C. Harney, S. Pirandola, A. Ferraro, and M. Paternostro, New Journal of Physics , 045001 (2020).
[28] V. Vedral, Rev. Mod. Phys. , 197 (2002).
[29] The mixing hidden biases $c_p$ are set as real, while the network weights $U_{kp}$ are complex. This is just a simplification, since imaginary components of $c_p$ are negated in the network output functions. If $c_p$ were complex, it would be clear in Eq. (9) that only the real components would be utilised, since $\phi_p(\boldsymbol{\alpha}, \boldsymbol{\beta}) = c_p + c^*_p + \sum_k U_{kp}\alpha_k + U^*_{kp}\beta_k = 2\text{Re}(c_p) + \sum_k U_{kp}\alpha_k + U^*_{kp}\beta_k$.
[30] We are also free to combine visible layer biases $\boldsymbol{a}$ into the parameter set $\Pi$, since they are inherently local and thus invoke separable correlations.
[31] V. Vedral, M. B. Plenio, M. A. Rippin, and P. L. Knight, Phys. Rev. Lett. , 2275 (1997).
[32] Note that in Eq. (45) the union runs over all $|\mathcal{K}'| > |\mathcal{K}|$. This is necessarily a strict inequality. Suppose $|\mathcal{K}| = k$ such that $\mathcal{K}$ describes a form of $k$-separability. The set of all $\mathcal{K}$-separable states thus inherits all states which are $(l < k)$-separable, but there are other forms of $|\mathcal{K}'| = k$ separability which it will not inherit. This is true for disjoint and non-disjoint constructions; however, in some non-disjoint cases these sets coincide.
[33] T.-C. Wei and P. M. Goldbart, Phys. Rev. A , 042307 (2003).
[34] S. Pirandola, R. Laurenza, C. Ottaviani, and L. Banchi, Nature Communications , 15043 (2017).
[35] S.-Y. Hou, C. Cao, D. L. Zhou, and B. Zeng, Quantum Science and Technology , 045019 (2020).
[36] K. G. H. Vollbrecht and R. F. Werner, Phys. Rev. A , 062307 (2001).
[37] P. Horodecki, M. Horodecki, and R. Horodecki, Phys. Rev. Lett. , 1056 (1999).
[38] W. Dür, G. Vidal, and J. I. Cirac, Phys. Rev. A , 062314 (2000).
[39] D. M. Greenberger, M. A. Horne, A. Shimony, and A. Zeilinger, American Journal of Physics , 1131 (1990).
[40] We refer to relayed entanglement as that which is caused indirectly through mutually entangled parties. For example, if $\mathcal{K} = \{1,2 \,|\, 2,3\}$, entanglement is indirect between parties 1 and 3: it is relayed.
[41] S. Pirandola et al., Advances in Optics and Photonics , 1012 (2020).
[42] S. Pirandola, Communications Physics , 51 (2019).
[43] S. Pirandola, Quantum Science and Technology , 045006 (2019).
[44] S. Pirandola, IET Quantum Communication , 22 (2020).
[45] F. Grasselli, H. Kampermann, and D. Bruß, New Journal of Physics , 123002 (2019).
[46] G. Murta, F. Grasselli, H. Kampermann, and D. Bruß, Advanced Quantum Technologies , 2000025 (2020).
[47] V. Lipinska, G. Murta, and S. Wehner, Phys. Rev. A , 052320 (2018).
[48] A. Unnikrishnan, I. J. MacFarlane, R. Yi, E. Diamanti, D. Markham, and I. Kerenidis, Phys. Rev. Lett. , 240501 (2019).
[49] S. Pirandola, S. L. Braunstein, R. Laurenza, C. Ottaviani, T. P. W. Cope, G. Spedalieri, and L. Banchi, Quantum Science and Technology , 035009 (2018).
[50] S. Pirandola and C. Lupo, Phys. Rev. Lett. , 100502 (2017).
[51] R. Laurenza, C. Lupo, S. Lloyd, and S. Pirandola, Phys. Rev. Research , 023023 (2020).
[52] R. Laurenza, C. Lupo, G. Spedalieri, S. L. Braunstein, and S. Pirandola, Quantum Measurements and Quantum Metrology , 1 (2018).
[53] S. Pirandola, R. Laurenza, C. Lupo, and J. L. Pereira, npj Quantum Information , 50 (2019).
[54] L. Banchi, J. Pereira, S. Lloyd, and S. Pirandola, npj Quantum Information , 42 (2020).
[55] R. F. Werner and A. S. Holevo, Journal of Mathematical Physics , 4353 (2002), https://doi.org/10.1063/1.1498491.
[56] T. P. W. Cope, K. Goodenough, and S. Pirandola, Journal of Physics A: Mathematical and Theoretical , 494001 (2018).
[57] T. P. W. Cope and S. Pirandola, Quantum Measurements and Quantum Metrology , 44 (2017).
[58] A scenario in which efficiency can be greatly enhanced is the study of evolving states. Consider the results from Figs. 5-7. In a number of instances, we are classifying/quantifying the entanglement of a target state which is changing incrementally (and by a small amount) throughout an interval. Consider an NNS $\rho_\Omega$ that learns a state $\sigma$. It is logical to assume that if the target state is evolved by some small amount, $\sigma' = \sigma + \delta\sigma$, the network $\Omega$ will only need to be optimised by a small amount, $\Omega' = \Omega + \delta\Omega$. Therefore, when studying evolving target states, it is extremely useful to initialise each state using the parameter distribution of the previous learner. This not only simplifies learning and improves performance, but also increases efficiency dramatically: the initial target can be reconstructed over a number of optimisation steps $S$, but subsequent alterations to the network only require a fraction of $S$ steps.

Appendix A: Learning with Complex-Exponential Ansatz for Mixed States

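As discussed in Section I B, one can make use of a restructuring of the mixed state ansatz into complex exponential form in order to take better control of the learning procedure. Indeed, the total mixed state can be expressed as

$$\rho^{\boldsymbol{\alpha},\boldsymbol{\beta}}_{\Omega,\Pi,\Xi} = e^{i\log\left(\Phi_\Xi(\boldsymbol{\alpha},\boldsymbol{\beta})\,\vartheta_\Omega(\boldsymbol{\alpha},\boldsymbol{\beta})\right)}\,\Gamma_\Pi(\boldsymbol{\alpha},\boldsymbol{\beta})\,r_\Omega(\boldsymbol{\alpha},\boldsymbol{\beta}), \tag{A1}$$

such that the state is constructed from three variational parameter sets, where $r_\Omega$ and $\Gamma_\Pi$ assume responsibility for the magnitude of any element of the density-matrix, while the functions $\Phi_\Xi$ and $\vartheta_\Omega$ are responsible for the complex phase of such elements. Consider a target state $\chi$ which also admits the following decomposition,

$$\chi^{\boldsymbol{\alpha},\boldsymbol{\beta}} = \lambda(\boldsymbol{\alpha},\boldsymbol{\beta})\,e^{i\log\xi(\boldsymbol{\alpha},\boldsymbol{\beta})}. \tag{A2}$$

The pure density-matrix phase/amplitude functions, $\Phi_\Xi$ and $\Gamma_\Pi$ respectively, are parameterised by real valued parameter sets. Furthermore, they are decomposed with respect to their pure-state wavefunctions, as shown in Eq. (14). The logarithmic derivatives of the pair of pure-state phase functions take the form

$$\frac{\partial\log|\Phi_\Xi\rangle}{\partial\Xi_k} = \sum_{\boldsymbol{\alpha},\boldsymbol{\beta}}\left(\frac{\partial\log\varphi(\boldsymbol{\alpha})}{\partial\Xi_k} - \frac{\partial\log\varphi(\boldsymbol{\beta})}{\partial\Xi_k}\right), \tag{A3}$$

while the amplitude function derivatives are

$$\frac{\partial\log|\Gamma_\Pi\rangle}{\partial\Pi_k} = \sum_{\boldsymbol{\alpha},\boldsymbol{\beta}}\left(\frac{\partial\log\sigma(\boldsymbol{\alpha})}{\partial\Pi_k} + \frac{\partial\log\sigma(\boldsymbol{\beta})}{\partial\Pi_k}\right). \tag{A4}$$

Meanwhile, the mixing state phase/amplitude wavefunctions, $\vartheta_\Omega$ and $r_\Omega$ respectively, are based on complex parameters. In this case, it is expedient to take derivatives with respect to real and imaginary components, i.e. $\frac{\partial\log|r_\Omega\rangle}{\partial\text{Re}(\Omega_k)}$, $\frac{\partial\log|r_\Omega\rangle}{\partial\text{Im}(\Omega_k)}$, $\frac{\partial\log|\vartheta_\Omega\rangle}{\partial\text{Re}(\Omega_k)}$ and $\frac{\partial\log|\vartheta_\Omega\rangle}{\partial\text{Im}(\Omega_k)}$, which can be treated separately.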
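The amplitude/phase split of Eq. (A2) can be illustrated numerically on any density matrix. The sketch below is only an illustration under stated assumptions: the target `chi` is a randomly generated 4x4 density matrix (hypothetical, not a state from this work), `lam` plays the role of $\lambda(\boldsymbol{\alpha},\boldsymbol{\beta})$, and `log_xi` is the real phase angle standing in for $\log\xi(\boldsymbol{\alpha},\boldsymbol{\beta})$. It also checks that Hermiticity forces the amplitude to be symmetric and the phase to be antisymmetric under $\boldsymbol{\alpha} \leftrightarrow \boldsymbol{\beta}$, which mirrors the relative plus sign in Eq. (A4) and minus sign in Eq. (A3).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical target: a random 4x4 density matrix (illustrative only).
a = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
chi = a @ a.conj().T              # positive semi-definite by construction
chi = chi / np.trace(chi).real    # normalise to unit trace

lam = np.abs(chi)        # amplitude lambda(alpha, beta) >= 0
log_xi = np.angle(chi)   # real phase angle, standing in for log xi(alpha, beta)

# Rebuild the matrix elements as lambda * exp(i log xi), as in Eq. (A2).
rebuilt = lam * np.exp(1j * log_xi)

amplitude_symmetric = np.allclose(lam, lam.T)
phase_antisymmetric = np.allclose(log_xi, -log_xi.T)

print("reconstruction exact:", np.allclose(rebuilt, chi))
print("amplitude symmetric: ", amplitude_symmetric)
print("phase antisymmetric: ", phase_antisymmetric)
```

Keeping the two real-valued arrays separate is what allows the magnitude and phase parts of the ansatz to be updated by independent real gradients.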
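All these derivatives take real, compact and easily derived forms with respect to the neural network parameters, making gradient computations straightforward.

The learning procedure of minimising the negative logarithmic fidelity between a target vectorised density-matrix $|\chi\rangle$ and the mixed NNS is given by the usual update rule in Section III. Defining the quantity

$$\Delta(\boldsymbol{\alpha},\boldsymbol{\beta}) = \langle\rho_{\Omega,\Pi,\Xi}|\chi\rangle^{-1}\, e^{i\log\frac{\Phi_\Xi(\boldsymbol{\alpha},\boldsymbol{\beta})\,\vartheta_\Omega(\boldsymbol{\alpha},\boldsymbol{\beta})}{\xi(\boldsymbol{\alpha},\boldsymbol{\beta})}}, \tag{A5}$$

where $\langle\rho_{\Omega,\Pi,\Xi}|\chi\rangle$ is the vectorised overlap between the variational and target state, we can then make use of the following gradients,

$$\nabla_{\Gamma_{\Pi_k}}\mathcal{L} = \sum_{\boldsymbol{\alpha},\boldsymbol{\beta}}\left[\frac{r^2_\Omega(\boldsymbol{\alpha},\boldsymbol{\beta})\,\Gamma_\Pi(\boldsymbol{\alpha},\boldsymbol{\beta})}{|\rho_{\Omega,\Pi,\Xi}|^2} - \lambda(\boldsymbol{\alpha},\boldsymbol{\beta})\,r_\Omega(\boldsymbol{\alpha},\boldsymbol{\beta})\,\text{Re}\left[\Delta(\boldsymbol{\alpha},\boldsymbol{\beta})\right]\right]\cdot O_{\Pi_k}|\Gamma_\Pi\rangle, \tag{A6}$$

$$\nabla_{r_{\Omega_k}}\mathcal{L} = \sum_{\boldsymbol{\alpha},\boldsymbol{\beta}}\left[\frac{\Gamma^2_\Pi(\boldsymbol{\alpha},\boldsymbol{\beta})\,r_\Omega(\boldsymbol{\alpha},\boldsymbol{\beta})}{|\rho_{\Omega,\Pi,\Xi}|^2} - \lambda(\boldsymbol{\alpha},\boldsymbol{\beta})\,\Gamma_\Pi(\boldsymbol{\alpha},\boldsymbol{\beta})\,\text{Re}\left[\Delta(\boldsymbol{\alpha},\boldsymbol{\beta})\right]\right]\cdot O_{\Omega^r_k}|r_\Omega\rangle, \tag{A7}$$

$$\nabla_{\Phi_{\Xi_k}}\mathcal{L} = -\sum_{\boldsymbol{\alpha},\boldsymbol{\beta}}\left[\frac{r_\Omega(\boldsymbol{\alpha},\boldsymbol{\beta})\,\lambda(\boldsymbol{\alpha},\boldsymbol{\beta})\,\Gamma_\Pi(\boldsymbol{\alpha},\boldsymbol{\beta})}{\Phi_\Xi(\boldsymbol{\alpha},\boldsymbol{\beta})}\,\text{Im}\left[\Delta(\boldsymbol{\alpha},\boldsymbol{\beta})\right]\right]\cdot O_{\Xi_k}|\Phi_\Xi\rangle, \tag{A8}$$

$$\nabla_{\vartheta_{\Omega_k}}\mathcal{L} = -\sum_{\boldsymbol{\alpha},\boldsymbol{\beta}}\left[\frac{r_\Omega(\boldsymbol{\alpha},\boldsymbol{\beta})\,\lambda(\boldsymbol{\alpha},\boldsymbol{\beta})\,\Gamma_\Pi(\boldsymbol{\alpha},\boldsymbol{\beta})}{\vartheta_\Omega(\boldsymbol{\alpha},\boldsymbol{\beta})}\,\text{Im}\left[\Delta(\boldsymbol{\alpha},\boldsymbol{\beta})\right]\right]\cdot O_{\Omega^\vartheta_k}|\vartheta_\Omega\rangle. \tag{A9}$$

Here, $|\rho_{\Omega,\Pi,\Xi}|$ is the magnitude of the vectorised density-matrix. Furthermore, $O_{\Omega^r_k} = \text{diag}\left(\partial_{\Omega_k}\log|r_\Omega\rangle\right)$ and $O_{\Omega^\vartheta_k} = \text{diag}\left(\partial_{\Omega_k}\log|\vartheta_\Omega\rangle\right)$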