Full randomness from arbitrarily deterministic events

Abstract

Do completely unpredictable events exist in nature? Classical theory, being fully deterministic, completely excludes fundamental randomness. On the contrary, quantum theory allows for randomness within its axiomatic structure. Yet, the fact that a theory makes predictions only in probabilistic terms does not imply the existence of any form of randomness in nature. The question then remains whether one can certify randomness independently of the physical framework used. While standard Bell tests approach this question from this perspective, they require prior perfect randomness, which renders the approach circular. Recently, it has been shown that it is possible to certify full randomness using almost perfect random bits. Here, we prove that full randomness can indeed be certified using quantum non-locality under the minimal possible assumptions: the existence of a source of arbitrarily weak (but non-zero) randomness and the impossibility of instantaneous signalling. Thus we are left with a strict dichotomic choice: either our world is fully deterministic or there exist in nature events that are fully random. Apart from the foundational implications, our results represent a quantum protocol for full randomness amplification, an information task known to be impossible classically. Finally, they open a new path for device-independent protocols under minimal assumptions.



Rodrigo Gallego, Lluis Masanes, Gonzalo De La Torre, Chirag Dhara, Leandro Aolita, and Antonio Acín

1 ICFO-Institut de Ciències Fotòniques, Av. Carl Friedrich Gauss, 3, 08860 Castelldefels, Barcelona, Spain
2 ICREA-Institució Catalana de Recerca i Estudis Avançats, Lluís Companys 23, 08010 Barcelona, Spain


Understanding whether nature is deterministically pre-determined or whether there are intrinsically random processes is a fundamental question that has attracted the interest of multiple thinkers, ranging from philosophers and mathematicians to physicists and neuroscientists. Nowadays this question is also important from a practical perspective, as random bits constitute a valuable resource for applications such as cryptographic protocols, gambling, or the numerical simulation of physical and biological systems.

Classical physics is a deterministic theory. Perfect knowledge of the positions and velocities of a system of classical particles at a given time, as well as of their interactions, allows one to predict their future (and also past) behavior with total certainty [4]. Thus, any randomness observed in classical systems is not intrinsic to the theory but just a manifestation of our imperfect description of the system.

The advent of quantum physics put into question this deterministic viewpoint, as there exist experimental situations for which quantum theory gives predictions only in probabilistic terms, even if one has a perfect description of the preparation and interactions of the system. A possible solution to this classically counterintuitive fact was proposed in the early days of quantum physics: quantum mechanics had to be incomplete [5], and there should be a complete theory capable of providing deterministic predictions for all conceivable experiments. There would thus be no room for intrinsic randomness, and any apparent randomness would again be a consequence of our lack of control over hypothetical "hidden variables" not contemplated by the quantum formalism.

Bell's no-go theorem [1], however, implies that local hidden-variable theories are inconsistent with quantum mechanics. Therefore, none of these could ever render a deterministic completion to the quantum formalism.
More precisely, all hidden-variable theories compatible with a local causal structure predict that any correlations among space-like separated events satisfy a series of inequalities, known as Bell inequalities. Bell inequalities, in turn, are violated by some correlations among quantum particles. This form of correlations defines the phenomenon of quantum non-locality.

Now, it turns out that quantum non-locality does not necessarily imply the existence of fully unpredictable processes in nature. The reasons behind this are subtle. First of all, unpredictable processes could be certified only if the no-signalling principle holds. This principle states that no instantaneous communication is possible, which in turn imposes a local causal structure on events, as in Einstein's special relativity. In fact, Bohm's theory is both deterministic and able to reproduce all quantum predictions [6], but it is incompatible with no-signalling. Thus, we assume throughout the validity of the no-signalling principle. Yet, even within the no-signalling framework, it is still not possible to infer the existence of fully random processes from the mere observation of non-local correlations. This is because Bell tests require measurement settings chosen at random, but the actual randomness in such choices can never be certified. The extreme example is when the settings are determined in advance: then any Bell violation can easily be explained in terms of deterministic models.
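These statements can be checked by brute force on the five-party Mermin inequality (A1) used later in this paper (Appendix A). For a deterministic device the left-hand side of (A1) simply counts how many of its 16 parity constraints are violated. The sketch below is ours, not the authors' code: it enumerates all local deterministic strategies, evaluates the GHZ correlations, and shows how pre-announced settings let a deterministic device fake the maximal violation. It assumes the state $(|00000\rangle + |11111\rangle)/\sqrt{2}$ with setting 0 mapped to a Y measurement and setting 1 to an X measurement (the paper only says "X or Y"; this particular assignment is our assumption), with outcome 0 meaning eigenvalue +1.

```python
import math
from itertools import product

BITS5 = list(product((0, 1), repeat=5))
# Inputs of the five-party Mermin inequality (Appendix A):
# X0 (weight 1, or 11111) requires even output parity; X1 (weight 3) requires odd parity.
X0 = {x for x in BITS5 if sum(x) in (1, 5)}
X1 = {x for x in BITS5 if sum(x) == 3}


def mermin_I(a, x):
    """Coefficient I(a, x) of Eq. (A2): 1 when outputs a violate the parity rule for input x."""
    parity = sum(a) % 2
    if x in X0:
        return parity
    if x in X1:
        return parity ^ 1
    return 0


def local_min_violations():
    """Smallest left-hand side of (A1) over all local deterministic strategies."""
    best = None
    # Party i answers a_i = table_i[x_i]: 4 tables per party, 4**5 = 1024 strategies.
    for tables in product(product((0, 1), repeat=2), repeat=5):
        total = sum(
            mermin_I(tuple(t[xi] for t, xi in zip(tables, x)), x) for x in X0 | X1
        )
        best = total if best is None else min(best, total)
    return best


def overlap(a, x, s):
    """<phi|s>: phi is the outcome-a eigenvector of Y (x = 0) or X (x = 1); s in {0, 1}."""
    if s == 0:
        return 1 / math.sqrt(2)
    phase = (-1) ** a if x == 1 else -1j * (-1) ** a
    return phase / math.sqrt(2)


def ghz_prob(a, x):
    """P(a|x) for the five-qubit GHZ state under the assumed setting-to-observable map."""
    amp = 0j
    for s in ((0,) * 5, (1,) * 5):
        term = 1 / math.sqrt(2)
        for ai, xi, si in zip(a, x, s):
            term *= overlap(ai, xi, si)
        amp += term
    return abs(amp) ** 2


def quantum_mermin_value():
    """Left-hand side of (A1) for the GHZ correlations."""
    return sum(mermin_I(a, x) * ghz_prob(a, x) for x in X0 | X1 for a in BITS5)


def fixed_settings_answer(x):
    """If x is known in advance, a deterministic device just outputs the required parity."""
    return (0 if x in X0 else 1, 0, 0, 0, 0)


print(local_min_violations())             # 6: no local deterministic model does better
print(round(quantum_mermin_value(), 12))  # 0.0: GHZ correlations reach the algebraic minimum
print(sum(mermin_I(fixed_settings_answer(x), x) for x in X0 | X1))  # 0: faked with known settings
```

The gap between 6 and 0 is what a Bell test detects, but only if the settings were not known in advance to whoever prepared the devices.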
As a matter of fact, super-deterministic models, which postulate that all phenomena in the universe, including our own mental processes, are fully pre-programmed, are by definition impossible to rule out.

These considerations imply that the strongest result on the existence of randomness one can hope for using quantum non-locality is stated by the following possibility: given a source that produces an arbitrarily small but non-zero amount of randomness, can one still certify the existence of completely random processes? The main result of this work is to provide an affirmative answer to this question. Our results, then, imply that the existence of correlations such as those predicted by quantum physics forces us into a dichotomic choice: either we postulate super-deterministic models in which all events in nature are fully pre-determined, or we accept the existence of fully unpredictable events.

FIG. 1: Local causal structure and randomness amplification. A source S produces a sequence $x_1, x_2, \ldots, x_j, \ldots$ of imperfect random bits. The goal of randomness amplification is to produce a new source $S_f$ of perfect random bits, that is, to process the initial bits to get a final bit $k$ fully uncorrelated (free) from any potential cause of it. All space-time events outside the future light-cone of $k$ may have been in its past light-cone before and therefore constitute a potential cause of it. Any such event can be modeled by a measurement $z$, with an outcome $e$, on some physical system. This system may be under the control of an adversary Eve, interested in predicting the value of $k$.

Besides the philosophical and physics-foundational implications, our results provide a protocol for perfect randomness amplification using quantum non-locality. Randomness amplification is an information-theoretic task whose goal is to use an input source S of imperfectly random bits to produce perfect random bits that are arbitrarily uncorrelated from all the events that may have been a potential cause of them, i.e. arbitrarily free. In general, S produces a sequence of bits $x_1, x_2, \ldots, x_j, \ldots$, with $x_j = 0$ or $1$ for all $j$ (see Fig. 1). Each bit $j$ contains some randomness, in the sense that the probability $P(x_j|e)$ that it takes a given value $x_j$, conditioned on any pre-existing variable $e$, is such that

$$\epsilon \le P(x_j|e) \le 1 - \epsilon \qquad (1)$$

for all $j$ and $e$, where $0 < \epsilon \le 1/2$. The variable $e$ can correspond to any event that could be a possible cause of bit $x_j$. Therefore, $e$ represents events contained in the space-time region lying outside the future light-cone of $x_j$. Free random bits correspond to $\epsilon = 1/2$, while deterministic ones, i.e. those predictable with certainty by an observer with access to $e$, correspond to $\epsilon = 0$. More precisely, when $\epsilon = 0$ the bound (1) is trivial and no randomness can be certified. We refer to S as an $\epsilon$-source, and to any bit satisfying (1) as an $\epsilon$-free bit.
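Condition (1) only constrains each bit given its past, so an adversary can still steer whole sequences. A minimal sketch, under our own assumptions: the adversary's side information $e$ is taken to be the history of emitted bits, and `repeat_last_bit` is a hypothetical adaptive strategy (not from the paper). The code computes exact string probabilities of such an $\epsilon$-source and checks them against (1) and its $n$-bit consequence $\epsilon^n \le P(x_1, \ldots, x_n|e) \le (1-\epsilon)^n$, which Appendix C states as (C2).

```python
from fractions import Fraction
from itertools import product


def sv_string_probs(n, eps, strategy):
    """Exact probabilities of all n-bit strings emitted by an eps-source.

    strategy(history) is the adversary's preferred P(next bit = 0 | history);
    it is clamped to [eps, 1 - eps] so that every bit obeys Eq. (1).
    """
    probs = {}
    for bits in product((0, 1), repeat=n):
        p = Fraction(1)
        for j, b in enumerate(bits):
            p0 = min(max(strategy(bits[:j]), eps), 1 - eps)
            p *= p0 if b == 0 else 1 - p0
        probs[bits] = p
    return probs


def repeat_last_bit(history):
    """Hypothetical adversary: push each bit to copy the previous one (0 at the start)."""
    if not history or history[-1] == 0:
        return Fraction(1)
    return Fraction(0)


eps, n = Fraction(1, 10), 6
probs = sv_string_probs(n, eps, repeat_last_bit)
print(sum(probs.values()))                    # 1: a valid distribution
print(max(probs.values()) == (1 - eps) ** n)  # True: all-zero string saturates the upper bound
print(all(eps ** n <= p <= (1 - eps) ** n for p in probs.values()))  # True
```

Each bit on its own is never more predictable than $1-\epsilon$, yet the adversary concentrates probability on strings that copy their predecessor; this is the kind of correlated input the amplification protocol has to tolerate.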
The aim is then to generate, from arbitrarily many uses of S, a final source $S_f$ of $\epsilon_f$ arbitrarily close to 1/2. If this is possible, no cause $e$ can be assigned to the bits produced by $S_f$, which are then fully unpredictable. Note that efficiency issues, such as the rate of uses of S required per final bit generated by $S_f$, do not play any role in randomness amplification. The relevant figure of merit is just the quality, measured by $\epsilon_f$, of the final bits. Thus, without loss of generality, we restrict our analysis to the problem of generating a single final free random bit $k$.

Santha and Vazirani proved that randomness amplification is impossible using classical resources [3]. This is in a sense intuitive, in view of the absence of any intrinsic randomness in classical physics. In the quantum regime, randomness amplification has been recently studied by Colbeck and Renner [2]. There, S is used to choose the measurement settings of two distant observers, Alice and Bob, in a Bell test [7] involving two entangled quantum particles. The measurement outcome obtained by one of the observers, say Alice, in one of the experimental runs (also chosen with S) defines the output random bit. Colbeck and Renner showed how input bits with very high randomness, of $\ldots < \epsilon \le 0.5$, can be mapped into arbitrarily free random bits of $\epsilon_f \to 1/2$, and conjectured that randomness amplification should be possible for any initial randomness [2]. Our results also solve this conjecture, as we show that quantum non-locality can be exploited to attain full randomness amplification, i.e. that $\epsilon_f$ can be made arbitrarily close to 1/2 for any $0 < \epsilon \le 1/2$.

Before presenting the ingredients of our proof, it is worth commenting on previous works on randomness in connection with quantum non-locality. In [8] it was shown how to bound the intrinsic randomness generated in a Bell test.
These bounds can be used for device-independent randomness expansion, following a proposal by Colbeck [9], and to achieve a quadratic expansion of the amount of random bits [8] (see [10–13] for further works on device-independent randomness expansion). Note, however, that in randomness expansion one assumes from the very beginning the existence of an input seed of free random bits, and the main goal is to expand this into a larger sequence. The figure of merit there is the ratio between the lengths of the final and initial strings of free random bits. Finally, other recent works have analyzed how a lack of randomness in the measurement choices affects a Bell test [14–16] and the randomness generated in it [17].

Let us now sketch the realization of our final source $S_f$. We use the input $\epsilon$-source S to choose the measurement settings in a multipartite Bell test involving a number of observers that depends both on the input $\epsilon$ and the target $\epsilon_f$. After verifying that the expected Bell violation is obtained, the measurement outcomes are combined to define the final bit $k$. For pedagogical reasons, we adopt a cryptographic perspective and assume the worst-case scenario where all the devices we use may have been prepared by an adversary Eve equipped with arbitrary non-signalling resources, possibly even supra-quantum ones. In the preparation, Eve may have also had access to S and correlated the bits it produces with some physical system at her disposal, represented by a black box in Fig. 1. Without loss of generality, we can assume that Eve can reveal the value of $e$ at any stage of the protocol by measuring this system. Full randomness amplification is then equivalent to proving that Eve's correlations with $k$ can be made arbitrarily small.

FIG. 2: Protocol for full randomness amplification based on quantum non-locality. In the first two steps, all N quintuplets measure their devices, where the choice of measurement is done using the $\epsilon$-source S; the quintuplets whose settings happen not to take place in the five-party Mermin inequality are discarded (in red). In steps 3 and 4, the remaining quintuplets are grouped into blocks. One of the blocks is chosen as the distillation block, using again S, while the others are used to check the Bell violation. In the fifth step, the random bit $k$ is extracted from the distillation block.

Bell tests for which quantum correlations achieve the maximal non-signalling violation, also known as Greenberger-Horne-Zeilinger (GHZ) paradoxes [18], are necessary for randomness amplification. This is because, unless the maximal non-signalling violation is attained, for sufficiently small $\epsilon$ Eve may fake the observed correlations with classical deterministic resources. This attack ceases to be possible when the maximal non-signalling violation is observed, as Eve is forced to prepare only those non-local correlations attaining the maximal violation. GHZ paradoxes are, however, not sufficient. Consider for instance the GHZ paradox given by the tripartite Mermin Bell inequality [19]. One can see that Eve can predict with certainty any function of the measurement outcomes and still deliver the maximal violation, for all $0 \le \epsilon \le 1/2$ (see Appendix B).

For more parties, though, this no longer holds. In fact, consider any correlations attaining the maximal violation of the five-party Mermin inequality. Take the bit corresponding to the majority-vote function of the outcomes of any subset of three out of the five observers, say the first three. This function is equal to zero if at least two of the three bits are equal to zero, and equal to one otherwise. We show in Appendix B that Eve's predictability on this bit is at most 3/4. This is our first result:

Result 1.

Given an $\epsilon$-source with any $0 < \epsilon \le 1/2$, and quantum five-party non-local resources, an intermediate $\epsilon_i$-source of $\epsilon_i = 1/4$ can be obtained.

The partial unpredictability in the five-party Mermin Bell test is the building block of our protocol. To complete it, we must equip it with two essential components: (i) an estimation procedure that verifies that the untrusted devices do yield the required Bell violation; and (ii) a distillation procedure that, from sufficiently many $\epsilon_i$-bits generated in the five-party Bell experiment, distills a single final $\epsilon_f$-source of $\epsilon_f \to 1/2$. To these ends, we consider a more complex Bell test involving N groups of five observers (quintuplets) each, as depicted in Fig. 2. The steps in the protocol are described in Box 1.

Box 1: Protocol for Randomness Amplification

1. Every observer measures his device in one of two settings chosen at random by the input $\epsilon$-source S.
2. Every quintuplet whose settings combination does not appear in the five-party Mermin Bell test is discarded. If the quintuplets left are fewer than $N/\ldots$, abort.
3. Group the quintuplets left into $N_b$ blocks of equal size $N_d$. Choose a distillation block at random with S.
4. If the outcomes of any quintuplet not in the distillation block are inconsistent with the maximal violation of the five-party Mermin Bell test, abort.
5. Distill the final bit from the distillation block. This is done in the following way. The majority vote $\mathrm{maj}(\mathbf{a})$ among, for instance, the outcomes $a_1$, $a_2$ and $a_3$ of the first three users is computed for each quintuplet. Then, a function $f$ maps the resulting $N_d$ bits into the final bit $k$.

In the appendices we prove, using techniques from [20], that if the protocol is not aborted, the final bit produced by the protocol is indistinguishable from an ideal random bit uncorrelated to the eavesdropper. Thus, the output free random bits satisfy universally-composable security [21], the highest standard of cryptographic security, and could be used as a seed for randomness expansion or any other protocol.

Finally, we must show that quantum resources can indeed successfully implement our protocol. It is immediate to see that the qubit measurements X or Y on the quantum state $|\Psi\rangle = \frac{1}{\sqrt{2}}(|00000\rangle + |11111\rangle)$, with $|0\rangle$ and $|1\rangle$ the eigenstates of the Z qubit basis, yield correlations that maximally violate the five-partite Mermin inequality in question. This completes our main result.

Result 2 (Main Result).

Given an $\epsilon$-source with any $0 < \epsilon \le 1/2$, a perfect free random bit $k$ can be obtained using quantum non-local correlations.

In summary, we have presented a protocol that, using quantum non-local resources, attains full randomness amplification. This task is impossible classically and was not known to be possible in the quantum regime. As our goal was to prove full randomness amplification, our analysis focuses on the noise-free case. In fact, the noisy case only makes sense if one does not aim at perfect random bits and bounds the amount of randomness in the final bit. Then, it should be possible to adapt our protocol in order to get a bound on the noise it tolerates. Other open questions that naturally follow from our results consist of studying randomness amplification against quantum eavesdroppers, or the search for protocols in the bipartite scenario.

From a more fundamental perspective, our results imply that there exist experiments whose outcomes are fully unpredictable. The only two assumptions for this conclusion are the existence of events with an arbitrarily small but non-zero amount of randomness and the validity of the no-signalling principle. Dropping the former implies accepting a super-deterministic view where no randomness exists, so that we experience a fully pre-determined reality. This possibility is uninteresting from a scientific perspective, and even uncomfortable from a philosophical one. Dropping the latter, in turn, implies abandoning a local causal structure for events in space-time. However, this is one of the most fundamental notions of special relativity, without which even the very meaning of randomness or predictability would be unclear, as these concepts implicitly rely on the cause-effect principle.

Acknowledgements

We acknowledge support from the ERC Starting Grant PERCENT, the EU Projects Q-Essence and QCS, the Spanish MICIIN through a Juan de la Cierva grant and projects FIS2010-14830, Explora-Intrinqra and CHIST-ERA DIQIP, an FI Grant of the Generalitat de Catalunya, Catalunya-Caixa, and Fundació Privada Cellex, Barcelona.

[1] J. S. Bell, Physics, 195 (1964); Speakable and Unspeakable in Quantum Mechanics, Cambridge University Press (Cambridge, 1987).
[2] R. Colbeck and R. Renner, Free randomness can be amplified, Nature Phys., 450 (2012).
[3] M. Santha and U. V. Vazirani, in Proc. 25th IEEE Symposium on Foundations of Computer Science (FOCS-84), 434 (IEEE Computer Society, 1984).
[4] P. S. Laplace, A Philosophical Essay on Probabilities, Paris (1840).
[5] A. Einstein, B. Podolsky and N. Rosen, Phys. Rev., 777-780 (1935).
[6] D. Bohm, Phys. Rev., 166-179 (1952); Phys. Rev., 180-193 (1952).
[7] S. L. Braunstein and C. M. Caves, Wringing out better Bell inequalities, Ann. Phys., 22 (1990).
[8] S. Pironio et al., Random numbers certified by Bell's theorem, Nature, 1021 (2010).
[9] R. Colbeck, Quantum and Relativistic Protocols for Secure Multi-Party Computation, PhD dissertation, Univ. Cambridge (2007).
[10] A. Acín, S. Massar and S. Pironio, Phys. Rev. Lett., 100402 (2012).
[11] S. Pironio and S. Massar, arXiv:1111.6056.
[12] S. Fehr, R. Gelles and C. Schaffner, arXiv:1111.6052.
[13] U. V. Vazirani and T. Vidick, Proceedings of the ACM Symposium on the Theory of Computing (2012).
[14] J. Kofler, T. Paterek, and C. Brukner, Experimenter's freedom in Bell's theorem and quantum cryptography, Phys. Rev. A, 022104 (2006).
[15] J. Barrett and N. Gisin, How much measurement independence is needed to demonstrate nonlocality?, Phys. Rev. Lett., 100406 (2011).
[16] M. J. W. Hall, Local deterministic model of singlet state correlations based on relaxing measurement independence, Phys. Rev. Lett., 250404 (2010).
[17] D. E. Koh, M. J. W. Hall, Setiawan, J. E. Pope, C. Marletto, A. Kay, V. Scarani, and A. Ekert, The effects of reduced "free will" on Bell-based randomness expansion, arXiv:1202.3571.
[18] D. M. Greenberger, M. A. Horne, and A. Zeilinger, in Bell's Theorem, Quantum Theory, and Conceptions of the Universe (Kluwer, Dordrecht), p. 69 (1989).
[19] N. D. Mermin, Simple unified form for the major no-hidden-variables theorems, Phys. Rev. Lett., 3373 (1990).
[20] L. Masanes, Universally-composable privacy amplification from causality constraints, Phys. Rev. Lett., 140501 (2009).
[21] R. Canetti, Proc. 42nd IEEE Symposium on Foundations of Computer Science (FOCS), 136 (2001).

Appendix A: Mermin inequalities

The five-party Mermin inequality [3] plays a central role in our construction. In each run of this Bell test, measurements (inputs) $\mathbf{x} = (x_1, \ldots, x_5)$ on five distant black boxes generate five outcomes (outputs) $\mathbf{a} = (a_1, \ldots, a_5)$, distributed according to a non-signalling conditional probability distribution $P(\mathbf{a}|\mathbf{x})$. Both inputs and outputs are bits, as they can take two possible values, $x_i, a_i \in \{0, 1\}$ with $i = 1, \ldots, 5$. The inequality can be written as

$$\sum_{\mathbf{a}, \mathbf{x}} I(\mathbf{a}, \mathbf{x})\, P(\mathbf{a}|\mathbf{x}) \ge 6, \qquad (A1)$$

with coefficients

$$I(\mathbf{a}, \mathbf{x}) = (a_1 \oplus a_2 \oplus a_3 \oplus a_4 \oplus a_5)\, \delta_{\mathbf{x} \in \mathcal{X}_0} + (a_1 \oplus a_2 \oplus a_3 \oplus a_4 \oplus a_5 \oplus 1)\, \delta_{\mathbf{x} \in \mathcal{X}_1}, \qquad (A2)$$

where

$$\delta_{\mathbf{x} \in \mathcal{X}} = \begin{cases} 1 & \text{if } \mathbf{x} \in \mathcal{X} \\ 0 & \text{if } \mathbf{x} \notin \mathcal{X} \end{cases}$$

and

$\mathcal{X}_0 = \{(10000), (01000), (00100), (00010), (00001), (11111)\}$,
$\mathcal{X}_1 = \{(00111), (01011), (01101), (01110), (10011), (10101), (10110), (11001), (11010), (11100)\}$.

That is, only half of all possible combinations of inputs, namely those in $\mathcal{X} = \mathcal{X}_0 \cup \mathcal{X}_1$, appear in the Bell inequality. The maximal (non-signalling and algebraic) violation of the inequality corresponds to the situation in which the left-hand side of (A1) is zero. The key property of inequality (A1) is that its maximal violation can be attained by quantum correlations. In fact, Mermin inequalities are defined for an arbitrary number of parties, and quantum correlations attain the maximal non-signalling violation for any odd number of parties [4]. This violation is always attained by performing local measurements on a GHZ quantum state.

Appendix B: Partial unpredictability in the five-party Mermin inequality

Our interest in Mermin inequalities comes from the fact that, for an odd number of parties, they can be maximally violated by quantum correlations. These correlations, then, define a GHZ paradox, which, as explained in the main text, is necessary for full randomness amplification. As also mentioned in the main text, GHZ paradoxes are, however, not sufficient. In fact, it is always possible to find non-signalling correlations that (i) maximally violate the three-party Mermin inequality but (ii) assign a deterministic value to any function of the measurement outcomes. This observation can be checked for all unbiased functions mapping $\{0,1\}^3$ to $\{0,1\}$ (there are $\binom{8}{4}$ of those) through a linear program analogous to the one used to prove the next Theorem. For a larger number of parties, however, some functions cannot be deterministically fixed to a specific value while maximally violating a Mermin inequality, as implied by the following Theorem.

Theorem 1.

Let $P(\mathbf{a}|\mathbf{x})$ be a five-party non-signalling conditional probability distribution in which inputs $\mathbf{x} = (x_1, \ldots, x_5)$ and outputs $\mathbf{a} = (a_1, \ldots, a_5)$ are bits. Consider the bit $\mathrm{maj}(\mathbf{a}) \in \{0, 1\}$ defined by the majority-vote function of any subset consisting of three of the five measurement outcomes, say the first three, $a_1$, $a_2$ and $a_3$. Then, all non-signalling correlations attaining the maximal violation of the five-party Mermin inequality are such that the probability that $\mathrm{maj}(\mathbf{a})$ takes a given value, say 0, is bounded by

$$1/4 \le P(\mathrm{maj}(\mathbf{a}) = 0) \le 3/4. \qquad (B1)$$

Proof.

This result was obtained by solving a linear program. Therefore, the proof is numeric, but exact. Formally, let $P(\mathbf{a}|\mathbf{x})$ be a five-partite non-signalling probability distribution. For $\mathbf{x} \in \mathcal{X}$, we performed the maximization

$$P_{\max} = \max_P P(\mathrm{maj}(\mathbf{a}) = 0 \,|\, \mathbf{x}) \quad \text{subject to} \quad \sum_{\mathbf{a}, \mathbf{x}} I(\mathbf{a}, \mathbf{x})\, P(\mathbf{a}|\mathbf{x}) = 0, \qquad (B2)$$

which yields the value $P_{\max} = 3/4$. Since the same result holds for $P(\mathrm{maj}(\mathbf{a}) = 1|\mathbf{x})$, we get the bound $1/4 \le P(\mathrm{maj}(\mathbf{a}) = 0) \le 3/4$.

As a further remark, note that a lower bound to $P_{\max}$ can easily be obtained by noticing that one can construct conditional probability distributions $P(\mathbf{a}|\mathbf{x})$ that maximally violate the five-partite Mermin inequality (A1) and for which at most one of the output bits (say $a_1$) is deterministically fixed to either 0 or 1. If the other two output bits $(a_2, a_3)$ were completely random, the majority vote of the three of them, $\mathrm{maj}(a_1, a_2, a_3)$, could be guessed with a probability of 3/4. Our numerical results say that this turns out to be an optimal strategy.

Theorem 1 implies Result 1 in the main text. Moreover, it constitutes the simplest GHZ paradox in which some randomness can be certified. This paradox is the building block of our randomness amplification protocol, presented in the next section.

Appendix C: Protocol for full randomness amplification

In this section, we describe in more detail the protocol summarized in Box 1 of the main text. The protocol uses as resources the $\epsilon$-source S and N quantum systems. Recall that the bits produced by the source S are such that the probability $P(x_j|e)$ that bit $j$ takes a given value $x_j$, conditioned on any pre-existing variable $e$, is bounded by

$$\epsilon \le P(x_j|e) \le 1 - \epsilon, \qquad (C1)$$

for all $j$ and $e$, where $0 < \epsilon \le 1/2$. The bound, when applied to $n$-bit strings produced by the $\epsilon$-source, implies that

$$\epsilon^n \le P(x_1, \ldots, x_n|e) \le (1 - \epsilon)^n. \qquad (C2)$$

Each of the quantum systems is abstractly modeled by a black box with binary input $x$ and output $a$. The protocol classically processes the bits generated by S and by the quantum boxes. The result of the protocol is a classical symbol $k$, associated to an abort/no-abort decision. If the protocol is not aborted, $k$ encodes the final output bit, with possible values 0 or 1. When the protocol is aborted, instead, no numerical value is assigned to $k$ but the symbol $\varnothing$, representing the fact that the bit is empty. The formal steps of the protocol are:

1. S is used to generate N quintuple-bits $\mathbf{x}^1, \ldots, \mathbf{x}^N$, which constitute the inputs for the N boxes. The boxes then provide N output quintuple-bits $\mathbf{a}^1, \ldots, \mathbf{a}^N$.
2. The quintuplets such that $\mathbf{x} \notin \mathcal{X}$ are discarded. The protocol is aborted if the number of remaining quintuplets is less than $N/\ldots$.
3. The quintuplets left after step 2 are organized in $N_b$ blocks, each one having $N_d$ quintuplets. The number $N_b$ of blocks is chosen to be a power of 2. For the sake of simplicity, we relabel the index running over the remaining quintuplets, namely inputs $\mathbf{x}^1, \ldots, \mathbf{x}^{N_b N_d}$ and outputs $\mathbf{a}^1, \ldots, \mathbf{a}^{N_b N_d}$. The input and output of the $j$-th block are defined as $y_j = (\mathbf{x}^{(j-1)N_d+1}, \ldots, \mathbf{x}^{(j-1)N_d+N_d})$ and $b_j = (\mathbf{a}^{(j-1)N_d+1}, \ldots, \mathbf{a}^{(j-1)N_d+N_d})$ respectively, with $j \in \{1, \ldots, N_b\}$. The random variable $l \in \{1, \ldots, N_b\}$ is generated by using $\log_2 N_b$ further bits from S. The value of $l$ specifies which block $(b_l, y_l)$ is chosen to generate $k$, i.e. the distilling block. We define $(\tilde b, \tilde y) = (b_l, y_l)$. The other $N_b - 1$ blocks are used to check the Bell violation.
4. The function
$$r[b, y] = \begin{cases} 1 & \text{if } I(\mathbf{a}^1, \mathbf{x}^1) = \cdots = I(\mathbf{a}^{N_d}, \mathbf{x}^{N_d}) = 0 \\ 0 & \text{otherwise} \end{cases} \qquad (C3)$$
tells whether block $(b, y)$ features the right correlations ($r = 1$) or the wrong ones ($r = 0$), in the sense of being compatible with the maximal violation of inequality (A1). This function is computed for all blocks but the distilling one. The protocol is aborted unless all of them give the right correlations,
$$g = \prod_{j=1,\, j \ne l}^{N_b} r[b_j, y_j] = \begin{cases} 1 & \text{not abort} \\ 0 & \text{abort} \end{cases}. \qquad (C4)$$
Note that the abort/no-abort decision is independent of whether the distilling block $l$ is right or wrong.
5. If the protocol is not aborted, then $k$ is assigned a bit generated from $b_l = (\mathbf{a}^1, \ldots, \mathbf{a}^{N_d})$ as
$$k = f(\mathrm{maj}(\mathbf{a}^1), \ldots, \mathrm{maj}(\mathbf{a}^{N_d})). \qquad (C5)$$
Here $f: \{0,1\}^{N_d} \to \{0,1\}$ is a function characterized in Lemma 4 below, while $\mathrm{maj}(\mathbf{a}^i) \in \{0,1\}$ is the majority vote among the first three bits of the quintuple string $\mathbf{a}^i$. If the protocol is aborted, it sets $k = \varnothing$.

At the end of the protocol, $k$ is potentially correlated with the settings of the distilling block $\tilde y = y_l$, the bit $g$ in (C4), and the bits $t = [l, (b_1, y_1), \ldots, (b_{l-1}, y_{l-1}), (b_{l+1}, y_{l+1}), \ldots, (b_{N_b}, y_{N_b})]$. Additionally, an eavesdropper Eve might have a physical system correlated with $k$, which she may measure at any instance of the protocol. This system is not necessarily classical or quantum; the only assumption about it is that measuring it does not produce instantaneous signalling anywhere else. We label all possible measurements Eve can perform with the classical variable $z$, and with $e$ the corresponding outcome.
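The classical processing in steps 1 to 5 above can be sketched as follows. This is our illustration, not the authors' implementation, and several pieces are explicitly hypothetical: `min_fraction` stands in for the step-2 abort threshold (elided as $N/\ldots$ above), the extra check that at least $N_b N_d$ quintuplets survive is our addition, `parity_f` is a placeholder for the function $f$ of Lemma 4 (not reproduced here), and blocks are indexed from 0. The honest `ideal_box` outputs uniformly random strings whose parity obeys the Mermin rule; for inputs in $\mathcal{X}$ this matches the GHZ statistics (uniform over the 16 strings with the required parity).

```python
import random
from itertools import product

BITS5 = list(product((0, 1), repeat=5))
X0 = {x for x in BITS5 if sum(x) in (1, 5)}  # even output parity required
X1 = {x for x in BITS5 if sum(x) == 3}       # odd output parity required
X = X0 | X1


def mermin_I(a, x):
    """I(a, x) of Eq. (A2) for x in X: 1 iff the quintuplet contradicts the maximal violation."""
    parity = sum(a) % 2
    return parity if x in X0 else parity ^ 1


def maj(a):
    """Majority vote of the first three outputs of a quintuplet."""
    return 1 if a[0] + a[1] + a[2] >= 2 else 0


def ideal_box(x, rng):
    """Hypothetical honest box: uniform outputs whose parity satisfies the Mermin rule for x."""
    a = [rng.randint(0, 1) for _ in range(4)]
    target = 0 if x in X0 else 1 if x in X1 else rng.randint(0, 1)
    a.append((sum(a) + target) % 2)
    return tuple(a)


def run_protocol(source_bit, box, N, N_b, N_d, f, min_fraction):
    """Steps 1-5 of Appendix C. Returns k in {0, 1}, or None for the abort symbol."""
    # Step 1: the source S picks N quintuple inputs; the boxes answer.
    runs = []
    for _ in range(N):
        x = tuple(source_bit() for _ in range(5))
        runs.append((x, box(x)))
    # Step 2: discard inputs outside X; abort if too few remain.
    kept = [(x, a) for x, a in runs if x in X]
    if len(kept) < min_fraction * N or len(kept) < N_b * N_d:
        return None
    # Step 3: N_b blocks of N_d quintuplets; log2(N_b) further source bits pick block l.
    blocks = [kept[j * N_d:(j + 1) * N_d] for j in range(N_b)]
    l = 0
    for _ in range(N_b.bit_length() - 1):
        l = 2 * l + source_bit()
    # Step 4: every block except l must be consistent with the maximal violation.
    for j, block in enumerate(blocks):
        if j != l and any(mermin_I(a, x) for x, a in block):
            return None
    # Step 5: k = f(maj(a^1), ..., maj(a^N_d)) on the distilling block.
    return f([maj(a) for _, a in blocks[l]])


def parity_f(bits):
    """Hypothetical placeholder for the function f of Lemma 4."""
    return sum(bits) % 2


rng = random.Random(2012)
k = run_protocol(lambda: rng.randint(0, 1), lambda x: ideal_box(x, rng),
                 N=1000, N_b=8, N_d=20, f=parity_f, min_fraction=0.25)
cheat = run_protocol(lambda: rng.randint(0, 1), lambda x: (0, 0, 0, 0, 0),
                     N=1000, N_b=8, N_d=20, f=parity_f, min_fraction=0.25)
print(k, cheat)  # an output bit for the ideal boxes; None (abort) for the all-zero box
```

The all-zero box fails as soon as a weight-3 input lands in a checking block, which is why the estimation step catches deterministic devices that ignore the settings.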
In summary, after the performance of the protocol all the relevant information is $k, \tilde y, t, g, e, z$, with statistics described by an unknown conditional probability distribution $P(k, \tilde y, t, g, e|z)$.

To assess the security of our protocol for full randomness amplification, we have to show that the distribution describing the protocol when not aborted is indistinguishable from the distribution

$$P_{\mathrm{ideal}}(k, \tilde y, t, g, e|z, g = 1) = \tfrac{1}{2}\, P(\tilde y, t, e|z, g = 1)$$

describing an ideal free random bit. For later purposes, it is convenient to cover the case when the protocol is aborted with an equivalent notation: if the protocol is aborted, we define $P(k, \tilde y, t, e|z, g = 0) = \delta_{\varnothing k}\, P(\tilde y, t, e|z, g = 0)$ and $P_{\mathrm{ideal}}(k, \tilde y, t, e|z, g = 0) = \delta_{\varnothing k}\, P(\tilde y, t, e|z, g = 0)$, where $\delta_{k'k}$ is a Kronecker delta. In this case, it is immediate that $P = P_{\mathrm{ideal}}$, as the locally generated symbol $\varnothing$ is always uncorrelated to the environment. To quantify the indistinguishability between $P$ and $P_{\mathrm{ideal}}$, we consider the scenario in which an observer, having access to all the information $k, \tilde y, t, g, e, z$, has to correctly distinguish between these two distributions. We denote by $P(\mathrm{guess})$ the optimal probability of correctly guessing between the two distributions. This probability reads

$$P(\mathrm{guess}) = \frac{1}{2} + \frac{1}{4} \sum_{k, \tilde y, t, g} \max_z \sum_e \left| P(k, \tilde y, t, g, e|z) - P_{\mathrm{ideal}}(k, \tilde y, t, g, e|z) \right|, \qquad (C6)$$

where the second term can be understood as (one fourth of) the variational distance between $P$ and $P_{\mathrm{ideal}}$ generalized to the case when the distributions are conditioned on an input $z$ [6]. If the protocol is such that this guessing probability can be made arbitrarily close to 1/2, it generates a distribution $P$ that is essentially indistinguishable from the ideal one. This is known as "universally-composable security", and accounts for the strongest notion of cryptographic security (see [5] and [6]).
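For small, explicitly given distributions, (C6) can be evaluated directly. A minimal sketch, under our own assumptions: finite alphabets, only the non-aborted branch ($g = 1$), and a hypothetical encoding where each distribution maps Eve's measurement $z$ to a dictionary of probabilities keyed by `(k, rest, e)`, with `rest` bundling $(\tilde y, t, g)$. `make_ideal` builds $P_{\mathrm{ideal}}$ by replacing $k$ with a fresh uniform bit, as in the definition above; the two toy inputs are ours.

```python
from fractions import Fraction


def make_ideal(P):
    """P_ideal(k, rest, e | z) = P(rest, e | z) / 2: k replaced by a fresh uniform bit."""
    ideal = {}
    for z, dist in P.items():
        marginal = {}
        for (k, rest, e), p in dist.items():
            marginal[(rest, e)] = marginal.get((rest, e), 0) + p
        ideal[z] = {(k, rest, e): p / 2 for (rest, e), p in marginal.items() for k in (0, 1)}
    return ideal


def guessing_probability(P, P_ideal):
    """Eq. (C6): 1/2 + 1/4 * sum over (k, rest) of max_z sum_e |P - P_ideal|."""
    observed = {(k, rest) for D in (P, P_ideal) for dist in D.values() for (k, rest, _) in dist}
    total = Fraction(0)
    for obs in observed:
        best = Fraction(0)
        for z in P:
            outcomes = {e for D in (P, P_ideal) for (k, rest, e) in D[z] if (k, rest) == obs}
            gap = sum(abs(P[z].get((*obs, e), 0) - P_ideal[z].get((*obs, e), 0)) for e in outcomes)
            best = max(best, gap)
        total += best
    return Fraction(1, 2) + total / 4


# Eve's outcome e always equals k (single measurement z = "m"; rest is empty).
leaky = {"m": {(0, (), 0): Fraction(1, 2), (1, (), 1): Fraction(1, 2)}}
# Eve's outcome is independent of k.
private = {"m": {(k, (), e): Fraction(1, 4) for k in (0, 1) for e in (0, 1)}}

print(guessing_probability(leaky, make_ideal(leaky)))      # 3/4
print(guessing_probability(private, make_ideal(private)))  # 1/2
```

A bit that Eve knows perfectly is distinguished from the ideal one with probability 3/4, while an independent bit gives exactly 1/2, the value the protocol must approach.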
It implies that the protocol produces a random bit that is secure (free) in any context. In particular, it remains secure even if the adversary Eve has access to $\tilde y$, $t$ and $g$. Our main result, namely the security of our protocol for full randomness amplification, follows from the following Theorem.

Theorem 2 (Main Theorem).

Consider the previous protocol for randomness amplification and the conditional probability distribution $P(k,\tilde y,t,g,e|z)$ describing the statistics of the bits $k,\tilde y,t,g$ generated during its execution and of any possible system with input $z$ and output $e$ correlated to them. The probability $P(\text{guess})$ of correctly guessing between this distribution and the ideal distribution $P_{\rm ideal}(k,\tilde y,t,g,e|z)$ satisfies
$$P(\text{guess}) \le \frac12 + \frac{3\sqrt{N_d}}{2}\Big[\alpha^{N_d} + 2N_b^{\log_2(1-\epsilon)}\big(2^5\beta\epsilon^{-5}\big)^{N_d}\Big],\tag{C7}$$
where $\alpha$ and $\beta$ are real numbers such that $0<\alpha<1<\beta$.

The right-hand side of (C7) can be made arbitrarily close to $1/2$, for instance by setting $N_b = \big(2^5\beta\epsilon^{-5}\big)^{2N_d/|\log_2(1-\epsilon)|}$ and increasing $N_d$ subject to the fulfillment of the condition $N_dN_b \geq N/\ldots$ [Note that $\log_2(1-\epsilon)<0$.] With this choice, the second term inside the brackets equals $2\big(2^5\beta\epsilon^{-5}\big)^{-N_d}$, which, like $\alpha^{N_d}$, decays exponentially in $N_d$ because $2^5\beta\epsilon^{-5}>1$. In the limit $P(\text{guess})\to 1/2$, the bit $k$ generated by the protocol is indistinguishable from an ideal free random bit.

The proof of Theorem 2 is provided in the next section. Before moving to it, we would like to comment on the main intuitions behind our protocol. As mentioned, the protocol builds on the 5-party Mermin inequality because it is the simplest GHZ paradox allowing some randomness certification. The estimation part, given by step 4, is rather standard and inspired by estimation techniques introduced in [7], which were also used in [2] in the context of randomness amplification. The most subtle part is the distillation of the final bit in step 5. Naively, and leaving aside estimation issues, one could argue that it is nothing but a classical processing, by means of the function $f$, of the imperfect random bits obtained via the $N_d$ quintuplets. But this seems to contradict the result by Santha and Vazirani proving that it is impossible to extract, by classical means, a perfect free random bit from imperfect ones [1]. This intuition is, however, wrong. The reason is that in our protocol the randomness of the imperfect bits is certified by a Bell violation, which is impossible classically. Indeed, the Bell certification allows applying techniques similar to those obtained in Ref. [6] in the context of privacy amplification against non-signalling eavesdroppers. There, it was shown how to amplify the privacy, that is the unpredictability, of one of the measurement outcomes of bipartite correlations violating a Bell inequality.
The key point is that the amplification, or distillation, was attained in a deterministic manner. That is, contrary to standard approaches, the privacy amplification process described in [6] does not consume any randomness. Clearly, these deterministic techniques are extremely convenient for our randomness amplification scenario. In fact, the distillation part of our protocol can be seen as the translation of the privacy amplification techniques of Ref. [6] to our more complex scenario, which now involves 5-party non-local correlations and a function of three of the measurement outcomes.

Appendix D: Proof of Theorem 2

Before entering the details of the proof of Theorem 2, let us introduce a convenient notation. In what follows, we sometimes treat conditional probability distributions as vectors. To avoid ambiguities, we explicitly label the vectors describing probability distributions with the arguments of the distributions in upper case. Thus, for example, we denote by $P(A|X)$ the $(2^5\times16)$-dimensional vector with components $P(\mathbf{a}|\mathbf{x})$ for all $\mathbf{a}\in\{0,1\}^5$ and $\mathbf{x}\in\mathcal{X}$. We also denote by $I$ the vector with components $I(\mathbf{a},\mathbf{x})$ given in (A2). With this notation, inequality (A1) can be written as the scalar product
$$I\cdot P(A|X) = \sum_{\mathbf{a},\mathbf{x}} I(\mathbf{a},\mathbf{x})\,P(\mathbf{a}|\mathbf{x}) \ge 6.$$
Any probability distribution $P(\mathbf{a}|\mathbf{x})$ satisfies $C\cdot P(A|X)=1$, where $C$ is the vector with components $C(\mathbf{a},\mathbf{x})=2^{-4}$. We also use this scalar-product notation for full blocks, as in
$$I^{\otimes N_d}\cdot P(B|Y) = \sum_{\mathbf{a}_1,\ldots,\mathbf{a}_{N_d}}\ \sum_{\mathbf{x}_1,\ldots,\mathbf{x}_{N_d}}\left[\prod_{i=1}^{N_d} I(\mathbf{a}_i,\mathbf{x}_i)\right]P(\mathbf{a}_1,\ldots,\mathbf{a}_{N_d}|\mathbf{x}_1,\ldots,\mathbf{x}_{N_d}).$$
Following our upper/lower-case convention, the vector $P(B|Y,e,z)$ has components $P(b|y,e,z)$ for all $b,y$ but fixed $e,z$.

The proof of Theorem 2 relies on two crucial lemmas, which are stated and proven in Sections D1 and D2, respectively. The first lemma bounds the distinguishability between the bit distilled from a block of $N_d$ quintuplets and an ideal free random bit as a function of the Bell violation (A1) in each quintuplet. In particular, it guarantees that, if the correlations of all quintuplets in a given block violate inequality (A1) sufficiently, the bit distilled from the block will be indistinguishable from an ideal free random bit. The second lemma is required to guarantee that, if the statistics observed in all blocks but the distilling one are consistent with a maximal violation of inequality (A1), the violation in the distilling block will be arbitrarily close to maximal.

Proof of Theorem 2.

We begin with the identity
$$P(\text{guess}) = P(g=0)\,P(\text{guess}|g=0) + P(g=1)\,P(\text{guess}|g=1).\tag{D1}$$
As discussed, when the protocol is aborted ($g=0$), the distribution generated by the protocol and the ideal one are indistinguishable. In other words,
$$P(\text{guess}|g=0) = \frac12.\tag{D2}$$
If $P(g=0)=1$, then the protocol is secure, though in a trivial fashion. Next we address the non-trivial case where $P(g=1)>0$. From formula (C6), we have
$$\begin{aligned}
P(\text{guess}|g=1) &= \frac12 + \frac14\sum_{k,\tilde y,t}\max_z\sum_e\Big|P(k,\tilde y,t,e|z,g=1)-\tfrac12P(\tilde y,t,e|z,g=1)\Big|\\
&= \frac12 + \frac14\sum_{\tilde y,t}P(\tilde y,t|g=1)\sum_k\max_z\sum_e\Big|P(k,e|z,\tilde y,t,g=1)-\tfrac12P(e|z,\tilde y,t,g=1)\Big|\\
&\le \frac12 + \frac14\sum_{\tilde y,t}P(\tilde y,t|g=1)\,6\sqrt{N_d}\,(\alpha C+\beta I)^{\otimes N_d}\cdot P(\tilde B|\tilde Y,t,g=1)\\
&= \frac12 + \frac{3\sqrt{N_d}}{2}(\alpha C+\beta I)^{\otimes N_d}\cdot\sum_{\tilde y,t}P(\tilde y,t|g=1)\,P(\tilde B|\tilde Y,t,g=1)\\
&= \frac12 + \frac{3\sqrt{N_d}}{2}(\alpha C+\beta I)^{\otimes N_d}\cdot\sum_{t}P(t|g=1)\,P(\tilde B|\tilde Y,t,g=1)\\
&= \frac12 + \frac{3\sqrt{N_d}}{2}(\alpha C+\beta I)^{\otimes N_d}\cdot\sum_{t}P(\tilde B,t|\tilde Y,g=1)\\
&= \frac12 + \frac{3\sqrt{N_d}}{2}(\alpha C+\beta I)^{\otimes N_d}\cdot P(\tilde B|\tilde Y,g=1),
\end{aligned}\tag{D3}$$
where the inequality is due to Lemma 1 in Section D1. We have used the no-signalling condition through $P(\tilde y,t|z,g=1)=P(\tilde y,t|g=1)$, together with Bayes' rule, in the second equality, and Bayes' rule again when passing from $P(t|g=1)\,P(\tilde B|\tilde Y,t,g=1)$ to $P(\tilde B,t|\tilde Y,g=1)$. From (D3) and Lemma 2 in Section D2, we obtain
$$P(\text{guess}|g=1) \le \frac12 + \frac{3\sqrt{N_d}}{2}\left[\alpha^{N_d} + \frac{2N_b^{\log_2(1-\epsilon)}}{P(g=1)}\big(2^5\beta\epsilon^{-5}\big)^{N_d}\right].\tag{D4}$$
Finally, substituting bound (D4) and equality (D2) into (D1), we obtain
$$P(\text{guess}) \le \frac12 + \frac{3\sqrt{N_d}}{2}\Big[P(g=1)\,\alpha^{N_d} + 2N_b^{\log_2(1-\epsilon)}\big(2^5\beta\epsilon^{-5}\big)^{N_d}\Big],\tag{D5}$$
which, together with $P(g=1)\le1$, implies (C7).
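To see how the two terms in bound (C7) behave, the right-hand side can be evaluated numerically. The sketch below is an illustration only: the values of $\alpha$ and $\beta$ in the usage note are placeholders (the numerical constants of Lemma 3 are not reproduced here), and it assumes the choice $N_b = (2^5\beta\epsilon^{-5})^{2N_d/|\log_2(1-\epsilon)|}$. Since that $N_b$ is astronomically large even for modest $N_d$, the code only ever handles $\log_2 N_b$ and evaluates the second term in log space.

```python
import math


def second_term_log(n_d, log2_n_b, eps, beta):
    """Natural log of 2 * N_b^{log2(1-eps)} * (2^5 * beta * eps^-5)^{N_d} from (C7)."""
    k = 2**5 * beta * eps**-5
    # N_b^{log2(1-eps)} = exp(log2(1-eps) * log2(N_b) * ln 2)
    return math.log(2) + math.log2(1 - eps) * log2_n_b * math.log(2) + n_d * math.log(k)


def theorem2_bound(n_d, log2_n_b, eps, alpha, beta):
    """Right-hand side of (C7); N_b enters only through log2(N_b)."""
    if not (0 < alpha < 1 < beta and 0 < eps <= 0.5):
        raise ValueError("need 0 < alpha < 1 < beta and 0 < eps <= 1/2")
    bracket = alpha**n_d + math.exp(second_term_log(n_d, log2_n_b, eps, beta))
    return 0.5 + 1.5 * math.sqrt(n_d) * bracket


def suggested_log2_n_b(n_d, eps, beta):
    """log2 of the assumed choice N_b = (2^5 beta eps^-5)^{2 N_d / |log2(1-eps)|}."""
    k = 2**5 * beta * eps**-5
    return 2 * n_d * math.log2(k) / abs(math.log2(1 - eps))
```

With the placeholder values $\alpha=0.9$, $\beta=1.5$, $\epsilon=0.1$, the bound is about 0.555 at $N_d=50$ and within $10^{-7}$ of $1/2$ at $N_d=200$; for small $N_d$ it exceeds 1 and is therefore trivial, which is why $N_d$ has to grow.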

1. Statement and proof of Lemma 1

As mentioned, Lemma 1 provides a bound on the distinguishability between the probability distribution obtained after distilling a block of $N_d$ quintuplets and an ideal free random bit, in terms of the Bell violation (A1) in each quintuplet. The proof of Lemma 1, in turn, requires two more lemmas, Lemma 3 and Lemma 4, stated and proven in Section D3.

Lemma 1.

For each integer $N_d$ such that $(6\sqrt{N_d})^{-1/N_d}\ge\gamma$, with $\gamma$ the constant of Lemma 3, there exists a function $f:\{0,1\}^{N_d}\to\{0,1\}$ such that, for any given $(5N_d+1)$-partite non-signaling distribution $P(\mathbf{a}_1,\ldots,\mathbf{a}_{N_d},e|\mathbf{x}_1,\ldots,\mathbf{x}_{N_d},z) = P(b,e|y,z)$, the random variable $k = f\big(\mathrm{maj}(\mathbf{a}_1),\ldots,\mathrm{maj}(\mathbf{a}_{N_d})\big)$ satisfies
$$\sum_k\max_z\sum_e\Big|P(k,e|y,z)-\tfrac12P(e|y,z)\Big| \le 6\sqrt{N_d}\,(\alpha C+\beta I)^{\otimes N_d}\cdot P(B|Y)\tag{D6}$$
for all inputs $y=(\mathbf{x}_1,\ldots,\mathbf{x}_{N_d})\in\mathcal{X}^{N_d}$, where $\alpha$ and $\beta$ are real numbers such that $0<\alpha<1<\beta$.

Proof of Lemma 1.

For any $\mathbf{x}'\in\mathcal{X}$, let $M^{\mathbf{x}'}_w$ be the vector with components $M^{\mathbf{x}'}_w(\mathbf{a},\mathbf{x}) = \delta_{w,\mathrm{maj}(\mathbf{a})}\,\delta_{\mathbf{x}',\mathbf{x}}$. The probability of getting $\mathrm{maj}(\mathbf{a})=w$ when using $\mathbf{x}'$ as input can be written as $P(w|\mathbf{x}') = M^{\mathbf{x}'}_w\cdot P(A|X)$. Note that this probability can also be written as $P(w|\mathbf{x}') = \Gamma^{\mathbf{x}'}_w\cdot P(A|X)$, where $\Gamma^{\mathbf{x}'}_w = M^{\mathbf{x}'}_w + \Lambda^{\mathbf{x}'}_w$ and $\Lambda^{\mathbf{x}'}_w$ is any vector orthogonal to the no-signaling subspace, that is, such that $\Lambda^{\mathbf{x}'}_w\cdot P(A|X)=0$ for all no-signaling distributions $P(A|X)$. We can then write the left-hand side of (D6) as
$$\begin{aligned}
\sum_k\max_z\sum_e\Big|P(k,e|y,z)-\tfrac12P(e|y,z)\Big| &= \sum_k\max_z\sum_e P(e|y,z)\,\Big|\sum_{\mathbf{w}}\big(\delta_{k,f(\mathbf{w})}-\tfrac12\big)P(\mathbf{w}|y,e,z)\Big|\\
&= \sum_k\max_z\sum_e P(e|z)\,\Big|\sum_{\mathbf{w}}\big(\delta_{k,f(\mathbf{w})}-\tfrac12\big)\Big(\bigotimes_{i=1}^{N_d}\Gamma^{\mathbf{x}_i}_{w_i}\Big)\cdot P(B|Y,e,z)\Big|,
\end{aligned}\tag{D7}$$
where in the last equality we have used no-signaling through $P(e|y,z)=P(e|z)$ and the fact that the probability of obtaining the string of majorities $\mathbf{w}$ when inputting $y=(\mathbf{x}_1,\ldots,\mathbf{x}_{N_d})\in\mathcal{X}^{N_d}$ can be written as
$$P(\mathbf{w}|y) = \Big(\bigotimes_{i=1}^{N_d}\Gamma^{\mathbf{x}_i}_{w_i}\Big)\cdot P(B|Y).\tag{D8}$$
In what follows, the absolute value of vectors is understood to be component-wise.
Expression (D7) can be bounded as
$$\begin{aligned}
\sum_k\max_z\sum_e\Big|P(k,e|y,z)-\tfrac12P(e|y,z)\Big| &\le \sum_k\max_z\sum_e P(e|z)\,\Big|\sum_{\mathbf{w}}\big(\delta_{k,f(\mathbf{w})}-\tfrac12\big)\bigotimes_{i=1}^{N_d}\Gamma^{\mathbf{x}_i}_{w_i}\Big|\cdot P(B|Y,e,z)\\
&= \sum_k\max_z\Big|\sum_{\mathbf{w}}\big(\delta_{k,f(\mathbf{w})}-\tfrac12\big)\bigotimes_{i=1}^{N_d}\Gamma^{\mathbf{x}_i}_{w_i}\Big|\cdot\Big(\sum_e P(e|z)\,P(B|Y,e,z)\Big)\\
&= \sum_k\Big|\sum_{\mathbf{w}}\big(\delta_{k,f(\mathbf{w})}-\tfrac12\big)\bigotimes_{i=1}^{N_d}\Gamma^{\mathbf{x}_i}_{w_i}\Big|\cdot P(B|Y),
\end{aligned}\tag{D9}$$
where the inequality follows from the fact that all the components of the vector $P(B|Y,e,z)$ are non-negative, and no-signalling has been used again through $P(B|Y,z)=P(B|Y)$ in the last equality. The bound applies to any function $f$ and holds for any choice of the vectors $\Lambda^{\mathbf{x}_i}_w$ in $\Gamma^{\mathbf{x}_i}_w$. In what follows, we compute this bound for a specific choice of these vectors and of the function $f$.

Take $\Lambda^{\mathbf{x}_i}_w$ to be equal to the vectors $\Lambda^{\mathbf{x}'}_w$, with $\mathbf{x}'=\mathbf{x}_i$, in Lemma 3. These vectors then satisfy the bounds (D20) and (D21) in the same Lemma. Take $f$ to be equal to the function whose existence is proven in Lemma 4. Note that the conditions needed for this Lemma to apply are satisfied because of bound (D21) in Lemma 3, and because the free parameter $N_d$ is chosen such that $(6\sqrt{N_d})^{-1/N_d}\ge\gamma$, with $\gamma$ given in Lemma 3. With this choice of $f$ and $\Lambda^{\mathbf{x}_i}_w$, bound (D9) becomes
$$\sum_k\max_z\sum_e\Big|P(k,e|y,z)-\tfrac12P(e|y,z)\Big| \le \sum_k 3\sqrt{N_d}\Big(\bigotimes_{i=1}^{N_d}\Omega^{\mathbf{x}_i}\Big)\cdot P(B|Y) \le 6\sqrt{N_d}\,(\alpha C+\beta I)^{\otimes N_d}\cdot P(B|Y),\tag{D10}$$
where we have used $\Omega^{\mathbf{x}_i} = \sqrt{(\Gamma^{\mathbf{x}_i}_0)^2+(\Gamma^{\mathbf{x}_i}_1)^2}$ and bound (D29) in Lemma 4 in the first inequality, and, in the second, the fact that the sum over $k$ has two terms together with bound (D20) in Lemma 3 (the contributions of the vectors $\Lambda^{\mathbf{x}_i}_2$ vanish when contracted with the no-signaling distribution $P(B|Y)$, since these vectors are orthogonal to the no-signaling subspace).
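The vector notation used in this proof can be made concrete with a small pure-Python sketch. It stores vectors as dictionaries indexed by $(\mathbf{a},\mathbf{x})$, checks $C\cdot P(A|X)=1$ with $C(\mathbf{a},\mathbf{x})=2^{-4}$, and builds the majority-vote vectors $M^{\mathbf{x}'}_w$ so that $M^{\mathbf{x}'}_w\cdot P(A|X)=P(w|\mathbf{x}')$. The concrete set used for $\mathcal{X}$ (the odd-parity five-bit strings) is an assumption made only for illustration, since the normalization check depends only on $|\mathcal{X}|=16$; the random $P$ is a generic conditional distribution, not necessarily no-signaling, and the vectors $\Lambda$ of Lemma 3 are not constructed.

```python
import itertools
import random

A = list(itertools.product((0, 1), repeat=5))  # all outputs a
# Assumed input set X for illustration only: odd-parity strings, so |X| = 16.
X = [x for x in itertools.product((0, 1), repeat=5) if sum(x) % 2 == 1]


def maj(a):
    """Majority vote among the first three bits of a quintuple."""
    return int(sum(a[:3]) >= 2)


def dot(u, v):
    """Scalar product of two vectors stored as dicts keyed by (a, x)."""
    return sum(val * v.get(key, 0.0) for key, val in u.items())


def random_conditional(rng):
    """A random conditional distribution P(a|x): a (2^5 x 16)-dimensional vector."""
    p = {}
    for x in X:
        weights = [rng.random() for _ in A]
        total = sum(weights)
        for a, wgt in zip(A, weights):
            p[(a, x)] = wgt / total
    return p


# C(a, x) = 2^-4, so C . P(A|X) = sum over the 16 inputs of (1/16) * 1 = 1.
C = {(a, x): 2.0 ** -4 for a in A for x in X}


def M(w, x_prime):
    """Majority-vote vector with components delta_{w, maj(a)} * delta_{x', x}."""
    return {(a, x): float(maj(a) == w and x == x_prime) for a in A for x in X}


rng = random.Random(0)
P = random_conditional(rng)
print(dot(C, P))  # equals 1 up to floating-point rounding
```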

2. Statement and proof of Lemma 2

In this section we prove Lemma 2. This Lemma bounds the Bell violation in the distilling block in terms of the probability of not aborting the protocol in step 4 and of the number and size of the blocks, $N_b$ and $N_d$.

Lemma 2.

Let $P(b_1,\ldots,b_{N_b}|y_1,\ldots,y_{N_b})$ be a $(5N_dN_b)$-partite no-signaling distribution, let $y_1,\ldots,y_{N_b}$ and $l$ be the variables generated in steps 2 and 3 of the protocol, respectively, and let $\alpha$ and $\beta$ be real numbers such that $0<\alpha<1<\beta$; then
$$(\alpha C+\beta I)^{\otimes N_d}\cdot P(\tilde B|\tilde Y,g=1) \le \alpha^{N_d} + \frac{2N_b^{\log_2(1-\epsilon)}}{P(g=1)}\big(2^5\beta\epsilon^{-5}\big)^{N_d}.\tag{D11}$$

Proof of Lemma 2.

According to definition (C3), we have $I(\mathbf{a}_i,\mathbf{x}_i)\le\delta_{r[b,y],0}$ for all values of $b=(\mathbf{a}_1,\ldots,\mathbf{a}_{N_d})$ and $y=(\mathbf{x}_1,\ldots,\mathbf{x}_{N_d})$. This also implies $I(\mathbf{a}_i,\mathbf{x}_i)\,I(\mathbf{a}_j,\mathbf{x}_j)\le\delta_{r[b,y],0}$, and so on. Due to the property $0<\alpha<1<\beta$, one has $(\alpha2^{-4})^{N_d-i}\beta^i\le\beta^{N_d}$ for any $i=1,\ldots,N_d$. All this in turn implies
$$\begin{aligned}
\prod_{i=1}^{N_d}\big[\alpha2^{-4}+\beta I_i\big] &= (\alpha2^{-4})^{N_d} + (\alpha2^{-4})^{N_d-1}\beta\sum_i I_i + (\alpha2^{-4})^{N_d-2}\beta^2\sum_{i<j}I_iI_j + \cdots\\
&\le (\alpha2^{-4})^{N_d} + \beta^{N_d}\Big[\sum_i I_i + \sum_{i<j}I_iI_j + \cdots\Big]\\
&\le (\alpha2^{-4})^{N_d} + \beta^{N_d}\Big[\sum_i\delta_{r[b,y],0} + \sum_{i<j}\delta_{r[b,y],0} + \cdots\Big]\\
&= (\alpha2^{-4})^{N_d} + \beta^{N_d}\big(2^{N_d}-1\big)\,\delta_{r[b,y],0}\\
&\le (\alpha2^{-4})^{N_d} + (2\beta)^{N_d}\,\delta_{r[b,y],0},
\end{aligned}\tag{D12}$$
where $I_i = I(\mathbf{a}_i,\mathbf{x}_i)$. This implies that
$$\begin{aligned}
(\alpha C+\beta I)^{\otimes N_d}\cdot P(B|Y,g=1) &= \sum_{\mathbf{a}_1,\ldots,\mathbf{a}_{N_d}}\ \sum_{\mathbf{x}_1,\ldots,\mathbf{x}_{N_d}}\ \prod_{i=1}^{N_d}\big[\alpha2^{-4}+\beta I(\mathbf{a}_i,\mathbf{x}_i)\big]\,P(\mathbf{a}_1,\ldots,\mathbf{a}_{N_d}|\mathbf{x}_1,\ldots,\mathbf{x}_{N_d},g=1)\\
&\le \sum_{b,y}\Big[(\alpha2^{-4})^{N_d} + (2\beta)^{N_d}\,\delta_{r[b,y],0}\Big]P(b|y,g=1)\\
&= \alpha^{N_d}\sum_y 2^{-4N_d} + (2\beta)^{N_d}\sum_y P(r=0|y,g=1)\\
&= \alpha^{N_d} + (2\beta)^{N_d}\sum_y P(r=0|y,g=1)\\
&= \alpha^{N_d} + (2\beta)^{N_d}\sum_y\frac{P(r=0,y|g=1)}{P(y|g=1)}.
\end{aligned}\tag{D13}$$
We can now bound $P(y|g=1)$, taking into account that $y$ denotes a $5N_d$-bit string generated by the $\epsilon$-source $S$ that remains after step 2 in the protocol. Note that only half of the 32 possible 5-bit inputs $\mathbf{x}$ generated by the source belong to $\mathcal{X}$ and remain after step 2. Thus, $P\big((\mathbf{x}_1,\ldots,\mathbf{x}_{N_d})\in\mathcal{X}^{N_d}\,\big|\,g=1\big)\le2^{4N_d}(1-\epsilon)^{5N_d}$, where we used (C2). This, together with $P\big((\mathbf{x}_1,\ldots,\mathbf{x}_{N_d})\,\big|\,g=1\big)\ge\epsilon^{5N_d}$, implies that
$$P(y|g=1) \ge \left(\frac{\epsilon^5}{2^4(1-\epsilon)^5}\right)^{N_d}.\tag{D14}$$
Substituting this bound in (D13), and summing over $y$, gives
$$(\alpha C+\beta I)^{\otimes N_d}\cdot P(B|Y,g=1) \le \alpha^{N_d} + (2\beta)^{N_d}\left(\frac{2^4(1-\epsilon)^5}{\epsilon^5}\right)^{N_d}P(r=0|g=1).\tag{D15}$$
In what follows we use the notation $P(1_1,0_2,1_3,1_4,\ldots) = P\big(r[b_1,y_1]=1,\,r[b_2,y_2]=0,\,r[b_3,y_3]=1,\,r[b_4,y_4]=1,\ldots\big)$. According to (C4), the protocol aborts ($g=0$) if there is at least one "not right" block ($r[b_j,y_j]=0$ for some $j\ne l$). While abortion also happens if there is more than one "not right" block, in what follows we lower-bound $P(g=0)$ by the probability that there is exactly one "not right" block:
$$\begin{aligned}
1 \ge P(g=0) &\ge \sum_{l=1}^{N_b}P(l)\sum_{l'=1,\,l'\ne l}^{N_b}P(1_1,\ldots,1_{l-1},1_{l+1},\ldots,1_{l'-1},0_{l'},1_{l'+1},\ldots,1_{N_b})\\
&\ge \sum_l P(l)\sum_{l'\ne l}P(1_1,\ldots,1_{l-1},1_l,1_{l+1},\ldots,1_{l'-1},0_{l'},1_{l'+1},\ldots,1_{N_b})\\
&= \sum_{l'}\Big[\sum_{l\ne l'}P(l)\Big]P(1_1,\ldots,1_{l-1},1_l,1_{l+1},\ldots,1_{l'-1},0_{l'},1_{l'+1},\ldots,1_{N_b})\\
&= \sum_{l'}\big[1-P(l')\big]P(1_1,\ldots,1_{l'-1},0_{l'},1_{l'+1},\ldots,1_{N_b}),
\end{aligned}\tag{D16}$$
where, when performing the sum over $l$, we have used that $P(1_1,\ldots,1_{l-1},1_l,1_{l+1},\ldots,1_{l'-1},0_{l'},1_{l'+1},\ldots,1_{N_b}) \equiv P(1_1,\ldots,1_{l'-1},0_{l'},1_{l'+1},\ldots,1_{N_b})$ does not depend on $l$. Bound (C2) implies
$$\frac{1-P(l)}{P(l)} \ge \frac{1-(1-\epsilon)^{\log_2N_b}}{(1-\epsilon)^{\log_2N_b}} = N_b^{-\log_2(1-\epsilon)}-1 \ge \frac{N_b^{-\log_2(1-\epsilon)}}{2},\tag{D17}$$
where the last inequality holds for sufficiently large $N_b$. Using this and (D16), we obtain
$$1 \ge \sum_{l'}\frac{N_b^{-\log_2(1-\epsilon)}}{2}\,P(l')\,P(1_1,\ldots,1_{l'-1},0_{l'},1_{l'+1},\ldots,1_{N_b}) \ge \frac{N_b^{-\log_2(1-\epsilon)}}{2}\,P(\tilde r=0,g=1),\tag{D18}$$
where $\tilde r = r[b_l,y_l]$. This, together with (D15), implies
$$\begin{aligned}
(\alpha C+\beta I)^{\otimes N_d}\cdot P(\tilde B|\tilde Y,g=1) &\le \alpha^{N_d} + (2\beta)^{N_d}\left(\frac{2^4(1-\epsilon)^5}{\epsilon^5}\right)^{N_d}P(\tilde r=0|g=1)\\
&\le \alpha^{N_d} + \frac{2}{P(g=1)}\left(\frac{2^5\beta(1-\epsilon)^5}{\epsilon^5}\right)^{N_d}N_b^{\log_2(1-\epsilon)},
\end{aligned}\tag{D19}$$
where, in the second inequality, Bayes' rule was again invoked. Inequality (D19), in turn, implies (D11), since $(1-\epsilon)^5\le1$.
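Two of the elementary inequalities used in this proof can be checked by brute force. The sketch below verifies (D12) over every pattern of indicator values $I_i\in\{0,1\}$ for given $\alpha$, $\beta$ and $N_d$, and checks (D17) in its worst case $P(l)=(1-\epsilon)^{\log_2 N_b}$ allowed by (C2). The parameter values one plugs in are placeholders, and the check is only meaningful for $0<\alpha<1<\beta$.

```python
import itertools
import math


def d12_holds(alpha, beta, n_d):
    """Brute-force check of (D12) for all patterns (I_1, ..., I_Nd) in {0,1}^Nd."""
    c = alpha * 2.0 ** -4                      # component of alpha*C
    for pattern in itertools.product((0, 1), repeat=n_d):
        lhs = math.prod(c + beta * i_val for i_val in pattern)
        delta_r0 = 1 if any(pattern) else 0    # r[b,y] = 0 as soon as one I_i = 1, see (C3)
        rhs = c ** n_d + (2 * beta) ** n_d * delta_r0
        if lhs > rhs * (1 + 1e-12):
            return False
    return True


def d17_holds(eps, log2_n_b):
    """Check (1 - P(l)) / P(l) >= N_b^{-log2(1-eps)} / 2 at the worst case of (C2)."""
    p_l = (1 - eps) ** log2_n_b                           # largest P(l) allowed by (C2)
    n_b_power = 2.0 ** (-log2_n_b * math.log2(1 - eps))   # N_b^{-log2(1-eps)}
    return (1 - p_l) / p_l >= n_b_power / 2
```

For example, with $\epsilon=0.1$ the last step of (D17) fails for $N_b=2$ (`d17_holds(0.1, 1)` is `False`) but holds for $N_b=2^{20}$, which illustrates the phrase "for sufficiently large $N_b$".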

3. Statement and proof of the additional Lemmas

Lemma 3.

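For each $\mathbf{x}'\in\mathcal{X}$ there are three vectors $\Lambda^{\mathbf{x}'}_0,\Lambda^{\mathbf{x}'}_1,\Lambda^{\mathbf{x}'}_2$ orthogonal to the non-signaling subspace such that, for all $w\in\{0,1\}$, $\mathbf{a}\in\{0,1\}^5$ and $\mathbf{x}\in\mathcal{X}$, they satisfy
$$\sqrt{\big[M^{\mathbf{x}'}_0(\mathbf{a},\mathbf{x})+\Lambda^{\mathbf{x}'}_0(\mathbf{a},\mathbf{x})\big]^2+\big[M^{\mathbf{x}'}_1(\mathbf{a},\mathbf{x})+\Lambda^{\mathbf{x}'}_1(\mathbf{a},\mathbf{x})\big]^2} \le \alpha C(\mathbf{a},\mathbf{x})+\beta I(\mathbf{a},\mathbf{x})+\Lambda^{\mathbf{x}'}_2(\mathbf{a},\mathbf{x})\tag{D20}$$
and
$$\big|M^{\mathbf{x}'}_w(\mathbf{a},\mathbf{x})+\Lambda^{\mathbf{x}'}_w(\mathbf{a},\mathbf{x})\big| \le \gamma\sqrt{\big[M^{\mathbf{x}'}_0(\mathbf{a},\mathbf{x})+\Lambda^{\mathbf{x}'}_0(\mathbf{a},\mathbf{x})\big]^2+\big[M^{\mathbf{x}'}_1(\mathbf{a},\mathbf{x})+\Lambda^{\mathbf{x}'}_1(\mathbf{a},\mathbf{x})\big]^2},\tag{D21}$$
where $\alpha=0.\ldots$, $\beta=1.\ldots$ and $\gamma=0.\ldots$

Proof of Lemma 3.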

The proof of this lemma is numerical but rigorous. It is based on two linear-programming minimization problems, which are carried out for each value of $\mathbf{x}'\in\mathcal{X}$. We have repeated this process for different values of $\gamma$, finding that $\gamma=0.\ldots$ is roughly the smallest value for which the linear programs described below are feasible.

The fact that the vectors $\Lambda^{\mathbf{x}'}_0,\Lambda^{\mathbf{x}'}_1,\Lambda^{\mathbf{x}'}_2$ are orthogonal to the non-signaling subspace can be written as the linear equalities
$$D\cdot\Lambda^{\mathbf{x}'}_w = \mathbf{0}\tag{D22}$$
for $w\in\{0,1,2\}$, where $\mathbf{0}$ is the zero vector and $D$ is a matrix whose rows constitute a basis of non-signaling probability distributions. A geometrical interpretation of constraint (D20) is that the point in the plane with coordinates $\big[M^{\mathbf{x}'}_0(\mathbf{a},\mathbf{x})+\Lambda^{\mathbf{x}'}_0(\mathbf{a},\mathbf{x}),\ M^{\mathbf{x}'}_1(\mathbf{a},\mathbf{x})+\Lambda^{\mathbf{x}'}_1(\mathbf{a},\mathbf{x})\big]\in\mathbb{R}^2$ lies inside a circle of radius $\alpha C(\mathbf{a},\mathbf{x})+\beta I(\mathbf{a},\mathbf{x})+\Lambda^{\mathbf{x}'}_2(\mathbf{a},\mathbf{x})$ centered at the origin. All points inside an octagon inscribed in this circle also satisfy constraint (D20). The points of such an inscribed octagon are the ones satisfying the following set of linear constraints:
$$\big[M^{\mathbf{x}'}_0(\mathbf{a},\mathbf{x})+\Lambda^{\mathbf{x}'}_0(\mathbf{a},\mathbf{x})\big]\eta\cos\theta+\big[M^{\mathbf{x}'}_1(\mathbf{a},\mathbf{x})+\Lambda^{\mathbf{x}'}_1(\mathbf{a},\mathbf{x})\big]\eta\sin\theta \le \alpha C(\mathbf{a},\mathbf{x})+\beta I(\mathbf{a},\mathbf{x})+\Lambda^{\mathbf{x}'}_2(\mathbf{a},\mathbf{x}),\tag{D23}$$
for the eight angles $\theta = k\pi/4$, $k=1,\ldots,8$, where $\eta=(\cos\frac{\pi}{8})^{-1}\approx1.082$. In other words, the eight conditions (D23) imply constraint (D20). From now on, we only consider these eight linear constraints (D23). With a bit of algebra (squaring (D21) gives $(1-\gamma^2)u^2\le\gamma^2v^2$, with $u$ and $v$ the absolute values of the $w$ and $\bar w$ entries), one can see that inequality (D21) is equivalent to the two almost linear inequalities
$$\pm\big[M^{\mathbf{x}'}_w(\mathbf{a},\mathbf{x})+\Lambda^{\mathbf{x}'}_w(\mathbf{a},\mathbf{x})\big] \le \sqrt{\frac{\gamma^2}{1-\gamma^2}}\ \big|M^{\mathbf{x}'}_{\bar w}(\mathbf{a},\mathbf{x})+\Lambda^{\mathbf{x}'}_{\bar w}(\mathbf{a},\mathbf{x})\big|,\tag{D24}$$
for all $w\in\{0,1\}$, where $\bar w=1-w$. Clearly, the problem is not linear because of the absolute values.
The computation described in what follows constitutes a trick to make a good guess for the signs of the terms inside the absolute values in (D24), so that the problem can be made linear by adding extra constraints.

The first computational step consists of a linear-programming minimization of $\alpha$ subject to the constraints (D22) and (D23), where the minimization is performed over the variables $\alpha,\beta,\Lambda^{\mathbf{x}'}_0,\Lambda^{\mathbf{x}'}_1,\Lambda^{\mathbf{x}'}_2$. This step serves to guess the signs
$$\sigma_w(\mathbf{a},\mathbf{x}) = \mathrm{sign}\big[M^{\mathbf{x}'}_w(\mathbf{a},\mathbf{x})+\Lambda^{\mathbf{x}'}_w(\mathbf{a},\mathbf{x})\big],\tag{D25}$$
for all $w,\mathbf{a},\mathbf{x}$, where the value of $\Lambda^{\mathbf{x}'}_w(\mathbf{a},\mathbf{x})$ corresponds to the solution of the above minimization. Once we have identified all these signs, we can write the inequalities (D24) in a linear fashion:
$$\sigma_w(\mathbf{a},\mathbf{x})\big[M^{\mathbf{x}'}_w(\mathbf{a},\mathbf{x})+\Lambda^{\mathbf{x}'}_w(\mathbf{a},\mathbf{x})\big] \ge 0,\tag{D26}$$
$$\sigma_w(\mathbf{a},\mathbf{x})\big[M^{\mathbf{x}'}_w(\mathbf{a},\mathbf{x})+\Lambda^{\mathbf{x}'}_w(\mathbf{a},\mathbf{x})\big] \le \sqrt{\frac{\gamma^2}{1-\gamma^2}}\ \sigma_{\bar w}(\mathbf{a},\mathbf{x})\big[M^{\mathbf{x}'}_{\bar w}(\mathbf{a},\mathbf{x})+\Lambda^{\mathbf{x}'}_{\bar w}(\mathbf{a},\mathbf{x})\big],\tag{D27}$$
for all $w\in\{0,1\}$.

The second computational step consists of a linear-programming minimization of $\alpha$ subject to the constraints (D22), (D23), (D26) and (D27), over the variables $\alpha,\beta,\Lambda^{\mathbf{x}'}_0,\Lambda^{\mathbf{x}'}_1,\Lambda^{\mathbf{x}'}_2$. Clearly, any solution to this problem is also a solution to the original formulation of the Lemma. The minimization was performed for every $\mathbf{x}'\in\mathcal{X}$, and the values of $\alpha,\beta$ turned out to be independent of $\mathbf{x}'$. These numerical values are the ones appearing in the formulation of the Lemma.

Note that Lemma 3 allows one to bound the predictability of $\mathrm{maj}(\mathbf{a})$ by a linear function of the 5-party Mermin violation. This can be seen by computing $\Gamma^{\mathbf{x}'}_w\cdot P(A|X)$ and applying the bounds in the Lemma. In principle, one expects such a bound to exist, as the predictability is smaller than one at the point of maximal violation, as proven in Theorem 1, and equal to one at the point of no violation. However, we were unable to find it. This is why we had to resort to the linear optimization technique given above, which moreover provides the bounds (D20) and (D21) necessary for the security proof.
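The two geometric facts behind this linearization can be sanity-checked numerically: points satisfying the eight constraints (D23) lie inside the circle of (D20), and (D21) agrees with (D24). This is a randomized check, not a proof. It assumes the eight directions $\theta=k\pi/4$, $k=1,\ldots,8$ (any eight equally spaced directions give an inscribed octagon with the same $\eta$), and it treats $\gamma$ as a free placeholder with $0<\gamma<1$; here $u$ and $v$ stand for $M_w+\Lambda_w$ and $M_{\bar w}+\Lambda_{\bar w}$ at one fixed component.

```python
import math
import random

ETA = 1 / math.cos(math.pi / 8)  # about 1.082, the scale factor in (D23)
# Assumed directions: eight equally spaced angles k*pi/4, k = 1..8.
DIRECTIONS = [(ETA * math.cos(k * math.pi / 4), ETA * math.sin(k * math.pi / 4))
              for k in range(1, 9)]


def satisfies_d23(u, v, radius):
    """The eight linear constraints (D23) for the point (u, v)."""
    return all(cu * u + cv * v <= radius for cu, cv in DIRECTIONS)


def satisfies_d21(u, v, gamma):
    """(D21): |u| <= gamma * sqrt(u^2 + v^2)."""
    return abs(u) <= gamma * math.hypot(u, v)


def satisfies_d24(u, v, gamma):
    """The 'almost linear' form (D24) of the same condition."""
    return abs(u) <= math.sqrt(gamma**2 / (1 - gamma**2)) * abs(v)


def random_check(samples=20_000, radius=1.0, gamma=0.9, seed=1):
    """True if no sampled point contradicts (D23) => (D20) or (D21) <=> (D24)."""
    rng = random.Random(seed)
    for _ in range(samples):
        u, v = rng.uniform(-1.5, 1.5), rng.uniform(-1.5, 1.5)
        if satisfies_d23(u, v, radius) and math.hypot(u, v) > radius * (1 + 1e-12):
            return False  # an octagon point outside the circle
        if satisfies_d21(u, v, gamma) != satisfies_d24(u, v, gamma):
            return False  # the two forms of the gamma constraint disagree
    return True
```

The octagon touches the circle only at its vertices, which sit at angles $\pi/8 + k\pi/4$ for this choice of directions, so the linear constraints cost only a factor $\eta\approx1.082$ in the radius.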
Lemma 4.

Let $N_d$ be a positive integer and let $\Gamma^i_w(\mathbf{a},\mathbf{x})$ be a given set of real coefficients such that, for all $i\in\{1,\ldots,N_d\}$, $w\in\{0,1\}$, $\mathbf{a}\in\{0,1\}^5$ and $\mathbf{x}\in\mathcal{X}$, they satisfy
$$\big|\Gamma^i_w(\mathbf{a},\mathbf{x})\big| \le \big(6\sqrt{N_d}\big)^{-1/N_d}\,\Omega_i(\mathbf{a},\mathbf{x}),\tag{D28}$$
where $\Omega_i(\mathbf{a},\mathbf{x}) = \sqrt{\Gamma^i_0(\mathbf{a},\mathbf{x})^2+\Gamma^i_1(\mathbf{a},\mathbf{x})^2}$. There exists a function $f:\{0,1\}^{N_d}\to\{0,1\}$ such that, for each $k\in\{0,1\}$ and each sequence $(\mathbf{a}_1,\mathbf{x}_1),\ldots,(\mathbf{a}_{N_d},\mathbf{x}_{N_d})$, we have
$$\Big|\sum_{\mathbf{w}}\big(\delta_{k,f(\mathbf{w})}-\tfrac12\big)\prod_{i=1}^{N_d}\Gamma^i_{w_i}(\mathbf{a}_i,\mathbf{x}_i)\Big| \le 3\sqrt{N_d}\prod_{i=1}^{N_d}\Omega_i(\mathbf{a}_i,\mathbf{x}_i),\tag{D29}$$
where the sum runs over all $\mathbf{w}=(w_1,\ldots,w_{N_d})\in\{0,1\}^{N_d}$.

Proof of Lemma 4. First, note that for a sequence $(\mathbf{a}_1,\mathbf{x}_1),\ldots,(\mathbf{a}_{N_d},\mathbf{x}_{N_d})$ for which there is at least one value of $i\in\{1,\ldots,N_d\}$ satisfying $\Gamma^i_0(\mathbf{a}_i,\mathbf{x}_i)=\Gamma^i_1(\mathbf{a}_i,\mathbf{x}_i)=0$, both the left-hand side and the right-hand side of (D29) are equal to zero; hence, inequality (D29) is satisfied independently of the function $f$. Therefore, in what follows, we only consider sequences for which either $\Gamma^i_0(\mathbf{a}_i,\mathbf{x}_i)\ne0$ or $\Gamma^i_1(\mathbf{a}_i,\mathbf{x}_i)\ne0$, for all $i=1,\ldots,N_d$. Or, equivalently, we consider sequences such that
$$\prod_{i=1}^{N_d}\Omega_i(\mathbf{a}_i,\mathbf{x}_i) > 0.\tag{D30}$$
The existence of the function $f$ satisfying (D29) for all such sequences is shown with a probabilistic argument. We consider the situation where $f$ is picked from the set of all functions mapping $\{0,1\}^{N_d}$ to $\{0,1\}$ with uniform probability, and upper-bound the probability that the chosen function does not satisfy the constraint (D29) for all $k$ and all sequences satisfying (D30). This upper bound is shown to be smaller than one.
Therefore, there must exist at least one function satisfying (D29).

For each $\mathbf{w}\in\{0,1\}^{N_d}$ consider the random variable $F_{\mathbf{w}} = \delta_{0,f(\mathbf{w})}-\tfrac12\in\{\tfrac12,-\tfrac12\}$, where $f$ is picked from the set of all functions mapping $\{0,1\}^{N_d}\to\{0,1\}$ with uniform distribution. This is equivalent to saying that the $2^{N_d}$ random variables $\{F_{\mathbf{w}}\}_{\mathbf{w}}$ are independent and identically distributed according to $\Pr\{F_{\mathbf{w}}=\pm\tfrac12\}=\tfrac12$. For ease of notation, let us fix a sequence $(\mathbf{a}_1,\mathbf{x}_1),\ldots,(\mathbf{a}_{N_d},\mathbf{x}_{N_d})$ satisfying (D30) and use the short-hand notation $\Gamma^i_{w_i} = \Gamma^i_{w_i}(\mathbf{a}_i,\mathbf{x}_i)$. We proceed using the same ideas as in the derivation of the exponential Chebyshev inequality. For any $\mu,\nu\ge0$, we have
$$\begin{aligned}
\Pr\Big\{\sum_{\mathbf{w}}F_{\mathbf{w}}\prod_{i=1}^{N_d}\Gamma^i_{w_i}\ge\mu\Big\} &= \Pr\Big\{\nu\Big(-\mu+\sum_{\mathbf{w}}F_{\mathbf{w}}\prod_{i=1}^{N_d}\Gamma^i_{w_i}\Big)\ge0\Big\} = \Pr\Big\{\exp\Big(-\nu\mu+\nu\sum_{\mathbf{w}}F_{\mathbf{w}}\prod_{i=1}^{N_d}\Gamma^i_{w_i}\Big)\ge1\Big\}\\
&\le E\Big[\exp\Big(-\nu\mu+\nu\sum_{\mathbf{w}}F_{\mathbf{w}}\prod_{i=1}^{N_d}\Gamma^i_{w_i}\Big)\Big] \qquad\text{(D31)}\\
&= E\Big[e^{-\nu\mu}\prod_{\mathbf{w}}\exp\Big(\nu F_{\mathbf{w}}\prod_{i=1}^{N_d}\Gamma^i_{w_i}\Big)\Big] = e^{-\nu\mu}\prod_{\mathbf{w}}E\Big[\exp\Big(\nu F_{\mathbf{w}}\prod_{i=1}^{N_d}\Gamma^i_{w_i}\Big)\Big] \qquad\text{(D32)}\\
&\le e^{-\nu\mu}\prod_{\mathbf{w}}E\Big[1+\nu F_{\mathbf{w}}\prod_{i=1}^{N_d}\Gamma^i_{w_i}+\Big(\nu F_{\mathbf{w}}\prod_{i=1}^{N_d}\Gamma^i_{w_i}\Big)^2\Big]. \qquad\text{(D33)}
\end{aligned}$$
Here $E$ stands for the average over all $F_{\mathbf{w}}$. In (D31) we have used that any non-negative random variable $X$ satisfies $\Pr\{X\ge1\}\le E[X]$. In (D32) we have used that the $\{F_{\mathbf{w}}\}_{\mathbf{w}}$ are independent. Finally, in (D33) we have used that $e^\eta\le1+\eta+\eta^2$, which holds for $\eta\le1$. Therefore, we must show that
$$\Big|\nu\prod_{i=1}^{N_d}\Gamma^i_{w_i}\Big| \le 1,\tag{D34}$$
which is done below, when setting the value of $\nu$.
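For a very small case, the exponential Chebyshev chain above can be verified exactly by enumerating all equally likely sign patterns of the $F_{\mathbf{w}}$. The sketch below is illustrative: the coefficients $\Gamma^i_w$ and the values of $\mu$ and $\nu$ are hypothetical numbers, chosen so that condition (D34) holds. It compares the exact tail probability with the bound of (D32), where $E[\exp(\nu F c)]=\cosh(\nu c/2)$ for $F=\pm\tfrac12$, and with the weaker bound of (D33), where $E[F]=0$ and $E[F^2]=1/4$.

```python
import itertools
import math


def tail_and_bounds(gammas, mu, nu):
    """Exact tail probability and the bounds (D32), (D33) for one fixed sequence.

    gammas[i] = (Gamma^i_0, Gamma^i_1), hypothetical real numbers, i = 1..N_d.
    Each F_w is +1/2 or -1/2 with probability 1/2, independently (uniform random f).
    """
    n_d = len(gammas)
    words = list(itertools.product((0, 1), repeat=n_d))
    coeff = [math.prod(gammas[i][w[i]] for i in range(n_d)) for w in words]
    assert all(abs(nu * c) <= 1 for c in coeff), "condition (D34) is violated"
    # Exact Pr{sum_w F_w * coeff_w >= mu}: enumerate all 2^(2^N_d) sign patterns.
    hits = sum(
        1
        for signs in itertools.product((0.5, -0.5), repeat=len(words))
        if sum(s * c for s, c in zip(signs, coeff)) >= mu
    )
    exact = hits / 2 ** len(words)
    # (D32): E[exp(nu * F * c)] = cosh(nu * c / 2) when F = +-1/2.
    d32 = math.exp(-nu * mu) * math.prod(math.cosh(nu * c / 2) for c in coeff)
    # (D33): E[1 + nu F c + (nu F c)^2] = 1 + (nu c)^2 / 4.
    d33 = math.exp(-nu * mu) * math.prod(1 + (nu * c) ** 2 / 4 for c in coeff)
    return exact, d32, d33
```

For instance, `tail_and_bounds([(0.6, 0.8), (0.7, 0.7), (0.5, 0.9)], mu=0.5, nu=1.0)` returns three numbers in non-decreasing order, as the chain (D31) to (D33) requires.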
In what follows we use the chain of inequalities (D33), the facts that $E[F_{\mathbf{w}}]=0$ and $E[F_{\mathbf{w}}^2]=1/4$, the bound $1+\eta\le e^\eta$ for $\eta\ge0$, and the definition $\Omega_i^2=(\Gamma^i_0)^2+(\Gamma^i_1)^2$:
$$\begin{aligned}
\Pr\Big\{\sum_{\mathbf{w}}F_{\mathbf{w}}\prod_{i=1}^{N_d}\Gamma^i_{w_i}\ge\mu\Big\} &\le e^{-\nu\mu}\prod_{\mathbf{w}}\Big(1+E[F_{\mathbf{w}}]\,\nu\prod_{i=1}^{N_d}\Gamma^i_{w_i}+E[F_{\mathbf{w}}^2]\,\nu^2\prod_{i=1}^{N_d}\big(\Gamma^i_{w_i}\big)^2\Big)\\
&= e^{-\nu\mu}\prod_{\mathbf{w}}\Big(1+\frac{\nu^2}{4}\prod_{i=1}^{N_d}\big(\Gamma^i_{w_i}\big)^2\Big)\\
&\le e^{-\nu\mu}\prod_{\mathbf{w}}\exp\Big(\frac{\nu^2}{4}\prod_{i=1}^{N_d}\big(\Gamma^i_{w_i}\big)^2\Big)\\
&= \exp\Big(-\nu\mu+\sum_{\mathbf{w}}\frac{\nu^2}{4}\prod_{i=1}^{N_d}\big(\Gamma^i_{w_i}\big)^2\Big)\\
&= \exp\Big(-\nu\mu+\frac{\nu^2}{4}\prod_{i=1}^{N_d}\Omega_i^2\Big).
\end{aligned}\tag{D35}$$
In order to optimize this upper bound, we minimize the exponent over $\nu$. This is done by differentiating with respect to $\nu$ and equating to zero, which gives
$$\nu = 2\mu\prod_{i=1}^{N_d}\Omega_i^{-2}.\tag{D36}$$
Note that constraint (D30) implies that the inverse of $\Omega_i$ exists. Since we assume $\mu\ge0$, the initial assumption $\nu\ge0$ is satisfied by the solution (D36). By substituting (D36) in (D35) and rescaling the free parameter $\mu$ as
$$\tilde\mu = \frac{\mu}{\prod_{i=1}^{N_d}\Omega_i},\tag{D37}$$
we obtain
$$\Pr\Big\{\sum_{\mathbf{w}}F_{\mathbf{w}}\prod_{i=1}^{N_d}\Gamma^i_{w_i}\ge\tilde\mu\prod_{i=1}^{N_d}\Omega_i\Big\} \le e^{-\tilde\mu^2},\tag{D38}$$
for any $\tilde\mu\ge0$ consistent with condition (D34). We now choose $\tilde\mu=3\sqrt{N_d}$, see Eq. (D29), getting
$$\Pr\Big\{\sum_{\mathbf{w}}F_{\mathbf{w}}\prod_{i=1}^{N_d}\Gamma^i_{w_i}\ge3\sqrt{N_d}\prod_{i=1}^{N_d}\Omega_i\Big\} \le e^{-9N_d}.\tag{D39}$$
With this assignment, and using (D36) and (D37), condition (D34), yet to be fulfilled, becomes
$$6\sqrt{N_d}\prod_{i=1}^{N_d}\frac{|\Gamma^i_{w_i}|}{\Omega_i} \le 1,\tag{D40}$$
which now holds because of the initial premise (D28).

Bound (D39) applies to each of the sequences $(\mathbf{a}_1,\mathbf{x}_1),\ldots,(\mathbf{a}_{N_d},\mathbf{x}_{N_d})$ satisfying (D30), and there are at most $2^{9N_d}$ of them.
Hence, the probability that, for at least one of such sequences, the random function $f$ yields
$$\sum_{\mathbf{w}}F_{\mathbf{w}}\prod_{i=1}^{N_d}\Gamma^i_{w_i} \ge 3\sqrt{N_d}\prod_{i=1}^{N_d}\Omega_i\tag{D41}$$
is at most $2^{9N_d}e^{-9N_d}=(2/e)^{9N_d}$, which is smaller than $1/2$ for any value of $N_d$. A similar argument proves that the probability that, for at least one sequence satisfying (D30), the random function $f$ yields
$$\sum_{\mathbf{w}}F_{\mathbf{w}}\prod_{i=1}^{N_d}\Gamma^i_{w_i} \le -3\sqrt{N_d}\prod_{i=1}^{N_d}\Omega_i\tag{D42}$$
is also smaller than $1/2$. The lemma now easily follows from these two results: with positive probability neither event occurs, and, since $\delta_{1,f(\mathbf{w})}-\tfrac12=-F_{\mathbf{w}}$, the same function then satisfies (D29) for both $k=0$ and $k=1$.

Appendix E: Final remarks

The main goal of our work was to prove full randomness amplification. In these appendices, we have shown how our protocol, based on quantum non-local correlations, achieves this task. Unfortunately, we are not able to provide an explicit description of the function $f:\{0,1\}^{N_d}\to\{0,1\}$ which maps the outcomes of the black boxes to the final random bit $k$; we merely show its existence. Such a function may be obtained through an algorithm that searches over the set of all functions until it finds one satisfying (D29). The problem with this method is that the set of all functions has size $2^{2^{N_d}}$, which makes the search computationally costly. However, this problem can be fixed by noticing that the random choice of $f$ in the proof of Lemma 4 can be restricted to a four-universal family of functions, with size polynomial in $2^{N_d}$. This observation will be developed in future work.

A more direct approach could consist of studying how the randomness in the measurement outcomes of correlations maximally violating the Mermin inequality increases with the number of parties. We solved linear optimization problems similar to those used in Theorem 1, which showed that for 7 parties Eve's predictability is $\ldots$ for a function of 5 bits defined by $f(00000)=0$, $f(01111)=0$, $f(00111)=0$ and $f(\mathbf{x})=1$ otherwise. Note that this value is lower than the earlier $\ldots$, and also that the function is different from the majority vote. We were, however, unable to generalize these results to an arbitrary number of parties, which forced us to adopt a less direct approach. Note, in fact, that our protocol can be interpreted as a huge multipartite Bell test from which a random bit is extracted by classical processing of some of the measurement outcomes.

We conclude by stressing again that randomness amplification becomes possible using non-locality because the randomness certification is achieved by a Bell inequality violation.
There already exist several protocols, both in classical and quantum information theory, in which imperfect randomness is processed to generate perfect (or arbitrarily close to perfect) randomness. However, all these protocols, e.g. two-universal hashing or randomness extractors, require additional good-quality randomness to perform such distillation. On the contrary, if the initial imperfect randomness has been certified by a Bell inequality violation, the distillation procedure can be done with a deterministic hash function (see [6] or Lemma 1 above). This property makes Bell-certified randomness fundamentally different from any other form of randomness, and is the key to the success of our protocol.

[1] M. Santha and U. V. Vazirani, in Proc. 25th IEEE Symposium on Foundations of Computer Science (FOCS-84), 434 (IEEE Computer Society, 1984).
[2] R. Colbeck and R. Renner, Free randomness can be amplified, Nature Phys., 450 (2012).
[3] N. D. Mermin, Extreme quantum entanglement in a superposition of macroscopically distinct states, Phys. Rev. Lett., 1838 (1990).
[4] D. N. Klyshko, Phys. Lett. A, 399 (1993); A. V. Belinskii and D. N. Klyshko, Physics-Uspekhi, 653 (1993); N. Gisin and H. Bechmann-Pasquinucci, Phys. Lett. A, 1-6 (1998).
[5] R. Canetti, Proc. 42nd IEEE Symposium on Foundations of Computer Science (FOCS), 136 (2001).
[6] L. Masanes, Universally-composable privacy amplification from causality constraints, Phys. Rev. Lett., 140501 (2009).
[7] J. Barrett, L. Hardy and A. Kent,