Scalability of all-optical neural networks based on spatial light modulators
Ying Zuo,∗ Yujun Zhao,∗ You-Chiuan Chen, Shengwang Du,† and Junwei Liu‡
Department of Physics, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
Department of Physics, The University of Texas at Dallas, Richardson, Texas 75080, USA
(Dated: February 22, 2021)

Optical implementation of artificial neural networks has been attracting great attention due to its potential for parallel computation at the speed of light. Although all-optical deep neural networks (AODNNs) with a few neurons have recently been demonstrated experimentally with acceptable errors, the feasibility of large-scale AODNNs remains unknown because errors might accumulate inevitably with an increasing number of neurons and connections. Here, we demonstrate a scalable AODNN with programmable linear operations and tunable nonlinear activation functions. We verify its scalability by measuring and analyzing the errors propagating from a single neuron to the entire network. The feasibility of AODNNs is further confirmed by recognizing handwritten digits and fashion products.
I. INTRODUCTION
In the last decades, artificial neural networks have grown into one of the most disruptive technologies [1-3] and found success both in practical applications, such as computer vision, speech recognition, and natural language processing [4-6], and in fundamental studies, such as particle physics, condensed matter physics, and materials science [7-17]. The power of an artificial neural network comes from its large number of artificial neurons and their intensive interconnections, which require large memory size, computational complexity, and energy consumption [18]. Owing to the wave nature of light, all-optical deep neural networks (AODNNs) hold great promise for high-speed parallel computation with low energy consumption [19-25]. AODNNs were not implemented experimentally until recently, with the breakthroughs in realizing optical nonlinear activation functions using phase-change materials [26] and electromagnetically induced transparency (EIT) in atomic media [27], which explicitly removed all the principal restrictions on constructing a general-purpose AODNN. However, none of these demonstrated AODNNs has more than 22 neurons, and their scalability remains an open question.

Different from a universal digital electronic implementation of artificial neural networks, an optical neural network is usually designed and built for a specific given task [28-30], and one has to rebuild a completely new optical neural network even for a slightly different task that shares a similar structure. In addition, errors from the imperfections of optical signals and components may accumulate exponentially with the number of neurons and completely ruin the performance of a large optical neural network, even though large neural networks in general have a large tolerance for random local errors from individual neurons [31].

∗ These authors contributed equally to this work.
† [email protected] ‡ [email protected] In this work, we demonstrate a scalable AODNN basedon free-space Fourier optics and EIT nonlinear activa-tion functions. The programmable linear transforma-tions in our AODNNs are realized using spatial lightmodulators[32] (SLMs) and optical lenses. We experi-mentally show that local random errors can be reducedby increasing the area of SLM ( S ) per neuron. Moreoverthe total error in the linear transformation only accu-mulates with the deviation proportional to square rootof the number of neurons ( M ). In addition, the nonlin-ear activation functions realized with EIT in cold atomsare spatially separated, thus their errors are independentand accumulates similarly to the linear transformation(with the deviation proportional to square root of theirnumber). Therefore this AODNN can be scaled up tolarge size since effect of this kind of slowly accumulatederrors can be easily compensated by increasing the num-ber of hidden neurons[31]. We confirm the feasibility andprogrammability by experimentally constructing a fully-connected AODNN with 174 optical neurons, and ap-ply it to recognize handwritten digits and fashions, withclassification rates of 81.8% and 71.3%, respectively. Fur-thermore, we clearly show its scalability using accuratenumerical simulations, where the classification rates in-deed increase with the number of optical neurons evenwith large random error. II. OPTICAL NEURONS AND INTERLAYERCONNECTIONS.
Fig. 1 shows the schematics of optical neurons and their interconnections between two adjacent layers. As the building blocks of an artificial neural network, neurons receive input signals u_j^in, conduct a linear transformation z_i = Σ_j W_ij u_j^in, pass the results to a nonlinear activation function f_i(·), and produce the output signals u_i^out = f_i(z_i), as shown in Fig. 1(a). In our AODNN, signals are encoded in the power of light, and computations are conducted through light propagation, diffraction, and interference. In detail, we perform a linear
FIG. 1. Optical neurons and connections. (a) Schematics of optical neurons and connections between two adjacent layers in an AODNN. The M nodes u_j^in form the input layer. W_{N×M} is the N × M matrix of linear connections. f_i(z_i) are the N outputs after the nonlinear activation functions f_i(·). (b) Optical implementation of the linear matrix operation W_{N×M}. An SLM, placed on the back focal plane of an optical lens, splits each input optical node into multiple directions. The lens performs a Fourier transform and sums the light beams along the same direction onto a spot on its front focal plane. (c) EIT nonlinear optical activation function with a three-level atomic medium (top panel) and EIT transmission spectra of the probe beam (bottom panel). The transmission of the probe beam power I_p,in is nonlinearly controlled by the power I_c of the control light. The relevant Rb atomic energy levels are |1⟩ = |5S_{1/2}, F = 2⟩, |2⟩ = |5S_{1/2}, F = 3⟩, and |3⟩ = |5P_{1/2}, F = 3⟩. The black dashed line in the bottom panel is obtained without the control beam and shows maximum absorption on resonance at zero probe detuning. The solid lines (blue and green) are two typical EIT spectra with different control light powers.

transformation using an SLM and a Fourier lens, as shown in Fig. 1(b). The input light beams (u_1^in ∼ u_M^in) are incident on different parts of the SLM, which is placed at the back focal plane of the lens. By controlling the multiple phase gratings in a certain area of the SLM, the input beams u_j^in are diffracted into different directions with powers W_ij u_j^in. These weighted beams from different neurons are then focused onto the same spot on the front focal plane of the lens, completing the linear summation z_i = Σ_j W_ij u_j^in.
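In software terms, the two optical steps of Fig. 1(b) amount to a single matrix-vector product. A minimal NumPy sketch of this correspondence (the sizes and weight values below are arbitrary illustrations, not the experimental parameters):

```python
import numpy as np

rng = np.random.default_rng(0)

M, N = 144, 20           # numbers of input and output nodes (example sizes)
u_in = rng.random(M)     # input signals encoded in the powers of M light beams
W = rng.random((N, M))   # diffraction weights programmed on the SLM

# Step 1 (SLM): each input beam j is split into N weighted beams y[i, j] = W[i, j] * u_in[j]
y = W * u_in             # broadcasting gives shape (N, M)

# Step 2 (lens): beams diffracted into the same direction are summed
# onto one focal spot, z_i = sum_j W_ij * u_j^in
z = y.sum(axis=1)

assert np.allclose(z, W @ u_in)  # the two optical steps equal one matrix product
```

The point of the sketch is that the split-then-sum sequence performed optically is exactly the matrix product z = W u^in, computed for all N outputs in parallel.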
It is worth noting that the computation time here is independent of the number of neurons and interlayer connections; it is determined by the propagation time of light from the SLM to the front focal plane of the lens. The wave nature of light makes it possible to perform linear transformations with massive parallelism at the speed of light. Moreover, different from many other systems, the layer structure in our system, including the number of neurons and the weight matrix, is fully programmable.

The optical nonlinear activation functions are realized using the EIT effect [33, 34] in cold Rb atoms prepared in a two-dimensional (2D) magneto-optical trap (MOT) [35, 36]. As shown in the top panel of Fig. 1(c), a pair of counter-propagating probe (ω_p) and control (ω_c) beams are shone on the atomic medium. The relevant Rb atomic energy levels are labelled |1⟩, |2⟩, and |3⟩, and the atoms are prepared in the state |1⟩. The circularly (σ+) polarized probe laser is on resonance with the transition |1⟩ → |3⟩, and the circularly (σ+) polarized control laser is on resonance with the transition |2⟩ → |3⟩. Without the control light, the atomic medium is opaque to the probe beam due to its on-resonance absorption. In the presence of the control beam, the atomic medium becomes transparent, and the probe transmission depends on the power of the control light. The bottom panel of Fig. 1(c) shows typical probe transmission spectra for different control light powers. The nonlinear dependence of the output probe power (I_p^out) on the control laser power (I_c) is described by I_p^out = f(I_c) = I_p^in exp[−OD · 4γ12γ13/(Ω_c² + 4γ12γ13)], where I_p^in is the input probe beam power, Ω_c is the Rabi frequency of the control beam, and Ω_c² is proportional to the control beam intensity I_c. Here, γ13 is the dephasing rate of the excited state |3⟩. The ground-state dephasing rate γ12 can be engineered by applying an external magnetic field.
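The transmission expression above can be evaluated directly. A short sketch, with all parameter values (OD, γ12, γ13, and the proportionality constant between Ω_c² and I_c) chosen purely for illustration rather than taken from the experiment:

```python
import numpy as np

def eit_activation(I_c, I_p_in=1.0, OD=20.0, gamma12=0.1, gamma13=1.0, k=1.0):
    """EIT nonlinear activation: probe transmission vs. control power.

    Omega_c^2 = k * I_c (Rabi frequency squared proportional to control
    intensity); OD, gamma12, gamma13 are in consistent, arbitrary units.
    """
    omega_c_sq = k * np.asarray(I_c, dtype=float)
    return I_p_in * np.exp(-OD * 4 * gamma12 * gamma13
                           / (omega_c_sq + 4 * gamma12 * gamma13))

I_c = np.linspace(0, 50, 6)       # control powers swept from zero upward
out = eit_activation(I_c)

# Transmission rises monotonically and saturates toward I_p_in
assert np.all(np.diff(out) > 0) and out[-1] < 1.0
```

At I_c = 0 the medium is nearly opaque (transmission ≈ e^(−OD)), and as I_c grows the transmission saturates toward the input probe power, giving the sigmoid-like shape that makes this response usable as an activation function.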
OD is the atomic optical depth on the probe transition and can be varied by changing the atomic density. In our optical neuron, the control light works as the input signal to the nonlinear activation function, and the output probe light is the output signal of the activation function. The EIT optical nonlinear activation function can be tuned by manipulating the cold-atom parameters OD and γ12, and the functions of different neurons are independent. Moreover, the energy cost here is very small: under our experimental conditions, the required control beam power is at the mW level with a beam diameter of 100 µm.

III. ERROR AND SCALABILITY
Clearly, the scalability of our AODNN is determined by the accuracy of the linear interlayer connections z = W_{N×M} u^in, where z (u^in) is an N (M) dimensional vector and W_{N×M} is an N × M matrix. In our optical implementation, such a linear transformation is divided into two steps: (1) weighted multiplication by the SLM: for any given j ∈ [1, M] and i ∈ [1, N], y_ji = W_ij u_j^in; and (2) summation by the lens: z_i = Σ_{j=1}^{M} y_ji. M and N are the numbers of input and output nodes. As illustrated in Fig. 1(b), the SLM is divided into M sections, and the beam on each section is split into N directions. Fig. 2(a) displays typical optical intensity distribution patterns of M = 144 input nodes on the SLM and N = 64 output nodes split from one of the input nodes. Obviously, the M input sections (u_j^in) on the SLM are independent. For a given input node u_j^in, its splittings into N weighted outputs, i.e., y_ji = W_ij u_j^in, are handled by the same section area S of the SLM and are inevitably entangled with each other.

We use the weighted Gerchberg-Saxton (GSW) algorithm [37-39] to obtain an appropriate phase pattern applied on the SLM for achieving the target weights W_ij. At each iteration of the GSW algorithm, we measure the powers of the output spots and compute the root-mean-square error (RMSE) between the measured weights and the target weights, which is taken as feedback for the next iteration (more details are shown in Appendix A). A typical example of the iteration process is shown in Fig. 2(b), where the RMSE rapidly decreases within 20 iteration steps and usually converges in less than 50 steps. To make our test experiment fair, we randomly choose the values of the weights between 0 and 1. The test result for N = 400 and S = 320 × 180 is shown in Fig. 2(c). Overall, the accuracy is very high with a small error; the error is slightly larger for target weights around zero because it is difficult to reduce the light power to zero due to the diffraction from adjacent beams. As shown in the inset of Fig. 2(c), the absolute error follows a normal distribution centered at 0 with a small standard deviation. The RMSE is proportional to √(N/S), as shown in Fig. 2(d). For small N, the RMSE is linearly proportional to √N for all different area sizes S of the input node, and the slope of the fitted line is proportional to 1/√S (more details in Supplementary Information S1). Such a dependence of the error on N and S is governed by the nature of the Fourier transformation. More details of the theoretical analysis can be found in Supplementary Information S2. As N becomes very large, the nearest diffracted spots cannot be well separated, which increases the error significantly. The threshold is indicated by the grey lines in the figure, and it clearly increases with S because the area of each diffracted spot decreases with S.

We then analyze the error accumulated in step (2), the light power summation, by comparing the measured outputs z̃_i to the target values z_i = Σ_j W_ij u_j. As shown in Fig. 2(e), for a given S, the RMSE(M, N, S) for different N is linearly proportional to √M. We also find that, for a given N, the RMSE(M, N, S) for different S is linearly proportional to √M, as shown in Fig. 2(f) (more details in Supplementary Information S1). Such a √M dependence of RMSE(M, N, S) confirms that the errors from different input nodes are random, independent, and uncorrelated, and that the total error accumulates only in a stochastic way. Combining this with the previous analysis of the error dependence on N and S, we conclude that the error of the optical linear summation increases linearly with √(MN/S).
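The adaptive GSW feedback described above can be caricatured in a few lines. The toy loop below does not propagate a physical phase pattern; it simply rescales per-spot gains toward the target weights from noisy "measurements" and tracks the RMSE, which collapses within a few tens of iterations, qualitatively like Fig. 2(b). All quantities here are illustrative stand-ins, not the experimental procedure:

```python
import numpy as np

rng = np.random.default_rng(1)

# Target weights W_ij for one input node split into N = 400 spots
target = rng.random(400)
# Adjustable per-spot gains, a crude stand-in for the SLM phase pattern;
# deliberately initialized far from the correct setting
gains = np.full_like(target, 0.2)

def measure(gains):
    # "Measured" spot powers: gains applied to the targets, with 5% multiplicative noise
    return gains * target * (1 + 0.05 * rng.standard_normal(target.size))

rmse_history = []
for _ in range(50):
    measured = measure(gains)
    rmse_history.append(np.sqrt(np.mean((measured - target) ** 2)))
    # GSW-style update: boost spots that came out too weak, damp those too strong
    gains *= (target / np.maximum(measured, 1e-12)) ** 0.5

assert rmse_history[-1] < rmse_history[0]  # the feedback loop reduces the error
```

The multiplicative noise sets an error floor that the loop cannot beat, mirroring the residual RMSE seen after convergence in the experiment.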
It is worth noting that the linear combination error mentioned above is the absolute error, which means the relative error decreases as √(N/(SM)).
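The √M accumulation law is easy to verify numerically: adding independent errors to the M weighted beams entering one summation makes the absolute error of z_i grow as √M, while the relative error shrinks as 1/√M. A quick Monte-Carlo check (the local error σ and the trial count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
sigma = 0.01     # standard deviation of the local error on each weighted beam
trials = 2000

for M in (16, 64, 256):
    y = np.ones((trials, M))                  # ideal weighted beams, W_ij * u_j = 1
    noisy = y + sigma * rng.standard_normal(y.shape)
    err = noisy.sum(axis=1) - y.sum(axis=1)   # error of z_i after the lens summation
    abs_std = err.std()
    # Independent errors accumulate stochastically: the absolute error grows as sqrt(M) ...
    assert abs(abs_std - sigma * np.sqrt(M)) < 0.2 * sigma * np.sqrt(M)
    # ... so the relative error of z_i (signal ~ M) shrinks as 1/sqrt(M)
    assert abs(abs_std / M - sigma / np.sqrt(M)) < 0.2 * sigma / np.sqrt(M)
```

This stochastic (rather than linear) accumulation is what keeps the total error manageable as the layer width grows.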
With all errors confirmed to remain local and random, the scalability of our AODNN is mainly limited by the capacity of the linear matrix operation: the number of optical neurons and their interconnections between two adjacent layers. For a given pixel size (d × d) and number of pixels (K) of the SLM, the numerical aperture (NA) of the Fourier lens, and the light wavelength λ, our analysis shows that the maximum number of connections, or the linear capacity of a single SLM-lens layer, is C_L = (M × N)_max = NA²πKd²/(4λ²) (more details in Appendix B). For the SLM (HOLOEYE PLUTO-2) used in this work, we have d = 8 µm, K = 1920 × 1080, NA = 0.05 for the lens, and the light wavelength
FIG. 2.
Scalability of the linear operation W_{N×M}. (a) Typical images of the input node vector u^in and the weighted outputs W_ij u_j^in from an individual input node u_j^in. For the purpose of illustration, we take N = 64 and M = 144. (b) Error during the GSW feedback iteration process for configuring the weights W_ij. The data are taken with S = 320 × 180 and N = 400. (c) The measured weights vs. the target weights. The inset histogram shows the error distribution. (d) The dependence of the error on √(N/S). The yellow triangles, red squares, and blue circles are the experimental RMSE data for S = 128 × …, S = 160 × …, and S = 320 × 180, respectively; the dashed lines are the fitting curves for each case. (e) and (f) The dependence of the error on √M with fixed S and fixed N, respectively.

λ = 795 nm, so our single SLM-lens layer can accommodate up to C_L = 412,287 connections. In experiment, due to the restricted size of the camera (Hamamatsu C11440-22CU), the maximum connection number we tested in Fig. 2(d) is 46,656. Our system thus still has the potential to be scaled up further.
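The quoted capacity follows directly from evaluating the formula with the stated device parameters. A quick numerical check (assuming the PLUTO-2's full-HD 1920 × 1080 pixel array):

```python
import math

d = 8e-6          # SLM pixel size (m)
K = 1920 * 1080   # number of SLM pixels (HOLOEYE PLUTO-2, full HD)
NA = 0.05         # numerical aperture of the Fourier lens
lam = 795e-9      # light wavelength (m)

# Linear capacity of one SLM-lens layer: C_L = (M x N)_max = NA^2 * pi * K * d^2 / (4 * lambda^2)
C_L = NA**2 * math.pi * K * d**2 / (4 * lam**2)
print(int(C_L))   # 412287, matching the value quoted in the text
```

Note the scaling: the capacity grows linearly with the pixel count K and quadratically with both the numerical aperture and the pixel-size-to-wavelength ratio d/λ.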
IV. AODNN APPLICATION EXAMPLES.
We confirm the feasibility and programmability of our AODNN by building a fully-connected two-layer all-optical neural network to recognize both handwritten digits and fashion images. The network structure and its optical layout are shown in Figs. 3(a) and 3(b). There are 144, 20, and 10 optical neurons in the input, hidden, and output layers, respectively. A collimated control laser beam (marked in red) with horizontal linear polarization, generated from a fiber output through lens L1, is shone on SLM1, which is placed on the back focal plane of lens L2. SLM1 is divided into 144 sections and reflects the control light to produce 144 input nodes with controllable weights through the programmed phase gratings on each section (see Supplementary Information S3 and Figs. S1-S3). An aperture stop is placed on the front focal plane of L2 to filter away unwanted high-order diffraction modes. These 144 spatial nodes are then imaged through the lens L3 onto SLM2. The combination of SLM2 and the lens L4 performs the first linear matrix operation W(1). The 20 outputs of this linear operation are then imaged onto the cold atoms in the MOT through a telescopic setup of the lenses L5 and L6. A quarter-wave plate (QWP1) converts the horizontal linear polarization into σ+ circular polarization to meet the EIT requirement. The control beam transverse mode area inside the MOT is about 400 µm². A probe laser beam (marked in blue, σ+ circularly polarized after QWP2) from a fiber output is collimated by the lens L7 with a beam diameter of 8 mm and is shone onto the MOT in the opposite direction to the control beams.
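The resulting 144-20-10 pipeline can be sketched end to end in software, in the spirit of the scalability simulations discussed later: two matrix operations with Gaussian relative weight errors sandwiching a saturating, EIT-like activation. The weights here are random rather than trained, and `eit_like` is only a stand-in for the measured activation curves:

```python
import numpy as np

rng = np.random.default_rng(3)

def eit_like(z, OD=3.0, g=0.25):
    # Stand-in for the EIT activation: saturating transmission between 0 and 1
    return np.exp(-OD * g / (z**2 + g))

def forward(x, W1, W2, sigma=0.0):
    # Gaussian *relative* errors N(0, sigma) applied multiplicatively to the
    # weight matrices, as in the scalability simulations
    W1n = W1 * (1 + sigma * rng.standard_normal(W1.shape))
    W2n = W2 * (1 + sigma * rng.standard_normal(W2.shape))
    return W2n @ eit_like(W1n @ x)

W1 = rng.random((20, 144)) / 72   # 144 input -> 20 hidden neurons
W2 = rng.random((10, 20)) / 20    # 20 hidden -> 10 output neurons
x = rng.random(144)               # one "image" flattened to 144 input powers

clean = forward(x, W1, W2, sigma=0.0)
noisy = forward(x, W1, W2, sigma=0.15)
print(np.abs(noisy - clean).max())  # output perturbation caused by 15% local errors
```

Repeating the noisy forward pass many times and scoring a trained classifier is exactly the experiment behind Fig. 5.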
FIG. 3.
A two-layer AODNN with 144 input neurons, 20 hidden neurons, and 10 output neurons. (a) The corresponding artificial neural network architecture. (b) The optical layout of the AODNN. Spatial light modulators: SLM1 (HOLOEYE LETO), SLM2 (HOLOEYE PLUTO-2), and SLM3 (HOLOEYE GAEA-2). Camera: Hamamatsu C11440-22CU. PBS: polarizing beam splitter. Lenses: L1 (f = 100 mm), L2 (f = 200 mm), L3 (f = 250 mm), L4 (f = 350 mm), L5 (f = 350 mm), L6 (f = 50 mm), L7 (f = 50 mm), L8 (f = 450 mm), and L9 (f = 450 mm).

The measured 20 nonlinear activation functions are plotted in Supplemental Information S4 and Fig. S4. The nonlinear functions of different nodes are not exactly the same, since OD, γ12, and the probe beam intensity vary spatially. The probe beam, after passing through the atomic medium spatially dressed by the control beams, contains the output spatial patterns of the 20 hidden neurons. After passing through QWP1, the probe beam nodes become vertically polarized and are reflected by the PBS. The 20 hidden neuron outputs are then imaged onto SLM3 through the lenses L6 and L8. The combination of SLM3 and L9 performs the second linear operation W(2) and generates the 10 output nodes recorded by a camera.

We first test the ability of the AODNN for handwritten digit recognition. The initial training is done with a computer-based neural network with the same network structure and nonlinear activation functions. We train the classifier using the Modified National Institute of Standards and Technology (MNIST) handwritten digit database [40] (Fig. 4(a), compressed into images with 12 × 12 resolution) and obtain the optimal parameters of the two linear matrix operations W(1) and W(2) (see Supplementary Materials MatrixElement.mat). We achieve a classification rate of 81.8%. We then reprogram the same AODNN to recognize fashion images from the Fashion-MNIST database [41] with retrained linear matrix operations (see Supplementary Materials MatrixElement.mat). We obtain a classification rate of 71.3%. To further examine the robustness and scalability of the AODNN, we perform numerical

FIG. 4.
Results of handwritten digit and fashion classifications. (a) and (d) are examples from the test sets of the 10 handwritten digits and fashion patterns. (b) and (e) are the measured normalized confusion matrices of the AODNN. (c) and (f) are the normalized confusion matrices of a computer-based neural network with the same structure as the AODNN. Both test sets have 8000 patterns.

simulations with random relative local errors generated from the Gaussian distribution N(0, σ), where σ is the standard deviation of the relative local error. As shown in Fig. 5(a), the classification rate indeed drops with the local error, as expected. However, the drop is relatively slow, and encouragingly the classification rate can still reach 71.5% for a large relative error of 0.25. We then choose a random relative error with standard deviation σ = 0.15 and vary the number of hidden neurons; as shown in Fig. 5(b), the classification rate increases with the number of hidden neurons, confirming that the effect of the accumulated errors can be compensated by enlarging the network (more details in Supplementary Information S9 and Fig. S7).

FIG. 5.
Classification rate of a two-layer AODNN simulation as a function of (a) the standard deviation of the random relative local errors and (b) the number of hidden neurons. (a) Numerical simulation of a two-layer AODNN with 144 input neurons, 20 hidden neurons, and 10 output neurons fixed. Random relative errors sampled from the Gaussian distribution N(0, σ) are added to the weight matrix elements, with σ from 0 to 0.25. (b) Numerical simulation of two-layer AODNNs with 144 input neurons and 10 output neurons fixed and a varying number of hidden neurons. Random relative errors sampled from the Gaussian distribution N(0, σ) are added to the weight matrix elements, with σ = 0.15. In both (a) and (b), we perform 1000 independent simulations for each combination of σ and the number of hidden neurons. The histograms show the distribution of classification rates over the 1000 simulations. The red circles show the average classification rate of the 1000 simulations. The squares present the classification rate of the simulation without error. The two-layer AODNNs are trained with the MNIST database.

V. CONCLUSION
Through systematic error measurement and analysis, we have shown that the AODNN based on SLMs, Fourier optics, and EIT nonlinear activation functions is scalable. We find that the errors remain local, random, and independent (∝ √(N/S)), and that the error accumulation only follows a square-root law of the network dimensions (∝ √(MN/S)); hence the effect of these errors can be easily compensated by increasing the number of hidden neurons [31]. We confirm the feasibility and programmability of the AODNN with handwritten digit and fashion image recognition, obtaining classification rates comparable to those of computer-based neural networks with the same structure. Although our experimental demonstration takes the example of a two-layer AODNN architecture with only one hidden layer, limited by the physical resources in our lab, it can be extended into a much deeper AODNN. For the next hidden layer, we can prepare atoms in the state |2⟩ in a second MOT, where the probe beams from the previous hidden layer serve as "control" signals for the EIT nonlinearity and the control beams become "probe" outputs. By carefully designing the EIT parameters (such as the atom number and the ground-state dephasing rates), a functional deeper AODNN with multiple hidden layers is achievable. We can also insert optical repeaters (such as laser power amplifiers) to compensate the passive losses in a very deep network. The transverse size can simply be extended with more SLMs and lenses. Our results suggest that a large-scale AODNN is promising for light-speed optical computation and could be powerful AI hardware.

ACKNOWLEDGMENTS
Y. Zuo and Y. Zhao contributed equally to this work. Y. C. C. acknowledges the support from the Undergraduate Research Opportunities Program at the Hong Kong University of Science and Technology. J. L. acknowledges the support from the Hong Kong Research Grants Council (Project No. 16306220).
Appendix A: Weighted Gerchberg-Saxton (GSW) algorithm
The weighted Gerchberg-Saxton (GSW) algorithm [42] is widely used. We developed an adaptive GSW iteration program to optimize the phase patterns of an SLM to obtain the target weights in a linear transformation. To do so, we insert a flip mirror into the AODNN to reflect the light beams and image them onto a camera. The difference between the measured intensity pattern and the target is fed back to adjust the phase pattern of the SLM until the desired weights are achieved with acceptable errors. More details are shown in Supplementary Information S5 and Fig. S5.

Appendix B: Linear capacity of a single SLM-lens layer
We use an SLM and a Fourier lens to perform a linear matrix operation. There are K pixels in total on the SLM, and the area of each pixel is d². With M input optical neurons in total, each node occupies a rectangular area of Kd²/M. On the front focal plane of the Fourier lens with focal length f, the corresponding Fourier-transformed spot area is 4Mf²λ²/(Kd²) (see Supplementary Information S6). With the lens diameter D, the maximum number of nodes on the Fourier plane is N_max = πKD²d²/(16Mf²λ²). The linear capacity is then estimated as C_L = M × N_max = NA²πKd²/(4λ²), where NA = D/(2f) is the numerical aperture of the lens.

[1] M. I. Jordan and T. M. Mitchell, Science, 255 (2015).
[2] K. T. Butler, D. W. Davies, H. Cartwright, O. Isayev, and A. Walsh, Nature, 547 (2018).
[3] E. Alpaydin, Introduction to Machine Learning (MIT Press, 2020).
[4] D. Ciregan, U. Meier, and J. Schmidhuber, in (IEEE, 2012), pp. 3642-3649.
[5] A. Graves, A.-r. Mohamed, and G. Hinton, in (IEEE, 2013), pp. 6645-6649.
[6] Y. Goldberg, Synthesis Lectures on Human Language Technologies, 1 (2017).
[7] P. Raccuglia, K. C. Elbert, P. D. F. Adler, C. Falk, M. B. Wenny, A. Mollo, M. Zeller, S. A. Friedler, J. Schrier, and A. J. Norquist, Nature, 73 (2016).
[8] J. Carrasquilla and R. G. Melko, Nature Physics, 431 (2017).
[9] G. Carleo and M. Troyer, Science, 602 (2017).
[10] J. Liu, Y. Qi, Z. Y. Meng, and L. Fu, Phys. Rev. B, 041101(R) (2017).
[11] L. Huang and L. Wang, Physical Review B, 035105 (2017).
[12] J. Liu, H. Shen, Y. Qi, Z. Y. Meng, and L. Fu, Physical Review B, 241104(R) (2017).
[13] A. Radovic, M. Williams, D. Rousseau, M. Kagan, D. Bonacorsi, A. Himmel, A. Aurisano, K. Terao, and T. Wongjirad, Nature, 41 (2018).
[14] H. Shen, J. Liu, and L. Fu, Physical Review B, 205140 (2018).
[15] G. Torlai, G. Mazzola, J. Carrasquilla, M. Troyer, R. Melko, and G. Carleo, Nature Physics, 447 (2018).
[16] G. Carleo, I. Cirac, K. Cranmer, L. Daudet, M. Schuld, N. Tishby, L. Vogt-Maranto, and L. Zdeborová, Reviews of Modern Physics, 045002 (2019).
[17] A. M. Palmieri, E. Kovlakov, F. Bianchi, D. Yudin, S. Straupe, J. D. Biamonte, and S. Kulik, npj Quantum Information, 1 (2020).
[18] P. A. Merolla, J. V. Arthur, R. Alvarez-Icaza, A. S. Cassidy, J. Sawada, F. Akopyan, B. L. Jackson, N. Imam, C. Guo, Y. Nakamura, B. Brezzo, I. Vo, S. K. Esser, R. Appuswamy, B. Taba, A. Amir, M. D. Flickner, W. P. Risk, R. Manohar, and D. S. Modha, Science, 668 (2014).
[19] S. Jutamulia and F. Yu, Optics & Laser Technology, 59 (1996).
[20] Y. Abu-Mostafa and D. Psaltis, Scientific American, 88 (1987).
[21] L. Larger, M. C. Soriano, D. Brunner, L. Appeltant, J. M. Gutiérrez, L. Pesquera, C. R. Mirasso, and I. Fischer, Optics Express, 3241 (2012).
[22] F. Duport, B. Schneider, A. Smerieri, M. Haelterman, and S. Massar, Optics Express, 22783 (2012).
[23] T.-Y. Cheng, D.-Y. Chou, C.-C. Liu, Y.-J. Chang, and C.-C. Chen, Neurocomputing, 239 (2019).
[24] S. Jiao, J. Feng, Y. Gao, T. Lei, Z. Xie, and X. Yuan, Optics Letters, 5186 (2019).
[25] X. Sui, Q. Wu, J. Liu, Q. Chen, and G. Gu, IEEE Access, 70773 (2020).
[26] J. Feldmann, N. Youngblood, C. D. Wright, H. Bhaskaran, and W. Pernice, Nature, 208 (2019).
[27] Y. Zuo, B. Li, Y. Zhao, Y. Jiang, Y.-C. Chen, P. Chen, G.-B. Jo, J. Liu, and S. Du, Optica, 1132 (2019).
[28] D. Woods and T. J. Naughton, Nature Physics, 257 (2012).
[29] Y. Shen, N. C. Harris, S. Skirlo, M. Prabhu, T. Baehr-Jones, M. Hochberg, X. Sun, S. Zhao, H. Larochelle, D. Englund, and M. Soljačić, Nature Photonics, 441 (2017).
[30] X. Lin, Y. Rivenson, N. T. Yardimci, M. Veli, Y. Luo, M. Jarrahi, and A. Ozcan, Science, 1004 (2018).
[31] W. Sung, S. Shin, and K. Hwang, arXiv:1511.06488 (2015).
[32] K. Lu and B. E. Saleh, Optical Engineering, 240 (1990).
[33] S. E. Harris, Physics Today, 36 (1997).
[34] M. Fleischhauer, A. Imamoglu, and J. P. Marangos, Reviews of Modern Physics, 633 (2005).
[35] H. J. Metcalf and P. van der Straten, JOSA B, 887 (2003).
[36] S. Zhang, J. F. Chen, C. Liu, S. Zhou, M. M. T. Loy, G. K. L. Wong, and S. Du, Review of Scientific Instruments, 073102 (2012).
[37] N. Matsumoto, T. Inoue, T. Ando, Y. Takiguchi, Y. Ohtake, and H. Toyoda, Optics Letters, 3135 (2012).
[38] D. Kim, A. Keesling, A. Omran, H. Levine, H. Bernien, M. Greiner, M. D. Lukin, and D. R. Englund, Optics Letters, 3178 (2019).
[39] F. Nogrette, H. Labuhn, S. Ravets, D. Barredo, L. Béguin, A. Vernier, T. Lahaye, and A. Browaeys, Physical Review X, 021034 (2014).
[40] The MNIST database of handwritten digits, http://yann.lecun.com/exdb/mnist/.
[41] H. Xiao, K. Rasul, and R. Vollgraf, arXiv:1708.07747 (2017).
[42] R. Di Leonardo, F. Ianni, and G. Ruocco, Optics Express 15.