[PDF] Adjoint method and inverse design for nonlinear nanophotonic devices

Abstract

The development of inverse design, where computational optimization techniques are used to design devices based on certain specifications, has led to the discovery of many compact, non-intuitive structures with superior performance. Among various methods, large-scale, gradient-based optimization techniques have been one of the most important ways to design a structure containing a vast number of degrees of freedom. These techniques are made possible by the adjoint method, in which the gradient of an objective function with respect to all design degrees of freedom can be computed using only two full-field simulations. However, this approach has so far mostly been applied to linear photonic devices. Here, we present an extension of this method to modeling nonlinear devices in the frequency domain, with the nonlinear response directly included in the gradient computation. As illustrations, we use the method to devise compact photonic switches in a Kerr nonlinear material, in which low-power and high-power pulses are routed in different directions. Our technique may lead to the development of novel compact nonlinear photonic devices.

Full PDF

AAdjoint method and inverse design for nonlinear nanophotonic devices

Tyler W. Hughes, ∗ Momchil Minkov, ∗ Ian A. D. Williamson, and Shanhui Fan † Department of Electrical Engineering, and Ginzton Laboratory, Stanford University, Stanford, CA 94305, USA (Dated: November 6, 2018)The development of inverse design, where computational optimization techniques are used todesign devices based on certain speciﬁcations, has led to the discovery of many compact, non-intuitive structures with superior performance. Among various methods, large-scale, gradient-basedoptimization techniques have been one of the most important ways to design a structure containinga vast number of degrees of freedom. These techniques are made possible by the adjoint method,in which the gradient of an objective function with respect to all design degrees of freedom canbe computed using only two full-ﬁeld simulations. However, this approach has so far mostly beenapplied to linear photonic devices. Here, we present an extension of this method to modelingnonlinear devices in the frequency domain, with the nonlinear response directly included in thegradient computation. As illustrations, we use the method to devise compact photonic switches in aKerr nonlinear material, in which low-power and high-power pulses are routed in diﬀerent directions.Our technique may lead to the development of novel compact nonlinear photonic devices.

In recent years, there has been signiﬁcant interest inusing computational optimization tools to design novelnanophotonic devices with a wide range of applications[1–18]. Much of this progress [1–12] is made possible bythe adjoint method [18, 19], a technique which allows thegradient of an objective function to be computed withrespect to an arbitrarily large number of degrees of free-dom using only two full-ﬁeld simulations. This methodmakes large-scale gradient-based design of electromag-netic structures possible. When compared to brute forcesearching through the parameter space [13], and othercommonly used design methods, like stochastic global op-timization algorithms [14–17], gradient-based design hasa number of practical advantages. For example, a verylarge number of design parameters can be adjusted simul-taneously, and the number of structures one is requiredto evaluate in order to reach a high-performing structurecan be far smaller compared with the total number ofstructures in the search space.Up to now, in photonics the adjoint method has beenmostly applied to gradient-based optimization of linearoptical devices. The generalization of the adjoint methodto nonlinear optical devices would create new possibilitiesin several exciting ﬁelds such as on-chip lasers [20], fre-quency combs [21], spectroscopy [22], neural computing[23], and quantum information processing [24]. To thisend, several recent works [25–27] have applied adjointmethods to engineer linear devices to display favorableproperties for nonlinear optical applications, such as highquality factors, small mode volume, or large ﬁeld overlapbetween the modes of interest. However, these works donot directly optimize the nonlinear systems.To solve for the adjoint sensitivity of a nonlinear sys-tem, the standard option is to work within a time-domainadjoint formalism, which entails simulating an additionallinear system with a time-varying permittivity [3]. How-ever, as this formalism requires the storage of the ﬁelds ateach time step, it has substantial memory requirements. Furthermore, because in many cases the steady-state be-havior of the system is of interest, a frequency-domainapproach is preferred as the steady state response canbe obtained directly, without the need for going througha large number of time steps as in a time-domain simu-lation. The general mathematical formalism for the ad-joint method in nonlinear systems is known in the appliedmathematics literature [28]. But, with the exception of avery recent preprint that seeks to design a nonlinear ele-ment in an optical neural network [23], such a formalismhas not been previously applied to nonlinear photonicdevice optimizations.In this work we outline, in detail, how the adjointmethod may be used to optimize the steady-state re-sponse of a nonlinear optical device in the frequency do-main. We ﬁrst outline a formalism for generalizing ad-joint problems to arbitrary nonlinear problems. Then,as a demonstration, we use our method to inverse-designphotonic switches with Kerr nonlinearity. Our resultsmay be applied more generally to other objective func-tions and sources of nonlinearity and provides new pos-sibilities for designing novel nonlinear optical devices.

NONLINEAR ADJOINT METHOD

We ﬁrst outline the formulation of the adjoint methodfor the inverse design of nonlinear optical devices. Thegoal of inverse design is to ﬁnd a set of real-valued designvariables ϕ that maximize a real-valued objective func-tion L = L ( e , e ∗ , ϕ ), where the complex-valued vector e is given by the solution to the equation f ( e , e ∗ , ϕ ) = 0 . (1)For example, eq. (1) may represent the steady-stateMaxwell’s equations with an intensity-dependent permit-tivity distribution where e is the electric ﬁeld distribu-tion. The solution to eq. (1) may be found with any non-linear equation solver, such as with the Newton-Raphson a r X i v : . [ phy s i c s . op ti c s ] N ov method [29]. We further note that the treatment of e and its complex conjugate as independent variables isnecessary for diﬀerentiation as will be shown later.The aim of the optimization is to maximize the ob-jective function with respect to the design variables ϕ .For this purpose, it is essential to compute the sensitivityof L with respect to each element of ϕ . For simplicity,we derive the derivative of the objective function withrespect to a single parameter ϕ , which is written d L dϕ = ∂ L ∂ϕ + ∂ L ∂ e d e dϕ + ∂ L ∂ e ∗ d e ∗ dϕ . (2)Or, in matrix form as d L dϕ = ∂ L ∂ϕ + (cid:2) ∂ L /∂ e ∂ L /∂ e ∗ (cid:3) (cid:20) d e /dϕd e ∗ /dϕ (cid:21) . (3)To compute d e /dϕ and d e ∗ /dϕ , we diﬀerentiate eq. (1): d f dϕ = 0 = ∂ f ∂ϕ + ∂ f ∂ e d e dϕ + ∂ f ∂ e ∗ d e ∗ dϕ . (4)Eq. (4) together with its complex conjugate then yields (cid:20) ∂ f /∂ e ∂ f /∂ e ∗ ∂ f ∗ /∂ e ∂ f ∗ /∂ e ∗ (cid:21) (cid:20) d e /dϕd e ∗ /dϕ (cid:21) = − (cid:20) ∂ f /∂ϕ∂ f ∗ /∂ϕ (cid:21) . (5)Thus, formally we can rewrite eq. (3) as d L dϕ = ∂ L ∂ϕ − (6) (cid:2) ∂ L /∂ e ∂ L /∂ e ∗ (cid:3) (cid:20) ∂ f /∂ e ∂ f /∂ e ∗ ∂ f ∗ /∂ e ∂ f ∗ /∂ e ∗ (cid:21) − (cid:20) ∂ f /∂ϕ∂ f ∗ /∂ϕ (cid:21) . In analogy with the linear adjoint method, we can nowcompute the gradient by solving an additional linear sys-tem. We deﬁne a complex-valued adjoint ﬁeld e aj as thesolution to (cid:20) ∂ f /∂ e ∂ f /∂ e ∗ ∂ f ∗ /∂ e ∂ f ∗ /∂ e ∗ (cid:21) T (cid:20) e aj e ∗ aj (cid:21) = − (cid:34) ∂ L /∂ e T ∂ L /∂ e ∗ T (cid:35) , (7)and the gradient of the objective function is then d L dϕ = ∂ L ∂ϕ + 2 R (cid:18) e Taj ∂ f ∂ϕ (cid:19) , (8)where R denotes taking the real part. In deriving eq.(8), we have used the fact that both L and ϕ are real. Inthe case of multiple parameters ϕ , we can simply replace ∂ f /∂ϕ with the matrix ∂ f /∂ ϕ . Since e aj only needs tobe solved once regardless of the number of parameters,gradients may be computed with very little marginal costfor an arbitrary number of free parameters, making large-scale, gradient-based optimization possible. APPLICATION TO KERR NONLINEARITY

We now apply the general formalism as discussed aboveto the optimization of nonlinear optical systems. Sincethe formalism is applicable to linear optical systems aswell, for illustration purposes here we use it to treatboth the linear and the nonlinear cases, in order to high-light aspects that are unique to nonlinear systems. Aschematic outlining the two cases is presented in Fig. 1.For a linear system, Maxwell’s equations for the steady-state at a frequency ω may be written as µ − ∇ × ∇ × E ( r ) − ω (cid:15) (cid:15) r ( r ) E ( r ) = − iω J ( r ) , (9)where E ( r ) is the electric ﬁeld, J ( r ) is the electric currentsource, (cid:15) r ( r ) is the relative dielectric permittivity, and wehave assumed relative permeability µ r = 1 everywhere.Compactly, and to make connection to the general for-malism in the previous section, this can be written inmatrix form as f ( e , e ∗ , ϕ ) = A ( (cid:15) r ) e − b = 0 , (10)where A is a linear operator, vectors e and (cid:15) r now con-tain the electric ﬁelds and the relative permittivity, re-spectively, and b is a vector proportional to the currentsource. The design parameters ϕ in this case is the per-mittivity distribution (cid:15) r . Eq. (10) can be solved to ob-tain the electric ﬁelds e , as diagrammed by Fig. 1(a).We assume an objective function L that depends onthe ﬁeld solution to eq. (10) and we take the linear rel-ative permittivity distribution as the set of design vari-ables. Because ∂ f /∂ e = A and ∂ f /∂ e ∗ = 0 for the linearsystem, from eq. (7), the adjoint ﬁeld may be writtensimply as the solution to the equation A T ( (cid:15) r ) e aj = − ( ∂ L /∂ e ) T , (11)as shown in Fig. 1(b). For a reciprocal system, A T = A ,thus the original and the adjoint ﬁelds are solutions tothe same linear problem but with diﬀerent source terms.Note that the source for the adjoint ﬁeld, − ( ∂ L /∂ e ) T depends on both the objective function and the originalsolution.Once the adjoint ﬁeld is computed, the gradient of theobjective function with respect to the permittivity dis-tribution is given, through eq. (8), by d L d (cid:15) r = ∂ L ∂ (cid:15) r + 2 R (cid:18) e Taj ∂A∂ (cid:15) r e (cid:19) (12)= ∂ L ∂ (cid:15) r − ω (cid:15) R (cid:0) e Taj e (cid:1) . (13)Having reviewed the adjoint formalism for linear opti-cal systems we now consider nonlinear optical systems.As an example, we introduce Kerr nonlinearity into thesystem [30], which corresponds to an intensity-dependentpermittivity˜ (cid:15) r ( r ) = (cid:15) r ( r ) + 3 ω (cid:15) χ (3) ( r ) | E ( r ) | , (14) Figure 1. Illustration of the adjoint ﬁeld computation for alinear and a nonlinear system. (a) The linear system drivenby a point source b with an objective function L given by theﬁeld intensity at a measuring point. (b) The adjoint prob-lem for the linear system: the same system driven by a pointsource given by − ∂ L /∂ e located at the measuring point. (c)The nonlinear system containing a medium with Kerr nonlin-earity (red). The electric ﬁelds are the solution to a nonlinearequation. (d) The adjoint problem for the nonlinear system,which is a linear system of equations for the adjoint ﬁeld andits complex conjugate. The Kerr medium is replaced by a lin-ear region whose permittivity depends on the nonlinear ﬁelds. where χ (3) ( r ) is the nonlinear susceptibility distribution.Other types of nonlinear terms can also be treated withthe formalism outlined above. Replacing (cid:15) r ( r ) in Eq. (9)with ˜ (cid:15) r ( r ) in Eq. (14), our system is then described bythe equation: f ( e , e ∗ , ϕ ) = A nl ( (cid:15) r , χ , e ) e − b = 0 , (15)where A nl ≡ (cid:2) A ( (cid:15) r ) − diag (cid:0) χ (cid:12) | e | (cid:1)(cid:3) . Here, (cid:12) iselement-wise vector multiplication and diag( v ) repre-sents a diagonal matrix with vector v on the main diag-onal. The vector χ corresponds to the term 3 ω (cid:15) χ (3) ( r )and | e | ≡ e (cid:12) e ∗ . Again, for concreteness, the designparameters ϕ correspond to the permittivity (cid:15) r . Thesolution to this problem is diagrammed in Fig. 1(c).From eq. (7) we may now compute the partial deriva-tives of f with respect to the electric ﬁelds e , which isneeded to construct the adjoint problem. ∂ f /∂ e = A − (cid:0) χ (cid:12) | e | (cid:1) (16) ∂ f ∗ /∂ e = − diag ( χ (cid:12) e ∗ (cid:12) e ∗ ) . (17)With this, we then express the adjoint ﬁeld as a solutionto the linear system( ∂ f /∂ e ) T e aj + ( ∂ f ∗ /∂ e ) T e ∗ aj = − ( ∂ L /∂ e ) T , (18)which is diagrammed in Fig. 1(d).For a nonlinear system, to obtain the ﬁeld e , one willneed to solve a nonlinear equation (e.g. eq. (15)). How-ever, we emphasize that the adjoint problem, as required to determine the derivative of the objective function, is a linear problem. The size of the adjoint problem is twiceas large as the corresponding linear problem of Eq. (10),but it is of a similar form, with the source dependentupon the solution e .Once the adjoint ﬁeld is computed, the gradient d L /d (cid:15) r is evaluated from eq. (13) as in the linear case.Here for simplicity we do not assume any explicit depen-dence of the nonlinearity on the design variable. How-ever, the formalism is straightforward to extend to thatcase, as explained in the Supplementary Information. INVERSE DESIGN OF OPTICAL SWITCHES

We now demonstrate the use of this nonlinear adjointformalism to inverse design optical switches with desiredpower-dependent performance characteristics. In Figs. 2and 3, we show the optimization procedures and perfor-mance characteristics of a 1 → → µ m.For each device, we seek to maximize the correspond-ing objective function with respect to the permittivitydistribution within a ﬁxed design region. To performthe numerical optimization of the structure, we use theﬁnite-diﬀerence frequency-domain method (FDFD) [31],where the ﬁelds and operators of Eq. (9) are spatiallydiscretized using a Yee lattice [32]. For simplicity, we re-strict our study to two-dimensional structures (i.e. struc-tures with inﬁnite extent in the third dimension), andtransverse-magnetic polarization, which has only non-zero out-of-plane electric ﬁeld components. In the opti-mization process, we start with an initial relative permit-tivity in the design region. We solve the electric ﬁeld dis-tribution in the structure by solving the nonlinear equa-tion (eq. (15)). Then, we compute the gradient of L with respect to the relative permittivity distribution inthis region using eq. (13). With the gradient informa-tion, we perform updates of the design variables using thelimited-memory BFGS [33] algorithm, although a simplegradient ascent algorithm would also suﬃce. This proce-dure is repeated until convergence on a ﬁnal structure.We choose optimization parameters corresponding toa device made from chalcogenide glass (Al S ), which ex-hibits a strong χ (3) response and high damage threshold[30, 34, 35]. During the optimization, the relative per-mittivity was constrained to lie between 1 (air) and 5 . S ). We further assume that the materials exhibitnonlinearity only within the design regions outlined inFig. 2(a) and 3(a).To create a more realistic ﬁnal structure, the strengthof the nonlinear susceptibility was assumed to be propor-tional to the ”density” of material, ρ , deﬁned as ρ ( r ) = (cid:15) r ( r ) − (cid:15) m − , (19) design region linearnonlinear X (a)

5 m 10 input power (W / m)0.00.51.0 t r a n s m i ss i o n (b) low power (c)

5 m high power (d)

5 m10 | E z |/ P in | E z |/ P in Figure 2.

Inverse design demonstration of a → port switch. (a) Optical power is input into the left port (purplearrow). The goal of optimizing the design region (blue square) is to maximize power transmission in the linear regime (bluearrow) and minimize transmission in the nonlinear regime (red arrow). The ﬁnal permittivity distribution after optimizing isalso shown. The black regions are chalcogenide with a relative permittivity of 5.95 and a χ (3) of 4 . × − m / V . Thewaveguide regions outside the design region have a width of 0 . µ m. The operating wavelength is 2 µ m. (b) The transmissionas a function of input power, demonstrating the switching behavior at around 10 − W/ µ m. The dashed black line indicatesthe input power used for the high-power regime in the optimization. (c-d) The amplitude of the simulated electric ﬁeld of theﬁnal structure, in the linear (c) and nonlinear (d) regimes, respectively – with input power of 10 − W/ µ m and 0 .

157 W/ µ m,respectively. E z corresponds to the out-of-plane electric ﬁeld in the 2D simulation. where (cid:15) m is the permittivity of the material. This as-sumption ensures that air regions do not exhibit a non-linear refractive index. Eq. (19) adds an (cid:15) r dependencein the nonlinear susceptibility, which is straightforwardlytreated in the adjoint method, as discussed in the Supple-mentary Information. Low-pass spatial ﬁltering and pro-jection techniques [36] were applied during optimizationto create binarized (air and chalcogenide) ﬁnal structureswith large, smoothed features. Additional details on thisare described in the Supplementary Information.Our ﬁrst device, as shown in Fig. 2, consists of awaveguide-fed 1 → L ( e low , e high ) = (cid:12)(cid:12) m T e low (cid:12)(cid:12) − (cid:12)(cid:12) m T e high (cid:12)(cid:12) , (20)where e low and e high are the simulated ﬁelds with a lowand a high input power, respectively, m is the modal pro-ﬁle of the electric ﬁeld for the waveguide in the outputport, and the objective function is normalized such that its maximum value is 1. The optimization setup and theoptimized structure are diagrammed in Fig. 2(a). Theﬁnal structure resembles a resonator between two Braggmirrors, eﬀectively acting like a bistable switch [37, 38].Fig. 2(b) shows the transmission as a function of the in-put power, and it clearly switches from high to low as theinput power increases. This is also illustrated in panels(c)-(d), where we plot the ﬁeld amplitude distributionsin the low-power (high-transmission) regime and in thehigh-power (low-transmission) regime, respectively. Thecomputed power transmission coeﬃcients for these twopanels are 98.2% and 3.1%, respectively. The value of theinput power used in the optimization and in panel (d) isshown by a dashed line in panel (b). At this input power,the device exhibits a maximum nonlinear refractive indexshift of 4 . × − , which is below the damage thresh-old for Al S using sub-nanosecond pulses [39] (see Sup-plementary Information). The transmission spectrum ofthis structure gives a resonance peak with a full-widthat half maximum of 38GHz (see Supplementary Infor-mation). In the Supplementary Information, we also listthe speciﬁc optimization parameters, and show the valueof the objective function during the optimization process.Reaching the ﬁnal optimized structure shown in Fig. 2required the evaluation of 2000 structures, but a reason-ably high-performing structure is already reached after designregion linearnonlinear (a)

5 m 10 input power (W / m)0.00.20.40.60.81.0 t r a n s m i ss i o n (b) right portbottom portlow power (c)

5 m high power (d)

5 m10 | E z |/ P in | E z |/ P in Figure 3.

Inverse design demonstration of a → port switch. (a) Optical power is input into the left port (purplearrow). The goal of optimizing the design region (blue square) is to maximize the power transmission to the right port (bluearrow) in the linear regime and maximize transmission to the bottom port (red arrow) in the nonlinear regime. The ﬁnalpermittivity distribution after optimizing is also shown. (b) The transmission in the right (blue) and bottom (red) ports asa function of input power, demonstrating the switching behavior at around 2 × − W/ µ m. The dashed black line indicatesthe input power used for the high-power regime in the optimization. (c-d) The amplitude of the simulated electric ﬁeld of theﬁnal structure, in the linear (c) and nonlinear (d) regimes, respectively – with input power of 10 − W/ µ m and 0 .

057 W/ µ m,respectively. only a few hundred iterations.We also use the same technique for the inverse designof a 1 → L ( e low , e high ) = (cid:12)(cid:12) m T r e low (cid:12)(cid:12) − (cid:12)(cid:12) m T r e high (cid:12)(cid:12) (21) − (cid:12)(cid:12) m T b e low (cid:12)(cid:12) + (cid:12)(cid:12) m T b e high (cid:12)(cid:12) , where m r and m b denote the mode proﬁles of the waveg-uides in the right and bottom ports, respectively. Theseare normalized such that the objective function has a maximum value of 1 for a perfect switching operation.The setup of the optimization problem and the ﬁnal de-sign are diagrammed in Fig. 3(a). We note that thedevice displays a non-intuitive geometry while retaininglarge features and good binarization.In Fig. 3(b), we plot the transmission through theright and through the bottom ports as a function of in-put power. This clearly shows the switching of powerfrom the right port to the bottom port in the linear andnonlinear regimes, respectively. Speciﬁcally, in the linearregime, the device has a power transmission of 81.8% and5.9% to the right and bottom ports, respectively, whilein the nonlinear regime, at the input power marked bythe dashed line in Fig. 3(b), these values are 6.1% and80.8%, respectively. The electric ﬁeld amplitudes for lin-ear and nonlinear regimes are displayed in 3(c)-(d). Theoperational bandwidth for this device is approximately90 GHz (see Supplementary Information). The deviceexhibits a maximum nonlinear refractive index shift of3 . × − , which is also below the acceptable damagethreshold for Al S using sub-nanosecond pulses [39]. Afull list of optimization parameters and a plot of the ob-jective function vs. iteration number is shown in theSupplementary Information. DISCUSSION

We have presented an extension to the adjoint vari-able method applied to the optimization of an electro-magnetic system with Kerr nonlinearity. Our approachcan be straightforwardly applied to other types of non-linearities which do not mix frequencies, such as sat-urable gain or absorption. Moreover, the methods hereshould be straightforwardly generalizable to treat nonlin-ear problems involving frequency mixing. For example,one can imagine implementing a similar adjoint methodin combination with the multi-frequency ﬁnite-diﬀerencefrequency-domain implementations for nonlinear wave in-teractions [40].In addition to the design of optical switches, our for-malism may prove useful for many other interesting prob-lems in nonlinear photonics. For example, one could ap-ply our approach to design nonlinear elements in opti-cal neural networks [41] with speciﬁc forms of activationfunctions. Another interesting application is power reg-ulation in photonic networks. For example, as photonicnetworks for laser-driven particle accelerators [42] mustbe able to handle large input powers, it may be of interestto use our approach to design compact optical limiters inthese networks. For the purposes of exploring these andmany other potential applications, we have made pub-licly available a software package that implements thealgorithms discussed here [43].To summarize this paper, we have developed an adjointmethod, which enables gradient optimization of nonlinearphotonic devices. Our work broadens the capability ofinverse design for producing novel nonlinear devices.This work is supported by the Gordon and BettyMoore Foundation (GBMF4744); the Swiss National Sci-ence Foundation (P300P2 177721); and the Air Force Of-ﬁce of Scientiﬁc Research (AFOSR) (FA9550-17-1-0002). ∗ These authors contributed equally. † [email protected][1] Christopher M. Lalau-Keraly, Samarth Bhargava,Owen D. Miller, and Eli Yablonovitch, “Adjoint shape optimization applied to electromagnetic design,” OpticsExpress , 21693–21701 (2013).[2] Jiahui Wang, Yu Shi, Tyler Hughes, Zhexin Zhao,and Shanhui Fan, “Adjoint-based optimization of activenanophotonic devices,” Optics Express , 3236–3248(2018).[3] Y. Elesin, B. S. Lazarov, J. S. Jensen, and O. Sigmund,“Design of robust and eﬃcient photonic switches usingtopology optimization,” Photonics and Nanostructures -Fundamentals and Applications , 153–165 (2012).[4] Alexander Y. Piggott, Jan Petykiewicz, Logan Su, andJelena Vuˇckovi´c, “Fabrication-constrained nanophotonicinverse design,” Scientiﬁc Reports , 1786 (2017).[5] Alexander Y. Piggott, Jesse Lu, Konstantinos G.Lagoudakis, Jan Petykiewicz, Thomas M. Babinec, andJelena Vuˇckovi´c, “Inverse design and demonstration ofa compact and broadband on-chip wavelength demulti-plexer,” Nature Photonics , 374–377 (2015).[6] C. Y. Kao, S. Osher, and E. Yablonovitch, “Maximiz-ing band gaps in two-dimensional photonic crystals byusing level set methods,” Applied Physics B , 235–244(2005).[7] Tyler Hughes, Georgios Veronis, Kent P. Wootton,R. Joel England, and Shanhui Fan, “Method for com-putationally eﬃcient design of dielectric laser acceleratorstructures,” Optics Express , 15414–15427 (2017).[8] Sean Molesky, Zin Lin, Alexander Y. Piggott, WeiliangJin, Jelena Vuˇckovi´c, and Alejandro W. Rodriguez,“Outlook for inverse design in nanophotonics,” arXivpreprint arXiv:1801.06715 (2018), arXiv: 1801.06715.[9] O. Sigmund and J. Sondergaard Jensen, “Systematicdesign of phononic band-gap materials and structuresby topology optimization,” Philosophical Transactions ofthe Royal Society A: Mathematical, Physical and Engi-neering Sciences , 1001–1019 (2003).[10] Ren Matzen, Jakob S. Jensen, and Ole Sigmund, “Sys-tematic design of slow-light photonic waveguides,” Jour-nal of the Optical Society of America B , 2374–2382(2011).[11] Jakob S. Jensen and Ole Sigmund, “Topology optimiza-tion of photonic crystal structures: a high-bandwidthlow-loss T-junction waveguide,” Journal of the OpticalSociety of America B , 1191 (2005).[12] Louise F. Frellsen, Yunhong Ding, Ole Sigmund, andLars H. Frandsen, “Topology optimized mode multiplex-ing in silicon-on-insulator photonic wire waveguides,”Optics Express , 16866–16873 (2016).[13] Bing Shen, Peng Wang, Randy Polson, and Ra-jesh Menon, “An integrated-nanophotonics polarizationbeamsplitter with 2.4 × µ m footprint,” Nature Pho-tonics , 378–382 (2015).[14] Linfang Shen, Zhuo Ye, and Sailing He, “Design of two-dimensional photonic crystals with large absolute bandgaps using a genetic algorithm,” Physical Review B ,035109 (2003).[15] Momchil Minkov and Vincenzo Savona, “Automated op-timization of photonic crystal slab cavities.” Scientiﬁc re-ports , 5124 (2014).[16] Momchil Minkov and Vincenzo Savona, “Wide-band slowlight in compact photonic crystal coupled-cavity waveg-uides,” Optica , 631–634 (2015).[17] Yu Shi, Wei Li, Aaswath Raman, and Shanhui Fan, “Op-timization of Multilayer Optical Films with a MemeticAlgorithm and Mixed Integer Programming,” ACS Pho- tonics , 684–691 (2018).[18] Georgios Veronis, Robert W. Dutton, and Shanhui Fan,“Method for sensitivity analysis of photonic crystal de-vices,” Optics Letters , 2288–2290 (2004).[19] Michael B Giles and Niles A Pierce, “An introduction tothe adjoint approach to design,” Flow, turbulence andcombustion , 393–415 (2000).[20] Daiki Yamashita, Yasushi Takahashi, Takashi Asano,and Susumu Noda, “Raman shift and strain eﬀect inhigh-Q photonic crystal silicon nanocavity,” Optics Ex-press , 3951 (2015).[21] Yoshitomo Okawachi, Kasturi Saha, Jacob S. Levy,Y. Henry Wen, Michal Lipson, and Alexander L. Gaeta,“Octave-spanning frequency comb generation in a siliconnitride chip,” Optics Letters , 3398 (2011).[22] Joong Ho Moon, Jin Ho Kim, Ki-jeong Kim, Tai-Hee Kang, Bongsoo Kim, Chan-Ho Kim, Jong HoonHahn, and Joon Won Park, “Absolute Surface Densityof the Amine Group of the Aminosilylated Thin Layers:Ultraviolet-Visible Spectroscopy, Second Harmonic Gen-eration, and Synchrotron-Radiation Photoelectron Spec-troscopy Study,” Langmuir , 4305–4310 (1997).[23] Erfan Khoram, Ang Chen, Dianjing Liu, Qiqi Wang,and Zongfu Yu, “Stochastic optimization of nonlinearnanophotonic media for artiﬁcial neural inference,” arXivpreprint arXiv:1810.07815 (2018).[24] Xiang Guo, Chang-Ling Zou, Hojoong Jung, andHong X. Tang, “On-Chip Strong Coupling and Eﬃ-cient Frequency Conversion between Telecom and Visi-ble Optical Modes,” Physical Review Letters (2016),10.1103/PhysRevLett.117.123902.[25] Zin Lin, Xiangdong Liang, Marko Lonˇcar, Steven G.Johnson, and Alejandro W. Rodriguez, “Cavity-enhanced second-harmonic generation via nonlinear-overlap optimization,” Optica , 233 (2016).[26] Zin Lin, Marko Lonˇcar, and Alejandro W. Rodriguez,“Topology optimization of multi-track ring resonatorsand 2d microcavities for nonlinear frequency conversion,”Optics Letters , 2818 (2017).[27] Jorge Bravo-Abad, Alejandro Rodriguez, Peter Bermel,Steven G. Johnson, John D. Joannopoulos, and MarinSoljaˇci´c, “Enhanced nonlinear optics in photonic-crystalmicrocavities,” Optics Express , 16161–16176 (2007).[28] Gilbert Strang, Computational Science and Engineering (Wellesley-Cambridge Press, 2007) chapter 8.[29] William H Press, Saul A Teukolsky, William T Vetter-ling, and Brian P Flannery,

Numerical recipes 3rd edi-tion: The art of scientiﬁc computing (Cambridge univer-sity press, 2007).[30] Robert W. Boyd,

Nonlinear Optics (Academic Press,2008).[31] Wonseok Shin and Shanhui Fan, “Choice of the perfectlymatched layer boundary condition for frequency-domainmaxwell’s equations solvers,” Journal of ComputationalPhysics , 3406–3431 (2012).[32] Kane Yee, “Numerical solution of initial boundary value problems involving maxwell’s equations in isotropic me-dia,” IEEE Transactions on antennas and propagation , 302–307 (1966).[33] Richard H Byrd, Peihuang Lu, Jorge Nocedal, and CiyouZhu, “A limited memory algorithm for bound constrainedoptimization,” SIAM Journal on Scientiﬁc Computing , 1190–1208 (1995).[34] Richard T. White and Tanya M. Monro, “Cascaded Ra-man shifting of high-peak-power nanosecond pulses inas s and as se optical ﬁbers,” Optics Letters , 2351(2011).[35] Michael R. Lamont, Barry Luther-Davies, Duk-YongChoi, Steve Madden, and Benjamin J. Eggleton, “Su-percontinuum generation in dispersion engineered highlynonlinear ( γ = 10 w/m) as s chalcogenide planar waveg-uide,” Optics Express , 14938 (2008).[36] Mingdong Zhou, Boyan S Lazarov, Fengwen Wang, andOle Sigmund, “Minimum length scale in topology op-timization by geometric constraints,” Computer Meth-ods in Applied Mechanics and Engineering , 266–282(2015).[37] Marin Soljaˇci´c, Mihai Ibanescu, Steven G. Johnson, YoelFink, and J. D. Joannopoulos, “Optimal bistable switch-ing in nonlinear photonic crystals,” Phys. Rev. E ,055601 (2002).[38] Mehmet Fatih Yanik, Shanhui Fan, Marin Soljaˇci´c, andJ. D. Joannopoulos, “All-optical transistor action withbistable switching in a photonic crystal cross-waveguidegeometry,” Optics Letters , 2506 (2003).[39] Marine Chorel, Thomas Lanternier, Eric Lavastre, Nico-las Bonod, Bruno Bousquet, and J´erˆome N´eauport, “Ro-bust optimization of the laser induced damage thresholdof dielectric mirrors for high power lasers,” Optics express , 11764–11774 (2018).[40] Yu Shi, Wonseok Shin, and Shanhui Fan, “Multi-frequency ﬁnite-diﬀerence frequency-domain algorithmfor active nanophotonic device simulations,” Optica ,1256–1259 (2016).[41] Yichen Shen, Nicholas C Harris, Scott Skirlo, MihikaPrabhu, Tom Baehr-Jones, Michael Hochberg, Xin Sun,Shijie Zhao, Hugo Larochelle, Dirk Englund, and MarinSoljaˇci´c, “Deep learning with coherent nanophotonic cir-cuits,” Nature Photonics , 441 (2017).[42] Tyler W. Hughes, Si Tan, Zhexin Zhao, Neil V. Sapra,Kenneth J. Leedle, Huiyang Deng, Yu Miao, Dylan S.Black, Olav Solgaard, James S. Harris, Jelena Vuckovic,Robert L. Byer, Shanhui Fan, R. Joel England, Yun JoLee, and Minghao Qi, “On-chip laser-power delivery sys-tem for dielectric laser accelerators,” Physical ReviewApplied , 054017 (2018).[43] Tyler W Hughes, Momchil Minkov, and IanA D Williamson, “ Angler – an adjoint nonlinergradient open-source package,” https://github.com/fancompute/angler (2018). SUPPLEMENTARY INFORMATION

OPTIMIZATION DETAILS

Table S1 contains the parameters used in the inverse design demonstrations of Fig. 2 and Fig. 3 of the main text.The values of the objective function vs. iteration are shown in Fig. S1 for both the 2-port and 3-port devices. parameter symbol value (2-port) value (3-port) unitsmax relative permittivity (cid:15) m χ (3) . × − . × − m V − input power P

157 57 mW/ µ mfree space wavelength λ µ mdesign region length L

10 6 µ mdesign region height H µ mwaveguide width w

300 300 nmgrid size g 40 40 nmlow-pass ﬁlter feature size R 160 200 nmprojection strength β

100 500 -projection mid-point η o b j e c t i v e f un c t i o n (a) o b j e c t i v e f un c t i o n (b) Figure S1. Objective function vs. iteration of the optimization for (a) 2-port device of Fig. 2 and (b) 3-port device of Fig. 3of the main text.

PERMITTIVITY-DEPENDENT NONLINEAR SUSCEPTIBILITY

In the inverse design demonstration of the main text, we made the assumption that the nonlinear susceptibilitydistribution was proportional to the density of material in the design region. Here we will derive the form of theadjoint sensitivity with this modiﬁcation.With our assumption, the nonlinear susceptibility vector χ may be written in terms of the scalar magnitude of thenonlinear susceptibility χ (3) and the relative permittivity vector (cid:15) r explicitly as χ = 3 ω (cid:15) χ (3) (cid:15) r − (cid:15) m − is a vector of all ones and (cid:15) m is the maximum relative permittivity allowed in the optimization, correspondingto the material relative permittivity.2When choosing to relative permittivity distribution as the set of design variables, ϕ = (cid:15) r , the nonlinear adjointproblem requires the calculation of the partial derivatives ∂ f /∂ e , ∂ f ∗ /∂ e , and ∂ f /∂ (cid:15) r . When choosing the form of χ from eq. (S1), the partial derivatives ∂ f /∂ e and ∂ f ∗ /∂ e are the same as derived in the main text. However, the term ∂ f /∂ (cid:15) r takes on a more complicated form given by ∂ f ∂ (cid:15) r = ∂∂ (cid:15) r (cid:20) A ( (cid:15) r ) e − ω (cid:15) χ (3) (cid:15) r − (cid:15) m − (cid:12) e (cid:12) e (cid:12) e ∗ − b (cid:21) (S2)= ∂A∂ (cid:15) r e − ω (cid:15) χ (3) (cid:15) m − (cid:0) e (cid:12) | e | (cid:1) (S3)= diag ( e ) − ω (cid:15) χ (3) (cid:15) m − (cid:0) e (cid:12) | e | (cid:1) , (S4)where we make use of the fact that (cid:16) ∂A∂ (cid:15) r (cid:17) ijk = δ ij δ jk , where δ is the Kronecker delta.Thus, while the adjoint ﬁeld e aj will have the same form as in the main text, when computing the sensitivity asin eq. (8) in the main text, one must insert the form of ∂ f /∂ (cid:15) r from eq. (S4). This is in contrast to the usual casewhere the nonlinear susceptibility is ﬁxed, ∂ f /∂ (cid:15) r = diag ( e ). MAINTAINING MINIMUM FEATURE SIZE AND BINARIZATION

To create realistic devices with suﬃciently large minimum feature sizes and binarized permittivity distributions,we employed ﬁltering and projection schemes during our optimization. These schemes are discussed in great detail in[36] and related works.Rather than updating the permittivity distribution directly, one may instead choose to update a design density ρ ,which varies between 0 and 1 within the design region. To create a structure with larger feature sizes, a low pass ﬁltercan be applied to ρ to created a ﬁltered density, labelled ˜ ρ :˜ ρ i = (cid:80) j ∈D W ij ρ j (cid:80) j ∈D W ij , (S5)where D denotes the design region, and W is the spatial ﬁlter, deﬁned for a feature size of R as W ij = (cid:40) R − | r i − r j | if | r i − r j | ≤ R , otherwise (S6)with | r i − r j | being the distance between points i and j . This deﬁnes a low-pass spatial ﬁlter on ρ with the eﬀect ofsmoothing out features with length scale below R .Now, for binarization of the structure, a projection scheme is used to recreate the ﬁnal permittivity from the ﬁltereddensity. For this we deﬁne ¯ ρ as the projected density, which is created from ˜ ρ as¯ ρ i = tanh ( βη ) + tanh ( β [˜ ρ i − η ])tanh ( βη ) + tanh ( β [1 − η ]) . (S7)Here, η is a parameter between 0 and 1 that controls the mid-point of the projection, typically 0.5, and β controlsthe strength of the projection, typically around 100.The relative permittivity can then be determined from ¯ ρ as (cid:15) r = ( (cid:15) m − ¯ ρ + 1 , (S8)where (cid:15) m is the maximum permittivity.The eﬀect of these ﬁltering and the binarization techniques on a sample permittivity set is illustrated in Fig. S2.In the optimizations of the main text, these techniques were performed only within the design region and requiredminimal modiﬁcations to the adjoint sensitivity. The determination of ∂ (cid:15) r /∂ ¯ ρ , ∂ ¯ ρ /∂ ˜ ρ , and ∂ ˜ ρ /∂ ρ were required tocompute the derivatives of the objective function with respect to the underlying ρ . For more details, see the softwarepackage accompanying this work [43].3 p i x e l s ( y ) design density ( ) p i x e l s ( y ) filtered density ( ) p i x e l s ( y ) projected density ( ) p i x e l s ( y ) rel. permittivity ( r ) Figure S2. Filtering and projection of an example design density, ρ . (top left) the original density ρ before processing. (topright) the density after applying a low pass ﬁlter, ˜ ρ . (bottom left) the density ¯ ρ after applying projection. (bottom right) theﬁnal relative permittivity distribution (cid:15) r . The parameters used are R = 200nm, β = 100, η = 0 . NONLINEAR INDEX SHIFT

Here we estimate the maximum nonlinear index shift of chalcogenide (Al S ) materials. Based on [30, 34, 35], Al S has a nonlinear index ( n ) between 3 × − and 2 × − cm /W. From [39], the damage threshold is 2.5 J/cm .At a pulse duration of 100 ps, this damage threshold corresponds to 2 . × W/cm . Together with the nonlinearrefractive index, the maximum refractive index shift sustainable by the material is approximately∆ n = n I damage ≈ . × − − × − (S9)with a corresponding pulse bandwidth (for a bandwidth-limited Gaussian pulse) of ≈ . × − and their objective functions have FWHM bandwidths above10 GHz. This suggests that they should exhibit their desired switching eﬀects without damage using pulse durationson the order of 100 ps and input powers on the order of 100 mW/ µ m. TRANSMISSION SPECTRA