[PDF] Feasibility, Accuracy and Performance of Contact Block Reduction method for multi-band simulations of ballistic quantum transport

Abstract

Numerical utilities of the Contact Block Reduction (CBR) method in evaluating the retarded Green's function, are discussed for 3-D multi-band open systems that are represented by the atomic tight-binding (TB) and continuum k\cdotp (KP) band model. It is shown that the methodology to approximate solutions of open systems which has been already reported for the single-band effective mass model, cannot be directly used for atomic TB systems, since the use of a set of zincblende crystal grids makes the inter-coupling matrix be non-invertible. We derive and test an alternative with which the CBR method can be still practical in solving TB systems. This multi-band CBR method is validated by a proof of principles on small systems, and also shown to work excellent with the KP approach. Further detailed analysis on the accuracy, speed, and scalability on high performance computing clusters, is performed with respect to the reference results obtained by the state-of- the-art Recursive Green's Function and Wavefunction algorithm. This work shows that the CBR method could be particularly useful in calculating resonant tunneling features, but show a limited practicality in simulating field effect transistors (FETs) when the system is described with the atomic TB model. Coupled to the KP model, however, the utility of the CBR method can be extended to simulations of nanowire FETs.

Full PDF

FFeasibility, Accuracy and Performance of Contact Block Reduction methodfor multi-band simulations of ballistic quantum transport

Hoon Ryu,

1, 2, a) Hong-Hyun Park, Mincheol Shin, Dragica Vasileska, and Gerhard Klimeck Supercomputing Center, Korea Institute of Science and Technology Information, Daejeon 305-806,Republic of Korea. Network for Computational Nanotechnology, Purdue University, West Lafayette, Indiana 47907,USA. Department of Electrical Engineering, Korea Advanced Insistute of Science and Technology, Daejeon 305-701,Republic of Korea. Department of Electrical Engineering, Arizona State University, Tempe, Arizona 85287,USA. (Dated: 10 November 2018)

Numerical utilities of the Contact Block Reduction (CBR) method in evaluating the retarded Green’s function,are discussed for 3-D multi-band open systems that are represented by the atomic tight-binding (TB) andcontinuum k · p (KP) band model. It is shown that the methodology to approximate solutions of open systemswhich has been already reported for the single-band eﬀective mass model, cannot be directly used for atomicTB systems, since the use of a set of zincblende crystal grids makes the inter-coupling matrix be non-invertible.We derive and test an alternative with which the CBR method can be still practical in solving TB systems.This multi - band CBR method is validated by a proof of principles on small systems, and also shown to workexcellent with the KP approach. Further detailed analysis on the accuracy, speed, and scalability on highperformance computing clusters, is performed with respect to the reference results obtained by the state-of-the-art Recursive Green’s Function and Wavefunction algorithm. This work shows that the CBR methodcould be particularly useful in calculating resonant tunneling features, but show a limited practicality insimulating ﬁeld eﬀect transistors (FETs) when the system is described with the atomic TB model. Coupledto the KP model, however, the utility of the CBR method can be extended to simulations of nanowire FETs.PACS numbers: 05.60.Gg, 03.65.Fd, 61.82.FkKeywords: Quantum Transport, Contact Block Reduction, Multi-band Model, Nanoelectronics I. INTRODUCTIONA. Needs for multi-band approaches

Semiconductor devices have been continuously down-scaled ever since the invention of the ﬁrst transistor ,such that the size of the single building component ofmodern electronic devices has already reached to a fewnanometers (nm). In such a nanoscale regime, twoconceptual changes are required in the device modelingmethodology. One aspect is widely accepted where carri-ers must be treated as quantum mechanical rather thanclassical objects. The second change is the need to em-brace the multi-band models which can describe atomicfeatures of materials, reproducing experimentally veriﬁedbulk bandstuructures. While the single-band eﬀectivemass approximation (EMA) predicts bandstructures rea-sonably well near the conduction band minimum (CBM),the subband quantization loses accuracy if devices are ina sub-nm regime . The EMA also fails to predict in-direct gaps, inter-band coupling and non-parabolicity inbulk bandstructures .The nearest-neighbor empirical tight-binding (TB) andnext nearest-neighbor k · p (KP) approach are most widely a) Electronic mail: [email protected] used band models of multiple bases . The most so-phisticated TB model uses a set of 10 localized orbitalbases (s, s*, 3 × p, and 5 × d) on real atomic grids (20with spin interactions), where the parameter set is ﬁtto reproduce experimentally veriﬁed bandgaps, masses,non-parabolic dispersions, hydrostatic and biaxial strainbehaviors of bulk materials using a global minimizationprocedure based on a genetic algorithm and analyticalinsights . This sp d s ∗ TB approach can easily in-corporate atomic eﬀects such as surface roughness andrandom alloy compositions as the model is based on aset of atomic grids. These physical eﬀects have beenshown to be critical to the quantitative modeling of Res-onance Tunneling Diodes (RTDs), quantum dots, disor-dered SiGe/Si quantum wells, and a single impurity de-vice in Si bulk .The KP approach typically uses four bases on a setof cubic grids with no spin interactions . While it stillfails to predict the indirect gap of bulk dispersions sinceit assumes that all the subband minima are placed onthe Γ point, the credibility is better than the EMA sincethe KP model can still explain the inter-band physicsof direct gap III-V devices, and valence band physics ofindirect gap materials such as silicon (Si) . a r X i v : . [ c ond - m a t . m e s - h a ll ] D ec B. Contact Block Reduction method

One of the important issues in modeling of nanoscaledevices, is to solve the quantum transport problemwith a consideration of real 3-D device geometries. Al-though the Non-Equilibrium Green’s Function (NEGF)and WaveFunction (WF) formalism have been widelyused to simulate the carrier transport , the com-putational burden has been always a critical problemin solving 3-D open systems as the NEGF formalismneeds to invert a system matrix of a degree-of-freedom(DOF) equal to the Hamiltonian matrix . The Recur-sive Green’s Function (RGF) method saves the comput-ing load by selectively targeting elements needed for thematrix inversion . However, the cost can be still hugedepending on the area of the transport-orthogonal plane(cross-section) and the length along the transport di-rection of target devices . The WF algorithm alsosaves the computing load if the transport is ballistic asit doesn’t have to invert the system matrix and ﬁndinga few solutions of the linear system is enough to predictthe transport behaviors. But, the load still depends onthe size of the system matrix and the number of solutionvectors (modes) needed to describe the carrier-injectionfrom external leads . In fact, RGF and WF calcula-tions for atomically resolved nanowire ﬁeld eﬀect transis-tors (FETs) have demonstrated the need to consume over200,000 parallel cores on large supercomputing clusters .Developed by Mamaluy et al. , the Contact BlockReduction (CBR) method has received much attentiondue to the utility to save computing expense required toevaluate the retarded Green’s function of 3-D open sys-tems. The CBR method is thus expected to be a goodcandidate for transport simulations since the methoddoesn’t have to solve the linear system yet reducingthe computing load needed for matrix inversion . Themethod indeed has been extensively used such that it suc-cessfully modeled electron quantum transport in exper-imentally realized Si FinFETs , and predicted optimaldesign points and process variations in design of 10-nmSi FinFETs . However, all the successful applicationsfor 3-D systems so far, have been demonstrated only forthe systems represented by the EMA. C. Goals of this work

While the use of multi-band approaches can increasethe accuracy of simulation results, it requires more com-puting load as a DOF of the Hamiltonian matrix is di-rectly proportional to the number of bases required torepresent a single atomic (or grid) spot in the device ge-ometry. To suggest a solution to this trade-oﬀ issue, weexamine the numerical utilities of the CBR method inmulti-band ballistic quantum transport simulations, fo-cusing on multi-band 3-D systems represented by eitherof the TB or KP band model.The objective of this work is to provide detail answers to the following questions through simulations of smalltwo-contact ballistic systems focusing on a proof of prin-ciples: (1) Can the original CBR method be extended tosimulate ballistic quantum transport of multi-band sys-tems? (2) If the answer to the question (1) is “ yes (cid:48)(cid:48) ,what is the condition under which the multi-band CBRmethod becomes particularly useful?, and (3) How is thenumerical practicality of the multi-band CBR methodcompared to the RGF and WF algorithms, in terms ofthe accuracy, speed and scalability on High PerformanceComputing (HPC) clusters?

II. METHODOLOGY

In real transport problems, a device needs to be cou-pled with external contacts that allow the carrier-in-and-out ﬂow. With the NEGF formalism, this can be doneby creating an open system that is described with a non-Hermitian system matrix . Representing this systemmatrix as a function of energy, we compute the trans-mission coeﬃcient and density of states, to predict thecurrent ﬂow and charge proﬁle in non-equilibrium. Thisenergy-dependent system matrix is called the retardedGreen’s function G R for an open system (Eq. (1)). G R ( E ) = [( E + iη ) I − H o − Σ( E )] − , η → + (1)where H o is is the Hamiltonian representing the deviceand Σ is the self-energy term that couples the deviceto external leads. As already mentioned in the previoussection, the evaluation of G R is quite computationally ex-pensive since it involves intensive matrix inversions. TheCBR method, however, reduces matrix inversions withthe mathematical process based on the Dyson equation.We start the discussion revisiting the CBR method thathas been so far utilized for EMA systems. A. Revisit: CBR with EMA

The CBR method starts decomposing the device do-main into two regions: (1) the boundary region c thatcouples with the contacts, and (2) the inner region d thatdoesn’t couple to the contacts. As the self-energy termΣ is non-zero only in the boundary region, H o and Σ aredecomposed as shown in Eq. (2), where subscripts ( c , d )denote above-mentioned regions, respectively. H = (cid:20) H oc H ocd H odc H od (cid:21) , Σ = (cid:20) Σ c cd dc d (cid:21) (2)Then, G R can be evaluated with the Dyson equationdeﬁned in Eq. (3) and Eq. (4), where Σ x and G x areconditioned with a Hermitian matrix X to minimize ma-trix inversions by solving the eigenvalue problem (Eqs.(5)). FIG. 1. Schematic of the semi-inﬁnite contact to illustratethe treatment of the external contact that is normally as-sumed to be an inﬁnite chain of the slab on the device bound-ary (the outmost slab in the device domain). A − c = ( I c − G xc Σ xc ) − (3) G R ( E ) = ( I − Σ x G x ) − G x (4)= (cid:20) A − c cd − G xdc Σ xc A − c I d (cid:21) (cid:20) G xc G xcd G xdc G xd (cid:21) X = (cid:20) x c cd dc d (cid:21) , Σ x = Σ − X, (5) G x = [ EI − ( H o + X )] − = (cid:20) G xc G xcd G xdc G xd (cid:21) = (cid:88) α | Ψ α (cid:105)(cid:104) Ψ α | E − (cid:15) α + iη where (cid:15) α and Ψ α are the α th eigenvalue and eigenvec-tor of the modiﬁed Hamiltonian ( H o + X ). Here, we notethat the matrix inversion is performed only to evaluatethe boundary block A c (contact-block) for one time whilethe RGF needs to perform the block-inversion manytimes depending on the device channel length. The com-puting load for matrix inversion is thus signiﬁcantly re-duced, and the method is also free from solving a linear-system problem. Instead, the major numerical issue nowbecomes a normal eigenvalue problem for a Hermitianmatrix ( H o + X ). For the numerical practicality, it isthus critical to reduce a number of required eigenvalues,and for EMA Hamiltonian matrices, a huge reduction inthe number of required eigenvalues can be achieved viaa smart choice of the prescription matrix X .To ﬁnd the matrix X and see if it can be extended tomulti-band systems, we ﬁrst need to understand how tocouple external contacts to the device. Fig. 1 illustratesthe common approach which treats the contact as a semi-inﬁnite nanowire of a ﬁnite cross-section. Here, H B isa block matrix that represents the unit-slab along thetransport direction, and W is another block matrix whichrepresent the inter-slab coupling. The eigenfunction ofthe plane wave at the m th mode in the n th slab, Ψ ( n,m ) should then obey the Schr¨odinger equation and the Blochcondition (Eqs. (6)). ( EI − H B )Ψ ( n,m ) = W + Ψ ( n − ,m ) + W Ψ ( n +1 ,m ) , Ψ ( n +1 ,m ) = exp ( ik m L )Ψ ( n,m ) (1 ≤ m ≤ M ) (6)where k m is the plane-wave vector at the m th mode, L isthe length of a slab along the transport direction, and Mis the maximum number of plane-wave modes that canexist in a single slab and is equal to the DOF of H B .Then, the surface Green’s function G surf and self-energyterm Σ can be evaluated by converting Eqs. (6) to thegeneralized eigenvalue problem for a complex and non-Hermitian matrix . The solution for G surf and Σ areprovided in Eqs. (7), where K and Λ are shown in Eqs.(8). G surf = K [ K − ( H B − EI ) K + K − W + K Λ] − K − , Σ = W + G surf W (7) K = [Ψ (0 , Ψ (0 , . . . Ψ (0 ,M ) ] , Λ = diag [ exp ( ik L ) exp ( ik L ) . . . exp ( ik M L )] (8)In systems described by the nearest-neighbor EMA,each slab becomes a layer of common cubic grids suchthat each grid on one layer is coupled to the same gridon the nearest layer. The inter-slab coupling matrix W thus becomes a scaled identity matrix , with which thegeneral solution for G surf and Σ in Eqs. (7) can besimpliﬁed using a process described in Eq. (9) and Eq.(10). We note that previous literatures have shown onlythe simpliﬁed solution for G surf and Σ . G surf = K [ K − ( H B − EI ) K + K − W + K Λ] − K − = K [ K − ( H B − EI ) K + W + Λ] − K − = K [ − K − ( W + K Λ +

W K Λ − ) + W + Λ] − K − [ ∵ ( EI − H B ) K = W + K Λ +

W K Λ − ]= − K [ W Λ − ] − K − = − K Λ W − K − (9)Σ = W + G surf W = W + ( − K Λ W − K − ) W = − W + K Λ K − ( ∵ W + = W )= − W K Λ K − (10)The original CBR method coupled to the EMA pre-scribes the Hermitian matrix X as − W or its Hermitiancomponent (if W is complex). The new self-energy termΣ x in Eqs. (5) then becomes (Eq. (11)):Σ x = Σ − X = Σ + W = − W K Λ K − + W = − W K (Λ − I ) K − (11)where the matrix (Λ − I ) becomes zero at Γ point, whereEMA subband minima are always placed on. The result-ing new Hamitonian ( H o - W ), becomes the Hamiltonianwith the generalized V on - N eumann boundary conditionat contact boundaries. The spectra of the matrix ( H o - W ), therefore become approximate solutions of the openboundary problem, and the retarded Green’s function G R ( E ) in Eq. (4) can be thus approximated with anincomplete set of energy spectra of the Hermitian matrixnear subband minima . B. CBR with multi-band models

Regardless of the band model, the G R ( E ) in Eq. (4)can be accurately calculated with a complete set of spec-tra since it then becomes the Dyson equation (Eq. (3))itself. The important question here is then whether wecan make the CBR method be still numerically practicalfor multi-band systems such that the transport can besimulated with a narrow energy spectrum. To study thisissue, we focus on the inter-slab coupling matrix W ofmulti-band systems. A toy Si device that consists of twoslabs along the [100] direction, is used as an example forour discussion.Fig. 2 shows the device geometry and correspondingHamiltonian matrix built with the EMA, KP and TBmodel, respectively. Here, we note that the simplifyingprocess in Eq. (9) and Eq. (10) is not strictly correct ifthe inter-slab coupling matrix W is not an identity ma-trix, since, for any square matrix K and W , K − W K cannot be simpliﬁed to W if W is neither an identitymatrix nor a scaled identity matrix. When a system isrepresented with KP model, a single slab is still a layer ofcommon cubic grids as the KP approach also uses a set ofcubic grids. But, the non-zero coupling is extended up tonext-nearest neighbors such that the inter-slab couplingmatrix W is no more an identity matrix. The simpli-ﬁed solution for G surf and Σ, however, can be still usedto approximate the general solutions in Eqs. (7) sincethe coupling matrix W is diagonally dominant and in-vertible. But, the situation becomes tricky for TB sys-tems that are represented on a set of real zincblende (ZB)grids.In the ZB crystal structure, a Si unit-slab has a to-tal of four unique atomic layers along the [100] direction.Because the TB approach assumes the nearest-neighborcoupling, only the last layer in one slab is coupled tothe ﬁrst layer in the nearest slab while all the other cou-pling blocks among layers in diﬀerent slabs become zero-matrices. As described in Fig. 2, this makes the inter-slab coupling matrix W be singular such that matrixinversions become impossible. The simpliﬁed solutionfor G surf and Σ in Eq. (9) and Eq. (10) are thereforemathematically invalid, and they cannot be even usedto approximate the full solution (Eqs. (7)). A new pre-scription for X is thus needed to make the CBR methodbe still practical for ZB-TB systems, and we propose an FIG. 2. Illustration of the geometry and the Hamilto-nian matrix built for the (a) EMA, (b) KP, and (c) TB toynanowire. Arrows represent the inter-slab coupling. The sim-plifying process in Eq. (9) and Eq. (10) are not strictlyaccurate for multi-band models since the inter-slab couplingmatrix is neither an identity matrix nor a scaled identity ma-trix. Especially, the coupling matrix becomes singular in TBmodel, which indicates that the simpliﬁed solution for G surf and Σ are even mathematically invalid. alternative in Eq. (12). X = Σ( (cid:15) edge ) + Σ + ( (cid:15) edge )2 (12)where (cid:15) edge is the energetic position of the CBM (va-lence band maximum (VBM)) of the bandstructure ofthe semi-inﬁnite contact.If only a few subbands near the CBM (or VBM) ofthe contact bandstructure are enough to describe the ex-ternal contact, the prescription suggested in Eq. (12)works quite well as X is the Hermitian part of the self-energy term, such that ( H o + X ) approximates the opensystem near the edge of the contact bandstructure, Theapproximation, however, becomes less accurate if moresubbands in higher energy (in lower energy for valenceband) are involved to the open boundaries. Away fromthe band edge, subband placement becomes denser andinter-subband coupling becomes stronger. The prescrip-tion X in Eq. (12) then would not be a good choice as itonly approximate the open boundary solution near bandedges, and the CBR method thus needs more eigenspec-trums to solve open boundary transport problems. So,for example, the multi-band CBR method would not benumerically practical to simulate FETs at a high source-drain bias, since a broad energy spectrum is then neededto get an accurate solution.Before closing this section, we note that, if the inter-slab coupling matrix W is either an identity matrix ora scaled identity matrix, the prescription matrix X inEq. (12) becomes identical to the one utilized to simu-late 3-D systems in the previous literatures , where( H o + X ) approximates the open system well near everysubband minima if the system is represented by theEMA . Once G surf and Σ are determined fromthe prescription matrix X , evaluation of the transmis-sion coeﬃcient (TR) and the density of states (DOS)can be easily done . Further detailed mathemat-ics regarding derivation of TR and DOS will not be thusdiscussed here. III. RESULTS AND DISCUSSIONS

The results are discussed in two subsections. First, wevalidate the CBR method for multi-band systems withthe new prescription for X in Eq. (12). Focusing on aproof of principles, we compute the TR and DOS proﬁlesfor a toy TB and KP system, compare the result to thereferences obtained with the RGF algorithm, and suggestthe device category where the multi-band CBR methodcould be particularly practical. Second, we examine thenumerical practicality of the multi-band CBR method bycomputing TR and DOS proﬁles of a resonant tunnelingdevice and a nanowire FET. The accuracy, the speed ofcalculations in a serial mode, and the scalability on HPCclusters, are compared to those obtained with the RGFand WF algorithm. We assume a two-contact ballistictransport for all the numerical problems. A. Validation of multi-band CBR method

To validate the multi-band CBR method that has beendiscussed in the previous section, we consider two multi-band toy Si systems represented by the 10-band sp d s ∗ TB and 3-band KP approach. Here, we intentionallychoose extremely small systems to calculate a completeset of energy spectra of the Hamiltonian, with which theCBR method should produce results identical to the onesobtained by the RGF algorithm. For the TB system,the electron-transport is simulated while we calculate thehole-transport for the KP system due to a limitation ofthe KP approach in representing the Si material . TB System : Fig. 3 illustrates the TR and DOS pro-ﬁle calculated for the TB Si toy device which consistsof (2 × ×

2) (100) unit-cells ( ∼ . FIG. 3. (Electron-transport in a toy Si TB system) TR andDOS proﬁles calculated by the CBR method: Results with aprescription suggested in Eq. (12) (New method), and an oldprescription suggested for the EMA (Original method). Notethat, with the old prescription, using more energy spectradoes not improve the accuracy of the CBR solution.

With the new prescription matrix X suggested in Eq.(12), the TR and DOS proﬁle obtained by the CBRmethod become closer to the reference result as morespectrums are used, and eventually reproduce the ref-erence result with a full set of spectrums, as shown inthe left column of Fig. 3. Here, the CBR result turnsout to be quite accurate near the CBM even with 1% ofthe total spectrums, indicating that the TB-CBR methodcould be a practical approach if most of the carriers areinjected from the ﬁrst one or two subbands of the con-tact bandstructure. This condition can be satisﬁed when(1) only the ﬁrst one or two subbands in the contactbandstructure are occupied with electrons, and (2) theenergy diﬀerence between the source and drain contactFermi-level (the source-drain Fermi-window) becomes ex-tremely narrow. So, the simulation of FETs at a highsource-drain bias would not be an appropriate target ofthe TB-CBR simulations since the source-drain Fermi-window may include many subbands, and many spectramay be thus needed for accurate solutions . Instead,we propose that RTDs could be one of device categoriesfor which the TB-CBR method is particularly practical,since the Fermi-window for transport becomes extremelysmall in RTDs in some cases .The same calculation is performed again but using theold prescription X suggested for the EMA, and corre-sponding TR and DOS proﬁles are shown in the rightcolumn of Fig. 3. The CBR method still reproduce thereference result with a full set of energy spectra since theDyson equation (Eq. (4)) should always work for any FIG. 4. (Hole-transport in a toy Si KP system) TR andDOS proﬁles calculated by the CBR method: Results with aprescription suggested in Eq. (12) (New method), and an oldprescription suggested for the EMA (Original method). Notethat the accuracy of the CBR solution is similar with boththe new and old prescription. X ’s. The accuracy of the results near the CBM, how-ever, turns out to be worse than the one with the newprescription. The results furthermore reveal that the ac-curacy with 10% of the total spectra does not necessarilybecomes better than the one with 1%, indicating thatthe old prescription for X cannot even approximate thesolution near the CBM of open TB systems. KP System : The TR and DOS proﬁle of the KP Si2.0(nm) (100) cube, are depicted in Fig. 4. The struc-ture is discretized with a 0.2(nm) grid and involves acomplex Hermitian Hamiltonian of 3,000 DOF. Here, theDOF of the real-space KP Hamiltonian can be eﬀectivelyreduced with the mode - space approach . The eﬀectiveDOF of the Hamiltonian therefore becomes 500, wherewe consider 50 modes per each slab along the transportdirection. Again, we note that the VBM of the contactbandstructure is placed at -0.4(eV), and lower in energythan the VBM of Si bulk (0(eV)) due to the conﬁnementcreated by the ﬁnite cross-section.We claim that the CBR method works quite well forthe KP system, since the TR and DOS proﬁles not onlybecome closer to the reference results as more of the en-ergy spectrums are used, but also exhibit excellent ac-curacy near the VBM of the contact bandstructure asshown in Fig. 4. We, however, observe a remarkablefeature that is not found in the CBR method coupled toTB systems: The KP-CBR method shows a good accu-racy with both the old and new prescription matrix X ,which supports that the simpliﬁed solution for G surf andΣ (Eq. (9) and Eq. (10)) are still useful to approximate FIG. 5. Illustration of the geometry and potential proﬁleof a Si:P RTD that is used as the example to examine theutility for the TB-CBR method. For the potential proﬁle,1.13(eV) is used as a reference value representing the Si BulkCBM. The single donor coulombic potential that has beencalibrated by Rahman et al. (Ref. [9]), with respect to the Sibulk, is superposed to the channel potential proﬁle to considerthe sharp structural conﬁnement stemming from the singledonor. the full solution (Eqs. (7)) as discussed in the previoussection. We also claim that the utility of the KP-CBRmethod could be extended to nanowire FETs because themode-space approach reduces the DOF of the Hamilto-nian such that we save more computing cost needed tocalculate energy spectra. In the next subsection, we willcome back to this issue again.

B. Practicality of multi-band CBR method

In this subsection, we provide a detailed analysis ofthe numerical utility of the multi-band CBR method interms of the accuracy and speed. Based on discussionsin the previous subsection with a focus on a proof ofprinciples on small systems, a RTD is considered as asimulation example of TB systems, while a nanowire FETis again used as an example of KP systems to discuss thenumerical practicality of the method. The TR and DOSproﬁles obtained by the RGF and WF algorithm are usedas reference results. We note that the WF case is addedin this subsection to provide a complete and competitiveanalysis on the speed and scalability on HPC clusters.

TB System : A single phosphorous donor in host Si ma-terial (Si:P) creates a 3-D structural conﬁnement arounditself. Such Si:P quantum dots have gained scientiﬁc in-terest due to their potential utility for qubit-based logicapplications . Especially, the Stark eﬀect in Si:P quan-tum dots is one of the important physical problems, andwas quantitatively explained by previous TB studies . FIG. 6. (Electron-transport in a Si:P RTD) (a) TR and (b) DOS proﬁles calculated by the CBR method. Note that all theresonances in the range of energy are captured even with just 40 spectra that corresponds to just 0.2% of the DOF of the TBHamiltonian.FIG. 7. (Electron-transport in a Si:P RTD) TR and DOSproﬁles integrated over energy. The cumulative proﬁles of TRand DOS eﬀectively indicates the accuracy of the current andcharge proﬁles. The cumulative DOS is especially importantas it is directly coupled to charge proﬁles that are needed forcharge-potential self-consistent simulations.

The electron-transport in such Si:P systems should betherefore another important problem that needs to bestudied.The geometry of the example Si:P device is illustratedin Fig. 5. Here, we consider a [100] Si nanowire thatis 14.0(nm) long and has a 1.7(nm) rectangular cross-section. The ﬁrst and last 3.0(nm) along the trans- port direction, are considered as densely N-type dopedsource-drain region assuming a 0.25(eV) band-oﬀset inequilibrium . Then, a single phosphorous atom is placedat the channel center with a superposition of the impu-rity coulombic potential that has been calibrated for asingle donor in Si bulk by by Rahman et al. . The elec-tronic structure has a total of 1872 atoms and involves acomplex Hamiltonian matrix of 18,720 DOF.Fig. 6 shows the TR and DOS proﬁles in four cases,where the ﬁrst three cases are the CBR results with 10,20 and 40 spectra that correspond to 0.05%, 0.1%, and0.2% of the Hamiltonian DOF, and the last one is usedas a reference. Due to the donor coulombic potential,the channel forms a double-barrier system such that theelectron transport should experience a resonance tunnel-ing. As shown in Fig. 5, the CBR method produces anice approximation of the reference result such that theﬁrst resonance is observed with just 10 energy spectra.It also turns out that 40 spectra are enough to captureall the resonances that show up in the range of energy ofinterest.The accuracy of the solutions approximated by theCBR method, is examined in a more quantitative man-ner by integrating the TR and DOS proﬁle over energy.Fig. 7 illustrates this cumulative TR (CTR) and DOS(CDOS) proﬁle, which are conceptually equivalent to thecurrent and charge proﬁle, respectively. In spite of aslight deviation in absolute values, the CTR proﬁles stillconﬁrm that the CBR method captures resonances quite

FIG. 8. (Hole-transport in a Si nanowire) (a) TR and DOSproﬁles, and (b) corresponding cumulative proﬁles. The KP-CBR solutions exhibit excellent accuracy such that 200 spec-tra, which corresponds to just 2.2%, turn out to be enough toalmost reproduce the reference solutions in the entire rangeof energy of interest (0.8(eV) beyond the VBM of the wirebandsturcutrue). precisely such that the energetic positions where the TRsharply increases, are almost on top of the reference re-sult. The CDOS proﬁle exhibits much better accuracysuch that the result with 40 spectra almost reproducesthe reference result even in terms of absolute values. Weclaim that the accuracy in the CDOS proﬁle is particu-larly critical, since it is directly connected to charge pro-ﬁles that are essential for charge-potential self-consistentsimulations.

KP System : Si nanowire FETs obtained through top-down etching or bottom-up growth have attracted at-tention due to their enhanced electrostatic control overthe channel, and thus become an important target ofvarious modeling works . For KP systems, the CBRmethod could become a practical approach to solve trans-port behaviors of FET devices since the computing loadfor solving eigenvalue problems can be reduced with themode-space approach.A [100] Si nanowire FET of a 15.0(nm) long channeland a 3.0(nm) rectangular cross-section, is therefore con-sidered as a simulation example to test the performanceof the KP-CBR method. The hole-transport is simu-lated with the 3-band KP approach, where the simula-tion domain is discretized with a set of 0.2(nm) meshcubic grids and involves a real-space Hamiltonian matrixof 50,625 DOF. As the device has a total of 75 slabs alongthe transport direction, the mode-space Hamiltonian has9,000 DOF with a consideration of 120 modes per slab. Ithas been reported that the wire bandstructure obtained

FIG. 9. Speed and scalability of the multi-band CBRmethod: For the example multi-band systems of TB Si:P RTDand KP Si nanowire FET, we measure the time required tocalculate the TR and DOS per single energy point. Scalabil-ity of the calculation time is also measured to examine thenumerical practicality of the method on HPC clusters. with 120 modes per slab, becomes quite close to the fullsolution for a cross-section smaller than 5.0 × ) .The wire is assumed to be purely homogeneous such thatneither the doping nor band-oﬀset are considered.To see if the CBR method can be reasonably practi-cal in simulating the hole-transport at a relatively largesource-drain bias, we plan to cover the energy range atleast larger than 0.4(eV) beyond the VBM of the wirebandstucture. For this purpose, we compute 50, 100, and200 energy spectra that correspond to 0.5%, 1.1%, and2.2% of the DOF of the mode-space Hamiltonian, respec-tively. Fig. 8(a) shows the corresponding TR and DOSproﬁles. Here, the CBR solution not only become closerto the reference result with more spectra considered, butalso demonstrate fairly excellent accuracy near the VBMof the wire bandstructure. The CTR and CDOS proﬁlesprovided in Fig. 8(b) further support the preciseness ofthe CBR solutions near the VBM. The cumulative pro-ﬁles also support that the CBR solution covers a rela-tively wide range of energy, such that 50 energy spectraare already enough to cover ∼ ∼ TABLE I. The time required to evaluate the TR and DOSper single energy point in a serial mode, for the RTD andnanowire FET considered as simulation examples.Approaches (TB) time (s) Approaches (KP) time (s)CBR 0.05(%) 11.5 CBR 0.5(%) 4.9CBR 0.1(%) 11.8 CBR 1.1(%) 5.1CBR 0.2(%) 12.0 CBR 2.2(%) 5.9RGF 19.0 RGF 5.0WF 6.5 WF 3.4

Speed and scalability on HPCs : So far, we have dis-cussed the practicality of the multi-band CBR methodfocusing on the accuracy of the solutions for two-contact,ballistic-transport problems. Another important crite-rion to determine the numerical utility should be thespeed of calculations. We therefore measure the timeneeded to evaluate the TR and DOS per single energypoint for the TB Si:P RTD and the KP Si nanowire FETrepresented that are utilized as simulation examples. Toexamine the practicality of the multi-band CBR methodon HPC clusters, we also benchmark the scalability of thesimulation time on the

Coates cluster under the supportof the Rosen Center for Advanced Computing (RCAC)at Purdue University. The CBR, RGF, and WF methodsare parallelized with MPI/C++, the MUltifrontal Mas-sively Parallel sparse direct linear Solver (MUMPS) ,and a self-developed eigensolver based on the shift-and-invert Arnoldi algorithm . All the measurements areperformed on a 64-bit, 8-core HP Proliant DL585 G5system of 16GB SDRAM and 10-gigabit ethernet localto each node.Table I summarizes the wall-times measured for vari-ous methods in a serial mode. Generally, the simulationof the KP Si nanowire FET needs less computing loads,such that the wall-times are reduced by a factor of twowith respect to the computing time taken for the TB Si:PRTD. This is because the KP approach can represent theelectronic structure with the mode-space approach suchthat the Hamiltonian matrix has a smaller DOF (9,000),compared to the one used to describe the TB Si:P RTD(18,720).Compared to the RGF algorithm in a serial mode,the CBR method demonstrates a comparable (KP), orbetter (TB) performance. Since a single slab of theKP Si nanowire is represented with a block matrix H B (Fig. 1) of 120 DOF, the matrix inversion is not a crit-ical problem any more in the RGF algorithm such thatthe CBR method doesn’t necessarily show better per-formances than the RGF algorithm. The TB exampledevice, however, needs a H B of 720 DOF to represent asingle slab (a total of 26 slabs) so the burden for matrixinversions become bigger compared to the KP example.As a result, the CBR method generally shows better per-formances. The CBR method, however, doesn’t beat theWF method in both the TB and KP case since, in a se-rial mode, the CBR method consumes time to allocate ahuge memory space that is needed to store “full” complexmatrices via vector-products (Eqs. (5)).The strength of the CBR method emerges in a parallel mode (on multiple CPUs), where the vector-products areperformed via MPI-communication among distributedsystems and each node thus saves only a fraction of thefull matrix. The scalability of the various methods iscompared up to a total of 16 CPUs in Fig. 9. The com-mon RGF calculation can be eﬀectively parallelized onlyup to a factor of two, due to its recursive nature , andthe scalability of the WF method becomes worse in manyCPUs because it uses a direct-solver-based LU factoriza- tion to solve the linear system. As a result, the CBRmethod starts to show the best speed when more than 8CPUs are used. IV. CONCLUSION

In this work, we discuss numerical utilities of the CBRmethod in simulating ballistic transport of multi-bandsystems described by the the atomic 10-band sp d s ∗ TBand 3-band KP approach. Although the original CBRmethod developed for single-band EMA systems achievesan excellent numerical eﬃciency by approximating solu-tions of open systems, we show that the same approachcan’t be used to approximate TB systems as the inter-slab coupling matrix becomes singular. We therefore de-velop an alternate method to approximate open systemsolutions. Focusing on a proof of principles on small sys-tems, we validate the idea by comparing the TR and DOSproﬁle to the reference result obtained by the RGF algo-rithm, where the alternative also works well with the KPapproach.Since the major numerical issue in the CBR methodis to solve a normal eigenvalue problem, the numericalpracticality of the method becomes better as the trans-port can be solved with a less number of energy spectra.Generally, the practicality would be thus limited in multi-band systems, since multi-band approaches need a largernumber of spectra to cover a certain range of energy thanthe single band EMA does. We, however, claim that theRTDs could be one category of TB devices, for whichthe multi-band CBR method becomes particularly prac-tical in simulating transport, and the numerical utilitycan be even extended to FETs when the CBR methodis coupled to the KP band model. To support this argu-ment, we simulate the electron resonance tunneling in a3-D TB RTD, which is basically a Si nanowire but hasa single phosphorous donor in the channel center, andthe hole-transport of a 3-D KP Si nanowire FET. Weexamine numerical practicalities of the multi-band CBRmethod in terms of the accuracy and speed, with respectto the reference results obtained by the RGF and WF al-gorithm, and observe that the CBR method gives fairlyaccurate TR and DOS proﬁle near band edges of contactbandstructures.In terms of the speed in a serial mode, the strengthof the CBR method over the RGF algorithm depends onthe size of the Hamiltonian such that the CBR showsa better performance than the RGF as a larger block-matrix is required to represent the unit-slab of devices.But, the speed of the WF method is still better than theCBR method as the CBR method consumes time to storea full complex matrix during the process of calculations.In a parallel mode, however, the CBR method starts tobeat both the RGF and WF algorithm since the full ma-trix can be stored into multiple clusters in a distributivemanner, while the scalability of both the RGF and WFalgorithm are limited due to the nature of recursive and0direct-solver-based calculation, respectively.

ACKNOWLEDGEMENTS

H. Ryu, H.-H. Park and G. Klimeck acknowledge theﬁnancial support from the National Science Foundation(NSF) under the contract No. 0701612 and the Semicon-ductor Research Corporation. M. Shin acknowledges theﬁnancial support from Basic Science Research Programthrough the National Research Foundation of Republicof Korea, funded by the Ministry of Education, Scienceand Technology under the contract No. 2010-0012452.Authors acknowledge the extensive use of computing re-sources in the Rosen Center for Advanced Computingat Purdue University, and NSF-supported computing re-sources on nanoHUB.org. G. E. Moore,

Electronics , 114, (1965). M. Luisier, A. Schenk and W. Fichtner,

Phys. Rev. B G. Klimeck, S. S. Ahmed, H. Bae, N. Kharche, R. Rahman, S.Clark, B. Haley, S. Lee, M. Naumov, H. Ryu, F. Saied, M. Prada,M. Korkusinski and T. B. Boykin,

IEEE Trans. Elec. Dev. ,2079, (2007). Y. X. Liu, D. Z. -Y. Ting and T. C. McGill,

Phys. Rev. B ,5675, (1996). G. Klimeck, R. Lake, R. C. Bowen, C. Fernando and W. Frensley,

VLSI Design , 79, (1998). T. B. Boykin, G. Klimeck and F. Oyafuso,

Phys. Rev. B , R. C. Bowen, G. Klimeck, W. R. Frensley, R. K. Lake,

J. Appl.Phys. , 3207, (1997) N. Kharche, M. Prada, T. B. Boykin and G. Klimeck,

Appl. Phys.Lett. , 9, (2007). R. Rahman, C. J. Wellard, F. R. Bradbury, M. Prada, J. H. Cole,G. Klimeck and L. C. L. Hollenberg,

Phys. Rev. Lett. , 036403,(2007). G. P. Lansbergen, R. Rahman, C. J. Wellard, I. Woo, J. Caro,N. Collaert, S. Biesemans, G. Klimeck, L. C. L. Hollenberg andS. Rogge,

Nature Physics , 656, (2008). M. Shin,

J. Appl. Phys. , 054505, (2009). C. Pryor,

Phys. Rev. B , 7190, (1998) S. Datta,

Superlatt. Microstruct. , 253, (2000). M. St¨adele, B. R. Tuttle and K. Hess,

J. Appl. Phys. , 348,(2001). N. Kharche, G. Klimeck, D. -H. Kim, J. A. del Alamo andM. Luisier,

Proceedings of IEEE International Electron DevicesMeeting , (2009). R. Lake, G. Klimeck, R. C. Bowen and D. Jovanovic,

J. Appl.Phys. , 7845, (1996). C. Rivas and R. Lake,

Phys. Stat. Sol. (b) , 94, (2003). S. Cauley, J. Jain, C. -K. Koh and V. Balakrishnan,

J. Appl.Phys. , 123715, (2007). D. Mamaluy, M. Sabathil, T. Zibold, P. Vogl and D. Vasileska,

Phys. Rev. B , 245321, (2005). G. Klimeck and M. Luisier,

Computing in Science and Engineer-ing , 28, (2010). D. Mamaluy, D. Vasileska, M. Sabathil and P. Vogl,

Semicond.Sci. Tech , 118, (2004). H. R. Khan, D. Mamaluy and D. Vasileska,

IEEE Trans. Elec.Dev. , 784, (2007). H. R. Khan, D. Mamaluy and D. Vasileska,

IEEE Trans. Elec.Dev. , 743, (2008). H. R. Khan, D. Mamaluy and D. Vasileska,

IEEE Trans. Elec.Dev. , 2134, (2008). D. Mamaluy, M. Sabathil and P. Vogl,

J. Appl. Phys. , 4628,(2003). In principle, the 4-band KP model uses a total of four bases tomodel direct bandgap materials such as GaAs and InAs, whereone basis is used to model the conduction band and the remainingthree bases are used to model the valence band. The valenceband of indirect bandgap materials such Si, however, can be stillmodeled with three bases if VBM is at the Γ point (Ref. [11]). Both the TB and KP model considered in this work place theVBM of Si bulk at 0 (eV). Assuming that the source contact is grounded, the Fermi-windowat V DS = V, becomes [ E FD -3 kT , E FS +3 kT ] = [ E F - qV -3 kT , E F +3 kT ], where q is the single electron charge and E F is theFermi-level of the system in equilibrium. The maximum and min-imum of the window are determined at the source and drain side,respectively. E. Lind, B. Gustafson, I. Pietzonka and L. -E. Wernersson,

Phys.Rev. B , 033312, (2003). L. C. L. Hollenberg, A. S. Dzurak, C. J. Wellard, A. R. Hamilton,D. J. Reilly, G. J. Milburn and R. G. Clark,

Phys. Rev. B ,113301, (2004). The band-oﬀset between the intrinsic channel and densely dopedsource-drain leads, is taken from the work of Martinez et al. ,where the equilibrium potential proﬁle has been self-consistentlyobtained for a 14.0(nm) long [100] Si nanowire that has a 2.0(nm)rectangular cross-section and 4.0(nm) long source-drain regions(Ref. [32]). A. Martinez, N. Seoane, A. R. Brown, J. R. Barker and A.Asenov,

IEEE Trans. Elec. Dev. , 603, (2009). N. Neophytou, A. Paul, M. S. Lundstrom and G. Klimeck,

IEEETrans. Elec. Dev. , 1286, (2008). http://graal.ens-lyon.fr/MUMPS V. Mehrmann and D. Watkins,

SIAM J. Sci. Statist. Comput.22