[PDF] Localized orbitals for optimal decomposition of molecular properties

Abstract

Localized orbitals are important for modeling and interpreting complicated electronic structures of atoms and molecules in a chemically meaningful way. Here, we present the parameter-free procedure for transforming delocalized molecular orbitals (either canonical self-consistent field orbitals, or Lowdin natural orbitals obtained from a general wavefunction) into the localized property-optimized orbitals (LPOs), which can be used for building the most accurate (in the Frobenius norm sense) approximation to the first-order reduced density matrix in form of the sum of localized mono- and diatomic terms. In this way any, any one-electron molecular property can be decomposed into contributions associated with individual atoms and the pairs of atoms, with the upper bound for the decomposition acucracy known in advance due to Cauchy-Bunyakovsky-Schwarz inequality. In addition, an algorithm is proposed for obtaining 'the Chemist's LPOs' (CLPOs) set of localized orbitals containing a single orbital per a pair of electrons and forming an idealized Lewis structure with the one-electron properties which are closest to the properties obtained from the original many-electron wavefunction. The computational algorithms for constructing LPOs and CLPOs as well as their underlying atomic hybrid orbitals (AHOs and LHOs respectively) from the results of quantum-chemical calculations are presented and their implementation within the open-source freeware program JANPA (this http URL ) is discussed. The performance of the proposed orbital localization procedures is assessed using the test set of density matrices of 33432 small molecules obtained at Hartree-Fock and 2-nd order Moller-Plesset theory levels.

Full PDF

LLocalized orbitals for optimal decomposition of molecular properties

T.Yu.Nikolaienko * and L.A.Bulavin Faculty of Physics of Taras Shevchenko National University of Kyiv; 64/13, Volodymyrska Street, 01601 Kyiv, Ukraine.

Abstract

Cauchy–Bunyakovsky–Schwarz inequality. In addition, an algorithm is proposed for obtaining 'the Chemist's LPOs' (CLPOs) set of localized orbitals containing a single orbital per a pair of electrons and forming an idealized Lewis structure with the one-electron properties which are closest to the properties obtained from the original many-electron wavefunction. The computational algorithms for constructing LPOs and CLPOs as well as their underlying atomic hybrid orbitals (AHOs and LHOs respectively) from the results of quantum-chemical calculations are presented and their implementation within the open-source freeware program JANPA (http://janpa.sourceforge.net/ ) is discussed. The performance of the proposed orbital localization procedures is assessed using the test set of density matrices of 33432 small molecules obtained at Hartree-Fock and 2-nd order Moller-Plesset theory levels.

1. Introduction

Localized molecular orbitals resulting from a unitary transformation of occupied canonical molecular orbitals (MOs) play essential role in physical chemistry as the 'building blocks' or 'descriptors' through which the complicated electronic structure of atoms and molecules can be modeled or interpreted in a comprehensible way. In addition, the localized orbitals concentrated in a limited spatial region of a molecule proved useful in making the high-level correlated quantum-chemical methods more computationally tractable . Although the concept of an orbital itself has been much methodologically debatable and is too ‘fuzzy’ to be defined more precisely than just as the function of coordinates of a single electron somehow related to the system’s wavefunction or electron density, this concept still remains virtually the best one proposed so far for moving the ideas of valence electrons and electron pairs (including bonding and antibonding orbitals, lone pairs etc.), which are the key elements of the chemist's Lewis-structure picture , from qualititative to a quantum-mechanical ground. From this perspective, the localized orbitals are used to decompose (typically in an approximate manner) true many-electron wavefunction of the molecule into the components allowing a chemically meaningful interpretation. In order to transform delocalized orbitals (either the canonical MOs obtained as a solutions of self-consistent field Hartree-Fock or Kohn-Sham equations , or the Lowding natural orbitals obtained by diagonalization of the first-order density matrix corresponding to a correlated wavefunction) into the localized ones, a number of procedures have been developed . However, most of them date back to the times preceding the 'mini revolution' of 1970’s , when personal computers became widely spread among chemists, and therefore these procedures were implemented as a handy computer codes, with probably the only exception of the Natural Bond Orbital (NBO) method implemented in NBO program . Moreover, most of the localization procedures developed so far focuse on optimizing the localized orbitals for representing only one * Author to whom correspondence should be addressed: tim_mail that_typical_symbol_in_addresses ukr.net uantity, e.g., electron charge , local orbital populations , bond order , Coulomb repulsion , correlation energy contributions etc. In this paper we propose a new orbital localization procedure, along with its implementation in the open-source program JANPA , for obtaining the localized orbitals (LPOs) designed to be equally suitable for optimal decomposition of any of one-electron properties into mono- and diatomic contributions. This optimality is ensured by defining the LPOs ( ) loci r   as a set of functions, with each function concentrated at a single atom or a pair of atoms, which provides the most accurate (in a Frobenius norm sense) localized approximation ( , ) ( ) ( ) loc loc loc loci i ii r r n r r            (1) for spin independent (spin traced) first-order reduced density matrix (1-RDM)     ... ...* 2 2 2... ( , ) , ,..., , ,..., ... N NN e N N N r r N r r r r r r dr dr                          , (2) where Ψ is the molecular wavefunction, N e is the total number of electrons and σ i stands for electron spin indices. It is essential that 1-RDM, as defined by Eq. (2), contains all the necessary information about the electronic structure of the system, which is rquired to compute any of the system (spin-independent) one-electron properties defined by the quantum-mechanical operator ˆˆ ( ) e N ii

F f r     as the expectation value     ˆ ˆˆ ˆ( ) ( , ) tr r r F f r r r dr f             , (3) where ˆ  is the first-order reduced density operator (1-RDO) defined as the operator posessing 1-RDM as the kernel. In addition, many of the chemically meaningful electronic structure descriptors, such as effective atomic charges , bond orders and valencies , can be obtained from the information entirely contained in 1-RDM. Many more characteristics of the system can also be obtained from 1-RDM through more complicated non-linear and/or implicit relationships taking into account that 1-RDM 'diagonal part' ( , ) r r    equals the system's electron density ( ) r   which determines virtually all the physical properties of a many-electron system in its ground state through the density functional theory (DFT) framework . Now, if the true 1-RDO ˆ  in (3) is substituted by its localized approximation ˆ loc  , defined by (1) as the kernel, we arrive at the corresponding localized decomposition   ˆ ˆˆ ˆtr loc loc loc loc loc loc loci i i i ii i F f n f n F         of molecular property ˆ F into localized contributions ˆ loc loc loci i i F f    , each being related to a single localized orbital loci  , which in turn, can be attributed to the contribution from a certain atom or atomic pair. Although ˆ loc  is only an approximation to the true ˆ  , and hence ˆ loc F equals the true expectation value ˆ F only approximately, the discrepancy between the last two quantities is minimized as soon as difference loc    is minimized, as can be justified by the trace version of Cauchy–Bunyakovsky–Schwarz inequality implying     ˆ ˆˆ ˆ ˆ ˆ ˆ ˆtr loc loc loc F F f f           . From this inequality, ˆ ˆ loc    becomes a natural choice for the criteria to be applied in the search for the localized orbitals optimally suitable for decomposition of any of the one-electron properties nto localized contributions. Due to this fact, we shall hereinafter refer to the localized orbitals ( ) loci r   which minimize ˆ ˆ loc    as the localized property-optimal orbitals (LPOs). In the following sections we develop the procedure for obtaining LPOs from 1-RDM expressed in terms of atomic basis functions and analyze the procedure performance on the large test set of 33432 small molecules containing from 2 to 12 atoms. For simplicity only a closed-shell case is considered throughout the paper.

2. Theory

We now turn to the development of the method for building such LPOs ( ) loci r   and their corresponding coefficients loci n (see (1)) which minimize

22 , ( ) ˆ ˆ ( ) ( ) ( , ) min loc loci i loc loc loc loci i i n ri n r r r r drdr                         . (4) It is convenient first to find the optimal values of loci n and then proceed with searching for the localized orbitals ( ) loci r   . We will assume the latter to be real-valued functions, normalized to unity and orthogonal, i.e.,   , ( ) ( ) loc loc loc loci j i j ij r r            where ij  is the Kronecker delta. Given this property, the target criteria (4) can be re-expressed as  

22 2 ˆ ˆ ˆ2 ( ) ( , ) ( ) loc loc loc loc loci i i ii i n n r r r r drdr                       , (5) where ˆ  is independent of both ( ) loci r   and loci n . The minimization of the obtained expression as the function of loci n gives   ˆ( ) ( , ) ( ) , loc loc loc loc loci i i i i n r r r r drdr                 , (6) so that the coefficients loci n appear as the LPO occupancies. It should be noted that if (5) is minimized with respect to ( ) loci r   without any further constraints, this results in a well-known Löwdin natural orbitals . ( ) nat i r   , which allow for exact ( ˆ ˆ 0 nat     ) expansion of the true 1-RDM as . . . . ( , ) ( ) ( ) nat nat nat nati i ii r r n r r            . (7) Clearly, the equality ˆ ˆ 0 nat     holds strictly only if the infinite number of terms is included into sum (7). However, since in practical quantum-chemical calculations the wavefunction and/or its components are always presented in the form of expansion over a finite number of basis functions, it can be also taken that ˆ ˆ 0 nat     remains true for 1-RDM obtained from such approximate calculations provided that the number of natural orbitals equals the number of original (linearly independent) basis functions used in the calculations. Still, the expansion (7) requires the use of orbitals which are typically delocalized over the entire molecule, just like the canonical MOs. Therefore, in order to obtain localized orbitalsm certain constraints must inevitably be imposed on loci  . In the proposed method such constraints can be best expressed in terms of coefficients of expansions ( ) ( ) loc loci i r c r         (8) of localized loci  and . . ( ) ( ) nat nati i r c r         of natural (or canonical) orbitals . ( ) nat i r   over the set of atom-centered basis functions which are assumed to be real-valued, normalized to unity and orthogonal (in both intra- and interatomic sense), i.e.,   ,        . For the sake of brevity, we shall further refer to these functions as AOs. Upon introducing AOs, each of which can be attributed to a particular atom in the system, the localization of the orbital is understood as the property of expansion (8) to contain only the AOs centered at a single atom or at any of the two atoms, hereinafter referred to as an atomic pair. Applying this restriction, Eq. (8) can be rewritten as ( ) ( ) loc loci iX r c r         , where X denotes a limited set of values the summation index can take. We further introduce a specific additional constraint on the structure of desired localized orbitals. There should exist such orthonormal linear combinations, further on referred to as atomic hybrid orbitals (AHOs) or simply 'hybrids', ( ) ( ) ( ) Ai iA h r r          (9) of the AOs   on each atom that any two-center orbital localized at the atomic pair A–B (typically representing a covalent bond between the atoms) are expressible as ( ) ( ), , ( ) ( ) ( ) ( ) loc loc A Bi i i A i i B iA B r c r v h r v h r              , (10) where i A i B v v   and neither of these coefficients is close to zero while any orbital localized at a single atom M (typically representing its vacant orbital or an unshared electron pair) are expressible as ( ) ( ) ( ) ( ) loc loc Mj j jM r c r h r           . (11) We shall refer to these two types of localized orbitals as 2c-LOs and 1c-LOs, respectively, for the sake of brevity. The number of orthonormal AHOs on each particular atom is mainained equal to the number of original AOs for the same atom, and both sets of functions span the same Hilbert space. We also require that the total number of all LOs (2c-LOs and 1c-LOs) shoule be the same as the number of AHOs (and hence, of AOs) and that they too should span the same Hilbert space. These requirements guarantee the existence of orthogonal transformations which relate the sets of AOs, AHOs and LOs to one another. The above constraint on the structure of LOs has important implications. Together with the requirement of orthonormality of the LOs it implies that if the AHO enters a certain 1c-LO, such AHO must be orthogonal to all the other LOs (either 1c-LOs or 2c-LOs) and hence can not appear in the (10)- or (11)-type expansion of any of them. Likewise, if the AHO enters 2c-LO, such AHO can not appear in the (11)-type expansion of any 1c-LO (since otherwise some LOs won't be orthogonal), but can additionally appear in the (10)-type expansion of only a single additional 2c-LO at most. What is more, since the total number of LOs equals the total number of AHOs, we onclude that if the AHO enters a certain 2c-LO, it must also appear in exactly a single additional 2c-LO. Consequently, under appropriate orbital numbering scheme the AHOs appearing in 2c-LOs (and not appearing in any of 1c-LOs) can be grouped into pairs in such a way that there would be a one-to-one correspondence between the pair of AHOs   ( ) ( ), ( ) i p i h r h r   and the pair of 2c-LOs   ( ) ( ), ( ) loc loci p i r r     . Here we have introduced a discrete 'pairing function' ( ) p i , which is involutive (   ( ) p p i i  ) and equals to the index of the 'partner' AHO belonging to the same pair as the i -th AHO. Then linear relations (10) between the members of the pairs can be expressed in the matrix form ( ) ( ) loci i i r r  φ v h   , (12) in which we have introduced column-vectors ( ) ( )( ) ( ) lociloci loc p i rr r       φ   and ( ) ( )( ) ( ) ii p i h rr h r      h   , containing 2c-LOs and AHOs of the corresponding pairs, as well as a 2x2 matrix k v containing the AHO 'mixing coefficients'. As can be concluded from the orthonormality conditions introduced above for both AHOs and 2c-LOs, the matrix k v must be orthogonal, i.e., T Tk k k k   v v v v 1 . To summarize, the constraints (10) and (11) imposed on the structure of LOs result in splitting the entire set of AHOs into two subsets: a) the '1C subset' which consists of AHOs identical to 1c-LOs in accordance with (11) and contains as many functions as there are 1c-LOs, b) the '2C subset' which consists of the pairs of AHOs k h , each pair being in one-to-one correspondence with a pair of 2c-LOs lock φ via orthogonal transformation (12). Below we shall use notation i  to indicate that there exists a pair of AHOs in 2C subset to which the i -th AHO belongs. The deduced restriction on the number of LOs into which each of the AHOs can enter leads readily to the criterion for finding AHOs. Indeed, since AHOs span the same Hilbert space as the original AOs, the AHOs can be used as the basis for expanding the true 1-RDM over it. It is thus possible to introduce the density matrix D by defining its elements as the expansion coefficients   , ( , ) ( ) ( ) r r h r h r         D     , (13) where we have dropped for simplicity the superscripts at AHOs. Due to orthonormality of AHOs we also conclude that the elements of D can be obtained as     ˆ, h h     D . (14) Similarly to (13), the localized approximate 1-DRM loc  can be expanded over the same set of basis functions as   , ( , ) ( ) ( ) loc loc r r h r h r         D     , (15) thereby defining the elements   loc  D of another matrix loc D . Similarly to (14), we find that     ˆ, loc loc h h     D , and employing (1) we also find that       , , loc loc loc loci i ii n h h        D . t follows from this expression that, given the properties of LOs loci  summarized by (10) and (11), only the following elements of loc D can be non-zero: all of its diagonal elements and those (and only those) off-diagonal elements which correspond to the AHOs belonging to the same pair in the '2C subset'. What is more, since      

22 2 , ˆ ˆ loc loc loc           D D D D , (16) all of the non-zero elements of loc D must be set equal to the corresponding elements of D in order to minimize ˆ ˆ loc    , thus making loc  as close to  as possible, which leads to      , if, if 2 C and ( )0, otherwise loc p          DD D , (17) where the condition ( ) p    indicates that the μ-th and ν-th AHOs both belong to the same pair in the 2C subset. Given (17), we proceed with the approximation error (16) as        

222 2 , ( )2C loc p          

D D D D D . It should be noted that each term involving the elements of symmetrical matrix D appears twice in the summation over   : first in the form of    

2, ( ) p   D and then as    

2( ), p   D when the index  coincides with ( ) p  for a certain value of   . Finally, since AHOs are merely the result of orthogonal transfromation (9) of original AOs and the sum of squared elements of the matrix (    

22 , . inv      D D ) is invariant under such transformations, it can be concluded that loc  D D is minimized as soon as the sum of squared non-zero elements (17) of D loc is maximized, i.e.,            

22 22 ( ), ( )2C 2C ˆ ˆ, , max

AHO pp h h h h                     

D D . (18) This condition comprises the guiding principle in the search of AHOs.

Condition (18) suggests that the procedure for finding AHOs should be performed by ensuring that: i) the unitary transformation (9) is built so as to make the elements of D appearing in (18) as large by their absolute values as possible, ii) the AHOs are grouped into pairs, thus forming the 2C subsets, in such a way that the second sum in Eq. (18) is as large as possible ('an optimal pairing requirement'). These two requirements are clearly interdependent and in general can be best satisfied in an iterative manner with each iteration involving two sub-steps. The first sub-step consists in building the AO-to-AHO transformation (9) and is executed differently at the very first iteration and at all subsequent iterations. At the very first iteration of our implementation the requirement i) is fulfilled approximately by simultaneous diagonalization algorithm (SDA) operating on the set matrices     , TA AA AA AB BA

G B A   

D D D D     , defined individually for each atom A, where

TAB BA  D D   denotes the sub-matrix of D composed of elements    D with indices , A B     . The SDA finds the unitary transformations A Θ in the space of AOs of each atom, which minimizes the sum of squared off-diagonal components         ( ) TAA AA AB BAB A off A            

D D D D     , or equivalently, maximizes the sum of squared diagonal components         ( )

TAA AA AB BAB A dg A        

D D D D     . (19) Note that only two approximations are involved at this initial iteration: the SDA target function (19) contains a greater number of the matrix D elements than the ultimate AHO target function (18), and the matrix D elements to the 4th rather than to the 2nd power are present in (19). The set of A Θ matrices obtained by SDA provides the initial approximation for AO-to-AHO transformations, which make most of the elements of D in AHO basis as small as possible and at the same time maximize the few remaining elements of D on the diagonals of AA D  and AB D  sub-matrices. After the transformation of this kind the elements of D become suitable for combining them into pairs to satisfy the optimal pairing requirement ii). This requirement is satisfied at the second sub-step of each iteration. During this sub-step, which is executed in the same way at all iterations, the AO-to-AHO transformation (9) is assumed fixed, so that the first term in (18) is constant and the optimal pairing requirement becomes essentially identical to the maximum bond order principle by Jug . Note, however, that in our approach this principle is a consequence of the more general optimality condition (4) subjected to restrictions (10)–(11) on the structure of localized orbitals, and therefore it should not be considered as an independent postulate. The optimal pairing requirement is fulfilled in our implementation by employing the maximum-weight matching 'blossom algorithm' by J. Edmonds . This algorithm finds a set of edges connecting each of the given nodes numbered from 1 to n to at most another single node in such way that the created matching would maximize the sum of weights corresponding to the edges. More formally, if a weight ij w , satisfying ij ji w w  and ii w  , is associated with the edge connecting the i -th and j -th nodes, and symmetrical n x n adjacency matrix P is introduced so as P ij = 1 if and only if there is an edge connecting the i -th and j -th nodes (in particlar, P ii = 1 if the i -th node is not connected to any of other node) and P ij = 0 otherwise, the blossom algorithm finds the adjacency matrix which maximizes the cost function , ij iji j w P  under constraints   ijj P   for all i from 1 to n (which indicates that each node can have no more than a single 'partner'). The algorithm has polynomial time complexity (namely O( n ) in our code, with n being the total number of AHOs), and is executed with the weights w  set to      D , where the elements of matrix D are computed by (14) using the AO-to-AHO transformation matrices available at the current iteration. Note that an optimal distribution of AHOs into 1C subset and pairs of 2C subset is found from the obtained adjacency matrix P by a simple rule: the i -th AHO is placed into 1C subset if ii P  and is placed into 2C subset paired with the j -th AHO if ij P  . The elements of the adjacency matrix are related to the above pairing function ( ) p i , defined on elements of 2C subset, as follows: ( ) p i j  where i j  if and only if ij P  . After the optimal pairing has been found, the next iteration begins. At this, as well as at all further iterations, the current approximations to both the AO-to-AHO transformation and the optimal pairing function are available. Now the sub-step i) is executed differently and is aimed at improving the unitary AO-to-AHO transformation for maximizing AHO  defined in (18) under assumption that the AHO pairing is fixed. In the current implementation this optimization is performed by a simple iterative algorithm, in a certain way similar to the steepest descent algorithm, with orthogonality constraint (cf. ref. 73 and references therein). The algorithm is described in details in Appendix B and is based on the following reasoning. Consider the optimization problem of finding an orthogonal matrix U ( T  U U 1 ) which maximizes some differentiable target function ( ) f U . If this function is linear, i.e.,   ( ) tr T f f   U G U , (20) where f is constant, the unitary matrix U , for which f achieves its maximum value among all orthogonal matrices of a given dimension, can be found as  

1/ 2 T   U G G G . (21) This result is a direct consequence of the fact that such U provides a unitary matrix closest (in the Frobenius norm sense) to the given (non-unitary) matrix G , i.e., U minimizes   T T    

G U G G G U . Note that the same matrix U can be alternatively computed from G by finding its singular value decomposition (SVD)  G VΛQ , (22) where V and Q are unitary matrices and Λ is a diagonal one. Indeed, the symmetrical square root of T T  G G Q Λ Q can be found as  

1/ 2

T T  G G Q ΛQ and its inverse as  

1/ 2 1

T T    G G Q Λ Q . Finally, the substitution of the latter result together with (22) into (21) yields T    U VΛQQ Λ Q VQ . (23) This reformulation allows avoiding the possible numerical instabilities in computing the inverse square root  

1/ 2 T  G G in case if T G G has small (or nearly zero) eigenvalues.

Now assume that the target function f depends on the components of matrix U in non-linear manner, which is the case in (18) for the dependence of the target function AHO  on the coefficients of AO-to-AHO transformation. In this case matrix U can be found as follows. Given that the AO-to-AHO transformation optimization algorithm is initialized with the result of SDA, which maximizes a similar target function (19), it can be anticipated that the ultimate transformation will appear rather 'close' to its initial guess. It is therefore reasonable to assume that due to the initialization used in our implementation, the target function can be approximated by the linear function quite accurately in the vicinity of an initial guess U . We thus consider the first-order expansion of the target function f into the Taylor series in the neighborhood of U :         ( ) ( ) ( ) tr ( ) T ff f f F             U U

U U U U U G U U UU , where G is the 'gradient matrix' with the elements     f      U U

G U . (24) Since  

1/ 2 T   U G G G rather than  U U maximizes   tr T G U , we conclude that     tr tr T T  G U G U , so that     tr 0 T   G U U and the hence ( ) ( ) F F  U U . In case if the linear approximation ( ) ( ) f F  U U were strictly guaranteed to be accurate, implying that ( ) ( ) f f  U U , it would be sufficient to repeatedly update the gradient matrix according to (24) and then re-compute the new matrix U according to (21) or (23) thus establishing an iterative procedure for finding U which maximizes f (cf. ref. 79). In this case it would be sufficient to require that the target function f is bounded from above in order to guarantee that this iterative procedure will converge. Given that the target function (18) of the AHO optimization roblem is bounded from above at least as AHO   D , the above argumentation for the convergence criteria of the iterative procedure would still hold provided that inequality ( ) ( ) f f  U U (25) holds at each iteration. Therefore, this inequality must be checked numerically at each iteration. If inequality (25) is violated at some step, i.e.,   ( ) f f  U U for the new U obtained from (21) or (23), this indicates that the linear approximation ( ) ( ) f F  U U has failed because of too large change of the argument U relative to its preceeding value U . In this case it is possible to adjust the 'stepsize' by modifying Eq. (21) in such a way that inequality ( ) ( ) f f  U U becomes valid again and the convergence argumentation outlined above is restored. As shown in Appendix C, such adjustment of the step size can be achieved by adding a penalty term to the target function, ultimately leading to the replacement of the gradient matrix defined in (24) by the 'damped gradient'     G G U  and then using it in Eq. (21) or (23) instead of the true gradient matrix G to compute a new U . The proposed procedure for choosing the value of the 'damping parameter'  is described in Appendix C. Generalization onto the case of a target function ( ,..., ) a N f U U depending on N a matrix arguments is straightforward: the first-order expansion is represented as         (0) (0) (0)1 1 ,..., ,..., tr a TN a k k kk f f     U U U U G U U , where the partial gradient matrices k G ( k = 1,2,.., N a ) have the elements     (0) l l k k f      U U

G U . In order to treat all arguments of f on equal footing, the new values of all arguments must be computed at once, after the corresponding matrices k G have been obtained using the 'old' argument values (0) (0)1 ,..., a U U . Inequality (25) must be then checked and the usual stepsize adjustment must be executed in case if (25) is violated. After these preparations, it is now possible to formulate the complete AHO construction algorithm:

Step 1 . Initialize AO-to-AHO transformation by simultaneous diagonalization algorithm and compute D . Step 2 . Determine an optimal hybrid pairing and create the 2C subset.

Step 3 . Optimize AO-to-AHO transformation by the algorithm from Appendix B with convergence threshold occh  and compute the target function using the currently available 2C subset. Step 4 . Compute D using optimized AO-to-AHO transformation, then determine the optimal hybrid pairing and calculate the correspondent value of the target function; if the calculated value is higher by at least occh  than the previous target function value saved at step 3, go to step 3; otherwise, stop the algorithm. By this algorithm both the hybrid pairing (i.e., 2C subset) and the AO-to-AHO transformation are optimized, thus maximizing the target function and minimizing ˆ ˆ loc loc      D D . It should be noted that the value of the target function is increased at both

Step 3 and

Step 4 , so that provided that if the algorithm has not been terminated yet after the convergence check performed at Step 4, the increase of the target function after these two steps is no smaller than occh  . The fact that the target function is bounded from above as AHO   D implies that the algorithm is guaranteed to terminate after / occh  D steps at most. .4. The LPO construction algorithm When AO-to-AHO transformation has been found as well as the distribution of the AHOs over 1C and 2C subsets, the set of one- and two-center LPOs are created. While 1c-LPOs coincide with AHOs of 1C subset and are thus readily available, a set of AHO mixing coefficients forming the matrices k v defined by (12) are to be determined for building 2c-LPOs. These matrices can be obtained from requirement that under transformation (12) the localized 1-RDM ( , ) loc r r    expressed by (1) using LPOs    ( , ) ( ) ( ) ( ) ( )( ) ( ) ( ) ( ) Tloc loc loc loc loc loci i i j j ji jj p jTloc Ti i i j j j j ji jj p j r r n r r r rn h r h r r r                     φ n φh v n v h          , (26) where we introduced a 2x2 diagonal matrix ( ) loc jj loc p j n n       n containing the j -th pair 2c-LO occupancies (cf. (6)), becomes equivalent to Eq. (15), i.e.,     ( ), ( )1C 2C ( , ) ( ) ( ) ( ) ( ) loc i i j p jii j p ji j r r h r h r h r h r            D D       , (27) where we employed (17) for the elements of loc D matrix. Note that the condition ( ) j p j  in (26) ensures that each pair of 2c-LOs appears in the sum only once. If the same condition is applied in the second sum of (27), the terms of this sum can be rewritten in matrix form as    

1C 2C,( ) ( , ) ( ) ( ) ( ) ( )

Tloc pairi i j j jiii jj p j r r h r h r r r           

D h d h       (28) by introducing a 2x2 sub-matrices        , ( )( ), ( ), ( ) jj j p jpairj p j j p j p j      

D Dd D D (29) composed of the elements of D corresponding to the j -th pair of AHOs. By comparing (28) to (26) it can be readily concluded that pair Tj j j j  d v n v or, using orthonormality of j v , that pair Tj j j j  v d v n . The latter implies that the columns of Tj v must be the eigenvectors of pairj d matrix and that the diagonal elements of j n are their corresponding eigenvalues. This condition determines the elements of j v matrices (note that pairj d are known as soon as AHOs and their optimal pairing have been found) and hence completes the LPOs finding procedure. LPOs (CLPOs)

The algorithms for constructing AHOs and LPOs presented above are designed to ensure the optimality property (4) under restrictions (10) and (11) on the orbital structures. Therefore these algorithms can be considered 'nonparametric' in a sense that no additional empirical rules or adjustable parameters (apart from numerical convergence thresholds which are not essential in this context) were used to derive them. In spite of these attractive properties, the number of LPOs produced by the algorithms is equal to the number of basis functions initially used to present an input 1-RDM and therefore is typically much higher than the number of electrons in the system as soon as the quality of the basis is beyond the minimal basis . This number of localized orbitals ight well appear unreasonably large from the perspective of a chemist's Lewis-structure picture, in which a single doubly-occupied localized orbital is used to describe a pair of electrons in the system. Moreover, in the Lewis-structure picture a 'lone pair' (LP) of electrons, i.e., an orbital of (nearly) double occupancy localized at a single atom rather than at a pair of atoms, is an essential element, whereas the second sum in AHO target function (18) is usually maximized if as many AHOs as possible are paired (placed into 2C subset), thus leading mostly to a 2c- rather than 1c-LOs. Nevertheless, by imposing a few additional constraints on the properties of LOs obtained in the result of minimization of loc    , it proves possible to bring both the number and the properties of the produced LOs in close agreement with the expectations of the chemist's Lewis-structure picture at the cost of insignificant degradation of 1-RDM approximation error (4). A set of LOs generated in this way will be referred to as Chemist's Lewis-picture property-oriented localized orbitals or CLPOs for short. The new requirements imposed on CLPOs in addition to constraints (10) and (11), which are not abandoned, affect only 2c-LOs and can be formulated as follows: 1) In each pair of the desired 2c-LOs the occupancy of one orbital should be close to 2.0 electrons, while the occupancy of the other should be close to zero. Accordingly, the first can be considered a bonding orbital (further abbreviated as BD) and the second – antibonding orbital (or NB for short). 2) The desired 2c-LO should only correspond to a covalent but not ionic bond. That is, if the amount of ionic character of the bond is high enough, its bonding orbital must be converted into a monoatomic lone pair (LP). In our implementation, we employ bond ionicity I j defined for the j -th pair of 2c-LOs as            j jj j j threshj j I I     v v v vv v (30) and require that it must not be greater than the user-defined threshold value thresh I (by default set to 0.90 in our implementation, thus ensuring that the contribution of either AHO is no less than 5%) thus formalizing the distinction between the 'truly' covalent and 'almost ionic' bonds. In (30) we utilized an orthogonality property of j v matrix, which implies that     j j   v v and         j j j j    v v v v . The first of the two requirements is formalized by inequalities   ( ) max , 1.0 loc locj p j n n  and   ( ) min , 1.0 loc locj p j n n  (i.e., one of loc j n , ( ) loc p j n occupancies is greater than 1.0 while the other is smaller than 1.0), which must be fulfilled simultaneously. Note that the threshold value of 1.0 electrons must not be considered here as an empirical parameter, but rather as a natural limitation on how many electrons one can consider an 'electron pair'. Although this estimate might seem rather rough for distinguishing the pairs (as compared to, e.g., the threshold value of 1.90 electrons which was used in NBO approach ), this threshold performs quite well as illustrated in the Results and Discussion section by comparing the properties of the obtained CPLOs with the expectations of the Lewis model. Moreover, the proposed condition not only ensures that the pair consisting of both essentially unoccupied 2c-LOs ( loc j n  and ( ) loc p j n  ) will never be created, but additionally prevents creating 2c-LOs composed of two LPs ( loc j n  and ( ) loc p j n  ). The 2c-LOs of the latter type were possible and even favorable from the perspective of target function (18) maximization. However, they were not in line with the expectations of the chemist's Lewis-structure picture. In summary, the new CLPOs are defined as the orbitals which provide the best possible (in the Frobenius norm sense) approximation to the true 1-RDM by localized representation ( , ) ( ) ( )( ) ( ) CLPO CLPO CLPO CLPOi i ii CLPO CLPO CLPOj j jj BD r r n r rn r r                  (31) in which only 1c-LOs (including LPs) and only BD 2c-LOs satisfying the above formulated requirements are used. Although most of the considerations used in constructing the AHO/LPO algorithm still hold in CLPO case, a distinctive features of CLPO  are that it includes just one diatomic BD orbital from a corresponding 2c-LO pair and that this pair itself is formed from atomic hybrids only if these hybrids, when paired, produce the BD orbital, satisfying the ionicity condition (30), and NB orbital with occupancies above and below 1.0 electrons respectively. The set of indices of atomic hybrids satisfying these requirements will be further denoted by 2CL ('two-center Lewis') and their pairing function will be denoted by ( ) L p i . The unpaired atomic hybrids are naturally transferred into the 1C subset defined previously. In general, the optimal composition of atomic hybrids in CLPO case (i.e., A Θ matrices in (9)) must not be exactly the same as the composition of AHOs. Therefore, the new hybrids, which are optimal for CLPO construction, will be referred to as the Lewis atomic hybrid orbitals (LHOs) to avoid ambiguity. The derivation similar to (13)–(16) could now be repeated in order to formulate the target function, which, when maximized, generates LHOs. However, it is more convenient to arrive at this target function in a different way. By substituting (6) into (5), which are both valid in CLPO case, we find the approximation error       ˆ2 , loc loc loc loc loc loci i i i ii i i n n n                 (32) implying that the general target function is   max locii n   . (33) It can be noted that the obtained expression is equivalent to the target function (18), which was used earlier in LPO/AHO case, due to (6) and the fact that k v matrix in (12) is orthogonal. The present CLPO case differs only in that the summands in (33) involve only a single squared occupancy            

22 ( ) ( )11 12 11 12 ˆ( ) ,

L L

CLPO L L L L L L L Li i i i p i i i i p i n h h h h     v v v v (34) per 2c-LO (and LHO) pair. The quadratic form (34) is maximized if the matrix Li v containing LHO-to-CLPO transformation coefficients (defined in the similar way to (12)) is formed from the transposed eigenvectors of        ( )( ) ( ) ( ) ˆ ˆ, ,ˆ ˆ, , LL L L

L L L Li i i p iLi L L L Lp i i p i p i h h h hh h h h         d matrix (cf. (29)). In this case the maximum value achieved by (34) is  

22 , ( ) ( ) L LPLOi i p i n   , where             L L L L L L L L L Lij i i j j i i j j i j h h h h h h h h h h               (35) is the largest eigenvalue of the 2x2 sub-matrix of 1-RDO in LHO basis corresponding to the i -th and j -th LHOs. Such sub-matrix clearly coincides with Li d if ( ) L j p i  , i.e., if the i -th and j -th LHOs are paired. With the above definitions, the LHO target function can now be formulated as    

22 , ( )1C 2CL,( ) ˆ, max LL L LLHO i p iii p i h h            . (36) The complete CLPO algorithm can now be formulated as follows:

Step 1 . Create AO-to-AHO transformation as before and use AHOs as initial guess for LHOs.

Step 2 . Determine optimal LHO pairing by using the target function (36) subjected to constraints imposed on the composition of 2CL subset and find the optimal LHO pairing thus splitting the entire set of LHOs into the 2CL and 1C subsets.

Step 3 . Optimize AO-to-LHO transformation by the algorithm from Appendix B with the gradient matrices using the convergence threshold occh  and compute the target function (36) using the currently available 2CL subset. Step 4 . Use the updated AO-to-LHO transformation to determine the optimal LHO pairing and calculate the correspondent value of target function (36); if the new value is higher by at least occh  than the value of the target function saved previously at step 3, go to step 3, and otherwise stop. It should be noted that this algorithm differs from the AHO algorithm only in constraints imposed on possible LHO pairings. This fact justifies using AHOs as initial guess at Step 1 and further suggests that the ultimate value of the target function (36) will be only slightly lower than that of (36) therefore leading to an insignificant degradation in the created CLPO approximation to the true 1-RDM (see Results and Discussion section for numerical comparison). After the set of LHOs has been found, the Li d matrices are formed, and BD and NB CLPOs are built from LHOs using the mixing matrices Li v formed from transposed eigenvectors of Li d . The LP CLPOs are formed by selecting 1c-LHOs that have occupancies above 1.0 electrons, while the remaining 1c-LHOs are treated as 'unoccupied' or 'Rydberg' (RY) CLPOs. In this way, the total number of CLPOs is maintained the same as the number of initial basis functions (thus allowing a linear bijective transfromation between them). However, only BD and LP CLPOs are designed to make the dominant contribution to (31) and thus referred to as the Lewis subset of CLPOs while the contribution of NB and RY CLPOs (the 'non-Lewis' subset) is usually negligible.

3. Computation details

In order to assess the performance of the proposed orbital localization procedures and compare the results with the chemical-intuitive expectations, a test set of 33432 small molecules composed of 2 to 12 atoms was prepared based on the data available in The PubChemQC Project database . The PubChemQC database contains over 3.98 million of molecules and ions initially taken from PubChem Compound database and optimized at B3LYP/6-31G(d) level of theory . The structures corresponding to closed-shell singlet molecules with zero total charge and composed of no more than 12 atoms were selected and the 1-RDM was computed for each of these molecules at density-fitted MP2/Def2-TZVPP level of theory using PSI4 program (version 1.2a1.dev781). The 1-RDMs corresponding to the Hartree-Fock SCF wavefunctions available as a byproduct during the MP2 calculation were also saved and analyzed in order to assess the influence of electron correlation on the properties of localized orbitals. The MP2 calculations involved all the electrons (no frozen core approximation was used) and the resulting density matrices corresponded to the so-called 'relaxed density'. The relaxed 1-RDM, definded as an appropriate derivative of the total energy, is usually considered preferable over the density matrix determined directly from the approximate wavefunction through definition (2) since for non-variational methods (such as MP2) the relaxed densities usually lead to more correct one-electron properties . It is worth noting that MP2 densities were reported to be in better agreement with coupled-cluster densities than the DFT densities are. Besides, the MP2 densities are not constrained to 1-RDM idempotency property . The PLOs and CLPOs were obtained from the relaxed MP2 and Hartree-Fock (SCF) 1-RDMs according to the presented algorithms employing the Natural Atomic Orbitals (NAOs) as nitial atomic orbitals ( ) r    . These calculations were performed using the open-source JANPA program in which the newly developed algorithms were implemented. The following error measures were used to estimate the discrepancy between the true 1-RDM ( , ) r r    obtained from quantum-chemical calculations and its localized approximation ( , ) loc r r    of the form (1) built from the localized orbitals: loc loc locloc         D D DD D (37) (where we took advantage of the fact that loc D is diagonal if its elements are written in the basis of localized orbitals, and, further, employed (32)) and     ˆtr trˆtr tr loc locL f   DD . (38) The error measure loc  ( loc    ) is the one for which the proposed orbital localization algorithms were optimized. It evaluates the overall accuracy of the localized approximation of the true 1-RDM. In contrast to that, the error measure f L ( L f   ) was not used in the algorithms construction. In case if the whole set of localized orbitals is used in localized 1-RDM expansion, which defines the elements of D loc matrix, L f  due to the invariance of the matrix trace under the unitary transformations of orbitals. However, if the set of localized orbitals used to present ( , ) loc r r    in (1) is limited to, say, the Lewis subset (i.e., BD and LP CLPOs), L f  and equals the fraction of electron charge accumulated by the subset of localized Lewis orbitals. If the localized orbitals in (1) provided exact expansion for the true 1-RDM, loc  would be zero and L f would be 1.0.

4. Results and discussion

The error measures loc  and L f defined in (37) and (38) for each molecule of the test set have been calculated from 1-RDMs obtained at both HF and MP2 levels. Figs. 1 and 2 depict the obtained distributions of loc  and L f values respectively in the range where the frequency with which these value occur is essentially non-zero. Since L f is guaranteed to be exactly 1.0 if the entire set of localized orbitals is used to build loc  in (1) and D loc in (38), Fig. 2 shows L f only for the cases when the Lewis subset (i.e., BD and LP orbitals) of the localized orbitals was used in D loc . Fig. 1. Normalized distribution of relative error ε loc characterizing the localized approximation of 1-RDM using the complete set of LPOs ('all LPOs') and the Lewis-like subset of CLPOs ('BD+LP CLPOs') Fig. 2. Normalized distribution of the electron charge fraction accumulated by the BD and LP localized orbitals comprising the Lewis subset of LPOs and CLPOs The presented data show that reduction of the complete set of LPOs in (1)-type expansion of loc  to the Lewis subset of CLPOs increases the 1-RDM approximation error insignificantly at both HF and MP2 levels. Even in the most unfavorable case, when only Lewis subset of CLPO is used to approximate 1-RDM obtained at MP2 level at which a more delocalized electronic structure is typically obtained, the overall relative error loc  is mostly found in the range below 0.07 and the fraction of electronic charge accumulated by the BD and LP orbitals f L is typically well above 95%. Remarkably, the distributions of f L are concentrated in essentially the same range for LPO and CLPO. This indicates that although the number of Lewis orbitals is much higher in case of LPOs than in case of CLPOs, in fact most of LPOs have negligible occupancy (cf. Fig. 3), and thus they can be safely neglected. The remaining orbitals, being re-optimized and in this way converted into CLPOs, produce essentially as accurate localized approximation to the true 1-RDM as LPOs do. However, an even more notable asset of CLPOs is their closer agreement with the chemist's Lewis-structure picture of molecular electronic structure. This can be demonstrated, in particular, by considering the joint distributions of the occupancies of NB and BD 2c-LOs obtained by the LPO and CLPO methods (Fig. 3). Fig. 3. Joint distributions of NB and BD orbital occupancies obtained by LPO and CLPO algorithms using MP2 and HF reduced density matrices of the test set molecules as an input. Note the logarithmic scale of color bar showing the number of samples in each cell. In contrast to LPO NB/BD pairs exhibiting a pronounced maximum correspondent to nearly zero occupancy of both orbitals, a single maximum in CLPO case corresponds to almost doubly occupied BD orbital and almost unoccupied NB orbital (see Fig. 4 for the corresponding distribution densities). The latter is in perfect agreement with the bonding/antibonding orbital classification suggested by the Lewis-structure picture. Moreover, CLPO joint NB/BD distribution does not exhibit any maxima corresponding to simultaneously occupied NB and BD orbitals. Such maximum is, however, observed (although not so pronouncedly) for LPO pairs. Fig. 4. Normalized distributions of occupancies of diatomic bonding (BD), antibonding (NP) as well as of monoatomic lone pair (LP) and unoccupied (RY) orbitals obtained by CLPO method using MP2 and HF reduced density matrices of the test set molecules as an input. Note a double-range linear scale on the vertical axis. The occupancies of CLPOs are contained in rather narrow ranges corresponding to the Lewis (BD and LP with occupancies mostly above 1.7 electrons) and non-Lewis (NB and RY orbitals with occupancies mostly below 0.5 electrons) subsets, as can be seen from Fig. 4. It is remarkable that the boundaries of these ranges are much narrower than the limiting value of 1.0 electrons initially used in the CLPO algorithm for pre-selecting the atomic hybrid orbital suitable for pairing and 2c-LO formation. The similar conclusion holds for the threshold value of thresh I = 0.90 used to discriminate between the 'truly' covalent and almost ionic bonds in CLPO algorithm. Fig. 5. Normalized distributions occupancies of bonding orbital ionicities obtained by LPO and CLPO methods using MP2 (solid lines) and HF (dashed lines) reduced density matrices of the test set molecules as an input. As seen from Fig. 5, the CLPO BD/NB ionicity is mostly below 0.6, i.e. far below the thresh I value. In contrast, the ionicity of LPO BD/NB is typically close to 1.0 indicating that most of these pairs are dominated by a contribution from a single AHO only. Further evidence of associating CLPO BD and LP orbitals with the pairs of electrons, in the Lewis-structure picture sense, comes from the close relation between the total number of BD and LP CLPOs and half the total number of electrons in the system (see Fig. 6). It should be stressed that in CLPO algorithm, no limitations are explicitly imposed on the number of BD and LP orbitals, ut instead, their number results from the fulfilment of the optimal pairing requirement, which leaves some of the atomic hybrids unpaired. Fig. 6. The total number of Lewis electron pairs (BD and LP orbitals) obtained by the CLPO procedure using MP2 and HF reduced density matrices of the test set of molecules as an input, plotted as the function of the number of electrons in the molecule. Dashed line corresponds to half the total number of electrons in the molecule. The BD CLPOs themselves can be safely associated with the electron pairs forming the covalent bonds. This conjecture can readily be verified by comparing the total number of BD orbitals in which atom participates with its hybridized orbitals with the typical valence of the corresponding chemical element. Such comparison has been performed for H, C, N and O atoms which are the most abundant in the molecules of the investigated test set. Fig. 7 shows the obtained distribution of the 'CLPO valencies' for these atoms. Fig. 7. Normalized distributions of the number of CLPO bonding (BD) orbitals for H, C, N, O atoms in the test set of molecules These data demonstrate good agreement with the well-known chemical valencies of the H, C, N and O atoms. Some minor deviations observed for O and N atoms correspond to the molecules in which the electronic structure is badly representable by a single idealized Lewis structure, i.e., in which electron delocalization and resonance phenomena are pronounced. The visual inspection of isosurfaces of BD CLPOs in the representative selection of molecules (Fig. 8) further confirms the validity of their association with covalent bond orbitals. Fig. 8. CLPOs of 2-fluoroethenimine molecule and their constituent hybridized atomic orbitals (excluding the core electron orbitals) obtained by the proposed algorithms. Isosurfaces corresponding to 0.1 a.u.

5. Conclusions

Two procedures for obtaining the localized orbitals suitable for optimal decomposition of arbitrary one-electron molecular properties into mono- and diatomic contributions are proposed. The localized property-optimized orbitals (LPOs) are produced by the algorithm which does not involve empirical parameters and is deduced from a single optimality requirement to approximate most accurately (in a Frobenius norm sense) the first-order reduced density matrix in terms of localized orbitals of a specific structure. The chemist's Lewis-structure picture localized property-optimized orbitals (CLPOs) are derived from the similar requirement, but their bonding and lone pair orbitals subset is optimized to make the dominating contribution to the localized representation of the first-order reduced density matrix, while the antibonding and Rydberg orbitals comprise a typically inor correction. CLPO algorithm involves only a single empirical parameter which makes chemically meaningful discrimination between the covalent and highly ionic bonds. Both orbital localization procedures have been implemented in a freeware open-source program JANPA ( http://janpa.sourceforge.net/ ). The performance of the proposed procedures has been tested on a set of 33432 small closed-shell molecules containing from 2 to 12 atoms and the structure and properties of the CLPOs have been found to be in excellent agreement with the chemist's Lewis-structure picture. Thus, CLPOs can be considered as the localized orbitals forming a Lewis structure with one-electron physical properties which are closest to the properties computed from the true delocalized many-electron wavefunction.

Appendix A . Gradients of the AHO target function (18)

Let us introduce the density matrix D AO by defining its elements as the coefficients of expansion of the true 1-RDM over original orthonormal AOs (cf. (13) for hybridized orbitals case):   , ( , ) ( ) ( ) AO r r r r           D     (A1) Due to orthonormality of AO basis we obtain     ˆ, AO      D . These coefficients are usually available either directly from the quantum-chemical software used to calculate 1-RDM and/or molecular wavefunction, or can be obtained from 1-RDM in non-orthogonal basis by a linear transformation . By substituting (A1) into (14) and using (9) we find           , , , , AO AOi j i jij h h                  

D D D , (A2) so that the density matrix in AHO basis can be expressed as

T AO  D Θ D Θ , where Θ is the orthogonal matrix built from atomic AO-to-AHO transformation matrices A Θ (cf. (9)) as diagonal sub-blocks. Differentiation of (A2) with respect to the components of Θ matrix yields            ij AO AOi jj i                D D Θ D ΘΘ    

AO AOi jj i         

D Θ D Θ , (A3) where we employed the fact that matrix AO D is symmetrical. The derivatives of the target function (18) with respect to the components of A Θ can now be obtained using the chain rule applied to the target function conveniently rewritten employing adjacency matrix P (see Section 2.3) as                     AHO                      

D P D P D . This leads to the following result                

AO AOAHOAHO            G D D Θ P D D ΘΘ . By introducing the 'weighting matrix'        , if 2 C and, if0, otherwise

AHO p          DW D the AHO target function gradient matrix can be rewritten as AO AHOAHO  

G D ΘW . (A4) Note that since the AHO optimization algorithm involves only derivatives with respect to the elements of Θ with both indices corresponding to the same atom, only diagonal sub-blocks of AHO G matrix are to be computed. Appendix B . The AO-to-AHO transformation optimization algorithm Using the gradient matrix

AHO G (A4), the atomic hybrid optimization algorithm (cf. ref. 79) with the stepsize scaling procedure discussed in Appendix C can be formulated as follows: Step 1 ) Set    and set A Θ to their initial approximation obtained by SDA or skip this step if some other initial values for A Θ and AHO  are already available from the previous executions of the present optimization algorithm; Step 2 ) Compute the target function

AHO  and sub-blocks A G of its gradient matrix AHO G for each atom using the currently available A Θ matrices; Step 3 ) Check convergence: stop and return the accepted A Θ if: AHO occh      and AHO    , or AHO    and    (i.e., stepsize scaling has already been applied) and AHO occh      , or if maximum number N itMax of iterations has been reached; Step 4 ) If AHO    , set  to AHO  save the current A Θ as accepted, and set  to  ; otherwise (if AHO    ): discard the last values of A Θ matrices, set  to     

00 0 trtr tr

T AHOT T AAA N     G UG U G Θ (where

AHO N is the total number of AHOs) if    , or set  to / 2  (i.e., reduce the current value of  twice) if not. Step 5 ) Use the SVD procedure (23) to compute the new  

1/ 2

A TA A A   Θ G G G    , where

A A  G G  if    , or AA A    

G Θ G  otherwise. Proceed with Step 2. In the current implementation both the convergence threshold occh  and the maximum number of iterations N itMax are user-adjustable parameters initially set to 10 –5 and 1000 respectively. Appendix C . Modification of Eq. (21) for adjusting the 'stepsize' Let us assume that the linear approximation     ( ) ( ) tr T f f    U U G U U failed, i.e., that the difference between U and U appeared to be too large. In this case this difference can be reduced by adding a 'penalty term' to the target function f and arriving to a maximization problem for a new function ( ) 2 ( ) L f     

U U U U (C1) where  is an adjustable parameter which controls how 'close' should U be to U . Indeed, if the parameter  is set so small that the ( ) f U term can be neglected in (C1) in comparison with the 'penalty term' and ( ) L U achieves its maximum value of zero at  U U . Further, since               

20 0 00 0 0 00 trtr tr tr tr2 2 tr

T TT T T TTdim N             U U U U U UU U U U U U U UU U , where     tr tr tr T T dim N    U U U U 1 is the constant equal to the size of U and U matrices and   tr T dim N    U U U U , we conclude that the 'penalty term' is bounded from as  

20 0

Tdim dim

N N        

U U U U . This implies that for 'large'  , i.e., when   dim f U N    , the penalty term can be neglected and L is maximized when ( ) f U is. Now let us consider a case of 'intermediate' values of  and use a linear approximation for L near U :           ( ) 2 ( ) 2 tr 2 2 tr2 tr ( ) T TdimT approx

L f NL L              

U U G U U U UG U U U (C2) where  

Tdim

L f N       

U G U does not depend on U . The obtained function ( ) approx L U is maximized when  

1/ 2 T   U G G G    , (C3) where     G G U  (C4) is a 'damped' gradient matrix. It can be easily verified that the obtained expression (C3) reduces to  U U at   and to  

1/ 2 T   U G G G at a sufficiently large  . Hence, (C4) can be considered as a continuous interpolation between the 'old' ('undamped') solution (21), which can violate inequality (25), and a 'fully damped' solution  U U which does not violate inequality (25), but in this form is useless. It can be shown however, that apart from a special case of T G U being a symmetrical matrix, there exists a small (but finite!) value   for which inequality (25) is fulfilled and at the same time  U U . In order to show that, consider the behavior of (C3) at small  :                   

1/ 20 0 0 1/ 220 0 0 0 00 0 00 0 0 0 0 0 0 0

22 2

TT T T TT TT T T                                   

U G U G U G UG U G G G U U G U UG U 1 G U U GG U U G U U U G U G U G U and hence,  

1( ) 2 T dd        U G U G U . It is worth noting that    

T T T T T T dd                   UG G G U G U G G G U G U (C5) where the right-hand side can be rearranged using         

20 0 0 0 0 00 0 0 0 0 0 0 00 0 trtr2 tr

T T T T T TT T T T T T T TT T T          

GU U G U G GU GU U GU G GU U G U G GU GU GU U GG G G U G U (C6) With this result we further obtain for the derivative of the target function f    

20 00 0

T T T dUd df dfd dU d d                         UU G GU U G , so that dfd   is either zero (implying that T T  GU U G ) or positive. In the latter case there exists a finite range of values of the parameter  near zero in which the value of   ( ) f  U increases as  increases. This, in turn, implies that one can always find such  which is small enough (but finite!) to fulfill inequality (25). This fact proves the validity of the step adjustment procedure using Eqs. (C3) and (C4) proposed above. Note that the case when dfd   , i.e., T T  GU U G , implies that U is already a locally optimal solution in a sense that no higher value of the target function can be achieved neither by an arbitrary small additive admixture of G to U , nor by a small multiplicative modification in the form e  Ξ U U , where H   Ξ X X is a skew-Hermitian matrix which is assumed to be small (

22 20 e      Ξ U U 1 Ξ ). The latter fact follows from observation that maximization of   f e Ξ U starting from initial point U , which corresponds to  Ξ 0 , would require increasing Ξ in the direction of the corresponding gradient equal to   ( ) T T T f e         

X X X

Γ U GU U GX (the Riemannian derivative ) which is zero if T T  GU U G , so that no step of steepest descent can be taken in this direction. The only clarification needed to be done for practical application of the proposed stepsize adjustment procedure is establishing a rule for selecting the value of  in case of violation of inequality (25). This can be accomplished by noting that since Eq. (C2) was used to derive the stepsize adjustment rules (C3)–(C4), the two U -dependent terms it contains must be of the same order of magnitude. Indeed, if either of them dominates we immediately arrive at one of the limiting cases: a useless  U U at   , or  

1/ 2 T   U G G G , possibly violating inequality (25), at large  . We thus introduce  , a characteristic order of magnitude for  , from the requirement that         tr tr ~ tr tr T T T T dim N        G U G U U U U U , thus we set   tr dimT N   G U . The use of the upper bound for   tr T U U and lower bound for   tr T G U in the above estimate for  can be justified by the fact that during the search for appropriate value of  needed to restore the validity of inequality (25) we begin with setting  to  and then decrease  twice iteratively until inequality (25) is fulfilled. Note that due to the algorithm convergence proof outlined above the desired  can be chosen anywhere in a certain finite continuous range just above zero. The generalization onto the case of target function ( ,..., ) a N f U U depending on several arguments is straightforward: now an auxiliary function L is introduced as     ,..., 2 ,..., a a N N k kk

L f       U U U U U U and by repeating the above considerations we obtain       (0)1 0 ,..., 2 tr a Tapprox N k k kk

L L       U U G U U implying that Eqs. (C3) and (C4) must now be used for each argument k U and its appropriate partial gradient matrix k G . The caracteristic order of magnitude  can now be obtained from the condition         (0) (0)0 0 tr tr ~ tr TT Tk k k k k k kk k k k N           G U G U U U , where k N is the size of the k -th argument matrix k U , yielding   tr kk Tk kk N    G U . For the present implementation, k AHOk

N N   , i.e., the total number of AHOs. References

1. Minkin, V. I. Glossary of terms used in theoretical organic chemistry.

Pure and Applied Chemistry , Chemical physics letters , , 100(2), 151-154. 3. Saebo, S.; Pulay, P. Local treatment of electron correlation. Annual Review of Physical Chemistry , Chem. Soc. Rev. , , 43, 5032-5041 5. Schütz, M., Hetzer, G., & Werner, H. J. Low-order scaling local electron correlation methods. I. Linear scaling local MP2. The Journal of chemical physics , Journal of chemical theory and computation , WIREs Comput Mol Sci. ; e1357. https://doi.org/10.1002/wcms.1357 8. Casassa, S., Zicovich-Wilson, C. M., & Pisani, C. Symmetry-adapted localized Wannier functions suitable for periodic local correlation methods.

Theoretical Chemistry Accounts , Foundations of Chemistry , , 7, 125-148. 10. L. Piela. From Quantum Theory to Computational Chemistry. A Brief Account of Developments / in: Handbook of Computational Chemistry / Ed. Jerzy Leszczynski, Springer Netherlands, 2012, p. 1–12. 11. J. F. Ogilvie. The nature of the chemical bond—1990: There are no such things as orbitals!

Journal of Chemical Education , , 67, 280. 12. L. Pauling. The nature of the chemical bond—1992, J. Chem. Educ . , 69, 519–521. 3. J.F. Ogilvie. The nature of the chemical bond 1993: There are no such things as orbitals! / in: Conceptual trends in quantum chemistry, Eds.: E. S. Kryachko, J. L. Calais, Springer Netherlands, 1994, 171-198. 14. J.J. Morwick. Should orbitals be taught in high school? J. Chem. Educ. , , 56, 262–263. 15. E. R. Scerri. Have orbitals really been observed?. J. Chem. Educ ., , 77, 1492–1494. 16. J. M. Zuo, M. O'Keefe and J. C. H. Spence. Have orbitals really been observed?. J. Chem. Educ. , , 78, 877. 17. E. R. Scerri. Have orbitals really been observed?. J. Chem. Educ ., , 79, 310. 18. Z. Jenkins. Do you need to believe in orbitals to use them?: realism and the autonomy of chemistry. Philosophy of Science , , 70, 1052–1062. 19. S. Shahbazian and M. Zahedi. The role of observables and non-observables in chemistry: a critique of chemical language. Foundations of Chemistry ,

8, 37–52. 20. K. Wittel and S. P. McGlynn. The orbital concept in molecular spectroscopy.

Chemical Reviews , , 77, 745–771. 21. R. S. Mulliken. Spectroscopy, molecular orbitals, and chemical bonding. Science , , 157, 13–24. 22. G. Gryn'ova, M. L. Coote and C. Corminboeuf. Theory and practice of uncommon molecular electronic configurations., Wiley Interdisciplinary Reviews: Computational Molecular Science , , 5, 440–459. 23. G. Tsaparlis. Atomic orbitals, molecular orbitals and related concepts: Conceptual difficulties among chemistry students. Research in Science Education , , 27, 271-287. 24. G. Tsaparlis, G. Papaphotis. Quantum-chemical concepts: Are they suitable for secondary students? Chemistry Education Research and Practice , , 3, 129-144. 25. M. Labarca and O. Lombardi. Why orbitals do not exist? Foundations of Chemistry , , 12, 149-157. 26. J. Autschbach. Orbitals: some fiction and some facts. Journal of Chemical Education, , 89, 1032-1040. 27. F. Barradas-Solas and P.J. Sánchez Gómez. Orbitals in chemical education. An analysis through their graphical representations.

Chemistry Education Research and Practice , , 15, 311-319. 28. D. G. Truhlar. Are molecular orbitals delocalized?. Journal of Chemical Education, , 89, 573–574. 29. M.Dauth, T. Körzdörfer, S. Kümmel, J. Ziroff, M. Wiessner, A. Schöll, F. Reinert, M. Arita, and K. Shimada. Orbital density reconstruction for molecules.

Physical review letters , , 107, 193002-1–193002-5. 30. R.F.W.Bader. On the non-existence of parallel universes in chemistry. Foundations of Chemistry, , 13, 11–37. 31.A. Grushow Is it time to retire the hybrid atomic orbital?.

J.Chem.Educ. , , 88, 860-862. 32. E.R. Scerri. The recently claimed observation of atomic orbitals and some related philosophical issues. Philosophy of Science , , 68, S76-S88. 33. P. Mulder. Are orbitals observable? Hyle–Int J Philos Chem ,

17, 24-35. 34. J. F. Gonthier, S. N. Steinmann, M. D. Wodrich, C. Corminboeuf , Quantification of “fuzzy” chemical concepts: a computational perspective , Chem. Soc. Rev. , 41, 4671-4687. 35. F. Weinhold, C. R. Landis,

Valency And Bonding: A Natural Bond Orbital Donor–Acceptor Perspective , Cambridge University Press, New York, 2005. 36. F. Neese. Prediction of molecular properties and molecular spectroscopy with density functional theory: From fundamental theory to exchange-coupling.

Coordination Chemistry Reviews , , 253(5-6), 526-563. 37. P.-O. Löwdin. Quantum Theory of Many-Particle Systems. I. Physical Interpretations by Means of Density Matrices, Natural Spin-Orbitals, and Convergence Problems in the Method of Configurational Interaction, Phys. Rev. , 97, 1474–1489. 38. F. Weinhold,

Natural bond orbital methods , in: P.v.R. Schleyer, N.L. Allinger, T. Clark, J. Gasteiger, P.A. Kollman, H.F. Schaefer III, P.R. Schreiner (Eds.).

Encyclopedia of Computational

Chemistry , Chichester: John Wiley & Sons, ; Vol. 3., pp 1792–1811. 9. A. E. Reed, L. A. Curtiss, F. Weinhold.

Intermolecular interactions from a natural bond orbital, donor-acceptor viewpoint . Chemical Reviews, , 88(6), 899-926. 40. J. P. Foster, F. Weinhold, Natural hybrid orbitals,

J. Am. Chem. Soc. , 102, 7211-7218. 41. A. E. Reed, F. Weinhold, Natural bond orbital analysis of near-Hartree–Fock water dimer,

J. Chem. Phys . , 78, 4066–4073. 42. A. E. Reed, R. B. Weinstock and F. Weinhold, Natural population analysis, J. Chem. Phys., , 83, 735–746. 43. F. Weinhold, J.E. Carpenter (1988)

The Natural Bond Orbital Lewis Structure Concept for Molecules, Radicals, and Radical Ions . In: Naaman R., Vager Z. (eds)

The Structure of Small Molecules and Ions . Springer, Boston, MA. 44. E. D. Glendening, C. R. Landis, F. Weinhold, Natural bond orbital methods,

WIREs Comput Mol Sci , , 2, 1–42. 45. Foster, J. M., & Boys, S. Canonical configurational interaction procedure. Reviews of Modern Physics, , 32(2), 300–302. 46. Pipek, J., & Mezey, P. G. A fast intrinsic localization procedure applicable for abinitio and semiempirical linear combination of atomic orbital wave functions.

The Journal of Chemical Physics, , 90(9), 4916-4926. 47. Mayer, I., Bakó, I., & Stirling, A. Are there atomic orbitals in a molecule?.

The Journal of Physical Chemistry A , , 115(45), 12733-12737. 48. Li, Z., Li, H., Suo, B., & Liu, W. Localization of molecular orbitals: from fragments to molecule. Accounts of chemical research , , 47(9), 2758-2767. 49. Magnasco, V., & Perico, A. Uniform localization of atomic and molecular orbitals. I. The Journal of Chemical Physics , , 47(3), 971-981. 50. Jug, K. A maximum bond order principle. Journal of the American Chemical Society , , 99(24), 7800-7805. 51. Mayer, I. Some remarks on the maximum bond order. Theoretical Chemistry Accounts: Theory, Computation, and Modeling (Theoretica Chimica Acta), , 79(5), 377-378. 52. Edmiston, C., & Ruedenberg, K. Localized atomic and molecular orbitals.

Reviews of Modern Physics, , 35(3), 457–465. 53. Edmiston, C., & Krauss, M.. Pseudonatural Orbitals as a Basis for the Superposition of Configurations. I. He . J. Chem. Phys. , , 45(5), 1833-1839. 54. Edmiston, C., & Krauss, M. Configuration-Interaction Calculation of H and H . J. Chem. Phys. , , 42(3), 1119-1120. 55. Zoboki, T., & Mayer, I. Extremely localized nonorthogonal orbitals by the pairing theorem. Journal of computational chemistry , , 32(4), 689-695. 56. Clifford Dykstra, Gernot Frenking, Kwang Kim, Gustavo Scuseria. Theory and applications of computational chemistry: The first forty years -Elsevier (2005) 57. Nikolaienko, T. Yu.; Bulavin, L. A.; Hovorun, D. M. JANPA: An open source cross-platform implementation of the Natural Population Analysis on the Java platform,

Computational and Theoretical Chemistry , , 1050, 15-22. 58. JANPA: an open source cross-platform implementation of the Natural Population Analysis on the Java platform . http://janpa.sf.net (Accessed Apr. 10, 2018) 59. Thakkar A. J.; Tanner A. C.; Smith V. H.. Inter-relationships between various representations of one-matrices and related densities: A road map and an example. In Density Matrices and Density Functionals (pp. 327-337). Springer, Dordrecht, . 60. Coleman A. J.

RDMs: How did we get here? . In:

Many-Electron Densities and Reduced Density Matrices (pp. 1-17), Ed. Jerzy Cioslowski. Springer, Boston, MA, . 61. Davidson E. R. Reduced Density Matrices in Quantum Chemistry (

Theoretical chemistry: a series of monographs, v. 6 ); Academic Press Inc., . 62. Mayer I.

Bond orders and energy components: extracting chemical information from molecular wave functions , CRC Press, . 63. Wiberg K. B. Application of the pople-santry-segal CNDO method to the cyclopropylcarbinyl and cyclobutyl cation and to bicyclobutane,

Tetrahedron , , 24, 1083–1096. 4. Natiello M. A.; Medrano J. A. On the quantum theory of valence and bonding from the ab initio standpoint, Chem. Phys. Lett. , , 105, 180–182. 65. Kar T.; Sannigrahi A. B.; Mukherjee D. C. Comparison of atomic charges, valencies and bond orders in some hydrogen-bonded complexes calculated from Mulliken and Löwdin SCF density matrices, J. Mol. Struct.: THEOCHEM , , 153, 93–101. 66. Mayer I. Bond orders and valences from ab initio wave functions, Int. J. Quantum Chem. , , 29, 477–483. 67. Nikolaienko T. Yu.; Bulavin L. A.; Hovorun D. M. Can we treat ab initio atomic charges and bond orders as conformation-independent electronic structure descriptors? RSC Adv. , , 6, 74785-74796. 68. Byrne, C. L. Signal Processing: a mathematical approach . CRC Press, , p. 298. 69. Cardoso, J. F., & Souloumiac, A. Jacobi angles for simultaneous diagonalization.

SIAM journal on matrix analysis and applications , Multiple-View Spectral Clustering for Group-wise Functional Community Detection

Journal of Research of the National Bureau of Standards B , , 69(125-130), 55-56. 72. Korte B., Vygen J. Maximum Matchings. In: Combinatorial Optimization. Algorithms and Combinatorics , vol 21. Springer, Berlin, Heidelberg, . 73. Abrudan, T. E., Eriksson, J., & Koivunen, V. Steepest descent algorithms for optimization under unitary matrix constraint.

IEEE Transactions on Signal Processing , , 56(3), 1134-1147. 74. Keller J. B. Closest unitary, orthogonal and Hermitian operators to a given operator. Mathematics Magazine , , 48(4), 192-197. 75. Fan, K.; Hoffman, A. J. Some Metric Inequalities in the Space of Matrices. Proc. Amer. Math. Soc. , 6, 111-116. 76. Carlson B. C.; Keller J. M. Orthogonalization procedures and the localization of Wannier functions.

Physical Review , 105(1), 102. 77. Higham, N. J. Computing the polar decomposition—with applications.

SIAM Journal on Scientific and Statistical Computing , , 7(4), 1160-1174. 78. Philippe, B. An algorithm to improve nearly orthonormal sets of vectors on a vector processor. SIAM Journal on Algebraic Discrete Methods , , 8(3), 396-403. 79. Subotnik, J. E., Shao, Y., Liang, W., & Head-Gordon, M.. An efficient method for calculating maxima of homogeneous functions of orthogonal matrices: Applications to localized occupied orbitals. The Journal of chemical physics , , 121(19), 9220-9229. 80. Jensen, F. Atomic orbital basis sets. Wiley Interdisciplinary Reviews: Computational Molecular Science , , 3(3), 273-295. 81. Verhoeven, J. W. Glossary of terms used in photochemistry (IUPAC Recommendations 1996). Pure and Applied Chemistry , , 68(12), 2223-2286. 82. IUPAC. Compendium of Chemical Terminology, 2nd ed. (the "Gold Book"). Compiled by A. D. McNaught and A. Wilkinson. Blackwell Scientific Publications, Oxford (1997). XML on-line corrected version: http://goldbook.iupac.org (2006-) created by M. Nic, J. Jirat, B. Kosata; updates compiled by A. Jenkins. ISBN 0-9678550-9-8. https://doi.org/10.1351/goldbook. 83. Nakata M.; Shimazaki T. PubChemQC Project: a Large-Scale First-Principles Electronic Structure Database for Data-driven Chemistry, J. Chem. Inf. Model. , , 57 (6), 1300-1308. 84. Nakata M. The PubChemQC project: A large chemical database from the first principle calculations, AIP Conf. Proc. ,1702, 090058. 85.

The PubChemQC Project. http://pubchemqc.riken.jp/ (Accessed Apr. 10, 2018) 86. Kim, S.; Thiessen, P. A.; Bolton, E. E.; Chen, J.; Fu, G.; Gindulyte, A.; Han, L. Y.; He, J. E.; He, S. Q.; Shoemaker, B. A.; Wang, J. Y.; Yu, B.; Zhang, J.; Bryant, S. H.. PubChem substance and compound databases.

Nucleic acids research , , 44(D1), D1202-D1213. 87. Parrish, R. M.; Burns, L. A.; Smith, D. G. A.; Simmonett, A. C.; DePrince III, A. E.; Hohenstein, E. G.; Bozkaya, U.; Sokolov, A. Yu.; Di Remigio, R.; Richard, R. M.; Gonthier, J. F.; ames, A. M.; McAlexander, H. R.; Kumar, A.; Saitow, M.; Wang, X.; Pritchard, B. P.; Verma, P.; Schaefer III, H. F.; Patkowski, K.; King, R. A.; Valeev, E. F.; Evangelista, F. A.; Turney, J. M.; Crawford, T. D.; Sherrill, C. D. Psi4 1.1: An Open-Source Electronic Structure Program Emphasizing Automation, Advanced Libraries, and Interoperability, J. Chem. Theory Comput . , 13(7), 3185–3197. 88. Diercksen, G. H.; Roos, B. O.; Sadlej, A. J. Legitimate calculation of first-order molecular properties in the case of limited CI functions. Dipole moments. Chemical Physics , The Journal of chemical physics , , 87(10), 5976-5986. 90. Packer, M. J., Dalskov, E. K., Sauer, S. P., Oddershede, J. Correlated dipole polarizabilities and dipole moments of the halides HX and CH X (X= F, Cl and Br).

Theoretica chimica acta , The Journal of chemical physics , J. Phys. Chem.

96, 671–679 93. Tsuchimochi, T.; Ten-no, S. General technique for analytical derivatives of post-projected Hartree-Fock.

The Journal of chemical physics , Journal of chemical theory and computation , Journal of chemical theory and computation , Journal of chemical theory and computation ,9(12), 5365-5372.