A novel method for inference of chemical compounds with prescribed topological substructures based on integer programming
arXiv [cs.CE], BH-cyclic arXiv v5: December 4, 2020
Tatsuya Akutsu, Hiroshi Nagamochi
1. Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji 611-0011, Japan
2. Department of Applied Mathematics and Physics, Kyoto University, Kyoto 606-8501, Japan
Abstract
Analysis of chemical graphs is becoming a major research topic in computational molecular biology due to its potential applications to drug design. One of the major approaches in such a study is inverse quantitative structure activity/property relationships (inverse QSAR/QSPR) analysis, which is to infer chemical structures from given chemical activities/properties. Recently, a novel framework has been proposed for inverse QSAR/QSPR using both artificial neural networks (ANN) and mixed integer linear programming (MILP). This method consists of a prediction phase and an inverse prediction phase. In the first phase, a feature vector $f(G)$ of a chemical graph $G$ is introduced and a prediction function $\psi_{\mathcal{N}}$ on a chemical property $\pi$ is constructed with an ANN $\mathcal{N}$. In the second phase, given a target value $y^*$ of the chemical property $\pi$, a feature vector $x^*$ is inferred by solving an MILP formulated from the trained ANN $\mathcal{N}$ so that $\psi_{\mathcal{N}}(x^*)$ is equal to $y^*$, and then a set of chemical structures $G^*$ such that $f(G^*) = x^*$ is enumerated by a graph enumeration algorithm. The framework has been applied to chemical compounds with a rather abstract topological structure such as acyclic or monocyclic graphs and graphs with a specified polymer topology with cycle index up to 2.

In this paper, we propose a new flexible modeling method to the framework so that we can specify a topological substructure of graphs and a partial assignment of chemical elements and bond-multiplicity to a target graph.

Keywords: QSAR/QSPR, Molecular Design, Artificial Neural Network, Mixed Integer Linear Programming, Enumeration of Graphs
Mathematics Subject Classification:
Primary 05C92, 92E10; Secondary 05C30, 68T07, 90C11, 92-04
Graphs are a fundamental data structure in information science. Recently, design of novel graph structures has become a hot topic in artificial neural network (ANN) studies. In particular, extensive studies have been done on designing chemical graphs having desired chemical properties because of their potential application to drug design. For example, variational autoencoders [9], recurrent neural networks [21, 28], grammar variational autoencoders [13], generative adversarial networks [7], and invertible flow models [15, 22] have been applied.
[...] descriptors and correspond to feature vectors in machine learning. Using these chemical descriptors, various heuristic and statistical methods have been developed for finding optimal or near optimal chemical graphs [10, 16, 20]. In many of such methods, inference or enumeration of graph structures from a given set of descriptors is a crucial subtask, and thus various methods have been developed [8, 12, 14, 19]. However, enumeration in itself is a challenging task, since the number of molecules (i.e., chemical graphs) with up to 30 atoms (vertices) C, N, O, and S may exceed $10^{60}$ [5]. Furthermore, even inference is a challenging task since it is NP-hard except for some simple cases [1, 17]. Indeed, most existing methods including ANN-based ones do not guarantee optimal or exact solutions.

In order to guarantee the optimality mathematically, a novel approach has been proposed [2] for ANNs, using mixed integer linear programming (MILP). However, this method outputs feature vectors only, not chemical structures. To overcome this issue, a new framework has been proposed [3, 6, 29] by combining two previous approaches: efficient enumeration of tree-like graphs [8], and MILP-based formulation of the inverse problem on ANNs [2]. This combined framework for inverse QSAR/QSPR mainly consists of two phases. The first phase solves (I) Prediction Problem, where a feature vector $f(G)$ of a chemical graph $G$ is introduced and a prediction function $\psi_{\mathcal{N}}$ on a chemical property $\pi$ is constructed with an ANN $\mathcal{N}$ using a data set of chemical compounds $G$ and their values $a(G)$ of $\pi$. The second phase solves (II) Inverse Problem, where (II-a) given a target value $y^*$ of the chemical property $\pi$, a feature vector $x^*$ is inferred from the trained ANN $\mathcal{N}$ so that $\psi_{\mathcal{N}}(x^*)$ is close to $y^*$, and (II-b) then a set of chemical structures $G^*$ such that $f(G^*) = x^*$ is enumerated by a graph search algorithm.
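The two phases can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration and is not taken from the framework itself: a "chemical graph" is just a sequence of element symbols along a path, the feature function counts atoms, a fixed linear map `psi` stands in for a trained ANN, and both (II-a) and (II-b) are done by brute force instead of by an MILP and a dedicated enumeration algorithm.

```python
from itertools import product

# Toy stand-ins (assumptions for illustration only):
# a "chemical graph" is a tuple of element symbols along a path.

def feature(graph):
    """Feature function f: (number of atoms, number of O atoms, number of N atoms)."""
    return (len(graph), graph.count("O"), graph.count("N"))

def psi(x):
    """Hypothetical prediction function; a fixed linear map stands in for a trained ANN."""
    n, n_o, n_n = x
    return 10.0 * n + 25.0 * n_o + 15.0 * n_n

def infer_vectors(y_target, n_max, tol=1e-9):
    """(II-a) by brute force: all integer vectors x with psi(x) within tol of y_target."""
    return [
        (n, n_o, n_n)
        for n in range(1, n_max + 1)
        for n_o in range(n + 1)
        for n_n in range(n + 1 - n_o)
        if abs(psi((n, n_o, n_n)) - y_target) <= tol
    ]

def enumerate_graphs(x):
    """(II-b) by brute force: all label sequences G with f(G) = x."""
    return [g for g in product("CNO", repeat=x[0]) if feature(g) == x]

y_star = 55.0  # hypothetical target value
for x_star in infer_vectors(y_star, n_max=3):
    print(x_star, ["".join(g) for g in enumerate_graphs(x_star)])
```

The sketch only mirrors the division of labor: a vector $x^*$ is found first, and graphs realizing exactly that vector are listed afterwards. Brute force is feasible here only because the toy search space is tiny.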
In (II-a) of the above-mentioned previous methods [3, 6, 29], an MILP is formulated for acyclic chemical compounds. Afterwards, Ito et al. [11] and Zhu et al. [30] designed a method of inferring chemical graphs with rank (or cycle index) 1 and 2, respectively, by formulating a new MILP and using an efficient algorithm for enumerating chemical graphs with rank 1 [24] and rank 2 [26, 27]. The computational results conducted on instances with $n$ non-hydrogen atoms show that a feature vector $x^*$ can be inferred for up to around $n = 40$, whereas graphs $G^*$ can be enumerated for up to around $n = 15$. Recently Azam et al. [4] introduced a new characterization of acyclic graph structure, called "branch-height," to define a class of acyclic graphs with a restricted structure that still covers most of the acyclic chemical compounds in the database. They also employed the dynamic programming method to design a new algorithm for generating chemical acyclic graphs, which now works for instances with size $n(G^*) = 50$.

The framework has been applied so far to a case of chemical compounds with a rather abstract topological structure such as acyclic or monocyclic graphs and graphs with a specified polymer topology with rank up to 2. When there is a more specific requirement on some part of the graph structure and the assignment of chemical elements in a chemical graph to be inferred, none of the above-mentioned methods can be used directly. The main reason is that generating chemical graphs from a given feature vector is a considerably hard problem: an efficient algorithm needs to be newly designed for each different class of graphs. In this paper, we discover a new mechanism of [...]

[...] $G^\dagger$ in the sense that all generated chemical graphs $G^*$ have the same feature vector $f(G^*) = f(G^\dagger)$. Section 7 makes some concluding remarks. Appendix A describes the details of all variables and constraints in our MILP formulation.
This section introduces some notions and terminology on graphs, a modeling of chemical compounds and our choice of descriptors.

Let $\mathbb{R}$, $\mathbb{Z}$ and $\mathbb{Z}_+$ denote the sets of reals, integers and non-negative integers, respectively. For two integers $a$ and $b$, let $[a, b]$ denote the set of integers $i$ with $a \le i \le b$.

Multi-digraphs
A multi-digraph $G$ is defined to be a pair of a set $V$ of vertices and a set $E$ of directed edges such that each edge $e \in E$ corresponds to an ordered pair $(u, v)$ of vertices, where $u$ and $v$ are called the tail and the head of $e$ and denoted by $\mathrm{tail}(e)$ and $\mathrm{head}(e)$, respectively. A multi-digraph $G$ may contain an edge $e \in E$ with $\mathrm{head}(e) = \mathrm{tail}(e)$, which is called a self-loop; or two edges $e, e' \in E$ with the same pair of tail and head, which are called multiple edges.

Let $G = (V, E)$ be a multi-digraph. For each vertex $v \in V$, we define the sets as follows:

$E^-_G(v) \triangleq \{ e \in E \mid \mathrm{head}(e) = v \}$, $E^+_G(v) \triangleq \{ e \in E \mid \mathrm{tail}(e) = v \}$,
$N^-_G(v) \triangleq \{ \mathrm{tail}(e) \in V \mid e \in E^-_G(v) \}$, $N^+_G(v) \triangleq \{ \mathrm{head}(e) \in V \mid e \in E^+_G(v) \}$.

The in-degree $\deg^-_G(v)$ and out-degree $\deg^+_G(v)$ of a vertex $v \in V$ are defined to be $\deg^-_G(v) \triangleq |E^-_G(v)|$ and $\deg^+_G(v) \triangleq |E^+_G(v)|$, respectively. Given a multi-digraph $G$, let $V(G)$ and $E(G)$ denote the sets of vertices and edges, respectively.

Multigraphs
The graph obtained from a multi-digraph by ignoring the order between the head and the tail of each edge is called a multigraph, where the head and the tail of an edge are called the end-vertices of the edge. Let $V_G(e)$ denote the set of end-vertices of an edge $e$. A multigraph $G$ may contain an edge $e \in E$ with only one end-vertex, which is called a self-loop; or two edges $e, e' \in E$ with the same pair of end-vertices, which are called multiple edges. A multigraph with no self-loops and no multiple edges is called simple.

Let $G = (V, E)$ be a multigraph. For each vertex $v \in V$, we define the sets as follows:

$E_G(v) \triangleq \{ e \in E \mid v \in V_G(e) \}$, $N_G(v) \triangleq \{ u \in V_G(e) \setminus \{v\} \mid e \in E_G(v) \}$.

The degree $\deg_G(v)$ of a vertex $v \in V$ is defined to be $\deg_G(v) \triangleq |E_G(v)|$, where $\deg_G(v) = p + 2q$ for the number $p$ of non-loop edges incident to $v$ and the number $q$ of self-loops incident to $v$.

The length of a path is defined to be the number of edges in the path. Denote by $\ell(P)$ the length of a path $P$. A simple connected graph is called a tree if it contains no cycle and is called cyclic otherwise. For two multigraphs $G_1$ and $G_2$, we write $G_1 \simeq G_2$ when they are isomorphic. Given a multigraph $G$, let $V(G)$ and $E(G)$ denote the sets of vertices and edges, respectively.

Rank of Multigraphs
The rank $r(G)$ of a multigraph $G$ is defined to be the minimum number of edges to be removed to make the multigraph a tree (a simple and connected graph). We call a multigraph $G$ with $r(G) = k$ a rank-$k$ graph.

Rooted Trees
A rooted tree is defined to be a tree where a vertex (or a pair of adjacent vertices) is designated as the root. Let $T$ be a rooted tree, where for two adjacent vertices $u$ and $v$, vertex $u$ is called the parent of $v$ if $u$ is closer to the root than $v$ is. The height $\mathrm{height}(v)$ of a vertex $v$ in $T$ is defined to be the maximum length of a path from $v$ to a leaf $u$ in the descendants of $v$, where $\mathrm{height}(v) = 0$ for each leaf $v$ in $T$. The height $\mathrm{ht}(T)$ of a rooted tree $T$ is defined to be the height $\mathrm{height}(r)$ of the root $r$.

Bi-rooted Trees
As an extension of rooted trees, we define a bi-rooted tree to be a tree $T$ with two designated vertices $r_1(T)$ and $r_2(T)$, called terminals. Let $T$ be a bi-rooted tree. Define the backbone path $P_T$ to be the path of $T$ between terminals $r_1(T)$ and $r_2(T)$, and denote by $\mathcal{F}(T)$ (or by $\mathcal{F}(P_T)$) the set of subtrees of $T$ in the graph $T - E(P_T)$ obtained from $T$ by removing the edges in $P_T$, where we regard each tree $T' \in \mathcal{F}(T)$ as a tree rooted at the unique vertex in $V(T') \cap V(P_T)$. The height $\mathrm{ht}(T)$ of $T$ is defined to be the maximum of the heights of rooted trees in $\mathcal{F}(T)$. We may regard a rooted tree $T$ as a bi-rooted tree $T$ with $r_1(T) = r_2(T)$.

Degree-bounded Trees
For positive integers $a$, $b$ and $c$ with $b \ge 2$, let $T(a, b, c)$ denote the rooted tree such that the number of children of the root is $a$, the number of children of each non-root internal vertex is $b$ and the distance from the root to each leaf is $c$. Figure 1(a)-(d) illustrate rooted trees $T(a, b, c)$ with $(a, b, c) \in \{(1, 2, 2), (2, 2, 2), (2, 3, 2), (3, 3, 2)\}$.

We see that the number of vertices in $T(a, b, c)$ is $a(b^c - 1)/(b - 1) + 1$, and the number of non-leaf vertices in $T(a, b, c)$ is $a(b^{c-1} - 1)/(b - 1) + 1$. In the rooted tree $T(a, b, c)$, we denote the vertices by $v_0, v_1, \ldots, v_{n-1}$ with a breadth-first-search order, and denote the edge between a vertex $v_i$ with $i \in [1, n-1]$ and its parent by $e_i$, where $n = a(b^c - 1)/(b - 1) + 1$ and each vertex $v_i$ with $i \in [0, a(b^{c-1} - 1)/(b - 1)]$ is a non-leaf vertex. For each vertex $v_i$ in $T(a, b, c)$, let $\mathrm{Cld}(i)$ denote the set of indices $j$ such that $v_j$ is a child of $v_i$, and $\mathrm{prt}(i)$ denote the index $j$ such that $v_j$ is the parent of $v_i$ when $i \in [1, n-1]$.

Two rooted trees are called isomorphic if they admit a graph isomorphism such that the two roots correspond to each other. Let $\mathcal{T}(a, b, c)$ denote the set of subtrees of $T(a, b, c)$ that have [...] $T(a, b, c)$. Let $P_{\mathrm{prc}}(a, b, c)$ be a set of ordered index pairs $(i, j)$ of vertices $v_i$ and $v_j$ in $T(a, b, c)$, and let $\mathcal{T}_{\mathrm{prc}}(a, b, c)$ denote the set of subtrees $T \in \mathcal{T}(a, b, c)$ such that, for each pair $(i, j) \in P_{\mathrm{prc}}(a, b, c)$, $T$ contains vertex $v_i$ if it contains vertex $v_j$. We call $P_{\mathrm{prc}}(a, b, c)$ proper if the next conditions hold:

(a) Each subtree $T \in \mathcal{T}(a, b, c)$ is isomorphic to a subtree $T' \in \mathcal{T}_{\mathrm{prc}}(a, b, c)$ such that for each pair $(i, j) \in P_{\mathrm{prc}}(a, b, c)$, if $v_j \in V(T')$ then $v_i \in V(T')$; and
(b) For each pair of vertices $v_i$ and $v_j$ in $T(a, b, c)$ such that $v_i$ is the parent of $v_j$, there is a sequence $(i_1, i_2), (i_2, i_3), \ldots, (i_{k-1}, i_k)$ of index pairs in $P_{\mathrm{prc}}(a, b, c)$ such that $i_1 = i$ and $i_k = j$.

Condition (b) can be used to reduce the size of a proper set $P_{\mathrm{prc}}(a, b, c)$ by omitting some pair $(i, j)$ of indices of a vertex $v_j$ and the parent $v_i$ of $v_j$. Note that a proper set $P_{\mathrm{prc}}(a, b, c)$ is not necessarily unique.

For the rooted trees in Figure 1, we obtain proper sets of ordered index pairs as follows (the second index of each pair is missing in this copy):

$P_{\mathrm{prc}}(1, 2, 2) = \{(0,\ ), (1,\ ), (2,\ )\}$,
$P_{\mathrm{prc}}(2, 2, 2) = \{(0,\ ), (1,\ ), (1,\ ), (2,\ ), (3,\ ), (3,\ ), (4,\ ), (5,\ )\}$,
$P_{\mathrm{prc}}(2, 3, 2) = \{(0,\ ), (1,\ ), (1,\ ), (2,\ ), (3,\ ), (3,\ ), (4,\ ), (4,\ ), (5,\ ), (6,\ ), (7,\ )\}$ and
$P_{\mathrm{prc}}(3, 3, 2) = \{(0,\ ), (1,\ ), (1,\ ), (2,\ ), (2,\ ), (3,\ ), (4,\ ), (4,\ ), (5,\ ), (5,\ ), (6,\ ), (7,\ ), (7,\ ), (8,\ ), (8,\ ), (9,\ ), (10,\ ), (11,\ )\}$.

With these proper sets, we see that every rooted tree $T \in \mathcal{T}_{\mathrm{prc}}(a, b, c)$ satisfies a special property that the leftmost path (or the path that visits children with the smallest index) from the root is always of the length of the height $\mathrm{ht}(T)$.

Figure 1: An illustration of rooted trees $T(a, b, c)$: (a) $T(1, 2, 2)$; (b) $T(2, 2, 2)$; (c) $T(2, 3, 2)$; (d) $T(3, 3, 2)$.

Branch-height in Trees
Azam et al. [4] introduced "branch-height" of a tree as a new measure to the "agglomeration degree" of trees. We specify a non-negative integer $\rho$, called a branch-parameter, to define branch-height. First we regard $T$ as a rooted tree by choosing the center of $T$ as the root. We use the following notions for the rooted tree $T$.

- A leaf $\rho$-branch: a non-root vertex $v$ in $T$ such that $\mathrm{height}(v) = \rho$.
- A non-leaf $\rho$-branch: a vertex $v$ in $T$ such that $v$ has at least two children $u$ with $\mathrm{height}(u) \ge \rho$. We call a leaf or non-leaf $\rho$-branch a $\rho$-branch.
- A $\rho$-branch-path: a path $P$ in $T$ that joins two vertices $u$ and $u'$ such that each of $u$ and $u'$ is the root or a $\rho$-branch and $P$ does not contain the root or a $\rho$-branch as an internal vertex.
- The $\rho$-branch-subtree of $T$: the subtree of $T$ that consists of the edges in all $\rho$-branch-paths of $T$. We call a vertex (resp., an edge) in $T$ a $\rho$-internal vertex (resp., a $\rho$-internal edge) if it is contained in the $\rho$-branch-subtree of $T$ and a $\rho$-external vertex (resp., a $\rho$-external edge) otherwise. Let $V^{\mathrm{in}}$ and $V^{\mathrm{ex}}$ (resp., $E^{\mathrm{in}}$ and $E^{\mathrm{ex}}$) denote the sets of $\rho$-internal and $\rho$-external vertices (resp., edges) in $T$.
- The $\rho$-branch-tree of $T$: the rooted tree obtained from the $\rho$-branch-subtree of $T$ by replacing each $\rho$-branch-path with a single edge.
- A $\rho$-fringe-tree: one of the connected components that consists of the edges not in any $\rho$-branch-subtree. Each $\rho$-fringe-tree $T'$ contains exactly one vertex $v$ in a $\rho$-branch-subtree, where $T'$ is regarded as a tree rooted at $v$. Note that the height of any $\rho$-fringe-tree is at most $\rho$.
- The $\rho$-branch-number $\mathrm{bn}_\rho(T)$: the number of $\rho$-branches in $T$.
- The $\rho$-branch-height $\mathrm{bh}_\rho(T)$ of $T$: the maximum number of non-root $\rho$-branches along a path from the root to a leaf of $T$; i.e., $\mathrm{bh}_\rho(T)$ is the height of the $\rho$-branch-tree $T^*$ (the maximum length of a path from the root to a leaf in $T^*$).

Core in Cyclic Graphs
Let $H$ be a connected simple graph with rank $r(H) \ge 1$. The core $\mathrm{Cr}(H)$ of $H$ is defined to be an induced subgraph $\mathrm{Cr}(H) = (V' = V'_1 \cup V'_2, E')$ such that $V'_1$ is the set of vertices in a cycle of $H$ and $V'_2$ is the set of vertices each of which is in a path between two vertices $u, v \in V'_1$. A vertex (resp., an edge) in $H$ is called a core-vertex (resp., core-edge) if it is contained in the core $\mathrm{Cr}(H)$ and is called a non-core-vertex (resp., non-core-edge) otherwise. We denote by $V^{\mathrm{co}}$ (resp., $V^{\mathrm{nc}}$) and $E^{\mathrm{co}}$ (resp., $E^{\mathrm{nc}}$) the set of core-vertices (resp., non-core-vertices) and the set of core-edges (resp., non-core-edges) in $H$. The core size $\mathrm{cs}(H)$ is defined to be the number $|V^{\mathrm{co}}|$ of core-vertices in the core of $H$.

Figure 2 illustrates three examples of rank-2 graphs $H_i$, $i = 1, 2, 3$, where $\mathrm{cs}(H_1) = 17$, $\mathrm{ch}(H_1) = 6$, $\mathrm{cs}(H_2) = 12$, $\mathrm{ch}(H_2) = 3$, $\mathrm{cs}(H_3) = 12$ and $\mathrm{ch}(H_3) = 5$.

A connected component in the subgraph induced by the non-core-vertices of $H$ is called a non-core component of $H$. Each non-core component $T$ contains exactly one non-core-vertex $v_T \in V^{\mathrm{nc}}$ that is adjacent to a core-vertex $u_T \in V^{\mathrm{co}}$, where the tree $T'$ that consists of $T$ and edge $v_T u_T \in E^{\mathrm{nc}}$ is called a pendant-tree of $H$, regarded as a tree rooted at the core-vertex $u_T$. The core height $\mathrm{ch}(H)$ is defined to be the maximum height $\mathrm{ht}(T)$ of a pendant-tree $T$ of $H$.

Figure 2: An illustration of rank-2 graphs $H_i$, $i = 1, 2, 3$, where the core-vertices are depicted with squares, the 2-branch vertices are depicted with gray circles and non-core edges are depicted as directed edges with arrows: (a) $H_1$ is 2-lean, $\mathrm{cs}(H_1) = 17$, $\mathrm{ch}(H_1) = 6$, $\mathrm{bh}_2(H_1) = 1$ and $\mathrm{bc}_2(H_1) = \mathrm{bl}_2(H_1) = 2$; (b) $H_2$ is not 2-lean, $\mathrm{cs}(H_2) = 12$, $\mathrm{ch}(H_2) = 3$, $\mathrm{bh}_2(H_2) = 1$, $\mathrm{bc}_2(H_2) = 1$ and $\mathrm{bl}_2(H_2) = 2$; (c) $H_3$ is not 2-lean, $\mathrm{cs}(H_3) = 12$, $\mathrm{ch}(H_3) = 5$, $\mathrm{bh}_2(H_3) = 2$, $\mathrm{bc}_2(H_3) = 1$ and $\mathrm{bl}_2(H_3) = 2$.

A core-path $P$ of a graph $H$ is defined to be a subgraph of the core $\mathrm{Cr}(H) = (V^{\mathrm{co}}, E^{\mathrm{co}})$ such that the degree of each internal vertex $v$ of $P$ is 2 in the core; i.e., $\deg_P(v) = \deg_{\mathrm{Cr}(H)}(v) = 2$. A path-partition $\mathcal{P}$ of the core $\mathrm{Cr}(H)$ is defined to be a collection of core-paths $P_i$ with $\ell(P_i) \ge 1$, $i \in [1, p]$, such that each core-edge belongs to exactly one core-path in $\mathcal{P}$; i.e.,

$\bigcup_{i \in [1, p]} E(P_i) = E^{\mathrm{co}}$, $E(P_i) \cap E(P_j) = \emptyset$, $1 \le i < j \le p$.

For example, the core $\mathrm{Cr}(H_1)$ in Figure 2(a) admits a path-partition $\mathcal{P} = \{P_1, P_2, \ldots, P_5\}$ into five core-paths on 5, 4, 5, 5 and 4 vertices, respectively.

Branch-height in Cyclic Graphs
Let $H$ be a connected simple graph with rank $r(H) \ge 1$. For an integer $\rho \ge 0$, we define $\rho$-fringe-tree, leaf $\rho$-branch, $\rho$-branch, $\rho$-branch-path, $\rho$-branch-subtree, $\rho$-internal/$\rho$-external vertex/edge, $\rho$-branch-tree and $\rho$-branch-height in each pendant-tree of $H$ analogously, where we do not regard any core-vertex as a $\rho$-branch. A non-core-vertex (resp., non-core-edge) in $H$ is called a $\rho$-internal vertex (resp., edge) if it is contained in the $\rho$-branch-subtree of some pendant-tree of $H$, and a $\rho$-external vertex (resp., edge) otherwise, in which case it is in some $\rho$-fringe-tree of $H$. Let $V^{\mathrm{in}}$ and $V^{\mathrm{ex}}$ (resp., $E^{\mathrm{in}}$ and $E^{\mathrm{ex}}$) denote the sets of $\rho$-internal and $\rho$-external vertices (resp., edges) in $H$, where $V^{\mathrm{nc}} = V^{\mathrm{in}} \cup V^{\mathrm{ex}}$ and $E^{\mathrm{nc}} = E^{\mathrm{in}} \cup E^{\mathrm{ex}}$.

Define the $\rho$-branch-leaf-number $\mathrm{bl}_\rho(H)$ of $H$ to be the number of leaf $\rho$-branches in $H$ and the $\rho$-branch-height $\mathrm{bh}_\rho(H)$ to be the maximum $\rho$-branch-height $\mathrm{bh}_\rho(T)$ over all pendant-trees $T$ of $H$. We call a pendant-tree of $H$ a $\rho$-pendant-tree if it contains at least one $\rho$-branch. We call a core-vertex adjacent to a $\rho$-pendant-tree a $\rho$-branch-core-vertex, denote by $V^{\mathrm{bc}}_\rho$ the set of $\rho$-branch-core-vertices and define the $\rho$-branch-core-size $\mathrm{bc}_\rho(H)$ to be $|V^{\mathrm{bc}}_\rho|$. Note that $\mathrm{cs}(H) \ge \mathrm{bc}_\rho(H)$, $\mathrm{bl}_\rho(H) \ge \mathrm{bc}_\rho(H)$ and either $\rho > \mathrm{ch}(H)$ or $\rho \le \mathrm{ch}(H)$ with $\mathrm{ch}(H) \ge \mathrm{bh}_\rho(H) + \rho$.

We call $H$ $\rho$-lean if $\mathrm{bl}_\rho(H) = \mathrm{bc}_\rho(H)$; i.e., all $\rho$-branches in $H$ are leaf $\rho$-branches and no two $\rho$-pendant-trees share the same $\rho$-branch-core-vertex. Note that the $\rho$-branch-height of any $\rho$-lean graph is at most 1. Figure 2 illustrates three examples of rank-2 graphs. In the first example, there are two leaf 2-branches and two 2-branch-core-vertices, so $\mathrm{bc}_2(H_1) = \mathrm{bl}_2(H_1) = 2$ holds and $H_1$ is 2-lean. In the second example, there are two leaf 2-branches and a single 2-branch-core-vertex, so $\mathrm{bc}_2(H_2) = 1 < \mathrm{bl}_2(H_2) = 2$ holds and $H_2$ is not 2-lean. In the third example, there are two leaf 2-branches, one non-leaf 2-branch and a single 2-branch-core-vertex, so $\mathrm{bc}_2(H_3) = 1 < \mathrm{bl}_2(H_3) = 2$ holds and $H_3$ is not 2-lean.

We here show some statistical features of the chemical graphs in PubChem in terms of rank of graphs and $\rho$-branch-height (see [4] for more details).

- Nearly 87% (resp., 99%) of rank-4 chemical compounds with up to 100 non-hydrogen atoms in PubChem have the maximum degree 3 (resp., 4) of non-core-vertices.
- Nearly 84% of the chemical compounds in the chemical database PubChem have rank at most 4.
- Over 87% (resp., 96%) of rank-1 or rank-2 (resp., rank-3 or rank-4) chemical compounds with up to 50 non-hydrogen atoms in PubChem have the 2-branch-height $\mathrm{bh}_2(G)$ at most 1.
- Over 92% of 2-fringe-trees of chemical compounds with up to 100 non-hydrogen atoms in PubChem obey the following size constraint:
  $n \le d + 2$ for each 2-fringe-tree $T$ with $n$ vertices and $d$ children of the root. (1)
- For $\rho = 2$, nearly 97% of cyclic chemical compounds with up to 100 non-hydrogen atoms in PubChem are 2-lean.

Polymer Topology
A multigraph is called a polymer topology if it is connected and the degree of every vertex is at least 3. Tezuka and Oike [25] pointed out that a classification of polymer topologies will lay a foundation for elucidation of structural relationships between different macro-chemical molecules and their synthetic pathways. For integers $r \ge 2$ and $d \ge 3$, let $\mathcal{PT}(r, d)$ denote the set of all rank-$r$ polymer topologies with maximum degree at most $d$. For example, there are three polymer topologies in $\mathcal{PT}(2, 4)$. The polymer topology $\mathrm{Pt}(H)$ of a multigraph $H$ with $r(H) \ge 2$ is defined to be a multigraph $H'$ of degree at least 3 that is obtained from the core $\mathrm{Cr}(H)$ by contracting all vertices of degree 2. Note that $r(\mathrm{Pt}(H)) = r(H)$.

We represent the graph structure of a chemical compound as a graph $H$ with labels on vertices and multiplicity on edges in a hydrogen-suppressed model. We treat a cyclic graph $H$ as a mixed graph (a graph possibly with undirected and directed edges) by regarding each non-core-edge $uv \in E^{\mathrm{nc}}$ as a directed edge $(u, v)$ such that $u$ is the parent of $v$ in some pendant-tree of $H$. Each of the examples of rank-2 graphs in Figure 2 is represented as a mixed graph where non-core-edges are regarded as directed edges.

Figure 3: An illustration of rank-2 graphs and rank-2 multigraphs: (a), (b), (c) the cores $\mathrm{Cr}(H_i)$, $i = 1, 2, 3$, of the graphs $H_i$ in Figure 2; (d), (e), (f) multigraphs $M_i \in \mathcal{PT}(2, 4)$, $i = 1, 2, 3$, where $M_i \simeq \mathrm{Pt}(H_i)$.

Let $\Lambda$ be a set of labels each of which represents a chemical element such as C (carbon), O (oxygen), N (nitrogen) and so on, where we assume that $\Lambda$ does not contain H (hydrogen). Let $\mathrm{mass}(\mathtt{a})$ and $\mathrm{val}(\mathtt{a})$ denote the mass and valence of a chemical element $\mathtt{a} \in \Lambda$, respectively. In our model, we use integers $\mathrm{mass}^*(\mathtt{a}) = \lfloor 10 \cdot \mathrm{mass}(\mathtt{a}) \rfloor$, $\mathtt{a} \in \Lambda$, and assume that each chemical element $\mathtt{a} \in \Lambda$ has a unique valence $\mathrm{val}(\mathtt{a}) \in [1, 4]$.

We introduce a total order $<$ over the elements in $\Lambda$ according to their mass values; i.e., we write $\mathtt{a} < \mathtt{b}$ for chemical elements $\mathtt{a}, \mathtt{b} \in \Lambda$ with $\mathrm{mass}(\mathtt{a}) < \mathrm{mass}(\mathtt{b})$. To represent how two atoms $\mathtt{a}$ and $\mathtt{b}$ are joined in a chemical graph, we define some notions. A tuple $(\mathtt{a}, \mathtt{b}, m)$ with chemical elements $\mathtt{a}, \mathtt{b}$ and a bond-multiplicity $m$, called an adjacency-configuration, was used to represent a pair of atoms $\mathtt{a}$ and $\mathtt{b}$ joined by a bond-multiplicity $m$ [6]. In this paper, we introduce "edge-configuration," a refined notion of adjacency-configuration.

We represent an atom $\mathtt{a} \in \Lambda$ with $i$ neighbors in a chemical compound by a pair $(\mathtt{a}, i)$ of the chemical element $\mathtt{a}$ and the degree $i$, which we call a chemical symbol. For notational convenience, we write a chemical symbol $(\mathtt{a}, i)$ (resp., $(\mathtt{a}, i + j)$) as $\mathtt{a}i$ (resp., $\mathtt{a}\{i + j\}$). Define the set of the chemical symbols to be

$\Lambda_{\mathrm{dg}}(\Lambda, \mathrm{val}) \triangleq \{ \mathtt{a}i \mid \mathtt{a} \in \Lambda,\ i \in [1, \mathrm{val}(\mathtt{a})] \}$.

We extend the total order $<$ over $\Lambda$ to one over the elements in $\Lambda_{\mathrm{dg}}(\Lambda, \mathrm{val})$ so that $\mathtt{a}i < \mathtt{b}j$ if and only if "$\mathtt{a} < \mathtt{b}$" or "$\mathtt{a} = \mathtt{b}$ and $i < j$."

To represent how two atoms $\mathtt{a}$ and $\mathtt{b}$ are joined in a chemical compound, we use a tuple $(\mathtt{a}i, \mathtt{b}j, m)$, $\mathtt{a}i, \mathtt{b}j \in \Lambda_{\mathrm{dg}}(\Lambda, \mathrm{val})$, $m \in [1, 3]$, such that $i$ (resp., $j$) is the number of neighbors of the atom $\mathtt{a}$ (resp., $\mathtt{b}$) and $m$ is the bond-multiplicity between these atoms. We call the tuple $(\mathtt{a}i, \mathtt{b}j, m)$ the edge-configuration of the pair of adjacent atoms. Let $\Lambda'_{\mathrm{dg}}$ be a subset of $\Lambda_{\mathrm{dg}}(\Lambda, \mathrm{val})$. We denote by $\Gamma(\Lambda'_{\mathrm{dg}})$ the set of all tuples $\gamma = (\mathtt{a}i, \mathtt{b}j, m) \in \Lambda'_{\mathrm{dg}} \times \Lambda'_{\mathrm{dg}} \times [1, 3]$ such that $m \le \min\{\mathrm{val}(\mathtt{a}) - i, \mathrm{val}(\mathtt{b}) - j\} + 1$, since an atom of degree $i$ spends at least $i - 1$ units of valence on its other neighbors. For a tuple $\gamma = (\mathtt{a}i, \mathtt{b}j, m) \in \Gamma(\Lambda'_{\mathrm{dg}})$, let $\overline{\gamma}$ denote the tuple $(\mathtt{b}j, \mathtt{a}i, m)$. Define sets

$\Gamma_<(\Lambda'_{\mathrm{dg}}) \triangleq \{ \gamma = (\mathtt{a}i, \mathtt{b}j, m) \in \Gamma(\Lambda'_{\mathrm{dg}}) \mid \mathtt{a}i < \mathtt{b}j \}$,
$\Gamma_=(\Lambda'_{\mathrm{dg}}) \triangleq \{ \gamma = (\mathtt{a}i, \mathtt{b}j, m) \in \Gamma(\Lambda'_{\mathrm{dg}}) \mid \mathtt{a}i = \mathtt{b}j \}$,
$\Gamma_>(\Lambda'_{\mathrm{dg}}) \triangleq \{ \gamma = (\mathtt{a}i, \mathtt{b}j, m) \in \Gamma(\Lambda'_{\mathrm{dg}}) \mid \mathtt{a}i > \mathtt{b}j \}$.

As components of a chemical graph to be inferred, we choose sets

$\Lambda^{\mathrm{co}}_{\mathrm{dg}} \subseteq \Lambda_{\mathrm{dg}}(\Lambda, \mathrm{val})$, $\Lambda^{\mathrm{nc}}_{\mathrm{dg}} \subseteq \Lambda_{\mathrm{dg}}(\Lambda, \mathrm{val})$,
$\Gamma^{\mathrm{co}} \subseteq \Gamma_<(\Lambda^{\mathrm{co}}_{\mathrm{dg}}) \cup \Gamma_=(\Lambda^{\mathrm{co}}_{\mathrm{dg}})$, $\Gamma^{\mathrm{nc}} \subseteq \Gamma(\Lambda^{\mathrm{nc}}_{\mathrm{dg}})$

such that $i \ge 2$ for any symbol $\mathtt{a}i \in \Lambda^{\mathrm{co}}_{\mathrm{dg}}$, where the degree of any vertex in the core of a cyclic graph is at least 2.

Let $e = uv$ be an edge in a chemical graph $G$ such that $\mathtt{a}, \mathtt{b} \in \Lambda$ are assigned to the vertices $u$ and $v$ with $\deg_G(u) = i$ and $\deg_G(v) = j$, respectively, and the bond-multiplicity between them is $m$. When $uv$ is a core-edge which is regarded as an undirected edge, the edge-configuration $\tau(e)$ of edge $e$ is defined to be $(\mathtt{a}i, \mathtt{b}j, m)$ if $(\mathtt{a}i, \mathtt{b}j, m) \in \Gamma^{\mathrm{co}}$ (or $(\mathtt{b}j, \mathtt{a}i, m)$ otherwise). When $uv$ is a non-core-edge which is regarded as a directed edge $(u, v)$ where $u$ is the parent of $v$ in some pendant-tree, the edge-configuration $\tau(e)$ of edge $e$ is defined to be $(\mathtt{a}i, \mathtt{b}j, m) \in \Gamma^{\mathrm{nc}}$.

When a branch-parameter $\rho$ is specified, we choose sets

$\Gamma^{\mathrm{in}} \subseteq \Gamma^{\mathrm{nc}}$, $\Gamma^{\mathrm{ex}} \subseteq \Gamma^{\mathrm{nc}}$

such that $i, j \ge 2$ for any tuple $(\mathtt{a}i, \mathtt{b}j, m) \in \Gamma^{\mathrm{in}}$, where the degree of any $\rho$-internal vertex is at least 2.

We use a hydrogen-suppressed model because hydrogen atoms can be added at the final stage. A chemical cyclic graph over $\Lambda$ and $\Gamma = \Gamma^{\mathrm{co}} \cup \Gamma^{\mathrm{nc}}$ is defined to be a tuple $G = (H, \alpha, \beta)$ of a cyclic graph $H = (V, E)$, a function $\alpha : V \to \Lambda$ and a function $\beta : E \to [1, 3]$ such that

(i) $H$ is connected;
(ii) $\sum_{uv \in E} \beta(uv) \le \mathrm{val}(\alpha(u))$ for each vertex $u \in V$; and
(iii) $\tau(e) \in \Gamma^{\mathrm{co}}$ for each core-edge $e \in E$; and $\tau(e) \in \Gamma^{\mathrm{nc}}$ for each directed non-core-edge $e \in E$.

For notational convenience, we denote the sum of bond-multiplicities of edges incident to a vertex $u$ as follows:

$\beta(u) \triangleq \sum_{uv \in E} \beta(uv)$ for each vertex $u \in V$.

When a branch-parameter $\rho$ is given, the condition (iii) is given as follows: $\tau(e) \in \Gamma^{\mathrm{co}}$ for each core-edge $e \in E$; $\tau(e) \in \Gamma^{\mathrm{in}}$ for each directed $\rho$-internal non-core-edge $e \in E$; and $\tau(e) \in \Gamma^{\mathrm{ex}}$ for each directed $\rho$-external non-core-edge $e \in E$.

We represent the graph structure of a chemical compound as a graph with labels on vertices and multiplicity on edges in a hydrogen-suppressed model.

In our method, we use only graph-theoretical descriptors for defining a feature vector, which facilitates our designing an algorithm for constructing graphs. We choose a branch-parameter $\rho \ge 1$, sets $\Lambda^{\mathrm{co}}_{\mathrm{dg}}$ and $\Lambda^{\mathrm{nc}}_{\mathrm{dg}}$ of chemical symbols and sets $\Gamma^{\mathrm{co}}$, $\Gamma^{\mathrm{in}}$ and $\Gamma^{\mathrm{ex}}$ of edge-configurations. Let $G = (H = (V, E), \alpha, \beta)$ be a chemical cyclic graph with the chemical symbols and the edge-configurations. We define a feature vector $f(G)$ that consists of the following 16 kinds of descriptors.

- $n(G)$: the number $|V|$ of vertices.
- $\mathrm{cs}(G)$: the core size of $G$.
- $\mathrm{ch}(G)$: the core height of $G$.
- $\mathrm{bl}_\rho(G)$: the $\rho$-branch-leaf-number of $G$.
- $\mathrm{ms}(G)$: the average $\mathrm{mass}^*$ of atoms in $G$; i.e., $\mathrm{ms}(G) \triangleq \sum_{v \in V} \mathrm{mass}^*(\alpha(v)) / n(G)$.
- $\mathrm{dg}^{\mathrm{co}}_i(G)$, $i \in [1, 4]$: the number of core-vertices of degree $i$ in $G$; i.e., $\mathrm{dg}^{\mathrm{co}}_i(G) \triangleq |\{ v \in V^{\mathrm{co}} \mid \deg_H(v) = i \}|$.
- $\mathrm{dg}^{\mathrm{nc}}_i(G)$, $i \in [1, 4]$: the number of non-core-vertices of degree $i$ in $G$; i.e., $\mathrm{dg}^{\mathrm{nc}}_i(G) \triangleq |\{ v \in V^{\mathrm{nc}} \mid \deg_H(v) = i \}|$.
- $\mathrm{bd}^{\mathrm{co}}_m(G)$, $m \in [2, 3]$: the number of core-edges with bond multiplicity $m$; i.e., $\mathrm{bd}^{\mathrm{co}}_m(G) \triangleq |\{ e \in E^{\mathrm{co}} \mid \beta(e) = m \}|$.
- $\mathrm{bd}^{\mathrm{in}}_m(G)$, $m \in [2, 3]$: the number of $\rho$-internal edges with bond multiplicity $m$; i.e., $\mathrm{bd}^{\mathrm{in}}_m(G) \triangleq |\{ e \in E^{\mathrm{in}} \mid \beta(e) = m \}|$.
- $\mathrm{bd}^{\mathrm{ex}}_m(G)$, $m \in [2, 3]$: the number of $\rho$-external edges with bond multiplicity $m$; i.e., $\mathrm{bd}^{\mathrm{ex}}_m(G) \triangleq |\{ e \in E^{\mathrm{ex}} \mid \beta(e) = m \}|$.
- $\mathrm{ns}^{\mathrm{co}}_\mu(G)$, $\mu \in \Lambda^{\mathrm{co}}_{\mathrm{dg}}$: the number of core-vertices $v$ with $\alpha(v) = \mathtt{a}$ and $\deg_G(v) = i$ for $\mu = \mathtt{a}i$.
- $\mathrm{ns}^{\mathrm{nc}}_\mu(G)$, $\mu \in \Lambda^{\mathrm{nc}}_{\mathrm{dg}}$: the number of non-core-vertices $v$ with $\alpha(v) = \mathtt{a}$ and $\deg_G(v) = i$ for $\mu = \mathtt{a}i$.
- $\mathrm{ec}^{\mathrm{co}}_\gamma(G)$, $\gamma \in \Gamma^{\mathrm{co}}$: the number of undirected core-edges $e \in E^{\mathrm{co}}$ such that $\tau(e) = \gamma$.
- $\mathrm{ec}^{\mathrm{in}}_\gamma(G)$, $\gamma \in \Gamma^{\mathrm{in}}$: the number of directed $\rho$-internal edges $e \in E^{\mathrm{in}}$ such that $\tau(e) = \gamma$.
- $\mathrm{ec}^{\mathrm{ex}}_\gamma(G)$, $\gamma \in \Gamma^{\mathrm{ex}}$: the number of directed $\rho$-external edges $e \in E^{\mathrm{ex}}$ such that $\tau(e) = \gamma$.
- $\mathrm{ns}_{\mathtt{H}}(G)$: the number of hydrogen atoms; i.e., $\mathrm{ns}_{\mathtt{H}}(G) \triangleq \sum_{v \in V} \mathrm{val}(\alpha(v)) - 2 \sum_{e \in E} \beta(e)$, where each bond uses valence at both of its end-vertices.

The number $K$ of descriptors in our feature vector $x = f(G)$ is $K = |\Lambda^{\mathrm{co}}_{\mathrm{dg}}| + |\Lambda^{\mathrm{nc}}_{\mathrm{dg}}| + |\Gamma^{\mathrm{co}}| + |\Gamma^{\mathrm{in}}| + |\Gamma^{\mathrm{ex}}| + 20$. Note that the set of the above $K$ descriptors is not independent in the sense that some descriptor depends on the combination of other descriptors in the set.
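To make some of the descriptor definitions concrete, the following is a minimal sketch that computes a few of them ($n$, cs, ch, the degree counts, the core bond-multiplicity counts and the number of hydrogen atoms) for a small hydrogen-suppressed graph. The data layout, the function name `descriptors` and the valence table `VAL` are assumptions made for this example. The core is obtained by repeatedly deleting degree-1 vertices, which for a connected graph of rank at least 1 leaves exactly the vertices on cycles or on paths between cycles. The hydrogen count is computed as the unused valence summed over all vertices.

```python
from collections import deque

VAL = {"C": 4, "N": 3, "O": 2}  # valences assumed for this toy example

def descriptors(alpha, beta):
    """alpha: vertex -> element symbol; beta: (u, v) -> bond multiplicity.
    The graph is assumed to be simple, connected and of rank at least 1."""
    adj = {v: set() for v in alpha}
    for u, v in beta:
        adj[u].add(v)
        adj[v].add(u)
    deg = {v: len(adj[v]) for v in alpha}

    # Core: repeatedly delete vertices of degree 1.
    core = set(alpha)
    remaining = dict(deg)
    queue = deque(v for v in alpha if deg[v] == 1)
    while queue:
        v = queue.popleft()
        core.discard(v)
        for w in adj[v]:
            if w in core:
                remaining[w] -= 1
                if remaining[w] == 1:
                    queue.append(w)

    # Core height: largest distance from the core, found by a multi-source BFS.
    dist = {v: 0 for v in core}
    queue = deque(core)
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)

    # beta(v): sum of bond multiplicities over the edges incident to v.
    bond_sum = {v: 0 for v in alpha}
    for (u, v), m in beta.items():
        bond_sum[u] += m
        bond_sum[v] += m

    return {
        "n": len(alpha),
        "cs": len(core),
        "ch": max(dist.values()),
        "dg_co": {i: sum(1 for v in core if deg[v] == i) for i in range(1, 5)},
        "dg_nc": {i: sum(1 for v in alpha if v not in core and deg[v] == i)
                  for i in range(1, 5)},
        "bd_co": {m: sum(1 for (u, v), k in beta.items()
                         if u in core and v in core and k == m) for m in (2, 3)},
        "ns_H": sum(VAL[alpha[v]] - bond_sum[v] for v in alpha),
    }

# Hypothetical example: a three-membered carbon ring with one double bond,
# carrying a side chain C=O attached to ring vertex 2.
alpha = {0: "C", 1: "C", 2: "C", 3: "C", 4: "O"}
beta = {(0, 1): 2, (1, 2): 1, (0, 2): 1, (2, 3): 1, (3, 4): 2}
print(descriptors(alpha, beta))
```

The descriptors that depend on the branch-parameter $\rho$ and on edge-configurations are left out to keep the sketch short.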
For example, descriptor $\mathrm{bd}^{\mathrm{ex}}_m(G)$ can be determined by $\sum_{\gamma = (\mu, \mu', i) \in \Gamma^{\mathrm{ex}} : i = m} \mathrm{ec}^{\mathrm{ex}}_\gamma(G)$.

We review the framework that solves the inverse QSAR/QSPR by using MILPs [6, 11, 30], which is illustrated in Figure 4. For a specified chemical property $\pi$ such as boiling point, we denote by $a(G)$ the observed value of the property $\pi$ for a chemical compound $G$. As the first phase, we solve (I) Prediction Problem with the following three stages.
Phase 1.
Stage 1:
Let DB be a set of chemical graphs. For a specified chemical property $\pi$, choose a class $\mathcal{G}$ of graphs such as acyclic graphs or graphs with a given rank $r$. Prepare a data set $D_\pi = \{ G_i \mid i = 1, 2, \ldots, m \} \subseteq \mathcal{G} \cap \mathrm{DB}$ such that the value $a(G_i)$ of each chemical graph $G_i$, $i = 1, 2, \ldots, m$, is available. Set reals $\underline{a}, \overline{a} \in \mathbb{R}$ so that $\underline{a} \le a(G_i) \le \overline{a}$, $i = 1, 2, \ldots, m$.
Stage 2:
Introduce a feature function $f : \mathcal{G} \to \mathbb{R}^K$ for a positive integer $K$. We call $f(G)$ the feature vector of $G \in \mathcal{G}$, and call each entry of a vector $f(G)$ a descriptor of $G$.
Stage 3:
Construct a prediction function $\psi_{\mathcal{N}}$ with an ANN $\mathcal{N}$ that, given a vector in $\mathbb{R}^K$, returns a real in the range $[\underline{a}, \overline{a}]$ so that $\psi_{\mathcal{N}}(f(G))$ takes a value nearly equal to $a(G)$ for many chemical graphs in $D_\pi$.

Figure 4: (a) An illustration of Phase 1: Stage 1 for preparing a data set $D_\pi$ for a graph class $\mathcal{G}$ and a specified chemical property $\pi$; Stage 2 for introducing a feature function $f$ with descriptors; Stage 3 for constructing a prediction function $\psi_{\mathcal{N}}$ with an ANN $\mathcal{N}$; (b) An illustration of Phase 2: Stage 4 for formulating an MILP $\mathcal{M}(x, y, g; \mathcal{C}_1, \mathcal{C}_2)$ and finding a feasible solution $(x^*, g^*)$ of the MILP for a target value $y^*$ so that $\psi_{\mathcal{N}}(x^*) = y^*$ (possibly detecting that no target graph $G^*$ exists); Stage 5 for enumerating graphs $G^* \in \mathcal{G}$ such that $f(G^*) = x^*$.

See Figure 4(a) for an illustration of Stages 1, 2 and 3 in Phase 1.

For the set of descriptors $\kappa_j$, $j \in [1, K]$, in a feature vector $x \in \mathbb{R}^K$, we can choose lower and upper bounds $\kappa^{\mathrm{LB}}_j$ and $\kappa^{\mathrm{UB}}_j$ on each descriptor $\kappa_j$, and denote by $D$ the set of vectors $x \in \mathbb{R}^K$ such that $\kappa^{\mathrm{LB}}_j \le x_j \le \kappa^{\mathrm{UB}}_j$, $j \in [1, K]$. For example, we can use the range-based method to define an applicability domain (AD) [18] to inverse QSAR/QSPR by using such a restricted set $D$. Compute the minimum value $\underline{x_j}$ and the maximum value $\overline{x_j}$ of the $j$-th descriptor $x_j$ in $f(G_i)$ over all graphs $G_i$, $i = 1, 2, \ldots, m$, in a data set $D_\pi$.
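The per-descriptor minimum and maximum over a data set can be computed in a few lines. The sketch below is only an illustration: the function names `descriptor_ranges` and `in_box` and the sample vectors are hypothetical, and the membership test simply checks the box condition $\kappa^{\mathrm{LB}}_j \le x_j \le \kappa^{\mathrm{UB}}_j$ for whatever bounds are supplied.

```python
def descriptor_ranges(vectors):
    """Per-descriptor minimum and maximum over equal-length feature vectors."""
    columns = list(zip(*vectors))
    return [min(c) for c in columns], [max(c) for c in columns]

def in_box(x, lower, upper):
    """True if lower[j] <= x[j] <= upper[j] holds for every descriptor j."""
    return all(lo <= xj <= hi for lo, xj, hi in zip(lower, x, upper))

# Hypothetical feature vectors f(G_i) of three graphs in a data set.
data = [(5, 3, 2), (7, 4, 0), (6, 6, 1)]
x_min, x_max = descriptor_ranges(data)
print(x_min, x_max)                      # observed range of each descriptor
print(in_box((6, 5, 1), x_min, x_max))   # inside the observed ranges
print(in_box((8, 5, 1), x_min, x_max))   # first descriptor out of range
```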
Choose lower and upper bounds $\kappa_j^{\mathrm{LB}}$ and $\kappa_j^{\mathrm{UB}}$ so that $\underline{x_j} \le \kappa_j^{\mathrm{LB}}$ and $\kappa_j^{\mathrm{UB}} \le \overline{x_j}$, $j \in [1, K]$.

In the second phase, we try to find a vector $x^* \in \mathbb{R}^K$ from a target value $y^*$ of the chemical property $\pi$ such that $\psi_N(x^*) = y^*$. Based on the method due to Akutsu and Nagamochi [2], Chiewvanichakorn et al. [6] showed that this problem can be formulated as an MILP. By including a set of linear constraints such that $x \in D$ into their MILP, we obtain the next result.

Theorem 1. ([11, 30])
Let $N$ be an ANN with a piecewise-linear activation function for an input vector $x \in \mathbb{R}^K$, $n_A$ denote the number of nodes in the architecture and $n_B$ denote the total number of break-points over all activation functions. Then there is an MILP $M(x, y; \mathcal{C}_1)$ that consists of variable vectors $x \in D\ (\subseteq \mathbb{R}^K)$, $y \in \mathbb{R}$, and an auxiliary variable vector $z \in \mathbb{R}^p$ for some integer $p = O(n_A + n_B)$ and a set $\mathcal{C}_1$ of $O(n_A + n_B)$ constraints on these variables such that: $\psi_N(x^*) = y^*$ if and only if there is a vector $(x^*, y^*)$ feasible to $M(x, y; \mathcal{C}_1)$.

In the second phase, we solve (II)
Inverse Problem, wherein given a target chemical value $y^*$, we are asked to generate chemical graphs $G^* \in \mathcal{G}$ such that $f(G^*) = x^*$. For this, we first find a vector $x^* \in \mathbb{R}^K$ such that $\psi_N(x^*) = y^*$ and then generate chemical graphs $G^* \in \mathcal{G}$ such that $f(G^*) = x^*$. However, the resulting vector $x^*$ may not admit such a chemical graph $G^* \in \mathcal{G}$. Azam et al. [3] called a vector $x \in \mathbb{R}^K$ admissible if there is a graph $G \in \mathcal{G}$ such that $f(G) = x$. Let $A$ denote the set of admissible vectors $x \in \mathbb{R}^K$. To ensure that a vector $x^*$ inferred from a given target value $y^*$ becomes admissible, we introduce a new vector variable $g \in \mathbb{R}^q$ for an integer $q$ so that a feasible solution $(x^*, g^*)$ of the MILP for a target value $y^*$ delivers a vector $x^*$ with $\psi_N(x^*) = y^*$ and a vector $g^*$ that represents a chemical graph $G^\dagger \in \mathcal{G}$ with $f(G^\dagger) = x^*$. In the second phase, we treat the next two problems.

(II-a) Inference of Vectors
Input: A real $y^*$ with $\underline{a} \le y^* \le \overline{a}$.
Output: Vectors $x^* \in A \cap D$ and $g^* \in \mathbb{R}^q$ such that $\psi_N(x^*) = y^*$ and $g^*$ forms a chemical graph $G^\dagger \in \mathcal{G}$ with $f(G^\dagger) = x^*$.

(II-b) Inference of Graphs
Input: A vector $x^* \in A \cap D$.
Output: All graphs $G^* \in \mathcal{G}$ such that $f(G^*) = x^*$.

The second phase consists of the next two steps.

Phase 2.
Stage 4:
Formulate Problem (II-a) as the above MILP $M(x, y, g; \mathcal{C}_1, \mathcal{C}_2)$ based on $\mathcal{G}$ and $N$. Find a feasible solution $(x^*, g^*)$ of the MILP such that $x^* \in A \cap D$ and $\psi_N(x^*) = y^*$ (where the second requirement may be replaced with inequalities $(1 - \varepsilon) y^* \le \psi_N(x^*) \le (1 + \varepsilon) y^*$ for a tolerance $\varepsilon > 0$).

Stage 5:
To solve Problem (II-b), enumerate all (or a specified number of) graphs $G^* \in \mathcal{G}$ such that $f(G^*) = x^*$ for the inferred vector $x^*$.

See Figure 4(b) for an illustration of Stages 4 and 5 in Phase 2.

Execution of Stage 5, i.e., generating chemical graphs $G^*$ that satisfy $f(G^*) = x^*$ for a given feature vector $x^* \in \mathbb{Z}_+^K$, is a challenging issue for a relatively large instance with size $n(G^*) \ge 20$. Algorithms for Stage 5 have been proposed for classes of graphs with rank 0 to 2 [8, 24, 26, 27]. All of these are designed based on the branch-and-bound method, where an enormous number of chemical graphs are constructed by repeatedly appending and removing a vertex one by one until a target chemical graph is constructed. These algorithms can generate a target chemical graph with size $n(G^*) \le 20$. To break this barrier, Azam et al. [4] recently employed the dynamic programming method for designing a new algorithm in Stage 5 and showed that chemical acyclic graphs $G^*$ with a bounded branch-height can be generated for size $n(G^*) = 50$. However, for a class of graphs with a different rank, we may need to design a new dynamic programming algorithm again. Moreover, algorithms for higher ranks can be more complicated and do not run as fast as the algorithm for acyclic graphs due to Azam et al. [4].

In this paper, as a new mechanism of Stage 5, we adopt the idea of utilizing the chemical graph $G^\dagger \in \mathcal{G}$ obtained as part of a feasible solution of an MILP in Stage 4. In other words, we modify the chemical graph $G^\dagger$ to generate other chemical graphs $G^*$ that are "chemically isomorphic" to $G^\dagger$ in the sense that $f(G^*) = f(G^\dagger)$ holds. Informally speaking, we reduce the problem of finding such a graph $G^*$ to a problem of generating chemical acyclic graphs, for which we have obtained an efficient dynamic programming algorithm [4]. We first decompose $G^\dagger$ into a collection of chemical trees $T_1^\dagger, T_2^\dagger, \ldots, T_m^\dagger$ such that for a subset $V_B$ of the core-vertices of $G^\dagger$, any tree $T_i^\dagger$ contains at most two vertices in $V_B$, as illustrated in Figure 5(a). Let $x_i^*$ denote the feature vector $f(T_i^\dagger)$. For each index $i$, we generate chemical acyclic graphs $T_i^*$ such that $f(T_i^*) = x_i^*$. Finally we combine the generated chemical trees $T_1^*, T_2^*, \ldots, T_m^*$ to construct a chemical cyclic graph $G^*$ such that $f(G^*) = \sum_{i \in [1, m]} x_i^* = f(G^\dagger)$. See Section 6 for the details. Although a family of chemical graphs $G^*$ chemically isomorphic to $G^\dagger$ depends on a choice of decomposition into trees $T_i^\dagger$ and covers only part of the entire set of target graphs $G^*$ with $f(G^*) = x^*$, the new method can be applied to any class of graphs or even to a graph with a specific substructure.
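The decompose-and-recombine argument above relies on the combined feature vector being the sum of the per-tree vectors. The following is a minimal sketch of that idea only, assuming a toy feature function (element counts plus bond-multiplicity counts) that is much coarser than the descriptors used in the paper; the names `toy_feature` and `combine` are hypothetical, and the handling of vertices shared between trees is ignored.

```python
from collections import Counter

def toy_feature(atoms, bonds):
    """Toy feature vector (an assumption, not the paper's f):
    element counts plus counts of bonds per multiplicity.
    atoms: list of element symbols; bonds: list of (i, j, multiplicity)."""
    x = Counter(atoms)
    for _, _, m in bonds:
        x[f"bond{m}"] += 1
    return x

def combine(vectors):
    """Sum per-tree feature vectors x_i into one vector."""
    total = Counter()
    for x in vectors:
        total.update(x)  # Counter.update adds counts
    return total

# Two different trees with the same toy feature vector: a path and a star.
path_tree = (["C", "C", "C", "O"], [(0, 1, 1), (1, 2, 1), (2, 3, 1)])
star_tree = (["C", "C", "C", "O"], [(0, 1, 1), (0, 2, 1), (0, 3, 1)])
other_tree = (["C", "N"], [(0, 1, 2)])

# Swapping one tree for another with an identical vector keeps the sum unchanged.
f_dagger = combine([toy_feature(*path_tree), toy_feature(*other_tree)])
f_star = combine([toy_feature(*star_tree), toy_feature(*other_tree)])
same = f_dagger == f_star
```

Here `path_tree` and `star_tree` are structurally different but indistinguishable under the toy features, which mirrors how a tree $T_i^*$ can replace $T_i^\dagger$ without changing the total.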
In the previous application of the framework, a target chemical graph $G$ to be inferred is specified with a small number of parameters such as the number $n(G)$ of vertices, the core size $\mathrm{cs}(G)$ and the core height $\mathrm{ch}(G)$.

In this paper, we also introduce a more flexible way of specifying a target graph so that our new algorithm for generating chemically isomorphic graphs $G^*$ can be used. Suppose that we are given a requirement $R$ on a target graph that is specified separately from the feature vector $f$. Now a target graph is defined to be a chemical graph $G$ that satisfies $\psi_N(f(G)) = y^*$ for a given target value $y^*$ and the requirement $R$ at the same time. In general, a chemical graph $G^*$ such that $f(G^*) = f(G^\dagger)$ may not satisfy such an additional requirement $R$. Recall that $G^*$ in Stage 5 is obtained as a combination of chemical trees $T_i^*$, each of which is chemically isomorphic to the corresponding tree $T_i^\dagger$ of the given graph $G^\dagger$. Hence if the requirement $R$ on a target chemical graph is independent among such chemical trees $T_i^*$ to be inferred from a vector $x_i^*$, then any combination $G^*$ of inferred chemical trees $T_i^*$ still satisfies the requirement $R$ whenever the original graph $G^\dagger$ satisfies $R$. See Figure 5(b) for an illustration of the set of chemical graphs $G^*$ that are chemically isomorphic to a target chemical graph $G^\dagger$. Section 4 describes a way of specifying such a requirement, called a "target specification", for example a prescribed substructure (such as a benzene ring) to be included in a target chemical graph, or a partly predetermined assignment of chemical elements and bond-multiplicity to a target graph.

Figure 5: An illustration of a process of Stage 5 and a set of target isomorphic graphs: (a) A new mechanism for Stage 5, where a given target chemical graph $G^\dagger$ is decomposed into chemical trees $T_i^\dagger$, $i = 1, 2, \ldots, m$ based on a set $V_B$ of two core-vertices, and for each feature vector $x_i^* = f(T_i^\dagger)$, a chemical tree $T_i^*$ such that $f(T_i^*) = x_i^*$ is constructed before a new target graph $G^*$ is obtained as a combination of the resulting chemical trees $T_1^*, \ldots, T_m^*$; (b) Given a target value $y^*$, an MILP $M(x, y, g; \mathcal{C}_1, \mathcal{C}_2)$ delivers a target chemical graph $G^\dagger$ if the intersection of the set of chemical graphs $G$ with $\psi_N(f(G)) = y^*$ and the set of chemical graphs $G$ that satisfy a target specification $(G_C, \sigma_{co}, \sigma_{nc}, \sigma_{\alpha\beta})$ is not empty, where all graphs chemically $(P, \sigma_{ch})$-isomorphic to $G^\dagger$ belong to the intersection.

This section presents a flexible way of specifying a topological structure of the core and assignments of chemical elements and bond-multiplicity of a target chemical graph. We define a target specification $(G_C, \sigma_{co}, \sigma_{nc}, \sigma_{\alpha\beta})$ with a multigraph $G_C$ and sets $\sigma_{co}$, $\sigma_{nc}$ and $\sigma_{\alpha\beta}$ of lower and upper bounds on several descriptors that we describe in the following.
A seed graph $G_C = (V_C, E_C)$ is defined to be a multigraph with no self-loops such that the edge set $E_C$ consists of four sets $E_{(\ge 2)}$, $E_{(\ge 1)}$, $E_{(0/1)}$ and $E_{(=1)}$, where each of them can be empty. Figure 6 illustrates an example of a seed graph. From a seed graph $G_C$, the core of a cyclic graph will be constructed in the following way:
- Each edge $e = uv \in E_{(\ge 2)}$ will be replaced with a $u,v$-path $P_e$ of length at least 2.
- Each edge $e = uv \in E_{(\ge 1)}$ will be replaced with a $u,v$-path $P_e$ of length at least 1 (equivalently, $e$ is directly used or replaced with a $u,v$-path $P_e$ of length at least 2).
- Each edge $e \in E_{(0/1)}$ is either used or discarded.
- Each edge $e \in E_{(=1)}$ is always used directly.

Figure 6: An illustration of a seed graph $G_C$ with $E_{(\ge 2)} = \{a_1, a_2, \ldots, a_5\}$, $E_{(\ge 1)} = \{a_6\}$, $E_{(0/1)} = \{a_7\}$ and $E_{(=1)} = \{a_8, a_9, \ldots, a_{16}\}$, where the vertices in $V_C$ are depicted with gray squares, the edges in $E_{(\ge 2)}$ are depicted with dotted lines, the edges in $E_{(\ge 1)}$ are depicted with dashed lines, the edges in $E_{(0/1)}$ are depicted with gray lines and the edges in $E_{(=1)}$ are depicted with black solid lines.

The core of a target chemical graph is constructed from a seed graph $G_C$ by a core specification $\sigma_{co}$ that consists of the following:
- Lower and upper bound functions $\ell_{LB}, \ell_{UB} : E_{(\ge 2)} \cup E_{(\ge 1)} \to \mathbb{Z}_+$. For notational convenience, set $\ell_{LB}(e) := 0$, $\ell_{UB}(e) := 1$, $e \in E_{(0/1)}$ and $\ell_{LB}(e) := 1$, $\ell_{UB}(e) := 1$, $e \in E_{(=1)}$.
- Lower and upper bounds $\mathrm{cs}_{LB}, \mathrm{cs}_{UB} \in \mathbb{Z}_+$ on the core size, where we assume $\mathrm{cs}_{LB} \ge |V_C| + \sum_{e \in E_{(\ge 2)} \cup E_{(\ge 1)}} (\ell_{LB}(e) - 1)$.
- Side constraints: As an option, we can specify additional linear constraints on the length $\ell(P_i)$ of path $P_i$, $a_i \in E_C$, such as $\ell(P_i) + \ell(P_j) = c$ for a constant $c$ or $\ell(P_i) \le \ell(P_j) + \ell(P_k)$.

An example of a core specification $\sigma_{co}$ to the seed graph $G_C$ in Figure 6 is given in Table 1.

Table 1: Example 1 of a core specification $\sigma_{co}$.

                 a_1  a_2  a_3  a_4  a_5  a_6
  l_LB(a_i)       2    2    2    3    2    1
  l_UB(a_i)       3    4    3    5    4    4

  cs_LB = 20, cs_UB = 28

A $\sigma_{co}$-extension of a seed graph $G_C$ is defined to be a graph $C$ such that $|V(C)| \in [\mathrm{cs}_{LB}, \mathrm{cs}_{UB}]$ and $C$ is obtained by replacing each edge $e = uv \in E_{(\ge 2)} \cup E_{(\ge 1)}$ with a $u,v$-path $P_e$ of length $\ell(P_e) \in [\ell_{LB}(e), \ell_{UB}(e)]$ under the specified side constraints, if any. Figure 7 illustrates one of the $\sigma_{co}$-extensions of the seed graph $G_C$ in Figure 6 with the core specification $\sigma_{co}$ in Table 1. The edges $a_i \in E_{(\ge 2)}$, $i \in [1, 5]$ are replaced with paths $P_1, P_2, P_3, P_4$ and $P_5$ that consist of 3, 3, 4, 5 and 5 vertices (i.e., have lengths 2, 2, 3, 4 and 4), respectively. The edge $a_6 \in E_{(\ge 1)}$ is used in the graph $C$ and the edge $a_7 \in E_{(0/1)}$ is discarded, where $E_{(=1)} \subseteq E(C)$.

Figure 7: An illustration of a $\sigma_{co}$-extension $C$ with $\mathrm{cs}(C) = 22$, where the vertices in $V(C) \setminus V_C$ are depicted with white squares.

Let $\mathcal{C}(G_C, \sigma_{co})$ denote the set of all $\sigma_{co}$-extensions of a seed graph $G_C$. We employ a graph $C \in \mathcal{C}(G_C, \sigma_{co})$ as the core $\mathrm{Cr}(G)$ of a chemical graph $G$ to be inferred.

Remember that the core of any connected cyclic graph is a simple connected graph with minimum degree at least 2. Some $\sigma_{co}$-extension of a seed graph $G_C$ may not be such a graph. We give a sufficient condition for any $\sigma_{co}$-extension to be a simple connected graph with minimum degree at least 2. Let $C_{\min} \in \mathcal{C}(G_C, \sigma_{co})$ denote the minimum $\sigma_{co}$-extension; i.e., $C_{\min}$ is obtained from the graph $(V_C, E_{(\ge 2)} \cup E_{(\ge 1)} \cup E_{(=1)})$ by replacing each edge $e \in E_{(\ge 2)}$ with a path of the least length $\ell_{LB}(e)$. We see that if $C_{\min}$ is a connected graph with minimum degree at least 2, then any extension $C \in \mathcal{C}(G_C, \sigma_{co})$ becomes a simple connected graph with minimum degree at least 2.

Next we show how to specify the structure of the non-core part of a target chemical graph.
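As a worked check of the core-size bookkeeping, the sketch below recomputes the core size of the extension in Figure 7 from the bounds in Table 1. It assumes that the seed graph of Figure 6 has twelve vertices (read off from the twelve vertex columns of Table 2) and that the path lengths are those implied by the vertex counts of the paths listed for Figure 7; the function names are hypothetical.

```python
def core_size(num_seed_vertices, path_lengths):
    """|V(C)| = |V_C| plus (length - 1) new internal vertices per replaced edge."""
    return num_seed_vertices + sum(l - 1 for l in path_lengths.values())

def is_valid_extension(num_seed_vertices, path_lengths, l_lb, l_ub, cs_lb, cs_ub):
    """Check the length bounds of every path and the bounds on the core size."""
    lengths_ok = all(l_lb[e] <= path_lengths[e] <= l_ub[e] for e in l_lb)
    size = core_size(num_seed_vertices, path_lengths)
    return lengths_ok and cs_lb <= size <= cs_ub

# Bounds from Table 1 for the edges in E_(>=2) and E_(>=1).
l_lb = {"a1": 2, "a2": 2, "a3": 2, "a4": 3, "a5": 2, "a6": 1}
l_ub = {"a1": 3, "a2": 4, "a3": 3, "a4": 5, "a5": 4, "a6": 4}

# Path lengths of the extension in Figure 7 (a6 is used directly, length 1).
lengths = {"a1": 2, "a2": 2, "a3": 3, "a4": 4, "a5": 4, "a6": 1}

NUM_SEED_VERTICES = 12  # assumption: |V_C| for the seed graph in Figure 6

size = core_size(NUM_SEED_VERTICES, lengths)        # 12 + (1+1+2+3+3+0) = 22
min_size = core_size(NUM_SEED_VERTICES, l_lb)       # smallest possible core size
ok = is_valid_extension(NUM_SEED_VERTICES, lengths, l_lb, l_ub, 20, 28)
```

Under these assumptions the computed size agrees with $\mathrm{cs}(C) = 22$ stated for Figure 7, and the smallest possible core size does not exceed $\mathrm{cs}_{LB} = 20$, as the assumption on $\mathrm{cs}_{LB}$ requires.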
For a seed graph $G_C$, let a non-core specification $\sigma_{nc}$ consist of the following:
- Lower and upper bounds $n_{LB}, n^* \in \mathbb{Z}_+$ on the number of vertices, where $\mathrm{cs}_{LB} \le n_{LB} \le n^*$.
- An upper bound $\mathrm{dg}^{nc}_{4,UB} \in \mathbb{Z}_+$ on the number of non-core-vertices of degree 4.
- Lower and upper functions $\mathrm{ch}_{LB}, \mathrm{ch}_{UB} : V_C \to \mathbb{Z}_+$ and $\mathrm{ch}_{LB}, \mathrm{ch}_{UB} : E_{(\ge 2)} \cup E_{(\ge 1)} \to \mathbb{Z}_+$ on the maximum height of trees rooted at a vertex $v \in V_C$ or at an internal vertex of a path $P_e$ with $e \in E_{(\ge 2)} \cup E_{(\ge 1)}$.
- A branch-parameter $\rho \in \mathbb{Z}_+$.
- Lower and upper functions $\mathrm{bl}_{LB}, \mathrm{bl}_{UB} : V_C \to \{0, 1\}$ on the number of leaf $\rho$-branches in the tree rooted at a vertex $v \in V_C$, where $\mathrm{bl}_{UB}(u) \le 1$, $u \in V_C$ for inferring a $\rho$-lean cyclic graph and $\mathrm{bl}_{UB}(u) = 0$ if $\mathrm{ch}_{UB}(u) \le \rho$; lower and upper functions $\mathrm{bl}_{LB}, \mathrm{bl}_{UB} : E_{(\ge 2)} \cup E_{(\ge 1)} \to \mathbb{Z}_+$ on the number of leaf $\rho$-branches in the trees rooted at internal vertices in a path $P_e$ constructed for an edge $e \in E_{(\ge 2)} \cup E_{(\ge 1)}$, where $\mathrm{bl}_{UB}(e) \le \ell_{UB}(e) - 1$; and $\mathrm{ch}_{LB}(e) > \rho$ (resp., $\mathrm{ch}_{UB}(e) \le \rho$) implies $\mathrm{bl}_{LB}(e) \ge 1$ (resp., $\mathrm{bl}_{UB}(e) = 0$).
- Side constraints: As an option, we can specify additional linear constraints on $\ell(P_i)$ and the number $\mathrm{bl}(P_i)$ of leaf $\rho$-branches in the trees rooted at $P_i$, $a_i \in E_C$, such as $\mathrm{bl}(P_i) + \mathrm{bl}(P_j) \le c$ for a constant $c$.

An example of a non-core specification $\sigma_{nc}$ to the seed graph $G_C$ in Figure 6 is given in Table 2.

Table 2: Example 2 of a non-core specification $\sigma_{nc}$.

  n_LB = 30, n* = 50
  branch-parameter: rho = 2

                 u  u  u  u  u  u  u  u  u  u  u  u
  ch_LB(u_i)     0  0  0  0  1  0  0  0  0  0  0  0
  ch_UB(u_i)     1  0  0  0  3  0  1  1  0  1  2  4

                 a_1  a_2  a_3  a_4  a_5  a_6
  ch_LB(a_i)      0    1    0    4    3    0
  ch_UB(a_i)      3    3    1    6    5    2

                 u  u  u  u  u  u  u  u  u  u  u  u
  bl_LB(u_i)     0  0  0  0  0  0  0  0  0  0  0  0
  bl_UB(u_i)     1  1  1  1  1  0  0  0  0  0  0  0

                 a_1  a_2  a_3  a_4  a_5  a_6
  bl_LB(a_i)      0    0    0    1    1    0
  bl_UB(a_i)      1    1    0    2    1    0

Let $C \in \mathcal{C}(G_C, \sigma_{co})$ be a $\sigma_{co}$-extension of $G_C$, where each edge $e = uv \in E_{(\ge 2)} \cup E_{(\ge 1)}$ is replaced with a $u,v$-path $P_e$ (where possibly $P_e$ is equal to $e$). We consider a $\rho$-lean cyclic graph $H$ obtained from $C$ by appending a tree $T_v$ with at most one leaf $\rho$-branch at each vertex $v \in V(C)$, where possibly $E(T_v) = \emptyset$. We call the vertices in $C$ core-vertices of $H$ and the newly added vertices non-core-vertices of $H$.
For each edge $e = uv \in E_{(\ge 2)} \cup E_{(\ge 1)}$, let $F(P_e)$ denote the set of trees $T_w$ rooted at internal vertices $w$ of the $u,v$-path $P_e$ (where $w \ne u, v$).

We call the above $\rho$-lean cyclic graph $H$ obtained from a graph $C \in \mathcal{C}(G_C, \sigma_{co})$ a $(\sigma_{co}, \sigma_{nc})$-extension of $G_C$ if the following hold:
- $n(H) \in [n_{LB}, n^*]$.
- $\mathrm{dg}^{nc}_4(H) \le \mathrm{dg}^{nc}_{4,UB}$.
- For each vertex $v \in V_C$, the tree $T_v$ attached to $v$ satisfies $\mathrm{ht}(T_v) \in [\mathrm{ch}_{LB}(v), \mathrm{ch}_{UB}(v)]$; for each edge $e \in E_{(\ge 2)} \cup E_{(\ge 1)}$, $\max\{\mathrm{ht}(T) \mid T \in F(P_e)\} \in [\mathrm{ch}_{LB}(e), \mathrm{ch}_{UB}(e)]$.
- Each tree $T_v$, $v \in V(C)$ contains at most one leaf $\rho$-branch; i.e., $H$ is a $\rho$-lean graph with $\mathrm{Cr}(H) = C$.
- For each edge $e \in E_{(\ge 2)} \cup E_{(\ge 1)}$, $\sum\{\mathrm{bl}_\rho(T) \mid T \in F(P_e)\} \in [\mathrm{bl}_{LB}(e), \mathrm{bl}_{UB}(e)]$.
- The additional linear constraints are satisfied.

Figure 8 illustrates one of the $(\sigma_{co}, \sigma_{nc})$-extensions of the seed graph $G_C$ in Figure 6 with the specifications $\sigma_{co}$ in Table 1 and $\sigma_{nc}$ in Table 2.

Let $\mathcal{H}(G_C, \sigma_{co}, \sigma_{nc})$ denote the set of all $(\sigma_{co}, \sigma_{nc})$-extensions of a seed graph $G_C$. We employ a graph $H \in \mathcal{H}(G_C, \sigma_{co}, \sigma_{nc})$ as the underlying graph based on which we assign elements in $\Lambda$ and bond-multiplicities to infer a chemical graph $G = (H, \alpha, \beta)$.

Figure 8: An illustration of a $(\sigma_{co}, \sigma_{nc})$-extension $H$ with $\mathrm{Cr}(H) = C$, $n(H) = 43$, $\mathrm{ch}(H) = 5$ and $\mathrm{bl}_2(H) = 3$, where the non-core-vertices are depicted with circles, the leaf 2-branches are depicted with gray circles, the 2-internal edges are depicted with thick gray arrows and the elements in $\Lambda^*(v)$, $v \in V_C$ are indicated with asterisks.

A chemical specification $\sigma_{\alpha\beta}$ consists of the following:
- We choose a set $\Lambda$ of chemical elements and subsets $\Lambda^{co}, \Lambda^{nc} \subseteq \Lambda$.
For a chemical graph G , let na a ( G ) (resp., na co a ( G ) and na nc a ( G )) denote the number ofvertices (resp., core-vertices and non-core-vertices) in G assigned chemical element a ∈ Λ(resp., a ∈ Λ co and a ∈ Λ nc ).- We choose sets of symbolsΛ codg ⊆ Λ dg (Λ co , val) , Λ ncdg ⊆ Λ dg (Λ nc , val)such that i ≥ a i Λ codg . the number of core-vertices v with α ( v ) = a anddeg G ( v ) = i for µ = a i . We choose sets of edge-configurationsΓ co ⊆ Γ < (Λ codg ) ∪ Γ = (Λ codg ) , Γ nc ⊆ Γ(Λ ncdg ) , Γ in ⊆ Γ nc , Γ ex ⊆ Γ nc such that i, j ≥ a i, b j, m ) Γ in .Define Γ co > , { γ = ( ξ, µ, m ) | γ = ( µ, ξ, m ) ∈ Γ co , µ < ξ } .- The induced adjacency-configuration ac( γ ) of an edge-configuration ( a d, b d ′ , m ) is definedto be the adjacency-configuration ac( γ ) = ( a , b , m ). Set the following sets of adjacency-configurations: Γ coac := { ac( γ ) | γ ∈ Γ co } , Γ coac ,> := { ac( γ ) | γ ∈ Γ co > } , Γ inac := { ac( γ ) | γ ∈ Γ in } , Γ exac := { ac( γ ) | γ ∈ Γ ex } . In a chemical specification, we define the adjacency-configuration of a core-edge uv to be( a , b , β ( uv )) with { a , b } = { α ( u ) , α ( v ) } and the adjacency-configuration of a directed non-core edge ( u, v ) to be ( α ( u ) , α ( v ) , β ( uv )).Let ac co ν ( G ) (resp., ac in ν ( G ) and ac ex ν ( G )) denote the number of core-edges (resp., directed ρ -internal edges and directed ρ -external edges) in G assigned adjacency-configuration ν ∈ Γ coac (resp., ν ∈ Γ inac and ν ∈ Γ exac ).- Subsets Λ ∗ ( v ), v ∈ V C of elements that are allowed to be assigned to vertex v ∈ V C ;- Lower and upper bound functions na LB , na UB : Λ → [1 , n ∗ ] and na tLB , na tUB : Λ t → [1 , n ∗ ],t ∈ { co , nc } on the number of core-vertices and non-core-vertices, respectively, assignedchemical element a .- Lower and upper bound functions ns LB , ns UB : Λ dg → [1 , n ∗ ] and ns tLB , ns tUB : Λ tdg → [1 , n ∗ ],t ∈ { co , nc } on the number of core-vertices and non-core-vertices, respectively, 
assigned symbol µ.
- Lower and upper bound functions $\mathrm{ac}^t_{LB}, \mathrm{ac}^t_{UB} : \Gamma^t_{ac} \to \mathbb{Z}_+$, $t \in \{co, in, ex\}$ on the number of core-edges, directed $\rho$-internal edges and directed $\rho$-external edges, respectively, assigned adjacency-configuration $\nu$.
- Lower and upper bound functions $\mathrm{ec}^t_{LB}, \mathrm{ec}^t_{UB} : \Gamma^t \to \mathbb{Z}_+$, $t \in \{co, in, ex\}$ on the number of core-edges, directed $\rho$-internal edges and directed $\rho$-external edges, respectively, assigned edge-configuration $\gamma$.
- Lower and upper functions $\mathrm{bd}_{m,LB}, \mathrm{bd}_{m,UB} : E_C \to \mathbb{Z}_+$, $m \in [2, 3]$ on the number of edges with bond-multiplicity $m$ in the path $P_e$, where $\mathrm{bd}_{2,LB}(e) + \mathrm{bd}_{3,LB}(e) \le \ell_{UB}(e)$, $e \in E_C$.
- Side constraints: Lower and upper bounds on the number of some adjacency-configurations and edge-configurations. We can specify additional linear constraints on $\ell(P_i)$, $\mathrm{bl}(P_i)$ and the number $\mathrm{na}_a(P_i)$ of vertices assigned chemical element $a$ in the path $P_i$, $a_i \in E_C$, such as $\mathrm{na}_N(P_i) + \mathrm{na}_N(P_j) \le c$ for a constant $c$ and nitrogen $N \in \Lambda$.

An example of a chemical specification $\sigma_{\alpha\beta}$ to the seed graph $G_C$ in Figure 6 is given in Table 3.

For a graph $H \in \mathcal{H}(G_C, \sigma_{co}, \sigma_{nc})$, let $\alpha : V(H) \to \Lambda$ and $\beta : E(H) \to [1, 3]$ be functions. Then $G = (H, \alpha, \beta)$ is called a $(\sigma_{co}, \sigma_{nc}, \sigma_{\alpha\beta})$-extension of $G_C$ if the following hold:
- $\sum_{uv \in E(H)} \beta(uv) \le \mathrm{val}(\alpha(u))$ for each vertex $u \in V(H)$; $\tau(e) \in \Gamma^{co}$ for each core-edge $e$; $\tau(e) \in \Gamma^{in}$ for each directed $\rho$-internal edge $e$; and $\tau(e) \in \Gamma^{ex}$ for each directed $\rho$-external edge $e$.
- $\alpha(v) \in \Lambda^*(v)$ for each vertex $v \in V_C$.
- It holds that $\mathrm{na}_a(G) \in [\mathrm{na}_{LB}(a), \mathrm{na}_{UB}(a)]$, $a \in \Lambda$, $\mathrm{na}^{co}_a(G) \in [\mathrm{na}^{co}_{LB}(a), \mathrm{na}^{co}_{UB}(a)]$, $a \in \Lambda^{co}$, and $\mathrm{na}^{nc}_a(G) \in [\mathrm{na}^{nc}_{LB}(a), \mathrm{na}^{nc}_{UB}(a)]$, $a \in \Lambda^{nc}$.
- It holds that $\mathrm{ns}_\mu(G) \in [\mathrm{ns}_{LB}(\mu), \mathrm{ns}_{UB}(\mu)]$, $\mu \in \Lambda_{dg}$, $\mathrm{ns}^{co}_\mu(G) \in [\mathrm{ns}^{co}_{LB}(\mu), \mathrm{ns}^{co}_{UB}(\mu)]$, $\mu \in \Lambda^{co}_{dg}$, and $\mathrm{ns}^{nc}_\mu(G) \in [\mathrm{ns}^{nc}_{LB}(\mu), \mathrm{ns}^{nc}_{UB}(\mu)]$, $\mu \in \Lambda^{nc}_{dg}$.
- It holds that $\mathrm{ac}^{co}_\nu(G) \in [\mathrm{ac}^{co}_{LB}(\nu), \mathrm{ac}^{co}_{UB}(\nu)]$, $\nu \in \Gamma^{co}_{ac}$, $\mathrm{ac}^{in}_\nu(G) \in [\mathrm{ac}^{in}_{LB}(\nu), \mathrm{ac}^{in}_{UB}(\nu)]$, $\nu \in \Gamma^{in}_{ac}$, and $\mathrm{ac}^{ex}_\nu(G) \in [\mathrm{ac}^{ex}_{LB}(\nu), \mathrm{ac}^{ex}_{UB}(\nu)]$, $\nu \in \Gamma^{ex}_{ac}$.
- It holds that $\mathrm{ec}^{co}_\gamma(G) \in [\mathrm{ec}^{co}_{LB}(\gamma), \mathrm{ec}^{co}_{UB}(\gamma)]$, $\gamma \in \Gamma^{co}$, $\mathrm{ec}^{in}_\gamma(G) \in [\mathrm{ec}^{in}_{LB}(\gamma), \mathrm{ec}^{in}_{UB}(\gamma)]$, $\gamma \in \Gamma^{in}$, and $\mathrm{ec}^{ex}_\gamma(G) \in [\mathrm{ec}^{ex}_{LB}(\gamma), \mathrm{ec}^{ex}_{UB}(\gamma)]$, $\gamma \in \Gamma^{ex}$.
- For each edge $e \in E_{(\ge 2)} \cup E_{(\ge 1)}$ and each $m \in [2, 3]$, $|\{e' \in E(P_e) \mid \beta(e') = m\}| \in [\mathrm{bd}_{m,LB}(e), \mathrm{bd}_{m,UB}(e)]$.
- The additional linear constraints are satisfied.

Figure 9 illustrates one of the $(\sigma_{co}, \sigma_{nc}, \sigma_{\alpha\beta})$-extensions of the seed graph $G_C$ in Figure 6 with the specifications $\sigma_{co}$, $\sigma_{nc}$ and $\sigma_{\alpha\beta}$ in Tables 1, 2 and 3, respectively.

Let $\mathcal{G}(G_C, \sigma_{co}, \sigma_{nc}, \sigma_{\alpha\beta})$ denote the set of all $(\sigma_{co}, \sigma_{nc}, \sigma_{\alpha\beta})$-extensions of $G_C$.

When a required condition on a target chemical graph to be inferred is described with a target specification $(\sigma_{co}, \sigma_{nc}, \sigma_{\alpha\beta})$ with a seed graph $G_C$, the inverse QSAR/QSPR can be formulated as an MILP, as discussed in the next section.
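Two of the conditions above, the valence condition and the bounds on element counts, can be checked directly on a small hydrogen-suppressed graph. The following is a minimal sketch, assuming the usual valences for C, N and O as the function val and using hypothetical names (`VAL`, `valence_ok`, `element_counts_ok`) and toy bounds; it does not cover the edge-configuration or adjacency-configuration conditions.

```python
from collections import Counter

# Assumed valence function val for the elements of Lambda = {C, N, O}.
VAL = {"C": 4, "N": 3, "O": 2}

def valence_ok(alpha, beta):
    """Check that the sum of bond-multiplicities at each vertex u
    does not exceed val(alpha(u)).
    alpha: dict vertex -> element; beta: dict (u, v) -> multiplicity in [1, 3]."""
    load = Counter()
    for (u, v), m in beta.items():
        load[u] += m
        load[v] += m
    return all(load[u] <= VAL[alpha[u]] for u in alpha)

def element_counts_ok(alpha, na_lb, na_ub):
    """Check na_a(G) in [na_LB(a), na_UB(a)] for each element a."""
    counts = Counter(alpha.values())
    return all(na_lb[a] <= counts[a] <= na_ub[a] for a in na_lb)

# A toy three-vertex graph: C double-bonded to O and single-bonded to N.
alpha = {0: "C", 1: "O", 2: "N"}
beta = {(0, 1): 2, (0, 2): 1}

# A violating assignment: a triple bond at an oxygen atom.
bad_beta = {(0, 1): 3, (0, 2): 1}

# Toy bounds, not the values of Table 3.
na_lb = {"C": 1, "N": 0, "O": 1}
na_ub = {"C": 2, "N": 1, "O": 1}

good = valence_ok(alpha, beta) and element_counts_ok(alpha, na_lb, na_ub)
bad = valence_ok(alpha, bad_beta)
```

In the MILP of the next section such conditions are imposed as linear constraints on decision variables rather than checked after the fact; the sketch only illustrates what the conditions state.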
H-cyclic arXiv v5: December 4, 2020 σ αβ .Λ = { C , N , O } Λ codg = { C , C , C , N , O } Λ ncdg = { C , C , C , C , N , N , O , O } Γ coac ν co1 = ( C , C , , ν co2 = ( C , C , , ν co3 = ( C , N , , ν co4 = ( C , O , coac ,> ν = ( N , C , , ν = ( O , C , inac ν in1 = ( C , C , , ν in2 = ( C , C , , ν in3 = ( C , O , exac ν ex1 = ( C , C , , ν ex2 = ( C , C , , ν ex3 = ( C , N , , ν ex4 = ( N , C , , ν ex5 = ( C , O , , ν ex6 = ( C , O , ,ν ex7 = ( O , C , co γ co1 = ( C , C , , γ co2 = ( C , C , , γ co3 = ( C , C , , γ co4 = ( C , C , , γ co5 = ( C , C , ,γ co6 = ( C , C , , γ co7 = ( C , C , , γ co8 = ( C , N , , γ co9 = ( C , N , , γ co10 = ( C , O , co > γ = ( C , C , , γ = ( C , C , , γ = ( C , C , , γ = ( C , C , , γ = ( N , C , ,γ = ( N , C , , γ = ( O , C , in γ in1 = ( C , C , , γ in2 = ( C , C , , γ in3 = ( C , C , , γ in4 = ( C , O , , γ in5 = ( C , O , ex γ ex1 = ( C , C , , γ ex2 = ( C , C , , γ ex3 = ( C , C , , γ ex4 = ( C , C , , γ ex5 = ( C , N , ,γ ex6 = ( C , N , , γ ex7 = ( C , O , , γ ex8 = ( O , C , , γ ex9 = ( O , C , , γ ex10 = ( N , C , ∗ ( u ) = { N } , Λ ∗ ( u ) = { C , N } , Λ ∗ ( u ) = { C , O } , Λ ∗ ( u ) = { C } , u ∈ V C \ { u , u , u } C N O na LB ( λ ) 27 1 1na UB ( λ ) 37 4 8 C N O na coLB ( λ ) 9 1 0na coUB ( λ ) 23 4 5 C N O na ncLB ( λ ) 9 1 2na ncUB ( λ ) 18 3 8 C C C C N N N O O LB ( µ ) 6 7 12 0 0 0 0 0 0ns UB ( µ ) 10 11 18 2 2 2 2 5 5 C C C N N O coLB ( µ ) 3 5 0 0 0 0ns coUB ( µ ) 8 15 2 2 3 5 C C C C N N N O O ncLB ( µ ) 6 1 1 0 0 0 0 0 0ns ncUB ( µ ) 10 5 5 2 2 2 2 5 5 ν co1 ν co2 ν co3 ν co4 ac coLB ( ν ) 0 0 0 0ac coUB ( ν ) 30 10 10 10 ν in1 ν in2 ν in3 ac inLB ( ν ) 0 0 0ac inUB ( ν ) 5 5 5 ν ex1 ν ex2 ν ex3 ν ex4 ν ex5 ν ex6 ν ex7 ac exLB ( ν ) 0 0 0 0 0 0 0ac exUB ( ν ) 10 10 10 10 10 10 10 γ co1 γ co2 γ co3 γ co4 γ co5 γ co6 γ co7 γ co8 γ co9 γ co10 ec coLB ( γ ) 0 0 0 0 0 0 0 0 0 0ec coUB ( γ ) 4 15 4 4 10 5 4 4 6 4 γ in1 γ in2 γ in3 γ in4 γ in5 ec inLB ( γ ) 0 0 0 0 0ec inUB ( γ ) 3 3 3 3 3 γ ex1 γ ex2 γ ex3 
γ ex4 γ ex5 γ ex6 γ ex7 γ ex8 γ ex9 γ ex10 ec exLB ( γ ) 0 0 0 0 0 0 0 0 0 0ec exUB ( γ ) 8 4 4 4 4 4 6 4 4 4 a a a a a a a a a a a a a a a a bd , LB ( a i ) 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0bd , UB ( a i ) 1 1 0 2 2 0 0 0 0 0 0 1 0 0 0 0 a a a a a a a a a a a a a a a a bd , LB ( a i ) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0bd , UB ( a i ) 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 H-cyclic arXiv v5: December 4, 2020 C N O C C O C N C N C O C C C N C C C C C O O C C C C C C C C C C C C O C C C C C C C Figure 9: An illustration of a ( σ co , σ nc , σ αβ )-extension G = ( H, α, β ) of the seed graph G C inFigure 6 with the specifications σ co , σ nc and σ αβ in Tables 1, 2 and 3, respectively, where a symbol µ ∈ Λ dg is depicted with a pair of an element a and a degree i such as C H-cyclic arXiv v5: December 4, 2020 The framework for the inverse QSAR/QSPR [6, 11, 30] has been applied to a case of chemicalgraphs with an abstract topological structure such as acyclic or monocyclic graphs by Ito et al. [3]and rank-2 cyclic graphs with a specified polymer topology with a cycle index up to 2 by Zhu et al. [30].We show that such classes of cyclic graphs can be specified with part of our target specification( σ co , σ nc , σ αβ ) to a seed graph.In their applications [11, 30], a set Λ of chemical elements is given and the graph size n ( G ),the core size cs( G ) and the core height ch( G ) of a target graph G are required to be prescribedvalues cs † , ch † and n † . Then we specify the bounds on these values in σ nc so that n LB := n ∗ := n † ;cs LB := cs UB := cs † ; ch LB (t) := ch UB (t) := ch † for some graph element t ∈ V C ∪ E ( ≥ ∪ E ( ≥ ; andch LB (t) := 0, ch UB (t) := ch † for the other elements t ∈ V C ∪ E ( ≥ ∪ E ( ≥ . We set Λ codg and Λ ncdg tobe the sets of all possible symbols in Λ × [1 ,
4] and set Γ co , Γ in and Γ ex to be the sets of all possibleedge-configurations in σ αβ . a a u u a u u u u a a a a u u u u a a a a u u u u a a a a a a (a) G C (c) G C (d) G C (b) G C Figure 10: An illustration of seed graphs for inferring cyclic graphs with rank at most 2: (a) A seedgraph G for monocyclic graphs; (b) A seed graph G for rank-2 cyclic graphs with the polymertopology M ∈ PT (2 ,
4) in Figure 3(d); (c) A seed graph G for rank-2 cyclic graphs with thepolymer topology M ∈ PT (2 ,
4) in Figure 3(e); (d) A seed graph G for rank-2 cyclic graphswith the polymer topology M ∈ PT (2 ,
4) in Figure 3(f).A seed graph for inferring a chemical monocyclic graphs can be selected as a multigraph G with a vertex set V C = { u , u } and edge sets E ( ≥ = { a } and E ( ≥ = { a } , as illustrated inFigure 10(a). We can include a linear constraint ℓ ( a ) ≤ ℓ ( a ) as part of the side constraint in σ co .This constraints reduces the search space on an MILP.A seed graph for inferring a chemical rank-2 cyclic graphs with the polymer topology M ∈PT (2 ,
4) in Figure 3(d) can be selected as a multigraph G with a vertex set V C = { u , u , u , u } and edge sets E ( ≥ = { a , a } , E ( ≥ = { a } and E (=1) = { a , a } , as illustrated in Figure 10(b).We can include a linear constraint ℓ ( a ) ≤ ℓ ( a ) as part of the side constraint in σ co .A seed graph for inferring a chemical rank-2 cyclic graphs with the polymer topology M ∈PT (2 ,
4) in Figure 3(e) can be selected as a multigraph G with a vertex set V C = { u , u , u , u } and edge sets E ( ≥ = { a } , E ( ≥ = { a , a } and E (=1) = { a , a } , as illustrated in Figure 10(c).We can include a linear constraint ℓ ( a ) ≤ ℓ ( a ) + ℓ ( a ) and ℓ ( a ) ≤ ℓ ( a ) in σ co .A seed graph for inferring a chemical rank-2 cyclic graphs with the polymer topology M ∈PT (2 ,
4) in Figure 3(f) can be selected as a multigraph G with a vertex set V C = { u , u , u , u } and edge sets E ( ≥ = { a , a , a } and E (=1) = { a , a } , as illustrated in Figure 10(d). We caninclude a linear constraint ℓ ( a ) ≤ ℓ ( a ) + 1, ℓ ( a ) ≤ ℓ ( a ) + 1 and ℓ ( a ) ≤ ℓ ( a ) in σ co . H-cyclic arXiv v5: December 4, 2020 ρ -lean Graphs Let ( G C , σ co , σ nc , σ αβ ) be a target specification, where ρ denotes the branch-parameter in σ nc . Inthis section, we formulate to an MILP M ( x, g ; C ) in Stage 4 for inferring a chemical ρ -lean cyclicgraph G ∈ G ( G C , σ co , σ nc , σ αβ ). Recall that we treat the underlying graph of a chemical cyclic graph as a mixed graph to define ourdescriptors, where core-edges are undirected edges and non-core-edges are directed. To formulatean MILP that infers a chemical cyclic graph, we further assign a direction of each core-edge sothat constraints on a function τ : E → Γ can be described notationally simpler.Our method first gives directions to the edges in a given seed graph such that V C = { u , u , . . . , u p } , E C = { a , a , . . . , a q } and each edge a i ∈ E C is a directed edge a i = ( u j , u h ) with j < h . Let f V C denote the set of vertices u ∈ V C such that bl UB ( u ) = 1 and e t C = | f V C | . 
Our method first arrangesthe order of verices in V C so thatbl UB ( u i ) = 1, i ∈ [1 , e t C ] and bl UB ( u i ) = 0, i ∈ [ e t C + 1 , t C ].Next our method adds some more vertices and edges to the resulting digraph G C to constructa digraph, called a scheme graph SG = ( V , E ) so that any ρ -lean graph H ∈ H ( G C , σ co , σ nc ) (i.e.,any ( σ co , σ nc )-extension H of G C ) can be chosen as a subgraph of the scheme graph SG.To construct a scheme graph, our method first computes some integers that determine the sizeof each building block in SG.For a given specification ( σ co , σ nc ), define e ch LB , X v ∈ V C ch LB ( v ) + X e ∈ E ( ≥ ∪ E ( ≥ ch LB ( e )bl ∗ LB , X v ∈ V C bl LB ( v ) + X e ∈ E ( ≥ ∪ E ( ≥ bl LB ( e ) , bl ∗ UB , X v ∈ V C bl UB ( v ) + X e ∈ E ( ≥ ∪ E ( ≥ bl UB ( e ) ,ℓ ∗ inl , X v ∈ V C max { ch UB ( v ) − ρ, } + X e ∈ E ( ≥ ∪ E ( ≥ bl UB ( e ) · max { ch UB ( e ) − ρ, } ,β ∗ i , | E ( ≥ ( u i ) | + | E ( ≥ ( u i ) | + X e ∈ E (=1) ( u i ) (1 + bd , LB ( e ) + 2bd , LB ( e )) + bl LB ( u i ) , u i ∈ V C , ∆ i := 4 − β ∗ i , u i ∈ V C . H-cyclic arXiv v5: December 4, 2020 t C := | V C | , e t C := |{ u ∈ V C | bl UB ( u ) = 1 }| , m C := | E C | ,t T := cs UB − | V C | ,t F := min (cid:2) n ∗ − cs LB − max { e ch LB , ρ · bl ∗ LB } , ℓ ∗ inl (cid:3) ,d max := 3 if dg nc4 , UB = 0; d max := 4 if dg nc4 , UB ≥ n C ( i ) := ∆ i · (( d max − ρ − / ( d max − ,n T := 2(( d max − ρ − / ( d max − ,n F := ( d max − d max − ρ − / ( d max − ,m coUB := min { n ∗ + r( G C ) , m C + X e ∈ E ( ≥ ∪ E ( ≥ ℓ UB ( e ) } ,m ncUB := min { n ∗ , t F + X i ∈ [1 ,t C ] n C ( i ) + n T · t T + n F · t F − } ,m UB := min { n ∗ + r( G C ) , m coUB + m ncUB } , (2)where n C ( i ) is the number of “edges” in the rooted tree T (∆ i , d max − , ρ ), n T is the number of“edges” in the rooted tree T (2 , d max − , ρ ) and n F is the number of “edges” in the rooted tree T ( d max − , d max − , ρ ). 
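The quantities $n_C(i)$, $n_T$ and $n_F$ above are edge counts of rooted trees of the form $T(a, b, c)$. The following is a minimal sketch, assuming that $T(a, b, c)$ denotes the rooted tree of height $c$ in which the root has $a$ children and every other non-leaf vertex has $b$ children, and that $d_{\max}$ is 3 when no non-core-vertex of degree 4 is allowed and 4 otherwise; the function names are hypothetical.

```python
def edges_in_tree(a, b, c):
    """Number of edges in the rooted tree T(a, b, c): the root has a children,
    every other non-leaf vertex has b children, and the height is c.
    Counts the vertices on each level below the root (one edge per vertex)."""
    level, total = a, 0
    for _ in range(c):
        total += level
        level *= b
    return total

def closed_form(a, b, c):
    """Closed form a * (b^c - 1) / (b - 1), valid for b >= 2."""
    return a * (b**c - 1) // (b - 1)

def d_max(dg4_nc_ub):
    """Maximum degree of a non-core-vertex allowed by the specification."""
    return 3 if dg4_nc_ub == 0 else 4

rho = 2
d = d_max(1)                          # degree-4 non-core-vertices allowed
n_T = edges_in_tree(2, d - 1, rho)    # tree attached to a vertex in V_T
n_C = edges_in_tree(3, d - 1, rho)    # tree for a seed vertex with Delta_i = 3
```

With $b = d_{\max} - 1$ the closed form reads $a((d_{\max} - 1)^\rho - 1)/(d_{\max} - 2)$, which is the shape of the expressions for $n_C(i)$ and $n_T$.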
Recall that any core-vertex is allowed to be of degree 4.Formally the scheme graph SG = ( V , E ) is defined with a vertex set V = V C ∪ V T ∪ V F ∪ V exC ∪ V exT ∪ V exF and an edge set E = E C ∪ E T ∪ E F ∪ E CT ∪ E TC ∪ E CF ∪ E TF ∪ E exC ∪ E exT ∪ E exF thatconsist of the following sets. Construction of the core
Cr( H ) of a ( σ co , σ nc ) -extension H of G C : Denote the vertex set V C and the edge set E C in the seed graph G C by V C = { v C i, | i ∈ [1 , t C ] } and E C = { a i | i ∈ [1 , m C ] } ,respectively, where V C is always included in Cr( H ). For including additional core-vertices inCr( H ), introduce a path P T = ( V T = { v T1 , , v T2 , , . . . , v T t T , } , E T = { e T1 , e T2 , . . . , e T t T } ) oflength t T − E CT (resp., E TC ) of directed edges e CT i,j = ( v C i, , v T j, ) (resp., e TC i,j =( v T j, , v C i, )) i ∈ [1 , t C ], j ∈ [1 , t T ]. In Cr( H ), an edge a k = ( v C i, , v C i ′ , ) ∈ E ( ≥ ∪ E ( ≥ isallowed to be replaced with a path P k from core-vertex v C i, to core-vertex v C i ′ , that visits a setof consecutive vertices v T j, , v T j +1 , , . . . , v T j + p, ∈ V T and edge e TC i,j = ( v C i, , v T j, ) ∈ E CT , thenedges e T j +1 , e T j +2 , . . . , e T j + p ∈ E T and finally edge e TC i ′ ,j + p = ( v T j + p, , v C i ′ , ) ∈ E TC . The verticesin V T in the path will be core-vertices in Cr( H ). Construction of paths with ρ -internal edges in a ( σ co , σ nc ) -extension H of G C : Intro-duce a path P F = ( V F = { v F1 , , v F2 , , . . . , v F t F , } , E F = { e F1 , e F2 , . . . , e F t F } ) of length t F − E CF of directed edges e CF i,j = ( v C i, , v F j, ), i ∈ [1 , t C ], j ∈ [1 , t F ], and a set E TF of di-rected edges e TF i,j = ( v T i, , v F j, ), i ∈ [1 , t T ], j ∈ [1 , t F ]. In H , a path P with ρ -internal edgesthat starts from a core-vertex v C i, ∈ V C (resp., v T i, ∈ V T ) visits a set of consecutive vertices v F j, , v F j +1 , , . . . , v F j + p, ∈ V F and edge e CF i,h = ( v C i, , v F j, ) ∈ E CF (resp., e TF i,j = ( v T i, , v F j, ) ∈ E TF ) and edges e F j +1 , e F j +2 , . . . , e F j + p ∈ E F . In H , the edges and the vertices (except for v C i, ) inthe path P are regarded as ρ -internal edges and ρ -internal vertices, respectively. 
Construction of ρ -fringe-trees in a ( σ co , σ nc ) -extension H of G C : In H , the root of a ρ -fringe-tree can be any vertex in V C ∪ V T ∪ V F . Let X ∈ { C , T , F } . Introduce a rooted tree X i , i ∈ [1 , t X ] at each vertex v X i, , where each C i is isomorphic to T ( d max − , d max − , ρ ), each T i H-cyclic arXiv v5: December 4, 2020 F i v T i ,0 v T t T ,0 F t F e T i v T v T e T F F (e) P F = ( V F , E F ) v F t F ,0 T i T t T v F T (c) P T = ( V T , E T ) v F i ,0 e F i v F e F T v F i -1,0 v T i -1,0 C C t C C C i : E ( ≧ ) ={ a ,a ,...,a p }, p =| E ( ≧ ) | (a) G C = ( V C , E C ) v C v C v C t C ,0 v C i ,0 e TC i , j e CT i , j e TF i , j e CF i , j i [1, t C ], j [1, t T ] U I U I i [1, t C ], j [1, t F ] U I U I i [1, t T ], j [1, t F ] U I U I e T t T e F t F i [1, t C ], j [1, t T ] U I U I e T i +1 e F i +1 (f) F i = T ( d max - , d max - , r ), i [1, t F ] e C i ,1 e C i ,2 v C i ,0 e T i ,1 v T i ,0 U I v T i , n T v T i ,2 v T i ,6 e T i ,2 e T i ,7 e T i ,8 v T i ,7 v T i ,1 v T i ,3 e T i ,3 e T i ,4 e T i ,5 v T i ,4 v T i ,5 e T i ,6 e F i ,1 v F i ,0 v F i ,2 v F i ,7 e F i ,3 e F i ,8 e F i ,9 v F i ,8 v F i ,1 v F i ,4 e F i ,4 e F i ,5 e F i ,6 v F i ,5 v F i ,6 e F i ,7 v F i , n F v F i ,3 v F i ,10 e F i ,11 e F i ,12 v F i ,11 e F i ,10 v F i ,9 e F i ,2 v C i ,6 e C i ,7 e C i ,8 v C i ,7 e C i ,5 v C i ,5 e C i ,6 v C i ,1 v C i ,2 e C i ,3 e C i ,4 v C i ,3 v C i ,4 : E ( ≧ ) ={ a p +1 ,a p +2 ,...,a q }, q = c C =| E ( ≧ ) |+| E ( ≧ ) |: E ( / ) ={ a q +1 ,a q +2 ,...,a s }, s =| E ( ≧ ) |+| E ( ≧ ) |+| E ( / ) |: E ( =1 ) ={ a s +1 ,a s +2 ,...,a m C }, m C =| E C | (d) T i = T ( , d max - , r ), i [1, t T ] U I (b) C i = T ( D i , d max - , r ), i [1, t C ] U I v C i , n C ( i ) Figure 11: An illustration of the structure of a scheme graph SG, where vertices depicted withsquares represent core-vertices, vertices depicted with gray circles represent ρ -internal-vertices, andvertices depicted with white circles represent 
$\rho$-external-vertices: (a) A seed graph $G_C = (V_C, E_C = E_{(\ge 2)} \cup E_{(\ge 1)} \cup E_{(0/1)} \cup E_{(=1)})$; (b) A tree $C_i$, $i \in [1, t_C]$, rooted at a core-vertex $v^{\mathrm{C}}_{i,0} \in V_C$; (c) A path $P_T = (V_T, E_T)$ of length $t_T - 1$; (d) A tree $T_i$, $i \in [1, t_T]$, rooted at a core-vertex $v^{\mathrm{T}}_{i,0} \in V_T$; (e) A path $P_F = (V_F, E_F)$ of length $t_F$ −
1; (f) A rooted tree F i , i ∈ [1 , t F ] rooted at a ρ -internalvertex v F i, ∈ V F .is isomorphic to T (2 , d max − , ρ ) and each F i is isomorphic to T ( d max − , d max − , ρ ). The j -thvertex (resp., edge) in each rooted tree X i is denoted by v X i,j (resp., e X i,j ) See Figure 11. Let V exX and E exX denote the set of non-root vertices v X i,j and the set of edges e X i,j over all rooted trees X i , i ∈ [1 , t X ]. In H , a ρ -fringe-tree is selected as a subtree of X i , i ∈ [1 , t X ] with root v X i, .We see that the scheme graph SG = ( V , E ) for a specification ( G C , σ co , σ nc , σ αβ ) satisfies thefollowing. |V| = O (( n ∗ + cs UB )( d max − ρ ) , |E | = O ( | E C | + |V| + n ∗ · cs UB ) . H-cyclic arXiv v5: December 4, 2020 Let K denote the dimension of a feature vector x = f ( G ) used in constructing a prediction function ψ over a set of chemical graphs G . Note that sets of chemical symbols and edge-configuration inStages 4 and 5 can be subsets of those used in constructing a prediction function ψ in Stage 3.Based on the above scheme graph SG, we obtain an MILP formulation that satisfies the followingresult. Theorem 2.
Let $(\sigma_{\mathrm{co}}, \sigma_{\mathrm{nc}}, \sigma_{\alpha\beta})$ be a target specification and $|\Gamma| = |\Lambda^{\mathrm{co}}_{\mathrm{dg}}| + |\Lambda^{\mathrm{nc}}_{\mathrm{dg}}| + |\Gamma^{\mathrm{co}}| + |\Gamma^{\mathrm{in}}| + |\Gamma^{\mathrm{ex}}|$ for sets of chemical symbols and edge-configurations in $\sigma_{\alpha\beta}$. Then there is an MILP $\mathcal{M}(x, g; \mathcal{C})$ that consists of variable vectors $x \in \mathbb{R}^K$ and $g \in \mathbb{R}^q$ for an integer $q = O(\mathrm{cs}_{\mathrm{UB}}(|E_C| + n^*) + (|E_C| + |\mathcal{V}|)|\Gamma|)$ and a set $\mathcal{C}$ of $O([\mathrm{cs}_{\mathrm{UB}}(|E_C| + n^*) + |\mathcal{V}|]\,|\Gamma|)$ constraints on $x$ and $g$ such that: $(x^*, g^*)$ is feasible to $\mathcal{M}(x, g; \mathcal{C})$ if and only if $g^*$ forms a chemical $\rho$-lean graph $G \in \mathcal{G}(G_C, \sigma_{\mathrm{co}}, \sigma_{\mathrm{nc}}, \sigma_{\alpha\beta})$ such that $f(G) = x^*$.

Note that our MILP requires only $O(n^*)$ variables and constraints when the branch-parameter $\rho$ and the integers $|E_C|$, $\mathrm{cs}_{\mathrm{UB}}$ and $|\Gamma|$ are constant.

We explain the basic idea of our MILP in Theorem 2. The MILP mainly consists of the following three types of constraints.
C1. Constraints for selecting a $\rho$-lean graph $H \in \mathcal{H}(G_C, \sigma_{\mathrm{co}}, \sigma_{\mathrm{nc}})$ as a subgraph of the scheme graph SG;
C2. Constraints for assigning chemical elements to vertices and multiplicity to edges to determine a chemical graph $G = (H, \alpha, \beta)$; and
C3. Constraints for computing descriptors from the selected chemical graph $G$.

In the constraints of C1, more formally we prepare the following.
Variables:
a binary variable $v^{\mathrm{X}}(i, j) \in \{0, 1\}$ for each vertex $v^{\mathrm{X}}_{i,j} \in V_X$, $\mathrm{X} \in \{\mathrm{C}, \mathrm{T}, \mathrm{F}\}$ so that $v^{\mathrm{X}}(i, j) = 1 \Leftrightarrow$ vertex $v^{\mathrm{X}}_{i,j}$ is used in a graph $H$ selected from SG;
a binary variable $e^{\mathrm{X}}(i) \in \{0, 1\}$ (resp., $e^{\mathrm{C}}(i) \in \{0, 1\}$) for each edge $e^{\mathrm{X}}_i \in E_T \cup E_F$ (resp., $e^{\mathrm{C}}_i = a_i \in E_{(\ge 2)} \cup E_{(\ge 1)} \cup E_{(0/1)}$) so that $e^{\mathrm{X}}(i) = 1 \Leftrightarrow$ edge $e^{\mathrm{X}}_i$ is used in a graph $H$ selected from SG. To reduce the number of variables in our MILP formulation, we do not prepare a binary variable $e^{\mathrm{X}}(i, j) \in \{0, 1\}$ for any edge $e^{\mathrm{X}}_{i,j} \in E_{\mathrm{CT}} \cup E_{\mathrm{TC}} \cup E_{\mathrm{CF}} \cup E_{\mathrm{TF}}$; instead, we represent a choice of edges in these sets by a set of $O(n^* |E_C|)$ variables (see [ ?
] for the details);Constraints:linear constraints so that each ρ -fringe-tree of a graph H from SG is selected a subtree of some ofthe rooted trees C i , i ∈ [1 , t C ], T i , i ∈ [1 , t T ] and F i , i ∈ [1 , t F ];linear constraints such that each edge e C i = a i ∈ E (=1) is always used as a core-edge in H andeach edge e C i = a i ∈ E (0 / is used as a core-edge in H if necessary;linear constraints such that for each edge a k = ( v C i , v C i ′ ) ∈ E ( ≥ , vertex v C i ∈ V C is connectedto vertex v C i ′ ∈ V C in H by a path P k that passes through some core-vertices in V T and edges H-cyclic arXiv v5: December 4, 2020 e CT i,j , e T j +1 , e T j +2 , . . . , e T j + p , e TC i ′ ,j + p for some integers j and p ;linear constraints such that for each edge a k = ( v C i , v C i ′ ) ∈ E ( ≥ , either the edge a k is used as acore-edge in H or vertex v C i ∈ V C is connected to vertex v C i ′ ∈ V C in H by a path P k as in thecase of edges in E ( ≥ ;linear constraints for selecting a path P with ρ -internal edges e CF i,j (or e TF i,j ), e F j +1 , e F j +2 , . . . , e F j + p for some integers j and p .Based on these, we include constraints with some more additional variables so that a selectedsubgraph H is a connected graph and satisfies the core specification σ co and the non-core speci-fication σ nc . See constraints (4) to (10) in Appendix A.1 for choosing core-edges from the path P T . See constraints (11) to (18) in Appendix A.2 for choosing internal ρ -internal vertices/edgesfrom the path P F . See constraints (19) to (32) in Appendix A.3 for choosing internal ρ -externalvertices/edges from the trees C i , T i and F i .In the constraints of C2, we prepare an integer variable α X ( i, j ) for each vertex v X i,j ∈ V ,X ∈ { C , T , F } in the scheme graph that represents the chemical element α ( v X i,j ) ∈ Λ if v X i,j is ina selected graph H (or α ( v X i,j ) = 0 otherwise); integer variables β C : E C → [0 , β T : E T → [0 , β F : E F → [0 ,
3] that represent the bond-multiplicity of edges in E C ∪ E T ∪ E F ; and integervariables β + , β − : E ( ≥ ∪ E ( ≥ → [0 ,
3] and β in : f V C ∪ V T → [0 ,
3] that represent the bond-multiplicity of edges in $E_{\mathrm{CT}} \cup E_{\mathrm{TC}} \cup E_{\mathrm{CF}} \cup E_{\mathrm{TF}}$. This determines a chemical graph $G = (H, \alpha, \beta)$. We also include constraints for a selected chemical graph $G$ to satisfy the valence condition at each vertex $v$ with the edge-configurations $\tau(e)$ of the edges incident to $v$ and the chemical specification $\sigma_{\alpha\beta}$. See constraints (43) to (53) in Appendix A.5 for assigning multiplicity; and constraints (56) to (69) in Appendix A.6 for assigning chemical elements and satisfying the valence condition.

In the constraints of C3, we introduce a variable for each descriptor and constraints with some more variables to compute the value of each descriptor in $f(G)$ for a selected chemical graph $G$. See constraints (33) to (42) in Appendix A.4 for the descriptors counting the vertices of each specified degree; constraints (70) to (73) in Appendix A.7 for lower and upper bounds on the number of bonds in a chemical specification $\sigma_{\alpha\beta}$; constraints (74) to (84) in Appendix A.8 for the descriptors counting adjacency-configurations; constraints (85) to (88) in Appendix A.9 for the descriptors counting chemical symbols; and constraints (89) to (99) in Appendix A.10 for the descriptors counting edge-configurations. When we use adjacency-configurations in a feature vector $f(G)$ instead of edge-configurations, we do not need to include the constraints in Appendix A.10.

This section designs a new algorithm for generating $\rho$-lean cyclic graphs $G$ that have the same feature vector $f(G^\dagger)$ as a given chemical $\rho$-lean graph $G^\dagger$.

We design a new algorithm for generating cyclic chemical graphs based on the following aspects:
(a) Treat the non-core components of a cyclic graph with a certain limited structure that frequently appears among chemical compounds registered in the chemical database;
H-cyclic arXiv v5: December 4, 2020 fff ( G ′ )(some types of feature vectors) of subtrees G ′ of all target graphs and then construct alimited number of target graphs G from the process of computing the vectors; and(c) First construct a chemical graph G † ∈ G with fff ( G † ) = x ∗ by solving an MILP in Stage 4and restrict ourselves to a family of chemical graphs G ∗ ∈ G that have a common structurewith the initial chemical graph G † .In (a), we choose a small branch-parameter ρ such as ρ = 2 and treat chemical ρ -lean cyclicgraphs G such that each 2-fringe-tree in G satisfies the size constraint (1).We design a method in (b) by extending the dynamic programming algorithm for generatingacyclic chemical graphs proposed by Azam et al. [4]. The first phase of the algorithm computessome compressed forms of all substructures of target objects before the second phase realizes afinal object based on the computation process of the first phase.The idea of (c) is first introduced in this paper. Informally speaking, we first decompose thechemical graph G † into a collection of chemical subtrees T † , T † , . . . , T † p and then compute vectors x ∗ , x ∗ , . . . , x ∗ p such that fff ( T † i ) = x ∗ i , i ∈ [1 , p ] and any collection of other chemical trees T † i with fff ( T † i ) = x ∗ i always gives rise to a target chemical graph G ∗ ∈ G . 
Thus this allows us to generatechemical trees T † i with fff ( T † i ) = x ∗ i for each index i ∈ [1 , p ] independently before we combine themto get a chemical graph G † ∈ G in Stage 5.In the following, we describe a new algorithm that for a given ρ -lean chemical graph G † =( H † , α † , β † ), generates chemical ρ -lean cyclic graphs G ∗ = ( H, α, β ) such that fff ( G ∗ ) = fff ( G † ) and the core Cr( H ) is isomorphic to the core Cr( H † ),where G ∗ may not be isomorphic to G † and the elements in Λ may not correspond between thetwo cores; i.e., possibly α ( v ) = α † ( φ ( v )) for some core-vertex v of H in the graph-isomorphism φ between Cr( H ) and Cr( H † ).In this section, we describe our new algorithm in a general setting where a branch-parameteris any integer ρ ≥ G to be inferred is any chemical ρ -lean cyclic graph. Nc-trees
Let $\rho$ be a branch-parameter and $H$ be a $\rho$-lean cyclic graph. We have introduced core-subtrees in Section 2. We define "non-core-subtrees" as follows depending on the branch-parameter $\rho$. Let $T$ be a connected subgraph of $H$. We call $T$ a non-core-subtree of $H$ if $T$ consists of a path $P_{\mathrm{in}}$ of a $\rho$-pendent-tree of $H$ and the $\rho$-fringe-trees rooted at vertices in $P_{\mathrm{in}}$. We call a non-core-subtree $T$ of $H$ an internal-subtree (resp., an end-subtree) of $H$ if neither (resp., one) of the two end-vertices of $P_{\mathrm{in}}$ is a leaf $\rho$-branch of $H$, as illustrated in Figure 12(a) (resp., in Figure 12(b)).

To represent a non-core-subtree of a $\rho$-lean cyclic graph $H$, we introduce "nc-trees." We define an nc-tree to be a chemical bi-rooted tree $T$ such that each rooted tree $T' \in \mathcal{F}(T)$ has height at most $\rho$. For an nc-tree $T$, define
$\widetilde{V}^{\mathrm{co}}(T) \triangleq \emptyset$ (resp., $\widetilde{E}^{\mathrm{co}}(T) \triangleq \emptyset$);
$\widetilde{V}^{\mathrm{in}}(T) \triangleq V(P_T)$ (resp., $\widetilde{E}^{\mathrm{in}}(T) \triangleq E(P_T)$) for the backbone path $P_T$ of $T$;
$\widetilde{V}^{\mathrm{ex}}(T) \triangleq V(T) \setminus \widetilde{V}^{\mathrm{in}}(T)$ (resp., $\widetilde{E}^{\mathrm{ex}}(T) \triangleq E(T) \setminus \widetilde{E}^{\mathrm{in}}(T)$).
Define the number $\mathrm{bc}_\rho(T)$ of $\rho$-branch-core-vertices in $T$ to be $\mathrm{bc}_\rho(T) = 0$ and the core height $\mathrm{ch}(T)$ of $T$ to be 0.

C-trees
To represent a core-subtree of a $\rho$-lean cyclic graph $H$, we introduce "c-trees." For a branch-parameter $\rho$, we call a bi-rooted tree $T$ $\rho$-lean if each rooted tree $T' \in \mathcal{F}(T)$ contains at most one $\rho$-branch; i.e., there is no non-leaf $\rho$-branch and no two $\rho$-branch-pendent-trees meet at the same vertex in $P_T$. A c-tree is defined to be a chemical $\rho$-lean bi-rooted tree $T$. See Figure 12(c) and (d) for illustrations of c-trees $T$ with $\ell(P_T) = 0$ and $\ell(P_T) \ge 1$, respectively. For a c-tree $T$, define
$\widetilde{V}^{\mathrm{co}}(T) \triangleq V(P_T)$ (resp., $\widetilde{E}^{\mathrm{co}}(T) \triangleq E(P_T)$) for the backbone path $P_T$ of $T$;
$\widetilde{V}^{\mathrm{in}}(T)$ (resp., $\widetilde{E}^{\mathrm{in}}(T)$) to be the set of $\rho$-internal vertices (resp., $\rho$-internal edges) in the rooted trees $T' \in \mathcal{F}(T)$;
$\widetilde{V}^{\mathrm{ex}}(T) \triangleq V(T) \setminus (\widetilde{V}^{\mathrm{co}}(T) \cup \widetilde{V}^{\mathrm{in}}(T))$ (resp., $\widetilde{E}^{\mathrm{ex}}(T) \triangleq E(T) \setminus (\widetilde{E}^{\mathrm{co}}(T) \cup \widetilde{E}^{\mathrm{in}}(T))$).
Define the number $\mathrm{bc}_\rho(T)$ of $\rho$-branch-core-vertices in $T$ to be the number of rooted trees $T' \in \mathcal{F}(T)$ with $\mathrm{ht}(T') > \rho$. Define the core height $\mathrm{ch}(T) \triangleq \mathrm{ht}(T)$ for the bi-rooted tree $T$. Note that $\widetilde{V}^{\mathrm{ex}}(T)$ (resp., $\widetilde{E}^{\mathrm{ex}}(T)$) is the set of $\rho$-external vertices (resp., $\rho$-external edges) in the rooted trees in $\mathcal{F}(T)$.

Fictitious Trees
For an nc-tree or a c-tree T and an integer ∆ ∈ [1 , T [+∆] denote a ficti-tious chemical graph obtained from T by regarding the degree of terminal r ( T ) as deg T ( r ( T ))+∆.Figure 13(a) and (b) illustrate fictitious trees T [+∆] in the case of r ( T ) = r ( T ) and T [+1] inthe case of ∆ = 1 and r ( T ) = r ( T ).For a c-tree T with r ( T ) = r ( T ) and integers ∆ , ∆ ∈ [1 , T [+∆ , ∆ ] denote afictitious chemical graph obtained from T by regarding the degree of terminal r i ( T ), i = 1 , T ( r i ( T )) + ∆ i . Figure 13(c) illustrates a fictitious bi-rooted c-tree T [+∆ , ∆ ]. For a finite set A of elements, let Z A + denote the set of functions : A → Z + . A function ∈ Z A + is called a non-negative integer vector (or a vector) on A and the value xxx ( a ) for an element a ∈ A is called the entry of xxx for a ∈ A . For a vector ∈ Z A + and an element a ∈ A , let + 111 a (resp., − a ) denote the vector ′ such that ′ ( a ) = ( a ) + 1 (resp., ′ ( a ) = ( a ) −
1) and ′ ( b ) = ( b )for the other elements b ∈ A \ { a } . For a vector ∈ Z A + and a subset B ⊆ A , let [ B ] denote the projection of to B ; i.e., [ B ] ∈ Z B + such that [ B ] ( b ) = ( b ), b ∈ B .To introduce a “frequency vector” of a subgraph of a chemical cyclic graph, we define sets ofsymbols that correspond to some descriptors of a chemical cyclic graph. Let Γ co , Γ in and Γ ex besets of edge-configurations in Section 2.2. We define a vector whose entry is the frequency of anedge-configuration in sets Γ t , t ∈ { co , in , ex } or the number of ρ -branch-core-vertices. We use asymbol bc to denote the number of ρ -branch-core-vertices in our frequency vector. To distinguishedge-configurations from different sets among three sets Γ t , t ∈ { co , in , ex } , we use γ t to denotethe entry of an edge-configuration γ ∈ Γ t , t ∈ { co , in , ex } . We denote by h Γ t i the set of entries γ t , H-cyclic arXiv v5: December 4, 2020 (a) a non-core-subtree (internal-subtree) T of G P T (b) a non-core-subtree (end-subtree) T of G core of G core of G P T P T r ( T ) r ( T ) r ( T ) r ( T ) core of G (c) a core-subtree T of G with l ( P T )=0 (d) a core-subtree T of G with l ( P T ) ≧ P T r ( T ) r ( T ) core of G ch ( T ) ch ( T ) Figure 12: An illustration of subtrees of a chemical ρ -lean cyclic graph G , where thick lines depictthe cycle of the core of G , gray circles depict leaf ρ -branches in G and arrows depict non-coredirected edges: (a) A non-core-subtree (internal-subtree) T of G represented by an nc-tree (achemical bi-rooted tree); (b) A non-core-subtree (end-subtree) T of G represented by an nc-tree(a chemical bi-rooted tree); (c) A core-subtree T of G with ℓ ( P T ) = 0 represented by a c-tree(a chemical rooted tree); (d) A core-subtree T of G with ℓ ( P T ) ≥ dr ( T ) r ( T ) P T (b) T [ + ](c) T [ + D , D ] r ( T ) r ( T ) P T v v v d D = D = r = r ( T )= r ( T ) d v (a) T [ + D ] D = m m m m . . . 
m d v d a a Figure 13: An illustration of fictitious trees: (a) T [+∆] of a rooted nc- or c-tree T ; (b) T [+1] of abi-rooted nc-tree T ; (c) T [+∆ , ∆ ] of a bi-rooted c-tree T . H-cyclic arXiv v5: December 4, 2020 γ ∈ Γ t , t ∈ { co , in , ex } . Define the set of all entries of a frequency vector to beΣ , { bc } ∪ [ t ∈{ co , in , ex } h Γ t i . Given an nc-tree or c-tree G or a chemical ρ -lean cyclic graph G , define the frequency vector fff ( G ), to be a vector ∈ Z Σ+ that consists of the following entries:- ( γ co ) = |{ uv ∈ e E co | ( α ( u ) , deg G ( u ) , α ( v ) , deg G ( v )) ∈ { ( a , i, b , j, m ) , ( b , j, a , i, m ) } , β ( uv ) = m }| , γ = ( a i, b j, m ) co ∈ h Γ co i ;- ( γ t ) = |{ ( u, v ) ∈ e E t | α ( u ) = a , deg G ( u ) = i, α ( v ) = b , deg G ( v ) = j, β ( uv ) = m }| , γ = ( a i, b j, m ) t ∈ h Γ t i , t ∈ { in , ex } ;- ( bc ) = bc ρ ( G ).Note that any other descriptors of a chemical ρ -lean cyclic graph G ∈ G ( xxx ∗ ) except for the coreheight can be determined by the entries of the frequency vector = fff ( G ). For example, the vector zzz ∈ Z { dg1 , dg2 , dg3 , dg4 } + with the numbers dg i of core-vertices of degree i ∈ [1 ,
4] is given by zzz = 12 h X γ co =( a i, a j,m ) co ∈h Γ co i ( γ co ) · (111 dg i +111 dg j ) − X i v =deg G ( v ): v ∈ V B (∆ v − dg i v i and the vector zzz ′ ∈ Z Λ dg + with the numbers of symbols µ ∈ Λ dg of core-vertices is given by zzz ′ = 12 h X γ co =( µ,ξ,m ) co ∈h Γ co i ( γ co ) · (111 µ +111 ξ ) − X µ v = a v deg G ( v ): v ∈ V B (∆ v − µ v i . Similarly the vector zzz ′′ ∈ Z Λ dg + the numbers of symbols µ ∈ Λ dg of non-core-vertices is given by zzz ′ = X γ in =( µ,ξ,m ) in ∈h Γ in i ( γ in ) · ξ + X γ ex =( µ,ξ,m ) ex ∈h Γ ex i ( γ ex ) · ξ . For an nc-tree or c-tree T , the frequency vector fff ( T [+∆]) of a fictitious tree T [+∆] is defined asfollows: Let r = r ( T ), d = deg T ( r ), N T ( r ) = { v , v , . . . , v d } , γ j = ( µ j = a j d, ξ j , m j ) = τ (( r, v i )) ∈ Γ, j ∈ [1 , d ]. Let e γ j = ( a j { d +∆ } , ξ j , m j ), j ∈ [1 , d ]. Set t := in if T is an nc-tree, and t := co if T is a c-tree. When r ( T ) = r ( T ), fff ( T [+∆]) = fff ( T ) + X j ∈ [1 ,d ] (111 e γ t j − γ t j ) . Let r ( T ) = r ( T ), and v d belong to P T . When T is an nc-tree, fff ( T [+∆]) = fff ( T ) + X j ∈ [1 ,d − (111 e γ ex j − γ ex j ) + 111 e γ in d − γ in d . The frequency vector fff ( T [+∆ , ∆ ]) of a fictitious tree T [+∆ , ∆ ] for a bi-rooted c-tree T with r ( T ) = r ( T ) is defined as follows: For each i = 1 ,
2, let r i = r i ( T ), a i = α ( r i ), d i = deg T ( r i ), m i = β ( e i ) of the unique edge incident to r i and γ i = ( µ i = a i d i , ξ i , m i ) = τ ( e i ) ∈ Γ, i = 1 ,
2. Let e γ i = ( a i { d i +∆ i } , ξ i , m i ), i = 1 ,
2. Then fff ( T [+∆ , ∆ ]) = fff ( T ) + X i =1 , (111 e γ co i − γ co i ) . H-cyclic arXiv v5: December 4, 2020 For a chemical ρ -lean cyclic graph for a branch-parameter ρ ≥
1, we choose a path-partition $\mathcal{P} = \{P_1, P_2, \ldots, P_p\}$ of the core $\mathrm{Cr}(H) = (V^{\mathrm{co}}, E^{\mathrm{co}})$, where $|\mathcal{P}| \le |E^{\mathrm{co}}|$. Let $V_B$ denote the set of all end-vertices of paths $P \in \mathcal{P}$, where $V^{\mathrm{bc}}_\rho \subseteq V_B \subseteq V^{\mathrm{co}}$.

Define the base-graph $G_B = (V_B, E_B)$ of $H$ by $\mathcal{P}$ to be the multigraph obtained from $H$ by replacing each path $P_j \in \mathcal{P}$ with a single edge $e_j$ joining the end-vertices of $P_j$, where $E_B = \{e_1, e_2, \ldots, e_p\}$. We call a vertex in $V_B$ and an edge in $E_B$ a base-vertex and a base-edge, respectively. For notational convenience in distinguishing the two end-vertices $u$ and $v$ of a base-edge $e = uv \in E_B$, we regard each base-edge $e = uv$ as a directed edge $e = (u, v)$. For each base-edge $e \in E_B$, let $P_e$ denote the path $P_j \in \mathcal{P}$ that is replaced by edge $e = e_j$.

We define the "components" of $G$ by $\mathcal{P}$ as follows.

Vertex-components
For each base-vertex $v \in V_B$, define the component at vertex $v$ (or the $v$-component) $T_v$ of $G$ to be the chemical core-subtree rooted at $v$ in $G$; i.e., $T_v$ consists of all pendent-trees rooted at $v$. We regard $T_v$ as a c-tree rooted at the core-vertex $v$ of $G$ and define the code $\mathrm{code}_v(T_v)$ of $T_v$ to be a tuple $(a_v, d_v, m_v, \Delta_v, \boldsymbol{x}_v)$ such that
$a_v = \alpha(v)$, $d_v = \deg_G(v) - \deg_{G_B}(v)$, $m_v = \sum_{vv' \in E(T_v)} \beta(vv')$,
$\Delta_v = \deg_{G_B}(v)$ and $\boldsymbol{x}_v = \boldsymbol{f}(T_v[+\Delta_v])$.

Edge-components
For each base-edge $e = e_j = (u, v) \in E_B$, define the component at edge $e$ (or the $e$-component) $T_e$ of $G$ to be the chemical core-subtree of $G$ that consists of the core-path $P_j \in \mathcal{P}$ and all pendent-trees of $G$ rooted at internal vertices of path $P_j$. We regard $T_e$ as a bi-rooted c-tree with $r_1(T_e) = u$ and $r_2(T_e) = v$ for the base-edge $e = uv$ and define the code $\mathrm{code}_e(T_e)$ of $T_e$ to be a tuple $(a_{eu}, m_{eu}, a_{ev}, m_{ev}, \Delta_{eu}, \Delta_{ev}, \boldsymbol{x}_e)$ such that
$a_{eu} = \alpha(u)$, $a_{ev} = \alpha(v)$, $\Delta_{eu} = \deg_{G_B}(u) - 1$, $\Delta_{ev} = \deg_{G_B}(v) - 1$, $\boldsymbol{x}_e = \boldsymbol{f}(T_e[+\Delta_{eu}, \Delta_{ev}])$,
$m_{eu} = \beta(uu')$ and $m_{ev} = \beta(vv')$ for the edges $uu', vv' \in E(P_j)$ incident to $u$ and $v$.

Observe that
$$\boldsymbol{f}(G) = \sum_{v \in V_B} \boldsymbol{x}_v + \sum_{e \in E_B} \boldsymbol{x}_e.$$

We introduce a specification $\sigma_{\mathrm{ch}}$ as a set of functions $\mathrm{ch}_{\mathrm{LB}}, \mathrm{ch}_{\mathrm{UB}} : V_B \cup E_B \to \mathbb{Z}_+$.
We call two chemical graphs $(\mathcal{P}, \sigma_{\mathrm{ch}})$-isomorphic if they consist of vertex and edge components with the same codes and heights; i.e., two chemical $\rho$-lean cyclic graphs $G_i = (H_i, \alpha_i, \beta_i)$, $i = 1, 2$ are $(\mathcal{P}, \sigma_{\mathrm{ch}})$-isomorphic if the following hold:
- $\mathrm{Cr}(H_1)$ and $\mathrm{Cr}(H_2)$ are graph-isomorphic, where we assume that $\mathrm{Cr}(H_1) = \mathrm{Cr}(H_2) = (V^{\mathrm{co}}, E^{\mathrm{co}})$ and $G_B = (V_B, E_B)$ denotes the base-graph of both graphs $H_1$ and $H_2$ by $\mathcal{P}$;
- For the $v$-components $T^i_v$ of $G_i$, $i = 1, 2$, $v \in V_B$,
$\mathrm{code}_v(T^1_v) = \mathrm{code}_v(T^2_v)$ and $\mathrm{ht}(T^1_v), \mathrm{ht}(T^2_v) \in [\mathrm{ch}_{\mathrm{LB}}(v), \mathrm{ch}_{\mathrm{UB}}(v)]$;
35- For the e -components T ie of G i , i = 1 , e ∈ E B ,code e ( T e ) = code e ( T e ) and ht( T v ) , ht( T v ) ∈ [ch LB ( v ) , ch UB ( v )];See Section 2 for the definition of height ht( T ) of a bi-rooted tree T .The ( P , σ ch )-isomorphism also implies that fff ( G ) = fff ( G ), n ( G ) = n ( G ), cs( G ) = cs( G ),bl ρ ( G ) = bc ρ ( G ) = bl ρ ( G ) = bc ρ ( G ) and | ch( G ) − ch( G ) | ≤ max t ∈ V B ∪ E B (ch UB (t) − ch LB (t)). G † Now we assume that a chemical ρ -lean cyclic graph G † for a branch-parameter ρ ≥ ch( G † ) isavailable in such case where a target chemical graph G † is constructed by solving an MILP inStage 4. Let T † v (resp., T † e ) denote the v -component (resp., the e -component) of G † . Target v -components Let ht ∗ v denote the height h ( T † v ) of the v -component of G † . For eachbase-vertex v ∈ V B , fix a code ( a v , d v , m v , ∆ v , xxx ∗ v ) := code v ( T † v ) and call a rooted c-tree T a target v -component if code v ( T ) = ( a v , d v , m v , ∆ v , xxx ∗ v ) and ht( T ) ∈ [ch LB ( v ) , ch UB ( v )],where the condition on ht( T ) is equivalent to ht( T ) = ht ∗ v when xxx ∗ v ( bc ) = 1, since G † is a ρ -leancyclic graph and the set of ρ -internal edges in any target component forms a single path of lengthht ∗ v from the root to a unique leaf ρ -branch. Target e -components For each base-edge e = ( u, v ) = e j ∈ E B , fix a code ( a eu , m eu , a ev , m ev , ∆ eu , ∆ ev , xxx ∗ e ) := code e ( T † e ) and call a bi-rooted c-tree T a target e -component ifcode e ( T ) = ( a eu , m eu , a ev , m ev , ∆ eu , ∆ ev , xxx ∗ e ) and ht( T ) ∈ [ch LB ( e ) , ch UB ( e )].Let T e denote the set of all target components of a base-edge e ∈ E B .Given a collection of target v -components T v ∈ T v , v ∈ V B and target e -components T e ∈ T e , e ∈ E B , there is a chemical ρ -lean cyclic graph G ∗ that is ( P , σ ch )-isomorphic to the originalchemical graph G † . 
Such a graph G ∗ can be obtained from G B by replacing each base-edge e ∈ E B with T e and attaching T v at each base-vertex v ∈ V B .From this observation, our aim is now to generate some number of target v -components foreach base-vertex v and target e -components for each base-edge e . In the following, we denote a eu , a ev , ∆ eu , ∆ ev , m eu and m ev for each base-edge e = ( u, v ) ∈ E B by a e , a e , ∆ e , ∆ e , m e and m e ,respectively for a notational simplicity. For each base-edge e = e j ∈ E B , let δ e := ⌊ ( ℓ ( P j ) − / ⌋ and δ e := ⌈ ( ℓ ( P j ) − / ⌉ . H-cyclic arXiv v5: December 4, 2020 We start with describing a sketch of our new algorithm for generating graphs G ∗ in Stage 5 beforewe present some technical details of the algorithm in the following sections.We start with enumerating chemical rooted trees with height at most ρ , which can be a ρ -fringe-tree of a target component. Next we extend each of the rooted tree to an nc-tree T andthen to a c-tree T under a constraint that the frequency vector of T does not exceed a given vector xxx = xxx ∗ v , v ∈ V B or xxx = xxx ∗ e , e ∈ E B .For a vector xxx ∈ Z Σ+ , we formulate the following sets of nc-trees and c-trees and of theirfrequency vectors:(i) T (0)inl ( a , d, m ; xxx ), a ∈ Λ, d ∈ [0 , min { val( a ) , d max } − m ∈ [ d, val( a ) − T with a root r such that α ( r ) = a , deg T ( r ) = d , β T ( r ) = m , ht( T ) ≤ ρ and fff ( T [+2]) ≤ xxx ;Let W (0)inl ( a , d, m ; xxx ) denote the set of the frequency vectors = fff ( T [+2]) for all nc-trees T ∈ T (0)inl ( a , d, m ; xxx );(ii) T (0)end ( a , d, m ; xxx ), a ∈ Λ, d ∈ [1 , min { val( a ) , d max } − m ∈ [ d, val( a ) − T with a root r such that α ( r ) = a , deg T ( r ) = d , β T ( r ) = m , ht( T ) = ρ and fff ( T [+1]) ≤ xxx ;Let W (0)end ( a , d, m ; xxx ) denote the set of the frequency vectors = fff ( T [+1]) for all nc-trees T ∈ T (0)end ( a , d, m ; xxx );(iii) T ( h )end ( a , d, m ; xxx ), xxx = xxx ∗ t , a ∈ Λ, d ∈ [1 , min 
{ val( a ) , d max } − m ∈ [ d, val( a ) − h ∈ [1 , ch UB (t)]: the set of bi-rooted nc-trees T such that α ( r ( T )) = a , deg T ( r ( T )) = d , β ( r ( T )) = m , ℓ ( P T ) = h , fff ( T [+1]) ≤ xxx ,ht( T ′ ) ≤ ρ for all trees T ′ ∈ F ( T ) andht( T ′′ ) = ρ for the tree T ′′ ∈ F ( T ) rooted at terminal r ( T );Let W ( h )end ( a , d, m ) denote the set of all frequency vectors = fff ( T [+1]) for all bi-rootednc-trees T ∈ T ( h )end ( a , d, m );(iv) T (0)co+∆ ( a , d, m, h ; xxx ), xxx = xxx ∗ t , a ∈ Λ, ∆ ∈ [2 , d ∈ [0 , min { val( a ) , d max } − ∆], m ∈ [ d, val( a ) − ∆], h ∈ [0 , ch UB (t)]: the set of rooted c-trees T with a root r such that α ( r ) = a , deg T ( r ) = d , β T ( r ) = m , ht( T ) = h and fff ( T [+∆]) ≤ xxx ;Let W (0)co+∆ ( a , d, m, h ; xxx ) denote the set of the frequency vectors = fff ( T [+∆]) for all c-trees T ∈ T (0)co+∆ ( a , d, m, h ; xxx ); H-cyclic arXiv v5: December 4, 2020 T ( q )co+1 , ∆ ( a , d, m, b , , m ′ , h ; xxx ), xxx = xxx ∗ e , a , b ∈ Λ, ∆ ∈ [2 , d ∈ [1 , min { val( a ) , d max } − m ∈ [ d, val( a ) − m ′ ∈ [1 , val( b ) − ∆], h ∈ [0 , ch UB ( e )], q ∈ [1 , ℓ ( P e )]: the set of bi-rootedc-trees T such that α ( r ( T )) = a , deg T ( r ( T )) = 1, β ( r ( T )) = m , α ( r ( T )) = b , deg T ( r ( T )) = d , β ( r ( T )) = m ′ , ℓ ( P T ) = q , ht( T ) = h and fff ( T [+1 , ∆]) ≤ xxx ;Let W ( q )co+1 , ∆ ( a , d, m, b , , m ′ , h ; xxx ) denote the set of the frequency vectors = fff ( T [+1 , ∆])for all bi-rooted c-trees T ∈ T ( q )co+1 , ∆ ( a , d, m, b , , m ′ , h ; xxx ).Note that ( bc ) = 0 for any vector in the above set in (i)-(iii).Our algorithm consists of six steps. Step 1 computes the sets of trees and vectors in (i), (ii)and (iii) with h ≤ ρ , where each tree in these sets is of height at most ρ . Note that the frequencyvectors of some two trees in a tree set T in the above can be identical. 
In fact, the size |T | of aset T of trees can be considerably larger than that | W | of the set W of their frequency vectors.We mainly maintain a whole vector set W, and for each vector ∈ W, we store at least one tree T ∈ T such that is the frequency vector of a fictitious tree of T , where we call such a tree T a sample tree of the vector . With this idea, Steps 2-5 compute only vector sets W in (iii) with h > ρ , (iv) and (v). In each of these steps, we compute a set S of sample trees of the vectors ineach vector set W. The last step constructs at least one target component for each base-vertex orbase-edge, and then combines them to obtain a graph G ∗ to be inferred in Stage 5.We derive recursive formula that hold among the above sets. Based on this, we compute thevector sets in (iii) in Step 2, those in (iv) in Step 3 and those in (v) in Step 4. During these steps, wecan find a target v -component for each base-vertex v ∈ V B . For each base-edge e = ( u , u ) ∈ E B ,Step 5 compare vectors and , where i is the frequency vector of a c-tree T i that is extendedfrom the end-vertex u i , to examine whether T and T give rise to a target e -component. ρ -fringe-trees Step 1 computes the following sets in (i)-(iv).(i) For each base-vertex v ∈ V B such that ch LB ( v ) ≤ ρ and xxx ∗ v ( bc ) = 0, compute the set T (0)co+∆ v ( a v , d v , m v , h ; xxx ∗ v ), h ∈ [ch LB ( v ) , min { ρ, ch UB ( v ) } ] of rooted c-trees. Note that everyc-tree in the set T (0)co+∆ v ( a v , d v , m v , h ; xxx ∗ v ) with h ∈ [ch LB ( v ) , min { ρ, ch UB ( v ) } ] is a target v -component in T v ;Set S ( v ) be a subset of S h ∈ [ch LB ( v ) , min { ρ, ch UB ( v ) } ] T (0)co+∆ v ( a v , d v , m v , h ; xxx ∗ v );(ii) For each base-vertex v ∈ V B such that ρ < ch UB ( v ) and xxx ∗ v ( bc ) = 1 and integers m ∈ [ d v − , val( a v ) − ∆ v −
1] and p ∈ [0 , k ], compute the sets T (0)co+(∆ v +1) ( a v , d v − , m, p ; xxx ∗ v ) ofrooted c-trees and W (0)co+(∆ v +1) ( a v , d v − , m, p ; xxx ∗ v ) of their frequency vectors;For each vector ∈ W (0)co+(∆ v +1) ( a v , d v − , m, p ; xxx ∗ v ), choose some number of sample trees T ,and store them in a set S (0)co+(∆ v +1) ( a v , d v − , m, p ; xxx ∗ v ); H-cyclic arXiv v5: December 4, 2020 v ∈ V B such that ρ < ch UB ( v ) and xxx ∗ v ( bc ) = 1, each possible tuple( a , d, m ), compute the sets T (0)inl ( a , d, m ; xxx ∗ v ) and T (0)end ( a , d, m ; xxx ∗ v ) of rooted nc-trees and thesets W (0)inl ( a , d, m ; xxx ∗ v ) and W (0)end ( a , d, m ; xxx ∗ v ) of their frequency vectors;For each vector ∈ W (0)inl ( a , d, m ; xxx ∗ v ) (resp., ∈ W (0)end ( a , d, m ; xxx ∗ v )), choose some numberof sample trees T ∈ T (0)co+∆ ( a , d, m, h ; xxx ∗ e ), and store them in a set S (0)inl ( a , d, m ; xxx ∗ v ) (resp., ∈ W (0)end ( a , d, m ; xxx ∗ v ));(iv) For each base-edge e ∈ E B and each possible tuple ( a , d, m ), compute the sets T (0)inl ( a , d, m ; xxx ∗ e )and T (0)end ( a , d, m ; xxx ∗ e ) of rooted nc-trees and T (0)co+∆ ( a , d, m, h ; xxx ∗ e ), h ∈ [0 , min { ρ, ch UB ( e ) } ] ofrooted c-trees and the sets W (0)inl ( a , d, m ; xxx ∗ e ), W (0)end ( a , d, m ; xxx ∗ e ) and W (0)co+∆ ( a , d, m, h ; xxx ∗ e ) oftheir frequency vectors;For each vector ∈ W (0)inl ( a , d, m ; xxx ∗ e ) (resp., ∈ W (0)end ( a , d, m ; xxx ∗ e ) and ∈ W (0)co+∆ ( a , d, m, h ; xxx ∗ e )),choose some number of sample trees T and store them in a set S (0)inl ( a , d, m ; xxx ∗ e ) (resp., S (0)end ( a , d, m ; xxx ∗ e ) and S (0)co+∆ ( a , d, m, h ; xxx ∗ e )),To compute the above sets of trees and vectors, we enumerate all possible trees with height atmost 2 under the size constraint (1) by a branch-and-bound procedure. 
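The bookkeeping used in Step 1, where a vector set W is kept in full but only a few sample trees are stored per vector, can be sketched as follows. This is a minimal sketch under assumptions: `group_samples`, `freeze` and `max_samples` are hypothetical names, a tree is any Python object, and `freq` stands in for the map from a tree to the frequency vector of its fictitious tree.

```python
from collections import Counter, defaultdict


def freeze(vec):
    """Turn a sparse vector (dict entry -> count) into a hashable key."""
    return tuple(sorted((k, c) for k, c in vec.items() if c))


def group_samples(trees, freq, bound, max_samples=1):
    """Map each admissible frequency vector to at most max_samples trees.

    A tree is kept only if its vector w satisfies w <= bound entrywise,
    mirroring the condition f(T[+Delta]) <= x in the set definitions.
    """
    samples = defaultdict(list)
    for tree in trees:
        w = Counter(freq(tree))
        if any(c > bound.get(k, 0) for k, c in w.items()):
            continue  # some entry exceeds the target vector
        bucket = samples[freeze(w)]
        if len(bucket) < max_samples:
            bucket.append(tree)  # store a sample tree for this vector
    return dict(samples)
```

The keys of the returned dictionary play the role of W and the stored lists play the role of the sample sets S; many enumerated trees typically collapse onto one key, which is why |W| can be much smaller than the number of trees.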
For each base-vertex $t = v \in V_B$ or each base-edge $t = e \in E_B$ such that $\rho < \mathrm{ch}_{UB}(t)$ and each possible tuple $(a, d, m)$, Step 2 computes the set $W^{(h)}_{\mathrm{end}}(a, d, m; x^*_t)$ in the ascending order of $h = 1, 2, \ldots, \mathrm{ch}_{UB}(t) - \rho - 1$. Observe that each vector $w \in W^{(h)}_{\mathrm{end}}(a, d, m; x^*_t)$ is obtained as $w = w' + w'' + \mathbf{1}_{\gamma^{\mathrm{in}}}$ from a combination of vectors $w' \in W^{(0)}_{\mathrm{inl}}(a, d - 1, m'; x^*_t)$ and $w'' \in W^{(h-1)}_{\mathrm{end}}(b, d'', m''; x^*_t)$ such that

$m' \le \mathrm{val}(a) - 2$, $1 \le m - m' \le \mathrm{val}(b) - m''$,
$w' + w'' + \mathbf{1}_{\gamma^{\mathrm{in}}} \le x^*_t$ for $\gamma = (a\{d+1\}, b\{d''+1\}, m - m') \in \Gamma$.

Figure 14(a) illustrates this process of computing a vector $w \in W^{(h)}_{\mathrm{end}}(a, d, m; x^*_t)$.

For each vector $w \in W^{(h)}_{\mathrm{end}}(a, d, m; x^*_t)$ obtained from a combination $w' \in W^{(0)}_{\mathrm{inl}}(a, d - 1, m'; x^*_t)$ and $w'' \in W^{(h-1)}_{\mathrm{end}}(b, d'', m''; x^*_t)$, we construct at least one sample nc-tree $T$ from their sample nc-trees $T' \in \mathcal{S}^{(0)}_{\mathrm{inl}}(a, d - 1, m'; x^*_t)$ and $T'' \in \mathcal{S}^{(h-1)}_{\mathrm{end}}(b, d'', m''; x^*_t)$ and store them in a set $\mathcal{S}^{(h)}_{\mathrm{end}}(a, d, m; x^*_t)$.

For each base-vertex $t = v \in V_B$ or each base-edge $t = e \in E_B$ such that $\rho < \mathrm{ch}_{UB}(t)$ and each possible tuple $(a, d, m, h)$ with $h \in [\rho + 1, \mathrm{ch}_{UB}(t)]$, Step 3 computes the set $W^{(0)}_{\mathrm{co}+\Delta}(a, d, m, h; x^*_t)$. Observe that each vector $w \in W^{(0)}_{\mathrm{co}+\Delta}(a, d, m, h; x^*_t)$ is obtained as $w = w' + w'' + \mathbf{1}_{\gamma^{\mathrm{in}}} + \mathbf{1}_{\mathrm{bc}}$ from a combination of vectors $w' \in W^{(0)}_{\mathrm{co}+(\Delta+1)}(a, d - 1, m', p; x^*_t)$, $p \in [0, k]$ and $w'' \in W^{(h-\rho-1)}_{\mathrm{end}}(b, d'', m''; x^*_t)$ such that

$m' \le \mathrm{val}(a) - \Delta - 1$, $1 \le m - m' \le \mathrm{val}(b) - m''$,
$w' + w'' + \mathbf{1}_{\gamma^{\mathrm{in}}} + \mathbf{1}_{\mathrm{bc}} \le x^*_t$ for $\gamma = (a\{d+\Delta\}, b\{d''+1\}, m - m') \in \Gamma$,

where $w'(\mathrm{bc}) = w''(\mathrm{bc}) = 0$. Figure 14(b) illustrates this process of computing a vector $w \in W^{(0)}_{\mathrm{co}+\Delta}(a, d, m, h; x^*_t)$.
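The combination rule shared by Steps 2 and 3 can be sketched in Python. This is only a minimal illustration under simplifying assumptions, not the authors' implementation: frequency vectors are plain integer tuples, each vector keeps a single sample (here a pair of labels recording how it was built), and the names `combine`, `unit` and `x_star` are hypothetical. The valence conditions on $m'$ and $m''$ and the dependence of $\gamma$ on the tuple are omitted; only the sum $w = w' + w'' + \mathbf{1}_\gamma$ and the component-wise test $w \le x^*_t$ are shown.

```python
def combine(W1, W2, unit, x_star):
    """Combine two vector sets as w = w1 + w2 + unit, keeping only w <= x_star.

    W1, W2 : dict mapping a frequency vector (tuple of ints) to one sample tree.
    unit   : unit vector of the newly created edge-configuration (assumed given).
    x_star : target vector that bounds every component from above.
    Returns a dict mapping each feasible vector to one sample built from its parts.
    """
    out = {}
    for w1, t1 in W1.items():
        for w2, t2 in W2.items():
            w = tuple(a + b + c for a, b, c in zip(w1, w2, unit))
            # Discard w if some component exceeds the target vector.
            if any(wi > xi for wi, xi in zip(w, x_star)):
                continue
            # Store only the first sample found for each vector.
            out.setdefault(w, (t1, t2))
    return out


# Toy data: three-dimensional vectors with string labels as sample trees.
W_inl = {(1, 0, 0): "T1a", (0, 1, 0): "T1b"}
W_end = {(0, 0, 1): "T2a", (1, 1, 0): "T2b"}
W_new = combine(W_inl, W_end, unit=(0, 0, 1), x_star=(1, 1, 2))
```

Storing one sample per vector, rather than every tree, reflects the idea stated earlier that the vector set is usually much smaller than the tree set.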
Figure 14: (a) An illustration of computing a vector $w \in W^{(h)}_{\mathrm{end}}(a, d, m; x^*_t)$ from the frequency vectors $w' = f(T'[+2]) \in W^{(0)}_{\mathrm{inl}}(a, d - 1, m'; x^*_t)$ of a bi-rooted nc-tree $T'$ and $w'' = f(T''[+1]) \in W^{(h-1)}_{\mathrm{end}}(b, d'', m''; x^*_t)$ of an nc-tree $T''$; (b) An illustration of computing a vector $w \in W^{(0)}_{\mathrm{co}+\Delta}(a, d, m, h; x^*_t)$ from the frequency vectors $w' = f(T'[+(\Delta+1)]) \in W^{(0)}_{\mathrm{co}+(\Delta+1)}(a, d - 1, m', p; x^*_t)$, $p \in [0, \rho]$ of a c-tree $T'$ and $w'' = f(T''[+1]) \in W^{(h-\rho-1)}_{\mathrm{end}}(b, d'', m''; x^*_t)$.

For each vector $w \in W^{(0)}_{\mathrm{co}+\Delta}(a, d, m, h; x^*_t)$ obtained from a combination of vectors $w' \in \bigcup_{p \in [0,k]} W^{(0)}_{\mathrm{co}+(\Delta+1)}(a, d - 1, m', p; x^*_t)$ and $w'' \in W^{(h-\rho-1)}_{\mathrm{end}}(b, d'', m''; x^*_t)$, we construct at least one sample c-tree $T$ from their sample trees $T' \in \bigcup_{p \in [0,k]} \mathcal{S}^{(0)}_{\mathrm{co}+(\Delta+1)}(a, d - 1, m', p; x^*_t)$ and $T'' \in \mathcal{S}^{(h-\rho-1)}_{\mathrm{end}}(b, d'', m''; x^*_t)$ and store them in a set $\mathcal{S}^{(0)}_{\mathrm{co}+\Delta}(a, d, m, h; x^*_t)$.

For each base-vertex $v \in V_B$ with $\rho < \mathrm{ch}_{UB}(v)$ and $x^*_v(\mathrm{bc}) = 1$, all sample c-trees $T \in \mathcal{S}^{(0)}_{\mathrm{co}+\Delta_v}(a_v, d_v, m_v, \mathrm{ht}^*_v; x^*_v)$ are target $v$-components in $\mathcal{T}_v$, and we set $\mathcal{S}(v) := \mathcal{S}^{(0)}_{\mathrm{co}+\Delta_v}(a_v, d_v, m_v, \mathrm{ht}^*_v; x^*_v)$.

For each base-edge $e \in E_B$, each index $i = 1, 2$ and each possible tuple $(a, d, m, h)$ with $h \in [\mathrm{ch}_{LB}(e), \mathrm{ch}_{UB}(e)]$, Step 4 computes the set $W^{(q)}_{\mathrm{co}+1,\Delta^e_i}(a, d, m, a^e_i, d^e_i, m^e_i, h; x^*_e)$ in the ascending order $q = 1, 2, \ldots, \delta^e_i$.
Observe that each vector $w \in W^{(q)}_{\mathrm{co}+1,\Delta^e_i}(a, d, m, a^e_i, d^e_i, m^e_i, h; x^*_e)$ is obtained as $w = w' + w'' + \mathbf{1}_{\gamma^{\mathrm{co}}}$ from a combination of vectors $w' \in W^{(q-1)}_{\mathrm{co}+1,\Delta^e_i}(b, d', m', a^e_i, d^e_i, m^e_i, h'; x^*_e)$ and $w'' \in W^{(0)}_{\mathrm{co}+2}(a, d - 1, m'', h''; x^*_e)$ such that

$h = \max\{h', h''\} \in [\mathrm{ch}_{LB}(e), \mathrm{ch}_{UB}(e)]$,
$m'' \le \mathrm{val}(a) - 2$, $1 \le m - m'' \le \mathrm{val}(b) - m'$,
$w' + w'' + \mathbf{1}_{\gamma^{\mathrm{co}}} \le x^*_e$ for $\gamma = (a\{d\}, b\{d'+1\}, m - m'') \in \Gamma_{\le}$.

Figure 15(a) illustrates this process of computing a vector $w \in W^{(q)}_{\mathrm{co}+1,\Delta^e_i}(a, d, m, a^e_i, d^e_i, m^e_i, h; x^*_e)$.

For each vector $w \in W^{(q)}_{\mathrm{co}+1,\Delta^e_i}(a, d, m, a^e_i, d^e_i, m^e_i, h; x^*_e)$ obtained from a combination $w' \in W^{(q-1)}_{\mathrm{co}+1,\Delta^e_i}(b, d', m', a^e_i, d^e_i, m^e_i, h'; x^*_e)$ and $w'' \in W^{(0)}_{\mathrm{co}+2}(a, d - 1, m'', h''; x^*_e)$, we construct at least one sample c-tree $T$ from their sample c-trees $T' \in \mathcal{S}^{(q-1)}_{\mathrm{co}+1,\Delta^e_i}(b, d', m', a^e_i, d^e_i, m^e_i, h'; x^*_e)$ and $T'' \in \mathcal{S}^{(0)}_{\mathrm{co}+2}(a, d - 1, m'', h''; x^*_e)$ and store them in a set $\mathcal{S}^{(q)}_{\mathrm{co}+1,\Delta^e_i}(a, d, m, a^e_i, d^e_i, m^e_i, h; x^*_e)$.

Figure 15: (a) An illustration of computing a vector $w \in W^{(q)}_{\mathrm{co}+1,\Delta^e_i}(a, d, m, a^e_i, d^e_i, m^e_i, h; x^*_e)$ for a base-edge $e \in E_B$ from the frequency vectors $w' = f(T'[+1, \Delta^e_i]) \in W^{(q-1)}_{\mathrm{co}+1,\Delta^e_i}(b, d', m', a^e_i, d^e_i, m^e_i, h'; x^*_e)$ of a c-tree $T'$ and $w'' = f(T''[+2]) \in W^{(0)}_{\mathrm{co}+2}(a, d - 1, m'', h''; x^*_e)$ of a c-tree $T''$; (b) An illustration of computing a feasible vector pair $(w_1, w_2)$ with $w_i = f(T_i[+1, \Delta^e_i]) \in W^{(\delta^e_i)}_{\mathrm{co}+1,\Delta^e_i}(a_i, d_i, m_i, a^e_i, d^e_i, m^e_i, h_i; x^*_e)$ of c-trees $T_i$, $i = 1, 2$, for a base-edge $e \in E_B$.
For each edge $e \in E_B$, a feasible vector pair is defined to be a pair of vectors $w_i \in W^{(\delta^e_i)}_{\mathrm{co}+1,\Delta^e_i}(a_i, d_i, m_i, a^e_i, d^e_i, m^e_i, h_i; x^*_e)$, $i = 1, 2$, such that $\max\{h_1, h_2\} \in [\mathrm{ch}_{LB}(e), \mathrm{ch}_{UB}(e)]$ and

$x^*_e = w_1 + w_2 + \mathbf{1}_{\gamma}$

for an edge-configuration $\gamma = (a_1\{d_1+1\}, a_2\{d_2+1\}, m) \in \Gamma_{\le}$ with an integer $m \in [1, \min\{3, \mathrm{val}(a_1) - m_1, \mathrm{val}(a_2) - m_2\}]$. The second equality is equivalent to the condition that $w_1$ is equal to the vector $x^*_e - w_2 - \mathbf{1}_{\gamma}$, which we call the $\gamma$-complement of $w_2$ and denote by $\overline{w}_2$. Figure 15(b) illustrates this process of computing a feasible vector pair $(w_1, w_2)$.

For each edge $e \in E_B$, Step 5 enumerates the set $W_{\mathrm{pair}}(e)$ of all feasible vector pairs $(w_1, w_2)$. To efficiently search for a feasible pair of vectors in the two sets $W^{(\delta^e_i)}_{\mathrm{co}+1,\Delta^e_i}(a_i, d_i, m_i, a^e_i, d^e_i, m^e_i, h_i; x^*_e)$, $i = 1, 2$, with $\max\{h_1, h_2\} \in [\mathrm{ch}_{LB}(e), \mathrm{ch}_{UB}(e)]$, we first compute the $\gamma$-complement vector $\overline{w}_2$ of each vector $w_2 \in W^{(\delta^e_2)}_{\mathrm{co}+1,\Delta^e_2}(a_2, d_2, m_2, a^e_2, d^e_2, m^e_2, h_2)$ for each edge-configuration $\gamma = (a_1\{d_1+1\}, a_2\{d_2+1\}, m) \in \Gamma$ with $m \in [1, \min\{3, \mathrm{val}(a_1) - m_1, \mathrm{val}(a_2) - m_2\}]$, and denote by $\overline{W}^{(\delta_2)}_{\mathrm{co}}$ the set of the resulting $\gamma$-complement vectors. Observe that $(w_1, w_2)$ is a feasible vector pair if and only if $w_1 = \overline{w}_2$. To find such pairs, we merge the sets $W^{(\delta^e_1)}_{\mathrm{co}+1,\Delta^e_1}(a_1, d_1, m_1, a^e_1, d^e_1, m^e_1, h_1; x^*_e)$ and $\overline{W}^{(\delta_2)}_{\mathrm{co}}$ into a sorted list $L_\gamma$. Then each feasible vector pair $(w_1, w_2)$ appears as a consecutive pair of vectors $w_1$ and $\overline{w}_2$ in the list $L_\gamma$.

The task of Step 6 is to construct, for each feasible vector pair $(w_1, w_2) \in W_{\mathrm{pair}}(e)$, at least one target $e$-component $T_{(w_1, w_2)} \in \mathcal{T}_e$ by combining the sample c-trees $T_i = T_{w_i} \in \mathcal{S}^{(\delta^e_i)}_{\mathrm{co}+1,\Delta^e_i}(a_i, d_i, m_i, a^e_i, d^e_i, m^e_i, h_i; x^*_e)$, $i = 1, 2$, joined by an edge $r(T_1) r(T_2)$ with a bond-multiplicity $m$, and to store these target $e$-components $T_{(w_1, w_2)}$ in a set $\mathcal{S}(e)$.
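The sorted-list search for feasible vector pairs in Step 5 can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: vectors are integer tuples, a single edge-configuration is fixed and represented by a hypothetical unit vector `unit`, and the function name `feasible_pairs` is invented for this example. Each vector of the second set is replaced by its complement `x_star - w2 - unit`, the two collections are merged and sorted, and equal keys that end up adjacent in the list yield the feasible pairs.

```python
def feasible_pairs(W1, W2, unit, x_star):
    """Return all (w1, w2) with w1 + w2 + unit == x_star via one sorted merge."""
    # Tag 0: a vector of W1 stored under its own value.
    tagged = [(w1, 0, w1) for w1 in W1]
    # Tag 1: a vector of W2 stored under its complement x_star - w2 - unit.
    for w2 in W2:
        comp = tuple(x - b - u for x, b, u in zip(x_star, w2, unit))
        tagged.append((comp, 1, w2))
    tagged.sort()

    pairs = []
    i = 0
    while i < len(tagged):
        # Collect the run of entries that share the same key.
        j = i
        while j < len(tagged) and tagged[j][0] == tagged[i][0]:
            j += 1
        run = tagged[i:j]
        left = [orig for _, tag, orig in run if tag == 0]
        right = [orig for _, tag, orig in run if tag == 1]
        pairs.extend((a, b) for a in left for b in right)
        i = j
    return pairs


x_star = (2, 1, 1)
unit = (0, 0, 1)
W1 = [(1, 0, 0), (1, 1, 0), (0, 0, 0)]
W2 = [(1, 1, 0), (1, 0, 0), (2, 2, 0)]
pairs = feasible_pairs(W1, W2, unit, x_star)
```

Sorting costs $O(N \log N)$ vector comparisons for $N = |W_1| + |W_2|$, which avoids testing all $|W_1| \cdot |W_2|$ combinations directly.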
Figure 15(b) illustrates two sample c-trees $T_i$, $i = 1, 2$, joined by an edge $e = r(T_1) r(T_2)$.

For each base-vertex $v \in V_B$ and each base-edge $e \in E_B$, a set $\mathcal{S}(v)$ of target $v$-components and a set $\mathcal{S}(e)$ of target $e$-components have been constructed. Let $\mathcal{C}$ be a collection obtained by choosing a target $v$-component $T_v \in \mathcal{S}(v)$ for each base-vertex $v \in V_B$ and a target $e$-component $T_e \in \mathcal{S}(e)$ for each base-edge $e \in E_B$. Then a chemical graph $G^*(\mathcal{C})$ obtained by assembling these components is a target chemical graph to be inferred in Stage 5. The number of chemical graphs $G^*(\mathcal{C})$ obtained in this manner is

$\prod_{v \in V_B} |\mathcal{S}(v)| \times \prod_{e \in E_B} |\mathcal{S}(e)|$,

where we ignore a possible automorphism over the resulting graphs.

For a base-edge $e \in E_B$ with a relatively large instance size $\delta_e$, the number $|W_{\mathrm{pair}}(e)|$ of feasible vector pairs in Step 5 can still be very large. In fact, the size $|W|$ of a vector set $W$ to be computed in Steps 2 to 4 can also be considerably large during an execution of the algorithm. For such a case, we impose a time limit on the running time for computing $W$ and a memory limit on the number of vectors stored in a vector set $W$. With these limits, we can compute only a limited subset $\widehat{W}$ of each vector set $W$ in Steps 2 to 4. Even with such a subset $\widehat{W}$, we can still find a large subset $\widehat{W}_{\mathrm{pair}}(e)$ of $W_{\mathrm{pair}}(e)$ in Step 5.

Our algorithm can also deliver a lower bound on the number $|\mathcal{T}_t(x^*_t)|$, $t \in V_B \cup E_B$ of all target components in the following way. In Step 1, we also compute the number $t(w)$ of rooted trees $T \in \mathcal{T}^{(0)}$ in (i)-(iii). In Steps 2, 3 and 4, when a vector $w$ is constructed from two vectors $w'$ and $w''$, we iteratively compute the number $t(w)$ of all trees $T$ such that $w$ is the frequency vector of a fictitious tree of $T$ by $t(w) := t(w') \times t(w'')$. In Step 5, when a feasible vector pair $(w_1, w_2) \in W_{\mathrm{pair}}(e)$ is obtained for a base-edge $e \in E_B$, we know that the number of the corresponding target $e$-components is $t(w_1) \times t(w_2)$.
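The two counting rules just described are simple products, and a small Python sketch may make them concrete. The function names and the toy data below are hypothetical and serve only as an illustration: `num_assembled_graphs` evaluates the product of the sizes of the component sets, and `tree_count` applies the rule $t(w) := t(w') \times t(w'')$.

```python
from math import prod


def num_assembled_graphs(S_v, S_e):
    """Product of |S(v)| over base-vertices times product of |S(e)| over base-edges."""
    return prod(len(c) for c in S_v.values()) * prod(len(c) for c in S_e.values())


def tree_count(t_left, t_right):
    """Count for a vector built from w' and w'': t(w) = t(w') * t(w'')."""
    return t_left * t_right


# Toy instance: two base-vertices and one base-edge with labelled components.
S_v = {"v1": ["Ta", "Tb"], "v2": ["Tc"]}
S_e = {"e1": ["Td", "Te", "Tf"]}
total = num_assembled_graphs(S_v, S_e)

# A vector whose two parts are realized by 4 and 5 trees, respectively.
t_w = tree_count(4, 5)
```

As in the text, the first count ignores automorphisms, so distinct choices of components may still produce isomorphic graphs.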
Possibly we compute only a subset $\widehat{W}_{\mathrm{pair}}(e)$ of $W_{\mathrm{pair}}(e)$ in Step 5. Then $\frac{1}{2} \sum_{(w_1, w_2) \in \widehat{W}_{\mathrm{pair}}(e)} t(w_1) \times t(w_2)$ gives a lower bound on the number $|\mathcal{T}_e|$ of target $e$-components, where we divide by 2 since an axially symmetric target $e$-component can correspond to two vector pairs in $W_{\mathrm{pair}}(e)$. A lower bound on the number $|\mathcal{T}_v|$ of target $v$-components for a base-vertex $v \in V_B$ can be obtained in a similar way.

This section describes how to apply our new algorithm for generating chemical isomers in Stage 4 after we obtain a chemical $\rho$-lean cyclic graph $G^\dagger$.

Let $(\sigma_{\mathrm{co}}, \sigma_{\mathrm{nc}}, \sigma_{\alpha\beta})$ be a specification of substructures and $G^\dagger \in \mathcal{G}(\sigma_{\mathrm{co}}, \sigma_{\mathrm{nc}}, \sigma_{\alpha\beta})$ be a $(\sigma_{\mathrm{co}}, \sigma_{\mathrm{nc}}, \sigma_{\alpha\beta})$-extension, where we assume that the minimum $\sigma_{\mathrm{co}}$-extension $C_{\min} \in \mathcal{C}(\sigma_{\mathrm{co}})$ is a simple connected graph with minimum degree at least 2. Let $C^\dagger = (V^{\mathrm{co}}, E^{\mathrm{co}})$ denote the core $\mathrm{Cr}(G^\dagger)$ of $G^\dagger$ and $E$ denote the set of edges $e \in E_{(0/1)}$ that are removed in the construction of $C^\dagger$ from the seed graph $G_C$.

To generate chemical $\rho$-lean cyclic graphs $G^* \in \mathcal{G}(\sigma_{\mathrm{co}}, \sigma_{\mathrm{nc}}, \sigma_{\alpha\beta})$ by our new algorithm, we first choose a path-partition $\mathcal{P} = \{P_1, P_2, \ldots, P_p\}$ of the core $C^\dagger$. Recall that the base-graph $G_B = (V_B, E_B = \{e_1, e_2, \ldots, e_p\})$ is determined by the partition $\mathcal{P}$ so that each edge $e_j \in E_B$ directly joins the end-vertices of the path $P_j \in \mathcal{P}$ and $V_B$ is the set of end-vertices of paths in $\mathcal{P}$. We choose a path-partition $\mathcal{P} = \{P_1, P_2, \ldots, P_p\}$ so that the following condition is satisfied:

$V_B = V_C$, $E_B = E_C \setminus E$;

i.e., each edge $e_i \in E_B$ corresponds to an edge $a_i \in E_C \setminus E$.

We next set a specification $\sigma_{\mathrm{ch}}$ to be a set of the lower and upper bound functions $\mathrm{ch}_{LB}, \mathrm{ch}_{UB} : V_C \cup E_C \to \mathbb{Z}_+$ for the set $V_B = V_C$ of vertices and the set $E_B = E_C \setminus E$ of edges.
Observe that any $(\mathcal{P}, \sigma_{\mathrm{ch}})$-isomer $G^*$ of $G^\dagger$ is a $(\sigma_{\mathrm{co}}, \sigma_{\mathrm{nc}}, \sigma_{\alpha\beta})$-extension, since the assignment of elements of $\Lambda$ to the base-vertices in $V_B$ remains unchanged among all $(\mathcal{P}, \sigma_{\mathrm{ch}})$-isomers of $G^\dagger$. The converse is not true in general; i.e., there may be a $(\sigma_{\mathrm{co}}, \sigma_{\mathrm{nc}}, \sigma_{\alpha\beta})$-extension $G$ that is not a $(\mathcal{P}, \sigma_{\mathrm{ch}})$-isomer of $G^\dagger$. In Stage 5, we run our new algorithm for generating $(\mathcal{P}, \sigma_{\mathrm{ch}})$-isomers $G^*$ of $G^\dagger$.

We remark that when the lower and upper bound functions $\mathrm{ch}_{LB}$ and $\mathrm{ch}_{UB}$ in a core specification $\sigma_{\mathrm{co}}$ are uniform over all vertices or edges, we can choose a path-partition $\mathcal{P}$ in a more flexible way. In this case, we can apply our algorithm if a path-partition $\mathcal{P}$ satisfies

$V^*_C \cup V^{\mathrm{co}}_3 \subseteq V_B$

for the set $V^{\mathrm{co}}_3$ of core-vertices $v \in V^{\mathrm{co}}$ of degree at least 3, i.e., $\deg_{C^\dagger}(v) \ge 3$. We can choose a path in $\mathcal{P}$ so that it ends with a core-vertex $v \in V^{\mathrm{co}} \setminus (V^*_C \cup V^{\mathrm{co}}_3)$ of degree 2, i.e., $\deg_{C^\dagger}(v) = 2$.

When a given chemical graph $G^\dagger$ is not a $\rho$-lean cyclic graph, we can extend the definition of the chemical graph isomorphism in a flexible way. Suppose that $G^\dagger$ is an acyclic graph. We first choose a vertex $r$ as the root of the tree $G^\dagger$, where $r$ is not necessarily a graph-theoretically designated vertex such as a center or a centroid. For a branch-parameter $\rho$ such as $\rho = 2$, find the set $V^{\mathrm{br}}$ of all $\rho$-branches of the tree. We set $V_B := V^{\mathrm{br}} \cup \{r\}$ to be a set of base-vertices and let $\mathcal{P} = \{P_1, \ldots, P_p\}$ denote the collection of paths whose end-vertices are base-vertices in $V_B$ and which have no internal vertices from $V_B$. For each path $P_{u,v} \in \mathcal{P}$ between two base-vertices $u, v \in V_B$, prepare a base-edge $e = uv$ and let $E_B$ denote the set of the resulting base-edges. Then the base-graph $G_B = (V_B, E_B)$ in this case is a tree. Based on $G_B$, we can define the vertex and edge components of $G^\dagger$ in a similar way, and can generate target $v$-components $T_v$, $v \in V_B$ and target $e$-components $T_e$, $e \in E_B$, each of them independently, by our algorithm with a slight modification or by the algorithm for trees $T$ with $\mathrm{bl}_\rho(T) = 2$ due to Azam et al. [4].

Now consider the case where $G^\dagger$ is cyclic but not $\rho$-lean. In this case, some tree $T$ rooted at a core-vertex may contain more than one leaf $\rho$-branch. Let $\mathcal{T}^{\mathrm{br}}$ denote the set of all these rooted trees, and let $V^{\mathrm{br}}$ denote the set of all $\rho$-branches of the trees in $\mathcal{T}^{\mathrm{br}}$. Let $V^{\mathrm{co}}_{\mathrm{br}}$ denote the set of core-vertices $v$ at which some tree $T_v \in \mathcal{T}^{\mathrm{br}}$ is rooted. We find a path-partition $\mathcal{P}^{\mathrm{co}}$ of the core $\mathrm{Cr}(G^\dagger)$ so that each vertex in $V^{\mathrm{co}}_{\mathrm{br}}$ is used as an end-vertex of some path $P \in \mathcal{P}^{\mathrm{co}}$. For the trees in $\mathcal{T}^{\mathrm{br}}$, we find a path-partition $\mathcal{P}^{\mathrm{nc}}$ with paths between two $\rho$-branches in $V^{\mathrm{br}}$ in a way analogous to the above tree case.
Finally set $\mathcal{P} := \mathcal{P}^{\mathrm{co}} \cup \mathcal{P}^{\mathrm{nc}}$ and define the base-graph $G_B = (V_B, E_B)$ based on $\mathcal{P}$. We see that target $v$-components $T_v$, $v \in V_B$ and target $e$-components $T_e$, $e \in E_B$ can be generated by our algorithm with a slight modification.

In this paper, we employed the new mechanism of utilizing a target chemical graph $G^\dagger$ obtained in Stage 4 of the framework for inverse QSAR/QSPR to generate a larger number of target graphs $G^*$ in Stage 5. We showed that a family of graphs $G^*$ that are chemically isomorphic to $G^\dagger$ can be obtained by the dynamic programming algorithm designed in Section 6. Based on the new mechanism of Stage 5, we proposed a target specification on a seed graph as a flexible way of specifying a family of target chemical graphs. With this specification, we can realize requirements on partial topological substructure of the core of graphs and partial assignment of chemical elements and bond-multiplicity within the framework for inverse QSAR/QSPR by ANNs and MILPs.

The current topological specification proposed in this paper does not allow fixing part of the non-core structure of a graph. We remark that it is not technically difficult to extend the MILP formulation in Section 5 and the algorithm for computing chemical isomers in Section 6 so that a more general specification for such a case can be handled.

References

[1] T. Akutsu, D. Fukagawa, J. Jansson, and K. Sadakane, “Inferring a graph from path frequency,”
Discrete Applied Mathematics, vol. 160, no. 10-11, pp. 1416–1428, 2012.
[2] T. Akutsu and H. Nagamochi, “A mixed integer linear programming formulation to artificial neural networks,” Proceedings of the 2nd International Conference on Information Science and Systems, March 2019, pp. 215–220.
[3] N. A. Azam, R. Chiewvanichakorn, F. Zhang, A. Shurbevski, H. Nagamochi, and T. Akutsu, “A method for the inverse QSAR/QSPR based on artificial neural networks and mixed integer linear programming,” BIOINFORMATICS2020, Malta, February 2020, pp. 101–108.
[4] N. A. Azam, J. Zhu, Y. Sun, Y. Shi, A. Shurbevski, L. Zhao, H. Nagamochi, and T. Akutsu, “A novel method for inference of acyclic chemical compounds with bounded branch-height based on artificial neural networks and integer programming,” arXiv:2009.09646.
[5] R. S. Bohacek, C. McMartin, and W. C. Guida, “The art and practice of structure-based drug design: A molecular modeling perspective,” Medicinal Research Reviews, vol. 16, no. 1, pp. 3–50, 1996.
[6] R. Chiewvanichakorn, C. Wang, Z. Zhang, A. Shurbevski, H. Nagamochi, and T. Akutsu, “A method for the inverse QSAR/QSPR based on artificial neural networks and mixed integer linear programming,” Proceedings of the 2020 10th International Conference on Bioscience, Biochemistry and Bioinformatics (ICBBB’20), Association for Computing Machinery, New York, NY, USA, pp. 40–46, 2020.
[7] N. De Cao and T. Kipf, “MolGAN: An implicit generative model for small molecular graphs,” arXiv:1805.11973, 2018.
[8] H. Fujiwara, J. Wang, L. Zhao, H. Nagamochi, and T. Akutsu, “Enumerating treelike chemical graphs with given path frequency,” Journal of Chemical Information and Modeling, vol. 48, no. 7, pp. 1345–1357, 2008.
[9] R. Gómez-Bombarelli, J. N. Wei, D. Duvenaud, J. M. Hernández-Lobato, B. Sánchez-Lengeling, D. Sheberla, J. Aguilera-Iparraguirre, T. D. Hirzel, R. P. Adams, and A. Aspuru-Guzik, “Automatic chemical design using a data-driven continuous representation of molecules,” ACS Central Science, vol. 4, no. 2, pp. 268–276, 2018.
[10] H. Ikebata, K. Hongo, T. Isomura, R. Maezono, and R. Yoshida, “Bayesian molecular design with a chemical language model,” Journal of Computer-aided Molecular Design, vol. 31, no. 4, pp. 379–391, 2017.
[11] R. Ito, N. A. Azam, C. Wang, A. Shurbevski, H. Nagamochi, and T. Akutsu, “A novel method for the inverse QSAR/QSPR to monocyclic chemical compounds based on artificial neural networks and integer programming,” BIOCOMP2020, Las Vegas, Nevada, USA, 27-30 July 2020.
[12] A. Kerber, R. Laue, T. Grüner, and M. Meringer, “MOLGEN 4.0,” Match Communications in Mathematical and in Computer Chemistry, no. 37, pp. 205–208, 1998.
[13] M. J. Kusner, B. Paige, and J. M. Hernández-Lobato, “Grammar variational autoencoder,” Proceedings of the 34th International Conference on Machine Learning-Volume 70, 2017, pp. 1945–1954.
[14] J. Li, H. Nagamochi, and T. Akutsu, “Enumerating substituted benzene isomers of tree-like chemical graphs,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 15, no. 2, pp. 633–646, 2016.
[15] K. Madhawa, K. Ishiguro, K. Nakago, and M. Abe, “GraphNVP: an invertible flow model for generating molecular graphs,” arXiv:1905.11600, 2019.
[16] T. Miyao, H. Kaneko, and K. Funatsu, “Inverse QSPR/QSAR analysis for chemical structure generation (from y to x),” Journal of Chemical Information and Modeling, vol. 56, no. 2, pp. 286–299, 2016.
[17] H. Nagamochi, “A detachment algorithm for inferring a graph from path frequency,” Algorithmica, vol. 53, no. 2, pp. 207–224, 2009.
[18] T. I. Netzeva et al., “Current status of methods for defining the applicability domain of (quantitative) structure-activity relationships: The report and recommendations of ECVAM workshop 52,” Alternatives to Laboratory Animals, vol. 33, no. 2, pp. 155–173, 2005.
[19] Accounts of Chemical Research, vol. 48, no. 3, pp. 722–730, 2015.
[20] C. Rupakheti, A. Virshup, W. Yang, and D. N. Beratan, “Strategy to discover diverse optimal molecules in the small molecule universe,” Journal of Chemical Information and Modeling, vol. 55, no. 3, pp. 529–537, 2015.
[21] M. H. S. Segler, T. Kogej, C. Tyrchan, and M. P. Waller, “Generating focused molecule libraries for drug discovery with recurrent neural networks,” ACS Central Science, vol. 4, no. 1, pp. 120–131, 2017.
[22] C. Shi, M. Xu, Z. Zhu, W. Zhang, M. Zhang, and J. Tang, “GraphAF: a flow-based autoregressive model for molecular graph generation,” arXiv:2001.09382, 2020.
[23] M. I. Skvortsova, I. I. Baskin, O. L. Slovokhotova, V. A. Palyulin, and N. S. Zefirov, “Inverse problem in QSAR/QSPR studies for the case of topological indices characterizing molecular shape (Kier indices),” Journal of Chemical Information and Computer Sciences, vol. 33, no. 4, pp. 630–634, 1993.
[24] M. Suzuki, H. Nagamochi, and T. Akutsu, “Efficient enumeration of monocyclic chemical graphs with given path frequencies,” Journal of Cheminformatics, vol. 6, no. 1, p. 31, 2014.
[25] Y. Tezuka and H. Oike, “Topological polymer chemistry,” Progress in Polymer Science, pp. 1069–1122.
[26] Y. Tamura, Y. Y. Nishiyama, C. Wang, Y. Sun, A. Shurbevski, H. Nagamochi, and T. Akutsu, “Enumerating chemical graphs with mono-block 2-augmented tree structure from given upper and lower bounds on path frequencies,” arXiv:2004.06367, 2020.
[27] K. Yamashita, R. Masui, X. Zhou, C. Wang, A. Shurbevski, H. Nagamochi, and T. Akutsu, “Enumerating chemical graphs with two disjoint cycles satisfying given path frequency specifications,” arXiv:2004.08381, 2020.
[28] X. Yang, J. Zhang, K. Yoshizoe, K. Terayama, and K. Tsuda, “ChemTS: an efficient python library for de novo molecular generation,” Science and Technology of Advanced Materials, vol. 18, no. 1, pp. 972–976, 2017.
[29] F. Zhang, J. Zhu, R. Chiewvanichakorn, A. Shurbevski, H. Nagamochi, and T. Akutsu, “A new integer linear programming formulation to the inverse QSAR/QSPR for acyclic chemical compounds using skeleton trees,” The 33rd International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, September 22-25, 2020, Kitakyushu, Japan, Springer LNCS 12144, pp. 433–444.
[30] J. Zhu, C. Wang, A. Shurbevski, H. Nagamochi, and T. Akutsu, “A novel method for inference of chemical compounds of cycle index two with desired properties based on artificial neural networks and integer programming,” Algorithms, vol. 13, no. 5, 124, 2020.
A All Constraints in an MILP Formulation for Chemical Cyclic Graphs
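We define a standard encoding of a finite set $A$ of elements to be a bijection $\sigma : A \to [1, |A|]$, where we denote by $[A]$ the set $[1, |A|]$ of integers and by $[e]$ the encoded element $\sigma(e)$. Let $\epsilon$ denote null, a fictitious chemical element that does not belong to any set of chemical elements, chemical symbols, adjacency-configurations and edge-configurations in the following formulation. Given a finite set $A$, let $A_\epsilon$ denote the set $A \cup \{\epsilon\}$ and define a standard encoding of $A_\epsilon$ to be a bijection $\sigma : A_\epsilon \to [0, |A|]$ such that $\sigma(\epsilon) = 0$, where we denote by $[A_\epsilon]$ the set $[0, |A|]$ of integers and by $[e]$ the encoded element $\sigma(e)$, where $[\epsilon] = 0$.

We choose a branch-parameter $\rho$ and subsets $\Lambda^{\mathrm{co}}, \Lambda^{\mathrm{nc}} \subseteq \Lambda$ of chemical elements, subsets $\Lambda^{\mathrm{co}}_{\mathrm{dg}} \subseteq \Lambda^{\mathrm{co}} \times [2, 4]$ and $\Lambda^{\mathrm{nc}}_{\mathrm{dg}} \subseteq \Lambda^{\mathrm{nc}} \times [1, 4]$ of chemical symbols, and subsets $\Gamma^{\mathrm{co}} \subseteq \Gamma_{<}(\Lambda^{\mathrm{co}}_{\mathrm{dg}}) \cup \Gamma_{=}(\Lambda^{\mathrm{co}}_{\mathrm{dg}})$ and $\Gamma^{\mathrm{in}}, \Gamma^{\mathrm{ex}} \subseteq \Gamma(\Lambda^{\mathrm{nc}}_{\mathrm{dg}})$ of edge-configurations.

Let $(\sigma_{\mathrm{co}}, \sigma_{\mathrm{nc}}, \sigma_{\alpha\beta})$ be a specification, and let $G$ be a chemical $\rho$-lean graph in $\mathcal{G}(\sigma_{\mathrm{co}}, \sigma_{\mathrm{nc}}, \sigma_{\alpha\beta})$.

A.1 Selecting Core-vertices and Core-edges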
Recall that
$E_{(=1)} = \{e \in E_C \mid \ell_{LB}(e) = \ell_{UB}(e) = 1\}$;
$E_{(0/1)} = \{e \in E_C \mid \ell_{LB}(e) = 0, \ell_{UB}(e) = 1\}$;
$E_{(\ge 1)} = \{e \in E_C \mid \ell_{LB}(e) = 1, \ell_{UB}(e) \ge 2\}$;
$E_{(\ge 2)} = \{e \in E_C \mid \ell_{LB}(e) \ge 2\}$;
- Every edge $a_i \in E_{(=1)}$ is included in $G$;
- Each edge $a_i \in E_{(0/1)}$ is included in $G$ if necessary;
- For each edge $a_i \in E_{(\ge 2)}$, edge $a_i$ is not included in $G$ and instead a path $P_i = (v^C_{\mathrm{tail}_C(i),0}, v^T_{j-1,0}, v^T_{j,0}, \ldots, v^T_{j+t,0}, v^C_{\mathrm{head}_C(i),0})$ of length at least 2 from vertex $v^C_{\mathrm{tail}_C(i),0}$ to vertex $v^C_{\mathrm{head}_C(i),0}$ visiting some core-vertices in $V_T$ is constructed in $G$; and
- For each edge $a_i \in E_{(\ge 1)}$, either edge $a_i$ is directly used in $G$ or the above path $P_i$ of length at least 2 is constructed in $G$.

Let $t_C \triangleq |V_C|$ and denote $V_C$ by $\{v^C_{i,0} \mid i \in [1, t_C]\}$. Regard the seed graph $G_C$ as a digraph such that each edge $a_i$ with end-vertices $v^C_{j,0}$ and $v^C_{j',0}$ is directed from $v^C_{j,0}$ to $v^C_{j',0}$ when $j < j'$. For each directed edge $a_i \in E_C$, let $\mathrm{head}_C(i)$ and $\mathrm{tail}_C(i)$ denote the head and tail of $e_C(i)$; i.e., $a_i = (v^C_{\mathrm{tail}_C(i),0}, v^C_{\mathrm{head}_C(i),0})$.

Assume that $E_C = \{a_i \mid i \in [1, m_C]\}$, $E_{(\ge 2)} = \{a_k \mid k \in [1, p]\}$, $E_{(\ge 1)} = \{a_k \mid k \in [p+1, q]\}$, $E_{(0/1)} = \{a_i \mid i \in [q+1, t]\}$ and $E_{(=1)} = \{a_i \mid i \in [t+1, m_C]\}$ for integers $p$, $q$ and $t$. Define $k_C \triangleq |E_{(\ge 2)} \cup E_{(\ge 1)}|$, $\widetilde{k}_C \triangleq |E_{(\ge 2)}|$.

To control the construction of such a path $P_i$ for each edge $a_k \in E_{(\ge 2)} \cup E_{(\ge 1)}$, we regard the index $k \in [1, k_C]$ of each edge $a_k \in E_{(\ge 2)} \cup E_{(\ge 1)}$ as the “color” of the edge. To introduce necessary linear constraints that can construct such a path $P_k$ properly in our MILP, we assign the color $k$ to the vertices $v^T_{j-1,0}, v^T_{j,0}, \ldots, v^T_{j+t,0}$ in $V_T$ when the above path $P_k$ is used in $G$.

For each index $s \in [1, t_C]$, let $E^+_{(=1)}(s)$ (resp., $E^-_{(=1)}(s)$) denote the set of edges $a_i \in E_{(=1)}$ such that the tail (resp., head) of $a_i$ is $v^C_{s,0}$. Similarly for $E^+_{(0/1)}(s)$, $E^-_{(0/1)}(s)$, $E^+_{(\ge 1)}(s)$, $E^-_{(\ge 1)}(s)$, $E^+_{(\ge 2)}(s)$ and $E^-_{(\ge 2)}(s)$.

Let $I_{(=1)}$ denote the set of indices $i$ of edges $a_i \in E_{(=1)}$. Similarly for $I_{(0/1)}$, $I_{(\ge 1)}$, $I_{(\ge 2)}$, $I^+_{(=1)}(s)$, $I^-_{(=1)}(s)$, $I^+_{(0/1)}(s)$, $I^-_{(0/1)}(s)$, $I^+_{(\ge 1)}(s)$, $I^-_{(\ge 1)}(s)$, $I^+_{(\ge 2)}(s)$ and $I^-_{(\ge 2)}(s)$. Note that $[1, k_C] = I_{(\ge 2)} \cup I_{(\ge 1)}$ and $[\widetilde{k}_C + 1, m_C] = I_{(\ge 1)} \cup I_{(0/1)} \cup I_{(=1)}$.

constants:
$t_C = |V_C|$, $\widetilde{k}_C = |E_{(\ge 2)}|$, $k_C = |E_{(\ge 2)} \cup E_{(\ge 1)}|$, $t_T = \mathrm{cs}_{UB} - |V_C|$, $m_C = |E_C|$, where $a_i \in E_C \setminus (E_{(\ge 2)} \cup E_{(\ge 1)})$, $i \in [k_C + 1, m_C]$;

constants for core specification $\sigma_{\mathrm{co}}$:
$\mathrm{cs}_{LB}, \mathrm{cs}_{UB} \in [2, n^*]$: lower and upper bounds on $\mathrm{cs}(G)$;
$\ell_{LB}(k), \ell_{UB}(k) \in [1, t_T]$, $k \in [1, k_C]$: lower and upper bounds on the length of path $P_k$;

constants for non-core specification $\sigma_{\mathrm{nc}}$:
$\mathrm{bl}_{LB}(i), \mathrm{bl}_{UB}(i) \in [0, 1]$, $i \in [1, t_T]$: lower and upper bounds on $\mathrm{bl}_\rho(T_i)$ of the tree rooted at a vertex $v^C_{i,0}$;

variables:
$e_C(i) \in [0, 1]$, $i \in [1, m_C]$: $e_C(i)$ represents edge $a_i \in E_C$, $i \in [1, m_C]$ ($e_C(i) = 1$, $i \in I_{(=1)}$; $e_C(i) = 0$, $i \in I_{(\ge 2)}$) ($e_C(i) = 1 \Leftrightarrow$ edge $a_i$ is used in $G$);
$v_T(i, 0) \in [0, 1]$, $i \in [1, t_T]$: $v_T(i, 0) = 1 \Leftrightarrow$ vertex $v^T_{i,0}$ is used in $G$;
$e_T(i) \in [0, 1]$, $i \in [1, t_T + 1]$: $e_T(i)$ represents edge $e^T_i = (v^T_{i-1,0}, v^T_{i,0}) \in E_T$, where $e^T_1$ and $e^T_{t_T+1}$ are fictitious edges ($e_T(i) = 1 \Leftrightarrow$ edge $e^T_i$ is used in $G$);
$\chi_T(i) \in [0, k_C]$, $i \in [1, t_T]$: $\chi_T(i)$ represents the color assigned to core-vertex $v^T_{i,0}$ ($\chi_T(i) = k > 0 \Leftrightarrow$ vertex $v^T_{i,0}$ is assigned color $k$; $\chi_T(i) = 0$ means that vertex $v^T_{i,0}$ is not used in $G$);
$\mathrm{clr}_T(k) \in [\ell_{LB}(k) - 1, \ell_{UB}(k) - 1]$, $k \in [1, k_C]$, $\mathrm{clr}_T(0) \in [0, t_T]$: the number of vertices $v^T_{i,0} \in V_T$ with color $c$ (the range $[\ell_{LB}(c) - 1, \ell_{UB}(c) - 1]$ expresses a constraint in $\sigma_{\mathrm{co}}$);
$\delta^T_\chi(k) \in [0, 1]$, $k \in [0, k_C]$: $\delta^T_\chi(k) = 1 \Leftrightarrow \chi_T(i) = k$ for some $i \in [1, t_T]$;
$\chi_T(i, k) \in [0, 1]$, $i \in [1, t_T]$, $k \in [0, k_C]$ ($\chi_T(i, k) = 1 \Leftrightarrow \chi_T(i) = k$);
$\widetilde{\deg}^+_C(i) \in [0, 4]$, $i \in [1, t_C]$: the out-degree of vertex $v^C_{i,0}$ with the used edges $e_C$ in $E_C$;
$\widetilde{\deg}^-_C(i) \in [0, 4]$, $i \in [1, t_C]$: the in-degree of vertex $v^C_{i,0}$ with the used edges $e_C$ in $E_C$;

variables for specification $\sigma_{\mathrm{co}}$:
$\mathrm{cs} \in [\mathrm{cs}_{LB}, \mathrm{cs}_{UB}]$: the core size (the range $[\mathrm{cs}_{LB}, \mathrm{cs}_{UB}]$ expresses a constraint in $\sigma_{\mathrm{co}}$);

constraints:
$e_C(i) = 1$, $i \in I_{(=1)}$, (3)
$e_C(i) = 0$, $\mathrm{clr}_T(i) \ge 1$, $i \in I_{(\ge 2)}$, (4)
$e_C(i) + \mathrm{clr}_T(i) \ge 1$, $\mathrm{clr}_T(i) \le t_T \cdot (1 - e_C(i))$, $i \in I_{(\ge 1)}$, (5)
$\sum_{c \in I^-_{(\ge 1)}(i) \cup I^-_{(0/1)}(i) \cup I^-_{(=1)}(i)} e_C(c) = \widetilde{\deg}^-_C(i)$, $\sum_{c \in I^+_{(\ge 1)}(i) \cup I^+_{(0/1)}(i) \cup I^+_{(=1)}(i)} e_C(c) = \widetilde{\deg}^+_C(i)$, $i \in [1, t_C]$, (6)
$\chi_T(i, 0) = 1 - v_T(i, 0)$, $\sum_{k \in [0, k_C]} \chi_T(i, k) = 1$, $\sum_{k \in [0, k_C]} k \cdot \chi_T(i, k) = \chi_T(i)$, $i \in [1, t_T]$, (7)
$\sum_{i \in [1, t_T]} \chi_T(i, k) = \mathrm{clr}_T(k)$, $t_T \cdot \delta^T_\chi(k) \ge \sum_{i \in [1, t_T]} \chi_T(i, k) \ge \delta^T_\chi(k)$, $k \in [0, k_C]$, (8)
$v_T(i-1, 0) \ge v_T(i, 0)$, $k_C \cdot (v_T(i-1, 0) - e_T(i)) \ge \chi_T(i-1) - \chi_T(i) \ge v_T(i-1, 0) - e_T(i)$, $i \in [2, t_T]$, (9)
$t_C + \sum_{i \in [1, t_T]} v_T(i, 0) = \mathrm{cs}$. (10)

A.2 Constraints for Including Internal Vertices and Edges
Define the set of colors for the vertex set $\{u_i \mid i \in [1, \widetilde{t}_C]\} \cup V_T$ to be $[1, c_F]$ with $c_F \triangleq \widetilde{t}_C + t_T = |\{u_i \mid i \in [1, \widetilde{t}_C]\} \cup V_T|$. Let each core-vertex $v^C_{i,0}$, $i \in [1, \widetilde{t}_C]$ (resp., $v^T_{i,0} \in V_T$) correspond to a color $i \in [1, c_F]$ (resp., $i + \widetilde{t}_C \in [1, c_F]$). Let $\mathrm{tail}_F(i) := i$ for each color $i \in [1, \widetilde{t}_C]$ and $\mathrm{tail}_F(i) := i - \widetilde{t}_C$ for each color $i \in [\widetilde{t}_C + 1, c_F]$. When a path $P = (u, v^F_{j,0}, v^F_{j+1,0}, \ldots, v^F_{j+t,0})$ from a vertex $u \in V_C \cup V_T$ is used in $G$, we assign the color $i \in [1, c_F]$ of the vertex $u$ to the vertices $v^F_{j,0}, v^F_{j+1,0}, \ldots, v^F_{j+t,0}$ in $V_F$.

constants:
$c_F$: the maximum number of different colors assigned to the vertices in $V_F$;

constants for non-core specification $\sigma_{\mathrm{nc}}$:
$\mathrm{bl}_{LB}(i) \in [0, 1]$, $i \in [1, \widetilde{t}_C]$: a lower bound on the number of leaf $\rho$-branches in the tree rooted at $u_i \in V_C$;
$\mathrm{bl}_{LB}(k), \mathrm{bl}_{UB}(k) \in [0, \ell_{UB}(k) - 1]$, $k \in [1, k_C] = I_{(\ge 2)} \cup I_{(\ge 1)}$: lower and upper bounds on the sum of $\mathrm{bl}_\rho(T)$ of the trees rooted at internal vertices of a path $P_k$ for an edge $a_k \in E_{(\ge 2)} \cup E_{(\ge 1)}$;

variables:
$\mathrm{bl}_G \in [0, \max[\max\{\mathrm{bl}_{UB}(v) \mid v \in V_C\}, \max\{\mathrm{bl}_{UB}(e) \mid e \in E_C\}]]$: $\mathrm{bl}_\rho(G)$;
$v_F(i, 0) \in [0, 1]$, $i \in [1, t_F]$: $v_F(i, 0) = 1 \Leftrightarrow$ vertex $v^F_{i,0}$ is used in $G$;
$e_F(i) \in [0, 1]$, $i \in [1, t_F + 1]$: $e_F(i)$ represents edge $e^F_i = v^F_{i-1,0} v^F_{i,0}$, where $e^F_1$ and $e^F_{t_F+1}$ are fictitious edges ($e_F(i) = 1 \Leftrightarrow$ edge $e^F_i$ is used in $G$);
$\chi_F(i) \in [0, c_F]$, $i \in [1, t_F]$: $\chi_F(i)$ represents the color assigned to non-core-vertex $v^F_{i,0}$ ($\chi_F(i) = c \Leftrightarrow$ vertex $v^F_{i,0}$ is assigned color $c$);
$\mathrm{clr}_F(c) \in [0, t_F]$, $c \in [0, c_F]$: the number of vertices $v^F_{i,0}$ with color $c$;
$\delta^F_\chi(c) \in [\mathrm{bl}_{LB}(c), 1]$, $c \in [1, \widetilde{t}_C]$: $\delta^F_\chi(c) = 1 \Leftrightarrow \chi_F(i) = c$ for some $i \in [1, t_F]$;
$\delta^F_\chi(c) \in [0, 1]$, $c \in [\widetilde{t}_C + 1, c_F]$: $\delta^F_\chi(c) = 1 \Leftrightarrow \chi_F(i) = c$ for some $i \in [1, t_F]$;
$\chi_F(i, c) \in [0, 1]$, $i \in [1, t_F]$, $c \in [0, c_F]$: $\chi_F(i, c) = 1 \Leftrightarrow \chi_F(i) = c$;
$\sigma(c) \in [0, 1]$, $c \in [0, c_F]$: $\sigma(c) = 1 \Leftrightarrow$ the $\rho$-pendent-tree rooted at vertex $v^C_{c,0}$ ($c \le \widetilde{t}_C$) or $v^T_{c - \widetilde{t}_C, 0}$ ($c > \widetilde{t}_C$) has the core height $\mathrm{ch}_G$;
$\sigma_X(i) \in [0, 1]$, $i \in [1, t_X]$, $X \in \{C, T, F\}$: $\sigma_X(i) = 1 \Leftrightarrow$ the subtree of $X_i$ rooted at vertex $v^X_{i,0}$ has the core height $\mathrm{ch}_G$;

variables for chemical specification $\sigma_{\mathrm{nc}}$:
$\mathrm{bl}(k, i) \in [0, 1]$, $k \in [1, k_C] = I_{(\ge 2)} \cup I_{(\ge 1)}$, $i \in [1, t_T]$: $\mathrm{bl}(k, i) = 1 \Leftrightarrow$ path $P_k$ contains vertex $v^T_{i,0}$ as an internal vertex and the $\rho$-fringe-tree rooted at $v^T_{i,0}$ contains a leaf $\rho$-branch;

constraints:
$\chi_F(i, 0) = 1 - v_F(i, 0)$, $\sum_{c \in [0, c_F]} \chi_F(i, c) = 1$, $\sum_{c \in [0, c_F]} c \cdot \chi_F(i, c) = \chi_F(i)$, $i \in [1, t_F]$, (11)
$\sum_{i \in [1, t_F]} \chi_F(i, c) = \mathrm{clr}_F(c)$, $t_F \cdot \delta^F_\chi(c) \ge \sum_{i \in [1, t_F]} \chi_F(i, c) \ge \delta^F_\chi(c)$, $c \in [0, c_F]$, (12)
$e_F(1) = e_F(t_F + 1) = 0$, (13)
$v_F(i-1, 0) \ge v_F(i, 0)$, $c_F \cdot (v_F(i-1, 0) - e_F(i)) \ge \chi_F(i-1) - \chi_F(i) \ge v_F(i-1, 0) - e_F(i)$, $i \in [2, t_F]$, (14)
$\sum_{c \in [1, c_F]} \delta^F_\chi(c) = \mathrm{bl}_G$, $\sum_{c \in [1, c_F]} \sigma(c) + \sum_{i \in [1, t_X], X \in \{C, T, F\}} \sigma_X(i) = 1$, (15)
$\mathrm{bl}(k, i) \ge \delta^F_\chi(\widetilde{t}_C + i) + \chi_T(i, k) - 1$, $k \in [1, k_C]$, $i \in [1, t_T]$, (16)
$\sum_{k \in [1, k_C], i \in [1, t_T]} \mathrm{bl}(k, i) \le \sum_{i \in [1, t_T]} \delta^F_\chi(\widetilde{t}_C + i)$, (17)
$\mathrm{bl}_{LB}(k) \le \sum_{i \in [1, t_T]} \mathrm{bl}(k, i) \le \mathrm{bl}_{UB}(k)$, $k \in [1, k_C]$. (18)

A.3 Constraints for Including Fringe-trees
We set $d_{\max}=3$ if $\mathrm{dg}^{\mathrm{nc}}_{4,\mathrm{UB}}=0$, and $d_{\max}=4$ if $\mathrm{dg}^{\mathrm{nc}}_{4,\mathrm{UB}}\ge 1$. By the definition of the proper sets $P_{\mathrm{prc}}(1,2,2)$, $P_{\mathrm{prc}}(2,2,2)$, $P_{\mathrm{prc}}(2,3,2)$ and $P_{\mathrm{prc}}(3,3,2)$ of ordered index pairs in Section 2, we observe that some vertices in the trees $T(1,2,2)$, $T(2,2,2)$, $T(1,3,2)$, $T(2,3,2)$ and $T(3,3,2)$ will not be used in any choice of subtrees. In this MILP formulation, we use the reduced trees $T'$ illustrated in Figure 16(a)-(d).

Figure 16: An illustration of reduced trees $T'$ of rooted trees $T(a,b,c)$: (a) A reduced tree for $T(1,2,2)=T(d_{\max}-2,d_{\max}-1,\rho)$ with $d_{\max}=3$ and $\rho=2$ and for $T(1,3,2)=T(d_{\max}-3,d_{\max}-1,\rho)$ with $d_{\max}=4$ and $\rho=2$; (b) A reduced tree for $T(2,2,2)=T(d_{\max}-1,d_{\max}-1,\rho)$ with $d_{\max}=3$ and $\rho=2$; (c) A reduced tree for $T(2,3,2)=T(d_{\max}-2,d_{\max}-1,\rho)$ with $d_{\max}=4$ and $\rho=2$; (d) A reduced tree for $T(3,3,2)=T(d_{\max}-1,d_{\max}-1,\rho)$ with $d_{\max}=4$ and $\rho=2$.

We introduce some notation for rooted trees in the scheme graph SG. For each rooted tree $X_i$ such that $C_i=T(\Delta_i,d_{\max}-1,\rho)$, $T_i=T(d_{\max}-2,d_{\max}-1,\rho)$ and $F_i=T(d_{\max}-1,d_{\max}-1,\rho)$, $X\in\{\mathrm C,\mathrm T,\mathrm F\}$,
$\mathrm{Cld}_X(j)$ denotes the set of the indices $h$ of children $v^{X}_{i,h}$ of a vertex $v^{X}_{i,j}$;
$\mathrm{prt}_X(j)$ denotes the index $h$ of the parent $v^{X}_{i,h}$ of a non-root vertex $v^{X}_{i,j}$;
$\mathrm{Dsn}_X(p)$ denotes the set of indices $j$ of the vertices $v^{X}_{i,j}$ whose depth is $p$;
$P_{\mathrm{prc},X}$ is a proper set $P_{\mathrm{prc}}(d_X,d_{\max}-1,\rho)$ of index pairs, where $d_{\mathrm C}=\Delta_i$, $d_{\mathrm T}=2$ and $d_{\mathrm F}=d_{\max}-1$;
$j^{X}_{p}$ denotes the index $j\in\mathrm{Dsn}_X(p)$ of the vertex $v^{X}_{i,j}$ with depth $p$ in the leftmost path from the root (where we assume that the height of a rooted tree satisfying $P_{\mathrm{prc},X}$ is given by the leftmost path from the root).

We assume that every rooted tree $T\in\mathcal{T}_{\mathrm{prc}}(d_{\max}-2,d_{\max}-1,\rho)$ (resp., $T\in\mathcal{T}_{\mathrm{prc}}(d_{\max}-1,d_{\max}-1,\rho)$) satisfies the property that the leftmost path from the root (i.e., the path that always visits the child with the smallest index) attains the height $\mathrm{ht}(T)$.

To express the condition that a $\rho$-fringe-tree is chosen from a rooted tree $C_i$, $T_i$ or $F_i$, we introduce the following set of variables and constraints.
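constants for non-core specification $\sigma_{\mathrm{nc}}$:
$n_{\mathrm{LB}},n^{*}\ge\mathrm{cs}_{\mathrm{LB}}$: lower and upper bounds on $n(G)$;
$\mathrm{ch}_{\mathrm{LB}}(i),\mathrm{ch}_{\mathrm{UB}}(i)\in[0,n^{*}]$, $i\in[1,t_{\mathrm T}]$: lower and upper bounds on $\mathrm{ht}(T_i)$ of the tree rooted at a vertex $v^{\mathrm C}_{i,0}$;
$\mathrm{ch}_{\mathrm{LB}}(k),\mathrm{ch}_{\mathrm{UB}}(k)\in[0,n^{*}]$, $k\in[1,k_{\mathrm C}]=I_{(\ge 2)}\cup I_{(\ge 1)}$: lower and upper bounds on the maximum height $\mathrm{ht}(T)$ of a tree $T\in\mathcal{F}(P_k)$ rooted at an internal vertex of a path $P_k$ for an edge $a_k\in E_{(\ge 2)}\cup E_{(\ge 1)}$;

variables:
$n_G\in[n_{\mathrm{LB}},n^{*}]$: $n(G)$;
$\mathrm{ch}_G\in[0,\max\{\max\{\mathrm{ch}_{\mathrm{UB}}(v)\mid v\in V_{\mathrm C}\},\max\{\mathrm{ch}_{\mathrm{UB}}(e)\mid e\in E_{\mathrm C}\}\}]$: $\mathrm{ch}(G)$;
$v^{\mathrm T}(i,j)\in[0,1]$, $i\in[1,t_{\mathrm T}]$, $j\in[1,n_{\mathrm T}]$: $v^{\mathrm T}(i,j)=1$ ⇔ vertex $v^{\mathrm T}_{i,j}$ is used in $G$; $v^{\mathrm T}(i,j)=1$ and $j\ge 1$ ⇔ edge $e^{\mathrm T}_{i,j}$ is used in $G$;
$v^{X}(i,j)\in[0,1]$, $i\in[1,t_{\mathrm C}]$, $j\in[0,n_{\mathrm C}(i)]$ (resp., $i\in[1,t_{\mathrm F}]$, $j\in[0,n_{\mathrm F}]$), $X=\mathrm C$ (resp., $X=\mathrm F$): $v^{X}(i,j)=1$ ⇔ vertex $v^{X}_{i,j}$ is used in $G$; $v^{X}(i,j)=1$ and $j\ge 1$ ⇔ edge $e^{X}_{i,j}$ is used in $G$;
$h^{X}(i)\in[0,\rho]$, $i\in[1,t_X]$, $X\in\{\mathrm C,\mathrm T,\mathrm F\}$: the height of tree $X_i$;

variables for chemical specification $\sigma_{\mathrm{nc}}$:
$\sigma(k,i)\in[0,1]$, $k\in[1,k_{\mathrm C}]=I_{(\ge 2)}\cup I_{(\ge 1)}$, $i\in[1,t_{\mathrm T}]$: $\sigma(k,i)=1$ ⇔ a tree rooted at a vertex $v^{\mathrm T}_{i,0}$ with color $k$ has the largest height among such trees;

constraints:
\[ v^{\mathrm C}(i,0)=1,\qquad i\in[1,t_{\mathrm C}], \tag{19} \]
\[ v^{\mathrm F}(i,j^{\mathrm F}_{\rho})\ge v^{\mathrm F}(i,0)-e^{\mathrm F}(i+1),\qquad i\in[1,t_{\mathrm F}]\ (e^{\mathrm F}(t_{\mathrm F}+1)=0), \tag{20} \]
\[ v^{X}(i,j)\ge v^{X}(i,h),\qquad i\in[1,t_X],\ (j,h)\in P_{\mathrm{prc},X},\ X\in\{\mathrm C,\mathrm T,\mathrm F\}, \tag{21} \]
\[ \sum_{p\in[1,\rho]}v^{X}(i,j^{X}_{p})=h^{X}(i),\qquad i\in[1,t_X],\ X\in\{\mathrm C,\mathrm T,\mathrm F\}, \tag{22} \]
\[ \sum_{j\in[0,n_{\mathrm C}(i)]}v^{\mathrm C}(i,j)\le\sum_{h\in\mathrm{Cld}_{\mathrm C}(0)}v^{\mathrm C}(i,h),\qquad i\in[1,t_{\mathrm C}], \tag{23} \]
\[ \sum_{j\in[0,n_X]}v^{X}(i,j)\le\sum_{h\in\mathrm{Cld}_X(0)}v^{X}(i,h),\qquad i\in[1,t_X],\ X\in\{\mathrm T,\mathrm F\}, \tag{24} \]
\[ \mathrm{ch}_G\ge\mathrm{clr}^{\mathrm F}(c)+\rho\ge\mathrm{ch}_G-n^{*}\cdot(1-\sigma(c)),\qquad c\in[1,c_{\mathrm F}], \tag{25} \]
\[ \mathrm{ch}_G\ge h^{X}(i)\ge\mathrm{ch}_G-n^{*}\cdot(1-\sigma^{X}(i)),\qquad i\in[1,t_X],\ X\in\{\mathrm C,\mathrm T,\mathrm F\}, \tag{26} \]
\[ \sum_{i\in[1,t_{\mathrm C}],\,j\in[0,n_{\mathrm C}(i)]}v^{\mathrm C}(i,j)+\sum_{i\in[1,t_X],\,j\in[0,n_X],\,X\in\{\mathrm T,\mathrm F\}}v^{X}(i,j)=n_G, \tag{27} \]
\[ \begin{aligned} &h^{\mathrm C}(i)\ge\mathrm{ch}_{\mathrm{LB}}(i)-n^{*}\delta^{\mathrm F}_{\chi}(i),\quad \mathrm{clr}^{\mathrm F}(i)+\rho\ge\mathrm{ch}_{\mathrm{LB}}(i),\\ &h^{\mathrm C}(i)\le\mathrm{ch}_{\mathrm{UB}}(i),\quad \mathrm{clr}^{\mathrm F}(i)+\rho\le\mathrm{ch}_{\mathrm{UB}}(i)+n^{*}(1-\delta^{\mathrm F}_{\chi}(i)),\qquad i\in[1,\widetilde{t_{\mathrm C}}], \end{aligned} \tag{28} \]
\[ \mathrm{ch}_{\mathrm{LB}}(i)\le h^{\mathrm C}(i)\le\mathrm{ch}_{\mathrm{UB}}(i),\qquad i\in[\widetilde{t_{\mathrm C}}+1,t_{\mathrm C}], \tag{29} \]
\[ \begin{aligned} &h^{\mathrm T}(i)\le\mathrm{ch}_{\mathrm{UB}}(k)+n^{*}(\delta^{\mathrm F}_{\chi}(\widetilde{t_{\mathrm C}}+i)+1-\chi^{\mathrm T}(i,k)),\\ &\mathrm{clr}^{\mathrm F}(\widetilde{t_{\mathrm C}}+i)+\rho\le\mathrm{ch}_{\mathrm{UB}}(k)+n^{*}(2-\delta^{\mathrm F}_{\chi}(\widetilde{t_{\mathrm C}}+i)-\chi^{\mathrm T}(i,k)),\qquad k\in[1,k_{\mathrm C}],\ i\in[1,t_{\mathrm T}], \end{aligned} \tag{30} \]
\[ \sum_{i\in[1,t_{\mathrm T}]}\sigma(k,i)=\delta^{\mathrm T}_{\chi}(k),\qquad k\in[1,k_{\mathrm C}], \tag{31} \]
\[ \begin{aligned} &\chi^{\mathrm T}(i,k)\ge\sigma(k,i),\\ &h^{\mathrm T}(i)\ge\mathrm{ch}_{\mathrm{LB}}(k)-n^{*}(\delta^{\mathrm F}_{\chi}(\widetilde{t_{\mathrm C}}+i)+1-\sigma(k,i)),\\ &\mathrm{clr}^{\mathrm F}(\widetilde{t_{\mathrm C}}+i)+\rho\ge\mathrm{ch}_{\mathrm{LB}}(k)-n^{*}(2-\delta^{\mathrm F}_{\chi}(\widetilde{t_{\mathrm C}}+i)-\sigma(k,i)),\qquad k\in[1,k_{\mathrm C}],\ i\in[1,t_{\mathrm T}]. \end{aligned} \tag{32} \]

A.4 Descriptor for the Number of Specified Degree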
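Constraint (26) of Section A.3, read together with the requirement in (15) that exactly one selector variable equals 1, is a big-M encoding of a maximum: the core height is bounded below by every tree height and must be attained by the selected tree. The following is a minimal sketch that checks this by brute force on a toy instance. The function name `feasible_core_heights` is hypothetical, and the sketch keeps only the selectors $\sigma^{X}(i)$, ignoring the $\sigma(c)$ terms of (25); both are assumptions made for illustration, not part of the formulation.

```python
from itertools import product


def feasible_core_heights(heights, n_star):
    """Enumerate every value ch for which some 0/1 selector vector sigma
    with exactly one 1 satisfies, for all i,
        ch >= heights[i] >= ch - n_star * (1 - sigma[i]).
    Assumes 0 <= heights[i] <= n_star (n_star plays the role of big-M)."""
    feasible = set()
    for ch in range(n_star + 1):
        for sigma in product((0, 1), repeat=len(heights)):
            if sum(sigma) != 1:  # exactly one selector is active
                continue
            if all(ch >= h >= ch - n_star * (1 - s)
                   for h, s in zip(heights, sigma)):
                feasible.add(ch)
    return feasible


if __name__ == "__main__":
    # Three trees of heights 1, 2 and 0 with big-M constant 5.
    print(feasible_core_heights([1, 2, 0], 5))
```

In this toy setting the only feasible value is the largest height, which is the behavior the pair of inequalities is meant to enforce.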
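We include constraints to compute descriptors $\mathrm{dg}^{\mathrm{co}}_i(G)$ and $\mathrm{dg}^{\mathrm{nc}}_i(G)$, $i\in[1,4]$.

constants for non-core specification $\sigma_{\mathrm{nc}}$:
$\mathrm{dg}^{\mathrm{nc}}_{4,\mathrm{UB}}\in[0,n^{*}-\mathrm{cs}_{\mathrm{LB}}]$: an upper bound on $\mathrm{dg}^{\mathrm{nc}}_4(G)$.

variables:
$\deg^{X}(i,j)\in[0,4]$, $i\in[1,t_X]$, $j\in[0,n_X]$ ($n_{\mathrm C}=n_{\mathrm C}(i)$), $X\in\{\mathrm C,\mathrm T,\mathrm F\}$: the degree $\deg_G(v^{X}_{i,j})$ of vertex $v^{X}_{i,j}$ in $G$;
$\deg^{\mathrm{CT}}(i)\in[0,4]$, $i\in[1,t_{\mathrm C}]$: the number of edges from vertex $v^{\mathrm C}_{i,0}$ to vertices $v^{\mathrm T}_{j,0}$, $j\in[1,t_{\mathrm T}]$;
$\deg^{\mathrm{TC}}(i)\in[0,4]$, $i\in[1,t_{\mathrm C}]$: the number of edges from vertices $v^{\mathrm T}_{j,0}$, $j\in[1,t_{\mathrm T}]$, to vertex $v^{\mathrm C}_{i,0}$;
$\deg^{\mathrm{ex}}_{\mathrm C}(i)\in[0,4]$, $i\in[1,t_{\mathrm C}]$: the number of children of vertex $v^{\mathrm C}_{i,0}$ in the $\rho$-fringe-tree $C_i$;
$\delta^{X}_{\mathrm{dg}}(i,j,d)\in[0,1]$, $i\in[1,t_X]$, $j\in[0,n_X]$ ($n_{\mathrm C}=n_{\mathrm C}(i)$), $d\in[1,4]$, $X\in\{\mathrm C,\mathrm T,\mathrm F\}$: $\delta^{X}_{\mathrm{dg}}(i,j,d)=1$ ⇔ $\deg^{X}(i,j)=d$;
$\mathrm{dg}(d)\in[\mathrm{dg}_{\mathrm{LB}}(d),\mathrm{dg}_{\mathrm{UB}}(d)]$, $d\in[1,4]$: the number of vertices $v$ with $\deg_G(v)=d$;
$\mathrm{dg}^{\mathrm{co}}(d),\mathrm{dg}^{\mathrm C}(d),\mathrm{dg}^{\mathrm T}(d)\in[0,\mathrm{cs}_{\mathrm{UB}}]$, $d\in[1,4]$: the number of core-vertices $v\in V(G)$ (resp., $v\in V(G)\cap V_{\mathrm C}$ and $v\in V(G)\cap V_{\mathrm T}$) with $\deg_G(v)=d$;
$\mathrm{dg}^{\mathrm{nc}}(d),\mathrm{dg}^{\mathrm{in}}(d),\mathrm{dg}^{\mathrm{ex}}(d)\in[0,n^{*}-\mathrm{cs}_{\mathrm{LB}}]$, $d\in[1,4]$: the number of non-core-vertices $v\in V(G)$ (resp., $\rho$-internal vertices $v\in V(G)\cap V_{\mathrm F}$ and $\rho$-external-vertices $v\in V(G)$) with $\deg_G(v)=d$;
$\mathrm{dg}^{X}_{(p)}(d)\in[0,n^{*}-\mathrm{cs}_{\mathrm{LB}}]$, $d\in[1,4]$, $p\in[1,\rho]$, $X\in\{\mathrm C,\mathrm T,\mathrm F\}$: the number of $\rho$-external-vertices $v\in V(G)\cap V(X_i)$ with depth $p\in[1,\rho]$ and $\deg_G(v)=d$;

constraints:
\[ v^{X}(i,j)+\sum_{h\in\mathrm{Cld}_X(j)}v^{X}(i,h)=\deg^{X}(i,j),\qquad i\in[1,t_X],\ j\in[1,n_X]\ (n_{\mathrm C}=n_{\mathrm C}(i)),\ X\in\{\mathrm C,\mathrm T,\mathrm F\}, \tag{33} \]
\[ \sum_{k\in I^{+}_{(\ge 2)}(i)\cup I^{+}_{(\ge 1)}(i)}\delta^{\mathrm T}_{\chi}(k)=\deg^{\mathrm{CT}}(i),\quad \sum_{k\in I^{-}_{(\ge 2)}(i)\cup I^{-}_{(\ge 1)}(i)}\delta^{\mathrm T}_{\chi}(k)=\deg^{\mathrm{TC}}(i),\quad \sum_{h\in\mathrm{Cld}_{\mathrm C}(0)}v^{\mathrm C}(i,h)=\deg^{\mathrm{ex}}_{\mathrm C}(i),\qquad i\in[1,t_{\mathrm C}], \tag{34} \]
\[ \widetilde{\deg}^{-}_{\mathrm C}(i)+\widetilde{\deg}^{+}_{\mathrm C}(i)+\deg^{\mathrm{CT}}(i)+\deg^{\mathrm{TC}}(i)+\delta^{\mathrm F}_{\chi}(i)+\deg^{\mathrm{ex}}_{\mathrm C}(i)=\deg^{\mathrm C}(i,0),\qquad i\in[1,\widetilde{t_{\mathrm C}}], \tag{35} \]
\[ \widetilde{\deg}^{-}_{\mathrm C}(i)+\widetilde{\deg}^{+}_{\mathrm C}(i)+\deg^{\mathrm{CT}}(i)+\deg^{\mathrm{TC}}(i)+\deg^{\mathrm{ex}}_{\mathrm C}(i)=\deg^{\mathrm C}(i,0),\qquad i\in[\widetilde{t_{\mathrm C}}+1,t_{\mathrm C}], \tag{36} \]
\[ 2v^{\mathrm T}(i,0)+\delta^{\mathrm F}_{\chi}(\widetilde{t_{\mathrm C}}+i)+\sum_{h\in\mathrm{Cld}_{\mathrm T}(0)}v^{\mathrm T}(i,h)=\deg^{\mathrm T}(i,0),\qquad i\in[1,t_{\mathrm T}]\ (e^{\mathrm T}(1)=e^{\mathrm T}(t_{\mathrm T}+1)=0), \tag{37} \]
\[ v^{\mathrm F}(i,0)+e^{\mathrm F}(i+1)+\sum_{h\in\mathrm{Cld}_{\mathrm F}(0)}v^{\mathrm F}(i,h)=\deg^{\mathrm F}(i,0),\qquad i\in[1,t_{\mathrm F}]\ (e^{\mathrm F}(1)=e^{\mathrm F}(t_{\mathrm F}+1)=0), \tag{38} \]
\[ \sum_{d\in[0,4]}\delta^{X}_{\mathrm{dg}}(i,j,d)=1,\quad \sum_{d\in[1,4]}d\cdot\delta^{X}_{\mathrm{dg}}(i,j,d)=\deg^{X}(i,j),\qquad i\in[1,t_X],\ j\in[0,n_X]\ (n_{\mathrm C}=n_{\mathrm C}(i)),\ X\in\{\mathrm T,\mathrm C,\mathrm F\}, \tag{39} \]
\[ \sum_{i\in[1,t_{\mathrm C}],\,j\in\mathrm{Dsn}_X(p)}\delta^{X}_{\mathrm{dg}}(i,j,d)=\mathrm{dg}^{X}_{(p)}(d),\qquad d\in[1,4],\ p\in[1,\rho],\ X\in\{\mathrm C,\mathrm T,\mathrm F\}, \tag{40} \]
\[ \begin{aligned} &\sum_{i\in[1,t_{\mathrm C}]}\delta^{\mathrm C}_{\mathrm{dg}}(i,0,d)=\mathrm{dg}^{\mathrm C}(d),\quad \sum_{i\in[1,t_{\mathrm T}]}\delta^{\mathrm T}_{\mathrm{dg}}(i,0,d)=\mathrm{dg}^{\mathrm T}(d),\quad \sum_{i\in[1,t_{\mathrm F}]}\delta^{\mathrm F}_{\mathrm{dg}}(i,0,d)=\mathrm{dg}^{\mathrm{in}}(d),\\ &\mathrm{dg}^{\mathrm{in}}(d)+\sum_{p\in[1,\rho],\,X\in\{\mathrm C,\mathrm T,\mathrm F\}}\mathrm{dg}^{X}_{(p)}(d)=\mathrm{dg}^{\mathrm{nc}}(d),\quad \mathrm{dg}^{\mathrm C}(d)+\mathrm{dg}^{\mathrm T}(d)=\mathrm{dg}^{\mathrm{co}}(d),\qquad d\in[1,4], \end{aligned} \tag{41} \]
\[ \mathrm{dg}^{\mathrm{nc}}(4)\le\mathrm{dg}^{\mathrm{nc}}_{4,\mathrm{UB}}. \tag{42} \]

A.5 Assigning Multiplicity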
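We prepare an integer variable $\beta(e)$ for each edge $e$ in the scheme graph SG to denote the bond-multiplicity of $e$ in a selected graph $G$ and include the constraints that these variables must satisfy in $G$.

variables:
$\beta^{X}(i)\in[0,3]$, $i\in[2,t_X]$, $X\in\{\mathrm T,\mathrm F\}$: the bond-multiplicity of edge $e^{X}_i$;
$\beta^{\mathrm C}(i)\in[0,3]$, $i\in[\widetilde{k_{\mathrm C}}+1,m_{\mathrm C}]=I_{(\ge 1)}\cup I_{(0/1)}\cup I_{(=1)}$: the bond-multiplicity of edge $a_i\in E_{(\ge 1)}\cup E_{(0/1)}\cup E_{(=1)}$;
$\beta^{X}(i,j)\in[0,3]$, $i\in[1,t_X]$, $j\in[1,n_X]$ ($n_{\mathrm C}=n_{\mathrm C}(i)$), $X\in\{\mathrm C,\mathrm T,\mathrm F\}$: the bond-multiplicity of edge $e^{X}_{i,j}=(v^{X}_{i,\mathrm{prt}_X(j)},v^{X}_{i,j})$;
$\beta^{+}(k),\beta^{-}(k)\in[0,3]$, $k\in[1,k_{\mathrm C}]=I_{(\ge 2)}\cup I_{(\ge 1)}$: the bond-multiplicity of the first (resp., last) edge of path $P_k$;
$\beta^{\mathrm{in}}(c)\in[0,3]$, $c\in[1,c_{\mathrm F}]$: the bond-multiplicity of the first edge of the $\rho$-branch-subtree $T_c$ rooted at vertex $c$;
$\delta^{X}_{\beta}(i,m)\in[0,1]$, $i\in[2,t_X]$, $m\in[0,3]$, $X\in\{\mathrm T,\mathrm F\}$: $\delta^{X}_{\beta}(i,m)=1$ ⇔ $\beta^{X}(i)=m$;
$\delta^{\mathrm C}_{\beta}(i,m)\in[0,1]$, $i\in[\widetilde{k_{\mathrm C}}+1,m_{\mathrm C}]=I_{(\ge 1)}\cup I_{(0/1)}\cup I_{(=1)}$, $m\in[0,3]$: $\delta^{\mathrm C}_{\beta}(i,m)=1$ ⇔ $\beta^{\mathrm C}(i)=m$;
$\delta^{X}_{\beta}(i,j,m)\in[0,1]$, $i\in[1,t_X]$, $j\in[1,n_X]$ ($n_{\mathrm C}=n_{\mathrm C}(i)$), $m\in[0,3]$, $X\in\{\mathrm C,\mathrm T,\mathrm F\}$: $\delta^{X}_{\beta}(i,j,m)=1$ ⇔ $\beta^{X}(i,j)=m$;
$\delta^{+}_{\beta}(k,m),\delta^{-}_{\beta}(k,m)\in[0,1]$, $k\in[1,k_{\mathrm C}]=I_{(\ge 2)}\cup I_{(\ge 1)}$, $m\in[0,3]$: $\delta^{+}_{\beta}(k,m)=1$ (resp., $\delta^{-}_{\beta}(k,m)=1$) ⇔ $\beta^{+}(k)=m$ (resp., $\beta^{-}(k)=m$);
$\delta^{\mathrm{in}}_{\beta}(c,m)\in[0,1]$, $c\in[1,c_{\mathrm F}]$, $m\in[0,3]$: $\delta^{\mathrm{in}}_{\beta}(c,m)=1$ ⇔ $\beta^{\mathrm{in}}(c)=m$;
$\mathrm{bd}(m)\in[0,m_{\mathrm{UB}}]$, $m\in[1,3]$: the number of edges with bond-multiplicity $m$ in $G$;
$\mathrm{bd}^{\mathrm{co}}(m)\in[0,m^{\mathrm{co}}_{\mathrm{UB}}]$, $m\in[1,3]$: the number of core-edges with bond-multiplicity $m$ in $G$;
$\mathrm{bd}^{\mathrm{in}}(m)\in[0,m^{\mathrm{nc}}_{\mathrm{UB}}]$, $m\in[1,3]$: the number of $\rho$-internal edges with bond-multiplicity $m$ in $G$;
$\mathrm{bd}^{\mathrm{ex}}(m)\in[0,n^{*}]$, $m\in[1,3]$: the number of $\rho$-external edges with bond-multiplicity $m$ in $G$;
$\mathrm{bd}^{X}(m)\in[0,m^{\mathrm{co}}_{\mathrm{UB}}]$, $X\in\{\mathrm C,\mathrm T,\mathrm{CT},\mathrm{TC}\}$, $\mathrm{bd}^{X}(m)\in[0,m^{\mathrm{nc}}_{\mathrm{UB}}]$, $X\in\{\mathrm F,\mathrm{CF},\mathrm{TF}\}$, $m\in[1,3]$: the number of edges $e^{X}(i,0)$ with bond-multiplicity $m$ in $G$;
$\mathrm{bd}^{X}_{(p)}(m)\in[0,n^{*}]$, $m\in[1,3]$, $X\in\{\mathrm C,\mathrm T,\mathrm F\}$: the number of edges $e^{X}(i,j)$ with $j\in\mathrm{Dsn}_X(p)$ and bond-multiplicity $m$ in $G$;

constraints:
\[ e^{\mathrm C}(i)\le\beta^{\mathrm C}(i)\le 3e^{\mathrm C}(i),\qquad i\in[\widetilde{k_{\mathrm C}}+1,m_{\mathrm C}]=I_{(\ge 1)}\cup I_{(0/1)}\cup I_{(=1)}, \tag{43} \]
\[ e^{X}(i)\le\beta^{X}(i)\le 3e^{X}(i),\qquad i\in[2,t_X],\ X\in\{\mathrm T,\mathrm F\}, \tag{44} \]
\[ v^{X}(i,j)\le\beta^{X}(i,j)\le 3v^{X}(i,j),\qquad i\in[1,t_X],\ j\in[1,n_X]\ (n_{\mathrm C}=n_{\mathrm C}(i)),\ X\in\{\mathrm C,\mathrm T,\mathrm F\}, \tag{45} \]
\[ \delta^{\mathrm T}_{\chi}(k)\le\beta^{+}(k)\le 3\delta^{\mathrm T}_{\chi}(k),\quad \delta^{\mathrm T}_{\chi}(k)\le\beta^{-}(k)\le 3\delta^{\mathrm T}_{\chi}(k),\qquad k\in[1,k_{\mathrm C}], \tag{46} \]
\[ \delta^{\mathrm F}_{\chi}(c)\le\beta^{\mathrm{in}}(c)\le 3\delta^{\mathrm F}_{\chi}(c),\qquad c\in[1,c_{\mathrm F}], \tag{47} \]
\[ \sum_{m\in[0,3]}\delta^{X}_{\beta}(i,m)=1,\quad \sum_{m\in[0,3]}m\cdot\delta^{X}_{\beta}(i,m)=\beta^{X}(i),\qquad i\in[2,t_X],\ X\in\{\mathrm T,\mathrm F\}, \tag{48} \]
\[ \sum_{m\in[0,3]}\delta^{\mathrm C}_{\beta}(i,m)=1,\quad \sum_{m\in[0,3]}m\cdot\delta^{\mathrm C}_{\beta}(i,m)=\beta^{\mathrm C}(i),\qquad i\in[\widetilde{k_{\mathrm C}}+1,m_{\mathrm C}], \tag{49} \]
\[ \sum_{m\in[0,3]}\delta^{X}_{\beta}(i,j,m)=1,\quad \sum_{m\in[0,3]}m\cdot\delta^{X}_{\beta}(i,j,m)=\beta^{X}(i,j),\qquad i\in[1,t_X],\ j\in[1,n_X]\ (n_{\mathrm C}=n_{\mathrm C}(i)),\ X\in\{\mathrm C,\mathrm T,\mathrm F\}, \tag{50} \]
\[ \begin{aligned} &\sum_{m\in[0,3]}\delta^{+}_{\beta}(k,m)=1,\quad \sum_{m\in[0,3]}m\cdot\delta^{+}_{\beta}(k,m)=\beta^{+}(k),\qquad k\in[1,k_{\mathrm C}],\\ &\sum_{m\in[0,3]}\delta^{-}_{\beta}(k,m)=1,\quad \sum_{m\in[0,3]}m\cdot\delta^{-}_{\beta}(k,m)=\beta^{-}(k),\qquad k\in[1,k_{\mathrm C}],\\ &\sum_{m\in[0,3]}\delta^{\mathrm{in}}_{\beta}(c,m)=1,\quad \sum_{m\in[0,3]}m\cdot\delta^{\mathrm{in}}_{\beta}(c,m)=\beta^{\mathrm{in}}(c),\qquad c\in[1,c_{\mathrm F}], \end{aligned} \tag{51} \]
\[ \sum_{i\in[1,t_X],\,j\in\mathrm{Dsn}_X(p)}\delta^{X}_{\beta}(i,j,m)=\mathrm{bd}^{X}_{(p)}(m),\qquad p\in[1,\rho],\ X\in\{\mathrm C,\mathrm T,\mathrm F\},\ m\in[1,3], \tag{52} \]
\[ \begin{aligned} &\sum_{i\in[\widetilde{k_{\mathrm C}}+1,m_{\mathrm C}]}\delta^{\mathrm C}_{\beta}(i,m)=\mathrm{bd}^{\mathrm C}(m),\quad \sum_{i\in[2,t_{\mathrm T}]}\delta^{\mathrm T}_{\beta}(i,m)=\mathrm{bd}^{\mathrm T}(m),\\ &\sum_{k\in[1,k_{\mathrm C}]}\delta^{+}_{\beta}(k,m)=\mathrm{bd}^{\mathrm{CT}}(m),\quad \sum_{k\in[1,k_{\mathrm C}]}\delta^{-}_{\beta}(k,m)=\mathrm{bd}^{\mathrm{TC}}(m),\\ &\mathrm{bd}^{\mathrm C}(m)+\mathrm{bd}^{\mathrm T}(m)+\mathrm{bd}^{\mathrm{CT}}(m)+\mathrm{bd}^{\mathrm{TC}}(m)=\mathrm{bd}^{\mathrm{co}}(m),\\ &\sum_{i\in[2,t_{\mathrm F}]}\delta^{\mathrm F}_{\beta}(i,m)=\mathrm{bd}^{\mathrm F}(m),\quad \sum_{c\in[1,\widetilde{t_{\mathrm C}}]}\delta^{\mathrm{in}}_{\beta}(c,m)=\mathrm{bd}^{\mathrm{CF}}(m),\quad \sum_{c\in[\widetilde{t_{\mathrm C}}+1,c_{\mathrm F}]}\delta^{\mathrm{in}}_{\beta}(c,m)=\mathrm{bd}^{\mathrm{TF}}(m),\\ &\mathrm{bd}^{\mathrm F}(m)+\mathrm{bd}^{\mathrm{TF}}(m)+\mathrm{bd}^{\mathrm{CF}}(m)=\mathrm{bd}^{\mathrm{in}}(m),\quad \sum_{p\in[1,\rho],\,X\in\{\mathrm C,\mathrm T,\mathrm F\}}\mathrm{bd}^{X}_{(p)}(m)=\mathrm{bd}^{\mathrm{ex}}(m),\\ &\mathrm{bd}^{\mathrm{co}}(m)+\mathrm{bd}^{\mathrm{in}}(m)+\mathrm{bd}^{\mathrm{ex}}(m)=\mathrm{bd}(m),\qquad m\in[1,3]. \end{aligned} \tag{53} \]

A.6 Assigning Chemical Elements and Valence Condition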
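We include constraints so that each vertex $u$ in a selected graph $H$ satisfies the valence condition; i.e., $\sum_{uv\in E(H)}\beta(uv)\le\mathrm{val}(\alpha(u))$. With these constraints, a chemical $\rho$-lean graph $G=(H,\alpha,\beta)$ on a selected subgraph $H$ will be constructed. Let $I^{*}_{V_{\mathrm C}}$ denote the set of indices $i$ of vertices $v^{\mathrm C}_{i,0}\in V^{*}_{\mathrm C}$ and let $\alpha^{*}(i)$ denote $\alpha^{*}(v^{\mathrm C}_{i,0})$, $i\in I^{*}_{V_{\mathrm C}}$.

constants:
Subsets $\Lambda^{\mathrm{co}},\Lambda^{\mathrm{nc}}\subseteq\Lambda$ of chemical elements, where we denote by $[e]$ (resp., $[e]^{\mathrm{co}}$ and $[e]^{\mathrm{nc}}$) a standard encoding of an element $e$ in the set $\Lambda$ (resp., $\Lambda^{\mathrm{co}}_{\epsilon}$ and $\Lambda^{\mathrm{nc}}_{\epsilon}$);
A valence function $\mathrm{val}:\Lambda\to[1,4]$;
A function $\mathrm{mass}^{*}:\Lambda\to\mathbb{Z}$ (we let $\mathrm{mass}(a)$ denote the observed mass of a chemical element $a\in\Lambda$, and define $\mathrm{mass}^{*}(a)\triangleq\lfloor 10\cdot\mathrm{mass}(a)\rfloor$);

constants for chemical specification $\sigma_{\alpha\beta}$:
Subsets $\Lambda^{*}(i)\subseteq\Lambda^{\mathrm{co}}$, $i\in[1,t_{\mathrm C}]$;
$\mathrm{na}_{\mathrm{LB}}(a),\mathrm{na}_{\mathrm{UB}}(a)\in[0,n^{*}]$, $a\in\Lambda$: lower and upper bounds on the number of vertices $v$ with $\alpha(v)=a$;
$\mathrm{na}^{t}_{\mathrm{LB}}(a),\mathrm{na}^{t}_{\mathrm{UB}}(a)\in[0,n^{*}]$, $a\in\Lambda^{t}$, $t\in\{\mathrm{co},\mathrm{nc}\}$: lower and upper bounds on the number of core-vertices (or non-core-vertices) $v$ with $\alpha(v)=a$;

variables:
$\beta^{\mathrm{CT}}(i),\beta^{\mathrm{TC}}(i)\in[0,3]$, $i\in[1,t_{\mathrm T}]$: the bond-multiplicity of edge $e^{\mathrm{CT}}_{j,i}$ (resp., $e^{\mathrm{TC}}_{j,i}$) if one exists;
$\beta^{\mathrm{CF}}(i),\beta^{\mathrm{TF}}(i)\in[0,3]$, $i\in[1,t_{\mathrm F}]$: the bond-multiplicity of $e^{\mathrm{CF}}_{j,i}$ (resp., $e^{\mathrm{TF}}_{j,i}$) if one exists;
$\alpha^{X}(i,0)\in[\Lambda^{\mathrm{co}}_{\epsilon}]$, $\delta^{X}_{\alpha}(i,0,[a]^{\mathrm{co}})\in[0,1]$, $a\in\Lambda^{\mathrm{co}}_{\epsilon}$, $i\in[1,t_X]$, $X\in\{\mathrm C,\mathrm T\}$;
$\alpha^{\mathrm F}(i,j)\in[\Lambda^{\mathrm{nc}}_{\epsilon}]$, $\delta^{\mathrm F}_{\alpha}(i,j,[a]^{\mathrm{nc}})\in[0,1]$, $a\in\Lambda^{\mathrm{nc}}_{\epsilon}$, $i\in[1,t_{\mathrm F}]$, $j\in[0,n_{\mathrm F}]$;
$\alpha^{X}(i,j)\in[\Lambda^{\mathrm{nc}}_{\epsilon}]$, $\delta^{X}_{\alpha}(i,j,[a]^{\mathrm{nc}})\in[0,1]$, $a\in\Lambda^{\mathrm{nc}}_{\epsilon}$, $i\in[1,t_X]$, $j\in[1,n_X]$ ($n_{\mathrm C}=n_{\mathrm C}(i)$), $X\in\{\mathrm C,\mathrm T,\mathrm F\}$:
$\alpha^{X}(i,j)=[a]^{t}\ge 1$, $t\in\{\mathrm{co},\mathrm{nc}\}$ (resp., $\alpha^{X}(i,j)=0$) ⇔ $\delta^{X}_{\alpha}(i,j,[a]^{t})=1$ (resp., $\delta^{X}_{\alpha}(i,j,0)=1$) ⇔ $\alpha(v^{X}_{i,j})=a\in\Lambda$ (resp., vertex $v^{X}_{i,j}$ is not used in $G$);
$\mathrm{Mass}\in\mathbb{Z}_{+}$: $\sum_{v\in V}\mathrm{mass}^{*}(\alpha(v))$;
$n_{\mathrm H}\in[0,n^{*}]$: the number of hydrogen atoms to be included in $G$;

variables for chemical specification $\sigma_{\alpha\beta}$:
$\mathrm{na}([a])\in[\mathrm{na}_{\mathrm{LB}}(a),\mathrm{na}_{\mathrm{UB}}(a)]$, $a\in\Lambda$: the number of vertices $v$ with $\alpha(v)=a$;
$\mathrm{na}^{\mathrm{co}}([a]^{\mathrm{co}}),\mathrm{na}^{\mathrm C}([a]^{\mathrm{co}}),\mathrm{na}^{\mathrm T}([a]^{\mathrm{co}})\in[\mathrm{na}^{\mathrm{co}}_{\mathrm{LB}}(a),\mathrm{na}^{\mathrm{co}}_{\mathrm{UB}}(a)]$, $a\in\Lambda$: the number of core-vertices $v\in V(G)$ (resp., $v\in V(G)\cap V_{\mathrm C}$ and $v\in V(G)\cap V_{\mathrm T}$) with $\alpha(v)=a$;
$\mathrm{na}^{\mathrm{nc}}([a]^{\mathrm{nc}}),\mathrm{na}^{\mathrm{in}}([a]^{\mathrm{nc}}),\mathrm{na}^{X}_{(p)}([a]^{\mathrm{nc}})\in[\mathrm{na}^{\mathrm{nc}}_{\mathrm{LB}}(a),\mathrm{na}^{\mathrm{nc}}_{\mathrm{UB}}(a)]$, $a\in\Lambda$, $X\in\{\mathrm C,\mathrm T,\mathrm F\}$: the number of non-core-vertices $v\in V(G)$ (resp., $\rho$-internal vertices $v\in V(G)\cap V_{\mathrm F}$ and $\rho$-external-vertices $v\in V(G)\cap V(X_i)$ with depth $p\in[1,\rho]$) such that $\alpha(v)=a$;

constraints:
\[ \begin{aligned} &\beta^{+}(k)-3(e^{\mathrm T}(i)-\chi^{\mathrm T}(i,k)+1)\le\beta^{\mathrm{CT}}(i)\le\beta^{+}(k)+3(e^{\mathrm T}(i)-\chi^{\mathrm T}(i,k)+1),\qquad i\in[1,t_{\mathrm T}],\\ &\beta^{-}(k)-3(e^{\mathrm T}(i+1)-\chi^{\mathrm T}(i,k)+1)\le\beta^{\mathrm{TC}}(i)\le\beta^{-}(k)+3(e^{\mathrm T}(i+1)-\chi^{\mathrm T}(i,k)+1),\qquad i\in[1,t_{\mathrm T}],\\ &k\in[1,k_{\mathrm C}], \end{aligned} \tag{54} \]
\[ \begin{aligned} &\beta^{\mathrm{in}}(c)-3(e^{\mathrm F}(i)-\chi^{\mathrm F}(i,c)+1)\le\beta^{\mathrm{CF}}(i)\le\beta^{\mathrm{in}}(c)+3(e^{\mathrm F}(i)-\chi^{\mathrm F}(i,c)+1),\qquad i\in[1,t_{\mathrm F}],\ c\in[1,\widetilde{t_{\mathrm C}}],\\ &\beta^{\mathrm{in}}(c)-3(e^{\mathrm F}(i)-\chi^{\mathrm F}(i,c)+1)\le\beta^{\mathrm{TF}}(i)\le\beta^{\mathrm{in}}(c)+3(e^{\mathrm F}(i)-\chi^{\mathrm F}(i,c)+1),\qquad i\in[1,t_{\mathrm F}],\ c\in[\widetilde{t_{\mathrm C}}+1,c_{\mathrm F}], \end{aligned} \tag{55} \]
\[ \sum_{a\in\Lambda^{\mathrm{co}}_{\epsilon}}\delta^{X}_{\alpha}(i,0,[a]^{\mathrm{co}})=1,\quad \sum_{a\in\Lambda^{\mathrm{co}}_{\epsilon}}[a]^{\mathrm{co}}\cdot\delta^{X}_{\alpha}(i,0,[a]^{\mathrm{co}})=\alpha^{X}(i,0),\qquad i\in[1,t_X],\ X\in\{\mathrm C,\mathrm T\}, \tag{56} \]
\[ \sum_{a\in\Lambda^{\mathrm{nc}}_{\epsilon}}\delta^{\mathrm F}_{\alpha}(i,0,[a]^{\mathrm{nc}})=1,\quad \sum_{a\in\Lambda^{\mathrm{nc}}_{\epsilon}}[a]^{\mathrm{nc}}\cdot\delta^{\mathrm F}_{\alpha}(i,0,[a]^{\mathrm{nc}})=\alpha^{\mathrm F}(i,0),\qquad i\in[1,t_{\mathrm F}], \tag{57} \]
\[ \sum_{a\in\Lambda^{\mathrm{nc}}_{\epsilon}}\delta^{X}_{\alpha}(i,j,[a]^{\mathrm{nc}})=1,\quad \sum_{a\in\Lambda^{\mathrm{nc}}_{\epsilon}}[a]^{\mathrm{nc}}\cdot\delta^{X}_{\alpha}(i,j,[a]^{\mathrm{nc}})=\alpha^{X}(i,j),\qquad i\in[1,t_X],\ j\in[1,n_X]\ (n_{\mathrm C}=n_{\mathrm C}(i)),\ X\in\{\mathrm C,\mathrm T,\mathrm F\}, \tag{58} \]
\[ \sum_{j\in E_{\mathrm C}(i)}\beta^{\mathrm C}(j)+\sum_{k\in I^{+}_{(\ge 2)}(i)\cup I^{+}_{(\ge 1)}(i)}\beta^{+}(k)+\sum_{k\in I^{-}_{(\ge 2)}(i)\cup I^{-}_{(\ge 1)}(i)}\beta^{-}(k)+\beta^{\mathrm{in}}(i)+\sum_{h\in\mathrm{Cld}_{\mathrm C}(0)}\beta^{\mathrm C}(i,h)\le\sum_{a\in\Lambda^{\mathrm{co}}}\mathrm{val}(a)\,\delta^{\mathrm C}_{\alpha}(i,0,[a]^{\mathrm{co}}),\qquad i\in[1,\widetilde{t_{\mathrm C}}], \tag{59} \]
\[ \sum_{j\in E_{\mathrm C}(i)}\beta^{\mathrm C}(j)+\sum_{k\in I^{+}_{(\ge 2)}(i)\cup I^{+}_{(\ge 1)}(i)}\beta^{+}(k)+\sum_{k\in I^{-}_{(\ge 2)}(i)\cup I^{-}_{(\ge 1)}(i)}\beta^{-}(k)+\sum_{h\in\mathrm{Cld}_{\mathrm C}(0)}\beta^{\mathrm C}(i,h)\le\sum_{a\in\Lambda^{\mathrm{co}}}\mathrm{val}(a)\,\delta^{\mathrm C}_{\alpha}(i,0,[a]^{\mathrm{co}}),\qquad i\in[\widetilde{t_{\mathrm C}}+1,t_{\mathrm C}], \tag{60} \]
\[ \beta^{\mathrm T}(i)+\beta^{\mathrm T}(i+1)+\sum_{h\in\mathrm{Cld}_{\mathrm T}(0)}\beta^{\mathrm T}(i,h)+\beta^{\mathrm{CT}}(i)+\beta^{\mathrm{TC}}(i)+\beta^{\mathrm{in}}(\widetilde{t_{\mathrm C}}+i)\le\sum_{a\in\Lambda^{\mathrm{co}}}\mathrm{val}(a)\,\delta^{\mathrm T}_{\alpha}(i,0,[a]^{\mathrm{co}}),\qquad i\in[1,t_{\mathrm T}]\ (\beta^{\mathrm T}(1)=\beta^{\mathrm T}(t_{\mathrm T}+1)=0), \tag{61} \]
\[ \beta^{\mathrm F}(i)+\beta^{\mathrm F}(i+1)+\beta^{\mathrm{CF}}(i)+\beta^{\mathrm{TF}}(i)+\sum_{h\in\mathrm{Cld}_{\mathrm F}(0)}\beta^{\mathrm F}(i,h)\le\sum_{a\in\Lambda^{\mathrm{nc}}}\mathrm{val}(a)\,\delta^{\mathrm F}_{\alpha}(i,0,[a]^{\mathrm{nc}}),\qquad i\in[1,t_{\mathrm F}]\ (\beta^{\mathrm F}(1)=\beta^{\mathrm F}(t_{\mathrm F}+1)=0), \tag{62} \]
\[ \beta^{X}(i,j)+\sum_{h\in\mathrm{Cld}_X(j)}\beta^{X}(i,h)\le\sum_{a\in\Lambda^{\mathrm{nc}}}\mathrm{val}(a)\,\delta^{X}_{\alpha}(i,j,[a]^{\mathrm{nc}}),\qquad i\in[1,t_X],\ j\in[1,n_X]\ (n_{\mathrm C}=n_{\mathrm C}(i)),\ X\in\{\mathrm C,\mathrm T,\mathrm F\}, \tag{63} \]
\[ \begin{aligned} &\sum_{i\in[1,t_{\mathrm C}]}\delta^{\mathrm C}_{\alpha}(i,0,[a]^{\mathrm{co}})=\mathrm{na}^{\mathrm C}([a]^{\mathrm{co}}),\quad \sum_{i\in[1,t_{\mathrm T}]}\delta^{\mathrm T}_{\alpha}(i,0,[a]^{\mathrm{co}})=\mathrm{na}^{\mathrm T}([a]^{\mathrm{co}}),\qquad a\in\Lambda^{\mathrm{co}},\\ &\sum_{i\in[1,t_{\mathrm F}]}\delta^{\mathrm F}_{\alpha}(i,0,[a]^{\mathrm{nc}})=\mathrm{na}^{\mathrm{in}}([a]^{\mathrm{nc}}),\qquad a\in\Lambda^{\mathrm{nc}}, \end{aligned} \tag{64} \]
\[ \sum_{i\in[1,t_X],\,j\in\mathrm{Dsn}_X(p)}\delta^{X}_{\alpha}(i,j,[a]^{\mathrm{nc}})=\mathrm{na}^{X}_{(p)}([a]^{\mathrm{nc}}),\qquad p\in[1,\rho],\ X\in\{\mathrm C,\mathrm T,\mathrm F\},\ a\in\Lambda^{\mathrm{nc}}, \tag{65} \]
\[ \begin{aligned} &\mathrm{na}^{\mathrm C}([a]^{\mathrm{co}})+\mathrm{na}^{\mathrm T}([a]^{\mathrm{co}})=\mathrm{na}^{\mathrm{co}}([a]^{\mathrm{co}}),\qquad a\in\Lambda^{\mathrm{co}},\\ &\sum_{p\in[1,\rho],\,X\in\{\mathrm C,\mathrm T,\mathrm F\}}\mathrm{na}^{X}_{(p)}([a]^{\mathrm{nc}})=\mathrm{na}^{\mathrm{ex}}([a]^{\mathrm{nc}}),\quad \mathrm{na}^{\mathrm{in}}([a]^{\mathrm{nc}})+\mathrm{na}^{\mathrm{ex}}([a]^{\mathrm{nc}})=\mathrm{na}^{\mathrm{nc}}([a]^{\mathrm{nc}}),\qquad a\in\Lambda^{\mathrm{nc}},\\ &\mathrm{na}^{\mathrm{co}}([a]^{\mathrm{co}})+\mathrm{na}^{\mathrm{nc}}([a]^{\mathrm{nc}})=\mathrm{na}([a]),\qquad a\in\Lambda^{\mathrm{co}}\cap\Lambda^{\mathrm{nc}},\\ &\mathrm{na}^{\mathrm{co}}([a]^{\mathrm{co}})=\mathrm{na}([a]),\qquad a\in\Lambda^{\mathrm{co}}\setminus\Lambda^{\mathrm{nc}},\\ &\mathrm{na}^{\mathrm{nc}}([a]^{\mathrm{nc}})=\mathrm{na}([a]),\qquad a\in\Lambda^{\mathrm{nc}}\setminus\Lambda^{\mathrm{co}}, \end{aligned} \tag{66} \]
\[ \sum_{a\in\Lambda}\mathrm{mass}^{*}(a)\cdot\mathrm{na}([a])=\mathrm{Mass}, \tag{67} \]
\[ \sum_{a\in\Lambda}\mathrm{val}(a)\cdot\mathrm{na}([a])-2\sum_{m\in[1,3],\,t\in\{\mathrm{co},\mathrm{in},\mathrm{ex}\}}m\cdot\mathrm{bd}^{t}(m)=n_{\mathrm H}, \tag{68} \]
\[ \sum_{a\in\Lambda^{*}(i)}\delta^{\mathrm C}_{\alpha}(i,0,[a]^{\mathrm{co}})=1,\qquad i\in[1,t_{\mathrm C}]. \tag{69} \]

A.7 Constraints for Bounds on the Number of Bonds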
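The valence condition stated at the start of Section A.6, $\sum_{uv\in E(H)}\beta(uv)\le\mathrm{val}(\alpha(u))$, can be checked directly on a small hydrogen-suppressed graph. The sketch below is only an illustration of that inequality, not of the MILP itself. The helper name `valence_violations`, the dictionary-based graph representation, and the four-element valence table are assumptions introduced here.

```python
# Usual valences of a few common elements (illustrative table only).
VALENCE = {"H": 1, "O": 2, "N": 3, "C": 4}


def valence_violations(alpha, beta):
    """Return {vertex: bond load} for every vertex whose total incident
    bond-multiplicity exceeds the valence of its element.

    alpha: dict mapping vertex -> element symbol
    beta:  dict mapping an edge (u, v) -> bond-multiplicity in {1, 2, 3}
    """
    load = {v: 0 for v in alpha}
    for (u, v), m in beta.items():
        load[u] += m  # each bond consumes valence at both endpoints
        load[v] += m
    return {v: load[v] for v in alpha if load[v] > VALENCE[alpha[v]]}


if __name__ == "__main__":
    # C-C-O skeleton with single bonds: every vertex is within its valence.
    print(valence_violations({1: "C", 2: "C", 3: "O"}, {(1, 2): 1, (2, 3): 1}))
    # An oxygen carrying a double and a single bond has load 3 > val(O) = 2.
    print(valence_violations({1: "C", 2: "O", 3: "C"}, {(1, 2): 2, (2, 3): 1}))
```

A graph passes the check exactly when the returned dictionary is empty.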
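We include constraints for specification of lower and upper bounds $\mathrm{bd}_{\mathrm{LB}}$ and $\mathrm{bd}_{\mathrm{UB}}$.

constants for chemical specification $\sigma_{\alpha\beta}$:
$\mathrm{bd}_{m,\mathrm{LB}}(i),\mathrm{bd}_{m,\mathrm{UB}}(i)\in[0,\mathrm{cs}_{\mathrm{UB}}]$, $i\in[1,m_{\mathrm C}]$, $m\in[2,3]$: lower and upper bounds on the number of edges $e\in E(P_i)$ with bond-multiplicity $m$;

variables for chemical specification $\sigma_{\alpha\beta}$:
$\mathrm{bd}_{\mathrm T}(k,i,m)\in[0,1]$, $k\in[1,k_{\mathrm C}]$, $i\in[2,t_{\mathrm T}]$, $m\in[2,3]$: $\mathrm{bd}_{\mathrm T}(k,i,m)=1$ ⇔ path $P_k$ contains edge $e^{\mathrm T}_i$ and $\beta(e^{\mathrm T}_i)=m$;

constraints:
\[ \mathrm{bd}_{m,\mathrm{LB}}(i)\le\delta^{\mathrm C}_{\beta}(i,m)\le\mathrm{bd}_{m,\mathrm{UB}}(i),\qquad i\in I_{(=1)}\cup I_{(0/1)},\ m\in[2,3], \tag{70} \]
\[ \mathrm{bd}_{\mathrm T}(k,i,m)\ge\delta^{\mathrm T}_{\beta}(i,m)+\chi^{\mathrm T}(i,k)-1,\qquad k\in[1,k_{\mathrm C}],\ i\in[2,t_{\mathrm T}],\ m\in[2,3], \tag{71} \]
\[ \sum_{j\in[2,t_{\mathrm T}]}\delta^{\mathrm T}_{\beta}(j,m)\ge\sum_{k\in[1,k_{\mathrm C}],\,i\in[2,t_{\mathrm T}]}\mathrm{bd}_{\mathrm T}(k,i,m),\qquad m\in[2,3], \tag{72} \]
\[ \mathrm{bd}_{m,\mathrm{LB}}(k)\le\sum_{i\in[2,t_{\mathrm T}]}\mathrm{bd}_{\mathrm T}(k,i,m)+\delta^{+}_{\beta}(k,m)+\delta^{-}_{\beta}(k,m)\le\mathrm{bd}_{m,\mathrm{UB}}(k),\qquad k\in[1,k_{\mathrm C}],\ m\in[2,3]. \tag{73} \]

A.8 Descriptor for the Number of Adjacency-configurations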
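We call a tuple $(a,b,m)\in\Lambda\times\Lambda\times[1,3]$ an adjacency-configuration. The adjacency-configuration of an edge-configuration $(\mu=ad,\mu'=bd',m)$ is defined to be $(a,b,m)$. We include constraints to compute the frequency of each adjacency-configuration in an inferred chemical graph $G$.

constants:
Sets $\Gamma^{\mathrm{co}}$, $\Gamma^{\mathrm{in}}$ and $\Gamma^{\mathrm{ex}}$ of edge-configurations, where $\mu\le\xi$ for any edge-configuration $\gamma=(\mu,\xi,m)\in\Gamma^{\mathrm{co}}$;
Let $\overline{\gamma}$ of an edge-configuration $\gamma=(\mu,\xi,m)$ denote the edge-configuration $(\xi,\mu,m)$;
Let $\Gamma^{\mathrm{co}}_{<}=\{(\mu,\xi,m)\in\Gamma^{\mathrm{co}}\mid\mu<\xi\}$, $\Gamma^{\mathrm{co}}_{=}=\{(\mu,\xi,m)\in\Gamma^{\mathrm{co}}\mid\mu=\xi\}$ and $\Gamma^{\mathrm{co}}_{>}=\{\overline{\gamma}\mid\gamma\in\Gamma^{\mathrm{co}}_{<}\}$;
Let $\Gamma^{\mathrm{co}}_{\mathrm{ac}}$, $\Gamma^{\mathrm{in}}_{\mathrm{ac}}$, $\Gamma^{\mathrm{ex}}_{\mathrm{ac}}$, $\Gamma^{\mathrm{co}}_{\mathrm{ac},<}$, $\Gamma^{\mathrm{co}}_{\mathrm{ac},=}$ and $\Gamma^{\mathrm{co}}_{\mathrm{ac},>}$ denote the sets of the adjacency-configurations of edge-configurations in the sets $\Gamma^{\mathrm{co}}$, $\Gamma^{\mathrm{in}}$, $\Gamma^{\mathrm{ex}}$, $\Gamma^{\mathrm{co}}_{<}$, $\Gamma^{\mathrm{co}}_{=}$ and $\Gamma^{\mathrm{co}}_{>}$, respectively;
Let $\overline{\nu}$ of an adjacency-configuration $\nu=(a,b,m)$ denote the adjacency-configuration $(b,a,m)$;
Prepare a coding of each of the three sets $\Gamma^{\mathrm{co}}_{\mathrm{ac}}\cup\Gamma^{\mathrm{co}}_{\mathrm{ac},>}$, $\Gamma^{\mathrm{in}}_{\mathrm{ac}}$ and $\Gamma^{\mathrm{ex}}_{\mathrm{ac}}$ and let $[\nu]^{\mathrm{co}}$ (resp., $[\nu]^{\mathrm{in}}$ and $[\nu]^{\mathrm{ex}}$) denote the coded integer of an element $\nu$ in $\Gamma^{\mathrm{co}}_{\mathrm{ac}}\cup\Gamma^{\mathrm{co}}_{\mathrm{ac},>}$ (resp., $\Gamma^{\mathrm{in}}_{\mathrm{ac}}$ and $\Gamma^{\mathrm{ex}}_{\mathrm{ac}}$);
Choose subsets $\widetilde{\Gamma}^{\mathrm C}_{\mathrm{ac}},\widetilde{\Gamma}^{\mathrm T}_{\mathrm{ac}},\widetilde{\Gamma}^{\mathrm{CT}}_{\mathrm{ac}},\widetilde{\Gamma}^{\mathrm{TC}}_{\mathrm{ac}}\subseteq\Gamma^{\mathrm{co}}_{\mathrm{ac}}\cup\Gamma^{\mathrm{co}}_{\mathrm{ac},>}$; $\widetilde{\Gamma}^{\mathrm F}_{\mathrm{ac}},\widetilde{\Gamma}^{\mathrm{CF}}_{\mathrm{ac}},\widetilde{\Gamma}^{\mathrm{TF}}_{\mathrm{ac}}\subseteq\Gamma^{\mathrm{in}}_{\mathrm{ac}}$; $\widetilde{\Gamma}^{\mathrm{ex}}_{\mathrm{ac}}\subseteq\Gamma^{\mathrm{ex}}_{\mathrm{ac}}$. To compute the frequency of adjacency-configurations exactly, set $\widetilde{\Gamma}^{\mathrm C}_{\mathrm{ac}}:=\widetilde{\Gamma}^{\mathrm T}_{\mathrm{ac}}:=\widetilde{\Gamma}^{\mathrm{CT}}_{\mathrm{ac}}:=\widetilde{\Gamma}^{\mathrm{TC}}_{\mathrm{ac}}:=\Gamma^{\mathrm{co}}_{\mathrm{ac}}\cup\Gamma^{\mathrm{co}}_{\mathrm{ac},>}$; $\widetilde{\Gamma}^{\mathrm F}_{\mathrm{ac}}:=\widetilde{\Gamma}^{\mathrm{CF}}_{\mathrm{ac}}:=\widetilde{\Gamma}^{\mathrm{TF}}_{\mathrm{ac}}:=\Gamma^{\mathrm{in}}_{\mathrm{ac}}$; $\widetilde{\Gamma}^{\mathrm{ex}}_{\mathrm{ac}}:=\Gamma^{\mathrm{ex}}_{\mathrm{ac}}$;
$\mathrm{ac}^{\mathrm{co}}_{\mathrm{LB}}(\nu),\mathrm{ac}^{\mathrm{co}}_{\mathrm{UB}}(\nu)\in[0,m_{\mathrm{UB}}]$, $\nu\in\Gamma^{\mathrm{co}}_{\mathrm{ac}}$; $\mathrm{ac}^{\mathrm{in}}_{\mathrm{LB}}(\nu),\mathrm{ac}^{\mathrm{in}}_{\mathrm{UB}}(\nu)\in[0,n^{*}]$, $\nu\in\Gamma^{\mathrm{in}}_{\mathrm{ac}}$; $\mathrm{ac}^{\mathrm{ex}}_{\mathrm{LB}}(\nu),\mathrm{ac}^{\mathrm{ex}}_{\mathrm{UB}}(\nu)\in[0,n^{*}]$, $\nu\in\Gamma^{\mathrm{ex}}_{\mathrm{ac}}$: lower and upper bounds on the number of core-edges $e=uv$ (resp., $\rho$-internal edges and $\rho$-external edges $e=(u,v)$) with $\alpha(u)=a$, $\alpha(v)=b$ and $\beta(e)=m$;

variables:
$\mathrm{ac}^{\mathrm{co}}([\nu]^{\mathrm{co}})\in[\mathrm{ac}^{\mathrm{co}}_{\mathrm{LB}}(\nu),\mathrm{ac}^{\mathrm{co}}_{\mathrm{UB}}(\nu)]$, $\nu\in\Gamma^{\mathrm{co}}_{\mathrm{ac}}$; $\mathrm{ac}^{\mathrm{in}}([\nu]^{\mathrm{in}})\in[\mathrm{ac}^{\mathrm{in}}_{\mathrm{LB}}(\nu),\mathrm{ac}^{\mathrm{in}}_{\mathrm{UB}}(\nu)]$, $\nu\in\Gamma^{\mathrm{in}}_{\mathrm{ac}}$; $\mathrm{ac}^{\mathrm{ex}}([\nu]^{\mathrm{ex}})\in[\mathrm{ac}^{\mathrm{ex}}_{\mathrm{LB}}(\nu),\mathrm{ac}^{\mathrm{ex}}_{\mathrm{UB}}(\nu)]$, $\nu\in\Gamma^{\mathrm{ex}}_{\mathrm{ac}}$: the number of core-edges (resp., $\rho$-internal edges, $\rho$-external edges and non-core-edges) with adjacency-configuration $\nu$;
$\mathrm{ac}^{\mathrm C}([\nu]^{\mathrm{co}})\in[0,m_{\mathrm C}]$, $\nu\in\widetilde{\Gamma}^{\mathrm C}_{\mathrm{ac}}$; $\mathrm{ac}^{\mathrm T}([\nu]^{\mathrm{co}})\in[0,t_{\mathrm T}]$, $\nu\in\widetilde{\Gamma}^{\mathrm T}_{\mathrm{ac}}$; $\mathrm{ac}^{\mathrm F}([\nu]^{\mathrm{in}})\in[0,t_{\mathrm F}]$, $\nu\in\widetilde{\Gamma}^{\mathrm F}_{\mathrm{ac}}$: the number of core-edges $e^{\mathrm C}\in E_{\mathrm C}$ (resp., core-edges $e^{\mathrm T}\in E_{\mathrm T}$ and $\rho$-internal edges $e^{\mathrm F}\in E_{\mathrm F}$) with adjacency-configuration $\nu$;
$\mathrm{ac}^{X}_{(p)}([\nu]^{\mathrm{ex}})\in[1,n_X]$, $\nu\in\widetilde{\Gamma}^{\mathrm{ex}}_{\mathrm{ac}}$, $p\in[1,\rho]$, $X\in\{\mathrm C,\mathrm T,\mathrm F\}$: the number of $\rho$-external edges with depth $p$ and adjacency-configuration $\nu$ in the $\rho$-fringe-tree in $X_i$;
$\mathrm{ac}^{\mathrm{CT}}([\nu]^{\mathrm{co}})\in[0,\min\{k_{\mathrm C},t_{\mathrm T}\}]$, $\nu\in\widetilde{\Gamma}^{\mathrm{CT}}_{\mathrm{ac}}$; $\mathrm{ac}^{\mathrm{TC}}([\nu]^{\mathrm{co}})\in[0,\min\{k_{\mathrm C},t_{\mathrm T}\}]$, $\nu\in\widetilde{\Gamma}^{\mathrm{CT}}_{\mathrm{ac}}$; $\mathrm{ac}^{\mathrm{CF}}([\nu]^{\mathrm{in}})\in[0,\widetilde{t_{\mathrm C}}]$, $\nu\in\widetilde{\Gamma}^{\mathrm{CF}}_{\mathrm{ac}}$; $\mathrm{ac}^{\mathrm{TF}}([\nu]^{\mathrm{in}})\in[0,t_{\mathrm T}]$, $\nu\in\widetilde{\Gamma}^{\mathrm{TF}}_{\mathrm{ac}}$: the number of core-edges $e^{\mathrm{CT}}\in E_{\mathrm{CT}}$ (resp., core-edges $e^{\mathrm{TC}}\in E_{\mathrm{TC}}$ and $\rho$-internal edges $e^{\mathrm{CF}}\in E_{\mathrm{CF}}$ and $e^{\mathrm{TF}}\in E_{\mathrm{TF}}$) with adjacency-configuration $\nu$;
$\delta^{\mathrm C}_{\mathrm{ac}}(i,[\nu]^{\mathrm{co}})\in[0,1]$, $i\in[\widetilde{k_{\mathrm C}}+1,m_{\mathrm C}]=I_{(\ge 1)}\cup I_{(0/1)}\cup I_{(=1)}$, $\nu\in\widetilde{\Gamma}^{\mathrm C}_{\mathrm{ac}}$; $\delta^{\mathrm T}_{\mathrm{ac}}(i,[\nu]^{\mathrm{co}})\in[0,1]$, $i\in[2,t_{\mathrm T}]$, $\nu\in\widetilde{\Gamma}^{\mathrm T}_{\mathrm{ac}}$; $\delta^{\mathrm F}_{\mathrm{ac}}(i,[\nu]^{\mathrm{in}})\in[0,1]$, $i\in[2,t_{\mathrm F}]$, $\nu\in\widetilde{\Gamma}^{\mathrm F}_{\mathrm{ac}}$: $\delta^{X}_{\mathrm{ac}}(i,[\nu]^{t})=1$ ⇔ edge $e^{X}_i$ has adjacency-configuration $\nu$;
$\delta^{X}_{\mathrm{ac}}(i,j,[\nu]^{\mathrm{ex}})\in[0,1]$, $i\in[1,t_X]$, $j\in[1,n_X]$, $\nu\in\widetilde{\Gamma}^{\mathrm{ex}}_{\mathrm{ac}}$, $X\in\{\mathrm C,\mathrm T,\mathrm F\}$: $\delta^{X}_{\mathrm{ac}}(i,j,[\nu]^{\mathrm{ex}})=1$ ⇔ $\rho$-external edge $e^{X}_{i,j}$ has adjacency-configuration $\nu$;
$\delta^{\mathrm{CT}}_{\mathrm{ac}}(k,[\nu]^{\mathrm{co}}),\delta^{\mathrm{TC}}_{\mathrm{ac}}(k,[\nu]^{\mathrm{co}})\in[0,1]$, $k\in[1,k_{\mathrm C}]=I_{(\ge 2)}\cup I_{(\ge 1)}$, $\nu\in\widetilde{\Gamma}^{\mathrm{CT}}_{\mathrm{ac}}$: $\delta^{\mathrm{CT}}_{\mathrm{ac}}(k,[\nu]^{\mathrm{co}})=1$ (resp., $\delta^{\mathrm{TC}}_{\mathrm{ac}}(k,[\nu]^{\mathrm{co}})=1$) ⇔ edge $e^{\mathrm{CT}}_{\mathrm{tail}_{\mathrm C}(k),j}$ (resp., $e^{\mathrm{TC}}_{\mathrm{head}_{\mathrm C}(k),j}$) for some $j\in[1,t_{\mathrm T}]$ has adjacency-configuration $\nu$;
$\delta^{\mathrm{CF}}_{\mathrm{ac}}(c,[\nu]^{\mathrm{in}})\in[0,1]$, $c\in[1,\widetilde{t_{\mathrm C}}]$, $\nu\in\widetilde{\Gamma}^{\mathrm{CF}}_{\mathrm{ac}}$: $\delta^{\mathrm{CF}}_{\mathrm{ac}}(c,[\nu]^{\mathrm{in}})=1$ ⇔ edge $e^{\mathrm{CF}}_{c,i}$ for some $i\in[1,t_{\mathrm F}]$ has adjacency-configuration $\nu$;
$\delta^{\mathrm{TF}}_{\mathrm{ac}}(i,[\nu]^{\mathrm{in}})\in[0,1]$, $i\in[1,t_{\mathrm T}]$, $\nu\in\widetilde{\Gamma}^{\mathrm{TF}}_{\mathrm{ac}}$: $\delta^{\mathrm{TF}}_{\mathrm{ac}}(i,[\nu]^{\mathrm{in}})=1$ ⇔ edge $e^{\mathrm{TF}}_{i,j}$ for some $j\in[1,t_{\mathrm F}]$ has adjacency-configuration $\nu$;
$\alpha^{\mathrm{CT}}(k),\alpha^{\mathrm{TC}}(k)\in[0,|\Lambda^{\mathrm{co}}|]$, $k\in[1,k_{\mathrm C}]$: $\alpha(v)$ of the edge $(v^{\mathrm C}_{\mathrm{tail}(k)},v)\in E_{\mathrm{CT}}$ (resp., $(v,v^{\mathrm C}_{\mathrm{head}(k)})\in E_{\mathrm{TC}}$) if any;
$\alpha^{\mathrm{CF}}(c)\in[0,|\Lambda^{\mathrm{nc}}|]$, $c\in[1,\widetilde{t_{\mathrm C}}]$: $\alpha(v)$ of the edge $(v^{\mathrm C}_{c,0},v)\in E_{\mathrm{CF}}$ if any;
$\alpha^{\mathrm{TF}}(i)\in[0,|\Lambda^{\mathrm{nc}}|]$, $i\in[1,t_{\mathrm T}]$: $\alpha(v)$ of the edge $(v^{\mathrm T}_{i,0},v)\in E_{\mathrm{TF}}$ if any;
$\Delta^{\mathrm C+}_{\mathrm{ac}}(i),\Delta^{\mathrm C-}_{\mathrm{ac}}(i)\in[0,|\Lambda^{\mathrm{co}}|]$, $i\in[\widetilde{k_{\mathrm C}}+1,m_{\mathrm C}]$; $\Delta^{\mathrm T+}_{\mathrm{ac}}(i),\Delta^{\mathrm T-}_{\mathrm{ac}}(i)\in[0,|\Lambda^{\mathrm{co}}|]$, $i\in[2,t_{\mathrm T}]$; $\Delta^{\mathrm F+}_{\mathrm{ac}}(i),\Delta^{\mathrm F-}_{\mathrm{ac}}(i)\in[0,|\Lambda^{\mathrm{nc}}|]$, $i\in[2,t_{\mathrm F}]$: $\Delta^{X+}_{\mathrm{ac}}(i)=\Delta^{X-}_{\mathrm{ac}}(i)=0$ (resp., $\Delta^{X+}_{\mathrm{ac}}(i)=\alpha(u)$ and $\Delta^{X-}_{\mathrm{ac}}(i)=\alpha(v)$) ⇔ edge $e^{X}_i=(u,v)\in E_X$ is used in $G$ (resp., $e^{X}_i\notin E(G)$);
$\Delta^{X+}_{\mathrm{ac}}(i,j),\Delta^{X-}_{\mathrm{ac}}(i,j)\in[0,|\Lambda^{\mathrm{nc}}|]$, $i\in[1,t_X]$, $j\in[1,n_X]$, $X\in\{\mathrm C,\mathrm T,\mathrm F\}$: $\Delta^{X+}_{\mathrm{ac}}(i,j)=\Delta^{X-}_{\mathrm{ac}}(i,j)=0$ (resp., $\Delta^{X+}_{\mathrm{ac}}(i,j)=\alpha(u)$ and $\Delta^{X-}_{\mathrm{ac}}(i,j)=\alpha(v)$) ⇔ $\rho$-external edge $e^{X}_{i,j}=(u,v)$ is used in $G$ (resp., $e^{X}_{i,j}\notin E(G)$);
$\Delta^{\mathrm{CT}+}_{\mathrm{ac}}(k),\Delta^{\mathrm{CT}-}_{\mathrm{ac}}(k)\in[0,|\Lambda^{\mathrm{co}}|]$, $k\in[1,k_{\mathrm C}]=I_{(\ge 2)}\cup I_{(\ge 1)}$: $\Delta^{\mathrm{CT}+}_{\mathrm{ac}}(k)=\Delta^{\mathrm{CT}-}_{\mathrm{ac}}(k)=0$ (resp., $\Delta^{\mathrm{CT}+}_{\mathrm{ac}}(k)=\alpha(u)$ and $\Delta^{\mathrm{CT}-}_{\mathrm{ac}}(k)=\alpha(v)$) ⇔ edge $e^{\mathrm{CT}}_{\mathrm{tail}_{\mathrm C}(k),j}=(u,v)\in E_{\mathrm{CT}}$ for some $j\in[1,t_{\mathrm T}]$ is used in $G$ (resp., otherwise);
$\Delta^{\mathrm{TC}+}_{\mathrm{ac}}(k),\Delta^{\mathrm{TC}-}_{\mathrm{ac}}(k)\in[0,|\Lambda^{\mathrm{co}}|]$, $k\in[1,k_{\mathrm C}]=I_{(\ge 2)}\cup I_{(\ge 1)}$: analogous to $\Delta^{\mathrm{CT}+}_{\mathrm{ac}}(k)$ and $\Delta^{\mathrm{CT}-}_{\mathrm{ac}}(k)$;
$\Delta^{\mathrm{CF}+}_{\mathrm{ac}}(c)\in[0,|\Lambda^{\mathrm{co}}|]$, $\Delta^{\mathrm{CF}-}_{\mathrm{ac}}(c)\in[0,|\Lambda^{\mathrm{nc}}|]$, $c\in[1,\widetilde{t_{\mathrm C}}]$: $\Delta^{\mathrm{CF}+}_{\mathrm{ac}}(c)=\Delta^{\mathrm{CF}-}_{\mathrm{ac}}(c)=0$ (resp., $\Delta^{\mathrm{CF}+}_{\mathrm{ac}}(c)=\alpha(u)$ and $\Delta^{\mathrm{CF}-}_{\mathrm{ac}}(c)=\alpha(v)$) ⇔ edge $e^{\mathrm{CF}}_{c,i}=(u,v)\in E_{\mathrm{CF}}$ for some $i\in[1,t_{\mathrm F}]$ is used in $G$ (resp., otherwise);
$\Delta^{\mathrm{TF}+}_{\mathrm{ac}}(i)\in[0,|\Lambda^{\mathrm{co}}|]$, $\Delta^{\mathrm{TF}-}_{\mathrm{ac}}(i)\in[0,|\Lambda^{\mathrm{nc}}|]$, $i\in[1,t_{\mathrm T}]$: analogous to $\Delta^{\mathrm{CF}+}_{\mathrm{ac}}(c)$ and $\Delta^{\mathrm{CF}-}_{\mathrm{ac}}(c)$;

constraints:
\[ \begin{aligned} &\mathrm{ac}^{\mathrm C}([\nu]^{\mathrm{co}})=0,\ \nu\in\Gamma^{\mathrm{co}}_{\mathrm{ac}}\setminus\widetilde{\Gamma}^{\mathrm C}_{\mathrm{ac}},\qquad \mathrm{ac}^{\mathrm T}([\nu]^{\mathrm{co}})=0,\ \nu\in\Gamma^{\mathrm{co}}_{\mathrm{ac}}\setminus\widetilde{\Gamma}^{\mathrm T}_{\mathrm{ac}},\\ &\mathrm{ac}^{\mathrm F}([\nu]^{\mathrm{in}})=0,\ \nu\in\Gamma^{\mathrm{in}}_{\mathrm{ac}}\setminus\widetilde{\Gamma}^{\mathrm F}_{\mathrm{ac}},\qquad \mathrm{ac}^{X}_{(p)}([\nu]^{\mathrm{ex}})=0,\ \nu\in\Gamma^{\mathrm{ex}}_{\mathrm{ac}}\setminus\widetilde{\Gamma}^{\mathrm{ex}}_{\mathrm{ac}},\ p\in[1,\rho],\ X\in\{\mathrm C,\mathrm T,\mathrm F\},\\ &\mathrm{ac}^{\mathrm{CT}}([\nu]^{\mathrm{co}})=0,\ \nu\in\Gamma^{\mathrm{co}}_{\mathrm{ac}}\setminus\widetilde{\Gamma}^{\mathrm{CT}}_{\mathrm{ac}},\qquad \mathrm{ac}^{\mathrm{TC}}([\nu]^{\mathrm{co}})=0,\ \nu\in\Gamma^{\mathrm{co}}_{\mathrm{ac}}\setminus\widetilde{\Gamma}^{\mathrm{TC}}_{\mathrm{ac}},\\ &\mathrm{ac}^{\mathrm{CF}}([\nu]^{\mathrm{in}})=0,\ \nu\in\Gamma^{\mathrm{in}}_{\mathrm{ac}}\setminus\widetilde{\Gamma}^{\mathrm{CF}}_{\mathrm{ac}},\qquad \mathrm{ac}^{\mathrm{TF}}([\nu]^{\mathrm{in}})=0,\ \nu\in\Gamma^{\mathrm{in}}_{\mathrm{ac}}\setminus\widetilde{\Gamma}^{\mathrm{TF}}_{\mathrm{ac}}, \end{aligned} \tag{74} \]
\[ \begin{aligned} &\sum_{(a,b,m)=\nu\in\Gamma^{\mathrm{co}}_{\mathrm{ac}}}\mathrm{ac}^{\mathrm C}([\nu]^{\mathrm{co}})=\sum_{i\in[\widetilde{k_{\mathrm C}}+1,m_{\mathrm C}]}\delta^{\mathrm C}_{\beta}(i,m),\qquad m\in[1,3],\\ &\sum_{(a,b,m)=\nu\in\Gamma^{\mathrm{co}}_{\mathrm{ac}}}\mathrm{ac}^{\mathrm T}([\nu]^{\mathrm{co}})=\sum_{i\in[2,t_{\mathrm T}]}\delta^{\mathrm T}_{\beta}(i,m),\qquad m\in[1,3],\\ &\sum_{(a,b,m)=\nu\in\Gamma^{\mathrm{in}}_{\mathrm{ac}}}\mathrm{ac}^{\mathrm F}([\nu]^{\mathrm{in}})=\sum_{i\in[2,t_{\mathrm F}]}\delta^{\mathrm F}_{\beta}(i,m),\qquad m\in[1,3],\\ &\sum_{(a,b,m)=\nu\in\Gamma^{\mathrm{ex}}_{\mathrm{ac}}}\mathrm{ac}^{X}_{(p)}([\nu]^{\mathrm{ex}})=\sum_{i\in[1,t_X],\,j\in\mathrm{Dsn}_X(p)}\delta^{X}_{\beta}(i,j,m),\qquad p\in[1,\rho],\ X\in\{\mathrm C,\mathrm T,\mathrm F\},\ m\in[1,3],\\ &\sum_{(a,b,m)=\nu\in\Gamma^{\mathrm{co}}_{\mathrm{ac}}}\mathrm{ac}^{\mathrm{CT}}([\nu]^{\mathrm{co}})=\sum_{k\in[1,k_{\mathrm C}]}\delta^{+}_{\beta}(k,m),\qquad m\in[1,3],\\ &\sum_{(a,b,m)=\nu\in\Gamma^{\mathrm{co}}_{\mathrm{ac}}}\mathrm{ac}^{\mathrm{TC}}([\nu]^{\mathrm{co}})=\sum_{k\in[1,k_{\mathrm C}]}\delta^{-}_{\beta}(k,m),\qquad m\in[1,3],\\ &\sum_{(a,b,m)=\nu\in\Gamma^{\mathrm{in}}_{\mathrm{ac}}}\mathrm{ac}^{\mathrm{CF}}([\nu]^{\mathrm{in}})=\sum_{c\in[1,\widetilde{t_{\mathrm C}}]}\delta^{\mathrm{in}}_{\beta}(c,m),\qquad m\in[1,3],\\ &\sum_{(a,b,m)=\nu\in\Gamma^{\mathrm{in}}_{\mathrm{ac}}}\mathrm{ac}^{\mathrm{TF}}([\nu]^{\mathrm{in}})=\sum_{c\in[\widetilde{t_{\mathrm C}}+1,c_{\mathrm F}]}\delta^{\mathrm{in}}_{\beta}(c,m),\qquad m\in[1,3], \end{aligned} \tag{75} \]
\[ \begin{aligned} &\sum_{\nu=(a,b,m)\in\widetilde{\Gamma}^{\mathrm C}_{\mathrm{ac}}}m\cdot\delta^{\mathrm C}_{\mathrm{ac}}(i,[\nu]^{\mathrm{co}})=\beta^{\mathrm C}(i),\\ &\Delta^{\mathrm C+}_{\mathrm{ac}}(i)+\sum_{\nu=(a,b,m)\in\widetilde{\Gamma}^{\mathrm C}_{\mathrm{ac}}}[a]^{\mathrm{co}}\delta^{\mathrm C}_{\mathrm{ac}}(i,[\nu]^{\mathrm{co}})=\alpha^{\mathrm C}(\mathrm{tail}_{\mathrm C}(i),0),\\ &\Delta^{\mathrm C-}_{\mathrm{ac}}(i)+\sum_{\nu=(a,b,m)\in\widetilde{\Gamma}^{\mathrm C}_{\mathrm{ac}}}[b]^{\mathrm{co}}\delta^{\mathrm C}_{\mathrm{ac}}(i,[\nu]^{\mathrm{co}})=\alpha^{\mathrm C}(\mathrm{head}_{\mathrm C}(i),0),\\ &\Delta^{\mathrm C+}_{\mathrm{ac}}(i)+\Delta^{\mathrm C-}_{\mathrm{ac}}(i)\le 2|\Lambda^{\mathrm{co}}|(1-e^{\mathrm C}(i)),\qquad i\in[\widetilde{k_{\mathrm C}}+1,m_{\mathrm C}],\\ &\sum_{i\in[\widetilde{k_{\mathrm C}}+1,m_{\mathrm C}]}\delta^{\mathrm C}_{\mathrm{ac}}(i,[\nu]^{\mathrm{co}})=\mathrm{ac}^{\mathrm C}([\nu]^{\mathrm{co}}),\qquad \nu\in\widetilde{\Gamma}^{\mathrm C}_{\mathrm{ac}}, \end{aligned} \tag{76} \]
\[ \begin{aligned} &\sum_{\nu=(a,b,m)\in\widetilde{\Gamma}^{\mathrm T}_{\mathrm{ac}}}m\cdot\delta^{\mathrm T}_{\mathrm{ac}}(i,[\nu]^{\mathrm{co}})=\beta^{\mathrm T}(i),\\ &\Delta^{\mathrm T+}_{\mathrm{ac}}(i)+\sum_{\nu=(a,b,m)\in\widetilde{\Gamma}^{\mathrm T}_{\mathrm{ac}}}[a]^{\mathrm{co}}\delta^{\mathrm T}_{\mathrm{ac}}(i,[\nu]^{\mathrm{co}})=\alpha^{\mathrm T}(i-1,0),\\ &\Delta^{\mathrm T-}_{\mathrm{ac}}(i)+\sum_{\nu=(a,b,m)\in\widetilde{\Gamma}^{\mathrm T}_{\mathrm{ac}}}[b]^{\mathrm{co}}\delta^{\mathrm T}_{\mathrm{ac}}(i,[\nu]^{\mathrm{co}})=\alpha^{\mathrm T}(i,0),\\ &\Delta^{\mathrm T+}_{\mathrm{ac}}(i)+\Delta^{\mathrm T-}_{\mathrm{ac}}(i)\le 2|\Lambda^{\mathrm{co}}|(1-e^{\mathrm T}(i)),\qquad i\in[2,t_{\mathrm T}],\\ &\sum_{i\in[2,t_{\mathrm T}]}\delta^{\mathrm T}_{\mathrm{ac}}(i,[\nu]^{\mathrm{co}})=\mathrm{ac}^{\mathrm T}([\nu]^{\mathrm{co}}),\qquad \nu\in\widetilde{\Gamma}^{\mathrm T}_{\mathrm{ac}}, \end{aligned} \tag{77} \]
\[ \begin{aligned} &\sum_{\nu=(a,b,m)\in\widetilde{\Gamma}^{\mathrm F}_{\mathrm{ac}}}m\cdot\delta^{\mathrm F}_{\mathrm{ac}}(i,[\nu]^{\mathrm{in}})=\beta^{\mathrm F}(i),\\ &\Delta^{\mathrm F+}_{\mathrm{ac}}(i)+\sum_{\nu=(a,b,m)\in\widetilde{\Gamma}^{\mathrm F}_{\mathrm{ac}}}[a]^{\mathrm{nc}}\delta^{\mathrm F}_{\mathrm{ac}}(i,[\nu]^{\mathrm{in}})=\alpha^{\mathrm F}(i-1,0),\\ &\Delta^{\mathrm F-}_{\mathrm{ac}}(i)+\sum_{\nu=(a,b,m)\in\widetilde{\Gamma}^{\mathrm F}_{\mathrm{ac}}}[b]^{\mathrm{nc}}\delta^{\mathrm F}_{\mathrm{ac}}(i,[\nu]^{\mathrm{in}})=\alpha^{\mathrm F}(i,0),\\ &\Delta^{\mathrm F+}_{\mathrm{ac}}(i)+\Delta^{\mathrm F-}_{\mathrm{ac}}(i)\le 2|\Lambda^{\mathrm{nc}}|(1-e^{\mathrm F}(i)),\qquad i\in[2,t_{\mathrm F}],\\ &\sum_{i\in[2,t_{\mathrm F}]}\delta^{\mathrm F}_{\mathrm{ac}}(i,[\nu]^{\mathrm{in}})=\mathrm{ac}^{\mathrm F}([\nu]^{\mathrm{in}}),\qquad \nu\in\widetilde{\Gamma}^{\mathrm F}_{\mathrm{ac}}, \end{aligned} \tag{78} \]
\[ \begin{aligned} &\sum_{\nu=(a,b,m)\in\widetilde{\Gamma}^{\mathrm{ex}}_{\mathrm{ac}}}m\cdot\delta^{X}_{\mathrm{ac}}(i,j,[\nu]^{\mathrm{ex}})=\beta^{X}(i,j),\\ &\Delta^{X+}_{\mathrm{ac}}(i,j)+\sum_{\nu=(a,b,m)\in\widetilde{\Gamma}^{\mathrm{ex}}_{\mathrm{ac}}}[a]^{\mathrm{nc}}\delta^{X}_{\mathrm{ac}}(i,j,[\nu]^{\mathrm{ex}})=\alpha^{X}(i,\mathrm{prt}_X(j)),\\ &\Delta^{X-}_{\mathrm{ac}}(i,j)+\sum_{\nu=(a,b,m)\in\widetilde{\Gamma}^{\mathrm{ex}}_{\mathrm{ac}}}[b]^{\mathrm{nc}}\delta^{X}_{\mathrm{ac}}(i,j,[\nu]^{\mathrm{ex}})=\alpha^{X}(i,j),\\ &\Delta^{X+}_{\mathrm{ac}}(i,j)+\Delta^{X-}_{\mathrm{ac}}(i,j)\le 2|\Lambda^{\mathrm{nc}}|(1-v^{X}(i,j)),\qquad i\in[1,t_X],\ j\in[1,n_X],\\ &\sum_{i\in[1,t_X],\,j\in\mathrm{Dsn}_X(p)}\delta^{X}_{\mathrm{ac}}(i,j,[\nu]^{\mathrm{ex}})=\mathrm{ac}^{X}_{(p)}([\nu]^{\mathrm{ex}}),\qquad \nu\in\widetilde{\Gamma}^{\mathrm{ex}}_{\mathrm{ac}},\ p\in[1,\rho],\ X\in\{\mathrm C,\mathrm T,\mathrm F\}, \end{aligned} \tag{79} \]
\[ \begin{aligned} &\alpha^{\mathrm T}(i,0)+|\Lambda^{\mathrm{co}}|(1-\chi^{\mathrm T}(i,k)+e^{\mathrm T}(i))\ge\alpha^{\mathrm{CT}}(k),\\ &\alpha^{\mathrm{CT}}(k)\ge\alpha^{\mathrm T}(i,0)-|\Lambda^{\mathrm{co}}|(1-\chi^{\mathrm T}(i,k)+e^{\mathrm T}(i)),\qquad i\in[1,t_{\mathrm T}],\\ &\sum_{\nu=(a,b,m)\in\widetilde{\Gamma}^{\mathrm{CT}}_{\mathrm{ac}}}m\cdot\delta^{\mathrm{CT}}_{\mathrm{ac}}(k,[\nu]^{\mathrm{co}})=\beta^{+}(k),\\ &\Delta^{\mathrm{CT}+}_{\mathrm{ac}}(k)+\sum_{\nu=(a,b,m)\in\widetilde{\Gamma}^{\mathrm{CT}}_{\mathrm{ac}}}[a]^{\mathrm{co}}\delta^{\mathrm{CT}}_{\mathrm{ac}}(k,[\nu]^{\mathrm{co}})=\alpha^{\mathrm C}(\mathrm{tail}_{\mathrm C}(k),0),\\ &\Delta^{\mathrm{CT}-}_{\mathrm{ac}}(k)+\sum_{\nu=(a,b,m)\in\widetilde{\Gamma}^{\mathrm{CT}}_{\mathrm{ac}}}[b]^{\mathrm{co}}\delta^{\mathrm{CT}}_{\mathrm{ac}}(k,[\nu]^{\mathrm{co}})=\alpha^{\mathrm{CT}}(k),\\ &\Delta^{\mathrm{CT}+}_{\mathrm{ac}}(k)+\Delta^{\mathrm{CT}-}_{\mathrm{ac}}(k)\le 2|\Lambda^{\mathrm{co}}|(1-\delta^{\mathrm T}_{\chi}(k)),\qquad k\in[1,k_{\mathrm C}],\\ &\sum_{k\in[1,k_{\mathrm C}]}\delta^{\mathrm{CT}}_{\mathrm{ac}}(k,[\nu]^{\mathrm{co}})=\mathrm{ac}^{\mathrm{CT}}([\nu]^{\mathrm{co}}),\qquad \nu\in\widetilde{\Gamma}^{\mathrm{CT}}_{\mathrm{ac}}, \end{aligned} \tag{80} \]
\[ \begin{aligned} &\alpha^{\mathrm T}(i,0)+|\Lambda^{\mathrm{co}}|(1-\chi^{\mathrm T}(i,k)+e^{\mathrm T}(i+1))\ge\alpha^{\mathrm{TC}}(k),\\ &\alpha^{\mathrm{TC}}(k)\ge\alpha^{\mathrm T}(i,0)-|\Lambda^{\mathrm{co}}|(1-\chi^{\mathrm T}(i,k)+e^{\mathrm T}(i+1)),\qquad i\in[1,t_{\mathrm T}],\\ &\sum_{\nu=(a,b,m)\in\widetilde{\Gamma}^{\mathrm{TC}}_{\mathrm{ac}}}m\cdot\delta^{\mathrm{TC}}_{\mathrm{ac}}(k,[\nu]^{\mathrm{co}})=\beta^{-}(k),\\ &\Delta^{\mathrm{TC}+}_{\mathrm{ac}}(k)+\sum_{\nu=(a,b,m)\in\widetilde{\Gamma}^{\mathrm{TC}}_{\mathrm{ac}}}[a]^{\mathrm{co}}\delta^{\mathrm{TC}}_{\mathrm{ac}}(k,[\nu]^{\mathrm{co}})=\alpha^{\mathrm{TC}}(k),\\ &\Delta^{\mathrm{TC}-}_{\mathrm{ac}}(k)+\sum_{\nu=(a,b,m)\in\widetilde{\Gamma}^{\mathrm{TC}}_{\mathrm{ac}}}[b]^{\mathrm{co}}\delta^{\mathrm{TC}}_{\mathrm{ac}}(k,[\nu]^{\mathrm{co}})=\alpha^{\mathrm C}(\mathrm{head}_{\mathrm C}(k),0),\\ &\Delta^{\mathrm{TC}+}_{\mathrm{ac}}(k)+\Delta^{\mathrm{TC}-}_{\mathrm{ac}}(k)\le 2|\Lambda^{\mathrm{co}}|(1-\delta^{\mathrm T}_{\chi}(k)),\qquad k\in[1,k_{\mathrm C}],\\ &\sum_{k\in[1,k_{\mathrm C}]}\delta^{\mathrm{TC}}_{\mathrm{ac}}(k,[\nu]^{\mathrm{co}})=\mathrm{ac}^{\mathrm{TC}}([\nu]^{\mathrm{co}}),\qquad \nu\in\widetilde{\Gamma}^{\mathrm{TC}}_{\mathrm{ac}}, \end{aligned} \tag{81} \]
\[ \begin{aligned} &\alpha^{\mathrm F}(i,0)+|\Lambda^{\mathrm{nc}}|(1-\chi^{\mathrm F}(i,c)+e^{\mathrm F}(i))\ge\alpha^{\mathrm{CF}}(c),\\ &\alpha^{\mathrm{CF}}(c)\ge\alpha^{\mathrm F}(i,0)-|\Lambda^{\mathrm{nc}}|(1-\chi^{\mathrm F}(i,c)+e^{\mathrm F}(i)),\qquad i\in[1,t_{\mathrm F}],\\ &\sum_{\nu=(a,b,m)\in\widetilde{\Gamma}^{\mathrm{CF}}_{\mathrm{ac}}}m\cdot\delta^{\mathrm{CF}}_{\mathrm{ac}}(c,[\nu]^{\mathrm{in}})=\beta^{\mathrm{in}}(c),\\ &\Delta^{\mathrm{CF}+}_{\mathrm{ac}}(c)+\sum_{\nu=(a,b,m)\in\widetilde{\Gamma}^{\mathrm{CF}}_{\mathrm{ac}}}[a]^{\mathrm{co}}\delta^{\mathrm{CF}}_{\mathrm{ac}}(c,[\nu]^{\mathrm{in}})=\alpha^{\mathrm C}(\mathrm{head}_{\mathrm C}(c),0),\\ &\Delta^{\mathrm{CF}-}_{\mathrm{ac}}(c)+\sum_{\nu=(a,b,m)\in\widetilde{\Gamma}^{\mathrm{CF}}_{\mathrm{ac}}}[b]^{\mathrm{nc}}\delta^{\mathrm{CF}}_{\mathrm{ac}}(c,[\nu]^{\mathrm{in}})=\alpha^{\mathrm{CF}}(c),\\ &\Delta^{\mathrm{CF}+}_{\mathrm{ac}}(c)+\Delta^{\mathrm{CF}-}_{\mathrm{ac}}(c)\le 2\max\{|\Lambda^{\mathrm{co}}|,|\Lambda^{\mathrm{nc}}|\}(1-\delta^{\mathrm F}_{\chi}(c)),\qquad c\in[1,\widetilde{t_{\mathrm C}}],\\ &\sum_{c\in[1,\widetilde{t_{\mathrm C}}]}\delta^{\mathrm{CF}}_{\mathrm{ac}}(c,[\nu]^{\mathrm{in}})=\mathrm{ac}^{\mathrm{CF}}([\nu]^{\mathrm{in}}),\qquad \nu\in\widetilde{\Gamma}^{\mathrm{CF}}_{\mathrm{ac}}, \end{aligned} \tag{82} \]
\[ \begin{aligned} &\alpha^{\mathrm F}(j,0)+|\Lambda^{\mathrm{nc}}|(1-\chi^{\mathrm F}(j,i+\widetilde{t_{\mathrm C}})+e^{\mathrm F}(j))\ge\alpha^{\mathrm{TF}}(i),\\ &\alpha^{\mathrm{TF}}(i)\ge\alpha^{\mathrm F}(j,0)-|\Lambda^{\mathrm{nc}}|(1-\chi^{\mathrm F}(j,i+\widetilde{t_{\mathrm C}})+e^{\mathrm F}(j)),\qquad j\in[1,t_{\mathrm F}],\\ &\sum_{\nu=(a,b,m)\in\widetilde{\Gamma}^{\mathrm{TF}}_{\mathrm{ac}}}m\cdot\delta^{\mathrm{TF}}_{\mathrm{ac}}(i,[\nu]^{\mathrm{in}})=\beta^{\mathrm{in}}(i+\widetilde{t_{\mathrm C}}),\\ &\Delta^{\mathrm{TF}+}_{\mathrm{ac}}(i)+\sum_{\nu=(a,b,m)\in\widetilde{\Gamma}^{\mathrm{TF}}_{\mathrm{ac}}}[a]^{\mathrm{co}}\delta^{\mathrm{TF}}_{\mathrm{ac}}(i,[\nu]^{\mathrm{in}})=\alpha^{\mathrm T}(i,0),\\ &\Delta^{\mathrm{TF}-}_{\mathrm{ac}}(i)+\sum_{\nu=(a,b,m)\in\widetilde{\Gamma}^{\mathrm{TF}}_{\mathrm{ac}}}[b]^{\mathrm{nc}}\delta^{\mathrm{TF}}_{\mathrm{ac}}(i,[\nu]^{\mathrm{in}})=\alpha^{\mathrm{TF}}(i),\\ &\Delta^{\mathrm{TF}+}_{\mathrm{ac}}(i)+\Delta^{\mathrm{TF}-}_{\mathrm{ac}}(i)\le 2\max\{|\Lambda^{\mathrm{co}}|,|\Lambda^{\mathrm{nc}}|\}(1-\delta^{\mathrm F}_{\chi}(i+\widetilde{t_{\mathrm C}})),\qquad i\in[1,t_{\mathrm T}],\\ &\sum_{i\in[1,t_{\mathrm T}]}\delta^{\mathrm{TF}}_{\mathrm{ac}}(i,[\nu]^{\mathrm{in}})=\mathrm{ac}^{\mathrm{TF}}([\nu]^{\mathrm{in}}),\qquad \nu\in\widetilde{\Gamma}^{\mathrm{TF}}_{\mathrm{ac}}, \end{aligned} \tag{83} \]
\[ \begin{aligned} &\sum_{X\in\{\mathrm C,\mathrm T,\mathrm{CT},\mathrm{TC}\}}\bigl(\mathrm{ac}^{X}([\nu]^{\mathrm{co}})+\mathrm{ac}^{X}([\overline{\nu}]^{\mathrm{co}})\bigr)=\mathrm{ac}^{\mathrm{co}}([\nu]^{\mathrm{co}}),\qquad \nu\in\Gamma^{\mathrm{co}}_{\mathrm{ac},<},\\ &\sum_{X\in\{\mathrm C,\mathrm T,\mathrm{CT},\mathrm{TC}\}}\mathrm{ac}^{X}([\nu]^{\mathrm{co}})=\mathrm{ac}^{\mathrm{co}}([\nu]^{\mathrm{co}}),\qquad \nu\in\Gamma^{\mathrm{co}}_{\mathrm{ac},=},\\ &\sum_{X\in\{\mathrm F,\mathrm{CF},\mathrm{TF}\}}\mathrm{ac}^{X}([\nu]^{\mathrm{in}})=\mathrm{ac}^{\mathrm{in}}([\nu]^{\mathrm{in}}),\qquad \nu\in\Gamma^{\mathrm{in}}_{\mathrm{ac}},\\ &\sum_{p\in[1,\rho],\,X\in\{\mathrm T,\mathrm C,\mathrm F\}}\mathrm{ac}^{X}_{(p)}([\nu]^{\mathrm{ex}})=\mathrm{ac}^{\mathrm{ex}}([\nu]^{\mathrm{ex}}),\qquad \nu\in\Gamma^{\mathrm{ex}}_{\mathrm{ac}}. \end{aligned} \tag{84} \]

A.9 Descriptor for the Number of Chemical Symbols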
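To make concrete what the adjacency-configuration descriptor of Section A.8 measures, the following minimal sketch counts the tuples $(a,b,m)$ directly on a given hydrogen-suppressed chemical graph rather than through the MILP. It rests on simplifying assumptions that are not in the formulation: every edge is treated as unordered and canonicalized by sorting the two element symbols alphabetically, which stands in for the order $\mu\le\xi$ used for core-edges, and no distinction is made between core, $\rho$-internal and $\rho$-external edges. The function name `adjacency_configuration_counts` is hypothetical.

```python
from collections import Counter


def adjacency_configuration_counts(alpha, beta):
    """Count adjacency-configurations (a, b, m) of a chemical graph.

    alpha: dict mapping vertex -> element symbol
    beta:  dict mapping an edge (u, v) -> bond-multiplicity in {1, 2, 3}
    Each edge contributes one tuple (a, b, m) with a <= b alphabetically,
    so (C, O, 2) and (O, C, 2) are counted as the same configuration.
    """
    counts = Counter()
    for (u, v), m in beta.items():
        a, b = sorted((alpha[u], alpha[v]))
        counts[(a, b, m)] += 1
    return counts


if __name__ == "__main__":
    # C-C=O skeleton: one C-C single bond and one C=O double bond.
    alpha = {1: "C", 2: "C", 3: "O"}
    beta = {(1, 2): 1, (3, 2): 2}
    print(adjacency_configuration_counts(alpha, beta))
```

Merging a tuple with its reversal mirrors the way (84) adds the counts for $\nu$ and $\overline{\nu}$ on core-edges.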
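We include constraints for computing the frequency of each chemical symbol in $\Lambda_{\mathrm{dg}}$. Let $\mathrm{cs}(v)$ denote the chemical symbol of a vertex $v$ in a chemical graph $G$ to be inferred; i.e., $\mathrm{cs}(v)=\mu=ad\in\Lambda_{\mathrm{dg}}$ such that $\alpha(v)=a$ and $\deg_G(v)=d$.

constants:
Sets $\Lambda^{\mathrm{co}}_{\mathrm{dg}}$ and $\Lambda^{\mathrm{nc}}_{\mathrm{dg}}$ of chemical symbols;
Prepare a coding of each of the two sets $\Lambda^{\mathrm{co}}_{\mathrm{dg}}$ and $\Lambda^{\mathrm{nc}}_{\mathrm{dg}}$ and let $[\mu]^{\mathrm{co}}$ (resp., $[\mu]^{\mathrm{nc}}$) denote the coded integer of an element $\mu\in\Lambda^{\mathrm{co}}_{\mathrm{dg}}$ (resp., $\Lambda^{\mathrm{nc}}_{\mathrm{dg}}$);
Choose subsets $\widetilde{\Lambda}^{\mathrm C}_{\mathrm{dg}},\widetilde{\Lambda}^{\mathrm T}_{\mathrm{dg}}\subseteq\Lambda^{\mathrm{co}}_{\mathrm{dg}}$; $\widetilde{\Lambda}^{\mathrm F}_{\mathrm{dg}},\widetilde{\Lambda}^{\mathrm C,\mathrm{nc}}_{\mathrm{dg}},\widetilde{\Lambda}^{\mathrm T,\mathrm{nc}}_{\mathrm{dg}},\widetilde{\Lambda}^{\mathrm F,\mathrm{nc}}_{\mathrm{dg}}\subseteq\Lambda^{\mathrm{nc}}_{\mathrm{dg}}$. To compute the frequency of chemical symbols exactly, set $\widetilde{\Lambda}^{\mathrm C}_{\mathrm{dg}}:=\widetilde{\Lambda}^{\mathrm T}_{\mathrm{dg}}:=\Lambda^{\mathrm{co}}_{\mathrm{dg}}$; $\widetilde{\Lambda}^{\mathrm F}_{\mathrm{dg}}:=\widetilde{\Lambda}^{\mathrm C,\mathrm{nc}}_{\mathrm{dg}}:=\widetilde{\Lambda}^{\mathrm T,\mathrm{nc}}_{\mathrm{dg}}:=\widetilde{\Lambda}^{\mathrm F,\mathrm{nc}}_{\mathrm{dg}}:=\Lambda^{\mathrm{nc}}_{\mathrm{dg}}$;

variables:
$\mathrm{ns}^{\mathrm{co}}([\mu]^{\mathrm{co}})\in[0,\mathrm{cs}_{\mathrm{UB}}]$, $\mu\in\Lambda^{\mathrm{co}}_{\mathrm{dg}}$: the number of core-vertices $v$ with $\mathrm{cs}(v)=\mu$;
$\mathrm{ns}^{\mathrm{nc}}([\mu]^{\mathrm{nc}})\in[0,n^{*}-\mathrm{cs}_{\mathrm{LB}}]$, $\mu\in\Lambda^{\mathrm{nc}}_{\mathrm{dg}}$: the number of non-core-vertices $v$ with $\mathrm{cs}(v)=\mu$;
$\delta^{X}_{\mathrm{ns}}(i,0,[\mu]^{\mathrm{co}})\in[0,1]$, $i\in[1,t_X]$, $j\in[0,n_X]$, $\mu\in\Lambda^{\mathrm{co}}_{\mathrm{dg}}$, $X\in\{\mathrm C,\mathrm T\}$;
$\delta^{\mathrm F}_{\mathrm{ns}}(i,0,[\mu]^{\mathrm{nc}})\in[0,1]$, $i\in[1,t_{\mathrm F}]$, $\mu\in\Lambda^{\mathrm{nc}}_{\mathrm{dg}}$;
$\delta^{X}_{\mathrm{ns}}(i,j,[\mu]^{\mathrm{nc}})\in[0,1]$, $i\in[1,t_X]$, $j\in[1,n_X]$ ($n_{\mathrm C}=n_{\mathrm C}(i)$), $\mu=ad\in\Lambda^{\mathrm{nc}}_{\mathrm{dg}}$, $X\in\{\mathrm C,\mathrm T,\mathrm F\}$: $\delta^{X}_{\mathrm{ns}}(i,j,[\mu]^{\mathrm{nc}})=1$ ⇔ $\alpha(v^{X}_{i,j})=a$ and $\deg_G(v^{X}_{i,j})=d$;

constraints:
\[ \begin{aligned} &\sum_{\mu\in\widetilde{\Lambda}^{X}_{\mathrm{dg}}\cup\{\epsilon\}}\delta^{X}_{\mathrm{ns}}(i,0,[\mu]^{\mathrm{co}})=1,\quad \sum_{\mu=ad\in\widetilde{\Lambda}^{X}_{\mathrm{dg}}}[a]^{\mathrm{co}}\cdot\delta^{X}_{\mathrm{ns}}(i,0,[\mu]^{\mathrm{co}})=\alpha^{X}(i,0),\\ &\sum_{\mu=ad\in\widetilde{\Lambda}^{X}_{\mathrm{dg}}}d\cdot\delta^{X}_{\mathrm{ns}}(i,0,[\mu]^{\mathrm{co}})=\deg^{X}(i,0),\qquad i\in[1,t_X],\ X\in\{\mathrm C,\mathrm T\}, \end{aligned} \tag{85} \]
\[ \begin{aligned} &\sum_{\mu\in\widetilde{\Lambda}^{\mathrm F}_{\mathrm{dg}}\cup\{\epsilon\}}\delta^{\mathrm F}_{\mathrm{ns}}(i,0,[\mu]^{\mathrm{nc}})=1,\quad \sum_{\mu=ad\in\widetilde{\Lambda}^{\mathrm F}_{\mathrm{dg}}}[a]^{\mathrm{nc}}\cdot\delta^{\mathrm F}_{\mathrm{ns}}(i,0,[\mu]^{\mathrm{nc}})=\alpha^{\mathrm F}(i,0),\\ &\sum_{\mu=ad\in\widetilde{\Lambda}^{\mathrm F}_{\mathrm{dg}}}d\cdot\delta^{\mathrm F}_{\mathrm{ns}}(i,0,[\mu]^{\mathrm{nc}})=\deg^{\mathrm F}(i,0),\qquad i\in[1,t_{\mathrm F}], \end{aligned} \tag{86} \]
\[ \begin{aligned} &\sum_{\mu\in\widetilde{\Lambda}^{X,\mathrm{nc}}_{\mathrm{dg}}\cup\{\epsilon\}}\delta^{X}_{\mathrm{ns}}(i,j,[\mu]^{\mathrm{nc}})=1,\quad \sum_{\mu=ad\in\widetilde{\Lambda}^{X,\mathrm{nc}}_{\mathrm{dg}}}[a]^{\mathrm{nc}}\cdot\delta^{X}_{\mathrm{ns}}(i,j,[\mu]^{\mathrm{nc}})=\alpha^{X}(i,j),\\ &\sum_{\mu=ad\in\widetilde{\Lambda}^{X,\mathrm{nc}}_{\mathrm{dg}}}d\cdot\delta^{X}_{\mathrm{ns}}(i,j,[\mu]^{\mathrm{nc}})=\deg^{X}(i,j),\qquad i\in[1,t_X],\ j\in[1,n_X]\ (n_{\mathrm C}=n_{\mathrm C}(i)),\ X\in\{\mathrm C,\mathrm T,\mathrm F\}, \end{aligned} \tag{87} \]
\[ \begin{aligned} &\sum_{i\in[1,t_{\mathrm C}]}\delta^{\mathrm C}_{\mathrm{ns}}(i,0,[\mu]^{\mathrm{co}})+\sum_{i\in[1,t_{\mathrm T}]}\delta^{\mathrm T}_{\mathrm{ns}}(i,0,[\mu]^{\mathrm{co}})=\mathrm{ns}^{\mathrm{co}}([\mu]^{\mathrm{co}}),\qquad \mu\in\Lambda^{\mathrm{co}}_{\mathrm{dg}},\\ &\sum_{i\in[1,t_{\mathrm F}]}\delta^{\mathrm F}_{\mathrm{ns}}(i,0,[\mu]^{\mathrm{nc}})+\sum_{i\in[1,t_X],\,j\in[1,n_X],\,X\in\{\mathrm C,\mathrm T,\mathrm F\}}\delta^{X}_{\mathrm{ns}}(i,j,[\mu]^{\mathrm{nc}})=\mathrm{ns}^{\mathrm{nc}}([\mu]^{\mathrm{nc}}),\qquad \mu\in\Lambda^{\mathrm{nc}}_{\mathrm{dg}}. \end{aligned} \tag{88} \]

A.10 Descriptor for the Number of Edge-configurations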
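We include constraints to compute the frequency of each edge-configuration in an inferred chemical graph $G$.

constants:
Sets $\Gamma^{\mathrm{co}}$, $\Gamma^{\mathrm{in}}$ and $\Gamma^{\mathrm{ex}}$ of edge-configurations, where $\mu \le \xi$ for any edge-configuration $\gamma = (\mu,\xi,m) \in \Gamma^{\mathrm{co}}$;
Let $\Gamma_{<}^{\mathrm{co}} = \{(\mu,\xi,m) \in \Gamma^{\mathrm{co}} \mid \mu < \xi\}$, $\Gamma_{=}^{\mathrm{co}} = \{(\mu,\xi,m) \in \Gamma^{\mathrm{co}} \mid \mu = \xi\}$ and $\Gamma_{>}^{\mathrm{co}} = \{\overline{\gamma} = (\xi,\mu,m) \mid \gamma = (\mu,\xi,m) \in \Gamma_{<}^{\mathrm{co}}\}$;
Prepare a coding of each of the three sets $\Gamma^{\mathrm{co}} \cup \Gamma_{>}^{\mathrm{co}}$, $\Gamma^{\mathrm{in}}$ and $\Gamma^{\mathrm{ex}}$ and let $[\gamma]^{\mathrm{co}}$ (resp., $[\gamma]^{\mathrm{in}}$ and $[\gamma]^{\mathrm{ex}}$) denote the coded integer of an element $\gamma$ in $\Gamma^{\mathrm{co}} \cup \Gamma_{>}^{\mathrm{co}}$ (resp., $\Gamma^{\mathrm{in}}$ and $\Gamma^{\mathrm{ex}}$);
Choose subsets $\widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{C}}, \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{T}}, \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{CT}}, \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{TC}} \subseteq \Gamma^{\mathrm{co}} \cup \Gamma_{>}^{\mathrm{co}}$; $\widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{F}}, \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{CF}}, \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{TF}} \subseteq \Gamma^{\mathrm{in}}$; $\widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{ex}} \subseteq \Gamma^{\mathrm{ex}}$. To compute the frequency of edge-configurations exactly, set $\widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{C}} := \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{T}} := \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{CT}} := \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{TC}} := \Gamma^{\mathrm{co}} \cup \Gamma_{>}^{\mathrm{co}}$; $\widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{F}} := \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{CF}} := \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{TF}} := \Gamma^{\mathrm{in}}$; $\widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{ex}} := \Gamma^{\mathrm{ex}}$;
$\mathrm{ec}_{\mathrm{LB}}^{\mathrm{co}}(\gamma), \mathrm{ec}_{\mathrm{UB}}^{\mathrm{co}}(\gamma) \in [0, m_{\mathrm{UB}}]$, $\gamma \in \Gamma^{\mathrm{co}}$,
$\mathrm{ec}_{\mathrm{LB}}^{\mathrm{in}}(\gamma), \mathrm{ec}_{\mathrm{UB}}^{\mathrm{in}}(\gamma) \in [0, n^{*}]$, $\gamma \in \Gamma^{\mathrm{in}}$,
$\mathrm{ec}_{\mathrm{LB}}^{\mathrm{ex}}(\gamma), \mathrm{ec}_{\mathrm{UB}}^{\mathrm{ex}}(\gamma) \in [0, n^{*}]$, $\gamma \in \Gamma^{\mathrm{ex}}$: lower and upper bounds on the number of core-edges $e = uv$ (resp., $\rho$-internal edges and $\rho$-external edges $e = (u,v)$) with edge-configuration $\gamma$;

variables:
$\mathrm{ec}^{\mathrm{co}}([\gamma]^{\mathrm{co}}) \in [\mathrm{ec}_{\mathrm{LB}}^{\mathrm{co}}(\gamma), \mathrm{ec}_{\mathrm{UB}}^{\mathrm{co}}(\gamma)]$, $\gamma \in \Gamma^{\mathrm{co}}$,
$\mathrm{ec}^{\mathrm{in}}([\gamma]^{\mathrm{in}}) \in [\mathrm{ec}_{\mathrm{LB}}^{\mathrm{in}}(\gamma), \mathrm{ec}_{\mathrm{UB}}^{\mathrm{in}}(\gamma)]$, $\gamma \in \Gamma^{\mathrm{in}}$,
$\mathrm{ec}^{\mathrm{ex}}([\gamma]^{\mathrm{ex}}) \in [\mathrm{ec}_{\mathrm{LB}}^{\mathrm{ex}}(\gamma), \mathrm{ec}_{\mathrm{UB}}^{\mathrm{ex}}(\gamma)]$, $\gamma \in \Gamma^{\mathrm{ex}}$: the number of core-edges (resp., $\rho$-internal edges and $\rho$-external edges) with edge-configuration $\gamma$;
$\mathrm{ec}_{\mathrm{C}}([\gamma]^{\mathrm{co}}) \in [0, m_{\mathrm{C}}]$, $\gamma \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{C}}$, $\mathrm{ec}_{\mathrm{T}}([\gamma]^{\mathrm{co}}) \in [0, t_{\mathrm{T}}]$, $\gamma \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{T}}$, $\mathrm{ec}_{\mathrm{F}}([\gamma]^{\mathrm{in}}) \in [0, t_{\mathrm{F}}]$, $\gamma \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{F}}$: the number of core-edges $e^{\mathrm{C}} \in E_{\mathrm{C}}$ (resp., core-edges $e^{\mathrm{T}} \in E_{\mathrm{T}}$ and $\rho$-internal edges $e^{\mathrm{F}} \in E_{\mathrm{F}}$) with edge-configuration $\gamma$;
$\mathrm{ec}_{X(p)}([\gamma]^{\mathrm{ex}}) \in [1, n_X]$, $\gamma \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{ex}}$, $p \in [1,\rho]$, $X \in \{\mathrm{C},\mathrm{T},\mathrm{F}\}$: the number of $\rho$-external edges with depth $p$ in a rooted tree $X_i$;
$\mathrm{ec}_{\mathrm{CT}}([\gamma]^{\mathrm{co}}) \in [0, \min\{k_{\mathrm{C}}, t_{\mathrm{T}}\}]$, $\gamma \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{CT}}$, $\mathrm{ec}_{\mathrm{TC}}([\gamma]^{\mathrm{co}}) \in [0, \min\{k_{\mathrm{C}}, t_{\mathrm{T}}\}]$, $\gamma \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{TC}}$, $\mathrm{ec}_{\mathrm{CF}}([\gamma]^{\mathrm{in}}) \in [0, \widetilde{t_{\mathrm{C}}}]$, $\gamma \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{CF}}$, $\mathrm{ec}_{\mathrm{TF}}([\gamma]^{\mathrm{in}}) \in [0, t_{\mathrm{T}}]$, $\gamma \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{TF}}$: the number of core-edges $e^{\mathrm{CT}} \in E_{\mathrm{CT}}$ (resp., core-edges $e^{\mathrm{TC}} \in E_{\mathrm{TC}}$ and $\rho$-internal edges $e^{\mathrm{CF}} \in E_{\mathrm{CF}}$ and $e^{\mathrm{TF}} \in E_{\mathrm{TF}}$) with edge-configuration $\gamma$;
$\delta_{\mathrm{ec}}^{\mathrm{C}}(i,[\gamma]^{\mathrm{co}}) \in [0,1]$, $i \in [\widetilde{k_{\mathrm{C}}}+1, m_{\mathrm{C}}] = I_{(\ge 1)} \cup I_{(0/1)} \cup I_{(=1)}$, $\gamma \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{C}}$, $\delta_{\mathrm{ec}}^{\mathrm{T}}(i,[\gamma]^{\mathrm{co}}) \in [0,1]$, $i \in [2,t_{\mathrm{T}}]$, $\gamma \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{T}}$, $\delta_{\mathrm{ec}}^{\mathrm{F}}(i,[\gamma]^{\mathrm{in}}) \in [0,1]$, $i \in [2,t_{\mathrm{F}}]$, $\gamma \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{F}}$: $\delta_{\mathrm{ec}}^{X}(i,[\gamma]^{\mathrm{t}}) = 1 \Leftrightarrow$ edge $e^{X}_{i}$ has edge-configuration $\gamma$;
$\delta_{\mathrm{ec}}^{X}(i,j,[\gamma]^{\mathrm{ex}}) \in [0,1]$, $i \in [1,t_X]$, $j \in [1,n_X]$, $\gamma \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{ex}}$, $X \in \{\mathrm{C},\mathrm{T},\mathrm{F}\}$: $\delta_{\mathrm{ec}}^{X}(i,j,[\gamma]^{\mathrm{ex}}) = 1 \Leftrightarrow$ $\rho$-external edge $e^{X}_{i,j}$ has edge-configuration $\gamma$;
$\delta_{\mathrm{ec}}^{\mathrm{CT}}(k,[\gamma]^{\mathrm{co}}), \delta_{\mathrm{ec}}^{\mathrm{TC}}(k,[\gamma]^{\mathrm{co}}) \in [0,1]$, $k \in [1,k_{\mathrm{C}}] = I_{(\ge 2)} \cup I_{(\ge 1)}$, $\gamma \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{CT}}$: $\delta_{\mathrm{ec}}^{\mathrm{CT}}(k,[\gamma]^{\mathrm{co}}) = 1$ (resp., $\delta_{\mathrm{ec}}^{\mathrm{TC}}(k,[\gamma]^{\mathrm{co}}) = 1$) $\Leftrightarrow$ edge $e^{\mathrm{CT}}_{\mathrm{tail}^{\mathrm{C}}(k),j}$ (resp., $e^{\mathrm{TC}}_{\mathrm{head}^{\mathrm{C}}(k),j}$) for some $j \in [1,t_{\mathrm{T}}]$ has edge-configuration $\gamma$;
$\delta_{\mathrm{ec}}^{\mathrm{CF}}(c,[\gamma]^{\mathrm{in}}) \in [0,1]$, $c \in [1,\widetilde{t_{\mathrm{C}}}]$, $\gamma \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{CF}}$: $\delta_{\mathrm{ec}}^{\mathrm{CF}}(c,[\gamma]^{\mathrm{in}}) = 1 \Leftrightarrow$ edge $e^{\mathrm{CF}}_{c,i}$ for some $i \in [1,t_{\mathrm{F}}]$ has edge-configuration $\gamma$;
$\delta_{\mathrm{ec}}^{\mathrm{TF}}(i,[\gamma]^{\mathrm{in}}) \in [0,1]$, $i \in [1,t_{\mathrm{T}}]$, $\gamma \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{TF}}$: $\delta_{\mathrm{ec}}^{\mathrm{TF}}(i,[\gamma]^{\mathrm{in}}) = 1 \Leftrightarrow$ edge $e^{\mathrm{TF}}_{i,j}$ for some $j \in [1,t_{\mathrm{F}}]$ has edge-configuration $\gamma$;
$\deg_{\mathrm{T}}^{\mathrm{CT}}(k), \deg_{\mathrm{T}}^{\mathrm{TC}}(k) \in [0,4]$, $k \in [1,k_{\mathrm{C}}]$: $\deg_G(v)$ of the edge $(v^{\mathrm{C}}_{\mathrm{tail}(k)}, v) \in E_{\mathrm{CT}}$ (resp., $(v, v^{\mathrm{C}}_{\mathrm{head}(k)}) \in E_{\mathrm{TC}}$) if any;
$\deg_{\mathrm{F}}^{\mathrm{CF}}(c) \in [0,4]$, $c \in [1,\widetilde{t_{\mathrm{C}}}]$: $\deg_G(v)$ of the edge $(v^{\mathrm{C}}_{c,0}, v) \in E_{\mathrm{CF}}$ if any;
$\deg_{\mathrm{F}}^{\mathrm{TF}}(i) \in [0,4]$, $i \in [1,t_{\mathrm{T}}]$: $\deg_G(v)$ of the edge $(v^{\mathrm{T}}_{i,0}, v) \in E_{\mathrm{TF}}$ if any;
$\Delta_{\mathrm{ec}}^{\mathrm{C}+}(i), \Delta_{\mathrm{ec}}^{\mathrm{C}-}(i) \in [0,4]$, $i \in [\widetilde{k_{\mathrm{C}}}+1, m_{\mathrm{C}}]$, $\Delta_{\mathrm{ec}}^{\mathrm{T}+}(i), \Delta_{\mathrm{ec}}^{\mathrm{T}-}(i) \in [0,4]$, $i \in [2,t_{\mathrm{T}}]$, $\Delta_{\mathrm{ec}}^{\mathrm{F}+}(i), \Delta_{\mathrm{ec}}^{\mathrm{F}-}(i) \in [0,4]$, $i \in [2,t_{\mathrm{F}}]$: $\Delta_{\mathrm{ec}}^{X+}(i) = \Delta_{\mathrm{ec}}^{X-}(i) = 0$ (resp., $\Delta_{\mathrm{ec}}^{X+}(i) = \deg_G(u)$ and $\Delta_{\mathrm{ec}}^{X-}(i) = \deg_G(v)$) $\Leftrightarrow$ edge $e^{X}_{i} = (u,v) \in E_X$ is used in $G$ (resp., $e^{X}_{i} \notin E(G)$);
$\Delta_{\mathrm{ec}}^{X+}(i,j), \Delta_{\mathrm{ec}}^{X-}(i,j) \in [0,4]$, $i \in [1,t_X]$, $j \in [1,n_X]$, $X \in \{\mathrm{C},\mathrm{T},\mathrm{F}\}$: $\Delta_{\mathrm{ec}}^{X+}(i,j) = \Delta_{\mathrm{ec}}^{X-}(i,j) = 0$ (resp., $\Delta_{\mathrm{ec}}^{X+}(i,j) = \deg_G(u)$ and $\Delta_{\mathrm{ec}}^{X-}(i,j) = \deg_G(v)$) $\Leftrightarrow$ $\rho$-external edge $e^{X}_{i,j} = (u,v)$ is used in $G$ (resp., $e^{X}_{i,j} \notin E(G)$);
$\Delta_{\mathrm{ec}}^{\mathrm{CT}+}(k), \Delta_{\mathrm{ec}}^{\mathrm{CT}-}(k) \in [0,4]$, $k \in [1,k_{\mathrm{C}}] = I_{(\ge 2)} \cup I_{(\ge 1)}$: $\Delta_{\mathrm{ec}}^{\mathrm{CT}+}(k) = \Delta_{\mathrm{ec}}^{\mathrm{CT}-}(k) = 0$ (resp., $\Delta_{\mathrm{ec}}^{\mathrm{CT}+}(k) = \deg_G(u)$ and $\Delta_{\mathrm{ec}}^{\mathrm{CT}-}(k) = \deg_G(v)$) $\Leftrightarrow$ edge $e^{\mathrm{CT}}_{\mathrm{tail}^{\mathrm{C}}(k),j} = (u,v) \in E_{\mathrm{CT}}$ for some $j \in [1,t_{\mathrm{T}}]$ is used in $G$ (resp., otherwise);
$\Delta_{\mathrm{ec}}^{\mathrm{TC}+}(k), \Delta_{\mathrm{ec}}^{\mathrm{TC}-}(k) \in [0,4]$, $k \in [1,k_{\mathrm{C}}] = I_{(\ge 2)} \cup I_{(\ge 1)}$: analogous to $\Delta_{\mathrm{ec}}^{\mathrm{CT}+}(k)$ and $\Delta_{\mathrm{ec}}^{\mathrm{CT}-}(k)$;
$\Delta_{\mathrm{ec}}^{\mathrm{CF}+}(c), \Delta_{\mathrm{ec}}^{\mathrm{CF}-}(c) \in [0,4]$, $c \in [1,\widetilde{t_{\mathrm{C}}}]$: $\Delta_{\mathrm{ec}}^{\mathrm{CF}+}(c) = \Delta_{\mathrm{ec}}^{\mathrm{CF}-}(c) = 0$ (resp., $\Delta_{\mathrm{ec}}^{\mathrm{CF}+}(c) = \deg_G(u)$ and $\Delta_{\mathrm{ec}}^{\mathrm{CF}-}(c) = \deg_G(v)$) $\Leftrightarrow$ edge $e^{\mathrm{CF}}_{c,j} = (u,v) \in E_{\mathrm{CF}}$ for some $j \in [1,t_{\mathrm{F}}]$ is used in $G$ (resp., otherwise);
$\Delta_{\mathrm{ec}}^{\mathrm{TF}+}(i), \Delta_{\mathrm{ec}}^{\mathrm{TF}-}(i) \in [0,4]$, $i \in [1,t_{\mathrm{T}}]$: analogous to $\Delta_{\mathrm{ec}}^{\mathrm{CF}+}(c)$ and $\Delta_{\mathrm{ec}}^{\mathrm{CF}-}(c)$;

constraints:
\[
\begin{aligned}
&\mathrm{ec}_{\mathrm{C}}([\gamma]^{\mathrm{co}}) = 0,\ \gamma \in \Gamma^{\mathrm{co}} \setminus \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{C}}, \qquad \mathrm{ec}_{\mathrm{T}}([\gamma]^{\mathrm{co}}) = 0,\ \gamma \in \Gamma^{\mathrm{co}} \setminus \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{T}}, \\
&\mathrm{ec}_{\mathrm{F}}([\gamma]^{\mathrm{in}}) = 0,\ \gamma \in \Gamma^{\mathrm{in}} \setminus \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{F}}, \\
&\mathrm{ec}_{X(p)}([\gamma]^{\mathrm{ex}}) = 0,\ \gamma \in \Gamma^{\mathrm{ex}} \setminus \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{ex}},\ p \in [1,\rho],\ X \in \{\mathrm{C},\mathrm{T},\mathrm{F}\}, \\
&\mathrm{ec}_{\mathrm{CT}}([\gamma]^{\mathrm{co}}) = 0,\ \gamma \in \Gamma^{\mathrm{co}} \setminus \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{CT}}, \qquad \mathrm{ec}_{\mathrm{TC}}([\gamma]^{\mathrm{co}}) = 0,\ \gamma \in \Gamma^{\mathrm{co}} \setminus \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{TC}}, \\
&\mathrm{ec}_{\mathrm{CF}}([\gamma]^{\mathrm{in}}) = 0,\ \gamma \in \Gamma^{\mathrm{in}} \setminus \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{CF}}, \qquad \mathrm{ec}_{\mathrm{TF}}([\gamma]^{\mathrm{in}}) = 0,\ \gamma \in \Gamma^{\mathrm{in}} \setminus \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{TF}},
\end{aligned}
\tag{89}
\]
\[
\begin{aligned}
&\sum_{(\mu,\mu',m)=\gamma \in \Gamma^{\mathrm{co}}} \mathrm{ec}_{\mathrm{C}}([\gamma]^{\mathrm{co}}) = \sum_{i \in [\widetilde{k_{\mathrm{C}}}+1,\, m_{\mathrm{C}}]} \delta_{\beta}^{\mathrm{C}}(i,m), \quad m \in [1,3], \\
&\sum_{(\mu,\mu',m)=\gamma \in \Gamma^{\mathrm{co}}} \mathrm{ec}_{\mathrm{T}}([\gamma]^{\mathrm{co}}) = \sum_{i \in [2,t_{\mathrm{T}}]} \delta_{\beta}^{\mathrm{T}}(i,m), \quad m \in [1,3], \\
&\sum_{(\mu,\mu',m)=\gamma \in \Gamma^{\mathrm{in}}} \mathrm{ec}_{\mathrm{F}}([\gamma]^{\mathrm{in}}) = \sum_{i \in [2,t_{\mathrm{F}}]} \delta_{\beta}^{\mathrm{F}}(i,m), \quad m \in [1,3], \\
&\sum_{(\mu,\mu',m)=\gamma \in \Gamma^{\mathrm{ex}}} \mathrm{ec}_{X(p)}([\gamma]^{\mathrm{ex}}) = \sum_{i \in [1,t_X],\, j \in \mathrm{Dsn}_X(p)} \delta_{\beta}^{X}(i,j,m), \quad p \in [1,\rho],\ X \in \{\mathrm{C},\mathrm{T},\mathrm{F}\},\ m \in [1,3], \\
&\sum_{(\mu,\mu',m)=\gamma \in \Gamma^{\mathrm{co}}} \mathrm{ec}_{\mathrm{CT}}([\gamma]^{\mathrm{co}}) = \sum_{k \in [1,k_{\mathrm{C}}]} \delta_{\beta}^{+}(k,m), \quad m \in [1,3], \\
&\sum_{(\mu,\mu',m)=\gamma \in \Gamma^{\mathrm{co}}} \mathrm{ec}_{\mathrm{TC}}([\gamma]^{\mathrm{co}}) = \sum_{k \in [1,k_{\mathrm{C}}]} \delta_{\beta}^{-}(k,m), \quad m \in [1,3], \\
&\sum_{(\mu,\mu',m)=\gamma \in \Gamma^{\mathrm{in}}} \mathrm{ec}_{\mathrm{CF}}([\gamma]^{\mathrm{in}}) = \sum_{c \in [1,\widetilde{t_{\mathrm{C}}}]} \delta_{\beta}^{\mathrm{in}}(c,m), \quad m \in [1,3], \\
&\sum_{(\mu,\mu',m)=\gamma \in \Gamma^{\mathrm{in}}} \mathrm{ec}_{\mathrm{TF}}([\gamma]^{\mathrm{in}}) = \sum_{c \in [\widetilde{t_{\mathrm{C}}}+1,\, c_{\mathrm{F}}]} \delta_{\beta}^{\mathrm{in}}(c,m), \quad m \in [1,3],
\end{aligned}
\tag{90}
\]
\[
\begin{aligned}
&\sum_{\gamma=(\mathtt{a}d,\mathtt{b}d',m) \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{C}}} [(\mathtt{a},\mathtt{b},m)]^{\mathrm{co}} \cdot \delta_{\mathrm{ec}}^{\mathrm{C}}(i,[\gamma]^{\mathrm{co}}) = \sum_{\nu \in \widetilde{\Gamma}_{\mathrm{ac}}^{\mathrm{C}}} [\nu]^{\mathrm{co}} \cdot \delta_{\mathrm{ac}}^{\mathrm{C}}(i,[\nu]^{\mathrm{co}}), \\
&\Delta_{\mathrm{ec}}^{\mathrm{C}+}(i) + \sum_{\gamma=(\mathtt{a}d,\xi,m) \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{C}}} d \cdot \delta_{\mathrm{ec}}^{\mathrm{C}}(i,[\gamma]^{\mathrm{co}}) = \deg^{\mathrm{C}}(\mathrm{tail}^{\mathrm{C}}(i),0), \\
&\Delta_{\mathrm{ec}}^{\mathrm{C}-}(i) + \sum_{\gamma=(\mu,\mathtt{b}d,m) \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{C}}} d \cdot \delta_{\mathrm{ec}}^{\mathrm{C}}(i,[\gamma]^{\mathrm{co}}) = \deg^{\mathrm{C}}(\mathrm{head}^{\mathrm{C}}(i),0), \\
&\Delta_{\mathrm{ec}}^{\mathrm{C}+}(i) + \Delta_{\mathrm{ec}}^{\mathrm{C}-}(i) \le 8(1 - e^{\mathrm{C}}(i)), \quad i \in [\widetilde{k_{\mathrm{C}}}+1, m_{\mathrm{C}}], \\
&\sum_{i \in [\widetilde{k_{\mathrm{C}}}+1,\, m_{\mathrm{C}}]} \delta_{\mathrm{ec}}^{\mathrm{C}}(i,[\gamma]^{\mathrm{co}}) = \mathrm{ec}_{\mathrm{C}}([\gamma]^{\mathrm{co}}), \quad \gamma \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{C}},
\end{aligned}
\tag{91}
\]
\[
\begin{aligned}
&\sum_{\gamma=(\mathtt{a}d,\mathtt{b}d',m) \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{T}}} [(\mathtt{a},\mathtt{b},m)]^{\mathrm{co}} \cdot \delta_{\mathrm{ec}}^{\mathrm{T}}(i,[\gamma]^{\mathrm{co}}) = \sum_{\nu \in \widetilde{\Gamma}_{\mathrm{ac}}^{\mathrm{T}}} [\nu]^{\mathrm{co}} \cdot \delta_{\mathrm{ac}}^{\mathrm{T}}(i,[\nu]^{\mathrm{co}}), \\
&\Delta_{\mathrm{ec}}^{\mathrm{T}+}(i) + \sum_{\gamma=(\mathtt{a}d,\xi,m) \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{T}}} d \cdot \delta_{\mathrm{ec}}^{\mathrm{T}}(i,[\gamma]^{\mathrm{co}}) = \deg^{\mathrm{T}}(i-1,0), \\
&\Delta_{\mathrm{ec}}^{\mathrm{T}-}(i) + \sum_{\gamma=(\mu,\mathtt{b}d,m) \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{T}}} d \cdot \delta_{\mathrm{ec}}^{\mathrm{T}}(i,[\gamma]^{\mathrm{co}}) = \deg^{\mathrm{T}}(i,0), \\
&\Delta_{\mathrm{ec}}^{\mathrm{T}+}(i) + \Delta_{\mathrm{ec}}^{\mathrm{T}-}(i) \le 8(1 - e^{\mathrm{T}}(i)), \quad i \in [2,t_{\mathrm{T}}], \\
&\sum_{i \in [2,t_{\mathrm{T}}]} \delta_{\mathrm{ec}}^{\mathrm{T}}(i,[\gamma]^{\mathrm{co}}) = \mathrm{ec}_{\mathrm{T}}([\gamma]^{\mathrm{co}}), \quad \gamma \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{T}},
\end{aligned}
\tag{92}
\]
\[
\begin{aligned}
&\sum_{\gamma=(\mathtt{a}d,\mathtt{b}d',m) \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{F}}} [(\mathtt{a},\mathtt{b},m)]^{\mathrm{in}} \cdot \delta_{\mathrm{ec}}^{\mathrm{F}}(i,[\gamma]^{\mathrm{in}}) = \sum_{\nu \in \widetilde{\Gamma}_{\mathrm{ac}}^{\mathrm{F}}} [\nu]^{\mathrm{in}} \cdot \delta_{\mathrm{ac}}^{\mathrm{F}}(i,[\nu]^{\mathrm{in}}), \\
&\Delta_{\mathrm{ec}}^{\mathrm{F}+}(i) + \sum_{\gamma=(\mathtt{a}d,\xi,m) \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{F}}} d \cdot \delta_{\mathrm{ec}}^{\mathrm{F}}(i,[\gamma]^{\mathrm{in}}) = \deg^{\mathrm{F}}(i-1,0), \\
&\Delta_{\mathrm{ec}}^{\mathrm{F}-}(i) + \sum_{\gamma=(\mu,\mathtt{b}d,m) \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{F}}} d \cdot \delta_{\mathrm{ec}}^{\mathrm{F}}(i,[\gamma]^{\mathrm{in}}) = \deg^{\mathrm{F}}(i,0), \\
&\Delta_{\mathrm{ec}}^{\mathrm{F}+}(i) + \Delta_{\mathrm{ec}}^{\mathrm{F}-}(i) \le 8(1 - e^{\mathrm{F}}(i)), \quad i \in [2,t_{\mathrm{F}}], \\
&\sum_{i \in [2,t_{\mathrm{F}}]} \delta_{\mathrm{ec}}^{\mathrm{F}}(i,[\gamma]^{\mathrm{in}}) = \mathrm{ec}_{\mathrm{F}}([\gamma]^{\mathrm{in}}), \quad \gamma \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{F}},
\end{aligned}
\tag{93}
\]
\[
\begin{aligned}
&\sum_{\gamma=(\mathtt{a}d,\mathtt{b}d',m) \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{ex}}} [(\mathtt{a},\mathtt{b},m)]^{\mathrm{ex}} \cdot \delta_{\mathrm{ec}}^{X}(i,j,[\gamma]^{\mathrm{ex}}) = \sum_{\nu \in \widetilde{\Gamma}_{\mathrm{ac}}^{\mathrm{ex}}} [\nu]^{\mathrm{ex}} \cdot \delta_{\mathrm{ac}}^{X}(i,j,[\nu]^{\mathrm{ex}}), \\
&\Delta_{\mathrm{ec}}^{X+}(i,j) + \sum_{\gamma=(\mathtt{a}d,\xi,m) \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{ex}}} d \cdot \delta_{\mathrm{ec}}^{X}(i,j,[\gamma]^{\mathrm{ex}}) = \deg^{X}(i,\mathrm{prt}_X(j)), \\
&\Delta_{\mathrm{ec}}^{X-}(i,j) + \sum_{\gamma=(\mu,\mathtt{b}d,m) \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{ex}}} d \cdot \delta_{\mathrm{ec}}^{X}(i,j,[\gamma]^{\mathrm{ex}}) = \deg^{X}(i,j), \\
&\Delta_{\mathrm{ec}}^{X+}(i,j) + \Delta_{\mathrm{ec}}^{X-}(i,j) \le 8(1 - v^{X}(i,j)), \quad i \in [1,t_X],\ j \in [1,n_X], \\
&\sum_{i \in [1,t_X],\, j \in \mathrm{Dsn}_X(p)} \delta_{\mathrm{ec}}^{X}(i,j,[\gamma]^{\mathrm{ex}}) = \mathrm{ec}_{X(p)}([\gamma]^{\mathrm{ex}}), \quad \gamma \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{ex}},\ p \in [1,\rho],\ X \in \{\mathrm{C},\mathrm{T},\mathrm{F}\},
\end{aligned}
\tag{94}
\]
\[
\begin{aligned}
&\deg^{\mathrm{T}}(i,0) + 4(1 - \chi^{\mathrm{T}}(i,k) + e^{\mathrm{T}}(i)) \ge \deg_{\mathrm{T}}^{\mathrm{CT}}(k), \\
&\deg_{\mathrm{T}}^{\mathrm{CT}}(k) \ge \deg^{\mathrm{T}}(i,0) - 4(1 - \chi^{\mathrm{T}}(i,k) + e^{\mathrm{T}}(i)), \quad i \in [1,t_{\mathrm{T}}], \\
&\sum_{\gamma=(\mathtt{a}d,\mathtt{b}d',m) \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{CT}}} [(\mathtt{a},\mathtt{b},m)]^{\mathrm{co}} \cdot \delta_{\mathrm{ec}}^{\mathrm{CT}}(k,[\gamma]^{\mathrm{co}}) = \sum_{\nu \in \widetilde{\Gamma}_{\mathrm{ac}}^{\mathrm{CT}}} [\nu]^{\mathrm{co}} \cdot \delta_{\mathrm{ac}}^{\mathrm{CT}}(k,[\nu]^{\mathrm{co}}), \\
&\Delta_{\mathrm{ec}}^{\mathrm{CT}+}(k) + \sum_{\gamma=(\mathtt{a}d,\xi,m) \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{CT}}} d \cdot \delta_{\mathrm{ec}}^{\mathrm{CT}}(k,[\gamma]^{\mathrm{co}}) = \deg^{\mathrm{C}}(\mathrm{tail}^{\mathrm{C}}(k),0), \\
&\Delta_{\mathrm{ec}}^{\mathrm{CT}-}(k) + \sum_{\gamma=(\mu,\mathtt{b}d,m) \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{CT}}} d \cdot \delta_{\mathrm{ec}}^{\mathrm{CT}}(k,[\gamma]^{\mathrm{co}}) = \deg_{\mathrm{T}}^{\mathrm{CT}}(k), \\
&\Delta_{\mathrm{ec}}^{\mathrm{CT}+}(k) + \Delta_{\mathrm{ec}}^{\mathrm{CT}-}(k) \le 8(1 - \delta_{\chi}^{\mathrm{T}}(k)), \quad k \in [1,k_{\mathrm{C}}], \\
&\sum_{k \in [1,k_{\mathrm{C}}]} \delta_{\mathrm{ec}}^{\mathrm{CT}}(k,[\gamma]^{\mathrm{co}}) = \mathrm{ec}_{\mathrm{CT}}([\gamma]^{\mathrm{co}}), \quad \gamma \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{CT}},
\end{aligned}
\tag{95}
\]
\[
\begin{aligned}
&\deg^{\mathrm{T}}(i,0) + 4(1 - \chi^{\mathrm{T}}(i,k) + e^{\mathrm{T}}(i+1)) \ge \deg_{\mathrm{T}}^{\mathrm{TC}}(k), \\
&\deg_{\mathrm{T}}^{\mathrm{TC}}(k) \ge \deg^{\mathrm{T}}(i,0) - 4(1 - \chi^{\mathrm{T}}(i,k) + e^{\mathrm{T}}(i+1)), \quad i \in [1,t_{\mathrm{T}}], \\
&\sum_{\gamma=(\mathtt{a}d,\mathtt{b}d',m) \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{TC}}} [(\mathtt{a},\mathtt{b},m)]^{\mathrm{co}} \cdot \delta_{\mathrm{ec}}^{\mathrm{TC}}(k,[\gamma]^{\mathrm{co}}) = \sum_{\nu \in \widetilde{\Gamma}_{\mathrm{ac}}^{\mathrm{TC}}} [\nu]^{\mathrm{co}} \cdot \delta_{\mathrm{ac}}^{\mathrm{TC}}(k,[\nu]^{\mathrm{co}}), \\
&\Delta_{\mathrm{ec}}^{\mathrm{TC}+}(k) + \sum_{\gamma=(\mathtt{a}d,\xi,m) \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{TC}}} d \cdot \delta_{\mathrm{ec}}^{\mathrm{TC}}(k,[\gamma]^{\mathrm{co}}) = \deg_{\mathrm{T}}^{\mathrm{TC}}(k), \\
&\Delta_{\mathrm{ec}}^{\mathrm{TC}-}(k) + \sum_{\gamma=(\mu,\mathtt{b}d,m) \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{TC}}} d \cdot \delta_{\mathrm{ec}}^{\mathrm{TC}}(k,[\gamma]^{\mathrm{co}}) = \deg^{\mathrm{C}}(\mathrm{head}^{\mathrm{C}}(k),0), \\
&\Delta_{\mathrm{ec}}^{\mathrm{TC}+}(k) + \Delta_{\mathrm{ec}}^{\mathrm{TC}-}(k) \le 8(1 - \delta_{\chi}^{\mathrm{T}}(k)), \quad k \in [1,k_{\mathrm{C}}], \\
&\sum_{k \in [1,k_{\mathrm{C}}]} \delta_{\mathrm{ec}}^{\mathrm{TC}}(k,[\gamma]^{\mathrm{co}}) = \mathrm{ec}_{\mathrm{TC}}([\gamma]^{\mathrm{co}}), \quad \gamma \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{TC}},
\end{aligned}
\tag{96}
\]
\[
\begin{aligned}
&\deg^{\mathrm{F}}(i,0) + 4(1 - \chi^{\mathrm{F}}(i,c) + e^{\mathrm{F}}(i)) \ge \deg_{\mathrm{F}}^{\mathrm{CF}}(c), \\
&\deg_{\mathrm{F}}^{\mathrm{CF}}(c) \ge \deg^{\mathrm{F}}(i,0) - 4(1 - \chi^{\mathrm{F}}(i,c) + e^{\mathrm{F}}(i)), \quad i \in [1,t_{\mathrm{F}}], \\
&\sum_{\gamma=(\mathtt{a}d,\mathtt{b}d',m) \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{CF}}} [(\mathtt{a},\mathtt{b},m)]^{\mathrm{in}} \cdot \delta_{\mathrm{ec}}^{\mathrm{CF}}(c,[\gamma]^{\mathrm{in}}) = \sum_{\nu \in \widetilde{\Gamma}_{\mathrm{ac}}^{\mathrm{CF}}} [\nu]^{\mathrm{in}} \cdot \delta_{\mathrm{ac}}^{\mathrm{CF}}(c,[\nu]^{\mathrm{in}}), \\
&\Delta_{\mathrm{ec}}^{\mathrm{CF}+}(c) + \sum_{\gamma=(\mathtt{a}d,\xi,m) \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{CF}}} d \cdot \delta_{\mathrm{ec}}^{\mathrm{CF}}(c,[\gamma]^{\mathrm{in}}) = \deg^{\mathrm{C}}(c,0), \\
&\Delta_{\mathrm{ec}}^{\mathrm{CF}-}(c) + \sum_{\gamma=(\mu,\mathtt{b}d,m) \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{CF}}} d \cdot \delta_{\mathrm{ec}}^{\mathrm{CF}}(c,[\gamma]^{\mathrm{in}}) = \deg_{\mathrm{F}}^{\mathrm{CF}}(c), \\
&\Delta_{\mathrm{ec}}^{\mathrm{CF}+}(c) + \Delta_{\mathrm{ec}}^{\mathrm{CF}-}(c) \le 8(1 - \delta_{\chi}^{\mathrm{F}}(c)), \quad c \in [1,\widetilde{t_{\mathrm{C}}}], \\
&\sum_{c \in [1,\widetilde{t_{\mathrm{C}}}]} \delta_{\mathrm{ec}}^{\mathrm{CF}}(c,[\gamma]^{\mathrm{in}}) = \mathrm{ec}_{\mathrm{CF}}([\gamma]^{\mathrm{in}}), \quad \gamma \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{CF}},
\end{aligned}
\tag{97}
\]
\[
\begin{aligned}
&\deg^{\mathrm{F}}(j,0) + 4(1 - \chi^{\mathrm{F}}(j,i+\widetilde{t_{\mathrm{C}}}) + e^{\mathrm{F}}(j)) \ge \deg_{\mathrm{F}}^{\mathrm{TF}}(i), \\
&\deg_{\mathrm{F}}^{\mathrm{TF}}(i) \ge \deg^{\mathrm{F}}(j,0) - 4(1 - \chi^{\mathrm{F}}(j,i+\widetilde{t_{\mathrm{C}}}) + e^{\mathrm{F}}(j)), \quad j \in [1,t_{\mathrm{F}}], \\
&\sum_{\gamma=(\mathtt{a}d,\mathtt{b}d',m) \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{TF}}} [(\mathtt{a},\mathtt{b},m)]^{\mathrm{in}} \cdot \delta_{\mathrm{ec}}^{\mathrm{TF}}(i,[\gamma]^{\mathrm{in}}) = \sum_{\nu \in \widetilde{\Gamma}_{\mathrm{ac}}^{\mathrm{TF}}} [\nu]^{\mathrm{in}} \cdot \delta_{\mathrm{ac}}^{\mathrm{TF}}(i,[\nu]^{\mathrm{in}}), \\
&\Delta_{\mathrm{ec}}^{\mathrm{TF}+}(i) + \sum_{\gamma=(\mathtt{a}d,\xi,m) \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{TF}}} d \cdot \delta_{\mathrm{ec}}^{\mathrm{TF}}(i,[\gamma]^{\mathrm{in}}) = \deg^{\mathrm{T}}(i,0), \\
&\Delta_{\mathrm{ec}}^{\mathrm{TF}-}(i) + \sum_{\gamma=(\mu,\mathtt{b}d,m) \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{TF}}} d \cdot \delta_{\mathrm{ec}}^{\mathrm{TF}}(i,[\gamma]^{\mathrm{in}}) = \deg_{\mathrm{F}}^{\mathrm{TF}}(i), \\
&\Delta_{\mathrm{ec}}^{\mathrm{TF}+}(i) + \Delta_{\mathrm{ec}}^{\mathrm{TF}-}(i) \le 8(1 - \delta_{\chi}^{\mathrm{F}}(i+\widetilde{t_{\mathrm{C}}})), \quad i \in [1,t_{\mathrm{T}}], \\
&\sum_{i \in [1,t_{\mathrm{T}}]} \delta_{\mathrm{ec}}^{\mathrm{TF}}(i,[\gamma]^{\mathrm{in}}) = \mathrm{ec}_{\mathrm{TF}}([\gamma]^{\mathrm{in}}), \quad \gamma \in \widetilde{\Gamma}_{\mathrm{ec}}^{\mathrm{TF}},
\end{aligned}
\tag{98}
\]
\[
\begin{aligned}
&\sum_{X \in \{\mathrm{C},\mathrm{T},\mathrm{CT},\mathrm{TC}\}} \bigl(\mathrm{ec}_{X}([\gamma]^{\mathrm{co}}) + \mathrm{ec}_{X}([\overline{\gamma}]^{\mathrm{co}})\bigr) = \mathrm{ec}^{\mathrm{co}}([\gamma]^{\mathrm{co}}), \quad \gamma \in \Gamma_{<}^{\mathrm{co}}, \\
&\sum_{X \in \{\mathrm{C},\mathrm{T},\mathrm{CT},\mathrm{TC}\}} \mathrm{ec}_{X}([\gamma]^{\mathrm{co}}) = \mathrm{ec}^{\mathrm{co}}([\gamma]^{\mathrm{co}}), \quad \gamma \in \Gamma_{=}^{\mathrm{co}}, \\
&\sum_{X \in \{\mathrm{F},\mathrm{CF},\mathrm{TF}\}} \mathrm{ec}_{X}([\gamma]^{\mathrm{in}}) = \mathrm{ec}^{\mathrm{in}}([\gamma]^{\mathrm{in}}), \quad \gamma \in \Gamma^{\mathrm{in}}, \\
&\sum_{p \in [1,\rho],\, X \in \{\mathrm{T},\mathrm{C},\mathrm{F}\}} \mathrm{ec}_{X(p)}([\gamma]^{\mathrm{ex}}) = \mathrm{ec}^{\mathrm{ex}}([\gamma]^{\mathrm{ex}}), \quad \gamma \in \Gamma^{\mathrm{ex}}.
\end{aligned}
\]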
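As a sanity check on what the descriptors of Sections A.9 and A.10 are meant to count, the following minimal Python sketch computes chemical symbols cs(v) = (element, degree) and edge-configuration frequencies directly from a small, fully specified graph. It is only an illustration under stated assumptions, not the MILP formulation itself: the function names, the toy molecule, and the use of Python's tuple ordering as the total order that makes µ ≤ ξ are all hypothetical choices, and the sketch ignores the partition into core-edges, ρ-internal edges and ρ-external edges.

```python
from collections import Counter

def chemical_symbols(atoms, bonds):
    """Return {vertex: (element, degree)}, i.e. cs(v) = a d for each vertex.

    atoms: dict mapping vertex id -> element symbol (hydrogen-suppressed).
    bonds: list of (u, v, m) with bond-multiplicity m.
    Degree is the number of neighbours in the given graph (assumption).
    """
    degree = Counter()
    for u, v, _m in bonds:
        degree[u] += 1
        degree[v] += 1
    return {v: (elem, degree[v]) for v, elem in atoms.items()}

def edge_configuration_counts(atoms, bonds):
    """Count edge-configurations (mu, xi, m) with mu <= xi over undirected edges.

    The order on chemical symbols is Python's tuple order on
    (element, degree); this is an illustrative assumption.
    """
    cs = chemical_symbols(atoms, bonds)
    counts = Counter()
    for u, v, m in bonds:
        mu, xi = sorted((cs[u], cs[v]))  # canonical orientation mu <= xi
        counts[(mu, xi, m)] += 1
    return counts

# Hypothetical toy input: hydrogen-suppressed acetic acid, C-C(=O)-O.
atoms = {1: "C", 2: "C", 3: "O", 4: "O"}
bonds = [(1, 2, 1), (2, 3, 2), (2, 4, 1)]

symbol_counts = Counter(chemical_symbols(atoms, bonds).values())
ec_counts = edge_configuration_counts(atoms, bonds)

for gamma, freq in sorted(ec_counts.items()):
    print(gamma, freq)
```

In a feasible MILP solution, the sum of the per-component counts for each γ should agree with a direct count of this kind on the inferred graph, which makes such a routine a convenient check when decoding a solution.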