Efficient Network Reliability Computation in Uncertain Graphs
Yuya Sasaki †, Yasuhiro Fujiwara §†, Makoto Onizuka †
† Graduate School of Information Science and Technology, Osaka University, Osaka, Japan
§ NTT Software Innovation Center, Tokyo
ABSTRACT
Network reliability is an important metric to evaluate the connectivity among given vertices in uncertain graphs. Since the network reliability problem is known as [...] stratified sampling. We theoretically guarantee that our approach improves the accuracy of approximation by using lower and upper bounds of network reliability, even though it reduces the number of samples. To efficiently compute the bounds, we develop an extended BDD, called S BDD. While constructing the S BDD, our approach employs dynamic programming to efficiently sample possible graphs. Our experiments with real datasets demonstrate that our approach is up to 51.2 times faster than the existing sampling-based approach, with higher accuracy.
To understand and design our world, we need to model and analyze relationships between objects. Objects and relationships can be modeled by a graph, whose vertices and edges represent the objects and the relationships, respectively. Graph analysis is widely used in many domains, and reachability [8, 34, 37] and network reliability [5, 10, 33] are fundamental research topics in graph analysis. Reachability techniques compute whether there are paths between two terminals (i.e., given vertices). Network reliability techniques, on the other hand, compute the probability that all pairs of terminals are connected in uncertain graphs. In an uncertain graph, each edge is associated with an edge existence probability that quantifies the likelihood that the edge exists in the graph. Network reliability generalizes reachability in two respects: (1) it is a probabilistic value (reachability is binary) and (2) it handles any number of terminals. Thus, network reliability techniques have two benefits over reachability techniques. First, we can handle the inherent uncertainty of relationships in the real world by modeling the uncertainty as the edge existence probability [1, 23]. Second, we can flexibly specify an arbitrary number of terminals. Owing to these two benefits, network reliability can be widely used for uncertain graph analysis [6, 36] and many practical applications [20]. For example, protein-protein interaction networks can be modeled by uncertain graphs since protein interactions are not always established due to their sensitivity to conditions [4, 17]. In such protein-protein interaction networks, analysts evaluate the network reliability among several proteins as the strengths of the relationships to elucidate the functions of proteins. Network reliability is also used in many domains such as communication networks [5, 29] and urban planning [13].

© 2019 Copyright held by the owner/author(s). Published in Proceedings of the 22nd International Conference on Extending Database Technology (EDBT), March 26-29, 2019, ISBN 978-3-89318-081-3 on OpenProceedings.org. Distribution of this paper is permitted under the terms of the Creative Commons license CC-by-nc-nd 4.0.

Figure 1: Uncertain graph. (a) Original graph. (b) Possible graphs.

Unfortunately, the computation cost of the network reliability is significantly large because it is [...] possible graphs which have the same set of vertices and an arbitrary subset of the edges without their probabilities. Each possible graph has a probability computed from the existence probabilities of its edges. The set of possible graphs is logically equivalent to its original uncertain graph. To compute the network reliability, we sum up the probabilities of all possible graphs in which all the terminals are connected.

We explain an example of computing the network reliability by using Figure 1. This figure shows an original uncertain graph and three examples of its possible graphs. The black vertices represent terminals. Let us assume that each edge has 0.7 as its existence probability. Since each of these possible graphs has four existent and two non-existent edges, its probability is 0.0216 (i.e., $0.7^4 \cdot (1 - 0.7)^2$). All the terminals are connected only in the left and middle possible graphs. Thus, their probabilities are added to the network reliability.

Problem Definition and Technical Overview
We approximate the network reliability since the computation cost of the network reliability is significantly large due to [...]
Problem definition (Approximate network reliability). Given an uncertain graph $\mathcal{G}$, a set of terminals $T$, and the number of samples $s$, we efficiently compute the approximate network reliability $\hat{R}[\mathcal{G}, T]$.

The computation cost of sampling becomes considerable as the number of samples increases. To efficiently approximate the network reliability, we reduce the number of samples while keeping a high accuracy. Our challenges are (1) how to reduce the number of samples with a theoretical guarantee on the accuracy and (2) how to practically achieve the theoretical results from the first challenge. As for the first challenge, we extend stratified sampling [32], which increases the accuracy of an estimated value by using lower and upper bounds of the value. We first prove a theorem stating that we can reduce the number of samples without sacrificing the accuracy of approximation.

We can reduce the number of samples in accordance with the theoretical results. The theoretical results have two requirements: (1) to efficiently compute the approximate network reliability, we need to efficiently obtain tight lower and upper bounds of the network reliability, and (2) to guarantee the accuracy of approximation, we need to sample possible graphs from the set of possible graphs that are not used to compute the bounds. There are no trivial techniques to effectively achieve them. Therefore, we develop an extended binary decision diagram, which we call scalable and sampling BDD (S BDD for short). The S BDD enables preferentially searching for possible graphs in which terminals are connected/disconnected. The connected and disconnected possible graphs are used for computing the lower and upper bounds. Our approach employs dynamic programming while constructing the S BDD to efficiently sample the possible graphs.
It enables us to avoid sampling possible graphs from the set of possible graphs that are used to compute the bounds.

Furthermore, our approach becomes more efficient by reducing the size of graphs. Thus, we propose an extension technique for our approach that uses 2-edge connected components [7]. The extension technique prunes vertices and edges that do not affect the network reliability, decomposes the graph into several subgraphs, and transforms the subgraphs into smaller graphs. It efficiently reduces the vertices and edges involved in the computation while preserving the network reliability.
Contributions and Organization
To the best of our knowledge, our approach is the first solution to achieve both high efficiency and accuracy in computing the network reliability. Our approach has the following attractive characteristics.
• Our approach improves the efficiency of computing an approximate network reliability by reducing the number of samples. The extension technique effectively reduces the size of graphs while preserving the network reliability.
• Our approach outputs more accurate network reliability than the existing approaches. We theoretically guarantee that our approach improves the accuracy of approximation, even though it reduces the number of samples.
• Our approach computes the exact answer for small-scale graphs owing to the S BDD, whereas the existing sampling-based approach cannot compute the exact answer.
• Our approach can be used to improve the performance of uncertain graph analyses [6, 18, 22] in terms of both accuracy and efficiency because many algorithms compute the network reliability by sampling techniques.

The remainder of this paper is organized as follows. Section 2 introduces related work. Section 3 then describes the preliminaries. Sections 4 and 5 present our approach and an extension technique for our approach, respectively. Section 6 describes algorithms of our approach with the extension. Section 7 shows the results obtained from the experiments, and Section 8 concludes the paper.
Querying and mining uncertain graphs have recently attracted much attention in the database and data mining research communities. We review some relevant works related to the network reliability problem.
Network reliability:
For computing the network reliability, several approaches have been proposed, such as the cut-based approach and the BDD-based approach. The cut-based approach [3, 15, 25] enumerates all cuts that divide the terminals and then computes the network reliability by using the set of cuts. Harris and Srinivasan [15] proposed a theoretical result for obtaining a lower bound of network reliability based on cuts. However, they do not mention how to efficiently obtain the cuts. The BDD-based approach is more efficient than the cut-based approach. The BDD-based approach [14, 26, 35] effectively avoids enumerating all possible graphs without sacrificing the exactness of the network reliability. However, it is not applicable to large graphs due to its large memory usage. The BDD-based approach first constructs a BDD, and then obtains the possible graphs in which terminals are connected by traversing the BDD. Recent work has shown that the BDD-based approach can be applied only to graphs with 100–200 edges because of limitations of memory space [14, 26]. The state-of-the-art library TdZDD also can only be applied to very small-scale graphs. Herrmann and Soh [16] proposed a memory-efficient BDD that computes the network reliability by constructing a BDD and deleting unnecessary parts of it during the process. We partially adopt their idea to reduce the memory usage. There are several preprocessing and indexing techniques to efficiently compute the network reliability (and similar problems) [12, 24]. These techniques remove redundant parts of graphs, which is similar in idea to our extension technique. However, these techniques cannot be directly applied to $k$-terminal reliability. To the best of our knowledge, there has been no prior work on approximating the network reliability with BDDs.

Reachability query in uncertain graphs:
The reachability in uncertain graphs is a special type of network reliability (called $s$-$t$ network reliability) [2]. Jin et al. [19] proposed a distance-constraint reachability query in uncertain graphs, which answers the probability that the distance from one vertex to another is less than or equal to a threshold. They proposed approximate algorithms as solutions to this problem. The approximate algorithms use unequal sampling techniques [31] and achieve higher accuracy than Monte Carlo sampling. Cheng et al. [9] proposed an algorithm to compute the reachability in distributed environments. The algorithm reduces the size of graphs without sacrificing the exactness of the result before computing the reachability. It divides the graph into several subgraphs and computes probabilities of the subgraphs in distributed environments. The algorithm is only applicable to directed acyclic graphs. While these algorithms [9, 19] deal with uncertain graphs, their objective is to compute reachability, and they cannot be applied to computing the network reliability.

Other problems with uncertain graphs:
Many existing works on uncertain graphs use the network reliability as the metric to evaluate the connectivity among vertices. The efficiency and accuracy of their algorithms depend on those of the sampling techniques. Although they use sampling techniques to compute the network reliability, they have not proposed efficient sampling techniques. Jin et al. [18] proposed an algorithm for finding reliable subgraphs in which the vertices are connected with a higher probability than a given threshold. Ceccarello et al. [6] proposed clustering techniques for uncertain graphs. The techniques use the network reliabilities between vertices as distances between them. Khan et al. [22] proposed a reliability search that returns a set of vertices that are connected from given vertices with a higher probability than a threshold. These studies have different purposes, but they all use Monte Carlo sampling to compute the network reliability. Our approach can be used in place of Monte Carlo sampling to improve their performance in terms of both accuracy and efficiency.

TdZDD library: https://github.com/kunisura/TdZdd

Table 1: Notations
Symbol                          Meaning
$\mathcal{G}$                   Uncertain graph
$V$                             Set of vertices
$E$                             Set of edges $e = (v, v')$
$p(e)$                          Edge existence probability of $e$
$G_p$                           Possible graph
$E_p$                           Set of edges in $G_p$
$\Pr[G_p]$                      Existence probability of $G_p$
$\mathcal{G}_E$                 Intermediate graph
$E_\exists$                     Set of existent edges in $\mathcal{G}_E$
$E_\neg$                        Set of non-existent edges in $\mathcal{G}_E$
$\Pr[\mathcal{G}_E]$            Existence probability of $\mathcal{G}_E$
$T$                             Set of terminals
$R[\mathcal{G}, T]$             Network reliability of $\mathcal{G}$ for $T$
$\hat{R}[\mathcal{G}, T]$       Approximate network reliability of $\mathcal{G}$ for $T$
$k$                             The number of terminals
$w$                             Maximum size of BDD
$F_l$                           Set of frontiers at layer $l$
$|\cdot|$                       The number of elements in a set
As preliminaries for our approach, we explain uncertain graphs and network reliability. Table 1 summarizes the notations.
Let $\mathcal{G} = (V, E, p)$ be a connected and undirected uncertain graph, where $V$ is a set of vertices, $E \subseteq V \times V$ is a set of uncertain edges, and $p : E \to (0, 1]$ is a function that determines the edge existence probability $p(e)$ of uncertain edge $e \in E$ in the graph. We denote edge $e \in E$ between $v$ and $v'$ as $e = (v, v')$. The state of uncertain edge $e$ is existent with probability $p(e)$ or non-existent with probability $1 - p(e)$. We assume that the edge existence probabilities of different edges are independent of one another [6, 19].

A possible graph $G_p = (V, E_p)$ is a graph that contains the set of vertices and a subset of the edges of $\mathcal{G}$ without their edge existence probabilities. Edges in $E \setminus E_p$ are non-existent in the possible graph. Although edges in possible graphs have no probabilities, the possible graphs themselves have existence probabilities. The existence probability $\Pr[G_p]$ of possible graph $G_p$ is as follows:
\[ \Pr[G_p] = \prod_{e \in E_p} p(e) \cdot \prod_{e \in E \setminus E_p} (1 - p(e)). \]
The total number of possible graphs of $\mathcal{G}$ is $2^{|E|}$ because each edge is either existent or non-existent. We define $W_{\mathcal{G}}$ as the set of all possible graphs obtained from $\mathcal{G}$.

We define an intermediate graph $\mathcal{G}_E(E_\exists, E_\neg)$, which is an uncertain graph with the set of existent edges $E_\exists$, the set of non-existent edges $E_\neg$, and the set of uncertain edges $E \setminus (E_\exists \cup E_\neg)$. The existence probability $\Pr[\mathcal{G}_E(E_\exists, E_\neg)]$ of the intermediate graph $\mathcal{G}_E(E_\exists, E_\neg)$ is as follows:
\[ \Pr[\mathcal{G}_E(E_\exists, E_\neg)] = \prod_{e \in E_\exists} p(e) \cdot \prod_{e \in E_\neg} (1 - p(e)). \]
We simply write $\Pr[\mathcal{G}_E]$ for $\Pr[\mathcal{G}_E(E_\exists, E_\neg)]$. We define $W_{\mathcal{G}_E}$ as the set of all possible graphs obtained from $\mathcal{G}_E$. The total number of possible graphs of $\mathcal{G}_E(E_\exists, E_\neg)$ is $2^{|E \setminus (E_\exists \cup E_\neg)|}$. We say that vertices are connected in an intermediate graph if there are paths among the vertices consisting of existent edges, and that vertices are disconnected if there are no paths among the vertices consisting of existent and uncertain edges.
Note that whether the vertices are connected or disconnected remains undetermined even if there are paths among them through uncertain edges.

The network reliability is computed by summing up the probabilities of all possible graphs in which all terminals (a subset of vertices) are connected. The definition is as follows:
Definition 1 (Network reliability).
Given a set of $k$ terminals $T$ and an uncertain graph $\mathcal{G}$, the network reliability $R[\mathcal{G}, T]$ is
\[ R[\mathcal{G}, T] = \sum_{G_p \in W_{\mathcal{G}}} I(G_p, T) \cdot \Pr[G_p], \tag{1} \]
where $G_p$ denotes a possible graph, and $I(G_p, T)$ is an indicator function that returns one if all terminals in $T$ are connected in $G_p$ and zero otherwise.

We denote by $\hat{R}[\mathcal{G}, T]$ the approximate network reliability. We simply write $R$ and $\hat{R}$ for $R[\mathcal{G}, T]$ and $\hat{R}[\mathcal{G}, T]$ for the given uncertain graph and terminals, respectively.

The network reliability with $k$ terminals is called the $k$-terminal reliability, and it is known as the most generalized network reliability [14]. The network reliability problem is #P-complete.

BDDs [14] and sampling [19] are the main techniques for computing the network reliability. The BDD-based approach can compute the exact answer for small-scale graphs, while the sampling-based approach can compute approximate answers for large-scale graphs.
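As an illustration of Equation (1), the following minimal sketch enumerates all $2^{|E|}$ possible graphs and sums the probabilities of those in which the terminals are connected. It is only feasible for tiny graphs. The function names and the dictionary-based graph representation are assumptions made for this example and are not taken from the paper.

```python
from itertools import product


def connected(vertices, edges, terminals):
    """Return True if all terminals lie in one connected component."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(t) for t in terminals}) == 1


def exact_reliability(vertices, prob, terminals):
    """Evaluate Eq. (1) by brute force.

    prob maps each edge (u, v) to its existence probability p(e);
    this representation is a hypothetical choice for the sketch.
    """
    edges = list(prob)
    total = 0.0
    # Each tuple of states fixes one possible graph G_p.
    for states in product((False, True), repeat=len(edges)):
        pr = 1.0       # Pr[G_p]
        present = []   # E_p
        for e, exists in zip(edges, states):
            pr *= prob[e] if exists else 1.0 - prob[e]
            if exists:
                present.append(e)
        if connected(vertices, present, terminals):  # I(G_p, T) = 1
            total += pr
    return total
```

For a hypothetical triangle on vertices a, b, c whose three edges each exist with probability 0.5, the reliability for terminals {a, b} is 0.625: the terminals are connected whenever edge (a, b) exists, or when it is absent and the other two edges both exist.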
A BDD $D = (N, A)$ is a directed acyclic graph with sets of nodes $N$ and arcs $A$. To avoid confusion, we use the terms “vertex” and “edge” to refer to a vertex and an edge in an uncertain graph, respectively, and “node” and “arc” to refer to a vertex and an edge in a BDD, respectively. Figure 2(a) shows the BDD to compute the network reliability of the original graph in Figure 1. Nodes in the BDD correspond to intermediate graphs, and arcs in the BDD correspond to existent/non-existent edges. The BDD has a single node that has no incoming arcs, called the root node (the node at layer 1 in Figure 2(a)). Each node has two outgoing arcs, called the 0-arc and 1-arc (represented by dashed and solid arrows in Figure 2(a), respectively). 0-arcs and 1-arcs indicate that edges are non-existent and existent in the uncertain graph, respectively. Each arc is associated with a weight that represents the existence or non-existence probability of the edge. We define layer $l$ ($\geq 1$) as the depth from the root node. The nodes at layer $l$ of the BDD correspond to the intermediate graphs whose edges $e_1, \ldots, e_{l-1}$ are existent/non-existent and whose other edges $e_l, \ldots, e_{|E|}$ are uncertain. The BDD has special nodes that have no outgoing arcs, called sink nodes. The sink nodes are of two types, called the 1-sink and 0-sink (represented by rectangles with labels 1 and 0 in Figure 2(a), respectively). If the terminals in the intermediate graph are connected or disconnected, the arcs point at the 1-sink or 0-sink, respectively. We can obtain intermediate graphs in which terminals are connected by traversing the BDD from the root node to the 1-sink.

Figure 2: BDD for the original graph in Figure 1(a). (a) BDD. (b) Intermediate graphs at layer 3.

To construct the BDD, the frontier-based method is a common procedure [21, 26]. This method first orders the edges ($e_1, \ldots, e_{|E|}$). It generates the nodes on layer $l + 1$ by processing edge $e_l$ when the BDD is already constructed up to layer $l$. In the frontier-based method, a vertex that has both existent/non-existent and uncertain edges is called a frontier $f$, and we denote by $F_l$ the set of frontiers at layer $l$. Figure 2(b) shows intermediate graphs after processing $e_1$ and $e_2$, where solid black, dashed black, and dashed gray lines denote existent, non-existent, and uncertain edges, respectively. These intermediate graphs correspond to three nodes at layer 3 of the BDD, listed from the top. Vertices $b$ and $c$ are frontiers because they have both existent/non-existent and uncertain edges. Note that nodes at the same layer $l$ have the same set of frontiers $F_l$. The frontier-based method maintains several attributes on only the frontiers (e.g., the number of uncertain edges and the number of terminals connected to the frontiers). It merges the nodes if the attributes are the same. Thus, the frontier-based method can effectively reduce the number of nodes.

The size of the BDD is defined by the number of nodes in the BDD [14]. Generally, it increases exponentially with the number of edges in the uncertain graph. As the size of the BDD increases, both the computation cost and the memory usage increase. Thus, it is hard to compute the exact network reliability.

Sampling is a basic approach for computing the approximate network reliability [9, 18, 19].
Given the number of samples $s$, the sampling-based approach repeats the following procedure $s$ times: (1) picking a possible graph $G_{p_i}$ ($1 \leq i \leq s$) of $\mathcal{G}$ from $W_{\mathcal{G}}$ as a sample according to the probability $\Pr[G_{p_i}]$, and then (2) computing whether all the terminals are connected in $G_{p_i}$. The time complexity of the sampling-based approach is $O(s \cdot (|V| + |E|))$, because each sample requires $O(|E|)$ time to determine the states of all edges and $O(|V| + |E|)$ time to check the connectivity by a depth-first search.

The accuracy of the sampling-based approach is evaluated by its variance. Since the sampling-based approach is a randomized algorithm [28], the average network reliability is most likely to be closest to the exact network reliability. A small variance indicates a small error rate (i.e., high accuracy). Note that unbiased sampling, which samples possible graphs according to their probabilities, is necessary for guaranteeing the theoretical variance. As the number of samples increases, the variance decreases but the computation cost increases. Therefore, there is a trade-off between the accuracy and the computation cost.

Stratified sampling is known as a successful method in the field of statistics [32]. It divides the population into subgroups and picks samples from each subgroup individually. The variance of the estimated value for the whole population is the sum of the variances of the estimated values for the individual subgroups. Let $L$ be the number of subgroups and $\hat{R}_i$ be the estimated total probability of possible graphs for subgroup $i$. The estimated network reliability is computed by summing up the total probabilities for the subgroups as follows:
\[ \hat{R} = \sum_{i=1}^{L} \hat{R}_i. \]
The variance is the sum of the individual variances for the subgroups as follows:
\[ \mathrm{Var}[\hat{R}] = \sum_{i=1}^{L} \mathrm{Var}[\hat{R}_i]. \]
When we compute the exact value for a subgroup, the variance of the estimated network reliability for that subgroup becomes zero. Thus, when we compute the exact values for some subgroups, the variance of the estimated network reliability for the whole population decreases.
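The plain sampling-based procedure described above (draw a possible graph according to the edge probabilities, then test terminal connectivity with a depth-first search) can be sketched as follows. This is a minimal illustration under assumed names and an assumed dictionary-based graph representation; it is the baseline Monte Carlo sampling, not the stratified approach proposed in this paper.

```python
import random


def sample_possible_graph(prob, rng):
    """Draw one possible graph: keep each edge e independently with p(e)."""
    return [e for e, p in prob.items() if rng.random() < p]


def terminals_connected(vertices, edges, terminals):
    """Depth-first search from one terminal; O(|V| + |E|) per sample."""
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    terminals = list(terminals)  # assumed to be non-empty
    stack = [terminals[0]]
    seen = {terminals[0]}
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return all(t in seen for t in terminals)


def monte_carlo_reliability(vertices, prob, terminals, s, seed=0):
    """Estimate R as the fraction of s samples whose terminals are connected."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    hits = 0
    for _ in range(s):
        sample = sample_possible_graph(prob, rng)
        if terminals_connected(vertices, sample, terminals):
            hits += 1
    return hits / s
```

Because every edge is kept independently with its own probability, each possible graph is drawn with probability $\Pr[G_p]$, which is the unbiasedness condition noted above.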
In this paper, we solve the problem of the approximate network reliability. Section 4.1 provides an overview of our approach. Section 4.2 explains how to reduce the number of samples. Section 4.3 presents our extended BDD, the S BDD.
Our approach efficiently and accurately computes the approximate network reliability. We achieve high efficiency and accuracy with the following ideas:
• Reduction of the number of samples: Our approach significantly reduces the number of samples while keeping a high accuracy of approximation by using the lower and upper bounds of the network reliability.
• Efficient computation of the bounds of network reliability: We develop the S BDD to efficiently compute the bounds of the network reliability.
• Dynamic programming: While constructing the S BDD, we employ dynamic programming to efficiently sample possible graphs.

Our approach reduces the number of samples in accordance with stratified sampling. We theoretically guarantee that the number of samples becomes smaller as the lower and upper bounds become tighter, without sacrificing the accuracy of approximation. We prove it for two representative estimators: the Monte Carlo and Horvitz-Thompson estimators [32].

To achieve the theoretical result, we compute the lower and upper bounds by constructing the S BDD. We specify the maximum size $w$ of the S BDD to avoid a large construction cost. Our approach deletes nodes from the S BDD when its size exceeds $w$. To effectively delete nodes, we define a heuristic function for preferentially keeping high-priority nodes in the S BDD; the priorities are computed from the possibilities of improving the bounds. The S BDD enables efficiently computing the bounds because nodes preferentially point at sink nodes.

To efficiently sample possible graphs, our approach employs dynamic programming while constructing the S BDD. We can straightforwardly employ dynamic programming for sampling because sampling possible graphs from intermediate graphs is a subproblem of sampling possible graphs from the original uncertain graph. We also use stratified random sampling to determine the number of samples for each subproblem. Stratified random sampling divides the set of possible graphs into subgroups and samples possible graphs from each subgroup.
In this section, we theoretically prove that our approach reduces the number of samples while keeping a high accuracy in accordance with stratified sampling [11, 27]. As we mentioned in Section 4.3.3, the accuracy of sampling is evaluated by the variance of the estimated network reliability. Since stratified sampling reduces the variance of the estimated network reliability, we can reduce the number of samples without sacrificing the accuracy of approximation.

To apply stratified sampling, we divide the set $W_{\mathcal{G}}$ of possible graphs into three subgroups $W_{\mathcal{G}_c}$, $W_{\mathcal{G}_d}$, and $W_{\mathcal{G}_u}$. $W_{\mathcal{G}_c}$ and $W_{\mathcal{G}_d}$ include only possible graphs in which terminals are connected and disconnected, respectively. $W_{\mathcal{G}_u}$ includes the possible graphs that are not included in $W_{\mathcal{G}_c}$ and $W_{\mathcal{G}_d}$. Let $p_c$ and $p_d$ be the sums of the probabilities of possible graphs in $W_{\mathcal{G}_c}$ and $W_{\mathcal{G}_d}$, respectively. Hence, from Definition 1, the lower and upper bounds are given as follows:
\[ R = \sum_{G_p \in W_{\mathcal{G}_c}} \Pr[G_p] + \sum_{G_p \in W_{\mathcal{G}_u}} I(G_p, T) \Pr[G_p] = p_c + \sum_{G_p \in W_{\mathcal{G}_u}} I(G_p, T) \Pr[G_p] \geq p_c. \]
\[ R = 1 - \sum_{G_p \in W_{\mathcal{G}_d}} \Pr[G_p] - \sum_{G_p \in W_{\mathcal{G}_u}} \left( \Pr[G_p] - I(G_p, T) \Pr[G_p] \right) = 1 - p_d - \sum_{G_p \in W_{\mathcal{G}_u}} \left( \Pr[G_p] - I(G_p, T) \Pr[G_p] \right) \leq 1 - p_d. \]
Consequently, we have $p_c \leq R \leq 1 - p_d$. We reduce the number of samples by using the lower bound $p_c$ and the upper bound $1 - p_d$.

The variance also depends on the estimator. In our approach, we exploit two representative estimators: the Monte Carlo estimator and the Horvitz-Thompson estimator. The Monte Carlo estimator is a basic technique that computes the average value of the samples. The Horvitz-Thompson estimator, on the other hand, is an unequal probability estimator, which provides a smaller variance than the Monte Carlo estimator under sampling without replacement. We explain how to reduce the number of samples for the two estimators while keeping a high accuracy.

Monte Carlo estimator:
The Monte Carlo estimator for $R$ is:
\[ \hat{R} = \frac{\sum_{i=1}^{s} I(G_{p_i}, T)}{s}. \]
The variance is computed by the following equation [11]:
\[ \mathrm{Var}[\hat{R}] = \frac{R(1 - R)}{s}. \]
Because random sampling is unbiased, i.e., $E(\hat{R}) = R$, the variance can be simply written as follows [27]:
\[ \mathrm{Var}[\hat{R}] = \frac{R(1 - R)}{s} \approx \frac{\hat{R}(1 - \hat{R})}{s}. \tag{2} \]
Let $\mathrm{Var}[\hat{R}]'$ be the variance using the lower and upper bounds. $\mathrm{Var}[\hat{R}]'$ is computed in accordance with stratified sampling as follows [11, 27]:
\[ \mathrm{Var}[\hat{R}]' = \frac{(\hat{R} - p_c)(1 - p_d - \hat{R})}{s}. \tag{3} \]
From Equations (2) and (3), we obtain the following inequality, which holds for $p_c \leq \hat{R} \leq 1 - p_d$ because then $0 \leq \hat{R} - p_c \leq \hat{R}$ and $0 \leq 1 - p_d - \hat{R} \leq 1 - \hat{R}$:
\[ \frac{\hat{R}(1 - \hat{R})}{s} \geq \frac{(\hat{R} - p_c)(1 - p_d - \hat{R})}{s}. \tag{4} \]
Therefore, we have $\mathrm{Var}[\hat{R}] \geq \mathrm{Var}[\hat{R}]'$. From Equation (4), we obtain the following theorem:

Theorem 1.
Given the number of samples $s$, the lower bound $p_c$, and the upper bound $1 - p_d$, the variance of the network reliability by using the Monte Carlo estimator with $s'$ ($\leq s$) samples is less than or equal to that with $s$ samples if $s'$ is computed by the following equation:
\[ s' = \begin{cases}
\lfloor s(1 - p_d) \rfloor & (p_c = 0) \\
\lfloor s(1 - p_c) \rfloor & (p_d = 0) \\
\lfloor s(1 - 4 \cdot p_c(1 - p_c)) \rfloor & (p_c = p_d) \\
\lfloor s(1 - 4 \cdot p_c(1 - p_d)) \rfloor & (p_c < p_d) \\
\lfloor s(1 - \min(4 p_c(1 - p_c),\ (4 p_c(1 - p_d) + (p_d - p_c)))) \rfloor & (p_c > p_d)
\end{cases} \]

Proof: From Equation (4), the variance with $s$ samples is equal to that with $s'$ samples using the lower and upper bounds when the following equation holds:
\[ \frac{(\hat{R} - p_c)(1 - p_d - \hat{R})}{s'} = \frac{\hat{R}(1 - \hat{R})}{s}. \]
Then, $s'$ is computed as follows:
\[ s' = s \cdot \frac{(\hat{R} - p_c)(1 - p_d - \hat{R})}{\hat{R}(1 - \hat{R})} = s \cdot \left( 1 - \frac{p_c(1 - \hat{R}) + p_d(\hat{R} - p_c)}{\hat{R}(1 - \hat{R})} \right). \tag{5} \]
However, we cannot compute $\hat{R}$ before sampling $s$ possible graphs. Therefore, we remove $\hat{R}$ from Equation (5) by dividing into cases on $p_c$ and $p_d$. First, if $p_c = 0$, $s'$ is computed as follows:
\[ s \left( 1 - \frac{p_d \hat{R}}{\hat{R}(1 - \hat{R})} \right) \leq s(1 - p_d), \qquad s' = \lfloor s(1 - p_d) \rfloor. \]
Second, if $p_d = 0$, $s'$ is computed as follows:
\[ s \left( 1 - \frac{p_c(1 - \hat{R})}{\hat{R}(1 - \hat{R})} \right) \leq s(1 - p_c), \qquad s' = \lfloor s(1 - p_c) \rfloor. \]
Third, if $p_c = p_d$, $s'$ is computed as follows:
\[ s \left( 1 - \frac{p_c(1 - \hat{R}) + p_c(\hat{R} - p_c)}{\hat{R}(1 - \hat{R})} \right) \leq s(1 - 4 p_c(1 - p_c)), \tag{6} \]
\[ s' = \lfloor s(1 - 4 p_c(1 - p_c)) \rfloor. \]
In Equation (6), the maximum value of $\hat{R}(1 - \hat{R})$ is 0.25. Thus, we substitute 0.25 for $\hat{R}(1 - \hat{R})$ in the denominator. Fourth, if $p_c < p_d$, $s'$ is computed as follows:
\[ s \left( 1 - \frac{p_c(1 - \hat{R}) + p_d(\hat{R} - p_c)}{\hat{R}(1 - \hat{R})} \right) \leq s(1 - 4 p_c(1 - p_d)), \qquad s' = \lfloor s(1 - 4 p_c(1 - p_d)) \rfloor. \]
Finally, if $p_c > p_d$, $s'$ is computed as follows:
\[ s \left( 1 - \frac{p_c(1 - \hat{R}) + p_d(\hat{R} - p_c)}{\hat{R}(1 - \hat{R})} \right) \leq s(1 - 4 p_c(1 - p_c)), \]
\[ s \left( 1 - \frac{p_c(1 - \hat{R}) + p_d(\hat{R} - p_c)}{\hat{R}(1 - \hat{R})} \right) \leq s(1 - (4 p_c(1 - p_d) + (p_d - p_c))), \]
\[ s' = \lfloor s(1 - \min(4 p_c(1 - p_c),\ (4 p_c(1 - p_d) + (p_d - p_c)))) \rfloor. \tag{7} \]
In Equation (7), the minimum $s'$ depends on the values of $p_c$ and $p_d$. Consequently, we have $s' \leq s$ for all cases of $p_c$ and $p_d$. □

Horvitz-Thompson estimator:
The Horvitz-Thompson estimator for $R$ is:
\[ \hat{R} = \sum_{i=1}^{s} \frac{\Pr[G_{p_i}] \cdot I(G_{p_i}, T)}{\pi_i}, \]
where $\pi_i = 1 - (1 - \Pr[G_{p_i}])^s$. The variance is:
\[ \mathrm{Var}[\hat{R}] = \sum_{i=1}^{s} \left( \frac{1 - \pi_i}{\pi_i} \right) I(G_{p_i}, T) \Pr[G_{p_i}]^2 + \sum_{i=1}^{s} \sum_{j=1,\, j \neq i}^{s} \left( \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} \right) I(G_{p_i}, T)\, I(G_{p_j}, T) \Pr[G_{p_i}] \Pr[G_{p_j}], \]
where $\pi_{ij} = 1 - (1 - \Pr[G_{p_i}])^s - (1 - \Pr[G_{p_j}])^s + (1 - \Pr[G_{p_i}] - \Pr[G_{p_j}])^s$. The variance is simplified as follows [19]:
\[ \mathrm{Var}[\hat{R}] = \frac{R(1 - R)}{s} - \frac{\sum_{i=1}^{s} (s - 1)\, I(G_{p_i}, T) \Pr[G_{p_i}]^2}{2s}. \tag{8} \]
The variance using the lower and upper bounds is computed in accordance with stratified sampling as follows:
\[ \mathrm{Var}[\hat{R}]' = \frac{(\hat{R} - p_c)(1 - p_d - \hat{R})}{s} - \frac{\sum_{i=1}^{s} (s - 1)\, I(G_{p_i}, T) \Pr[G_{p_i}]^2}{2s}. \tag{9} \]

Theorem 2. Given the number of samples $s$, the lower bound $p_c$, and the upper bound $1 - p_d$, the variance of the network reliability by using the Horvitz-Thompson estimator with $s'$ ($\leq s$) samples is less than or equal to that with $s$ samples, where $s'$ is equal to the number of samples for the Monte Carlo estimator in Theorem 1.

Proof: From Equations (8) and (9), we have the following equation:
\[ \frac{(\hat{R} - p_c)(1 - p_d - \hat{R})}{s'} - \frac{\sum_{i=1}^{s} (s' - 1)\, I(G_{p_i}, T) \Pr[G_{p_i}]^2}{2s'} = \frac{\hat{R}(1 - \hat{R})}{s} - \frac{\sum_{i=1}^{s} (s - 1)\, I(G_{p_i}, T) \Pr[G_{p_i}]^2}{2s}. \]
The values of the right-hand terms on both sides are the same because the estimator is unbiased. The rest of the proof follows that of Theorem 1. □
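To make the effect of the bounds concrete, the following small sketch evaluates Equations (2), (3), and (5) for given values of $\hat{R}$, $p_c$, and $p_d$. The function names are hypothetical. Note that Equation (5) needs $\hat{R}$, which is not available before sampling; Theorem 1 removes this dependence, whereas the sketch simply takes an assumed value of $\hat{R}$ as input.

```python
def variance_plain(r_hat, s):
    """Eq. (2): Monte Carlo variance estimate without bounds."""
    return r_hat * (1.0 - r_hat) / s


def variance_bounded(r_hat, p_c, p_d, s):
    """Eq. (3): variance when the bounds p_c <= R <= 1 - p_d are used."""
    return (r_hat - p_c) * (1.0 - p_d - r_hat) / s


def reduced_samples(r_hat, p_c, p_d, s):
    """Eq. (5): real-valued s' at which Eq. (3) with s' samples equals
    Eq. (2) with s samples. Assumes 0 < r_hat < 1 and
    p_c <= r_hat <= 1 - p_d; no rounding is applied here."""
    ratio = (r_hat - p_c) * (1.0 - p_d - r_hat) / (r_hat * (1.0 - r_hat))
    return s * ratio


# Hypothetical values chosen only for illustration.
s_prime = reduced_samples(0.5, 0.25, 0.25, 1000)
same = abs(variance_bounded(0.5, 0.25, 0.25, s_prime)
           - variance_plain(0.5, 1000)) < 1e-15
```

With these illustrative values, the bounds remove three quarters of the samples: `s_prime` is 250 for `s = 1000`, and the two variances coincide.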
Our approach reduces the number of samples in accordance with Theorems 1 and 2. As a result, our approach is more efficient than the existing sampling-based approach.

S BDD
We can reduce the number of samples by using the lower and upper bounds of the network reliability. To efficiently obtain the bounds, we develop the S BDD. By constructing the S BDD, we efficiently search for the possible graphs in which terminals are connected or disconnected with high probabilities. Furthermore, while constructing the S BDD, we sample possible graphs that are not used to compute the bounds, which is the requirement of stratified sampling. Our approach thus uses the S BDD for both computing the bounds of the network reliability and sampling possible graphs.

We design the S BDD to effectively reduce its size. The S BDD keeps a single layer and the sink nodes, while an ordinary BDD contains all layers. This idea is based on the observation that the layer $l - 1$ [...] $l$ to both construct the layer $l + 1$ [...] BDD and then explain how to construct it.
Definition 2.
Let $N_l$ be the set of nodes at layer $l$. The S²BDD consists of $N_l$, the 1-sink, and the 0-sink. The S²BDD maintains the following attributes on each node $n \in N_l$:
• $p_n$: the probability of the intermediate graph corresponding to node $n$.
• $\{c_{n,f}\}$ for all $f \in F_l$: an identifier of the connected component. If frontiers $f$ and $f' \in F_l$ are connected by existent edges, $c_{n,f}$ and $c_{n,f'}$ share the same identifier.
• $\{d_{n,f}\}$ for all $f \in F_l$: the sum of the numbers of uncertain edges connected to the frontiers in $\{f' \in F_l \mid c_{n,f} = c_{n,f'}\}$.
• $\{t_{n,f}\}$ for all $f \in F_l$: the number of terminals that are connected to $f$ by existent edges.
The 1-sink and 0-sink maintain the probabilities $p_c$ and $p_d$ that the terminals are connected and disconnected, respectively.

For example, in Figure 2, the S²BDD contains the third layer and the sink nodes but does not contain the first and second layers.

To construct an S²BDD, we process edge $e_l$ and generate the set of nodes $N_{next}$ at layer $l + 1$. The construction method comprises four procedures: generating, merging, deleting, and sampling. The following sections explain these procedures in detail.

The BDD-based approach uses the generating and merging procedures to construct the BDD. We extend these procedures to effectively compute the bounds without sacrificing the exactness of the network reliability. To extend them, we exploit a property of network reliability computation: we can skip the computation of nodes once we obtain the probabilities $p_c$ and $p_d$ exactly.

We first explain the generating procedure. The generating procedure sets the state of edge $e_l$ (recall that the arcs at layer $l$ in the BDD correspond to $e_l$) and then generates the set of new nodes $N_{next}$ at layer $l + 1$. As in the traditional procedure, we generate two new nodes at layer $l + 1$ from each node at layer $l$ according to the state of $e_l$. We set the attributes on the new nodes (i.e., $p_n$, $\{c_{n,f}\}$, $\{d_{n,f}\}$, and $\{t_{n,f}\}$). More specifically, $p_n$ is set to $p_n \cdot p(e_l)$ when $e_l$ is existent and to $p_n \cdot (1 - p(e_l))$ when $e_l$ is non-existent. $\{c_{n,f}\}$, $\{d_{n,f}\}$, and $\{t_{n,f}\}$ are computed from the attributes of the frontiers on the nodes at layer $l$ by merging attributes of frontiers and creating new frontiers. If all the terminals in the intermediate graph are connected, we add its probability to $p_c$, and if they are disconnected, we add its probability to $p_d$.

If we can determine whether the terminals are connected or disconnected after processing a smaller number of edges, we obtain tight bounds of the network reliability earlier. Let $n$, $n'$, $F$, and $F'$ be the new node at layer $l + 1$, the node at layer $l$ from which $n$ is derived (i.e., before the state of $e_l$ is set), and the sets of frontiers at layers $l + 1$ and $l$, respectively. We determine whether the terminals are connected or disconnected based on the following lemmas:

Lemma 4.1.
All the terminals $t \in T$ are connected if the attributes of the frontiers satisfy one of the following conditions:
Condition 1: edge $e_l = (v, v')$ is existent, and $t_{n,f} = k$ for some $f \in F$.
Condition 2: edge $e_l = (v, v')$ is existent, and (1) $v \in F'$, (2) $v' \notin F' \cup F$, (3) $t_{n',v} = k - 1$, and (4) $v \in T$ (similarly, with $v$ and $v'$ interchanged).
Condition 3: edge $e_l = (v, v')$ is existent, and (1) $v, v' \in F$, (2) $c_{n',v} \neq c_{n',v'}$, and (3) $t_{n',v} + t_{n',v'} = k$.
Proof: This is an immediate consequence of the definitions because all the terminals are connected. □
Lemma 4.2.
The terminals are disconnected if the attributes of the frontiers satisfy one of the following conditions:
Condition 1: edge $e_l = (v, v')$ is non-existent, and (1) $v \notin F' \cup F$ and (2) $v \in T$ (similarly, for $v'$).
Condition 2: edge $e_l = (v, v')$ is non-existent, and (1) $v \in F'$, (2) $t_{n',v} > 0$, and (3) $d_{n',v} = 1$ (similarly, for $v'$).
Condition 3: edge $e_l = (v, v')$ is existent or non-existent, and (1) $v, v' \in F' \setminus F$ and (2) ($t_{n',v} > 0$ or $t_{n',v'} > 0$).
Proof: This is an immediate consequence of the definitions because the terminals are disconnected. □
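The probability bookkeeping of the generating procedure can be illustrated with a minimal Python sketch. It is not the authors' implementation: the helper names `branch` and `bounds` are hypothetical, and the sink detection of Lemmas 4.1 and 4.2 is assumed to be performed elsewhere.

```python
def branch(p_n, p_e):
    """Split the probability p_n of a node over the two states of edge e_l.

    Returns (p_existent, p_non_existent) for the two child nodes.
    """
    return p_n * p_e, p_n * (1.0 - p_e)


def bounds(p_c, p_d):
    """Lower and upper bounds of the reliability from the sink probabilities."""
    return p_c, 1.0 - p_d


# Toy case (made-up values): a single uncertain edge (a, b) with
# existence probability 0.75 and terminals {a, b}.
p_exist, p_absent = branch(1.0, 0.75)
p_c = p_exist    # the existent child reaches the 1-sink
p_d = p_absent   # the non-existent child reaches the 0-sink
lower, upper = bounds(p_c, p_d)
```

Once every node has reached a sink, the two bounds coincide and equal the exact network reliability; before that, the gap between them is the probability mass that still has to be sampled.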
Note that the state-of-the-art construction of the BDD uses only Condition 1 of Lemmas 4.1 and 4.2. As a result, the S²BDD can tighten the bounds of the network reliability more effectively.

We next explain the merging procedure. Since each intermediate graph on the S²BDD has different existent and non-existent edges, the attributes on each frontier are in general different. The merging procedure merges the nodes that make a transition to the same sink nodes, based on the following lemma:
Lemma 4.3.
Given nodes $n_1$ and $n_2$ at layer $l$, if for all $f \in F_l$ we have (1) $c_{n_1,f} = c_{n_2,f}$ and (2) ($t_{n_1,f} = 0$ and $t_{n_2,f} = 0$) or ($t_{n_1,f} > 0$ and $t_{n_2,f} > 0$), then nodes derived from $n_1$ and $n_2$ with the same states of edges $e_{l+1}, \ldots, e_{|E|}$ make a transition to the same sink nodes.

Proof: If $n_1$ and $n_2$ have (1) $\{c_{n_1,f}\} = \{c_{n_2,f}\}$ for all $f$ in $F_l$, the connected frontiers are the same in the intermediate graphs corresponding to $n_1$ and $n_2$. New nodes $n_1'$ and $n_2'$ derived from $n_1$ and $n_2$ satisfy $\{c_{n_1',f}\} = \{c_{n_2',f}\}$ if they have the same states of edges $e_{l+1}, \ldots, e_{|E|}$. Thus, $\{c_{n_1,f}\}$ and $\{c_{n_2,f}\}$ for all $f$ in $F_l$ remain the same until they make a transition to the sink nodes. Since the same $\{c_{n_1,f}\}$ and $\{c_{n_2,f}\}$ share the same connected components, each frontier has the same $\{d_{n_1,f}\}$ and $\{d_{n_2,f}\}$. In addition, frontiers $f$ and $f'$ must be connected if they connect to at least one terminal (i.e., $t_{n,f} > 0$ and $t_{n,f'} > 0$). Hence, if (1) $\{c_{n_1,f}\} = \{c_{n_2,f}\}$ and (2) ($t_{n_1,f} = 0$ and $t_{n_2,f} = 0$) or ($t_{n_1,f} > 0$ and $t_{n_2,f} > 0$) for all $f$ in $F_l$, nodes derived from $n_1$ and $n_2$ with the same states of edges $e_{l+1}, \ldots, e_{|E|}$ have the same attributes on the frontiers, and thus they make a transition to the same sink nodes. □

The probabilities of the merged nodes are aggregated into one node. The probabilities $p_c$ and $p_d$ are consistent regardless of whether or not the nodes are merged. These procedures do not sacrifice the exactness of the network reliability.

The size of the S²BDD increases exponentially as the size of the graph increases. If the size of the S²BDD increases, the cost of obtaining the lower and upper bounds of the network reliability increases because it takes a long time to construct the S²BDD. Hence, we control the size of the S²BDD by specifying the maximum size $w$. The deleting procedure deletes nodes so that the size of the S²BDD does not exceed $w$. One of the major difficulties in designing this procedure is deciding which nodes should be kept in the S²BDD to achieve higher efficiency and accuracy. According to Theorems 1 and 2, the number of samples effectively decreases as the probabilities $p_c$ and $p_d$ increase. We therefore identify intermediate graphs in which the terminals are highly likely to be connected or disconnected after processing a small number of edges. We make the following key observations in terms of the connectivity of the terminals:

Observation 1
The terminals in the intermediate graph corresponding to node $n$ are highly likely to be connected if $t_{n,f}$ is large for some $f \in F_l$.

Observation 2
The terminals in the intermediate graph corresponding to node $n$ are highly likely to be disconnected if $d_{n,f}$ is small and $t_{n,f} > 0$ for some $f \in F_l$.

Furthermore, if the probability $p_n$ of node $n$ is high and node $n$ makes a transition to the sink nodes, $p_c$ and $p_d$ increase considerably. Based on these observations, we define a heuristic function. We compute the priorities of nodes from their attributes by the heuristic function and preferentially keep high-priority nodes. The heuristic function $h$ to compute the priority of node $n$ is as follows:
\[ h(n) = p_n \cdot \max_{f \in F} \left( \frac{t_{n,f}}{k}, \frac{1}{d_{n,f}} \right) \quad \text{if } t_{n,f} > 0. \tag{10} \]
This function outputs a larger value when a frontier is connected to at least one terminal and either (a) the frontier is connected to a large number of terminals or (b) the frontier has a small number of uncertain edges. In case (a), the terminals are likely to be connected, and in case (b), the terminals are likely to be disconnected. Low-priority nodes (i.e., $n$ with small $h(n)$) are then deleted from the S²BDD.
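The priority function of Equation (10) and the deleting procedure can be sketched in Python as follows. This is an illustrative sketch rather than the authors' code: the `Node` class, the dictionary layout of the attributes, and the function names `priority` and `keep_top_w` are assumptions, and a frontier with no remaining uncertain edge is simply given no $1/d$ score.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Node:
    p: float            # p_n: probability of the intermediate graph
    t: Dict[str, int]   # t_{n,f}: terminals connected to frontier f
    d: Dict[str, int]   # d_{n,f}: uncertain edges of f's component


def priority(node, k):
    """Heuristic h(n) of Eq. (10) for k terminals."""
    best = 0.0
    for f, t in node.t.items():
        if t > 0:  # only frontiers connected to a terminal are scored
            inv_d = 1.0 / node.d[f] if node.d[f] > 0 else 0.0
            best = max(best, t / k, inv_d)
    return node.p * best


def keep_top_w(nodes, k, w):
    """Return (kept, deleted): the w highest-priority nodes and the rest."""
    ranked = sorted(nodes, key=lambda n: priority(n, k), reverse=True)
    return ranked[:w], ranked[w:]


# Made-up nodes with a single frontier "a" and k = 4 terminals.
a = Node(0.5, {"a": 2}, {"a": 4})
b = Node(0.5, {"a": 1}, {"a": 1})
c = Node(0.9, {"a": 0}, {"a": 1})
kept, deleted = keep_top_w([a, b, c], k=4, w=2)
```

In this toy case node `c` has the highest probability but no terminal on its frontier, so it is the one deleted; the deleted nodes are exactly those that the sampling procedure later draws possible graphs from.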
Our approach samples possible graphs so that it avoids sampling the possible graphs that are used to compute the lower and upper bounds of the network reliability, which satisfies the requirement of stratified sampling. We sample from the set of possible graphs in which the terminals are not yet determined to be connected or disconnected. We denote this set by $W_{G_u}$; it is obtained from the intermediate graphs corresponding to the deleted nodes and the nodes in the S²BDD. We employ dynamic programming to efficiently sample possible graphs from $W_{G_u}$. In addition, we use the idea of stratified random sampling [32] to determine the number of samples for the subgroups, which are parts of $W_{G_u}$.

We first divide $W_{G_u}$ into subgroups and then randomly sample possible graphs from each subgroup. The number of samples for each subgroup is proportional to the sum of the probabilities of the intermediate graphs in the subgroup. Here we explain only how to divide the deleted nodes and how to decide the number of samples for them. As for the nodes in the S²BDD, each subgroup is the set of possible graphs obtained from the intermediate graph corresponding to a node, and the number of samples is computed from its probability.

We divide the set of intermediate graphs for the deleted nodes into subgroups according to the original BDD layers instead of the nodes themselves. This is because the probabilities of individual deleted nodes are typically too small to decide the number of samples. $W_{G_u^l}$ and $s_l$ denote the set of intermediate graphs corresponding to the deleted nodes at layer $l$ and the number of samples at layer $l$, respectively. $s_l$ is computed by multiplying $s$ by the total probability $\hat{p}_{s_l}$ of the deleted nodes at layer $l$.
We compute $\hat{p}_{s_l}$ from the attributes maintained by the S²BDD by the following equation:
\[ \hat{p}_{s_l} = 1 - \sum_{i=1}^{l-1} p_{s_i} - p_{N_{next}} - p_c - p_d, \tag{11} \]
where $p_{N_{next}}$ denotes the sum of the probabilities of $n \in N_{next}$. $\hat{p}_{s_l}$ is the expected sum of the probabilities of the deleted nodes. This is because $\hat{p}_{s_l}$ indicates the sum of probabilities in $N_l$ when the number of nodes at layer $l + 1$ [...]. Accordingly, the number of samples $s_l$ at layer $l$ becomes $s \cdot \hat{p}_{s_l}$. The dynamic programming and stratified random sampling improve the efficiency of sampling while keeping the sampling unbiased.

4.4 Complexity

We explain the time and space complexities of our approach.
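The allocation of samples to a layer can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the numbers are made up, and rounding down follows the floor used in Algorithm 2.

```python
import math


def deleted_mass(p_deleted_prev, p_next, p_c, p_d):
    """Eq. (11): expected probability mass of the nodes deleted at layer l.

    p_deleted_prev: probabilities p_{s_i} of the nodes deleted at earlier layers.
    p_next: total probability of the nodes kept in N_next.
    p_c, p_d: probabilities accumulated in the 1-sink and the 0-sink.
    """
    return 1.0 - sum(p_deleted_prev) - p_next - p_c - p_d


def samples_for_layer(s, p_hat):
    """Number of samples s_l for a layer, proportional to its mass."""
    return math.floor(s * p_hat)


# Made-up example with s = 1000 samples in total.
p_hat = deleted_mass([0.125], p_next=0.5, p_c=0.125, p_d=0.125)
s_l = samples_for_layer(1000, p_hat)
```

Because the allocation is proportional to the probability mass of each subgroup, layers whose deleted nodes carry little probability receive few or no samples.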
Theorem 3.
Given the uncertain graph $\mathcal{G}$, the updated number of samples $s'$, and the maximum width $w$ of the S²BDD, the time and space complexities of our approach are $O(w \log w + s'(|V| + |E|))$ and $O(w \log w + |V| + |E|)$, respectively.

Proof: The time complexity of our approach is divided into two parts: constructing the S²BDD and sampling. To construct the S²BDD, our construction method compares the attributes of the nodes with each other in the generating and merging procedures. The number of attributes on each node increases in proportion to the number of frontiers. The number of frontiers is $O(\log w)$ because the number of existent/non-existent edges is at most $\log w$. Thus, the time complexity for constructing the S²BDD is $O(w \log w)$. The time complexity of sampling is $O(s'(|V| + |E|))$. Therefore, the time complexity of our approach is $O(w \log w + s'(|V| + |E|))$. The space complexity depends on the size of the S²BDD and of the uncertain graph. The size of the S²BDD is the number of nodes multiplied by the number of attributes on each node. Therefore, the space complexity is $O(w \log w + |V| + |E|)$. □

The computation cost of our approach depends on the size of the uncertain graph as well as the number of samples. The computation cost decreases as the size of the uncertain graph decreases. Therefore, we propose an extension technique that efficiently reduces the size of graphs while preserving the accuracy. The extension technique preprocesses the uncertain graph before sampling possible graphs and constructing an S²BDD. It improves not only the efficiency but also the accuracy of the approximation. The extension technique uses 2-edge-connected components to reduce the size of uncertain graphs [7].
Definition 3 (2-edge-connected component).
Given a graph $G = (V, E)$, an edge is called a bridge if $G$ is disconnected after the removal of the edge from $E$. Vertices that are connected by bridges are called articulation points. A subgraph $C = (V_C, E_C)$ of $G$ is a 2-edge-connected component if $C$ is still connected after the removal of any edge from $E_C$. We denote the sets of bridges, articulation points, and 2-edge-connected components by $B$, $A$, and $\mathcal{C}$, respectively.

The 2-edge-connected components, bridges, and articulation points provide sets of edges (and vertices) such that the uncertain graph is disconnected or still connected when the edges (and vertices) are deleted. Because we can compute the 2-edge-connected components using only the network topology of a given uncertain graph, we precompute them as an index.

The extension technique consists of three phases: (1) pruning, (2) decomposing, and (3) transforming. In the pruning phase, we first compute $\mathcal{G}'$ such that $R[\mathcal{G}] = R[\mathcal{G}']$. The number of edges in $\mathcal{G}'$ is smaller than that in $\mathcal{G}$ because we prune the edges and vertices that do not affect the network reliability. Next, in the decomposing phase, we compute the subgraphs $\mathcal{G}_1, \ldots, \mathcal{G}_m$ where $R[\mathcal{G}'] = \prod_{i=1}^{m} R[\mathcal{G}_i]$. Finally, in the transforming phase, we compute $\mathcal{G}_i'$ such that $R[\mathcal{G}_i] = R[\mathcal{G}_i']$ for all $1 \le i \le m$. Since we transform each graph into a smaller graph, the number of edges in $\mathcal{G}_i'$ is smaller than that in $\mathcal{G}_i$.

Prune:
We prune vertices and edges that do not affect the network reliability. A vertex (or an edge) is unnecessary if the graph is partitioned after the removal of the vertex (or edge) from $\mathcal{G}$ and one of the partitioned graphs does not include terminals. A naive approach deletes each articulation point and bridge and then checks whether the partitioned graphs include terminals. This approach incurs $O((|B| + |A|)(|V| + |E|))$ time complexity. To improve the efficiency, we reconstruct the uncertain graph based on the 2-edge-connected components. To do so, we first unite the set of vertices and edges included in each $C \in \mathcal{C}$ to form a single vertex $v_c$. We then set every articulation point included in $C$ as a vertex $v_a$ and add an edge between $v_a$ and $v_c$. The other vertices and edges that are not included in $C$ remain in the reconstructed graph. Therefore, the vertices of the reconstructed graph represent $\mathcal{C}$, $A$, and the vertices that are not included in any $C$. If any vertex in $C$ except for the articulation points is a terminal, $v_c$ is also a terminal. The reconstructed graph is a tree because the 2-edge-connected components are connected to the other components by a single edge. To compute the necessary vertices and edges, we compute the minimum Steiner tree for the terminals in the reconstructed graph. The minimum Steiner tree includes only the vertices and edges that are necessary to compute the network reliability because it includes only the edges and vertices through which all the terminals are connected. Its computation cost is $O(|V|)$ because the minimum Steiner tree in a tree is computed by a depth-first search from a terminal.

Decompose:
We decompose the graph because the time complexity for computing the network reliability on the decomposed graphs is smaller than that on the original uncertain graph. Each decomposed graph has fewer edges than the original uncertain graph. We decompose the graph according to the following lemma:
Lemma 5.1.
Given an uncertain graph and a set of bridges, we obtain $R[\mathcal{G}, T] = p_b \cdot \prod_{i=1}^{m} R[\mathcal{G}_i, T_i]$, where $p_b = \prod_{e_b \in B} p(e_b)$ and $T_i$ is the set of terminals for $\mathcal{G}_i$.

Proof: Given an intermediate graph $\mathcal{G}_E(E_\exists, E_\neg)$ and an edge $e \in E \setminus (E_\exists \cup E_\neg)$, the network reliability is computed using the Factoring Theorem [10]:
\[ R[\mathcal{G}_E(E_\exists, E_\neg)] = p(e) \cdot R[\mathcal{G}_E(E_\exists \cup e, E_\neg)] + (1 - p(e)) \cdot R[\mathcal{G}_E(E_\exists, E_\neg \cup e)]. \tag{12} \]
If we select a bridge $e_b = (v, v') \in B$ as $e$ in Equation (12), $R[\mathcal{G}_E(E_\exists, E_\neg \cup e)]$ is zero because the terminals in $\mathcal{G}_E(E_\exists, E_\neg \cup e)$ are disconnected. Therefore, we obtain the following equation:
\[ R[\mathcal{G}_E(E_\exists, E_\neg)] = p(e_b) \cdot R[\mathcal{G}_E(E_\exists \cup e_b, E_\neg)]. \tag{13} \]
To connect all the terminals, $e_b$ must be existent, and thus we can decompose the intermediate graph $\mathcal{G}_E$ into two graphs $\mathcal{G}_{E_1}$ and $\mathcal{G}_{E_2}$. We also divide the terminals $T$ into $T_1$ and $T_2$ for $\mathcal{G}_{E_1}$ and $\mathcal{G}_{E_2}$, respectively; $T_1$ includes $\{t \in T, v, v' \mid t, v, v' \in V_1\}$ (similarly for $T_2$). Thus, $R[\mathcal{G}_E] = p(e_b) \cdot R[\mathcal{G}_{E_1}] \cdot R[\mathcal{G}_{E_2}]$. $\mathcal{G}_{E_1}$ and $\mathcal{G}_{E_2}$ are decomposed in the same manner. Then, we obtain $R[\mathcal{G}] = p_b \cdot \prod_{i=1}^{m} R[\mathcal{G}_i, T_i]$. □

We decompose the uncertain graph into several subgraphs based on the above lemma. Its computation cost is $O(|B||V|)$ because, for each bridge, we check whether the decomposed graphs include terminals.

Transform:
We transform the graph to reduce its size. We delete and add the following edges and vertices without sacrificing the exactness of the network reliability:
• Sequential edges ($e = (v, v')$, $e' = (v, v'')$): Delete $v$, $e$, and $e'$, and add a new edge with probability $p(e) \cdot p(e')$ between $v'$ and $v''$, provided that $v$ is not a terminal and its degree is two.
• Parallel edges ($e = (v, v')$, $e' = (v, v')$): Delete $e$ and $e'$, and add a new edge with probability $1 - (1 - p(e)) \cdot (1 - p(e'))$ between $v$ and $v'$.
• Loop: Delete the loop because loops do not contribute to the network reliability. Note that transforming sequential and parallel edges can generate loops.
We repeat this process until the graph does not change. The computation cost is $O(\gamma \cdot |V| \cdot d_{avg})$, where $\gamma$ and $d_{avg}$ are the number of repetitions and the average degree of the vertices, respectively.

Consequently, the extension technique effectively reduces the cost of computing the network reliability with a small preprocessing time. Furthermore, it improves the accuracy of the sampling technique.

Algorithm 1: Computing the approximate network reliability
input: Uncertain graph G, terminals T, maximum BDD size w, number of samples s, 2-edge-connected components C, bridges B, articulation points A
output: Approximate network reliability R̂
1 procedure our approach
2   set T to G;
3   R̂, S_G ← Preprocess(G, T, C, B, A);
4   for G_i ∈ S_G do
5     r ← Construction(G_i, w, s);
6     R̂ ← R̂ · r;
7   return R̂;
8 end procedure

Theorem 4.
Given $\mathcal{G}_1, \ldots, \mathcal{G}_m$ such that $R[\mathcal{G}] = p_b \cdot \prod_{i=1}^{m} R[\mathcal{G}_i]$, the variance of the network reliability decreases for $0 < \hat{R} < 1$ and $0 < p_b < 1$.

Proof: The network reliability is denoted by $\hat{R} = p_b \cdot \prod_{i=1}^{m} \hat{R}[\mathcal{G}_i]$. The variance is computed as follows:
\begin{align}
Var[\hat{R}] &= Var\Big[p_b \cdot \prod_{i=1}^{m} \hat{R}[\mathcal{G}_i]\Big] \notag \\
&= (Var[p_b] + p_b^2)(Var[\hat{R}[\mathcal{G}_1]] + \hat{R}[\mathcal{G}_1]^2) \cdots (Var[\hat{R}[\mathcal{G}_m]] + \hat{R}[\mathcal{G}_m]^2) - p_b^2 \cdot \prod_{i=1}^{m} \hat{R}[\mathcal{G}_i]^2 \notag \\
&= p_b^2 \prod_{i=1}^{m} (Var[\hat{R}[\mathcal{G}_i]] + \hat{R}[\mathcal{G}_i]^2) - p_b^2 \prod_{i=1}^{m} \hat{R}[\mathcal{G}_i]^2 \notag \\
&= p_b^2 \prod_{i=1}^{m} \Big( \frac{\hat{R}[\mathcal{G}_i](1 - \hat{R}[\mathcal{G}_i])}{s} + \hat{R}[\mathcal{G}_i]^2 \Big) - p_b^2 \prod_{i=1}^{m} \hat{R}[\mathcal{G}_i]^2 \notag \\
&= p_b^2 \prod_{i=1}^{m} \hat{R}[\mathcal{G}_i] \Big( \frac{1 + (s - 1)\hat{R}[\mathcal{G}_i]}{s} \Big) - p_b^2 \prod_{i=1}^{m} \hat{R}[\mathcal{G}_i]^2 \notag \\
&< \frac{p_b^2 \prod_{i=1}^{m} \hat{R}[\mathcal{G}_i]}{s} - \frac{p_b^2 \prod_{i=1}^{m} \hat{R}[\mathcal{G}_i]^2}{s} \notag \\
&= \frac{p_b \hat{R}(1 - \hat{R})}{s} < \frac{\hat{R}(1 - \hat{R})}{s}. \tag{14}
\end{align}
Note that $Var[p_b] = 0$. Hence, $Var[\hat{R}]$ is smaller than the variance of the network reliability of the original graph. □

In this section, we explain the entire algorithm of our approach. Algorithm 1 shows the pseudo-code. Our approach first preprocesses the uncertain graph and obtains the decomposed uncertain graphs (line 3). For each decomposed graph, it then constructs an S²BDD to compute the approximate network reliability of the decomposed graph (lines 4–5). The product of the network reliabilities of the decomposed graphs is the network reliability of the original graph (line 6).
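Lemma 5.1 and the transformation rules can be checked on toy graphs by exhaustive enumeration. The following Python sketch is only an illustration under stated assumptions: `reliability` is a hypothetical brute-force helper (exponential in the number of edges, so unrelated to the efficiency of the proposed method), and the example graphs and probabilities are made up.

```python
from itertools import product


def reliability(edges, terminals):
    """Exact network reliability by enumerating all possible graphs.

    edges: list of (u, v, p) with existence probability p.
    terminals: non-empty collection of vertices that must be connected.
    """
    terminals = list(terminals)
    total = 0.0
    for states in product([False, True], repeat=len(edges)):
        parent = {}  # union-find forest over the existent edges

        def find(x):
            parent.setdefault(x, x)
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        prob = 1.0
        for (u, v, p), exists in zip(edges, states):
            prob *= p if exists else 1.0 - p
            if exists:
                parent[find(u)] = find(v)
        if len({find(t) for t in terminals}) == 1:
            total += prob  # all terminals share one component
    return total


# Two triangles joined by the bridge (c, d); terminals a and f.
tri1 = [("a", "b", 0.5), ("b", "c", 0.5), ("a", "c", 0.5)]
tri2 = [("d", "e", 0.5), ("e", "f", 0.5), ("d", "f", 0.5)]
bridge = ("c", "d", 0.5)
whole = reliability(tri1 + [bridge] + tri2, {"a", "f"})
# Lemma 5.1: the bridge endpoints become terminals of the two subgraphs.
parts = bridge[2] * reliability(tri1, {"a", "c"}) * reliability(tri2, {"d", "f"})

# Sequential and parallel rules on two edges of probability 0.5 each.
series = reliability([("a", "b", 0.5), ("b", "c", 0.5)], {"a", "c"})
parallel = reliability([("a", "c", 0.5), ("a", "c", 0.5)], {"a", "c"})
```

In this example `whole` and `parts` agree, and `series` and `parallel` match the probabilities $p(e) \cdot p(e')$ and $1 - (1 - p(e)) \cdot (1 - p(e'))$ assigned to the replacement edges.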
Algorithm 2: Constructing S²BDD
input: Uncertain graph G, maximum size w, number of samples s
output: Approximate network reliability R̂
1 procedure Construction(G, w, s)
2   Ordering(E);
3   p_c, p_d, p̂_sl, c ← 0; /* initialize probabilities and sampling count */
4   s′ ← s;
5   N ← CreateRoot; F ← null;
6   for l = 1, ..., |E| do
7     p_N, p_si ← 0;
8     F′ ← F; compute F based on e_l;
9     while N is not empty do
10      n ← N.pop;
11      for state ∈ {non-existent, existent} do
12        set(n, F′, F, state, G, e_l);
13        if n is 0-sink then p_d ← p_d + p_n;
14        else if n is 1-sink then p_c ← p_c + p_n;
15        else
16          if hashmap[n] is not null then
17            p_hashmap[n] ← p_hashmap[n] + p_n;
18          else if |N_next| ≤ w then
19            h_n ← h(n);
20            N_next.add(n); hashmap[n] ← n;
21            p_Nnext ← p_Nnext + p_n;
22          else
23            p_si ← p_si + p_n;
24            for i = 1, ..., ⌊s′ · (1 − p̂_sl − p_Nnext − p_c − p_d)⌋ do
25              if Sampling(G, n) then c ← c + 1;
26    if c + ⌊s′ · p_Nnext⌋ ≥ s′ then
27      for n ∈ N do
28        for i = 1, ..., ⌊s′ · p_Nnext⌋ do
29          if Sampling(G, n) then c ← c + 1;
30      break;
31    if N_next is empty then break;
32    N ← N_next; sort N in descending order of h(n);
33    p̂_sl ← p̂_sl + p_si; compute s′;
34    clear N_next; clear hashmap;
35  compute R̂ based on the sampling;
36  return R̂;
37 end procedure

Algorithm 2 shows the pseudo-code for the construction of an S²BDD. We process edges in a predefined order and compute the set of frontiers (lines 6–8). For each node at layer $l$, we compute the nodes at layer $l + 1$. The set function (line 12) sets the attributes of the new node on $n$ and checks whether the terminals are connected or disconnected based on Lemmas 4.1 and 4.2. If the new node is the 0-sink or the 1-sink, we add $p_n$ to $p_d$ or $p_c$, respectively (lines 13–14). Otherwise, we compute the hash value of $n$, and if the hash entry of $n$ is not null, we add the probability $p_n$ to the node in the hash map (lines 16–17). If the hash entry is null, we insert $n$ into the set $N_{next}$ of nodes at layer $l + 1$ (lines 18–21). If the size of $N_{next}$ exceeds the maximum size $w$, we delete $n$ and pick possible graphs as samples from $n$ (lines 22–25). After sampling a sufficient number of possible graphs, we sample from the nodes in the S²BDD (lines 26–29).

Algorithm 3: Extension technique
input: Uncertain graph G, terminals T, 2-edge-connected components C, bridges B, articulation points A
output: Probability p_b, the set of decomposed graphs S_G
1 procedure Preprocess(G, T, C, B, A)
    /* Prune */
2   G_r ← Reconstruct(G);
3   Compute the minimum Steiner tree T for G_r and terminals;
4   Delete edges and vertices of G not included in T;
    /* Decompose */
5   p_b ← ∏_{e_b ∈ B} p(e_b);
6   Delete the set of bridges in G;
7   S_G ← the set of disconnected graphs;
    /* Transform */
8   for G′ ∈ S_G do
9     while true do
10      for v ∈ V of G′ do
11        if v connects to edge e = (v, v) then delete e = (v, v);
12        if v ∉ T and v connects to just two edges e = (v, v′) and e′ = (v, v″) then
13          delete e and e′ from G′;
14          add a new edge (v′, v″) with probability p(e) · p(e′);
15      for v ∈ V of G′ do
16        for each pair of u and u′ in the set of neighbor vertices of v do
17          if u = u′ then
18            delete edges e = (v, u) and e′ = (v, u′);
19            add a new edge (v, u) with probability 1 − (1 − p(e)) · (1 − p(e′));
20      if the number of edges does not change then break;
21  return p_b, S_G;
22 end procedure

Algorithm 3 shows the pseudo-code for the extension technique. The extension technique first reconstructs the uncertain graph (line 2). Then, it computes the minimum Steiner tree for the reconstructed graph and prunes the edges and vertices that are not included in the Steiner tree from the original uncertain graph (lines 3–4). To decompose the graph, we compute the product $p_b$ of the probabilities of the bridges (line 5). Then, we delete the bridges from the uncertain graph, and the disconnected subgraphs are inserted into the set of decomposed uncertain graphs (lines 6–7). For each decomposed graph, it transforms the vertices and edges that satisfy the transformation rules (lines 8–20).

We evaluate our approach in terms of efficiency, accuracy, and memory usage.
We summarize the datasets in Table 2. The first two datasets, Zachary-karate-club and American-revolution, are small datasets for evaluating accuracy, which are extracted from KONECT (http://konect.uni-koblenz.de/). We randomly assign probabilities based on the uniform distribution [9]. The other five datasets, DBLP before 2000, DBLP after 2000, Tokyo, New York City, and Hit-direct, are large datasets. Edge existence probabilities for each large dataset are assigned based on the attributes of the edges in each dataset. DBLP before 2000 and DBLP after 2000 are graphs extracted from DBLP (http://dblp.uni-trier.de/), where vertices and edges are authors and co-authorships, respectively. We compute the edge existence probabilities by $\frac{\log(\alpha + 1)}{\log(\alpha_M + 1)}$, where $\alpha$ and $\alpha_M$ denote the number of co-authors and its maximum in each dataset, respectively [6]. The Tokyo and New York City datasets are road networks extracted from OpenStreetMap. We compute the edge existence probabilities in the same manner as for the DBLP datasets, although we use road lengths instead of the number of co-authors. Note that neither the Tokyo nor the New York City dataset is a planar graph. Hit-direct is a protein-protein interaction network extracted from the Human Genome Center. We use the interaction scores $\in (0, 1]$ of the interactions as the edge existence probabilities.

For each dataset, we generate 20 searches (except when we evaluate the accuracy, for which see Section 7.6). The terminals are selected randomly from the vertices. We vary the number of terminals $k$, the number of samples $s$, and the maximum size $w$ of the S²BDD.

Because the existence probabilities of possible graphs can be very small, we use the Boost.Multiprecision library, with a precision of 10,000 decimal places, for the large datasets. We compute the 2-edge-connected components using code provided by the authors of [7]. We compare our approach with two existing approaches: the sampling-based and BDD-based approaches.
The BDD-based approach uses the state-of-the-art library, TdZDD. All algorithms are implemented in C++ and run on a server with an Intel Xeon E7-8860v4 at 2.20 GHz with 256 GB RAM.

We compare the efficiency of our approach with that of the sampling-based and BDD-based approaches. Figure 3 shows the response time for each large dataset when the number of terminals $k$ is set to 5, 10, and 20. DNF indicates that we cannot compute the network reliability due to the lack of memory space. We use the Monte Carlo estimator for our approach and the sampling-based approach (denoted by Pro(MC) and Sampling(MC), respectively) and set $s$ to 10,000. For our approach, we set $w$ to 10,000. We also evaluate our approach without the extension technique, denoted by Pro(MC) w/o ext. We omit the results of the Horvitz-Thompson estimator because they are almost equivalent to those of the Monte Carlo estimator.

The results show that our approach is more efficient than both the sampling-based and the BDD-based approaches for all $k$. The BDD-based approach cannot compute the network reliability because it runs out of memory. Our approach achieves higher efficiency than the sampling-based approach because it reduces the number of samples. Furthermore, we can see that the extension technique improves the efficiency. In particular, our approach works well on the Tokyo and NYC datasets. This is because the S²BDD works well for planar-like graphs (even when they are not strictly planar). In the Hit-direct dataset, the lower and upper bounds do not become tight effectively because the vertex degrees are large. Nevertheless, our approach is more efficient than the sampling-based approach.
We evaluate the effect of the given number of samples. Figure 4 shows (a) the ratio of the response time of our approach to that of the sampling-based approach and (b) the ratio of the updated number of samples $s'$ to $s$, varying the number of samples. This figure shows that our approach becomes more efficient as the given number of samples increases. This is because the reduction of the number of samples is more effective when the given number of samples is large. Therefore, our approach works more effectively when we need a highly accurate network reliability.

(Hit-direct source: http://hintdb.hgc.jp/htp/download.html)

[Table 2: Dataset. Columns include Name, Abbr., and Type.]

[Figure 3: Overview of efficiency. Response time [sec] of Pro(MC), Pro(MC) w/o ext, Sampling(MC), and BDD for (a) k = 5, (b) k = 10, and (c) k = 20; DNF marks the BDD-based approach.]

[Figure 4: Efficiency with varying the number of samples. (a) Response time (reduction rate of time); (b) Number of samples (reduction rate of samples).]

[Figure 5: Efficiency with varying the maximum width. (a) Memory usage [GB]; (b) Response time [sec]. Horizontal axis: maximum width; series: DBLP1, DBLP2, Tokyo, NYC, Hit-d.]
We evaluate the effect of the given maximum width of the S²BDD. The maximum width $w$ affects the memory usage and the efficiency. Figure 5 shows (a) the memory usage and (b) the response time. From Figure 5(a), we can see that the memory usage increases as the maximum width increases. The memory usage depends on the maximum width but not on the size of the graphs. Our approach can therefore be used for large-scale graphs in terms of memory usage. From Figure 5(b), we can see that the response time does not depend heavily on the maximum width. When the maximum width is large, our approach can reduce the number of samples but incurs a large computation cost for constructing the S²BDD. Our approach is thus robust to the maximum width in terms of efficiency. Consequently, our approach effectively decreases the response time even for large-scale graphs.
We evaluate the accuracy of our approach compared with thesampling-based approaches. For both approaches, we use Horvits-Thompson estimator (denoted by
Pro(HT) and
Sampling(HT)) as well as the Monte Carlo estimator. Since the network reliability problem is #P-complete, we cannot compute the exact answer for large datasets in terms of either response time or memory usage. We therefore use the Karate and Am-Rv datasets, for which the exact network reliability can be computed. We evaluate the variance and the error rate to determine the accuracy of the approximation as follows: $\text{variance} = \sum_{i=1}^{q_1} \sum_{j=1}^{q_2} \frac{(R_i - \hat{R}_{i,j})^2}{q_1 \cdot q_2}$ and $\text{error rate} = \sum_{i=1}^{q_1} \sum_{j=1}^{q_2} \frac{|R_i - \hat{R}_{i,j}|}{q_1 \cdot q_2 \cdot R_i}$, where $R_i$ and $\hat{R}_{i,j}$ denote the $i$-th exact network reliability and the $j$-th approximate network reliability for the $i$-th search, respectively. We generate 100 searches and compute the network reliability 100 times for each search (i.e., both $q_1$ and $q_2$ are 100).

Tables 3 and 4 show the accuracy on the Karate and Am-Rv datasets, respectively. Table 3 shows that our approach outperforms the sampling-based approaches in terms of both variance and error rate. Comparing the variance between the estimators, the Monte Carlo estimator is slightly better than the Horvitz-Thompson estimator. This is because we sample possible graphs with replacement, and thus the Horvitz-Thompson estimator is less effective. Table 4 shows that our approach always computes the exact network reliability on the Am-Rv dataset: its error rate is zero. Both of the existing sampling-based approaches have high error rates when k = 20 although their variances are small. Because the network reliability is very small, the sampling-based approaches rarely sample the possible graphs in which terminals are connected. Thus, the approximate network reliability is often zero, and the error rates are close to one. From these results, we conclude that our approach can achieve lower variance and error rate with fewer samples than the other approaches and can compute the exact answer for small-scale graphs.

Table 3: Accuracy on Karate dataset (columns: k, Method, Variance, Error rate; methods: Pro(MC), Pro(HT), Sampling(MC), Sampling(HT))

Table 4: Accuracy on Am-Rv dataset (columns: k, Method, Variance, Error rate; methods: Pro(MC), Pro(HT), Sampling(MC), Sampling(HT))

Finally, we evaluate the performance of the extension technique. Table 5 details its effect, showing the process time and the ratio of the maximum number of edges in the decomposed graphs to the number of edges in the original uncertain graph. The results show that the extension technique takes very little time compared with computing the network reliability. Thus, it effectively reduces the total response time. Since it reduces the size of uncertain graphs, it mitigates the computation cost of the S2BDD. The extension technique is therefore effective for improving the efficiency of our approach.

Table 5: Effect of extension technique (columns: Dataset, Process time [sec], Reduced graph size; first entry: Karate, 0.0277)
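The variance and error rate used in this section, and the way a plain Monte Carlo estimator degrades when the reliability is very small, can be illustrated with a minimal sketch. It is not the implementation used in the experiments: the function names (`connected`, `monte_carlo_reliability`, `accuracy_metrics`), the toy path graph, and the sample counts are all assumptions made for illustration.

```python
import random


def connected(n, edges, terminals):
    """True if all terminals lie in one component of the sampled graph."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(t) for t in terminals}) == 1


def monte_carlo_reliability(n, prob_edges, terminals, samples, rng):
    """Plain Monte Carlo estimate: the fraction of sampled possible graphs
    in which all terminals are connected (sampling with replacement)."""
    hits = 0
    for _ in range(samples):
        # Keep each edge independently with its existence probability.
        kept = [(u, v) for u, v, p in prob_edges if rng.random() < p]
        hits += connected(n, kept, terminals)
    return hits / samples


def accuracy_metrics(exact, estimates):
    """Return (variance, error_rate) as defined in the text.

    exact[i]        -- exact reliability R_i of the i-th search (must be > 0)
    estimates[i][j] -- j-th approximate reliability for the i-th search
    """
    q1 = len(exact)
    q2 = len(estimates[0])
    squared = 0.0
    relative = 0.0
    for r_i, row in zip(exact, estimates):
        for est in row:
            squared += (r_i - est) ** 2
            relative += abs(r_i - est) / r_i
    return squared / (q1 * q2), relative / (q1 * q2)


# Hypothetical toy input: a path 0-1-2-3 whose edges each exist with
# probability 0.01, so the exact reliability between 0 and 3 is 1e-6.
rng = random.Random(0)
path = [(0, 1, 0.01), (1, 2, 0.01), (2, 3, 0.01)]
runs = [monte_carlo_reliability(4, path, [0, 3], 1000, rng) for _ in range(5)]
variance, error_rate = accuracy_metrics([1e-6], [runs])
```

With so few samples relative to the inverse of the reliability, most runs on this toy graph are expected to return zero. Each zero estimate contributes a relative error of exactly one while adding almost nothing to the variance, which matches the behaviour reported for the sampling-based approaches at k = 20.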
In this paper, we proposed an efficient sampling-based approach for computing the approximate network reliability. Our approach reduces the number of samples by using lower and upper bounds of the network reliability based on stratified sampling. We developed a scalable and sampling BDD, called S2BDD, which efficiently computes the bounds. The S2BDD preferentially searches for the possible graphs that most improve the bounds. We further developed an extension technique for our approach that reduces the size of graphs. Experiments demonstrated that our approach is up to 51.2 times faster than the existing sampling-based approach, with higher accuracy.
ACKNOWLEDGEMENT
This research is partially supported by JST ACT-I Grant Number JPMJPR18UD and by JSPS KAKENHI Grant-in-Aid for Young Scientists (B) (JP15K21069), Japan.
REFERENCES
[1] Charu C. Aggarwal. 2009. Managing and Mining Uncertain Data. Vol. 35. Kluwer.
[2] Avinash Agrawal and A Satyanarayana. 1984. An O(|E|) time algorithm for computing the reliability of a class of directed networks. Operations research 32, 3 (1984), 493–515.
[3] S Hasanuddin Ahmad. 1988. Simple enumeration of minimal cutsets of acyclic directed graph. IEEE transactions on reliability 37, 5 (1988), 484–487.
[4] Saurabh Asthana, Oliver D King, Francis D Gibbons, and Frederick P Roth. 2004. Predicting protein complex membership using probabilistic network reliability. Genome research 14, 6 (2004), 1170–1175.
[5] Michael O Ball, Charles J Colbourn, and J Scott Provan. 1995. Network reliability. Handbooks in operations research and management science
[6] PVLDB 11, 4 (2017), 472–484.
[7] Lijun Chang, Jeffrey Xu Yu, Lu Qin, Xuemin Lin, Chengfei Liu, and Weifa Liang. 2013. Efficiently computing k-edge connected components via graph decomposition. In SIGMOD. 205–216.
[8] James Cheng, Zechao Shang, Hong Cheng, Haixun Wang, and Jeffrey Xu Yu. 2014. Efficient processing of k-hop reachability queries. The VLDB Journal 23, 2 (2014), 227–252.
[9] Yurong Cheng, Ye Yuan, Lei Chen, Guoren Wang, Christophe Giraud-Carrier, and Yongjiao Sun. 2016. DISTR: a distributed method for the reachability query over large uncertain graphs. IEEE Transactions on Parallel and Distributed Systems 27, 11 (2016), 3172–3185.
[10] Charles J Colbourn. 1987. The combinatorics of network reliability. Oxford University Press New York.
[11] George S Fishman. 1986. A comparison of four Monte Carlo methods for estimating the probability of st connectedness. IEEE Transactions on reliability 35, 2 (1986), 145–155.
[12] Christian Frey, Andreas Züfle, Tobias Emrich, and Matthias Renz. 2018. Efficient information flow maximization in probabilistic graphs. TKDE 30, 5 (2018), 880–894.
[13] R Hamer, G De Jong, E Kroes, and P Warffemius. 2005. The value of reliability in Transport–Provisional values for the Netherlands based on expert opinion. Transport Research Centre of the Dutch Ministry of Transport (2005).
[14] Gary Hardy, Corinne Lucet, and Nikolaos Limnios. 2007. K-terminal network reliability measures with binary decision diagrams. IEEE Transactions on Reliability 56, 3 (2007), 506–515.
[15] David G Harris and Aravind Srinivasan. 2018. Improved bounds and algorithms for graph cuts and network reliability. Random Structures & Algorithms 52, 1 (2018), 74–135.
[16] Johannes U Herrmann and Sieteng Soh. 2009. A memory efficient algorithm for network reliability. In Asia-Pacific Conference. 703–707.
[17] Ronald Jansen, Haiyuan Yu, Dov Greenbaum, Yuval Kluger, Nevan J Krogan, Sambath Chung, Andrew Emili, Michael Snyder, Jack F Greenblatt, and Mark Gerstein. 2003. A Bayesian networks approach for predicting protein-protein interactions from genomic data. science
[18] SIGKDD. 992–1000.
[19] Ruoming Jin, Lin Liu, Bolin Ding, and Haixun Wang. 2011. Distance-constraint reachability computation in uncertain graphs. PVLDB 4, 9 (2011), 551–562.
[20] Charles R Kalmanek and Y Richard Yang. 2010. The challenges of building reliable networks and networked application services. In Guide to Reliable Internet Services and Applications. 3–17.
[21] Jun Kawahara, Takeru Inoue, Hiroaki Iwashita, and Shinichi Minato. 2017. Frontier-based search for enumerating all constrained subgraphs with compressed representation. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
[22] International Conference on Extending Database Technology. 535–546.
[23] Arijit Khan and Lei Chen. 2015. On uncertain graphs modeling and queries. PVLDB 8, 12 (2015), 2042–2043.
[24] Minh Lê, Max Walter, and Josef Weidendorfer. 2014. Improving the kuo-lu-yeh algorithm for assessing two-terminal reliability. In European Dependable Computing Conference. 13–22.
[25] Mitchell O Locks. 1987. A minimizing algorithm for sum of disjoint products. IEEE Transactions on Reliability 36, 4 (1987), 445–453.
[26] Takanori Maehara, Hirofumi Suzuki, and Masakazu Ishihata. 2017. Exact Computation of Influence Spread by Binary Decision Diagrams. In International Conference on World Wide Web. 947–956.
[27] Eugène Manzi, Martine Labbé, Guy Latouche, and Francesco Maffioli. 2001. Fishman’s sampling plan for computing network reliability. IEEE Transactions on Reliability 50, 1 (2001), 41–46.
[28] Rajeev Motwani and Prabhakar Raghavan. 2010. Randomized algorithms. Chapman & Hall/CRC.
[29] William G Ortel. 1999. Broad band optical fiber telecommunications network. US Patent 5,861,966.
[30] J Scott Provan. 1986. The complexity of reliability computations in planar and acyclic graphs. SIAM J. Comput. 15, 3 (1986), 694–702.
[31] J NK Rao, HO Hartley, and WG Cochran. 1962. On a simple procedure of unequal probability sampling without replacement. Journal of the Royal Statistical Society. Series B (Methodological) (1962), 482–491.
[32] S Thompson. 2002. Sampling. Wiley.
[33] Leslie G Valiant. 1979. The complexity of enumeration and reliability problems. SIAM J. Comput. 8, 3 (1979), 410–421.
[34] Lucien DJ Valstar, George HL Fletcher, and Yuichi Yoshida. 2017. Landmark Indexing for Evaluation of Label-Constrained Reachability Queries. In SIGMOD. 345–358.
[35] Fu-Min Yeh, Shyue-Kung Lu, and Sy-Yen Kuo. 2002. OBDD-based evaluation of k-terminal network reliability. IEEE Transactions on Reliability 51, 4 (2002), 443–451.
[36] Bihai Zhao, Jianxin Wang, Min Li, Fang-Xiang Wu, and Yi Pan. 2014. Detecting protein complexes based on uncertain graph model. IEEE/ACM Transactions on Computational Biology and Bioinformatics 11, 3 (2014), 486–497.
[37] Junfeng Zhou, Shijie Zhou, Jeffrey Xu Yu, Hao Wei, Ziyang Chen, and Xian Tang. 2017. DAG reduction: Fast answering reachability queries. In