The joint survival signature of coherent systems with shared components
Tahani Coolen-Maturi, Frank P.A. Coolen, Narayanaswamy Balakrishnan
Department of Mathematical Sciences, Durham University, Durham, UK.
Department of Mathematics and Statistics, McMaster University, Ontario, Canada.
Abstract
The concept of joint bivariate signature, introduced by Navarro et al. [13], is a useful tool for quantifying the reliability of two systems with shared components. As with the univariate system signature, introduced by Samaniego [17], its applications are limited to systems with only one type of component, which restricts its practical use. Coolen and Coolen-Maturi [2] introduced the survival signature, which generalizes Samaniego's signature and can be used for systems with multiple types of components. This paper introduces a joint survival signature for multiple systems with multiple types of components and with some components shared between systems. A particularly important feature is that the functioning of these systems can be considered at different times, enabling computation of relevant conditional probabilities with regard to a system's functioning conditional on the status of another system with which it shares components. Several opportunities for practical application and related challenges for further development of the presented concept are briefly discussed, setting out an important direction for future research.
Keywords:
Coherent systems, exchangeable components, signature, survival signature, system reliability.
1. Introduction
In recent decades, the system signature has become a popular tool for quantifying the reliability of coherent systems consisting of components with exchangeable random failure times [16], where in the literature the assumption of exchangeability [4] is often replaced by the stronger assumption of independent and identically distributed (iid) component failure times. The system signature is a summary of the system structure function which is sufficient to quantify several important aspects of the reliability of a system, in particular the system's failure time distribution. A detailed introduction and overview of system signatures is presented by Samaniego [17].

The essential property of the system signature is that it enables information about the system structure to be fully taken into account through the signature, separately from information about the random failure times of the components. The main disadvantage of the system signature, however, is that it becomes extremely complicated, and is indeed effectively impossible, to keep this separation when generalizing the concept to systems with multiple types of components. Such a generalization is crucial for a practically applicable theory, as most real-world systems consist of more than a single type of component [2, 12]. As an alternative to the system signature, Coolen and Coolen-Maturi [2] introduced the survival signature. For systems with just one type of component, the survival signature is equivalent to the system signature, but the survival signature can be defined for, and easily applied to, systems with multiple types of components.

There are many scenarios where two or more systems share components, which can be of different types. Consider for example the case of two computers linked to a server, where the performance of each computer depends on the performance of the shared components, in the server, and on the performance of its own components. It should be emphasized that 'systems' has a wide meaning in this context, including not only engineering systems but also networks and organisational structures. For example, if the good functioning of multiple academic departments at one university during an exams period with strict marking deadlines depends on one central information technology support group, then the latter can be regarded as a component shared by the different departments ('systems'). While we do not focus explicitly on it, it is important to note that the theory in this paper can also be applied to the case of one system which performs two or more functions, with some but not all components involved in multiple functions.

Navarro et al. [11] and Zarezadeh et al. [19] introduced the system signature for systems with shared components; however, the joint system signature representation has no direct probability interpretation as it can take negative values. Navarro et al. [13] presented the so-called joint bivariate system signature, which has a probabilistic interpretation, and they also showed how the joint bivariate system signature can be used to perform stochastic comparisons. Again, however, their method is limited to one type of component, which makes it less useful in real-world applications.

In this paper, we introduce the joint survival signature of coherent systems with shared components, which can be of different types. This first presentation of the new concept emphasizes the opportunity to consider the reliability of the systems at different time points, which is crucial for many practical applications. In particular, it enables one to infer one system's reliability conditional on the information that another system, with which it shares some components, functions or not at a different time point. This introduction of the new concept is the first step of an important and extensive research direction; challenges for computation and application are outlined in the final section.

This paper is organised as follows. Section 2 provides a brief overview of the survival signature introduced by Coolen and Coolen-Maturi [2]. Section 3 introduces the new joint survival signature of two coherent systems with shared components, followed by generalisation to three coherent systems with shared components in Section 4. It is briefly discussed how the further generalization to more than three systems can be achieved; all required ingredients for such a generalization follow quite straightforwardly from the case with three systems. Finally, in Section 5 opportunities and challenges for further research to enable practical application of the new concept to large-scale systems and networks are briefly discussed.
2. Survival signature
For a system with $n$ components, we define the state vector $x \in \{0,1\}^n$ with entry $x_i = 1$ if component $i$ functions and $x_i = 0$ if not. The labelling of the components is arbitrary but must be fixed to define $x$. The structure function $\phi : \{0,1\}^n \to \{0,1\}$, defined for all possible $x$, takes the value 1 if the system functions and 0 if the system does not function for state vector $x$. In this paper, we restrict attention to coherent systems, which implies that $\phi(x)$ is not decreasing in any of the components of $x$, so system functioning cannot be improved by worse performance of one or more of its components. We further assume that $\phi(\mathbf{0}) = 0$ and $\phi(\mathbf{1}) = 1$, so the system fails if all its components fail and it functions if all its components function. These assumptions could be relaxed but are reasonable for most practical systems.

Consider a system with $K \geq 1$ types of components, with $n_k$ components of type $k \in \{1, 2, \ldots, K\}$ and $\sum_{k=1}^{K} n_k = n$. Assume that the random failure times of components of the same type are exchangeable [4], while full independence is assumed for the random failure times of components of different types. Due to the arbitrary ordering of the components in the state vector, components of the same type can be grouped together, leading to a state vector that can be written as $x = (x^1, x^2, \ldots, x^K)$, with $x^k = (x^k_1, x^k_2, \ldots, x^k_{n_k})$ the sub-vector representing the states of the components of type $k$.

Coolen and Coolen-Maturi [2] introduced the survival signature for such a system, denoted by $\Phi(l_1, l_2, \ldots, l_K)$, with $l_k = 0, 1, \ldots, n_k$ for $k = 1, \ldots, K$, which is defined to be the probability that the system functions given that precisely $l_k$ of its $n_k$ components of type $k$ function, for each $k \in \{1, 2, \ldots, K\}$.

There are $\binom{n_k}{l_k}$ state vectors $x^k$ with $\sum_{i=1}^{n_k} x^k_i = l_k$; let $S^k_l$ denote the set of these state vectors for components of type $k$ and let $S_{l_1,\ldots,l_K}$ denote the set of all state vectors for the whole system for which $\sum_{i=1}^{n_k} x^k_i = l_k$, $k = 1, 2, \ldots, K$. Due to the exchangeability assumption for the failure times of the $n_k$ components of type $k$, all the state vectors $x^k \in S^k_l$ are equally likely to occur, hence

$$\Phi(l_1, \ldots, l_K) = \left[ \prod_{k=1}^{K} \binom{n_k}{l_k}^{-1} \right] \times \sum_{x \in S_{l_1,\ldots,l_K}} \phi(x) \tag{1}$$

Let $C_k(t) \in \{0, 1, \ldots, n_k\}$ denote the number of components of type $k$ in the system which function at time $t > 0$. The probability that the system functions at time $t > 0$ is

$$P(T_S > t) = \sum_{l_1=0}^{n_1} \cdots \sum_{l_K=0}^{n_K} \Phi(l_1, \ldots, l_K) \, P\left( \bigcap_{k=1}^{K} \{ C_k(t) = l_k \} \right) \tag{2}$$

If one assumes independence of the failure times of components of different types, then this leads to, for $l_k \in \{0, 1, \ldots, n_k\}$ for each $k \in \{1, \ldots, K\}$,

$$P\left( \bigcap_{k=1}^{K} \{ C_k(t) = l_k \} \right) = \prod_{k=1}^{K} P(C_k(t) = l_k) \tag{3}$$

If, in addition, one assumes that the failure times of components of the same type are iid with known cumulative distribution function (CDF) $F_k(t)$ for type $k$, then this leads to

$$P\left( \bigcap_{k=1}^{K} \{ C_k(t) = l_k \} \right) = \prod_{k=1}^{K} \binom{n_k}{l_k} [F_k(t)]^{n_k - l_k} [1 - F_k(t)]^{l_k} \tag{4}$$

A crucial practical consideration is how to decide if it is reasonable to assume that components are of the same type, in the sense of having exchangeable failure times. An easy way to think about this is as follows. Suppose that there is a number of components, and you get the information that, at a specific time, one of them has failed, without any further information which enables you to identify which component has failed. Exchangeability then implies that each of these components is equally likely to be the one that has failed. Note that this includes consideration of the role of the components and the environment in which they function in the system. Of course, this is a subjective modelling assumption which relates to the level of detail in which one models the system; in most of the reliability theory literature it is tacitly assumed. In particular, the common assumption of iid failure times of components is a stronger assumption.

Since its presentation by Coolen and Coolen-Maturi [2], there has been substantial research contributing to the further theory and applicability of survival signature methods.
An important topic is computation of the survival signature: a useful method based on binary decision diagrams has been presented by Reed [15], while derivation of the survival signature for systems built up from subsystems in series or parallel configuration was also presented [3]. The survival signature also enables very efficient simulation methods to be developed [7, 14]. Further examples of powerful methodology for system reliability quantification enabled by the use of survival signatures include the modelling of dependence between components of different types [5, 7], Bayesian and nonparametric predictive inference [1, 3], reliability-redundancy allocation [9], phased missions [8], component reliability importance measures [6], resilience achieved by swapping components within a system [10] and stochastic comparison of different systems [18].
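Equations (1), (2) and (4) can be illustrated with a small brute-force computation. The following is a minimal sketch, not an efficient algorithm: the function names, the example system (two type-1 components in parallel, in series with one type-2 component) and the exponential CDFs are all assumptions made for illustration and do not come from the paper.

```python
from itertools import combinations, product
from math import comb, exp


def survival_signature(phi, n):
    """Equation (1): average phi over all state vectors with l_k of n_k up.

    phi takes a tuple of K state tuples (one per component type);
    n = (n_1, ..., n_K). Brute force, so only suitable for small systems.
    """
    sig = {}
    for l in product(*(range(nk + 1) for nk in n)):
        # All ways to choose which l_k of the n_k type-k components function.
        choices = [list(combinations(range(nk), lk)) for nk, lk in zip(n, l)]
        working = 0
        for chosen in product(*choices):
            x = tuple(tuple(1 if i in c else 0 for i in range(nk))
                      for nk, c in zip(n, chosen))
            working += phi(x)
        total = 1
        for nk, lk in zip(n, l):
            total *= comb(nk, lk)
        sig[l] = working / total
    return sig


def system_survival(sig, n, cdfs, t):
    """Equations (2) and (4): iid failure times within each type."""
    prob = 0.0
    for l, value in sig.items():
        weight = 1.0
        for nk, lk, F in zip(n, l, cdfs):
            weight *= comb(nk, lk) * F(t) ** (nk - lk) * (1 - F(t)) ** lk
        prob += value * weight
    return prob


# Hypothetical system (an assumption for illustration): two type-1
# components in parallel, in series with a single type-2 component.
def phi(x):
    (a, b), (c,) = x
    return int((a or b) and c)


n = (2, 1)
sig = survival_signature(phi, n)
cdfs = [lambda t: 1 - exp(-t), lambda t: 1 - exp(-2 * t)]  # assumed CDFs
p = system_survival(sig, n, cdfs, 0.5)
print(sig[(1, 1)], p)
```

For this assumed system the result can be checked against the direct expression $[1 - F_1(t)^2][1 - F_2(t)]$, which is a useful sanity test before applying the same routine to a larger structure.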
3. Joint survival signature of two coherent systems with shared components
In this section we present the joint survival signature of two coherent systems that share some components. It is important that the functioning of the two systems can be considered at different moments in time. This can be used to derive the marginal survival function of one of the systems as well as the conditional reliability of a system given the status of the other system at any time. First we consider systems that only have a single type of component, which is helpful to explain the main ideas and notation. This is later extended to systems with components of multiple types. It should be emphasized that no maintenance or replacement activities are considered throughout this paper, so once a component has failed it remains in the failed state.
Let $T_1$ and $T_2$ be the failure times of two coherent systems, $S_1$ and $S_2$, based on components with iid failure times $X_1, \ldots, X_n$ having a common continuous distribution function $F$. The assumption of iid failure times is for ease of presentation. A coherent system fails at the failure of one of its components. Assume that the first system has $n_1^*$ components and the second system has $n_2^*$ components, with $n_{12}$ of these components in common; these are called shared components, so in total there are $n = n_1^* + n_2^* - n_{12}$ components. We assume that $n_{12} > 0$, otherwise the two systems are independent by the iid assumption. Let the numbers of components in $S_1$ and $S_2$ which are not shared with the other system be denoted by $n_1$ and $n_2$, respectively, so $n_1^* = n_1 + n_{12}$, $n_2^* = n_2 + n_{12}$ and $n = n_1 + n_2 + n_{12}$.

The joint survival signature will enable reliability quantification for both systems at possibly different times, say $S_1$ is considered at time $t_1$ and $S_2$ at time $t_2$. This means that the numbers of the shared components functioning at these two different times must both be specified. Note that the specific times $t_1$ and $t_2$ do not play any further role in the survival signature, as it enables inference for all time points when combined with the component failure time distributions. Hence the joint survival signature presented in this paper has the same advantageous property as the survival signature for a single system: it takes the structure of the systems into account while being separated from the random component failure times.

The joint survival signature $\Phi(l_1, l_2, l_{12}^{[1]}, l_{12}^{[2]})$ is defined as the probability that systems $S_1$ and $S_2$ both function given that precisely $l_1$ out of $n_1$ and $l_{12}^{[1]}$ out of the $n_{12}$ shared components function when $S_1$ is being considered, and precisely $l_2$ out of $n_2$ and $l_{12}^{[2]}$ out of the $n_{12}$ shared components function when $S_2$ is being considered. Denoting the events that $S_1$ and $S_2$ function at the moment of time they are considered by $SF_1$ and $SF_2$, respectively, the joint survival signature is

$$\Phi(l_1, l_2, l_{12}^{[1]}, l_{12}^{[2]}) = P(SF_1, SF_2 \mid l_1, l_2, l_{12}^{[1]}, l_{12}^{[2]}) \tag{5}$$

It is also important to emphasise, given the above setting, that the same $\min(l_{12}^{[1]}, l_{12}^{[2]})$ shared components are functioning at both times $t_1$ and $t_2$, and the same $n_{12} - \max(l_{12}^{[1]}, l_{12}^{[2]})$ shared components are not functioning at both times. The remaining shared components (if $l_{12}^{[1]}$ and $l_{12}^{[2]}$ are different) fail between the two times. Therefore, $\Phi(l_1, l_2, l_{12}^{[1]}, l_{12}^{[2]}) = 0$ if $t_1 < t_2$ and $l_{12}^{[2]} > l_{12}^{[1]}$, or if $t_1 > t_2$ and $l_{12}^{[2]} < l_{12}^{[1]}$.

Let $C^{1}_{t_1} \in \{0, 1, \ldots, n_1\}$ denote the number of components that are only in system $S_1$ that function at time $t_1 > 0$, and let $C^{12[1]}_{t_1} \in \{0, 1, \ldots, n_{12}\}$ denote the number of shared components in system $S_1$ that function at $t_1$. Similarly, let $C^{2}_{t_2} \in \{0, 1, \ldots, n_2\}$ denote the number of components that are only in system $S_2$ that function at time $t_2 > 0$, and let $C^{12[2]}_{t_2} \in \{0, 1, \ldots, n_{12}\}$ denote the number of shared components in system $S_2$ that function at $t_2$. Let the probability distribution of the component failure time have CDF $F(t)$, then for $t_1 < t_2$, which implies that $l_{12}^{[2]} \leq l_{12}^{[1]}$,

$$P(T_1 > t_1, T_2 > t_2) = \sum_{l_1=0}^{n_1} \sum_{l_2=0}^{n_2} \sum_{l_{12}^{[1]}=0}^{n_{12}} \sum_{l_{12}^{[2]}=0}^{n_{12}} \Phi(l_1, l_2, l_{12}^{[1]}, l_{12}^{[2]}) \, P\left( C^{1}_{t_1} = l_1, C^{2}_{t_2} = l_2, C^{12[1]}_{t_1} = l_{12}^{[1]}, C^{12[2]}_{t_2} = l_{12}^{[2]} \right) \tag{6}$$
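For iid component failure times with CDF $F$ and $t_1 < t_2$, the probability of the counts in the joint survival function can be written out explicitly. The expression below is a sketch obtained by restricting the three-system formula (20) of Section 4 to two systems; it is a reconstruction by analogy rather than a formula quoted from this section. The components belonging to only one system contribute binomial terms, and the shared components contribute a trinomial term over the three possibilities: failed before $t_1$, failed between $t_1$ and $t_2$, or still functioning at $t_2$.

```latex
\begin{aligned}
P\bigl(C^{1}_{t_1} = l_1,\; C^{2}_{t_2} = l_2,\;
       C^{12[1]}_{t_1} = l_{12}^{[1]},\; C^{12[2]}_{t_2} = l_{12}^{[2]}\bigr)
 ={}& \binom{n_1}{l_1} [1 - F(t_1)]^{l_1} [F(t_1)]^{n_1 - l_1} \\
 &\times \binom{n_2}{l_2} [1 - F(t_2)]^{l_2} [F(t_2)]^{n_2 - l_2} \\
 &\times \frac{n_{12}!}{(n_{12} - l_{12}^{[1]})!\,(l_{12}^{[1]} - l_{12}^{[2]})!\,l_{12}^{[2]}!}
   \,[F(t_1)]^{n_{12} - l_{12}^{[1]}}
   \,[F(t_2) - F(t_1)]^{l_{12}^{[1]} - l_{12}^{[2]}}
   \,[1 - F(t_2)]^{l_{12}^{[2]}}
\end{aligned}
```

The case $t_1 > t_2$ follows by exchanging the roles of the two times in the trinomial term, with $l_{12}^{[2]} \geq l_{12}^{[1]}$.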
Example 1.
Consider the two systems in Figure 1. All components are assumed to be of the same type, with exchangeable failure times. The systems share components A, B and C, and each system has two further components. For System 1 we have $n_1^* = 5$, $n_1 = 2$, $n_{12} = 3$ and for System 2 we have $n_2^* = 5$, $n_2 = 2$, $n_{12} = 3$. For input $(l_1 = 1, l_2 = 1, l_{12}^{[1]} = 2, l_{12}^{[2]} = 1)$, Table 1 lists all 24 possible scenarios of functioning components. If these are indeed the numbers of functioning components, then each of these scenarios has probability 1/24 due to the exchangeability assumption. In this case, both systems function for 10 of the 24 possible scenarios, hence the survival signature is equal to $\Phi(1, 1, 2, 1) = 10/24$.

System functions   l_1 = 1   l_2 = 1   l_12^[1] = 2   l_12^[2] = 1
0                  D         F         AB             A
0                  D         F         AB             B
0                  D         G         AB             A
0                  D         G         AB             B
0                  D         F         AC             A
1                  D         F         AC             C
1                  D         G         AC             A
0                  D         G         AC             C
1                  D         F         BC             B
1                  D         F         BC             C
0                  D         G         BC             B
0                  D         G         BC             C
0                  E         F         AB             A
1                  E         F         AB             B
1                  E         G         AB             A
0                  E         G         AB             B
0                  E         F         AC             A
1                  E         F         AC             C
1                  E         G         AC             A
0                  E         G         AC             C
1                  E         F         BC             B
1                  E         F         BC             C
0                  E         G         BC             B
0                  E         G         BC             C

Table 1: Functioning components and system state, Example 1
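The scenario counting used in Example 1 can be sketched in code. Since the system diagrams are not reproduced here, the two structure functions below are assumptions made for illustration: a four-component pair of systems sharing A and B, with `phi1` and `phi2` chosen so that the enumeration agrees with the values listed for Example 2 in Table 2. All function and variable names are hypothetical. The second function combines the enumerated signature with the binomial and trinomial weights for $t_1 \leq t_2$ under iid failure times.

```python
from itertools import combinations, product
from math import comb, exp, factorial


# Assumed structure functions (not read off Figure 1 or Figure 2).
def phi1(up):  # system 1: A in parallel with the series pair B, C
    return "A" in up or ("B" in up and "C" in up)


def phi2(up):  # system 2: A in series with the parallel pair B, D
    return "A" in up and ("B" in up or "D" in up)


OWN1, OWN2, SHARED = ["C"], ["D"], ["A", "B"]


def joint_signature(l1, l2, l12_1, l12_2):
    """Fraction of equally likely scenarios in which both systems function."""
    big, small = max(l12_1, l12_2), min(l12_1, l12_2)
    good = total = 0
    for c1, c2 in product(combinations(OWN1, l1), combinations(OWN2, l2)):
        for s_big in combinations(SHARED, big):
            # Without repair, the shared components functioning at the later
            # time are a subset of those functioning at the earlier time.
            for s_small in combinations(s_big, small):
                s1, s2 = ((s_big, s_small) if l12_1 >= l12_2
                          else (s_small, s_big))
                total += 1
                good += phi1(set(c1) | set(s1)) and phi2(set(c2) | set(s2))
    return good / total


def joint_survival(t1, t2, F):
    """P(T1 > t1, T2 > t2) for t1 <= t2, iid component failure times, CDF F."""
    n1, n2, n12 = len(OWN1), len(OWN2), len(SHARED)
    p1, p2 = F(t1), F(t2)
    result = 0.0
    for l1, l2, a in product(range(n1 + 1), range(n2 + 1), range(n12 + 1)):
        for b in range(a + 1):  # a, b: shared components up at t1 and t2
            multinom = factorial(n12) / (
                factorial(n12 - a) * factorial(a - b) * factorial(b))
            weight = (comb(n1, l1) * (1 - p1) ** l1 * p1 ** (n1 - l1)
                      * comb(n2, l2) * (1 - p2) ** l2 * p2 ** (n2 - l2)
                      * multinom * p1 ** (n12 - a)
                      * (p2 - p1) ** (a - b) * (1 - p2) ** b)
            result += joint_signature(l1, l2, a, b) * weight
    return result


F = lambda t: 1 - exp(-t)           # Exponential with rate 1
print(joint_signature(1, 1, 1, 1))  # one input of the joint signature
print(joint_survival(0.0, 0.7, F))  # t1 = 0 gives the marginal of system 2
```

Setting $t_1 = 0$ means that all components function when the first system is considered, so the result reduces to the marginal survival function of the second system; for the assumed `phi2` this equals $p^2(2 - p)$ with $p = e^{-t}$, which gives a simple consistency check.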
Example 2.
Consider the two systems in Figure 2, which share components A and B and each have one further component. Again, all components are of the same type so their failure times are assumed to be exchangeable. For System 1 we have $n_1^* = 3$, $n_1 = 1$, $n_{12} = 2$ and for System 2 we have $n_2^* = 3$, $n_2 = 1$, $n_{12} = 2$. The survival signature is zero, $\Phi(l_1, l_2, l_{12}^{[1]}, l_{12}^{[2]}) = 0$, for the trivial cases: $l + l <$ …, $l_{12}^{[1]} = 0$ and $l_{12}^{[2]} = 0$. The survival signature for the remaining cases is given in Table 2. It is derived in a similar way as illustrated in Example 1 for each possible input, although the actual work required is not as bad as it may seem, due to logical relationships between the systems' states and different inputs under the assumption that the systems are coherent. Assuming that the components' failure times are iid with Exponential distribution with rate 1, that is $F(t) = 1 - e^{-t}$, the joint survival function for the two system failure times is presented in Figure 3.

Figure 2: Two systems with one type of components, Example 2

C               D               A,B                    A,B
l_1 ∈ {0,1}     l_2 ∈ {0,1}     l_12^[1] ∈ {0,1,2}     l_12^[2] ∈ {0,1,2}     Φ(l_1, l_2, l_12^[1], l_12^[2])
0               1               1                      1                      1/2
1               0               1                      1                      0
1               1               1                      1                      1/2
0               0               1                      1                      0
0               0               2                      1                      0
0               1               2                      1                      1/2
1               0               2                      1                      0
1               1               2                      1                      1/2
0               0               1                      2                      1/2
0               1               1                      2                      1/2
1               0               1                      2                      1
1               1               1                      2                      1
1               1               2                      2                      1

Table 2: Two systems with one type of components, Example 2

Figure 3: Joint survival function of two systems with one type of components, Example 2

3.2. Marginal and conditional survival functions

If one has the joint survival signature for the two systems available, it is straightforward to derive the marginal survival distribution for one of the systems, using the assumption that all components function at time 0, so

$$P(T_2 > t) = P(T_1 > 0, T_2 > t) = \sum_{l_2=0}^{n_2} \sum_{l_{12}^{[2]}=0}^{n_{12}} \Phi(n_1, l_2, n_{12}, l_{12}^{[2]}) \, P\left( C^{2}_{t} = l_2, C^{12[2]}_{t} = l_{12}^{[2]} \right)$$
4. Joint survival signature of three coherent systems with shared components
There will be situations where more than two systems share some components, e.g. in networks of systems where a central server or electricity supply may serve several or even all systems. Further examples can be encountered when systems consisting of combined hardware and software are considered, with software often shared between different systems. The concept of joint survival signature, presented in this paper, can be generalized to any number of systems with any kind of component sharing. To illustrate such a generalization, we briefly consider the case of three systems with a single type of component, where the systems share some components. Generalization to multiple types of components can be achieved along the lines presented above.

Consider three coherent systems, $S_1$, $S_2$ and $S_3$. Let the number of components shared by $S_i$ and $S_j$ only be denoted by $n_{ij}$, and let the number of components shared by all three systems be $n_{123}$. Let $n_i$ be the number of components in $S_i$ that are not shared with any other system. The numbers of components in the systems are

$$n_1^* = n_1 + n_{12} + n_{13} + n_{123}$$
$$n_2^* = n_2 + n_{12} + n_{23} + n_{123}$$
$$n_3^* = n_3 + n_{13} + n_{23} + n_{123}$$

and the total number of components is $n = n_1 + n_2 + n_3 + n_{12} + n_{13} + n_{23} + n_{123}$.

Let $T_1$, $T_2$ and $T_3$ be the failure times of systems $S_1$, $S_2$ and $S_3$, respectively, based on components with iid failure times $X_1, X_2, \ldots, X_n$ having a common continuous distribution function with CDF $F(t)$. Let $l_{12}^{[1]}$ and $l_{12}^{[2]}$ be the number of components out of $n_{12}$ that function when, respectively, $S_1$ and $S_2$ are considered. Similarly let $l_{13}^{[1]}$ ($l_{13}^{[3]}$) be the number of components out of $n_{13}$ that function when $S_1$ ($S_3$) is considered, let $l_{23}^{[2]}$ ($l_{23}^{[3]}$) be the number of components out of $n_{23}$ that function when $S_2$ ($S_3$) is considered, and finally let $l_{123}^{[1]}$, $l_{123}^{[2]}$ and $l_{123}^{[3]}$ be the number of components out of $n_{123}$ that function when $S_1$, $S_2$ and $S_3$ are considered, respectively. For ease of notation let $l = (l_1, l_2, l_3, l_{12}^{[1]}, l_{12}^{[2]}, l_{13}^{[1]}, l_{13}^{[3]}, l_{23}^{[2]}, l_{23}^{[3]}, l_{123}^{[1]}, l_{123}^{[2]}, l_{123}^{[3]})$, and we denote the summation over all their possible values as $\sum_l$. The joint survival signature for these three systems, $\Phi(l)$, can be defined as

$$\Phi(l) = P(SF_1, SF_2, SF_3 \mid l) \tag{18}$$

where $SF_i$ represents the event that system $S_i$ functions at the time point it is considered.

Generalizing the approach for two systems, as presented above, the joint survival function for these three systems is derived by

$$P(T_1 > t_1, T_2 > t_2, T_3 > t_3) = \sum_{l} \Phi(l) \, P(C_{t,l}) \tag{19}$$

where $P(C_{t,l})$ denotes the probability of the event that the vector $l$ describes precisely the numbers of components, shared or not shared among the systems, that function at the times $t = (t_1, t_2, t_3)$, where $t_i$ is the time at which $S_i$ is considered. For example, for $t_1 < t_2 < t_3$, which logically implies $l_{12}^{[1]} \geq l_{12}^{[2]}$, $l_{13}^{[1]} \geq l_{13}^{[3]}$, $l_{23}^{[2]} \geq l_{23}^{[3]}$ and $l_{123}^{[1]} \geq l_{123}^{[2]} \geq l_{123}^{[3]}$, this probability is

$$\begin{aligned}
P(C_{t,l}) ={}& P\bigl(C^{1}_{t_1} = l_1,\; C^{2}_{t_2} = l_2,\; C^{3}_{t_3} = l_3,\; C^{12[1]}_{t_1} = l_{12}^{[1]},\; C^{12[2]}_{t_2} = l_{12}^{[2]},\; C^{13[1]}_{t_1} = l_{13}^{[1]},\; C^{13[3]}_{t_3} = l_{13}^{[3]}, \\
& \quad C^{23[2]}_{t_2} = l_{23}^{[2]},\; C^{23[3]}_{t_3} = l_{23}^{[3]},\; C^{123[1]}_{t_1} = l_{123}^{[1]},\; C^{123[2]}_{t_2} = l_{123}^{[2]},\; C^{123[3]}_{t_3} = l_{123}^{[3]}\bigr) \\
={}& \frac{n_1!}{(n_1 - l_1)!\, l_1!} [1 - F(t_1)]^{l_1} [F(t_1)]^{n_1 - l_1} \\
& \times \frac{n_2!}{(n_2 - l_2)!\, l_2!} [1 - F(t_2)]^{l_2} [F(t_2)]^{n_2 - l_2} \\
& \times \frac{n_3!}{(n_3 - l_3)!\, l_3!} [1 - F(t_3)]^{l_3} [F(t_3)]^{n_3 - l_3} \\
& \times \frac{n_{12}!}{(n_{12} - l_{12}^{[1]})!\,(l_{12}^{[1]} - l_{12}^{[2]})!\, l_{12}^{[2]}!} [F(t_1)]^{n_{12} - l_{12}^{[1]}} [F(t_2) - F(t_1)]^{l_{12}^{[1]} - l_{12}^{[2]}} [1 - F(t_2)]^{l_{12}^{[2]}} \\
& \times \frac{n_{13}!}{(n_{13} - l_{13}^{[1]})!\,(l_{13}^{[1]} - l_{13}^{[3]})!\, l_{13}^{[3]}!} [F(t_1)]^{n_{13} - l_{13}^{[1]}} [F(t_3) - F(t_1)]^{l_{13}^{[1]} - l_{13}^{[3]}} [1 - F(t_3)]^{l_{13}^{[3]}} \\
& \times \frac{n_{23}!}{(n_{23} - l_{23}^{[2]})!\,(l_{23}^{[2]} - l_{23}^{[3]})!\, l_{23}^{[3]}!} [F(t_2)]^{n_{23} - l_{23}^{[2]}} [F(t_3) - F(t_2)]^{l_{23}^{[2]} - l_{23}^{[3]}} [1 - F(t_3)]^{l_{23}^{[3]}} \\
& \times \frac{n_{123}!}{(n_{123} - l_{123}^{[1]})!\,(l_{123}^{[1]} - l_{123}^{[2]})!\,(l_{123}^{[2]} - l_{123}^{[3]})!\, l_{123}^{[3]}!} \\
& \times [F(t_1)]^{n_{123} - l_{123}^{[1]}} [F(t_2) - F(t_1)]^{l_{123}^{[1]} - l_{123}^{[2]}} [F(t_3) - F(t_2)]^{l_{123}^{[2]} - l_{123}^{[3]}} [1 - F(t_3)]^{l_{123}^{[3]}}
\end{aligned} \tag{20}$$

Figure 4: Three systems with one type of components, Example 3
Defining this probability similarly for all orderings of $(t_1, t_2, t_3)$ becomes cumbersome with regard to notation, but the idea will be clear. Further development of this methodology is best done in direct relation to a real-world application, in order to take specific aspects of the joint systems into account. Extension to multiple types of components, and to more than three systems, is conceptually trivial following the methodology presented in this paper, but cumbersome with regard to notation. Suitable computational algorithms for exact calculation or approximations also need to be developed; this is left as an important topic for future research. The next example briefly illustrates the computational effort required when applying this method in a naive way to three simple systems with joint components, which particularly serves to emphasize the need for development of suitable computational theory and algorithms.

Example 3.
In addition to the two systems in Example 2, we consider a third system as in Figure 4. To illustrate the complexity of the approach in this paper for more than two systems, we show the effort required to compute the survival signature restricted to considering all systems at the same moment of time. Both systems 1 and 2 have 3 components while system 3 has 4 components, and we have $n_1 = n_2 = 0$, $n_3 = 1$ (E), $n_{12} = 1$ (B), $n_{13} = 1$ (C), $n_{23} = 1$ (D), $n_{123} = 1$ (A). In this example there is at most one shared component in each group, thus $l_1 = l_2 = 0$, $l_3 \in \{0,1\}$, $l_{12}^* = l_{12}^{[1]} = l_{12}^{[2]} \in \{0,1\}$, $l_{13}^* = l_{13}^{[1]} = l_{13}^{[3]} \in \{0,1\}$, $l_{23}^* = l_{23}^{[2]} = l_{23}^{[3]} \in \{0,1\}$, $l_{123}^* = l_{123}^{[1]} = l_{123}^{[2]} = l_{123}^{[3]} \in \{0,1\}$. As $l_1 = l_2 = 0$ there are $2^{10}$ possibilities we need to consider; the first column in Table 3 shows the corresponding number of these possibilities out of $2^{10}$. For example, the first three rows suggest that at least one of these systems fails when either component B or D fails, and this accounts for $3 \times 2^8 = 768$ out of $2^{10} = 1024$ possibilities. Clearly, this already requires many combinations to be considered, and the combinatorics increase enormously when also considering the systems at different moments in time.

Table 3: Three systems with one type of components, Example 3
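The remark that writing (20) for every ordering of $(t_1, t_2, t_3)$ is cumbersome suggests handling the ordering in code. The sketch below is one possible way to do this and is not taken from the paper: for a single group of $n$ iid components, it sorts the inspection times and evaluates the multinomial factor, so that each factor in (20) corresponds to one call. The function name and its argument layout are assumptions made for illustration.

```python
from itertools import product
from math import exp, factorial


def group_probability(n, counts, times, F):
    """Probability that, of n iid components, counts[i] function at times[i].

    counts and times are given in the same (arbitrary) order and are sorted
    by time internally. Counts are assumed to be non-negative integers.
    Returns 0.0 for impossible inputs, i.e. counts that exceed n or that
    increase with time.
    """
    pairs = sorted(zip(times, counts))
    ts = [t for t, _ in pairs]
    ls = [l for _, l in pairs]
    levels = [n] + ls
    if any(a < b for a, b in zip(levels, levels[1:])):
        return 0.0
    # Numbers failing in (0, t_(1)], (t_(1), t_(2)], ..., and the number
    # still functioning after the last inspection time.
    cells = [a - b for a, b in zip(levels, levels[1:])] + [ls[-1]]
    cdf = [0.0] + [F(t) for t in ts] + [1.0]
    prob = float(factorial(n))
    for m, lo, hi in zip(cells, cdf, cdf[1:]):
        prob *= (hi - lo) ** m / factorial(m)
    return prob


F = lambda t: 1 - exp(-t)  # assumed Exponential(1) failure times

# The n_12 factor of (20) with n_12 = 2, two up at t1 = 0.3, one up at t2 = 0.9.
p12 = group_probability(2, [2, 1], [0.3, 0.9], F)

# Sanity check: over all count vectors the probabilities sum to one,
# here for one group of three components and an unsorted time vector.
total = sum(group_probability(3, c, [0.3, 0.9, 0.5], F)
            for c in product(range(4), repeat=3))
print(p12, total)
```

Under these assumptions, $P(C_{t,l})$ for any ordering of the times is the product of seven such calls, one per group ($n_1$, $n_2$, $n_3$, $n_{12}$, $n_{13}$, $n_{23}$, $n_{123}$), each supplied with the times of the systems that contain the group.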
5. Concluding Remarks
In this paper we have introduced the concept of joint survival signature for two systems with shared components. First we considered the case with only one type of component, then we extended this to multiple types of components. We showed how this can be generalised to more than two systems with one or multiple types of shared components. We have also presented how the joint survival signature can be used to derive marginal and conditional survival functions.

It is possible to derive variations to the presented joint survival signature for the case with multiple systems sharing components, and for some specific scenarios reduced versions of the joint survival signature presented here may be sufficient. For example, one could only take into account the total numbers of functioning components of each type per system, and use the theorem of total probability and assumed exchangeability of the failure times of components of each type to relate this to our survival signature and to inferences on the systems' reliability. These suggest interesting directions for future research, which will be particularly useful if motivated by practical applications.

A crucial consideration is how the joint survival signature can be computed. This is mostly left as an important topic for future research; in the examples in this paper only small systems are considered, for which all combinations of functioning components and the corresponding state of the two systems are easily checked. One can use the classical theory of minimal cut and path sets for small systems, but it will be important to develop algorithms to compute the joint survival signature for more complex systems, in particular where there are multiple types of components and possibly more than two systems being considered. A main advantage of the joint survival signature for coherent systems is that it is a non-decreasing function of all its inputs, hence it is straightforward to derive bounds if one only computes the function for a limited number of inputs. The question of which inputs to focus on in order to derive useful bounds with relatively little computation time is also interesting for future research. Throughout, a major advantage of the joint survival signature over the full structure function is that it requires substantially less storage, which can be very important, particularly for large systems with relatively few component types. While computing survival signatures may be cumbersome, and may require approximations to be developed, for example by simulations of the system for certain inputs, the computation is only required once for a system with a fixed structure.

In Section 2, a brief discussion was provided of recent developments based on the single system survival signature for research and application. All these topics are also of great interest based on the joint survival signature, where for example consideration of component importance brings novel aspects to the literature, as it is likely that components shared between multiple systems are more important when all systems are considered than they may be for a single system.
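The bounding idea mentioned above can be sketched as follows, assuming only the stated monotonicity: if the signature has been computed at a few inputs, then any computed input that is componentwise no larger than a query gives a lower bound, and any computed input that is componentwise no smaller gives an upper bound. The function name and the dictionary layout are hypothetical, and the three known values are taken from Table 2 purely as an illustration.

```python
def signature_bounds(known, query):
    """Bounds on a non-decreasing signature at `query` from computed values.

    known maps input tuples to computed signature values; the trivial
    bounds 0 and 1 are used when no computed input is comparable.
    """
    lower, upper = 0.0, 1.0
    for point, value in known.items():
        if all(p <= q for p, q in zip(point, query)):
            lower = max(lower, value)  # dominated input: lower bound
        if all(p >= q for p, q in zip(point, query)):
            upper = min(upper, value)  # dominating input: upper bound
    return lower, upper


# Inputs are (l_1, l_2, l_12^[1], l_12^[2]); values as listed in Table 2.
known = {(0, 1, 1, 1): 0.5, (1, 0, 1, 1): 0.0, (1, 1, 2, 2): 1.0}
bounds = signature_bounds(known, (1, 1, 1, 1))
print(bounds)
```

How tight such bounds are depends entirely on which inputs have been computed, which is the open question raised in the text.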
Acknowledgements