Optimal multilevel protection in series-parallel systems

Gregory Levitin

The Israel Electric Corporation Ltd., Haifa

E-mail:

Abstract

This paper considers vulnerable systems that can have different states corresponding to different combinations of available elements composing the system. Each state can be characterized by a performance rate, which is the quantitative measure of a system's ability to perform its task. Both the impact of external factors (attack) and internal causes (failures) affect system survivability, which is determined as the probability of meeting a given demand.

In order to increase the system’s survivability a multilevel protection is applied to its subsystems. This means that a subsystem and its inner level of protection are in their turn protected by the protection of an outer level. This double-protected subsystem has its outer protection and so forth. In such systems, the protected subsystems can be destroyed only if all of the levels of their protection are destroyed. Each level of protection can be destroyed only if all of the outer levels of protection are destroyed.

In such systems, different protections play different roles in providing for the system's survivability. Subject to budget limitations a question arises which protections should be applied to obtain the desired survivability. An algorithm for solving the protection cost minimization problem subject to survivability constraint is presented in the paper. The algorithm is based on a universal generating function technique used for system survivability evaluation and on a genetic algorithm used as an optimization engine.

Illustrative example is presented.

Keywords: survivability, multilevel protection, optimization, universal generating function, genetic algorithm

Abbreviations

MSSmulti-state system

PGprotection group

GAgenetic algorithm

Nomenclature

Pr{x}probability of event x

smexpected survivability of m-th protection

cmcost of m-th protection

Gjrandom performance rate of MSS element j

gjkperformance rate of MSS element j at state k

pjkprobability that MSS element j is in state k

Kjnumber of different states of element j

G*random output performance rate of the entire MSS

Ssurvivability of the entire MSS

S*minimal allowable survivability of the entire MSS

wminimal allowable level of MSS performance (demand)

Ctottotal system protection cost

Hvector that determines the set of chosen protections

ui(z)u-function representing performance distribution of i-th element

u-function representing performance distribution of m-th subsystem belonging to some PG when protection of this group is destroyed

u-function representing performance distribution of m-th subsystem belonging to some PG when protection of this group is not destroyed

dm(z)double-u-function (d-function). Composition of u-functions Um(z) and

s(dm(z)) operator incorporating protection survivability into d-function representing the performance distributions of m-th PG

(U(z),w)operator over MSS u-function which determines probability Pr{G*w}

composition operator over u-functions for parallel connection of elements

composition operator over u-functions for series connection of elements

1. Introduction

Survivability, the ability of a system to tolerate intentional attacks or accidental failures or errors, is becoming especially important when a system operates in battle conditions or is affected by a corrosive medium or another hostile environment.

A survivable system is one that is able to "complete its mission in a timely manner, even if significant portions are incapacitated by attack or accident" [1]. This definition presumes two important things:

First, both the impact of external factors (attack) and internal causes (failures) affect the system’s survivability. Therefore, as was stated in [2], it is important to take into account the influence of the reliability (availability) of the system elements on the survivability of the entire system.

Second, a system can have different states corresponding to different combinations of failed or damaged elements composing the system. Each state can be characterized by a system performance rate, which is the quantitative measure of a system’s ability to perform its task. For example, in [2] and [3] each system state is characterized by an available ship propulsion power or by an available electric power respectively. Therefore, a system should be considered a multi-state one when its survivability is analyzed.

When applied to multi-state systems, the mission’s success depends on a system’s ability to meet the demand (the required performance level). In this case, the outage effect will be essentially different for units with different nominal performances and will also depend on demand. Therefore, the performance rates (productivity or capacity) of system elements should be taken into account as well as the level of demand when the entire system’s survivability is estimated.

Numerous studies were devoted to estimating the impact of external factors on the system’s survivability based on the common cause failure approach [4-13]. These studies consider systems with identical elements (k-out-of-n formulation). In recent works [14-18] the effect of the external impact on the survivability of different types of multi-state systems was studied.

All of the mentioned works consider independent common cause failures which allows one to estimate the effect of protections on the system’s survivability since the protection of the system elements (or subsystems) influences the probabilities of common cause failures. The protection can be performed by spatial dispersion, by encapsulating different elements into different protective casings, by building defense lines, etc. Each type of protection is characterized by its survivability defined as probability that it survives an external impact.

In real systems, a multilevel protection is often used. The multilevel protection means that a subsystem and its inner level protection are in their turn protected by the protection of the outer level. This double-protected subsystem has its outer protection and so forth. In such systems, the protected subsystems can be destroyed only if all of the levels of their protection are destroyed. Each level of protection can be destroyed only if all of the outer levels of protection are destroyed. This creates the statistical dependence among events of destruction of different protection levels.

In systems consisting of nonidentical elements and having complex multilevel protection, different protections play different roles in providing for the system's survivability. Subject to investment cost limitations, one usually has to find minimal cost configuration of protections that provides desired system survivability. The paper formulates the problem of optimal multilevel protection and presents an algorithm for solving this problem.

In the following section, the system model is presented and the optimization problem is defined. An algorithm for evaluating the system survivability for arbitrary set of protections is presented in section 3. Section 4 describes the optimization technique. In section 5 an illustrative example of the system optimization problem is presented.

2. System model and problem definition

The multi-state system (MSS) consists of n basic elements (the lowest-level parts of the system) composing a series-parallel structure in a reliability logic-diagram sense.

The functioning of each element j is characterized by its random performance Gj. The element can have Kj different states (from total failure up to perfect functioning) with performance rates gjk (1kKj). In a state of total failure, the element performance rate is equal to zero. The performance distribution of each element is given:

Pr{Gj=gjk}=pjk, for 1jn,(1)

where Pr{Gj=gjk} is the probability that the random performance Gj is equal to gjk.

The random performance rate of the entire MSS G* depends on the nature of the elements' interaction in the system and on the distribution of the elements' performance. It is determined by the system structure function:

G*=f(G1,G2,…,Gn). (2)

The structure function can usually be determined recursively.

The system survives if its performance rate is not less than the minimal allowable level w. The MSS survivability S is the probability that the system survives:

S(w)=Pr{G*w}.(3)

Single elements or groups of elements can be protected. All the elements having the same protection compose a protection group (PG). The elements belonging to any PG compose a series-parallel structure. Any protection group can belong to another protection group. All of the elements belonging to a PG are destroyed if the protection of this group is destroyed.

Protection of any PG m cannot be destroyed if this group belongs to another PG and the protection of the outer group is not destroyed. If all of the outer protections are destroyed, the protection of PG m survives with the probability sm.

The performance of any destroyed element is equal to zero.

It is assumed that any part of the system can be attacked with the same probability (the source of the external impact has sufficient resources to attack all elements and it cannot select which elements to attack).

The difference between external impact and internal failures is that an external impact cannot damage an element unless its various levels of protection have been destroyed, whereas internal failures can occur at any time, regardless of whether the protection has been destroyed. The element destruction caused by an external impact and its transitions in the space of states caused by failures and repairs are independent events.

Assume that M different protections can be installed in a given series-parallel system and for each protection m its cost cm and survivability sm are known. Let define a binary vector H in which h(m)=1 (1mM) means that protection m is installed and h(m)=0 means that the protection is not installed. Note that in the considered model the protection decisions are binary (either protect or not) and the parameters of each protection are fixed (this paper does not address continuous levels of protection).

For the given system structure and the given parameters of system elements and protections the system's survivability depends only on the vector H. The total protection cost is

.(4)

The optimal protection problem can be formulated as follows

subject to S(w,H)S*,(5)

where S* is a desired level of system survivability.

3. System survivability evaluation based on the universal generating function method

The procedure used in this paper for the system survivability evaluation is based on the universal generating function (u-function) technique, which was introduced in [19] and which proved to be very effective for the reliability evaluation of different types of multi-state systems [14-18, 21-25]. The u-function extends the widely known ordinary moment generating function (OGF). The essential difference between the ordinary and universal generating functions is that the latter allows one to evaluate probabilistic distributions of overall performance for wide range of systems characterized by different topology, different nature of interaction among system elements and different physical nature of elements' performance measures. This can be done by introducing different composition operators over UGF (the only composition operator used with OGF is the product of polynomials).

3.1. u-functions of individual elements and their compositions

The u-function of a discrete random variable X is defined as a polynomial

(6)

where the variable X has K possible values and qk is the probability that X is equal to xk. To evaluate the probability that the random variable X is not less than the value w the coefficients of polynomial u(z) should be summed for every term with xjw:

(7)

This can be done using the following  operator over u(z):

(8)

where for the individual term :

(9)

In our case, the polynomial uj(z) can define performance distribution of element j, i.e. it represents all of the possible states of the element by relating the probabilities of each state to the performance of the element in that state. Note that the performance distribution of the basic element j defined by the vectors {gjk, 1kKj} and {pjk, 1kKj} can now be represented as

(10)

To obtain the u-function of a subsystem containing two elements, composition operators are introduced. These operators determine the u-function for two elements connected in parallel and in series, respectively, using simple algebraic operations on the individual u-functions of basic elements. All the composition operators take the form

(11)

The obtained u-function relates the probability of each state of a subsystem (equal to the product of the probabilities of states of its independent elements) to the performance rate of the subsystem in this state. The function (.) in composition operators expresses the entire performance rate of the subsystem consisting of two elements connected in parallel or in a series in terms of the individual performance rates of the elements. The definition of the function (.) strictly depends on the physical nature of the system performance measure and on the nature of the interaction of the system elements. In [20] two types of MSS are considered. For the sake of simplicity, we consider here only those MSS in which the performance measure is defined as productivity or capacity (continuous materials or energy transmission systems, manufacturing systems, power supply systems). To apply the suggested method to other types of MSS one has only to choose the corresponding functions (.) [20, 25].

In MSS of a considered type, the total performance rate of a pair of elements connected in parallel is equal to the sum of the performance rates of the individual elements. When the elements are connected in series, the element with the lowest performance rate becomes the bottleneck of the subsystem. Therefore, for a pair of elements connected in series the performance rate of the subsystem is equal to the minimum of the performance rates of the individual elements.

Therefore, the composition operators  anddefined for the parallel and series connections of a pair of elements respectively take the form

(12)

and

. (13)

Note that any subsystem consisting of two elements can be considered as a single equivalent element with a performance distribution equal to the performance distribution of the subsystem (represented by u-function U(z) obtained by the corresponding composition operator over u-functions of the two elements).

3.2. u-functions of PGs and their compositions

Observe that the probability of each state of a protected element (or subsystem) depends on the state of the protection. Therefore, each subsystem belonging to some PG is characterized by two conditional performance distributions: first corresponding to the case when the protection of the PG is destroyed and second corresponding to the case when the protection of the PG survives. In order to represent the performance distributions of a subsystem m belonging to some PG we introduce the following double u-function (d-function) dm(z)=<Um(z),>, where Um(z) and represent performance distributions for the first and second cases respectively.

Note that if the protection of a basic single element is destroyed, this element is destroyed with probability 1 and has performance rate 0. Therefore for a basic single element j that has a performance distribution represented by the u-function uj(z)

dj(z)=<z0, uj(z)>. (14)

It can be easily seen that any pair of elements with d-functions dj(z) and di(z) belonging to the same PG can be replaced by the equivalent element with d-function

dj(z)di(z) = <,, > = < ,> (15)

if they are connected in parallel and can be replaced by the equivalent element with d-function

dj(z)di(z) = <,, > = < , >. (16)

if they are connected in series.

Observe that any PG m consists of a subsystem and its protection. The subsystem can always be replaced by its equivalent element with d-function dm(z)=<Um(z),>. The protection of the PG m has survivability sm. If the protection survives (with probability sm), the subsystem has its performance distribution represented by the u-function . If the protection is destroyed (with the probability 1-sm), the subsystem has its performance distribution represented by the u-function Um(z), Therefore, the performance distribution of the entire PG (subsystem and its protection) can be obtained as

(1-sm)+ sm. (17)

If the PG m with its protection are in their turn protected by an outer level protection (which means that they belong to another PG), in the case when the outer protection survives, the inner protection also survives and the subsystem performance distribution is represented by the u-function .

These considerations allow one to replace an element with d-function dm(z)=<Um(z),> and its protection with survivability sm by the equivalent unprotected element with d-function obtained by applying the following operator smover dm(z):

sm(dj(z)) = sm,> = <(1-sm)+ sm,> (18)

3.3. Algorithm for MSS survivability evaluation

Consecutively applying the operators presented in the previous section and replacing pairs of elements belonging to the same PG and a single element with protection by equivalent unprotected elements one can obtain the d-function representing the performance distribution of the entire system. The following recursive algorithm obtains the system survivability for an arbitrary set of protections determined by the vector H:

  1. Obtain d-functions of all of the system elements using Eq. (10) and (14).
  2. According to the vector H determine all of the system's PGs.
  3. If the system contains a pair of unprotected elements connected in parallel or in a series and belonging to the same PG, replace this pair with an equivalent element with d-function obtained by  or  operator using Eq. (12) and (15) or (13) and (16) respectively.
  4. If the system contains PG consisting of a single element, replace this element and its protection with a single equivalent element with d-function obtained using Eq. (18).
  5. If the system contains more than one element or PG with protection, return to step 2.
  6. Determine the d-function of the entire series-parallel system as the d-function of the remaining single equivalent element. The system performance distribution is represented by the first u-function of this d-function.
  7. Obtain the system survivability for the given demand w by applying the operator (8) over the u-function representing the system performance distribution.

4. Optimization technique

Equation (7) formulates a complicated assignment optimization problem having 2M possible solutions. For systems with large number of possible protections an exhaustive examination of all possible solutions is not realistic, considering reasonable time limitations. As in most optimal assignment problems, the quality of a given solution is the only information available during the search for the optimal solution. Therefore, a heuristic search algorithm is needed which uses only estimates of solution quality and which does not require derivative information to determine the next direction of the search.

The recently developed family of genetic algorithms is based on the simple principle of evolutionary search in solution space. GAs have been proven to be effective optimization tools for a large number of applications. There are several other universal approaches for solving reliability optimization problems (such as dynamic and nonlinear programming, tabu search, simulated annealing etc). The main advantage of the GA is that it can be easily adapted for solving vast range or optimization problems within the same basic framework.