A good example of a linear combination of variables

This paper discusses a good example of so-called linear combinations of variables. Such combinations appear throughout engineering wherever components are joined or combined; the components need not be physical but can also be e.g. time components. The concept of linear combinations is covered in 'A course in statistics', chapter 4.2. 'Exercises with computer support', examples 2.1 – 2.10, also treat this subject.

The good and illustrative example below was discovered by Johan Bini, Sony Ericsson, Lund. He also drew the sketch explaining the main features.

The following steps are included:

  1. A good example
  2. Some statistical models
  3. Expressions for the 'mean' and 'sigma' of the linear combination
  4. A simulation of the expressions
  5. Summary

1. A good example

The sketch below shows a part of an assembly of a rather old telephone.

The problem was that if there was no overlap at measure Y, it was possible to see the interior of the telephone; this happened sometimes. With some study of the sketch it is possible to calculate this overlap from the other measurements (NB: some of the measurements in the sketch are not used at all in this discussion). The following expression is our model:

$Y = A + B + C - E - G$ (1)

If Y is positive we can look into the interior of the assembly and if Y is negative there is an overlap making it impossible to see the interior.

We will now consider each letter as a variable with a certain variation; our main problem is how this variation is transferred to the Y-measure.

2. Some statistical models

The expression for Y in the sketch above seems to be rather simple as it is a mechanical assembly. Our concern is the variation of Y or, perhaps more to the point, the probability of getting a positive value of Y, i.e. an opening at the edge of the assembly.

However, perhaps the engineers say that the measure B, representing the green component, will be compressed to 0.95 of its value before assembly. Then we would state the following model:

$Y = A + 0.95 \cdot B + C - E - G$ (2)

This does not add much difficulty to the analysis of the variation below, since a constant factor only scales the mean and the sigma of B by that factor (this will not be shown in this document).
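The effect of a constant factor such as 0.95 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `linear_combination_moments` is hypothetical, the variables are assumed independent, model (1) is taken to be Y = A + B + C - E - G as in the simulation code of this document, and the means and sigmas are the invented values from the simulation section.

```python
import math

def linear_combination_moments(coeffs, means, sigmas):
    """Mean and sigma of Y = sum(c_i * X_i) for independent variables X_i."""
    mean_y = sum(c * m for c, m in zip(coeffs, means))
    var_y = sum((c * s) ** 2 for c, s in zip(coeffs, sigmas))
    return mean_y, math.sqrt(var_y)

# Order of variables: A, B, C, E, G. All values are invented.
means = [7, 6, 5, 9, 5]
sigmas = [0.03] * 5

model_1 = [1, 1, 1, -1, -1]      # Y = A + B + C - E - G
model_2 = [1, 0.95, 1, -1, -1]   # Y = A + 0.95*B + C - E - G

mean_1, sigma_1 = linear_combination_moments(model_1, means, sigmas)
mean_2, sigma_2 = linear_combination_moments(model_2, means, sigmas)
print(f"model (1): mean = {mean_1:.4f}, sigma = {sigma_1:.4f}")
print(f"model (2): mean = {mean_2:.4f}, sigma = {sigma_2:.4f}")
```

With these numbers the factor 0.95 lowers the mean of Y by 0.05 times the mean of B and reduces sigma of Y only slightly, since B is just one of five equally variable terms.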

Perhaps the engineers insist that the compression (K) is a random variable of some kind. If so we need to write a more complicated model (NB: model (3) will not be further discussed here).

$Y = A + K \cdot B + C - E - G$ (3)

3. Expressions for the 'mean' and 'sigma' of the linear combination

Suppose that we believe in model (1) above. Below are then the two expressions for the mean and standard deviation of the combination Y (See e.g. chapter 4.2 in 'A course in statistics'):

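The mean: $\mu_Y = \mu_A + \mu_B + \mu_C - \mu_E - \mu_G$

The sigma: $\sigma_Y = \sqrt{\sigma_A^2 + \sigma_B^2 + \sigma_C^2 + \sigma_E^2 + \sigma_G^2}$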

We of course need good numerical estimates of the different entities in the expressions. We also assume that all variables are independent of one another, an assumption that is reconsidered below.

So far we have not discussed the distribution of the different variables, although we suspect that the normal distribution is useful at least as an approximation. As Y is a combination of several variables, we can assume that Y itself is approximately normally distributed (see e.g. %CLT).
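Under the normal approximation, the probability of an opening is P(Y > 0), which follows directly from the mean and sigma of Y. The Python sketch below illustrates the calculation; the function name `prob_opening` is hypothetical, sigma of Y assumes five independent measures with sigma 0.03 each (the invented values of the simulation section), and the nominal overlap of 0.1 is an assumption chosen only to give a probability that is neither 0 nor 1.

```python
import math
from statistics import NormalDist

def prob_opening(mu_y, sigma_y):
    """P(Y > 0) when Y is approximately normal with the given mean and sigma."""
    return 1.0 - NormalDist(mu_y, sigma_y).cdf(0.0)

# Sigma of Y for five independent measures, each with sigma 0.03 (invented).
sigma_y = math.sqrt(5 * 0.03 ** 2)

# Hypothetical nominal overlap of 0.1, i.e. a mean of Y equal to -0.1.
p_open = prob_opening(-0.1, sigma_y)
print(f"sigma of Y = {sigma_y:.4f}, P(Y > 0) = {p_open:.3f}")
```

The same function can be evaluated with estimated values of the mean and sigma once real measurements are available.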

A complication. Looking at the sketch we can see that the assumption of independence might not be fulfilled in all cases; see e.g. the measures C and G, and A and E, respectively. There might be reason to believe that when e.g. the measurement C is high, G is also high, because both are measures on the same part. This means that there is a correlation between C and G, and likewise between A and E. We still believe that the other measures are uncorrelated.

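This changes the formula for sigma (not for the mean): two extra terms appear, one for each correlated pair (see also 'Exercises with computer support' exercise 2.10). Each extra term is twice the product of the two coefficients in the model, the correlation coefficient and the two sigmas. As A and E enter Y with opposite signs, as do C and G, both extra terms get a minus sign:

$\sigma_Y = \sqrt{\sigma_A^2 + \sigma_B^2 + \sigma_C^2 + \sigma_E^2 + \sigma_G^2 - 2\rho_{AE}\sigma_A\sigma_E - 2\rho_{CG}\sigma_C\sigma_G}$

If the correlation between C and G is positive, the coefficient $\rho_{CG}$ is positive. Because C and G enter Y with opposite signs, this means that sigma of Y is decreased (if $\rho_{CG}$ had been negative, or if C and G had entered Y with the same sign, the variation of Y would have increased).

We can simplify the expression above if we may assume (this needs to be checked with data!) that sigma of C and G are equal, and likewise for A and E. If we can also assume that $\rho$ equals 1 in both cases, we get the following (changing index for A and E to '1' and for C and G to '2'):

$\sigma_Y = \sqrt{2\sigma_1^2 + \sigma_B^2 + 2\sigma_2^2 - 2\sigma_1^2 - 2\sigma_2^2} = \sigma_B$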
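The role of the signs in the correlation terms can be checked numerically. The Python sketch below uses the general rule that a correlated pair contributes 2 · c_i · c_j · rho · sigma_i · sigma_j to the variance of Y, where c_i and c_j are the coefficients (+1 or -1) of the pair in the model. The function name `sigma_of_combination` is hypothetical, the model is assumed to be Y = A + B + C - E - G as in the simulation code of this document, and all sigmas are invented.

```python
import math

def sigma_of_combination(coeffs, sigmas, rho_pairs):
    """Sigma of Y = sum(c_i * X_i) with pairwise correlations.

    rho_pairs maps an index pair (i, j) to the correlation coefficient of
    X_i and X_j; pairs that are not listed are treated as uncorrelated.
    """
    variance = sum((c * s) ** 2 for c, s in zip(coeffs, sigmas))
    for (i, j), rho in rho_pairs.items():
        variance += 2 * coeffs[i] * coeffs[j] * rho * sigmas[i] * sigmas[j]
    return math.sqrt(max(variance, 0.0))  # guard against tiny negative rounding

# Order of variables: A, B, C, E, G (indices 0..4). Invented sigmas.
coeffs = [1, 1, 1, -1, -1]
sigmas = [0.03] * 5

sigma_indep = sigma_of_combination(coeffs, sigmas, {})
sigma_half = sigma_of_combination(coeffs, sigmas, {(0, 3): 0.5, (2, 4): 0.5})
sigma_full = sigma_of_combination(coeffs, sigmas, {(0, 3): 1.0, (2, 4): 1.0})
# Same correlations, but with all five measures entering with a plus sign.
sigma_same_sign = sigma_of_combination([1] * 5, sigmas, {(0, 3): 1.0, (2, 4): 1.0})

print(f"independent:          {sigma_indep:.4f}")
print(f"rho = 0.5:            {sigma_half:.4f}")
print(f"rho = 1:              {sigma_full:.4f}")
print(f"rho = 1, same signs:  {sigma_same_sign:.4f}")
```

The comparison suggests that the direction of the effect depends on how the correlated pair enters the model: opposite signs reduce sigma of Y, equal signs increase it.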

The next step is to simulate the results to illustrate the different expressions.

4. A simulation of the expressions

Below we simulate the situation above, first with independent variables and then with the correlated variables that require the more complicated expression for sigma of Y. All parameter values are of course invented here, as we do not have access to the drawing or any measurements.

Independent variables. In this first simulation we index the data columns by a '1':

name k1 'MuA' k2 'MuB' k3 'MuC' k4 'MuE' k5 'MuG'

name k6 'SigmaA' k7 'SigmaB' k8 'SigmaC' k9 'SigmaE' k10 'SigmaG'

name k11 'n'

name c1 'DataA1' c2 'DataB1' c3 'DataC1' c4 'DataE1' c5 'DataG1' c7 'DataY1'

let n = 10000 # Number of assemblies.

let MuA = 7 # Mean of measurement A.

let MuB = 6 # Mean of measurement B.

let MuC = 5 # Mean of measurement C.

let MuE = 9 # Mean of measurement E.

let MuG = 5 # Mean of measurement G.

let SigmaA = 0.03 # Sigma of measurement A (same as E).

let SigmaB = 0.03 # Sigma of measurement B.

let SigmaC = 0.03 # Sigma of measurement C (same as G).

let SigmaE = 0.03 # Sigma of measurement E (same as A).

let SigmaG = 0.03 # Sigma of measurement G (same as C).

random n DataA1; # Data for measurement A.

normal MuA SigmaA.

random n DataB1; # Data for measurement B.

normal MuB SigmaB.

random n DataC1; # Data for measurement C.

normal MuC SigmaC.

random n DataE1; # Data for measurement E.

normal MuE SigmaE.

random n DataG1; # Data for measurement G.

normal MuG SigmaG.

let DataY1 = DataA1 + DataB1 + DataC1 - DataE1 - DataG1

desc DataY1 # Describes the results.

Histogram 'DataY1'; # Creates a histogram

Bar; # with corresponding normal

Distribution; # distribution.

Size 2; # Giving a wider line.

Normal.
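For readers without access to the statistics package used above, the independent case can be mirrored in plain Python. This is a minimal sketch and not a translation of the macro: it uses the same invented means and sigmas, a fixed seed chosen only for repeatability, and it compares the simulated standard deviation with the theoretical value for five independent measures.

```python
import math
import random
import statistics

random.seed(1)   # fixed seed, only to make the run repeatable
n = 10_000       # number of assemblies

mu = {"A": 7, "B": 6, "C": 5, "E": 9, "G": 5}   # invented means
sigma = 0.03                                     # invented sigma, same for all

data_y1 = []
for _ in range(n):
    # Five independent normal measurements for one assembly.
    a, b, c, e, g = (random.gauss(mu[k], sigma) for k in "ABCEG")
    data_y1.append(a + b + c - e - g)

mean_y1 = statistics.fmean(data_y1)
sd_y1 = statistics.stdev(data_y1)
sd_theory = math.sqrt(5) * sigma

print(f"mean of Y: {mean_y1:.4f}")
print(f"sigma of Y: simulated {sd_y1:.4f}, theoretical {sd_theory:.4f}")
```

The simulated sigma should land close to the theoretical value, with the small difference being ordinary sampling variation.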

Correlated variables. In this second simulation we index the data columns by a '2'. To simulate the correlated situation we generate one random disturbance that we add to both the A and E measurements. In this way A and E are large or small simultaneously and thus correlated. We apply the same idea to the correlated measures C and G (the disturbance has mean 0):

name k1 'MuA' k2 'MuB' k3 'MuC' k4 'MuE' k5 'MuG'

name k6 'SigmaAE' k7 'SigmaB' k8 'SigmaCG'

name k11 'n'

name c11 'DataA2' c12 'DataB2' c13 'DataC2' c14 'DataE2' c15 'DataG2' c17 'DataY2'

let n = 10000 # Number of assemblies.

let MuA = 7 # Mean of measurement A.

let MuB = 6 # Mean of measurement B.

let MuC = 5 # Mean of measurement C.

let MuE = 9 # Mean of measurement E.

let MuG = 5 # Mean of measurement G.

let SigmaAE = 0.03 # Sigma of measurement A (same as E).

let SigmaB = 0.03 # Sigma of measurement B.

let SigmaCG = 0.03 # Sigma of measurement C (same as G).

random n c100; # Disturbance for A and E.

normal 0 SigmaAE.

let DataA2 = MuA + c100 # Data for measurement A.

let DataE2 = MuE + c100 # Data for measurement E.

random n DataB2; # Data for measurement B.

normal MuB SigmaB.

random n c100; # Disturbance for C and G.

normal 0 SigmaCG.

let DataC2 = MuC + c100 # Data for measurement C.

let DataG2 = MuG + c100 # Data for measurement G.

let DataY2 = DataA2 + DataB2 + DataC2 - DataE2 - DataG2

desc DataY2 # Describes the results.

Histogram 'DataY2'; # Creates a histogram

Bar; # with corresponding normal

Distribution; # distribution.

Size 2; # Giving a wider line.

Normal.
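The shared-disturbance construction can also be mirrored in plain Python. The sketch below follows the same idea as the commands above (one common disturbance for A and E, another for C and G) with the same invented parameter values; the seed is arbitrary. Because A and E enter Y with opposite signs in the expression used here, their common disturbance cancels in Y, and likewise for C and G, so only the variation of B should remain.

```python
import random
import statistics

random.seed(2)   # fixed seed, only to make the run repeatable
n = 10_000       # number of assemblies

data_y2 = []
for _ in range(n):
    d_ae = random.gauss(0, 0.03)   # disturbance shared by A and E
    d_cg = random.gauss(0, 0.03)   # disturbance shared by C and G
    a, e = 7 + d_ae, 9 + d_ae      # perfectly correlated pair
    c, g = 5 + d_cg, 5 + d_cg      # perfectly correlated pair
    b = random.gauss(6, 0.03)      # independent measurement B
    data_y2.append(a + b + c - e - g)

mean_y2 = statistics.fmean(data_y2)
sd_y2 = statistics.stdev(data_y2)
print(f"mean of Y: {mean_y2:.4f}, sigma of Y: {sd_y2:.4f}")
```

Under these assumptions the simulated sigma of Y should be close to sigma of B (0.03) rather than to the value for five independent measures.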

The result variable 'DataY2' is, as expected, also normally distributed. As we do not have the original drawing and tolerances for the parts, we cannot make a totally realistic simulation of the situation.

A comparison between the two simulations can be made with two overlaid histograms and two probability plots. With the commands above, the data 'DataY2' has a smaller variation of the measure Y than 'DataY1': A and E enter Y with opposite signs, so their shared disturbance cancels, and the same holds for C and G:

Histogram 'DataY1' 'DataY2'; # Creates two overlaid histograms

Bar; # with corresponding normal

Distribution; # distribution.

Size 2; # Giving a wider line.

Normal;

Overlay.

PPlot 'DataY1' 'DataY2'; # Creating a probability plot.

Normal;

Symbol;

FitD;

NoCI;

Grid 2;

Grid 1;

MGrid 1;

Overlay.

NB: the diagrams only show 500 data points in order to keep the size of the document as small as possible.

5. Summary

This example shows a rather realistic application of the methods and thinking of linear combinations of variables (non-linear combinations are slightly more difficult to handle, see e.g. the button [Articles] and the document 'Combination of variables.doc'). Combination of variables is often referred to as 'transmission of errors', i.e. the study of how variation is transferred from the components to the result.

In this example we did not have the true drawing and thus not the proper measurements but this fact does not have any important impact on the reasoning.

In the first simulation above we assumed that there is no correlation at all between the measurements, but after taking another look at the sketch we changed this assumption, stating that there most likely is a correlation between the measures A and E (and likewise between the measures C and G).

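The result was that the variation of the measure Y changed substantially, see e.g. the histogram and the probability plot above. In the simulation the correlated measures enter Y with opposite signs, so their common variation cancels and sigma of Y decreases; had they entered with the same sign, sigma of Y would instead have increased.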

We also assumed that the correlation between A and E was perfect (i.e. the coefficient was 1) but this needs of course to be verified by the available data.
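One way to check such an assumption is to estimate the correlation coefficient from paired measurements of A and E. The Python sketch below shows the calculation only; no real measurements are available here, so the data are generated with a shared disturbance plus a small individual disturbance (sigmas 0.03 and 0.01, both hypothetical), which gives a true correlation of 0.9 rather than 1. The function name `pearson_r` is also an assumption of this sketch.

```python
import math
import random

def pearson_r(x, y):
    """Sample (Pearson) correlation coefficient of two equally long lists."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

random.seed(3)   # fixed seed, only to make the run repeatable
n = 200          # hypothetical number of measured parts

# Hypothetical data: shared disturbance plus individual disturbance.
shared = [random.gauss(0, 0.03) for _ in range(n)]
data_a = [7 + s + random.gauss(0, 0.01) for s in shared]
data_e = [9 + s + random.gauss(0, 0.01) for s in shared]

r_ae = pearson_r(data_a, data_e)
print(f"estimated correlation between A and E: {r_ae:.3f}")
```

An estimate clearly below 1 would indicate that the simplified expression based on perfect correlation is too optimistic and that the general expression with the estimated coefficient should be used instead.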

A good example of a linear combination • rev A

2008-02-06 • 1(5)