Plot to Test If Data Is Normal

Susan Kolakowski

Design of Experiments – EQAS 770

Homework #1

March 22, 2006

Problem 1

Photoresist is a light-sensitive material applied to semiconductor wafers so that the circuit pattern can be imaged onto the wafer. After application, the coated wafers are baked to remove the solvent in the photoresist mixture and to harden the resist. Here are the measurements of photoresist thickness (in kA) for eight wafers baked at two different temperatures. Assume that the runs were made in random order and that they are independent. (Problem statement copied from assignment)

Temp. / Photoresist Thickness (in kA)
95°C / 11.176 / 7.089 / 8.097 / 11.739 / 11.291 / 10.759 / 6.467 / 8.315
100°C / 5.263 / 6.748 / 7.461 / 7.015 / 8.133 / 7.418 / 3.772 / 8.963

a) Preliminary Analysis

For the preliminary analysis, the descriptive statistics were calculated and three plots were produced: a boxplot, a dotplot and a histogram of the data.

The results of the descriptive statistics calculations were as follows (where N represents the number of samples for each temperature and the mean, standard deviation, minimum, median and maximum are in units of kA):

Temperature / N / Mean / Standard Deviation / Minimum / Median / Maximum
95°C / 8 / 9.367 / 2.100 / 6.467 / 9.537 / 11.739
100°C / 8 / 6.847 / 1.640 / 3.772 / 7.217 / 8.963

Based on these statistics, the photoresist thickness appears to depend on the temperature at which the wafers are baked. At this stage this is only a hypothesis, since it rests on the fact that the sample means at the two temperatures differ. However, the mean for 100°C (6.847 kA) is 2.52 kA below the mean for 95°C (9.367 kA), a gap larger than the standard deviation of either sample, which supports the hypothesis. In addition, the maximum thickness at 100°C (8.963 kA) is less than the mean thickness at 95°C, which also suggests that baking temperature affects photoresist thickness.
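As a cross-check on the summary table, the Python sketch below recomputes the descriptive statistics from the raw thickness data using only the standard library. It is an illustration, not part of the original Minitab analysis; `describe` is a hypothetical helper name.

```python
import statistics

# Photoresist thickness (kA) for the eight wafers at each temperature,
# copied from the problem statement.
thickness = {
    "95C": [11.176, 7.089, 8.097, 11.739, 11.291, 10.759, 6.467, 8.315],
    "100C": [5.263, 6.748, 7.461, 7.015, 8.133, 7.418, 3.772, 8.963],
}

def describe(values):
    """Hypothetical helper: N, mean, sample SD, min, median and max."""
    return {
        "N": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample SD (n - 1 denominator)
        "min": min(values),
        "median": statistics.median(values),
        "max": max(values),
    }

summary = {temp: describe(vals) for temp, vals in thickness.items()}
for temp, row in summary.items():
    print(temp, {k: round(v, 3) for k, v in row.items()})

# Gap between the two means, compared with the larger (95C) standard deviation.
gap = summary["95C"]["mean"] - summary["100C"]["mean"]
print(f"mean gap = {gap:.3f} kA, 95C SD = {summary['95C']['sd']:.3f} kA")
```

The sample standard deviation uses the n − 1 denominator, which is the convention behind the StDev column in the table.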

Here we have a boxplot of the data illustrating the spread of the samples at each temperature. This plot shows that every wafer baked at 100°C has a lower thickness than the median of the wafers baked at 95°C (9.537 kA). This again suggests that baking temperature has a significant effect on photoresist thickness.

The dotplot is another illustration of the data, but instead of displaying summary statistics, it shows where each individual sample falls. In my opinion it is harder to judge the effect of temperature on photoresist thickness from this plot, although it does show that two wafers baked at 100°C were thinner than the minimum thickness achieved at 95°C and that four wafers baked at 95°C exceeded the maximum thickness achieved at 100°C.
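The two counts read off the dotplot can be verified directly from the raw data. This short Python sketch is only an illustration; the list names are hypothetical.

```python
# Thickness data (kA) from the problem statement.
t95 = [11.176, 7.089, 8.097, 11.739, 11.291, 10.759, 6.467, 8.315]
t100 = [5.263, 6.748, 7.461, 7.015, 8.133, 7.418, 3.772, 8.963]

# 100C wafers thinner than the thinnest 95C wafer.
below_95_min = [x for x in t100 if x < min(t95)]
# 95C wafers thicker than the thickest 100C wafer.
above_100_max = [x for x in t95 if x > max(t100)]

print(len(below_95_min), sorted(below_95_min))
print(len(above_100_max), sorted(above_100_max))
```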

This histogram of the two data sets is overlaid with continuous Normal distribution curves defined by the mean and standard deviation of the 8 samples at each temperature. The plot again shows that the mean for the 8 wafers baked at 95°C is greater than the mean for the 8 wafers baked at 100°C, although it also shows a fair amount of overlap between the two distributions.

Based only on the descriptive statistics and the three plots, there appears to be a difference between the thicknesses of photoresist baked at the two temperatures, and it is worthwhile to test formally whether the data provide enough evidence to support this difference.

b) Check all assumptions needed to perform the analysis:

Samples are from Normal distribution.
Variance for each temperature is equal.
Runs were made in random order and are independent.

1. A probability plot was produced in Minitab to test whether the data could be assumed to be Normal:

Since the p-values are greater than α = 0.05 for both temperatures, there is not enough evidence to say that these two data sets are not Normally distributed. Therefore the Normality assumption is reasonable.
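The Normality check can be repeated outside Minitab. This Python sketch assumes SciPy is installed and swaps in the Shapiro-Wilk test (`scipy.stats.shapiro`) for the test behind Minitab's probability plot, so its p-values will not match the plot's values exactly; only the decision rule (p > α means Normality cannot be rejected) is the same.

```python
from scipy import stats

# Thickness data (kA) from the problem statement.
t95 = [11.176, 7.089, 8.097, 11.739, 11.291, 10.759, 6.467, 8.315]
t100 = [5.263, 6.748, 7.461, 7.015, 8.133, 7.418, 3.772, 8.963]

ALPHA = 0.05
pvalues = {}
for label, data in (("95C", t95), ("100C", t100)):
    # Shapiro-Wilk: H0 = the sample comes from a Normal distribution.
    w_stat, p = stats.shapiro(data)
    pvalues[label] = p
    verdict = "cannot reject Normality" if p > ALPHA else "reject Normality"
    print(f"{label}: W = {w_stat:.3f}, p = {p:.3f} -> {verdict}")
```

With only eight points per group, any Normality test has low power, so a non-significant result is weak evidence of Normality rather than proof.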

2. A test to determine whether the variances for each temperature could be assumed to be equal was run in Minitab. This test produced the following plot:

Since the p-values from both tests (the F-test and Levene's test) are greater than α = 0.05, there is not enough evidence to reject the assumption that the variances are equal, so it is reasonable to treat them as equal.
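Both equal-variance tests can be sketched in Python, assuming SciPy is available. The F-test is computed by hand from the ratio of sample variances; Levene's test uses `scipy.stats.levene` with `center="median"` (the Brown-Forsythe variant). Whether this matches the exact variant Minitab reports is an assumption, so small differences in the p-value are possible.

```python
import statistics
from scipy import stats

# Thickness data (kA) from the problem statement.
t95 = [11.176, 7.089, 8.097, 11.739, 11.291, 10.759, 6.467, 8.315]
t100 = [5.263, 6.748, 7.461, 7.015, 8.133, 7.418, 3.772, 8.963]

# Two-sided F-test on the ratio of sample variances (assumes Normal data).
var95 = statistics.variance(t95)
var100 = statistics.variance(t100)
f_ratio = var95 / var100
df1, df2 = len(t95) - 1, len(t100) - 1
f_p = 2 * min(stats.f.cdf(f_ratio, df1, df2), stats.f.sf(f_ratio, df1, df2))

# Levene's test centred on the median (Brown-Forsythe variant),
# which is less sensitive to non-Normality than the F-test.
lev_stat, lev_p = stats.levene(t95, t100, center="median")

print(f"F = {f_ratio:.3f}, p = {f_p:.3f}")
print(f"Levene W = {lev_stat:.3f}, p = {lev_p:.3f}")
```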

3. It was given in the problem statement that runs were made in random order and are independent.

c) A two-sample t-test assuming equal variances was performed to determine whether there was enough evidence to support the claim that the mean thickness of photoresist baked at 95°C differs from that baked at 100°C. The assumptions required for this test were checked in part b of this problem. For this test, an α-value of 0.05 was used.

The results of the test were produced by Minitab as follows:

Two-sample T for Data
Labels  N  Mean  StDev  SE Mean
T=100   8  6.85   1.64     0.58
T=95    8  9.37   2.10     0.74
Difference = mu (T=100) - mu (T=95)
Estimate for difference: -2.52000
95% CI for difference: (-4.54043, -0.49957)
T-Test of difference = 0 (vs not =): T-Value = -2.68  P-Value = 0.018  DF = 14
Both use Pooled StDev = 1.8840

Since the p-value produced by this test is less than α=0.05, there is enough evidence to say that the means are not equal.
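The same pooled t-test can be reproduced with `scipy.stats.ttest_ind` and `equal_var=True`, assuming SciPy is available. The groups are passed in the order 100°C, 95°C so that the sign matches Minitab's "mu (T=100) - mu (T=95)".

```python
from scipy import stats

# Thickness data (kA) from the problem statement.
t95 = [11.176, 7.089, 8.097, 11.739, 11.291, 10.759, 6.467, 8.315]
t100 = [5.263, 6.748, 7.461, 7.015, 8.133, 7.418, 3.772, 8.963]

ALPHA = 0.05
# Pooled (equal-variance) two-sample t-test: difference = mu(100C) - mu(95C).
result = stats.ttest_ind(t100, t95, equal_var=True)

print(f"t = {result.statistic:.2f}, p = {result.pvalue:.3f}")
print("reject H0: means differ" if result.pvalue < ALPHA else "fail to reject H0")
```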

d) The 95% confidence interval for the difference in the means was calculated during the 2-sample t-test performed for part c: (-4.54043, -0.49957)

Since 0 does not fall within this confidence interval, we can conclude at the 95% confidence level that the difference between the population means is not zero. In other words, baking at 100°C appears to reduce the mean thickness by roughly 0.50 to 4.54 kA compared with baking at 95°C.
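The interval comes from the pooled-variance formula: difference ± t(0.975, n1 + n2 − 2) × Sp × √(1/n1 + 1/n2). The Python sketch below, assuming SciPy for the t quantile, recomputes it step by step as an illustration.

```python
import math
import statistics
from scipy import stats

# Thickness data (kA) from the problem statement.
t95 = [11.176, 7.089, 8.097, 11.739, 11.291, 10.759, 6.467, 8.315]
t100 = [5.263, 6.748, 7.461, 7.015, 8.133, 7.418, 3.772, 8.963]

n1, n2 = len(t100), len(t95)
diff = statistics.mean(t100) - statistics.mean(t95)

# Pooled standard deviation (Minitab's "Pooled StDev").
sp = math.sqrt(((n1 - 1) * statistics.variance(t100)
                + (n2 - 1) * statistics.variance(t95)) / (n1 + n2 - 2))
se = sp * math.sqrt(1 / n1 + 1 / n2)      # standard error of the difference
t_crit = stats.t.ppf(0.975, n1 + n2 - 2)  # two-sided 95% critical value

lower, upper = diff - t_crit * se, diff + t_crit * se
print(f"difference = {diff:.5f}, 95% CI = ({lower:.5f}, {upper:.5f})")
print("0 inside CI" if lower <= 0 <= upper else "0 outside CI")
```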

e) The sample size necessary to detect an actual difference in mean thickness of 1.5 kA with a power of 0.9 (a β-risk of 0.1) was determined in Minitab using an assumed process standard deviation of 1.8 kA and an α-value of 0.05.

The results from determining this sample size were:

2-Sample t Test
Testing mean 1 = mean 2 (versus not =)
Calculating power for mean 1 = mean 2 + difference
Alpha = 0.05  Assumed standard deviation = 1.8

Difference  Sample Size  Target Power  Actual Power
1.5         32           0.9           0.906801

The sample size is for each group.

These results tell us that to detect a difference of 1.5 kA between the mean thicknesses at the two temperatures, 32 wafers must be baked at each temperature. This value was determined under the assumption that the process standard deviation is 1.8 kA, with a maximum β-risk of 0.1 and an α-value of 0.05.
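The sample-size search can be sketched with the noncentral t distribution, assuming SciPy is available. The power of a two-sided pooled t-test with n wafers per group is the probability that a noncentral t variable (df = 2n − 2, noncentrality = (δ/σ)·√(n/2)) falls beyond ±t(1 − α/2). That Minitab uses exactly this formula is an assumption, though it reproduces the reported values. `power_two_sample_t` and `sample_size` are hypothetical helper names.

```python
import math
from scipy import stats

def power_two_sample_t(n, diff, sd, alpha=0.05):
    """Power of a two-sided pooled t-test with n observations per group."""
    df = 2 * n - 2
    ncp = (diff / sd) * math.sqrt(n / 2)      # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    # Probability the test statistic lands in either rejection region.
    return stats.nct.sf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp)

def sample_size(diff, sd, target_power, alpha=0.05):
    """Smallest n per group whose power reaches the target."""
    n = 2
    while power_two_sample_t(n, diff, sd, alpha) < target_power:
        n += 1
    return n

n = sample_size(diff=1.5, sd=1.8, target_power=0.9)
power = power_two_sample_t(n, 1.5, 1.8)
print(n, round(power, 6))
```

Stepping n upward one at a time is simple and fast enough here; the search stops at the first n whose power reaches the target, which is why the actual power slightly exceeds 0.9.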
Problem 2 - ANOVA

The normality test p-values are much greater than α = 0.05, so there is not enough evidence to reject the hypothesis that the four data sets all come from Normal distributions.

Descriptive Statistics: Data
Variable  Labels  N  Mean    StDev  Minimum  Median  Maximum
Data      MT1     4  2971.0  120.6  2865.0   2945.0  3129.0
          MT2     4  3156.3  136.0  2975.0   3175.0  3300.0
          MT3     4  2933.8  108.3  2800.0   2942.5  3050.0
          MT4     4  2666.3   81.0  2600.0   2650.0  2765.0

ANOVA check using Minitab

One-way ANOVA: Data versus Labels
Source  DF      SS      MS      F      P
Labels   3  489740  163247  12.73  0.000
Error   12  153908   12826
Total   15  643648
S = 113.3  R-Sq = 76.09%  R-Sq(adj) = 70.11%

Individual 95% CIs For Mean Based on Pooled StDev
Level  N  Mean    StDev
MT1    4  2971.0  120.6
MT2    4  3156.3  136.0
MT3    4  2933.8  108.3
MT4    4  2666.3   81.0
[Text chart of the individual 95% CIs for each mean, axis ticks at 2600, 2800, 3000, 3200]
Pooled StDev = 113.3
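Since only summary statistics are listed above, the one-way ANOVA can be cross-checked from N, mean and StDev alone: SS(Labels) = Σ nᵢ(ȳᵢ − ȳ)² and SS(Error) = Σ (nᵢ − 1)sᵢ². This Python sketch assumes SciPy is available; `anova_from_summary` is a hypothetical helper. Because the means and standard deviations are rounded, the sums of squares differ slightly from Minitab's.

```python
from scipy import stats

# (N, mean, SD) for each technique, from the descriptive statistics table.
groups = {
    "MT1": (4, 2971.0, 120.6),
    "MT2": (4, 3156.3, 136.0),
    "MT3": (4, 2933.8, 108.3),
    "MT4": (4, 2666.3, 81.0),
}

def anova_from_summary(groups):
    """One-way ANOVA computed from per-group (N, mean, SD) only."""
    total_n = sum(n for n, _, _ in groups.values())
    grand_mean = sum(n * m for n, m, _ in groups.values()) / total_n
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups.values())
    ss_within = sum((n - 1) * s ** 2 for n, _, s in groups.values())
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    p_value = stats.f.sf(f_stat, df_between, df_within)  # upper-tail F probability
    return ss_between, ss_within, f_stat, p_value

ss_b, ss_w, f_stat, p_value = anova_from_summary(groups)
print(f"SS(labels) = {ss_b:.0f}, SS(error) = {ss_w:.0f}, F = {f_stat:.2f}, p = {p_value:.4f}")
```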

Problem 3

Two-Sample T-Test and CI: MT1, MT3
Two-sample T for MT1 vs MT3
     N  Mean  StDev  SE Mean
MT1  4  2971    121       60
MT3  4  2934    108       54
Difference = mu (MT1) - mu (MT3)
Estimate for difference: 37.2500
95% CI for difference: (-160.9986, 235.4986)
T-Test of difference = 0 (vs not =): T-Value = 0.46  P-Value = 0.662  DF = 6
Both use Pooled StDev = 114.5795

Under the assumption that the variances for MT1 and MT3 are equal:

c) Since the p-value (0.662) is large and the confidence interval contains 0, there is not enough evidence to say that the means for these two techniques are not equal.
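The pooled t-test can be reproduced from the summary statistics alone with `scipy.stats.ttest_ind_from_stats`, assuming SciPy is available. The inputs are the rounded means and standard deviations from the Problem 2 table, so the t-value differs from Minitab's only in the third decimal place.

```python
from scipy import stats

# (mean, SD, N) for MT1 and MT3, taken from the Problem 2 descriptive statistics.
res = stats.ttest_ind_from_stats(
    mean1=2971.0, std1=120.6, nobs1=4,
    mean2=2933.8, std2=108.3, nobs2=4,
    equal_var=True,  # pooled test, matching "Both use Pooled StDev"
)
print(f"t = {res.statistic:.2f}, p = {res.pvalue:.3f}")
```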

Problem 4

The normality test p-values indicate that all four data sets can be assumed to be Normal.

Descriptive Statistics: Data
Variable  Labels       N   Mean   StDev  Minimum  Median  Maximum
Data      Compact      10  3.900  2.283  1.000    3.500   7.000
          Full Size    10  5.300  2.452  2.000    5.000  10.000
          Midsize      10  3.600  2.221  1.000    3.500   7.000
          Sub-Compact  10  4.100  1.969  1.000    4.000   7.000

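The full-size car type may have an effect, since its mean is the highest.

However, one outlier in the full-size group (the maximum of 10) increases the full-size mean and makes the difference appear significant. At the same time, the full-size group had no counts of 1, while the other three groups had a total of five counts of 1.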

One-way ANOVA: Data versus Labels
Source  DF  SS      MS    F     P
Labels   3   16.68  5.56  1.11  0.358
Error   36  180.30  5.01
Total   39  196.98
S = 2.238  R-Sq = 8.47%  R-Sq(adj) = 0.84%

Individual 95% CIs For Mean Based on Pooled StDev
Level        N   Mean   StDev
Compact      10  3.900  2.283
Full Size    10  5.300  2.452
Midsize      10  3.600  2.221
Sub-Compact  10  4.100  1.969
[Text chart of the individual 95% CIs for each mean, axis ticks at 2.4, 3.6, 4.8, 6.0]
Pooled StDev = 2.238

Since the p-value (0.358) is greater than α = 0.1, there is not enough evidence to say that the means are not equal. Therefore there is not enough evidence to state that the type of car affects the rental contract.
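The same decision can be stated with a critical value: reject equal means only if F exceeds F(0.9; 3, 36). This Python sketch, assuming SciPy is available, takes the mean squares and degrees of freedom from the ANOVA table above.

```python
from scipy import stats

# Mean squares and degrees of freedom from the one-way ANOVA table above.
ms_labels, ms_error = 5.56, 5.01
df_labels, df_error = 3, 36
ALPHA = 0.1

f_stat = ms_labels / ms_error
f_crit = stats.f.ppf(1 - ALPHA, df_labels, df_error)  # upper 10% point of F(3, 36)
p_value = stats.f.sf(f_stat, df_labels, df_error)

print(f"F = {f_stat:.2f}, F_crit = {f_crit:.2f}, p = {p_value:.3f}")
print("reject H0" if f_stat > f_crit else "fail to reject H0")
```

Comparing F with F_crit and comparing p with α always give the same decision; the p-value simply shows how far from the cutoff the result is.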

Last plot: the residuals appear random (good).

First plot: the residuals follow the fitted line well and appear Normal (good).

Residuals versus fits: no pattern (good).

Normality test of the residuals: the p-value is relatively low, and the points appear to move in a pattern around the fitted line (a concern).

However, the p-value is greater than α, so there is not enough evidence to say that the residuals are not Normally distributed.

Next, test full-size versus all other car types. Because the sample sizes are not equal (10 full-size versus 30 others), rely on the plots for this comparison.