In- and Output of Data


D:\SG\aquila\qm\lehre\tacs-ue\ilmt\stat\STATG.DOC, Version 18.05.2004, page 1 of 10

The Use of Statgraphics 5.0

IN AND OUTPUT

1. Import of Data Files

2. Input of Data

3. Modification and Output of Results

MULTIPLE COMPARISONS

1. Import of Data

2. Visualization of Data

3. Test for Outliers

4. Test for Homogeneity of Variances

5. Test for Normal Distribution

6. ANOVA - Is there any difference between the samples?

7. Multiple Range Tests - Which samples are different?

VARIANCE COMPONENTS (Nested Designs): error of sampling, analysis

1. Import of Data

2. Visualization of Data

3. Test for Outliers

4. Test for Homogeneity of Variances

5. Test for Normal Distribution

6. ESTIMATE VARIANCE COMPONENTS

EXPERIMENTAL DESIGNS

1. Create Experiment

2. Run Experiments

3. Enter Data

4. Analyze Data

Files:

  • In and Output: IO.XLS, REGR.TXT
  • Anova: HVA.CSV
  • Experimental Designs: TAU.XLS, TAU1.SFX

IN AND OUTPUT

1. Import of Data Files

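Configuration: Windows Control Panel (Systemsteuerung) \ Regional Settings (Ländereinstellung) \ Numbers (Zahl):

Decimal symbol (Dezimalzeichen): .
Digit grouping symbol (Symbol f. Zifferngruppierung): blank
List separator (Listentrennzeichen): ;

File /Open Data File /Files of type (Dateityp): All Files (*.*)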

  • Excel: IO.XLS / Variable Names from first row
  • CSV: HVA.CSV, comma delimited / Variable Names from first row
  • Text file: REGR.TXT, tab delimited / Variable Names from first row
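Outside Statgraphics, a delimited file with variable names in the first row can be read with Python's standard `csv` module. A minimal sketch, assuming the decimal point "." configured above; `read_table` is a hypothetical helper, the inline text is only the first rows of HVA.CSV, and the header of the tab-delimited example is assumed.

```python
import csv
import io


def read_table(text, delimiter=","):
    """Parse delimited text whose first row holds the variable names.

    Returns a dict mapping each variable name to a list of floats.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            columns[name].append(float(row[name]))  # expects '.' as decimal symbol
    return columns


# First rows of HVA.CSV (comma delimited)
sample = "PROBE,NR,TS\n1,1,90.91\n1,7,90.60\n1,11,90.40\n"
data = read_table(sample)
print(data["TS"])

# A tab-delimited file in the style of REGR.TXT (header assumed): delimiter="\t"
regr = read_table("x\ty\n1\t11.5\n2\t12.4\n", delimiter="\t")
print(regr["y"])
```

For a file on disk, pass the file contents, e.g. `read_table(open("HVA.CSV", newline="").read())`.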

2. Input of Data

  • Mark column /right mouse button /Modify Column

x   y
1   11.5
2   12.4
3   13
4   16
5   17

  • File /Save Data File as: Regr.sf

3. Modification and Output of Results

ANALYSIS

  • File /Open Data File /regr.sf
  • Relate /Simple Regression
  • Tabular Options: Analysis Summary
  • Graphical Options: Plot fitted model, Residuals vs. x

Modification

  • double-click the window with the left mouse button
  • right-click the element
  • Options

OUTPUT

a) to Statreporter: click window with right mouse button /Copy to Statreporter

b) from Statreporter to Winword: copy and paste

Text window

  • double-click the window with the left mouse button / mark text / Icon Cut -> Winword: insert as text
  • (double-click the window with the left mouse button / Icon Copy -> Winword: insert as object)

Save graphic to file

  • Save Graph as regr.wmf

Without colours: Graphics\Options\Profile: Black and White
File\Page Setup: Black and White

Regression Analysis - Linear model: Y = a + b*X

Dependent variable: Y
Independent variable: X

                               Standard          T
Parameter       Estimate          Error     Statistic    P-Value
----------------------------------------------------------------
Intercept            9.6        0.73964       12.9793     0.0010
Slope               1.46        0.22301        6.5468     0.0072
----------------------------------------------------------------

Analysis of Variance

Source          Sum of Squares   Df   Mean Square   F-Ratio   P-Value
----------------------------------------------------------------------
Model                   21.316    1        21.316     42.86    0.0072
Residual                 1.492    3      0.497333
----------------------------------------------------------------------
Total (Corr.)           22.808    4

Correlation Coefficient = 0.966739
R-squared = 93.4584 percent
Standard Error of Est. = 0.705219

The StatAdvisor

The output shows the results of fitting a linear model to describe the relationship between Y and X. The equation of the fitted model is:

Y = 9.6 + 1.46*X

Since the P-value in the ANOVA table is less than 0.01, there is a statistically significant relationship between Y and X at the 99% confidence level.

The R-Squared statistic indicates that the model as fitted explains 93.4584% of the variability in Y.

The correlation coefficient equals 0.966739, indicating a relatively strong relationship between the variables.

The standard error of the estimate shows the standard deviation of the residuals to be 0.705219. This value can be used to construct prediction limits for new observations by selecting the Forecasts option from the text menu.
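The figures in this output follow from the ordinary least-squares formulas. A minimal sketch in Python using only the standard library; the function name `simple_regression` is our own, not part of Statgraphics, and P-values are left out because they need the t and F distributions.

```python
import math


def simple_regression(x, y):
    """Least-squares fit of Y = a + b*X with the usual summary statistics."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    b = sxy / sxx                  # slope
    a = my - b * mx                # intercept
    ss_model = b * sxy             # regression (model) sum of squares
    ss_resid = syy - ss_model      # residual sum of squares
    ms_resid = ss_resid / (n - 2)
    s = math.sqrt(ms_resid)        # standard error of estimate
    return {
        "a": a,
        "b": b,
        "se_a": s * math.sqrt(1 / n + mx ** 2 / sxx),
        "se_b": s / math.sqrt(sxx),
        "F": ss_model / ms_resid,
        "r": sxy / math.sqrt(sxx * syy),
        "r2": ss_model / syy,
        "s": s,
    }


fit = simple_regression([1, 2, 3, 4, 5], [11.5, 12.4, 13, 16, 17])
print(f"Y = {fit['a']:.2f} + {fit['b']:.2f}*X, R-squared = {100 * fit['r2']:.4f} percent")
```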

MULTIPLE COMPARISONS

Problem: Which of the 5 products differ in moisture content?

Each product is analysed 5 times. The product averages are compared using multiple range tests.


1. Import of Data

HVA.CSV (comma delimited):

!The variables PROBE and NR must be sorted!

PROBE  NR  TS
1      1   90.91
1      7   90.60
1      11  90.40
1      13  90.52
1      15  90.77
2      2   90.79
2      5   90.36
2      8   90.32
2      18  90.59
2      21  90.51
3      4   90.27
3      12  90.37
3      17  90.38
3      20  90.31
3      24  90.49
4      6   90.57
4      9   90.82
4      14  90.63
4      19  90.98
4      23  90.44
5      3   90.24
5      10  90.35
5      16  90.15
5      22  90.08
5      25  90.46

2. Visualization of Data

a) Test for Trend: Data versus sequence of measurements

Icon Scatterplot: NR->X TS->Y PROBE->Select

Pane Options: PROBE->Point Codes, Points+Lines

b) Visual test for outliers and distribution

\Compare \Analysis of Variance \One-Way ANOVA

PROBE->Factor TS->Dependent Variable

  • Scatterplot

Graphic Options: Scatterplot

  • Box-and-Whisker Plot

Graphic Options: Box-and-Whisker-Plot

Pane Options: vertical


3. Test for Outliers

Grubbs: test value $PW = |x_i - \bar{x}| / s < T(n; \alpha)$, with n = number of replications
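A minimal Python sketch of the Grubbs check applied to each PROBE group of HVA.CSV. The critical value 1.715 is the commonly tabulated two-sided value for n = 5 and α = 0.05; it is hard-coded here as an assumption, so check it against the table you use (one-sided tables give a slightly smaller value).

```python
import statistics

# TS values of HVA.CSV, grouped by PROBE
groups = {
    1: [90.91, 90.60, 90.40, 90.52, 90.77],
    2: [90.79, 90.36, 90.32, 90.59, 90.51],
    3: [90.27, 90.37, 90.38, 90.31, 90.49],
    4: [90.57, 90.82, 90.63, 90.98, 90.44],
    5: [90.24, 90.35, 90.15, 90.08, 90.46],
}

# Assumed table value T(n = 5; alpha = 0.05), two-sided Grubbs test
T_CRIT = 1.715


def grubbs_pw(values):
    """Return PW = max|x_i - mean| / s and the most extreme value."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1)
    suspect = max(values, key=lambda v: abs(v - mean))
    return abs(suspect - mean) / s, suspect


for probe, values in groups.items():
    pw, suspect = grubbs_pw(values)
    verdict = "outlier" if pw >= T_CRIT else "no outlier"
    print(f"PROBE {probe}: suspect {suspect}, PW = {pw:.3f} -> {verdict}")
```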

4. Test for Homogeneity of Variances

\Compare \Analysis of Variance \One-Way ANOVA

PROBE->Factor TS->Dependent Variable

Tabular Options: Variance Check

Variance Check

Cochran's C test: 0.297977   P-Value = 0.99813
Bartlett's test: 1.19452   P-Value = 0.519824
Hartley's test: 6.5

Hartley: $PW = s_{max}^2 / s_{min}^2 < T(\alpha; k; n - 1)$, with k = number of samples and n = number of replications

The StatAdvisor

The three statistics displayed in this table test the null hypothesis that the standard deviations of TS within each of the 5 levels of PROBE is the same. Of particular interest are the two P-values. Since the smaller of the P-values is greater than or equal to 0.05, there is not a statistically significant difference amongst the standard deviations at the 95.0% confidence level.
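The three statistics can be recomputed from the group variances. A Python sketch, assuming equal replications per sample as in HVA.CSV. It is our reading of the output, not documented Statgraphics behaviour, that the Bartlett value 1.19452 is the ratio of the arithmetic to the geometric mean of the variances; converting that ratio to the usual chi-square statistic with k - 1 = 4 degrees of freedom reproduces the P-value of about 0.52. The helper `chi2_sf_even` uses a closed form that only holds for an even number of degrees of freedom.

```python
import math
import statistics

# TS values of HVA.CSV, grouped by PROBE
groups = {
    1: [90.91, 90.60, 90.40, 90.52, 90.77],
    2: [90.79, 90.36, 90.32, 90.59, 90.51],
    3: [90.27, 90.37, 90.38, 90.31, 90.49],
    4: [90.57, 90.82, 90.63, 90.98, 90.44],
    5: [90.24, 90.35, 90.15, 90.08, 90.46],
}


def chi2_sf_even(x, dof):
    """Upper-tail probability P(chi2 > x) for an even number of degrees of freedom."""
    if dof % 2:
        raise ValueError("closed form only valid for even degrees of freedom")
    half = x / 2
    term = total = 1.0
    for j in range(1, dof // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


variances = [statistics.variance(v) for v in groups.values()]
k = len(variances)            # number of samples
df = len(groups[1]) - 1       # degrees of freedom per sample (equal n assumed)

cochran_c = max(variances) / sum(variances)  # Cochran's C
hartley = max(variances) / min(variances)    # Hartley's PW = s2max / s2min

# Bartlett: arithmetic vs. geometric mean of the variances, then chi-square form
ratio = statistics.mean(variances) / statistics.geometric_mean(variances)
correction = 1 + (k / df - 1 / (k * df)) / (3 * (k - 1))
chi2 = k * df * math.log(ratio) / correction
p_value = chi2_sf_even(chi2, k - 1)

print(f"Cochran C = {cochran_c:.6f}, Hartley = {hartley:.2f}")
print(f"Bartlett: ratio = {ratio:.5f}, chi2 = {chi2:.3f}, P = {p_value:.4f}")
```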

5. Test for Normal Distribution

\Compare \Analysis of Variance \One-Way ANOVA

PROBE->Factor TS->Dependent Variable

Tabular Options: Summary Statistics \Pane Options: Selection of parameters
A normal distribution can be assumed if the standardized skewness and standardized kurtosis lie within ±2.

Summary Statistics for TS

PROBE   Count   Average   Variance    Standard deviation   Minimum   Maximum
1       5       90.64     0.04085     0.202114             90.4      90.91
2       5       90.514    0.03583     0.189288             90.32     90.79
3       5       90.364    0.00698     0.0835464            90.27     90.49
4       5       90.688    0.04537     0.213002             90.44     90.98
5       5       90.256    0.02323     0.152414             90.08     90.46
----------------------------------------------------------------------------
Total   25      90.4924   0.0530607   0.230349             90.08     90.98

PROBE   Range   Stnd. skewness   Stnd. kurtosis   Sum
1       0.51    0.288577         -0.530679        453.2
2       0.47    0.589419         -0.178296        452.57
3       0.22    0.663105          0.314799        451.82
4       0.54    0.39776          -0.446946        453.44
5       0.38    0.287198         -0.589814        451.28
------------------------------------------------------
Total   0.9     0.899742         -0.279892        2262.31

The StatAdvisor

This table shows various statistics for TS for each of the 5 levels of PROBE. The one-way analysis of variance is primarily intended to compare the means of the different levels, listed here under the Average column. Select Means Plot from the list of Graphical Options to display the means graphically.

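R/s test (David): $T_u(n; \alpha) < PW = R/s < T_o(n; \alpha)$, with R = range, s = standard deviation, n = number of replications, and $T_u$, $T_o$ the lower and upper critical values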
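The test value R/s can be computed per group as below. A Python sketch; the critical values T_u and T_o are deliberately left as inputs, because they have to be looked up in a David R/s table for the chosen α and number of replications. The function names are hypothetical.

```python
import statistics

# TS values of HVA.CSV, grouped by PROBE
groups = {
    1: [90.91, 90.60, 90.40, 90.52, 90.77],
    2: [90.79, 90.36, 90.32, 90.59, 90.51],
    3: [90.27, 90.37, 90.38, 90.31, 90.49],
    4: [90.57, 90.82, 90.63, 90.98, 90.44],
    5: [90.24, 90.35, 90.15, 90.08, 90.46],
}


def range_over_s(values):
    """David's test value PW = R/s: sample range divided by sample standard deviation."""
    return (max(values) - min(values)) / statistics.stdev(values)


def within_david_bounds(values, t_lower, t_upper):
    """True if T_u < R/s < T_o; t_lower and t_upper must come from a David
    R/s table for the chosen alpha and number of replications."""
    return t_lower < range_over_s(values) < t_upper


for probe, values in groups.items():
    print(f"PROBE {probe}: R/s = {range_over_s(values):.3f}")
```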

6. ANOVA - Is there any difference between the samples?

\Compare \Analysis of Variance \One-Way ANOVA

PROBE->Factor TS->Dependent Variable

Tabular Options: Anova Table

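ANOVA Table for TS by PROBE

Analysis of Variance

Source            Sum of Squares   Df   Mean Square   F-Ratio   P-Value
------------------------------------------------------------------------
Between groups          0.664416    4      0.166104      5.45    0.0039
Within groups            0.60904   20      0.030452
------------------------------------------------------------------------
Total (Corr.)            1.27346   24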

The StatAdvisor: The ANOVA table decomposes the variance of TS into two components: a between-group component and a within-group component. The F-ratio, which in this case equals 5.45462, is a ratio of the between-group estimate to the within-group estimate. Since the P-value of the F-test is less than 0.05, there is a statistically significant difference between the mean TS from one level of PROBE to another at the 95.0% confidence level. To determine which means are significantly different from which others, select Multiple Range Tests from the list of Tabular Options.
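The sums of squares and the F-ratio follow from the group means. A minimal Python sketch with the HVA.CSV data; the critical value F(0.05; 4, 20) ≈ 2.87 is taken from an F table instead of computing the exact P-value that Statgraphics reports.

```python
import statistics

# TS values of HVA.CSV, grouped by PROBE
groups = {
    1: [90.91, 90.60, 90.40, 90.52, 90.77],
    2: [90.79, 90.36, 90.32, 90.59, 90.51],
    3: [90.27, 90.37, 90.38, 90.31, 90.49],
    4: [90.57, 90.82, 90.63, 90.98, 90.44],
    5: [90.24, 90.35, 90.15, 90.08, 90.46],
}


def one_way_anova(groups):
    """One-way ANOVA: SS and df between/within groups, and the F-ratio."""
    values = [x for sample in groups.values() for x in sample]
    grand_mean = statistics.mean(values)
    means = {g: statistics.mean(sample) for g, sample in groups.items()}
    ss_between = sum(len(groups[g]) * (m - grand_mean) ** 2 for g, m in means.items())
    ss_within = sum((x - means[g]) ** 2 for g, sample in groups.items() for x in sample)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, df_between, ss_within, df_within, f_ratio


ss_b, df_b, ss_w, df_w, f_ratio = one_way_anova(groups)
F_CRIT = 2.87  # F(0.05; 4, 20) from an F table
print(f"SS between = {ss_b:.6f} ({df_b} df), SS within = {ss_w:.5f} ({df_w} df)")
print(f"F = {f_ratio:.5f}; difference at the 5 % level: {f_ratio > F_CRIT}")
```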

7. Multiple Range Tests - Which samples are different?

Tabular Options: Multiple Range Tests

Pane Options: LSD, Tukey HSD, Scheffé, Bonferroni, Student-Newman-Keuls, Duncan

Multiple Range Tests for TS by PROBE: Method: 95.0 percent LSD

PROBE   Count   Mean     Homogeneous Groups
--------------------------------------------
5       5       90.256   X
3       5       90.364   XX
2       5       90.514    XX
1       5       90.64      X
4       5       90.688     X

Contrast   Difference   +/- Limits
-----------------------------------
1 - 2       0.126       0.230221
1 - 3      *0.276       0.230221
1 - 4      -0.048       0.230221
1 - 5      *0.384       0.230221
2 - 3       0.15        0.230221
2 - 4      -0.174       0.230221
2 - 5      *0.258       0.230221
3 - 4      *-0.324      0.230221
3 - 5       0.108       0.230221
4 - 5      *0.432       0.230221
-----------------------------------

* denotes a statistically significant difference.

The StatAdvisor: This table applies a multiple comparison procedure to determine which means are significantly different from which others. The bottom half of the output shows the estimated difference between each pair of means. An asterisk has been placed next to 5 pairs, indicating that these pairs show statistically significant differences at the 95.0% confidence level. At the top of the page, 3 homogenous groups are identified using columns of X's. Within each column, the levels containing X's form a group of means within which there are no statistically significant differences. The method currently being used to discriminate among the means is Fisher's least significant difference (LSD) procedure. With this method, there is a 5.0% risk of calling each pair of means significantly different when the actual difference equals 0.

Multiple Range Tests for TS by PROBE: Method: 95.0 percent Bonferroni

PROBE   Count   Mean     Homogeneous Groups
--------------------------------------------
5       5       90.256   X
3       5       90.364   XX
2       5       90.514   XX
1       5       90.64     X
4       5       90.688    X

Contrast   Difference   +/- Limits
-----------------------------------
1 - 2       0.126       0.348031
1 - 3       0.276       0.348031
1 - 4      -0.048       0.348031
1 - 5      *0.384       0.348031
2 - 3       0.15        0.348031
2 - 4      -0.174       0.348031
2 - 5       0.258       0.348031
3 - 4      -0.324       0.348031
3 - 5       0.108       0.348031
4 - 5      *0.432       0.348031
-----------------------------------

* denotes a statistically significant difference.
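The LSD and Bonferroni limits both have the form t · sqrt(2 · MS_within / n); only the t quantile changes. A Python sketch, assuming the within-group mean square 0.030452 with 20 d.f. from the ANOVA table and hard-coded t-table values (2.086 for α = 0.05 per comparison, 3.153 for α = 0.05 split over the 10 pairwise comparisons). Tukey HSD, Scheffé and the other methods use different multipliers and are not covered here.

```python
import itertools
import math
import statistics

# TS values of HVA.CSV, grouped by PROBE
groups = {
    1: [90.91, 90.60, 90.40, 90.52, 90.77],
    2: [90.79, 90.36, 90.32, 90.59, 90.51],
    3: [90.27, 90.37, 90.38, 90.31, 90.49],
    4: [90.57, 90.82, 90.63, 90.98, 90.44],
    5: [90.24, 90.35, 90.15, 90.08, 90.46],
}

MS_WITHIN = 0.030452   # within-group mean square (20 d.f.) from the ANOVA table
N_PER_GROUP = 5

# Two-sided t quantiles for 20 d.f., assumed from a t table
T_LSD = 2.086          # alpha = 0.05 per comparison
T_BONFERRONI = 3.153   # alpha = 0.05 / 10 comparisons


def significant_pairs(t_value):
    """Return the +/- limit and the pairs whose mean difference exceeds it."""
    limit = t_value * math.sqrt(2 * MS_WITHIN / N_PER_GROUP)
    means = {p: statistics.mean(v) for p, v in groups.items()}
    pairs = [(i, j) for i, j in itertools.combinations(sorted(means), 2)
             if abs(means[i] - means[j]) > limit]
    return limit, pairs


for name, t in (("LSD", T_LSD), ("Bonferroni", T_BONFERRONI)):
    limit, pairs = significant_pairs(t)
    print(f"{name}: limit = {limit:.4f}, significant pairs: {pairs}")
```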

VARIANCE COMPONENTS (Nested Designs): error of sampling, analysis

Problem: How big are the contributions of the sampling method and the analysis method to the variability of the analysed moisture content?

To quantify the variance within the samples and the variance between the sample averages, 5 samples are drawn from a bag and homogenized, and each sample is analysed 5 times.

1. Import of Data

2. Visualization of Data

3. Test for Outliers

4. Test for Homogeneity of Variances

5. Test for Normal Distribution

6. ESTIMATE VARIANCE COMPONENTS

\Compare\Analysis of Variance\Variance Components

PROBE->Factors in Order of Nesting TS->Dependent Variable

Tabular Options: Analysis Summary

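Variance Components Analysis

Dependent variable: TS
Factors: PROBE
Number of complete cases: 25

Analysis of Variance for TS

Source              Sum of Squares   Df   Mean Square   Var. Comp.   Percent   Index
-------------------------------------------------------------------------------------
TOTAL (CORRECTED)          1.27346   24
-------------------------------------------------------------------------------------
PROBE                     0.664416    4      0.166104    0.0271304     47.12       1
ERROR                      0.60904   20      0.030452     0.030452     52.88       0
-------------------------------------------------------------------------------------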

The StatAdvisor: The analysis of variance table shown here divides the variance of TS into 1 components, one for each factor. Each factor after the first is nested in the one above. The goal of such an analysis is usually to estimate the amount of variability contributed by each of the factors, called the variance components. In this case, the factor contributing the most variance is ERROR. Its contribution represents 52.8842% of the total variation in TS.

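Sampling error: $s_1^2 = (MQ_1 - MQ_0) / k$, with $MQ_1$ = mean square of PROBE, $MQ_0$ = mean square of ERROR and k = number of replications

Confidence limits (lower limit < σ1 < upper limit):

$\sqrt{(MQ_1 L_1^2 - MQ_0) / k} < \sigma_1 < \sqrt{(MQ_1 L_2^2 - MQ_0) / k}$

with $L_1(\alpha; Df_1, Df_0)$ and $L_2(\alpha; Df_1, Df_0)$

Analytical error: $s_0^2 = MQ_0$

Confidence limits (lower limit < σ0 < upper limit):

$s_0 L_1 < \sigma_0 < s_0 L_2$

with $L_1(\alpha; Df_0)$ and $L_2(\alpha; Df_0)$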

EXPERIMENTAL DESIGNS

Problem: How big is the effect of heating time and starch concentration on the viscosity of the gelatinised starch?

Suspensions of starch in water at different concentrations are heated for different times at 80 °C. Defined shear tests are then carried out on these samples. The effects of starch concentration (Conc) and heating time (Time) on the shear stress (tau) at a shear rate D = 300 s⁻¹ are quantified.

1. Create Experiment

\Special \Experimental Design \Create Design \Screening Design

2 Factors, 1 Response, Fractional Design, 0 Center Point, 1 Replication

Randomize

correct Block

Tabular Options: Design Summary, Worksheet

Save Design File tau.sfx

Print Worksheet

Design Summary

Design class: Screening
Design name: Factorial 2^2

Base Design
Number of experimental factors: 2     Number of blocks: 1
Number of responses: 1                Number of centerpoints per block: 0
Number of runs: 8
Randomized: Yes

Factors   Low    High   Units   Continuous
-------------------------------------------
Conc      -1.0   1.0            Yes
Time      -1.0   1.0            Yes

Responses   Units
------------------
tau

The StatAdvisor: You have created a Factorial design which will study the effects of 2 factors in 8 runs. The design is to be run in a single block. The order of the experiments has been fully randomized. This will provide protection against the effects of lurking variables.

2. Run Experiments

3. Enter Data

\Special \Experimental Design \Open Design: tau.sfx

Tabular Options: Design Summary, Worksheet

!!!! Take care to enter each tau value for its corresponding experiment (run) !!!!

run   BLOCK   Conc   Time   tau
4     1       -1     -1     40
10    1        1     -1     105
7     1       -1      1     130
3     1        1      1     119
2     1       -1     -1     42
9     1        1     -1     98
5     1       -1      1     134
8     1        1      1     122

4. Analyze Data

\Special \Experimental Design \Analyze Design

Analysis Options:
max. Order Effect: 2
-> ignore Block number
Estimated Sigma from: Experimental Data

Tabular Options: Analysis Summary, ANOVA Table, Regression coeff., Optimization

Graphical Options: Pareto Chart, Main Effects, Interaction Plot, Response Plots, Diagnostic Plots

Analysis Summary

Estimated effects for tau

average = 98.75 +/- 1.10397
A:CONC = 24.5 +/- 2.20794
B:TIME = 55.0 +/- 2.20794
AB = -36.0 +/- 2.20794
------

Standard errors are based on total error with 4 d.f.

The StatAdvisor: This table shows each of the estimated effects and interactions. Also shown is the standard error of each of the effects, which measures their sampling error.

To plot the estimates in decreasing order of importance, select Pareto Charts from the list of Graphical Options.

To test the statistical significance of the effects, select ANOVA Table from the list of Tabular Options.

You can then remove insignificant effects by pressing the alternate mouse button, selecting Analysis Options, and pressing the Exclude button.

Analysis of Variance for TAU:

Source          Sum of Squares   Df   Mean Square   F-Ratio   P-Value
----------------------------------------------------------------------
A:CONC                  1200.5    1        1200.5    123.13    0.0004
B:TIME                  6050.0    1        6050.0    620.51    0.0000
AB                      2592.0    1        2592.0    265.85    0.0001
Total error               39.0    4          9.75
----------------------------------------------------------------------
Total (corr.)           9881.5    7

R-squared = 99.6053 percent
R-squared (adjusted for d.f.) = 99.3093 percent
Standard Error of Est. = 3.1225
Mean absolute error = 2.0
Durbin-Watson statistic = 2.76282

The StatAdvisor: The ANOVA table partitions the variability in TAU into separate pieces for each of the effects. It then tests the statistical significance of each effect by comparing the mean square against an estimate of the experimental error. In this case, 3 effects have P-values less than 0.05, indicating that they are significantly different from zero at the 95.0% confidence level.

The R-Squared statistic indicates that the model as fitted explains 99.6053% of the variability in TAU. The adjusted R-squared statistic, which is more suitable for comparing models with different numbers of independent variables, is 99.3093%.

The standard error of the estimate shows the standard deviation of the residuals to be 3.1225.

The mean absolute error (MAE) of 2.0 is the average value of the residuals.

The Durbin-Watson (DW) statistic tests the residuals to determine if there is any significant correlation based on the order in which they occur in your data file. Since the DW value is > 1.4, there is probably not any serious autocorrelation in the residuals.
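For a replicated 2^2 design the effects, sums of squares and F-ratios can be checked by hand from the contrast columns. A Python sketch, assuming the worksheet rows above are correctly matched to their runs and using coded units (-1/+1). With all three effects in the model, the error term equals the pure error from the duplicate runs. The Durbin-Watson statistic is not reproduced because it depends on the run order.

```python
import math

# (Conc, Time, tau) in coded units, one tuple per run of the worksheet
runs = [(-1, -1, 40), (1, -1, 105), (-1, 1, 130), (1, 1, 119),
        (-1, -1, 42), (1, -1, 98), (-1, 1, 134), (1, 1, 122)]
N = len(runs)

# Contrast columns of the 2^2 design: sign of each run for each effect
columns = {
    "A:CONC": [c for c, t, _ in runs],
    "B:TIME": [t for c, t, _ in runs],
    "AB": [c * t for c, t, _ in runs],
}
tau = [y for _, _, y in runs]
average = sum(tau) / N

# Effect = mean response at +1 minus mean response at -1
effects = {name: sum(s * y for s, y in zip(col, tau)) / (N / 2)
           for name, col in columns.items()}
coefficients = {name: e / 2 for name, e in effects.items()}  # regression coeffs.
ss = {name: N * (e / 2) ** 2 for name, e in effects.items()}

# Pure error: deviations of the duplicates from their cell means
cells = {}
for c, t, y in runs:
    cells.setdefault((c, t), []).append(y)
ss_error = sum((y - sum(ys) / len(ys)) ** 2 for ys in cells.values() for y in ys)
df_error = N - 1 - len(effects)
ms_error = ss_error / df_error

f_ratios = {name: s / ms_error for name, s in ss.items()}
se_effect = 2 * math.sqrt(ms_error / N)  # standard error of each effect
ss_total = sum((y - average) ** 2 for y in tau)
r_squared = 1 - ss_error / ss_total

for name in effects:
    print(f"{name}: effect = {effects[name]:+.2f} +/- {se_effect:.5f}, "
          f"F = {f_ratios[name]:.2f}")
print(f"R-squared = {100 * r_squared:.4f} percent")
```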

Pareto Chart: Pane Options: Standardized

  • all factors and interactions are significant

Regression coeffs. for tau

constant = 98.75

A:CONC = 12.25

B:TIME = 27.5

AB = -18.0

The StatAdvisor: This pane displays the regression equation which has been fitted to the data. The equation of the fitted model is

TAU = 98.75 + 12.25*CONC + 27.5*TIME - 18.0*CONC*TIME

where the values of the variables are specified in their original units.

To have STATGRAPHICS evaluate this function, select Predictions from the list of Tabular Options.

To plot the function, select Response Plots from the list of Graphical Options.

  • at high CONC, TIME has only a small effect
  • at high TIME, CONC has only a small (negative) effect

Surface Plot: Pane Options: show points

Contour Plot: Pane Options: Painted Regions

  • the same TAU can be obtained with low CONC at high TIME
  • at high CONC, TIME has only a small effect

Diagnostic Plot: Pane Options: Residuals vs. Run Order

Pane Options: Residuals vs. Factor A:conc