Evaluating Analytical Data


CHAPTER 4

EVALUATING ANALYTICAL DATA

Characterizing a Measurement and Results

Characterizing Experimental Errors

Propagation of Uncertainty

Distribution of Measurements and Results

Statistical Analysis of Data

Detection Limits

A. MEASURES OF CENTRAL TENDENCY AND SPREAD

A.1 ESTIMATORS OF CENTRAL TENDENCY

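The mean

Estimator of the central tendency or the true value: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$

The median

The median is the middle value when the data are ordered from smallest to largest.

For an odd-sized data set, the median is the single middle value.

For an even-sized data set, the median is the average of the two middle values.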

A.2 ESTIMATORS OF VARIABILITY (scatter)

The range

The range (w) is the difference between the largest and the smallest values in a data set: $w = X_{\text{largest}} - X_{\text{smallest}}$

The standard deviation

Absolute standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}}$

n - 1 = degrees of freedom = number of independent pieces of information on which a parameter estimate is based.

Relative standard deviation: $s_r = \frac{s}{\bar{X}}$

Percent relative standard deviation: $\%s_r = \frac{s}{\bar{X}} \times 100$

The variance: $s^2$
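The estimators in this section can be checked numerically. The sketch below uses Python's standard `statistics` module and, as sample data, the four %S results quoted in the sulfur example later in these notes; the variable names are illustrative only.

```python
import statistics

# %S results quoted later in these notes (sulfur-in-kerosene example)
data = [0.112, 0.118, 0.115, 0.119]

mean = statistics.mean(data)
median = statistics.median(data)       # n is even: average of the two middle values
w = max(data) - min(data)              # range
s = statistics.stdev(data)             # absolute standard deviation (n - 1 in the denominator)
rsd = s / mean                         # relative standard deviation
pct_rsd = 100 * rsd                    # percent relative standard deviation
variance = statistics.variance(data)   # s squared

print(f"mean = {mean:.4f}, median = {median:.4f}, range = {w:.3f}")
print(f"s = {s:.4f}, RSD = {rsd:.4f}, %RSD = {pct_rsd:.2f}, variance = {variance:.2e}")
```

Note that `statistics.stdev` and `statistics.variance` divide by n - 1, which matches the degrees of freedom used in these notes; `statistics.pstdev` would divide by n instead.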

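B. CHARACTERIZING EXPERIMENTAL ERRORS

B.1 ACCURACY

Absolute error: $E = \bar{X} - \mu$, where $\mu$ is the true (accepted) value

Percent relative error: $\%E_r = \frac{\bar{X} - \mu}{\mu} \times 100$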
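As a quick illustration of the two accuracy measures, the sketch below assumes the usual definitions (measured mean minus true value, and that difference relative to the true value). The numbers are taken from the sulfur example later in these notes (mean of the results 0.116 % S, accepted value 0.123 % S); the function names are hypothetical.

```python
def absolute_error(measured_mean, true_value):
    """Absolute error: measured mean minus the true (accepted) value."""
    return measured_mean - true_value


def percent_relative_error(measured_mean, true_value):
    """Absolute error expressed as a percentage of the true value."""
    return 100 * (measured_mean - true_value) / true_value


e = absolute_error(0.116, 0.123)
pct = percent_relative_error(0.116, 0.123)
print(f"E = {e:.3f} % S, relative error = {pct:.1f} %")
```

A negative sign indicates that the result is lower than the accepted value.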

Accuracy is determined by

Determinate errors (systematic errors)

Sampling errors

Nonrepresentative sample

Method errors

Incorrect value of the sensitivity (k) used

Incorrect reagent blank measurements

Other interferents

Measurement errors

Instruments (loss of calibration)

Equipment (page 59)

Personal errors

Example: Dust in molecular absorption spectrophotometry

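Constant determinate errors

The absolute error is the same for every sample, so the relative error decreases as the sample size increases. Can be detected by using different-sized samples for making a determination.

Proportional determinate errors

The absolute error grows in proportion to the sample size, so the relative error stays constant and changing the sample size does not reveal it.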
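The difference between the two kinds of determinate error can be seen in a small numerical sketch. All values below (true mass fraction, size of the constant error, size of the proportional error) are hypothetical and chosen only for illustration.

```python
TRUE_FRACTION = 0.500       # hypothetical true mass fraction of analyte
CONSTANT_ERROR_G = 0.002    # hypothetical constant error: 2 mg too much analyte found
PROPORTIONAL_ERROR = 0.01   # hypothetical proportional error: 1 % too much analyte found


def found_fraction_constant(sample_mass_g):
    """Mass fraction reported when a constant error is present."""
    analyte = TRUE_FRACTION * sample_mass_g + CONSTANT_ERROR_G
    return analyte / sample_mass_g


def found_fraction_proportional(sample_mass_g):
    """Mass fraction reported when a proportional error is present."""
    analyte = TRUE_FRACTION * sample_mass_g * (1 + PROPORTIONAL_ERROR)
    return analyte / sample_mass_g


for mass in (0.1, 0.2, 0.5, 1.0):
    print(f"{mass:.1f} g: constant -> {found_fraction_constant(mass):.4f}, "
          f"proportional -> {found_fraction_proportional(mass):.4f}")
```

With the constant error the reported fraction drifts toward the true value as the sample gets larger, which is why varying the sample size exposes it; with the proportional error the reported fraction is the same for every sample size.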

B.2 PRECISION

Measure of the spread of data about a central value.

Repeatability

Spread of data obtained by one analyst using the same solutions and equipment during a single period of laboratory work.

Reproducibility

Reproducibility involves variations in analysts, laboratories, equipment, instruments, work periods, etc.

Indeterminate errors

Indeterminate errors are random errors, which affect the precision. These errors cannot be eliminated.

Sources

Sampling process

Sample treatment

Measurement (reading errors, electronic noise, stray light, etc.)

Evaluating/ Identifying Sources of Indeterminate errors

Examples

Make several determinations of a single sample/ item.

Obtain measurements of several samples of the same 'composition'.

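C. PROPAGATION OF UNCERTAINTY

Error = difference between a single measurement or result and the true value

The uncertainty is the range of possible values that a measurement or result may have. It includes all errors, determinate and indeterminate.

C.1 Uncertainty on the Result of Additions and/or Subtractions

For $R = A + B - C$: $s_R = \sqrt{s_A^2 + s_B^2 + s_C^2}$

C.2 Uncertainty on the Result of Multiplications and Divisions

For $R = \frac{A \times B}{C}$: $\frac{s_R}{R} = \sqrt{\left(\frac{s_A}{A}\right)^2 + \left(\frac{s_B}{B}\right)^2 + \left(\frac{s_C}{C}\right)^2}$

C.3 Uncertainty of Mixed Operations

Propagate the uncertainty one operation at a time: absolute uncertainties combine for sums and differences, relative uncertainties combine for products and quotients.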

Examples

4.7

Quiz 4.8
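A mixed operation can be handled by applying the addition/subtraction rule and the multiplication/division rule in turn. The sketch below assumes the standard root-sum-of-squares rules for independent random uncertainties; the function names and all numerical values (masses, volume, and their uncertainties) are hypothetical and are not the textbook's Example 4.7 or Quiz 4.8.

```python
import math


def u_add_sub(*abs_uncertainties):
    """Absolute uncertainty of a sum or difference."""
    return math.sqrt(sum(u ** 2 for u in abs_uncertainties))


def u_mul_div(result, *value_uncertainty_pairs):
    """Absolute uncertainty of a product or quotient with the given result."""
    rel = math.sqrt(sum((u / v) ** 2 for v, u in value_uncertainty_pairs))
    return abs(result) * rel


# Hypothetical mixed operation: C = (m_final - m_initial) / V
m_final, m_initial, u_m = 12.3456, 12.1234, 0.0001   # g, each weighing +/- 0.0001 g
V, u_V = 25.00, 0.03                                 # mL

dm = m_final - m_initial
u_dm = u_add_sub(u_m, u_m)                 # step 1: subtraction
C = dm / V
u_C = u_mul_div(C, (dm, u_dm), (V, u_V))   # step 2: division

print(f"C = {C:.6f} +/- {u_C:.6f} g/mL")
```

Comparing the relative terms `u_dm / dm` and `u_V / V` shows which step contributes most to the overall uncertainty.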

C.4 Uncertainty for other Mathematical Functions

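Function / Uncertainty
$R = kA$ / $s_R = k\,s_A$
$R = \ln A$ / $s_R = \frac{s_A}{A}$
$R = \log A$ / $s_R = 0.4343 \times \frac{s_A}{A}$
$R = e^{A}$ / $\frac{s_R}{R} = s_A$
$R = 10^{A}$ / $\frac{s_R}{R} = 2.303 \times s_A$
$R = A^{k}$ / $\frac{s_R}{R} = |k| \times \frac{s_A}{A}$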

Calculations of the propagation of uncertainty are used for the following purposes:

1) to compare the expected uncertainty of an analysis with the actual uncertainty obtained

2) to determine the major and minor contributions to the overall uncertainty

3) to compare two or more methods

4) to develop the best procedure for preparing a sample

D. Distribution of Measurements and Results

Replicate 1 / measurement 1 / measurement 2 / measurement 3 / mean 1
Replicate 2 / measurement 1 / measurement 2 / measurement 3 / mean 2
Replicate 3 / measurement 1 / measurement 2 / measurement 3 / mean 3
Replicate 4 / measurement 1 / measurement 2 / measurement 3 / mean 4
Mean (of the four replicate means)

Presentation of results

Two students determine the concentration of a solution of NaOH by titrating several aliquots of a single stock solution.

Student 1/ Sample 1 / Student 2/ Sample 2
Aliquots / NaOH (M) / Aliquots / NaOH (M)
1 / 0.1007 / 1 / 0.1005
2 / 0.1010 / 2 / 0.1010
3 / 0.1011 / 3 / 0.1002
4 / 0.1013 / 4 / 0.1004
5 / 0.1005 / 5 / 0.1009
6 / 0.1009 / 6 / 0.1003
7 / 0.1008 / 7 / 0.1010
s

What can you say about the 'True concentration'?

You need to predict

1) true spread of the population

2) true central value
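As a starting point for estimating the central value and the spread, the sample mean and sample standard deviation of each student's titration results can be computed. The sketch below uses the values from the table above and Python's standard `statistics` module; the variable names are illustrative.

```python
import statistics

# NaOH concentrations (M) from the table above
student_1 = [0.1007, 0.1010, 0.1011, 0.1013, 0.1005, 0.1009, 0.1008]
student_2 = [0.1005, 0.1010, 0.1002, 0.1004, 0.1009, 0.1003, 0.1010]

summary = {}
for name, data in (("Student 1", student_1), ("Student 2", student_2)):
    summary[name] = (statistics.mean(data), statistics.stdev(data))
    print(f"{name}: mean = {summary[name][0]:.4f} M, s = {summary[name][1]:.5f} M")
```

These are only sample statistics; whether they are good estimates of the population values depends on the sample size and on the assumptions discussed in the rest of this section.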

Population

Population refers to all members of a system being investigated.

It is an infinite number of data or a universe of data.

Sample

A sample is a finite number of experimental observations/ measurements.

It is a tiny fraction of the population.

A sample is that part of the population that is collected and analyzed. It is a subset of the population.

Analysis of the entire population provides the population's true central value ($\mu$) and spread ($\sigma$).

$P(V) = \frac{M}{N}$: the probability of occurrence of V

V: value of interest

M: frequency of occurrence

N: size of population

In experimental sciences, we seldom sample the whole population. Rather, a sample of the population is analyzed.

From properties of the sample to properties of the population

(How do we extend what we know about the sample to the population?)

Requirement

-Need to make assumptions about the distribution of the population

Distributions of samples of chemical systems display trends of well-defined population distributions.

What are they?

D.3 PROBABILITY DISTRIBUTION

Distribution of a population: Frequency of occurrence versus individual values

Distribution of data where the members of the population can take any value, i.e. continuous distribution.

Example: Use data obtained for the calibration of a 10-mL pipet

Generate a histogram of the data

Calculate the mean and the standard deviation

What is the shape/ trend of the distribution around the 'central value'?

Can you predict the distribution of the population from this sample's distribution?

D.3.1 NORMAL/GAUSSIAN DISTRIBUTION

Members of the population may take any value

We will first discuss Gaussian statistics of populations; then we will show how these relationships can be modified and applied to small samples of data.

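Gaussian Distribution Equation

$f(X) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(X - \mu)^2}{2\sigma^2}\right]$

f(X) versus X

$f(X)$: frequency of occurrence for a value X

Defined by two parameters only:

$\mu$: true mean

$\sigma^2$: true population's variance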

Properties of a normal distribution

1) The mean occurs at the point of maximum frequency

2) There is a symmetrical distribution of positive and negative deviations about the maximum

3) There is an exponential decrease in frequency as the magnitude of the deviations increases

Universal Gaussian curve

Frequency of deviations from the mean versus deviation from the mean in units of standard deviation ($\sigma$)

When $z = \frac{X - \mu}{\sigma}$, $f(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$

Appendix 1A: z deviations versus fraction of population to the right of z

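The area under the curve between two limits gives the probability of occurrence between the two limits

Limits / % population
$\mu \pm 1\sigma$ / 68.26
$\mu \pm 2\sigma$ / 95.44
$\mu \pm 3\sigma$ / 99.73
$\mu \pm 4\sigma$ / 99.99

Let us set $\mu$ = 0 and $\sigma$ = 1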

Confidence interval

For a single value $X$ taken from the population, we can state that: $\mu = X \pm z\sigma$

Confidence Intervals for various values of z

The probability of finding $\mu$ within $X \pm z\sigma$

z / Confidence interval (%) / Limits
0.50 / 38 / $X \pm 0.50\sigma$
1.00 / 68.26 / $X \pm 1.00\sigma$
1.50 / 86.64 / $X \pm 1.50\sigma$
1.96 / 95.00 / $X \pm 1.96\sigma$
2.50 / 98.76 / $X \pm 2.50\sigma$
3.00 / 99.73 / $X \pm 3.00\sigma$
3.50 / 99.95 / $X \pm 3.50\sigma$
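The percentages in the table above are areas under the normal curve between -z and +z. A minimal way to reproduce them, assuming a normal population, is through the error function in Python's standard `math` module; the function name `confidence_level` is illustrative.

```python
import math


def confidence_level(z):
    """Fraction of a normal population lying within mu +/- z*sigma."""
    return math.erf(z / math.sqrt(2))


for z in (0.50, 1.00, 1.50, 1.96, 2.50, 3.00, 3.50):
    print(f"z = {z:.2f}: {100 * confidence_level(z):.2f} %")
```

The computed values may differ from a printed table in the last digit because of rounding.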

D.3.2 What if a mean is obtained from a sample of the population of known standard deviation?

Confidence interval for the mean of a sample of n measurements when the population's $\sigma$ is known: $\mu = \bar{X} \pm \frac{z\sigma}{\sqrt{n}}$

Examples
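As one possible example, the sketch below computes a confidence interval for a sample mean when the population standard deviation is known, assuming the interval has the form mean ± zσ/√n. The numbers (mean, σ, n) are hypothetical, and z = 1.96 corresponds to the 95 % level in the table of z values.

```python
import math


def ci_known_sigma(mean, sigma, n, z=1.96):
    """Confidence interval for the population mean when sigma is known."""
    half_width = z * sigma / math.sqrt(n)
    return mean - half_width, mean + half_width


# Hypothetical data: mean of 4 measurements, known population sigma
low, high = ci_known_sigma(mean=10.00, sigma=0.10, n=4)
print(f"95 % confidence interval: {low:.3f} to {high:.3f}")
```

Because the half-width scales with 1/√n, quadrupling the number of measurements halves the width of the interval.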

D.3.3 PROBABILITY DISTRIBUTIONS FOR SAMPLES

In experimental sciences, we seldom know the parameters of the population.

Therefore we must make assumptions about the population distribution or predict the distribution.

Measurements on a large sample can be used to verify the distribution trend.

Let us do that using data on the calibration of a pipet

Replicate data on the Calibration of a 10-mL Pipet

a) Construct a histogram

b) Calculate the mean and the standard deviation

Central limit theorem

The distribution of measurements approaches a normal distribution when the errors are random, independent of each other, and of similar magnitude.

Then, the sample mean is a good estimate of the population mean, and the sample variance is a good estimate of the population variance.

Therefore, we can:

c) Generate a Gaussian curve using the calculated mean and standard deviation

Estimating the true mean ($\mu$) and the true standard deviation ($\sigma$)

Analysis of a large number of samples will yield the true mean and standard deviation. When the sample size is large (n = 50 here; n > 20 as a rule of thumb), the sample mean and the sample standard deviation approach $\mu$ and $\sigma$, respectively.

Confidence intervals

As we have assumed Gaussian distribution of the population, we can determine a range within which the true mean is expected at a given confidence level.

Can we use z to define intervals?

Recall z was calculated using population parameters

So we use t and s instead of z and $\sigma$: $\mu = \bar{X} \pm \frac{ts}{\sqrt{n}}$

$t \geq z$ at all confidence levels

$\frac{s}{\sqrt{n}}$: Standard error of the mean

s: Sample standard deviation

n - 1: degrees of freedom (df, $\nu$); the number of independent results used to compute the standard deviation (when n - 1 deviations have been computed, the final one is known)

Appendix 1B lists values of t for various confidence levels and degrees of freedom.

How should t vary with sample size?

When n = 50, t = 2.01 (95 % confidence level)

For the population (n = $\infty$), t = 1.96

Example

Use Pipet data.

What is the 95 % confidence interval for the pipet data?

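Mean volume $\bar{X}$ = 9.982 mL

Standard deviation s = 0.0056 mL

Number of trials n = 50; $\nu$ = n - 1 = 49

$\mu = 9.982 \pm \frac{2.01 \times 0.0056}{\sqrt{50}} = 9.982 \pm 0.002$ mL

There is a 95 % probability that the pipet's mean volume is between 9.980 and 9.984 mL.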
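The pipet interval can be reproduced with a few lines of arithmetic. The sketch below assumes the interval has the form mean ± ts/√n and uses the values given in the notes (mean, s, n, and t = 2.01 for n = 50); the variable names are illustrative.

```python
import math

mean, s, n = 9.982, 0.0056, 50   # pipet data quoted in the notes (mL)
t_95 = 2.01                      # t for 95 % confidence, n = 50, as quoted in the notes

half_width = t_95 * s / math.sqrt(n)
low, high = mean - half_width, mean + half_width

print(f"mean = {mean} +/- {half_width:.4f} mL")
print(f"95 % confidence interval: {low:.3f} to {high:.3f} mL")
```

Replacing `t_95` with 1.96 (the population value) narrows the interval only slightly, which illustrates how close t is to z for a sample this large.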

E. Statistical Analysis of Data

We can make definite statements only about the probability that the true value lies within a given range.

Q: How do we compare two or more samples of results, two or more analysts' results, or results obtained from two or more methods, made over a long period of time or from different sources/subjects?

A: Use statistical tests to determine whether or not the results are significantly different at a desired confidence level.

Note that there still remains a probability that the answer is wrong, because the hypothesis is tested statistically.

E.1 SIGNIFICANCE TESTING / HYPOTHESIS TESTING

Construct probability distribution curves for each sample of measurements

Use figure on page 82

Q: Can the difference between the samples be explained by indeterminate error?

A: One can only determine the probability that the difference is significant

Null hypothesis: assumes that the numerical quantities being compared are equal

E.2 Test of Significance for Means

Sample mean and population mean

Null hypothesis (H0): the mean of the sample is equal to the mean of the population

Alternative hypothesis (HA): the mean of the sample is not equal to the mean of the population

Choose a significance level:

Confidence level = 95 %: the probability that H0 will be correctly retained

Significance level: the probability that H0 will be incorrectly rejected is $\alpha$ = 0.05

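Confidence interval: $\mu = \bar{X} \pm \frac{ts}{\sqrt{n}}$, so $t_{exp} = \frac{|\mu - \bar{X}|\sqrt{n}}{s}$

Example

A new procedure for the rapid determination of sulfur in kerosenes was tested on a sample known from its method of preparation to contain 0.123 % S ($\mu$). The results were %S = 0.112, 0.118, 0.115, and 0.119. Do the data indicate that there is bias in the method?

$\bar{X}$ = 0.116, s = 0.0032

Compute $t_{exp}$ and compare it to the critical t at the desired confidence level: $t_{exp} = \frac{|0.123 - 0.116|\sqrt{4}}{0.0032} \approx 4.4$

Since $t_{exp} > t(0.05, 3) = 3.18$, the null hypothesis must be rejected

The probability of rejecting the null hypothesis incorrectly is 0.05.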
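The sulfur example can be worked through numerically. The sketch below assumes the test statistic t_exp = |μ - mean|·√n / s and takes the critical value 3.182 (two-tailed, α = 0.05, 3 degrees of freedom) from a standard t table; that critical value and the function name are assumptions, not values stated in the notes.

```python
import math
import statistics


def t_experimental(data, mu):
    """t statistic for comparing a sample mean with a known value mu."""
    n = len(data)
    return abs(mu - statistics.mean(data)) * math.sqrt(n) / statistics.stdev(data)


results = [0.112, 0.118, 0.115, 0.119]   # %S results from the example
t_exp = t_experimental(results, mu=0.123)

T_CRIT = 3.182   # assumed table value: two-tailed, alpha = 0.05, 3 degrees of freedom
reject_h0 = t_exp > T_CRIT

print(f"t_exp = {t_exp:.2f}, t_crit = {T_CRIT}, reject H0: {reject_h0}")
```

The computed t_exp differs slightly from a hand calculation with s rounded to 0.0032, but the conclusion is the same: the difference is too large to be explained by indeterminate error at the 95 % confidence level.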

Type 1 error: null hypothesis is incorrectly rejected

Type 2 error: null hypothesis is incorrectly retained
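E.3 Test of Significance for Standard Deviations

A) Are analysis results within statistical control?

Can the difference between the standard deviation of the sample and the population standard deviation be explained by random error?

Null hypothesis: $s^2 = \sigma^2$

F-test: $F_{exp} = \frac{s^2}{\sigma^2}$ (if $s^2 > \sigma^2$) or $F_{exp} = \frac{\sigma^2}{s^2}$ (if $\sigma^2 > s^2$), so that $F_{exp} \geq 1$

If $F_{exp} > F(\alpha, \nu_{num}, \nu_{den})$, reject the null hypothesis

B) Are the variances of two samples significantly different? $F_{exp} = \frac{s_A^2}{s_B^2}$, where $s_A^2 \geq s_B^2$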
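Case B can be illustrated with the two students' NaOH titration data from section D. The sketch below assumes F_exp is the ratio of the larger sample variance to the smaller one, and takes the critical value 5.820 (two-tailed, α = 0.05, 6 and 6 degrees of freedom) from a standard F table; the critical value and the function name are assumptions.

```python
import statistics


def f_experimental(sample_a, sample_b):
    """Ratio of the larger sample variance to the smaller one (F >= 1)."""
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    return max(var_a, var_b) / min(var_a, var_b)


student_1 = [0.1007, 0.1010, 0.1011, 0.1013, 0.1005, 0.1009, 0.1008]
student_2 = [0.1005, 0.1010, 0.1002, 0.1004, 0.1009, 0.1003, 0.1010]

f_exp = f_experimental(student_1, student_2)
F_CRIT = 5.820   # assumed table value: two-tailed, alpha = 0.05, 6 and 6 degrees of freedom
significantly_different = f_exp > F_CRIT

print(f"F_exp = {f_exp:.3f}, F_crit = {F_CRIT}, "
      f"significantly different: {significantly_different}")
```

Here F_exp is well below the critical value, so the difference between the two students' standard deviations can be explained by random error.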

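E.4 Comparing Two Experimental Means

A) Unpaired data: samples are from the same source

Compare the means of two sets of identical analyses

$t_{exp} = \frac{|\bar{X}_A - \bar{X}_B|}{\sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}}$ (1)

If the standard deviations are not significantly different, use the pooled standard deviation, equation (3), and equation (2) to calculate t, with $n_A + n_B - 2$ degrees of freedom.

$t_{exp} = \frac{|\bar{X}_A - \bar{X}_B|}{s_{pool}\sqrt{\frac{1}{n_A} + \frac{1}{n_B}}}$ (2)

$s_{pool} = \sqrt{\frac{(n_A - 1)s_A^2 + (n_B - 1)s_B^2}{n_A + n_B - 2}}$ (3)

If the standard deviations are significantly different, use equation (1) to calculate $t_{exp}$. Calculate the degrees of freedom using equation (4) and round to the nearest integer.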

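$\nu = \frac{\left(\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}\right)^2}{\frac{(s_A^2/n_A)^2}{n_A - 1} + \frac{(s_B^2/n_B)^2}{n_B - 1}}$ (4)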
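Both routes for unpaired data can be sketched with the two students' NaOH data from section D. The code below assumes the standard pooled and unpooled t statistics; for the unpooled degrees of freedom it assumes the Welch-Satterthwaite approximation, which may differ from the form used in the course textbook. The critical value 2.179 (two-tailed, α = 0.05, 12 degrees of freedom) is taken from a standard t table, and the function names are illustrative.

```python
import math
import statistics


def pooled_t(a, b):
    """t_exp and degrees of freedom using the pooled standard deviation."""
    n_a, n_b = len(a), len(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    s_pool = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    diff = abs(statistics.mean(a) - statistics.mean(b))
    return diff / (s_pool * math.sqrt(1 / n_a + 1 / n_b)), n_a + n_b - 2


def unpooled_t(a, b):
    """t_exp and degrees of freedom (Welch-Satterthwaite, assumed) without pooling."""
    n_a, n_b = len(a), len(b)
    term_a = statistics.variance(a) / n_a
    term_b = statistics.variance(b) / n_b
    diff = abs(statistics.mean(a) - statistics.mean(b))
    dof = (term_a + term_b) ** 2 / (term_a ** 2 / (n_a - 1) + term_b ** 2 / (n_b - 1))
    return diff / math.sqrt(term_a + term_b), round(dof)


student_1 = [0.1007, 0.1010, 0.1011, 0.1013, 0.1005, 0.1009, 0.1008]
student_2 = [0.1005, 0.1010, 0.1002, 0.1004, 0.1009, 0.1003, 0.1010]

t_pool, dof_pool = pooled_t(student_1, student_2)
t_unpooled, dof_unpooled = unpooled_t(student_1, student_2)

T_CRIT_12 = 2.179   # assumed table value: two-tailed, alpha = 0.05, 12 degrees of freedom
print(f"pooled:   t_exp = {t_pool:.3f}, df = {dof_pool}")
print(f"unpooled: t_exp = {t_unpooled:.3f}, df = {dof_unpooled}")
print(f"means significantly different (pooled): {t_pool > T_CRIT_12}")
```

Because both samples have the same size, the two t values coincide and only the degrees of freedom differ. An F-test on these two data sets finds no significant difference between the variances, so the pooled result is the one to use here.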

B) Paired data: samples are from different sources

$t_{exp} = \frac{|\bar{d}|\sqrt{n}}{s_d}$

$d_i$: difference between paired data

$s_d$: standard deviation of the differences

$\bar{d}$: average difference
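A paired comparison reduces to a one-sample test on the differences. The sketch below assumes the statistic t_exp = |mean difference|·√n / s_d; the paired results for the two methods are hypothetical, and the critical value 2.776 (two-tailed, α = 0.05, 4 degrees of freedom) is taken from a standard t table.

```python
import math
import statistics


def paired_t(first, second):
    """t statistic computed from the differences between paired results."""
    d = [x - y for x, y in zip(first, second)]
    return abs(statistics.mean(d)) * math.sqrt(len(d)) / statistics.stdev(d)


# Hypothetical results: five different samples, each analyzed by two methods
method_a = [10.2, 12.1, 9.8, 11.5, 10.9]
method_b = [10.0, 11.8, 9.9, 11.1, 10.6]

t_exp = paired_t(method_a, method_b)
T_CRIT = 2.776   # assumed table value: two-tailed, alpha = 0.05, 4 degrees of freedom

print(f"t_exp = {t_exp:.3f}, significantly different: {t_exp > T_CRIT}")
```

Working with the differences removes the sample-to-sample variation, which would otherwise mask a small systematic difference between the two methods.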

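E.5 Detecting Gross Errors: Outlier Test (Q-test)

Should a measurement be rejected?

A) Outlier is the smallest value ($X_1$): $Q_{exp} = \frac{X_2 - X_1}{X_n - X_1}$

B) Outlier is the largest value ($X_n$): $Q_{exp} = \frac{X_n - X_{n-1}}{X_n - X_1}$

Appendix 1D: critical values $Q(\alpha, n)$. The suspect value may be rejected if $Q_{exp} > Q(\alpha, n)$.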
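The Q-test amounts to comparing the gap next to the suspect value with the total spread of the data. The sketch below applies it to Student 1's NaOH data from section D, assuming Dixon's Q statistic in its simplest form (gap divided by range). The critical value 0.568 (α = 0.05, n = 7) is taken from a standard Q table and is an assumption, as is the function name.

```python
def q_experimental(data):
    """Return (Q for the smallest value, Q for the largest value)."""
    x = sorted(data)
    spread = x[-1] - x[0]
    return (x[1] - x[0]) / spread, (x[-1] - x[-2]) / spread


student_1 = [0.1007, 0.1010, 0.1011, 0.1013, 0.1005, 0.1009, 0.1008]

q_low, q_high = q_experimental(student_1)
Q_CRIT = 0.568   # assumed table value: alpha = 0.05, n = 7

print(f"Q (smallest) = {q_low:.2f}, Q (largest) = {q_high:.2f}, Q_crit = {Q_CRIT}")
print(f"reject smallest: {q_low > Q_CRIT}, reject largest: {q_high > Q_CRIT}")
```

Neither extreme value exceeds the critical value, so there is no statistical basis for discarding either one.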

Use caution when the sample is small, such as the three to five determinations you make in the CHE 322 L laboratory course.

"Those who believe that they can discard observations with statistical sanction by using statistical rules for rejection of outliers are simply deluding themselves." J Mandel

F. Detection Limits

F.1 IUPAC Definition

The detection limit is the smallest concentration or absolute amount of analyte that has a signal significantly larger than the signal arising from a reagent blank. (Detectable signal)

This limit is determined by the blank signal / 'background noise' of the method and the sensitivity of the method.

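H0: no analyte in blank

$(S_A)_{DL} = S_{reag} + z\sigma_{reag}$, or $(S_A)_{DL} = S_{reag} + t\,s_{reag}$ when only $s_{reag}$ is available

$\sigma_{reag}$: known standard deviation for the reagent blank's signal

$s_{reag}$: standard deviation determined for a reagent blank's signal

t: for one-tailed analysis

z = 3 ($\alpha$ = 0.00135)

The probability of a type 1 error is 0.135 %, but the probability of a type 2 error is higher.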

F.2 Limit of Identification (LOI)

LOI: the smallest concentration or absolute amount of analyte such that the probabilities of type 1 and type 2 errors are equal.

F.3 Limit of Quantitation (LOQ)

Committee on Environmental Chemistry: LOQ is the smallest concentration or absolute amount of analyte that can be reliably determined. (Quantifiable signal)
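The three limits can be compared side by side as signal thresholds. The sketch below assumes commonly quoted forms that are not spelled out in these notes: detection limit = blank signal + zσ of the blank with z = 3, limit of identification = blank signal + zσ of the blank + zσ of the sample, and limit of quantitation = blank signal + 10σ of the blank. The blank signal and standard deviations are hypothetical, and the function names are illustrative.

```python
import statistics


def detection_limit_signal(s_reag, sigma_reag, z=3.0):
    """Smallest signal distinguishable from the reagent blank (assumed form)."""
    return s_reag + z * sigma_reag


def identification_limit_signal(s_reag, sigma_reag, sigma_samp, z=3.0):
    """Signal at which type 1 and type 2 error probabilities are equal (assumed form)."""
    return s_reag + z * sigma_reag + z * sigma_samp


def quantitation_limit_signal(s_reag, sigma_reag):
    """Smallest signal that can be reliably quantified (assumed 10-sigma form)."""
    return s_reag + 10 * sigma_reag


# Hypothetical reagent blank: mean signal 0.010, standard deviation 0.002
dl = detection_limit_signal(0.010, 0.002)
loi = identification_limit_signal(0.010, 0.002, sigma_samp=0.002)
loq = quantitation_limit_signal(0.010, 0.002)

# One-tailed probability of exceeding z = 3 for a normal distribution
alpha = 1 - statistics.NormalDist().cdf(3.0)

print(f"DL = {dl:.3f}, LOI = {loi:.3f}, LOQ = {loq:.3f}, alpha = {alpha:.5f}")
```

The last line reproduces the α = 0.00135 quoted for z = 3. Converting these signal thresholds into concentrations requires dividing the net signal by the method's sensitivity.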