CHE322_F06CHAPTER4_EAD
CHAPTER 4
EVALUATING ANALYTICAL DATA
- Characterizing a Measurement and Results
- Characterizing Experimental Errors
- Propagation of Uncertainty
- Distribution of Measurements and Results
- Statistical Analysis of Data
- Detection Limits
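A. MEASURES OF CENTRAL TENDENCY AND SPREAD
A.1 ESTIMATORS OF CENTRAL TENDENCY
The mean
The mean, $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$, is an estimator of the central tendency or the true value.
The median
The median is the middle value when the data are ordered from smallest to largest.
For an odd-sized data set, the median is the single middle value.
For an even-sized data set, the median is the average of the two middle values.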
A.2 ESTIMATORS OF VARIABILITY (scatter)
The range
The range (w) is the difference between the largest and the smallest values in a data set: $w = X_{largest} - X_{smallest}$
The standard deviation
Absolute standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n - 1}}$
n - 1 = degrees of freedom = number of independent pieces of information on which a parameter estimate is based.
Relative standard deviation: $RSD = \frac{s}{\bar{X}}$
Percent relative standard deviation: $\%RSD = \frac{s}{\bar{X}} \times 100$
The variance: $s^2$
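The estimators above can be sketched in a few lines of Python. This is only an illustration: the function name `summarize` and the dictionary keys are hypothetical, and the data are Student 1's NaOH molarities from the titration table in section D.

```python
import math

def summarize(values):
    """Return common estimators of central tendency and spread for a data set."""
    n = len(values)
    ordered = sorted(values)
    mean = sum(values) / n
    mid = n // 2
    # Odd n: single middle value; even n: average of the two middle values.
    median = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2
    data_range = ordered[-1] - ordered[0]
    # The sample variance is based on n - 1 degrees of freedom.
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    s = math.sqrt(variance)
    return {
        "n": n,
        "mean": mean,
        "median": median,
        "range": data_range,
        "s": s,
        "rsd": s / mean,
        "percent_rsd": 100 * s / mean,
        "variance": variance,
    }

# Student 1's NaOH molarities (M) from the titration table in section D.
naoh = [0.1007, 0.1010, 0.1011, 0.1013, 0.1005, 0.1009, 0.1008]
stats = summarize(naoh)
for name, value in stats.items():
    print(f"{name:12s} {value:.6g}")
```

For these seven values the mean and the median are both 0.1009 M and s is about 0.00026 M.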
B. CHARACTERIZING EXPERIMENTAL ERRORS
B.1 Accuracy
Absolute error: $E = \bar{X} - \mu$
Percent relative error: $\%E_r = \frac{\bar{X} - \mu}{\mu} \times 100$
Accuracy is determined by
Determinate errors (systematic errors)
- Sampling errors
Nonrepresentative sample
- Method errors
Incorrect value of k (the sensitivity of the method) used
Incorrect reagent blank measurements
Other interferents
- Measurement errors
Instruments (loss of calibration)
Equipment (page 59)
- Personal errors
Example: Dust in molecular absorption spectrophotometry
Constant determinate errors
Can be detected by using different size samples for making a determination
Proportional determinate errors
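A short sketch of why changing the sample size reveals a constant determinate error but not a proportional one. Every number here is hypothetical (a sample containing 20.0 % w/w analyte, a constant error of +0.0010 g of analyte, a proportional error of +1 %), and the function names are illustrative only.

```python
TRUE_FRACTION = 0.200  # hypothetical true mass fraction of analyte in the sample

def percent_relative_error(measured, true):
    """Percent relative error of a measured amount against the true amount."""
    return 100 * (measured - true) / true

def with_constant_error(sample_mass, error=0.0010):
    """Analyte mass found (g) when a fixed amount is added to every result."""
    return TRUE_FRACTION * sample_mass + error

def with_proportional_error(sample_mass, relative_error=0.01):
    """Analyte mass found (g) when the error scales with the amount of analyte."""
    return TRUE_FRACTION * sample_mass * (1 + relative_error)

sample_masses = [0.100, 0.200, 0.500, 1.000]  # g of sample taken
constant_errors = [
    percent_relative_error(with_constant_error(m), TRUE_FRACTION * m)
    for m in sample_masses
]
proportional_errors = [
    percent_relative_error(with_proportional_error(m), TRUE_FRACTION * m)
    for m in sample_masses
]

for m, c, p in zip(sample_masses, constant_errors, proportional_errors):
    print(f"{m:5.3f} g sample: constant error {c:4.2f} %, proportional error {p:4.2f} %")
```

With a constant error the relative error shrinks as the sample grows, so the reported result drifts with sample size. With a proportional error the relative error stays the same, so the result looks equally consistent at every sample size.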
B.2 PRECISION
Measure of the spread of data about a central value.
Repeatability
Spread of data obtained by one analyst using the same solutions and equipment during one period of laboratory work.
Reproducibility
Reproducibility involves variations in analysts, laboratories, equipment, instruments, work periods, etc.
Indeterminate errors
Indeterminate errors are random errors, which affect the precision. These errors cannot be eliminated.
Sources
Sampling process
Sample treatment
Measurement (reading errors, electronic noise, stray light etc…)
Evaluating/ Identifying Sources of Indeterminate errors
Examples
Make several determinations of a single sample/ item.
Obtain measurements of several samples of the same 'composition'.
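C. PROPAGATION OF UNCERTAINTY
Error = difference between a single measurement or result and the true value
The uncertainty is the range of possible values that a measurement or result may have. It includes all errors, determinate and indeterminate.
C.1 Uncertainty on the Result of Additions and/or Subtractions
For $R = A + B - C$, the absolute uncertainties add in quadrature: $s_R = \sqrt{s_A^2 + s_B^2 + s_C^2}$
C.2 Uncertainty on the Result of Multiplication and Divisions
For $R = \frac{A \times B}{C}$, the relative uncertainties add in quadrature: $\frac{s_R}{R} = \sqrt{\left(\frac{s_A}{A}\right)^2 + \left(\frac{s_B}{B}\right)^2 + \left(\frac{s_C}{C}\right)^2}$
C.3 Uncertainty of Mixed Operations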
Examples
4.7
Quiz 4.8
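A mixed operation can be handled one step at a time: absolute uncertainties are combined in quadrature for the addition or subtraction, and relative uncertainties are combined in quadrature for the multiplication or division. The sketch below assumes a signal model $S_{meas} = k C_A + S_{reag}$. The numerical values and the function names are illustrative assumptions, not data from these notes.

```python
import math

def abs_uncertainty_add_sub(*uncertainties):
    """Absolute uncertainty of a sum or difference (quadrature sum of absolute uncertainties)."""
    return math.sqrt(sum(u ** 2 for u in uncertainties))

def rel_uncertainty_mul_div(*value_uncertainty_pairs):
    """Relative uncertainty of a product or quotient (quadrature sum of relative uncertainties)."""
    return math.sqrt(sum((u / v) ** 2 for v, u in value_uncertainty_pairs))

# Illustrative values: C_A = (S_meas - S_reag) / k
s_meas, u_meas = 24.37, 0.02   # measured signal
s_reag, u_reag = 0.96, 0.02    # reagent blank signal
k, u_k = 0.186, 0.003          # sensitivity, ppm^-1

# Step 1: subtraction, so absolute uncertainties are combined.
net_signal = s_meas - s_reag
u_net = abs_uncertainty_add_sub(u_meas, u_reag)

# Step 2: division, so relative uncertainties are combined.
conc = net_signal / k
rel_u_conc = rel_uncertainty_mul_div((net_signal, u_net), (k, u_k))
u_conc = conc * rel_u_conc

print(f"C_A = {conc:.1f} +/- {u_conc:.1f} ppm (relative uncertainty {100 * rel_u_conc:.1f} %)")
```

In this case the uncertainty in k dominates: its relative uncertainty (about 1.6 %) is more than ten times that of the net signal.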
C.4 Uncertainty for other Mathematical Functions
FUNCTION / UNCERTAINTY
Calculations of the propagation of uncertainty are used for the following purposes:
1) to compare the expected uncertainty of an analysis with the actual uncertainty obtained
2) to determine the major and minor contributions to the overall uncertainty
3) to compare two or more methods
4) to develop the best procedure for preparing a sample
D. Distribution of Measurements and Results
Replicate / Measurements / Mean
Replicate 1 / measurement 1, measurement 2, measurement 3 / mean 1
Replicate 2 / measurement 1, measurement 2, measurement 3 / mean 2
Replicate 3 / measurement 1, measurement 2, measurement 3 / mean 3
Replicate 4 / measurement 1, measurement 2, measurement 3 / mean 4
Mean
Presentation of results
Two students determine the concentration of a solution of NaOH by titrating several aliquots of a single stock solution.
Student 1 (Sample 1) / Student 2 (Sample 2)
Aliquot / NaOH (M) / Aliquot / NaOH (M)
1 / 0.1007 / 1 / 0.1005
2 / 0.1010 / 2 / 0.1010
3 / 0.1011 / 3 / 0.1002
4 / 0.1013 / 4 / 0.1004
5 / 0.1005 / 5 / 0.1009
6 / 0.1009 / 6 / 0.1003
7 / 0.1008 / 7 / 0.1010
Mean / 0.1009 / Mean / 0.1006
s / 0.00026 / s / 0.00034
What can you say about the 'True concentration'?
You need to predict
1) the true spread of the population
2) the true central value
Population
Population refers to all members of a system being investigated.
It is an infinite number of data or a universe of data.
Sample
A sample is a finite number of experimental observations/ measurements.
It is a tiny fraction of the population.
A sample is that part of the population that is collected and analyzed. It is a subset of the population.
Analysis of the entire population provides the population's true central value ($\mu$) and spread ($\sigma$)
$P(V) = \frac{M}{N}$: the probability of occurrence of V
V: value of interest
M: frequency of occurrence
N: size of population
In experimental sciences, we seldom sample the whole population. Rather, a sample of the population is analyzed.
From properties of the sample to properties of the population
(How do we extend what we know about the sample to the population?)
Requirement
-Need to make assumptions about the distribution of the population
Distributions of samples of chemical systems display trends of well-defined population distributions.
What are they?
D.3 PROBABILITY DISTRIBUTION
Distribution of a population: Frequency of occurrence versus individual values
Distribution of data where the members of the population can take any value, i.e. continuous distribution.
Example: Use data obtained for the calibration of a 10-mL pipet
Generate a histogram of the data
Calculate the mean and the standard deviation
What is the shape/ trend of the distribution around the 'central value'?
Can you predict the distribution of the population from this sample's distribution?
D.3.1 NORMAL/ GAUSSIAN DISTRIBUTION
Members of the population may take any value
We will first discuss Gaussian statistics of populations; then we will show how these relationships can be modified and applied to small samples of data.
Gaussian Distribution Equation
$f(X) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(X - \mu)^2}{2\sigma^2}\right]$
f(X) versus X
f(X): frequency of occurrence for a value X
Defined by two parameters only:
$\mu$: true mean
$\sigma^2$: true population's variance
Properties of a normal distribution
1) The mean occurs at the point of maximum frequency
2) There is a symmetrical distribution of positive and negative deviations about the maximum
3) There is an exponential decrease in frequency as the magnitude of the deviations increases
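The three properties can be checked numerically with a small sketch of the Gaussian equation. The function name `gaussian` and the choice $\mu$ = 0, $\sigma$ = 1 are assumptions made for illustration.

```python
import math

def gaussian(x, mu=0.0, sigma=1.0):
    """Normal probability density f(X) for true mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

peak = gaussian(0.0)  # frequency at the mean
for deviation in (0.5, 1.0, 2.0, 3.0):
    left = gaussian(-deviation)
    right = gaussian(deviation)
    print(
        f"deviation {deviation:.1f}: f(-d) = {left:.5f}, "
        f"f(+d) = {right:.5f}, f(d)/f(0) = {right / peak:.5f}"
    )
```

The frequency is largest at the mean, equal for positive and negative deviations of the same size, and falls off as exp(-d²/2) with increasing deviation d.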
Universal Gaussian curve
Frequency of deviations from the mean versus deviation from the mean in units of standard deviation ($\sigma$)
When $z = \frac{X - \mu}{\sigma}$, $f(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$
Appendix 1A: z deviations versus fraction of population to the right of z
Area under two limits gives the probability of occurrence between the two limits
Limits / % population
$\mu \pm 1\sigma$ / 68.26
$\mu \pm 2\sigma$ / 95.44
$\mu \pm 3\sigma$ / 99.73
$\mu \pm 4\sigma$ / 99.99
Let us set $\mu$ = 0 and $\sigma$ = 1
Confidence interval
For a single value $X_i$ taken from the population, we can state that: $\mu = X_i \pm z\sigma$
Confidence Intervals for various values of z
The probability of finding $\mu$ within $X_i \pm z\sigma$
z / Confidence Interval (%) / X
0.50 / 38 / 0.50
1.00 / 68.26 / 1.00
1.50 / 86.64 / 1.50
1.96 / 95.00 / 1.96
2.50 / 98.76 / 2.50
3.00 / 99.73 / 3.00
3.50 / 99.95 / 3.50
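The percentages in these tables are areas under the normal curve between -z and +z. They can be reproduced from the error function in Python's standard library. The helper name `percent_within` is hypothetical.

```python
import math

def percent_within(z):
    """Percent of a normal population lying within +/- z standard deviations of the mean."""
    return 100 * math.erf(z / math.sqrt(2))

z_values = (0.5, 1.0, 1.5, 1.96, 2.0, 2.5, 3.0, 3.5, 4.0)
table = {z: percent_within(z) for z in z_values}
for z, pct in table.items():
    print(f"z = {z:4.2f}  ->  {pct:6.2f} % of the population")
```

The computed values agree with the tabulated ones to within rounding of the printed tables (for example, 68.27 % against the tabulated 68.26 % for z = 1).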
D.3.2 What if a mean is obtained from a sample of the population of known standard deviation?
Confidence intervals in the case of a sample of n measurements and a known population $\sigma$: $\mu = \bar{X} \pm \frac{z\sigma}{\sqrt{n}}$
Examples
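As a minimal sketch, the interval for the mean of n measurements with a known population standard deviation is the mean plus or minus z times sigma divided by the square root of n. The function name and all numbers below are hypothetical (a mean of 0.1009 M, a known sigma of 0.0003 M, and z = 1.96 for 95 % confidence).

```python
import math

def z_confidence_interval(mean, sigma, n, z=1.96):
    """Confidence interval for the true mean when the population sigma is known."""
    half_width = z * sigma / math.sqrt(n)
    return mean - half_width, mean + half_width

# Hypothetical values: mean of 0.1009 M, known sigma of 0.0003 M.
for n in (1, 4, 7):
    low, high = z_confidence_interval(0.1009, 0.0003, n)
    print(f"n = {n}: 95 % confidence interval from {low:.5f} to {high:.5f} M")
```

The interval narrows as the square root of n, so quadrupling the number of measurements halves its width.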
D.3.3 PROBABILITY DISTRIBUTIONS FOR SAMPLES
In experimental sciences, we seldom know the parameters of the population.
Therefore we must make assumptions about the population distribution or predict the distribution.
Measurements on a large sample can be used to verify the distribution trend.
Let us do that using data on the calibration of a pipet
Replicate data on the Calibration of a 10-mL Pipet
a) Construct histogram
b) Calculate the mean and the standard deviation
Central limit theorem
The distribution of measurements is normal when all errors are random, independent of each other and of similar magnitude.
Then, the sample mean is a good estimate of the population mean, and the sample variance is a good estimate of the population variance.
Therefore, we can
c) Generate a Gaussian curve using the mean and the standard deviation calculated
Estimating the true mean ($\mu$) and the true standard deviation ($\sigma$)
Analysis of a large number of samples will yield the true mean and standard deviation. When the sample size is 50 (>20), the sample mean and the sample standard deviation approach $\mu$ and $\sigma$, respectively.
Confidence intervals
As we have assumed Gaussian distribution of the population, we can determine a range within which the true mean is expected at a given confidence level.
Can we use z to define intervals?
Recall z was calculated using population parameters
So we use t and s instead of z and $\sigma$: $\mu = \bar{X} \pm \frac{ts}{\sqrt{n}}$
$t \geq z$ at all confidence levels
$\frac{s}{\sqrt{n}}$: Standard error of the mean
s: Sample standard deviation
n - 1: degrees of freedom (df, $\nu$); $\nu$ is the number of independent results used to compute the standard deviation (when n - 1 deviations have been computed, the final one is known)
Appendix 1B lists values of t for various confidence levels and degrees of freedom.
How should t vary with sample size?
When n = 50, t = 2.01 (95 % confidence level)
For the population (n = $\infty$), t = 1.96
Example
Use Pipet data.
What is the 95 % confidence interval for the pipet data?
Mean volume = 9.982 mL
Standard deviation = 0.0056 mL
Number of trials n = 50, $\nu$ = 49
$\mu = 9.982 \pm \frac{(2.01)(0.0056)}{\sqrt{50}} = 9.982 \pm 0.002$ mL
There is a 95 % probability that the pipet's true mean volume is between 9.980 and 9.984 mL.
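The pipet interval can be checked with a short sketch. The function name is hypothetical. The value t = 2.01 is the one quoted in these notes for n = 50 and is passed in by hand rather than computed.

```python
import math

def t_confidence_interval(mean, s, n, t):
    """Confidence interval for the true mean from a sample mean, sample s, and tabulated t."""
    half_width = t * s / math.sqrt(n)
    return mean - half_width, mean + half_width

# Pipet calibration data from the notes; t = 2.01 for 49 degrees of freedom at 95 %.
low, high = t_confidence_interval(mean=9.982, s=0.0056, n=50, t=2.01)
print(f"95 % confidence interval: {low:.3f} to {high:.3f} mL")
```

The half-width is about 0.0016 mL, which gives limits of 9.980 and 9.984 mL when rounded to the precision of the mean.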
E. Statistical Analysis of Data
We can make definite statements only about the probability that the true value lies within a given range.
Q: How do we compare two or more samples of results, two or more analysts' results, results obtained from two or more methods, results obtained over a long period of time, or results from different sources/ subjects?
R: Use statistical tests to determine if the results are significantly different or not at a desired confidence level.
Note that there still remains the probability that the response may be wrong, because our hypothesis is tested statistically.
E.1 SIGNIFICANCE TESTING/ hypothesis testing
Construct probability distribution curves for each sample of measurements
Use figure on page 82
Q: Can the difference between the samples be explained by indeterminate error?
R: One can only determine the probability that the difference is significant
Null hypothesis: assumes that the numerical quantities being compared are equal
E.2 Test of significance for means
Sample mean and population mean
Null hypothesis (H0): the mean of the sample is equal to the mean of the population
Alternative hypothesis (HA): the mean of the sample is not equal to the mean of the population
Choose a confidence level:
95 %: the probability that H0 will be correctly retained
The probability that H0 will be incorrectly rejected (the significance level) is $\alpha$ = 0.05
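Confidence interval
$\mu = \bar{X} \pm \frac{ts}{\sqrt{n}}$, which rearranges to $t_{exp} = \frac{|\mu - \bar{X}|\sqrt{n}}{s}$
Example
A new procedure for the rapid determination of sulfur in kerosenes was tested on a sample known from its method of preparation to contain 0.123 % S ($\mu$). The results were %S = 0.112, 0.118, 0.115, and 0.119. Do the data indicate there is bias in the method?
$\bar{X}$ = 0.116
s = 0.0032
Compute $t_{exp}$ and compare it to the critical t at the desired confidence level
$t_{exp} = \frac{|0.123 - 0.116|\sqrt{4}}{0.0032} \approx 4.4$, while the critical t at 95 % confidence and $\nu$ = 3 is 3.18
Since $t_{exp}$ exceeds the critical t, the null hypothesis must be rejected
The probability of rejecting the null hypothesis incorrectly is 0.05.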
Type 1 error: null hypothesis is incorrectly rejected
Type 2 error: null hypothesis is incorrectly retained
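The sulfur example can be worked through in a few lines. This sketch assumes the usual one-sample t statistic, |mu - mean| times the square root of n divided by s. The critical value 3.18 (two-tailed, 95 % confidence, 3 degrees of freedom) is entered by hand from a t-table, and the names are illustrative.

```python
import math
import statistics

def t_exp_vs_known(values, mu):
    """Experimental t for comparing a sample mean with a known value mu."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 degrees of freedom)
    return abs(mu - mean) * math.sqrt(n) / s

sulfur = [0.112, 0.118, 0.115, 0.119]  # %S results from the example
MU = 0.123                             # %S known from the method of preparation
T_CRIT = 3.18                          # tabulated t, 95 % confidence, 3 degrees of freedom

t_exp = t_exp_vs_known(sulfur, MU)
reject_null = t_exp > T_CRIT
print(f"mean = {statistics.mean(sulfur):.3f}, s = {statistics.stdev(sulfur):.4f}")
print(f"t_exp = {t_exp:.2f}, t_crit = {T_CRIT}, reject H0: {reject_null}")
```

Because t_exp (about 4.4) is larger than 3.18, the difference between 0.116 and 0.123 % S is too large to be explained by indeterminate error at the 95 % confidence level.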
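E.3 Test of significance for standard deviations
A) Are analysis results within statistical control?
Can the difference between the standard deviation of the sample and the population standard deviation be explained by random error?
Null hypothesis: $s^2 = \sigma^2$
F-test: $F_{exp} = \frac{s^2}{\sigma^2}$ or $F_{exp} = \frac{\sigma^2}{s^2}$, whichever gives $F_{exp} \geq 1$
If $F_{exp}$ exceeds the critical F at the chosen confidence level, reject the null hypothesis
B) Are two variances of two samples significantly different?
$F_{exp} = \frac{s_A^2}{s_B^2}$, with the larger variance in the numerator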
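E.4 Comparing Two Experimental Means
A) Unpaired data: samples are from the same source
Compare the means of two sets of identical analyses
$t_{exp} = \frac{|\bar{X}_A - \bar{X}_B|}{\sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}}$ (1)
If the standard deviations are not significantly different, use the pooled standard deviation and equation (2) to calculate t.
$t_{exp} = \frac{|\bar{X}_A - \bar{X}_B|}{s_{pool}\sqrt{\frac{1}{n_A} + \frac{1}{n_B}}}$ (2)
$s_{pool} = \sqrt{\frac{(n_A - 1)s_A^2 + (n_B - 1)s_B^2}{n_A + n_B - 2}}$ (3)
If the standard deviations are significantly different, use equation (1) to calculate $t_{exp}$. Calculate the degrees of freedom using equation (4) and round to the nearest integer.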
(4)
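The two students' NaOH results from section D make a convenient unpaired comparison. The sketch below first applies an F-test to the two variances and then, because the variances turn out not to differ significantly, uses the pooled standard deviation to compute t. It assumes the usual pooled two-sample t statistic. The critical values (F = 5.820 for 6 and 6 degrees of freedom, t = 2.179 for 12 degrees of freedom, both two-tailed at 95 % confidence) are entered by hand from standard tables, and the function names are hypothetical.

```python
import math
import statistics

student1 = [0.1007, 0.1010, 0.1011, 0.1013, 0.1005, 0.1009, 0.1008]
student2 = [0.1005, 0.1010, 0.1002, 0.1004, 0.1009, 0.1003, 0.1010]

def f_ratio(a, b):
    """Ratio of sample variances with the larger variance in the numerator."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    return max(var_a, var_b) / min(var_a, var_b)

def pooled_std(a, b):
    """Pooled standard deviation of two samples."""
    n_a, n_b = len(a), len(b)
    ss = (n_a - 1) * statistics.variance(a) + (n_b - 1) * statistics.variance(b)
    return math.sqrt(ss / (n_a + n_b - 2))

def pooled_t(a, b):
    """Experimental t for two unpaired means, using the pooled standard deviation."""
    diff = abs(statistics.mean(a) - statistics.mean(b))
    return diff / (pooled_std(a, b) * math.sqrt(1 / len(a) + 1 / len(b)))

F_CRIT = 5.820  # tabulated, two-tailed, alpha = 0.05, 6 and 6 degrees of freedom
T_CRIT = 2.179  # tabulated, two-tailed, alpha = 0.05, 12 degrees of freedom

f_exp = f_ratio(student1, student2)
variances_differ = f_exp > F_CRIT
t_exp = pooled_t(student1, student2)
means_differ = t_exp > T_CRIT

print(f"F_exp = {f_exp:.3f} (critical {F_CRIT}): variances differ? {variances_differ}")
print(f"s_pool = {pooled_std(student1, student2):.6f} M")
print(f"t_exp = {t_exp:.3f} (critical {T_CRIT}): means differ? {means_differ}")
```

Neither test statistic exceeds its critical value, so at the 95 % confidence level the difference between the two students' means can be explained by indeterminate error.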
B) Paired data: samples are from different sources
$t_{exp} = \frac{|\bar{d}|\sqrt{n}}{s_d}$
$d_i$: difference between paired data
$s_d$: standard deviation of the differences
$\bar{d}$: average difference
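E.5 Detecting gross errors: Outliers test/ Q-test
Should a measurement be rejected?
A) Outlier is the smallest value ($X_1$): $Q_{exp} = \frac{X_2 - X_1}{X_n - X_1}$
B) Outlier is the largest value ($X_n$): $Q_{exp} = \frac{X_n - X_{n-1}}{X_n - X_1}$
Appendix 1D: critical values of Q. Reject the suspected value only if $Q_{exp}$ exceeds the critical Q.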
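A minimal sketch of the Q-test, applied to Student 1's NaOH results from section D. It assumes the gap-over-range form of Q (the gap between the suspect value and its nearest neighbor, divided by the range). The critical value 0.568 for n = 7 at 95 % confidence is entered by hand from a commonly tabulated set and should be checked against Appendix 1D. The function name is hypothetical.

```python
def q_exp(values):
    """Return (Q for the smallest value, Q for the largest value) of a data set."""
    ordered = sorted(values)
    spread = ordered[-1] - ordered[0]
    q_smallest = (ordered[1] - ordered[0]) / spread    # gap to nearest neighbor / range
    q_largest = (ordered[-1] - ordered[-2]) / spread
    return q_smallest, q_largest

student1 = [0.1007, 0.1010, 0.1011, 0.1013, 0.1005, 0.1009, 0.1008]
Q_CRIT = 0.568  # assumed tabulated value for n = 7 at 95 % confidence

q_smallest, q_largest = q_exp(student1)
print(f"smallest value: Q_exp = {q_smallest:.2f}, reject? {q_smallest > Q_CRIT}")
print(f"largest value:  Q_exp = {q_largest:.2f}, reject? {q_largest > Q_CRIT}")
```

Both Q values are 0.25, well below the critical value, so neither 0.1005 M nor 0.1013 M would be rejected.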
Use caution when the sample is small, such as the three to five determinations you make in the CHE 322 L laboratory course.
"Those who believe that they can discard observations with statistical sanction by using statistical rules for rejection of outliers are simply deluding themselves." J Mandel
F. Detection Limits
F.1 IUPAC Definition
The detection limit is the smallest concentration or absolute amount of analyte that has a signal significantly larger than the signal arising from a reagent blank. (Detectable signal)
This limit is determined by the blank signal / 'background noise' of the method and the sensitivity of the method.
H0: no analyte in blank
$(S_A)_{DL} = S_{reag} + z\sigma_{reag}$
$\sigma_{reag}$: known standard deviation for reagent blank's signal
$(S_A)_{DL} = S_{reag} + t s_{reag}$
$s_{reag}$: standard deviation determined for a reagent blank's signal
t: for one-tailed analysis
z = 3 ($\alpha$ = 0.00135)
The probability of type 1 error is 0.135 %, but the probability of type 2 error is higher.
F.2 Limit of Identification (LOI)
LOI: the smallest concentration or absolute amount of analyte such that the probability of type 1 and type 2 errors are equal.
F.3 Limit of Quantitation (LOQ)
Committee on Environmental Chemistry: LOQ is the smallest concentration or absolute amount of analyte that can be reliably determined. (Quantifiable signal)
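The three limits can be compared with a short sketch. It assumes the commonly used signal expressions: the blank signal plus z times the blank's standard deviation for the detection limit (z = 3), an additional z times the sample's standard deviation for the limit of identification, and the blank signal plus 10 times the blank's standard deviation for the limit of quantitation. The blank signal, the standard deviations, and the function names are hypothetical.

```python
def detection_limit(s_reag, sigma_reag, z=3.0):
    """Signal at the detection limit: blank signal plus z standard deviations of the blank."""
    return s_reag + z * sigma_reag

def limit_of_identification(s_reag, sigma_reag, sigma_samp, z=3.0):
    """Signal at the LOI (assumed form): detection limit plus z standard deviations of the sample."""
    return s_reag + z * sigma_reag + z * sigma_samp

def limit_of_quantitation(s_reag, sigma_reag):
    """Signal at the LOQ (assumed form): blank signal plus 10 standard deviations of the blank."""
    return s_reag + 10 * sigma_reag

# Hypothetical reagent blank: signal 0.96, standard deviation 0.02 (arbitrary signal units).
S_REAG, SIGMA_REAG, SIGMA_SAMP = 0.96, 0.02, 0.02

dl = detection_limit(S_REAG, SIGMA_REAG)
loi = limit_of_identification(S_REAG, SIGMA_REAG, SIGMA_SAMP)
loq = limit_of_quantitation(S_REAG, SIGMA_REAG)
print(f"detection limit signal:         {dl:.2f}")
print(f"limit of identification signal: {loi:.2f}")
print(f"limit of quantitation signal:   {loq:.2f}")
```

Each signal would still have to be converted to a concentration or an amount through the method's sensitivity. The ordering DL < LOI < LOQ reflects the increasing confidence demanded of the result.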