CHAPTER 1, Section 1.3 Revised Feb 2, 2012

If the overall pattern of a large number of observations is quite regular, we choose to describe it by a smooth curve called a density curve.

A density curve has the following properties:

1. It is on or above the horizontal axis.

2. The total area under the curve is 1.0000 or 100.00%.

3. The area under the curve between any two values on the horizontal axis represents the percent or fraction of all observations that fall in that range (the probability of occurrence).

4. Because density curves are continuous distributions, the chance of any exact value occurring is 0; only an interval has a percent or a probability of occurring.
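The four properties can be checked numerically on a simple curve. The sketch below uses a hypothetical density, f(x) = 2x on the interval from 0 to 1. This curve is not from the text; it is chosen only because its areas are easy to verify by hand. Areas are approximated with the midpoint rule, and the names density and area are illustrative.

```python
def density(x):
    """Hypothetical density curve (an assumption for illustration): f(x) = 2x on [0, 1]."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def area(f, a, b, n=10_000):
    """Approximate the area under f between a and b using the midpoint rule."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width


total = area(density, 0.0, 1.0)      # property 2: total area, close to 1
middle = area(density, 0.25, 0.75)   # property 3: fraction of observations between 0.25 and 0.75
point = area(density, 0.5, 0.5)      # property 4: a single value has zero width, hence zero area

print(round(total, 4), round(middle, 4), point)
```

By hand, the area between 0.25 and 0.75 is 0.75² minus 0.25², which is 0.5, so half of all observations described by this curve fall in that range.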

The median of a density curve is the equal-areas (equal-counts) point: half of the area under the curve lies to its left and half to its right.

The mean of a density curve is the balance point.

For a left-skewed density curve, the mean is less than the median.

For a symmetric density curve, the mean equals the median.

For a right-skewed density curve, the mean is greater than the median.
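The relationship between mean and median can be illustrated numerically. The sketch below assumes a hypothetical right-skewed density, f(x) = 2(1 − x) on the interval from 0 to 1 (not from the text). It finds the mean as the balance point and the median as the equal-areas point. The helper names are illustrative.

```python
def density(x):
    """Hypothetical right-skewed density (an assumption for illustration): f(x) = 2(1 - x) on [0, 1]."""
    return 2.0 * (1.0 - x) if 0.0 <= x <= 1.0 else 0.0


def area(f, a, b, n=10_000):
    """Approximate the area under f between a and b using the midpoint rule."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width


# Mean: the balance point, computed as the area under x * f(x).
mean = area(lambda x: x * density(x), 0.0, 1.0)

# Median: the equal-areas point, found by bisection so that half the area lies to its left.
low, high = 0.0, 1.0
for _ in range(40):
    mid = (low + high) / 2
    if area(density, 0.0, mid) < 0.5:
        low = mid
    else:
        high = mid
median = (low + high) / 2

print(f"mean = {mean:.4f}, median = {median:.4f}")
```

For this curve the mean works out to 1/3 and the median to about 0.29, so the long right tail pulls the mean above the median.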

A density curve is an idealized model for a distribution of data. It is often used to describe the entire population of interest, and in this context the mean of the population is designated as µ and the standard deviation as σ. When we take actual observations (generally a sample), we distinguish the mean of the sample observations as x̄ and the standard deviation as s.

Normal Distributions are a particularly important class of density curves. These density curves are symmetric, unimodal, and bell-shaped.

They have the following properties:

· They are all symmetric

· Their mean is equal to their median

· The standard deviation, σ, controls the spread of a normal curve. We can actually locate σ by eye on a normal curve: it is the distance from the mean to the point on the horizontal scale directly under either inflection point of the curve.

· Changing the mean, µ, without changing the standard deviation, σ, shifts the normal curve along the horizontal axis without changing the spread.

· Changing σ without changing µ changes only the spread of the normal distribution.

· The Normal density curve can be fully described by giving its mean, µ, and standard deviation, σ. The values µ and σ are parameters of the curve.

The standard notation is X ~ N(µ, σ). Given that X has a normal distribution with mean µ = 5 and standard deviation σ = 0.2, we write it as follows:

X ~ N(5, 0.2)

Common properties of Normal density curves:

The 68-95-99.7 Rule:

In the normal distribution with mean µ and standard deviation σ:

· Approximately 68% of the observations fall within 1σ of the mean µ

· Approximately 95% of the observations fall within 2σ of µ

· Approximately 99.7% of the observations fall within 3σ of µ
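The percentages in the rule are rounded. As a rough check, the sketch below computes the areas within 1, 2, and 3 standard deviations of the mean using NormalDist from Python's standard library. The choice of Python is an assumption of this illustration, not part of the course materials.

```python
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)  # any normal curve gives the same areas in units of sigma

# Area between -k and +k standard deviations, for k = 1, 2, 3.
within = {k: standard.cdf(k) - standard.cdf(-k) for k in (1, 2, 3)}

for k, fraction in within.items():
    print(f"within {k} standard deviation(s) of the mean: {fraction:.4f}")
```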

Example: Checking account balances, X, are approximately normal with a mean of 1325 and a standard deviation of 25.

1. What is the notation for this distribution?

X ~ N(1325, 25)

2. Between what numbers do 68% of the balances fall?

1300 and 1350

3. Above what number do 2.5% of the balances lie?

1375

4. Approximately what % of balances are between 1250 and 1400?

99.7%
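The answers above come straight from the 68-95-99.7 rule. The sketch below rebuilds the cutoffs from µ and σ and compares the rule's rounded percentages with the areas computed by Python's NormalDist. The variable names are illustrative.

```python
from statistics import NormalDist

balances = NormalDist(mu=1325, sigma=25)
mu, sigma = balances.mean, balances.stdev

# Question 2: 68% of balances lie within 1 sigma of the mean.
middle_68 = (mu - sigma, mu + sigma)

# Question 3: 95% lie within 2 sigma, so (100 - 95) / 2 = 2.5% lie above mu + 2 sigma.
top_cutoff = mu + 2 * sigma
exact_above = 1 - balances.cdf(top_cutoff)

# Question 4: 1250 and 1400 are mu - 3 sigma and mu + 3 sigma.
exact_between = balances.cdf(1400) - balances.cdf(1250)

print(middle_68, top_cutoff)
print(f"area above cutoff = {exact_above:.4f}, area between = {exact_between:.4f}")
```

The computed upper-tail area is a little under 2.5% because the rule rounds the area within 2σ to 95%.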

What if you need different probabilities for X ~ N(µ, σ)?

1. We use the Standard Normal Distribution, a normal distribution with mean µ = 0 and standard deviation σ = 1, written as N(0,1).

2. We can use the fact that all normal distributions are the same if we express the location of any point on the horizontal scale in terms of the center, µ, plus or minus a certain number of units of σ.

3. We can convert any normal distribution to a Standard Normal Distribution by the formula listed below. If we do this we can use the Standard Normal Table (Table A in the front cover of your book) for any variable which can be described as a Normal Distribution.

You convert X ~ N(µ, σ) to Z ~ N(0,1). Convert/standardize using:

z = (x − µ) / σ

A standardized value is often called a z-score. The z-score effectively describes how many standard deviations any x is from the X distribution mean, and in what direction.
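Standardizing subtracts the mean and divides by the standard deviation, z = (x − µ) / σ. A minimal sketch, using the checking account distribution N(1325, 25) from the earlier example (the function name z_score is illustrative):

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu (negative means below the mean)."""
    return (x - mu) / sigma


z_high = z_score(1375, mu=1325, sigma=25)  # two standard deviations above the mean
z_low = z_score(1300, mu=1325, sigma=25)   # one standard deviation below the mean

print(z_high, z_low)
```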

Z-scores are what you need in order to use the Standard Normal Table (Table A in the front cover of your book). In the table:

· Z-scores run down the left-most column of the tables. The 2nd decimal place of the z-score runs across the top-most row of the tables.

· The inner numbers are the probability that you are at or lower than your z-score.

· The first page of Table A has negative z-scores, the second page has positive z-scores.

· P(Z=z-score) = 0. Only intervals have probabilities.

To find a probability if you have X ~ N(µ, σ) and a sample score, x, to work with:

1. Convert x to z-score.

2. Rearrange (if necessary) the inequality so that it uses < or ≤. This uses:

P(Z>z-score) = 1 – P(Z<z-score).

3. Look up the probability for your z-score on Table A.

4. If z-score is between 2 table points, use the closest value.

5. P(a<Z<b) = P(Z<b) – P(Z<a).
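The lookup steps above can be sketched in code. The function table_a below is a hypothetical stand-in for the printed table, not the table itself. It rounds z to two decimal places and returns the area to the left to four places, computed from the error function in Python's math module.

```python
from math import erf, sqrt


def table_a(z):
    """Stand-in for a printed standard normal table: P(Z < z) to 4 places, with z rounded to 2 places."""
    z = round(z, 2)  # step 4: use the closest tabulated z-score
    return round(0.5 * (1 + erf(z / sqrt(2))), 4)


p_below = table_a(1.0)                     # step 3: P(Z < 1.00)
p_above = 1 - table_a(1.0)                 # step 2: P(Z > 1.00) = 1 - P(Z < 1.00)
p_between = table_a(1.0) - table_a(-1.0)   # step 5: P(-1.00 < Z < 1.00)

print(p_below, round(p_above, 4), round(p_between, 4))
```

The value for the interval from −1 to 1 agrees with the 68% in the 68-95-99.7 rule.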

If you are given the probability and know X ~ N(µ, σ), but don’t know the sample’s score, you will need to work the problem backwards. Find the appropriate z-score and convert it to x with x = µ + zσ. See part f in the example below.
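Working backwards can be sketched with the inverse of the table lookup. The example below assumes Python's NormalDist.inv_cdf in place of reading Table A in reverse. The function name x_from_area_below is illustrative, and the 0.975 area is a hypothetical input.

```python
from statistics import NormalDist


def x_from_area_below(p, mu, sigma):
    """Return the x with area p to its left under N(mu, sigma), via x = mu + z * sigma."""
    z = NormalDist().inv_cdf(p)  # z-score with area p below it on the standard normal curve
    return mu + z * sigma


# Balance with 97.5% of accounts below it, for X ~ N(1325, 25).
x = x_from_area_below(0.975, mu=1325, sigma=25)
print(round(x, 2))
```

When a problem gives an upper-tail area such as "the top 10%", convert it to the area below first (1 − 0.10 = 0.90), because the table and inv_cdf both work with the area to the left.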

Examples:

1. Checking account balances are ~N(1325,25).

a. Bill has a balance of $1270. What is Bill’s standardized balance (his z-score)?

b. What is the probability an account will have less money than Bill’s?

c. What is the probability an account balance will be more than $1380?

d. What is the probability an account balance will be exactly $1380?

e. What is the probability that an account will have between $1310 and $1390?

f. What account balance would be the beginning of the top 10% of all balances?
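One way to check hand-worked answers to parts a through f is sketched below. It assumes Python's NormalDist in place of Table A, so results can differ from table answers in the last decimal place. The variable names are illustrative.

```python
from statistics import NormalDist

balances = NormalDist(mu=1325, sigma=25)

z_bill = (1270 - 1325) / 25                           # a. z = (x - mu) / sigma
p_less_than_bill = balances.cdf(1270)                 # b. P(X < 1270)
p_more_than_1380 = 1 - balances.cdf(1380)             # c. P(X > 1380) = 1 - P(X < 1380)
p_exactly_1380 = 0.0                                  # d. an exact value has probability 0
p_between = balances.cdf(1390) - balances.cdf(1310)   # e. P(1310 < X < 1390)
top_10_cutoff = balances.inv_cdf(0.90)                # f. 90% of balances lie below this value

print(f"a. {z_bill:.2f}  b. {p_less_than_bill:.4f}  c. {p_more_than_1380:.4f}")
print(f"d. {p_exactly_1380}  e. {p_between:.4f}  f. {top_10_cutoff:.2f}")
```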

2. The Beanstalks Club is a social club for tall people. To join the Beanstalks, women must be at least 70” tall and men must be at least 74” tall. The National Health Survey reports that:

Height of adult women in the U.S. ~ N(63.6, 2.5) and

Height of adult men in the U.S. ~ N(69, 2.8)

a. What fraction of the adult female population of the U.S. could qualify for membership in the Beanstalks?

b. What fraction of the adult male population of the U.S. could qualify for membership in the Beanstalks?

c. If the Beanstalks Club wanted to be more exclusive for males, above what height would only 1% of males qualify?
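A sketch for checking answers to this example, again assuming Python's NormalDist in place of Table A. All three parts concern the upper tail, so each area is found as 1 minus the area below.

```python
from statistics import NormalDist

women = NormalDist(mu=63.6, sigma=2.5)  # heights in inches
men = NormalDist(mu=69, sigma=2.8)

frac_women = 1 - women.cdf(70)           # a. P(height >= 70) for women
frac_men = 1 - men.cdf(74)               # b. P(height >= 74) for men
height_top_1_percent = men.inv_cdf(0.99) # c. 99% of men are shorter than this height

print(f"a. {frac_women:.4f}  b. {frac_men:.4f}  c. {height_top_1_percent:.2f} inches")
```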

3. A physical fitness association is including the mile run in its secondary school fitness test for boys. The time for this event for boys in secondary school is ~ N(450, 40), measured in seconds.

a. What fraction of the secondary school boys would run the mile in less than 7 minutes (less than 420 seconds)?

b. What fraction of the secondary school boys run the mile in 7 to 8 minutes (420 to 480 seconds)?

c. If the association wants to designate the fastest 10% as “excellent”, what time should the association set for this criterion?
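A sketch for checking answers to this example, assuming Python's NormalDist in place of Table A. The point to watch in part c is that the fastest runners have the smallest times, so the "fastest 10%" is the lower tail of the distribution.

```python
from statistics import NormalDist

mile_times = NormalDist(mu=450, sigma=40)  # times in seconds

frac_under_7_min = mile_times.cdf(420)                        # a. P(X < 420)
frac_7_to_8_min = mile_times.cdf(480) - mile_times.cdf(420)   # b. P(420 < X < 480)
excellent_cutoff = mile_times.inv_cdf(0.10)                   # c. time with 10% of boys below it

print(f"a. {frac_under_7_min:.4f}  b. {frac_7_to_8_min:.4f}  c. {excellent_cutoff:.1f} seconds")
```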

How do you determine if your data is normally distributed?

Two methods:

1. Graph the data with a histogram or a stemplot and determine whether the data are unimodal, symmetric, and approximately normally distributed (use the 68%-95%-99.7% rule to check).

2. Generate a normal quantile plot.

A normal quantile plot is constructed using the following steps:

· Arrange the data, each value of x, from smallest to largest.

· Record what percentile of the data each value represents.

· Convert that percentile to an expected z-score.

· Convert the z-score to an expected value.

· Plot each observed x value against its expected value.

Example: Bob’s last 20 golf scores,

69 73 77 77 80 76 75 77 78 78 77 81 82 75 79 76 83 77 80 84

Put the data in ascending order:

69 73 75 75 76 76 77 77 77 77 77 78 78 79 80 80 81 82 83 84
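The steps for a normal quantile plot can be sketched for Bob's scores as follows. Two choices here are assumptions of this illustration: the percentile of the i-th smallest of n values is taken as (i − 0.5) / n, and the expected value is computed as x̄ + z·s. Statistical software may use a slightly different percentile formula, so its plot can differ a little from these numbers.

```python
from statistics import NormalDist, mean, stdev

scores = [69, 73, 77, 77, 80, 76, 75, 77, 78, 78,
          77, 81, 82, 75, 79, 76, 83, 77, 80, 84]

ordered = sorted(scores)                          # step 1: smallest to largest
n = len(ordered)
center, spread = mean(ordered), stdev(ordered)    # sample mean and sample standard deviation s

points = []
for i, x in enumerate(ordered, start=1):
    percentile = (i - 0.5) / n                    # step 2: assumed percentile formula
    z = NormalDist().inv_cdf(percentile)          # step 3: expected z-score
    expected = center + z * spread                # step 4: expected value
    points.append((x, round(z, 2), round(expected, 1)))

for x, z, expected in points:                     # step 5: the pairs to plot
    print(x, z, expected)
```

Plotting the observed value against the expected value (or against the expected z-score) for each pair gives the normal quantile plot.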

If the points on a normal quantile plot lie close to a straight line, the plot indicates that the data are normal. Systematic deviations from a straight line indicate a non-normal distribution. Outliers appear as points that are far away from the overall pattern of the plot. A Q-Q Plot shows observed and expected values, as above. A P-P plot shows observed and expected cumulative probabilities.

For SPSS: Enter your variable data. Then select Analyze > Descriptive Statistics > Q-Q Plots, move the variable name to the Variables box, and select OK.

Lecture 3, Section 1.3
