Ch 2.1 Numerical Summary Measures of Center for Data

------

Topics:

  • Measures of Center: mean, median

(Note: Here we cover the summary measures for DATA only. We will cover the measures for DISTRIBUTIONS in the 4th and 5th weeks, after we introduce the concept of a probability distribution.)

------

I: Measures of Center for Data:

(1) Mean

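  • The mean of $n$ observations $x_1, x_2, \ldots, x_n$ is $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}$ (here the subscript $i$ only labels the observation; it does not contain information on the magnitude of a data point)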

Ex. Sue wanted to study the systolic blood pressure (BP), x, of the NCSU freshmen; 7 freshmen were randomly selected and their BP values are

121, 110, 114, 100, 103, 130, 130 (Note: $n = 7$)

The sample mean of BP is

$\bar{x} = \frac{121 + 110 + 114 + 100 + 103 + 130 + 130}{7} = \frac{808}{7} \approx 115.4$

We use one more digit in the mean than in the original data.

Note: The R function to calculate the mean is mean()

> bp <- c(121, 110, 114, 100, 103, 130, 130)

> mean(bp)
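
As a quick check, the mean can also be computed directly from its definition, the sum of the observations divided by their count. The sketch below uses the BP data from the example; the variable name manual_mean is introduced here only for illustration.

```r
# BP values from the example
bp <- c(121, 110, 114, 100, 103, 130, 130)

# Mean by definition: sum of the observations divided by n
n <- length(bp)
manual_mean <- sum(bp) / n

manual_mean                       # 808 / 7
round(manual_mean, 1)             # one more digit than the data: 115.4
all.equal(manual_mean, mean(bp))  # TRUE: agrees with the built-in mean()
```
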
(2) Median

  • The median is the middle value of the data: the same number of data points lie above it as below it.
  • To get the median of the $n$ observations in the sample:

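(1) Sort the data from the smallest to the largest.

(2) If $n$ is odd, then the median = the middle value, i.e.,

median = the $\frac{n+1}{2}$th data point of the sorted data

If $n$ is even, then the median = the average of the middle two values, i.e.,

median = $\frac{1}{2}$ [ the $\frac{n}{2}$th data point + the $\left(\frac{n}{2} + 1\right)$th data point ] of the sorted data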

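Ex. In the BP example, there are 7 observations: 121, 110, 114, 100, 103, 130, 130. The sample median is:

Sorted data is: 100 103 110 114 121 130 130. So $n = 7$ is odd and the median = the $\frac{7+1}{2}$ = 4th data point = 114.

Ex. In the BP example, suppose there is one more data point, 105. Then the sample median becomes:

The sorted data becomes: 100 103 105 110 114 121 130 130. So $n = 8$ is even and the median = $\frac{1}{2}$ (4th data point + 5th data point) = $\frac{110 + 114}{2}$ = 112.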

Note: The R function to calculate the median is median()

> median(bp)
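
The sort-then-pick-the-middle procedure can be sketched as a small R function and compared with the built-in median(). The helper name manual_median and the vector name bp8 are hypothetical, introduced only for this illustration.

```r
# Hypothetical helper (not a base R function): median via the sorting steps
manual_median <- function(x) {
  s <- sort(x)                       # step (1): sort from smallest to largest
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]                   # n odd: the middle value
  } else {
    (s[n / 2] + s[n / 2 + 1]) / 2    # n even: average of the two middle values
  }
}

bp  <- c(121, 110, 114, 100, 103, 130, 130)
bp8 <- c(bp, 105)                    # the same data with the extra point 105

manual_median(bp)                    # 114
manual_median(bp8)                   # 112
manual_median(bp)  == median(bp)     # TRUE
manual_median(bp8) == median(bp8)    # TRUE
```
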
Comment (1) : Mean vs. Median

  1. Mean is sensitive to outliers (extreme values), while median is less affected by outliers.

Ex. Data set 1: {1, 2, 3}. Data set 2: {1, 2, 99}.

The mean of data set 1: 2

The median of data set 1: 2

The mean of data set 2: 34

The median of data set 2: 2
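
The same comparison can be reproduced in R. This is a minimal sketch; the vector names set1 and set2 are chosen here for illustration.

```r
set1 <- c(1, 2, 3)
set2 <- c(1, 2, 99)   # same as set1 except for one extreme value

mean(set1)            # 2
median(set1)          # 2
mean(set2)            # 34: pulled upward by the outlier 99
median(set2)          # 2: unchanged by the outlier
```
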

  2. The mean is the balance point of the data.

A balance point is the point such that

sum of the distances from the mean of the points above the mean = sum of the distances from the mean of the points below the mean

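Ex. A sample consists of 5 data points: 1, 2, 3, 10, 14.

The mean $\bar{x}$ = (1 + 2 + 3 + 10 + 14)/5 = 6

Data points above the mean / Data points below the mean
10, 14 / 1, 2, 3
Total distance to $\bar{x}$ = 4 + 8 = 12 / Total distance to $\bar{x}$ = 5 + 4 + 3 = 12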
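
The balance-point property can be checked numerically for this sample. The sketch below is only an illustration; the names above and below are introduced here.

```r
x <- c(1, 2, 3, 10, 14)
m <- mean(x)                   # 6

above <- sum(x[x > m] - m)     # distances of 10 and 14 from the mean: 4 + 8
below <- sum(m - x[x < m])     # distances of 1, 2 and 3 from the mean: 5 + 4 + 3

c(above = above, below = below)   # both 12
sum(x - m)                        # signed deviations from the mean sum to 0
```

Equivalently, the signed deviations from the mean always add up to zero, which is why the two totals match.
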

  3. The median is the midpoint of the distribution.

That is, half of the data points are above the median and half are below it.

  4. The relationship between the mean and the median depends on the shape of the distribution:
     • For a symmetric distribution, mean = median
     • For a positively skewed distribution, mean > median
     • For a negatively skewed distribution, mean < median

In other words, from the relationship between the mean and the median, we can guess the shape of the distribution.
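
A small illustration of this rule of thumb, using three tiny data sets chosen here only as examples: right_skewed is the five-point sample 1, 2, 3, 10, 14, left_skewed is its mirror image, and symmetric is evenly spaced. With real data the relationship is a guide to the shape, not a guarantee.

```r
symmetric    <- c(1, 2, 3, 4, 5)
right_skewed <- c(1, 2, 3, 10, 14)     # long tail toward large values
left_skewed  <- 15 - right_skewed      # mirror image: 14, 13, 12, 5, 1

# Mean and median of each data set, side by side
sapply(
  list(symmetric = symmetric, right_skewed = right_skewed, left_skewed = left_skewed),
  function(x) c(mean = mean(x), median = median(x))
)
# symmetric:    mean 3, median 3   (mean = median)
# right_skewed: mean 6, median 3   (mean > median)
# left_skewed:  mean 9, median 12  (mean < median)
```
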

Comment (2) : Change of Unit

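Mean and median are expressed in the same unit as the data, so their values change when the unit of measurement changes.

Original data: $x_1, x_2, \ldots, x_n$; transformation: $y = ax + b$.

New data: $y_1 = ax_1 + b,\ y_2 = ax_2 + b,\ \ldots,\ y_n = ax_n + b$

When the unit of measure changes from $x$ to $y$, then

The new mean: $\bar{y} = a\bar{x} + b$

The new median: (median of $y$) = $a \cdot$ (median of $x$) + $b$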
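
A short derivation of the rule for the mean, written with $x_1, \ldots, x_n$ for the original data, $y_i = ax_i + b$ for the new data, and $\bar{x}$, $\bar{y}$ for the two means (notation assumed here):

```latex
\bar{y}
  = \frac{1}{n}\sum_{i=1}^{n} y_i
  = \frac{1}{n}\sum_{i=1}^{n} (a x_i + b)
  = a \cdot \frac{1}{n}\sum_{i=1}^{n} x_i + \frac{n b}{n}
  = a\bar{x} + b
```

The median follows the same rule for a different reason: the map $y = ax + b$ with $a \neq 0$ either preserves the sorted order of the data ($a > 0$) or reverses it ($a < 0$), so the middle value (or the two middle values) of the $x$ data is carried to the middle of the $y$ data.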

Ex. The temperatures in Raleigh in the next 6 days are predicted to be 43, 39, 33, 39, 45 and 48 in Fahrenheit. What are the mean and median of these temperatures? What are the mean and median if we switch to Centigrade? Note that C = (5/9)(F - 32).

Mean = 247/6 = 41.2 (F)

Median = (39 + 43)/2 = 41 (F)

New mean in C = (41.17 - 32)*5/9 = 5.1 (C)

New median in C = (41 - 32)*5/9 = 5.0 (C)
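
The unit change can be checked in R by converting every temperature first and then summarizing. This is a sketch using the standard conversion C = (5/9)(F - 32), which has the form y = ax + b with a = 5/9 and b = -160/9; the helper name f_to_c is hypothetical.

```r
temp_f <- c(43, 39, 33, 39, 45, 48)

# Hypothetical helper: Fahrenheit to Centigrade, C = (5/9) * (F - 32)
f_to_c <- function(f) (5 / 9) * (f - 32)

temp_c <- f_to_c(temp_f)

mean(temp_f)      # 247 / 6, about 41.2 F
median(temp_f)    # 41 F
mean(temp_c)      # about 5.1 C
median(temp_c)    # 5 C

# Converting the summaries gives the same result as summarizing the converted data
all.equal(mean(temp_c),   f_to_c(mean(temp_f)))     # TRUE
all.equal(median(temp_c), f_to_c(median(temp_f)))   # TRUE
```
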
