Handling Data – finding the median of large data sets and from frequency tables

Finding the median of large data sets: E.g. find the median of the 35 values in the box below

The median is defined as the middle value when the data are arranged in order, either ascending or descending. It follows that only about half of the values in the data set need to be arranged in order to determine the median: once the median value has been reached, continuing to arrange the rest of the data is a waste of time.

In a data set consisting of an odd number of values the median will always be one of those values.

To find which value of the data set is the median:

(35 + 1) ÷ 2. As a formula: (n + 1) ÷ 2, where n = number of values in the data set.

From this we have: (35 + 1) ÷ 2 = 18. This means that the median is the 18th value of the data set when it is arranged in order:

1, 1, 1, 1, 5, 5, 5, 7, 7, 8, 8, 8, 9, 11, 11, 12, 13, 13

18th value: there is no need to arrange any more of the data set in order. The median is 13.
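The position rule, and the idea of ordering the data only as far as the median, can be sketched in Python. This is an illustration under stated assumptions rather than a prescribed method: the function names `median_position` and `median_odd` are made up for this sketch, and the nine sample values are invented for the demonstration (they are not the 35 values from the box). The standard-library call `heapq.nsmallest(k, data)` returns the k smallest values in ascending order, so nothing beyond the median is arranged.

```python
import heapq

def median_position(n):
    """Return the position of the median, (n + 1) / 2, counting from 1."""
    return (n + 1) / 2

def median_odd(values):
    """Median of an odd-sized data set, ordering only up to the middle value."""
    n = len(values)
    if n % 2 == 0:
        raise ValueError("this sketch only handles an odd number of values")
    k = (n + 1) // 2                          # e.g. n = 35 gives k = 18
    smallest_k = heapq.nsmallest(k, values)   # the k smallest values, ascending
    return smallest_k[-1]                     # the k-th value is the median

# Illustrative data only: 9 values, so the median is the 5th value in order.
sample = [7, 1, 13, 5, 8, 1, 11, 9, 5]
print(median_position(35))   # 18.0
print(median_odd(sample))    # 7
```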

The formula also works if there is an even number of values in a data set, but this time it will not identify one of the values. Instead it will give the position midway between the two values that must be used to find the median:

E.g. if there are 26 values in a data set, using the formula would give the 13.5th value as the median. To find the median, the 13th and 14th values are added together and then divided by 2. Note: finding the median of a data set containing an even number of values requires finding the mean of the middle two values.
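The odd and even cases can be combined in one short Python sketch. The function name `median` and the small sample lists are assumptions made for illustration; for simplicity the sketch sorts the whole list rather than stopping at the middle.

```python
def median(values):
    """Median of any non-empty data set, odd or even in size."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2                  # e.g. n = 26 gives 13.5
    if position.is_integer():
        return ordered[int(position) - 1]   # odd n: a single middle value
    lower = ordered[int(position) - 1]      # e.g. the 13th value
    upper = ordered[int(position)]          # e.g. the 14th value
    return (lower + upper) / 2              # mean of the middle two values

print((26 + 1) / 2)              # 13.5
print(median([3, 9, 1, 7]))      # 5.0
print(median([3, 9, 1, 7, 4]))   # 4
```

Positions are counted from 1 in the text but from 0 in a Python list, which is why 1 is subtracted when looking up a value.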

Finding the median of a set of data contained in a frequency table: E.g. find the median number of bedrooms per house from the table below

Houses sold during March by Bloggs Estate Agents

Number of bedrooms per house / 1 / 2 / 3 / 4 / 5
Number of houses (frequency) / 3 / 8 / 9 / 4 / 1

A frequency table shows how often something occurs – from the table above it can be seen that 8 two-bedroom houses were sold. Frequency tables, like bar charts, put the data in order – this data could be shown as: 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, etc.

However, writing the values down in this way would not be an ideal way of presenting the data, so a frequency table is used – a bar chart could also have been used.
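As a rough illustration of how the table and the written-out list hold the same information, the Python sketch below expands the frequency table back into an ordered list. The variable names `frequencies` and `expanded` are chosen for this sketch only.

```python
# Frequency table from the example: bedrooms per house -> number of houses sold.
frequencies = {1: 3, 2: 8, 3: 9, 4: 4, 5: 1}

# Repeat each value as many times as its frequency, in ascending order of value.
expanded = [bedrooms
            for bedrooms, count in sorted(frequencies.items())
            for _ in range(count)]

print(expanded[:14])   # [1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3]
```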

To find the median from a frequency table, first add up all the frequencies; in this case we need to know the total number of houses sold. This tells us how big the data set is: 3 + 8 + 9 + 4 + 1 = 25, so 25 houses were sold in total.

Next, apply the formula to find where the median value of a data set containing 25 values will occur: (25 + 1) ÷ 2 = 13. Therefore the 13th value will be the median value.

Next, make a cumulative count of the frequencies, starting with the number of one-bedroom houses, to find where the 13th value occurs: 3 + 8 = 11 and 11 + 9 = 20. This means that the 13th value must occur within the cell containing the number 9. This frequency refers to three-bedroom houses, so the median number of bedrooms per house is 3. If the values had been written out in ascending order, the 13th value would be the second of the ‘3s’. This can be checked against the ordered list of values begun in the first paragraph below the table.
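The three steps (total the frequencies, apply the formula, then count cumulatively) can be sketched in Python as follows. The function name `median_from_frequency_table` is made up for this sketch, and it assumes an odd total frequency, as in the example; `itertools.accumulate` from the standard library produces the running totals.

```python
from itertools import accumulate

def median_from_frequency_table(values, frequencies):
    """Median of a frequency table with an odd total frequency.

    `values` must be in ascending order, with frequencies[i] the count of values[i].
    """
    total = sum(frequencies)                   # 3 + 8 + 9 + 4 + 1 = 25
    if total % 2 == 0:
        raise ValueError("this sketch only handles an odd total frequency")
    position = (total + 1) // 2                # (25 + 1) / 2 = 13
    running_totals = accumulate(frequencies)   # 3, 11, 20, 24, 25
    for value, running_total in zip(values, running_totals):
        if running_total >= position:          # first running total to reach the 13th value
            return value

bedrooms = [1, 2, 3, 4, 5]
houses = [3, 8, 9, 4, 1]
print(median_from_frequency_table(bedrooms, houses))   # 3
```

The loop stops at the first running total that reaches position 13, which is 20, the total that includes the three-bedroom houses.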