Data Concepts Explained

  1. Data Set:

Schools have a wealth of data available to them. It is important when analysing data to know which group of children (data set) you wish to analyse, what their defining features are and why you are analysing them.

Your whole school will be the complete data set. There are a variety of sub-sets that range in size – gender, Key Stage, class, SEND, behaviour, ability etc. When setting up your project you need to establish:

  1. The data set(s) you need to collect.
  2. The key questions you need the data to answer at the outset, at the interim stage and at the end.
  3. Whether you need a comparative data set to evaluate against.

  1. Sampling

Sometimes data sets are too large to track purposefully. Where this is the case, statisticians often use sampling to reduce the work while still giving reasonably accurate outcomes.

  1. What is Simple Random Sampling?

A simple random sample is a subset of a statistical population in which each member of the population has an equal probability of being chosen.

An example of a simple random sample (10%)

If a primary school had 400 pupils and the names of 40 children were drawn at random, as in a bingo draw, this would provide a simple random sample. In this case, the statistical population (or whole data set) is all 400 pupils, and the sample is random because each pupil has an equal chance of being chosen.
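
The draw described above can be sketched in Python. This is a minimal illustration rather than a recommended tool: the pupil names are placeholders, and the fixed seed is an assumption added only so that the same draw can be repeated.

```python
import random

# Placeholder register of 400 pupils (the names are hypothetical).
pupils = [f"Pupil {n}" for n in range(1, 401)]

# A seeded generator makes the draw repeatable; leave the seed out for a fresh draw.
rng = random.Random(2024)

# Draw 40 names without replacement: every pupil has an equal chance of
# being chosen and nobody can be chosen twice.
sample = rng.sample(pupils, k=40)

print(len(sample))  # 40, i.e. 10% of the 400 pupils
```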

Simple Random Sample advantages

• Ease of use is the biggest advantage of simple random sampling. There is no need to divide the population into sub-populations.

• It is meant to be an unbiased representation of a group. It is fair, since every member of the population has an equal chance of getting selected.

Simple Random Sample disadvantage

A sampling error can occur. For example, in our simple random sample of 40 pupils, it would be possible, however unlikely, to draw all 40 children from Year 1. This can create an unwanted bias. If the researcher knows more about the population, it would be better to use a different sampling technique, such as ‘stratified’ sampling, which can account for differences within the population, such as level of attainment or gender.

  1. Stratified Sampling

'Stratum' means 'layer'. A stratified sample is made up of different 'layers' of the population, for example, selecting samples from different age groups.

The sample size for each layer is proportional to the size of that layer. Each sub-set of the population, such as SEND children or a particular year group, is given a share of the sample that matches its share of the whole population, so that every sub-set is proportionally represented.
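
Proportional allocation of this kind might look as follows in Python. This is a sketch under stated assumptions: the year-group sizes, the pupil names and the function name stratified_sample are all made up for illustration, and each layer's share is simply rounded to the nearest whole pupil.

```python
import random

def stratified_sample(groups, fraction, seed=None):
    """Draw the same fraction of pupils from every layer (hypothetical helper)."""
    rng = random.Random(seed)
    sample = {}
    for name, members in groups.items():
        # Proportional allocation: each layer contributes its own share.
        k = round(len(members) * fraction)
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical school of 400 pupils, with year groups as the layers.
sizes = {"Year 1": 60, "Year 2": 60, "Year 3": 70,
         "Year 4": 70, "Year 5": 70, "Year 6": 70}
groups = {year: [f"{year} pupil {n}" for n in range(1, size + 1)]
          for year, size in sizes.items()}

chosen = stratified_sample(groups, fraction=0.10, seed=1)
for year, members in chosen.items():
    print(year, len(members))
```

With these sizes every year group is guaranteed a place in the sample, which a simple random draw cannot promise. Where a layer's share is not a whole number, the rounding means the total may differ slightly from the intended sample size.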

  1. Data Confidence

Both Data Dashboard and ASP use the concept of ‘confidence’ to establish how accurate any data is likely to be. For instance, where a small school has only 10 pupils taking KS1 or KS2 SATs, its results are likely to vary more widely than those of a school that has 100. Each pupil’s results are worth 10% of the school’s figures in the first case but only 1% in the second. If a child is absent for SATs in a small school, this can have a major impact, and there is therefore less confidence in the accuracy of the data.
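
The arithmetic behind the 10% and 1% figures is shown below as a short Python sketch. The function name pupil_weight is hypothetical and is used only for this illustration.

```python
def pupil_weight(cohort_size):
    """Percentage of a cohort's results that one pupil represents."""
    return 100 / cohort_size

for cohort_size in (10, 100):
    print(cohort_size, "pupils: each pupil is", pupil_weight(cohort_size), "% of the cohort")
```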

  1. Confidence intervals

The certainty or uncertainty of a contextual value-added score as a measure of school effectiveness can be presented as a confidence interval. This is a range of scores within which we can be statistically confident that the "true" school effectiveness will lie, so a school’s true result will be somewhere on the line, but we do not know where.

The figure given is the distance of the upper and lower limits of the confidence interval from the value of the corresponding measure. Therefore, by adding or subtracting this number from the measure, you can identify the upper or lower limit of the confidence interval.
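
As a small illustration of that adding and subtracting, here is a Python sketch. The measure of 2.0 and the distance of 1.5 are invented values, and the function name confidence_limits is hypothetical.

```python
def confidence_limits(measure, distance):
    """Return the (lower, upper) limits of the confidence interval."""
    return measure - distance, measure + distance

# Invented values: a score of 2.0 reported with a distance of 1.5.
lower, upper = confidence_limits(2.0, 1.5)
print(lower, upper)  # 0.5 3.5
```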

  1. What is a confidence level?

The "confidence level" that is associated with a confidence interval is an indicator as to the level of uncertainty surrounding the value of the measure concerned. The size of the confidence interval depends on how uncertain we are prepared to be. If we are prepared to be less confident, then the interval is smaller, and vice versa. The convention amongst statisticians is to use a 95% confidence level. This means that we are 95% confident that the true value of a measure lies within the confidence interval. Confidence intervals are an important part of calculating statistical significance.

Example from ASP:

Measure / Reading / Writing / Maths
School progress score / +4.44 / +5.97 / +4.56
Confidence interval / +0.91 to +7.96 / +2.55 to +9.39 / +1.38 to +7.75
Banding / Well above average / Well above average / Well above average
Number of pupils / 12 / 12 / 12

Key to bandings:
Well above national average (about 10% of schools in England)
Above national average (about 10% of schools in England)
Average (about 60% of schools in England)
Below national average (about 10% of schools in England)
Well below national average (about 10% of schools in England)

This example would tend to suggest that this school is doing very well. However, the confidence interval for reading (+0.91 to +7.96) shows that its true reading progress may be only just above average, so the school cannot be complacent, given the low number of pupils (12).

This becomes apparent when even lower numbers of pupils are looked at in relation to prior attainment. The interval becomes much wider and potential minus figures appear.

Average progress in reading by prior attainment

Prior attainment / Low / Low / Middle / Middle / High / High
Group / All / Dis / All / Dis / All / Dis
Number of pupils / 1 / 0 / 6 / 2 / 5 / 1
Score / 10.86 / N/A / 6.95 / 9.08 / 0.13 / -1.48
National average / 0.00 / 0.47 / 0.00 / 0.34 / 0.00 / 0.28
Difference / 10.86 / N/A / 6.95 / 8.74 / 0.13 / -1.76
Confidence interval / -1.34 to +23.06 / N/A / +1.97 to +11.93 / +0.45 to +17.70 / -5.32 to +5.59 / -13.68 to +10.72
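
How quickly the interval widens as pupil numbers fall can be checked from the table above. The Python sketch below copies the pupil counts and confidence limits from the table and prints the width of each interval. It also multiplies each width by the square root of the number of pupils, which comes out at roughly the same value for every group. This is an observation about these particular figures, not a statement of how ASP calculates its intervals.

```python
import math

# (number of pupils, lower limit, upper limit), copied from the table above.
rows = [
    (1, -1.34, 23.06),   # Low prior attainment, All
    (6, 1.97, 11.93),    # Middle prior attainment, All
    (2, 0.45, 17.70),    # Middle prior attainment, Dis
    (5, -5.32, 5.59),    # High prior attainment, All
    (1, -13.68, 10.72),  # High prior attainment, Dis
]

results = []
for pupils, lower, upper in rows:
    width = upper - lower
    scaled = width * math.sqrt(pupils)
    results.append((pupils, width, scaled))
    print(f"{pupils} pupils: width {width:.2f}, width x sqrt(pupils) {scaled:.1f}")
```
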
  1. What makes data statistically significant?

A measure is said to be "statistically significant" if we are statistically confident that it is different from average. Deciding whether we are statistically confident involves the use of confidence intervals. Put simply, a measure is not statistically significantly different from average if its confidence interval contains the average. If the measure’s confidence interval is completely above the average, then we say it is "statistically significantly above average", denoted by "sig+" in many analyses.

Alternatively, if the measure’s confidence interval is completely below the average, then we say it is "statistically significantly below average", denoted by "sig-" in some analyses. It is apparent, therefore, that statistical significance is very much dependent on the width of the confidence interval rather than the actual distance from the average of the measure.
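
The rule can be written as a short Python sketch. The function name significance is hypothetical, and the average is assumed to be 0, as it is for the "All" pupil groups in the ASP examples earlier in this document.

```python
def significance(lower, upper, average=0.0):
    """Classify a measure from the limits of its confidence interval."""
    if lower > average:
        return "sig+"             # interval lies completely above the average
    if upper < average:
        return "sig-"             # interval lies completely below the average
    return "not significant"      # interval contains the average

# School reading progress score of +4.44 (12 pupils).
print(significance(0.91, 7.96))     # sig+
# Low prior attainment reading score of 10.86 (1 pupil).
print(significance(-1.34, 23.06))   # not significant
```

The second score is more than twice the first, yet only the first is significant, because the second interval is wide enough to contain the average.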

Data in the report is frequently rounded to one decimal place; however, where significance has been tested, this has been done on the unrounded figures. Apparent anomalies are likely to simply be display issues due to this rounding.

  1. What is Correlation?

When two sets of data are strongly linked together we say they have a ‘high correlation’, i.e. their values tend to change together. An example within a school might be the time that a group of children spend on an intervention project and how this relates to their progress.

Correlation is Positive when the values increase together, and

Correlation is Negative when one value decreases as the other increases

  1. Correlation values

Correlation can take a value between -1 and 1:

1 is a perfect positive correlation

0 is no correlation (the values don't seem linked at all)

-1 is a perfect negative correlation

The value shows how strong the correlation is, and whether it is positive or negative.
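
One common way of arriving at such a value is Pearson's correlation coefficient, which is assumed here because the text does not name a particular measure. The Python sketch below uses invented figures for hours spent on an intervention and progress made. The data and the function name pearson are illustrative only.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How far the two sets of values move together ...
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # ... relative to how much each set varies on its own.
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)

hours = [1, 2, 3, 4, 5]                    # invented hours on an intervention
rising_progress = [2, 4, 6, 8, 10]         # progress rises in step with hours
falling_progress = [10, 8, 6, 4, 2]        # progress falls as hours rise

print(pearson(hours, rising_progress))     # 1.0, a perfect positive correlation
print(pearson(hours, falling_progress))    # -1.0, a perfect negative correlation
```

Real school data will rarely be this tidy, so values somewhere between the two extremes are to be expected.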

  1. Correlation and Causation

Many variable factors are often at play within schools that might also be impacting on a child’s progress, e.g. the child has a new class teacher that term. It is therefore difficult to show with certainty that one factor has impacted on another or caused it to improve. Correlations will normally need to be considered in the child’s wider context to decide which has been the major contributory factor.

It may also be the case that for a period of time there is high progress due to the intervention but that after a while this impact wears off and signs of no correlation begin to appear.

  1. Data Analysis Tools

Scatterplots:

These are often used as a visual representation of individual attainment scores plotted against prior attainment in order to show progress. They are helpful because:

  1. they present an overview of a cohort very quickly;
  2. filters can be used to show particular aspects, e.g. gender;
  3. individuals can be clicked on to give more information;
  4. ‘outliers’, i.e. those outside the main bulk of the cohort, can be tracked to ensure they are brought into better progress.

The example below is taken from Fischer Family Trust (FFT Aspire); fuller details can be obtained by clicking on this link: FFT Aspire

  1. Data, Tracking and Progress Systems

These are some of the common systems in use within schools. They are hyperlinked so that you can explore them if you feel your current school system may not be providing the service you want or you want something with which to compare.

Early Years: Tapestry, EYFS Tracker, Target Tracker, EduData UK

Primary: School pupil tracker, FFT Aspire, Insight Tracking, Learning Ladders, SIMS Discover

Pupils with SEN: Blue Hills Map Writer, Blue Hills Dashboard, CASPA, PIVATS