The Security of Inspection of Initial Teacher Training
Peter Tymms
School of Education,
University of Durham,
Leazes Rd.,
Durham,
DH1 1TA
Summary
The 1996/7 Initial Teacher Training inspection Framework and the grades awarded within it are considered in the light of the security of Ofsted judgements. This paper takes a simulation approach in which calculations are made to estimate the likelihood of a Grade 4 (non-compliance) following an inspection of an institution which is, in reality, perfectly satisfactory.
Serious problems are identified with the Framework. It would seem that very satisfactory institutions have a high chance of failing an inspection.
Key words: Inspection, Ofsted, reliability, Teacher Training, judgement
Introduction
In the latest round of Initial Teacher Training (ITT) inspections by Ofsted, institutions were rated on 14 “cells” (Ofsted 1996a). The grades for each cell were on a 1 to 4 scale, with 1 being “Very Good” and 4 “Poor Quality”. Grade 4 is also described as “Does not comply with the Secretary of State’s current criteria” and a single Grade 4 in any of the 14 cells is of major concern.
Despite the high stakes nature of the inspection the Framework is new and untried:
“… the Framework has been revised and comes into immediate use for the rest of the academic year 1996/97.” (page 7 Framework)
“It is, however, significantly different from these earlier documents in design and content” (page 7 Framework.)
“This version of the Framework will apply during 1996/97 and will be revised at the end of that period both in the light of experience ..” (page 8 Framework)
“To ensure consistency in the coverage of important aspects of the inspections of reading and number, HMI will develop the following instruments.” (page 5 Methodology Ofsted 1996b)
(Emphases added)
The question to be addressed is: Is the system employed fair? In attempting to answer this question it is recognised that, when setting up an inspection Framework, a balance must be struck between the need to make decisions and the possibility of erroneous conclusions. This balance involves estimates, either intuitive or explicit, of the rates and acceptability of mistakes being made.
This paper approaches the question by looking at the likelihood of a Grade 4 being given to institutions which are in fact performing satisfactorily. A series of simulations is run involving thousands of inspections, and the likelihood of various results is calculated for different levels of security of Ofsted judgement. This procedure is employed for the system overall and then for the two crucial T4 cells.
Likelihood of Grade 4 in one of the cells
In order to make the necessary estimates, security was defined as the correlation between the true level of an institution’s performance and the rating from inspection. It is assumed that there is an underlying normal distribution and that this converts into the following distribution of grades across the population:
Table I
Grade / Proportion
1 / 25%
2 / 50%
3 / 20%
4 / 5%
Our main interest is in the awarding of a Grade 4 when the true grade is satisfactory. The chances of this happening for any one of the 14 cells were estimated as follows: Data were simulated to represent 100,000 inspection situations and the “true” grades and judgements allocated for each case. Because the security of judgements is not known, and could not be known in such a new system, several possibilities are considered ranging from 0.6 to 0.9. The percent of Grade 4 judgements being associated with each true grade were then found. These are shown below:
Table II: Chance of a single Grade 4 being awarded
True Grade / 0.6 / 0.7 / 0.8 / 0.9
1 / 0.2% / 0% / 0% / 0%
2 / 2% / 1.4% / 0.7% / 0.1%
3 / 11% / 12% / 11% / 9%
4 / 31% / 40% / 47% / 64%
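The paper does not spell out how the simulated data were generated, so the following is only a minimal sketch under stated assumptions: the true level and the inspection judgement are both standard normal, the judgement is a linear mix of the true level and independent noise so that their correlation equals the security, and the Table I proportions are used as cut-offs for both. The function names and the seed are illustrative, not the author's.

```python
import random
from statistics import NormalDist

# Population shares from Table I, best grade first.
GRADE_SHARES = {1: 0.25, 2: 0.50, 3: 0.20, 4: 0.05}


def grade_cutoffs(shares=GRADE_SHARES):
    """Lower z-score bounds for grades 1, 2 and 3; anything below is grade 4."""
    cutoffs, cumulative = [], 0.0
    for grade in (1, 2, 3):
        cumulative += shares[grade]
        cutoffs.append(NormalDist().inv_cdf(1.0 - cumulative))
    return cutoffs


def to_grade(z, cutoffs):
    """Convert a z-score to a grade using the cut-offs (high z = good grade)."""
    for grade, lower in zip((1, 2, 3), cutoffs):
        if z >= lower:
            return grade
    return 4


def simulate_grade4_rates(security, n_cases=100_000, seed=0):
    """Estimate P(judged Grade 4 | true grade) for one security level.

    Assumption: judged = security * true + sqrt(1 - security^2) * noise,
    which gives a correlation of `security` between true and judged scores.
    """
    rng = random.Random(seed)
    cutoffs = grade_cutoffs()
    noise_sd = (1.0 - security ** 2) ** 0.5
    totals = {g: 0 for g in (1, 2, 3, 4)}
    fails = {g: 0 for g in (1, 2, 3, 4)}
    for _ in range(n_cases):
        true_z = rng.gauss(0.0, 1.0)
        judged_z = security * true_z + noise_sd * rng.gauss(0.0, 1.0)
        true_grade = to_grade(true_z, cutoffs)
        totals[true_grade] += 1
        if to_grade(judged_z, cutoffs) == 4:
            fails[true_grade] += 1
    return {g: fails[g] / totals[g] for g in totals}


if __name__ == "__main__":
    for security in (0.6, 0.7, 0.8, 0.9):
        rates = simulate_grade4_rates(security)
        print(security, {g: f"{r:.1%}" for g, r in rates.items()})
```

Under these assumptions the estimates should fall close to the figures in Table II, although exact agreement is not to be expected from a random simulation.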
If the true grade were 1 then a Grade 4 would seem to be a very remote possibility. But, if the true grade were 3 there would be about a chance in 10 of any one cell being given a Grade 4. Surprisingly this is fairly constant across the four levels of security.
It might be argued that a chance in ten is an acceptable risk to set for borderline institutions. But an institution will be concerned if it is awarded a single Grade 4 in any of the 14 cells. The chance of this happening may be calculated using the formula:
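$$P(n) = \binom{N}{n} (1 - X)^{n} X^{N - n}$$
Where P(n) is the probability of n fails
X is the probability of success (a cell not being graded 4)
N is 14
The chance of at least one Grade 4 is then $1 - P(0) = 1 - X^{N}$.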
The table below shows the chances of an ITT establishment getting at least one cell graded 4 out of all 14 cells.
Table III: Chance of at least one Grade 4 out of 14
True Grade / 0.6 / 0.7 / 0.8 / 0.9
1 / 3% / <1% / <1% / <1%
2 / 25% / 18% / 9% / 1%
3 / 80% / 83% / 80% / 73%
4 / 99% / >99% / >99% / >99%
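As a check on the arithmetic, the entries of Table III can be recomputed from the per-cell chances in Table II, assuming (as the calculation implies) that the 14 cells are judged independently. This is a sketch only; the dictionary and function names are illustrative.

```python
N_CELLS = 14

# Per-cell chance of a Grade 4 from Table II, by true grade,
# for security levels 0.6, 0.7, 0.8 and 0.9 in that order.
TABLE_II = {
    1: [0.002, 0.0, 0.0, 0.0],
    2: [0.02, 0.014, 0.007, 0.001],
    3: [0.11, 0.12, 0.11, 0.09],
    4: [0.31, 0.40, 0.47, 0.64],
}


def chance_at_least_one(p_fail, n_cells=N_CELLS):
    """Chance that at least one of n_cells independent cells is graded 4."""
    return 1.0 - (1.0 - p_fail) ** n_cells


table_iii = {
    grade: [chance_at_least_one(p) for p in row]
    for grade, row in TABLE_II.items()
}

for grade, row in table_iii.items():
    print(grade, " / ".join(f"{chance:.0%}" for chance in row))
```

For a true grade of 3 at a security of 0.7, for example, the per-cell chance of 12% gives 1 - 0.88^14, or about 83%.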
If the true grade were 1 in all cells then the chance of any Grade 4 being awarded is low: about 3% at a security of 0.6 and below one per cent at the three higher levels. If an institution’s actual grade were 2 on all 14 cells and if the security of judgement were 0.7 then there would be an 18% chance of at least one cell being given a Grade 4. This decreases rapidly as the security of judgement increases. At the highest security level the chance drops to just one in a hundred.
If the true grades were all 3 then the picture changes dramatically. Now, if the security were 0.7 there is only a 17% chance of a clean sheet (all cells graded 3 or better). Table III does not give details of the chances of several cells being graded 4; it simply gives the chances of at least one cell being given the lowest grade. The diagram below gives more details. It shows graphically the likelihood of several of the 14 cells being graded 4 even when they are truly a Grade 3. For a security level of 0.7 the most likely outcome, with a chance of 1 in 3, is to have a single cell graded as a 4. The next most likely outcome, with a chance above 1 in 4, is to have two cells graded as a 4.
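The distribution of the number of cells graded 4 can be sketched with a binomial calculation, again assuming independent cells and taking the 12% per-cell chance from Table II (true grade 3, security 0.7). The function name is illustrative.

```python
from math import comb


def grade4_count_distribution(p_fail, n_cells=14):
    """Binomial probabilities of exactly 0, 1, ..., n_cells cells graded 4."""
    return [
        comb(n_cells, n) * p_fail ** n * (1.0 - p_fail) ** (n_cells - n)
        for n in range(n_cells + 1)
    ]


# True grade 3 in every cell, security 0.7: per-cell chance of 12% (Table II).
distribution = grade4_count_distribution(0.12)
for n, chance in enumerate(distribution[:6]):
    print(f"{n} cells graded 4: {chance:.1%}")
```

On these figures a clean sheet has a chance of about 17%, a single Grade 4 about 32% and two Grade 4s about 28%, in line with the outcomes described above.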
Table III indicates that even if the judgements were very secure there would still be a 73% chance of at least one Grade 4 being awarded when the true value of each cell is 3. This does not seem to be healthy. But the Framework is structured in such a way that a further danger lies in two particular cells out of the 14. They are known as T4 cells.
Likelihood of a Grade 4 in T4
In each of the T4 cells there is a set of five statements against which the grade is to be judged. It would seem that if Ofsted judges a student to have failed who has been passed by the Higher Education Institution (HEI) then trouble will follow. “... evidence of unsatisfactory levels of competence among students notified to OFSTED as competent is a key indicator for a Grade 4.” (Taylor 1997)
The Framework requires that sixteen students be nominated in two groups, Number and Reading. Thus there were two T4 cells each involving eight students. They were to be selected by the institutions in the following categories.
Table IV
Category / Number nominated
Grade 1 / 4
Grade 2 / 4
Grade 3 (adequate) / 4
Grade 3 (weak) / 4
Grade 4 / 0
The indicative proportions of students in each of these categories across the student population in general are shown below. They were taken from a rather strangely worded table in the Annex of an Ofsted letter (Cavendish 1997). A certain amount of interpretation was needed to complete the table.
Table V
Category / Proportion
Grade 1 / 50%
Grade 2 / 25%
Grade 3 (adequate) / 10%
Grade 3 (weak) / 10%
Grade 4 / 5%
Using the same procedure as before the probabilities of individual students being identified as “Likely to Fail” (Grade 4) but who were correctly placed in the various categories are:
Table V
True Grade / Security: 0.6 / 0.7 / 0.8 / 0.9
1 / 0.6% / 0.3% / 0% / 0%
2 / 3.5% / 2.4% / 1.4% / 0.2%
3 (adequate) / 7.9% / 7.2% / 5.4% / 2.3%
3 (weak) / 13.9% / 15.3% / 16.7% / 14.7%
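These probabilities can also be approximated without random sampling. The sketch below assumes the same kind of model as for the institutional cells (a normal true level, a judgement correlated with it at the stated security, and a Grade 4 for the lowest 5% of judgements) and integrates numerically over each category, with the category boundaries taken from the proportions in Table V. The band definitions and function name are illustrative assumptions rather than the author's procedure.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution

# Each category as a (lower, upper) band of cumulative population share,
# counted from the bottom, following the proportions in Table V.
BANDS = {
    "1": (0.50, 1.00),
    "2": (0.25, 0.50),
    "3 (adequate)": (0.15, 0.25),
    "3 (weak)": (0.05, 0.15),
}
FAIL_SHARE = 0.05  # lowest 5% of judgements are assumed to be graded 4


def chance_judged_grade4(band, security, steps=2000):
    """P(judged Grade 4 | true level lies in band), by the midpoint rule."""
    lower_share, upper_share = band
    fail_cut = STD.inv_cdf(FAIL_SHARE)
    lower = STD.inv_cdf(lower_share)
    # The top band is open-ended; 8 standard deviations stands in for infinity.
    upper = STD.inv_cdf(upper_share) if upper_share < 1.0 else 8.0
    noise_sd = (1.0 - security ** 2) ** 0.5
    width = (upper - lower) / steps
    weighted_fail = total_weight = 0.0
    for i in range(steps):
        true_z = lower + (i + 0.5) * width
        weight = STD.pdf(true_z)
        # Given true_z, the judgement is normal with mean security * true_z.
        p_fail = STD.cdf((fail_cut - security * true_z) / noise_sd)
        weighted_fail += weight * p_fail
        total_weight += weight
    return weighted_fail / total_weight


if __name__ == "__main__":
    for name, band in BANDS.items():
        row = [chance_judged_grade4(band, s) for s in (0.6, 0.7, 0.8, 0.9)]
        print(name, " / ".join(f"{p:.1%}" for p in row))
```

If the assumptions are close to those used in the paper, the results should lie near the simulated figures in the table above.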
It would seem very unlikely that a student with a true Grade of 1 would be failed by Ofsted even if the security of judgement were low. On the other hand, the correctly identified weak but passing candidates have a substantial chance of being given a Grade 4. The estimated chance of a Grade 4 for these candidates was always above 12% and did not vary much with the security of judgement. This is an interesting finding and it has implications for the Framework of inspection, especially when a single Grade 4 can be triggered by a single discrepancy for one student.
Chance of a single Grade 4 being given to one of the 16 students
There were 4 candidates in each of the 4 categories and one must ask what the likelihood is of any one of them being given a Grade 4. The calculated chances are given below.
Table VI
Security / 0.6 / 0.7 / 0.8 / 0.9
Chance of at least one Grade 4 / 67% / 66% / 64% / 52%
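The figures in Table VI follow from the per-student chances given earlier for the four nominated categories, if each of the 16 students is assumed to be judged independently. A short sketch of that calculation, with illustrative names:

```python
STUDENTS_PER_CATEGORY = 4

# Per-student chance of a Grade 4 by security level, for the categories
# Grade 1, Grade 2, Grade 3 (adequate) and Grade 3 (weak), in that order.
STUDENT_FAIL_RATES = {
    0.6: [0.006, 0.035, 0.079, 0.139],
    0.7: [0.003, 0.024, 0.072, 0.153],
    0.8: [0.0, 0.014, 0.054, 0.167],
    0.9: [0.0, 0.002, 0.023, 0.147],
}


def chance_any_student_fails(rates, per_category=STUDENTS_PER_CATEGORY):
    """Chance that at least one nominated student is given a Grade 4."""
    clean_sheet = 1.0
    for rate in rates:
        clean_sheet *= (1.0 - rate) ** per_category
    return 1.0 - clean_sheet


table_vi = {
    security: chance_any_student_fails(rates)
    for security, rates in STUDENT_FAIL_RATES.items()
}

for security, chance in table_vi.items():
    print(f"security {security}: {chance:.0%}")
```

Most of the risk comes from the four weak Grade 3 students: at a security of 0.9 their contribution alone, 1 - 0.853^4, is about 47%.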
Now the chance of a clean sheet (all students being given a pass) is reduced to just 48% for all institutions, even in the most secure situation examined.
It would seem that the two T4 cells for Reading and Number provide a trap for the judgement of HEIs within the Framework.
Comment
The calculations above were based on explicit assumptions and reality will doubtless differ from this idealised model. Nevertheless it is clear that the Framework has been set up in such a way that any good HEI stands a high probability of being identified as not complying with the Secretary of State’s criteria. The calculations suggest that any HEI inspected using the Framework rigorously would have a worse than 50:50 chance of falling foul of the judgements.
This is an outrageous state of affairs:
It is against natural justice.
It has been constructed either in ignorance or with malice aforethought.
It brings into question all Frameworks constructed by Ofsted.
References
Ofsted (1996a) Framework for the Assessment of Quality and Standards in Initial Teacher Training 1996/97. OFSTED Publication Centre
Ofsted (1996b) Primary ITT Follow-up Survey Methodology, ref EM: 22/11/96
Taylor (1997) Letter (22nd July 1997) from D.W. Taylor, Head of Teacher Education and Training (Ofsted), to Vice-Chancellor of Durham University
Cavendish (1997) Letter (21st April 1997) from Peter Cavendish Ofsted HMI to Prof. D. Galloway Univ. of Durham