
USING A VISUAL ANALOGUE SCALE TO ESTIMATE ASPECTS OF ETHOS IN FOUR SECONDARY SCHOOLS

Edwin Smith, University of Warwick Institute of Education

Paper presented at the Annual Conference of the British Educational Research Association, University of Exeter, England, 12-14 September 2002

Abstract

This paper reports the use of two instruments designed to estimate three main dimensions of ethos: achieving atmosphere / climate for learning; perceptions of the impact of other pupils; and pupils’ perceptions of the social background of the pupil population as a whole.

Visual analogue or linear analogue scales have been fairly widely used in medical psychology for several decades, mainly but not exclusively in estimating patients’ perceptions of pain. Figures for reliability and validity are encouraging. In contrast, no studies in educational research have been identified which make use of this technique although some semantic differential scales provide scales which are verbally anchored only at their poles. These scales, however, provide a number of discrete points or dashes.

All Year 8 pupils in four schools of different pupil social background composition (n = 676) took part in the piloting of the instruments. Each item used in this study presents to respondents a 10-centimetre horizontal line extending between two polar anchors, for example ‘no lessons’ and ‘all my lessons’, or ‘all the pupils’ and ‘no pupils’. Pupils were asked to make a vertical mark on the line to indicate their perception or estimate. The marks were digitally scanned to provide an interval (and arguably ratio) level of measurement which permits a range of statistical treatments to compare the four schools. These treatments are discussed.

Acknowledgements

I was introduced to visual analogue scales by Tirjinder Singh Gidda, one of the many students at Churchfields High School, who over many years generously allowed me to learn from them as well as being one of their teachers. He and others continued to supply me with opportunities to learn after they left the school, and it was during a discussion of a project in his BDS course that he explained the scales to me. Without his unfailing willingness to share, I would not have known about them.

I am grateful, too, to Dr Sean Neill and Dr Daniel Mujis in the Warwick University Institute of Education for their comments on preliminary drafts. Any errors remain my responsibility.

The staff and students of five schools in a Midlands LEA have been exceptionally forbearing and generous in giving time to help with the research.

1. Context

The instruments discussed here form part of a study of the school composition effect in four secondary schools in a Midlands LEA. In that study, school-level SES is an important variable, and measures have been derived using 1991 census data for pupils’ postcodes to calculate Jarman indexes (Jarman 1983). One pupil questionnaire attempts to triangulate these data, particularly in view of the date of the most recent available census data (1991). It asked pupils to provide their impressions of the proportions of pupils in their schools who matched descriptors related to Jarman elements, for example ‘have someone living in their family who has a car’. A complete questionnaire is in Appendix 1. The second questionnaire attempts to estimate pupils’ perceptions of the ‘ethos’ of their schools. A complete questionnaire is in Appendix 2. This paper focuses mainly upon this second questionnaire. The concept of ethos is complex, and it is not possible to explore it here. The author has discussed the relationship between ethos, habitus, situated learning and culture elsewhere (Smith 2002, submitted).

2. Visual analogue scales

Both questionnaires used horizontal 10-centimetre visual analogue scales in preference to more conventional Likert scales and rating scales. Although visual analogue scales appear not to be reported in the social science journals, they are commonly used in medicine, particularly to measure patients’ experience of pain (for example Pomeroy et al 2001; Schwenk et al 2002). However, still in medical contexts, Kluger et al (2002), Hall et al (2000), MacDonald et al (2002), O’Neill et al (2000), Jepegnanam et al (2001) and Ashley et al (2001) used them to assess attitudes. In nearly every case the line was 10 centimetres in length; some lines were horizontal and some vertical.

These scales are derived from instruments originally devised to measure well-being (Clarke and Spear 1964, cited in Reville et al 1976), and so their use in this study represents something of an innovation. The visual (or linear) analogue scale has a key advantage over the Likert-style inventories and rating scales which are commonly used in attitude surveys. Intermediate verbal anchors are not needed, and consequently there is no need to assume equality of interval, for example between agree / strongly agree and disagree / strongly disagree. The provision of a ten-centimetre line with end-point verbal anchors enables respondents to mark any point on the continuum. It can more safely be assumed that the result is an interval-level measurement, and if the lower anchor point represents a zero, then the measure can even be considered as a ratio-level one.
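As an illustration only, the conversion of a scanned mark into a score can be sketched as follows. The function name, the millimetre input and the 0 to 1 output range are assumptions made for this sketch; the paper does not describe the scanning software or the units it returned.

```python
def vas_score(mark_mm: float, line_length_mm: float = 100.0) -> float:
    """Convert a mark's distance from the lower anchor (in mm) to a 0-1 score.

    Hypothetical helper: assumes the lower anchor represents zero, so the
    score can be read as a proportion of the whole line.
    """
    if line_length_mm <= 0:
        raise ValueError("line length must be positive")
    if not 0.0 <= mark_mm <= line_length_mm:
        raise ValueError("mark lies outside the line")
    return mark_mm / line_length_mm


# A mark 37 mm along a 10-centimetre line.
print(vas_score(37.0))
```

Because the score is a simple proportion of the line, no intermediate category boundaries have to be imposed, which is the advantage over Likert-style scoring described above.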

The reliability of the linear analogue depends on visual and motor co-ordination, that is, the ability of a respondent to put a mark where they intend to (Reville et al 1976). While that may be more of an issue in clinical medicine, even among healthy adolescents there may be cultural and visual factors which impact on this ability, for example dyslexia or dysgraphia and pupils’ experiences of perceiving and visually analysing straight lines. The latter might affect significant numbers of pupils in a school, but it is no more of a factor in visual analogue scales than in Likert ones. Measures of reliability for visual analogue scales in medicine yield promising results for lines of 10, 15, and 20 cm length (but not 5 cm) (Reville et al 1976).

Jaesch et al (1990) conducted a carefully controlled study into the relative merits of visual analogue scales and seven-point ones, again in a medical setting. They found no significant differences between the two scales in validity or responsiveness. They did, however, conclude that visual analogue scales require considerably more training of the respondents (5 - 10 minutes compared with less than five), but as Butler (1997) argues, this is a function of deteriorating abstract reasoning with age (and possibly with pain). This factor is less likely to be an issue with school pupils than with chronically ill patients, but to test the validity of school-age pupils’ responses, a third pupil questionnaire was devised (see below).

Arslan et al (2001) found that visual analogue scales compared favourably with ordinal variables for the measurement of pain, and Jepegnanam et al (2001) found a visual analogue anxiety scale to be more reliable than conventional anxiety and depression scores in assessing preoperative anxiety.

Butler (1997) developed a critique of the use of visual analogue scales in assessing pain, but his main sources of concern were the complexities involved in conceptualising pain, and the question whether it can be estimated by a unidimensional measure. Ethos, too, is (arguably) multi-dimensional, but here the questions did not directly ask pupils to estimate ethos: they addressed some of the putative dimensions within the concept. Butler points out that in pain measurement it is difficult to select ‘end phrases’, the wording of the anchor points at the ends of the lines. In the questionnaires discussed here, this was less problematic as the end-points were unequivocally ‘no pupils / all pupils’; ‘all lessons / no lessons’; and ‘completely agree / completely disagree’.

More pertinent to the present study is Butler’s challenge to the treatment of measures on the scales as interval level. He argues that even if scores are statistically standardised, the grounds for applying parametric statistical procedures are far from secure. However, this concern is centred mainly on the question whether patients can discriminate among 100 levels of pain intensity.

3. Validity, sensitivity and intervality

It may be that at the age of twelve to thirteen years not all school pupils have achieved a level of abstract reasoning commensurate with translating a number / proportion into a mark on a ten-centimetre line. This would be susceptible to empirical testing, but there appears to be no published research. The cognitive processes involved may include perception-formation, use of analogy to convert a numerical proportion into a proportion of a line, and psychomotor skills to mark accurately the intended proportion of the line. The first of these (perception-formation) is neutral to these concerns since it is perception that is being estimated in the present study. The third (psychomotor skills) is less of an issue with school-age pupils than with chronically ill patients. It is the operation of the analogy between the perceived proportion of a population and the proportion of a line that provides the strongest threat to validity.

To test this, a questionnaire was completed by a Year 8 tutor group in a secondary school. The school has the lowest mean SES of the four in the study. This questionnaire is presented in Appendix 3. There are essentially three types of question:

·  translating a number between into a mark on a ten-centimetre line

·  translating a simple fraction into a mark on a ten-centimetre line

·  translating a perception of a proportion (e.g. how much of a circle is shaded) into a mark on a ten-centimetre line

The validity can be estimated by comparing the true value of the proportion in each question with the mean score returned. Since in this study the mean scores on pupil and teacher questionnaires are the basis for analysis, it is the validity of the mean that is important. Fig 1 shows that with three exceptions (arrows, mice and lines, with true values of 0.4, 0.9 and 0.36 respectively), all the mean errors were less than 0.1, i.e. one centimetre on the 10-centimetre scales.

Fig 1 Mean size of error in Year 8 pupils’ estimates of proportion using a visual analogue scale

The size of the errors is related to the fineness of discrimination involved (sensitivity). The greatest error was for an item requiring an estimate of 43 out of 109 lines, an item which might tax visual acuity and perceptual judgement as much as the skills of translating those into marks on 10-centimetre lines.
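The comparison of mean scores with true values described above can be sketched as follows. This is a minimal illustration, not the procedure actually used: the response values are invented, and treating the error as the signed difference between the group mean and the true proportion is an assumption (Fig 1 may instead report absolute errors).

```python
from statistics import mean


def mean_error(estimates, true_value):
    """Signed difference between the group's mean estimate and the true value."""
    return mean(estimates) - true_value


# Hypothetical responses on a 0-1 scale, paired with the true proportion.
# These are NOT data from the study.
responses = {
    "cakes": ([0.45, 0.50, 0.55, 0.60], 0.5),
    "drinks": ([0.15, 0.25, 0.30, 0.20], 0.2),
}

for item, (estimates, true_value) in responses.items():
    error = mean_error(estimates, true_value)
    # 0.1 on the 0-1 scale corresponds to one centimetre on a 10 cm line.
    verdict = "within 1 cm" if abs(error) < 0.1 else "more than 1 cm out"
    print(f"{item}: mean error {error:+.3f} ({verdict})")
```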

The sensitivity of the mean scores can be judged from the 95 percent confidence intervals (Table 2), which suggest that nearly all the items are estimated by the pupils as a group within a range of plus or minus 7 percent.

Item / cakes / drinks / mice / chocolat / half / quarter / eighth / arrows / shapes / lines
True value / 0.5 / 0.2 / 0.9 / 0.5 / 0.5 / 0.75 / 0.88 / 0.4 / 0.4 / 0.36
95% CI (±) / 0.041 / 0.073 / 0.064 / 0.054 / 0.032 / 0.064 / 0.074 / 0.051 / 0.073 / 0.065

Table 2. Confidence intervals of Year 8 pupils’ estimates of proportion using a visual analogue scale
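For readers who wish to reproduce figures of the kind shown in Table 2, the half-width of a 95 percent confidence interval for a group mean can be sketched as below. The paper does not state how its intervals were computed, so the normal approximation used here is an assumption, and the scores are invented for illustration. For a group as small as one tutor group, a t-based interval would be somewhat wider.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def ci_half_width(scores, confidence=0.95):
    """Half-width of a normal-approximation confidence interval for the mean."""
    # Two-sided critical value, e.g. about 1.96 for 95 percent confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Standard error of the mean uses the sample standard deviation.
    return z * stdev(scores) / sqrt(len(scores))


# Hypothetical scores on a 0-1 scale for one item (NOT data from the study).
scores = [0.42, 0.55, 0.48, 0.61, 0.50, 0.39, 0.57, 0.52]
half_width = ci_half_width(scores)
print(f"mean = {mean(scores):.3f} +/- {half_width:.3f}")
```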

Whether, and to what extent, the estimates on the visual analogue scales can be treated as interval-level data is indicated in a scatterplot comparing true and estimated values. Figs 3 and 4 support a claim that the estimated values are proportional to the true values - a straight line passing through the origin. Proportionality refers to a relationship in which increasing one of the variables by any ratio results in the second variable changing by the same ratio. This is the sole criterion for a ratio level of measurement, which itself entails intervality. The outlier in Fig 3 is the ‘lines’ question (43 out of 109 lines).

Fig 3. Scatterplot of Year 8 pupils’ mean scores against true values

By omitting one outlier (the 43 out of 109 lines item) it can be shown that the calculated linear trend-line passes almost exactly through the origin. If two sets of numbers are proportional, and one of them is interval, it is extremely unlikely that the second is not.

Fig 4. Scatterplot from Fig 3, omitting one outlier, and adding a linear trend-line.
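The check on proportionality can be sketched as a least-squares fit: if the estimates are proportional to the true values, the fitted intercept should be close to zero and the free slope close to the slope of a line forced through the origin. The true values below are those listed in Table 2; the mean estimates are hypothetical placeholders, since the paper reports them only graphically, and the function names are illustrative.

```python
def fit_line(x, y):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_through_origin(x, y):
    """Least-squares slope k for the proportional model y = k * x."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


# True proportions for the ten items, as listed in Table 2.
true_values = [0.5, 0.2, 0.9, 0.5, 0.5, 0.75, 0.88, 0.4, 0.4, 0.36]
# Hypothetical group mean estimates (NOT the study's data).
mean_estimates = [0.52, 0.23, 0.80, 0.49, 0.51, 0.73, 0.85, 0.47, 0.41, 0.50]

slope, intercept = fit_line(true_values, mean_estimates)
k = fit_through_origin(true_values, mean_estimates)
print(f"free fit: slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"through origin: k = {k:.3f}")
```

Repeating the fit with a suspected outlier removed shows how far a single item pulls the intercept away from zero.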

The concepts of intervality and sensitivity are distinct: the evidence points to a sensitivity for the mean scores of plus or minus 7 millimetres, and convincing support for a claim of intervality and validity - but only for the mean scores for the group of pupils.

That is not the same as a claim for validity, sensitivity and intervality for each individual pupil’s score. In the field of medical treatment of individuals (to which all the critiques of and support for visual analogue scales refer) it is the individual’s estimates of pain, etc., that are of interest, not the means of groups. In the present study, like most of those published in the areas of attitude in medical matters, the salient variables are not individual estimates but the group means of those estimates. Thus the evidence for validity, sensitivity and intervality provided by the data from 25 Year 8 pupils in this study justifies treating the group means as interval-level data.

However, taking an unweighted mean assumes that the individual data are ratio level. It can be argued that if those individual data were not at all interval-level, their means would not show the pattern they do. Stronger support comes from scatterplots of individual pupils’ estimates against true values. Taking the first six pupils (a random selection, since the order in which the pupils’ data were entered was entirely random, the questionnaires having been ‘shuffled’ repeatedly), a similar but unsurprisingly slightly weaker pattern of proportionality is discernible (Fig 5). Given that pupils 3 and 4 have poor enough English to warrant language support in their lessons, the overall pattern presented by the six scatterplots supports a claim that individual responses in these visual analogue scales can indeed be treated as interval-level, even in the school with the lowest SES of the four in this study, and when the sample includes pupils with weak English language skills. Weak language skills do not, of course, of necessity impede the kind of cognitive processing implicit in the visual analogue calculus. Pupil 3 is an example of this.