LANGUAGE TEST CRITIQUES

I. Test Purpose and Use

The Cambridge Young Learners English (CYLE) tests are a new English proficiency instrument produced by the University of Cambridge Local Examinations Syndicate (UCLES). They are intended to serve as a bridge to the UCLES Main Suite examinations (i.e., the Key English Test, KET, and the Preliminary English Test, PET). The tests were first administered in 1997 in Cambridge and are currently taken by over 310,000 children in about 55 countries around the world; they came to China in 1996.

The purpose of the Cambridge Young Learners English Tests is “to offer a comprehensive approach to testing the English of primary learners between the ages of 7 and 12” (UCLES, 2003a: 2). The YLE tests span three ability levels: Starters, Movers, and Flyers. Starters is designed for children aged 7, Movers for children aged 8 to 11, and Flyers for children aged 9 to 12. The YLE tests are designed to assess the listening, reading, writing, and speaking skills of children learning English as a foreign language (EFL). Each level has three components: Listening, Reading and Writing, and Speaking.

While one purpose of the test is to measure overall English proficiency, the CYLE tests also aim to give a positive impression of English testing to young learners in order to encourage and motivate them to continue their learning. To this end, the test items use large colorful illustrations and emphasize communicative discourse and vocabulary.

The tests aim to: “1) sample relevant and meaningful language use; 2) measure ability accurately and fairly; 3) present a positive impression of international tests; 4) promote and encourage effective learning and teaching” (Cambridge Young Learners English Test, 2003).

II. Content Validity

Content validity is the relevance of the test content to the goal of the test. A test is said to have content validity if its content constitutes a representative sample of the language skills, structures, and so on with which it is meant to be concerned (Hughes, 2004).

This language test critique will focus on the speaking test of Movers, the second level of the Cambridge Young Learners English Tests. The purpose of this speaking test is to assess young EFL learners’ interactive listening ability, pronunciation, and ability to produce words and phrases. Speaking ability at the Movers level is measured by the candidate’s performance on four tasks (see Figure 1).

Part / Main Skill Focus / Input / Response
1 / Identifying and describing four differences between two pictures / Two similar pictures / Words and short phrases
2 / Narrating a story / Four-picture story / Extended narrative
3 / Identifying the ‘odd one out’ and giving a reason / Picture sets / Words and phrases
4 / Answering personal, open-ended questions / Examiner’s questions / Words and short phrases

Figure 1: The four parts of the Movers speaking test

One argument for the validity of the speaking test is that it is a direct test (testing speaking by speaking). Hughes (2004) suggests that direct testing improves validity because it promotes authentic tasks. Another point is that the four skills tested in the Movers speaking test (comparing pictures, telling a story, categorizing and explaining, talking about oneself) are clearly geared toward the speaking ability of EFL young learners aged between 8 and 11, and they form a representative sample of the speaking skills needed at this age. In addition, the test is age appropriate because it is not a pencil-and-paper test: sets of colorful pictures in each task are used to elicit describing, story-telling, explaining, and communicative responses from the young learners. Items in the test consist primarily of everyday vocabulary for children’s toys, activities, and general interests, and of concepts such as weather, animals, days of the week, and shapes.

On the other hand, some vocabulary items in the speaking test suggest that the test does not assess only the skills or knowledge it intends to assess. For example, pictures of American foods such as sandwiches and salads are inappropriate because many Chinese young learners, especially children from rural areas, have never tried these foods and may well be unable to answer. Rather than testing knowledge of the vocabulary items, the test is testing knowledge of another culture. Moreover, if students are not used to seeing the pictorial representations of the words, they may fail an item even though they know the word, which calls the validity of the test into question. A telephone or a cinema, for example, might be depicted differently in another country, so the test would be biased against students from countries where these objects are drawn differently. Students from low-income backgrounds who have never seen telephones or cinemas would also be at a disadvantage.

III. Reliability

Reliability is the consistency of measurement. According to Hughes (2004), there are two components of test reliability: the consistency of candidates’ performance from occasion to occasion, and the reliability of the scoring. I will analyze the reliability of the Movers speaking test from the perspectives of candidate performance and of the oral examiners.

The Movers speaking test is conducted by one oral examiner with one candidate and takes seven minutes per candidate. Oral examiners are trained in how to carry out the test, how to give positive feedback to the candidate, how to follow the “interlocutor frames” strictly, and how to score. The interlocutor frames and the training requirement for oral examiners do help establish the reliability of the test.

However, several factors threaten the reliability of the test.

First, it is not easy to guarantee the quality, objectivity, and consistency of oral examiners, although perfect consistency is never to be expected in interview performance. Experience of conducting the speaking test in China shows that the unavoidable differences among oral examiners are a serious threat to reliability, because the examiner’s performance has a strong impact on the candidate’s: some oral examiners pronounce English clearly for young learners, but some do not; some follow the interlocutor frame strictly, but some cannot help being flexible even though they have been trained and monitored before the test; some give positive feedback, but some never use encouraging language and even correct candidates’ errors repeatedly during the test; some give candidates long wait times, but some give no wait time at all; and some provide clear and explicit task instructions, but some do not.

Second, in some areas of China it is not possible to ensure that all candidates have the opportunity to familiarize themselves with the test format and speaking techniques, to learn what will be required of them, and so to improve their performance. As an oral examiner, I often found that candidates who were unfamiliar with the test procedures did a poorer job than those who knew what was expected of them. Not all candidates have access to sample tests or practice materials, and if any aspect of the speaking test is unfamiliar, candidates are likely not to perform as well as they otherwise would.

Third, the administration of the test is also a threat to reliability. The greater the differences between one administration of a test and another, the greater the differences one can expect between a candidate’s performance on the two occasions. It is hard to ensure uniformity in conducting the speaking test in China: it is almost impossible for every testing center to adhere strictly to the timing and to provide a quiet setting with no distracting sounds or movements.

Fourth, the scoring of the speaking test is largely subjective, and some oral examiners whose scoring deviates markedly from the norm during the training session are still used, owing to the shortage of English teachers in some areas of China.

Fifth, in order to reduce young learners’ anxiety and deliver the test to children in an enjoyable, non-threatening way, only one examiner is used in the interview. However, it is difficult for one oral examiner to conduct the test and keep track of the candidate’s performance at the same time, and in such subjective testing it is a threat to reliability that the candidate’s performance is scored by only one rater.

Finally, some candidates have to wait much longer than others to be interviewed. Fatigue can become an issue for the last students of the day and threaten the reliability of the assessment.
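The single-rater threat could in principle be monitored by double-marking a sample of interviews and checking how closely two examiners agree. The sketch below (in Python) illustrates one standard check, Cohen’s kappa, on hypothetical paired totals out of 9 (the sum of the three criterion marks described in Section IV below); the marks and the code are my own illustration, not part of any UCLES procedure.

```python
# Hypothetical double-marking check: two examiners independently mark the
# same ten Movers candidates (totals 0-9). All numbers are invented.
from collections import Counter

rater_a = [7, 5, 9, 6, 8, 4, 7, 6, 9, 5]
rater_b = [7, 6, 9, 5, 8, 4, 6, 6, 9, 5]
n = len(rater_a)

# Observed agreement: proportion of candidates given identical totals.
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Chance agreement: probability the raters coincide if each marked at
# random according to their own observed mark distribution.
freq_a, freq_b = Counter(rater_a), Counter(rater_b)
chance = sum(freq_a[mark] * freq_b[mark] for mark in freq_a) / (n * n)

# Cohen's kappa: observed agreement corrected for chance (1.0 = perfect).
kappa = (observed - chance) / (1 - chance)
print(f"observed agreement: {observed:.2f}")  # 0.70 for these data
print(f"Cohen's kappa: {kappa:.2f}")          # about 0.63 for these data
```

Consistently low agreement on such a sample would flag the examiner inconsistency described above and could justify retraining or second-marking before results are issued.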

IV. Scoring Method and Score Reporting

The Movers speaking test is criterion-referenced,

“which compares learner’s performance, not to other learners, but to a set of criteria of expected performance or learning targets. Criterion-referenced assessment can match the child’s performance against an expected response on an item, or it may make use of a set of descriptors along a scale [on] which a learner is placed” (Cameron, 2001).

In the Movers speaking test, examples of an expected response and a descriptive scale are given for the candidates’ speaking skills. The child’s speaking is assessed on the production of answers in single words or short phrases and is rated on three criteria: 1) interactive listening ability; 2) pronunciation; 3) production of words and phrases. Each criterion carries a maximum mark of 3 (UCLES, 2003a; see Appendix).
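As a concrete illustration of this marking scheme, here is a minimal sketch (in Python) of an examiner’s marking record. The maximum mark of 3 per criterion comes from the Handbook; the assumption of a minimum of 0, the field names, and the summing to a 9-point total are mine, for illustration only.

```python
# Minimal sketch of the three-criterion Movers speaking mark scheme.
# The maximum of 3 per criterion is from the Handbook; the 0 minimum
# and the 9-point total are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class MoversSpeakingMarks:
    interactive_listening: int  # 0-3
    pronunciation: int          # 0-3
    production: int             # 0-3, production of words and phrases

    def __post_init__(self) -> None:
        # Reject marks outside the assumed 0-3 range for each criterion.
        for name, mark in vars(self).items():
            if not 0 <= mark <= 3:
                raise ValueError(f"{name} must be between 0 and 3, got {mark}")

    def total(self) -> int:
        return self.interactive_listening + self.pronunciation + self.production

marks = MoversSpeakingMarks(interactive_listening=3, pronunciation=2, production=3)
print(marks.total())  # 8 out of a maximum of 9
```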

A strong motivational characteristic of this test is that there is no pass or fail: it is designed to show what candidates know rather than what they do not know. Speaking is scored locally by the oral examiner, and all candidates are given a certificate from Cambridge University to reward their efforts and abilities. Movers candidates receive certificates displaying one to five shields, so that the best candidates receive five shields for their speaking proficiency and the lowest-scoring candidates receive one.

Although the Handbook and Research Notes have understandably de-emphasized numerical scores at this early stage of a young learner’s career, MacGregor (2001: 5) points out that “one obvious weakness in this score reporting system is that [there is] no indication of what these shield scores mean, and therefore the scores cannot be translated into descriptions of what the examinee is and is not able to do”. This makes it difficult for parents, teachers, researchers, and oral examiners to understand how raw scores are translated into shield bands each year. It is a serious problem for candidates who are almost ready to move up to the Flyers level, and even worse for students at the Flyers level who are ready to move on to the main suite of examinations.

According to Jones (2002), UCLES has now recognized the need to investigate the relationships between the YLE levels and between the YLE and the main suite, because of increasing demand from users to interpret results within such wider frameworks. One suggestion I offer is that a score report of each student’s strengths and weaknesses across the different task types in the speaking test would greatly help students, parents, and teachers evaluate the English teaching and learning process and its results. If such a report of individual performance were provided, the speaking test would become a powerful instructional tool in EFL contexts, and the report would also serve one of the stated purposes of the test: to promote and encourage effective learning and teaching.
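To make this suggestion concrete, the sketch below (in Python) shows what such a per-task report might look like. The part names come from Figure 1; the 0–3 marks per part, the band labels, and the layout are entirely hypothetical.

```python
# Hypothetical per-task score report of the kind suggested above.
# Part names follow Figure 1; the marks and band labels are invented.
PARTS = [
    "Describing differences between two pictures",
    "Narrating a four-picture story",
    "Identifying the odd one out and giving a reason",
    "Answering personal questions",
]
BANDS = {3: "strength", 2: "developing", 1: "needs support", 0: "needs support"}

def report(candidate: str, marks: list[int]) -> str:
    lines = [f"Speaking report for {candidate}"]
    for part, mark in zip(PARTS, marks):
        lines.append(f"  {part}: {mark}/3 ({BANDS[mark]})")
    return "\n".join(lines)

print(report("Candidate A", [3, 1, 2, 3]))
```

A report in this spirit would tell a teacher at a glance that this candidate needs more practice with story narration, which is precisely the diagnostic information a bare shield score conceals.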

V. Impact of the Test

The stated purposes of the Cambridge Young Learners English Tests are considered far from “high-stakes” for any individual child’s scholastic career (Bailey, 2005). Although the Movers speaking test can give young learners, their teachers, and their parents a measure of how well the children are acquiring spoken English, the result carries little weight in the decisions made about them.

The washback young learners receive from taking the Movers speaking test works well to promote their learning and use of English as a foreign language. It is a test of real language skills: language use is tested in meaningful, realistic, and interesting ways, using materials especially suited to children aged between 8 and 11. The test has a positive impact on children’s spoken English because they experience real communication in a foreign language. The tests are fun, and children enjoy taking them. In this sense, the test does promote effective oral English learning.

The test also has positive instructional washback. In China, for example, UCLES has been working closely with the Sino-British Center of the China National Education Examinations Authority (NEEA) to promote and support the Cambridge Young Learners English Tests. The impact of the test on instruction is evidenced not only by the teacher training, the network of English training and testing centers, and the national conferences and seminars on educational measurement for Cambridge Young Learners managed by UCLES and the Sino-British Center, but also by the large number of instructional materials linked to test content, which “span the instructional spectrum from multi-unit formal classroom programs to cheerful puzzle books for independent study, and these materials appear to focus on language instruction rather than rehearsal of test taking skills” (Bailey, 2005).

One positive consequence of using the test is that it promotes English teaching and learning in EFL contexts. Both EFL teachers and students can benefit greatly from the principles behind the Cambridge Young Learners English Tests: student-centered and activity-centered teaching, listening and speaking before writing and reading, interest first, communicative teaching and learning, and motivational testing. These are principles that formal EFL classrooms usually lack. In this sense, the purpose of the test is consistent with the instructional goals of EFL teachers and curricula.