Language Assessment: Opportunities and Challenges

Lyle F. Bachman

UCLA

In the past three decades the field of language testing has matured both in the breadth of the research questions it addresses and in the range of research methods at its disposal for addressing them. We still grapple with the nature of language ability and the validity of the inferences we make on the basis of assessment results. However, the field is also addressing difficult questions about how and why language assessments are used, the societal values that underlie such use, the consequences of assessment use, and the ethical responsibilities of test developers and users. Largely because of the increasing worldwide demand for accountability in K-12 education, where there are huge and growing numbers of students whose native language is not the language of instruction, and the growing need in the United States for individuals with high levels of foreign language proficiency, the greatest challenges that language assessment as a field now faces are in the arenas in which language tests are used to make decisions about individuals and institutions.

The immediate and long-term prospects for language assessment are filled with opportunities and challenges. There is a huge demand worldwide for greater involvement of individuals with expertise in language testing in the areas of classroom and accountability assessment. The assessment requirements of No Child Left Behind (NCLB) in the United States, for example, have created an increased demand for useful assessments. Recent initiatives of the United States government to increase our nation's capacity in foreign languages will also require useful assessments of foreign languages, particularly the less commonly taught languages. Similar demands for the involvement of individuals with expertise in language assessment can be found in countries around the globe. Turning these challenges into accomplishments will depend upon the willingness and capability of language testers to apply the knowledge and skills acquired over the past half century to the urgent practical assessment needs of our education systems and societies.

In this paper, I will list very briefly some of the ways in which I believe the field of language testing has matured over the past 30 years, and some of the issues of continuing concern. I will then briefly describe an “assessment use argument” as a conceptual framework for problematizing many of these issues and for providing a principled basis for bringing together the rich diversity of research approaches at our disposal in order to investigate them empirically. Finally, I will mention some of the challenges and opportunities that face language testers in the 21st century.

The Past 30 Years: Some Kudos

In an overview article 7 years ago, Bachman (2000) noted several areas of development in the field of language testing: a widening scope of inquiry, greater methodological diversity and sophistication, and advances in practice. As I believe these areas are still relevant, I will simply list them here, with some of Bachman’s references as well as some more recent ones.

Widening Scope of Inquiry

Language assessment research has widened its scope of inquiry in a number of ways. It has broadened its view of language ability, has come to recognize the variety and complexity of factors other than language ability that affect test performance, has engaged in a deepening conversation with researchers in SLA, has taken seriously the consequences of assessment use and issues of ethics and professionalism, and has become more deeply involved in issues of language assessment in schools and classrooms.

• Nature of language ability: Move from a dominant view of language ability/proficiency as a unitary or global ability (e.g., Lowe, 1985; Oller, 1979) to a view of language ability as multicomponential (e.g., Bachman & Palmer, 1996; Canale, 1983; Oller, 1983). (See also the references under Issues of Continuing Concern.)

• Factors that affect performance on language tests: Increased interest in and understanding of factors other than language ability (e.g., Anderson, Bachman, Cohen, & Perkins, 1991; Clapham, 1996; Cohen, 2007; Kobayashi, 2002; Lumley & O'Sullivan, 2005; Sasaki, 1996; Song & Cheng, 2006)

• Closer contact with SLA research issues (e.g., Bachman, 1989; Bachman & Cohen, 1998; Douglas & Selinker, 1985; Kunnan & Lakshamanan, 2006; Wigglesworth, 2001)

• Impact of language assessment on instructional practice (“washback”) (e.g., Alderson & Wall, 1993, 1996; Cheng, 1997; Cheng, Watanabe, & Curtis, 2004; Green, 2007; Wall, 1993)

• Issues of ethics and professionalism in language testing: Move from little consideration of ethical issues to a concern for such issues as central to the field (e.g., Boyd & Davies, 2002; Davies, 1997; Shohamy, 1997a, 1997b; Stansfield, 1993). (See also the references under Issues of Continuing Concern.)

• Increased involvement with K-12 and classroom language assessment: Move from virtually no interest in school-based or classroom assessment to a growing interest and body of research and practice in this area (e.g., Ke, 2006; Leung, 2004; Rea-Dickins, 2000, 2004)

Greater Methodological Diversity and Sophistication

Language testing researchers now routinely employ both quantitative and qualitative methodologies, both in the development of practical language assessments and in basic research. Some methodological approaches that were either nonexistent or barely used 30 years ago have become standard, mainstream tools for language assessment research and practice. In addition, language assessment researchers are increasingly finding that the use of mixed methods can greatly enhance the relevance and significance of our research.

• Quantitative approaches: Criterion-referenced measurement (e.g., Brown, 1989; Brown & Hudson, 2002; Hudson, 1991; Lynch & Davidson, 1994); Generalizability theory (e.g., Bachman, Lynch, & Mason, 1995; Bolus, Hinofotis, & Bailey, 1982; Kunnan, 1992; Schoonen, 2005; Stansfield & Kenyon, 1992); Item-response theory (e.g., Bonk & Ockey, 2003; Choi & Bachman, 1992; Henning, 1984, 1992; McNamara, 1990; O'Loughlin, 2002; Weigle, 1994); Structural equation modeling (e.g., Bachman & Palmer, 1981; Choi, Kim, & Boo, 2003; Kunnan, 1998; Shin, 2005; Xi, 2005)
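To give a concrete sense of what these measurement models do, consider item response theory. In the two-parameter logistic model, for example, the probability that test taker j answers item i correctly is modeled as a function of the test taker's ability and two item parameters; the notation below is standard textbook notation rather than that of any particular study cited above:

    P(X_{ij} = 1 | \theta_j) = \frac{\exp[a_i(\theta_j - b_i)]}{1 + \exp[a_i(\theta_j - b_i)]}

where \theta_j is test taker j's latent ability, b_i is the difficulty of item i, and a_i is its discrimination. Generalizability theory, by contrast, partitions observed score variance into components attributable to persons, tasks, raters, and their interactions, which is what makes it attractive for rater-mediated performance assessments.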

• Qualitative approaches: Conversation/discourse analysis (e.g., Brown, 2003; Huhta, Kalaja, & Pitkanen-Huhta, 2006; Lazaraton, 1996, 2002; Swain, 2001; van Lier, 1989); Verbal protocol analysis (e.g., Buck, 1991; Cohen, 1984; Lumley, 2002; Uiterwijk & Vallen, 2005)

• Mixed methods (e.g., Anderson et al., 1991; Brown, 2003; Clapham, 1996; North, 2000; O'Loughlin, 2001; Sasaki, 1996; Uiterwijk & Vallen, 2005; Weigle, 1994)

Advances in Practice

The past 30 years have also seen advances in language assessment practice in several areas.

• Cross-cultural pragmatics (e.g., Hudson, 1993; papers in Hudson & Brown, 2001; Hudson, Detmer, & Brown, 1992; Roever, 2006; Yamashita, 1996)

• Languages for specific purposes (e.g., Douglas, 2000; Hamp-Lyons & Lumley, 2001; Skehan, 1984; Weir, 1983)

• Vocabulary (e.g., Laufer & Nation, 1999; Meara & Buxton, 1987; Read, 1993, 2000; Read & Chapelle, 2001)

• Computer/web-based language assessment (e.g., Alderson & Windeatt, 1991; Chalhoub-Deville, 1999; Chapelle, 1997; Chapelle & Douglas, 2006; Hicks, 1986)

Issues of Continuing Concern

Nature of Language Ability

One major area of inquiry continues to be the nature of language ability. The dominant view in the field continues to be that language ability consists of a number of interrelated areas, such as grammatical knowledge, textual knowledge, and pragmatic knowledge, and that these areas of language knowledge are managed by a set of metacognitive strategies that also determine how language ability is realized in language use, or the situated negotiation of meaning (Bachman, 1990; Bachman & Palmer, 1996; Chapelle, 1998, 2006). Recently, however, researchers who focus more closely on the nature of the interactions in language use have argued that the view of language ability as solely a cognitive attribute of language users ignores the essentially social nature of the interactions that take place in discourse. These researchers argue that language ability resides in the contextualized interactions, or discursive practices, that characterize language use (e.g., Chalhoub-Deville, 1995, 2003; Chalhoub-Deville & Deville, 2005; McNamara, 1997, 2003; Young, 2000). In a critical review of this debate, Bachman (2007) identified three different approaches to defining language ability: (a) ability-focused, (b) task-focused, and (c) interaction-focused. He concluded that the theoretical issues raised by these different approaches to defining the construct of language ability present challenging questions both for empirical research in language testing and for practical test design, development, and use. For language testing research, these questions imply the need for a much broader methodological approach, involving both so-called quantitative and qualitative perspectives. For language testing practice, they imply that a focus on ability, task, or interaction to the exclusion of the others will lead to weaknesses in the assessment itself, or to limitations on the uses for which the assessment is appropriate.

A closely related issue is the extent to which language ability includes topical knowledge. The effect of test takers’ topical or content knowledge on language test performance is well documented in the language assessment literature (e.g., Alderson & Urquhart, 1985; Clapham, 1996; Douglas & Selinker, 1993; Pappajohn, 1999), and the dominant view has been that this is a source of bias in language tests.[1] That is, it is either generally assumed or specifically stated, in designing a language test and interpreting scores from such a test, that “language knowledge” or “language ability” is what we want to assess, and not test takers’ content knowledge. An alternative, or perhaps complementary, view has been articulated in the area of languages for specific purposes (LSP) assessment. According to this view, what we want to assess is what Douglas (2000) has called “specific purpose language ability,” which is a combination of language ability and background knowledge. Davies (2001) has argued that LSP assessment has no theoretical basis, but can be justified largely on pragmatic grounds. Bachman and Palmer (1996) argued that whether one includes topical knowledge as part of the construct to be assessed in a language test is essentially a function of the specific purpose for which the test is intended and the levels of topical knowledge that the test developer can assume test takers have.

Uses of Language Assessments

Although validity and validation continue to be a major focus of language assessment research (e.g., Bachman, 2005; Chapelle et al., 2004), this is no longer the sole, or even the dominant, concern of the field. Language testers are investigating difficult questions about how and why language assessments are used; the ethical responsibilities of test developers and users (e.g., Bishop, 2004; Boyd & Davies, 2002; Davies, 2004; McNamara, 1998, 2001); fairness in language assessment (e.g., Elder, 1997; Kunnan, 2000, 2004); the impact and consequences of assessment use (e.g., Hawkey, 2006; Shohamy, 2001), particularly on instructional practice (e.g., Alderson & Wall, 1993; Bailey, 1996; Cheng, 1997; Cheng, Watanabe, & Curtis, 2004; Qi, 2005; Wall, 1996, 2005); and the societal values and larger sociocultural contexts that underlie such use (e.g., McNamara & Roever, 2006). What I find extremely encouraging is that these two strands of research and concern are coming together in a growing body of research that investigates both the validity of score interpretations and the consequences of assessment use (e.g., Bachman, 2005, 2006; papers in Kunnan, 2000; Reath, 2004).

Differing Epistemologies

McNamara (2006) argued that two distinct epistemologies, “quantitative” and “qualitative,” have evolved in the field, and that the vigorous debate these have spurred is healthy for the field. This debate reflects the larger historical debate that has engaged researchers in applied linguistics, education, and the social sciences for decades. Bachman (2006) pointed out that many characterizations of these differences are overly simplistic and described them not as holistic methodologies, but in terms of several different dimensions.

An ongoing critical examination of the epistemological foundations of our research approaches is, as McNamara and Bachman have argued, essential to the vitality of our field (see, for example, the papers in Chalhoub-Deville, Chapelle, & Duff, 2006). To facilitate such a critical discourse, I believe that we need an epistemology that provides a principled approach to addressing our concerns with both validity and consequences, using whatever research approaches and tools are appropriate and at our disposal. As noted previously, the arsenal of methodological approaches to language assessment research is considerable. What, until recently, has been lacking is a principled basis for linking our concerns with validity and consequences in a way that provides a rationale for combining qualitative and quantitative approaches to research. In my view, an “assessment use argument” (AUA), as described by Bachman (2005) and Bachman and Palmer (in press), provides such a basis.

Assessment Use Argument

Drawing on argument-based approaches to validity in educational measurement (e.g., Kane, 2001; Kane, Crooks, & Cohen, 1999; Mislevy, Steinberg, & Almond, 2002), Bachman (2005, 2006) has described what he calls an “assessment use argument” as a conceptual framework for linking inferences from assessment performance to interpretation and use. Bachman and Palmer (in press) elaborate on this, describing an AUA as a series of data-claim links, based on Toulmin’s (2003) structure of practical reasoning. An AUA explicitly states the interpretations and decisions that are to be based on assessment performance as well as the consequences of using an assessment and of the decisions that are made. Bachman and Palmer argue that an AUA provides an overarching inferential framework to guide the design and development of language assessments and the interpretation and use of language assessment results.
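For readers who find a schematic helpful, the chain of Toulmin-style links in an AUA can be sketched in a short program. The sketch below is a minimal illustration only; the class name, fields, and the wording of the claims are illustrative simplifications grounded in the description above, not Bachman and Palmer's formulation.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Claim:
        # One Toulmin-style link in an assessment use argument (AUA):
        # a claim grounded in data, supported by warrants, and open to
        # rebuttals that evidence must weaken or rule out (Toulmin, 2003).
        statement: str                                      # the claim itself
        data: str                                           # what the claim is based on
        warrants: List[str] = field(default_factory=list)   # why the data support the claim
        rebuttals: List[str] = field(default_factory=list)  # alternative explanations

    # The outcome named in each claim serves as the data for the next,
    # so the links form a chain from assessment performance to consequences.
    aua_chain = [
        Claim(statement="assessment records (scores) reflect the observed performance",
              data="test takers' performance on assessment tasks"),
        Claim(statement="interpretations of ability are meaningful, impartial, "
                        "generalizable, and relevant",
              data="assessment records (scores)"),
        Claim(statement="decisions are sensitive to community values and equitable",
              data="interpretations of language ability"),
        Claim(statement="consequences of the assessment are beneficial to stakeholders",
              data="decisions made about individuals or institutions"),
    ]

    for link in aua_chain:
        print(f"{link.data}  ->  {link.statement}")

Read from top to bottom, the chain traces interpretation and use; read from bottom to top, it traces the design sequence described below, in which the developer starts from the intended consequences.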

An AUA consists of a series of claims, which can be illustrated as in Figure 1:

Figure 1. The claims of an assessment use argument, represented as rectangles linked by two-way arrows.

The arrows between the rectangles go both ways to illustrate that the claims, which may also be stated as questions, serve as a guide both for test development and for the interpretation and use of assessment results. In using an AUA to design and develop an assessment, the developer would first ask what the consequences of using the assessment might be, and the extent to which these will be beneficial to stakeholders. He or she would then consider the decisions to be made, and whether these are sensitive to existing community values[2] and equitable with respect to different groups of stakeholders. Next, the developer would consider the interpretations that are needed to make the intended decisions, and the extent to which these will be:

· meaningful with respect to a general theory of language ability or a particular learning syllabus,

· impartial to all groups of test takers,

· generalizable to the intended target language use domain,

· relevant to the decision to be made, and