A Project-Based Assessment Model for Judging the Effects of Technology Use in Comparison Group Studies

Henry Jay Becker

University of California, Irvine

Barbara E. Lovitts

American Institutes for Research

FINAL DRAFT

August 8, 2000

Abstract

Policymakers and the public continue to ask whether and under what conditions computers make a difference in student achievement. However, instruments used in standardized testing situations to compare students with and without computer experience are largely composed of tasks for which computer experience is not likely to have made a difference. At the same time, they omit a whole class of assessment tasks—tasks where computers might be important tools for the accomplishment of the task. Essentially, by having a limited range of tasks and employing a minimum-resource "standardized" testing environment, the tests which policymakers and the public pay attention to deny computer-capable students the ability to demonstrate important competencies they may have acquired.

On the other hand, innovative technology-exploiting curriculum development projects often define assessment tasks that require technology-specific competencies (e.g., spreadsheet proficiency), thereby making comparisons with students who lack computer experience hardly relevant. This approach fails to satisfy policymakers and a skeptical public who still want comparative data—data about competencies that might be aided by the use of technology but that could be attained by other means.

The research model developed here is based on defining outcome competencies and skills at a level of abstraction that permits computers to be used but does not require them to be used. The model envisions extended-in-time projects, in which participating teachers and researchers negotiate the boundaries for subject-matter content, component tasks and products, and ground rules to assure that the assessment activity supports each teacher’s instructional objectives and meets the standards researchers need for comparability, validity, and reliability. The paper outlines how this process might work and discusses the application of this model to a planned study of the effects of high-intensity technology settings at several levels of comprehensiveness. In this matched-design study, technology use and teacher pedagogical approach are systematically varied while other factors—subject, grade level, ability level, school context—are held constant as far as possible.

Prelude:

This paper is an exploratory foray into the subject of student assessment methodology by two sociologists--one, a survey researcher; the other, qualitatively trained. It is guided principally by a sense of urgency--a belief that current and impending public policy decisions about investment in computer technologies for teaching and learning in schools require widespread use of a type of research-based assessment that has rarely been employed. Given our backgrounds and limited experience in assessment design, we can only sketch an outline of what we believe is necessary. However, we are able to provide a rationale for why such a design is necessary, and we start the process of developing a model for how it might be accomplished, knowing that much will need to be changed before it can be implemented successfully on the scale needed to accomplish its objectives.

With anger in her voice, Linda Roberts, Director of the U.S. Department of Education’s Office of Educational Technology, often recounts her experience of being asked by Ted Koppel, host of ABC’s Nightline, on a program stacked with opponents of educational technology, to defend the level of public expenditure on computers for schools, given that most of the evidence provided has been anecdotal and, even then, can be explained by factors other than "technology." With all of the millions of dollars that schools have spent on computers, asked Koppel, where is the "national objective study…by someone who doesn't have an axe to grind" showing that all this money has made a difference? How do we know that kids and teachers are better off for it (Koppel, 1998)?

Try as she might to emphasize that no one argues that technology is a "magic bullet"—that effective programs using computers also depend upon knowledgeable teachers organizing instruction and resources in an effective way—Roberts left that interview frustrated that her instincts and intuitions about thoughtful and well-crafted uses of computers in school classrooms—that they do make a difference in children's competencies and their ability to be successful adults in the 21st century—were simply not being successfully communicated to a skeptical public.

Whenever large amounts of public funds are invested in moving schooling in a particular direction—funds that might otherwise be used in other ways--it is understandable that popular sentiment and the people responsible for making legislative and administrative policy decisions will phrase their concerns by asking the question "what are the effects?"[1] and at what cost compared to other avenues of investment.[2]

At the same time, the variance in outcomes of employing computers in educational programs is likely to be many times greater than the average effects. An instructional practice that makes substantial use of computers may be more or less effective depending on the appropriateness of the pedagogy for the outcomes measured; the convenience, density, and quality of the technology and software available; the preparation and understandings of the teachers and the students; and a whole host of other factors. Thus, any defensible evaluation of the cost-effectiveness of investments in educational technology cannot rely solely on providing data on mean effect sizes, but must assess the contribution of a wide variety of factors in systematically defining the conditions under which consequences of different magnitude arise with reasonable regularity.[3]
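To make this statistical point concrete, the comparative literature discussed later summarizes studies with a standardized effect size. The following is a brief illustrative sketch in standard effect-size and random-effects notation; the symbols are generic and are not drawn from any study cited in this paper.

\[
d \;=\; \frac{\bar{X}_{\text{technology}} - \bar{X}_{\text{comparison}}}{s_{\text{pooled}}},
\qquad
d_j \;\sim\; N\!\left(\bar{d},\; \tau^{2} + v_j\right)
\]

Here each study j contributes an observed effect d_j that varies around the mean effect \bar{d} both because of its own sampling error v_j and because of genuine between-study variation \tau^2. When \tau^2 is large relative to \bar{d}, as the argument above suggests it is for classroom technology use, reporting \bar{d} alone conceals exactly the conditional information that decision-makers need.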

Still, the underlying question about effectiveness, even when phrased conditionally and delimited by specific circumstances and applications, is…

  • a legitimate question;
  • one that needs to be addressed at an appropriate level of generalization so as to provide evidence for decision-making at legislative and administrative levels from the Congress to individual school principals and teachers; and
  • one that ought to be conceived of as inherently comparative—that is, compared to other alternatives that are available and plausible for reaching chosen outcomes.

Not every question about the effectiveness of computer-based approaches to teaching and learning must compare students with and without those experiences. Some of the rationale for having students learn to use computers relates to the putative value of students having those computer skills themselves, for their future use in roles of college student, worker, citizen, or cultural denizen. Whether such computer skills are valuable for students to learn (or which ones, or when, or depending on how they are taught, or for which students) is an issue subject primarily to rational argument rather than to evidence regarding student achievement. Of course, the related question of whether the skills that are taught are actually learned to the point of being useful is certainly an empirical one, and it may well lie behind policymaker and public concerns about the worth of school computer investments. However, the efficacy of computer skills training in schools is not a question that we address here.

The type of public policy question which we are trying to address is one that is most meaningful when expressed in terms of competencies that could be acquired without use of computer resources or tools. This issue includes questions about:

  • Students’ levels of understanding in the academic subjects of the school curriculum, their ability to gain further understanding in these areas, and their ability to apply what they know in practical ways.
  • Students' capacities to undertake a wide variety of tasks in various work, citizen, and community roles that involve integration of diverse competencies and understandings. These competencies are largely independent of specific subject-matter disciplines, and include skills in acquiring, evaluating, and using information and skills of working in groups to solve problems and accomplish tasks.

Research in Educational Technology: Why Are These Questions Not Being Answered?

If we grant that there is a legitimate public interest behind Koppel's question about the "effects" of technology investment in schools, we must ask why research on educational technology has not addressed this issue to date. Are the patterns of educational technology’s effects so idiosyncratic that we cannot provide empirically based guidance for school, district, state, and federal decision-makers regarding the consequences of different policies for investing in computer technology? Are the student outcomes from using computers so unique that they can't be compared to outcomes for students given other educational experiences? If educational research is not providing information that informs these policy decisions, what evidence is it providing, and why does that not assist policymakers?

Development-Linked Research

The research literature in educational technology has a number of genres. In one common approach, developers of unique computer-based tools present theoretical arguments for how their software product helps students attain understandings or skills not typically addressed in traditional curricula—for example, by making difficult concepts meaningful to students who are usually not considered sufficiently prepared to grasp these ideas, or by providing new forms of articulation or communication of ideas so that insights arise that might otherwise not occur. To the extent that this development-oriented literature reports on empirical studies of the consequences of using these software tools, one typically sees three types of evidence.

One type of evidence is anecdotal in nature and illustrative in intent: for example, protocols of individual students in the process of using the software or "screen shots" of students' work that demonstrate the researcher's perceptions of how students' viewpoints develop over the course of using the software. See, for example, Roschelle, Kaput, & Stroup (2000)'s portrait of a student's use of SimCalc, software designed to make the ideas of calculus accessible to younger students; or Lamon, Reeve, & Caswell (1999)'s use of CSILE, an instructional model and supportive software for collaborative inquiry and knowledge-building, with teacher education students.

A second type of study provides quantitative comparisons, over time, of the same students' ability to demonstrate competencies putatively associated with use of the specific software being studied. For a number of years, for example, researchers at Educational Testing Service studied a group of several dozen secondary school teachers' implementation of STELLA, software designed for evaluating and creating causal models of social and physical systems. Paper-and-pencil instruments were used on several occasions to assess students' ability to understand "systems concepts," and students were asked to write essays that would enable the researchers to learn how well students, at that point in their experience with the software, could build a model of a particular system. This research was conducted with the substantial involvement of implementing teachers, and the researchers abandoned their initial plan to include comparable measurements in classrooms taught by other teachers of the same subject, teachers who did not use the systems modeling software in their classes (Mandinach & Cline, 1994; Mandinach & Cline, 1996).

A related type of study contrasts teachers who implement the same technology-based innovation but in different ways (either naturally evolving differences or systematically prescribed variations). An example of this type of research is Allen & Thompson (1995)'s study of four classes in one elementary school that participated in the long-term Apple Classrooms of Tomorrow demonstration of high-density technology classrooms. In the Allen and Thompson study, the word processing products of students in two classes were communicated to an outside audience for feedback via e-mail, and the quality of their writing was compared with the products of word processing done in two other classes in the same school, where the work was turned in for traditional teacher comments.

In its best instantiations, development-linked research:

  • is premised on a cumulative research program derived from well-informed and empirically validated cognitive research principles;
  • self-consciously employs a variety of reflective and formative evaluation techniques to assess and improve both the underlying model and its implementation;
  • is sensitive to the specific circumstances in which it is being implemented and adapts to these circumstances without breaking with its theoretical principles; and thus
  • provides carefully researched and well elaborated models for using technology resources and tools in ways that can significantly improve children's depth of understanding of important academic content.

So, given its own objectives, development-linked research certainly makes a contribution to the quality of computer-based tools for learning and instruction. Indeed, it is the type of research most urgently advocated by the White House Panel on Educational Technology (President's Committee of Advisors on Science and Technology, 1997). However, development-linked research does not provide the kinds of direct information that would help policymakers set legislative policy or help school decision-makers make investment decisions.

Research associated with this tradition is, for the most part, designed either to help explain the rationale and functioning of the program to an outside audience or to assist the designers themselves in fine-tuning the product and its implementation. When systematic empirical research is conducted around development products, the outcomes studied tend to be defined in ways specific to the technology involved or for other reasons tend not to be measured in classroom settings where the technology is not being used.

Development projects often aim less at providing new ways for students to accomplish the same understandings as other instruction does than at developing curricula for new competencies and understandings that are not part of ordinary subject-matter instruction. As a result, the notion of comparing students who did not participate in the project with those who did founders on the lack of a common benchmark or common tools for making these comparisons.

Computer- vs. Non-Computer Comparisons: Standardized Testing Environments

In contrast to development-linked research, the other type of research widely present in the educational technology literature is inherently comparative—research that explicitly compares students who have used computers in school with students who have not. Most of this research is based on standardized norm-referenced tests. However, this research, too, is unsatisfactory for purposes of policy-making or for understanding the actual effects of students' computer experiences. Norm-referenced testing prioritizes a specific set of student outcomes of interest, as well as assessment procedures for measuring those outcomes, and does so in a way that seems likely to systematically under-represent what students are gaining from their in-school computer experiences.

Conventional measures of student "achievement" outcomes that have been developed for large-scale indicators research (e.g., NAEP, TIMSS) and for purposes of instructional program assignment of students and accountability (e.g., state standards-based tests, SAT-9) are often used to compare students who have had computer experience in school (or specific experiences) with those who have not. These standardized norm-referenced tests have been used in studies with large national datasets, such as Wenglinsky (1998)'s analysis of the effects of computer experience at school on NAEP mathematics scores. They are also the primary empirical basis for the dozen or more "meta-analyses" conducted over the past 30 years (e.g., Kulik, 1994), analyses which have accumulated the findings of hundreds of small studies of computers and student achievement, most fielded in a handful of classrooms.

The generic competencies measured in these tests, such as "mathematics computation" or "language arts mechanics," are assessed under carefully designed uniform testing conditions in which students are given a minimal set of materials and information, are given relatively specific directions, and are asked to answer specific multiple-choice or short-answer questions or to construct a brief handwritten explanation or essay. Indeed, providing a common information environment, identical (and necessarily limited) materials and tools, and identical student tasks has been one of the primary characteristics of a sophisticated model of assessment that aims to provide highly reliable information about individual students' abilities to demonstrate recall and understanding of subject-matter concepts and to employ related skills on demand.

Such an approach to assessment makes sense for measuring specific enabling skills such as reading comprehension and algorithmic work in mathematics. For those skills, students are likely to perform at levels that approach their underlying competence, despite being given little contextual information or other resources. Standardized testing environments may also usefully measure students' recall of factual and conceptual knowledge, as long as that knowledge could reasonably be assumed to have universally been a topic of instruction across diverse schooling environments.

However, the approach just described could not conceivably be sufficient for assessing the full range of student learning. There are many competencies for which creating a strictly limited information and resource environment, and giving precise directions regarding the nature of the problem or question to be addressed, would underestimate students' actual abilities. In particular, for competencies such as the ability to gain new knowledge or to apply knowledge to new contexts—as opposed to the ability to repeat ideas or facts previously remembered—it would seem to be quite a handicap for a student to have to demonstrate his competence without being able to use those particular tools and resources that he had come to rely upon because they were valuable in similar circumstances in the past. Thus, the assessment setting for standardized tests defines away any possible utility of computer-based tools and resources by preventing their use during the assessment situation. What is being tested, then, is whether the use of computers in learning has any residue that carries over to intellectual challenges students may face without being able to call upon technology tools or resources. In Salomon, Perkins, and Globerson's terms, norm-referenced outcome measurements test the effects "of technology" rather than the effects "with technology" (Salomon, Perkins, & Globerson, 1991).