Report – General Education Assessment Review (GEAR) Sub-Group

Evaluation of Collegiate Learning Assessment (CLA)/Community College Learning Assessment (CCLA) for Use in Strengthened Campus-Based Assessment (SCBA)

May 2009

  1. Introduction/Background Information

SCBA and GEAR

After the SUNY Board of Trustees passed its Resolution on Strengthened Campus-Based Assessment (SCBA) in June 2004, SUNY System Administration assigned the SUNY-wide General Education Assessment Review Group (GEAR) significant responsibility for implementing this initiative. One major task was to review and identify nationally-normed measures that campuses could use, if they so chose, to assess the student learning outcomes included under the SUNY GER Knowledge and Skills Areas of Mathematics and Basic Communication [Written] and the Critical Thinking [Reasoning] Competency.

During the 2004-05 academic year, GEAR reviewed many existing nationally-normed tests in these three outcome areas, including the ACT CAAP, the ETS MAPP, the Graduate Record Examination, AccuPlacer, and COMPASS. Ultimately, GEAR found that only the ACT CAAP test for Critical Thinking and two ACT CAAP Writing modules (one for the first Written Communication outcome and one for the second) met its review criteria.[1] Since that time, GEAR has identified no additional nationally-normed tests as appropriate for assessing either the Critical Thinking [Reasoning] or Basic Communication [Written] outcomes.

In recent years, several SUNY campuses have expressed interest in using the Collegiate Learning Assessment (CLA) or the Community College Learning Assessment (CCLA) for SCBA. The CLA/CCLA, which has been widely available to and used by colleges and universities only since 2004, is intended to assess writing, critical thinking, and problem-solving skills.[2] During the Spring 2009 semester, a GEAR sub-group consisting of Robert Axelrod, Patty Francis, Tina Good, Milton Johnson, Rose Rudnitski, and Melanie Vainder reviewed the CLA/CCLA regarding its appropriateness for use as part of SCBA. This report summarizes the sub-group’s major conclusions and offers recommendations with respect to this issue.

CLA/CCLA

The CLA/CCLA, developed and administered by the Council for Aid to Education (CAE), differs substantively from other existing nationally-normed tests in a number of ways. Perhaps the most important difference is that the CLA/CCLA is not a multiple-choice test. Instead, it asks students to evaluate a complex set of materials, or “prompts,” and to construct written responses. The CLA/CCLA must be administered under controlled conditions in a computer lab, and students have two hours to take the test; typical test-taking time (depending on the particular task a student receives) ranges from 75 to 90 minutes. Answers are scored by trained evaluators, although CAE has recently piloted machine scoring for some tasks. An explanation of CLA/CCLA scoring can be found at

Another difference is that CAE’s analysis of results corrects statistically for students’ incoming SAT/ACT scores, providing a “purer” estimate of student performance (i.e., students’ CLA/CCLA scores cannot be attributed simply to the fact that they entered the institution with more or less pre-collegiate ability as assessed by the SAT/ACT).[3] Similarly, for final-semester students, CLA/CCLA scores are corrected statistically for students’ college GPA.
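
To illustrate the logic of this correction, the sketch below (in Python; all score values are hypothetical, and CAE’s actual statistical model is more elaborate and proprietary) computes a simple deviation score: each student’s actual CLA score minus the score predicted from his or her SAT score by ordinary least squares. A positive deviation indicates performance above expectation.

    import numpy as np

    def deviation_scores(cla_scores, sat_scores):
        """Actual CLA score minus the CLA score predicted from
        SAT via a simple least-squares regression line."""
        cla = np.asarray(cla_scores, dtype=float)
        sat = np.asarray(sat_scores, dtype=float)
        slope, intercept = np.polyfit(sat, cla, 1)
        expected = intercept + slope * sat
        return cla - expected  # positive = above expectation

    # Hypothetical paired scores for five students.
    cla = [1050, 1120, 980, 1200, 1010]
    sat = [1100, 1250, 990, 1300, 1050]
    print(deviation_scores(cla, sat))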

Yet another difference is that the CLA/CCLA is “inherently” value-added, meaning that campuses receive three sets of scores: 1) those for first-semester students; 2) those for final-semester students; and 3) those indicating differences, or value-added, between the two student cohorts.[4] Campuses may choose between a cross-sectional and a longitudinal approach. The cross-sectional approach tests first-semester students in the fall semester and final-semester students in the spring semester of the same academic year. The longitudinal approach tests the same students at both points (i.e., upon entry and again after four years at four-year schools or after two years at community colleges). For obvious reasons, most institutions have elected to utilize the cross-sectional approach, so the current discussion will focus on that particular design.
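
Under the cross-sectional design, value-added can be understood informally as the difference between the two cohorts’ mean deviation scores (see the sketch above). The following minimal sketch, again in Python with hypothetical values rather than CAE’s actual computation, illustrates:

    import numpy as np

    def value_added(freshman_dev, senior_dev):
        """Cross-sectional value-added: mean deviation score of the
        final-semester cohort minus that of the first-semester cohort."""
        return float(np.mean(senior_dev) - np.mean(freshman_dev))

    # Hypothetical deviation scores for the two cohorts.
    freshmen = [-12.0, 5.0, -3.0, 8.0]
    seniors = [20.0, 35.0, 12.0, 28.0]
    print(value_added(freshmen, seniors))  # positive = apparent growth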

With respect to sampling, the participating institution tests 100 first-semester and 100 final-semester students. Prior to testing, the institution must submit to CAE a sampling plan detailing how students will be selected for participation in order to assure that samples are representative, and institutions cannot proceed with testing until their sampling plan is approved. As a further safeguard, campuses must report, subsequent to testing, the average SAT/ACT scores and GPA (for final-semester students) of the population from which the sample was drawn. Most important, because the analyses of results correct for student SAT/ACT scores and GPA, campuses have little incentive to intentionally select students with higher SAT/ACT scores or GPAs.
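
A campus can also verify representativeness directly. The sketch below (Python with SciPy; the sample scores and population mean are hypothetical) uses a one-sample t-test to ask whether the tested students’ mean SAT score differs significantly from that of the population from which they were drawn:

    import numpy as np
    from scipy import stats

    def check_representativeness(sample_sat, population_mean_sat, alpha=0.05):
        """One-sample t-test of whether the sample's mean SAT score
        differs from the population mean it was drawn from."""
        t, p = stats.ttest_1samp(sample_sat, population_mean_sat)
        return {"t": round(float(t), 3), "p": round(float(p), 3),
                "representative": p >= alpha}

    # Hypothetical SAT scores for 100 tested first-semester students.
    sample = np.random.default_rng(0).normal(1080, 120, size=100)
    print(check_representativeness(sample, population_mean_sat=1075))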

  2. Evaluation Process

In reviewing the CLA/CCLA, the GEAR sub-group used the same criteria that were employed when the GEAR Group conducted its original reviews of existing tests in 2004-05. These criteria are explained below:

  1. Does the test map to the SUNY SCBA outcomes and, if so, which ones?
  2. Does the report on student test performance received by the institution provide sub-scores for specific SCBA outcomes, making it possible for institutions to determine strengths and weaknesses in that performance?
  3. Is the test psychometrically sound (i.e., in terms of reliability, validity, and standardization)?
  4. Can the test be administered within the context of a class session?[5]

During its initial conference call, the sub-group agreed that members would conduct preliminary research and share articles/sources with each other electronically within approximately one month, followed by a three-week period in which members would review all materials received. Members then participated in a second conference call to discuss their impressions and reach final conclusions.

  3. Sub-Group Conclusions

The following presentation of sub-group conclusions is organized by the four evaluative criteria described above and also includes a discussion of other issues raised during the sub-group’s deliberations.

  1. Mapping of CLA/CCLA Items to Outcomes

The GEAR sub-group concluded that the CLA/CCLA does map satisfactorily to the two Critical Thinking outcomes (i.e., “Identify, analyze, and evaluate arguments as they occur in their own or others’ work” and “Develop well-reasoned arguments”) and to the first Basic Communication [Written] outcome (i.e., “Produce coherent texts within common college-level written forms”).

  2. Reporting of Sub-scores for Individual SCBA Outcomes

A review of the institutional report received from CAE confirms that separate sub-scores are provided for the two Critical Thinking outcomes and one Writing outcome described immediately above. To be specific, this report provides summary information for students’ performance on the “Critique-an-Argument” task (i.e., Critical Thinking Outcome #1), the “Make-an-Argument” task (i.e., Critical Thinking Outcome #2), and the Analytic Writing task (i.e., Writing Outcome #1).

  3. Psychometric Properties

Because the CCLA was offered for the first time in 2008-09, the psychometric data summarized in this section were gathered on the CLA. These data suggest that the CLA is characterized by reasonable reliability and validity. For inter-rater reliability, reported estimates range from .76 to .87 for performance tasks (mean = .81), and mean estimates of .65 and .80 have been reported for the Make-an-Argument and Critique-an-Argument tasks, respectively. Internal consistency is very good (i.e., ranging from .82 to .91 for Writing and from .84 to .92 for Performance Tasks).
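
For readers less familiar with these statistics, the sketch below (Python; all scores are hypothetical) illustrates how the two kinds of reliability estimates are computed: Cronbach’s alpha for internal consistency, and a Pearson correlation between two raters’ scores for inter-rater reliability.

    import numpy as np

    def cronbach_alpha(items):
        """Cronbach's alpha for an (n_students x n_items) score matrix."""
        items = np.asarray(items, dtype=float)
        k = items.shape[1]
        item_vars = items.var(axis=0, ddof=1).sum()
        total_var = items.sum(axis=1).var(ddof=1)
        return (k / (k - 1)) * (1 - item_vars / total_var)

    def interrater_r(rater_a, rater_b):
        """Inter-rater reliability as the Pearson correlation between
        two raters' scores for the same set of responses."""
        return float(np.corrcoef(rater_a, rater_b)[0, 1])

    # Hypothetical scores: five students by three scored components.
    scores = [[4, 5, 4], [3, 3, 4], [5, 5, 5], [2, 3, 2], [4, 4, 5]]
    print(cronbach_alpha(scores))
    print(interrater_r([4, 3, 5, 2, 4], [5, 3, 5, 2, 4]))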

Concurrent validity has been demonstrated between the CLA and SAT/ACT scores, with correlations ranging from .73 to .88 for Writing Tasks and from .78 to .92 for Performance Tasks. In addition, institutions typically report correlations between students’ CLA performance and their GPAs. At present, as part of a large FIPSE grant, the construct and concurrent validity of the CLA, ACT CAAP, and ETS MAPP are being investigated. When complete, this study should provide much more useful information about the validity of all three instruments.
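
Coefficients of this kind are simple Pearson correlations between students’ paired scores on the two instruments, as the brief sketch below illustrates (Python with SciPy; the paired scores are hypothetical):

    from scipy import stats

    # Hypothetical paired scores for the same seven students.
    cla = [1050, 1120, 980, 1200, 1010, 1150, 1090]
    sat = [1100, 1250, 990, 1300, 1050, 1220, 1130]

    r, p = stats.pearsonr(cla, sat)
    print(f"concurrent validity r = {r:.2f} (p = {p:.3f})")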

With respect to standardization, the CLA has primarily been administered to college-age students from a wide variety of institution types, yielding significant normative data on student performance. At present, more than 350 institutions overall have participated in the CLA. A listing of institutions participating in the CLA during the 2008-09 academic year can be found at

A more thorough description of the CLA’s psychometric properties can be found at

  4. Test Administration

Because the CLA/CCLA must be administered under controlled conditions in a computer laboratory, and because up to two hours must be allowed for testing, GEAR sub-group members noted that it would be difficult for campuses to administer the test entirely within the context of a class session. As a result, campuses proposing to use the CLA/CCLA as part of SCBA, and to receive funding for its use, would need to provide detailed information to GEAR as to how they would motivate students to participate in the testing and perform to the best of their ability.

  5. Other Issues

In addition to concerns that have already been described, GEAR sub-group members expressed reservations regarding four other issues related to the administration of the CLA/CCLA. First, the prescribed sample size of 100 for each student cohort struck some members as possibly problematic, and perhaps inappropriate for institutions with very high or very low enrollments.[6] Related to this concern, despite the fact that CAE has procedures in place to help assure that testing samples are representative of the student body overall, it is important for GEAR itself to be satisfied with a campus’ sampling strategy. A third concern revolved around the CLA/CCLA’s assessment of “value-added,” based on the observation that such information might easily be over- or under-estimated if rigorous sampling procedures are not followed. Finally, the GEAR sub-group noted that, compared to other nationally-normed tests, the CLA/CCLA is complex and somewhat difficult to understand and explain, and has administration requirements not all campuses will be able to meet.

  4. Sub-Group Recommendations

Based on its review of materials and subsequent discussion, the GEAR sub-group offers the following recommendations:

  1. Campuses should be given the option to use the CLA/CCLA on a pilot basis (i.e., for one administration) to measure the following SCBA outcomes: a) Critical Thinking Outcome #1; b) Critical Thinking Outcome #2; and/or c) Basic Communication [Written] Outcome #1.
  2. Campuses that choose this option should submit to GEAR a revised general education assessment plan describing these changes to their existing, approved plan.
  3. These revised plans must address all nine evaluative criteria, as appropriate, included in GEAR’s Review Process Guidelines.[7] In particular, the campus must describe in detail how it will assure:
     a. That the student samples to be used in both the first-year and senior groups are representative of the student groups from which they are selected. Strategies for achieving this objective include conducting analyses to demonstrate that the samples do not differ significantly from the overall first-year or senior student groups with respect to factors such as ACT/SAT scores, GPAs, or course-taking patterns.
     b. That students participating in the testing are adequately motivated to perform well on the CLA/CCLA.
     c. That potential problems associated with value-added designs are adequately controlled. Campuses administering the CLA/CCLA cross-sectionally would need to address concerns related to sample comparability, while those administering it longitudinally would have to address concerns related to student attrition.
  4. GEAR should provide campuses with a thorough description of the CLA/CCLA during the summer of 2009, to include specific suggestions regarding sample size, “value-added,” administration procedures, and student motivation.
  5. System Administration, working with GEAR, should ask CAE to sponsor a special Webinar during the summer of 2009 for SUNY campuses interested in using the CLA/CCLA.
  6. Subsequent to the assessment, campuses participating in the pilot should be required to provide two separate reports:
     a. A report on results to System Administration, using the standard Summary Report form required by System Administration and available at (Note that the form is scheduled to be updated this spring.)
     b. A report to GEAR, using a special form developed by GEAR for this purpose. This form will request feedback from the campus regarding the adequacy of the CLA/CCLA in mapping to the outcomes of interest, as well as the campus’s procedures for assuring student representativeness, motivation, and sample comparability and for controlling against attrition. GEAR will use this information to determine whether the CLA/CCLA is appropriate for longer-term use.


[1] These tests can be found at

[2] A full description of the CLA can be found at ; the CCLA is described at

[3] Students who do not have ACT/SAT scores are asked to complete a 15-minute Scholastic Level Evaluation (SLE), which correlates positively with ACT/SAT measures.

[4] The CLA compares freshmen and seniors whereas the CCLA compares freshmen and second-year students.

[5] Unlike the other three criteria, this last criterion is more “preferred” than required, as GEAR does allow campuses to administer measures in stand-alone fashion, as long as the campuses clarify how they will address possible problems related to student motivation and representativeness.

[6] Institutions are free to over-sample, but must pay for each additional student tested.

[7] These guidelines can be found at