AOM PAPER 13912

Comprehensive Assessment of Team Member Effectiveness: A Behaviorally Anchored Rating Scale

Abstract

Instructors who use team-based learning methods often rely on self- and peer evaluations to assess individual contributions to team projects, whether to provide developmental feedback or to adjust team grades for individual performance. This paper describes the development of a behaviorally anchored rating scale (BARS) version of the Comprehensive Assessment of Team Member Effectiveness (CATME). CATME is a published, Likert-scale instrument for self- and peer evaluation that measures team-member behaviors that the teamwork literature has shown to be effective. It assesses students’ performance in five areas of team-member contribution: contributing to the team’s work, interacting with teammates, keeping the team on track, expecting quality, and having relevant knowledge, skills, and abilities. A BARS version has several advantages for instructors and students: in particular, it can help students better understand effective and ineffective team-member behaviors, and it requires fewer rating decisions.

Two studies were conducted to evaluate the instrument. The first shows that the CATME BARS instrument measures the same dimensions as the CATME Likert instrument, linking the BARS version to the theoretical underpinnings of the Likert version. The second provides evidence of convergent and predictive validity for the CATME BARS. Ways to use the instrument and areas for future research are discussed.

KEYWORDS: peer evaluation, team-member effectiveness, teamwork

College faculty in many disciplines are integrating team-based and peer-based learning methods into their courses in an effort to help students develop the interpersonal and teamwork skills that employers strongly desire but students often lack (Boni, Weingart, & Evenson, 2009; Cassidy, 2006; Verzat, Byrne, & Fayolle, 2009). Further, students are often asked to work in teams as part of experiential, service-learning, research-based learning, and simulation-based learning projects, which aim to give students more hands-on practice and make them more active participants in their education than traditional lecture-based coursework (Kolb & Kolb, 2005; Loyd, Kern, & Thompson, 2005; Raelin, 2006; Zantow, Knowlton, & Sharp, 2005). As a result, there is a recognized need for students to learn teamwork skills as part of their college education, and teamwork is increasingly integrated into the curriculum, yet “teamwork competencies and skills are rarely developed” (Chen, Donahue, & Klimoski, 2004: 28).

Teamwork presents challenges for both students and faculty. Student teams often have difficulty working effectively together and may experience problems such as poor communication, conflict, and social loafing (Burdett, 2003; Verzat et al., 2009). For instructors, the challenges of managing teamwork include preventing or remediating these student issues and assigning fair grades based on students’ individual contributions (Chen & Lou, 2004). Evaluating students’ individual contributions to teamwork in order to develop team skills and deter students from free riding on the efforts of teammates is an important part of team-based instructional methods (Michaelsen, Knight, & Fink, 2004). This paper describes the development and testing of an instrument for self- and peer evaluation of individual members’ contributions to the team, for use in educational settings where team-based instructional methods are employed. The results of two studies support the use of the instrument.

LITERATURE REVIEW

There are several reasons why teamwork is prevalent in higher education. First, educational institutions need to teach students to work effectively in teams to prepare them for careers, because many graduates will work in organizations that use team-based work methods. About half of U.S. companies and 81% of Fortune 500 companies use team-based work structures because they are often more effective or efficient than individual work (Hoegl & Gemuenden, 2001; Lawler, Mohrman, & Benson, 2001). Second, team-based or collaborative learning is being used with greater frequency in many disciplines to achieve the educational benefits of students working together on learning tasks (Michaelsen et al., 2004). Team learning makes students active in the learning process, allows them to learn from one another, makes them responsible for doing their part to facilitate learning, and increases their satisfaction with the experience compared to lecture classes (Michaelsen et al., 2004). Finally, accreditation standards in some disciplines require that students demonstrate an ability to work in teams (ABET, 2000; Commission on Accreditation for Dietetics Education, 2008).

Self- and Peer Evaluations

Many team-based learning advocates agree that the approach works best if team grades are adjusted for individual performance. Although instructors could use various methods to determine grade adjustments, peer evaluations of teammates’ contributions are the most commonly used method. Millis and Cottell (1998) give strong reasons for using peer evaluation to adjust student grades. Among these are that students, as members of a team, are better able to observe and evaluate members’ contributions than are instructors, who are outsiders to the team. In addition, peer evaluations increase students’ accountability to their teammates and reduce free riding. Furthermore, students learn from the process of completing peer evaluations. Self- and peer evaluations may also be used to provide feedback in order to improve team skills (Chen, Donahue, & Klimoski, 2004).

There is a fairly large body of management research that examines peer ratings, which are used in work contexts such as 360-degree performance appraisal systems and assessment center selection techniques, in addition to executive education and traditional educational settings (Fellenz, 2006; Hooijberg & Lane, 2009; London, Smither, & Adsit, 1997; Saavedra & Kwun, 1993; Shore, Shore, & Thornton, 1992). Several meta-analytic studies have examined the inter-rater reliability of peer evaluation scores and their correlations with other rater sources, finding that peer ratings are positively correlated with other rating sources and have good predictive validity for various performance criteria (Conway & Huffcutt, 1997; Harris & Schaubroeck, 1988; Schmitt, Gooding, Noe, & Kirsch, 1984; Viswesvaran, Ones, & Schmidt, 1996; Viswesvaran et al., 2005).

In work and educational settings, peer evaluations are often used as part of a comprehensive assessment of individuals’ job performance to take advantage of information that only peers are in a position to observe. Thus, peer ratings are often one part of a broader performance assessment that also includes ratings by supervisors, instructors, or other sources (Viswesvaran, Schmidt, & Ones, 2005). In addition to assessing performance, peer evaluations of teamwork reduce the tendency of some team members to free-ride on the efforts of others. Completing peer evaluations also helps individuals to understand the performance criteria and how everyone will be evaluated (Sheppard, Chen, Schaeffer, Steinbeck, Neumann, & Ko, 2004).

Self-appraisals are valuable because they involve ratees in the rating process and provide information that facilitates a discussion of an individual’s performance; research in the performance appraisal literature has also shown that ratees believe self-appraisals should be included in the evaluation process and want to provide information about their own performance (Inderrieden, Allen, & Keaveny, 2004). Self-appraisals are, however, vulnerable to leniency errors (Inderrieden et al., 2004). An additional problem is that people who are unskilled in an area are often unable to recognize deficiencies in their own skills or performance, because their lack of competence deprives them of the metacognitive skills needed to recognize those deficiencies (Kruger & Dunning, 1999). People with poor teamwork skills are therefore likely to overestimate their own abilities and contributions to the team.

A number of problems are associated with both self and peer ratings (Saavedra & Kwun, 1993). One is that many raters, particularly average and below-average performers, do not differentiate among team members in their ratings even when differentiation is warranted. Furthermore, when rating team members’ performance, raters tend to use a social comparison framework, rating members relative to one another rather than against objective, independent criteria. Also, some raters worry that providing accurate ratings would damage social relations within the team.

In spite of these limitations, instructors commonly use self- and peer evaluations because implementing team-based learning effectively requires a measure of students’ teamwork contributions that can be used to adjust grades. However, although the use of peer evaluations is commonplace, there is no widespread agreement about what a system for self- and peer evaluation should look like. This paper describes the development of a behaviorally anchored rating scale (BARS) instrument for the self- and peer evaluation of students’ contributions to teamwork in higher education classes, grounded in research on the types of contributions team members must make to be effective.

CATME Likert-Scale Instrument

Although a large body of literature examines the individual attributes and behaviors associated with effective team performance (Stewart, 2006), there has been no generally accepted instrument for assessing individuals’ contributions to the team. Some peer evaluation instruments and systems, such as having students divide points among team members, have been published, but they have not become standard in college classes (e.g., Fellenz, 2006; Gatfield, 1999; Gueldenzoph & May, 2002; McGourty & De Meuse, 2001; Moran, Musselwhite, & Zenger, 1996; Rosenstein & Dickinson, 1996; Sheppard, Chen, Schaeffer, Steinbeck, Neumann, & Ko, 2004; Taggar & Brown, 2001; Van Duzer & McMartin, 2000; Wheelan, 1999). Further, not all published peer evaluation instruments were designed to align closely with the literature on team-member effectiveness. To meet the need for a research-based peer evaluation instrument for use in college classes, Loughry, Ohland, and Moore (2007) developed the Comprehensive Assessment of Team Member Effectiveness (CATME). The researchers used the teamwork literature to identify the ways in which team members can help their team to be effective. They drew heavily on the work of Anderson and West (1998), Campion, Medsker, and Higgs (1993), Cannon-Bowers, Tannenbaum, Salas, and Volpe (1995), Erez, LePine, and Elms (2002), Guzzo, Yost, Campbell, and Shea (1993), Hedge, Bruskiewicz, Logan, Hanson, and Buck (1999), Hyatt and Ruddy (1997), McAllister (1995), McIntyre and Salas (1995), Niehoff and Moorman (1993), Spich and Keleman (1985), Stevens and Campion (1994, 1996), and Weldon, Jehn, and Pradhan (1991). Based on the literature, they created a large pool of potential items, which they then tested in two large surveys of college students, using exploratory and confirmatory factor analysis to select the items for the final instrument. They found 29 specific types of team-member contributions that clustered into five broad categories (contributing to the team’s work, interacting with teammates, keeping the team on track, expecting quality, and having relevant knowledge, skills, and abilities). The full (87-item) version of the instrument uses three items to measure each of the 29 types of team-member contribution with high internal consistency. The short (33-item) version uses a subset of the items to measure the five broad categories (these items are presented in Appendix A). Raters rate each teammate on each item using a Likert scale (strongly disagree to strongly agree).

Although the CATME instrument is solidly rooted in the literature on team effectiveness, instructors who wish to use peer evaluation to adjust their students’ grades in team-based learning projects may need a simpler instrument. Even the short version of the CATME requires students to read 33 items and make a judgment on each item for each of their teammates. If teams have four members and the instructor requires a self-evaluation, each student must make 132 independent decisions to complete the evaluation. Carefully considering all of these decisions may require more effort than students are willing to put forth. Then, in order to use the instrument to adjust grades within the team, the instructor must review 528 data points (132 for each of the four students on the team) before making the computations necessary to determine an adjustment factor for the four students’ grades. For instructors who have large numbers of students and heavy workloads, this may not be feasible. Other published peer evaluation systems, such as those cited earlier, also require substantial effort, which may be one reason why they have not been widely adopted.
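This rating burden can be stated generally. Assuming an instrument with i items and teams of n members, where each student rates every member of the team (teammates plus, when the instructor requires it, a self-evaluation), the counts above follow from

\[
\underbrace{i \times n}_{\text{decisions per rater}}
\qquad\text{and}\qquad
\underbrace{i \times n \times n}_{\text{data points per team}} .
\]

With i = 33 and n = 4, these expressions give the 132 decisions per rater and 528 data points per team described above.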

Another concern about the CATME instrument is that students using the Likert-scale format may have very different perceptions of what rating a teammate who behaved in a particular way deserves, because the response choices (strongly disagree to strongly agree) are not behavioral descriptions. Therefore, although the CATME Likert-format instrument has a number of strengths, some instructors who need to use peer evaluation may prefer a shorter, behaviorally anchored version of the instrument.

In the remainder of this paper, we describe our efforts to develop a behaviorally anchored rating scale (BARS) version of the CATME instrument. We believe that a BARS version has two main advantages that, in some circumstances, may make it more appropriate than the original CATME for instructors who want to adjust individual grades in team learning environments. First, because the BARS instrument assesses only the five broad areas of team-member contribution, raters need to make only five decisions about each person they rate. In a four-person team, this reduces the burden for students who complete the survey from 132 decisions to 20, and it reduces the number of data points the instructor must review for the team from 528 to 80. In addition, by describing the behaviors that a team member would display to warrant any given rating, a BARS version of the instrument can help students better understand the standards by which both they and their peers are being evaluated. If students learn from the behavioral descriptions what constitutes good and poor performance, they may try to display more effective team behaviors and refrain from behaviors that harm the team’s effectiveness.
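The same expressions as above confirm these figures: with i = 5 items and n = 4 team members,

\[
5 \times 4 = 20 \ \text{decisions per rater}
\qquad\text{and}\qquad
5 \times 4 \times 4 = 80 \ \text{data points per team},
\]

a reduction of roughly 85 percent in rating burden relative to the 33-item Likert version.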

Behaviorally Anchored Ratings Scales

Behaviorally anchored rating scales provide a way to measure how much an individual’s behavior in various performance categories contributes to achieving the goals of the team or organization of which the individual is a member (Campbell, Dunnette, Arvey, & Hellervik, 1973). The procedure for creating BARS instruments was developed in the early 1960s and became more popular in the 1970s (Smith & Kendall, 1963). Subject-matter experts (SMEs), who fully understand the job for which the instrument is being developed, provide input for its creation. More specifically, SMEs provide examples of actual performance behaviors, called “critical incidents,” and help to classify whether the examples represent high, medium, or low performance in the category in question. The resulting scales describe the specific behaviors that people at various levels of performance would typically display.

Research on the psychometric advantages of BARS has been mixed (MacDonald & Sulsky, 2009). However, some research suggests that BARS have a number of advantages over Likert rating scales, such as greater inter-rater reliability and less leniency error (Campbell et al., 1973; Ohland, Layton, Loughry, & Yuhasz, 2005). Instruments with descriptive anchors may also generate more positive rater and ratee reactions, have more face validity, be more defensible in court, and offer strong advantages for raters from collectivist cultures (MacDonald & Sulsky, 2009). Moreover, training both raters and ratees with a BARS instrument before using it to evaluate performance may make the rating process easier and more comfortable for raters and ratees (MacDonald & Sulsky, 2009). Using the BARS process for developing anchors also greatly facilitates frame-of-reference (FOR) training, which has been shown to be the best approach to rater training (Woehr & Huffcutt, 1994).

DEVELOPMENT OF THE CATME BARS INSTRUMENT

A cross-disciplinary team of nine university faculty members with expertise in management, education, educational assessment, various engineering disciplines, and engineering education research collaborated to develop a BARS version of the CATME instrument. We used the “critical incident methodology” described in Hedge et al. (1999) to develop behavioral anchors for the five broad categories of team-member contribution measured by the original CATME instrument. This method is useful for developing behavioral descriptions to anchor a rating scale, and it requires identifying examples of a wide range of behaviors from poor to excellent (rather than focusing only on extreme cases). Critical incidents include observable behaviors related to what is being assessed, the context, and the result of the behavior; for example, a critical incident might describe a team member who volunteered to take on an absent teammate’s tasks so that the team could meet its deadline. Critical incidents must be drawn from the experience of subject-matter experts, who also categorize and translate the list of incidents. All members of the research team can be considered subject-matter experts on team learning and team-member contributions. All have published research on student learning teams, and all have used student teams in their classrooms. Collectively, they have approximately 90 years of teaching experience.