
A Multi-Disciplined Review of the Student Teacher Evaluation Process

Dennis E. Clayson

Professor of Marketing

College of Business Administration

University of Northern Iowa

Cedar Falls, Iowa 50614-0126

(319) 273-6015

September 2016

A Multi-Disciplined Review of the Student Teacher Evaluation Process

Introduction

There have been divisive issues within academia in the past, but few have been as well researched and long lasting as the debate over the student teacher evaluation process (SET). As early as 1993, Seldin reported that four of five (80%) campuses used some sort of student evaluation of instructors. Business education appeared to be ahead of the curve; by 1994 about 95% of the deans of accredited business schools used the evaluations as a source of teaching information (Crumbley, 1995). Shortly thereafter, a study by the Carnegie Foundation found that 98% of universities were using some form of student evaluation of classroom teaching (Magner, 1997). About the same time, another study found that 99% of business schools used some form of student evaluation of teaching, and that deans placed a higher importance on these evaluations than on either administrative or peer evaluations (Comm & Mathaisel, 1998). Currently, it would be difficult to find a university that does not utilize student evaluation of teaching, and the use of these instruments has become normative. On many campuses, SET are the most important, and in many cases the only, measure of teaching ability (Wilson, 1998). As an example, 87% of accounting professors reported that SET instruments were used to determine tenure decisions, and 70% said that the evaluations were used for merit pay decisions (Crumbley & Reichelt, 2009). Seldin (1999) reports a California dean as saying, “If I trust one source of data on teaching performance, I trust the students” (p. 15).

As would be expected, the almost universal use of an assessment that can determine tenure, promotion, and merit pay has been extensively researched. One source stated that close to 3,000 articles were published on SET from 1990 to 2005 alone (Al-Issa & Sulieman, 2007). Reports on the topic are so voluminous that many researchers have for some time been using the method of meta-analysis, in which the case is not a subject but an entire published article (see Clayson, 2009; Cohen, 1980; Cohen, 1981; Feldman, 1989; Spooren, Brockx & Mortelmans, 2013 as examples). Nevertheless, little agreement has been reached on key points. The defenders of the system are usually found in colleges of education and among those who consult in the area. Some have defended the evaluations almost as if they were religious tenets, and even in refereed sources have referred to contrary findings as “myths” (Aleamoni, 1999; Marsh & Hattie, 2002; Marsh & Roche, 2000). These advocates typically have an advantage in the publication process, since instructional research is the essential academic work of their profession. Other disciplines generally look upon research on instruction as less prestigious, and those opposed to the evaluation process are more dispersed among academic disciplines and more isolated in their publication outlets. They are, however, equally emphatic. In such an environment, it becomes relatively easy to select research findings that reinforce a point of view, or to find reports that support or counter positive findings.

The following summary of the evaluation process is not free of these problems, but it does attempt to present information from a wider assortment of venues than is typically found in traditional education-discipline outlets.

Question of Motivation

There is a fundamental issue that needs to be addressed before going further into the literature. Critics could be accused of self-interest in finding fault with the SET process. Almost all university and college-level instructors believe themselves to be good teachers. It is the common conceit of the profession. An instrument that finds that a person’s self-concept is wrong is likely to be seen as wrong. On the other hand, if SET reaffirms a person’s self-concept, it is likely to be assumed to be valid. This creates the possibility of an ethical dilemma which rests upon the validity of the instruments. If instructors with low evaluations demand to have these instruments removed from any consideration for promotion, tenure, and merit pay, they could be seen as acting unethically if the instruments are valid. On the other hand, it would be unethical for instructors to advance SET if the instruments are invalid, and especially so if they individually benefited from their use.

A special case of these ethical problems is found with administrators who defend the SET process and demand its utilization. Administrators almost universally benefit from the widespread use of the SET process (see Porter, 2011). In this regard, Crumbley and Fliedner’s (2002) survey of accounting administrators is disturbing. Ninety-five percent of these administrators agreed that a single measure could not capture all the relevant aspects of teaching ability, and 72% thought that students were not qualified to judge some areas of teaching skill. Further, only half (51%) of the surveyed administrators thought that SET was related to what students learn, and almost half (48%) thought that SET added to grade inflation. Yet, 87% of accounting faculty report that SET is used in tenure decisions (Crumbley & Reichelt, 2009).

Some could see the validity argument as a non sequitur. It could be argued that the SET process is neither valid nor invalid. If it is seen simply as a measure of student perceptions, without any further concern for what is being measured, then it would be difficult to argue that any student perception could be invalid. However, it is difficult to find any published literature willing to be so radical. A more cynical argument could also be advanced: the SET process fulfills its function because administrators and institutions are required to evaluate their faculty, and the SET process does exactly that. In other words, the instruments have a form of utilitarian validity irrespective of anything else they may, or may not, accurately measure.

Nevertheless, to simply assume that a critic of the SET process is someone who gets bad evaluations and a supporter is someone who generally gets positive scores is an oversimplification that ignores much of the research reviewed in this paper.

Gender

Do men get better evaluations than women?

Research on whether gender creates differences in student ratings of professors and instruction has produced mixed results. Defenders of the process have generally stated that there is little evidence of gender-related effects (Feldman, 1993), while admitting that there may be a difference in the way male and female students rate faculty (Cashin, 1995). Early gender research did find differences, leading researchers generally to conclude that male instructors were evaluated more highly than females (Bernard et al., 1981). Others (Bennett, 1982), however, suggested that the literature before the 1980s offered little evidence that women received systematically lower marks from students than men, but that students did seem to prefer same-gender instructors. Sex-role stereotypes were found to influence evaluations of female instructors more strongly than those of male professors. This finding was reinforced by a later study (Kierstead et al., 1988) suggesting that women instructors had to work harder to attain the same ratings given to men. Basow and Silberg (1987) reported that male students gave lower ratings to female professors. There is also a tendency for female students to select women instructors as better teachers (Basow, 1998; Centra & Gaubatz, 2000). These types of findings have led some to wonder how traits that are important to student teaching evaluations, and that have also been found to be related to gender bias, could fail to create bias in the evaluations (Laube et al., 2007).

Female students have been shown to give female instructors higher evaluations than they give male instructors, and higher evaluations than male students give the same female instructors. Both male and female students thought that instructors of their own gender showed more “interest in students” (Elmore & LaPointe, 1975). Basow (2000) later published findings that female professors were chosen as the “best” instructor more often by female students than by male students, who more often chose men as the “best.” She found no gender differences in selections of the “worst” professors. The “best” female instructors were considered to be much more “helpful” than the “best” male professors, indicating a subtle gender stereotyping by the students. In a large study of almost 3,000 respondents, Langbein (1994) found that female instructors were rewarded, relative to men, for being supportive and displaying “nurturing” behavior, and were punished, relative to men, for objective and authoritarian behavior. She also found that women were given less of a boost in their evaluations for good grades than were men. Students have also been shown to display more “hostility” toward female than male instructors who did not meet students’ gendered expectations (Sprague & Massoni, 2005).

In a study of introductory management and marketing classes, Foote, Harmon and Mayo (2003) found that students’ gender-role attitudes did not affect their evaluation of instructors. The only marketing study using students to examine the gender of instructors (Clayson & Glynn, 1993) found no gender main effects on the global ratings, either by sex of instructor or by sex of student; however, significant interaction effects were found. Students perceived instructors of their own gender to be better teachers. Moreover, male and female students used different combinations of personality traits in their determinations. Male students indicated that a “good” male instructor would be experienced, approachable, likable, direct, and hardworking. Male students perceived a female professor as being a better instructor if she was confident, fair, and not talkative. Female students found a “good” male instructor to be ambitious, yet sensitive and concerned. A “good” female instructor was judged on only two social-interactive variables: being concerned and likable. Students, both male and female, did, however, expect male professors to be more successful. When students were asked to pick professors by looking at their pictures, a male photo was chosen 73% of the time as representing the “best professor,” and a male photo was chosen 80% of the time as representing the person most likely to be making more consulting money. A female photo was chosen 71% of the time as representing a professor who had been denied tenure (Clayson, 1992).

An unpublished study found a positive relationship between the judged attractiveness of faculty and the evaluations they receive; the effect seems to be stronger for male instructors than for female instructors (Hamermesh & Parker, 2004). In a large study at a business school, undergraduate students rated female instructors higher than male instructors, but graduate students rated male instructors higher (Nargundkar & Shrikhande, 2014).

Recently, more negative findings have been reported. The SET process was found to be significantly biased against female instructors, even in areas that would not be expected to be gender specific, such as how promptly assignments are graded. The bias was found to vary by discipline and by student gender (Boring, Ottoboni & Stark, 2016), a finding reinforced by an examination of online data (Clayson, 2016).

Summary

1. Although historically male professors may have been preferred in evaluations, there currently appear to be few global differences (main effects) between the mean scores of male and female instructors. However, there are gender interaction effects. Students seem to believe that male professors will be more successful, and in some disciplines, prefer instructors of the same gender.

2. Female instructors may need to display more gender stereotypic behaviors than males for high evaluations.

3. If a male and a female instructor receive the same global evaluation, the reasons for that evaluation are highly likely to be different.

4. The effects seem to change over time. Fundamentally, gender differences are likely to be indications of social bias that may be found in any given group of students at any given time.

Research

Do professors who do research get better student evaluations?

Many faculty will argue that doing research makes them better teachers. There are logical reasons for holding this belief. A researcher is more likely to be current with the discipline, to have more information to impart to students, and should be able to guide students better within the community of the discipline. On the other hand, some seem to believe that being a good teacher is incompatible with being a good researcher, primarily because of interest and time constraints. Are these beliefs reflected in student evaluation of instruction?

The answer appears to be no. Although there is no evidence that faculty research harms the evaluations, there is little to indicate that it is a benefit. It has been found that creative researchers and effective teachers have distinctively different personality profiles, with the teachers’ pattern more conducive to higher evaluations (Jackson, 1994). However, Feldman (1987) concluded, “… the likelihood that research productivity actually benefits teaching is extremely small or that the two, for all practical purposes, are essentially unrelated… Productivity in research and scholarship does not seem to detract from being an effective teacher” (p. 274). This conclusion was confirmed by Hattie and Marsh (1996), who performed a meta-analysis of 58 studies examining the relationship between research and teaching. The average correlation was close to zero, with a median of zero. A lack of relationship was more likely to occur at research universities and when the rapport aspects of the evaluation were emphasized. Good researchers were rated as more enthusiastic and more knowledgeable. The literature did, however, show a negative relationship between time spent on research and time spent on teaching. Later, Marsh and Hattie (2002) conducted a study of over 12,000 student evaluations and again found no association between teaching evaluations and research productivity: “In contrast to the apparent academic myth that research productivity and teaching effectiveness are complementary constructs, results of the present investigation… coupled with the findings of Hattie and Marsh (1996) meta-analysis… provide strong support for the typical finding that the teaching-research relationship is close to zero” (p. 628). Marsh and Hattie suggest several explanations, but never hypothesize that the evaluations themselves may not be valid.
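To make the meta-analytic arithmetic concrete, the short Python sketch below illustrates how a weighted mean correlation of the kind Hattie and Marsh report is typically computed: each study’s correlation is converted with Fisher’s z transformation, weighted by its sample size, averaged, and transformed back. This is a minimal illustration of the general technique, not a reproduction of their analysis; every correlation and sample size shown is an invented placeholder.

    import math

    # Hypothetical (correlation, sample size) pairs; one entry per study.
    # These values are placeholders, not data from any published meta-analysis.
    studies = [(0.12, 250), (-0.05, 180), (0.02, 400), (0.00, 90), (-0.08, 150)]

    def fisher_z(r):
        # Fisher's r-to-z transformation makes correlations approximately additive.
        return 0.5 * math.log((1 + r) / (1 - r))

    def inverse_fisher_z(z):
        # Back-transform a mean z to the correlation scale.
        return (math.exp(2 * z) - 1) / (math.exp(2 * z) + 1)

    # Weight each study's z by n - 3, the inverse of the sampling variance of z.
    weights = [n - 3 for _, n in studies]
    mean_z = sum(fisher_z(r) * w for (r, _), w in zip(studies, weights)) / sum(weights)

    print(f"Weighted mean correlation: {inverse_fisher_z(mean_z):.3f}")

With placeholder values like these, the weighted mean correlation lands near zero, which is the pattern the published meta-analyses describe.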

Summary

1. Research does not seem to benefit or harm student evaluation of instruction.