QUEST Honors Program

Learning Outcomes Assessment

Fall 2015 Results

Fall 2015 Learning Outcomes Assessment Results

Summary without Client Evaluations:

* The incomplete bars in this chart indicate the proportion of assignments where a reviewer declined to evaluate the element.

Notes about the summary:

Overall, we are doing very well, as indicated by the abundance of the blue and green on the chart. However, in the spirit of continuous improvement, we need to understand more about the red areas which indicate that the work was unacceptable in the corresponding element.

The greatest area needing improvement continues to be in data analysis (Learning Outcome 3). Of the Unacceptable ratings, slightly more than half were for 190 reports (the remaining ones were for 490 reports; 390 papers were not evaluated for this learning outcome), as seen in the chart below.

Part of the issue may be that students are not told to document their analysis at the level of detail that the assessment rubric requires; this might be rectified in the future by requiring an appendix that provides the level of detail described in the rubric. We are also looking at options for making sure that students are learning the necessary data analysis concepts in the program.

A new area of concern is in Learning Outcome 4 (Evaluate, analyze and recommend solutions to real-world problems). This is the first semester that we assessed 190H and 390H papers for this learning outcome. The two unacceptable ratings for LO4.2 Methodology were both from 390H assignments. For LO4.3 Analysis, there was one unacceptable for a 190H assignment and one for a 390H paper, and for LO4.4 Recommendations, all three were from 390H papers.

The remaining red areas are as follows:

  • LO2.3 Prototype and test – 1 unacceptable rating
  • LO6.1 Organization – 1 unacceptable rating
  • LO6.2 Audience Engagement & Professionalism – 3 unacceptable ratings
  • LO6.4 Effective use of content – 2 unacceptable ratings
  • LO7.2 Professional Writing – 1 unacceptable rating
  • LO7.4 Perspectives – 1 unacceptable rating
  • LO9.4 Ethics – 1 unacceptable rating

All of these instances point to a problem with calibration, since the specific assignments that received these ratings were rated higher by other reviewers.

Summary of Client Evaluations

As indicated by the low number of observations (far right in the above table), we continue to have difficulty getting responses from all of our clients. They are surveyed mid-project and at the end of the project, so ideally we would have 20 responses for the overall assessment and LO9.1-3. The LO4 elements are only asked once for each client, so the most we can have for these is 10 observations. For comparison, the faculty advisors’ overall evaluations are included (second row from top).

Inter-rater Reliability

Calibration of the assessments continues to be an issue. The Fall 2015 assessments indicate that there needs to be greater agreement among reviewers. The reliability results for learning outcomes 1, 2, and 3 are shown below.

Learning Outcome 1: percent agreement by element
Element / Percent Agreement / Obs (N)
LO1.1 Tool Selection / 100 / 5
LO1.2 Fit / 50 / 8
LO1.3 Tool Use / 44 / 16
LO1.4 Solution Evaluation / 80 / 5
overall / 59 / 34

Learning Outcome 1: weighted kappa
Test / Value / p / N / Agreement
Kappa2 sq / 0.432 / 0.0037 / 34 / moderate
Kappa2 lnr / 0.369 / 0.0023 / 34 / fair

Learning Outcome 2: percent agreement by element
Element / Percent Agreement / Obs (N)
LO2.1 Problem Identification / 40 / 15
LO2.2 Idea Generation, Screening, Evaluation, and Selection / 27 / 15
LO2.3 Prototyping, modeling, testing, and integrating feedback / 47 / 15
LO2.4 Analysis of the innovation's feasibility / 47 / 15
overall / 40 / 60

Learning Outcome 2: weighted kappa
Test / Value / p / N / Agreement
Kappa2 sq / 0.0251 / 0.812 / 60 / very slight
Kappa2 lnr / -0.051 / 0.538 / 60 / none

Learning Outcome 3: percent agreement by element
Element / Percent Agreement / Obs (N)
LO3.1 Qualitative Data Analysis / 52.2 / 23
LO3.2 Quantitative Data Analysis / 45 / 20
LO3.3 Multi-Methods Synthesis / 66.7 / 9
LO3.4 Methodology Choice / 10 / 10
overall / 45.2 / 62

Learning Outcome 3: weighted kappa
Test / Value / p / N / Agreement
Kappa2 sq / 0.356 / 0.0042 / 62 / fair
Kappa2 lnr / 0.252 / 0.003 / 62 / fair

We examined agreement at the element level and at the learning outcome level. At the element level, we simply used percent agreement since there were so few observations. The results show that agreement exceeds 50% for only four of the twelve elements, indicating the need for calibration training for the reviewers.
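
As an illustration of the element-level measure, percent agreement is the share of jointly rated assignments on which both reviewers gave the same rating. The sketch below is a minimal version under stated assumptions: the function name `percent_agreement` and the sample ratings are hypothetical and are not the Fall 2015 data.

```python
from typing import Sequence


def percent_agreement(rater_a: Sequence[int], rater_b: Sequence[int]) -> float:
    """Return the percentage of items on which two raters gave the same rating."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must score the same items")
    if len(rater_a) == 0:
        raise ValueError("at least one paired rating is required")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


# Hypothetical ratings on a 1-4 rubric scale (illustrative only).
reviewer_1 = [4, 3, 3, 2, 4]
reviewer_2 = [4, 3, 2, 2, 3]
print(percent_agreement(reviewer_1, reviewer_2))  # 3 of 5 match -> 60.0
```

Percent agreement does not correct for agreement expected by chance, which is one reason a kappa statistic is preferred when there are enough observations.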

At the outcomes level, we used Cohen’s weighted kappa (squared and linear weights) to examine inter-rater reliability, since there were two raters and ordinal data. The Agreement column on the far right is based on the guidelines of Landis and Koch (1977), who characterized values < 0 as indicating no agreement, 0–0.20 as slight, 0.21–0.40 as fair, 0.41–0.60 as moderate, 0.61–0.80 as substantial, and 0.81–1 as nearly perfect agreement. Based on these results, we see that there is fair to moderate agreement for learning outcome 1, fair agreement for learning outcome 3, and essentially no agreement for learning outcome 2.
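
For readers unfamiliar with the statistic, the following is a rough pure-Python sketch of Cohen's weighted kappa for two raters, together with a helper that applies the Landis and Koch bands quoted above. It is not the tool that produced the reported values; the function names, the 1-3 scale, and the sample ratings are assumptions made for illustration.

```python
from typing import Hashable, Sequence


def weighted_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable],
                   categories: Sequence[Hashable], weights: str = "linear") -> float:
    """Cohen's weighted kappa for two raters on an ordered scale.

    `categories` lists the scale from lowest to highest.
    `weights` is "linear" or "squared" (disagreement weights).
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("raters must score the same non-empty set of items")
    if weights not in ("linear", "squared"):
        raise ValueError("weights must be 'linear' or 'squared'")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two categories are required")
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed joint proportions of (rating by A, rating by B).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    power = 1 if weights == "linear" else 2
    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            w = (abs(i - j) / (k - 1)) ** power
            observed_disagreement += w * observed[i][j]
            expected_disagreement += w * row[i] * col[j]
    if expected_disagreement == 0.0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - observed_disagreement / expected_disagreement


def landis_koch_label(kappa: float) -> str:
    """Map a kappa value to the Landis and Koch (1977) descriptive bands."""
    if kappa < 0:
        return "none"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "nearly perfect"


# Hypothetical ratings on a 1-3 scale (illustrative only).
scale = [1, 2, 3]
reviewer_1 = [1, 1, 2, 2, 3, 3]
reviewer_2 = [1, 2, 2, 3, 3, 3]
kappa_linear = weighted_kappa(reviewer_1, reviewer_2, scale, "linear")
kappa_squared = weighted_kappa(reviewer_1, reviewer_2, scale, "squared")
print(round(kappa_linear, 3), landis_koch_label(kappa_linear))
print(round(kappa_squared, 3), landis_koch_label(kappa_squared))
```

Squared weights penalize large disagreements more heavily than near misses, so the two weightings can fall into different bands for the same data.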

For Learning Outcome 7 (written communications), we had three reviewers, so slightly different testing was needed. Pairwise comparison of reviewers for overall agreement shows only slight agreement. The Fleiss kappa statistic (for more than two raters) confirms this finding.

Learning Outcome 7: pairwise percent agreement
Raters / Percent Agreement / Obs (N)
raters 1&2 / 55 / 80
raters 1&3 / 52.5 / 80
raters 2&3 / 46.2 / 80

Learning Outcome 7: kappa statistics
Raters / Test / Value / p / N / Agreement
all three / Kappa.fleiss / 0.112 / 0.0396 / 80 / slight
raters 1&2 / kappa2 sq / 0.265 / 0.0119 / 80 / slight
raters 1&3 / kappa2 sq / 0.220 / 0.0403 / 80 / slight
raters 2&3 / kappa2 sq / 0.174 / 0.0758 / 80 / slight
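
To show how a statistic for more than two raters is put together, here is a minimal pure-Python sketch of Fleiss' kappa. It assumes every item is rated by the same number of raters; the function name `fleiss_kappa` and the sample ratings are hypothetical, and the sketch does not reproduce the values in the table above.

```python
from collections import Counter
from typing import Hashable, Sequence


def fleiss_kappa(ratings: Sequence[Sequence[Hashable]]) -> float:
    """Fleiss' kappa; `ratings[i]` holds the ratings all raters gave to item i."""
    n_items = len(ratings)
    if n_items == 0:
        raise ValueError("at least one item is required")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    category_totals = Counter()
    per_item_agreement = 0.0
    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        # Proportion of rater pairs that agree on this item.
        agreeing = sum(c * c for c in counts.values()) - n_raters
        per_item_agreement += agreeing / (n_raters * (n_raters - 1))

    mean_agreement = per_item_agreement / n_items
    total_ratings = n_items * n_raters
    chance_agreement = sum((t / total_ratings) ** 2
                           for t in category_totals.values())
    if chance_agreement == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (mean_agreement - chance_agreement) / (1.0 - chance_agreement)


# Hypothetical ratings by three reviewers on four papers (illustrative only).
sample = [
    [1, 1, 2],
    [2, 2, 2],
    [3, 3, 2],
    [1, 2, 3],
]
kappa = fleiss_kappa(sample)
print(round(kappa, 3))
```

Unlike the weighted kappa used for the two-rater outcomes, this form of Fleiss' kappa treats the rating categories as unordered, so a near miss counts the same as a large disagreement.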

Reliability was not assessed for learning outcome 4 due to a lack of data. The remaining learning outcomes (5, 6, 8 and 9) were not assessed for reliability since there were many different reviewers.

Comparison of outcomes for 190H, 390H and 490H

The tables below compare the average scores for each learning outcome element for assignments completed in the three required QUEST courses. For learning outcomes 5, 8 and 9, data was only received for BMGT/ENES 490H assignments, so no comparison could be made.

(note: 190H assignments were only assessed for LO1.3)


Landis, J.R.; Koch, G.G. (1977). "The measurement of observer agreement for categorical data". Biometrics 33 (1): 159–174.

Viera, Anthony J.; Garrett, Joanne M. (2005). "Understanding interobserver agreement: the kappa statistic". Family Medicine 37 (5): 360–363.