Postsecondary Institution Ratings System Symposium

By Ben Miller, February 5, 2014

Note: The meeting was held February 6, 2014.

The U.S. Department of Education is hosting its rescheduled technical symposium to discuss ideas around constructing a college ratings system. Instead of providing live updates, this post will feature occasional summaries of ideas that stand out from the presenters. If you want to read New America’s comments, you can find them here.

Panel A: Data Elements, Metrics, and Collection

We started with a proposal from Roger Benjamin of the Council for Aid to Education, who focused on the need to measure student learning. Probably the most interesting part of his presentation was his insistence that the only way to do this reliably is through standardized tests, such as the Collegiate Learning Assessment. He also suggested having colleges come together to pick their own comparison groups and having an independent commission administer the ratings system through a peer review process.

Next up is Tom Bailey from the Community College Research Center, whose presentation focuses more on two-year colleges but makes good overall points. He notes that there really are differences between what consumers want to know (how do students like me fare?) and what accountability requires (how does a college do compared to others with the same mission?). He notes that a longitudinal data system would be ideal, but thinks we can get better incrementally with what we have. One of his most interesting points is the idea of considering the state when constructing ratings, since subsidy rates vary and so might some policies. He also says it’s very important to separate community colleges from technical colleges, noting that 39 of the 50 two-year schools with the best outcomes granted no associate degrees, though the issue of properly measuring transfer without a student unit record remains. He also brings up the idea of looking at outcomes at a more programmatic level, especially for earnings. Finally, he argues that it’s probably best to have this system focus on the extremes of good and bad and less on the middle.

Don Hossler’s remarks focus a bit more on the larger goals a system could achieve. He notes that a system can basically be only two of three things: simple, generalizable, and accurate, since doing all three is impossible. He also addresses the important question of whether this could change institutional behavior. He reviews the research on whether performance-based budgeting at the state level can drive institutional change, noting that there are new approaches being tested, which college leaders say they are paying attention to. But the larger point is that colleges have to see that the measures are accurate and reflect what they are trying to do in order to take them seriously. We’ve also got at least our second, if not third, mention of how a student unit record system is probably the best way to do this and how interoperability of state data systems is far behind. Hossler’s suggested solution is the National Student Clearinghouse. But the problem here is that the Clearinghouse doesn’t own the data, the schools do. So if you use the Clearinghouse, the schools have to see the ratings as valid.

John Pryor, who just left UCLA and is starting at Gallup, talks a bit about who actually uses rankings systems now. He notes that just 18 percent of four-year students said rankings mattered in their college choice. This was the 11th most common thing listed out of 22 options and fell behind factors like cost, financial aid, a college visit, and, it sounds like, even graduation rates, which were chosen by about a third of students. Not surprisingly, those 18 percent tend to be higher-income, higher-ability students who are white or Asian. Long story short, the only people who make use of current rankings systems are a shrinking segment of higher ed. What’s more surprising is that after laying out how students don’t really use rankings systems, he thinks a ratings system should focus on prospective students, which sounds like it could improve the information itself, but not necessarily fix the information transmission problems that need to be overcome.

The idea of how states should be treated in a ratings system keeps sticking with me. In theory, state behavior will be captured in a lot of other measures for public colleges, since things like low funding are likely to translate into higher prices and greater debt levels. Similarly, poor transfer policies will probably hurt attainment rates. But these things are a few steps removed from each other, rather than a direct callout. Should we be more explicit about the state role or continue capturing it only through proxies?

Patrick Perry from the California Community College Chancellor’s Office probably had the best presentation on this panel, which isn’t surprising since he oversees an incredibly detailed dataset that includes both earnings and better completion information. It doesn’t lend itself well to summary, but he really hammers home the point that we need to be smarter about which students we actually want to measure outcomes for, so that students who take a single course and are not really pursuing a degree aren’t factored in. The issue of students being place-bound and having more limited options comes up again as well, which is why he doesn’t think an accountability-focused system makes as much sense.

The Q&A part of the panel finally gets to the issue that seemed missing in some of the consumer-focused parts: the consumers we want to use the information are the ones not using it. There’s talk of making things simple and clearer, but unfortunately not anything much more detailed than that. Mark Schneider from the American Institutes for Research puts this in a more interesting way, though: how many students’ decisions do we actually need to change? He thinks that getting something like 10 percent of students to change their decision could have a big effect. On Twitter, Mark Huelsman, who is at IHEP now and headed to Demos shortly, pushes this idea even further, noting that we aren’t actually sure exactly what behaviors we want to change. Figuring out which students we are targeting and what we want them to do seems crucial if we want to improve consumer information.

How we describe students as full time or part time keeps coming up as an issue that has to be addressed, because students change their attendance intensity all the time. Perry has an interesting solution: just track students for 10 years and don’t make a distinction. That obviously creates a very long-term feedback mechanism, but it is also somewhat cleaner. Perry also notes that though taking 10 years to earn a credential is a long time, the concern is more about life problems potentially popping up than about using lots of additional state resources, since those are given out as students earn credits.

Panel B: Weighting and Scoring

Mark Schneider from the American Institutes for Research kicks off the second panel. His presentation focuses a lot on the earnings data he’s been working with states on, particularly in Texas. It speaks to how earnings can change from years 1 to 10 (though he doesn’t talk about whether intermediate measures like three or four years out would work). He reiterates the same argument we’ve heard a few times already about the need to get to the programmatic level. He also shows a slide for My Future Texas, a forthcoming public site designed to help counselors, which could be interesting.

The programmatic issue keeps coming up, but one thing that would be useful to discuss is what measures can work institutionally versus programmatically and how the limitations of institutional data should have an effect on what the ratings system tries to do.

Robert Kelchen of Seton Hall gives probably the most detailed presentation in terms of a potential ratings system. Among his highlights: consider using a combination of some kind of input-adjusted graduation rate and the raw one. He doesn’t like using raw wages, but instead suggests checking graduate earnings against some multiple of the poverty level to show that graduates are not impoverished. But the general message is to keep things simple, with three to four rating levels and measures of access, affordability, and outcomes. He would also use multiple years of data for smoothing. And we get another call for a unit record system.
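
To make the shape of such a proposal more concrete, here is a minimal sketch, not Kelchen's actual method, of how a raw graduation rate, an input-adjusted rate, an earnings check against a poverty-level multiple, and an access measure might be smoothed over several years and collapsed into a handful of rating levels. All numbers, weights, and thresholds below are invented for illustration.

```python
# Hypothetical sketch of a Kelchen-style composite rating. Every figure,
# threshold, and weight here is illustrative, not from his presentation.

POVERTY_LINE = 11_670        # 2014 federal poverty guideline, single person
EARNINGS_MULTIPLE = 1.5      # assume graduates should earn at least 1.5x poverty

# Three years of made-up institution-level data, averaged for smoothing.
college = {
    "grad_rate":       [0.55, 0.58, 0.56],   # raw graduation rate
    "expected_rate":   [0.50, 0.51, 0.50],   # rate predicted from student inputs
    "median_earnings": [31_000, 32_500, 33_000],
    "pct_pell":        [0.38, 0.40, 0.41],   # access measure
}

def mean(xs):
    return sum(xs) / len(xs)

def rate_college(c):
    """Collapse access and outcomes into one of four simple levels."""
    raw = mean(c["grad_rate"])
    input_adjusted = raw - mean(c["expected_rate"])   # above expectation?
    earnings_ok = mean(c["median_earnings"]) >= EARNINGS_MULTIPLE * POVERTY_LINE

    # One point for each box checked; thresholds are arbitrary placeholders.
    score = sum([
        raw >= 0.50,                  # decent raw outcomes
        input_adjusted > 0,           # beats its input-adjusted expectation
        earnings_ok,                  # graduates clear the earnings floor
        mean(c["pct_pell"]) >= 0.30,  # serves a meaningful share of Pell students
    ])
    return ["low", "fair", "good", "high"][min(score, 3)]

print(rate_college(college))   # -> "high" for this made-up example
```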

Sean Corcoran from New York University is less focused on actual data, but makes some very good points about how we might think about driving behavior with a ratings system. He brings up the same notion from the first panel that colleges need to see the measures as “legitimate,” or actually reflecting them fairly, in order to get a more meaningful response. But he also says we need to think about the theory of action behind each measure and think more about what we’d want a college or student to do with the data. This means that figures need to reflect a path for improvement, which speaks to why some raw measures are helpful. He draws on his K-12 background to talk about what we’ve learned there about accountability, and says one place where it appears to be more effective is at the bottom of the performance distribution. That would appear to further the call for a system more focused on the worst actors. He also calls for a combination of absolute and relative performance measures.

Russ Poulin, who is from the WICHE Cooperative for Educational Technologies, talks about the same tension between consumer and accountability systems, which is becoming a common theme. But he also gives an interesting discussion of how his work on Transparency by Design tries to create a “learner progress” cohort, which includes full- and part-time students and captures transfers by looking at whether students were seeking a specific degree level at that school for the first time. In particular, he stresses the need for better measures for distance learning.

The question and answer session kicks off with the part of the panel that hasn’t been touched on much yet: how should we weight things? Kevin Carey suggests three ways. You can do something based upon empirical work. You can do it by establishing values. Or you can just weight everything equally. Mark Schneider doesn’t particularly like the idea of a value-weighted system because he thinks the government should have a different role than a private business in making those judgments. Poulin talks about the need for more flexibility in how weightings are done so that institutions following non-traditional approaches aren’t hurt. Corcoran reiterates his point that value judgments are unavoidable here, but notes it’s hard to think about weighting when you don’t know what measures you are going to use.

An audience question takes this idea further, noting that even if you let users choose their own weights, you’ve still made a judgment call in deciding which measures you put out there for them to weight.
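
As a toy illustration of Carey's three options, and of the audience's point that the menu of measures is itself a judgment call, here is a hedged sketch applying equal, values-based, and user-chosen weights to the same normalized measures. The measure names and numbers are invented.

```python
# Illustrative only: three weighting schemes over the same made-up measures.
# Which measures appear in this dictionary at all is itself a value judgment.

measures = {                     # normalized to 0-1, higher is better
    "completion": 0.62,
    "affordability": 0.48,
    "access": 0.55,
    "earnings": 0.70,
}

def composite(measures, weights):
    """Weighted average; weights are renormalized to sum to one."""
    total = sum(weights.values())
    return sum(measures[k] * w / total for k, w in weights.items())

equal_weights = {k: 1.0 for k in measures}                   # weight everything equally
value_weights = {"completion": 0.4, "affordability": 0.3,    # weights set by policy values
                 "access": 0.2, "earnings": 0.1}
user_weights  = {"earnings": 0.7, "affordability": 0.3,      # one user's own priorities
                 "completion": 0.0, "access": 0.0}

for name, w in [("equal", equal_weights), ("values", value_weights),
                ("user", user_weights)]:
    print(f"{name:6s} composite = {composite(measures, w):.3f}")
```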

Panel C: Comparison Groups

Braden Hosch, who is now at Stony Brook University but was recently working on higher education accountability in Connecticut, kicks off the comparison group discussion. He points out that finding a comparison group solely through mathematics is not totally feasible due to outliers, so they had to combine formulas and judgment to get at the proper groupings. Within the groupings, they found that looking at completions per 100 full-time equivalent students was better than using graduation rates, though it required some tweaks: applying a two-year lag to enrollment, weighting certificates by one-third, and looking only at degree-seeking students. Connecticut also tried to include a learning outcomes measure, including one without standardized tests, but could not come up with something workable at this time. He talks about net price, but notes that the measure has some real problems because colleges provide radically different estimates of how much it costs to live off-campus in the same underlying area. Data challenges also appeared with earnings, because students may have been working before their program, so the level of wages may appear high while the change pre- and post-completion is low. He’s now at least the second person to bring up the idea of letting colleges weigh in on comparison groups.
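
A minimal sketch of the metric Hosch describes, under one plausible reading of the Connecticut tweaks: count only degree-seeking students, weight certificates by one-third, and lag enrollment by two years relative to the awards. All numbers are made up.

```python
# Hypothetical numbers; the one-third certificate weight and two-year lag
# follow the tweaks Hosch describes, read here as awards in year t over
# degree-seeking FTE enrollment from year t-2.

awards_2013 = {
    "degrees": 850,
    "certificates": 240,
}
degree_seeking_fte_2011 = 6_200   # FTE enrollment two years before the awards

def completions_per_100_fte(awards, fte, cert_weight=1/3):
    weighted_awards = awards["degrees"] + cert_weight * awards["certificates"]
    return 100 * weighted_awards / fte

print(round(completions_per_100_fte(awards_2013, degree_seeking_fte_2011), 1))
# -> 15.0 weighted completions per 100 FTE for this made-up college
```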

Patrick Kelly from the National Center for Higher Education Management Systems brings up his organization’s expertise in building peer groups. He echoes the call for data plus flexibility, noting also that colleges may try to game peer selection if it’s used for accountability. But he makes an important larger point: graduation rates have been extremely stable over time, and what you want is a way to get colleges to try to meet the levels of institutions “better” than them, rather than those like them. He notes that the available IPEDS data can do a decent job of explaining grad rates at four-year schools, but not at two-year schools. Kelly is also the first person to issue a strong call for knowing graduation rates for Pell Grant recipients and for loan borrowers. He also reiterates the idea that weighting is a values exercise, noting that in his experience, building the ratings model is only about 10 percent of the work compared to what weighting takes up. Kelly’s honesty about consumer information is also welcome; he notes that it’s not easy and he doesn’t know what we should do.
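
To show the kind of exercise Kelly alludes to, explaining graduation rates from available institutional characteristics and comparing each college to its prediction, here is a hedged sketch on simulated data. The variables are stand-ins for IPEDS-style fields, not his actual model.

```python
# Illustrative only: fit graduation rates on a few IPEDS-style characteristics,
# then see which colleges beat their predicted ("peers like you") rate.
import numpy as np

rng = np.random.default_rng(0)
n = 200
pct_pell   = rng.uniform(0.1, 0.7, n)    # share of Pell recipients
admit_rate = rng.uniform(0.2, 1.0, n)    # selectivity proxy
spending   = rng.uniform(5, 40, n)       # instructional spending ($000s per FTE)

# Made-up "true" relationship plus noise, just to have something to fit.
grad_rate = (0.75 - 0.4 * pct_pell - 0.2 * admit_rate + 0.004 * spending
             + rng.normal(0, 0.05, n))

X = np.column_stack([np.ones(n), pct_pell, admit_rate, spending])
coef, *_ = np.linalg.lstsq(X, grad_rate, rcond=None)
predicted = X @ coef
residual = grad_rate - predicted          # positive: above expectation

print("R^2 =", round(1 - residual.var() / grad_rate.var(), 2))
print("Colleges above their predicted rate:", int((residual > 0).sum()))
```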

The question and answer session kicks off with a discussion of whether rating groups should be based upon institutional characteristics or upon the policy goals we want to achieve. Kelly and Hosch are both more in favor of using characteristics at first because that’s important for the ratings to have face validity with institutions. Schneider returns to his earlier point that sometimes the variation across programs is greater than the variation across institutions, which suggests we should rate programs, not colleges. A lot of that makes sense, and it’s how things like gainful employment would work. But there’s a huge data gap here. Yes, there are limits to institutional data, but moving to programs would leave us with basically nothing to go on.

This discussion has been going on for hours now and it’s clear that there’s still no agreement on what the end goal of this should be. While that keeps a broader discussion going, it does make things very open-ended, since a consumer-focused ratings system aimed at four-year schools and an accountability system for two-year institutions would be radically different.

One of the most out-of-the-box ideas comes from an audience member who asks: if the purpose of the ratings system is to focus on value, then why not just group institutions based upon price? In general, panelists aren’t crazy about the idea because the costs of delivering education, and how it is subsidized, vary. Bob Morse from U.S. News asks if you can judge value without quality. L’Orange doesn’t like the price idea, since he says prices end up at certain levels due to a host of factors. This response isn’t surprising, but it also seems odd. If you start by grouping colleges based upon price and then look at outcomes and other things, do some of those problems go away? A cheap school with bad results would presumably get caught in such a scenario, while an expensive school with good outcomes could get recognized more easily.
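
As a quick sketch of the audience member's idea, here is an illustrative pass that groups made-up colleges into price bands and then flags outcomes within each band; the thresholds and data are invented.

```python
# Illustrative only: bucket colleges by net price, then compare graduation
# rates within each band (catching the cheap-but-bad and pricey-but-good cases).

colleges = [
    {"name": "A", "net_price": 8_000,  "grad_rate": 0.30},
    {"name": "B", "net_price": 9_500,  "grad_rate": 0.55},
    {"name": "C", "net_price": 21_000, "grad_rate": 0.80},
    {"name": "D", "net_price": 23_000, "grad_rate": 0.45},
]

def price_band(price):
    if price < 12_000:
        return "low price"
    if price < 25_000:
        return "mid price"
    return "high price"

bands = {}
for c in colleges:
    bands.setdefault(price_band(c["net_price"]), []).append(c)

for band, group in bands.items():
    avg = sum(c["grad_rate"] for c in group) / len(group)
    for c in group:
        flag = "below band average" if c["grad_rate"] < avg else "at or above"
        print(f"{band}: college {c['name']} grad rate {c['grad_rate']:.0%} ({flag})")
```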

We have our first lengthy mention of gainful employment today, with the point that it is built for a consumer protection purpose where bad is bad, so there’s no need to contextualize or group it. And that’s not what’s happening in the real world of education. Hosch says we could end up there because the politics of grouping make it very hard. But if you do that, the risk is just putting all the high-graduation-rate schools at the top. Morse notes that IPEDS only looks at degrees granted by program, so if you wanted to do ratings by program you’d have to add a lot of other data.

Panel D: Presenting Ratings Information and Models from Existing Systems

I unfortunately had to step out and missed the first two presenters on this panel, so I only caught remarks starting with Tod Massa from the State Council of Higher Education for Virginia. If you’re a higher ed data nerd, this presentation would probably make you very jealous. Massa has at his fingertips the ability to easily display Pell graduation rates for up to 10 years for students with a host of characteristics. He also noted earlier that he could look at outcomes for some students out to 19 years, which reaches back to about 1992-93. He also echoes the call to get better unit record-type data, noting that without it the system would have to be much more limited in scope. Massa’s presentation about Virginia and Patrick Perry’s about the California Community Colleges earlier in the day are a great showcase of some really impressive work being done at the state level. Massa talks about the need to keep measures simple, bringing up the idea of looking at the difference in graduation rates between non-aided students and Pell students and seeing if that gap is larger than the standard deviation for all similar colleges nationally. Because of those limits, he suggests not rating community colleges. [UPDATE: I had incorrectly listed SCHEV graduation rates as going out to nine years; they go out to 10. I also fixed the spelling of Tod Massa's name.]
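
Here is a minimal sketch of one reading of the check Massa floats: compare a college's gap between non-aided and Pell graduation rates to the spread of that gap across similar colleges nationally. The numbers, and the choice to measure spread as the standard deviation of the gaps, are assumptions for illustration.

```python
# Hypothetical data: is this college's Pell/non-aided graduation-rate gap
# larger than the spread of gaps at similar colleges nationally?
import statistics

national_gaps = [0.04, 0.06, 0.08, 0.05, 0.10, 0.07, 0.03, 0.09]  # similar colleges
this_college = {"non_aided_grad_rate": 0.68, "pell_grad_rate": 0.55}

gap = this_college["non_aided_grad_rate"] - this_college["pell_grad_rate"]
spread = statistics.stdev(national_gaps)

print(f"gap = {gap:.2f}, national std dev of gaps = {spread:.2f}")
print("Flag" if gap > spread else "No flag")
```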