Statistical Software for Students: Academic Practices and Employer Expectations

William C. Adams, Donna Lind Infeld, & Carli M. Wulff
Trachtenberg School ● George Washington University



Over the past several decades, students of public administration, public policy, and public affairs have regularly been taught statistics using various computer packages, such as SAS, SPSS, or Stata. However, there has been little published exploration or discussion of which statistical software packages are of greatest benefit to these students. To explore this topic, we conducted a multi-method study addressing the following research questions:

  1. Is there any available evidence that indicates a particular statistical software program is superior?
  2. Which statistical software programs are most widely integrated into MPA, MPP, and related masters programs?
  3. What statistical software skills, if any, do relevant employers specify in job announcements? And for those continuing in academe, are there trends in software use?

Answering these questions required a canvass of our peers and our students’ prospective employers as well as research on the merits of the competing software.

Prior studies have confirmed that most MPA, MPP, and related masters programs do require an introductory course in statistics and in budget and finance, and often offer more advanced courses as well (Infeld & Adams 2011; Koven, Goetzke, & Brennan 2008; Morçöl & Ivanova 2010; NASPAA 2009). But, in the data-rich, computer-based world of the 21st century, what is being taught in those quantitative courses regarding specific statistical software? No prior systematic report of either academic practices or relevant employer needs could be found. However, before exploring this uncharted territory, we searched for evaluations of the comparative merits of prominent software options.

Software Merits

Despite the current enormous emphasis on outcome evaluations in education in general, and public affairs education in particular (Newcomer & Allen, 2010; Aristigueta & Gomes 2006; Castleberry 2006; Durant 2002; Fitzpatrick & Miller-Stevens 2009; Powell 2009; Roberts & Pavlak 2002; Williams 2002), as reviewed below, remarkably few published articles assess the comparative advantages of competing software for students and only one outcome experiment was identified. This literature summary highlights Excel, SAS, SPSS, and Stata because those findings are especially relevant to the subsequent analysis.

Features. The sole recent comparison of functionality that could be found lists the features of many statistical programs (excluding Excel) regarding regression analysis, time series analysis, ANOVA, and selected other statistics (Wikipedia, 2011). In these areas, leading software such as SAS, SPSS,[1] and Stata all run the most common and many arcane statistical procedures, although a few exotic options are missing (e.g., SPSS lacks quantile regression, autoregressive conditional heteroscedasticity analysis, and generalized autoregressive conditional heteroscedasticity analysis).

Accuracy. Most users take for granted that the computations used by statistical software yield precise and identical results. Unfortunately, that is not necessarily the case. Because software performance is so often measured in terms of speed, programming shortcuts that gain speed can sacrifice exactitude.

Altman & McDonald (2001) obtained mixed results when comparing Excel, SAS, SPSS, Stata, and several other software packages. The good news was that statistical software packages “typically” (though not always) provide “correct answers, to at least the fourth significant digit, for univariate statistics, regression problems, low-difficulty analysis of variance problems, and low difficulty nonlinear regression programs” (p. 684). The bad news was that many programs were unreliable for nonlinear regression and were “unable to return accurate ANOVA results for problems of even average difficulty” (p. 684). Excel 1997 could not even properly calculate standard deviations for data involving eight digits or more.[2]
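To make concrete how a standard deviation can go wrong on many-digit data, the following minimal Python sketch contrasts a one-pass "calculator" formula with a two-pass formula. It is an illustration only: the function names and the sample data are hypothetical, and it is an assumption (not something the studies cited here are quoted as saying) that a shortcut of this kind resembles what any particular version of Excel did internally. The sketch uses ordinary double-precision arithmetic and nine-digit values, so it does not reproduce Excel's behavior.

```python
import math


def one_pass_sd(values):
    """Sample SD via the shortcut formula (sum(x^2) - (sum x)^2 / n) / (n - 1).

    Fast (a single pass over the data) but prone to cancellation error
    when the values are large relative to their spread.
    """
    n = len(values)
    sum_x = 0.0
    sum_x2 = 0.0
    for x in values:
        sum_x += x
        sum_x2 += x * x
    variance = (sum_x2 - sum_x * sum_x / n) / (n - 1)
    # Rounding error can push the variance below zero; clamp it so that
    # math.sqrt does not raise a domain error.
    return math.sqrt(max(variance, 0.0))


def two_pass_sd(values):
    """Sample SD computed from deviations around the mean (two passes)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))


# Hypothetical data: the same spread (1, 2, 3), shifted to nine-digit values.
small = [1.0, 2.0, 3.0]
large = [x + 1e8 for x in small]

for label, data in (("small", small), ("large", large)):
    print(label, one_pass_sd(data), two_pass_sd(data))
```

Both datasets have a true sample standard deviation of 1. The two formulas agree on the small values, but on the shifted values the one-pass shortcut subtracts two nearly equal, very large numbers and loses the answer, while the two-pass version does not. This is the trade-off between speed and exactitude described above.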

Later, using large, complex datasets, researchers at the National Center for Health Statistics performed various procedures using SAS, SPSS, Stata, and a less widely used program, SUDAAN. They obtained “identical results” and concluded that, given this equally high level of precision, software choices should be driven by other factors such as cost, ease of use, and data management capabilities (Siller & Tompkins 2006). However, a more extensive analysis (Keeling & Pavur 2007) of nine software packages (including Excel 2003, SAS 9.1, SPSS 12.0, and Stata 8.1) did detect various shortcomings (especially with nonlinear regression and autocorrelation calculations) but found notable improvements from earlier versions in almost every area.

Since 2007, Excel’s computations have continued to draw criticism; however, no studies were found that gauged the accuracy of later versions of other statistical software. Indeed, Excel has a history of nontrivial computational errors (Sawitzki 1994; McCullough & Wilson 1999, 2002, 2005; Knüsel 1998, 2002, 2005; Altman, Gill, & McDonald, 2004). A more recent examination of Excel 2007 reached disturbing conclusions (McCullough & Heiser 2008, p. 4570):

Excel 2007, like its predecessors, fails a standard set of intermediate-level accuracy tests.... [I]t is not safe to assume that Microsoft Excel’s statistical procedures give the correct answer. Persons who wish to conduct statistical analyses should use some other package.

Among the many errors that McCullough & Heiser identified are a flawed linear regression algorithm and output, erroneous nonlinear regression results, a nonrandom random number generator, and inaccurate t-tests (incorrect results, especially when missing data are involved; wrong p-values; and even mistaken labels). Also regarding Excel 2007, Yalta (2008) found that “the accuracy of various statistical functions range from unacceptably bad to acceptable but significantly inferior” to alternative implementations. Others identified serious flaws in Excel’s polynomial trend line equations (Hargreaves & McWilliams 2010) as well as other areas (Almiron et al. 2010). Some statisticians argue against using Excel (or any other spreadsheet) as a vehicle for teaching statistics (Nash 2008), especially given what they view as misleading and confusing default charts (Su 2008). No scholarly studies defending the precision of Excel could be found.

To date, there seem to be no published critiques of Excel 2010, but McCullough & Heiser were pessimistic about future prospects (2008, p. 4570):

Microsoft occasionally fixes errors, more often ignores them, and sometimes fixes them incorrectly. Consequently, every time there is a new version of Excel, the [accuracy] tests must be repeated.

McCullough (personal communication, August 21, 2011) stated that Microsoft still does not document its “unsupported claims of accuracy” for Excel with any sort of actual “test code with known inputs and outputs.”

Usability. No systematic studies of the relative user-friendliness, accessibility, or learning curve could be found in the published literature. SPSS and Stata offer both a command line interface and a menu-driven, graphical user interface, while SAS is only command based and Excel does not have a command line option. All four run on Windows and Linux; Excel, SPSS, and Stata also offer Mac versions, although SAS does not. Two personal evaluations argued that Stata was superior to SAS and SPSS, but both authors had written books about Stata (Acock 2005; Mitchell 2005).

Costs. Expense matters and certainly gives Excel a structural advantage, as it is often pre-installed on computers or seemingly complimentary as part of the Microsoft Office suite. Also at no extra cost, as of 2011, “SAS OnDemand for Academics” allows online, cloud-based access to a wide range of SAS applications and is free of charge to college instructors and their students. The price of the latest student version of SPSS fell sharply in 2011 and was available on Amazon.com for less than $28, with its limitations of a 13-month license, 1500 cases, and 50 variables. Stata, which not long ago was the least expensive, is now the most expensive for students with the “Small Stata” product costing $49 and limited to one year, 1200 cases, and 99 variables.[3] Site licenses for many academic lab computers quickly run into thousands of dollars, often entail complex fee formulas and volume discounts, and vary further depending on the feature sets to be included. While those exact site license calculations are beyond the scope of this study, they cannot be dismissed as irrelevant.

Pedagogical value. By the 1990s, the dominant pedagogical model for statistics had decisively moved from a passive lecture approach to more active student engagement using a variety of activities, such as creative problem solving, practical applications, discussions, small groups, original research, plus more interpretation and analysis beyond rote calculations (Moore 1997; National Research Council, 1990, 1991). Expository texts and classroom lectures have not been jettisoned entirely, but they came to be seen as insufficient alone to successfully engage active learning. Consequently, involving students in active data exploration via software became an especially valuable tool for the new pedagogy.

Which software is most suitable for this task? As Moore observed (1997, p. 131), “Software designed for doing statistics is not necessarily well structured for learning statistics.” But there is little published research outside of a few scattered personal reflections and anecdotes about classroom advantages of using any particular software. Certainly arguments can reach the level of religious differences given some people’s attachment to and investment in their preferred statistics software package.

The only randomized, controlled test of software was conducted among 24 undergraduates (mostly criminal justice majors) in an introductory statistics course at Indiana University (Proctor 2002). The dozen who used Excel scored higher than the dozen who used SPSS in terms of computational knowledge and slightly higher in conceptual knowledge. One might have expected that more systematic, comparative outcome evaluations would have been conducted by now (given the cost of software licenses, the time spent in computer laboratories, and the large investment in instructional communication) in order to optimize the software selection for quantitative courses.

Issues of accuracy notwithstanding, there are hints in the literature that Excel is making inroads beyond its traditional domain of budget, finance, and business. Articles are appearing about ways to use Excel, usually incorporating its Analysis ToolPak add-in, to teach applied statistics in fields as varied as psychology (Warner & Meehan 2001), nursing (DiMaria-Ghalili & Ostrow 2009), and engineering (Prvan, Reid, & Petocz 2002) as well as business (Bell 2000). Likewise, some statistics textbooks have begun to focus on Excel (e.g., Dretzke, 2011; Carlberg 2011).

All in all, prior research does not offer much guidance regarding the best software to employ in our quantitative classes. It does raise questions about Excel’s precision, but otherwise we seem to be left in the dark with only our personal anecdotes and experiences. Yet academic programs must still make decisions about what software packages to require of their students. We therefore sought to discover what programs across the country currently require for masters students.

Program Practices

A nationwide online poll of 260 eligible[4] NASPAA representatives was conducted May-July, 2011. Completed surveys covered a total of 131 accredited and non-accredited masters programs. Responses encompassed over half (n=98; 52%) of the Master of Public Administration (MPA) programs, over half (n=16; 53%) of the Master of Public Policy (MPP) programs, most (n=5; 71%) of the Master of Public Affairs programs, and twelve of the many dozens of varied other public sector-related masters degrees. Responses from MPP programs also represent over one fourth of those that have institutional membership in the Association for Public Policy Analysis and Management (APPAM).

Insert Table 1 and Figure 1 here.

Of the 131 programs, only two do not offer an introductory statistics course. (See Figure 1 and Table 1.) Only four programs teach statistics without employing a software program. Before dismissing these exceptions, it should be noted that comments volunteered by a few who do use software in MPA programs show skepticism or ambivalence.

“We’re having a lively debate about whether any statistics package should be used. Do MPAs really need to be able to run regressions and such? If we train public leaders, is this a quality they need?… [Some alumni say] these software programs got them their first job. Others say it was a waste of 3 credit hours.”

“The administrator is far better having a general knowledge… and ensuring the expertise is in place than knowing one form of software well unless he or she plans to specialize in some way. If an administrator has time to sit and analyze data, he or she is probably not doing the job especially well.”

“Most of our students don't use stats on the job. Some use Excel on the job but don't need training from us. They get their real training on the job.”

Despite reservations from a few representatives, the overwhelming majority of these degree programs utilize statistical software. In the introductory statistics course, 97% of the MPA programs employ software (excluding the two programs without such a course), as do 100% of the MPP programs and 94% of the other masters programs. Among those programs using software in this course, most offer a companion computer lab (74% of MPA programs, 88% of the MPP programs, and 81% of other masters programs).[5] Follow-up statistics courses are offered by a majority of MPA programs (59%) and a very large majority of MPP and other masters programs (94%); such courses almost always use statistical software and often include a computer lab. (See Figure 1 and Table 1.)

An introductory budget and finance course is offered by large majorities (92% of MPA programs, 75% MPPs, 88% all others) and typically employs statistical software (89% of those MPA programs offering a course; 92% MPPs; 82% others) although, in this case, most do not have an accompanying computer lab (only 30% of MPA programs using software in this course add a lab; 45% MPPs; 14% others). Subsequent budget and finance courses are less common, but, if offered, usually use statistical software without accompanying computer labs. (See Figure 1 and Table 1.)

Thus, many students in these MPA, MPP, and other masters programs are likely to have worked with statistical software in at least two courses and perhaps several, depending on their degree program, fields, and elective choices. Courses in program evaluation, policy analysis, and certainly capstone projects sometimes entail the use of statistical software as well. What software are these programs featuring?

For budget and finance courses, the answer is easy: Excel. Having long ago vanquished its foes like Lotus 1-2-3, Quattro Pro, and PlanPerfect, Excel dominates the academic spreadsheet world without any major rival. (See Figure 2 and Table 1.) Only a handful of these masters programs do not use Excel alone in their budgeting and finance course(s); most of these exceptions added SPSS, Stata, or some other software to Excel; two outliers use SPSS alone.

Insert Figure 2 here.

For introductory statistics courses, software choices are less uniform. SPSS is most widely taught, but it lacks the near monopoly that Excel has in budget and finance courses. In MPA programs, a large majority (70%) use SPSS, as do a majority in MPP (63%) and other masters programs (59%). (See Table 1 and Figures 2 and 3.) A plurality in MPA programs (42%) and MPP programs (44%) use SPSS alone, but it is also used in conjunction with Excel, especially in MPA programs (28%). In MPP programs, Stata supplants Excel as a main rival to SPSS. Other software packages (R, JMP, gretl, and Crystal Ball) were rarely mentioned.

Insert Figure 3 and Figure 4 here and ideally on the same page for easy comparison.

For subsequent statistics courses, Stata emerges as a stronger contender, markedly surpassing SPSS among MPP programs and showing a nontrivial presence in MPA and other masters programs. (See Table 1 and Figures 2 and 4.) SPSS continues to be the software of choice among MPA programs, used in over two-thirds of the later statistics classes, sometimes along with Excel or other software.

Excluding Excel, respondents were asked to rank the demand for SAS, SPSS, and Stata among employers: “How would you estimate the current usage of these three software packages at the jobs your Masters students seek?” As shown in Figure 5, three out of four academics believe that SPSS is most widely used by relevant employers, with opinions split as to whether SAS or Stata is the runner-up.

Insert Figure 5 and Figure 6 here.

In terms of trends, however, the position of SPSS is not so secure. Respondents were asked if they had “noticed any trends in the popularity of these software programs over the past decade.” As shown in Figure 6, Stata is the clear winner in perceived momentum, with 52% saying Stata has gained and only 6% saying it has lost popularity. That net positive 46% compares with a net negative 15% for SAS and a net negative 11% for SPSS. Of course, this question asked about relative trends, not absolute standing, where SPSS still comes out ahead in both university practices and projected employer needs. Yet the widespread impressions of Stata’s strides (indeed, three out of ten said it had gained “a lot”) suggest that SPSS, and programs that teach it, ought not to rest on their laurels. In this vein, one respondent commented:

“We relied on SPSS as our stat package for years, particularly in the intro course. However costs and difficulty in licensing are making us consider alternatives. Since some of us use Stata in our own work, and it is higher education friendly, I think we will be moving more and more in that direction.”

In both questions about perceived usage and trends of these three statistical software packages, our assumption was that, as a spreadsheet, Excel is extremely important but falls in a somewhat different category. However, some respondents volunteered that this trio was inconsequential because nothing really matters but Excel.

“Most of our students are using Excel or similar, job specific applications, not these stats software.”