NSSE-AAUDE Data User Guide – For NSSE v2 and NSSE v1 converted to v2
University of Colorado Boulder - Last update 11/2015
Posted from http://www.colorado.edu/pba/surveys/nsse-aaude/index.htm
This user guide is for users of AAUDE shared data from NSSE 2.0 (version 2, known here as v2, introduced 2013) and data from NSSE 1.0 (2000-2012, aka v1) converted to v2 formats where items and response scales were sufficiently comparable (known throughout as v1_v2).
See the website for codebook with extensive mark-ups; variable list in Excel; data and background on response rates, populations, and samples; information on AAUDE questions; and procedures for AAUDE data sharing and using AAUDE questions.
Super quick guide
· START by running Enhancements.sas – set the choices and locations in the code as desired. Described below.
· THEN run Longitudinal.sas – set choices as desired, to select schools and years.
· YOU WILL THEN HAVE
· Temp dataset long1, row = respondent (in year, school, class level), v2 and v1_v2
· Permanent dataset BaseStats, row = year*school*class level*item, with marginals, v2 and v1_v2
· NOTE, in both datasets, the “key variables,” described below.
· USE the marked-up codebook and key variables to plan and execute your analyses.
Contents of this doc listed below. This is a chicken-and-egg doc, so many sections use terms etc. defined in later sections. Sorry! See section near the end for changes 2013 to 2014. These are minimal, affecting only two demographic vars. We did not revisit the v1-to-v2 decisions we made with the 2013 data; we simply took that model and added 2014 and 2015 data.
NOTE that the contents points to a section on What’s new in (data from administration year XXXX)
Files to have at your fingertips
Files – SAS datasets, SAS formats/code
Key variables – for ID/selecting years, schools, respondents
Variables UCB added or modified
Notes on other variables – HIPsum.., engagement indicators, duration, sexorient
Picking years and schools and respondents for analyses
Missing data – Handling in SAS, patterns of response, PctValid vars, comparison to v1
Naming conventions – Files, variables, labels
Majors – Use the CIPs
Institutional metadata – Populations, samples, response rates
AAUDE Questions – Changes, comparisons, CU calcs – runaround, obstacles
VSA
Things CU-Boulder did in creating the v2.0 response-level dataset
Creating the v1_v2 dataset (in 2013; we didn’t recheck all this)
Adding a new year
2015 changes/updates
2014 changes/updates
NOTES, CAUTIONS specific to ONE YEAR
PBA file refs and summary of Boulder to do’s
Files to have at your fingertips
· This user guide
· The marked up codebook
· The SAS datasets and code from the passworded page.
· The ResponseRates Excel file. Its N_compare_Pivot tab shows patterns of administration and use of AAUDE questions over years
Files – SAS datasets, SAS formats/code
· SAS datasets – all version 9. PBA ref: L:\IR\survey\NSSE\AAUDE\All\datasets, aka nssedb
· METADATA -- Institutional metadata
· Row = school x year x IRclass (1, 4) value
· Includes 2005 – most current
· Population and sample counts, response rates, derivations, weights.
· Same info is in ResponseRates Excel with plots, narrative
· BASESTATS
· Means and SDs (plus N and PctMiss) by institution, class level, and item. Row = school x year x IRclass (1, 4) value, plus marginals, x item. 2001-most current, with v1 data converted to v2 formats.
· Stats are based on compare=1 only
· Sorted by item x year x inst x class level with most recent year on top. Items are ordered by var_order
· CAUTION: here it’s variable CLASS, with values Freshman, Senior, All, rather than IRClass (values 1, 4)
· Contains handy key variables including public, Partic2013, Partic2014, Partic2015, Partic2016, v2, LastV1, compare (always 1 in this dataset)
· And item-specific variables: variable (name), LabelShort, q_num, Var_Order (as in MatchVars_v2), and InV1_v2 so you can limit rows to items with both v1 and v2 data if desired.
· Boulder had thoughts of, but decided against, taking on getting the stats into Tableau. If anyone else does, let us know, for sharing on the AAUDE-NSSE website.
· Boulder considered but decided not to add distributions (percentages) from layout (e.g., pct1, pct2, pct3, with pct who selected response values 1, 2, 3)
· Response-level data in v2 formats
· NSSEALLYRS_V2: 2013 (and later) response-level data, all institutions
· Vars include base questionnaire, NSSE recodes from base (e.g., estimated hrs from x to y scale), NSSE engagement indicator scales (e.g., HO, higher-order learning), NSSE-coded majors and CIPs, NSSE weights, items from all NSSE modules, AAUDE questions, and CU-added utility variables (e.g., COMPARE)
· See Enhancements.sas below to change to shorter variable labels and attach formats.
· CONVERT_V1_V2: Converted version 1 data to version 2 data.
· 2001-2012 response-level data, all institutions.
· Contains the 120 items from v1 that could be converted to a v2 item. Items have been renamed to the v2 name and values converted to v2 values.
· 233,000 rows – this N will not increase. This dataset will not be updated for future administrations aside from adding e.g. Partic2015 flags to all rows of schools participating that year
· Use the provided SAS code to put these two datasets together and select from them for longitudinal analyses.
· NEW Nov. 2015: NSSEALLYRS_V2_TEXT: 2014 (and later) response-level data for the open-ended response questions, all institutions. These items were not on what AAUDE got from NSSE in 2013.
· Row = 1 respondent, with year, IPEDS, INST, IRclass, and surveyID.
· Can be merged with NSSEALLYRS_V2 on YEAR, IPEDS, INST, IRclass and SURVEYID. Use all five keys: SURVEYID alone may not be unique across the other keys.
· Not all respondents are in this dataset. The dataset contains only respondents who made at least one non-blank response to one of the below items.
· Vars include
· GENDERID_TXT (gender identity),
· SEXORIENT_TXT (sexual orientation),
· ADV03_TXT (academic advising module, primary source of advice),
· CIV03_TXT_PLUS (civic engagement module, most meaningful experiences),
· FYSFY06B_6_TXT (FY experiences & SR transitions module, considered leaving),
· FYSSR01A_TXT (plans after graduation),
· FYSSR08_TXT_PLUS (anything institution could have done better to prepare for career or further education)
· Note: CIV03_TXT_PLUS and FYSSR08_TXT_PLUS are each a combination of variables, because responses can be very long. CIV03_TXT_PLUS has a length of 1,062 and FYSSR08_TXT_PLUS has a length of 3,843.
· CAUTIONS from Lou on using this dataset
· Some of the items follow a closed item with text additions to alternatives, so you should not analyze the text items w/o reference to the closed item.
· The dataset is structured to be efficient for merging and analysis, not storage. Most of the space in it is blank.
· Many of the items come from modules, not base NSSE, so were presented to a minority of respondents.
· We masked nothing. Some of the comments name individuals, many berate offices or individuals.
· We did not include the open-ended response items on major/discipline in either dataset. Use the CIP code variables for this instead.
· Going forward, all future open-ended response items will be in this dataset.
· MATCH_V2 - 'Match' dataset used to assign formats and labels. Row=variable. For v2, with a map of the conversion of v1 to v2 variables. Comes from MatchVars_v2 Excel, which is the master source. The Match_v2 dataset is used in generating code that handles the conversion of v1 to v2.
· SAS formats and code for use with v2 and v1_v2 data.
· All have file extension .sas but are ASCII files that can be opened in any text editor or even in Word.
· Functions
· Put v2 and v1_v2 data together
· Switch from the extremely long variable labels provided by NSSE to shorter labels
· Drop variables from the NSSE modules
· Create SAS formats to label values
· Attach formats to variables
· Select populations of schools, years, and respondents to suit the analyses you’re after
· Formats_v2.sas – creates formats to label values.
· see codebook markup or MatchVars Excel or match dataset for what formats go with what vars.
· Enhancements.sas attaches formats to variables in a temporary dataset.
· PBA ref: L:\IR\survey\NSSE\AAUDE\All\gencode\Formats_v2.sas
· Enhancements.sas
· Run FIRST.
· Allows user to switch to shorter variable labels; also creates and attaches formats to variables in the dataset. Calls formats_v2.sas
· PBA ref: L:\IR\survey\NSSE\AAUDE\All\Enhancements.sas
· Longitudinal.sas
· Run AFTER Enhancements.sas.
· Code to concatenate nsseallyrs_v2 with convert_v1_v2 to allow longitudinal analysis across v1 to v2, and to select comparable populations over time.
· Creates temp dataset long1, with all v2 vars (except those from modules, if dropped) plus two vars from v1: LastV1 (0/1 marking latest year a school did v1) and aau17v1 (the AAUDE “obstacles” question from v1, for reference – not similar enough to v2 AAUDE obstacles question to recode). New vars in v2 will be missing in v1 years.
· PBA ref: IR\survey\NSSE\AAUDE\All\Longitudinal.sas
· convert_v1_v2.sas
· For reference, what we did in converting v1 to v2. Users do not need to run or understand this!!
· PBA ref: L:\IR\survey\NSSE\AAUDE\All\Convert_V1_V2.sas
· NSSEinit.sas
· Initialization file for CU-Boulder users. Suggest you establish similar with your locations and directives
· PBA ref: l:\ir\survey\nsse\aaude\NSSEinit.sas
· In a separate ZIP - V1 only datasets, with v1 variable names – You should not need these at all.
· Datasets: Basestats_v1, Match_v1, nsseallyrs_v1 (same as nsseallyrs through 2012 data), nsse_oneyear_v1 (with 170 variables used in only one year)
· Use ONLY if you have code for v1 that you want to use on v1 only data
· PBA ref: L:\IR\survey\NSSE\AAUDE\All\datasets\archive, aka nsardb
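The merge of NSSEALLYRS_V2_TEXT with NSSEALLYRS_V2 described earlier in this section can be sketched as follows. This is a minimal illustration of the logic in Python, not the SAS code distributed with the data. It assumes the five merge keys listed above and keeps every respondent, since the text dataset holds only respondents with at least one non-blank text response. The rows, the second institution ("SchoolB", IPEDS "000000"), and the comment text are made up for illustration.

```python
# Sketch only: a left merge of response-level rows with open-ended text rows.
# Key names come from the guide; all data values below are hypothetical.
KEYS = ("YEAR", "IPEDS", "INST", "IRclass", "SURVEYID")


def key_of(row):
    """Build the five-part merge key for one row."""
    return tuple(row[k] for k in KEYS)


def left_merge(responses, text_rows):
    """Keep every response row; attach text variables where a match exists."""
    text_by_key = {key_of(t): t for t in text_rows}
    # All non-key columns that appear anywhere in the text rows.
    text_vars = sorted({c for t in text_rows for c in t if c not in KEYS})
    merged = []
    for r in responses:
        match = text_by_key.get(key_of(r), {})
        out = dict(r)
        for v in text_vars:
            # Blank when the respondent has no row in the text dataset.
            out[v] = match.get(v, "")
        merged.append(out)
    return merged


# Hypothetical data: SURVEYID 101 appears at two institutions, which is why
# the merge uses all five keys rather than SURVEYID alone.
responses = [
    {"YEAR": 2014, "IPEDS": "126614", "INST": "Colorado", "IRclass": 1, "SURVEYID": 101},
    {"YEAR": 2014, "IPEDS": "000000", "INST": "SchoolB", "IRclass": 1, "SURVEYID": 101},
    {"YEAR": 2014, "IPEDS": "126614", "INST": "Colorado", "IRclass": 4, "SURVEYID": 102},
]
text_rows = [
    {"YEAR": 2014, "IPEDS": "126614", "INST": "Colorado", "IRclass": 1, "SURVEYID": 101,
     "GENDERID_TXT": "example text"},
]

merged = left_merge(responses, text_rows)
for row in merged:
    print(row["INST"], row["SURVEYID"], repr(row["GENDERID_TXT"]))
```

Merging on SURVEYID alone would attach the Colorado comment to the SchoolB respondent as well, which is the situation the caution about key uniqueness is meant to prevent.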
Key variables – for ID/selecting years, schools, respondents
· KEY variables are noted with the word KEY in the label, meaning critically important. Not all are literally keys, nor are all literal keys so labelled. They are generally near the top of the dataset, and are in both the convert_v1_v2 and nsseallyrs_v2 datasets except as noted
· YEAR -- Survey Year (KEY) – Numeric, e.g. 2013 – coverage depends on dataset. IS A KEY. No missing. YYYY. 2001+.
· INST -- Institution Name (KEY) – Short (16 char) version; e.g. ‘Colorado’ – these are identical over years and 1-1 with IPEDS so you can use them as a key. No missing.
· IPEDS -- IPEDS code (KEY) – Character, e.g. ‘126614’. IS A KEY. The 6-digit IPEDS code. Text variable. XXXXXX for McGill, YYYYY for Toronto. (NSSE uses 24002000 McGill, 35015001 Toronto )
· PUBLIC -- Public Institution (0=no, 1=yes)
· TRGTINST -- Target institution (set to 1 for own institution) – 0/1 numeric; initially all zero.
· Partic2013, Partic2014, Partic2015, Partic2016 are 0/1 vars set for a school on all years administered – These make selecting comparable populations easy. E.g. “where Partic2013” gives all years for the schools that administered in 2013.
· Partic2016 is already set (based on 10-26-15 info; could change). We’ll add Partic2017 etc. in annual updates.
· AAUDEQ - 0/1 (zero, one) variable where 1 = respondent answered at least one AAUDE question. 0 = did not. 0 if school didn’t ask AAUDE q’s in a year, OR if student didn’t answer any of them.
· Property of an individual response/row
· COMPARE – Respondent-level. COMPARABLE over schools: Random sample of eligible first years/seniors (KEY) – 0/1 numeric. Critically important = use “where compare” for ALL comparisons across schools.
· IRclass -- Institution-reported: Class level (KEY) – IS A KEY to stats.
· Values 1, 4 – Numeric, for first-year, seniors
· Other values occur but not within COMPARE=1.
· Really not “class level” – e.g., 1=first year can include soph status.
· SurveyID – NSSE-assigned ID number – important only for linking to your own student records
· V2 -- NSSE version 2 (2013 & later) - 0=no, 1=yes.
· LastV1 – 0/1, marks cases from the school’s last v1 administration. Not in the nsseallyrs_v2 dataset. Useful for comparing v2 responses with responses from only the last v1 year for a school.
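How the key variables combine for selection can be sketched as follows. This is an illustration in Python of the filtering logic only, not the contents of Longitudinal.sas. It assumes rows shaped like the combined long1 dataset, with LastV1 missing (None) on v2 rows; the rows and "SchoolB" are made up.

```python
# Sketch only: selecting comparable respondents with the key variables.
# Variable names come from the guide; all data values are hypothetical.

def select_comparable(rows, partic_flag="Partic2013"):
    """Rows for schools that administered in the flagged year, limited to
    respondents who are comparable across schools (COMPARE = 1)."""
    return [r for r in rows if r.get(partic_flag) == 1 and r.get("COMPARE") == 1]


def v2_or_last_v1(rows):
    """Rows from v2 years plus each school's last v1 administration."""
    return [r for r in rows if r.get("V2") == 1 or r.get("LastV1") == 1]


rows = [
    {"YEAR": 2012, "INST": "Colorado", "Partic2013": 1, "COMPARE": 1, "V2": 0, "LastV1": 1},
    {"YEAR": 2010, "INST": "Colorado", "Partic2013": 1, "COMPARE": 1, "V2": 0, "LastV1": 0},
    {"YEAR": 2013, "INST": "Colorado", "Partic2013": 1, "COMPARE": 1, "V2": 1, "LastV1": None},
    {"YEAR": 2013, "INST": "Colorado", "Partic2013": 1, "COMPARE": 0, "V2": 1, "LastV1": None},
    {"YEAR": 2014, "INST": "SchoolB", "Partic2013": 0, "COMPARE": 1, "V2": 1, "LastV1": None},
]

comparable = select_comparable(rows)
recent = v2_or_last_v1(comparable)
print([r["YEAR"] for r in comparable])
print([r["YEAR"] for r in recent])
```

Applying the COMPARE filter first mirrors the guide's advice to use "where compare" for all comparisons across schools; the LastV1 step is optional and only narrows the v1 side of a v1-to-v2 comparison.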
Variables UCB added or modified
· YEAR, PUBLIC, TRGTINST, COMPARE, V2, Partic2013, Partic2014, Partic2015, Partic2016, LastV1
· CIP codes and descriptors of major codes (see section on Majors, below)
· SATT (V+M)
· MULTRE – calc, were multiple racial/ethnic categories checked
· PctValidQ1_19 and PctValidQ20_36 – pct of questions student made a valid response to – see section on Missing Data, below
· We changed NSSE variable “logdate” from a date-time to a date.
· A new var, logdatetime, records the date-time so you can see how many did it at 3am
· AAU13Avg, NValid, N_1, N_34 – derivations from items AAU13a-g, the new-format obstacles question
· BACH – 0/1 expecting bachelors at this school – derived from AAUDE q
Notes on other variables – HIPsum.., engagement indicators, duration, sexorient
· Variables HIPsumFY and HIPsumSR are calc’d by NSSE from items judged “high impact practices.”
· The variable labels tell the varnames of the items that contribute – 3 for FY, 6 for SR
· HIPsumFY: Number of high-impact practices for first-year students (learncom, servcourse, and research) marked 'Done or in progress'
· HIPsumSR: Number of high-impact practices for seniors (learncom, servcourse, research, intern, abroad, and capstone) marked 'Done or in progress'
· The HIPsumXX vars range 0 to 3 or 6, with one point for each item marked “done or in progress.”
· HIPsumFY is missing for all IRclass=4 (seniors), and vice versa
· If a student answered none of the 3 or 6 items, the HIPsumxx variable is missing. It is NOT missing if the student answered even one of the 3 or 6.
· HIPsumFY is missing for about 15% of IRclass=1
· HIPsumSR is missing for about 13% of IRclass=4
· Note that the 3 items in the HIPsumFY scale are also in the HIPsumSR scale, with internship, study abroad, and capstone added.
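The HIPsum rules above can be sketched as follows. This is a Python illustration that follows the variable labels quoted above; NSSE computes these variables, and its actual computation may differ in detail. It assumes responses are stored as their text labels, with None for an unanswered item. The labels other than 'Done or in progress' are illustrative.

```python
# Sketch only: the HIPsum counting and missing-data rules described above.
# Item names come from the variable labels; response coding is an assumption.
DONE = "Done or in progress"
FY_ITEMS = ("learncom", "servcourse", "research")
SR_ITEMS = FY_ITEMS + ("intern", "abroad", "capstone")


def hip_sum(row, items):
    """Count items marked DONE; missing (None) only if no item was answered."""
    answered = [row.get(i) for i in items if row.get(i) is not None]
    if not answered:
        return None
    return sum(1 for a in answered if a == DONE)


def hip_sums(row):
    """Return (HIPsumFY, HIPsumSR). Each is missing for the other class level."""
    fy = hip_sum(row, FY_ITEMS) if row["IRclass"] == 1 else None
    sr = hip_sum(row, SR_ITEMS) if row["IRclass"] == 4 else None
    return fy, sr


# Hypothetical respondents.
first_year = {"IRclass": 1, "learncom": DONE, "servcourse": "Plan to do", "research": None}
senior = {"IRclass": 4, "learncom": DONE, "intern": DONE, "capstone": DONE,
          "abroad": "Do not plan to do"}
blank_senior = {"IRclass": 4}

print(hip_sums(first_year))
print(hip_sums(senior))
print(hip_sums(blank_senior))
```

Because one answered item is enough to produce a non-missing sum, a low HIPsum value can reflect skipped items as well as practices not done.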
· The Engagement Indicators (EI) – variables HO RI LS QR CL DD SF ET QI SE – see p 17 of the codebook markup. The var labels also show what items are included. All range 0 to 60, oddly.