GSS Computer Lab

In this lab, we are going to examine and describe two-way tables built from data from the General Social Survey (GSS). This survey has been conducted in the United States through in-person interviews since 1972; it was run annually until 1994, and every other year since then. The survey collects both demographic details, such as age, race, gender, and education, and answers to questions on topics as diverse as attitudes towards government spending on mental health, belief in hell, and marijuana usage.

·  Go to the website sda.berkeley.edu, and click on the SDA Archives link.

·  Now click on the first line, the GSS Cumulative Datafile 1972-2012.

In the Variable Selection panel on the left side of your screen, we can look at the different questions that people were asked and see how they answered those questions.

·  Go to the blue box labeled “Codebooks” at the top, and click on “Standard Codebook”. A separate window should open up, with details of all of the questions that were ever asked, from 1972 to 2012.

·  Click on Sequential Variable List. This will bring up a list of categories of questions. If you click on a topic here, such as Religious Attendance and Identity under the RESPONDENT BACKGROUND VARIABLES heading, you will see a list of slightly cryptic short code words, each followed by a brief summary of its question.

·  Click on a code word, such as ATTRELIG. These code words are called “variables”; you’ll need to know a question’s variable name in order to work with that question later on. You will find the exact question that was asked (precise wording is very important) and how people answered it. Spend some time examining the different questions that were asked, and make a list of variables that you find interesting.

Say you’re curious about the relationship between political affiliation and religion. Is it true that people who are politically conservative are more religious? There’s no single question that measures whether someone is “religious”. For now, we’re going to use the variable ATTRELIG, which we just looked at, so we’ll be answering the narrower question of whether religious attendance differs by political orientation. You’ll find the variable POLVIEWS under Personal and Family Information/Voting Patterns; it records answers to the question "We hear a lot of talk these days about liberals and conservatives. I'm going to show you a seven-point scale on which the political views that people might hold are arranged from extremely liberal--point 1--to extremely conservative--point 7. Where would you place yourself on this scale, or haven't you thought much about this?"

·  Type ATTRELIG in the Row Variable box on the right side of the screen, or find it in the variable list on the left, select it with a click, and click the button for Copy to: Row.

·  Make POLVIEWS the column variable, either by typing it into the Column variable box on the right, or finding it on the left and clicking for Copy to: Column.

Now look at the Table Options and Chart Options on the right-hand side.

·  Under Table Options, click for column percentages; later, you should go back, unclick column percentages, and select row percentages.

·  Optionally, select 0 decimal places instead of 1; this will make the numbers a little easier to talk about, and we really won’t lose much important detail.

·  Click question text, so you get the exact wording of the questions.

·  Unclick color coding, as it can be confusing.

·  Now click on the “Run the Table” button, and try to understand what the percentages mean. What conclusions can you draw from the resulting table? The pie charts should help.

Let’s look at the ATTRELIG and POLVIEWS example in detail. Here are two tables comparing these variables, the first with column percents and the second with row percents. In each table, the bold numbers are percentages of people; the non-bold numbers show how many people gave each combination of answers, and are what those percentages are based on.

First, you need to be sure you understand what each percentage means. You can tell that the first table gives column percents because the column totals are all 100%. Look at the “31” in the first table where “MODERATE” meets “YES”. This means that 31% of moderates attended religious services in the week before they were surveyed. Because the second table gives row percents, the “35” in the same cell instead tells us that 35% of people who attended services in the last week were moderates. In this case, I think the column percentages are clearer than the row percentages, so I’ll use the first table for the summary.
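To make the difference between column and row percentages concrete, here is a minimal Python sketch that computes both from one small table of counts. The counts are invented for illustration (they are not GSS results), and the function names `column_percents` and `row_percents` are hypothetical; SDA does this arithmetic for you when you tick the boxes under Table Options.

```python
# Hypothetical counts for illustration only; these are NOT real GSS results.
# Rows are answers to an attendance question; columns are political groups.
counts = {
    "YES": {"LIBERAL": 20, "MODERATE": 30, "CONSERVATIVE": 25},
    "NO": {"LIBERAL": 80, "MODERATE": 70, "CONSERVATIVE": 25},
}


def column_percents(table):
    """Divide each cell by its column total, so every column sums to 100%."""
    columns = list(next(iter(table.values())))
    totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {r: {c: 100 * row[c] / totals[c] for c in columns} for r, row in table.items()}


def row_percents(table):
    """Divide each cell by its row total, so every row sums to 100%."""
    return {
        r: {c: 100 * value / sum(row.values()) for c, value in row.items()}
        for r, row in table.items()
    }


if __name__ == "__main__":
    for label, pct in [("Column %", column_percents(counts)), ("Row %", row_percents(counts))]:
        print(label)
        for answer, cells in pct.items():
            # Round to 0 decimal places, as suggested in the Table Options step.
            print(f"  {answer:4}", "  ".join(f"{c}={v:.0f}" for c, v in cells.items()))
```

With these made-up counts, 30% of moderates said YES (a column percent), but 40% of the people who said YES were moderates (a row percent): the same cell answers two different questions, just as the “31” and “35” do in the real tables.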

In that first table, you see that the percentages in the “YES” row go up as you look at people who are more conservative. This means that there is, indeed, a link between religious attendance and political leanings. Now, let’s write a description of this table, using the GEE technique: first Generalize, then Example, then Exception.[1]

·  Generalize – describe the basic pattern in the data: The more politically conservative people are, the more likely they are to have attended religious services in the last week.

·  Example – give a specific set of numbers that illustrate this pattern: Fully 52% of conservatives said they attended religious services in the last week, while only 22% of liberals did.

·  Exception – describe any break in the pattern, or other observations that differ in some way from the generalization: People in the middle of the political spectrum, describing themselves as either slightly liberal, moderate, or slightly conservative, all had very similar levels of religious attendance, at about 30%.

Warnings, More Details, and Useful Variables:

§  Be sure to always look at the precise question text! In the question text, R stands for respondent – the person answering the question (“you”, in other words). Some questions are only asked of people who answered “yes” to a previous question. If you’re confused about what exactly a question means, it may be best to skip it and use a different question.

§  Sometimes, you won’t be able to compare two different variables; you’ll get an error message. This happens when nobody answered both questions, which usually occurs when they were asked in different years.

§  If you use a variable with many possible answers (like the respondent’s age, or age when first married), the resulting table will have too many entries to be easily readable. One such variable, EDUC, gives the number of years of education; DEGREE, which records the highest degree attained, is an easier-to-use variable that may address the same basic idea.

§  When there are very few people who answered both questions, or some answers given by very few people, it’s dangerous to generalize the results.

§  The variable SEX records whether the respondent is male or female; YEAR records when this person was interviewed.

§  Since this data spans the years from 1972 to 2012, you shouldn’t use any variables that provide financial information, such as income: because of inflation, $1000 was worth a lot more in 1972 than in 2012.

§  If you want to compare three variables, you can enter the third under “Control”. You’ll now get a number of different charts; for example, if your control was sex, you’ll get a chart showing how women answered the questions, and a chart showing how men did.
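Conceptually, a control variable splits the respondents into groups and builds a separate two-way table for each group. The Python sketch below shows that idea with a handful of made-up respondent records (not GSS data); the record layout and the `tables_by_control` name are assumptions for illustration, not a description of how SDA works internally.

```python
from collections import Counter, defaultdict

# Made-up respondent records for illustration only (NOT GSS data):
# each tuple is (row answer, column answer, control value).
records = [
    ("YES", "LIBERAL", "FEMALE"),
    ("NO", "LIBERAL", "FEMALE"),
    ("YES", "CONSERVATIVE", "FEMALE"),
    ("YES", "CONSERVATIVE", "MALE"),
    ("NO", "LIBERAL", "MALE"),
    ("NO", "CONSERVATIVE", "MALE"),
]


def tables_by_control(rows):
    """Count (row answer, column answer) pairs separately for each control value."""
    tables = defaultdict(Counter)
    for row_answer, column_answer, control in rows:
        tables[control][(row_answer, column_answer)] += 1
    return tables


if __name__ == "__main__":
    for control, table in sorted(tables_by_control(records).items()):
        print(control, dict(table))
```

Each per-group table of counts can then be turned into column or row percentages and read just like an ordinary two-way table.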

Assignment:

1.  List three different variables that interest you, giving the variable name and the question text.

2.  Following the same process we went through for ATTRELIG and POLVIEWS above, describe the results of comparing three different pairs of GSS variables. Do not reuse any variable. List the variables that you chose to compare, and write a paragraph for each pair describing the resulting table. You’ll want to experiment with both row and column percentages, and then select one. Do you find the results surprising, or are they about what you would expect?
    You’ll almost certainly try some pairs that won’t work well, because they can’t be compared, because the table has too many entries, or simply because you find the table too confusing, with no clear patterns. Feel free to give up on such pairs and try something else instead.

[1] The GEE technique comes from Jane Miller’s The Chicago Guide to Writing about Numbers.