Version 1.0

Public Agenda Data and Mosaic Plots Activity

** Looking at two variables at once

Yesterday, we looked at bar plots about grades and effort separately.

A good question to ask is “are the two related?”

R code / table(survey$grade, survey$effort)
Description / This breaks the data down into how many people gave each combination of answers to the 2 questions. For example, there are 311 students that have an A AND are trying their best to do well in school.
Output /
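
To see what table() is doing, it can help to try it on a tiny data set where you can count the answers by hand. The sketch below uses a pretend data frame called toy; its six rows are invented for illustration and are not from the Public Agenda survey.

```r
# A tiny made-up data frame (NOT the real survey data): six pretend students
toy <- data.frame(
  grade  = c("A", "A", "B", "B", "B", "C"),
  effort = c("Trying best", "Trying best", "Trying best",
             "Could try harder", "Could try harder", "Could try harder")
)

# Count how many students gave each combination of answers
toy_counts <- table(toy$grade, toy$effort)
print(toy_counts)

# Pull out one cell: students with an A who say they are trying their best
toy_counts["A", "Trying best"]
```

Each number in the printed table is a count of pretend students, so all the cells together add up to six.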

To represent this graphically, we can use a mosaic plot.

R code / mosaicplot(table(survey$grade,survey$effort))
Description / Graphically compare 2 categorical variables
Output /

How to interpret the mosaic plot:

What grade is the most common?

What grade is the least common?

Each column is broken down by how the people within it answered the other question.

Within those students with A’s, are most of them trying their best or could they try harder?

Within those students with B’s, are most of them trying their best or could they try harder?

All the sizes are proportional to the numbers in the table. So if twice as many students respond a certain way, then that box would be twice as tall in the mosaic plot.
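
One way to check this against the numbers is to turn the counts into shares. The sketch below uses made-up answers (not the survey data). prop.table() with margin = 1 divides each row of the table by its row total, and each row of the table is drawn as one column of the mosaic plot.

```r
# Made-up answers for illustration only (NOT the real survey data)
grade  <- c("A", "A", "A", "B", "B", "B", "B", "B", "B")
effort <- c("Trying best", "Trying best", "Could try harder",
            "Trying best", "Trying best", "Could try harder",
            "Could try harder", "Could try harder", "Could try harder")
toy_counts <- table(grade, effort)

# Share of each effort answer within each grade (each row adds up to 1)
toy_shares <- prop.table(toy_counts, margin = 1)
print(round(toy_shares, 2))

# The box heights inside each column of the plot follow these shares
mosaicplot(toy_counts, main = "Made-up example")
```

In this pretend data, twice as many students have a B as an A, so the B column is drawn about twice as wide. Within the A column, the "Trying best" box is about twice as tall as the "Could try harder" box.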

Looking at the mosaic plot as a whole, is there a trend?

Changing the order of the variables:

R code / mosaicplot(table(survey$effort,survey$grade))
Description / If you switch the order of the variables, the mosaic plot will be drawn differently. It may suggest a different story.
Output /
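
Switching the order does not change the counts, only which variable makes the columns. The sketch below, again on made-up answers rather than the survey data, builds the table both ways and draws the two plots side by side.

```r
# Made-up answers for illustration only (NOT the real survey data)
grade  <- c("A", "A", "A", "B", "B", "C")
effort <- c("Trying best", "Trying best", "Could try harder",
            "Trying best", "Could try harder", "Could try harder")

by_grade  <- table(grade, effort)   # grades become the columns of the plot
by_effort <- table(effort, grade)   # effort answers become the columns

# Same counts either way: one table is the other with rows and columns swapped
all(by_grade == t(by_effort))

# Draw the two plots side by side to compare
par(mfrow = c(1, 2))
mosaicplot(by_grade,  main = "Grade first")
mosaicplot(by_effort, main = "Effort first")
par(mfrow = c(1, 1))                # go back to one plot per page
```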

What do you see in the second plot?

R code / mosaicplot(table(survey$grade,survey$effort),col=c("cyan","magenta"))
Description / Adding color to your mosaic plot.
Output /
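
Colors are easier to read when the plot also has a title and axis labels. The sketch below is one possible way to add them, using made-up answers rather than the survey data. It spells the color argument out as color, the name used on the mosaicplot() help page; the title and labels are just examples.

```r
# Made-up answers for illustration only (NOT the real survey data)
grade  <- c("A", "A", "A", "B", "B", "C")
effort <- c("Trying best", "Trying best", "Could try harder",
            "Trying best", "Could try harder", "Could try harder")
toy_counts <- table(grade, effort)

# One color per effort answer, plus a title and axis labels
effort_colors <- c("cyan", "magenta")
mosaicplot(toy_counts,
           color = effort_colors,
           main  = "Grades and effort (made-up data)",
           xlab  = "Grade",
           ylab  = "Effort")
```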

Now explore:

Try making mosaic plots with different combinations of the available questions: year, effort, homework, grades. Find another plot and explain what story it tells.

Extra Credit: Playing with the whole survey.

** The original data

The data you have been working with come from a larger survey conducted by Public Agenda. As we mentioned before, it was created by "sampling" students from a large registry. Again, imagine placing students' names in a hat, drawing them out one at a time, and calling each student with the questions. The "code book" (survey questions and answers) is available at:

http://mobilize.stat.ucla.edu/day1/data/students.doc

You can access the original data for the surveys through R directly. The group who created the data made it available in the format of another statistical programming environment called SPSS. Just like there are many general programming languages (Perl, Python, Java, C, C++) there are lots of statistical programming languages as well. To read this "foreign" format into R we need to add to R's functionality by loading a "library". People contribute libraries to R to allow statisticians to represent new kinds of data, perform new computations, make new graphics and so on. Here, we want to use a function that lets R read an SPSS data set.

> library(foreign)
> big_survey <- read.spss(
    "http://mobilize.stat.ucla.edu/day1/data/reality_check.sav",
    to.data.frame=TRUE)

Here, the library() command introduced a new function, read.spss(). The arguments to this function are the URL where the data are located and a request that the data be read into a data table (again, formally, a data frame).

Now, the code book labels its questions K1 through K48 or so. These are translated into the names of the variables in big_survey. For example, the question about a student's grades is K36 in the main survey, which we could access with

> big_survey$qk36
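
A first step with any question is to count the answers, just as with the smaller data set. The sketch below shows the pattern on a small stand-in data frame so that it runs without downloading anything. The answers in it are invented, and only the column name qk36 comes from the text above. With the real data loaded, you would use big_survey in place of stand_in.

```r
# Stand-in for big_survey with invented answers (NOT the real survey data).
# Only the column name qk36 is taken from the worksheet.
stand_in <- data.frame(
  qk36 = c("Mostly A's", "Mostly B's", "Mostly B's", "Mostly C's")
)

# List the variable names, then count the answers to one question
names(stand_in)
grade_counts <- table(stand_in$qk36)
print(grade_counts)

# A bar plot of one question, like the ones from yesterday
barplot(grade_counts)
```

The same two steps, table() and then a plot, work for any question you pick from the code book.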

You can now look through the questions and come up with a few that you are interested in. What story can you tell?

Exploring Computer Science—Unit 6: Participatory Urban Sensing “R” Supplement Page 8