STA 101: Data Analysis and Statistical Inference LAB 1
Introduction to Data: Speed Dating
Dr. Kari Lock Morgan
This lab is designed to help you get comfortable looking at datasets, thinking about cases and variables, and thinking about the types of interesting questions that data and statistics can help us answer.
PART ONE: SPEED DATING
Today we will look at data on speed dating, and questions regarding heterosexual romantic attraction. The data come from speed dating experiments conducted at Columbia University between 2002 and 2004. Participants were students at Columbia’s graduate and professional schools. Each participant attended one speed dating session, in which they met with each participant of the opposite sex for four minutes. Order and session assignments were randomly determined.
After each four-minute “speed date,” participants filled out a form rating their date partner. The data are organized so that each row is a date, and each variable name is followed by either an M, indicating the male's answers, or an F, indicating the female's answers. The following variables were recorded:
Attractive, Sincere, Intelligent, Fun, Ambitious, SharedInterests: Ratings of their date partner on a scale of 1-10 on each of these attributes. 10 = highest, 1 = lowest.
Decision: 1 = Yes (want to see the date again), 0 = No (do not want to see the date again).
PartnerYes: How likely they think it is that the partner will say “yes” (Decision = 1) to them, on a scale of 1-10. 10 = most likely, 1 = least likely.
Age, Race: Demographic variables
All datasets from the textbook can be accessed online. Find SpeedDating at this website, and click on the .xls file to download the data as an Excel spreadsheet. Open the data and take a look!
Reference: Fisman, R., Iyengar, S., Kamenica, E., and Simonson, I., “Gender differences in mate selection: evidence from a speed dating experiment,” Quarterly Journal of Economics, 2006; 121(2): 673–697.
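If you would rather look at the data layout in code than in Excel, the row-per-date structure described above can be sketched as follows. This is only an illustration: the three rows are made up (they are not values from the real dataset), only a few of the variables are included, and the helper name split_by_rater is hypothetical.

```python
# Hypothetical rows for illustration only; NOT values from the real dataset.
# Each row (case) is one date. A trailing M marks the male's answer,
# a trailing F marks the female's answer.
dates = [
    {"DecisionM": 1, "DecisionF": 0, "AttractiveM": 7, "AttractiveF": 5, "AgeM": 24, "AgeF": 26},
    {"DecisionM": 0, "DecisionF": 0, "AttractiveM": 4, "AttractiveF": 6, "AgeM": 27, "AgeF": 23},
    {"DecisionM": 1, "DecisionF": 1, "AttractiveM": 9, "AttractiveF": 8, "AgeM": 25, "AgeF": 25},
]


def split_by_rater(columns):
    """Group column names by their trailing M or F, dropping the suffix."""
    by_rater = {"M": [], "F": []}
    for name in columns:
        by_rater[name[-1]].append(name[:-1])
    return by_rater


columns = list(dates[0].keys())
n_cases = len(dates)        # number of rows in this toy table
n_variables = len(columns)  # number of columns in this toy table
by_rater = split_by_rater(columns)

print("cases:", n_cases)
print("variables:", n_variables)
print("answered by males:", by_rater["M"])
print("answered by females:", by_rater["F"])
```

The counts printed here describe the toy table only; the real spreadsheet has more rows and more columns.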
1) Cases and Variables
- What are the cases?
- How many cases and how many variables are in this dataset?
- Which variables are categorical?
- Which variables are quantitative?
2) Interesting Questions. For each of the following situations, come up with an interesting question you would like to use this data to help you answer. For each question, also indicate which variable(s) would be used to answer the question.
- One quantitative variable
- One categorical variable
- One quantitative variable, One categorical variable
- Two categorical variables
- Two quantitative variables
- More than two variables
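As a rough guide to what the situations above look like in practice, the sketch below computes one simple summary for four of them. It is a minimal illustration under stated assumptions: the four rows are made up (not real data), the choice of summaries (mean, proportion, group means, two-way counts) is just one reasonable option, and it does not suggest which questions you should ask.

```python
from collections import Counter
from statistics import mean

# Hypothetical rows for illustration only; NOT values from the real dataset.
dates = [
    {"DecisionM": 1, "DecisionF": 1, "AttractiveM": 8, "AttractiveF": 7},
    {"DecisionM": 1, "DecisionF": 0, "AttractiveM": 6, "AttractiveF": 4},
    {"DecisionM": 0, "DecisionF": 0, "AttractiveM": 3, "AttractiveF": 5},
    {"DecisionM": 0, "DecisionF": 1, "AttractiveM": 5, "AttractiveF": 8},
]

# One quantitative variable: the average rating.
mean_attractive = mean(row["AttractiveM"] for row in dates)

# One categorical variable: the proportion of "yes" decisions.
prop_yes = sum(row["DecisionM"] == 1 for row in dates) / len(dates)

# One quantitative and one categorical variable: average rating within each decision group.
mean_by_decision = {
    decision: mean(row["AttractiveM"] for row in dates if row["DecisionM"] == decision)
    for decision in (1, 0)
}

# Two categorical variables: counts for each (male decision, female decision) pair.
two_way = Counter((row["DecisionM"], row["DecisionF"]) for row in dates)

print("mean AttractiveM:", mean_attractive)
print("proportion DecisionM = 1:", prop_yes)
print("mean AttractiveM by DecisionM:", mean_by_decision)
print("two-way counts:", dict(two_way))
```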
PART TWO: CLASS SURVEY
You will all take a survey, creating data we can analyze throughout the semester. The survey will be made up of questions that you find interesting! As a group, (1) think of either a variable you would be interested in, or a relationship between two variables, and (2) write the survey questions to collect data on these variables. The data will be available to the class, so please avoid sensitive questions.
1) What do you want to know about Duke students taking STA 101 this semester (that can be assessed with one or two variables)?
2) What is/are the relevant variable(s)?
3) Write up to two questions to appear on the survey to collect data on these variables. The questions you write will appear verbatim on the survey, so make sure to be clear! For categorical variables, list the possible categories (maximum of five). For quantitative variables, make sure to specify units.
- QUESTION 1:
- QUESTION 2 (optional):
PART THREE: GROUPING SURVEY
You will be placed into lab groups for the rest of the semester. To aid in creating groups, please go to Sakai and take the short grouping survey (under tests and quizzes).