Statistics Yue Oct. 5, 2010
Assignment #1, Due 2010/10/19
1. Taipei City collected data from a sample of 910,182 households (Report on Family Income and Expenditure Survey Taipei: 2003). Fifty-five percent of the households indicated an annual income of NT$1,000,000 or more. Suppose 50% of the heads of households have a VISA credit card.
(a) What is the population of interest in this study?
(b) Is annual income a qualitative or quantitative variable?
(c) Is ownership of a VISA credit card a qualitative or quantitative variable?
(d) Does this study involve cross-sectional or time series data?
(e) Describe any statistical inferences Taipei City might make on the basis of the survey.
2. Leverock’s Waterfront Steakhouse in Madeira Beach, Florida, uses a questionnaire to ask customers how they rate the server, food quality, cocktails, prices, and atmosphere at the restaurant. Each characteristic is rated on a scale of outstanding (O), very good (V), good (G), average (A), and poor (P). Use descriptive statistics to summarize the following data collected on food quality. What is your feeling about the food quality ratings at the restaurant?
G / O / V / G / A / O / V / O / V / G / O / V / A
V / O / P / V / O / G / A / O / O / O / G / O / V
V / A / G / O / V / P / V / O / O / G / O / O / V
O / G / A / O / V / O / O / G / V / A / G
Summarize the data by constructing the following:
(a) A frequency distribution
(b) A percent frequency distribution
(c) A bar graph
(d) A pie chart
(e) What do the summaries tell you about the food quality ratings at the restaurant?
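The summaries in (a) through (d) can be sketched in R roughly as follows. This is a minimal illustration, not a complete answer: it uses only the first ten ratings from the data above, and the variable names (`ratings`, `freq`, `pct`) are arbitrary choices.

```r
# First ten ratings from the data above, used only as a small illustration
ratings <- c("G", "O", "V", "G", "A", "O", "V", "O", "V", "G")

# Order the categories from best to worst; "P" stays a level even if unobserved
ratings <- factor(ratings, levels = c("O", "V", "G", "A", "P"))

freq <- table(ratings)            # (a) frequency distribution
pct  <- 100 * prop.table(freq)    # (b) percent frequency distribution
print(freq)
print(round(pct, 1))

# (c) bar graph and (d) pie chart; the pie drops categories with zero count
barplot(freq, xlab = "Food quality rating", ylab = "Frequency")
pie(freq[freq > 0], main = "Food quality ratings")
```

Declaring the levels explicitly keeps the categories in their natural order instead of the alphabetical order R would otherwise use.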
3. Use the random number generator in R to generate observations. Generate 100 observations each from U(0,10), N(5,16), and Exp(5).
(a) Calculate the sample averages and standard deviations of the three data sets. Then draw histograms and boxplots to compare them.
(b) Use these data to check the validity of Chebyshev’s theorem.
(c) Check if the data satisfy the empirical rule and give your interpretation.
(d) Measure the skewness of these three data sets.
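One possible way to set up the simulation and the Chebyshev check is sketched below. Two parameterizations are assumptions, not something the problem states: N(5,16) is read as mean 5 and variance 16 (so `sd = 4`), and Exp(5) is read as rate 5 (use `rate = 1/5` if 5 is meant to be the mean). The seed and the helper `within_k` are arbitrary illustrative choices.

```r
set.seed(2010)  # arbitrary seed so the sketch is reproducible
n <- 100

samples <- list(
  uniform     = runif(n, min = 0, max = 10),
  normal      = rnorm(n, mean = 5, sd = 4),  # assumes 16 is the variance
  exponential = rexp(n, rate = 5)            # assumes 5 is the rate
)

# Proportion of observations within k sample standard deviations of the sample mean
within_k <- function(x, k) mean(abs(x - mean(x)) < k * sd(x))

for (name in names(samples)) {
  x <- samples[[name]]
  cat(sprintf("%-12s mean = %6.3f  sd = %6.3f\n", name, mean(x), sd(x)))
  for (k in 2:3) {
    # Chebyshev: at least 1 - 1/k^2 of the data lie within k standard deviations
    cat(sprintf("  k = %d: observed %.2f, Chebyshev bound %.2f\n",
                k, within_k(x, k), 1 - 1 / k^2))
  }
}

# Side-by-side boxplots of the three samples
boxplot(samples, main = "Simulated samples")
```

The same `within_k` helper with k = 1, 2, 3 can be compared against 68%, 95%, and 99.7% for the empirical rule in (c).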
4. Do larger companies generate more revenue? The following data show the number of employees and annual revenue for a sample of 20 Fortune 1000 companies (Fortune, April 17, 2000).
Company / Employees / Revenue ($ millions) / Company / Employees / Revenue ($ millions)
Sprint / 77,600 / 19,930 / American Financial / 9,400 / 3,334
Chase Manhattan / 74,801 / 33,710 / Fluor / 53,561 / 12,417
Computer Sciences / 50,000 / 7,660 / Phillips Petroleum / 15,900 / 13,852
Wells Fargo / 89,355 / 21,795 / Cardinal Health / 36,000 / 25,034
Sunbeam / 12,200 / 2,398 / Borders Group / 23,500 / 2,999
CBS / 29,000 / 7,510 / MCI Worldcom / 77,000 / 37,120
Time Warner / 69,722 / 27,333 / Consolidated Edison / 14,269 / 7,491
Steelcase / 16,200 / 2,743 / IBP / 45,000 / 14,075
Georgia-Pacific / 57,000 / 17,796 / Super Value / 50,000 / 17,421
[company name missing in source] / 1,275 / 4,673 / H&R Block / 4,200 / 1,669
(a) Provide a crosstabulation of Employees and Revenue.
(b) Prepare a frequency distribution and an ogive plot for the variable Employees.
(c) Prepare a stem-and-leaf plot for the variable Revenue.
(d) Prepare a scatter diagram to show the relationship between the variables Revenue and Employees.
(e) Comment on any relationship between the variables.
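A rough R sketch of the displays requested in (a) through (d) follows. The two vectors are typed in from the table above (left-hand columns first, then right-hand columns). The class widths of 20,000 employees and 10,000 ($ millions) are arbitrary assumptions, and other choices are equally defensible.

```r
employees <- c(77600, 74801, 50000, 89355, 12200, 29000, 69722, 16200, 57000, 1275,
               9400, 53561, 15900, 36000, 23500, 77000, 14269, 45000, 50000, 4200)
revenue   <- c(19930, 33710, 7660, 21795, 2398, 7510, 27333, 2743, 17796, 4673,
               3334, 12417, 13852, 25034, 2999, 37120, 7491, 14075, 17421, 1669)

# Class limits; the widths are an arbitrary choice for this sketch
emp_breaks <- seq(0, 100000, by = 20000)
rev_breaks <- seq(0, 40000, by = 10000)
emp_class  <- cut(employees, breaks = emp_breaks, dig.lab = 6)
rev_class  <- cut(revenue, breaks = rev_breaks, dig.lab = 6)

# (a) crosstabulation of the two grouped variables
print(table(Employees = emp_class, Revenue = rev_class))

# (b) frequency distribution and ogive (cumulative frequency at each upper limit)
freq     <- table(emp_class)
cum_freq <- cumsum(freq)
print(freq)
plot(emp_breaks, c(0, cum_freq), type = "b",
     xlab = "Employees", ylab = "Cumulative frequency")

# (c) stem-and-leaf plot and (d) scatter diagram
stem(revenue)
plot(employees, revenue, xlab = "Employees", ylab = "Revenue ($ millions)")
```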
5. Many websites provide data for practicing statistical analysis; the website of the TVBS Poll Center (www.tvbs.com.tw) is one of them. Choose one survey and answer the following questions.
(a) First, use Excel to save the data, and then use R to read the data in both “txt” and “csv” formats. Write down the whole process.
(b) Comment on the types of data collected from the survey.
(c) Choose at least two variables and apply the techniques you have learned in class. Then comment on what you find.
(d) Compare your analysis results with the report from TVBS, and check whether there are any differences.
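For the file-reading step in (a), a minimal sketch might look like the following. Everything here is a placeholder: the data frame `demo` holds made-up values standing in for a survey saved from Excel, and the file names `survey.txt` and `survey.csv` are hypothetical. The sketch assumes the text file was saved as tab-delimited text.

```r
# Placeholder data standing in for a survey exported from Excel (values are made up)
demo <- data.frame(id = 1:3, answer = c("yes", "no", "yes"))

# Hypothetical file names, written to a temporary folder so the sketch is self-contained
txt_file <- file.path(tempdir(), "survey.txt")
csv_file <- file.path(tempdir(), "survey.csv")
write.table(demo, txt_file, sep = "\t", row.names = FALSE, quote = FALSE)
write.csv(demo, csv_file, row.names = FALSE)

# Read the tab-delimited text file and the comma-separated file back into R
from_txt <- read.table(txt_file, header = TRUE, sep = "\t")
from_csv <- read.csv(csv_file)

str(from_txt)
str(from_csv)
```

With real files, only the two read calls are needed, pointing at wherever Excel saved the exports.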
6. A patient takes a lab test for a certain cancer, and the result comes back either positive or negative. The test returns a correct positive result in 99% of the cases in which the cancer is actually present, and a correct negative result in 98% of the cases in which it is not present. Furthermore, 0.1% (a proportion of 0.001) of all people have this cancer.
(a) If a person tests positive, what is the probability that this person has the cancer?
(b) If a person takes two independent tests and both return positive results, what is the probability that this person has the cancer?
(c) If a person takes two independent tests and only one returns a positive result, what is the probability that this person has the cancer?
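As a starting point, the setup for part (a) can be written with Bayes' theorem. The notation is an assumption introduced here: $C$ denotes the event that the person has the cancer, $C^{c}$ its complement, and $+$ a positive test result. For (b) and (c), one common reading of "independent tests" is that the two results are independent given the person's true status, which is also an assumption.

```latex
\[
P(C \mid +) \;=\;
\frac{P(+ \mid C)\,P(C)}
     {P(+ \mid C)\,P(C) + P(+ \mid C^{c})\,P(C^{c})},
\]
\[
P(+ \mid C) = 0.99, \qquad
P(+ \mid C^{c}) = 1 - 0.98 = 0.02, \qquad
P(C) = 0.001, \qquad
P(C^{c}) = 0.999 .
\]
% Under conditional independence given the true status, two positive results have
% likelihoods P(+ | C)^2 and P(+ | C^c)^2, which replace the single-test terms above.
```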
Note: This assignment is team-based, and each team may have one, two, or three members. Hand in a hard copy of the homework and write the names of all team members on it.