Chapter 1: Statistics, Data, and Statistical Thinking


1.2 Descriptive statistics uses numerical and graphical methods to look for patterns in a set of data, to summarize the data, and to present the information they contain. Inferential statistics uses sample data to make estimates, decisions, predictions, or other generalizations about a larger set of data.

1.4 The first element of inferential statistics is the population of interest. The population is a set of existing units. The second element is one or more variables that are to be investigated. A variable is a characteristic or property of an individual population unit. The third element is the sample. A sample is a subset of the units of a population. The fourth element is the inference about the population based on information contained in the sample. A statistical inference is an estimate, prediction, or generalization about a population based on information contained in a sample. The fifth and final element of inferential statistics is the measure of reliability for the inference. The reliability of an inference is how confident one is that the inference is correct.

1.6 Quantitative data are measurements that are recorded on a meaningful numerical scale. Qualitative data are measurements that are not numerical in nature; they can only be classified into one of a group of categories.

1.8 A population is a set of existing units such as people, objects, transactions, or events. A sample is a subset of the units of a population.
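The relationship between a population and a sample can be sketched in a few lines of Python. This is only an illustration: the population of 1,000 transaction IDs, the sample size of 50, and the seed are all made-up values, and simple random sampling is just one way a sample might be drawn.

```python
import random

# Hypothetical population: 1,000 transaction IDs (illustrative values only).
population = list(range(1, 1001))

# Fixed seed so the draw is repeatable; the seed value itself is arbitrary.
rng = random.Random(42)

# Simple random sample of 50 units, drawn without replacement.
sample = rng.sample(population, k=50)

# Every sampled unit is also a population unit, so the sample is a subset.
is_subset = set(sample) <= set(population)
print(len(population), len(sample), is_subset)
```

The subset check mirrors the definition above: each unit in the sample is also a unit of the population.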

1.10 An inference without a measure of reliability is nothing more than a guess. A measure of reliability separates statistical inference from fortune telling or guessing. Reliability gives a measure of how confident one is that the inference is correct.

1.12 Statistical thinking involves applying rational thought processes to critically assess data and inferences made from the data. It involves not taking all data and inferences presented at face value, but rather making sure the inferences and data are valid.

1.14 a. The two variables measured are ‘type of credit card used’ and ‘amount of purchase.’

‘Type of credit card used’ is qualitative. It has no meaningful number associated with it, only the name of the card used. ‘Amount of purchase’ is quantitative. It has a meaningful number associated with it.

b. Study 1 states that all purchases were tracked. Thus, the data represent a population.

1.16 a. High school GPA is a number, usually between 0.0 and 4.0. Therefore, it is quantitative.

b. Honors/awards would have responses that name things. Therefore, it is qualitative.

c. SAT scores are numbers between 200 and 800. Therefore, they are quantitative.

d. Gender is either male or female. Therefore, it is qualitative.

e. Parent's income is a number: $25,000, $45,000, etc. Therefore, it is quantitative.

f. Age is a number: 17, 18, etc. Therefore, it is quantitative.
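The reasoning used in parts a through f can be sketched as a small Python check. The function name `variable_type` and the applicant record are hypothetical, and the rule is only a rough proxy: it assumes that numbers are stored as numbers and names as text, so it would misclassify numeric codes (such as ZIP codes) that carry no meaningful scale.

```python
from numbers import Number


def variable_type(value):
    """Return 'quantitative' for numeric measurements, else 'qualitative'."""
    # bool is a subclass of int in Python, so treat True/False as categories.
    if isinstance(value, Number) and not isinstance(value, bool):
        return "quantitative"
    return "qualitative"


# Hypothetical applicant record; every value here is made up for illustration.
applicant = {
    "high_school_gpa": 3.4,
    "honors_awards": "National Merit Scholar",
    "sat_score": 640,
    "gender": "female",
    "parent_income": 45000,
    "age": 17,
}

types = {name: variable_type(value) for name, value in applicant.items()}
for name, kind in types.items():
    print(f"{name}: {kind}")
```

Under these assumptions, the classifications agree with the answers given above for each of the six variables.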

1.18 a. 1. The variable of interest is the status of a company’s e-commerce strategy. Since a company either has an e-commerce strategy or not, the variable is qualitative.

2. The variable of interest is when the company will implement an e-commerce plan. Since the time of implementation will be a date, this variable will be qualitative.

3. The variable of interest is whether the company is delivering products over the internet or not. Since the company is either delivering products or not, the variable is qualitative.

4. The variable of interest is the company’s total revenue in the last fiscal year. Since this is a meaningful number, this variable is quantitative.

b. Since there are many more than 154 companies in the U.S., this represents a sample rather than a population.

1.20 a. The population of interest is the collection of computer security personnel at all U.S. corporations and government agencies.

b. Surveys were sent to computer security personnel at all U.S. corporations and government agencies. However, in 2001, only 538 organizations responded to the survey. There could be nonresponse bias. Often, only those subjects with strong opinions will respond to a survey. Thus, the responses may not reflect what the population as a whole thinks.

c. The variable measured in the survey is whether or not there was unauthorized use of computer systems at the firms during the year. Since the responses will be either ‘Yes’ or ‘No’, the variable is qualitative.

d. If we assume that the responses were a random sample from the population, we could infer that about 64% of all computer security personnel will admit to unauthorized use of computer systems at their firms during the year.
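One way to attach a measure of reliability to the 64% estimate is an approximate confidence interval for a proportion. The sketch below is not part of the original solution. It assumes that the 538 responses form a simple random sample, that all 538 answered the question, that a 95% confidence level is wanted, and that the large-sample normal approximation applies. The function name `proportion_interval` is hypothetical.

```python
import math


def proportion_interval(p_hat, n, z=1.96):
    """Approximate confidence interval for a population proportion.

    Uses the large-sample normal approximation; z=1.96 corresponds to
    roughly 95% confidence (an assumed level, not given in the exercise).
    """
    std_err = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * std_err
    return p_hat - margin, p_hat + margin


# Sample proportion and number of responses taken from the exercise.
low, high = proportion_interval(0.64, 538)
print(f"approximate 95% interval: {low:.3f} to {high:.3f}")
```

Under these assumptions the interval runs from roughly 0.60 to 0.68. It reflects sampling variability only and does not account for the nonresponse bias discussed in part b.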

1.22 a. The data collection method used is a designed experiment.

b. The experimental units in the study are the 50,000 smokers.

c. The variable of interest is the age at which the scanning method first detects a tumor. Since this is a meaningful number, this variable is quantitative.

d. The population of interest is the set of all smokers in the U.S. The sample of interest is the set of 50,000 smokers surveyed.

e. The researchers want to compare the age at first detection for the 2 methods to see if one is more sensitive than the other.

1.24 a. The variable of interest to the researchers is the rating of highway bridges.

b. Since the rating of a bridge can be categorized as one of three possible values, it is qualitative.

c. The data set analyzed is a population since all highway bridges in the U.S. were categorized.

d. The data were collected observationally. Each bridge was observed in its natural setting.

1.26 a. The population of interest is the set of all New York accounting firms employing two or more professionals. There are two variables of interest: whether or not the firm uses audit sampling methods and, if so, whether or not it uses random sampling. The sample is the set of 163 firms whose responses were usable. The inference of interest to the New York Society of CPAs is the proportion of all New York accounting firms employing two or more professionals that use sampling methods in auditing their clients.

b. The four responses that were unusable could have been returned blank or could have been filled out incorrectly.

c. Any time a survey is mailed, it is questionable whether the returned questionnaires represent a random sample. Often, only those with very strong opinions return the surveys. In such a case, the returned surveys would not be representative of the entire population.

1.28 a. The population of interest is the set of all large investors in the United States.

b. The variable of interest is the contact (or noncontact) large investors had with a corporate director.

c. The sample is the group of 240 large investors who were questioned by the Wirthlin Group.

d. Based on the results of the survey, the Wirthlin Group can infer that approximately 40% of all large investors in the United States had contacted a corporate director.

1.30 a. The process being studied is the process of filling beverage cans with soft drink at CCSB's Wakefield plant.

b. The variable of interest is the amount of carbon dioxide added to each can of beverage.

c. The sampling plan was to monitor five filled cans every 15 minutes. The sample is the set of all cans selected.

d. The company's immediate interest is learning about the process of filling beverage cans with soft drink at CCSB's Wakefield plant. To do this, they are measuring the amount of carbon dioxide added to a can of beverage to make an inference about the process of filling beverage cans. In particular, they might use the mean amount of carbon dioxide added to the sampled cans of beverage to estimate the mean amount of carbon dioxide added to all the cans on the process line.

e. The technician would then be dealing with a population. The cans of beverage have already been processed, and the technician is now interested in the outputs.
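The estimate described in part d, using the mean of the sampled cans to estimate the process mean, can be sketched as follows. The carbon dioxide readings are invented for illustration (three batches of five cans, in arbitrary units), and the standard error shown assumes the readings behave like independent observations, which consecutive cans from one filling line may not.

```python
import math
import statistics

# Hypothetical CO2 readings: five cans taken at each of three 15-minute
# checks. The numbers are made up and the units are arbitrary.
batches = [
    [3.9, 4.1, 4.0, 4.2, 3.8],
    [4.0, 4.1, 3.9, 4.0, 4.0],
    [4.2, 3.9, 4.1, 4.0, 3.8],
]

# The sample is the set of all cans selected across the checks.
readings = [value for batch in batches for value in batch]

# Point estimate of the process mean.
sample_mean = statistics.mean(readings)

# Rough measure of reliability: estimated standard error of the mean,
# valid only under the independence assumption stated above.
std_err = statistics.stdev(readings) / math.sqrt(len(readings))

print(f"n = {len(readings)}, mean = {sample_mean:.2f}, std. error = {std_err:.3f}")
```

The sample mean serves as the estimate of the mean amount of carbon dioxide for all cans on the line, and the standard error gives a first indication of how far that estimate might be from the process mean.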
