Ryerson Polytechnic University
QMS400: Introduction to Administrative Statistics
Solutions for exam pack
Question 1:
A sample of 474 employees in department A at D.I.S. Ltd. was drawn and their levels of education (measured in number of years) were recorded. The results are presented in Table 1 (X represents a missing number).
Table 1: ANOVA Educational Level (years)
Source / Sum of Squares / df / Mean Square / F / Sig.
Between Groups / 1622.989 / 2 / 811.4945 / 165.20 / .000
Within Groups / 2313.477 / 471 / 4.912
Total / 3936.466 / 473
(a) How many categories of employees are being compared in this test?
3 categories (the between-groups df is k - 1 = 2, so k = 3)
(b) Could you conclude at the 1% significance level whether the various categories of employees at D.I.S. Ltd. have the same mean level of education? Justify your answer numerically.
F = 165.2 > F.01,2,471 ≈ 4.65 (equivalently, Sig. = .000 < .01)
Reject the null hypothesis: the mean education level is not the same for all categories of employees.
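The F statistic and the 1% critical value can be checked with a few lines of Python. This is a sketch, not the course's prescribed method: it assumes n = 474 and k = 3 (so the within-groups df is n - k = 471) and uses the closed-form tail P(F > f) = (1 + 2f/df2)^(-df2/2), which holds only when the numerator df is 2. The helper `f_crit_df1_2` is a hypothetical name.

```python
# Sums of squares from Table 1 (one-way ANOVA on education level, in years)
ss_between, ss_within = 1622.989, 2313.477
n, k = 474, 3  # employees sampled, categories compared

df_between = k - 1  # 2
df_within = n - k   # 471
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within


def f_crit_df1_2(alpha: float, df2: int) -> float:
    """Upper-tail critical value of F(2, df2).

    Valid only for a numerator df of 2, where P(F > f) = (1 + 2f/df2) ** (-df2 / 2).
    """
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)


crit = f_crit_df1_2(0.01, df_within)
print(f"MS within = {ms_within:.3f}, F = {f_stat:.2f}, F.01,2,{df_within} = {crit:.2f}")
print("Reject H0" if f_stat > crit else "Do not reject H0")
```

For a numerator df other than 2 there is no such simple closed form, and an F table (or a statistics library) would be needed instead.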
Question 2:
Sammy is a regular participant in a weekly poker game. After playing for a year, Sammy has observed that he lost 58% of the games.
(a) If the first 52 games can be considered a random sample of all games, is there enough evidence to conclude that Sammy has less than a 50% probability of winning any game (use a 5% significance level)?
(b) Calculate the p-value of the previous test using the standard normal distribution table.
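Z-test 1 prop
$H_0: p = 0.5$ versus $H_A: p < 0.5$, where $p$ is Sammy's probability of winning a game
$\hat{p} = 1 - 0.58 = 0.42$, $z = (0.42 - 0.5)/\sqrt{0.5(1 - 0.5)/52} \approx -1.15$, p-value $= P(Z < -1.15) \approx .125$
Do not reject Ho at the 5% significance level (p-value > .05): there is not enough evidence to conclude that Sammy wins less than 50% of his games.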
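The same one-proportion z-test can be sketched with Python's standard library (`statistics.NormalDist`, available since Python 3.8). It assumes the 52 games are a random sample, takes the observed winning proportion as 1 - 0.58 = 0.42, and computes the standard error under Ho.

```python
import math
from statistics import NormalDist

n = 52            # games treated as a random sample
p_hat = 1 - 0.58  # observed proportion of games won (he lost 58%)
p0 = 0.50         # value of p under H0

# Test statistic uses the standard error computed under H0
se = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se

# Left-tailed alternative HA: p < 0.5, so the p-value is the lower-tail area
p_value = NormalDist().cdf(z)
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
print("Reject H0" if p_value < 0.05 else "Do not reject H0")
```

The exact normal area differs slightly from a table lookup at z = -1.15 only because the table rounds z to two decimals.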
Question 3:
A common practice in track and field is to observe the variability of performance for different categories of athletes. You observe twenty 100-meter races run by first-year university students. The following statistics were recorded:
- Team A (average= 10.46 seconds, standard-dev=.79 seconds, n=10 races)
- Team B (average= 10.71 seconds, standard-dev=1.29 seconds, n=10 races)
Team A has a more extensive training program than Team B.
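(a) Could we conclude at the 5% level that extensive training increases performance (i.e., lowers race times)?
$H_0: \mu_A = \mu_B$ versus $H_A: \mu_A < \mu_B$ (a lower mean time means better performance)
Ttest2samp (pooled-variance two-sample t-test, df = 10 + 10 - 2 = 18)
t = -.52, p-value = .30 > .05: do not reject Ho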
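A sketch of the pooled two-sample t-test computed from the summary statistics. It assumes equal population variances (pooled test, df = 18). The standard library has no t distribution, so the one-tailed p-value is approximated by integrating the t density numerically with Simpson's rule; `t_pdf` and `t_lower_tail` are hypothetical helpers written for this illustration.

```python
import math

# Summary statistics (race times in seconds)
mean_a, sd_a, n_a = 10.46, 0.79, 10
mean_b, sd_b, n_b = 10.71, 1.29, 10

# Pooled-variance two-sample t statistic
df = n_a + n_b - 2
sp2 = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / df
t = (mean_a - mean_b) / math.sqrt(sp2 * (1 / n_a + 1 / n_b))


def t_pdf(x: float, nu: int) -> float:
    """Density of Student's t distribution with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)


def t_lower_tail(x: float, nu: int, steps: int = 2000) -> float:
    """P(T < x) for x <= 0: 0.5 minus Simpson's-rule integral of the density on [x, 0]."""
    h = -x / steps
    s = t_pdf(x, nu) + t_pdf(0.0, nu)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(x + i * h, nu)
    return 0.5 - s * h / 3


p_value = t_lower_tail(t, df)  # HA: mu_A < mu_B is left-tailed
print(f"t = {t:.2f}, df = {df}, p-value = {p_value:.2f}")
```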
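(b) Could we conclude at the 1% level that extensive training reduces performance variability?
$H_0: \sigma_A^2/\sigma_B^2 = 1$ versus $H_A: \sigma_A^2/\sigma_B^2 < 1$
$F = s_A^2/s_B^2 = .79^2/1.29^2 = .3750 > F_L(.01,9,9) = 1/F_U(.01,9,9) = 1/5.35 = .187$
Do not reject Ho: at the 1% level, there is not enough evidence that extensive training reduces performance variability.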
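The variance-ratio test reduces to simple arithmetic once the table value is known. In this sketch the upper critical value F(.01, 9, 9) = 5.35 is taken from an F table rather than computed, and the lower critical value is its reciprocal.

```python
# Summary statistics from Question 3 (standard deviations in seconds)
sd_a, n_a = 0.79, 10
sd_b, n_b = 1.29, 10
df_num, df_den = n_a - 1, n_b - 1  # 9 and 9

f_ratio = sd_a**2 / sd_b**2   # test statistic s_A^2 / s_B^2
f_upper_table = 5.35          # F(.01, 9, 9) read from an F table
f_lower = 1 / f_upper_table   # lower critical value F_L(.01, 9, 9)

print(f"F = {f_ratio:.4f} with ({df_num}, {df_den}) df, lower critical value = {f_lower:.3f}")
print("Reject H0" if f_ratio < f_lower else "Do not reject H0")
```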
Question 4:
Based on a recent survey of 1000 decided voters, it has been estimated that the Liberals will receive 51% of the votes in the next federal election, plus or minus 2.6%.
(a) Determine the significance level of this confidence interval estimate.
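$z_{\alpha/2} = B/\sqrt{\hat{p}(1-\hat{p})/n} = .026/\sqrt{.51(.49)/1000} \approx 1.645 \Rightarrow \alpha = 10\%$ (a 90% confidence interval)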
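Working backwards from the margin of error to the confidence level can be sketched as follows, assuming the usual normal-approximation interval p̂ ± z·sqrt(p̂(1 - p̂)/n).

```python
import math
from statistics import NormalDist

p_hat, n, bound = 0.51, 1000, 0.026  # estimate, sample size, margin of error B

# Solve B = z * sqrt(p_hat * (1 - p_hat) / n) for z
z = bound / math.sqrt(p_hat * (1 - p_hat) / n)

# Two-sided interval: alpha is the total area outside [-z, z]
alpha = 2 * (1 - NormalDist().cdf(z))
print(f"z = {z:.3f}, alpha = {alpha:.3f}")
```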
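(b) What is the required sample size to achieve 0.5% accuracy in the confidence interval (assume $\alpha = 5\%$)?
$n = \hat{p}(1-\hat{p})\,(z_{\alpha/2}/B)^2$
$n = .51(1-.51)(1.96/.005)^2 \approx 38{,}400.6$
n = 38,401 voters (sample sizes are always rounded up)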
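The sample-size formula as a short Python sketch, assuming the planning value p̂ = .51 from the survey and the table value z = 1.96; `math.ceil` implements the round-up rule.

```python
import math

p_hat = 0.51   # planning value for the proportion
z = 1.96       # z_{alpha/2} for alpha = 5% (normal table)
bound = 0.005  # desired margin of error B

n_exact = p_hat * (1 - p_hat) * (z / bound) ** 2
n_required = math.ceil(n_exact)  # sample sizes are always rounded up
print(f"n = {n_exact:.1f} -> {n_required} voters")
```

Using the more precise z = 1.95996 instead of the table value lowers the exact figure slightly, so the last digit of the answer depends on how z is rounded.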
Question 5
Consider the following joint probability distribution:
Years studying math / GMAT score above 600 / GMAT score below 600 / Marginal probabilities
More than 5 years studying math / .5 / p = 1 - .5 - .2 - .1 = .2 / .7
Less than 5 years studying math / .2 / .1 / .3
Marginal probabilities / .7 / .3 / 1
X= number of years studying mathematics (more or less than 5 years)
Y= score obtained in the GMAT (more than 600 or less than 600)
(a) Sum(joint probabilities) = 1 => p = 1 - .5 - .2 - .1 = .2
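(b) We find that these two variables are almost independent: P(math>5 years) × P(score>600) = .7 × .7 = .49, very close to the joint probability P(score>600 and math>5 years) = .5. This result is somewhat surprising given that the GMAT is highly mathematical. One possible explanation is that the GMAT also contains logical questions.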
(c) P(score>600 | math>5 years) = P(score>600 and math>5 years)/P(math>5 years)
= .5/.7 = 71.43%
P(score>600 | math<5 years) = P(score>600 and math<5 years)/P(math<5 years)
= .2/.3 = 66.67%
This conditional probability is higher for someone who studied math for more than 5 years.
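A small sketch that stores the joint table as a dictionary and derives the marginals, the independence check and the conditional probabilities from it. The cell labels (`"math>5"`, `"gmat>600"`, and so on) and the `marginal` helper are hypothetical names chosen for this illustration.

```python
# Joint probabilities: first label = years studying math, second = GMAT score
joint = {
    ("math>5", "gmat>600"): 0.5, ("math>5", "gmat<600"): 0.2,
    ("math<5", "gmat>600"): 0.2, ("math<5", "gmat<600"): 0.1,
}


def marginal(label: str) -> float:
    """Sum the joint probabilities of every cell whose row or column matches label."""
    return sum(p for cell, p in joint.items() if label in cell)


# (b) Independence check: compare P(X and Y) with P(X) * P(Y)
product = marginal("math>5") * marginal("gmat>600")
print(f"joint = {joint[('math>5', 'gmat>600')]}, product of marginals = {product:.2f}")

# (c) Conditional probabilities P(score > 600 | years of math)
p_high_given_more = joint[("math>5", "gmat>600")] / marginal("math>5")
p_high_given_less = joint[("math<5", "gmat>600")] / marginal("math<5")
print(f"{p_high_given_more:.2%} vs {p_high_given_less:.2%}")
```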
(d) The relation that needs to be satisfied is that the sum of all joint probabilities equals 1. With q = P(score>600 and math>5 years) and p = P(score<600 and math>5 years), this means that p + q + .2 + .1 = 1 => p + q = .7
(e) This conditional probability is equal to
P(GMAT>600 | Math>5 years) = P(GMAT>600 and Math>5 years)/P(Math>5 years)
= q/.7 = (.7 - p)/.7
This quantity is maximized for p = 0 and q = .7, where it equals 1.
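Because (.7 - p)/.7 decreases as p grows, its maximum sits at the boundary p = 0. A brute-force grid check, assuming p ranges over [0, .7] in steps of .01, illustrates this; `cond_prob` is a hypothetical helper.

```python
# Top row of the table: q = P(GMAT > 600 and math > 5), p = P(GMAT < 600 and math > 5)
# The row total is fixed at .7, so q = .7 - p.
row_total = 0.7


def cond_prob(p: float) -> float:
    """P(GMAT > 600 | math > 5 years) as a function of p."""
    return (row_total - p) / row_total


grid = [i / 100 for i in range(71)]  # p = 0.00, 0.01, ..., 0.70
best_p = max(grid, key=cond_prob)
print(best_p, cond_prob(best_p))
```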
Question 6
Class of revenue (in thousands) / Midpoint (mi) / Frequency (xi) / Relative frequency (fi) / Cumulative frequency (Xi) / Cumulative relative frequency (Fi)
0-20 / 10 / 20 / 10% / 20 / 10%
20-40 / 30 / 25 / 12.5% / 45 / 22.5%
40-60 / 50 / 85 / 42.5% / 130 / 65%
60-80 / 70 / 50 / 25% / 180 / 90%
80-100 / 90 / 20 / 10% / 200 / 100%
(a) Maximum revenue = $100,000
(b) Mean = Σ fi·mi = .10(10) + .125(30) + .425(50) + .25(70) + .10(90) = 52.5 (thousand)
Median = 40 + 20·(50% - 22.5%)/(65% - 22.5%) ≈ 52.94 (thousand)
The distribution is roughly symmetrical (the mean and median are close). However, it is not very representative of the Canadian population, whose income distribution is positively skewed, with lower average and median incomes. The sample size is also too small, and this frequency table truncates revenues higher than $100,000.
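The grouped mean and the ogive-based median can be recomputed from the class bounds and frequencies of the revenue table. This sketch assumes values are spread uniformly within each class, which is what linear interpolation on the ogive implies; `grouped_quantile` is a hypothetical helper.

```python
# Revenue classes (thousands of dollars) and their frequencies
bounds = [0, 20, 40, 60, 80, 100]
freq = [20, 25, 85, 50, 20]
n = sum(freq)

# Grouped mean: weight each class midpoint by its frequency
midpoints = [(lo + hi) / 2 for lo, hi in zip(bounds, bounds[1:])]
mean = sum(m * f for m, f in zip(midpoints, freq)) / n


def grouped_quantile(q: float) -> float:
    """Quantile by linear interpolation on the ogive (uniform spread within a class)."""
    cum = 0.0
    for lo, hi, f in zip(bounds, bounds[1:], freq):
        share = f / n
        if cum + share >= q:
            return lo + (hi - lo) * (q - cum) / share
        cum += share
    return float(bounds[-1])


median = grouped_quantile(0.5)
print(f"mean = {mean:.2f}, median = {median:.2f}")
```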
(c) Not really, as the median computed is only an approximation of the true median (which can be computed only from the raw data). Data grouping reduces the precision of the central tendency measures, but it is less computationally intensive and makes the data set easier to interpret. Inappropriate bins can lead to erroneous and misleading measures.
(d) Roughly symmetric (bell-shaped) with one peak (modal class 40-60).
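Q1 = 40 + 20·(25% - 22.5%)/(65% - 22.5%) ≈ 41.18
Q3 = 60 + 20·(75% - 65%)/(90% - 65%) = 68
IQR = Q3 - Q1 ≈ 68 - 41.18 = 26.82
Top 25%: revenues above Q3 = 68 (thousand)
Lower and upper inner fences: Q1 - 1.5·IQR ≈ 41.18 - 40.24 = 0.94 and Q3 + 1.5·IQR ≈ 68 + 40.24 = 108.24. No revenue can exceed the upper fence, since the maximum is $100,000; only revenues below about $940 would be flagged as outliers, and this cannot be checked from grouped data.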
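Quartiles, IQR and inner fences for the revenue table, under the same assumption that values are spread uniformly within each class. The hypothetical helper `grouped_quantile` is redefined so the block runs on its own.

```python
# Revenue classes (thousands of dollars) and their frequencies
bounds = [0, 20, 40, 60, 80, 100]
freq = [20, 25, 85, 50, 20]
n = sum(freq)


def grouped_quantile(q: float) -> float:
    """Quantile by linear interpolation on the ogive (uniform spread within a class)."""
    cum = 0.0
    for lo, hi, f in zip(bounds, bounds[1:], freq):
        share = f / n
        if cum + share >= q:
            return lo + (hi - lo) * (q - cum) / share
        cum += share
    return float(bounds[-1])


q1, q3 = grouped_quantile(0.25), grouped_quantile(0.75)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr  # inner fences use 1.5 * IQR
upper_fence = q3 + 1.5 * iqr
print(f"Q1 = {q1:.2f}, Q3 = {q3:.2f}, IQR = {iqr:.2f}")
print(f"inner fences: {lower_fence:.2f} and {upper_fence:.2f}")
```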
Question 7
A sample of 40 employees at D.I.S. Ltd. was drawn and their salaries were recorded. A summary ogive is presented below.
(a) Fill in the missing relative and cumulative relative frequencies in the table below (bins represent the upper bounds of the classes):
Class / Frequency / Cumulative Frequency / Relative Frequency / Cumulative Relative Frequency
15000-25000 / 2 / 2 / 5% / 5%
25000-35000 / 13 / 15 / 32.5% / 37.5%
35000-45000 / 12 / 27 / 30% / 67.5%
45000-55000 / 6 / 33 / 15% / 82.5%
55000-65000 / 3 / 36 / 7.5% / 90%
65000-75000 / 2 / 38 / 5% / 95%
75000-85000 / 2 / 40 / 5% / 100%
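The relative and cumulative columns follow mechanically from the frequencies. A short sketch that rebuilds them for the 40 salaries:

```python
# Salary classes (bounds in dollars) and frequencies
bounds = [15000, 25000, 35000, 45000, 55000, 65000, 75000, 85000]
freq = [2, 13, 12, 6, 3, 2, 2]
n = sum(freq)  # 40 employees

rows = []
cum = 0
for lo, hi, f in zip(bounds, bounds[1:], freq):
    cum += f
    # class label, frequency, cumulative frequency, relative % and cumulative relative %
    rows.append((f"{lo}-{hi}", f, cum, 100 * f / n, 100 * cum / n))

for label, f, c, rel, cum_rel in rows:
    print(f"{label} / {f} / {c} / {rel:g}% / {cum_rel:g}%")
```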
(b) Compute the mean and the median from the above table. What does this tell you about the salary distribution at D.I.S.?
Median = 35,000 + (45,000 - 35,000)·(50% - 37.5%)/(67.5% - 37.5%) ≈ 39,166.67
Mean = Σ mi·fi / n = 1,690,000/40 = 42,250
Since mean > median, the distribution is slightly positively skewed (to the right).
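A sketch of the grouped mean and interpolated median for the salary table, assuming salaries are spread uniformly within each $10,000 class (class midpoints for the mean, linear interpolation on the ogive for the median). `grouped_quantile` is a hypothetical helper.

```python
# Salary classes (bounds in dollars) and frequencies
bounds = [15000, 25000, 35000, 45000, 55000, 65000, 75000, 85000]
freq = [2, 13, 12, 6, 3, 2, 2]
n = sum(freq)

midpoints = [(lo + hi) / 2 for lo, hi in zip(bounds, bounds[1:])]
mean = sum(m * f for m, f in zip(midpoints, freq)) / n


def grouped_quantile(q: float) -> float:
    """Quantile by linear interpolation on the ogive (uniform spread within a class)."""
    cum = 0.0
    for lo, hi, f in zip(bounds, bounds[1:], freq):
        share = f / n
        if cum + share >= q:
            return lo + (hi - lo) * (q - cum) / share
        cum += share
    return float(bounds[-1])


median = grouped_quantile(0.5)
print(f"mean = {mean:,.2f}, median = {median:,.2f}")
print("positively skewed" if mean > median else "not positively skewed")
```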
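(c) Based on the ogive presented, determine the interquartile range and set up the box-and-whisker plot. Briefly re-discuss the salary distribution at D.I.S.
Q1 = 25,000 + 10,000·(25% - 5%)/(37.5% - 5%) ≈ 31,153.85
Q3 = 45,000 + 10,000·(75% - 67.5%)/(82.5% - 67.5%) = 50,000
IQR = Q3 - Q1 ≈ 18,846.15
Lower inner fence = Q1 - 1.5·IQR ≈ 2,884.62
Upper inner fence = Q3 + 1.5·IQR ≈ 78,269.23
Upper extreme value: 85,000
Lower extreme value: 15,000
The distribution is positively skewed, with a long right tail up to 85,000; salaries in the top class (75,000-85,000) may exceed the upper inner fence and could be outliers.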
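The quartiles and inner fences follow from the same linear interpolation on the ogive. This sketch assumes the ogive joins the cumulative relative frequencies at the class upper bounds with straight lines; `grouped_quantile` is a hypothetical helper.

```python
# Salary classes (bounds in dollars) and frequencies
bounds = [15000, 25000, 35000, 45000, 55000, 65000, 75000, 85000]
freq = [2, 13, 12, 6, 3, 2, 2]
n = sum(freq)


def grouped_quantile(q: float) -> float:
    """Quantile by linear interpolation on the ogive (uniform spread within a class)."""
    cum = 0.0
    for lo, hi, f in zip(bounds, bounds[1:], freq):
        share = f / n
        if cum + share >= q:
            return lo + (hi - lo) * (q - cum) / share
        cum += share
    return float(bounds[-1])


q1, q3 = grouped_quantile(0.25), grouped_quantile(0.75)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr  # inner fences use 1.5 * IQR
upper_fence = q3 + 1.5 * iqr
print(f"Q1 = {q1:,.2f}, Q3 = {q3:,.2f}, IQR = {iqr:,.2f}")
print(f"inner fences: {lower_fence:,.2f} and {upper_fence:,.2f}")
```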
(d) What salary do the best-paid 10% of employees earn?
$65,000 or more (the cumulative relative frequency reaches 90% at $65,000)
(e) If an employee is chosen at random, what is the probability that:
- He or she earns a compensation of more than $50,000
25% (100% - 75%, where 75% is the cumulative relative frequency at $50,000)
- He or she earns a compensation of less than $35,000
37.5%
- He or she earns a compensation of between $30,000 and $40,000
52.5% - 21.25% = 31.25%
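The probabilities in (e) are read off the ogive. This sketch evaluates it by linear interpolation within each class; `ogive` is a hypothetical helper that returns the cumulative relative frequency at a given salary.

```python
# Salary classes (bounds in dollars) and frequencies for the 40 employees
bounds = [15000, 25000, 35000, 45000, 55000, 65000, 75000, 85000]
freq = [2, 13, 12, 6, 3, 2, 2]
n = sum(freq)


def ogive(x: float) -> float:
    """Cumulative relative frequency at salary x, interpolating linearly within a class."""
    if x <= bounds[0]:
        return 0.0
    cum = 0.0
    for lo, hi, f in zip(bounds, bounds[1:], freq):
        if x <= hi:
            return cum + (f / n) * (x - lo) / (hi - lo)
        cum += f / n
    return 1.0


p_above_50k = 1 - ogive(50000)
p_below_35k = ogive(35000)
p_30_40k = ogive(40000) - ogive(30000)
print(f"{p_above_50k:.2%}, {p_below_35k:.2%}, {p_30_40k:.2%}")
```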