Part 1: RBI

The solution is unique, since the mean depends on every individual value in the sample: changing a player generally changes the team mean. The team with the highest mean RBI is 5, 7, 11, 13, 14, 16, 17, 18, 22, 24, with a team mean RBI of 94.2.
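To see how a single substitution moves the mean, here is a minimal sketch. It uses the RBI column of the ten players listed in the Part 10 table (not the Part 1 team, since RBI values for all 25 players are not reproduced in this write-up), and the replacement value of 60 is a made-up number used only for illustration.

```python
from statistics import mean

# RBI values copied from the Part 10 table
# (players 24, 5, 17, 22, 7, 13, 2, 11, 23, 4, in that order).
rbi = [115, 96, 102, 99, 108, 96, 73, 90, 71, 63]
team_mean = mean(rbi)

# Hypothetical swap: the 115-RBI player is replaced by an imaginary 60-RBI player.
swapped = [60] + rbi[1:]
swapped_mean = mean(swapped)

# Any change to one value shifts the mean by (change / n).
print(team_mean, swapped_mean)
```

Because every value enters the sum, the only way to keep the highest possible mean is to keep the ten highest values.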

Part 2: Strikeouts

[Figure: display of the strikeout data]

This solution is also unique, for the same reason as in Part 1. The team is 1, 4, 8, 9, 12, 15, 19, 21, 23, 25, with a team mean of 55.6 strikeouts.

Part 3: Homeruns

In this case, the solution is not unique. For example, we can replace the best or worst member of the team with anyone else on the same side of the median without affecting the median, because the median depends only on the relative order of the observations. Hence, one team with the highest team median homeruns is 24, 17, 5, 14, 13, 7, 2, 22, 18, 23, with a median of 32.
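The swap argument can be checked numerically. This is a small sketch using the homerun column of the ten players in the Part 10 table (not the Part 3 team, whose full homerun values are not listed here); the replacement values 33 and 5 are hypothetical players invented for the illustration.

```python
from statistics import median

# Homerun totals copied from the Part 10 table.
hr = [38, 34, 34, 21, 32, 32, 22, 13, 18, 12]
base_median = median(hr)

# Hypothetical swap above the median: the 38-HR player becomes a 33-HR player.
swap_high = [33 if x == 38 else x for x in hr]
# Hypothetical swap below the median: the 12-HR player becomes a 5-HR player.
swap_low = [5 if x == 12 else x for x in hr]

# All three medians agree, because the two middle values are untouched.
print(base_median, median(swap_high), median(swap_low))
```

With ten players the median is the average of the 5th and 6th ordered values, so a swap only matters if it changes one of those two.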

Part 4: Batting Average

The calculated batting averages are shown below:

Number / PLAYER / Batting Average
23 / Ian Kinsler / 0.319
12 / Ryan Theriot / 0.307
4 / Brian Giles / 0.306
7 / Aubrey Huff / 0.304
6 / Shane Victorino / 0.293
5 / Jermaine Dye / 0.292
18 / Matt Kemp / 0.290
11 / James Loney / 0.289
22 / Garrett Atkins / 0.286
8 / Ivan Rodriguez / 0.276
17 / Prince Fielder / 0.276
16 / David Murphy / 0.275
1 / Carl Crawford / 0.273
10 / Aaron Rowand / 0.271
24 / Carlos Delgado / 0.271
15 / Edgar Renteria / 0.270
25 / Jeff Keppinger / 0.266
2 / Cody Ross / 0.260
20 / Kosuke Fukudome / 0.257
14 / Pat Burrell / 0.250
9 / Pedro Feliz / 0.249
13 / Jason Giambi / 0.247
21 / Jason Kendall / 0.246
19 / Emil Brown / 0.244
3 / Jeff Francoeur / 0.239

Again, the team with the highest team median is not unique. One such team is 23, 12, 4, 7, 6, 5, 18, 11, 22, 8, with a team median of 0.292.

Part 5: Stolen Bases

These data are right skewed, with large outliers.

To find the team with the least variation, we observe that most of the data in this distribution are concentrated at the lower values. Hence we choose the 10 players with the fewest stolen bases to make up the team with the least variability in the number of bases stolen. The team is

Part 6: Probability of Base Stealing

Players 6, 18, 23, 1, 12, 20, and 8 have stolen more than 10 bases. Hence, we put these 7 players on the team; the other three players don’t matter. The probability of choosing a player with more than 10 stolen bases is then 7/10 = 0.7. For the remaining three players we choose 14, 3, and 9.

Part 7: Confidence interval

We draw ten numbers from 1 to 25 out of a hat (without replacement) to select our team. The resulting team is: 4, 7, 8, 10, 15, 19, 21, 22, 24, 25.

The sample mean batting average is 0.274.

The sample standard deviation is 0.0208.

The critical value for the confidence interval is 2.262 (the 95% value for a t distribution with 9 degrees of freedom). Hence:

0.274 +/- 2.262*0.0208/sqrt(10) = (0.259, 0.289)

Since we noted earlier that the distribution of batting averages is skewed to the right, we choose the players with the lowest batting averages to make the narrowest confidence interval. In right skewed data the lowest values are the most tightly grouped, so they have a lower standard deviation and therefore a narrower confidence interval. The team is: 15, 25, 2, 20, 14, 9, 13, 21, 19, 3

The sample mean is 0.253. The sample standard deviation is 0.010. So, the confidence interval in this case is:

0.253 +/- 2.262*0.010/sqrt(10) = (0.246, 0.260)

This interval, though, is not useful or interpretable, because the sample is not random. So we can’t interpret it in the same way as a regular confidence interval.
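Both intervals can be reproduced with a short sketch. It assumes the rounded batting averages from the Part 4 table as input and hard-codes the critical value 2.262 quoted above; the helper name `t_interval` is ours. Because the inputs are rounded to three decimals, the endpoints should land within about 0.001 of the figures in the text rather than match them digit for digit.

```python
from math import sqrt
from statistics import mean, stdev

T_CRIT = 2.262  # critical value quoted in the text (t, 9 degrees of freedom)

def t_interval(values, t_crit=T_CRIT):
    """Return (lower, upper) for mean +/- t_crit * s / sqrt(n)."""
    n = len(values)
    centre = mean(values)
    margin = t_crit * stdev(values) / sqrt(n)  # stdev is the sample (n-1) version
    return centre - margin, centre + margin

# Rounded batting averages from the Part 4 table.
# Random team: players 4, 7, 8, 10, 15, 19, 21, 22, 24, 25.
random_team = [0.306, 0.304, 0.276, 0.271, 0.270, 0.244, 0.246, 0.286, 0.271, 0.266]
# Lowest ten: players 15, 25, 2, 20, 14, 9, 13, 21, 19, 3.
lowest_team = [0.270, 0.266, 0.260, 0.257, 0.250, 0.249, 0.247, 0.246, 0.244, 0.239]

random_ci = t_interval(random_team)
lowest_ci = t_interval(lowest_team)
random_width = random_ci[1] - random_ci[0]
lowest_width = lowest_ci[1] - lowest_ci[0]

print(random_ci, lowest_ci)
print(random_width, lowest_width)
```

The hand-picked team gives roughly half the width of the random team, which is exactly why its interval should not be read as a regular confidence interval.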

Part 8: p-value

I’m assuming here that the hypotheses for this test are:

H0: p>=0.5

Ha: p<0.5

The sample proportion for the random team is 0.2.

The test statistic is: z=(0.2-0.5)/sqrt(.5*.5/10)= -1.897

The p-value for this test is P(z<-1.897) = 0.0289

Hence we would reject the null hypothesis at the 0.05 level and conclude that the proportion of players with a batting average of at least 0.300 is less than 0.5.
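The test statistic and p-value can be checked with the standard library alone. This sketch assumes the same one-sided z test as above; `normal_cdf` is a small helper of ours built on `math.erf`, not something taken from the assignment.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

p_hat = 0.2   # sample proportion for the random team
p0 = 0.5      # boundary value of the null hypothesis
n = 10        # team size

# z statistic uses the null value p0 in the standard error.
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Lower-tail p-value, matching Ha: p < 0.5.
p_value = normal_cdf(z)

print(round(z, 3), round(p_value, 4))
```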

Part 9: Correlation.

The red highlighted squares represent the chosen players. They were chosen because they seem to fall on the same line. The team that is represented by these points is 21, 25, 1, 15, 20, 3, 4, 11, 18, 22. The correlation coefficient of these points is 0.9126.

Part 10: The best team!

The best team that I can find to maximize (or minimize) the characteristics:

Number / PLAYER / AB / H / R / SB / SO / HR / RBI / Batting Average
24 / Carlos Delgado / 598 / 162 / 96 / 1 / 124 / 38 / 115 / 0.271
5 / Jermaine Dye / 590 / 172 / 96 / 3 / 104 / 34 / 96 / 0.292
17 / Prince Fielder / 588 / 162 / 86 / 3 / 134 / 34 / 102 / 0.276
22 / Garrett Atkins / 611 / 175 / 86 / 1 / 100 / 21 / 99 / 0.286
7 / Aubrey Huff / 598 / 182 / 96 / 4 / 89 / 32 / 108 / 0.304
13 / Jason Giambi / 458 / 113 / 68 / 2 / 111 / 32 / 96 / 0.247
2 / Cody Ross / 461 / 120 / 59 / 6 / 116 / 22 / 73 / 0.260
11 / James Loney / 595 / 172 / 66 / 7 / 85 / 13 / 90 / 0.289
23 / Ian Kinsler / 518 / 165 / 102 / 26 / 67 / 18 / 71 / 0.319
4 / Brian Giles / 559 / 171 / 81 / 2 / 52 / 12 / 63 / 0.306

Their team stats are:

Average RBI / 91.300
Average Strikeouts / 98.200
Median HR / 27.000
Median Batting Average / 0.288
St. Dev. of SB / 7.472
r of runs v. homeruns / 0.357
P(SB<10) / 0.900
Width of confidence interval for mean batting average / 0.031
p-value for hypothesis test / 0.1188
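Most of these team statistics can be recomputed directly from the table of ten players. The sketch below copies the columns from that table; the helper `pearson_r` and the dictionary keys are names we made up, and batting average is taken as H / AB. The confidence interval width and the p-value are left out because they depend on choices not restated in this part.

```python
from math import sqrt
from statistics import mean, median, stdev

# Columns copied from the Part 10 table, in the order the players are listed.
ab   = [598, 590, 588, 611, 598, 458, 461, 595, 518, 559]
hits = [162, 172, 162, 175, 182, 113, 120, 172, 165, 171]
runs = [96, 96, 86, 86, 96, 68, 59, 66, 102, 81]
sb   = [1, 3, 3, 1, 4, 2, 6, 7, 26, 2]
so   = [124, 104, 134, 100, 89, 111, 116, 85, 67, 52]
hr   = [38, 34, 34, 21, 32, 32, 22, 13, 18, 12]
rbi  = [115, 96, 102, 99, 108, 96, 73, 90, 71, 63]

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

batting_avg = [h / a for h, a in zip(hits, ab)]

team_stats = {
    "mean_rbi": mean(rbi),
    "mean_strikeouts": mean(so),
    "median_hr": median(hr),
    "median_batting_avg": median(batting_avg),
    "sd_sb": stdev(sb),                      # sample standard deviation
    "r_runs_hr": pearson_r(runs, hr),
    "p_sb_lt_10": sum(1 for s in sb if s < 10) / len(sb),
}

for name, value in team_stats.items():
    print(f"{name}: {value:.3f}")
```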