STAT C141, BIOE C141

Spring 2006, Midterm

SHOW YOUR WORK

NAME:

ID:

Q1
Q2
Q3
Q4
Q5
Q6
Extra
Total
Full mark: 100 (+10 extra)

1. True/False (28 points). Explain your answer briefly.

  1. (4pt) If EX=10, then E(3-2X)=17.
  1. (4pt) If Var(X)=2, then Var(4X+3)=11.
  1. (7pt) A six-sided die is rolled once. Let the random variable X denote the number on the upturned face. Define A={X6}, and B={X is odd}. If the die is fair, P(A|B)=1.
  1. (6pt) The events A and B in the previous problem are independent.
  1. (7pt) A fair coin is tossed until we observe heads exactly 50% of the time, then we stop. Let X be the total number of heads we observe before we stop. X is a Binomial random variable.
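The first two items rest on the rules E(aX + b) = aE(X) + b and Var(aX + b) = a²Var(X). The Python sketch below checks both rules with exact fractions on a hypothetical distribution (X = 8, 10, 12 with probabilities 1/4, 1/2, 1/4). It was chosen only because it has E(X) = 10 and Var(X) = 2; the exam does not specify a distribution, and the helper names `mean`, `var` and `transform` are illustrative.

```python
from fractions import Fraction

def mean(dist):
    """E(X) for a finite distribution given as {value: probability}."""
    return sum(p * x for x, p in dist.items())

def var(dist):
    """Var(X) = E[(X - E(X))^2] for a finite distribution."""
    m = mean(dist)
    return sum(p * (x - m) ** 2 for x, p in dist.items())

def transform(dist, a, b):
    """Distribution of a*X + b (a != 0, so the values stay distinct)."""
    return {a * x + b: p for x, p in dist.items()}

# Hypothetical example distribution with E(X) = 10 and Var(X) = 2.
X = {8: Fraction(1, 4), 10: Fraction(1, 2), 12: Fraction(1, 4)}

print(mean(X), var(X))             # 10 2
print(mean(transform(X, -2, 3)))   # E(3 - 2X) = 3 - 2*10 = -17
print(var(transform(X, 4, 3)))     # Var(4X + 3) = 16*2 = 32
```

Any distribution with the same mean and variance gives the same two values, since the rules depend only on E(X) and Var(X).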

2. (20 points) A new design for the braking system on a certain type of car has been proposed. For the current system, the true average braking distance at 40 mph under specified conditions is known to be 120 ft. It is proposed that the new design be implemented only if sample data strongly indicate a reduction in the true average braking distance for the new design. Suppose braking distance for the new system is normally distributed with σ = 10 ft. Let X̄ be the sample average braking distance for a random sample of 36 observations, and suppose the observed value is x̄ = 115 ft. Should the new system be implemented? State the relevant hypotheses (null and alternative), calculate the relevant test statistic and its associated p-value, and draw a conclusion based on the p-value.
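A minimal sketch of the lower-tailed z-test this problem calls for, assuming H0: μ = 120 versus Ha: μ < 120 with σ known. The function names are hypothetical, and Φ is computed from Python's `math.erf` rather than read from the attached table.

```python
import math

def normal_cdf(z):
    """Standard normal CDF Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def one_sided_z_test(xbar, mu0, sigma, n):
    """Lower-tailed z-test of H0: mu = mu0 vs Ha: mu < mu0 with known sigma.

    Returns (z, p_value), where p_value = P(Z <= z) under H0.
    """
    se = sigma / math.sqrt(n)   # standard error of the sample mean
    z = (xbar - mu0) / se
    return z, normal_cdf(z)

z, p = one_sided_z_test(xbar=115, mu0=120, sigma=10, n=36)
print(f"z = {z:.2f}, p-value = {p:.5f}")   # z = -3.00, p-value = 0.00135
```

A p-value of about 0.00135 is far below the usual 0.05 and 0.01 levels, so under these assumptions the data give strong evidence of a reduction in mean braking distance.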

3. (16 points) Tuddenham and Snyder obtained the following results for 66 California boys at ages 6 and 18 (the scatter plot is football-shaped):

Average height at 6 = 3 feet 10 inches, SD = 2 inches

Average height at 18 = 5 feet 10 inches, SD = 2 inches

Correlation between heights at 6 and 18: r = 0.8

a. (8 points) If a boy is 4 feet at age 6, what is your guess of his height at age 18?

b. (8 points) If a boy is 6 feet at age 18, what is your guess of his height at 6?
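The regression method converts the known height to standard units, shrinks it by r, and converts back. The sketch below (function name hypothetical, heights converted to inches) applies it in both directions. Note that part (b) uses the regression of height at 6 on height at 18, not the inverse of the line from part (a).

```python
def regression_estimate(x, mean_x, sd_x, mean_y, sd_y, r):
    """Predict y from x by the regression method:
    convert x to standard units, multiply by r, convert back to y units."""
    z_x = (x - mean_x) / sd_x
    return mean_y + r * z_x * sd_y

# Heights in inches: 3 ft 10 in = 46 in, 5 ft 10 in = 70 in.
# (a) 4 ft = 48 in at age 6 -> predicted height at 18
print(regression_estimate(48, 46, 2, 70, 2, 0.8))   # 71.6 in, about 5 ft 11.6 in
# (b) 6 ft = 72 in at age 18 -> predicted height at 6 (regress the other way)
print(regression_estimate(72, 70, 2, 46, 2, 0.8))   # 47.6 in, about 3 ft 11.6 in
```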

4. (12 points) Let X be the indicator of presence (1) or absence (0) of TATAAT in a random block of 5000 bp in the genome of E. coli, and let Y be the indicator of presence (1) or absence (0) of TTGACA. From 1166 non-overlapping 5000-bp blocks in a database, the following table is derived:

X \ Y / Y = 0 / Y = 1
X = 0 / 495 / 278
X = 1 / 193 / 200

What is the estimated correlation coefficient between X and Y?

(hint: )
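For two 0/1 variables the sample correlation reduces to the phi coefficient, r = (n11 n00 − n10 n01) / sqrt(n1· n0· n·1 n·0), where n_xy counts blocks with X = x and Y = y and the dotted counts are row and column totals. The sketch below applies this formula to the table, reading rows as X and columns as Y; the function name is illustrative.

```python
import math

def phi_coefficient(n00, n01, n10, n11):
    """Sample correlation of two 0/1 variables from a 2x2 table of counts.

    n_xy is the number of blocks with X = x and Y = y.
    """
    n1_ = n10 + n11   # blocks with X = 1
    n0_ = n00 + n01   # blocks with X = 0
    n_1 = n01 + n11   # blocks with Y = 1
    n_0 = n00 + n10   # blocks with Y = 0
    num = n11 * n00 - n10 * n01
    return num / math.sqrt(n1_ * n0_ * n_1 * n_0)

r = phi_coefficient(n00=495, n01=278, n10=193, n11=200)
print(round(r, 3))   # 0.143
```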

5. (18 points) Assume initially that there are N fragments, each of length L, and that the original full-length DNA sequence, which we call here the genome, has length G. The coverage a is given by a = NL/G. We also assume that G is much larger than L (so that end effects can be ignored), and that the N fragments are taken at random from the original full-length sequence.

Let p be the mean proportion of the genome covered by one or more fragments. Prove that for p to be 0.999 it is necessary to have a ≈ 6.9. Please present the details.

(Hints:

  1. The position of the left-hand end of any fragment is uniformly distributed in (0, G);
  2. The limit of the Binomial (many trials, small success probability) is the Poisson;
  3. If X is Poisson distributed with mean λ, then P(X = k) = e^(−λ) λ^k / k!.)
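Following the hints, the number of fragments covering a fixed point is Binomial(N, L/G), approximately Poisson with mean NL/G = a, so p ≈ 1 − P(X = 0) = 1 − e^(−a) and a = −ln(1 − p). The sketch below computes a for p = 0.999 and, as a sanity check, estimates the covered fraction by simulation. The simulation treats the genome as circular to remove end effects, and the genome and fragment lengths (10^6 and 1000) are hypothetical choices.

```python
import math
import random

def required_coverage(p):
    """Coverage a such that 1 - exp(-a) = p (Poisson approximation)."""
    return -math.log(1.0 - p)

def simulate_covered_fraction(G, L, a, seed=0):
    """Monte Carlo estimate of the fraction of a genome of length G covered
    by N = round(a*G/L) fragments of length L with uniform left-hand ends.

    The genome is treated as circular so that end effects vanish.
    """
    rng = random.Random(seed)
    N = round(a * G / L)
    diff = [0] * (G + 1)       # difference array of coverage depth
    for _ in range(N):
        s = rng.randrange(G)   # left-hand end, uniform on 0..G-1
        e = s + L              # fragment covers s..e-1 (mod G)
        diff[s] += 1
        if e <= G:
            diff[e] -= 1
        else:                  # wrap around to the start of the genome
            diff[0] += 1
            diff[e - G] -= 1
    depth, covered = 0, 0
    for i in range(G):
        depth += diff[i]
        if depth > 0:
            covered += 1
    return covered / G

a = required_coverage(0.999)
print(round(a, 2))                                 # 6.91
print(simulate_covered_fraction(10**6, 1000, a))   # close to 0.999
```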

6. (6 pts) A score for global alignment of with is , where E is the number of matches, F is the number of mismatches, and G is the number of deleted letters. Please comment on the optimal alignments obtained under this score system with and (hint: briefly discuss what the optimal alignments would be like under this scoring system).
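The exact score formula and parameter values are missing from this copy. Assuming the usual linear form αE − μF − δG (α, μ, δ are hypothetical names for the match reward, mismatch penalty and per-letter deletion penalty), optimal global alignments can be computed by Needleman-Wunsch dynamic programming, sketched below with illustrative parameter values.

```python
def global_alignment_score(a, b, alpha, mu, delta):
    """Best score alpha*E - mu*F - delta*G over all global alignments of a and b.

    E = matches, F = mismatches, G = deleted letters (each costs delta).
    Standard Needleman-Wunsch dynamic programming.
    """
    n, m = len(a), len(b)
    # S[i][j] = best score for aligning a[:i] with b[:j]
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = -delta * i
    for j in range(1, m + 1):
        S[0][j] = -delta * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = alpha if a[i - 1] == b[j - 1] else -mu
            S[i][j] = max(S[i - 1][j - 1] + pair,   # align a_i with b_j
                          S[i - 1][j] - delta,      # delete a_i
                          S[i][j - 1] - delta)      # delete b_j
    return S[n][m]

# Hypothetical parameter values (the exam's values are missing from this copy).
print(global_alignment_score("A", "C", alpha=1, mu=1, delta=1))   # -1: one mismatch
print(global_alignment_score("A", "C", alpha=1, mu=3, delta=1))   # -2: two deletions
```

Replacing a mismatch column (score −μ) by two deletions (score −2δ) leaves the rest of the alignment unchanged, so whenever μ > 2δ no optimal alignment contains a mismatch, as the second call illustrates.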
Additional Problem (Extra 10 points) Suppose that n=15, and

Let μ̂ = 0 and σ̂ = 1 be the estimates of the mean and standard deviation of the above observations. Please design a goodness-of-fit test for how well the above data fit the normal distribution with mean 0 and SD 1. (Hint: a t-test is not the expected answer, since it only compares the means. You may make use of the attached normal table.)
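One possible design (among others, such as a chi-square test on bins cut at normal-table quantiles) is the Kolmogorov-Smirnov test, which compares the empirical CDF with Φ at every point. The sample below is hypothetical because the 15 observations are missing from this copy, and the function names are illustrative. Strictly, when the mean and SD are estimated from the same data, the usual KS critical values are conservative (the Lilliefors correction addresses this).

```python
import math

def normal_cdf(z):
    """Standard normal CDF Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ks_statistic(data, cdf=normal_cdf):
    """Kolmogorov-Smirnov distance D = sup_x |F_n(x) - F(x)|
    between the empirical CDF of `data` and a fully specified CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # F_n jumps from (i-1)/n to i/n at x, so check both sides of the jump
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Hypothetical sample: the 15 observations are missing from this copy.
sample = [-1.6, -1.1, -0.9, -0.6, -0.4, -0.3, -0.1, 0.0,
          0.1, 0.3, 0.4, 0.6, 0.9, 1.1, 1.6]
D = ks_statistic(sample)
print(round(D, 3))
# Reject N(0, 1) at level 0.05 if D exceeds the tabulated critical value for
# n = 15 (roughly 1.36 / sqrt(n), about 0.35, by the large-sample approximation).
```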

Standard Normal (Z) Table

Values in the table represent areas under the curve to the left of Z quantiles along the margins.

Examples: z.5000 = 0.00 (P(Z<0) = 0.5)

z.9750 = +1.96 (P(Z<1.96) = 0.975)

z.0250 = -1.96 (P(Z<-1.96) = 0.025)
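The table values can be reproduced from the error function via Φ(z) = (1 + erf(z/√2))/2. The short check below recovers the three examples above; the function name `phi` is illustrative.

```python
import math

def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for z in (0.0, 1.96, -1.96):
    print(f"P(Z < {z:+.2f}) = {phi(z):.4f}")
```

This prints P(Z < +0.00) = 0.5000, P(Z < +1.96) = 0.9750 and P(Z < -1.96) = 0.0250.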