Assignment #1: Probability and Data Description (Due Feb. 2)
1. A stretch of DNA has only A, T, C, and G in equal proportions.
a. How many different 10-base-pair segments are possible (assume that the order matters)?
b. What is the probability of randomly drawing 10 As in a row?
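A hand calculation for question 1 can be checked with a short Python sketch. It assumes each position is drawn independently with equal probability for the four bases; the function names `count_segments` and `prob_specific_segment` are illustrative and not part of the assignment.

```python
from fractions import Fraction

BASES = "ATCG"  # four nucleotides, assumed equally likely and drawn independently


def count_segments(length, alphabet=BASES):
    """Number of ordered sequences of the given length over the alphabet."""
    return len(alphabet) ** length


def prob_specific_segment(length, alphabet=BASES):
    """Probability of one particular ordered sequence under equal proportions."""
    return Fraction(1, len(alphabet)) ** length


print(count_segments(10))
print(prob_specific_segment(10))
```

Exact fractions are used so that very small probabilities are not lost to rounding.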
2. Some regions of the human genome are GC-rich, which means that they have a high fraction of G's and C's. If the proportions of nucleotides are 40% G, 40% C, 10% A, and 10% T, what is the probability of drawing a sequence (in order) of GCATTGCCG?
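When the bases are not equally likely, the probability of an ordered sequence is the product of the per-base probabilities, provided the draws are independent (an assumption made here, since the question does not state otherwise). A minimal Python sketch, with the hypothetical helper `sequence_probability`:

```python
from fractions import Fraction

# Nucleotide proportions from question 2, stored as exact fractions.
GC_RICH = {
    "G": Fraction(40, 100),
    "C": Fraction(40, 100),
    "A": Fraction(10, 100),
    "T": Fraction(10, 100),
}


def sequence_probability(sequence, proportions):
    """Multiply the per-base probabilities, assuming independent draws."""
    prob = Fraction(1)
    for base in sequence:
        prob *= proportions[base]
    return prob


p = sequence_probability("GCATTGCCG", GC_RICH)
print(p, float(p))
```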
3. A population of fruit flies is polymorphic (has more than one form) for eye colour (white or red) and for body colour (yellow or brown). The true numbers of flies in this population are 10000 yellow/white, 20000 yellow/red, 50000 brown/white, and 50000 brown/red. (Note that these are the numbers of the total population, not just of a sample.)
a. Construct a frequency table for eye colour and a frequency table for body colour.
b. Are eye colour and body colour independent?
c. If a fruit fly has red eyes, then what is the probability that it also has a brown body?
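The steps in question 3 (marginal frequencies, an independence check, and a conditional probability) can be sketched in Python as follows. The dictionary layout and the names `marginal`, `is_independent`, and `conditional` are illustrative choices. Independence is tested by comparing each joint frequency with the product of its two marginals.

```python
from fractions import Fraction

# Population counts from question 3, keyed by (body colour, eye colour).
counts = {
    ("yellow", "white"): 10000,
    ("yellow", "red"): 20000,
    ("brown", "white"): 50000,
    ("brown", "red"): 50000,
}
total = sum(counts.values())


def marginal(index):
    """Relative frequencies for body colour (index 0) or eye colour (index 1)."""
    sums = {}
    for key, n in counts.items():
        sums[key[index]] = sums.get(key[index], 0) + n
    return {level: Fraction(n, total) for level, n in sums.items()}


body_freq = marginal(0)
eye_freq = marginal(1)


def is_independent():
    """True only if every joint frequency equals the product of its marginals."""
    return all(
        Fraction(n, total) == body_freq[body] * eye_freq[eye]
        for (body, eye), n in counts.items()
    )


def conditional(body, eye):
    """P(body colour | eye colour), computed within the given eye-colour group."""
    eye_total = sum(n for (_, e), n in counts.items() if e == eye)
    return Fraction(counts[(body, eye)], eye_total)


print(body_freq, eye_freq)
print(is_independent())
print(conditional("brown", "red"))
```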
4. The following data are the number of feeding strikes (per hour) for a random sample of bluegill feeding on Daphnia.
6.8, 7.8, 8.7, 9.3, 9.1, 8.5, 6.3, 6.8, 7.8, 7.2, 7.6,
9.4, 8.0, 7.8, 8.6, 7.3, 8.1, 7.4, 8.8, 7.0, 7.3, 8.2,
7.2, 7.5, 7.5, 8.1, 7.9, 8.3, 6.9, 7.3, 7.1, 8.6, 9.6,
8.3, 7.7, 6.9, 7.3, 7.3, 6.7, 8.4, 8.9, 8.2, 7.1, 7.8
Construct a frequency table. (Note: you should first decide on the number of ‘bins’ needed. In general, samples with fewer than 40-50 observations shouldn't be given more than about 10 ‘bins’; otherwise, there will be too few observations per bin. Samples of a few thousand observations can use more than 20 bins.)
a. Draw a histogram for these measurements and describe the shape of the distribution.
b. Calculate the sample mean.
c. Calculate the sample median.
d. Calculate the sample standard deviation.
e. What fraction of the values in the data set lie within one standard deviation of the sample mean? Within two standard deviations?
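The calculations in question 4 can be checked with Python's standard `statistics` module. The sketch below makes one assumption that is not in the assignment: bins of width 0.5 starting at 6.0, which is only one reasonable choice. The helper names `frequency_table` and `fraction_within` are illustrative. Binning is done on integer tenths to avoid floating-point trouble at bin edges.

```python
import statistics
from collections import Counter

# Feeding strikes per hour from question 4 (44 observations).
strikes = [
    6.8, 7.8, 8.7, 9.3, 9.1, 8.5, 6.3, 6.8, 7.8, 7.2, 7.6,
    9.4, 8.0, 7.8, 8.6, 7.3, 8.1, 7.4, 8.8, 7.0, 7.3, 8.2,
    7.2, 7.5, 7.5, 8.1, 7.9, 8.3, 6.9, 7.3, 7.1, 8.6, 9.6,
    8.3, 7.7, 6.9, 7.3, 7.3, 6.7, 8.4, 8.9, 8.2, 7.1, 7.8,
]

FIRST_EDGE_TENTHS = 60  # assumed lower edge of the first bin: 6.0
BIN_WIDTH_TENTHS = 5    # assumed bin width: 0.5


def frequency_table(values):
    """Return (lower, upper, count) rows for half-open bins [lower, upper)."""
    counts = Counter(
        (round(v * 10) - FIRST_EDGE_TENTHS) // BIN_WIDTH_TENTHS for v in values
    )
    rows = []
    for i in range(max(counts) + 1):
        lower = (FIRST_EDGE_TENTHS + i * BIN_WIDTH_TENTHS) / 10
        upper = lower + BIN_WIDTH_TENTHS / 10
        rows.append((lower, upper, counts.get(i, 0)))
    return rows


def fraction_within(values, k):
    """Fraction of values within k sample standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return sum(1 for v in values if abs(v - mean) <= k * sd) / len(values)


sample_mean = statistics.mean(strikes)
sample_median = statistics.median(strikes)
sample_sd = statistics.stdev(strikes)  # divides by n - 1

# A text histogram: one asterisk per observation in each bin.
for lower, upper, n in frequency_table(strikes):
    print(f"{lower:.1f}-{upper:.1f} {n:3d} {'*' * n}")

print(sample_mean, sample_median, sample_sd)
print(fraction_within(strikes, 1), fraction_within(strikes, 2))
```

Note that `statistics.stdev` computes the sample standard deviation (denominator n - 1), whereas `statistics.pstdev` would give the population version.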
5. Suppose some medical experts have developed a new test for tuberculosis. They know that 6% of the population have the disease. Further, they know that if an individual has the disease, then they will test positive 97% of the time, and if the individual does not have the disease, then they will test negative 99% of the time.
a. If an individual tests positive, then what is the probability that they actually have the disease?
b. If an individual tests negative, then what is the probability that they do not have the disease?
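Question 5 is an application of Bayes' theorem. As a sketch for checking a hand calculation, the Python below treats 6% as the prevalence, 97% as the probability of a positive test given disease, and 99% as the probability of a negative test given no disease. The variable and function names are illustrative.

```python
from fractions import Fraction

prevalence = Fraction(6, 100)    # P(disease)
sensitivity = Fraction(97, 100)  # P(positive | disease)
specificity = Fraction(99, 100)  # P(negative | no disease)


def prob_disease_given_positive():
    """Bayes' theorem: P(disease | positive)."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


def prob_healthy_given_negative():
    """Bayes' theorem: P(no disease | negative)."""
    true_neg = (1 - prevalence) * specificity
    false_neg = prevalence * (1 - sensitivity)
    return true_neg / (true_neg + false_neg)


print(float(prob_disease_given_positive()))
print(float(prob_healthy_given_negative()))
```

In each function the denominator is the total probability of the observed test result, obtained by summing over the two disease states.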
6. In the Canadian population, it is found that 40% of residents have been to the USA, 14% have been to Mexico, and 46% have been to the USA or Mexico (or both).
a. If we randomly select a Canadian resident, what is the probability that they have been to the USA and Mexico?
b. If we know that a Canadian resident has been to Mexico, then what is the probability that they have also been to the USA?
c. Are the events ‘been to USA’ and ‘been to Mexico’ disjoint? Are they independent? Justify your answer.
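Question 6 rests on the addition rule, P(A or B) = P(A) + P(B) - P(A and B), together with the definitions of conditional probability, disjoint events, and independent events. A minimal Python sketch for checking the arithmetic (variable names are illustrative, and "or" is read as inclusive):

```python
from fractions import Fraction

p_usa = Fraction(40, 100)     # P(been to USA)
p_mexico = Fraction(14, 100)  # P(been to Mexico)
p_either = Fraction(46, 100)  # P(been to USA or Mexico), inclusive "or"

# Addition rule rearranged: P(A and B) = P(A) + P(B) - P(A or B)
p_both = p_usa + p_mexico - p_either

# Conditional probability: P(USA | Mexico) = P(both) / P(Mexico)
p_usa_given_mexico = p_both / p_mexico

# Disjoint events cannot occur together, so P(A and B) would be 0.
are_disjoint = p_both == 0

# Independent events satisfy P(A and B) = P(A) * P(B).
are_independent = p_both == p_usa * p_mexico

print(p_both, p_usa_given_mexico, are_disjoint, are_independent)
```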