Stat 421, Fall 2008, Fritz Scholz

Homework 2, due October 10

Problem 1 (CDF and Density): Execute the following command lines in R and use the documentation (e.g. ?hist) to understand what each detail in them means:

y=rnorm(30)
hist(y,probability=T,xlim=c(-3.5,3.5),col=c("blue","orange"))
x=seq(-3.5,3.5,.01)
fx=dnorm(x)
lines(x,fx,col="red")
abline(h=0)
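With probability=T the bar heights are densities rather than counts, so the total bar area is 1; that is what makes the superimposed dnorm curve directly comparable to the bars. A quick check, using plot=FALSE so that nothing is drawn (the seed value is arbitrary):

```r
set.seed(421)                               # arbitrary seed, for reproducibility only
y=rnorm(30)
h=hist(y, probability=TRUE, plot=FALSE)     # compute the histogram without drawing it
area=sum(h$density*diff(h$breaks))          # bar height times bar width, summed
area                                        # 1 (up to rounding)
```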

Next try:

plot.stepfun(y,xlim=c(-3.5,3.5),main="empirical CDF",xlab="y",ylab="F(y)")
CDF=pnorm(x)
lines(x,CDF,col="red")

Now create an R function

CDF.density.plot=function(mu=100,sigma=5,n=50){ you fill in the rest }

that plots the histogram of a random sample y of size n from a normal distribution with mean mu and standard deviation sigma. Then superimpose the true density from which you sampled, as was done in the example above; only small changes are needed to incorporate mu and sigma (see ?dnorm).
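One way to see what "small changes" means: dnorm and pnorm accept mean and sd arguments, and a normal density with mean mu and standard deviation sigma is the standard normal density evaluated at (x-mu)/sigma and divided by sigma. The sketch below checks this numerically; the values mu=100 and sigma=5 are just the function's defaults, and the grid mu ± 3*sigma is an illustrative choice, not part of the assignment.

```r
mu=100; sigma=5
x=seq(mu-3*sigma, mu+3*sigma, .5)           # illustrative grid around mu
f1=dnorm(x, mean=mu, sd=sigma)              # density with mean/sd arguments
f2=dnorm((x-mu)/sigma)/sigma                # standardize, then rescale
F1=pnorm(x, mean=mu, sd=sigma)              # CDF with mean/sd arguments
F2=pnorm((x-mu)/sigma)                      # CDF needs no 1/sigma factor
all.equal(f1, f2)                           # TRUE
all.equal(F1, F2)                           # TRUE
```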

In a separate plot show the empirical cumulative distribution function (CDF) of the same sample (as illustrated by the step function in the second code snippet above) and superimpose the corresponding true CDF from which you sampled.
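The empirical CDF at a point t is the fraction of sample values that are at most t. When plot.stepfun is handed a plain numeric vector, it converts it with ecdf internally, so the step function can also be built and evaluated directly. A small hand-checkable example (the vector v is made up for illustration):

```r
v=c(3, 1, 2, 2)
Fn=ecdf(v)                        # step function: fraction of v that is <= t
vals=Fn(c(0, 1, 2, 2.5, 3))
vals                              # 0.00 0.25 0.75 0.75 1.00
```

The jump at 2 has height 0.5 because two of the four observations equal 2.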

Arrange it so that the two plots appear on the same page by using the command par(mfrow=c(2,1)) at the start of the function body (see ?par). Make sure that the two plots use the same xlim specification in the calls to hist and plot.stepfun.
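Calling par(mfrow=c(2,1)) inside a function changes the layout for all later plots as well. An optional idiom, not required by the assignment, is to save the old settings (par returns them) and restore them with on.exit. The function two.panel.demo below is hypothetical and only illustrates that pattern; pdf(NULL) is used so the sketch runs without a screen.

```r
# Hypothetical demo function; only the par()/on.exit() pattern matters here
two.panel.demo=function(){
  op=par(mfrow=c(2,1))   # set a 2-row layout; par() returns the old settings
  on.exit(par(op))       # restore the old layout when the function exits
  plot(1:10, main="top panel")
  plot(10:1, main="bottom panel")
  invisible(NULL)
}
pdf(NULL)          # off-screen device so the sketch also runs non-interactively
two.panel.demo()
mf=par("mfrow")    # back to 1 1 after the call
dev.off()
```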

Provide your function code and two sample plot pages, one with n=40 and one with n=75.

Problem 2 (Hypergeometric and Binomial Distribution): Read the documentation for sum, sample, Binomial and Hypergeometric.

a. Figure out how to use sample to randomly select without replacement a hand of 13 cards from a deck of 52 cards, numbered 1, 2,…,52. Denote this hand by the vector x.hand of length 13 and give the R expression that produces x.hand. Let hearts.hand be the number of hearts in that hand, where the numbers 1,2,…,13 represent the hearts in the deck. Give a simple vectorized expression for hearts.hand using the function sum applied to a logic vector derived from x.hand (recall that arithmetic expressions involving logic values T or F will treat them as 1 and 0, respectively).
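The two ingredients named here can be tried on stand-in data first (the vectors below are illustrative, not a card hand): sample draws without replacement by default, and sum of a logical vector counts its TRUE entries.

```r
set.seed(3)                      # arbitrary seed
v=sample(1:10, 4)                # without replacement (the default): 4 distinct values
w=sample(1:10, 4, replace=TRUE)  # with replacement: values may repeat
u=c(8, 2, 5, 11)
u<=5                             # FALSE  TRUE  TRUE FALSE
sum(u<=5)                        # 2, since TRUE counts as 1 and FALSE as 0
```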

We can view hearts.hand as a hypergeometric random variable with which possible values? If we had sampled our 13 cards with replacement, then hearts.hand would be a binomial random variable with which distributional parameters n and p, and with which possible values? In either case, which command expression using rhyper or rbinom, respectively, would realize a random value for hearts.hand?
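The parameterization of the Hypergeometric functions is easy to mix up: in dhyper(x, m, n, k) and rhyper(nn, m, n, k), m is the number of "marked" items, n the number of unmarked items, and k the number drawn. The sketch below uses a small made-up urn (m=5, n=7, k=4) rather than the card deck, and compares drawing without replacement (hypergeometric) to drawing with replacement (binomial with prob=m/(m+n)).

```r
m=5; n=7; k=4                                # illustrative urn: 5 marked, 7 unmarked; draw 4
y=0:k
p.hyper=dhyper(y, m, n, k)                   # drawing without replacement
p.binom=dbinom(y, size=k, prob=m/(m+n))      # drawing with replacement
round(rbind(p.hyper, p.binom), 4)
c(sum(p.hyper), sum(p.binom))                # both 1
c(sum(y*p.hyper), sum(y*p.binom))            # both k*m/(m+n) = 5/3
# One random value of each kind:
rhyper(1, m, n, k)
rbinom(1, size=k, prob=m/(m+n))
```

Both distributions have the same mean here; the hypergeometric variance is smaller by the factor (m+n-k)/(m+n-1).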

b. Write a function fun.hyper.bin=function(Nsim=10000,repl=T){ you fill in the details } that does the following. Using a loop of length Nsim, it generates a vector hearts consisting of Nsim values of hearts.hand, each generated with the sample function as requested above. When using sample, specify the optional argument replace=repl. Initialize hearts properly before the loop, using what you learned about efficiency in lab 2.
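Assuming the efficiency point from lab 2 is preallocation (the lab itself is not reproduced here), the pattern looks like this. The loop body computes a stand-in quantity, the sum of two fair dice, not the card problem.

```r
Nsim=1000
set.seed(7)                                  # arbitrary seed
out=numeric(Nsim)                            # allocate the full result vector once, before the loop
for(i in 1:Nsim){
  out[i]=sum(sample(1:6, 2, replace=TRUE))   # fill slot i in place
}
# The slower pattern grows the vector on every pass, e.g. out=c(out, value),
# which copies the whole vector each time.
```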

Outside of this loop, also generate an equivalent random vector hearts.0 of length Nsim, using rhyper (when repl=F) or rbinom (when repl=T); use an if-else construction. Do this in a single, vectorized call, not in a loop.

The vectors hearts and hearts.0 are equivalent in the sense that their histograms should be very similar. Inside fun.hyper.bin, construct such histograms by specifying the buckets via the optional argument breaks=seq(-.5,13.5,1) to hist. Put both histograms on the same page by issuing the command par(mfrow=c(2,1)) before drawing the first histogram. Annotate the histograms to indicate which is which and whether the sampling was hypergeometric or binomial. Then add points to each histogram that indicate the expected count for each possible value y of hearts.hand: using the points function with pch=16, add points for both hypergeometric (red) and binomial (blue) sampling on each histogram, and use legend to add a legend explaining these two types of points.

If py is the probability of y, how many such y-values would you expect in Nsim simulations? Here you get py either from dhyper or dbinom. Give your code and the plots resulting from fun.hyper.bin(Nsim=10000,repl=T) and fun.hyper.bin(Nsim=10000,repl=F).
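If each simulation produces the value y with probability py, the expected number of y-values in Nsim simulations is Nsim*py. The sketch below uses a small binomial example (size=3, prob=0.5, chosen so the numbers are easy to check) instead of the card problem, and overlays the expected counts on a frequency histogram. The histogram stays on the count scale (no probability=T), so the points and bars are comparable; the ylim padding is only there so no point is clipped.

```r
Nsim=10000
y=0:3
py=dbinom(y, size=3, prob=0.5)     # 0.125 0.375 0.375 0.125
expected=Nsim*py                   # 1250 3750 3750 1250
pdf(NULL)                          # off-screen device; drop this line when working interactively
set.seed(2)                        # arbitrary seed
sim=rbinom(Nsim, size=3, prob=0.5)
hist(sim, breaks=seq(-.5, 3.5, 1), ylim=c(0, 1.15*max(expected)),
     main="binomial sampling (size=3, prob=0.5)", xlab="y")
points(y, expected, pch=16, col="blue")   # expected counts on the count scale
legend("topright", legend="expected count Nsim*py", pch=16, col="blue")
dev.off()
```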

Discuss how the points relate to the histograms.