MTH u481, Summer 1, 2005, Computer Lab 2

Estimation and hypothesis-testing simulation for a one-parameter beta distribution using SPSS.

Part 1. Comparison of Mean Squared Errors for MLE and MOM estimates under a one-parameter beta distribution using SPSS.

Introduction.

This lab deals with a one-parameter beta distribution, described by the following pdf:

fX(x) = θ*x^(θ-1), 0 < x < 1. (1)

The maximum likelihood estimate for θ is θ_mle = –n / sum(ln(Xi)).

The method of moments estimate for θ is θ_mom = E(x) / (1 – E(x)), where E(x) here denotes the sample mean of the X-sample.

(Both of these estimates were the subject of your homework: Problems 5.8.5 and 5.8.12.)
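The two estimators can be written as short functions. This is a minimal sketch in Python (an assumption: the lab itself uses SPSS), and the names theta_mle and theta_mom are hypothetical. The two-point sample is made up so that the results can be checked by hand.

```python
import math


def theta_mle(sample):
    """Maximum likelihood estimate for pdf theta * x**(theta - 1): -n / sum(ln x_i)."""
    return -len(sample) / sum(math.log(x) for x in sample)


def theta_mom(sample):
    """Method of moments estimate: xbar / (1 - xbar), from E(X) = theta / (theta + 1)."""
    xbar = sum(sample) / len(sample)
    return xbar / (1 - xbar)


# Made-up two-point sample, small enough to check by hand
toy = [0.5, 0.5]
print(theta_mle(toy))  # 1 / ln 2, about 1.4427
print(theta_mom(toy))  # 0.5 / (1 - 0.5) = 1.0
```

Both functions take a plain list of values in (0, 1), so the same code can be applied to any of the simulated samples in the procedure below.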

1. Find the cumulative distribution function corresponding to the pdf given in (1).

In this lab you will find out experimentally which estimate for θ is more accurate.

Procedure.

  1. Open SPSS
  2. Create a sample of size 100 with the pdf given in (1) with θ = 2 (fX(x) = 2x).

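a) First fill one column with 100 1’s. (The column of 1’s simply creates 100 cases, so that the Compute command below produces 100 values.) The easiest way to do this is to fill 10 rows with 1, then select all 10 rows, copy them, and paste them at the end of the column until it has 100 rows.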

b) Now create a sample of size 100 from the uniform distribution. Select the Transform menu, Compute submenu, and type the name of the new variable in the Target Variable box, for example “uniform”. Select the RV.UNIFORM function and move it to the Numeric Expression box by pressing the button with the black triangle below it. Fill in the min and max parameters; they are 0 and 1 in this case.

c) Finally, transform the uniform sample to get a sample with pdf 2x, 0 < x < 1.

This is based on the following fact: if Y has a continuous cumulative distribution function F(y) = P(Y < y), then F(Y) is uniformly distributed over [0,1]. (Prove this if you have forgotten the proof given in class.)

Inverting this, we obtain the following rule: if U is uniformly distributed over [0,1], then F^(-1)(U) has cumulative distribution function F. In our case, to get a sample with pdf 2x, we take the square root of each value of the uniform sample. To do this, select the Transform menu, Compute submenu, and type the name of the new variable in the Target Variable box, for example “beta2”. Select the SQRT function and move it to the Numeric Expression box by pressing the button with the black triangle below it. Since you need the square root of the uniform variable, type the variable name (“uniform”) inside the parentheses of the SQRT function. You can check the pdf of the resulting sample by looking at a histogram (select the Graphs menu, Histogram option, and choose the beta2 variable).
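Outside SPSS, the same inverse-transform recipe (square root of a uniform value) can be sketched in Python. The seed and the variable names are arbitrary choices for illustration, not part of the lab. The code uses 1 - rng.random() so that no value is exactly 0, which would make ln(Xi) undefined in the MLE step.

```python
import math
import random
import statistics

rng = random.Random(2005)  # arbitrary fixed seed, so the run is reproducible

# Stand-in for RV.UNIFORM(0,1): rng.random() is in [0, 1), so 1 - rng.random()
# is in (0, 1] and never exactly 0.
uniform = [1.0 - rng.random() for _ in range(100)]

# Stand-in for SQRT(uniform): the resulting values have pdf 2x on (0, 1)
beta2 = [math.sqrt(u) for u in uniform]

# Quick check instead of a histogram: for pdf 2x the population mean is 2/3
print(statistics.mean(uniform), statistics.mean(beta2))
```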

  3. Compute the maximum likelihood estimate for θ. (Pretend that you have forgotten that θ = 2, and estimate the parameter from the sample. We assume only that the sample has pdf θ*x^(θ-1), and find the θ that best fits the data.)

a) The estimate is θ_mle = –n / sum(ln(Xi)). To compute it, first create a new variable containing the natural logarithm of each sample value. (Use the Compute utility, select LN as the function, beta2 as its argument, and “lnbeta2” as the target variable.)

b) Now compute the sum of the ln(Xi) by selecting Analyze, Descriptive Statistics, Descriptives. Select the lnbeta2 variable. Press the Options button, select Sum, and deselect all other statistics to suppress unnecessary output. (This sum is also used in Part 2, so record it!)

c) Now you have the sum of the logs and n = 100, so use the formula above to compute the maximum likelihood estimate for θ. If you have not made any mistakes, it should be close to 2.

  4. Compute the method of moments estimate for θ.

The formula is θ_mom = E(x) / (1 – E(x)), where E(x) is the sample mean.

a) Compute the sample mean of the beta2 sample. (Select Analyze, Descriptive Statistics, Descriptives. Select the beta2 variable. Press the Options button, select Mean, and deselect all other statistics to suppress unnecessary output.) Now you have E(x). Use the formula above to find the method of moments estimate for θ. It should also be close to 2.

  5. Compare the two estimates with the true value θ = 2.

To get more meaningful results, repeat the sample generation and both estimations 10 times, obtaining 20 different estimates for θ (10 MLE and 10 MOM estimates). (You don’t have to re-enter the 100 1’s every time, since they stay the same.)

NOTE: In all 10 trials, compute and record the means of both the uniform and the beta2 samples. Also, in all 10 trials, compute the sums of the natural logs of both the uniform sample and the sample with pdf 2x. (These values will be used in Part 2 of this lab. For each trial you should record 4 values: 2 means and 2 sums of natural logs.)

Now compute the mean squared error (MSE) of your MLE and MOM estimates:

MSE = (1/10) * sum(est - θ)^2 over all 10 estimates (θ = 2).

Compare the MSE values of the two estimators. Which method is better (has the lower mean squared error)?
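The whole Part 1 experiment (10 trials of size 100, both estimators, and their MSEs) can also be simulated in a few lines. This is a minimal Python sketch, not part of the SPSS procedure; the seed and the function names are hypothetical, and the MSE is computed as the average of the squared errors.

```python
import math
import random


def theta_mle(sample):
    # -n / sum(ln x_i)
    return -len(sample) / sum(math.log(x) for x in sample)


def theta_mom(sample):
    # xbar / (1 - xbar)
    xbar = sum(sample) / len(sample)
    return xbar / (1 - xbar)


def mse(estimates, theta):
    # Average squared deviation of the estimates from the true parameter
    return sum((e - theta) ** 2 for e in estimates) / len(estimates)


THETA, N, TRIALS = 2.0, 100, 10
rng = random.Random(481)  # arbitrary fixed seed

mle_estimates, mom_estimates = [], []
for _ in range(TRIALS):
    # Same recipe as the SPSS steps: square root of Uniform(0, 1) values
    sample = [math.sqrt(1.0 - rng.random()) for _ in range(N)]
    mle_estimates.append(theta_mle(sample))
    mom_estimates.append(theta_mom(sample))

print("MSE of MLE:", mse(mle_estimates, THETA))
print("MSE of MOM:", mse(mom_estimates, THETA))
```

With only 10 trials the comparison is noisy: rerunning with another seed can change the gap between the two MSEs, and possibly which estimator looks better. Increasing TRIALS gives a more stable comparison.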

Part 2. Hypothesis testing.

Introduction

In this part, you will use the results from Part 1 to carry out two alternative hypothesis tests.

The null hypothesis (Ho): the data are uniformly distributed over (0,1), i.e. θ = 1 in (1).

Alternative hypothesis: the data have pdf θ*x^(θ-1), 0 < x < 1, with θ > 1.

You will use two different tests: one based on the sample means, and another based on the product of the sample values.

  1. Hypothesis testing based on sample means.

You have the 10 means of the uniform samples and the 10 means of the samples with pdf 2x, which corresponds to θ = 2.

The uniform samples have mean 0.5 and variance 1/12, so the mean of a sample of size 100 has variance 1/(12*100) = 1/1200. By the CLT, the z-score is z = (mean - 0.5) * sqrt(1200). Use the 10% significance level to reject or accept the null hypothesis. Since the test is one-sided (Ho is rejected for large means), the critical z-value is 1.28.
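The z-score and the decision rule can be sketched as follows in Python. The function names and the three sample means are hypothetical, chosen only to show values on both sides of the critical value.

```python
import math

CRITICAL_Z = 1.28  # one-sided 10% level, as in the text


def z_score(sample_mean, n=100):
    # Under Ho the sample mean has mean 0.5 and variance 1/(12 n);
    # for n = 100 this is (mean - 0.5) * sqrt(1200).
    return (sample_mean - 0.5) * math.sqrt(12 * n)


def reject_h0(sample_mean, n=100):
    # Reject only for large means: the alternative theta > 1 pushes the mean above 0.5
    return z_score(sample_mean, n) > CRITICAL_Z


# Hypothetical sample means, for illustration only
for m in (0.51, 0.55, 0.66):
    print(m, round(z_score(m), 3), reject_h0(m))
```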

a) Open a new data file, enter the means of the uniform samples from your 10 trials into one variable, and the means of the samples with θ = 2 into another variable.

b) Compute the z-score for each of the 10 means of the uniform samples, and decide whether to reject Ho based on the critical value above. In how many of the samples was Ho rejected?

c) Compute the z-score for each of the samples with θ = 2, and decide whether to reject Ho based on the same critical value. In how many samples was Ho rejected?

  2. Hypothesis testing based on the Likelihood Ratio Test.

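From the discussion in class, the LRT provides the maximal power at a given significance level (Neyman–Pearson lemma). Moreover, in this example it is uniformly most powerful over all θ > 1: the likelihood ratio θ^n * (product of the Xi)^(θ-1) is increasing in the product of the Xi for every θ > 1, so we should reject Ho if the product of the sample values is greater than a critical value, t. But what should t be to make the significance level 10%? Besides, the product of 100 random numbers from (0,1) is extremely small (under Ho, about e^(-100)), which is inconvenient to work with.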

Taking logs of both sides, we reject Ho if the sum of the logs of the sample values is greater than ln(t).

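Under Ho, each -ln(Xi) has the exponential distribution with mean 1 and variance 1, so sum(ln(Xi)) has mean -n and variance n. Using the CLT, the test becomes whether [sum(ln(Xi)) + n]/sqrt n > (ln(t) + n)/sqrt n (see the class handouts).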

To get the 10% significance level, set (ln(t) + n)/sqrt n = 1.28; for n = 100 this gives ln(t) = 12.8 - 100 = -87.2.
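A sketch of the LRT decision rule in Python, including the log-scale critical value ln(t) computed from the condition above. The function names and the example sums of logs are hypothetical; a sum near -50 is roughly what a θ = 2 sample of size 100 would give, since E(ln X) = -1/θ for the pdf in (1).

```python
import math

CRITICAL_Z = 1.28  # one-sided 10% level


def lrt_statistic(sum_log, n=100):
    # Under Ho each -ln(X_i) is Exponential(1) (mean 1, variance 1), so by the
    # CLT sum(ln X_i) is approximately Normal with mean -n and variance n.
    return (sum_log + n) / math.sqrt(n)


def reject_h0(sum_log, n=100):
    # Reject for large sums of logs, i.e. for a large product of the sample values
    return lrt_statistic(sum_log, n) > CRITICAL_Z


# Critical value on the log scale: ln(t) = 1.28 * sqrt(n) - n
n = 100
ln_t = CRITICAL_Z * math.sqrt(n) - n
print("ln(t) =", ln_t)  # about -87.2

# Hypothetical sums of logs, for illustration only
for s in (-100.0, -87.0, -50.0):
    print(s, lrt_statistic(s), reject_h0(s))
```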

a) Enter the sums of the logs for the 2 distributions into 2 different variables.

b) Apply the test above to each sum of logs of the uniform samples, and find in how many of them Ho was rejected.

c) Apply the test above to each sum of logs of the samples with pdf 2x, and find in how many of them Ho was rejected.

d) (Optional) Do the same as in c) for θ = 4 and significance level 0.15.
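To see what the rejection counts in both tests should look like in the long run, the sketch below (Python, with hypothetical helper names and an arbitrary seed) simulates many trials and reports the fraction of rejections for each test. Under Ho (θ = 1) both fractions should be near the 10% significance level; for θ = 2 they estimate the power of each test. The sampler assumes the inverse transform u^(1/θ), which reduces to the square root used in Part 1 when θ = 2.

```python
import math
import random


def sample(theta, n, rng):
    # Inverse transform: u ** (1 / theta); theta = 1 gives the uniform sample,
    # theta = 2 gives the square root used in Part 1. 1 - rng.random() avoids u = 0.
    return [(1.0 - rng.random()) ** (1.0 / theta) for _ in range(n)]


def mean_test_rejects(xs, z_crit):
    z = (sum(xs) / len(xs) - 0.5) * math.sqrt(12 * len(xs))
    return z > z_crit


def lrt_rejects(xs, z_crit):
    n = len(xs)
    return (sum(math.log(x) for x in xs) + n) / math.sqrt(n) > z_crit


def rejection_rate(test, theta, z_crit=1.28, trials=2000, n=100, seed=7):
    # Fraction of simulated samples of size n for which the test rejects Ho
    rng = random.Random(seed)
    return sum(test(sample(theta, n, rng), z_crit) for _ in range(trials)) / trials


for theta in (1.0, 2.0):
    print("theta =", theta,
          "means test:", rejection_rate(mean_test_rejects, theta),
          "LRT:", rejection_rate(lrt_rejects, theta))
```

For the optional part d), rejection_rate(lrt_rejects, 4.0, z_crit=1.04) would estimate the power for θ = 4 at the one-sided 15% level (1.04 is the standard normal 85th percentile, rounded).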

Remark. We did the simulation to confirm theoretical results. More frequently, simulation is applied with another aim, for example when the critical value cannot be evaluated analytically. It can then be found numerically, to any desired precision, from the condition that the frequency of null hypothesis rejections equals the specified significance level. This method is called Monte Carlo estimation.
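A minimal Python sketch of this idea for the LRT statistic of Part 2: simulate the statistic many times under Ho and take its empirical 90th percentile as the 10%-level critical value. The function names, the number of repetitions, and the seed are assumptions for illustration. Because the sum of logs is not exactly normal for n = 100, the simulated value need not equal the CLT value 1.28 exactly.

```python
import math
import random


def lrt_statistic_under_h0(n, rng):
    # One draw of (sum(ln X_i) + n) / sqrt(n) with X_i ~ Uniform(0, 1);
    # 1 - rng.random() lies in (0, 1], so log() never sees 0.
    s = sum(math.log(1.0 - rng.random()) for _ in range(n))
    return (s + n) / math.sqrt(n)


def monte_carlo_critical_value(alpha, n=100, reps=10000, seed=1):
    # Empirical (1 - alpha) quantile: about a fraction alpha of the simulated
    # statistics exceed it, so rejecting above it gives a level close to alpha.
    rng = random.Random(seed)
    draws = sorted(lrt_statistic_under_h0(n, rng) for _ in range(reps))
    k = int(round((1 - alpha) * reps)) - 1
    return draws[k]


print(monte_carlo_critical_value(0.10))  # compare with the CLT value 1.28
```

Increasing reps makes the estimate more precise, which is the "any desired precision" of the remark above.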