ROC and Bootstrap Example

ECES 680

Statistical Pattern Recognition

Fall 2003

We compare the ROC Az values (Az is the area under the ROC curve) and their confidence intervals as obtained by Monte Carlo and by bootstrap. The Monte Carlo values are the ‘true’ values, apart from the finite size of the Monte Carlo (MC) sample. The bootstrap values are based on a single set of observations. We find that the true value of Az, as found by the Monte Carlo method, lies within the bootstrap confidence interval.

Background:

See D. Bamber, “The Area above the Ordinal Dominance Graph and the Area below the Receiver Operating Characteristic Graph,” Journal of Mathematical Psychology, vol. 12, pp. 387-415, 1975, for information about the ‘empirical ROC’.

We construct the ROC for two exponentially distributed variables: the ‘negative’ has mean 1, the ‘positive’ has mean a. The ROC curve has the equation

y = x^(1/a),

with Az = a/(1+a). On the next page we show a sample plot.
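The curve and its area follow from the exponential tail probabilities. Taking the negative class to have unit mean (which is what the stated curve requires), a threshold t gives a false-positive fraction x = exp(-t) and a true-positive fraction y = exp(-t/a), so y = x^(1/a), and integrating over x from 0 to 1 gives 1/(1 + 1/a) = a/(1+a). The short Python sketch below checks the closed form numerically; the function names are illustrative and are not part of the original code.

```python
def roc_exponential(x, a):
    """ROC curve y = x**(1/a): exponential scores, negative mean 1, positive mean a."""
    return x ** (1.0 / a)


def area_trapezoid(a, n_steps=100_000):
    """Trapezoid-rule area under the ROC curve on an even grid over [0, 1]."""
    h = 1.0 / n_steps
    total = 0.5 * (roc_exponential(0.0, a) + roc_exponential(1.0, a))
    for i in range(1, n_steps):
        total += roc_exponential(i * h, a)
    return total * h


a = 5.0
az_numeric = area_trapezoid(a)   # numerical integral of x**(1/a)
az_closed = a / (1.0 + a)        # closed form from the text
print(f"numeric Az = {az_numeric:.4f}, closed form a/(1+a) = {az_closed:.4f}")
```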

Monte Carlo Trials

We generate 50 pseudorandom exponential RVs with mean 1 and 50 pseudorandom exponential RVs with mean 5. The expected Az is 5/6 = 0.833, with a rule-of-thumb standard deviation for a single estimate equal to sqrt(Az(1-Az)/min(n1,n2)) = sqrt(0.833*0.167/50) = 0.0527. For the mean Az over 200 trials, the standard error is 0.0527/sqrt(200) = 0.0037. Assuming that the distribution of the mean is Gaussian, the 95% confidence interval is mean ± 1.96 sigma, which gives [0.8260, 0.8406].
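The arithmetic in this paragraph can be reproduced in a few lines of Python. The variable names are chosen here for illustration; the numbers are the ones quoted in the text.

```python
import math

az = 5.0 / 6.0      # expected Az for a = 5
n1 = n2 = 50        # observations per class
n_trials = 200      # Monte Carlo repetitions

# Rule-of-thumb standard deviation of a single Az estimate.
sd_single = math.sqrt(az * (1.0 - az) / min(n1, n2))

# Standard error of the mean Az over all trials.
se_mean = sd_single / math.sqrt(n_trials)

# Gaussian 95% interval for the mean Az.
ci_mean = (az - 1.96 * se_mean, az + 1.96 * se_mean)

print(f"sd = {sd_single:.4f}, se = {se_mean:.4f}, "
      f"CI = [{ci_mean[0]:.4f}, {ci_mean[1]:.4f}]")
```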

We run 200 repetitions of a random trial that generates these variables and the ROC. The average Az over these trials is 0.8325, which is within the 95% confidence interval computed above. The standard deviation of the Az values is 0.0410. The 95% confidence interval, computed from this mean and standard deviation under a normal assumption, is 0.8325 ± 1.96 × 0.0410 = [0.7521, 0.9129].
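A minimal Python sketch of such a Monte Carlo experiment is given below. It is not the mcROC function used for the reported numbers: the names, the seed, and the use of the Mann-Whitney pair count for the area are assumptions made here. The pair count equals the trapezoidal area under the empirical ROC, which is the result discussed in the Bamber reference, so the two ways of computing Az agree. Because the seed differs, the printed values will be close to, but not identical with, those in the text.

```python
import random
import statistics


def auc_mann_whitney(neg, pos):
    """Fraction of (negative, positive) pairs ranked correctly; ties count one half."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(neg) * len(pos))


def one_trial(rng, n_neg=50, n_pos=50, mean_neg=1.0, mean_pos=5.0):
    """Draw one set of exponential observations and return its empirical Az."""
    # random.expovariate takes the rate, i.e. 1 / mean.
    neg = [rng.expovariate(1.0 / mean_neg) for _ in range(n_neg)]
    pos = [rng.expovariate(1.0 / mean_pos) for _ in range(n_pos)]
    return auc_mann_whitney(neg, pos)


rng = random.Random(680)   # arbitrary seed, chosen for repeatability
az_values = [one_trial(rng) for _ in range(200)]
mean_az = statistics.mean(az_values)
sd_az = statistics.stdev(az_values)
print(f"mean Az = {mean_az:.4f}, sd = {sd_az:.4f}")
```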

The Az values obtained are sorted and plotted below as an estimate of the cumulative distribution of the Az estimates. We estimate the 95% confidence interval by sorting the Az estimates and taking the 5th smallest (5/200 = 2.5%) and the 195th smallest (195/200 = 97.5%). These give [0.7348, 0.9096], the interval within which the Az from a single set of observations is expected to fall. The interval is not symmetric about the mean; this reflects compression of the distribution on the right, where Az is bounded by 1. The 95% confidence interval computed from the mean and standard deviation of the trials under a Gaussian assumption, 0.8325 ± 1.96 × 0.0410, is [0.7521, 0.9129]. There is substantial overlap between this interval and the Monte Carlo order-statistic interval.
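The two interval constructions compared in this paragraph can be sketched in Python as follows. The helper names are illustrative. The Gaussian interval uses the mean 0.8325 and standard deviation 0.0410 reported for the 200 trials; the order-statistic helper is shown on stand-in data because the individual Az values are not listed in the text.

```python
def order_statistic_interval(values, lower_rank, upper_rank):
    """Return the lower_rank-th and upper_rank-th smallest values (1-based ranks)."""
    ordered = sorted(values)
    return ordered[lower_rank - 1], ordered[upper_rank - 1]


def gaussian_interval(mean, sd, z=1.96):
    """Symmetric interval mean +/- z * sd."""
    return mean - z * sd, mean + z * sd


# Gaussian interval from the mean and standard deviation reported in the text.
g_lo, g_hi = gaussian_interval(0.8325, 0.0410)
print(f"Gaussian interval: [{g_lo:.4f}, {g_hi:.4f}]")

# Stand-in data (not the report's Az values): 200 evenly spaced numbers.
stand_in = [i / 1000.0 for i in range(200, 0, -1)]
o_lo, o_hi = order_statistic_interval(stand_in, 5, 195)
print(f"5th and 195th smallest of the stand-in data: {o_lo}, {o_hi}")
```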

Monte Carlo estimates of Az. 200 Monte Carlo trials. Expected values

Bootstrap Trials

A single set of pseudorandom observations was used to estimate the variation of the ROC area by generating bootstrap samples from it. The mean Az over the bootstrap trials is 0.7767, the standard deviation of the bootstrap trials is 0.051, and the 95% confidence interval is [0.660, 0.850].
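A bootstrap of this kind might look like the Python sketch below. It is not the bootROC function used for the reported numbers. The text does not state the resampling scheme or the number of bootstrap samples, so the choices here are assumptions: each class is resampled separately with replacement, 200 bootstrap samples are drawn, and the interval is taken from the 5th and 195th sorted values as in the Monte Carlo section. The seed is arbitrary, so the output will differ from the values in the text.

```python
import random
import statistics


def auc_mann_whitney(neg, pos):
    """Fraction of (negative, positive) pairs ranked correctly; ties count one half."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(neg) * len(pos))


def bootstrap_auc(neg, pos, n_boot, rng):
    """Az for n_boot resamples, drawing each class separately with replacement."""
    values = []
    for _ in range(n_boot):
        neg_star = rng.choices(neg, k=len(neg))
        pos_star = rng.choices(pos, k=len(pos))
        values.append(auc_mann_whitney(neg_star, pos_star))
    return values


rng = random.Random(2003)                              # arbitrary seed
neg = [rng.expovariate(1.0) for _ in range(50)]        # one observed set, mean 1
pos = [rng.expovariate(1.0 / 5.0) for _ in range(50)]  # one observed set, mean 5

point = auc_mann_whitney(neg, pos)                     # Az of the observed set
boot = sorted(bootstrap_auc(neg, pos, 200, rng))       # assumed: 200 resamples
boot_mean = statistics.mean(boot)
boot_sd = statistics.stdev(boot)
ci_lo, ci_hi = boot[4], boot[194]                      # 5th and 195th smallest

print(f"observed Az = {point:.4f}")
print(f"bootstrap mean = {boot_mean:.4f}, sd = {boot_sd:.4f}, "
      f"95% interval = [{ci_lo:.4f}, {ci_hi:.4f}]")
```

Note that the bootstrap distribution is centred on the Az of the one observed set, not on the true Az, which is why a bootstrap mean can sit some distance from the Monte Carlo mean.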

The bootstrap mean differs from the Monte Carlo mean, but it lies within the Monte Carlo 95% interval for a single set of observations, [0.7348, 0.9096]. Furthermore, the true Az of 0.833 lies within the bootstrap confidence interval.

Methods

The Monte Carlo estimates were computed with the function mcROC. The bootstrap estimates were computed with the function bootROC. Both call the functions empir_roc2 to obtain the ROC curve, atrap to compute the area, and rande to obtain exponentially distributed random variables. All of these are in the file boot_trial.zip.
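For readers without access to boot_trial.zip, the three building blocks can be sketched in Python as below. These are stand-ins written for this note and do not reproduce empir_roc2, atrap, or rande; the function names, the inverse-transform generator, and the threshold convention (a score at or above the threshold is called positive) are all assumptions.

```python
import math
import random


def exponential_sample(mean, n, rng):
    """Inverse-transform sampling: -mean * ln(1 - U) is exponential with this mean."""
    return [-mean * math.log(1.0 - rng.random()) for _ in range(n)]


def empirical_roc(neg, pos):
    """Empirical ROC as (false-positive fraction, true-positive fraction) points.

    The threshold sweeps from the largest observed score to the smallest.
    """
    thresholds = sorted(set(neg) | set(pos), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        fpf = sum(1 for v in neg if v >= t) / len(neg)
        tpf = sum(1 for v in pos if v >= t) / len(pos)
        points.append((fpf, tpf))
    return points


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as a list of (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


rng = random.Random(1)                      # arbitrary seed
neg = exponential_sample(1.0, 50, rng)      # 'negative' scores, mean 1
pos = exponential_sample(5.0, 50, rng)      # 'positive' scores, mean 5
az = trapezoid_area(empirical_roc(neg, pos))
print(f"empirical Az for one set of observations = {az:.4f}")
```

On a small hand-checkable case, negatives [1, 3] and positives [2, 4], three of the four (negative, positive) pairs are ordered correctly, and the trapezoidal area of the empirical ROC is likewise 0.75.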