Random numbers and stochastic simulations in Excel
- Functions in bold type require that PopTools be installed. When a PopTools function duplicates one of Excel’s built-in functions, the PopTools function is often easier to read and remember and is also based on algorithms that are more numerically robust.
- To generate a new set of random numbers, hit the ‘F9’ key. This replaces all the random numbers in the worksheet and updates any other calculations that depend on them.
Functions for generating random numbers
These are the distributions that you will most commonly be using in this class. There are a variety of other distributions available, both in Excel and PopTools, which use similar syntax. In the “Paste Function” dialog box, look under either “Statistical” or “PopTools random variables” to see what’s available.
Continuous distributions
RAND()Returns a pseudo-random number uniformly distributed between 0 and 1. Either implicitly or explicitly, this is the basis for all the other built-in random number functions.
dRandReal(l, u)Returns a pseudo-random number that is uniformly distributed between l and u. This is the basis (although you never see it) for all the other PopTools random number functions.
NORMINV(RAND(), m, s)Returns a pseudo-random number that is normally distributed with mean m and standard deviation s.
dNormalDev(m, s)Returns a pseudo-random number that is normally distributed with mean m and standard deviation s.
LNORMINV(RAND(), m, s)Returns a pseudo-random number that is log-normally distributed with mean m and standard deviation s.
dLogNormalDev(m, s)Returns a pseudo-random number that is log-normally distributed with mean m and standard deviation s.
BETINV(RAND(), m, s, l, u)Returns a pseudo-random number that is beta distributed with mean m and standard deviation s, and bounded between l and u (the beta distribution has a central tendency, like the normal, but only a finite range of allowable values; it is often used with l = 0 and u = 1).
dBetaDev(m, s)Returns a pseudo-random number that is beta distributed with mean m and standard deviation s, and bounded between zero and one.
Discrete distributions
BINOMINV(RAND(), N, p)Returns a pseudo-random number that is binomially distributed with sample size N and success probability p.
dBinomialDev(N, p)Returns a pseudo-random number that is binomially distributed with sample size N and success probability p.
POISINV(RAND(), m)Returns a pseudo-random number that is Poisson distributed with mean m.
dPoissonDev(m)Returns a pseudo-random number that is Poisson distributed with mean m.
Using PopTools to run a stochastic simulation
Building an Excel spreadsheet to do replicate simulations using built-in tools is tedious process, especially when you consider that instead of 100 replicates, we usually want 1000 or even 10,000! Fortunately, PopTools provides an easier way.
The model
We will use the Ricker model with the parameters given in Table 4.2 of the textbook, applicable to the checkerspot butterfly population. The following figure shows the initial setup for the model (note that I have named the cells that have the parameter values in them, so that the formulas are clearer; the parameter r0 is what is called r in the text – Excel does not allow us to use ‘r’ as a cell name). Cell B12 is the formula for the model. Since we are interested in how long it takes the population to cross a quasi-extinction threshold, we need to create two more columns: one giving the minimum value of N so far, and one just replicating the QE threshold.
Using Monte Carlo analysis in PopTools
Now go to PopTools > Simulation tools > Monte Carlo analysis. Enter the range of the minimum population size into the “Dependent range” and the column of QE values into “test range”. Select “<=” and “Keep results”, and enter the desired number of replicates. Don’t worry about percentiles.
Hit the “Go” button, and the program first does its calculations, and then asks for an output cell. This will be the upper left cell of the output, which has 7 columns and as many rows as your simulation.
CONGRATULATIONS!! You’ve performed your first Monte Carlo analysis.
You now have a bunch of new columns in your spreadsheet. The only one that’s really useful at this point is the one labeled “<=Test1.” In this column are the number of replicates whose minimum population size was less than or equal to the QE threshold at each time step. It’s quite easy to use this to plot a CDF of extinction risk – simply divide by the number of replicates to get the cumulative extinction probability.
You also now have a new worksheet, titled “Monte Carlo results 1”. This contains the simulation results, showing the dependent variable (in this case, the minimum population size) for each replicate simulation. Each replicate has a row, with time running from left to right. You could use these data to construct CDFs for other QE thresholds.
The results are NOT dynamic – if you update the parameters they are not updated. Thus if you want to look at different parameter values or initial population sizes, you will need to re-run the analysis. The new simulation data will be stored in a new sheet, with the helpful name “Monte Carlo results 2”. Thus you should get in the habit of either renaming the sheets with informative names or keeping a careful record of what the parameters were for each analysis.
If you want the simulation output to contain the actual population sizes, rather than the minimum size to date, you will have to re-run the analysis with the column of population sizes set as your dependent range. The summary results now tell you about the mean and variance of population size at each time. You can also use the simulation results to plot time series or histograms of population sizes. For example, select the last column from the simulation results and choose PopTools > Simulation tools > Summary stats to get a histogram.
However, unless your QE threshold is set to zero, using the test column will not be useful, as a population could increase back over the threshold. The way to fix this is to modify your simulation model to force the population to stay extinct if it crosses the QE threshold, using the IF function. For example, if you replaced cell B12 with
=IF(B11D11,B11*EXP(r0*(1-B11/K)+dNormalDev(0,s)), 0)
and carried this down, then the population would drop to zero the turn after it goes below the QE threshold. Then you can use the test column to create the CDF as you did before.
Bootstrapping the data to get confidence intervals
This is a somewhat tedious task in excel. The first step is to generate a set of bootstapped regression coefficients. We are going to do this by setting up a worksheet that uses dynamic resampling and regression tools (both from PopTools), and then use the Monte Carlo tool to replicate the output of this worksheet (the parameter estimates) many times.
Launch the Simple random sample tool from the PopTools→Sampling menu:
The “Sample size” should be the same as the number of rows of data. Be sure to check “Sample with replacement” and uncheck “static result.”
Notice that the resulting output is an “array formula.” You can confirm that it’s dynamic by hitting F9 to update.
Now we want to do a regression of column G on column F, using PopTools→Extra Stats→Regression:
This also creates an array formula, and you can see that when the data in columns F and G are updated, so is the regression results.
The quantities we are interested in are b0 (which is r0 in the model), b1 (which is -r0/K in the model) and Residual St. dev (which is s in the model). We will use the Monte Carlo tool to generate lots of replicate values of these:
The summary output it creates isn’t useful; instead we are interested in the new sheet it creates (“Monte Carlo results x”). Let’s rename that sheet so that we remember what it is:
We can now generate a simulation for each set of parameters – notice that time is now running from left to right rather than top to bottom:
Fill this out to year 10 on the right and all the way down for the 1000 replicates:
Now let’s use the Monte Carlo tool to determine the probability of quasi-extinction by year 10. First create a new column that is the minimum population size over the ten years (so we don’t have populations “recovering” from quasi-extinction. Then use this column as the “Dependent range” in the Monte Carlo tool.
It turns out we are limited by the number of columns, which is 256. This is the limit on the number of replicate parameter values we can use. So, we run using just the first 250 rows:
This runs, and places values in the Monte Carlo results worksheet, but doesn’t generate the summary statistics. This is because, for a few bootstrap replicates, the estimate of b1 is positive (which corresponds to a negative value of K) – this positive density dependence leads to faster than exponential growth, and even within 10 years, the population often diverges to Excel’s version of infinity, generating “#NUM!” as the result… this causes the summary functions to fail. Since this is biologically unrealistic, it is best to go through and delete those rows with positive values of b1.
This run works. Now lets copy the column “<=Test1” to a new worksheet, and sort it in ascending order. Sort that column in ascending order, and convert it to an extinction risk. Also calculate the confidence level associated with each value. These can be used to produce the confidence plot.
What if we want to look at the whole cdf? Well, to do that we need to go back to the stage where we were bootstrapping the regression parameter values. Create a simulation that runs left to right, using the if statement to set it to zero if it ever drops below the QE threshold. Then drag the row down to create 1000 replicate simulations. Now create the CDF using the COUNTIF function (as far as I know, you cannot use a variable name with this, you have to code in “50” directly). Use the TRANSPOSE array function to convert this into a column. This column (normalized to be a probability) will be the dependent variable for a monte carlo analysis.
This is quite slow, as you are doing 1000*1000 100-year simulations!
From the summary output you can see the mean CDF and the 95% confidence intervals. From the new worksheet with the Monte Carlo results you can get all the confidence intervals. The tedious part is that you have to sort each column separately.