Day 4 – Maxent
Exercises
In the exercises below, you’ll explore the variety of outputs that can emerge from different assumptions about Maxent. At first glance, this may appear to discourage the use of Maxent, because predictions can differ greatly depending on assumptions. But it’s important to recognize that all models have assumptions, and that these are easily overlooked. These exercises are meant to get you thinking about your assumptions and evaluating them.
Perform these analyses and answer the following questions after you’ve gone through pages 1-25 of the Maxent tutorial. If you’re uncertain of the function of any of the settings, check the help file accessible on the bottom right corner of the GUI. The table at the end of the help file contains brief descriptions of each option.
For the following exercises you’ll be using data for Japanese barberry (Berberis thunbergii) from the IPANE database. The presence file and environmental grids can be downloaded from the course website.
http://web2.uconn.edu/cyberinfra/module3/outline.html
Unzip the file and place it in your home directory (Maxent sometimes freaks out if path names are too long, so don’t bury it in some obscure folder). When you load the presence file, NE_pres.csv, you’ll notice that a number of species are available. Deselect all except BETH, BErberis THunbergii.
When you load the environmental grids (recall that you only tell Maxent the folder where they’re located, not each grid), you’ll see 11 environmental predictors. All grids are at 5’ resolution (roughly rectangles with sides 7-8 km). Five are climate data from Worldclim, with obvious naming. The other six are land use/land cover (LULC) attributes, defined as the proportion of each landscape type in the 5’ grid cell. Importantly, these variables do not represent the habitat where the presences are observed; they describe the composition of the landscape surrounding the observation. As such, it’s unclear (to me) how important each of these predictors might be. They’ll certainly exhibit some correlation with habitat variation (i.e. if BETH likes forest and a cell is 79% forest, you’d expect that there’s a high probability of occurrence there). But these landscape variables might have other interpretations. For example, they could serve as indicators of dispersal if BETH is bird-dispersed and these birds tend to avoid certain landscape features. In summary, I’m not sure how important these LULC classifications should be – perhaps we’ll find out below.
P.S. You may get some warnings from Maxent that some of the grids have missing data; just click ok and these points will be omitted from the analysis.
P.P.S. You’ll notice when clicking on the Maxent interface that you can select or deselect the option that is nearest to your click. For example, if you just click arbitrarily in the samples box, you’ll select the species that is on the nearest horizontal line, even if you’re nowhere near the species name. This can happen often if you’re jumping among many windows. Be careful with this to avoid screwy output.
A. Maxent’s output formats
Background
Below is a description of the three different formats of Maxent’s output: raw, cumulative and logistic. These are excerpts directly from the literature.
The primary output of Maxent is the exponential function q_λ(x) that assigns a probability (referred to as a ‘‘raw’’ value) to each site used during model training. Raw values are not intuitive to work with though: in particular, it is hard to interpret ‘‘projected’’ values obtained by applying q_λ to environmental conditions at sites not used during model training. Raw values are also scale-dependent, in the sense that using more background data results in smaller raw values, since they must sum to one over a larger number of background points. For these reasons, raw values have generally been converted into the ‘‘cumulative’’ format (Phillips et al. 2006).
The cumulative format is defined in terms of omission rates predicted by the Maxent distribution q_λ. Specifically, we consider 0-1 prediction rules that threshold raw outputs at a level p. Each raw threshold p is transformed into the omission percentage c(p) predicted by q_λ for the corresponding rule, i.e. [c(p) is 100 times the sum of q_λ(x) over all sites x whose raw value falls below p].
Therefore, if we make a 0-1 prediction from the Maxent distribution q_λ using a cumulative threshold of c, the omission rate is c% for test sites drawn from q_λ. The cumulative format is scale-independent, and is more easily interpreted when projected, but it is not necessarily proportional to probability of presence.
For example, consider a generalist species whose probability of presence is close to 1 across the whole study area, with slight variations that avoid ties. Since the probability values are similar across the entire region, the cumulative values of individual sites will be roughly proportional to their rank, and hence they will range evenly from 0 to 100. Thus, big variations in cumulative value do not necessarily represent big variations in suitability or probability of presence.
We therefore introduce a new logistic output format that gives an estimate of probability of presence. Let z denote a vector of environmental variables, and let z(x) be the value of z at a site x. Traditional statistical methods such as logistic regression estimate P(y=1|z), the conditional probability of presence given the environmental conditions, which is related to the quantity we estimate…[technical details omitted]’ through the prevalence, P(y=1). ‘Prevalence’ indicates the ubiquity of a species across a landscape and can be defined as the proportion of sites in which it occurs. ‘We may not know or be able to estimate P(y=1), since this quantity is not determinable from presence-only data (Ward et al. 2007).’ [From Phillips and Dudik, 2008, Ecography]
‘Because the required information on prevalence is not available for calculating conditional probability of occurrence, a workaround has been implemented (termed MaxEnt’s ‘‘logistic’’ output). This treats the log of the [raw] output … as a logit score, and calibrates the intercept so that the implied probability of presence at sites with ‘‘typical’’ conditions for the species … is a parameter tau. Knowledge of tau would solve the non-identifiability of prevalence, and in the absence of that knowledge MaxEnt arbitrarily sets tau to equal 0.5. This logistic transformation is monotone (order preserving) with the raw output.’ [From Elith et al., 2011, Div & Dist].
Note that the raw, cumulative and logistic formats are all monotonically related, so they rank sites in the same order and therefore result in identical performance, when measured using rank-based statistics such as AUC (Fielding and Bell 1997). However, their predictive performance will vary when measured by statistics that depend on actual output values such as Pearson’s correlation (Zheng and Agresti 2000). [From Phillips and Dudik, 2008, Ecography]
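The relationships among the three formats can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Maxent’s own code: the function names are invented for this handout, the cumulative value of a site is taken to be 100 times the summed (normalised) raw values of all sites whose raw value is no larger than its own, and the logistic transform simply follows the description quoted above (log raw output as a logit score, with the intercept set so that a site with ‘typical’ conditions, i.e. log raw value equal to minus the entropy, gets probability tau). Both toy species are made up.

```python
import math

def normalise(raw):
    """Rescale raw values so they sum to one over the sites supplied."""
    total = sum(raw)
    return [r / total for r in raw]

def raw_to_cumulative(raw):
    """100 * sum of normalised raw values of all sites with raw value <= this site's.
    Assumes no ties among raw values."""
    q = normalise(raw)
    order = sorted(range(len(q)), key=lambda i: q[i])
    cumulative = [0.0] * len(q)
    running = 0.0
    for i in order:
        running += q[i]
        cumulative[i] = 100.0 * running
    return cumulative

def raw_to_logistic(raw, tau=0.5):
    """Treat log(raw) as a logit score, shifted so a 'typical' site maps to tau.
    Assumes strictly positive raw values."""
    q = normalise(raw)
    entropy = -sum(p * math.log(p) for p in q)
    offset = entropy + math.log(tau / (1.0 - tau))
    return [1.0 / (1.0 + math.exp(-(math.log(p) + offset))) for p in q]

def ranks(values):
    """Site indices ordered from lowest to highest value."""
    return sorted(range(len(values)), key=lambda i: values[i])

# Hypothetical generalist: nearly identical raw values everywhere, no ties.
generalist = [1.0 + 0.001 * i for i in range(10)]
# Hypothetical specialist: raw values spanning several orders of magnitude.
specialist = [0.001, 0.002, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.3, 0.312]

for name, raw in (("generalist", generalist), ("specialist", specialist)):
    cum = raw_to_cumulative(raw)
    logi = raw_to_logistic(raw)
    print(name,
          "cumulative spread:", round(max(cum) - min(cum), 1),
          "logistic spread:", round(max(logi) - min(logi), 3))
```

For the generalist, the cumulative values spread almost evenly toward 100 even though the raw values barely differ, while the logistic values all sit near tau. All three formats still rank the sites identically, which is why rank-based statistics such as AUC cannot distinguish them.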
Analysis
Run three models for BETH using all 11 predictors with only linear features selected. Deselect jackknifing and response curves, as these will just slow things down. For each model, select one of the three output formats (on the bottom right of Maxent’s GUI, under ‘Output format’). Look at the predicted range maps based on each of these output formats. Note that each time you run Maxent, it will overwrite files for the same species. This is a little annoying, but your alternatives are to run Maxent from the command line and process the output in R or make a whole bunch of folders manually beforehand. You may want to put your output in different folders for each model for today’s purposes. To make it easier to access your results, I’ve included the output you should get below the questions. However, I haven’t labeled which is which – match your output to mine and label these below. Explore these maps and briefly answer the following questions.
1. How can you interpret the values in the legend of the raw output? If you had to turn this map into binary output (a presence/absence surface), what threshold value looks appropriate (just by eyeballing it)?
2. The cumulative plot seems to identify more suitable habitat (red) than the raw plot. Why is this? What does imposing a ranking on the predictions, as cumulative output does, do to the differences among cells?
3. Which of the 3 outputs seems to make the most conservative predictions about suitable habitat? Why?
4. The cumulative and raw plots are colored on a log scale by default. Try rerunning the model for cumulative and raw output, but with the colors for the range map on a linear scale. You can change this by choosing ‘Settings’, clicking on the ‘Experimental’ tab, and deselecting the box entitled ‘Logscale raw/cumulative pictures’. Does this change your opinion about which output format suggests the most suitable habitat? Does your answer depend on which colors you associate with suitable habitat?
5. List 1 advantage and 1 disadvantage of each type of output. These may either be very general or specific to the species maps below.
Raw:
Cumulative:
Logistic:
In summary, it is important to note the inherent differences in the way model output is represented (in Maxent or any model). Each type of output has advantages and disadvantages, which the user must weigh based on the questions he or she is asking.
A. Model___________________________ B. Model____________________________
C. Model___________________________ D. Model____________________________
E. Model____________________________
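If you would rather not create output folders by hand, the command-line route mentioned in the Analysis above can be scripted. The sketch below only assembles one command per output format and, if you call run_model, creates a separate folder for each before launching Maxent. The flag names are the ones I recall from the table at the end of the Maxent help file, so check them against your version. The locations of maxent.jar, NE_pres.csv and the layers folder are assumptions about where you unzipped things. Note that this sketch does not restrict the run to BETH; one option is to point it at a presence file containing only the BETH rows.

```python
from pathlib import Path
import subprocess

def build_maxent_command(output_format, out_dir,
                         jar="maxent.jar",        # assumed location of the Maxent jar
                         samples="NE_pres.csv",   # presence file from the course website
                         layers="layers"):        # assumed folder holding the 11 grids
    """Return the command (as a list of arguments) for one linear-features run."""
    return [
        "java", "-mx512m", "-jar", jar,
        f"samplesfile={samples}",
        f"environmentallayers={layers}",
        f"outputdirectory={out_dir}",
        f"outputformat={output_format}",
        # linear features only
        "autofeature=false", "linear=true", "quadratic=false",
        "product=false", "threshold=false", "hinge=false",
        # skip the slow extras
        "jackknife=false", "responsecurves=false",
        "autorun",
    ]

def run_model(output_format, base_dir="maxent_output"):
    """Create a folder for this output format and run Maxent into it."""
    out_dir = Path(base_dir) / output_format
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_maxent_command(output_format, out_dir), check=True)

# Build (but do not run) the three commands so they can be inspected first.
commands = {
    fmt: build_maxent_command(fmt, Path("maxent_output") / fmt)
    for fmt in ("raw", "cumulative", "logistic")
}
for fmt, cmd in commands.items():
    print(fmt, "->", " ".join(cmd))
```

Calling run_model("raw"), run_model("cumulative") and run_model("logistic") in turn would then leave each set of results in its own folder instead of overwriting the previous one.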
B. Number of background points
Background
Background points represent the variation in environmental covariates available to the species. Background points are a random sample from the landscape (or sometimes the whole landscape). In contrast to the usage of the term ‘background’ in the presence/absence modeling literature, Maxent does not assume that background points are pseudo-absences. Background points are used to fit the model and estimate the coefficients associated with each covariate. As you might imagine, using different suites of background points can influence predictions. We’re still trying to figure out what the right way to select background points is, and Elith et al (2011, Div & Dist) have demonstrated the importance of this choice. Normally, this argument relates to how large the area is from which we sample the background points. Should a model for our IPANE data select background points from New England, the United States, or the whole world? We’ll explore some consequences of different background samples by choosing samples of different sizes. Below are some excerpts from the Elith paper.
‘The distribution of covariates in the landscape is conveyed by a finite sample – a collection of points from [the landscape] with associated covariates, typically called a background sample. These data may be supplied in the form of grids of covariates covering a pixelation of the landscape; as a default MaxEnt randomly samples 10,000 background locations from covariate grids, but the background data points can also be specified (see Yates et al., 2010 and case studies below) and grids are not essential (case study 2). Note that the background sample does not take any account of the presence locations – it is simply a sample of [the landscape], and could by chance include presence locations. Using a random background sample implies a belief that the sample of presence records is also a random sample from [the landscape]. We deal later with the case of biased samples.’
‘The advantage of limiting background to local, reachable areas … is that contrasts between occupied and unoccupied environments in the local area are the model focus, and – particularly with fine-scale environmental data – differentiation useful at the management scale might be achievable. It is also likely to be the most ecologically realistic choice for many locally restricted species. On the other hand, if models are to be projected well outside the local geographic area, use of local backgrounds brings with it the penalty that prediction to other areas is likely to involve considerable extrapolation. Some trade-off is clearly required.’
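To see why the size of the background sample matters, it can help to look at how much of a covariate’s range a small sample actually covers. The sketch below is a toy illustration, not Maxent’s sampling code: the ‘landscape’ is a made-up list of mean annual temperatures, and the three samples are nested (the first 10, 200 and 10,000 cells of one shuffled list), a simplifying choice so that the comparison is not muddied by sampling luck. Maxent itself draws its background sample afresh on each run.

```python
import random

def covariate_range(values):
    """Smallest and largest covariate value seen in a sample."""
    return min(values), max(values)

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical landscape: mean annual temperature (degrees C) in 50,000 grid cells.
landscape = [rng.gauss(8.0, 3.0) for _ in range(50_000)]

# Nested background samples: each larger sample contains the smaller ones.
shuffled = landscape[:]
rng.shuffle(shuffled)
ranges = {n: covariate_range(shuffled[:n]) for n in (10, 200, 10_000)}

full_lo, full_hi = covariate_range(landscape)
for n, (lo, hi) in ranges.items():
    # Cells outside [lo, hi] would be no-analogue conditions for a model
    # trained on this background sample.
    outside = sum(1 for v in landscape if v < lo or v > hi)
    print(f"{n:>6} background points: range {lo:.1f} to {hi:.1f}, "
          f"{outside} landscape cells fall outside it")
print(f"whole landscape: range {full_lo:.1f} to {full_hi:.1f}")
```

With only 10 points, a sizeable share of the landscape lies outside the conditions the model saw during training, so projecting onto those cells means extrapolating.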
Analysis
Run three models for BETH with only linear features. Select cumulative as your output type (this is arbitrary). Deselect jackknifing and response curves, as these will just slow things down. Under ‘Settings’, ‘Basic’, set ‘Max number of background points’ to 10, 200 and 10,000 (the default), one value per model. Determine which of the figures below corresponds to each model.
A. # Background points__________________ B. # Background points____________________
C. # Background points_________________ D. # Background points_____________________
E. # Background points_____________________
1. Which model (# of background points) predicts the largest range of BETH?
2. The number of background points determines the range of environmental conditions that Maxent perceives as available for the species. If unrepresentative background points are used to train (fit) the model, and the model is projected across a landscape, it is likely that sites will be encountered during projection that are outside the range of environmental covariates encountered during model training. I’ll call these no-analogue sites. Maxent allows you to choose whether or not to project onto no-analogue sites: select ‘Settings’, then the ‘Advanced’ tab, then ‘Extrapolate’. Run the model with only 10 background points and DEselect ‘Extrapolate’. Indicate which figure this is above. What type of habitat requires extrapolation in this data set? Why might one avoid extrapolation?
3. If you choose to extrapolate (Maxent’s default), you can also choose how to project onto these no-analogue sites. The default option is to use ‘clamping’, which sets the values of environmental variables in no-analogue sites to the nearest value used during training. For example, if your no-analogue site has higher mean annual precipitation (MAP) than any sites used during training, then the MAP of this site will be clamped to the max training MAP during projection. All the examples you’ve run so far have employed clamping. Turn off clamping (under ‘Settings’, ‘Advanced’ tab) and rerun the model with just 10 background points. Indicate the appropriate figure above. How does it differ from the clamped, extrapolated predictions (fig. A above)? When might clamping vs. not clamping have a stronger effect on predictions?
4. Focus on the unsuitable regions in blue. Now that you know that clamping was used for the first three models run in this section (before you started fooling with clamping and extrapolating), can you explain why the model with 10 background points predicted less unsuitable habitat than the one with 200, and why 200 found less unsuitable habitat than 10,000? Can you see why it’s important to know what you’re assuming?
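The clamping rule described in question 3 is simple enough to write down. The sketch below is a toy version under stated assumptions, not Maxent’s implementation: the mean annual precipitation (MAP) values are invented, the function names are made up for this handout, and the ‘model’ is a single linear term with an arbitrary coefficient, used only to show how clamped and unclamped projections diverge at a no-analogue site.

```python
def training_range(values):
    """Smallest and largest covariate value seen during training."""
    return min(values), max(values)

def is_no_analogue(value, lo, hi):
    """True if a projection site lies outside the training range."""
    return value < lo or value > hi

def clamp(value, lo, hi):
    """Reset a value outside the training range to the nearest training extreme."""
    return max(lo, min(value, hi))

def linear_term(value, beta=0.002):
    """Toy linear-feature contribution; beta is an arbitrary illustrative coefficient."""
    return beta * value

# Hypothetical MAP values (mm) at training (background) sites and projection sites.
training_map = [850.0, 900.0, 1020.0, 1100.0, 1180.0]
projection_map = [700.0, 950.0, 1180.0, 1400.0]

lo, hi = training_range(training_map)
flags = [is_no_analogue(v, lo, hi) for v in projection_map]
clamped = [clamp(v, lo, hi) for v in projection_map]

for v, flag, c in zip(projection_map, flags, clamped):
    print(f"MAP {v:>6.0f}: no-analogue={flag}, clamped to {c:.0f}, "
          f"linear term unclamped={linear_term(v):.2f}, clamped={linear_term(c):.2f}")
```

Sites inside the training range are untouched. At the wettest site the unclamped linear term keeps growing with MAP, whereas the clamped one is held at the value for the wettest training site, so the two projections differ most where the response is steep and the site lies far outside the training range.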
C. Different feature classes
Background
The following is taken from Phillips et al., 2006, Ecological Modelling. The distribution π referred to below is the predicted distribution.