Projecting the Presence of Pecans:
Mining Environmental Data to Identify Ecological Limits of Species Distribution
Amber Johnson, Sarah Kemick, & Brian White
WHAT WE DID – A STEP BY STEP OUTLINE
Step 1: Identify the weather stations that lie within the boundaries of the distribution of interest and record them in a data file.
- This can be done “by hand” (as we did Thursday before the MyWorld GIS presentation).
- Inspect a map overlay labeled with station numbers to identify the stations that fall within the distribution.
- This took two people a couple of hours to turn into a data file.
OR
- This can be done using MyWorld GIS software.
- Construct a project with a shape file for the distribution of interest and the locations of data points with environmental data.
- Analyze the relationship of the two layers to return values from the distribution layer to a file. [File can be exported and used in Excel.]
- We linked our pecan data to our weather station locations in two quick steps using MyWorld GIS. [Less than 30 minutes from installing the software to the result table!]
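Either way, Step 1 boils down to a point-in-polygon test: is each station inside the distribution boundary or not? Here is a minimal sketch of that idea using ray casting. We do not know which method MyWorld GIS uses internally, and the boundary and station values below are made up for illustration (they are not the pecan range or real stations).

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical rectangular "distribution" as (longitude, latitude) vertices
boundary = [(-100.0, 28.0), (-88.0, 28.0), (-88.0, 38.0), (-100.0, 38.0)]

# Hypothetical stations
stations = [
    {"site": 1, "lon": -95.0, "lat": 32.0},
    {"site": 2, "lon": -110.0, "lat": 35.0},
]

# Record presence (1) or absence (0) for each station
for s in stations:
    s["pecan"] = 1 if point_in_polygon(s["lon"], s["lat"], boundary) else 0
```

A real distribution shape file will have many more vertices (and possibly several separate polygons), but the test for each station is the same.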
Step 2: Convert the file to .arff format (for a description, see the links on the project site) for use with the Weka data mining software.
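For those who would rather script the conversion, here is a minimal sketch of turning a CSV file into ARFF text. It assumes every column is numeric except the class column, which becomes a nominal attribute. The function name, the relation name, and the sample values are ours, invented for illustration.

```python
import csv
import io


def csv_to_arff(csv_text, relation, class_column):
    """Convert CSV text (header row plus numeric columns) to ARFF text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    class_idx = header.index(class_column)
    # The class labels become a nominal attribute, e.g. {0,1}
    label_set = ",".join(sorted({row[class_idx] for row in data}))

    lines = ["@relation " + relation, ""]
    for name in header:
        if name == class_column:
            lines.append("@attribute " + name + " {" + label_set + "}")
        else:
            lines.append("@attribute " + name + " numeric")
    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines) + "\n"


# Made-up sample rows, not values from our data file
sample_csv = "MWM,ELEV,Pecan\n27.1,250,1\n18.4,4200,0\n"
arff_text = csv_to_arff(sample_csv, "pecans", "Pecan")
print(arff_text)
```

Declaring the class as nominal matters: classifiers like JRip expect a categorical target rather than a number.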
Step 3: Select a model type and run the Weka data mining software.
- Try different algorithms to find best fit
- Consider which algorithms are most informative
- Save “classifier” [model] once it is trained on the input data
- Apply the classifier to the data [Weka will cross-validate, leaving out 10% of the file at a time] and save the predictions to a file.
- Output file will be a comma separated value file.
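To make "leaving out 10% of the file at a time" concrete, here is a rough sketch of splitting a file into ten stratified folds. This is our own illustration, not Weka's actual procedure. The class counts (4149 absences, 488 presences) come from the confusion matrix in the example later in this handout.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=1):
    """Assign each instance index to one of k folds, balancing classes."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices out round-robin so each fold gets a similar class mix
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


labels = [0] * 4149 + [1] * 488
folds = stratified_folds(labels)

for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    # A model would be trained on train_idx and scored on test_idx here
```

Every station ends up in the held-out set exactly once, so every prediction comes from a model that never saw that station.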
Step 4: Attach map latitude and map longitude to model predictions in the comma separated value (CSV) file.
- Have a small file handy with site names, latitudes and longitudes for mapping which can be easily pasted into the output file.
- Remember that south latitudes and west longitudes must be negative values to map correctly in a coordinate system. I usually call these variables MapLat and MapLong.
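If pasting by hand gets tedious, the same join can be scripted. This is a minimal sketch that assumes the predictions carry a site identifier and that the small coordinate file stores unsigned degrees with a hemisphere letter. The site numbers, coordinates, and the "Site" and "predicted" column names are hypothetical.

```python
def to_signed(value, hemisphere):
    """Return signed decimal degrees: S and W become negative."""
    value = abs(float(value))
    return -value if hemisphere.upper() in ("S", "W") else value


# Hypothetical lookup table: site -> (latitude, N/S, longitude, E/W)
site_coords = {
    "101": (32.5, "N", 96.8, "W"),
    "202": (33.9, "S", 18.4, "E"),
}

# Hypothetical prediction rows as they might be read from the output CSV
predictions = [
    {"Site": "101", "predicted": "1"},
    {"Site": "202", "predicted": "0"},
]

# Attach mapping coordinates to each prediction
for row in predictions:
    lat, ns, lon, ew = site_coords[row["Site"]]
    row["MapLat"] = to_signed(lat, ns)
    row["MapLong"] = to_signed(lon, ew)
```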
Step 5: Open MyWorld GIS
- In the Construct Mode:
a) Select layers from among the included maps.
- Maps exist at many scales: world, country, and state level maps are easily accessible, and you can find many others for free on the web. MyWorld GIS comes with many maps and datasets.
b) In the FILE menu, IMPORT LAYER FROM FILE to add data from a CSV file.
c) Drag layers up in the construct list to place them on top of other layers.
d) Pick the column of data by which to mark points.
e) Adjust the color scheme and number of colors showing as necessary.
- In the Visualize Mode:
- You can select a tool that displays information from a table for any point on which you click – this data can be accumulated in a single file and exported.
- In the Analyze Mode:
- You can query the data to collect a list of stations, states, etc. with some property.
- Plus much, much more we haven’t had time to explore yet!
WHAT WE HAVE TO SHOW FOR IT
- Several models that use environmental variables to project the presence or absence of pecans.
- Even with our limited knowledge of plant physiology and community ecology, these models are fun to think about in terms of what they tell us about the niche of the pecan tree. [Example to follow.]
- They could be used to compare model strategies or to investigate ecological settings.
- Maps with flexible options for what information is showing at any given time.
- A few nascent ideas about how to develop the use of these data and tools into curricular materials. [Please help us imagine further uses by providing feedback!]
- Greater knowledge and confidence using Data Mining and GIS tools.
AN EXAMPLE OF A MODEL
The Weka software can fit data distributions using many different types of algorithms, from nearest neighbor to logistic models to decision trees. For this example, we focus on a rule-based classification model (JRip to those who know/care).
The output looks like this:
JRIP rules:
======
(MWM >= 26.5) and (BAR5 >= 14.6915) and (PTOAE >= 1.1925) and (ELEV <= 300) => Pecan=1 (82.0/4.0)
(AE >= 652.4943) and (PTOAE >= 1.1295) and (WATDGRC <= 3) and (WRET >= 104.8334) and (ELEV <= 625) => Pecan=1 (72.0/4.0)
(MWM >= 24.6) and (CVRAIN <= 44.3185) and (WSTORAGE >= 181.796) and (ELEV <= 1030) => Pecan=1 (165.0/50.0)
(MWM >= 24.3) and (TRANGE >= 24.7) and (RLOW >= 25.91) and (PTOWATR >= 10.8738) and (Site <= 1517) => Pecan=1 (51.0/4.0)
(AE >= 622.0895) and (COKLM >= 506.9) and (EXPREY <= 520.5728) and (PTOWATR >= 8.7045) => Pecan=1 (59.0/13.0)
(MWM >= 24.8) and (TRANGE >= 24.7) and (RLOW >= 25.91) and (RLOW <= 46.74) => Pecan=1 (52.0/24.0)
(MWM >= 27.22) and (RLOW >= 71.88) and (EXPREY <= 439.1472) and (WRET >= 102.7854) and (TEMP <= 56.0959) => Pecan=1 (15.0/1.0)
(MWM >= 27.44) and (CVRAIN <= 34.6388) and (WSTORAGE <= 161.2) => Pecan=1 (77.0/37.0)
=> Pecan=0 (4064.0/52.0)
Number of Rules : 9
Time taken to build model: 69.58 seconds
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances 4394 94.7595 %
Incorrectly Classified Instances 243 5.2405 %
Kappa statistic 0.7153
Mean absolute error 0.0744
Root mean squared error 0.2155
Relative absolute error 39.4775 %
Root relative squared error 70.2129 %
Total Number of Instances 4637
=== Detailed Accuracy By Class ===
TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.974     0.275     0.968       0.974    0.971       0
0.725     0.026     0.765       0.725    0.744       1
=== Confusion Matrix ===
   a     b   <-- classified as
4040   109 | a = 0
 134   354 | b = 1
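The summary statistics can be recomputed from the confusion matrix alone, which is a handy way to see what each number means. Here is a minimal sketch; the variable names are ours, and the kappa formula is the standard Cohen's kappa (we assume that is what Weka reports, since the result matches).

```python
# Confusion matrix from the Weka output above (rows = actual, columns = predicted)
tn, fp = 4040, 109   # actual class 0 (no pecans)
fn, tp = 134, 354    # actual class 1 (pecans)

total = tn + fp + fn + tp
accuracy = (tn + tp) / total          # correctly classified instances
precision_1 = tp / (tp + fp)          # of predicted pecans, how many are real
recall_1 = tp / (tp + fn)             # of real pecans, how many were found

# Cohen's kappa: agreement beyond what chance alone would give
expected = ((tn + fp) * (tn + fn) + (fn + tp) * (fp + tp)) / total**2
kappa = (accuracy - expected) / (1 - expected)

print(f"accuracy={accuracy:.4f} precision={precision_1:.3f} "
      f"recall={recall_1:.3f} kappa={kappa:.4f}")
```

Because stations without pecans far outnumber stations with them, accuracy alone looks flattering. The kappa statistic and the class 1 row are the more honest measures of how well the model finds pecans.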
WHAT DOES THAT MEAN?
Let’s look at one of the 9 rules in detail:
(MWM >= 26.5) and (BAR5 >= 14.6915) and (PTOAE >= 1.1925) and (ELEV <= 300) => Pecan=1 (82.0/4.0)
MWM is the Mean Temperature in the Warmest Month (C)
BAR5 is the Biomass Accumulation Ratio
- This is the amount of net above ground productivity added to standing biomass each year.
- Higher values indicate areas where we would find rapidly growing forests; low values could be slow growing forests or grasslands.
PTOAE is the ratio of Potential Evapotranspiration to Actual Evapotranspiration
- Higher values mark warmer/drier settings where precipitation is not high enough to match potential evapotranspiration (PET).
ELEV is the elevation of the weather station in feet.
So, this statement can be read as:
Where the Mean of the Warmest Month is greater than or equal to 26.5 deg C, and where Biomass Accumulation Ratio is greater than or equal to 14.69, and where the ratio of Potential to Actual Evapotranspiration is greater than or equal to 1.19, and where the Elevation is less than or equal to 300 feet, expect to find pecans.
In other words, pecans are found in warm, low-elevation locations on the dry side, where a moderate amount of the productivity accumulates as standing biomass (think tree trunks, branches, etc.).
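The same rule can be written as a short test on a station's values, which makes it easy to try on new data. This is a minimal sketch: the thresholds are copied from the rule above, while the function name and the two example stations are made up.

```python
def rule_1(station):
    """First JRip rule: True where the rule predicts pecans."""
    return (
        station["MWM"] >= 26.5          # warm warmest month (deg C)
        and station["BAR5"] >= 14.6915  # biomass accumulation ratio
        and station["PTOAE"] >= 1.1925  # on the dry side
        and station["ELEV"] <= 300      # low elevation (feet)
    )


# Hypothetical stations for illustration
warm_lowland = {"MWM": 27.8, "BAR5": 16.2, "PTOAE": 1.35, "ELEV": 120}
cool_highland = {"MWM": 19.0, "BAR5": 22.0, "PTOAE": 0.95, "ELEV": 2400}

print(rule_1(warm_lowland), rule_1(cool_highland))
```

Keep in mind that a station failing this one rule is not automatically Pecan=0: the other seven rules each get a chance to match, and only stations that match none of them fall through to the default.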