ESS 221 Tutorial 3

Getting GBIF Data, Preparing Data for DIVA, and Import into DIVA

Tutorial Purpose: (1) Gain familiarity with GBIF, create a user account, and download species occurrence data for the appropriate species; (2) modify the GBIF download to prepare it for import into DIVA; and (3) import it into DIVA to produce a final shapefile.

By the end of this tutorial you should have in your ‘Species Data’ folder of the ‘Video Project’ directory (established in Part 1):

1)A saved original copy of your GBIF .csv file

2)A saved original copy of your GBIF data in .xlsx or .xls file format

  1. This spreadsheet should have all records and all columns of the original data

3)A saved modified copy of your GBIF data in .xlsx or .xls file format

  1. This final version should include a limited number of columns, have at most 1000 records, and have properly labeled column headers in the first row of the spreadsheet.

4)A saved modified copy of your GBIF data in .txt tab delimited file format

  1. This final version should include a limited number of columns, have at most 1000 records, and have properly labeled column headers in the first row of the spreadsheet.

Part 1: Downloading species data from GBIF

Each student should (1) have a partner and (2) have selected and reserved a plant species. Partners were assigned on 1 March; plant species were due 8 March.

In order to download data from GBIF, you must have a user account. User accounts are free and only require a valid email.

Steps for creating a GBIF user account

1)Using your favorite browser, go to the GBIF (Global Biodiversity Information Facility) website

2)Select ‘Create a user account’ at the top right.

3)Fill out the prompts with appropriate information. The email address listed must be valid.

4)A confirmation email will be sent to you. Open the email and follow the link to activate your account.

Steps for downloading data

1)Launch

2)Navigate to and hover over “Data” in the top right

3)From the drop-down menu, click on “Explore species”. It may seem like you should use “Explore occurrences”, but it is necessary to check that you have the right species, subspecies or variety before investigating occurrences.

4)Enter the species you have chosen in the search box (e.g. Arachis hypogaea L., the peanut or groundnut) and press “Search”.

  1. The site will return a list of matching species names. It is important to check that you have chosen the correct species, subspecies or variety for your assignment: in crop plants, a single species may have been bred into numerous very different cultivars, and the wild type of a species may now have no significance as a food crop.

5)When you are sure you have the correct species/subspecies, select that species (click on the blue species name in the list):

  1. For example: Ipomoea batatas

6)Check the number of occurrence records under “Georeferenced data”.

  1. Georeferenced biodiversity data are data in which each record of a species (usually a specimen in a collection or an observation, though there are other kinds as well) has a latitude and longitude that places it on the surface of the earth as a point. The location may have been recorded directly by the collector or calculated from the description the collector provided. You preferably want over five hundred georeferenced records. Many common species have only a few records; should this be the case, choose another species for your assignment.
  2. If the number of georeferenced records is sufficient click on the “All [number of records]” link under “VIEW RECORDS”.

7)This will take you to the “Search occurrences” page. Select the large “Download” button at the top right. Be sure to download as “Simple CSV”; this will output a tab-delimited table file compatible with Microsoft Excel.

  1. You must be logged in to download GBIF data.

8)You will receive an email with a download link.

9)Select the link in the email.

  1. Your occurrence data will download as a zipped file containing the georeferenced specimen or observational records GBIF has for your crop species.

Unzipping your data

1)Create a project folder on your computer or your memory stick designated for Video Project files only

  1. This directory should be organized – consider creating the Video Project folder in the same location as the Tutorial 2 data files for Climate Data and GIS Data.
  2. The remainder of this tutorial will assume this file structure:
  1. This organizational structure is not required; HOWEVER, it is critical to stay organized in your data and files to complete this project.

2)Find your downloaded zip file acquired from GBIF

3)Right click on the zip file.

4)Select “Extract files…”

5)An ‘Extraction path and options’ window will open

6)Extract the file to the ‘Species Data’ folder in your ‘Video Project’ directory.
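
For those who prefer to script this step, the extraction can also be sketched in Python with the standard zipfile module. This is only a minimal sketch and is not required for the tutorial; the function name extract_gbif_zip and the example paths in the comments are hypothetical, and the folder names assume the ‘Video Project’ and ‘Species Data’ structure used in this tutorial.

```python
import zipfile
from pathlib import Path


def extract_gbif_zip(zip_path, species_dir):
    """Extract every file in the GBIF zip into species_dir.

    Returns the paths of the extracted files.
    """
    species_dir = Path(species_dir)
    # Create the 'Species Data' folder (and its parents) if it does not exist yet
    species_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(species_dir)
        names = archive.namelist()
    return [species_dir / name for name in names]


# Hypothetical usage; replace both paths with your own:
# extract_gbif_zip("Downloads/gbif_download.zip", "Video Project/Species Data")
```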

Part Two: Importing the species occurrence data into Excel and creating a GIS-usable data set

The GBIF data you have downloaded and unzipped is a single CSV file. This file is not ready to be imported into DIVA GIS – the data format must first be changed and insignificant data should be removed. Follow the steps below to ready the data.

Steps for importing the occurrence dataset into Excel

  1. Open a new Excel document
  2. Click on the data tab in the top menus
  3. Select the “From Text” button within the “Get External Data” submenu
  4. An ‘Import Text File’ window will appear
  5. Navigate to the CSV file you have just unzipped and click ‘Import’
  6. The “Text import wizard” will appear as a popup.
  7. Choose the “Delimited” option on the first screen and click “Next”
  8. Check “Tab” on the second screen for tab-delimited data and click “Next”
  9. Don’t do anything on the third screen and click “Finish”
  10. Accept the import data defaults by clicking “Okay”.
  11. The data should appear from cell A1 with the column headings in row 1
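
The same import can be sketched outside Excel to see what “tab delimited” means in practice. The example below is a minimal sketch using Python's standard csv module; the function name read_gbif_table and the sample records are made up for illustration, and it assumes the file is tab delimited with the column headings in the first row, as described above.

```python
import csv
import io


def read_gbif_table(text_stream):
    """Parse a tab-delimited table; return (header names, rows as dicts)."""
    reader = csv.DictReader(text_stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = list(reader)
    return reader.fieldnames, rows


# Tiny made-up sample standing in for a real download
sample = io.StringIO(
    "gbifid\tdecimallatitude\tdecimallongitude\tcountrycode\n"
    "1001\t-12.05\t-77.04\tPE\n"
    "1002\t4.61\t-74.08\tCO\n"
)
header, rows = read_gbif_table(sample)
print(header)                  # the column headings from row 1
print(rows[0]["countrycode"])  # PE
```

For a real file, the stream would come from something like open(path, newline="", encoding="utf-8"), where path points at your unzipped CSV file.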

Save a copy of the data

Save and make a copy of the Excel spreadsheet you have just generated. Having a backup of this table is critical in case you make errors or later need a copy of the original to roll back to.

1)Save the Excel spreadsheet twice; both times the file should be saved in the default .xlsx or .xls file format:

  1. First as the original
  2. Second as a workable spreadsheet.

Understanding the data

The data table you have generated contains information related to occurrences of your particular species. Each occurrence and all its associated information is contained in a single row in the table (one record). The columns of each row correspond to particular information about that occurrence. For example, the decimallatitude column indicates the latitude in decimal degrees of each occurrence.
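
To make the row and column idea concrete, here is a minimal sketch in Python that treats each record as a dictionary keyed by column name and checks whether its coordinates are usable. The function name has_valid_coordinates and the three sample records are hypothetical; the column names are the ones used in this tutorial.

```python
def has_valid_coordinates(record):
    """Return True if the record has a numeric latitude and longitude in range."""
    try:
        lat = float(record["decimallatitude"])
        lon = float(record["decimallongitude"])
    except (KeyError, ValueError, TypeError):
        # Missing column, empty cell, or non-numeric text
        return False
    return -90 <= lat <= 90 and -180 <= lon <= 180


# Made-up records: one good, one with an empty latitude, one out of range
records = [
    {"gbifid": "1001", "decimallatitude": "-12.05", "decimallongitude": "-77.04"},
    {"gbifid": "1002", "decimallatitude": "", "decimallongitude": "-74.08"},
    {"gbifid": "1003", "decimallatitude": "95.0", "decimallongitude": "10.0"},
]
usable = [r for r in records if has_valid_coordinates(r)]
print(len(usable))  # 1
```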

Delete unnecessary columns

THERE ARE A LOT of columns or fields in the download and hopefully as many rows or records as you were promised by GBIF. Along with each location (latitude and longitude) of the species record is a large amount of taxonomic and collections data. We will first have to clean this up.

You will now tidy up the data set to make it more usable for the GIS modeling we are going to do in the DIVA GIS program.

1)Ensure the following columns are maintained

  1. gbifid
  2. decimallatitude
  3. decimallongitude
  4. countrycode

2)All other columns can be deleted; these might include, but are not limited to, columns related to depth, author, identified by, taxonomic information, data set key, and occurrence id.

3)Click on a column header letter (A, B, C… etc.)

4)Right click and select ‘Delete’
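
The column clean-up above amounts to keeping four named columns and dropping everything else. A minimal Python sketch of that idea follows; the function name keep_columns and the sample record (including its extra columns) are made up for illustration.

```python
# The four columns the tutorial asks you to keep
KEEP = ["gbifid", "decimallatitude", "decimallongitude", "countrycode"]


def keep_columns(rows, keep=KEEP):
    """Return new rows containing only the columns listed in keep."""
    return [{name: row[name] for name in keep} for row in rows]


# Made-up record with two extra columns that should be dropped
rows = [
    {
        "gbifid": "1001",
        "decimallatitude": "-12.05",
        "decimallongitude": "-77.04",
        "countrycode": "PE",
        "depth": "",
        "identifiedby": "A. Collector",
    }
]
trimmed = keep_columns(rows)
print(list(trimmed[0]))  # ['gbifid', 'decimallatitude', 'decimallongitude', 'countrycode']
```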

Renaming columns

All column headers (text in row 1) must be less than 10 characters long and cannot contain spaces. Rename all column headers to informative headers that meet these requirements; for example, decimallatitude could be changed to declat.
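
The two header rules (fewer than 10 characters, no spaces) are easy to check mechanically. The sketch below is one way to do so in Python; the function names are hypothetical, and of the short names only declat comes from this tutorial, while declon and country are suggestions.

```python
# Suggested short names; only "declat" comes from the tutorial text
RENAMES = {
    "gbifid": "gbifid",
    "decimallatitude": "declat",
    "decimallongitude": "declon",
    "countrycode": "country",
}


def rename_headers(headers, renames=RENAMES):
    """Return the headers with any known long name swapped for its short name."""
    return [renames.get(name, name) for name in headers]


def header_problems(headers):
    """List (header, reason) pairs for headers that break the naming rules."""
    problems = []
    for name in headers:
        if len(name) >= 10:
            problems.append((name, "10 or more characters"))
        if " " in name:
            problems.append((name, "contains a space"))
    return problems


original = ["gbifid", "decimallatitude", "decimallongitude", "countrycode"]
renamed = rename_headers(original)
print(renamed)                    # ['gbifid', 'declat', 'declon', 'country']
print(header_problems(original))  # flags the three long names
print(header_problems(renamed))   # []
```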

Limiting data to 1000 records

This step will only apply to groups that have more than 1000 occurrences of their species.

DIVA can handle only about 1000 records of occurrence data. If you have many thousands of records, you may have to make a subset of 1000 records (rows) to work with.

1)Navigate to row 1002 of the spreadsheet (row 1 holds the column headers, so rows 2 to 1001 hold the first 1000 records)

2)Click the row 1002 header on the left so that the whole row is highlighted

3)With that row highlighted, press Ctrl + Shift + Down Arrow at the same time.

  1. This will highlight every row beyond the first 1000 records

4)In the blue highlighted cells, right click

5)Select ‘Delete’
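
The row arithmetic behind these steps can be checked with a short Python sketch. It assumes, as in this tutorial, that the headers occupy row 1 and that the subset is simply the first 1000 records; the function names and the table of 2500 records are made up for illustration.

```python
def first_row_to_delete(limit=1000):
    """Spreadsheet row number of the first record beyond the limit.

    Row 1 holds the headers, so record k sits in row k + 1.
    """
    return limit + 2


def keep_first_records(rows, limit=1000):
    """Return at most `limit` records, keeping the original order."""
    return rows[:limit]


# Made-up table of 2500 records standing in for a large download
records = [{"gbifid": str(i)} for i in range(1, 2501)]
subset = keep_first_records(records)
print(len(subset))            # 1000
print(first_row_to_delete())  # 1002
```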

Part Three: Saving your work

You should save this final version of your spreadsheet in two file formats.

This final version should include a limited number of columns, have at most 1000 records, and have properly labeled column headers in the first row of the spreadsheet.

First, save your spreadsheet as an .xlsx or .xls worksheet. Traditional .xlsx and .xls worksheets are easy to open and manipulate. We are saving in this format in the event that the data needs to be further modified.

Second, save your final spreadsheet as a tab delimited text file (.txt). This file will be ultimately imported into DIVA GIS.
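
For reference, a tab delimited text file is just the table written out with a tab between columns and one record per line. The sketch below shows this with Python's standard csv module; the function name write_tab_delimited, the short header names, and the sample record are hypothetical, and the exact layout Excel produces may differ in small details.

```python
import csv
import io


def write_tab_delimited(rows, fieldnames, stream):
    """Write the header row and all records to stream, separated by tabs."""
    writer = csv.DictWriter(stream, fieldnames=fieldnames, delimiter="\t")
    writer.writeheader()
    writer.writerows(rows)


# Made-up final table with short header names
fields = ["gbifid", "declat", "declon", "country"]
rows = [{"gbifid": "1001", "declat": "-12.05", "declon": "-77.04", "country": "PE"}]

buffer = io.StringIO()
write_tab_delimited(rows, fields, buffer)
print(buffer.getvalue())
```

To write a real file instead of an in-memory buffer, the stream would come from something like open(path, "w", newline="", encoding="utf-8"), where path ends in .txt.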

Part Four: Import into DIVA

1)Launch DIVA GIS

2)Add the countries shapefile to the map – this shapefile was given to you in Tutorial 2

3)Adding our species points onto the map

  1. Select Import Points to Shapefile – From text file (.TXT) from the Data tab in the top menu:

  1. By importing these points, a shapefile has automatically been created. This shapefile has been created in the same directory where the initial TXT file was held; the shapefile takes the same name as the TXT file
  2. In the future, you can add this shapefile directly onto the map rather than having to keep importing from the TXT file
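
Since the shapefile takes the same name and directory as the TXT file, its location can be worked out from the TXT path. The Python sketch below does this; the function name and the example file name are hypothetical. It also lists the .shx and .dbf files, on the assumption that DIVA writes the standard companion files that normally accompany a .shp file.

```python
from pathlib import Path


def expected_shapefile_parts(txt_path):
    """Return the .shp, .shx and .dbf paths that share the TXT file's name."""
    txt_path = Path(txt_path)
    return [txt_path.with_suffix(ext) for ext in (".shp", ".shx", ".dbf")]


# Hypothetical file name inside the tutorial's folder structure
parts = expected_shapefile_parts("Video Project/Species Data/peanut.txt")
for part in parts:
    print(part)
```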
