Day 3 (Databases and Their Design, Part II)

This course session will explore aspects of how biological databases are managed. Whereas Day 2 focused on personal specimen databases, this session will focus on more expansive databases and on placing the biological data in context.

The lectures will include demonstrations and presentations of some of the many database management systems designed to handle collections or groups of collections, e.g., KE EMu, Biota, and BG-Base. Using KE EMu as the example, we will explore challenges in building and implementing databases at this level, including scale (e.g., ramping up to millions of specimens) and human issues (e.g., cooperation among different departments within a museum). In this context we will also touch on rapid digitization efforts (e.g., wholesale herbarium sheet imaging, voice-activated data acquisition) that address the large backlogs of museum specimens that have yet to be databased.

Although we have been examining biological specimen databases, the specimens themselves sit within a richer environment of related information and activities that grows over time. Examples include accession and curation history, taxonomic reidentifications, location tracking, exhibition, and requests for loans and analysis; ideally, this related information follows the specimens electronically. We will examine how database management systems capture these “curatorial life cycle” activities, along with other aspects of long-term data stewardship.
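
To make the idea concrete, here is a minimal Python sketch, built on our own assumptions, of a specimen record that carries its curatorial history as an append-only list of events. The names `Specimen`, `CuratorialEvent`, `record`, `reidentify`, and `move` are hypothetical and do not reflect how KE EMu or any other system actually stores these data.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class CuratorialEvent:
    """One activity in a specimen's history (hypothetical structure)."""
    when: date
    kind: str      # e.g. "accession", "reidentification", "loan", "move"
    details: dict


@dataclass
class Specimen:
    catalog_number: str
    scientific_name: str
    location: str
    history: list = field(default_factory=list)

    def record(self, when, kind, **details):
        """Append an event so the history travels with the specimen record."""
        self.history.append(CuratorialEvent(when, kind, details))

    def reidentify(self, when, new_name, identified_by):
        # Log the previous name instead of silently overwriting it.
        self.record(when, "reidentification",
                    previous=self.scientific_name, new=new_name,
                    identified_by=identified_by)
        self.scientific_name = new_name

    def move(self, when, new_location):
        # Location tracking: the old location stays in the event log.
        self.record(when, "move", previous=self.location, new=new_location)
        self.location = new_location
```

Because events are only ever appended, the current name and location can change while the full sequence of accession, reidentification, and movement remains available for stewardship and audit.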

For the balance of the session we will look at efforts to federate online access to multiple databases across particular biological disciplines, e.g., VertNET for vertebrate zoology, Paleoportal for paleontology, and SEINet for plants. Related issues include database availability, update frequency, caching, associated multimedia, and the importance of georeferencing.
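
Availability, update frequency, and caching interact in any federated search. The following Python sketch assumes each provider is a hypothetical callable that returns a list of records or raises `ConnectionError` when offline; results are cached for a configurable time (a stand-in for update frequency) and a stale copy is used when a provider cannot be reached. It illustrates the trade-offs only and is not how VertNET or any other portal is implemented.

```python
import time


class FederatedSearch:
    """Query several providers, tolerate outages, and cache results (a sketch)."""

    def __init__(self, providers, ttl_seconds=3600.0, clock=time.monotonic):
        self.providers = providers  # name -> callable(query) -> list of records
        self.ttl = ttl_seconds      # how long a cached answer counts as fresh
        self.clock = clock
        self._cache = {}            # (provider name, query) -> (timestamp, records)

    def search(self, query):
        results, unavailable = [], []
        for name, fetch in self.providers.items():
            key = (name, query)
            cached = self._cache.get(key)
            if cached and self.clock() - cached[0] < self.ttl:
                results.extend(cached[1])  # fresh enough: skip the network call
                continue
            try:
                records = fetch(query)
            except ConnectionError:
                # Provider offline: fall back to a stale cached copy if one exists.
                if cached:
                    results.extend(cached[1])
                else:
                    unavailable.append(name)
                continue
            self._cache[key] = (self.clock(), records)
            results.extend(records)
        return results, unavailable
```

Returning the list of unavailable providers alongside the results lets a portal tell users that an answer may be incomplete, rather than silently omitting a collection.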

A leap further afield is the integration of biological specimen databases with seemingly “dissimilar” repositories, so that biological metadata appear on even broader stages (e.g., in library catalogs and digital asset management systems). As an example, we will look at a service at Yale University that allows cross-domain searching of its various museums and libraries, in which Darwin Core and other standards (e.g., CDWA, DC, MODS) were mapped to one another.
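
At its simplest, a crosswalk between standards is a table from terms in one to elements in another. The mapping below, from a few Darwin Core terms to simple Dublin Core elements, is a hypothetical illustration only; it is not the mapping used in the Yale service, and real crosswalks (including those involving CDWA and MODS) are considerably richer.

```python
# Hypothetical, simplified crosswalk from Darwin Core terms to simple
# Dublin Core elements, for illustration only.
DWC_TO_DC = {
    "scientificName": "title",
    "recordedBy": "creator",
    "eventDate": "date",
    "locality": "coverage",
    "institutionCode": "publisher",
    "basisOfRecord": "type",
    "occurrenceID": "identifier",
}


def crosswalk(record, mapping=DWC_TO_DC):
    """Translate a Darwin Core record (dict) into Dublin Core elements.

    Dublin Core elements are repeatable, so each target holds a list of
    values. Terms with no mapping, and empty values, are dropped.
    """
    out = {}
    for term, value in record.items():
        target = mapping.get(term)
        if target and value:
            out.setdefault(target, []).append(value)
    return out
```

Dropped terms (here, anything like `habitat` that has no target) are the information loss that makes cross-domain mapping a negotiation rather than a mechanical step.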

Exercises (90 minutes):

Each student should already have his/her database of specimens largely completed, but if not, some time will be allocated at the start for this. Students will then go to one or more internet-accessible databases, search for and download some additional records of interest, and import these into their databases.
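
If a portal offers its download as comma-separated text, importing can be as simple as appending rows whose identifier is not already present. This sketch assumes the download has a header row and uses the Darwin Core term `occurrenceID` as the de-duplication key; `import_records` is a hypothetical name, and your own database software will have its own import tools.

```python
import csv
import io


def import_records(local_records, downloaded_csv_text, key="occurrenceID"):
    """Append downloaded rows to a local list of records, skipping duplicates.

    Rows are compared on `key`; rows with an empty or missing key are
    imported anyway, since they cannot be checked for duplication.
    Returns the number of rows added.
    """
    seen = {r.get(key) for r in local_records if r.get(key)}
    added = 0
    for row in csv.DictReader(io.StringIO(downloaded_csv_text)):
        k = row.get(key) or None
        if k is not None and k in seen:
            continue  # already in the local database
        local_records.append(row)
        if k is not None:
            seen.add(k)
        added += 1
    return added
```

For a file on disk, pass the contents read with `open(path, newline="", encoding="utf-8")` as the text argument.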

At this juncture, the class will be subdivided into several teams. We will then mimic on a small scale the process that a multidisciplinary natural history museum would typically go through when it organizes and posts data to internet portals. Think of each team as a separate curatorial department in that museum.

First, each student will edit his/her database to conform to a defined set of fields in a defined order (a subset of Darwin Core). Each team will then merge all of its students’ databases into a single aggregate database. These aggregate databases will then be uploaded and appended to Yale’s existing internet providers that serve GBIF and other portals, so that the class data can be queried and examined in real time on the internet alongside specimen data from other institutions worldwide. We will remove the class data from the providers at the end of the session.
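
The conform-and-merge step can be sketched in a few lines of Python. The field list `FIELDS` below is a hypothetical subset of Darwin Core chosen for illustration (the actual subset and order will be given in class), and `conform` and `merge_team` are illustrative names; fields outside the list are dropped and missing ones are left blank.

```python
import csv
import io

# Hypothetical Darwin Core subset, in a fixed order, for illustration only.
FIELDS = ["occurrenceID", "institutionCode", "catalogNumber", "scientificName",
          "eventDate", "country", "locality", "decimalLatitude", "decimalLongitude"]


def conform(record, fields=FIELDS):
    """Return a record with exactly `fields`, in order; missing values become ''."""
    return {f: record.get(f, "") for f in fields}


def merge_team(student_databases, fields=FIELDS):
    """Concatenate conformed records from every student into one CSV string."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for records in student_databases:
        for record in records:
            writer.writerow(conform(record, fields))
    return buf.getvalue()
```

Fixing the field set and order up front is what lets the team files be concatenated without remapping columns.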

At the end of the exercise, approximately 30 minutes will be spent preparing for the homework for the georeferencing workshop on Day 5.

Resources needed:

Software from prior sessions

Computer with web browser

Links:

Simple Darwin Core:

KE EMu:

BG-Base:

Specify:

GBIF Portal:

VertNET:

ORNIS:

MaNIS:

FishNet:

HerpNET:

Paleoportal:

SEINet:

PNW Herbaria:

Rocky Mountain Herbaria: