
Importance of Data Management in a Long-term Biological Monitoring Program: Electronic Supplementary Material

______

Further discussion of contents of the database (Table 3 in the printed version)

Defining the concepts of "Event" and "Sample" (Table 3) was a key aspect of data modeling for the database. The meaning of a sampling "Event" is explained in Table 1 under the MON_EVT name, as "...a visit to a specified site on a specified date for the purpose of collecting samples and/or field measurements." The meaning of a "Sample" is also explained in Table 1, as "... a part of the environment that is removed to a laboratory for analysis, observation, testing, or measuring." What this specifically means depends on the task. In fish bioaccumulation studies (BAAFI) a sample is a whole fish, a composite of many fish, or a filet. For fish community studies (CSAFI) a sample is a single electrofishing pass, from which many fish are identified, enumerated, measured, weighed, and returned to the stream. A benthic community studies (CSABM) sample is a volume of stream bottom collected with a Surber or Hess sampler; the organisms are identified, counted, and sometimes weighed by a specialized laboratory. The toxicity testing task (TXASW) collects water grab samples; subsequent measurements of water chemistry and observations of animal survival and reproduction are linked to those samples. Laboratory control samples (not tabulated in Table 3) are also created in this task. For studies of bioindicators of fish health (BIAFI) a sample is a fish.
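
As a concrete illustration of this part of the data model, the sketch below shows one way a sampling event and its samples could be represented in an RDBMS. The table names MON_EVT and SAMPLE follow Table 1, but the column names, data types, and constraints are illustrative assumptions, not the actual schema.

    -- Hypothetical sketch: one monitoring event yields zero or more samples.
    -- Actual column names and types in the program's schema may differ.
    CREATE TABLE MON_EVT (
        event_id    VARCHAR(20) PRIMARY KEY,  -- unique identifier for a site visit
        site_id     VARCHAR(20) NOT NULL,     -- sampling location (point or stream reach)
        event_date  DATE        NOT NULL,     -- date of the visit
        task_code   VARCHAR(10) NOT NULL      -- e.g. BAAFI, CSAFI, CSABM, TXASW, BIAFI
    );

    CREATE TABLE SAMPLE (
        sample_id   VARCHAR(20) PRIMARY KEY,  -- unique identifier for the sample
        event_id    VARCHAR(20) NOT NULL REFERENCES MON_EVT (event_id),
        sample_type VARCHAR(30)               -- whole fish, composite, filet, water grab, etc.
    );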

The “Taxa” counts in Table 3 also vary considerably by task. Fish bioaccumulation and bioindicators studies focus on relatively few species: game species, other species large enough to support analytical measurements, or species abundant enough to combine into a composite sample. The toxicity testing task uses two designated test species. The community studies tasks, in contrast, identify all taxa encountered in each sample.

The “Parameter” counts in Table 3 also vary considerably by task. Fish bioaccumulation measures various combinations of length, weight, mercury, PCBs, ICP metals, and 137Cs. Benthic macroinvertebrate community studies use only two parameters, count and weight. The fish community studies include length in addition to count and weight.

______

Further discussion of data flow from collection to the database

This section expands on the printed version's summary of data flow from planning through loading to the database. Figure 2 summarizes the important components and stages in the program’s work flow.

Sample Collection and Processing

The plans for sampling and analyses and the level of quality assurance are dictated by sponsor requirements and needs. Specific aspects are usually determined by sponsors in consultation with BMAP program management and the Principal Investigators (PIs). The PIs plan and oversee field sampling events, and they and their staff conduct or arrange for the analyses. Their work is conducted under the program's Quality Assurance Plan (QAP), using task-specific Standard Operating Procedures (SOPs) which are reviewed annually. Elements of the QAP address training, management assessments, surveillances, procurement, calibration, work aids, chain-of-custody procedures, disposal, quality control methods, voucher specimens, and more.

A sampling event occurs at one location and can last anywhere from a few minutes to multiple days. As mentioned above, sampling locations may be points or may refer to a larger area such as a stream reach. Sampling events at multiple locations can be associated using the monitoring event (MON_EVT) table. Typically each task collects and processes its own samples, but there are cases in which samples are shared among multiple tasks. In such cases, entries are made in the Task Samples (TASK_SMP) table to document the sample sharing.
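
The sample sharing described above could be recorded with a simple linking table along the following lines; the TASK_SMP name comes from the text, while the column names and codes are hypothetical and only illustrate the many-to-many relationship between tasks and samples.

    -- Hypothetical sketch: TASK_SMP records which tasks use a given sample,
    -- so a sample collected once can be shared among several tasks.
    CREATE TABLE TASK_SMP (
        task_code VARCHAR(10) NOT NULL,  -- e.g. BAAFI, TXASW
        sample_id VARCHAR(20) NOT NULL REFERENCES SAMPLE (sample_id),
        PRIMARY KEY (task_code, sample_id)
    );
    -- A shared sample simply appears in two or more rows, one per task.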

As samples are collected, information such as date, time, location, and sampling protocols is typically recorded on task-specific data sheets. Some ancillary field measurements such as temperature may also be made and recorded. Except for fish community sampling, the collected samples are labeled, preserved, and stored for return to the processing laboratory. At the processing laboratory samples may be analyzed, further processed, and/or submitted to other laboratories for analyses. Examples of measurements made in the processing laboratory include determining the length and weight of fish samples collected for bioaccumulation analyses, or measuring water chemistry in samples to be used for toxicity tests. Such measurements are typically recorded in logbooks or on data sheets and later computerized into plain-text files or Excel spreadsheets. Fish community sampling is entirely a field activity that uses electrofishing to temporarily stun fish within a stream reach. Species identification and morphometric measurements were originally recorded on printed data entry forms; currently they are captured using preprogrammed data entry screens on dedicated portable computers in the field. Live fish are returned to the stream.

Sample designation, and the further analysis of samples, depends on the task. Benthic macroinvertebrate samples are sent to a special laboratory for taxonomic analysis. The toxicity testing task sets up replicate toxicity tests using the water sample collected from the field. Each original water chemistry grab sample and each subsequent dilution of that grab sample becomes a sample in the RDBMS, with water chemistry measurements. Toxicity tests are identified with the appropriate water chemistry sample, but toxicity lab controls are designated as distinct samples. For the fish bioaccumulation task, most individual fish are each considered a sample. The sample (fish) is then fileted and a part of the muscle tissue removed, homogenized, and sent to an analytical laboratory for the analysis of chemical contaminants such as mercury, PCBs, metals, and/or radionuclides; this muscle tissue constitutes a new sample. Other new samples are created when separate sections of filets from a single fish are analyzed for replicate purposes. For small fish species, a composite sample of a number of whole fish may be made, homogenized, and sent for chemical analysis. Subsampling and compositing of the field samples are tracked in the RDBMS by entries in the Associated Sample (ASAMPLE) table.
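
The subsampling and compositing relationships could be captured roughly as follows; the ASAMPLE table name is from the text, and the column names and relation codes are illustrative assumptions.

    -- Hypothetical sketch: ASAMPLE links derived samples (filets, homogenates,
    -- composites) back to the samples they were created from.
    CREATE TABLE ASAMPLE (
        parent_sample_id VARCHAR(20) NOT NULL REFERENCES SAMPLE (sample_id),
        child_sample_id  VARCHAR(20) NOT NULL REFERENCES SAMPLE (sample_id),
        relation_type    VARCHAR(20),  -- e.g. 'SUBSAMPLE' or 'COMPOSITE'
        PRIMARY KEY (parent_sample_id, child_sample_id)
    );
    -- A filet subsample appears as one parent (the fish) with one child (the tissue);
    -- a composite appears as several parents (the small fish) sharing one child.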

The measurements made on samples sent to internal or external analytical laboratories are returned to the investigator for review. Results may come as a printed report, as an electronic report, or both. The investigator reviews the results for completeness, consistency, and reasonableness. If necessary the investigator will computerize the results, compare the entered results with the laboratory report, and correct any data entry errors. Various other QA checks are done as outlined in the SOPs.

Data Processing Associated with the Relational Database Management System

Once the investigator is satisfied with the quality and integrity of the data, he or she submits an electronic data deliverable (EDD) to the BMAP data manager for loading into the database. EDD specifications were established early in the data management effort, and they are updated if needed to accommodate changes in the data being provided. The data manager records the receipt of each EDD and assigns a unique data receipt group identifier to the EDD. This is accomplished with customized software which automatically tracks the EDD through the subsequent data processing stages. The EDD is next transferred to the data management workstation and checked into the Revision Control System (Tichy 1985), which is a software tool for tracking changes to computer files.
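
EDD tracking of this kind could be supported by a small log table such as the one sketched below; the table and column names are hypothetical, but the unique data receipt group identifier and the status that advances through the processing stages follow the description above.

    -- Hypothetical sketch: one row per EDD, keyed by its data receipt group identifier.
    CREATE TABLE EDD_RECEIPT (
        receipt_group_id VARCHAR(20) PRIMARY KEY,  -- assigned when the EDD is received
        task_code        VARCHAR(10) NOT NULL,     -- submitting task
        received_date    DATE        NOT NULL,
        file_name        VARCHAR(100),             -- EDD file checked into revision control
        status           VARCHAR(20)               -- e.g. RECEIVED, CHECKED, NORMALIZED, LOADED
    );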

The data manager typically examines the data visually, and also runs a series of programs against the EDD to check the integrity of the data and to flag suspicious, inconsistent, or missing information. Errors, omissions, and other issues (such as new taxa or sites, new flag codes, misaligned data, incorrect coding of data, or reused tag numbers) are communicated to the data provider, who resolves the issue, or corrects the data and resubmits the EDD. Alternatively, the files can be corrected by the data manager after consultation with the data provider, under revision control using a text editor or program. These checks are in effect an additional level of quality assurance, frequently leading to file fixes or improvements.
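
As an example of the kinds of checks involved, the query below flags rows with unknown site or taxon codes or missing results, assuming the EDD has been copied into a staging table. The staging and reference table names are hypothetical, and the actual checks are performed by the program's own software rather than necessarily by SQL.

    -- Hypothetical sketch: report EDD rows whose site or taxon code is not found
    -- in the reference tables, or whose result value is missing.
    SELECT s.*
      FROM EDD_STAGING s
     WHERE s.site_id NOT IN (SELECT site_id FROM SITE_REF)
        OR s.taxon_code NOT IN (SELECT taxon_code FROM TAXA_REF)
        OR s.result_value IS NULL;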

Next, the data manager runs a task-specific set of SAS® programs which reformat the data into a standardized set of files. There are three types of standardized files, corresponding to sample information (including events and locations), laboratory processing information, and results. A "normalization" program then converts the standardized files into SAS® files that correspond to the structure of the database tables and can be directly loaded into the RDBMS. At this point most of the database constraints are automatically checked against the existing database entries, and a report is generated if any constraints are violated. Because the initial EDDs have already been examined, issues seldom arise at this stage, but if they do they are resolved and any needed data reprocessing is undertaken. Constraint checks at this stage have, for example, prevented the loading of duplicate records, and have flagged the need to add new sites, taxa, and analysis laboratories to reference tables. Finally, the data are loaded to the database, with checks to ensure that the correct number of records have been added.
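
As one illustration of these constraint and record count checks, the queries below sketch a duplicate check before loading and a row count verification after loading; all table and column names, and the receipt group identifier shown, are hypothetical.

    -- Hypothetical sketch: would any incoming result collide with a record
    -- already in the database?
    SELECT n.sample_id, n.parameter_code
      FROM NEW_RESULTS n
      JOIN RESULTS r
        ON r.sample_id = n.sample_id
       AND r.parameter_code = n.parameter_code;

    -- After loading, confirm that the number of records added for this data
    -- receipt group matches the number of rows in the normalized file.
    SELECT COUNT(*) AS records_loaded
      FROM RESULTS
     WHERE receipt_group_id = 'DRG-0123';  -- hypothetical identifier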
