Development of a GIS-based Watershed Characterization and Pesticide Usage Assessment Tool

Project Report Submitted to

USEPA Office of Pesticide Programs,

Environmental Fate and Effects Division

by

Geographic Information Science & Technology Group

Oak Ridge National Laboratory

Budhendra Bhaduri

PO Box 2008 MS 6237

Oak Ridge National Laboratory

Oak Ridge, TN 37831-6237

Phone: (865) 241 9272

Fax: (865) 241 6261

Email:

Background Information:

The Food Quality Protection Act (FQPA) requires EPA’s Office of Pesticide Programs (OPP) to consider drinking water as a separate source of exposure in pesticide dietary risk assessments. Exposure assessment tools currently in use and under development include mechanistic runoff modeling and watershed-based regression approaches. Vulnerability of surface water supplies to pesticide residues in raw water depends in part on pesticide usage in a watershed, which in turn depends in part upon land use/cover characteristics, such as percent cropped acreage. To advance the development of OPP’s drinking water assessment tools, and to allow geographically targeted monitoring and mitigation efforts at a watershed level, OPP initiated a contract with Oak Ridge National Laboratory (ORNL) to georeference all known Community Water Supply (CWS) surface water intakes in the continental United States to the newly available National Hydrography Dataset (NHD), and to delineate upstream watersheds for each intake. Using the delineated watersheds, land use/land cover statistics for each watershed are obtained from the National Land Cover Dataset (NLCD), and pesticide usage is estimated from other available data. This project was done in collaboration with the United States Geological Survey (USGS) and EPA’s Office of Water (OW). USGS will use the resulting data to help select watersheds and water supplies for future data collection efforts designed to improve development of pesticide regression models. OW plans to add the referenced intake information to its Reach Address Database (RAD). In the short term, OPP will use the information to identify gaps in landscape regression models under development.

Objective:

The overall objective of this work assignment is to estimate distributions of pesticide concentrations in surface water for exposure/risk assessments performed under the Food Quality Protection Act (FQPA). The main tools used to estimate these concentrations are regression models (including WARP and SPARROW) developed in large part by the US Geological Survey (USGS). Initial development of appropriate data sets and data manipulation tools is a necessary and critical preceding step that will complement the future modeling efforts. This work focuses on the initial development of geospatial data and a GIS-based evaluation tool for managing and visualizing the influence of pesticides on surface drinking water sources. The GIS analyses are critical for analyzing the vulnerability of drinking water sources to upstream areas of pesticide usage and for assessing potential pesticide impact on drinking water at a national level. The primary goal was to provide data analysis, modeling, GIS, and software programming technical support to enhance assessments of potential pesticide surface water transport from crop areas to public water intake locations.

Characterizing the contributory upstream watershed of each drinking water intake, in order to estimate upstream pesticide usage, requires delineating the upstream watershed from the intake location, which in turn requires knowing where the CWS intakes lie with respect to the hydrography network. Consequently, three components of this broad objective were identified, and efforts to address them individually (and sequentially) were designed to lead to an optimized solution. These three objectives can be described as follows:

1.  The first objective was to geographically index or georeference Community Water System (CWS) intakes to the hydrography network of the National Hydrography Data (NHD).

2.  The second objective addresses the delineation of upstream contributory watersheds from the georeferenced intake locations at a national level.

3.  The third objective focuses on the development of an evaluation tool to assist in assessing the usage of pesticides on the upstream contributory watersheds for the drinking water intakes.

Description of Principal Data Sets:

CWS Intakes Database

The data file that served as the basis for reach indexing was provided by the USGS (Marilee Horn, U.S. Geological Survey, written commun., September 27, 2001). This data file was extracted from a public supply database (PSDB) that, beginning in 1997, was compiled to facilitate a variety of data-analysis efforts on the part of USGS programs. PSDB’s intended use within the USGS is to link the results from a number of environmental monitoring programs to water sources that are used for public water supplies. These data will establish a context for relating the quality of the Nation's streams and ground water, as assessed by the NAWQA program, to the individual streams, lakes, reservoirs, and aquifers that serve as water supplies. Further work may enable a comprehensive analysis of streamflow conditions, including flood, drought, and time-of-travel, that could support more efficient water-supply planning and protection.

Origin of PSDB

In 1997, data on community-water systems (CWS) were retrieved from SDWIS as part of a USGS-EPA project evaluating the impact of changing the maximum allowable levels of arsenic in drinking water (WRIR 99-4279). The data were retrieved by State, combined, and organized into 5 ACCESS tables containing information on: (1) the CWS, such as name, mailing address, and population served; (2) surface-water intakes; (3) ground-water sources; (4) connections between public suppliers; and (5) treatment plants. County served and zip-code data from SDWIS were corrected and added as well as several new data fields to meet the needs of the Arsenic project. The database was referred to as the Public Supply DataBase (PSDB).

The PSDB was also used in a second USGS/EPA project that required an EPA reference number for stream segments, the RF1, for surface-water intakes of CWS serving more than 10,000 people as part of a time-of-travel study that used the USGS SPARROW model (OFR 99-248). In order to assign an RF1, a reasonably accurate latitude/longitude was required. The DeLorme Street Atlas USA software was used to check the accuracy of latitude/longitudes that were in PSDB from SDWIS. Locations of surface-water intakes that were missing from PSDB were requested from Districts, which could obtain them from (1) State agencies, (2) District databases, or (3) the public suppliers directly. In some cases, the DeLorme Street Atlas USA was used to assign latitude/longitudes to named surface-water intakes.

Obtaining and evaluating locations for surface-water intakes for public suppliers serving less than 10,000 people has continued. During late 2000, USEPA and USGS began collaborating on an effort to develop a single set of intake locations that would reside in SDWIS. These locations would reflect the QA/QC work done by the USGS and would include conversations with State staff responsible for updating SDWIS/FED. Subsequent retrievals were made in January, July, and September 2001 and April 2002; these were compared with PSDB and used to partially update it. In June 2002, a series of queries was developed to systematically update PSDB from SDWIS.

The reach indexing and watershed delineation presented in this version of the ORNL product were done using the September 2001 version of PSDB. The following table lists those variables from PSDB that were included in the ORNL indexing and watershed product. The remainder of this document describes ACCESS tables for only those portions of PSDB that were used for the indexing project.

National Hydrography Data (NHD)

Development of a national hydrologic data set was initiated by the U.S. EPA in the early 1970s. The Reach File was first conceived at that time, with a proof-of-concept file, known as Reach File Version 1.0 Alpha (RF1A), completed in 1975. The first full implementation, referred to as "Reach File Version 1.0" (RF1), was completed in 1982 (Dewald et al., 1985). The source for RF1 was the USGS 1:250,000-scale hydrography that had been photoreduced to a scale of 1:500,000 by the National Oceanic and Atmospheric Administration. RF1 consists of approximately 68,000 reach segments comprising 650,000 miles of stream. Although RF1 still supports broad-based national applications, the need to provide a more detailed hydrologic network motivated the development of Reach File Version 2.0 (RF2) in the late 1980s. RF2 was created by using the Feature File (now called the National Geographic Names Database) of the USGS Geographic Names Information System (GNIS) to add one new level of reach segments to RF1. RF2 contains 170,000 reach segments. Widespread interest in providing a more comprehensive, nationally consistent hydrologic database led to the development of the Reach File Version 3-Alpha (RF3-Alpha). This database combines data from RF1 and RF2, the GNIS, and the 1988 edition of USGS 1:100,000-scale digital line graph (DLG) hydrography data. RF3-Alpha contains nearly 3,200,000 individual hydrographic features (reaches) and more than 93,000,000 coordinate points.

The National Hydrography Data set (NHD), initially released in 1999, is the fourth in the series of continuing improvements to reach data, and is a newly combined data set that provides hydrographic data for the United States. This database supplements attribute-based connectivity with reach delineations that provide spatial connectivity, a common classification of the features that underlie the reaches, and a design that encourages cooperative data maintenance and improvements among many organizations. The NHD is the culmination of recent cooperative efforts of the U.S. Environmental Protection Agency (USEPA) and the U.S. Geological Survey (USGS). It combines elements of USGS digital line graph (DLG) hydrography files and the USEPA Reach File (RF3). The NHD supersedes RF3 and DLG files by incorporating them, not by replacing them. The DLG files contribute a national coverage of millions of features, including water bodies such as lakes and ponds, linear water features such as streams and rivers, and point features such as springs and wells. These files provide standardized feature types, delineation, and spatial accuracy. From RF3, the NHD acquires hydrographic sequencing, upstream and downstream navigation for modeling applications, and reach codes. The reach codes provide a way to integrate data from organizations at all levels by linking the data to this nationally consistent hydrographic network. The NHD provides comprehensive coverage of hydrographic data for the United States, represented by approximately 3 million reaches or stream segments. Some of the anticipated end-user applications of the NHD are multi-use hydrographic modeling and water-quality studies. Although based on 1:100,000-scale data, the NHD is designed so that it can incorporate and encourage the development of the higher resolution data that many users require. The NHD can be used to promote the exchange of data between users at the national, State, and local levels.
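The upstream navigation that the NHD inherits from RF3 is what makes watershed delineation from an intake possible. The following is only a minimal sketch of the idea, assuming the network is available as pairs of (upstream reach, downstream reach). The reach identifiers, the pair list, and the function names are hypothetical and do not reflect the actual NHD table layout or the method used in this project.

```python
from collections import defaultdict, deque

# Hypothetical reach network: each pair is (upstream_reach, downstream_reach).
# The identifiers are invented for illustration and are not real NHD reach codes.
FLOW_PAIRS = [
    ("R1", "R3"),
    ("R2", "R3"),
    ("R3", "R5"),
    ("R4", "R5"),
    ("R5", "R6"),
]


def build_upstream_index(flow_pairs):
    """Map each reach to the reaches that drain directly into it."""
    upstream = defaultdict(list)
    for up, down in flow_pairs:
        upstream[down].append(up)
    return upstream


def reaches_upstream_of(start, flow_pairs):
    """Return every reach that drains to `start`, found by breadth-first search."""
    upstream = build_upstream_index(flow_pairs)
    seen = set()
    queue = deque([start])
    while queue:
        reach = queue.popleft()
        for parent in upstream.get(reach, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


if __name__ == "__main__":
    # Reaches contributing flow to the reach holding a hypothetical intake.
    print(sorted(reaches_upstream_of("R5", FLOW_PAIRS)))
```

In this toy network, an intake on reach R5 receives flow from R1 through R4, while R6 lies downstream and is excluded.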

National Land Cover Data (NLCD)

This land cover data set was produced by the U.S. Geological Survey (USGS) as part of a cooperative project between the USGS and the U.S. Environmental Protection Agency (USEPA) to produce a consistent land cover data layer for the conterminous U.S. based on 30-meter Landsat Thematic Mapper (TM) data. National Land Cover Data (NLCD) was developed from TM data acquired by the Multi-resolution Land Characterization (MRLC) Consortium. The MRLC Consortium is a partnership of federal agencies that produce or use land cover data. Partners include the USGS (National Mapping, Biological Resources, and Water Resources Divisions), USEPA, the U.S. Forest Service, and the National Oceanic and Atmospheric Administration. A 21-class land cover classification scheme was applied consistently over the United States. The data have a spatial resolution of 30 meters and are mapped in the Albers Conic Equal Area projection, NAD 83. The NLCD are provided on a state-by-state basis. The state data sets were cut out from larger "regional" data sets that are mosaics of Landsat TM scenes. At this time, all of the NLCD state files are available for free download as 8-bit binary files, and some states are also available on CD-ROM as GeoTIFF files.
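Because each NLCD cell is 30 meters on a side and each cell value is a single 8-bit class code, class areas follow directly from cell counts. The sketch below illustrates that arithmetic on a small made-up byte string. The function name and the sample data are assumptions for illustration, and reading an actual state file (header handling, row and column layout) is not shown.

```python
from collections import Counter

PIXEL_SIZE_M = 30.0               # NLCD cell size stated in the text
SQ_M_PER_SQ_MILE = 1609.344 ** 2  # one statute mile is 1609.344 m


def class_areas_sq_miles(raster_bytes):
    """Count cells per class code in an 8-bit raster and convert to square miles."""
    cell_area_sq_miles = (PIXEL_SIZE_M ** 2) / SQ_M_PER_SQ_MILE
    counts = Counter(raster_bytes)  # iterating over bytes yields integers 0-255
    return {code: n * cell_area_sq_miles for code, n in counts.items()}


if __name__ == "__main__":
    # Tiny made-up raster: six cells of class 82 and two cells of class 11.
    sample = bytes([82, 82, 11, 82, 82, 11, 82, 82])
    for code, area in sorted(class_areas_sq_miles(sample).items()):
        print(code, f"{area:.6f}")
```

At this cell size, roughly 2,878 cells cover one square mile.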

The TM multi-band mosaics were processed using an unsupervised clustering algorithm. Both leaves-off and leaves-on data sets were analyzed. The resulting clusters were then labeled using aerial photography and ground observations. Clusters that represented more than one land cover category were also identified, and models were developed, using various ancillary data sets, to split the confused clusters into the correct land cover categories. The land cover classification statistics for the state of Iowa, as an example, are as follows:

Land Cover Class (Iowa) / Square Miles
11 Water / 508
12 Perennial Ice Snow / 0
21 Low Intensity Residential / 423
22 High Intensity Residential / 160
23 Commercial/Industrial/Transportation / 809
31 Bare Rock / 7
32 Quarries/ Mines / 24
33 Transitional / 0
41 Deciduous Forest / 4173
42 Evergreen Forest / 8
43 Mixed Forest / 72
51 Shrubland / 0
61 Orchards/ Vineyard / 0
71 Grasslands/Herbaceous / 2882
81 Pasture/Hay / 9217
82 Row Crops / 36415
83 Small Grains / 341
84 Fallow / 0
85 Urban/Recreational Grasses / 161
91 Woody Wetlands / 695
92 Emergent/Herbaceous Wetlands / 377
State/Region Total / 56271
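
Watershed statistics such as percent cropped acreage can be derived from a class table like the one above. The following sketch does this for the Iowa figures. Treating Row Crops (82) and Small Grains (83) as "cropped" is an assumption made here for illustration, not a definition taken from this report, and the names used are hypothetical.

```python
# Square miles per NLCD class for Iowa, copied from the table above.
IOWA_SQ_MILES = {
    11: 508, 12: 0, 21: 423, 22: 160, 23: 809, 31: 7, 32: 24, 33: 0,
    41: 4173, 42: 8, 43: 72, 51: 0, 61: 0, 71: 2882, 81: 9217,
    82: 36415, 83: 341, 84: 0, 85: 161, 91: 695, 92: 377,
}

# Assumption for illustration: "cropped" means Row Crops (82) plus Small Grains (83).
CROPPED_CLASSES = (82, 83)


def percent_of_total(areas, classes):
    """Return the share of total area, in percent, covered by the given classes."""
    total = sum(areas.values())
    selected = sum(areas[c] for c in classes)
    return 100.0 * selected / total


if __name__ == "__main__":
    print(f"Row crops: {percent_of_total(IOWA_SQ_MILES, (82,)):.1f}%")
    print(f"Cropped:   {percent_of_total(IOWA_SQ_MILES, CROPPED_CLASSES):.1f}%")
```

Row crops alone account for roughly 65 percent of the state. The individual class values sum to 56,272 square miles, one more than the listed total, presumably because each class was rounded separately.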

The classification system used for NLCD is modified from the Anderson land-use and land-cover classification system. Many of the Anderson classes, especially the Level III classes, are best derived using aerial photography. It is not appropriate to attempt to derive some of these classes using Landsat TM data due to issues of spatial resolution and interpretability of data. Thus, no attempt was made to derive classes that were extremely difficult or “impractical” to obtain using Landsat TM data, such as the Level III urban classes. In addition, some Anderson Level II classes were consolidated into a single NLCD class.

Georeferencing Community Water System (CWS) Drinking Water Intakes

Technical Approach:

A largely automated algorithm was developed that assigns each CWS intake to the most appropriate stream/reservoir network in the NHD. The algorithm is based on attribute matching, feature characterization, and proximity analysis, and it evaluates multiple spatial and non-spatial attributes from the CWS database and the NHD.

For each CWS location, the two nearest NHD reaches (linear and/or polygon) are selected. Both selected features are tested with the georeferencing algorithm. The georeferencing model (algorithm) performs an extensive conditional test based on five primary parameters:

  1. Name of source for intakes (N),
  2. Name of river associated with intakes (A),
  3. Distance to Reach (D),
  4. Linear vs. Polygon Reach condition (P), and
  5. Ratio of the two Distances from the intake location to the two nearest reaches (R).

Alphanumeric flag values are assigned for each of the five parameters in the form [xN xA xD xP xR], where N, A, D, P, and R represent the five parameters respectively. For each of the two NHD reaches, the first four numeric values (the x in each alphanumeric flag) are added to produce a final flag value (Flag 1 for the nearest reach and Flag 2 for the second nearest). Initial testing of the algorithm indicated that including the fifth parameter (R) in the final flag value introduced undesired error in the results. Consequently, the value of the fifth parameter (R) is determined and stored but is not used in the final flag value calculation. It is consulted only in certain situations where anomalies are detected during the verification and validation process. By design, a lower flag value indicates a better match for an intake location, and the reach with the lower value is selected as the most appropriate reach to which the intake should be georeferenced.
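
The flag arithmetic described above can be sketched as follows. This is only an illustration of the summing and selection step: the numeric flag values, the tie-breaking rule, the direction of the distance ratio, and all names are assumptions, since the report does not give the conditional tests that produce each flag.

```python
from dataclasses import dataclass


@dataclass
class ReachFlags:
    """Numeric parts of the [xN xA xD xP] flags for one candidate reach."""
    reach_id: str
    n: int  # source-name match flag (N)
    a: int  # river-name match flag (A)
    d: int  # distance-to-reach flag (D)
    p: int  # linear vs. polygon flag (P)

    def final_flag(self):
        """Sum of the first four flag values; R is deliberately excluded."""
        return self.n + self.a + self.d + self.p


def distance_ratio(dist_nearest, dist_second):
    """Parameter R, stored for validation only (ratio direction is assumed)."""
    if dist_second == 0:
        return None
    return dist_nearest / dist_second


def select_reach(nearest, second):
    """Pick the lower final flag; ties go to the nearest reach (an assumption)."""
    return nearest if nearest.final_flag() <= second.final_flag() else second


if __name__ == "__main__":
    # Hypothetical values: the second-nearest reach matches the names better.
    nearest = ReachFlags("reach_A", n=2, a=2, d=1, p=1)
    second = ReachFlags("reach_B", n=1, a=1, d=2, p=1)
    best = select_reach(nearest, second)
    print(best.reach_id, best.final_flag(), distance_ratio(40.0, 160.0))
```

In this made-up case the second-nearest reach is selected because its better name matches outweigh its greater distance, which is the kind of outcome a purely proximity-based assignment would miss.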