Chapter 4: Data Input and Output

(GIS: A Management Perspective - Stan Aronoff)
Pages 103 - 131

For a GIS to be useful it must be capable of receiving and producing information in an effective manner.

The data input and output functions are the means by which a GIS communicates with the world outside.

The objective in defining GIS input and output requirements is to identify the mix of equipment and methods needed to meet the required level of performance and quality. No one device or approach is optimum for all situations.

DATA INPUT: The procedure of encoding data into a computer-readable form and writing the data to the GIS database.

Data entry is usually the major bottleneck in implementing a GIS. The initial cost of building the database is commonly 5 to 10 times the cost of the GIS hardware and software.

The creation of an accurate and well-documented database is critical to the operation of the GIS.

Accurate information can only be generated if the data on which it is based were accurate to begin with.

Data quality information includes the date of collection, the positional accuracy, completeness, and the method used to collect and encode the data. (Discussed in detail in Ch. 5)

There are two types of data to be entered into a GIS: Spatial data and the associated non-spatial attribute data.

The spatial data represent the geographical location of the features.

The non-spatial attribute data provide descriptive information, such as the name of a street, the salinity of a lake, or the type of tree stand.

The non-spatial attribute data must be logically attached to the features they describe.

There are five types of data entry systems commonly used in a GIS:

keyboard entry
coordinate geometry
manual digitizing
scanning
input of existing digital files

Keyboard entry: involves manually entering the data at a computer terminal. Attribute data are commonly input by keyboard whereas spatial data are rarely input this way.

Keyboard entry may also be used during manual digitizing to enter the attribute information. However, this is usually more efficiently handled as a separate operation.

Example: a roads file versus a census file. The roads file uses codes for the various road types, while the census file uses exact numbers for things like total population and age range.

Coordinate Geometry (COGO): involves entering survey data using a keyboard. From these data the coordinates of spatial features are calculated. This produces a very high level of precision and accuracy, which is needed in a cadastral system.
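
The calculation behind COGO can be sketched as follows. This is only a minimal illustration, assuming the survey data arrive as bearing-and-distance legs measured from a known starting point. The function name traverse, the clockwise-from-north azimuth convention, and the sample parcel are assumptions made for the example, not features of any particular COGO package.

```python
import math

def traverse(start, legs):
    """Compute corner coordinates from a start point and survey legs.

    start: (easting, northing) of the known starting point.
    legs:  list of (azimuth_degrees, distance) pairs, with the azimuth
           measured clockwise from north (an assumed convention).
    """
    x, y = start
    points = [(x, y)]
    for azimuth_deg, distance in legs:
        azimuth = math.radians(azimuth_deg)
        x += distance * math.sin(azimuth)  # change in easting
        y += distance * math.cos(azimuth)  # change in northing
        points.append((x, y))
    return points

# Hypothetical rectangular parcel, 100 by 50 units, walked clockwise
# from its south-west corner: north, east, south, then west.
corners = traverse((1000.0, 2000.0),
                   [(0, 50), (90, 100), (180, 50), (270, 100)])
for easting, northing in corners:
    print(f"{easting:.3f}, {northing:.3f}")
```

The last computed point should fall back on the starting point. Comparing the two is a simple check on the keyed-in survey values.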

For a city with 100,000 parcels, it would cost approximately $1 to $1.50 per parcel, or $100,000 to $150,000, to digitize the parcels manually. COGO procedures are commonly 6 times, and can be up to 20 times, more expensive than manual digitizing.
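
The arithmetic behind these figures can be worked through in a few lines. This sketch assumes that "6 times more expensive" means 6 times the manual digitizing cost. The function name cost_range is made up for the example.

```python
PARCELS = 100_000
DIGITIZING_COST_PER_PARCEL = (1.00, 1.50)  # dollars per parcel, low and high
COGO_MULTIPLIERS = (6, 20)                 # common factor and upper factor

def cost_range(parcels, per_parcel_range, multiplier=1):
    """Return the (low, high) total cost for a number of parcels."""
    low, high = per_parcel_range
    return parcels * low * multiplier, parcels * high * multiplier

low, high = cost_range(PARCELS, DIGITIZING_COST_PER_PARCEL)
print(f"manual digitizing: ${low:,.0f} to ${high:,.0f}")

for factor in COGO_MULTIPLIERS:
    low, high = cost_range(PARCELS, DIGITIZING_COST_PER_PARCEL, factor)
    print(f"COGO at {factor}x: ${low:,.0f} to ${high:,.0f}")
```

Under that reading, COGO entry for the same city would run from $600,000 to $900,000 at the common factor and from $2 million to $3 million at the upper factor.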

Surveyors and engineers want the higher accuracy of COGO for their applications. Planners and most others are happy with the lower accuracy provided by manual digitizing.

Manual Digitizing: The most widely used method for entering spatial data from maps. The map is mounted on a digitizing tablet and a hand-held device termed a puck or cursor is used to trace each map feature. The position of the puck is accurately measured by the device to generate the coordinate data.

Digitizing surfaces range from 12 x 12 inches (digitizing tablet) to 36 x 48 inches (digitizing table) and larger.

The digitizing table electronically encodes the position of the pointing device with a precision of a fraction of a millimeter.

The most common digitizer uses a fine wire mesh grid embedded in the table. The cursor normally has 16 or more buttons that are used to operate the data entry and to enter attribute data.

The digitizing operation itself requires little computing power and so can be done without using the full GIS. A smaller, less expensive computer can be used to control the digitizing process and store the data. The data can later be transferred to the GIS for processing. The drawback is that digitizing software is then needed on every computer used.

The efficiency of digitizing depends on the quality of the digitizing software and the skill of the operator. The process of tracing lines is time-consuming and error prone. The software can provide aids that substantially reduce the effort of detecting and correcting errors.

Attribute information may be entered during the digitizing process, but usually only as an identification number. The attribute information referenced to the same ID number is entered separately.
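
The ID-number link can be sketched as a simple lookup. This is an illustration only: the feature IDs, coordinates, attribute names, and the function attach_attributes are all invented for the example.

```python
# Digitized features carry only an ID number (coordinates are made up).
features = {
    101: [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)],
    102: [(4.0, 0.0), (9.0, 0.0), (9.0, 3.0), (4.0, 3.0)],
    103: [(0.0, 3.0), (9.0, 3.0), (9.0, 7.0), (0.0, 7.0)],
}

# Attribute records entered separately, referenced to the same ID numbers.
attributes = {
    101: {"stand_type": "pine", "age_years": 40},
    102: {"stand_type": "spruce", "age_years": 65},
}

def attach_attributes(features, attributes):
    """Join attribute records to features by ID and list unmatched IDs."""
    joined = {}
    missing = []
    for feature_id, geometry in features.items():
        record = attributes.get(feature_id)
        if record is None:
            missing.append(feature_id)  # digitized, but no attributes yet
        joined[feature_id] = {"geometry": geometry, "attributes": record}
    return joined, missing

joined, missing = attach_attributes(features, attributes)
print("features without attributes:", missing)
```

Listing the unmatched IDs is one simple way to catch features that were digitized but never described.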

Manual digitizing is a tedious job. Operator fatigue (eye strain, back soreness, etc.) can seriously degrade the data quality. Managers must limit the number of hours an operator works at one time. A commonly used quality check is to produce a verification plot of the digitized data that is visually compared with the map from which the data were originally digitized.

Scanning: Scanning provides a faster means of data entry compared to manual digitizing.

In scanning, a digital image of the map is produced by moving an electronic detector across the surface of the map.

There are two types of scanner designs:

Flat-bed scanner: On a flat-bed scanner the map is placed on a flat scanning stage and the detectors move across the map in both the X and the Y directions (similar to a copy machine).

Drum scanner: On a drum scanner, the map is mounted on a cylindrical drum which rotates while the detector moves horizontally across the map. The sensor motion provides movement in the X direction while the drum rotation provides movement in the Y direction.

The output from the scanner is a digital image. Usually the image is black and white but scanners can record color by scanning the same document three times using red, green and blue filters.
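
How three filtered passes become one color image can be sketched as below. The example assumes each pass yields a grid of brightness values from 0 to 255 and that the three grids line up pixel for pixel. The function name combine_passes and the sample values are made up.

```python
def combine_passes(red, green, blue):
    """Merge three single-filter scans into one grid of (R, G, B) pixels."""
    if not (len(red) == len(green) == len(blue)):
        raise ValueError("the three passes must have the same number of rows")
    image = []
    for r_row, g_row, b_row in zip(red, green, blue):
        if not (len(r_row) == len(g_row) == len(b_row)):
            raise ValueError("rows must have the same number of pixels")
        # Pair up the three brightness values recorded for each pixel.
        image.append(list(zip(r_row, g_row, b_row)))
    return image

# Hypothetical 2 x 2 scan of the same document through each filter.
red_pass = [[255, 0], [30, 200]]
green_pass = [[0, 0], [30, 200]]
blue_pass = [[0, 255], [30, 200]]

for row in combine_passes(red_pass, green_pass, blue_pass):
    print(row)
```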

Inputting existing digital files: Many companies and organizations provide or sell digital data files, often in a format that can be read directly into a GIS. These digital data sets are priced at a fraction of the cost of digitizing existing maps.

Over the next decade, the increased availability of data should reduce the current high cost and lengthy production times needed to develop digital geographic databases.

SCANNING VERSUS MANUAL DIGITIZING

Scanning is used by many organizations, yet it remains controversial. One reason for the lingering questions about data accuracy is that rigorous trials are few and are, of necessity, specific to the organization and application.

Data entry using scanning is claimed to be 5 to 10 times (or more) faster than manual digitizing.

However, maps normally must be redrafted before they can be scanned, or the color separates must be scanned.

Redrafting is often considered to be a major disadvantage of the scanning option. Redrafting, although time-consuming, does not necessarily add to the cost of the data conversion process. Redrafting can reduce the total cost of both scanning and manual digitizing. For example, studies by the US Forest Service have shown that a "map preparation" step before the manual digitizing is done can reduce the overall digital encoding costs by as much as 50%.

WHY?

1. The redrafting is done manually, not on a computer system and therefore costs are not incurred for the computer time or the higher salaries of computer operators.

2. The digitizing operation proceeds much more quickly and requires less editing if the map has fewer errors and inconsistencies. Faster completion of the digitization and editing functions reduces the amount and therefore the costs of expensive computer system and computer operator time.

3. When inconsistencies on the map must be worked out, manual drafting is more efficient and faster than digitizing because they require different skills. They are not equal tasks.

4. It is very time-consuming and therefore very costly to make large numbers of changes to a map once it is in digital form.

While a scanning system is for the most part automated and requires less highly trained personnel, more complex equipment must be maintained, more sophisticated software must be written or purchased, and there are more steps in the process.

Scanners are more expensive than digitizing tables. A 60 x 44 inch digitizing table can cost between $3,000 and $8,000. A high-quality scanner will cost $100,000. The higher equipment costs can be justified if there is a great deal of production that needs to be done.

Most GIS software packages include a digitizing software capability, but separate special-purpose software is needed to operate a scanning system. Scanning works best with maps that are very clean, simple, and do not contain extraneous information. Scanning is most cost-effective for maps with large numbers of polygons (1000 or more) and maps with a large number of irregularly shaped features such as lines and odd polygons.

Manual digitizing tends to be more cost-effective when there are relatively few maps or when the maps are not in a form that can be scanned. Maps that require a lot of interpretation are also better suited to manual digitizing than to scanning.

There is a strong demand for faster, more cost-effective data entry methods. Hundreds of computer operators with thousands of maps are not the answer. Although scanning will never replace manual digitizing, the technology will continue to improve as more scanners come into use.

DIRECT USE OF RASTER SCANNED IMAGES

Much of the difficulty in using raster scanning to enter map information lies in extracting points, lines and polygons from the raster data.

In some cases, the raster image is only needed as a background on which to overlay other geographic information.

Air photos, satellite imagery, and scanned map images can be stored and presented in this way.

For example, if a raster satellite image is displayed on a screen, a vector map can be overlaid and then updated or a totally new map created by digitizing on the screen.

Using a raster image as a background can be an effective solution when a relatively small amount of data needs to be extracted but a large area must be displayed in order to find the data.

EXISTING DIGITAL DATA

In the US and Canada, low-cost digital geographic information is becoming more readily available.

Data sets are being produced by the national mapping agencies and agencies responsible for the census and other nationwide statistical data. In the US these agencies include but are not limited to the USGS, US Census Bureau, and the DMA. Natural resources information is being converted to digital form at both the federal and state or provincial levels.

Since digital data sets are produced to satisfy a wide range of users, the cost, currency, and accuracy of the data vary. The accuracy with which boundaries are drawn, the date of the information, and the method of compilation may be sufficiently different to create errors when different data layers or adjacent map sheets within a data layer are used together.

Figure 4.4 (p. 112): This is a map produced from the USGS 1:250,000 Land Use/Land Cover digital data set. To generate this map, the data for two adjacent data sets were joined. Notice the abrupt change in land use categories along a horizontal line in the center of the map. This change coincides with the boundary between the two map sheets from which the data were digitized. The differences may be a result of discrepancies in air photo interpretation or of the three-year difference in the source dates of the aerial photography used.

Problems such as these may occur in any digital data set and must be identified and taken into account.

Private companies are also beginning to provide off-the-shelf database products. Although there may be difficulties, the cost of existing data is usually a fraction of the cost of creating a new data set.

The availability of inexpensive data sets will make GIS technology economically more attractive and easier to implement.

In the US the cartographic community has made a considerable effort to coordinate and standardize the production and distribution of digital geographic data.

At the federal level, the Federal Interagency Coordinating Committee on Digital Cartography (FICCDC) was formed in 1983 for this purpose. Over 14 organizations participate in the Committee, which holds regular meetings and produces a newsletter and a variety of reports.

Now we are going to discuss examples of data sets available from these federal agencies.

BASE CARTOGRAPHIC DATA

Base cartographic data include the topographic and planimetric information usually portrayed on a map.

Topographic data are those data that portray relief, such as elevation contours and spot heights.

Planimetric data include roads and streams, as well as cultural data such as administrative and political boundaries, cities, and towns.

Often these data sets are digitized versions of an existing map series, with each type of information, such as the elevation contours, assigned to a separate data layer. Base cartographic data sets are produced in two formats: graphics and topologically structured.

Graphics format is essentially the line and point features digitized in vector format. In this form, the map can be easily updated or modified to produce special purpose maps.

These data sets are well suited for the CAD systems used in digital mapping. However, they are severely limited by the lack of topological structuring.

A commonly used interchange format is the SIF (Standard Interchange Format) developed by the digital mapping industry for transferring lines, points, curves, and symbols.

These data sets can be incorporated into a GIS, but doing so can cause many problems. For example, the data files often have not been checked for topological consistency.

They may contain such inconsistencies as lines that do not meet precisely, that overshoot or undershoot the correct connection point. There may be missing lines or gaps that create polygons that are not closed.

For use in a vector GIS, these files must be cleaned and topologically structured.
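
One simple consistency check of this kind looks for line ends that connect to nothing. The sketch below is a simplified illustration, not the method of any particular GIS. It assumes lines are meant to meet at shared end points, and the function name find_dangles, the tolerance value, and the sample lines are invented for the example.

```python
import math
from collections import Counter

def find_dangles(lines, tolerance):
    """Return (end_point, nearest_other_end_point_or_None) for each dangle.

    lines: list of lines, each a list of (x, y) tuples.
    An end point is treated as dangling when no other line end has exactly
    the same coordinates. If another end point lies within `tolerance`,
    it is reported as the likely intended connection point.
    """
    end_points = [pt for line in lines for pt in (line[0], line[-1])]
    counts = Counter(end_points)
    dangles = []
    for pt in end_points:
        if counts[pt] > 1:
            continue  # shared with another line end, so it is connected
        candidates = [other for other in counts
                      if other != pt and math.dist(pt, other) <= tolerance]
        nearest = min(candidates, key=lambda o: math.dist(pt, o), default=None)
        dangles.append((pt, nearest))
    return dangles

# Hypothetical square whose last side stops 0.2 units short of the corner.
lines = [
    [(0.0, 0.0), (10.0, 0.0)],
    [(10.0, 0.0), (10.0, 10.0)],
    [(10.0, 10.0), (0.0, 10.0)],
    [(0.0, 10.0), (0.0, 0.2)],  # undershoot
]

for end_point, snap_to in find_dangles(lines, tolerance=0.5):
    print(end_point, "-> likely belongs at", snap_to)
```

Both ends of the gap are reported, each pointing at the other, which is the information an operator needs to close the polygon.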

Topologically-Structured Format is designed to encode geographic information in a form better suited for spatial analysis and other geographic studies. Most GISs are designed to use topologically structured data.
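
What topological structuring adds can be illustrated with a simplified arc-node table. This sketch is not the DLG record layout. It assumes each line (arc) stores its start node, its end node, and the polygon on its left and right side. The field names and sample values are invented for the example.

```python
from collections import defaultdict, namedtuple

Arc = namedtuple("Arc", "arc_id from_node to_node left_poly right_poly")

# Two polygons, A (west) and B (east), divided by arc 2.
# "outside" stands for the area beyond the mapped polygons.
arcs = [
    Arc(1, "n1", "n2", "A", "outside"),  # outer boundary of A
    Arc(2, "n1", "n2", "B", "A"),        # shared boundary
    Arc(3, "n2", "n1", "B", "outside"),  # outer boundary of B
]

def adjacency(arcs):
    """List the neighbors of every polygon using only the arc table."""
    neighbors = defaultdict(set)
    for arc in arcs:
        if arc.left_poly != arc.right_poly:
            neighbors[arc.left_poly].add(arc.right_poly)
            neighbors[arc.right_poly].add(arc.left_poly)
    return dict(neighbors)

def arcs_at_node(arcs, node):
    """Return the IDs of all arcs that start or end at a node."""
    return [a.arc_id for a in arcs if node in (a.from_node, a.to_node)]

print(adjacency(arcs)["A"])
print(arcs_at_node(arcs, "n1"))
```

Questions such as "which polygons border A?" are answered from the table alone, with no coordinate geometry. This is what makes the structure suited to spatial analysis.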

The USGS Digital Line Graph (DLG) data set is an example of topologically structured data. This cartographic data set has been developed from previous mapping efforts at the 1:2 million scale and more recently at the 1:100,000 and 1:24,000 scales.

The older 1:2 million data includes transportation, hydrography, and political boundary maps.

The 1:100,000 scale data sets for hydrography and transportation have been completed for the entire US while the political boundaries and Public Land Survey System are still being developed.

The 1:24,000 series will include the PLSS, political boundaries, transportation, hydrography, and contour data layers. See Figure 4.5 on page 114.

These data sets represent a comprehensive, standardized, inexpensive, and publicly available source of digital information.

The complete coverage (at the 1:100,000 scale) makes it possible to assemble large-area databases quickly and at a low cost.

LAND USE / LAND COVER DATA

The USGS has developed a LU/LC data set compiled from 1:58,000 color infrared aerial photography and mapped at the 1:250,000 scale.

The data sets were generated by both manual digitizing and scan digitizing.

The LU/LC classes include urban areas, agricultural land, rangeland, forest, wetlands, barren land and tundra.

Associated maps provide political boundaries, hydrological units (watershed boundaries), federal land ownership, and census subdivisions.

Data are available for about 75% of the US. A separate file is being developed for Alaska using a different classification scheme and automated classification of digital satellite imagery.

CENSUS-RELATED DATA SETS

In Canada and the US, the agencies responsible for disseminating census data provide a number of digital data sets that can be input to a GIS.

Census and other statistical data are provided in the form of attribute data sets coded by geographic location.

Enumeration districts, street addresses, postal codes, census tracts and other similar codes are used.

Spatial data sets are provided that can be linked to the attribute data sets by means of these area codes.

Street networks in metropolitan areas, census tract boundaries, and political boundaries are examples of the spatial data sets commonly available.

The spatial and attribute data sets are used together to produce special purpose maps and to retrieve information for selected geographic areas. They are also used for more specialized analyses including address matching, district delineation, and network analysis.
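
Retrieving information for selected areas through a shared area code can be sketched as follows. The tract codes, outlines, population figures, and function names are all invented for the example. Real census files use their own coding schemes and formats.

```python
# Spatial data set: tract outlines keyed by tract code (made-up values).
tract_boundaries = {
    "101": [(0, 0), (5, 0), (5, 5), (0, 5)],
    "102": [(5, 0), (10, 0), (10, 5), (5, 5)],
    "103": [(0, 5), (10, 5), (10, 10), (0, 10)],
}

# Attribute data set: one record per tract, coded by the same tract code.
census_records = [
    {"tract": "101", "population": 4200},
    {"tract": "102", "population": 6100},
    {"tract": "104", "population": 7300},  # no matching outline
]

def index_by_code(records, code_field):
    """Build a lookup table from area code to attribute record."""
    return {record[code_field]: record for record in records}

def select_areas(boundaries, records, code_field, condition):
    """Return outlines and attributes of the areas that meet a condition."""
    by_code = index_by_code(records, code_field)
    selected = {}
    for code, outline in boundaries.items():
        record = by_code.get(code)
        if record is not None and condition(record):
            selected[code] = {"outline": outline, **record}
    return selected

busy = select_areas(tract_boundaries, census_records, "tract",
                    lambda record: record["population"] > 5000)
print(sorted(busy))
```

Only areas present in both data sets can be mapped, so a record whose code has no matching outline drops out of the result.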