GIS and Public Health Geocoding Lab, Part 1, 2013
Instructor: Thomas Talbot
Geocoding refers to the process of adding locational information (usually latitude and longitude coordinates) to address records. It is useful for presenting information on a map and essential for conducting most types of spatial analysis. There are a number of public and private reference files that contain locational information for addresses throughout the U.S., and to varying degrees these can be accessed for free on the internet. In addition to the reference files, there are software tools that link the addresses you want to geocode to the reference files.
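To make "linking addresses to a reference file" concrete, here is a minimal Python sketch of exact-match geocoding. It assumes a tiny hypothetical reference table keyed by a normalized address string. The address, coordinates and abbreviation table are placeholders, not real reference data, and the tools used in this lab do far more elaborate address parsing and interpolation.

```python
import re

# Hypothetical reference file: normalized address -> (latitude, longitude).
# The entry below is a placeholder for illustration, not real reference data.
REFERENCE = {
    "1 EXAMPLE ST ALBANY NY 12222": (42.0, -73.0),
}

# Hypothetical abbreviation table used to standardize street types.
ABBREVIATIONS = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD"}


def normalize(address: str) -> str:
    """Uppercase, replace punctuation with spaces, collapse whitespace,
    and abbreviate common street types."""
    cleaned = re.sub(r"[^\w\s]", " ", address.upper())
    words = [ABBREVIATIONS.get(word, word) for word in cleaned.split()]
    return " ".join(words)


def geocode(address: str):
    """Return (lat, lon) if the normalized address is in the reference file, else None."""
    return REFERENCE.get(normalize(address))


print(geocode("1 Example Street, Albany, NY 12222"))
print(geocode("99 Nowhere Rd, Albany, NY 12222"))
```

The first lookup succeeds only because normalization turns "Street" into "ST" and drops the commas; the second address is absent from the reference table and returns None, which is what an unmatched record looks like.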
In this lab you are going to geocode 500 addresses using two different geocoding tools. You will add the geocoding results (latitude and longitude) from both tools to the same file and then compare them.
The first tool was developed by Dan Goldberg at Texas A&M University. The site offers many features besides geocoding that you might also consider using in the future. The second geocoding tool you will use is ArcMap.
The first step in this exercise is to download the people500.csv file from Tom Talbot’s website: http://www.albany.edu/faculty/ttalbot/
1. Texas A&M University Geocoding
a. Sign up as a new user at http://geoservices.tamu.edu/UserServices/Signup.aspx
b. Once signed up and logged in, navigate to the geocoding page and select Batch Geocoding.
Steps 1 through 4 are fairly self-explanatory. Upload the people500.csv file.
Check the box “First row contains column headings”.
This is a set of 500 randomly chosen residential addresses from Albany County. The sample file is a comma-delimited (CSV) file.
Upload and then validate the people500.csv table (for example, the copy you saved to your flash drive). The site should report back the first 10 records.
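If you want to preview the file yourself before uploading it, the check the site performs (treat the first row as column headings, then show the first 10 records) can be roughly approximated with Python's standard csv module. This is a sketch: it makes no assumption about the column names in people500.csv, and the two-row sample at the end is made up so the code runs on its own.

```python
import csv
import io
import itertools
from typing import Dict, List, TextIO


def preview(csv_file: TextIO, n: int = 10) -> List[Dict[str, str]]:
    """Treat the first row as column headings and return the first n records,
    each as a dict keyed by heading."""
    reader = csv.DictReader(csv_file)  # first row becomes the field names
    return list(itertools.islice(reader, n))


# Usage with the real file (keys are whatever headings people500.csv contains):
#   with open("people500.csv", newline="") as f:
#       for record in preview(f):
#           print(record)

# Made-up two-row sample so the sketch runs on its own.
sample = io.StringIO("id,address\n1,10 A St\n2,20 B Ave\n")
for record in preview(sample):
    print(record)
```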
Next, choose the “geocoding” service and advance to Step 3.
Choose the correct input fields and choose all the optional output fields.
Advance to Step 4.
There are several options available for allowing inexact matches. Learn about them by clicking on the question mark icon. Set the parameters as shown in the screen below.
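The TAMU inexact-matching options are documented on the site itself (via the question-mark icons). The sketch below only illustrates the general idea of scoring near-matches, using Python's difflib.SequenceMatcher similarity ratio. This is a swapped-in technique, not the algorithm TAMU uses, and the candidate street names and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Iterable, Optional, Tuple


def best_match(name: str, candidates: Iterable[str],
               threshold: float = 0.8) -> Optional[Tuple[str, float]]:
    """Return the candidate most similar to `name` together with its similarity
    ratio (0 to 1), or None if no candidate reaches the hypothetical threshold."""
    best = None
    for candidate in candidates:
        score = SequenceMatcher(None, name.upper(), candidate.upper()).ratio()
        if best is None or score > best[1]:
            best = (candidate, score)
    if best is None or best[1] < threshold:
        return None
    return best


# Hypothetical reference street names and a misspelled input.
streets = ["WASHINGTON AVE", "WESTERN AVE", "CENTRAL AVE"]
print(best_match("Washingtn Ave", streets))
```

Raising the threshold makes matching stricter (fewer matches, fewer false positives); lowering it allows more inexact matches, which is the same trade-off the TAMU parameters control.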
Choose Start Process. Click View Process to check on the progress. It normally only takes a minute or two, but with an entire class accessing the site at once it might be longer than this.
When finished, click on My Databases and download the results. Open the file in Excel (remember, it is a CSV text file).
2. ArcGIS Geocoding
Next, open in ArcMap the file you just geocoded with the TAMU Geocoding Tool. This file contains latitude and longitude fields.
Once the people500.xls file is in the Table of Contents window, right-click on it and click Geocode. Select 10.0 U.S Geocoding Service.
Next, open the attribute table of the geocoded addresses and sort it by the geocoding score. Addresses that did not geocode have a score of 0. You will need to count these to calculate the ArcMap geocode match rate (see the lab homework assignment at the end of this document).
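Counting the zero-score records and turning them into a match rate is simple arithmetic. Here is a minimal Python sketch, assuming you have the geocoding scores as a list of numbers (for example, copied out of the attribute table); the scores in the example are made up.

```python
from typing import Sequence


def match_rate(scores: Sequence[float]) -> float:
    """Match rate = addresses geocoded / total addresses,
    treating a score of 0 as 'did not geocode'."""
    if not scores:
        raise ValueError("no scores supplied")
    unmatched = sum(1 for s in scores if s == 0)
    return (len(scores) - unmatched) / len(scores)


# Made-up example: 4 addresses, one of which did not geocode.
print(match_rate([100, 87.5, 0, 92]))  # 0.75
```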
Next, you will add the ArcGIS geocoded coordinates to the geocoding result table. This is done with the following tool in ArcMap:
Geoprocessing -> ArcToolbox -> Data Management Tools -> Features -> Add XY Coordinates
You will see two new fields in the geocoding result file: POINT_X and POINT_Y.
Now you will measure, for each address, the distance in meters between the TAMU geocoding result and the ArcMap geocoding result.
Create a new field named “Meters” to hold these distances.
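The formula for calculating distance is as follows:
Distance (meters) = 111319.9*SQRT((COS(RADIANS((y1+y2)/2))^2)*(x2-x1)^2 + (y2-y1)^2)
where
x1, y1 = longitude, latitude (TAMU coordinates)
x2, y2 = POINT_X, POINT_Y (ArcGIS coordinates)
Here 111319.9 is approximately the length in meters of one degree of latitude (or of longitude at the equator), and multiplying the longitude difference by the cosine of the average latitude accounts for meridians converging toward the poles.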
You can either work out how to calculate the distance in ArcMap, or export the shapefile's attribute table to Excel and do the calculation there.
The equivalent expression for the ArcMap Field Calculator (VB Script parser) is:
111319.9*SQR( (COS(3.141593/180*( ( [Latitude] + [POINT_Y] ) /2))^2)*( [POINT_X] - [Longitude] )^2+ ( [POINT_Y] - [Latitude] )^2)
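The lab's distance formula can also be written as a short Python function, which may be handy for spot-checking a few rows outside ArcMap or Excel. It follows the formula term by term (decimal degrees in, meters out, 111319.9 meters per degree); the function name is illustrative only.

```python
import math


def distance_m(x1: float, y1: float, x2: float, y2: float) -> float:
    """Approximate distance in meters between (x1, y1) = TAMU (longitude, latitude)
    and (x2, y2) = ArcGIS (POINT_X, POINT_Y), all in decimal degrees.

    The longitude difference is scaled by cos(mean latitude), exactly as in the
    lab formula, and one degree is taken as 111319.9 meters.
    """
    mean_lat = math.radians((y1 + y2) / 2)
    dx = math.cos(mean_lat) * (x2 - x1)  # east-west offset, scaled to degrees of latitude
    dy = y2 - y1                         # north-south offset in degrees
    return 111319.9 * math.sqrt(dx ** 2 + dy ** 2)
```

For example, two points 0.01 degrees of latitude apart on the same meridian come out about 1113.2 meters apart. Writing cos²·Δx² as (cos·Δx)² changes nothing mathematically; it just keeps the east-west and north-south components visible.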
Lab Homework
1. Provide a screenshot of the geocoded addresses in ArcMap with the Albany County boundary showing. NY county boundaries can be found on ArcGIS Online.
2. Calculate the median distance between the TAMU and ArcMap points. Calculate the geocode match rates for the TAMU Geocoder and the ArcMap Geocoder.
Match rate = number of addresses geocoded / total number of addresses.
3. Are the geocode match rates different? If so, why?
4. Calculate the 95th percentile of the distances, i.e., the distance below which 95% of the distances fall.
5. Pick 5 addresses that did not geocode in ArcMap and explain why they may not have been geocoded. If the addresses were misspelled or had the incorrect ZIP code, suggest the correct spelling and ZIP code. You may want to use a variety of sources to try to locate, verify and correct these addresses. Explain what sources you used.
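For homework items 2 and 4, Python's statistics module can compute the median and the 95th percentile of the Meters column. This sketch assumes the distances are already in a list. With method="inclusive", statistics.quantiles interpolates the same way as Excel's PERCENTILE.INC, so the result should agree with an Excel calculation. The sample values are made up.

```python
import statistics
from typing import Sequence, Tuple


def summarize(distances: Sequence[float]) -> Tuple[float, float]:
    """Return (median, 95th percentile) of the distances in meters.

    method="inclusive" uses linear interpolation between sorted values,
    matching Excel's PERCENTILE.INC.
    """
    if len(distances) < 2:
        raise ValueError("need at least two distances")
    median = statistics.median(distances)
    # quantiles(n=100) returns the 99 percentile cut points; index 94 is the 95th.
    p95 = statistics.quantiles(distances, n=100, method="inclusive")[94]
    return median, p95


# Made-up distances in meters.
print(summarize([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))  # (5, 9.5)
```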
Lab Homework is due April 18, 2011