DLI/ACCOLEDS Training 2006
Simon Fraser University, Vancouver, BC
December 6-8, 2006
PCCF/GIS Mapping Exercise
Part 1 – Creating a map of individual locations: PCCF geocoding
In this first part we will be creating a map of the locations of the DLI contacts in British Columbia. We will use pccf59_MAR06_fccp59.txt – the most recent BC PCCF file – to geocode the DLI contacts in bc_dli.sav.
1. Open pccf.sps.
2. Change the file names in the data list file= and save outfile= commands to the ones provided by the session coordinator.
3. Run the syntax file.
4. Filter the records that have sli=1 and delete the others. The single link indicator (sli) is used to establish a one-to-one relationship between postal codes and dissemination areas, blocks or block-faces.
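For readers who want to see the logic of step 4 outside SPSS, here is a minimal Python sketch. It is only an illustration, not part of the workshop's SPSS procedure. The records, the `da` field and its values are made up; only the variable name sli comes from the exercise.

```python
# Hypothetical PCCF-like records: one postal code can link to several
# dissemination areas, but only one of those links carries sli = 1.
records = [
    {"pcode": "V2C5N3", "da": "A", "sli": 1},
    {"pcode": "V2C5N3", "da": "B", "sli": 0},
    {"pcode": "V5A1S6", "da": "C", "sli": 1},
]

# Keep only the single-link records (the equivalent of filtering on sli = 1
# and deleting the unselected cases).
single_link = [r for r in records if r["sli"] == 1]

for r in single_link:
    print(r["pcode"], r["da"])
```

After this filter each postal code points to one dissemination area, although duplicates caused by retired postal codes can still remain, as the next step explains.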
5. Unfortunately, a small number of duplicate postal codes will remain because retired postal codes are still in the file. These will need to be removed.
You will now see the output screen indicating the number of duplicate records:
Close this screen and do not save the file.
6. We will now filter out the duplicate retired postal codes by using the PrimaryFirst variable. Because we have sorted by retirement date, and active postal codes all have a retirement date of 19000001, the active postal code is the first case in each group of duplicates and therefore has PrimaryFirst = 1.
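The reasoning in step 6 can be sketched in Python as follows. This is an illustration under stated assumptions, not the SPSS procedure itself. The field name `ret_date` and all record values are hypothetical; the 19000001 retirement date and the PrimaryFirst name come from the exercise.

```python
# Hypothetical records; "ret_date" stands in for the PCCF retirement date,
# which is 19000001 for active postal codes.
records = [
    {"pcode": "V2C5N3", "ret_date": 20040115, "lat": 50.1, "long": -120.1},
    {"pcode": "V2C5N3", "ret_date": 19000001, "lat": 50.2, "long": -120.2},
    {"pcode": "V5A1S6", "ret_date": 19000001, "lat": 49.3, "long": -122.9},
]

# Sort by postal code, then by retirement date. 19000001 is smaller than any
# real retirement date, so the active record leads each group.
records.sort(key=lambda r: (r["pcode"], r["ret_date"]))

# Flag the first case in each group of identical postal codes.
previous = None
for r in records:
    r["PrimaryFirst"] = 1 if r["pcode"] != previous else 0
    previous = r["pcode"]

# Keep only the flagged cases: one record per postal code.
unique = [r for r in records if r["PrimaryFirst"] == 1]
```

Note that a postal code with only retired records would still keep its earliest-retired record under this rule.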
7. Now sort the data by postal code in ascending order.
8. Save the file as bcpccf.sav.
9. Open bc_dli.sav. This file contains the names, institutions, and postal codes for the B.C. DLI reps.
10. Sort this file in ascending order by postal code.
11. We are now going to add the latitude and longitude variables from the bcpccf.sav file to the bc_dli.sav file: in essence we are geocoding the B.C. DLI contacts.
First we will make pcode (+) the key variable: select it in the Excluded Variables window, check Match cases on key variables in sorted files, select the External file is keyed table option, and click on the button (see below).
Exclude the unneeded variables by selecting them in the New Working Data File: window and clicking on the < button to move them to the Excluded Variables: window (see below). We will now have the following variables remaining in our working data file: DLIcont, Institution, pcode, lat, long.
This process has attached the latitude and longitude to the records for each B.C. DLI contact.
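The keyed-table match in step 11 is essentially a lookup of each contact's postal code in a table that has one row per postal code. A minimal Python sketch of that idea follows. All names, postal codes and coordinates are made up, and a dictionary stands in for the keyed table.

```python
# Hypothetical keyed table (one row per postal code), playing the role of
# bcpccf.sav after the duplicates are removed. Coordinates are made up.
pccf = {
    "V2C5N3": {"lat": 50.67, "long": -120.37},
    "V5A1S6": {"lat": 49.28, "long": -122.92},
}

# Hypothetical contact records, playing the role of bc_dli.sav.
contacts = [
    {"DLIcont": "Contact A", "Institution": "Institution A", "pcode": "V5A1S6"},
    {"DLIcont": "Contact B", "Institution": "Institution B", "pcode": "V2C5N3"},
    {"DLIcont": "Contact C", "Institution": "Institution C", "pcode": "V9Z9Z9"},
]

# Look up each contact's postal code in the keyed table. Contacts without a
# match keep missing (None) coordinates instead of being dropped.
geocoded = []
for c in sorted(contacts, key=lambda c: c["pcode"]):
    match = pccf.get(c["pcode"], {})
    geocoded.append({**c, "lat": match.get("lat"), "long": match.get("long")})
```

Checking for records with missing coordinates after the match is a quick way to spot postal codes that are mistyped or absent from the PCCF.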
12. The last step in this process is to save the data as a dbf file so that we can import it into ArcGIS. Name the file bc_dli.dbf. Now we are finally ready to map.
Part 2 – Creating thematic maps: PCCF geocoding
In this second part we will be creating two types of maps. The first type will be a thematic map of non-STC data using census boundaries. The second type will be a thematic map of non-STC data in conjunction with a census variable. The pre-GIS preparation for both map types will be completed in this part. We will be using numeracy data from B.C. School District 73’s JAKE (Justification and Accountability in Kamloops Education) database. Permission to use the data has been given by CEEDS (Centre for Early Education and Development Studies) at TRU. The numeracy data is contained in numeracy.sav.
1. Open the bcpccf.sav file that you created in the first part.
2. Since we will be mapping at the census tract level we will need to create a new variable ctid that will be compatible with the census tract number in the census tract boundary file.
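The handout does not show how ctid is built, so the following Python sketch rests on an assumption: that the identifier follows the Statistics Canada census tract convention of a 3-digit CMA/CA code followed by the census tract name in ####.## form. The function name `make_ctid`, its parameter names and the sample values are hypothetical.

```python
def make_ctid(cma, ctname):
    """Join a CMA/CA code and a census tract name into one identifier.

    Assumption: a 3-digit, zero-padded CMA/CA code followed by the tract
    name formatted as ####.## (7 characters).
    """
    return f"{int(cma):03d}{float(ctname):07.2f}"


# Illustrative values only.
example = make_ctid("925", "1")
print(example)
```

Whatever rule is used, the result must match the tract identifier in the boundary file character for character, including leading zeros and the decimal part.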
3. Save the file as bcpccf.sav.
4. Open the numeracy.sav file.
5. Sort the data in ascending order by the postal code variable pcode.
6. We will now geocode the numeracy.sav file with the ctid variable from the bcpccf.sav file. This is a similar process to step 11 in Part 1.
Exclude all the variables except ID, FSA_NU, and ctid. The key variable will be pcode and the external file is the keyed table.
7. For the purpose of this exercise we will identify those students with low numeracy scores: 2 = “Does not meet expectations”.
8. Since we are changing our unit of analysis from a student to a census tract it will now be necessary to aggregate the students by census tract. Therefore, the first step will require the creation of two dummy variables that will be used in this aggregation. First we will create the variable numcount to count the students within census tracts not meeting numeracy expectations.
Select FSA_NU from the left window and fill in the Output Variable’s Name: and Label. Click on Change and then click on Old and New Values.
Finally change to and click on .
Click on Continue and then click on OK in the next window.
9. The next step is to create the dummy variable totcount to count all the students in a census tract. This will be used to calculate the percentage of students with a low numeracy score after we aggregate the individual students into census tracts. Each student with a score of 2, 3, or 4 will be assigned a value of 1. All other scores are missing values and will not be counted. Select Transform > Compute from the menu bar, name the Target Variable totcount and give it a numeric expression of 1. Click on If.
In the next window, bullet Include if case satisfies condition, insert the formula below and click on Continue.
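The two dummy variables from steps 8 and 9 can be sketched in Python as below. This is an illustration only. The student records are made up, and assigning 0 to numcount for scores other than 2 is an assumption, since the handout does not show the value used for the remaining scores.

```python
# Hypothetical FSA_NU scores; None stands for a missing score.
students = [
    {"ID": 1, "FSA_NU": 2},
    {"ID": 2, "FSA_NU": 3},
    {"ID": 3, "FSA_NU": 4},
    {"ID": 4, "FSA_NU": None},
]

for s in students:
    score = s["FSA_NU"]
    # numcount: 1 for "Does not meet expectations" (score 2).
    # Assumption: every other case gets 0.
    s["numcount"] = 1 if score == 2 else 0
    # totcount: 1 for any valid score (2, 3 or 4); missing otherwise,
    # so students without a valid score are not counted.
    s["totcount"] = 1 if score in (2, 3, 4) else None
```

Because both variables are summed in the aggregation, a 0 and a missing value have the same effect on the census tract totals.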
10. Now we do the aggregation.
Our Break Variable will be ctid (census tract number) and our Aggregate Variable(s) will be numcount and totcount. We will also have to change the function of both aggregate variables to Sum with the Function button. Also note that we will create a new data file named aggr.sav.
After changing the function, click on Continue and then click on OK in the main Aggregate Data window. Open aggr.sav. When prompted, save numeracy.sav.
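The aggregation in step 10 sums numcount and totcount within each value of the break variable ctid. A minimal Python sketch of that grouping follows. The ctid values and student rows are made up, and treating a missing value as 0 in the sum is an assumption of this sketch.

```python
from collections import defaultdict

# Hypothetical student-level rows after the dummy variables are created.
students = [
    {"ctid": "9250001.00", "numcount": 1, "totcount": 1},
    {"ctid": "9250001.00", "numcount": 0, "totcount": 1},
    {"ctid": "9250002.00", "numcount": 1, "totcount": 1},
    {"ctid": "9250002.00", "numcount": 0, "totcount": None},
]

# Sum both dummy variables within each census tract (the break variable).
totals = defaultdict(lambda: {"numcount": 0, "totcount": 0})
for s in students:
    for var in ("numcount", "totcount"):
        totals[s["ctid"]][var] += s[var] or 0  # a missing value adds nothing

# One row per census tract, as in the aggregated file.
aggr = [{"ctid": ctid, **sums} for ctid, sums in sorted(totals.items())]
```

The unit of analysis is now the census tract: each row holds the number of low-scoring students and the number of students with a valid score.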
11. Now calculate the percentage of students in each census tract with low numeracy scores. This new variable numperc will be used for the second map.
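The handout does not print the formula for numperc. Assuming it is the count of low-scoring students divided by the count of students with a valid score, times 100, the calculation can be sketched in Python like this. The tract labels and counts are made up.

```python
# Hypothetical aggregated rows, one per census tract.
aggr = [
    {"ctid": "A", "numcount": 1, "totcount": 4},
    {"ctid": "B", "numcount": 0, "totcount": 0},
]

for row in aggr:
    # Percentage of students with a low numeracy score in the tract.
    # A tract with no valid scores gets a missing value instead of a
    # division-by-zero error.
    if row["totcount"]:
        row["numperc"] = 100 * row["numcount"] / row["totcount"]
    else:
        row["numperc"] = None
```

Tracts with very few students can produce extreme percentages, which is worth keeping in mind when reading the resulting map.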
12. Since we will also be comparing the low numeracy data with census data in our third map, we will now add a couple of census variables to our aggregated file. These variables have been extracted from Profile for Census Metropolitan Areas, Tracted Census Agglomerations and Census Tracts, 2001 Census - Cat. No. 95F0495XCB2001005. The data is contained in census.sav. There are two variables: lonefam (% of lone parent families) and faminc (median family income).
13. Our last step is to save the data in our aggregated file, aggr.sav, as numeracy.dbf.
Let’s go do some mapping now!
Peter Peller, Thompson Rivers University and
Laurie Shretlen, University of Calgary