QGIS ExercisesLab 8: Importing, Joining Tables

Lab 8: Spatial Selection, Importing, Joining Tables

What You’ll Learn:This lesson introducesspatial selection, importing text files into an ArcMap table, combining rows, and navigating tricky joins. The work is organized as two small projects, the first with step-by-step instructions, and the second less so. You should have read, and be ready to refer to Chapters 7 and 8 in the GIS Fundamentals text. This project requires synthesis of what you’ve learned up until now.

Data are in the L8\ directory, separated by projects.

What You’ll Produce: Three maps, one of U.S. NASS data, one of California county income, and one of California counties with parks or forests.

Project 1:Select by Proximity and Adjacency

Work in GIS often involves analyzing the locations of features in relation to other features. Two types of relationships are proximity (how close features are) and adjacency (features that share the same boundary).

Finding points near lines

Open a new data frame, and add the datafrom L8\Project1 the data:-twocity_Mercator.shp,

-35W_Mercator_Q.shp,

-USA_48_Mercator.shp.

Choose VectorSpatial Query from main menu (Video:Lab 8 Select Adjacent). If you don’t see the tool, make sure you have the plugin installed

This should open a spatial query window (see figure below).

Specify to:

-select source features from

“twocity_Mercator”

-Where the feature “Intersects”

-Reference feature of

“35w_MercatorQ”

Then click Apply, and thenClose.

On the Map, the point that represents the city of Minneapolis should be yellow, which means it has been selected.

We often use this type of operation prior to some other analyses. For example, we may wish to locate a shipping center on a major highway. If we wished to find a location “near” a major highway we would need to add a proximity value to the search by buffering one of the features (covered in a future exercise).

Now, we will find all states (polygons) that are intersected by 35W.

First, clear the previously selected features by

View->Select-> Deselect from all Layers

You could also use the toolbar button

Perform another Spatial Query (as above), with

-The Target Layer(s) as 35w_MercatorQ.

-Source layer asUSA_48_Mercator.

-Select “Where the feature Intersects”.

-Then Apply and OK.

The six polygons representing states that intersect the 35W should be selected, similar to the figure to the right.

Selecting by adjacency

Now we’ll select polygons that are adjacent to a selected set of polygons, in this case, all states that share a boundary with a selected set of states.

-First, Clear Selected Features (as described above).

- Select the USA_48_Mercator layerin the TOC. Then activate the Select Single Feature.

The Select Single Feature allows selection of one feature (here the state polygons)

The other select cursors allow selection of groups of features by left clicking and dragging the cursor.

For now, just select one state. For the example below, select Colorado, a rectangular state in approximately the middle of the USA (below left).

Now switch to the Select Features by Rectangle and carefully drag a rectangle just slightly larger than Colorado (below right).

This demonstrates how manual selection allows you to pick features. We won’t do anything more now, but you could save or process selected features.

Project 2: Adding a Text Table, Joining to a Shapefile

This project introduces something quite common, joining ASCII tabular data with a shapefile. Here, we will combine a text file on corn production for US counties with a county shapefile, but there are many other types of tabular data that are available as text files, summarized on a county basis, including population, voting, education, income, crime, air pollution, and many other social, political, and environmental data.

Here we import a text file, convert it to a QGIS compatible table, and edit the table, deleting columns, creating join items, and combining rows before joining it with a polygon shape file. These are all common operations when ingesting tabular data.

Start QGIS, and add lwr48.shp from the L8\Project2\ subdirectory.

Now add the text file cnty26.csv to this data view, using the Add Delimited Text Layer tool, found by the single quote icon, usually along the left-hand main frame of QGIS.

As indicated with the arrow in the figure above, select No geometry (attribute only table), otherwise accept the defaults options.(Video Adding Text)

Once loaded into your data view, right click on the layer and select Open the Attribute Table for viewing.

This file contains 1996 seed corn production, in bushels, for counties in the United States. These data were downloaded from the National Agricultural Statistical Service website, and we’re most interested in the columns:

Stfips: the state Federal Information Processing System (FIPS) code,

CoFips: county FIPS code,

Harvested: the acres harvested for a given yield category in a county,

Yield: Bushels per acre harvested for the yield category,

Production: Total bushels produced (yield times harvested) for the given yield level.

Unfortunately, we can’t directly edit the .csv file, so we must convert it to a DBF file (also called dBase table). Close the table viewer.

Select the cnty26 layer in the Layerswindow, right click andSave Asand save all records to the Project2 subdirectory, naming it something like “raw_corn_dat”.

Select DBF file as the “Format”. Check the box for “Add saved file to the map”

Thenremove the cnty26.csv from the data frame, to reduce clutter.

We want to delete all columns exceptStfips, CoFips, Harvested, Yield, and Production from the raw_corn_dat attribute table.

First toggle the editing button (to start editing), then

Delete columns by left clicking Delete Attributes button then select the all the columns you wish to delete and select OK.

Untoggle the editing button and save your changes to raw_corn_dat.dbf

Close the Attribute table.

We now want to join this data with the county shapefile, lwr48.shp. Unfortunately, there are two problems. First, we don’t have a ready-made key for the join. There is no column that maps cleanly from the raw_corn_dat.dbf file to the lwr48shapefile.

Open the Attribute Table for the lwr48 shapefile.

Notice that lwr48.shp also has the county and state FIPS codes, in the COUNTY and STATE columns, respectively. Each state has a unique FIPS code, and each county within a STATE has a unique code. If we combine the STATE-COUNTY codes, we can create a unique ID for each county in the country.

Toggle the editing button (to start editing),

Add columns by left clicking New Column button then create a new column. Name this column sta_count and assign the Type to be a Whole number (integer), with at width of 8 (precision).

Use the Field calculatorto assign a unique FIPS code to the sta_count variable.

State variables are two digit, while county variables can have up to three digits

Specify “Update existing field”

Multiplying the STATE by 10000 and adding to COUNTY creates a unique 5-digit code, with the value for STATE in the first two digits, and the value for COUNTY in the next three digits.

"STATE" * 10000 + "COUNTY"

Save your changes to lwr48.shp and stop editing, by clicking again on the toggle edit button.

Open the raw_corn_dat.dbf file, add a sta_count column similar to the one in lwr48, and create a value for a new column using the field calculator, according to:

“Stfips” *10000 + “CoFips”

Now, sort the raw_corn_dat table in ascending order according to this new column, by right clicking/selecting in the sta_count column.

You should have a window that looks something like the figure above, with the sta_count shown at the far right of the table.

The next step requires one of the measurement fields to be summarized. However you must change the format of the field from Integer to Real for the summary to work. Chose you field Harvested, Yield or Production and convert it to Real. This is done by using the Field calculator as show to the right. I used Harvested but you could use either of the measurement fields. I named my field Harvest_R

Untoggle the editing button and save your changes to raw_corn_dat.dbf.

Now select Plugins and ensure you have the Group Stats plugin installed. It should have a black check beside the Group Stats line.

Next find the Group Stats toolbar. It is most likely in the extreme lower left of the main QGIS window.

Click on Group Stats and

Select raw_corn_dat in the Layers dropdown and

Drag sta_count to the Rows box,

Drag sum and Harvest_R to the Value box. (see left)

Click onCalculate.

Your output should appear as shown to the right.

Next click on Data and Save All to CSV file.

Name the file

SumCorn.csv

Now add the SumCorn to you Layers panel with the Create a Layer from a
Delimited Text File. Make sure you check the no geometry (attribute only table) option and select Custom delimiters. (See below)

This adds the SumCorn to your Layers Panel.

Open the SumCorn file and sort ascending by sta_count. Your summary field should be names “None”. We will rename this field later, for now just remember you summarized measurement field is called “None”.

Check to see that the sta_count with a value of 10001 has a None value of 5700 (this assumes you use Harvested as the measurement field. If not check back and verify the accuracy of your summarized field; you can get a check value if you sort raw_corn_dat by sta_count and manually add up the entries for the 10001 value.)

Remove the raw_corn_dat file from the map.

Now, join the SumCorn table you just created to the lwr48.shp file (right click on lwr48.shp in Layers panel, select Layer Properties of lwr48.shp, then Joins. Use the green Plus button to begin the join).

The join layer is Sum_Corn, the join fields and target fields are sta_count

Select OK and then Apply, OK

Examine the lwr48.shp file to see the new columns added to the right.

Note there are some counties that do not grow corn. They will not have any joined data from the Sum_Crop file.

Your lwr48.shp file should look something like this:

We want to further process this combined data. Because many operations are restricted on joined files, it is best to save a copy to a new file.

Right click on the lw48.shp in the Layers panel then left click “Save as” and name and save the file appropriately, something like US_corn.shp. Make sure you select ESRI Shapefile for the format.

Add this new file to your data frame, and remove the lwr48 file and summary production tables.

Open the Attribute table, toggle Editing, Select all the records where SumCorn_No 1 >= 0 then, Invert the Selection (see figure to the right).

Use the Field calculator to assign a value of 0 to all the records that have NULL values. Then Clear selected features.

Now, prepare your data for output. First, change the Projection (CRS) of the Project to EPSG:102003, the USA_Contiguous_Albers_Equal_Area_Conic.

Symbolize the US_corn.shp file, using the Layer

Properties, Graduated,useSumCorn_No.

Use a gradient color ramp between two distinct colors (below from RdYlGn).

Specify about 10 classes in a Natural Breaks scheme.

Right click on the first range and select Symbol, then Simple fill then No Pen for the border style.

Select Ok, Apply, OK.


Refine the display of the labels of the Quanties of Sum_No (really Sum Harvest) again by right clicking on the first range manually updating the label for each of the 10 categories. Select Ok, Apply and OK to complete the changes to yourStyle.

Add the states.shp with a hollow symbology and dark outline color. Create an appropriately annotated layout, with title, north arrow, name, legend, and other descriptive elements, and produce and submit a pdf of the layout.

Project 3

Your third task is to produce two (2) maps on the same layout, one showing average income by county in California, and a second map showingcounties with state parks or forests. These instructions will be less detailed than previous lessons, as we omit most steps we’ve covered in previous lessons. The goal is for you to synthesize these previously-taught tools on your own.

Start a new blank ArcMap project. Do not add any layer yet.

Maps must be produced in a UTM Zone 11, NAD83 projection (EPSG: 26911). The original data has map units of decimal degrees. Use QGIS to reproject the data, to the UTM coordinate system.Video: Lab 8 (Reproject)

Data for this project are in L8\project3\ subdirectory.

California county boundaries are in Cal.shp.

After you reproject the Cal.shp file add the reprojected layerto a NEW Project. This will eliminate the “on-the-fly-reprojection” for the rest of the exercise.

Income.dbf is a database file that lists average per capita income by county; add this to file to the Income data frame.

Rec.dbf is a database file that list recreation features an properties, including county location, add this file to the Recreation data frame.

The income, recreation, and county files have a common attribute – cnty_name. This common field allows you to join these files together.

Income Map

You need to produce a map showing only those counties with an average per capita income greater than $16,000. Create this map in a new data frame.

Video: Lab 8 (Income)

The simplest approach uses binary indicator attribute. Select values from the table and then create a new attribute and assign the value 0 (counties below 16,000 per capita income) or the value 1 (counties above 16,000)].

Park/Forest Map

You need to create a map (to turn in) of California that colors counties that contain a park, a forest, or both. Create this map in a new data frame. The database file named rec.dbf lists many recreation types, including parks and forest. We must combine this with the state county outline layer through a join, but this join is problematic, because the source rec.dbf table does not have a proper key for this join. We have to create a proper key before joining, removingmany-to-one relationship for the counties in rec.dbf with counties in the Cal.shp file. There may be multiple entries including for each county, one for each park, forest, reservoir, or other features found in a county. You need to develop a list of counties with parks or forests from this rec.dbf. Video: Lab 8 (Park_Forest).

One way is to open the rec.dbf table and select all those with parks, and save your selected records to a Parks table. Repeat this process, selecting and saving only the records with forests to a Forests table.

Then create indicator (binary, or 0/1) variables in both tables, for example, create a new field named something like “has forest” in theforest table, and set this value equal to one for each record in the table, and create a similar “has park” column of ones in the parks table.

Now join the parks and forests tables to Cal.shp, and then select those rows with a park or aforest. There is a map showing the correct set of counties near the end of these instructions.

Another, bit trickier option is to create both the parks table and the forests table, as above, but instead of adding the 0/1 binary variable to each, do a serial join. First join the parks.dbf you created with the list of counties that have parks to the Cal.shp table, then join the forests.dbf table to the Cal.shp table (with parks still joined). Then select the records that have either a park or forest in the respective columns, and assign these an indicator variable that you’ll use to symbolize your map.

Video: Lab 8 (Correct Legend)

A note of caution; this is a tricky exercise, and many students do not produce this final map correctly. The main problem comes from multiple entries (parks or forests) for each county. You need to be very careful in the table joins, and look at the maps you produce. Make sure your final product makes sense. One helpful guide may be the flowchart or the maps of the respective component maps; in this case a map of those counties with forests, and a separate map of those counties with parks. The “OR” condition should include data from both joined files, so your final project 2 map should have both colored in.