ESTAT Tutorial (New As of May '05)

Getting to know ESTAT:

The Exploratory Spatio-Temporal Analysis Toolkit

ESTAT, the Exploratory Spatio-Temporal Analysis Toolkit, is an interactive Geographic Visualization (GeoVisualization) environment designed to support the exploration of complex spatial data. We designed and developed ESTAT in cooperation with the National Cancer Institute and its research staff.

The ESTAT toolkit is based on the open-source Java geovisualization environment, GeoVISTA Studio. Applications built in Studio are highly interactive and designed to operate across common platforms for researchers interested in visualizing and exploring geographic data.

The GeoVISTA Studio and ESTAT research and development efforts are housed at the Penn State GeoVISTA Center, part of the Department of Geography at Penn State.

This tutorial is designed to give you an introduction to the installation and use of ESTAT. For further information, you are invited to visit the official ESTAT homepage at:

This material is based upon work supported by the National Institutes of Health under Grant # R01 CA95949-01. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Institutes of Health, the National Cancer Institute, or the Pennsylvania State University.

Tutorial Contents:

1. Installing ESTAT

2. Using ESTAT – A Step By Step Guide

2.1 Loading Data and Creating Projects

2.2 Introducing ESTAT

2.3 Using the Scatterplot

2.4 Using the Bivariate Map

2.5 Using the Parallel Coordinate Plot

2.6 Using the Time Series Graph

3. Further Reading and Contact Information

1. Installing ESTAT

ESTAT, like all applications created from GeoVISTA Studio, requires that you have installed the latest version of Sun Microsystems’ Java platform on your computer.

You will need Java version 1.5 or newer in order to launch ESTAT. The Java 1.5 platform can be downloaded here.

Once you have successfully downloaded and installed Java 1.5, download the latest ESTAT build and follow these instructions:

1. Unzip the ESTAT archive into your C:\ root directory with the "use folder names" option selected. This is important because it ensures that the archive unpacks with the correct directory structure.

2. Using Windows Explorer, My Computer, or whichever file manager program you prefer, navigate to the C:\pcpHome\bin directory.

3. Right-click on the file called ESTAT.bat and "edit" it in a text editor, such as Notepad.

4. This will bring up a little bit of batch file code. The line that needs to be changed is: SET JAVA_HOME=C:\Program Files\Java\j2re1.5.0_02

5. Replace the path on this line with the location of the Java version installed on your computer. To figure this out, go to C:\Program Files\Java and note the name of the directory you find there. For example, the path on your computer may be C:\Program Files\Java\j2re1.5.0_02. You want the entire path, so type this after the equal sign on the SET JAVA_HOME= line.

6. Once you have done that, save the file. You should then be able to double-click the ESTAT.bat file to launch ESTAT.

7. Now, create a shortcut to the ESTAT.bat file by right-clicking on the file and selecting “Create Shortcut.” Place this shortcut on your desktop for easy access.
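If you are unsure which directory name to use in step 5, the lookup can be sketched in a few lines of Python. This is only an illustration and is not part of ESTAT: the helper names `version_key` and `java_home_line` are hypothetical, and the directory names in the example are assumptions about what C:\Program Files\Java might contain on your machine.

```python
import ntpath
import re


def version_key(dirname):
    """Return the trailing version of a name like 'j2re1.5.0_02' as a tuple of ints."""
    match = re.search(r"(\d+(?:[._]\d+)*)$", dirname)
    if match is None:
        return ()
    return tuple(int(part) for part in re.split(r"[._]", match.group(1)))


def java_home_line(java_root, dirnames):
    """Build the SET JAVA_HOME line for the newest Java directory in dirnames."""
    newest = max(dirnames, key=version_key)
    # ntpath joins with backslashes even if this sketch runs on a non-Windows machine.
    return "SET JAVA_HOME=" + ntpath.join(java_root, newest)


if __name__ == "__main__":
    # Example directory names; on a real machine you could use os.listdir(java_root).
    found = ["j2re1.4.2_06", "j2re1.5.0_02"]
    print(java_home_line(r"C:\Program Files\Java", found))
```

The printed line is what you would paste into ESTAT.bat in place of the existing SET JAVA_HOME line.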

2. Using ESTAT – A Step By Step Guide

Launch ESTAT

Double-click on the desktop icon named “ESTAT” to start the program. A DOS window will pop up briefly as ESTAT begins to launch. Leave this window open so that ESTAT launches successfully.

2.1 Using the Data Loader

When the application has launched successfully, click on the file icon in the upper left corner to launch the Data Loader. A small popup will appear to ask if you’d like to name a new project or use an existing one. Select the “new project” option and click “Next” to proceed.

The next dialog box in the Data Loading Wizard will ask you to specify a name for your project. This is the name that will be used to save your project details.

Now, type in “ESTAT Tutorial” for the project name:

Click “Next” and continue to the next portion of the Data Wizard:

Here, you can choose to load Primary Data (data viewable in the map, PCP, and scatterplot) and/or Time Series Data. There are some situations where adequate time series data does not exist, and in that case you could opt to load Primary Data alone. For the data set you will use, time series data does exist, so make sure both boxes are checked and click “Next” when you are ready.

Now, you need to specify the file paths for the different data sources you wish to use. For this tutorial we will be exploring a dataset of cancer mortality rates and socioeconomic variables for all of the counties in the lower 48 states of the U.S.

The Data Paths screen in the Data Wizard is split into two parts. The top portion is for the Primary Data. The bottom is for Time Series Data.

Starting in the Primary Data paths section, click the folder icon on the right side of the “Observations” path area. Then navigate to the pcpHome/data directory, which will be located at C:\pcpHome\data if you installed ESTAT according to the instructions at the beginning of this document.

Select the file called ‘USA_Cancer_Ob.csv’ and click “Open.” This will cause the appropriate path to appear in the Observations path area. Repeat the same procedure for the Metadata file by selecting the ‘USA_Cancer_ObMeta.csv’ file. Finally, add the path for the shapefile called ‘USA_Cancer.shp’.

After you have taken care of the file paths for the Primary Data, continue to the Time Series data and select the ‘USA_Cancer_Ts.csv’ file for the Time Series path, and the ‘USA_Cancer_TsMeta.csv’ file for the Metadata path.
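If one of these files does not show up in the file chooser, a quick check of the data directory can help. The following is a minimal sketch, assuming the default install location from Section 1; the function name `missing_files` is hypothetical and the script is not part of ESTAT.

```python
import os

# File names taken from this tutorial.
PRIMARY_FILES = ["USA_Cancer_Ob.csv", "USA_Cancer_ObMeta.csv", "USA_Cancer.shp"]
TIME_SERIES_FILES = ["USA_Cancer_Ts.csv", "USA_Cancer_TsMeta.csv"]


def missing_files(data_dir, filenames):
    """Return the names in filenames that are not present in data_dir."""
    return [name for name in filenames
            if not os.path.isfile(os.path.join(data_dir, name))]


if __name__ == "__main__":
    # Assumed default location; change this if you unzipped ESTAT elsewhere.
    data_dir = r"C:\pcpHome\data"
    missing = missing_files(data_dir, PRIMARY_FILES + TIME_SERIES_FILES)
    if missing:
        print("Missing from", data_dir + ":", ", ".join(missing))
    else:
        print("All tutorial data files found.")
```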

When you’ve done this your screen should look similar to this:

Picking all of these paths is a bit complicated, but once you have done it one time it is saved in your project details and will be recalled automatically when you reload this project.

Click “Next” when you are ready to move on to the variable selection screen. Here you need to pick some (or a lot!) of the variables available in the primary and time series data that you’d like to explore.

We’ll start with the Primary Data area, in the top half of the window. Take a moment to scroll down the list of variables available for analysis on the left side. If you hold your mouse cursor over the “description” column, the full description will appear as a rollover. The descriptions are often longer than the space allotted in the table, so this can be a useful feature.

To help you manage all of these variables and select things systematically, there are two icons at the top left that you can use to sort and promote variables by category. Click the leftmost one, as shown below:

This will cause a popup menu to appear with a list of possible category choices. These categories are defined by the person who created the metadata file, a process that is described in Section 3.3 of this tutorial.

Select “Lung Cancer Mortality” from the dropdown list. This will cause all of the variables in that category to be promoted to the top of the list. You need to find the three variables in this group that cover the broadest range of Lung Cancer mortality – the Age Adjusted Rate of Lung Cancer Mortality for All Ages, All Races, Male + Female for the three time periods available (1993-95, 1996-98, 1999-2001). These three variables should be the last three in the category, as shown below:

Move these three variables over into the “Data for Analysis” area by clicking the top arrow button in the middle of the two main boxes.

This will send over those three variables to the other side. Next, add all of the variables from the category called “socioeconomic covariates.”

At this point your Data for Analysis window should look like this:

Now, move down to the Time Series Data area in the bottom portion of the variable selection screen. You want to find similar lung cancer time series data to look at, so use the category promotion icon to move “Lung Cancer Mortality” up to the top of the list. Select the three variables for the Age Adjusted Rate of Lung Cancer Mortality for All Ages, All Races, Male + Female for the three time periods available (1993-95, 1996-98, 1999-2001). Again, these should be the last three in the highlighted group that you just promoted.

The Data for Analysis screen should look like this when you are finished:

Click “Finish” when you are ready, and ESTAT will load these variables.

Note: Clicking “Finish” automatically saves the project and its details. When you return to ESTAT at a later date, you can choose “Load Existing Project” in the first screen, select “ESTAT Tutorial” from the list, and click “Finish” right away to skip variable selection and stick with whatever you selected last time you worked with that project.

2.2 Introducing ESTAT

ESTAT features four primary elements: a scatterplot, a bivariate map, a time series graph, and a parallel coordinate plot. Each of these visualization methods is dynamically linked to the others, so the selections you make and your mouse movements are coordinated throughout the application.

Once you have loaded data per the previous section of instructions, ESTAT will look similar to the following screen capture:

In the top left, you will find the bivariate scatterplot. At the bottom left is the bivariate map. At top right is the time series graph. At bottom right is the parallel coordinate plot. Each of these visualization methods is described in further detail in the following sections. You will use the data you have loaded to explore a few patterns as you learn details about how to use ESTAT.

2.3 Using the Scatterplot

We will begin with the Scatterplot element of ESTAT. By default it is located in the top-left quadrant of the program.

The Scatterplot in ESTAT is bivariate, meaning two variables are plotted against each other at a time. You can use the drop down boxes to select variables for the X and Y axes and the plot will change immediately to show you the distribution. Also, these selections are automatically linked to the map, since they are both bivariate tools.

The default settings aren’t particularly meaningful, so we’ll change them now to see something interesting. Start by double-clicking the colored box in the upper right-hand corner of the scatterplot window. This will launch the detailed version of the legend (also accessible by right clicking on this box and selecting that option).

First, select the variable “RLALLAALA9901” from the attribute dropdown box. This is the age-adjusted mortality rate for lung cancer among all ages, all races, and both genders for the period between 1999 and 2001.

By default the interactive legend is set up for a univariate color scheme and representation. This is useful if you want to look at one variable at a time. In the univariate case, you will see a histogram overlaid by the color boundaries (which are determined by the classification scheme chosen). This gives you an idea of the distribution of the data versus the classification scheme that is selected. As you can see, by default the equal interval method has some classes with lots of values and others with hardly any at all.

Choose the “Quantiles” option from the Classifier dropdown menu and notice the immediate change.

Quantile classification places an equal number of observations in each class. If you hit the “OK” button now, you will see that the visualizations all change to match this new scheme, and a lot more variation becomes visible.

So what are we looking at here, then? You can find out by rolling your mouse over the variable names in the dropdown menus on the scatterplot window (or the map window, for that matter). By default you should be seeing two time periods of lung cancer mortality plotted against each other, and in the map you will see only the first one. Notice the relatively strong pattern of lung cancer mortality in the Appalachian and Southern parts of the U.S., as well as a bit in the Pacific Northwest.
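To see why the two classifiers behave so differently, it can help to work through a small example by hand. The sketch below is an illustration only: the rate values are made up, the function names are hypothetical, and ESTAT's own rules for placing class boundaries may differ in detail.

```python
def equal_interval_breaks(values, n_classes):
    """Upper class boundaries that split the data range into equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    # Use the true maximum as the last boundary to avoid floating-point drift.
    return [lo + width * i for i in range(1, n_classes)] + [hi]


def quantile_breaks(values, n_classes):
    """Upper class boundaries chosen so each class holds about the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    # Index of the last observation that belongs to class i (1-based class number).
    return [ordered[(i * n) // n_classes - 1] for i in range(1, n_classes + 1)]


def class_counts(values, breaks):
    """Count how many values fall into each class (value <= upper boundary)."""
    counts = [0] * len(breaks)
    for v in values:
        for k, upper in enumerate(breaks):
            if v <= upper:
                counts[k] += 1
                break
    return counts


if __name__ == "__main__":
    # Made-up, right-skewed rates: most values are low, a few are very high.
    rates = [10, 12, 13, 15, 16, 18, 20, 25, 40, 90]
    print("equal interval:", class_counts(rates, equal_interval_breaks(rates, 5)))
    print("quantiles:     ", class_counts(rates, quantile_breaks(rates, 5)))
```

With skewed data like this, the equal interval classes are very uneven (most observations crowd into the lowest class), while the quantile classes each hold the same number of observations. This is the same contrast you see in the legend histogram.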

To make things a bit more meaningful, let’s plot the first lung cancer variable against a socioeconomic variable. Open the interactive legend again (double-click the small legend in the corner) and switch to Bivariate mode. Then select the variable “pctpoor” from the second dropdown box, as shown below:

Once you have done so, the distribution in the legend area will change to show you this bivariate relationship. Now, make sure you switch the Classifier 2 method to Quantiles as well, since it will be equal intervals by default. Once you have done this, click ‘OK’ to apply these changes and go back to ESTAT.

All of the elements in ESTAT should update to reflect your choices. Everything is now colored in a bivariate scheme, green/purple by default. You can use the axes in the scatterplot to interpret this scheme. Each observation is given a bit of green and a bit of purple depending on how high that place ranks in lung cancer mortality and percent impoverished, respectively. So places that are very green have high mortality and a low percentage of poor residents, and vice versa. Places that are very light green/purple are low in both, and places that are dark greenish-purple are high in both.
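The logic behind a bivariate scheme can be sketched as follows: each observation is classified once per variable, and the pair of classes decides the blend of the two colors. This is a simplified illustration, assuming three quantile classes per variable. The function names and data values are hypothetical, and ESTAT's actual class count and colors may differ.

```python
def quantile_class(value, values, n_classes):
    """Return the 0-based quantile class of value within values (rank-based)."""
    rank = sum(1 for v in values if v < value)  # number of strictly smaller values
    return min(rank * n_classes // len(values), n_classes - 1)


def bivariate_classes(xs, ys, n_classes=3):
    """Pair each observation's class on the first variable with its class on the second."""
    return [(quantile_class(x, xs, n_classes), quantile_class(y, ys, n_classes))
            for x, y in zip(xs, ys)]


def describe(pair, n_classes=3):
    """Translate a (mortality class, poverty class) pair into the reading used in the text."""
    mortality, poverty = pair
    top = n_classes - 1
    if mortality == top and poverty == top:
        return "dark: high in both"
    if mortality == 0 and poverty == 0:
        return "light: low in both"
    if mortality == top and poverty == 0:
        return "strongly green: high mortality, low poverty"
    if mortality == 0 and poverty == top:
        return "strongly purple: low mortality, high poverty"
    return "intermediate blend"


if __name__ == "__main__":
    # Made-up values for six places.
    mortality = [40, 55, 70, 45, 80, 60]
    poverty = [8, 20, 25, 22, 9, 15]
    for pair in bivariate_classes(mortality, poverty):
        print(pair, "->", describe(pair))
```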

You have surely noticed all of the activity caused by your mouse movements in ESTAT – this is called linked indication. Every time you mouse over an observation, it is highlighted in each view. You can also select groups by clicking and dragging a box or drawing a line. Drag a box now over the darkest purple-green portion of the scatterplot (high values in both variables).

Notice how you have now revealed only this subset in each of the visualization methods. This same selection technique works in every view in ESTAT. When a selection is made, ESTAT also provides correlation and r-squared values so you can examine the statistical relationship between the two variables.
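For readers who want to see what a box selection and the reported statistics amount to, here is a minimal sketch. It assumes the correlation is the ordinary Pearson coefficient computed over the selected observations, which this tutorial does not confirm. The function names and data values are hypothetical.

```python
import math


def select_in_box(xs, ys, x_min, x_max, y_min, y_max):
    """Return the indices of observations whose (x, y) point lies inside the box."""
    return [i for i, (x, y) in enumerate(zip(xs, ys))
            if x_min <= x <= x_max and y_min <= y <= y_max]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up values for six places.
    mortality = [40, 55, 70, 45, 80, 60]
    poverty = [8, 20, 25, 22, 9, 15]
    chosen = select_in_box(mortality, poverty, 50, 100, 10, 30)
    r = pearson_r([mortality[i] for i in chosen], [poverty[i] for i in chosen])
    print("selected indices:", chosen)
    print("r =", round(r, 3), " r-squared =", round(r * r, 3))
```

The r-squared value is simply the square of the correlation coefficient, so it tells you how much of the variation in one variable is accounted for by a linear relationship with the other.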

2.4 Using the Bivariate Map

The Bivariate map in ESTAT has already been showing you quite a bit from your work with the scatterplot. Now let’s look at some of the features that are unique to the map.

The legend and selection behavior are the same as the scatterplot, so you already know how to change variables and classification methods. If you don’t need to change classification at all, you can also use the dropdown menus to switch variables quickly (this, of course, also works for the scatterplot). If you can’t remember what a certain variable name means, simply hold your mouse cursor over the name in the dropdown menu and the description will appear.

To ‘reset’ the selection you made a minute ago, simply drag a small box over an empty spot on the map or scatterplot and it will return to showing you everything. Do this now so we can look at the full map again. It should look like this when you’re finished:

The icons across the top are special tools you can use to change and explore this map. The default icon is the selection tool, which is at the leftmost end of the icon bar. You can click and drag a box over part of the map to make a selection by geography. Try it now – select the southeast United States, or some other region you might be interested in.

Once you have done that, click on the zoom-in icon, which is the next one over from the selection icon. Then you can drag a box or click your way in further to look at the region you selected in detail. When you are finished zooming, click the globe icon to return to the full extent.

The hand icon will let you pan across the map, and the two icons to the right of that are a couple of exploratory tools. The red icon is for showing excentric labels. If you click this icon and then roll your mouse over a few counties, you will see the names of adjacent counties in little boxes around the place you are looking at. If you click the green icon, you will cause a Fisheye lens effect when you roll over the map – this can help you pick apart areas that have small counties (or perhaps census tracts if we were looking at a different dataset).