METU

STATISTICS

FALL 2009

Dr. Ozlem Ilk

STAT 495-LAB2

Content: Introduction to GGOBI

To download the software, go to http://www.ggobi.org/. At the very bottom of the page, find “Download” and click on it. Follow the instructions for Windows. After you have downloaded both Gtk and GGobi, find the folder where you saved them (probably under C:\Program Files), click on the gtk...exe file and “Run” it, then click on the ggobi....exe file and “Run” it.

GGobi accepts two file formats: an Excel file saved with a .csv extension, and an XML file. You can find examples of both in the demo datasets that come with the GGobi software.

Although Excel is easier to handle, especially for beginners, XML has additional features. For example, you can enter a description of the data in XML, for others to read or for you to remember later. Also, if you need to connect some observations with a line (such as the observations that belong to the same person), you should use XML rather than Excel.

First we will look at the Excel file of a data set. Go to My Computer, then to C:\Program Files\ggobi\data. Find the places.csv file and click on it to open it. You can use a similar file to save your own data.
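If you prefer to create such a file from a script rather than from Excel, the following is a minimal Python sketch. It assumes the file is laid out as one header row of variable names followed by one row per observation; open places.csv to confirm the layout before relying on this. The function name and the values are hypothetical and are not taken from places.csv.

```python
import csv
import io

def write_ggobi_style_csv(handle, variable_names, rows):
    """Write a header row of variable names, then one row per observation.

    Assumption: this header-plus-rows layout matches what the demo
    .csv files use; check places.csv to confirm.
    """
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(variable_names)
    for row in rows:
        if len(row) != len(variable_names):
            raise ValueError("each row needs one value per variable")
        writer.writerow(row)

# Hypothetical values, for illustration only.
buffer = io.StringIO()
write_ggobi_style_csv(
    buffer,
    ["Climate", "HousingCost"],
    [[500, 6000], [600, 8000]],
)
csv_text = buffer.getvalue()
```

To write to disk instead, pass a file opened with open("mydata.csv", "w", newline="") in place of the in-memory buffer.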

Now we will look at the XML file of the same data. (Again, go to My Computer, then to C:\Program Files\ggobi\data.) Find the places.xml file. Right-click on it and choose to open it with Internet Explorer or an XML editor. You should be able to see the data description and the data.

<description>The "places data" were distributed to interested ASA members a few years ago so that they could apply contemporary data analytic methods to describe these data and then present results in a poster session at the ASA annual conference. Latitude and longitude have been added by Paul Tukey. … Economics: average household income adjusted for taxes and living costs, income growth, job growth.</description>
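The description can also be read without opening a browser. Below is a minimal Python sketch that pulls the text of every description element out of an XML document. The only thing it takes from the file shown above is the element name description; the wrapper elements in the sample string and the function name are hypothetical stand-ins, not the actual structure of places.xml.

```python
import xml.etree.ElementTree as ET

def read_descriptions(xml_text):
    """Return the text of every <description> element, whitespace-normalized."""
    root = ET.fromstring(xml_text)
    return [
        " ".join(element.text.split())
        for element in root.iter("description")
        if element.text
    ]

# Hypothetical sample document, for illustration only.
sample = """<dataset><data name="demo"><description>
A short note about the data.
</description></data></dataset>"""

descriptions = read_descriptions(sample)
```

For a file on disk, ET.parse(path).getroot() can be used in place of ET.fromstring.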

Now click on the GGobi icon. Open the dataset called places.xml under C:\Program Files\GGobi\data. You will see a scatterplot of Climate vs HousingCost. You can easily construct scatterplots of other variables by clicking on the X and Y buttons in the ggobi.exe window. Try the scatterplot of Lat vs Long.

Suppose you would like to see the relation between climate and the location of the place. In the ggobi.exe window, click on Display, then New Barchart. You will see a barchart of one of the variables. Click on the X right next to the Climate variable.

Now click on Brush (depending on the version, it is under either ViewMode or Interaction). Move the brush box over the highest and lowest Climate values in the barchart, and see how the corresponding points change on the scatterplot. This is called linked brushing. You are basically looking at the joint distribution of Long and Lat given Climate.
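The idea behind linked brushing can be sketched in a few lines of Python: brushing a range of Climate values selects a subset of the places, and the scatterplot then highlights the (Long, Lat) pairs of exactly that subset. This is only an illustration of the conditioning idea, not how GGobi is implemented; the function name and all values are hypothetical.

```python
def brush(records, variable, low, high):
    """Return the records whose value of `variable` lies in [low, high]."""
    return [r for r in records if low <= r[variable] <= high]

# Hypothetical values, for illustration only (not from places.xml).
places = [
    {"Climate": 300, "Lat": 45.0, "Long": -93.0},
    {"Climate": 850, "Lat": 34.0, "Long": -118.0},
    {"Climate": 900, "Lat": 32.7, "Long": -117.2},
]

# "Brush" the high end of Climate, then look at where those places are.
highlighted = brush(places, "Climate", 800, 1000)
locations = [(r["Long"], r["Lat"]) for r in highlighted]
```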

Try both the transient and the persistent brush, and choose a color and glyph for the brushed points.

Click on the scatterplot, then on Identify (again, depending on the version, it is under either ViewMode or Interaction). Now move the cursor over points on the scatterplot. This is called linked identification. Open a scatterplot of educ vs crime and try identification there. See which cities have the highest crime rate. Find where IA is...

Under tools, click on variable manipulation. This will tell you more about each variable: min, max, number of NAs... Now, under tools, click on variable transformation. Select, for example, climate, and at stage 2 click on discretize. Check out the barchart of climate. If you want to see what kind of discretization was done, brush climate=0, then click on reset all.
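The per-variable summary shown by the variable manipulation tool (min, max, number of NAs) can be reproduced by hand as a check. Below is a minimal Python sketch, assuming missing values are represented as None; the function name and the values are hypothetical.

```python
def summarize(values):
    """Return min, max, and the number of missing (None) entries."""
    present = [v for v in values if v is not None]
    return {
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "n_missing": len(values) - len(present),
    }

# Hypothetical values, for illustration only.
summary = summarize([3.0, None, 7.5, 1.2])
```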

Click on the scatterplot, then on rotation (under Viewmode or View). You will see 3-dimensional projections of the first 3 variables.
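What the rotating display does can be sketched as follows: each point (x, y, z) is rotated in 3-dimensional space and then drawn using only two of the rotated coordinates. The Python sketch below rotates about the vertical axis and drops the depth coordinate. The choice of axis and the function name are assumptions made for illustration; GGobi's own rotation controls may work differently.

```python
import math

def rotate_and_project(point, angle):
    """Rotate (x, y, z) about the vertical (y) axis by `angle` radians,
    then keep only the horizontal and vertical screen coordinates."""
    x, y, z = point
    horizontal = x * math.cos(angle) + z * math.sin(angle)
    return (horizontal, y)

# At angle 0 the plot shows x against y; after a quarter turn it shows z against y.
front_view = rotate_and_project((1.0, 5.0, 2.0), 0.0)
side_view = rotate_and_project((1.0, 5.0, 2.0), math.pi / 2)
```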

Now let's work on Shipman.xml:

<description>This data was taken from the web site http://www.the-shipman-inquiry.org.uk, which describes an investigation into the convicted murderer Dr. Harold Shipman. Shipman had lived in Hyde, England, it's been determined that he killed at least 215 of his patients during his 23 years of medical practice. The records are ordered by time. A number of cases (a few hundred?) were excluded from consideration: patients of Shipman's who had remained alive and well for some time after his last visit, for instance, or patients who died in hospital after significant treatment, or those who died abroad or in accidents. So these are all cases where Shipman could not be quickly excluded from consideration. It is reported that one early clue was the clustering of deaths at certain times of day, but the time of death isn't available on the web site. This material is protected by Crown copyright.</description>

Look at the variable manipulation tool. You will see that some variables are listed as categorical. Look at the scatterplot of place vs cause. Under tools, click on variable jittering. Select cause. Increase the degree of jittering just a bit. Now click on place, and jitter that one as well. You can also see the density at a point by using the brush option: just move the cursor over the point.
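Jittering helps here because two categorical variables produce many points that sit exactly on top of each other. A minimal Python sketch of the idea: add a small random offset to each value so that overlapping points spread out. It assumes uniform noise in the range plus or minus degree; the noise distribution GGobi uses may differ, and the function name and category codes are hypothetical.

```python
import random

def jitter(values, degree, seed=None):
    """Add uniform noise in [-degree, +degree] to each value."""
    rng = random.Random(seed)
    return [v + rng.uniform(-degree, degree) for v in values]

# Hypothetical category codes, for illustration only.
codes = [1, 1, 1, 2, 2]
jittered = jitter(codes, 0.2, seed=0)
```

Keeping the degree well below the spacing between category codes keeps each point visibly attached to its own category.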

Subsetting data.
