Analysing HES data session three

Analysing Hospital Episode Statistics (HES)

Practical Session 3

Looking at several diagnoses for each age - Merging data files and creating calculated fields

Available to download from: www.robin-beaumont.co.uk/virtualclassroon/hes

Thursday, 30 August 2007

Written by: Robin Beaumont

Status: V1.0

Contents

1. Comparing different diagnoses for each age - Merging data files

1.1 Specifying the data set required

1.2 Creating a data file containing all episodes for each age

1.3 Creating a data file containing all episodes for each age for each diagnosis

1.4 Merging data files together

2. Review

2.1 Analysing the age specific frequency of diagnoses

2.2 Creating a calculated variable

2.3 Investigating the proportions of episodes for a particular diagnosis

2.4 Viewing several diagnoses at once

3. Optional exercise

4. Appendix

1.  Comparing different diagnoses for each age - Merging data files

In the previous sessions we investigated the incidence of diagnoses for particular ages. In this session we will take the investigation of diagnoses one stage further and consider the relative proportion of episodes for several diagnoses simultaneously at each age. The final exercise in this session will give you the skills to produce charts similar to the one below, which shows the proportion of cases for two different conditions at each age.

The above chart clearly shows the relationship between osteoarthritis and one type of head injury across the age range. Each diagnosis is shown as a percentage (proportion) of all cases at that age.

Exercise:

What conclusions can you draw from the above chart?

Such information can be used to help guide the clinician towards a possible diagnosis. Additionally, by taking other factors into account, a decision support system can be developed: data like this can guide decisions about the probable diagnosis given certain signs, symptoms, medical history and so on.

To produce the above chart from the tando1 data file we will need to manipulate the data. This involves producing several SPSS data files ('*.sav' files) and then merging them together to produce the necessary data file. There are several stages; we will work through each one carefully, which will hopefully stop you getting lost on the way.

1.1  Specifying the data set required

I created the above chart using the menu option Charts -> Line -> Multiple, Values of individual cases. The dialogue boxes are reproduced below. Seeing which fields we need to end up with helps us plan our data manipulation strategy.

To help further, I have also included an extract from the final dataset.

This will mean little to you at the moment. What you should notice is that only two of the above fields are in the tando1.sav dataset, so you can guess that it will take quite a bit of work to transform our current tando1.sav dataset into the one above. We will begin by creating the first and fourth columns in the above diagram, that is, by obtaining the total number of episodes for each age.

1.2  Creating a data file containing all episodes for each age

We will begin by producing a file which contains the total number of episodes for each age. We will save this file on floppy disk and call it 'totfage.sav', short in my mind for 'totals for ages'. We do this by what SPSS calls aggregating the original data file (tando1.sav). The exercise below explains how to do this.

Exercise: Aim to create a data file which has the total number of episodes for each age.

1 Make sure you have the 'tando1.sav' dataset open

2 Make sure you have the data window selected (to move to the correct window choose the menu option 'Window' and select the appropriate window).

3 Select the menu option Data -> Aggregate to aggregate the file.

You should now be presented with the dialogue box shown opposite

4 Move the 'startage' variable into the 'Break Variables' box. This instructs SPSS to create a new case (record) in the new data file for each unique age.

5 Make sure 'Number of cases' is chosen.

6 Type 'totage' in the 'Name' box beside it. This is my shorthand for 'total no. of records for each age'.

7 Select the 'Write a new data file containing only the aggregated variables' option.

8 Click the 'File' button. This will bring up the second dialogue box shown above.

9 Move to a sensible folder to store the file and, in the 'File name' box, type totfage (without the quotes).

10 Click the 'Save' button to return to the previous dialogue box, 'Aggregate Data'.

11 Click the OK button on the 'Aggregate Data' dialogue box to run the aggregation process.
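If it helps to see what the aggregation does in another form, here is a minimal Python sketch of the same idea. It is only an illustration, not SPSS itself: the function name aggregate_by_age is my own, and the four records are made-up toy episodes standing in for tando1.sav (they are not real HES data; "OTHER" is a placeholder diagnosis).

```python
from collections import Counter


def aggregate_by_age(episodes):
    """Count the episodes for each start age.

    Mirrors Data -> Aggregate with 'startage' as the only break
    variable and 'Number of cases' saved under the name 'totage'.
    """
    counts = Counter(ep["startage"] for ep in episodes)
    # One output record per unique age, sorted on the break variable
    return [{"startage": age, "totage": n} for age, n in sorted(counts.items())]


# Hypothetical toy records standing in for tando1.sav (not real HES data)
episodes = [
    {"startage": 45, "diag_1": "7151-"},
    {"startage": 45, "diag_1": "7151-"},
    {"startage": 45, "diag_1": "OTHER"},
    {"startage": 46, "diag_1": "7151-"},
]

totfage = aggregate_by_age(episodes)
print(totfage)
# [{'startage': 45, 'totage': 3}, {'startage': 46, 'totage': 1}]
```

Note that the diagnosis is ignored here: every episode counts towards its age, whatever the diagnosis.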

You now need to inspect the results of your work, that is, open the new data file you have just created.

Exercise:

1 Choose the menu option file -> open -> data

2 Move to the folder you saved the file in.

3 Select the 'totfage' data file you have just created.

4 If you are prompted to save changes to the current data file, choose 'No'.

You should now be presented with a data window similar to the one opposite. For each start age ('startage' variable) the total number of records in the original file is given in the 'totage' variable.

Comparing the structure of this data file with the one we eventually require, we see that we now have the first and fourth columns.

1.3  Creating a data file containing all episodes for each age for each diagnosis

We will now create a data file which contains the total number of episodes for each age for each diagnosis. We will call the resulting file 'totfdiagn.sav', short in my mind for 'totals for each age for each diagnosis'.

Exercise: Aim to create a data file which has the total number of episodes for each age for each diagnosis.

1 Open the tando1 data file (menu option File -> Open -> Data).

2 Select the menu option Data -> Aggregate to aggregate the file

3 Move the 'startage' variable into the 'Break Variables' box. This instructs SPSS to create a new case (record) in the new data file for each unique age.

4 Move the 'diag_1' variable into the 'Break Variables' box as well. This instructs SPSS to create a new case (record) for each unique combination of age and diagnosis.

5 Make sure 'Number of cases' is chosen.

6 Type 'totagedi' in the 'Name' box beside it. This is my shorthand for 'total no. of records for each age and diagnosis'.

7 Select the 'Write a new data file containing only the aggregated variables' option.

8 Click the 'File' button. This will bring up the second dialogue box shown above.

9 Move to a sensible folder to store the file and, in the 'File name' box, type totfdiagn (without the quotes).

10 Click the 'Save' button to return to the previous dialogue box, 'Aggregate Data'.

11 Click the OK button on the 'Aggregate Data' dialogue box to run the aggregation process.
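The only change from the previous aggregation is the second break variable. A minimal Python sketch of that idea follows; as before it is an illustration rather than SPSS, the function name aggregate_by_age_and_diagnosis is my own, and the records are hypothetical toy episodes ("OTHER" is a placeholder diagnosis, not an ICD code).

```python
from collections import Counter


def aggregate_by_age_and_diagnosis(episodes):
    """Count the episodes for each combination of start age and diagnosis.

    Mirrors Data -> Aggregate with both 'startage' and 'diag_1' as break
    variables and 'Number of cases' saved under the name 'totagedi'.
    """
    counts = Counter((ep["startage"], ep["diag_1"]) for ep in episodes)
    # One output record per unique (age, diagnosis) pair, sorted on both keys
    return [
        {"startage": age, "diag_1": diag, "totagedi": n}
        for (age, diag), n in sorted(counts.items())
    ]


# Hypothetical toy records standing in for tando1.sav (not real HES data)
episodes = [
    {"startage": 45, "diag_1": "7151-"},
    {"startage": 45, "diag_1": "7151-"},
    {"startage": 45, "diag_1": "OTHER"},
    {"startage": 46, "diag_1": "7151-"},
]

totfdiagn = aggregate_by_age_and_diagnosis(episodes)
for row in totfdiagn:
    print(row)
# {'startage': 45, 'diag_1': '7151-', 'totagedi': 2}
# {'startage': 45, 'diag_1': 'OTHER', 'totagedi': 1}
# {'startage': 46, 'diag_1': '7151-', 'totagedi': 1}
```

Adding up 'totagedi' over all the diagnoses for one age gives the same figure as 'totage' for that age, which is a useful check on your own files.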

As before, now that you have created an aggregated data file you need to inspect the results of your work. This is described in the exercise below:

Exercise:

1 Open the 'totfdiagn' data file you have just created (menu option File -> Open -> Data). If you are prompted to save changes to the current data file, choose 'No'.

You should then be presented with a data window which shows for each start age ('startage' variable) and each diagnosis the total number of records in the original file in the 'totagedi' variable.

1.4  Merging data files together

We now have three data files:

tando1: the original data file

totfage: a data file containing the number of episodes for each age

totfdiagn: a data file containing the number of episodes for each diagnosis for each age

We will now merge the totfage and totfdiagn data files to get one step closer to the data we require. SPSS calls this adding variables: each case in one file gains the variables from the matching case in the other file.
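The kind of merge we are about to do, where many cases (one per age and diagnosis) each pick up a value from a file holding one record per age (a "keyed table"), can be sketched in Python as below. This is a hypothetical illustration, not the SPSS implementation: the function name merge_keyed_table is my own, the records are toy values, and None stands in for SPSS's missing values when an age has no match.

```python
def merge_keyed_table(cases, table, key):
    """Add the variables from `table` to each record in `cases`.

    Both lists must already be sorted on `key`, and `table` must hold
    at most one record per key value (a 'keyed table'), in the spirit of
    SPSS's 'Match cases on key variables in sorted files' option.
    Records with no match get None for the table's variables.
    """
    table_vars = [name for name in (table[0] if table else {}) if name != key]
    merged = []
    i = 0
    for row in cases:
        # Walk forward through the sorted table until we reach (or pass) this key
        while i < len(table) and table[i][key] < row[key]:
            i += 1
        if i < len(table) and table[i][key] == row[key]:
            match = table[i]
        else:
            match = None
        new_row = dict(row)
        for name in table_vars:
            new_row[name] = match[name] if match is not None else None
        merged.append(new_row)
    return merged


# Hypothetical toy records in the shape of totfdiagn.sav and totfage.sav
totfdiagn = [
    {"startage": 45, "diag_1": "7151-", "totagedi": 2},
    {"startage": 45, "diag_1": "OTHER", "totagedi": 1},
    {"startage": 46, "diag_1": "7151-", "totagedi": 1},
]
totfage = [{"startage": 45, "totage": 3}, {"startage": 46, "totage": 1}]

diagage = merge_keyed_table(totfdiagn, totfage, "startage")
for row in diagage:
    print(row)
# {'startage': 45, 'diag_1': '7151-', 'totagedi': 2, 'totage': 3}
# {'startage': 45, 'diag_1': 'OTHER', 'totagedi': 1, 'totage': 3}
# {'startage': 46, 'diag_1': '7151-', 'totagedi': 1, 'totage': 1}
```

A dictionary lookup would work without sorting; the single forward walk is shown because it illustrates why SPSS asks for both files to be sorted on the key variable: each file only needs to be read once, from top to bottom.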

Exercise:

1 Make sure the 'totfdiagn' data file is open; if it is not, open it now.

2 Choose the menu option Data -> Merge Files -> Add Variables. (Note: don't choose the Add Cases option.) You will then be presented with the dialogue box shown below.

3 Select the option 'An external SPSS data file' and then click the Browse button.

4 Move to the correct folder and choose the 'totfage' data file by clicking on it to highlight it. The name should now appear in the 'File name' box as shown opposite.

5 Click the Continue button (in some versions this is labelled 'OK'). You will now be presented with the 'Add Variables' dialogue box shown below.

6 Select the 'startage' variable in the excluded variables box. It will become highlighted.

7 Click on the box beside 'Match cases on key variables in sorted files'. A tick sign should appear in it.

8 Click the second option, 'Non-active dataset is keyed table'. The option button should then be filled in (i.e. selected).

9 Move the 'startage' variable into the 'Key variables' box by clicking on the button with the arrow sign on it beside the box. The dialogue box should now look like the one opposite.

10 Click the OK button.

11 The following dialogue box will appear. Because of the way we created the two files, we know they are sorted on the relevant field, so click the OK button.

12 IMPORTANT. You must now save the file under a different name. Previous versions of SPSS created a new merged file; the current version adds the new fields to the currently open dataset. We therefore need to save this new dataset immediately under a new name.
Select the menu option File -> Save As..., move if necessary to a suitable folder, and give the new dataset the name diagage, which to me means 'diagnosis for each age'.

The new dataset looks like this

We now have four of the five variables specified in section 1.1 above. Having got this far, your first priority is to make sure that all three new datasets (totfage, totfdiagn and diagage) have been saved.
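Judging by the chart at the start of this session and the Contents entry "2.2 Creating a calculated variable", the fifth variable appears to be a calculated field giving each diagnosis as a percentage of all episodes at that age, that is 100 × totagedi / totage. Assuming that is the case, a minimal Python sketch of such a field follows; the field name 'percent', the function name add_percentage and the records are all hypothetical.

```python
def add_percentage(rows):
    """Return copies of the merged records with a calculated 'percent' field:
    episodes for this diagnosis at this age as a percentage of all episodes
    at that age (100 * totagedi / totage)."""
    result = []
    for row in rows:
        new_row = dict(row)
        new_row["percent"] = 100.0 * row["totagedi"] / row["totage"]
        result.append(new_row)
    return result


# Hypothetical merged records in the shape of diagage.sav (not real HES data)
diagage = [
    {"startage": 45, "diag_1": "7151-", "totagedi": 2, "totage": 3},
    {"startage": 45, "diag_1": "OTHER", "totagedi": 1, "totage": 3},
    {"startage": 46, "diag_1": "7151-", "totagedi": 1, "totage": 1},
]

for row in add_percentage(diagage):
    print(row["startage"], row["diag_1"], round(row["percent"], 1))
# 45 7151- 66.7
# 45 OTHER 33.3
# 46 7151- 100.0
```

Because every diagnosis at a given age is divided by the same 'totage', the percentages for one age add up to 100 as long as no diagnoses have been filtered out.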

Exercise:

Check that you have saved all the datasets.

2.  Review

At this point we will review what we have done so far in the form of a diagram:

Although we have merged only two files, you can merge many files together. For example, you may have demographic details of subjects in one dataset and a separate dataset for each trial they have participated in.

I would suggest that you now take a break; you deserve it!

The following assumes that you have the diagage.sav SPSS data file open; if you don't, open it now.

2.1  Analysing the age specific frequency of diagnoses

We will begin by looking at the relative frequency of one of the most common diseases in our dataset (see section 3 of the previous handout):

7151 Localised primary osteoarthritis

We do this by setting up a filter that selects only the cases where diag_1 = "7151-".

Exercise:

1 Choose the menu option Data -> Select Cases. This brings up the 'Select Cases' dialogue box.

2 Click on the 'If condition is satisfied' option.

3 Click on the 'If...' button to bring up the 'Select Cases: If' dialogue box shown below.

4 Type into the box at the top right:

diag_1 = "7151-"

5 Click the continue button

6 Click the OK button on the 'Select cases' dialogue box.
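The selection condition above amounts to a simple equality test on diag_1. A minimal Python sketch of it is shown below, using hypothetical records and my own function name select_cases. One difference to be aware of: by default SPSS filters out the unselected cases (they stay in the file but are excluded from analyses), whereas this sketch simply returns a new list containing only the selected records.

```python
def select_cases(rows, diagnosis):
    """Keep only the records whose primary diagnosis equals `diagnosis`,
    in the spirit of Data -> Select Cases -> If: diag_1 = "7151-"."""
    return [row for row in rows if row["diag_1"] == diagnosis]


# Hypothetical merged records in the shape of diagage.sav (not real HES data)
diagage = [
    {"startage": 45, "diag_1": "7151-", "totagedi": 2, "totage": 3},
    {"startage": 45, "diag_1": "OTHER", "totagedi": 1, "totage": 3},
    {"startage": 46, "diag_1": "7151-", "totagedi": 1, "totage": 1},
]

osteo = select_cases(diagage, "7151-")
print([(r["startage"], r["totagedi"], r["totage"]) for r in osteo])
# [(45, 2, 3), (46, 1, 1)]
```

After the selection there is one record per age at which the chosen diagnosis occurs, which is exactly what the following chart plots.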

We will now check our results by drawing a chart, as described in the following exercise.

Exercise:

1 Choose the menu option Charts -> line

2 Select the Multiple option

3 Select the 'Values of individual cases' option

4 Click the 'define' button to bring up the next dialogue box

5 Move the 'Totage' variable into the 'Lines Represent' box

6 Move the 'Totagedi' variable into the 'Lines Represent' box

7 Under 'Category Labels', select the 'Variable' option by clicking on it

8 Move the 'Startage' variable into the 'Variable' box

9 Click the OK button to obtain the graph. You should end up with a result similar to that shown below.


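Unfortunately, the x axis in the above chart is misleading because it is not evenly spaced by age: each case is given its own equally spaced category, so ages at which this diagnosis does not occur are simply skipped. To obtain a chart with a more sensible x axis, we can use the scatter plot option: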

Exercise:

1 Choose the menu option Graphs -> Scatter

2 Choose the 'overlay' option

3 Select the totage variable (it becomes highlighted)

4 Select the startage variable (both variables are now highlighted)

5 Click on the button with the arrow to move them both into the 'Y-X Pairs' box. You may need to click the 'Swap Pair' button so that you end up with totage - startage