Creating a Data File in SPSS
Welcome to a tutorial on Getting Started in SPSS - Creating a Data File.
Some terms you should know before you begin:
Variable name: in SPSS this is a name of 8 characters or fewer that you assign to a variable, although later versions allow more than 8 characters. The SPSS program uses the variable name to identify the variable. You should try to be as descriptive as possible in naming your variables, e.g., gender could be called gender (6 characters), marital status could be called marstat (7 characters), etc. Remember, each variable name must be unique; you cannot give two variables the same name.
Variable label: this is a longer description of the variable name that appears on your output when you do your data analysis. For the variable marstat, you would type marital status for the variable label, for gender you would probably type “gender” or “gender of participant.” Variable labels are important as they include description that is not possible in the variable name.
Variable values: these are the range of possible responses for the variable. If the variable is interval or ratio, you do not have to assign variable values because the numbers have a true quantitative meaning, e.g., gross annual income is gross annual income. If the variable is nominal or ordinal, you need to define the values you will use for data entry. For example, gender could be female or male. You would need to assign each possible value a number, e.g., female = 0, and male = 1. Although the numbers are arbitrary and have no quantitative meaning, it is best to start with zero or one and work sequentially (0, 1, 2, 3, etc.) through all possibilities.
Where there is the possibility of missing data, you need to assign missing values a number. Usually the number 9 is used for missing data, or 99 if 9 is a real value, or 999, etc. You also need missing values specified for interval or ratio level variables, and again it is convention to use a series of 9s. You can leave the space blank for missing data, but if you do it is sometimes difficult to know if the blank was intentional or not. Also, if you have to add the variable to another variable, any cells with blanks will not compute. For instance, in one of my files the variables days in placement, days home, and days runaway were added to check that they equaled the 365 days in a year. If they do not equal 365 days, then I know I have one or more errors and I can easily find them. If missing values are left blank, SPSS will not add the values for that case. Therefore I use zeros for 0 days and a defined value like 999 for any missing data.
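To make the 365-day check concrete, here is a minimal sketch written in Python rather than SPSS. It only illustrates the logic; it is not how SPSS itself computes. The function names and the three sample cases are made up, and I am assuming that 999 is the missing-value code for all three variables.

```python
MISSING = 999  # assumed missing-value code for all three variables


def total_days(placement, home, runaway):
    """Return the sum of the three day counts, or None if any is coded missing."""
    values = (placement, home, runaway)
    if any(v == MISSING for v in values):
        return None
    return sum(values)


def check_case(placement, home, runaway, expected=365):
    """Classify one case as 'ok', 'error', or 'missing'."""
    total = total_days(placement, home, runaway)
    if total is None:
        return "missing"
    return "ok" if total == expected else "error"


# Fictional cases: (days in placement, days home, days runaway)
cases = [
    (200, 165, 0),        # adds up to 365
    (200, 160, 0),        # adds up to 360, so there is a data entry error
    (100, MISSING, 10),   # days home is unknown
]
results = [check_case(*case) for case in cases]
print(results)  # ['ok', 'error', 'missing']
```

Because zero days is entered as 0 and unknown days as 999, the check can tell a true zero apart from a value that was never recorded.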
Let’s get started using SPSS to create a new data file. Double click on the SPSS icon. Depending on the version of SPSS you are using, you may see a screen that allows you to run an SPSS tutorial, open an existing file, or create a new file. For now, select create a new file and click on OK, and a blank screen with a grid on it should appear. In newer versions of SPSS you will arrive at an empty grid ready to create a new data file. It is advisable to return to the SPSS tutorial at some point. You can access the tutorial from the Help button on the main toolbar at the top of the screen.
If you look in the bottom left of the screen you should see two tabs, “data view” and “variable view,” and the tab that is highlighted will be “data view.” The “variable view” tab is where you will define the variables or make changes to how existing variables are defined. The “data view” tab is where you will go to enter the data. You can click back and forth between these two tabs. The SPSS data file is structured with the variables in the columns and cases in the rows. All of the data for a single case (organization, individual, family, couple, etc.) is entered in one row and is referred to as a “record.”
The first thing you want to do in setting up an SPSS data file is to create a variable you will use to identify the case. There are two ways to define variables. From the variable view screen you can click on the cell where the first variable name will appear; this is at the top of the first column on the first row. From data view you can click on “var” at the top of the column where you want the variable to be placed; this action will take you to the variable view screen to define the variable. You can begin by typing in your first variable name of eight characters or fewer, which in this example will be ID. Type in ID for the first variable name.
You have different options to help define the variable. The first is “type.” Most of the data we will be concerned with will be numeric; for example, our first variable ID is numeric. Data that consist of words are called “string” variables. Some variables may be in dollars and you could select “dollar” for the type. If the variable is a date, as in date of birth, it is important to specify the variable type as a date. If not, you will not be able to use SPSS to calculate the time between two dates or age at a certain point such as baseline or entry to the program. If you select date as the type, then you must next select the format for how you will enter the date. It is important to be consistent throughout (e.g., mm/dd/yy, or mm/dd/yyyy).
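The reason the date type matters can be illustrated outside SPSS. The short Python sketch below is an illustration only, with made-up dates and function names. It reads two dates entered in the mm/dd/yyyy format and calculates age at program entry. The calculation works only because the text is first converted to real dates; two plain strings could not be subtracted.

```python
from datetime import datetime

DATE_FORMAT = "%m/%d/%Y"  # mm/dd/yyyy, used consistently for every date


def parse_date(text):
    """Convert text such as '03/15/1990' to a date object."""
    return datetime.strptime(text, DATE_FORMAT).date()


def age_in_years(birth, on_date):
    """Return completed years between birth and on_date."""
    years = on_date.year - birth.year
    # Subtract one year if the birthday has not yet occurred in that year.
    if (on_date.month, on_date.day) < (birth.month, birth.day):
        years -= 1
    return years


dob = parse_date("03/15/1990")    # fictional date of birth
entry = parse_date("01/10/2010")  # fictional date of program entry

print(age_in_years(dob, entry))   # 19
print((entry - dob).days)         # number of days between the two dates
```

A date typed in a different format, such as 15/03/1990, would be rejected by parse_date, which is why consistency in the date format matters.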
When you design a database you want to consider speed, accuracy, and ease of data entry. The order of the variables you define should match with the order of your data gathering device (questionnaire, interview schedule, etc.). It will be very difficult to enter the data if it requires flipping back and forth between pages or moving the cursor up and down to different variables. You want to aim for a smooth process.
The next option is “Width.” The width is set to 8 by default; this is simply the number of characters allowed for that variable unless you specify differently. The width you select depends on the number of characters in the range of possible responses to that variable. For instance, if my ID data are all 3 digit numbers, I may want to change width from the default of 8 to 3. If you changed width to two, the space for that variable would become very narrow. Go ahead and try it out. I usually leave width at the default of 8 so that I can read the 8 character variable names. If you need more spaces, for example, if you are entering a 9 digit ID number, you would need to change the width to 9, and so on. To change these values, simply click on the cell and then use the arrows, the gray box, or whatever control appears.
The next field is “Decimals.” Unless I want decimal places I like to change the decimal from 2 to 0, otherwise I will have a decimal and two zeros behind every value in the database.
Once you are finished changing type, width, and decimals, you will see the option for values. In older versions of SPSS you need to click on continue. Since there are no value labels for ID, and there should be no missing data, you can leave values alone and leave missing at its default, which is none. In older versions of SPSS, click on OK and you are ready to start defining another variable. You will not need to change anything under the options column format, align, or measure. Do be careful, however, that nominal or ordinal is not chosen under measure for a variable that is continuous or that you are using as interval or ratio data. Scale is the default.
Now, go to the second row and type in “gender” for the second variable name in the cell under the first variable ID. Change the field decimals so that there are no decimal points allowed. Click on label or place your cursor in the label cell. Type in the longer descriptive title for the variable in the cell or in older versions in the space beside “variable label.” Next define the value labels by clicking in the cell and then on the dots in the gray box to the right. Type a numeral “1” in the box labeled “value.” Tab to the next box or use the arrow key and type in the “value label” – which in this example is female. Click on “Add.” You will see 1 = female appear in the value label box. Now type a 2 in the box labeled value, tab to the next box and type in male, then click on Add. If you make an error, simply click on the value definition (e.g., 1=female) and then on “remove.” When you are finished defining all possible values for the variable, click on OK or continue. Remember that the values must be unique and mutually exclusive. If two values are possible, as in questions that instruct the respondent to choose all that apply, you must create a separate variable for each value and then indicate the response as yes or no.
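The “choose all that apply” rule can be sketched in a few lines of Python. This is an illustration only. The question and its three response options are hypothetical, and I am assuming yes is coded 1 and no is coded 0.

```python
# Hypothetical options for a question such as
# "Which services did you receive? (choose all that apply)"
OPTIONS = ["housing", "food", "counsel"]


def to_yes_no(selected):
    """Return one yes/no variable per option: 1 = yes, 0 = no."""
    return {option: (1 if option in selected else 0) for option in OPTIONS}


# A respondent who checked two of the three boxes
row = to_yes_no({"food", "counsel"})
print(row)  # {'housing': 0, 'food': 1, 'counsel': 1}
```

Each option becomes its own variable, so a respondent who checks two boxes is recorded as yes on two variables rather than forced into a single value.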
Now define “missing values.” In some cases you may not know the gender of some of the respondents. In this case you would click in the cell, then on the gray box in the corner, and change the default from no missing values to discrete missing values, or to a range of missing values if appropriate. For gender we will enter a 9 for missing values in one of the three spaces below discrete missing values. Click “OK” or “continue” when you are done defining the missing values.
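To see what a discrete missing value does, consider this small Python sketch. It is an illustration of the idea, not SPSS itself, and the data are fictional. It uses the codes from this example (1 = female, 2 = male, 9 = missing) and counts the valid responses separately from the missing ones.

```python
from collections import Counter

VALUE_LABELS = {1: "female", 2: "male"}  # value labels defined for gender
MISSING_VALUES = {9}                     # discrete missing value for gender


def frequencies(values):
    """Count valid values by label and tally missing codes separately."""
    valid = Counter(VALUE_LABELS[v] for v in values if v not in MISSING_VALUES)
    n_missing = sum(1 for v in values if v in MISSING_VALUES)
    return dict(valid), n_missing


gender = [1, 2, 2, 9, 1, 1, 9, 2]  # fictional data for 8 cases
counts, n_missing = frequencies(gender)
print(counts)     # {'female': 3, 'male': 3}
print(n_missing)  # 2
```

If 9 had not been declared as missing, it would be treated as a third, unlabeled gender category instead of being set aside.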
Now you have finished defining two variables. The next step is to save your file. Similar to all computer programs, you should save your work intermittently to avoid losing it.
To save the data file, click on “File” in the upper left hand corner of the top toolbar, then on “save as,” and then name your file as you would any other file. The extension .sav will automatically be added to all data files. You can save your file to a specific drive if you have a disk or a memory stick with you. To do this click on the little folder beside “save in” at the top of the save screen. Select the directory where you want the file to be stored. You only need to execute the “save as” command once. Subsequently when you want to save you will click “save” from the File option on the top toolbar.
Once you have defined all of your variables and saved your file, you are ready to start entering data. I like to use the number pad on the right side of the keyboard to enter data on a desktop computer. Make sure the number lock is on if you use the number pad. In newer versions of SPSS you will go to the data view tab in the left hand corner of the screen to enter data.
For the variable ID, place your cursor in the first cell where ID and row one intersect, type in the first value for ID “1.” If you press enter you will advance to the next row. If you press the right arrow key you will advance to the next cell in the same row. Continue entering data until you get to ID number 15. If you make a mistake, you can backspace and fix it, or simply go back to that cell, click on it, and type over what was there or press delete. Now move your cursor to the variable gender and enter in the values for male and female for your 15 cases. You can enter fictional data and even type in some 9s for missing data. Remember to save your data file every now and then.
There are two ways to view the data in your data file. One is to view the numeric values (the numbers); the other is to view the value labels (the words, such as male or female).
To change the view, click on “view” on the top toolbar, then go to the item “value labels,” and click on it. To change it back again, follow the same steps.
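The two views show the same stored data. The following Python sketch, an illustration with a hypothetical function name and a fictional case, displays one record first as numbers and then with the value labels substituted.

```python
# Value labels defined for each variable; ID has none.
VALUE_LABELS = {"gender": {1: "female", 2: "male", 9: "missing"}}


def display_row(row, show_labels):
    """Return the row as stored (numbers) or with value labels substituted."""
    if not show_labels:
        return dict(row)
    return {
        name: VALUE_LABELS.get(name, {}).get(value, value)
        for name, value in row.items()
    }


case = {"id": 3, "gender": 2}
print(display_row(case, show_labels=False))  # {'id': 3, 'gender': 2}
print(display_row(case, show_labels=True))   # {'id': 3, 'gender': 'male'}
```

Switching the view changes only what is displayed; the stored value for gender is still the number 2.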
To delete a variable or delete a case, place your cursor on the variable or case at the edge of the row or column just outside the data grid and click. You will see the row or column highlighted. Next, select “edit” from the top toolbar and then “clear,” or simply press the Delete key.
Delete the case in row 8 and see what happens.
Under the “data” tab on the top toolbar you can insert a variable, insert a case, or go to a case. Try all three – insert a variable, insert a case, and go to a case by using the ID number.
If you have many variables with the same value labels, such as in a satisfaction interview in which the same Likert scale is used for each item, you can use the “template” option under “data” to define your value labels, missing values, and type. This avoids having to retype the same information for every variable. In newer versions of SPSS the option for this action is called “copy data properties.” SPSS will prompt you with a series of questions as to where you want to copy the properties (the working data file in this case), the source variable (the one you want to copy from), and the working file variables (the one or ones you want to copy to). You will then see a screen where you can select which properties you want to copy and then choose to execute the command.
The “utilities” option on the top toolbar is useful to review your variables and their definitions. Select “variables” from the utilities option and see what happens. Next select the option called “file info.”
The key to learning SPSS is to practice and not be afraid to experiment. There is usually more than one way to do what needs to be done. The best way to learn about your preferences and to gain speed is to go ahead and try new approaches.
Research for Effective Social Work Practice by Judy L. Krysik and Jerry Finn
© 2010 Routledge / Taylor & Francis