Possible Gender Discrimination in Salary at Fifth National Bank

The Fifth National Bank is facing a gender discrimination suit. The charge is that its female employees receive substantially smaller salaries than its male employees. The bank’s employee database is in the SPSS file 5thNatlBank.sav. For each of its 208 employees, the data set includes the following variables:

·  EducLev: education level, a categorical variable with categories 1 (finished high school); 2 (finished some college); 3 (obtained a bachelor’s degree); 4 (took some graduate courses); 5 (obtained a graduate degree)

·  JobGrade: a categorical variable indicating the current job level, the possible levels being 1-6 (6 is highest)

·  YrHired: year employee was hired

·  YrBorn: year employee was born

·  Gender: a categorical variable with values “female” and “male”

·  YrsPrior: number of years of work experience at another bank prior to working at 5th National

·  PCJob: a categorical yes/no variable depending on whether the employee’s current job is computer-related

·  Salary: current annual salary in thousands of dollars


Do these data provide evidence that females are being discriminated against in terms of salary?


A naïve approach compares the average female salary to the average male salary. This can be accomplished by examining the “Descriptive Statistics.”


The average of all salaries is $39,921.92 (approximately $39,922). The female average is $37,209.93 (approximately $37,210), the male average is $45,505.44 (approximately $45,505).
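As a quick consistency check on these three averages, the arithmetic below computes the raw gap and then backs out the group sizes they imply. The group sizes are not stated in the text; they are inferred here from the fact that the overall mean is a weighted average of the two group means, so treat them as a derived assumption rather than a reported figure.

```python
# Averages quoted in the text, in dollars.
overall_mean = 39921.92
female_mean = 37209.93
male_mean = 45505.44
n_employees = 208

# Raw (unadjusted) gap between the two group averages.
raw_gap = male_mean - female_mean

# The overall mean is a weighted average of the group means:
#   overall = share_f * female_mean + (1 - share_f) * male_mean
# Solving for share_f gives the implied proportion of female employees.
share_female = (male_mean - overall_mean) / (male_mean - female_mean)
implied_n_female = round(share_female * n_employees)
implied_n_male = n_employees - implied_n_female

print(f"Raw gap: ${raw_gap:,.2f}")
print(f"Implied group sizes: {implied_n_female} female, {implied_n_male} male")
```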


The next level of sophistication is to perform a hypothesis test of the difference between two means. This can be done with a one-tailed hypothesis test.

H0: μF – μM ≥ 0

H1: μF – μM < 0 (average female salary is smaller than average male salary)


From this hypothesis test one can conclude that females are earning less on average than males (at any reasonable significance level).
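The mechanics of such a test can be sketched as follows. This is a minimal illustration that assumes the unequal-variance (Welch) form of the two-sample t statistic; the text does not say which form the SPSS output uses, and the salaries below are hypothetical values, not records from 5thNatlBank.sav.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical salaries in $1000s, NOT taken from 5thNatlBank.sav.
female = [30.0, 32.0, 34.0, 36.0, 38.0]
male = [38.0, 40.0, 44.0, 46.0, 52.0]

def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate df) for mean(a) - mean(b)."""
    # Squared standard error of each sample mean.
    se2_a = variance(sample_a) / len(sample_a)
    se2_b = variance(sample_b) / len(sample_b)
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(sample_a) - 1) + se2_b ** 2 / (len(sample_b) - 1)
    )
    return t_stat, df

t_stat, df = welch_t(female, male)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

Because H1 is μF – μM < 0, H0 is rejected when t is sufficiently negative; the one-tailed p-value is the lower-tail area of the t distribution with the computed degrees of freedom. If SciPy is available, `scipy.stats.ttest_ind(female, male, equal_var=False, alternative="less")` should give the same statistic together with that p-value.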

But perhaps there is a reason for this. They might have lower education levels, they might be working at lower job grades, and so on. The question is whether the difference between female and male salaries is still evident after taking these other attributes into account. This is a perfect task for multiple regression analysis with dummy variables.

First we need to create dummy variables for the various categorical variables. We can do this with SPSS’s TRANSFORM → RECODE → INTO DIFFERENT VARIABLES statements (be sure to code the “Old and New” values). For the Gender categorical variable, create a dummy variable called “FemDum” (1 if the employee is female, 0 if the employee is male). (This has been done for you in the file.)

Next, create 4 dummy variables for EducLev’s 5 categories (Ed2Dummy through Ed5Dummy). For example, Ed2Dummy is 1 if the education level is 2 and 0 otherwise, Ed3Dummy is 1 if the education level is 3 and 0 otherwise, and so on. Notice that EducLev 1 is represented by Ed2Dummy through Ed5Dummy all being 0. Although any four dummy variables could be used to represent the 5 education level categories, we use Ed2Dummy through Ed5Dummy so that the lowest level (education level 1) becomes the “reference category.” (The regression equations below abbreviate these names to Ed2Dum through Ed5Dum.) We use the same procedure to create 5 dummy variables for the 6 job grade categories (called Job2Dum through Job6Dum). (This has been done for you in the file.)
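Outside SPSS, the same coding scheme can be sketched in a few lines. The helper `make_dummies` below is a hypothetical function written for this illustration, not part of SPSS or of the data file; it treats the first listed level as the reference category.

```python
def make_dummies(value, levels, prefix, suffix="Dummy"):
    """Code `value` as 0/1 indicators, one per non-reference level.

    The first entry of `levels` is the reference category and gets no dummy.
    """
    return {f"{prefix}{level}{suffix}": int(value == level) for level in levels[1:]}

educ_levels = [1, 2, 3, 4, 5]
job_grades = [1, 2, 3, 4, 5, 6]

# Hypothetical employee with a bachelor's degree (EducLev = 3).
ed_bachelor = make_dummies(3, educ_levels, "Ed")
# Reference category (EducLev = 1): every dummy is 0.
ed_reference = make_dummies(1, educ_levels, "Ed")
# Hypothetical employee in the highest job grade (JobGrade = 6).
job_top = make_dummies(6, job_grades, "Job", suffix="Dum")

print(ed_bachelor)
print(ed_reference)
print(job_top)
```

With k categories the helper always returns k – 1 dummies, which is why 5 education levels need 4 dummies and 6 job grades need 5.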

·  Sometimes we might want to collapse several categories. For example, we might want to collapse the 5 education categories into three categories: 1, (2,3), and (4,5). This would entail having only 2 dummy variables to represent the 3 new categories. The new first category (finished high school) would remain the same as the old first category – our “reference category” with all 0’s for our two new dummy variables. The new second category includes employees who have taken undergraduate courses or have completed a bachelor’s degree, and the new third category includes employees who have taken graduate courses or have completed a graduate degree. It is easy to create the two new dummy variables to represent these two new categories, either by using the RECODE statement or by adding Ed2Dummy and Ed3Dummy together to create the new second category and adding Ed4Dummy and Ed5Dummy together to create the new third category. (This has not been done in the SPSS file. For the purposes of this example, we will retain the original 5 educational categories and their corresponding 4 dummy variables.)
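The "add the dummies together" route can be sketched as below. The names `Ed23Dummy` and `Ed45Dummy` are hypothetical labels chosen for this illustration; they do not appear in the SPSS file.

```python
def collapse_education(ed2, ed3, ed4, ed5):
    """Combine four education dummies into two for categories 1, (2,3), (4,5)."""
    # At most one of the original dummies equals 1 for any employee,
    # so each sum is itself a valid 0/1 dummy.
    return {"Ed23Dummy": ed2 + ed3, "Ed45Dummy": ed4 + ed5}

# Education level 3 (bachelor's degree) falls in the new second category.
level_3 = collapse_education(0, 1, 0, 0)
# Education level 5 (graduate degree) falls in the new third category.
level_5 = collapse_education(0, 0, 0, 1)
# Education level 1 stays the reference category: both new dummies are 0.
level_1 = collapse_education(0, 0, 0, 0)

print(level_3, level_5, level_1)
```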

For the PCJob categorical variable, create a dummy variable called “PCJobDum” (1 if the employee’s current job is PC related, 0 otherwise).

Finally, create other new variables related to age and experience. First create a new variable called “Age.” This can be coded as 95 (i.e., the data are from 1995) minus the YrBorn. Next create a new variable called “YrsExper” representing the number of years of experience with 5th National Bank. This can be coded as 95 minus YrHired. (These variables have been created in the SPSS file.)
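The two derived variables are simple subtractions. The sketch below assumes, as the "95 minus" wording suggests, that YrBorn and YrHired are stored as two-digit years; the employee record is hypothetical.

```python
DATA_YEAR = 95  # the data are from 1995; two-digit years are an assumption

def add_derived(record):
    """Return a copy of `record` with Age and YrsExper added."""
    out = dict(record)
    out["Age"] = DATA_YEAR - record["YrBorn"]
    out["YrsExper"] = DATA_YEAR - record["YrHired"]
    return out

# Hypothetical employee born in 1960 and hired in 1988.
employee = add_derived({"YrBorn": 60, "YrHired": 88})
print(employee["Age"], employee["YrsExper"])
```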

Now we are ready to run our regression, using Salary as the dependent variable. We will proceed in several stages for learning purposes.

First, estimate the regression equation with only one explanatory variable, FemDum. The output appears below.

The resulting regression equation is:

Estimated Salary = 45.505 – 8.296FemDum (Eq. 1)

To interpret this equation (equation 1), recall that FemDum has only 2 possible values, 0 and 1. If we substitute FemDum = 1 into the equation, we get:

Female Estimated Salary = 45.505 – 8.296(1) = 37.209

This represents the average female salary (as we have seen before).

Similarly, if we substitute FemDum = 0 into the equation (i.e., males), we get:

Male Estimated Salary = 45.505 – 8.296(0) = 45.505

This represents the average male salary (as we have seen before).

Therefore the – 8.296 coefficient of the FemDum variable is the difference between the average female salary and the average salary of the reference (male) category; that is, females are paid $8,296 less on average than males.
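That a regression on a single 0/1 dummy simply reproduces the two group means can be verified on a small example. The five salaries below are hypothetical values chosen for easy arithmetic, not data from the bank; the least-squares formulas are the standard ones for a one-variable regression.

```python
from statistics import mean

# Hypothetical salaries in $1000s with a 0/1 gender dummy (1 = female).
fem_dum = [1, 1, 1, 0, 0]
salary = [30.0, 34.0, 38.0, 40.0, 48.0]

def simple_ols(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

intercept, slope = simple_ols(fem_dum, salary)
female_mean = mean([s for s, f in zip(salary, fem_dum) if f == 1])
male_mean = mean([s for s, f in zip(salary, fem_dum) if f == 0])

# The intercept matches the reference-group (male) mean, and the slope
# matches the difference between the female and male means.
print(intercept, male_mean)
print(slope, female_mean - male_mean)
```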

Obviously, our first regression attempt tells only part of the story. It ignores all information except for Gender. Now we will expand our equation by adding experience variables YrsPrior and YrsExper. The output with the FemDum and these two experience variables appears below.

The corresponding regression equation is:

Estimated Salary = 35.492 + 0.988YrsExper + 0.131YrsPrior – 8.080FemDum (Eq. 2)

Again, we will write our regression equation (equation 2) in two forms: one for females (substituting FemDum = 1) and one for males (substituting FemDum = 0). After doing the arithmetic they become:

Female Estimated Salary = 35.492 + 0.988YrsExper + 0.131YrsPrior – 8.080(1)

= 27.412 + 0.988YrsExper + 0.131YrsPrior

and

Male Estimated Salary = 35.492 + 0.988YrsExper + 0.131 YrsPrior – 8.080(0)

= 35.492 + 0.988YrsExper + 0.131 YrsPrior

The female and male forms of the equation differ only by the intercept term. We can interpret the – 8.080 coefficient of the FemDum as the average salary disadvantage for females relative to males after controlling for job experience. Gender discrimination appears to be a very plausible conclusion. However, note that the R² value is only 49.2%. Perhaps there is more to the story.
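The "parallel lines" reading of Eq. 2 can be checked numerically: whatever experience values are plugged in, the female and male predictions differ by the same 8.080. The function name `eq2_salary` and the experience values tried below are illustrative choices; the coefficients are those of Eq. 2.

```python
def eq2_salary(yrs_exper, yrs_prior, fem_dum):
    """Estimated salary in $1000s from Eq. 2."""
    return 35.492 + 0.988 * yrs_exper + 0.131 * yrs_prior - 8.080 * fem_dum

# Intercepts of the female and male forms (all experience set to 0).
female_intercept = eq2_salary(0, 0, 1)
male_intercept = eq2_salary(0, 0, 0)

# The gap is the same at every (hypothetical) experience profile.
gaps = []
for yrs_exper, yrs_prior in [(0, 0), (5, 2), (20, 10)]:
    gap = eq2_salary(yrs_exper, yrs_prior, 1) - eq2_salary(yrs_exper, yrs_prior, 0)
    gaps.append(gap)
    print(f"YrsExper={yrs_exper}, YrsPrior={yrs_prior}: gap = {gap:.3f}")
```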

Next, let’s add education level to the equation by including the 4 education level dummies, Ed2Dummy through Ed5Dummy. The resulting output appears below.

The estimated regression equation is now:

Estimated Salary = 26.613 + 1.033YrsExper + 0.362YrsPrior – 4.501FemDum + 0.160Ed2Dum + 4.765Ed3Dum + 7.320Ed4Dum + 11.770Ed5Dum (Eq. 3)

Now there are two categorical variables involved, Gender and Education Level. However, we can still write separate equations for any combination of categories by setting the dummies to their appropriate values. For example, the equation for females at the fifth education level is found by setting FemDum = 1 and Ed5Dum = 1, and setting the other education dummies equal to 0. After terms are combined, this equation is:

Estimated Salary = 33.882 + 1.033YrsExper + 0.362YrsPrior

The intercept 33.882 is the intercept from (Eq. 3), namely 26.613, plus the coefficients of FemDum and Ed5Dum.
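The same bookkeeping gives the intercept for every gender and education combination. The sketch below uses the Eq. 3 coefficients; the function name `eq3_intercept` and the lookup table are illustrative constructs, with education level 1 given a coefficient of 0 because it is the reference category.

```python
EQ3_INTERCEPT = 26.613
EQ3_FEMDUM = -4.501
# Coefficient added for each education level; level 1 is the reference.
EQ3_EDUC = {1: 0.0, 2: 0.160, 3: 4.765, 4: 7.320, 5: 11.770}

def eq3_intercept(fem_dum, educ_lev):
    """Intercept of Eq. 3 (in $1000s) for a given gender and education level."""
    return EQ3_INTERCEPT + EQ3_FEMDUM * fem_dum + EQ3_EDUC[educ_lev]

for educ_lev in EQ3_EDUC:
    female = eq3_intercept(1, educ_lev)
    male = eq3_intercept(0, educ_lev)
    print(f"EducLev {educ_lev}: female {female:.3f}, male {male:.3f}")
```

In every row the female intercept is 4.501 below the male intercept, and the slopes on YrsExper and YrsPrior are unchanged.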

We can interpret (Eq. 3) as follows.

For either gender and any education level:

·  The expected increase in salary for one extra year of experience with 5th National Bank is $1033

·  The expected increase in salary for one extra year of prior experience with another bank is $362

The coefficients of the education dummies indicate the average increase in salary an employee can expect relative to the reference education level (here, education level 1, the lowest level). For example, an employee with education level 4 can expect to earn $7320 more than an employee with education level 1, all else being equal.

Finally, the key coefficient, the – 4.501 for females, indicates the average salary disadvantage for females relative to males, given that they have the same experience levels and the same education levels. Note that the R² value is now 64.5%, quite a bit higher than when the education dummies were not included. We appear to be getting closer to the truth. In particular, we see that there appears to be gender discrimination in salaries, even after accounting for job experience and education level.

One further explanation for gender differences in salary might be job grade. Perhaps females tend to be in lower job grades, which would help explain why they get lower salaries on average.

One way to check this is with a pivot table as in the table below. Clearly, females tend to be concentrated at the lower job grades. For example, 28.85% of all employees are in the lowest job grade, but 34.29% of all females are at this grade and only 17.65% of all males are at this grade. The opposite is true at the higher job grades. This certainly helps to explain why females get lower salaries on average.
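The percentages in such a pivot table are within-gender shares: the count in a gender and grade cell divided by the total for that gender. A minimal sketch of the calculation follows, using a handful of hypothetical (gender, job grade) pairs rather than the bank's actual records.

```python
from collections import Counter

# Hypothetical (Gender, JobGrade) pairs, NOT taken from 5thNatlBank.sav.
employees = [
    ("female", 1), ("female", 1), ("female", 2), ("female", 3),
    ("male", 1), ("male", 3), ("male", 4), ("male", 6),
]

cell_counts = Counter(employees)                        # count per (gender, grade)
gender_totals = Counter(gender for gender, _ in employees)

def pct_in_grade(gender, grade):
    """Percentage of employees of `gender` who are in `grade`."""
    return 100 * cell_counts[(gender, grade)] / gender_totals[gender]

overall_pct_grade1 = 100 * sum(1 for _, g in employees if g == 1) / len(employees)
print(pct_in_grade("female", 1), pct_in_grade("male", 1), overall_pct_grade1)
```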

With respect to our regression model, we can go one step further to see the effect of job grade on salary by including the dummies for job grade in the equation, along with the other variables we have included so far, as well as the Age variable and the PCJobDum variable.

As expected, the coefficients of the job dummies are all positive, and they increase as the job grade increases: it pays to be in the higher job grades. The effect of Age seems to be minimal, and there appears to be a “bonus” of close to $5000 for having a PC-related job. The R² value has now increased to 76.5%, and the penalty for being female has decreased to $2555, which is still large but not as large as before. However, even if this penalty is considered “small,” is it convincing evidence against the argument for gender discrimination? Probably not; there is still reason to suspect discrimination. We have used variations in job grade to reduce the penalty for being female. But the remaining question then is, “Why are females predominantly in the low job grades?” Perhaps this is the real source of gender discrimination. Perhaps management is not advancing the females as quickly as it should, which naturally results in lower salaries for females.
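Why the female penalty shrinks once job grade is included can be seen in a deliberately artificial example. The data below are constructed, not real: salary is set to exactly 30 + 10 × (high grade) – 2 × (female), and females are placed mostly in the low grade. Comparing means within each grade stands in here for "controlling for job grade" in the regression.

```python
from statistics import mean

# Hypothetical data built so that salary = 30 + 10*high_grade - 2*female exactly.
# Females are concentrated in the low grade, males in the high grade.
rows = [
    # (female, high_grade)
    (1, 0), (1, 0), (1, 0), (1, 1),
    (0, 0), (0, 1), (0, 1), (0, 1),
]
data = [(f, g, 30 + 10 * g - 2 * f) for f, g in rows]

def gap(records):
    """Female mean salary minus male mean salary for the given records."""
    fem = [s for f, _, s in records if f == 1]
    mal = [s for f, _, s in records if f == 0]
    return mean(fem) - mean(mal)

raw_gap = gap(data)  # ignores job grade entirely
gap_by_grade = {g: gap([r for r in data if r[1] == g]) for g in (0, 1)}

print("Raw gap:", raw_gap)
print("Gap within each grade:", gap_by_grade)
```

The raw gap is much larger than the within-grade gap because it also absorbs the fact that females sit in the lower-paying grade. Whether that placement is itself the result of discrimination is exactly the question the within-grade comparison cannot answer.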


Further Analysis: Exploring Interaction Effects


Source: Albright, Winston, Zappe