Ashley Long
2005 Senior Economics and Business Major
Presented at the 2004 Butler University
Undergraduate Research Conference
Effects of Social Factors on
Teenage Birth Rates
Abstract
Despite heightened social awareness, the problem of teen pregnancy in the U.S. continues to be the focus of study. Consistent with past literature on the topic, this paper models teen birth rates as a function of social, demographic and economic factors. Using state-level data from 2000, empirical results indicate that birth rates per 1000 teenagers are lower in states with higher per capita income, higher average levels of educational attainment, and a lower proportion of single-parent homes. There is also evidence to support the finding that teen birth rates are higher in states with a higher proportion of African-American residents.
Table of Contents
Introduction
Literature Review
Model
Data
Empirical Results
Conclusion
Data and Tables
Bibliography
Introduction
Teenage pregnancy is an important issue in today’s society, affecting almost everyone. Coming from a high school where a large number of the girls in my class had children, I am particularly interested in what variables impact the teenage birth rate. This empirical model looks at the effect several social factors have on the birth rate and whether or not the effect is significant.
I use a cross-section of data from 2000 for each of the fifty states and Washington D.C. My dependent variable is the teenage birth rate. The independent variables are the percentage of the state's population with a high school diploma or general education diploma, the percentage of the state's population that is white, the percentage of single parent homes, and the per capita income. Previous literature indicates that these factors significantly affect the teen birth rate.
My data is gathered from government agencies. The data concerning the teenage birth rate was obtained from the U.S. Department of Health and Human Services. The data for all of the independent variables was obtained from different departments within the U.S. Census Bureau.
My hypothesis, based on previous literature, is that these social factors should have a significant impact on the teenage birth rate in each state. This paper looks at the tests used to determine if this hypothesis was correct. It also looks at the possible problems that might affect the accuracy of those results as well as corrections for those problems should they exist. After much testing, I find each of the independent variables to have a significant impact on the teenage birth rate even after correcting the standard errors with the White correction for some slight heteroskedasticity. Other testing finds that there are no other econometric issues with the model.
Literature Review
While looking for literature concerning teen pregnancy, I was not able to find an actual regression model to aid in the construction of my own. However, I did find numerous books about social factors that affect the probability that a teenage girl will become pregnant. Dryfoos (1990) found that “while information about births is relatively easy to collect in a given population, abortion rates are elusive because of underreporting” (175). This means that the actual teen pregnancy rate is hard to calculate, and because of this my dependent variable will be the teenage birth rate rather than the teenage pregnancy rate.
One of the factors that contributes to early childbearing is education. Dryfoos found that having a low expectation for educational attainment increased the probability of having children early in life. This seems logical because if a girl is not planning on continuing her schooling, she will probably be more inclined to start a family earlier. Luker (1996) found that teen mothers had likely been held back a grade or had had trouble in school. Hamburg and Lancaster (1986) state in their book School-Age Pregnancy and Parenthood, “positive attitudes towards education, higher levels of educational achievement, and clear education goals appear to make non-marital intercourse less likely for both white and black females” (198). If non-marital intercourse were less likely, then an early pregnancy would be less likely and would result in a lower birth rate.
This is why I include a variable to measure a state’s educational attainment and the completion of high school.
Another factor that is correlated with early childbearing is race. Maynard (1997) found that the percentage of teens having sex differed from race to race. African American teens were found to be more sexually active than white and Hispanic teens. It would seem logical then, that their birth rates would also be higher. Dryfoos found that early childbearing was more likely to occur if the girl was African American. Wodarski (1995) also came to the conclusion that “sexual activity among teens varies according to race,” (5) with African American teens being more sexually active than white teens. At the time their book was published, the pregnancy rate for African Americans was twice as high as for whites. I therefore include a variable to account for racial differences.
Income also seems to be a determining factor in early childbearing. Dryfoos found that many teen mothers come from a low-income family and/or an area of high unemployment. Luker came to the same conclusion. She found that eighty percent of teen mothers came from poor backgrounds and, along with the race issue, forty percent of those were of African American background. To account for this, I include a variable for income. Both Dryfoos and Luker found that teen mothers likely come from single parent families so I include a variable to account for that as well.
Model
The dependent variable in my equation is the teen birth rate in each state in 2000, measured in births per thousand. This variable is labeled as BR. The independent variables in my equation are education, race, single parent families, and income.
The first variable in my equation is education. This variable is labeled GED and is the percentage of the state's population with a high school diploma or general education diploma. I expect that as education increases, the birth rate goes down, because according to previous literature a girl who has higher expectations for education is less likely to conceive early in life. This makes sense: a teenager who is serious about getting an education will be more cautious about having sex, especially unprotected sex, knowing that a pregnancy would hinder plans for further schooling. If this is true, then the sign of the coefficient for GED should be negative.
The second variable in my equation is race. This variable will be labeled as R, and it is the percentage of the population that is white. My research has found that minorities are more likely to have children early in life. This may be because of cultural differences or some other factor. I would expect that as the percentage of whites in a state increases, the birth rate would decrease. This would make the sign of the coefficient negative.
The third variable in my equation is single parent families. This variable is labeled SPF and is the percentage of families in a state with only one parent. My research shows that having only one parent around is likely to have a positive effect on early childbearing. This would mean that as the percentage of single parent families increases, the birth rate would increase as well. This would make the sign of the coefficient positive.
The final variable in my equation is income. This is labeled as Y. It is the per capita income of the state. Research shows that girls and boys who come from a low-income family are more likely to have children early. This would make sense for a number of reasons. Since these families may not have money to send their kids to college, their expectations for education may not be as high. This would increase the probability that their sons and daughters would have children early. Along with this, these girls and boys may have to drop out of school to get a job to help support the family. Another reason this would make sense is that if these girls are having sex, they may not have the money for effective contraceptives such as birth control pills. This would mean that as income goes up, the birth rate would go down, making the sign of the coefficient negative.
The mathematical representation of my model would be as follows:
BRi = β0 + β1 GEDi + β2 Ri + β3 SPFi + β4 Yi + εi
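As a rough illustration of how an equation of this form could be estimated by ordinary least squares, the sketch below uses NumPy. The data are randomly generated placeholders, not the state-level data used in this paper, and the function name `fit_ols` and the "true" coefficients are hypothetical values chosen only so the example has something to recover.

```python
import numpy as np

def fit_ols(X, y):
    """Return OLS estimates [b0, b1, ..., bk] for y = b0 + X @ b[1:] + error."""
    # Prepend a column of ones so the first coefficient is the intercept.
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

# Placeholder data: 51 observations (fifty states plus D.C.) and four regressors.
# These numbers are simulated and are NOT the paper's data.
rng = np.random.default_rng(0)
n = 51
X = rng.normal(size=(n, 4))
true_beta = np.array([50.0, -1.0, -0.5, 2.0, -0.8])  # hypothetical coefficients
y = true_beta[0] + X @ true_beta[1:] + 0.1 * rng.normal(size=n)

beta_hat = fit_ols(X, y)
print(beta_hat)
```

With real data, the four columns of `X` would hold GED, R, SPF, and Y, and `y` would hold BR.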
The main hypothesis I am testing is if these variables have a significant effect on the birth rate. My null and alternative hypotheses are as follows:
Ho: β1≥0 ; β2≥0 ; β3≤0 ; β4≥0
Ha: β1<0 ; β2<0 ; β3>0 ; β4<0
The form of my equation is linear because I think the slopes of my variables with respect to BR are constant. Since I am not using time series data, I do not expect to have a problem with serial correlation but I will be testing for it. There could possibly be a problem with multicollinearity so I will look at the correlations between the independent variables to see if any of them are too high. There may be some correlation between income and education or income and single parent families. Since cross-sectional models typically come across problems with heteroskedasticity, I will definitely test for this problem. I will use the White heteroskedasticity test and, if needed, the White correction. I do not foresee simultaneity being a problem in my model because I do not think that current birth rates affect my independent variables.
Data
My data is cross-sectional from each of the fifty states as well as Washington D.C. from 2000. The descriptive statistics can be seen in Table 2. I have used percentages or per capita figures in order to limit the large variances between states that arise from large population differences. This should help reduce the problem of heteroskedasticity. My data was collected from numerous government websites, the most useful being that of the U.S. Census Bureau, where I got my data for all the independent variables. The data for birth rates was collected from the U.S. Department of Health and Human Services. I did not encounter any problems with my data such as mismeasurement or missing observations.
Empirical Results
I first ran ordinary least squares on my model. This gave me a regression equation of:
BR = 164.87 – 1.15GED – 0.31R + 2.77SPF – 0.0009Y
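The fitted equation above can be read as a rule for how the predicted birth rate moves when one variable changes and the others are held fixed. A minimal sketch of that arithmetic follows; the coefficients are the ones reported in the equation, while the function name `predicted_change` is only illustrative.

```python
# Estimated slope coefficients from the fitted equation (births per thousand).
COEF = {"GED": -1.15, "R": -0.31, "SPF": 2.77, "Y": -0.0009}

def predicted_change(changes):
    """Predicted change in BR for the given changes in the regressors.

    `changes` maps a variable name to its change, e.g. {"Y": 1000} for a
    $1,000 rise in per capita income. Variables not listed are held constant.
    """
    return sum(COEF[name] * delta for name, delta in changes.items())

# One-percentage-point rise in the single parent share, all else equal.
print(predicted_change({"SPF": 1}))
# $1,000 rise in per capita income, all else equal.
print(predicted_change({"Y": 1000}))
```

Because the model is linear, the effects of several simultaneous changes simply add up.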
As stated earlier, I expected GED, R, and Y to have a negative effect on BR and SPF to have a positive effect. If GED increases by one percentage point, BR will decrease by 1.15 per thousand. If R increases by one percentage point, BR will decrease by 0.31 per thousand. If SPF increases by one percentage point, BR will increase by 2.77 per thousand. If Y increases by $1,000, BR will decrease by 0.9 per thousand. My R² was 0.72 and my adjusted R² was 0.69 (see Table 3). Since the adjusted R² was not much lower, this means that the model is a good fit and the variables explain roughly sixty-nine percent of the variation in the birth rate across states. Looking at the preliminary equation, the coefficients have the expected signs. To test whether they are significant, I did a hypothesis test. My null and alternative hypotheses were as follows:
Ho: β1≥0 ; β2≥0 ; β3≤0 ; β4≥0
Ha: β1<0 ; β2<0 ; β3>0 ; β4<0
I used the ninety-five percent confidence level for a one-tailed test, which would mean the critical t-value would be approximately 1.684 with 46 degrees of freedom. The t-statistics for the variables were: GED = -3.521; R = -3.253; SPF = 2.204; Y = -3.222 (see Table 3). Since each of these has the expected sign and is greater in absolute value than the critical t-value, I can reject the null for each variable and conclude that each is a significant variable. I also did an F-test to test the significance of the model as a whole. I again used the ninety-five percent confidence level. The null and alternative hypotheses are as follows:
Ho: β1 = β2 = β3 = β4 = 0
Ha: Ho is not true.
The p-value for the F-statistic was approximately 0, which is below 0.05, so I can reject the null and conclude that the model as a whole is significant.
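The decisions for the one-sided t-tests and the overall F-test can be retraced with a few lines of arithmetic. The sketch below uses the t-statistics and the critical value of 1.684 reported above. The F-statistic is computed from the rounded R² of 0.72 using the standard formula F = (R²/k) / ((1 − R²)/(n − k − 1)), so it is only an approximation and is not a figure taken from Table 3. The helper names are illustrative.

```python
T_CRITICAL = 1.684  # one-tailed critical value used in the text

# Reported t-statistics and the sign each alternative hypothesis expects.
t_stats = {"GED": -3.521, "R": -3.253, "SPF": 2.204, "Y": -3.222}
expected_sign = {"GED": -1, "R": -1, "SPF": +1, "Y": -1}

def reject_one_sided(t, sign, t_crit=T_CRITICAL):
    """Reject the null only if t lies beyond t_crit in the expected direction."""
    return sign * t > t_crit

decisions = {
    name: reject_one_sided(t_stats[name], expected_sign[name]) for name in t_stats
}
print(decisions)

def f_from_r2(r2, n, k):
    """Overall F-statistic implied by R-squared, n observations, k regressors."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))

# n = 51 (fifty states plus D.C.), k = 4 regressors, R-squared rounded to 0.72.
f_approx = f_from_r2(0.72, n=51, k=4)
print(round(f_approx, 2))
```

Multiplying each t-statistic by its expected sign means a large t-statistic with the wrong sign would not count as a rejection, which is the point of a one-sided test.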
Next, I look at some possible econometric issues that my model might face. In order to check for multicollinearity, I look at the correlations between each of the variables (see Table 4). The correlation between SPF and the other variables was a bit high, but since the t-statistics were not low (low t-statistics being another indicator of multicollinearity), I do not feel that this is a problem.
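A correlation check of this kind could be sketched as follows with NumPy. The three columns are small made-up vectors with placeholder names, not the variables in Table 4, and the 0.8 cutoff is an assumed rule of thumb rather than a threshold used in this paper.

```python
import numpy as np

def high_correlations(data, names, threshold=0.8):
    """List pairs of columns whose absolute correlation exceeds `threshold`."""
    corr = np.corrcoef(data, rowvar=False)  # columns are variables
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) > threshold:
                flagged.append((names[i], names[j], round(float(corr[i, j]), 3)))
    return flagged

# Made-up example: x2 is an exact linear function of x1, x3 is nearly unrelated.
x1 = np.arange(10, dtype=float)
x2 = 2.0 * x1 + 1.0
x3 = np.array([1.0, -1.0] * 5)
data = np.column_stack([x1, x2, x3])

flagged = high_correlations(data, ["x1", "x2", "x3"])
print(flagged)
```

Pairwise correlations only catch collinearity between two variables at a time, so this is a first screen rather than a complete test.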
I then check for serial correlation. Since my model does not use time series data, I do not expect this to be a problem, but to check, I look at the Durbin-Watson statistic (see Table 3). At the ninety-five percent level, the lower limit is 1.38 and the upper limit is 1.72. Since the Durbin-Watson statistic of 1.9 is greater than the upper limit of 1.72 (and less than 4 − 1.72 = 2.28), I can conclude that I do not have serial correlation.
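The Durbin-Watson decision rule applied above can be written out explicitly. The sketch below computes the statistic from a list of residuals and classifies a value against the lower and upper limits; the residuals in the example are invented purely to exercise the formula, and the function names are illustrative.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    divided by the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

def dw_decision(dw, d_lower, d_upper):
    """Classify a Durbin-Watson value using the usual five regions."""
    if dw < d_lower:
        return "positive serial correlation"
    if dw <= d_upper:
        return "inconclusive"
    if dw < 4 - d_upper:
        return "no serial correlation"
    if dw <= 4 - d_lower:
        return "inconclusive"
    return "negative serial correlation"

# Values from the text: statistic 1.9, limits 1.38 and 1.72.
print(dw_decision(1.9, d_lower=1.38, d_upper=1.72))

# Invented residuals that alternate in sign, giving a statistic above 2.
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))
```

A value near 2 indicates no first-order serial correlation, which is why the statistic is compared with both the upper limit and 4 minus the upper limit.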
The next problem to check for is heteroskedasticity. I first did a White test (see Table 5), which tests the null hypothesis that there is no heteroskedasticity. Tested at the ninety-five percent confidence level, I cannot reject the null, so there is no significant evidence of heteroskedasticity. However, tested at the ninety percent confidence level, I can reject the null, which would suggest that there is heteroskedasticity. Since the model is borderline for having heteroskedasticity, it would be appropriate to correct the problem. In order to correct for it, I ran an equation with the White correction to fix the standard errors. This does not change the standard errors in any meaningful way (see Table 6). Since the critical t-statistic was approximately 1.684, this would not change the significance of the variables, as their t-statistics are all still greater in absolute value than the critical t-value. I do not think simultaneity is an issue in this model, but a more comprehensive study might wish to test for simultaneous relationships between BR and other variables.
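To show what a White correction does mechanically, the sketch below computes ordinary and heteroskedasticity-robust standard errors side by side with NumPy. It assumes the basic HC0 form of the White estimator, which may differ by a small-sample scaling factor from what a given statistics package reports, and it runs on simulated data rather than the data behind Table 6. The function name is illustrative.

```python
import numpy as np

def ols_with_white_se(X, y):
    """Return OLS coefficients, conventional standard errors, and
    White (HC0) heteroskedasticity-robust standard errors."""
    X1 = np.column_stack([np.ones(len(y)), X])   # add intercept column
    n, k = X1.shape
    xtx_inv = np.linalg.inv(X1.T @ X1)
    beta = xtx_inv @ X1.T @ y
    resid = y - X1 @ beta

    # Conventional covariance: s^2 (X'X)^-1, with s^2 = RSS / (n - k).
    sigma2 = resid @ resid / (n - k)
    se_ols = np.sqrt(np.diag(sigma2 * xtx_inv))

    # White HC0 covariance: (X'X)^-1 [sum e_i^2 x_i x_i'] (X'X)^-1.
    meat = X1.T @ (X1 * resid[:, None] ** 2)
    se_white = np.sqrt(np.diag(xtx_inv @ meat @ xtx_inv))
    return beta, se_ols, se_white

# Simulated data whose error spread grows with the first regressor.
rng = np.random.default_rng(1)
n = 51
X = rng.normal(size=(n, 2))
noise = rng.normal(size=n) * (0.5 + np.abs(X[:, 0]))
y = 10.0 + X @ np.array([-1.0, 2.0]) + noise

beta, se_ols, se_white = ols_with_white_se(X, y)
print(beta / se_ols)    # conventional t-statistics
print(beta / se_white)  # robust t-statistics
```

The coefficients are identical under both approaches; only the standard errors, and therefore the t-statistics, change.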