The Determinants of Rural Children’s High School Enrollment in China

--an analysis based on multilevel models

Juan Yang

School of Economics and Business Administration

Beijing Normal University, 19 Xinjiekouwai Dajie, Beijing 100875, China

E-mail:

Terry Sicular

Department of Economics
University of Western Ontario, London, ON N6A 4A4, Canada

Desheng Lai

School of Economics and Business Administration

Beijing Normal University, 19 Xinjiekouwai Dajie, Beijing 100875, China

E-mail:

Preliminary draft

1 Introduction

The development of rural areas is key to the prospects of China’s economy and society. Economists regard the accumulation of physical capital and of human capital as the two driving forces of rural development (Jacoby, 1997). The former depends on improvements in rural financial markets, while the latter is determined by education, health, training, labor mobility and so on. Among these, education plays a particularly important role: at the macroeconomic level it can promote economic growth and improve income distribution, and at the microeconomic level it raises labor productivity and parents’ income.

However, despite the expansion of higher education, the educational attainment of rural students has not changed significantly. Moreover, fewer and fewer rural students gain access to top universities, and the share of rural students in tertiary education is continuously declining. In China, senior high school is the start of non-compulsory education and a prerequisite for entering university. Understanding rural students’ high school attainment can therefore shed light on their higher education decisions. Research on rural students’ educational choices at the high school (upper secondary) stage in China has long been neglected. Most studies are concerned with the dropout rate in compulsory education or with the enrollment decision for higher education (Liu, 2007; Li, 2010), since senior high school is treated either as an extension of compulsory education or as a preparation period for higher education. In the labor market, a high school diploma sends little signal of ability to employers and does not earn a corresponding return.

With the expansion of higher education, urban high school enrollment has increased significantly since 2000, while rural students’ high school enrollment has changed little (see Figure 1 for details). Studying the determinants of rural students’ senior high school attainment can help predict their higher educational attainment and suggest ways to improve their enrollment rate and to address the related urban-rural income gap.

In early human capital theory, educational investment is treated as an individual choice: individuals maximize utility by investing in education until the marginal cost equals the discounted marginal benefit. In the 1970s, some economists argued that educational choices should be modeled as a family decision rather than an individual one (Becker & Lewis, 1973; Lazear, 1980). Following the family decision model, many studies estimate the impact of family characteristics (family income, parental schooling, household size, maternal time, etc.) on children’s academic achievement (Ashenfelter & Rouse, 1998; Björklund et al., 2010). At the same time, the community or school district (reflected in local school supply, local economic development, etc.) is also considered an important factor in pupils’ educational achievement (Strauss, 1995).

Family background is widely recognized as an important factor in children’s education (Couch and Dunn, 1997). However, empirical evidence from most countries shows that the effect of family background falls over time, i.e., the cohort-specific effect of parents’ years of education on children’s years of education is higher for older cohorts and lower for younger cohorts (e.g., Hertz et al., 2007). Whether this effect also declines in China as the economy develops and education expands is unclear.

Several studies discuss the impact of family background on Chinese children’s schooling. Brown (2003) analyzes the effect of parents’ education on children’s academic achievement (test scores). He constructs a utility function in which parents choose the optimal levels of goods and time to maximize their utility. Using Chinese rural data, he shows that more educated parents provide higher levels of both education-related goods and education-related time, and that parental education has a strong positive effect on children’s test scores. Knight et al. (2009), using CHIP-2002 rural data, find that both parental education and family income have a significant impact on middle school dropout and on continuation to high school. Knight et al. (2011) use CHIP-2007 to investigate intergenerational mobility for each birth cohort and the educational policy in place in each period. Their findings suggest that the impact of family background is rising for later birth cohorts. Li (2010) finds that the expansion of higher education is strengthening the impact of family background on children’s educational attainment and that educational inequality between rural and urban areas is widening.

In 2000, the Chinese government implemented the tax and fee reform in rural areas, which greatly relieved rural families’ financial burden of education. At the same time, the government began to exempt poor rural students from textbook fees and miscellaneous fees and to subsidize their living expenses (a policy known as “two exemptions and one subsidy”). In 2005, the government provided special funds to exempt poor rural students from tuition fees and to cover their accommodation fees during compulsory education. In 2007, tuition and textbook fees were waived for all rural students in compulsory education. As a result of these policies, the growth of per-pupil expenditure accelerated markedly after 2000, and the rural-urban gap in educational expenditure has been continuously declining. This paper investigates whether the impact of family background and community characteristics on children’s high school attainment changed after this sharp increase in educational funding.

The existing empirical literature on individuals’ high school attainment is generally based on probit or logit models (Sawada, 2001; Jensen, 1997; Al-Samarrais, 1998), and most studies of China use similar methods to evaluate rural children’s educational decisions. As discussed above, the factors that may affect individuals’ educational choices operate at multiple levels, including at least the student level and the community level. Pupils’ enrollment decisions in different communities may be independent, while decisions within the same community may not be. Pooling all pupils and ignoring this within-community dependence is therefore not an appropriate way to analyze enrollment decisions. In a probit or logit model, individual variables and community variables enter the same regression equation, which is inappropriate because variables that reflect community characteristics should explain differences between communities rather than between individuals.

A multilevel model can explicitly handle hierarchically structured data, such as students clustered within communities. Several special-purpose statistical programs are designed specifically for estimating multilevel models[1], and Hierarchical Linear Model (HLM) is employed in this paper. Multilevel models, also known as hierarchical linear models or mixed-effects models, are widely used to explain pupils’ test scores, which are affected by factors at the individual level and the school level. Here we use such a model to decompose the differences in children’s high school attainment into differences between communities and differences between children within the same community (the random effects), and to explain the differences attributable to individual characteristics or to community characteristics (the fixed effects). The findings will indicate whether rural students’ low educational attainment stems mainly from family characteristics or from community characteristics, and whether, and at what level, educational equality could be increased by expanding public educational expenditure. These results could provide some guidance for the government in setting future public expenditure policy for education.

The rest of the paper is organized as follows. Section 2 sets out the empirical estimation method. Section 3 introduces the data and descriptive statistics. Section 4 reports the estimation results, and the final section concludes.

2 Estimation method

The estimation process of HLM includes two steps: first, regressing the individual-level variables group by group; second, after recording the first-step results, regressing on both the individual-level and the higher-level variables. For example, suppose children’s high school attainment (H) is affected by family characteristics (X), and that this effect varies with school quality (Q) in the community. Since family characteristics and school quality are variables at different levels, we cannot simply regress H on X and Q in a single-level model. A multilevel model treats the effect of Q on H as an intercept or slope effect in the regression of H on X. In each community, the same regression equation is used to estimate its own $\beta_{0j}$, $\beta_{1j}$ and $\varepsilon_{ij}$ as:

$H_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + \varepsilon_{ij}$ (1)

where j represents the community and i represents the pupil. $\beta_{0j}$ is the intercept and $\beta_{1j}$ is the coefficient on X. Then $\beta_{0j}$ and $\beta_{1j}$ are treated as dependent variables and regressed on community variables, e.g.

$\beta_{0j} = \gamma_{00} + \gamma_{01} Q_j + u_{0j}$ (2)

$\beta_{1j} = \gamma_{10} + \gamma_{11} Q_j + u_{1j}$ (3)
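
Substituting the two community-level equations into the pupil-level equation gives a single combined equation, which helps show why Q cannot simply be added as one more regressor. The block below is a sketch under assumed notation: the symbols $\beta_{0j}$, $\beta_{1j}$, $\gamma_{00}$, $\gamma_{01}$, $\gamma_{10}$, $\gamma_{11}$, $u_{0j}$, $u_{1j}$ and $\varepsilon_{ij}$ follow the common convention of Raudenbush & Bryk (2002) and are not necessarily the symbols of the original equations.

```latex
% Assumed notation for equations (1)-(3), followed by the combined model.
\begin{align*}
H_{ij}     &= \beta_{0j} + \beta_{1j} X_{ij} + \varepsilon_{ij} \\
\beta_{0j} &= \gamma_{00} + \gamma_{01} Q_j + u_{0j} \\
\beta_{1j} &= \gamma_{10} + \gamma_{11} Q_j + u_{1j} \\
% Substituting the second and third lines into the first:
H_{ij}     &= \underbrace{\gamma_{00} + \gamma_{01} Q_j + \gamma_{10} X_{ij}
              + \gamma_{11} Q_j X_{ij}}_{\text{fixed part}}
              + \underbrace{u_{0j} + u_{1j} X_{ij} + \varepsilon_{ij}}_{\text{random part}}
\end{align*}
```

The random part contains $u_{0j}$ and $u_{1j}$, which are shared by all pupils in community j, so the composite errors of pupils in the same community are correlated. A single-level regression of H on X and Q ignores this correlation.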

The impact of community and family characteristics on pupils’ high school decisions is evaluated in a two-level model. Multilevel models can be estimated by maximum likelihood, restricted maximum likelihood, or the expectation-maximization (EM) algorithm (see Raudenbush & Bryk, 2002, for details). We employ the Bernoulli model, which is widely used in HLM statistical software, to analyze the determinants of pupils’ high school decisions. The Bernoulli model describes the differences among individuals at level 1 and the differences among communities at level 2.
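
To make the two-level Bernoulli structure concrete, the following minimal sketch simulates data from a random-intercept logit model. It is an illustration only, not the estimation routine of the HLM software: the function name, the parameter names (`gamma00`, `gamma01`, `beta1`, `tau00`) and the standard-normal covariates are all assumptions introduced here.

```python
import math
import random


def simulate_enrollment(n_communities, n_pupils, gamma00, gamma01, beta1,
                        tau00, seed=0):
    """Simulate pupils nested in communities under a random-intercept logit.

    Hypothetical data-generating process (all names are assumptions):
      level 2: beta0_j = gamma00 + gamma01 * w_j + u_j,  u_j ~ N(0, tau00)
      level 1: logit P(y_ij = 1) = beta0_j + beta1 * x_ij
    """
    rng = random.Random(seed)
    rows = []
    for j in range(n_communities):
        w_j = rng.gauss(0.0, 1.0)                # community characteristic
        u_j = rng.gauss(0.0, math.sqrt(tau00))   # community random effect
        beta0_j = gamma00 + gamma01 * w_j + u_j  # community-specific intercept
        for _ in range(n_pupils):
            x_ij = rng.gauss(0.0, 1.0)           # family characteristic
            eta = beta0_j + beta1 * x_ij         # log-odds of attendance
            p = 1.0 / (1.0 + math.exp(-eta))     # inverse logit
            y = 1 if rng.random() < p else 0     # Bernoulli draw
            rows.append({"community": j, "x": x_ij, "w": w_j, "y": y})
    return rows


if __name__ == "__main__":
    data = simulate_enrollment(5, 10, gamma00=-1.0, gamma01=0.5,
                               beta1=0.8, tau00=0.5, seed=1)
    print(len(data), "simulated pupils")
```

Because every pupil in community j shares the same intercept, outcomes within a community are more alike than outcomes across communities, which is the dependence the multilevel model is meant to capture.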

Level 1:

The high school attainment of individual i in community j can be expressed as a logit model:

$\mathrm{Prob}(H_{ij} = 1 \mid \beta_j) = \phi_{ij}$

$\log\left[\frac{\phi_{ij}}{1 - \phi_{ij}}\right] = \beta_{0j} + \sum_{k=1}^{K} \beta_{kj} X_{kij} + r_{ij}$

$H_{ij}$ represents the high school attainment of individual i in community j. $H_{ij} = 1$ means that individual i has attended or is attending high school; $H_{ij} = 0$ means that individual i did not attend high school. $\phi_{ij}$ is the probability that individual i in community j attends high school, given his or her own and family characteristics. $X_{kij}$ denotes explanatory variable k for individual i in community j, and $\beta_{kj}$ is its corresponding regression coefficient. $r_{ij}$ is the error term, which is assumed to be normally distributed.

Level 2:

$\beta_{0j} = \gamma_{00} + \sum_{m=1}^{M} \gamma_{0m} W_{mj} + u_{0j}$

$\beta_{1j} = \gamma_{10}$

……

$\beta_{Kj} = \gamma_{K0}$

In the multilevel linear regression model, the second level may have an intercept effect and a slope effect on the first level, as in equations (2) and (3). We assume that community characteristics have only an intercept effect on individuals’ educational choices. In other words, $\beta_{1j}$, …, $\beta_{Kj}$ are constant, equal to $\gamma_{10}$, …, $\gamma_{K0}$ respectively, and are the same for all pupils. This implies that the impact of family characteristics is constant across villages. $\beta_{0j}$ differs for pupils in different communities, which means that the average probability of high school attendance differs across communities. That is to say, community characteristics may influence mean high school attainment.[2] $W_{mj}$ represents community characteristics, and $\gamma_{0m}$ are the regression coefficients of the corresponding variables. $u_{0j}$ is the residual, which is assumed to be normally distributed. Denoting the variance of $r_{ij}$ as $\sigma^2$ and the variance of $u_{0j}$ as $\tau_{00}$, the percentage of observed variation in high school attainment attributable to community-level characteristics is found by dividing $\tau_{00}$ by the total variance:

$\rho = \frac{\tau_{00}}{\tau_{00} + \sigma^2}$
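
The share of variance attributable to communities can be computed directly once the two variance components are available. The sketch below assumes the community-level variance is called `tau00` and the individual-level variance `sigma2` (both names are introduced here for illustration). It also includes the constant π²/3, which is a common convention for the individual-level variance when a logit model is read on its latent scale; whether that convention applies to the estimates in this paper is an assumption, not something the text states.

```python
import math

# Variance of the standard logistic distribution; a common (assumed) choice
# for the level-1 variance when a logit model is read on the latent scale.
LOGISTIC_RESIDUAL_VARIANCE = math.pi ** 2 / 3


def community_variance_share(tau00, sigma2):
    """Return tau00 / (tau00 + sigma2), the community-level share of variance.

    tau00  : variance of the community-level residual (hypothetical name)
    sigma2 : variance of the individual-level error (hypothetical name)
    """
    if tau00 < 0 or sigma2 < 0:
        raise ValueError("variances must be non-negative")
    total = tau00 + sigma2
    if total == 0:
        raise ValueError("total variance must be positive")
    return tau00 / total


if __name__ == "__main__":
    # Illustrative numbers only, not estimates from the paper.
    print(community_variance_share(1.0, 3.0))
    print(community_variance_share(0.5, LOGISTIC_RESIDUAL_VARIANCE))
```

A share near zero would suggest that communities differ little once individual variation is accounted for, while a larger share would point to a stronger role for community-level factors.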


3 Data and descriptive statistics

In this paper, we use data from the Chinese Household Income Project (CHIP), a large survey designed by Chinese and foreign economists and carried out by the National Bureau of Statistics of China, to analyze the determinants of pupils’ high school attainment. The CHIP data contain comprehensive information on individual characteristics and family income, covering the west, middle and east of the country. So far there have been four waves of the CHIP survey, carried out in 1989, 1996, 2003 and 2008 for the years 1988, 1995, 2002 and 2007. CHIP comprises an urban survey, a rural survey and a migrant survey, defined by respondents’ place of household registration. To avoid unobserved policy influences, we use only the recent CHIP-2002 and CHIP-2007 rural surveys to analyze rural students’ high school attainment. The CHIP-2002 rural survey covers 22 provinces, 9,200 households and around 38,000 individuals. The CHIP-2007 rural survey covers 9 provinces, about 8,000 households and more than 31,000 individuals. The rural survey also contains detailed information about the villages in which individuals live (e.g., whether the village has a primary school or a middle school, and the distance to the nearest primary school and middle school), which is important for individuals’ educational decisions and can be used to analyze the community effect.

To analyze pupils’ high school decisions, we select those aged 16 to 20 as our sample. In China, pupils generally begin school at age 6 to 8 and take 8 to 9 years to finish compulsory education, which comprises primary school and junior middle school. We therefore assume that most individuals aged 16 could attend high school if they wished[3]. We cut the sample off at age 20 in order to retain as many observations as possible while avoiding sample selection bias. Examining the children still living with their parents at each age, we find that only 3 household heads or spouses of household heads in our survey are younger than 21. This suggests that a cutoff at age 20 is reasonable. We also report regression results for children aged 16 to 18.

After matching children aged 16 to 20 with their parents within the household, we obtain detailed information about the children’s parents, including their education, income, health status and other characteristics. The matched CHIP-2002 sample consists of 4,169 children and the matched CHIP-2007 sample of 3,121. The definitions and summary statistics of the variables that may be used in the analysis are reported in Table 1.
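
The sample construction described above (age restriction, then matching each child to the parents in the same household) can be sketched as follows. This is a hypothetical illustration: the record keys `household_id`, `relation`, `age` and `edu_years`, and the relation labels, are assumptions and not CHIP variable names. Leaving `pschool` empty when a parent record is missing is also an assumption of this sketch.

```python
def match_children_with_parents(persons, min_age=16, max_age=20):
    """Keep children in the age window and attach parental schooling.

    `persons` is a list of dicts with hypothetical keys:
    household_id, relation ("father", "mother" or "child"), age, edu_years.
    """
    # Index father and mother records by household.
    parents = {}
    for person in persons:
        if person["relation"] in ("father", "mother"):
            household = parents.setdefault(person["household_id"], {})
            household[person["relation"]] = person

    sample = []
    for person in persons:
        if person["relation"] != "child":
            continue
        if not (min_age <= person["age"] <= max_age):
            continue
        household = parents.get(person["household_id"], {})
        father = household.get("father")
        mother = household.get("mother")
        # pschool = father's years of education + mother's years of education;
        # left as None when either parent record is missing (an assumption).
        pschool = None
        if father is not None and mother is not None:
            pschool = father["edu_years"] + mother["edu_years"]
        sample.append({
            "household_id": person["household_id"],
            "age": person["age"],
            "pschool": pschool,
        })
    return sample


if __name__ == "__main__":
    people = [
        {"household_id": 1, "relation": "father", "age": 45, "edu_years": 8},
        {"household_id": 1, "relation": "mother", "age": 43, "edu_years": 6},
        {"household_id": 1, "relation": "child", "age": 17, "edu_years": 9},
        {"household_id": 1, "relation": "child", "age": 12, "edu_years": 6},
    ]
    print(match_children_with_parents(people))
```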

Table 1 Definitions and Summary Statistics of the Variables

Variable / Definition / CHIP-2002 Mean (Std. Dev.) / CHIP-2007 Mean (Std. Dev.)
Children’s characteristics
chigh / Have you ever attended (or are you currently attending) high school* (Yes=1; No=0) / 0.2780 (0.4481) / 0.3675 (0.4822)
male / If the child is a boy, male=1 / 0.5218 (0.4996) / 0.5178 (0.4998)
age / Child’s age / 17.8960 (1.4325) / 18.1186 (1.3708)
age2 / Child’s age squared / 322.3173 (51.4987) / 330.1613 (49.5484)
score / Child’s middle school academic performance (very good=1, good=2, normal=3, bad=4, very bad=5) / 2.5627 (0.7366) / 2.7654 (0.7247)
badhealth / If the individual reports a health status of not very good or bad, badhealth=1 / 0.0072 (0.0843) / 0.0190 (0.1364)
sib / Number of siblings / 1.4827 (0.9897) / 1.1917 (0.9387)
Family characteristics
hincome / Family annual net income per capita (unit: 10,000 yuan) / 0.2635 (0.2168) / 0.5062 (0.3975)
pschool / Parental schooling = father’s years of education + mother’s years of education / 13.3275 (4.4776) / 14.6963 (3.5630)
single / If the child’s mother or father is dead or the parents are divorced, single=1 / 0.0200 (0.1402) / 0.0145 (0.1195)
gradparent / If the child lives with a grandparent, gradparent=1 / 0.1222 (0.3275) / 0.0241 (0.1535)
fbadhealth / If the father’s health status is not very good or bad, fbadhealth=1 / 0.0394 (0.1945) / 0.0434 (0.2039)
mbadhealth / If the mother’s health status is not very good or bad, mbadhealth=1 / 0.0556 (0.2292) / 0.0483 (0.2039)
fminority / If the father is not Han Chinese, fminority=1 / 0.1407 (0.3478) / 0.0076 (0.0868)
Village characteristics
vprimary / If the village has a primary school, vprimary=1 / 1.7220 (1.1496) / 2.3120 (1.3280)
vmiddistan / The distance to the nearest junior middle school / 3.9804 (4.8940) / 2.0303 (0.8182)
vincome / Average income per capita in the village (unit: 10,000 yuan) / 0.2357 (0.1372) / 0.3350 (0.1540)
govedu / Public educational expenditure per student (unit: 1,000 yuan) / 0.6129 (0.3243) / 1.3202 (0.8123)

The statistics from our matched dataset show that the share of boys in rural areas is almost unchanged at around 52%, slightly above 50%, reflecting the persistence of the traditional view that boys are the roots of the family.[4] The number of siblings is declining, owing to the “one-child policy”. As expected, average net family income and educational expenditure rise from 2002 to 2007 with economic development. Parents’ average schooling rises as well: mean combined parental schooling grows from 13.33 years in 2002 to 14.70 years in 2007.

In this paper, we examine youngsters’ high school decisions; we therefore describe separately those who enrolled in high school and those who did not, to see whether their characteristics and family backgrounds differ. Table 2 shows that, among those who attended high school, the share of boys is larger than that of girls. This suggests that gender discrimination remains serious in rural areas. In line with our expectations, average family income, average educational expenditure and parental schooling are higher for those who attended high school than for those who did not, and their average number of siblings is lower. We also find that the differences in family background between the two groups had narrowed by 2007, while the difference in educational expenditure continued to widen.

Table 2 Statistical Description of Children Aged 16-20: High School Attendants and Non-High School Attendants

Variable / 2002: Attend high school / 2002: Not attend high school / 2007: Attend high school / 2007: Not attend high school
Boy (%) / 56.11 / 48.91 / 54.57 / 49.03
Average number of siblings / 1.36 / 1.68 / 1.08 / 1.24
Average net income of family (10,000 yuan) / 1.36 / 1.10 / 2.29 / 2.07
Average education expenditure (100 yuan) / 34.16 / 24.79 / 31.75 / 18.76
Father’s average schooling / 8.34 / 7.40 / 8.17 / 7.72
Mother’s average schooling / 6.70 / 5.49 / 7.03 / 6.67
Location (%)
East / 42.64 / 34.19 / 49.20 / 40.39
Middle / 27.50 / 30.94 / 43.14 / 49.33
West / 29.86 / 34.87 / 7.65 / 10.28
Observations / 1284 / 2885 / 1358 / 1763

The other part of the data comes from county-level public finance administrative data for 1999 to 2006. We match the CHIP-2002 data with the average public educational expenditure per junior middle school student between 1999 and 2001, and the CHIP-2007 data with the corresponding average between 2003 and 2005.

In 2002, the survey covers 122 counties, but after dropping observations with missing budgetary educational expenditure per student, the dataset includes only 101 counties. The reasons for the missing data vary. Some counties were converted into urban administrative areas between 1999 and 2001 and are no longer covered by the survey. Some counties may have changed their names owing to the reallocation of villages and can no longer be tracked in our data. After restricting the age to between 16 and 18, 2,188 observations remain. In 2007, the survey covers 82 areas, and after dropping observations with missing budgetary educational expenditure per student, 81 areas remain. After restricting the age to between 16 and 18, 1,660 observations remain. Figure 1 compares budgetary educational expenditure per student in the sample with the national and regional averages; it shows that expenditure per student in the sample generally reflects the actual situation, although it is slightly lower than the national level.
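
The matching step described above (averaging county expenditure over a three-year window, attaching it to survey records, and dropping counties with missing expenditure) can be sketched as follows. The record layout, the key `county` and the function names are hypothetical; only the variable name `govedu` comes from Table 1.

```python
def average_expenditure(records, first_year, last_year):
    """Average per-student expenditure by county over [first_year, last_year].

    `records` is an iterable of (county, year, expenditure) tuples; the
    layout is an assumption. Missing values (None) are skipped.
    """
    totals = {}
    for county, year, value in records:
        if value is None or not (first_year <= year <= last_year):
            continue
        running_sum, count = totals.get(county, (0.0, 0))
        totals[county] = (running_sum + value, count + 1)
    return {county: s / n for county, (s, n) in totals.items()}


def attach_expenditure(children, county_means):
    """Attach the county average as `govedu`; drop children whose county
    has no expenditure data, mirroring the deletion of missing counties."""
    matched = []
    for child in children:
        mean = county_means.get(child["county"])
        if mean is not None:
            matched.append({**child, "govedu": mean})
    return matched


if __name__ == "__main__":
    finance = [("A", 1999, 0.5), ("A", 2000, 0.7), ("A", 2001, 0.9),
               ("A", 2002, 5.0), ("B", 2000, None)]
    means = average_expenditure(finance, 1999, 2001)
    kids = [{"id": 1, "county": "A"}, {"id": 2, "county": "B"}]
    print(attach_expenditure(kids, means))
```

For CHIP-2002 the window would be 1999 to 2001, and for CHIP-2007 it would be 2003 to 2005, as stated above.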