EVALUATING THE HOUSING UNIT METHOD:
A CASE STUDY OF 1990 POPULATION ESTIMATES IN FLORIDA
Stanley K. Smith
and
Scott Cody
Bureau of Economic and Business Research
University of Florida
Gainesville, Florida 32611
BIOGRAPHICAL SKETCH
Stanley K. Smith is a Professor of Economics and Director of the Bureau of Economic and Business Research at the University of Florida. Scott Cody is a research demographer at the Bureau. Smith and Cody produce the official state and local population estimates used by the State of Florida for revenue-sharing, planning, and budgeting.
ABSTRACT
The housing unit (HU) method is the most commonly used approach to making small-area population estimates in the United States. This study evaluates the accuracy and bias of HU population estimates produced for counties and subcounty areas in Florida for April 1, 1990. The major findings are that population size has a negative effect on estimation errors (disregarding sign) but no effect on bias; growth rates have a U-shaped effect on estimation errors (disregarding sign) and a negative effect on bias; electricity customer data provide more accurate household estimates than do building permit data; errors in household estimates contribute more to population estimation error than do errors in estimates of average household size or group quarters population; and the application of professional judgment improves the accuracy of purely mechanical techniques. We believe the HU method offers a number of advantages over other population estimation methods and provides planners and demographers with a powerful tool for small-area analysis.
Introduction
Postcensal population estimates for states and local areas are used for a wide variety of purposes in the United States. They form the basis for the distribution of billions of dollars of federal, state, and local government funds. They determine boundaries and representation for city councils, county commissions, school boards, and other political entities. They are used for planning when and where to build new schools, roads, hospitals, banks, electric power plants, and shopping centers. They provide an important tool for marketing a wide variety of goods and services, and even determine the salaries of some public officials. Clearly, there is a profound need for accurate and timely postcensal population estimates.
Several different methods can be used to make population estimates (see Murdock and Ellis 1991; National Research Council 1980; and Rives, Serow, Lee, and Goldsmith 1989). At the substate level, the HU method is by far the most commonly used (U.S. Bureau of the Census 1983, 1990). This method is widely accepted because it can use a variety of data sources and estimation techniques, can be applied virtually everywhere, and can produce reasonably accurate estimates. Given its widespread use and the importance of population estimates for many types of planning and budgeting, it is essential to evaluate the performance of the HU method from time to time.
This article provides such a critical evaluation. It focuses on April 1, 1990 population estimates for counties and subcounty areas in Florida.[1] It evaluates estimation errors by size of place and rate of growth, by component (i.e., households, persons per household, and group quarters population), and by technique. It calculates the contribution of each component to overall estimation error, considers the role of judgment in producing population estimates, and compares the performance of 1990 estimates with that of 1980 estimates. It confirms some results that have been found before and reports others that are new. Although this study focuses on Florida, it provides insights into the HU method that will be useful in a much broader context.
Many planners have used the HU method to produce small-area population estimates; others have used similar concepts, data sources, and techniques for analyses of fiscal impacts (e.g., Burchell and Listokin 1978), residential mobility (e.g., Varady 1984), age structure (e.g., Myers and Doyle 1990), household size (e.g., Gober 1990), and housing demand (e.g., Myers 1987). Planners have thus been intimately involved in developing the HU method and extending its application into new areas. We believe the present study will help both planners and demographers make more effective use of this increasingly important tool for small-area analysis.
Demographers typically distinguish between population estimates and population projections (or forecasts). Estimates refer to the present or some time in the past, whereas projections refer to the future. In terms of methodology, the primary difference between estimates and projections is that estimates can be based on symptomatic data corresponding to the date of the estimate, whereas projections cannot be based on such data; rather, projections must be based on the extrapolation of past trends or assumptions about future demographic change. In this article we focus solely on population estimates.
Brief Description of Methodology[2]
The foundation of the HU method is the fact that almost everyone lives in some type of housing structure, whether a traditional single family unit, an apartment, a mobile home, a college dormitory, or the state penitentiary. The population of any geographic area can therefore be calculated as the number of occupied housing units (households) times the average number of persons per household (PPH), plus the number of persons living in group quarters facilities (e.g., college dormitories, prisons, military barracks) or without traditional housing (e.g., the homeless):
$$P_t = (H_t \times \mathit{PPH}_t) + \mathit{GQ}_t \qquad (1)$$
where $P_t$ = total population at time $t$, $H_t$ = occupied housing units at time $t$, $\mathit{PPH}_t$ = average number of persons per household at time $t$, and $\mathit{GQ}_t$ = group quarters population at time $t$ (including the homeless population).
This is an identity, not an estimate. If these three components were known exactly, the total population would also be known. The problem, of course, is that these components are almost never known exactly. They must rather be estimated from various data sources, using one or more of several possible techniques. In this section we provide a brief description of the data and techniques used to estimate these three components for counties and subcounty areas in Florida. More detailed descriptions of the HU method can be found in Smith and Lewis (1980), Rives and Serow (1984), and Smith (1986).
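As a minimal illustration of the identity in equation (1), the following Python sketch computes total population from the three components. The function name and the input values are hypothetical and are not taken from the Florida data.

```python
def hu_population(households: float, pph: float, group_quarters: float) -> float:
    """Equation (1): population = households * PPH + group quarters population."""
    return households * pph + group_quarters


# Hypothetical component values for a single place (illustration only).
population = hu_population(households=10_000, pph=2.5, group_quarters=600)
print(population)
```

In practice each argument is itself an estimate, so any error in a component carries through to the total.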
Households. A number of different types of data can be used to estimate households, such as building permits, certificates of occupancy, electricity customers, telephone customers, property tax records, and aerial photographs. The most commonly used types of data are building permits and electricity customers (U.S. Bureau of the Census 1983), since they are widely available and correlate closely with population change. These are the data sources we use in Florida.
The housing inventory for a city or county can be estimated by adding building permits issued since the most recent census (net of demolitions) to the units counted in that census. Building permit data are available from the U.S. Department of Commerce, which collects them directly from cities and counties throughout the United States.[3] The time lag between issuance of permit and completion of unit is assumed to be three months for single family units and ten months for multifamily units; these assumptions are based on surveys of developers in Florida. For mobile home units, there is no time lag. Although building permit data are not available everywhere, it has been estimated that approximately 90 percent of new housing units in the United States are built in areas requiring building permits (Siskind 1980). In Florida, building permit data are available in 82 percent of the subcounty areas for which we produce population estimates; these areas contain 90 percent of the state's population.
Combining building permit data with housing data from the decennial census provides an estimate of the current housing stock. The next step in the process is to estimate the proportion of housing units occupied by permanent residents. The most effective way to determine current occupancy rates is to conduct a special census or sample survey. Given their high costs, however, such censuses or surveys are rarely conducted. A common procedure is simply to use the occupancy rates from the most recent census (U.S. Bureau of the Census 1983). This is the procedure we follow in Florida.
The product of the housing stock and the occupancy rate (preferably performed separately for each type of housing unit) gives an estimate of the number of households. There are several problems with this estimate. Time lags between issuance of permit and completion of unit may vary from place to place and from year to year. The proportion of permits resulting in completed units is generally unknown. Occupancy rates may be going up or down. Data for mobile homes may be non-existent or of poor quality. Certificate-of-occupancy data can eliminate problems of estimating time lags and completion rates, but not problems of estimating current occupancy rates, demolitions, or conversions from one use to another.
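The building permit calculation described above can be sketched roughly as follows. This is a simplified illustration, not the procedure actually used for Florida: the class and field names are hypothetical, the numbers are invented, and the permit counts are assumed to have already been adjusted for the lag between issuance and completion.

```python
from dataclasses import dataclass


@dataclass
class UnitType:
    census_units: int       # units counted in the most recent census
    permitted_units: int    # permits since the census, assumed already lag-adjusted
    demolitions: int        # units removed since the census
    occupancy_rate: float   # census share of units occupied by permanent residents


def households_from_permits(unit_types: dict) -> float:
    """Sum (housing stock * occupancy rate) over housing unit types."""
    total = 0.0
    for unit in unit_types.values():
        stock = unit.census_units + unit.permitted_units - unit.demolitions
        total += stock * unit.occupancy_rate
    return total


# Hypothetical inputs, kept separate by type of housing unit.
example = {
    "single_family": UnitType(1000, 200, 10, 0.90),
    "multifamily": UnitType(500, 100, 0, 0.80),
    "mobile_home": UnitType(200, 50, 5, 0.80),
}
print(round(households_from_permits(example)))
```

Keeping the calculation separate for each unit type reflects the preference stated above for applying occupancy rates by type of housing unit.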
Our second source of data avoids some of those problems. Active residential electricity customer data are available for all cities and counties in Florida and are often of better quality than building permit data. More important, households can be estimated directly from electricity customer data, avoiding the intermediate steps of estimating time lags, completion rates, demolitions, conversions, and occupancy rates. A number of studies have concluded that household estimates based on electricity customer data are generally more accurate than those based on building permit data (e.g., Starsinic and Zitter 1968; Smith and Lewis 1980, 1983; Rives and Serow 1984). We collect electricity customer data from 54 electric power companies in Florida; the five largest companies serve 81 percent of the state's population.
There are several ways to estimate the number of households from active residential electricity customer data. One uses the net change in customers as a measure of the net change in households (Starsinic and Zitter 1968). However, a number of factors may prevent a perfect one-to-one relationship between permanent households and residential electricity customers: housing units occupied by seasonal and other non-permanent residents; master meters serving more than one household; separate meters for pumps, barns, and other non-housing uses; geographic boundaries for utility companies that do not correspond exactly to those used by the Census Bureau; and the bookkeeping practices of individual utility companies. These differences can be accounted for by forming a ratio of the number of households counted in the most recent census to the number of customers reported for the same date, and applying this ratio to the current number of customers. This approach has been found to produce more accurate household estimates than the first approach does (Smith and Lewis 1980, 1983). The ratio approach is the one we follow in Florida.[4]
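The two electricity customer approaches described above can be compared in a short Python sketch. The function names and numbers are hypothetical; the sketch only illustrates the arithmetic of the net-change and ratio approaches.

```python
def households_net_change(census_households: float,
                          census_customers: float,
                          current_customers: float) -> float:
    """Net-change approach: add the change in customers to census households."""
    return census_households + (current_customers - census_customers)


def households_ratio(census_households: float,
                     census_customers: float,
                     current_customers: float) -> float:
    """Ratio approach: apply the census household/customer ratio to current customers."""
    ratio = census_households / census_customers
    return ratio * current_customers


# Hypothetical place with more customers than households at the census date
# (for example, because of seasonal units or separate meters).
print(households_net_change(9_000, 10_000, 12_000))
print(round(households_ratio(9_000, 10_000, 12_000)))
```

When the census ratio differs from one, the two approaches diverge as the number of customers changes, because the ratio approach scales the change in customers by the household/customer ratio.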
Our final estimates of households are not based on the same data sources and techniques for all places, however. Rather, we use our professional judgment to decide which sources and techniques are likely to be most reliable for each individual place. In a majority of places we use only electricity customer data, but we occasionally adjust the household/customer ratio to account for evidence of changes in seasonal populations (e.g., shifts in the composition of the housing stock; seasonal fluctuations in the number of active residential electricity customers). When electricity customer data are of dubious quality and building permit data appear to be good, we use only building permit data. When the data sources differ substantially and it is not clear which is better, we average the two. Our choices of data and techniques are determined primarily by the consistency of the data series over time, the presence (or absence) of gaps in the data series, and the availability of additional evidence about data quality or demographic trends. We believe the application of professional judgment provides better household estimates than does the mechanical application of the same data and techniques for all places. The next section offers some evidence supporting this belief.
Persons per household. The second component of the HU method is the average number of persons per household (PPH). Although trends nationally and in Florida have been toward steadily smaller PPH, trends for local areas vary considerably from one place to another. Between 1980 and 1990, PPH declined in all but two of Florida's 67 counties, with declines ranging from 0.9 percent to 11.4 percent. Values of PPH for Florida counties in 1990 ranged from 2.18 to 3.00. Variation in PPH levels and changes over time are even greater for cities than for counties.
To estimate PPH for cities and counties, we developed a formula that combines the local PPH calculated in the most recent census, the national change in PPH since that census (as measured by the Current Population Survey), and the local change in the mix of housing units (single family, multifamily, mobile home) since the most recent census. We base local changes in PPH on national changes, but adjust them up or down depending on whether the initial PPH was higher or lower locally than nationally; on the average, declines are greater when initial levels are higher.[5] We further adjust the estimates to account for changes in the local mix of housing units and the PPH for each type of unit calculated in the most recent census. (Multifamily units typically have lower PPH than do single family units.)[6] This formula is described more fully in Smith and Lewis (1980). Again, we make some adjustments to the formula's estimates according to our professional judgment about factors affecting PPH (e.g., increases in the Hispanic population, which has a relatively large PPH). PPH could also be estimated by extrapolating past trends or holding values constant at levels found in the most recent census (e.g., Starsinic and Zitter 1968). The formula described above, however, has been found to produce more accurate estimates of PPH than either of these alternatives (e.g., Smith and Lewis 1980, 1983). We test several alternative estimation techniques for PPH in the next section.
Group quarters population. Population in households is estimated by multiplying the number of households by the PPH. Population in households accounted for 97.3 percent of total population in the United States in 1990 (97.6 percent in Florida). To obtain an estimate of total population, persons living in group quarters or without traditional housing must also be estimated. We do this in three steps. The first is to collect group quarters data from prisons, colleges, military bases, and long-term health care facilities, for the same date as in the most recent census. The second step is to subtract these numbers from the total non-household population counted in that census, and then to form a ratio of the residual to population in households; we call this ratio the GQ multiplier. In the third step, the current group quarters population is estimated by applying the GQ multiplier to the current estimate of the household population, and adding a direct count of the current number of persons residing in prisons, college dormitories, military barracks, and long-term health care facilities.[7]
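The three-step group quarters procedure can be sketched as follows. The function and variable names are hypothetical and the numbers are invented; the sketch simply follows the steps as described in the text.

```python
def gq_multiplier(census_nonhousehold_pop: float,
                  census_facility_count: float,
                  census_household_pop: float) -> float:
    """Step 2: ratio of the residual non-household population to household population."""
    residual = census_nonhousehold_pop - census_facility_count
    return residual / census_household_pop


def estimate_group_quarters(multiplier: float,
                            current_household_pop: float,
                            current_facility_count: float) -> float:
    """Step 3: multiplier * current household population + current direct count."""
    return multiplier * current_household_pop + current_facility_count


# Step 1 (assumed done): facility counts collected for the census date and for today.
m = gq_multiplier(census_nonhousehold_pop=1_500,
                  census_facility_count=1_200,
                  census_household_pop=30_000)
gq = estimate_group_quarters(m, current_household_pop=35_000,
                             current_facility_count=1_400)
print(m, round(gq))
```

The direct count covers the facilities that can be surveyed, while the multiplier carries the remaining non-household population forward in proportion to the household population.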
Evaluating Accuracy and Bias
The obvious question to ask of any estimation methodology is "How accurate are the estimates?" We provide an answer to this question by comparing April 1, 1990 population estimates with April 1, 1990 census counts for counties and subcounty areas in Florida. This comparison does not provide a perfect measure of accuracy and bias, because census counts themselves are subject to error. Differences between estimates and census counts may therefore reflect errors in the decennial census as well as errors in the estimates. The decennial census is believed to be quite accurate for most places, however, and provides a widely used standard for evaluating population estimates. We refer to differences between estimates and census counts as estimation errors, but the reader is cautioned that they may have been caused by enumeration error as well as by estimation error.
Five measures of accuracy and bias are used. Mean absolute percent error (MAPE) is the average error when the direction of error is ignored. The proportions of errors less than 5 percent and greater than 10 percent indicate the frequency of relatively small and relatively large errors, respectively. These are measures of accuracy, or how close estimates were to census counts, regardless of whether the estimates were high or low. Mean algebraic percent error (MALPE) is the average error when the direction of error is included. This is a measure of bias: a positive error indicates a tendency to overestimate, a negative error indicates a tendency to underestimate. Since a few extreme errors in one direction can change the sign of MALPE, the proportion of estimates that were above the census count (%POS) is used as another measure of bias.
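The five measures can be computed as in the following Python sketch. It assumes that each percent error is calculated as the estimate minus the census count, divided by the census count, which is consistent with positive errors indicating overestimates; the function names and the sample values are hypothetical.

```python
def percent_errors(estimates, census_counts):
    """Signed percent error for each place: 100 * (estimate - census) / census."""
    return [100.0 * (e - c) / c for e, c in zip(estimates, census_counts)]


def summarize(errors):
    """Return the five accuracy and bias measures for a list of percent errors."""
    n = len(errors)
    abs_errors = [abs(x) for x in errors]
    return {
        "MAPE": sum(abs_errors) / n,                          # accuracy
        "pct_lt_5": 100.0 * sum(a < 5 for a in abs_errors) / n,
        "pct_gt_10": 100.0 * sum(a > 10 for a in abs_errors) / n,
        "MALPE": sum(errors) / n,                             # bias
        "%POS": 100.0 * sum(x > 0 for x in errors) / n,       # share above the census count
    }


# Hypothetical estimates and census counts for four places.
errors = percent_errors([1020, 950, 2300, 480], [1000, 1000, 2000, 500])
print(summarize(errors))
```

In this invented example the single large positive error makes MALPE positive even though only half of the estimates are above the census count, which is the situation %POS is meant to reveal.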