Chapter 3-12. Standardization
< This chapter uses too many examples out of Rothman (2002), done that way to quickly prepare a lecture while teaching out of the Rothman text. It needs to be updated with more of my own examples. >
In Chapter 3-11, we used pooling to combine stratum-specific estimates of effect measures (such as risk ratio) into a single summary effect measure. The summary effect measure was basically a weighted average of the stratum-specific estimates.
Another approach is standardization, which is a method of combining stratum-specific risks (cases/N) or rates (cases/PT) into a single summary value by taking a weighted average of them.
Standardization weights the stratum-specific rates using weights taken from a standard population, in contrast to pooling, which weights each stratum by the amount of information it contains.
Suppose we choose to use the U.S. population in the year 2000 as our standard. We would then weight our age-specific rates with weights that reflect the age distribution of the U.S. population in the year 2000. Our summary rate would then be the rate that we would expect if our population had the same age distribution as the U.S. population in year 2000.
______
Source: Stoddard GJ. Biostatistics and Epidemiology Using Stata: A Course Manual [unpublished manuscript] University of Utah School of Medicine, 2010.
Example Sweden and Panama (Rothman, 2002, pp.1-2)
It would seem residents of Sweden, where the standard of living is generally high, should have lower death rates than residents of Panama, where poverty and more limited health care take their toll. However, a greater proportion of Swedish residents than Panamanian residents die each year.
The reason for this unexpected result is confounding due to differing age distributions of the populations of these two countries, with Panama having a younger population.
[Figure: Population pyramids for Panama (2000) and Sweden (2000), showing the male and female population (in thousands) in the age groups 0-29, 30-59, and 60+.]
In both countries, older people die at a greater rate than younger people.
However, because Sweden has a population that is on the average older than that of Panama, a greater proportion of all Swedes die in a given year, despite the lower death rates within specific age categories.
If we standardized the Panama rate and the Sweden rate to match a single population (such as the U.S. population), our rate estimates would then be directly comparable (not confounded by age), since the rates would be based on the same age distribution.
Two advantages to this approach over pooling are:
1) It might be interesting to know the standardized rate, itself, for each country rather than just the rate ratio. For example, it might be interesting to see a graph of 10 different countries, all standardized to the same age distribution.
2) Pooling requires approximately equal stratum-specific rate ratios (homogeneous effect), whereas standardization does not. For example, the relative effects might be very different in newborns, young adults, and older adults for these two countries, so that the pooled estimate may not be appropriate.
We will return to this example below and standardize the rates using Stata.
Simple Examples of Standardization
Using Rothman’s example (2002, p. 158), suppose we have rates of:
Males: 10/1000 person years
Females: 5/1000 person years
We can standardize these sex-specific rates to any standard that we wish.
For example, we might simply choose to weight males and females equally. We would then obtain a weighted average of the two rates that equals their simple average:
standardized rate = weight_males × Rate_males + weight_females × Rate_females
= 0.5 × 10/1000py + 0.5 × 5/1000py
= (5 + 2.5)/1000py
= 7.5/1000 person years
Suppose the rates reflected the disease experience of nurses, 95% of whom are female. In that case, we might wish to use as our standard a weight of 5% for males and 95% for females:
standardized rate = 0.05 × 10/1000py + 0.95 × 5/1000py
= (0.5 + 4.75)/1000py
= 5.25/1000 person years
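Both weighted averages can be checked in Stata. The following is a minimal sketch; the scalar names (rate_m, rate_f, std_equal, std_nurse) are chosen here for illustration and do not come from Rothman.

```stata
* Sex-specific rates, expressed per 1,000 person-years
scalar rate_m = 10
scalar rate_f = 5

* Standard 1: equal weights (0.5 each, so that the weights sum to 1)
scalar std_equal = 0.5*rate_m + 0.5*rate_f

* Standard 2: nursing population (5% male, 95% female)
scalar std_nurse = 0.05*rate_m + 0.95*rate_f

display "Equal weights: " std_equal " per 1,000 person-years"
display "Nurse weights: " std_nurse " per 1,000 person-years"
```

Changing only the weights changes the standardized rate, which is why two standardized rates are comparable only when they use the same standard.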
To compare rates for exposed and unexposed people, we standardize both to the same standard and then compare them.
An advantage to standardization is that it uses a defined set of weights (which are independent of the data). Thus, other investigators can standardize using the same weights and then directly compare their stratified results to yours.
Example: Mortality among current and past clozapine users, by age

                                  Age 10-54 years        Age 55-94 years
                                  Current      Past      Current      Past
Deaths                                196       111          167       157
Person-years                       62,119    15,763        6,085     2,780
Rate (per 100,000 person-years)     315.5     704.2         2744      5647
Rate difference
  (per 100,000 person-years)            -388.7                 -2903
Rate ratio                               0.45                   0.49
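The rates, rate differences, and rate ratios in the table above can be reproduced from the deaths and person-years. The sketch below types the four cells in by hand; the variable names (agegrp, deaths_cur, py_cur, and so on) are hypothetical and are not part of any course dataset.

```stata
clear
input str5 agegrp deaths_cur py_cur deaths_past py_past
"10-54" 196 62119 111 15763
"55-94" 167  6085 157  2780
end

* Mortality rates per 100,000 person-years
generate rate_cur  = 100000 * deaths_cur  / py_cur
generate rate_past = 100000 * deaths_past / py_past

* Stratum-specific effect measures (current vs. past use)
generate rate_diff  = rate_cur - rate_past
generate rate_ratio = rate_cur / rate_past

format rate_cur rate_past rate_diff %9.1f
format rate_ratio %9.2f
list agegrp rate_cur rate_past rate_diff rate_ratio, noobs
```

The listed values should agree with the table up to rounding, since the table reports the rates to four significant figures.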
Using the Mantel-Haenszel pooling approach, the mortality rate difference is -720/100,000 person years and the mortality rate ratio is 0.47.
Let’s now standardize the rates for age over the two age strata. We will standardize to the age distribution of current clozapine use in the study, since that is the age distribution of those who use the drug.
current clozapine use
Age 10-54 years: 62,119 py ( 91.1%)
Age 55-94 years: 6,085 py ( 8.9%)
Total: 68,204 py (100.0%)
Standardizing,
current clozapine use:
standardized mortality rate = 0.911 × 315.5/100,000 py + 0.089 × 2744/100,000 py
= 532.2/100,000 py
past clozapine use:
standardized mortality rate = 0.911 × 704.2/100,000 py + 0.089 × 5647/100,000 py
= 1144/100,000 py
Combining these into standardized effect measures:
standardized rate difference = (532.2 - 1144)/100,000 py = -612/100,000 py
which is somewhat smaller in magnitude than the pooled estimate (-720/100,000 py), and
standardized rate ratio = 532.2/1144 = 0.47
which is identical to the pooled estimate to two decimal places.
The stratum-specific rate ratios were very similar, so any weighting, whether pooled or standardized, would give a result close to this value.
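The standardization above can be repeated in Stata using the person-years of current users directly as the weights. This is a sketch only, with scalar names invented for illustration. Because it carries unrounded weights and rates, its results may differ in the last reported digit from the hand calculation, which used weights rounded to three decimals.

```stata
* Person-years of current users by age stratum (the standard)
scalar T1 = 62119
scalar T2 = 6085

* Age-specific mortality rates (deaths per person-year)
scalar cur1  = 196/62119
scalar cur2  = 167/6085
scalar past1 = 111/15763
scalar past2 = 157/2780

* Weighted averages, using the standard's person-time as weights
scalar std_cur  = (T1*cur1  + T2*cur2 ) / (T1 + T2)
scalar std_past = (T1*past1 + T2*past2) / (T1 + T2)

display "Standardized rate, current use: " %7.1f (100000*std_cur)  " per 100,000 py"
display "Standardized rate, past use:    " %7.1f (100000*std_past) " per 100,000 py"
display "Standardized rate difference:   " %7.1f (100000*(std_cur - std_past)) " per 100,000 py"
display "Standardized rate ratio:        " %7.3f (std_cur/std_past)
```

Note that the standardized rate for current users equals their crude rate, because current users are their own standard.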
Standardized Mortality Ratio (SMR)
When the standardized rate ratio is calculated using the exposed group as the standard, the result is usually referred to as a standardized mortality ratio, or standardized morbidity ratio (Rothman, 2002, p.161).
Thus, we computed an SMR in the preceding example.
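An equivalent way to view the SMR is as observed deaths divided by expected deaths, where the expected deaths are what the exposed group would have experienced at the unexposed group's stratum-specific rates. The sketch below applies this to the clozapine data; the scalar names are illustrative only.

```stata
* Observed deaths among current clozapine users (both age strata)
scalar observed = 196 + 167

* Expected deaths if current users had experienced the age-specific
* rates of past users: current-use person-years x past-use rates
scalar expected = 62119*(111/15763) + 6085*(157/2780)

scalar smr = observed/expected
display "Observed deaths: " observed
display "Expected deaths: " %7.1f expected
display "SMR:             " %5.3f smr
```

The result is the same quantity as the standardized rate ratio with current users as the standard, since the total person-years cancel from numerator and denominator.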
Direct Standardization
Rothman and Greenland (1998, pp.45-46) give the following formula for direct standardization.
Let $T_1, T_2, \ldots, T_k$ be the person-years in k strata (e.g., age-sex categories) in some selected standard population. The T’s are called the standard distribution on which the standardized rate is based.
Let $I_1, I_2, \ldots, I_k$ be the stratum-specific incidence rates computed from your data. Then the standardized rate is given by
$$I_s = \frac{\sum_{i=1}^{k} T_i I_i}{\sum_{i=1}^{k} T_i}$$
The numerator is the number of cases one would see in a population that had the person-time distribution $T_1, T_2, \ldots, T_k$ and the stratum-specific rates $I_1, I_2, \ldots, I_k$. The denominator is the total person-time in such a population. Therefore, the standardized rate, $I_s$, is the rate one would see in a population with person-time distribution $T_1, T_2, \ldots, T_k$ and stratum-specific rates $I_1, I_2, \ldots, I_k$.
The standardization process can be conducted with incidence proportions or prevalence proportions, as well.
Let $N_1, N_2, \ldots, N_k$ be the number of persons in k strata. Let $R_1, R_2, \ldots, R_k$ be the stratum-specific incidence proportions (or prevalence proportions). Then the standardized risk, or standardized prevalence, is given by
$$R_s = \frac{\sum_{i=1}^{k} N_i R_i}{\sum_{i=1}^{k} N_i}$$
These are the formulas used by Stata’s direct standardization command dstdize.
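As a check on the standardized rate formula, the nursing example from earlier in the chapter can be written in this form. The standard person-time values used here ($T_1 = 5$ for males and $T_2 = 95$ for females) are hypothetical; only their proportions, 5% and 95%, matter.

```latex
I_s = \frac{T_1 I_1 + T_2 I_2}{T_1 + T_2}
    = \frac{5 \times \frac{10}{1000} + 95 \times \frac{5}{1000}}{5 + 95}
    = \frac{0.05 + 0.475}{100}
    = 0.00525
    = 5.25 \text{ per } 1000 \text{ person-years}
```

This reproduces the 5.25/1000 person years obtained with the weights 0.05 and 0.95, showing that dividing by the total of the T’s simply rescales the weights to sum to 1.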
Exercise (direct standardization)
Returning to the Sweden and Panama example, reading the data in,
File > Open
Find the directory where you copied the course CD
Change to the subdirectory datasets & do-files
Single click on panswedmortality.dta
Open
use "C:\Documents and Settings\u0032770.SRVR\Desktop\
Biostats & Epi With Stata\datasets & do-files\
panswedmortality.dta", clear
* which must be all on one line, or use:
cd "C:\Documents and Settings\u0032770.SRVR\Desktop\"
cd "Biostats & Epi With Stata\datasets & do-files"
use panswedmortality.dta, clear
Listing the data,
Data > Describe data > List data
Main tab: Variables: (leave empty for all variables): < leave empty >
Override minimum abbreviation of variable names: 15
Options tab: Table options: Draw divider lines between columns
Separators: When these variables change: nation
OK
list, abbreviate(15) divider sepby(nation)
     +---------------------------------------------+
     | nation | age_category | population | deaths |
     |--------+--------------+------------+--------|
  1. | Sweden |       0 - 29 |    3145000 |  3,523 |
  2. | Sweden |      30 - 59 |    3057000 | 10,928 |
  3. | Sweden |          60+ |    1294000 | 59,104 |
     |--------+--------------+------------+--------|
  4. | Panama |       0 - 29 |    741,000 |  3,904 |
  5. | Panama |      30 - 59 |    275,000 |  1,421 |
  6. | Panama |          60+ |     59,000 |  2,456 |
     +---------------------------------------------+
We see that this file contains the variables for computing the age-specific incidence proportions, or mortality proportions.
We will use the following standard population:
File > Open
Find the directory where you copied the course CD:
Find the subdirectory datasets & do-files
Single click on panswedstdpop.dta
Open
use "C:\Documents and Settings\u0032770.SRVR\Desktop\
Biostats & Epi With Stata\datasets & do-files\
panswedstdpop.dta", clear
* which must be all on one line, or use:
cd "C:\Documents and Settings\u0032770.SRVR\Desktop\"
cd "Biostats & Epi With Stata\datasets & do-files"
use panswedstdpop.dta, clear
Double clicking on the last list command in the Review Window, and changing it to:
list, abbreviate(15) divider
     +---------------------------+
     | age_category | population |
     |--------------+------------|
  1. |       0 - 29 |        .35 |
  2. |      30 - 59 |        .35 |
  3. |          60+ |         .3 |
     +---------------------------+
we see that this is a file with the proportion of the population that will be used for each age stratum (the same for both countries).
When you wish to use a reference population that is different from any group in your incidence or mortality data file, Stata requires:
1) the standard population to be saved in a separate Stata-formatted data file (.dta file extension),
2) for this file to have the identical strata as the risk data, and
3) for the morbidity or mortality data to be the current file in Stata memory.
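If a standard population file does not already exist, one can be typed in and saved. The following is a sketch under stated assumptions: the file name mystdpop is hypothetical, and it assumes the strata variable in the mortality file is a numeric code (1, 2, 3) carrying a value label. Check the actual storage type with describe and match it, since the strata variable must have the same name and coding in both files.

```stata
* Build a standard population with one row per age stratum
clear
input byte age_category double population
1 0.35
2 0.35
3 0.30
end

* Assumed value labels; they must match the coding of the mortality file
label define agelbl 1 "0 - 29" 2 "30 - 59" 3 "60+"
label values age_category agelbl

* Save as a separate .dta file in the current working directory
save mystdpop, replace
```

The population variable here holds proportions that sum to 1, as in panswedstdpop.dta, but counts from a real standard population would serve the same purpose because only the relative sizes of the strata affect the weights.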
Bringing the mortality data back in:
File > Open
Find the directory where you copied the course CD
Change to the subdirectory datasets & do-files
Single click on panswedmortality.dta
Open
use "C:\Documents and Settings\u0032770.SRVR\Desktop\
Biostats & Epi With Stata\datasets & do-files\
panswedmortality.dta", clear
* which must be all on one line, or use:
cd "C:\Documents and Settings\u0032770.SRVR\Desktop\"
cd "Biostats & Epi With Stata\datasets & do-files"
use panswedmortality.dta, clear
To obtain the direct standardized rates, we use
Statistics > Epidemiology and related > Other > Direct standardization
Main tab: Characteristic variable: deaths
Population variable: population
Strata variable: age_category
Group variables: nation
Use standard population from Stata dataset: panswedstdpop
OK
dstdize deaths population age_category, by(nation) using(panswedstdpop)
---------------------------------------------------------
-> nation= Panama
              -----Unadjusted-----           Std.
                              Pop. Stratum   Pop.
  Stratum      Pop.   Cases  Dist. Rate[s] Dst[P]     s*P
---------------------------------------------------------
   0 - 29    741000    3904  0.689  0.0053  0.350  0.0018
  30 - 59    275000    1421  0.256  0.0052  0.350  0.0018
      60+     59000    2456  0.055  0.0416  0.300  0.0125
---------------------------------------------------------
  Totals:   1075000    7781   Adjusted Cases:     17351.2
                                  Crude Rate:      0.0072
                               Adjusted Rate:      0.0161
                     95% Conf. Interval: [0.0156, 0.0166]
---------------------------------------------------------
-> nation= Sweden
              -----Unadjusted-----           Std.
                              Pop. Stratum   Pop.
  Stratum      Pop.   Cases  Dist. Rate[s] Dst[P]     s*P
---------------------------------------------------------
   0 - 29   3145000    3523  0.420  0.0011  0.350  0.0004
  30 - 59   3057000   10928  0.408  0.0036  0.350  0.0013
      60+   1294000   59104  0.173  0.0457  0.300  0.0137
---------------------------------------------------------
  Totals:   7496000   73555   Adjusted Cases:    115032.5
                                  Crude Rate:      0.0098
                               Adjusted Rate:      0.0153
                     95% Conf. Interval: [0.0152, 0.0155]
Summary of Study Populations:
  nation          N      Crude   Adj_Rate      Confidence Interval
------------------------------------------------------------------
  Panama    1075000   0.007238   0.016141   [ 0.015645, 0.016637]
  Sweden    7496000   0.009813   0.015346   [ 0.015235, 0.015457]
Notice that the standardized risks are given for each country, but there is no standardized risk ratio or standardized risk difference reported by Stata. These have to be computed manually, as
standardized risk ratio:
display 0.016141/0.015346
1.051805
standardized risk difference:
display 0.016141 - 0.015346
.000795
These two formulas are given in Rothman and Greenland (1998, p.63). Given two standardized rates, $I_{s1}$ and $I_{s0}$, both computed using the same standard distribution, the standardized rate ratio (SRR) and standardized rate difference (SRD) are given by
$$\text{SRR} = \frac{I_{s1}}{I_{s0}}, \qquad \text{SRD} = I_{s1} - I_{s0}$$
The same formulas apply for computing the standardized risk ratio and the standardized prevalence ratio and for computing the standardized risk difference and the standardized prevalence difference.
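To see where the adjusted rates reported by dstdize come from, the weighted sums can be rebuilt by hand from the listed data. The sketch below types the six rows in directly, so it does not depend on the course files; the variable names agegrp, stdwt, sP, and std_risk are illustrative. Because it keeps full precision, the ratio may differ in the later decimals from the value obtained with the six-decimal rounded rates.

```stata
clear
input str6 nation byte agegrp long population long deaths double stdwt
"Sweden" 1 3145000  3523 0.35
"Sweden" 2 3057000 10928 0.35
"Sweden" 3 1294000 59104 0.30
"Panama" 1  741000  3904 0.35
"Panama" 2  275000  1421 0.35
"Panama" 3   59000  2456 0.30
end

* Stratum-specific risk times the standard weight (the s*P column)
generate double sP = stdwt * deaths / population

* Sum over age strata within each country: the standardized risk.
* The weights sum to 1, so no further division is needed.
collapse (sum) std_risk = sP, by(nation)
format std_risk %9.6f
list, noobs

* collapse leaves the data sorted by nation: Panama first, then Sweden
display "Standardized risk ratio:      " %8.6f (std_risk[1]/std_risk[2])
display "Standardized risk difference: " %8.6f (std_risk[1] - std_risk[2])
```

The two listed values of std_risk should match the Adj_Rate column of the dstdize summary to six decimals.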
The confidence intervals for these standardized effect measures cannot be obtained simply by forming ratios and differences of the confidence limits of the individual standardized measures. Rothman and Greenland (1998, p.263) present formulas for the confidence intervals. These are not available in Stata for direct standardization with the dstdize command.
Example: Look at the article by Van Den Eden et al. (2003). This is a paper that reports results using direct standardization.
1) Look at the Statistical Methods section. You should now be able to understand it. Notice that they cite the US Census website. The website provides US population data so that researchers around the world can standardize to a common population distribution.
2) Notice how standardization allowed them to compare rates across race/ethnic groups in Table 3, and across studies/countries in Table 4.
Indirect Standardization
The Stata command for indirect standardization is istdize.
The following description and formula for indirect standardization were taken from the Stata reference manual under the dstdize command (StataCorp, 2003, Reference A-F, p.295):
“Standardization of rates can be performed via the indirect method whenever the stratum-specific rates are either unknown or unreliable. If the stratum-specific rates are known, the direct standardization method is preferred.
In order to apply the indirect method, the following must be available:
1. The observed number of cases in each population to be standardized, O. For example, if death rates in two states are being standardized using the US data rate for the same time period, then you must know the total number of deaths in each state.
2. The distribution across the various strata for the population being studied, n1,…,nk. If you are standardizing the death rate in the two states adjusting for age, then you must know the number of individuals in each of the k age groups.