Title: Low Cancer Incidence, Mortality, and Prevalence at Very Old Ages

Authors: Charles Harding, Francesco Pompei, and Richard Wilson*

*Corresponding author:

Prof. Richard Wilson

Jefferson Labs J-255

Harvard University

17 Oxford Street

Cambridge MA 02138

(617) 495-3387

Institution (for all authors): Department of Physics, Harvard University, Cambridge MA 02138

Running Title: Low Cancer Rates at Old Age

Keywords: aging, carcinogenesis, incidence, mortality, prevalence

Submission to: International Journal of Cancer

Article Category: Research article in epidemiology.

Outside support: There was no grant support for this project.

Conflicts of interest: None of the authors has a conflict of interest.

ABSTRACT

Despite some evidence to the contrary, it is still commonly believed that most cancer rates rise with age throughout adulthood. Here we show that, in a population comprising 9.5% of the United States, the age-specific incidence and mortality of most cancers decrease or plateau at very old ages. We also find that, in most cases, age-specific prevalence decreases swiftly at ages above 90. Our results are based on the Surveillance Epidemiology and End Results cancer registry 9 and the US Census 2000. We examine the reliability of census population figures for the oldest old, which have been questioned. Focusing on years 1998-2002, we study twenty-three cancer sites in men and twenty-four in women. Where we have statistical power at old ages, it appears that incidence rates normally peak between 75 and 90 years old, dropping abruptly afterward. Rates often trend toward zero among centenarians, who may be asymptomatic or insusceptible. When we pool all cancer sites together, the same pattern is found. We discuss the relevance of old age cancer rates to carcinogenesis and its modeling. Towards this end, we compare our results with epidemiological, biological, and mathematical modeling research. Our results are consistent with autopsy studies, survival studies, previous cancer rate results, the biology of cellular and organismal senescence, and the possibility that different population groups may have different susceptibility to cancers (heterogeneous susceptibility). While it has been suggested that heterogeneous susceptibility explains low cancer rates at old ages, we do not find support for this hypothesis.

Introduction

Although there is a common belief that the age-specific incidence of most cancers rises throughout adulthood, many papers (1-4) find a decrease that begins among octogenarians. Thilly et al. report a similar drop in the US cancer mortality rate among the elderly (5). Age-specific incidence and mortality have long been influential in the mathematical modeling of carcinogenesis. However, little effort has been invested in comparing these rates at the oldest ages. We do so, using records of the Surveillance, Epidemiology, and End Results (SEER) cancer registries to calculate age-specific incidence and mortality in the same large population, comprising 9.5% of the United States. Looking at a wide variety of cancer sites (24 in women and 23 in men) helps us determine whether the old-age decline in cancer rates is broadly characteristic of carcinogenesis and, based on this, assess its impact on cancer modeling.

Little previous work has considered cancer prevalence among the oldest old. Therefore, we include age-specific prevalence results alongside incidence and mortality. Although they are generally similar, no one of these measures can be derived purely from the other two without relying on substantial assumptions. Detailed registry data is necessary, for example, to determine how many people diagnosed with lung cancer eventually die of stomach cancer, and to include these people in both lung cancer incidence and stomach cancer mortality.

Because populations among the extremely elderly are quite small, the SEER database does not guarantee us statistical power. Therefore, it is especially important to consider sampling error in addition to systematic error in our data sources.

All the data obtained in this paper are available on the web page:

Materials and Methods

Patient data were taken from the Surveillance, Epidemiology, and End Results (SEER) 9 cancer registry records, which include all cancers diagnosed from 1975 on in Atlanta, Connecticut, Detroit, Hawaii, Iowa, New Mexico, San Francisco-Oakland, Seattle-Puget Sound, and Utah (6). During 1998-2002, SEER 9 covered 9.5% of the United States and 40% of the population in SEER 12, a cancer registry that was used in Harding et al. (4) to study incidence, but has not been operating long enough to investigate mortality or prevalence directly. SEER*Stat software (7) was used to count deaths (using survival time and cause of death) and diagnoses in 5-year age categories 0-4 to 110-114. SEER's standard recoding was used to classify cancers into common groupings by site (e.g., brain, colon & rectum, lung & bronchus).

Many patients in the SEER database have been diagnosed with multiple tumors. When we computed the cancer incidence results that are plotted in Figures 1 and 2, we counted only first malignant primary tumors. That is, for each patient we counted the first diagnosis of a malignant primary tumor and ignored all of that patient's other tumors. In contrast to Figures 1 and 2, the cancer incidence results of Supplementary Figures 1 and 2 include every malignant tumor in the database, regardless of whether it was the first primary. The interpretation of these two versions of incidence is explained in the Results.
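The first-primary rule can be illustrated with a minimal sketch. The record layout (`Tumor` and its fields) is hypothetical and does not reflect the SEER file format; ties within the same year are broken by age at diagnosis, which is an assumption made only for this illustration.

```python
from typing import Iterable, List, NamedTuple


class Tumor(NamedTuple):
    # Hypothetical record layout, for illustration only (not the SEER format).
    patient_id: str
    year: int        # year of diagnosis
    age: int         # age at diagnosis
    site: str
    malignant: bool


def first_malignant_primaries(tumors: Iterable[Tumor]) -> List[Tumor]:
    """Keep, for each patient, only the earliest malignant tumor."""
    first = {}
    for t in tumors:
        if not t.malignant:
            continue  # non-malignant tumors never count toward incidence here
        current = first.get(t.patient_id)
        if current is None or (t.year, t.age) < (current.year, current.age):
            first[t.patient_id] = t
    return list(first.values())


records = [
    Tumor("A", 2001, 82, "colon & rectum", True),
    Tumor("A", 1999, 80, "lung & bronchus", True),
    Tumor("B", 2000, 91, "brain", False),
    Tumor("B", 2002, 93, "stomach", True),
]
firsts = {t.patient_id: t for t in first_malignant_primaries(records)}
```

Counting every malignant tumor, as in the supplementary figures, would simply skip the per-patient selection and keep all records with `malignant` set.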

For incidence rates based on first malignant primary tumors only (Figures 1, 2), person-time at risk was adjusted for 20-year prevalence on a yearly basis by removing those previously diagnosed with any malignant primary from the at-risk pool. This is an improvement on the incidence methods in Harding et al. (4), and creates a better correspondence between incidence and hazard rate. Yearly mortality of each cancer type was calculated by checking the follow-up information on each diagnosis in SEER 9 for 1975-2003. If the patient has died, SEER follow-up information includes year and cause of death. As a consequence of SEER's limited history, patients who were most recently diagnosed with a primary tumor before 1975 cannot contribute to our analysis of cancer mortality or prevalence in the period 1998-2002. Therefore, our results are accurately termed 20-year limited-duration mortality and prevalence. When attempting to measure the burden of cancer on society by appealing to prevalence statistics, it is difficult to avoid an arbitrary cutoff date; the burden of cancer lessens for people who have survived longer. Although methods have been developed for estimating complete prevalence from limited-duration prevalence, they require modeling incidence with a function, in order to project incidence onto years for which we lack data (8). As we are investigating the inaccuracy of previous incidence models, we do not want to employ new incidence models, which could themselves be seriously flawed.
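As a rough illustration of the at-risk adjustment for first-primary incidence described above, the sketch below removes prevalent cases from the denominator for a single age group and year. The function name and the numbers are hypothetical and are not taken from SEER.

```python
def first_primary_rate(diagnoses, population, prevalent_cases, per=100_000):
    """First-primary incidence per `per` persons at risk for one year.

    Persons already living with a malignant primary (20-year prevalence)
    cannot receive a *first* diagnosis, so they are removed from the
    at-risk pool before dividing.
    """
    at_risk = population - prevalent_cases
    if at_risk <= 0:
        raise ValueError("at-risk population must be positive")
    return per * diagnoses / at_risk


# Hypothetical numbers: 50 first diagnoses among 10,000 persons,
# 2,000 of whom were previously diagnosed.
adjusted = first_primary_rate(50, 10_000, 2_000)
unadjusted = first_primary_rate(50, 10_000, 0)
```

Because prevalence is highest at old ages, the adjustment raises the computed rate most in the oldest groups.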

Let absolute prevalence denote the number of persons living with cancer and let absolute incidence denote the number of persons diagnosed with cancer. We developed a program to estimate 20-year absolute prevalence by the counting method (9). It is simplest to explain by example. We obtain absolute cancer prevalence in 2000 at age 80 by summing over all X<=20 the product of absolute incidence and X-year survival among those diagnosed in year 2000-X and at age 80-X. (Prevalence for other ages and years was calculated similarly.)
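The example above can be written as a short sketch. The dictionary layouts and the function name are assumptions made for illustration and do not describe the authors' program; X is taken to run from 0 to 20 inclusive, which is one reading of "X<=20".

```python
def counting_prevalence(incidence, survival, year, age, max_years=20):
    """Limited-duration absolute prevalence by the counting method.

    incidence[(diag_year, diag_age)]   -> number of persons diagnosed
    survival[(diag_year, diag_age, x)] -> probability of surviving x years
    Both layouts are hypothetical. Missing entries contribute nothing.
    """
    total = 0.0
    for x in range(max_years + 1):
        diag_year, diag_age = year - x, age - x
        if diag_age < 0:
            break  # cannot be diagnosed before birth
        cases = incidence.get((diag_year, diag_age), 0)
        total += cases * survival.get((diag_year, diag_age, x), 0.0)
    return total


# Hypothetical inputs: prevalence in 2000 at age 80 from two diagnosis cohorts.
inc = {(2000, 80): 10, (1999, 79): 20}
surv = {(2000, 80, 0): 0.9, (1999, 79, 1): 0.5}
prev_2000_age_80 = counting_prevalence(inc, surv, year=2000, age=80)
```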

Because we lump survival by year, we know the year a person died, but not the day or the month. However, when calculating prevalence, persons should only contribute for the days of the year that they survived. This is especially important at older ages and for more virulent cancers, where large changes in survival probability occur within small timescales. We approximate the true case by averaging the probability of surviving X and X+1 years, and use the result as our X-year survival figure, mentioned in the example above. Survival estimates were obtained by the Kaplan-Meier method, stratifying for sex, cancer type, year, and age. We did not stratify for other variables, such as stage and grade at diagnosis. Note that SEER*Stat provides an implementation of the counting method, but it is specific to younger age groups (under 85 and a single 85+ category) (7).
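The averaging step can be sketched as follows, assuming the Kaplan-Meier estimates are already available as a list of whole-year survival probabilities. The function name and the input values are illustrative only.

```python
def midyear_survival(whole_year_survival):
    """Average S(X) and S(X+1) to get the X-year figure used for prevalence.

    whole_year_survival[x] is the estimated probability of surviving
    x whole years after diagnosis (index 0 is 1.0 by definition).
    The result has one fewer entry than the input.
    """
    s = whole_year_survival
    return [(s[x] + s[x + 1]) / 2 for x in range(len(s) - 1)]


# Hypothetical survival curve for a virulent cancer at an old age.
adjusted = midyear_survival([1.0, 0.6, 0.4])
```

For a cancer with steep early mortality, the adjusted 0-year figure (0.8 in this hypothetical case) is well below 1, which prevents recent diagnoses from being counted as if every patient survived the whole calendar year.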

We can also estimate prevalence in any given year by directly tabulating persons in SEER who have not died beforehand. Unlike the counting method, this does not account for persons lost to follow-up. For all the cancers we study using the counting method, we also estimated prevalence by direct tabulation. In both cases, prevalence declines sharply at old ages. Because it accounts for those lost to follow-up, prevalence obtained via the counting method is generally higher, and is used for all results shown in this paper.

In our graphs we present prevalence per 100,000 for the year 2000. Data from other years 1998-2002 are only superficially different.

Person-time at risk is necessary for calculating incidence and mortality rates from counts of diagnoses and deaths. Population figures are necessary to determine prevalence proportions and person-time at risk. SEER provides population figures for individual ages 0-84 and a single age category 85+. We split the 85+ population figure in proportion to year 2000 US regional census data, ages 85-114 (10). Regions (states & metropolitan areas) were chosen to correspond with SEER's geographic coverage. We can compare the Census 2000 regional populations against SEER-provided figures for ages 0-84 and 85+. With the exception of some outlier cases for populations in their 50s, each age category 0-85+ shows less than 10% difference between SEER-provided 1998-2002 population figures and Census 2000 regional population figures.
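The proportional split of the 85+ figure can be sketched as below. The age labels and counts are hypothetical placeholders rather than Census 2000 values, and the function name is ours.

```python
def split_open_ended(seer_85_plus, census_counts):
    """Distribute a single 85+ population figure across finer age groups.

    Each group receives a share of `seer_85_plus` equal to its share of
    the census counts for the matching region.
    """
    census_total = sum(census_counts.values())
    if census_total <= 0:
        raise ValueError("census counts must sum to a positive number")
    return {group: seer_85_plus * n / census_total
            for group, n in census_counts.items()}


# Hypothetical census counts for one region.
census = {"85-89": 60, "90-94": 30, "95-99": 10}
split = split_open_ended(1000, census)
```

The split preserves the SEER total by construction, so any census overstatement at the oldest ages shifts population between the sub-groups rather than changing the 85+ total.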

Patient identity is not disclosed by SEER, only a coded patient ID number.

Error and Uncertainty

(a) Overstatement of old populations. There is strong evidence that very old populations were overstated in past US Censuses and may still be overstated as of the 2000 census (11,12). This overestimation should be noted more regularly, and its implications discussed more fully than has been done in papers on age-specific cancer rates, including Harding et al. (4), which uses data from censuses 1970-2000 for its results. While the problem of old-age population overestimation is not discussed in 1990 and 2000 census documentation files and errata (13-16), it is addressed in demographics research papers (11,17-19) and a few Census Bureau special reports (12). Andreev (17) has developed a new estimation procedure to check and correct counts of the very old. Applied to the year 2000 US population 90 and over, this method produces results that are highly consistent with the 2000 census (<1% discrepancy for females, 4% discrepancy for males). He further finds that the 1980 census overstated the population 90 and above by 7% for females and 10% for males. Error is larger for older age groups. Kestenbaum states that, as of 1992, most researchers believed population figures and mortality rates 85-89 and 90-94 to be reasonable when stratified by sex, but not race (18). However, he also reports that centenarian populations were significantly overstated in the 1990 and 2000 censuses (19).

To assess the impact of this uncertainty, we compare female cancer incidence in centenarians with female cancer incidence 85-99. Under the population figures that we derive from the year 2000 US Census, female centenarians have 63% of the incidence of women 80-84, with a Wald-approximated 90% confidence interval of (56%, 71%). However, under population figures derived from Kestenbaum's paper (19), this value becomes 100% (89%, 112%). For male cancer incidence, which peaks among 75-79 year-olds, the corresponding census-based value is 48% (39%, 58%) and the Kestenbaum-based value is 76% (65%, 93%). According to census figures, female cancer mortality among centenarians is 78% (69%, 89%) of the greatest female cancer mortality, which occurs in age range 95-99. Under figures derived from Kestenbaum, this value shifts to 121% (107%, 136%). For male cancer mortality, which peaks in age range 85-89, the respective values are 65% (53%, 79%) and 100% (81%, 122%). In all cases, 20-year prevalence declines with p<0.001.
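For readers who wish to reproduce comparisons of this kind, one common form of the Wald approximation for a ratio of two rates works on the log scale, treating each count as Poisson and person-time as fixed. Whether this is the exact variant used for the figures above is an assumption; the counts below are hypothetical.

```python
import math
from statistics import NormalDist


def rate_ratio_wald_ci(d1, pt1, d2, pt2, level=0.90):
    """Ratio of rate 1 to rate 2 with a log-scale Wald interval.

    d1, d2   : event counts (assumed Poisson, both > 0)
    pt1, pt2 : person-time at risk (treated as fixed)
    """
    ratio = (d1 / pt1) / (d2 / pt2)
    se_log = math.sqrt(1 / d1 + 1 / d2)          # standard error of log(ratio)
    z = NormalDist().inv_cdf(0.5 + level / 2)    # about 1.645 for 90%
    return ratio, ratio * math.exp(-z * se_log), ratio * math.exp(z * se_log)


# Hypothetical counts: 100 cases in 1,000 person-years versus
# 400 cases in 2,000 person-years.
ratio, lo, hi = rate_ratio_wald_ci(100, 1_000, 400, 2_000)
```

Replacing `pt1` with a smaller population estimate raises the ratio proportionally, which is how the census-based and Kestenbaum-based values can differ so much for the same case counts.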

However, as can be observed in the figures, the incidence rates of many cancers reach values significantly below their maxima in age-ranges where population figures are more certain, particularly before 95. Although many mortality rates also decline before 95, others peak several years after incidence, at ages where our results are more easily discounted.

When the 2010 census population figures are released, they should provide an interesting comparison.

(b) Age misclassification error. The cancer rate peak and decline (discussed in our results) may be an artifact if ages of elderly persons are widely understated at death/diagnosis. However, in a linkage study of early censuses (1900, 1910, 1920) and more recent genealogy databases, Gavrilova and Gavrilov (20) found 92% agreement in birth year among centenarians. Moreover, Hill et al. (21) estimate that among white Americans dying in 1985, age reported on death certificate is correct in over 90% of cases for each 5-year age group 85-89 to 105+. We have not found results concerning the accuracy of reported age at diagnosis, although it may be comparable.

(c) Understatement of cancers. Absolute incidence and mortality (counts of deaths and diagnoses) could be underestimated at old age because cancers may be under-reported in the elderly, and less effort is often invested in assigning a cause of death to elderly persons. However, in our results no relationship is observed between the reversal (peak and decline) in a cancer's incidence and the difficulty of diagnosing that cancer.

For incidence and mortality rates, 95% exact confidence intervals were calculated under the assumptions that deaths and diagnoses are Poisson-distributed, while person-time at risk is fixed. For prevalence proportions, 95% exact confidence intervals were also calculated assuming Poisson-distributed counts of persons with cancer and fixed populations in each age category. This provides conservative confidence intervals for prevalence estimates obtained via the counting method (9), explained previously.
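An exact Poisson interval of this kind can be sketched with the standard library alone by inverting the Poisson tail probabilities numerically. This is a minimal illustration under the stated assumptions (Poisson counts, fixed person-time), not the authors' code; the function names and the person-time figure are hypothetical.

```python
import math


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu)."""
    if k < 0:
        return 0.0
    if mu <= 0.0:
        return 1.0
    return sum(math.exp(i * math.log(mu) - mu - math.lgamma(i + 1))
               for i in range(k + 1))


def _bisect(f, lo, hi, iters=200):
    """Root of a monotone function f on [lo, hi] that changes sign there."""
    f_lo_positive = f(lo) > 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0) == f_lo_positive:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def exact_poisson_ci(count, alpha=0.05):
    """Exact two-sided interval for a Poisson mean given an observed count."""
    tail = alpha / 2
    if count == 0:
        lower = 0.0
    else:
        # Smallest mean for which observing >= count is not too unlikely.
        lower = _bisect(lambda mu: (1 - poisson_cdf(count - 1, mu)) - tail,
                        0.0, float(count))
    # Largest mean for which observing <= count is not too unlikely.
    upper_bracket = count + 10 * math.sqrt(count + 1) + 10
    upper = _bisect(lambda mu: poisson_cdf(count, mu) - tail,
                    float(count), upper_bracket)
    return lower, upper


def rate_ci(count, person_years, per=100_000, alpha=0.05):
    """Interval for a rate per `per` person-years, person-time held fixed."""
    lower, upper = exact_poisson_ci(count, alpha)
    return per * lower / person_years, per * upper / person_years


low10, high10 = exact_poisson_ci(10)
low0, high0 = exact_poisson_ci(0)
rate_low, rate_high = rate_ci(10, 5_000)  # hypothetical person-years
```

With small counts, as among centenarians, the interval is wide and asymmetric, and a count of zero still yields a nonzero upper bound.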

The beta model (22) was fit to incidence and mortality rates, here primarily to guide the eye. It has the form a t^(k-1) * (1-bt), where t is age in years, and a, b, and k are fitting parameters. Fitting was limited to ages 55-114. The beta model generally fits to either strongly increasing or decreasing rates, but not to rates that level off at very old ages.
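The functional form above can be evaluated directly. Setting its derivative to zero gives a peak at t = (k-1)/(kb) for k > 1 and b > 0, and the modeled rate reaches zero at t = 1/b. The parameter values in this sketch are hypothetical and are not fitted values from this study.

```python
def beta_model(t, a, b, k):
    """Beta model rate a * t**(k-1) * (1 - b*t) at age t in years."""
    return a * t ** (k - 1) * (1 - b * t)


def beta_model_peak_age(b, k):
    """Age at which the beta model is maximal (requires k > 1 and b > 0).

    From d/dt [t**(k-1) * (1 - b*t)] = t**(k-2) * ((k-1) - k*b*t) = 0.
    """
    return (k - 1) / (k * b)


# Hypothetical parameters chosen only to illustrate the shape.
a, b, k = 1e-9, 0.01, 6
peak = beta_model_peak_age(b, k)        # about 83.3 years
rate_at_zero_age = beta_model(1 / b, a, b, k)
```

Because the form is forced to zero at t = 1/b, it can follow a peak and decline but cannot represent a rate that levels off at a positive value, consistent with the limitation noted above.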

SEER does not provide enough data to stratify by (most) risks within each age group, for instance by history of cigarette smoking or asbestos exposure.

Results

Figure 1 shows age-specific incidence rates, incidence-based mortality rates, and 20-year limited duration prevalence proportions for 24 cancers in women within SEER 9 cancer registries, 1998-2002. These registries comprise 9.5% of the US. Incidence is based on records of the first malignant primary tumor diagnosed in each patient. Mortality and prevalence are calculated from registry follow-up information. Figure 2 presents similar results for 23 cancers in men.

As in Harding et al. (4), incidence rates of many cancers peak between 75 and 90 years old. However, in many other cases there is insufficient statistical power to distinguish between a rate plateauing or declining among the oldest old. Mortality closely mirrors incidence in the most virulent of cancers, such as pancreatic cancer. Generally, peak incidence and mortality coincide to within 5 years, but mortality sometimes peaks much later, as in the case of prostate cancer and all cancer sites combined. Results for all cancer sites combined are not a good indicator of site-specific behavior.

The age of a cancer’s maximum rate in men is not a good predictor of the age of maximum rate in women. For both incidence and mortality, the decline at old age is usually more significant for women than men, as expected from the larger female population at very old ages. Melanoma of the skin, thyroid, and female breast cancer mortality appear to rise throughout old age, perhaps because average survival is unusually long among those who ultimately die of these diseases (overall and in the case of those 75+) (23). Thyroid cancer mortality lacks statistical power in old age groups, and may plateau or decline instead of continuing to rise.

Cancers of the cervix uteri, corpus uteri, testes, thyroid and Hodgkin lymphoma clearly fail to match the usual incidence pattern. Excluding cancer of the cervix uteri and testes, each site has two common forms. One form is generally found among younger age groups and is less virulent. The second is found among older persons and is more virulent (7, 24). For example, corpus uteri adenocarcinomas are more commonly found in younger persons (7). Overall they comprise 91% of cases and have 88% 5-year survival (24). Corpus uteri sarcomas, generally found in older persons, comprise 8% of cases and have 53% 5-year survival (7, 24). For these sites, mortality rates do follow the normal pattern of old-age peak and decline; comparison with incidence separates out these cancer subtypes. Hodgkin lymphoma and thyroid results lack statistical power at the oldest ages.

While prevalence is not a rate, 20-year prevalence proportions also display near-power law rise until around 70, followed by a peak between 75 and 95, and subsequently a sharp decline. Notable exceptions include brain cancer and Hodgkin lymphoma. In men and women, brain cancer prevalence reaches a maximum around age 20. Similarly, Hodgkin lymphoma prevalence is highest in age groups 30-40. In both cases, incidence is greatest at younger ages while mortality is greatest at older ages. In this sense, the large young-age peak in prevalence proportions supplements the large old-age peak in mortality rates. On average, younger patients survive longer (24). Note that many cancers display childhood peaks in incidence and mortality that are too small to be visible on our graphs, but are still highly statistically significant.