Re: “Comparing the Benefits of Screening for Breast Cancer and Lung Cancer Using a Novel Natural History Model”

Supplement

Parameter Estimation Method

The complete model formulation can be denoted by $\theta = (\mu_1, \mu_2, \sigma_1, \sigma_2, \rho, c_1, c_2, b_1, b_2, k_1, k_2)$; hence, the parameter space has 11 dimensions. All the variables and events in the natural history of cancer described above can be represented as functions of $\theta$.

Parameters were estimated by the maximum likelihood method (3, 4). For estimation purposes, tumor diameters at clinical detection were discretized into 6 bins (0-2, 2-2.9, 3-3.9, 4-4.9, 5-6.9, and ≥7 cm for NSCLC; 0-1, 1-1.9, 2-2.9, 3-3.9, 4-4.9, and ≥5 cm for IDC), and survival times were discretized into 1-year intervals up to $L$ years ($L = 15$ for NSCLC; $L = 30$ for IDC). Let the tumor-diameter bins be $[d_j, d_{j+1})$ for $j = 0, 1, \ldots, 5$, where $d_0 = 0$ and $d_6 = \infty$, and let the boundaries of the time intervals be $s_l$ for $l = 0, 1, \ldots, L$, where $s_0 = 0$.
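The discretization above can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names `diameter_bin` and `survival_bin` are hypothetical, the time boundaries are taken as $s_l = l$ years (1-year intervals starting at $s_0 = 0$), and placing survival times at or beyond $s_L$ in the last interval is our assumption, since the text does not say how they were handled.

```python
import numpy as np

# Diameter bin edges d_0, ..., d_6 in cm (the last bin is open-ended).
DIAMETER_EDGES = {
    "NSCLC": np.array([0.0, 2.0, 3.0, 4.0, 5.0, 7.0, np.inf]),
    "IDC": np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, np.inf]),
}
MAX_YEARS = {"NSCLC": 15, "IDC": 30}  # L for each cancer


def diameter_bin(diam_cm, cancer):
    """Index j with d_j <= diameter < d_{j+1} (hypothetical helper)."""
    edges = DIAMETER_EDGES[cancer]
    # side="right" puts a value equal to an edge into the bin that starts there.
    return np.searchsorted(edges, np.asarray(diam_cm, dtype=float), side="right") - 1


def survival_bin(years, cancer):
    """Index l with s_l <= time < s_{l+1}, taking s_l = l years (hypothetical helper)."""
    t = np.asarray(years, dtype=float)
    # Assumption: times at or beyond s_L are placed in the last interval.
    return np.minimum(np.floor(t).astype(int), MAX_YEARS[cancer] - 1)


print(diameter_bin([0.5, 2.0, 6.9, 7.0, 12.0], "NSCLC").tolist())  # [0, 1, 4, 5, 5]
print(survival_bin([0.2, 1.0, 14.9, 20.0], "NSCLC").tolist())      # [0, 1, 14, 14]
```

Using half-open bins means a tumor measured at exactly 7.0 cm falls in the open-ended NSCLC bin, matching the $[d_j, d_{j+1})$ convention.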

For patients in the SEER dataset who died of cancer, let $N^E_{jkl}$ denote the number of patients who were diagnosed with stage $A = k$ ($k = 0$ for early stage, $k = 1$ for advanced stage), with tumor diameter in the $j$-th size bin, and with survival time in the $l$-th interval. For patients who were censored (because of other-cause mortality or loss to follow-up), let $N^C_{jkl}$ denote the number of patients who were diagnosed with stage $A = k$, with tumor diameter in the $j$-th size bin, and with the last follow-up time falling in the $l$-th interval after diagnosis.
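Once each patient record has been mapped to bin indices, the two count tables $N^E_{jkl}$ and $N^C_{jkl}$ can be tallied as sketched below. The function name `count_tables` and the per-patient input layout are hypothetical; the sketch only assumes that each record carries a size-bin index $j$, a stage $k$, an interval index $l$, and a flag for death from cancer versus censoring.

```python
import numpy as np

N_SIZE_BINS = 6  # j = 0, ..., 5
N_STAGES = 2     # k = 0 (early stage), 1 (advanced stage)


def count_tables(j, k, l, died_of_cancer, n_intervals):
    """Return (N_E, N_C), each of shape (6, 2, n_intervals) (hypothetical helper).

    j, k, l: per-patient size-bin, stage, and time-interval indices.
    died_of_cancer: True for cancer deaths; False for censored patients
    (other-cause death or loss to follow-up), whose l indexes the last follow-up.
    """
    j = np.asarray(j, dtype=int)
    k = np.asarray(k, dtype=int)
    l = np.asarray(l, dtype=int)
    died = np.asarray(died_of_cancer, dtype=bool)
    shape = (N_SIZE_BINS, N_STAGES, n_intervals)
    n_event = np.zeros(shape, dtype=int)
    n_cens = np.zeros(shape, dtype=int)
    # np.add.at accumulates repeated (j, k, l) cells, unlike fancy-index assignment.
    np.add.at(n_event, (j[died], k[died], l[died]), 1)
    np.add.at(n_cens, (j[~died], k[~died], l[~died]), 1)
    return n_event, n_cens


# Four hypothetical NSCLC records: three cancer deaths (two in the same cell), one censored.
n_e, n_c = count_tables(j=[0, 0, 1, 5], k=[0, 0, 1, 1], l=[2, 2, 0, 3],
                        died_of_cancer=[True, True, False, True], n_intervals=15)
```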

Given model parameters $\theta$, let the tumor diameter at detection, the stage at detection, and the survival be $D(\theta)$, $A(\theta)$, and $S(\theta)$, respectively. The likelihood function given $\theta$ is thus
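
$$
L(\theta) = \prod_{j=0}^{5} \prod_{k=0}^{1} \prod_{l=0}^{L-1}
\Pr\left(d_j \le D(\theta) < d_{j+1},\ A(\theta) = k,\ s_l \le S(\theta) < s_{l+1}\right)^{N^E_{jkl}}
\Pr\left(d_j \le D(\theta) < d_{j+1},\ A(\theta) = k,\ S(\theta) \ge s_l\right)^{N^C_{jkl}}
$$
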
Because this likelihood function is computationally challenging to evaluate, we estimated it by an empirical likelihood based on simulations, as described next (3).

To estimate the likelihood at a given point $\theta$ in the parameter space, a cohort of $N$ patients was simulated, and a natural history was generated for each patient. For the $i$-th patient in the cohort, let the tumor diameter and stage at clinical detection and the survival be $D_i$, $A_i$, and $S_i$, respectively. The likelihood $L(\theta)$ is estimated by replacing each probability with the proportion of simulated patients falling in the corresponding tumor size-stage-survival category. That is,
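
$$
\hat{L}(\theta) = \prod_{j=0}^{5} \prod_{k=0}^{1} \prod_{l=0}^{L-1} \left(\hat{p}^E_{jkl}\right)^{N^E_{jkl}} \left(\hat{p}^C_{jkl}\right)^{N^C_{jkl}},
$$

where

$$
\hat{p}^E_{jkl} = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\left\{d_j \le D_i < d_{j+1},\ A_i = k,\ s_l \le S_i < s_{l+1}\right\},
\qquad
\hat{p}^C_{jkl} = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\left\{d_j \le D_i < d_{j+1},\ A_i = k,\ S_i \ge s_l\right\},
$$

and $\mathbf{1}\{\cdot\}$ denotes the indicator function.
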
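The simulation-based estimate of the log-likelihood described above can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' R implementation: the function name is hypothetical, the simulated diameters are assumed to be already mapped to size-bin indices, a censored patient whose last follow-up falls in the $l$-th interval is assumed to contribute the probability of surviving at least $s_l$, and the small `eps` floor against empty simulated cells is our own pragmatic choice.

```python
import numpy as np


def simulated_log_likelihood(n_event, n_cens, sim_j, sim_k, sim_s, surv_edges, eps=1e-12):
    """Monte Carlo estimate of log L(theta) from one simulated cohort (hypothetical helper).

    n_event, n_cens: observed counts N^E_{jkl}, N^C_{jkl}, shape (J, K, L).
    sim_j, sim_k: size-bin and stage index of each simulated patient.
    sim_s: simulated survival times (same units as surv_edges, >= 0).
    surv_edges: boundaries s_0 < s_1 < ... < s_L.
    """
    J, K, L = n_event.shape
    sim_j = np.asarray(sim_j, dtype=int)
    sim_k = np.asarray(sim_k, dtype=int)
    sim_s = np.asarray(sim_s, dtype=float)
    n_sim = sim_s.size
    # Interval index l with s_l <= S_i < s_{l+1}; l == L means S_i >= s_L.
    l_sim = np.searchsorted(surv_edges, sim_s, side="right") - 1
    inside = l_sim < L
    # p_event[j, k, l]: share of simulated patients in cell (j, k, l).
    p_event = np.zeros((J, K, L))
    np.add.at(p_event, (sim_j[inside], sim_k[inside], l_sim[inside]), 1.0 / n_sim)
    # Share surviving past s_L, which counts toward every censored cell.
    p_beyond = np.zeros((J, K))
    np.add.at(p_beyond, (sim_j[~inside], sim_k[~inside]), 1.0 / n_sim)
    # p_cens[j, k, l] = share with S >= s_l: reverse cumulative sum over l, plus the tail.
    p_cens = np.flip(np.cumsum(np.flip(p_event, axis=2), axis=2), axis=2) + p_beyond[:, :, None]
    # eps guards against log(0) in cells with no simulated patients.
    return float(np.sum(n_event * np.log(p_event + eps)) + np.sum(n_cens * np.log(p_cens + eps)))


# Toy check: one size bin, one stage, three 1-year intervals, four simulated patients.
edges = np.arange(4.0)  # s_0, ..., s_3 = 0, 1, 2, 3
ll = simulated_log_likelihood(
    n_event=np.array([[[1, 0, 0]]]), n_cens=np.array([[[0, 0, 1]]]),
    sim_j=[0, 0, 0, 0], sim_k=[0, 0, 0, 0], sim_s=[0.5, 1.5, 2.5, 5.0], surv_edges=edges)
# The event cell l = 0 has share 0.25 and the censored cell l = 2 has share 0.5,
# so ll is approximately log(0.25) + log(0.5) = log(0.125).
```

Working on the log scale avoids underflow when the counts are large, and the reverse cumulative sum computes all censored-cell shares in one pass rather than rescanning the cohort for each $l$.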
The Nelder-Mead simplex optimization procedure was used to search the parameter space for the maximum of the likelihood (4). The estimation procedure was implemented in the R statistical software package.
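The estimation above was done in R, where `optim()` uses Nelder-Mead by default. As a hedged illustration in Python, the sketch below applies SciPy's `minimize(..., method="Nelder-Mead")` to a deliberately simplified, hypothetical model: a single lognormal tumor diameter with two parameters fitted to six binned counts, not the 11-parameter model $\theta$. Two choices are our assumptions rather than details from the source: the standard-normal draws are held fixed across likelihood evaluations (common random numbers) so that the simulated objective is deterministic, and $\sigma$ is optimized on the log scale to keep it positive.

```python
import math

import numpy as np
from scipy.optimize import minimize

EDGES = np.array([0.0, 2.0, 3.0, 4.0, 5.0, 7.0, np.inf])  # NSCLC-style size bins (cm)


def bin_probs_lognormal(mu, sigma):
    """Exact bin probabilities when log(D) ~ Normal(mu, sigma); used only to make toy data."""
    def cdf(x):
        if x <= 0:
            return 0.0
        if math.isinf(x):
            return 1.0
        return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))
    return np.diff([cdf(x) for x in EDGES])


def neg_sim_loglik(params, counts, z):
    """Negative simulated log-likelihood; z are fixed N(0, 1) draws (common random numbers)."""
    mu, log_sigma = params
    diam = np.exp(mu + np.exp(log_sigma) * z)
    j = np.searchsorted(EDGES, diam, side="right") - 1
    p_hat = np.bincount(j, minlength=len(EDGES) - 1) / len(z)
    return -float(np.sum(counts * np.log(p_hat + 1e-12)))


# Toy "observed" data: expected bin counts for 1,000 patients under mu = 1.0, sigma = 0.5.
counts = 1000 * bin_probs_lognormal(1.0, 0.5)
z = np.random.default_rng(2024).standard_normal(20_000)
res = minimize(neg_sim_loglik, x0=[0.5, math.log(0.8)], args=(counts, z),
               method="Nelder-Mead", options={"xatol": 1e-4, "fatol": 1e-6, "maxiter": 2000})
mu_hat, sigma_hat = res.x[0], math.exp(res.x[1])
print(f"mu_hat = {mu_hat:.3f}, sigma_hat = {sigma_hat:.3f}")
```

Nelder-Mead needs only function values, which suits a simulated likelihood that is piecewise constant in the parameters and has no usable gradient.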

Model Properties

The model formulation results in properties that are consistent with clinical observations.

1. Clinical detection

(a) Faster growing (more aggressive) tumors are detected at larger sizes.

This holds when the correlation between tumor size at detection and growth rate (the $\rho$ component of $\theta$) is positive. As shown in the Results section of the main text, the estimate of $\rho$ is positive for both the NSCLC and IDC models; therefore, this property holds.
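A small simulation can illustrate why a positive correlation gives Property 1(a). The sketch assumes, for illustration only, that log tumor diameter at detection and log growth rate are jointly normal; the means, standard deviations, and the function name `median_diameter_by_growth` are hypothetical and are not the fitted values from the main text.

```python
import numpy as np


def median_diameter_by_growth(rho, n=100_000, seed=7):
    """Median detection diameter in the slower and faster halves of growth rates.

    (log diameter, log growth rate) are drawn from a bivariate normal with
    hypothetical means and SDs and correlation rho.
    """
    mean = np.array([1.0, -1.0])  # hypothetical means of log diameter, log growth rate
    sd = np.array([0.5, 0.8])     # hypothetical standard deviations
    cov = np.array([[sd[0] ** 2, rho * sd[0] * sd[1]],
                    [rho * sd[0] * sd[1], sd[1] ** 2]])
    rng = np.random.default_rng(seed)
    log_d, log_r = rng.multivariate_normal(mean, cov, size=n).T
    diam, rate = np.exp(log_d), np.exp(log_r)
    faster = rate > np.median(rate)
    return float(np.median(diam[~faster])), float(np.median(diam[faster]))


for rho in (-0.4, 0.0, 0.4):
    slow, fast = median_diameter_by_growth(rho)
    print(f"rho = {rho:+.1f}: slower half {slow:.2f} cm, faster half {fast:.2f} cm")
```

With $\rho > 0$ the faster-growing half is detected at larger sizes; with $\rho < 0$ the ordering reverses, which is why the sign of the estimate matters.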

(b) Compared with early-stage disease, advanced disease has a higher probability of being detected.

Early-stage disease can be detected only through the primary tumor, whereas advanced disease can be detected through either the primary tumor or metastasis.

2. Metastasis

(a) Faster growing (more aggressive) tumors progress to advanced stage earlier.

In the equation of TA, TA is smaller when r is larger, since VC and BDk1/f are independent of r.

(b) Tumors detected at larger sizes have a higher likelihood of being staged as advanced disease.

The probability of an advanced-stage diagnosis at detection is Pr(A = 1) = Pr(VE > VA), which is larger when VE is larger.
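The monotonicity argument can be checked numerically. The sketch below assumes, for illustration, that VA is independent of VE, so that for a tumor detected at volume v the chance of advanced stage is Pr(VA < v), the distribution function of VA, which can never decrease as v grows. The lognormal distribution for VA and the function name are hypothetical.

```python
import numpy as np


def prob_advanced_at_detection(ve_values, va_samples):
    """Empirical Pr(VA < v) for each detection volume v (hypothetical helper).

    Assumes VA is independent of VE, so Pr(VE > VA | VE = v) = Pr(VA < v).
    """
    va_sorted = np.sort(np.asarray(va_samples, dtype=float))
    # side="left" counts samples strictly smaller than each v.
    below = np.searchsorted(va_sorted, np.asarray(ve_values, dtype=float), side="left")
    return below / va_sorted.size


rng = np.random.default_rng(1)
va = rng.lognormal(mean=1.0, sigma=1.0, size=50_000)  # hypothetical VA distribution
ve_grid = np.array([0.5, 1.0, 3.0, 10.0, 30.0])
print(prob_advanced_at_detection(ve_grid, va).round(3))
```

The same argument with the inequality reversed applies to the likelihood of cure, Pr(VE < VC), in Property 4(b): one minus a distribution function can only decrease as VE grows.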

3. Survival

(a) Patients with faster growing (more aggressive) tumors have shorter survival.

In the equation of TD, TD is smaller when r is larger since VC and BD/f are independent of r.

(b) Tumors detected at larger sizes are associated with shorter survival.

This is a consequence of Properties 1(a) and 3(a).

(c) Patients with advanced disease have shorter survival than patients with early-stage disease.

Disease is advanced at detection if and only if TE ≥ TA. TE is therefore larger for patients diagnosed with advanced-stage disease than for those diagnosed with early-stage disease. Since survival is S = TD - TE, survival is shorter for patients with advanced disease because TE is larger.

4. Treatment cure threshold and likelihood of cure at detection

(a) Faster growing (more aggressive) tumors progress beyond treatment cure threshold earlier.

In the equation of TC, TC is smaller when r is larger since VC is independent of r.
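The equation for TC is given in the main text. As a hypothetical illustration only, if tumor volume grew exponentially, V(t) = V0·exp(r·t), the time to reach a threshold volume would be ln(VC/V0)/r: the numerator does not depend on r, so the time shrinks as r grows. The exponential growth law, the starting volume V0, and the numbers below are assumptions for this sketch, not the paper's model.

```python
import math


def time_to_volume(v_target, v0, r):
    """Time for V(t) = v0 * exp(r * t) to reach v_target (hypothetical exponential growth)."""
    if not (v_target > v0 > 0 and r > 0):
        raise ValueError("need v_target > v0 > 0 and r > 0")
    return math.log(v_target / v0) / r


# Hypothetical volumes (cm^3): start at 1e-3, cure threshold VC = 1.0.
for r in (0.5, 1.0, 2.0):
    print(f"r = {r}: TC = {time_to_volume(1.0, 1e-3, r):.2f}")
```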

(b) Patients diagnosed with larger tumors have lower likelihood of cure.

The likelihood of cure is Pr(VE < VC), and it is smaller when VE is larger.

(c) Patients with faster growing (more aggressive) tumors have a lower likelihood of cure.

This is a consequence of Properties 1(a) and 4(b).