Annex to
Preventing obesity in the US: impact on health service utilization and costs
This annex to the paper “Preventing obesity in the US: impact on health service utilization and costs” is a technical appendix providing:
1. A description of the process we used to select and validate the econometric approach
2. The specification of the distributions used to model the nine equations
3. Full output details of all the models employed in the analyses. In particular, for each of the models described in the paper, this technical appendix reports STATA outputs with coefficients, confidence intervals and model specification tests.
Part 1 - Selection of the econometric approach
The approach used in the analyses was compared with four other approaches: the RAND log-transformed four-part model (RAND-T), a modified version of the RAND four-part model using a generalized linear model (GLM) (RAND-GLM, labelled RAND-V in the tables), a two-part regression approach using GLM (2PART) and a six-part regression approach by healthcare service using GLM (6PART). For each of the assessed econometric models, Table A.1 reports the underlying algorithm. We also considered other common approaches: the log-transformed single-part linear model, the square-root transformed single-part linear model, the square-root transformed two-part model and the untransformed two-part model. However, previous studies provide clear and consistent evidence that these approaches perform less well, so they were discarded before the testing exercise.
Table A.1 – Specifications of the assessed models
Name / Model specification
RAND-T & RAND-V / TOTEXP = PrEXP∙[PrINP∙EXPINP + (1-PrINP)∙EXPNIP]
2PART / TOTEXP = PrEXP∙EXP
6PART / TOTEXP = PrINP∙EXPINP + PrOUTP∙EXPOUTP + PrDRU∙EXPDRU
These analyses / TOTEXP = PrINP∙NINP∙AVEXPINP + PrOUTP∙NOUTP∙AVEXPOUTP + PrDRU∙NDRU∙AVEXPDRU
Legend / TOTEXP: total expenditure / AVEXP: expenditure per episode / EXP: expenditure / Pr: probability / INP: inpatient / NIP: non-inpatient / OUTP: outpatient / DRU: drug / N: number
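As an illustration, the "These analyses" row of Table A.1 can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the class name, function name and all numeric inputs are hypothetical, and in the actual analyses each component would come from one of the nine fitted regressions.

```python
from dataclasses import dataclass


@dataclass
class ServicePrediction:
    """Predicted components for one service type (hypothetical container)."""
    prob_use: float          # Pr: probability of any use during the year
    n_episodes: float        # N: expected number of episodes, given use
    cost_per_episode: float  # AVEXP: expected expenditure per episode

    def expected_expenditure(self) -> float:
        # Pr * N * AVEXP for this service type
        return self.prob_use * self.n_episodes * self.cost_per_episode


def total_expenditure(inpatient, outpatient, drug) -> float:
    """TOTEXP as the sum of the three service-specific products."""
    return sum(s.expected_expenditure() for s in (inpatient, outpatient, drug))


# Illustrative inputs only; these are NOT estimates from the paper.
example_total = total_expenditure(
    ServicePrediction(prob_use=0.10, n_episodes=1.5, cost_per_episode=8000.0),
    ServicePrediction(prob_use=0.70, n_episodes=5.0, cost_per_episode=150.0),
    ServicePrediction(prob_use=0.60, n_episodes=10.0, cost_per_episode=60.0),
)
```

Keeping the three service types separate makes it possible to report predicted utilization as well as predicted cost, which the single-product models in the table cannot do.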
All the models include two vectors of covariates, one for socio-demographic characteristics and one for medical conditions. A third vector of variables was model-specific and captured the type and quantity of healthcare services used by patients.
The models were compared on their predictive ability and goodness of fit using a standard set of criteria. Because the models are highly heterogeneous, particularly in their outputs, the assessment was limited to the prediction of total annual healthcare expenditure. The indicators on which the econometric approaches were primarily tested are: i) the mean and the standard deviation; ii) the root mean squared error (RMSE); iii) the mean absolute prediction error (MAPE); iv) the mean prediction error (MPE); v) the scale-free Theil’s statistic (SFTS); vi) the ratio between the predicted healthcare expenditure and the original data by decile of predicted healthcare expenditure. Results of the key tests are presented in Table A.2 and Figure A.1.
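Several of the indicators listed above can be sketched in plain Python. This is a minimal, unweighted sketch under stated assumptions: the function names are hypothetical, survey weights are ignored, and the sign convention for the mean prediction error (predicted minus observed) is an assumption, since the annex does not state it. The scale-free Theil statistic is left out because the annex does not say which form was used.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)


def mean_absolute_prediction_error(observed, predicted):
    """Mean of |predicted - observed|."""
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / len(observed)


def mean_prediction_error(observed, predicted):
    """Mean of (predicted - observed); the sign convention is an assumption."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)


def ratio_by_group(observed, predicted, n_groups=10):
    """Predicted/observed expenditure ratio within groups ordered by prediction.

    With n_groups=10 the groups are deciles of predicted expenditure.
    """
    pairs = sorted(zip(predicted, observed))  # order by predicted value
    n = len(pairs)
    ratios = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue  # fewer observations than groups
        sum_pred = sum(p for p, _ in chunk)
        sum_obs = sum(o for _, o in chunk)
        ratios.append(sum_pred / sum_obs if sum_obs else float("nan"))
    return ratios
```

A ratio close to 1 in every decile indicates that the model neither over- nor under-predicts systematically across the range of predicted expenditure.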
Table A.2. Descriptive statistics of original and predicted total expenditure
Regression approach / Starting data: mean / mean [low CI] / mean [up CI] / std dev / Predicted data: mean / mean [low CI] / mean [up CI] / std dev
RAND-T / 4559 / 4508 / 4610 / 11336 / 3067 / 3045 / 3089 / 4938
RAND-V / 4359 / 4312 / 4406 / 10955 / 2789 / 2771 / 2806 / 4031
2PART / 3982 / 3939 / 4025 / 10543 / 4673 / 4623 / 4723 / 12187
6PART / 4044 / 3997 / 4092 / 11519 / 4414 / 4384 / 4443 / 7177
9PART / 3412 / 3393 / 3430 / 4614 / 4044 / 3997 / 4092 / 11519
Figure A.1 Predicted-original healthcare expenditure ratio by decile of predicted healthcare expenditure
Part 2 – Specifications of the models and process
Dependent variable / Regression approach / Link / Distribution / Characterization
PrINP / Logistic regression / - / Logit / Logit(x)~N(0,1)
PrOUTP / Logistic regression / - / Logit / Logit(x)~N(0,1)
PrDRU / Logistic regression / - / Logit / Logit(x)~N(0,1)
NINP / Generalized linear model / Log / Gamma / x~Γ(k,θ)
NOUTP / Generalized linear model / Log / Poisson / x~Pois(λ)
NDRU / Generalized linear model / Log / Gamma / x~Γ(k,θ)
EXPINP / Generalized linear model / Log / Gamma / x~Γ(k,θ)
EXPOUTP / Generalized linear model / Log / Gamma / x~Γ(k,θ)
EXPDRU / Generalized linear model / Log / Gamma / x~Γ(k,θ)
For each of the GLM models, the distribution of the dependent variable and the link function relating the dependent variable to the explanatory variables were selected in two steps.
In the first step, we used the Box-Cox test to select the link between the dependent and the explanatory variables. For all the tested approaches, the log link proved the most suitable option.
In the second step, we used the Akaike information criterion to identify the best-fitting distribution for the log link. For every outcome we tested the log link in combination with each of the following distribution families: Gaussian, inverse Gaussian, binomial, Poisson, negative binomial and gamma. The log link with a gamma distribution proved the most suitable combination for all the GLM regressions except “number of outpatient visits”, for which we instead used the log link with a Poisson distribution.
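The second step can be illustrated with a short Python sketch that ranks candidate distribution families by the Akaike information criterion, AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L the maximized likelihood. The function names are hypothetical, and the log-likelihood values are placeholders chosen only to make the example run; they are not results from the paper.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2*ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


def best_family(fits):
    """Return the family with the lowest AIC, plus all AIC scores.

    `fits` maps a family name to (maximized log-likelihood, number of parameters).
    """
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    return min(scores, key=scores.get), scores


# Placeholder values for one outcome fitted with a log link; NOT from the paper.
candidate_fits = {
    "gaussian": (-1520.4, 55),
    "poisson": (-1498.7, 54),
    "gamma": (-1450.2, 55),
}
selected, aic_scores = best_family(candidate_fits)
```

Because the AIC penalizes each additional parameter, a family with an extra scale parameter is selected only if it improves the log-likelihood by more than one unit.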
Part 3 – Output data of the regressions
Probability of drug prescription
Number of strata = 419 / Number of obs = 235582
Number of PSUs = 2166 / Population size = 2243554589
Design df = 1747
F( 54, 1694) = 232.31
Prob > F = 0.0000
OR / Std err / t / P>t / CI [lb] / CI [ub]
Year 1997 / baseline
Year 1998 / 0.960241 / 0.053633 / -0.73 / 0.468 / 0.860607 / 1.07141
Year 1999 / 0.941542 / 0.052422 / -1.08 / 0.279 / 0.844141 / 1.050181
Year 2000 / 1.128582 / 0.055247 / 2.47 / 0.014 / 1.025263 / 1.242312
Year 2001 / 1.310069 / 0.069667 / 5.08 / 0 / 1.180314 / 1.454088
Year 2002 / 1.223356 / 0.064098 / 3.85 / 0 / 1.103882 / 1.355759
Year 2003 / 1.165743 / 0.063462 / 2.82 / 0.005 / 1.047688 / 1.2971
Year 2004 / 1.120714 / 0.061465 / 2.08 / 0.038 / 1.006419 / 1.24799
Year 2005 / 1.109103 / 0.061428 / 1.87 / 0.062 / 0.994936 / 1.23637
Year 2006 / 1.047613 / 0.056736 / 0.86 / 0.391 / 0.942042 / 1.165015
Year 2007 / 1.006822 / 0.05576 / 0.12 / 0.902 / 0.903188 / 1.122347
Year 2008 / 0.962088 / 0.052634 / -0.71 / 0.48 / 0.864201 / 1.071061
Year 2009 / 0.935176 / 0.050736 / -1.24 / 0.217 / 0.840778 / 1.040173
Year 2010 / 0.80641 / 0.044996 / -3.86 / 0 / 0.722816 / 0.899672
Gender male / baseline
Gender female / 1.976008 / 0.030982 / 43.44 / 0 / 1.916166 / 2.037718
Age group 0-10 / baseline
Age group 11-20 / 0.693287 / 0.028095 / -9.04 / 0 / 0.640317 / 0.750639
Age group 21-30 / 0.717505 / 0.037168 / -6.41 / 0 / 0.648187 / 0.794235
Age group 31-40 / 0.662028 / 0.036294 / -7.52 / 0 / 0.594537 / 0.73718
Age group 41-50 / 0.681293 / 0.037928 / -6.89 / 0 / 0.610822 / 0.759896
Age group 51-60 / 0.843701 / 0.048796 / -2.94 / 0.003 / 0.753225 / 0.945044
Age group 61-70 / 1.128852 / 0.073678 / 1.86 / 0.063 / 0.993213 / 1.283016
Age group 71-80 / 1.41722 / 0.099927 / 4.95 / 0 / 1.234178 / 1.627407
Age group 81+ / 1.760476 / 0.180171 / 5.53 / 0 / 1.44031 / 2.151811
Race white non-Hispanic / baseline
Race white Hispanic / 0.659183 / 0.016164 / -17 / 0 / 0.628232 / 0.69166
Race Black / 0.663454 / 0.01628 / -16.72 / 0 / 0.632279 / 0.696166
Race Asian / 0.516445 / 0.02008 / -16.99 / 0 / 0.478525 / 0.557369
Race others / 0.789194 / 0.043469 / -4.3 / 0 / 0.70838 / 0.879227
No degree / baseline
High school or less / 0.964413 / 0.021881 / -1.6 / 0.11 / 0.922437 / 1.008299
Bachelor degree / 1.127455 / 0.03306 / 4.09 / 0 / 1.064443 / 1.194197
Master degree or more / 1.219003 / 0.04564 / 5.29 / 0 / 1.132695 / 1.311887
Below poverty line / baseline
1.01 to 1.24 times poverty line / 0.976861 / 0.039322 / -0.58 / 0.561 / 0.902705 / 1.05711
1.25 to 1.99 times poverty line / 0.958667 / 0.029171 / -1.39 / 0.166 / 0.903128 / 1.017622
2.0 to 3.99 times poverty line / 1.045661 / 0.029143 / 1.6 / 0.109 / 0.990035 / 1.104411
4.00 or more times poverty line / 1.191752 / 0.03533 / 5.92 / 0 / 1.124435 / 1.2631
Private insurance / baseline
Public insurance only / 1.1148 / 0.029523 / 4.1 / 0 / 1.058375 / 1.174235
Uninsured / 0.426268 / 0.01059 / -34.32 / 0 / 0.405995 / 0.447553
Single / baseline
Married / 1.133891 / 0.026564 / 5.36 / 0 / 1.082969 / 1.187207
Widow/divorced/separated / 1.042181 / 0.03323 / 1.3 / 0.195 / 0.979003 / 1.109436
Region northeast / baseline
Region midwest / 0.997555 / 0.029212 / -0.08 / 0.933 / 0.941875 / 1.056527
Region south / 1.144604 / 0.031295 / 4.94 / 0 / 1.084842 / 1.207659
Region west / 0.840039 / 0.025816 / -5.67 / 0 / 0.790901 / 0.892229
Body Mass index / 1.008868 / 0.001436 / 6.2 / 0 / 1.006056 / 1.011687
Have hypertension / 7.799568 / 0.359499 / 44.56 / 0 / 7.125406 / 8.537515
Have high cholesterol / 4.612888 / 0.21912 / 32.19 / 0 / 4.202535 / 5.063309
Have diabetes / 7.267646 / 0.551371 / 26.14 / 0 / 6.26284 / 8.433662
Have IHD / 2.586813 / 0.470251 / 5.23 / 0 / 1.811006 / 3.694964
Have stroke / 2.208712 / 0.401941 / 4.35 / 0 / 1.545719 / 3.156079
Have colorectal cancer / 1.0454 / 0.250611 / 0.19 / 0.853 / 0.653258 / 1.672938
Have lung cancer / 1.769363 / 0.620199 / 1.63 / 0.104 / 0.889706 / 3.518745
Have breast cancer / 2.13414 / 0.380646 / 4.25 / 0 / 1.50417 / 3.027951
Have other comorbidities / 3.305391 / 0.197566 / 20 / 0 / 2.93975 / 3.716509
Health status excellent / baseline
Health status very good / 1.389332 / 0.02528 / 18.07 / 0 / 1.340625 / 1.439809
Health status good / 2.00144 / 0.044043 / 31.53 / 0 / 1.916894 / 2.089714
Health status fair / 3.193893 / 0.122525 / 30.27 / 0 / 2.962399 / 3.443476
Health status poor / 5.15338 / 0.499814 / 16.91 / 0 / 4.26068 / 6.233119
Constant / 0.178158 / 0.01503 / -20.45 / 0 / 0.150989 / 0.210216
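To read the table above, odds ratios can be combined multiplicatively with the baseline odds and then converted to a probability. The sketch below is illustrative only: it assumes the reported constant is the baseline odds (all categorical covariates at their reference level and body mass index equal to zero), it ignores the survey design, and the function name and the chosen BMI value are hypothetical.

```python
def predicted_probability(baseline_odds, odds_ratios):
    """Multiply baseline odds by each odds ratio, then convert odds to probability."""
    odds = baseline_odds
    for ratio in odds_ratios:
        odds *= ratio
    return odds / (1.0 + odds)


# Values copied from the "Probability of drug prescription" table.
CONSTANT_ODDS = 0.178158  # odds for the reference profile (assumption: BMI = 0)
OR_FEMALE = 1.976008
OR_BMI_PER_UNIT = 1.008868

bmi = 25  # hypothetical BMI used for illustration
# The BMI odds ratio applies per unit, so it is raised to the power of the BMI.
prob_male = predicted_probability(CONSTANT_ODDS, [OR_BMI_PER_UNIT ** bmi])
prob_female = predicted_probability(
    CONSTANT_ODDS, [OR_BMI_PER_UNIT ** bmi, OR_FEMALE]
)
```

The gap between the two probabilities is smaller than the odds ratio alone suggests, because an odds ratio acts on the odds scale rather than on the probability scale.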
Number of drug prescriptions
Number of strata = 419 / Number of obs = 177813
Number of PSUs = 2057 / Population size = 1737287256
Design df = 1638
coefficient / Std err / t / P>t / CI [lb] / CI [ub]
Year 1997 / baseline
Year 1998 / 0.008357 / 0.039753 / 0.21 / 0.834 / -0.06961 / 0.086329
Year 1999 / 0.026552 / 0.038446 / 0.69 / 0.49 / -0.04886 / 0.101961
Year 2000 / 0.093828 / 0.033069 / 2.84 / 0.005 / 0.028966 / 0.158691
Year 2001 / 0.174216 / 0.034862 / 5 / 0 / 0.105837 / 0.242595
Year 2002 / 0.22128 / 0.03553 / 6.23 / 0 / 0.151592 / 0.290968
Year 2003 / 0.231003 / 0.036528 / 6.32 / 0 / 0.159357 / 0.302649
Year 2004 / 0.23859 / 0.037036 / 6.44 / 0 / 0.165948 / 0.311232
Year 2005 / 0.247455 / 0.036069 / 6.86 / 0 / 0.176709 / 0.318201
Year 2006 / 0.236058 / 0.035304 / 6.69 / 0 / 0.166811 / 0.305304
Year 2007 / 0.179197 / 0.035058 / 5.11 / 0 / 0.110433 / 0.247961
Year 2008 / 0.143281 / 0.035984 / 3.98 / 0 / 0.072701 / 0.21386
Year 2009 / 0.155553 / 0.036538 / 4.26 / 0 / 0.083887 / 0.227219
Year 2010 / 0.123055 / 0.037004 / 3.33 / 0.001 / 0.050476 / 0.195635
Gender male / baseline
Gender female / 0.211611 / 0.008359 / 25.32 / 0 / 0.195216 / 0.228005
Age group 0-10 / baseline
Age group 11-20 / 0.149508 / 0.027166 / 5.5 / 0 / 0.096224 / 0.202792
Age group 21-30 / 0.228909 / 0.031934 / 7.17 / 0 / 0.166273 / 0.291544
Age group 31-40 / 0.401912 / 0.032707 / 12.29 / 0 / 0.337761 / 0.466063
Age group 41-50 / 0.609313 / 0.032079 / 18.99 / 0 / 0.546392 / 0.672233
Age group 51-60 / 0.805883 / 0.03152 / 25.57 / 0 / 0.74406 / 0.867706
Age group 61-70 / 0.865428 / 0.031924 / 27.11 / 0 / 0.802812 / 0.928043
Age group 71-80 / 0.907309 / 0.033127 / 27.39 / 0 / 0.842334 / 0.972284
Age group 81+ / 1.006021 / 0.035319 / 28.48 / 0 / 0.936746 / 1.075295
Race white non-Hispanic / baseline
Race white Hispanic / -0.3246 / 0.01427 / -22.75 / 0 / -0.35259 / -0.29661
Race Black / -0.28668 / 0.013276 / -21.59 / 0 / -0.31272 / -0.26064
Race Asian / -0.39478 / 0.021908 / -18.02 / 0 / -0.43775 / -0.35181
Race others / -0.05926 / 0.031617 / -1.87 / 0.061 / -0.12128 / 0.002752
No degree / baseline
High school or less / 0.020478 / 0.010723 / 1.91 / 0.056 / -0.00055 / 0.041511
Bachelor degree / 0.079543 / 0.013439 / 5.92 / 0 / 0.053184 / 0.105902
Master degree or more / 0.083903 / 0.017047 / 4.92 / 0 / 0.050468 / 0.117339
Below poverty line / baseline
1.01 to 1.24 times pov line / -0.01867 / 0.019226 / -0.97 / 0.332 / -0.05638 / 0.019039
1.25 to 1.99 times pov line / -0.01831 / 0.014984 / -1.22 / 0.222 / -0.0477 / 0.011079
2.0 to 3.99 times pov line / -0.01782 / 0.014279 / -1.25 / 0.212 / -0.04582 / 0.010191
4.00 or more times pov line / 0.00547 / 0.015319 / 0.36 / 0.721 / -0.02458 / 0.035517
Private insurance / baseline
Public insurance only / 0.182466 / 0.011698 / 15.6 / 0 / 0.159522 / 0.205409
Uninsured / -0.29156 / 0.016119 / -18.09 / 0 / -0.32318 / -0.25995
Single / baseline
Married / -0.11949 / 0.014593 / -8.19 / 0 / -0.14811 / -0.09087
Widow/divorced/separated / -0.02767 / 0.015455 / -1.79 / 0.074 / -0.05799 / 0.002641
Region northeast / baseline
Region midwest / 0.07424 / 0.015154 / 4.9 / 0 / 0.044517 / 0.103962
Region south / 0.08699 / 0.014947 / 5.82 / 0 / 0.057673 / 0.116308
Region west / -0.0247 / 0.014857 / -1.66 / 0.097 / -0.05385 / 0.004437
Body Mass index / 0.003526 / 0.000615 / 5.73 / 0 / 0.00232 / 0.004732
Have hypertension / 0.457415 / 0.008546 / 53.52 / 0 / 0.440652 / 0.474178
Have high cholesterol / 0.317417 / 0.009436 / 33.64 / 0 / 0.29891 / 0.335923
Have diabetes / 0.47703 / 0.010332 / 46.17 / 0 / 0.456765 / 0.497296
Have IHD / 0.243455 / 0.022194 / 10.97 / 0 / 0.199923 / 0.286987
Have stroke / 0.187277 / 0.026146 / 7.16 / 0 / 0.135993 / 0.23856
Have colorectal cancer / 0.097514 / 0.071642 / 1.36 / 0.174 / -0.04301 / 0.238033
Have lung cancer / 0.105761 / 0.070494 / 1.5 / 0.134 / -0.03251 / 0.244029
Have breast cancer / 0.116634 / 0.035612 / 3.28 / 0.001 / 0.046783 / 0.186484
Have other comorbidities / 0.40865 / 0.018085 / 22.6 / 0 / 0.373178 / 0.444122
Health status excellent / baseline
Health status very good / 0.230459 / 0.011053 / 20.85 / 0 / 0.208779 / 0.252139
Health status good / 0.571268 / 0.01235 / 46.26 / 0 / 0.547044 / 0.595492
Health status fair / 0.98756 / 0.015525 / 63.61 / 0 / 0.95711 / 1.01801
Health status poor / 1.363638 / 0.021235 / 64.22 / 0 / 1.321988 / 1.405288
Constant / 0.318997 / 0.03895 / 8.19 / 0 / 0.2426 / 0.395394
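Because the model for the number of drug prescriptions uses a log link, each coefficient in the table above can be exponentiated to give a multiplicative effect on the expected number of prescriptions, holding the other covariates fixed. The sketch below shows this conversion for the "Gender female" row; the function names are hypothetical.

```python
import math


def multiplicative_effect(coefficient):
    """Exponentiate a log-link coefficient to get a ratio of expected values."""
    return math.exp(coefficient)


def percent_change(coefficient):
    """Percentage change in the expected outcome implied by the coefficient."""
    return (math.exp(coefficient) - 1.0) * 100.0


# Values copied from the "Number of drug prescriptions" table (Gender female).
COEF_FEMALE = 0.211611
CI_FEMALE = (0.195216, 0.228005)

ratio_female = multiplicative_effect(COEF_FEMALE)
# Exponentiating the interval bounds gives an interval for the ratio itself.
ratio_female_ci = tuple(multiplicative_effect(bound) for bound in CI_FEMALE)
```

For this row the ratio is about 1.24: among people with at least one prescription, women are predicted to have roughly a quarter more prescriptions than otherwise similar men.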
Cost per drug prescription