The Cost and Outcome Effectiveness of Total Hip Replacement: Technique Choice and Volume-Output Effects Matter; Applied Health Economics and Health Policy; Goldstein, Babikian, Rana, Mackenzie and Millar;

APPENDIX 2: Econometric Issues Detail

Unobserved heterogeneity model testing protocol

The test procedures differ slightly for linear and nonlinear estimation. The main hypotheses tested are: (1) the appropriateness of the additional orthogonality conditions (Xijk and ν0k are orthogonal) in the RE model; and (2) whether σ2ν = 0 in the RE model or ν0k = 0 for k=1…G in the FE model. Tests of orthogonality determine the appropriateness of the RE model versus the FE model. Tests of zero variances or zero coefficients determine, respectively, the appropriateness of the RE model and the FE model versus the PA model.

For linear estimation, a Hausman test addresses (1), where only covariates that vary over i, j and k are included in the equations. In RE models where s2ν = 0, where s2 is a variance estimate, a Hausman test cannot be calculated. Swamy-Arora estimates of variance components do not alleviate this problem. Alternatively, a Breusch and Pagan Lagrange Multiplier (BP-LM) test for zero variance is used. When variances are zero, the FGLS estimator is equivalent to the pooled OLS estimator. Despite this endorsement of the PA model, we interpret zero variance as a rejection of the RE model in favor of the FE model and test ν0k = 0 for k=1…G using an F test. If this null is accepted, then the PA model is indicated.
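The linear decision sequence described above can be sketched as a small function. This is only an illustration of the logic as we read it from the text: the function name, the argument names and the 0.05 significance level are assumptions, not part of the original analysis. The p-values in the usage lines are those reported later in this appendix for LOS and LNTOTCOST.

```python
from typing import Optional


def choose_linear_model(re_variance: float,
                        hausman_p: Optional[float],
                        bp_lm_p: float,
                        f_test_p: float,
                        alpha: float = 0.05) -> str:
    """Return 'RE', 'FE' or 'PA' for a linear equation (hypothetical helper).

    alpha = 0.05 is an assumed significance level.
    """
    hausman_available = re_variance > 0 and hausman_p is not None
    if hausman_available and hausman_p >= alpha:
        # Orthogonality is not rejected, so RE is a candidate;
        # the BP-LM test of zero variance then separates RE from PA.
        return "RE" if bp_lm_p < alpha else "PA"
    # RE is ruled out (zero variance estimate or orthogonality rejected):
    # the F test of nu_0k = 0 for all clusters separates FE from PA.
    return "FE" if f_test_p < alpha else "PA"


# Zero RE variance, so no Hausman test; the BP-LM chi-square of zero gives p = 1.
los_model = choose_linear_model(0.0, None, 1.0, 0.1010)
lntotcost_model = choose_linear_model(0.0, None, 1.0, 0.00)
```

With these inputs the sketch selects the pooled (PA) model for LOS and the FE model for LNTOTCOST.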

For nonlinear (logit) estimation, an LR test for ρ=0, where ρ is the percentage of total variance contributed by panel-level variance, is employed to distinguish the RE model from the PA model. If the null is accepted, the RE model is rejected. Given that a viable LR test for ν0k = 0 for k=1…G cannot be constructed for the FE model because unconditional ML estimates are inconsistent, a Hausman test, suggested by [51; p.805], comparing the FE (CMLE) model (heterogeneity of fixed effects) and the pooled logit model (homogeneity) is employed. In the calculation of this statistic, a non-positive semi-definite (non-PSD) difference in variance matrices is interpreted as if the FE model is not fully efficient, and the hypothesis of homogeneity across hospital clusters is accepted. Thus, the PA model is indicated. It should be noted, however, that a non-PSD matrix associated with a Hausman test can be interpreted in various ways.
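A minimal sketch of the Hausman comparison with an explicit check on the variance-difference matrix is given below. The function name and the numbers are purely illustrative and are not estimates from this study. The use of a generalized inverse is an assumption about how a statistic might still be computed when the difference is not PSD.

```python
import numpy as np


def hausman(b_consistent, b_efficient, v_consistent, v_efficient, tol=1e-10):
    """Hausman statistic plus a flag for whether the variance difference is PSD."""
    d = np.asarray(b_consistent, dtype=float) - np.asarray(b_efficient, dtype=float)
    v_diff = np.asarray(v_consistent, dtype=float) - np.asarray(v_efficient, dtype=float)
    # Symmetrize before the eigenvalue check to guard against rounding noise.
    eigenvalues = np.linalg.eigvalsh((v_diff + v_diff.T) / 2.0)
    is_psd = bool(eigenvalues.min() >= -tol)
    # Generalized inverse (an assumption of this sketch).
    stat = float(d @ np.linalg.pinv(v_diff) @ d)
    return stat, is_psd


# Made-up coefficient vectors and variance matrices (illustration only).
b_fe = [0.50, 0.20]
b_pooled = [0.40, 0.20]
v_fe = [[0.04, 0.00], [0.00, 0.01]]
v_pooled = [[0.03, 0.00], [0.00, 0.02]]

stat, is_psd = hausman(b_fe, b_pooled, v_fe, v_pooled)
```

In this made-up case the difference matrix has one negative eigenvalue, so `is_psd` is `False`, yet `stat` is positive (approximately 1.0). This shows how a positive χ2 value can coexist with a non-PSD difference matrix.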

In this protocol, surgeon cluster FE models are not considered because key policy variables are invariant within these clusters.

For two-level models based on hospital clusters, for every continuous dependent variable, the Hausman test could not be performed due to zero variances for random effects. Additionally, for every variable, continuous or not, with the exception of one, the null hypothesis of zero panel-level variance for the BP-LM test or the LR test was accepted with χ2 test statistics equal to zero. In the linear estimation cases, the zero χ2 was the result of σ2ν = 0, while in the nonlinear cases, σ2ν was small enough to make χ2=0. In the NURFAC case, the null was accepted with a p-value of .36. Thus, in all cases, the RE models were ruled out. For the test of ν0k = 0 for k=1…G, the null is accepted for LOS (p-value=.1010) and rejected for LNTOTCOST with a p-value of 0.00. For nonlinear estimations, the homogeneity test for unobserved effects in the form of a Hausman test, suggested by [51; p.805], resulted in non-positive semi-definite variance difference matrices. Despite this result, the χ2 test statistics remained positive. In these instances, we accept the homogeneity hypothesis. Effectively, the systematic occurrence of non-positive semi-definite matrices across all equations leads us to reject the notion that the gained efficiency of the RE model is insufficient to offset random influences on variance in favor of the deficiency of the RE model. Thus, the FE model is used for cost variables and the pooled model for all others. All p-values are adjusted for G-1 degrees of freedom when appropriate.

Assignment of missing values for surgical technique

Assigning cluster aggregated percent of techniques used to missing technique values constitutes a sample selection rule (SR). Desirable estimate properties require a SR that is independent of the dependent variable or unobserved effects that determine it [52;p.43]. Under this condition, linear and nonlinear regression on the CC sample produces unbiased parameter estimates. In the missing value approach, the CC sample can be extended using imputation or regression calibration. This approach necessitates the stricter missing completely at random assumption.

In a measurement error framework, desirable properties are preserved and precision is improved via the use of a vector of surrogate variables, W, for missing technique variables [53;p.51]. This requires that W contain no information on the dependent variable beyond that in surgical technique and all other regressors. In other words, since technique variables and other regressors are exogenous and uncorrelated with εijk, W is uncorrelated with εijk.

In particular, E(W)=E(ST) is required, where ST is the vector of technique variables. In pooled and single-cluster models, W ≡ $\overline{ST}$, a vector of ST sample means. This extends the sample by including mixed technique surgeons who responded to the survey with the percent of each technique used.

For the pooled and hospital fixed effects cases, W is respectively $\overline{ST}_{jk}$, which represent the sum of ST over all i for a given j or k, and E($\overline{ST}$) = E(ST). More technically, $\overline{ST}$ contains a nondifferential measurement error. $\overline{ST}_{ik}$ is uncorrelated with the resulting measurement error – a nonclassical (Berkson) measurement error exists. In contrast, ST is correlated with the measurement error, but only in the extended sample where $\overline{ST}$ replaces ST. Given that $\overline{ST}_{ik}$ is the regressor in the extended sample, it remains uncorrelated with the regression error, which includes portions of the measurement error. Thus, linear regression estimates are consistent for the CCAGG sample. This result holds for all of the unobserved effects models considered here when they are used to estimate the conditional mean of the subset of continuous outcome variables.
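The consistency argument for the linear case can be illustrated with a stylized simulation. It is a sketch under strong assumptions that are not taken from the study: there is a single binary technique indicator, each case's aggregated share W is drawn uniformly on [0, 1], the true indicator equals 1 with probability W (so that E(ST | W) = W, a Berkson-type error), and the outcome is linear with a true slope of 2. All names and parameter values are hypothetical.

```python
import random


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_slopes(n=20000, beta=2.0, seed=0):
    """Return slopes using the true technique and using the surrogate share."""
    rng = random.Random(seed)
    share, technique, outcome = [], [], []
    for _ in range(n):
        w = rng.uniform(0.0, 1.0)              # surrogate: aggregated share
        st = 1.0 if rng.random() < w else 0.0  # true technique, E(ST | W) = W
        y = 1.0 + beta * st + rng.gauss(0.0, 1.0)
        share.append(w)
        technique.append(st)
        outcome.append(y)
    return ols_slope(technique, outcome), ols_slope(share, outcome)


slope_true, slope_surrogate = simulate_slopes()
```

Both slopes should lie close to the assumed value of 2, with the surrogate-based slope noisier. That is the linear-regression property the paragraph above relies on. The sketch replaces ST with W for every observation, whereas the CCAGG sample does so only for missing values.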

This result cannot be readily extended to nonlinear estimation. Thus, a comparison of CC and CCAGG results is useful to indicate that nonlinear estimates in the CCAGG case are not compromised by missing value assignment.

Complete case sample regression results are contained in the supplemental regression results, appendix 3 (Online Resource 3) Table A3.1.

Endogeneity

[36; p.172] argues that selective referral is likely more relevant for non-emergency but risky conditions, especially if treated by referral specialists. LSURGVOL is the only potentially endogenous variable that appears in equation (1).

Theoretically, THR outcomes determine productivity and quality of life over one’s remaining work life/life. Intertemporal utility can be improved by seeking better outcomes. The costs of traveling an extra 30-50 miles should be offset by the potential long-term gains from seeking better quality outcomes. LSURGVOL is modelled as a quadratic function, with diminishing returns, of patient-surgeon driving distance and patient-hospital driving distance. In a rural state such as Maine, where surgeons often have multiple office locations but hospitals are centralized, both distances may affect referral decisions. In addition, a rural-urban dichotomy exists. Rural patients travel further and likely consider a distinct quadratic distance function. As a result, eight IVs, listed in Table 1 and present in Z, are relevant. Given that hospital and surgeon distances are highly correlated (a correlation coefficient of .99) and that hospital distances represent the greater burden, for parsimony and in order to ensure that the CRVE can be derived, our analysis relies only on hospital distance. Thus, four basic IVs act as a starting point: PHDIST, PHDISTSQ, RURAL*PHDIST and RURAL*PHDISTSQ.

2SLS estimates for linear estimations (LNTOTCOST and LOS) are derived using these 4 IVs and are reported in Table A.3.3. Two stage residual inclusion (2SRI) estimation [54] is used for nonlinear estimation (NURFAC, PT and DISCHARGE).

The 2SRI method estimates the reduced form (RF) equation for LSURGVOL, which includes the four basic instruments, by OLS. The OLS residual (LSURGVOLRES) is retained. LSURGVOLRES is then included in the original structural equation, which is estimated using the same methods employed in Table 4. These 2SRI results are reported in the supplemental regression results, Appendix 3 (Online Resource 3) Table A.3.3 for NURFAC, PT, and DISCHARGE.
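The two steps can be sketched on simulated data as follows. This is a minimal illustration, not the estimation code used for Table A.3.3. It makes these assumptions: the outcome is a single binary logit (DISCHARGE in the study has several residual coefficients), the data-generating values are invented, the only exogenous regressor is a rural indicator, and the logit is fitted with a hand-written Newton-Raphson routine rather than a statistical package. Standard errors and cluster adjustments are omitted.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def logit_fit(X, y, iterations=50):
    """Newton-Raphson maximum likelihood for a binary logit."""
    beta = np.zeros(X.shape[1])
    for _ in range(iterations):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        gradient = X.T @ (y - p)
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hessian, gradient)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta


def two_stage_residual_inclusion(y, x_endog, exog, instruments):
    """Stage 1: OLS reduced form. Stage 2: logit with the residual added."""
    const = np.ones((len(y), 1))
    Z = np.hstack([const, exog, instruments])
    residual = x_endog - Z @ ols(Z, x_endog)
    X2 = np.hstack([const, exog, x_endog[:, None], residual[:, None]])
    return logit_fit(X2, y), residual


# Simulated data; every number below is an assumption for illustration.
rng = np.random.default_rng(0)
n = 5000
phdist = rng.uniform(0.0, 5.0, n)            # hospital distance (arbitrary units)
rural = rng.integers(0, 2, n).astype(float)
instruments = np.column_stack(
    [phdist, phdist**2, rural * phdist, rural * phdist**2])
u = rng.normal(size=n)                       # unobserved factor causing endogeneity
lsurgvol = (3.0 + 0.5 * phdist - 0.05 * phdist**2 + 0.1 * rural
            + 0.8 * u + rng.normal(scale=0.5, size=n))
latent = -2.0 + 0.5 * lsurgvol + 0.8 * u + rng.logistic(size=n)
y = (latent > 0).astype(float)

coef, lsurgvolres = two_stage_residual_inclusion(
    y, lsurgvol, rural[:, None], instruments)
# coef order: constant, rural, LSURGVOL, LSURGVOLRES
```

Because the simulated unobservable enters both equations with a positive sign, the coefficient on the residual (`coef[3]`) should be positive here. In applied work the significance of that coefficient would be assessed with appropriate standard errors.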

Standard tests related to IVE include tests of endogeneity, over identifying restrictions, and under identification/weak (IV) identification. These tests are readily available for linear estimates using the Stata IVREG2 and XTIVREG2 user-written programs [55]. An endogeneity test for 2SRI estimation is equivalent to a test of βLSURGVOLRES = 0 in the augmented structural equation. Endogeneity tests for linear estimations (LNTOTCOST and LOS) are C-like statistics. The C-like statistic used is the difference of two Sargan-Hansen statistics: one each for the equation where the potentially endogenous regressors are treated respectively as exogenous and endogenous. Unlike the Wu-Hausman F test, this statistic is robust to conditional heteroscedasticity.
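In symbols, the C-like statistic described above can be written as follows. The subscripts are our own labels (not notation from the paper) for the Sargan-Hansen statistics of the two equations being compared, and q denotes the number of regressors whose exogeneity is tested, which is 1 here (LSURGVOL).

```latex
C \;=\; J_{\text{exog}} \;-\; J_{\text{endog}}
\;\overset{a}{\sim}\; \chi^{2}_{q}
\quad \text{under } H_0:\ \text{the tested regressors are exogenous}, \qquad q = 1 .
```

Here $J_{\text{exog}}$ comes from the equation that treats LSURGVOL as exogenous (the larger set of orthogonality conditions) and $J_{\text{endog}}$ from the equation that treats it as endogenous.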

C-like statistics do not reject exogeneity for the LOS and LNTOTCOST equations. 2SRI residual z and χ2 tests do not reject exogeneity for NURFAC and DISCHARGE. The C-like statistic p-values for LOS and LNTOTCOST, using a finite sample correction via an F test, are respectively .548 and .582. The 2SRI z tests for NURFAC and PT and the χ2 test for simultaneous zero residual coefficients in DISCHARGE yield, respectively, p=.673, p=0.015 and p=.772. Thus, corrections for endogeneity are only warranted in the PT equation.

Additional IVE tests are only available for the LNTOTCOST and LOS equations. Given the same endogenous structure in all equations, the reported results should be relevant for nonlinear estimations. These tests establish that two-stage structural parameter estimates are consistent.

The other tests employed are: the Hansen J statistic for over-identifying restrictions; single equation weak identification tests – the first stage F statistic and the Sanderson and Windmeijer (SW) conditional F statistic [56]; and a full system weak identification test – the Kleibergen-Paap (KP) F statistic. All tests are cluster robust with small sample adjustments.

For the LOS (LNTOTCOST) equations with 4 IVs, relevant statistics/p-values are: Hansen J, p=.409 (p=.421); SW F and first stage F statistics by endogenous regressor: LSURGVOL FSW=14.20, F=14.20 (FSW=51.70, F=51.70); FKP=14.20 (FKP=51.70).

The Hansen J test accepts the null that the IVs are valid instruments in the sense that they are not correlated with the structural error term and do not have a direct effect on the dependent variable.

Under non-spherical error terms, Stock and Yogo (SY) critical values [57] no longer provide exact tests, but can be used as a guide. For FKP, the critical value for LIML size .10 with one endogenous regressor and 4 IVs at the 5% level of significance is 5.44. In both cases, the null of weak identification is rejected. For FSW and F, we use SY critical values of 16.85 (10.27) based on relative bias of .05 (.10). On the basis of these critical values, weak identification is rejected for the single endogenous variable in both equations.
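The comparisons in this paragraph amount to simple threshold checks, which can be laid out as below. The statistics and critical values are those quoted in the text. The dictionary and function names are hypothetical, and, as noted above, the SY values serve only as a guide under non-spherical errors.

```python
# Stock-Yogo critical values quoted in the text (one endogenous regressor, 4 IVs).
SY_CRITICAL = {
    "LIML size .10": 5.44,
    "relative bias .05": 16.85,
    "relative bias .10": 10.27,
}

# Reported statistics; F, F_SW and F_KP coincide within each equation.
REPORTED_F = {"LOS": 14.20, "LNTOTCOST": 51.70}


def exceeds_critical_values(f_stat):
    """Map each criterion to True if the statistic exceeds its critical value."""
    return {label: f_stat > value for label, value in SY_CRITICAL.items()}


results = {equation: exceeds_critical_values(f)
           for equation, f in REPORTED_F.items()}
```

On these numbers the LNTOTCOST statistic exceeds all three values. The LOS statistic exceeds the LIML size and .10 relative-bias values but not the .05 relative-bias value, so for LOS the rejection of weak identification rests on the .10 relative-bias criterion.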

In summary, the IVs meet the requirements of valid instruments and are sufficiently correlated with the endogenous variable.

IV/2SRI corrections for selective referral bias on VO effects are reported with supplemental regression results in Appendix 3 (Online Resource 3) in Table A.3.3.

Omitted variable bias

The literature on BMI and THR is divided. Either marginally small negative or no statistically significant impacts are reported. On the negative side, [58-61] find marginally higher complication/revision rates, [59,62] find marginally smaller functional improvements and [63] reveal marginal cost increases. In contrast, others find no impact on survival rates [64,65], complication rates [32,65], length of stay [18,66], acute care costs [66] and inflammatory/pain response [67].

Thus, omitted variable bias, if it exists, is likely to be small and confined. Despite this, a proxy for BMI – DIABETES – is used. Diabetes may have its own direct effect and may not meet the redundancy condition for a proxy implying that bias remains, but is limited to slope estimates on variables correlated with BMI.

The literature on the effects of diabetes finds similar results to the BMI impacts, but does not control for BMI. Thus, it is inconclusive on whether diabetes has a direct effect. Given that DIABETES is an imperfect but strong proxy (80%-85% of diabetics are overweight or obese [68]), estimated slope coefficients on other variables correlated with BMI – such as AGE and RURAL – will be mildly biased.

The negative consequences of unobserved BMI require that βBMI ≠ 0 in equation (1). While we cannot test this condition, strong, yet indirect, evidence contradicts it. If the coefficient on DIABETES in equation (1) is biased, it equals (βDIABETES + βBMI δDIABETES), where δDIABETES is the coefficient on DIABETES in a regression equation for BMI. Given that βDIABETES and βBMI are either zero or positive for outcome variables that are increasing in undesirable outcomes and that DIABETES is positively correlated with BMI (δDIABETES>0), an insignificant result for βDIABETES in our equations suggests that both BMI and DIABETES were never relevant variables. While a test of the relevant regression parameter corrects for sampling error, it cannot correct for the joint sampling error involving βDIABETES and βBMI. Thus, this test does not conclusively establish that both parameters are zero.
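The argument can be laid out in one line for the linear case. The auxiliary regression of BMI on DIABETES and the other regressors, with intercept $\delta_0$ and error $r$, is written here only for illustration, and the hat denotes the estimate obtained when BMI is omitted.

```latex
\begin{aligned}
\mathrm{BMI} &= \delta_0 + \delta_{\mathrm{DIABETES}}\,\mathrm{DIABETES} + \dots + r, \\
\operatorname{plim}\,\hat{\beta}_{\mathrm{DIABETES}}
  &= \beta_{\mathrm{DIABETES}} + \beta_{\mathrm{BMI}}\,\delta_{\mathrm{DIABETES}}, \\
\beta_{\mathrm{DIABETES}} \ge 0,\quad \beta_{\mathrm{BMI}} \ge 0,\quad \delta_{\mathrm{DIABETES}} > 0
  &\;\Longrightarrow\;
  \bigl(\beta_{\mathrm{DIABETES}} + \beta_{\mathrm{BMI}}\,\delta_{\mathrm{DIABETES}} = 0
  \iff \beta_{\mathrm{DIABETES}} = \beta_{\mathrm{BMI}} = 0\bigr).
\end{aligned}
```

With both terms non-negative, the sum can be zero only if each term is zero, which is why an insignificant DIABETES coefficient is read as indirect evidence against βBMI ≠ 0, subject to the sampling-error caveat stated above.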