Supplementary

Appendix 1: Random Survival Forest and Minimal Depth

Random survival forest (RSF) is an extension of the random forest machine learning algorithm for the analysis of right-censored time-to-event data (1). A forest of survival trees is grown, with a log-rank splitting rule used at each node to select the optimal split among the candidate variables. A survival estimate for each observation is constructed with a Kaplan-Meier estimator within each terminal node, evaluated at each event time. Apart from prediction, RSF can be used for knowledge extraction, including variable ranking and recovery of nonlinear effects and interactions, because it is fully non-parametric (2,3).
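As an illustration of the terminal-node estimate, the following minimal sketch computes a Kaplan-Meier curve in plain Python for the observations falling in one terminal node. The function name `kaplan_meier` and the five-patient toy data are assumptions made for illustration only; this is not the software used for the analysis.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : follow-up times
    events : 1 if the event (death) was observed, 0 if right-censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    curve = []
    surv = 1.0
    for t in sorted(deaths):
        # Subjects still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        surv *= 1.0 - deaths[t] / at_risk
        curve.append((t, surv))
    return curve

# Hypothetical terminal node with five patients (months of follow-up).
node_times = [3, 5, 5, 8, 12]
node_events = [1, 1, 0, 1, 0]
km_curve = kaplan_meier(node_times, node_events)
```

In this toy node the estimate drops only at the three observed event times (3, 5 and 8 months); the censored patients contribute to the at-risk counts until they leave observation.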

The predictive accuracy of the RSF for 2-year overall survival (OS) after umbilical cord blood transplantation (UCBT) was assessed with Harrell's concordance index (C-index) using out-of-bag (OOB) data. The C-index is a measure of discrimination, conceptually similar to the area under the receiver operating characteristic (ROC) curve: a value of 0.5 corresponds to chance-level discrimination and a value of 1 to perfect discrimination. The OOB method involves drawing bootstrap samples from the derivation cohort and using each sample to fit a prediction model. Each bootstrap sample leaves out about one-third of the data, which is referred to as the OOB data (1,3). The C-index was calculated from an OOB ensemble constructed with the 1000 OOB data sets produced by the 1000 bootstrap samples used in deriving the forest. To enhance generalizability, models developed on the derivation sets (20 imputed versions of the data) were validated on the corresponding separate validation sets.
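The idea behind the C-index can be sketched in a few lines of Python. The function `harrell_c_index` and the toy data below are illustrative assumptions, not the implementation used in the study; in the OOB setting, each patient's risk score would come only from the trees whose bootstrap sample did not include that patient. This simplified version ignores pairs with tied follow-up times.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C for right-censored data (simplified sketch).

    A pair (i, j) is comparable when the subject with the shorter
    follow-up time had an observed event. The pair is concordant when
    that subject also has the higher predicted risk. Ties in predicted
    risk count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical data: times in months, 1 = death observed, 0 = censored.
times = [2, 4, 6, 8, 10]
events = [1, 1, 0, 1, 0]
risk = [0.9, 0.3, 0.2, 0.4, 0.1]
c = harrell_c_index(times, events, risk)  # 7 of 8 comparable pairs are concordant
```

The "about one-third" OOB fraction follows from the probability that a given observation is never drawn in n draws with replacement, (1 - 1/n)^n, which approaches e^-1 (about 0.37) as n grows.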

Minimal depth, a property derived from the construction of the trees within the forest, was used to identify predictive variables (i.e., feature selection) (4). Minimal depth assumes that variables with a high impact on the prediction are those that most frequently split nodes nearest to the root node, where they partition the largest samples of the population. Within each tree, node levels are numbered by their distance from the root of the tree (with the root at 0). Minimal depth measures the importance of risk factors by averaging the depth of the first split for each variable over all trees within the forest. The assumption behind the metric is that a smaller minimal depth value indicates that the variable separates large groups of observations and therefore has a large impact on the forest prediction. The minimal depth was calculated for each of the 20 imputed versions of the derivation data set, and an average value was obtained. Estimation of predictor importance using the Random Survival Forest minimal depth is presented in Figure S1.
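The averaging step can be sketched as follows. The tree representation (nested tuples), the function names and the three-tree toy forest are hypothetical and chosen only to make the calculation concrete. One further assumption is labeled in the code: a variable that never splits in a given tree is assigned the depth of that tree's deepest terminal node.

```python
from collections import deque

def first_split_depths(tree):
    """Map each variable to the depth of its shallowest split in one tree.

    A tree is a nested tuple (variable, left, right); a terminal node is
    None. The root is at depth 0. Also returns the depth of the deepest
    terminal node.
    """
    depths = {}
    max_depth = 0
    queue = deque([(tree, 0)])
    while queue:
        node, depth = queue.popleft()
        max_depth = max(max_depth, depth)
        if node is None:
            continue
        variable, left, right = node
        # Breadth-first order: the first time a variable is seen is its shallowest split.
        depths.setdefault(variable, depth)
        queue.append((left, depth + 1))
        queue.append((right, depth + 1))
    return depths, max_depth

def forest_minimal_depth(forest, variables):
    """Average the first-split depth of each variable over all trees."""
    totals = {v: 0.0 for v in variables}
    for tree in forest:
        depths, max_depth = first_split_depths(tree)
        for v in variables:
            # Assumption: an unused variable gets the depth of the tree.
            totals[v] += depths.get(v, max_depth)
    return {v: totals[v] / len(forest) for v in variables}

# Hypothetical forest of three small trees.
toy_forest = [
    ("disease_status", ("age", None, None), ("cmv", None, None)),
    ("disease_status", ("cmv", None, ("age", None, None)), None),
    ("age", ("disease_status", None, None), None),
]
minimal_depth = forest_minimal_depth(toy_forest, ["disease_status", "age", "cmv"])
ranking = sorted(minimal_depth, key=minimal_depth.get)  # most important first
```

In this toy forest `disease_status` splits at or next to the root in every tree, so it receives the smallest average depth and ranks first.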

Table S1: Data set country allocation

Data set / Country / Frequency of patients (n) / Cumulative percent (%)
Derivation data set / Brazil / 61 / 2.6
Czech Republic / 12 / 3.1
Germany / 64 / 5.8
Algeria / 2 / 5.9
Spain / 550 / 29.2
France / 953 / 69.5
Croatia / 2 / 69.6
Hungary / 15 / 70.2
Ireland / 4 / 70.4
Italy / 442 / 89.1
Jordan / 17 / 89.8
Malaysia / 1 / 89.9
Netherlands / 142 / 95.9
Russia / 7 / 96.2
Saudi Arabia / 62 / 98.8
Turkey / 26 / 99.9
South Africa / 2 / 100.0
Total / 2362
Validation data set / Argentina / 1 / 0.1
Australia / 81 / 10.5
Austria / 44 / 16.2
Belgium / 56 / 23.4
Bulgaria / 1 / 23.5
Canada / 53 / 30.3
Switzerland / 25 / 33.5
Denmark / 23 / 36.5
Finland / 15 / 38.4
England / 274 / 73.7
Greece / 47 / 79.7
Iran / 9 / 80.8
Israel / 66 / 89.3
Norway / 9 / 90.5
New Zealand / 4 / 91.0
Poland / 6 / 91.8
Portugal / 30 / 95.6
Slovenia / 1 / 95.8
Sweden / 33 / 100.0
Total / 778


Table S2. Cox regression model for 2-year overall survival including the top predictors, without interactions

Variable / p / Exp(B) / 95% CI for Exp(B), lower / 95% CI for Exp(B), upper
Age (>=18 vs <18 years) / 0.018 / 1.196 / 1.031 / 1.388
Recipient CMV sero-status (positive vs negative) / <0.001 / 1.288 / 1.136 / 1.461
Diagnosis (AML vs ALL) / 0.442 / 0.952 / 0.838 / 1.080
Previous autograft (yes vs no) / 0.006 / 1.333 / 1.086 / 1.637
Disease status / <0.001
CR2 vs CR1 / 0.007 / 1.214 / 1.053 / 1.399
Other CR vs CR1 / <0.001 / 1.891 / 1.474 / 2.425
Advanced vs CR1 / <0.001 / 2.619 / 2.235 / 3.069
HLA mismatch (>1 vs <=1) / 0.001 / 1.228 / 1.083 / 1.393
TNC (>=3 vs <3 ×10^7/kg) / 0.045 / 1.194 / 1.004 / 1.420
ATG (yes vs no) / <0.001 / 1.295 / 1.135 / 1.479
Center experience (UCBT/year) (>=20 vs <20) / 0.007 / 1.184 / 1.047 / 1.339
UCBT year / 0.006 / 0.969 / 0.948 / 0.991

Confidence interval (CI), cytomegalovirus (CMV), acute myeloid leukemia (AML), acute lymphoblastic leukemia (ALL), umbilical cord blood transplantation (UCBT), complete remission (CR), human leucocyte antigen (HLA), total nucleated cells (TNCs), anti-thymocyte globulin (ATG).
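For readers who want to relate the columns of the Cox tables to one another, the sketch below recovers the coefficient B, its standard error and an approximate p-value from a reported Exp(B) and its 95% CI. It assumes a Wald-type interval, exp(B ± 1.96·SE), which is a common default but is not stated in the source; the function name `wald_p_from_ci` is hypothetical, and because the tabulated values are rounded the result is only approximate.

```python
import math

def wald_p_from_ci(hr, lower, upper, z_crit=1.959964):
    """Approximate two-sided Wald p-value from Exp(B) and its 95% CI.

    Assumes the interval was computed as exp(B +/- z_crit * SE).
    """
    b = math.log(hr)                                          # coefficient B
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)   # SE from the CI width
    z = b / se                                                # Wald statistic
    p = math.erfc(abs(z) / math.sqrt(2))                      # two-sided normal tail
    return b, se, z, p

# Age row of the model without interactions: Exp(B) 1.196, 95% CI 1.031-1.388.
b, se, z, p = wald_p_from_ci(1.196, 1.031, 1.388)
```

Under this assumption the recovered p-value for the age row is close to the tabulated 0.018.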

Table S3. Cox regression model for 2-year overall survival including the top predictors, with interactions

Variable / p / Exp(B) / 95% CI for Exp(B), lower / 95% CI for Exp(B), upper
Age (>=18 vs <18 years) / 0.921 / 0.987 / 0.763 / 1.277
Recipient CMV sero-status (positive vs negative) / <0.001 / 1.283 / 1.131 / 1.456
Diagnosis (AML vs ALL) / 0.246 / 1.126 / 0.921 / 1.376
Previous autograft (yes vs no) / 0.002 / 1.379 / 1.123 / 1.694
Disease status / <0.001
CR2 vs CR1 / 0.003 / 1.379 / 1.115 / 1.705
Other CR vs CR1 / <0.001 / 2.466 / 1.795 / 3.387
Advanced vs CR1 / <0.001 / 3.043 / 2.311 / 4.006
HLA mismatch (>1 vs <=1) / 0.003 / 1.213 / 1.069 / 1.377
TNC (>=3 vs <3 ×10^7/kg) / 0.091 / 1.163 / 0.976 / 1.385
ATG (yes vs no) / 0.691 / 1.051 / 0.822 / 1.344
Center experience (UCBT/year) (>=20 vs <20) / 0.009 / 1.179 / 1.042 / 1.334
UCBT year / 0.003 / 0.967 / 0.946 / 0.989
Age*ATG / 0.055 / 1.328 / 0.994 / 1.774
Disease status*Diagnosis / 0.093
CR2*Diagnosis / 0.132 / 0.804 / 0.606 / 1.068
Other CR*Diagnosis / 0.023 / 0.546 / 0.324 / 0.921
Advanced*Diagnosis / 0.142 / 0.779 / 0.558 / 1.088

Confidence interval (CI), cytomegalovirus (CMV), acute myeloid leukemia (AML), acute lymphoblastic leukemia (ALL), umbilical cord blood transplantation (UCBT), complete remission (CR), human leucocyte antigen (HLA), total nucleated cells (TNCs), anti-thymocyte globulin (ATG), * interaction.


Table S4: Population characteristics of a randomly selected imputed data set

Derivation Set / Validation Set
Median / Percentile 25 / Percentile 75 / Count / Column N % / Median / Percentile 25 / Percentile 75 / Count / Column N % / p*
UCBT year / 2009.00 / 2007.00 / 2012.00 / 2010.00 / 2007.00 / 2012.00 / 0.335
Age at UCBT (years) / 24.03 / 8.10 / 45.37 / 15.95 / 5.80 / 37.94 / <0.001
<18 / 974 / 41.2% / 421 / 54.1% / <0.001
>=18 / 1388 / 58.8% / 357 / 45.9%
Gender / male / 1268 / 54.0% / 425 / 55.1% / 0.604
female / 1078 / 46.0% / 346 / 44.9%
Karnofsky/Lansky PS / <80% / 88 / 5.0% / 22 / 3.7% / 0.187
>=80% / 1685 / 95.0% / 580 / 96.3%
CMV / neg / 780 / 36.4% / 303 / 44.7% / <0.001
pos / 1364 / 63.6% / 375 / 55.3%
Diagnosis / ALL / 1032 / 43.7% / 365 / 46.9% / 0.117
AML / 1330 / 56.3% / 413 / 53.1%
Cytogenetics / Good / 92 / 5.35% / 37 / 6.9% / 0.206
Intermediate/Poor / 1485 / 86.39% / 462 / 86.4%
Sec AL / 142 / 8.26% / 36 / 6.7%
Months from diagnosis to UCBT / 9.59 / 5.78 / 22.05 / 11.79 / 5.91 / 24.02 / 0.78
<=12 / 1284 / 55.9% / 384 / 50.7% / 0.13
>12 / 1015 / 44.1% / 374 / 49.3%
Previous autograft / no / 2191 / 92.8% / 753 / 96.8% / <0.001
yes / 171 / 7.2% / 25 / 3.2%
Remission status / 1st CR / 1025 / 46.3% / 307 / 42.0% / 0.039
2nd CR / 780 / 35.3% / 291 / 39.8%
other CR / 112 / 5.1% / 47 / 6.4%
advanced disease / 295 / 13.3% / 86 / 11.8%
Graft / single CB unit / 1678 / 71.0% / 461 / 59.3% / <0.001
double CB unit / 684 / 29.0% / 317 / 40.7%
HLA mismatch / <=1 / 890 / 45.3% / 289 / 54.1% / <0.001
>1 / 1076 / 54.7% / 245 / 45.9%
ABO major vs other / Other / 1708 / 72.3% / 573 / 73.7% / 0.468
Major incompatibility / 654 / 27.7% / 205 / 26.3%
Female donor to male recipient / no / 1593 / 69.3% / 499 / 66.4% / 0.149
yes / 707 / 30.7% / 252 / 33.6%
TNCs cryopreserved (×10^7 cells/kg) / 4.83 / 3.68 / 6.67 / 5.47 / 4.12 / 7.50
<3 / 204 / 10.9% / 30 / 6.4% / 0.04
>=3 / 1669 / 89.1% / 439 / 93.6%
Conditioning / MAC / 1609 / 70.4% / 573 / 76.5% / 0.01
RIC / 676 / 29.6% / 176 / 23.5%
ATG / no / 749 / 34.7% / 315 / 46.7% / <0.001
yes / 1409 / 65.3% / 360 / 53.3%
Mycophenolate mofetil / no / 901 / 43.8% / 325 / 46.6% / 0.201
yes / 1157 / 56.2% / 373 / 53.4%
Center experience (UCBT/year) / 40 / 16 / 58 / 18 / 9 / 34 / <0.001
<20 / 730 / 30.9% / 420 / 54.0% / <0.001
>=20 / 1632 / 69.1% / 358 / 46.0%


* Variables were compared using the Mann-Whitney and chi-square tests for continuous and nominal variables, respectively.

Umbilical cord blood transplantation (UCBT), performance status (PS), cytomegalovirus (CMV), acute myeloid leukemia (AML), acute lymphoblastic leukemia (ALL), acute leukemia (AL), complete remission (CR), cord blood (CB), human leucocyte antigen (HLA), total nucleated cells (TNCs), myeloablative conditioning (MAC), reduced intensity conditioning (RIC), anti-thymocyte globulin (ATG).

Table S5. Risk for 2-year OS and 2-year LFS categorized by the UCBT score

2-years OS / 2-years LFS
A random validation set / All 20 imputed validation sets / A random validation set / All 20 imputed validation sets
Score category / HR (95% CI) / p / HR (95% CI) / p / HR (95% CI) / p / HR (95% CI) / p
0-1 / reference / <0.001 / reference / reference / <0.001
2 / 1.81 (1.26-2.59) / 0.001 / 1.71 (1.17-2.49) / 0.006 / 1.57 (1.13-2.19) / 0.007 / 1.54 (1.09-2.17) / 0.015
3 / 2.06 (1.42-2.99) / <0.001 / 2.00 (1.36-2.96) / 0.001 / 1.86 (1.32-2.61) / <0.001 / 1.89 (1.32-2.72) / 0.001
4 / 2.63 (1.78-3.89) / <0.001 / 2.70 (1.79-4.08) / <0.001 / 2.17 (1.51-3.14) / <0.001 / 2.24 (1.52-3.31) / <0.001
5 / 3.92 (2.61-5.89) / <0.001 / 3.97 (2.57-6.15) / <0.001 / 3.54 (2.43-5.18) / <0.001 / 3.71 (2.41-5.70) / <0.001
6-8 / 6.73 (4.44-10.21) / <0.001 / 6.45 (4.19-9.93) / <0.001 / 5.11 (3.45-7.56) / <0.001 / 5.10 (3.37-7.72) / <0.001

Overall survival (OS), leukemia free survival (LFS), umbilical cord blood transplantation (UCBT), Hazard ratio (HR), confidence interval (CI)

Table S6. Probabilities for OS, LFS, NRM, and RI stratified by the UCBT score over 20 imputed datasets

Training Set
Score / 1-year OS / 1-year LFS / 1-year NRM / 1-year RI / 2-years OS / 2-years LFS / 2-years NRM / 2-years RI
0-1 / 72.38 (71.57-73.29) / 72.38 (71.57-73.29) / 14.71 (13.60-15.58) / 20.43 (19.85-20.76) / 66.04 (65.53-66.74) / 66.04 (65.53-66.74) / 15.71 (14.63-16.60) / 24.91 (24.29-25.33)
2 / 63.01 (62.11-63.57) / 63.01 (62.11-63.57) / 23.48 (23.05-25.15) / 18.33 (17.23-19.23) / 55.37 (54.70-56.56) / 55.37 (54.70-56.56) / 26.42 (25.85-27.88) / 22.56 (21.79-23.24)
3 / 54.45 (53.69-55.90) / 54.45 (53.69-55.90) / 30.32 (28.36-31.58) / 19.30 (18.19-20.25) / 45.46 (44.59-46.83) / 45.46 (44.59-46.83) / 34.01 (32.41-35.45) / 23.70 (22.36-24.47)
4 / 45.88 (44.99-46.84) / 45.88 (44.99-46.84) / 33.55 (32.53-35.83) / 24.93 (23.85-25.87) / 37.23 (35.82-38.03) / 37.23 (35.82-38.03) / 37.14 (35.79-38.93) / 28.27 (27.19-29.73)
5 / 34.63 (30.70-36.97) / 34.63 (30.70-36.97) / 34.96 (31.28-38.23) / 34.84 (33.11-38.48) / 26.48 (24.04-27.84) / 26.48 (24.04-27.84) / 36.89 (33.27-39.45) / 39.71 (36.81-42.35)
6-8 / 23.20 (21.37-24.46) / 23.20 (21.37-24.46) / 43.02 (40.97-44.83) / 36.08 (33.99-38.61) / 14.54 (12.74-15.53) / 14.54 (12.74-15.53) / 44.88 (42.29-46.12) / 40.13 (37.73-41.91)
Validation Set
Score / 1-year OS / 1-year LFS / 1-year NRM / 1-year RI / 2-years OS / 2-years LFS / 2-years NRM / 2-years RI
0-1 / 78.96 (77.75-79.9) / 73.91 (73.14-75.09) / 10.34 (8.97-10.70) / 16.19 (14.21-16.87) / 70.21 (68.89-70.71) / 64.76 (64.33-65.86) / 14.10 (12.58-15.51) / 20.78 (19.47-22.65)
2 / 66.32 (61.97-67.4) / 60.91 (56.51-62.20) / 19.91 (18.94-22.82) / 19.45 (18.15-22.44) / 56.43 (53.30-59.02) / 54.60 (50.83-57.04) / 23.64 (22.16-26.01) / 22.22 (20.08-24.32)
3 / 61.00 (56.65-66.18) / 54.00 (48.60-57.71) / 27.75 (24.22-29.25) / 19.17 (16.69-22.16) / 50.09 (44.56-54.72) / 46.53 (39.66-48.95) / 31.41 (28.91-33.53) / 22.78 (20.15-26.82)
4 / 49.16 (45.56-54.14) / 48.72 (44.22-53.74) / 31.75 (26.71-34.07) / 19.82 (16.81-23.06) / 40.20 (37.99-43.97) / 44.17 (40.63-48.21) / 33.11 (29.30-35.46) / 23.14 (19.54-25.61)
5 / 35.53 (29.29-38.84) / 34.32 (25.70-38.83) / 29.04 (24.38-35.08) / 36.73 (28.25-47.11) / 27.98 (23.05-31.60) / 28.45 (19.17-30.24) / 31.07 (26.27-37.02) / 42.04 (33.85-51.91)
6-8 / 21.89 (16.36-25.58) / 23.10 (19.20-27.05) / 54.21 (49.53-57.90) / 23.63 (17.99-25.78) / 14.78 (10.91-17.41) / 18.11 (14.40-22.30) / 56.45 (54.07-60.80) / 25.79 (20.22-28.42)

Values in parentheses are ranges.

Overall survival (OS), leukemia free survival (LFS), non-relapse related mortality (NRM), relapse incidence (RI), umbilical cord blood transplantation (UCBT).

Table S7: The population characteristics across the UCBT risk score categories over the validation set

CB score grouped
0-1 / 2 / 3 / 4 / 5 / 6-8
Count / Column N % / Count / Column N % / Count / Column N % / Count / Column N % / Count / Column N % / Count / Column N %
TNC (×10^7/kg) / >=3 / 695 / 99.4% / 743 / 96.9% / 588 / 90.5% / 361 / 79.5% / 237 / 78.5% / 186 / 69.4%
<3 / 4 / 0.6% / 24 / 3.1% / 62 / 9.5% / 93 / 20.5% / 65 / 21.5% / 82 / 30.6%
HLA mismatch / <=1 / 589 / 84.3% / 431 / 56.2% / 229 / 35.2% / 145 / 31.9% / 102 / 33.8% / 54 / 20.1%
>1 / 110 / 15.7% / 336 / 43.8% / 421 / 64.8% / 309 / 68.1% / 200 / 66.2% / 214 / 79.9%
Previous autograft / no / 697 / 99.7% / 752 / 98.0% / 624 / 96.0% / 413 / 91.0% / 254 / 84.1% / 204 / 76.1%
yes / 2 / 0.3% / 15 / 2.0% / 26 / 4.0% / 41 / 9.0% / 48 / 15.9% / 64 / 23.9%
Recipient CMV sero-status / negative / 551 / 78.8% / 299 / 39.0% / 177 / 27.2% / 116 / 25.6% / 75 / 24.8% / 31 / 11.6%
positive / 148 / 21.2% / 468 / 61.0% / 473 / 72.8% / 338 / 74.4% / 227 / 75.2% / 237 / 88.4%
Center experience (UCBT/year) / >=20 / 626 / 89.6% / 521 / 67.9% / 374 / 57.5% / 224 / 49.3% / 140 / 46.4% / 105 / 39.2%
<20 / 73 / 10.4% / 246 / 32.1% / 276 / 42.5% / 230 / 50.7% / 162 / 53.6% / 163 / 60.8%
Age & ATG / Other / 671 / 96.0% / 652 / 85.0% / 434 / 66.8% / 247 / 54.4% / 152 / 50.3% / 75 / 28.0%
>=18y & receiving ATG / 28 / 4.0% / 115 / 15.0% / 216 / 33.2% / 207 / 45.6% / 150 / 49.7% / 193 / 72.0%
Diagnosis & Disease st. / ALL&CR1, AML&CR1 / 546 / 78.1% / 441 / 57.5% / 266 / 40.9% / 105 / 23.1% / 18 / 6.0% / 1 / 0.4%
ALL&CR2, AML&CR2 / 153 / 21.9% / 322 / 42.0% / 334 / 51.4% / 214 / 47.1% / 88 / 29.1% / 21 / 7.8%
AML&other CR / 0 / 0.0% / 4 / 0.5% / 8 / 1.2% / 21 / 4.6% / 18 / 6.0% / 12 / 4.5%
ALL&other CR, ALL&advanced ds., AML&advanced ds. / 0 / 0.0% / 0 / 0.0% / 42 / 6.5% / 114 / 25.1% / 178 / 58.9% / 234 / 87.3%

Umbilical cord blood transplantation (UCBT), cytomegalovirus (CMV), human leucocyte antigen (HLA), number of total nucleated cells cryopreserved (TNC), anti-thymocyte globulin (ATG), status (st.), acute lymphoblastic leukemia (ALL), acute myeloid leukemia (AML), complete remission (CR).


Figure S1. Estimation of predictor importance using the Random Survival Forest minimal depth. Low minimal depth indicates high importance. Error bars indicate +/- 2 standard deviations over the imputed derivation data sets.

Number of total nucleated cells cryopreserved (TNC), umbilical cord blood transplantation (UCBT), cytomegalovirus serostatus (CMV), human leucocyte antigen (HLA), antithymocyte globulin administration (ATG), mycophenolate mofetil (MMF), compatibility (comp.), female donor to male recipient (F to M).

Figure S2: Interaction analysis using conditional plots: partial dependence coplots of (A) age conditioned on ATG administration and (B) disease status conditioned on diagnosis. Point estimates are shown with a loess smooth to indicate the trend within each group. Boxplots indicate the distribution of predicted survival for each predictor.

Antithymocyte globulin (ATG), status (st.), advanced (adv.), acute lymphoblastic leukemia (ALL), acute myeloid leukemia (AML), overall survival (OS).

Figure S3: A calibration plot of the risk scores between predicted and observed events (death) at two years, according to pentiles.

1. Ishwaran H, Kogalur UB, Blackstone EH, Lauer MS. Random survival forests. The Annals of Applied Statistics 2008;2(3):841-60.

2. Ehrlinger J, Rajeswaran J, Blackstone EH. ggRandomForests: Exploring Random Forest Survival. 2015.

3. Hsich E, Gorodeski EZ, Blackstone EH, Ishwaran H, Lauer MS. Identifying important risk factors for survival in patient with systolic heart failure using random survival forests. Circulation: Cardiovascular Quality and Outcomes 2011;4(1):39-45.

4. Ishwaran H, Kogalur UB, Gorodeski EZ, Minn AJ, Lauer MS. High-dimensional variable selection for survival data. Journal of the American Statistical Association 2010;105(489):205-17.
