Figure S1 - Sequencing analysis of two mixed barcodes of the second prototype (AGY) yielded quantitative measurements. Two plasmid vectors containing known barcode sequences were mixed at defined proportions and diluted in gDNA. (A) Sequencing analysis of the barcode revealed the expected quantity of each plasmid. The values from analysis of the electropherogram were plotted for clone pBS-AGY 1 (B) and pBS-AGY 2 (C), showing close agreement between the experimental and expected results.

Figure S2 - Quantitative results obtained from second prototype barcode (AGY) library sequencing. A known barcode clone, pBS-AGY 1, was mixed at defined proportions with the plasmid library. (A) Sequencing of the barcode revealed the expected quantity of the clone and the library. The values for the clone (B) and library (C) were plotted, revealing close agreement between the experimental and expected results.

Figure S3 - Electropherograms obtained from the mixture of the AGY library and a known clone (Figure S2). The proportions of 1:0 (100% Library : 0% pBS-AGY 1), 3:1 (75% Library : 25% pBS-AGY 1), 1:3 (25% Library : 75% pBS-AGY 1) and 0:1 (0% Library : 100% pBS-AGY 1) were sequenced in triplicate from a single PCR product, showing the reproducibility of the sequencing. In these charts, the change in peak area in relation to the quantity of the clone present in the mixture is also visible.

Figure S4 - Immunofluorescence to detect LMO2 expression in transduced cells. NIH 3T3 cells were transduced or not (control) with the lentiviral vector containing the barcode and carrying the LMO2 transgene. After transduction, cells were stained with polyclonal anti-LMO2 (Abcam - ab72841). We detected the eGFP protein expressed by the vector as well as anti-LMO2 staining, shown in red, in the nuclei of transduced cells. Although some red staining was present in the untransduced cells, we interpret it as non-specific since the LMO2 protein is exclusively nuclear.

Figure S5 - Flow cytometry for analysis of expression of IL2RG in HT1080 cells. (A) untransduced cells; (B) transduction with the lentiviral vector containing the barcode; (C) transduction with the lentiviral vector containing the barcode plus the IL2RG transgene. Cells were stained with PE-conjugated anti-CD132 antibody (clone #31134; R&D Systems). After labeling, the cells were permeabilized, leading to loss of eGFP, so only the antibody staining is shown.

Figure S6 - Transduction efficiency of HSC. The HSC used in the in vivo assay were analyzed by flow cytometry for eGFP expression. As shown, our transduction protocol yielded approximately 40% transduced cells.

Figure S7 - Hematologic analyses of transplant groups during long-term observation. Peripheral blood was analyzed at the indicated time points by manual counting of each of the indicated cell types. (A) White blood cells, (B) lymphocytes and (C) neutrophils. Animals were transplanted with hematopoietic stem cells that had not been transduced (Control), or that had been transduced with the vector encoding the library but no gene of interest (Library), or with the vector encoding LMO2 or IL2RG.

Supplementary Materials and Methods

Myelogram

Morphologic analyses of the bone marrow indicated the presence of two blast populations: one of smaller size, high ratio of nucleus/cytoplasm, absence of granules, loose nuclear chromatin, 1 to 2 nucleoli of small size; the other population of blasts presented large size, smaller ratio of nucleus/cytoplasm, intense cytoplasmic basophilia, denser nuclear chromatin and 1 to 3 large, dysplastic nucleoli and absence of cytoplasmic granulations.

Note that in mice, CD117 is a marker of primitive myeloid and lymphoid precursors. This marker disappears as B and T cells develop (1). Therefore, since we observed a reduction in CD34 (Figure 6D) without a significant difference in CD117, together with an increase in the percentage of immature B cells (Figure 6B), we infer that the blasts are of the B lymphocyte lineage.

Detailed description of calculations

Supplementary analysis A:

Barcode sequence quantification: Step by step analysis of two mixed plasmid vectors

The prototype barcode AGY and the experiment shown in Figure S1 were used to exemplify barcode quantification from a mixture of two plasmid vectors. Below we describe the steps for this analysis.

Step 1 - Identification of the region of interest from the barcode sequencing result. Only the positions containing different nucleotides between the two clones were analyzed (red selection).

Clone / 1st position / 2nd position / 3rd position / 4th position / 5th position
pBS-AGY1 / AGC / AGC / AGC / AGC / AGT
pBS-AGY2 / AGC / AGT / AGT / AGT / AGT
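
The selection made in Step 1 amounts to comparing the two known barcodes position by position and keeping only the positions where they differ. The following is a minimal Python sketch of that comparison, not the authors' software; the function name `informative_positions` is hypothetical, and the sequences are copied from the table above.

```python
# Known barcode sequences from the Step 1 table, one triplet per barcode position.
PBS_AGY1 = ["AGC", "AGC", "AGC", "AGC", "AGT"]
PBS_AGY2 = ["AGC", "AGT", "AGT", "AGT", "AGT"]


def informative_positions(clone_a, clone_b):
    """Return the 1-based barcode positions at which the two clones differ."""
    if len(clone_a) != len(clone_b):
        raise ValueError("barcodes must have the same number of positions")
    return [
        index
        for index, (a, b) in enumerate(zip(clone_a, clone_b), start=1)
        if a != b
    ]


positions = informative_positions(PBS_AGY1, PBS_AGY2)
print(positions)
```

For these two clones only the 2nd, 3rd and 4th positions are informative, which is why the tables in Steps 2 and 3 report only those positions.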

Step 2 - PolyPhred results: Since each variable position of the barcode may contain two different nucleotides, the PolyPhred values of both nucleotides were determined (i.e., for the prototype barcode AGY, the nucleotides C and T were analyzed at each position). The PolyPhred values are shown in the following table.

Ratio of plasmid pBS-AGY1:pBS-AGY2 / 2nd position C / 2nd position T / 3rd position C / 3rd position T / 4th position C / 4th position T
1:0 / 63174.08 / 0.00 / 63174.08 / 0.00 / 63174.08 / 0.00
9:1 / 64144.50 / 2248.21 / 64144.50 / 5086.12 / 64144.50 / 8919.13
3:1 / 65280.00 / 8104.50 / 65280.00 / 14502.78 / 65280.00 / 22394.00
1:1 / 30558.19 / 5542.26 / 28248.56 / 4552.57 / 27964.30 / 3958.76
1:3 / 3872.57 / 15560.19 / 6346.71 / 15462.32 / 8713.29 / 13798.66
1:9 / 2205.63 / 14151.00 / 1195.57 / 15880.17 / 5153.34 / 12315.95
0:1 / 0.00 / 24718.58 / 0.00 / 25954.51 / 0.00 / 31971.53

Step 3 - Data normalization: The PolyPhred values are then used to calculate the contribution of each base at the variable site. For each position, the value of each nucleotide is normalized (expressed as a percentage) so that the sum of both nucleotides is 100%. The values are shown in the following table.

Ratio of plasmid pBS-AGY1:pBS-AGY2 / 2nd position C (%) / 2nd position T (%) / 3rd position C (%) / 3rd position T (%) / 4th position C (%) / 4th position T (%)
1:0 / 100.00 / 0.00 / 100.00 / 0.00 / 100.00 / 0.00
9:1 / 96.61 / 3.39 / 92.65 / 7.35 / 87.79 / 12.21
3:1 / 88.96 / 11.04 / 81.82 / 18.18 / 74.46 / 25.54
1:1 / 70.18 / 29.82 / 72.19 / 27.81 / 74.31 / 25.69
1:3 / 19.93 / 80.07 / 29.10 / 70.90 / 38.71 / 61.29
1:9 / 13.48 / 86.52 / 7.00 / 93.00 / 29.50 / 70.50
0:1 / 0.00 / 100.00 / 0.00 / 100.00 / 0.00 / 100.00
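
The normalization in Step 3 divides each nucleotide's PolyPhred value by the sum of the two values at that position. The sketch below illustrates this with the 9:1 row of the Step 2 table; it is an illustration under that reading of the text, and the function name `normalize_position` is hypothetical.

```python
def normalize_position(c_value, t_value):
    """Express the C and T PolyPhred values at one position as percentages."""
    total = c_value + t_value
    if total == 0:
        raise ValueError("no signal for either nucleotide at this position")
    return 100.0 * c_value / total, 100.0 * t_value / total


# 9:1 mixture, raw PolyPhred values (C, T) at the 2nd, 3rd and 4th positions.
raw_9_to_1 = [
    (64144.50, 2248.21),
    (64144.50, 5086.12),
    (64144.50, 8919.13),
]

normalized = [normalize_position(c, t) for c, t in raw_9_to_1]
for c_pct, t_pct in normalized:
    print(f"C = {c_pct:.2f}%  T = {t_pct:.2f}%")
```

Rounded to two decimals, these values correspond to the 9:1 row of the normalized table (96.61/3.39, 92.65/7.35 and 87.79/12.21).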

Step 4 - Quantification results: The mean value of the analyzed nucleotides at the selected positions was calculated.

Ratio of plasmid pBS-AGY1:pBS-AGY2 / pBS-AGY1 (C mean, %) / pBS-AGY2 (T mean, %)
1:0 / 100.0 / 0.0
9:1 / 92.4 / 7.7
3:1 / 81.7 / 18.3
1:1 / 72.2 / 27.8
1:3 / 29.2 / 70.8
1:9 / 16.7 / 83.3
0:1 / 0.0 / 100.0
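
Step 4 averages the normalized percentages over the informative positions: the mean of C estimates pBS-AGY1 and the mean of T estimates pBS-AGY2. A minimal sketch using the 3:1 row of the Step 3 table follows; the function name `quantify_two_clones` is hypothetical.

```python
from statistics import mean

# Normalized values (C %, T %) at the 2nd, 3rd and 4th positions, 3:1 mixture.
normalized_3_to_1 = [(88.96, 11.04), (81.82, 18.18), (74.46, 25.54)]


def quantify_two_clones(normalized):
    """Return (pBS-AGY1 %, pBS-AGY2 %) as the mean C and mean T over positions."""
    c_mean = mean(c for c, _ in normalized)
    t_mean = mean(t for _, t in normalized)
    return c_mean, t_mean


c_mean, t_mean = quantify_two_clones(normalized_3_to_1)
print(f"pBS-AGY1 = {c_mean:.1f}%  pBS-AGY2 = {t_mean:.1f}%")
```

With these inputs the estimate is about 81.7% pBS-AGY1 and 18.3% pBS-AGY2, against an expected 75% and 25%.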

Supplementary analysis B:

Barcode sequence quantification: Step by step analysis of barcode library mixed with a known barcode sequence

The prototype barcode AGY and the experiment shown in Figure S2 were used to exemplify barcode quantification from a mixture of the barcode library and a known barcode sequence. Below we describe the steps for this analysis. Because normalization of the PolyPhred data was exemplified in Supplementary Analysis A, those steps are omitted from this explanation.

Step 1 - The library barcode was sequenced in triplicate to determine the expected mean value of each possible base at each variable position. This gives a baseline of the library barcode and is used to calculate the expected values for the contribution of the library to the mixture. Note that repeated sequencing of the same sample yielded consistent data.

Normalized library values (%)
Sequencing run / 1st position C / 1st position T / 2nd position C / 2nd position T / 3rd position C / 3rd position T / 4th position C / 4th position T / 5th position C / 5th position T
SEQ 1 / 55.07 / 44.93 / 54.99 / 45.01 / 53.31 / 46.69 / 38.77 / 61.23 / 37.95 / 62.05
SEQ 2 / 53.90 / 46.10 / 54.81 / 45.19 / 53.86 / 46.14 / 37.61 / 62.39 / 38.73 / 61.27
SEQ 3 / 55.08 / 44.92 / 53.95 / 46.05 / 50.50 / 49.50 / 39.66 / 60.34 / 40.06 / 59.94
Mean / 54.68 / 45.32 / 54.58 / 45.42 / 52.56 / 47.44 / 38.68 / 61.32 / 38.91 / 61.09
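
The library baseline in Step 1 is the column-wise mean of the three sequencing runs. The sketch below recomputes it from the table above; the function name `library_baseline` is hypothetical and the data layout (C and T interleaved by position) is an assumption made for this illustration.

```python
from statistics import mean

# Normalized library values (%) from the Step 1 table.
# Column order: C and T at positions 1 to 5 (C1, T1, C2, T2, ..., C5, T5).
replicates = {
    "SEQ 1": [55.07, 44.93, 54.99, 45.01, 53.31, 46.69, 38.77, 61.23, 37.95, 62.05],
    "SEQ 2": [53.90, 46.10, 54.81, 45.19, 53.86, 46.14, 37.61, 62.39, 38.73, 61.27],
    "SEQ 3": [55.08, 44.92, 53.95, 46.05, 50.50, 49.50, 39.66, 60.34, 40.06, 59.94],
}


def library_baseline(runs):
    """Average each column over the replicate sequencing runs."""
    return [mean(column) for column in zip(*runs)]


baseline = library_baseline(replicates.values())
print([round(value, 2) for value in baseline])
```

The result reproduces the Mean row of the table to two decimals.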

Step 2 - Calculate the expected values for the library in the mixture: Based on the mean normalized value of the library (table above), the expected contribution of the library in each mixture was calculated.

Expected normalized library values (%)
Library percentage / 1st position C / 1st position T / 2nd position C / 2nd position T / 3rd position C / 3rd position T / 4th position C / 4th position T / 5th position C / 5th position T
0% / 0.00 / 0.00 / 0.00 / 0.00 / 0.00 / 0.00 / 0.00 / 0.00 / 0.00 / 0.00
10% / 5.47 / 4.53 / 5.46 / 4.54 / 5.26 / 4.74 / 3.87 / 6.13 / 3.89 / 6.11
25% / 13.67 / 11.33 / 13.65 / 11.35 / 13.14 / 11.86 / 9.67 / 15.33 / 9.73 / 15.27
50% / 27.34 / 22.66 / 27.29 / 22.71 / 26.28 / 23.72 / 19.34 / 30.66 / 19.46 / 30.54
75% / 41.01 / 33.99 / 40.94 / 34.06 / 39.42 / 35.58 / 29.01 / 45.99 / 29.19 / 45.81
90% / 49.22 / 40.78 / 49.12 / 40.88 / 47.30 / 42.70 / 34.81 / 55.19 / 35.02 / 54.98
100% / 54.68 / 45.32 / 54.58 / 45.42 / 52.56 / 47.44 / 38.68 / 61.32 / 38.91 / 61.09
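
The expected values in Step 2 appear to be the library baseline scaled by the fraction of library in the mixture. A minimal sketch under that assumption is shown below for the 25% row; the function name `expected_library_values` is hypothetical, and because it starts from the rounded baseline, results can differ from the table in the last digit.

```python
# Mean normalized library values (%) from Step 1.
# Column order: C and T at positions 1 to 5 (C1, T1, C2, T2, ..., C5, T5).
LIBRARY_BASELINE = [54.68, 45.32, 54.58, 45.42, 52.56, 47.44, 38.68, 61.32, 38.91, 61.09]


def expected_library_values(library_percent, baseline=LIBRARY_BASELINE):
    """Scale the library baseline by the library's share of the mixture."""
    if not 0 <= library_percent <= 100:
        raise ValueError("library_percent must be between 0 and 100")
    fraction = library_percent / 100.0
    return [fraction * value for value in baseline]


expected_25 = expected_library_values(25)
print([round(value, 2) for value in expected_25])
```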

Step 3 - Sequencing data normalization: PolyPhred values of the experimental samples were obtained and normalized. The results are shown in the following table.

Normalized values from experimental samples (%)
Ratio of pBS-AGY1:Library / 1st position C / 1st position T / 2nd position C / 2nd position T / 3rd position C / 3rd position T / 4th position C / 4th position T / 5th position C / 5th position T
1:0 / 98.36 / 1.64 / 100.00 / 0.00 / 100.00 / 0.00 / 99.71 / 0.29 / 0.00 / 100.00
9:1 / 94.15 / 5.85 / 94.59 / 5.41 / 95.09 / 4.91 / 94.87 / 5.13 / 0.00 / 100.00
3:1 / 88.90 / 11.10 / 89.10 / 10.90 / 89.53 / 10.47 / 87.22 / 12.78 / 0.00 / 100.00
1:1 / 80.10 / 19.90 / 80.54 / 19.46 / 80.14 / 19.86 / 74.46 / 25.54 / 8.19 / 91.81
1:3 / 70.89 / 29.11 / 70.32 / 29.68 / 69.22 / 30.78 / 60.39 / 39.61 / 25.15 / 74.85
1:9 / 61.74 / 38.26 / 58.01 / 41.99 / 59.45 / 40.55 / 51.51 / 48.49 / 34.89 / 65.11
0:1 / 55.07 / 44.93 / 54.99 / 45.01 / 53.31 / 46.69 / 38.77 / 61.23 / 37.95 / 62.05

Step 4 - Calculate the mean value of pertinent bases in the barcode: Since the pBS-AGY1 clone contributes a single base at each variable site (C at the 1st to 4th positions and T at the 5th position), the mean value for these bases was calculated in each mixture. This value represents the total contribution of both the clone and the library. The peak area specific for the clone can be determined if we assume that the contribution from the library is as determined in step 2. In preparation for this, the mean value for the relevant base as determined by sequencing was calculated (‘sample mean values’ below, example highlighted in red below and in step 3) and the expected mean value of the library for the relevant base was calculated from step 2 (‘expected mean values’ below, example highlighted in green below and in step 2).

Ratio / Sample mean values / Expected mean values of library
1:0 / 99.61 / 0.00
9:1 / 95.74 / 5.23
3:1 / 90.95 / 13.08
1:1 / 81.41 / 26.16
1:3 / 69.13 / 39.24
1:9 / 59.16 / 47.09
0:1 / 52.84 / 52.32
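
Both columns of the Step 4 table can be recomputed by averaging, over the five positions, the value of the base carried by pBS-AGY1 (C at the 1st to 4th positions, T at the 5th). The sketch below does this for the 3:1 mixture, using the 3:1 row from Step 3 and the 25% library row from Step 2; the function name `clone_base_mean` and the dictionary layout are hypothetical.

```python
from statistics import mean

# Base carried by pBS-AGY1 at positions 1 to 5 (AGC AGC AGC AGC AGT).
CLONE_BASES = ["C", "C", "C", "C", "T"]


def clone_base_mean(values_by_position, clone_bases=CLONE_BASES):
    """Average the value of the clone-specific base over all positions."""
    if len(values_by_position) != len(clone_bases):
        raise ValueError("one entry per barcode position is required")
    return mean(pos[base] for pos, base in zip(values_by_position, clone_bases))


# Step 3, 3:1 mixture (75% pBS-AGY1, 25% library): normalized sample values.
sample_3_to_1 = [
    {"C": 88.90, "T": 11.10},
    {"C": 89.10, "T": 10.90},
    {"C": 89.53, "T": 10.47},
    {"C": 87.22, "T": 12.78},
    {"C": 0.00, "T": 100.00},
]

# Step 2, 25% library: expected library contribution.
library_25 = [
    {"C": 13.67, "T": 11.33},
    {"C": 13.65, "T": 11.35},
    {"C": 13.14, "T": 11.86},
    {"C": 9.67, "T": 15.33},
    {"C": 9.73, "T": 15.27},
]

sample_mean = clone_base_mean(sample_3_to_1)
expected_library_mean = clone_base_mean(library_25)
print(f"sample mean = {sample_mean:.2f}, expected library mean = {expected_library_mean:.2f}")
```

These inputs give 90.95 and 13.08, the values listed for the 3:1 ratio in the table above.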

Step 5 - Quantification of pertinent bases in the barcode: To obtain the final result for the quantification of the pBS-AGY1 clone in the mixture, the ‘expected mean values of library’ are subtracted from the ‘sample mean values’ (above) and the result is shown in the table below (‘pBS-AGY1’). The contribution of the library is calculated by subtracting the ‘pBS-AGY1’ value from 100.

Ratio of pBS-AGY1: Library / pBS-AGY 1 / Library
1:0 / 99.61 / 0.39
9:1 / 90.51 / 9.49
3:1 / 77.87 / 22.13
1:1 / 55.25 / 44.75
1:3 / 29.90 / 70.10
1:9 / 12.08 / 87.92
0:1 / 0.52 / 99.48
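
The subtraction described in Step 5 can be sketched as follows, using the rounded values from the Step 4 table. The function name `quantify_clone` is hypothetical. Because the inputs are already rounded to two decimals, some results differ from the table by 0.01 (for example 29.89 rather than 29.90 for the 1:3 mixture).

```python
# (sample mean, expected library mean) for each pBS-AGY1:library ratio, from Step 4.
STEP4_VALUES = {
    "1:0": (99.61, 0.00),
    "9:1": (95.74, 5.23),
    "3:1": (90.95, 13.08),
    "1:1": (81.41, 26.16),
    "1:3": (69.13, 39.24),
    "1:9": (59.16, 47.09),
    "0:1": (52.84, 52.32),
}


def quantify_clone(sample_mean, expected_library_mean):
    """Return (clone %, library %) for one mixture."""
    clone = sample_mean - expected_library_mean
    library = 100.0 - clone
    return clone, library


results = {ratio: quantify_clone(*values) for ratio, values in STEP4_VALUES.items()}
for ratio, (clone, library) in results.items():
    print(f"{ratio}: pBS-AGY1 = {clone:.2f}%, library = {library:.2f}%")
```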

Supplementary analysis C:

Temporal variance analysis

The temporal variance is calculated by comparing successive time points. For this analysis, the PolyPhred normalized data were obtained as exemplified in Supplementary Analysis A, and only one nucleotide at each barcode position was analyzed (the value of the other possible nucleotide is, by definition, complementary). There is no need to correct for the contribution of the specific clone versus the library since we are only interested in the change observed over time. The absolute value of the difference between the normalized values of successive time points is determined for the chosen nucleotide at each variable position in the barcode. The final result for the temporal variance is the average of these values.

For example, the table below shows T1 and T2 from the 3:1 mixture (known barcode:library) of the tissue culture-based assay (Figure 4 in the main text). In this example the temporal variance between T1 and T2, i.e. the mean of the ten absolute differences, is 2.28.

Normalized values (%)
Time point / 1st position / 2nd position / 3rd position / 4th position / 5th position / 6th position / 7th position / 8th position / 9th position / 10th position
T1 / 31.46 / 44.40 / 38.83 / 46.23 / 31.26 / 16.24 / 67.89 / 72.15 / 65.53 / 73.60
T2 / 30.04 / 50.11 / 36.10 / 50.99 / 34.79 / 15.74 / 67.94 / 72.96 / 63.13 / 74.46
|T1-T2| / 1.42 / 5.71 / 2.73 / 4.76 / 3.53 / 0.50 / 0.05 / 0.81 / 2.40 / 0.86
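
The temporal variance calculation described above reduces to a mean of absolute differences. The following minimal sketch reproduces the worked example with the T1 and T2 values from the table; it is an illustration, not the authors' software, and the function name `temporal_variance` is hypothetical.

```python
from statistics import mean

# Normalized values (%) of the chosen nucleotide at the 10 variable positions.
T1 = [31.46, 44.40, 38.83, 46.23, 31.26, 16.24, 67.89, 72.15, 65.53, 73.60]
T2 = [30.04, 50.11, 36.10, 50.99, 34.79, 15.74, 67.94, 72.96, 63.13, 74.46]


def temporal_variance(earlier, later):
    """Mean absolute difference between two successive time points."""
    if len(earlier) != len(later):
        raise ValueError("both time points must cover the same positions")
    return mean(abs(a - b) for a, b in zip(earlier, later))


tv = temporal_variance(T1, T2)
print(f"{tv:.2f}")  # prints 2.28
```

For a longer time course, the same function could be applied to each pair of consecutive time points in turn.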

Software for temporal variance analysis

Software was developed to facilitate analyses. The software searches for the barcode sequence in the PolyPhred files, normalizes the nucleotide values of the variable positions and calculates the temporal variance. The software can be obtained by contacting DBZ or BES.

Reference

1 Bhandoola, A and Sambandam, A. (2006). From stem cell to T cell: one route or many? Nat Rev Immunol 6: 117-126.