2007-03-06
Some examples and exercises for the PhenePlate system
Exercise 1. Identification of types and diversity index
Exercise 2a: Practical application in Clustering and creation of Dendrogram
Exercise 2b. How to interpret difficult typing data
Exercise 3a. Some basics of the PhenePlate software
Exercise 3b. Conversion of plate images to PhP data and analysis of data
Exercise 4. Epidemiological data – E. coli infections
Exercise 5. Bacterial transmission in hospital wards
Exercise 6. Species identification using a reference system.
Exercise 7a. Population statistics - Studies of intestinal bacterial populations
Exercise 7b. Population statistics – water pollution
Exercise 8. Delete unnecessary tests.
Exercise 9. Clustering of data that should not be clustered
Exercise 10. Saving of important isolates representing common PhP types
Example on how to write a protocol
Exercise 1. Identification of types and diversity index
Which lanes are identical?
Identical lanes are:
Single lanes are:
Name the common types (those found more than once) C1, C2, C3, etc., and the single types (found only once) S1, S2, etc. Assign each lane to a type, and complete the list below:
Lane / Type
1 / 1
2
3
4
5
6
7
8
9
10
11
12
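The diversity can be calculated as Simpson's diversity index (Di). The formula is:
Di = 1 - [1 / (N(N - 1))] × Σ nj(nj - 1)
where N is the total number of lanes (isolates) typed, and nj is the number of lanes belonging to type j.
Calculate Simpson's diversity index from the data above.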
Di = ……………..
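To check a hand calculation, Simpson's index can be computed directly from the list of type labels. The Python sketch below assumes the standard Hunter-Gaston form of Simpson's index of diversity; the function name `simpson_di` and the six lane types in the example are hypothetical and are not the answer to the exercise.

```python
from collections import Counter

def simpson_di(types):
    """Simpson's index of diversity: Di = 1 - sum(nj*(nj-1)) / (N*(N-1)).

    `types` holds one type label per lane/isolate (e.g. "C1", "S1").
    Di is 0 when all isolates share one type and 1 when every isolate is unique.
    """
    n_total = len(types)
    if n_total < 2:
        raise ValueError("at least two isolates are needed")
    # nj*(nj-1) summed over all types = ordered pairs of isolates of the same type
    same_type_pairs = sum(n * (n - 1) for n in Counter(types).values())
    return 1 - same_type_pairs / (n_total * (n_total - 1))

# Hypothetical assignment of six lanes (not the lanes in the exercise)
lanes = ["C1", "C1", "C2", "C2", "C2", "S1"]
print(round(simpson_di(lanes), 3))  # 0.733
```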
Exercise 2a: Practical application in Clustering and creation of Dendrogram
How to create a dendrogram
Look at these lanes
A B C D E
Similarity matrix
Table I. Number of identical bands
     A     B     C     D     E
A    -
B    7/8   -
C    6/8   6/7   -
D    5/8   5/7   5/6   -
E    4/8   4/7   4/6   4/5   -

Table II. % identical bands
     A    B    C    D    E
A    -
B    88   -
C    75   86   -
D    63   71   83   -
E    50   57   67   80   -
III. The next step is to sort all pairs in descending similarity order (pick values from matrix II):
Similarity   Pairs
88           A-B
86           B-C
83           C-D
80           D-E
Etc.

IV. The last step is to group the lanes at the highest similarity level they show to any other lane:
Similarity   Lanes
88           (A, B)
86           (A, B), (B, C) = A, B, C
83           (A, B, C), (C, D) = A, B, C, D
80           (A, B, C, D), (D, E) = A, B, C, D, E

Since all lanes are in the same group at 80%, the grouping procedure stops here.
Now the dendrogram can be drawn, using the order in which the lanes were sorted in Table IV.
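The grouping procedure in Tables III and IV is single-linkage clustering. The minimal Python sketch below reproduces it from the Table II percentages: pairs are walked from most to least similar, and the groups of the two lanes are merged until one group remains. The function and variable names are illustrative only and are not part of PHPWIN.

```python
# % identical bands from Table II, keyed by lane pair
similarity = {
    ("A", "B"): 88, ("A", "C"): 75, ("A", "D"): 63, ("A", "E"): 50,
    ("B", "C"): 86, ("B", "D"): 71, ("B", "E"): 57,
    ("C", "D"): 83, ("C", "E"): 67,
    ("D", "E"): 80,
}

def single_linkage(lanes, sim):
    """Return (similarity level, merged group) for each merge, as in Table IV."""
    group_of = {lane: frozenset([lane]) for lane in lanes}
    steps = []
    # Table III: all pairs in descending similarity order
    for a, b in sorted(sim, key=sim.get, reverse=True):
        ga, gb = group_of[a], group_of[b]
        if ga == gb:
            continue  # already grouped at a higher similarity level
        merged = ga | gb
        for lane in merged:
            group_of[lane] = merged
        steps.append((sim[(a, b)], sorted(merged)))
        if len(merged) == len(lanes):
            break  # all lanes are in one group: the procedure stops
    return steps

for level, group in single_linkage("ABCDE", similarity):
    print(level, ", ".join(group))
```

Each printed line corresponds to one row of Table IV, and the printed levels are the heights at which the branches join in the dendrogram.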
Calculate the co-phenetic correlation from the data above
How?
Use the similarity matrix below (Table II). From the dendrogram, prepare a new similarity matrix that gives, for each pair of lanes, the similarity level at which the two lanes are joined in the dendrogram.
Table II (true similarity matrix)
     A    B    C    D    E
A    -
B    88   -
C    75   86   -
D    63   71   83   -
E    50   57   67   80   -

Dendrogram (clustered similarities)
     A    B    C    D    E
A    -
B    88   -
C    86   86   -
D    83   83   83   -
E    80   80   80   80   -
Load Microsoft Excel and calculate the correlation coefficient between the two matrices. (If you are reading this document on a computer, you can copy the table below directly into Excel.)
Pair         A-B  A-C  A-D  A-E  B-C  B-D  B-E  C-D  C-E  D-E
Table II     88   75   63   50   86   71   57   83   67   80
Dendrogram   88   86   83   80   86   83   80   83   80   80
Correlation=
A high correlation coefficient (co-phenetic correlation > 0.90) indicates a dendrogram that reflects the original data well. When values lower than 0.80 are obtained, the clustering result for individual samples should be checked carefully; however, the dendrogram may still give a valuable picture of the population structure among the samples.
Use the same data as above, but remove lane C, and present a dendrogram.
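The same calculation can also be done outside Excel. This Python sketch computes the Pearson correlation coefficient (the value Excel's CORREL function returns) between the paired Table II and dendrogram similarities listed above, in the same pair order. The helper name `pearson` is illustrative.

```python
from math import sqrt

# Pair order: A-B, A-C, A-D, A-E, B-C, B-D, B-E, C-D, C-E, D-E
true_sim = [88, 75, 63, 50, 86, 71, 57, 83, 67, 80]     # Table II
dendro_sim = [88, 86, 83, 80, 86, 83, 80, 83, 80, 80]   # read from the dendrogram

def pearson(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

print(f"co-phenetic correlation = {pearson(true_sim, dendro_sim):.2f}")
```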
A B C D E
Similarity matrix
Table I. Number of identical bands
     A    B    D    E
A    -
B         -
D              -
E                   -

Table II. % identical bands
     A    B    D    E
A    -
B         -
D              -
E                   -

III. The next step is to sort all pairs in descending similarity order (pick values from matrix II):
Similarity   Pairs

IV. The last step is to group the lanes at the highest similarity level they show to any other lane:
Similarity   Lanes

Draw the dendrogram, and calculate the co-phenetic correlation.
Exercise 2b. How to interpret difficult typing data
One basic concept in bacterial fingerprinting using PFGE is the Tenover criteria. They state that when the banding patterns of two bacteria differ by two bands or fewer, the bacteria are regarded as belonging to the same clone, whereas if they differ by more bands they belong to different clones. According to these criteria, which of the lanes below represent bacteria that belong to the same clone?
If this does not seem difficult, see what happens if a new lane (D) is added:
Even though A and G differ by 6 bands, all lanes will belong to the same group when D is introduced as a link between the two groups.
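The chaining effect can be reproduced with a short Python sketch. The band patterns below are hypothetical (the gel image is not reproduced here), chosen only so that A and G differ by 6 bands and D bridges the two groups. "Band difference" is simplified to the number of bands present in only one of the two lanes. Linking every pair that meets the criterion, and then following those links, is single linkage.

```python
# Hypothetical band positions for lanes A-G
lanes = {
    "A": {1, 2, 3, 4, 5, 6, 7, 8},
    "B": {1, 2, 3, 4, 5, 6, 7},
    "C": {1, 2, 3, 4, 5, 6, 7, 9},
    "D": {1, 2, 3, 4, 5, 6, 9, 10},
    "E": {1, 2, 3, 4, 5, 9, 10, 11},
    "F": {1, 2, 3, 4, 5, 9, 10, 11, 12},
    "G": {1, 2, 3, 4, 5, 10, 11, 12},
}

def band_difference(x, y):
    """Number of bands present in one lane but not in the other."""
    return len(x ^ y)

def clones(names, max_diff=2):
    """Group lanes: any two lanes within max_diff bands are linked, transitively."""
    groups = []
    for name in names:
        linked = [g for g in groups
                  if any(band_difference(lanes[name], lanes[m]) <= max_diff for m in g)]
        merged = {name}.union(*linked)  # joins every group the new lane links to
        groups = [g for g in groups if g not in linked] + [merged]
    return sorted(sorted(g) for g in groups)

print(clones("ABCEFG"))   # [['A', 'B', 'C'], ['E', 'F', 'G']]
print(clones("ABCDEFG"))  # [['A', 'B', 'C', 'D', 'E', 'F', 'G']]
```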
Does clustering solve this problem?
There are two main methods for clustering: single linkage (as shown above) and average linkage (UPGMA). Which is better for this kind of data (a) when all 7 lanes are used, and (b) when lane D is excluded?
Exercise 3a. Some basics of the PhenePlate software
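To see how the choice of linkage changes the merge levels, the Python sketch below clusters the Table II similarities from Exercise 2a both ways. It is a naive illustration written for clarity, not PHPWIN's implementation; the names `SIM` and `cluster` are hypothetical. With single linkage, the similarity between two groups is the highest similarity between any of their members; with UPGMA, it is the mean over all member pairs.

```python
from itertools import product

# % identical bands from Table II of Exercise 2a
SIM = {
    frozenset("AB"): 88, frozenset("AC"): 75, frozenset("AD"): 63, frozenset("AE"): 50,
    frozenset("BC"): 86, frozenset("BD"): 71, frozenset("BE"): 57,
    frozenset("CD"): 83, frozenset("CE"): 67, frozenset("DE"): 80,
}

def cluster(lanes, linkage):
    """Agglomerative clustering; returns the similarity level of each merge."""
    groups = [frozenset(lane) for lane in lanes]
    levels = []
    while len(groups) > 1:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                pair_sims = [SIM[frozenset((a, b))]
                             for a, b in product(groups[i], groups[j])]
                if linkage == "single":
                    s = max(pair_sims)                    # closest pair decides
                else:
                    s = sum(pair_sims) / len(pair_sims)   # UPGMA: mean of all pairs
                if best is None or s > best[0]:
                    best = (s, i, j)
        s, i, j = best
        merged = groups[i] | groups[j]
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
        levels.append(s)
    return levels

print("single:", cluster("ABCDE", "single"))
print("UPGMA: ", cluster("ABCDE", "average"))
```

Because single linkage needs only one close pair to join two groups, one intermediate lane can chain groups together; UPGMA averages over all pairs and is less sensitive to such single links.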
The following table lists the test results from 10 isolates (A1-A10), each of which was exposed to 12 tests (T1-T12). In the table, 2 means a positive result, 1 means +/-, and 0 means a negative result.
*12*  T1  T2  T3  T4  T5  T6  T7  T8  T9  T10 T11 T12
A1    2   2   2   2   2   2   0   0   0   0   1   1
A2    0   0   1   2   0   0   2   2   0   1   2   2
A3    2   0   2   0   2   2   2   0   0   0   2   2
A4    2   2   2   1   2   2   0   0   0   0   2   0
A5    2   2   0   0   1   1   2   2   0   2   0   0
A6    0   0   1   2   0   0   2   2   0   1   2   2
A7    2   2   1   1   2   2   2   0   0   0   0   0
A8    2   2   2   2   2   2   0   0   0   0   1   1
A9    2   0   2   0   2   0   2   0   2   0   2   2
A10   0   0   0   0   0   0   2   2   0   0   2   2
First, just by looking at the data, try to see which isolates may be identical and which ones are similar.
Identical isolates:
Similar isolates:
Single (unique) isolates:
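Your visual impressions can be cross-checked with a rough calculation. The Python sketch below ranks all isolate pairs by similarity, assuming that similarity is expressed as a Pearson correlation coefficient between test profiles (later exercises refer to correlation coefficients). PHPWIN may process the data differently, so its values need not match these exactly.

```python
from itertools import combinations
from math import sqrt

# Test results from the table above (2 = positive, 1 = +/-, 0 = negative)
data = {
    "A1":  [2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 1, 1],
    "A2":  [0, 0, 1, 2, 0, 0, 2, 2, 0, 1, 2, 2],
    "A3":  [2, 0, 2, 0, 2, 2, 2, 0, 0, 0, 2, 2],
    "A4":  [2, 2, 2, 1, 2, 2, 0, 0, 0, 0, 2, 0],
    "A5":  [2, 2, 0, 0, 1, 1, 2, 2, 0, 2, 0, 0],
    "A6":  [0, 0, 1, 2, 0, 0, 2, 2, 0, 1, 2, 2],
    "A7":  [2, 2, 1, 1, 2, 2, 2, 0, 0, 0, 0, 0],
    "A8":  [2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 1, 1],
    "A9":  [2, 0, 2, 0, 2, 0, 2, 0, 2, 0, 2, 2],
    "A10": [0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 2, 2],
}

def correlation(x, y):
    """Pearson correlation between two test profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# All pairs, from most to least similar
pairs = sorted(combinations(data, 2),
               key=lambda p: correlation(data[p[0]], data[p[1]]), reverse=True)
for a, b in pairs[:5]:
    print(f"{a}-{b}: {correlation(data[a], data[b]):.2f}")
```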
Then analyse the same data with the PhenePlate software. The data can be entered into the software in several ways:
Method 1. Enter the test data in Excel (the *12* in the header is very important!). Load the PHPWIN software, go to DATA ANALYSIS, ANALYSIS OF OTHER DATA, LOAD DATA FROM CLIPBOARD, then go to Excel and copy the data to the clipboard.
Method 2. Select the table above (including the header), and copy it to the clipboard as above.
Method 3. The data can also be found in the file EXMANUAL.TXT in the folder EXAMPLES under PHPWIN.
Load the PHPWIN software, go to DATA ANALYSIS, ANALYSIS OF OTHER DATA, LOAD DATA FROM FILE, and load the file EXMANUAL.TXT.
Go to ANALYSE DATA, CALCULATION OF SIMILARITIES FOR CLUSTERING, select only valid samples, and then go to PRESENTATION OF DENDROGRAM. How does the resulting dendrogram fit with your visual observations?
Exercise 3b. Conversion of plate images to PhP data and analysis of data
The PhenePlate software can transform images of plates, generated by flatbed scanners or digital cameras, into absorbance data that can be used for further calculations. Thus, an inexpensive flatbed scanner can replace an expensive microplate reader.
First, try to determine by visual inspection which isolates in these two plates are identical (i.e. belong to the same type). The plates contain 16 coliform isolates (from water samples), one isolate in each row, and each isolate was exposed to 11 tests (nos. 2-12). Well no. 1 in each row was used to prepare the inoculum and is therefore not used for the identification. Complete the last column (Type) for all isolates.
Isolate / Test no. 2 / 3 / 4 / 5 / 6 / 7 / 8 / 9 / 10 / 11 / 12 / Isolate / Type
A1 / A1 / 1
A2 / A2 / 1
A3 / A3 / 2
A4 / A4
A5 / A5
A6 / A6
A7 / A7
A8 / A8
B1 / B1
B2 / B2
B3 / B3
B4 / B4
B5 / B5
B6 / B6
B7 / B7
B8 / B8
Then analyse the same images with the Phene Plate software:
Load the PHPWIN software and go to CREATE PhP DATA, CREATE PhP DATA FROM ABSORBANCE DATA AND FROM SCANNED IMAGES, CONVERT PLATE IMAGE TO PHP DATA.
Click FROM CLIPBOARD. Then go to Microsoft Word, copy the first plate above to the clipboard, go back to PhPWIN, and click load. The plate image appears in PhPWIN.
Follow the instructions in the yellow frame in the top right corner of the PhPWIN frame.
When you have clicked on the center of the last well (H12), you will need to answer some questions about the data. Select ‘First reading’, type the name of the file in which to store the data, and type 2 for two plates. Select plate type no. 07R (PhP-RE plate), and click OK.
Now a frame with the absorbance data will appear. To save the data, click ‘Save data’.
Now the plate has been converted. Click load, using new co-ordinates, to load the next plate, and convert it in the same way. Then click Exit.
Load the PHPWIN software, go to DATA ANALYSIS, ANALYSIS OF PhP DATA, LOAD DATA FROM FILE, and load the file you just created.
Go to ANALYSE DATA, CALCULATION OF SIMILARITIES FOR CLUSTERING, and then go to PRESENTATION OF DENDROGRAM. When the dendrogram has been created, click LIST OF PHP TYPES. How do the list and the dendrogram fit with your visual observations? (The results can also be found in the file PHPEXEMPEL.XLS.)
Exercise 4. Epidemiological data – E. coli infections
Calculation and printing of test results for a material of 22 E. coli isolates collected from seven patients (P1-P7) with repeated acute pyelonephritis. The isolates were assayed with PhP-EC plates. The aim is to find out whether the different infections in the same patient were caused by the same E. coli strain. Another aim is to find out whether the same E. coli type caused infections in several patients; such a type could possibly represent a pathogenic clone.
The data are stored in a file named EXEC. Load and print these data, calculate the correlations, and print a dendrogram, a list of phenotypes, and the names of the tested isolates. Enter the data in the protocol below, and calculate the number of patients with reinfections with the same strain and the number with new infections with other strains.
Date:          Plates: EC-15          File name: EXEC          Diskette: No. 5
Substrate: 0.1% peptone          Samples: E. coli from pyelonephritis
Incubation temp.: 37 °C          Reading no. 1: 7 h     2: 24 h     3: 48 h     4:
Plate / Sample / Name / Result, comments
1 / 1 / P1.A
2 / P1.B
3 / P2.A
4 / P2.B
2 / 5 / P2.C
6 / P2.D
7 / P3.A
8 / P3.B
3 / 9 / P4.A
10 / P4.B
11 / P5.A
12 / P5.B
4 / 13 / P5.C
14 / P6.A
15 / P6.B
16 / P6.C
5 / 17 / P6.D
18 / P6.E
19 / P7.A
20 / P7.B
6 / 21 / P7.C
22 / P7.D
23 / P7.E
24 / Neg control
No. of patients: 7          No. of infections: 23
No. of PhP types:
No. of patients with infections with only one strain:
No. of patients with infections with more than one strain:
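Once the PhP types have been read from the list of phenotypes, the protocol counts can be tallied mechanically. The Python sketch below assumes that isolate names follow the patient.infection pattern used above (P1.A, P2.C, ...) and that every single type has its own label; the function name and the type assignments in the example are hypothetical, not the EXEC results.

```python
from collections import defaultdict

def summarise(types_by_isolate):
    """Tally protocol counts from isolate names ("P2.C") and their PhP types."""
    types_per_patient = defaultdict(set)
    patients_per_type = defaultdict(set)
    for isolate, php_type in types_by_isolate.items():
        patient = isolate.split(".")[0]          # "P2.C" -> "P2"
        types_per_patient[patient].add(php_type)
        patients_per_type[php_type].add(patient)
    return {
        "PhP types": len(patients_per_type),
        "patients with one strain": sum(len(t) == 1 for t in types_per_patient.values()),
        "patients with more than one strain": sum(len(t) > 1 for t in types_per_patient.values()),
        # candidate pathogenic clones: the same type in several patients
        "types found in several patients": sorted(
            t for t, p in patients_per_type.items() if len(p) > 1),
    }

# Hypothetical types for three patients (negative control left out)
example = {
    "P1.A": "C1", "P1.B": "C1",
    "P2.A": "C2", "P2.B": "S1", "P2.C": "C2",
    "P3.A": "C1", "P3.B": "C1",
}
print(summarise(example))
```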
Exercise 5. Bacterial transmission in hospital wards
Analysis of two different collections of 30 Klebsiella isolates each, collected from infants in two different neonatal wards. Two Klebsiella isolates from each infant were assayed with PhP-48 plates. The aim was to investigate whether there had been any nosocomial spread of Klebsiella in these two wards.
The data are stored in the files EXKLEBA and EXKLEBB. Select both files (double-click on them in the file list, then click OK). Select ANALYSE DATA, CALCULATION OF SIMILARITIES FOR CLUSTERING, click SELECT ALL SAMPLES, and then PRESENTATION OF DENDROGRAM. In the dendrogram options, select DENDROGRAM SIZE: 2 PER PAGE. Click SELECT SAMPLES, and select all samples from ward A. Click READY and OK. Click PHP TYPES on the menu bar in the dendrogram, and write down the obtained PhP types in the lab protocol. Also write down the diversity index, which is a good measure of whether there has been any nosocomial spread in the ward (high diversity = random, normal distribution of PhP types; low diversity = dominance of certain types = possible nosocomial spread of these types). Go to CLUSTER NEW DATA, select the samples from ward B, and press OK to view the dendrogram for ward B.
Exercise 6. Species identification using a reference system.
Although the PhP system was not developed for species identification, it may be used as a screening system for species identification. To do so, isolates of known species must first be assayed, and a reference file containing their PhP data built. Data from unknown isolates can then be compared to the reference file, and the highest similarities are printed out. Unknown isolates that show high similarity (>0.80) to several reference strains of the same species can often be assigned to that species. However, the accuracy of this identification naturally depends on how the reference system was created.
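The comparison rule described above can be sketched in a few lines of Python. This sketch rests on assumptions: similarity is taken to be a Pearson correlation coefficient, "several" is interpreted as at least two reference strains (the hypothetical `min_hits` parameter), and the six-test profiles are invented. PHPWIN's COMPARISON TO REFERENCE DATA may apply different rules.

```python
from collections import Counter
from math import sqrt

def correlation(x, y):
    """Pearson correlation between two test profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def identify(unknown, references, threshold=0.80, min_hits=2):
    """Return (species or None, five highest (similarity, species) hits)."""
    hits = sorted(((correlation(unknown, profile), species)
                   for species, profile in references), reverse=True)
    # Count, per species, the reference strains above the threshold
    counts = Counter(species for r, species in hits if r > threshold)
    best = counts.most_common(1)
    if best and best[0][1] >= min_hits:
        return best[0][0], hits[:5]
    return None, hits[:5]

# Hypothetical six-test reference profiles (not taken from 48-PLATES.REF)
references = [
    ("Species X", [2, 2, 0, 0, 1, 2]),
    ("Species X", [2, 2, 0, 0, 2, 1]),
    ("Species Y", [0, 0, 2, 2, 1, 0]),
    ("Species Y", [0, 1, 2, 2, 2, 0]),
]
species, top = identify([2, 2, 0, 0, 2, 2], references)
print(species, [(round(r, 2), s) for r, s in top])
```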
The file EX48.ADD contains data from unknown isolates that have been assayed with the PhP 48 plates, and the file 48-PLATES.REF contains data from isolates of known species, which also have been assayed with PhP 48 plates.
First load the file EX48.ADD into the PhP software. Select ANALYSE DATA, and COMPARISON TO REFERENCE DATA from the main menu. Select LOAD REFERENCE FILE, and select the file 48-PLATES.REF as reference data.
Exercise 7a. Population statistics - Studies of intestinal bacterial populations
The Population Similarity Coefficient (Sp) denotes the proportion of identical types in two compared samples. For example, if the two samples are from the intestinal flora of the same individual on different sampling occasions, the Sp coefficient is a measure of the stability of that individual's intestinal microbial flora.
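The exact formula PHPWIN uses for Sp is not given in this text. As a hedged illustration of the idea of "the proportion of identical types in two samples", the Python sketch below uses a simpler stand-in: the proportional overlap of type frequencies (for each shared type, the smaller of its two proportions, summed). It is an assumption, not the Sp formula, and PHPWIN's values will generally differ; the sample data are hypothetical.

```python
from collections import Counter

def type_overlap(sample_x, sample_y):
    """Proportional overlap of PhP type frequencies between two samples.

    1.0 = identical type composition, 0.0 = no shared types.
    A stand-in for Sp, not PHPWIN's formula.
    """
    px, py = Counter(sample_x), Counter(sample_y)
    nx, ny = len(sample_x), len(sample_y)
    # Only types present in both samples contribute
    return sum(min(px[t] / nx, py[t] / ny) for t in px.keys() & py.keys())

# Hypothetical PhP types from two sampling occasions
week_1 = ["C1", "C1", "C2", "C3"]
week_2 = ["C1", "C2", "C2", "C4"]
print(type_overlap(week_1, week_2))  # 0.5
```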
The file EXPIG4.ADD contains data from the coliform flora in one pig. Six different segments of the intestine were analysed (jejunum distal to the pylorus (2d/1), jejunum oral to the ileum (2d/2), ileum (2d/3), caecum (2d/4), spiral colon (2d/5), and rectum (2d/6)), and 24 colonies were PhP typed from each sample.
Select the file, then select ANALYSE DATA, POPULATION STATISTICS from the PhP main menu. Select USE PRE-DEFINED and click START. When the diversity indices are shown, select CONTINUE to calculate the Sp coefficients.
Note the diversity in each segment. Use the Sp values to describe how well the E. coli flora in the rectum corresponds to the flora higher up in the intestine.
Segment / Di for pig 4 / Di for pig 6 / Sp coefficient to rectal flora for pig 4 / Sp coefficient to rectal flora for pig 6
jejunum distal to the pylorus (2d/1)
jejunum oral to the ileum (2d/2)
ileum (2d/3)
caecum (2d/4)
spiral colon (2d/5)
rectum (2d/6)
The file EXPIG6.ADD contains data from the E. coli flora in another pig. Calculate the same values for this file.
How well does the rectal flora reflect the floras of other intestinal segments?
Load the data from file EXPIG4.ADD together with the data from EXPIG6.ADD, and calculate Sp coefficients for all 12 samples. Then select Clustering.
Exercise 7b. Population statistics – water pollution
The file EXRS contains data from 150 coliform isolates from 8 water samples, assayed with PhP-RS plates. Sample P1 is from a polluted drinking water well, and samples S2-S8 are possible contamination sources. All samples contain several different kinds of bacteria. Select the file EXRS.ADD, then ANALYSE DATA, POPULATION STATISTICS from the PhP main menu. Which of the samples S2-S8 contains the highest proportion of isolates that are identical to those in sample P1, and is thus the likely contamination source?
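The question can be illustrated with a small Python calculation. The sketch assumes that all samples were clustered together, so that a PhP type label means the same thing in every sample; the type assignments are hypothetical (not the EXRS data), and PHPWIN's population statistics compute their own coefficients.

```python
def proportion_shared(target, source):
    """Fraction of isolates in `source` whose PhP type also occurs in `target`."""
    target_types = set(target)
    return sum(t in target_types for t in source) / len(source)

# Hypothetical PhP types per sample (not the EXRS data)
samples = {
    "P1": ["C1", "C1", "C2", "C3", "S1"],
    "S2": ["C4", "C5", "C5", "S2"],
    "S3": ["C1", "C1", "C2", "C6"],
    "S4": ["C2", "C7", "C7", "C7"],
}
scores = {name: proportion_shared(samples["P1"], types)
          for name, types in samples.items() if name != "P1"}
print(max(scores, key=scores.get), scores)
```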
Exercise 8. Delete unnecessary tests.
Normally, performing a large number of tests gives a typing system good discrimination and reproducibility. However, including many tests that give negative (or positive) results for all isolates will decrease the discrimination, and may also decrease the reproducibility of the typing. The file EXDEL contains eight isolates assayed with 48 tests. Calculate and cluster this file (when clustering, cluster from similarity 0 to 100, and use UPGMA). Then load the file EXDEL again, select DATA MANAGER, EDIT DATA, REMOVE TESTS, and remove all data for tests 25-48. View the data on the screen. Calculate and cluster the file EXDEL again in the same way as above.
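Why uninformative tests hurt can be shown with a small Python example. It assumes similarity is a Pearson correlation coefficient: tests that are negative for every isolate add identical values to all profiles, which pulls unrelated isolates toward each other. The two profiles are hypothetical, and `drop_constant_tests` is an illustrative helper, not a PHPWIN function.

```python
from math import sqrt

def correlation(x, y):
    """Pearson correlation between two test profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def drop_constant_tests(profiles):
    """Remove tests (columns) that give the same result for every isolate."""
    keep = [i for i, column in enumerate(zip(*profiles)) if len(set(column)) > 1]
    return [[row[i] for i in keep] for row in profiles]

# Two hypothetical isolates with opposite results on 4 informative tests...
x = [2, 0, 2, 0]
y = [0, 2, 0, 2]
# ...plus 24 tests that are negative for both
x_padded, y_padded = x + [0] * 24, y + [0] * 24

print(round(correlation(x, y), 3))                # -1.0
print(round(correlation(x_padded, y_padded), 3))  # -0.077
cleaned = drop_constant_tests([x_padded, y_padded])
print(round(correlation(*cleaned), 3))            # -1.0
```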
Exercise 9. Clustering of data that should not be clustered
The file EXBAD.ADD contains the following data:
( 8)*** DATA FORMING NO CLUSTER
SAMPLE 1
0 0 0 0 0 2 2 2
SAMPLE 2
0 0 0 0 1 2 2 2
SAMPLE 3
0 0 0 0 2 2 2 2
SAMPLE 4
0 0 0 1 2 2 2 2
SAMPLE 5
0 0 0 2 2 2 2 2
SAMPLE 6
0 0 1 2 2 2 2 2
As you can see, the samples are "chained", and the data do not form any real clusters. Calculate the similarities for these data, look at the correlation coefficients, cluster the data (using both Single Linkage and UPGMA), and look at the co-phenetic correlation.
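The chaining can be seen directly in the similarities. The Python sketch below prints the lower triangle of the correlation matrix for the EXBAD.ADD values listed above, assuming similarity is a Pearson correlation coefficient. Each sample is most similar to its neighbours in the chain, and similarity falls off steadily with distance along it, so no distinct groups appear.

```python
from math import sqrt

# EXBAD.ADD values as listed above (8 tests per sample)
samples = {
    1: [0, 0, 0, 0, 0, 2, 2, 2],
    2: [0, 0, 0, 0, 1, 2, 2, 2],
    3: [0, 0, 0, 0, 2, 2, 2, 2],
    4: [0, 0, 0, 1, 2, 2, 2, 2],
    5: [0, 0, 0, 2, 2, 2, 2, 2],
    6: [0, 0, 1, 2, 2, 2, 2, 2],
}

def correlation(x, y):
    """Pearson correlation between two test profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Lower triangle of the correlation matrix
for i in samples:
    row = [f"{correlation(samples[i], samples[j]):5.2f}" for j in samples if j < i]
    print(i, " ".join(row))
```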
A low co-phenetic correlation indicates that the tested samples do not form any real clusters, and such dendrograms should be interpreted with care. Low values may occur, for example, when samples that are very similar to each other are analysed, as would be the case when bacterial isolates from a single outbreak are studied.
Always be very careful when interpreting dendrograms, especially when clustering large numbers of samples! A dendrogram is a very rough representation of the data it was created from, and the lower the co-phenetic correlation, the less representative the dendrogram is of the data.
Exercise 10. Saving of important isolates representing common PhP types
Working with samples containing mixed bacterial populations often also involves saving isolates. Instead of saving all isolates (which would quickly fill a freezer), only one isolate representing each dominant phenotype in each sample can be selected.
Load PHPWIN and select the file EXRS.ADD. From that file, select the isolates from the first sample (P1, 32 isolates), calculate similarities, and cluster. Select LIST OF PHP TYPES WITH ALL INFORMATION. If you click on the list, the isolates will be sorted in the same order as in the dendrogram (click on the list again to sort them in the original order).
The first column contains an X if the isolate is the best representative of a common PhP type. Columns 2 and 3 contain the number and name of the isolate, and column 4 contains the PhP type to which the isolate was assigned by the clustering procedure. Si indicates a single (unique) type, i.e. the isolate was not identical to any other isolate.
Column 5 indicates the quality of the identification, from very good (***) to poor (-), and the next column gives the order in which the isolates are presented in the dendrogram.
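How PHPWIN decides which isolate is the best representative of a type is not described here. One common choice, sketched below in Python purely as an assumption, is the medoid: the member with the highest mean similarity to the other isolates of the same type. Isolates labelled Si are skipped because single types have no group to represent; the similarity values are hypothetical.

```python
from collections import defaultdict

def pick_representatives(types, similarity):
    """Return {type: isolate} for each common type, choosing the medoid."""
    members = defaultdict(list)
    for isolate, php_type in types.items():
        if php_type != "Si":  # single types have no group to represent
            members[php_type].append(isolate)
    reps = {}
    for php_type, group in members.items():
        if len(group) < 2:
            continue
        def mean_similarity(a):
            # mean similarity of isolate `a` to the other members of its type
            return sum(similarity(a, b) for b in group if b != a) / (len(group) - 1)
        reps[php_type] = max(group, key=mean_similarity)
    return reps

# Hypothetical similarities between three isolates of type C1
sims = {frozenset(("i1", "i2")): 0.95,
        frozenset(("i1", "i3")): 0.97,
        frozenset(("i2", "i3")): 0.90}
types = {"i1": "C1", "i2": "C1", "i3": "C1", "i4": "Si", "i5": "Si"}
print(pick_representatives(types, lambda a, b: sims[frozenset((a, b))]))  # {'C1': 'i1'}
```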