Teleassessment Data Analysis
10 subjects, no replications
AROM
Shoulder abduction angle (3 angles)
Shoulder external rotation (2 angles)
Knee flexion (2 angles)
Methods:
- co-lo with goni
- remote with CG reporting goni angle
- remote with PT zooming in and reading goni held by CG
- remote with PT using virtual goni on stored image (LY reporting)
- remote with PT using virtual goni on stored image (RI reporting)
- remote with PT using hand goni on screen on stored image (LY reporting)
- remote with PT using hand goni on screen on stored image (RI reporting)
Angles are rounded to the nearest integer, Units=degrees
No replications
Analysis: Paired t-test for significant differences.
Detailed methods: A known source of variation is that each angle and each subject is different. Thus, it makes sense to block on this nuisance factor. See the discussion of blocking in [1]:32-36, [2]:660-665, and [3]:chapt 4.
We have a one-factor study and are running a complete block design ([2]:661, [3]:127). It is not a randomized complete block design because the order of tests was not randomized for each subject; we do not expect this to make a difference here. In the language of [4], we have a repeated measures design (pp. 387-391), in which every subject (here, every block) is tested with every treatment. [4] has a clear explanation of the math and calculations.
The one-factor study has 7 levels: the 7 methods for measuring the angles. We are interested in whether the 7 methods give the same or different results. The F-test from a one-way ANOVA with blocking (block effect removed from the error term) tells us this.
To determine overall differences, there are 70 blocks (10 subjects × 7 angles). For specific joints, we can look at subsets of the data.
The null hypothesis is that the mean angle measurements of all 7 methods are equal.
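The blocked F-test described above can be sketched in a few lines. This is a minimal stdlib Python sketch, assuming a complete table with no missing cells (rows = blocks, columns = methods); the function name and the 3 × 3 example data are hypothetical, not taken from rom1.xls. The arithmetic is that of a two-way ANOVA without replication: the block sum of squares is removed from the error term before F is formed for the methods.

```python
from statistics import mean


def block_anova_f(table):
    """Return (F, df_treat, df_error) for treatments in a complete block design.

    table: list of rows; each row is one block, each column one treatment.
    """
    b = len(table)      # number of blocks (angle x subject)
    k = len(table[0])   # number of treatments (methods)
    grand = mean(x for row in table for x in row)
    block_means = [mean(row) for row in table]
    treat_means = [mean(row[j] for row in table) for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_treat = b * sum((m - grand) ** 2 for m in treat_means)
    ss_block = k * sum((m - grand) ** 2 for m in block_means)
    # Residual variation after removing both the block and the treatment effects.
    ss_error = ss_total - ss_treat - ss_block
    df_treat, df_error = k - 1, (k - 1) * (b - 1)
    f = (ss_treat / df_treat) / (ss_error / df_error)
    return f, df_treat, df_error


# Hypothetical data: 3 blocks (very different angles) x 3 methods.
example = [[85, 88, 86], [40, 42, 41], [120, 121, 119]]
f, df_treat, df_error = block_anova_f(example)
```

For this hypothetical table the sketch gives F = 7.75 on (2, 4) degrees of freedom; without the block term, the large spread between blocks would sit in the error term and swamp the method differences.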
Analysis and interpretation
Created rom1.xls. Rows are blocks (one for each angle and subject); columns are treatments A-G. Compare to Table 20.4 in [4].
Removed a07s004 from the data because it was missing data for 4 treatments.
Followed the tutorial to compute F using the Excel analysis tool.
F = 2.46, Fcrit = 2.12, P-value = 0.024.
At alpha = 0.05 (F > Fcrit, P = 0.024), we can reject the null hypothesis that all angle measurement methods are the same.
The Scheffé minimum significant difference is 3.02. No two method means differ by this amount (remember that the Scheffé comparison is conservative).
The Tukey HSD minimum significant difference is 2.49. No two method means differ by this amount either.
Thus, while the method means are not all equal, no pairwise comparison is significant, so no particular angle measurement method stands out from the others.
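For reference, the two minimum significant differences quoted above have the following standard equal-n forms, assuming k = 7 methods, b = 69 blocks, MS_E the error mean square from the blocked ANOVA, and df_E = (k − 1)(b − 1) = 408:

```latex
\mathrm{MSD}_{\text{Scheff\'e}} = \sqrt{(k-1)\,F_{\alpha;\,k-1,\,df_E}}\;\sqrt{\frac{2\,MS_E}{b}},
\qquad
\mathrm{HSD}_{\text{Tukey}} = q_{\alpha;\,k,\,df_E}\,\sqrt{\frac{MS_E}{b}}
```

Their ratio, sqrt(2(k − 1)F)/q, does not depend on MS_E. With F = 2.12 and the tabulated studentized range q for 7 groups at large df (about 4.2), it is roughly 1.2, consistent with 3.02/2.49.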
The full analysis used subjects 1-10. The protocol changed after subject 4 to reduce the chance of the joint angle changing between measurement methods.
The ANOVA was repeated for subjects 5-10 only. Copied data from rom1.xls to rom2.xls.
F = 1.69, Fcrit = 2.13, P-value = 0.12.
For subjects 5-10, we cannot reject the null hypothesis that there is no difference between the methods tested (P = 0.12).
Power analysis using gPower.exe for subjects 5-10: ANOVA, post hoc, alpha = .05, groups = 7, total samples = 294 (6 subjects × 7 angles × 7 methods). Power to detect a small effect size (f = 0.10) was 20%. Power to detect a medium effect size (f = 0.25) was 91%.
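As a cross-check on these numbers, post hoc power for a one-way fixed-effects ANOVA can be computed from the noncentral F distribution with noncentrality λ = f²N, df1 = k − 1, df2 = N − k (the model G*Power documents for its omnibus one-way ANOVA). The sketch below is stdlib-only Python with hypothetical helper names; the incomplete beta function uses the standard continued-fraction method. It is an independent re-implementation, not G*Power's code, and with f = 0.25, N = 294, k = 7 it should come out close to the 91% above.

```python
from math import exp, lgamma, log


def _betacf(a, b, x, max_iter=500, eps=1e-14, tiny=1e-300):
    """Continued fraction for the regularized incomplete beta (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step, then odd step, of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_cdf(f, df1, df2, nc=0.0):
    """CDF of the F distribution; for nc > 0, a Poisson(nc/2) mixture of incomplete betas."""
    x = df1 * f / (df1 * f + df2)
    if nc == 0.0:
        return betainc(df1 / 2.0, df2 / 2.0, x)
    half = nc / 2.0
    n_terms = int(half + 10.0 * half ** 0.5 + 50)
    return sum(exp(-half + j * log(half) - lgamma(j + 1.0)) * betainc(df1 / 2.0 + j, df2 / 2.0, x)
               for j in range(n_terms))


def f_crit(df1, df2, alpha=0.05):
    """Upper-alpha critical value of the central F distribution, found by bisection."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, df1, df2) < 1.0 - alpha:
        hi *= 2.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if f_cdf(mid, df1, df2) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def anova_power(f_effect, n_total, k_groups, alpha=0.05):
    """Power of the one-way fixed-effects ANOVA F test for Cohen's f (lambda = f^2 * N)."""
    df1, df2 = k_groups - 1, n_total - k_groups
    crit = f_crit(df1, df2, alpha)
    return 1.0 - f_cdf(crit, df1, df2, nc=f_effect ** 2 * n_total)


power_medium = anova_power(0.25, 294, 7)  # compare with 91% from gPower above
power_small = anova_power(0.10, 294, 7)   # compare with 20% from gPower above
```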
We need to determine what effect size we want to detect, that is, how accurate an AROM measurement must be (in degrees). For example, if a PT reads 88 degrees when the angle is actually 85 degrees, does it matter? From Lynda: 5 degrees is the accepted error of measurement; most clinicians would say that the actual position of the joint is within 5 degrees of the measurement they take. So, calculate the power to detect a 5-degree difference: gPower.exe, ANOVA, post hoc, groups = 7, samples = 294, pooled sd = 49, effect size f = 5/49 = .10 (small effect), which means the experiment had 20% power.
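One caution on the effect-size input: 5/49 is a two-group standardized difference (Cohen's d). The ANOVA effect size f that G*Power expects is Cohen's f, the standard deviation of the k group means divided by the common within-group SD, so the same 5-degree difference maps to a smaller f that depends on how the seven means are spread. The sketch below computes f for two hypothetical patterns of method means; the patterns and the function name are illustrative assumptions, not part of the study.

```python
from statistics import pstdev


def cohens_f(group_means, sigma):
    """Cohen's f: population SD of the k group means divided by the within-group SD."""
    return pstdev(group_means) / sigma


k, delta, sigma = 7, 5.0, 49.0  # 5-degree difference, pooled sd = 49 as in the notes

# Hypothetical pattern A: one method reads 5 deg off, the other six agree.
f_one_off = cohens_f([0.0] * (k - 1) + [delta], sigma)
# Hypothetical pattern B: two methods 5 deg apart, the other five midway
# (Cohen's "minimum variability" pattern).
f_two_extremes = cohens_f([-delta / 2, delta / 2] + [0.0] * (k - 2), sigma)

d_ratio = delta / sigma  # the 5/49 ratio used above (a two-group d, not f)
```

For k = 7, pattern A gives f = d·sqrt(6)/7 (about 0.35 of d) and pattern B gives f = d/sqrt(14) (about 0.27 of d), whatever σ is. Which pattern is the right target is a judgment call.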
The power of 20% seems too low given how much data we had. I think this is because I did the ANOVA power analysis with a pooled sd = 49. The sd is so large because every subject was at a different angle. Instead, I believe we should find the pooled sd from the variation about each block mean. That is, for each block, we subtract the block mean from every measurement to get the deviations about the mean (this is what blocking, or repeated measures, is all about), and it is the sd of those deviations that we use as the pooled sd. (This is the approach taken to blocked designs in “DOE Simplified”.)
So, redid the power analysis (see rom2.xls). For each block, computed the mean, stdev, and variance. Eliminated one outlier (a04, s005, method X6: 46 deg when the others were between 78 and 89; probably a data entry error). Computed a pooled SD as the square root of the average variance over the blocks, which gave a pooled SD of 4.86 (compare to the pooled SD = 49 above, when block means were not subtracted). Now the desired effect size is 5/4.86 = 1.02. The effect size for detecting a 1-degree difference is 1/4.86 = 0.21.
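The pooled-SD calculation just described can be sketched as follows. This is a minimal stdlib Python sketch with hypothetical data (not the rom2.xls values), assuming every block has the same number of methods so that a plain average of the block variances is appropriate. The variance of a block's raw readings equals the variance of its deviations about the block mean, so the mean does not need to be subtracted explicitly for this step.

```python
from math import sqrt
from statistics import mean, stdev, variance


def pooled_within_block_sd(blocks):
    """Square root of the average within-block sample variance (equal block sizes assumed)."""
    # variance(row) == variance of (row - block mean), i.e. the deviations about the mean.
    return sqrt(mean(variance(row) for row in blocks))


# Hypothetical blocks: each row = one angle on one subject, each column = one method.
example = [[85, 88, 86, 84], [40, 42, 41, 45], [120, 121, 119, 116]]
pooled = pooled_within_block_sd(example)             # small: method-to-method scatter only
raw_sd = stdev([x for row in example for x in row])  # large: dominated by the different angles
```

Here the pooled within-block SD is about 2.0 degrees while the SD of all readings is over 30 degrees, the same kind of contrast as 4.86 versus 49.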
Used gPower to calculate power: anova, post-hoc, alpha=.05, samples=294, groups=7. For effect size f=1.02, power=100%. For effect size f=0.21, power=77%.
This is not exactly correct, as each subject did not receive exactly the same treatment: knee angle 1 for subject x was different from knee angle 1 for subject y. But I think this is fine, because we don’t expect the measurement methods to vary any more at 30 degrees than at 60 degrees.
Further analysis
Copied data to rom3.xls for analyzing secondary issues.
Issue: does it matter if the CG reports the angle or if the PT reads it by zooming in? Test by comparing all X2 data to all X3 data. For these two methods, we know the subject had the same angle.
Analyzed by paired t-test. No significant difference between the methods. Further, of 69 observations, 68 had a difference of 0 or 1 degree and one had a difference of 2 degrees. Conclusion: CG self-report is the same as zooming in. This reduces technology needs and time, as a high-zoom camera is costly and it takes time to position and zoom. Conclusion: the CG can be taught to read the goni.
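The paired comparison and the tally of differences above can be sketched like this. It is a minimal stdlib Python sketch with hypothetical readings and helper names; it computes only the t statistic and degrees of freedom. A library routine such as `scipy.stats.ttest_rel` returns the same statistic together with a p-value.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(a, b):
    """t statistic and df for a paired t-test (fails if all differences are identical)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


def agreement_counts(a, b):
    """How many pairs differ by 0, 1, 2, ... degrees."""
    counts = {}
    for x, y in zip(a, b):
        gap = abs(x - y)
        counts[gap] = counts.get(gap, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical readings (degrees) for the same joints: CG-reported (X2) vs. PT zoom-in (X3).
cg = [85, 40, 120, 60, 30]
zoom = [86, 40, 120, 59, 32]
t_stat, df = paired_t(cg, zoom)
tally = agreement_counts(cg, zoom)
```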
Issue: Is goni on screen the same as VR goni?
Analysis: Stacked the RI data under the LY data for a side-by-side comparison of the two methods, giving 138 observations. Paired t-test says no significant difference. Summary stats for the absolute difference between the two methods: mean abs diff = 4.2 deg +/- 4.69, min diff = 0 deg, max diff = 42 deg (an outlier), median diff = 3 deg. Conclusion: goni on screen is the same as VR goni.
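The stacking step and the absolute-difference summary can be sketched as below. This is a minimal stdlib Python sketch; the readings and the function name are hypothetical. Stacking means each reader's rows are simply appended, so the two methods stay paired row by row.

```python
from statistics import mean, median, stdev


def abs_diff_summary(method_a, method_b):
    """Summary statistics of |a - b| for paired readings."""
    gaps = [abs(x - y) for x, y in zip(method_a, method_b)]
    return {"n": len(gaps), "mean": mean(gaps), "sd": stdev(gaps),
            "median": median(gaps), "min": min(gaps), "max": max(gaps)}


# Hypothetical readings (degrees); stack RI under LY for each method.
ly_vr, ly_screen = [80, 45], [83, 45]
ri_vr, ri_screen = [78, 50], [82, 47]
summary = abs_diff_summary(ly_vr + ri_vr, ly_screen + ri_screen)
```

Absolute differences are bounded below by 0 and, as here, can have a long right tail (max 42 deg), which pulls the mean above the median; the median is the more robust single number.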
More analysis
Copied data from rom2.xls (subjects 5-10 only) to rom4.xls. Objective: create visuals to determine if any one method has bias or greater deviation from the mean than any other method.
To control for angle, take the mean angle for each block and look at deviations about that mean. If one method is consistently high or low, it should be noticeable; the same holds for excess variation. This is really the only way to do this, as there is no gold standard giving the exact angle.
One outlier was eliminated (a04, s005, method X6: 46 deg when the others were between 78 and 89).
All 7 methods had an average deviation from the mean between -1 and 1 degree (no bias) and a standard deviation about the mean of 3.4 to 5.3 degrees (approximately equal scatter). An overview scatter plot of deviations about the mean reveals no trends. Conclusion: each method performs about the same.
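The per-method bias and scatter described above can be sketched as follows. This is a minimal stdlib Python sketch assuming a complete table (rows = blocks, columns = methods); the data and the function name are hypothetical.

```python
from statistics import mean, stdev


def method_bias_and_scatter(blocks):
    """Per method: (mean deviation from the block mean, SD of those deviations).

    blocks: rows = blocks (angle x subject), columns = methods, no missing cells.
    """
    devs = [[x - mean(row) for x in row] for row in blocks]
    k = len(blocks[0])
    return [(mean(r[j] for r in devs), stdev([r[j] for r in devs])) for j in range(k)]


# Hypothetical data: 3 blocks x 3 methods.
example = [[85, 88, 86], [40, 42, 41], [120, 121, 119]]
per_method = method_bias_and_scatter(example)
```

Because the deviations in each block sum to zero, the per-method mean deviations also sum to zero. A "bias" found this way is therefore relative to the average of the methods, not to the true angle.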
Still more analysis
Copied data from rom4.xls to rom5.xls. Purpose: determine if any method consistently under- or overestimated the mean, or if any method had a consistently large variation about the mean.
MMT
Biceps (2 weights)
Quad (2 weights)
Methods:
- co-lo hands on
- tele with PT visual assessment (weight 1)
- tele with PT visual assessment plus digital dyna (weight 2)
Data are MMT scores, Units=0-5; + and - grades converted like a GPA, e.g. 4+ = 4.33, 4- = 3.67.
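The grade conversion above can be sketched as a small function. It is a hypothetical helper in Python, assuming grades are written as a digit optionally followed by a plain "+" or "-", and using ±0.33 to match the 4.33 and 3.67 examples.

```python
def mmt_to_number(grade):
    """Convert an MMT grade such as '4+', '4', or '4-' to a number (GPA-style +/-0.33)."""
    grade = grade.strip()
    base = float(grade.rstrip("+-"))
    if grade.endswith("+"):
        return base + 0.33
    if grade.endswith("-"):
        return base - 0.33
    return base
```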
Although three methods were tested, only two methods were tested on any one muscle/subject, so the data can be analyzed by a paired t-test.
Created file mmt.xls
Section “Visual Remote” compares co-located (treatment X1) with remote-visual method (treatment X2).
Section “Digi Remote” compares co-located (treatment X1) with digi-enabled method (treatment X2).
Ran paired t-test and repeated measures ANOVA. Both give the same P value, as expected: with only two treatments, the repeated measures F equals t squared.
The test showed no significant difference between the methods.
Power analysis for paired t-test: see spreadsheet for calculations. Calculated effect size d for the paired t-test using Eq. C.4 in [4] (see tutorial). Sample size = 20.
Used gPower to calculate power: post hoc analysis, two-tailed, n1 = 20, n2 = 20, effect size from spreadsheet.
Or, effect size from gPower, Calc Effectsize: mean grp 1 = 0, mean grp 2 = .33, sigma = .9735 (see spreadsheet), d = .339. Same thing for mean grp 2 = 1 gets d =
For co-lo vs. visual observation, detecting a difference of .33 and 1 corresponds to effect sizes of .72 and 2.18, and test power was 60% and 100%.
For co-lo vs. visual plus dyna, detecting a difference of .33 and 1 corresponds to effect sizes of .65 and 1.96, and test power was 51% and 100%.
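A note on the G*Power setup, offered as an unverified observation: entering n1 = 20, n2 = 20 corresponds to a t-test on two independent groups, whose noncentrality is d·sqrt(n/2). A matched-pairs test on 20 pairs uses d_z·sqrt(20) instead, where d_z is the mean difference divided by the SD of the differences. The reported powers (60% for d = .72, 51% for d = .65) are close to what the two-group formula gives. Whether the matched-pairs option applies depends on how Eq. C.4 in [4] defines d. The sketch below uses a normal approximation (our simplification; G*Power uses the noncentral t) to show the size of the gap; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n, paired, alpha=0.05):
    """Normal-approximation power of a two-tailed t-test (optimistic for small n).

    paired=True:  n = number of pairs, d = d_z (mean difference / SD of differences).
    paired=False: n = size of each of two independent groups, d = Cohen's d.
    """
    z = NormalDist()
    shift = d * sqrt(n) if paired else d * sqrt(n / 2)
    crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(shift - crit) + z.cdf(-shift - crit)


indep = approx_power(0.72, 20, paired=False)  # two independent groups of 20
pairs = approx_power(0.72, 20, paired=True)   # 20 matched pairs, same d treated as d_z
```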
More Analysis
Copied data to mmt-1.xls
Analyzed by Wilcoxon Signed-Ranks Test (Portney, p. 427), per suggestion of reviewer.
Could not reject the null hypothesis (alpha = 0.05).
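The Wilcoxon signed-rank statistic used above can be sketched as follows. This is one standard way of computing it (zero differences dropped, tied absolute differences given the average rank). The textbook may report a slightly different convention, such as the smaller of W+ and W−. The function name and data are hypothetical. For p-values, `scipy.stats.wilcoxon` implements the test.

```python
def wilcoxon_signed_rank(a, b):
    """Return (W_plus, W_minus, n_nonzero) for paired samples a, b."""
    diffs = [x - y for x, y in zip(a, b) if x != y]  # drop zero differences
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Extend j over a run of tied absolute differences.
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus, len(diffs)


# Hypothetical paired scores (co-lo vs. remote).
w_plus, w_minus, n_nonzero = wilcoxon_signed_rank([5, 3, 4, 2, 4], [4, 5, 4, 4, 1])
```

With GPA-style MMT scores, round the converted values (e.g. to two decimals) before comparing, so that floating-point noise does not break ties.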
Sit to Stand
Two trials with different disks
Methods
- co-lo viewing
- tele viewing
Data is Berg balance score, Units=0-4
Analysis
Created file sitstand.xls.
Trial 1 was not analyzed as all scores = 4. This means all subjects were able to stand well with a single dyna-disk of any color. Conclusion is that a single dyna-disk is not useful as a simulated impairment. Remote assessment is validated for this case as all remote scores were the same as the matching co-lo score.
Trial 2 showed some score variation, but most scores were still 4. A paired t-test showed no significant difference, although the power of the t-test to detect a difference was likely low. In trial 2, 4 of 10 measurements differed between co-lo and remote. Of the four, two had a remote score of 0, which means the subject fell during the remote trial.
The difficulty with making inferences on remote viewing is that there is no guarantee that both therapists were viewing the same impairment. A subject might balance well in one trial (score = 4) and fall when doing the same disks for remote (score = 0). It is extremely unlikely, if not impossible, that two therapists watching a trial in which the subject fell would not both score it 0. Thus, a trial where one PT scored zero and the other did not almost certainly means the subject behaved differently between trials.
General conclusion: while remote viewing appears valid as a means of assessing Berg sit-to-stand, the dyna-disks performed poorly as a simulated impairment, at least for the combinations and pressures used.
In future studies, the co-lo and remote therapists should view the same trial. This does raise the question of who spots the subject: in co-lo the spotter is the PT, while in remote the spotter is the caregiver.
More Analysis
Analyzed Trial 2 by Wilcoxon Signed-Ranks Test (Portney, p. 427), per suggestion of reviewer. Results said to reject the null hypothesis. However, this was because the subject stepped off the disk in trial 2 but not in trial 1, so it is incorrect to include these values in the analysis. Eliminating these trials leaves only 1 trial where the rating was different. Wilcoxon says this is significant, but with a single differing rating this result is highly prone to Type I error (a false positive). The best solution is not to report the Wilcoxon result, but to explain in words what is going on.
Functional reach
Two trials with different disks
Methods
- co-lo viewing against board
- tele viewing against board
Data in initial and final reach, Units=inches
Analysis
Created file reach.xls
Stacked data from two trials and treated as one big data set.
Paired t-test showed no difference between co-lo and remote measurement methods. Power of test may be low because of small number of subjects.
Power analysis for paired t-test: Pooled std deviation = 3.32. Calculated effect size d for paired t-test using Eq. C.4 in [4] (see tutorial). Sample size = 20. For detection of 2 inches, effect size d = 1.2, Power to detect = 95%. For detection of 1 inch, d = 0.62, Power to detect = 48%.
Of course, these power calculations are of limited value, because the main source of variation is variation in how the subject performed the test rather than variation in the measurement methods. The subject performed differently each time, so each side of the paired t-test was measuring something different.
The average absolute difference between the two measurements was 1.8 inches. The absolute difference in the second trial was, on average, almost double that in the first trial. This is because in the second trial subjects were standing on two disks and were likely to have larger variation in their reach between the co-lo and remote measures.
TUG
Two trials
Methods
- co-lo with stopwatch
- tele with stopwatch
Data is time, Units=seconds
Analysis
Data is in tug.xls
Descriptive stats of the differences: median difference between co-lo and remote = 0.87 sec. Max. diff = 10.91 sec for subject 10. This is an outlier: that subject was much faster for co-lo than for remote because of learning.
Learning curve is clear. For example, subject 10 did remote first, followed by co-lo. In walking order, subject 10’s times were 25, 19, 14, 13 sec, clearly demonstrating learning. Subject 8: 15, 14, 12, 12 sec. Subject 6: 11, 11, 10, 8. Subject 5: 11, 11, 14, 11 (a blip when walking after a long break). Subject 3: 11, 10, 9, 9. Subject 2: 10, 8, 7, 7.
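The learning effect can be quantified from the walk times listed above. This is a minimal Python sketch; the times are copied from the notes, while the two summary measures (first-to-last improvement and whether times never increased) are our illustrative choices, not an analysis from the study.

```python
# TUG times (sec) in the order walked, copied from the notes above.
tug_times = {
    10: [25, 19, 14, 13],
    8: [15, 14, 12, 12],
    6: [11, 11, 10, 8],
    5: [11, 11, 14, 11],
    3: [11, 10, 9, 9],
    2: [10, 8, 7, 7],
}

# Improvement from first to last walk (positive = got faster).
drops = {subj: t[0] - t[-1] for subj, t in tug_times.items()}
# True if each walk was no slower than the one before it.
never_slower = {subj: all(b <= a for a, b in zip(t, t[1:])) for subj, t in tug_times.items()}
```

Five of the six subjects never got slower from one walk to the next (subject 5 is the exception), so the order of co-lo and remote walks is confounded with practice. A co-lo minus remote difference partly reflects which came first.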