Data collection

Data was stored for analysis in comma delimitated format

Example data set can be found below:

------

patient.idtimewtagehtsexinitial.wttyperace

1027931.35890411700279lap(l)1

11326731.35890411700279lap(l)1

11825731.35890411700279lap(l)1

12125031.35890411700279lap(l)1

110421231.35890411700279lap(l)1

119519231.35890411700279lap(l)1

2022327.35068493600223lap(l)3

21919827.35068493600223lap(l)3

212415327.35068493600223lap(l)3

3031017.56986301691310open(GB)4

3928417.56986301691310open(GB)4

37424917.56986301691310open(GB)4

313022817.56986301691310open(GB)4

316221117.56986301691310open(GB)4

319019717.56986301691310open(GB)4

323418217.56986301691310open(GB)4

326917217.56986301691310open(GB)4

330417717.56986301691310open(GB)4

343618317.56986301691310open(GB)4

346617917.56986301691310open(GB)4

.

.

.

.

Example file name would be weightlossdata.csv

R Code for the Analysis of Weight Loss Data

R-software and support materials can be downloaded at

------

data = read.csv("weightlossdata.csv",header=T)

# NOTE: In the data, Females are 0 and Males are 1. Whites are 1, Hispanics are 2, Blacks are 3 and Asians are 4.

attach(data)

plot(time, wt)

# We see that weights tend to creep up the longer after the procedure

# This could be due to other factors beyond the scope of the surgery

# Instead, let us take a look at the data up until 500 days

trunc_data = subset(data, time <= 500)

trunc_data = subset(trunc_data, wt != "NA")

attach(trunc_data)

norm_wt = wt/initial.wt

plot(time,log(norm_wt))

logfit = lm(log(norm_wt) ~ 0+time)

summary(logfit)

# Note: We use 0+time to ensure that there is no intercept term. By removing the intercept term, we force the regression line (of the log transformed data) to pass through (0, 0) (which it should do anyways, since at time 0, everyone has normed weight equal to 1 and thus their log(norm_wt) is 0).

# Plotting the data, we see that the simple linear regression fails to capture the curvature of the weight loss near the beginning. That is, weight loss is more extreme than just exp(-x). Perhaps it is more along the lines of exp((x-c)^2), where c is some positive value.

pts = seq(0,499,len=500)

val = predict(logfit, data.frame(time=pts))

plot(time, norm_wt)

lines(pts, exp(val), col="red", lwd = 2)

# Instead, we consider a quadratic fit (also with no intercept) to the log-transformed data

logfit2 = lm(log(norm_wt) ~ 0+time+I(time^2))

summary(logfit2)

pts = seq(0,499,len=500)

val = predict(logfit2, data.frame(time=pts))

plot(time, norm_wt)

lines(pts, exp(val), col="red", lwd = 2)

# This looks much better than the previous (log-linear) fit. Justification for the unweighted quadratic model used above:

# Previous literature seems to suggest an exponential decay relative to factors such as fat, which would seem to be proportional to weight. An inspection of the data revealed that curvature still existed in the log-transformed normalized weights. This suggests that the weight loss was more extreme than just exp(-t).

# Our model: Wt = (Initial_Wt)*exp(0.000003429*t*(t - 749.781) captures this faster than linear decrease fairly well, R^2 ~ 0.93, but only fits the data well up to 400-500 days. After this point, the variance increases quite a bit, suggesting that environmental factors, rather than the surgerical procedure itself, are influencing the patient's weight.

# That is just as well, as our model would predict that a patient would return to their initial weight after two years.

# We can plot 95% prediction bounds for our curve:

plot(time,norm_wt)
conf.frame = data.frame(time=seq(0,499,len=500))
logfit2.pred = predict(logfit2,interval="pred",newdata = conf.frame,weights=1/log(1+conf.frame$time/100))
matlines(conf.frame$time,exp(logfit2.pred),col=c("red","blue","blue"), lwd = c(2,2,2))

#The change is using 100 as the denominator instead of 365. The construction of the weights for the prediction interval was a bit ad hoc. We wanted a weight function that would leave the bands narrow near t = 0 and wider as time progressed. According to the R manual,

#The prediction intervals are for a single observation at each case in newdata (or by default, the data used for the fit) with error variance(s) pred.var. This can be a multiple of res.var, the estimated value of sigma^2: the default is to assume that future observations have the same error variance as those used for fitting. If weights is supplied, the inverse of this is used as a scale factor.

# Now, as time progresses, we expect that outside factors will start affecting the patients' weight, moreso than the surgery itself. Thus, there is something to be said for weighting the observations, giving more credibility to early observations. Below is a plot of both weighted (blue) and unweighted (red) regressions. We use a simple inverse linear weight function.

logfit2_wtd = lm(log(norm_wt) ~ 0+time+I(time^2), weights = 1/(1+time))

summary(logfit2_wtd)

pts = seq(0,499,len=500)

val = predict(logfit2_wtd, data.frame(time=pts))

plot(time, norm_wt)

lines(pts, exp(val), col="blue", lwd = 2)

logfit2 = lm(log(norm_wt) ~ 0+time+I(time^2))

val = predict(logfit2, data.frame(time=pts))

lines(pts, exp(val), col="red", lwd = 2)

# So far, we have implicitly assumed that there is no difference between males and females in terms of weight loss. Rather than assume this, we should test this. Since we have coded females as 0 and males as 1, this can be done with a simple ANOVA

anova(lm(log(norm_wt) ~ 0+time+sex:time+I(time^2)+sex:I(time^2), weights = 1/(1+time)))

# We observe that there is interaction between sex and time, suggesting that there is a difference between males and females. As such, we shall create two models, one for each gender.

data_f = subset(trunc_data, sex == "0")

data_m = subset(trunc_data, sex == "1")

norm_wt = data_f$wt/data_f$initial.wt

time = data_f$time

logfit2_f_wtd = lm(log(norm_wt) ~ 0+time+I(time^2), weights = 1/(1+time))

summary(logfit2_f_wtd)

norm_wt = data_m$wt/data_m$initial.wt

time = data_m$time

logfit2_m_wtd = lm(log(norm_wt) ~ 0+time+I(time^2), weights = 1/(1+time))

summary(logfit2_m_wtd)

# Below is a plot of the weighted regression with 95% predicition intervals with simple inverse linear weight function for the variance. That is, we allow the variance to increase linearly with time.

plot(time,norm_wt,col=1*sex+2, pch=2*sex+1)

conf.frame = data.frame(time=seq(0,499,len=500))

logfit2_f_wtd.pred = predict(logfit2_f_wtd,interval="pred",newdata = conf.frame,weights=1/(1+conf.frame$time))

logfit2_m_wtd.pred = predict(logfit2_m_wtd,interval="pred",newdata = conf.frame,weights=1/(1+conf.frame$time))

matlines(conf.frame$time,exp(logfit2_f_wtd.pred),col=c("red","blue","blue"), lwd = c(2,2,2))

matlines(conf.frame$time,exp(logfit2_m_wtd.pred),col=c("green","orange","orange"), lwd = c(2,2,2))