Comparing Propensity Score Weighting Methods Visually


Introduction

Propensity score analysis (PSA) is used to adjust for confounding when studying treatment effects in observational studies. In propensity score analysis, adjustments are made based on propensity scores. The propensity score is the conditional probability of being in the treatment group given a set of covariates (D'Agostino, 1998). As Rosenbaum and Rubin (1983) indicated, the use of propensity scores can provide evidence for possible causal relationships.

In an observational study with one treatment group and one control group, let T be an indicator of observed treatment exposure (T = 1 if treated, T = 0 if control), let Y be the observed response, and let X be a vector of measured covariates. For each observed individual, it is assumed that there are two values associated with it: Yc and Yt, the values the response would take if the individual were to receive control or treatment, respectively. However, because an individual can receive only treatment or only control at one point in time, just one of Yc and Yt can be observed in the study; the other is hypothetical. Yc and Yt can thus be considered inherent characteristics of each individual (we cannot observe both of them), and they are referred to as counterfactuals (or potential outcomes). The observed response Y can then be expressed as Y = T·Yt + (1 - T)·Yc.

To infer whether there is an effect of treatment on the response, we are interested in the average treatment effect for the treated (ATT), defined here as ∆ = E(Yt) - E(Yc). This statistic can be interpreted as the difference between the average response if all cases received treatment and the average response if all cases were in the control group.

In observational studies, the average treatment effect for the treated cannot be obtained directly, because the assignment of treatment (T) is not random within the same population. On the contrary, it may be that only individuals with certain characteristics are in the treatment group. These characteristics may be related to how the individuals respond and thus confound the estimation of the ATT. As a result, in an observational study, because T is not independent of the potential responses Yt and Yc, we have E(Y | T=0) = E(Yc | T=0) ≠ E(Yc) and E(Y | T=1) = E(Yt | T=1) ≠ E(Yt), so we cannot directly estimate the ATT from the observed data.

To solve this problem, Rosenbaum and Rubin (1983) proposed the use of the relevant covariates X. It is assumed that (Yt, Yc) ⊥ T | X; this is referred to as the assumption of strongly ignorable treatment assignment. Under this assumption, E(Y | T=0, X) = E(Yc | T=0, X) = E(Yc | X) and E(Y | T=1, X) = E(Yt | T=1, X) = E(Yt | X). Thus, ∆ = E(Yt) - E(Yc) can be estimated from the observed response Y. Methods such as stratification (Rosenbaum & Rubin, 1983, 1984), matching on the propensity score (Rosenbaum & Rubin, 1983), and regression adjustment (Hirano & Imbens, 2001; Rubin, 1997) can then be used to estimate ATT (∆).

Is there also a way to estimate ATT (∆) without stratifying, matching, or fitting an outcome regression on X? Yes: a method that weights each observation by the inverse of its propensity score can estimate ∆ from Y. Theoretically, the inverse probability weighting estimator produces an unbiased estimate of the true treatment effect (Horvitz & Thompson, 1952). Because the weighting is based on the inverse of the propensity score, the technique is referred to as 'inverse probability weighting' (IPW).

Specifically, the weights are defined as:

w = 1/(propensity score) for the treatment group,

w = 1/(1 - propensity score) for the control group.

Suppose Y is the observed outcome variable, T is the treatment indicator, e(X) = P(T = 1 | X) is the propensity score, and Yt denotes the potential outcome under treatment. Then, under strong ignorability,

E[T·Y/e(X)] = E[T·Yt/e(X)] = E{ E[T | X, Yt]·Yt/e(X) } = E{ e(X)·Yt/e(X) } = E(Yt),

where the third equality uses (Yt, Yc) ⊥ T | X, so that E[T | X, Yt] = E[T | X] = e(X).

A similar argument for the control group shows that E[(1 - T)·Y/(1 - e(X))] = E(Yc).

Thus, E[T·Y/e(X)] - E[(1 - T)·Y/(1 - e(X))] = E(Yt) - E(Yc) = ∆.
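To make this identity concrete, here is a minimal simulation sketch in R (not part of the original analysis; the data-generating model and all variable names here are assumed purely for illustration):

# Toy simulation: with a known propensity score, inverse-probability weighting
# recovers E(Yt), E(Yc), and their difference.
set.seed(1)
N  <- 100000
x  <- rnorm(N)
e  <- plogis(0.5 * x)                  # true propensity score P(T = 1 | X)
t  <- rbinom(N, 1, e)                  # treatment indicator
yt <- 2 + x + rnorm(N)                 # potential outcome under treatment
yc <- 0 + x + rnorm(N)                 # potential outcome under control
y  <- t * yt + (1 - t) * yc            # observed response
mean(t * y / e)                        # approximately E(Yt) = 2
mean((1 - t) * y / (1 - e))            # approximately E(Yc) = 0
mean(t * y / e) - mean((1 - t) * y / (1 - e))  # approximately the true effect, 2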

Review of weighting methods

Based on the above identity, we get the first weighting method for estimating ATT (∆), denoted here as IPW1. In the following formulas, i = 1, ..., n indexes the observations:

∆_IPW1 = (1/n) Σ_i Ti·Yi/e(Xi) - (1/n) Σ_i (1 - Ti)·Yi/(1 - e(Xi)).

IPW1 leads to the second method, IPW2, which is sometimes known as a ratio estimator in the sampling literature (Lunceford & Davidian, 2004). Because E[T/e(X)] = E{E(T | X)/e(X)} = 1 and, similarly, E[(1 - T)/(1 - e(X))] = 1, we can normalize the weights so that they sum to one within each treatment group, which gives the estimator for the second method, IPW2:

∆_IPW2 = [Σ_i Ti·Yi/e(Xi)] / [Σ_i Ti/e(Xi)] - [Σ_i (1 - Ti)·Yi/(1 - e(Xi))] / [Σ_i (1 - Ti)/(1 - e(Xi))].
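For reference, the two estimators can be written as short R functions (a sketch only; the function names ipw1.est and ipw2.est are mine, and the project code in Part A below computes the same quantities inline):

ipw1.est <- function(y, z, ps) {
  # unnormalized (Horvitz-Thompson style) weights
  mean(z * y / ps) - mean((1 - z) * y / (1 - ps))
}

ipw2.est <- function(y, z, ps) {
  # the same weights, normalized within each treatment group
  sum(z * y / ps) / sum(z / ps) -
    sum((1 - z) * y / (1 - ps)) / sum((1 - z) / (1 - ps))
}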

The above two methods are popular approaches proposed by Rosenbaum (1999).

Robins, Rotnitzky, and Zhao (1994) treated this as a missing-data problem in which, for each subject, either Yc or Yt is missing. Using arbitrary semiparametric models for missing data, they obtained the following estimator. Because some scholars consider this estimator of the ATT to be 'doubly robust' (Robins et al., 1994), we denote it as ∆_DR:

∆_DR = (1/n) Σ_i [Ti·Yi - (Ti - e(Xi))·m1(Xi)]/e(Xi) - (1/n) Σ_i [(1 - Ti)·Yi + (Ti - e(Xi))·m0(Xi)]/(1 - e(Xi)).

Here m_z(X), z = 0, 1, is the regression of the response on X in group z, depending on parameters estimated from the subjects with T = z. Thus, three models are involved in this estimator: the propensity score model and the two outcome regression models, m1(X) and m0(X). This estimator is believed to be more robust and efficient than ∆_IPW1 and ∆_IPW2: it remains consistent if either the propensity score model or the outcome regressions are correctly specified (hence 'doubly robust').
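Written as a function, the estimator mirrors the formula above (a sketch; the name dr.est is mine, and m1 and m0 stand for the vectors of fitted values from the treatment-group and control-group outcome regressions, which Part A below calls m1 and m2):

dr.est <- function(y, z, ps, m1, m0) {
  mean((z * y - (z - ps) * m1) / ps) -
    mean(((1 - z) * y + (z - ps) * m0) / (1 - ps))
}

With the quantities computed in Part A, dr.est(y, z, ps, m1, m2) should reproduce the ipwdr value obtained there.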

In addition, building on the first two methods, Lunceford and Davidian (2004) derived one further estimator, denoted here as IPW3. It recenters the IPW weights using the constants C1 and C0 computed in the code below, with the aim of reducing the variance of the estimate.

Methods for computing the variances of the above ∆ estimators were summarized by Lunceford and Davidian (2004), based on a series of earlier results (Dawid, 1979; McIntosh & Rubin, 1999; Stefanski & Boos, 2002).

One drawback of the inverse probability weighting approach (e.g., ∆_IPW1, ∆_IPW2) is that it is very sensitive to extreme values. For an observation with an extremely small propensity score, the weight becomes extremely large, so that observation can be very influential on the estimate (Posner & Ash). For example, a treated observation with a propensity score of 0.01 receives a weight of 1/0.01 = 100, so it counts as much as 100 typical observations.

Thus, Posner and Ash proposed weighting within strata as an alternative weighting method. This method first computes the propensity scores and divides them into strata. Then, within each stratum, one weight is applied to all treatment observations and another weight to all control observations: the treatment-group weight is the reciprocal of the proportion of treatment observations in that stratum, and the control-group weight is the reciprocal of the proportion of control observations in that stratum. Thus, even if there is an extreme observation, its effect is no larger than that of any other observation in the same stratum. Finally, a weighted least squares regression is used to estimate the ATT. We denote the estimator based on this within-strata weighting as ∆_WS.
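A sketch of this weighting scheme as a standalone R function (the name ws.weights is mine; quintile strata are assumed, matching the code in Part A below):

ws.weights <- function(ps, z, nq = 5) {
  strata <- cut(ps, quantile(ps, seq(0, 1, length = nq + 1)),
                include.lowest = TRUE, labels = FALSE)
  w <- numeric(length(ps))
  for (k in 1:nq) {
    in.k <- strata == k
    w[in.k & z == 1] <- sum(in.k) / sum(in.k & z == 1)  # 1 / proportion treated in stratum k
    w[in.k & z == 0] <- sum(in.k) / sum(in.k & z == 0)  # 1 / proportion control in stratum k
  }
  w
}

Applied to the ps and smoke vectors defined in Part A, ws.weights(ps, smoke) should reproduce the w.new weights built there (up to how ties at the quantile boundaries are handled).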

Comparing weighting methods: Results

I will discuss my project in the following 5 parts.

A. I computed and compared the ∆s obtained using the different weighting methods.

B. I tried to visualize the IPW2 weighting method by replicating the observations according to their weights.

C. Two weighting methods (IPW2 and within-strata weighting) are compared, via loess graphics, with the original loess plot.

D. I tested the sensitivity of the different weighting methods to outliers. I added 2 extreme cases to the data, one at each end of the propensity score range, and compared how the ∆s changed after adding them under the different weighting methods. The change in ∆ is expected to be smallest for within-strata weighting. In addition, loess graphics were drawn to see how the pattern in the plots changes.

E. I discuss some unfinished computations of the variances of the ∆s.

First of all, the data were set up and the propensity scores were computed using the model from BerkleyBirthwtPSA2bp.pdf.

#Data setup and the logistic propensity score model from BerkleyBirthwtPSA2bp.pdf by Dr. Pruzek

library(PSAgraphics)

path = "C:\\psa"

dataname = "berk.birthwt2.txt"

setwd(path)

berk.birthwt= read.table(dataname, header = TRUE)

attach(berk.birthwt)

bk.bwt.LR2=glm(formula = smoke ~ parity + ded + age * ed + weight

+ factor(racer) + factor(dracer), family = "binomial")

plot(predict(bk.bwt.LR2),fitted(bk.bwt.LR2),ylim=c(-.1,1.1),col=4)

points(predict(bk.bwt.LR2),jitter(smoke,am=.02))

abline(v=quantile(predict(bk.bwt.LR2),prob=seq(0,1,len=7)),lty=3,col=4)

abline(h=quantile(fitted(bk.bwt.LR2),prob=seq(0,1,len=7)),lty=3,col=1)

title("Logits vs. PS-estimates, LR model 2; jittered overplot is Logits vs. Smoke, n = 954")

bk.loess.plt=loess.psa(bwtt/16,smoke,fitted(bk.bwt.LR2),span=.6,int=10)

Next, different weighting methods were used to estimate ∆s.

########################################################################

#A. I computed the ∆s using different weighting methods.

# propensity score

ps=fitted(bk.bwt.LR2)

n=length(bwtt)

y=bwtt/16

z=smoke

# inverse probability weighting estimator

mu1.ipw1=sum(z*y/ps)/n

mu0.ipw1=sum((1-z)*y/(1-ps))/n

mu1.ipw2=solve(sum(z/ps))*sum(z*y/ps)

mu0.ipw2=solve(sum((1-z)/(1-ps)))*sum((1-z)*y/(1-ps))

ipw1=mu1.ipw1-mu0.ipw1

ipw2=mu1.ipw2-mu0.ipw2

ipw1

ipw2

# IPW- double robustness

berk.birthwt.t=berk.birthwt[smoke==1,]

attach(berk.birthwt.t)

reg.t=lm(bwtt/16~parity + ded + age * ed + weight+ factor(racer) + factor(dracer))

attach(berk.birthwt)

new=as.data.frame(cbind(parity,ded,age,ed,weight,racer,dracer))

m1=predict(reg.t, new, se.fit = TRUE)$fit

attach(berk.birthwt)

berk.birthwt.c=berk.birthwt[smoke==0,]

attach(berk.birthwt.c)

reg.c=lm(bwtt/16~parity + ded + age * ed + weight+ factor(racer) + factor(dracer))

attach(berk.birthwt)

new=as.data.frame(cbind(parity,ded,age,ed,weight,racer,dracer))

m2=predict(reg.c, new, se.fit = TRUE)$fit

ipwdr=sum((z*y-(z-ps)*m1)/ps)/n-sum(((1-z)*y+(z-ps)*m2)/(1-ps))/n

ipwdr

# Weighted linear regression gives the same results as ipw2
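# (For a binary regressor, the weighted least squares slope equals the difference
# between the weighted group means of the outcome, which is exactly the IPW2 estimator.)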

attach(berk.birthwt)

w=1/ps*(smoke==1)+1/(1-ps)*(smoke==0)

summary(lm(bwtt/16~smoke, weights=w))

ipw2lm=lm(bwtt/16~smoke, weights=w)$coef[2]

ipw2lm

# IPW3 Lunceford2004

C1=sum( (z-ps)/ps )/ sum( ( (z-ps)/ps )^2 )

C0=-sum( (z-ps)/(1-ps) )/ sum( ( (z-ps)/(1-ps) )^2 )

mu1.ipw3=solve( sum( z/ps*(1-C1/ps) ) ) * sum(z*y/ps*(1-C1/ps))

mu0.ipw3=solve( sum( (1-z)/(1-ps)*(1-C0/(1-ps)) ))* sum( (1-z)*y/(1-ps)*(1-C0/(1-ps)))

ipw3=mu1.ipw3-mu0.ipw3

ipw3

# weighting within strata from Posner's paper

attach(berk.birthwt)

# cut propensity score to quintiles

nq=5

strata=cut(ps, quantile(ps,seq(0,1,1/5)), include.lowest=TRUE, labels=FALSE)

totn=table(strata)

tn=table(strata[smoke==1])

cn=table(strata[smoke==0])

w.t=totn/tn

w.c=totn/cn
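# w.t[s] = (size of stratum s)/(number of treated in stratum s) = 1/(proportion treated in stratum s);
# w.c[s] is the analogous weight for the control observations in stratum s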

w.ct=rbind(w.c,w.t)

w.new=rep(0,n)

strata=as.data.frame(strata)
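# The double loop below assigns to each observation the weight for its (treatment group, stratum)
# cell, using the data frame row names as indices into w.new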

for (t in 1:2){

for (s in 1:nq){

I=attributes(berk.birthwt[strata==s&smoke==(t-1),])$row.names

w.new[I]=w.ct[t,s]

}

}

# Weighted linear regression using new weight (weighting within strata by posner)

attach(berk.birthwt)

summary(lm(bwtt/16~smoke, weights=w.new))

ipwws=lm(bwtt/16~smoke, weights=w.new)$coef[2]

#Comparing ipw1 ipw2 ipw2lm ipw3 ipwdr ipwws

compare=as.vector(t(c(ipw1,ipw2,ipw2lm,ipw3,ipwdr,ipwws)))

names(compare)=c("ipw1","ipw2","ipw2lm","ipw3","ipwdr","ipwws")

compare

> compare

ipw1 ipw2 ipw2lm ipw3 ipwdr ipwws

-0.6542768 -0.5678549 -0.5678549 -0.5686379 -0.5736179 -0.5431399

#The results show that IPW1 gives the largest estimate (in absolute value) of the ATT. IPWDR is the second largest, although it is close to IPW2; IPW2 and IPW3 are very close to each other. The smallest estimate (in absolute value) comes from the within-strata weighting (ipwws).

########################################################################

#B. An attempt to visualize the weighting methods by replicating the observations according to their weights.

# Expand the sample applying inverse ps as weight

# Expand the sample 100 times and get a sample of effect estimates

# then use granova.1w to analyze the treatment effect

K=100

b=berk.birthwt

beta=rep(0,K)

data=NULL

for (k in 1:K)

{
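# For each observation x, replicate its row floor(w[x]) times, plus one extra time with
# probability equal to the fractional part of w[x] (stochastic rounding of the non-integer
# weights); note that w[x] %% 1 is the fractional part and w[x] - w[x] %% 1 is floor(w[x])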

berk.birthwt.list <- lapply(1:n,function(x){ if (runif(1)<w[x]%%1) {matrix(rep(b[x,],w[x]-w[x]%%1+1),byrow=T,nrow=w[x]-w[x]%%1+1)}

else {matrix(rep(b[x,],w[x]-w[x]%%1),byrow=T,nrow=w[x]-w[x]%%1)} })

#berk.birthwt.exp <- matrix(unlist(berk.birthwt.list),byrow=T,ncol=dim(berk.birthwt)[2])

berk.birthwt.exp=NULL

for (i in 1:n) {berk.birthwt.exp=rbind(berk.birthwt.exp,berk.birthwt.list[[i]]) }

berk.birthwt.exp.m<-matrix(sapply(berk.birthwt.exp,c),ncol=15)

colnames(berk.birthwt.exp.m)=dimnames(berk.birthwt)[[2]]

attach(as.data.frame(berk.birthwt.exp.m))

beta[k]=summary(glm(bwtt/16~smoke))$coef[2,1]

data=rbind(data,berk.birthwt.exp.m)

}

> head(beta)

#[1] -0.5663411 -0.5588715 -0.5805650 -0.5739439 -0.5694305 -0.5684621

> mean(beta)

#[1] -0.5685692

> sd(beta)

#[1] 0.01269660

# now mean(beta) is close to ipw2 (-0.5678549).

colnames(data)=dimnames(berk.birthwt)[[2]]

data=as.data.frame(data)

head(data)

dim(data)

granova.1w(yy=data$bwtt/16,group=data$smoke)

########################################################################

#C. An attempt to compare three weighting methods in loess graphics.

# Create 1 expanded sample using weight w; the propensity score ps is also bound

# into the original data

K=1

b=cbind(berk.birthwt,ps)

head(b)

beta1=rep(0,K)

data1=NULL

for (k in 1:K)

{

berk.birthwt.list <- lapply(1:n,function(x){ if (runif(1)<w[x]%%1) {matrix(rep(b[x,],w[x]-w[x]%%1+1),byrow=T,nrow=w[x]-w[x]%%1+1)}

else {matrix(rep(b[x,],w[x]-w[x]%%1),byrow=T,nrow=w[x]-w[x]%%1)} })

#berk.birthwt.exp <- matrix(unlist(berk.birthwt.list),byrow=T,ncol=dim(berk.birthwt)[2])

berk.birthwt.exp=NULL

for (i in 1:n) {berk.birthwt.exp=rbind(berk.birthwt.exp,berk.birthwt.list[[i]]) }

berk.birthwt.exp.m<-matrix(sapply(berk.birthwt.exp,c),ncol=16)

colnames(berk.birthwt.exp.m)=c(dimnames(berk.birthwt)[[2]],"ps")

attach(as.data.frame(berk.birthwt.exp.m))

beta1[k]=summary(glm(bwtt/16~smoke))$coef[2,1]

data1=rbind(data1,berk.birthwt.exp.m)

}

head(beta1)

mean(beta1)

sd(beta1)

#colnames(data1)=dimnames(berk.birthwt)[[2]]

data1=as.data.frame(data1)

head(data1)

# Expand the data using Posner's weight (w.new), weighting within strata

nq=5

strata=cut(ps, quantile(ps,seq(0,1,1/5)), include.lowest=TRUE, labels=FALSE)

totn=table(strata)

tn=table(strata[smoke==1])

cn=table(strata[smoke==0])

w.t=totn/tn

w.c=totn/cn

w.ct=rbind(w.c,w.t)

w.new=rep(0,n)

strata=as.data.frame(strata)

for (t in 1:2){

for (s in 1:nq){

I=attributes(berk.birthwt[strata==s&smoke==(t-1),])$row.names

w.new[I]=w.ct[t,s]

}

}

K=1

b=cbind(berk.birthwt,ps)

head(b)

beta2=rep(0,K)

data2=NULL

w=w.new

for (k in 1:K)

{

berk.birthwt.list <- lapply(1:n,function(x){ if (runif(1)<w[x]%%1) {matrix(rep(b[x,],w[x]-w[x]%%1+1),byrow=T,nrow=w[x]-w[x]%%1+1)}

else {matrix(rep(b[x,],w[x]-w[x]%%1),byrow=T,nrow=w[x]-w[x]%%1)} })

#berk.birthwt.exp <- matrix(unlist(berk.birthwt.list),byrow=T,ncol=dim(berk.birthwt)[2])

berk.birthwt.exp=NULL

for (i in 1:n) {berk.birthwt.exp=rbind(berk.birthwt.exp,berk.birthwt.list[[i]]) }

berk.birthwt.exp.m<-matrix(sapply(berk.birthwt.exp,c),ncol=16)

colnames(berk.birthwt.exp.m)=c(dimnames(berk.birthwt)[[2]],"ps")

attach(as.data.frame(berk.birthwt.exp.m))

beta2[k]=summary(glm(bwtt/16~smoke))$coef[2,1]

data2=rbind(data2,berk.birthwt.exp.m)

}

head(beta2)

mean(beta2)

sd(beta2)

#colnames(data1)=dimnames(berk.birthwt)[[2]]

data2=as.data.frame(data2)

head(data2)

dim(data2)

# Apply loess.psa to expanded dataset

attach(as.data.frame(data1))

par(mfrow=c(1,3))

attach(berk.birthwt)

bk.loess.plt=loess.psa(bwtt/16,smoke,fitted(bk.bwt.LR2),span=.6,int=10)

title(main='Original sample without weight')

bk.loess.exp=loess.psa(data1$bwtt/16,data1$smoke,data1$ps,span=.6,int=10)

title(main='Pumped by inverse propensity score weights')

bk.loess.posner=loess.psa(data2$bwtt/16,data2$smoke,data2$ps,span=.6,int=10)

title(main="Pumped by Posner's within strata weights")

par(mfrow=c(1,1))

bk.loess.plt

bk.loess.exp

bk.loess.posner

# [loess.psa summary tables for bk.loess.plt, bk.loess.exp, and bk.loess.posner, and the corresponding figures, not shown]

# It can be seen that, after weighting, the biggest change in the graphics is at the lower end of the dashed line, where the propensity score is small. In particular, the graphic "Pumped by inverse propensity score weights" shows the biggest drop in the dashed line.

# This is to be expected, because under inverse propensity score weighting the treatment-group observations with small propensity scores receive the largest weights and therefore have the largest effect on the graphic.

############################################################################

#D. An attempt to compare 3 weighting methods using data with extreme cases.

# When there are extreme cases (I added 2 extreme cases to the data, 1 in the smoker group and 1 in the nonsmoker group, with the propensity scores shown below),

#> ps.new1

#0.01038011

#> ps.new2

#0.9871458

# I again draw loess regression plots for the different methods.

# My goal is to see whether weighting within strata limits the extreme influence of the outliers, as expected.

#D.1 Artificially create an outlier with an extremely low propensity score in the treatment group: new1

# new is an artificially added observation with an extremely small ps (close to 0)

#summary(bk.bwt.LR2)

W1=cbind(parity,ded,age,ed,weight,racer,dracer,ps)

summary(W1)

new=W1[smoke==1&ps<=0.1,-9]

new

new[3]=20

new[4]=10

new=as.data.frame(t(new))

f.new=predict(bk.bwt.LR2,new)
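# the inverse logit of this linear predictor gives the propensity score of the artificial case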

ps.new1=1/(1+exp(-f.new))

ps.new1

m1=1/2*(mean(bwtt[smoke==0])+max(bwtt[smoke==0])) # Assume its birthweight is the average of the non-smoker group mean and the maximum non-smoker birthweight

m1

# Artificially add an outlier belonging to control group with extremely large ps (close to 1): new2

# Two artificial data points:

  parity ded age ed weight racer dracer         ps     bwtt
1      0   4 20  10    125     9      9 0.01038011 148.8803
1      0   2  1   0      1     1      6  0.9871458 92.90108

> tail(w) # notice that the last two artificial points have extremely large weights under inverse probability weighting

951 952 953 954 1 1

1.485113 2.523248 1.547271 1.606789 96.338049 77.795550

# Posner’s weight

> tail(w.new) # check Posner's weights for the last two observations; notice they are much smaller than under the IPW approach

[1] 1.436090 2.652778 1.605042 1.605042 3.840000 2.358025

#summary(bk.bwt.LR2)

new2=W1[smoke==0&ps==max(ps[smoke==0]),-9]

new2

new2[4]=0

new2[3]=1

new2[5]=1

new2=as.data.frame(t(new2))

f.new2=predict(bk.bwt.LR2,new2)

ps.new2=1/(1+exp(-f.new2))

ps.new2

m2=1/2*(mean(bwtt[smoke==1])+min(bwtt[smoke==1])) # Assume the birthweight is average of smoke group mean and minimum smoker birthweight

m2

# Create smoke.new, ps.new, and bwtt.new each with 2 extra artificial points

attach(berk.birthwt)

smoke.new=c(smoke,1,0)

ps.new=c(fitted(bk.bwt.LR2),ps.new1,ps.new2)

bwtt.new=c(bwtt,m1,m2)

# Loess regression plots of the original data and the artificial data

par(mfrow=c(1,2))

bk.loess.plt=loess.psa(bwtt/16,smoke,fitted(bk.bwt.LR2),span=.6,int=10)

title(main='Original sample without weight')

bk.loess.plt.new=loess.psa(bwtt.new/16,smoke.new,ps.new,span=.6,int=10)

title(main='Original sample + two artificial outliers without weight')

par(mfrow=c(1,1))


# Now pump the data using weights and redo loess regression plot

# Create 1 expanded sample using weight w; the propensity score ps is also bound

# into the original data

# Pump up data with artificially added 2 extreme outliers

K=1

b=cbind(bwtt.new,smoke.new,ps.new)

ps=ps.new

smoke=smoke.new

bwtt=bwtt.new

n=length(ps)

#define weight (here with two more points)

w=1/ps*(smoke==1)+1/(1-ps)*(smoke==0)

tail(w) # notice that the last two artificial points have extremely large weights under inverse probability weighting

# Pump original data to data1 using w

head(b)

beta1=rep(0,K)

data1=NULL

for (k in 1:K)

{

berk.birthwt.list <- lapply(1:n,function(x){ if (runif(1)<w[x]%%1) {matrix(rep(b[x,],w[x]-w[x]%%1+1),byrow=T,nrow=w[x]-w[x]%%1+1)}

else {matrix(rep(b[x,],w[x]-w[x]%%1),byrow=T,nrow=w[x]-w[x]%%1)} })

berk.birthwt.exp=NULL

for (i in 1:n) {berk.birthwt.exp=rbind(berk.birthwt.exp,berk.birthwt.list[[i]]) }

berk.birthwt.exp.m<-matrix(sapply(berk.birthwt.exp,c),ncol=3)

colnames(berk.birthwt.exp.m)=c("bwtt","smoke","ps")

attach(as.data.frame(berk.birthwt.exp.m))

beta1[k]=summary(glm(bwtt/16~smoke))$coef[2,1]

data1=rbind(data1,berk.birthwt.exp.m)

}

head(beta1)

mean(beta1)

sd(beta1)

#colnames(data1)=dimnames(berk.birthwt)[[2]]

data1=as.data.frame(data1)

head(data1)

tail(data2)

# Expand the data using Posner's weight (w.new), weighting within strata

# First, define Posner's weights using ps.new

bn=rbind(berk.birthwt,1,1)
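# Two placeholder rows are appended so that bn has one row for each element of ps.new;
# only the row indices of bn are used in the loop below, not the appended values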

smoke=smoke.new

nq=5

n=length(ps.new)

strata=cut(ps.new, quantile(ps.new,seq(0,1,1/5)), include.lowest=TRUE, labels=FALSE)

totn=table(strata)

tn=table(strata[smoke==1])

cn=table(strata[smoke==0])

w.t=totn/tn

w.c=totn/cn

w.ct=rbind(w.c,w.t)

w.new=rep(0,n)

strata=as.data.frame(strata)

for (t in 1:2){

for (s in 1:nq){

I=attributes(as.data.frame(bn[strata==s&smoke==(t-1),]))$row.names

w.new[I]=w.ct[t,s]

}

}