Stata help, Mar. 08
1 Basics
1.1 Help
1.2 Short cuts
1.3 Options
1.3.1 Memory, Max variables and max matrix size
1.4 Save commands
1.5 Save output
1.6 Notation
1.6.1 Variable names
1.7 Command syntax
1.7.1 By
1.7.2 Weights
1.7.3 If exp
1.7.4 Ranges
1.8 Prefix commands
1.9 Estimation commands
1.10 Postestimation commands
2 Functions
2.1 Mathematical functions
2.2 Statistical functions
2.2.1 Examples
2.3 Logical
3 Data handling
3.1 Import data
3.2 Use and save
3.3 Describe, labels
3.4 Formats
3.5 Recoding
3.6 Generate, replace
3.7 Extended generate
3.7.1 Functions
3.8 Drop, keep
3.9 Missing
3.10 Sort
3.11 String commands
3.12 Aggregate
3.13 Accessing results from commands
3.13.1 System variables
3.13.2 Saved results
3.13.3 Accessing results from commands, save as macros
4 Uni- and bivariate
4.1 List
4.2 Tabulate
4.2.1 One-way tables
4.2.2 Two-way tables
4.2.3 Three-way tables
4.3 Table of summary statistics
4.4 Means and confidence intervals
4.5 Summarize
4.6 T-test
4.6.1 Test of equal variance (standard deviation)
4.6.2 One way anova
4.7 Non-parametric analysis
4.8 Proportions
5 Graphics
5.1 Plot types
5.2 Graph Twoway
5.2.1 Twoway syntax
5.2.2 Twoway plot types
5.2.3 Twoway fitlines
5.3 Graph Bar, Hbar and Dot
5.3.1 Syntax
5.3.2 Options
5.4 Graph Box, Hbox
5.5 Graph Pie
5.5.1 Options
5.6 Graph Matrix
5.7 Other graphs
5.8 Titles
5.8.1 Title options
5.9 Legend
5.10 Axis scale, label, ticks and grid
5.10.1 Axis title
5.10.2 Axis scale
5.10.3 Axis labels and ticks
5.11 Text
5.12 Markers and marker labels
5.12.1 Markers
5.12.2 Marker labels
5.13 Lines
5.13.1 Connecting points
5.13.2 Line options
5.14 Text box options
5.15 Other options
5.15.1 Colors
5.15.2 Positions
5.16 Over()
5.17 By()
5.18 Schemes
5.19 Combining graphs
5.20 Graph query
5.21 Palettes
6 Regression commands
6.1 Regression models
6.1.1 Linear regression with simple error structure
6.1.2 GLM
6.1.3 Conditional logistic
6.1.4 Multiple outcome
6.1.5 Linear regression with complex error structure
6.1.6 Survival models
6.2 Orthogonal variables
6.3 Test after regression commands
6.3.1 Wald test
6.3.2 Likelihood ratio test
6.4 Cataloging estimation results
6.5 Cov, Corr, AIC, BIC and sample
6.6 Prediction
7 Linear regression
7.1.1 Test of assumptions
7.1.2 Test of influence
7.1.3 Test of multicollinearity
8 Logistic regression
8.1 Syntax
8.2 Categorical covariates
8.3 Residuals, goodness-of-fit
8.4 Diagnostic plots
9 ST Survival time data
9.1 Initial settings and description
9.2 Kaplan-Meier …
9.3 Survival regression models
9.3.1 Cox
9.3.2 Parametric survival
10 xtmixed -- Multilevel mixed-effects linear regression
10.1 Syntax
10.2 Random effect covariances
10.3 Predict
11 Data reduction
11.1 Factor analysis
12 Programming
12.1 Programs
12.1.1 Program definition
12.2 Macros
12.3 Loops
12.3.1 For loop
12.3.2 Foreach
12.3.3 While
12.4 Conditions
12.4.1 If
12.5 Matrix expressions
12.5.1 Matrix operators
13 GLLAMM
13.1 Installation
13.2 Data format
13.3 Syntax examples
13.3.1 A two-level random intercept model (logistic)
13.3.2 A two-level random intercept and slope model (linear)
13.3.3 A two-level random intercept model, x1 and x2 categorical
13.4 Prediction
13.4.1 Syntax and options
14 Survey commands
14.1 Setting stratification, clustering, finite population correction and sample weights
14.2 Means and proportions
14.3 Tables
14.4 Regression
14.5 Stata web links
1 Basics
1.1 Help
help cmd    show help file for cmd
1.2 Short cuts
Ctrl-R    run selection in do file
Ctrl-D    do selection in do file
Ctrl-Alt-T    start Stata
PgUp / PgDown    prev/next command in command window
#review n    show last n commands
Esc    clear command
1.3 Options
1.3.1 Memory, Max variables and max matrix size
set memory 100m    default = 10 Mb, max = as large as OS allows
set maxvar 1000    default = 5000, max = 32767
set matsize 500    default = 400, max = 11000
set xxx, permanently    will set for all sessions
1.4 Save commands
cmdlog using myfile    start a command log file
cmdlog close    close (and save) command log file
Can also save the Review window as a do file: click on the "minus" at the upper left.
1.5 Save output
(set more off), begin log, ..., close log, save log, print log
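The same log workflow can also be typed as commands instead of using the menus. A minimal sketch, assuming it is run from a do file; the file name myoutput and the auto example dataset shipped with Stata are illustrative choices, not part of the original notes.

```stata
* Write everything between "log using" and "log close" to a text log
set more off
log using myoutput, replace text   // myoutput is a hypothetical file name
sysuse auto, clear                 // example dataset shipped with Stata
summarize price mpg
log close
```

The text option gives a plain-text .log file that can be opened and printed from any editor.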
1.6 Notation
==    equal
~= (or !=)    not equal
&    and
|    or
~ (or !)    not
x^2    x squared
+    string concatenation
.    missing
x[3]    3rd observation of x
x[_n-1]    previous value of x
replace x=2 if _n==3    x[3]=2
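A small sketch of the subscript notation above, on a made-up five-row dataset (the variable names x and prev are illustrative):

```stata
* Build a toy dataset: x = 10, 20, 30, 40, 50
clear
set obs 5
generate x = _n * 10
* prev holds the previous value of x; it is missing in the first row
generate prev = x[_n-1]
* Change only the 3rd observation
replace x = 2 if _n == 3
list
```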
1.6.1 Variable names
Names can be 1-32 characters long: letters (case sensitive), digits and underscore. Start with a letter.
1.7 Command syntax
[by varlist:] command [varlist] [weight] [if exp] [in range] [using filename] [, options]
Note: all commands are lower-case letters!
1.7.1 By
by varlist:    repeat for all combinations of values in varlist, use sort varlist first
by varlist, sort:    same, but sorts first (no separate sort needed)
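Both forms of the by prefix, sketched on the auto example dataset shipped with Stata (the dataset and the new variable name mean_mpg are illustrative assumptions):

```stata
sysuse auto, clear
* One step: sort and repeat the command for each value of foreign
by foreign, sort: summarize mpg
* Two steps: explicit sort, then by
sort foreign
by foreign: egen mean_mpg = mean(mpg)
```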
1.7.2 Weights
[weighttype=var]
fweight=freq    frequency weighting for aggregated data
aweight=1/var    analytic weighting by precision (inverse variance)
pweight=1/prob    probability weighting by inverse sampling probability
iweight=    importance weighting, manual control of weights
Ref: U 23.13 and U 30
1.7.3 If exp
if exp    do if exp is true (note: missing is included in comparisons such as x>10)
1.7.4 Ranges
in range    restrict to range (in first/last), f=first, l=last, -n from end. Ex: 5/25, -10/l
list x in 5/10    x from 5 to 10
list x in f/10    x from first to 10
list x in -10/l    x from -10 to last = 10 last observations
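The range forms above, tried on the auto example dataset (an assumption; any dataset with at least 10 observations would do):

```stata
sysuse auto, clear
list make price in 1/5      // observations 1 to 5
list make price in f/3      // first to 3
list make price in -10/l    // the 10 last observations
count in -10/l              // counts the observations in the range
```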
1.8 Prefix commands
by:
statsby:
bootstrap:
jackknife:
simulate:
svy:
stepwise:
xi:
1.9 Estimation commands
1.10 Postestimation commands
mfx    marginal effects
adjust    adjusted means
estat vce    variance/covariance of estimates
predict, predictnl
ereturn list    list of saved results
test, testnl    linear and nonlinear Wald tests
lrtest    likelihood-ratio tests
lincom    point estimates and conf. int. of linear combinations
nlcom    nonlinear combinations
estimates    store and retrieve results
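A short sketch of how some of these postestimation commands follow an estimation command. The model (price on mpg and weight in the auto example dataset) and the stored name m1 are illustrative assumptions.

```stata
sysuse auto, clear
regress price mpg weight
test mpg weight          // joint Wald test of both coefficients
lincom mpg + weight      // estimate and CI of a linear combination
estat vce                // covariance matrix of the estimates
estimates store m1       // keep the results under the name m1
ereturn list             // show what the estimation command saved
```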
2 Functions
2.1 Mathematical functions
sqrt()
ln() or log()    natural log
log10()
abs()
int()
exp()
min(x1,…,xn), max(x1,…,xn)
2.2 Statistical functions
comb(n,k)    "n over k" (binomial coefficient)
binomial(n,k,p)
chi2(df,x)    cumulative chi2
normden(z,s)    density of N(0,s^2)
norm(z)    cumulative N(0,1)
uniform()    uniform random number, 0-1
2.2.1 Examples
a+(b-a)*uniform()    random uniform on [a,b)
a+int((b-a+1)*uniform())    random integers in [a,b]
mu+s*invnorm(uniform())    random normal with mean mu and variance s^2
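The three recipes above as a runnable sketch. It assumes a newer Stata release, where the documented function names are runiform() and invnormal() rather than uniform() and invnorm(); the values a=2, b=5, mu=10, s=2 and the seed are arbitrary.

```stata
clear
set obs 1000
set seed 12345
local a = 2
local b = 5
* Random uniform between a and b
generate u = `a' + (`b' - `a') * runiform()
* Random integers a, a+1, ..., b
generate k = `a' + int((`b' - `a' + 1) * runiform())
* Random normal with mean 10 and standard deviation 2
generate z = 10 + 2 * invnormal(runiform())
summarize u k z
```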
2.3 Logical
cond(x,a,b)    if x then a else b
3 Data handling
3.1 Import data
Use DBMS/COPY to convert from SPSS to Stata format. Choose Stata 6 with 8-byte doubles as the output file format.
3.2 Use and save
use file.dta
save newfile.dta    save new copy
save file.dta, replace    overwrite original data
3.3 Describe, labels
describe    overview of variables
label var varname "text"    variable label
label define lblname # "text" # "text" …    define mapping between numeric values (#) and labels ("text") called lblname
label values varname lblname    associate mapping with variable
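The define-then-attach sequence for value labels, sketched on the auto example dataset. The variable heavy, the cut-off 3000 and the label name heavylbl are made up for illustration.

```stata
sysuse auto, clear
label variable mpg "Mileage (miles per gallon)"
* A 0/1 variable to carry the value labels
generate byte heavy = weight > 3000 if !missing(weight)
label define heavylbl 0 "Light" 1 "Heavy"
label values heavy heavylbl
tabulate heavy
describe mpg heavy
```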
3.4 Formats
format varname %w.d[type]    w = width in columns, d = decimal places; type: g = general, f = fixed, s = string
Examples: %9.0g, %9.2f, %10s
3.5 Recoding
recode varlist (rule) (rule), gen(varlist) copy    syntax
recode x (1 2=1 low) (3 4=2 high) (missing=.), gen(x2)    recode 1 and 2 into 1, 3 and 4 into 2, give labels and generate new x2
recode x (1=2) if sex==1, gen(x2) copy    copy values for sex!=1
egen ageGr3=cut(age), group(3) label    3 equal-sized groups
egen ageGr2=cut(age), at(0,50,80) label    2 groups 0-50, 50-80, values outside set to missing
encode stringvar, generate(newvar)    make numerical newvar (1,2,3…) based on stringvar values
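A sketch of recode and egen cut() on the auto example dataset (an assumption, as are the new variable names rep2 and mpgGr3). The labels are quoted here, which is the safe form when a label contains spaces.

```stata
sysuse auto, clear
* Collapse the 5-level repair record into 2 labelled groups, keep missing as missing
recode rep78 (1 2 = 1 "low") (3 4 5 = 2 "high") (missing = .), gen(rep2)
* Three groups of roughly equal size
egen mpgGr3 = cut(mpg), group(3) label
tabulate rep2, missing
tabulate mpgGr3
```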
3.6 Generate, replace
generate newvar=exp    create new variable
replace oldvar=exp
gen agegr=age>=30 if age!=.    missing values are greater than all numerical values
gen xlag=x[_n-1]
gen xlead=x[_n+1]
3.7 Extended generate
egen [type] newvar = fcn(arguments) [if exp] [in range] [, options]
egen newvar=fcn(arg)    extended generate: make newvar from stored functions
Ex: by code, sort: egen mx=median(x)    gives medians of x by values of code
by ... : may be used with some egen functions
3.7.1 Functions
count(exp)    number of nonmissing observations of exp
cut(varname), {at(#,#,...,#)|group(#)}    cut at the at() numbers, or in equal groups
mean, median, max, min, std, sum
pctile(exp) [, p(#)]    percentiles
group(var1 var2)    new var from all combinations of var1 and var2
rmiss
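A few of the egen functions above, sketched on the auto example dataset (the dataset and the new variable names are illustrative assumptions):

```stata
sysuse auto, clear
* Median of mpg within each value of foreign
by foreign, sort: egen med_mpg = median(mpg)
* One group number per combination of foreign and rep78 (missing if either is missing)
egen grp = group(foreign rep78)
* 90th percentile of price, the same value in every row
egen p90 = pctile(price), p(90)
list foreign rep78 grp med_mpg p90 in 1/5
```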
3.8 Drop, keep
drop varnames    drop variables from memory
drop in 3    drop observation 3
keep var1-var5    keep variables 1 to 5. Note: keep if age>=10 will also keep missing.
drop if age==.    remove missing
3.9 Missing
.    numerical missing
""    string missing
missing(x)    is equivalent to x==. if x is numeric, and to x=="" if x is string
Missing values are greater than all numerical values and are sorted last, so age>=30 will include missing.
gen agegr=age>=30 if age!=.
drop if age==.    remove missing
mvdecode x1, mv(99)    set 99 to missing
mvencode x1, mv(.=99)    set missing to 99
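The missing-value trap can be seen on a four-row toy dataset. A minimal sketch; the values and the variable names wrong and agegr are made up.

```stata
clear
input age
25
35
99
.
end
* 99 was used as a missing code: turn it into a real missing value
mvdecode age, mv(99)
* Without the if qualifier, missing ages are coded 1
generate wrong = age >= 30
* With the if qualifier, missing ages stay missing
generate agegr = age >= 30 if !missing(age)
list
```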
3.10 Sort
sort varname    sort by variable. Use before "by var:" command
3.11 String commands
fname+" "+lname    string concatenation
substr(name,1,10)
See U 16.3.5
3.12 Aggregate
contract vars, freq(fname) percent(pname)    contract (aggregate) over variable patterns to freq and percents
collapse vars    collapse data to means (or other stats) over variable patterns
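Both commands replace the data in memory, so a sketch is easiest to follow with preserve and restore around the first one. The auto example dataset and the new variable names n and pct are illustrative assumptions.

```stata
sysuse auto, clear
* contract: one row per observed pattern of foreign x rep78
preserve
contract foreign rep78, freq(n) percent(pct)
list
restore
* collapse: one row per value of foreign, with chosen statistics
collapse (mean) mpg (median) price, by(foreign)
list
```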
3.13 Accessing results from commands
3.13.1 System variables
_b[varname]    regression coef
_b[_cons]    intercept
_se[varname]    SE of regression coef
_n    current observation
_N    total number of obs
_pi    pi
Ex: regress y x, _b[_cons] gives constant term, _b[x[1]] gives coeff of first category of x, _se[x[1]] gives standard error
Ex: xi: regress y i.x, _b[_Ix_2] gives coef of second level of x (created dummy called _Ix_2)
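A sketch of reading coefficients and standard errors after a regression. The model (price on mpg in the auto example dataset) is an illustrative assumption.

```stata
sysuse auto, clear
regress price mpg
display _b[_cons]              // constant term
display _b[mpg]                // slope for mpg
display _se[mpg]               // its standard error
display _b[mpg] / _se[mpg]     // t statistic, computed by hand
display _N                     // number of observations in the data
```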
3.13.2 Saved results
return list    run after a command to find list of saved results
ereturn list    run after an estimation command to find list of saved estimation results
e(name)    estimation class, live until next estimation
r(name)    result class, live until next command
Ex: summarize age, then gen agedev=age-r(mean)
Ex: regress y x1 x2, then matrix B=e(b) and matrix V=e(V)    save coefficient and covariance matrices
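The two examples above written out one command per line, on the auto example dataset (an assumption; the names mpgdev, B and V are illustrative):

```stata
sysuse auto, clear
* r() results: use them right away, the next r-class command overwrites them
summarize mpg
generate mpgdev = mpg - r(mean)
* e() results: live until the next estimation command
regress price mpg weight
matrix B = e(b)     // coefficient row vector
matrix V = e(V)     // covariance matrix of the coefficients
matrix list B
matrix list V
```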
3.13.3 Accessing results from commands, save as macros
sum w if c==1    mean of w for c=1
global w1=r(mean)    save as global macro
dis $w1    show content of macro
4 Uni- and bivariate
4.1 List
list varlist [, [no]display nolabel]    list variables; nodisplay gives tabular data, nolabel gives values
list varname_i-varname_j    list a group of variables
list in 3    3rd observation, -1 = last, 1/10 = 1 to 10
list if exp    list if var>10, list if var==10
list var1 if var2==.    list if var2 is missing
4.2 Tabulate
4.2.1 One-way tables
tabulate var [weight] [if expr] [in range] [, nofreq plot missing nolabel]    nolabel shows category values
tab1 varlist    one-way tables for all variables
tab c, gen(c)    create dummies c1, c2, ... for each category of c
4.2.2 Two-way tables
tab var1 var2 [weight] [if expr] [in range] [, nofreq col row cells chi2 exact missing nolabel]
tab var1 var2, nofreq col chi2    crosstab with column %, no freq, with chi-square test
tab var1 var2, exact    Fisher exact test
tabi 30 20 \ 20 10, col chi2    immediate table
tab var1 var2, summarize(var3)    mean, sd and freq of var3 by var1 and var2. Use means, standard or freq to limit the output
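The two-way forms above, sketched on the auto example dataset (an illustrative assumption); the immediate table uses the same four counts as the tabi line above.

```stata
sysuse auto, clear
* Column percentages only, with chi-square test
tabulate foreign rep78, nofreq col chi2
* Fisher exact test
tabulate foreign rep78, exact
* Immediate table typed in row by row
tabi 30 20 \ 20 10, col chi2
* Means of mpg by foreign, nothing else
tabulate foreign, summarize(mpg) means
```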
4.2.3 Three-way tables
sort var3
by var3: tab var1 var2
4.3 Table of summary statistics
table rowvar [colvar [supercolvar]] [if] [in] [weight] [, options]
table rowvar, contents(clist) row col    clist: freq, mean, sd, sum, n, max, min, median, p# (percentile), iqr. Totals: row col. Show missing: missing
table rowvar colvar supercolvar, by(superrowvarlist)    multi-way tables
Ex: table sex, c(n age mean age mean educ) row    subjects, mean age and mean educ by sex, plus total row
tabstat varlist [if] [in] [weight] [, options]
epitab
4.4 Means and confidence intervals
means varlist    3 types of means with CI
ci varlist, binomial poisson total    CI for means, proportions and counts
4.5 Summarize
summarize vars    number, mean, sd, min, max. Summarize alone takes all variables.
summarize vars, detail    percentiles, var, skew, kurt
inspect var    details on values
4.6 T-test
ttest var=#    one-sample t-test
ttest var, by(c)    two-sample t-test
ttest var1=var2    paired two-sample t-test
ttest var1=var2, unpaired    two-sample t-test
, unequal    equal variances not assumed
Ex: sdtest age, by(sex); if equal variances are rejected: ttest age, by(sex) unequal
4.6.1 Test of equal variance (standard deviation)
sdtest var=#    standard deviation = #
sdtest var, by(c)    two groups compared
sdtest var1=var2    same variance in both variables
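The variance check followed by the t-test, sketched on the auto example dataset (an assumption; mpg by foreign stands in for age by sex):

```stata
sysuse auto, clear
* Step 1: test whether the two groups have the same variance
sdtest mpg, by(foreign)
* Step 2: if equal variances are rejected, ask for the unequal-variance t-test
ttest mpg, by(foreign) unequal
* One-sample test against a fixed value
ttest mpg == 20
```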
4.6.2 One way anova
oneway response_var factor_var [weight] [if exp] [in range] [, noanova nolabel missing wrap tabulate [no]means [no]standard [no]freq [no]obs bonferroni scheffe sidak]
Ex: oneway var c, tabulate    analysis of var by c
4.7 Non-parametric analysis
by gender, sort: centile partners, centile(25 50 75) cci    percentiles with exact confidence interval
ranksum partners, by(gender)    Mann-Whitney test = Wilcoxon rank sum, 2 groups
kwallis partners, by(age3)    Kruskal-Wallis K-group test
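The same three commands on the auto example dataset, with mpg, foreign and rep78 standing in for partners, gender and age3 (all illustrative assumptions):

```stata
sysuse auto, clear
* Quartiles with confidence intervals, separately for each group
by foreign, sort: centile mpg, centile(25 50 75) cci
* Two groups: Wilcoxon rank-sum (Mann-Whitney)
ranksum mpg, by(foreign)
* More than two groups: Kruskal-Wallis
kwallis mpg, by(rep78)
```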
4.8 Proportions
proportion x1, over(c)    proportions with CI
5 Graphics
5.1 Plot types
graph twoway    scatter, line, density, histogram, function, ..
graph matrix
graph bar, hbar, dot
graph box
graph pie
5.2 Graph Twoway
5.2.1 Twoway syntax
graph twoway plot [if exp] [in range] [, options]    twoway syntax (graph may be omitted)
where plot = (plottype varlist, options)    plot syntax, several plots may be listed and combined
where varlist = y1 y2 … x    last variable is x
Ex: twoway scatter y x    plot y by x
5.2.2 Twoway plot types
scatter, line, connected, area
dot, bar, histogram, kdensity    kdensity = kernel density
function y=f(x), range(x1 x2)    f(x) from x1 to x2
rarea rcap rbar    range area, range cap, range bar
Ex: twoway area y x, sort base(50)    gives shading from 50
Ex: histogram x, bin(10) start(-2.5) percent    or frequency instead of percent
Ex: twoway (histogram x, width(1) frequency) (kdensity x, area(3200))    area scaled to the sum of subjects
Ex: function y=normden(x), range(-4 4) droplines(-1.96 1.96)    function plots
Ex: twoway dropline db id if abs(db)>.25, mlabel(id)    delta-beta > 0.25
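Two of the examples above as a runnable sketch on the auto example dataset (an assumption). With width(2) and 74 cars the histogram area is 74*2 = 148, so area(148) puts the density on the same scale. normalden() is the documented name of normden() in newer Stata releases.

```stata
sysuse auto, clear
* Frequency histogram with a kernel density scaled to the same area
twoway (histogram mpg, width(2) frequency) (kdensity mpg, area(148))
* Standard normal density with drop lines at +/- 1.96
twoway function y = normalden(x), range(-4 4) droplines(-1.96 1.96)
```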
5.2.3 Twoway fitlines
lfit, qfit, mband, mspline, lowess    linear and quadratic fits, median band, median splines and lowess
lfitci, qfitci, fpfitci    fit with CI: linear, quadratic, fractional polynomial
Ex: (lfitci y x, ciplot(rline))    default is rarea
Ex: twoway (lfit y x) (lowess y x) (scatter y x)    scatter with linear and lowess fit
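Fit lines are ordinary twoway plots, so they are overlaid by listing them in parentheses. A sketch on the auto example dataset (an illustrative assumption):

```stata
sysuse auto, clear
* Linear fit with shaded CI, lowess smoother and the raw points
twoway (lfitci mpg weight) (lowess mpg weight) (scatter mpg weight)
* Quadratic fit with the CI drawn as lines instead of an area
twoway (qfitci mpg weight, ciplot(rline)) (scatter mpg weight)
```

Listing the CI plot first keeps the shaded area behind the points.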
5.3 Graph Bar, Hbar and Dot
5.3.1 Syntax
graph bar/hbar/dot yvars [if exp] [in range] [, options]
where yvars = varlist, or (stat) varlist, or (stat) name=varname
stat = mean, median, p1, p2, …, p99, sum, count, min, max
Ex: graph bar x, over(c) nofill    means of x over categories of c
Ex: graph bar (mean) meany=x (median) medy=x    mean and median of the same variable
Ex: graph bar (median) x1 x2, percent stack    stacked percentages
5.3.2 Options
nofill    skip empty categories
sort(1)    sort by the first y variable
over(c1)    values for each c1
by(c2)    separate plots for each c2
bargap(0)    gap between bars in % of bar width: -30 = 30% overlap, 30 = gap
blabel(what, where_and_how)    bar labels
what: bar / total / name / group    print height, total height, name of yvar, name of first over() group
where_and_how:
position(outside / inside / base / center)    where to place the bar label
format(%9.1f) gap(rel_size) textbox_options    options for labels
Ex: graph bar teq1, over(landsdel) nofill blabel(bar, pos(inside) size(*1.3) format(%9.1f) color(white))
Ex: graph hbar teq1, over(landsdel, axis(off) sort(1)) nofill blabel(group, pos(base) size(*1.3) format(%9.1f) color(white))
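The two examples use variables from the author's own data. A comparable sketch on the auto example dataset (an assumption), with mpg over rep78:

```stata
sysuse auto, clear
* Mean mpg per repair category, bar height printed inside each bar
graph bar (mean) mpg, over(rep78) nofill ///
    blabel(bar, position(inside) format(%9.1f) color(white))
* Horizontal version, categories sorted by bar height
graph hbar (mean) mpg, over(rep78, sort(1)) nofill ///
    blabel(bar, position(base) format(%9.1f))
```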
5.4 Graph Box, Hbox
graph box x1 x2 x3, ascategory    boxplot of separate variables, ascat puts labels on the y-axis
graph hbox x, over(c, total)    plot of x over cat of c plus total
5.5 Graph Pie
graph pie x1 x2 x3    sum of x1, x2 and x3
graph pie x, over(c)    sum of x for each category of c
graph pie, over(c)    number of cases for each category of c
5.5.1 Options
plabel(_all sum / percent / name / text, text_box_options)    label all slices with sum, percent, x-names or a given text
5.6 Graph Matrix
graph matrix x1-x5    scatter of all 5 variables
5.7 Other graphs
gladder y, qladder y    histograms over different transformations of y, Q-Q plots of the same
5.8 Titles
title("text"), xtitle("text"), ytitle("text")    titles
title, subtitle, caption, note    title types
5.8.1 Title options
position(clockpos)
ring(ringpos)
span
text_box_options
Ex: scatter teq1 moralder, title("Title", position(12) ring(0))
5.9 Legend
legend([contents] [location])
Contents:
order(1 2 3)    may also use order(1 "label1" 2 3)
label(1 "label1")    override legend for var 1
cols(1)    legend in 1 column. rows(1) …
stack    stack symbol and text
rowgap(2) colgap(2)    gap between each element
Location:
on/off    legend on/off
position(clock)    position of legend
ring(1)    radial distance from plot, ring(0) = inside
Ex: legend(label(1 "Density of TEQ") label(2 "Mean") label(3 "Median") ring(0) pos(2) cols(1))
Ex: graph bar teq_di teq_fu teq_npcb teq_mopc teq_hcb ///
, legend(row(1) stack colgap(10) label(1 "Dioxin") label(2 "Furan") label(3 "Non-o") label(4 "Mono") label(5 "HCB"))
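A sketch of the legend options on the auto example dataset (an assumption; the names mean_mpg and med_mpg and the label texts are made up):

```stata
sysuse auto, clear
* Two statistics of the same variable, relabelled in a one-row legend below the plot
graph bar (mean) mean_mpg=mpg (median) med_mpg=mpg, over(foreign) ///
    legend(rows(1) stack label(1 "Mean mpg") label(2 "Median mpg") position(6))
```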
5.10 Axis scale, label, ticks and grid
5.10.1 Axis title
x|ytitle("line1" "line2")
5.10.2 Axis scale
x|yscale(opts)
Options:
axis(1)    axis to modify (1-9)
[no]log
[no]reverse
range(0 100)    extend range, will not decrease range. range(0): start at 0, range(100): end at 100
alt    axis at alternative side
on/off    axis on/off
Ex: scatter teq1 moralder, xscale(range(0 80)) yscale(off)    no y-axis
5.10.3 Axis labels and ticks
x|ylabel(rule_or_values, opts)    major ticks and labels
x|ytick(rule_or_values)    major ticks
x|ymlabel(rule_or_values)    minor ticks and labels
x|ymtick(rule_or_values)    minor ticks
rule or values (may use both):
#10    10 nice values
1 5 50    labels at 1, 5 and 50
0 5 10 "mean" 15 20    labels every 5, with mean printed at 10
0 (10) 100    labels from 0 to 100 in steps of 10
minmax    min and max values
none
Label options:
angle(0)
[no]grid    add gridlines
format(%5.0f)    5 places, 0 decimals, fixed
Ex: xlabel(1 "Low" 2 "Medium" 3 "High", angle(45))    text labels at values 1, 2 and 3, at 45 deg
Ex: scatter teq1 moralder, xlabel(#10, grid)
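The scale, label and title options combined in one command, sketched on the auto example dataset (an assumption; the ranges and title text are arbitrary):

```stata
sysuse auto, clear
* x axis extended to start at 0, labels every 1000 with grid lines,
* about 5 nice y labels printed horizontally, two-line x title
scatter mpg weight, xscale(range(0 5000)) ///
    xlabel(0(1000)5000, grid) ylabel(#5, angle(0) format(%5.0f)) ///
    xtitle("Weight" "(lbs.)")
```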
5.11 Text
text(y x "text", opts)    text at y,x in the plot
placement(c)    c = centered, n = north, s = south, ..
orientation(vertical)
box    draw box around text
Ex: graph …, text(10 50 "Line1" "Line2", just(left) color(blue))    two lines of text at (y,x)=(10,50)
5.12 Markers and marker labels
5.12.1 Markers
mstyle(p1 p2)    default styles
msymbol(sym1 sym2 …)    marker symbols: S square, s small square, Sh hollow square, D diamond, T triangle, O circle, X, +, p point, . default, i invisible. Ex: msymbol(S)
msize(small medium large), msize(*2)    small, medium or large markers; twice the size
mcolor(green)    both outside and inside color
Ex: msymbol(. t Oh)    markers for 3 variables: default, small triangles and hollow circles
Ex: twoway scatter y x [aweight=z], msymbol(oh) msize(small)    point size proportional to z
5.12.2 Marker labels
mlabel(var)    label marker by var content
mlabsize(size)
mlabcolor(color)
mlabposition(12)    label at 12 o'clock position
mlabvposition(var)    positions based on variable containing clock positions
mlabgap(*3)    3 times larger gap between marker and label
Ex: scatter y x, mlabel(z) mlabpos(0) msymbol(i)    use contents of z to label points, labels in the center and invisible points
5.13 Lines
5.13.1 Connecting points
twoway scatter y x, connect(l) sort    sort points, connect with line
connect(l)    line
connect(L)    separate line for each series
connect(J) / connect(stepstair)    step functions, for survival curves
5.13.2 Line options
lcolor(red)    line color
lwidth(thick) or lwidth(*3)    thick line
lpattern(dash)
lpattern("l" ".-" "-###")    solid, dot-dashed, dash + 3 spaces
5.14 Text box options
tstyle(textboxstyle)    overall style
box/nobox    border
size(textsizestyle)
color(colorstyle)    text color
justification(justificationstyle)    text left, center, right
alignment(alignmentstyle)    text top, middle, bottom, baseline
bfcolor(colorstyle)    background color
bcolor(colorstyle)    background and border color
blstyle(linestyle)    style of border
orientation(orientationstyle)    vertical/horizontal, rvertical/rhorizontal
placement(compassdirstyle)    location
ring(1)    0: inside, 1-7: outside
format(%9.1f)    9 places, 1 decimal, fixed
Ex: graph …, title("My title", color(red) box size(*1.5))
5.15 Other options
5.15.1 Colors
black, white, red, blue, cyan, green, mint, yellow, …
gs0 … gs16    gray scales from black to white
gray = gs8
color*0.5    half the intensity
5.15.2 Positions
clockpos(12)    12 o'clock. clockpos(0) means center if valid
placement(north)    alternative to clock with 9 positions
ring(1)    0: inside, 1-7: outside
justification(left / centered / right)    text justification
alignment(top / middle / bottom / baseline)    text alignment
orientation(horizontal / vertical / rhorizontal / rvertical)
5.16 Over()
over(c, total)    split by categories of c plus total, can use over(c1) over(c2)
over(c, descending)    sort values
over(c, sort(c2)), sort(1)    sort by c2 or by the first y variable
over(var, relabel(1 "lab1" 2 "lab2"))    new labels for "over" variable
ascategory / asyvars    as categories: plotted with spaces, as yvars: plotted dense
missing, nofill    show missing, do not show empty combinations
Ex: graph bar teq_di teq_fu, over(landsdel, total) nofill
5.17 By()
by(varlist, suboptions)    separate graphs for each varlist
total    add total group
missing    add missing groups
colfirst    display down columns
rows(#), cols(#)    number of rows or cols
holes(numlist)    positions to leave blank
compact
Ex: graph bar teq_di teq_fu, by(star, total rows(1) compact)
5.18 Schemes
set scheme schemename [, permanently]    set overall look of graphs
graph …, scheme(schemename)    set overall look for current graph
graph query, schemes    list installed schemes
schemenames:
s2color    default, will vary colors of lines and markers
s2mono    monochrome, will vary patterns of lines and markers
5.19 Combining graphs
graph …, saving(plt1, replace)    saving to file
graph …, name(plt1, replace)    saving to memory
graph use plt1.gph or display plt1.gph    show saved graph from file
graph combine plt1 plt2, ycommon cols(1)    combine from memory in 1 column with same y scaling
graph combine plt1.gph plt2.gph    combine from file
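Combining from memory, sketched on the auto example dataset (an assumption). The two panels show the same y variable for the two values of foreign, so ycommon makes them directly comparable.

```stata
sysuse auto, clear
* Build two graphs in memory without drawing them
scatter mpg weight if foreign == 0, name(plt1, replace) title("Domestic") nodraw
scatter mpg weight if foreign == 1, name(plt2, replace) title("Foreign") nodraw
* Stack them in one column with a shared y scale
graph combine plt1 plt2, ycommon cols(1)
```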
5.20 Graph query
graph query    list of all style types
graph query color    list of all colorstyles
graph query linepattern    list of all linepatternstyles
5.21 Palettes
palette line    plot showing the line types
palette symbol    plot showing the symbol types
palette color color1 color2    plot comparing colors
6 Regression commands
6.1 Regression models
6.1.1 Linear regression with simple error structure
regress    linear regression (also heteroskedastic errors)
boxcox    linear regression on Box-Cox transformations of y and x's
nl    nonlinear least squares
6.1.2 GLM
logistic    logistic regression
poisson    Poisson regression
binreg    binary outcome, OR, RR, or RD effect measures
glm    use for non-canonical links
6.1.3 Conditional logistic
clogit    for matched case-control data
6.1.4 Multiple outcome
mlogit    multinomial logit (not ordered)
ologit    ordered logit
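All of these commands share the pattern command outcome covariates. A sketch with three of them on the auto example dataset; the choice of outcomes and covariates is purely illustrative and not a recommended model.

```stata
sysuse auto, clear
* Binary outcome: odds ratios
logistic foreign mpg weight
* Count-type outcome
poisson rep78 mpg
* Ordered categorical outcome
ologit rep78 mpg
```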
6.1.5 Linear regression with complex error structure