Stata help (Mar. 08)

1 Basics

1.1 Help

1.2 Short cuts

1.3 Options

1.3.1 Memory, Max variables and max matrix size

1.4 Save commands

1.5 Save output

1.6 Notation

1.6.1 Variable names

1.7 Command syntax

1.7.1 By

1.7.2 Weights

1.7.3 If exp

1.7.4 Ranges

1.8 Prefix commands

1.9 Estimation commands

1.10 Postestimation commands

2 Functions

2.1 Mathematical functions

2.2 Statistical functions

2.2.1 Examples

2.3 Logical

3 Data handling

3.1 Import data

3.2 Use and save

3.3 Describe, labels

3.4 Formats

3.5 Recoding

3.6 Generate, replace

3.7 Extended generate

3.7.1 Functions

3.8 Drop, keep

3.9 Missing

3.10 Sort

3.11 String commands

3.12 Aggregate

3.13 Accessing results from commands

3.13.1 System variables

3.13.2 Saved results

3.13.3 Accessing results from commands, save as macros

4 Uni- and bivariate

4.1 List

4.2 Tabulate

4.2.1 One-way tables

4.2.2 Two-way tables

4.2.3 Three-way tables

4.3 Table of summary statistics

4.4 Means and confidence intervals

4.5 Summarize

4.6 T-test

4.6.1 Test of equal variance (standard deviation)

4.6.2 One way anova

4.7 Non-parametric analysis

4.8 Proportions

5 Graphics

5.1 Plot types

5.2 Graph Twoway

5.2.1 Twoway syntax

5.2.2 Twoway plot types

5.2.3 Twoway fitlines

5.3 Graph Bar, Hbar and Dot

5.3.1 Syntax

5.3.2 Options

5.4 Graph Box, Hbox

5.5 Graph Pie

5.5.1 Options

5.6 Graph Matrix

5.7 Other graphs

5.8 Titles

5.8.1 Title options

5.9 Legend

5.10 Axis scale, label, ticks and grid

5.10.1 Axis title

5.10.2 Axis scale

5.10.3 Axis labels and ticks

5.11 Text

5.12 Markers and marker labels

5.12.1 Markers

5.12.2 Marker labels

5.13 Lines

5.13.1 Connecting points

5.13.2 Line options

5.14 Text box options

5.15 Other options

5.15.1 Colors

5.15.2 Positions

5.16 Over()

5.17 By()

5.18 Schemes

5.19 Combining graphs

5.20 Graph query

5.21 Palettes

6 Regression commands

6.1 Regression models

6.1.1 Linear regression with simple error structure

6.1.2 GLM

6.1.3 Conditional logistic

6.1.4 Multiple outcome

6.1.5 Linear regression with complex error structure

6.1.6 Survival models

6.2 Orthogonal variables

6.3 Test after regression commands

6.3.1 Wald test

6.3.2 Likelihood ratio test

6.4 Cataloging estimation results

6.5 Cov, Corr, AIC, BIC and sample

6.6 Prediction

7 Linear regression

7.1.1 Test of assumptions

7.1.2 Test of influence

7.1.3 Test of multicollinearity

8 Logistic regression

8.1 Syntax

8.2 Categorical covariates

8.3 Residuals, goodness-of-fit

8.4 Diagnostic plots

9 ST Survival time data

9.1 Initial settings and description

9.2 Kaplan-Meier …

9.3 Survival regression models

9.3.1 Cox

9.3.2 Parametric survival

10 xtmixed -- Multilevel mixed-effects linear regression

10.1 Syntax

10.2 Random effect covariances

10.3 Predict

11 Data reduction

11.1 Factor analysis

12 Programming

12.1 Programs

12.1.1 Program definition

12.2 Macros

12.3 Loops

12.3.1 For loop

12.3.2 Foreach

12.3.3 While

12.4 Conditions

12.4.1 If

12.5 Matrix expressions

12.5.1 Matrix operators

13 GLLAMM

13.1 Installation

13.2 Data format

13.3 Syntax examples

13.3.1 A two-level random intercept model (logistic)

13.3.2 A two-level random intercept and slope model (linear)

13.3.3 A two-level random intercept model, x1 and x2 categorical

13.4 Prediction

13.4.1 Syntax and options

14 Survey commands

14.1 Setting stratification, clustering, finite population correction and sample weights

14.2 Means and proportions

14.3 Tables

14.4 Regression

14.5 Stata web links

1 Basics

1.1 Help

help cmd    show help file for cmd

1.2 Short cuts

Ctrl-R    run selection in do file

Ctrl-D    do selection in do file

Ctrl-Alt-T    start Stata

PgUp / PgDown    previous/next command in command window

#review n    show last n commands

Esc    clear command

1.3 Options

1.3.1 Memory, Max variables and max matrix size

set memory 100m    default = 10 MB, max = as large as the OS allows

set maxvar 1000    default = 5000, max = 32767

set matsize 500    default = 400, max = 11000

set xxx, permanently    keeps the setting for all future sessions

1.4 Save commands

cmdlog using myfile    start a command log file

cmdlog close    close (and save) command log file

The Review window can also be saved as a do file: click on the "minus" in its upper left corner.

1.5 Save output

(set more off), begin log, …, close log, save log, print log
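
The same steps can be run from a do file. A minimal sketch, assuming a log file called mylog (a hypothetical name) and Stata's bundled auto dataset, neither of which is part of these notes:

```stata
* Sketch: capture the output of a session in a plain-text log
set more off
log using mylog, text replace    // begin log (overwrites an old mylog.log)
sysuse auto, clear               // example data shipped with Stata
summarize price mpg
log close                        // close and save the log
```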

1.6 Notation

==    equal

~= (or !=)    not equal

&    and

|    or

~ (or !)    not

x^2    x squared

+    string concatenation

.    missing

x[3]    3rd observation of x

x[_n-1]    previous value of x

replace x=2 if _n==3    sets x[3] to 2

1.6.1 Variable names

Names can be 1-32 characters long and may contain letters (case sensitive), digits and underscores. Start with a letter.

1.7 Command syntax

[by varlist:] command [varlist] [weight] [if exp] [in range] [using filename] [, options]

Note: all commands are written in lower-case letters!

1.7.1 By

by varlist:    repeat for all combinations of values in varlist, use sort varlist first

by varlist, sort:    same, but sorts by varlist first
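
A short illustration of the two forms, assuming Stata's bundled auto dataset (not part of these notes):

```stata
sysuse auto, clear
* sort first, then repeat the command for each value of foreign
sort foreign
by foreign: summarize mpg
* the same in one step
by foreign, sort: summarize mpg
```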

1.7.2 Weights

[weighttype=var]

fweight=freq    frequency weighting for aggregated data

aweight=1/variance    analytic weighting by precision (inverse of the observation's variance)

pweight=1/prob    probability weighting, inverse of the sampling probability

iweight=    importance weighting, manual control of weights

See U 23.13 and U 30

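1.7.3 If exp

if exp    run the command only where exp is true (note: missing values count as larger than any number, so conditions such as x>10 include them)

1.7.4 Ranges

in range    restrict to a range of observations (first/last); f=first, l=last, -n counts from the end. Ex: 5/25, -10/l

list x in 5/10    x for observations 5 to 10

list x in f/10    x from the first observation to observation 10

list x in -10/l    x for the last 10 observations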
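
A small sketch of if and in together, assuming Stata's bundled auto dataset (not part of these notes); rep78 contains missing values, which is why the last line adds a guard:

```stata
sysuse auto, clear
list make price in 1/5          // first five observations
list make price in -3/l         // last three observations
* without the second condition, missing rep78 would also be listed
list make rep78 if rep78>=4 & rep78<.
```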

1.8 Prefix commands

by:

statsby:

bootstrap:

jackknife:

simulate:

svy:

stepwise:

xi:

1.9 Estimation commands

1.10 Postestimation commands

mfx    marginal effects

adjust    adjusted means

estat vce    variance/covariance of estimates

predict, predictnl    predictions and nonlinear predictions

ereturn list    list of saved results

test, testnl    linear and nonlinear Wald tests

lrtest    likelihood ratio tests

lincom    point estimates and confidence intervals of linear combinations

nlcom    non-linear combinations

estimates    store and retrieve results
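
A minimal sketch of a few of these commands after a regression, assuming Stata's bundled auto dataset; the model and the names phat and m1 are only illustrative:

```stata
sysuse auto, clear
regress price mpg weight
test mpg weight          // joint Wald test of both coefficients
lincom mpg + weight      // estimate and CI of a linear combination
estat vce                // variance/covariance of the estimates
predict phat             // linear prediction in new variable phat
estimates store m1       // keep the results under the name m1
```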

2 Functions

2.1 Mathematical functions

sqrt()

ln() or log()    natural log

log10()

abs()

int()

exp()

min(x1,…,xn), max(x1,…,xn)

2.2 Statistical functions

comb(n,k)    "n choose k"

binomial(n,k,p)

chi2(df,x)    cumulative chi-square

normden(z,s)    density of N(0,s^2)

norm(z)    cumulative N(0,1)

uniform()    uniform random number, 0-1

2.2.1 Examples

a+(b-a)*uniform()    random uniform on [a,b)

a+int((b-a+1)*uniform())    random integers in [a,b]

mu+s*invnorm(uniform())    random normal N(mu,s^2)
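
The three expressions can be tried on an empty dataset. A minimal sketch with arbitrary values for a, b, mu and s (assumptions for illustration); newer Stata versions document these functions under the names runiform() and invnormal():

```stata
clear
set obs 1000
set seed 12345
gen u = 2 + (5-2)*uniform()            // uniform on [2,5)
gen k = 1 + int((6-1+1)*uniform())     // integers 1 to 6
gen z = 10 + 2*invnorm(uniform())      // normal with mean 10 and sd 2
summarize u k z
```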

2.3 Logical

cond(x,a,b)    if x then a else b

3 Data handling

3.1 Import data

Use DBMS/Copy to convert from SPSS to Stata format. Choose Stata 6 with 8-byte doubles as the output file format.

3.2 Use and save

use file.dta    load data

save newfile.dta    save new copy

save file.dta, replace    overwrite original data

3.3 Describe, labels

describe    overview of variables

label var varname "text"    variable label

label define lblname # "text" # "text" …    define a mapping called lblname between numeric values (#) and labels ("text")

label values varname lblname    associate the mapping with a variable
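
A minimal sketch of the three label commands on a tiny typed-in dataset; the variable and label names are made up for illustration:

```stata
clear
input id sex
1 1
2 2
3 1
end
label var sex "Sex of respondent"
label define sexlbl 1 "Male" 2 "Female"
label values sex sexlbl
tabulate sex          // the table now shows Male/Female instead of 1/2
```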

3.4 Formats

format varname %w.dtype    w=width in columns, d=decimal places, type: g=general, f=fixed, s=string

Examples: %9.0g, %9.2f, %10s

3.5 Recoding

recode varlist (rule) (rule), gen(varlist) copy    syntax

recode x (1 2=1 low) (3 4=2 high) (missing=.), gen(x2)    recode 1 and 2 into 1 and 3 and 4 into 2, attach labels, and generate the new variable x2

recode x (1=2) if sex==1, gen(x2) copy    recode where sex==1; copy keeps the original values where sex!=1

egen ageGr3=cut(age), group(3) label    3 equal-sized groups

egen ageGr2=cut(age), at(0,50,80) label    2 groups, 0-50 and 50-80; values outside are set to missing

encode stringvar, generate(newvar)    make numerical newvar (1,2,3…) based on stringvar values
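
A short sketch of recode and egen cut, assuming Stata's bundled auto dataset; the new variable names rep2 and pricegr are hypothetical:

```stata
sysuse auto, clear
* collapse the five repair categories into two labelled groups
recode rep78 (1 2=1 "low") (3 4 5=2 "high"), gen(rep2)
* three groups of roughly equal size based on price
egen pricegr = cut(price), group(3) label
tabulate rep2 pricegr, missing
```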

3.6 Generate, replace

generate newvar=exp    create new variable

replace oldvar=exp    change existing variable

gen agegr=age>=30 if age!=.    missing values are greater than all numerical values

gen xlag=x[_n-1]

gen xlead=x[_n+1]

3.7 Extended generate

egen [type] newvar = fcn(arguments) [if exp] [in range] [, options]

egen newvar=fcn(arg)    extended generate: make newvar from stored functions.

Ex: by code, sort: egen mx=median(x)    gives medians of x by values of code

by ... : may be used with some egen functions

3.7.1 Functions

count(exp)    number of nonmissing observations of exp

cut(varname), {at(#,#,...,#)|group(#)}    cut at the at() numbers, or into equal-sized groups

mean, median, max, min, std, sum

pctile(exp) [, p(#)]    percentiles

group(var1 var2)    new variable from all combinations of var1 and var2

rmiss(varlist)    number of missing values in each row
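
A few of these functions in use. A minimal sketch, assuming Stata's bundled auto dataset; the new variable names are hypothetical:

```stata
sysuse auto, clear
by foreign, sort: egen medprice = median(price)   // group medians
egen n_mpg = count(mpg), by(foreign)              // nonmissing mpg per group
egen grp = group(foreign rep78)                   // one id per combination
list make foreign price medprice n_mpg grp in 1/5
```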

3.8 Drop, keep

drop varnames    drop variables from memory

drop in 3    drop observation 3

keep var1-var5    keep variables var1 to var5. Note: keep if age>=10 will also keep missing values.

drop if age==.    remove missing

3.9 Missing

.    numerical missing

""    string missing

missing(x)    equivalent to x==. if x is numeric, and to x=="" if x is a string

Missing values are greater than all numerical values and are sorted last; age>=30 will include missing.

gen agegr=age>=30 if age!=.

drop if age==.    remove missing

mvdecode x1, mv(99)    set 99 to missing

mvencode x1, mv(.=99)    set missing to 99
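
The effect of missing values on comparisons can be checked directly. A minimal sketch, assuming Stata's bundled auto dataset, where rep78 has some missing values; goodrep is a hypothetical name:

```stata
sysuse auto, clear
count if rep78>=4                       // also counts missing rep78
count if rep78>=4 & !missing(rep78)     // counts only real values
gen byte goodrep = rep78>=4 if !missing(rep78)
mvencode rep78, mv(99)                  // missing -> 99
mvdecode rep78, mv(99)                  // 99 -> missing again
```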

3.10 Sort

sort varname    sort by variable. Use before "by var:" command

3.11 String commands

fname+" "+lname    string concatenation

substr(name,1,10)    first 10 characters of name

See U 16.3.5

3.12 Aggregate

contract vars, freq(fname) percent(pname)    contract (aggregate) over variable patterns to frequencies and percents

collapse vars    collapse data to means (or other statistics) over variable patterns
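
Both commands replace the data in memory. A minimal sketch, assuming Stata's bundled auto dataset, with preserve/restore around the first one so the data can be reused; n, pct and sdprice are hypothetical names:

```stata
sysuse auto, clear
preserve
contract foreign rep78, freq(n) percent(pct)   // one row per pattern
list
restore
collapse (mean) price mpg (sd) sdprice=price, by(foreign)
list
```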

3.13 Accessing results from commands

3.13.1 System variables

_b[varname]    regression coefficient

_b[_cons]    intercept

_se[varname]    SE of regression coefficient

_n    current observation

_N    total number of observations

_pi    pi

Ex: after regress y x, _b[_cons] gives the constant term, _b[x[1]] gives the coefficient of the first category of x, _se[x[1]] gives its standard error

Ex: after xi: regress y i.x, _b[_Ix_2] gives the coefficient of the second level of x (the created dummy is called _Ix_2)

3.13.2 Saved results

return list    run after a command to list its saved results

ereturn list    run after an estimation command to list its saved estimation results

e(name)    estimation class, lives until next estimation

r(name)    result class, lives until next command

Ex: summarize age followed by gen agedev=age-r(mean)

Ex: regress y x1 x2 followed by matrix B=e(b) and matrix V=e(V)    save the coefficient vector and the variance-covariance matrix
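
The two examples written out as separate commands. A minimal sketch, assuming Stata's bundled auto dataset in place of age, y, x1 and x2:

```stata
sysuse auto, clear
summarize mpg
return list                     // shows r(mean), r(sd), ...
gen mpgdev = mpg - r(mean)      // deviation from the mean
regress price mpg weight
ereturn list                    // shows e(b), e(V), e(N), ...
matrix B = e(b)                 // coefficient vector
matrix V = e(V)                 // variance-covariance matrix
matrix list B
display _b[mpg] "  " _se[mpg]
```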

3.13.3 Accessing results from commands, save as macros

sum w if c==1    mean of w for c=1

global w1=r(mean)    save as global macro

dis $w1    show content of macro

4 Uni- and bivariate

4.1 List

list varlist [, [no]display nolabel]    list variables; nodisplay gives tabular format, nolabel shows values instead of labels

list var_i-var_j    list a range of variables

list in 3    3rd observation; -1 = last, 1/10 = observations 1 to 10

list if exp    e.g. list if var>10, list if var==10

list var1 if var2==.    list if var2 is missing

4.2 Tabulate

4.2.1 One-way tables

tabulate var [weight] [if expr] [in range] [, nofreq plot missing nolabel]    nolabel shows category values

tab1 varlist    one-way tables for all variables

tab c, gen(c)    create dummies c1, c2, ... for each category of c

4.2.2 Two-way tables

tab var1 var2 [weight] [if expr] [in range] [, nofreq col row cells chi2 exact missing nolabel]

tab var1 var2, nofreq col chi2    crosstab with column %, no frequencies, with chi-square test

tab var1 var2, exact    Fisher exact test

tabi 30 20 \ 20 10, col chi2    immediate table

tab var1 var2, summarize(var3)    mean, sd and freq of var3 by var1 and var2. Use the means, standard or freq options to limit the output
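
The table commands above, tried on real variables. A minimal sketch, assuming Stata's bundled auto dataset:

```stata
sysuse auto, clear
tabulate rep78, missing               // one-way, missing shown as a category
tabulate foreign rep78, row chi2      // two-way with row % and chi-square
tabulate foreign rep78, exact         // Fisher exact test
tabi 30 20 \ 20 10, col chi2          // table typed in directly
tabulate foreign, summarize(mpg)      // mean, sd and freq of mpg by foreign
```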

4.2.3 Three-way tables

sort var3

by var3: tab var1 var2

4.3 Table of summary statistics

table rowvar [colvar [supercolvar]] [if] [in] [weight] [, options]

table rowvar, contents(clist) row col    clist: freq, mean, sd, sum, n, max, min, median, p# (percentile), iqr. Totals: row col. Show missing: missing

table rowvar colvar supercolvar, by(superrowvarlist)    multi-way tables

Ex: table sex, c(n age mean age mean educ) row    subjects, mean age and mean educ by sex, plus total row

tabstat varlist [if] [in] [weight] [, options]

epitab

4.4 Means and confidence intervals

means varlist    arithmetic, geometric and harmonic means with CI

ci varlist, binomial poisson total    CI for means, proportions and counts

4.5 Summarize

summarize vars    number, mean, sd, min, max. summarize alone takes all variables.

summarize vars, detail    percentiles, variance, skewness, kurtosis

inspect var    details on values

4.6 T-test

ttest var=#    one sample t-test

ttest var, by(c)    two sample t-test

ttest var1=var2    paired two sample t-test

ttest var1=var2, unpaired    two sample t-test

, unequal    equal variances not assumed

Ex: sdtest age, by(sex); if equal variance is rejected, use ttest age, by(sex) unequal

4.6.1 Test of equal variance (standard deviation)

sdtest var=#    standard deviation=#

sdtest var, by(c)    two groups compared

sdtest var1=var2    same variance in both variables
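
The variance test followed by the t-test can be sketched as below, assuming Stata's bundled auto dataset; whether the unequal option is needed depends on the sdtest result:

```stata
sysuse auto, clear
sdtest mpg, by(foreign)           // test of equal standard deviations
ttest mpg, by(foreign) unequal    // two sample t-test, unequal variances
ttest mpg == 20                   // one sample t-test against 20
```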

4.6.2 One way anova

oneway response_var factor_var [weight] [if exp] [in range] [, noanova nolabel missing wrap tabulate [no]means [no]standard [no]freq [no]obs bonferroni scheffe sidak]

Ex: oneway var c, tabulate    analysis of var by c

4.7 Non-parametric analysis

by gender, sort: centile partners, centile(25 50 75) cci    percentiles with conservative confidence intervals (limits fall exactly on sample values)

ranksum partners, by(gender)    Mann-Whitney test = Wilcoxon rank sum, 2 groups

kwallis partners, by(age3)    Kruskal-Wallis K-group test

4.8 Proportions

proportion x1, over(c)    proportions with CI

5 Graphics

5.1 Plot types

graph twoway    scatter, line, density, histogram, function, ...

graph matrix

graph bar, hbar, dot

graph box

graph pie

5.2 Graph Twoway

5.2.1 Twoway syntax

graph twoway plot [if exp] [in range] [, options]    twoway syntax (graph may be omitted)

where plot=(plottype varlist, options)    plot syntax, several plots may be listed and combined

where varlist= y1 y2 … x    last variable is x

Ex: twoway scatter y x    plot y against x

5.2.2 Twoway plot types

scatter, line, connected, area

dot, bar, histogram, kdensity    kdensity = kernel density

function y=f(x), range(x1 x2)    f(x) from x1 to x2

rarea rcap rbar    range area, range cap, range bar

Ex: twoway area y x, sort base(50)    gives shading from 50

Ex: histogram x, bin(10) start(-2.5) percent    10 bins starting at -2.5; use percent or frequency for the y scale

Ex: twoway (histogram x, width(1) frequency) (kdensity x, area(3200))    area scaled to the sum of subjects

Ex: twoway function y=normden(x), range(-4 4) droplines(-1.96 1.96)    function plots

Ex: twoway dropline db id if abs(db)>.25, mlabel(id)    delta-beta > 0.25 in absolute value

5.2.3 Twoway fitlines

lfit, qfit, mband, mspline, lowess    linear and quadratic fits, median band, median splines and lowess

lfitci, qfitci, fpfitci    fit with CI: linear, quadratic, fractional polynomial

Ex: (lfitci y x, ciplot(rline))    default is rarea

Ex: twoway (lfit y x) (lowess y x) (scatter y x)    scatter with linear and lowess fit
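
Overlaying several plots works as in the examples above. A minimal sketch, assuming Stata's bundled auto dataset in place of y and x:

```stata
sysuse auto, clear
* scatter with a linear and a lowess fit on top
twoway (scatter mpg weight) (lfit mpg weight) (lowess mpg weight)
* linear fit with the confidence band drawn as lines instead of an area
twoway (lfitci mpg weight, ciplot(rline)) (scatter mpg weight)
```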

5.3 Graph Bar, Hbar and Dot

5.3.1 Syntax

graph bar/hbar/dot yvars [if exp] [in range] [, options]

Where yvars=varlist, or =(stat) varlist, or =(stat) name=varname

stat= mean, median, p1, p2, ..., p99, sum, count, min, max

Ex: graph bar x, over(c) nofill    means of x over categories of c

Ex: graph bar (mean) meany=x (median) medy=x    mean and median of the same variable

Ex: graph bar (median) x1 x2, percent stack    stacked percentages

5.3.2 Options

nofill    skip empty categories

sort(1)    sort by the first y variable

over(c1)    values for each c1

by(c2)    separate plots for each c2

bargap(0)    gap between bars in % of bar width: -30 = 30% overlap, 30 = 30% gap

blabel(what, where_and_how)    bar labels

what: bar/ total/ name/ group    print height, total height, name of yvar, name of first over() group

Where_and_how:

position(outside/ inside/ base/ center)    where to place the bar label

format(%9.1f) gap(rel_size) textbox_options    options for labels

Ex: graph bar teq1, over(landsdel) nofill blabel(bar, pos(inside) size(*1.3) format(%9.1f) color(white))

Ex: graph hbar teq1, over(landsdel, axis(off) sort(1)) nofill blabel(group, pos(base) size(*1.3) format(%9.1f) color(white))
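
The teq1 and landsdel variables belong to the author's own data. A comparable sketch that can be run as is, assuming Stata's bundled auto dataset; medmpg is a hypothetical name:

```stata
sysuse auto, clear
* mean mpg per repair category, with the bar height printed inside each bar
graph bar (mean) mpg, over(rep78) nofill ///
    blabel(bar, position(inside) format(%9.1f) color(white))
* mean and median of the same variable as horizontal bars
graph hbar (mean) mpg (median) medmpg=mpg, over(foreign)
```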

5.4 Graph Box, Hbox

graph box x1 x2 x3, ascategory    boxplot of separate variables; ascategory puts the variable labels on the categorical axis

graph hbox x, over(c, total)    plot of x over categories of c plus total

5.5 Graph Pie

graph pie x1 x2 x3    sum of x1, x2 and x3

graph pie x, over(c)    sum of x for each category of c

graph pie, over(c)    number of cases for each category of c

5.5.1 Options

plabel(_all sum/ percent/ name/ "text", text_box_options)    label all slices with sum, percent, x-names or a given text

5.6 Graph Matrix

graph matrix x1-x5    scatter of all 5 variables

5.7 Other graphs

gladder y, qladder y    histograms over different transformations of y, QQ plots of the same

5.8 Titles

title("text"), xtitle("text"), ytitle("text")    titles

title, subtitle, caption, note    title types

5.8.1 Title options

position(clockpos)

ring(ringpos)

span

text_box_options

Ex: scatter teq1 moralder, title("Title", position(12) ring(0))

5.9 Legend

legend([contents] [location])

Contents:

order(1 2 3)    may also use order(1 - "label1" 2 3)

label(1 "label1")    override legend for var 1

cols(1)    legend in 1 column; rows(1) gives 1 row

stack    stack symbol and text

rowgap(2) colgap(2)    gap between each element

Location:

on/off    legend on/off

position(clock)    position of legend

ring(1)    radial distance from plot, ring(0)=inside

Ex: legend(label(1 "Density of TEQ") label(2 "Mean") label(3 "Median") ring(0) pos(2) cols(1))

Ex: graph bar teq_di teq_fu teq_npcb teq_mopc teq_hcb ///

, legend(row(1) stack colgap(10) label(1 "Dioxin") label(2 "Furan") label(3 "Non-o") label(4 "Mono") label(5 "HCB"))

5.10 Axis scale, label, ticks and grid

5.10.1 Axis title

x|ytitle("line1" "line2")    axis title in two lines

5.10.2 Axis scale

x|yscale(opts)

Options:

axis(1)    axis to modify (1-9)

[no]log

[no]reverse

range(0 100)    extend range, will not decrease range. range(0): start at 0, range(100): end at 100

alt    axis at alternative side

on/off    axis on/off

Ex: scatter teq1 moralder, xscale(range(0 80)) yscale(off)    no y-axis

5.10.3 Axis labels and ticks

x|ylabel(rule_or_values, opts)    major ticks and labels

x|ytick(rule_or_values)    major ticks

x|ymlabel(rule_or_values)    minor ticks and labels

x|ymtick(rule_or_values)    minor ticks

rule or values (may use both):

#10    about 10 nice values

1 5 50    labels at 1, 5 and 50

0 5 10 "mean" 15 20    labels every 5, with mean printed at 10

0(10)100    labels from 0 to 100 in steps of 10

minmax    min and max values

none

Label options:

angle(0)

[no]grid    add gridlines

format(%5.0f)    5 places, 0 decimals, fixed

Ex: xlabel(1 "Low" 2 "Medium" 3 "High", angle(45))    text labels at values 1, 2 and 3, at 45 degrees

Ex: scatter teq1 moralder, xlabel(#10, grid)
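
Title, scale and label options can be combined in one command. A minimal sketch, assuming Stata's bundled auto dataset; the titles and tick values are only illustrative:

```stata
sysuse auto, clear
scatter mpg weight, ///
    title("Mileage by weight", position(12) ring(0)) ///
    xtitle("Weight (lbs)") ytitle("Miles per gallon") ///
    xlabel(2000(1000)5000, grid angle(45)) ///
    ylabel(#5, format(%5.0f)) yscale(range(0 50))
```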

5.11 Text

text(y x "text", opts)    text at y,x in the plot

placement(c)    c=centered, n=north, s=south, ..

orientation(vertical)

box    draw box around text

Ex: graph …, text(10 50 "Line1" "Line2", just(left) color(blue))    two lines of text at (y,x)=(10,50)

5.12 Markers and marker labels

5.12.1 Markers

mstyle(p1 p2)    default styles

msymbol(sym1 sym2 …)    marker symbols: S square, s small square, Sh hollow square, D diamond, T triangle, O circle, X, +, p point, . default, i invisible. Ex: msymbol(S)

msize(small medium large), msize(*2)    small, medium, large markers; twice the size

mcolor(green)    both outside and inside color

Ex: msymbol(. t Oh)    markers for 3 variables: default, small triangles and hollow circles

Ex: twoway scatter y x [aweight=z], msymbol(oh) msize(small)    point size proportional to z

5.12.2 Marker labels

mlabel(var)    label marker by var content

mlabsize(size)

mlabcolor(color)

mlabposition(12)    label at 12 o'clock position

mlabvposition(var)    positions based on a variable containing clock positions

mlabgap(*3)    3 times larger gap between marker and label

Ex: scatter y x, mlabel(z) mlabpos(center) msymbol(i)    use contents of z to label points, labels in the center and invisible points

5.13 Lines

5.13.1 Connecting points

twoway scatter y x, connect(l) sort    sort points, connect with line

connect(l)    line

connect(L)    separate line for each series

connect(J)    stairstep (flat, then vertical), for survival curves; stepstair is vertical, then flat

5.13.2 Line options

lcolor(red)    line color

lwidth(thick) or lwidth(*3)    thick line

lpattern(dash)

lpattern("l"), lpattern(".-"), lpattern("-###")    solid, dot-dashed, dash + 3 spaces

5.14 Text box options

tstyle(textboxstyle)    overall style

box/nobox    border

size(textsizestyle)

color(colorstyle)    text color

justification(justificationstyle)    text left, center, right

alignment(alignmentstyle)    text top, middle, bottom, baseline

bfcolor(colorstyle)    background color

bcolor(colorstyle)    background and border color

blstyle(linestyle)    style of border

orientation(orientationstyle)    vertical/horizontal, rvertical/rhorizontal

placement(compassdirstyle)    location

ring(1)    0: inside, 1-7: outside

format(%9.1f)    9 places, 1 decimal, fixed

Ex: graph …, title("My title", color(red) box size(*1.5))

5.15 Other options

5.15.1 Colors

black, white, red, blue, cyan, green, mint, yellow….

gs0 … gs16    gray scales from black to white

gray=gs8

color*0.5    half the intensity

5.15.2 Positions

clockpos(12)    12 o'clock. clockpos(0) means center if valid

placement(north)    alternative to clock with 9 positions

ring(1)    0: inside, 1-7: outside

justification(left/ centered/ right)    text justification

alignment(top/ middle/ bottom/ baseline)    text alignment

orientation(horizontal/ vertical/ rhorizontal/ rvertical)

5.16 Over()

over(c, total)    split by categories of c plus total, can use over(c1) over(c2)

over(c, descending)    sort values in descending order

over(c, sort(c2)), sort(1)    sort by c2 or by the first y variable

over(var, relabel(1 "lab1" 2 "lab2"))    new labels for the "over" variable

ascategory / asyvars    as categories: plotted with spaces, as yvars: plotted dense

missing, nofill    show missing, do not show empty combinations

Ex: graph bar teq_di teq_fu, over(landsdel, total) nofill

5.17 By()

by(varlist, suboptions)    separate graphs for each value of varlist

total    add total group

missing    add missing groups

colfirst    display down columns

rows(#), cols(#)    number of rows or cols

holes(numlist)    positions to leave blank

compact

Ex: graph bar teq_di teq_fu, by(star, total rows(1) compact)

5.18 Schemes

set scheme schemename [, permanently]    set overall look of graphs

graph …, scheme(schemename)    set overall look for current graph

graph query, schemes    list installed schemes

schemenames:

s2color    default, will vary colors of lines and markers

s2mono    monochrome, will vary patterns of lines and markers

5.19 Combining graphs

graph …, saving(plt1, replace)    save to file (plt1.gph)

graph …, name(plt1, replace)    save to memory

graph use plt1.gph    show saved graph from file; graph display plt1 shows a graph held in memory

graph combine plt1 plt2, ycommon cols(1)    combine from memory in 1 column with same y scaling

graph combine plt1.gph plt2.gph    combine from file
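
Combining from memory, written out. A minimal sketch, assuming Stata's bundled auto dataset; plt1 and plt2 are the graph names used above:

```stata
sysuse auto, clear
scatter mpg weight, name(plt1, replace)     // first graph, kept in memory
scatter mpg length, name(plt2, replace)     // second graph, kept in memory
graph combine plt1 plt2, ycommon cols(1)    // stacked, same y scaling
```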

5.20 Graph query

graph query    list of all styletypes

graph query colorstyle    list of all colorstyles

graph query linepatternstyle    list of all linepatternstyles

5.21 Palettes

palette linepalette    plot showing the line patterns

palette symbolpalette    plot showing the marker symbols

palette color color1 color2    plot comparing colors

6 Regression commands

6.1 Regression models

6.1.1 Linear regression with simple error structure

regress    linear regression (also with heteroscedastic errors)

boxcox    linear regression on Box-Cox transformations of y and x's

nl    non-linear least squares

6.1.2 GLM

logistic    logistic regression

poisson    Poisson regression

binreg    binary outcome, OR, RR, or RD effect measures

glm    use for non-canonical links
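
The same logistic model fitted with logistic and with glm. A minimal sketch, assuming Stata's bundled auto dataset; the choice of covariates is only illustrative:

```stata
sysuse auto, clear
logistic foreign mpg weight                                   // odds ratios
glm foreign mpg weight, family(binomial) link(logit) eform    // same model
```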

6.1.3 Conditional logistic

clogit    for matched case-control data

6.1.4 Multiple outcome

mlogit    multinomial logit (not ordered)

ologit    ordered logit

6.1.5 Linear regression with complex error structure