Outline of Material for STATA Tutorial

LEARNING STATA

KEY COMMANDS

1) From the Start menu, load the STATA 9 program (Intercooled Stata 9)

2) use \statafiles\caschool.dta

Datasets in STATA are stored in dta files (files with the extension .dta). Create a folder called statafiles on the C-drive and download the dataset caschool.dta from the class web site http://www.econ.iastate.edu/classes/econ371/McPhail/lab.html to this folder. The above command then loads the dataset into STATA's memory.
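
As a minimal sketch of the loading step, the lines below spell out the full path and then check what was loaded. They assume the file was saved to C:\statafiles as described above; the clear option and describe, short are additions for illustration and are not part of the original step.

```stata
* Assumes caschool.dta has been saved in the folder C:\statafiles
* The clear option drops any dataset already in memory before loading
use "C:\statafiles\caschool.dta", clear

* Report only the number of observations and variables
describe, short
```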

3) describe

This command tells STATA to “describe” the dataset. It produces a list of the variable names, their storage types, and any variable descriptions (labels) stored in the dataset.

4) generate income = avginc*1000

The command tells STATA to create a new variable called income. The new variable is constructed by multiplying the variable avginc by 1000. The variable avginc is contained in the dataset and is the average household income in a school district expressed in thousands of dollars. The new variable income will be the average household income expressed in dollars instead of thousands of dollars.
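
One way to check the new variable is sketched below. It assumes step 4 has already been run, so income exists; the label text is a hypothetical choice for illustration.

```stata
* Attach a description to the new variable (label text is illustrative)
label variable income "Average household income in district (dollars)"

* Show the first five districts to confirm income is 1000 times avginc
list avginc income in 1/5
```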

5) summarize income

This command tells STATA to compute some summary statistics (mean, standard deviations, and so forth) for income.
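
If more than the basic statistics are wanted, the detail option is one possibility. This is a sketch that assumes income was created in step 4; the display line is only there to show that the results can be reused.

```stata
* detail adds percentiles, variance, skewness, and kurtosis to the output
summarize income, detail

* summarize stores its results; r(p50) is the median when detail is used
display "median income = " r(p50)
```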

6) clear

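This command erases any data already in STATA’s memory. Because the commands that follow use caschool.dta, reload the dataset with the use command from step 2 before continuing.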

7) correlate str testscr

Two of the variables in the dataset are testscr (the average test score in a school district) and str (the district’s average class size or student-teacher ratio). The above command tells STATA to compute the correlation between str and testscr.
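
As a hedged alternative to correlate, the pwcorr command can report the same correlation together with its significance level. The sketch assumes caschool.dta is loaded in memory.

```stata
* Pairwise correlation of str and testscr
* obs prints the number of observations, sig prints the p-value
pwcorr str testscr, obs sig
```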

8) scatter testscr str

This command generates a scatter plot of testscr versus str.
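
The same plot can be dressed up with axis titles and a fitted line, as in the sketch below. It assumes caschool.dta is loaded; the axis titles are illustrative wording taken from the variable descriptions above.

```stata
* Overlay the scatter plot with the fitted OLS line (lfit)
twoway (scatter testscr str) (lfit testscr str), ///
    ytitle("Average test score") xtitle("Student-teacher ratio")
```

The /// continuation works inside a do file; when typing interactively, enter the command on a single line instead.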

9) regress testscr str

This command tells STATA to run an OLS regression with testscr as the dependent variable and str as the explanatory variable. Note that the first variable appearing after the regress command is always the dependent variable to be explained, while the variables following it will be the explanatory variables.
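
After a regression, the estimates stay in memory and can be reused. The sketch below assumes caschool.dta is loaded; testscr_hat and uhat are hypothetical variable names chosen for illustration.

```stata
regress testscr str

* Fitted values and residuals from the regression just run
predict testscr_hat, xb
predict uhat, residuals

* Stored coefficients: _b[str] is the slope, _b[_cons] the intercept
display "slope = " _b[str] "   intercept = " _b[_cons]
```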

HYPOTHESIS TESTING

We are now interested in constructing tests and confidence intervals for the mean of a population or the difference between the means of two different populations.

10) ttest testscr=0

This command computes the sample mean and standard deviation of the variable testscr, performs a t-test of the null hypothesis that the population mean is equal to zero, and computes a 95% confidence interval for the population mean.
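
To see where the t-statistic comes from, it can be rebuilt by hand from the results that summarize stores. This is a sketch, assuming caschool.dta is loaded; the scalar names nobs and tstat are illustrative.

```stata
* summarize stores the mean, standard deviation, and sample size in r()
quietly summarize testscr
scalar nobs  = r(N)
scalar tstat = (r(mean) - 0) / (r(sd) / sqrt(nobs))

* t-statistic and two-sided p-value with nobs - 1 degrees of freedom
display "t = " tstat
display "two-sided p-value = " 2*ttail(nobs - 1, abs(tstat))
```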

11) generate testscr_lo = testscr if (str<20)

generate testscr_hi = testscr if (str>=20)

ttest testscr_lo = testscr_hi, unequal unpaired

To test hypotheses regarding the difference between the means of different populations, we first need to define two new variables – testscr_lo, which relates to the test scores for students in districts that have an average class size of less than twenty students and testscr_hi, which gives the test scores for students in districts having an average class size of twenty students or greater.

The command ttest testscr_lo = testscr_hi, unequal unpaired, is then used to test the hypothesis that testscr_lo and testscr_hi come from populations with the same mean. That is, the command computes the t-statistic for the null hypothesis that the mean of test scores for districts with class sizes less than twenty students is the same as the mean of test scores for districts with class sizes equal to or greater than twenty students.
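
A possible shortcut, sketched below, is to test the two groups with the by() option instead of creating two separate variables. The indicator name small is hypothetical, and the sketch assumes str has no missing values (a missing str would be coded as 0 here).

```stata
* small = 1 for districts with str < 20, and 0 otherwise
generate byte small = (str<20)

* Two-sample t-test of testscr across the two groups, unequal variances
ttest testscr, by(small) unequal
```

Because by() lists the group with small = 0 first, the reported difference has the opposite sign from the difference in the testscr_lo = testscr_hi form, but the same magnitude.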

12) generate d = (str<20)

regress testscr d, robust

The first line creates the binary variable d using the command generate. The variable d takes a value of 1 if the expression in parentheses is true (that is, when str < 20) and is equal to 0 if the expression is false.

The second line is the command that STATA uses to run an OLS regression with testscr as the dependent variable and the binary variable d as the explanatory variable. The option robust tells STATA to use heteroskedasticity-robust formulas for the standard errors of the regression coefficient estimators.
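
The link between this regression and the difference in means can be checked directly. The sketch below assumes d has already been created as in step 12: the intercept is the sample mean of testscr when d = 0, and the coefficient on d is the difference between the two group means.

```stata
regress testscr d, robust

* Group means recovered from the stored coefficients
display "mean of testscr when str >= 20: " _b[_cons]
display "mean of testscr when str <  20: " _b[_cons] + _b[d]
display "difference in means:            " _b[d]
```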

USING DO AND LOG FILES

LOG FILES

To store the results of your subsequent commands, you can create “log” files. To do this, simply type log using C:\statafiles\stata1.log to create a log file titled stata1.log. To view this file, click on “File”, then on “Log”, then on “View” and finally on “Browse”. You must close your log file before you exit STATA. If you don’t, all the results will be lost. To close the log file, type log close.
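
A short logging session might look like the sketch below. It assumes the folder C:\statafiles exists and caschool.dta is loaded; the replace option, which overwrites an earlier stata1.log, is an addition for illustration.

```stata
* Open the log; replace overwrites an existing file of the same name
log using "C:\statafiles\stata1.log", replace

* Everything shown in the Results window from here on is saved to the log
summarize testscr str

* Close the log when finished
log close
```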

DO FILES

The problem with the “interactive approach” outlined above is that it is cumbersome to type out the commands over and over again every time you need to use them. It can also be difficult to fix errors when they occur. These problems can be remedied through the use of “do” files. To create such a file, open the do-file editor from the toolbar, type in your commands and then save the file. Suppose the saved file is stata1.do. To execute the file, click on “File”, then on “Do” and finally pick the required file. The program will then be executed. If errors occur, the error messages will appear in red. Clear STATA of the variables, fix the error and re-execute the file.
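
As an illustration, the contents of stata1.do might look like the sketch below. The paths assume the C:\statafiles folder used earlier, and the particular commands are only an example drawn from the steps above.

```stata
* stata1.do: example do file (contents are illustrative)

* Start from a clean state so the file can be re-run after fixing an error
clear
capture log close

log using "C:\statafiles\stata1.log", replace
use "C:\statafiles\caschool.dta"

describe
summarize testscr str
regress testscr str, robust

log close
```

The file can also be run from the command line by typing do "C:\statafiles\stata1.do". Starting the file with clear and capture log close means it does not fail on a second run because data or a log are still open.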