Chapter 1-10. Programming Stata
In this chapter, we will see how to write programs in Stata.
These programs are typically saved as “ado” files. An “ado” file is simply a file of Stata commands saved with the “ado” file extension, with “end” on the last line of the file.
Since many of the commands in Stata are implemented as “ado” files, a good source of example Stata code is to think of a command that does something similar to what you want to do, and then look at the Stata code for that command.
Viewing (but not editing) an ado-file
This is done with the viewsource command. For example, to see how the ttest command was written, open the ado-file in a read-only viewer using
viewsource ttest.ado
After the file is open, you can highlight it and copy-and-paste it into the do-file editor, so you have the sample code available when writing your own programs.
Viewing (but not editing) a help file
This is a very nice application of the viewsource command, because it shows how the special markup features of the help file were set up, so you can do the same thing in your own help files. For example, to see Stata’s template for help files, which was designed to get you started developing your own help files that look like official Stata help files, use
viewsource examplehelpfile.hlp
To see what the file looks like when it is executed, use
help examplehelpfile
Finding where an ado file is
If you are curious where a particular ado-file is stored on your computer, you can find out using the findfile command.
findfile ttest.ado
C:\Program Files\Stata9\ado\base/t/ttest.ado
______
Source: Stoddard GJ. Biostatistics and Epidemiology Using Stata: A Course Manual [unpublished manuscript] University of Utah School of Medicine, 2010.
Finding the directories where Stata looks for ado files
To see the order in which Stata searches directories when a command is executed, use
adopath
[1] (UPDATES) "C:\Program Files\Stata9\ado\updates/"
[2] (BASE) "C:\Program Files\Stata9\ado\base/"
[3] (SITE) "C:\Program Files\Stata9\ado\site/"
[4] "."
[5] (PERSONAL) "c:\ado\personal/"
[6] (PLUS) "c:\ado\plus/"
[7] (OLDPLACE) "c:\ado/"
The “.” directory is the “current directory”, shown in the lower left-hand corner of the Stata window. Usually you would store your own commands in the PERSONAL directory, which is not supposed to be overwritten when you install a new version of Stata.
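The lookups from the last two sections can be combined into a quick check of where a command lives. The following is only a sketch: ttest is used as the example, and the paths printed will depend on your own installation.

```stata
* Report the full path of the ado-file that Stata would run for ttest
which ttest

* Same lookup with findfile; the path found is also saved in r(fn)
findfile ttest.ado
return list

* List the system directories (BASE, SITE, PLUS, PERSONAL, ...) behind adopath
sysdir
```

If which reports that a command is built in, there is no ado-file to view for it.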
Smart Quotes
By the way, Stata cannot interpret the smart quotes, “ ”, that Microsoft Word uses.
So, if you copy-and-paste the following into the Command window,
display “stuff”
you get the following error message:
“stuff” invalid name
r(198);
One way around this is to copy it into the do-file editor. The do-file editor changes the smart quotes into regular quotes, which look like
display "stuff"
That command can then be executed inside the do-file editor, or copied and pasted into the Command window to be executed.
Executing do files from the command line
First, decide on the directory where you want to save the do-file, and change to that directory. If you put it on your desktop, the directory might be something like the following, with “Greg” replaced by your username.
cd "C:\Documents and Settings\Greg\Desktop\StataCourse\practice"
Now, put the commands you want to run as a batch in a do-file.
For example, click on the do-file menu bar icon, which brings up a new do-file. Type the following,
display "Hey, it worked!"
Then, do a “save as” to the file name
program1.do
saving it to the current directory that you changed to with cd above.
Now, in the Command window, execute the command
do program1
which executes all of the commands in the do-file and then returns control to the Command window.
This do-file, program1.do, is a simple program.
It does not mimic a “command”, however, because it requires that you put “do” in front of the do-file name in order to execute it.
Converting your do-file into an ado-file
To turn a do-file into an ado-file, you simply add a “program define” line at the top and an “end” on the last line.
Open the file program1.do inside the do-file editor, and change it to:
program define amazing
    display "Hey, it worked!"
end
The indentation on the second line (or on all lines between program and end) is not necessary, but it helps to remind you that you are inside the program-end combination.
Save it as amazing.ado, instead of program1.do.
On the command line, enter
amazing
Hey, it worked!
You have just extended Stata to include a new command called amazing.
Adding some color
We can get the display command to output in different colors, similar to what Stata does.
text = green
result = yellow
error = red
input = white
Open the file amazing.ado inside the do-file editor, and change it to:
program define amazing
    display as text "Hey, it worked!"
    display as result "Hey, it worked!"
    display as error "Hey, it worked!"
    display as input "Hey, it worked!"
end
On the command line, enter
amazing
Hey, it worked!
Even though we made a change, the older version is still executing. This is because Stata loads programs in memory, and continues to execute the original version stored in Stata memory, even though the file amazing.ado has changed on the hard drive.
It is necessary to drop a program from memory, using the program drop command, before a changed version of it will run.
Dropping a program from memory
What I like to do is add the program drop command as the first line of my ado file, so I do not have to repeat this step every time I make a change. Once the program is fully developed, you can remove that line, so that it does not drop a user’s own program of the same name that is already in memory.
Open the file amazing.ado inside the do-file editor, and add the program drop command on the first line. Precede it by “capture” so it runs even if the program is not already loaded in memory.
capture program drop amazing
program define amazing
    display as text "Hey, it worked!"
    display as result "Hey, it worked!"
    display as error "Hey, it worked!"
    display as input "Hey, it worked!"
end
On the command line, enter
program drop amazing
We first have to drop it from memory, if it is there, using the Command window, because if we just run the command amazing, the old version loaded in memory continues to run.
Now if we run the program again, Stata finds it on the hard drive and runs our updated version.
amazing
Hey, it worked!
Hey, it worked!
Hey, it worked!
Hey, it worked!
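When it is unclear whether Stata is still running an old version, it can help to look at what is actually loaded in memory. The following is a minimal sketch using a throwaway program called amazing2 (a hypothetical name, chosen only so that it does not collide with amazing).

```stata
* Define a small program directly in memory
capture program drop amazing2
program define amazing2
    display as result "Hey, it worked!"
end

* Names of all programs currently loaded in memory
program dir

* The code of the in-memory version of amazing2
program list amazing2

* Remove it from memory again
program drop amazing2
```

The discard command is a blunter alternative: it clears all automatically loaded programs at once, so the next call to an ado-file command reloads it from disk.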
Running a program inside a do-file
Sometimes it is nicer to just define the program inside a do-file, and then execute it inside the do-file. One advantage is that the program code is displayed right there where we run it, which is nice documentation. Another advantage is that debugging is faster, because we don’t have to keep going back and forth between the do-file and the command line.
Let’s try it.
With the file amazing.ado in the do-file editor, save it as program2.do (not .ado).
Next add a few blank lines and then put amazing as a command to call the program. (These are in chapter10.do)
capture program drop amazing
program define amazing
    display as text "Hey, it worked!"
    display as result "Hey, it worked!"
    display as error "Hey, it worked!"
    display as input "Hey, it worked!"
end

amazing
Highlight the entire do-file and hit the “do current file” icon (third icon from the right) inside the do-file editor to execute it.
It executes as expected.
Doing it this way, the program is loaded into Stata memory and is available for the entire Stata session, unless you drop it. The whole step of making it an ado-file is avoided. Sometimes this is nice, and sometimes it is easier to use an ado-file so it’s available instantly for all your projects.
Writing a program to optimize test characteristics
We are now going to work through a rather complicated example for a very practical problem.
It is a somewhat common research problem, or quality improvement problem, to determine the optimal cut-point for a continuous (interval scaled) diagnostic test variable to provide the best test characteristics (see box), such as sensitivity and specificity.
For example, Carpenter et al (1995) did this to discover that 60% or greater carotid artery stenosis by duplex Doppler ultrasonography provided the best test characteristics when compared to the gold standard arteriography.
Test Characteristics
With the data in the required form for Stata:

                                        Test “probable value”
Gold Standard “true value”    disease present ( + )    disease absent ( - )
disease present ( + )         a (true positives)       b (false negatives)     a + b
disease absent ( - )          c (false positives)      d (true negatives)      c + d
                              a + c                    b + d
We define the following terminology (Lilienfeld, 1994, p. 118-124), expressed as percents:
sensitivity = (true positives)/(true positives plus false negatives)
            = (true positives)/(all those with the disease)
            = a / (a + b) × 100
specificity = (true negatives)/(true negatives plus false positives)
            = (true negatives)/(all those without the disease)
            = d / (c + d) × 100
Sensitivity and specificity provide information about the accuracy (validity) of a test. Positive and negative predictive values provide information about the meaning of the test results.
The probability of disease being present given a positive test result is the positive predictive value (Lilienfeld, 1994, p. 118-124):
positive predictive value = (true positives)/(true positives plus false positives)
                          = (true positives)/(all those with a positive test result)
                          = a / (a + c) × 100
The probability of no disease being present given a negative test result is the negative predictive value (Lilienfeld, 1994, p. 118-124):
negative predictive value = (true negatives)/(true negatives plus false negatives)
                          = (true negatives)/(all those with a negative test result)
                          = d / (b + d) × 100
“Unlike sensitivity and specificity, the positive and negative predictive values of a test depend on the prevalence rate of disease in the population. …For a test of given sensitivity and specificity, the higher the prevalence of the disease, the greater the positive predictive value and the lower the negative predictive value.” (Lilienfeld, 1994, p. 122-123)
The overall accuracy, or simply accuracy, is the proportion of correct test decisions, and is defined as (without citation for now, Stoddard just knows this)
overall accuracy = (true positives plus true negatives)/(all tests)
                 = (a + d)/(a + b + c + d)
The area under the receiver operating characteristic curve, or simply ROC, for a dichotomous test and gold standard variable, or 2 × 2 table, is simply the average of the sensitivity and specificity (without citation for now, Stoddard just knows this)
ROC = (sensitivity + specificity)/2
We will practice with the AngioData.dta file (see box).
AngioData.dta dataset
This file contains n=172 deidentified pairs of measurements provided by an anonymous researcher, with two continuously scored measurements of carotid artery stenosis.
angio Gold Standard: arteriography (arteriographic stenosis)
icapsv Diagnostic Test: internal carotid artery peak systolic velocity (PSVICA)
AngioData
Opening the data file, which is already in the StataCourse\practice subdirectory,
use angiodata, clear
First, we will dichotomize the angio variable into 60% or greater carotid artery stenosis.
recode angio 0/59=0 60/100=1 .=., gen(gold)
For a first guess at a cutpoint for icapsv, we will use the mean
sum icapsv
Variable | Obs Mean Std. Dev. Min Max
------+------
icapsv | 172 172.5407 141.6103 0 575
Defining a dichotomized icapsv variable,
gen test = cond(icapsv>=172,1,0)
replace test=. if icapsv==.
and adding variable labels,
label variable gold "angio"
label variable test "icapsv"
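The replace step above is needed because Stata treats a missing value as larger than any number, so icapsv>=172 is true when icapsv is missing. As a sketch, the same result can be obtained in one step with an if qualifier. The four values below are made up for illustration and are not from AngioData.dta, and test2 is a hypothetical variable name. Note that clear removes the data in memory, so try this in a separate session.

```stata
* Toy data: three measured values and one missing value
clear
input icapsv
100
172
300
.
end

* The if qualifier leaves test2 missing wherever icapsv is missing
generate byte test2 = (icapsv >= 172) if !missing(icapsv)
list
```

The logical expression evaluates to 1 when true and 0 when false, so cond() is not required here.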
To calculate the diagnostic test characteristics, we must first update Stata to include the diagt command. While connected to the internet,
findit diagt
------
search for diagt (manual: [R] search)
------
Keywords: diagt
Search: (1) Official help files, FAQs, Examples, SJs, and STBs
(2) Web resources from Stata and from other users
Search of official help files, FAQs, Examples, SJs, and STBs
SJ-4-4 sbe36_2 ...... Software update for diagt
(help diagt if installed) ...... P. T. Seed and A. Tobias
Q4/04 SJ 4(4):490
new options added to diagt
STB-59 sbe36.1 ...... Summary statistics for diagnostic tests
(help diagt if installed) ...... P. T. Seed and A. Tobias
1/01 pp.9--12; STB Reprints Vol 10, pp.90--93
complete revision of diagtest to assess a simple diagnostic
test in comparison with a reference standard; uses the exact
binomial distribution and provides diagti, an immediate
version of the command
Click on the sbe36_2 link to install the ado file.
If you are not connected to the internet, that is okay. The four files it adds (diagt.ado, diagt.hlp, diagti.ado, diagti.hlp) are in the StataCourse\practice subdirectory, which is now your current directory, so Stata will find these commands.
Computing the test characteristics,
diagt gold test
| icapsv
angio | Pos. Neg. | Total
------+------+------
Abnormal | 44 12 | 56
Normal | 16 100 | 116
------+------+------
Total | 60 112 | 172
True abnormal diagnosis defined as gold = 1
[95% Confidence Interval]
------
Prevalence Pr(A) 33% 26% 40.1%
------
Sensitivity Pr(+|A) 78.6% 65.6% 88.4%
Specificity Pr(-|N) 86.2% 78.6% 91.9%
ROC area (Sens. + Spec.)/2 .824 .761 .887
------
Likelihood ratio (+) Pr(+|A)/Pr(+|N) 5.7 3.54 9.16
Likelihood ratio (-) Pr(-|A)/Pr(-|N) .249 .15 .413
Odds ratio LR(+)/LR(-) 22.9 10.1 52.1
Positive predictive value Pr(A|+) 73.3% 60.3% 83.9%
Negative predictive value Pr(N|-) 89.3% 82% 94.3%
------
This looks like a pretty good guess for a cutpoint for icapsv.
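As a check on the formulas in the Test Characteristics box, the same quantities can be computed by hand from the four cell counts in the diagt table. This is only a sketch: the scalar names tp, fn, fp, and tn are chosen for illustration, and deliberately avoid single letters such as a, which Stata could read as an abbreviation of the variable angio.

```stata
* Cell counts copied from the diagt table above
* tp = true positives, fn = false negatives
* fp = false positives, tn = true negatives
scalar tp = 44
scalar fn = 12
scalar fp = 16
scalar tn = 100

scalar sens = tp/(tp+fn)
scalar spec = tn/(fp+tn)

display "sensitivity = " %4.1f 100*sens "%"
display "specificity = " %4.1f 100*spec "%"
display "PPV         = " %4.1f 100*tp/(tp+fp) "%"
display "NPV         = " %4.1f 100*tn/(fn+tn) "%"
display "accuracy    = " %4.1f 100*(tp+tn)/(tp+fn+fp+tn) "%"
display "ROC area    = " %5.3f (sens+spec)/2
```

The sensitivity, specificity, predictive values, and ROC area computed this way should agree with the diagt output (78.6%, 86.2%, 73.3%, 89.3%, and .824). The overall accuracy, which diagt does not report, comes to 83.7%.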
To do this for every possible cutpoint for icapsv, we could simply put the commands inside a loop. Let’s begin to build a program inside the do-file, and run it for the first three values of icapsv.
capture program drop optcut
program define optcut
    foreach num of numlist 0 21 36 {
        capture drop test
        gen test = cond(icapsv>`num',1,0)
        replace test=. if icapsv==.
        diagt gold test
    }
end

optcut
That worked, but let’s turn scrolling off.