
Chapter 1-10. Programming Stata

In this chapter, we will see how to write programs in Stata.

These programs are typically saved as “ado” files. An “ado” file is simply a text file of Stata commands, saved with the “.ado” file extension, in which the commands are wrapped between a “program define” line and an “end” line.

Since many of Stata’s commands are implemented as “ado” files, a good source of example code is to think of a command that does something similar to what you want to do, and then look at the Stata code for that command.

Viewing (but not editing) an ado-file

This is done with the viewsource command. For example, to see how the ttest command was written, open up the ado file in a read-only editor using,

viewsource ttest.ado

After the file is open, you can highlight the code and copy-and-paste it into the do-file editor, so you have the sample code available when writing your own programs.

Viewing (but not editing) a help file

This is a very nice application of the viewsource command, because it displays how the special markup features of the help file were set up, so you can do the same thing in your own help files. For example, to see Stata’s template for help files, which was designed to get you started with developing your own help files that look like official Stata help files, use

viewsource examplehelpfile.hlp

To see what the file looks like when it is executed, use

help examplehelpfile

Finding where an ado file is

If you are curious where a particular ado-file is stored on your computer, you can find it using the findfile command.

findfile ttest.ado

C:\Program Files\Stata9\ado\base/t/ttest.ado
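The findfile command also leaves the path it found in the stored result r(fn), which is convenient when the location is needed later in a do-file. A minimal sketch, assuming ttest.ado is on your ado-path (as it is in a standard installation):

```stata
* locate the file without printing the path, then reuse the stored result
quietly findfile ttest.ado
display as text "ttest.ado is stored at: " as result "`r(fn)'"
```

The path displayed depends on where Stata is installed on your computer.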

______

Source: Stoddard GJ. Biostatistics and Epidemiology Using Stata: A Course Manual [unpublished manuscript] University of Utah School of Medicine, 2010.

Finding the directories where Stata looks for ado files

To see the order in which Stata searches directories when a command is executed, use

adopath

[1] (UPDATES) "C:\Program Files\Stata9\ado\updates/"

[2] (BASE) "C:\Program Files\Stata9\ado\base/"

[3] (SITE) "C:\Program Files\Stata9\ado\site/"

[4] "."

[5] (PERSONAL) "c:\ado\personal/"

[6] (PLUS) "c:\ado\plus/"

[7] (OLDPLACE) "c:\ado/"

The “.” directory is the “current directory”, shown in the lower left-hand corner of the Stata window. Usually you would store your own commands in the PERSONAL directory, which is not supposed to be overwritten when you install a new version of Stata.
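To check where the PERSONAL directory actually is on your own machine, without reading through the whole adopath listing, the sysdir command and the c(sysdir_personal) system value can be used. This is only a short sketch; the paths printed will depend on your installation and may differ from those shown above.

```stata
* list all of Stata's system directories
sysdir

* show only the PERSONAL directory
display as result "`c(sysdir_personal)'"
```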

Smart Quotes

By the way, the smart quotes (“ ”) that Microsoft Word uses cannot be interpreted by Stata.

So, if you cut-and-paste the following to the command window,

display “stuff”

you get the following error message:

“stuff” invalid name

r(198);

A way to get around this is to copy the command into the do-file editor. The do-file editor changes the smart quotes into regular quotes, so the command looks like,

display "stuff"

That command can then be executed inside the do-file editor, or cut-and-pasted in the Command window to be executed.

Executing do files from the command line

First, decide on the directory where you want to save the do-file, and change to that directory. If you put it on your desktop, the directory might be something like the following, with “Greg” replaced by your username.

cd "C:\Documents and Settings\Greg\Desktop\StataCourse\practice"

Now, put the commands you want to run as a batch in a do-file.

For example, click on the do-file menu bar icon, which brings up a new do-file. Type the following,

display "Hey, it worked!"

Then, do a “save as” to the file name

program1.do

saving it to the current directory, which you changed to with “cd” above.

Now, in the Command window, execute the command,

do program1

which executes all of the commands in the do-file, and then returns control to the Command window.

This do-file, program1.do, is a simple program.

It does not mimic a “command”, however, because it requires that you put “do” in front of the do-file name in order to execute it.

Converting your do-file into an ado-file

To turn a do-file into an ado-file, you simply add a “program define” on the first line and an “end” on the last line.

Open the file program1.do inside the do-file editor, and change it to:

program define amazing
    display "Hey, it worked!"
end

The indentation on the second line (or all lines between program and end) is not necessary, but it helps to remind you that you are inside of the program-end combination.

Save it as amazing.ado, instead of program1.do.

On the command line, enter

amazing

Hey, it worked!

You have just extended Stata to include a new command called amazing.

Adding some color

We can get the display command to output in different colors, similar to what Stata does.

text = green

result = yellow

error = red

input = white

Open the file amazing.ado inside the do-file editor, and change it to:

program define amazing
    display as text "Hey, it worked!"
    display as result "Hey, it worked!"
    display as error "Hey, it worked!"
    display as input "Hey, it worked!"
end

On the command line, enter

amazing

Hey, it worked!

Even though we made a change, the older version is still executing. This is because Stata loads programs in memory, and continues to execute the original version stored in Stata memory, even though the file amazing.ado has changed on the hard drive.

For Stata to pick up the changed file, we must first drop the program from memory, using the program drop command.

Dropping a program from memory

What I like to do is add the program drop command as the first line of my ado-file, just to avoid this step every time I make a change. Once the program is fully developed, you can remove that line, so that it does not drop a user’s own program of the same name that is already in memory.

Open the file amazing.ado inside the do-file editor, and add the program drop command on the first line. Precede it by “capture” so it runs even if the program is not already loaded in memory.

capture program drop amazing
program define amazing
    display as text "Hey, it worked!"
    display as result "Hey, it worked!"
    display as error "Hey, it worked!"
    display as input "Hey, it worked!"
end

On the command line, enter

program drop amazing

We first have to drop the old version from memory, if it is there, using the Command window. If we just ran the command amazing, the old version loaded in memory would continue to run.

Now if we run the program again, Stata finds it on the hard drive and runs our updated version:

amazing
Hey, it worked!
Hey, it worked!
Hey, it worked!
Hey, it worked!
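To see which programs Stata currently holds in memory, the program dir command lists them. The sketch below uses a throwaway program called demo_prog (a made-up name, not part of the course files) so that it can be run without touching amazing. It assumes no file named demo_prog.ado exists on your ado-path.

```stata
* define a throwaway program directly in memory
capture program drop demo_prog
program define demo_prog
    display as text "demo"
end

* list the programs currently loaded in memory; demo_prog is among them
program dir

* remove just this one program from memory
program drop demo_prog

* with no demo_prog.ado on the ado-path, the call now fails;
* capture traps the error and _rc holds the nonzero return code
capture demo_prog
display _rc
```

If you would rather clear every automatically loaded ado-file program at once, instead of naming them one at a time, the discard command does that.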

Running a program inside a do-file

Sometimes it is nicer to just define the program inside a do-file, and then execute it inside the do-file. One advantage is that the program code is displayed right there where we run it, which is nice documentation. Another advantage is that debugging is faster, because we don’t have to keep going back and forth between the do-file and the command line.

Let’s try it.

With the file amazing.ado in the do-file editor, save it as program2.do (not .ado).

Next add a few blank lines and then put amazing as a command to call the program. (These are in chapter10.do)

capture program drop amazing
program define amazing
    display as text "Hey, it worked!"
    display as result "Hey, it worked!"
    display as error "Hey, it worked!"
    display as input "Hey, it worked!"
end
amazing

Highlight the entire do-file and hit the “do current file” icon (third icon from the right) inside the do-file editor to execute it.

It executes as expected.

Doing it this way, the program is loaded into Stata memory and is available for the entire Stata session, unless you drop it. The whole step of making it an ado-file is avoided. Sometimes this is nice, and sometimes it is easier to use an ado-file so it’s available instantly for all your projects.

Writing a program to optimize test characteristics

We are now going to work through a rather complicated example for a very practical problem.

It is a somewhat common research problem, or quality improvement problem, to determine the optimal cut-point for a continuous (interval scaled) diagnostic test variable to provide the best test characteristics (see box), such as sensitivity and specificity.

For example, Carpenter et al (1995) did this to discover that 60% or greater carotid artery stenosis by duplex Doppler ultrasonography provided the best test characteristics when compared to the gold standard arteriography.

Test Characteristics

With the data in the required form for Stata:

                                        Test “probable value”
Gold Standard “true value”   disease present ( + )   disease absent ( - )   Total
disease present ( + )        a (true positives)      b (false negatives)    a + b
disease absent ( - )         c (false positives)     d (true negatives)     c + d
Total                        a + c                   b + d

We define the following terminology (Lilienfeld, 1994, p. 118-124), expressed as percents:

sensitivity = (true positives)/(true positives plus false negatives)

= (true positives)/(all those with the disease)

= a / (a + b) × 100

specificity = (true negatives)/(true negatives plus false positives)

= (true negatives)/(all those without the disease)

= d / (c + d) × 100

Sensitivity and specificity provide information about the accuracy (validity) of a test. Positive and negative predictive values provide information about the meaning of the test results.

The probability of disease being present given a positive test result is the positive predictive value (Lilienfeld, 1994, p. 118-124):

positive predictive value = (true positives)/(true positives plus false positives)

= (true positives)/(all those with a positive test result)

= a / (a + c) × 100

The probability of no disease being present given a negative test result is the negative predictive value (Lilienfeld, 1994, p. 118-124):

negative predictive value = (true negatives)/(true negatives plus false negatives)

= (true negatives)/(all those with a negative test result)

= d / (b + d) × 100

“Unlike sensitivity and specificity, the positive and negative predictive values of a test depend on the prevalence rate of disease in the population. …For a test of given sensitivity and specificity, the higher the prevalence of the disease, the greater the positive predictive value and the lower the negative predictive value.” (Lilienfeld, 1994, p. 122-123)

The overall accuracy, or simply accuracy, is the proportion of correct test decisions, and is defined as (without citation for now, Stoddard just knows this)

overall accuracy = (true positives plus true negatives)/(all tests)

= (a + d)/(a + b + c + d)

The area under the receiver operating characteristic curve, or simply ROC, for a dichotomous test and gold standard variable, or 2 × 2 table, is the simple average of the sensitivity and specificity (without citation for now, Stoddard just knows this)

ROC = (sensitivity + specificity)/2

We will practice with the AngioData.dta file (see box).

AngioData.dta dataset

This file contains n=172 deidentified pairs of measurements provided by an anonymous researcher, with two continuously scored measurements of carotid artery stenosis.

angio Gold Standard: arteriography (arteriographic stenosis)

icapsv Diagnostic Test: internal carotid artery peak systolic velocity (PSVICA)

AngioData

Opening the data file, which is already in the StataCourse\practice subdirectory,

use angiodata, clear

First, we will dichotomize the angio variable into 60% or greater carotid artery stenosis.

recode angio 0/59=0 60/100=1 .=., gen(gold)

For a first guess at a cutpoint for icapsv, we will use the mean

sum icapsv

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      icapsv |       172    172.5407    141.6103          0        575

Defining a dichotomized icapsv variable,

gen test = cond(icapsv>=172,1,0)
replace test=. if icapsv==.

and adding variable labels,

label variable gold "angio"
label variable test "icapsv"
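The generate and replace steps above are needed because Stata treats a missing value as larger than any number, so icapsv>=172 would be true for a missing icapsv. The same dichotomization can be written in one line with an if qualifier. The sketch below illustrates this on a tiny made-up dataset (the four values are hypothetical, not taken from AngioData.dta), wrapped in preserve and restore so that the data currently in memory are not lost. The names test2, n_pos, and n_miss are chosen only for this illustration.

```stata
* set the current data aside and build a small illustrative dataset
preserve
clear
input icapsv
100
172
300
.
end

* the comparison evaluates to 1 or 0; the if qualifier leaves missing as missing
generate byte test2 = icapsv >= 172 if !missing(icapsv)
list

* keep the counts for checking, then bring the original data back
count if test2 == 1
scalar n_pos = r(N)
count if missing(test2)
scalar n_miss = r(N)
restore
```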

To calculate the diagnostic test characteristics, we must first update Stata to include the diagt command. While connected to the internet,

findit diagt

------------------------------------------------------------------------
search for diagt                                   (manual: [R] search)
------------------------------------------------------------------------

Keywords:  diagt
Search:    (1) Official help files, FAQs, Examples, SJs, and STBs
           (2) Web resources from Stata and from other users

Search of official help files, FAQs, Examples, SJs, and STBs

SJ-4-4  sbe36_2 ...... Software update for diagt
        (help diagt if installed) ...... P. T. Seed and A. Tobias
        Q4/04   SJ 4(4):490
        new options added to diagt

STB-59  sbe36.1 ...... Summary statistics for diagnostic tests
        (help diagt if installed) ...... P. T. Seed and A. Tobias
        1/01    pp.9--12; STB Reprints Vol 10, pp.90--93
        complete revision of diagtest to assess a simple diagnostic
        test in comparison with a reference standard; uses the exact
        binomial distribution and provides diagti, an immediate
        version of the command

Click on the sbe36_2 link to install the ado file.

If you are not connected to the internet, that is okay. The four files it adds (diagt.ado, diagt.hlp, diagti.ado, diagti.hlp) are in the StataCourse\practice subdirectory, which is now your current directory, so Stata will find these commands.

Computing the test characteristics,

diagt gold test

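           |        icapsv
     angio |      Pos.       Neg. |     Total
-----------+----------------------+----------
  Abnormal |        44         12 |        56
    Normal |        16        100 |       116
-----------+----------------------+----------
     Total |        60        112 |       172

True abnormal diagnosis defined as gold = 1

                                                 [95% Confidence Interval]
--------------------------------------------------------------------------
Prevalence                         Pr(A)       33%        26%      40.1%
--------------------------------------------------------------------------
Sensitivity                      Pr(+|A)     78.6%      65.6%      88.4%
Specificity                      Pr(-|N)     86.2%      78.6%      91.9%
ROC area               (Sens. + Spec.)/2      .824       .761       .887
--------------------------------------------------------------------------
Likelihood ratio (+)     Pr(+|A)/Pr(+|N)       5.7       3.54       9.16
Likelihood ratio (-)     Pr(-|A)/Pr(-|N)      .249        .15       .413
Odds ratio                   LR(+)/LR(-)      22.9       10.1       52.1
Positive predictive value        Pr(A|+)     73.3%      60.3%      83.9%
Negative predictive value        Pr(N|-)     89.3%        82%      94.3%
--------------------------------------------------------------------------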

This looks like a pretty good guess for a cutpoint for icapsv.
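As a check on the definitions in the Test Characteristics box, the point estimates in the diagt output can be reproduced by hand from the four cell counts in its table. The sketch below uses Stata scalars. The scalar names (tp, fn, fp, tn, and so on) are chosen for this illustration, and are deliberately not a, b, c, d, because a short name such as a could be read by Stata as an abbreviation of the variable angio.

```stata
* cell counts copied from the diagt table above
* tp = a (true positives),  fn = b (false negatives)
* fp = c (false positives), tn = d (true negatives)
scalar tp = 44
scalar fn = 12
scalar fp = 16
scalar tn = 100

* test characteristics, expressed as percents
scalar sens = 100 * tp / (tp + fn)
scalar spec = 100 * tn / (fp + tn)
scalar ppv  = 100 * tp / (tp + fp)
scalar npv  = 100 * tn / (fn + tn)
scalar acc  = 100 * (tp + tn) / (tp + fn + fp + tn)

* ROC area for a 2 x 2 table: the average of sensitivity and specificity,
* converted back from percents to a proportion
scalar roc = (sens + spec) / 200

display "sensitivity = " %4.1f sens "   specificity = " %4.1f spec
display "PPV = " %4.1f ppv "   NPV = " %4.1f npv
display "accuracy = " %4.1f acc "   ROC area = " %5.3f roc
```

The sensitivity, specificity, predictive values, and ROC area displayed should agree with the corresponding rows of the diagt output. The overall accuracy is not reported by diagt, so it has no row to compare against.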

To do this for every possible cutpoint for icapsv, we could simply put the commands inside a loop. Let’s begin to build a program inside the do-file, and run it for the first three values of icapsv.

capture program drop optcut
program define optcut
    * try each candidate cutpoint in turn
    foreach num of numlist 0 21 36 {
        * rebuild the dichotomized test variable for this cutpoint
        capture drop test
        gen test = cond(icapsv>`num',1,0)
        replace test=. if icapsv==.
        diagt gold test
    }
end
optcut

That worked, but let’s turn scrolling off.