Annotated Stata Output Homework #1 Key October 5, 2011
#### Biost 517: Applied Biostatistics I
#### Emerson, Fall 2011
#### Annotated Stata Log File: Homework #1
#### October 5, 2011
#### In this file I give the Stata commands I used to produce
#### the key to Homework #1. To format a table that is useful
#### to casual readers, I cut and pasted some of the output
#### into Excel.
#### Comments edited into the log file produced by Stata are
#### on the lines that start with the four ‘#’ signs and are
#### printed in italics.
#### The Stata commands are put in bold face.
#### Stata output is displayed in regular typeface in blue.
#### Start a log file to record commands and output
. log using hw1Stata.log
name: <unnamed>
log: \\.psf\Home\Documents\teach/courses/b517/f11/hw1Stata.log
log type: text
opened on: 22 Oct 2011, 09:07:51
#### Read in data: The infile command was typed all on one line; the “quietly” prefix
#### keeps Stata from printing messages about missing data
. quietly: infile ptid female age dose put0 put6 put12 put15 spd0 spd6 spd12
> spd15 spm0 spm6 spm12 spm15 using teach/datasets/dfmowide.txt
#### I describe the data set just to make sure it has the right number of variables
#### and cases.
. describe
Contains data
obs: 115
vars: 16
size: 7,820 (99.9% of memory free)
storage display value
variable name type format label variable label
ptid float %9.0g
female float %9.0g
age float %9.0g
dose float %9.0g
put0 float %9.0g
put6 float %9.0g
put12 float %9.0g
put15 float %9.0g
spd0 float %9.0g
spd6 float %9.0g
spd12 float %9.0g
spd15 float %9.0g
spm0 float %9.0g
spm6 float %9.0g
spm12 float %9.0g
spm15 float %9.0g
Sorted by:
Note: dataset has changed since last saved
#### The first case should be all missing data, owing to the line of variable names
#### at the top of the data file. I verify this, then delete that case.
. list in 1
+------+
1. | ptid | female | age | dose | put0 | put6 | put12 | put15 | spd0 | spd6 | spd12 | spd15 | spm0 | spm6 |
| . | . | . | . | . | . | . | . | . | . | . | . | . | . |
|------+------|
| spm12 | spm15 |
| . | . |
+------+
. drop in 1
(1 observation deleted)
#### Eventually I will want descriptive statistics within groups defined by dose
#### and defined by whether spermidine measurements are missing at follow-up.
#### Of course, I already have the dose variable, but I will find it easiest
#### to consider the patterns of missing data if I create variables that
#### indicate missingness at month 6 (msng6) and at month 12 (msng12). Note that
#### I show two different ways of doing this.
. g msng6= 0
. replace msng6= 1 if spd6==.
(8 real changes made)
. g msng12= int(spd12==.)
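#### A third option, shown only as a sketch and not part of the original log:
#### Stata's missing() function returns 1 when its argument is missing (including
#### the extended missing values .a through .z) and 0 otherwise. The variable
#### name msng15 is hypothetical and is not used elsewhere in this key. The assert
#### lines assume the data contain only the ordinary missing value “.”, in which
#### case all three constructions should agree.

```stata
* Sketch only: msng15 is an illustrative name, not a variable from the log
generate byte msng15 = missing(spd15)

* Check that the two constructions used above agree with missing().
* assert stops with an error if the condition fails for any observation.
assert msng6  == missing(spd6)
assert msng12 == missing(spd12)
```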
#### I first cross-tabulate missingness of spd at 6 months against missingness at 12 months.
. table msng6 msng12
          |    msng12
    msng6 |     0      1
----------+-------------
        0 |    94     12
        1 |     1      7
#### Now tabulate the missingness by dose for msng6 and msng12 separately.
. tabulate dose msng6, row
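+----------------+
| Key            |
|----------------|
|   frequency    |
| row percentage |
+----------------+
           |         msng6
      dose |         0          1 |     Total
-----------+----------------------+----------
         0 |        30          2 |        32
           |     93.75       6.25 |    100.00
-----------+----------------------+----------
      .075 |        28          1 |        29
           |     96.55       3.45 |    100.00
-----------+----------------------+----------
        .2 |        23          2 |        25
           |     92.00       8.00 |    100.00
-----------+----------------------+----------
        .4 |        25          3 |        28
           |     89.29      10.71 |    100.00
-----------+----------------------+----------
     Total |       106          8 |       114
           |     92.98       7.02 |    100.00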
. tabulate dose msng12, row
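+----------------+
| Key            |
|----------------|
|   frequency    |
| row percentage |
+----------------+
           |        msng12
      dose |         0          1 |     Total
-----------+----------------------+----------
         0 |        28          4 |        32
           |     87.50      12.50 |    100.00
-----------+----------------------+----------
      .075 |        26          3 |        29
           |     89.66      10.34 |    100.00
-----------+----------------------+----------
        .2 |        21          4 |        25
           |     84.00      16.00 |    100.00
-----------+----------------------+----------
        .4 |        20          8 |        28
           |     71.43      28.57 |    100.00
-----------+----------------------+----------
     Total |        95         19 |       114
           |     83.33      16.67 |    100.00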
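#### I use tabstat to get the bulk of my descriptive statistics. I use the
#### “by(dose)” option to get the descriptive statistics stratified by dose.
#### Within each dose group, the rows appear in the order of the variable list:
#### female, age, spd0, spd6, spd12, spd15.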
. tabstat female age spd0 spd6 spd12 spd15, stat(n mean sd min q max) col(stat) by(dose)
Summary for variables: female age spd0 spd6 spd12 spd15
by categories of: dose
dose | N mean sd min p25 p50 p75 max
0 | 32 .1875 .3965578 0 0 0 0 1
| 32 65.86593 8.506546 45.45654 61.14169 66.37098 73.48391 77.17728
| 32 3.262269 1.451079 1.398 2.080885 2.934445 4.168535 7.054
| 30 3.369447 1.532392 1.505 2.195 3.0245 4.282 6.912
| 28 3.255935 1.313698 1.013 2.262 2.8155 4.273 5.91
| 27 2.687976 .9334525 1.249 2.043 2.445 3.368 4.616
.0750000 | 29 .1724138 .3844259 0 0 0 0 1
| 28 61.34507 7.692247 47.80287 56.45448 61.42916 65.76181 76.85421
| 29 3.474182 1.551096 1.50918 2.29 2.9105 4.617 7.017
| 28 2.63864 .8922527 1.388 1.939415 2.45584 3.0455 5.12
| 26 2.919683 .9937638 1.352 2.12713 2.859 3.635 4.923
| 26 2.95123 .98869 0 2.399 2.984 3.628 4.832
.2000000 | 25 0 0 0 0 0 0 0
| 25 62.83893 8.281456 45.42916 59.22519 63.69336 68.32033 77.57974
| 25 3.349862 1.328774 1.69968 2.423 2.92366 4.01 6.218
| 23 2.583477 1.642605 1.068 1.487 1.84741 3.898 7.844
| 21 2.71162 1.395473 .293 1.757 2.509 3.777 6.454
| 21 2.978943 .9025277 1.806 2.195 2.807 3.706 4.805
.4000000 | 28 .2142857 .4178554 0 0 0 0 1
| 28 63.86809 7.807557 48.52019 60.10267 64.9692 69.39356 80.97467
| 28 3.564948 1.884708 .66 2.153115 3.08422 4.7 7.6
| 25 2.67693 1.427543 1.05862 1.667 2.069 2.89825 6.344
| 20 1.949568 .7989114 0 1.4755 1.929 2.4565 3.417
| 18 2.703843 .8659153 1.28947 2.35871 2.6865 3.018 4.465
Total | 114 .1491228 .3577822 0 0 0 0 1
| 113 63.58099 8.159012 45.42916 59.17043 64.3258 69.5989 80.97467
| 114 3.409728 1.55291 .66 2.268 2.957275 4.286 7.6
| 106 2.842533 1.412813 1.05862 1.798 2.52334 3.471 7.844
| 95 2.768562 1.233778 0 1.987 2.553 3.586 6.454
| 92 2.831895 .9246859 0 2.2015 2.719 3.437 4.832
#### I even used tabstat on the binary indicator of female sex. I can do that because
#### the mean of a binary variable is the proportion having 1. I just ignore the rest
#### of the descriptive statistics. I could of course have created a table instead.
. tabulate dose female, row
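+----------------+
| Key            |
|----------------|
|   frequency    |
| row percentage |
+----------------+
           |        female
      dose |         0          1 |     Total
-----------+----------------------+----------
         0 |        26          6 |        32
           |     81.25      18.75 |    100.00
-----------+----------------------+----------
      .075 |        24          5 |        29
           |     82.76      17.24 |    100.00
-----------+----------------------+----------
        .2 |        25          0 |        25
           |    100.00       0.00 |    100.00
-----------+----------------------+----------
        .4 |        22          6 |        28
           |     78.57      21.43 |    100.00
-----------+----------------------+----------
     Total |        97         17 |       114
           |     85.09      14.91 |    100.00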
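#### As a quick check on the statement that the mean of a binary variable is the
#### proportion having 1 (a sketch, not part of the original log): in the dose 0
#### group, 6 of 32 subjects are female, and 6/32 matches the mean of .1875 that
#### tabstat reported for that group. The summarize() option of tabulate is one
#### more way to get the group means of female in a compact table.

```stata
* Sketch only: the mean of a 0/1 variable equals the proportion of 1s.
* Dose 0 group: 6 females among 32 subjects; this displays .1875
display 6/32

* Mean, standard deviation, and frequency of female within each dose group
tabulate dose, summarize(female)
```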
#### I then get descriptive statistics by whether or not the subject is missing spd data at 6 and 12 months.
. tabstat female age dose spd0 spd6 spd12 spd15, stat(n mean sd min q max) col(stat) by(msng6)
Summary for variables: female age dose spd0 spd6 spd12 spd15
by categories of: msng6
msng6 | N mean sd min p25 p50 p75 max
0 | 106 .1509434 .3596944 0 0 0 0 1
| 105 63.59798 8.359108 45.42916 59.03901 64.3258 69.64271 80.97467
| 106 .1575472 .1526422 0 0 .075 .2 .4
| 106 3.41639 1.567049 .66 2.25839 2.996445 4.386 7.6
| 106 2.842533 1.412813 1.05862 1.798 2.52334 3.471 7.844
| 94 2.746876 1.222055 0 1.987 2.531 3.417 6.454
| 90 2.83218 .9322666 0 2.195 2.719 3.451 4.832
1 | 8 .125 .3535534 0 0 0 0 1
| 8 63.35797 5.187707 56.84326 59.7358 62.56126 66.39562 72.63518
| 8 .209375 .1752231 0 .0375 .2 .4 .4
| 8 3.321457 1.443875 2.087 2.4425 2.90933 3.5785 6.624
| 0 . . . . . . .
| 1 4.807 . 4.807 4.807 4.807 4.807 4.807
| 2 2.819045 .6759304 2.34109 2.34109 2.819045 3.297 3.297
Total | 114 .1491228 .3577822 0 0 0 0 1
| 113 63.58099 8.159012 45.42916 59.17043 64.3258 69.5989 80.97467
| 114 .1611842 .1540419 0 0 .075 .2 .4
| 114 3.409728 1.55291 .66 2.268 2.957275 4.286 7.6
| 106 2.842533 1.412813 1.05862 1.798 2.52334 3.471 7.844
| 95 2.768562 1.233778 0 1.987 2.553 3.586 6.454
| 92 2.831895 .9246859 0 2.2015 2.719 3.437 4.832
. tabstat female age dose spd0 spd6 spd12 spd15, stat(n mean sd min q max) col(stat) by(msng12)
Summary for variables: female age dose spd0 spd6 spd12 spd15
by categories of: msng12
msng12 | N mean sd min p25 p50 p75 max
0 | 95 .1578947 .3665767 0 0 0 0 1
| 94 63.58134 8.126623 45.42916 59.03901 64.25735 69.64271 80.97467
| 95 .1489474 .1487052 0 0 .075 .2 .4
| 95 3.388189 1.515432 .66 2.25839 3.00237 4.209 7.212
| 94 2.793909 1.323701 1.05862 1.723 2.45584 3.471 6.631
| 95 2.768562 1.233778 0 1.987 2.553 3.586 6.454
| 88 2.800657 .9192363 0 2.1825 2.7035 3.3645 4.832
1 | 19 .1052632 .3153018 0 0 0 0 1
| 19 63.57924 8.543595 48.52019 59.41958 64.33676 69.5989 75.96167
| 19 .2223684 .1695367 0 .075 .2 .4 .4
| 19 3.517424 1.769563 1.658 2.268 2.895 5.08 7.6
| 12 3.223427 2.015124 1.78992 1.9935 2.527325 3.226625 7.844
| 0 . . . . . . .
| 4 3.51912 .8792833 2.34109 2.95274 3.635195 4.0855 4.465
Total | 114 .1491228 .3577822 0 0 0 0 1
| 113 63.58099 8.159012 45.42916 59.17043 64.3258 69.5989 80.97467
| 114 .1611842 .1540419 0 0 .075 .2 .4
| 114 3.409728 1.55291 .66 2.268 2.957275 4.286 7.6
| 106 2.842533 1.412813 1.05862 1.798 2.52334 3.471 7.844
| 95 2.768562 1.233778 0 1.987 2.553 3.586 6.454
| 92 2.831895 .9246859 0 2.2015 2.719 3.437 4.832
#### While the above statistics are of interest, to the extent that I am afraid that subjects
#### might be missing data due to adverse events and, furthermore, that those adverse events
#### might be occurring due to the same mechanism that lowers the spd levels, I might want
#### to examine the descriptive statistics by both missingness and dose. I use the
#### “by(msng6)” option to get the descriptive statistics stratified by
#### missingness, along with the “bysort dose” prefix. (tabstat will not let
#### me give two variables in the by() option, or else I would do that.)
#### I will eventually use an Excel file to format the results I want.
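#### A possible alternative to the bysort prefix used in the command below, shown
#### only as a sketch and not run for this key: egen's group() function can combine
#### dose and msng6 into a single grouping variable, which tabstat will then accept
#### in by(). The variable name dosemsng6 is hypothetical, and the exact text of
#### the value labels depends on how Stata formats the dose values.

```stata
* Sketch only: dosemsng6 is an illustrative name, not a variable from the log.
* group() numbers each observed (dose, msng6) combination 1, 2, ...;
* the label option attaches the underlying values as value labels.
egen dosemsng6 = group(dose msng6), label

* One table with a block of rows for each dose-by-missingness combination
tabstat female age spd0 spd6 spd12 spd15, stat(n mean sd min q max) col(stat) by(dosemsng6)
```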
. bysort dose: tabstat female age spd0 spd6 spd12 spd15, stat(n mean sd min q max) col(stat) by(msng6)
-> dose = 0
Summary for variables: female age spd0 spd6 spd12 spd15
by categories of: msng6
msng6 | N mean sd min p25 p50 p75 max
0 | 30 .2 .4068381 0 0 0 0 1
| 30 65.83409 8.637422 45.45654 62.23135 66.37098 74.33265 77.17728
| 30 3.280087 1.495238 1.398 2.0623 2.934445 4.209 7.054
| 30 3.369447 1.532392 1.505 2.195 3.0245 4.282 6.912
| 27 3.198489 1.302391 1.013 2.227 2.795 4.101 5.91
| 26 2.664552 .9438114 1.249 2.043 2.3975 3.368 4.616
1 | 2 0 0 0 0 0 0 0
| 2 66.3436 8.897635 60.05202 60.05202 66.3436 72.63518 72.63518
| 2 2.995 .5345726 2.617 2.617 2.995 3.373 3.373
| 0 . . . . . . .
| 1 4.807 . 4.807 4.807 4.807 4.807 4.807
| 1 3.297 . 3.297 3.297 3.297 3.297 3.297
Total | 32 .1875 .3965578 0 0 0 0 1
| 32 65.86593 8.506546 45.45654 61.14169 66.37098 73.48391 77.17728
| 32 3.262269 1.451079 1.398 2.080885 2.934445 4.168535 7.054
| 30 3.369447 1.532392 1.505 2.195 3.0245 4.282 6.912
| 28 3.255935 1.313698 1.013 2.262 2.8155 4.273 5.91
| 27 2.687976 .9334525 1.249 2.043 2.445 3.368 4.616
-> dose = .075
Summary for variables: female age spd0 spd6 spd12 spd15
by categories of: msng6
msng6 | N mean sd min p25 p50 p75 max
0 | 28 .1785714 .390021 0 0 0 0 1
| 27 61.23426 7.815975 47.80287 56.18344 61.25667 66.14374 76.85421
| 28 3.361688 1.454106 1.50918 2.28445 2.85475 4.40282 7.017
| 28 2.63864 .8922527 1.388 1.939415 2.45584 3.0455 5.12
| 26 2.919683 .9937638 1.352 2.12713 2.859 3.635 4.923
| 26 2.95123 .98869 0 2.399 2.984 3.628 4.832
1 | 1 0 . 0 0 0 0 0
| 1 64.33676 . 64.33676 64.33676 64.33676 64.33676 64.33676
| 1 6.624 . 6.624 6.624 6.624 6.624 6.624
| 0 . . . . . . .
| 0 . . . . . . .
| 0 . . . . . . .
Total | 29 .1724138 .3844259 0 0 0 0 1
| 28 61.34507 7.692247 47.80287 56.45448 61.42916 65.76181 76.85421
| 29 3.474182 1.551096 1.50918 2.29 2.9105 4.617 7.017
| 28 2.63864 .8922527 1.388 1.939415 2.45584 3.0455 5.12
| 26 2.919683 .9937638 1.352 2.12713 2.859 3.635 4.923
| 26 2.95123 .98869 0 2.399 2.984 3.628 4.832
-> dose = .2
Summary for variables: female age spd0 spd6 spd12 spd15
by categories of: msng6
msng6 | N mean sd min p25 p50 p75 max
0 | 23 0 0 0 0 0 0 0
| 23 62.52966 8.557483 45.42916 59.17043 63.16496 69.92471 77.57974
| 23 3.349517 1.381784 1.69968 2.3264 2.73 4.386 6.218
| 23 2.583477 1.642605 1.068 1.487 1.84741 3.898 7.844
| 21 2.71162 1.395473 .293 1.757 2.509 3.777 6.454
| 20 3.010836 .9137524 1.806 2.173 2.968 3.7965 4.805