Annotated Stata Output Homework #1 Key October 5, 2011, Page 3 of 7

#### Biost 517: Applied Biostatistics I

#### Emerson, Fall 2011

#### Annotated Stata Log File: Homework #1

#### October 5, 2011

#### In this file I give the Stata commands I used to produce

#### the key to Homework #1. In order to properly format

#### a table useful to casual readers, I cut and pasted some

#### of the output into Excel.

#### Comments edited into the log file produced by Stata are

#### on the lines that start with the four ‘#’ signs and are

#### printed in italics.

#### The Stata commands are put in bold face.

#### Stata output is displayed in regular typeface in blue.

#### Start a log file to record commands and output

. log using hw1Stata.log

name: <unnamed>

log: \\.psf\Home\Documents\teach/courses/b517/f11/hw1Stata.log

log type: text

opened on: 22 Oct 2011, 09:07:51

#### Read in data. The infile command was typed all on one line; the “quietly” prefix

#### kept Stata from printing messages about missing data.

. quietly: infile ptid female age dose put0 put6 put12 put15 spd0 spd6 spd12

> spd15 spm0 spm6 spm12 spm15 using teach/datasets/dfmowide.txt

#### I describe the data set just to make sure there are the right number of variables

#### and cases.

. describe

Contains data

obs: 115

vars: 16

size: 7,820 (99.9% of memory free)

              storage  display     value
variable name   type   format      label      variable label
ptid            float  %9.0g
female          float  %9.0g
age             float  %9.0g
dose            float  %9.0g
put0            float  %9.0g
put6            float  %9.0g
put12           float  %9.0g
put15           float  %9.0g
spd0            float  %9.0g
spd6            float  %9.0g
spd12           float  %9.0g
spd15           float  %9.0g
spm0            float  %9.0g
spm6            float  %9.0g
spm12           float  %9.0g
spm15           float  %9.0g

Sorted by:

Note: dataset has changed since last saved

#### The first case should be all missing data, because the first line of the data file

#### holds the variable names, which infile cannot read as numbers. I verify this,

#### then delete that case.

. list in 1

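     +------------------------------------------------------------------------------------------------------+
  1. | ptid | female | age | dose | put0 | put6 | put12 | put15 | spd0 | spd6 | spd12 | spd15 | spm0 | spm6 |
     |    . |      . |   . |    . |    . |    . |     . |     . |    . |    . |     . |     . |    . |    . |
     |---------------+--------------------------------------------------------------------------------------|
     | spm12 | spm15 |
     |     . |     . |
     +---------------+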

. drop in 1

(1 observation deleted)

#### Eventually I will want descriptive statistics within groups defined by dose

#### and defined by whether spermidine measurements are missing at follow-up.

#### Of course, I already have the dose variable, but I will find it easiest

#### to consider the patterns of missing data if I create variables that

#### indicate missingness at month 6 (msng6) and at month 12 (msng12). Note that

#### I show two different ways of doing this.

. g msng6= 0

. replace msng6= 1 if spd6==.

(8 real changes made)

. g msng12= int(spd12==.)
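The two constructions above (generate 0 and then replace, or generate the logical expression directly) give identical indicators. Below is a minimal Python sketch of the same idea. It assumes missing values are represented by None as a stand-in for Stata's system missing value `.`, and the `spd6` list is hypothetical toy data, not the study data.

```python
# Hypothetical toy data: None plays the role of Stata's system missing value "."
spd6 = [3.1, None, 2.4, None, 5.0]

# Way 1: start every indicator at 0, then set it to 1 where the value is missing
# (mirrors: g msng6= 0  /  replace msng6= 1 if spd6==.)
msng6_a = [0] * len(spd6)
for i, value in enumerate(spd6):
    if value is None:
        msng6_a[i] = 1

# Way 2: evaluate the logical condition directly and convert True/False to 1/0
# (mirrors: g msng6= int(spd6==.))
msng6_b = [int(value is None) for value in spd6]

print(msng6_a)
print(msng6_b)
```

In Stata itself, `spd6==.` is true only for the system missing value. The `missing(spd6)` function also catches extended missing values such as `.a`, which would matter if the data file used them. The `int()` wrapper is harmless but unnecessary, because a Stata relational expression already evaluates to 1 (true) or 0 (false).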

#### I first cross-tabulate missingness of spd at 6 months against missingness at 12 months.

. table msng6 msng12

----------------------
          |   msng12
    msng6 |    0     1
----------+-----------
        0 |   94    12
        1 |    1     7
----------------------
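The margins of this 2 x 2 table should reproduce the Total rows of the two tabulate commands that follow: 106 observed and 8 missing at month 6, and 95 observed and 19 missing at month 12. Here is a short Python check that uses only the four printed cell counts:

```python
# Cell counts from the table of msng6 (rows) by msng12 (columns) above
counts = {
    (0, 0): 94, (0, 1): 12,
    (1, 0): 1,  (1, 1): 7,
}

# Row margins: subjects observed (0) or missing (1) spd at month 6
row_totals = {r: counts[(r, 0)] + counts[(r, 1)] for r in (0, 1)}
# Column margins: subjects observed (0) or missing (1) spd at month 12
col_totals = {c: counts[(0, c)] + counts[(1, c)] for c in (0, 1)}

n = sum(counts.values())
missing_both = counts[(1, 1)]
missing_only_6 = counts[(1, 0)]
missing_only_12 = counts[(0, 1)]

print(n, row_totals, col_totals)
print(missing_both, missing_only_6, missing_only_12)
```

So 7 of the 8 subjects missing spd at month 6 were also missing it at month 12, and 12 subjects were missing only at month 12. One subject was missing at month 6 but was measured again at month 12.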

#### Now tabulate the missingness by dose for msng6 and msng12 separately.

. tabulate dose msng6, row

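+----------------+
| Key            |
|----------------|
|   frequency    |
| row percentage |
+----------------+

           |         msng6
      dose |         0          1 |     Total
-----------+----------------------+----------
         0 |        30          2 |        32
           |     93.75       6.25 |    100.00
-----------+----------------------+----------
      .075 |        28          1 |        29
           |     96.55       3.45 |    100.00
-----------+----------------------+----------
        .2 |        23          2 |        25
           |     92.00       8.00 |    100.00
-----------+----------------------+----------
        .4 |        25          3 |        28
           |     89.29      10.71 |    100.00
-----------+----------------------+----------
     Total |       106          8 |       114
           |     92.98       7.02 |    100.00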

. tabulate dose msng12, row

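+----------------+
| Key            |
|----------------|
|   frequency    |
| row percentage |
+----------------+

           |        msng12
      dose |         0          1 |     Total
-----------+----------------------+----------
         0 |        28          4 |        32
           |     87.50      12.50 |    100.00
-----------+----------------------+----------
      .075 |        26          3 |        29
           |     89.66      10.34 |    100.00
-----------+----------------------+----------
        .2 |        21          4 |        25
           |     84.00      16.00 |    100.00
-----------+----------------------+----------
        .4 |        20          8 |        28
           |     71.43      28.57 |    100.00
-----------+----------------------+----------
     Total |        95         19 |       114
           |     83.33      16.67 |    100.00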
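Each row percentage is the cell frequency divided by its row total, times 100. For example, 30/32 = 93.75% of the dose 0 group has spd measured at month 6. The minimal Python sketch below reproduces the msng6 panel, including the Total line, from the printed frequencies, rounded to two decimals as Stata displays them:

```python
# Frequencies from "tabulate dose msng6, row": dose -> (msng6 == 0, msng6 == 1)
freq = {
    "0": (30, 2),
    ".075": (28, 1),
    ".2": (23, 2),
    ".4": (25, 3),
}

def row_percentages(cells):
    """Return each cell as a percentage of its row total, rounded to 2 places."""
    total = sum(cells)
    return tuple(round(100 * c / total, 2) for c in cells)

pcts = {dose: row_percentages(cells) for dose, cells in freq.items()}

# Pool all dose rows column by column to reproduce the Total line
total_cells = tuple(sum(col) for col in zip(*freq.values()))
pcts["Total"] = row_percentages(total_cells)

for dose, p in pcts.items():
    print(dose, p)
```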

#### I use tabstat to get the bulk of my descriptive statistics. I use the

#### “by(dose)” option to get the descriptive statistics stratified by dose.

. tabstat female age spd0 spd6 spd12 spd15, stat(n mean sd min q max) col(stat) by(dose)

Summary for variables: female age spd0 spd6 spd12 spd15

by categories of: dose

dose | N mean sd min p25 p50 p75 max

0 | 32 .1875 .3965578 0 0 0 0 1

| 32 65.86593 8.506546 45.45654 61.14169 66.37098 73.48391 77.17728

| 32 3.262269 1.451079 1.398 2.080885 2.934445 4.168535 7.054

| 30 3.369447 1.532392 1.505 2.195 3.0245 4.282 6.912

| 28 3.255935 1.313698 1.013 2.262 2.8155 4.273 5.91

| 27 2.687976 .9334525 1.249 2.043 2.445 3.368 4.616

.0750000 | 29 .1724138 .3844259 0 0 0 0 1

| 28 61.34507 7.692247 47.80287 56.45448 61.42916 65.76181 76.85421

| 29 3.474182 1.551096 1.50918 2.29 2.9105 4.617 7.017

| 28 2.63864 .8922527 1.388 1.939415 2.45584 3.0455 5.12

| 26 2.919683 .9937638 1.352 2.12713 2.859 3.635 4.923

| 26 2.95123 .98869 0 2.399 2.984 3.628 4.832

.2000000 | 25 0 0 0 0 0 0 0

| 25 62.83893 8.281456 45.42916 59.22519 63.69336 68.32033 77.57974

| 25 3.349862 1.328774 1.69968 2.423 2.92366 4.01 6.218

| 23 2.583477 1.642605 1.068 1.487 1.84741 3.898 7.844

| 21 2.71162 1.395473 .293 1.757 2.509 3.777 6.454

| 21 2.978943 .9025277 1.806 2.195 2.807 3.706 4.805

.4000000 | 28 .2142857 .4178554 0 0 0 0 1

| 28 63.86809 7.807557 48.52019 60.10267 64.9692 69.39356 80.97467

| 28 3.564948 1.884708 .66 2.153115 3.08422 4.7 7.6

| 25 2.67693 1.427543 1.05862 1.667 2.069 2.89825 6.344

| 20 1.949568 .7989114 0 1.4755 1.929 2.4565 3.417

| 18 2.703843 .8659153 1.28947 2.35871 2.6865 3.018 4.465

Total | 114 .1491228 .3577822 0 0 0 0 1

| 113 63.58099 8.159012 45.42916 59.17043 64.3258 69.5989 80.97467

| 114 3.409728 1.55291 .66 2.268 2.957275 4.286 7.6

| 106 2.842533 1.412813 1.05862 1.798 2.52334 3.471 7.844

| 95 2.768562 1.233778 0 1.987 2.553 3.586 6.454

| 92 2.831895 .9246859 0 2.2015 2.719 3.437 4.832
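The Total rows are computed from all subjects pooled, so each Total mean equals the N-weighted average of the four dose-group means. The Python sketch below checks this for spd0 and age. It uses the means as printed, so agreement holds only to about five significant digits. Age has N = 28 rather than 29 in the dose .075 group, so the weights differ between the two variables.

```python
# Group sizes and means copied from the tabstat output above (dose 0, .075, .2, .4)
n_spd0 = [32, 29, 25, 28]
mean_spd0 = [3.262269, 3.474182, 3.349862, 3.564948]

n_age = [32, 28, 25, 28]  # one subject at dose .075 has no recorded age
mean_age = [65.86593, 61.34507, 62.83893, 63.86809]

def pooled_mean(ns, means):
    """Mean of all observations, rebuilt as the N-weighted average of group means."""
    return sum(n * m for n, m in zip(ns, means)) / sum(ns)

print(sum(n_spd0), round(pooled_mean(n_spd0, mean_spd0), 5))
print(sum(n_age), round(pooled_mean(n_age, mean_age), 5))
```

No such shortcut exists for the standard deviation, because the pooled SD also reflects the spread between the group means.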

#### I even used tabstat on the binary indicator of female sex. I can do that because

#### the mean of a 0/1 variable is the proportion of observations equal to 1. I just

#### ignore the rest of the descriptive statistics. I could, of course, have created a table:

. tabulate dose female, row

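+----------------+
| Key            |
|----------------|
|   frequency    |
| row percentage |
+----------------+

           |        female
      dose |         0          1 |     Total
-----------+----------------------+----------
         0 |        26          6 |        32
           |     81.25      18.75 |    100.00
-----------+----------------------+----------
      .075 |        24          5 |        29
           |     82.76      17.24 |    100.00
-----------+----------------------+----------
        .2 |        25          0 |        25
           |    100.00       0.00 |    100.00
-----------+----------------------+----------
        .4 |        22          6 |        28
           |     78.57      21.43 |    100.00
-----------+----------------------+----------
     Total |        97         17 |       114
           |     85.09      14.91 |    100.00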
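The female rows of the tabstat output can be recovered from these counts. For a 0/1 variable with k ones among n observations, the mean is p = k/n. The sample standard deviation, which uses an n - 1 denominator, is sqrt(p(1 - p) n / (n - 1)), so the SD carries no information beyond the proportion. The Python sketch below reproduces the tabstat mean and SD of female in each dose group:

```python
import math

def binary_summary(n_ones, n):
    """Mean and sample SD (n - 1 denominator) of a 0/1 variable with n_ones ones."""
    p = n_ones / n
    sd = math.sqrt(p * (1 - p) * n / (n - 1))
    return p, sd

# Female counts from "tabulate dose female, row": dose -> (number female, group size)
groups = {"0": (6, 32), ".075": (5, 29), ".2": (0, 25), ".4": (6, 28), "Total": (17, 114)}

for dose, (n_female, n) in groups.items():
    mean, sd = binary_summary(n_female, n)
    print(dose, round(mean, 7), round(sd, 7))
```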

#### I then get descriptive statistics by whether or not the subject is missing spd data, first at 6 months and then at 12 months.

. tabstat female age dose spd0 spd6 spd12 spd15, stat(n mean sd min q max) col(stat) by(msng6)

Summary for variables: female age dose spd0 spd6 spd12 spd15

by categories of: msng6

msng6 | N mean sd min p25 p50 p75 max

0 | 106 .1509434 .3596944 0 0 0 0 1

| 105 63.59798 8.359108 45.42916 59.03901 64.3258 69.64271 80.97467

| 106 .1575472 .1526422 0 0 .075 .2 .4

| 106 3.41639 1.567049 .66 2.25839 2.996445 4.386 7.6

| 106 2.842533 1.412813 1.05862 1.798 2.52334 3.471 7.844

| 94 2.746876 1.222055 0 1.987 2.531 3.417 6.454

| 90 2.83218 .9322666 0 2.195 2.719 3.451 4.832

1 | 8 .125 .3535534 0 0 0 0 1

| 8 63.35797 5.187707 56.84326 59.7358 62.56126 66.39562 72.63518

| 8 .209375 .1752231 0 .0375 .2 .4 .4

| 8 3.321457 1.443875 2.087 2.4425 2.90933 3.5785 6.624

| 0 . . . . . . .

| 1 4.807 . 4.807 4.807 4.807 4.807 4.807

| 2 2.819045 .6759304 2.34109 2.34109 2.819045 3.297 3.297

Total | 114 .1491228 .3577822 0 0 0 0 1

| 113 63.58099 8.159012 45.42916 59.17043 64.3258 69.5989 80.97467

| 114 .1611842 .1540419 0 0 .075 .2 .4

| 114 3.409728 1.55291 .66 2.268 2.957275 4.286 7.6

| 106 2.842533 1.412813 1.05862 1.798 2.52334 3.471 7.844

| 95 2.768562 1.233778 0 1.987 2.553 3.586 6.454

| 92 2.831895 .9246859 0 2.2015 2.719 3.437 4.832
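The dose rows of this table can be rebuilt from the dose-by-msng6 frequencies tabulated earlier, which cross-checks the two outputs. The Python sketch below reconstructs the dose values for each msng6 group and recomputes the stat(n mean sd min q max) list. The `stata_percentile` helper is hypothetical. It encodes the percentile rule that summarize and tabstat are assumed to use: average the two straddling order statistics when n p/100 is a whole number, and otherwise take the next order statistic. Treat that rule as an assumption, although it does reproduce the printed p25 of .0375 for msng6 = 1.

```python
import math
import statistics

def stata_percentile(values, p):
    """Hypothetical helper; assumed summarize/tabstat rule, valid for 0 < p < 100.
    With sorted data x(1), ..., x(n) and P = n * p / 100, average x(P) and x(P + 1)
    when P is a whole number; otherwise take x(ceil(P))."""
    x = sorted(values)
    P = len(x) * p / 100
    if P == int(P):
        k = int(P)
        return (x[k - 1] + x[k]) / 2  # 1-based x(k), x(k+1) -> 0-based k-1, k
    return x[math.ceil(P) - 1]

def summarize(values):
    """N, mean, sample SD, min, p25, p50, p75, max: the stat(n mean sd min q max) list."""
    return (
        len(values),
        statistics.mean(values),
        statistics.stdev(values),
        min(values),
        stata_percentile(values, 25),
        stata_percentile(values, 50),
        stata_percentile(values, 75),
        max(values),
    )

# Dose values rebuilt from "tabulate dose msng6": msng6 == 1 has 2, 1, 2, 3 subjects
# at doses 0, .075, .2, .4; msng6 == 0 has 30, 28, 23, 25
dose_msng6_1 = [0] * 2 + [0.075] * 1 + [0.2] * 2 + [0.4] * 3
dose_msng6_0 = [0] * 30 + [0.075] * 28 + [0.2] * 23 + [0.4] * 25

print(summarize(dose_msng6_1))
print(summarize(dose_msng6_0))
```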

. tabstat female age dose spd0 spd6 spd12 spd15, stat(n mean sd min q max) col(stat) by(msng12)

Summary for variables: female age dose spd0 spd6 spd12 spd15

by categories of: msng12

msng12 | N mean sd min p25 p50 p75 max

0 | 95 .1578947 .3665767 0 0 0 0 1

| 94 63.58134 8.126623 45.42916 59.03901 64.25735 69.64271 80.97467

| 95 .1489474 .1487052 0 0 .075 .2 .4

| 95 3.388189 1.515432 .66 2.25839 3.00237 4.209 7.212

| 94 2.793909 1.323701 1.05862 1.723 2.45584 3.471 6.631

| 95 2.768562 1.233778 0 1.987 2.553 3.586 6.454

| 88 2.800657 .9192363 0 2.1825 2.7035 3.3645 4.832

1 | 19 .1052632 .3153018 0 0 0 0 1

| 19 63.57924 8.543595 48.52019 59.41958 64.33676 69.5989 75.96167

| 19 .2223684 .1695367 0 .075 .2 .4 .4

| 19 3.517424 1.769563 1.658 2.268 2.895 5.08 7.6

| 12 3.223427 2.015124 1.78992 1.9935 2.527325 3.226625 7.844

| 0 . . . . . . .

| 4 3.51912 .8792833 2.34109 2.95274 3.635195 4.0855 4.465

Total | 114 .1491228 .3577822 0 0 0 0 1

| 113 63.58099 8.159012 45.42916 59.17043 64.3258 69.5989 80.97467

| 114 .1611842 .1540419 0 0 .075 .2 .4

| 114 3.409728 1.55291 .66 2.268 2.957275 4.286 7.6

| 106 2.842533 1.412813 1.05862 1.798 2.52334 3.471 7.844

| 95 2.768562 1.233778 0 1.987 2.553 3.586 6.454

| 92 2.831895 .9246859 0 2.2015 2.719 3.437 4.832

#### While the above statistics are of interest, I am concerned that subjects might be

#### missing data because of adverse events and, furthermore, that those adverse events

#### might arise from the same mechanism that lowers spd levels. To the extent that this

#### is so, I want to examine the descriptive statistics by both missingness and dose. I use

#### the “by(msng6)” option to stratify by missingness, along with the “bysort dose”

#### prefix to stratify by dose. (tabstat will not accept two variables in the by() option,

#### or else I would do that.) I will eventually use an Excel file to format the results.
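Conceptually, the bysort prefix combined with by(msng6) groups subjects on the pair (dose, msng6). The minimal Python sketch below shows that two-way grouping. The records are hypothetical, constructed only so that the cell sizes and proportions of females match those printed for dose 0 and dose .075 in the output that follows.

```python
from collections import defaultdict

# Hypothetical records (dose, msng6, female), built to match the printed cell sizes
# and female proportions for dose 0 and dose .075; each tuple stands for one subject
records = (
    [(0, 0, 1)] * 6 + [(0, 0, 0)] * 24 + [(0, 1, 0)] * 2
    + [(0.075, 0, 1)] * 5 + [(0.075, 0, 0)] * 23 + [(0.075, 1, 0)] * 1
)

# Accumulate [N, number female] within each (dose, msng6) cell,
# the same strata that "bysort dose: tabstat female, by(msng6)" reports
cells = defaultdict(lambda: [0, 0])
for dose, msng6, female in records:
    cells[(dose, msng6)][0] += 1
    cells[(dose, msng6)][1] += female

for (dose, msng6), (n, n_female) in sorted(cells.items()):
    print(dose, msng6, n, round(n_female / n, 7))
```

Each key of `cells` corresponds to one row of one dose panel in the Stata output.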

. bysort dose: tabstat female age spd0 spd6 spd12 spd15, stat(n mean sd min q max) col(stat) by(msng6)

-> dose = 0

Summary for variables: female age spd0 spd6 spd12 spd15

by categories of: msng6

msng6 | N mean sd min p25 p50 p75 max

0 | 30 .2 .4068381 0 0 0 0 1

| 30 65.83409 8.637422 45.45654 62.23135 66.37098 74.33265 77.17728

| 30 3.280087 1.495238 1.398 2.0623 2.934445 4.209 7.054

| 30 3.369447 1.532392 1.505 2.195 3.0245 4.282 6.912

| 27 3.198489 1.302391 1.013 2.227 2.795 4.101 5.91

| 26 2.664552 .9438114 1.249 2.043 2.3975 3.368 4.616

1 | 2 0 0 0 0 0 0 0

| 2 66.3436 8.897635 60.05202 60.05202 66.3436 72.63518 72.63518

| 2 2.995 .5345726 2.617 2.617 2.995 3.373 3.373

| 0 . . . . . . .

| 1 4.807 . 4.807 4.807 4.807 4.807 4.807

| 1 3.297 . 3.297 3.297 3.297 3.297 3.297

Total | 32 .1875 .3965578 0 0 0 0 1

| 32 65.86593 8.506546 45.45654 61.14169 66.37098 73.48391 77.17728

| 32 3.262269 1.451079 1.398 2.080885 2.934445 4.168535 7.054

| 30 3.369447 1.532392 1.505 2.195 3.0245 4.282 6.912

| 28 3.255935 1.313698 1.013 2.262 2.8155 4.273 5.91

| 27 2.687976 .9334525 1.249 2.043 2.445 3.368 4.616

-> dose = .075

Summary for variables: female age spd0 spd6 spd12 spd15

by categories of: msng6

msng6 | N mean sd min p25 p50 p75 max

0 | 28 .1785714 .390021 0 0 0 0 1

| 27 61.23426 7.815975 47.80287 56.18344 61.25667 66.14374 76.85421

| 28 3.361688 1.454106 1.50918 2.28445 2.85475 4.40282 7.017

| 28 2.63864 .8922527 1.388 1.939415 2.45584 3.0455 5.12

| 26 2.919683 .9937638 1.352 2.12713 2.859 3.635 4.923

| 26 2.95123 .98869 0 2.399 2.984 3.628 4.832

1 | 1 0 . 0 0 0 0 0

| 1 64.33676 . 64.33676 64.33676 64.33676 64.33676 64.33676

| 1 6.624 . 6.624 6.624 6.624 6.624 6.624

| 0 . . . . . . .

| 0 . . . . . . .

| 0 . . . . . . .

Total | 29 .1724138 .3844259 0 0 0 0 1

| 28 61.34507 7.692247 47.80287 56.45448 61.42916 65.76181 76.85421

| 29 3.474182 1.551096 1.50918 2.29 2.9105 4.617 7.017

| 28 2.63864 .8922527 1.388 1.939415 2.45584 3.0455 5.12

| 26 2.919683 .9937638 1.352 2.12713 2.859 3.635 4.923

| 26 2.95123 .98869 0 2.399 2.984 3.628 4.832

-> dose = .2

Summary for variables: female age spd0 spd6 spd12 spd15

by categories of: msng6

msng6 | N mean sd min p25 p50 p75 max

0 | 23 0 0 0 0 0 0 0

| 23 62.52966 8.557483 45.42916 59.17043 63.16496 69.92471 77.57974

| 23 3.349517 1.381784 1.69968 2.3264 2.73 4.386 6.218

| 23 2.583477 1.642605 1.068 1.487 1.84741 3.898 7.844

| 21 2.71162 1.395473 .293 1.757 2.509 3.777 6.454

| 20 3.010836 .9137524 1.806 2.173 2.968 3.7965 4.805