Describing Univariate Distributions

KeckShilling

library(ggplot2)
library(descr)
library(knitr)
load("~/Desktop/Math 315/Projects/Data/addhealth_clean.Rdata")

Intro

In this assignment, we look at variables related to drug use and parental structure. We want to find out whether there is an association between parental structure (categorical) and drug use (categorical) among the respondents. The variable drug_use is a combination of three separate variables (H4TO65C - H4TO65E), all of which ask whether a drug is used. Because we are not focused on which type of drug, we were able to combine them. Parental structure is described in terms of biological parent(s), biological relative(s), step-parent(s), adoptive parent(s), and other. We will also look at the age (quantitative) of the respondent to see whether it plays a role in drug use. By looking at the initial age, agew1, and the current age, Age, we might see whether there is a trend among younger vs. older respondents.
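One way such a combined indicator could be built is sketched below. The toy data frame, the 0/1 coding of the three items, and the rule that a respondent counts as a user if any item equals 1 are all assumptions made for illustration; the actual recoding behind addhealth_clean.Rdata may differ.

```r
# Toy data standing in for the three drug-use items
# (assumed coding: 1 = yes, 0 = no, NA = missing)
toy <- data.frame(
  H4TO65C = c(0, 1, 0, NA, 0),
  H4TO65D = c(0, 0, 0, NA, 1),
  H4TO65E = c(0, 0, NA, NA, 0)
)

items <- as.matrix(toy[, c("H4TO65C", "H4TO65D", "H4TO65E")])

# TRUE if any item equals 1
any_yes <- rowSums(items == 1, na.rm = TRUE) > 0

# A respondent is missing only when all three items are missing
all_missing <- rowSums(!is.na(items)) == 0

toy$Drugs <- ifelse(all_missing, NA, any_yes)

table(toy$Drugs, useNA = "ifany")
```

In this sketch a respondent with some items missing and no "yes" answers is counted as FALSE, which is a judgment call; a stricter rule would set such rows to NA.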

Categorical

table(data$pstructure)

##
## Biological Other
## 2446 4058

kable(freq(data$pstructure))

Category / Frequency / Percent
Biological / 2446 / 37.60763
Other / 4058 / 62.39237
Total / 6504 / 100.00000

The table shows a higher frequency for being raised by someone other than a biological parent or relative, at 62.4% (n=4058). Those raised by a biological parent or relative make up 37.6% (n=2446).

table(data$Drugs)

##
## FALSE TRUE
## 3673 1410

kable(freq(data$Drugs))

Category / Frequency / Percent / Valid Percent
FALSE / 3673 / 56.47294 / 72.26048
TRUE / 1410 / 21.67897 / 27.73952
NA's / 1421 / 21.84809 / NA
Total / 6504 / 100.00000 / 100.00000

The frequency of people who use drugs is much lower than that of people who do not. Of the 6,504 respondents, 21.7% (n=1410) use drugs and 56.5% (n=3673) do not; the remaining 21.8% (n=1421) are missing. Among valid responses only, 27.7% use drugs and 72.3% do not.
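The Percent and Valid Percent columns differ only in their denominator. A minimal base-R sketch, using counts typed in by hand from the frequency table for Drugs (the vector `counts` is a stand-in for illustration, not part of the original analysis):

```r
# Counts copied from the frequency table for Drugs
counts <- c("FALSE" = 3673, "TRUE" = 1410, "NA" = 1421)

# Percent: the denominator includes missing responses
percent <- 100 * counts / sum(counts)

# Valid percent: the denominator excludes missing responses
valid <- counts[c("FALSE", "TRUE")]
valid_percent <- 100 * valid / sum(valid)

round(percent, 1)
round(valid_percent, 1)
```

Which column to report depends on the question: Percent describes the whole sample, while Valid Percent describes only those who answered.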

Continuous

par(mfrow=c(1,2))
hist(data$Age); boxplot(data$Age)

summary(data$Age)

## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 28.00 30.00 32.00 31.58 33.00 37.00 1667

freq(data$Age)

## Current
## Frequency Percent Valid Percent
## 28 12 0.1845 0.2481
## 29 532 8.1796 10.9986
## 30 814 12.5154 16.8286
## 31 979 15.0523 20.2398
## 32 985 15.1445 20.3639
## 33 944 14.5141 19.5162
## 34 454 6.9803 9.3860
## 35 96 1.4760 1.9847
## 36 17 0.2614 0.3515
## 37 4 0.0615 0.0827
## NA's 1667 25.6304
## Total 6504 100.0000 100.0000

The histogram displays the frequency of the current ages of participants. Ages range from 28 to 37, with a median of 32 and a mean of 31.58. Out of 6,504 participants, 1,667 (25.6%) are labeled as NA.

par(mfrow=c(1,2))
hist(data$H4TO98); boxplot(data$H4TO98)

summary(data$H4TO98)

## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00 97.00 97.00 76.84 97.00 98.00 1390

freq(data$H4TO98)

## Number of times used favorite drug in last 30 days
## Frequency Percent Valid Percent
## 0 563 8.65621 11.00899
## 1 152 2.33702 2.97223
## 2 126 1.93727 2.46382
## 3 93 1.42989 1.81854
## 4 53 0.81488 1.03637
## 5 42 0.64576 0.82127
## 6 48 0.73801 0.93860
## 97 4035 62.03875 78.90106
## 98 2 0.03075 0.03911
## NA's 1390 21.37146
## Total 6504 100.00000 100.00000

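The histogram shows the number of times the favorite drug was used in the last 30 days. 4,035 participants (62.0% of the sample) were coded 97, a legitimate skip, and 2 were coded 98. Because these are codes rather than counts, they dominate the histogram and pull the mean up to 76.84. Of the full sample, 8.7% reported no use (code 0). Codes 1 through 6, which represent increasing frequency of use up to almost every day, account for 2.3%, 1.9%, 1.4%, 0.8%, 0.6%, and 0.7%, respectively. The remaining 21.4% (n=1390) are NA.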
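Since 97 and 98 are placeholder codes rather than counts, one option is to set them to missing before summarizing. The sketch below rebuilds a vector from the frequencies in the table above instead of loading the real data. Treating both 97 and 98 as missing is an assumption that should be checked against the Add Health codebook.

```r
# Rebuild the observed values from the frequency table (no raw data needed)
values <- c(0, 1, 2, 3, 4, 5, 6, 97, 98)
freqs  <- c(563, 152, 126, 93, 53, 42, 48, 4035, 2)
x <- rep(values, times = freqs)

# Assumption: 97 and 98 are non-substantive codes, so recode them to NA
x_clean <- ifelse(x %in% c(97, 98), NA, x)

summary(x)        # dominated by the 97 code
summary(x_clean)  # describes only respondents who reported a frequency
table(x_clean, useNA = "ifany")
```

With the codes removed, the summary and any histogram of `x_clean` describe only the respondents who were actually asked the question and gave a frequency.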