Math 130 data analysis assignment

Nick Balfour

September 24, 2017

##
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
##
## filter, lag

## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union

Summary Statistics for Income and Sex

SEX: 1 = Male, 2 = Female

# Keep only the income and sex columns, then summarise income for males (SEX == 1)
SEX_INCOME <- select(depress, INCOME, SEX)
MALE_INCOME <- filter(SEX_INCOME, SEX == 1)
summary(MALE_INCOME)

## INCOME SEX
## Min. : 2.00 Min. :1
## 1st Qu.:11.00 1st Qu.:1
## Median :23.00 Median :1
## Mean :24.11 Mean :1
## 3rd Qu.:35.00 3rd Qu.:1
## Max. :65.00 Max. :1

# Same subset and summary for females (SEX == 2)
SEX_INCOME <- select(depress, INCOME, SEX)
FEMALE_INCOME <- filter(SEX_INCOME, SEX == 2)
summary(FEMALE_INCOME)

## INCOME SEX
## Min. : 2.00 Min. :2
## 1st Qu.: 8.00 1st Qu.:2
## Median :13.00 Median :2
## Mean :18.43 Mean :2
## 3rd Qu.:23.00 3rd Qu.:2
## Max. :65.00 Max. :2

summary(depress$INCOME)

## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.00 9.00 15.00 20.57 28.00 65.00

The tables above show separate summary statistics for the income of males and females in this dataset, coded as "1" and "2" respectively. In addition, the last table gives summary statistics for the income of males and females combined. The tables show that males had a higher first quartile, median, mean, and third quartile than females. However, the minimum (2) and maximum (65) income values were the same for both sexes.
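The per-sex summaries can also be collected into a single table. The following is a minimal sketch, assuming dplyr is loaded; the small `toy` data frame is invented for illustration and only stands in for `depress`, using the same column names and coding (1 = male, 2 = female).

```r
library(dplyr)

# Hypothetical stand-in for the depress data (values are invented)
toy <- data.frame(
  INCOME = c(23, 35, 11, 13, 8, 23, 65, 2),
  SEX    = c(1, 1, 1, 2, 2, 2, 1, 2)
)

# One row per sex with the same statistics that summary() reports
income_by_sex <- toy %>%
  group_by(SEX) %>%
  summarise(
    n      = n(),
    min    = min(INCOME),
    q1     = unname(quantile(INCOME, 0.25)),
    median = median(INCOME),
    mean   = mean(INCOME),
    q3     = unname(quantile(INCOME, 0.75)),
    max    = max(INCOME)
  )

print(income_by_sex)
```

Putting both groups in one table makes the quartiles easier to compare side by side than two separate `summary()` calls.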

Bar plot of Male vs. Female Frequency

ggplot(depress, aes(x=SEX)) + geom_bar()

summary(depress$SEX)

## Male Female
## 111 183

This bar plot shows the frequency of males vs. females in this data set. Sex is listed on the x-axis and the count of each group is on the y-axis. From the bar plot and summary table it can clearly be observed that there are more females (183) than males (111).
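The counts reported by `summary(depress$SEX)` can also be expressed as shares of the sample. This is a small sketch that re-enters the two counts by hand rather than reading them from `depress`; the variable names are illustrative.

```r
# Counts reported by summary(depress$SEX) above
sex_counts <- c(Male = 111, Female = 183)

# Share of each sex in the sample, as a percentage
sex_pct <- round(100 * sex_counts / sum(sex_counts), 1)
print(sex_pct)
```

With 294 individuals in total, this works out to roughly 38% male and 62% female.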

Bar plot of Income Frequency

ggplot(depress, aes(x=INCOME)) + geom_bar()

In this bar plot the income frequency within the depression data set is shown. Income is shown on the x-axis and is measured in thousands of dollars, while the y-axis shows the count of individuals with the corresponding income. While the range of income stretches from $2,000 to $65,000, there are large gaps that begin to occur before $30,000 and get larger as income increases. This indicates that not as many people made more than $40,000 and that the sample size was not large enough to fill in these gaps.
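Because INCOME takes many distinct values, `geom_bar()` draws one bar per value, which contributes to the gaps. One option, not used in the original analysis, is to group incomes into bins with `geom_histogram()`. The sketch below assumes ggplot2 is installed; the `toy` values are invented and the bin width of 5 is an arbitrary choice for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for depress$INCOME (values are invented)
toy <- data.frame(INCOME = c(2, 4, 8, 9, 11, 13, 15, 15, 19, 23, 28, 35, 45, 65))

# Group incomes into bins 5 units wide, with bin edges aligned at 0
income_hist <- ggplot(toy, aes(x = INCOME)) +
  geom_histogram(binwidth = 5, boundary = 0) +
  labs(x = "INCOME", y = "Count")

print(income_hist)
```

A wider bin hides small gaps at the cost of detail, so it is worth trying a few values of `binwidth` before settling on one.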

Bar plot of Income among Males vs. Females

ggplot(depress, aes(x=INCOME, fill=SEX)) + geom_bar(position = "dodge")

This bar plot shows the income distributions of males and females alongside each other. From this figure you can observe that the peak in the count of female incomes occurs at a lower income than that of males ($7,000-$20,000 vs. $20,000-$35,000) and that there were more females in this dataset. In addition, both males and females are represented at the highest and lowest incomes, but the majority of males appear to fall in a higher range of income than females.
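The `fill` aesthetic needs a discrete variable to split the bars by group. The `filter()` calls earlier treat SEX as the numeric codes 1 and 2, while the `summary(depress$SEX)` output shows Male/Female labels, so a conversion to a factor presumably happened somewhere along the way. The following sketch shows one way to do such a conversion, assuming SEX starts out as numeric codes; the `toy` data frame is invented for illustration.

```r
# Hypothetical stand-in for depress with SEX stored as numeric codes
toy <- data.frame(
  INCOME = c(23, 35, 11, 13, 8, 23),
  SEX    = c(1, 1, 2, 2, 2, 1)
)

# Recode 1/2 as a labelled factor so plots and summaries show Male/Female
toy$SEX <- factor(toy$SEX, levels = c(1, 2), labels = c("Male", "Female"))

# summary() on a factor returns the count in each level
summary(toy$SEX)
```

Once SEX is a factor, `filter(SEX == 1)` no longer matches any rows; the comparison would need to use the label instead, for example `SEX == "Male"`.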

Density plot of Income among Males vs. Females

ggplot(depress, aes(x=INCOME, fill=SEX)) + geom_density() + facet_wrap(~SEX)

This figure depicts the distribution of income among males vs. females as density plots, with one panel per sex. From these figures the peaks of male and female income can clearly be observed. Compared with males, a greater proportion of the females in this data set earn a lower income.

Results Description

The data description is clear and concise; it is clear to me what data is being analyzed and where it was obtained.

The data for income and sex came from the depression dataset. The distribution of income shown in the tables and figures above demonstrates that males have a higher average income than females. In addition, more than half of the individuals sampled earn less than $20,000 per year (the overall median is $15,000), and the number of people decreases as income increases. This could indicate that the people sampled in this data set come from a similar socioeconomic background, a pattern that might change with factors such as age and geographical location if the sample size were increased.