Problem Set I
Before you begin, make sure that you have loaded the ggplot2 and MASS packages. Also install the ggthemes package with the install.packages() command, then load it with library() or require().
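Here is a minimal sketch of the setup. install.packages() only needs to run once per machine, but the library() calls must run in every new R session. The requireNamespace() guard is an optional convenience added here; it is not part of the assignment.

```r
# Install ggthemes only if it is not already installed (needs an internet connection)
if (!requireNamespace("ggthemes", quietly = TRUE)) install.packages("ggthemes")

# Load the packages used in this problem set
library(ggplot2)   # ggplot(), txhousing, diamonds
library(MASS)      # crabs
library(ggthemes)  # extra themes such as theme_solarized()
```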
- Open the txhousing data frame, which is loaded with the ggplot2 package. Give five different ways of showing how many rows are in the data frame.
dim(txhousing)
## [1] 8602 9
nrow(txhousing)
## [1] 8602
str(txhousing)
## Classes 'tbl_df', 'tbl' and 'data.frame': 8602 obs. of 9 variables:
## $ city : chr "Abilene" "Abilene" "Abilene" "Abilene" ...
## $ year : int 2000 2000 2000 2000 2000 2000 2000 2000 2000 2000 ...
## $ month : int 1 2 3 4 5 6 7 8 9 10 ...
## $ sales : num 72 98 130 98 141 156 152 131 104 101 ...
## $ volume : num 5380000 6505000 9285000 9730000 10590000 ...
## $ median : num 71400 58700 58100 68600 67300 66900 73500 75000 64500 59300 ...
## $ listings : num 701 746 784 785 794 780 742 765 771 764 ...
## $ inventory: num 6.3 6.6 6.8 6.9 6.8 6.6 6.2 6.4 6.5 6.6 ...
## $ date : num 2000 2000 2000 2000 2000 ...
View(txhousing)
# Or check the number of observations in RStudio's Environment pane
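The row count can also be obtained in code, without RStudio's viewer or Environment pane. This short sketch uses only base R functions, and both results should agree with nrow():

```r
library(ggplot2)  # provides txhousing

# The length of any single column equals the number of rows
n.len <- length(txhousing$city)

# NROW() returns the number of rows of a data frame (and the length of a vector)
n.NROW <- NROW(txhousing)

c(n.len, n.NROW)
```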
- Using ggplot, make a histogram of the listings variable. Use your ggplot skills to make your plot aesthetically pleasing.
ggplot(txhousing, aes(listings)) + geom_histogram(fill = "goldenrod") + theme_bw() +
  xlab("Listings") + ylab("Frequency")
You will see that the distribution of listings is skewed to the right. To see the distribution of the more common listing counts at the left side of the histogram, you can use xlim() to restrict the range of the x-axis. Try adding the following term to your ggplot code: + xlim(0,10000).
ggplot(txhousing, aes(listings)) + geom_histogram(fill = "goldenrod") + theme_bw() +
  xlab("Listings") + ylab("Frequency") + xlim(0, 10000)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 1917 rows containing non-finite values (stat_bin).
## Warning: Removed 2 rows containing missing values (geom_bar).
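xlim() sets scale limits, so ggplot2 drops every observation outside 0 to 10000 before it bins the data. Those rows are counted in the "Removed ... rows containing non-finite values" warning, along with any missing listings. If you only want to zoom in, coord_cartesian() narrows the visible range without discarding data. In the sketch below, the binwidth of 250 is an arbitrary choice and is not part of the assignment:

```r
library(ggplot2)

# coord_cartesian() only changes the visible window; out-of-range listings still
# contribute to the bins, so only the missing listings are dropped (with a warning).
p.zoom <- ggplot(txhousing, aes(listings)) +
  geom_histogram(fill = "goldenrod", binwidth = 250) +  # binwidth chosen arbitrarily
  coord_cartesian(xlim = c(0, 10000)) +
  theme_bw() +
  xlab("Listings") + ylab("Frequency")
p.zoom
```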
- Determine the standard deviation of the sales variable and assign it to an R object. Create another R object and assign it the value of the mean of the inventory variable.
sales.sd <- sd(txhousing$sales)  # returns NA because sales contains missing values
sales.sd <- sd(txhousing$sales, na.rm = TRUE)
mean.inv <- mean(txhousing$inventory)  # NA again, for the same reason
mean.inv <- mean(txhousing$inventory, na.rm = TRUE)
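Before you rely on na.rm = TRUE, check how many values are missing. The summaries are based on fewer observations than nrow(txhousing). A short sketch:

```r
library(ggplot2)

# Number of missing values in each column of txhousing
na.counts <- colSums(is.na(txhousing))
na.counts

# Number of non-missing observations actually used by each summary
n.sales <- sum(!is.na(txhousing$sales))
n.inv   <- sum(!is.na(txhousing$inventory))

sales.sd <- sd(txhousing$sales, na.rm = TRUE)
mean.inv <- mean(txhousing$inventory, na.rm = TRUE)
```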
- Open the crabs data frame, loaded in the MASS package. Prepare a scatterplot showing the relationship between the variables FL and CW. Use the help function to find more information about the crabs data frame. Use that information to provide more informative x- and y-labels and a figure title. (?crabs will take you to the help file on the data frame.)
Add the best-fit (least-squares) regression line to your plot. Change the colors and size of the points to improve the look of your figure. Change the overall look of your plot with a complete theme function such as theme_bw(); the theme() command itself adjusts individual elements of a theme. Loading the ggthemes package gives you many more theme choices, such as theme_solarized(). In RStudio, a pop-up list of the available themes appears as soon as you type + theme_. Try several of them and choose the theme that looks best to you.
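ggplot(crabs, aes(FL, CW)) + geom_point(size = 2.5, colour = "darkslateblue") +
  xlab("Frontal Lobe Size (mm)") + ylab("Carapace Width (mm)") +
  ggtitle("Carapace Width vs. Frontal Lobe Size in Leptograpsus Crabs") +
  theme_solarized() +
  stat_smooth(method = "lm", formula = y ~ x, colour = "forestgreen")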
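The line drawn by stat_smooth(method = "lm") is the ordinary least-squares fit of CW on FL. You can fit the same model with lm() to see its intercept and slope. This is a quick numerical check on the plot:

```r
library(MASS)  # crabs

# The same model that stat_smooth(method = "lm", formula = y ~ x) draws
crab.fit <- lm(CW ~ FL, data = crabs)
coef(crab.fit)               # intercept and slope (mm of CW per mm of FL)
summary(crab.fit)$r.squared  # share of variation in CW explained by FL
```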
- Open the diamonds data frame, loaded in the ggplot2 package. Prepare a scatterplot of price (y-variable) against carat (x-variable). Within the aes() portion of your ggplot code, map clarity to the col aesthetic and cut to the shape aesthetic. If you need help with the shape aesthetic, run ?shape; the examples at the end of the help file show how shape is used.
ggplot(diamonds,aes(carat,price,col=clarity,shape=cut)) +geom_point()
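Two points are worth knowing about this plot. First, cut and clarity are ordered factors, and ggplot2 discourages mapping an ordered factor to shape, so you may see a warning to that effect. Second, diamonds has 53,940 rows, so the points overlap heavily. The sketch below inspects the factors and plots a random subsample with semi-transparent points. The sample size of 5000, the seed, and alpha = 0.5 are arbitrary choices:

```r
library(ggplot2)

# Both variables are ordered factors; their levels set the legend order
is.ordered(diamonds$cut)
levels(diamonds$cut)
levels(diamonds$clarity)

# Plot a random subsample to reduce overplotting (size and seed are arbitrary)
set.seed(42)
diamonds.small <- diamonds[sample(nrow(diamonds), 5000), ]

p.small <- ggplot(diamonds.small, aes(carat, price, col = clarity, shape = cut)) +
  geom_point(alpha = 0.5)
p.small
```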
- Use facet_wrap() to facet your diamonds figure by color. We used facet_grid() earlier. Use ?facet_wrap to find information on how to use facet_wrap(). Again, looking at the examples at the end of the help document will be very useful.
ggplot(diamonds,aes(carat,price,col=clarity,shape=cut)) +geom_point() +
facet_wrap(~color)
You can specify the number of rows (nrow) or columns (ncol) in facet_wrap().
ggplot(diamonds,aes(carat,price,col=clarity,shape=cut)) +geom_point() +
facet_wrap(~color,nrow=2)
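facet_grid() lays panels out along rows and/or columns defined by variables. For comparison, the sketch below facets the same figure into a single row with facet_grid(). It also shows facet_wrap() controlled by ncol instead of nrow; the ncol value of 4 is arbitrary:

```r
library(ggplot2)

base <- ggplot(diamonds, aes(carat, price, col = clarity, shape = cut)) +
  geom_point()

# facet_grid(): one panel per color level, all in a single row
p.grid <- base + facet_grid(. ~ color)

# facet_wrap(): wraps the 7 color panels into a grid with 4 columns
p.wrap <- base + facet_wrap(~ color, ncol = 4)
```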