Quick introduction to descriptive statistics and graphs in
R Commander
Written by: Robin Beaumont e-mail:
Date last updated Monday, 17 September 2012
Version: 1
Contents
Boxplots
Percentages for each category/factor level
Summaries for an interval/ratio variable divided across categories (factor levels)
Histograms
Density plots
Density plots for subgroups defined by factor levels
Boxplots
From within R, load R Commander by typing the following command:
library(Rcmdr)
First you need some data. For this example I'll use a sample dataset, loaded directly from my website. You can do this by selecting the R Commander menu option:
Data -> Import data -> from text file, clipboard, or URL
In the dialog box I have given the resulting dataframe the name mydataframe, indicated that the data come from a URL (i.e. the web), and specified that the columns are separated by tab characters.
Clicking the OK button brings up the internet URL box; type the following into it to obtain my sample data:
This dataset has 7 variables, of which we are only interested in two here: time (the outcome variable) and dosage, a grouping variable indicating which group each result ('time') belongs to.
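The same kind of boxplot can also be typed as a command. The following is only a minimal sketch: the sample data are not reproduced here, so it builds a small simulated dataframe as a stand-in. The values, the seed and the three dosage labels (low, medium, high) are assumptions made for illustration; only the column names time and dosage come from the dataset described above. It uses the base R boxplot() function, which may differ in detail from the plot the R Commander menu draws.

```r
# Simulated stand-in for the sample data (values and dosage labels are invented)
set.seed(1)
mydataframe <- data.frame(
  time   = c(rnorm(10, mean = 20, sd = 3),
             rnorm(10, mean = 25, sd = 3),
             rnorm(10, mean = 30, sd = 3)),
  dosage = factor(rep(c("low", "medium", "high"), each = 10),
                  levels = c("low", "medium", "high"))
)

# One box per dosage group, with the outcome (time) on the y axis
bp <- boxplot(time ~ dosage, data = mydataframe,
              xlab = "dosage", ylab = "time")
```

The formula time ~ dosage reads as "time split by dosage"; the same outcome ~ group pattern is used by many other R functions.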
Percentages for each category/factor level
Using the dataset from the boxplots example, we can take a single variable and obtain the count and percentage for each category in R Commander.
Suppose we want to know the number and percentage of cases in each group, that is, within each category (level) of the dosage variable.
The dosage variable is a grouping variable (nominal data), and each of its values is said to represent a factor level.
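For those who prefer typing commands, counts and percentages per factor level can be sketched with the base R functions table() and prop.table(). The dosage values below are invented stand-ins for mydataframe$dosage, so the numbers themselves mean nothing.

```r
# Hypothetical grouping variable standing in for mydataframe$dosage
dosage <- factor(c("low", "low", "medium", "high",
                   "high", "high", "medium", "low"))

# Number of cases in each factor level
counts <- table(dosage)

# The same table expressed as percentages of the total
percentages <- round(100 * prop.table(counts), 2)

counts
percentages
```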
Summaries for an interval/ratio variable divided across categories (factor levels)
We can obtain simple descriptive statistics using the menu option shown opposite. We can also find these for subgroups by using the Summarize by groups option.
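As a rough command-line equivalent, the sketch below summarises time overall and then within each dosage group using base R's summary() and tapply(). The dataframe is simulated (values, seed and dosage labels are assumptions), and the layout of the result will not match the R Commander output exactly.

```r
# Simulated stand-in for the sample data (values and dosage labels are invented)
set.seed(2)
mydataframe <- data.frame(
  time   = rnorm(30, mean = 25, sd = 4),
  dosage = factor(rep(c("low", "medium", "high"), each = 10))
)

# Overall summary of the interval/ratio variable
summary(mydataframe$time)

# The same statistics within each factor level of dosage
group_n    <- tapply(mydataframe$time, mydataframe$dosage, length)
group_mean <- tapply(mydataframe$time, mydataframe$dosage, mean)
group_sd   <- tapply(mydataframe$time, mydataframe$dosage, sd)

# Collect the per-group results into one small table
group_summary <- data.frame(
  dosage = names(group_mean),
  n      = as.vector(group_n),
  mean   = as.vector(group_mean),
  sd     = as.vector(group_sd)
)
group_summary
```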
Histograms
Suppose we want to see the distribution of ages in our dataset. You have three options; usually you would show only one of them in a report.
Frequency counts:
Percentages:
Note that the x axis is labelled in the dataframe$column format, i.e. mydataframe$age.
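The first two histograms can be approximated from the command line with base R's hist(). This is a sketch under assumptions: the ages are simulated, and the percentage version is produced by rescaling the counts by hand, which is not necessarily how R Commander does it internally.

```r
# Simulated ages standing in for mydataframe$age (values are invented)
set.seed(3)
mydataframe <- data.frame(age = round(rnorm(80, mean = 40, sd = 12)))

# Frequency counts: the default for hist() with equal-width bins
h <- hist(mydataframe$age, main = "Frequency counts",
          xlab = "mydataframe$age")

# Percentages: copy the histogram object and rescale its counts to sum to 100
h_pct <- h
h_pct$counts <- 100 * h$counts / sum(h$counts)
plot(h_pct, main = "Percentages",
     xlab = "mydataframe$age", ylab = "percent")
```

Both plots have the same shape; only the scale of the y axis changes, which is why one of them is normally enough in a report.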
Density plots
A density plot is a smoothed version of a histogram, and it's very useful. Unfortunately there is no R Commander menu option to produce one, so you need to type the command:
plot(density(dataframename$columnname))
So for our dataframe, which we have called mydataframe, and the column called age within it, we type:
plot(density(mydataframe$age))
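One practical point: density() stops with an error if the column contains missing values unless you add na.rm = TRUE. The sketch below shows this on simulated ages (the values and the single NA are invented) and also adds a rug of the raw observations along the x axis, which is an optional extra rather than part of the command above.

```r
# Simulated ages with one missing value (values are invented)
set.seed(4)
mydataframe <- data.frame(age = c(round(rnorm(99, mean = 40, sd = 12)), NA))

# na.rm = TRUE drops the missing value before the curve is estimated
d <- density(mydataframe$age, na.rm = TRUE)

plot(d, main = "Density of age", xlab = "age (years)")

# Mark each non-missing observation with a tick on the x axis
rug(na.omit(mydataframe$age))

# The area under a density curve is approximately 1
area <- sum(d$y) * diff(d$x)[1]
area
```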
Density plots for subgroups defined by factor levels
There are many ways to do this, and the easiest is to use the lattice package, introduced later in the course. For now, considering just the gender variable, which has only 2 levels, we can do the following:
First copy only the male cases into a dataframe called maledata:
maledata <- mydataframe[mydataframe$gender == "Male",]
Now copy only the female cases into a dataframe called femaledata:
femaledata <- mydataframe[mydataframe$gender == "Female",]
Now create the density plot, starting with the males:
plot(density(maledata$age), ylim = c(0, 0.07), main = "densityplots for males/females[dotted] for age", xlab= "age (years)" )
Now we need to superimpose the female density line:
lines(density(femaledata$age), lty = 2)
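For comparison, here is a minimal sketch of the lattice route mentioned at the start of this section. It assumes the lattice package is available (it is normally installed with R) and uses simulated ages and genders, so the data values are invented; only the column names age and gender follow the text.

```r
library(lattice)

# Simulated stand-in data (values are invented)
set.seed(5)
mydataframe <- data.frame(
  age    = c(rnorm(50, mean = 35, sd = 8), rnorm(50, mean = 45, sd = 10)),
  gender = factor(rep(c("Male", "Female"), each = 50))
)

# One density curve per level of gender, drawn in the same panel;
# auto.key adds a legend and plot.points = FALSE hides the raw data points
p <- densityplot(~ age, groups = gender, data = mydataframe,
                 plot.points = FALSE, auto.key = TRUE,
                 xlab = "age (years)")
print(p)
```

With this approach there is no need to split the dataframe by hand or to choose the y axis limits yourself, and it works unchanged for factors with more than 2 levels.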