Exp no: 1

Installing R on a Windows PC:

To install R on your Windows computer, follow these steps:

  1. Go to
  2. Under “Download and Install R”, click on the “Windows” link.
  3. Under “Subdirectories”, click on the “base” link.
  4. On the next page, you should see a link saying something like “Download R 2.10.1 for Windows” (or R X.X.X, where X.X.X gives the version of R, e.g. R 2.11.1). Click on this link.
  5. You may be asked if you want to save or run a file “R-2.10.1-win32.exe”. Choose “Save” and save the file on the Desktop. Then double-click on the icon for the file to run it.
  6. You will be asked what language to install it in - choose English.
  7. The R Setup Wizard will appear in a window. Click “Next” at the bottom of the R Setup wizard window.
  8. The next page says “Information” at the top. Click “Next” again.
  9. The next page says “Information” at the top. Click “Next” again.
  10. The next page says “Select Destination Location” at the top. By default, it will suggest to install R in “C:\Program Files” on your computer.
  11. Click “Next” at the bottom of the R Setup wizard window.
  12. The next page says “Select components” at the top. Click “Next” again.
  13. The next page says “Startup options” at the top. Click “Next” again.
  14. The next page says “Select start menu folder” at the top. Click “Next” again.
  15. The next page says “Select additional tasks” at the top. Click “Next” again.
  16. R should now be installed. This will take about a minute. When R has finished, you will see “Completing the R for Windows Setup Wizard” appear. Click “Finish”.
  17. To start R, you can either follow step 18, or 19:
  18. Check if there is an “R” icon on the desktop of the computer that you are using. If so, double-click on the “R” icon to start R. If you cannot find an “R” icon, try step 19 instead.
  19. Click on the “Start” button at the bottom left of your computer screen, and then choose “All programs”, and start R by selecting “R” (or R X.X.X, where X.X.X gives the version of R, e.g. R 2.10.0) from the menu of programs.
  20. The R console (a rectangular window) should pop up.

Commands and Syntax in R

ls() #list the variables in the workspace

rm(x) #remove x from the workspace

rm(list=ls()) #remove all the variables from the workspace

attach(mat) #make the names of the variables in the matrix or data frame available in the workspace

detach(mat) #releases the names (remember to do this each time you attach something)

with(mat, .... ) #a preferred alternative to attach ... detach

new <- old[,-n] #drop the nth column

round(x,n) #rounds the values of x to n decimal places

ceiling(x) #vector of the smallest integers >= x

floor(x) #vector of the largest integers <= x

as.integer(x) #truncates real x to integers (compare to round(x,0))

as.integer(x > cutpoint) #vector of 0 if x is not greater than cutpoint, 1 if greater than cutpoint

factor(ifelse(a < cutpoint,"Neg","Pos")) #is another way to dichotomize and to make a factor for analysis

transform(data.df,variable names = some operation) #can be part of a set up for a data set

x%in%y #tests each element of x for membership in y

y%in%x #tests each element of y for membership in x

all(x%in%y) #true if every element of x is in y (x is a subset of y, not necessarily a proper one)

all(x) # for a vector of logical values, are they all true?

any(x) #for a vector of logical values, is at least one true?

new <- old[-n,] #drop the nth row

new <- old[,-c(i,j)] #drop the ith and jth column

new <- subset(old,logical) #select those cases that meet the logical condition

complete <- subset(data.df,complete.cases(data.df)) #find those cases with no missing values

new <- old[n1:n2,n3:n4] #select rows n1 through n2 of variables n3 through n4
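The commands above are easier to remember once you have seen them run. The following is a minimal sketch on made-up data: the vector x, the cutpoint of 2 and the data frame old are invented for illustration and are not part of the original notes.

```r
# Hypothetical data, invented for illustration only
x <- c(1.25, 2.5, -3.75)

ceiling(x)      # smallest integers >= x:  2  3 -3
floor(x)        # largest integers <= x:   1  2 -4
as.integer(x)   # truncates toward zero:   1  2 -3

cutpoint <- 2
above <- as.integer(x > cutpoint)                     # 0 1 0
group <- factor(ifelse(x < cutpoint, "Neg", "Pos"))   # Neg Pos Neg

c(1, 5) %in% c(1, 2, 3)        # TRUE FALSE
all(c(1, 2) %in% c(1, 2, 3))   # TRUE

old <- data.frame(id = 1:4,
                  score = c(10, NA, 30, 40),
                  flag = c(TRUE, FALSE, TRUE, TRUE))
no_score <- old[, -2]                         # drop the 2nd column
no_first <- old[-1, ]                         # drop the 1st row
flagged  <- subset(old, flag)                 # rows where flag is TRUE
complete <- subset(old, complete.cases(old))  # rows with no missing values
```

Note that a negative index drops a row or column, while subset() keeps the rows for which the condition is TRUE.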

R Packages

  • When you download R from the Comprehensive R Archive Network (CRAN), you get the "base" R system
  • The base R system comes with basic functionality; implements the R language
  • One reason R is so useful is the large collection of packages that extend the basic functionality of R
  • R packages are developed and published by the larger R community

Obtaining R Packages

  • The primary location for obtaining R packages is CRAN
  • For biological applications, many packages are available from the Bioconductor Project
  • You can obtain information about the available packages on CRAN with the available.packages() function

a <- available.packages()

head(rownames(a), 3) ## Show the names of the first few packages

## [1] "A3" "abc" "abcdeFBA"

  • There are approximately 5200 packages on CRAN covering a wide range of topics
  • A list of some topics is available through the Task Views link, which groups together many R packages related to a given topic

Installing an R Package

  • Packages can be installed with the install.packages() function in R
  • To install a single package, pass the name of the package to the install.packages() function as the first argument
  • The following code installs the slidify package from CRAN

install.packages("slidify")

  • This command downloads the slidify package from CRAN and installs it on your computer
  • Any packages on which this package depends will also be downloaded and installed
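In practice you often want to install a package only if it is not already present. The sketch below shows one way to do that. The helper name install_if_missing is hypothetical (it is not part of base R); it relies only on the standard functions requireNamespace() and install.packages().

```r
# install_if_missing() is a hypothetical helper, not part of base R.
# It installs a package only when it is not already available.
install_if_missing <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(FALSE))   # already installed, nothing to do
  }
  install.packages(pkg)        # downloads the package and its dependencies
  invisible(TRUE)
}

# "stats" ships with R, so this call installs nothing
install_if_missing("stats")
```

Once a package is installed, it is loaded into the current session with library(), for example library(stats).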

R Help

  • The help() function and the ? operator in R provide access to the documentation pages for R functions, data sets, and other objects, both for packages in the standard R distribution and for contributed packages. To access documentation for the standard lm (linear model) function, for example, enter the command help(lm) or help("lm"), or ?lm or ?"lm" (i.e., the quotes are optional).
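A short sketch of the help tools in use follows. It uses only functions from the standard distribution; the search pattern passed to apropos() is just an example.

```r
# help() returns a help object; the page is displayed when the object is printed,
# which happens automatically when you type help("lm") or ?lm at the console
h <- help("lm")
class(h)                                 # "help_files_with_topic"

# apropos() lists visible objects whose names match a pattern,
# which is useful when you only remember part of a function name
matches <- apropos("^read\\.csv")
matches
```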

Workspace in R

The workspace refers to all the variables and functions (collectively called objects) that you create during an R session, as well as any packages that are loaded.

  • Often, you want to remind yourself of all the variables you’ve created in the workspace. To do this, use the ls() function to list the objects in the workspace. In the console, type the following:

ls()

## [1] "h" "hw" "x" "y" "yourname" "z"
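The sketch below creates two objects, lists the workspace and then removes one of them. The object names follow the listing above, but the values assigned to them are made up for illustration.

```r
# Create two objects (values are invented for illustration)
h <- 1.75
yourname <- "Ann"

ls()                  # the listing now includes "h" and "yourname"

rm(h)                 # remove h from the workspace
"h" %in% ls()         # FALSE: h is gone
"yourname" %in% ls()  # TRUE: yourname is still there
```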

Exp no: 2

Data Structures

To make the best of the R language, you'll need a strong understanding of the basic data types and data structures and how to operate on those.

It is very important to understand these, because they are the objects you will manipulate on a day-to-day basis in R. Dealing with object conversions is one of the most common sources of frustration for beginners.

To understand computations in R, two slogans are helpful:

  • Everything that exists is an object.
  • Everything that happens is a function call.

John Chambers

R has six atomic classes. The five listed below are covered here; the sixth, the raw class, is not discussed in this workshop.

  • character
  • numeric (real or decimal)
  • integer
  • logical
  • complex
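You can check the class of any value with class(). The sketch below does this for one example value of each of the five classes listed above; the values themselves are arbitrary.

```r
# One arbitrary example value per atomic class
examples <- list("a", 1.5, 2L, TRUE, 1 + 2i)

classes <- sapply(examples, class)
classes
# "character" "numeric" "integer" "logical" "complex"

typeof(1.5)   # "double": the storage type behind class "numeric"
```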

R has many data structures. These include

  • atomic vector
  • list
  • matrix
  • data frame
  • factors
  • tables

Vectors

A vector is the most common and basic data structure in R and is pretty much the workhorse of R. Technically, vectors can be one of two types:

  • atomic vectors
  • lists

although the term "vector" most commonly refers to the atomic type, not lists.

Atomic Vectors

An atomic vector holds elements of a single type, most commonly character, logical, integer or numeric.

You can create an empty vector with vector(). By default the mode is logical; you can be more explicit, as shown in the examples below. It is more common to use direct constructors such as character(), numeric(), etc.

x <- vector()

# with a length and type

vector("character", length = 10)

## [1] "" "" "" "" "" "" "" "" "" ""

character(5) ## character vector of length 5

## [1] "" "" "" "" ""

numeric(5)

## [1] 0 0 0 0 0

logical(5)

## [1] FALSE FALSE FALSE FALSE FALSE

Various examples:

x <- c(1, 2, 3)

x

## [1] 1 2 3

length(x)

## [1] 3

x is a numeric vector. These are the most common kind. They are numeric objects and are treated as double precision real numbers. To explicitly create integers, add an L at the end.

x1 <- c(1L, 2L, 3L)
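The difference between the two vectors is only in how the numbers are stored, which the following sketch makes visible with typeof().

```r
x  <- c(1, 2, 3)      # stored as double precision numbers
x1 <- c(1L, 2L, 3L)   # stored as integers

typeof(x)          # "double"
typeof(x1)         # "integer"

x == x1            # TRUE TRUE TRUE: the values are equal
identical(x, x1)   # FALSE: the storage types differ
```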

You can also have logical vectors.

y <- c(TRUE, TRUE, FALSE, FALSE)

Finally you can have character vectors:

z <- c("Alec", "Dan", "Rob", "Karthik")

Examine your vector

typeof(z)

## [1] "character"

length(z)

## [1] 4

class(z)

## [1] "character"

str(z)

## chr [1:4] "Alec" "Dan" "Rob" "Karthik"

Question: Do you see a property that's common to all these vectors above?

Add elements

z <- c(z, "Annette")

z

## [1] "Alec" "Dan" "Rob" "Karthik" "Annette"

More examples of vectors

x <- c(0.5, 0.7)

x <- c(TRUE, FALSE)

x <- c("a", "b", "c", "d", "e")

x <- 9:100

x <- c(1 + (0+0i), 2 + (0+4i))
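Every atomic vector holds a single type, so when you mix types inside c(), R silently converts the elements to the most general type present. This conversion is a common source of confusion; the sketch below uses arbitrary values to show it.

```r
typeof(c(1, "a"))     # "character": the number becomes the string "1"
typeof(c(TRUE, 2L))   # "integer":   TRUE becomes 1L
typeof(c(1L, 2.5))    # "double":    1L becomes 1

# Explicit conversion; strings that are not numbers become NA (with a warning)
converted <- suppressWarnings(as.numeric(c("1", "2", "x")))
converted             # 1 2 NA
```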

Lists

A list is an R structure that may contain objects of any other type, including other lists. Lots of the modeling functions (like t.test() for the t test or lm() for linear models) produce lists as their return values, but you can also construct one yourself:

mylist <- list (a = 1:5, b = "Hi There", c = function(x) x * sin(x))

Now the list "mylist" contains three things, named "a," "b," and "c." Their lengths are different: a has length 5, b has length 1, and c is a function, so it doesn't really have a length. (Technically, it has length 1, just because somebody decided that the "length" of a function should be one.) To extract an item from a list, you can use the single brackets, but that will give you back a list. Thus

mylist[1]

$a

[1] 1 2 3 4 5

mylist[1] + 1 # Can we do math on that?

Error in mylist[1] + 1 : non-numeric argument to binary operator

We can't do math on a list. Generally you will want single brackets when you're extracting pieces of a list to make into another list.

If we want one of the items in its original form, we can extract it with double square brackets, or by using the dollar sign and the name. (The items in the list don't need to have names, but in this case they do have names, since we supplied them at the time we created the list.)

mylist[[1]]

[1] 1 2 3 4 5

mylist$a

[1] 1 2 3 4 5

mylist$a + 1 # Can we do math on that?

[1] 2 3 4 5 6 # Answer: yes.

mylist$a[2] # What's the second element of the item named "a"?

[1] 2 # Answer: 2.

mylist$a[-2] # Give me everything from "a" except the second element

[1] 1 3 4 5 # Remember the negative subscript?

Usually a list will consist of vectors. As always, every element of a vector must be of the same type (numeric, character, logical, complex...). A list is a natural way, then, to store items of different types and lengths. For example, when you fit a statistical model you might get back a vector of (numeric) coefficients, a logical vector saying which terms were included in the model, a matrix of data, a vector of messages from the fitting routines, and so on. A list is the natural way to represent this sort of thing in R.

Adding items to, and deleting items from, a list

One useful thing about lists is how easily items can be added or deleted. To delete an item from a list, assign NULL to that item's name or number. Furthermore you can add a new element to the list simply by assigning something to a new name. (These will be added at the end of the list.) For example:

mylist # Here's what mylist looks like right now

$a

[1] 1 2 3 4 5

$b

[1] "Hi There"

$c

function(x) x * sin(x)

mylist$d <- "New item" # Add a new item and print out the list

# We could have assigned to mylist[[4]]; then that new element would not have had a name

mylist

$a

[1] 1 2 3 4 5

$b

[1] "Hi There"

$c

function(x) x * sin(x)

$d

[1] "New item"
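Deleting works the same way in reverse. The sketch below rebuilds the list from this section, adds the new item, and then deletes items by name and by position by assigning NULL.

```r
mylist <- list(a = 1:5, b = "Hi There", c = function(x) x * sin(x))

mylist$d <- "New item"   # add an item at the end
names(mylist)            # "a" "b" "c" "d"

mylist$b <- NULL         # delete an item by name
names(mylist)            # "a" "c" "d"

mylist[[1]] <- NULL      # delete an item by position
names(mylist)            # "c" "d"
```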

Data Frames

A data frame is a list of variables of the same number of rows with unique row names, given class "data.frame". If no variables are included, the row names determine the number of rows.

The column names should be non-empty, and attempts to use empty names will have unsupported results. Duplicate column names are allowed, but you need to use check.names = FALSE for data.frame to generate such a data frame. However, not all operations on data frames will preserve duplicated column names: for example matrix-like subsetting will force column names in the result to be unique.
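The following sketch builds a small data frame and then demonstrates the behaviour of duplicate column names described above. The names and ages are made up for illustration.

```r
# A small data frame with invented values
people <- data.frame(name = c("Alec", "Dan", "Rob"),
                     age  = c(31, 28, 45))
class(people)    # "data.frame"
nrow(people)     # 3
people$age       # each column is a vector: 31 28 45

# By default data.frame() repairs duplicate column names
names(data.frame(x = 1:2, x = 3:4))        # "x" "x.1"

# check.names = FALSE keeps the duplicates
dup <- data.frame(x = 1:2, x = 3:4, check.names = FALSE)
names(dup)                                 # "x" "x"

# Matrix-like subsetting makes the names unique again
anyDuplicated(names(dup[, 1:2]))           # 0, meaning no duplicates remain
```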

Importing and exporting data

Data can be stored in a large variety of formats. Each statistical package has its own format for data (xls for Microsoft Excel, dta for Stata, sas7bdat for SAS, ...). R can read almost all file formats. We present a method for each kind of file. If none of the following methods work, you can use dedicated data-conversion software such as the free software OpenRefine or the commercial software Stat Transfer[1]. In any case, most statistical software can export data in CSV (comma separated values) format, and all of them can read CSV data. This is often the best solution to make data available to everyone.
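As a minimal sketch of the CSV route, the code below writes a small data frame to a CSV file and reads it back. The data frame scores and its values are invented for illustration, and a temporary file is used so that nothing is left behind in your working directory.

```r
# Invented example data
scores <- data.frame(id = 1:3, score = c(9.5, 7, 8.25))

# Write to a temporary CSV file, without the row names column
csv_path <- tempfile(fileext = ".csv")
write.csv(scores, csv_path, row.names = FALSE)

# Read the file back into a new data frame
imported <- read.csv(csv_path)

all.equal(scores, imported)   # TRUE: the round trip preserved the data
```

To keep the file, replace csv_path with a file name of your choice, such as "scores.csv".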

Exp no: 3

A line chart is a graph that connects a series of points by drawing line segments between them. The points are ordered by one of their coordinates (usually the x-coordinate). Line charts are usually used to identify trends in data.

The plot() function in R is used to create the line graph.

Syntax

The basic syntax to create a line chart in R is:

plot(v,type,col,xlab,ylab,main)

Following is the description of the parameters used:

  • v is a vector containing the numeric values.
  • type takes the value "p" to draw only the points, "l" to draw only the lines and "o" to draw both points and lines.
  • xlab is the label for the x axis.
  • ylab is the label for the y axis.
  • main is the title of the chart.
  • col is used to give colors to both the points and lines.

Example

A simple line chart is created using the input vector and the type parameter set to "o". The script below creates a line chart and saves it in the current R working directory.

# Create the data for the chart.

v <- c(7,12,28,3,41)

# Open a PNG graphics device and give the chart file a name.

png(file = "line_chart.png")

# Plot the line chart.

plot(v,type="o")

# Close the device to save the file.

dev.off()

When we execute the above code, it saves the chart as line_chart.png in the working directory.

Line Chart Title, Color and Labels

The features of the line chart can be expanded by using additional parameters. We add color to the points and lines, give a title to the chart and add labels to the axes.

Example

# Create the data for the chart.

v <-c(7,12,28,3,41)

# Open a PNG graphics device and give the chart file a name.

png(file = "line_chart_label_colored.png")

# Plot the line chart.

plot(v,type="o",col="red",xlab="Month",ylab="Rain fall",

main="Rain fall chart")

# Close the device to save the file.

dev.off()

When we execute the above code, it saves the chart as line_chart_label_colored.png in the working directory.

Multiple Lines in a Line Chart

More than one line can be drawn on the same chart by using the lines() function.

After the first line is plotted, the lines() function can use an additional vector as input to draw the second line in the chart:

# Create the data for the chart.

v <- c(7,12,28,3,41)

t <- c(14,7,6,19,3)

# Open a PNG graphics device and give the chart file a name.

png(file = "line_chart_2_lines.png")

# Plot the first line, then add the second line to the same chart.

plot(v,type="o",col="red",xlab="Month",ylab="Rain fall",

main="Rain fall chart")

lines(t, type ="o",col="blue")

# Close the device to save the file.

dev.off()

When we execute the above code, it saves the chart as line_chart_2_lines.png in the working directory.

You can also create a vector of n contiguous colors using the functions rainbow(n), heat.colors(n), terrain.colors(n), topo.colors(n), and cm.colors(n).

colors() returns all available color names.
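The sketch below shows what these color functions return. The variable name line_cols is made up for illustration; a vector like this can be indexed and passed to the col parameter of plot() or lines().

```r
# Three contiguous colors, returned as a character vector of hex color codes
line_cols <- rainbow(3)
length(line_cols)       # 3
is.character(line_cols) # TRUE

# The other palette functions are called the same way
heat_cols <- heat.colors(5)
length(heat_cols)       # 5

# colors() returns the built-in color names
"red" %in% colors()     # TRUE
```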

Fonts

You can easily set font size and style, but font family is a bit more complicated.

option / description
font / integer specifying the font to use for text: 1=plain, 2=bold, 3=italic, 4=bold italic, 5=symbol
font.axis / font for axis annotation
font.lab / font for x and y labels
font.main / font for titles
font.sub / font for subtitles
ps / font point size (roughly 1/72 inch); text size = ps*cex
family / font family for drawing text. Standard values are "serif", "sans", "mono", "symbol". Mapping is device dependent.

In Windows, mono is mapped to "TT Courier New", serif is mapped to "TT Times New Roman", sans is mapped to "TT Arial", and symbol is mapped to "TT Symbol" (TT = TrueType). You can add your own mappings.

# Type family examples - creating new mappings
plot(1:10,1:10,type="n")
windowsFonts(
A=windowsFont("Arial Black"),
B=windowsFont("Bookman Old Style"),
C=windowsFont("Comic Sans MS"),
D=windowsFont("Symbol")
)
text(3,3,"Hello World Default")
text(4,4,family="A","Hello World from Arial Black")
text(5,5,family="B","Hello World from Bookman Old Style")
text(6,6,family="C","Hello World from Comic Sans MS")
text(7,7,family="D", "Hello World from Symbol")


Margins and Graph Size

You can control the margin size using the following parameters.

option / description
mar / numerical vector indicating margin size c(bottom, left, top, right) in lines. default = c(5, 4, 4, 2) + 0.1
mai / numerical vector indicating margin size c(bottom, left, top, right) in inches
pin / plot dimensions (width, height) in inches
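The sketch below sets the margins with par() before plotting and restores the previous settings afterwards. It is only an illustration: the margin values are arbitrary, the name out_file is made up, and a temporary PDF file is used as the output device so that the example does not depend on a screen being available.

```r
# Open a graphics device writing to a temporary file
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)

# Set narrower margins: c(bottom, left, top, right), measured in lines.
# par() returns the previous settings, which we keep so they can be restored.
old_par <- par(mar = c(4, 4, 2, 1))

plot(c(7, 12, 28, 3, 41), type = "o", xlab = "Month", ylab = "Rain fall")

current_mar <- par("mar")   # query the margins now in effect: 4 4 2 1

par(old_par)                # restore the previous margins
dev.off()                   # close the device to save the file
```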

Exp no: 4

Measures of central tendency

Statistical analysis in R is performed by using many built-in functions. Most of these functions are part of the R base package. These functions take an R vector as input, along with other arguments, and return the result.

The functions we are discussing are mean, median and mode.

Mean

It is calculated by taking the sum of the values and dividing by the number of values in a data series.

The function mean() is used to calculate this in R.
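As a quick check of the definition, the sketch below computes the mean by hand and with mean(). The data vector is arbitrary.

```r
v <- c(7, 12, 28, 3, 41)

# By hand: the sum of the values divided by the number of values
by_hand <- sum(v) / length(v)   # 91 / 5 = 18.2

# With the built-in function
built_in <- mean(v)             # 18.2

all.equal(by_hand, built_in)    # TRUE
```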

Syntax

The basic syntax for calculating mean in R is −