An Introduction to S-Plus


These notes are adapted from notes written by Dr Sara Morris of the Department of Epidemiology and Public Health, Imperial College School of Medicine, and Dr Gavin Shaddick of the Department of Mathematics, University of Bath.

Table of Contents

A brief introduction

Getting started

Simple arithmetic

Simple numeric functions

Objects in S-plus

Object assignment

Managing objects

Logical values

Vectors

Sequences

Vector arithmetic

Extracting elements of a vector

Simple vector argument functions

Matrices

Creating matrices

Matrix subscripts

Direct assignment in matrices

Binding bits together

Matrix arithmetic

Functions for working with matrices

Character matrices

Reading in data

Missing data values

Dataframes

Lists and factors

Factors

Graphics

Basic plotting

Basic Statistics

Univariate data summaries

Bivariate data summaries

Traditional statistical tests

Generating random variates

Scripts

User defined functions

A brief introduction

S-plus is an integrated suite of software facilities for data manipulation, calculation and graphical display. The Windows version of S-plus that you will be using offers both a menu-based interface and a command-line interface. The command-line interface is more flexible and allows users to write their own functions. Most of the functions you will be using during this practical have been specially written and are not supplied as standard with the software. These introductory notes focus on the command-line language; please refer to the S-plus on-line help for the corresponding dialog box options.

S-plus works with objects (which can contain data) and with functions (which are themselves objects) that operate on them. One of the simplest kinds of object is the vector, an ordered collection of numbers or characters.

As an example, we can set up a simple vector of numbers and find the mean. To set up a vector, x, consisting of the three numbers 10, 12 and 15 we use the concatenate function, c().

x <- c(10,12,15)

The symbol <- ‘assigns’ the result of the function c() to the new variable x. Here c() simply takes the numbers within the parentheses and joins them together to form a vector. Functions in S-plus always take the form of a name followed by a set of parentheses, within which the object, or data, on which the function is to operate is given (some functions also take additional information, or arguments, which are also given within the parentheses).

For example, to find the mean of these three numbers, now known simply as x, we can use the mean() function, with the name of the data object specified in the parentheses:

mean(x)

[1] 12.33333

It is often useful to keep the result of such a calculation, which is easily done by assigning it to another variable:

mx <- mean(x)

The object mx now contains the mean of the three numbers stored in x; typing its name prints the value:

mx

[1] 12.33333

If we type the name of a function without the parentheses, (), S-plus prints the function's code:

mean

function(x, trim = 0, na.rm = F)
{
    if(na.rm) {
        wnas <- which.na(x)
        if(length(wnas))
            x <- x[-wnas]
    }
    if(mode(x) == "complex") {
        if(trim > 0)
            stop("trimming not allowed for complex data")
        return(sum(x)/length(x))
    }
    x <- as.double(x)
    if(trim > 0) {
        if(trim >= 0.5)
            return(median(x, na.rm = F))
        if(!na.rm && length(which.na(x)))
            return(NA)
        n <- length(x)
        i1 <- floor(trim * n) + 1
        i2 <- n - i1 + 1
        x <- sort(x, unique(c(i1, i2)))[i1:i2]
    }
    sum(x)/length(x)
}
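The listing shows that mean() accepts two optional arguments, trim and na.rm. The following minimal sketch shows both in use. It is written in S-plus syntax and should also run unchanged in R (an open-source implementation of the S language); the vectors are made-up illustration data.

```r
# Made-up data with one extreme value
x <- c(1, 2, 3, 4, 100)
mean(x)               # ordinary mean: 110/5 = 22
# trim = 0.2 drops the lowest and highest 20% of the values (here 1 and 100)
mean(x, trim = 0.2)   # mean of 2, 3 and 4 = 3
# A vector containing a missing value (NA)
y <- c(1, NA, 3)
mean(y)               # NA, because one value is missing
mean(y, na.rm = T)    # the missing value is removed first: 2
```

Trimming makes the mean far less sensitive to the single extreme value.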

Getting started

When you create objects (data, functions etc.) in S-plus, they are stored in a subdirectory called ‘_Data’, which you need to create before you run S-plus for the first time. If you have not created your own directory, S-plus uses a default one.

Once S-plus has started, you can enter commands at the prompt (the > sign). To exit the program, use the function q() (note that no data or other arguments have to be given within the parentheses). Any data or functions that you have created are stored in your ‘_Data’ directory and will be available the next time you use S-plus.

Simple arithmetic

In S-plus, an expression typed at the prompt is evaluated and the result printed. You can use simple arithmetic notation:

> 1 + 2 +3 # add

[1] 6 # answer printed as a vector with one element

> 2 + 3 * 4 # multiplication is done first

[1] 14

> 3/2 + 1 # as is division

[1] 2.5

> 4*3**3 # use ** or ^ for powers

[1] 108

Use round brackets () to change the order of evaluation; for example, (2 + 3) * 4 gives 20.

Note that anything after a hash # is a comment: S-plus ignores it, so comments can be used to explain the code.
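The priority rules above can be checked directly. This is a short sketch in S-plus syntax (it should also run in R); the names r1 to r4 are made up for the example.

```r
# Operator priority: ^ (or **) first, then * and /, then + and -
r1 <- 2 + 3 * 4      # multiplication before addition: 14
r2 <- (2 + 3) * 4    # brackets force the addition first: 20
r3 <- -2^2           # the power is applied before the minus sign: -4
r4 <- (-2)^2         # brackets change that: 4
print(c(r1, r2, r3, r4))
```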

Simple numeric functions

Note that functions are different from operators such as + and -. A function is called by giving its name followed by round brackets containing the arguments, e.g. functionname(arg), or functionname(arg1, arg2) where there is more than one argument.

> sqrt(2)

[1] 1.414214

> sin(3.14159) # sin of pi

[1] 2.65359e-06 # close …

> sin(pi) # pi can be used as a given constant

[1] 1.224647e-16

Other common mathematical functions are exp, log and abs.

These functions can be nested and combined as in sqrt(sin(45*pi/180)). Note the use of parentheses to explicitly determine the arguments of each function, sqrt and sin.
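To see what the nested expression does, it can be evaluated one step at a time. This sketch uses S-plus syntax (R should accept it unchanged); the intermediate names angle, s and r are made up for illustration.

```r
angle <- 45 * pi / 180   # 45 degrees converted to radians
s <- sin(angle)          # about 0.7071068
r <- sqrt(s)             # about 0.8408964
r
# The nested form gives the same value in one expression
sqrt(sin(45 * pi / 180))
exp(1)                   # e, about 2.718282
log(exp(2))              # log() (natural log) undoes exp(): 2
abs(-3.5)                # 3.5
```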

Objects in S-plus

S-plus is what is known as ‘object-oriented’ software; the word object is synonymous with the word ‘thing’. So anything with a name in S-plus is an object: functions, vectors, matrices, lists and so on.

Object assignment

You can assign a value to an object with the <- operator (a ‘less than’ sign immediately followed by a minus sign, with no space between the two characters).

> x <- sqrt(2) # x gets the square root of 2

> x

[1] 1.414214

Typing the name of an object prints the value of it to screen. You can use x in arithmetic operations such as x**3 and sin(x). Object names must start with a letter and may contain letters, numbers and dots, but not underscore ‘_’ characters.

S-plus is case sensitive. If you create an object with a name already used by S-plus you will get a warning, either when you make the assignment or when you call the object.

For example, if you create an object called c, you will later see the warning: looking for object of mode function, ignored one of mode numeric. This is because S-plus already has a function named c, the concatenate function that puts data into a vector.
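A short sketch of the naming rules above, in S-plus syntax (it should also run in R). The object names my.value and X are made up for illustration; X is deliberately chosen to differ from x only in case.

```r
x <- sqrt(2)
x**3                 # the same as x^3: about 2.828427
my.value <- x * 10   # names may contain dots
X <- 5               # X and x are two different objects
x == X               # F
```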

Managing objects

When you make an assignment, the object is stored in your _Data directory. You can list the objects in this directory from within S-plus using the ls() function.

> ls()

[1] ".Last.value" "x"

The result is a character vector of the names of the objects (those beginning with a dot are S-plus housekeeping files). To remove an object, use rm().

> rm(x)

To make a copy, just assign it to a new object, i.e. xnew <- x.
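The following sketch, in S-plus syntax (it should also run in R), shows that an assigned copy is independent of the original and that rm() removes it. The names x and xnew are illustrative only.

```r
x <- c(1, 2, 3)
xnew <- x          # make a copy of x
xnew[1] <- 100     # change the first element of the copy...
x                  # ...the original is untouched: 1 2 3
ls()               # both "x" and "xnew" are listed
rm(xnew)           # remove the copy
exists("xnew")     # F: the object no longer exists
```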

Notice that the functions you have called are not visible in your _Data directory. These are stored in the main S-plus directory, but are visible via the internal search path. When you type the name of an object, S-plus looks in your _Data directory and then a list of places defined by your S-plus search path. To see the list of directories that S-plus is currently using, use the search() function. The result is a character vector giving the names of the places on the path, for example

> search()

[1] "_Data"

[2] "//splus//.Functions"

[3] "//splus//library//trellis//_Data"

You can list the objects in any of those directories using the optional argument pos (for position) of the ls() function:

> ls(pos=2)

[1] "%*%" "%*%.default" … # a very large vector of S-plus function names

…

[1273] "zs.p" "zs.s" "zs.u" "zs.xbar"

You can add new places (directories) to your search path using the attach() and library() functions. S-plus will then be able to access objects in these other directories.
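A sketch of extending the search path with library(), in S-plus syntax (it should also run in R). MASS is used only as an example library name and is assumed to be installed on your system; substitute any library you have.

```r
search()          # the places searched for objects, in order
library(MASS)     # attach the MASS library (assumed to be installed)
search()          # MASS now appears in the list
```

Where exactly the new entry is placed in the list can differ between systems, so check the output of search() rather than assuming a position.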

Logical values

S-plus enables you to compute with Boolean, or logical, values. A logical value is either TRUE or FALSE, printed as T and F (1 and 0 when converted to numbers).

> x <- 10

> x > 10 # is x strictly greater than 10 ?

[1] F

> x <= 10 # is x less than or equal to 10?

[1] T

> x == 10 # test for equality with ==: is x equal to 10?

[1] T

> x <- T # direct assignment, x gets TRUE

> x * 1 # arithmetic coerces T to 1 and F to 0

[1] 1

> (!x) # not x

[1] F
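The comparison operators above combine naturally with arithmetic. This sketch is in S-plus syntax (it should also run in R); the names big and small are made up, and != ("not equal to") is the counterpart of ==.

```r
x <- 10
x != 10           # "not equal to": F
big <- x > 10     # F
small <- x <= 10  # T
big + small       # arithmetic coerces T to 1 and F to 0: 1
!big              # T
```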

Vectors

A vector is simply an ordered collection of scalar values. An S-plus vector has no row or column orientation, although it is printed across the page like a row (see later for column vectors). To create a simple vector, use the c() function (for 'concatenation').

For example

> x <- c(2,3,5,7,11) # the first 5 prime numbers

> x

[1] 2 3 5 7 11

> char <- c("a","b","c") # vectors can hold characters or

# numbers; if you mix the two, all

# elements are stored as characters
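The mixing rule in the comment above can be seen directly. A minimal sketch in S-plus syntax (it should also run in R); primes and mixed are illustrative names, and mode() is the same function used inside the mean() listing.

```r
primes <- c(2, 3, 5, 7, 11)
length(primes)            # number of elements: 5
mixed <- c(1, "a", 2)     # numbers and characters together...
mixed                     # ...are all stored as characters: "1" "a" "2"
mode(mixed)               # "character"
```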

Vectors can also be logical: applying a logical operator to a vector produces a vector of T and F values.

> x <- c(1,2,3,4,5,6,7,8,9,10)

> x < 5

[1] T T T T F F F F F F

> sum(x<5) # sum coerces to numeric

[1] 4
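Because T counts as 1 and F as 0, summary functions applied to a logical vector give counts and proportions. A short sketch in S-plus syntax (it should also run in R); small is an illustrative name.

```r
x <- c(1,2,3,4,5,6,7,8,9,10)
small <- x < 5
sum(small)        # how many elements are below 5: 4
mean(small)       # the proportion below 5: 0.4
sum(x >= 5)       # how many are 5 or more: 6
```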

Exercise:

Assume you have created a vector

x <- c(1000,100,10,1,0.1)

What do you expect the following to produce ?

(i) x==100

(ii) x>1

(iii) (x/10)>1

Sequences

Use the a:b operator to create sequences of numbers and the seq() function for more complicated ones.

> xx <- 1:10 # ascending sequence

> xx <- 100:1 # in printing a vector over a number of

# lines, the index of the first element

# on each line is given in []

> xx

[1] 100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83

[19] 82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65

[37] 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47

[55] 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29

[73] 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11

[91] 10 9 8 7 6 5 4 3 2 1


> seq(4, 14, by = 3) # from 4 to 14 with gaps of 3

[1] 4 7 10 13

> seq(4, 14, length = 3) # three numbers evenly spaced between 4 and 14

[1] 4 9 14

To replicate numbers, use the rep() function:

> rep(2, 4) # repeat 2, 4 times

[1] 2 2 2 2

> rep(c(2,6), 2) # repeat (2,6) twice

[1] 2 6 2 6

> rep(c(2,6), c(2, 4)) # repeat 2 twice and 6 four times

[1] 2 2 6 6 6 6

> rep(c(2,6), rep(3, 2)) # repeat 2 three times and 6 three times

[1] 2 2 2 6 6 6
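A few more variations on seq() and rep(), sketched in S-plus syntax (they should also run in R). They are chosen so as not to give away the exercise answers below.

```r
seq(1, 10, by = 1)        # the same as 1:10
seq(10, 1, by = -3)       # a negative step counts down: 10 7 4 1
seq(0, 1, by = 0.25)      # steps need not be whole numbers: 0 0.25 0.5 0.75 1
rep(c(5, 10), c(1, 3))    # 5 once and 10 three times: 5 10 10 10
```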

Exercise:

What do you expect the following to do? Try to work it out and then try them in S-plus.

(i) rep(4,4)

(ii) rep(1:5,5)

(iii) rep(1:5,c(2,2,2,5,5))

(iv) rep(1:3,3:1)

Vector arithmetic

All the mathematical operators used on numeric scalars can be used on numeric vectors. The vectors are operated on element-by-element to return a vector of the same length.

> x <- 1:10

> x * 2 # multiply each element by 2

[1] 2 4 6 8 10 12 14 16 18 20

> x * x # x squared

[1] 1 4 9 16 25 36 49 64 81 100

> y <- 1:2

> x + y # shorter vector repeated to be the right length

[1] 2 4 4 6 6 8 8 10 10 12

> x + 2

[1] 3 4 5 6 7 8 9 10 11 12

Notice that when two vectors have different lengths, the shorter one is repeated to match the length of the longer one. This is called the recycling rule. If the length of the longer vector is not a multiple of the length of the shorter one, a warning is printed.
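A sketch contrasting the two cases of the recycling rule, in S-plus syntax (it should also run in R). The vectors are made-up examples, distinct from those in the exercise below.

```r
x <- 1:6
y <- c(10, 20)
x + y     # 6 is a multiple of 2, so y is recycled silently: 11 22 13 24 15 26
z <- c(10, 20, 30, 40)
x + z     # 6 is not a multiple of 4: a warning is printed,
          # but the result is still computed: 11 22 33 44 15 26
```

The warning is a hint that the lengths were probably not what you intended, so it is worth checking rather than ignoring.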

Exercise:

What do you expect the following to do? Try to work it out and then try them in S-plus.

x<-10:1

y<-1:5

(i) x*5

(ii) x+y

(iii) y*x

(iv) x/y

Extracting elements of a vector

You use square brackets to extract subsets of a vector, as in vect.name[subset.expression]. The subset expression can be a scalar index, a vector index, a logical scalar or a logical vector. The following are some examples of scalar and vector indices:

> xx <- 100:1

> xx[7]

[1] 94

> xx[4:7]

[1] 97 96 95 94

> xx[c(2, 3, 5, 7, 11)]

[1] 99 98 96 94 90

> xx[c(1:3, 98:100)]

[1] 100 99 98 3 2 1

> want <- c(1,1,2,2)

> xx[want]

[1] 100 100 99 99

You can use negative subscript expressions to omit elements from a vector

> x <- 1:6

> x[-4] # x without the 4th value

[1] 1 2 3 5 6
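Negative subscripts can also be vectors. The sketch below is in S-plus syntax (it should also run in R). Note the brackets in -(1:3): without them, -1:3 means (-1):3, that is -1 0 1 2 3, which mixes negative and positive values.

```r
x <- 1:6
x[-c(1, 6)]     # omit the first and last elements: 2 3 4 5
x[-(1:3)]       # omit elements 1 to 3: 4 5 6
x[length(x)]    # the last element: 6
```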

You can use several negative values, but you cannot mix negative and positive values in the same expression. When using logical subscript expressions, T selects the element and F omits it: