Using R for Data Analysis and Graphics
Introduction, Code and Commentary
J H Maindonald
Centre for Bioinformation Science,
Australian National University.
©J. H. Maindonald 2000, 2004. A licence is granted for personal study and classroom use.
Redistribution in any other form is prohibited.
Languages shape the way we think, and determine what we can think about (Benjamin Whorf).
10 October 2004
[Figure: scatterplot matrix of tail length, foot length and ear conch length. Legend entries: Cambarville, Bellbird, Whian Whian, Byrangery, Conondale, Allyn River, Bulburin; female, male.]
Lindenmayer, D. B., Viggers, K. L., Cunningham, R. B., and Donnelly, C. F.: Morphological
variation among populations of the mountain brushtail possum, Trichosurus caninus Ogilby
(Phalangeridae: Marsupialia). Australian Journal of Zoology 43: 449-459, 1995.
possum n. 1 Any of many chiefly herbivorous, long-tailed, tree-dwelling, mainly Australian marsupials,
some of which are gliding animals (e.g. brush-tailed possum, flying possum). 2 a mildly scornful term
for a person. 3 an affectionate mode of address.
From the Australian Oxford Paperback Dictionary, 2nd ed, 1996.
TABLE OF CONTENTS
Introduction
1. Starting Up
1.1 Getting started under Windows
1.2 Use of an Editor Script Window
1.3 A Short R Session
1.4 Further Notational Details
1.5 On-line Help
1.6 The Loading or Attaching of Datasets
1.7 Exercise
2. An Overview of R
2.1 The Uses of R
2.2 R Objects
*2.3 Looping
2.4 Vectors
2.5 Data Frames
2.6 Common Useful Functions
2.7 Making Tables
2.8 The Search List
2.9 Functions in R
2.10 More Detailed Information
2.11 Exercises
3. Plotting
3.1 plot() and allied functions
3.2 Fine control – Parameter settings
3.3 Adding points, lines and text
3.4 Identification and Location on the Figure Region
3.5 Plots that show the distribution of data values
3.6 Other Useful Plotting Functions
3.7 Plotting Mathematical Symbols
3.8 Guidelines for Graphs
3.9 Exercises
3.10 References
4. Lattice graphics
4.1 Examples that Present Panels of Scatterplots – Using xyplot()
4.3 Exercises
5. Linear (Multiple Regression) Models and Analysis of Variance
5.1 The Model Formula in Straight Line Regression
5.2 Regression Objects
5.3 Model Formulae, and the X Matrix
5.4 Multiple Linear Regression Models
5.5 Polynomial and Spline Regression
5.6 Using Factors in R Models
5.7 Multiple Lines – Different Regression Lines for Different Species
5.8 aov models (Analysis of Variance)
5.9 Exercises
5.10 References
6. Multivariate and Tree-Based Methods
6.1 Multivariate EDA, and Principal Components Analysis
6.2 Cluster Analysis
6.3 Discriminant Analysis
6.4 Decision Tree models (Tree-based models)
6.5 Exercises
6.6 References
*7. R Data Structures
7.1 Vectors
7.2 Missing Values
7.3 Data frames
7.4 Data Entry
7.5 Factors and Ordered Factors
7.6 Ordered Factors
7.7 Lists
*7.8 Matrices and Arrays
7.9 Exercises
8. Useful Functions
8.1 Confidence Intervals and Tests
8.2 Matching and Ordering
8.3 String Functions
8.4 Application of a Function to the Columns of an Array or Data Frame
*8.5 aggregate() and tapply()
*8.7 Merging Data Frames
8.8 Dates
8.9 Exercises
9. Writing Functions and other Code
9.1 Syntax and Semantics
9.2 Issues for the Writing and Use of Functions
9.3 Functions as aids to Data Management
9.4 A Simulation Example
9.5 Exercises
*10. GLM, and General Non-linear Models
10.1 A Taxonomy of Extensions to the Linear Model
10.2 Logistic Regression
10.3 glm models (Generalized Linear Regression Modelling)
10.4 Models that Include Smooth Spline Terms
10.5 Survival Analysis
10.6 Non-linear Models
10.7 Model Summaries
10.8 Further Elaborations
10.9 Exercises
10.10 References
*11. Multi-level Models, Repeated Measures and Time Series
11.1 Multi-Level Models, Including Repeated Measures Models
11.2 Time Series Models
11.3 Exercises
11.4 References
*12. Advanced Programming Topics
12.1 Methods
12.2 Extracting Arguments to Functions
12.3 Parsing and Evaluation of Expressions
12.4 Plotting a mathematical expression
12.4 Searching R functions for a specified token
13. R Resources
13.1 R Packages for Windows
13.2 Literature written by expert users
13.3 The R-help electronic mail discussion list
13.4 Competing Systems – XLISP-STAT
14. Appendix 1
14.1 Data Sets Referred to in these Notes
14.2 Answers to Selected Exercises