R Programming
Robin Evans
robin.evans@stats.ox.ac.uk
Michaelmas 2014
This version: November 5, 2014
Administration
The course webpage is at
http://www.stats.ox.ac.uk/~evans/teaching.htm
Lectures are at 10am on Mondays and Wednesdays, and practicals at 9am
on Tuesdays and Thursdays; in reality, there will be rather a lot of overlap
between these two formats.
Please bring your own laptop to use during all classes, and ensure that you
have R working (see below). If you don’t have access to a laptop, let me
know and we will try to provide one.
I will hold office hours each week during Michaelmas term on Wednesdays
between 12pm and 1pm; my office is on the first floor of 2 SPR, room
204. I’m very happy to help with any difficulties or problems you are having
with R, but please take steps to help yourselves first (see below for a
list of resources).
Software
You should install R on your own computer at the first opportunity. Visit
http://cran.r-project.org/
for details. Ensure you have the latest version (as of the start of Michaelmas
2014, this was version 3.1.1). Try to spend some time getting used to the
basics of the software, including arithmetic operations and functions. There
are many excellent online tutorials for this purpose.
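As a rough illustration of the kind of basics meant here, a first session might look something like the sketch below. The numbers and the variable name x are arbitrary choices made for this example, not part of the course material.

```r
# Basic arithmetic at the R prompt
1 + 2 * 3        # multiplication binds tighter than addition
2^10             # exponentiation
7 %/% 2          # integer division
7 %% 2           # remainder after integer division

# Assign a value to a name, then apply built-in functions to it
x <- 16
sqrt(x)
log(x, base = 2)
round(pi, 3)
```

Typing each line yourself and checking that the answer is what you expected is a quick way to confirm that the installation works.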
Resources
A strength of R is its help files, which we will discuss. These are accessed
with the ? and ?? commands.
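A minimal sketch of how these two commands are typically used is given below. The topics chosen (mean and "standard deviation") are arbitrary examples.

```r
# Open the help page for a function whose name you already know
?mean
help("mean")                        # equivalent to ?mean

# Search the help pages of all installed packages for a phrase
??"standard deviation"
help.search("standard deviation")   # equivalent to the ?? line above

# Run the examples listed at the bottom of a help page
example(mean)
```

Use ? when you know the name of the function, and ?? when you only know roughly what you are looking for.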
The internet has almost all the answers, and knows much more about R
than I do. If you have a problem, it’s extremely likely that someone will
have had the same difficulty already, and posted a question on an internet
forum.
Books are useful, though not required. Here are some of them with brief
comments.
1. Venables, W.N. and Ripley, B.D. (2002) Modern Applied Statistics with
S. Springer-Verlag. 4th edition.
The classic text.
2. Chambers, J. M. (2010) Software for Data Analysis: Programming with R.
Springer.
One of few books with information on more advanced programming (S4,
overloading).
3. Wickham, H. (2014) Advanced R. Chapman and Hall.
A great new book on the more advanced features: a good follow up to this
class.
4. Crawley, M. (2007) The R Book. Wiley.
Very thorough.
5. Fox, J. (2002) An R and S-PLUS Companion to Applied Regression. Sage.
Does what it says.
6. Ligges, U. (2009) Programmieren mit R. Third edition. Springer.
In German(!)
7. Rizzo, M. L. (2008) Statistical Computing with R. CRC/Chapman &
Hall.
More computational – different examples to the other books.
8. Braun, W. J. and Murdoch, D. J. (2007) A First Course in Statistical
Programming with R. CUP.
Detailed and well written, but at a rather low level. A bit redundant given
the above.
9. Maindonald, J. and Braun, W. J. (2003) Data Analysis and Graphics
using R. Second or third edition. CUP.
Advanced statistical graphics.
10. Spector, P. (2008) Data Manipulation with R. Springer.
Especially for data manipulation.
11. Dalgaard, P. (2009) Introductory Statistics with R. Second Edition. Springer.
Probably redundant given the above.
Getting the Most out of the Class
Learning R has much in common with learning a natural language: it’s easy
to get going with a few simple phrases, though you’ll find some idiosyncrasies
in the syntax, and occasional aspects are downright illogical. Once
you’ve mastered these few difficulties, the only barrier to fluency is the vast
vocabulary of R: even in the basic packages there are many commands which
you will never use or understand, but the more you learn the more elegantly
you will be able to express yourself. There is a smaller core of ‘everyday’ language
which we will focus on, and which you will be expected to understand
in exams and practical assessments.
These lecture notes are intended for reference, and will (by the end of the
course) contain sections on all the major topics we cover. Lectures will not
follow the notes exactly, so be prepared to take your own notes; the practical
classes will complement the lectures, and you can be examined on anything
we study in either.
Don’t copy and paste the commands from this guide into R; you will find
it very hard to remember the details of the language and will have to look
everything up when you come to code something yourself.
Make sure you try the exercises, and understand the code involved in
each one; if something doesn’t make sense, use R’s help functions, ask a
classmate, try using internet resources, or ask me for help (preferably in
that order). Some exercises are marked with an asterisk (*), which means
they are a little more advanced than is necessary for the class.
If you find any mistakes or omissions in these notes, I’d be very grateful to
be informed.
1 Introduction
1.1 What R is good at
Statistics for relatively advanced users: R has thousands of packages,
designed, maintained, and widely used by statisticians.
Statistical graphics: try doing some of our plots in Stata and you won’t have
much fun.
Flexible code: R has a rather liberal syntax, and variables don’t need to be
declared as they would in (for example) C++, which makes it very easy to
code in. This also has disadvantages in terms of how safe the code is.
Vectorization: R is designed to make it very easy to write functions which
are applied pointwise to every element of a vector. This is extremely useful
in statistics.
R is powerful: if a command doesn’t exist already, you can code it yourself.
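The points above about flexible code, vectorization and writing your own commands can be sketched in a few lines. The data values and the names heights, weights, bmi and standardize are invented for this illustration. Base R's scale() offers similar functionality to the helper defined here, so it is written out only to show how little code a new command needs.

```r
# No declarations: a name simply takes whatever value is assigned to it
heights <- c(1.62, 1.75, 1.80, 1.68)   # made-up values, in metres
weights <- c(55, 72, 90, 61)           # made-up values, in kilograms

# Arithmetic is applied element by element, with no explicit loop
bmi <- weights / heights^2

# Many built-in functions are vectorized in the same way
sqrt(weights)
bmi > 25          # logical vector, one entry per element

# A hypothetical helper of our own: rescale a vector to have
# mean 0 and standard deviation 1
standardize <- function(v) (v - mean(v)) / sd(v)
z <- standardize(bmi)
```

Note that nothing stops you from later assigning a character string to bmi, which is the kind of safety issue the flexible syntax brings with it.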
1.2 What R is not so good at
Statistics for non-statisticians: there is a steep learning curve, which puts
some people off. Try Stata, SAS or SPSS (if you must).
Numerical methods, such as solving partial differential equations; try
Matlab.
Analytical methods, such as algebraically integrating a function. Try
Mathematica or Maple.
Precision graphics, such as might be useful in psychology experiments. Try
Matlab.
Optimization, though it does have some very easy-to-use methods built in.
Low-level, high-speed or critical code; use C, C++, Java or similar. (However,
note that such code can be called from R to give the ‘best of both
worlds’.)
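To give an idea of the built-in optimization methods mentioned above, here is a minimal sketch using optimize() and optim() from the standard stats package. The objective functions f and g are toy examples invented for this illustration, chosen so that the true minima are known.

```r
# One-dimensional minimisation over an interval
f <- function(x) (x - 2)^2 + 1
res1 <- optimize(f, interval = c(0, 5))
res1$minimum      # location of the minimum, close to 2
res1$objective    # value of f there, close to 1

# Multi-dimensional minimisation from a starting point
# (optim uses the Nelder-Mead method by default)
g <- function(p) (p[1] - 1)^2 + (p[2] + 3)^2
res2 <- optim(par = c(0, 0), fn = g)
res2$par          # close to c(1, -3)
res2$convergence  # 0 means optim reports successful convergence
```

Both functions return approximate numerical answers, so the results should be compared with a tolerance rather than tested for exact equality.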
1.3 General Properties
R makes it extremely easy to code complex mathematical or statistical procedures,
though the programs may not run all that quickly. You can interface
R with other languages (C, C++, Fortran) to provide fast implementations
of subroutines, but writing this code (and making it portable) will typically
take longer. Where the advantage falls in this trade-off will depend upon
what you’re doing; for most things you will encounter during your degree, R
is sufficiently fast.
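One way to get a feel for this trade-off without leaving R is to time the same calculation written as an explicit loop and as a vectorized expression, as in the sketch below. The function names loop_sum and vec_sum are made up for this example, and the timings will depend on your machine and version of R, so none are quoted here.

```r
n <- 1e6
x <- runif(n)

# Explicit loop: sum of squares accumulated one element at a time
loop_sum <- function(v) {
  total <- 0
  for (vi in v) total <- total + vi^2
  total
}

# Vectorized equivalent: the looping is done by compiled code inside R
vec_sum <- function(v) sum(v^2)

system.time(a <- loop_sum(x))
system.time(b <- vec_sum(x))

all.equal(a, b)   # the two results agree up to floating-point rounding
```

If both versions finish in a fraction of a second, as is likely for a problem of this size, the speed of R is probably not the thing to worry about.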
R is open source and widely adopted by statisticians, biostatisticians, and
geneticists. There is a huge wealth of existing libraries so you can often
save time by using these, though it is sometimes easier to start from scratch
than to adapt someone else’s function to meet your needs. Contributing new
packages to the central repository (CRAN) is easy: even your lecturer has
managed it. As a result, R packages are not built to very high standards
(but see Bioconductor).
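As a sketch of how existing libraries are used in practice: a package is installed once from CRAN and then loaded in each session that needs it. The package named in the commented-out install line is only a placeholder. The splines package is used for the loading step because it is distributed with R itself, so no download is needed.

```r
# Install a package from CRAN (needs an internet connection; run once).
# "somePackage" is a placeholder, not a real package name.
# install.packages("somePackage")

# Load an installed package for the current session
library(splines)

# See what the package provides
head(ls("package:splines"))

# Its functions can now be called directly, for example a B-spline basis
basis <- bs(seq(0, 1, length.out = 20), df = 5)
dim(basis)
```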
R is portable, and works equally well on Windows, OS X and Linux.
1.4 Interfaces
For Windows and OS X, the standard R download comes with an R GUI,
which is adequate for simple tasks. You can also run R from the command
line in any operating system.
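For command line use, a script is usually saved in a file and run with Rscript, which is installed alongside R. The sketch below assumes a hypothetical file name hello.R. The arguments typed after the file name are read with commandArgs().

```r
# hello.R (a hypothetical file name, used only for illustration).
# From a terminal it could be run with:   Rscript hello.R 3 4

# Character vector of the arguments given after the script name
args <- commandArgs(trailingOnly = TRUE)

# Arguments always arrive as text, so convert them before doing arithmetic
nums <- as.numeric(args)

cat("Running", R.version.string, "\n")
cat("Number of arguments:", length(args), "\n")
cat("Sum of arguments:", sum(nums), "\n")
```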
There are a number of more powerful interfaces which you may like to try.
Here are a few.
RStudio. Very popular, with a nice interface and well thought out, especially
for more advanced usage; it can be a bit buggy, so make sure you
update it regularly. Available on all platforms.
Emacs with ESS. ESS (Emacs Speaks Statistics) is available on all platforms,
and is very powerful once you get used to it. It has a habit of freezing
in my experience, though.
Tinn-R. An alternative interface for Windows.
I intend to demonstrate a few of these different approaches during class.