
R Programming for Data Science

R Programming for Data Science by Roger D. Peng
Contents

Preface
History and Overview of R: What is R?; What is S?; The S Philosophy; Back to R; Basic Features of R; Free Software; Design of the R System; Limitations of R; R Resources
Getting Started with R: Installation; Getting Started with the R Interface
R Nuts and Bolts: Entering Input; Evaluation; R Objects; Numbers; Attributes; Creating Vectors; Mixing Objects; Explicit Coercion; Matrices; Lists; Factors; Missing Values; Data Frames; Names; Summary
Getting Data In and Out of R: Reading and Writing Data; Reading Data Files with read.table(); Reading in Larger Datasets with read.table; Calculating Memory Requirements for R Objects
Using Textual and Binary Formats for Storing Data: Using dput() and dump(); Binary Formats
Interfaces to the Outside World: File Connections; Reading Lines of a Text File; Reading From a URL Connection
Subsetting R Objects: Subsetting a Vector; Subsetting a Matrix; Subsetting Nested Elements of a List; Extracting Multiple Elements of a List; Partial Matching; Removing NA Values
Vectorized Operations: Vectorized Matrix Operations
Dates and Times: Dates in R; Times in R; Operations on Dates and Times; Summary
Control Structures: if-else; for Loops; Nested for Loops; while Loops; repeat Loops; next, break; Summary
Functions: Functions in R; Your First Function; Argument Matching; Lazy Evaluation; The ... Argument; Arguments Coming After the ... Argument; Summary
Scoping Rules of R: A Diversion on Binding Values to Symbol; Scoping Rules; Lexical Scoping: Why Does It Matter?; Lexical vs. Dynamic Scoping; Application: Optimization; Plotting the Likelihood; Summary
Coding Standards for R
Loop Functions: Looping on the Command Line; lapply(); sapply(); split(); Splitting a Data Frame; tapply(); apply(); Col/Row Sums and Means; Other Ways to Apply; mapply(); Vectorizing a Function; Summary
Debugging: Something's Wrong; Figuring Out What's Wrong; Debugging Tools in R; Using traceback(); Using debug(); Using recover(); Summary
Profiling R Code: Using system.time(); Timing Longer Expressions; The R Profiler; Using summaryRprof(); Summary
Simulation: Generating Random Numbers; Setting the Random Number Seed; Simulating a Linear Model; Random Sampling; Summary
Data Analysis Case Study: Changes in Fine Particle Air Pollution in the U.S.: Synopsis; Loading and Processing the Raw Data; Results

Preface

I started using R in 1998 when I was a college undergraduate working on my senior thesis. The version was 0.63. I was an applied mathematics major with a statistics concentration, and I was working with Dr. Nicolas Hengartner on an analysis of word frequencies in classic texts (Shakespeare, Milton, etc.). The idea was to see if we could identify the authorship of each of the texts based on how frequently they used certain words. We downloaded the data from Project Gutenberg and used some basic linear discriminant analysis for the modeling. The work was eventually published and was my first ever peer-reviewed publication. I guess you could argue it was my first real data science experience.

Back then, no one was using R. Most of my classes were taught with Minitab, SPSS, Stata, or Microsoft Excel. The cool people on the cutting edge of statistical methodology used S-PLUS. I was working on my thesis late one night and I had a problem. I didn't have a copy of any of those software packages because they were expensive and I was a student.
I didn't feel like trekking over to the computer lab to use the software because it was late at night. But I had the Internet! After a couple of Yahoo! searches I found a web page for something called R, which I figured was just a play on the name of the S-PLUS package. From what I could tell, R was a "clone" of S-PLUS that was free. I had already written some S-PLUS code for my thesis, so I figured I would try to download R and see if I could just run the S-PLUS code.

It didn't work. At least not at first. It turns out that R is not exactly a clone of S-PLUS, and quite a few modifications needed to be made before the code would run in R. In particular, R was missing a lot of statistical functionality that had existed in S-PLUS for a long time already. Luckily, R's programming language was pretty much there, and I was able to more or less re-implement the features that were missing in R.

After college, I enrolled in a PhD program in statistics at the University of California, Los Angeles. At the time the department was brand new, and they didn't have a lot of policies or rules (or classes, for that matter!). So you could kind of do what you wanted, which was good for some students and not so good for others. The Chair of the department, Jan de Leeuw, was a big fan of XLisp-Stat, and so all of the department's classes were taught using XLisp-Stat. I diligently bought my copy of Luke Tierney's book and learned to really love XLisp-Stat. It had a number of features that R didn't have at all, most notably dynamic graphics. But ultimately, there were only so many parentheses that I could type, and still all of the research-level statistics was being done in S-PLUS. The department didn't really have a lot of copies of S-PLUS lying around, so I turned back to R. When I looked around at my fellow students, I realized that I was basically the only one who had any experience using R.
http://amstat.tandfonline.com/doi/abs/10.1198/000313002100#.vqgiselpage
http://www.amazon.com/lisp-stat-object-oriented-environment-stAtiStiCal-prObabIliTy/dp/0471509167/

Since there was a budding interest in R around the department, I decided to start a brown bag series where every week for about an hour I would talk about something you could do in R (which wasn't much, really). People seemed to like it, if only because there wasn't really anyone to turn to if you wanted to learn about R. By the time I left grad school in 2003, the department had essentially switched over from XLisp-Stat to R for all its work (although there were a few holdouts). Jan discusses the rationale for the transition in a paper in the Journal of Statistical Software.

In the next step of my career, I went to the Department of Biostatistics at the Johns Hopkins Bloomberg School of Public Health, where I have been for the past 12 years. When I got to Johns Hopkins, people already seemed into R. Most people had abandoned S-PLUS a while ago and were committed to using R for their research. Of all the available statistical packages, R had the most powerful and expressive programming language, which was perfect for someone developing new statistical methods.

However, we didn't really have a class that taught students how to use R. This was a problem because most of our grad students were coming into the program having never heard of R. Most likely, they had used some other software package in their undergraduate programs. So along with Rafael Irizarry, Brian Caffo, Ingo Ruczinski, and Karl Broman, I started a new class to teach our graduate students R and a number of other skills they'd need in grad school.

The class was basically a weekly seminar where one of us talked about a computing topic of interest. I gave some of the R lectures in that class, and when I asked who had heard of R before, almost no one raised their hand. And no one had actually used it before.
The main selling point at the time was "It's just like S-PLUS but it's free!" A lot of people had experience with SAS or Stata or SPSS. A number of people had used something like Java or C/C++ before, and so I often used that as a reference frame. No one had ever used a functional-style programming language like Scheme or Lisp.

To this day, I still teach the class, known as Biostatistics 140.776 ("Statistical Computing"). However, the nature of the class has changed quite a bit over the past 10 years. The population of students (mostly first-year graduate students) has shifted to the point where many of them have been introduced to R as undergraduates. This trend mirrors the overall trend in statistics, where we are seeing more and more students do undergraduate majors in statistics (as opposed to, say, mathematics). Eventually, by 2008-2009, when I asked how many people had heard of or used R before, everyone raised their hand. However, even at that late date I still felt the need to convince people that R was a real language that could be used for real tasks.

R has grown a lot in recent years and is being used in so many places now that I think it's essentially impossible for a person to keep track of everything that is going on. That's fine, but it makes "introducing" people to R an interesting experience. Nowadays in class, students are often teaching me something new about R that I've never seen or heard of before (they are quite good at Googling around for themselves). I feel no need to "bring people over" to R. In fact, it's quite the opposite: people might start asking questions if I weren't teaching R.

http://www.jstatsoft.org/v13/i07
http://www.biostat.jhsph.edu

This book comes from my experience teaching R in a variety of settings and through different stages of its (and my) development.
Much of the material has been taken from my Statistical Computing class as well as the R Programming class I teach through Coursera. I'm looking forward to teaching R to people as long as people will let me, and I'm interested in seeing how the next generation of students will approach it (and how my approach to them will change). Overall, it's been just an amazing experience to see the widespread adoption of R over the past decade. I'm sure the next decade will be just as amazing.

https://www.coursera.org/course/rprog

History and Overview of R

There are only two kinds of languages: the ones people complain about and the ones nobody uses. - Bjarne Stroustrup

Watch a video of this chapter

What is R?

This is an easy question to answer. R is a dialect of S.

What is S?

S is a language that was developed by John Chambers and others at the old Bell Telephone Laboratories, originally part of AT&T Corp. S was initiated in 1976 as an internal statistical analysis environment, originally implemented as Fortran libraries. Early versions of the language did not even contain functions for statistical modeling.

In 1988 the system was rewritten in C and began to resemble the system that we have today (this was Version 3 of the language). The book Statistical Models in S by Chambers and Hastie (the white book) documents the statistical analysis functionality. Version 4 of the S language was released in 1998 and is the version we use today. The book Programming with Data by John Chambers (the green book) documents this version of the language.

Since the early 90s the life of the S language has gone down a rather winding path. In 1993 Bell Labs gave StatSci (later Insightful Corp.) an exclusive license to develop and sell the S language. In 2004 Insightful purchased the S language from Lucent for $2 million.
In 2006, Alcatel purchased Lucent Technologies; the combined company is now called Alcatel-Lucent.

Insightful sold its implementation of the S language under the product name S-PLUS and built a number of fancy features (GUIs, mostly) on top of it, hence the "PLUS". In 2008 Insightful was acquired by TIBCO for $25 million. As of this writing, TIBCO is the current owner of the S language and is its exclusive developer.

The fundamentals of the S language itself have not changed dramatically since the publication of the Green Book by John Chambers in 1998. In 1998, S won the Association for Computing Machinery's Software System Award, a highly prestigious award in the computer science field.

https://youtu.be/stihtNvsznL
http://cm.bell-labs.com/stat/doc/94.11.ps

Uploaded 2017-01-16, size 10.34 MB
R for Data Science (Chinese edition)

A classic textbook on data processing in R that combines plotting with data processing, using the tidyverse.

R for Data Science, original PDF, by Wickham & Grolemund

Data science is an exciting discipline that allows you to turn raw data into understanding, insight, and knowledge. The goal of R for Data Science is to help you learn the most important tools in R that will allow you to do data science. After reading this book, you’ll have the tools to tackle a wid

R Programming for Data Science (R Programming for Data Analysis and Computation)

R Programming for Data Science (R Programming for Data Analysis and Computation): high-resolution PDF e-book with table of contents

R Programming for Data Science

Book title: R Programming for Data Science; Author: Roger D. Peng; Publisher: Leanpub; Pages: 132; Publication date: April 2015; Language: English; Size: MB; Format: PDF (text-based); ISBN: n/a; Edition: 1st

R programming for data science

R for Data Science

Data analysis with R; includes code and books.
