Table of Contents
Copyright
Dedication
Foreword
Preface
Acknowledgments
About this Book
What is data science?
Roadmap
Audience
What is not in this book?
Code conventions and downloads
Software and hardware requirements
Author Online
About the authors
About the Cover Illustration
Part 1. Introduction to data science
Chapter 1. The data science process
1.1. The roles in a data science project
1.2. Stages of a data science project
1.3. Setting expectations
1.4. Summary
Chapter 2. Loading data into R
2.1. Working with data from files
2.2. Working with relational databases
2.3. Summary
Chapter 3. Exploring data
3.1. Using summary statistics to spot problems
3.2. Spotting problems using graphics and visualization
3.3. Summary
Chapter 4. Managing data
4.1. Cleaning data
4.2. Sampling for modeling and validation
4.3. Summary
Part 2. Modeling methods
Chapter 5. Choosing and evaluating models
5.1. Mapping problems to machine learning tasks
5.2. Evaluating models
5.3. Validating models
5.4. Summary
Chapter 6. Memorization methods
6.1. KDD and KDD Cup 2009
6.2. Building single-variable models
6.3. Building models using many variables
6.4. Summary
Chapter 7. Linear and logistic regression
7.1. Using linear regression
7.2. Using logistic regression
7.3. Summary
Chapter 8. Unsupervised methods
8.1. Cluster analysis
8.2. Association rules
8.3. Summary
Chapter 9. Exploring advanced methods
9.1. Using bagging and random forests to reduce training variance
9.2. Using generalized additive models (GAMs) to learn non-monotone relationships
9.3. Using kernel methods to increase data separation
9.4. Using SVMs to model complicated decision boundaries
9.5. Summary
Part 3. Delivering results
Chapter 10. Documentation and deployment
10.1. The buzz dataset
10.2. Using knitr to produce milestone documentation
10.3. Using comments and version control for running documentation
10.4. Deploying models
10.5. Summary
Chapter 11. Producing effective presentations
11.1. Presenting your results to the project sponsor
11.2. Presenting your model to end users
11.3. Presenting your work to other data scientists
11.4. Summary
Appendix A. Working with R and other tools
A.1. Installing the tools
A.2. Starting with R
A.3. Using databases with R
Appendix B. Important statistical concepts
B.1. Distributions
B.2. Statistical theory
B.3. Examples of the statistical view of data
Appendix C. More tools and ideas worth exploring
C.1. More tools
C.2. More ideas
Bibliography
Index
Copyright
For online information and ordering of this and other Manning books, please visit
www.manning.com. The publisher offers discounts on this book when ordered in quantity. For
more information, please contact
Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 261
Shelter Island, NY 11964
Email: orders@manning.com
©2014 by Manning Publications Co. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in
any form or by means electronic, mechanical, photocopying, or otherwise, without prior written
permission of the publisher.
Many of the designations used by manufacturers and sellers to distinguish their products are
claimed as trademarks. Where those designations appear in the book, and Manning Publications
was aware of a trademark claim, the designations have been printed in initial caps or all caps.
Recognizing the importance of preserving what has been written, it is Manning’s policy to
have the books we publish printed on acid-free paper, and we exert our best efforts to that end.
Recognizing also our responsibility to conserve the resources of our planet, Manning books are
printed on paper that is at least 15 percent recycled and processed without the use of elemental
chlorine.
Manning Publications Co.
20 Baldwin Road
PO Box 261
Shelter Island, NY 11964
Development editor: Cynthia Kane
Copyeditor: Benjamin Berg
Proofreader: Katie Tennant
Typesetter: Dottie Marsico
Cover designer: Marija Tudor
ISBN 9781617291562
Printed in the United States of America
1 2 3 4 5 6 7 8 9 10 – EBM – 19 18 17 16 15 14
Dedication
To our parents
Olive and Paul Zumel
Peggy and David Mount