Required points/C coins: 17 | 2015-07-07 10:58:10 | 5.02 MB | PDF

Data science libraries, frameworks, modules, and toolkits are great for doing data science, but they're also a good way to dive into the discipline without actually understanding data science. In this book, you'll learn how many of the most fundamental data science tools and algorithms work by implementing them from scratch.

If you have an aptitude for mathematics and some programming skills, author Joel Grus will help you get comfortable with the math and statistics at the core of data science, and with the hacking skills you need to get started as a data scientist. Today's messy glut of data holds answers to questions no one's even thought to ask. This book provides you with the know-how to dig those answers out.

- Get a crash course in Python
- Learn the basics of linear algebra, statistics, and probability, and understand how and when they're used in data science
- Collect, explore, clean, munge, and manipulate data
- Dive into the fundamentals of machine learning
- Implement models such as k-nearest neighbors, Naive Bayes, linear and logistic regression, decision trees, neural networks, and clustering
- Explore recommender systems, natural language processing, network analysis, MapReduce, and databases

Table of Contents
Chapter 1. Introduction
Chapter 2. A Crash Course in Python
Chapter 3. Visualizing Data
Chapter 4. Linear Algebra
Chapter 5. Statistics
Chapter 6. Probability
Chapter 7. Hypothesis and Inference
Chapter 8. Gradient Descent
Chapter 9. Getting Data
Chapter 10. Working with Data
Chapter 11. Machine Learning
Chapter 12. k-Nearest Neighbors
Chapter 13. Naive Bayes
Chapter 14. Simple Linear Regression
Chapter 15. Multiple Regression
Chapter 16. Logistic Regression
Chapter 17. Decision Trees
Chapter 18. Neural Networks
Chapter 19. Clustering
Chapter 20. Natural Language Processing
Chapter 21. Network Analysis
Chapter 22. Recommender Systems
Chapter 23. Databases and SQL
Chapter 24. MapReduce
Chapter 25. Go Forth and Do Data Science
Data Science from Scratch
Joel Grus
Beijing · Cambridge · Farnham · Köln · Sebastopol · Tokyo
O'REILLY

Data Science from Scratch
by Joel Grus
Copyright © 2015 O'Reilly Media. All rights reserved.
Printed in the United States of America.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles.

Editor: Marie Beaugureau
Production Editor: Melanie Yarbrough
Copyeditor: Nan Reinhardt
Proofreader: Eileen Cohen
Indexer: Ellen Troutman-Zaig
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Rebecca Demarest

April 2015: First Edition

Revision History for the First Edition
2015-04-10: First Release

The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Data Science from Scratch, the cover image of a Rock Ptarmigan, and related trade dress are trademarks of O'Reilly Media, Inc.

While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-491-90142-7
[LSI]

Table of Contents

Preface

1. Introduction
The Ascendance of Data; What Is Data Science?; Motivating Hypothetical: DataSciencester; Finding Key Connectors; Data Scientists You May Know; Salaries and Experience; Paid Accounts; Topics of Interest; Onward

2. A Crash Course in Python
The Basics; Getting Python; The Zen of Python; Whitespace Formatting; Arithmetic; Functions; Strings; Exceptions; Lists; Tuples; Dictionaries; Sets; Control Flow; Truthiness; The Not-So-Basics; Sorting; List Comprehensions; Generators and Iterators; Randomness; Regular Expressions; Object-Oriented Programming; Functional Tools; enumerate; zip and Argument Unpacking; args and kwargs; Welcome to DataSciencester; For Further Exploration

3. Visualizing Data
matplotlib; Bar Charts; Line Charts; Scatterplots; For Further Exploration

4. Linear Algebra
Vectors; Matrices; For Further Exploration

5. Statistics
Describing a Single Set of Data; Central Tendencies; Dispersion; Correlation; Simpson's Paradox; Some Other Correlational Caveats; Correlation and Causation; For Further Exploration

6. Probability
Dependence and Independence; Conditional Probability; Bayes's Theorem; Random Variables; Continuous Distributions; The Normal Distribution; The Central Limit Theorem; For Further Exploration

7. Hypothesis and Inference
Statistical Hypothesis Testing; Example: Flipping a Coin; Confidence Intervals; P-hacking; Example: Running an A/B Test; Bayesian Inference; For Further Exploration

8. Gradient Descent
The Idea Behind Gradient Descent; Estimating the Gradient; Using the Gradient; Choosing the Right Step Size; Putting It All Together; Stochastic Gradient Descent; For Further Exploration

9. Getting Data
stdin and stdout; Reading Files; The Basics of Text Files; Delimited Files; Scraping the Web; HTML and the Parsing Thereof; Example: O'Reilly Books About Data; Using APIs; JSON (and XML); Using an Unauthenticated API; Finding APIs; Example: Using the Twitter APIs; Getting Credentials; For Further Exploration

10. Working with Data
Exploring Your Data; Exploring One-Dimensional Data; Two Dimensions; Many Dimensions; Cleaning and Munging; Manipulating Data; Rescaling; Dimensionality Reduction; For Further Exploration

11. Machine Learning
Modeling; What Is Machine Learning?; Overfitting and Underfitting; Correctness; The Bias-Variance Trade-off; Feature Extraction and Selection; For Further Exploration

12. k-Nearest Neighbors
The Model; Example: Favorite Languages; The Curse of Dimensionality; For Further Exploration

13. Naive Bayes
A Really Dumb Spam Filter; A More Sophisticated Spam Filter; Implementation; Testing Our Model; For Further Exploration

14. Simple Linear Regression
The Model; Using Gradient Descent; Maximum Likelihood Estimation; For Further Exploration

15. Multiple Regression
The Model; Further Assumptions of the Least Squares Model; Fitting the Model; Interpreting the Model; Goodness of Fit; Digression: The Bootstrap; Standard Errors of Regression Coefficients; Regularization; For Further Exploration

16. Logistic Regression
The Problem; The Logistic Function; Applying the Model; Goodness of Fit; Support Vector Machines; For Further Investigation

17. Decision Trees
What Is a Decision Tree?; Entropy; The Entropy of a Partition; Creating a Decision Tree; Putting It All Together; Random Forests; For Further Exploration

18. Neural Networks
Perceptrons; Feed-Forward Neural Networks; Backpropagation; Example: Defeating a CAPTCHA; For Further Exploration

19. Clustering
The Idea; The Model; Example: Meetups; Choosing k; Example: Clustering Colors; Bottom-up Hierarchical Clustering; For Further Exploration

20. Natural Language Processing
Word Clouds; n-gram Models; Grammars; An Aside: Gibbs Sampling; Topic Modeling; For Further Exploration

21. Network Analysis
Betweenness Centrality; Eigenvector Centrality; Matrix Multiplication; Centrality; Directed Graphs and PageRank; For Further Exploration

22. Recommender Systems
Manual Curation; Recommending What's Popular; User-Based Collaborative Filtering; Item-Based Collaborative Filtering; For Further Exploration

23. Databases and SQL
CREATE TABLE and INSERT; UPDATE; DELETE; SELECT; GROUP BY; ORDER BY; JOIN; Subqueries; Indexes; Query Optimization; NoSQL; For Further Exploration

24. MapReduce
Example: Word Count; Why MapReduce?; MapReduce More Generally; Example: Analyzing Status Updates; Example: Matrix Multiplication; An Aside: Combiners; For Further Exploration

Preview (127 pages): Data.Science.from.Scratch.First.Principles.with.Python
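猫在吃饭: Good book, worth downloading.
anruo12138: Thanks a lot, I finally found it. The book leans toward basic statistics and isn't very hands-on.
tinababy1210: Planning to study this soon; hoarding books like crazy right now.
次轨: A very good introductory book.
nyz531: Nice book, worth reading!!!
Ali: Nice book, thanks for sharing.
wangjian052163: Not bad, good stuff. Bookmarked, thanks.
xiaobaiyyyyyy: Great book, good for getting started.
ldh2013: Thanks, a nice resource.
pywansui: A good Python book.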