Python Data Science Handbook (Jupyter Notebook Version): resource for installing Jupyter on Windows (CSDN Library)

Uploaded 2017-12-16 00:16:26; 20.85 MB PDF

Python Data Science Handbook

This repository contains the entire Python Data Science Handbook, in the form of (free!) Jupyter notebooks.

How to Use this Book

Read the book in its entirety online at https://jakevdp.github.io/PythonDataScienceHandbook/

Run the code using the Jupyter notebooks available in this repository's notebooks directory.

Launch a live notebook server with these notebooks using Binder.

Buy the printed book through O'Reilly Media

About

The book was written and tested with Python 3.5, though other Python versions (including Python 2.7) should work in nearly all cases.

The book introduces the core libraries essential for working with data in Python: particularly IPython, NumPy, Pandas, Matplotlib, Scikit-Learn, and related packages. Familiarity with Python as a language is assumed; if you need a quick introduction to the language itself, see the free companion project, A Whirlwind Tour of Python: it's a fast-paced introduction to the Python language aimed at researchers and scientists.

See Index.ipynb for an index of the notebooks available to accompany the text.

Software

The code in the book was tested with Python 3.5, though most (but not all) will also work correctly with Python 2.7 and other older Python versions.

The packages I used to run the code in the book are listed in requirements.txt (note that some of these exact version numbers may not be available on your platform: you may have to tweak them for your own use). To install the requirements using conda, run the following at the command line:

$ conda install --file requirements.txt

To create a stand-alone environment named PDSH with Python 3.5 and all the required package versions, run the following:

$ conda create -n PDSH python=3.5 --file requirements.txt

You can read more about using conda environments in the Managing Environments section of the conda documentation.
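
Since the code was tested with Python 3.5, it can be handy to confirm which interpreter an environment actually provides before running the notebooks. The following is a minimal sketch using only the standard library; the helper name `meets_minimum` is an illustrative assumption and not part of the repository.

```python
import sys


def meets_minimum(minimum=(3, 5)):
    """Return True if the running interpreter is at least `minimum` (major, minor)."""
    return sys.version_info[:2] >= tuple(minimum)


if __name__ == "__main__":
    version = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_minimum() else "older than the tested version"
    print("Python", version, "-", status)
```

Running this inside the activated PDSH environment should report the interpreter that conda installed there.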

License

Code

The code in this repository, including all code samples in the notebooks listed above, is released under the MIT license. Read more at the Open Source Initiative.

Text

The text content of the book is released under the CC-BY-NC-ND license. Read more at Creative Commons.
This notebook contains an excerpt from the Python Data Science Handbook by Jake VanderPlas; the content is available on GitHub.
The text is released under the CC-BY-NC-ND license, and code is released under the MIT license. If you find this content useful, please
consider supporting the work by buying the book!
Preface
What Is Data Science?
This is a book about doing data science with Python, which immediately begs the question: what is data science? It's a surprisingly hard definition to nail
down, especially given how ubiquitous the term has become. Vocal critics have variously dismissed the term as a superfluous label (after all, what science
doesn't involve data?) or a simple buzzword that only exists to salt resumes and catch the eye of overzealous tech recruiters.
In my mind, these critiques miss something important. Data science, despite its hype-laden veneer, is perhaps the best label we have for the cross-
disciplinary set of skills that are becoming increasingly important in many applications across industry and academia. This cross-disciplinary piece is key: in
my mind, the best existing definition of data science is illustrated by Drew Conway's Data Science Venn Diagram, first published on his blog in September
2010:
(Source: [Drew Conway](http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram). Used by permission.)
While some of the intersection labels are a bit tongue-in-cheek, this diagram captures the essence of what I think people mean when they say "data
science": it is fundamentally an interdisciplinary subject. Data science comprises three distinct and overlapping areas: the skills of a statistician who knows
how to model and summarize datasets (which are growing ever larger); the skills of a computer scientist who can design and use algorithms to efficiently
store, process, and visualize this data; and the domain expertise—what we might think of as "classical" training in a subject—necessary both to formulate
the right questions and to put their answers in context.
With this in mind, I would encourage you to think of data science not as a new domain of knowledge to learn, but a new set of skills that you can apply
within your current area of expertise. Whether you are reporting election results, forecasting stock returns, optimizing online ad clicks, identifying
microorganisms in microscope photos, seeking new classes of astronomical objects, or working with data in any other field, the goal of this book is to give
you the ability to ask and answer new questions about your chosen subject area.
Who Is This Book For?
In my teaching both at the University of Washington and at various tech-focused conferences and meetups, one of the most common questions I have
heard is this: "how should I learn Python?" The people asking are generally technically minded students, developers, or researchers, often with an already
strong background in writing code and using computational and numerical tools. Most of these folks don't want to learn Python per se, but want to learn the
language with the aim of using it as a tool for data-intensive and computational science. While a large patchwork of videos, blog posts, and tutorials for this
audience is available online, I've long been frustrated by the lack of a single good answer to this question; that is what inspired this book.
The book is not meant to be an introduction to Python or to programming in general; I assume the reader has familiarity with the Python language,
including defining functions, assigning variables, calling methods of objects, controlling the flow of a program, and other basic tasks. Instead it is meant to
help Python users learn to use Python's data science stack–libraries such as IPython, NumPy, Pandas, Matplotlib, Scikit-Learn, and related tools–to
effectively store, manipulate, and gain insight from data.
Why Python?

Python has emerged over the last couple of decades as a first-class tool for scientific computing tasks, including the analysis and visualization of large datasets. This may have come as a surprise to early proponents of the Python language: the language itself was not specifically designed with data analysis or scientific computing in mind. The usefulness of Python for data science stems primarily from the large and active ecosystem of third-party packages: NumPy for manipulation of homogeneous array-based data, Pandas for manipulation of heterogeneous and labeled data, SciPy for common scientific computing tasks, Matplotlib for publication-quality visualizations, IPython for interactive execution and sharing of code, Scikit-Learn for machine learning, and many more tools that will be mentioned in the following pages.

If you are looking for a guide to the Python language itself, I would suggest the sister project to this book, "A Whirlwind Tour of the Python Language". This short report provides a tour of the essential features of the Python language, aimed at data scientists who already are familiar with one or more other programming languages.

Python 2 vs Python 3

This book uses the syntax of Python 3, which contains language enhancements that are not compatible with the 2.x series of Python. Though Python 3.0 was first released in 2008, adoption has been relatively slow, particularly in the scientific and web development communities. This is primarily because it took some time for many of the essential third-party packages and toolkits to be made compatible with the new language internals. Since early 2014, however, stable releases of the most important tools in the data science ecosystem have been fully compatible with both Python 2 and 3, and so this book will use the newer Python 3 syntax. However, the vast majority of code snippets in this book will also work without modification in Python 2: in cases where a Py2-incompatible syntax is used, I will make every effort to note it explicitly.
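
To make the incompatibility concrete, here is a small sketch of Python 3 syntax and behavior that differs from Python 2. The examples are chosen for illustration and are not taken from the book.

```python
# print is a function in Python 3; in Python 2 it was a statement.
print("hello", "world", sep=", ")

# The / operator always performs true division in Python 3.
half = 7 / 2        # true division yields a float
floored = 7 // 2    # floor division behaves the same in both versions

# Extended iterable unpacking is Python 3 only syntax.
first, *rest = [1, 2, 3, 4]
print(half, floored, first, rest)
```

Under Python 2 the `sep=` keyword and the starred assignment are syntax errors, and `7 / 2` on two integers performs floor division instead.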

Outline of the Book

Each chapter of this book focuses on a particular package or tool that contributes a fundamental piece of the Python Data Science story.

1. IPython and Jupyter: these packages provide the computational environment in which many Python-using data scientists work.

2. NumPy: this library provides the ndarray for efficient storage and manipulation of dense data arrays in Python.

3. Pandas: this library provides the DataFrame for efficient storage and manipulation of labeled/columnar data in Python.

4. Matplotlib: this library provides capabilities for a flexible range of data visualizations in Python.

5. Scikit-Learn: this library provides efficient & clean Python implementations of the most important and established machine learning algorithms.

The PyData world is certainly much larger than these five packages, and is growing every day. With this in mind, I make every attempt through these pages to provide references to other interesting efforts, projects, and packages that are pushing the boundaries of what can be done in Python. Nevertheless, these five are currently fundamental to much of the work being done in the Python data science space, and I expect they will remain important even as the ecosystem continues growing around them.

Using Code Examples

Supplemental material (code examples, figures, etc.) is available for download at http://github.com/jakevdp/PythonDataScienceHandbook/. This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example:

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.

Installation Considerations

Installing Python and the suite of libraries that enable scientific computing is straightforward. This section will outline some of the considerations when setting up your computer.

Though there are various ways to install Python, the one I would suggest for use in data science is the Anaconda distribution, which works similarly whether you use Windows, Linux, or Mac OS X. The Anaconda distribution comes in two flavors:

Miniconda gives you the Python interpreter itself, along with a command-line tool called conda which operates as a cross-platform package manager geared toward Python packages, similar in spirit to the apt or yum tools that Linux users might be familiar with.

Anaconda includes both Python and conda, and additionally bundles a suite of other pre-installed packages geared toward scientific computing. Because of the size of this bundle, expect the installation to consume several gigabytes of disk space.

Any of the packages included with Anaconda can also be installed manually on top of Miniconda; for this reason I suggest starting with Miniconda.

To get started, download and install the Miniconda package–make sure to choose a version with Python 3–and then install the core packages used in this book:

[~]$ conda install numpy pandas scikit-learn matplotlib seaborn jupyter

Throughout the text, we will also make use of other more specialized tools in Python's scientific ecosystem; installation is usually as easy as typing conda install packagename. For more information on conda, including information about creating and using conda environments (which I would highly recommend), refer to conda's online documentation.
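
After installing, a quick check that the core packages can be located may save some confusion later. The sketch below uses only the standard library; the helper name `find_missing` is hypothetical, and the import names are assumptions (scikit-learn is imported as `sklearn`, and `jupyter_core` is assumed to be present whenever Jupyter is installed).

```python
import importlib.util

# Import names assumed for the packages installed above.
CORE_PACKAGES = ["numpy", "pandas", "sklearn", "matplotlib", "seaborn", "jupyter_core"]


def find_missing(names=CORE_PACKAGES):
    """Return the subset of top-level `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All core packages were found.")
```

Because `find_spec` only locates a package without importing it, the check stays fast even for large libraries.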

IPython: Beyond Normal Python
There are many options for development environments for Python, and I'm often asked which one I use in my own work. My answer sometimes surprises
people: my preferred environment is IPython plus a text editor (in my case, Emacs or Atom depending on my mood). IPython (short for Interactive Python)
was started in 2001 by Fernando Perez as an enhanced Python interpreter, and has since grown into a project aiming to provide, in Perez's words, "Tools
for the entire life cycle of research computing." If Python is the engine of our data science task, you might think of IPython as the interactive control panel.
As well as being a useful interactive interface to Python, IPython also provides a number of useful syntactic additions to the language; we'll cover the most
useful of these additions here. In addition, IPython is closely tied with the Jupyter project, which provides a browser-based notebook that is useful for
development, collaboration, sharing, and even publication of data science results. The IPython notebook is actually a special case of the broader Jupyter
notebook structure, which encompasses notebooks for Julia, R, and other programming languages. As an example of the usefulness of the notebook
format, look no further than the page you are reading: the entire manuscript for this book was composed as a set of IPython notebooks.
IPython is about using Python effectively for interactive scientific and data-intensive computing. This chapter will start by stepping through some of the
IPython features that are useful to the practice of data science, focusing especially on the syntax it offers beyond the standard features of Python. Next, we
will go into a bit more depth on some of the more useful "magic commands" that can speed up common tasks in creating and using data science code.
Finally, we will touch on some of the features of the notebook that make it useful in understanding data and sharing results.
Shell or Notebook?
There are two primary means of using IPython that we'll discuss in this chapter: the IPython shell and the IPython notebook. The bulk of the material in this
chapter is relevant to both, and the examples will switch between them depending on what is most convenient. In the few sections that are relevant to just
one or the other, we will explicitly state that fact. Before we start, some words on how to launch the IPython shell and IPython notebook.
Launching the IPython Shell
This chapter, like most of this book, is not designed to be absorbed passively. I recommend that as you read through it, you follow along and experiment
with the tools and syntax we cover: the muscle-memory you build through doing this will be far more useful than the simple act of reading about it. Start by
launching the IPython interpreter by typing ipython on the command line; alternatively, if you've installed a distribution like Anaconda or EPD, there may be
a launcher specific to your system (we'll discuss this more fully in Help and Documentation in IPython).
Once you do this, you should see a prompt like the following:
IPython 4.0.1 -- An enhanced Interactive Python. 
?         -> Introduction and overview of IPython's features. 
%quickref -> Quick reference. 
help      -> Python's own help system. 
object?   -> Details about 'object', use 'object??' for extra details. 
In [1]:
With that, you're ready to follow along.
Launching the Jupyter Notebook
The Jupyter notebook is a browser-based graphical interface to the IPython shell, and builds on it a rich set of dynamic display capabilities. As well as
executing Python/IPython statements, the notebook allows the user to include formatted text, static and dynamic visualizations, mathematical equations,
JavaScript widgets, and much more. Furthermore, these documents can be saved in a way that lets other people open them and execute the code on their
own systems.
Though the IPython notebook is viewed and edited through your web browser window, it must connect to a running Python process in order to execute
code. This process (known as a "kernel") can be started by running the following command in your system shell:
$ jupyter notebook
This command will launch a local web server that will be visible to your browser. It immediately spits out a log showing what it is doing; that log will look
something like this:
$ jupyter notebook 
[NotebookApp] Serving notebooks from local directory: /Users/jakevdp/PythonDataScienceHandbook 
[NotebookApp] 0 active kernels  
[NotebookApp] The IPython Notebook is running at: http://localhost:8888/ 
[NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
Upon issuing the command, your default browser should automatically open and navigate to the listed local URL; the exact address will depend on your
system. If the browser does not open automatically, you can open a window and manually open this address (http://localhost:8888/ in this example).
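
If the page does not load, it can help to confirm that something is actually listening at the address shown in the log. The following is a minimal sketch using the standard `socket` module; the helper name `server_is_listening` is hypothetical, and the default host and port simply mirror the example log above.

```python
import socket


def server_is_listening(host="localhost", port=8888, timeout=1.0):
    """Return True if something accepts TCP connections at host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved hosts.
        return False


if __name__ == "__main__":
    if server_is_listening():
        print("A server is listening at http://localhost:8888/")
    else:
        print("Nothing is listening on port 8888; check the log for the actual address.")
```

This only tests that the port accepts connections; it does not verify that the process behind it is a notebook server.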
Help and Documentation in IPython

If you read no other section in this chapter, read this one: I find the tools discussed here to be the most transformative contributions of IPython to my daily workflow.

When a technologically-minded person is asked to help a friend, family member, or colleague with a computer problem, most of the time it's less a matter of knowing the answer than of knowing how to quickly find an unknown answer. In data science it's the same: searchable web resources such as online documentation, mailing-list threads, and StackOverflow answers contain a wealth of information, even (especially?) if it is a topic you've found yourself searching before. Being an effective practitioner of data science is less about memorizing the tool or command you should use for every possible situation, and more about learning to effectively find the information you don't know, whether through a web search engine or another means.

One of the most useful functions of IPython/Jupyter is to shorten the gap between the user and the type of documentation and search that will help them do their work effectively. While web searches still play a role in answering complicated questions, an amazing amount of information can be found through IPython alone. Some examples of the questions IPython can help answer in a few keystrokes:

How do I call this function? What arguments and options does it have?

What does the source code of this Python object look like?

What is in this package I imported? What attributes or methods does this object have?

Here we'll discuss IPython's tools to quickly access this information, namely the ? character to explore documentation, the ?? characters to explore source code, and the Tab key for auto-completion.
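
Outside IPython, plain Python offers rough counterparts to these tools. As a sketch, the built-in `dir` function lists the attribute names that Tab completion would draw from; the helper name `complete` is hypothetical and only approximates what IPython does.

```python
def complete(obj, prefix=""):
    """List public attribute names of `obj` that start with `prefix`."""
    return [name for name in dir(obj)
            if name.startswith(prefix) and not name.startswith("_")]


L = [1, 2, 3]
print(complete(L, "in"))   # candidates a Tab press after "L.in" might offer
print(complete(L, "c"))    # candidates after "L.c"
```

Names beginning with an underscore are filtered out here to keep the listing short.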

Accessing Documentation with ?

The Python language and its data science ecosystem are built with the user in mind, and one big part of that is access to documentation. Every Python object contains the reference to a string, known as a docstring, which in most cases will contain a concise summary of the object and how to use it. Python has a built-in help() function that can access this information and print the results. For example, to see the documentation of the built-in len function, you can do the following:

In [1]: help(len)
Help on built-in function len in module builtins:

len(...)
    len(object) -> integer

    Return the number of items of a sequence or mapping.

Depending on your interpreter, this information may be displayed as inline text, or in some separate pop-up window.
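
The text that `help()` prints comes from the object's `__doc__` attribute, so the same information can also be inspected programmatically. A minimal sketch follows; the exact wording of the docstring varies between Python versions, so no particular output is assumed.

```python
import pydoc

# The docstring is stored on the object itself.
doc = len.__doc__
print(doc)

# pydoc.render_doc builds the text that help() would display, as a string.
text = pydoc.render_doc(len, renderer=pydoc.plaintext)
print(text)
```

Working with the string form is convenient when the documentation needs to be searched or logged instead of read interactively.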

Because finding help on an object is so common and useful, IPython introduces the ? character as a shorthand for accessing this documentation and other relevant information:

In [2]: len?
Type:        builtin_function_or_method
String form: <built-in function len>
Namespace:   Python builtin
Docstring:
len(object) -> integer

Return the number of items of a sequence or mapping.

This notation works for just about anything, including object methods:

In [3]: L = [1, 2, 3]
In [4]: L.insert?
Type:        builtin_function_or_method
String form: <built-in method insert of list object at 0x1024b8ea8>
Docstring:   L.insert(index, object) -- insert object before index

or even objects themselves, with the documentation from their type:

In [5]: L?
Type:        list
String form: [1, 2, 3]
Length:      3
Docstring:
list() -> new empty list
list(iterable) -> new list initialized from iterable's items
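
The ? syntax is specific to IPython and is not valid in plain Python. As a rough approximation of the fields shown above, the following sketch gathers similar information with built-ins; the helper name `summarize` is hypothetical and this is not how IPython itself is implemented.

```python
def summarize(obj):
    """Collect fields similar to those shown by IPython's ? for `obj`."""
    info = {
        "Type": type(obj).__name__,
        "String form": str(obj),
        "Docstring": getattr(obj, "__doc__", None),
    }
    # Only sized objects (lists, dicts, strings, ...) report a length.
    if hasattr(obj, "__len__"):
        info["Length"] = len(obj)
    return info


L = [1, 2, 3]
for field, value in summarize(L).items():
    print(field + ":", value)
```

The same helper can be applied to a method such as `L.insert`, in which case no length is reported.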

Importantly, this will even work for functions or other objects you create yourself! Here we'll define a small function with a docstring:
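
A minimal sketch of such a function follows; the name `square` and its body are illustrative assumptions. In plain Python the docstring is available through `__doc__`, which is the text that IPython's ? would display for the function.

```python
def square(a):
    """Return the square of a."""
    return a ** 2


# In IPython, typing `square?` would show this docstring.
print(square.__doc__)
print(square(4))
```

Placing a string literal as the first statement of the function body is all that is needed for Python to record it as the docstring.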
