Required points/C coins: 50 2019-07-26 12:59:30 44.50MB PDF

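I couldn't find a way to make this download free; this is the lowest price CSDN allows... scikit-learn is a Python machine learning module distributed under the BSD open-source license. The project was started by David Cournapeau in 2007 and is now maintained by community volunteers. scikit-learn's core functionality falls into six areas: classification, regression, clustering, dimensionality reduction, model selection, and data preprocessing; see the documentation on the official website for details. A concrete machine learning problem can usually be broken into three steps: data preparation and preprocessing, model selection and training, and model validation and parameter tuning. scikit-learn supports data in several formats, including the classic iris dataset and LibSVM-format data.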
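The three steps described above can be sketched in a few lines of Python. This is only an illustrative sketch: the choice of the iris dataset, a StandardScaler plus SVC pipeline, and the small grid of C values are assumptions made for the example, not something the description prescribes.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Step 1: data preparation and preprocessing.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Step 2: model selection and training (scaler + SVC picked for illustration).
pipeline = make_pipeline(StandardScaler(), SVC(gamma="scale"))

# Step 3: model validation and parameter tuning with a cross-validated grid search.
param_grid = {"svc__C": [0.1, 1.0, 10.0]}  # assumed, deliberately tiny grid
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)
print(best_params, round(test_accuracy, 3))
```

Putting the scaler inside the pipeline means it is refit on each cross-validation training fold, so the held-out fold does not leak into preprocessing. For LibSVM-format files, `sklearn.datasets.load_svmlight_file` returns a comparable `(X, y)` pair, with X as a sparse matrix.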
CONTENTS

1 Welcome to scikit-learn
1.1 Installing scikit-learn
1.2 Frequently Asked Questions
1.3 Support
1.4 Related Projects
1.5 About us
1.6 Who is using scikit-learn?
1.7 Release history
1.8 Version 0.21.2
1.9 Version 0.21.1
1.10 Version 0.21.0
1.11 Version 0.20.3
1.12 Version 0.20.2
1.13 Version 0.20.1
1.14 Version 0.20.0
1.15 Previous releases
1.16 Roadmap
1.17 Scikit-learn governance and decision-making

2 scikit-learn Tutorials
2.1 An introduction to machine learning with scikit-learn
2.2 A tutorial on statistical-learning for scientific data processing
2.3 Working With Text Data
2.4 Choosing the right estimator
2.5 External Resources, Videos and Talks

3 User Guide
3.1 Supervised learning
3.2 Unsupervised learning
3.3 Model selection and evaluation
3.4 Inspection
3.5 Dataset transformations
3.6 Dataset loading utilities
3.7 Computing with scikit-learn

4 Glossary of Common Terms and API Elements
4.1 General Concepts
4.2 Class APIs and Estimator Types
4.3 Target Types
4.4 Methods
4.5 Parameters
4.6 Attributes
4.7 Data and sample properties

5 Examples
5.1 Miscellaneous examples
5.2 Examples based on real world datasets
5.3 Biclustering
5.4 Calibration
5.5 Classification
5.6 Clustering
5.7 Pipelines and composite estimators
5.8 Covariance estimation
5.9 Cross decomposition
5.10 Dataset examples
5.11 Decomposition
5.12 Ensemble methods
5.13 Tutorial exercises
5.14 Feature Selection
5.15 Gaussian Process for Machine Learning
5.16 Missing Value Imputation
5.17 Inspection
5.18 Generalized Linear Models
5.19 Manifold learning
5.20 Gaussian Mixture Models
5.21 Model Selection
5.22 Multioutput methods
5.23 Nearest Neighbors
5.24 Neural Networks
5.25 Preprocessing
5.26 Semi Supervised Classification
5.27 Support Vector Machines
5.28 Working with text documents
5.29 Decision Trees

6 API Reference
6.1 sklearn.base: Base classes and utility functions
6.2 sklearn.calibration: Probability Calibration
6.3 sklearn.cluster: Clustering
6.4 sklearn.cluster.bicluster: Biclustering
6.5 sklearn.compose: Composite Estimators
6.6 sklearn.covariance: Covariance Estimators
6.7 sklearn.cross_decomposition: Cross decomposition
6.8 sklearn.datasets: Datasets
6.9 sklearn.decomposition: Matrix Decomposition
6.10 sklearn.discriminant_analysis: Discriminant Analysis
6.11 sklearn.dummy: Dummy estimators
6.12 sklearn.ensemble: Ensemble Methods
6.13 sklearn.exceptions: Exceptions and warnings
6.14 sklearn.experimental: Experimental
6.15 sklearn.feature_extraction: Feature Extraction
6.16 sklearn.feature_selection: Feature Selection
6.17 sklearn.gaussian_process: Gaussian Processes
6.18 sklearn.isotonic: Isotonic regression
6.19 sklearn.impute: Impute
6.20 sklearn.kernel_approximation: Kernel Approximation
6.21 sklearn.kernel_ridge: Kernel Ridge Regression
6.22 sklearn.linear_model: Generalized Linear Models
6.23 sklearn.manifold: Manifold Learning
6.24 sklearn.metrics: Metrics
6.25 sklearn.mixture: Gaussian Mixture Models
6.26 sklearn.model_selection: Model Selection
6.27 sklearn.multiclass: Multiclass and multilabel classification
6.28 sklearn.multioutput: Multioutput regression and classification
6.29 sklearn.naive_bayes: Naive Bayes
6.30 sklearn.neighbors: Nearest Neighbors
6.31 sklearn.neural_network: Neural network models
6.32 sklearn.pipeline: Pipeline
6.33 sklearn.inspection
6.34 sklearn.preprocessing: Preprocessing and Normalization
6.35 sklearn.random_projection: Random projection
6.36 sklearn.semi_supervised: Semi-Supervised Learning
6.37 sklearn.svm: Support Vector Machines
6.38 sklearn.tree: Decision Trees
6.39 sklearn.utils: Utilities
6.40 Recently deprecated

7 Developer's Guide
7.1 Contributing
7.2 Developers' Tips and Tricks
7.3 Utilities for Developers
7.4 How to optimize for speed
7.5 Advanced installation instructions
7.6 Maintainer / core-developer information

Bibliography
Index

CHAPTER ONE
WELCOME TO SCIKIT-LEARN

1.1 Installing scikit-learn

Note: If you wish to contribute to the project, it's recommended you install the latest development version.

1.1.1 Installing the latest release

Scikit-learn requires:
● Python (>= 3.5)
● NumPy (>= 1.11.0)
● SciPy (>= 0.17.0)
● joblib (>= 0.11)

Scikit-learn plotting capabilities (i.e., functions whose names start with "plot_") require Matplotlib (>= 1.5.1). Some of the scikit-learn examples might require one or more extra dependencies: scikit-image (>= 0.12.3), pandas (>= 0.18.0).

Warning: Scikit-learn 0.20 was the last version to support Python 2.7 and Python 3.4. Scikit-learn now requires Python 3.5 or newer.

If you already have a working installation of NumPy and SciPy, the easiest way to install scikit-learn is using pip

    pip install -U scikit-learn

or conda

    conda install scikit-learn

If you have not installed NumPy or SciPy yet, you can also install these using conda or pip. When using pip, please ensure that binary wheels are used, and that NumPy and SciPy are not recompiled from source, which can happen when using particular configurations of operating system and hardware (such as Linux on a Raspberry Pi). Building NumPy and SciPy from source can be complex (especially on Windows) and requires careful configuration to ensure that they link against an optimized implementation of linear algebra routines.
Instead, use a third-party distribution as described below.

scikit-learn user guide, Release 0.21.2

If you must install scikit-learn and its dependencies with pip, you can install it as scikit-learn[alldeps]. The most common use case for this is in a requirements.txt file used as part of an automated build process for a PaaS application or a Docker image. This option is not intended for manual installation from the command line.

Note: For installing on PyPy, PyPy3-v5.10+, NumPy 1.14.0+, and SciPy 1.1.0+ are required.

For installation instructions for more distributions, see other distributions. For compiling the development version from source, or building the package if no distribution is available for your architecture, see the Advanced installation instructions.

1.1.2 Third-party Distributions

If you don't already have a Python installation with NumPy and SciPy, we recommend installing either via your package manager or via a Python bundle. These come with NumPy, SciPy, scikit-learn, matplotlib and many other helpful scientific and data processing libraries.

Available options are:

Canopy and Anaconda for all supported platforms
Canopy and Anaconda both ship a recent version of scikit-learn, in addition to a large set of scientific Python libraries for Windows, Mac OSX and Linux.
Anaconda offers scikit-learn as part of its free distribution.

Warning: To upgrade or uninstall scikit-learn installed with Anaconda or conda you should not use the pip command.
Instead:

To upgrade scikit-learn:

    conda update scikit-learn

To uninstall scikit-learn:

    conda remove scikit-learn

Upgrading with pip install -U scikit-learn or uninstalling with pip uninstall scikit-learn is likely to fail to properly remove files installed by the conda command. pip upgrade and uninstall operations only work on packages installed via pip install.

WinPython for Windows
The WinPython project distributes scikit-learn as an additional plugin.

1.2 Frequently Asked Questions

Here we try to give some answers to questions that regularly pop up on the mailing list.

1.2.1 What is the project name (a lot of people get it wrong)?

scikit-learn, but not scikit or SciKit nor sci-kit learn. Also not scikits.learn or scikits-learn, which were previously used.

1.2.2 How do you pronounce the project name?

sy-kit learn. sci stands for science.

1.2.3 Why scikit?

There are multiple scikits, which are scientific toolboxes built around SciPy. You can find a list at https://scikits.appspot.com/scikits. Apart from scikit-learn, another popular one is scikit-image.

1.2.4 How can I contribute to scikit-learn?

See Contributing. Before wanting to add a new algorithm, which is usually a major and lengthy undertaking, it is recommended to start with known issues. Please do not contact the contributors of scikit-learn directly regarding contributing to scikit-learn.

1.2.5 What's the best way to get help on scikit-learn usage?

For general machine learning questions, please use Cross Validated with the [machine-learning] tag.

For scikit-learn usage questions, please use Stack Overflow with the [scikit-learn] and [python] tags. You can alternatively use the mailing list.

Please make sure to include a minimal reproduction code snippet (ideally shorter than 10 lines) that highlights your problem on a toy dataset (for instance from sklearn.datasets or randomly generated with functions of numpy.random with a fixed random seed).
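A reproduction snippet in the spirit of this advice might look like the sketch below: it is under 10 lines, includes its imports, and builds a toy dataset from numpy.random with a fixed seed. The estimator (LogisticRegression) and the way the target is derived are arbitrary stand-ins for whatever code actually shows the problem.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)   # fixed seed so everyone gets the same data
X = rng.rand(20, 3)              # toy data: 20 samples, 3 features
y = (X[:, 0] > 0.5).astype(int)  # binary target derived from the first feature

clf = LogisticRegression(solver="lbfgs").fit(X, y)
predictions = clf.predict(X[:5])
print(predictions)
```

Anyone can paste this into a Python shell with scikit-learn installed and see the same behaviour, which is the point of fixing the seed.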
Please remove any line of code that is not necessary to reproduce your problem.

The problem should be reproducible by simply copy-pasting your code snippet in a Python shell with scikit-learn installed. Do not forget to include the import statements.

More guidance to write good reproduction code snippets can be found at

If your problem raises an exception that you do not understand (even after googling it), please make sure to include the full traceback that you obtain when running the reproduction script.

For bug reports or feature requests, please make use of the issue tracker on GitHub.

There is also a scikit-learn Gitter channel where some users and developers might be found.

Please do not email any authors directly to ask for assistance, report bugs, or for any other issue related to scikit-learn.

1.2.6 How should I save, export or deploy estimators for production?

See Model persistence.

1.2.7 How can I create a bunch object?

Don't make a bunch object! They are not part of the scikit-learn API. Bunch objects are just a way to package some numpy arrays. As a scikit-learn user you only ever need numpy arrays to feed your model with data.

For instance, to train a classifier, all you need is a 2D array X for the input variables and a 1D array y for the target variables. The array X holds the features as columns and samples as rows. The array y contains integer values to encode the class membership of each sample in X.

1.2.8 How can I load my own datasets into a format usable by scikit-learn?

Generally, scikit-learn works on any numeric data stored as numpy arrays or scipy sparse matrices. Other types that are convertible to numeric arrays such as pandas DataFrame are also acceptable.

For more information on loading your data files into these usable data structures, please refer to loading external datasets.

1.2.9 What are the inclusion criteria for new algorithms?

We only consider well-established algorithms for inclusion. A rule of thumb is at least 3 years since publication, 200+ citations and wide use and usefulness. A technique that provides a clear-cut improvement (e.g. an enhanced data structure or a more efficient approximation technique) on a widely-used method will also be considered for inclusion.

From the algorithms or techniques that meet the above criteria, only those which fit well within the current API of scikit-learn, that is a fit, predict/transform interface and ordinarily having input/output that is a numpy array or sparse matrix, are accepted.

The contributor should support the importance of the proposed addition with research papers and/or implementations in other similar packages, demonstrate its usefulness via common use-cases/applications and corroborate performance improvements, if any, with benchmarks and/or plots. It is expected that the proposed algorithm should outperform the methods that are already implemented in scikit-learn at least in some areas.

Inclusion of a new algorithm speeding up an existing model is easier if:
● it does not introduce new hyper-parameters (as it makes the library more future-proof),
● it is easy to document clearly when the contribution improves the speed and when it does not, for instance "when n_features >> n_samples",
● benchmarks clearly show a speed up.

Also note that your implementation need not be in scikit-learn to be used together with scikit-learn tools. You can implement your favorite algorithm in a scikit-learn compatible way, upload it to GitHub and let us know. We will be happy to list it under Related Projects.
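As an illustration of what "a scikit-learn compatible way" can mean in practice, the sketch below defines a toy classifier with a fit/predict interface that takes a 2D array X and a 1D array y. The class name NearestMeanClassifier and its attributes are hypothetical and chosen only for this example; the class is not part of scikit-learn, and a real contribution would also need input validation and tests.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score


class NearestMeanClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical toy classifier: predict the class whose mean is closest."""

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        # One mean vector per class, stacked as (n_classes, n_features).
        self.means_ = np.vstack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Squared Euclidean distance from every sample to every class mean.
        diffs = X[:, np.newaxis, :] - self.means_[np.newaxis, :, :]
        distances = (diffs ** 2).sum(axis=2)
        return self.classes_[distances.argmin(axis=1)]


X, y = load_iris(return_X_y=True)  # X is 2D (150, 4); y is 1D (150,)
scores = cross_val_score(NearestMeanClassifier(), X, y, cv=5)
print(scores.mean())
```

Because the class follows the fit/predict convention and inherits from BaseEstimator, existing tools such as cross_val_score can clone and evaluate it without the code living inside scikit-learn itself.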
If you already have a package on GitHub following the scikit-learn API, you may also be interested to look at scikit-learn-contrib.

1.2.10 Why are you so selective on what algorithms you include in scikit-learn?

Code is maintenance cost, and we need to balance the amount of code we have with the size of the team (and add to this the fact that complexity scales non linearly with the number of features). The package relies on core developers using their free time to fix bugs, maintain code and review contributions. Any algorithm that is added needs future attention by the developers, at which point the original author might long have lost interest. See also What are the inclusion criteria for new algorithms?. For a great read about long-term maintenance issues in open-source software, look at the Executive Summary of Roads and Bridges.

Preview (127 pages): scikit-learn User Guide, version 0.21.2