Installing Apache Spark
Starting with Apache Spark can be intimidating. However, after you have gone
through the process of installing it on your local machine, in hindsight, it will not
look so scary.
In this chapter, we will guide you through the requirements of Spark 2.0, the
installation process of the environment itself, and through setting up the Jupyter
notebook so that it is convenient and easy to write your code.
The topics covered are:
• Requirements
• Installing Spark
• Jupyter on PySpark
• Installing in the cloud
Requirements
Before we begin, let's make sure your computer is ready for Spark installation. What
you need is Java 7+ and Python 2.6+/3.4+. Spark also requires R 3.1+ if you want to
run R code. For the Scala API, Spark 2.0.0 Preview uses Scala 2.11. You will need to
use a compatible Scala version (2.11.x).
Spark installs Scala during the installation process, so we just need to make sure that
Java and Python are present on your machine.
Throughout this book we will be using Mac OS X El Capitan, Ubuntu as
our Linux flavor, and Windows 10; all the examples presented should run
on any of these machines.
Checking for presence of Java and Python
On a Unix-like machine (Mac or Linux) you need to open Terminal (or Console),
and on Windows you need to open Command Line (navigate to Start | Run | cmd
and press the Enter key).
Throughout this book we will refer to Terminal, Console, or Command
Line as CLI, which stands for a Command Line Interface.
Once the window opens, type the following:
java -version
If the command prints out something like this:
java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)
It means you have Java present on your machine. In the preceding case, we are
running Java 8, so we meet the first criterion. If, however, executing the preceding
command returns an error on Mac or Linux, it might look similar to the following:
-bash: java: command not found
Or, on Windows, it might resemble the following error:
'java' is not recognized as an internal or external command, operable
program or batch file
This means that either Java is not installed on your machine, or it is not present in the
PATH.
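The same check can be scripted. The following is a minimal sketch in Python 3, not part of Spark itself; the helper names (parse_java_version, java_major, installed_java_version) are our own and purely illustrative. It assumes the report printed by java -version contains a quoted version string, as in the preceding output.

```python
import re
import shutil
import subprocess


def parse_java_version(output):
    """Extract the quoted version string from `java -version` output."""
    match = re.search(r'version "([^"]+)"', output)
    return match.group(1) if match else None


def java_major(version):
    """Map a version string to its major release number (1.8.0_25 -> 8)."""
    numbers = re.findall(r"\d+", version)
    if not numbers:
        return None
    # Releases up to Java 8 are reported as 1.x, later ones as x.y.z.
    if numbers[0] == "1" and len(numbers) > 1:
        return int(numbers[1])
    return int(numbers[0])


def installed_java_version():
    """Return the Java version found via the PATH, or None if unavailable."""
    if shutil.which("java") is None:
        return None
    # `java -version` writes its report to stderr, so merge both streams.
    result = subprocess.run(
        ["java", "-version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return parse_java_version(result.stdout)


version = installed_java_version()
if version is None:
    print("No usable java found on the PATH")
else:
    print("Java", version, "-> major release", java_major(version))
```

A major release of 7 or higher satisfies the Java requirement stated earlier.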
PATH is an environment variable that a CLI checks for binaries. For example, if you
type the java command and try to execute it in the CLI, your system
will scan the folders listed in the PATH, searching for the java executable and, if found,
will execute it; if the binary cannot be found, the system will produce an error.
To learn more about what the PATH variable does, go to
http://www.linfo.org/path_env_var.html.
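To make the lookup concrete, here is a simplified sketch in Python of how a CLI resolves a command name against the PATH. The function name find_on_path is hypothetical, and the sketch deliberately ignores Windows details such as file extensions (.exe, .bat); in practice, the standard library function shutil.which performs this search for you.

```python
import os


def find_on_path(name, path_value=None):
    """Return the first executable called `name` in the PATH folders, or None."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    # Folders are separated by ':' on Unix-like systems and ';' on Windows.
    for folder in path_value.split(os.pathsep):
        if not folder:
            continue
        candidate = os.path.join(folder, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


# Prints the full path of the java binary, or None if it is not on the PATH.
print(find_on_path("java"))
```

Folders are scanned in the order in which they are listed, so the first match wins.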
Appendix A
If you are sure you have Java installed (or simply do not know) you can try locating
Java binaries. On Linux you can try executing the following command:
locate java
Check the /usr/lib/jvm location for a jvm folder.
Refer to your flavor of Linux documentation to find an equivalent method
or an exact location of the jvm folder.
On Mac, check the /Library/Java/JavaVirtualMachines/ location for a jdk or
jre folder; on Windows you can navigate to C:\Program Files (x86)\ and check
for the Java folder. If the preceding efforts fail, you will have to install Java
(see the following section, Installing Java).
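If you prefer not to browse these folders by hand, the search can be sketched in Python as follows. The function name list_java_installs is our own; the candidate folders are the ones mentioned above, and your system may keep Java somewhere else entirely.

```python
import os

# Default locations mentioned in the text; other systems may differ.
CANDIDATE_DIRS = [
    "/usr/lib/jvm",                       # Linux
    "/Library/Java/JavaVirtualMachines",  # Mac
    r"C:\Program Files (x86)\Java",       # Windows
]


def list_java_installs(candidates=CANDIDATE_DIRS):
    """Map each existing candidate folder to the subfolders it contains."""
    found = {}
    for directory in candidates:
        if os.path.isdir(directory):
            found[directory] = sorted(
                entry for entry in os.listdir(directory)
                if os.path.isdir(os.path.join(directory, entry))
            )
    return found


for directory, installs in list_java_installs().items():
    print(directory, "->", installs)
```

An empty result does not prove that Java is missing; it only means that none of the listed folders exists on your machine.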
In a similar fashion to how we checked for Java, let's now check whether Python is
present on your machine. In your CLI, type the following command:
python --version
If you have Python installed, the Terminal should print out its version. In our case,
this is:
Python 3.5.1 :: Anaconda 2.4.1 (x86_64)
If, however, you do not have Python, you will have to install a compatible version on
your machine (see the following section, Installing Python).
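You can also ask the interpreter itself whether it meets the requirement stated earlier (Python 2.6+ or 3.4+). The following is a small sketch; the function name python_meets_requirement is hypothetical, and the thresholds are simply those quoted in the Requirements section.

```python
import sys


def python_meets_requirement(version):
    """Check a (major, minor, ...) tuple against Python 2.6+ / 3.4+."""
    major, minor = version[0], version[1]
    if major == 2:
        return minor >= 6
    if major == 3:
        return minor >= 4
    # Any other major version is outside the stated requirement.
    return False


print(sys.version.split()[0], "ok?", python_meets_requirement(sys.version_info))
```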
Installing Java
It goes beyond the scope of this book to provide detailed instructions on how you
should install Java. However, it is a fairly straightforward process and the high-level
steps you need to undertake are:
1. Go to https://www.java.com/en/download/mac_download.jsp and
download the version appropriate for your system.
2. Once downloaded, follow the instructions to install on your machine.
That is effectively all you have to do.
If you run into trouble, check
https://www.java.com/en/download/help/mac_install.xml
for help on how to install Java on Mac.
Check https://www.java.com/en/download/help/ie_online_install.xml for
steps outlining the installation process on Windows.
Finally, check
https://www.java.com/en/download/help/linux_install.xml
for Linux installation instructions.
Installing Python
Our preferred flavor of Python is Anaconda (provided by Continuum) and we
strongly recommend this distribution. The package comes with all the necessary and
most commonly used modules included (such as pandas, NumPy, SciPy, or scikit-learn,
among many others). If a module you want to use is not present, you can quickly
install it using the conda package management system.
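To see which modules are already available before reaching for conda, you can run a short check such as the one below. This is a sketch for Python 3.4 or later; the helper name missing_modules is our own, and the list of names is only an example. Note that scikit-learn is imported under the name sklearn.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


wanted = ["pandas", "numpy", "scipy", "sklearn"]
for name in missing_modules(wanted):
    print("missing:", name)
```

Anything reported as missing can then be installed from your CLI with conda install followed by the package name.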
The Anaconda environment can be downloaded from
https://www.continuum.io/downloads. Check the correct version for your
operating system and follow the
instructions presented to install the distribution.
Note that, for Linux, we assume you install Anaconda in
your HOME directory.
Once downloaded, follow the instructions to install the environment appropriate for
your operating system:
• For Windows, see
https://docs.continuum.io/anaconda/install#anaconda-for-windows-install
• For Linux, see
https://docs.continuum.io/anaconda/install#linux-install
• For Mac, see
https://docs.continuum.io/anaconda/install#anaconda-for-os-x-graphical-install
Once both of the environments are installed, repeat the steps from the preceding
section, Checking for presence of Java and Python. Everything should work now.
Checking and updating PATH
If, however, your CLI still produces errors, you will need to update the PATH. This is
necessary for the CLI to find the right binaries to run Spark.
Setting the PATH environment variable differs between Unix-like operating systems
and Windows. In this section, we will walk you through how to set it properly in
either of these systems.
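One visible difference between the two families of systems is the character that separates the folders listed in the PATH: a colon on Unix-like systems and a semicolon on Windows. The sketch below illustrates this; the function name prepend_to_path and the folder /opt/java/bin are hypothetical, and the code only builds a new PATH string. It is not a substitute for the system settings that make such a change permanent.

```python
import os


def prepend_to_path(directory, path_value, separator=os.pathsep):
    """Return path_value with directory placed first, unless already listed."""
    entries = [entry for entry in path_value.split(separator) if entry]
    if directory in entries:
        return path_value  # already present, leave the PATH untouched
    return separator.join([directory] + entries)


# os.pathsep is ':' on Unix-like systems and ';' on Windows.
print("PATH separator on this system:", os.pathsep)
# /opt/java/bin is a hypothetical folder used only for illustration.
print(prepend_to_path("/opt/java/bin", "/usr/bin:/bin", ":"))
```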