Contents
1: Getting Started with Apache Spark
Chapter 1: Getting Started with Apache Spark
Introduction
Leveraging Databricks Cloud
Deploying Spark using Amazon EMR
Installing Spark from binaries
Building the Spark source code with Maven
Launching Spark on Amazon EC2
Deploying Spark on a cluster in standalone mode
Deploying Spark on a cluster with Mesos
Deploying Spark on a cluster with YARN
Understanding SparkContext and SparkSession
Understanding Resilient Distributed Datasets (RDD)
2: Developing Applications with Spark
Chapter 2: Developing Applications with Spark
Introduction
Exploring the Spark shell
Developing a Spark application in Eclipse with Maven
Developing a Spark application in Eclipse with SBT
Developing a Spark application in IntelliJ IDEA with Maven
Developing a Spark application in IntelliJ IDEA with SBT
Developing applications using the Zeppelin notebook
Setting up Kerberos to do authentication
Enabling Kerberos authentication for Spark
3: Spark SQL
Chapter 3: Spark SQL
Understanding the evolution of schema awareness
Understanding the Catalyst optimizer
Inferring schema using case classes
Programmatically specifying the schema
Understanding the Parquet format
Loading and saving data using the JSON format
Loading and saving data from relational databases
Loading and saving data from an arbitrary source
Understanding joins
Analyzing nested structures
4: Working with External Data Sources
Chapter 4: Working with External Data Sources
Introduction
Loading data from the local filesystem
Loading data from HDFS
Loading data from Amazon S3
Loading data from Apache Cassandra
5: Spark Streaming
Chapter 5: Spark Streaming
Introduction
WordCount using Structured Streaming
Taking a closer look at Structured Streaming
Streaming Twitter data
Streaming using Kafka
Understanding streaming challenges
6: Getting Started with Machine Learning
Chapter 6: Getting Started with Machine Learning
Introduction
Creating vectors
Calculating correlation
Understanding feature engineering
Understanding Spark ML
Understanding hyperparameter tuning
7: Supervised Learning with MLlib - Regression
Chapter 7: Supervised Learning with MLlib - Regression
Introduction
Using linear regression
Understanding the cost function
Doing linear regression with lasso
Doing ridge regression
8: Supervised Learning with MLlib - Classification
Chapter 8: Supervised Learning with MLlib - Classification
Introduction
Doing classification using logistic regression
Doing binary classification using SVM
Doing classification using decision trees
Doing classification using random forest
Doing classification using gradient boosted trees
Doing classification with Naïve Bayes
9: Unsupervised Learning
Chapter 9: Unsupervised Learning
Introduction
Clustering using k-means
Dimensionality reduction with principal component analysis
Dimensionality reduction with singular value decomposition
10: Recommendations Using Collaborative Filtering
Chapter 10: Recommendations Using Collaborative Filtering
Introduction
Collaborative filtering using explicit feedback
Collaborative filtering using implicit feedback
11: Graph Processing Using GraphX and GraphFrames
Chapter 11: Graph Processing Using GraphX and GraphFrames
Introduction
Fundamental operations on graphs
Using PageRank
Finding connected components
Performing neighborhood aggregation
Understanding GraphFrames
12: Optimizations and Performance Tuning
Chapter 12: Optimizations and Performance Tuning
Optimizing memory
Leveraging speculation
Optimizing joins
Using compression to improve performance
Using serialization to improve performance
Optimizing the level of parallelism
Understanding project Tungsten
Chapter 1. Getting Started with Apache Spark
In this chapter, we will set up Spark and configure it. This chapter contains the
following recipes:
Leveraging Databricks Cloud
Deploying Spark using Amazon EMR
Installing Spark from binaries
Building the Spark source code with Maven
Launching Spark on Amazon EC2
Deploying Spark on a cluster in standalone mode
Deploying Spark on a cluster with Mesos
Deploying Spark on a cluster with YARN
Understanding SparkContext and SparkSession
Understanding Resilient Distributed Datasets (RDD)