Spark Apache Log Analysis and Stream Data Processing Tutorial
Uploaded 2015-02-06 22:28:32
Databricks Spark Reference Applications: Spark log analysis and stream data processing, with Java 8 code
Table of Contents
1. Introduction
2. Log Analysis with Spark
   i. Section 1: Introduction to Apache Spark
      i. First Log Analyzer in Spark
      ii. Spark SQL
      iii. Spark Streaming
         i. Windowed Calculations: window()
         ii. Cumulative Calculations: updateStateByKey()
         iii. Reusing Code from Batching: transform()
   ii. Section 2: Importing Data
      i. Batch Import
         i. Importing from Files
            i. S3
            ii. HDFS
         ii. Importing from Databases
      ii. Streaming Import
         i. Built In Methods for Streaming Import
         ii. Kafka
   iii. Section 3: Exporting Data
      i. Small Datasets
      ii. Large Datasets
         i. Save the RDD to Files
         ii. Save the RDD to a Database
   iv. Section 4: Log Analyzer Application
3. Twitter Streaming Language Classifier
   i. Collect a Dataset of Tweets
   ii. Examine the Tweets and Train a Model
      i. Examine with Spark SQL
      ii. Train with Spark MLLib
      iii. Run Examine And Train
   iii. Apply the Model in Real-time
Databricks Reference Apps
At Databricks, we are developing a set of reference applications that demonstrate how to use Apache Spark. This
book/repo contains the reference applications.
View the code in the Github Repo here: https://github.com/databricks/reference-apps
Read the documentation here: http://databricks.gitbooks.io/databricks-spark-reference-applications/
Submit feedback or issues here: https://github.com/databricks/reference-apps/issues
The reference applications will appeal to those who want to learn Spark and who learn better by example. Browse the
applications, see which features of the reference applications resemble the features you want to build, and adapt
the code samples to your needs. This is also meant to be a practical guide to using Spark in your systems, so the
applications mention other technologies that are compatible with Spark, such as the file systems you can use to store
massive data sets.
Log Analysis Application - The log analysis reference application contains a series of tutorials for learning Spark by
example, as well as a final application that can be used to monitor Apache access logs. The examples use Spark in
batch mode and also cover Spark SQL and Spark Streaming.
Twitter Streaming Language Classifier - This application demonstrates how to fetch and train a language classifier
for Tweets using Spark MLLib. Spark Streaming is then used to call the trained classifier and filter out live tweets that
match a specified cluster. To build this example, go into the twitter_classifier/scala directory and follow the
directions in the README.
This reference app is subject to the license terms covered here.
Log Analysis with Spark
This project demonstrates how easy it is to do log analysis with Apache Spark.
Log analysis is an ideal use case for Spark. Logs are a very large, common data source and contain a rich set of
information. Spark allows you to store your logs cheaply in files on disk, while still providing a quick and simple way
to process them. We hope this project will show you how to use Apache Spark on your organization's production logs
and fully harness the power of that data. Log data can be used for monitoring your servers, improving business and
customer intelligence, building recommendation systems, preventing fraud, and much more.
How to use this project
This project is broken up into sections with bite-sized examples that demonstrate new Spark functionality for log
processing. Each example covers just one new topic at a time, which makes the examples easy to run and learn from.
At the end, we assemble some of these examples to form a sample log analysis application.
Section 1: Introduction to Apache Spark
The Apache Spark library is introduced, as well as Spark SQL and Spark Streaming. By the end of this chapter, a reader
will know how to call transformations and actions and work with RDDs and DStreams.
Section 2: Importing Data
This section includes examples that illustrate how to get data into Spark and starts covering concepts of distributed
computing. The examples are all suitable for datasets that are too large to be processed on one machine.
Section 3: Exporting Data
This section includes examples that illustrate how to get data out of Spark. Again, concepts of a distributed computing
environment are reinforced, and the examples are suitable for large datasets.
Section 4: Logs Analyzer Application
This section puts together some of the code from the other chapters to form a sample log analysis application.
More to come...
While that's all for now, there's definitely more to come over time.
Section 1: Introduction to Apache Spark
In this section, we demonstrate how simple it is to analyze web logs using Apache Spark. We'll show how to load a
Resilient Distributed Dataset (RDD) of access log lines and use Spark transformations and actions to compute some
statistics for web server monitoring. In the process, we'll introduce the Spark SQL and Spark Streaming libraries.
In this explanation, the code snippets are in Java 8. However, there is also sample code in Java 6, Scala, and Python
included in this directory. Those folders contain READMEs with instructions on how to build and run the examples,
along with the necessary build files with all the required dependencies.
This chapter covers the following topics:
1. First Log Analyzer in Spark - This is a first standalone Spark log analysis application.
2. Spark SQL - This example does the same thing as the above example, but uses SQL syntax instead of Spark
transformations and actions.
3. Spark Streaming - This example covers how to calculate log statistics using the streaming library.
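As a rough illustration of the kind of statistics this section describes, the sketch below parses access log lines and derives content-size statistics and response-code counts. It is plain Java 8 with no Spark dependency: java.util.stream operations stand in for RDD transformations (map) and actions (counts and reductions), so it shows only the shape of the computation, not the Spark API. The class name AccessLogSketch, the regular expression, and the sample log lines are assumptions made for illustration and are not taken from the reference-apps repository.

```java
import java.util.Arrays;
import java.util.List;
import java.util.LongSummaryStatistics;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical helper for illustration; not part of the reference apps. */
final class AccessLogSketch {
    // Assumed layout (Common Log Format):
    // host ident user [timestamp] "method path protocol" status bytes
    private static final Pattern LINE = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"(\\S+) (\\S+) (\\S+)\" (\\d{3}) (\\d+|-)$");

    /** The few fields of one parsed log line that the statistics need. */
    static final class Entry {
        final String host;
        final String endpoint;
        final int status;
        final long bytes;

        Entry(String host, String endpoint, int status, long bytes) {
            this.host = host;
            this.endpoint = endpoint;
            this.status = status;
            this.bytes = bytes;
        }
    }

    /** Parses one line; throws if the line does not match the assumed layout. */
    static Entry parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Cannot parse log line: " + line);
        }
        // A "-" in the size field means no body was sent; treat it as 0 bytes.
        long bytes = "-".equals(m.group(9)) ? 0L : Long.parseLong(m.group(9));
        return new Entry(m.group(1), m.group(6), Integer.parseInt(m.group(8)), bytes);
    }

    /** Map each line to its content size, then aggregate count/min/max/sum. */
    static LongSummaryStatistics contentSizeStats(List<String> lines) {
        return lines.stream()
            .map(AccessLogSketch::parse)
            .mapToLong(e -> e.bytes)
            .summaryStatistics();
    }

    /** Count how many requests returned each HTTP status code. */
    static Map<Integer, Long> countByStatus(List<String> lines) {
        return lines.stream()
            .map(AccessLogSketch::parse)
            .collect(Collectors.groupingBy(e -> e.status, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Made-up sample lines, used only to exercise the sketch.
        List<String> lines = Arrays.asList(
            "127.0.0.1 - - [21/Jul/2014:09:00:01 -0800] \"GET /index.html HTTP/1.1\" 200 1024",
            "127.0.0.1 - - [21/Jul/2014:09:00:02 -0800] \"GET /missing HTTP/1.1\" 404 512",
            "10.0.0.2 - - [21/Jul/2014:09:00:03 -0800] \"GET /index.html HTTP/1.1\" 200 2048");

        LongSummaryStatistics sizes = contentSizeStats(lines);
        System.out.println("Content size avg=" + sizes.getAverage()
            + " min=" + sizes.getMin() + " max=" + sizes.getMax());
        System.out.println("Requests per status code: " + countByStatus(lines));
    }
}
```

In a Spark version, the in-memory list would typically be replaced by an RDD of lines loaded from a log file, with the same parse-then-aggregate steps expressed as transformations and actions on that RDD.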