.. Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements. See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License. You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.

======================================
Flume 1.5.0 User Guide
======================================

Introduction
============

Overview
--------

Apache Flume is a distributed, reliable, and available system for efficiently
collecting, aggregating and moving large amounts of log data from many
different sources to a centralized data store.

The use of Apache Flume is not restricted to log data aggregation.
Since data sources are customizable, Flume can be used to transport massive quantities
of event data, including but not limited to network traffic data, social-media-generated data,
email messages and almost any other data source.

Apache Flume is a top-level project at the Apache Software Foundation.

There are currently two release code lines available, versions 0.9.x and 1.x.

Documentation for the 0.9.x track is available at
`the Flume 0.9.x User Guide <http://archive.cloudera.com/cdh/3/flume/UserGuide/>`_.

This documentation applies to the 1.5.x track.

New and existing users are encouraged to use the 1.x releases so as to
leverage the performance improvements and configuration flexibility available
in the latest architecture.

System Requirements
-------------------

#. Java Runtime Environment - Java 1.6 or later (Java 1.7 recommended)
#. Memory - Sufficient memory for configurations used by sources, channels or sinks
#. Disk Space - Sufficient disk space for configurations used by channels or sinks
#. Directory Permissions - Read/Write permissions for directories used by the agent

Architecture
------------

Data flow model
~~~~~~~~~~~~~~~

A Flume event is defined as a unit of data flow having a byte payload and an
optional set of string attributes. A Flume agent is a (JVM) process that hosts
the components through which events flow from an external source to the next
destination (hop).

.. figure:: images/UserGuide_image00.png
   :align: center
   :alt: Agent component diagram

A Flume source consumes events delivered to it by an external source like a web
server. The external source sends events to Flume in a format that is
recognized by the target Flume source. For example, an Avro Flume source can be
used to receive Avro events from Avro clients or from other Flume agents in the
flow that send events from an Avro sink. A similar flow can be defined using
a Thrift Flume source to receive events from a Thrift sink, a Flume
Thrift RPC client, or Thrift clients in any language whose bindings are generated
from the Flume Thrift protocol. When a Flume source receives an event, it
stores it into one or more channels. The channel is a passive store that keeps
the event until it is consumed by a Flume sink. The file channel is one example;
it is backed by the local filesystem. The sink removes the event
from the channel and puts it into an external repository like HDFS (via the Flume
HDFS sink) or forwards it to the Flume source of the next Flume agent (next
hop) in the flow. The source and sink within a given agent run asynchronously,
with the events staged in the channel.

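
The Avro hop described above can be sketched in configuration form. The fragment
below is an illustration only: the agent names agent1 and agent2, the component
names, the host collector.example.com and the ports 44444 and 4141 are hypothetical
placeholders. In practice each agent typically runs on its own host.

.. code-block:: properties

  # agent1 (first hop): netcat source -> memory channel -> Avro sink
  agent1.sources = src1
  agent1.channels = ch1
  agent1.sinks = avroSink
  agent1.sources.src1.type = netcat
  agent1.sources.src1.bind = localhost
  agent1.sources.src1.port = 44444
  agent1.sources.src1.channels = ch1
  agent1.channels.ch1.type = memory
  # The Avro sink forwards events to the Avro source of the next hop
  agent1.sinks.avroSink.type = avro
  agent1.sinks.avroSink.hostname = collector.example.com
  agent1.sinks.avroSink.port = 4141
  agent1.sinks.avroSink.channel = ch1

  # agent2 (next hop): Avro source -> memory channel -> logger sink
  agent2.sources = avroSrc
  agent2.channels = ch1
  agent2.sinks = log1
  agent2.sources.avroSrc.type = avro
  agent2.sources.avroSrc.bind = 0.0.0.0
  agent2.sources.avroSrc.port = 4141
  agent2.sources.avroSrc.channels = ch1
  agent2.channels.ch1.type = memory
  agent2.sinks.log1.type = logger
  agent2.sinks.log1.channel = ch1

The hostname and port of agent1's Avro sink must point at the address on which
agent2's Avro source listens; that pairing is what turns two agents into one flow.
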
Complex flows
~~~~~~~~~~~~~

Flume allows a user to build multi-hop flows where events travel through
multiple agents before reaching the final destination. It also allows fan-in
and fan-out flows, contextual routing and backup routes (fail-over) for failed
hops.

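
As a sketch of two of these patterns, the fragment below configures a single
hypothetical agent a1 whose Avro source replicates every event into two channels
(fan-out), while the two Avro sinks draining channel c2 form a fail-over sink
group. It assumes the replicating channel selector and the failover sink
processor; all names, hosts and ports are placeholders.

.. code-block:: properties

  a1.sources = r1
  a1.channels = c1 c2
  a1.sinks = k1 k2 k3

  # Fan-out: the replicating selector copies each event into every listed channel
  a1.sources.r1.type = avro
  a1.sources.r1.bind = 0.0.0.0
  a1.sources.r1.port = 4141
  a1.sources.r1.channels = c1 c2
  a1.sources.r1.selector.type = replicating

  a1.channels.c1.type = memory
  a1.channels.c2.type = memory

  # k1 consumes the first copy
  a1.sinks.k1.type = logger
  a1.sinks.k1.channel = c1

  # k2 and k3 both read from c2; the sink group below decides which one is active
  a1.sinks.k2.type = avro
  a1.sinks.k2.hostname = collector1.example.com
  a1.sinks.k2.port = 4141
  a1.sinks.k2.channel = c2
  a1.sinks.k3.type = avro
  a1.sinks.k3.hostname = collector2.example.com
  a1.sinks.k3.port = 4141
  a1.sinks.k3.channel = c2

  # Fail-over: the sink with the higher priority value is used while it is healthy
  a1.sinkgroups = g1
  a1.sinkgroups.g1.sinks = k2 k3
  a1.sinkgroups.g1.processor.type = failover
  a1.sinkgroups.g1.processor.priority.k2 = 10
  a1.sinkgroups.g1.processor.priority.k3 = 5

Without the sink group, k2 and k3 would each take from c2 independently and split
its events between them, rather than acting as a primary and a backup route.
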
Reliability
~~~~~~~~~~~

The events are staged in a channel on each agent. The events are then delivered
to the next agent or terminal repository (like HDFS) in the flow. The events
are removed from a channel only after they are stored in the channel of the next
agent or in the terminal repository. This is how the single-hop message
delivery semantics in Flume provide end-to-end reliability of the flow.

Flume uses a transactional approach to guarantee the reliable delivery of the
events. Sources and sinks encapsulate, respectively, the storage and the
retrieval of events in a transaction provided by the channel. This ensures that
the set of events is reliably passed from point to point in the flow. In the
case of a multi-hop flow, the sink from the previous hop and the source from the
next hop both have their transactions running to ensure that the data is safely
stored in the channel of the next hop.

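
The size of these transactions is bounded by the channel. In the hypothetical
fragment below (names and numbers are placeholders), the memory channel's
transactionCapacity caps how many events a single put or take transaction may
hold, and the Avro sink's batch-size sets how many events it takes and forwards
per transaction. Keeping the batch no larger than the transaction capacity avoids
takes that the channel would reject.

.. code-block:: properties

  a1.channels = c1
  a1.sinks = k1

  a1.channels.c1.type = memory
  # Maximum number of events held in the channel
  a1.channels.c1.capacity = 10000
  # Maximum number of events in one put or take transaction
  a1.channels.c1.transactionCapacity = 100

  a1.sinks.k1.type = avro
  a1.sinks.k1.hostname = collector.example.com
  a1.sinks.k1.port = 4141
  # Events taken from c1 and sent downstream within one transaction
  a1.sinks.k1.batch-size = 100
  a1.sinks.k1.channel = c1
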
Recoverability
~~~~~~~~~~~~~~

The events are staged in the channel, which manages recovery from failure.
Flume supports a durable file channel which is backed by the local file system.
There is also a memory channel, which simply stores the events in an in-memory
queue. It is faster, but any events still left in the memory channel when the
agent process dies cannot be recovered.

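
The trade-off between the two channel types shows up directly in configuration.
The sketch below declares one channel of each kind on a hypothetical agent a1.
The directory paths are placeholders; because the file channel writes there, the
agent needs read/write permission on them.

.. code-block:: properties

  a1.channels = fileCh memCh

  # Durable: checkpoint and data files on local disk survive an agent restart
  a1.channels.fileCh.type = file
  a1.channels.fileCh.checkpointDir = /var/lib/flume/checkpoint
  a1.channels.fileCh.dataDirs = /var/lib/flume/data

  # Volatile: events still queued in memory are lost if the JVM dies
  a1.channels.memCh.type = memory
  a1.channels.memCh.capacity = 10000
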
Setup
=====

Setting up an agent
-------------------

Flume agent configuration is stored in a local configuration file. This is a
text file that follows the Java properties file format.
Configurations for one or more agents can be specified in the same
configuration file. The configuration file includes properties of each source,
sink and channel in an agent and how they are wired together to form data
flows.

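
A minimal sketch of that layout, with hypothetical agent names a1 and a2, is shown
below. Each key starts with the name of the agent it belongs to, so several agents
can share one file. The fragment only lists components and sets channel types; the
remaining source and sink properties would still be needed before either agent
could run.

.. code-block:: properties

  # Lines starting with '#' are comments, as in any Java properties file

  # Agent "a1": list its components, then configure them
  a1.sources = r1
  a1.channels = c1
  a1.sinks = k1
  a1.channels.c1.type = memory

  # Agent "a2" shares the file; its keys use the "a2" prefix instead
  a2.sources = r1
  a2.channels = c1
  a2.sinks = k1
  a2.channels.c1.type = file

The agent name given to flume-ng with -n selects which prefix is read, so an
agent started with -n a1 ignores every a2.* key.
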
Configuring individual components
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Each component (source, sink or channel) in the flow has a name, a type, and a set
of properties that are specific to the type and instantiation. For example, an
Avro source needs a hostname (or IP address) and a port number to receive data
from. A memory channel can have a maximum queue size ("capacity"), and an HDFS sink
needs to know the file system URI, the path in which to create files, the frequency
of file rotation ("hdfs.rollInterval"), etc. All such attributes of a component need
to be set in the properties file of the hosting Flume agent.

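
The attributes mentioned above map to properties such as the following. This is a
sketch with a hypothetical agent agent1 and placeholder addresses and paths; note
that the Avro source takes its listening address from the bind key.

.. code-block:: properties

  # Avro source: address and port to receive data on
  agent1.sources.avroSrc.type = avro
  agent1.sources.avroSrc.bind = 192.168.0.10
  agent1.sources.avroSrc.port = 4141

  # Memory channel: maximum number of queued events
  agent1.channels.memCh.type = memory
  agent1.channels.memCh.capacity = 1000

  # HDFS sink: file system URI and path, and roll to a new file every 30 seconds
  agent1.sinks.hdfsSink.type = hdfs
  agent1.sinks.hdfsSink.hdfs.path = hdfs://namenode:8020/flume/events
  agent1.sinks.hdfsSink.hdfs.rollInterval = 30
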
Wiring the pieces together
~~~~~~~~~~~~~~~~~~~~~~~~~~

The agent needs to know what individual components to load and how they are
connected in order to constitute the flow. This is done by listing the names of
each of the sources, sinks and channels in the agent, and then specifying the
connecting channel for each sink and source. For example, consider an agent that
moves events from an Avro source called avroWeb to an HDFS sink called
hdfs-cluster1 via a file channel called file-channel. The configuration file will
contain the names of these components and declare file-channel as a shared channel
for both the avroWeb source and the hdfs-cluster1 sink.

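
The avroWeb example above could be written roughly as follows. The agent name
agent_foo, the Avro port and the HDFS path are hypothetical placeholders; only the
component names come from the description above. The file channel falls back to
its default directories here.

.. code-block:: properties

  # Declare the components of the hypothetical agent "agent_foo"
  agent_foo.sources = avroWeb
  agent_foo.channels = file-channel
  agent_foo.sinks = hdfs-cluster1

  # Avro source, writing into file-channel
  agent_foo.sources.avroWeb.type = avro
  agent_foo.sources.avroWeb.bind = 0.0.0.0
  agent_foo.sources.avroWeb.port = 10000
  agent_foo.sources.avroWeb.channels = file-channel

  # File channel shared by the source and the sink
  agent_foo.channels.file-channel.type = file

  # HDFS sink, reading from file-channel
  agent_foo.sinks.hdfs-cluster1.type = hdfs
  agent_foo.sinks.hdfs-cluster1.hdfs.path = hdfs://namenode:8020/flume/webdata
  agent_foo.sinks.hdfs-cluster1.channel = file-channel

Note the asymmetry: a source lists its channels under channels (plural, possibly
several), while a sink names exactly one channel under channel.
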
Starting an agent
~~~~~~~~~~~~~~~~~

An agent is started using a shell script called flume-ng which is located in
the bin directory of the Flume distribution. You need to specify the agent
name, the config directory, and the config file on the command line::

  $ bin/flume-ng agent -n $agent_name -c conf -f conf/flume-conf.properties.template

The agent will then start running the sources and sinks configured in the given
properties file.

A simple example
~~~~~~~~~~~~~~~~

Here, we give an example configuration file describing a single-node Flume deployment.
This configuration lets a user generate events and then logs them to the console.

.. code-block:: properties

  # example.conf: A single-node Flume configuration

  # Name the components on this agent
  a1.sources = r1
  a1.sinks = k1
  a1.channels = c1

  # Describe/configure the source
  a1.sources.r1.type = netcat
  a1.sources.r1.bind = localhost
  a1.sources.r