.. Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
======================================
Flume 1.7.0 User Guide
======================================
Introduction
============
Overview
--------
Apache Flume is a distributed, reliable, and available system for efficiently
collecting, aggregating and moving large amounts of log data from many
different sources to a centralized data store.
The use of Apache Flume is not restricted to log data aggregation.
Since data sources are customizable, Flume can be used to transport massive quantities
of event data including but not limited to network traffic data, social-media-generated data,
email messages and pretty much any data source possible.
Apache Flume is a top level project at the Apache Software Foundation.
There are currently two release code lines available, versions 0.9.x and 1.x.
Documentation for the 0.9.x track is available at
`the Flume 0.9.x User Guide <http://archive.cloudera.com/cdh/3/flume/UserGuide/>`_.
This documentation applies to the 1.7.x track.
New and existing users are encouraged to use the 1.x releases so as to
leverage the performance improvements and configuration flexibilities available
in the latest architecture.
System Requirements
-------------------
#. Java Runtime Environment - Java 1.7 or later
#. Memory - Sufficient memory for configurations used by sources, channels or sinks
#. Disk Space - Sufficient disk space for configurations used by channels or sinks
#. Directory Permissions - Read/Write permissions for directories used by agent
Architecture
------------
Data flow model
~~~~~~~~~~~~~~~
A Flume event is defined as a unit of data flow having a byte payload and an
optional set of string attributes. A Flume agent is a (JVM) process that hosts
the components through which events flow from an external source to the next
destination (hop).
.. figure:: images/UserGuide_image00.png
:align: center
:alt: Agent component diagram
A Flume source consumes events delivered to it by an external source like a web
server. The external source sends events to Flume in a format that is
recognized by the target Flume source. For example, an Avro Flume source can be
used to receive Avro events from Avro clients or other Flume agents in the flow
that send events from an Avro sink. A similar flow can be defined using
a Thrift Flume Source to receive events from a Thrift Sink or a Flume
Thrift Rpc Client or Thrift clients written in any language generated from
the Flume Thrift protocol. When a Flume source receives an event, it
stores it into one or more channels. The channel is a passive store that keeps
the event until it's consumed by a Flume sink. The file channel is one example
-- it is backed by the local filesystem. The sink removes the event
from the channel and puts it into an external repository like HDFS (via Flume
HDFS sink) or forwards it to the Flume source of the next Flume agent (next
hop) in the flow. The source and sink within the given agent run asynchronously
with the events staged in the channel.
Complex flows
~~~~~~~~~~~~~
Flume allows a user to build multi-hop flows where events travel through
multiple agents before reaching the final destination. It also allows fan-in
and fan-out flows, contextual routing and backup routes (fail-over) for failed
hops.
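As an illustrative sketch of a fan-out flow (the agent and component names a1, r1, c1, c2, k1 and k2 are hypothetical), a single source can replicate each event into two channels, each drained by its own sink:

.. code-block:: properties

    # Hypothetical fan-out: one source replicating events into two channels
    a1.sources = r1
    a1.channels = c1 c2
    a1.sinks = k1 k2
    # A source may be wired to multiple channels
    a1.sources.r1.channels = c1 c2
    # Each sink drains exactly one channel
    a1.sinks.k1.channel = c1
    a1.sinks.k2.channel = c2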
Reliability
~~~~~~~~~~~
The events are staged in a channel on each agent. The events are then delivered
to the next agent or terminal repository (like HDFS) in the flow. The events
are removed from a channel only after they are stored in the channel of next
agent or in the terminal repository. This is how the single-hop message
delivery semantics in Flume provide end-to-end reliability of the flow.
Flume uses a transactional approach to guarantee the reliable delivery of the
events. The sources and sinks encapsulate the storage and retrieval,
respectively, of the events in a transaction provided by the channel. This
ensures that the set of events is reliably passed from point to point in the
flow. In the case of a multi-hop
flow, the sink from the previous hop and the source from the next hop both have
their transactions running to ensure that the data is safely stored in the
channel of the next hop.
Recoverability
~~~~~~~~~~~~~~
The events are staged in the channel, which manages recovery from failure.
Flume supports a durable file channel which is backed by the local file system.
There's also a memory channel which simply stores the events in an in-memory
queue, which is faster but any events still left in the memory channel when an
agent process dies can't be recovered.
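For example, a durable file channel is configured by pointing it at local directories for its checkpoint and data files (the agent name and paths below are placeholders):

.. code-block:: properties

    # Hypothetical agent a1 with a durable, recoverable file channel
    a1.channels = c1
    a1.channels.c1.type = file
    a1.channels.c1.checkpointDir = /var/flume/checkpoint
    a1.channels.c1.dataDirs = /var/flume/data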
Setup
=====
Setting up an agent
-------------------
Flume agent configuration is stored in a local configuration file. This is a
text file that follows the Java properties file format.
Configurations for one or more agents can be specified in the same
configuration file. The configuration file includes properties of each source,
sink and channel in an agent and how they are wired together to form data
flows.
Configuring individual components
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Each component (source, sink or channel) in the flow has a name, type, and set
of properties that are specific to the type and instantiation. For example, an
Avro source needs a hostname (or IP address) and a port number to receive data
from. A memory channel can have max queue size ("capacity"), and an HDFS sink
needs to know the file system URI, path to create files, frequency of file
rotation ("hdfs.rollInterval") etc. All such attributes of a component need to
be set in the properties file of the hosting Flume agent.
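For instance, the component properties mentioned above might look like this in an agent's properties file (the agent name a1 and component names are illustrative):

.. code-block:: properties

    # An Avro source needs a bind address and a port
    a1.sources.r1.type = avro
    a1.sources.r1.bind = 0.0.0.0
    a1.sources.r1.port = 4141
    # A memory channel can cap its queue size
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 10000
    # An HDFS sink needs a path and a roll interval
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.hdfs.path = hdfs://namenode/flume/events
    a1.sinks.k1.hdfs.rollInterval = 30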
Wiring the pieces together
~~~~~~~~~~~~~~~~~~~~~~~~~~
The agent needs to know what individual components to load and how they are
connected in order to constitute the flow. This is done by listing the names of
each of the sources, sinks and channels in the agent, and then specifying the
connecting channel for each sink and source. For example, an agent flows events
from an Avro source called avroWeb to HDFS sink hdfs-cluster1 via a file
channel called file-channel. The configuration file will contain names of these
components and file-channel as a shared channel for both avroWeb source and
hdfs-cluster1 sink.
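A sketch of the wiring for that example, using the component names from the text (the agent name agent_foo is illustrative):

.. code-block:: properties

    # List the components hosted by this agent
    agent_foo.sources = avroWeb
    agent_foo.channels = file-channel
    agent_foo.sinks = hdfs-cluster1
    # Wire both the source and the sink to the shared channel
    agent_foo.sources.avroWeb.channels = file-channel
    agent_foo.sinks.hdfs-cluster1.channel = file-channel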
Starting an agent
~~~~~~~~~~~~~~~~~
An agent is started using a shell script called flume-ng which is located in
the bin directory of the Flume distribution. You need to specify the agent
name, the config directory, and the config file on the command line::
$ bin/flume-ng agent -n $agent_name -c conf -f conf/flume-conf.properties.template
Now the agent will start running source and sinks configured in the given
properties file.
A simple example
~~~~~~~~~~~~~~~~
Here, we give an example configuration file, describing a single-node Flume deployment.
This configuration lets a user generate events and subsequently logs them to the console.
.. code-block:: properties
# example.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1