.. Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements. See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License. You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
======================================
Flume 1.6.0 User Guide
======================================
Introduction
============
Overview
--------
Apache Flume is a distributed, reliable, and available system for efficiently
collecting, aggregating and moving large amounts of log data from many
different sources to a centralized data store.
The use of Apache Flume is not restricted to log data aggregation.
Since data sources are customizable, Flume can be used to transport massive quantities
of event data including but not limited to network traffic data, social-media-generated data,
email messages and pretty much any data source possible.
Apache Flume is a top level project at the Apache Software Foundation.
There are currently two release code lines available, versions 0.9.x and 1.x.
Documentation for the 0.9.x track is available at
`the Flume 0.9.x User Guide <http://archive.cloudera.com/cdh/3/flume/UserGuide/>`_.
This documentation applies to the 1.6.x track.
New and existing users are encouraged to use the 1.x releases so as to
leverage the performance improvements and configuration flexibility available
in the latest architecture.
System Requirements
-------------------
#. Java Runtime Environment - Java 1.7 or later
#. Memory - Sufficient memory for configurations used by sources, channels or sinks
#. Disk Space - Sufficient disk space for configurations used by channels or sinks
#. Directory Permissions - Read/Write permissions for directories used by agent
Architecture
------------
Data flow model
~~~~~~~~~~~~~~~
A Flume event is defined as a unit of data flow having a byte payload and an
optional set of string attributes. A Flume agent is a (JVM) process that hosts
the components through which events flow from an external source to the next
destination (hop).
.. figure:: images/UserGuide_image00.png
   :align: center
   :alt: Agent component diagram
A Flume source consumes events delivered to it by an external source like a web
server. The external source sends events to Flume in a format that is
recognized by the target Flume source. For example, an Avro Flume source can be
used to receive Avro events from Avro clients or other Flume agents in the flow
that send events from an Avro sink. A similar flow can be defined using
a Thrift Flume Source to receive events from a Thrift Sink or a Flume
Thrift Rpc Client or Thrift clients written in any language generated from
the Flume Thrift protocol.

When a Flume source receives an event, it
stores it into one or more channels. The channel is a passive store that keeps
the event until it's consumed by a Flume sink. The file channel is one example
-- it is backed by the local filesystem. The sink removes the event
from the channel and puts it into an external repository like HDFS (via Flume
HDFS sink) or forwards it to the Flume source of the next Flume agent (next
hop) in the flow. The source and sink within the given agent run asynchronously
with the events staged in the channel.
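As a concrete illustration, the following is a minimal sketch of this model in
Flume's properties file format. The agent and component names (``agent1``,
``avro-src``, ``file-ch``, ``hdfs-sink``) as well as the port and paths are
illustrative; the configuration syntax is covered in the Setup section below.

.. code-block:: properties

  # Hypothetical agent "agent1": Avro source -> file channel -> HDFS sink
  agent1.sources = avro-src
  agent1.channels = file-ch
  agent1.sinks = hdfs-sink

  # The Avro source consumes events sent by Avro clients or upstream agents
  agent1.sources.avro-src.type = avro
  agent1.sources.avro-src.bind = 0.0.0.0
  agent1.sources.avro-src.port = 4141
  agent1.sources.avro-src.channels = file-ch

  # The file channel passively stages events on the local filesystem
  agent1.channels.file-ch.type = file

  # The HDFS sink removes events from the channel and writes them to HDFS
  agent1.sinks.hdfs-sink.type = hdfs
  agent1.sinks.hdfs-sink.hdfs.path = hdfs://namenode/flume/events
  agent1.sinks.hdfs-sink.channel = file-ch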
Complex flows
~~~~~~~~~~~~~
Flume allows a user to build multi-hop flows where events travel through
multiple agents before reaching the final destination. It also allows fan-in
and fan-out flows, contextual routing and backup routes (fail-over) for failed
hops.
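For example, a fan-out flow can be sketched by attaching a single source to
several channels; the component names below are hypothetical, and the
replicating selector shown is Flume's default fan-out behavior.

.. code-block:: properties

  # One source replicating every event into two channels,
  # each of which would be drained by its own sink
  a1.sources = r1
  a1.channels = c1 c2
  a1.sources.r1.channels = c1 c2
  a1.sources.r1.selector.type = replicating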
Reliability
~~~~~~~~~~~
The events are staged in a channel on each agent. The events are then delivered
to the next agent or terminal repository (like HDFS) in the flow. The events
are removed from a channel only after they are stored in the channel of the
next agent or in the terminal repository. This is how the single-hop message
delivery semantics in Flume provide end-to-end reliability of the flow.
Flume uses a transactional approach to guarantee the reliable delivery of the
events. The sources and sinks encapsulate the storage and retrieval,
respectively, of the events in a transaction provided by the channel. This
ensures that the set of events is
reliably passed from point to point in the flow. In the case of a multi-hop
flow, the sink from the previous hop and the source from the next hop both have
their transactions running to ensure that the data is safely stored in the
channel of the next hop.
Recoverability
~~~~~~~~~~~~~~
The events are staged in the channel, which manages recovery from failure.
Flume supports a durable file channel which is backed by the local file system.
There's also a memory channel which simply stores the events in an in-memory
queue. This is faster, but any events still left in the memory channel when an
agent process dies can't be recovered.
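The choice between the two is purely a matter of channel configuration, as in
the following sketch (the directory paths are illustrative):

.. code-block:: properties

  # Durable file channel: staged events survive an agent restart
  a1.channels.c1.type = file
  a1.channels.c1.checkpointDir = /var/flume/checkpoint
  a1.channels.c1.dataDirs = /var/flume/data

  # Memory channel: faster, but events still in the queue are lost
  # if the agent process dies
  a1.channels.c2.type = memory
  a1.channels.c2.capacity = 1000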
Setup
=====
Setting up an agent
-------------------
Flume agent configuration is stored in a local configuration file. This is a
text file that follows the Java properties file format.
Configurations for one or more agents can be specified in the same
configuration file. The configuration file includes properties of each source,
sink and channel in an agent and how they are wired together to form data
flows.
Configuring individual components
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Each component (source, sink or channel) in the flow has a name, type, and set
of properties that are specific to the type and instantiation. For example, an
Avro source needs a hostname (or IP address) and a port number to receive data
from. A memory channel can have a maximum queue size ("capacity"), and an HDFS
sink needs to know the file system URI, path to create files, frequency of file
rotation ("hdfs.rollInterval"), etc. All such attributes of a component need to
be set in the properties file of the hosting Flume agent.
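For instance, the attributes just mentioned would appear in the properties
file following the pattern ``<agent>.<sources|channels|sinks>.<name>.<property>``
(the agent and component names below are hypothetical):

.. code-block:: properties

  # Avro source: hostname/IP and port to receive data on
  a1.sources.r1.type = avro
  a1.sources.r1.bind = 0.0.0.0
  a1.sources.r1.port = 4141

  # Memory channel: maximum number of events held in its queue
  a1.channels.c1.type = memory
  a1.channels.c1.capacity = 10000

  # HDFS sink: filesystem URI/path and roll interval in seconds
  a1.sinks.k1.type = hdfs
  a1.sinks.k1.hdfs.path = hdfs://namenode/flume/events
  a1.sinks.k1.hdfs.rollInterval = 30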
Wiring the pieces together
~~~~~~~~~~~~~~~~~~~~~~~~~~
The agent needs to know what individual components to load and how they are
connected in order to constitute the flow. This is done by listing the names of
each of the sources, sinks and channels in the agent, and then specifying the
connecting channel for each sink and source. For example, an agent might flow
events from an Avro source called avroWeb to an HDFS sink called hdfs-cluster1
via a file channel called file-channel. The configuration file would then
contain the names of these components, with file-channel wired in as a shared
channel for both the avroWeb source and the hdfs-cluster1 sink, as sketched
below.
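A sketch of such a configuration, assuming the agent is named ``agent_foo``
(the component names are those from the example above):

.. code-block:: properties

  # List the named components of this agent
  agent_foo.sources = avroWeb
  agent_foo.channels = file-channel
  agent_foo.sinks = hdfs-cluster1

  # Point the source and the sink at the shared channel; a source can
  # feed several channels ("channels", plural), while a sink drains
  # exactly one ("channel", singular)
  agent_foo.sources.avroWeb.channels = file-channel
  agent_foo.sinks.hdfs-cluster1.channel = file-channel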
Starting an agent
~~~~~~~~~~~~~~~~~
An agent is started using a shell script called flume-ng which is located in
the bin directory of the Flume distribution. You need to specify the agent
name, the config directory, and the config file on the command line::
  $ bin/flume-ng agent -n $agent_name -c conf -f conf/flume-conf.properties.template
Now the agent will start running the sources and sinks configured in the given
properties file.
A simple example
~~~~~~~~~~~~~~~~
Here, we give an example configuration file, describing a single-node Flume deployment.
This configuration lets a user generate events, which are subsequently logged
to the console.
.. code-block:: properties
  # example.conf: A single-node Flume configuration

  # Name the components on this agent
  a1.sources = r1
  a1.sinks = k1
  a1.channels = c1

  # Describe/configure the source
  a1.sources.r1.type = netcat
  a1.sources.r1.bind = localhost
  a1.sources.r1.port = 44444

  # Describe the sink
  a1.sinks.k1.type = logger

  # Use a channel which buffers events in memory
  a1.channels.c1.type = memory
  a1.channels.c1.capacity = 1000
  a1.channels.c1.transactionCapacity = 100

  # Bind the source and sink to the channel
  a1.sources.r1.channels = c1
  a1.sinks.k1.channel = c1
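Given this configuration file, the agent can be started as described earlier,
with the logger output directed to the console::

  $ bin/flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console

Events can then be sent to the netcat source from a separate terminal, for
example with ``telnet localhost 44444``; each line typed there is logged to
the agent's console.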