FlumeUserGuide.pdf 1.6.0

```
$ bin/flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console
```

Note that in a full deployment we would typically include one more option: --conf=<conf-dir>. The <conf-dir> directory would include a shell script flume-env.sh and potentially a log4j properties file. In this example, we pass a Java option to force Flume to log to the console and we go without a custom environment script.

From a separate terminal, we can then telnet port 44444 and send Flume an event:

```
$ telnet localhost 44444
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
Hello world! <ENTER>
OK
```

The original Flume terminal will output the event in a log message:

```
12/05/19 15:32:19 INFO source.NetcatSource: Source starting
12/05/19 15:32:19 INFO source.NetcatSource: Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/127.0.0.1:44444]
12/05/19 15:32:34 INFO sink.LoggerSink: Event: { headers:{} body: 48 65 6C 6C 6F 20 77 6F 72 6C 64 21 0D    Hello world!. }
```

Congratulations - you've successfully configured and deployed a Flume agent! Subsequent sections cover agent configuration in much more detail.

Zookeeper based Configuration

Flume supports agent configurations via Zookeeper. This is an experimental feature. The configuration file needs to be uploaded in Zookeeper, under a configurable prefix, and is stored in Zookeeper node data. The following is how the Zookeeper node tree would look for agents a1 and a2:

```
- /flume
 |- /a1 [Agent config file]
 |- /a2 [Agent config file]
```

Once the configuration file is uploaded, start the agent with the following options:

```
$ bin/flume-ng agent --conf conf -z zkhost:2181,zkhost1:2181 -p /flume --name a1 -Dflume.root.logger=INFO,console
```

| Argument Name | Default | Description |
|---|---|---|
| z | – | Zookeeper connection string. Comma separated list of hostname:port |
| p | /flume | Base path in Zookeeper to store agent configurations |

Installing third-party plugins

Flume has a fully plugin-based architecture.
While Flume ships with many out-of-the-box sources, channels, sinks, serializers, and the like, many implementations exist which ship separately from Flume.

While it has always been possible to include custom Flume components by adding their jars to the FLUME_CLASSPATH variable in the flume-env.sh file, Flume now supports a special directory called plugins.d which automatically picks up plugins that are packaged in a specific format. This allows for easier management of plugin packaging issues as well as simpler debugging and troubleshooting of several classes of issues, especially library dependency conflicts.

The plugins.d directory

The plugins.d directory is located at $FLUME_HOME/plugins.d. At startup time, the flume-ng start script looks in the plugins.d directory for plugins that conform to the format below and includes them in the proper paths when starting up java.

Directory layout for plugins

Each plugin (subdirectory) within plugins.d can have up to three sub-directories:

1. lib - the plugin's jar(s)
2. libext - the plugin's dependency jar(s)
3. native - any required native libraries, such as .so files

Example of two plugins within the plugins.d directory:

```
plugins.d/
plugins.d/custom-source-1/
plugins.d/custom-source-1/lib/my-source.jar
plugins.d/custom-source-1/libext/spring-core-2.5.6.jar
plugins.d/custom-source-2/
plugins.d/custom-source-2/lib/custom.jar
plugins.d/custom-source-2/native/gettext.so
```

Data ingestion

Flume supports a number of mechanisms to ingest data from external sources.

RPC

An Avro client included in the Flume distribution can send a given file to a Flume Avro source using the Avro RPC mechanism:

```
$ bin/flume-ng avro-client -H localhost -p 41414 -F /usr/logs/log.10
```

The above command will send the contents of /usr/logs/log.10 to the Flume source listening on that port.

Executing commands

There's an exec source that executes a given command and consumes the output - a single 'line' of output, i.e. text followed by carriage return ('\r') or line feed ('\n') or both together.
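As a concrete illustration of the layout above, the directory tree can be created with a few commands. FLUME_HOME, the plugin name and the jar names here are all illustrative placeholders, not part of any shipped plugin:

```shell
# Sketch: lay out a plugins.d tree for a hypothetical plugin "custom-source-1".
# FLUME_HOME defaults to a scratch directory for this demonstration.
FLUME_HOME=${FLUME_HOME:-/tmp/flume-home}

mkdir -p "$FLUME_HOME/plugins.d/custom-source-1/lib" \
         "$FLUME_HOME/plugins.d/custom-source-1/libext" \
         "$FLUME_HOME/plugins.d/custom-source-1/native"

# A real deployment would then copy the plugin jar and its dependencies in place:
#   cp my-source.jar          "$FLUME_HOME/plugins.d/custom-source-1/lib/"
#   cp spring-core-2.5.6.jar  "$FLUME_HOME/plugins.d/custom-source-1/libext/"

ls -R "$FLUME_HOME/plugins.d"
```

On the next start, the flume-ng script would pick up any jars placed under lib/ and libext/ without any FLUME_CLASSPATH editing.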
Note: Flume does not support tail as a source. One can wrap the tail command in an exec source to stream the file.

Network streams

Flume supports the following mechanisms to read data from popular log stream types:

1. Avro
2. Thrift
3. Syslog
4. Netcat

Setting multi-agent flow

[Figure: two agents in series - the sink of agent "foo" feeds the source of agent "bar" over Avro.]

In order to flow the data across multiple agents or hops, the sink of the previous agent and the source of the current hop need to be avro type, with the sink pointing to the hostname (or IP address) and port of the source.

Consolidation

A very common scenario in log collection is a large number of log producing clients sending data to a few consumer agents that are attached to the storage subsystem. For example, logs collected from hundreds of web servers sent to a dozen agents that write to an HDFS cluster.

[Figure: consolidation topology - several first-tier agents with Avro sinks all feed the Avro source of a single second-tier agent that writes to HDFS.]

This can be achieved in Flume by configuring a number of first tier agents with an avro sink, all pointing to an avro source of a single agent (again, you could use the thrift sources/sinks/clients in such a scenario). This source on the second tier agent consolidates the received events into a single channel which is consumed by a sink to its final destination.

Multiplexing the flow

Flume supports multiplexing the event flow to one or more destinations. This is achieved by defining a flow multiplexer that can replicate or selectively route an event to one or more channels.

[Figure: a source in agent "foo" fans out to multiple channels, each drained by its own sink.]

The above example shows a source from agent "foo" fanning out the flow to three different channels. This fan out can be replicating or multiplexing. In case of replicating flow, each event is sent to all three channels. For the multiplexing case, an event is delivered to a subset of available channels when an event's attribute matches a preconfigured value.
For example, if an event attribute called "txnType" is set to "customer", then it should go to channel1 and channel3; if it's "vendor" then it should go to channel2; otherwise channel3. The mapping can be set in the agent's configuration file.

Configuration

As mentioned in the earlier section, Flume agent configuration is read from a file that resembles a Java property file format with hierarchical property settings.

Defining the flow

To define the flow within a single agent, you need to link the sources and sinks via a channel. You need to list the sources, sinks and channels for the given agent, and then point the source and sink to a channel. A source instance can specify multiple channels, but a sink instance can only specify one channel. The format is as follows:

```
# list the sources, sinks and channels for the agent
<Agent>.sources = <Source>
<Agent>.sinks = <Sink>
<Agent>.channels = <Channel1> <Channel2>

# set channel for source
<Agent>.sources.<Source>.channels = <Channel1> <Channel2> ...

# set channel for sink
<Agent>.sinks.<Sink>.channel = <Channel1>
```

For example, an agent named agent_foo is reading data from an external avro client and sending it to HDFS via a memory channel. The config file weblog.config could look like:

```
# list the sources, sinks and channels for the agent
agent_foo.sources = avro-appserver-src-1
agent_foo.sinks = hdfs-sink-1
agent_foo.channels = mem-channel-1

# set channel for source
agent_foo.sources.avro-appserver-src-1.channels = mem-channel-1

# set channel for sink
agent_foo.sinks.hdfs-sink-1.channel = mem-channel-1
```

This will make the events flow from avro-appserver-src-1 to hdfs-sink-1 through the memory channel mem-channel-1. When the agent is started with weblog.config as its config file, it will instantiate that flow.

Configuring individual components

After defining the flow, you need to set properties of each source, sink and channel.
This is done in the same hierarchical namespace fashion where you set the component type and other values for the properties specific to each component:

```
# properties for sources
<Agent>.sources.<Source>.<someProperty> = <someValue>

# properties for channels
<Agent>.channels.<Channel>.<someProperty> = <someValue>

# properties for sinks
<Agent>.sinks.<Sink>.<someProperty> = <someValue>
```

The property "type" needs to be set for each component for Flume to understand what kind of object it needs to be. Each source, sink and channel has its own set of properties required for it to function as intended. All those need to be set as needed. In the previous example, we have a flow from avro-AppSrv-source to hdfs-Cluster1-sink through the memory channel mem-channel-1. Here's an example that shows configuration of each of those components:

```
agent_foo.sources = avro-AppSrv-source
agent_foo.sinks = hdfs-Cluster1-sink
agent_foo.channels = mem-channel-1

# properties of avro-AppSrv-source
agent_foo.sources.avro-AppSrv-source.type = avro
agent_foo.sources.avro-AppSrv-source.bind = localhost
agent_foo.sources.avro-AppSrv-source.port = 10000

# properties of mem-channel-1
agent_foo.channels.mem-channel-1.type = memory
agent_foo.channels.mem-channel-1.capacity = 1000
agent_foo.channels.mem-channel-1.transactionCapacity = 100

# properties of hdfs-Cluster1-sink
agent_foo.sinks.hdfs-Cluster1-sink.type = hdfs
agent_foo.sinks.hdfs-Cluster1-sink.hdfs.path = hdfs://namenode/flume/webdata
```

Adding multiple flows in an agent

A single Flume agent can contain several independent flows. You can list multiple sources, sinks and channels in a config. These components can be linked to form multiple flows:

```
# list the sources, sinks and channels for the agent
<Agent>.sources = <Source1> <Source2>
<Agent>.sinks = <Sink1> <Sink2>
<Agent>.channels = <Channel1> <Channel2>
```

Then you can link the sources and sinks to their corresponding channels (for sources) or channel (for sinks) to set up two different flows. For example, if you need to set up two flows in an agent, one going from an external avro client to external HDFS and another from the output of a tail to an avro sink, then here's a config to do that:

```
# list the sources, sinks and channels in the agent
agent_foo.sources = avro-AppSrv-source1 exec-tail-source2
agent_foo.sinks = hdfs-Cluster1-sink1 avro-forward-sink2
agent_foo.channels = mem-channel-1 file-channel-2

# flow #1 configuration
agent_foo.sources.avro-AppSrv-source1.channels = mem-channel-1
agent_foo.sinks.hdfs-Cluster1-sink1.channel = mem-channel-1

# flow #2 configuration
agent_foo.sources.exec-tail-source2.channels = file-channel-2
agent_foo.sinks.avro-forward-sink2.channel = file-channel-2
```

Configuring a multi agent flow

To set up a multi-tier flow, you need to have an avro/thrift sink of the first hop pointing to the avro/thrift source of the next hop. This will result in the first Flume agent forwarding events to the next Flume agent. For example, if you are periodically sending files (1 file per event) using the avro client to a local Flume agent, then this local agent can forward it to another agent that has the storage mounted.

Weblog agent config:

```
# list sources, sinks and channels in the agent
agent_foo.sources = avro-AppSrv-source
agent_foo.sinks = avro-forward-sink
agent_foo.channels = file-channel

# define the flow
agent_foo.sources.avro-AppSrv-source.channels = file-channel
agent_foo.sinks.avro-forward-sink.channel = file-channel

# avro sink properties
agent_foo.sinks.avro-forward-sink.type = avro
agent_foo.sinks.avro-forward-sink.hostname = 10.1.1.100
agent_foo.sinks.avro-forward-sink.port = 10000

# configure other pieces
```

HDFS agent config:

```
# list sources, sinks and channels in the agent
agent_foo.sources = avro-collection-source
agent_foo.sinks = hdfs-sink
agent_foo.channels = mem-channel

# define the flow
agent_foo.sources.avro-collection-source.channels = mem-channel
agent_foo.sinks.hdfs-sink.channel = mem-channel

# avro source properties
agent_foo.sources.avro-collection-source.type = avro
agent_foo.sources.avro-collection-source.bind = 10.1.1.100
agent_foo.sources.avro-collection-source.port = 10000

# configure other pieces
```

Here we link the avro-forward-sink from the weblog agent to the avro-collection-source of the HDFS agent. This will result in the events coming from the external appserver source eventually getting stored in HDFS.

Fan out flow

As discussed in the previous section, Flume supports fanning out the flow from one source to multiple channels. There are two modes of fan out: replicating and multiplexing. In the replicating flow, the event is sent to all the configured channels. In case of multiplexing, the event is sent to only a subset of qualifying channels. To fan out the flow, one needs to specify a list of channels for a source and the policy for fanning it out. This is done by adding a channel "selector" that can be replicating or multiplexing. Then further specify the selection rules if it's a multiplexer. If you don't specify a selector, then by default it's replicating:

```
# list the sources, sinks and channels for the agent
<Agent>.sources = <Source1>
<Agent>.sinks = <Sink1> <Sink2>
<Agent>.channels = <Channel1> <Channel2>

# set list of channels for source (separated by space)
<Agent>.sources.<Source1>.channels = <Channel1> <Channel2>

# set channel for sinks
<Agent>.sinks.<Sink1>.channel = <Channel1>
<Agent>.sinks.<Sink2>.channel = <Channel2>

<Agent>.sources.<Source1>.selector.type = replicating
```

The multiplexing selector has a further set of properties to bifurcate the flow. This requires specifying a mapping of an event attribute to a set of channels. The selector checks for each configured attribute in the event header.
If it matches the specified value, then that event is sent to all the channels mapped to that value. If there's no match, then the event is sent to the set of channels configured as default:

```
# Mapping for multiplexing selector
<Agent>.sources.<Source1>.selector.type = multiplexing
<Agent>.sources.<Source1>.selector.header = <someHeader>
<Agent>.sources.<Source1>.selector.mapping.<Value1> = <Channel1>
<Agent>.sources.<Source1>.selector.mapping.<Value2> = <Channel1> <Channel2>
<Agent>.sources.<Source1>.selector.mapping.<Value3> = <Channel2>

<Agent>.sources.<Source1>.selector.default = <Channel2>
```

The mapping allows overlapping the channels for each value.

The following example has a single flow that is multiplexed to two paths. The agent named agent_foo has a single avro source and two channels linked to two sinks:

```
# list the sources, sinks and channels in the agent
agent_foo.sources = avro-AppSrv-source1
agent_foo.sinks = hdfs-Cluster1-sink1 avro-forward-sink2
agent_foo.channels = mem-channel-1 file-channel-2

# set channels for source
agent_foo.sources.avro-AppSrv-source1.channels = mem-channel-1 file-channel-2

# set channel for sinks
agent_foo.sinks.hdfs-Cluster1-sink1.channel = mem-channel-1
agent_foo.sinks.avro-forward-sink2.channel = file-channel-2

# channel selector configuration
agent_foo.sources.avro-AppSrv-source1.selector.type = multiplexing
agent_foo.sources.avro-AppSrv-source1.selector.header = State
agent_foo.sources.avro-AppSrv-source1.selector.mapping.CA = mem-channel-1
agent_foo.sources.avro-AppSrv-source1.selector.mapping.AZ = file-channel-2
agent_foo.sources.avro-AppSrv-source1.selector.mapping.NY = mem-channel-1 file-channel-2
agent_foo.sources.avro-AppSrv-source1.selector.default = mem-channel-1
```

The selector checks for a header called "State". If the value is "CA" then it's sent to mem-channel-1, if it's "AZ" then it goes to file-channel-2, or if it's "NY" then both.
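To make the routing rules above concrete, here is a small shell sketch that merely mirrors what the multiplexing selector does with the State header. The real selection happens inside the Flume agent; this function is an illustration only:

```shell
# Mirror of the multiplexing selector's mapping for the 'State' header:
# CA -> mem-channel-1, AZ -> file-channel-2, NY -> both, anything else -> default.
route() {
  case "$1" in
    CA) echo "mem-channel-1" ;;
    AZ) echo "file-channel-2" ;;
    NY) echo "mem-channel-1 file-channel-2" ;;
    *)  echo "mem-channel-1" ;;   # unset or unmatched header -> default channel
  esac
}

route CA   # -> mem-channel-1
route NY   # -> mem-channel-1 file-channel-2
route TX   # -> mem-channel-1 (falls through to the default)
```

Note the catch-all branch: just like the selector's default setting, it receives any event whose State header is missing or unmapped.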
If the "State" header is not set or doesn't match any of the three, then it goes to mem-channel-1, which is designated as default.

The selector also supports optional channels. To specify optional channels for a header, the config parameter 'optional' is used in the following way:

```
# channel selector configuration
agent_foo.sources.avro-AppSrv-source1.selector.type = multiplexing
agent_foo.sources.avro-AppSrv-source1.selector.header = State
agent_foo.sources.avro-AppSrv-source1.selector.mapping.CA = mem-channel-1
agent_foo.sources.avro-AppSrv-source1.selector.mapping.AZ = file-channel-2
agent_foo.sources.avro-AppSrv-source1.selector.mapping.NY = mem-channel-1 file-channel-2
agent_foo.sources.avro-AppSrv-source1.selector.optional.CA = mem-channel-1 file-channel-2
agent_foo.sources.avro-AppSrv-source1.selector.default = mem-channel-1
```

The selector will attempt to write to the required channels first and will fail the transaction if even one of these channels fails to consume the events. The transaction is reattempted on all of the channels. Once all required channels have consumed the events, then the selector will attempt to write to the optional channels. A failure by any of the optional channels to consume the event is simply ignored and not retried.

If there is an overlap between the optional channels and required channels for a specific header, the channel is considered to be required, and a failure in that channel will cause the entire set of required channels to be retried. For instance, in the above example, for the header "CA" mem-channel-1 is considered to be a required channel even though it is marked both as required and optional, and a failure to write to this channel will cause that event to be retried on all channels configured for the selector.
Note that if a header does not have any required channels, then the event will be written to the default channels and will be attempted to be written to the optional channels for that header. Specifying optional channels will still cause the event to be written to the default channels, if no required channels are specified. If no channels are designated as default and there are no required channels, the selector will attempt to write the events to the optional channels. Any failures are simply ignored in that case.

Flume Sources

Avro Source

Listens on Avro port and receives events from external Avro client streams. When paired with the built-in Avro Sink on another (previous hop) Flume agent, it can create tiered collection topologies. Required properties are in bold.

| Property Name | Default | Description |
|---|---|---|
| channels | – | |
| type | – | The component type name, needs to be avro |
| bind | – | hostname or IP address to listen on |
| port | – | Port # to bind to |
| threads | – | Maximum number of worker threads to spawn |
| selector.type | | |
| selector.* | | |
| interceptors | – | Space-separated list of interceptors |
| interceptors.* | | |
| compression-type | none | This can be "none" or "deflate". The compression-type must match the compression-type of the matching Avro peer |
| ssl | false | Set this to true to enable SSL encryption. You must also specify a "keystore" and a "keystore-password" |
| keystore | – | This is the path to a Java keystore file. Required for SSL |
| keystore-password | – | The password for the Java keystore. Required for SSL |
| keystore-type | JKS | The type of the Java keystore. This can be "JKS" or "PKCS12" |
| exclude-protocols | SSLv3 | Space-separated list of SSL/TLS protocols to exclude. SSLv3 will always be excluded in addition to the protocols specified |
| ipFilter | false | Set this to true to enable ipFiltering for netty |
| ipFilterRules | – | Define N netty ipFilter pattern rules with this config |

Example for agent named a1:

```
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4141
```

Example of ipFilterRules:

ipFilterRules defines N netty ipFilters separated by a comma. A pattern rule must be in this format:

```
<'allow' or 'deny'>:<'ip' or 'name' for computer name>:<pattern>
```

example: ipFilterRules=allow:ip:127.*,allow:name:localhost,deny:ip:*

Note that the first rule to match will apply, as the examples below show for a client on the localhost:

"allow:name:localhost,deny:ip:*" - this will allow the client on localhost and deny clients from any other ip.

"deny:name:localhost,allow:ip:*" - this will deny the client on localhost and allow clients from any other ip.

Thrift Source

Listens on Thrift port and receives events from external Thrift client streams. When paired with the built-in Thrift Sink on another (previous hop) Flume agent, it can create tiered collection topologies. Thrift source can be configured to start in secure mode by enabling kerberos authentication. agent-principal and agent-keytab are the properties used by the Thrift source to authenticate to the kerberos KDC. Required properties are in bold.

| Property Name | Default | Description |
|---|---|---|
| channels | – | |
| type | – | The component type name, needs to be thrift |
| bind | – | hostname or IP address to listen on |
| port | – | Port # to bind to |
| threads | – | Maximum number of worker threads to spawn |
| selector.type | | |
| selector.* | | |
| interceptors | – | Space separated list of interceptors |
| interceptors.* | | |
| ssl | false | Set this to true to enable SSL encryption. You must also specify a "keystore" and a "keystore-password" |
| keystore | – | This is the path to a Java keystore file. Required for SSL |
| keystore-password | – | The password for the Java keystore. Required for SSL |
| keystore-type | JKS | The type of the Java keystore. This can be "JKS" or "PKCS12" |
| exclude-protocols | SSLv3 | Space-separated list of SSL/TLS protocols to exclude. SSLv3 will always be excluded in addition to the protocols specified |
| kerberos | false | Set to true to enable kerberos authentication. In kerberos mode, agent-principal and agent-keytab are required for successful authentication. The Thrift source in secure mode will accept connections only from Thrift clients that have kerberos enabled and are successfully authenticated to the kerberos KDC |
| agent-principal | – | The kerberos principal used by the Thrift Source to authenticate to the kerberos KDC |
| agent-keytab | – | The keytab location used by the Thrift Source in combination with the agent-principal to authenticate to the kerberos KDC |

Example for agent named a1:

```
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = thrift
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4141
```

Exec Source

Exec source runs a given Unix command on start-up and expects that process to continuously produce data on standard out (stderr is simply discarded, unless property logStdErr is set to true). If the process exits for any reason, the source also exits and will produce no further data. This means configurations such as cat [named pipe] or tail -F [file] are going to produce the desired results, whereas date will probably not - the former two commands produce streams of data whereas the latter produces a single event and exits. Required properties are in bold.

| Property Name | Default | Description |
|---|---|---|
| channels | – | |
| type | – | The component type name, needs to be exec |
| command | – | The command to execute |
| shell | – | A shell invocation used to run the command, e.g. /bin/sh -c. Required only for commands relying on shell features like wildcards, back ticks, pipes etc. |
| restartThrottle | 10000 | Amount of time (in millis) to wait before attempting a restart |
| restart | false | Whether the executed cmd should be restarted if it dies |
| logStdErr | false | Whether the command's stderr should be logged |
| batchSize | 20 | The max number of lines to read and send to the channel at a time |
| batchTimeout | 3000 | Amount of time (in milliseconds) to wait, if the buffer size was not reached, before data is pushed downstream |
| selector.type | replicating | replicating or multiplexing |
| selector.* | | Depends on the selector.type value |
| interceptors | – | Space-separated list of interceptors |
| interceptors.* | | |

Warning: The problem with ExecSource and other asynchronous sources is that the source can not guarantee that if there is a failure to put the event into the Channel the client knows about it. In such cases, the data will be lost. For instance, one of the most commonly requested features is the tail -F [file]-like use case where an application writes to a log file on disk and Flume tails the file, sending each line as an event. While this is possible, there's an obvious problem: what happens if the channel fills up and Flume can't send an event? Flume has no way of indicating to the application writing the log file that it needs to retain the log or that the event hasn't been sent, for some reason. If this doesn't make sense, you need only know this: your application can never guarantee data has been received when using a unidirectional asynchronous interface such as ExecSource! As an extension of this warning - and to be completely clear - there is absolutely zero guarantee of event delivery when using this source. For stronger reliability guarantees, consider the Spooling Directory Source or direct integration with Flume via the SDK.

Note: You can use ExecSource to emulate TailSource from Flume 0.9x (flume og). Just use the unix command tail -F /full/path/to/your/file. Parameter -F is better in this case than -f as it will also follow file rotation.

Example for agent named a1:

```
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/secure
a1.sources.r1.channels = c1
```

The 'shell' config is used to invoke the 'command' through a command shell (such as Bash or Powershell). The 'command' is passed as an argument to 'shell' for execution. This allows the 'command' to use features from the shell such as wildcards, back ticks, pipes, loops, conditionals etc. In the absence of the 'shell' config, the 'command' will be invoked directly. Common values for 'shell': '/bin/sh -c', '/bin/ksh -c', 'cmd /c', 'powershell -Command', etc.

```
a1.sources.tailsource-1.type = exec
a1.sources.tailsource-1.shell = /bin/bash -c
a1.sources.tailsource-1.command = for i in /path/*.txt; do cat $i; done
```

JMS Source

JMS Source reads messages from a JMS destination such as a queue or topic. Being a JMS application it should work with any JMS provider but has only been tested with ActiveMQ. The JMS source provides configurable batch size, message selector, user/pass, and message to flume event converter. Note that the vendor provided JMS jars should be included in the Flume classpath using the plugins.d directory (preferred), -classpath on the command line, or via the FLUME_CLASSPATH variable in flume-env.sh. Required properties are in bold.

| Property Name | Default | Description |
|---|---|---|
| channels | – | |
| type | – | The component type name, needs to be jms |
| initialContextFactory | – | Initial Context Factory, e.g: org.apache.activemq.jndi.ActiveMQInitialContextFactory |
| connectionFactory | – | The JNDI name the connection factory should appear as |
| providerURL | – | The JMS provider URL |
| destinationName | – | Destination name |
| destinationType | – | Destination type (queue or topic) |
| messageSelector | – | Message selector to use when creating the consumer |
| userName | – | Username for the destination/provider |
| passwordFile | – | File containing the password for the destination/provider |
| batchSize | 100 | Number of messages to consume in one batch |
| converter.type | DEFAULT | Class to use to convert messages to flume events. See below |
| converter.* | – | Converter properties |
| converter.charset | UTF-8 | Default converter only. Charset to use when converting JMS TextMessages to byte arrays |

Converter

The JMS source allows pluggable converters, though it's likely the default converter will work for most purposes. The default converter is able to convert Bytes, Text, and Object messages to Flume Events. In all cases, the properties in the message are added as headers to the Flume Event.

BytesMessage: Bytes of message are copied to the body of the Flume Event. Cannot convert more than 2GB of data per message.

TextMessage: Text of message is converted to a byte array and copied to the body of the Flume Event. The default converter uses UTF-8 by default but this is configurable.

ObjectMessage: Object is written out to a ByteArrayOutputStream wrapped in an ObjectOutputStream and the resulting array is copied to the body of the Flume Event.

Example for agent named a1:

```
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = jms
a1.sources.r1.channels = c1
a1.sources.r1.initialContextFactory = org.apache.activemq.jndi.ActiveMQInitialContextFactory
a1.sources.r1.connectionFactory = GenericConnectionFactory
a1.sources.r1.providerURL = tcp://mqserver:61616
a1.sources.r1.destinationName = BUSINESS_DATA
a1.sources.r1.destinationType = QUEUE
```

Spooling Directory Source

This source lets you ingest data by placing files to be ingested into a "spooling" directory on disk. This source will watch the specified directory for new files, and will parse events out of new files as they appear. The event parsing logic is pluggable. After a given file has been fully read into the channel, it is renamed to indicate completion (or optionally deleted).
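The full property table and examples for this source fall outside this preview. As a sketch only, a minimal spooling-directory configuration follows the same pattern as the sources above; the directory path and component names here are illustrative:

```properties
a1.channels = ch-1
a1.sources = src-1
a1.sources.src-1.type = spooldir
a1.sources.src-1.channels = ch-1
# Directory to watch for new files (must exist and be readable by the agent)
a1.sources.src-1.spoolDir = /var/log/apache/flumeSpool
# Attach a header with the absolute path of the originating file
a1.sources.src-1.fileHeader = true
```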
