## mumu-morphlines: a data transformation tool
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://github.com/mumuhadoop/mumu-morphlines/blob/master/LICENSE)
[![Build Status](https://travis-ci.org/mumuhadoop/mumu-morphlines.svg?branch=master)](https://travis-ci.org/mumuhadoop/mumu-morphlines)
[![codecov](https://codecov.io/gh/mumuhadoop/mumu-morphlines/branch/master/graph/badge.svg)](https://codecov.io/gh/mumuhadoop/mumu-morphlines)
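
mumu-morphlines is a test project for Kite Morphlines, written mainly to explore and learn how Kite Morphlines is used and how it works. Morphlines is a data transformation toolkit for extract, transform, load (ETL) work, for example extracting fields from log data. It can also be combined with Flume, Hadoop, and Solr to turn unstructured data into structured data and store the result in Solr, where clients can search it.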
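## Kite Morphline
Kite Morphline is a distribution of Morphline that applies it to data processing beyond Search, and it ships with a rich set of libraries, tools, examples, and documentation.

Kite Morphline supports:
- Flume events
- HDFS files
- Spark RDDs
- RDBMS tables
- Avro objects

It has already been applied to Crunch, HBase, Impala, Pig, Hive, Sqoop, and others.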
### Key Morphline concepts
- Commands are plugins to a morphline that perform tasks such as loading, parsing, transforming, or otherwise processing a single record.
- A Record is an in-memory data structure of name-value pairs with optional blob attachments or POJO attachments.
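
To make the two concepts concrete, here is a minimal sketch in plain Java. It does not use the Kite API: `SimpleRecord`, `SimpleCommand`, `ADD_SOURCE`, `LOWER_MESSAGE`, and `pipe` are hypothetical names invented for illustration, and blob or POJO attachments are left out. In Kite each command hands the record to its child command, which the loop in `pipe` only approximates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustration only: hypothetical types that mimic the two concepts, not the Kite API.
class RecordCommandSketch {

    /** A record: each field name maps to a list of values. */
    static final class SimpleRecord {
        private final Map<String, List<Object>> fields = new LinkedHashMap<>();

        void put(String name, Object value) {
            fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }

        List<Object> get(String name) {
            List<Object> values = fields.get(name);
            return values == null ? new ArrayList<>() : values;
        }
    }

    /** A command processes a single record and returns false on failure. */
    interface SimpleCommand {
        boolean process(SimpleRecord record);
    }

    /** Adds a constant field to every record. */
    static final SimpleCommand ADD_SOURCE = record -> {
        record.put("source", "syslog");
        return true;
    };

    /** Lower-cases every value of the "message" field; fails if the field is missing. */
    static final SimpleCommand LOWER_MESSAGE = record -> {
        List<Object> values = record.get("message");
        if (values.isEmpty()) {
            return false;
        }
        values.replaceAll(v -> v.toString().toLowerCase(Locale.ROOT));
        return true;
    };

    /** Runs the commands in order and stops at the first one that fails. */
    static SimpleCommand pipe(List<SimpleCommand> commands) {
        return record -> {
            for (SimpleCommand command : commands) {
                if (!command.process(record)) {
                    return false;
                }
            }
            return true;
        };
    }

    public static void main(String[] args) {
        SimpleRecord record = new SimpleRecord();
        record.put("message", "Hello World");
        boolean ok = pipe(Arrays.asList(ADD_SOURCE, LOWER_MESSAGE)).process(record);
        System.out.println(ok + " " + record.get("message") + " " + record.get("source"));
    }
}
```

The point of the sketch is the shape of the contract: a command receives one record, may read or modify its fields, and reports success or failure with a boolean.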
### Morphline workflow
![Morphline workflow](https://raw.githubusercontent.com/mumuhadoop/mumu-morphlines/master/doc/images/morphine_process.jpg)
### Morphline's place in the Hadoop ecosystem
![Morphline's place in the Hadoop ecosystem](https://raw.githubusercontent.com/mumuhadoop/mumu-morphlines/master/doc/images/morphine_architecture.jpg)
## Morphline example
### Morphline configuration file
```
morphlines: [
  {
    id: morphline1
    importCommands: ["org.kitesdk.**", "org.apache.solr.**"]
    commands: [
      {
        readLine {
          charset: UTF-8
        }
      }
      {
        grok {
          dictionaryFiles: [src/test/resources/grok-dictionaries]
          expressions: {
            message: """<%{POSINT:priority}>%{SYSLOGTIMESTAMP:timestamp} %{SYSLOGHOST:hostname} %{DATA:program}(?:\[%{POSINT:pid}\])?: %{GREEDYDATA:msg}"""
          }
        }
      }
      {
        convertTimestamp {
          field: timestamp
          inputFormats: ["yyyy-MM-dd'T'HH:mm:ss'Z'", "MMM d HH:mm:ss"]
          inputTimezone: America/Los_Angeles
          outputFormat: "yyyy-MM-dd HH:mm:ss"
          outputTimezone: UTC
        }
      }
      {logInfo {format: "output record: {}", args: ["@{}"]}}
    ]
  }
]
```
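
To show roughly what the `grok` and `convertTimestamp` steps in this configuration do to one line, the following plain-Java sketch uses a hand-written regular expression and `java.time`. It is an approximation under stated assumptions, not Kite's implementation: the regex is a simplified stand-in for the dictionary patterns, the class name `SyslogSketch`, the methods `parse` and `toUtc`, the sample log line, and the `assumedYear` parameter (syslog timestamps carry no year) are all introduced here for illustration.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: a hand-written approximation, not Kite's grok or convertTimestamp code.
class SyslogSketch {

    // Simplified stand-in for the grok expression; the real dictionary patterns are stricter.
    private static final Pattern SYSLOG = Pattern.compile(
        "<(?<priority>\\d+)>"
        + "(?<timestamp>[A-Z][a-z]{2} +\\d{1,2} \\d{2}:\\d{2}:\\d{2}) "
        + "(?<hostname>\\S+) "
        + "(?<program>.*?)(?:\\[(?<pid>\\d+)\\])?: "
        + "(?<msg>.*)");

    private static final String[] FIELDS =
        {"priority", "timestamp", "hostname", "program", "pid", "msg"};

    /** Returns the extracted fields, or an empty map if the line does not match. */
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = SYSLOG.matcher(line);
        if (m.matches()) {
            for (String name : FIELDS) {
                String value = m.group(name);
                if (value != null) { // pid is optional and may be absent
                    fields.put(name, value);
                }
            }
        }
        return fields;
    }

    /** Converts a year-less "MMM d HH:mm:ss" timestamp from Los Angeles time to UTC. */
    static String toUtc(String timestamp, int assumedYear) {
        DateTimeFormatter input = new DateTimeFormatterBuilder()
            .appendPattern("MMM d HH:mm:ss")
            .parseDefaulting(ChronoField.YEAR, assumedYear)
            .toFormatter(Locale.ENGLISH);
        // Syslog pads single-digit days with an extra space; collapse it before parsing.
        String normalized = timestamp.trim().replaceAll(" +", " ");
        LocalDateTime local = LocalDateTime.parse(normalized, input);
        return local.atZone(ZoneId.of("America/Los_Angeles"))
            .withZoneSameInstant(ZoneOffset.UTC)
            .format(DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss"));
    }

    public static void main(String[] args) {
        // Sample input made up for this sketch.
        String line = "<164>Feb  4 10:46:14 syslog sshd[607]: listening on 0.0.0.0 port 22.";
        Map<String, String> fields = parse(line);
        fields.put("timestamp", toUtc(fields.get("timestamp"), 2018));
        System.out.println(fields);
    }
}
```

The sketch covers only the second `inputFormats` entry; how Kite itself picks a year for year-less timestamps is not shown here.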
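### Morphline test code
```
import java.io.File;
import org.junit.Test;
import org.kitesdk.morphline.api.Command;
import org.kitesdk.morphline.api.MorphlineContext;
import org.kitesdk.morphline.api.Record;
import org.kitesdk.morphline.base.Compiler;
import org.kitesdk.morphline.base.Fields;

public class BasicMorphlineTest {
    @Test
    public void sysLogTest() {
        MorphlineContext context = new MorphlineContext.Builder().build();
        File configFile = new File(BasicMorphlineTest.class.getResource("/morphlines/syslog.conf").getPath());
        // A null morphline id selects the first morphline defined in the file.
        Command morphline = new Compiler().compile(configFile, null, context, null);
        Record record = new Record();
        // The readLine command reads its input from the attachment body.
        record.put(Fields.ATTACHMENT_BODY, BasicMorphlineTest.class.getResourceAsStream("/log/syslog.log"));
        boolean process = morphline.process(record);
        System.out.println(process);
    }
}
```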
## Related reading
- [Hadoop official documentation](http://hadoop.apache.org)
- [Kite: A Data API for Hadoop](http://kitesdk.org/docs/current/)
- [Morphlines Introduction](http://kitesdk.org/docs/1.1.0/morphlines/)
## Contact
The views above are purely personal; if you see things differently, corrections are welcome.
- email: <babymm@aliyun.com>
- github: [https://github.com/babymm](https://github.com/babymm)