<!--
Copyright 2019 Google LLC
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# Oozie to Airflow
[![Build Status](https://travis-ci.org/GoogleCloudPlatform/oozie-to-airflow.svg?branch=master)](https://travis-ci.org/GoogleCloudPlatform/oozie-to-airflow)
[![codecov](https://codecov.io/gh/GoogleCloudPlatform/oozie-to-airflow/branch/master/graph/badge.svg)](https://codecov.io/gh/GoogleCloudPlatform/oozie-to-airflow)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/ambv/black)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Dependabot Status](https://api.dependabot.com/badges/status?host=github&repo=GoogleCloudPlatform/oozie-to-airflow)](https://dependabot.com)
[![Python 3](https://pyup.io/repos/github/GoogleCloudPlatform/oozie-to-airflow/python-3-shield.svg)](https://pyup.io/repos/github/GoogleCloudPlatform/oozie-to-airflow/)
A tool to easily convert [Apache Oozie](http://oozie.apache.org/) workflows
to [Apache Airflow](https://airflow.apache.org) workflows.
The program targets Apache Airflow >= 1.10 and the Apache Oozie 1.0 XML schema.
If you want to contribute to the project, please take a look at [CONTRIBUTING.md](CONTRIBUTING.md).
# Table of Contents
<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
- [Background](#background)
- [Running the Program](#running-the-program)
  - [Installing from PyPi](#installing-from-pypi)
  - [Installing from sources](#installing-from-sources)
  - [Running the conversion](#running-the-conversion)
  - [Structure of the application folder](#structure-of-the-application-folder)
  - [The o2a libraries](#the-o2a-libraries)
- [Supported Oozie features](#supported-oozie-features)
  - [Control nodes](#control-nodes)
  - [EL Functions](#el-functions)
  - [Workflow and node notifications](#workflow-and-node-notifications)
  - [Airflow-specific optimisations](#airflow-specific-optimisations)
    - [Removing unnecessary control nodes](#removing-unnecessary-control-nodes)
    - [Removing inaccessible nodes](#removing-inaccessible-nodes)
- [Common Known Limitations](#common-known-limitations)
  - [File/Archive functionality](#filearchive-functionality)
  - [Not all global configuration methods are supported](#not-all-global-configuration-methods-are-supported)
  - [Support for uber.jar feature](#support-for-uberjar-feature)
  - [Support for .so and .jar lib files](#support-for-so-and-jar-lib-files)
  - [Custom messages missing for Kill Node](#custom-messages-missing-for-kill-node)
  - [Capturing output is not supported](#capturing-output-is-not-supported)
  - [Subworkflow DAGs must be placed in examples](#subworkflow-dags-must-be-placed-in-examples)
  - [EL functions support](#el-functions-support)
  - [Notification proxy is not supported](#notification-proxy-is-not-supported)
- [Cloud execution environment for Oozie to Airflow conversion](#cloud-execution-environment-for-oozie-to-airflow-conversion)
  - [Cloud environment setup](#cloud-environment-setup)
- [Examples](#examples)
  - [EL Example](#el-example)
  - [SSH Example](#ssh-example)
  - [Email Example](#email-example)
  - [MapReduce Example](#mapreduce-example)
  - [FS Example](#fs-example)
  - [Java Example](#java-example)
  - [Pig Example](#pig-example)
  - [Shell Example](#shell-example)
  - [Spark Example](#spark-example)
  - [Sub-workflow Example](#sub-workflow-example)
  - [DistCp Example](#distcp-example)
  - [Decision Example](#decision-example)
  - [Hive/Hive2 Example](#hivehive2-example)
  - [Demo Example](#demo-example)
  - [Childwf Example](#childwf-example)
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
# Background
Apache Airflow is a workflow management system developed by Airbnb in 2014.
It is a platform to programmatically author, schedule, and monitor workflows.
Airflow workflows are designed as [Directed Acyclic Graphs](https://airflow.apache.org/tutorial.html#example-pipeline-definition)
(DAGs) of tasks in Python. The Airflow scheduler executes your tasks on an array of
workers while following the specified dependencies.
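
To make "following the specified dependencies" concrete, here is a minimal, hypothetical sketch (not Airflow's scheduler) that computes an order in which every task runs only after all of its upstream tasks, using Kahn's topological sort. The task names and the `execution_order` helper are invented for illustration. Airflow itself runs independent tasks in parallel across workers; this sketch only produces one valid sequential order.

```python
from collections import deque
from typing import Dict, List, Set


def execution_order(downstream: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which each task comes after all of its upstream tasks.

    `downstream` maps a task id to the ids of tasks that depend on it, similar
    to declaring edges with set_downstream(). Raises ValueError on a cycle,
    i.e. when the graph is not a DAG.
    """
    # Collect every task, including ones that only appear as a downstream target.
    tasks = set(downstream)
    for targets in downstream.values():
        tasks |= targets

    # Count how many upstream tasks each task is still waiting for.
    pending = {task: 0 for task in tasks}
    for targets in downstream.values():
        for target in targets:
            pending[target] += 1

    # Tasks without upstream dependencies can start immediately (sorted for determinism).
    ready = deque(sorted(task for task, count in pending.items() if count == 0))
    order: List[str] = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for target in sorted(downstream.get(task, ())):
            pending[target] -= 1
            if pending[target] == 0:
                ready.append(target)

    if len(order) != len(tasks):
        raise ValueError("dependency graph contains a cycle")
    return order


pipeline = {"extract": {"transform"}, "transform": {"load", "report"}}
print(execution_order(pipeline))  # ['extract', 'transform', 'load', 'report']
```
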
Apache Oozie is a workflow scheduler system to manage Apache Hadoop jobs.
Oozie workflows are also designed as [Directed Acyclic Graphs](https://oozie.apache.org/docs/3.1.3-incubating/DG_Overview.html)
(DAGs) in XML.
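
In Oozie, the edges of that DAG are written as transitions between XML nodes: `<start to="..."/>` plus an `<ok to="..."/>` and an `<error to="..."/>` inside each action. As a rough illustration (not o2a's own parser), the sketch below uses the standard library's `xml.etree.ElementTree` to list those transitions for a small, made-up workflow in the Oozie 1.0 namespace (`uri:oozie:workflow:1.0`). The workflow and the `transitions` helper are hypothetical; this edge list is the kind of information a converter has to turn into Airflow task dependencies.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# XML namespace of Oozie 1.0 workflow definitions.
NS = {"wf": "uri:oozie:workflow:1.0"}

# Hypothetical two-action workflow: create-dir -> copy-data, both failing to "fail".
WORKFLOW_XML = """
<workflow-app xmlns="uri:oozie:workflow:1.0" name="demo-wf">
    <start to="create-dir"/>
    <action name="create-dir">
        <fs><mkdir path="/tmp/demo"/></fs>
        <ok to="copy-data"/>
        <error to="fail"/>
    </action>
    <action name="copy-data">
        <fs><move source="/tmp/in" target="/tmp/demo/in"/></fs>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Workflow failed</message>
    </kill>
    <end name="end"/>
</workflow-app>
"""


def transitions(xml_text: str) -> List[Tuple[str, str, str]]:
    """Return (source, target, kind) triples for start, ok and error transitions."""
    root = ET.fromstring(xml_text.strip())
    edges = []
    start = root.find("wf:start", NS)
    if start is not None:
        edges.append(("start", start.get("to"), "start"))
    # Actions are visited in document order; each may declare an ok and an error target.
    for action in root.findall("wf:action", NS):
        name = action.get("name")
        for kind in ("ok", "error"):
            node = action.find(f"wf:{kind}", NS)
            if node is not None:
                edges.append((name, node.get("to"), kind))
    return edges


for edge in transitions(WORKFLOW_XML):
    print(edge)
```
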
The main differences between the two are summarized below:

| | Specification | Task | Dependencies | "Subworkflows" | Parameterization | Notification |
|---------|---------------|-------------|---------------------------------|----------------|------------------------------|---------------------|
| Oozie | XML | Action Node | Control Node | Subworkflow | EL functions/Properties file | URL-based callbacks |
| Airflow | Python | Operators | Trigger Rules, set_downstream() | SubDag | Jinja2 and macros | Callbacks/Emails |
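
To illustrate the parameterization row: Oozie workflows reference values from the properties file with EL expressions such as `${nameNode}`, while Airflow renders Jinja2 templates. The following is a deliberately simplified, hypothetical sketch, not how o2a's EL parser works and not its output format. It rewrites bare property references into Jinja2 lookups on an assumed `params` dictionary and leaves EL function calls such as `${wf:user()}` untouched.

```python
import re

# Matches bare property references like ${nameNode} or ${oozie.wf.application.path}.
# EL function calls such as ${wf:user()} do not match, because ':' and '(' are excluded.
PROPERTY_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_.]*)\}")


def el_props_to_jinja(text: str) -> str:
    """Rewrite ${prop} references as Jinja2 lookups into a hypothetical `params` dict."""
    return PROPERTY_REF.sub(lambda match: "{{ params['%s'] }}" % match.group(1), text)


print(el_props_to_jinja("${nameNode}/user/${wf:user()}/${examplesRoot}"))
```

A real converter also has to translate EL functions, operators and nested expressions, which is why the function call above is left as-is.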
# Running the Program
Note that you need Python >= 3.6 to run the converter.
## Installing from PyPi
You can install `o2a` from PyPI via `pip install o2a`. After installation, the
[o2a](bin/o2a) and [o2a-validate-workflows](bin/o2a-validate-workflows) scripts should be available on your PATH.
## Installing from sources
1. (Optional) Install virtualenv:

   If you use the `o2a` sources, you can set up the environment in a virtualenv
   (you can create one using [virtualenvwrapper](https://virtualenvwrapper.readthedocs.io/en/latest/),
   for example).

2. Install Oozie-to-Airflow. There are two options:
   1. Automatically: install `o2a` from the local folder using `pip install -e .`
      This takes care of, among other things, adding the [bin](bin) subdirectory to the PATH.
   2. Manually:
      1. While in your virtualenv, install all the requirements via `pip install -r requirements.txt`.
      2. Add the [bin](bin) subdirectory to your PATH, so that all the scripts below can be run
         without the `./bin` prefix. You can do this, for example, by adding a line similar to
         the one below to your `.bash_profile` or to `bin/postactivate` in your virtual environment:

         ```bash
         export PATH=${PATH}:<INSERT_PATH_TO_YOUR_OOZIE_PROJECT>/bin
         ```

         Otherwise, you need to run all the scripts from the bin subdirectory, for example:

         ```bash
         ./bin/o2a --help
         ```
All the example commands below assume that the [bin](bin) scripts are on your PATH,
whether `o2a` was installed from PyPI or from the sources.
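
If you want to verify that assumption from Python, a small helper built on the standard library's `shutil.which` is enough. This is an illustrative sketch: `missing_scripts` is a hypothetical name and is not part of o2a, and the script list only covers the two scripts mentioned above.

```python
import shutil
from typing import List, Optional, Sequence

# Scripts that this README expects to be on the PATH after installation.
O2A_SCRIPTS = ("o2a", "o2a-validate-workflows")


def missing_scripts(names: Sequence[str] = O2A_SCRIPTS, path: Optional[str] = None) -> List[str]:
    """Return the names that are not found as executables on `path`.

    When `path` is None, shutil.which() searches the PATH environment variable.
    """
    return [name for name in names if shutil.which(name, path=path) is None]


if __name__ == "__main__":
    missing = missing_scripts()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All o2a scripts are on the PATH.")
```
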
## Running the conversion
You can run the program by calling:
`o2a -i <INPUT_APPLICATION_FOLDER> -o <OUTPUT_FOLDER_PATH>`
Example:
`o2a -i examples/demo -o output/demo`
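
If you drive conversions from Python (for example, to convert many application folders in a loop), a minimal sketch like the one below assembles the command line from the flags listed in the `o2a -h` output and runs it with `subprocess`. The helper names are hypothetical, the sketch assumes `o2a` is on your PATH, the `-d` flag is left out because its meaning is not described here, and the values accepted by `-s` and `-v` should be checked against `o2a -h`.

```python
import subprocess
from typing import List, Optional


def build_o2a_command(
    input_dir: str,
    output_dir: str,
    dag_name: Optional[str] = None,
    user: Optional[str] = None,
    start_days_ago: Optional[int] = None,
    schedule_interval: Optional[str] = None,
) -> List[str]:
    """Assemble an o2a argument list; optional flags are added only when set."""
    cmd = ["o2a", "-i", input_dir, "-o", output_dir]
    optional = [("-n", dag_name), ("-u", user), ("-s", start_days_ago), ("-v", schedule_interval)]
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


def run_o2a(input_dir: str, output_dir: str, **options) -> None:
    """Run the conversion; raises subprocess.CalledProcessError if o2a exits non-zero."""
    subprocess.run(build_o2a_command(input_dir, output_dir, **options), check=True)
```

For example, `build_o2a_command("examples/demo", "output/demo", dag_name="demo")` returns `['o2a', '-i', 'examples/demo', '-o', 'output/demo', '-n', 'demo']`. Passing a list to `subprocess.run` avoids shell quoting issues with paths that contain spaces.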
The full usage guide is available by running `o2a -h`:
```
usage: o2a [-h] -i INPUT_DIRECTORY_PATH -o OUTPUT_DIRECTORY_PATH [-n DAG_NAME]
           [-u USER] [-s START_DAYS_AGO] [-v SCHEDULE_INTERVAL] [-d]

Convert Apache Oozie workflows to Apache Airflow workflows.

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT_DIRECTORY_PATH, --input-directory-path INPUT_DIRECTORY_PATH
                        Path to input directory
  -o OUTPUT_DIRECTORY_PATH, --output-directory-path OUTPUT_DIRECTORY_PATH
                        Desired output directory
  -n DAG_NAME, --dag-name DAG
```