# Apache SeaTunnel (Incubating)
<img src="https://seatunnel.apache.org/image/logo.png" alt="seatunnel logo" height="200px" align="right" />
[![Backend Workflow](https://github.com/apache/incubator-seatunnel/actions/workflows/backend.yml/badge.svg?branch=dev)](https://github.com/apache/incubator-seatunnel/actions/workflows/backend.yml)
[![Slack](https://img.shields.io/badge/slack-%23seatunnel-4f8eba?logo=slack)](https://join.slack.com/t/apacheseatunnel/shared_invite/zt-123jmewxe-RjB_DW3M3gV~xL91pZ0oVQ)
[![Twitter Follow](https://img.shields.io/twitter/follow/ASFSeaTunnel.svg?label=Follow&logo=twitter)](https://twitter.com/ASFSeaTunnel)

---

[![EN doc](https://img.shields.io/badge/document-English-blue.svg)](README.md)

SeaTunnel was formerly named Waterdrop and was renamed SeaTunnel on October 12, 2021.

---
SeaTunnel is an easy-to-use, ultra-high-performance distributed data integration platform that supports real-time
synchronization of massive amounts of data. It can stably and efficiently synchronize tens of billions of records every
day, and has been used in production by nearly 100 companies.
## Why do we need SeaTunnel
SeaTunnel focuses on data integration and data synchronization, and is mainly designed to solve common problems in the field of data integration:
- Various data sources: There are hundreds of commonly used data sources, and their versions are often incompatible with one another. As new technologies emerge, more data sources keep appearing, and it is difficult for users to find a tool that fully and quickly supports all of them.
- Complex synchronization scenarios: Data synchronization needs to support many scenarios, such as offline full synchronization, offline incremental synchronization, CDC, real-time synchronization, and full-database synchronization.
- High resource demands: Existing data integration and synchronization tools often require large amounts of computing resources or JDBC connections to synchronize massive numbers of small tables in real time, which increases the burden on enterprises.
- Lack of quality control and monitoring: Data is often lost or duplicated during integration and synchronization. The synchronization process lacks monitoring, so it is impossible to see intuitively what is actually happening to the data while a task runs.
- Complex technology stack: Enterprises use different technology components, and users need to develop a separate synchronization program for each component to complete data integration.
- Difficult management and maintenance: Because they are tied to different underlying technology components (Flink/Spark), offline and real-time synchronization often have to be developed and managed separately, which makes management and maintenance harder.
## Features of SeaTunnel
- Rich and extensible connectors: SeaTunnel provides a Connector API that does not depend on a specific execution engine. Connectors (Source, Transform, Sink) developed against this API can run on many different engines, such as the currently supported SeaTunnel Engine, Flink, and Spark.
- Connector plugins: The plugin design allows users to easily develop their own connectors and integrate them into the SeaTunnel project. SeaTunnel currently supports more than 70 connectors, and the number is growing quickly. Here is the list of connectors we [support and plan to support](https://github.com/apache/incubator-seatunnel/issues/3018).
- Batch-stream integration: Connectors developed against the SeaTunnel Connector API work across offline synchronization, real-time synchronization, full synchronization, incremental synchronization, and other scenarios, which greatly reduces the difficulty of managing data integration tasks.
- Data consistency: SeaTunnel supports a distributed snapshot algorithm to ensure data consistency.
- Multi-engine support: SeaTunnel uses SeaTunnel Engine for data synchronization by default. SeaTunnel also supports using Flink or Spark as the execution engine for connectors, so it can fit into an enterprise's existing technology stack. In addition, SeaTunnel supports multiple versions of Spark and Flink.
- JDBC multiplexing and multi-table database log parsing: SeaTunnel supports multi-table and whole-database synchronization, which avoids opening too many JDBC connections. It also supports reading and parsing logs for multiple tables or a whole database at once, which avoids reading and parsing the same logs repeatedly in multi-table CDC synchronization scenarios.
- High throughput and low latency: SeaTunnel supports parallel reading and writing, providing stable and reliable data synchronization with high throughput and low latency.
- Comprehensive real-time monitoring: SeaTunnel provides detailed monitoring information for each step of the data synchronization process, so users can easily see the number of records, data size, QPS, and other metrics for the data read and written by a synchronization task.
- Two job development methods: coding and canvas design. The SeaTunnel web project https://github.com/apache/incubator-seatunnel-web provides visual job management, scheduling, running, and monitoring.
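
The distributed snapshot idea behind the consistency guarantee can be illustrated with a minimal, single-process Python sketch. It assumes one source, one sink, and a single FIFO channel between them, so no barrier alignment across multiple inputs is needed; `Barrier`, `Sink`, and `run_job` are hypothetical names, and this is not SeaTunnel's checkpoint implementation. A barrier is injected into the stream after every `checkpoint_every` records: the source records its read offset at injection time, and the sink commits everything that arrived ahead of the barrier. After a simulated crash, the source rewinds to the offset of the last completed checkpoint and the sink drops its uncommitted rows, so each record ends up committed exactly once.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Barrier:
    """Hypothetical checkpoint marker that travels through the channel with the records."""
    checkpoint_id: int


class Sink:
    """Hypothetical sink: buffers rows and makes them visible only when a checkpoint completes."""

    def __init__(self):
        self.committed = []  # rows visible in the target system
        self.pending = []    # rows received since the last barrier

    def write(self, row):
        self.pending.append(row)

    def on_barrier(self):
        # Every row ahead of the barrier belongs to this checkpoint: commit it.
        self.committed.extend(self.pending)
        self.pending = []

    def abort(self):
        # On failure, rows after the last completed checkpoint are discarded.
        self.pending = []


def run_job(data, checkpoint_every, crash_at=None):
    """Copy `data` to a Sink, injecting a barrier every `checkpoint_every` reads.

    If `crash_at` is given, simulate one failure right after that many reads
    and recover from the last completed checkpoint.
    """
    offset = 0          # source state: index of the next record to read
    sink = Sink()
    channel = deque()   # FIFO channel between source and sink
    in_flight = {}      # checkpoint id -> source offset when the barrier was sent
    restore_offset = 0  # source offset of the last completed checkpoint
    next_id = reads = 0

    while offset < len(data) or channel:
        if offset < len(data):
            channel.append(data[offset])
            offset += 1
            reads += 1
            if reads % checkpoint_every == 0:
                # Source snapshots its offset, then sends the barrier downstream.
                next_id += 1
                in_flight[next_id] = offset
                channel.append(Barrier(next_id))
        item = channel.popleft()
        if isinstance(item, Barrier):
            sink.on_barrier()
            # The sink has processed the barrier: this checkpoint is complete.
            restore_offset = in_flight.pop(item.checkpoint_id)
        else:
            sink.write(item)
        if crash_at is not None and reads == crash_at:
            crash_at = None          # fail only once
            channel.clear()          # in-flight records and barriers are lost
            in_flight.clear()
            sink.abort()
            offset = restore_offset  # rewind the source to the last snapshot
    sink.on_barrier()  # a bounded job commits the remaining rows when it ends
    return sink.committed
```

For example, `run_job(list(range(10)), 3, crash_at=5)` returns the same list as a run without a crash. Replaying the source from the beginning without this bookkeeping would instead re-deliver rows that had already been written.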
## SeaTunnel work flowchart
![SeaTunnel work flowchart](docs/en/images/architecture_diagram.png)
The runtime process of SeaTunnel is shown in the figure above.
The user configures the job information and selects the execution engine to submit the job.
The Source connector reads the data in parallel and sends it downstream to the Transform or directly to the Sink, and the Sink writes the data to the destination. Note that Source, Transform, and Sink connectors can all be developed and extended by users.
The default engine used by SeaTunnel is [SeaTunnel Engine](seatunnel-engine/README.md). If you choose the Flink or Spark engine instead, SeaTunnel packages the connector into a Flink or Spark program and submits it to Flink or Spark to run.
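
The runtime flow described above can be sketched in a few lines of Python. This is a conceptual illustration only: SeaTunnel's real Connector API is written in Java, and `ListSource`, `CollectSink`, `upper_name`, `SequentialEngine`, and `ThreadPoolEngine` are hypothetical stand-ins. The source splits its input into parallel subtasks, each subtask pushes rows through the transforms into its own sink writer, and the engine only decides how the subtasks are scheduled, which is why the same connector objects run unchanged on either toy engine.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ListSource:
    """Hypothetical source: splits a list of rows into `parallelism` parts."""

    def __init__(self, rows, parallelism):
        self.rows = rows
        self.parallelism = parallelism

    def splits(self):
        # Round-robin partitioning; each split becomes one parallel subtask.
        return [self.rows[i::self.parallelism] for i in range(self.parallelism)]


class CollectSink:
    """Hypothetical sink: gives every subtask its own writer (a list here)."""

    def __init__(self):
        self.writers = {}
        self._lock = threading.Lock()

    def create_writer(self, subtask):
        with self._lock:
            return self.writers.setdefault(subtask, [])

    def rows(self):
        return [row for writer in self.writers.values() for row in writer]


def upper_name(row):
    """Hypothetical transform: upper-cases the `name` field."""
    return {**row, "name": row["name"].upper()}


def run_subtask(subtask, split, transforms, sink):
    # Source split -> each transform in order -> this subtask's sink writer.
    writer = sink.create_writer(subtask)
    for row in split:
        for transform in transforms:
            row = transform(row)
        writer.append(row)


class SequentialEngine:
    """Toy engine: runs the subtasks one after another in the calling thread."""

    def submit(self, source, transforms, sink):
        for subtask, split in enumerate(source.splits()):
            run_subtask(subtask, split, transforms, sink)


class ThreadPoolEngine:
    """Toy engine: runs the subtasks concurrently, one thread per split."""

    def submit(self, source, transforms, sink):
        splits = source.splits()
        with ThreadPoolExecutor(max_workers=len(splits)) as pool:
            futures = [pool.submit(run_subtask, i, split, transforms, sink)
                       for i, split in enumerate(splits)]
            for future in futures:
                future.result()  # re-raise any exception from a subtask


if __name__ == "__main__":
    rows = [{"id": i, "name": f"user{i}"} for i in range(6)]
    for engine in (SequentialEngine(), ThreadPoolEngine()):
        sink = CollectSink()
        engine.submit(ListSource(rows, parallelism=3), [upper_name], sink)
        print(type(engine).__name__, sorted(r["id"] for r in sink.rows()))
```

Keeping scheduling inside the engine and row handling inside the connectors is the separation that lets one connector target several engines; a real distributed engine would additionally handle serialization, placement across machines, and checkpoints.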
## Connectors supported by SeaTunnel
- Source Connectors supported [check out](https://seatunnel.apache.org/docs/category/source-v2)
- Sink Connectors supported [check out](https://seatunnel.apache.org/docs/category/sink-v2)
- Transform supported [check out](docs/en/transform-v2)
### Here's a list of our connectors with their health status: [connector status](docs/en/Connector-v2-release-state.md)
## Environment dependencies
1. Java runtime environment: Java >= 8
2. If you want to run SeaTunnel in a cluster environment, any of the following Spark cluster environments is usable:
   - Spark on YARN
   - Spark Standalone

If the data volume is small, or you only want to verify functionality, you can also run in local mode without
a cluster environment, because SeaTunnel supports standalone operation. Note: SeaTunnel 2.0 supports running on both
Spark and Flink.
## Compiling project
Follow this [document](docs/en/contribution/setup.md).
## Downloads
Download the ready-to-run binary package from: https://seatunnel.apache.org/download
## Quick start
**SeaTunnel Engine**
https://seatunnel.apache.org/docs/start-v2/locally/quick-start-seatunnel-engine/
**Spark**
https://seatunnel.apache.org/docs/start-v2/locally/quick-start-spark
**Flink**
https://seatunnel.apache.org/docs/start-v2/locally/quick-start-flink
## Application practice cases
- Weibo, Value-added Business Department Data Platform
Weibo uses an internally customized version of SeaTunnel, together with its sub-project Guardian for SeaTunnel on YARN
task monitoring, for hundreds of real-time streaming computing tasks.
- Sina, Big Data Operation Analysis Platform
Sina Data Operation Analysis Platform uses SeaTunnel to perform real-time and offline analysis of operation and
maintenance data for Sina News, CDN, and other services, and writes the results into ClickHouse.
- Sogou, Sogou Qiqian System
Sogou Qiqian System tak