# Nexmark Benchmark
## What is Nexmark
Nexmark is a benchmark suite for queries over continuous data streams. This project is inspired by the [NEXMark research paper](https://web.archive.org/web/20100620010601/http://datalab.cs.pdx.edu/niagaraST/NEXMark/) and [Apache Beam Nexmark](https://beam.apache.org/documentation/sdks/java/testing/nexmark/).
## Nexmark Benchmark Suite
### Schemas
These are multiple queries over a three entities model representing on online auction system:
- **Person** represents a person submitting an item for auction and/or making a bid on an auction.
- **Auction** represents an item under auction.
- **Bid** represents a bid for an item under auction.
### Queries
| Query | Name | Summary | Flink |
| -------- | -------- | -------- | ------ |
| q0 | Pass Through | Measures the monitoring overhead including the source generator. | ✅ |
| q1 | Currency Conversion | Convert each bid value from dollars to euros. | ✅ |
| q2 | Selection | Find bids with specific auction ids and show their bid price. | ✅ |
| q3 | Local Item Suggestion | Who is selling in OR, ID or CA in category 10, and for what auction ids? | ✅ |
| q4 | Average Price for a Category | Select the average of the wining bid prices for all auctions in each category. | ✅ |
| q5 | Hot Items | Which auctions have seen the most bids in the last period? | ✅ |
| q6 | Average Selling Price by Seller | What is the average selling price per seller for their last 10 closed auctions. | [FLINK-19059](https://issues.apache.org/jira/browse/FLINK-19059) |
| q7 | Highest Bid | Select the bids with the highest bid price in the last period. | ✅ |
| q8 | Monitor New Users | Select people who have entered the system and created auctions in the last period. | ✅ |
| q9 | Winning Bids | Find the winning bid for each auction. | ✅ |
| q10 | Log to File System | Log all events to file system. Illustrates windows streaming data into partitioned file system. | ✅ |
| q11 | User Sessions | How many bids did a user make in each session they were active? Illustrates session windows. | ✅ |
| q12 | Processing Time Windows | How many bids does a user make within a fixed processing time limit? Illustrates working in processing time window. | ✅ |
| q13 | Bounded Side Input Join | Joins a stream to a bounded side input, modeling basic stream enrichment. | ✅ |
| q14 | Calculation | Convert bid timestamp into types and find bids with specific price. Illustrates more complex projection and filter. | ✅ |
| q15 | Bidding Statistics Report | How many distinct users join the bidding for different level of price? Illustrates multiple distinct aggregations with filters. | ✅ |
| q16 | Channel Statistics Report | How many distinct users join the bidding for different level of price for a channel? Illustrates multiple distinct aggregations with filters for multiple keys. | ✅ |
| q17 | Auction Statistics Report | How many bids on an auction made a day and what is the price? Illustrates an unbounded group aggregation. | ✅ |
| q18 | Find last bid | What's a's last bid for bidder to auction? Illustrates a Deduplicate query. | ✅ |
| q19 | Auction TOP-10 Price | What's the top price 10 bids of an auction? Illustrates a TOP-N query. | ✅ |
| q20 | Expand bid with auction | Get bids with the corresponding auction information where category is 10. Illustrates a filter join. | ✅ |
| q21 | Add channel id | Add a channel_id column to the bid table. Illustrates a 'CASE WHEN' + 'REGEXP_EXTRACT' SQL. | ✅ |
| q22 | Get URL Directories | What is the directory structure of the URL? Illustrates a SPLIT_INDEX SQL. | ✅ |
*Note: q1 ~ q8 are from original [NEXMark queries](https://web.archive.org/web/20100620010601/http://datalab.cs.pdx.edu/niagaraST/NEXMark/), q0 and q9 ~ q13 are from [Apache Beam](https://beam.apache.org/documentation/sdks/java/testing/nexmark), others are extended to cover more scenarios.*
### Metrics
For evaluating the performance, there are two performance measurement terms used in Nexmark that are **cores** and **time**.
Cores is the CPU usage used by the stream processing system. Usually CPU allows preemption, not like memory can be limited. Therefore, how the stream processing system effectively use CPU resources, how much throughput is contributed per core, they are important aspect for streaming performance benchmark.
For Flink, we deploy a CPU usage collector on every worker node and send the usage metric to the benchmark runner for summarizing. We don't use the `Status.JVM.CPU.Load` metric provided by Flink, because it is not accurate.
Time is the cost time for specified number of events executed by the stream processing system. With Cores * Time, we can know how many resources the stream processing system uses to process specified number of events.
## Nexmark Benchmark Guideline
### Requirements
The Nexmark benchmark framework runs Flink queries on [standalone cluster](https://ci.apache.org/projects/flink/flink-docs-release-1.13/ops/deployment/cluster_setup.html), see the Flink documentation for more detailed requirements and how to setup it.
#### Software Requirements
The cluster should consist of **one master node** and **one or more worker nodes**. All of them should be **Linux** environment (the CPU monitor script requries to run on Linux). Please make sure you have the following software installed **on each node**:
- **JDK 1.8.x** or higher (Nexmark scripts uses some tools of JDK),
- **ssh** (sshd must be running to use the Flink and Nexmark scripts that manage
remote components)
If your cluster does not fulfill these software requirements you will need to install/upgrade it.
Having [**passwordless SSH**](https://linuxize.com/post/how-to-setup-passwordless-ssh-login/) and __the same directory structure__ on all your cluster nodes will allow you to use our scripts to control
everything.
#### Environment Variables
The following environment variable should be set on every node for the Flink and Nexmark scripts.
- `JAVA_HOME`: point to the directory of your JDK installation.
- `FLINK_HOME`: point to the directory of your Flink installation.
### Build Nexmark
Before start to run the benchmark, you should build the Nexmark benchmark first to have a benchmark package. Please make sure you have installed `maven` in your build machine. And run the `./build.sh` command under `nexmark-flink` directoy. Then you will get the `nexmark-flink.tgz` archive under the directory.
### Setup Cluster
- Step 1: Download the latest Flink package from the [download page](https://flink.apache.org/downloads.html). Say `flink-<version>-bin-scala_2.11.tgz`.
- Step2: Copy the archives (`flink-<version>-bin-scala_2.11.tgz`, `nexmark-flink.tgz`) to your master node and extract it.
```
tar xzf flink-<version>-bin-scala_2.11.tgz; tar xzf nexmark-flink.tgz
mv flink-<version> flink; mv nexmark-flink nexmark
```
- Step3: Copy the jars under `nexmark/lib` to `flink/lib` which contains the Nexmark source generator.
- Step4: Configure Flink.
- Edit `flink/conf/workers` and enters the IP address of each worker node. Recommand to set 8 entries.
- Replace `flink/conf/sql-client-defaults.yaml` by `nexmark/conf/sql-client-defaults.yaml`
- Replace `flink/conf/flink-conf.yaml` by `nexmark/conf/flink-conf.yaml`. Remember to update the following configurations:
- Set `jobmanager.rpc.address` to you master IP address
- Set `state.checkpoints.dir` to your local file path (recommend to use SSD), e.g. `file:///home/username/checkpoint`.
- Set `state.backend.rocksdb.localdir` to your local file path (recommend to use SSD), e.g. `/home/username/rocksdb`.
- Step5: Configure Nexmark benchmark.
- Set `nexmark.metric.reporter.host` to your master IP address.
- Step6: Copy `flink` and `nexmark` to your worker nodes using `scp`.
- Step7: Start Flink Cluster by running `flink/bin/start-cluster.sh` on the master node.
- Step8: Setup the benchmark cluster by running `nex
没有合适的资源?快使用搜索试试~ 我知道了~
nexmark源码包-可用于flink和spark测试基准
共112个文件
java:57个
sql:35个
sh:8个
需积分: 14 3 下载量 51 浏览量
2022-10-21
15:00:13
上传
评论 1
收藏 165KB ZIP 举报
温馨提示
Nexmark 基准测试框架不依赖任何第三方服务,只需要部署好引擎和 Nexmark,通过脚本 nexmark/bin/run_query.sh all 即可等待并获得所有 query 下的 benchmark 结果。
资源详情
资源评论
资源推荐
收起资源包目录
nexmark源码包-可用于flink和spark测试基准 (112个子文件)
org.apache.flink.table.factories.Factory 840B
.gitignore 499B
ProcfsBasedProcessTree.java 27KB
SysInfoLinux.java 19KB
NexmarkTableSourceITCase.java 12KB
Benchmark.java 11KB
GeneratorConfig.java 11KB
NexmarkGenerator.java 9KB
FlinkRestClient.java 8KB
QueryRunner.java 8KB
NexmarkConfiguration.java 7KB
AutoClosableProcess.java 7KB
NexmarkGlobalConfiguration.java 7KB
MetricReporter.java 7KB
WorkloadSuiteTest.java 7KB
WorkloadSuite.java 6KB
CpuMetricSender.java 6KB
AuctionGenerator.java 6KB
NexmarkSourceOptions.java 5KB
PersonGenerator.java 5KB
NexmarkSourceFunction.java 5KB
OperatingSystem.java 5KB
CpuMetricReceiver.java 5KB
NexmarkTableSourceFactoryTest.java 5KB
BidGenerator.java 4KB
Workload.java 4KB
TpsMetric.java 4KB
CpuTimeTracker.java 4KB
NexmarkTableSource.java 4KB
NexmarkUtils.java 4KB
NexmarkTableSourceFactory.java 4KB
RowDataEventDeserializer.java 4KB
CpuMetric.java 4KB
Auction.java 3KB
BenchmarkMetric.java 3KB
SideInputGenerator.java 3KB
Bid.java 3KB
Person.java 3KB
StringsGenerator.java 3KB
FlinkNexmarkOptions.java 3KB
JobBenchmarkMetric.java 3KB
Event.java 2KB
BenchmarkTest.java 2KB
ShellCommandExecutor.java 2KB
NexmarkSourceFunctionITCase.java 2KB
SystemClock.java 2KB
FlinkRestClientTest.java 2KB
Clock.java 2KB
CpuMetricTest.java 2KB
TpsMetricTest.java 2KB
NexmarkGlobalConfigurationTest.java 2KB
NexmarkGeneratorTest.java 2KB
CountChar.java 1KB
BenchmarkMetricTest.java 1KB
CpuMonitor.java 1KB
LongGenerator.java 1KB
CpuMetricSenderTest.java 1KB
PriceGenerator.java 1KB
EventDeserializer.java 1KB
LICENSE 11KB
README.md 14KB
log4j.properties 1KB
config.sh 4KB
metric_server.sh 2KB
metric_client.sh 2KB
run_query.sh 2KB
side_input_gen.sh 1KB
setup_cluster.sh 1KB
shutdown_cluster.sh 1KB
build.sh 187B
q16.sql 2KB
q5.sql 2KB
q15.sql 2KB
q9.sql 2KB
q20.sql 2KB
q0.sql 1KB
q8.sql 1KB
q10.sql 1KB
ddl_gen.sql 1KB
ddl_gen.sql 1KB
q17.sql 1KB
ddl_kafka.sql 1KB
ddl_kafka.sql 1KB
q4.sql 1KB
q14.sql 1KB
q1.sql 1KB
q2.sql 1KB
q21.sql 1KB
q6.sql 1KB
q7.sql 1KB
q13.sql 1KB
q3.sql 1KB
q12.sql 1KB
q2.sql 993B
q11.sql 988B
q18.sql 884B
q22.sql 857B
q3.sql 840B
q19.sql 840B
q0.sql 787B
共 112 条
- 1
- 2
爱纹身的big数据
- 粉丝: 70
- 资源: 13
上传资源 快速赚钱
- 我的内容管理 展开
- 我的资源 快来上传第一个资源
- 我的收益 登录查看自己的收益
- 我的积分 登录查看自己的积分
- 我的C币 登录后查看C币余额
- 我的收藏
- 我的下载
- 下载帮助
安全验证
文档复制为VIP权益,开通VIP直接复制
信息提交成功
评论0