<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
# TsFile-Spark-Connector User Guide
## 1. About TsFile-Spark-Connector
TsFile-Spark-Connector adds Spark support for external data sources of the TsFile type, enabling users to read, write, and query TsFiles with Spark.
With this connector, you can
* load a single TsFile, from either the local file system or hdfs, into Spark
* load all files in a specific directory, from either the local file system or hdfs, into Spark
* write data from Spark into TsFile
## 2. System Requirements
|Spark Version | Scala Version | Java Version | TsFile |
|------------- | ------------- | ------------ |------------ |
| `>= 2.2` | `2.11` | `1.8` | `0.10.0`|
> Note: For more information about how to download and use TsFile, please see the following link: https://github.com/apache/incubator-iotdb/tree/master/tsfile.
## 3. Quick Start
### Local Mode
Start Spark with TsFile-Spark-Connector in local mode:
```
./<spark-shell-path> --jars tsfile-spark-connector.jar,tsfile-0.10.0-jar-with-dependencies.jar
```
Note:
* \<spark-shell-path> is the real path of your spark-shell.
* Multiple jar packages are separated by commas without any spaces.
* See https://github.com/apache/iotdb/tree/master/tsfile for how to get TsFile.
### Distributed Mode
Start Spark with TsFile-Spark-Connector in distributed mode (that is, spark-shell connects to a Spark cluster):
```
./<spark-shell-path> --jars tsfile-spark-connector.jar,tsfile-0.10.0-jar-with-dependencies.jar --master spark://ip:7077
```
Note:
* \<spark-shell-path> is the real path of your spark-shell.
* Multiple jar packages are separated by commas without any spaces.
* See https://github.com/apache/iotdb/tree/master/tsfile for how to get TsFile.
## 4. Data Type Correspondence
| TsFile data type | SparkSQL data type|
| --------------| -------------- |
| BOOLEAN | BooleanType |
| INT32 | IntegerType |
| INT64 | LongType |
| FLOAT | FloatType |
| DOUBLE | DoubleType |
| TEXT | StringType |
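The correspondence above is a fixed one-to-one lookup. The sketch below (plain Python, purely illustrative and not part of the connector's API) expresses the table as a mapping so the conversion can be checked programmatically:

```python
# Illustrative sketch: the TsFile -> SparkSQL data type mapping from the
# table above, expressed as a lookup. Not part of the connector's API.
TSFILE_TO_SPARKSQL = {
    "BOOLEAN": "BooleanType",
    "INT32": "IntegerType",
    "INT64": "LongType",
    "FLOAT": "FloatType",
    "DOUBLE": "DoubleType",
    "TEXT": "StringType",
}

def spark_type_for(tsfile_type: str) -> str:
    """Return the SparkSQL type name for a TsFile data type."""
    return TSFILE_TO_SPARKSQL[tsfile_type.upper()]

print(spark_type_for("INT64"))
```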
## 5. Schema Inference
The way a TsFile is displayed depends on its schema. Take the following TsFile structure as an example: there are three measurements in the TsFile schema: status, temperature, and hardware. Their basic information is as follows:

| name | type | encoding |
|------|------|----------|
| status | Boolean | PLAIN |
| temperature | Float | RLE |
| hardware | Text | PLAIN |
The existing data in the TsFile is as follows (each measurement stores its own time/value series):

| root.ln.wf01.wt01.status | | root.ln.wf01.wt01.temperature | | root.ln.wf02.wt02.hardware | | root.ln.wf02.wt02.status | |
|------|-------|------|-------|------|-------|------|-------|
| time | value | time | value | time | value | time | value |
| 1 | True | 1 | 2.2 | 2 | "aaa" | 1 | True |
| 3 | True | 2 | 2.2 | 4 | "bbb" | 2 | False |
| 5 | False | 3 | 2.1 | 6 | "ccc" | 4 | True |
The corresponding SparkSQL table is as follows:
| time | root.ln.wf02.wt02.temperature | root.ln.wf02.wt02.status | root.ln.wf02.wt02.hardware | root.ln.wf01.wt01.temperature | root.ln.wf01.wt01.status | root.ln.wf01.wt01.hardware |
|------|-------------------------------|--------------------------|----------------------------|-------------------------------|--------------------------|----------------------------|
| 1 | null | true | null | 2.2 | true | null |
| 2 | null | false | aaa | 2.2 | null | null |
| 3 | null | null | null | 2.1 | true | null |
| 4 | null | true | bbb | null | null | null |
| 5 | null | null | null | null | false | null |
| 6 | null | null | ccc | null | null | null |
You can also use the narrow table form, shown below (see Section 6 for how to obtain it):
| time | device_name | status | hardware | temperature |
|------|-------------------|--------|----------|-------------|
| 1 | root.ln.wf01.wt01 | true | null | 2.2 |
| 1 | root.ln.wf02.wt02 | true | null | null |
| 2 | root.ln.wf01.wt01 | null | null | 2.2 |
| 2 | root.ln.wf02.wt02 | false | aaa | null |
| 3 | root.ln.wf01.wt01 | true | null | 2.1 |
| 4 | root.ln.wf02.wt02 | true | bbb | null |
| 5 | root.ln.wf01.wt01 | false | null | null |
| 6 | root.ln.wf02.wt02 | null | ccc | null |
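To make the relationship between the two forms concrete, the sketch below (plain Python, independent of Spark and purely illustrative, not connector code) pivots wide-form rows into narrow-form rows by splitting each column name into a device prefix and a measurement suffix:

```python
# Illustrative sketch (not connector code): pivot wide-form rows into
# narrow form. Column names have the shape "device.path.measurement";
# rpartition(".") splits off the last segment as the measurement name.
from collections import defaultdict

def wide_to_narrow(rows, measurements):
    """rows: dicts with a 'time' key plus 'device.measurement' keys."""
    narrow = []
    for row in rows:
        per_device = defaultdict(dict)
        for col, value in row.items():
            if col == "time":
                continue
            device, _, measurement = col.rpartition(".")
            per_device[device][measurement] = value
        for device, values in sorted(per_device.items()):
            # Emit a narrow row only if the device has any non-null value
            # at this timestamp, mirroring the narrow table above.
            if any(v is not None for v in values.values()):
                narrow.append({"time": row["time"], "device_name": device,
                               **{m: values.get(m) for m in measurements}})
    return narrow

# One wide-form row (time = 1) from the example above:
wide = [{"time": 1,
         "root.ln.wf01.wt01.status": True,
         "root.ln.wf01.wt01.temperature": 2.2,
         "root.ln.wf02.wt02.status": True,
         "root.ln.wf02.wt02.hardware": None}]
print(wide_to_narrow(wide, ["status", "hardware", "temperature"]))
```

Running this on the time-1 wide row yields one narrow row per device, matching the first two rows of the narrow table.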
## 6. Scala API
NOTE: Remember to assign necessary read and write permissions in advance.
### Example 1: read from the local file system
```scala
import org.apache.iotdb.tsfile._
val wide_df = spark.read.tsfile("test.tsfile")
wide_df.show
val narrow_df = spark.read.tsfile("test.tsfile", true)
narrow_df.show
```
### Example 2: read from the Hadoop file system
```scala
import org.apache.iotdb.tsfile._
val wide_df = spark.read.tsfile("hdfs://localhost:9000/test.tsfile")
wide_df.show
val narrow_df = spark.read.tsfile("hdfs://localhost:9000/test.tsfile", true)
narrow_df.show
```
### Example 3: read from a specific directory
```scala
import org.apache.iotdb.tsfile._
val df = spark.read.tsfile("hdfs://localhost:9000/usr/hadoop")
df.show
```
Note 1: Global time ordering of all TsFiles in a directory is not currently supported.
Note 2: Measurements with the same name should have the same schema.
### Example 4: query in wide form
```scala
import org.apache.iotdb.tsfile._
val df = spark.read.tsfile("hdfs://localhost:9000/test.tsfile")
df.createOrReplaceTempView("tsfile_table")
val newDf = spark.sql("select * from tsfile_table where `device_1.sensor_1` > 0 and `device_1.sensor_2` < 22")
newDf.show
```