# *DB4AI Snapshots* for relational dataset versioning
## 1. Introduction
The module *DB4AI Snapshots* provides a robust and efficient framework that allows any database user to manage multiple versions of relational datasets through a convenient API.
### 1.1 Compact storage with efficient access
Its main benefits are automated management of dependencies among large, versioned datasets and their compact storage. Therefore, *DB4AI Snapshots* leverages redundancies among different versions of data for providing a high level of compaction while minimizing adverse effects on performance. At the same time, snapshot data remains efficiently accessible using the full expressiveness of standard SQL.
### 1.2 Immutable snapshot data
The snapshot API prevents the user from changing versioned data. Similar to a code repository, every change of data will generate a new version, i.e. a new snapshot.
### 1.3 Performance
In addition to compact data representation, the primary goal of *DB4AI Snapshots* is high read performance, i.e. when used in highly repetitive and concurrent read operations for serving as training data sets for concurrent training of multiple ML models.
As a secondary performance goal, *DB4AI Snapshots* provides efficient data manipulation for creating and manipulation huge volumes of versioned relational data.
### 1.4 Documentation and Automation
In addition, *DB4AI Snapshots* maintains a full documentation of the origin of any dataset, providing lineage and provenance for data, e.g. to be used in the context of reproducible training of ML models. In addition, *DB4AI Snapshots* facilitates automation, i.e. when applying repetitive transformation steps in data cleansing, or for automatically updating an existing training dataset with new data.
## 2. Quick start
### 2.1 Setup
*DB4AI Snapshots* is automatically installed in every new database instance of openGauss.
Therefore the CREATE DATABASE procedure creates the database schema *db4ai* within the new database and populates it with objects required for managing snapshot data.
After successful database creation, any user may start exploring the snapshot functionality. No additional privileges are required.
### 2.2 Examples
Set snapshot mode to compact storage model *CSS* (**C**omputed **S**nap**S**hot mode).
SET db4ai_snapshot_mode = CSS;
Create a snapshot 'm0@1.0.0' from existing data, where stored in table 'test_data'. The SQL statement may use arbitrary joins and mappings for defining the snapshot.
CREATE SNAPSHOT m0 AS
SELECT a1, a3, a6, a8, a2, pk, a1 b9, a7, a5 FROM test_data;
Create snapshot 'm0@2.0.0' from snapshot 'm0@1.0.0' by applying arbitrary DML and DDL statements.
The new version number indicates a snapshot schema revision by means of at least one ALTER statement.
CREATE SNAPSHOT m0 FROM @1.0.0 USING (
UPDATE SNAPSHOT SET a1 = 5 WHERE pk % 10 = 0;
ALTER SNAPSHOT ADD " u+ " INTEGER, ADD "x <> y"INT DEFAULT 2, ADD t CHAR(10) DEFAULT '',
DROP a2, DROP COLUMN IF EXISTS b9, DROP COLUMN IF EXISTS b10;
UPDATE SNAPSHOT SET "x <> y" = 8 WHERE pk < 100;
ALTER SNAPSHOT DROP " u+ ", DROP IF EXISTS " x+ ";
DELETE FROM SNAPSHOT WHERE pk = 3
);
Create snapshot 'm0@2.0.1' from snapshot 'm0@2.0.0' by UPDATE while using a reference to another table. The new version number indicates an update operation (minor data patch). This example uses an AS clause for introducing 'i' as the snapshot's custom correlation name for joining with tables during the UPDATE operation.
CREATE SNAPSHOT m0 FROM @2.0.0 USING (
UPDATE SNAPSHOT AS i SET a5 = o.a2 FROM test_data o
WHERE i.pk = o.pk AND o.a3 % 8 = 0
);
Create snapshot 'm0@2.1.0' from snapshot 'm0@2.0.1' by DELETE while using a reference to another table. The new version number indicates a data revision. This example uses the snapshot's default correlation name 'SNAPSHOT' for joining with another table.
CREATE SNAPSHOT m0 FROM @2.0.1 USING (
DELETE FROM SNAPSHOT USING test_data o
WHERE SNAPSHOT.pk = o.pk AND o.A7 % 2 = 0
);
Create snapshot 'm0@2.2.0' from snapshot 'm0@2.1.0' by inserting new data. The new version number indicates another data revision.
CREATE SNAPSHOT m0 FROM @2.1.0 USING (
INSERT INTO SNAPSHOT SELECT a1, a3, a6, a8, a2, pk+1000 pk, a7, a5, a4
FROM test_data WHERE pk % 10 = 4
);
The SQL syntax was extended with the new @ operator in relation names, allowing the user to specify a snapshot with version throughout SQL. Internally, snapshots are stored as views, where the actual name is generated according to GUC parameters on arbitrary level, e.g. on database level using the current setting of the version delimiter:
-- DEFAULT db4ai_snapshot_version_delimiter IS '@'
ALTER DATABASE <name> SET db4ai_snapshot_version_delimiter = '#';
Similarly the version separator can be changed by the user:
-- DEFAULT db4ai_snapshot_version_separator IS ’.’
ALTER DATABASE <name> SET db4ai_snapshot_version_separator = ’_’;
Independently from the GUC parameter settings mentioned above, any snapshot version can be accessed:
-- standard version string @schema.revision.patch:
SELECT * FROM public.data@1.2.3;
-- user-defined version strings:
SELECT * FROM accounting.invoice@2021;
SELECT * FROM user.cars@cleaned;
-- quoted identifier for blanks, keywords, special characters, etc.:
SELECT * FROM user.cars@"rev 1.1";
-- or string literal:
SELECT * FROM user.cars@'rev 1.1';
Alternative, using internal name (depends on GUC settings):
-- With internal view name, using default GUC settings
SELECT * FROM public."data@1.2.3";
-- With internal view name, using custom GUC settings, as above
SELECT * FROM public.data#1_2_3;
## 3. Privileges
All members of role **PUBLIC** may use **DB4AI Snapshots**.
## 4. Supported Systems and SQL compatibility modes
The current version of *DB4AI Snapshots* is tested in openGauss SQL compatibility modes A, B and C.
## 5. Portability
*DB4AI Snapshots* uses standard SQL for implementing its functionality.
## 6. Dependencies
None.
## 7. Reference & Documentation
### 7.1 Configuration parameters
**DB4AI Snapshots** exposes several configuration parameters, via the system's global unified configuration (GUC) management.
Configuration parameters may be set on the scope of functions (CREATE FUNCTION), transactions (SET LOCAL), sessions (SET),
user (ALTER USER), database (ALTER DATABASE), or on system-wide scope (postgresql.conf).
SET [SESSION | LOCAL] configuration_parameter { TO | = } { value | 'value' }
CREATE FUNCTION <..> SET configuration_parameter { TO | = } { value | 'value' }
ALTER DATABASE name SET configuration_parameter { TO | = } { value | 'value' }
ALTER USER name [ IN DATABASE database_name ] SET configuration_parameter { TO | = } { value | 'value' }
The following snapshot configuration parameters are currently supported:
#### db4ai_snapshot_mode = { MSS | CSS }
This snapshot configuration parameter allows to switch between materialized snapshot (*MSS*) mode, where every new snapshot is created as compressed but fully
materialized copy of its parent's data, or computed snapshot (*CSS*) mode. In *CSS* mode, the system attempts to exploit redundancies among dependent snapshot versions
for minimizing storage requirements.
The setting of *db4ai_snapshot_mode* may be adjusted at any time and it will have effect on subsequent snapshot operations within the scope of the new setting.
Whenever *db4ai_snapshot_mode* is not set in the current scope, it defaults to *MSS*.
##### Example
SET db4ai_snapshot_mode = CSS;
#### db4ai_snapshot_version_delimiter = value
This snapshot configuration parameter controls the character that delimits the *snapshot version* postfix within snapshot names.
In consequence, the character used as *db4ai_snapshot_version_delimiter*
没有合适的资源?快使用搜索试试~ 我知道了~
温馨提示
openGauss kernel ~ openGauss是一个开源的关系型数据库管理系统。它具有多核高性能、全链路安全性、智能运维等企业级特性。 openGauss内核早期源自开源数据库PostgreSQL,融合了华为在数据库领域多年的内核经验,在架构、事务、存储引擎、优化器及ARM架构上进行了适配与优化。作为一个开源数据库,期望与广泛的开发者共同构建一个多元化技术的开源数据库社区。
资源推荐
资源详情
资源评论
收起资源包目录
openGauss kernel ~ openGauss是一个开源的关系型数据库管理系统 (2000个子文件)
costsize.cpp 266KB
rewriteHandler.cpp 210KB
indxpath.cpp 183KB
allpaths.cpp 162KB
joinpath.cpp 109KB
es_selectivity.cpp 97KB
equivclass.cpp 92KB
pathkeys.cpp 64KB
cm_config.cpp 64KB
streampath_base.cpp 54KB
gstrace_tool.cpp 53KB
fe-connect.cpp 52KB
rewriteManip.cpp 51KB
clausesel.cpp 50KB
cm_elog.cpp 49KB
joinrels.cpp 44KB
cm_misc.cpp 44KB
alarm.cpp 42KB
gstrace_infra.cpp 39KB
rewriteDefine.cpp 35KB
pqcomm.cpp 31KB
fe-misc.cpp 28KB
elog.cpp 25KB
PageCompression.cpp 23KB
rewriteRlsPolicy.cpp 12KB
streampath_single.cpp 12KB
alarm_log.cpp 12KB
cm_stringinfo.cpp 10KB
pqexpbuffer.cpp 9KB
pgxcpath_single.cpp 9KB
rewriteSupport.cpp 9KB
tidpath.cpp 8KB
fe-protocol.cpp 7KB
findoidjoins.cpp 7KB
cm_cgroup.cpp 6KB
entab.cpp 6KB
pqformat.cpp 6KB
hotpatch_client.cpp 6KB
pqsignal.cpp 4KB
build_query.cpp 3KB
rewriteRemove.cpp 3KB
strlcpy.cpp 2KB
fe_memutils.cpp 2KB
ip.cpp 2KB
ip.cpp 2KB
test_ifaddrs.cpp 1KB
halt.cpp 1KB
cm_path.cpp 1KB
be_module.cpp 1KB
stylesheet.css 2KB
knl_thread.h 124KB
execnodes.h 112KB
pgstat.h 107KB
knl_session.h 103KB
relation.h 101KB
parsenodes_common.h 92KB
radixtree.h 89KB
parsenodes.h 87KB
page_compression_impl.h 85KB
builtins.h 81KB
plpgsql.h 71KB
plannodes.h 64KB
primnodes.h 63KB
nbtree.h 58KB
dms_api.h 56KB
htup.h 55KB
xlogproc.h 54KB
pg_type.h 53KB
commproxy_interface.h 47KB
cm_msg.h 46KB
knl_instance.h 46KB
tableam.h 41KB
c.h 41KB
indexing.h 39KB
xlog.h 38KB
executor.h 38KB
kwlist.h 36KB
s_lock.h 36KB
simplehash.h 35KB
rel_gs.h 35KB
gin_private.h 35KB
lock.h 35KB
rel.h 34KB
elog.h 34KB
instrument.h 34KB
gtm_slock.h 34KB
fmgr.h 33KB
postgres.h 33KB
gscodegen.h 32KB
miscadmin.h 32KB
statctl.h 32KB
partitionmap_gs.h 32KB
nodes.h 31KB
libpq-int.h 30KB
securec.h 29KB
segment_internal.h 26KB
vacuum.h 26KB
ubtree.h 25KB
pg_amproc.h 25KB
libpq-fe.h 24KB
共 2000 条
- 1
- 2
- 3
- 4
- 5
- 6
- 20
资源评论
Java程序员-张凯
- 粉丝: 1w+
- 资源: 7366
下载权益
C知道特权
VIP文章
课程特权
开通VIP
上传资源 快速赚钱
- 我的内容管理 展开
- 我的资源 快来上传第一个资源
- 我的收益 登录查看自己的收益
- 我的积分 登录查看自己的积分
- 我的C币 登录后查看C币余额
- 我的收藏
- 我的下载
- 下载帮助
最新资源
资源上传下载、课程学习等过程中有任何疑问或建议,欢迎提出宝贵意见哦~我们会及时处理!
点击此处反馈
安全验证
文档复制为VIP权益,开通VIP直接复制
信息提交成功