MonetDB/X100: Hyper-Pipelining Query Execution

Peter Boncz, Marcin Zukowski, Niels Nes

CWI

Kruislaan 413

Amsterdam, The Netherlands

{P.Boncz,M.Zukowski,N.Nes}@cwi.nl

Abstract

Database systems tend to achieve only low IPC (instructions-per-cycle) efficiency on modern CPUs in compute-intensive application areas like decision support, OLAP and multimedia retrieval. This paper starts with an in-depth investigation into the reasons why this happens, focusing on the TPC-H benchmark. Our analysis of various relational systems and MonetDB leads us to a new set of guidelines for designing a query processor.

The second part of the paper describes the architecture of our new X100 query engine for the MonetDB system that follows these guidelines. On the surface, it resembles a classical Volcano-style engine, but the crucial difference, basing all execution on the concept of vector processing, makes it highly CPU efficient. We evaluate the power of MonetDB/X100 on the 100GB version of TPC-H, showing its raw execution power to be between one and two orders of magnitude higher than previous technology.

1 Introduction

Modern CPUs can perform enormous amounts of calculations per second, but only if they can find enough independent work to exploit their parallel execution capabilities. Hardware developments during the past decade have significantly increased the speed difference between a CPU running at full throughput and minimal throughput, which can now easily be an order of magnitude.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

Proceedings of the 2005 CIDR Conference

One would expect that query-intensive database workloads such as decision support, OLAP, data mining, and also multimedia retrieval, all of which require many independent calculations, should provide modern CPUs the opportunity to get near-optimal IPC (instructions-per-cycle) efficiencies.

However, research has shown that database systems tend to achieve low IPC efficiency on modern CPUs in these application areas [6, 3]. We question whether it should really be that way. Going beyond the (important) topic of cache-conscious query processing, we investigate in detail how relational database systems interact with modern super-scalar CPUs in query-intensive workloads, in particular the TPC-H decision support benchmark.

The main conclusion we draw from this investigation is that the architecture employed by most DBMSs inhibits compilers from using their most performance-critical optimization techniques, resulting in low CPU efficiencies. In particular, the common way to implement the popular Volcano [10] iterator model for pipelined processing leads to tuple-at-a-time execution, which both causes high interpretation overhead and hides opportunities for CPU parallelism from the compiler.
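To make the tuple-at-a-time overhead concrete, the following is a minimal sketch in C of a Volcano-style pipeline (scan, then select, then a sum). All names here (`Operator`, `scan_next`, `select_next`, `sum_all`, `make_scan`, `make_select`) are hypothetical and not taken from any particular DBMS. The sketch only illustrates that every tuple costs several indirect function calls, so the compiler never sees a tight loop over the data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical Volcano-style operator: next() yields one tuple per call. */
typedef struct Operator Operator;
struct Operator {
    /* Returns 1 and writes one value to *out, or returns 0 at end of stream. */
    int (*next)(Operator *self, int *out);
    Operator *child;      /* input operator, NULL for a scan */
    const int *data;      /* scan only: the column being read */
    size_t n, pos;        /* scan only: length and cursor */
    int bound;            /* select only: keep values < bound */
};

static int scan_next(Operator *self, int *out) {
    if (self->pos >= self->n) return 0;
    *out = self->data[self->pos++];
    return 1;
}

static int select_next(Operator *self, int *out) {
    int v;
    /* One indirect call into the child per input tuple. */
    while (self->child->next(self->child, &v)) {
        if (v < self->bound) { *out = v; return 1; }
    }
    return 0;
}

/* Root of the plan: pulls tuples one at a time and sums them. */
static long sum_all(Operator *root) {
    long sum = 0;
    int v;
    assert(root != NULL);
    while (root->next(root, &v)) sum += v;
    return sum;
}

/* Convenience constructors for this sketch. */
static Operator make_scan(const int *data, size_t n) {
    Operator op = { scan_next, NULL, data, n, 0, 0 };
    return op;
}

static Operator make_select(Operator *child, int bound) {
    Operator op = { select_next, child, NULL, 0, 0, bound };
    return op;
}
```

A plan is built by chaining operators, for example `make_select(&scan, 6)` over a scan, and then calling `sum_all` on the root. Each value that reaches the sum has passed through two calls via function pointers, which is the interpretation overhead the paragraph above refers to.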

We also analyze the performance of the main-memory database system MonetDB, developed in our group, and its MIL query language [4]. MonetDB/MIL uses a column-at-a-time execution model, and therefore does not suffer from problems generated by tuple-at-a-time interpretation. However, its policy of full column materialization causes it to generate large data streams during query execution. On our decision support workload, we found MonetDB/MIL to become heavily constrained by memory bandwidth, causing its CPU efficiency to drop sharply.

Therefore, we argue for combining the column-wise execution of MonetDB with the incremental materialization offered by Volcano-style pipelining.
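The combination argued for here can be sketched in C as follows. This is an illustration under stated assumptions, not the X100 implementation: the function names (`map_mul`, `sum_vec`, `sum_product_full`, `sum_product_vectorized`) and the `VECTOR_SIZE` of 1024 are hypothetical choices. Both variants use the same tight-loop primitives; they differ only in whether the intermediate result is materialized as a full column or as a small reusable vector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum { VECTOR_SIZE = 1024 };  /* assumed vector length, small enough to stay in cache */

/* Primitive: element-wise multiply over n values, a tight loop. */
static void map_mul(size_t n, long *res, const int *a, const int *b) {
    for (size_t i = 0; i < n; i++) res[i] = (long)a[i] * b[i];
}

/* Primitive: sum of n values. */
static long sum_vec(size_t n, const long *v) {
    long s = 0;
    for (size_t i = 0; i < n; i++) s += v[i];
    return s;
}

/* Column-at-a-time: the whole intermediate column is materialized. */
static long sum_product_full(size_t n, const int *a, const int *b) {
    long *tmp = malloc((n ? n : 1) * sizeof *tmp);
    assert(tmp != NULL);
    map_mul(n, tmp, a, b);
    long s = sum_vec(n, tmp);
    free(tmp);
    return s;
}

/* Vector-at-a-time: same primitives, but the intermediate lives in a
   small reusable buffer, so it is materialized incrementally. */
static long sum_product_vectorized(size_t n, const int *a, const int *b) {
    long tmp[VECTOR_SIZE];
    long s = 0;
    for (size_t off = 0; off < n; off += (size_t)VECTOR_SIZE) {
        size_t len = n - off < (size_t)VECTOR_SIZE ? n - off : (size_t)VECTOR_SIZE;
        map_mul(len, tmp, a + off, b + off);
        s += sum_vec(len, tmp);
    }
    return s;
}
```

The full-column variant writes and re-reads `n` intermediate values through main memory, while the vectorized variant touches at most `VECTOR_SIZE` intermediate values at a time.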

We designed and implemented from scratch a new query engine for the MonetDB system, called X100, that employs a vectorized query processing model. Apart from achieving high CPU efficiency, MonetDB/X100 is intended to scale up towards non-main-memory (disk-based) datasets. The second part of this paper is dedicated to describing the architecture of MonetDB/X100 and evaluating its performance on the full TPC-H benchmark of size 100GB.

Footnote 1: MonetDB is now open source, see monetdb.cwi.nl

1.1 Outline

This paper is organized as follows. Section 2 provides an introduction to modern super-scalar (or hyper-pipelined) CPUs, covering the issues most relevant for query evaluation performance. In Section 3, we study TPC-H Query 1 as a micro-benchmark of CPU efficiency, first for standard relational database systems, then in MonetDB, and finally we descend into a standalone hand-coded implementation of this query to get a baseline of maximum achievable raw performance. Section 4 describes the architecture of our new X100 query processor for MonetDB, focusing on query execution, but also sketching topics like data layout, indexing and updates.

In Section 5, we present a performance comparison of MIL and X100 inside the Monet system on the TPC-H benchmark. We discuss related work in Section 6, before concluding in Section 7.

2 How CPUs Work

Figure 1 displays, for each year in the past decade, the fastest CPU available in terms of MHz, the CPU with the highest performance (the one does not necessarily equate to the other), and the most advanced chip manufacturing technology in production that year.

The root cause for CPU MHz improvements is progress in chip manufacturing process scales, which typically shrink by a factor of 1.4 every 18 months (a.k.a. Moore's law [13]). Every smaller manufacturing scale means twice (the square of 1.4) as many, and twice smaller, transistors, as well as 1.4 times smaller wire distances and signal latencies. Thus one would expect CPU MHz to increase with inverted signal latencies, but Figure 1 shows that clock speed has increased even further. This is mainly done by pipelining: dividing the work of a CPU instruction into ever more stages. Less work per stage means that the CPU frequency can be increased. While the 1988 Intel 80386 CPU executed one instruction in one (or more) cycles, the 1993 Pentium already had a 5-stage pipeline, which increased to 14 stages in the 1999 PentiumIII, while the 2004 Pentium4 has 31 pipeline stages.

Figure 1: A Decade of CPU Performance (plot over 1994-2002 of CPU MHz, CPU Performance (SPECcpu int+fp) and inverted gate distance, for manufacturing processes from 500nm down to 130nm; labelled CPUs: Alpha21064, Alpha21064A, Alpha21164, Alpha21164A, Alpha21164B, Athlon, Pentium4, POWER4, Itanium2; annotated trends: pipelining, hyper-pipelining)

Pipelines introduce two dangers: (i) if one instruction needs the result of a previous instruction, it cannot be pushed into the pipeline right after it, but must wait until the first instruction has passed through the pipeline (or a significant fraction thereof), and (ii) in case of IF-a-THEN-b-ELSE-c branches, the CPU must predict whether a will evaluate to true or false. It might guess the latter and put c into the pipeline, just after a. Many stages further, when the evaluation of a finishes, it may determine that it guessed wrongly (i.e. mispredicted the branch), and then must flush the pipeline (discard all instructions in it) and start over with b. Obviously, the longer the pipeline, the more instructions are flushed away and the higher the performance penalty. Translated to database systems, branches that are data-dependent, such as those found in a selection operator on data with a selectivity that is neither very high nor very low, are impossible to predict and can significantly slow down query execution [17].
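As an illustration of such a data-dependent branch, the sketch below shows a selection over an integer column in C, first with an IF in the loop body and then in a commonly used branch-free form. The function names (`select_lt_branch`, `select_lt_predicated`) are hypothetical, and the output array `res` is assumed to have room for `n` positions in both variants. Whether the second form is actually faster depends on the compiler, the CPU and the selectivity, so this is a sketch of the idea rather than a tuning recommendation.

```c
#include <assert.h>
#include <stddef.h>

/* Branching version: the IF depends on the data, so at mid-range
   selectivity the CPU cannot predict it reliably. */
static size_t select_lt_branch(size_t n, size_t *res, const int *col, int bound) {
    size_t j = 0;
    assert(res != NULL && col != NULL);
    for (size_t i = 0; i < n; i++) {
        if (col[i] < bound) res[j++] = i;
    }
    return j;
}

/* Branch-free version: always store the position, then advance the
   output cursor by the 0/1 result of the comparison. */
static size_t select_lt_predicated(size_t n, size_t *res, const int *col, int bound) {
    size_t j = 0;
    assert(res != NULL && col != NULL);
    for (size_t i = 0; i < n; i++) {
        res[j] = i;
        j += (col[i] < bound);
    }
    return j;
}
```

Both functions return the number of qualifying positions and fill `res` with them in order. The second trades the unpredictable branch for an unconditional store per input value.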

In addition, super-scalar CPUs offer the possibility to take multiple instructions into execution in parallel if they are independent. That is, the CPU has not one, but multiple pipelines. Each cycle, a new instruction can be pushed into each pipeline, provided again they are independent of all instructions already in execution. A super-scalar CPU can get to an IPC (Instructions Per Cycle) of > 1. Figure 1 shows that this has allowed real-world CPU performance to increase faster than CPU frequency.
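The role of instruction independence can be illustrated with a small C sketch. The function names (`sum_chain`, `sum_unrolled`) and the choice of four accumulators are assumptions made for this example. In the first function every addition depends on the previous one; in the second, the four additions in each iteration are independent of each other and could in principle be issued to different pipelines. An optimizing compiler may perform a similar transformation on its own, so measured differences will vary.

```c
#include <assert.h>
#include <stddef.h>

/* Each addition needs the previous value of s: one long dependency chain. */
static long sum_chain(size_t n, const int *v) {
    long s = 0;
    for (size_t i = 0; i < n; i++) s += v[i];
    return s;
}

/* Four independent accumulators: the additions within one iteration do
   not depend on each other, so they can be in flight at the same time. */
static long sum_unrolled(size_t n, const int *v) {
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    assert(v != NULL || n == 0);
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; i++) s0 += v[i];  /* leftover elements */
    return s0 + s1 + s2 + s3;
}
```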

Modern CPUs are balanced in different ways. The Intel Itanium2 processor is a VLIW (Very Long Instruction Word) processor with many parallel pipelines (it can execute up to 6 instructions per cycle) with only a few (7) stages, and therefore a relatively low clock speed of 1.5GHz. In contrast, the Pentium4 has its very long 31-stage pipeline allowing for a 3.6GHz clock speed, but can only execute 3 instructions per cycle. Either way, to get to its theoretical maximum throughput, an Itanium2 needs 7 x 6 = 42 independent instructions at any time, while the Pentium4 needs 31 x 3 = 93. Such parallelism cannot always be found, and therefore many programs use the resources of the Itanium2 much better than the Pentium4, which explains why in benchmarks the performance of both CPUs is similar, despite the big clock speed difference.
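The arithmetic above is simply pipeline depth times issue width. A minimal C sketch, using a hypothetical `CpuModel` struct and `instructions_in_flight` function, with the stage counts and issue widths quoted in the text:

```c
#include <assert.h>

/* Hypothetical description of a CPU's issue capabilities. */
typedef struct {
    const char *name;
    int pipeline_stages;   /* depth of each pipeline */
    int issue_width;       /* instructions started per cycle */
} CpuModel;

/* Independent instructions needed to keep every stage of every pipeline busy. */
static int instructions_in_flight(CpuModel cpu) {
    assert(cpu.pipeline_stages > 0 && cpu.issue_width > 0);
    return cpu.pipeline_stages * cpu.issue_width;
}
```

With `{ "Itanium2", 7, 6 }` this gives 42, and with `{ "Pentium4", 31, 3 }` it gives 93, matching the figures in the paragraph.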

Footnote 2: Intel introduced the term hyper-pipelined as a synonym for "super-scalar", to market its Pentium4 CPU.
