AnalysingthePerformanceofGPUHashTablesforStateSpaceExploration-2017(1712.09494)-计算机科学资源-CSDN文库

159 浏览量 2021-04-22 18:25:33 上传评论收藏 351KB PDF 举报

资源推荐

资源详情

资源评论

Timo Kehrer and Alice Miller (Eds.):

Workshop on Graphs as Models 2017 (GAM ’17)

EPTCS 263, 2017, pp. 1–15, doi:10.4204/EPTCS.263.1

 N.W. Cassee & A.J. Wijs

Analysing the Performance of GPU Hash Tables for State

Space Exploration

Nathan Cassee

Eindhoven University of Technology

Eindhoven, The Netherlands

N.W.Cassee@student.tue.nl

Anton Wijs

Eindhoven University of Technology

Eindhoven, The Netherlands

A.J.Wijs@tue.nl

Abstract: In the past few years, General Purpose Graphics Processors (GPUs) have been used to

signiﬁcantly speed up numerous applications. One of the areas in which GPUs have recently led to a

signiﬁcant speed-up is model checking. In model checking, state spaces, i.e., large directed graphs,

are explored to verify whether models satisfy desirable properties. GPUEXPLORE is a GPU-based

model checker that uses a hash table to efﬁciently keep track of already explored states. As a large

number of states is discovered and stored during such an exploration, the hash table should be able

to quickly handle many inserts and queries concurrently. In this paper, we experimentally compare

two different hash tables optimised for the GPU, one being the GPUEXPLORE hash table, and the

other using Cuckoo hashing. We compare the performance of both hash tables using random and

non-random data obtained from model checking experiments, to analyse the applicability of the two

hash tables for state space exploration. We conclude that Cuckoo hashing is three times faster than

GPUEXPLORE hashing for random data, and that Cuckoo hashing is ﬁve to nine times faster for

non-random data. This suggests great potential to further speed up GPUEXPLORE in the near future.

1 Introduction

General Purpose Graphics Processors (GPUs) have been used to signiﬁcantly speed up computations in

numerous application domains. Contrary to CPUs, GPUs can handle many thousands of parallel threads,

allowing for a great increase in parallel processing of data. This increase has opened up new possibilities

to improve the performance of algorithms, as signiﬁcant speedups can be achieved by using optimised

algorithms and lock-free data structures.

One of the domains to which GPU computing has been applied in the last few years is model check-

ing. Model checking involves taking a model of an (often concurrent) system and verifying whether

certain properties are satisﬁed by that model. The semantics of such models can be expressed in a state

space, which is a directed graph (often sparse), with labels either on the edges or the nodes. During model

checking, such a state space is constructed by interpreting the model. Starting from the initial state of the

model, exploration continues until either all reachable states have been explored, or a counter-example

to the property being veriﬁed has been encountered [4]. This operation can be very resource-demanding,

as state spaces tend to grow exponentially as the number of concurrent components in a model grows

linearly. This problem is commonly referred to as the state space explosion problem. For this reason,

the use of modern parallel architectures is very appealing, since it could make the analysis of large state

spaces more practically feasible.

In recent years, approaches have been developed to perform state space exploration either using a

combination of the CPU and the GPU, or speciﬁcally using the GPU [5, 7, 8, 21, 26, 27, 23, 28, 29].

Most approaches use an exploration strategy similar to Breadth-First Search (BFS).

2 Analysing the Performance of GPU Hash Tables for State Space Exploration

A data structure commonly used during state space exploration is the hash table [10]. It is used to

keep track of the states encountered so far. Accessing the hash table tends to be done very frequently

during model checking, so in a parallel setting, it is critical that the hash table being used can very

efﬁciently handle parallel accesses. For this reason, ﬁrst of all, it needs to be lock-free, and second of

all, it needs to have a suitable hashing mechanism. Even though identifying which hash tables are most

suitable for model checking is an important undertaking, for GPUs, this has not been thoroughly done

in the past, or at least, no reports have been published on it. In this paper, we discuss such a thorough

comparison.

Only a very few different hash tables have been proposed so far for GPUs. A notable one is the one

by Alcantara et al. [3]. They propose to use a version of Cuckoo hashing [17] optimised for parallel

execution using 64-bit data elements. It has been adopted in the standard CUDPP library

. In addition,

Wijs and Bo

sna

cki developed their own custom hash table for their parallel state space exploration tool

GPUEXPLORE [23, 28].

Alcantara et al. compared the performance of their hash table to the building and querying of a sorted

array. The keys and values used by Alcantara et al. to test the performance of Cuckoo hashing are ran-

domly generated integers [1]. Consequently, the performance of their Cuckoo hash table has previously

only been analysed using sequences of random and unique integers, and not using non-random sequences

stemming from real data. Since the performance of a hash table can be inﬂuenced by different patterns

encountered in the input data, for instance, the frequency at which the same values are encountered again

and again, it is useful to consider non-random data as well.

Beides this, for the hash table implemented in GPUEXPLORE, no isolated performance evaluation

has been done in the past. Instead of Cuckoo hashing, the hash table implemented for GPUEXPLORE

uses another way to resolve collisions and instead of operating on 64-bit elements, it supports elements

of arbitrary length.

This paper addresses reports the results of a performance evaluation between Cuckoo hashing and

GPUEXPLORE hashing. We analyse Cuckoo hashing and GPUEXPLORE hashing with both random and

non-random datasets. The non-random data originates from actual state space explorations performed by

a model checker, in which the revisiting of states has been recorded. This data allows us to compare the

applicability of the two hash tables for state space exploration.

The structure of the paper is as follows. Related work is discussed in Section 2. In Section 3,

ﬁrst a brief explanation of GPUs is given, which is particularly focussed on NVIDIA GPUs and the

CUDA programming interface, as we employed those GPUs for our experiments. Furthermore, Section

3 also provides an overview of the approach used in GPUEXPLORE to do model checking. A detailed

description of the two hash tables is given in Section 4. The experimental setup is explained in Section 5.

In Section 6, the results of the comparisons are discussed, and ﬁnally, conclusion and pointers to future

work are given in Section 7.

2 Related work

There are several papers introducing lock-free hash tables for GPUs [3, 13, 14, 6]. However, in those

papers only limited performance evaluations have been conducted. Speciﬁcally, the evaluations that

have been performed involved sequences of equally distributed random values. This is the case for the

performance evaluations conducted in the dissertation of Alcantara [1], and the performance evaluation

conducted by Misra and Chaudhur [13].

CUDPP library: http://cudpp.github.io.

N.W. Cassee & A.J. Wijs 3

The hash table described by Alcantara et al. [1, 3] uses Cuckoo hashing and is implemented in

the CUDA library CUDPP. In the work of Alcantara et al., different aspects of Cuckoo hashing are

thoroughly analysed. Various aspects inﬂuencing the performance of the hash table are analysed and

compared, including the size of the hash table and how long an insertion on the hash table tends to take.

However, no performance evaluation on non-random data for Cuckoo hashing on the GPU exists in the

literature.

Other lock-free hash tables are those proposed by Moazeni & Sarrafzadeh [14] and by Bordawekar

[6]. Unfortunately, to the best of our knowledge, no implementations are publicly available.

Regarding GPU accelerated model checking, relevant work is the paper by Bartocci et al. [5], in

which they introduce modiﬁcations of the SPIN model checker that take advantage of a GPU architecture.

They experience signiﬁcant performance increases for larger problems, for which concurrency of the

GPU starts to pays off. The hash table they use is the one by Alcantara et al., which means that they are

restricted to analysing state spaces containing states that can be stored in 64 bits.

Edelkamp et al. [7] investigated the applicability of GPUs for probabilistic model checking, a pro-

cess in which matrix-vector multiplications are performed and systems of linear equations are solved.

Modiﬁcations to the algorithm such that these operations are performed on a GPU can cause speed-ups

of 18 times compared to executions on the CPU.

Furthermore, Wu. et al. investigated the use of GPUs for on-the-ﬂy state space exploration using

Cuckoo hashing [29], and Wijs and Bo

sna

cki developed GPUEXPLORE, which uses a custom collision

revolvement scheme to maintain which states have already been visited. The latter tool achieves a per-

formance increase of, on average, 120 times compared to CPU based state space exploration, in cases

where state spaces consisting of at least a million states are analysed [23, 28].

Finally, Edelkamp et al. investigated the applicability of GPUs to speedup the generation of very

large state spaces using BFS [11]. In their work, Edelkamp et al. utilise a perfect hashing function to

keep track of the current depth of the BFS, so that during iterations over the state space, only the currently

open states are considered.

3 Background

3.1 CUDA

In the 2000s, development of more programmable GPUs started. The change from ﬁxed pipeline graph-

ical units to programmable graphical processors opened up new possibilities for software developers.

GPUs have signiﬁcantly more processing units than CPUs, allowing more instructions to be executed in

parallel.

The introduction of more programmable GPUs has made it possible to use GPUs for applications

other than graphical calculations. Before the arrival of general purpose graphics processors, the only

way to use GPUs for more general purpose applications was by transforming the problem to a problem of

triangles and polygons, and then solving this problem using the available graphics Software Development

Kits (SDK).

The increased capability of GPUs led Stanford researchers to deﬁne GPUs as stream processors. In

turn, this has made it possible to compile C code to be executed on the GPU. NVIDIA expanded upon

this concept and in 2006 released the ﬁrst version of CUDA. CUDA is an SDK that allows code in a

high-level language such as C or C++ to be compiled for, and executed on, an NVIDIA GPU.

Unlike CPUs, GPUs are built to take large batches of data and execute the same, short, sequence of

operations in parallel on all elements of that data. Because of this, GPUs often consist of several hundreds

剩余14页未读，继续阅读

评论收藏

内容反馈

weixin_38747444

粉丝: 9
资源: 912

Analysing the Performance of GPU Hash Tables for State Space Exp...

最新资源

Analysing the Performance of GPU Hash Tables for State Space Exp...

tsa时间序列分析预测.rar

《网络数据爬取与分析实务教程》相关代码与数据集_Data-Crawling-and-Analysing.zip

range.rar 词频统计软件

Understanding Advanced JavaScript

DbMouse Pro 5.6.3

Oxford NLP lecture

2014-guidance-taking-safe-decisions.pdf

Cognitive Computing for Big Data Systems Over IoT(2017).epub

Analysing-of--C-language.rar_Windows编程_Visual_C++_

Bayesian Methods for Statistical Analysis_[Puza2015]

Analysing-Transport-Data-for-Co.-Wexford

Analysing the POC of CVE-2012-0003

jdk-9.0.1_doc-all 最新版

Design_and_Implementation_of_a_Wireless_OBD_II_Fle.pdf

libforensic1394-0.2.tar.gz

This module contains functions for analysing strings. This .

Programmable User Modelling Applications

Maple_in_Finance_ApplicationPack

A PACKING GENERATION SCHEME FOR THE GRANULAR.pdf

MCSD - Analysing Requirements and Defining Solutions Architecture.CHM

IOTSim A simulator for analysing IoT applications

ICS-packets-Analyzer:它可以分析ICS的数据包

Advances in Feature Selection for Data and Pattern Recognition

Analytical Network and System Administration: Managing Human-Computer Systems

Qt 5实现串口调试助手 （源工程文件、0积分下载）

【SystemVerilog】路科验证V2学习笔记（全600页）.pdf

最新资源

Qt 5实现串口调试助手（源工程文件、0积分下载）