没有合适的资源?快使用搜索试试~ 我知道了~
温馨提示
Timo Kehrer and Alice Miller (Eds.): Workshop on Graphs as Models 2017 (GAM ’17) EPTCS 263, 2017, pp. 1–15, doi:10.4204/EPTCS.263.1c:copyright: N.W. Cassee & A.J. WijsAnalysing the Performance of GPU Hash Tables for State Space ExplorationNathan Cassee Eindhoven University of TechnologyEindhoven, The Netherlands N.W.Cassee@student.tue.nlAnton Wijs Eindhoven University of TechnologyEindhoven, The Netherlands A.J.Wijs@tue.nlAbstract: In the past few years, General Purpose Graphics Processors (GPUs
资源推荐
资源详情
资源评论
Timo Kehrer and Alice Miller (Eds.):
Workshop on Graphs as Models 2017 (GAM ’17)
EPTCS 263, 2017, pp. 1–15, doi:10.4204/EPTCS.263.1
c
N.W. Cassee & A.J. Wijs
Analysing the Performance of GPU Hash Tables for State
Space Exploration
Nathan Cassee
Eindhoven University of Technology
Eindhoven, The Netherlands
N.W.Cassee@student.tue.nl
Anton Wijs
Eindhoven University of Technology
Eindhoven, The Netherlands
A.J.Wijs@tue.nl
Abstract: In the past few years, General Purpose Graphics Processors (GPUs) have been used to
significantly speed up numerous applications. One of the areas in which GPUs have recently led to a
significant speed-up is model checking. In model checking, state spaces, i.e., large directed graphs,
are explored to verify whether models satisfy desirable properties. GPUEXPLORE is a GPU-based
model checker that uses a hash table to efficiently keep track of already explored states. As a large
number of states is discovered and stored during such an exploration, the hash table should be able
to quickly handle many inserts and queries concurrently. In this paper, we experimentally compare
two different hash tables optimised for the GPU, one being the GPUEXPLORE hash table, and the
other using Cuckoo hashing. We compare the performance of both hash tables using random and
non-random data obtained from model checking experiments, to analyse the applicability of the two
hash tables for state space exploration. We conclude that Cuckoo hashing is three times faster than
GPUEXPLORE hashing for random data, and that Cuckoo hashing is five to nine times faster for
non-random data. This suggests great potential to further speed up GPUEXPLORE in the near future.
1 Introduction
General Purpose Graphics Processors (GPUs) have been used to significantly speed up computations in
numerous application domains. Contrary to CPUs, GPUs can handle many thousands of parallel threads,
allowing for a great increase in parallel processing of data. This increase has opened up new possibilities
to improve the performance of algorithms, as significant speedups can be achieved by using optimised
algorithms and lock-free data structures.
One of the domains to which GPU computing has been applied in the last few years is model check-
ing. Model checking involves taking a model of an (often concurrent) system and verifying whether
certain properties are satisfied by that model. The semantics of such models can be expressed in a state
space, which is a directed graph (often sparse), with labels either on the edges or the nodes. During model
checking, such a state space is constructed by interpreting the model. Starting from the initial state of the
model, exploration continues until either all reachable states have been explored, or a counter-example
to the property being verified has been encountered [4]. This operation can be very resource-demanding,
as state spaces tend to grow exponentially as the number of concurrent components in a model grows
linearly. This problem is commonly referred to as the state space explosion problem. For this reason,
the use of modern parallel architectures is very appealing, since it could make the analysis of large state
spaces more practically feasible.
In recent years, approaches have been developed to perform state space exploration either using a
combination of the CPU and the GPU, or specifically using the GPU [5, 7, 8, 21, 26, 27, 23, 28, 29].
Most approaches use an exploration strategy similar to Breadth-First Search (BFS).
2 Analysing the Performance of GPU Hash Tables for State Space Exploration
A data structure commonly used during state space exploration is the hash table [10]. It is used to
keep track of the states encountered so far. Accessing the hash table tends to be done very frequently
during model checking, so in a parallel setting, it is critical that the hash table being used can very
efficiently handle parallel accesses. For this reason, first of all, it needs to be lock-free, and second of
all, it needs to have a suitable hashing mechanism. Even though identifying which hash tables are most
suitable for model checking is an important undertaking, for GPUs, this has not been thoroughly done
in the past, or at least, no reports have been published on it. In this paper, we discuss such a thorough
comparison.
Only a very few different hash tables have been proposed so far for GPUs. A notable one is the one
by Alcantara et al. [3]. They propose to use a version of Cuckoo hashing [17] optimised for parallel
execution using 64-bit data elements. It has been adopted in the standard CUDPP library
1
. In addition,
Wijs and Bo
ˇ
sna
ˇ
cki developed their own custom hash table for their parallel state space exploration tool
GPUEXPLORE [23, 28].
Alcantara et al. compared the performance of their hash table to the building and querying of a sorted
array. The keys and values used by Alcantara et al. to test the performance of Cuckoo hashing are ran-
domly generated integers [1]. Consequently, the performance of their Cuckoo hash table has previously
only been analysed using sequences of random and unique integers, and not using non-random sequences
stemming from real data. Since the performance of a hash table can be influenced by different patterns
encountered in the input data, for instance, the frequency at which the same values are encountered again
and again, it is useful to consider non-random data as well.
Beides this, for the hash table implemented in GPUEXPLORE, no isolated performance evaluation
has been done in the past. Instead of Cuckoo hashing, the hash table implemented for GPUEXPLORE
uses another way to resolve collisions and instead of operating on 64-bit elements, it supports elements
of arbitrary length.
This paper addresses reports the results of a performance evaluation between Cuckoo hashing and
GPUEXPLORE hashing. We analyse Cuckoo hashing and GPUEXPLORE hashing with both random and
non-random datasets. The non-random data originates from actual state space explorations performed by
a model checker, in which the revisiting of states has been recorded. This data allows us to compare the
applicability of the two hash tables for state space exploration.
The structure of the paper is as follows. Related work is discussed in Section 2. In Section 3,
first a brief explanation of GPUs is given, which is particularly focussed on NVIDIA GPUs and the
CUDA programming interface, as we employed those GPUs for our experiments. Furthermore, Section
3 also provides an overview of the approach used in GPUEXPLORE to do model checking. A detailed
description of the two hash tables is given in Section 4. The experimental setup is explained in Section 5.
In Section 6, the results of the comparisons are discussed, and finally, conclusion and pointers to future
work are given in Section 7.
2 Related work
There are several papers introducing lock-free hash tables for GPUs [3, 13, 14, 6]. However, in those
papers only limited performance evaluations have been conducted. Specifically, the evaluations that
have been performed involved sequences of equally distributed random values. This is the case for the
performance evaluations conducted in the dissertation of Alcantara [1], and the performance evaluation
conducted by Misra and Chaudhur [13].
1
CUDPP library: http://cudpp.github.io.
N.W. Cassee & A.J. Wijs 3
The hash table described by Alcantara et al. [1, 3] uses Cuckoo hashing and is implemented in
the CUDA library CUDPP. In the work of Alcantara et al., different aspects of Cuckoo hashing are
thoroughly analysed. Various aspects influencing the performance of the hash table are analysed and
compared, including the size of the hash table and how long an insertion on the hash table tends to take.
However, no performance evaluation on non-random data for Cuckoo hashing on the GPU exists in the
literature.
Other lock-free hash tables are those proposed by Moazeni & Sarrafzadeh [14] and by Bordawekar
[6]. Unfortunately, to the best of our knowledge, no implementations are publicly available.
Regarding GPU accelerated model checking, relevant work is the paper by Bartocci et al. [5], in
which they introduce modifications of the SPIN model checker that take advantage of a GPU architecture.
They experience significant performance increases for larger problems, for which concurrency of the
GPU starts to pays off. The hash table they use is the one by Alcantara et al., which means that they are
restricted to analysing state spaces containing states that can be stored in 64 bits.
Edelkamp et al. [7] investigated the applicability of GPUs for probabilistic model checking, a pro-
cess in which matrix-vector multiplications are performed and systems of linear equations are solved.
Modifications to the algorithm such that these operations are performed on a GPU can cause speed-ups
of 18 times compared to executions on the CPU.
Furthermore, Wu. et al. investigated the use of GPUs for on-the-fly state space exploration using
Cuckoo hashing [29], and Wijs and Bo
ˇ
sna
ˇ
cki developed GPUEXPLORE, which uses a custom collision
revolvement scheme to maintain which states have already been visited. The latter tool achieves a per-
formance increase of, on average, 120 times compared to CPU based state space exploration, in cases
where state spaces consisting of at least a million states are analysed [23, 28].
Finally, Edelkamp et al. investigated the applicability of GPUs to speedup the generation of very
large state spaces using BFS [11]. In their work, Edelkamp et al. utilise a perfect hashing function to
keep track of the current depth of the BFS, so that during iterations over the state space, only the currently
open states are considered.
3 Background
3.1 CUDA
In the 2000s, development of more programmable GPUs started. The change from fixed pipeline graph-
ical units to programmable graphical processors opened up new possibilities for software developers.
GPUs have significantly more processing units than CPUs, allowing more instructions to be executed in
parallel.
The introduction of more programmable GPUs has made it possible to use GPUs for applications
other than graphical calculations. Before the arrival of general purpose graphics processors, the only
way to use GPUs for more general purpose applications was by transforming the problem to a problem of
triangles and polygons, and then solving this problem using the available graphics Software Development
Kits (SDK).
The increased capability of GPUs led Stanford researchers to define GPUs as stream processors. In
turn, this has made it possible to compile C code to be executed on the GPU. NVIDIA expanded upon
this concept and in 2006 released the first version of CUDA. CUDA is an SDK that allows code in a
high-level language such as C or C++ to be compiled for, and executed on, an NVIDIA GPU.
Unlike CPUs, GPUs are built to take large batches of data and execute the same, short, sequence of
operations in parallel on all elements of that data. Because of this, GPUs often consist of several hundreds
剩余14页未读,继续阅读
资源评论
weixin_38747444
- 粉丝: 9
- 资源: 912
上传资源 快速赚钱
- 我的内容管理 展开
- 我的资源 快来上传第一个资源
- 我的收益 登录查看自己的收益
- 我的积分 登录查看自己的积分
- 我的C币 登录后查看C币余额
- 我的收藏
- 我的下载
- 下载帮助
最新资源
资源上传下载、课程学习等过程中有任何疑问或建议,欢迎提出宝贵意见哦~我们会及时处理!
点击此处反馈
安全验证
文档复制为VIP权益,开通VIP直接复制
信息提交成功