没有合适的资源?快使用搜索试试~ 我知道了~
Compacted Implementations of Deterministic Finite Automata
0 下载量 101 浏览量
2021-02-09
00:48:04
上传
评论
收藏 137KB PDF 举报
温馨提示
Automaton is a popular data structure with many applications. We present implementation methods of deterministic finite automaton that allow both efficient space usage and performance. The first method utilizes the succinct data structures to speedup the computation of the transition function. The method archives O(log log sigma) time state transition with little augmented memory where sigma is the size of the alphabet. The second method, namely overlapping table, reduces the space requirements
资源推荐
资源详情
资源评论
RESEARCH ARTICLE
Copyright © 2014 American Scientific Publishers
All rights reserved
Printed in the United States of America
Journal of
Computational and Theoretical Nanoscience
Vol. 11, 1–6, 2014
Compacted Implementations of
Deterministic Finite Automata
Meng Zhang
1 ∗
, Yi Zhang
2
, Wei Lv
1
, and Chen Hou
1
1
College of Computer Science and Technology, Jilin University, Changchun, China
2
Department of Computer Science, Jilin Business and Technology College, Changchun, China
Automaton is a popular data structure with many applications. We present implementation methods
of deterministic finite automaton that allow both efficient space usage and performance. The first
method utilizes the succinct data structures to speedup the computation of the transition function.
The method archives O(log log ) time state transition with little augmented memory where is the
size of the alphabet. The second method, namely overlapping table, reduces the space require-
ments of automata implemented by the state transition table approach. It makes the transition tables
of states share the overlapping address space thus reduces the space usage. The method has an
O(1) time state transition, while using fewer memory.
Keywords: Automaton, Compression, Succinct Data Structure.
1. INTRODUCTION
The deterministic finite automaton (DFA) is one of the
most popular data structures.
1 2
It is used in many fields
of computer science, such as data compression, pattern
matching, regular expression matching and text indexing.
Several DFAs used in these applications have been pro-
posed, such as Aho-Corasick automata,
3
automata for rec-
ognizing regular expressions,
4
suffix automata
5 6
and the
factor oracle.
7
The AC automaton solves the multiple pat-
tern matching problem in Onlog time ( is the size of
the alphabet, n is the length of the input text) which gen-
eralizes the Knuth–Morris–Pratt algorithm.
8
The AC algo-
rithm is extensively used in practice, such as deep packet
inspection. The suffix automaton presented by Blumer
et al.
5
is the minimal deterministic automaton accepting
the set of suffixes of a text.
6
Suffix automaton is one of
the core data structures of several efficient string match-
ing algorithms.
9
The Factor Oracle
7
is derived from the
DAWG. It can recognize more than the exact factors of a
string to achieve simplicity and low memory requirements.
The factor oracle is also applied in pattern matching, data
compression and machine learning.
The high space usage of automata limits their appli-
cability. In this paper, we focus on the implementation
of automata that allows both efficient storage and usage.
We don’t study the techniques for compacting specific
automata, such as reducing the number of states or edges
∗
Author to whom correspondence should be addressed.
or using the NFA (non-determined finite automaton) to
simulating the DFA. The problem that we consider is
to represent the general DFA space economically while
not slowing down the performance. Our techniques are
general-purpose that can be used in representing any type
of DFAs. We will use automata to refer to DFAs in
the rest of the paper. We give two representing meth-
ods. The first one utilizes the succinct data structures
10
to speedup the computation of the transition function,
archiving O(log log ) running time. The second one,
namely overlapping table, reduces the space requirements
of automata implemented by the state transition table
approach. In the case of constant alphabets, each state
in a DFA has its own -entry transition table. There are
empty entries in the transition tables. Our approach makes
the transition tables overlap, that is, tables share some
space such that an empty entry of one table is used by
a non-empty entry of another table. To solve the prob-
lem of determining the ownership of table entries, we
introduce a method that uses log -bit labels to identify
each table. The total memory of the tables is reduced
by reusing empty entries. The problem of generating the
minimal overlapping table is an optimization problem.
This problem can be reduced to the shortest common
superstring problem (SCS for short). Given a substring-
free set of strings P, the SCS asks for a shortest com-
mon superstring of P , that is, a minimum-length string
containing all strings from P as substrings. In computa-
tional biology,
11–13
the SCS is used for the DNA frag-
ment assembly problem.
11
The SCS is NP hard
14
and even
J. Comput. Theor. Nanosci. 2014, Vol. 11, No. 3 1546-1955/2014/11/001/006 doi:10.1166/jctn.2014.3443 1
资源评论
weixin_38747592
- 粉丝: 6
- 资源: 937
上传资源 快速赚钱
- 我的内容管理 展开
- 我的资源 快来上传第一个资源
- 我的收益 登录查看自己的收益
- 我的积分 登录查看自己的积分
- 我的C币 登录后查看C币余额
- 我的收藏
- 我的下载
- 下载帮助
最新资源
- (源码)基于Java的DVD租赁管理系统.zip
- (源码)基于Arduino的模型铁路控制系统.zip
- (源码)基于C语言STM32F10x框架的温湿度监控系统.zip
- (源码)基于Spring Boot的极简易课堂对话系统.zip
- (源码)基于JSP+Servlet+MySQL的学生管理系统.zip
- (源码)基于ESP8266的蜂箱监测系统.zip
- (源码)基于Spring MVC和Hibernate框架的学校管理系统.zip
- (源码)基于TensorFlow 2.3的高光谱水果糖度分析系统.zip
- (源码)基于Python框架库的知识库管理系统.zip
- (源码)基于C++的日志管理系统.zip
资源上传下载、课程学习等过程中有任何疑问或建议,欢迎提出宝贵意见哦~我们会及时处理!
点击此处反馈
安全验证
文档复制为VIP权益,开通VIP直接复制
信息提交成功