Data Structures for Text Sequences

Charles Crowley

University of New Mexico



June 10, 1998

Abstract

The data structure used to maintain the sequence of characters is an important part of a text editor. This paper investigates and evaluates the range of possible data structures for text sequences. The ADT interface to the text sequence component of a text editor is examined. Six common sequence data structures (array, gap, list, line pointers, fixed size buffers and piece tables) are examined and then a general model of sequence data structures that encompasses all six structures is presented. The piece table method is explained in detail and its advantages are presented. The design space of sequence data structures is examined and several variations on the ones listed above are presented. These sequence data structures are compared experimentally and evaluated based on a number of criteria. The experimental comparison is done by implementing each data structure in an editing simulator and testing it using a synthetic load of many thousands of edits. We also report on experiments on the sensitivity of the results to variations in the parameters used to generate the synthetic editing load.

1 Introduction

The central data structure in a text editor is the one that manages the sequence of characters that represents the current state of the file that is being edited. Every text editor requires such a data structure, but books on data structures do not cover data structures for text sequences. Articles on the design of text editors often discuss the data structure they use [1, 3, 6, 8, 11, 12] but they do not cover the area in a general way. This article is concerned with such data structures.

Figure 1 shows where sequence data structures fit in with other data structures. Some ordered sets are ordered by something intrinsic in the items in the sets (e.g., the value of an integer, the lexicographic position of a string) and the position of an inserted item depends on its value and the values of the items already in the set. Such data structures are mainly concerned with fast searching. Data structures for this type of ordered set have been studied extensively.

The other possibility is for the order to be determined by where the items are placed when they are inserted into the set. If insert and delete are restricted to the two ends of the ordering then you have a deque. For deques, the two basic data structures are an array (used circularly) and a linked list. Nothing beyond this is necessary due to the simplicity of the ADT interface to deques. If you can insert and delete items from anywhere in the ordering you have a sequence. An important subclass is sequences where reading an item in the sequence (by position number) is extremely localized. This is the case for text editors and it is this subclass that is examined in this paper.

Author's address: Computer Science Department, University of New Mexico, Albuquerque, New Mexico 87131, office: 505-277-5446, messages: 505-277-3112, fax: 505-277-0813, email: crowley@unmvax.cs.unm.edu

[Figure: a taxonomy tree with two levels, abstract data types above and data structures below.]

  Ordered sets (with inserts, deletes and lookups)
    Ordered by item attributes: tree, heap, hash table, ...
    Ordered by where inserted
      Insert/delete at ends only (deque): linked list, array
      Unrestricted insert/delete (sequence): linked list, array, gap, piece tables, fixed size buffers, line spans

Figure 1: Ordered sets

A linked list and an array are the two obvious data structures for a sequence. Neither is suitable for a general purpose text editor (a linked list takes up too much memory and an array is too slow because it requires too much data movement) but they provide useful base cases on which to build more complex sequence data structures. The gap method is a simple extension of an array: it is simply an array with a gap in the middle where characters are inserted and deleted. Many text editors use the gap method since it is simple and quite efficient, but the demands on a modern text editor (multiple files, very large files, structured text, sophisticated undo, virtual memory, etc.) encourage the investigation of more complicated data structures which might handle these things more effectively.
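The gap method described above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the implementation evaluated in this paper: the names GapBuffer, gb_init, gb_move_gap, gb_insert, gb_delete and gb_item_at are hypothetical, positions are assumed to be zero-based, and the buffer is assumed to double in size when the gap fills up.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical gap buffer: the text lives in buf[0..gap_start) and
   buf[gap_end..cap); buf[gap_start..gap_end) is the unused gap. */
typedef struct {
    char  *buf;
    size_t cap;
    size_t gap_start;
    size_t gap_end;
} GapBuffer;

static size_t gb_length(const GapBuffer *g) {
    return g->cap - (g->gap_end - g->gap_start);
}

static int gb_init(GapBuffer *g, size_t cap) {
    if (cap == 0) cap = 16;              /* assumed default size */
    g->buf = malloc(cap);
    if (g->buf == NULL) return -1;
    g->cap = cap;
    g->gap_start = 0;
    g->gap_end = cap;                    /* the whole buffer is gap */
    return 0;
}

/* Slide the gap so that it starts at logical position pos. */
static void gb_move_gap(GapBuffer *g, size_t pos) {
    assert(pos <= gb_length(g));
    if (pos < g->gap_start) {            /* move text from before the gap to after it */
        size_t n = g->gap_start - pos;
        memmove(g->buf + g->gap_end - n, g->buf + pos, n);
        g->gap_start -= n;
        g->gap_end -= n;
    } else if (pos > g->gap_start) {     /* move text from after the gap to before it */
        size_t n = pos - g->gap_start;
        memmove(g->buf + g->gap_start, g->buf + g->gap_end, n);
        g->gap_start += n;
        g->gap_end += n;
    }
}

/* Insert ch before logical position pos; returns 0 on success. */
static int gb_insert(GapBuffer *g, size_t pos, char ch) {
    if (pos > gb_length(g)) return -1;
    if (g->gap_start == g->gap_end) {    /* gap is used up: grow the array */
        size_t new_cap = g->cap * 2;
        size_t tail = g->cap - g->gap_end;
        char *nb = realloc(g->buf, new_cap);
        if (nb == NULL) return -1;
        memmove(nb + new_cap - tail, nb + g->gap_end, tail);
        g->buf = nb;
        g->gap_end = new_cap - tail;
        g->cap = new_cap;
    }
    gb_move_gap(g, pos);
    g->buf[g->gap_start++] = ch;         /* the new character shrinks the gap */
    return 0;
}

/* Delete the character at logical position pos; returns 0 on success. */
static int gb_delete(GapBuffer *g, size_t pos) {
    if (pos >= gb_length(g)) return -1;
    gb_move_gap(g, pos);
    g->gap_end++;                        /* the gap absorbs the character after it */
    return 0;
}

static char gb_item_at(const GapBuffer *g, size_t pos) {
    assert(pos < gb_length(g));
    size_t gap = g->gap_end - g->gap_start;
    return g->buf[pos < g->gap_start ? pos : pos + gap];
}
```

Data is moved only when the gap has to move, so a run of inserts or deletes at the same place costs almost nothing; this is the locality that makes the method attractive for editing.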

The more sophisticated sequence data structures keep the sequence as a recursive sequence of spans of text. The line span method keeps each line together and keeps an array or a linked list of line pointers. The fixed buffer method keeps a linked list of fixed size buffers, each of which is partially full of text from the sequence. Both the line span method and the fixed buffer method have been used for many text editors.

A less commonly used method is the piece table method, which keeps the text as a sequence of "pieces" of text from either the original file or an "added text" file. This method has many advantages and these will become clear as the methods are presented in detail and analyzed. A major purpose of this paper is to describe the piece table method and explain why it is a good data structure for text sequences.
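To make the idea of "pieces" concrete, here is a minimal C sketch of a piece table that supports single-character insertion. It is an illustration only and makes several assumptions that do not come from this paper: the names Piece, PieceTable, pt_init, pt_insert, pt_length and pt_item_at are hypothetical, the original file and the added text are held in in-memory buffers, the pieces are stored in a small fixed-size array, and lookup is a simple linear scan.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { MAX_PIECES = 64, ADD_CAP = 256 };   /* illustrative fixed limits */

typedef struct {
    int    in_add;   /* 0: span of the original text, 1: span of the added text */
    size_t start;    /* offset of the span within its buffer */
    size_t len;      /* length of the span in characters */
} Piece;

typedef struct {
    const char *original;           /* never modified */
    char        added[ADD_CAP];     /* append-only */
    size_t      added_len;
    Piece       pieces[MAX_PIECES];
    size_t      npieces;
} PieceTable;

static void pt_init(PieceTable *t, const char *original) {
    size_t n = strlen(original);
    t->original = original;
    t->added_len = 0;
    t->npieces = 0;
    if (n > 0) t->pieces[t->npieces++] = (Piece){0, 0, n};
}

static size_t pt_length(const PieceTable *t) {
    size_t n = 0;
    for (size_t i = 0; i < t->npieces; i++) n += t->pieces[i].len;
    return n;
}

static char pt_item_at(const PieceTable *t, size_t pos) {
    for (size_t i = 0; i < t->npieces; i++) {
        const Piece *p = &t->pieces[i];
        if (pos < p->len)
            return (p->in_add ? t->added : t->original)[p->start + pos];
        pos -= p->len;
    }
    assert(!"position out of range");
    return '\0';
}

/* Insert ch before logical position pos; returns 0 on success, -1 on failure. */
static int pt_insert(PieceTable *t, size_t pos, char ch) {
    if (t->added_len == ADD_CAP || t->npieces + 2 > MAX_PIECES) return -1;
    size_t i = 0;
    while (i < t->npieces && pos > t->pieces[i].len) pos -= t->pieces[i++].len;
    if (i == t->npieces && pos > 0) return -1;        /* beyond the end */

    Piece added = {1, t->added_len, 1};
    t->added[t->added_len++] = ch;                    /* text is only ever appended */

    size_t at = i;                                    /* index for the new piece */
    if (i < t->npieces && pos > 0) {
        Piece *p = &t->pieces[i];
        at = i + 1;
        if (pos < p->len) {                           /* split p at the insertion point */
            Piece right = {p->in_add, p->start + pos, p->len - pos};
            p->len = pos;
            memmove(&t->pieces[at + 1], &t->pieces[at],
                    (t->npieces - at) * sizeof(Piece));
            t->pieces[at] = right;
            t->npieces++;
        }
    }
    memmove(&t->pieces[at + 1], &t->pieces[at], (t->npieces - at) * sizeof(Piece));
    t->pieces[at] = added;
    t->npieces++;
    return 0;
}
```

Note that an insert never moves or changes existing text: the original buffer stays read-only, the added buffer only grows at its end, and all editing happens in the small table of pieces.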



Sequence Empty( );



ReturnCode Insert( Sequence *sequence, Position position, Item ch );



ReturnCode Delete( Sequence *sequence, Position position );



Item ItemAt( Sequence *sequence, Position position );

This does not actually require a pointer to a Sequence since no change to the sequence is being made, but we expect that sequences will be large structures and we should not be passing them around. I am ignoring error returns (e.g., position out of range) for the purposes of this discussion. These are easily added if desired.



ReturnCode Close( Sequence *sequence );
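As a base case, the five operations listed above can be implemented directly on a growable array. The following C sketch is one possible reading of the interface, not a definitive one: the concrete definitions of Item, Position, ReturnCode and the fields of Sequence are assumptions made here for illustration, as are the SEQ_OK and SEQ_FAIL codes and the zero-based positions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef char   Item;                                  /* assumed: one character per item */
typedef size_t Position;                              /* assumed: zero-based index */
typedef enum { SEQ_OK = 0, SEQ_FAIL = -1 } ReturnCode; /* assumed return codes */

typedef struct {
    Item  *items;      /* contiguous storage for the characters */
    size_t length;     /* number of items in the sequence */
    size_t capacity;   /* number of items the storage can hold */
} Sequence;

Sequence Empty(void) {
    Sequence s = { NULL, 0, 0 };
    return s;
}

/* Insert ch before position, shifting everything after it up by one. */
ReturnCode Insert(Sequence *sequence, Position position, Item ch) {
    if (position > sequence->length) return SEQ_FAIL;
    if (sequence->length == sequence->capacity) {
        size_t cap = sequence->capacity ? sequence->capacity * 2 : 16;
        Item *p = realloc(sequence->items, cap * sizeof(Item));
        if (p == NULL) return SEQ_FAIL;
        sequence->items = p;
        sequence->capacity = cap;
    }
    memmove(sequence->items + position + 1, sequence->items + position,
            (sequence->length - position) * sizeof(Item));
    sequence->items[position] = ch;
    sequence->length++;
    return SEQ_OK;
}

/* Delete the item at position, shifting everything after it down by one. */
ReturnCode Delete(Sequence *sequence, Position position) {
    if (position >= sequence->length) return SEQ_FAIL;
    memmove(sequence->items + position, sequence->items + position + 1,
            (sequence->length - position - 1) * sizeof(Item));
    sequence->length--;
    return SEQ_OK;
}

/* Error returns are ignored here, as in the text; a bad position aborts. */
Item ItemAt(Sequence *sequence, Position position) {
    assert(position < sequence->length);
    return sequence->items[position];
}

ReturnCode Close(Sequence *sequence) {
    free(sequence->items);
    *sequence = Empty();
    return SEQ_OK;
}
```

Every Insert and Delete here moves all of the items after the edit position, which is the data movement that makes a plain array too slow for large files.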

Many variations are possible. The next few paragraphs discuss some of them.

Any practical interface would allow the sequence to be initialized with the contents of a file. In theory this is just the Empty operation followed by an Insert operation for each character in the initializing file. Of course, this is too inefficient for a real text editor.

Instead we would have a NewSequence operation:

Sequence NewSequence( char *file_name );

The sequence is initialized with the contents of the file whose name is contained in 'file_name'.

Usually the Delete operation will delete any logically contiguous subsequence:

ReturnCode Delete( Sequence *sequence, Position beginPosition, Position endPosition );

Sometimes the Insert operation will insert a subsequence instead of just a single character:

ReturnCode Insert( Sequence *sequence, Position position, Sequence sequenceToInsert );

Sometimes Copy and Move are separate operations (instead of being composed of Inserts and Deletes):

ReturnCode Copy( Sequence *sequence, Position fromBegin, Position fromEnd, Position toPosition );

ReturnCode Move( Sequence *sequence, Position fromBegin, Position fromEnd, Position toPosition );

A Replace operation that subsumes Insert and Delete is another possibility:

ReturnCode Replace( Sequence *sequence, Position fromBegin, Position fromEnd, Sequence sequenceToReplaceItWith );
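The way Replace subsumes the other two operations can be shown with a small C sketch: an insert is a replace of an empty range, and a delete is a replace with empty text. This is an illustration under assumptions that are not part of the interface above: the names Seq, seq_replace, seq_insert and seq_delete are hypothetical, the sequence is a fixed-capacity character array, the replacement is passed as a pointer and a length rather than as a Sequence, and the range is taken to be half-open, [from_begin, from_end).

```c
#include <assert.h>
#include <string.h>

enum { SEQ_CAP = 128 };   /* illustrative fixed capacity */

typedef struct {
    char   text[SEQ_CAP];
    size_t length;
} Seq;

/* Replace the half-open range [from_begin, from_end) with n characters
   from repl (which must not point into s->text). Returns 0 on success. */
static int seq_replace(Seq *s, size_t from_begin, size_t from_end,
                       const char *repl, size_t n) {
    if (from_begin > from_end || from_end > s->length) return -1;
    size_t removed = from_end - from_begin;
    if (s->length - removed + n > SEQ_CAP) return -1;
    /* Shift the tail so that it follows the replacement text. */
    memmove(s->text + from_begin + n, s->text + from_end, s->length - from_end);
    memcpy(s->text + from_begin, repl, n);
    s->length = s->length - removed + n;
    return 0;
}

/* Insert is a Replace of the empty range at pos. */
static int seq_insert(Seq *s, size_t pos, const char *str, size_t n) {
    return seq_replace(s, pos, pos, str, n);
}

/* Delete is a Replace with an empty replacement. */
static int seq_delete(Seq *s, size_t begin, size_t end) {
    return seq_replace(s, begin, end, "", 0);
}
```

With this arrangement only seq_replace has to know how the text is stored, which is one reason an interface might offer Replace as its single editing primitive.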

Finally the ItemAt procedure could retrieve a subsequence.

Although this is the method I use in my text editor simulator described later.
