没有合适的资源?快使用搜索试试~ 我知道了~
资源推荐
资源详情
资源评论
ben stopford
Gently Flexing the Grid
Log Structured Merge Trees
It’s nearly a decade since Google released its ‘Big Table’ paper. One of the many cool aspects of that paper was the file
organisation it uses. The approach is more generally known as the Log Structured Merge Tree, after this 1996 paper, although the
algorithm described there differs quite significantly from most real-world implementations.
LSM is now used in a number of products as the main file organisation strategy. HBase, Cassandra, LevelDB, SQLite, even MongoDB
3.0 comes with an optional LSM engine, after it’s acquisition of Wired Tiger.
What makes LSM trees interesting is their departure from binary tree style file organisations that have dominated the space for
decades. LSM seems almost counter intuitive when you first look at it, only making sense when you closely consider how files work
in modern, memory heavy systems.
Some Background
In a nutshell LSM trees are designed to provide better write throughput than traditional B+ tree or ISAM approaches. They do this
by removing the need to perform dispersed, update-in-place operations.
So why is this a good idea? At its core it’s the old problem of disks being slow for random
operations, but fast when accessed sequentially. A gulf exists between these two types of access, regardless of whether the disk is
magnetic or solid state or even, although to a lesser extent, main memory.
The figures in this ACM report here/here make the point well. They show that, somewhat counter intuitively, sequential disk access is
faster than randomly accessing main memory. More relevantly they also show sequential access to disk, be it magnetic or SSD, to be
at least three orders of magnitude faster than random IO. This means random operations are to be avoided. Sequential access is well
worth designing for.
So with this in mind lets consider a little thought experiment: if we are interested in write throughput, what is the best method to
use? A good starting point is to simply append data to a file. This approach, often termed logging, journaling or a heap file, is fully
sequential so provides very fast write performance equivalent to theoretical disk speeds (typically 200-300MB/s per drive).
Benefiting from both simplicity and performance log/journal based approaches have rightfully become popular in many big data
tools. Yet they have an obvious downside. Reading arbitrary data from a log will be far more time consuming than writing to it,
involving a reverse chronological scan, until the required key is found.
This means logs are only really applicable to simple workloads, where data is either accessed in its entirety, as in the write-ahead log
of most databases, or by a known offset, as in simple messaging products like Kafka.
So we need more than just a journal to efficiently perform more complex read workloads like key based access or a range search.
Broadly speaking there are four approaches that can help us here: binary search, hash, B+ or external.
1. Search Sorted File: save data to a file, sorted by key. If data has defined widths use Binary search. If not use a page index +
scan.
2. Hash: split the data into buckets using a hash function, which can later be used to direct reads.
3. B+: use navigable file organisation such as a B+ tree, ISAM etc.
4. External file: leave the data as a log/heap and create a separate hash or tree index into it.
All these approaches improve read performance significantly ( n->O(log(n)) in most). Alas these structures add order and that order
impedes write performance, so our high speed journal file is lost along the way. You can’t have your cake and eat it I guess.
资源评论
奋斗的松鼠
- 粉丝: 276
- 资源: 1
上传资源 快速赚钱
- 我的内容管理 展开
- 我的资源 快来上传第一个资源
- 我的收益 登录查看自己的收益
- 我的积分 登录查看自己的积分
- 我的C币 登录后查看C币余额
- 我的收藏
- 我的下载
- 下载帮助
最新资源
资源上传下载、课程学习等过程中有任何疑问或建议,欢迎提出宝贵意见哦~我们会及时处理!
点击此处反馈
安全验证
文档复制为VIP权益,开通VIP直接复制
信息提交成功