AradixtreeimplementationinANSIC.zip资源-CSDN文库

共14个文件

c：5个

h：4个

md：2个

需积分: 5 188 浏览量 2024-01-05 14:28:07 上传评论收藏 52KB ZIP 举报

资源推荐

资源详情

资源评论

收起资源包目录

A radix tree implementation in ANSI C.zip （14个子文件）

sss

rc4rand.h 145B

rax-oom-test.c 5KB

rax-test.c 36KB

rc4rand.c 4KB

rax.h 9KB

Makefile 517B

rax_oom_malloc.h 3KB

crc16.c 4KB

TODO.md 1KB

rax.c 75KB

.gitignore 33B

rax_malloc.h 2KB

README.md 20KB

COPYING 1KB

# Rax, an ANSI C radix tree implementation Rax is a radix tree implementation initially written to be used in a specific place of Redis in order to solve a performance problem, but immediately converted into a stand alone project to make it reusable for Redis itself, outside the initial intended application, and for other projects as well. The primary goal was to find a suitable balance between performances and memory usage, while providing a fully featured implementation of radix trees that can cope with many different requirements. During the development of this library, while getting more and more excited about how practical and applicable radix trees are, I was very surprised to see how hard it is to write a robust implementation, especially of a fully featured radix tree with a flexible iterator. A lot of things can go wrong in node splitting, merging, and various edge cases. For this reason a major goal of the project is to provide a stable and battle tested implementation for people to use and in order to share bug fixes. The project relies a lot on fuzz testing techniques in order to explore not just all the lines of code the project is composed of, but a large amount of possible states. Rax is an open source project, released under the BSD two clause license. Major features: * Memory conscious: + Packed nodes representation. + Able to avoid storing a NULL pointer inside the node if the key is set to NULL (there is an `isnull` bit in the node header). + Lack of parent node reference. A stack is used instead when needed. * Fast lookups: + Edges are stored as arrays of bytes directly in the parent node, no need to access non useful children while trying to find a match. This translates into less cache misses compared to other implementations. + Cache line friendly scanning of the correct child by storing edges as two separated arrays: an array of edge chars and one of edge pointers. * Complete implementation: + Deletion with nodes re-compression as needed. + Iterators (including a way to use iterators while the tree is modified). + Random walk iteration. + Ability to report and resist out of memory: if malloc() returns NULL the API can report an out of memory error and always leave the tree in a consistent state. * Readable and fixable implementation: + All complex parts are commented with algorithms details. + Debugging messages can be enabled to understand what the implementation is doing when calling a given function. + Ability to print the radix tree nodes representation as ASCII art. * Portable implementation: + Never does unaligned accesses to memory. + Written in ANSI C99, no extensions used. * Extensive code and possible states test coverage using fuzz testing. + Testing relies a lot on fuzzing in order to explore non trivial states. + Implementation of the dictionary and iterator compared with behavior-equivalent implementations of simple hash tables and sorted arrays, generating random data and checking if the two implementations results match. + Out of memory condition tests. The implementation is fuzzed with a special allocator returning `NULL` at random. The resulting radix tree is tested for consistency. Redis, the primary target of this implementation, does not use this feature, but the ability to handle OOM may make this implementation useful where the ability to survive OOMs is needed. + Part of Redis: the implementation is stressed significantly in the real world. The layout of a node is as follows. In the example, a node which represents a key (so has a data pointer associated), has three children `x`, `y`, `z`. Every space represents a byte in the diagram. +----+---+--------+--------+--------+--------+ |HDR |xyz| x-ptr | y-ptr | z-ptr |dataptr | +----+---+--------+--------+--------+--------+ The header `HDR` is actually a bitfield with the following fields: uint32_t iskey:1; /* Does this node contain a key? */ uint32_t isnull:1; /* Associated value is NULL (don't store it). */ uint32_t iscompr:1; /* Node is compressed. */ uint32_t size:29; /* Number of children, or compressed string len. */ Compressed nodes represent chains of nodes that are not keys and have exactly a single child, so instead of storing: A -> B -> C -> [some other node] We store a compressed node in the form: "ABC" -> [some other node] The layout of a compressed node is: +----+---+--------+ |HDR |ABC|chld-ptr| +----+---+--------+ # Basic API The basic API is a trivial dictionary where you can add or remove elements. The only notable difference is that the insert and remove APIs also accept an optional argument in order to return, by reference, the old value stored at a key when it is updated (on insert) or removed. ## Creating a radix tree and adding a key A new radix tree is created with: rax *rt = raxNew(); In order to insert a new key, the following function is used: int raxInsert(rax *rax, unsigned char *s, size_t len, void *data, void **old); Example usage: raxInsert(rt,(unsigned char*)"mykey",5,some_void_value,NULL); The function returns 1 if the key was inserted correctly, or 0 if the key was already in the radix tree: in this case, the value is updated. The value of 0 is also returned on out of memory, however in that case `errno` is set to `ENOMEM`. If the associated value `data` is NULL, the node where the key is stored does not use additional memory to store the NULL value, so dictionaries composed of just keys are memory efficient if you use NULL as associated value. Note that keys are unsigned arrays of chars and you need to specify the length: Rax is binary safe, so the key can be anything. The insertion function is also available in a variant that will not overwrite the existing key value if any: int raxTryInsert(rax *rax, unsigned char *s, size_t len, void *data, void **old); The function is exactly the same as raxInsert(), however if the key exists the function returns 0 (like raxInsert) without touching the old value. The old value can be still returned via the 'old' pointer by reference. ## Key lookup The lookup function is the following: void *raxFind(rax *rax, unsigned char *s, size_t len); This function returns the special value `raxNotFound` if the key you are trying to access is not there, so an example usage is the following: void *data = raxFind(rax,mykey,mykey_len); if (data == raxNotFound) return; printf("Key value is %p\n", data); raxFind() is a read only function so no out of memory conditions are possible, the function never fails. ## Deleting keys Deleting the key is as you could imagine it, but with the ability to return by reference the value associated to the key we are about to delete: int raxRemove(rax *rax, unsigned char *s, size_t len, void **old); The function returns 1 if the key gets deleted, or 0 if the key was not there. This function also does not fail for out of memory, however if there is an out of memory condition while a key is being deleted, the resulting tree nodes may not get re-compressed even if possible: the radix tree may be less efficiently encoded in this case. The `old` argument is optional, if passed will be set to the key associated value if the function successfully finds and removes the key. # Iterators The Rax key space is ordered lexicographically, using the value of the bytes the keys are composed of in order to decide which key is greater between two keys. If the prefix is the same, the longer key is considered to be greater. Rax iterators allow to seek a given element based on different operators and then to navigate the key space calling `raxNext()` and `raxPrev()`. ## Basic iterator usage Iterators are normally declared as local variables allocated on the stack, and then initialized with the `raxStart` function: raxIterator iter;

评论收藏

内容反馈