跳表-原理篇

什么是跳表

跳表（也可以叫跳跃表，英文名：Skip List）是由William Pugh发明的一种查找数据结构，支持对数据的快速查找、插入和删除

论文地址：skiplists

论文开篇就说明了这是一种可以用来代替平衡树的数据结构（平衡树的实现实在是太复杂了😩）

与平衡树相比，跳表的插入和删除算法要简单得多

BST存在的问题

Binary trees can be used for representing abstract data types such as dictionaries and ordered lists. They work well when the elements are inserted in a random order. Some sequences of operations, such as inserting the elements in order, produce degenerate data structures that give very poor performance. If it were possible to randomly permute the list of items to be in serted, trees would work well with high probability for any in put sequence. In most cases queries must be answered on-line, so randomly permuting the input is impractical. Balanced tree algorithms re-arrange the tree as operations are performed to maintain certain balance conditions and assure good perfor mance.

论文中提到，对于二叉搜索树而言，当插入的数据是顺序的时候，会产生退化

二叉搜索树（BST，Binary Search Tree），是一种具有特殊排序性质的二叉树，通过在插入数据时保证顺序以便后续能快速检索

理想情况下BST呈现类似以下的结构，能够达到类似二分查找的效果

graph TD;
    A[7] --> B[3];
    A --> C[9];
    B --> D[1];
    B --> E[5];
    E --> F[4];
    E --> G[6];
    C --> H[8];
    C --> I[10];

但是当插入的数据序列就是有序的时候，可能会出现退化成链表的情况

graph TD;
    A[2] --> B[1];
    A --> C[3];
    C --> D[4];
    D --> E[5];
    E --> F[6];

二叉搜索树可视化

关于退化的问题也可以看看小灰的漫画讲解

漫画：什么是跳跃表？

跳表的结构

针对BST会退化成链表的问题，或者说针对链表我们想要提高查询效率，使用二分查找的思想我们可以建立多级索引的概念

原论文中就有提到像AVL等强制平衡（结构定义比较复杂，每次操作都要保证结构的正确性），导致可能最终的性能很好，但是代码的实现难度可能直接导致其他人无法维护和扩展

论文原文是这么描述的：

Second, if the algorithm is complex, programmers are de terred from implementing optimizations. For example, bal anced tree algorithms are normally described using recursive insert and delete procedures, since that is the most simple and intuitive method of describing the algorithms. A recursive in sert or delete procedure incurs a procedure call overhead. By using non-recursive insert and delete procedures, some of this overhead can be eliminated. However, the complexity of non recursive algorithms for insertion and deletion in a balanced tree is intimidating and this complexity deters most program mers from eliminating recursion in these routines.

…

From a theoretical point of view, there is no need for skip lists. Balanced trees can do everything that can be done with skip lists and have good worst-case time bounds (unlike skip lists). However, implementing balanced trees is an exacting task and as a result balanced tree algorithms are rarely imple mented except as part of a programming assignment in a data structures class. Skip lists are a simple data structure that can be used in place of balanced trees for most applications. Skip lists algo rithms are very easy to implement, extend and modify. Skip lists are about as fast as highly optimized balanced tree algo rithms and are substantially faster than casually implemented balanced tree algorithms.

跳表设计上的一个巧妙的点就在于不强制维护索引层级（就是这个设计既兼顾了性能也简化了代码实现）

插入节点的时候随机生成索引层级数

Initialization（初始化）

An element NIL is allocated and given a key greater than any legal key. All levels of all skip lists are terminated with NIL. A new list is initialized so that the the level of the list is equal to 1 and all forward pointers of the list’s header point to NIL.

分配一个NIL节点（这个节点的键大于任何元素），所有的节点的所有级别的指针最终都以NIL为终点

Search（查找）

通过遍历前进指针来搜索元素

注意：遍历时不会越过正在搜索的元素的节点，当当前层级无法找到指定元素时，需要向下一层级搜索

算法伪代码

for i := list->level downto 1 do
	// 第一层循环，从上到下的层级遍历
	while x->forward[i]->key < searchKey do
		// 第二层循环，从左到右遍历，只有节点小于searchKey才能前进（因为是从上到下遍历，节点小于说明searchKey对应的节点一定在该节点 之后，所以可以前进）
		x := x->forward[i]

逐步解析

以搜索12节点为例子

符号说明
NIL：终点（表示无穷大，搜索到这个节点表示必须向下搜索，即本层遍历结束）
HEAD：头节点
x：当前节点
x.forward[i]：表示x的第i+1层的下一个节点（i+1层的前进指针）
x.forward[i].key：表示x的第i+1层的下一个节点的key

初始化x=HEAD
x.forward[3].key（也就是节点【6】），6小于12，x = x.forward[3]（前进）
x.forward[3].key（也就是节点【NIL】），NIL表示终点（无穷大），i--（向下搜索）
x.forward[2].key（也就是节点【25】），25大于12，i--（向下搜索）
x.forward[1].key（也就是节点【9】），9小于12，x = x.forward[1]（前进）
x.forward[1].key（也就是节点【25】），25大于12，i--（向下搜索）
x.forward[0].key（也就是节点【12】），12等于12，i--（向下搜索）
i<0表示没有在下一层了，遍历（循环）结束，x停留在节点【12】的前一个节点（也就是节点【9】）
x.forward[0].key是否等于12，如果是则说明找到目标，不是则表示目标不在跳表内

可以看到，整个遍历的过程是一个找前驱节点的过程，找到等于目标值的节点的前一个节点

随机函数（Random Level）

跳表使用随机的方式来生成新节点的层高

Insertion（插入）

理解了搜索的过程，插入操作也很好理解，本质就是找到前一个节点并将值插入的过程（不熟悉的建议复习下链表的插入操作😁）

算法伪代码

先上论文的伪代码

Insert(list, searchKey, newValue)
	local update[1...MaxLevel]
	x := list->header
	// 这一步和搜索的代码一样，找到前驱节点
	for i := list->level downto 1 do
		while x->forward[i].key < searchKey do
			x := x->forward[i]
		-- x->key < searchKey <= x->forward[i]->key
		// 记录了遍历路径中每层的最后一个节点
		update[i] := x
	x := x->forward[i]
	// 如果相同则直接更新值
	if x->key = searchKey then x->value := newValue
	else
		// 随机生成新节点的层级
		|v| := randomLevel()
		// 如果新节点的层级大于当前跳表的最高层级
		// 记录相应的信息方便后面统一更新节点
		// 还要更新当前跳表的最高层级
		if |v| > list->level then
			for i := list->level+1 to |v| do
				update[i] := list->header
			list->level := |v|
		x := makeNode(|v|, searchKey, value)
		// 和常规的链表插入操作类似
		// 1. 新节点连接前驱节点的下一个节点
		// 2. 前驱节点在连接新节点
		for i := 1 to level do
			x->forward[i] := update[i]->forward[i]
			update[i]->forward[i] := x

逐步解析

以下是论文中关于插入节点【17】的示意图

对着论文的示意图我们逐步分析代码的执行过程

符号说明
NIL：终点（表示无穷大，搜索到这个节点表示必须向下搜索，即本层遍历结束）
HEAD：头节点
x：当前节点
x.forward[i]：表示x的第i+1层的下一个节点（i+1层的前进指针）
x.forward[i].key：表示x的第i+1层的下一个节点的key
update：update数组，记录了遍历过程中每一层的最后一个节点（从哪个节点往下遍历的，这些节点是需要同步更新的）

初始化x=HEAD，初始化update数组（直接初始化最大的层级数的大小，后续不用动态扩缩）

遍历过程中与搜索的一样，只是在向下搜索之前将节点记录到update

|V| <= list->level

新节点层级不超过当前跳表的最高层级

这种情况下我们只需要更新新节点对应的层级上的前驱节点即可

|V| > list->level

新节点层级超过当前跳表的最高层级

遍历过程中由于并不经过HEAD所以update中不会有记录，而当新节点的层级超过当前跳表最高层级的时候又需要更新HEAD的前进指针，为了后续操作逻辑统一，单独执行一遍循环赋值，这就是以下代码的含义

Deletion（删除）

算法伪代码

逐步解析

跳表节点的删除也非常的简单，和链表的删除操作相同，只是多了多层级节点的更新的操作

搜索前驱节点

更新所有相关层级（update）的指针（断开与被删除节点的连接）

...
for i := 1 to list->level do
	// 判断是否需要更新，如果前一个节点连接的不是被删除节点，那就不需要更新，可以提前中止循环
	if update[i]->forward[i] != x then break
	update[i]->forward[i] := x->forward[i]
// 显式释放
free(x)
...

从上到下更新HEAD指针，跳表层级更新

...
// 通过另一个循环来更新当前跳表的最高层级
// 职责单一（一个循环更新节点指针，一个循环更新层级数），代码可读性更好
while list->level > 1 and
	list->header->forward[list->level] = NIL do
	list->level := list->level-1

参考资料

跳表可视化：https://gallery.selfboot.cn/zh/algorithms/skiplist

跳表论文：https://15721.courses.cs.cmu.edu/spring2018/papers/08-oltpindexes1/pugh-skiplists-cacm1990.pdf

跳表 - OI Wiki

漫画：什么是跳跃表？