Redis Sorted Set

Redis 裡面有一種 data type 叫做 sorted set
簡單來說就是一堆 key:value，但這些資料會依照 value (score) 來排序
讓使用者可以快速取得某個 key 的排名，或是一個排名範圍(或是分數範圍)內的 key 們，等等的功能
詳細可以參考這個～
https://www.tutorialspoint.com/redis/redis_sorted_sets.htm

會寫這篇是因為看到這東西的實作比較特殊，想找找看這實作背後的原因

Redis sorted set 內部結構

當資料量小時，sorted set 使用的是 ziplist
超過一定量時則是使用 hash map + skip list

ziplist 以後有空再寫 http://zhangtielei.com/posts/blog-redis-ziplist.html

skip list 怎麼運作的也不特別講，維基百科或 google 看一下很好查到（https://en.wikipedia.org/wiki/Skip_list）

為什麼 Redis sorted set 使用 skip list

今天主要想看的是第二種實作的方法，一開始我看到這個實作，想說一般資料庫 range query 不都是靠 index 嗎(B+Tree、BTree)？
Redis 怎麼搞特殊跑來用 skip list XD
想要了解一下他這個選擇背後的想法

搜了搜看到這篇文 https://news.ycombinator.com/item?id=1171423

作者給了三個理由
以下會拿 BTree 和 B+Tree 來比較一下，對這三個理由研究一下，不然有些乍看之下還滿難懂的

They are not very memory intensive. It’s up to you basically. Changing parameters about the probability of a node to have a given number of levels will make then less memory intensive than btrees.

第一個理由是關於記憶體用量的，skip list 相比 balance tree 來說記憶體用量是可以比較小的，印象中如果 p 用 1/4，平均是每個 node 會有 4/3 左右個 pointer 而已。
因為 Redis 是 in-memory 運作，節省記憶體這點可能還滿重要的，畢竟 memory 貴貴。

Btree 如果是每個 node 只包含一筆資料，那麼就是會有 2 個 pointer。
如果單一 node 有多筆資料，pointer 數量會變少沒錯，但問題是同一 node 的資料必須緊密排在一起，常常會有預留的空間，會浪費很多記憶體。
如果用的是 B+Tree 的話，只有 leaf nodes 有存資料，那 internal nodes 記憶體也是花很兇。
詳情請參考 https://en.wikipedia.org/wiki/B-tree

A sorted set is often target of many ZRANGE or ZREVRANGE operations, that is, traversing the skip list as a linked list. With this operation the cache locality of skip lists is at least as good as with other kind of balanced trees.

而這句話翻譯下來就是：使用者大多是為了 range query 來用 sorted set 的，range query 時 skip list 的 cache locality 至少會跟其他平衡術一樣好 (不得不說這一句越看越不懂。

首先 skip list 就跟 linked list 一樣，在 cache locality 上是比 array 差的，因為資料不連續，cpu 讀資料時常常會把附近的資料一起讀進 cache，而 linked list 資料散佈在記憶體各處，cpu 比較無法把相關資料一次塞到 cache 上。

如果是一個 node 只有一筆資料的話，BTree 的 cache locality 確實與 linked list 類似。
不過 BTree 和 B+Tree 如果單一 node 有多筆資料的話 cache locality 應該會比較好才對（？
有人知道請告訴我 XD
以我的知識看起來在 cache locality 表現上應該是 BTrees 家族 >= skip list
Btree 家族會被廣泛用在 database index 上也是因為這個資料聚集的特性，使得在硬碟 io 次數可以減少很多。

(不過又有在網路上查到一些 cache friendly 的 skip list，搞得我很亂)

They are simpler to implement, debug, and so forth. For instance thanks to the skip list simplicity I received a patch (already in Redis master) with augmented skip lists implementing ZRANK in O(log(N)). It required little changes to the code.

這個就比較好理解了
BTree 在 Treverse，可能會需要 stack 或是 parent pointer 的幫忙，要上上下下的跑，而 B+Tree 的話則是 Traverse 很方便，但跟 BTree 一樣在插入刪除修改時都可能要 split merge 之類的
而 skip list 的操作就是一個 linked list 操作，幾行就結束了。

Summary

目前看下來是第一個與第三個理由比較能說服我，第二個理由則是自己想覺得不太合理，網路上也沒什麼可以支持這個說法的相關資訊，作者也沒提供 benchmark，所以暫時保留。

不過我們可以從這些理由看出作者在選用 skip list 時是怎麼去思考的。
Redis 和其他 database 最大的差別是它是 in-memory 的，而其他用 BTree 家族的資料庫大多是存在硬碟上，考量的點就會不同。
如果 skip list 在 range query 時每個 node 都要去硬碟上撈一次應該會慢到天上去，但在記憶體內就差距不會太大。
如果 BTree 直接用在 memory 內又會消耗太多空間（前面提過的浪費與 pointer 數量），但用在硬碟上就比較沒問題，因為硬碟相對大又便宜。
總之就是一個 trade off。

Redis Sorted Set

Redis sorted set 內部結構

為什麼 Redis sorted set 使用 skip list

Summary

See Also