MongoDB Partial and Sparse Index

A few days ago I added a unique index to a collection that already had data in it, and it crashed the server on startup. Writing it down for the record.

db.coll.insert({"a":1})
db.coll.insert({"a":2})
db.coll.createIndex({"b":1}, {unique: true}) // duplicate key error

I was building a unique index on a field that none of the existing documents had, and it threw the error below.

{
	"operationTime" : Timestamp(1609329527, 19822385),
	"ok" : 0,
	"errmsg" : "E11000 duplicate key error collection: test.coll index: b_1 dup key: { b: null }",
	"code" : 11000,
	"codeName" : "DuplicateKey",
	"keyPattern" : {
		"b" : 1
	},
	"keyValue" : {
		"b" : null
	},
	"$clusterTime" : {
		"clusterTime" : Timestamp(1609329527, 19822385),
		"signature" : {
			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
			"keyId" : NumberLong(0)
		}
	}
}

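The errmsg is basically saying that on the b_1 index (the name MongoDB picks automatically when you index b), there was a duplicate key for b = null. Neither of the two documents inserted above has a b field, and when building the index MongoDB treats the missing field as null, so the two documents collide.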
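To make the collision concrete, here is a small toy model in plain JavaScript. It is only a sketch of the behavior described above, not how MongoDB is actually implemented; the helper names `indexKey` and `hasDuplicateKey` are made up for illustration.

```javascript
// Toy model only: this is NOT MongoDB's implementation.
// A document that lacks the indexed field is keyed as if the value were null.
function indexKey(doc, field) {
  return Object.prototype.hasOwnProperty.call(doc, field) ? doc[field] : null;
}

// Returns true if any two documents end up with the same index key.
function hasDuplicateKey(docs, field) {
  const seen = new Set();
  for (const doc of docs) {
    const key = JSON.stringify(indexKey(doc, field));
    if (seen.has(key)) return true;
    seen.add(key);
  }
  return false;
}

const existingDocs = [{ a: 1 }, { a: 2 }];
console.log(hasDuplicateKey(existingDocs, "b")); // true: both documents are keyed as null
console.log(hasDuplicateKey(existingDocs, "a")); // false: 1 and 2 are distinct
```

With the two documents from the example, indexing on b gives two null keys, which is exactly the `dup key: { b: null }` in the error.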

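The more conventional fix here is to run a schema migration that backfills the field on all the existing documents.
But what if the new key was never meant to apply to the old data in the first place?
There are two ways to handle that.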

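The first is the older approach. Sparse indexes still work, but the official docs recommend preferring the second approach on MongoDB 3.2 and later.
sparse: true makes MongoDB skip documents that don't have the field when it builds the index (instead of treating the field as null like above), so there is no duplicate key error. Note that documents where b is explicitly set to null are still indexed.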

db.coll.createIndex({"b":1}, {unique: true, sparse: true})
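As a rough illustration of what the sparse option changes, here is a toy model in plain JavaScript. It is a sketch under the assumption that "sparse" simply means "documents without the field get no index entry"; `sparseUniqueViolation` is a hypothetical helper, not a MongoDB API.

```javascript
// Toy model only: this is NOT MongoDB's implementation.
// With sparse: true, a document without the field contributes no index entry.
function sparseUniqueViolation(docs, field) {
  const seen = new Set();
  for (const doc of docs) {
    // Skipped entirely, so it can never collide with anything.
    if (!Object.prototype.hasOwnProperty.call(doc, field)) continue;
    const key = JSON.stringify(doc[field]);
    if (seen.has(key)) return true;
    seen.add(key);
  }
  return false;
}

console.log(sparseUniqueViolation([{ a: 1 }, { a: 2 }], "b"));             // false
console.log(sparseUniqueViolation([{ a: 1, b: 7 }, { a: 2, b: 7 }], "b")); // true
console.log(sparseUniqueViolation([{ b: null }, { b: null }], "b"));       // true: explicit nulls are still indexed
```

The old documents without b no longer get in the way, while real duplicates on b are still rejected.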

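The second is to use a partial index.
A partial index only indexes the documents that match a given filter. The example below filters with $exists: true, so the unique index on b is built only over documents that actually contain a b field.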

db.coll.createIndex(
    {"b":1},
    {
        unique: true,
        partialFilterExpression:{
            "b": {$exists: true}
        }
    } 
) 

partialFilterExpression accepts other conditions too.
For example, "b": {$gt: 1} selects only the documents whose b field is greater than 1 and builds the index over just those.
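The effect of the filter can be sketched with another toy model in plain JavaScript. This is only an illustration, not MongoDB's implementation: `matchesFilter` and `partialUniqueViolation` are hypothetical helpers, and the matcher only understands $exists and $gt.

```javascript
// Toy model only: this is NOT MongoDB's implementation.
// The matcher supports just $exists and $gt, enough for the examples here.
function matchesFilter(doc, filter) {
  return Object.entries(filter).every(([field, cond]) => {
    const present = Object.prototype.hasOwnProperty.call(doc, field);
    if ("$exists" in cond && cond.$exists !== present) return false;
    if ("$gt" in cond && !(present && doc[field] > cond.$gt)) return false;
    return true;
  });
}

// True if two documents that pass the filter share the same value for `field`.
function partialUniqueViolation(docs, field, filter) {
  const seen = new Set();
  for (const doc of docs.filter((d) => matchesFilter(d, filter))) {
    const key = JSON.stringify(doc[field] ?? null);
    if (seen.has(key)) return true;
    seen.add(key);
  }
  return false;
}

const sampleDocs = [{ a: 1 }, { a: 2 }, { a: 3, b: 1 }, { a: 4, b: 1 }];
console.log(partialUniqueViolation(sampleDocs, "b", { b: { $exists: true } })); // true: the two b: 1 documents collide
console.log(partialUniqueViolation(sampleDocs, "b", { b: { $gt: 1 } }));        // false: no document has b > 1, so nothing is indexed
```

The same data passes or fails the uniqueness check depending on the filter, because documents outside the filter are never part of the index.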
That's about it.