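Deploying a MongoDB Replica Set

Overview

This article works through the official MongoDB tutorial (https://www.mongodb.com/docs/manual/tutorial/deploy-replica-set/#std-label-server-replica-set-deploy).
It starts three mongod processes on a single machine to simulate a MongoDB replica set.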

Environment

Several mongod processes are started on one machine to simulate a cluster.

Operating system (OS):

$ cat /etc/redhat-release 
CentOS Linux release 7.9.2009 (Core)

The MongoDB environment is built with Docker.

Docker version:

$ docker --version
Docker version 1.13.1, build 7d71120/1.13.1

Mongo image version:

$ docker images | grep mongo
docker.io/mongo 4.2-bionic e301407a044e 6 months ago 388 MB

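Recommendations

Hostnames

The official documentation recommends configuring members by DNS hostname rather than by IP address.

Starting in MongoDB 5.0, a node that is configured with only an IP address fails startup validation and does not start.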

Use hostnames instead of IP addresses to configure clusters across a split network horizon. Starting in MongoDB 5.0, nodes that are only configured with an IP address will fail startup validation and will not start.

Prerequisites

Configuring hostname mappings

Add a hosts entry for the hostname of each of the three mongod processes.

$ vim /etc/hosts

Append the following entries:

127.0.0.1 mongodb0.example.net
127.0.0.1 mongodb1.example.net
127.0.0.1 mongodb2.example.net
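
To confirm that the entries were saved, the hosts file can be checked for each expected name. This is only a sketch: missing_hosts is a hypothetical helper written for this walkthrough, and it inspects the file itself rather than performing a real name lookup.

```shell
#!/usr/bin/env bash
# Print every expected hostname that has no mapping in the given hosts file.
missing_hosts() {
  local hosts_file="$1" h
  shift
  for h in "$@"; do
    # Skip comment lines; field 1 is the IP, every later field is a hostname.
    awk -v h="$h" '
      /^[ \t]*#/ { next }
      { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
      END { exit !found }
    ' "$hosts_file" || echo "$h"
  done
}

# No output means all three names are mapped.
missing_hosts /etc/hosts mongodb0.example.net mongodb1.example.net mongodb2.example.net
```

Any name that is printed still needs a line in /etc/hosts.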

Creating directories and files

Create the data directories
$ mkdir /data/mongod0 /data/mongod1 /data/mongod2
Create the configuration files
Sample configuration file

Adapted from the template /etc/mongod.conf.orig shipped in the image.

# mongod.conf

# for documentation of all options, see:
#   http://docs.mongodb.org/manual/reference/configuration-options/

# Where and how to store data.
storage:
  # data directory
  dbPath: /data/mongod0
#  engine:
#  mmapv1:
#  wiredTiger:

# where to write logging data.
systemLog:
  # write logs to a file
  destination: file
  # append to the log file if it already exists
  logAppend: true
  # log file path
  path: /var/log/mongodb/mongod0.log

# network interfaces
net:
  port: 17000
  bindIp: 0.0.0.0
  # bindIpAll: true would have the same effect (it also covers :: when net.ipv6 is enabled)
  # bindIpAll: true

# how the process runs
processManagement:
  # fork a child process so that the server runs in the background
  fork: true

replication:
  # replica set name; required for a replica set deployment, identifies the set this member belongs to
  replSetName: "rs-example-0"

Starting from this template, change storage.dbPath, systemLog.path, and net.port so that each of the three mongod instances has its own file, for example:

$ ls -lh /etc/mongod
total 12K
-rw-r--r--. 1 root root 707 Sep 17 07:47 mongod0.conf
-rw-r--r--. 1 root root 707 Sep 17 07:47 mongod1.conf
-rw-r--r--. 1 root root 707 Sep 17 07:47 mongod2.conf
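
The three files can also be generated in one pass instead of being edited by hand. The following is a minimal sketch that assumes the settings of the sample file above; the CONF_DIR variable and its default ./mongod-conf are hypothetical names chosen for illustration (this walkthrough keeps its files in /etc/mongod).

```shell
#!/usr/bin/env bash
# CONF_DIR is an illustrative name; point it at /etc/mongod to match the article.
CONF_DIR="${CONF_DIR:-./mongod-conf}"
mkdir -p "$CONF_DIR"

for i in 0 1 2; do
  # Only dbPath, the log path and the port depend on the instance number.
  cat > "$CONF_DIR/mongod$i.conf" <<EOF
storage:
  dbPath: /data/mongod$i
systemLog:
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod$i.log
net:
  port: 1700$i
  bindIp: 0.0.0.0
processManagement:
  fork: true
replication:
  replSetName: "rs-example-0"
EOF
done

ls "$CONF_DIR"
```

Generating the files from one loop keeps the replica set name identical across all three, which matters because a mismatch is rejected at initiation time.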

Steps

Start the mongod processes one by one

# Pass the configuration file with --config or -f
$ mongod -f /etc/mongod/mongod0.conf
about to fork child process, waiting until server is ready for connections.
forked process: 643
child process started successfully, parent exiting

$ mongod -f /etc/mongod/mongod1.conf
about to fork child process, waiting until server is ready for connections.
forked process: 692
child process started successfully, parent exiting

$ mongod -f /etc/mongod/mongod2.conf
about to fork child process, waiting until server is ready for connections.
forked process: 730
child process started successfully, parent exiting

$ ps -ef | grep mongod
root 643 0 2 07:42 ? 00:00:12 mongod -f mongod0.conf
root 692 0 2 07:49 ? 00:00:00 mongod -f mongod1.conf
root 730 0 8 07:49 ? 00:00:01 mongod -f mongod2.conf
root 769 6 0 07:50 ? 00:00:00 grep mongod

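All three mongod processes started successfully.

Connect to one of the mongod instances

Connect mongosh to one of the mongod instances.

The official tutorial connects with mongosh; here the legacy mongo shell binary is used instead. Specify the host and port explicitly, because the shell otherwise connects to the default port 27017.

Connect to the mongod0 instance with the mongo command: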

$ mongo --host mongodb0.example.net --port 17000
MongoDB shell version v4.2.24
connecting to: mongodb://127.0.0.1:17000/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("f81fbaff-63d9-45fa-8043-8c5c6cf97105") }
MongoDB server version: 4.2.24
...

The connection succeeds.

Initiate the replica set

Initiate the replica set.

rs.initiate() only needs to be run on one of the members.

> rs.initiate({"_id": "rs-example-0", "members": [{"_id": 0, "host": "mongodb0.example.net:17000"}]})
{ "ok" : 1 }
rs-example-0:OTHER>

rs-example-0:PRIMARY>

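The command above initiates a replica set named rs-example-0 with mongod0 as its only member. Right after it succeeds the prompt shows the role OTHER; pressing Enter again shows PRIMARY.

Then add the remaining instances to the replica set one at a time with rs.add():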

rs-example-0:PRIMARY> rs.add({"_id": 1, "host": "mongodb1.example.net:17001"})
{
"ok" : 1,
"$clusterTime" : {
"clusterTime" : Timestamp(1694939436, 1),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
},
"operationTime" : Timestamp(1694939436, 1)
}

rs-example-0:PRIMARY> rs.add({"_id": 2, "host": "mongodb2.example.net:17002"})
{
"ok" : 1,
"$clusterTime" : {
"clusterTime" : Timestamp(1694939337, 1),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
},
"operationTime" : Timestamp(1694939337, 1)
}

Alternatively, pass every member to rs.initiate() and complete the initialization in a single step.
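
A sketch of that one-step variant, run from the host shell against a set that has not been initiated yet. It assumes the hostnames and ports configured earlier; INIT_JS is only an illustrative variable name, and the call is skipped when the mongo binary is not on the PATH.

```shell
#!/usr/bin/env bash
# The rs.initiate() call listing all three members at once.
INIT_JS=$(cat <<'EOF'
printjson(rs.initiate({
  _id: "rs-example-0",
  members: [
    { _id: 0, host: "mongodb0.example.net:17000" },
    { _id: 1, host: "mongodb1.example.net:17001" },
    { _id: 2, host: "mongodb2.example.net:17002" }
  ]
}))
EOF
)

if command -v mongo >/dev/null 2>&1; then
  # Send the call to one member only.
  mongo --host mongodb0.example.net --port 17000 --quiet --eval "$INIT_JS" \
    || echo "rs.initiate() failed; check that all three mongod processes are running" >&2
else
  echo "mongo shell not found; the call would be:"
  echo "$INIT_JS"
fi
```

The _id value has to match replSetName in the configuration files, and each host entry has to be reachable from the other members.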

Common problems
rs.initiate() without arguments fails because the hostname is misconfigured
> rs.initiate()
{
"ok" : 0,
"errmsg" : "No host described in new configuration 1 for replica set rs-example-0 maps to this node",
"code" : 93,
"codeName" : "InvalidReplicaSetConfig"
}

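Explanation: when rs.initiate() is called without arguments, MongoDB builds a default configuration from the machine's own hostname. If that hostname cannot be resolved to this node, no member in the generated configuration maps to it, which produces the error above.

Check the machine's hostname: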

# current value
$ hostname
Crayon

# change it
$ hostname mongodb0.example.net

# verify the change
$ hostname
mongodb0.example.net

Run rs.initiate() again:

> rs.initiate()
{
"info2" : "no configuration specified. Using a default configuration for the set",
"me" : "mongodb0.example.net:17000",
"ok" : 1
}
Missing members field
> rs.initiate({_id: "rs"})
{
"ok" : 0,
"errmsg" : "Missing expected field \"members\"",
"code" : 93,
"codeName" : "InvalidReplicaSetConfig"
}

Adding the members field fixes this.

_id does not match the replica set name in the configuration file
> rs.initiate({_id: "rs", "members": []})
{
"ok" : 0,
"errmsg" : "Attempting to initiate a replica set with name rs, but command line reports rs-example-0; rejecting",
"code" : 93,
"codeName" : "InvalidReplicaSetConfig"
}
Too few replica set members
> rs.initiate({"_id": "rs-example-0", "members": []})
{
"ok" : 0,
"errmsg" : "Replica set configuration contains 0 members, but must have at least 1 and no more than 50",
"code" : 93,
"codeName" : "InvalidReplicaSetConfig"
}

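A replica set must have at least 1 and at most 50 members.

View the replica set configuration

View the replica set configuration.

rs.conf() shows the configuration of the replica set that the current member belongs to.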

rs-example-0:PRIMARY> rs.conf()
{
"_id" : "rs-example-0",
"version" : 5,
"protocolVersion" : NumberLong(1),
"writeConcernMajorityJournalDefault" : true,
"members" : [
{
"_id" : 0,
"host" : "mongodb0.example.net:17000",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {

},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
{
"_id" : 2,
"host" : "mongodb2.example.net:17002",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {

},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
{
"_id" : 1,
"host" : "mongodb1.example.net:17001",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {

},
"slaveDelay" : NumberLong(0),
"votes" : 1
}
],
"settings" : {
"chainingAllowed" : true,
"heartbeatIntervalMillis" : 2000,
"heartbeatTimeoutSecs" : 10,
"electionTimeoutMillis" : 10000,
"catchUpTimeoutMillis" : -1,
"catchUpTakeoverDelayMillis" : 30000,
"getLastErrorModes" : {

},
"getLastErrorDefaults" : {
"w" : 1,
"wtimeout" : 0
},
"replicaSetId" : ObjectId("6506b39ed951f6176dfb632d")
}
}
View the replica set status

rs.status() shows the current status of the replica set.

rs-example-0:SECONDARY> rs.status()
{
"set" : "rs-example-0",
"date" : ISODate("2023-09-17T08:49:41.493Z"),
"myState" : 2,
"term" : NumberLong(1),
"syncingTo" : "mongodb0.example.net:17000",
"syncSourceHost" : "mongodb0.example.net:17000",
"syncSourceId" : 0,
"heartbeatIntervalMillis" : NumberLong(2000),
"majorityVoteCount" : 2,
"writeMajorityCount" : 2,
"optimes" : {
"lastCommittedOpTime" : {
"ts" : Timestamp(1694940575, 1),
"t" : NumberLong(1)
},
"lastCommittedWallTime" : ISODate("2023-09-17T08:49:35.387Z"),
"readConcernMajorityOpTime" : {
"ts" : Timestamp(1694940575, 1),
"t" : NumberLong(1)
},
"readConcernMajorityWallTime" : ISODate("2023-09-17T08:49:35.387Z"),
"appliedOpTime" : {
"ts" : Timestamp(1694940575, 1),
"t" : NumberLong(1)
},
"durableOpTime" : {
"ts" : Timestamp(1694940575, 1),
"t" : NumberLong(1)
},
"lastAppliedWallTime" : ISODate("2023-09-17T08:49:35.387Z"),
"lastDurableWallTime" : ISODate("2023-09-17T08:49:35.387Z")
},
"lastStableRecoveryTimestamp" : Timestamp(1694940515, 1),
"lastStableCheckpointTimestamp" : Timestamp(1694940515, 1),
"members" : [
{
"_id" : 0,
"name" : "mongodb0.example.net:17000",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 1144,
"optime" : {
"ts" : Timestamp(1694940575, 1),
"t" : NumberLong(1)
},
"optimeDurable" : {
"ts" : Timestamp(1694940575, 1),
"t" : NumberLong(1)
},
"optimeDate" : ISODate("2023-09-17T08:49:35Z"),
"optimeDurableDate" : ISODate("2023-09-17T08:49:35Z"),
"lastHeartbeat" : ISODate("2023-09-17T08:49:41.167Z"),
"lastHeartbeatRecv" : ISODate("2023-09-17T08:49:40.819Z"),
"pingMs" : NumberLong(0),
"lastHeartbeatMessage" : "",
"syncingTo" : "",
"syncSourceHost" : "",
"syncSourceId" : -1,
"infoMessage" : "",
"electionTime" : Timestamp(1694938014, 2),
"electionDate" : ISODate("2023-09-17T08:06:54Z"),
"configVersion" : 5
},
{
"_id" : 1,
"name" : "mongodb1.example.net:17001",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 1178,
"optime" : {
"ts" : Timestamp(1694940575, 1),
"t" : NumberLong(1)
},
"optimeDate" : ISODate("2023-09-17T08:49:35Z"),
"syncingTo" : "mongodb0.example.net:17000",
"syncSourceHost" : "mongodb0.example.net:17000",
"syncSourceId" : 0,
"infoMessage" : "",
"configVersion" : 5,
"self" : true,
"lastHeartbeatMessage" : ""
},
{
"_id" : 2,
"name" : "mongodb2.example.net:17002",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 1144,
"optime" : {
"ts" : Timestamp(1694940575, 1),
"t" : NumberLong(1)
},
"optimeDurable" : {
"ts" : Timestamp(1694940575, 1),
"t" : NumberLong(1)
},
"optimeDate" : ISODate("2023-09-17T08:49:35Z"),
"optimeDurableDate" : ISODate("2023-09-17T08:49:35Z"),
"lastHeartbeat" : ISODate("2023-09-17T08:49:41.167Z"),
"lastHeartbeatRecv" : ISODate("2023-09-17T08:49:41.170Z"),
"pingMs" : NumberLong(0),
"lastHeartbeatMessage" : "",
"syncingTo" : "mongodb0.example.net:17000",
"syncSourceHost" : "mongodb0.example.net:17000",
"syncSourceId" : 0,
"infoMessage" : "",
"configVersion" : 5
}
],
"ok" : 1,
"$clusterTime" : {
"clusterTime" : Timestamp(1694940575, 1),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
},
"operationTime" : Timestamp(1694940575, 1)
}
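
The full output is long. When only the role of each member matters, the name and stateStr fields can be picked out of rs.status(). A sketch, assuming the first instance is reachable on the host and port used above; STATUS_JS is an illustrative variable name.

```shell
#!/usr/bin/env bash
# Print one line per member: host:port, a tab, then the state string.
STATUS_JS='rs.status().members.forEach(function (m) { print(m.name + "\t" + m.stateStr); })'

if command -v mongo >/dev/null 2>&1; then
  mongo --host mongodb0.example.net --port 17000 --quiet --eval "$STATUS_JS" \
    || echo "could not read the replica set status" >&2
else
  echo "mongo shell not found; the script would be: $STATUS_JS"
fi
```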

Shutting down the mongod processes

With the mongo shell

After connecting to mongod with the mongo shell, run:

// switch to the admin database
rs-example-0:SECONDARY> use admin
rs-example-0:SECONDARY> db.shutdownServer()

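db.shutdownServer() has to be run against the admin database, so switch to it first.

Once the command completes, the mongo shell loses its connection to mongod.

The shell then logs errors such as the following: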

2023-12-23T14:53:57.358+0000 I  NETWORK  [js] DBClientConnection failed to receive message from mongodb2.example.net:17002 - HostUnreachable: Connection closed by peer
server should be down...
2023-12-23T14:53:57.360+0000 I NETWORK [js] trying reconnect to mongodb2.example.net:17002 failed
2023-12-23T14:53:57.360+0000 I NETWORK [js] reconnect mongodb2.example.net:17002 failed failed
> exit
bye
2023-12-23T14:53:58.363+0000 I NETWORK [js] trying reconnect to mongodb2.example.net:17002 failed
2023-12-23T14:53:58.363+0000 I NETWORK [js] reconnect mongodb2.example.net:17002 failed failed
2023-12-23T14:53:58.363+0000 I QUERY [js] Failed to end session { id: UUID("2135374c-738d-4033-be37-87d9292704b4") } due to SocketException: socket exception [CONNECT_ERROR] server [couldn't connect to server mongodb2.example.net:17002, connection attempt failed: SocketException: Error connecting to mongodb2.example.net:17002 (127.0.0.1:17002) :: caused by :: Connection refused]
With the mongod command

Alternatively, use the mongod command directly:

$ mongod -f [config] --shutdown

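Pass the configuration file with -f or --config (the two are equivalent),

or pass the data directory with --dbpath (with -f/--config, the data directory is simply read from the dbPath setting in the file).

Either way the mechanism is the same: mongod reads the PID of the running instance from the mongod.lock file under dbPath and sends that process a signal to exit.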

$ cat /data/mongod0/mongod.lock
276
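
The same mechanism can be reproduced by hand. A minimal sketch, assuming the signal is SIGTERM (one of the signals MongoDB documents for a clean shutdown); stop_by_lock is a hypothetical helper name, not part of MongoDB.

```shell
#!/usr/bin/env bash
# Send SIGTERM to the process whose PID is recorded in a mongod.lock file.
stop_by_lock() {
  local lock_file="$1" pid
  # An empty or missing lock file means no running instance is recorded.
  if [ ! -s "$lock_file" ]; then
    echo "no PID recorded in $lock_file" >&2
    return 1
  fi
  pid="$(tr -d '[:space:]' < "$lock_file")"
  # Refuse anything that is not a plain number before calling kill.
  case "$pid" in
    ''|*[!0-9]*) echo "unexpected content in $lock_file" >&2; return 1 ;;
  esac
  kill -TERM "$pid"
}
```

For the first instance in this walkthrough the call would be stop_by_lock /data/mongod0/mongod.lock. SIGKILL (kill -9) should not be used here, because it does not give mongod the chance to shut down cleanly.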