



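This article summarizes the common operations on a MongoDB replica set, and along the way explains the principles behind them and the points to watch out for. Together with the earlier article on setting up a MongoDB replica set, it should let you become familiar with deploying and managing MongoDB in a fairly short time.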


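1. Changing node state


    1> Stepping down the Primary to a Secondary

    2> Freezing a Secondary

    3> Forcing a Secondary into maintenance mode

2. Changing the replica set configuration

    1> Adding a node

    2> Removing a node

    3> Configuring a Secondary as a delayed member

    4> Configuring a Secondary as a hidden member

    5> Replacing a current replica set member

    6> Setting member priorities

    7> Preventing a Secondary from becoming Primary

    8> Configuring a non-voting Secondary

    9> Disabling chainingAllowed

   10> Explicitly specifying a replication source for a Secondary

   11> Preventing a Secondary from building indexes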


First, list all the operations the MongoDB replica set helper supports:

> rs.help()
    rs.status()                                { replSetGetStatus : 1 } checks repl set status
    rs.initiate()                              { replSetInitiate : null } initiates set with default settings
    rs.initiate(cfg)                           {replSetInitiate : cfg} initiates set with configuration cfg
    rs.conf()                                  get the current configuration object from local.system.replset
    rs.reconfig(cfg)                           updates the configuration of a running replica set with cfg (disconnects)
    rs.add(hostportstr)                        add a new member to the set with default attributes (disconnects)
    rs.add(membercfgobj)                       add a new member to the set with extra attributes (disconnects)
    rs.addArb(hostportstr)                     add a new member which is arbiterOnly:true (disconnects)
    rs.stepDown([stepdownSecs, catchUpSecs])   step down as primary (disconnects)
    rs.syncFrom(hostportstr)                   make a secondary sync from the given member
    rs.freeze(secs)                            make a node ineligible to become primary for the time specified
    rs.remove(hostportstr)                     remove a host from the replica set (disconnects)
    rs.slaveOk()                               allow queries on secondary nodes

    rs.printReplicationInfo()                  check oplog size and time range
    rs.printSlaveReplicationInfo()             check replica set members and replication lag
    db.isMaster()                              check who is primary

    reconfiguration helpers disconnect from the database so the shell will display
    an error, even if the command succeeds. 


Stepping down the Primary to a Secondary

myapp:PRIMARY> rs.stepDown()

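This command steps the Primary down to a Secondary and keeps it there for 60 seconds. If no new Primary has been elected within that time, the node may stand for election again.

The step-down period can also be given explicitly, in seconds: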


myapp:PRIMARY> rs.stepDown(30)

After the command runs, the former Secondary node3:27017 is promoted to Primary. Its log:


2017-05-03T22:24:21.009+0800 I COMMAND  [conn8] Attempting to step down in response to replSetStepDown command
2017-05-03T22:24:25.967+0800 I -        [conn8] end connection (3 connections now open)
2017-05-03T22:24:37.643+0800 I REPL    [ReplicationExecutor] Member node3:27018 is now in state SECONDARY
2017-05-03T22:24:41.123+0800 I REPL    [replication-40] Restarting oplog query due to error: InterruptedDueToReplStateChange: operation was interrupted. Last fetched optime (with hash): {ts: Timestamp 1493821475000|1, t: 2 }[-6379771952742605801]. Restarts remaining: 3
2017-05-03T22:24:41.167+0800 I REPL    [replication-40] Scheduled new oplog query Fetcher source: node3:27018 database: local query: {find: oplog.rs, filter: {ts: { $gte: Timestamp 1493821475000|1 } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 60000, term: 2 } query metadata: {$replData: 1, $ssm: {$secondaryOk: true } } active: 1 timeout: 10000ms shutting down?: 0 first: 1 firstCommandScheduler: RemoteCommandRetryScheduler request: RemoteCommand 11695 -- target:node3:27018 db:local cmd:{find: oplog.rs, filter: {ts: { $gte: Timestamp 1493821475000|1 } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 60000, term: 2 } active: 1 callbackHandle.valid: 1 callbackHandle.cancelled: 0 attempt: 1 retryPolicy: RetryPolicyImpl maxAttempts: 1 maxTimeMillis: -1ms
2017-05-03T22:24:41.265+0800 I REPL    [replication-39] Choosing new sync source because our current sync source, node3:27018, has an OpTime ({ts: Timestamp 1493821475000|1, t: 2 }) which is not ahead of ours ({ts: Timestamp 1493821475000|1, t: 2 }), it does not have a sync source, and it's not the primary (sync source does not know the primary)
2017-05-03T22:24:41.266+0800 I REPL    [replication-39] Canceling oplog query because we have to choose a new sync source. Current source: node3:27018, OpTime {ts: Timestamp 0|0, t: -1 }, its sync source index:-1
2017-05-03T22:24:41.266+0800 W REPL    [rsBackgroundSync] Fetcher stopped querying remote oplog with error: InvalidSyncSource: sync source node3:27018 (last visible optime: { ts: Timestamp 0|0, t: -1 }; config version: 1; sync source index: -1; primary index: -1) is no longer valid
2017-05-03T22:24:41.266+0800 I REPL    [rsBackgroundSync] could not find member to sync from
2017-05-03T22:24:46.021+0800 I REPL    [SyncSourceFeedback] SyncSourceFeedback error sending update to node3:27018: InvalidSyncSource: Sync source was cleared. Was node3:27018
2017-05-03T22:24:46.775+0800 I REPL    [ReplicationExecutor] Starting an election, since we've seen no PRIMARY in the past 10000ms
2017-05-03T22:24:46.775+0800 I REPL    [ReplicationExecutor] conducting a dry run election to see if we could be elected
2017-05-03T22:24:46.857+0800 I REPL    [ReplicationExecutor] VoteRequester(term 2 dry run) received a yes vote from node3:27019; response message: {term: 2, voteGranted: true, reason: "", ok: 1.0 }
2017-05-03T22:24:46.858+0800 I REPL    [ReplicationExecutor] dry election run succeeded, running for election
2017-05-03T22:24:46.891+0800 I REPL    [ReplicationExecutor] VoteRequester(term 3) received a yes vote from node3:27018; response message: {term: 3, voteGranted: true, reason: "", ok: 1.0 }
2017-05-03T22:24:46.891+0800 I REPL    [ReplicationExecutor] election succeeded, assuming primary role in term 3
2017-05-03T22:24:46.891+0800 I REPL    [ReplicationExecutor] transition to PRIMARY
2017-05-03T22:24:46.892+0800 I ASIO    [NetworkInterfaceASIO-Replication-0] Connecting to node3:27019
2017-05-03T22:24:46.894+0800 I ASIO    [NetworkInterfaceASIO-Replication-0] Connecting to node3:27019
2017-05-03T22:24:46.894+0800 I ASIO    [NetworkInterfaceASIO-Replication-0] Successfully connected to node3:27019
2017-05-03T22:24:46.895+0800 I REPL    [ReplicationExecutor] My optime is most up-to-date, skipping catchup and completing transition to primary.
2017-05-03T22:24:46.895+0800 I ASIO    [NetworkInterfaceASIO-Replication-0] Successfully connected to node3:27019
2017-05-03T22:24:47.348+0800 I REPL    [rsSync] transition to primary complete; database writes are now permitted
2017-05-03T22:24:49.231+0800 I NETWORK  [thread1] connection accepted from #9 (3 connections now open)
2017-05-03T22:24:49.236+0800 I NETWORK  [conn9] received client metadata from conn9: {driver: { name: NetworkInterfaceASIO-RS, version:3.4.2 }, os: {type:Linux, name:Red Hat Enterprise Linux Server release 6.7 (Santiago), architecture:x86_64, version:Kernel 2.6.32-573.el6.x86_64 } }
2017-05-03T22:24:49.317+0800 I NETWORK  [thread1] connection accepted from #10 (4 connections now open)
2017-05-03T22:24:49.318+0800 I NETWORK  [conn10] received client metadata from conn10: {driver: { name: NetworkInterfaceASIO-RS, version:3.4.2 }, os: {type:Linux, name:Red Hat Enterprise Linux Server release 6.7 (Santiago), architecture:x86_64, version:Kernel 2.6.32-573.el6.x86_64 } }

The former Primary, node3:27018, is demoted to Secondary. Its log:

2017-05-03T22:24:36.262+0800 I COMMAND  [conn7] Attempting to step down in response to replSetStepDown command
2017-05-03T22:24:36.303+0800 I REPL    [conn7] transition to SECONDARY
2017-05-03T22:24:36.315+0800 I NETWORK  [conn7] legacy transport layer closing all connections
2017-05-03T22:24:36.316+0800 I NETWORK  [conn7] Skip closing connection for connection # 5
2017-05-03T22:24:36.316+0800 I NETWORK  [conn7] Skip closing connection for connection # 4
2017-05-03T22:24:36.316+0800 I NETWORK  [conn7] Skip closing connection for connection # 4
2017-05-03T22:24:36.316+0800 I NETWORK  [conn7] Skip closing connection for connection # 3
2017-05-03T22:24:36.316+0800 I NETWORK  [conn7] Skip closing connection for connection # 1
2017-05-03T22:24:36.316+0800 I NETWORK  [conn7] Skip closing connection for connection # 1
2017-05-03T22:24:36.382+0800 I NETWORK  [thread1] connection accepted from #8 (5 connections now open)
2017-05-03T22:24:36.383+0800 I NETWORK  [conn8] received client metadata from conn8: {application: { name: MongoDB Shell }, driver: {name:MongoDB Internal Client, version:3.4.2 }, os: {type:Linux, name:Red Hat Enterprise Linux Server release 6.7 (Santiago), architecture:x86_64, version:Kernel 2.6.32-573.el6.x86_64 } }
2017-05-03T22:24:36.408+0800 I -        [conn7] AssertionException handling request, closing client connection: 172 Operation attempted on a closed transport Session.
2017-05-03T22:24:36.408+0800 I -        [conn7] end connection (6 connections now open)
2017-05-03T22:24:41.262+0800 I COMMAND  [conn5] command local.oplog.rs command: find {find: oplog.rs, filter: {ts: { $gte: Timestamp 1493821475000|1 } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 60000, term: 2 } planSummary: COLLSCAN cursorid:12906944372 keysExamined:0 docsExamined:1 writeConflicts:1 numYields:1 nreturned:1 reslen:392 locks:{Global: { acquireCount: { r: 4 } }, Database: {acquireCount: { r: 2 } }, oplog: {acquireCount: { r: 2 } } } protocol:op_command 100ms
2017-05-03T22:24:48.311+0800 I REPL    [ReplicationExecutor] Member node3:27017 is now in state PRIMARY
2017-05-03T22:24:49.163+0800 I REPL    [rsBackgroundSync] sync source candidate: node3:27017
2017-05-03T22:24:49.164+0800 I ASIO    [NetworkInterfaceASIO-RS-0] Connecting to node3:27017
2017-05-03T22:24:49.236+0800 I ASIO    [NetworkInterfaceASIO-RS-0] Successfully connected to node3:27017
2017-05-03T22:24:49.316+0800 I ASIO    [NetworkInterfaceASIO-RS-0] Connecting to node3:27017
2017-05-03T22:24:49.318+0800 I ASIO    [NetworkInterfaceASIO-RS-0] Successfully connected to node3:27017
2017-05-03T22:25:41.020+0800 I -        [conn4] end connection (5 connections now open)
2017-05-03T22:29:02.653+0800 I ASIO    [NetworkInterfaceASIO-RS-0] Connecting to node3:27017
2017-05-03T22:29:02.669+0800 I ASIO    [NetworkInterfaceASIO-RS-0] Successfully connected to node3:27017
2017-05-03T22:29:41.442+0800 I -        [conn5] end connection (4 connections now open)

Freezing a Secondary

If you need to do maintenance on the Primary but do not want any of the Secondaries to be elected Primary in the meantime, run the freeze command on each Secondary to force it to stay in the Secondary state.

myapp:SECONDARY> rs.freeze(100)

Note: this can only be run on a Secondary.

myapp:PRIMARY> rs.freeze(100)
{
    "ok" : 0,
    "errmsg" : "cannot freeze node when primary or running for election. state: Primary",
    "code" : 95,
    "codeName" : "NotSecondary"
}

To unfreeze a Secondary, run the command again with 0 seconds:

myapp:SECONDARY> rs.freeze(0)

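Forcing a Secondary into maintenance mode

Once a Secondary enters maintenance mode, its state changes to "RECOVERING". Clients do not send read requests to a node in this state, and it cannot be used as a replication source.

Maintenance mode can be entered in two ways:

1. Triggered automatically

    For example, when compaction is run on the Secondary

2. Triggered manually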

myapp:SECONDARY> db.adminCommand({"replSetMaintenance":true})



Adding a node

myapp:PRIMARY> rs.add("node3:27017")
myapp:PRIMARY> rs.add({_id: 3, host: "node3:27017", priority: 0, hidden: true})

A member can also be described by a configuration document:

> cfg = {
    "_id" : 3,
    "host" : "node3:27017",
    "arbiterOnly" : false,
    "buildIndexes" : true,
    "hidden" : true,
    "priority" : 0,
    "tags" : {
    },
    "slaveDelay" : NumberLong(0),
    "votes" : 1
}
> rs.add(cfg)



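Removing a node

myapp:PRIMARY> rs.remove("node3:27017")

A member can also be removed by editing the configuration and applying it with rs.reconfig:

myapp:PRIMARY> cfg = rs.conf()
myapp:PRIMARY> cfg.members.splice(2,1)
myapp:PRIMARY> rs.reconfig(cfg)

Note: running rs.reconfig does not necessarily cause the replica set to hold a new election, and the same is true when the force parameter is used. The official documentation puts it this way: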

The rs.reconfig() shell method can trigger the current primary to step down in some situations.
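
The hard-coded index in splice(2,1) only works if the member to remove really sits at position 2. A minimal sketch of looking the member up by host first is shown below. removeMemberByHost is a hypothetical helper name, not part of the shell, and the sample config is made up; it only assumes an object shaped like the result of rs.conf().

```javascript
// Hypothetical helper (not a MongoDB API): return a copy of a replica set
// config with the member whose host matches removed.
function removeMemberByHost(cfg, host) {
  const idx = cfg.members.findIndex((m) => m.host === host);
  if (idx === -1) {
    throw new Error("no member with host " + host);
  }
  const members = cfg.members.slice();
  members.splice(idx, 1); // same splice as above, with the looked-up index
  return Object.assign({}, cfg, { members: members });
}

// Made-up sample shaped like the output of rs.conf().
const sampleCfg = {
  _id: "myapp",
  version: 1,
  members: [
    { _id: 0, host: "node3:27018" },
    { _id: 1, host: "node3:27019" },
    { _id: 2, host: "node3:27017" },
  ],
};
const trimmedCfg = removeMemberByHost(sampleCfg, "node3:27017");
console.log(trimmedCfg.members.map((m) => m.host));
```

In the shell this would presumably be used as rs.reconfig(removeMemberByHost(rs.conf(), "node3:27017")).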


Configuring a Secondary as a delayed member

cfg = rs.conf()
cfg.members[1].priority = 0
cfg.members[1].hidden = true
cfg.members[1].slaveDelay = 3600
rs.reconfig(cfg)
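
The three assignments belong together: a delayed member is given priority 0 so that it can never become Primary, and is hidden so that applications do not read stale data from it. A small sketch that bundles them follows. makeDelayed is a hypothetical helper name, and slaveDelay is the field name used in this article's MongoDB 3.4; treat it as an assumption for other versions.

```javascript
// Hypothetical helper: mark member `index` of a config object as a delayed
// member, setting the same three fields as the snippet above.
function makeDelayed(cfg, index, delaySecs) {
  if (!Number.isInteger(delaySecs) || delaySecs < 0) {
    throw new Error("delaySecs must be a non-negative integer");
  }
  const member = cfg.members[index];
  if (!member) {
    throw new Error("no member at index " + index);
  }
  member.priority = 0; // can never be elected Primary
  member.hidden = true; // invisible to client applications
  member.slaveDelay = delaySecs; // replication delay in seconds
  return cfg;
}

// Made-up two-member config for illustration.
const delayedCfg = makeDelayed(
  { members: [{ _id: 0, host: "node3:27017" }, { _id: 1, host: "node3:27018" }] },
  1,
  3600
);
console.log(JSON.stringify(delayedCfg.members[1]));
```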

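Configuring a Secondary as a hidden member

cfg = rs.conf()
cfg.members[0].priority = 0
cfg.members[0].hidden = true
rs.reconfig(cfg)

Replacing a current replica set member

cfg = rs.conf()
cfg.members[0].host = "mongo2.example.net"
rs.reconfig(cfg)

Setting member priorities

cfg = rs.conf()
cfg.members[0].priority = 0.5
cfg.members[1].priority = 2
cfg.members[2].priority = 2
rs.reconfig(cfg)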

Valid priority values range from 0 to 1000 and may be fractional; the default is 1.

Starting with MongoDB 3.2:

Non-voting members must have priority of 0.
Members with priority greater than 0 cannot have 0 votes.

Note: if a Secondary's priority is set higher than that of the current Primary, the current Primary will step down.

Preventing a Secondary from becoming Primary

Simply set its priority to 0:

cfg = rs.conf()
cfg.members[2].priority = 0
rs.reconfig(cfg)

Configuring a non-voting Secondary

MongoDB limits a replica set to at most 50 members, of which at most 7 may have a vote.

Starting with MongoDB 3.2, no member with a priority greater than 0 may have votes set to 0.

So for a non-voting Secondary, votes and priority must both be set to 0:

cfg = rs.conf()
cfg.members[3].votes = 0
cfg.members[3].priority = 0
cfg.members[4].votes = 0
cfg.members[4].priority = 0
rs.reconfig(cfg)
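
The limits and the votes/priority rule described above can be checked on a config object before it is handed to rs.reconfig. The following is a rough sketch only: validateVoting is a hypothetical helper, the sample hosts are made up, and members without explicit fields are assumed to take the defaults votes = 1 and priority = 1.

```javascript
// Hypothetical pre-flight check for the rules described above. Returns a
// list of human-readable problems; an empty list means none were found.
function validateVoting(cfg) {
  const problems = [];
  const votesOf = (m) => (m.votes === undefined ? 1 : m.votes);
  const priorityOf = (m) => (m.priority === undefined ? 1 : m.priority);

  if (cfg.members.length > 50) {
    problems.push("more than 50 members");
  }
  const voters = cfg.members.filter((m) => votesOf(m) > 0).length;
  if (voters > 7) {
    problems.push("more than 7 voting members");
  }
  for (const m of cfg.members) {
    if (votesOf(m) === 0 && priorityOf(m) > 0) {
      problems.push(m.host + ": votes is 0 but priority is greater than 0");
    }
  }
  return problems;
}

// Made-up sample: the third member breaks the votes/priority rule.
const votingSample = {
  members: [
    { _id: 0, host: "a:27017" },
    { _id: 1, host: "b:27017", votes: 0, priority: 0 },
    { _id: 2, host: "c:27017", votes: 0, priority: 1 },
  ],
};
console.log(validateVoting(votingSample));
```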

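Disabling chainingAllowed

By default, chained replication is allowed. That is, when a new node is added to the replica set, it may well replicate from one of the Secondaries rather than from the Primary.

MongoDB chooses a sync source based on ping time. When one node sends a heartbeat request to another, it learns how long the request took. MongoDB keeps track of the average heartbeat time between nodes, and when choosing a sync source it picks a node that is close to itself and whose data is newer than its own.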
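
As a rough illustration of that rule only (the real selection logic in mongod takes more factors into account), the sketch below picks, among the candidates whose data is newer than ours, the one with the lowest average ping. The function name chooseSyncSource, the field names avgPingMs and optime, and the sample numbers are all made up for this example.

```javascript
// Simplified illustration, not mongod's actual algorithm: choose the
// candidate with the lowest average ping among those ahead of us.
function chooseSyncSource(myOptime, candidates) {
  let best = null;
  for (const c of candidates) {
    if (c.optime <= myOptime) {
      continue; // not newer than our own data
    }
    if (best === null || c.avgPingMs < best.avgPingMs) {
      best = c;
    }
  }
  return best; // null when no candidate is ahead of us
}

const pickedSource = chooseSyncSource(100, [
  { host: "node3:27017", avgPingMs: 12, optime: 105 },
  { host: "node3:27018", avgPingMs: 3, optime: 104 },
  { host: "node3:27019", avgPingMs: 1, optime: 100 },
]);
console.log(pickedSource.host);
```

Here node3:27019 has the lowest ping but is not ahead of the caller, so the sketch settles on node3:27018, the closest of the two members that are ahead.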


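To see which node a member is currently replicating from:

myapp:PRIMARY> rs.status().members[1].syncingTo

Of course, chained replication also has an obvious drawback: the longer the replication chain, the longer it takes for a write to be replicated to every Secondary.

Once chainingAllowed is set to false, all Secondaries replicate from the Primary.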
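
The reconfiguration itself is not shown here. Assuming the flag lives under settings.chainingAllowed in the config document, a sketch of the change might look like this. disableChaining is a hypothetical helper name, and the sample config is made up.

```javascript
// Hypothetical helper: set settings.chainingAllowed = false on a config
// object shaped like the result of rs.conf(). The location of the flag
// under `settings` is an assumption stated in the text above.
function disableChaining(cfg) {
  cfg.settings = cfg.settings || {};
  cfg.settings.chainingAllowed = false;
  return cfg;
}

const chainCfg = disableChaining({
  _id: "myapp",
  members: [],
  settings: { chainingAllowed: true },
});
console.log(chainCfg.settings.chainingAllowed);
```

In the shell, the equivalent would presumably be cfg = rs.conf(), then rs.reconfig(disableChaining(cfg)).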

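Explicitly specifying a replication source for a Secondary

This is what rs.syncFrom(hostportstr), listed in the rs.help() output above, is for; run it on the Secondary in question.

Preventing a Secondary from building indexes

Sometimes a Secondary does not need the same indexes as the Primary, for example when it is used only for backups or offline batch jobs. In that case, the Secondary can be prevented from building indexes.

In MongoDB 3.4, this setting cannot be changed directly on an existing member; it can only be specified explicitly when the member is added.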

myapp:PRIMARY> cfg = rs.conf()
myapp:PRIMARY> cfg.members[2].buildIndexes = false
myapp:PRIMARY> rs.reconfig(cfg)
{
    "ok" : 0,
    "errmsg" : "priority must be 0 when buildIndexes=false",
    "code" : 103,
    "codeName" : "NewReplicaSetConfigurationIncompatible"
}
myapp:PRIMARY> cfg.members[2].buildIndexes = false
myapp:PRIMARY> cfg.members[2].priority = 0
myapp:PRIMARY> rs.reconfig(cfg)
{
    "ok" : 0,
    "errmsg" : "New and old configurations differ in the setting of the buildIndexes field for member node3:27017; to make this change, remove then re-add the member",
    "code" : 103,
    "codeName" : "NewReplicaSetConfigurationIncompatible"
}
myapp:PRIMARY> rs.remove("node3:27017")
{ "ok" : 1 }
myapp:PRIMARY> rs.add({_id: 2, host: "node3:27017", priority: 0, buildIndexes: false})
{ "ok" : 1 }

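The test above shows that to set a member's buildIndexes to false, its priority must also be set to 0, and that an existing member has to be removed and re-added for the change to take effect.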
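
Both rejections seen in this test can be anticipated by comparing the old and new config objects before calling rs.reconfig. The sketch below is only an approximation of those two checks: buildIndexesProblems is a hypothetical helper, members are matched by host, and the defaults buildIndexes = true and priority = 1 are assumed for fields that are not set.

```javascript
// Hypothetical check mirroring the two errors seen in the session above.
// Returns a list of problems; an empty list means none were found.
function buildIndexesProblems(oldCfg, newCfg) {
  const buildOf = (m) => (m.buildIndexes === undefined ? true : m.buildIndexes);
  const priorityOf = (m) => (m.priority === undefined ? 1 : m.priority);
  const problems = [];
  for (const m of newCfg.members) {
    if (!buildOf(m) && priorityOf(m) !== 0) {
      problems.push(m.host + ": priority must be 0 when buildIndexes=false");
    }
    const old = oldCfg.members.find((o) => o.host === m.host);
    if (old && buildOf(old) !== buildOf(m)) {
      problems.push(m.host + ": buildIndexes changed; remove then re-add the member");
    }
  }
  return problems;
}

// Made-up configs: the new one flips buildIndexes on an existing member.
const oldIdxCfg = { members: [{ _id: 2, host: "node3:27017" }] };
const newIdxCfg = { members: [{ _id: 2, host: "node3:27017", buildIndexes: false }] };
console.log(buildIndexesProblems(oldIdxCfg, newIdxCfg));
```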



Permanent link to this article: http://www.linuxidc.com/Linux/2017-05/143913.htm

Copyright notice: original article from this site, published by 星锅 on 2022-01-22; 12735 characters in total.