# Replication Internals
Replication is the set of systems used to continuously copy data from a primary server to secondary
servers so that, if the primary server fails, a secondary server can take over quickly. This process is
intended to be mostly transparent to the user, with drivers taking care of routing queries to the
requested replica. Replication in MongoDB is facilitated through [**replica
sets**](https://docs.mongodb.com/manual/replication/).
A replica set is a group of nodes with one primary and multiple secondaries. The primary is
responsible for all writes. Users may specify that reads from secondaries are acceptable via
[`setSecondaryOk`](https://docs.mongodb.com/manual/reference/method/Mongo.setSecondaryOk/) or through
[**read preference**](#read-preference), but such reads are not allowed by default.
# Steady State Replication
The normal running of a replica set is referred to as steady state replication. This is when there
is one primary and multiple secondaries. Each secondary is replicating data from the primary, or
another secondary off of which it is **chaining**.
## Life as a Primary
### Doing a Write
When a user does a write, all a primary node does is apply the write to the database like a
standalone would. The one difference from a standalone write is that replica set nodes have an
`OpObserver` that inserts a document to the **oplog** whenever a write to the database happens,
describing the write. The oplog is a capped collection called `oplog.rs` in the `local` database.
There are a few optimizations made for it in WiredTiger, and it is the only collection that doesn't
include an \_id field.
If a write does multiple operations, each will have its own oplog entry; for example, inserts with
implicit collection creation create two oplog entries, one for the `create` and one for the
`insert`.
These entries are rewritten from the initial operation to make them idempotent; for example, updates
with `$inc` are changed to use `$set`.
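The reason for this rewrite can be shown with a minimal sketch. The code below is not the server's oplog format or its update machinery; `apply_update` and `to_idempotent` are hypothetical helpers that model only `$inc` and `$set` on flat documents. It illustrates why a logged `$set` can be applied more than once safely, while replaying the original `$inc` cannot.

```python
import copy


def apply_update(doc, update):
    """Apply a tiny subset of update operators ($inc, $set) to a copy of doc."""
    result = copy.deepcopy(doc)
    for field, delta in update.get("$inc", {}).items():
        result[field] = result.get(field, 0) + delta
    for field, value in update.get("$set", {}).items():
        result[field] = value
    return result


def to_idempotent(doc, update):
    """Rewrite an update as a pure $set of the resulting field values."""
    after = apply_update(doc, update)
    touched = list(update.get("$inc", {})) + list(update.get("$set", {}))
    return {"$set": {field: after[field] for field in touched}}


original = {"_id": 1, "count": 5}
user_update = {"$inc": {"count": 2}}

# What a primary would log in this simplified model.
logged_update = to_idempotent(original, user_update)

# Applying the logged form once or twice gives the same document.
once = apply_update(original, logged_update)
twice = apply_update(once, logged_update)

# Replaying the original $inc twice would double-count.
replayed_inc = apply_update(apply_update(original, user_update), user_update)
```

In this model, `once` and `twice` are identical, whereas `replayed_inc` has been incremented twice.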
Secondaries drive oplog replication via a pull process.
Writes can also specify a [**write
concern**](https://docs.mongodb.com/manual/reference/write-concern/). If a command includes a write
concern, the command will just block in its own thread until the oplog entries it generates have
been replicated to the requested number of nodes. The primary keeps track of how up-to-date the
secondaries are to know when to return. A write concern can specify a number of nodes to wait for,
or **majority**. If **majority** is specified, the write waits for that write to be in the
**committed snapshot** as well, so that it can be read with `readConcern: { level: majority }`
reads. (If this last sentence made no sense, come back to it at the end.)
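The bookkeeping described above can be sketched as follows. This is a simplified illustration, not the server's implementation: `is_satisfied` and the `last_applied` map are hypothetical, optimes are modeled as plain integers, every node is assumed to be a voting, data-bearing member, and the additional wait for the committed snapshot on majority writes is not modeled.

```python
def majority(num_nodes):
    """Smallest number of nodes that is more than half of num_nodes."""
    return num_nodes // 2 + 1


def is_satisfied(write_op_time, w, last_applied):
    """Return True if enough nodes have replicated up to write_op_time.

    last_applied maps a node name to the latest optime that node is
    known to have replicated (the primary counts itself as well).
    w is either an integer node count or the string "majority".
    """
    needed = majority(len(last_applied)) if w == "majority" else w
    caught_up = sum(1 for t in last_applied.values() if t >= write_op_time)
    return caught_up >= needed


# The primary and one secondary have the write at optime 10; one secondary lags.
progress = {"primary": 10, "secondary1": 10, "secondary2": 7}

two_nodes_ok = is_satisfied(10, 2, progress)
three_nodes_ok = is_satisfied(10, 3, progress)
majority_ok = is_satisfied(10, "majority", progress)
```

A command blocked on `{w: 3}` in this model would keep waiting until `secondary2` reports an optime of at least 10.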
### Default Write Concern
If a write operation does not explicitly specify a write concern, the server will use a default
write concern. This default write concern will be defined by either the
**cluster-wide write concern**, explicitly set by the user, or the
**implicit default write concern**, implicitly set by the
server based on replica set configuration.
#### Cluster-Wide Write Concern
Users can set the cluster-wide write concern (CWWC) using the
[`setDefaultRWConcern`](https://docs.mongodb.com/manual/reference/command/setDefaultRWConcern/)
command. Setting the CWWC will cause the implicit default write concern to
no longer take effect. Once a user sets a CWWC, we disallow unsetting it. The reasoning
behind this is explored in the section
[Implicit Default Write Concern and Sharded Clusters](#implicit-default-write-concern-and-sharded-clusters).
On sharded clusters, the CWWC will be stored on config servers. Shard servers themselves do not
store the CWWC. Instead, mongos polls the config server and applies the default write concern to
requests it forwards to shards.
#### Implicit Default Write Concern
If there is no cluster-wide default write concern set, the server will set the default. This is
known as the implicit default write concern (IDWC). For most cases, the IDWC will default to
`{w: "majority"}`.
The IDWC is calculated on startup using the **Default Write Concern Formula (DWCF)**:
`implicitDefaultWriteConcern = if ((#arbiters > 0) AND (#non-arbiters <= majority(#voting nodes))) then {w:1} else {w:majority}`
This formula specifies that for replica sets with arbiters, we want to ensure that we set the
implicit default to a value that the set can satisfy in the event of one data-bearing node
going down. That is, the number of data-bearing nodes must be strictly greater than the majority
of voting nodes for the set to set `{w: "majority"}`.
For example, if we have a PSA replica set, and the secondary goes down, the primary cannot
successfully acknowledge a majority write as the majority for the set is two nodes. However, the
primary will remain primary with the arbiter's vote. In this case, the DWCF will have preemptively
set the IDWC to `{w: 1}` so the user can still perform writes to the replica set.
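As a worked illustration, the formula can be transcribed directly into a small function. The function and parameter names below are hypothetical, and `majority` is assumed to mean "more than half of the voting nodes"; the server's actual computation lives in the C++ code base and may differ in detail.

```python
def majority(num_voting_nodes):
    """Smallest number of nodes that is more than half of the voting nodes."""
    return num_voting_nodes // 2 + 1


def implicit_default_write_concern(num_arbiters, num_non_arbiters, num_voting_nodes):
    """Direct transcription of the Default Write Concern Formula."""
    if num_arbiters > 0 and num_non_arbiters <= majority(num_voting_nodes):
        return {"w": 1}
    return {"w": "majority"}


# PSA: two data-bearing nodes, one arbiter, three voting nodes.
psa = implicit_default_write_concern(num_arbiters=1, num_non_arbiters=2, num_voting_nodes=3)

# PSS: three data-bearing nodes, no arbiter.
pss = implicit_default_write_concern(num_arbiters=0, num_non_arbiters=3, num_voting_nodes=3)

# PSSSA: four data-bearing nodes, one arbiter, five voting nodes.
psssa = implicit_default_write_concern(num_arbiters=1, num_non_arbiters=4, num_voting_nodes=5)
```

For the PSA set, the majority of three voting nodes is two, and the two data-bearing nodes do not strictly exceed it, so the result is `{w: 1}`. The PSSSA set has four data-bearing nodes against a majority of three, so it keeps `{w: "majority"}`.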
#### Implicit Default Write Concern and Sharded Clusters
For sharded clusters, the implicit default write concern will always be `{w: "majority"}`.
As mentioned above, mongos will send the default write concern with all requests that it forwards
to shards, which means the default write concern on shards will always be consistent in the
cluster. We don't want to specify `{w: "majority"}` for shard replica sets
that can keep a primary due to an arbiter's vote, but lose the ability to acknowledge majority
writes if a majority of data-bearing nodes goes down. So if the result of the DWCF for any replica
set in the cluster is `{w: 1}`, we require the cluster to set a CWWC. Once set, we disallow
unsetting it so we can prevent PSA shards from implicitly defaulting to `{w: "majority"}` for
reasons mentioned above. However, if a user decides to set the CWWC to `{w: "majority"}`
for a PSA set, they may do so. We assume that in this case the user understands
the tradeoffs they are making.
We will fassert shard servers on startup if no CWWC is set and
the result of the default write concern formula is `{w: 1}`. Similarly, we will also fail any
`addShard` command that attempts to add a shard replica set with a default write concern of
`{w: 1}` when CWWC is unset. This is because we want to maintain a consistent implicit default of
`{w: "majority"}` across the cluster, but we do not want to specify that for PSA sets for reasons
listed above.
#### Replica Set Reconfigs and Default Write Concern
A replica set reconfig will recalculate the default write concern using the Default Write Concern
Formula if CWWC is not set. If the new value of the implicit default write concern is different
from the old value, we will fail the reconfig. Users must set a CWWC before issuing a reconfig
that would change the IDWC.
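The reconfig rule can be sketched as a validation step. This is an illustrative model only: `validate_reconfig`, `ReconfigRejected`, and the dictionaries of member counts are hypothetical, and a real replica set config carries far more information than three counters.

```python
class ReconfigRejected(Exception):
    """Raised when a reconfig would change the implicit default write concern."""


def implicit_default(arbiters, non_arbiters, voting):
    """Default Write Concern Formula applied to member counts."""
    majority = voting // 2 + 1
    if arbiters > 0 and non_arbiters <= majority:
        return {"w": 1}
    return {"w": "majority"}


def validate_reconfig(old_counts, new_counts, cwwc_is_set):
    """Reject a reconfig that changes the IDWC while no CWWC is set."""
    if cwwc_is_set:
        # The implicit default no longer takes effect, so nothing can change.
        return
    old_default = implicit_default(**old_counts)
    new_default = implicit_default(**new_counts)
    if old_default != new_default:
        raise ReconfigRejected(
            f"implicit default would change from {old_default} to {new_default}; "
            "set a cluster-wide write concern first"
        )


pss = {"arbiters": 0, "non_arbiters": 3, "voting": 3}
psa = {"arbiters": 1, "non_arbiters": 2, "voting": 3}

# Allowed: a CWWC is already set.
validate_reconfig(pss, psa, cwwc_is_set=True)

# Rejected: replacing a secondary with an arbiter flips the implicit default.
try:
    validate_reconfig(pss, psa, cwwc_is_set=False)
    rejected = False
except ReconfigRejected:
    rejected = True
```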
#### Force Reconfigs
As an important note, we will also fail force reconfigs that may change
the IDWC. In cases where a replica set is facing degraded performance and cannot satisfy a
majority write concern needed to set the CWWC, users can run
`setDefaultRWConcern` with write concern `{w: 1}` instead of making it a majority write so that
setting CWWC does not get in the way of being able to do a force reconfig.
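A sketch of the command document for this case is shown below. The `setDefaultRWConcern` command and its `defaultWriteConcern` and `writeConcern` fields are documented in the MongoDB manual; the helper name `build_set_cwwc_command` is hypothetical, and the commented-out driver call assumes a PyMongo client connected to the primary, which is not part of this document.

```python
def build_set_cwwc_command(default_write_concern, command_write_concern):
    """Build a setDefaultRWConcern command document.

    default_write_concern is the CWWC to store.
    command_write_concern is the write concern of the command itself.
    """
    return {
        "setDefaultRWConcern": 1,  # the command name must be the first key
        "defaultWriteConcern": default_write_concern,
        "writeConcern": command_write_concern,
    }


# Store a CWWC while only requiring acknowledgement from the primary,
# so the command can succeed even when a majority is unavailable.
command = build_set_cwwc_command(
    default_write_concern={"w": 1},
    command_write_concern={"w": 1},
)

# Assumed usage with PyMongo (not executed here):
#   client.admin.command(command)
```

The CWWC value passed here is only an example; the point is that the command's own `writeConcern` is `{w: 1}` rather than a majority write.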
#### Code References
- [The definition of an Oplog Entry](https://github.com/mongodb/mongo/blob/r6.2.0/src/mongo/db/repl/oplog_entry.idl)
- [Upper layer uses OpObserver class to write Oplog](https://github.com/mongodb/mongo/blob/r6.2.0/src/mongo/db/op_observer/op_observer.h#L112), for example, [it is helpful to take a look at OpObserverImpl::logOperation()](https://github.com/mongodb/mongo/blob/r6.2.0/src/mongo/db/op_observer/op_observer_impl.cpp#L114)
- [repl::logOplogRecords() is a common function to write Oplogs into Oplog Collection](https://github.com/mongodb/mongo/blob/r7.1.0/src/mongo/db/repl/oplog.cpp#L440)
- [WriteConcernOptions is filled in extractWriteConcern()](https://github.com/mongodb/mongo/blob/r6.2.0/src/mongo/db/write_concern.cpp#L71)
- [Upper leve