# vim: set foldenable foldmethod=indent sw=4 ts=8 :
# Copyright 2013 Linbit HA Solutions GmbH
# Lars Ellenberg @ linbit.com
TODO:
someone convert this into proper ascii doc please ;-)
... and draw some pictures ...
How crm-fence-peer.sh, pacemaker, and the OCF Linbit DRBD resource agent
are supposed to work together.
The two-node cluster is the trickier case, because it has no real quorum.
Relative Timeouts
--dc-timeout > dead-time or stonith-timeout, whichever applies
If stonith is enabled, --timeout >= --dc-timeout.
Without stonith, --timeout may be small.
Pacemaker operations timeouts
monitor and promote action timeout > max(dc_timeout, timeout)
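The relative-timeout rules above can be checked mechanically. A minimal sketch in Python, not part of crm-fence-peer.sh: the function name and its arguments are made up for illustration, all values are in seconds, and it assumes that stonith-timeout only matters when stonith is enabled.

```python
def check_timeouts(dc_timeout, timeout, dead_time, op_timeout,
                   stonith_enabled=False, stonith_timeout=0):
    """Return a list of violated rules; an empty list means all rules hold."""
    problems = []

    # --dc-timeout must exceed dead-time, and (assumption) also
    # stonith-timeout when stonith is in use.
    floor = max(dead_time, stonith_timeout) if stonith_enabled else dead_time
    if dc_timeout <= floor:
        problems.append("--dc-timeout must be > %d" % floor)

    # With stonith, --timeout must not be shorter than --dc-timeout.
    # Without stonith, --timeout may be small, so nothing to check.
    if stonith_enabled and timeout < dc_timeout:
        problems.append("--timeout must be >= --dc-timeout")

    # The monitor and promote operation timeouts must outlast the callback.
    if op_timeout <= max(dc_timeout, timeout):
        problems.append("monitor/promote timeout must be > %d"
                        % max(dc_timeout, timeout))
    return problems


if __name__ == "__main__":
    for problem in check_timeouts(dc_timeout=20, timeout=10,
                                  dead_time=30, op_timeout=15):
        print(problem)
```

The example call violates two rules: --dc-timeout is not larger than dead-time, and the operation timeout does not exceed max(dc_timeout, timeout).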
Node reboot, possibly because of crash or stonith due to communication loss
no peer reachable [no delay]
crm may decide to elect itself, shoot the peer,
and start services.
If DRBD peer disk state is known Outdated or worse, DRBD will
switch itself to UpToDate, allowing it to be promoted,
without further fencing actions.
If DRBD peer disk state is DUnknown, DRBD will be only Consistent.
In case crm decides to promote this instance, the fence-peer callback
runs, finds the peer "unreachable", finds itself Consistent only,
does NOT set any constraint, and DRBD refuses to be promoted.
CRM will now try in an endless loop to promote this instance.
Avoid this by adding
param adjust_master_score="0 10 1000 10000"
to the DRBD resource definition.
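To see why a leading 0 breaks the promote loop, here is a rough Python model of how the four adjust_master_score values might be selected. The slot meanings (Consistent only; only the peer UpToDate; local UpToDate with peer unknown; local UpToDate and Primary or peer known) are assumed from the resource agent's parameter description and should be checked against your version. The function and argument names are hypothetical.

```python
def master_score(adjust_master_score, local_disk, role="Secondary",
                 peer_known=False, peer_uptodate=False):
    """Pick one of the four configured scores for the current DRBD state."""
    consistent_only, remote_only, local_peer_unknown, local_best = (
        int(v) for v in adjust_master_score.split())

    if local_disk == "UpToDate":
        if role == "Primary" or peer_known:
            return local_best
        return local_peer_unknown
    if peer_uptodate:
        return remote_only
    if local_disk == "Consistent":
        return consistent_only
    # Anything worse gets no promotion preference (simplification of this sketch).
    return 0


if __name__ == "__main__":
    # Rebooted node, peer DUnknown, local disk only Consistent.
    print(master_score("5 10 1000 10000", "Consistent"))
    print(master_score("0 10 1000 10000", "Consistent"))
```

With a first value of 0, the Consistent-only instance gets no positive score, so crm has no reason to keep trying to promote it.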
no replication link
CRM can see both nodes. [delay: crmadmin -S $peer]
If currently both nodes are Secondary Consistent, CRM will decide to
promote one instance. The fence-peer callback will find the other node
still reachable after timeout, and set the constraint.
If there is already one Primary, and this is a node rejoining the
cluster, there should already be a constraint preventing this node
from being promoted.
Only Replication link breaks during normal operation
Single Primary [delay: crmadmin -S $peer]
fence-peer callback finds DC,
crmadmin -S confirms peer still "reachable",
and sets the constraint.
Dual Primary
both fence-peer callbacks find DC,
both see node_state "reachable",
optionally delay for the --network-hickup timeout,
and if DRBD is still disconnected,
both try to set the constraint.
Only one succeeds.
The loser should probably commit suicide,
to reduce the overall recovery time.
--suicide-on-failure-if-primary
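The dual-primary race can be pictured as "first writer wins". The following Python toy model is only an illustration of that idea: the dictionary standing in for the CIB, the function names, and the outcome strings are all hypothetical, and the constraint is modelled as simply recording the winner.

```python
def place_constraint(cib, resource, node):
    """Create the constraint for `resource` on behalf of `node`.

    Returns True if this call created it, False if one already exists.
    """
    if resource in cib:
        return False
    cib[resource] = node
    return True


def dual_primary_split(nodes, resource, suicide_on_failure_if_primary=True):
    """`nodes` is ordered by who reaches the (toy) CIB first."""
    cib = {}
    outcome = {}
    for node in nodes:
        if place_constraint(cib, resource, node):
            outcome[node] = "keeps Primary"
        elif suicide_on_failure_if_primary:
            outcome[node] = "self-fences"
        else:
            outcome[node] = "fence-peer failed"
    return outcome


if __name__ == "__main__":
    print(dual_primary_split(["alice", "bob"], "ms_drbd_r0"))
```

Whichever node arrives second loses. With --suicide-on-failure-if-primary the loser takes itself out right away instead of waiting for other recovery mechanisms.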
Node crash
surviving node is Secondary, [no delay]
If not DC, triggers DC election, elects itself.
Is DC now.
If stonith enabled, shoots the peer.
Promotes this node.
During promotion, the fence-peer callback
finds a DC, and a node_state "unreachable",
so sets the constraint "immediately".
surviving node is Primary (DC) [delay up to timeout]
If stonith enabled, shoots the peer.
fence-peer callback finds DC, after some
time sees node_state "unreachable",
or times out while node_state is still "reachable".
Either way still sets the constraint.
surviving node is Primary (not DC) [delay up to max(dc_timeout,timeout)]
fence-peer callback loops trying to contact DC.
eventually this node is elected DC.
If stonith enabled, shoots the peer.
The fence-peer callback either times out while no DC is available,
and thus fails (make sure you choose a suitable --dc-timeout),
or it finds the other node "unreachable",
and sets the constraint.
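The waiting behaviour described in these scenarios can be condensed into one decision loop. This Python sketch models only the narrative above, not the actual crm-fence-peer.sh code: the poll tuples, the function name, and the outcome strings are invented for illustration, and both timeouts are assumed to count from the start of the callback.

```python
def fence_peer_outcome(polls, dc_timeout, timeout, local_disk="UpToDate"):
    """Decide what the callback does, given a series of observations.

    `polls` is a sequence of (elapsed_seconds, have_dc, peer_state) tuples,
    where peer_state is "reachable", "unreachable", or None without a DC.
    """
    for elapsed, have_dc, peer_state in polls:
        if not have_dc:
            # Keep looping until a DC shows up or --dc-timeout expires.
            if elapsed >= dc_timeout:
                return "fail: no DC"
            continue
        if peer_state == "unreachable":
            # A merely Consistent disk is not good enough to fence the peer.
            if local_disk == "Consistent":
                return "no constraint"
            return "constraint set"
        if elapsed >= timeout:
            # Peer still "reachable" after --timeout: set it anyway.
            return "constraint set"
    return "undecided"


if __name__ == "__main__":
    # Surviving Primary that is not DC: two polls without DC, then elected.
    polls = [(0, False, None), (30, False, None), (60, True, "unreachable")]
    print(fence_peer_outcome(polls, dc_timeout=90, timeout=90))
```

Feeding it polls that never show a DC before --dc-timeout expires yields the failure case, which is why --dc-timeout needs to cover the DC election.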
Total communication loss
To the single node, this looks like node crash, so see above.
The difference is the potential of data divergence.
If DRBD was configured for "fencing resource-and-stonith",
IO on any Primary is frozen while the fence-peer callback runs.
If stonith is enabled, timeouts should be selected so that
we are shot while waiting for the DC to confirm node_state
"unreachable" of the peer, thus combined with freezing IO,
no harmful data divergence can happen at this time.
If there is no stonith enabled, data divergence is unavoidable.
==> Multi-Primary *requires*
both node level fencing (stonith)
AND drbd resource level fencing
Again: Multi-Primary REQUIRES stonith enabled and working.
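The requirement can be written down as a small sanity check. A minimal Python sketch with hypothetical function and argument names; the policy strings are DRBD's fencing policy keywords, and the check does not read any real configuration.

```python
def multi_primary_problems(stonith_enabled, drbd_fencing):
    """Return reasons why this setup must not run Multi-Primary."""
    problems = []
    if not stonith_enabled:
        problems.append("node level fencing (stonith) is not enabled")
    if drbd_fencing not in ("resource-only", "resource-and-stonith"):
        problems.append("DRBD resource level fencing is not configured")
    elif drbd_fencing != "resource-and-stonith":
        # Per the text above, IO is frozen during the fence-peer callback
        # only with "fencing resource-and-stonith".
        problems.append("IO is not frozen while the fence-peer callback runs")
    return problems


if __name__ == "__main__":
    for problem in multi_primary_problems(False, "dont-care"):
        print(problem)
```

An empty result means both fencing levels are in place; it says nothing about whether stonith actually works, which still has to be tested.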