Getting Started with Linux-HA (heartbeat)
Intro
Let me preface this document by saying most of this is _not_ original work.
My purpose in writing this document is simply to contribute in some
way to help those who REALLY get things done. The "work" I am
contributing is mostly compiling bits and pieces from other HA documents
(such as Volker Wiegand's Hardware Installation Guide) into a document that
can help novices get started on HA without pestering Alan (like I did!) and
to cut down on repeat questions on the mailing list.
Getting Started
The first thing you'll need is two computers. You need not have identical
hardware in both machines (or amount of memory, etc.), but if you did, it
would make your life that much easier when a component fails.
Now you have to decide on some of your implementation. Your "cluster" is
established via a "heartbeat" between the two computers (nodes) generated by
the software package of the same name. However, this heartbeat needs one or
more media paths (serial via a null modem cable, ethernet via a crossover
cable, etc.) between the nodes.
At this point, you're actually ready to begin hardware-wise. Of course,
since you're looking into HA, you'll most likely want to avoid having a
single point of failure. In this case, that would be your null modem
cable/serial port or network interface card (NIC)/crossover cable. So, you
need to decide whether you wish to add a second serial/null modem connection
or a second NIC/crossover connection to each
node. See Appendix A for instructions on how to build a Cat-5 crossover
cable. My heartbeat path setup uses one serial port and one extra NIC
because I only had one null modem cable, had an extra NIC on hand, and
thought it was good to have two medium types for the heartbeats.
Once your hardware is in order, you must install your OS and configure your
networking (I used Red Hat). Assuming you have 2 NICs, one should be
configured for your "normal" network and the other as a private network
between your clustered nodes (via the crossover cable). For an example, we
will assume that our cluster will have the following addresses:
  Node 1 (linuxha1): 192.168.85.1 (normal 192x net)
                     10.0.0.1     (private 10x net for heartbeat)
  Node 2 (linuxha2): 192.168.85.2 (192x)
                     10.0.0.2     (10x)
Note: None of these addresses should be your "cluster address" - the
address handled by heartbeat and failed over between nodes!
Most *nix distributions make this easy during installation; however, if you
are having any problems, refer to either the Ethernet HOWTO or the
documentation for your distribution. To check your configuration, type:
ifconfig
This will show your network interfaces and their configuration. You can
obtain your network routing information from "netstat -nr".
If it looks good, make sure you can ping between both nodes on all
interfaces.
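The ping check above can be scripted so it is easy to repeat after any
cabling or configuration change. What follows is only a rough sketch, not
part of heartbeat: the function name check_peers and the PING variable are
made up for illustration, the addresses are the example ones from this
document, and the "-W 2" timeout assumes the ping shipped with most Linux
distributions.

```shell
#!/bin/sh
# Command used to probe a peer. Override PING if your ping takes
# different options ("-W 2" is a 2 second timeout on Linux ping).
PING="${PING:-ping -c 1 -W 2}"

# check_peers ADDR...
# Pings each address once and prints "ok ADDR" or "FAIL ADDR".
# Returns non-zero if any address did not answer.
check_peers() {
    failed=0
    for addr in "$@"; do
        # $PING is left unquoted on purpose so its options are split.
        if $PING "$addr" >/dev/null 2>&1; then
            echo "ok $addr"
        else
            echo "FAIL $addr"
            failed=1
        fi
    done
    return "$failed"
}
```

On linuxha1 you would run "check_peers 192.168.85.2 10.0.0.2", and on
linuxha2 "check_peers 192.168.85.1 10.0.0.1". A FAIL on the 10x address
only points at the crossover cable or the second NIC.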
Next, if you're using one, you'll need to test your serial connection. On
one node, which will be the receiver, type:
cat </dev/ttyS0
On the other node, type:
echo hello >/dev/ttyS0
You should see the text on the receiver node. If it works, change their
roles and try again. If it doesn't, it may be as simple as having the wrong
device file. Volker's HA Hardware Guide and the Serial HOWTO are two good
resources for troubleshooting your serial connection.
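If you find yourself repeating the serial test, a variant that gives up
after a few seconds can be handier than a "cat" that waits forever. This is
a sketch under a few assumptions: serial_send and serial_recv are
hypothetical helper names, "timeout" is the GNU coreutils command, and the
line settings of your serial port are whatever already made the plain
cat/echo test work.

```shell
#!/bin/sh
# serial_send DEVICE MESSAGE
# Writes one line of text to the device (run on the sending node).
serial_send() {
    dev=$1
    msg=$2
    echo "$msg" > "$dev"
}

# serial_recv DEVICE [SECONDS]
# Waits up to SECONDS (default 10) for one line from the device and
# prints it (run on the receiving node). Returns non-zero if the wait
# timed out or nothing was read.
serial_recv() {
    dev=$1
    wait_secs=${2:-10}
    line=$(timeout "$wait_secs" head -n 1 < "$dev") || return 1
    [ -n "$line" ] && echo "$line"
}
```

Start "serial_recv /dev/ttyS0 30" on the receiver first, then run
"serial_send /dev/ttyS0 hello" on the other node, and swap roles afterwards
as described above.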
Installing Heartbeat
You can now install the heartbeat package. If you're reading this, you
already have it, but in any case it's available at:
[1]http://linux-ha.org/download
There are binary RPMs at the website, or you can build heartbeat from
source. Grab the tarball (or install the source RPM). Untar it into your
favorite source directory. From the top of the source tree, type
"./ConfigureMe configure", followed by "make" and "make install". If you
have problems installing the RPMs found at the website and want a way to
make your own, there may be help in the [2]FAQ.
Configuring Heartbeat
Configuring ha.cf
There are three files you will need to configure before starting up
heartbeat. The first is ha.cf. It goes in the /etc/ha.d directory
that is created after installation. It tells heartbeat what types of media
paths to use and how to configure them. The ha.cf in the source directory
contains all the various options you can use; I'll go through it line by
line...
serial /dev/ttyS0
Use a serial heartbeat - if you don't use a serial heartbeat, you
must use another medium, such as a bcast (ethernet) heartbeat.
Replace /dev/ttyS0 with the appropriate device file for your required
serial heartbeat.
watchdog /dev/watchdog
Optional. The watchdog function provides a way to have a system that
is still minimally functioning, but not providing a heartbeat, reboot
itself after a minute of being sick. This could help to avoid a
scenario where the machine recovers its heartbeat after being
pronounced dead. If that happened and a disk mount failed over, you
could have two nodes mounting a disk simultaneously. If you wish to
use this feature, then in addition to this line, you will need to
load the "softdog" kernel module and create the actual device file.
To do this, first type "insmod softdog" to load the module. Then,
type "grep misc /proc/devices" and note the number it reports (should
be 10). Next, type "cat /proc/misc | grep watchdog" and note that
number (should be 130). Now you can create the device file with that
info by typing "mknod /dev/watchdog c 10 130".
bcast eth1
Specifies to use a broadcast heartbeat over the eth1 interface
(replace with eth0, eth2, or whatever you use).
keepalive 2
Sets the time between heartbeats to 2 seconds.
warntime 10
Time in seconds before issuing a "late heartbeat" warning in the
logs.
deadtime 30
Node is pronounced dead after 30 seconds.
initdead 120
With some configurations, the network takes some time to start
working after a reboot. This is a separate "deadtime" to handle
that case. It should be at least twice the normal deadtime.
hopfudge 1
Optional. For ring topologies, number of hops allowed in addition to
the number of nodes in the cluster.
baud 19200
Speed at which to run the serial line (bps).
udpport 694
Use port number 694 for bcast or ucast communication. This is the
default, and the official IANA registered port number.
auto_failback on
Required. For those familiar with Tru64 Unix, heartbeat acts as if in
"favored member" mode. The master listed in the haresources
file holds all the resources until a failover, at which time
the slave takes over. When auto_failback is set to on, the
master will take everything back from the slave once it comes
back online. When set to off, this option prevents the master
node from re-acquiring cluster resources after a failover. This
option is similar to the obsolete nice_failback option. If
you want to upgrade from a cluster which had nice_failback set
off to this or later versions, special considerations apply in
order to avoid requiring a flash cut. Please see the
[3]FAQ for details on how to deal with this situation.
node linuxha1.linux-ha.org
Mandatory.