Detailed instructions for use are in the User's Guide.

[. . . ]

SUSE Linux Enterprise High Availability Extension 11
February 18, 2010
www.novell.com

High Availability Guide

Copyright © 2006-2010 Novell, Inc. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or (at your option) version 1.3; with the Invariant Section being this copyright notice and license. A copy of the license version 1.2 is included in the section entitled "GNU Free Documentation License". SUSE®, openSUSE®, the openSUSE® logo, Novell®, the Novell® logo, and the N® logo are registered trademarks of Novell, Inc.

[. . . ]

2b Dismount the disk on node 1.
umount /srv/r0mount
2c Downgrade the DRBD service on node 1 by typing the following command on node 1:
drbdadm secondary r0
2d On node 2, promote the DRBD service to primary.
drbdadm primary r0
2e On node 2, check to see if node 2 is primary.
rcdrbd status
2f On node 2, create a mount point such as /srv/r0mount.
mkdir /srv/r0mount
2g On node 2, mount the DRBD device.
mount -o rw /dev/drbd0 /srv/r0mount
2h Verify that the file you created on node 1 is viewable.
ls /srv/r0mount
The /srv/r0mount/from_node1 file should be listed.
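Step 2h only confirms that the file name is listed. As an additional sanity check, you can compare checksums of the file as seen before and after the failover. The following is a minimal sketch: it uses /tmp and a locally created file purely for illustration, whereas in the procedure above the file lives on the DRBD mount /srv/r0mount and the two sums would be recorded on node 1 and node 2 respectively.

```shell
# Sketch: verify content integrity across a failover by comparing checksums.
# /tmp and a locally created file stand in for the DRBD-backed /srv/r0mount.
echo "hello from node1" > /tmp/from_node1
sum_before=$(md5sum /tmp/from_node1 | awk '{print $1}')   # record on node 1

# ... manual failover to node 2 as in steps 2b-2g ...

sum_after=$(md5sum /tmp/from_node1 | awk '{print $1}')    # re-check on node 2
if [ "$sum_before" = "$sum_after" ]; then
  echo "checksums match"
fi
```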
Distributed Replicated Block Device (DRBD)
3 If the service is working on both nodes, the DRBD setup is complete.

4a Dismount the disk on node 2 by typing the following command on node 2:
umount /srv/r0mount
4b Downgrade the DRBD service on node 2 by typing the following command on node 2:
drbdadm secondary r0
4c On node 1, promote the DRBD service to primary.
drbdadm primary r0
4d On node 1, check to see if node 1 is primary.
rcdrbd status
5 To get the service to automatically start and fail over if the server has a problem, you can set up DRBD as a high availability service with OpenAIS. For information about installing and configuring OpenAIS for SUSE Linux Enterprise 11 see Part II, "Configuration and Administration" (page 29).
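As a sketch of what such a setup can look like, the following crm shell fragment defines a master/slave resource for the DRBD resource r0 using the ocf:linbit:drbd agent. The resource IDs (drbd-r0, ms-drbd-r0) and the monitor interval are illustrative; see Part II for the authoritative configuration steps.

```
primitive drbd-r0 ocf:linbit:drbd \
    params drbd_resource="r0" \
    op monitor interval="15s"
ms ms-drbd-r0 drbd-r0 \
    meta master-max="1" master-node-max="1" \
         clone-max="2" clone-node-max="1" notify="true"
```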
14.4 Troubleshooting DRBD
The DRBD setup involves many different components, and problems may arise from different sources. The following sections cover several common scenarios and recommend various solutions.
14.4.1 Configuration
If the initial DRBD setup does not work as expected, there is probably something wrong with your configuration. To get information about the configuration:

1 Open a terminal console, then log in as root.

2 Enter
drbdadm -d adjust r0
In a dry run of the adjust option, drbdadm compares the actual configuration of the DRBD resource with your DRBD configuration file, but it does not execute the calls. Review the output to make sure you know the source and cause of any errors.

3 If there are errors in the drbd.conf file, correct them before continuing.

4 If the partitions and settings are correct, run drbdadm again without the -d option.
drbdadm adjust r0
This applies the configuration file to the DRBD resource.
14.4.2 Hostnames
For DRBD, hostnames are case sensitive (Node0 would be a different host than node0). If you have several network devices and want to use a dedicated network device, the hostname will likely not resolve to the IP address actually used. In this case, use the parameter disable-ip-verification.
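If needed, this parameter belongs in the global section of the DRBD configuration. A minimal fragment, assuming the usual /etc/drbd.conf location:

```
# /etc/drbd.conf (fragment)
global {
  # Skip the check that the local hostname resolves to the address
  # given in the resource definition.
  disable-ip-verification;
}
```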
14.4.3 TCP Port 7788
If your system is unable to connect to the peer, this might be a problem with your local firewall. Make sure that TCP port 7788 is accessible on both nodes.
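On SUSE Linux Enterprise, one way to open the port is through SuSEfirewall2. A sketch, assuming the default firewall is in use (apply it on both nodes):

```
# /etc/sysconfig/SuSEfirewall2 (fragment) -- allow DRBD replication traffic
FW_SERVICES_EXT_TCP="7788"
```

Restart the firewall afterwards with rcSuSEfirewall2 restart.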
14.4.4 DRBD Devices Broken after Reboot
If DRBD does not know which of the underlying devices holds the latest data, it changes to a split brain condition. The respective DRBD subsystems come up as secondary and do not connect to each other. In this case, the following message is written to /var/log/messages:
Split-Brain detected, dropping connection!
To resolve this situation, enter the following on the node which has data to be discarded:
drbdadm secondary r0
drbdadm -- --discard-my-data connect r0
On the node which has the latest data enter the following:
drbdadm connect r0
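After both nodes have been reconnected, the connection state reported in /proc/drbd should return to Connected. The following sketch extracts the cs: field from a status line; the sample line is hard-coded for illustration, whereas on a live node you would read /proc/drbd itself.

```shell
# Extract the connection state (cs:) from a /proc/drbd style status line.
# The sample line below is hard-coded for illustration.
sample='0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----'
cstate=$(printf '%s\n' "$sample" | sed -n 's/.*cs:\([A-Za-z]*\).*/\1/p')
echo "$cstate"   # prints Connected for this sample
```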
14.5 Additional Information
The following open source resources are available for DRBD:

· The following man pages for DRBD are available in the distribution: drbd(8), drbddisk(8), drbdsetup(8), drbdadm(8), drbd.conf(5)

· Find a commented example configuration for DRBD at /usr/share/doc/packages/drbd/drbd.conf
· The project home page: http://www.drbd.org

· http://clusterlabs.org/wiki/DRBD_HowTo_1.0 by the Linux Pacemaker Cluster Stack Project
Part IV. Troubleshooting and Reference
15 Troubleshooting
Often, strange problems may occur that are not easy to understand, especially when starting to experiment with Heartbeat. However, there are several utilities that can be used to take a closer look at the Heartbeat internal processes. This chapter recommends various solutions.
15.1 Installation Problems
Troubleshooting difficulties in installing the packages or in bringing the cluster online.

The packages needed for configuring and managing a cluster are included in the High Availability installation pattern, available with the High Availability Extension. Check if the High Availability Extension is installed as an add-on to SUSE Linux Enterprise Server 11 on each of the cluster nodes and if the High Availability pattern is installed on each of the machines as described in Section 3.1, "Installing the High Availability Extension" (page 23).

In order to communicate with each other, all nodes belonging to the same cluster need to use the same bindnetaddr, mcastaddr and mcastport as described in Section 3.2, "Initial Cluster Setup" (page 24).
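To see at a glance which totem settings a node uses, you can filter them out of the configuration file. The sketch below works on an inlined sample for illustration (the addresses and port are placeholder values); on a real node you would read /etc/ais/openais.conf instead.

```shell
# Filter the totem communication settings out of openais.conf-style text.
# The sample below uses placeholder values and is inlined for illustration.
sample='totem {
    interface {
        bindnetaddr: 192.168.1.0
        mcastaddr: 226.94.1.1
        mcastport: 5405
    }
}'
printf '%s\n' "$sample" | awk '/bindnetaddr|mcastaddr|mcastport/ {print $1, $2}'
```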
Check if the communication channels and options configured in /etc/ais/openais.conf are the same for all cluster nodes. In case you use encrypted communication, check if the /etc/ais/authkey file is available on all cluster nodes.

[. . . ]

local resource manager (LRM)
The local resource manager (LRM) is responsible for performing operations on resources. The LRM is "dumb" in that it does not know of any policy by itself.

LSB init scripts are not limited to use in a high availability context. Any LSB-compliant Linux system uses LSB init scripts to control services.

[. . . ]