DRBD on Fedora 13
Alteeve Wiki :: How To :: DRBD on Fedora 13
NOTICE: This page has been merged into the Two Node Fedora 13 Cluster - Xen-Based Virtual Machine Host on DRBD+CLVM HowTo.
Warning: Until this warning is removed, do not use or trust this document. When complete and tested, this warning will be removed.
This article covers installing and configuring DRBD on a two-node Fedora 13 cluster.
Why DRBD?
DRBD is useful in small clusters as it provides real-time mirroring of data across two (or more) nodes. In two-node clusters, this can be used to host clustered LVM physical volumes. On these volumes you can create logical volumes to host GFS2 partitions, virtual machines, iSCSI and so forth.
A Map of the Cluster's Storage
Node1 Node2
_____ _____ _____ _____
| sda | | sdb | | sda | | sdb |
|_____| |_____| |_____| |_____|
|_______| |_______|
_______ ____|___ _______ _______ ____|___ _______
__|__ __|__ __|__ __|__ __|__ __|__ __|__ __|__
| md0 | | md1 | | md2 | | md3 | | md3 | | md2 | | md1 | | md0 |
|_____| |_____| |_____| |_____| |_____| |_____| |_____| |_____|
| | | | | | | |
___|___ _|_ ____|____ |___________| ____|____ _|_ ___|___
| /boot | | / | | <swap> | | | <swap> | | / | | /boot |
|_______| |___| |_________| ______|______ |_________| |___| |_______|
| /dev/drbd0 |
|_____________|
|
____|______
| clvm PV |
|___________|
|
_____|_____
| drbd_vg0 |
|___________|
|
_____|_____ ___...____
| | |
___|___ ___|___ ___|___
| lv_X | | lv_Y | | lv_N |
|_______| |_______| |_______|
Install
yum install drbd.x86_64 drbd-xen.x86_64 drbd-utils.x86_64
Compile the DRBD module for Xen dom0
If you are running the custom Xen dom0, you will probably need to build the DRBD module from the source RPM. You can try the following RPMs, but be aware that if the dom0 kernel gets updated, you will need to rebuild these RPMs using the following steps.
AN! Provided DRBD RPMs
- drbd-km-2.6.32.21_167.xendom0.fc12.x86_64-8.3.7-12.fc13.x86_64.rpm - DRBD kernel module for myoung's 2.6.32.21_167 dom0 kernel (897 KiB)
- drbd-km-debuginfo-8.3.7-12.fc13.x86_64.rpm - Debug info for DRBD kernel module for myoung's 2.6.32.21_167 dom0 kernel (3.2 KiB)
You can install the two above RPMs with this command:
rpm -ivh https://alteeve.com/files/an-cluster/drbd-km-2.6.32.21_167.xendom0.fc12.x86_64-8.3.7-12.fc13.x86_64.rpm https://alteeve.com/files/an-cluster/drbd-km-debuginfo-8.3.7-12.fc13.x86_64.rpm
Building RPMs From Source
If the above RPMs don't work, or if the dom0 kernel you are using differs in any way, please continue.
Install the build environment:
yum -y groupinstall "Development Libraries"
yum -y groupinstall "Development Tools"
Install the kernel headers and development library for the dom0 kernel:
Note: The following commands use --force to get past the fact that the headers for the newer 2.6.33 kernel are already installed, which makes RPM think these packages are too old and will conflict. Please proceed with caution.
rpm -ivh --force http://fedorapeople.org/~myoung/dom0/x86_64/kernel-headers-2.6.32.21-167.xendom0.fc12.x86_64.rpm http://fedorapeople.org/~myoung/dom0/x86_64/kernel-devel-2.6.32.21-167.xendom0.fc12.x86_64.rpm
Download, prepare, build and install the source RPM:
rpm -ivh http://fedora.mirror.iweb.ca/releases/13/Everything/source/SRPMS/drbd-8.3.7-2.fc13.src.rpm
cd /root/rpmbuild/SPECS/
rpmbuild -bp drbd.spec
cd /root/rpmbuild/BUILD/drbd-8.3.7/
./configure --enable-spec --with-km
cp /root/rpmbuild/BUILD/drbd-8.3.7/drbd-km.spec /root/rpmbuild/SPECS/
cd /root/rpmbuild/SPECS/
rpmbuild -ba drbd-km.spec
cd /root/rpmbuild/RPMS/x86_64
rpm -Uvh drbd-km-*
You should be good to go now!
Configure
Now that DRBD is installed, it is time to prepare the space and configure DRBD.
Allocating Raw Space
If you followed the setup steps provided in "Two Node Fedora 13 Cluster", you will have a set amount of unconfigured hard drive space. This is what we will use for the DRBD space on either node. If you have a different setup, you will need to allocate some raw space and adjust the following steps to match your configuration.
Creating a RAID level 1 'md' Device
This assumes that you have two raw drives, /dev/sda and /dev/sdb. It further assumes that you've created three partitions which have been assigned to three existing /dev/mdX devices. With these assumptions, we will create /dev/sda4 and /dev/sdb4 and, using them, create a new /dev/md3 device that will host the DRBD partition.
If you do not have two drives, you can stop after creating the new partition. If you have multiple drives and plan to use a different RAID level, please adjust the following commands accordingly.
Creating The New Partitions
Warning: The next steps will have you directly accessing your server's hard drive configuration. Please do not proceed on a live server until you've had a chance to work through these steps on a test server. One mistake can blow away all your data.
Start the fdisk shell
fdisk /dev/sda
WARNING: DOS-compatible mode is deprecated. It's strongly recommended to
switch off the mode (command 'c') and change display units to
sectors (command 'u').
Command (m for help):
View the current configuration with the print option
p
Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000c6fe1
Device Boot Start End Blocks Id System
/dev/sda1 1 5100 40960000 fd Linux raid autodetect
/dev/sda2 5100 5622 4194304 fd Linux raid autodetect
/dev/sda3 * 5622 5654 256000 fd Linux raid autodetect
Command (m for help):
Now we know for sure that the next free partition number is 4. We will now create the new partition.
n
Command action
e extended
p primary partition (1-4)
We will make it a primary partition
p
Selected partition 4
First cylinder (5654-60801, default 5654):
Then we simply hit <enter> to select the default starting block.
<enter>
Using default value 5654
Last cylinder, +cylinders or +size{K,M,G} (5654-60801, default 60801):
Once again we will press <enter> to select the default ending block.
<enter>
Using default value 60801
Command (m for help):
Now we need to change the type of partition that it is.
t
Partition number (1-4):
We know that we are modifying partition number 4.
4
Hex code (type L to list codes):
Now we need to set the hex code for the partition type to set. We want to set fd, which defines Linux raid autodetect.
fd
Changed system type of partition 4 to fd (Linux raid autodetect)
Now check that everything went as expected by once again printing the partition table.
p
Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000c6fe1
Device Boot Start End Blocks Id System
/dev/sda1 1 5100 40960000 fd Linux raid autodetect
/dev/sda2 5100 5622 4194304 fd Linux raid autodetect
/dev/sda3 * 5622 5654 256000 fd Linux raid autodetect
/dev/sda4 5654 60801 442972704+ fd Linux raid autodetect
Command (m for help):
There it is. So finally, we need to write the changes to the disk.
w
The partition table has been altered!
Calling ioctl() to re-read partition table.
WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table. The new table will be used at
the next reboot or after you run partprobe(8) or kpartx(8)
Syncing disks.
If you see the above message, do not reboot yet; finish setting up both drives first so that a single reboot covers them both.
Repeat these steps for the second drive, /dev/sdb, and then reboot if needed.
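After partitioning both drives, it is worth confirming they ended up with identical layouts before building the array on them. One way is to compare sfdisk dumps with the device names stripped out. This is only a sketch; same_ptable is a hypothetical helper, not a standard tool:

```shell
# Hypothetical helper: compare two `sfdisk -d` dumps with device names
# normalized, so /dev/sda and /dev/sdb can be checked for an identical
# partition layout.
same_ptable() {
    # $1 and $2 are files holding `sfdisk -d /dev/sdX` output
    sed 's|/dev/sd[a-z]*|DISK|g' "$1" > /tmp/ptable.a.$$
    sed 's|/dev/sd[a-z]*|DISK|g' "$2" > /tmp/ptable.b.$$
    cmp -s /tmp/ptable.a.$$ /tmp/ptable.b.$$
    rc=$?
    rm -f /tmp/ptable.a.$$ /tmp/ptable.b.$$
    return $rc
}

# Usage on a real system:
#   sfdisk -d /dev/sda > /tmp/sda.dump
#   sfdisk -d /dev/sdb > /tmp/sdb.dump
#   same_ptable /tmp/sda.dump /tmp/sdb.dump && echo "layouts match"
```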
Creating The New /dev/mdX Device
If you only have one drive, skip this step.
Now we need to use mdadm to create the new RAID level 1 device. This will be used as the device that DRBD will directly access.
mdadm --create /dev/md3 --homehost=localhost.localdomain --raid-devices=2 --level=1 /dev/sda4 /dev/sdb4
mdadm: Note: this array has metadata at the start and
may not be suitable as a boot device. If you plan to
store '/boot' on this device please ensure that
your boot-loader understands md/v1.x metadata, or use
--metadata=0.90
Seeing as /boot doesn't exist on this device, we can safely ignore this warning.
y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md3 started.
You can now cat /proc/mdstat to verify that the array is indeed building. If you're interested, you could open a new terminal window and use watch cat /proc/mdstat to watch the array build.
cat /proc/mdstat
md3 : active raid1 sdb4[1] sda4[0]
442971544 blocks super 1.2 [2/2] [UU]
[>....................] resync = 0.8% (3678976/442971544) finish=111.0min speed=65920K/sec
md2 : active raid1 sda2[0] sdb2[1]
4193272 blocks super 1.1 [2/2] [UU]
md1 : active raid1 sda1[0] sdb1[1]
40958908 blocks super 1.1 [2/2] [UU]
bitmap: 1/1 pages [4KB], 65536KB chunk
md0 : active raid1 sda3[0] sdb3[1]
255988 blocks super 1.0 [2/2] [UU]
unused devices: <none>
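If you'd rather check resync progress from a script than eyeball watch cat /proc/mdstat, a small helper can pull the percentage out for a given array. This is a hypothetical sketch, written so it can be pointed at a saved copy of /proc/mdstat for testing:

```shell
# Hypothetical helper: print the resync progress (e.g. "0.8%") of a
# given md device from /proc/mdstat, or from a saved copy of it.
md_resync_pct() {
    # $1 = md device name (e.g. md3), $2 = optional path to an mdstat file
    awk -v dev="$1" '
        $1 == dev { in_dev = 1; next }        # found our array stanza
        in_dev && /resync/ {                  # progress line inside it
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) { print $i; exit }
        }
        in_dev && $1 ~ /^md/ { in_dev = 0 }   # next array stanza begins
    ' "${2:-/proc/mdstat}"
}

# Usage on a real system:
#   md_resync_pct md3
```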
Finally, we need to make sure that the new array will start when the system boots. To do this, we'll again use mdadm, but with different options that will have it output data in a format suitable for the /etc/mdadm.conf file. We'll redirect this output to that config file, thus updating it.
mdadm --detail --scan | grep md3 >> /etc/mdadm.conf
cat /etc/mdadm.conf
# mdadm.conf written out by anaconda
MAILADDR root
AUTO +imsm +1.x -all
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=b58df6d0:d925e7bb:c156168d:47c01718
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=ac2cf39c:77cd0314:fedb8407:9b945bb5
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=4e513936:4a966f4e:0dd8402e:6403d10d
ARRAY /dev/md3 metadata=1.2 name=localhost.localdomain:3 UUID=f0b6d0c1:490d47e7:91c7e63a:f8dacc21
You'll note that the last line, which we just added, is different from the previous lines. This isn't a concern, but you are welcome to re-write it to match the existing format if you wish.
Before you proceed, it is strongly advised that you reboot each node and then verify that the new array did in fact start with the system. You do not need to wait for the sync to finish before rebooting. It will pick up where you left off once rebooted.
Configuration Files
DRBD uses a global configuration file, /etc/drbd.d/global_common.conf, and one or more resource files. The resource files need to be created in the /etc/drbd.d/ directory and must have the suffix .res. For this example, we will create a single resource called r0 which we will configure in /etc/drbd.d/r0.res.
/etc/drbd.d/global_common.conf
The stock /etc/drbd.d/global_common.conf is sane, so we won't bother altering it here.
Full details on all of the drbd.conf configuration file directives and arguments can be found in the drbd.conf(5) man page. Note that much of the older documentation does not show this newer split-file configuration format.
/etc/drbd.d/r0.res
This is the important part. This defines the resource to use, and must reflect the IP addresses and storage devices that DRBD will use for this resource.
vim /etc/drbd.d/r0.res
# This is the name of the resource and its settings. Generally, 'r0' is used
# as the name of the first resource. This is by convention only, though.
resource r0
{
# This tells DRBD where to make the new resource available at on each
# node. This is, again, by convention only.
device /dev/drbd0;
# The main argument here tells DRBD that we will have proper locking
# and fencing, and as such, to allow both nodes to set the resource to
# 'primary' simultaneously.
net
{
allow-two-primaries;
}
# This tells DRBD to automatically set both nodes to 'primary' when the
# nodes start.
startup
{
become-primary-on both;
}
# This tells DRBD to look for and store its meta-data on the resource
# itself.
meta-disk internal;
# The name below must match the output from `uname -n` on each node.
on an-node01.alteeve.com
{
# This must be the IP address of the interface on the storage
# network (an-node01.sn, in this case).
address 10.0.0.71:7789;
# This is the underlying partition to use for this resource on
# this node.
disk /dev/md3;
}
# Repeat as above, but for the other node.
on an-node02.alteeve.com
{
address 10.0.0.72:7789;
disk /dev/md3;
}
}
This file must be copied to BOTH nodes and must match before you proceed.
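One way to confirm the files really do match before proceeding is to checksum them on both nodes. The following is only a sketch; files_match is a hypothetical helper, and an-node02 is the hostname from the example resource file above:

```shell
# Hypothetical helper: compare two files by checksum; returns 0 (true)
# when their contents are identical.
files_match() {
    [ "$(md5sum < "$1")" = "$(md5sum < "$2")" ]
}

# Usage on a real node (hostname from the example above):
#   scp root@an-node02:/etc/drbd.d/r0.res /tmp/r0.res.peer
#   files_match /etc/drbd.d/r0.res /tmp/r0.res.peer && echo "configs match"
```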
Starting The DRBD Resource
For the rest of this section, pay attention to whether you see
- Node1
- Node2
- Both
These indicate which node to run the following commands on. There is no functional difference between the nodes, so simply choose one to be Node1 and the other to be Node2. Once you've chosen which is which, be consistent about which node you run the commands on. Of course, if a command block is preceded by Both, run it on both nodes.
Initialize The Block Device
Node1
This step creates the DRBD meta-data on the new DRBD device. It is only needed when creating new DRBD partitions.
drbdadm create-md r0
Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created.
success
Monitoring Progress
Both
I find it very useful to monitor DRBD while running the rest of the setup. To do this, open a second terminal on each node and use watch to keep an eye on /proc/drbd. This way you will be able to monitor the progress of the array in near-real time.
Both
watch cat /proc/drbd
At this stage, it should look like this:
version: 8.3.7 (api:88/proto:86-91)
GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by root@xenmaster002.iplink.net, 2010-09-07 16:02:46
0: cs:Unconfigured
Starting the Resource
Both
This will attach the backing device, /dev/md3 in our case, and then start the new resource r0.
drbdadm up r0
There will be no output at the command line. If you are watching /proc/drbd though, you should now see something like this:
version: 8.3.7 (api:88/proto:86-91)
GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by root@xenmaster002.iplink.net, 2010-09-07 16:02:46
0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r----
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:442957988
That it is Secondary/Secondary and Inconsistent/Inconsistent is expected.
Setting the First Primary Node
Node1
As this is a totally new resource, DRBD doesn't know which side of the array is "more valid" than the other. In reality, neither is as there was no existing data of note on either node. This means that we now need to choose a node and tell DRBD to treat it as the "source" node. This step will also tell DRBD to make the "source" node primary. Once set, DRBD will begin sync'ing in the background.
drbdadm -- --overwrite-data-of-peer primary r0
As before, there will be no output at the command line, but /proc/drbd will change to show the following:
GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by root@xenmaster002.iplink.net, 2010-09-07 16:02:46
0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r----
ns:69024 nr:0 dw:0 dr:69232 al:0 bm:4 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:442888964
[>....................] sync'ed: 0.1% (432508/432576)M
finish: 307:33:42 speed: 320 (320) K/sec
If you're watching the secondary node, the /proc/drbd will show ro:Secondary/Primary ds:Inconsistent/UpToDate. This is, as you can guess, simply a reflection of it being the "over-written" node.
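If you want to check /proc/drbd from a script rather than watch it, a small parser can pull out the cs:, ro: and ds: fields for a given minor number. This is a hypothetical sketch, written so it can also be pointed at a saved copy of the file:

```shell
# Hypothetical helper: print the connection state (cs), roles (ro) and
# disk states (ds) for one DRBD minor, from /proc/drbd or a saved copy.
drbd_state() {
    # $1 = minor number (e.g. 0), $2 = optional path to a status file
    awk -v minor="$1:" '
        $1 == minor {
            out = ""
            for (i = 2; i <= NF; i++)
                if ($i ~ /^(cs|ro|ds):/)
                    out = out (out == "" ? "" : " ") $i
            print out
        }
    ' "${2:-/proc/drbd}"
}

# Usage on a real node:
#   drbd_state 0
```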
Setting the Second Node to Primary
Node2
The last step to complete the array is to tell the second node to also become primary.
drbdadm primary r0
As with many drbdadm commands, nothing will be printed to the console. If you're watching the /proc/drbd though, you should see something like Primary/Primary ds:UpToDate/Inconsistent. The Inconsistent flag will remain until the sync is complete.
A Note On sync Speed
You will notice in the previous step that the sync speed seems awfully slow at 320 (320) K/sec.
This is not a problem!
As actual data is written to either side of the array, that data is immediately copied to both nodes, so both nodes always hold up-to-date copies of the real data. Given this, the syncer rate is intentionally set low so as not to put too much load on the underlying disks, which could cause slowdowns. If you still wish to increase the sync speed, you can do so with the following command.
drbdsetup /dev/drbd0 syncer -r 100M
The speed-up will not be instant. It will take a little while for the speed to pick up. Once the sync is finished, it is a good idea to revert to the default sync rate.
drbdadm syncer r0
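To get a feel for the numbers, the estimated sync time is simply the oos (out-of-sync) figure shown in /proc/drbd divided by the sync rate, both counted in KiB. A back-of-the-envelope helper (hypothetical, not a DRBD tool):

```shell
# Hypothetical helper: estimate sync time in whole seconds.
sync_eta() {
    # $1 = oos in KiB (from /proc/drbd), $2 = sync rate in KiB/sec
    # (rate must be non-zero)
    echo $(( $1 / $2 ))
}

# e.g. the full 442888964 KiB device at the raised 100M rate
# (102400 KiB/sec) works out to 4325 seconds, a bit over an hour:
#   sync_eta 442888964 102400
```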
Making DRBD Start At Boot
This is simply a matter of using chkconfig.
chkconfig drbd on
ls -lah /etc/rc3.d/ | grep drbd
This should output something like:
lrwxrwxrwx. 1 root root 14 Sep 8 17:53 S70drbd -> ../init.d/drbd
Setting up CLVM
The goal of DRBD in this cluster is to provide clustered LVM, referred to as CLVM, to the nodes. This is done by turning the DRBD partition into a clustered LVM physical volume.
So now we will create a PV on top of the new DRBD partition, /dev/drbd0, that we created in the previous step. Since this new LVM PV will exist on top of the shared DRBD partition, whatever gets written to its logical volumes will be immediately available on either node, regardless of which node actually initiated the write.
This capability is the underlying reason for creating this cluster; neither machine is a single point of failure, so if one machine dies, anything on top of the DRBD partition will still be available. When the failed machine returns, the surviving node will have a list of which blocks changed while the other node was gone, and can use this list to quickly re-sync the other server.
Making LVM Cluster-Aware
Normally, LVM is run on a single server. This means that at any time, LVM can write data to the underlying drive without needing to worry that any other device might change anything. In clusters, this isn't the case. The other node could try to write to the shared storage at any time, so the nodes need to enable "locking" to prevent the two of them from trying to work on the same bit of data at the same time.
The process of enabling this locking is known as making LVM "cluster-aware".
LVM has a tool called lvmconf that can be used to enable LVM locking. It is provided as part of the lvm2-cluster package.
yum install lvm2-cluster.x86_64
Now, to enable cluster awareness in LVM, run the following command.
lvmconf --enable-cluster
Enabling Cluster Locking
By default, clvmd, the clustered LVM daemon, is stopped and not set to run on boot. Now that we've enabled LVM locking, we need to start it:
/etc/init.d/clvmd status
clvmd is stopped
active volumes: lv_drbd lv_root lv_swap
As expected, it is stopped, so let's start it:
/etc/init.d/clvmd start
Stopping clvm: [ OK ]
Starting clvmd: [ OK ]
Activating VGs: 3 logical volume(s) in volume group "an-lvm01" now active
[ OK ]
Creating a new PV using the DRBD Partition
We can now proceed with setting up the new DRBD-based LVM physical volume. Once the PV is created, we can create a new volume group and start allocating space to logical volumes.
Note: As we will be using our DRBD device, and as it is a shared block device, most of the following commands only need to be run on one node. Once the block device changes in any way, those changes will near-instantly appear on the other node. For this reason, unless explicitly stated to do so, only run the following commands on one node.
To setup the DRBD partition as an LVM PV, run pvcreate:
pvcreate /dev/drbd0
Physical volume "/dev/drbd0" successfully created
Now, on both nodes, check that the new physical volume is visible by using pvdisplay:
pvdisplay
--- Physical volume ---
PV Name /dev/md1
VG Name vg_01
PV Size 465.52 GiB / not usable 15.87 MiB
Allocatable yes
PE Size 32.00 MiB
Total PE 14896
Free PE 782
Allocated PE 14114
PV UUID BuR5uh-R74O-kACb-S1YK-MHxd-9O69-yo1EKW
"/dev/drbd0" is a new physical volume of "399.99 GiB"
--- NEW Physical volume ---
PV Name /dev/drbd0
VG Name
PV Size 399.99 GiB
Allocatable NO
PE Size 0
Total PE 0
Free PE 0
Allocated PE 0
PV UUID LYOE1B-22fk-LfOn-pu9v-9lhG-g8vx-cjBnsY
If you see PV Name /dev/drbd0 on both nodes, then your DRBD setup and LVM configuration changes are working perfectly!
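If you'd like to script that check rather than read pvdisplay by eye, a small helper can look for the PV in captured pvdisplay output. This is a hypothetical sketch; on a live node you could equally run pvs --noheadings -o pv_name and grep for drbd0:

```shell
# Hypothetical helper: return 0 (true) when a given device appears as a
# "PV Name" in a file of `pvdisplay` output.
pv_present() {
    # $1 = PV device path (e.g. /dev/drbd0), $2 = file of pvdisplay output
    grep -q "PV Name[[:space:]]*$1\$" "$2"
}

# Usage on a real node:
#   pvdisplay > /tmp/pvs.txt
#   pv_present /dev/drbd0 /tmp/pvs.txt && echo "DRBD PV visible"
```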
Creating a VG on the new PV
Now we need to create the volume group using the vgcreate command:
vgcreate -c y drbd_vg0 /dev/drbd0
Clustered volume group "drbd_vg0" successfully created
Now we'll check that the new VG is visible on both nodes using vgdisplay:
vgdisplay
--- Volume group ---
VG Name vg_01
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 6
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 3
Open LV 3
Max PV 0
Cur PV 1
Act PV 1
VG Size 465.50 GiB
PE Size 32.00 MiB
Total PE 14896
Alloc PE / Size 14114 / 441.06 GiB
Free PE / Size 782 / 24.44 GiB
VG UUID YbHSKn-x64P-oEbe-8R0S-3PjZ-UNiR-gdEh6T
--- Volume group ---
VG Name drbd_vg0
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 1
VG Access read/write
VG Status resizable
Clustered yes
Shared no
MAX LV 0
Cur LV 0
Open LV 0
Max PV 0
Cur PV 1
Act PV 1
VG Size 399.98 GiB
PE Size 4.00 MiB
Total PE 102396
Alloc PE / Size 0 / 0
Free PE / Size 102396 / 399.98 GiB
VG UUID NK00Or-t9Z7-9YHz-sDC8-VvBT-NPeg-glfLwy
If the new VG is visible on both nodes, we are ready to create our first logical volume using the lvcreate tool.
Creating the First LV on the new VG
Now we'll create a simple 20 GiB logical volume. We will use it as a shared GFS2 store for source ISOs (and Xen domU configuration files) later on.
lvcreate -L 20G -n iso_store drbd_vg0
Logical volume "iso_store" created
As before, we will check that the new logical volume is visible from both nodes by using the lvdisplay command:
lvdisplay
--- Logical volume ---
LV Name /dev/vg_01/lv_root
VG Name vg_01
LV UUID dl6jxD-asN7-bGYL-H4yO-op6q-Nt6y-RxkPnt
LV Write Access read/write
LV Status available
# open 1
LV Size 39.06 GiB
Current LE 1250
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:0
--- Logical volume ---
LV Name /dev/vg_01/lv_swap
VG Name vg_01
LV UUID VL3G06-Ob0o-sEB9-qNX3-rIAJ-nzW5-Auf64W
LV Write Access read/write
LV Status available
# open 1
LV Size 2.00 GiB
Current LE 64
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:1
--- Logical volume ---
LV Name /dev/vg_01/lv_drbd
VG Name vg_01
LV UUID SRT3N5-kA84-I3Be-LI20-253s-qTGT-fuFPfr
LV Write Access read/write
LV Status available
# open 2
LV Size 400.00 GiB
Current LE 12800
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:2
--- Logical volume ---
LV Name /dev/drbd_vg0/iso_store
VG Name drbd_vg0
LV UUID H0M5fL-Wxb6-o8cb-Wb30-Rla3-fwzp-tzdR62
LV Write Access read/write
LV Status available
# open 0
LV Size 20.00 GiB
Current LE 5120
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:3
The last one is the new logical volume.
Creating a GFS2 Partition
GFS2 is a cluster-aware file system that can be simultaneously mounted on two or more nodes at once. We will use it as a place to store the ISOs that we'll use to provision our virtual machines.
Start by installing the GFS2 tools:
yum install gfs2-utils.x86_64
Modify the gfs2 init script to start after clvmd, and the xendomains script to start after gfs2 (the method is shown under "Altering Start/Stop Orders" below). Then use chkconfig to reconfigure the boot order:
chkconfig xend off; chkconfig cman off; chkconfig drbd off; chkconfig clvmd off; chkconfig xendomains off; chkconfig gfs2 off
chkconfig xend on; chkconfig cman on; chkconfig drbd on; chkconfig clvmd on; chkconfig xendomains on; chkconfig gfs2 on
The following example is designed for the cluster used in this paper.
- If you have more than 2 nodes, increase the -j 2 to the number of nodes you want to mount this file system on.
- If your cluster is named something other than an-cluster (as set in the cluster.conf file), change -t an-cluster:iso_store to match your cluster's name. The iso_store part can be whatever you like, but it must be unique in the cluster. I tend to use a name that matches the LV name, but this is my own preference and is not required.
To format the partition run:
mkfs.gfs2 -p lock_dlm -j 2 -t an-cluster:iso_store /dev/drbd_vg0/iso_store
If you are prompted, press y to proceed.
Once the format completes, you can mount /dev/drbd_vg0/iso_store as you would a normal file system.
Both:
To complete the example, lets mount the GFS2 partition we made just now on /shared.
mkdir /shared
mount /dev/drbd_vg0/iso_store /shared
Done!
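If you want /shared mounted automatically at boot, the gfs2 init script enabled above mounts gfs2 entries it finds in /etc/fstab when it starts (and unmounts them on stop). A sketch of such an entry, assuming the device and mount point used in this example:

```
/dev/drbd_vg0/iso_store   /shared   gfs2   defaults,noatime   0 0
```

The noatime option is my own addition here; it is optional, but commonly used on shared file systems to cut down on lock and write traffic caused by access-time updates.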
Growing a GFS2 Partition
To grow a GFS2 partition, you must know where it is mounted. You cannot grow an unmounted GFS2 partition, as odd as that may seem at first. Also, you only need to run the grow commands from one node. Once completed, all nodes will see and use the new free space automatically.
This requires two steps to complete:
- Extend the underlying LVM logical volume
- Grow the actual GFS2 partition
Extend the LVM LV
To keep things simple, we'll just use some of the free space we left on our /dev/drbd0 LVM physical volume. If you need to add more storage to your LVM first, please follow the instructions in the article: "Adding Space to an LVM" before proceeding.
Let's add 50 GiB to our GFS2 logical volume /dev/drbd_vg0/iso_store from the /dev/drbd0 physical volume, which we know is available because we left more than that free when we first set up our LVM. To actually add the space, we need to use the lvextend command:
lvextend -L +50G /dev/drbd_vg0/iso_store /dev/drbd0
Which should return:
Extending logical volume iso_store to 70.00 GB
Logical volume iso_store successfully resized
If we run lvdisplay /dev/drbd_vg0/iso_store now, we should see the extra space.
--- Logical volume ---
LV Name /dev/drbd_vg0/iso_store
VG Name drbd_vg0
LV UUID svJx35-KDXK-ojD2-UDAA-Ah9t-UgUl-ijekhf
LV Write Access read/write
LV Status available
# open 1
LV Size 70.00 GB
Current LE 17920
Segments 2
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:3
You're now ready to proceed.
Grow The GFS2 Partition
This step is pretty simple, but you need to enter the commands exactly. Also, you'll want to do a dry-run first and address any resulting errors before issuing the final gfs2_grow command.
To get the exact name to use when calling gfs2_grow, run the following command:
gfs2_tool df
/shared:
SB lock proto = "lock_dlm"
SB lock table = "an-cluster:iso_store"
SB ondisk format = 1801
SB multihost format = 1900
Block size = 4096
Journals = 2
Resource Groups = 80
Mounted lock proto = "lock_dlm"
Mounted lock table = "an-cluster:iso_store"
Mounted host data = "jid=1:id=196610:first=0"
Journal number = 1
Lock module flags = 0
Local flocks = FALSE
Local caching = FALSE
Type Total Blocks Used Blocks Free Blocks use%
------------------------------------------------------------------------
data 5242304 1773818 3468486 34%
inodes 3468580 94 3468486 0%
From this output, we know that GFS2 expects the name /shared. Even something as simple as adding a trailing slash will not work. The program we will use is gfs2_grow, with the -T switch to do a dry run and work out possible errors.
For example, if you added the trailing slash, this is the kind of error you would see:
Bad command:
gfs2_grow -T /shared/
GFS Filesystem /shared/ not found
Once we get it right, it will look like this:
gfs2_grow -T /shared
(Test mode--File system will not be changed)
FS: Mount Point: /shared
FS: Device: /dev/mapper/drbd_vg0-iso_store
FS: Size: 5242878 (0x4ffffe)
FS: RG size: 65535 (0xffff)
DEV: Size: 18350080 (0x1180000)
The file system grew by 51200MB.
gfs2_grow complete.
This looks good! We're now ready to re-run the command without the -T switch:
gfs2_grow /shared
FS: Mount Point: /shared
FS: Device: /dev/mapper/drbd_vg0-iso_store
FS: Size: 5242878 (0x4ffffe)
FS: RG size: 65535 (0xffff)
DEV: Size: 18350080 (0x1180000)
The file system grew by 51200MB.
gfs2_grow complete.
You can check that the new space is available on both nodes now using a simple call like df -h.
Altering Start/Stop Orders
Given DRBD's role in the cluster, we need to make sure that cman starts first, then drbd, and finally clvmd. The shutdown order needs to be the reverse.
This is because cman provides DLM and fencing. The drbd daemon relies on this fencing, and clvmd requires the distributed locking. Further, drbd needs to start before clvmd so that when clvmd goes looking for its PV, it can find it.
To make sure the start order is sane then, we'll edit each of the three daemon's init scripts and alter their Required-Start and Required-Stop lines, then make the changes take effect by using chkconfig to remove and re-add them to the start levels.
Altering cman
We will tell cman not to stop until drbd has stopped.
vim /etc/init.d/cman
#!/bin/bash
#
# cman - Cluster Manager init script
#
# chkconfig: - 21 79
# description: Starts and stops cman
#
#
### BEGIN INIT INFO
# Provides: cman
# Required-Start: $network $time
# Required-Stop: $network $time drbd
# Default-Start:
# Default-Stop:
# Short-Description: Starts and stops cman
# Description: Starts and stops the Cluster Manager set of daemons
### END INIT INFO
Altering drbd
Now we will tell drbd to start after cman and to not stop until clvmd has stopped.
This requires the additional step of altering the chkconfig: - 70 08 line to instead read chkconfig: - 20 08. This isn't strictly needed, but it gives chkconfig more room to order the dependent daemons by allowing DRBD to start as early as position 20, rather than waiting until position 70. This is more compatible with cman and clvmd, which normally start at positions 21 and 24, respectively.
vim /etc/init.d/drbd
#!/bin/bash
#
# chkconfig: - 20 08
# description: Loads and unloads the drbd module
#
# Copyright 2001-2008 LINBIT Information Technologies
# Philipp Reisner, Lars Ellenberg
#
### BEGIN INIT INFO
# Provides: drbd
# Required-Start: $local_fs $network $syslog cman
# Required-Stop: $local_fs $network $syslog clvmd
# Should-Start: sshd multipathd
# Should-Stop: sshd multipathd
# Default-Start:
# Default-Stop:
# Short-Description: Control drbd resources.
### END INIT INFO
Altering clvmd
Lastly, we will now tell clvmd to start after drbd.
vim /etc/init.d/clvmd
### BEGIN INIT INFO
# Provides: clvmd
# Required-Start: $local_fs drbd
# Required-Stop: $local_fs
# Default-Start:
# Default-Stop: 0 1 6
# Short-Description: Clustered LVM Daemon
### END INIT INFO
Applying The Changes
Change the start order by removing and re-adding all cluster-related daemons using chkconfig.
chkconfig drbd off; chkconfig cman off; chkconfig clvmd off; chkconfig drbd on; chkconfig cman on; chkconfig clvmd on
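After re-adding the daemons, you can sanity-check the resulting order by looking at the S-numbers of the symlinks in the runlevel directory: cman should come before drbd, and drbd before clvmd. A hypothetical helper to do the comparison:

```shell
# Hypothetical sanity check: verify that the SNN start symlinks in a
# runlevel directory order cman before drbd before clvmd.
rc_order_ok() {
    # $1 = directory holding the SNNname symlinks (e.g. /etc/rc3.d)
    cman=$(ls "$1"  | sed -n 's/^S\([0-9]*\)cman$/\1/p')
    drbd=$(ls "$1"  | sed -n 's/^S\([0-9]*\)drbd$/\1/p')
    clvmd=$(ls "$1" | sed -n 's/^S\([0-9]*\)clvmd$/\1/p')
    [ -n "$cman" ] && [ -n "$drbd" ] && [ -n "$clvmd" ] &&
        [ "$cman" -lt "$drbd" ] && [ "$drbd" -lt "$clvmd" ]
}

# Usage: rc_order_ok /etc/rc3.d && echo "start order looks sane"
```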
Any questions, feedback, advice, complaints or meanderings are welcome.