Hard drive has gone bad in DRBD
So you've lost or are losing a hard drive in one of the cluster nodes.
Steps needed:
- Identify the failed drive. This example will use /dev/sda on Node1.
- Migrate the hosted VMs to the good node. This document will migrate from Node1 to Node2.
- Break the RAID 1 mirrors by removing the defective drive's partitions from the affected MD devices. Here we will remove the /dev/sda partitions from /dev/md0 through /dev/md3.
- Power off the defective server, physically replace the defective drive and power the repaired server back on.
- Add the replaced /dev/sda partitions back into the arrays and begin the RAID 1 rebuild procedure.
- Migrate the virtual servers back onto the repaired server.
Identifying the Failed Drive
SMART Control
If it's not clear which drive has failed, check the drives' states using smartctl. For each questionable drive, run:
smartctl -a /dev/sda
Replace sda with the drive you want to examine. You should see output like:
Good Drive
Good Drive (/dev/sdb):
smartctl version 5.38 [x86_64-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF INFORMATION SECTION ===
Device Model: ST31500341AS
Serial Number: 9VS1XL54
Firmware Version: CC1H
User Capacity: 1,500,301,910,016 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Wed Feb 3 12:35:54 2010 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 617) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 255) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x103f) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 113 099 006 Pre-fail Always - 55840141
3 Spin_Up_Time 0x0003 100 099 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 101
5 Reallocated_Sector_Ct 0x0033 097 097 036 Pre-fail Always - 151
7 Seek_Error_Rate 0x000f 079 060 030 Pre-fail Always - 89466942
9 Power_On_Hours 0x0032 096 096 000 Old_age Always - 3584
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 102
184 Unknown_Attribute 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Unknown_Attribute 0x0032 100 093 000 Old_age Always - 71
189 High_Fly_Writes 0x003a 066 066 000 Old_age Always - 34
190 Airflow_Temperature_Cel 0x0022 069 052 045 Old_age Always - 31 (Lifetime Min/Max 23/48)
194 Temperature_Celsius 0x0022 031 048 000 Old_age Always - 31 (0 20 0 0)
195 Hardware_ECC_Recovered 0x001a 032 024 000 Old_age Always - 55840141
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 51737176051199
241 Unknown_Attribute 0x0000 100 253 000 Old_age Offline - 1591700793
242 Unknown_Attribute 0x0000 100 253 000 Old_age Offline - 95914747
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 3 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Bad Drive Output
Bad Drive (/dev/sda):
smartctl version 5.38 [x86_64-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF INFORMATION SECTION ===
Device Model: ST31500341AS
Serial Number: 9VS1Q4Q3
Firmware Version: CC1H
User Capacity: 1,500,301,910,016 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Wed Feb 3 12:37:38 2010 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 609) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 255) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x103f) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 111 099 006 Pre-fail Always - 229324280
3 Spin_Up_Time 0x0003 100 099 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 95
5 Reallocated_Sector_Ct 0x0033 096 096 036 Pre-fail Always - 191
7 Seek_Error_Rate 0x000f 065 058 030 Pre-fail Always - 30092885574
9 Power_On_Hours 0x0032 096 096 000 Old_age Always - 3541
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 95
184 Unknown_Attribute 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 076 076 000 Old_age Always - 24
188 Unknown_Attribute 0x0032 100 097 000 Old_age Always - 4295032890
189 High_Fly_Writes 0x003a 001 001 000 Old_age Always - 241
190 Airflow_Temperature_Cel 0x0022 067 054 045 Old_age Always - 33 (Lifetime Min/Max 23/46)
194 Temperature_Celsius 0x0022 033 046 000 Old_age Always - 33 (0 21 0 0)
195 Hardware_ECC_Recovered 0x001a 026 026 000 Old_age Always - 229324280
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 5
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 5
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 6
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 229084965637588
241 Unknown_Attribute 0x0000 100 253 000 Old_age Offline - 2647572671
242 Unknown_Attribute 0x0000 100 253 000 Old_age Offline - 3166798893
SMART Error Log Version: 1
ATA Error Count: 24 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 24 occurred at disk power-on lifetime: 1785 hours (74 days + 9 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 00 ff ff ff ef 00 19d+13:50:04.321 READ DMA EXT
27 00 00 00 00 00 e0 00 19d+13:50:04.291 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 02 19d+13:50:04.283 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 02 19d+13:50:04.238 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 e0 00 19d+13:50:04.115 READ NATIVE MAX ADDRESS EXT
Error 23 occurred at disk power-on lifetime: 1785 hours (74 days + 9 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 00 ff ff ff ef 00 19d+13:50:01.385 READ DMA EXT
27 00 00 00 00 00 e0 00 19d+13:50:01.355 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 02 19d+13:50:01.347 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 02 19d+13:50:01.325 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 e0 00 19d+13:50:01.275 READ NATIVE MAX ADDRESS EXT
Error 22 occurred at disk power-on lifetime: 1785 hours (74 days + 9 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 00 ff ff ff ef 00 19d+13:49:58.449 READ DMA EXT
27 00 00 00 00 00 e0 00 19d+13:49:58.419 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 02 19d+13:49:58.411 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 02 19d+13:49:58.366 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 e0 00 19d+13:49:58.247 READ NATIVE MAX ADDRESS EXT
Error 21 occurred at disk power-on lifetime: 1785 hours (74 days + 9 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 00 ff ff ff ef 00 19d+13:49:55.488 READ DMA EXT
27 00 00 00 00 00 e0 00 19d+13:49:55.459 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 02 19d+13:49:55.451 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 02 19d+13:49:55.428 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 e0 00 19d+13:49:55.379 READ NATIVE MAX ADDRESS EXT
Error 20 occurred at disk power-on lifetime: 1785 hours (74 days + 9 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 00 ff ff ff ef 00 19d+13:49:52.540 READ DMA EXT
27 00 00 00 00 00 e0 00 19d+13:49:52.511 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 02 19d+13:49:52.488 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 02 19d+13:49:52.463 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 e0 00 19d+13:49:52.420 READ NATIVE MAX ADDRESS EXT
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
You will notice in the above example that the drive's SMART status is OK, but that it has generated errors. These errors degraded performance enough to trigger a fence against the affected server.
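As a hedged aside, when checking several drives quickly, smartctl can print just the overall health assessment. As the example above shows, though, a PASSED verdict alone does not prove a drive is healthy, so always review the error log and attributes as well.
smartctl -H /dev/sda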
Locating the Physical Drive
Once you know the logical block device path, /dev/sda here, you will need to locate its physical position in the server. To do this, reference the docs for the affected server. This mapping should have been recorded when the node was built.
If it wasn't, first go kick the admin in the shins. Next, you will need to guess which drive is which. We can make an educated guess, though, because the above output includes the Serial Number (9VS1Q4Q3 above). In fact, reference the Serial Number anyway, in case the OS has changed the device names at some point.
- Node2
- Node1
Using the serial number and the docs in Node1, we know that the affected drive is in Tray 1.
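If the tray mapping was never recorded, the following loop is one rough way to match serial numbers to device names before pulling a drive. This is only a sketch; the /dev/sd[ab] pattern assumes a two-drive node like the ones in this example.
for d in /dev/sd[ab]; do
    echo -n "$d: "
    smartctl -i $d | grep -i 'serial number'
done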
Migrate the VMs off of the Affected Node
Note: Re-write this to use clusvcadm.
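Until that re-write happens, the rough clusvcadm equivalent of the migration described below would look something like this. The service name vm:sql01 and the target member are illustrative only, borrowed from the ssh and xm examples further down.
clusvcadm -M vm:sql01 -m an-node02.alteeve.com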
From either node, preferably the good node, ssh into it with X-forwarding enabled and then start convirt. In our example, Node1 is affected, so we will connect to Node2.
From your PC:
ssh root@an-node02.alteeve.com -X
Once on Node2:
convirt &
Convirt
With convirt running, connect to each node by clicking on their names under the Servers item. This will prompt you to enter the node's root password.
Then, for each VM on the affected node, do the following:
- Click to highlight the VM.
- Click on Migrate.
- Select the good node as the destination. This is Node2 in this example.
- Confirm the live migration.
- Note: The migration could take some time, so be sure to warn John or whoever might be using the affected VM prior to initiating the migration. No processes will need to be stopped, but to the user, the VM will appear to "freeze" for the duration of the migration.
xm
If you do not want to use convirt, you can use the xm command line tool to perform the migration procedure. The syntax is:
UNTESTED!
#xm migrate [domain_id] [host] -l
xm migrate sql01 an-node02 -l
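Either way, running xm list on both nodes before and after the migration is a simple way to confirm where the domain is actually running. This assumes the standard Xen command line tools used above are installed.
xm list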
Break the RAID Arrays
We will tell the arrays that the drive /dev/sda is no longer usable.
Confirming Your Partition Structure
Continuing our example, we will need to replace /dev/sda which has four partitions:
/dev/sda1; In /dev/md0 - 250MB '/boot' partition
/dev/sda2; In /dev/md2 - 20000MB '/' partition
/dev/sda3; In /dev/md1 - 2000MB '<swap>' partition
/dev/sda5; In /dev/md3 - ~1.4TB '<LVM>' partition
Confirm the above configuration by checking /proc/mdstat:
cat /proc/mdstat
Which should show:
md0 : active raid1 sdb1[1] sda1[0]
256896 blocks [2/2] [UU]
md1 : active raid1 sdb3[1] sda3[0]
2048192 blocks [2/2] [UU]
md3 : active raid1 sdb5[1] sda5[0]
1442347712 blocks [2/2] [UU]
md2 : active raid1 sdb2[1] sda2[0]
20482752 blocks [2/2] [UU]
unused devices: <none>
Depending on how your drive has failed, you may see one or more entries with: [_U]. If this is the case, the corresponding partition may be absent or, if present, will look like: sda2[2](F).
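If the mdstat output leaves any doubt about which member has failed, mdadm can report the state of each member of an array directly. /dev/md0 is used here purely as an example:
mdadm --detail /dev/md0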
Failing The mdX Devices
Given these four partitions, we will need to run the following commands to mark the four partitions as failed in their respective /dev/mdX devices. Adapt this to your needs:
mdadm --fail /dev/md0 /dev/sda1
mdadm --fail /dev/md1 /dev/sda3
mdadm --fail /dev/md2 /dev/sda2
mdadm --fail /dev/md3 /dev/sda5
Confirm that all the arrays are now broken by again running:
cat /proc/mdstat
Which should show:
md0 : active raid1 sdb1[1] sda1[0](F)
256896 blocks [2/1] [_U]
md1 : active raid1 sdb3[1] sda3[0](F)
2048192 blocks [2/1] [_U]
md3 : active raid1 sdb5[1]
1442347712 blocks [2/1] [_U]
md2 : active raid1 sdb2[1] sda2[0](F)
20482752 blocks [2/1] [_U]
unused devices: <none>
In the above example, sda1, sda2 and sda3 were failed by the mdadm --fail call, while sda5 has failed in such a way that it is not visible at all.
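Optionally, you can also remove the failed partitions from their arrays before powering off, so the drive is fully detached from md. This is a sketch; only run these against members that /proc/mdstat actually shows as failed. A member that has vanished entirely (like sda5 above) can instead be cleaned up with mdadm's detached keyword, e.g. mdadm /dev/md3 --remove detached.
mdadm --manage /dev/md0 --remove /dev/sda1
mdadm --manage /dev/md1 --remove /dev/sda3
mdadm --manage /dev/md2 --remove /dev/sda2
mdadm --manage /dev/md3 --remove /dev/sda5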
Replace The Defective Drive
With the knowledge of the defective drive's serial number and port in hand, power off the server.
DO NOT POWER IT BACK ON WHILE CONNECTED TO THE NETWORK!
Under no circumstances do we want the cluster to re-assemble until after the defective drive has been replaced, re-added to the array and confirmed good!
Prepare the Replacement Drive
I prefer to pre-partition the replacement drive on a separate workstation, but this can be done safely in the server itself once it's been installed. If you wish to delay partitioning until then, skip to the next step and then return here once you reach #Power the Node Back in SINGLE USER MODE below.
Ensure the New Drive is Blank
In my case, the replacement drive comes up on my workstation as /dev/sdb. If yours is different, simply replace sdb with your drive letter.
We will wipe the start of the drive by writing 10,000 blocks from /dev/zero to it using dd. First, though, confirm the drive is where we expect it:
fdisk -l
Disk /dev/sda: 80.0 GB, 80026361856 bytes
255 heads, 63 sectors/track, 9729 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x13662e6d
Device Boot Start End Blocks Id System
/dev/sda1 * 1 1216 9767488+ 7 HPFS/NTFS
/dev/sda2 1217 1340 996030 82 Linux swap / Solaris
/dev/sda3 1341 9729 67384642+ 83 Linux
Disk /dev/sdb: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x000f0012
Device Boot Start End Blocks Id System
/dev/sdb1 * 1 32 257008+ fd Linux raid autodetect
/dev/sdb2 33 287 2048287+ fd Linux raid autodetect
/dev/sdb3 288 182401 1462830705 fd Linux raid autodetect
In the case above, the replacement drive had three partitions on it. New drives will usually be blank. Also, I know that /dev/sdb is the right drive by looking at the capacities. I could further confirm this using smartctl -a /dev/sdb if I had any doubt.
Now blank the drive:
dd if=/dev/zero of=/dev/sdb count=10000
10000+0 records in
10000+0 records out
5120000 bytes (5.1 MB) copied, 1.40715 s, 3.6 MB/s
Now confirm that the drive is clear by re-running fdisk -l:
fdisk -l
Disk /dev/sda: 80.0 GB, 80026361856 bytes
255 heads, 63 sectors/track, 9729 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x13662e6d
Device Boot Start End Blocks Id System
/dev/sda1 * 1 1216 9767488+ 7 HPFS/NTFS
/dev/sda2 1217 1340 996030 82 Linux swap / Solaris
/dev/sda3 1341 9729 67384642+ 83 Linux
Disk /dev/sdb: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000
Disk /dev/sdb doesn't contain a valid partition table
Perfect!
Create a Duplicate Partition Structure
Now we need to create the new partitions in such a way as to identically mimic the old drive.
To do this, run fdisk against a good drive and take note of the start and end cylinders for each good partition. These will be our guide when re-creating the partition scheme on the replacement drive.
Here is the output from a good drive on the surviving node:
fdisk -l /dev/sda
Disk /dev/sda: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sda1 * 1 32 257008+ fd Linux raid autodetect
/dev/sda2 33 2582 20482875 fd Linux raid autodetect
/dev/sda3 2583 2837 2048287+ fd Linux raid autodetect
/dev/sda4 2838 182401 1442347830 5 Extended
/dev/sda5 2838 182401 1442347798+ fd Linux raid autodetect
The Start and End columns have the values we will need to set for the corresponding partitions.
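As a hedged alternative, if the good drive and the blank replacement are both visible in the same machine, sfdisk can copy the partition table in one step instead of re-entering the cylinders by hand. Triple-check that /dev/sda is the good source and /dev/sdb the blank destination before running it.
sfdisk -d /dev/sda | sfdisk /dev/sdb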
Below is a fairly large dump from my terminal using fdisk. I prefer this method over graphical tools as I can be very precise this way. I'll assume here that you are familiar with the fdisk shell. If not, GO LEARN before proceeding!
Start the fdisk shell:
fdisk /dev/sdb
And build the partitions:
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel with disk identifier 0x2d841b1b.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.
The number of cylinders for this disk is set to 182401.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
(e.g., DOS FDISK, OS/2 FDISK)
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-182401, default 1):
Using default value 1
Last cylinder, +cylinders or +size{K,M,G} (1-182401, default 182401): 32
Command (m for help): a
Partition number (1-4): 1
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 2
First cylinder (33-182401, default 33):
Using default value 33
Last cylinder, +cylinders or +size{K,M,G} (33-182401, default 182401): 2582
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 3
First cylinder (2583-182401, default 2583):
Using default value 2583
Last cylinder, +cylinders or +size{K,M,G} (2583-182401, default 182401): 2837
Command (m for help): n
Command action
e extended
p primary partition (1-4)
e
Selected partition 4
First cylinder (2838-182401, default 2838):
Using default value 2838
Last cylinder, +cylinders or +size{K,M,G} (2838-182401, default 182401):
Using default value 182401
Command (m for help): n
First cylinder (2838-182401, default 2838):
Using default value 2838
Last cylinder, +cylinders or +size{K,M,G} (2838-182401, default 182401):
Using default value 182401
Command (m for help): t
Partition number (1-5): 1
Hex code (type L to list codes): fd
Changed system type of partition 1 to fd (Linux raid autodetect)
Command (m for help): t
Partition number (1-5): 2
Hex code (type L to list codes): fd
Changed system type of partition 2 to fd (Linux raid autodetect)
Command (m for help): t
Partition number (1-5): 3
Hex code (type L to list codes): fd
Changed system type of partition 3 to fd (Linux raid autodetect)
Command (m for help): t
Partition number (1-5): 5
Hex code (type L to list codes): fd
Changed system type of partition 5 to fd (Linux raid autodetect)
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
What was done above, in short, was:
- Created the first partition as primary, ending on cylinder 32.
- Set the first partition to be bootable.
- Created the remaining partitions as primary, primary, extended, and a logical partition inside the extended.
- Changed the type of partitions 1, 2, 3 and 5 to fd, Linux raid autodetect.
Confirm that the replacement drive now matches what the remaining good drive is partitioned as:
fdisk -l /dev/sdb
/dev/sdb1 * 1 32 257008+ fd Linux raid autodetect
/dev/sdb2 33 2582 20482875 fd Linux raid autodetect
/dev/sdb3 2583 2837 2048287+ fd Linux raid autodetect
/dev/sdb4 2838 182401 1442347830 5 Extended
/dev/sdb5 2838 182401 1442347798+ fd Linux raid autodetect
Perfect! Now we can install it in place of the defective drive.
Power Off and Unrack
poweroff
Unrack the server and move it to a work area.
Replace the Drive
Remove the drive you suspect to be the failed one. Confirm it is the right one by comparing the Serial Number reported by the smartctl -a /dev/sda call from step 1 (switch sda for your drive, of course). Once you have confirmed that the proper drive is in hand, remove it from its carrier and set it aside to process for RMA later. Install the replacement drive and UPDATE THE DOCS!
Power the Node Back in SINGLE USER MODE
With the server on your workbench and not connected to any network, power it on.
If You Don't Get the Grub Screen
If you replaced the first drive (sda), then there is a good chance the node will not boot, but will instead appear to hang with a black screen. This happens because the replacement drive is flagged bootable but contains no boot loader. To get around this, bring up the Boot Device (BBS) prompt. On most systems, including Node2 and Node1, this is done by pressing <F8> during the POST. Once you get the Boot Device list, select the second hard drive to boot from and then proceed with the next step.
Boot as Single User
Interrupt the Grub boot screen by pressing <esc> at the appropriate time. With the default kernel entry selected, press e to edit it, select the kernel line and press e again, then append the word " single" to the end of the line (note the leading space). Press <enter> to accept the change, then b to boot it.
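After the edit, the kernel line would end up looking something like the line below. The kernel version and arguments shown are purely illustrative and will differ on your node; only the trailing single matters.
kernel /vmlinuz-2.6.18-164.el5 ro root=LABEL=/ single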
Recovering the RAID Arrays Manually
In this case, the node failed to boot. Under the rescue DVD, I was able to rebuild the arrays manually. This hasn't solved the boot problem yet, but I'll get back to that tomorrow.
RAID Rebuild in Rescue Mode
To add a replacement disk to a busted array under the CentOS DVD in rescue mode, you need to start by writing out a skeleton /etc/mdadm.conf file. Here is one compatible with the Node2 and Node1 nodes.
Note that in this case, /dev/sda has been replaced and /dev/md0 wouldn't assemble because, for some reason, mdadm was detecting a superblock on /dev/sda1. For this reason, the /dev/sda1 entry will end up being dropped from the /dev/md0 line further below.
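If you hit something similar, mdadm --examine will show whatever metadata it is seeing on the partition, and mdadm --zero-superblock is an alternative to the dd approach used further below for clearing it. Both commands here act only on /dev/sda1 and are offered as a sketch:
mdadm --examine /dev/sda1
mdadm --zero-superblock /dev/sda1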
Create the following /etc/mdadm.conf:
vi /etc/mdadm.conf
ARRAY /dev/md0 level=raid1 num-devices=2 devices=/dev/sdb1,/dev/sda1
ARRAY /dev/md1 level=raid1 num-devices=2 devices=/dev/sdb3,/dev/sda3
ARRAY /dev/md2 level=raid1 num-devices=2 devices=/dev/sdb2,/dev/sda2
ARRAY /dev/md3 level=raid1 num-devices=2 devices=/dev/sdb5,/dev/sda5
Now re-assemble the array:
mdadm --assemble --scan
mdadm: superblock on /dev/sda1 doesn't match others - assembly aborted
mdadm: /dev/md1 has been started with 1 drive (out of 2).
mdadm: /dev/md2 has been started with 1 drive (out of 2).
mdadm: /dev/md3 has been started with 1 drive (out of 2).
Notice the error above? To fix this, edit /etc/mdadm.conf and remove ,/dev/sda1 from the /dev/md0 line. It should now look like:
ARRAY /dev/md0 level=raid1 num-devices=2 devices=/dev/sdb1
ARRAY /dev/md1 level=raid1 num-devices=2 devices=/dev/sdb3,/dev/sda3
ARRAY /dev/md2 level=raid1 num-devices=2 devices=/dev/sdb2,/dev/sda2
ARRAY /dev/md3 level=raid1 num-devices=2 devices=/dev/sdb5,/dev/sda5
Zero-out the /dev/sda1 partition with this command:
dd if=/dev/zero of=/dev/sda1 count=1000
Then re-assemble just the array /dev/md0 with just the device /dev/sdb1 specified:
mdadm --assemble --scan /dev/md0
mdadm: /dev/md0 has been started with 1 drive (out of 2).
Good, now add the /dev/sda1 partition back to the /dev/md0 array in /etc/mdadm.conf. Once it's back, add /dev/sda1 to the /dev/md0 array to start the sync process.
mdadm --manage /dev/md0 --add /dev/sda1
mdadm: added /dev/sda1
If this worked, you should be able to cat /proc/mdstat and see something like:
md0 : active raid1 sda1[0] sdb1[1]
256896 blocks [2/2] [UU]
The other entries will show degraded arrays. Now that we've gotten this far, add the new partitions to the rest of the arrays:
mdadm --manage /dev/md1 --add /dev/sda3
mdadm --manage /dev/md2 --add /dev/sda2
mdadm --manage /dev/md3 --add /dev/sda5
You can now watch the arrays sync with watch:
watch cat /proc/mdstat
Depending on the speed of your drives, you will probably see one of the arrays syncing and possibly one of the others waiting to sync.
Every 2s: /proc/mdstat Wed Feb 3 22:02:07 2010
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4]
md3 : active raid1 sda5[0] sdb5[1]
1442347712 blocks [2/1] [_U]
[==>..................] recovery = 10.4% (151696560/1442347712) finish=217.4min speed=105770K/sec
md2 : active raid1 sda2[0] sdb2[1]
20482753 blocks [2/2] [UU]
md1 : active raid1 sda1[0] sdb1[1]
2048192 blocks [2/2] [UU]
md0 : active raid1 sda1[0] sdb1[1]
256896 blocks [2/2] [UU]
You can see above how the /dev/md3 array is still syncing. You can reboot at this point, but I prefer to wait when I can afford the time.
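If you would rather not watch it by hand, a rough loop like this will simply block until /proc/mdstat no longer reports a recovery or resync in progress:
while grep -qE 'recovery|resync' /proc/mdstat; do sleep 60; done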
More to come
...
Any questions, feedback, advice, complaints or meanderings are welcome.