RHCS Stable 3 Tutorial - Multinode VM Cluster
Tools
The core tool is crm, the "cluster resource manager". Run by itself, it starts an interactive shell. Alternatively, it can be passed a single command as an argument, given a file containing multiple commands, or have commands redirected in on STDIN.
The main tool for monitoring the cluster is crm_mon, which is a variant of crm status.
Pacemaker's tools all accept --help for the main usage information. The same help is also available via their man pages.
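The invocation styles above can be sketched as follows (the subcommand shown is illustrative; available crm subcommands vary by version):

```shell
crm                 # no arguments: start the interactive crm shell
crm status          # run a single command given as arguments, then exit
crm < commands.txt  # read multiple commands redirected in on STDIN
```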
Check the cluster's status:
crm_mon -1
============
Last updated: Sun May 29 12:26:26 2011
Stack: openais
Current DC: an-node01.alteeve.com - partition with quorum
Version: 1.1.5-5.el6-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
0 Resources configured.
============
Online: [ an-node02.alteeve.com an-node01.alteeve.com ]
Revision as of 16:27, 29 May 2011
Warning: This document is very much a work in progress. Any and all data here could be wrong, inaccurate or missing important bits of information. In fact, it's little more than a dumping ground for my notes. You really don't want to take anything below seriously until this warning has been removed.
Overview
This tutorial will walk you through building two distinct clusters:
1. A 2-node cluster using DRBD for real-time replicated storage to back an iSCSI SAN, made highly available with Pacemaker.
2. A 5-node cluster hosting KVM virtual servers, each VM hosted on a dedicated LUN from the SAN cluster, made highly available with Pacemaker.
Pacemaker
This is a compression/adaptation of beekhof's http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch. Credit to him; errors are mine.
Base Cluster
notes:
- Create a multicast calculator.
- ais_addr is set in beekhof's scriptlet to use the last interface on the system. Manually choose the BCN interface IP.
- Check the ais_* values with `env | grep ais_`.
- Does pacemaker 1.1 in EL6 support a second ring?
- for f in /etc/corosync/corosync.conf /etc/corosync/service.d/pcmk /etc/hosts; do scp $f pcmk-2:$f ; done
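On the first note above, a minimal sketch of a multicast calculator (an entirely hypothetical helper; it just folds a checksum of the cluster name into the 226.94.0.0/16 range used below, so distinct clusters get distinct mcastaddr values with high probability):

```shell
# Hypothetical: derive a per-cluster multicast address from the cluster
# name. cksum's CRC is deterministic, so every node computes the same
# address for the same cluster name.
mcast_for_cluster() {
    local crc
    crc=$(printf '%s' "$1" | cksum | awk '{print $1}')
    # Fold the CRC into the last two octets of 226.94.0.0/16.
    printf '226.94.%d.%d\n' $(( (crc / 256) % 256 )) $(( crc % 256 ))
}

mcast_for_cluster pcmk
```

Any scheme works, as long as every node of a given cluster derives the same address and no two clusters on the network share one.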
yum install pacemaker
cp /etc/corosync/corosync.conf.example /etc/corosync/corosync.conf
vim /etc/corosync/corosync.conf
Add/edit the following in the 'interface { }' section; bindnetaddr, mcastaddr and mcastport are the three values to set:
interface {
ringnumber: 0
# Network address of the interface used for cluster comms (BCN).
bindnetaddr: 192.168.3.0
# Multicast IP used for CPG. Must be unique per cluster.
mcastaddr: 226.94.1.1
# Multicast UDP port. Must be unique per ring.
mcastport: 4000
ttl: 1
}
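Note that bindnetaddr is the network address (the node's IP with the host bits zeroed), not the node's own IP, which is what lets the same corosync.conf be copied to every node. A small sketch of deriving it (hypothetical helper; assumes bash and a dotted-quad netmask):

```shell
# Hypothetical: AND each octet of the IP with the netmask to get the
# network address, e.g. 192.168.3.71 / 255.255.255.0 -> 192.168.3.0.
net_for_ip() {
    local i1 i2 i3 i4 m1 m2 m3 m4
    IFS=. read -r i1 i2 i3 i4 <<< "$1"
    IFS=. read -r m1 m2 m3 m4 <<< "$2"
    printf '%d.%d.%d.%d\n' $((i1 & m1)) $((i2 & m2)) $((i3 & m3)) $((i4 & m4))
}

net_for_ip 192.168.3.71 255.255.255.0   # -> 192.168.3.0
```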
Create the pacemaker service file.
vim /etc/corosync/service.d/pcmk
service {
# Load the Pacemaker Cluster Resource Manager
name: pacemaker
# 'ver: 1' tells corosync not to start Pacemaker itself; it is started
# separately via its init script (MCP mode).
ver: 1
}
Copy the two files to the other node.
rsync -av /etc/corosync/service.d/pcmk root@an-node02:/etc/corosync/service.d/
sending incremental file list
pcmk
sent 178 bytes received 31 bytes 59.71 bytes/sec
total size is 106 speedup is 0.51
rsync -av /etc/corosync/corosync.conf root@an-node02:/etc/corosync/
sending incremental file list
corosync.conf
sent 526 bytes received 31 bytes 1114.00 bytes/sec
total size is 445 speedup is 0.80
Make the log directory /var/log/cluster writable by members of the root group, to which the pacemaker user belongs.
chmod g+rwx /var/log/cluster
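A quick sanity check that the change took (hypothetical helper; assumes GNU stat, as shipped with EL6):

```shell
# Hypothetical: succeed only if the directory's group-write bit is set.
group_writable() {
    local perms
    perms=$(stat -c %A "$1") || return 1
    # perms looks like "drwxrwxr-x"; index 5 is the group-write position.
    [ "${perms:5:1}" = "w" ]
}

group_writable /var/log/cluster && echo "group-writable" || echo "NOT group-writable"
```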
Start the cluster:
/etc/init.d/corosync start
Starting Corosync Cluster Engine (corosync): [ OK ]
In the log file of the first node to start (the second machine's join can be seen at the end):
May 28 01:11:58 an-node02 corosync[2366]: [MAIN ] Corosync Cluster Engine ('1.2.3'): started and ready to provide service.
May 28 01:11:58 an-node02 corosync[2366]: [MAIN ] Corosync built-in features: nss dbus rdma snmp
May 28 01:11:58 an-node02 corosync[2366]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
May 28 01:11:58 an-node02 corosync[2366]: [TOTEM ] Initializing transport (UDP/IP Multicast).
May 28 01:11:58 an-node02 corosync[2366]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
May 28 01:11:58 an-node02 corosync[2366]: [TOTEM ] The network interface [192.168.3.72] is now up.
May 28 01:11:58 an-node02 corosync[2366]: [pcmk ] info: process_ais_conf: Reading configure
May 28 01:11:58 an-node02 corosync[2366]: [pcmk ] info: config_find_init: Local handle: 2013064636357672963 for logging
May 28 01:11:58 an-node02 corosync[2366]: [pcmk ] info: config_find_next: Processing additional logging options...
May 28 01:11:58 an-node02 corosync[2366]: [pcmk ] info: get_config_opt: Found 'off' for option: debug
May 28 01:11:58 an-node02 corosync[2366]: [pcmk ] info: get_config_opt: Found 'yes' for option: to_logfile
May 28 01:11:58 an-node02 corosync[2366]: [pcmk ] info: get_config_opt: Found '/var/log/cluster/corosync.log' for option: logfile
May 28 01:11:58 an-node02 corosync[2366]: [pcmk ] info: get_config_opt: Found 'yes' for option: to_syslog
May 28 01:11:58 an-node02 corosync[2366]: [pcmk ] info: get_config_opt: Defaulting to 'daemon' for option: syslog_facility
May 28 01:11:58 an-node02 corosync[2366]: [pcmk ] info: config_find_init: Local handle: 4730966301143465988 for quorum
May 28 01:11:58 an-node02 corosync[2366]: [pcmk ] info: config_find_next: No additional configuration supplied for: quorum
May 28 01:11:58 an-node02 corosync[2366]: [pcmk ] info: get_config_opt: No default for option: provider
May 28 01:11:58 an-node02 corosync[2366]: [pcmk ] info: config_find_init: Local handle: 7739444317642555397 for service
May 28 01:11:58 an-node02 corosync[2366]: [pcmk ] info: config_find_next: Processing additional service options...
May 28 01:11:58 an-node02 corosync[2366]: [pcmk ] info: get_config_opt: Found '1' for option: ver
May 28 01:11:58 an-node02 corosync[2366]: [pcmk ] info: process_ais_conf: Enabling MCP mode: Use the Pacemaker init script to complete Pacemaker startup
May 28 01:11:58 an-node02 corosync[2366]: [pcmk ] info: get_config_opt: Defaulting to 'pcmk' for option: clustername
May 28 01:11:58 an-node02 corosync[2366]: [pcmk ] info: get_config_opt: Defaulting to 'no' for option: use_logd
May 28 01:11:58 an-node02 corosync[2366]: [pcmk ] info: get_config_opt: Defaulting to 'no' for option: use_mgmtd
May 28 01:11:58 an-node02 corosync[2366]: [pcmk ] info: pcmk_startup: CRM: Initialized
May 28 01:11:58 an-node02 corosync[2366]: [pcmk ] Logging: Initialized pcmk_startup
May 28 01:11:58 an-node02 corosync[2366]: [pcmk ] info: pcmk_startup: Maximum core file size is: 18446744073709551615
May 28 01:11:58 an-node02 corosync[2366]: [pcmk ] info: pcmk_startup: Service: 10
May 28 01:11:58 an-node02 corosync[2366]: [pcmk ] info: pcmk_startup: Local hostname: an-node02.alteeve.com
May 28 01:11:58 an-node02 corosync[2366]: [pcmk ] info: pcmk_update_nodeid: Local node id: 1208199360
May 28 01:11:58 an-node02 corosync[2366]: [pcmk ] info: update_member: Creating entry for node 1208199360 born on 0
May 28 01:11:58 an-node02 corosync[2366]: [pcmk ] info: update_member: 0x199aa40 Node 1208199360 now known as an-node02.alteeve.com (was: (null))
May 28 01:11:58 an-node02 corosync[2366]: [pcmk ] info: update_member: Node an-node02.alteeve.com now has 1 quorum votes (was 0)
May 28 01:11:58 an-node02 corosync[2366]: [pcmk ] info: update_member: Node 1208199360/an-node02.alteeve.com is now: member
May 28 01:11:58 an-node02 corosync[2366]: [SERV ] Service engine loaded: Pacemaker Cluster Manager 1.1.5
May 28 01:11:58 an-node02 corosync[2366]: [SERV ] Service engine loaded: corosync extended virtual synchrony service
May 28 01:11:58 an-node02 corosync[2366]: [SERV ] Service engine loaded: corosync configuration service
May 28 01:11:58 an-node02 corosync[2366]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
May 28 01:11:58 an-node02 corosync[2366]: [SERV ] Service engine loaded: corosync cluster config database access v1.01
May 28 01:11:58 an-node02 corosync[2366]: [SERV ] Service engine loaded: corosync profile loading service
May 28 01:11:58 an-node02 corosync[2366]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
May 28 01:11:58 an-node02 corosync[2366]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
May 28 01:11:58 an-node02 corosync[2366]: [TOTEM ] Process pause detected for 521 ms, flushing membership messages.
May 28 01:11:58 an-node02 corosync[2366]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 4: memb=0, new=0, lost=0
May 28 01:11:58 an-node02 corosync[2366]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 4: memb=1, new=1, lost=0
May 28 01:11:58 an-node02 corosync[2366]: [pcmk ] info: pcmk_peer_update: NEW: an-node02.alteeve.com 1208199360
May 28 01:11:58 an-node02 corosync[2366]: [pcmk ] info: pcmk_peer_update: MEMB: an-node02.alteeve.com 1208199360
May 28 01:11:58 an-node02 corosync[2366]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
May 28 01:11:58 an-node02 corosync[2366]: [CPG ] downlist received left_list: 0
May 28 01:11:58 an-node02 corosync[2366]: [CPG ] chosen downlist from node r(0) ip(192.168.3.72)
May 28 01:11:58 an-node02 corosync[2366]: [MAIN ] Completed service synchronization, ready to provide service.
May 28 01:12:07 an-node02 corosync[2366]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 8: memb=1, new=0, lost=0
May 28 01:12:07 an-node02 corosync[2366]: [pcmk ] info: pcmk_peer_update: memb: an-node02.alteeve.com 1208199360
May 28 01:12:07 an-node02 corosync[2366]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 8: memb=2, new=1, lost=0
May 28 01:12:07 an-node02 corosync[2366]: [pcmk ] info: update_member: Creating entry for node 1191422144 born on 8
May 28 01:12:07 an-node02 corosync[2366]: [pcmk ] info: update_member: Node 1191422144/unknown is now: member
May 28 01:12:07 an-node02 corosync[2366]: [pcmk ] info: pcmk_peer_update: NEW: .pending. 1191422144
May 28 01:12:07 an-node02 corosync[2366]: [pcmk ] info: pcmk_peer_update: MEMB: .pending. 1191422144
May 28 01:12:07 an-node02 corosync[2366]: [pcmk ] info: pcmk_peer_update: MEMB: an-node02.alteeve.com 1208199360
May 28 01:12:07 an-node02 corosync[2366]: [pcmk ] info: send_member_notification: Sending membership update 8 to 0 children
May 28 01:12:07 an-node02 corosync[2366]: [pcmk ] info: update_member: 0x199aa40 Node 1208199360 ((null)) born on: 8
May 28 01:12:07 an-node02 corosync[2366]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
May 28 01:12:07 an-node02 corosync[2366]: [pcmk ] info: update_member: 0x19a2c30 Node 1191422144 (an-node01.alteeve.com) born on: 8
May 28 01:12:07 an-node02 corosync[2366]: [pcmk ] info: update_member: 0x19a2c30 Node 1191422144 now known as an-node01.alteeve.com (was: (null))
May 28 01:12:07 an-node02 corosync[2366]: [pcmk ] info: update_member: Node an-node01.alteeve.com now has 1 quorum votes (was 0)
May 28 01:12:07 an-node02 corosync[2366]: [pcmk ] info: send_member_notification: Sending membership update 8 to 0 children
May 28 01:12:07 an-node02 corosync[2366]: [CPG ] downlist received left_list: 0
May 28 01:12:07 an-node02 corosync[2366]: [CPG ] downlist received left_list: 0
May 28 01:12:07 an-node02 corosync[2366]: [CPG ] chosen downlist from node r(0) ip(192.168.3.72)
May 28 01:12:07 an-node02 corosync[2366]: [MAIN ] Completed service synchronization, ready to provide service.
In the log file of the second node to start (joins the existing cluster):
May 28 01:12:06 an-node01 corosync[2404]: [MAIN ] Corosync Cluster Engine ('1.2.3'): started and ready to provide service.
May 28 01:12:06 an-node01 corosync[2404]: [MAIN ] Corosync built-in features: nss dbus rdma snmp
May 28 01:12:06 an-node01 corosync[2404]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
May 28 01:12:06 an-node01 corosync[2404]: [TOTEM ] Initializing transport (UDP/IP Multicast).
May 28 01:12:06 an-node01 corosync[2404]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
May 28 01:12:06 an-node01 corosync[2404]: [TOTEM ] The network interface [192.168.3.71] is now up.
May 28 01:12:06 an-node01 corosync[2404]: [pcmk ] info: process_ais_conf: Reading configure
May 28 01:12:06 an-node01 corosync[2404]: [pcmk ] info: config_find_init: Local handle: 2013064636357672963 for logging
May 28 01:12:06 an-node01 corosync[2404]: [pcmk ] info: config_find_next: Processing additional logging options...
May 28 01:12:06 an-node01 corosync[2404]: [pcmk ] info: get_config_opt: Found 'off' for option: debug
May 28 01:12:06 an-node01 corosync[2404]: [pcmk ] info: get_config_opt: Found 'yes' for option: to_logfile
May 28 01:12:06 an-node01 corosync[2404]: [pcmk ] info: get_config_opt: Found '/var/log/cluster/corosync.log' for option: logfile
May 28 01:12:07 an-node01 corosync[2404]: [pcmk ] info: get_config_opt: Found 'yes' for option: to_syslog
May 28 01:12:07 an-node01 corosync[2404]: [pcmk ] info: get_config_opt: Defaulting to 'daemon' for option: syslog_facility
May 28 01:12:07 an-node01 corosync[2404]: [pcmk ] info: config_find_init: Local handle: 4730966301143465988 for quorum
May 28 01:12:07 an-node01 corosync[2404]: [pcmk ] info: config_find_next: No additional configuration supplied for: quorum
May 28 01:12:07 an-node01 corosync[2404]: [pcmk ] info: get_config_opt: No default for option: provider
May 28 01:12:07 an-node01 corosync[2404]: [pcmk ] info: config_find_init: Local handle: 7739444317642555397 for service
May 28 01:12:07 an-node01 corosync[2404]: [pcmk ] info: config_find_next: Processing additional service options...
May 28 01:12:07 an-node01 corosync[2404]: [pcmk ] info: get_config_opt: Found '1' for option: ver
May 28 01:12:07 an-node01 corosync[2404]: [pcmk ] info: process_ais_conf: Enabling MCP mode: Use the Pacemaker init script to complete Pacemaker startup
May 28 01:12:07 an-node01 corosync[2404]: [pcmk ] info: get_config_opt: Defaulting to 'pcmk' for option: clustername
May 28 01:12:07 an-node01 corosync[2404]: [pcmk ] info: get_config_opt: Defaulting to 'no' for option: use_logd
May 28 01:12:07 an-node01 corosync[2404]: [pcmk ] info: get_config_opt: Defaulting to 'no' for option: use_mgmtd
May 28 01:12:07 an-node01 corosync[2404]: [pcmk ] info: pcmk_startup: CRM: Initialized
May 28 01:12:07 an-node01 corosync[2404]: [pcmk ] Logging: Initialized pcmk_startup
May 28 01:12:07 an-node01 corosync[2404]: [pcmk ] info: pcmk_startup: Maximum core file size is: 18446744073709551615
May 28 01:12:07 an-node01 corosync[2404]: [pcmk ] info: pcmk_startup: Service: 10
May 28 01:12:07 an-node01 corosync[2404]: [pcmk ] info: pcmk_startup: Local hostname: an-node01.alteeve.com
May 28 01:12:07 an-node01 corosync[2404]: [pcmk ] info: pcmk_update_nodeid: Local node id: 1191422144
May 28 01:12:07 an-node01 corosync[2404]: [pcmk ] info: update_member: Creating entry for node 1191422144 born on 0
May 28 01:12:07 an-node01 corosync[2404]: [pcmk ] info: update_member: 0x101ba30 Node 1191422144 now known as an-node01.alteeve.com (was: (null))
May 28 01:12:07 an-node01 corosync[2404]: [pcmk ] info: update_member: Node an-node01.alteeve.com now has 1 quorum votes (was 0)
May 28 01:12:07 an-node01 corosync[2404]: [pcmk ] info: update_member: Node 1191422144/an-node01.alteeve.com is now: member
May 28 01:12:07 an-node01 corosync[2404]: [SERV ] Service engine loaded: Pacemaker Cluster Manager 1.1.5
May 28 01:12:07 an-node01 corosync[2404]: [SERV ] Service engine loaded: corosync extended virtual synchrony service
May 28 01:12:07 an-node01 corosync[2404]: [SERV ] Service engine loaded: corosync configuration service
May 28 01:12:07 an-node01 corosync[2404]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
May 28 01:12:07 an-node01 corosync[2404]: [SERV ] Service engine loaded: corosync cluster config database access v1.01
May 28 01:12:07 an-node01 corosync[2404]: [SERV ] Service engine loaded: corosync profile loading service
May 28 01:12:07 an-node01 corosync[2404]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
May 28 01:12:07 an-node01 corosync[2404]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
May 28 01:12:07 an-node01 corosync[2404]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 8: memb=0, new=0, lost=0
May 28 01:12:07 an-node01 corosync[2404]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 8: memb=2, new=2, lost=0
May 28 01:12:07 an-node01 corosync[2404]: [pcmk ] info: pcmk_peer_update: NEW: an-node01.alteeve.com 1191422144
May 28 01:12:07 an-node01 corosync[2404]: [pcmk ] info: update_member: Creating entry for node 1208199360 born on 8
May 28 01:12:07 an-node01 corosync[2404]: [pcmk ] info: update_member: Node 1208199360/unknown is now: member
May 28 01:12:07 an-node01 corosync[2404]: [pcmk ] info: pcmk_peer_update: NEW: .pending. 1208199360
May 28 01:12:07 an-node01 corosync[2404]: [pcmk ] info: pcmk_peer_update: MEMB: an-node01.alteeve.com 1191422144
May 28 01:12:07 an-node01 corosync[2404]: [pcmk ] info: pcmk_peer_update: MEMB: .pending. 1208199360
May 28 01:12:07 an-node01 corosync[2404]: [pcmk ] info: send_member_notification: Sending membership update 8 to 0 children
May 28 01:12:07 an-node01 corosync[2404]: [pcmk ] info: update_member: 0x101ba30 Node 1191422144 ((null)) born on: 8
May 28 01:12:07 an-node01 corosync[2404]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
May 28 01:12:07 an-node01 corosync[2404]: [pcmk ] info: update_member: 0x1022800 Node 1208199360 (an-node02.alteeve.com) born on: 8
May 28 01:12:07 an-node01 corosync[2404]: [pcmk ] info: update_member: 0x1022800 Node 1208199360 now known as an-node02.alteeve.com (was: (null))
May 28 01:12:07 an-node01 corosync[2404]: [pcmk ] info: update_member: Node an-node02.alteeve.com now has 1 quorum votes (was 0)
May 28 01:12:07 an-node01 corosync[2404]: [pcmk ] info: send_member_notification: Sending membership update 8 to 0 children
May 28 01:12:07 an-node01 corosync[2404]: [CPG ] downlist received left_list: 0
May 28 01:12:07 an-node01 corosync[2404]: [CPG ] downlist received left_list: 0
May 28 01:12:07 an-node01 corosync[2404]: [CPG ] chosen downlist from node r(0) ip(192.168.3.72)
May 28 01:12:07 an-node01 corosync[2404]: [MAIN ] Completed service synchronization, ready to provide service.
Start pacemaker:
/etc/init.d/pacemaker start
Starting Pacemaker Cluster Manager: [ OK ]
In the log files:
May 28 15:56:54 an-node01 pacemakerd: [3527]: info: Invoked: pacemakerd
May 28 15:56:54 an-node01 pacemakerd: [3527]: info: crm_log_init_worker: Changed active directory to /var/lib/heartbeat/cores/root
May 28 15:56:54 an-node01 pacemakerd: [3527]: info: read_config: Reading configure
May 28 15:56:54 an-node01 pacemakerd: [3527]: info: config_find_next: Processing additional service options...
May 28 15:56:54 an-node01 pacemakerd: [3527]: info: get_config_opt: Found 'pacemaker' for option: name
May 28 15:56:54 an-node01 pacemakerd: [3527]: info: get_config_opt: Found '1' for option: ver
May 28 15:56:54 an-node01 pacemakerd: [3527]: info: get_config_opt: Defaulting to 'no' for option: use_logd
May 28 15:56:54 an-node01 pacemakerd: [3527]: info: get_config_opt: Defaulting to 'no' for option: use_mgmtd
May 28 15:56:54 an-node01 pacemakerd: [3527]: info: config_find_next: No additional configuration supplied for: service
May 28 15:56:54 an-node01 pacemakerd: [3527]: info: config_find_next: Processing additional logging options...
May 28 15:56:54 an-node01 pacemakerd: [3527]: info: get_config_opt: Found 'off' for option: debug
May 28 15:56:54 an-node01 pacemakerd: [3527]: info: get_config_opt: Found 'yes' for option: to_logfile
May 28 15:56:54 an-node01 pacemakerd: [3527]: info: get_config_opt: Found '/var/log/cluster/corosync.log' for option: logfile
May 28 15:56:54 an-node01 pacemakerd: [3527]: info: get_config_opt: Found 'yes' for option: to_syslog
May 28 15:56:54 an-node01 pacemakerd: [3527]: info: get_config_opt: Defaulting to 'daemon' for option: syslog_facility
May 28 15:56:54 an-node01 pacemakerd: [3530]: info: crm_log_init_worker: Changed active directory to /var/lib/heartbeat/cores/root
May 28 15:56:54 an-node01 pacemakerd: [3530]: info: main: Starting Pacemaker 1.1.5-5.el6 (Build: 01e86afaaa6d4a8c4836f68df80ababd6ca3902f): manpages docbook-manpages publican ncurses cman cs-quorum corosync snmp libesmtp
May 28 15:56:54 an-node01 pacemakerd: [3530]: info: main: Maximum core file size is: 18446744073709551615
May 28 15:56:54 an-node01 pacemakerd: [3530]: info: update_node_processes: 0x13f7a20 Node 1191422144 now known as an-node01.alteeve.com (was: (null))
May 28 15:56:54 an-node01 pacemakerd: [3530]: info: update_node_processes: Node an-node01.alteeve.com now has process list: 00000000000000000000000000000002 (was 00000000000000000000000000000000)
May 28 15:56:54 an-node01 pacemakerd: [3530]: info: G_main_add_SignalHandler: Added signal handler for signal 17
May 28 15:56:54 an-node01 pacemakerd: [3530]: info: start_child: Forked child 3534 for process stonith-ng
May 28 15:56:54 an-node01 pacemakerd: [3530]: info: update_node_processes: Node an-node01.alteeve.com now has process list: 00000000000000000000000000100002 (was 00000000000000000000000000000002)
May 28 15:56:54 an-node01 pacemakerd: [3530]: info: start_child: Forked child 3535 for process cib
May 28 15:56:54 an-node01 pacemakerd: [3530]: info: update_node_processes: Node an-node01.alteeve.com now has process list: 00000000000000000000000000100102 (was 00000000000000000000000000100002)
May 28 15:56:54 an-node01 pacemakerd: [3530]: info: start_child: Forked child 3536 for process lrmd
May 28 15:56:54 an-node01 pacemakerd: [3530]: info: update_node_processes: Node an-node01.alteeve.com now has process list: 00000000000000000000000000100112 (was 00000000000000000000000000100102)
May 28 15:56:54 an-node01 pacemakerd: [3530]: info: start_child: Forked child 3537 for process attrd
May 28 15:56:54 an-node01 pacemakerd: [3530]: info: update_node_processes: Node an-node01.alteeve.com now has process list: 00000000000000000000000000101112 (was 00000000000000000000000000100112)
May 28 15:56:54 an-node01 pacemakerd: [3530]: info: start_child: Forked child 3538 for process pengine
May 28 15:56:54 an-node01 pacemakerd: [3530]: info: update_node_processes: Node an-node01.alteeve.com now has process list: 00000000000000000000000000111112 (was 00000000000000000000000000101112)
May 28 15:56:54 an-node01 pacemakerd: [3530]: info: start_child: Forked child 3539 for process crmd
May 28 15:56:54 an-node01 pacemakerd: [3530]: info: update_node_processes: Node an-node01.alteeve.com now has process list: 00000000000000000000000000111312 (was 00000000000000000000000000111112)
May 28 15:56:54 an-node01 pacemakerd: [3530]: info: main: Starting mainloop
May 28 15:56:54 an-node01 cib: [3535]: info: crm_log_init_worker: Changed active directory to /var/lib/heartbeat/cores/hacluster
May 28 15:56:54 an-node01 stonith-ng: [3534]: info: crm_log_init_worker: Changed active directory to /var/lib/heartbeat/cores/root
May 28 15:56:54 an-node01 cib: [3535]: info: G_main_add_TriggerHandler: Added signal manual handler
May 28 15:56:54 an-node01 cib: [3535]: info: G_main_add_SignalHandler: Added signal handler for signal 17
May 28 15:56:54 an-node01 stonith-ng: [3534]: info: G_main_add_SignalHandler: Added signal handler for signal 17
May 28 15:56:54 an-node01 cib: [3535]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/crm/cib.xml.sig)
May 28 15:56:54 an-node01 stonith-ng: [3534]: info: get_cluster_type: Cluster type is: 'openais'.
May 28 15:56:54 an-node01 cib: [3535]: info: validate_with_relaxng: Creating RNG parser context
May 28 15:56:54 an-node01 stonith-ng: [3534]: info: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
May 28 15:56:54 an-node01 stonith-ng: [3534]: info: init_ais_connection_classic: Creating connection to our Corosync plugin
May 28 15:56:54 an-node01 lrmd: [3536]: info: G_main_add_SignalHandler: Added signal handler for signal 15
May 28 15:56:54 an-node01 crmd: [3539]: info: crm_log_init_worker: Changed active directory to /var/lib/heartbeat/cores/hacluster
May 28 15:56:54 an-node01 attrd: [3537]: info: crm_log_init_worker: Changed active directory to /var/lib/heartbeat/cores/hacluster
May 28 15:56:54 an-node01 crmd: [3539]: info: main: CRM Hg Version: 01e86afaaa6d4a8c4836f68df80ababd6ca3902f
May 28 15:56:54 an-node01 crmd: [3539]: info: crmd_init: Starting crmd
May 28 15:56:54 an-node01 crmd: [3539]: info: G_main_add_SignalHandler: Added signal handler for signal 17
May 28 15:56:54 an-node01 attrd: [3537]: info: main: Starting up
May 28 15:56:54 an-node01 attrd: [3537]: info: get_cluster_type: Cluster type is: 'openais'.
May 28 15:56:54 an-node01 attrd: [3537]: info: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
May 28 15:56:54 an-node01 attrd: [3537]: info: init_ais_connection_classic: Creating connection to our Corosync plugin
May 28 15:56:54 an-node01 stonith-ng: [3534]: info: init_ais_connection_classic: AIS connection established
May 28 15:56:54 an-node01 corosync[3019]: [pcmk ] info: pcmk_ipc: Recorded connection 0x18aa430 for stonith-ng/0
May 28 15:56:54 an-node01 stonith-ng: [3534]: info: get_ais_nodeid: Server details: id=1191422144 uname=an-node01.alteeve.com cname=pcmk
May 28 15:56:54 an-node01 stonith-ng: [3534]: info: init_ais_connection_once: Connection to 'classic openais (with plugin)': established
May 28 15:56:54 an-node01 stonith-ng: [3534]: info: crm_new_peer: Node an-node01.alteeve.com now has id: 1191422144
May 28 15:56:54 an-node01 stonith-ng: [3534]: info: crm_new_peer: Node 1191422144 is now known as an-node01.alteeve.com
May 28 15:56:54 an-node01 stonith-ng: [3534]: info: main: Starting stonith-ng mainloop
May 28 15:56:54 an-node01 lrmd: [3536]: info: G_main_add_SignalHandler: Added signal handler for signal 17
May 28 15:56:54 an-node01 stonith-ng: [3534]: info: crm_update_peer: Node an-node01.alteeve.com: id=1191422144 state=unknown addr=(null) votes=0 born=0 seen=0 proc=00000000000000000000000000111312 (new)
May 28 15:56:54 an-node01 attrd: [3537]: info: init_ais_connection_classic: AIS connection established
May 28 15:56:54 an-node01 lrmd: [3536]: info: enabling coredumps
May 28 15:56:54 an-node01 corosync[3019]: [pcmk ] info: pcmk_ipc: Recorded connection 0x18ae790 for attrd/0
May 28 15:56:54 an-node01 lrmd: [3536]: info: G_main_add_SignalHandler: Added signal handler for signal 10
May 28 15:56:54 an-node01 attrd: [3537]: info: get_ais_nodeid: Server details: id=1191422144 uname=an-node01.alteeve.com cname=pcmk
May 28 15:56:54 an-node01 lrmd: [3536]: info: G_main_add_SignalHandler: Added signal handler for signal 12
May 28 15:56:54 an-node01 attrd: [3537]: info: init_ais_connection_once: Connection to 'classic openais (with plugin)': established
May 28 15:56:54 an-node01 lrmd: [3536]: info: Started.
May 28 15:56:54 an-node01 attrd: [3537]: info: crm_new_peer: Node an-node01.alteeve.com now has id: 1191422144
May 28 15:56:54 an-node01 attrd: [3537]: info: crm_new_peer: Node 1191422144 is now known as an-node01.alteeve.com
May 28 15:56:54 an-node01 attrd: [3537]: info: main: Cluster connection active
May 28 15:56:54 an-node01 attrd: [3537]: info: main: Accepting attribute updates
May 28 15:56:54 an-node01 attrd: [3537]: info: main: Starting mainloop...
May 28 15:56:54 an-node01 attrd: [3537]: info: crm_update_peer: Node an-node01.alteeve.com: id=1191422144 state=unknown addr=(null) votes=0 born=0 seen=0 proc=00000000000000000000000000111312 (new)
May 28 15:56:54 an-node01 cib: [3535]: info: startCib: CIB Initialization completed successfully
May 28 15:56:54 an-node01 cib: [3535]: info: get_cluster_type: Cluster type is: 'openais'.
May 28 15:56:54 an-node01 cib: [3535]: info: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
May 28 15:56:54 an-node01 cib: [3535]: info: init_ais_connection_classic: Creating connection to our Corosync plugin
May 28 15:56:54 an-node01 cib: [3535]: info: init_ais_connection_classic: AIS connection established
May 28 15:56:54 an-node01 corosync[3019]: [pcmk ] info: pcmk_ipc: Recorded connection 0x18b2af0 for cib/0
May 28 15:56:54 an-node01 corosync[3019]: [pcmk ] info: pcmk_ipc: Sending membership update 32 to cib
May 28 15:56:54 an-node01 cib: [3535]: info: get_ais_nodeid: Server details: id=1191422144 uname=an-node01.alteeve.com cname=pcmk
May 28 15:56:54 an-node01 cib: [3535]: info: init_ais_connection_once: Connection to 'classic openais (with plugin)': established
May 28 15:56:54 an-node01 cib: [3535]: info: crm_new_peer: Node an-node01.alteeve.com now has id: 1191422144
May 28 15:56:54 an-node01 cib: [3535]: info: crm_new_peer: Node 1191422144 is now known as an-node01.alteeve.com
May 28 15:56:54 an-node01 cib: [3535]: info: cib_init: Starting cib mainloop
May 28 15:56:54 an-node01 cib: [3535]: notice: ais_dispatch_message: Membership 32: quorum acquired
May 28 15:56:54 an-node01 cib: [3535]: info: crm_update_peer: Node an-node01.alteeve.com: id=1191422144 state=member (new) addr=r(0) ip(192.168.3.71) (new) votes=1 (new) born=32 seen=32 proc=00000000000000000000000000000000
May 28 15:56:54 an-node01 cib: [3535]: info: crm_new_peer: Node an-node02.alteeve.com now has id: 1208199360
May 28 15:56:54 an-node01 cib: [3535]: info: crm_new_peer: Node 1208199360 is now known as an-node02.alteeve.com
May 28 15:56:54 an-node01 cib: [3535]: info: crm_update_peer: Node an-node02.alteeve.com: id=1208199360 state=member (new) addr=r(0) ip(192.168.3.72) votes=1 born=32 seen=32 proc=00000000000000000000000000000000
May 28 15:56:54 an-node01 cib: [3535]: info: crm_update_peer: Node an-node01.alteeve.com: id=1191422144 state=member addr=r(0) ip(192.168.3.71) votes=1 born=32 seen=32 proc=00000000000000000000000000111312 (new)
May 28 15:56:54 an-node01 pacemakerd: [3530]: info: update_node_processes: 0x13fc1e0 Node 1208199360 now known as an-node02.alteeve.com (was: (null))
May 28 15:56:54 an-node01 pacemakerd: [3530]: info: update_node_processes: Node an-node02.alteeve.com now has process list: 00000000000000000000000000000002 (was 00000000000000000000000000000000)
May 28 15:56:54 an-node01 pacemakerd: [3530]: info: update_node_processes: Node an-node02.alteeve.com now has process list: 00000000000000000000000000100002 (was 00000000000000000000000000000002)
May 28 15:56:54 an-node01 pacemakerd: [3530]: info: update_node_processes: Node an-node02.alteeve.com now has process list: 00000000000000000000000000100102 (was 00000000000000000000000000100002)
May 28 15:56:54 an-node01 pacemakerd: [3530]: info: update_node_processes: Node an-node02.alteeve.com now has process list: 00000000000000000000000000100112 (was 00000000000000000000000000100102)
May 28 15:56:54 an-node01 stonith-ng: [3534]: info: crm_new_peer: Node 0 is now known as an-node02.alteeve.com
May 28 15:56:54 an-node01 stonith-ng: [3534]: info: crm_update_peer: Node an-node02.alteeve.com: id=0 state=unknown addr=(null) votes=0 born=0 seen=0 proc=00000000000000000000000000100002 (new)
May 28 15:56:54 an-node01 attrd: [3537]: info: crm_new_peer: Node 0 is now known as an-node02.alteeve.com
May 28 15:56:54 an-node01 stonith-ng: [3534]: info: crm_update_peer: Node an-node02.alteeve.com: id=0 state=unknown addr=(null) votes=0 born=0 seen=0 proc=00000000000000000000000000100102 (new)
May 28 15:56:54 an-node01 stonith-ng: [3534]: info: crm_update_peer: Node an-node02.alteeve.com: id=0 state=unknown addr=(null) votes=0 born=0 seen=0 proc=00000000000000000000000000100112 (new)
May 28 15:56:54 an-node01 pacemakerd: [3530]: info: update_node_processes: Node an-node02.alteeve.com now has process list: 00000000000000000000000000101112 (was 00000000000000000000000000100112)
May 28 15:56:54 an-node01 cib: [3535]: info: crm_update_peer: Node an-node02.alteeve.com: id=1208199360 state=member addr=r(0) ip(192.168.3.72) votes=1 born=32 seen=32 proc=00000000000000000000000000000002 (new)
May 28 15:56:54 an-node01 attrd: [3537]: info: crm_update_peer: Node an-node02.alteeve.com: id=0 state=unknown addr=(null) votes=0 born=0 seen=0 proc=00000000000000000000000000100002 (new)
May 28 15:56:54 an-node01 stonith-ng: [3534]: info: crm_update_peer: Node an-node02.alteeve.com: id=0 state=unknown addr=(null) votes=0 born=0 seen=0 proc=00000000000000000000000000101112 (new)
May 28 15:56:54 an-node01 pacemakerd: [3530]: info: update_node_processes: Node an-node02.alteeve.com now has process list: 00000000000000000000000000111112 (was 00000000000000000000000000101112)
May 28 15:56:54 an-node01 attrd: [3537]: info: crm_update_peer: Node an-node02.alteeve.com: id=0 state=unknown addr=(null) votes=0 born=0 seen=0 proc=00000000000000000000000000100102 (new)
May 28 15:56:54 an-node01 stonith-ng: [3534]: info: crm_update_peer: Node an-node02.alteeve.com: id=0 state=unknown addr=(null) votes=0 born=0 seen=0 proc=00000000000000000000000000111112 (new)
May 28 15:56:54 an-node01 pacemakerd: [3530]: info: update_node_processes: Node an-node02.alteeve.com now has process list: 00000000000000000000000000111312 (was 00000000000000000000000000111112)
May 28 15:56:54 an-node01 attrd: [3537]: info: crm_update_peer: Node an-node02.alteeve.com: id=0 state=unknown addr=(null) votes=0 born=0 seen=0 proc=00000000000000000000000000100112 (new)
May 28 15:56:54 an-node01 stonith-ng: [3534]: info: crm_update_peer: Node an-node02.alteeve.com: id=0 state=unknown addr=(null) votes=0 born=0 seen=0 proc=00000000000000000000000000111312 (new)
May 28 15:56:54 an-node01 attrd: [3537]: info: crm_update_peer: Node an-node02.alteeve.com: id=0 state=unknown addr=(null) votes=0 born=0 seen=0 proc=00000000000000000000000000101112 (new)
May 28 15:56:54 an-node01 cib: [3535]: info: crm_update_peer: Node an-node02.alteeve.com: id=1208199360 state=member addr=r(0) ip(192.168.3.72) votes=1 born=32 seen=32 proc=00000000000000000000000000100002 (new)
May 28 15:56:54 an-node01 cib: [3535]: info: crm_update_peer: Node an-node02.alteeve.com: id=1208199360 state=member addr=r(0) ip(192.168.3.72) votes=1 born=32 seen=32 proc=00000000000000000000000000100102 (new)
May 28 15:56:54 an-node01 cib: [3535]: info: crm_update_peer: Node an-node02.alteeve.com: id=1208199360 state=member addr=r(0) ip(192.168.3.72) votes=1 born=32 seen=32 proc=00000000000000000000000000100112 (new)
May 28 15:56:54 an-node01 attrd: [3537]: info: crm_update_peer: Node an-node02.alteeve.com: id=0 state=unknown addr=(null) votes=0 born=0 seen=0 proc=00000000000000000000000000111112 (new)
May 28 15:56:54 an-node01 cib: [3535]: info: crm_update_peer: Node an-node02.alteeve.com: id=1208199360 state=member addr=r(0) ip(192.168.3.72) votes=1 born=32 seen=32 proc=00000000000000000000000000101112 (new)
May 28 15:56:54 an-node01 cib: [3535]: info: crm_update_peer: Node an-node02.alteeve.com: id=1208199360 state=member addr=r(0) ip(192.168.3.72) votes=1 born=32 seen=32 proc=00000000000000000000000000111112 (new)
May 28 15:56:54 an-node01 cib: [3535]: info: crm_update_peer: Node an-node02.alteeve.com: id=1208199360 state=member addr=r(0) ip(192.168.3.72) votes=1 born=32 seen=32 proc=00000000000000000000000000111312 (new)
May 28 15:56:54 an-node01 attrd: [3537]: info: crm_update_peer: Node an-node02.alteeve.com: id=0 state=unknown addr=(null) votes=0 born=0 seen=0 proc=00000000000000000000000000111312 (new)
May 28 15:56:55 an-node01 crmd: [3539]: info: do_cib_control: CIB connection established
May 28 15:56:55 an-node01 crmd: [3539]: info: get_cluster_type: Cluster type is: 'openais'.
May 28 15:56:55 an-node01 crmd: [3539]: info: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
May 28 15:56:55 an-node01 crmd: [3539]: info: init_ais_connection_classic: Creating connection to our Corosync plugin
May 28 15:56:55 an-node01 crmd: [3539]: info: init_ais_connection_classic: AIS connection established
May 28 15:56:55 an-node01 corosync[3019]: [pcmk ] info: pcmk_ipc: Recorded connection 0x18b6e50 for crmd/0
May 28 15:56:55 an-node01 corosync[3019]: [pcmk ] info: pcmk_ipc: Sending membership update 32 to crmd
May 28 15:56:55 an-node01 crmd: [3539]: info: get_ais_nodeid: Server details: id=1191422144 uname=an-node01.alteeve.com cname=pcmk
May 28 15:56:55 an-node01 crmd: [3539]: info: init_ais_connection_once: Connection to 'classic openais (with plugin)': established
May 28 15:56:55 an-node01 crmd: [3539]: info: crm_new_peer: Node an-node01.alteeve.com now has id: 1191422144
May 28 15:56:55 an-node01 crmd: [3539]: info: crm_new_peer: Node 1191422144 is now known as an-node01.alteeve.com
May 28 15:56:55 an-node01 crmd: [3539]: info: ais_status_callback: status: an-node01.alteeve.com is now unknown
May 28 15:56:55 an-node01 crmd: [3539]: info: do_ha_control: Connected to the cluster
May 28 15:56:55 an-node01 crmd: [3539]: info: do_started: Delaying start, no membership data (0000000000100000)
May 28 15:56:55 an-node01 crmd: [3539]: info: crmd_init: Starting crmd's mainloop
May 28 15:56:55 an-node01 crmd: [3539]: notice: ais_dispatch_message: Membership 32: quorum acquired
May 28 15:56:55 an-node01 crmd: [3539]: info: ais_status_callback: status: an-node01.alteeve.com is now member (was unknown)
May 28 15:56:55 an-node01 crmd: [3539]: info: crm_update_peer: Node an-node01.alteeve.com: id=1191422144 state=member (new) addr=r(0) ip(192.168.3.71) (new) votes=1 (new) born=32 seen=32 proc=00000000000000000000000000000000
May 28 15:56:55 an-node01 crmd: [3539]: info: crm_new_peer: Node an-node02.alteeve.com now has id: 1208199360
May 28 15:56:55 an-node01 crmd: [3539]: info: crm_new_peer: Node 1208199360 is now known as an-node02.alteeve.com
May 28 15:56:55 an-node01 crmd: [3539]: info: ais_status_callback: status: an-node02.alteeve.com is now unknown
May 28 15:56:55 an-node01 crmd: [3539]: info: ais_status_callback: status: an-node02.alteeve.com is now member (was unknown)
May 28 15:56:55 an-node01 crmd: [3539]: info: crm_update_peer: Node an-node02.alteeve.com: id=1208199360 state=member (new) addr=r(0) ip(192.168.3.72) votes=1 born=32 seen=32 proc=00000000000000000000000000000000
May 28 15:56:55 an-node01 crmd: [3539]: notice: crmd_peer_update: Status update: Client an-node01.alteeve.com/crmd now has status [online] (DC=<null>)
May 28 15:56:55 an-node01 crmd: [3539]: info: crm_update_peer: Node an-node01.alteeve.com: id=1191422144 state=member addr=r(0) ip(192.168.3.71) votes=1 born=32 seen=32 proc=00000000000000000000000000111312 (new)
May 28 15:56:55 an-node01 crmd: [3539]: notice: crmd_peer_update: Status update: Client an-node02.alteeve.com/crmd now has status [online] (DC=<null>)
May 28 15:56:55 an-node01 crmd: [3539]: info: crm_update_peer: Node an-node02.alteeve.com: id=1208199360 state=member addr=r(0) ip(192.168.3.72) votes=1 born=32 seen=32 proc=00000000000000000000000000111312 (new)
May 28 15:56:55 an-node01 crmd: [3539]: info: do_started: Delaying start, Config not read (0000000000000040)
May 28 15:56:55 an-node01 crmd: [3539]: info: do_started: Delaying start, Config not read (0000000000000040)
May 28 15:56:55 an-node01 crmd: [3539]: info: config_query_callback: Shutdown escalation occurs after: 1200000ms
May 28 15:56:55 an-node01 crmd: [3539]: info: config_query_callback: Checking for expired actions every 900000ms
May 28 15:56:55 an-node01 crmd: [3539]: info: config_query_callback: Sending expected-votes=2 to corosync
May 28 15:56:55 an-node01 crmd: [3539]: info: do_started: The local CRM is operational
May 28 15:56:55 an-node01 crmd: [3539]: info: do_state_transition: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
May 28 15:56:56 an-node01 crmd: [3539]: info: ais_dispatch_message: Membership 32: quorum retained
May 28 15:56:56 an-node01 crmd: [3539]: info: te_connect_stonith: Attempting connection to fencing daemon...
May 28 15:56:57 an-node01 crmd: [3539]: info: te_connect_stonith: Connected
May 28 15:56:59 an-node01 attrd: [3537]: info: cib_connect: Connected to the CIB after 1 signon attempts
May 28 15:56:59 an-node01 attrd: [3537]: info: cib_connect: Sending full refresh
May 28 15:57:56 an-node01 crmd: [3539]: info: crm_timer_popped: Election Trigger (I_DC_TIMEOUT) just popped! (60000ms)
May 28 15:57:56 an-node01 crmd: [3539]: WARN: do_log: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
May 28 15:57:56 an-node01 crmd: [3539]: info: do_state_transition: State transition S_PENDING -> S_ELECTION [ input=I_DC_TIMEOUT cause=C_TIMER_POPPED origin=crm_timer_popped ]
May 28 15:57:56 an-node01 crmd: [3539]: info: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
May 28 15:57:56 an-node01 crmd: [3539]: info: do_te_control: Registering TE UUID: 492bee5d-5336-4981-b74f-db6eb6e04f38
May 28 15:57:56 an-node01 crmd: [3539]: WARN: cib_client_add_notify_callback: Callback already present
May 28 15:57:56 an-node01 crmd: [3539]: info: set_graph_functions: Setting custom graph functions
May 28 15:57:56 an-node01 crmd: [3539]: info: unpack_graph: Unpacked transition -1: 0 actions in 0 synapses
May 28 15:57:56 an-node01 crmd: [3539]: info: do_dc_takeover: Taking over DC status for this partition
May 28 15:57:56 an-node01 cib: [3535]: info: cib_process_readwrite: We are now in R/W mode
May 28 15:57:56 an-node01 cib: [3535]: info: cib_process_request: Operation complete: op cib_master for section 'all' (origin=local/crmd/5, version=0.5.1): ok (rc=0)
May 28 15:57:56 an-node01 cib: [3535]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/6, version=0.5.2): ok (rc=0)
May 28 15:57:56 an-node01 cib: [3535]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/8, version=0.5.3): ok (rc=0)
May 28 15:57:56 an-node01 crmd: [3539]: info: join_make_offer: Making join offers based on membership 32
May 28 15:57:56 an-node01 crmd: [3539]: info: do_dc_join_offer_all: join-1: Waiting on 2 outstanding join acks
May 28 15:57:56 an-node01 cib: [3535]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/10, version=0.5.4): ok (rc=0)
May 28 15:57:56 an-node01 crmd: [3539]: info: ais_dispatch_message: Membership 32: quorum retained
May 28 15:57:56 an-node01 crmd: [3539]: info: crmd_ais_dispatch: Setting expected votes to 2
May 28 15:57:56 an-node01 crmd: [3539]: info: config_query_callback: Shutdown escalation occurs after: 1200000ms
May 28 15:57:56 an-node01 crmd: [3539]: info: config_query_callback: Checking for expired actions every 900000ms
May 28 15:57:56 an-node01 cib: [3535]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/13, version=0.5.5): ok (rc=0)
May 28 15:57:56 an-node01 crmd: [3539]: info: config_query_callback: Sending expected-votes=2 to corosync
May 28 15:57:56 an-node01 crmd: [3539]: info: update_dc: Set DC to an-node01.alteeve.com (3.0.5)
May 28 15:57:56 an-node01 crmd: [3539]: info: ais_dispatch_message: Membership 32: quorum retained
May 28 15:57:56 an-node01 crmd: [3539]: info: crmd_ais_dispatch: Setting expected votes to 2
May 28 15:57:56 an-node01 cib: [3535]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/16, version=0.5.6): ok (rc=0)
May 28 15:57:56 an-node01 crmd: [3539]: info: do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]
May 28 15:57:56 an-node01 crmd: [3539]: info: do_state_transition: All 2 cluster nodes responded to the join offer.
May 28 15:57:56 an-node01 crmd: [3539]: info: do_dc_join_finalize: join-1: Syncing the CIB from an-node01.alteeve.com to the rest of the cluster
May 28 15:57:56 an-node01 cib: [3535]: info: cib_process_request: Operation complete: op cib_sync for section 'all' (origin=local/crmd/17, version=0.5.6): ok (rc=0)
May 28 15:57:56 an-node01 cib: [3535]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/18, version=0.5.7): ok (rc=0)
May 28 15:57:56 an-node01 cib: [3535]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/19, version=0.5.8): ok (rc=0)
May 28 15:57:56 an-node01 crmd: [3539]: info: update_attrd: Connecting to attrd...
May 28 15:57:56 an-node01 cib: [3535]: info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='an-node01.alteeve.com']/transient_attributes (origin=local/crmd/20, version=0.5.9): ok (rc=0)
May 28 15:57:56 an-node01 cib: [3535]: info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='an-node02.alteeve.com']/transient_attributes (origin=an-node02.alteeve.com/crmd/7, version=0.5.10): ok (rc=0)
May 28 15:57:56 an-node01 attrd: [3537]: info: find_hash_entry: Creating hash entry for terminate
May 28 15:57:56 an-node01 attrd: [3537]: info: find_hash_entry: Creating hash entry for shutdown
May 28 15:57:56 an-node01 crmd: [3539]: info: erase_xpath_callback: Deletion of "//node_state[@uname='an-node01.alteeve.com']/transient_attributes": ok (rc=0)
May 28 15:57:56 an-node01 crmd: [3539]: info: do_dc_join_ack: join-1: Updating node state to member for an-node02.alteeve.com
May 28 15:57:56 an-node01 crmd: [3539]: info: do_dc_join_ack: join-1: Updating node state to member for an-node01.alteeve.com
May 28 15:57:56 an-node01 cib: [3535]: info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='an-node02.alteeve.com']/lrm (origin=local/crmd/21, version=0.5.11): ok (rc=0)
May 28 15:57:56 an-node01 cib: [3535]: info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='an-node01.alteeve.com']/lrm (origin=local/crmd/23, version=0.5.13): ok (rc=0)
May 28 15:57:56 an-node01 attrd: [3537]: info: crm_get_peer: Node an-node02.alteeve.com now has id: 1208199360
May 28 15:57:56 an-node01 crmd: [3539]: info: erase_xpath_callback: Deletion of "//node_state[@uname='an-node02.alteeve.com']/lrm": ok (rc=0)
May 28 15:57:56 an-node01 crmd: [3539]: info: erase_xpath_callback: Deletion of "//node_state[@uname='an-node01.alteeve.com']/lrm": ok (rc=0)
May 28 15:57:56 an-node01 crmd: [3539]: info: do_state_transition: State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED cause=C_FSA_INTERNAL origin=check_join_state ]
May 28 15:57:56 an-node01 crmd: [3539]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
May 28 15:57:56 an-node01 cib: [3535]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/25, version=0.5.15): ok (rc=0)
May 28 15:57:56 an-node01 crmd: [3539]: info: do_dc_join_final: Ensuring DC, quorum and node attributes are up-to-date
May 28 15:57:56 an-node01 crmd: [3539]: info: crm_update_quorum: Updating quorum status to true (call=27)
May 28 15:57:56 an-node01 crmd: [3539]: info: abort_transition_graph: do_te_invoke:173 - Triggered transition abort (complete=1) : Peer Cancelled
May 28 15:57:56 an-node01 crmd: [3539]: info: do_pe_invoke: Query 28: Requesting the current CIB: S_POLICY_ENGINE
May 28 15:57:56 an-node01 cib: [3535]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/27, version=0.5.17): ok (rc=0)
May 28 15:57:56 an-node01 attrd: [3537]: info: attrd_local_callback: Sending full refresh (origin=crmd)
May 28 15:57:56 an-node01 attrd: [3537]: info: attrd_trigger_update: Sending flush op to all hosts for: shutdown (<null>)
May 28 15:57:56 an-node01 attrd: [3537]: info: attrd_trigger_update: Sending flush op to all hosts for: terminate (<null>)
May 28 15:57:56 an-node01 pengine: [3538]: ERROR: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
May 28 15:57:56 an-node01 pengine: [3538]: ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
May 28 15:57:56 an-node01 pengine: [3538]: ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
May 28 15:57:56 an-node01 pengine: [3538]: notice: stage6: Delaying fencing operations until there are resources to manage
May 28 15:57:56 an-node01 crmd: [3539]: info: do_pe_invoke_callback: Invoking the PE: query=28, ref=pe_calc-dc-1306612676-9, seq=32, quorate=1
May 28 15:57:56 an-node01 crmd: [3539]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
May 28 15:57:56 an-node01 crmd: [3539]: info: unpack_graph: Unpacked transition 0: 2 actions in 2 synapses
May 28 15:57:56 an-node01 crmd: [3539]: info: do_te_invoke: Processing graph 0 (ref=pe_calc-dc-1306612676-9) derived from /var/lib/pengine/pe-input-16.bz2
May 28 15:57:56 an-node01 crmd: [3539]: info: te_rsc_command: Initiating action 3: probe_complete probe_complete on an-node01.alteeve.com (local) - no waiting
May 28 15:57:56 an-node01 attrd: [3537]: info: find_hash_entry: Creating hash entry for probe_complete
May 28 15:57:56 an-node01 attrd: [3537]: info: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
May 28 15:57:56 an-node01 crmd: [3539]: info: te_rsc_command: Initiating action 2: probe_complete probe_complete on an-node02.alteeve.com - no waiting
May 28 15:57:56 an-node01 crmd: [3539]: info: run_graph: ====================================================
May 28 15:57:56 an-node01 crmd: [3539]: notice: run_graph: Transition 0 (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-16.bz2): Complete
May 28 15:57:56 an-node01 attrd: [3537]: info: attrd_perform_update: Sent update 10: probe_complete=true
May 28 15:57:56 an-node01 crmd: [3539]: info: te_graph_trigger: Transition 0 is now complete
May 28 15:57:56 an-node01 crmd: [3539]: info: notify_crmd: Transition 0 status: done - <null>
May 28 15:57:56 an-node01 crmd: [3539]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
May 28 15:57:56 an-node01 crmd: [3539]: info: do_state_transition: Starting PEngine Recheck Timer
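One way to navigate a startup log this long is to filter for the crmd state machine transitions, which mark the milestones (election, integration, idle). A sketch, using two lines copied from the log above as sample input; on a live node the input would come from <span class="code">grep 'State transition' /var/log/messages</span> instead:

```shell
# Pull just the FSA state transitions out of captured log text.
# The two sample lines below are copied from the startup log above.
printf '%s\n' \
 "May 28 15:57:56 an-node01 crmd: [3539]: info: do_state_transition: State transition S_PENDING -> S_ELECTION [ input=I_DC_TIMEOUT cause=C_TIMER_POPPED origin=crm_timer_popped ]" \
 "May 28 15:57:56 an-node01 crmd: [3539]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]" \
 | grep -o 'S_[A-Z_]* -> S_[A-Z_]*'
```

This prints only the transitions themselves, one per line: <span class="code">S_PENDING -> S_ELECTION</span>, then <span class="code">S_TRANSITION_ENGINE -> S_IDLE</span>.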
Check that the processes are running:

<source lang="bash">
ps axf
</source>
<source lang="text">
  PID TTY      STAT   TIME COMMAND
...
 3019 ?        Ssl    0:07 corosync
 3530 pts/0    S      0:00 pacemakerd
 3534 ?        Ss     0:00  \_ /usr/lib64/heartbeat/stonithd
 3535 ?        Ss     0:00  \_ /usr/lib64/heartbeat/cib
 3536 ?        Ss     0:00  \_ /usr/lib64/heartbeat/lrmd
 3537 ?        Ss     0:00  \_ /usr/lib64/heartbeat/attrd
 3538 ?        Ss     0:00  \_ /usr/lib64/heartbeat/pengine
 3539 ?        Ss     0:00  \_ /usr/lib64/heartbeat/crmd
</source>
Check that there were no errors:

<source lang="bash">
grep ERROR: /var/log/messages | grep -v unpack_resources
# If anything returns, address the errors, clear /var/log/messages and try starting pacemaker again. Repeat until no errors are returned.
</source>
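The <span class="code">grep -v unpack_resources</span> filter exists because, as the log above shows, the policy engine reports the missing-STONITH condition as <span class="code">ERROR</span> lines even on a healthy, not-yet-configured cluster. A quick sketch of how the pipeline behaves, run against two fabricated sample lines:

```shell
# Two made-up log lines: one benign pengine error the filter should hide,
# and one other error that should get through.
printf '%s\n' \
 'May 28 15:57:56 an-node01 pengine: [3538]: ERROR: unpack_resources: Resource start-up disabled' \
 'May 28 15:57:56 an-node01 crmd: [3539]: ERROR: some_other_problem: details here' \
 | grep ERROR: | grep -v unpack_resources
```

Only the <span class="code">some_other_problem</span> line survives the filter; on a healthy, freshly started cluster, the real command should print nothing at all.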
== Tools ==
The core tool is <span class="code">crm</span>, the "cluster resource manager" shell. Run by itself, it starts an interactive shell. Alternatively, it can be passed a single argument naming a file of commands to run, or commands can be redirected in on [[STDIN]].
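A minimal sketch of the batch style; the file name is our own choice, and actually invoking <span class="code">crm</span> requires a running cluster node:

```shell
# Write a couple of crm shell commands to a file; the name is arbitrary.
cat > /tmp/crm-batch <<'EOF'
status
configure show
EOF

# Either of these would then run the commands in batch on a cluster node:
#   crm /tmp/crm-batch
#   crm < /tmp/crm-batch
```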
The main tool for monitoring the cluster is <span class="code">crm_mon</span>, a variant of <span class="code">crm status</span> that, by default, keeps running and refreshes as the cluster state changes.
Pacemaker tools print their main usage information when passed <span class="code">--help</span>. The same help is available via their <span class="code">[[man]]</span> pages.
Check the cluster's status (<span class="code">-1</span> prints the status once and exits):

<source lang="bash">
crm_mon -1
</source>
<source lang="text">
============
Last updated: Sun May 29 12:26:26 2011
Stack: openais
Current DC: an-node01.alteeve.com - partition with quorum
Version: 1.1.5-5.el6-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
0 Resources configured.
============

Online: [ an-node02.alteeve.com an-node01.alteeve.com ]
</source>
== Building an Active/Passive Cluster ==
cmirror
cmirror - DOC-55285 - Jonathan Brassow - visegrips - #lvm