RHCS v3 cluster.conf
{{howto_header}}
'''NOTICE''': Do not trust this document until all "Q." are answered and removed.
'''NOTICE''': This is a work in progress and likely contains errors and omissions.
In [[RHCS]], the <span class="code">/etc/cluster/cluster.conf</span> file is the "main" configuration file for setting up the cluster and its nodes and resources.
In cluster version 3, you can technically load cluster configurations from many places. Most options are available in <span class="code">cluster.conf</span> though, so it is a logical place to set most values.
= Format =
The <span class="code">cluster.conf</span> file is an [[XML]]-formatted file that must validate against either <span class="code">cluster.rng</span> (cluster 3) or <span class="code">cluster.ng</span> (RHEL 5.x and older). If it fails to validate, the cluster will not use your file. Once you finish editing your <span class="code">cluster.conf</span> file, test it via <span class="code">xmllint</span>:
<source lang="bash">
xmllint --relaxng /usr/share/cluster/cluster.rng /etc/cluster/cluster.conf
</source>
Change the path to and name of your <span class="code">cluster.[r]ng</span> file above if needed. Do not try to use your new configuration until it validates.
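On success, <span class="code">xmllint</span> echoes the parsed document and then prints a short verdict. Adding the <span class="code">--noout</span> switch suppresses the echo so only the verdict remains. A sketch of a successful run (the exact wording may vary by version):
<source lang="bash">
xmllint --relaxng /usr/share/cluster/cluster.rng --noout /etc/cluster/cluster.conf
# Expected output on success:
#   /etc/cluster/cluster.conf validates
</source>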
The <span class="code">cluster.conf</span> file should be in the format:
<source lang="xml">
<?xml version="1.0"?>
<cluster name="an-cluster" config_version="14">
    <...>
</cluster>
</source>
Tags may or may not have child elements. If a tag does not, then put all of its attributes in one self-closing tag.
<source lang="xml">
<foo a="x" b="y" c="z" />
</source>
If the tag does accept child elements, then use a start and end tag with the child elements inside. The opening tag may or may not have attributes. This example shows a parent element containing one child element.
<source lang="xml">
<section foo="x" bar="y">
    <baz a="x" b="y" c="z" />
</section>
</source>
= Sections =
There are multiple sections, most of which are optional and can be omitted if not used.
== cluster; The Parent Element ==
All tags and elements must be inside the parent <span class="code">cluster</span> tag.
It has only two attributes: <span class="code">name</span> and <span class="code">config_version</span>.
Please see <span class="code">man 5 cluster.conf</span> for more details.
=== name ===
This attribute names the cluster. The name you choose will be important, as you will use it elsewhere in your cluster. An example would be when creating a <span class="code">GFS2</span> partition.
* No default.
=== config_version ===
This is the current version of the <span class="code">cluster.conf</span> file. Every time you make a change, you must increment this value by one. The cluster software refers to this value when determining which configuration file to use and to push to other nodes.
* No default.
* Must be a [http://en.wikipedia.org/wiki/Natural_number natural number].
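A typical edit cycle on a running cluster 3 node might look like the following sketch. It assumes the cluster is already up; <span class="code">cman_tool version -r</span> asks <span class="code">cman</span> to activate the new version and propagate it to the other nodes.
<source lang="bash">
# 1. Edit cluster.conf, incrementing config_version (say, 14 -> 15).
vim /etc/cluster/cluster.conf

# 2. Validate the edited file before using it.
xmllint --relaxng /usr/share/cluster/cluster.rng --noout /etc/cluster/cluster.conf

# 3. Activate and push the new configuration to the other nodes.
cman_tool version -r
</source>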
=== Example ===
This names the cluster <span class="code">an-cluster</span> and sets the version to <span class="code">1</span>. All other cluster configurations must be contained inside this start and end tag.
<source lang="xml">
<?xml version="1.0"?>
<cluster name="an-cluster" config_version="1">
    <!-- All cluster configuration options go here. -->
</cluster>
</source>
== cman; The Cluster Manager ==
The <span class="code">cman</span> tag is used to define general cluster configuration options. For example, it sets the number of expected votes, whether the cluster is running in the special two-node state and so forth.
If we had no need for cman arguments, we'd just put in the self-closing tag:
<source lang="xml">
<cman/>
</source>
=== two_node ===
This allows you to configure a cluster with only two nodes. Normally, the loss of quorum after one of two nodes fails prevents the remaining node from continuing (if both nodes have one vote). The default is '0'. To enable a two-node cluster, set this to '1'. If this is enabled, you must also set 'expected_votes' to '1'.
* Default is <span class="code">0</span> (disabled).
* Must be set to <span class="code">0</span> or <span class="code">1</span>.
=== expected_votes ===
This is used by <span class="code">cman</span> to determine quorum. The cluster is "quorate" if the sum of votes of members is over half of the expected votes value. By default, <span class="code">cman</span> sets the expected votes value to be the sum of votes of all nodes listed in <span class="code">cluster.conf</span>. This can be overridden by setting an explicit <span class="code">expected_votes</span> value. When setting <span class="code">two_node</span> to <span class="code">1</span>, this must be set to <span class="code">1</span> as well. Please see <span class="code">[[#clusternode|clusternode]]</span> in the <span class="code">[[#cluster|cluster]]</span> section for more info.
* No default.
* Must be a [http://en.wikipedia.org/wiki/Natural_number natural number].
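As an illustration (not from the original document), a three-node cluster where each node has one vote would be quorate with any two votes present. Setting the value explicitly would look like:
<source lang="xml">
<!-- Three nodes, one vote each; the cluster is quorate with 2 of 3 votes. -->
<cman expected_votes="3" />
</source>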
<source lang="xml">
<!-- <cman expected_votes="1" /> -->
<!--
    Set this to 'yes' when you are performing a rolling upgrade of the
    cluster between major releases.
    Q. Does this mean cman version, distro version, ...?
-->
<!-- <cman upgrading="no" /> -->
<!--
    This option controls cman's "Disallowed" mode. Setting this to '1' may
    improve backwards compatibility. The default is '0', disabled.
    Q. How and where exactly?
-->
<!-- <cman disallowed="0" /> -->
<!--
    This is the number of milliseconds after a qdisk poll before a quorum
    disk is considered dead. The quorum disk daemon, qdisk, periodically
    sends "hello" messages to cman and ais, indicating that qdisk is
    present. If cman doesn't receive a "hello" message in the time set
    here, cman will declare qdisk dead and generate error messages
    indicating that the connection to the quorum device has been lost.
    Q. Are quorum disks still useful?
-->
<!-- <cman quorum_dev_poll="50000" /> -->
<!--
    This is the number of milliseconds to wait for a service to respond
    during a shutdown.
    Q. What happens after this time?
    Q. Does this refer to crm/pacemaker controlled services or any service?
-->
<!-- <cman shutdown_timeout="5000"/> -->
<!--
    No info.
-->
<!-- <cman ccsd_poll="1000"/> -->
<!--
    No info.
-->
<!-- <cman debug_mask="?"/> -->
<!--
    No info. Is this for the primary totem ring?
-->
<!-- <cman port="?"/> -->
<!--
    No info.
-->
<!-- <cman cluster_id="?"/> -->
<!--
    Enable stronger hashing of the cluster ID to avoid collisions.
    Q. How? What is an example value?
-->
<!-- <cman hash_cluster_id=""/> -->
<!--
    Local node name; this is set internally by cman-preconfig and should
    never be set unless you understand the repercussions of doing so. It
    is here for completeness only.
-->
<!-- <cman nodename="?"/> -->
<!--
    Enable 'cman' broadcast. To enable, set this to 'yes'. The default is
    'no', disabled.
    Q. Under what conditions would this be enabled?
-->
<!-- <cman broadcast="no"/> -->
<!--
    No info.
-->
<!-- <cman keyfile="?"/> -->
<!--
    No info.
-->
<!-- <cman disable_openais="?"/> -->
<!--
    This provides the ability for a user to specify a multicast address
    instead of using the multicast address generated by cman. If a user
    does not specify a multicast address, cman creates one. It forms the
    upper 16 bits of the multicast address with 239.192 and forms the
    lower 16 bits based on the cluster ID.
    Q. Does this have to do with the totem ring?
    Q. What generates the cluster ID when it's not specified by the user?
-->
<!-- <cman multicast=""/> -->
<!--
    This is where you can define a multicast address. If you specify a
    multicast address, ensure that it is in the 239.192.0.0/16 network
    which cman uses. Using a multicast address outside this range is
    untested.
    Q. Is this for the first totem ring?
-->
<!-- <cman addr=""/> -->
<!-- Usage examples for the 'cman' argument. -->
<!-- This example shows the use of cman arguments to set up a two-node
    cluster. -->
<cman two_node="1" expected_votes="1" />
<!-- Totem Ring and the Redundant Ring Protocol -->
<!--
    This controls the OpenAIS message transport protocol.
    Q. Does this also control corosync?
    Q. Are there specific arguments for either?
-->
<!--
    This defines how many milliseconds to wait for consensus. If this
    timeout is reached, the cluster will give up and attempt to form a new
    cluster configuration. The default is '200' (0.2 seconds).
-->
<!-- <totem consensus="200"> -->
<!--
    This tells the totem protocol how long to wait, in milliseconds, for
    JOIN messages to come from nodes. The default is '100' (0.1 seconds).
-->
<!-- <totem join="100"> -->
<!--
    This sets the maximum amount of time, in milliseconds, the totem
    protocol will wait for a token. If this time elapses, the cluster will
    be reformed, which takes approximately 50 milliseconds. The total
    reconfiguration time is, then, the sum of this value plus the
    reformation time. The default value is '5000' (5 seconds).
-->
<!-- <totem token="5000"> -->
<!--
    No info.
-->
<!-- <totem fail_recv_const=""> -->
<!--
    This controls how many times the totem protocol will attempt to
    retransmit a token before giving up and forming a new configuration.
    If this is set, 'retransmit' and 'hold' will be calculated
    automatically using 'retransmits_before_loss' and 'token'.
-->
<!-- <totem token_retransmits_before_loss_const=""> -->
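<!--
    An illustrative combination (example values only, not defaults): a
    longer token timeout paired with an explicit retransmit count.
-->
<!-- <totem token="10000" token_retransmits_before_loss_const="10"> -->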
<!--
    This attribute specifies the redundant ring protocol mode. It can be
    set to 'active', 'passive', or 'none'. Active replication offers
    slightly lower latency from transmit to delivery in faulty network
    environments, but with less performance. Passive replication may
    nearly double the speed of the totem protocol if the protocol doesn't
    become CPU bound. The final option is 'none', in which case only one
    network interface is used to operate the totem protocol. If only one
    interface directive is specified, 'none' is automatically chosen. If
    multiple interface directives are specified, only 'active' or
    'passive' may be chosen.
    NOTE: Be sure to set this if you are using redundant rings!
    NOTE: If you wish to use a redundant ring, it must be configured in
    NOTE: each node's <clusternode...> entry. See below for an example.
-->
<!-- <totem rrp_mode="passive"> -->
<!--
    This attribute specifies whether HMAC/SHA1 authentication should be
    used to authenticate all messages or not. It further specifies that
    all data should be encrypted with the sober128 encryption algorithm
    to protect data from eavesdropping. This can be 'on' or 'off'. The
    default is 'on'.
    If the totem ring is on a private, secure network, disabling this can
    improve performance. Please test to see if the extra performance is
    worth the reduced security.
    Q. Is the default actually 'on'?
-->
<!-- <totem secauth="on"> -->
<!--
    No info.
-->
<!-- <totem keyfile=""> -->
<!--
    Totem 'interface' arguments:
    You can specify one or two '<interface...>' arguments within
    '<totem...></totem>'.
-->
<!--
<totem ...>
    "ringnumber" is '0' or '1' and defines the ring as the primary
    or secondary ring. Currently, only two rings are supported.
    "bindnetaddr" must match the subnet of the interface you want
    the ring to use. The final octet must be '0'. This can be an
    IPv6 address, however, you will be required to set the 'nodeid'
    in the '<totem...>' section above. Further, there will be no
    automatic interface selection within a specified subnet as
    there is with IPv4.
    "mcastaddr" is the multicast address used by the totem
    protocol. Avoid the '224.0.0.0/8' range as that is used for
    configuration. If you use an IPv6 address, be sure to specify a
    'nodeid' in the 'totem' directive above.
    "mcastport" is the UDP port used with the multicast address
    above.
    "broadcast" is not defined...
    <interface ringnumber="0" bindnetaddr="192.168.1.0"
        mcastaddr="226.94.1.1" mcastport="5405" broadcast="" />
</totem>
-->
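<!--
    Putting the pieces together (illustrative values only; adjust the
    networks, addresses and port to suit your environment): a two-ring,
    passive-mode redundant ring setup might look like this.
<totem rrp_mode="passive" secauth="on">
    <interface ringnumber="0" bindnetaddr="192.168.1.0"
        mcastaddr="239.192.1.1" mcastport="5405" />
    <interface ringnumber="1" bindnetaddr="192.168.2.0"
        mcastaddr="239.192.2.1" mcastport="5405" />
</totem>
-->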
<!-- Quorum Daemon -->
<!-- Options must be combined in one <quorumd... /> statement. -->
<!--
    In older versions of RHCS, a quorum partition was used to maintain
    quorum with the network acting as a fall-back. This eventually faded
    out of fashion and quorum disk partitions were rarely used. Today,
    quorum partitions are still not required, but they are coming back
    into fashion as a way to improve the reliability of a cluster in a
    multiple failed state and to provide more intelligent quorum.
    Let's look at a couple of examples:
    1. If you have a four-node cluster and two nodes fail, the surviving
       two nodes will not have quorum because normal quorum requires a
       majority (n/2+1). In this case, your cluster would shut down when
       it could have kept going. Adding a quorum disk would have allowed
       the surviving two nodes to maintain quorum.
    2. If you have a four-node cluster and a network event occurred where
       only one node retained access to a critical network, you would want
       that one node to proceed and you would rather fence the three nodes
       that lost access. Under normal IP quorum, the opposite would happen
       because, by simple majority, the one good node would be fenced by
       the three other nodes. The quorumd daemon can have heuristics
       added. In this case, we would configure each node's quorumd to
       first check that critical network connection. The three nodes would
       see that they'd lost the link and remove themselves from the
       cluster. In this way, only the one good node would remain up and
       win quorum thanks to the votes assigned to the quorum disk.
    In short, the quorum disk allows a much more fine-grained control of
    quorum in corner-case failure states.
    This section is not required and can be left out when you aren't
    using a quorum disk partition.
    A quorum partition cannot be used in clusters greater than 16 nodes.
    This is due to the latency caused by clusters larger than 16 nodes
    making quorum disks unreliable. With 17 or more nodes, you must use
    IP-based (totem protocol) quorum only.
    A quorum disk must be a raw 10MB or larger (11MB recommended)
    partition on an iSCSI or SAN device. It is recommended that your
    nodes use multipath to access the quorum disk. You cannot use a CLVM
    partition.
    Q. On a 2-node DRBD partition, can a raw 10MB partition be used? This
       is probably irrelevant as there is the 'two_node' cman option, but
       might be useful for the heuristics in a split brain.
    See: http://magazine.redhat.com/2007/12/19/enhancing-cluster-quorum-with-qdisk/
-->
<!--
    This controls how often, in seconds, the quorum daemon on a node
    will attempt to write its status to the quorum disk and read the
    status of other nodes. The higher this value is, the less chance that
    a transient error will dissolve quorum, but the longer it will take
    to detect and recover from a failure. The default is '2'. Please see
    the '<heuristics...>' section below for the heuristics interval.
    Q. Is this accurate?
    Q. Does this control the heuristics or the disk poll?
-->
<!-- <quorumd interval="2" /> -->
<!--
    If a node fails the heuristics checks and/or fails to contact the
    quorum disk after this many intervals, it will be declared dead and
    will be fenced (a "Technical Knock Out"). To determine how long this
    will actually take, multiply 'interval' by 'tko' and you will have
    the value in seconds.
    If you are using Oracle RAC, be sure that this and the 'interval'
    value are high enough to give the RAC a chance to react to a failure
    first. So if your RAC timeout is set to 60 seconds, and you are using
    the default 'interval' of '2', it is recommended to set this to at
    least '35' (70 seconds).
    Q. Is there a modern variant on the 'cman_deadnode_timeout' and, if
       so, does interval*tko still need to be lower?
-->
<!-- <quorumd tko="" /> -->
<!--
    This is the number of votes assigned to the quorum disk. This value
    should be the total number of votes of your cluster minus the minimum
    number of nodes your cluster can operate with. For example, if you
    have a four-node cluster that can operate with just one node, you
    would set this to '3' (4-1). This value must be set when using a
    quorum disk, as there is no default.
    Q. Is this true, or would the votes be calculated?
-->
<!-- <quorumd votes="" /> -->
<!--
    The minimum score for a node to be considered alive. If omitted or
    set to 0, the default function, floor((n+1)/2), is used, where n is
    the sum of the heuristics scores. The minimum score value must never
    exceed the sum of the heuristic scores; if set higher, it will be
    impossible for the heuristics tests to pass. If the resulting score
    is below this value, the node will reboot to try and return in a
    better state.
    Q. Does it reboot after one failure?
-->
<!-- <quorumd min_score="" /> -->
<!--
    The storage device the quorum daemon uses. The device must be the
    same on all nodes. It has no default and must be set unless you set
    'label' below. For example, if you created your quorum disk with the
    call:
        mkqdisk -c /dev/sdi1 -l rac_qdisk
    this would be set to '/dev/sdi1'. When possible, set the 'label'
    option below instead, as it is more robust. If you use 'label'
    instead of this, then the device does *not* need to be the same among
    nodes. In short, don't set this unless you have a good reason to.
    Q. Is this true?
-->
<!-- <quorumd device="" /> -->
<!--
    Specifies the quorum disk label created by the 'mkqdisk' utility. If
    you look at the example given in the 'device' argument above, then
    this would be 'rac_qdisk'. Setting this instead of 'device' is
    preferable. If you set this, then 'device' is in fact ignored.
    If this field is used, the quorum daemon reads '/proc/partitions' and
    checks for qdisk signatures on every block device found, comparing
    the label against the value below. This is useful in configurations
    where the quorum device name differs among nodes.
-->
<!-- <quorumd label="" /> -->
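<!--
    A combined example (hypothetical values, not from the original
    document): a four-node cluster that may degrade to a single node,
    polling every 2 seconds and declaring a node dead after 10 missed
    intervals (20 seconds).
-->
<!-- <quorumd interval="2" tko="10" votes="3" label="rac_qdisk" /> -->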
<!-- DLM; The Distributed Lock Manager -->
<!-- Options must be combined in one <dlm... /> statement. -->
<!--
    This tells DLM to automatically determine whether to use TCP or
    SCTP depending on the 'rrp_mode'. You can force one protocol by
    setting this to 'tcp' or 'sctp'. If 'rrp_mode' is 'none', then 'tcp'
    is used. The default is 'detect'.
-->
<!-- <dlm protocol="detect" /> -->
<!--
    This specifies how many 100ths of a second (centiseconds) to wait
    before dlm emits a warning via netlink. This value is used for
    deadlock detection and only applies to lockspaces created with the
    DLM_LSFL_TIMEWARN flag. The default is 5 seconds ('500').
-->
<!-- <dlm timewarn="500" /> -->
<!--
    Setting this to '1' will enable DLM debug messages. The default is
    '0' (disabled).
    Q: Do these messages go to /var/log/messages ?
-->
<!-- <dlm log_debug="0" /> -->
<!-- DLM daemon options -->
<!--
    This controls the fencing recovery dependency. The default is
    enabled, '1'. Set this to '0' to disable the fencing dependency.
    Q. Does this allow cman to start when no fence device is configured?
-->
<!-- <dlm enable_fencing="1" /> -->
<!--
    This controls the quorum recovery dependency. The default is enabled,
    '1'. Set this to '0' to disable the quorum dependency.
    Q. Does this mean that a non-quorum partition will attempt to
       continue functioning?
-->
<!-- <dlm enable_quorum="0" /> -->
<!--
    This controls the deadlock detection code. The default is '0',
    disabled. Set this to '1' to enable deadlock detection.
    Q. Is this primarily a debugging tool?
-->
<!-- <dlm enable_deadlk="0" /> -->
<!--
    This controls the posix lock code for clustered file systems. This is
    required by cluster-aware filesystems like GFS2, OCFS2 and similar.
    In some cases though, like Oracle RAC, plock is implemented
    internally and thus needs to be disabled in the cluster. Also, plock
    can be expensive in terms of latency and bandwidth. Disabling this
    may help improve performance, but should only be done if you are sure
    you do not need posix locking in your cluster. The default is '1',
    enabled. To disable it, set this to '0'.
    Unlike 'flock' (file lock), which locks an entire file, plock allows
    for locking parts of a file. When a plock is set, the filesystem must
    know the start and length of the lock. In clustering, this
    information is sent between the nodes via cpg (the cluster process
    group), which is a small process layer on top of the totem protocol
    in corosync. Messages are of the form 'take lock (pid, inode, start,
    length)'. Delivery of these messages is kept in the same order on all
    nodes (total order), which is a property of 'virtual synchrony'. For
    example, if you have three nodes; A, B and C, and each node sends two
    messages, cpg ensures that the messages all arrive in the same order
    across all nodes. For example, the messages may arrive as
    'c1,a1,a2,b1,b2,c2'. The actual order doesn't matter, so long as it
    is the same on every node.
    For more information on posix locks, see the 'fcntl' man page and
    read the sections on 'F_SETLK' and 'F_GETLK'.
    For more information on cpg, install the corosync development
    libraries (corosynclib-devel) and then read the 'cpg_overview' man
    page.
-->
<!-- <dlm enable_plock="1" /> -->
<!--
    This controls the rate of plock operations per second. The default is
    '0', which is "unlimited". Set a positive whole integer to impose a
    limit. This may be needed if excessive plock messages are causing
    network load issues.
-->
<!-- <dlm plock_rate_limit="0"/> -->
<!--
    This controls the plock ownership function. When enabled, performance
    gains may be seen where a given node repeatedly issues the same lock.
    By default, this is set to '1', enabled. This can affect backward
    compatibility with older versions of dlm. To disable it, set this to
    '0'.
    Q. Is this right? This should be explained better.
-->
<!-- <dlm plock_ownership="1" /> -->
<!--
    This is the number of milliseconds to wait before dropping the cache
    of lock information. The default is 10 seconds (10000). The lower
    this value, the better the performance, but the more memory will be
    used.
    NOTE: This value is ignored when 'plock_ownership' is disabled.
    Q. Is this right?
-->
<!-- <dlm drop_resources_time="10000" /> -->
<!--
    This is the number of cached items to attempt to drop each
    'drop_resources_time' milliseconds. The higher this number, the
    better the potential performance, but the more memory will be used.
    NOTE: This value is ignored when 'plock_ownership' is disabled.
    Q. Is this right?
-->
<!-- <dlm drop_resources_count="10" /> -->
<!--
    This is the number of milliseconds that a cached item is allowed to
    go unused before it is set to be dropped. The default is 10 seconds
    (10000). The lower this value, the better the performance, but the
    more memory will be used.
    NOTE: This value is ignored when 'plock_ownership' is disabled.
    Q. Is this right?
-->
<!-- <dlm drop_resources_age="10000" /> -->
<!-- All default DLM options listed below. -->
<dlm protocol="detect" timewarn="500" log_debug="0" enable_fencing="1"
    enable_quorum="0" enable_deadlk="0" enable_plock="1"
    plock_rate_limit="0" plock_ownership="1"
    drop_resources_time="10000" drop_resources_count="10"
    drop_resources_age="10000" />
<!-- GFS Control Daemon -->
<!--
    There are several <gfs_controld...> arguments that are still
    supported, but they have been deprecated in favour of the
    <dlm_controld...> arguments. To see a full list, please read the
    'gfs_controld(8)' man page.
    The one remaining argument that is still current is
    'enable_withdraw'. When set to '1', the default, GFS will respond to
    a withdrawal. To disable the response, set this to '0'.
    Q. What does the response actually do?
-->
<gfs_controld enable_withdraw="1"/>
<!-- Cluster Nodes -->
<clusternodes>
    <!-- AN!Cluster Node 1 -->
    <!--
        The clusternode 'name' value must match the name returned by
        `uname -n`. The network interface with the IP address mapped to
        this name will be the network used by the totem ring. The totem
        ring is used for cluster communication and reconfiguration, so
        all nodes must use network interfaces on the same network for
        the cluster to form. For the same reason, this name must not
        resolve to the localhost IP address (127.0.0.1/::1).
        Optional <clusternode ...> arguments:
        - weight="#"; This sets the DLM lock directory weight. This is
          a DLM kernel option.
          Q. This needs better explaining.
    -->
    <clusternode name="an-node01.alteeve.com" nodeid="1">
        <!--
            By default, an initial totem ring will be created on
            the interface that maps to the name above. Under
            Corosync, this would have been "ring 0".
            To set up a second totem ring, the 'name' must be
            resolvable to an IP address on the network card you
            want your second ring on. Further, all other nodes must
            be set up to use the same network as their second ring
            as well.
            NOTE: Currently broken, do not use until this warning
            NOTE: has been removed.
        -->
        <!--
        <altname name="an-node01-sn" port="6899"
            mcast="239.94.1.1" />
        -->
        <!-- Fence devices attached to this node. -->
        <fence>
            <!--
                The entries here reference devices defined
                below in the <fencedevices/> section. The
                options passed control how the device is
                called. When multiple devices are listed, they
                are tried in the order that they are listed
                here.
                The 'name' argument must match a 'name'
                argument in the '<fencedevice>' section below.
                The details must define how 'fenced' will fence
                *this* node.
                The 'method' name seems not to be passed to the
                fence agent and is useful to the human reader
                only?
                All options here are passed as 'var=val' to the
                fence agent, one per line.
                Note that 'action' was formerly known as
                'option'. In the 'fence_na' agent, 'option'
                will be converted to 'action' if used.
            -->
            <method name="node_assassin">
                <device name="batou" port="01"
                    action="reboot"/>
            </method>
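            <!--
                If this node had a second fence device, a
                fall-back method could be added like this
                (illustrative only; it re-uses the 'motoko'
                device defined below). Methods are tried in
                the order they are listed.
            <method name="backup">
                <device name="motoko" port="01"
                    action="reboot"/>
            </method>
            -->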
        </fence>
    </clusternode>
    <!-- AN!Cluster Node 2 -->
    <clusternode name="an-node02.alteeve.com" nodeid="2">
        <!-- Commented out, as with node 1, until 'altname' works. -->
        <!--
        <altname name="an-node02-sn" port="6899"
            mcast="239.94.1.1" />
        -->
        <fence>
            <method name="node_assassin">
                <device name="batou" port="02"
                    action="reboot"/>
            </method>
        </fence>
    </clusternode>
</clusternodes>
<!--
    The fence device is mandatory and it defines how the cluster will
    handle nodes that have dropped out of communication. In our case,
    we will use the Node Assassin fence device.
-->
<fencedevices>
    <!--
        This names the device, the agent (script) that controls it,
        where to find it and how to access it.
    -->
    <fencedevice name="batou" agent="fence_na"
        ipaddr="batou.alteeve.com" login="section9"
        passwd="project2501" quiet="1"></fencedevice>
    <fencedevice name="motoko" agent="fence_na"
        ipaddr="motoko.alteeve.com" login="section9"
        passwd="project2501" quiet="1"></fencedevice>
    <!--
        If you have two or more fence devices, you can add the extra
        one(s) below. The cluster will attempt to fence a bad node
        using these devices in the order that they appear.
    -->
</fencedevices>
<!--
    When the cluster starts, any nodes not yet in the cluster may be
    fenced. By default, there is a 6 second buffer, but this isn't very
    much time. The following argument increases the time window where
    other nodes can join before being fenced. I like to give up to one
    minute, but the Red Hat man page suggests 20 seconds. Please do your
    own testing to determine what time is needed for your environment.
-->
<fence_daemon post_join_delay="60"/>
</source>
= Examples =
Examples of Fedora 13 <span class="code">cluster.conf</span> configurations.
Examples of CentOS 5 <span class="code">cluster.conf</span> configurations.