Saturday, March 29, 2008

Clusters

The High Availability Linux Project


The basic goal of the High Availability Linux project is to:

Provide a high availability (clustering) solution for Linux which promotes reliability, availability, and serviceability (RAS) through a community development effort.

The Linux-HA project is a widely used and important component in many interesting High Availability solutions, and ranks among the best HA software packages for any platform. We estimate that more than thirty thousand installations have been put into mission-critical use in the real world since 1999. Interest in this project continues to grow: these web pages average nearly 20,000 hits per day, and we see more than 100 downloads of Heartbeat per day.

Heartbeat now ships as part of SUSE Linux, Mandriva Linux, Debian GNU/Linux, Ubuntu Linux, Red Flag Linux, and Gentoo Linux. Ultra Monkey and several companies' embedded systems are also based on it. Although this is called the Linux-HA project, the software is highly portable and runs on FreeBSD, Solaris, and OpenBSD, and even on Mac OS X from time to time.

There have been many articles and several chapters in books written on this project and software. See the PressRoom for more details.

We are now competitive with commercial systems similar to those described in D. H. Brown's analyses of RAS cluster features and functions (1998 and March 2000). The release 2 series brings technologies and basic capabilities that match or exceed the capabilities of many commercial HA systems. We think you'll be surprised. An R2 getting started guide is available.

We include advanced integration with the DRBD real-time disk replication software, and also work well with the LVS (Linux Virtual Server) project. We expect to continue to collaborate with them in the future, since our goals are complementary.

We have a page of reference sites to provide a few real-life examples of how organizations both small and large use Heartbeat in production. Submissions for this page are actively encouraged.

Heartbeat is a leading implementor of the Open Cluster Framework (OCF) standard.


What Linux-HA can do now

Heartbeat currently supports a very sophisticated dependency model for n-node clusters. It is both extremely useful and quite stable at this point in time. The following types of applications are typical:

Database servers
ERP applications
Web servers
LVS director (load balancer) servers
Mail servers
Firewalls
File servers
DNS servers
DHCP servers
Proxy Caching servers
Custom applications
etc.
Heartbeat is used in virtually every market segment, industry, and organization size.

Heartbeat

The Heartbeat program is one of the core components of the Linux-HA (High-Availability Linux) project. Heartbeat is highly portable: it runs on every known Linux platform, as well as on FreeBSD and Solaris, and ports to other operating systems are in progress.

Heartbeat was the first piece of software written for the Linux-HA project. It performs death-of-node detection, communications, and cluster management in a single process.
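Death-of-node detection can be thought of as a timeout check on each peer. The sketch below is a simplified Python model, not Heartbeat's actual implementation: the class name `PeerMonitor` is hypothetical, and its `warntime` and `deadtime` fields merely mirror the ha.cf directives of the same names used in the sample configurations (seconds of silence before a late-heartbeat warning, and before a node is declared dead).

```python
from dataclasses import dataclass, field


@dataclass
class PeerMonitor:
    """Hypothetical, simplified model of timeout-based death-of-node detection."""
    warntime: float = 5.0    # seconds of silence before a "late heartbeat" warning
    deadtime: float = 10.0   # seconds of silence before a peer is declared dead
    last_seen: dict[str, float] = field(default_factory=dict)

    def heartbeat(self, node: str, now: float) -> None:
        """Record that a heartbeat from `node` arrived at time `now`."""
        self.last_seen[node] = now

    def status(self, node: str, now: float) -> str:
        """Classify a peer by how long it has been silent."""
        if node not in self.last_seen:
            return "unknown"          # never heard from this peer
        silence = now - self.last_seen[node]
        if silence >= self.deadtime:
            return "dead"             # in a real cluster: take over its resources
        if silence >= self.warntime:
            return "late"             # in a real cluster: log a warning
        return "alive"


if __name__ == "__main__":
    mon = PeerMonitor()
    mon.heartbeat("silas", now=0.0)
    for t in (1.0, 6.0, 10.0):
        print(t, mon.status("silas", now=t))   # alive, late, dead
```

With keepalive 1 and deadtime 10, a peer has to stay silent for roughly ten heartbeat intervals before it is declared dead; a shorter deadtime speeds up failover at the cost of more false alarms on a congested network.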


Sample Cluster Configuration



In these examples, the server names for our cluster will be paul and silas. The cluster is assumed to send heartbeats on both the eth0 and eth1 Ethernet interfaces. The ServiceAddresses on which services will run are 1.2.3.4 and 1.2.3.5; both are on the 1.2.3.0/24 subnet, with the default route pointing to 1.2.3.254.



a) A Basic Single IP Address Configuration

The most common and basic configuration is a high-availability server that simply provides a single IP address (1.2.3.4) to be failed over between the nodes of our cluster. This is an active/passive configuration.


/etc/ha.d/ha.cf file

This is for Heartbeat 1.2.x

logfacility daemon # Log to syslog as facility "daemon"
node paul silas # List our cluster members
keepalive 1 # Send one heartbeat each second
deadtime 10 # Declare nodes dead after 10 seconds
bcast eth0 eth1 # Broadcast heartbeats on eth0 and eth1 interfaces
ping 1.2.3.254 # Ping our router to monitor ethernet connectivity
auto_failback no # Don't fail back to paul automatically
respawn hacluster /usr/lib/heartbeat/ipfail # Failover on network failures

This is for Heartbeat 2.0.x without CRM

logfacility daemon
keepalive 1
deadtime 10
warntime 5
initdead 120 # depends on your hardware
udpport 694
ping 1.2.3.254
bcast eth0
auto_failback off
node paul
node silas
respawn hacluster /usr/lib/heartbeat/ipfail
use_logd yes

This is for Heartbeat 2.0.x with CRM

logfacility daemon
keepalive 1
deadtime 10
warntime 5
initdead 120 # depends on your hardware
udpport 694
ping 1.2.3.254
bcast eth0
auto_failback off
node paul
node silas
use_logd yes
compression bz2
compression_threshold 2
crm yes

See the ipfail page for more information on ipfail.
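The ha.cf files above are line-oriented: each line holds a directive followed by its arguments, and # begins a comment. As a rough illustration (a simplified reader, not Heartbeat's own parser; `parse_ha_cf` and `cluster_nodes` are hypothetical helpers), this Python sketch collects the directives and shows that `node paul silas` on one line lists the same members as separate `node paul` and `node silas` lines.

```python
def parse_ha_cf(text: str) -> dict[str, list[list[str]]]:
    """Map each directive name to the argument lists of all its occurrences."""
    directives: dict[str, list[list[str]]] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop trailing comments
        if not line:
            continue                          # skip blank and comment-only lines
        name, *args = line.split()
        directives.setdefault(name, []).append(args)
    return directives


def cluster_nodes(directives: dict[str, list[list[str]]]) -> list[str]:
    """Flatten every `node` directive into one list of member names."""
    return [name for args in directives.get("node", []) for name in args]


one_line = "node paul silas  # List our cluster members\ndeadtime 10\n"
two_lines = "node paul\nnode silas\ndeadtime 10\n"
print(cluster_nodes(parse_ha_cf(one_line)))   # ['paul', 'silas']
print(cluster_nodes(parse_ha_cf(two_lines)))  # ['paul', 'silas']
```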


bcast / mcast / ucast

If you want less broadcast traffic, use ucast, which is strictly peer-to-peer. bcast is limited to the logical segment and is not routed, while ucast and mcast traffic can potentially be routed. The trade-off is that ucast duplicates packets: each heartbeat has to be sent to every node individually rather than broadcast or multicast to all of them at once.
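To make the duplication concrete, here is a back-of-the-envelope Python model of how many heartbeat packets a cluster emits per keepalive interval. It is a simplification under stated assumptions (the function name is hypothetical; ucast is modelled as one copy per peer, bcast and mcast as one packet per interface), not a measurement of Heartbeat's real traffic.

```python
def packets_per_interval(mode: str, nodes: int, interfaces: int = 1) -> int:
    """Heartbeat packets sent by the whole cluster in one keepalive interval.

    Simplified model: with bcast/mcast each node sends one packet per
    interface, which every peer receives; with ucast each node sends a
    separate copy to every other node on each interface.
    """
    if nodes < 2:
        raise ValueError("a cluster needs at least two nodes")
    if mode in ("bcast", "mcast"):
        per_node = 1
    elif mode == "ucast":
        per_node = nodes - 1
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return nodes * per_node * interfaces


for n in (2, 4, 8):
    print(n, packets_per_interval("bcast", n), packets_per_interval("ucast", n))
```

For a two-node cluster like paul and silas, both modes send the same number of packets; under this model the ucast total grows as n(n-1), so the duplication only starts to matter as nodes are added.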


/etc/ha.d/haresources file

For Heartbeat version 2 with CRM, you'll need to modify cib.xml instead of this file. Please see the Basic Single IP address Configuration for version 2 page for details.


paul 1.2.3.4

The first word (paul) on the line represents the "preferred" host for the service. The remainder of the line is the list of resources (services) that are part of this ResourceGroup. In this case, there is only one resource: an IP address. This is a shorthand notation for IPaddr::1.2.3.4. There are many possible ways to specify the IP address; to learn about them, see the page on the IPaddr resource agent.

Note that this address cannot be used for anything else on these machines. In particular, it has to be controlled only by Heartbeat, and cannot be brought up by your operating system at boot time. We call this address a ServiceAddress, which is distinct from an AdministrativeAddress such as those brought up by your operating system.
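The shorthand rule for haresources can be illustrated with a small Python sketch that splits a line into its preferred host and resource list, expanding a bare IP address to the IPaddr:: form. The function name is hypothetical, and the sketch deliberately ignores the other IPaddr variants (netmask, interface), line continuations, and any further parsing rules Heartbeat itself may apply.

```python
import ipaddress


def parse_haresources_line(line: str) -> tuple[str, list[str]]:
    """Return (preferred_host, resources) for one haresources line (simplified)."""
    line = line.split("#", 1)[0].strip()      # drop comments and whitespace
    if not line:
        raise ValueError("no resource group on this line")
    preferred, *resources = line.split()
    expanded = []
    for res in resources:
        try:
            ipaddress.ip_address(res)          # a bare address like 1.2.3.4?
            expanded.append(f"IPaddr::{res}")  # shorthand for the IPaddr agent
        except ValueError:
            expanded.append(res)               # already explicit: keep as written
    return preferred, expanded


print(parse_haresources_line("paul 1.2.3.4"))
# ('paul', ['IPaddr::1.2.3.4'])
```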


/etc/ha.d/authkeys file

/etc/ha.d/authkeys must be mode 600. See the section on GeneratingAuthkeysAutomatically for information on how to generate good keys automatically.

auth 1
1 sha1 PutYourSuperSecretKeyHere
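One way to produce a file in this format with a strong random key is sketched below in Python. This is an illustration, not the procedure from the GeneratingAuthkeysAutomatically section: the function name is hypothetical, the key comes from Python's `secrets` module, and the file is created with mode 600 so the key is never readable by other users.

```python
import os
import secrets


def write_authkeys(path: str = "/etc/ha.d/authkeys") -> str:
    """Write a one-key sha1 authkeys file with mode 600; return the key."""
    key = secrets.token_hex(16)  # 32 hex characters from a CSPRNG
    # Create the file as owner read/write only, so the key is never exposed.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write("auth 1\n")
        f.write(f"1 sha1 {key}\n")
    # os.open() leaves the mode of a pre-existing file unchanged, so enforce it.
    os.chmod(path, 0o600)
    return key
```

Run it as root on one node, then copy the resulting file to the other node so both share the same key.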


b) An Active/Active Two IP Address Configuration

A common configuration is a high-availability server that simply provides two IP addresses (1.2.3.4 and 1.2.3.5) to be failed over between the nodes of our cluster. We will set this up as an active/active configuration.


/etc/ha.d/ha.cf file
logfacility daemon # Log to syslog as facility "daemon"
node paul silas # List our cluster members
keepalive 1 # Send one heartbeat each second
deadtime 10 # Declare nodes dead after 10 seconds
bcast eth0 eth1 # Broadcast heartbeats on eth0 and eth1 interfaces
ping 1.2.3.254 # Ping our router to monitor ethernet connectivity
auto_failback yes # Try to keep resources on their "preferred" hosts
respawn hacluster /usr/lib/heartbeat/ipfail # Failover on network failures

See the ipfail page for more information on ipfail.


/etc/ha.d/haresources file
paul 1.2.3.4
silas 1.2.3.5

The first word (paul or silas) on the line represents the "preferred" host for the service. The remainder of the line is the list of resources (services) which are part of this ResourceGroup. In this case, each ResourceGroup consists of only one resource -- an IP address. 1.2.3.4 is a shorthand notation for IPaddr::1.2.3.4, and 1.2.3.5 is a similar shorthand for IPaddr::1.2.3.5.

Because auto_failback is enabled, when paul joins the cluster it will regain the 1.2.3.4 address. Similarly, when silas joins the cluster, it will regain its own service address (1.2.3.5). If an active/passive configuration is desired, simply change auto_failback to no.
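The effect of auto_failback on placement can be modelled in a few lines of Python. This is a hypothetical, simplified model: the function `place` and its rule for choosing a takeover node are assumptions, and it ignores ipfail, ping nodes, and startup timing. It only illustrates when a resource group returns to its preferred host.

```python
from typing import Optional


def place(preferred: dict[str, str], current: dict[str, Optional[str]],
          online: set[str], auto_failback: bool) -> dict[str, Optional[str]]:
    """Return where each resource group should run (simplified model).

    preferred: group -> its "preferred" host from haresources
    current:   group -> node it is running on now (absent if not running)
    online:    nodes currently in the cluster
    """
    placement: dict[str, Optional[str]] = {}
    for group, home in preferred.items():
        running_on = current.get(group)
        if home in online and (auto_failback or running_on not in online):
            placement[group] = home            # start on, or fail back to, home
        elif running_on in online:
            placement[group] = running_on      # stay put: no failback
        else:
            # Assumption: take over on the alphabetically first surviving node.
            placement[group] = min(online) if online else None
    return placement


groups = {"1.2.3.4": "paul", "1.2.3.5": "silas"}
after_crash = place(groups, {"1.2.3.4": "paul", "1.2.3.5": "silas"}, {"silas"}, True)
print(after_crash)  # {'1.2.3.4': 'silas', '1.2.3.5': 'silas'}
print(place(groups, after_crash, {"paul", "silas"}, auto_failback=True))
# {'1.2.3.4': 'paul', '1.2.3.5': 'silas'}
print(place(groups, after_crash, {"paul", "silas"}, auto_failback=False))
# {'1.2.3.4': 'silas', '1.2.3.5': 'silas'}
```

With auto_failback disabled, paul rejoins as a standby and 1.2.3.4 stays on silas until silas itself fails.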


/etc/ha.d/authkeys file

/etc/ha.d/authkeys must be mode 600. See the section on GeneratingAuthkeysAutomatically for information on how to generate good keys automatically.

auth 1
1 sha1 PutYourSuperSecretKeyHere
