2.5 Fault Tolerance

With fault tolerance, you can monitor the health of the grouped interfaces and detect faults such as link failure, NIC failure, and switch failure. When such a fault is detected, the load on the failed interface is diverted to another healthy interface. Fault tolerance works along with load balancing to ensure an uninterrupted connection between hosts and the server.

If load balancing is enabled in a system and fault tolerance detects a fault in any interface, it diverts the traffic to the least loaded healthy interface in the group. If load balancing is not enabled and fault tolerance detects a fault, it diverts the load to a randomly chosen healthy interface in the group. When the failed interface recovers, it is put back into the healthy set and the load is redistributed across the healthy interfaces. Load distribution, failover, and redistribution after recovery take place in such a way that the flow of data is smooth and the TCP/IP connections stay intact throughout.

In case of a NIC failure, the connected hosts update their IP-address-to-MAC-address mapping by picking up the broadcast messages sent by the server, and they continue to work without any problems. Novell-certified drivers are capable of detecting faults such as link failure or NIC failure.
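The failover and recovery behavior described above can be illustrated with a minimal Python sketch. This is not Novell's implementation. The names `Interface`, `fail_over`, and `recover` are hypothetical, load is modeled as a plain integer, and the even redistribution on recovery is an assumption, since the text does not specify how the load is redistributed.

```python
import random
from dataclasses import dataclass


@dataclass
class Interface:
    """Hypothetical model of one NIC in a fault-tolerant group."""
    name: str
    load: int = 0          # abstract load units (assumption)
    healthy: bool = True


def pick_failover_target(group, failed, load_balancing, rng=random):
    """Choose the healthy interface that takes over the failed one's load."""
    candidates = [i for i in group if i.healthy and i is not failed]
    if not candidates:
        return None  # no healthy interface is left in the group
    if load_balancing:
        # Load balancing enabled: pick the least loaded healthy interface.
        return min(candidates, key=lambda i: i.load)
    # Load balancing disabled: pick any healthy interface at random.
    return rng.choice(candidates)


def fail_over(group, failed, load_balancing, rng=random):
    """Mark an interface as failed and divert its load to the chosen target."""
    failed.healthy = False
    target = pick_failover_target(group, failed, load_balancing, rng)
    if target is not None:
        target.load += failed.load
        failed.load = 0
    return target


def recover(group, recovered):
    """Return an interface to the healthy set and redistribute the load.

    Spreading the total load evenly is an assumption made for this sketch.
    """
    recovered.healthy = True
    healthy = [i for i in group if i.healthy]
    total = sum(i.load for i in healthy)
    share, extra = divmod(total, len(healthy))
    for index, iface in enumerate(healthy):
        iface.load = share + (1 if index < extra else 0)


if __name__ == "__main__":
    group = [Interface("nic1", 40), Interface("nic2", 10), Interface("nic3", 30)]
    target = fail_over(group, group[0], load_balancing=True)
    print("diverted to:", target.name)
    recover(group, group[0])
    print([(i.name, i.load) for i in group])
```

In this sketch the total load is preserved across failover and recovery, which loosely mirrors the requirement that existing connections stay intact while traffic moves between interfaces.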

NOTE: For multihoming, if fault tolerance is enabled and the link is down, the card is not connected, or the card has a fatal error (it cannot send packets out), a user cannot add a secondary IP address. If the user still tries to add a secondary IP address, the error message "Cannot allocate resources to add secondary IP address" is displayed. This indicates that there is no board to which an IP address can be bound.

For more information, see Figure 6-2 and Figure 6-4.