This document is a short summary of failure detection types with additional/typical/recommended configuration examples using Link-based failure detection only. Even though link-based failure detection was supported before Solaris 10 (since DLPI link up/down notifications are supported by used network driver), it is now possible to use this failure detection type without any probing (probe-based failure detection).
Steps to Follow
IPMP Link-based Only Failure Detection with Solaris [TM] 10 Operating System (OS)
Contents:
1. Types of Failure Detection
1.1. Link-based Failure Detection
1.2. Probe-based Failure Detection
2. Configuration Examples using Link-based Failure Detection only
2.1. Single Interface
2.2. Multiple Interfaces
2.2.1. Active-Active
2.2.1.1. Two Interfaces
2.2.1.2. Two Interfaces + logical
2.2.1.3. Three Interfaces
2.2.2. Active-Standby
2.2.2.1. Two Interfaces
2.2.2.2. Two Interfaces + logical
3. References
1. Types of Failure Detection
1.1. Link-based Failure Detection
Link-based failure detection is always enabled (supposed to be supported by the interface), whether optional probe-based failure detection is used or not. As per PSARC/1999/225 network drivers do send asynchronous DLPI notifications DL_NOTE_LINK_DOWN (link/NIC is down) and DL_NOTE_LINK_UP (link/NIC is up). The UP and DOWN notifications are used in IP to set and clear the IFF_RUNNING flag which is, in the absence of such notifications, always set for an interface that is up. Failure detection software will immediately detect changes to IFF_RUNNING. These DLPI notifications were implemented to network drivers by and by, and supported by almost all of them since Solaris 10.
With link-based failure detection, only the link between local interface and the link partner is been checked on hardware layer. Neither IP layer nor any further network path will be monitored!
No test addresses are required for link-based failure detection.
For more informations, please refer to Solaris 10 System Administration Guide:
IP Services >> IPMP >> 30. Introducing IPMP (Overview) >> Link-Based Failure Detection
1.2. Probe-based Failure Detection
Probe-based failure detection is performed on each interface in the IPMP group that has a test address. Using this test address, ICMP probe messages go out over this interface to one or more target systems on the same IP link. The in.mpathd daemon determines which target systems to probe dynamically:
- all default routes on same IP link are used as probe targets.
- all host routes on same IP link are used as probe targets. ( Configuring Target Systems)
- always neither default nor host routes are available, in.mpathd sends out a all hosts multicast to 224.0.0.1 in IPv4 and ff02::1 in IPv6 to find neighbor hosts on the link.
Note: Available probe targets are determined dynamically, so the daemon in.mpathd has not to be re-started.
The in.mpathd daemon probes all the targets separately through all the interfaces in the IPMP group. The probing rate depends on the failure detection time (FDT) specified in /etc/default/mpathd (default 10 seconds) with 5 probes each timeframe. If 5 consecutive probes fail, the in.mpathd considers the interface to have failed. The minimum repair detection time is twice the failure detection time, 20 seconds by default, because replies to 10 consecutive probes must be received.
Without any configured host routes, the default route is used as a single probe target in most cases. In this case the whole network path up to the gateway (router) is monitored on IP layer. With all interfaces in the IPMP group connected via redundant network paths (switches etc.), you get full redundancy. On the other hand the default router can be a single point of failure, resulting in 'All Interfaces in group have failed'. Even with default gateway down, it could make sense to not fail the whole IPMP group, and to allow traffic within the local network. In this case specific probe targets (hosts or active network components) can be configured via host routes. So it is question of network design, which network path you do want to monitor.
A test address is required on each interface in the IPMP group, but the test addresses can be in a different IP test subnet than the data address(es). So private network addresses as specified by rfc1918 (e.g. 10/8, 172.16/12, or 192.168/16) can be used as well.
For more informations, please refer to Solaris 10 System Administration Guide:
IP Services >> IPMP >> 30. Introducing IPMP (Overview) >> Probe-Based Failure Detection
2. Configuration Examples using Link-based Failure Detection
An IPMP configuration typically consists of two or more physical interfaces on the same system that are attached to the same IP link. These physical interfaces might or might not be on the same NIC. The interfaces are configured as members of the same IPMP group.
A single interface can be configured in its own IPMP group. The single interface IPMP group has the same behavior as an IPMP group with multiple interfaces. However, failover and failback cannot occur for an IPMP group with only one interface.
The following message does tell you, that this is link-based failure detection only configuration. It is reported for each interface in the group.
/var/adm/messages
in.mpathd[144]: [ID 975029 daemon.error] No test address configured on interface ce0; disabling probe-based failure detection on it
So in this configuration it is not an error, but more a confirmation, that the probe-based failure detection has been disabled correctly.
2.1. Single Interface
/etc/hostname.ce0
192.168.10.10 netmask + broadcast + group ipmp0 up
# ifconfig -a
ce0: flags=1000843mtu 1500 index 4
inet 192.168.10.10 netmask ffffff00 broadcast 192.168.10.255
groupname ipmp0
ether 0:3:ba:93:90:fc
2.2. Multiple Interfaces
2.2.1. Active-Active
2.2.1.1. Two Interfaces
/etc/hostname.ce0
192.168.10.10 netmask + broadcast + group ipmp0 up
/etc/hostname.ce1
group ipmp0 up
# ifconfig -a
ce0: flags=1000843mtu 1500 index 4
inet 192.168.10.10 netmask ffffff00 broadcast 192.168.10.255
groupname ipmp0
ether 0:3:ba:93:90:fc
ce1: flags=1000843mtu 1500 index 5
inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
groupname ipmp0
ether 0:3:ba:93:91:35
2.2.1.2. Two Interfaces + logical
/etc/hostname.ce0
192.168.10.10 netmask + broadcast + group ipmp0 up \
addif 192.168.10.11 netmask + broadcast + up
/etc/hostname.ce1
group ipmp0 up
# ifconfig -a
ce0: flags=1000843mtu 1500 index 4
inet 192.168.10.10 netmask ffffff00 broadcast 192.168.10.255
groupname ipmp0
ether 0:3:ba:93:90:fc
ce0:1: flags=1000843mtu 1500 index 4
inet 192.168.10.11 netmask ffffff00 broadcast 192.168.10.255
ce1: flags=1000843mtu 1500 index 5
inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
groupname ipmp0
ether 0:3:ba:93:91:35
2.2.1.3. Three Interfaces
/etc/hostname.ce0
192.168.10.10 netmask + broadcast + group ipmp0 up
/etc/hostname.ce1
group ipmp0 up
/etc/hostname.bge1
group ipmp0 up
# ifconfig -a
bge1: flags=1000843mtu 1500 index 3
inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
groupname ipmp0
ether 0:9:3d:11:91:1b
ce0: flags=1000843mtu 1500 index 4
inet 192.168.10.10 netmask ffffff00 broadcast 192.168.10.255
groupname ipmp0
ether 0:3:ba:93:90:fc
ce1: flags=1000843mtu 1500 index 5
inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
groupname ipmp0
ether 0:3:ba:93:91:35
2.2.2. Active-Standby
2.2.2.1. Two Interfaces
/etc/hostname.ce0
192.168.10.10 netmask + broadcast + group ipmp0 up
/etc/hostname.ce1
group ipmp0 standby up
# ifconfig -a
ce0: flags=1000843mtu 1500 index 4
inet 192.168.10.10 netmask ffffff00 broadcast 192.168.10.255
groupname ipmp0
ether 0:3:ba:93:90:fc
ce0:1: flags=1000843mtu 1500 index 4
inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
ce1: flags=69000842mtu 0 index 5
inet 0.0.0.0 netmask 0
groupname ipmp0
ether 0:3:ba:93:91:35
2.2.2.2. Two Interfaces + logical
/etc/hostname.ce0
192.168.10.10 netmask + broadcast + group ipmp0 up \
addif 192.168.10.11 netmask + broadcast + up
/etc/hostname.ce1
group ipmp0 standby up
# ifconfig -a
ce0: flags=1000843mtu 1500 index 4
inet 192.168.10.10 netmask ffffff00 broadcast 192.168.10.255
groupname ipmp0
ether 0:3:ba:93:90:fc
ce0:1: flags=1000843mtu 1500 index 4
inet 192.168.10.11 netmask ffffff00 broadcast 192.168.10.255
ce0:2: flags=1000843mtu 1500 index 4
inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
ce1: flags=69000842mtu 0 index 5
inet 0.0.0.0 netmask 0
groupname ipmp0
ether 0:3:ba:93:91:35