Smoothwall Hardware Failover and Redundancy (Heartbeat)(Master and Slave)(Active and Passive Modes)

:::::Hardware failover:::::
(two UTM devices as a hardware failover pair)
(essential in a high availability environment)
(Master system – active state
failover system – passive state)
(these two devices communicate using the heartbeat interface)
(all configuration changes made on the master system are replicated to the failover system)
(if the master system fails, the failover unit changes from the passive state to the
active state, takes over the IP addresses and connections from the master and continues where the master left off)
(the failover unit, now in the active state, sends a broadcast to clear the ARP cache on the switches)
(the failover process takes less than 30 seconds before all the services are active, depending on the number of services that need to start up)
(network connectivity is restored within 10-15 seconds)

:managing and maintaining hardware failover is important:
(the master can be shut down, the failover unit can perform all the updates, and then the master can be restarted and the failover unit will fail back to the master unit)
(this updates the failover unit, performs the failover test and also tests the failback to the master)
the master unit can be accessed via the web interface IP address, e.g.
http://192.168.110.1:81
https://192.168.110.1:441

the failover unit will not have any active web interface; only its heartbeat interface is active.
https://192.168.110.1:440 (the request is sent to the master system, which proxies it to the failover unit over the heartbeat interface)
Warning: the page shows that it is a failover unit (slave heartbeat system) and when the settings were last copied over from the master unit.

if we want to connect to the CLI of the failover unit, we have to SSH to the master first and then:
#ssh -p 222 root@10.99.0.2 (where 10.99.0.2 is the heartbeat interface)
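
the two hops can be chained into one command. a minimal sketch: the function name failover_ssh_command is made up for illustration, and it assumes the master also accepts SSH on port 222 (the notes only show port 222 for the hop to the heartbeat address).

```shell
#!/bin/sh
# Print a single command that reaches the failover unit's CLI through the master.
# Assumption: the master listens for SSH on the same port (222) as the failover unit.
# usage: failover_ssh_command MASTER_HOST [HEARTBEAT_IP] [SSH_PORT]
failover_ssh_command() {
    master_host=$1
    heartbeat_ip=${2:-10.99.0.2}   # heartbeat address of the failover unit
    ssh_port=${3:-222}
    # -t allocates a terminal on the master so the nested ssh session is interactive
    echo "ssh -t -p $ssh_port root@$master_host ssh -p $ssh_port root@$heartbeat_ip"
}

# Example: print the command, review it, then run it by hand.
# failover_ssh_command 192.168.110.1
```

printing the command instead of running it keeps the sketch safe to try; both hops still prompt for their own credentials.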

:Recommended process:
1. first check that all the updates are downloaded on the master.
System » Maintenance » Updates-> refresh updates list -> download updates
(within 5 mins all the downloaded updates will transfer to the failover unit)
2. access the GUI of the failover unit using port 440 and check that all the updates are downloaded
3. install all the available updates on the failover unit.
4. reboot the failover unit. (this should not interfere with the traffic as it is the failover unit)
5. we can then check the master’s system logs to see if the failover unit is back and has rejoined as a failover unit.
6. now perform failover test:
first install all the updates which were downloaded on the master system.
7. we won’t reboot, but the master should go into standby mode to test the failover:
System » Hardware » Failover-> enter standby mode
8. after 30 seconds, if you try to connect to the web interface, you will be connected to the failover unit automatically.
9. now that the failover unit is active, we need to use port 440 to connect to the passive master system.
10. then reboot the master. when the master is back, the failover unit will go to passive status and the master to active status.
11. wait 3 mins for the reboot to complete, then try the web interface login again and you will be connected to the master unit.
(failover pair should be tested regularly)
(split brain syndrome makes both systems think that they should be the active system; it happens when the
connection between the two units has broken somehow)(disconnect everything on the failover unit except for the heartbeat interface)
(try rebooting the failover unit and it should come back to the passive state, if the issue was temporary)
(you can reboot the failover unit through the console CLI connection or the CLI through SSH from the master to the failover unit)
(use ifconfig to see the active interfaces on the failover unit)(only the heartbeat interface, the loopback interface and, in some cases, the GRE interface should be active)
(also check the master system logs)
(the simplest way to test the failover is to shut down the master; the failover unit then becomes the active device)
(power on the master and then refresh the admin GUI, which shows that the failback works)
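
to see how long the web interface is actually unreachable during a failover or failback test, a small polling helper can be used. a rough sketch only: wait_until is a made-up helper name, and the curl probe in the usage comment assumes curl is installed on the admin workstation.

```shell
#!/bin/sh
# Retry a command until it succeeds or a timeout is reached.
# Prints the approximate number of seconds spent waiting on success,
# returns 1 if the command never succeeded within the timeout.
# usage: wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
wait_until() {
    timeout=$1
    interval=$2
    shift 2
    waited=0
    while ! "$@" >/dev/null 2>&1; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        waited=$(( waited + interval ))
    done
    echo "$waited"
}

# Example (run right after the master enters standby mode):
# wait_until 60 1 curl -k -s -o /dev/null --max-time 2 https://192.168.110.1:441/
```

the printed value counts only the sleep intervals, so it slightly underestimates the outage; it is good enough to compare against the 30 second figure above.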

::::::Hardware failover pair setup(planning deployment and installation)::::::
(one interface is used as a dedicated interface, which is the heartbeat interface)
1.install and configure the master system.
2.enable the heartbeat on the interface: Networking » Interfaces » Interfaces (no need to restart the networking)
3.Enable SSH: System » Administration » Admin options
4.System » Hardware » Failover->Heartbeat:
Enable: ticked
auto failback: ticked
Keep-alive internal: passive system keeps checking if the active system is available
Dead time: how much time to wait to failover, when the active system goes down
Master heartbeat IP:
Slave heartbeat IP:
Netmask: these IPs shouldn’t be in use in the network anywhere.
5.Reboot: System » Maintenance » Shutdown->reboot
6.Generate failover archive: System » Hardware » Failover (also has recent updates)
(System » Maintenance » Updates->clear cache)(as the archive shouldn’t be more than 100MB)
7.boot the failover unit and install the same updates as on the master system.
8.only cable the heartbeat interface.
9.install the archive on the failover unit using the #setup command -> restore configuration
(put the archive on a usb or CD and install it on the failover unit)
(it will setup the failover unit and the interfaces)
10.check the master logs: Logs and reports » Realtime » System->heartbeat (shows the failover unit is up)
11.the failover unit’s web interface is accessed by using the master’s IP but with port number 440.
(the master redirects the traffic over the heartbeat interface)
Warning: the page shows that it is a failover unit (slave heartbeat system) and when the settings were last copied over from the master unit.
12. if we want to connect to the CLI of the failover unit, we have to SSH to the master first and then:
#ssh -p 222 root@10.99.0.2 (where 10.99.0.2 is the heartbeat interface)
13.connect the failover unit to the internal and external networks.(a switch may be required between the UTM units and the router to the ISP, so both units can access the router)
14. we won’t reboot, but the master should go into standby mode to test the failover:
System » Hardware » Failover-> enter standby mode
15. after 30 seconds, if you try to connect to the web interface, you will be connected to the failover unit automatically.
16. now that the failover unit is active, we need to use port 440 to connect to the passive master system.
17. failback(preemptive): then reboot the master. when the master is back, the failover unit will go to passive status and the master to active status.
18. wait 3 mins for the reboot to complete, then try the web interface login again and you will be connected to the master unit.
(failover pair should be tested regularly)
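
before entering the heartbeat settings from step 4, it can help to confirm that the master and slave heartbeat IPs fall in the same subnet for the chosen netmask. a minimal sketch: the helper names are made up, and the master address 10.99.0.1 and netmask 255.255.255.0 in the example are assumptions (only 10.99.0.2 appears in these notes).

```shell
#!/bin/sh
# Convert a dotted IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed if both addresses share the same network under the given netmask.
# usage: same_subnet MASTER_IP SLAVE_IP NETMASK
same_subnet() {
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    m=$(ip_to_int "$3")
    [ $(( a & m )) -eq $(( b & m )) ]
}

# Example with assumed values:
# same_subnet 10.99.0.1 10.99.0.2 255.255.255.0 && echo "heartbeat IPs share a subnet"
```

this only checks the arithmetic; whether the range is unused elsewhere in the network still has to be verified separately.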

:if the failover fails:
1.check the list of active interfaces on the master system. #ifconfig (ethA, ethB, ethC and ethD all are active)
2.check the list of active interfaces on the failover unit. #ifconfig (only the heartbeat interface should be active)
3.try pinging the heartbeat interface of the failover unit from the master, especially when the failover unit (ifconfig) shows the heartbeat interface as active.
if the ping fails even though the interface shows as active, the cable is probably faulty.
(this is easy to diagnose and identify on the appliances; when using a 3rd party system, it is better to move the physical cable to another interface on the failover unit,
then keep pinging from the master unit to the heartbeat interface of the failover unit, and it will come up at some point)
4.issue the reboot command on the failover unit and check the realtime system logs on the master to see if the heartbeat interface is back up.
5.if there is a problem with the failover unit archive, simply generate another archive from the master unit.
6.if the failover unit stops working and needs to be replaced, simply set up a new failover unit (i.e. generate an archive from the master and set up the failover unit)
7.if the master system stops working and needs replacing:
1.simply promote the failover unit to master and then set up a new failover unit.
connect to the CLI of the failover unit.
#rm /var/hearbeat/settings.tar.gz (remove this file: y)
#echo master>/etc/ha.d/nodeinfo (change slave strings to master in this file)
this will turn the failover unit into the master, and then we can generate a failover archive to set up a new failover unit
2.or set up a new master using the backup archive, not the failover archive.
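
the interface checks in steps 1 and 2 can be scripted by listing the names ifconfig reports and flagging any that are not expected. a sketch under assumptions: the function names are made up, and which interface carries the heartbeat (ethD in the example) depends on the installation.

```shell
#!/bin/sh
# Read ifconfig output on stdin and print one active interface name per line.
# Interface blocks start in column one; detail lines are indented.
active_interfaces() {
    awk '/^[^ \t]/ { name = $1; sub(/:$/, "", name); print name }'
}

# Read ifconfig output on stdin and print interfaces not in the allowed list.
# usage: ifconfig | unexpected_interfaces "lo ethD"
unexpected_interfaces() {
    allowed=" $1 "
    active_interfaces | while read -r name; do
        case "$allowed" in
            *" $name "*) ;;            # expected, e.g. heartbeat or loopback
            *) echo "$name" ;;         # should not be active on a passive unit
        esac
    done
}

# Example on the failover unit (ethD as heartbeat is an assumption):
# ifconfig | unexpected_interfaces "lo ethD"
```

empty output means only the allowed interfaces are active; any printed name is worth checking against the split brain notes earlier.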


Posted on May 12, 2014, in Smoothwall. 1 Comment.

  1. Hi – please could you tell me, is this configuration intended for the community edition only? Or could this be achieved with the paid for product? Thanks.
