Best practice to restart the master server after a cluster failover (up to v. 5.2)
This procedure applies to server versions up to v. 5.2.11.
Thanks to the implementation of this Feature Request: https://www.nomachine.com/FR08N03172, the procedure for restoring the active/passive roles in a NoMachine cluster after failover has been optimized, and the manual procedure described below is no longer needed.
A NoMachine cluster consists of two servers: the primary or master server (server A) and the secondary or slave server (server B).
When the secondary server (server B) loses contact with the primary server (server A), the cluster failover occurs: server B takes the place of server A to ensure continuity of service. Sessions running on the remote nodes stay connected, and management of server-node connections passes from server A to server B.
In versions that do not include the implementation of this Feature Request: https://www.nomachine.com/FR08N03172, the following manual procedure is required to restart the master server and restore its original active role. The implementation of FR08N03172 makes this automatic.
IMPORTANT:
a) The primary server (server A) is always active, but it doesn't monitor the secondary server (server B). The secondary server (server B) is passive; its task is to constantly monitor server A.
b) Server A creates i) the cluster virtual interface and ii) the connections to each remote node. The secondary server monitors server A. If it detects that the primary server is not responding, server B fails over and takes over i) the cluster virtual interface and ii) the connections to each remote node.
Server B fails over only after the 'grace period', which is set to 30 seconds by default. A different grace period can be set with the 'nxserver --clusterupdate' command.
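The grace-period rule can be illustrated with a small model. This is only a sketch of the decision described above, not NoMachine's implementation: the class name FailoverMonitor, the observe() method and the choice to reset the countdown on any successful check are assumptions made for illustration. Only the 30-second default comes from this article.

```python
from dataclasses import dataclass
from typing import Optional

GRACE_PERIOD_SECONDS = 30.0  # default grace period stated in this article


@dataclass
class FailoverMonitor:
    """Hypothetical model of server B's failover decision (not NoMachine code)."""

    grace_period: float = GRACE_PERIOD_SECONDS
    unreachable_since: Optional[float] = None  # time of the first failed check

    def observe(self, now: float, primary_responding: bool) -> bool:
        """Record one check of server A; return True when failover should start."""
        if primary_responding:
            # Assumption: any successful check resets the countdown.
            self.unreachable_since = None
            return False
        if self.unreachable_since is None:
            self.unreachable_since = now
        # Fail over only once server A has been silent for the whole grace period.
        return now - self.unreachable_since >= self.grace_period
```

In this model, a short network glitch that ends before the grace period expires never triggers a failover, which is the purpose of the delay.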
When the cluster failover occurs, the previous configuration (server A as primary server and server B as secondary server) must be restored manually. Use a fence device to cut server A off from the network permanently when it loses its network connection.
This is necessary to avoid having two active servers on the same network, with the risk of duplicating server-node connections on both servers. In a cluster configuration, only one server must be active at a time.
To re-activate the original primary server (server A), follow these steps:
1) Shut down server B with 'nxserver --shutdown'. All sessions running on the remote nodes will be disconnected.
2) Ensure that the host machine of server A is up and running (power on the machine or reconnect it to the network, depending on how it was isolated).
3) Restart server A with 'nxserver --restart'.
4) Start server B with 'nxserver --startup'.
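The four steps above can be sketched as a small script that enforces their order. This is a minimal sketch under stated assumptions: the host names are placeholders, the commands are sent over SSH with key-based access and sufficient privileges, and 'nxserver' is assumed to be on the remote PATH exactly as written in the steps. Step 2 is a manual check, so the script stops unless the caller confirms that server A's host is up.

```python
import subprocess
from typing import Callable, List, Tuple

# Hypothetical host names; replace with the real addresses of the two servers.
SERVER_A = "server-a.example.com"
SERVER_B = "server-b.example.com"

Step = Tuple[str, List[str]]  # (host, command to run on that host)


def restore_plan(server_a: str = SERVER_A, server_b: str = SERVER_B) -> List[Step]:
    """Return the nxserver commands from steps 1, 3 and 4, in order."""
    return [
        (server_b, ["nxserver", "--shutdown"]),  # step 1: stop server B
        (server_a, ["nxserver", "--restart"]),   # step 3: restart server A
        (server_b, ["nxserver", "--startup"]),   # step 4: start server B again
    ]


def run_over_ssh(host: str, command: List[str]) -> None:
    """Assumption: key-based SSH access with the required privileges."""
    subprocess.run(["ssh", host, *command], check=True)


def restore_roles(confirm_a_is_up: Callable[[], bool],
                  runner: Callable[[str, List[str]], None] = run_over_ssh) -> None:
    """Run the procedure, refusing to continue if step 2 is not confirmed."""
    plan = restore_plan()
    runner(*plan[0])
    # Step 2 is manual: power on server A's host or reconnect it to the network.
    if not confirm_a_is_up():
        raise RuntimeError("Server A host not confirmed up; stopped after step 1.")
    for host, command in plan[1:]:
        runner(host, command)
```

Passing a custom runner that only prints each host and command gives a dry run, which is useful for reviewing the order before touching a production cluster.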
NOTES:
Remote nodes can be restarted independently of server A and server B.
Once server A has been rebooted and re-activated in the cluster configuration, just wait for it to re-establish a new connection to each of the remote nodes.
To monitor the connection status, execute the following command from the console on server A:
nxserver --nodelist
If the connection to the remote node is established, the node is marked as 'running'.
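Waiting for the nodes to reconnect can be automated by polling the command above. The sketch below is only an illustration: the exact layout of the 'nxserver --nodelist' output is not described in this article, so the code assumes only that a node's line contains the node name and the word 'running'. The function names, the number of attempts and the delay are hypothetical.

```python
import subprocess
import time
from typing import Callable


def read_nodelist() -> str:
    """Run 'nxserver --nodelist' on server A and return its text output."""
    result = subprocess.run(["nxserver", "--nodelist"],
                            capture_output=True, text=True, check=True)
    return result.stdout


def node_is_running(nodelist_output: str, node: str) -> bool:
    """Assumption: the node's line contains its name and the word 'running'."""
    for line in nodelist_output.splitlines():
        # Plain substring match: use a full, unambiguous node name.
        if node in line and "running" in line.lower():
            return True
    return False


def wait_for_node(node: str,
                  read: Callable[[], str] = read_nodelist,
                  attempts: int = 30,
                  delay: float = 10.0,
                  sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll the node list until the node is reported as running or attempts run out."""
    for attempt in range(attempts):
        if node_is_running(read(), node):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

The read and sleep parameters are injectable so that the polling logic can be exercised without a live cluster.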
