SteelEye LifeKeeper Cluster Setup Walk-Through

  1. Create a public and a private network on two different NICs on each node. Make sure the public network has first priority in Windows by going into “Network Connections” -> “Advanced” -> “Advanced Settings” and putting the public network NIC at the top of the list.
  1. Document the names of the network connections (e.g. “Local Area Connection 2”) and which network, public or private, each one connects to. You’ll need this later.
  1. Edit the Windows hosts file on both nodes in the cluster. The hosts file is typically found at C:\WINDOWS\system32\drivers\etc\hosts. Make sure the public IP address and hostname of both nodes in the cluster are in this file, along with the “localhost” entry.
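For example, the hosts file on each node might look like the following; the node names and addresses here are hypothetical placeholders, so substitute your own:

```
# C:\WINDOWS\system32\drivers\etc\hosts
# Loopback entry plus the public IP address of each cluster node.
127.0.0.1      localhost
192.168.1.10   node1
192.168.1.11   node2
```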

  1. Verify that you can ping each node in the cluster, as well as “localhost”, from each node. If you can’t, take the necessary steps to resolve that problem before continuing.
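A quick check from a command prompt on each node, using the hypothetical names from the hosts file above:

```
rem All three should get replies from both nodes.
ping -n 2 localhost
ping -n 2 node1
ping -n 2 node2
```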
  1. Present storage to each node in the cluster, keeping in mind that the target volume for the SDR (SteelEye Data Replication) mirror must be equal in size to, or slightly larger than, the source volume. Make sure the drive letters assigned to the new volumes are identical on both nodes; one way to check is shown below.
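You can confirm drive letters and volume sizes with the standard Windows diskpart utility (a generic OS check, not a LifeKeeper step); run it on each node and compare the Ltr and Size columns:

```
C:\> diskpart
DISKPART> list volume
DISKPART> exit
```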
  1. Install LifeKeeper and apply the licenses while logged in as Administrator.
  1. Fire up the LifeKeeper GUI and log in as Administrator using the Windows Administrator password.
  1. At this point, you should have the LifeKeeper GUI up and see a single icon representing your server. The first step is to add the other node in the cluster by connecting to it. Make sure that the other node is up and running and that the LifeKeeper services are active. Then click on “File” -> “Connect” and fill out the fields.
  1. The other node should pop into view at this point. The next step is to add communication paths (both public and private). Click on “Edit” -> “Server” -> “Create Comm Path…”
  1. Select the local server, and on the next screen select the remote server. The device type is “TCP”, the Heartbeat Interval should be left at “6”, and Heartbeat Misses at “5” (with these defaults, a dead comm path is detected after roughly 6 × 5 = 30 seconds of missed heartbeats). The next screen shows your local interfaces. Select the Private network first; we’ll do the Public network second. Leave the Priority at “1” (meaning the local node is the primary node) and select the matching network on the remote node. Leave the port at “1500” and click “Create”.

  1. At this point you should return to the main LifeKeeper screen and see small warning indicators next to your node icons. These warn that only one communication path is established; they will go away once we create the Public network communication path. Proceed to create the second communication path. Once you’re finished, the icons should return to normal and look similar to the picture below.

  1. The next step is to create a hierarchy and add a switchable (virtual/floating) IP address for the application resource that this cluster will be protecting. Do this by clicking “Edit” -> “Server” -> “Create Resource Hierarchy…” For the Select Recovery Kit, choose “IP Address”. Switchback Type determines whether LifeKeeper will automatically fail the resource back to the primary node or require manual intervention; select “Intelligent” here (requires manual intervention). The next window lets you select which server will be the primary node for the switchable IP, so select the local node. On the next screen, type in the IP address (put it on the same network as your public IPs), then select the appropriate subnet mask. The IP resource tag is how the IP will be displayed within LifeKeeper; for simplicity and clarity, it’s best to leave this as the IP address itself. Next, select which NIC is your Public network on the local machine. Leave Local Recovery at “No”, the Quick Check Interval at “3”, and the Deep Check Interval at “5”, then click “Create”. In the status window you’ll see the IP being created locally, but now we have to create it remotely, so click “Next” and select the remote node. Leave the Switchback Type at “Intelligent” and the Target Priority at “10”. The next screen shows the switchable IP in a grayed-out box. Move past this screen and select the appropriate subnet mask. Select which NIC is the Public network on the remote node, leave Target Restore Mode at “Enable” and Target Local Recovery at “No”, then click “Next”, “Finish”, and “Done”. What you end up with is an IP address under the Hierarchy side and Active and Standby icons under your nodes.

  1. At this point we should verify that the IP address switches from node to node correctly. Perform an “ipconfig” from a command line on the local node and notice that the switchable IP address is present on the public network’s interface.

  1. If this is done on the remote node, you won’t see it present. Next, right-click on the switchable IP address under the Hierarchy window in LifeKeeper and select “In Service…” Choose the remote node from the window that pops up, hit “Next”, and then “In Service”. After a few seconds of outage, the IP address moves over to the other node. To verify, run another “ipconfig” from the local node and confirm that the switchable IP is no longer present; a quicker check is shown below.
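For a faster check, filter the ipconfig output for the switchable address (192.168.1.20 is a hypothetical stand-in for whatever IP you created):

```
rem Prints a matching line on the node currently holding the resource,
rem and nothing on the other node.
ipconfig | findstr "192.168.1.20"
```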

  1. Move the IP address back to the primary node and we’ll begin to add another resource. Once things are back to normal, click “Edit” -> “Server” -> “Create Resource Hierarchy…” From the window that pops up, select “Volume” and leave the Switchback Type at “Intelligent”. Let the local node be the primary by selecting it from the Server pull-down. On the next screen you are prompted to select the local drive letter that corresponds to the storage we’ll mirror to the remote node. The Volume Tag is just how LifeKeeper will display the resource; select the default. Leave the Quick Check Interval at “3” and the Deep Check Interval at “5”, and click “Next”. It sets up the resource locally, and you have to click “Next” again to set it up on the remote side. Select the remote node from the pull-down and leave the Switchback Type at “Intelligent”. Leave the Target Priority at “10” and click “Next”. The next two screens display the Volume Tag and the Volume Letter in grayed-out boxes, so just hit “Next” to get past them. For Volume Type, select “Create SDR Mirror”. For the Network End-Points, select the private networks on both nodes (so we don’t saturate the Public network). The next screen lets you select the Mode, either “Asynchronous” or “Synchronous”. Select whichever the customer decides (asynchronous is much faster and does incremental block transfers, but carries a chance of data loss upon failover if the volumes aren’t completely synched). Click “Next”, “Create”, “Finish”, and “Done” to get back to the main screen. What you’ll see now is another resource under the Hierarchy side and two more icons under your nodes showing where the volume is active and where it is being mirrored to.
  1. When these resources fail over to the other node, we don’t want users to be able to connect before the volume has been transferred. To handle that, we’ll create a dependency. Right-click on the switchable IP address in the Hierarchies window and select “Create Dependency…” Select the volume from the pull-down and hit “Next”. Click on “Create Dependency” and you’ll see the volume become a branch off of the switchable IP address in the Hierarchies section.
  1. Next we need to add a general application that will fail over from node to node along with the IP address and the storage. To add this, click “Edit” -> “Resource” -> “Create Resource Hierarchy…” Select “General Application” and leave the Switchback Type at “Intelligent”. Select the primary node for the application, which should be the same one you selected for the IP and the storage: the local node. The Restore script is a batch file or executable that starts your application. Keep in mind that if multiple things need to be started for your application to function correctly, they should all be bundled together in one batch file; Windows services can be started and stopped from the command line with “net start <servicename>” and “net stop <servicename>” (see the example scripts after this step). The Remove script is the batch file or executable that stops the application. The next few screens hold optional values: Quick Check, Deep Check, and Application Info. To leave these out (the default), erase the contents of the text box and leave it blank. Answer “Yes” to bring the resource in service, and on the next screen define a LifeKeeper tag for the application. Select “Create Instance” and LifeKeeper will create half of the instance and then prompt you for the values needed for the remote piece. Select the remote node, the “Intelligent” Switchback Type, and the default value for the Target Priority. Once this is done, it takes you to a couple of screens where the Resource Tag and Application Info values are displayed but grayed out. Once you’re past these, the resource build completes and you can hit “Finish”. At this point you should have something that looks similar to the following.
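As for the Restore and Remove scripts mentioned above, minimal versions might look like the following; MyAppService is a hypothetical service name, so substitute your application’s actual service(s):

```
rem restore.bat -- run by LifeKeeper to bring the application in service.
rem Start every service the application needs, in dependency order.
net start MyAppService
```

And the matching Remove script:

```
rem remove.bat -- run by LifeKeeper to take the application out of service.
net stop MyAppService
```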
  1. We need to establish dependencies for the application so that things are brought down and back up in the correct order. Most likely you’ll want the storage to come up first, then the application, and finally the IP address. That means our current dependency must be removed and two new dependencies defined. Right-click on the IP address and select “Delete Dependency…” Select the storage and click “Done”.
  1. Right-click on the IP address, select “Create Dependency…”, and select the application resource. Then right-click on the application, select “Create Dependency…”, and select the storage resource. At this point you should have something that looks similar to the following.