Scenario: Using DSClone for Disaster Recovery
Novell Cool Solutions: Feature
By Paul McKeith
Posted: 22 Mar 2005
Editor's note: Thanks to Paul McKeith for sharing this real-life success story on using DSClone in disaster recovery.
I have used DSClone to recover from a disaster. In fact, in a large-scale directory with millions of objects, I see no other effective means of recovery. It can take days to replicate millions of objects. To help illustrate this, I will use an example scenario to show how DSClone differs from "dsrepair -RC".
The "-RC" Scenario
When you take an "-RC", remember that this is a "snapshot in time" of that server's view of eDirectory. Using the "-RC" to restore a server's eDir database has limited appeal. For example, suppose you have a three-server tree with five partitions, each with one million objects. The tree is fully replicated to the other two servers. Here's a three-day look at what would typically happen:
Day 1 (Monday): You take the "-RC" of Server B and restart Server B as part of the normal weekly DIB backups.
Day 2: As is normally expected, some unknown number of adds, changes, and deletes are made to the o=Novell partition. It could be one, it could be a million - that doesn't matter. What matters is that changes happened, and admins will have no idea what changed. Of course, these changes are happily sync'd to all the servers in the respective rings.
Day 3: Server B's DIB is corrupted, and DSRepair can't fix it. In fact, with five million objects, I am not sure you can afford to wait the days a repair will take to run just to find out whether DSRepair will in fact fix it.
Traditionally, at this point you have two options:
Option 1: Clean the rings and delete the NCP server object. Let this sync out throughout the tree. Then build a new/replacement server and add the replicas back onto the new server. This can be combined with the DSMaint process to maintain references.
Option 2: Restore the DIB from Monday's "-RC". But Server A and Server C know that they already sync'd changes to/from Server B, so they will not re-send the changes made since the "-RC" was taken. So a "dsrepair -xk2" is needed to destroy the replicas on the server but retain its references in the tree. Next, in turn, you would add each replica back onto Server B. With five million objects this could take days to complete. But what if there were 10 million objects in a partition, or even 50 million objects in 25 partitions? You can see the enormity of the problem here.
The DSClone Solution
- Clean the rings and delete the NCP server object. Let this sync out throughout the tree.
- Identify another server (let's say Server A) that has the partitions that Server B had.
- Use DSClone in iMonitor to clone Server A. It will ask you for a new NCP server object to be created. Let's call the replacement "Server D" for ease of discussion here. You can choose online mode, so Server A does not have to be brought down to do this.
- Now that you have DIB files and an NCP server object to match, copy the DIB files to the original Server B. How you get them there may be difficult on NetWare, but it's easy for Linux, Solaris, Windows, etc. (see the paragraph on the NetWare copy issue later in this article).
- In our case, we also need to change the name in the autoexec.ncf to Server D.
- Reboot Server B (now Server D). Upon init, the DS will recognize that this is a cloned DIB, based on attributes added to the Server D NCP server object by DSClone. It will then accept the copied DIB files as its own. It makes modifications to the pseudo server object in the local DIB as well.
Note: DSClone adds Server D to all the replica rings that the clone source (Server A) holds. The key is that when this server is added to the ring, its transitive vectors / sync'd-up-to times are set to those of the clone source at the time of the clone. So the other servers will know what changes need to be sent to the new server when it comes up. In online mode, DSClone then literally copies Server A's DIB files to another location. This DIB copy is in a state as though Server A had been brought down, but it has the "personality" of Server D.
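To see why the sync'd-up-to times matter, here is a toy model in Python. It is only a sketch under simplifying assumptions: real eDirectory timestamps and transitive vectors are kept per replica and are more than a single integer, and the function and variable names below are invented for illustration. Because the example tree is fully replicated, one set of change timestamps stands in for any server's DIB at a given moment.

```python
def pending_changes(sender_log: set[int], peer_synced_up_to: int) -> list[int]:
    """Timestamps of changes the sender still believes it owes a peer."""
    return sorted(ts for ts in sender_log if ts > peer_synced_up_to)


# Server A holds changes stamped 1..5. The DIB copy (whether an "-RC" dump
# or a clone) was taken at time 3, so it contains changes 1..3 only.
server_a_log = {1, 2, 3, 4, 5}
COPY_TIME = 3
dib_copy = {ts for ts in server_a_log if ts <= COPY_TIME}

# "-RC" restore: the ring still records Server B as sync'd up to time 5,
# because B really had received those changes before its DIB was corrupted.
sent_to_b = pending_changes(server_a_log, peer_synced_up_to=5)
server_b = dib_copy | set(sent_to_b)

# DSClone: Server D joins the ring with its sync'd-up-to time set to the
# clone time, so the ring knows everything after time 3 is still owed.
sent_to_d = pending_changes(server_a_log, peer_synced_up_to=COPY_TIME)
server_d = dib_copy | set(sent_to_d)

print("Server B is missing:", sorted(server_a_log - server_b))
print("Server D is missing:", sorted(server_a_log - server_d))
```

In this model the restored Server B never receives changes 4 and 5, while the cloned Server D is sent exactly those two and catches up.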
Next is the really cool part ...
At this point, the server is up and has all five million objects that the original Server B had. But there are some caveats. This process will address all the eDirectory-related issues and result in a healthy tree. But it will not address everything. For example, it will not auto-create the server-specific objects normally created during install, such as the CA, LDAP objects, certificates, SAS, etc. So these will have to be taken care of via pkidiag or "ndsconfig upgrade" on *nix. If the server has DirXML (Identity Manager) on it, it will have to be added back into the DriverSet. If the server is the CA, that should be restored from the CA export that you (hopefully) took. If the server is the NICI SDI key "tree key" server, make sure you add one of the other servers to the W0 object before starting up Server D.
On the NetWare copy issue ... As you can tell, this was really created with large-size Solaris/Linux-based implementations in mind. Doing the file copy is very easy over FTP or SCP. For NetWare, you could certainly burn the DIB to CD or DVD and copy it locally, using something like cpqfm, but the DIB files may be too large to copy this way. You might be able to use -RC/-xk2 just to get the server up and functional for file copies and go from there.
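Whichever transport you use, it may be worth confirming that the DIB files arrived intact before rebooting the target server. The following Python sketch is one way to do that, not part of DSClone itself: the function names are hypothetical, and the directory you pass in is whatever location holds the cloned DIB set on your system.

```python
import hashlib
from pathlib import Path


def dib_manifest(directory: str) -> dict[str, str]:
    """Map each file under `directory` (by relative path) to its SHA-256 digest."""
    root = Path(directory)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256()
            with path.open("rb") as fh:
                # Read in 1 MiB chunks so very large DIB files fit in memory.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest


def copy_mismatches(source: dict[str, str], target: dict[str, str]) -> list[str]:
    """Files that are missing on the target or whose digest differs."""
    return sorted(name for name, d in source.items() if target.get(name) != d)
```

Run dib_manifest on the clone output and again on the copied directory, then pass both results to copy_mismatches. An empty list means every source file is present on the target with identical content.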
DSClone is best used for trees with millions of objects where adding replicas to a server would simply take too long. Essentially, what DSClone does is relegate partition operations, regardless of size, to the speed of a simple file copy. First and foremost, it is the most efficient way to build a multi-server tree with millions of objects, but it can also be used as a disaster recovery tool. DSClone is not an eDirectory backup and restore solution. Unlike typical backup/restore solutions, DSClone can be used to deploy a new server. As a disaster recovery tool, it does not "restore" a server to its state before a failure. Instead, it is used as a primary element of a comprehensive disaster recovery plan to rebuild a failed server for large-scale environments.
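As a rough illustration of "the speed of a simple file copy", the Python sketch below compares the two approaches with back-of-envelope arithmetic. Every input is a placeholder assumption chosen for the example, not a measurement from a real tree; substitute rates observed in your own environment.

```python
def replica_sync_hours(objects: int, objects_per_second: float) -> float:
    """Hours needed to push `objects` through replication at a steady rate."""
    return objects / objects_per_second / 3600


def file_copy_hours(dib_bytes: int, bytes_per_second: float) -> float:
    """Hours needed to copy a DIB of `dib_bytes` at a steady throughput."""
    return dib_bytes / bytes_per_second / 3600


# Placeholder inputs (assumptions, not measurements).
OBJECTS = 5_000_000        # five partitions of one million objects each
SYNC_RATE = 25.0           # objects per second during replica sync (assumed)
BYTES_PER_OBJECT = 4_000   # average DIB footprint per object (assumed)
COPY_RATE = 10_000_000     # bytes per second over the network (assumed)

sync_h = replica_sync_hours(OBJECTS, SYNC_RATE)
copy_h = file_copy_hours(OBJECTS * BYTES_PER_OBJECT, COPY_RATE)
print(f"replica sync: {sync_h:.1f} h, file copy: {copy_h:.2f} h")
```

With these made-up numbers the replica sync takes a little over two days while the file copy finishes in about half an hour. The exact figures will differ widely between installations, but the sync time grows with object count at replication speed, whereas the clone grows only with DIB size at copy speed.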
DSClone may not be the best way to recover a failed server. This is especially true in traditional NetWare environments where many other services must be considered. There might be many services dependent on eDirectory that are not automatically restored by DSClone. The services affected are exemplified by those affected when NWCONFIG is used on a NetWare server to move the server from one tree to another.