Snapshots - a Good DR Practice?

Novell Cool Solutions: Tip
By Jim Henderson

Posted: 29 Mar 2006

A reader posed the following idea about taking snapshots as a means of disaster recovery:

"In our virtual environment, it's quite possible to take cold or hot snapshots of our virtual machines. From a DR standpoint, I would think that such a snapshot would be great for expediting the recovery of a failed or corrupted virtual server(s)."

And here are a few thoughts on the topic from eDirectory expert Jim Henderson:

What makes snapshots of the servers potentially problematic is the set of issues introduced when you make partitioning and replica placement changes in the directory.

Let's look at a small example - three servers in the tree. Each server holds a replica of [Root] (the first server holds the Master; the other two hold R/W replicas, per the default behaviour).

Scenario 1

a. Snapshot the servers at this point.

b. Now create a partition at OU=EDU.OU=PRV.O=NOVELL.

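c. Destroy server 3 and recover it from the snapshot. The replica rings are quite inconsistent at this point, because the restored server predates the new partition while the other servers already know about it. This is not totally unrecoverable, but the cleanup is somewhat messy (and would likely require a dial-in from Novell).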
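To make the inconsistency in Scenario 1 concrete, here is a minimal toy model in Python. It is only a sketch and does not use any eDirectory API: the server names, the helper functions, and the assumption that a new partition starts out with the same replica ring as its parent are all simplifications made for illustration.

```python
import copy

# Toy model (not an eDirectory API): each server keeps its own view of
# which partitions exist and which servers are in each replica ring.
def new_tree(servers):
    ring = {"[Root]": sorted(servers)}
    return {s: copy.deepcopy(ring) for s in servers}

def create_partition(tree, name, parent="[Root]"):
    # Assumption: the new partition starts with its parent's replica ring.
    for view in tree.values():
        view[name] = list(view[parent])

def inconsistent_servers(tree):
    # Servers whose view differs from the most common view in the tree.
    views = list(tree.values())
    majority = max(views, key=views.count)
    return sorted(s for s, v in tree.items() if v != majority)

tree = new_tree(["srv1", "srv2", "srv3"])
snapshot = copy.deepcopy(tree)                      # step a: snapshot
create_partition(tree, "OU=EDU.OU=PRV.O=NOVELL")    # step b: new partition
tree["srv3"] = copy.deepcopy(snapshot["srv3"])      # step c: restore server 3

print(inconsistent_servers(tree))
```

In this model the restored server is the only one that has never heard of the new partition, which is the kind of disagreement that has to be cleaned up by hand in a real tree.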

Scenario 2

Instead of destroying a server, add a server to the tree, and then destroy the master replica. This is actually easy to recover from because the fourth server holds no replica information.

Combination of Scenario 1 and 2

Now let's combine the above two scenarios:

a. Snapshot the three servers.

b. Make a partition change.

c. Remove the replica from server 2 and add a new server to the partition OU=EDU.OU=PRV.O=NOVELL and then change the new server to the master.

d. Now destroy a server holding an R/W replica of [Root] and restore its snapshot.

This would be a mess to recover using a snapshot. There would be replica changes, addition of a server, and a change in partitioning that are unaccounted for in the old snapshot.
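One way to picture what the old snapshot is missing is to treat the steps above as an ordered change log and compare it with the point at which the snapshot was taken. The sketch below is purely illustrative: the log, the `record` and `missed_changes` helpers, and the "server 4" label for the newly added server are hypothetical names, not something eDirectory provides.

```python
# Hypothetical change log: every partitioning or replica operation is
# appended in the order it was performed.
change_log = []

def record(change):
    change_log.append(change)
    return len(change_log)  # log position after this change

def missed_changes(position):
    # Everything recorded after the snapshot position is unknown to it.
    return change_log[position:]

snapshot_position = len(change_log)  # step a: snapshot before any change

record("create partition OU=EDU.OU=PRV.O=NOVELL")                 # step b
record("remove replica of OU=EDU.OU=PRV.O=NOVELL from server 2")  # step c
record("add server 4 with a replica of OU=EDU.OU=PRV.O=NOVELL")
record("make server 4 the master of OU=EDU.OU=PRV.O=NOVELL")

# Step d: restoring the snapshot brings back a server that has seen
# none of the changes listed here.
for change in missed_changes(snapshot_position):
    print("not in snapshot:", change)
```

Every entry printed is a change that the restored server would have to be reconciled with manually.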

Now, with a small number of servers, you would probably be able to get away with snapping all of the servers after each replica change, but the more servers you have in the environment, the more complex (and administratively intense) this process becomes. With more than a handful of servers, snapping every server as a preventive measure after each partition or replica change could become so burdensome that making these changes at all becomes a problem. When restoring a snapshot - especially if you have a distributed administration model - you'd have to ask yourself whether or not the snapshot was really going to restore properly - not so much from the point of view of "will it work" as "did we remember to snap all of the servers the last time we made a change to partitioning or replica placement?"
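The "did we remember to snap all of the servers" question can at least be checked mechanically if you keep a record of when each snapshot was taken. The following is a minimal sketch under that assumption; the function name, server names, and timestamps are all hypothetical, and how you would actually obtain snapshot times depends on your virtualization platform.

```python
from datetime import datetime

def stale_snapshots(servers, snapshot_times, last_change):
    """Return servers with no snapshot taken after the last change."""
    stale = []
    for server in servers:
        taken = snapshot_times.get(server)
        if taken is None or taken < last_change:
            stale.append(server)
    return stale

# Hypothetical data: the last partition or replica change, and the
# most recent snapshot of each server (srv4 was never snapped).
last_change = datetime(2006, 3, 20, 14, 0)
servers = ["srv1", "srv2", "srv3", "srv4"]
snapshot_times = {
    "srv1": datetime(2006, 3, 20, 15, 0),
    "srv2": datetime(2006, 3, 18, 9, 0),
    "srv3": datetime(2006, 3, 20, 15, 5),
}

print(stale_snapshots(servers, snapshot_times, last_change))
```

If the list is not empty, the set of snapshots as a whole does not reflect the current partitioning and replica placement, no matter how recent the individual snapshots are.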

Even servers that don't communicate directly with each other have to know about each other for the placement of external references, so you couldn't limit yourself to just a single replica ring. You'd have to revert all of the servers to the previous state in order to ensure that things were consistent.

As you can see, it can be increasingly complex to account for all of the variables when using snapshots of the servers as a means of disaster recovery.
