Recovering DS when the NetWare Server Crashes.
(Last modified: 23Jan2003)
This document (10013083) is provided subject to the disclaimer at the end of this document.
When a server's hardware has crashed or failed, or a server has been taken out of a Directory Services tree without properly removing DS from that server, several steps need to be taken to ensure that the remaining servers can synchronize correctly and that, if necessary, the server can be replaced and re-inserted into the Directory Services tree.
WARNING: Deleting a server object for a failed server will cause loss of server references for that server unless proper steps have been taken. If a server fails and will be replaced, follow TID 10013535. The DSMAINT -PSE procedure will retain links to home directories, directory map objects, and NDS-aware printing that would otherwise be lost if the server object were simply deleted. (This switch does not work in NW 5.)
1. Verify that time is synchronized on the network.
If time is not synchronized, changes cannot properly be made to the Directory Services tree. See TID 2908867 for time synchronization help.
Load DSrepair | Time Synchronization
This will report whether time is synchronized across all available servers. If time is not synchronized, determine why. Questions to ask: is there a Single Time or a Reference Time server available and working properly? Are you using configured sources and if so, is the source server up and running?
2. Clean up the replica rings.
If a server goes down permanently or is replaced without removing DS, the replicas it contained will have incorrect replica ring information. Each server in each of the replica rings will still think the server should be contacted with updates whenever they occur. Also, if the server that has been removed or has had a hardware crash contained a master replica of any partition, another server with a read-write replica must be selected to become the new master replica.
If a non-troubled server contained all the master replicas, this process is rather simple. If it is not known which replicas the suspect server had, this process can be quite complicated, as each replica in the tree has to be queried to determine whether the suspect server was part of its replica ring.
Verify that a Master replica exists for each Partition:
Load DSrepair | Advanced Options | Replica and Partition Operations | (select each replica one at a time) | View Replica Ring
In the replica ring for each replica that was contained on the suspect server, verify that a master replica exists on a good server. If not, escape back one screen to Replica Options, Partition (this partition). There, choose the option to "Designate this Server as the New Master Replica". (If this server does not contain a read-write replica of that partition, or if you want another server to be the master, do this step on that server.)
WARNING: DO NOT designate a Subordinate Reference replica as the new Master replica unless no R/W or Read Only replica exists of that partition. Doing so will cause all of your partition objects to go unknown and you will have to recreate them manually.
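The selection rule above can be sketched as a small helper. This is only an illustration, not part of DSrepair: the function name, the replica-type labels, and the server names are hypothetical, and the ring data is assumed to have been copied by hand from the View Replica Ring screen.

```python
from typing import Dict, Optional

# Order of preference taken from the text above: a Read/Write replica first,
# then a Read Only replica, and a Subordinate Reference only as a last resort.
# The string labels are illustrative, not DSrepair output.
PREFERENCE = ["read-write", "read-only", "subordinate-reference"]


def pick_new_master(ring: Dict[str, str], failed_server: str) -> Optional[str]:
    """Suggest which surviving server should be designated the new master.

    `ring` maps server name -> replica type for ONE partition (hypothetical
    data noted by hand). Returns None when a good server already holds the
    master, so no change is needed.
    """
    survivors = {s: t for s, t in ring.items() if s != failed_server}
    if "master" in survivors.values():
        return None
    for replica_type in PREFERENCE:
        candidates = sorted(s for s, t in survivors.items() if t == replica_type)
        if candidates:
            return candidates[0]
    raise LookupError("no surviving replica of this partition")


if __name__ == "__main__":
    ring = {"FS1": "master", "FS2": "read-write", "FS3": "subordinate-reference"}
    print(pick_new_master(ring, failed_server="FS1"))  # FS2
```

When several servers hold the same replica type, the sketch simply picks the first name alphabetically; in practice you would choose the server you want to hold the master.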
Once it is verified that a master replica exists for each partition:
3. Clean up the Tree (server objects).
When a master replica is present for each partition, run PARTMGR, or NDSManager in Windows. Delete the server object representing the crashed or suspect server. This will remove the server from the tables on each server in the tree, containing server names and IPX addresses, as well as remote nds. It will also remove the server from the replica rings, then come back to delete the server object.
NOTE: You may need to bring the server DOWN before you can delete the server object. Also, don't worry if the server object will not delete. When you re-install DS back onto the server it will prompt you to replace the existing NCP server object. Make sure you install the server into the SAME CONTEXT that it existed in before.
Wait for the system to delete the server object before executing the following step. Check whether the server object has been deleted (Servers known to the database). Only after enough time has passed (30 minutes to 1 hour) should you continue:
4. Verify that each replica ring is consistent and valid.
Load DSrepair -a | Advanced Options | Replica and Partition Operations | (select each replica one at a time) | View Replica Ring
If the suspect server still exists in the replica ring, select it and press Enter. A screen will appear entitled Replica Options: Server <this server>. Select the option entitled "Remove this Server from the Replica Ring".
This will remove the suspect/crashed server from the replica ring for this partition. This information will synchronize out to the other servers in the replica ring.
This step needs to be completed for each replica that the suspect/crashed server contained.
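To keep track of which rings still need this step, a checklist can be built from notes taken while viewing each replica ring. The sketch below is only a bookkeeping aid under stated assumptions: the function name, partition names, and server names are hypothetical, and the ring contents are assumed to have been written down by hand from the View Replica Ring screens.

```python
from typing import Dict, List


def rings_to_clean(rings: Dict[str, List[str]], suspect: str) -> List[str]:
    """Return the partitions whose replica ring still lists `suspect`.

    `rings` maps partition name -> server names seen in that ring
    (hypothetical, hand-collected data). Names are compared without regard
    to case, purely as a convenience for hand-typed notes.
    """
    wanted = suspect.upper()
    return sorted(
        partition
        for partition, servers in rings.items()
        if any(server.upper() == wanted for server in servers)
    )


if __name__ == "__main__":
    rings = {
        "[Root]": ["FS1", "FS2"],
        "OU=Sales.O=Acme": ["FS2", "FS3"],
        "OU=Eng.O=Acme": ["FS1", "fs3"],
    }
    # Partitions where "Remove this Server from the Replica Ring" is still needed.
    print(rings_to_clean(rings, "FS3"))  # ['OU=Eng.O=Acme', 'OU=Sales.O=Acme']
```

An empty list means the suspect server no longer appears in any ring you recorded.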
5. Clean up the Tree (volume objects).
When the server object is deleted, the volume objects corresponding to it will either be removed as well or will go unknown; this is indicated by a yellow ? beside them in NWadmin or (unknown) beside the name in DOS.
Using Netadmin or NWadmin, delete the Volume Objects corresponding to the suspect/crashed server. These need to be removed before the server is re-installed so the volumes can be put in the tree correctly.
6. Summary and Miscellaneous.
At this point all references to the suspect server should be gone. There should not be any place in the tree where you can find the server or its volumes. It is important here to make sure that all servers are synchronizing and communicating properly. For example, make sure the server that went down was not acting as a router between two segments.
Enter SET DSTRACE=ON, SET DSTRACE=+S, and SET DSTRACE=*H at the server console, and make sure that everything completes correctly. You should see "SYNC: End sync of partition <partition name> All processed = YES." for each partition in the tree. See TID 2909026 for more information on DSTrace.
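If the trace output has been captured as text, the check for "All processed = YES" can be scripted rather than read by eye. The following is a minimal sketch, not a Novell tool: the function names are hypothetical, the only message format relied on is the one quoted above, and the exact text printed for a partition that did not finish (shown here as NO) is an assumption.

```python
import re
from typing import Dict, List

# Matches the completion message quoted in this document. The result word is
# captured loosely; anything other than YES is treated as "not complete".
END_SYNC = re.compile(
    r"SYNC: End sync of partition (?P<name>.+?)\.? All processed = (?P<result>\w+)"
)


def last_sync_results(trace_text: str) -> Dict[str, bool]:
    """Map each partition name to whether its LAST reported sync was YES."""
    results: Dict[str, bool] = {}
    for match in END_SYNC.finditer(trace_text):
        results[match.group("name").strip()] = match.group("result").upper() == "YES"
    return results


def unsynced(results: Dict[str, bool]) -> List[str]:
    """Return the partitions whose last reported sync was not YES."""
    return sorted(name for name, ok in results.items() if not ok)


if __name__ == "__main__":
    # Made-up sample text; the second line is a hypothetical failure case.
    sample = (
        "SYNC: End sync of partition <[Root]> All processed = YES.\n"
        "SYNC: End sync of partition <OU=Sales.O=Acme> All processed = NO.\n"
    )
    print(unsynced(last_sync_results(sample)))  # ['<OU=Sales.O=Acme>']
```

A partition that never appears in the captured text is not reported at all, so compare the result against your own list of partitions in the tree.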
Also, Load DSrepair | Report Synchronization Status. Make sure this reports all synced within the last half hour.
7. Reinstall the server into the tree using INSTALL | Directory Options | Install Directory Services onto this server.
Realize that any print queues referencing this server's volume objects will need to be recreated. Users' home directory assignments may also need to be reassigned, and trustee assignments (users' file rights) will have to be restored from your SMS-compliant backup solution. If you don't have an SMS backup, you will need to reassign file rights manually (use containers and groups to assign rights; it will go much faster).
The reason for this is that all of these assignments were pointing to a specific volume object ID. When the volume object was deleted and recreated (through the INSTALL process), the ID number changed. Any objects referencing the old ID number will not function properly.
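The effect of the changed ID can be illustrated with a toy model. This is not how NDS stores or assigns object IDs; the class, the sequential numbering, and the volume name FS1_SYS are all hypothetical and exist only to show why a reference to the old ID stops resolving after the object is recreated.

```python
from itertools import count
from typing import Dict, Optional


class ToyTree:
    """Toy stand-in for a directory: every new object gets a fresh ID."""

    def __init__(self) -> None:
        self._ids = count(1)
        self._objects: Dict[int, str] = {}

    def create(self, name: str) -> int:
        obj_id = next(self._ids)
        self._objects[obj_id] = name
        return obj_id

    def delete(self, obj_id: int) -> None:
        del self._objects[obj_id]

    def resolve(self, obj_id: int) -> Optional[str]:
        return self._objects.get(obj_id)


tree = ToyTree()
old_volume_id = tree.create("FS1_SYS")
home_directory_ref = old_volume_id  # an assignment that points at the volume by ID

tree.delete(old_volume_id)               # volume object deleted during cleanup
new_volume_id = tree.create("FS1_SYS")   # recreated under the same name by reinstall

print(new_volume_id != old_volume_id)    # True
print(tree.resolve(home_directory_ref))  # None
print(tree.resolve(new_volume_id))       # FS1_SYS
```

The recreated object has the same name but a different ID, so anything still holding the old ID points at nothing until it is reassigned.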
Search: install re-install reinstall server hard drive crash restore sys volume hardware failure taking out of tree remove removing
The origin of this information may be internal or external to Novell. Novell makes all reasonable efforts to verify this information. However, the information provided in this document is for your information only. Novell makes no explicit or implied claims to the validity of this information.
Any trademarks referenced in this document are the property of their respective owners. Consult your product manuals for complete trademark information.
- Document ID:
- Solution ID: 4.0.5521881.2242755
- Creation Date: 25Jul1999
- Modified Date: 23Jan2003