Resolving -761 errors

  • 7021255
  • 29-Aug-2017
  • 29-Aug-2017

Environment

Novell eDirectory 8.8 SP7
Novell NetWare 6.5

Situation

A child partition was merged into its parent partition
Servers may be stuck in Join State 2
Report Synchronization Status shows -761 errors
DSTRACE may show -672 errors even though no replica ring inconsistencies are found.

Resolution

Note: this TID is only applicable to older versions of eDirectory such as early versions of 8.8 SP7.

Option 1
1. Disable outbound synchronization on all servers in the affected replica ring through NDS iMonitor. To do this, go to Agent Configuration | Agent Synchronization.

2. Run the latest version of DSREPAIR on all servers in the replica ring at the same time, with the "-ANT" switch.
For NetWare:
- Load DSREPAIR with the -ANT switch, for example: LOAD DSREPAIR -ANT
- Run a local dsrepair

For Windows:
- In the NDSCons Utility, click on ndsrepair, type -ANT in the "Startup Parameters" field and click start
- Run a local dsrepair

For Unix platforms (Solaris, AIX, Linux):
- At a console, run ndsrepair -l yes -R -Ad -ANT

3. Enable outbound synchronization on all servers in the affected replica ring through NDS iMonitor. To do this, go to Agent Configuration | Agent Synchronization.


After running DSREPAIR on all servers in the replica ring, trigger a synchronization through NDS iMonitor from the Master of the partition by doing the following:
Agent Configuration | Agent Triggers.  Select Replication and click on the Submit button.

This should resolve Join State 2 and return all replicas to an ON state. Allow some time for changes to synchronize. If the replicas are still in Join State 2, move to Option 2 below.

Option 2 - Use NDS iMonitor to verify whether any transitive vector timestamps are in the future.
WARNING: The following steps are irreversible. Make a backup of the database before continuing:
On NetWare:
Run dsrepair -rc

On Windows:
Make a copy of the ..\Novell\NDS\DibFiles directory

On Unix:
Make a copy of the /var/nds/dib directory 
NOTE:  Make sure you use the -R switch with the cp command to get all subdirectories.
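
The Unix backup step above can also be scripted. The following is a minimal sketch only, not a supported procedure: the function name backup_dib and the timestamped folder naming are hypothetical, and it assumes the account running it can read the DIB directory and that the destination has enough free space.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_dib(dib_dir: str, backup_root: str) -> Path:
    """Copy the DIB directory, with all subdirectories, into a timestamped folder.

    Hypothetical helper for illustration; returns the path of the new copy.
    """
    source = Path(dib_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"DIB directory not found: {source}")

    # A timestamped name keeps earlier backups from being overwritten.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"dib-backup-{stamp}"

    # copytree recurses into subdirectories (the equivalent of cp -R)
    # and raises an error if the target already exists.
    shutil.copytree(source, target)
    return target
```

A call such as backup_dib("/var/nds/dib", "/root/dib-backups") would copy the directory named above; the destination path here is only an example.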

Do the following: 

- Log in to NDS iMonitor on the server holding the master of the partition that is getting the -761 errors.
- Click on Agent Synchronization
- Find the Partition in question in the display list and click on the partition.
- Scroll through the attributes in the left pane until you find the Transitive Vector attribute.
- View the timestamps on the right, looking for one in the future. 
- If a future timestamp is found, note the replica number. For example:
7-29-03
3:52:04 pm
1:2
In the above example, the replica number is 1.
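
If the Transitive Vector entries are copied out of iMonitor by hand, the future-timestamp check can be sketched as below. This is an illustration under assumptions, not an iMonitor API: the names VectorEntry, parse_entry and future_replicas are hypothetical, the second number after the colon is assumed to be an event ID, and the displayed times are assumed to be in the same time zone as the clock they are compared with.

```python
from datetime import datetime
from typing import Iterable, List, NamedTuple


class VectorEntry(NamedTuple):
    when: datetime   # timestamp as displayed
    replica: int     # number before the colon (the replica number)
    event: int       # number after the colon (assumed to be an event ID)


def parse_entry(date_text: str, time_text: str, id_text: str) -> VectorEntry:
    """Parse one entry displayed as e.g. '7-29-03', '3:52:04 pm', '1:2'."""
    when = datetime.strptime(f"{date_text} {time_text}", "%m-%d-%y %I:%M:%S %p")
    replica_text, event_text = id_text.split(":")
    return VectorEntry(when, int(replica_text), int(event_text))


def future_replicas(entries: Iterable[VectorEntry], now: datetime) -> List[int]:
    """Return the sorted replica numbers whose timestamp is later than `now`."""
    return sorted({entry.replica for entry in entries if entry.when > now})
```

For the example entry above, parse_entry("7-29-03", "3:52:04 pm", "1:2") yields replica number 1, and future_replicas reports that replica only if the current time passed in is earlier than the parsed timestamp.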

Note the replica numbers with future timestamps and perform the following:
- Scroll through the attribute list on the left until the "Replica" attribute is found.
- Click on the Replica attribute
- The 6th column is the replica number.  Find the replica number that was identified above and note the associated server (listed in the 3rd column)

Once the servers have been identified, do the following:
NOTE: The replica ring for the partition receiving the -761 errors in Report Synchronization Status MUST be consistent. To check, go to NDS iMonitor | Agent Synchronization and click Replica Synchronization, located on the left of the screen on the same row as the partition receiving the -761 errors. Note the servers in the replica ring and their replica types. Then click on each server in the replica ring (the servers are listed in a Replica section in the bottom far-left frame). All servers MUST show the same view. If they do not, call Novell Technical Support.
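
Once each server's view of the ring has been noted down, the "same view" comparison can be sketched as follows. This is a hedged illustration: the function names and the data layout (server name mapped to the set of server and replica-type pairs it reports) are assumptions, and the views still have to be collected manually from iMonitor.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

RingView = Set[Tuple[str, str]]  # (server name, replica type) pairs


def ring_is_consistent(views: Dict[str, RingView]) -> bool:
    """True when every server reports the same set of (server, type) pairs."""
    return len({frozenset(view) for view in views.values()}) <= 1


def differing_servers(views: Dict[str, RingView]) -> List[str]:
    """Return the servers whose view differs from the most common view."""
    counts = Counter(frozenset(view) for view in views.values())
    if not counts:
        return []
    majority, _ = counts.most_common(1)[0]
    return sorted(name for name, view in views.items() if frozenset(view) != majority)
```

If ring_is_consistent returns False, differing_servers lists the servers to mention when contacting support; it does not attempt any repair.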

NOTE: Before using the DSREPAIR switches below, verify that you are running the latest DSREPAIR available.

Remove the partition on the servers that have the future timestamps by performing the following:

On NetWare:
Load DSREPAIR -A -DR | Advanced Options menu | Replica and Partition Operations | highlight the parent partition receiving the -761 errors in Report Synchronization Status | Destroy Selected replica on this server.

On Windows:
- In the NDSCons Utility, click on ndsrepair, type -A -DR in the "Startup Parameters" field.  Click on Start
- Highlight the desired partition on the left. 
- Click on Partitions | Destroy Selected Replica from Server

On Unix:
- Run ndsrepair -P -Ad -DR
- Select the replica number in question
- Select "Destroy the selected replica on this server"


Use NDS iMonitor to synchronize the changes by doing the following:
Agent Configuration | Agent Triggers.  Select Replication and click on the Submit button.

This should resolve Join State 2 and return all replicas to an ON state.

Upgrade all servers in the replica ring to the latest versions of DS.  The minimum versions of DS that need to be placed on the servers are the latest public eDirectory and NDS updates.

After all servers in the ring are upgraded, you can add real copies back to the server. Start with the parent and wait until it reaches an ON state. Then change the subordinate reference of the child (which will now show a Dying state) into a Read Write copy of the partition.

Note:  If none of the previous steps resolve this issue, call Novell Technical Support for assistance.


Additional Information

A transitive vector timestamp for one of the replicas was in the future. This is an issue with earlier versions of DS. All servers that are participating in the merge must be on the latest versions of DS.
Formerly known as TID# 10072036
Formerly known as TID# NOVL80465