
eDirectory Partition Operations - Part 1

Novell Cool Solutions: AppNote
By Akos Szechy


Posted: 9 Nov 2005
 


Overview

This AppNote describes the different types of eDirectory replicas and the operations that can take place in eDirectory. Also, it offers advice on avoiding "stuck" operations and helps you see what is happening in the background during these operations. Hopefully, the tips and tricks in this AppNote will lessen the need for calls to Novell Support as well.

Each operation in this AppNote contains an example with the following items:

  • Description of the current operation
  • The states of the partition during the operation
  • The steps performed during the operation
  • Suggestions
  • Troubleshooting
  • Common error codes
  • Screenshots using the test environment

eDirectory Basics

This overview covers the most basic concepts of eDirectory: partitions and replica types. If you need more information about these, see the eDirectory documentation at:
http://www.novell.com/documentation/edir873/index.html?page=/documentation/edir873/edir873/data/fbadjaeh.html

Partitions

A partition is a logical division of the eDirectory database. A directory partition forms a distinct unit of data in the tree that stores directory information.

Partitions can be created at container-level objects, such as an Organization, an Organizational Unit, or any object that is marked as a container. An eDirectory tree has one partition by default, called the [ROOT] partition. Initially, this partition contains all the objects in the tree.

Replicas

A replica is a copy or an instance of a user-defined partition that is distributed to an eDirectory server. If you have more than one eDirectory server on your network, you can keep multiple replicas (copies) of the directory. That way, if one server or a network link to it fails, users can still log in and use the remaining network resources.

There are six types of replicas:

1. Master replica: There can be only one Master replica for a partition. The Master is a read-writeable replica that, most importantly, controls the partition operations and the obituary process. This type of replica also performs the following operations:

  • Managing objects (add, remove, move)
  • Authenticating objects
  • Managing attributes (add, remove)

By default the first server in the tree holds the Master replica of the [ROOT] partition.

2. Read-Write replica: This replica type allows modification to objects and will automatically propagate them to the other replicas based on the timestamps. You can designate a Read-Write replica as a Master replica.

3. Read-Only replica: This replica type is only readable. It does not perform any write operations; it will forward all writing requests to a Read-Write replica. The replica can be designated as a Master replica.

4. Filtered Read-Write Replica: This replica contains only a special set of classes and attributes specified by the filter. The replica can be written and the changes will be synchronized to the other replicas.

A server can hold more than one Filtered replica, but there can be only one filter specified per server that applies to all replicas stored on the server. Because the replica does not contain all the information about a partition (it has only a subset), it cannot be designated directly as a Master. It must be changed to a Read-only or Read-Write replica first.

5. Filtered Read-Only Replica: The same rules apply to this replica type as to the Filtered Read-Write Replica, but the replica is only readable; therefore all write requests are forwarded to a writable replica.

6. Subordinate reference replica: Subordinate reference replicas are system-generated replicas that don't contain all the objects, attributes and values that a Master or a Read-Write replica does. Subordinate reference replicas, therefore, don't provide fault tolerance. They are internal pointers that are generated to contain enough information for eDirectory to resolve object names across partition boundaries. You cannot create a Subordinate reference replica yourself; eDirectory creates it when the server holds a replica of the parent partition, but not a replica of a child partition.

The Subordinate replica holds no partition data, only information about the "real" replica-holder servers. Therefore it cannot be designated as a Master without adding a Read-Write or Read-Only replica.
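The characteristics of the six replica types can be summarized in a small lookup table. The following is only a sketch for illustration: the names ReplicaTraits, REPLICA_TRAITS and handles_write_locally are made up for this example and are not part of any eDirectory API. Marking the Subordinate reference as not writable is an assumption that follows from it holding no partition data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaTraits:
    writable: bool      # accepts write requests itself instead of forwarding them
    full_data: bool     # holds every object and attribute of the partition
    user_created: bool  # False for replicas that only the system can place


# Hypothetical summary table built from the descriptions above.
REPLICA_TRAITS = {
    "Master": ReplicaTraits(writable=True, full_data=True, user_created=True),
    "Read-Write": ReplicaTraits(writable=True, full_data=True, user_created=True),
    "Read-Only": ReplicaTraits(writable=False, full_data=True, user_created=True),
    "Filtered Read-Write": ReplicaTraits(writable=True, full_data=False, user_created=True),
    "Filtered Read-Only": ReplicaTraits(writable=False, full_data=False, user_created=True),
    # Assumption: a Subordinate reference holds no partition data, so it is
    # treated here as neither writable nor complete.
    "Subordinate reference": ReplicaTraits(writable=False, full_data=False, user_created=False),
}


def handles_write_locally(replica_type: str) -> bool:
    """Return True if this replica type applies a write itself."""
    return REPLICA_TRAITS[replica_type].writable


if __name__ == "__main__":
    for name, traits in REPLICA_TRAITS.items():
        print(f"{name:22} writable={traits.writable} full_data={traits.full_data}")
```

A table like this makes it easy to see at a glance why only the Read-Write and Read-Only replicas hold everything a Master needs.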

The following figure uses the common signs for different replica types:

Figure 1 - Icons for replica types

About eDirectory Partition Operations

This section walks you through partition operations with explanations and screenshots, giving you background information on what is happening on the eDirectory side.

General Technical Information

Each partition operation is introduced using an example. In this AppNote all operations are shown in a NetWare environment, but the partition operations are the same on all platforms. In DSREPAIR (and NDSREPAIR on the xNIX platforms) they may look a little different.

It's suggested you use the latest patch levels, so our tree name is NW65SP4_TREE (NetWare 6.5 SP4 is the latest Service Pack for NetWare 6.5 at this writing). There are two servers in the tree:

  • NW65SP4.Services running eDirectory 8.7.3.7 with IP address 172.16.63.16
  • APPNOTE2.Appnote running eDirectory 8.7.3.7 with IP address 172.16.63.17

Initially, we have the following partitions:

  • [ROOT] with a Master replica on NW65SP4
  • PLAY with a Master replica on NW65SP4
  • CHILD.PLAY with a Master replica on NW65SP4

Here is how the tree looks from ConsoleOne at the beginning:

Figure 2 - Tree view in ConsoleOne

Before You Start...

First you'll need to do some basic checks; otherwise you can easily end up with a stuck partition operation. There are a few "golden rules" that you must follow before you start any partition operation:

1. Always create a backup on the servers holding a replica of the partition(s) involved in the operation(s). Here are some backup methods:

  • Use the DSRepair -A/Advanced options/Create NDS Archive option to create a backup. Note that if something goes wrong, restoring from this archive requires the involvement of Novell Technical Services (the author prefers this method).
  • Use any file manager on the server side (CPQFM.NLM, TOOLBOX.NLM) to create a backup of the SYS:_NETWARE/ directory where the NDS files are stored. You have to unload ds.nlm before you create this backup. Note: Back up the SYS:_NETWARE/NDS.RFL directory as well; otherwise, in case of a restore, the database might not open.
  • Use eMbox or TSA software to do backups.

2. Always do a basic health check based on the following TID, which has a Tutorial as well on how to check your tree:
http://support.novell.com/cgi-bin/search/searchtid.cgi?10060600.htm

You may also want to reference the article on Using iMonitor to Perform eDirectory Health Checks, at:
http://www.novell.com/coolsolutions/feature/15336.html.

3. Be careful about using DSRepair for doing things like designating a new master. Remember that DSRepair is not an administration tool, but a repair tool (as the name indicates) - it bypasses many checks and can leave you in bad shape. Always use the administration tools for administrative tasks, and use diagnostic/repair tools for diagnostics and repair.

4. Always use the latest available patch levels.

5. Avoid using mixed versions of DS in the replica ring.

6. Be patient - some of the operations can be time consuming!

Adding a Replica to a Server

This operation is one of the easiest operations, but it can easily go wrong if there is not proper communication between the servers. It's absolutely necessary to check that the servers are able to communicate with each other.

If this is the first replica on a server, the schema will be synchronized to the server as well, which can take some time.

Replica States

The replica can have the following states during the operation:

  • Begin Add - the replica is being added to the server.
  • New - the server holding the Master replica for the partition adds the server into the replica ring. It also synchronizes the new member of the ring to the other servers holding a replica of this partition. (The Master server assigns a replica number to this replica.) The Master replica sends all the data in this partition to the server. During this state clients cannot log in to this replica, and the replica only accepts information from the Master replica. The necessary Subordinate replicas are created. The creation of a Subordinate replica is the same process as the creation of a New replica, but no real data is stored in these replicas, so this process is much faster.
  • Transition On - during this state, the replica synchronizes the changes with the other members of the replica ring. If all members are aware of the new member in the replica ring, and all subordinates are placed in the On state, the replica can also be set to the On state.
  • On - this is the final, operational state where the replica can accept changes and where clients can reach the replica. The replica will stay in On state.

The Add Process

The process of adding a replica is the following:

  1. The Master replica adds the server to the replica ring.
  2. The new server gets a state of "Begin add" and starts to add the replica. If this is the first replica on the server, it requests the schema.
  3. The changes are propagated to all members of the replica ring, and the replica is set to the New state.
  4. The necessary subordinate replicas are added to the server.
  5. All the partition data is synchronized from the Master to the server.
  6. Once all subordinates are ON and the data and the schema are synchronized, the Master turns the replica to the Transition On state.
  7. At this point, the members of the replica ring start to update the partition's data with the latest changes.
  8. Once that is done, the Master turns the replica to the ON state.
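The state changes in the add process can be pictured as a small state machine. The sketch below is only a model of the steps listed above, not eDirectory code; the names AddState, AddProgress and next_state are hypothetical, and the boolean fields are simplified stand-ins for the conditions the Master actually checks.

```python
from dataclasses import dataclass
from enum import Enum


class AddState(Enum):
    BEGIN_ADD = "Begin Add"
    NEW = "New"
    TRANSITION_ON = "Transition On"
    ON = "On"


@dataclass
class AddProgress:
    ring_notified: bool = False    # change propagated to all ring members
    subordinates_on: bool = False  # all necessary Subordinates are ON
    data_synced: bool = False      # partition data received from the Master
    schema_synced: bool = False    # schema received (first replica on a server)
    ring_up_to_date: bool = False  # latest changes exchanged with the ring


def next_state(state: AddState, progress: AddProgress) -> AddState:
    """Return the state the replica moves to, or the same state if it must wait."""
    if state is AddState.BEGIN_ADD and progress.ring_notified:
        return AddState.NEW
    if (state is AddState.NEW and progress.subordinates_on
            and progress.data_synced and progress.schema_synced):
        return AddState.TRANSITION_ON
    if state is AddState.TRANSITION_ON and progress.ring_up_to_date:
        return AddState.ON
    return state


if __name__ == "__main__":
    progress = AddProgress(ring_notified=True)
    state = next_state(AddState.BEGIN_ADD, progress)
    print(state.value)  # the replica waits in New until data and schema arrive
```

A replica that appears stuck is simply one for which the condition of its current state has not been met yet, which is what the troubleshooting steps later in this section help you find.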

Suggestions

  1. Always add the replicas from the bottom to the top. Never start by adding a [ROOT] first, because it will create unnecessary Subordinates.
  2. Always check that you have a Master replica for the partitions involved in the operation. Also take into consideration that Subordinates might be created - and if the partition with the Subordinate does not have a Master replica, the process will not finish. (It will lead to a -602 error when you try to add the replica.)
  3. Check that the new server will be able to communicate with the rest of the replica ring and the DS versions are compatible with each other.
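The bottom-to-top rule from the first suggestion can be expressed as a simple sort. This is a minimal sketch that assumes partitions are written as dotted names with the deepest container first (as in CHILD.PLAY); partition_depth and add_order are hypothetical helper names, not part of any Novell tool.

```python
def partition_depth(name: str) -> int:
    """Depth of a partition root below [ROOT], assuming dotted names."""
    if name.upper() == "[ROOT]":
        return 0
    return len(name.split("."))


def add_order(partitions):
    """Order partitions so the deepest ones get their replica first."""
    return sorted(partitions, key=partition_depth, reverse=True)


if __name__ == "__main__":
    print(add_order(["[ROOT]", "PLAY", "CHILD.PLAY"]))
```

With the test tree used in this AppNote, the replica of CHILD.PLAY would be added first and the replica of [ROOT] last.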

Troubleshooting and Error Codes

If you think the replica is stuck for some reason, you can use the DSTrace utility to check what's going on in the background. Basically, you want to enable the schema, the inbound replica synchronization, and the partition operations. Therefore the following DSTRACE commands can be used on the server receiving the replica:

  • set dstrace = on (turns on the dstrace by enabling the Directory Services screen)
  • set dstrace = nodebug (turns off unnecessary messages)
  • set dstrace = +part (shows messages related to partition operations)
  • set dstrace = +in (shows incoming objects)
  • set dstrace = +schema (shows if schema synchronization is still in progress)
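If you type these commands often, a small script can print the sequence for you to paste into the server console. This is only a convenience sketch: dstrace_commands is a hypothetical helper, and it deliberately accepts only the three flags mentioned above.

```python
# The only flags this helper knows about are the ones used in this AppNote.
KNOWN_FLAGS = ("part", "in", "schema")


def dstrace_commands(flags):
    """Build the console commands that enable DSTrace with the given flags."""
    unknown = [flag for flag in flags if flag not in KNOWN_FLAGS]
    if unknown:
        raise ValueError(f"flags not covered by this sketch: {unknown}")
    commands = ["set dstrace = on", "set dstrace = nodebug"]
    commands.extend(f"set dstrace = +{flag}" for flag in flags)
    return commands


if __name__ == "__main__":
    for line in dstrace_commands(["part", "in", "schema"]):
        print(line)
```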

If you see messages in brown text, like "Received packet with entry found 1", this means that the server is still receiving the object from the server in the trace (in this case from NW65SP4). The synchronization happens by timestamps - the earlier the object is created, the earlier it is synchronized:

Figure 3 - Server still receiving objects

If you see messages in green text, that means schema synchronization is in progress:

Figure 4 - Schema synchronization in progress

Here are some error codes that might appear:

  • -602 NO_SUCH_VALUE - During the add of the replica, this usually indicates no Master replica for a partition involved in the operation.
  • -603 NO_SUCH_ATTRIBUTE - In DSREPAIR/Report synchronization status - no error; this is normal while the replica is in New state and for a couple of minutes after the ON state
  • -637 PREVIOUS_MOVE_IN_PROGRESS - When you try to add a replica, this error indicates you have some obituaries in your system. To solve this issue, follow TID 10064117.
  • -654 PARTITION_BUSY - Another partition operation hasn't finished yet, or you have problems with your Transitive Vectors. A Repair Local Database operation might fix the error; otherwise, check the Knowledgebase for other possible reasons.
  • -657 SCHEMA_SYNC_IN_PROGRESS - The schema synchronization hasn't finished yet. The error should be solved by the DS itself.
  • -659 TIME_NOT_SYNCHRONIZED - The time is not synchronized between the servers, or you have future timestamps in the partition. Look for "Synthetic time is being issued" messages on the console of the server holding the Master replica for the partition.
  • -698 REPLICA_IN_SKULK, -6015 ERR_SERVER_IN_SKULK - The server is synchronizing the partition with another server; therefore it cannot deal with the server giving the error. The error should be solved by DS itself.
  • -761 ERR_SEEN_THIS_STATE - During synchronization, this error message is normal. This is how the server notifies the other servers it is up to date.
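For a quick first triage, the list above can be kept as a lookup table that separates the errors that normally clear on their own from the ones that need investigation. The sketch below only restates the list; ERROR_CODES and triage are hypothetical names, and the grouping reflects the descriptions in this AppNote rather than an official classification.

```python
# code -> (name, True if the AppNote describes it as normal or self-resolving)
ERROR_CODES = {
    -602: ("NO_SUCH_VALUE", False),
    -603: ("NO_SUCH_ATTRIBUTE", True),
    -637: ("PREVIOUS_MOVE_IN_PROGRESS", False),
    -654: ("PARTITION_BUSY", False),
    -657: ("SCHEMA_SYNC_IN_PROGRESS", True),
    -659: ("TIME_NOT_SYNCHRONIZED", False),
    -698: ("REPLICA_IN_SKULK", True),
    -6015: ("ERR_SERVER_IN_SKULK", True),
    -761: ("ERR_SEEN_THIS_STATE", True),
}


def triage(code: int) -> str:
    """Return a one-line hint for an error code seen during a replica add."""
    if code not in ERROR_CODES:
        return f"{code}: not covered in this AppNote"
    name, transient = ERROR_CODES[code]
    verdict = "usually clears on its own" if transient else "needs investigation"
    return f"{code} {name}: {verdict}"


if __name__ == "__main__":
    for code in (-654, -698, -601):
        print(triage(code))
```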

Scenario

Suppose we need to add a replica of the PLAY partition to the APPNOTE2 server. Before we start, we should answer the following questions:

  • Which partitions will be involved in the operation? - Two partitions are involved: PLAY and CHILD. A Subordinate of the CHILD partition will be created on APPNOTE2, because we place a replica of the parent partition on the server, but not a replica of the child.
  • Which servers will be involved in the operations? - Two servers are involved, because the Master is sending the data to the Read-Write replica we are currently adding.
  • Will any subordinates be created? Yes, a subordinate will be created of the CHILD partition after the replica add.
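The answer to the last question follows from the rule for Subordinate references: the server needs one for every child partition whose parent it holds while it does not hold the child itself. A minimal sketch of that check for our test tree is shown below. The PARENT_OF table and the subordinates_needed function are hypothetical, and the table assumes that PLAY sits directly under [ROOT].

```python
# child partition -> parent partition (assumed layout of the test tree)
PARENT_OF = {
    "PLAY": "[ROOT]",
    "CHILD.PLAY": "PLAY",
}


def subordinates_needed(held, parent_of):
    """Partitions for which a server holding `held` needs a Subordinate reference."""
    return {
        child
        for child, parent in parent_of.items()
        if parent in held and child not in held
    }


if __name__ == "__main__":
    # APPNOTE2 after the replica of PLAY has been added.
    print(subordinates_needed({"PLAY"}, PARENT_OF))
```

Running the same check with both PLAY and CHILD.PLAY in the set returns an empty set, which is why adding replicas from the bottom up avoids unnecessary Subordinates.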

Now let's see the operation step by step:

1. We add the replica from ConsoleOne.

Figure 5 - Adding the replica

2. The replica is added on NW65SP4, and the Subordinate of CHILD is created on APPNOTE2:

Figure 6 - CHILD Subordinate created

3. Once the Subordinate is added, the replica is changed to NEW.

Figure 7 - Replica in New state

4. When all the synchronization is finished, the replica changes to ON:

Figure 8 - Replica in On state

Removing a Replica

People usually say that removing a replica is not an issue - you can always bypass the normal approach and force it off. This is true, but it's usually better to use the normal way. There are a lot of issues when you forcefully remove a replica, and your system can easily become inconsistent.

When you remove a replica, consider the following items - some of them might seem trivial, but it's better to be on the safe side:

  • The Master replica of a partition cannot be removed. You have to move the master to another server first.
  • A Subordinate replica of a partition cannot be removed. They are system-placed replicas, and the system has to remove them automatically.
  • When you remove a replica, eDirectory has to check all the child partitions to determine if a Subordinate exists and if it needs to be removed.
  • eDirectory also has to check the parent partition when you remove a replica; a Subordinate has to be created in its place if the server holds a replica of the parent of the partition.

Replica States

In the Dying state, the server checks if a Subordinate needs to be created. If so, it propagates the change of the replica type and continues to work as in an "Adding a replica" operation. If no Subordinate is necessary, it propagates the changes to the members of the replica ring and moves all the objects in the partition to the External Reference partition. All the objects become external references, and the backlinker checks whether the objects are really needed or can be deleted from the local database.

In the Dead state, the other servers of the replica ring are notified about the replica's state; once that's done, the replica can be removed.

Removal Process

  1. The Master of the partition notifies the members that the replica will be removed.
  2. The replica changes to Dying state.
  3. All the objects are converted to external references.
  4. eDirectory checks if a Subordinate needs to be created. If so, it places a Subordinate replica and continues the operation as a Replica add; otherwise the replica enters the Dead state.
  5. Once all the servers are notified, the partition record will be removed, and the replica disappears from the server.
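The decisions in the removal process can be sketched as a single function. This is only an illustration of the rules described in this section; removal_path is a hypothetical name, and the returned list is a simplified trace of the states rather than anything eDirectory reports.

```python
def removal_path(replica_type: str, server_holds_parent: bool) -> list:
    """Return the sequence of states a replica goes through when it is removed."""
    if replica_type == "Master":
        raise ValueError("move the Master to another server first")
    if replica_type == "Subordinate reference":
        raise ValueError("Subordinates are removed by the system, not by you")
    if server_holds_parent:
        # The parent replica stays on the server, so a Subordinate takes
        # the place of the removed replica.
        return ["Dying", "Subordinate reference"]
    return ["Dying", "Dead", "Removed"]


if __name__ == "__main__":
    print(removal_path("Read-Write", server_holds_parent=False))
    print(removal_path("Read-Write", server_holds_parent=True))
```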

Suggestions

  • Always remove the replicas from top-to-bottom to prevent unnecessary creation of Subordinate replicas.
  • Subordinates that are no longer needed may not disappear for a long time (one day). This is an issue with older DS versions; to clear such a Subordinate, add a replica of that partition to the server and then remove it.

Troubleshooting and Error Codes

Again, your friend for troubleshooting replica removals is DSTrace - however, you only have to enable the partition flag so you can see if the system is working:

  • set dstrace = on (turns on DSTrace by enabling the Directory Services screen)
  • set dstrace = nodebug (turns off unnecessary messages)
  • set dstrace = +part (shows the messages related to partition operations)

Here are error codes that might appear:

  • -637 PREVIOUS_MOVE_IN_PROGRESS - When you try to remove a replica, this error indicates you have some obituaries in your system. To solve this issue, follow TID 10064117.
  • -654 PARTITION_BUSY - Another partition operation hasn't finished yet, or you have problems with your Transitive Vectors. A Repair Local Database operation might fix the error; otherwise, check the Knowledgebase for other possible reasons.
  • -659 TIME_NOT_SYNCHRONIZED - The time is not synchronized between the servers, or you have future timestamps in the partition. Look for "Synthetic time is being issued" messages on the console of the server holding the Master replica for the partition.
  • -698 REPLICA_IN_SKULK, -6015 ERR_SERVER_IN_SKULK - The server is synchronizing the partition with another server; therefore it cannot deal with the server giving the error. The error should be solved by DS itself.
  • -761 ERR_SEEN_THIS_STATE - During synchronization, this error message is normal. This is how the server notifies the other servers it is up to date.

Case Study

Suppose we need to remove the replica of the PLAY partition from the APPNOTE2 server. Let's walk through the questions again:

  • Which partitions will be involved in the operation? - Two partitions are involved: PLAY and CHILD. As we remove the replica of the parent partition, the Subordinate replica of CHILD is no longer necessary, and DS removes it.
  • Which servers are involved in the operation? - All servers in the replica rings of PLAY and CHILD.PLAY are involved.
  • Will any subordinates be created or removed? - Yes, the Subordinate replica of the CHILD partition will be removed.

1. We remove the replica from ConsoleOne, and the replica changes to Dying:

Figure 9 - Replica in Dying state

2. After the synchronization happens, the replica status changes to Dead.

Figure 10 - Replica in Dead state

3. Once the replica is removed, the Subordinate is still there:

Figure 11 - Replica in On state

4. In the latest DS version, the Subordinate is removed by DS.NLM. In earlier versions, sometimes unnecessary Subordinates get stuck, and a replica of the stuck partition (in this case, CHILD.PLAY) needs to be added and later removed from the server.

First the replica starts to die, and then it goes away. Because this one was the last replica, DSRepair tells us that no more replicas are found on the server:

Figure 12 - No Replicas on server

Changing the Replica Type

Changing the type of the replica is easy, using graphical interfaces. Basically, there are two possibilities:

  1. You can change any of the replicas to a Master replica, except a Subordinate reference - a replica needs to be added in that case.
  2. You can change any replica except a Master and a Subordinate reference to Read-Only, Read-Write, or any of the filtered ones.

For a filtered replica, a brand new replica will be added to the server, while changing a Read-Write or Read-Only replica is basically only a change of the replica type.
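The rules for type changes can be collected into a small planning function. This is a sketch of the rules in this AppNote only: plan_type_change is a hypothetical name, and routing a filtered replica through Read-Write on its way to Master is one of the two options described earlier (Read-Only would work as well).

```python
FULL = ("Read-Write", "Read-Only")
FILTERED = ("Filtered Read-Write", "Filtered Read-Only")
KNOWN = ("Master", "Subordinate reference") + FULL + FILTERED


def plan_type_change(current: str, target: str) -> list:
    """Return the replica types to step through to get from current to target."""
    if current not in KNOWN or target not in KNOWN:
        raise ValueError("unknown replica type")
    if current == target:
        return []
    if target == "Subordinate reference":
        raise ValueError("Subordinate references are placed by the system only")
    if current == "Subordinate reference":
        raise ValueError("add a replica to the server instead of changing the type")
    if current == "Master":
        raise ValueError("designate another replica as the Master instead")
    if target == "Master" and current in FILTERED:
        # A filtered replica holds only a subset of the data, so it has to
        # become a full replica before it can be the Master.
        return ["Read-Write", "Master"]
    return [target]


if __name__ == "__main__":
    print(plan_type_change("Read-Only", "Master"))
    print(plan_type_change("Filtered Read-Only", "Master"))
```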

Replica States

  • Master Start - This means that the replica type is being changed.
  • Master Done - The replica type has changed on all servers in the ring and the next state is going to be ON.

Normally you don't see these states, as they change very quickly.

Replica Change Process

If you change a replica to anything except a Master, it's pretty easy; the Master makes a change request that goes through the members of the replica ring.

If you change a replica to Master, it's a little tricky. Here is the process:

  1. The Master replica propagates the change to the other servers.
  2. The current Master replica notifies the new Master that it is about to become the Master.
  3. The new Master checks the timestamp of the replica issued by the old Master, waits until its own clock is ahead of this timestamp, and then timestamps the replica.
  4. The old master changes itself to Read-Write and the change is propagated.
  5. The replica state is propagated as ON.
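The waiting step of the new Master is the part that makes this process take longer when the clocks of the two servers differ. The sketch below illustrates the idea only: plain floats stand in for eDirectory timestamps, and wait_until_clock_passes with its now and sleep parameters is a hypothetical helper written for this example.

```python
import time


def wait_until_clock_passes(old_master_timestamp, now=time.time,
                            sleep=time.sleep, poll_interval=0.1):
    """Wait until the local clock is ahead of the old Master's timestamp.

    Returns the local time that the new Master can use for its own timestamp.
    """
    current = now()
    while current <= old_master_timestamp:
        sleep(poll_interval)
        current = now()
    return current


if __name__ == "__main__":
    # The old Master stamped the replica "now"; the new Master is in sync,
    # so the wait is very short.
    stamp = wait_until_clock_passes(time.time())
    print(f"new Master timestamp: {stamp:.3f}")
```

If the clock of the new Master is behind the clock of the old Master, the loop keeps waiting, which is one reason why time synchronization should be checked before you change the Master.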

Suggestions

Don't use the "Designate this server as the new master" feature of DSRepair unless you have lost your original Master replica. You may end up with two Master replicas if there are synchronization issues.

Always create a backup on the members of the replica ring using DSREPAIR -RC, in case you end up with more than one Master replica.

Troubleshooting and Error Codes

With DSTrace, you only have to enable the partition flag to see if the system is working:

  • set dstrace = on (turns on the DSTrace by enabling the Directory services screen)
  • set dstrace = nodebug (turns off unnecessary messages)
  • set dstrace = +part (shows the messages related to partition operations)

Here are the error codes that might appear:

  • -637 PREVIOUS_MOVE_IN_PROGRESS - When you try to change the replica type, this error indicates you have some obituaries in your system. To solve this issue, follow TID 10064117.
  • -654 PARTITION_BUSY - Another partition operation hasn't finished yet, or you have problems with your Transitive Vectors. A Repair Local Database operation might fix the error; otherwise, check the Knowledgebase for other possible reasons.
  • -659 TIME_NOT_SYNCHRONIZED - The time is not synchronized between the servers, or you have future timestamps in the partition. Look for "Synthetic time is being issued" messages on the console of the server holding the Master replica for the partition.
  • -698 REPLICA_IN_SKULK, -6015 ERR_SERVER_IN_SKULK - The server is synchronizing the partition with another server; therefore it cannot deal with the server giving the error. The error should be solved by DS itself.
  • -761 ERR_SEEN_THIS_STATE - During synchronization, this error message is normal. This is how the server notifies the other servers it is up to date.

Case Study

Suppose we have two replicas for the partition PLAY: the Master is on NW65SP4 and a Read-Write replica on APPNOTE2. We need to change the Master to APPNOTE2.

  • Which partitions will be involved in the operation? - Only the PLAY partition is involved.
  • Which servers are involved in the operation? - Two servers are involved: the one holding the Master and the one holding the Read-Write replica that is being changed to Master.
  • Will any subordinates be created or removed? - No, a replica type change does not involve any Subordinate addition or removal.

1. We change the replica type from ConsoleOne, and the replica state changes to Master Start:

Figure 13 - Replica in Master Start state

2. Once the change is propagated over the replica ring, the replica changes to Master Done:

Figure 14 - Replica in Master Done state

3. The replica turns to ON and the replica type changes to Master:

Figure 15 - Replica in On state

