eDirectory Partition Operations - Part 2

Novell Cool Solutions: AppNote
By Akos Szechy

Posted: 16 Nov 2005
 

Note: This is the second of two AppNotes on partitions and replicas - the first AppNote is at: http://www.novell.com/coolsolutions/appnote/15969.html

Partition Creation

Creating a partition (also called splitting) is more complex than the operations discussed in the earlier AppNote. There are more states to follow, and more details to know.

First, it's important to clearly understand the following terms:

  • Parent - the old partition. The new partition is cut from this partition.
  • Child - the new partition that will be created.

When you create a partition, eDirectory cuts a part of the tree out of the parent and places it into a new partition record. A replica of the new partition is placed on every server in the replica ring of the parent partition. This applies only to servers holding real replicas, not to servers holding subordinates.

If the parent's replica ring contains a subordinate replica, the child's replica ring will not contain any replica on that server. From that server's point of view the parent is already the "child" for which the subordinate is held, so the new partition is a "grandchild", and a grandchild does not need a subordinate.
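
The placement rule can be sketched in a few lines of Python. This is only an illustration of the rule described above, not eDirectory code; the function name `child_ring_servers`, the server names, and the dictionary layout are assumptions made for the example.

```python
def child_ring_servers(parent_ring):
    """Return the servers that receive a replica of the new child partition.

    parent_ring maps a server name to the replica type it holds for the
    parent partition (a hypothetical layout chosen for this sketch).
    """
    # Servers holding only a subordinate of the parent get nothing for the child.
    return sorted(
        server
        for server, replica_type in parent_ring.items()
        if replica_type != "Subordinate"
    )


# Hypothetical replica ring of the parent partition before the split.
parent_ring = {"SRV1": "Master", "SRV2": "Read-Write", "SRV3": "Subordinate"}
print(child_ring_servers(parent_ring))
```

In this sketch SRV3 is left out of the child's ring because it holds only a subordinate of the parent.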

Replica States

  • Split state 0 - The master requests the members of the replica ring to split the partitions.
  • Split state 1 - The master checks if every update is successful and then turns the parent and the child to ON.

Partition Split Process

  1. The Master informs the replica ring that a new partition is being created.
  2. All the servers are notified and set to Split state 0.
  3. Every server splits the current partition and gives the new partition the correct replica ring, which does not include Subordinate replicas.
  4. When all servers have split the partition, the Master finishes the synchronization during Split state 1.
  5. When every server has received the correct updates, the Master turns on both partitions and synchronizes the changes.
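
The sequence of states can be modeled as a small toy state machine. The sketch below is an assumption-laden illustration of the steps above, not a description of how eDirectory is implemented; `run_split`, the `split_succeeds` callback, and the server names are hypothetical.

```python
def run_split(ring_servers, split_succeeds):
    """Return (state history, servers still pending) for a partition split.

    ring_servers: servers holding real replicas of the parent partition.
    split_succeeds: hypothetical callback, True if a server split its replica.
    """
    history = ["Split state 0"]  # steps 1-2: the ring is notified
    # Step 3: every server splits its copy of the partition.
    pending = [s for s in ring_servers if not split_succeeds(s)]
    if pending:
        # The Master cannot advance until every server has split.
        return history, pending
    history.append("Split state 1")  # step 4: the Master finishes the sync
    history.append("ON")  # step 5: both partitions are turned on
    return history, pending


history, pending = run_split(["SRV1", "SRV2"], lambda server: True)
print(history, pending)
```

If one server never reports a successful split, the model stays in Split state 0, which mirrors why communication between the servers matters so much.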

Suggestions

  • Always create a backup on the members of the replica ring, using DSREPAIR -RC.
  • Minimize the size of the replica ring as much as you can, so fewer servers will be involved in the partition operation.
  • Make sure communication is perfect between the servers.

Troubleshooting Steps

Use DSTrace. You only need to enable the partition flag, so you can see if the system is working:

  • set dstrace = on (turns on the DSTrace by enabling the Directory Services screen)
  • set dstrace = nodebug (turns off all unnecessary messages)
  • set dstrace = +part (shows the messages related to partition operations)

A partition operation can be cancelled while the partitions are in Split State 0, but there is a risk that the parent will not change back to the ON state. In this case, a dial-in from Novell Technical Services is necessary.

Error Codes

  • -637 PREVIOUS_MOVE_IN_PROGRESS - When you try to add a replica, this error indicates there are obituaries in the system. To solve this issue, follow TID 10064117.
  • -654 PARTITION_BUSY - Another partition operation hasn't finished yet, or you have problems with your Transitive Vectors. A Repair Local Database operation might fix the error; otherwise, check the Knowledgebase for other possible reasons.
  • -659 TIME_NOT_SYNCHRONIZED - The time is not synchronized between the servers, or you have future timestamps in the partition. Look for "Synthetic time is being issued" messages on the console of the server holding the Master replica for the partition.
  • -698 REPLICA_IN_SKULK, -6015 ERR_SERVER_IN_SKULK - The server is synchronizing the partition with another server; therefore it cannot deal with the server giving the error. The error should be solved by the DS itself.
  • -761 ERR_SEEN_THIS_STATE - During synchronization, this error message is normal. This is how the server notifies the other servers it is up to date.
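
When you read trace output it can help to keep these codes in a small lookup table. The following Python sketch uses only the codes and names listed above; the hint texts are shortened paraphrases, and the `describe` helper is a hypothetical name introduced for illustration.

```python
# Error codes and names as listed above; the hints are shortened paraphrases.
PARTITION_ERRORS = {
    -637: ("PREVIOUS_MOVE_IN_PROGRESS", "obituaries in the system; see TID 10064117"),
    -654: ("PARTITION_BUSY", "another operation is unfinished, or Transitive Vector problems"),
    -659: ("TIME_NOT_SYNCHRONIZED", "time not synchronized, or future timestamps"),
    -698: ("REPLICA_IN_SKULK", "server is busy synchronizing; DS should resolve it"),
    -6015: ("ERR_SERVER_IN_SKULK", "server is busy synchronizing; DS should resolve it"),
    -761: ("ERR_SEEN_THIS_STATE", "normal during synchronization"),
}


def describe(code):
    """Return a one-line description for a partition-related error code."""
    if code not in PARTITION_ERRORS:
        return f"{code} (not in this table; check the Knowledgebase)"
    name, hint = PARTITION_ERRORS[code]
    return f"{code} {name}: {hint}"


print(describe(-654))
```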

Case Study

Suppose you have two replicas of the partition PLAY: the Master is on NW65SP4 and a Subordinate replica is on APPNOTE2. We need to create the new partition CHILD.PLAY.

* Which partitions will be involved in the operations?
The PLAY partition and the newly created CHILD.PLAY partition are involved.

* Which servers are involved in the operations?
Two servers are involved, because APPNOTE2 holds a Subordinate of the PLAY partition.

* Will a subordinate be created for the CHILD.PLAY partition?
No. APPNOTE2 contains a replica of [ROOT], therefore it has a subordinate replica of the PLAY partition. CHILD.PLAY is a child of the PLAY partition, but a grandchild of [ROOT], therefore no subordinate is necessary.

1. We split the partition in ConsoleOne. As a first step the parent partition changes to Split State 0 on both servers:

Figure 1 - Master, Split State 0

Figure 2 - Subordinate, Split State 0

2. Once the synchronization happens in this phase, the new partition is created in Split State 1:

Figure 3 - Split State 1

3. After the synchronization of Split State 1, the replicas turn to ON:

Figure 4 - Replicas in ON state

Partition Merge

Now let's move forward to more complex operations. The partition split was quite easy, as eDirectory didn't really have to deal with child replica rings and handle replica placements. But for a Merge or Join, eDirectory has to consider those issues.

When you join partitions, eDirectory removes the partition record for a specified partition and moves all the data to the parent partition. At this point eDirectory must consider replica placements.

So what happens if there is a server in the replica ring of the child partition that is not in the replica ring of the parent partition? Or what if a subordinate is there?
The answer: The server will receive a full copy of the parent partition, including all the subordinates that need to be created at this point.

Replica States

  • Join state 0 - The Master replicas of the child and parent partitions work together to change the state of the partitions involved in the operation. Servers that need to receive a replica will get the partition at this point. Servers involved in the operation MUST have a replica of both the parent and the child partition, so the Master replica adds a replica of the child or the parent, whichever is missing, to those servers. When this is done, the process can move ahead to Join State 1.
  • Join state 1 - The boundary between the partitions is removed. Each object's partition ID changes to that of the parent partition.
  • Join state 2 - Changes are propagated, and the parent's Master stays the Master, while the child's Master becomes a Read-write replica. The replica is turned to ON.

Partition Merge Process

  1. The Master of the child partition informs the Master of the parent about the operation.
  2. The two Masters exchange replica rings.
  3. New replicas are added to the servers involved in the operation, if they don't have a replica of the child or the parent. Also, if any subordinates must be created, this is where that happens.
  4. Once all replicas are in place, the partition boundaries are removed during Join State 1. All partition IDs for objects are changed to that of the parent partition.
  5. The parent changes to Join state 2, and the child partition record is removed.
  6. When all IDs are changed, and all servers have reported back the success of the child partition deletion, the Master changes the replica state to ON. The child partition's Master replica becomes a Read-Write replica of the parent partition, unless the Master replicas of both partitions were on the same server.
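
Step 3 is where most of the extra work happens, so it is worth estimating in advance. The Python sketch below lists which real replicas would have to be added before the join, under the assumption that every server holding a real replica of either partition must end up with real replicas of both. The function name, the ring layout, and the server and partition names are hypothetical.

```python
def replicas_to_add(parent_name, parent_ring, child_name, child_ring):
    """List (server, partition) pairs that need a real replica before the join.

    Each ring maps a server name to its replica type (hypothetical layout).
    """
    def real(ring):
        # A subordinate does not count as a real replica.
        return {s for s, t in ring.items() if t != "Subordinate"}

    parent_real, child_real = real(parent_ring), real(child_ring)
    needed = []
    for server in sorted(parent_real | child_real):
        if server not in parent_real:
            needed.append((server, parent_name))
        if server not in child_real:
            needed.append((server, child_name))
    return needed


parent_ring = {"SRV1": "Master", "SRV2": "Subordinate"}
child_ring = {"SRV1": "Master", "SRV2": "Read-Write"}
print(replicas_to_add("PARENT", parent_ring, "CHILD", child_ring))
```

An empty result means both rings already match, which is the situation the suggestions in this section aim for.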

Suggestions

  • Always create a backup for the members of the replica rings, using DSREPAIR -RC.
  • Minimize the size of the replica ring as much as you can, so there will be less server involvement in the partition operation.
  • Make sure communication is perfect between the servers.
  • Avoid subordinates in both replica rings by adding a Read-Write replica to those servers before you start the merge.
  • Before starting the merge, have the same replica ring for the parent and the child.

Troubleshooting

Use DSTrace and enable the partition flag to see if the system is working.

  • set dstrace = on (turns on the dstrace by enabling the Directory Services screen)
  • set dstrace = nodebug (turns off all unnecessary messages)
  • set dstrace = +part (shows messages related to partition operations)

A partition operation can be cancelled while the partitions are in Join State 0, but there is a risk that the parent will not change back to the ON state. In this case, a dial-in is necessary from Novell Technical Services.

Error Codes

  • -637 PREVIOUS_MOVE_IN_PROGRESS - When you try to add a replica, this error indicates there are obituaries in the system. To solve this issue, follow TID 10064117.
  • -654 PARTITION_BUSY - Another partition operation hasn't finished yet, or you have problems with your Transitive Vectors. A Repair Local Database operation might fix the error; otherwise, check the Knowledgebase for other possible reasons.
  • -659 TIME_NOT_SYNCHRONIZED - The time is not synchronized between the servers, or you have future timestamps in the partition. Look for "Synthetic time is being issued" messages on the console of the server holding the Master replica for the partition.
  • -698 REPLICA_IN_SKULK, -6015 ERR_SERVER_IN_SKULK - The server is synchronizing the partition with another server; therefore it cannot deal with the server giving the error. The error should be solved by the DS itself.
  • -761 ERR_SEEN_THIS_STATE - During synchronization, this error message is normal. This is how the server notifies the other servers it is up to date.

Case Study

Suppose you have two replicas of the partition CHILD.PLAY: the Master is on NW65SP4 and a Read-Write replica is on APPNOTE2. You need to merge this partition into PLAY, which has its Master replica on NW65SP4 and a Subordinate on APPNOTE2.

* Which partitions will be involved in the operations?
In this operation the PLAY partition will be involved as well as the CHILD.PLAY partition.

* Which servers are involved in the operations?
All servers in the replica rings of PLAY and CHILD.PLAY are involved.

* Will any subordinates or replicas be created for any partition?
Yes. In the first phase of the merge, a Read-Write replica of the PLAY partition is placed on the APPNOTE2 server. This server currently holds only a subordinate of PLAY, so the DS needs to add a Read-Write replica of PLAY before it can merge it with the Read-Write replica of CHILD.PLAY on APPNOTE2. We can let the DS do it, or we can avoid this step by adding a Read-Write replica of PLAY to the APPNOTE2 server ourselves before we start the merge. This is the preferred way, as it ensures that the operation will not get stuck at this point. (In this exercise, we let the DS do the change so you can see how the process works.)

1. We merge the partition in ConsoleOne. The parent partition changes to Join state 0:

Figure 5 - Partition merge - Join state 0

2. The DS realizes that a Read-Write replica needs to be added to this server, and the child partition is moved to Join State 1:

Figure 6 - Join state 1

3. Once the replica is added, the child moves to Join State 2, and the parent becomes Join State 0:

Figure 7 - Join State 0

4. When this information is synchronized, the child changes to the final state, Join State 2, and the parent becomes Join State 1:

Figure 8 - Child in Join state 2

5. Finally, the child partition disappears and only the parent remains, in Join State 2:

Figure 9 - Parent in Join state 2

6. As soon as the final synchronization is ready, the replica turns to ON:

Figure 10 - Replica in ON state

This operation was more complex. The more states and servers are involved, the more time the operation takes and the more risk it carries.

Partition Move

A partition move actually involves three partitions:
  • The partition you move (the moved partition)
  • The partition you move to (the destination partition)
  • The partition you move from (the source partition)

Note: You must be extremely careful when you do a partition move operation, as you can potentially damage three partitions.

Only partitions without a child partition can be moved; otherwise you will get a -686 NOT_LEAF_PARTITION error. You also need to consider the containment rules of the schema: you cannot move an Organizational Unit directly under [ROOT], and you cannot move an Organization under an Organizational Unit.
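
These two preconditions can be expressed as a simple pre-flight check. The Python sketch below encodes only the two containment examples named above, not the real schema, and the `check_move` helper and its parameters are hypothetical.

```python
NOT_LEAF_PARTITION = -686

# Only the two containment examples named in the text; not the full schema.
FORBIDDEN_PLACEMENTS = {
    ("Organizational Unit", "[ROOT]"),
    ("Organization", "Organizational Unit"),
}


def check_move(moved_class, destination_class, child_partitions):
    """Return a list of problems that would block the move (empty if none)."""
    problems = []
    if child_partitions:
        # Only leaf partitions (no child partitions) can be moved.
        problems.append(f"{NOT_LEAF_PARTITION} NOT_LEAF_PARTITION")
    if (moved_class, destination_class) in FORBIDDEN_PLACEMENTS:
        problems.append(
            f"containment: {moved_class} cannot be placed under {destination_class}"
        )
    return problems


print(check_move("Organizational Unit", "Organization", []))
print(check_move("Organizational Unit", "[ROOT]", ["CHILD"]))
```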

Subordinate replicas can be tricky in this operation:

  • If you have any subordinate replicas of the moved partition, they might not be necessary in the destination, so they have to be removed during the operation.
  • If you have any real replicas of the destination partition, and those servers don't have a replica of the moved partition, a subordinate needs to be created.
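
The two rules above can be sketched as set arithmetic over the replica rings. In this Python illustration the removal rule is an assumption: a subordinate of the moved partition is treated as unnecessary when its server holds no real replica of the destination. The function name, ring layout, and server names are hypothetical.

```python
def subordinate_changes(moved_ring, destination_ring):
    """Return (servers needing a new subordinate, servers losing theirs).

    Each ring maps a server name to its replica type (hypothetical layout).
    """
    moved_any = set(moved_ring)
    moved_subs = {s for s, t in moved_ring.items() if t == "Subordinate"}
    dest_real = {s for s, t in destination_ring.items() if t != "Subordinate"}
    # Real replica of the destination but nothing of the moved partition.
    to_create = sorted(dest_real - moved_any)
    # Assumed rule: a subordinate is not needed without the new parent.
    to_remove = sorted(moved_subs - dest_real)
    return to_create, to_remove


destination_ring = {"SRV1": "Master", "SRV2": "Read-Write"}
moved_ring = {"SRV1": "Master", "SRV3": "Subordinate"}
print(subordinate_changes(moved_ring, destination_ring))
```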

Replica States

The moved partition and the destination partition have the following states:

  • Move state 0 - A change request is made from the client to the Master replica of the destination partition. The server checks for sufficient rights to perform the operation. If everything is correct, the server sets the status of the replica to Move state 0.
  • Move state 1 - The move takes place.
  • Move state 2 - Final synchronizations take place and the replica turns to ON.

The source partition has the following state during the whole operation:

  • Locked - No other partition operations can take place when the partition is in this state.

Partition Move Process

  1. The client instructs the Master replica of the destination partition to initiate the move.
  2. The server checks that the user performing the move has sufficient rights to finish the operation. It also checks the containment rules defined by the schema.
  3. The Master replica instructs the Master of the moved partition to start the move process. They both change their own partition to Move state 0, and the source partition is changed to Locked.
  4. All necessary subordinates are added or removed. The necessary backlinks are created, and the partition is relocated to the new place during Move state 1.
  5. When the partition has been relocated on all servers in the replica ring, the status of the partition moves to Move state 2, and the final synchronizations take place.
  6. All three partitions turn back to the ON state.

Suggestions

  • Always create a backup on the members of the replica rings, using DSREPAIR -RC.
  • Minimize the size of the replica ring as much as you can, so fewer servers will be involved in the partition operation.
  • Make sure communication is perfect between the servers.
  • Avoid subordinates in both replica rings by adding a Read-Write replica to those servers before you start the move.
  • Before starting the move, have the same replica ring for the source, the destination, and the moved partition.

Troubleshooting Steps

Use DSTrace and enable the partition flag to see if the system is working.

  • set dstrace = on (turns on the dstrace by enabling the Directory Services screen)
  • set dstrace = nodebug (turns off all unnecessary messages)
  • set dstrace = +part (shows messages related to partition operations)

A partition operation can be cancelled while the partitions are in Move State 0, but there is a risk that the source will not change back to the ON state. In this case, a dial-in from Novell Technical Services is necessary.

Error Codes

  • -637 PREVIOUS_MOVE_IN_PROGRESS - When you try to add a replica, this error indicates you have some obituaries in your system. To solve this issue, follow TID 10064117.
  • -654 PARTITION_BUSY - Another partition operation hasn't finished yet, or you have problems with your Transitive Vectors. A Repair Local Database operation might fix the error; otherwise, check the Knowledgebase for other possible reasons.
  • -659 TIME_NOT_SYNCHRONIZED - The time is not synchronized between the servers, or you have future timestamps in the partition. Look for "Synthetic time is being issued" messages on the console of the server holding the Master replica for the partition.
  • -698 REPLICA_IN_SKULK, -6015 ERR_SERVER_IN_SKULK - The server is synchronizing the partition with another server; therefore it cannot deal with the server giving the error. The error should be solved by the DS itself.
  • -761 ERR_SEEN_THIS_STATE - During synchronization, this error message is normal. This is how the server notifies the other servers it is up to date.

Case Study

Suppose you have three partitions:

  • PLAY - The Master replica for this partition is on NW65SP4, while a Read-Write replica is on APPNOTE2.
  • CHILD.PLAY - The Master replica for this partition is on NW65SP4, and there are no real replicas on APPNOTE2. Therefore, you have a Subordinate replica on APPNOTE2.
  • GRANDCHILD.CHILD.PLAY - The Master replica for this partition is on NW65SP4, and there are no other replicas.

Also suppose you need to move the GRANDCHILD partition directly under the PLAY partition.

* Which partitions will be involved in the operations?
In this operation, all three partitions will be involved: PLAY, CHILD, and GRANDCHILD.

* Which servers are involved in the operations?
All servers in the replica rings of PLAY, CHILD.PLAY, and GRANDCHILD.CHILD.PLAY are involved.

* Will any subordinates or replicas be created for any partition?
Yes. Currently there are no replicas on APPNOTE2 for GRANDCHILD.CHILD.PLAY, but you will be moving GRANDCHILD.CHILD.PLAY to the PLAY partition, which has a Read-Write replica on APPNOTE2. Therefore a subordinate needs to be created on APPNOTE2 for GRANDCHILD.CHILD.PLAY. Alternatively, you can add a replica of GRANDCHILD to APPNOTE2 before the move, which eliminates this step, so no subordinate is created. Later you can remove this replica.

1. The move is initiated from ConsoleOne. At this point the PLAY partition enters a Move state, while the source partition (CHILD.PLAY) is Locked. The moved partition shows Move State 1:

Figure 11 - Move State 1

2. On the other server, the parent enters Move State 0, and a Subordinate is created for the moved partition. The subordinate also changes to Move State 0:

Figure 12 - Subordinate in Move State 0

3. Once the synchronization has completed, the partition changes to Move State 1:

Figure 13 - Partition in Move State 1

4. At the end of the Move state, all replicas are in the ON state:

Figure 14 - Replicas in ON state

