Subtree Searches in eDirectory 8.8
Novell Cool Solutions: AppNote
By Sarangthem Chanu, Vithalprasad Gaitonde
Digg This -
Posted: 22 Feb 2006
When it comes to enterprise directories, the one thing that is the focus of attention is the search performance of the directory, given that searching is the most ubiquitous of all directory operations. The performance of the search operation, in turn, heavily influences performance of directory-enabled applications and Identity Management Systems.
In eDirectory 8.8, subtree search performance has been improved by storing additional hierarchical information on each entry. As the following sections explain, this hierarchical information is later used for quick subtree scope evaluation, thereby leading to faster subtree searches. The examples in the subsequent sections use LDAP terminology and representation.
Subtree Search in Earlier Versions of eDirectory
There are three primary elements in a directory client's search criteria:
- Base of the search
- Scope of the search
- Search filter
The scope parameter can be base, one level, or sub (indicating a subtree). The combination of base of the search and the scope parameters serve to constrain the search to a certain area of the DIT (Directory Information Tree). See the eDirectory Administration Guide (http://www.novell.com/documentation/edir88/index.html) for a more exhaustive treatment of this.
A frequent complaint by some eDirectory customers concerns subtree search performance of eDirectory. For a large tree with a deeply nested tree structure, eDirectory subtree search performance remains flat, irrespective of where the base of the search is located. So, a client searching a subtree with a small number of entries below it is comparable in performance to a client searching a subtree with a large number of entries. The client is not ensured of a faster search because the subtree is smaller. However, a one-level search does display performance characteristics that are a function of the number of entries below the base of the search.
"Raison d'etre" of the eDirectory Subtree Search ProblemThe above problem manifests itself because of how the subtree search over LDAP or NDAP is translated into a query to the database layer of eDirectory. Let's examine this with an example: an LDAP client sends a search to eDirectory with parameters as follows:
- base of ou=engineering,ou=directory,o=novell
- search filter of cn=*
- scope of subtree
This search will translate into a query to the database with presence assertion on the CN attribute. As you can see, only the filter part of the search criteria is translated to database query. The other part of the client's search criteria - the base of the search and the scope - is left out of the query passed to the database layer. Thus in our example, each entry in the database with a CN attribute is returned by the database layer in response to the query.
These entries are further processed by the upper directory service layer, called the DS Agent, to see if they meet the base and scope criteria. To do this, the DS Agent traverses up the hierarchy from the entry until
a) the base of the search is reached (in which case the entry passes the criteria and is added to the search result set to be sent back to the client), or
b) until the root entry of DIT is reached (in which case the entry fails the criteria and is left out of the search result set).
There are two problems with this approach:
1. The determination of the base and scope criteria outside of the database layer implies that this computation is devoid of any use of database index, and thus is slower.
2. For a query such as CN=* (or for that matter any other that matches a large number of objects), the database query is likely to throw a lot more entries than what may end up in the final search result set, depending on the base and scope criteria.
In contrast, a single-level search scope does not suffer from the same problem as a subtree scope. So, if our example uses a one-level scope search, it will be translated into a database query with a presence assertion on CN, "anded" with an value assertion on the Parent ID attribute ? the value being the entry ID of the base of the search. The Parent ID is an indexed operational attribute on each entry. It stores the entry ID (a 32-bit integer value which uniquely identifies each entry in the database) of the parent of an entry. Since the entire search criteria is passed to the database, which in turn evaluates entries using the Parent ID index, the computation of the search result set is done entirely in the database layer. This means it is performed much faster than the subtree search.
Remedy to the Subtree Search Performance in eDirectory 8.8
The subtree search problem in eDirectory has been remedied in the 8.8 release, using techniques similar to the single-level search. In addition to the Parent ID being stored on each entry in the database, eDirectory now stores additional hierarchical information in the form of a new system attribute ? Ancestor ID. This attribute stores the entry IDs of all the ancestors of an entry - including the immediate parent and itself - up to the tree root. Just like Parent ID, the Ancestor ID attribute is also indexed. Being a system attribute, it is not synchronized to other replicas and is not accessible to LDAP or NDAP clients. The iMonitor screen shot below shows the Ancestor ID attribute for a user entry.
Figure 1 - Ancestor ID attribute
With the additional hierarchical information available at the database layer, the agent now translates the scope and base criteria into the query to the database, in addition to the filter specification. This is done via a value assertion on the Ancestor ID attribute (value being the entry ID of the base) being "anded" with the rest of the filter criteria of the client. The assertion on Ancestor ID is evaluated by the database engine using the Ancestor ID index to build the search result set. Thus, no further trimming of the search result set need be done by the agent. The figure below illustrates the old and the altered behavior of subtree search inside the eDirectory server.
Figure 2 - Subtree searches in eDirectory - old vs. altered
No configuration changes are required from the administrator to enable the altered subtree search behavior.
Operations Augmented in eDirectory 8.8
Several directory operations have been made AncestorID-aware in eDirectory 8.8. The most obvious one is the search operation with subtree search scope, which now uses an Ancestor ID assertion as explained in the previous section. Each add entry operation now populates Ancestor IDs on the entry as an system attribute. This population is done by reading the Ancestor ID attribute of it's parent - if it's up to date (as discussed later in this article) - adding its own entry ID and populating the resulting set on the entry. If the Ancestor ID of the parent of an entry being added is not up to date, the computation of the new entry's Ancestor IDs is done by walking up the hierarchy to the tree root. The Move Entry operation has also been changed to update the new Ancestor IDs.
The partition operations that require populating/changing the Ancestor IDs are Add Replica and Move Partition. For the Add Replica operation, the Ancestor IDs are populated inline, as the entries are added to the DIB. The Move Partition operation also requires creating new Ancestor IDs for all the objects in the subtree of the partition. This is not done during the move partition operation; instead, it is done through a background process after the completion of the move partition operation. This is because creating the Ancestor IDs for all of the child objects could take a long time and would delay the move operation's completion. Delaying a partition operation would have further repercussions; for example, additional partition operations would be prevented by the server until the previous operation was completed.
Until all the objects in the subtree are updated with the new Ancestor ID information, subtree search operations will behave the way they did prior to 8.8. In other words, instead of using the Ancestor ID attribute, those searches will evaluate the base and scope criteria in the agent instead of in the database. A multi-valued attribute called UpdateInProgress in the pseudo-server maintains the list of partitions for which the Ancestor ids are being repopulated due to a partition move. The iMonitor snapshot below shows the UpdateInProgress attribute having the entry ID of the partition being moved. To view this information, go to Agent Configuration and click on Pseudo Server.
Figure 3 - UpdateInProgress attribute
Upgrade from Prior Versions of eDirectory
New eDirectory 8.8 servers will have all entries with the Ancestor ID attribute on them. However, the databases of the existing servers which are being upgraded from a prior version of eDirectory need to be scanned and each entry populated with Ancestor IDs. This is accomplished by a background process that scans each entry in the database and populates the Ancestor ID attribute on it. Because this is done in the background, other directory operations can continue to run normally while the database upgrade is in process.
Once all entries in the database have been populated with an Ancestor ID, the NDS Object Upgrade Version (used to track object upgrades in different release of eDirectory) is updated to 6. New eDirectory 8.8 servers would show the NDS Object Upgrade version of 6 right after they are installed. If the NDS Object Upgrade Version is earler than 6, the Ancestor ID attribute is deemed to be not up to date.
After all the objects in the database have an Ancestor ID attribute populated on them, the subtree search is performed using the new method. The figure below shows the NDS Object Upgrade version in the Agent Information page.
Figure 4 - NDS Object Upgrade version
The local DIB repair option in DSRepair has also been made Ancestor-ID-aware. It validates and fixes the Ancestor IDs of every entry in the DIB, and it reports the number of the objects whose Ancestor IDs were found to be invalid. Single object repair also fixes the Ancestor ID attribute.
Performance of the Augmented OperationsThe subtree search operation is the intended beneficiary of this change in eDirectory 8.8. Subtree operations are vastly faster in eDirectory 8.8, if the filter criteria of the search matches many objects in the database (e.g. CN=*), most of which ultimately do not make it to the final search result set, on account of the base and scope criteria. The extent of performance gain depends on the number of objects in the database that would match the filter criteria. The other types of subtree search operations that benefit from this change are those whose filter assertions are applied to unindexed attributes.
Here is an illustration ? an LDAP client sends a subtree search request with the following parameters:
base of ou=engineering,ou=directory,o=novell filter of telephoneNumber=8018611758 scope of subtree
In prior versions of eDirectory, if telephoneNumber is an unindexed attribute, this would cause the database to look at each entry in the database that has a telehoneNumber. The value(s) on the entry would be compared with 8018611758 to find a match. The DS Agent would then determine if the entries found to match the filter meet the base and scope criteria of the search.
The altered subtree search operation in eDirectory 8.8 works differently. In addition to the telephoneNumber assertion, another assertion of Ancestor ID= entry ID of ou=engineering,ou=directory,o=novell would be passed to the database. Since Ancestor ID is an indexed attribute, the database engine will be able to quickly look up all entries that have this specific value in their Ancestor ID attribute. Only entries below ou=engineering,ou=directory,o=novell would be candidates for further comparison of the telephoneNumber filter. As you can see, the computational expense in the later case is much less, thanks to the index on the Ancestor ID attribute.
A performance test conducted in a 200,000-object tree with 10 levels of containers (20,000 objects at each level) showed a 69% performance gain, when the lowest level container was used as the base of the search and the search filter was cn=*. The other two operations affected by this change ? add and move ? showed very marginal performance degradation (~2%) in the tests conducted on eDirectory 8.8. This can be attributed to the fact that amount of time it takes to store the additional data ? the Ancestor IDs ? is very small in comparison to the rest of the operation.
With the additional hierarchical information in the form of Ancestor IDs being stored on the entry in eDirectory 8.8, subtree searches are expected to be supercharged. It is hoped that the long-standing problem with subtree searches is now a thing of the past, and customers deploying eDirectory 8.8 will get more bang for their subtree searches.
Novell Cool Solutions (corporate web communities) are produced by WebWise Solutions. www.webwiseone.com