Interpreting Predicate Statistics
Novell Cool Solutions: Feature
By Jim Henderson
Posted: 13 Jun 2005
Part of the process of improving performance of eDirectory searches is to evaluate the index requirements and set up indexes appropriate to your environment. In order to come up with sane index requirements, however, you need to understand what sorts of queries are being made against the directory server. In this tip, we will look at how to use and interpret Predicate Statistics for this purpose.
What are Predicates?
The term predicate refers to a search filter. In logic, it is the part of a proposition that is affirmed or denied about the subject. For example, consider the command:
ldapsearch -h megadodo -p 389 -b "" -s sub -x "(&(objectClass=User)(cn=Jim))"
The predicate is the search filter (&(objectClass=User)(cn=Jim)). The test being performed is whether or not the objectClass is "User" and the cn is "Jim"; this is something that can be evaluated to be either true or false for any object that falls within the search scope.
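To make the true/false evaluation concrete, here is a minimal sketch of how that filter could be tested against a few objects. It is illustrative only: the entries are plain Python dictionaries rather than real eDirectory objects, the sample data is made up, and the comparison is assumed to be case-insensitive.

```python
# Illustrative only: entries are modeled as plain dicts, not real eDirectory objects.
def matches(entry):
    """Return True when the entry satisfies (&(objectClass=User)(cn=Jim))."""
    # Assumption: both attributes are compared without regard to case.
    object_classes = [v.lower() for v in entry.get("objectClass", [])]
    common_names = [v.lower() for v in entry.get("cn", [])]
    # The & operator requires every sub-filter to be true.
    return "user" in object_classes and "jim" in common_names

# Hypothetical sample entries within the search scope.
entries = [
    {"objectClass": ["User"], "cn": ["Jim"]},
    {"objectClass": ["User"], "cn": ["Arthur"]},
    {"objectClass": ["Group"], "cn": ["Jim"]},
]

results = [matches(e) for e in entries]
print(results)  # [True, False, False]
```

Only the first entry satisfies both halves of the filter, so only that entry would be returned by the search.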
Configuring Predicate Statistics
The configuration of Predicate Statistics requires the use of the ConsoleOne utility - there is not yet a plugin for iManager to manage or view this information. Figure 1 shows the page in ConsoleOne where Predicate Statistics is managed.
Figure 1. Predicate Data tab in ConsoleOne
Once on this page, press the Properties button, and the dialog shown in figure 2 is displayed.
Figure 2. Properties for Predicate Data
In this dialog, there are three pieces of information:
- Predicate Stats object name
- Associated Server
- Update Interval
The Update Interval is used to control how frequently the predicate data table is updated; it also controls how often the configuration is updated. The default value is 300 seconds, and it can be set as low as 30 seconds.
Pressing the Advanced button gives you the dialog shown in figure 3.
Figure 3. Advanced Options
This is where the capture of predicate data is enabled. You can also enable options to display the value text, which allows you to see not just the attributes that were searched, but also the values that were searched for - this is useful for determining if an index should be a value index or a substring index.
The final option here, entitled Write to Disk, tells the predicate stats process whether the predicates should be written to the DS database or just held in memory.
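As a rough illustration of why the value text matters when choosing an index type, the sketch below classifies a searched-for value by whether it contains a wildcard. The function name and the rule itself are assumptions made for illustration, not part of any Novell tool. The presence case, a bare asterisk, goes slightly beyond the two index types mentioned above.

```python
def suggested_index_type(value):
    """Rough heuristic: map a searched-for value to a likely index type."""
    if value == "*":
        # A bare wildcard only tests that the attribute exists.
        return "presence"
    if "*" in value:
        # Wildcards mixed with text mean a partial match was requested.
        return "substring"
    # No wildcard: the search compared against the whole value.
    return "value"

print(suggested_index_type("Jim"))   # value
print(suggested_index_type("Ji*"))   # substring
print(suggested_index_type("*"))     # presence
```

If most of the recorded values for an attribute look like the second case, a substring index is the more likely candidate. If they look like the first, a value index is probably enough.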
Using ConsoleOne, viewing the Predicate Data tab in the server object will show the information that has been recorded for searches performed against the server, as long as the predicate statistics gathering has already been enabled.
Note: Gathering of Predicate Statistics should only be done for a relatively short period of time - 24-48 hours during normal usage of the directory. Remember that you want to gather statistics on real usage of the directory, not of your idle periods.
Remember that Predicate Statistics does impact server performance - there is overhead to running this process, and on a busy server, that overhead could be significant.
How do I Interpret the Data?
The predicate statistics measurement is simply a counter of the number of times a search term is used in a search against a specific server. This counter needs to be evaluated by the administrator to determine what attributes might benefit from having an index defined for them.
Figure 4. Predicate Data
In figure 4, we see the results of a few minutes of predicate data gathering. You may notice, however, that while we only ran a single query (the search for objectClass=User and CN=Jim), there are a number of other bits of data that show up in this table.
There are several other predicates here that relate to various background processes - for example, the predicate (NDS_PARTITION_ID == 4 && Obituary) is a search looking for the presence of obituaries in the first user-defined partition on that server.
In the predicate that relates to the search we performed:
(((NDS_FLAGS & 1) == 1) && (((Object Class == 437) && (CN == Jim))))
There are a number of additional pieces of information. Let's step through it piece by piece in order to understand what this entire predicate is.
First, we have the ((NDS_FLAGS & 1) == 1) term. This term takes the flags on the object, performs a bitwise AND with the value 1, and checks whether the result is 1. What it is looking for is the flag that indicates the object is present in the database - this prevents the predicate from finding objects that have been deleted or are in the process of being deleted.
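The bit test can be sketched in a few lines. The constant name PRESENT_FLAG and the sample flag values are hypothetical. Only the expression (flags & 1) == 1 comes from the predicate itself.

```python
PRESENT_FLAG = 1  # hypothetical name for the bit the predicate tests

def is_present(nds_flags):
    """Mirror the predicate term ((NDS_FLAGS & 1) == 1)."""
    # The AND masks out every bit except the lowest one.
    return (nds_flags & PRESENT_FLAG) == 1

print(is_present(0b0101))  # True: lowest bit is set
print(is_present(0b0100))  # False: lowest bit is clear
```

Any other flag bits on the object are ignored by this test. Only the lowest bit decides the outcome.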
Next, we have the (Object Class == 437) test. Now, we know that we searched for "User", but why is the predicate showing a value of 437?
In eDirectory, objects reference each other by an entry ID (or EID); when an object is defined as a particular class, this is tracked as an object reference, rather than as just a string value. The value '437' here is the base 10 value for the EID being referenced. To find out what this is on a NetWare or Windows server, you can use DSBROWSE; just convert the value to a base 16 number (in this case, 0x1B5) and search for that value in the object class portion of the schema. For the Unix and Linux platforms, there's no easy way to make this determination.
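If you do not have a calculator handy, the base 10 to base 16 conversion can be done with a couple of lines of Python. This is just a convenience sketch, and the variable names are arbitrary.

```python
eid = 437  # decimal value shown in the predicate

# Format as uppercase hexadecimal, the form you would search for in DSBROWSE.
hex_eid = "0x" + format(eid, "X")
print(hex_eid)  # 0x1B5

# Converting back confirms the two values are the same number.
round_trip = int("1B5", 16)
print(round_trip)  # 437
```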
The final test here is the (CN == Jim) term - in this case, just a simple string comparison.
From the predicate data here, you can see that this particular search happened 1,227 times. In order to determine whether or not this is a significant number of times, you need to know the time period the sample is for. To find this out, use iMonitor to look at the predicate stats object (in this case, magrathea-PS) and look for the ndsPredicateState attribute. By looking at the timestamp for the value, you can tell how long the predicate statistics have been running.
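Once you have the start time, turning the raw counter into a rate is simple arithmetic. The sketch below assumes a five-minute sample. Both timestamps are hypothetical and stand in for the value you read from ndsPredicateState and the time you looked at the table.

```python
from datetime import datetime

def searches_per_minute(count, started, observed):
    """Divide the predicate counter by the elapsed sample time in minutes."""
    elapsed_minutes = (observed - started).total_seconds() / 60
    return count / elapsed_minutes

# Hypothetical timestamps; substitute the real values from your server.
started = datetime(2005, 6, 13, 9, 0, 0)
observed = datetime(2005, 6, 13, 9, 5, 0)

rate = searches_per_minute(1227, started, observed)
print(rate)  # 245.4
```

A counter of 1,227 means something very different over five minutes than it does over 48 hours, which is why the sample period matters.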
What other factors do I need to be concerned with?
The most important thing to be concerned with is performance - two questions to ask:
- How quickly do the results get back to the user performing the search?
- Is the response time fast enough?
The first of these questions is something that is measurable - use the search term in the predicate data and time the search yourself. You will probably want to perform the search multiple times and take an average of the results.
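One way to take that average is a small timing harness like the sketch below. The function names are hypothetical, and placeholder_search simply sleeps so that the example runs without a directory server. Replace it with a function that performs your real search.

```python
import time

def average_seconds(run_search, repeats=5):
    """Run the search `repeats` times and return the mean wall-clock duration."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_search()
        durations.append(time.perf_counter() - start)
    return sum(durations) / len(durations)

def placeholder_search():
    # Stand-in for a real query so the sketch runs anywhere.
    time.sleep(0.01)

avg = average_seconds(placeholder_search, repeats=3)
print(f"average: {avg:.3f} s")
```

To time the ldapsearch command shown earlier, run_search could be a function that launches it with subprocess.run. Keep in mind that the first run may be slower than later ones if the server caches results, so look at the individual timings as well as the average.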
The second question, however, is extremely subjective. While it is easy to say that a search that takes 20 minutes is too slow, what about a search that takes a minute? 30 seconds? At what threshold will your users wonder whether the search was submitted and submit it again?
There are two approaches to take to determine if this is acceptable:
- Ask a sampling of the users of the application if it performs OK for them, or
- Try running the application yourself and then judge whether, if you were a user of the application, the performance would be acceptable for normal usage.
The first of these is always going to be the more accurate means of measuring the application's performance as compared to expectations, because your own perceptions of what would be considered acceptable may not equate to what the actual users who use the application expect.