Data Cleansing with DirXML
Novell Cool Solutions: Feature
By Peter J Strifas
Posted: 13 Nov 2003
Novell Support Engineer Peter J Strifas shares a technique or two that should help you get -- and keep -- your data warehouse in order.
I've lived through a "data cleansing" process once, during a DirXML deployment at a large university. The problem was that user data in their production tree was inconsistent, missing, or both. We looked at this as a "Gordian Knot" and decided a process needed to be identified (and then refined). We also realized there is NO SILVER BULLET for "clean data" :)
The Cleansing Process
- Identify the desired data.
- Find out what each user HAS.
- Get good data and import it.
Identifying the desired data sounds easy, but in retrospect, it was one of the most difficult items on our checklist. You almost need a "committee" within the customer's organization to agree on it. Once you can define the "standard" user data set, then you can attack the problem.
To find out what each user has, there are several tools you can use. We used Softerra's LDAPBrowser (free), which allowed us to set up directory searches on users with a list of the standard attributes. The search results were saved as XLS, which allowed us to work with them in Microsoft Access.
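If you'd rather script this audit than do it by hand, here is a minimal Python sketch of the same idea. It is not what we ran on site. It assumes the search results were exported to CSV with one column per attribute, and the attribute list and sample rows are made up for illustration.

```python
import csv
import io

# Hypothetical "standard" attribute set; the real list comes from the customer.
STANDARD_ATTRS = ["cn", "givenName", "sn", "mail", "telephoneNumber"]


def find_missing(rows, required=STANDARD_ATTRS):
    """Map each user's cn to the required attributes that are blank or absent."""
    report = {}
    for row in rows:
        missing = [a for a in required if not (row.get(a) or "").strip()]
        if missing:
            report[row.get("cn") or "<no cn>"] = missing
    return report


# Made-up sample standing in for an exported directory search.
SAMPLE = """cn,givenName,sn,mail,telephoneNumber
jdoe,John,Doe,jdoe@example.edu,555-0100
asmith,Ann,Smith,,
bjones,,Jones,bjones@example.edu,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
audit = find_missing(rows)
for user, missing in audit.items():
    print(user, "is missing:", ", ".join(missing))
```

Users with a complete record are left out of the report, so what remains is the to-do list for the next step.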
Getting good data means identifying all the sources for the user attributes that are part of the "standard" (HR, Telecom, eDirectory, the e-mail system, etc.) and then using Access to merge the data on defined user sets (or you could try doing the entire tree if it's small enough to handle). The biggest problem is that each system has its own "unique" identifier for each user, so there's much work needed at this stage. Once we had created a set of users with "clean" data, we'd import them into NDS using JRB Utilities (an affordable set of tools for NDS).
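To make the identifier problem concrete, here is a rough Python sketch of merging two sources through a hand-built cross-reference table. We did this step in Access, so treat this as an illustration only. The record layouts, the IDs, and the rule that HR wins on conflicts are all assumptions.

```python
# Hypothetical records from two sources, each keyed by its own identifier.
hr_records = {
    "E1001": {"givenName": "John", "sn": "Doe", "title": "Professor"},
    "E1002": {"givenName": "Ann", "sn": "Smith", "title": "Registrar"},
    "E1003": {"givenName": "Bob", "sn": "Jones", "title": "Analyst"},
}
directory_records = {
    "jdoe": {"mail": "jdoe@example.edu", "sn": "Doe"},
    "asmith": {"mail": "", "sn": "Smyth"},
}
# Cross-reference built by hand: HR employee ID -> directory cn.
crosswalk = {"E1001": "jdoe", "E1002": "asmith"}


def merge_sources(hr, directory, xwalk):
    """Merge HR data onto directory entries; report unmatched IDs and conflicts."""
    merged, unmatched, conflicts = {}, [], []
    for emp_id, hr_row in hr.items():
        cn = xwalk.get(emp_id)
        if cn is None or cn not in directory:
            unmatched.append(emp_id)  # needs manual follow-up
            continue
        record = dict(directory[cn])
        for attr, value in hr_row.items():
            old = record.get(attr)
            if old and old != value:
                conflicts.append((cn, attr, old, value))
            record[attr] = value  # assumption: HR is the authoritative source
        merged[cn] = record
    return merged, unmatched, conflicts


merged, unmatched, conflicts = merge_sources(hr_records, directory_records, crosswalk)
print("unmatched HR IDs:", unmatched)
print("conflicts:", conflicts)
```

The unmatched list and the conflict list are where the "much work" shows up: someone has to decide, per user, which source is right before anything gets imported.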