
AppNote: Using DirXML in a Data Junction djCosmos ETL Environment
Nsure Identity Manager Cool Solutions (DirXML) Article

Posted: 18 Dec 2003

Mark J. Worwetz
Senior Software Engineer, DirXML
Novell Inc.
mworwetz@novell.com

This AppNote provides information about the use of ETL (Extract, Transformation, Load) software in a DirXML solution. ETL software solutions are widely used by corporations of all sizes to migrate and integrate information between disparate data stores and applications. Specific examples, using ETL vendor Data Junction's djCosmos product, illustrate how these solutions can be integrated into a traditional DirXML identity management scenario. The use of the djCosmos product in the examples is for illustration purposes only and is not intended to indicate exclusive support for that product.

Contents:

Topics: DirXML, DirXML Delimited Text Driver, ETL, Data Junction djCosmos
Products: DirXML 1.1a
Audience: Consultants, SEs, IT administrators
Level: Intermediate
Prerequisite Skills: Familiarity with DirXML, DirXML Delimited Text Driver, Data Junction djCosmos
Operating System: n/a
Tools: n/a
Sample Code: yes

Introduction

In today's enterprise software market there are literally thousands of applications, databases, and directories (data stores) available to solve and organize an almost infinite variety of business challenges. When considering the cost issues involved in maintaining these data stores, especially in the current economic environment, it becomes clear why so many companies are looking for a magical tool that will help them painlessly integrate their enterprise. It also becomes clear why more and more software vendors and consulting organizations are moving into the data integration and identity management markets.

Novell's identity management product, DirXML, is a market leader in this space for several reasons. Some of these include:

  • DirXML utilizes eDirectory as a data store. eDirectory is a perfect platform for data integrity, distribution, speed, and cross-platform availability.
  • DirXML can be configured to suit any customer environment and business rules.
  • DirXML removes the requirement to understand the native integration interfaces to connected systems.
  • DirXML can connect multiple systems into the same business process. Acting as a data hub and router, DirXML removes the need for point-to-point integrations.
  • DirXML can maintain persistent application object associations after integration.
  • DirXML does not require a common, unique ID field across all applications.

Although DirXML is an excellent product, it cannot fulfill the requirements of all data integration scenarios. The reason is that the suite of DirXML application connectors (drivers) covers only some of the major NOS and application vendors. Connectors for LDAP, JDBC, JMS, and Delimited Text reach a greater number of applications, and a team of driver developers is constantly working on new drivers, but this still leaves thousands of older and/or proprietary data stores out of reach.

Leveraging ETL Software

There has always been a need to move data from one format or data store to another. In environments where numerous data conversions are required on a regular basis, it quickly became clear that automation and user-friendly tools were needed. In response, a host of ETL vendors entered the market with a wide variety of products. Initially these products only provided tools for mapping data formats and transforming data in one-time batch conversions, but they have since evolved into tools and processes that can be embedded in applications, scheduled, and used to perform rules-based modifications. Over the years, the catalogue of data stores that can be integrated with ETL software has grown to include hundreds of products. It is no wonder that a large majority of the Fortune 500 and Global 1000 companies currently utilize an ETL solution from one vendor or another.

What all of this means is that in a great number of the software environments of our customers (and potential customers!), ETL integration solutions are already in place. It also means that many ETL vendors have out-of-the-box integration points and data-format definitions for a number of data stores that are not currently accessible via DirXML. This should not be a cause for alarm! No ETL vendor can provide the identity management functionality of DirXML. The existence of ETL solutions is actually good news, because DirXML can be configured to utilize these solutions in an integrated identity management scenario. By leveraging the investment in existing ETL software, tools, and knowledge within an organization, the ETL solutions can become "virtual" driver shims that provide integration with DirXML very quickly and flexibly.

It is important to keep in mind that when an ETL solution is used in this manner, you are creating an environment with multiple business-logic (policy) engines. Distributing policy in this way is not ideal. Therefore the following guidelines are recommended:

  • If a DirXML driver shim can perform the integration with the target application, it should always be preferred. Using a DirXML driver alone greatly reduces the points of failure, complexity, and management overhead, and increases performance and reliability.
  • If a DirXML driver shim cannot integrate with the target application (or it is politically impossible to do so) and ETL is to be used, try to focus as much business logic as possible in one location, preferably in DirXML. The DirXML guideline for designing driver shims is to use the shim only for translating data from a generic format to an app-specific format, and then making appropriate application calls to read/write data. Try to use the same guidelines in your ETL "virtual" driver shim.
  • If an ETL solution is already in place, select a generic DirXML driver shim (e.g., Delimited Text, JMS) that meets your customer's integration requirements and configure it as needed. In this situation, the most acceptable integration should require little or no modification of the ETL solution.

Data Junction djCosmos Overview

Data Junction's djCosmos 8.0 ETL product is the focus of this document. (For more information about Data Junction, visit www.datajunction.com.) The primary tool for doing development with Data Junction is called the Map Designer. The Map Designer is a complete IDE that allows a developer to design, test, and implement data integration processes known as transformations.

A transformation consists of three components:

  • Source Connection - This is a definition of how and where an application or data can be reached, data record schema definition (if any), and input file names and/or location.
  • Target Connection - This is a definition of how and where an application or data can be written, data record schema definition (if any), and output file names and/or location.
  • Map - This is a definition of field mappings, data-manipulation procedures, rules-processing, etc. The map is where the core of the transformation functionality is implemented.

A simple transformation describes a unidirectional transformation process. Therefore, in order to implement a bi-directional interface, two transformations would need to be utilized (A to B, B to A).

As an added bonus, the product is shipped with example transformations using various source and target file formats that are fairly simple to understand, execute, and modify. The examples are very useful for familiarizing yourself with the tools, although it is clear that a bit of training (available through Data Junction) would be extremely useful for getting off to a faster start.

Integration Scenarios

The following examples are intended as a guide to creating integration scenarios using the DirXML Delimited Text Driver and djCosmos. They are not intended as a djCosmos programming tutorial. Therefore the integration solutions illustrated in the following examples focus on the interaction between the two products and use extremely simple configurations for both. It is also very important to keep in mind that there are countless ways to solve business problems using these products. This document is intended to introduce a few techniques in as simple a manner as possible.

For the integration examples, the DirXML Delimited Text driver will be used to import and export comma-separated files that will utilize the format and sample data of the default driver configuration. The examples will utilize the default import and export locations for those files, c:\csvsample\input and c:\csvsample\output. The sample data file, sample.csv, consists of 100 data records containing information about U.S. senators. The fields are position dependent and are ordered as follows:

LastName, FirstName, Title, Email, WorkPhone, Fax, WirelessPhone, Description

Depending on the direction of the transformation, this sample data format will either be consumed or created by a djCosmos transformation. Again for simplicity, the sample transformation will do a simple conversion from one flat file format to another. On the Subscriber channel, a transformation will extract and re-order the fields of each record and create a new file, sample.asc, with new records of the following format:

"First Name","Last Name","Email Address","Description","Fax Number","Work Phone","Function","Mobile"

The Publisher Channel transformation will perform the reverse of this operation.
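
The field re-ordering itself can be sketched in plain Java, independent of djCosmos. This is only an illustration of the mapping between the two record layouts, not the map shown in the figure: the class and method names are hypothetical, the pairing of Title with "Function" and WirelessPhone with "Mobile" is an assumption based on the field names, and fields are assumed to contain no embedded commas or quotes.

```java
import java.util.StringJoiner;

final class FieldOrderSketch {
    // Source positions (sample.csv):
    //   0 LastName, 1 FirstName, 2 Title, 3 Email,
    //   4 WorkPhone, 5 Fax, 6 WirelessPhone, 7 Description
    // Entry i is the source position that feeds target field i (sample.asc):
    //   First Name, Last Name, Email Address, Description,
    //   Fax Number, Work Phone, Function, Mobile
    // Assumption: Title -> Function and WirelessPhone -> Mobile.
    private static final int[] SUBSCRIBER_ORDER = {1, 0, 3, 7, 5, 4, 2, 6};

    // Subscriber direction: sample.csv record -> quoted sample.asc record.
    static String toAscRecord(String csvRecord) {
        String[] source = csvRecord.split(",", -1);
        requireFieldCount(source);
        StringJoiner target = new StringJoiner(",");
        for (int position : SUBSCRIBER_ORDER) {
            target.add("\"" + source[position].trim() + "\"");
        }
        return target.toString();
    }

    // Publisher direction: the reverse mapping, applied by inverting the table.
    static String toCsvRecord(String ascRecord) {
        String[] target = ascRecord.split(",", -1);
        requireFieldCount(target);
        String[] source = new String[SUBSCRIBER_ORDER.length];
        for (int i = 0; i < SUBSCRIBER_ORDER.length; i++) {
            source[SUBSCRIBER_ORDER[i]] = stripQuotes(target[i].trim());
        }
        return String.join(",", source);
    }

    private static void requireFieldCount(String[] fields) {
        if (fields.length != SUBSCRIBER_ORDER.length) {
            throw new IllegalArgumentException(
                "Expected " + SUBSCRIBER_ORDER.length + " fields, got " + fields.length);
        }
    }

    private static String stripQuotes(String field) {
        if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
            return field.substring(1, field.length() - 1);
        }
        return field;
    }

    public static void main(String[] args) {
        String csv = "Smith,John,Senator,jsmith@example.com,555-0100,555-0101,555-0102,Sample record";
        String asc = toAscRecord(csv);
        System.out.println(asc);
        System.out.println(toCsvRecord(asc));
    }
}
```

Keeping the mapping in a single table means the Publisher direction is derived from the Subscriber direction rather than maintained separately, which mirrors the idea of two transformations that are exact reverses of each other.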

The figure below illustrates a djCosmos map that will perform the Subscriber transformation.


Figure 1: djCosmos transformation map 'samplecsv.map.xml'

With both a configured DirXML driver and djCosmos transformation in place, we have most of the tools needed for an integration solution. The DirXML driver is going to process all Subscriber events as they arrive from eDirectory, and it is configured to poll on a periodic basis for Publisher events and process them as the files arrive. We are still missing a few key components to complete the Subscriber processing.

  • A method is needed to instantiate the djCosmos transformation engine outside of the framework of the Map Designer IDE.
  • There is a need to dynamically change the transformation Source data.
  • A method is needed to either trigger the djCosmos engine when data is available to process, or schedule it to run on a periodic basis.
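
The "schedule it to run on a periodic basis" option can be sketched with standard Java scheduling. This is a minimal illustration under stated assumptions, not part of either product: the class and method names are hypothetical, the output files are assumed to end in ".csv", and the handler is assumed to rename or move each file it processes so that it is not picked up again on the next poll.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class OutputPollerSketch {
    // Returns the regular files ending in ".csv" in the directory, sorted by path.
    static List<Path> pendingCsvFiles(Path outputDir) throws IOException {
        try (Stream<Path> entries = Files.list(outputDir)) {
            return entries
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().toLowerCase().endsWith(".csv"))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    // Polls the directory every periodSeconds and hands each pending file
    // to the handler. The handler (hypothetical) would run the transformation
    // and then rename or move the file.
    static ScheduledExecutorService startPolling(
            Path outputDir, long periodSeconds, Consumer<Path> handler) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                for (Path file : pendingCsvFiles(outputDir)) {
                    handler.accept(file);
                }
            } catch (IOException | RuntimeException e) {
                // Report and keep polling; an uncaught exception would cancel the task.
                System.err.println("Poll failed: " + e);
            }
        }, 0, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }

    public static void main(String[] args) throws IOException {
        // One-shot listing of the directory given as the first argument (default: current directory).
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path file : pendingCsvFiles(dir)) {
            System.out.println("Pending: " + file);
        }
    }
}
```

Polling is the simpler of the two options but adds latency of up to one polling period; triggering the engine directly when a file is written avoids that delay.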

djCosmos Engine SDK

After designing your transformations using the Map Designer, the transformations and their sub-components may be saved in XML format. The top-level format is known as a transformation file and will have a name such as "samplecsv.tf.xml". Saving the data in this fashion makes it easy to transport the transformation to another computer system for use and also makes it possible to programmatically instantiate the transformation using the djCosmos Engine SDK. The SDK enables you to embed code to load, configure, and execute transformations using either a COM or Java interface. In the following examples, the Java SDK will be used. The SDK class files are contained within the dj800ec.jar file. Make sure that this file is included in the CLASSPATH environment variable of the executing shell.
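
A quick way to confirm the CLASSPATH setting before running the examples is to ask the JVM whether it can find an SDK class. The sketch below is a hypothetical helper, not part of the SDK; the class name DataJunction.ec.Engine is taken from the import statement and the Engine usage in the listings in this AppNote.

```java
final class ClasspathCheckSketch {
    // Returns true if the named class can be located by this class's loader.
    // The class is not initialized, so no engine code is run by the check.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathCheckSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name assumed from "import DataJunction.ec.*" in the examples.
        String engineClass = "DataJunction.ec.Engine";
        if (isLoadable(engineClass)) {
            System.out.println(engineClass + " found.");
        } else {
            System.out.println(engineClass
                + " not found; check that dj800ec.jar is on the CLASSPATH.");
        }
    }
}
```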

Example 1

This example will utilize a Java program to execute the samplecsv.tf.xml transformation. When the transformation was developed using the Map Designer, the Source and Target file names were explicitly named for ease of use and testing purposes. In the real world you will want to process files with differing names and content. The way to do this with djCosmos is to specify a DJ Message object URI for the source of the transformation. This may be done using the Map Designer, or by editing the transformation file directly. The following figure shows our example transformation Source. The value for the "Source File/URI" input is "djmessage:///samplemsg". This is the name format required for a DJ Message object "samplemsg" which will be used in our Java example.


Figure 2: djCosmos transformation source

We can now examine the Java program that will execute our transformation. The entire text is provided below:

//--------------------------------------------------------------------------
// This sample loads the samplecsv transformation and executes it using a
// dynamic source DJ Message object, "samplemsg".
// c:\csvsample\output\sample.csv is loaded into a message
// object.  The transformation is loaded and run using the message
// object as its source data, writing to a sample.asc file.
//--------------------------------------------------------------------------
import DataJunction.ec.*;
import java.io.*;

public class Sample
{
    // Engine object must be created.
    Engine eng; 

    // Conversion object
    IConversion convert = null; 	

    // A Shared Expression Context must be created to use a Message Object.
    ISharedExpressionContext context;

    // Message object will hold the contents of the source file
    IMessage msg;

    // transformation filename.
    final String CONVFILE1 = new String( "samplecsv.tf.xml" ); 

    //----------------------------------------------------
    public static void main( String[] args )
    {
        Sample sample = new Sample();
        if ( sample.setup() == true )
           sample.go();
    }

    //----------------------------------------------------
    public Sample()
    {
        // Instantiate engine
        eng = new Engine();

        // We must also set the initialization file for our engine object
        eng.setInitializationFile( "dj800.ini" );

        // Initialize context
        context = new SharedExpressionContext();

        // Create the Message Object
        try
        {
            msg = context.createMessage();
        }
        catch( ECException ece )
        {
            System.out.println( "ERROR Creating message." + ece );
            System.out.println( "Error Code: "  + ece.getErrorCode() );
        }
    }

    // Create the conversions object
    public boolean setup()
    {
        // Create the conversion objects
        try
        {
            convert = new Conversion();
        }
        catch( ECException ece )
        {
            System.out.println( "ERROR Creating conversion." + ece );
            System.out.println( "Error Code: "  + ece.getErrorCode() );
            return false;
        }
        return true;
    }

    // Initialize DJ Message object, load it with contents of sample file
    public void go()
    {
        // Set the name of the message source for conversion
        // accessed as "djmessage:///samplemsg"
        msg.setName( "samplemsg" );

        // Read the sample.csv source file into the message object
        try
        {
            FileReader sourceReader = new FileReader(
                                    "c:\\csvsample\\output\\sample.csv" );
            BufferedReader srBuffer = new BufferedReader( sourceReader );

            String sourceData = new String();
            String record = new String();

            // The first record
            if ( ( record = srBuffer.readLine() ) != null )
                sourceData = record;

            // The remaining records
            while ( ( record=srBuffer.readLine() ) != null )
            {
                sourceData = sourceData + "\r\n" + record;
            }

            // The final record delimiter
            sourceData += "\r\n";
            sourceReader.close();

            msg.setBody( sourceData );
        }
        catch (IOException ioe) {
            System.out.println( "File operation error." );
            System.out.println(  ioe );
        }

        // Attach the context containing the message object to the conversion
        try
        {
            context.attach( convert );
        }
        catch( ECException ece )
        {
            IExpressionError iee = context.getLastError();

            if ( iee.isError() == true )
            {
                System.out.println( "EXPRESSION ERROR:  " +
                                                     iee.getErrorLineString()  );
                System.out.println( "Element:  "  + iee.getErrorToken() );
                System.out.println( iee.getFormatedErrorString() );
            }
            else System.out.println( "EXPRESSION ERROR: information about the error "
                    + "has been overwritten by other execute() actions." );
        }

        // Load and run conversion map.
        try
        {
            convert.Load( CONVFILE1 );    // Load the Conversion map file
            System.out.println( "Loaded conversion " + convert.getName() );

            convert.Run();
            System.out.println( "Completed conversion run." );

        }
        catch( ECException ece )
        {
            System.out.println( "ERROR Loading conversion file: " + ece );
            System.out.println( "Error Code: "  + ece.getErrorCode() );
        }
    } //go()
} //Sample class

The Data Junction specific classes and interfaces involved in this program are:

  • Engine
  • ISharedExpressionContext
  • IMessage
  • IConversion

The Engine class is used to initialize the Data Junction conversion engine, which performs the transformation process, and to verify a valid license status. The license information is referenced via the 'dj800.ini' file along with other engine initialization parameters.

The SharedExpressionContext class provides a framework that allows the various components of the transformation process to have access to shared components. Most specifically, this class is needed in order to use the MessageObject class. The instance of this class in the program is context.

The message object (IMessage in the listing) is at the heart of this program. It holds the name of the message URI that will be used as the source of the transformation (samplemsg), and it contains the text of the data read from the sample.csv file when the program is run. The instance, msg, is created by a method of the SharedExpressionContext and is therefore associated with context.
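
The way the message body is assembled can be sketched without the SDK. The helper below is a hypothetical variation on the loop in the listing: it terminates every record with CR/LF, as the listing does, but uses a StringBuilder instead of repeated string concatenation and closes the reader automatically. Unlike the listing, an empty input produces an empty body rather than a lone CR/LF.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

final class MessageBodySketch {
    // Reads all records from the source and terminates each with CR/LF,
    // whatever line ending the source itself used.
    static String readRecords(Reader source) throws IOException {
        StringBuilder body = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(source)) {
            String record;
            while ((record = reader.readLine()) != null) {
                body.append(record).append("\r\n");
            }
        }
        return body.toString();
    }

    public static void main(String[] args) throws IOException {
        // Two records with Unix line endings, for illustration only.
        String body = readRecords(new StringReader("Smith,John\nJones,Mary\n"));
        // Show the normalized body with the line endings made visible.
        System.out.println(body.replace("\r\n", "<CRLF>"));
    }
}
```

The resulting string is what would be passed to the message object's setBody() call in the listing.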

The conversion object (Conversion in the listing, held through the IConversion interface) is used to load and execute the samplecsv.tf.xml transformation that was created previously. The instance, convert, is also attached to context so that it has access to msg. Its Run() method executes the transformation.

The result of executing this transformation is the ASCII output file 'sample.asc', which is identical to the result of executing the transformation in the Map Designer IDE. However, running the transformation programmatically through the Engine SDK is much easier to automate. Now that we have a simple methodology for executing the Data Junction transformations, we can utilize some of the features of the DirXML Delimited Text Driver to create a more automatic synchronization scenario.

Example 2

The previous example illustrated an automation of the djCosmos transformation process using a Java program. This example will show how transformations on the Subscriber channel can be automated even further. What is missing in Example 1 is a method of triggering the transformation when Subscriber data is made available by the Delimited Text driver. This example utilizes the PostProcessor interface extension of the driver to run the transformation as soon as the source file is written. The PostProcessor interface can be easily utilized by our implementation of the Delimited Text driver. All that is required is the implementation of a couple of interfaces in the sample program used in Example 1 and the addition of two new Subscriber channel parameters.

The new parameters that must be added to the <subscriber-options> of the driver are:

      <post-processor display-name="DJ Cosmos Post Processor Class">com.novell.nds.dirxml.driver.delimitedtext.djpostprocessor.DJPostProcessor</post-processor>
      <post-processor-params display-name="Conversion File Name">c:\csvsample\samplecsv.tf.xml</post-processor-params>

The <post-processor> parameter identifies the class name of the extension we will be adding to the driver, and the <post-processor-params> parameter is a string that identifies the djCosmos transformation file.
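
How a host program can turn a configured class name and parameter string into a working extension can be sketched with Java reflection. This is not the Delimited Text driver's actual loading code: FileHandler is a hypothetical stand-in for the driver's PostProcessor interface (with the trace handle omitted), and RecordingHandler is a trivial implementation used only for illustration.

```java
import java.io.File;

final class PostProcessorLoadingSketch {
    // Hypothetical stand-in for the driver's extension interface.
    interface FileHandler {
        void init(String parameterString);
        void nextOutputFile(File outputFile);
    }

    // Trivial implementation that only records what it was given.
    static final class RecordingHandler implements FileHandler {
        String conversionFile;
        String lastFileName;

        public RecordingHandler() {
        }

        public void init(String parameterString) {
            conversionFile = parameterString;
        }

        public void nextOutputFile(File outputFile) {
            lastFileName = outputFile.getName();
        }
    }

    // Instantiates the named class through its no-argument constructor
    // and passes it the configured parameter string.
    static FileHandler load(String className, String parameterString)
            throws ReflectiveOperationException {
        Object instance = Class.forName(className).getDeclaredConstructor().newInstance();
        FileHandler handler = (FileHandler) instance;
        handler.init(parameterString);
        return handler;
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        FileHandler handler = load(RecordingHandler.class.getName(),
                                   "c:\\csvsample\\samplecsv.tf.xml");
        // A host would make this call each time it finishes writing an output file.
        handler.nextOutputFile(new File("sample.csv"));
        System.out.println("Handler class: " + handler.getClass().getName());
    }
}
```

Because the class is resolved by name at run time, the extension must be on the classpath of the process that loads it and must provide a no-argument constructor.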

The DJPostProcessor source code is shown below. It is essentially the same as Example 1, but will now be dynamically loaded by the Delimited Text driver. Two new methods are also present. The init() method passes the <post-processor-params> value and a handle to the DirXML trace utility to our program. The nextOutputFile() method implements the PostProcessor interface. When a new delimited text file is written to the output directory of the Subscriber channel, the driver will automatically call our extension with a handle to the File object that was just written. Our program will then start up and run the djCosmos transformation on the file. Based on the results of the processing, the original source file will be renamed with either a '.fail' or a '.bak' extension prior to returning. If the transformation is executed successfully by the djCosmos engine, the result of the transformation will once more be the creation of the 'sample.asc' file defined as the target of the transformation.


package com.novell.nds.dirxml.driver.delimitedtext.djpostprocessor;

import com.novell.nds.dirxml.driver.delimitedtext.*;

import DataJunction.ec.*;
import java.io.*;
import java.util.*;

/**
 * DJPostProcessor extends the capabilities of the DirXML
 * Delimited Text driver by providing in-line execution of a DJCosmos
 * transformation operation on Subscriber channel documents.
 *
 * @version		1.1	11/24/03
 * @author		Mark Worwetz	
 */
public class DJPostProcessor
   implements PostProcessor
{
   private Engine m_eng;
   private IConversion m_convert = null;
   private ISharedExpressionContext m_context = null;
   private IMessage m_msg = null;
   private String m_convFile = null; 
   private File m_outputFile = null; 
   private Tracer m_tracer;

   //----------------------------------------------------
   // working method of PostProcessor
   //----------------------------------------------------
   public void nextOutputFile( File outputFile )
   {
    boolean result = false;

    m_outputFile = outputFile;
    m_tracer.traceMessage("DJPP: Processing file name: " + m_outputFile.getName());

    try
    {
        // Instantiate engine
        m_eng = new Engine();

        // We must also set the initialization file for our engine object
        m_eng.setInitializationFile("dj800.ini");

        // Initialize context
        m_context = new SharedExpressionContext();

        // Create the Message Object
        m_msg = m_context.createMessage();

        // Setup the conversion object
        if ((result = setup()) == true)
           result = go();
     }   
     catch( ECException ece )
     {
       m_tracer.traceMessage("DJPP: ERROR Creating DJ message." + ece.toString());
       m_tracer.traceMessage("DJPP: Error Code: " + ece.getErrorCode() );
       result = false;
     }

     // Rename file with either a '.fail' or a '.bak' depending on result.
     String fileName = m_outputFile.getAbsolutePath();
     String suffix;
     if (result == false)
         suffix = ".fail";
     else
         suffix = ".bak";

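     // Replace only the file-name extension: use the last '.' and ignore
     // dots that belong to a directory name (or a name with no extension).
     int dot = fileName.lastIndexOf('.');
     String baseName = (dot > fileName.lastIndexOf(File.separatorChar))
                       ? fileName.substring(0, dot) : fileName;
     String newFile = baseName + suffix;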
     File newFileFd = new File(newFile);
     m_outputFile.renameTo(newFileFd);
    }

   //-----------------------------------------------
   // Create the conversions object
   //-----------------------------------------------
   private boolean setup()
   {
      // Create the conversion objects
      try
      {
       m_convert = new Conversion();
       }
       catch(ECException ece)
       {
         m_tracer.traceMessage("DJPP: ERROR Creating DJ conversion." + 
                              ece.toString() );
         m_tracer.traceMessage("DJPP: Error Code: " + ece.getErrorCode() );
         return false;
      }

      return true;
    }

    //-----------------------------------------------
    // Initialize DJ Message object, load it with contents of sample file
    //-----------------------------------------------
    private boolean go()
    {
        // Set the name of the message source for conversion
        // accessed as "djmessage:///samplemsg"
        // This is hard-coded into the 'source' in the transformation file.
        m_msg.setName("samplemsg");

        // Read the source file into the message object
        try
        {
            FileReader sourceReader = new FileReader(m_outputFile);
            BufferedReader srBuffer = new BufferedReader(sourceReader);

            String sourceData = new String();
            String record = new String();

            // The first record
            if ((record = srBuffer.readLine()) != null)
                sourceData = record;

            // The remaining records
            // Note that the CR/LF is hard-coded for Windows platform
            while ((record=srBuffer.readLine()) != null)
            {
              sourceData = sourceData + "\r\n" + record;
            }

          // The final record delimiter
          sourceData += "\r\n";
          sourceReader.close();

          m_msg.setBody(sourceData);
        }
        catch (IOException ioe)
        {
          m_tracer.traceMessage("DJPP: File operation error: " + ioe.toString());
          return false;
        }

        // Attach the context containing the message object to the conversion
        try
        {
          m_context.attach(m_convert);
        }
        catch(ECException ece)
        {
          IExpressionError iee = m_context.getLastError();

          if (iee.isError() == true)
            {
              m_tracer.traceMessage("DJPP: EXPRESSION ERROR:  " + 
                                                   iee.getErrorLineString());
              m_tracer.traceMessage("DJPP: Element: "  + iee.getErrorToken());
              m_tracer.traceMessage("DJPP: " + iee.getFormatedErrorString());
            }
            else
            {
              m_tracer.traceMessage("DJPP: EXPRESSION ERROR:  information "
                            + "about the error has been overwritten by other "
                            + "execute() actions.");
            }
            return false;
        }

        // Load and run conversion map.
        try
        {
          m_convert.Load( m_convFile );    // Load the Conversion map file
          m_tracer.traceMessage("DJPP: Loaded conversion " + m_convert.getName());

          m_convert.Run();
          m_tracer.traceMessage("DJPP: Completed conversion run.");
          return true;
        }
        catch( ECException ece )
        {
          m_tracer.traceMessage("DJPP: ERROR Loading conversion file: " +
                                             ece.toString());
          m_tracer.traceMessage("DJPP: Error Code: "  + ece.getErrorCode());
          return false;
        }
    }

    //-----------------------------------------------
    // Init will be called to initialize the Extension interface.
    //-----------------------------------------------
    public void init(String parameterString, Tracer traceHandle )
    {
        m_tracer = traceHandle;
        m_convFile = parameterString;
    }
}

Conclusion

This AppNote has provided a basic understanding of integration scenarios that use the Data Junction djCosmos product with DirXML. As stated previously, this information and the examples provided are intended as a starting point for anyone facing an integration or synchronization project that uses these products.