Generic File Driver for IDM

Novell Cool Solutions: Cool Tool

In Brief

A pluggable generic file driver that currently supports reading and writing CSV, XML and XLS files.

Vitals

Product Categories:
  • Identity Manager
Functional Categories:
  • Identity & Security Management
Posted: 19 Feb 2007
File Size: 172.9 KB
License: MPL (Mozilla Public License)
Download: /coolsolutions/tools/downloads/GenFileDriver.jar
Publisher: Stefaan Van Cauwenberge

    Disclaimer

    Please read the note from our friends in legal before using this file.


    Details

    Description

    The Generic File Driver is similar to the Text Driver shipped with IDM, but it has more options and can read virtually any file type. Out of the box, it supports XML, CSV and XLS files (the latter using Apache POI).

    Main Features

    • Pluggable setup: the driver uses pluggable strategies for tasks such as deciding which files to read, how to read or write a file, and when to start a new file. The driver ships with a strategy implementation for each of these options.
    • Supports reading and writing XML, CSV and XLS files. Custom plugins can add support for virtually any other file type.
    • When reading XML or CSV files, at most 20 records are held in memory at any given time, so large files can be read.
    • Input and output formats are independent: you can, for example, read CSV files and write XML files.
    • Records in the input file can be enriched with meta data (e.g. record number, file name, last-record indicator).
    • If reading an input file is interrupted (e.g. by a shutdown), the driver resumes reading at the point where it stopped.

    Installation

    Copy the jar (GenFileDriver.jar) to DirXML's classes folder. If you want to use XLS support, also download poi-2.5.1-final-20040804.jar from Apache and copy it into the same folder.

    Create a new Java driver with the class name info.vancauwenberge.filedriver.shim.driver.GenericFileDriverShim.

    Appendix A at the bottom of this page contains an XML configuration with all options. Copy and paste it into the driver configuration (with XML editing enabled). You can optionally remove the options for strategies you are not using (e.g. remove all CSV reader options if you use the XML reader).

    Driver Options:

    Option | Description | Example value
    schema | Field Names (Field1,Field2,Field3) | LastName,FirstName,Title,Email,WorkPhone,Fax,WirelessPhone,Description
    objectClass | Object Class Name | User

    Common Publisher Options:

    These options are for the publisher, independent of any strategy used.

    Option | Description | Example value
    pub_fileLocator | Publisher: File Locator Strategy (Class). Current implementations: info.vancauwenberge.filedriver.filelocator.RegExpFileLocator | info.vancauwenberge.filedriver.filelocator.RegExpFileLocator
    pub_FileSorter | Publisher: File Sorter Strategy (Class). Leave empty for no sorting. Current implementations: info.vancauwenberge.filedriver.filesorter.FilePropertySorter | info.vancauwenberge.filedriver.filesorter.FilePropertySorter
    pub_fileReader | Publisher: File Reader Strategy (Class). Current implementations: info.vancauwenberge.filedriver.filereader.csv.CSVFileReader, info.vancauwenberge.filedriver.filereader.xml.XMLFileReader, info.vancauwenberge.filedriver.filereader.xls.XLSFileReader | info.vancauwenberge.filedriver.filereader.csv.CSVFileReader
    pub_pollingInterval | Publisher: Polling interval (seconds) | 80
    pub_metaData | Publisher: List of meta data elements that should be added (recordNumber, isLastRecord, filePath, fileName, fileSize). The schema given in the driver parameter 'schema' is extended with all listed meta data elements. | recordNumber,isLastRecord
    pub_heartbeatInterval | Publisher: Heartbeat interval (minutes) | 1
    pub_workDir | Publisher: Temporary work folder. Before a file is read, it is moved to a subdirectory of this folder whose name contains a date/time. | C:\workspace\DirXMLFileDriver\temp
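    The isLastRecord meta data element implies one record of look-ahead: a record can only be flagged as the last one once the reader knows that nothing follows it. The sketch below is a hypothetical illustration of that idea, not the driver's implementation; it assumes records are represented as Map<String, String>, and the class name MetaDataEnricher is invented for this example.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NoSuchElementException;

/**
 * Hypothetical sketch (not the driver's code): adds the recordNumber and
 * isLastRecord meta data fields to each record read from a source iterator.
 */
class MetaDataEnricher implements Iterator<Map<String, String>> {
    private final Iterator<Map<String, String>> source;
    private int recordNumber = 0;

    MetaDataEnricher(Iterator<Map<String, String>> source) {
        this.source = source;
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    @Override
    public Map<String, String> next() {
        if (!source.hasNext()) {
            throw new NoSuchElementException();
        }
        // Copy the record so the caller's map is left untouched.
        Map<String, String> record = new LinkedHashMap<>(source.next());
        recordNumber++;
        record.put("recordNumber", Integer.toString(recordNumber));
        // Asking the source whether another record exists is the one-record
        // look-ahead; a file reader must buffer that next record to answer.
        record.put("isLastRecord", Boolean.toString(!source.hasNext()));
        return record;
    }
}
```

    The need to peek one record ahead is one plausible reason why a streaming reader holds a small number of records in memory rather than exactly one.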

    Strategy-specific Publisher Options:

    RegExpFileLocator Options

    The RegExpFileLocator finds the files that need to be processed. This implementation scans a source folder and selects all files that match the given regular expression.

    Option | Description | Example value
    regExp-sourceFolder | RegExpFileLocator: Source folder for files | C:\source
    regExp-regExp | RegExpFileLocator: Regular expression for finding files. Start the expression with (?i) to make it case-insensitive. | (?i).*\.csv
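    A minimal sketch of what a regular-expression file locator does, using java.util.regex. The class name RegExpFileLocatorSketch is hypothetical, and matching the whole file name (Matcher.matches) rather than searching for a substring is an assumption that fits the example value (?i).*\.csv.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch of a regex-based file locator; not the driver's actual code. */
class RegExpFileLocatorSketch {
    private final File sourceFolder;
    private final Pattern pattern;

    RegExpFileLocatorSketch(String sourceFolder, String regExp) {
        this.sourceFolder = new File(sourceFolder);
        // A leading (?i) inside the expression turns on case-insensitive matching.
        this.pattern = Pattern.compile(regExp);
    }

    /** True if the whole file name matches the regular expression (an assumption). */
    boolean accepts(String fileName) {
        return pattern.matcher(fileName).matches();
    }

    /** Lists the regular files in the source folder whose names match. */
    List<File> locate() {
        List<File> result = new ArrayList<>();
        File[] children = sourceFolder.listFiles();
        if (children == null) {
            return result; // folder missing or unreadable
        }
        for (File f : children) {
            if (f.isFile() && accepts(f.getName())) {
                result.add(f);
            }
        }
        return result;
    }
}
```

    With (?i).*\.csv, a name such as users.CSV is accepted while users.csv.bak is not, because the expression must match the entire name.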

    FilePropertySorter Options

    The FilePropertySorter sorts the list of files returned by the file locator, in ascending or descending order, on a given file property. The property can be any file attribute exposed by the Java File object (java.io.File), given as the name of its getter method (e.g. getName).

    Option | Description | Example value
    fileSort_SortMethod | FileSorter: File sort method | getName
    fileSort_SortOrderAsc | FileSorter: Sort ascending (true/false) | true
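    Since the sort method is given by name, a sorter of this kind can look the method up with reflection. The following hypothetical sketch (the class name is invented) assumes the named java.io.File method takes no arguments and returns a Comparable value, such as a String for getName or a boxed Long for lastModified and length.

```java
import java.io.File;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical sketch: sort files on the value returned by a no-argument
 * java.io.File method such as getName. Not the driver's actual code.
 */
class FilePropertySorterSketch {
    static List<File> sort(List<File> files, String methodName, boolean ascending) {
        final Method getter;
        try {
            getter = File.class.getMethod(methodName);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("No such File method: " + methodName, e);
        }
        Comparator<File> byProperty =
                (a, b) -> compareValues(invoke(getter, a), invoke(getter, b));
        List<File> sorted = new ArrayList<>(files); // leave the input list unchanged
        sorted.sort(ascending ? byProperty : byProperty.reversed());
        return sorted;
    }

    private static Object invoke(Method getter, File f) {
        try {
            return getter.invoke(f);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Assumes the method returns a non-null Comparable (String, Long, Boolean, ...).
    @SuppressWarnings({"unchecked", "rawtypes"})
    private static int compareValues(Object a, Object b) {
        return ((Comparable) a).compareTo(b);
    }
}
```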

    CSVFileReader

    The CSVFileReader reads CSV files, with or without a header. Its options include skipping empty lines, using a given separator, and using the header names (if any) instead of the driver schema as field names.

    Option | Description | Example value
    csvReader_skipEmptyLines | CSVReader: Skip empty lines (true/false) | true
    csvReader_UseHeaderNames | CSVReader: Use the header names (true) or the driver schema (false). Using the header names has the advantage that the order of the columns can change without affecting your code. The downside is that if the header names change (e.g. a change in case), your code will no longer work. | true
    csvReader_hasHeader | CSVReader: Does the CSV file have a header (true/false) | true
    csvReader_forcedEncoding | CSVReader: Forced encoding of the CSV file. Leave blank to use the system default encoding. | UTF-8
    csvReader_seperator | CSVReader: Field separator | ,
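    A minimal sketch of how these CSVReader options interact, assuming records are returned as Map<String, String>. The class name is hypothetical; values are split on the separator only (quoted fields containing the separator are not handled), and for brevity all records are collected in a list, whereas the driver itself keeps at most 20 records in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of the CSVReader options (not the driver's code).
 * Fields are split on the separator only: quoted values that contain the
 * separator are NOT supported here.
 */
class CsvReaderSketch {
    static List<Map<String, String>> read(Reader in, String separator, boolean hasHeader,
                                          boolean useHeaderNames, boolean skipEmptyLines,
                                          List<String> schema) {
        List<Map<String, String>> records = new ArrayList<>();
        String splitRegex = Pattern.quote(separator); // treat the separator literally
        List<String> fieldNames = schema;
        boolean headerPending = hasHeader;
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (skipEmptyLines && line.trim().isEmpty()) {
                    continue;
                }
                String[] values = line.split(splitRegex, -1); // -1 keeps trailing empty fields
                if (headerPending) {
                    headerPending = false;
                    if (useHeaderNames) {
                        fieldNames = Arrays.asList(values);
                    }
                    continue; // the header line is never returned as data
                }
                Map<String, String> record = new LinkedHashMap<>();
                for (int i = 0; i < fieldNames.size() && i < values.length; i++) {
                    record.put(fieldNames.get(i), values[i]);
                }
                records.add(record);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }
}
```

    Reading the same file with csvReader_UseHeaderNames set to false would key the values by the schema names (here A and B) instead of LastName and FirstName.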

    XMLFileReader

    The XMLFileReader reads XML files. The expected format is flat:

    <root>
      <aRecord>
        <aField>avalue</aField>
        <anotherField>anotherValue</anotherField>
      </aRecord>
      <aRecord>
        ...
      </aRecord>
    </root>

    If the file is not in this format, an optional preXslt can be applied to convert it. As with the CSV reader, you can either use the tag names (e.g. aField in the example above) as field names or use the schema given in the driver configuration.
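    The flat format lends itself to streaming. The hypothetical sketch below reads it with the JDK's StAX API (javax.xml.stream) and uses the tag names as field names, as with xmlReader_useTagNames set to true. It collects all records in a list for brevity; a streaming reader would instead hand each record on as soon as its closing tag is seen, which is what keeps memory use bounded.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/**
 * Hypothetical sketch: read <root><aRecord><aField>..</aField></aRecord></root>
 * with StAX, using tag names as field names. Flat format only; deeper nesting is ignored.
 */
class FlatXmlReaderSketch {
    static List<Map<String, String>> read(Reader in) {
        List<Map<String, String>> records = new ArrayList<>();
        try {
            XMLStreamReader xml = XMLInputFactory.newInstance().createXMLStreamReader(in);
            int depth = 0; // 1 = root, 2 = record, 3 = field
            Map<String, String> current = null;
            String field = null;
            StringBuilder text = new StringBuilder();
            while (xml.hasNext()) {
                int event = xml.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    if (depth == 2) {
                        current = new LinkedHashMap<>();
                    } else if (depth == 3) {
                        field = xml.getLocalName();
                        text.setLength(0);
                    }
                } else if ((event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA) && depth == 3) {
                    text.append(xml.getText()); // text may arrive in several chunks
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    if (depth == 3) {
                        current.put(field, text.toString());
                    } else if (depth == 2) {
                        records.add(current); // a streaming reader would pass it on here
                    }
                    depth--;
                }
            }
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML input", e);
        }
        return records;
    }
}
```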

    Option | Description | Example value
    xmlReader_useTagNames | XMLFileReader: Use the tag names from the XML document (true) or the driver schema (false). Using the tag names has the advantage that the order of the elements can change without affecting your code. The downside is that if the element names change (e.g. a change in case), your code will no longer work. For XML this is less of a concern: XML is case sensitive, so a tag whose case changes is in fact a different field. | true
    xmlReader_preXslt | XMLFileReader: XSL to apply before processing the XML file (leave empty for no pre-processing) | (stylesheet below)
        <?xml version="1.0" encoding="UTF-8"?>
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format">
          <xsl:output method="xml" indent="yes"/>
          <xsl:template match="root"><xsl:copy><xsl:apply-templates/></xsl:copy></xsl:template>
          <xsl:template match="aRecord"><xsl:copy><xsl:apply-templates/></xsl:copy></xsl:template>
          <xsl:template match="schema"><xsl:copy><xsl:apply-templates/></xsl:copy></xsl:template>
          <xsl:template match="objectClass"><newObjectClass><xsl:value-of select="."/></newObjectClass></xsl:template>
        </xsl:stylesheet>
    xmlReader_forcedEncoding | XMLFileReader: Forced encoding of the XML file. Leave blank to use the encoding attribute inside the XML file (or the system default when the attribute is not given). | (blank)
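    Both xmlReader_preXslt and xmlWriter_PostXSL take a stylesheet as a string. A minimal sketch of applying such a string to an XML document with the standard JAXP API (javax.xml.transform); the class name XsltSketch is hypothetical, and this is not necessarily how the driver invokes its stylesheets.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical sketch: apply an XSLT given as a string to an XML string. */
class XsltSketch {
    static String transform(String xslt, String xml) {
        try {
            // Compile the stylesheet text, then run it over the input document.
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("XSLT failed", e);
        }
    }
}
```

    A stylesheet that renames objectClass to newObjectClass (an identity copy plus one override) turns <root><aRecord><objectClass>User</objectClass></aRecord></root> into a document containing <newObjectClass>User</newObjectClass>.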

    XLSFileReader

    The XLSFileReader reads one sheet (tab) of an Excel workbook. Unlike the XML and CSV readers, it reads the complete XLS file into memory.

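    Option | Description | Example value
    xlsReader_SheetName | XLSReader: Name of the sheet within the XLS file to read | Sheet1
    xlsReader_HasHeader | XLSReader: Does the XLS sheet have a header (true/false) | true
    xlsReader_UseHeaderNames | XLSReader: Use the header names, if any (true), or the driver schema (false) | true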

    Common Subscriber Options

    These options are common for the subscriber, independent of the strategies used.

    Option | Description | Example value
    sub_FileStartStrategy | Subscriber: File Start Strategy (Class). Current implementations: info.vancauwenberge.filedriver.filestart.BasicNewFileDecider | info.vancauwenberge.filedriver.filestart.BasicNewFileDecider
    sub_FileNameStrategy | Subscriber: File Name Strategy (Class). Current implementations: info.vancauwenberge.filedriver.filename.GUIDFileNameStrategy, info.vancauwenberge.filedriver.filename.SimpleDateFormatFileNameStrategy | info.vancauwenberge.filedriver.filename.SimpleDateFormatFileNameStrategy
    sub_FileWriteStrategy | Subscriber: File Write Strategy (Class). Current implementations: info.vancauwenberge.filedriver.filewriter.CSVFileWriter, info.vancauwenberge.filedriver.filewriter.XMLFileWriter, info.vancauwenberge.filedriver.filewriter.XLSFileWriter | info.vancauwenberge.filedriver.filewriter.XMLFileWriter
    sub_WorkDir | Subscriber: Work directory. All files are first created in this directory; when a file is closed, it is moved to sub_DestFolder. | C:\workspace\DirXMLFileDriver\temp
    sub_DestFolder | Subscriber: Destination directory | C:\workspace\DirXMLFileDriver
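    Creating files in sub_WorkDir and moving them to sub_DestFolder only once they are closed presumably ensures that whatever reads the destination folder never sees a half-written file. A minimal sketch of that write-then-move pattern with java.nio.file; the class and method names are hypothetical, and the atomic move assumes both folders are on the same file system.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical sketch of the work-directory / destination-folder pattern. */
class WorkDirPublishSketch {
    /** Writes content into workDir, then moves the finished file into destDir. */
    static Path writeAndPublish(Path workDir, Path destDir, String fileName, String content) {
        try {
            Path work = workDir.resolve(fileName);
            Files.write(work, content.getBytes(StandardCharsets.UTF_8));
            // ATOMIC_MOVE throws AtomicMoveNotSupportedException (an IOException)
            // if the two folders are on different file systems.
            return Files.move(work, destDir.resolve(fileName), StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```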

    Strategy-specific Subscriber Options

    BasicNewFileDecider

    This strategy decides when a new file should be started. The BasicNewFileDecider supports several criteria: a maximum number of records, a maximum file age, an idle time, and manual control based on a field in the received Add event.

    Option | Description | Example value
    newFile_MaxRecords | NewFile: Maximum number of data records in a file. Whenever this maximum is reached, a new file is started (unless overruled by newFile_FieldName). | 100
    newFile_MaxFileAge | NewFile: Maximum 'age' of a file (in seconds). The age counter starts when the first record is written to the file. As soon as the file is older than the given age, the next record is written to a new file. Set to 0 to disable. | 60
    newFile_InactiveSaveInterval | NewFile: Save the file after nnn seconds of inactivity. If the driver receives no new events within the given number of seconds, the file is closed. A new file is started as soon as a new record needs to be written. Set to 0 to disable. | 0
    newFile_FieldName | NewFile: Field name used to manually control the creation of new files. If the value of this field is 'true', a new file is started for the current record; any other value does not start a new file. Note: whenever this field is present, it overrules the maximum-records check. To combine manual control with newFile_MaxRecords, only include the field when its value should be 'true'. | (blank)
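    The rules above can be combined into one decision per record. The following is a hypothetical sketch, not the driver's BasicNewFileDecider. It assumes that a value of 0 or less for newFile_MaxRecords disables that check, and that the age check still applies when newFile_FieldName is present (the text only says the field overrules the maximum-records check). The current time is passed in as a parameter so the logic is deterministic.

```java
import java.util.Map;

/**
 * Hypothetical sketch of the BasicNewFileDecider rules (not the driver's code).
 * Times are passed in as milliseconds so the decisions are deterministic.
 */
class NewFileDeciderSketch {
    private final int maxRecords;       // newFile_MaxRecords (<= 0: disabled, an assumption)
    private final long maxAgeMillis;    // newFile_MaxFileAge (0: disabled)
    private final long inactiveMillis;  // newFile_InactiveSaveInterval (0: disabled)
    private final String fieldName;     // newFile_FieldName (null or empty: not used)

    private int recordsInFile = 0;
    private long fileStartMillis = -1;  // -1: no file is currently open
    private long lastRecordMillis = -1;

    NewFileDeciderSketch(int maxRecords, int maxFileAgeSeconds,
                         int inactiveSaveSeconds, String fieldName) {
        this.maxRecords = maxRecords;
        this.maxAgeMillis = maxFileAgeSeconds * 1000L;
        this.inactiveMillis = inactiveSaveSeconds * 1000L;
        this.fieldName = fieldName;
    }

    /** Returns true if this record must be written to a new file, then counts it. */
    boolean startNewFile(Map<String, String> record, long nowMillis) {
        boolean newFile;
        if (fileStartMillis < 0) {
            newFile = true; // no file open yet
        } else if (fieldName != null && !fieldName.isEmpty() && record.containsKey(fieldName)) {
            // Field present: its value overrules the max-records check.
            newFile = "true".equals(record.get(fieldName));
        } else {
            newFile = maxRecords > 0 && recordsInFile >= maxRecords;
        }
        if (!newFile && maxAgeMillis > 0 && nowMillis - fileStartMillis > maxAgeMillis) {
            newFile = true; // current file is older than newFile_MaxFileAge
        }
        if (newFile) {
            recordsInFile = 0;
            fileStartMillis = nowMillis;
        }
        recordsInFile++;
        lastRecordMillis = nowMillis;
        return newFile;
    }

    /** Returns true (and forgets the open file) once the idle interval has passed. */
    boolean closeForInactivity(long nowMillis) {
        if (inactiveMillis > 0 && fileStartMillis >= 0
                && nowMillis - lastRecordMillis >= inactiveMillis) {
            fileStartMillis = -1; // the next record starts a new file
            return true;
        }
        return false;
    }
}
```

    In this sketch, with a maximum of 2 records, the third plain record starts a new file, while a record carrying the control field with the value 'false' stays in the current file even past the record limit, until the file exceeds its maximum age.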

    GUIDFileNameStrategy

    The GUIDFileNameStrategy generates file names containing a unique GUID. The GUID can optionally be given a prefix and a postfix; the postfix is typically a file extension. The resulting file name is <prefix>GUID<postfix>.

    Option | Description | Example value
    GUIDNamer_FilePostFix | GUID FileNamer: Optional postfix for the filename (e.g. a file extension) | .csv.out
    GUIDNamer_FilePreFix | GUID FileNamer: Optional prefix for the filename | NEW_FILE_
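    A minimal sketch of the <prefix>GUID<postfix> naming scheme. It uses java.util.UUID as the GUID source, which is an assumption: the driver's actual GUID format (for example, with or without hyphens) may differ. The class name is hypothetical.

```java
import java.util.UUID;

/** Hypothetical sketch of <prefix>GUID<postfix> file naming using java.util.UUID. */
class GuidFileNameSketch {
    static String newFileName(String prefix, String postfix) {
        // Both parts are optional; treat a missing value as an empty string.
        String p = prefix == null ? "" : prefix;
        String s = postfix == null ? "" : postfix;
        // A random (version 4) UUID is 36 characters in its string form.
        return p + UUID.randomUUID().toString() + s;
    }
}
```

    With the example values, this yields names of the form NEW_FILE_<36-character UUID>.csv.out.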

    SimpleDateFormatFileNameStrategy

    The SimpleDateFormatFileNameStrategy generates file names containing a date/time. Internally, it uses a java.text.SimpleDateFormat.

    Option | Description | Example value
    simpleDateNamer_FormatString | Date FileNamer: Date format string (according to java.text.SimpleDateFormat) | 'NEW'_'FILE'_yyyyMMdd-HHmmssSSS.'out'
    simpleDateNamer_TimeZone | Date FileNamer: Time zone for the date formatter. Leave blank to use the system default time zone. | (blank)
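    In the example format string, the quoted parts ('NEW', 'FILE', 'out') are literal text, and characters that are not letters (_ - .) are copied as-is. A minimal sketch of the two options using java.text.SimpleDateFormat; the class name is hypothetical, and Locale.ROOT is an assumption added to keep digits locale-independent.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical sketch of date-based file naming with java.text.SimpleDateFormat. */
class DateFileNameSketch {
    static String fileName(String pattern, String timeZoneId, Date when) {
        // A new formatter per call: SimpleDateFormat is not thread-safe.
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
        if (timeZoneId != null && !timeZoneId.isEmpty()) {
            // Note: an unknown time zone ID silently falls back to GMT.
            format.setTimeZone(TimeZone.getTimeZone(timeZoneId));
        } // otherwise the formatter keeps the system default time zone
        return format.format(when);
    }
}
```

    For example, fileName("'NEW'_'FILE'_yyyyMMdd-HHmmssSSS.'out'", "UTC", new Date(0L)) returns NEW_FILE_19700101-000000000.out. Because the pattern has millisecond resolution, two files started within the same millisecond would get the same name in this sketch.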

    CSVFileWriter

    The CSVFileWriter writes records in CSV format. Its separator is configured independently of the CSV reader's separator.

    Option | Description | Example value
    csvWriter_WriteHeader | CSVWriter: Write a header record (true/false) | true
    csvWriter_Seperator | CSVWriter: Separator character | ,
    csvWriter_ForcedEncoding | CSVWriter: Forced file encoding. Leave blank to use the system default encoding. | UTF-8
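    A minimal sketch of a CSV writer with a configurable separator and an optional header record. The class name is hypothetical, and the quoting rule (wrap a value in double quotes and double any embedded quotes when it contains the separator, a quote or a newline) follows RFC 4180 conventions as an assumption; the original text does not say how the driver quotes values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a CSV writer; quoting rules are an assumption (RFC 4180 style). */
class CsvWriterSketch {
    static String write(List<String> fields, List<Map<String, String>> records,
                        String separator, boolean writeHeader) {
        StringBuilder out = new StringBuilder();
        if (writeHeader) {
            appendLine(out, fields, separator);
        }
        for (Map<String, String> record : records) {
            List<String> values = new ArrayList<>();
            for (String field : fields) {
                values.add(record.getOrDefault(field, "")); // missing fields become empty
            }
            appendLine(out, values, separator);
        }
        return out.toString();
    }

    private static void appendLine(StringBuilder out, List<String> values, String separator) {
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                out.append(separator);
            }
            out.append(quote(values.get(i), separator));
        }
        out.append('\n');
    }

    // Wrap in double quotes (doubling embedded quotes) only when needed.
    private static String quote(String value, String separator) {
        if (value.contains(separator) || value.contains("\"") || value.contains("\n")) {
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }
        return value;
    }
}
```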

    XMLFileWriter

    The XMLFileWriter writes the data to an XML file in the same format that the XMLFileReader expects. Optionally, a post-XSLT can be applied to produce a different format.

    Option | Description | Example value
    xmlWriter_RootName | XMLWriter: Root element name | root
    xmlWriter_RecordName | XMLWriter: Record element name | record
    xmlWriter_ForcedEncoding | XMLWriter: Forced file encoding. Leave blank to use the system default encoding. | UTF-8
    xmlWriter_PostXSL | XMLWriter: XSL to apply after saving the XML file (leave empty for no post-processing) | (stylesheet below)
        <?xml version="1.0" encoding="UTF-8"?>
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format">
          <xsl:output method="xml" indent="yes"/>
          <xsl:template match="root"><xsl:copy><xsl:apply-templates/></xsl:copy></xsl:template>
          <xsl:template match="record"><xsl:copy><xsl:apply-templates/></xsl:copy></xsl:template>
          <xsl:template match="LastName"><newLastName><xsl:value-of select="."/></newLastName></xsl:template>
          <xsl:template match="*"><xsl:copy><xsl:apply-templates/></xsl:copy></xsl:template>
        </xsl:stylesheet>
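    A minimal sketch of writing records in the root/record/field layout with the JDK's StAX writer (javax.xml.stream). The class name is hypothetical; field names are assumed to be valid XML element names, and the hand-written declaration assumes the text is later saved as UTF-8.

```java
import java.io.StringWriter;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

/** Hypothetical sketch of the flat root/record/field layout written with StAX. */
class XmlWriterSketch {
    static String write(String rootName, String recordName, List<Map<String, String>> records) {
        StringWriter out = new StringWriter();
        // Declaration written by hand; assumes the result is saved as UTF-8.
        out.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartElement(rootName);
            for (Map<String, String> record : records) {
                xml.writeStartElement(recordName);
                for (Map.Entry<String, String> field : record.entrySet()) {
                    xml.writeStartElement(field.getKey()); // must be a valid XML name
                    xml.writeCharacters(field.getValue()); // escapes markup such as & and <
                    xml.writeEndElement();
                }
                xml.writeEndElement();
            }
            xml.writeEndElement();
            xml.flush();
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }
}
```

    The output of such a writer can then be passed through the post-XSLT shown above.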

    XLSFileWriter

    The XLSFileWriter writes all records into one sheet (tab) of an XLS file.

    Option | Description | Example value
    xlsWriter_SheetName | XLSWriter: Sheet/tab name | Sheet1
    xlsWriter_AddHeader | XLSWriter: Add a header row (true/false) | true

    Appendix A: Driver Config with all options and example data.

    <?xml version="1.0" encoding="UTF-8"?>
    <driver-config name="GenFileDriver">
    <driver-options>
    <schema display-name="Field Names (Field1,Field2,Field3)"
    id="100">LastName,FirstName,Title,Email,WorkPhone,Fax,WirelessPhone,Description</schema>
    <objectClass display-name="Object Class Name" id="101">User</objectClass>
    </driver-options>
    <publisher-options>
    <pub_fileLocator display-name="Publisher: File Locator Strategy (Class):"
    id="124">info.vancauwenberge.filedriver.filelocator.RegExpFileLocator</pub_fileLocator>
    <pub_FileSorter display-name="Publisher: File Sorter Strategy (Class):"
    id="125">info.vancauwenberge.filedriver.filesorter.FilePropertySorter</pub_FileSorter>
    <pub_fileReader display-name="Publisher: File Reader Strategy (Class):"
    id="126">info.vancauwenberge.filedriver.filereader.csv.CSVFileReader</pub_fileReader>
    <pub_pollingInterval display-name="Publisher: Polling interval (seconds):" id="127">5</pub_pollingInterval>
    <pub_metaData display-name="Publisher: List of meta data elements that should be added (recordNumber,isLastRecord,filePath,fileName,fileSize):" id="128">recordNumber,isLastRecord</pub_metaData>
    <pub_heartbeatInterval display-name="Publisher: Heartbeat interval (minutes):"
    id="129">1</pub_heartbeatInterval>
    <pub_workDir display-name="Publisher: Temporary work folder:"
    id="130">/home/nlv10194/filedriver/temp</pub_workDir>
    <regExp-sourceFolder display-name="RegExpFileLocator: Source folder for files:"
    id="131">/home/nlv10194/filedriver</regExp-sourceFolder>
    <regExp-regExp display-name="RegExpFileLocator: Regular expression for finding files. Start with (?i) to make the expression case-insensitive:" id="132">(?i).*\.csv</regExp-regExp>
    <fileSort_SortMethod display-name="FileSorter: File sort method:" id="133">getName</fileSort_SortMethod>
    <fileSort_SortOrderAsc display-name="FileSorter: Sort ascending (true/false):"
    id="134">true</fileSort_SortOrderAsc>
    <xmlReader_useTagNames display-name="XMLFileReader: Use the tag names from the XML document (true) or use the
    driver schema given (false):" id="135">true</xmlReader_useTagNames>
    <xmlReader_preXslt display-name="XMLFileReader: xsl to apply prior to processing the XML file (leave empty
    for no pre-processing):" id="136"><?xml version="1.0" encoding="UTF-8"?><xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format"><xsl:output
    method="xml" indent="yes"/><xsl:template match="root"><xsl:copy><xsl:apply-templates/></xsl:copy></xsl:template>
    <xsl:template match="aRecord"><xsl:copy><xsl:apply-templates/></xsl:copy></xsl:template>
    <xsl:template match="schema"><xsl:copy><xsl:apply-templates/></xsl:copy></xsl:template>
    <xsl:template match="objectClass"><newObjectClass><xsl:value-of
    select="."/></newObjectClass></xsl:template></xsl:stylesheet></xmlReader_preXslt>
    <xmlReader_forcedEncoding display-name="XMLFileReader: forced encoding of the XML file. Leave blank to use the
    encoding attribute inside the XML file (or the system default when the attribute is not given)."
    id="137"></xmlReader_forcedEncoding>
    <csvReader_skipEmptyLines display-name="CSVReader: skip empty lines (true/false)"
    id="138">true</csvReader_skipEmptyLines>
    <csvReader_UseHeaderNames display-name="CSVReader: use the header names (true) or the driver schema given
    (false) " id="139">true</csvReader_UseHeaderNames>
    <csvReader_hasHeader display-name="CSVReader: has the CSV file a header (true/false)"
    id="140">true</csvReader_hasHeader>
    <csvReader_forcedEncoding display-name="CSVReader: forced encoding of the CSV file. Leave blank to use the system
    default encoding." id="141">UTF-8</csvReader_forcedEncoding>
    <csvReader_seperator display-name="CSVReader: field seperator" id="142">,</csvReader_seperator>
    <xlsReader_SheetName display-name="XLSReader: Sheet name:" id="143">Sheet1</xlsReader_SheetName>
    <xlsReader_HasHeader display-name="XLSReader: has the XLS sheet a header (true/false):"
    id="144">true</xlsReader_HasHeader>
    <xlsReader_UseHeaderNames display-name="XLSReader: use the header names -if any- (true) or the driver schema
    given (false):" id="145">true</xlsReader_UseHeaderNames>
    </publisher-options>
    <subscriber-options>
    <sub_FileStartStrategy display-name="Subscriber: File Start Strategy (Class):"
    id="102">info.vancauwenberge.filedriver.filestart.BasicNewFileDecider</sub_FileStartStrategy>
    <sub_FileNameStrategy display-name="Subscriber: File Name Strategy (Class):"
    id="103">info.vancauwenberge.filedriver.filename.SimpleDateFormatFileNameStrategy</sub_FileNameStrategy>
    <sub_FileWriteStrategy display-name="Subscriber: File Write Strategy (Class):"
    id="104">info.vancauwenberge.filedriver.filewriter.XMLFileWriter</sub_FileWriteStrategy>
    <sub_WorkDir display-name="Subscriber: Work directory:"
    id="105">C:\workspace\DirXMLFileDriver\temp</sub_WorkDir>
    <sub_DestFolder display-name="Subscriber: Destination directory:"
    id="106">C:\workspace\DirXMLFileDriver</sub_DestFolder>
    <newFile_MaxRecords display-name="NewFile: Maximum number of data records in a file:"
    id="107">100</newFile_MaxRecords>
    <newFile_MaxFileAge display-name="NewFile: Maximum 'age' of a file (in seconds):"
    id="108">60</newFile_MaxFileAge>
    <newFile_InactiveSaveInterval display-name="NewFile: Save file after nnn seconds of inactivity:"
    id="109">0</newFile_InactiveSaveInterval>
    <newFile_FieldName display-name="NewFile: Fieldname used to manually control the creation of new files:"
    id="110"></newFile_FieldName>
    <csvWriter_WriteHeader display-name="CSVWriter: Write a header record(true/false):"
    id="111">true</csvWriter_WriteHeader>
    <csvWriter_Seperator display-name="CSVWriter: Seperator character:" id="112">,</csvWriter_Seperator>
    <csvWriter_ForcedEncoding display-name="CSVWriter: Forced file encoding. Leave blank to use system default
    encoding:" id="113">UTF-8</csvWriter_ForcedEncoding>
    <xmlWriter_RootName display-name="XMLWriter: Root element name:" id="114">root</xmlWriter_RootName>
    <xmlWriter_RecordName display-name="XMLWriter: Record element name:" id="115">record</xmlWriter_RecordName>
    <xmlWriter_ForcedEncoding display-name="XMLWriter: Forced file encoding. Leave blank to use system default
    encoding:" id="116">UTF-8</xmlWriter_ForcedEncoding>
    <xmlWriter_PostXSL display-name="XMLWriter: Xsl to apply after saving the XML file (leave empty for no postprocessing):"
    id="117"><?xml version="1.0" encoding="UTF-8"?><xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format"><xsl:output
    method="xml" indent="yes"/><xsl:template match="root"><xsl:copy><xsl:apply-templates/></xsl:copy></xsl:template>
    <xsl:template match="record"><xsl:copy><xsl:apply-templates/></xsl:copy></xsl:template>
    <xsl:template match="LastName"><newLastName><xsl:value-of select="."/></newLastName></xsl:template>
    <xsl:template match="*"><xsl:copy><xsl:apply-templates/></xsl:copy></xsl:template></xsl:stylesheet></xmlWriter_PostXSL>
    <xlsWriter_SheetName display-name="XLSWriter: Sheet name:" id="118">Sheet1</xlsWriter_SheetName>
    <xlsWriter_AddHeader display-name="XLSWriter: Add a header row (true/false):"
    id="119">true</xlsWriter_AddHeader>
    <GUIDNamer_FilePostFix display-name="GUID FileNamer: Optional postfix for the filename (e.g. file extension):"
    id="120">.csv.out</GUIDNamer_FilePostFix>
    <GUIDNamer_FilePreFix display-name="GUID FileNamer: Optional prefix for the filename:"
    id="121">NEW_FILE_</GUIDNamer_FilePreFix>
    <simpleDateNamer_FormatString display-name="Date FileNamer: Date format string (according to
    java.text.SimpleDateFormat):" id="122">'NEW'_'FILE'_yyyyMMdd-HHmmssSSS.'out'</simpleDateNamer_FormatString>
    <simpleDateNamer_TimeZone display-name="Date FileNamer: Timezone for the date formatter. Leave blank to use
    the system default timezone:" id="123"></simpleDateNamer_TimeZone>
    </subscriber-options>
    </driver-config>


    © 2014 Novell