Generic File Driver for IDM
Novell Cool Solutions: Cool Tool
In Brief
Pluggable generic file driver currently supporting read/write of CSV, XML and XLS.
Vitals
- Product Categories:
  - Identity Manager
- Functional Categories:
  - Identity & Security Management
Posted: | 19 Feb 2007 |
File Size: | 172.9 KB |
License: | MPL (Mozilla Public License) |
Download: | /coolsolutions/tools/downloads/GenFileDriver.jar |
Publisher: | Stefaan Van Cauwenberge |
Disclaimer
Please read the note from our friends in legal before using this file.
Details
Description
The Generic File Driver is similar to the Text Driver shipped with IDM, but it has more options and can read virtually any file type. Out of the box, the following file types are supported: XML, CSV and XLS (the latter using Apache POI).
Main Features
- Pluggable setup: the driver uses pluggable strategies to decide which files should be read, how a file should be read or written, when a new file should be created, etc. The driver ships with strategies for each of these tasks.
- Currently supports reading and writing XML, CSV and XLS files. Custom plugins can add support for virtually any file type.
- When reading XML or CSV files, at most 20 records are in memory at any given time. This allows large files to be read.
- Input and output formats are independent. You can, for example, read CSV files and write XML files.
- The records in the input file can be enriched with meta data (eg: record number, filename, last record indicator, etc).
- Resumes reading an input file at the point where it was interrupted (eg: a shutdown while a file was being read).
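The bounded-memory reading mentioned in the feature list can be pictured with a small sketch. This is not the driver's code: the class name BatchedLineReader, the method readInBatches and the use of plain text lines as records are assumptions made for illustration only. It shows one way to hand records on in batches of at most 20, so that no more than one batch is held in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration; not taken from the driver's source.
final class BatchedLineReader {

    /**
     * Reads lines from the source and passes them to the handler in
     * batches of at most batchSize lines. Returns the number of batches.
     */
    static int readInBatches(Reader source, int batchSize, Consumer<List<String>> handler) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        int batches = 0;
        List<String> batch = new ArrayList<>(batchSize);
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                batch.add(line);
                if (batch.size() == batchSize) {
                    handler.accept(batch);
                    batch = new ArrayList<>(batchSize); // start a fresh batch
                    batches++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (!batch.isEmpty()) { // final, partially filled batch
            handler.accept(batch);
            batches++;
        }
        return batches;
    }
}
```

With a batch size of 20, a 45-line input would reach the handler as three batches of 20, 20 and 5 lines.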
Installation
Copy the jar (GenFileDriver.jar) to DirXML's classes folder. If you want to use XLS support, also download poi-2.5.1-final-20040804.jar from Apache and copy it into the same folder.
Create a new java driver. Name of the class: info.vancauwenberge.filedriver.shim.driver.GenericFileDriverShim
At the bottom (appendix A), an XML containing all configuration options is included. Copy and paste this into the driver configuration (XML editing enabled). You can optionally remove all options for strategies you are not using (eg: remove all CSV reader options if you are using the XML reader).
Driver Options:
Option | Description | Example value |
schema | Field Names (Field1,Field2,Field3) | LastName,FirstName,Title,Email,WorkPhone,Fax,WirelessP |
objectClass | Object Class Name | User |
Common Publisher Options:
These options are for the publisher, independent of any strategy used.
Option | Description | Example value |
pub_fileLocator | Publisher: File Locator Strategy (Class): Current implementations: info.vancauwenberge.filedriver.filelocator.RegExpFileLocator | info.vancauwenberge.filedriver.filelocator.RegExpFileLocator |
pub_FileSorter | Publisher: File Sorter Strategy (Class). Leave empty for no sorting: Current implementations: info.vancauwenberge.filedriver.filesorter.FilePropertySorter | info.vancauwenberge.filedriver.filesorter.FilePropertySorter |
pub_fileReader | Publisher: File Reader Strategy (Class): Current implementations: info.vancauwenberge.filedriver.filereader.csv.CSVFileReader, info.vancauwenberge.filedriver.filereader.xml.XMLFileReader, info.vancauwenberge.filedriver.filereader.xls.XLSFileReader | info.vancauwenberge.filedriver.filereader.csv.CSVFileReader |
pub_pollingInterval | Publisher: Polling interval (seconds): | 80 |
pub_metaData | Publisher: List of meta data elements that should be added (recordNumber,isLastRecord,filePath,fileName,fileSize). The schema given in the driver parameter 'schema' will be extended with all meta data elements listed. | recordNumber,isLastRecord |
pub_heartbeatInterval | Publisher: Heartbeat interval (minutes): | 1 |
pub_workDir | Publisher: Temporary work folder. All files are moved to a subdirectory (containing a date/time) of this directory prior to reading the file. | C:\workspace\DirXMLFileDriver\temp |
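As an illustration of how pub_metaData extends the driver schema, here is a minimal sketch. The class SchemaExtender and its methods are hypothetical, and skipping duplicate names is an assumption rather than documented driver behaviour.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not taken from the driver's source.
final class SchemaExtender {

    /** Returns the schema fields followed by the listed meta data elements. */
    static List<String> extend(String schema, String metaData) {
        List<String> fields = new ArrayList<>(split(schema));
        for (String element : split(metaData)) {
            if (!fields.contains(element)) { // assumption: no duplicate field names
                fields.add(element);
            }
        }
        return fields;
    }

    /** Splits a comma-separated list, trimming entries and dropping empty ones. */
    private static List<String> split(String commaSeparated) {
        List<String> result = new ArrayList<>();
        if (commaSeparated == null) {
            return result;
        }
        for (String part : commaSeparated.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }
}
```

For the schema LastName,FirstName and the meta data recordNumber,isLastRecord, this yields the four fields LastName, FirstName, recordNumber and isLastRecord, in that order.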
Strategy-specific Publisher Options:
RegExpFileLocator Options
The RegExpFileLocator is responsible for finding files that need to be processed. The given implementation scans a source folder, and selects all files that match the given regular expression.
Option | Description | Example value |
regExp-sourceFolder | RegExpFileLocator: Source folder for files: | C:\source |
regExp-regExp | RegExpFileLocator: Regular expression for finding files. Use (?i) at the start to make the regular expression case-insensitive: | (?i).*\.csv |
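The following sketch shows how a locator of this kind could select files with java.util.regex. The class RegExpLocatorSketch is hypothetical, and matching the expression against the complete file name (rather than searching for a partial match) is an assumption about the driver's behaviour.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration; not taken from the driver's source.
final class RegExpLocatorSketch {

    /** Returns the regular files in sourceFolder whose names match regExp. */
    static List<File> locate(File sourceFolder, String regExp) {
        Pattern pattern = Pattern.compile(regExp);
        List<File> matches = new ArrayList<>();
        File[] entries = sourceFolder.listFiles();
        if (entries == null) {
            return matches; // folder does not exist or cannot be read
        }
        for (File entry : entries) {
            if (entry.isFile() && pattern.matcher(entry.getName()).matches()) {
                matches.add(entry);
            }
        }
        return matches;
    }

    /** True when the whole file name matches the expression. */
    static boolean nameMatches(String fileName, String regExp) {
        return Pattern.compile(regExp).matcher(fileName).matches();
    }
}
```

With the example value (?i).*\.csv, a file named USERS.CSV matches because of the (?i) flag, while users.csv.bak does not.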
FilePropertySorter Options
The FilePropertySorter sorts the list of files returned by the file locator implementation (ascending or descending) on a given file property. The property can be any file attribute exposed by the Java File object.
Option | Description | Example value |
fileSort_SortMethod | FileSorter: File sort method: | getName |
fileSort_SortOrderAsc | FileSorter: Sort ascending (true/false): | true |
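Since fileSort_SortMethod names a method of java.io.File (getName in the example), sorting on it could be done through reflection. The sketch below is one possible way to do this and is not the driver's implementation; the class FilePropertySortSketch is hypothetical, and it assumes the chosen method takes no arguments and returns a non-null Comparable value (for example getName, length or lastModified).

```java
import java.io.File;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration; not taken from the driver's source.
final class FilePropertySortSketch {

    /** Returns a sorted copy of files, ordered by the value of the named File method. */
    @SuppressWarnings({"unchecked", "rawtypes"})
    static List<File> sort(List<File> files, String sortMethod, boolean ascending) {
        final Method getter;
        try {
            getter = File.class.getMethod(sortMethod); // e.g. "getName"
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("Unknown File method: " + sortMethod, e);
        }
        Comparator<File> byProperty = (a, b) -> {
            try {
                Comparable left = (Comparable) getter.invoke(a);
                Comparable right = (Comparable) getter.invoke(b);
                return left.compareTo(right);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
        };
        List<File> sorted = new ArrayList<>(files); // leave the input list untouched
        sorted.sort(ascending ? byProperty : byProperty.reversed());
        return sorted;
    }
}
```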
CSVFileReader
The CSVFileReader reads CSV files (with or without header). It has options to skip empty lines, use a given separator, use the header names (if any) instead of the given driver schema as field names, etc.
Option | Description | Example value |
csvReader_skipEmptyLines | CSVReader: skip empty lines (true/false) | true |
csvReader_UseHeaderNames | CSVReader: use the header names (true) or the driver schema given (false). Using the header names has the advantage that the order of the columns can change without affecting your code. The downside is that if the header names change (eg: change in case), your code will no longer work. | true |
csvReader_hasHeader | CSVReader: whether the CSV file has a header (true/false) | true |
csvReader_forcedEncoding | CSVReader: forced encoding of the CSV file. Leave blank to use system default encoding. | UTF-8 |
csvReader_seperator | CSVReader: field separator | , |
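The difference between using the header names and using the driver schema can be shown with a small sketch. The class CsvFieldMappingSketch is hypothetical, and the naive split on the separator (no support for quoted fields) is a simplification; the real reader may behave differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical illustration; not taken from the driver's source.
final class CsvFieldMappingSketch {

    /** Chooses the field names: the header line when requested, else the driver schema. */
    static String[] fieldNames(boolean hasHeader, boolean useHeaderNames,
                               String headerLine, String schema, String separator) {
        if (hasHeader && useHeaderNames) {
            return headerLine.split(Pattern.quote(separator), -1);
        }
        return schema.split(","); // the schema option is always comma-separated
    }

    /** Maps the values of one data line onto the field names, by position. */
    static Map<String, String> toRecord(String[] fieldNames, String line, String separator) {
        String[] values = line.split(Pattern.quote(separator), -1);
        Map<String, String> record = new LinkedHashMap<>();
        for (int i = 0; i < fieldNames.length && i < values.length; i++) {
            record.put(fieldNames[i].trim(), values[i]);
        }
        return record;
    }
}
```

With the header FirstName,LastName and the schema LastName,FirstName, the line Ada,Lovelace gives FirstName=Ada when header names are used, but LastName=Ada when the schema is used. This is why a change in column order only matters in the second case.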
XMLFileReader
The XMLFileReader reads XML files. The expected format is flat:
<root>
  <aRecord>
    <aField>avalue</aField>
    <anotherField>anotherValue</anotherField>
  </aRecord>
  <aRecord>
    ...
  </aRecord>
</root>
If the file is not in this format, an optional preXslt can be applied to convert it. As with the CSV reader, you have the option to use the tag names (eg: aField in the example above) as field names, or to use the schema given in the driver configuration.
Option | Description | Example value |
xmlReader_useTagNames | XMLFileReader: Use the tag names from the XML document (true) or use the driver schema given (false). Using the tag names has the advantage that the order of the elements can change without affecting your code. The downside is that if the element names change (eg: change in case), your code will no longer work. The latter should not occur in an XML file, because XML is case sensitive (a tag whose case changes is in fact another field). | true |
xmlReader_preXslt | XMLFileReader: xsl to apply prior to processing the XML file (leave empty for no pre-processing): | <?xml version="1.0" encoding="UTF-8"?><xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format"><xsl:output method="xml" indent="yes"/><xsl:template match="root"><xsl:copy><xsl:apply-templates/></xsl:copy></xsl:template><xsl:template match="aRecord"><xsl:copy><xsl:apply-templates/></xsl:copy></xsl:template><xsl:template match="schema"><xsl:copy><xsl:apply-templates/></xsl:copy></xsl:template><xsl:template match="objectClass"><newObjectClass><xsl:value-of select="."/></newObjectClass></xsl:template></xsl:stylesheet> |
xmlReader_forcedEncoding | XMLFileReader: forced encoding of the xml file. Leave blank to use the encoding attribute inside the xml file (or system default when attribute not given). |
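To try out a stylesheet before pasting it into xmlReader_preXslt, it can be applied with the standard javax.xml.transform API. The sketch below is only a test harness under that assumption; the class PreXsltSketch is hypothetical and says nothing about how the driver itself applies the stylesheet.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical illustration; not taken from the driver's source.
final class PreXsltSketch {

    /** Applies the given XSLT to the given XML and returns the result as a string. */
    static String transform(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Covers both stylesheet compilation and transformation errors.
            throw new IllegalStateException("XSLT processing failed", e);
        }
    }
}
```

Feeding it a small input file and a candidate stylesheet shows whether the output has the flat root/record/field shape the reader expects.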
XLSFileReader
The XLSFileReader reads one sheet (tab) of an Excel file. This reader loads the complete XLS file into memory.
Option | Description | Example value |
xlsReader_SheetName | XLSReader: Name of the sheet within the XLS file to read. | Sheet1 |
xlsReader_HasHeader | XLSReader: whether the XLS sheet has a header (true/false) | true |
Common Subscriber Options
These options are common for the subscriber, independent of the strategies used.
Option | Description | Example value |
sub_FileStartStrategy | Subscriber: File Start Strategy (Class): Current implementations: info.vancauwenberge.filedriver.filestart.BasicNewFileDecider | info.vancauwenberge.filedriver.filestart.BasicNewFileDecider |
sub_FileNameStrategy | Subscriber: File Name Strategy (Class): Current implementations: info.vancauwenberge.filedriver.filename.GUIDFileNameStrategy, info.vancauwenberge.filedriver.filename.SimpleDateFormatFileNameStrategy | info.vancauwenberge.filedriver.filename.SimpleDateFormatFileNameStrategy |
sub_FileWriteStrategy | Subscriber: File Write Strategy (Class): Current implementations: info.vancauwenberge.filedriver.filewriter.CSVFileWriter, info.vancauwenberge.filedriver.filewriter.XMLFileWriter, info.vancauwenberge.filedriver.filewriter.XLSFileWriter | info.vancauwenberge.filedriver.filewriter.XMLFileWriter |
sub_WorkDir | Subscriber: Work directory. All files are initially created in this directory. When the files are closed, they are moved to the sub_DestFolder. | C:\workspace\DirXMLFileDriver\temp |
sub_DestFolder | Subscriber: Destination directory: | C:\workspace\DirXMLFileDriver |
Strategy-specific Subscriber Options
BasicNewFileDecider
This strategy decides when a new file should be started. The BasicNewFileDecider has several options included: maximum number of records, maximum age of the file, idle time or a manual control based on a field in the Add event received.
Option | Description | Example value |
newFile_MaxRecords | NewFile: Maximum number of data records in a file. Whenever this maximum is reached, a new file is started (unless overruled by newFile_FieldName). | 100 |
newFile_MaxFileAge | NewFile: Maximum 'age' of a file (in seconds). The age counter starts when the first record is written to the file. As soon as the file is older than the given age, the next record will be written to a new file. Set 0 to disable this. | 60 |
newFile_InactiveSaveInterval | NewFile: Save file after nnn seconds of inactivity. As soon as the driver does not receive new events within the given number of seconds, the file is closed. A new file is started as soon as a new record needs to be written. Set 0 to disable this. | 0 |
newFile_FieldName | NewFile: Fieldname used to manually control the creation of new files. If the value of this field is 'true', a new file will be started for adding the current record. Any other value will not start a new file. Note: as soon as this field is present, it overrules the maxRecord check. If you want both (manual and maxrecord), only set the field when it should be set to 'true'. |
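The interplay of these options can be summarised in a short decision function. This is a sketch of the rules as described in the table, not the BasicNewFileDecider itself: the class and parameter names are hypothetical, and the order in which the checks are evaluated, the exact comparison with 'true' and leaving out the inactivity timer are assumptions.

```java
// Hypothetical illustration; not taken from the driver's source.
final class NewFileDeciderSketch {

    /**
     * Decides whether the next record should go into a new file.
     *
     * @param recordsInFile     records already written to the current file
     * @param fileAgeSeconds    seconds since the first record was written
     * @param maxRecords        value of newFile_MaxRecords
     * @param maxFileAgeSeconds value of newFile_MaxFileAge (0 disables the check)
     * @param fieldValue        value of the newFile_FieldName field in the event,
     *                          or null when the field is not present
     */
    static boolean startNewFile(int recordsInFile, long fileAgeSeconds,
                                int maxRecords, long maxFileAgeSeconds, String fieldValue) {
        // Assumption: the age check applies regardless of the manual field.
        if (maxFileAgeSeconds > 0 && fileAgeSeconds > maxFileAgeSeconds) {
            return true;
        }
        // When the field is present it overrules the maximum-record check.
        if (fieldValue != null) {
            return "true".equals(fieldValue);
        }
        return recordsInFile >= maxRecords;
    }
}
```

Under these assumptions, a file holding 100 records with newFile_MaxRecords set to 100 is closed when the field is absent, but kept open when the field is present with any value other than 'true'.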
GUIDFileNameStrategy
The GUIDFileNameStrategy generates filenames containing a unique GUID. You have the option to prefix the GUID and to postfix the GUID. The postfix is typically a file extension. The new filename will be <prefix>GUID<postfix>.
Option | Description | Example value |
GUIDNamer_FilePostFix | GUID FileNamer: Optional postfix for the filename (eg: file extension). | .csv.out |
GUIDNamer_FilePreFix | GUID FileNamer: Optional prefix for the filename. | NEW_FILE_ |
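The <prefix>GUID<postfix> naming scheme can be sketched as follows. Using java.util.UUID as the source of the GUID is an assumption, as is the class name GuidFileNameSketch; the driver may generate its GUIDs differently.

```java
import java.util.UUID;

// Hypothetical illustration; not taken from the driver's source.
final class GuidFileNameSketch {

    /** Builds <prefix>GUID<postfix>; a null prefix or postfix is treated as empty. */
    static String newFileName(String prefix, String postfix) {
        String safePrefix = prefix == null ? "" : prefix;
        String safePostfix = postfix == null ? "" : postfix;
        // Assumption: a random UUID stands in for the GUID.
        return safePrefix + UUID.randomUUID() + safePostfix;
    }
}
```

With the example values, a generated name has the shape NEW_FILE_<36-character UUID>.csv.out.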
SimpleDateFormatFileNameStrategy
The SimpleDateFormatFileNameStrategy generates filenames containing a date/time. Internally, a SimpleDateFormat is used.
Option | Description | Example value |
simpleDateNamer_FormatString | Date FileNamer: Date format string (according to java.text.SimpleDateFormat). | 'NEW'_'FILE'_yyyyMMdd-HHmmssSSS.'out' |
simpleDateNamer_TimeZone | Date FileNamer: Time zone for the date formatter. Leave blank to use system default timezone. |
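The example format string can be checked directly against java.text.SimpleDateFormat: text between single quotes is copied literally, and the remaining letters are date/time pattern letters. The helper class DateFileNameSketch below is hypothetical and only illustrates the pattern; fixing the locale to Locale.ROOT is a choice made here for reproducible output.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical illustration; not taken from the driver's source.
final class DateFileNameSketch {

    /** Formats a file name for the given moment; a blank time zone means system default. */
    static String fileName(String formatString, String timeZoneId, Date when) {
        SimpleDateFormat format = new SimpleDateFormat(formatString, Locale.ROOT);
        if (timeZoneId != null && !timeZoneId.isEmpty()) {
            format.setTimeZone(TimeZone.getTimeZone(timeZoneId));
        }
        return format.format(when);
    }
}
```

For the format string 'NEW'_'FILE'_yyyyMMdd-HHmmssSSS.'out', the time zone UTC and the instant 1 January 1970 00:00:00.000 UTC, the result is NEW_FILE_19700101-000000000.out.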
CSVFileWriter
The CSVFileWriter writes records in CSV format. You can specify the separator independently of the separator used by the CSV reader.
Option | Description | Example value |
csvWriter_WriteHeader | CSVWriter: Write a header record(true/false): | true |
csvWriter_Seperator | CSVWriter: Separator character: | , |
csvWriter_ForcedEncoding | CSVWriter: Forced file encoding. Leave blank to use system default encoding.: | UTF-8 |
XMLFileWriter
The XMLFileWriter writes the data to an XML file. The format is the same as what the XMLFileReader expects. Optionally, a post xslt can be applied to get a different format.
Option | Description | Example value |
xmlWriter_RootName | XMLWriter: Root element name.: | root |
xmlWriter_RecordName | XMLWriter: Record element name: | record |
xmlWriter_ForcedEncoding | XMLWriter: Forced file encoding. Leave blank to use system default encoding.: | UTF-8 |
xmlWriter_PostXSL | XMLWriter: Xsl to apply after saving the XML file (leave empty for no postprocessing): | <?xml version="1.0" encoding="UTF-8"?><xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format"><xsl:output method="xml" indent="yes"/><xsl:template match="root"><xsl:copy><xsl:apply-templates/></xsl:copy></xsl:template><xsl:template match="record"><xsl:copy><xsl:apply-templates/></xsl:copy></xsl:template><xsl:template match="LastName"><newLastName><xsl:value-of select="."/></newLastName></xsl:template><xsl:template match="*"><xsl:copy><xsl:apply-templates/></xsl:copy></xsl:template></xsl:stylesheet> |
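The flat output format (a root element, one element per record, one child element per field) can be produced with the standard StAX writer. The sketch below illustrates that format only; the class FlatXmlWriterSketch is hypothetical, and the driver's own writer may differ in details such as the XML declaration and indentation.

```java
import java.io.StringWriter;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical illustration; not taken from the driver's source.
final class FlatXmlWriterSketch {

    /** Writes the records as <root><record><field>value</field>...</record>...</root>. */
    static String write(String rootName, String recordName, List<Map<String, String>> records) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartElement(rootName);         // xmlWriter_RootName
            for (Map<String, String> record : records) {
                xml.writeStartElement(recordName);   // xmlWriter_RecordName
                for (Map.Entry<String, String> field : record.entrySet()) {
                    xml.writeStartElement(field.getKey());
                    xml.writeCharacters(field.getValue()); // escapes &, < and >
                    xml.writeEndElement();
                }
                xml.writeEndElement();
            }
            xml.writeEndElement();
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write XML", e);
        }
        return out.toString();
    }
}
```

With the example values root and record, a single record with a LastName field is written as a root element containing one record element with one LastName child.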
XLSFileWriter
The XLSFileWriter writes all records into one sheet (tab) of an XLS file.
Option | Description | Example value |
xlsWriter_SheetName | XLSWriter: Sheet/tab name. | Sheet1 |
xlsWriter_AddHeader | XLSWriter: Add a header row (true/false) | true |
Appendix A: Driver Config with all options and example data.
<?xml version="1.0" encoding="UTF-8"?>
<driver-config name="GenFileDriver">
  <driver-options>
    <schema display-name="Field Names (Field1,Field2,Field3)" id="100">LastName,FirstName,Title,Email,WorkPhone,Fax,WirelessPhone,Description</schema>
    <objectClass display-name="Object Class Name" id="101">User</objectClass>
  </driver-options>
  <publisher-options>
    <pub_fileLocator display-name="Publisher: File Locator Strategy (Class):" id="124">info.vancauwenberge.filedriver.filelocator.RegExpFileLocator</pub_fileLocator>
    <pub_FileSorter display-name="Publisher: File Sorter Strategy (Class):" id="125">info.vancauwenberge.filedriver.filesorter.FilePropertySorter</pub_FileSorter>
    <pub_fileReader display-name="Publisher: File Reader Strategy (Class):" id="126">info.vancauwenberge.filedriver.filereader.csv.CSVFileReader</pub_fileReader>
    <pub_pollingInterval display-name="Publisher: Polling interval (seconds):" id="127">5</pub_pollingInterval>
    <pub_metaData display-name="Publisher: List of meta data elements that should be added(recordNumber,isLastRecord,filePath,fileName,fileSize):" id="128">recordNumber,isLastRecord</pub_metaData>
    <pub_heartbeatInterval display-name="Publisher: Heartbeat interval (minutes):" id="129">1</pub_heartbeatInterval>
    <pub_workDir display-name="Publisher: Temporary work folder:" id="130">/home/nlv10194/filedriver/temp</pub_workDir>
    <regExp-sourceFolder display-name="RegExpFileLocator: Source folder for files:" id="131">/home/nlv10194/filedriver</regExp-sourceFolder>
    <regExp-regExp display-name="RegExpFileLocator: Regular expression for finding files.Use (?i) at start to make case ignore regexp:" id="132">(?i).*\.csv</regExp-regExp>
    <fileSort_SortMethod display-name="FileSorter: File sort methode:" id="133">getName</fileSort_SortMethod>
    <fileSort_SortOrderAsc display-name="FileSorter: Sort ascending (true/false):" id="134">true</fileSort_SortOrderAsc>
    <xmlReader_useTagNames display-name="XMLFileReader: Use the tag names from the XML document (true) or use the driver schema given (false):" id="135">true</xmlReader_useTagNames>
    <xmlReader_preXslt display-name="XMLFileReader: xsl to apply prior to processing the XML file (leave empty for no pre-processing):" id="136">&lt;?xml version="1.0" encoding="UTF-8"?&gt;&lt;xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format"&gt;&lt;xsl:output method="xml" indent="yes"/&gt;&lt;xsl:template match="root"&gt;&lt;xsl:copy&gt;&lt;xsl:apply-templates/&gt;&lt;/xsl:copy&gt;&lt;/xsl:template&gt;&lt;xsl:template match="aRecord"&gt;&lt;xsl:copy&gt;&lt;xsl:apply-templates/&gt;&lt;/xsl:copy&gt;&lt;/xsl:template&gt;&lt;xsl:template match="schema"&gt;&lt;xsl:copy&gt;&lt;xsl:apply-templates/&gt;&lt;/xsl:copy&gt;&lt;/xsl:template&gt;&lt;xsl:template match="objectClass"&gt;&lt;newObjectClass&gt;&lt;xsl:value-of select="."/&gt;&lt;/newObjectClass&gt;&lt;/xsl:template&gt;&lt;/xsl:stylesheet&gt;</xmlReader_preXslt>
    <xmlReader_forcedEncoding display-name="XMLFileReader: forced enoding of the xml file. Leave blank to use the encoding attribute inside the xml file (or system default when attribute not given)." id="137"></xmlReader_forcedEncoding>
    <csvReader_skipEmptyLines display-name="CSVReader: skip empty lines (true/false)" id="138">true</csvReader_skipEmptyLines>
    <csvReader_UseHeaderNames display-name="CSVReader: use the header names (true) or the driver schema given (false) " id="139">true</csvReader_UseHeaderNames>
    <csvReader_hasHeader display-name="CSVReader: has the CSV file a header (true/false)" id="140">true</csvReader_hasHeader>
    <csvReader_forcedEncoding display-name="CSVReader: forced enoding of the xml file. Leave blank to use system default encoding." id="141">UTF-8</csvReader_forcedEncoding>
    <csvReader_seperator display-name="CSVReader: field seperator" id="142">,</csvReader_seperator>
    <xlsReader_SheetName display-name="XLSReader: Sheet name:" id="143">Sheet1</xlsReader_SheetName>
    <xlsReader_HasHeader display-name="XLSReader: has the XLS sheet a header (true/false):" id="144">true</xlsReader_HasHeader>
    <xlsReader_UseHeaderNames display-name="XLSReader: use the header names -if any- (true) or the driver schema given (false):" id="145">true</xlsReader_UseHeaderNames>
  </publisher-options>
  <subscriber-options>
    <sub_FileStartStrategy display-name="Subscriber: File Start Strategy (Class):" id="102">info.vancauwenberge.filedriver.filestart.BasicNewFileDecider</sub_FileStartStrategy>
    <sub_FileNameStrategy display-name="Subscriber: File Name Strategy (Class):" id="103">info.vancauwenberge.filedriver.filename.SimpleDateFormatFileNameStrategy</sub_FileNameStrategy>
    <sub_FileWriteStrategy display-name="Subscriber: File Write Strategy (Class):" id="104">info.vancauwenberge.filedriver.filewriter.XMLFileWriter</sub_FileWriteStrategy>
    <sub_WorkDir display-name="Subscriber: Work directory:" id="105">C:\workspace\DirXMLFileDriver\temp</sub_WorkDir>
    <sub_DestFolder display-name="Subscriber: Destination directory:" id="106">C:\workspace\DirXMLFileDriver</sub_DestFolder>
    <newFile_MaxRecords display-name="NewFile: Maximum number of data records in a file:" id="107">100</newFile_MaxRecords>
    <newFile_MaxFileAge display-name="NewFile: Maximum 'age' of a file (in seconds):" id="108">60</newFile_MaxFileAge>
    <newFile_InactiveSaveInterval display-name="NewFile: Save file after nnn seconds of inactivity:" id="109">0</newFile_InactiveSaveInterval>
    <newFile_FieldName display-name="NewFile: Fieldname used to manully control the creation of new files:" id="110"></newFile_FieldName>
    <csvWriter_WriteHeader display-name="CSVWriter: Write a header record(true/false):" id="111">true</csvWriter_WriteHeader>
    <csvWriter_Seperator display-name="CSVWriter: Seperator character:" id="112">,</csvWriter_Seperator>
    <csvWriter_ForcedEncoding display-name="CSVWriter: Forced file encoding. Leave blank to use system default encoding.:" id="113">UTF-8</csvWriter_ForcedEncoding>
    <xmlWriter_RootName display-name="XMLWriter: Root element name.:" id="114">root</xmlWriter_RootName>
    <xmlWriter_RecordName display-name="XMLWriter: Record element name:" id="115">record</xmlWriter_RecordName>
    <xmlWriter_ForcedEncoding display-name="XMLWriter: Forced file encoding. Leave blank to use system default encoding.:" id="116">UTF-8</xmlWriter_ForcedEncoding>
    <xmlWriter_PostXSL display-name="XMLWriter: Xsl to apply after saving the XML file (leave empty for no postprocessing):" id="117">&lt;?xml version="1.0" encoding="UTF-8"?&gt;&lt;xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format"&gt;&lt;xsl:output method="xml" indent="yes"/&gt;&lt;xsl:template match="root"&gt;&lt;xsl:copy&gt;&lt;xsl:apply-templates/&gt;&lt;/xsl:copy&gt;&lt;/xsl:template&gt;&lt;xsl:template match="record"&gt;&lt;xsl:copy&gt;&lt;xsl:apply-templates/&gt;&lt;/xsl:copy&gt;&lt;/xsl:template&gt;&lt;xsl:template match="LastName"&gt;&lt;newLastName&gt;&lt;xsl:value-of select="."/&gt;&lt;/newLastName&gt;&lt;/xsl:template&gt;&lt;xsl:template match="*"&gt;&lt;xsl:copy&gt;&lt;xsl:apply-templates/&gt;&lt;/xsl:copy&gt;&lt;/xsl:template&gt;&lt;/xsl:stylesheet&gt;</xmlWriter_PostXSL>
    <xlsWriter_SheetName display-name="XLSWriter: Sheet name:" id="118">Sheet1</xlsWriter_SheetName>
    <xlsWriter_AddHeader display-name="XLSWriter: Add a header row (true/false):" id="119">true</xlsWriter_AddHeader>
    <GUIDNamer_FilePostFix display-name="GUID FileNamer: Optional postfix for the filename (eg: file extention:" id="120">.csv.out</GUIDNamer_FilePostFix>
    <GUIDNamer_FilePreFix display-name="GUID FileNamer: Optional prefix for the filename:" id="121">NEW_FILE_</GUIDNamer_FilePreFix>
    <simpleDateNamer_FormatString display-name="Date FileNamer: Date format string(according to jave.util.SimpleDateFormat):" id="122">'NEW'_'FILE'_yyyyMMdd-HHmmssSSS.'out'</simpleDateNamer_FormatString>
    <simpleDateNamer_TimeZone display-name="GUID FileNamer: Timezone for the dataformatter. Leave blank to use system default timezone.:" id="123"></simpleDateNamer_TimeZone>
  </subscriber-options>
</driver-config>