Novell exteNd
Director 5.2 API

com.sssw.search.api
Interface EbiDataFetcherDelegate

All Superinterfaces:
EbiDelegate

public interface EbiDataFetcherDelegate
extends EbiDelegate

Objects that implement this interface represent Data Fetchers, whose job is to obtain data from various sources and bring it into the Query Engine. Once the data is inside the Query Engine, it is indexed and ready to be used for querying.


Fields inherited from interface com.sssw.fw.api.EbiDelegate
SERVICE_LOCAL, SERVICE_REMOTE
 
Method Summary
 void clear()
          Clears any values set by the caller and resets them to the defaults.
 void clearIndexInfo()
          Clears out any indexing progress information maintained by the object.
 int fetchData(EbiContext context, boolean writeContentToOutput)
          Fetches document data from the source repository into the destination query engine repository.
 String getDebugSummary(boolean includeSettings, boolean includeIndexInfo)
          Provides a nicely formatted summary of the fetcher object's current settings and/or internal state information.
 String getDescription()
          Gets the description of the fetcher.
 String getDestRepository()
          Gets the name of the destination repository for the fetcher to import the data into.
 String getFetcherName()
          Gets the name of the fetcher.
 int getFirstIndexID()
          Gets the ID of the first index involved in indexing the documents.
 String getHost()
          Gets the name (or IP address) of the query engine host.
 int getIndexerStatus(int indexID)
          Gets the status value for a specific index.
 int getIndexPort()
          Gets the index port of the query engine.
 String getKillDupsMode()
          Gets the "kill duplicates" mode value.
 int getLastIndexID()
          Gets the ID of the last index involved in indexing the documents.
 int getMaximumDocsInBuffer()
          Gets the number of documents per indexing buffer.
 String getMultiValueDelim()
          Gets the delimiter for multi-valued extension metadata fields.
 String getOutputLineDelim()
          Gets the output line delimiter sequence for the output stream, if any is specified for progress indication messages.
 OutputStream getOutputStream()
          Gets the output stream, if any is set into the fetcher object for progress indication messages.
 EbiQuery getQuery()
          Gets the query that the fetcher is to use when extracting documents from the source repository and importing/indexing them into the query engine.
 int getQueryPort()
          Gets the query port of the query engine.
 String getSourceRepository()
          Gets the name of the source repository for the fetcher to get the data from.
 String getStatusMessage(int status)
          Gets the error message for a specific operation status code.
 int importAndIndex(EbiContext context, Collection documents, boolean writeContentToOutput)
          Imports and indexes a collection of documents into the destination query engine repository.
 void importDocument(EbiContext context, StringBuffer docBuf, EbiFrameworkElement document, boolean writeContentToOutput)
          Imports and indexes a single document into the destination query engine repository.
 boolean mustCheckAccess()
          Tells whether only accessible (securitywise) documents should be extracted from the source repository and imported/indexed into the destination query engine repository.
 boolean mustProcessContent()
          Tells whether documents' contents are to be indexed into the query engine.
 boolean mustProcessExtnMeta()
          Tells whether documents' extension metadata field values are to be indexed into the query engine.
 boolean mustProcessMeta()
          Tells whether documents' standard metadata field values are to be indexed into the query engine.
 boolean mustStoreContent()
          Tells whether documents' contents are to be stored in the query engine when they are imported/indexed.
 boolean mustUseCurrentDate()
          Tells whether the current date/time should be used for the value of the "date" property for documents indexed into the query engine.
 boolean mustUseDocidAsRef()
          Tells whether the fetcher must use the document ID property value as the "reference" to the document in the query engine.
 void setCheckAccess(boolean checkAccess)
          Specifies whether the fetcher should perform security checking to filter out the documents inaccessible to whoever is running the fetcher.
 void setDescription(String description)
          Sets the description for this Data Fetcher.
 void setDestRepository(String destRepository)
          Sets the name of the destination repository in the Query Engine.
 void setFetcherName(String fetcherName)
          Sets the name for this Data Fetcher.
 void setHost(String host)
          Sets the destination Query Engine host to use.
 void setIndexPort(int indexPort)
          Sets the destination Query Engine index port to use.
 void setKillDupsMode(String killDupsMode)
          Specifies how to deal with duplicate documents that are already in the Query Engine.
 void setLog(EbiLog log)
          Specifies which log to use as the fetcher is executing.
 void setMaximumDocsInBuffer(int maxDocsInBuffer)
          Specifies the maximum number of documents per buffer when importing and indexing.
 void setMultiValueDelim(String multiValueDelim)
          Specifies the delimiter for importing multi-valued extension metadata fields.
 void setOutputLineDelim(String delim)
          Specifies the line delimiter for writing feedback messages into the output stream supplied via the 'setOutputStream' method.
 void setOutputStream(OutputStream out)
          Specifies an output stream for displaying feedback from the Data Fetcher.
 void setProcessContent(boolean processContent)
          Specifies whether the document content data is to be imported and indexed (recommended).
 void setProcessExtnMeta(boolean processExtnMeta)
          Specifies whether the document extension metadata is to be imported and indexed (recommended).
 void setProcessMeta(boolean processMeta)
          Specifies whether the standard document metadata is to be imported and indexed (recommended).
 void setQuery(EbiContext context, String query)
          Specifies the query to perform in order to extract document data from the source repository.
 void setQuery(EbiQuery query)
          Specifies the query to perform in order to extract document data from the source repository.
 void setQueryPort(int queryPort)
          Sets the destination Query Engine query port to use.
 void setSourceRepository(String sourceRepository)
          Sets the name of the source repository to extract the data from.
 void setStoreContent(boolean storeContent)
          Specifies whether the document content data is to be stored inside the Query Engine.
 void setUseCurrentDate(boolean useCurrentDate)
          Specifies whether the current date is to be used when filling in the creation date field in the Query Engine.
 void setUseDocidAsRef(boolean useDocidAsRef)
          Specifies if the document ID is to be used as document reference.
 
Methods implemented from interface com.sssw.fw.api.EbiDelegate
getName
 

Method Detail

setFetcherName

public void setFetcherName(String fetcherName)
Sets the name for this Data Fetcher.
Parameters:
fetcherName - the name

setDescription

public void setDescription(String description)
Sets the description for this Data Fetcher.
Parameters:
description - description

setHost

public void setHost(String host)
Sets the destination Query Engine host to use.
Parameters:
host - the name or IP address of the host to use

setQueryPort

public void setQueryPort(int queryPort)
Sets the destination Query Engine query port to use.
Parameters:
queryPort - the query port to use

setIndexPort

public void setIndexPort(int indexPort)
Sets the destination Query Engine index port to use.
Parameters:
indexPort - the index port to use

setSourceRepository

public void setSourceRepository(String sourceRepository)
Sets the name of the source repository to extract the data from.
Parameters:
sourceRepository - the name of the source repository

setDestRepository

public void setDestRepository(String destRepository)
Sets the name of the destination repository in the Query Engine.
Parameters:
destRepository - the name of the destination repository

setUseDocidAsRef

public void setUseDocidAsRef(boolean useDocidAsRef)
Specifies if the document ID is to be used as document reference.
Parameters:
useDocidAsRef - if true, document ID is used as document reference, otherwise the document's URL is used for reference; by default, the document's URL is used for reference

setUseCurrentDate

public void setUseCurrentDate(boolean useCurrentDate)
Specifies whether the current date is to be used when filling in the creation date field in the Query Engine.
Parameters:
useCurrentDate - if true, the current date is used, otherwise the date supplied by the source repository; by default, the date supplied by the source repository is used

setStoreContent

public void setStoreContent(boolean storeContent)
Specifies whether the document content data is to be stored inside the Query Engine. This option may be useful for testing and debugging purposes and possibly for extracting document data and backing it up. However, it is not recommended for a production environment due to the overhead of storing the same data in two places, the source repository and the destination Query Engine repository.
Parameters:
storeContent - if true, content is stored in the Query Engine, if false, it is not

setProcessMeta

public void setProcessMeta(boolean processMeta)
Specifies whether the standard document metadata is to be imported and indexed (recommended). By default, the document metadata is processed.
Parameters:
processMeta - if true, the document metadata is imported and indexed

setProcessExtnMeta

public void setProcessExtnMeta(boolean processExtnMeta)
Specifies whether the document extension metadata is to be imported and indexed (recommended). By default, the document extension metadata is processed.
Parameters:
processExtnMeta - if true, the document extension metadata is imported and indexed

setProcessContent

public void setProcessContent(boolean processContent)
Specifies whether the document content data is to be imported and indexed (recommended). By default, the document content data is imported, indexed, and then dropped by the Query Engine, unless it is told to store the content.
Parameters:
processContent - if true, the document content data is imported and indexed

setMultiValueDelim

public void setMultiValueDelim(String multiValueDelim)
Specifies the delimiter for importing multi-valued extension metadata fields. By default, the '|' character is used.
Parameters:
multiValueDelim - the character or sequence of characters to use for importing multi-valued extension metadata fields

setMaximumDocsInBuffer

public void setMaximumDocsInBuffer(int maxDocsInBuffer)
Specifies the maximum number of documents per buffer when importing and indexing. By default, the value of 100 is used.
Parameters:
maxDocsInBuffer - the maximum number of documents per buffer

setKillDupsMode

public void setKillDupsMode(String killDupsMode)
Specifies how to deal with duplicate documents that are already in the Query Engine.
Parameters:
killDupsMode - the kill duplicates mode specifier
See Also:
EbiDataFetcherDelegate.getKillDupsMode()

setCheckAccess

public void setCheckAccess(boolean checkAccess)
Specifies whether the fetcher should perform security checking to filter out the documents inaccessible to whoever is running the fetcher.
Parameters:
checkAccess - if true, do security filtering

setQuery

public void setQuery(EbiQuery query)
Specifies the query to perform in order to extract document data from the source repository.
Parameters:
query - the query to run, if null, all the data is extracted from the source repository

setQuery

public void setQuery(EbiContext context,
                     String query)
              throws EboUnrecoverableSystemException
Specifies the query to perform in order to extract document data from the source repository.
Parameters:
context - context
query - the query to run, if null, all the data is extracted from the source repository

setOutputStream

public void setOutputStream(OutputStream out)
Specifies an output stream for displaying feedback from the Data Fetcher.
Parameters:
out - the output stream

setOutputLineDelim

public void setOutputLineDelim(String delim)
Specifies the line delimiter for writing feedback messages into the output stream supplied via the 'setOutputStream' method.
Parameters:
delim - the delimiter to use

setLog

public void setLog(EbiLog log)
Specifies which log to use as the fetcher is executing.
Parameters:
log - the log to use

getFetcherName

public String getFetcherName()
Gets the name of the fetcher.
Returns:
the fetcher name

getDescription

public String getDescription()
Gets the description of the fetcher.
Returns:
the fetcher's description

getHost

public String getHost()
Gets the name (or IP address) of the query engine host.
Returns:
the host name

getQueryPort

public int getQueryPort()
Gets the query port of the query engine.
Returns:
the query port

getIndexPort

public int getIndexPort()
Gets the index port of the query engine.
Returns:
the index port

getSourceRepository

public String getSourceRepository()
Gets the name of the source repository for the fetcher to get the data from.
Returns:
the source repository name

getDestRepository

public String getDestRepository()
Gets the name of the destination repository for the fetcher to import the data into.
Returns:
the destination repository name

mustUseDocidAsRef

public boolean mustUseDocidAsRef()
Tells whether the fetcher must use the document ID property value as the "reference" to the document in the query engine. By default, the document's URL is used as its "reference" (recommended).
Returns:
true if the fetcher is set up to use the document ID property value as the "reference" to the document in the query engine, false if the document's URL is to be used

mustUseCurrentDate

public boolean mustUseCurrentDate()
Tells whether the current date/time should be used for the value of the "date" property for documents indexed into the query engine. By default, the documents' own creation date property is used (recommended).
Returns:
true if the current date/time should be used for the value of the "date" property for documents indexed into the query engine, false if the documents' own creation date property is to be used

mustStoreContent

public boolean mustStoreContent()
Tells whether documents' contents are to be stored in the query engine when they are imported/indexed. By default, the contents are not stored in the query engine for efficiency reasons. One may choose to store the contents for debugging or backup purposes, and more importantly, for the generation of quick document summaries.
Returns:
true if documents' contents are to be stored in the query engine when they are imported/indexed, false if not

mustProcessMeta

public boolean mustProcessMeta()
Tells whether documents' standard metadata field values are to be indexed into the query engine. By default, the standard metadata is indexed.
Returns:
true if documents' standard metadata field values are to be indexed into the query engine, false if not

mustProcessExtnMeta

public boolean mustProcessExtnMeta()
Tells whether documents' extension metadata field values are to be indexed into the query engine. By default, the extension metadata is indexed.
Returns:
true if documents' extension metadata field values are to be indexed into the query engine

mustProcessContent

public boolean mustProcessContent()
Tells whether documents' contents are to be indexed into the query engine. By default, the contents are indexed.
Returns:
true if documents' contents are to be indexed into the query engine, false if not

getMultiValueDelim

public String getMultiValueDelim()
Gets the delimiter for multi-valued extension metadata fields. Multiple values are appended together using the delimiter, then indexed into the query engine as a single value. By default, the vertical bar '|' character is used as the delimiter.
Returns:
the delimiter for multi-valued extension metadata fields
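
The joining behavior described above can be illustrated with plain Java. This is only a sketch of the idea, not the fetcher's actual implementation; the class name, method name, and sample values are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: shows how several values of one extension metadata
// field could be appended together with a delimiter before indexing.
class MultiValueJoinSketch {

    // Joins the values of a multi-valued field into a single string.
    static String joinValues(List<String> values, String delim) {
        return String.join(delim, values);
    }

    public static void main(String[] args) {
        List<String> keywords = Arrays.asList("portal", "search", "workflow");
        // '|' is the default delimiter named in the description above.
        System.out.println(joinValues(keywords, "|")); // prints portal|search|workflow
    }
}
```

A delimiter that cannot occur inside the field values themselves is the safer choice, since the joined string is indexed as a single value.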

getMaximumDocsInBuffer

public int getMaximumDocsInBuffer()
Gets the number of documents per indexing buffer. Groups of documents are indexed iteratively, one batch at a time. This number is the size of the batch. The default value is 100 documents.
Returns:
the maximum number of documents per indexing buffer
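
To make the batching described above concrete, the following sketch splits a list of documents into groups no larger than the buffer size. It only illustrates the arithmetic; the class and method names are hypothetical, and how the fetcher actually batches documents internally is not documented here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: partitions documents into batches of at most
// maxDocsInBuffer items, mirroring the "one batch at a time" description.
class BatchSketch {

    static <T> List<List<T>> batches(List<T> docs, int maxDocsInBuffer) {
        if (maxDocsInBuffer <= 0) {
            throw new IllegalArgumentException("maxDocsInBuffer must be positive");
        }
        List<List<T>> result = new ArrayList<>();
        for (int start = 0; start < docs.size(); start += maxDocsInBuffer) {
            int end = Math.min(start + maxDocsInBuffer, docs.size());
            // Copy the slice so each batch is independent of the source list.
            result.add(new ArrayList<>(docs.subList(start, end)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            docs.add(i);
        }
        // With the default of 100, 250 documents fall into batches of 100, 100, and 50.
        for (List<Integer> batch : batches(docs, 100)) {
            System.out.println(batch.size());
        }
    }
}
```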

getKillDupsMode

public String getKillDupsMode()
Gets the "kill duplicates" mode value. The mode specifies how the query engine should identify and remove duplicate documents as the indexing process is going on. The following values are available for the Autonomy-based implementation:
Returns:
the "kill duplicates" mode value

mustCheckAccess

public boolean mustCheckAccess()
Tells whether only accessible (securitywise) documents should be extracted from the source repository and imported/indexed into the destination query engine repository.
Returns:
true if must check access to documents, false otherwise

getQuery

public EbiQuery getQuery()
Gets the query that the fetcher is to use when extracting documents from the source repository and importing/indexing them into the query engine. This query defines the scope of the fetch operation. If no query is specified, then all the documents in the source repository are fetched.
Returns:
the scope query

getOutputStream

public OutputStream getOutputStream()
Gets the output stream, if any is set into the fetcher object for progress indication messages.
Returns:
the output stream for progress indication messages

getOutputLineDelim

public String getOutputLineDelim()
Gets the output line delimiter sequence for the output stream, if any is specified for progress indication messages, e.g. "\n" or "<br>" for a JSP. By default, the newline character is used.
Returns:
the output stream line delimiter
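
The role of the line delimiter can be sketched as follows: each progress message is written to the output stream followed by the delimiter. This is an illustration only; the class, its methods, and the message texts are hypothetical, and an HTML line-break tag is assumed for the JSP case.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: writes progress messages to a stream, appending the
// configured line delimiter after each one.
class ProgressWriterSketch {

    static void writeProgress(OutputStream out, String delim, String message) {
        if (out == null) {
            return; // no output stream set, so nothing is reported
        }
        try {
            out.write((message + delim).getBytes(StandardCharsets.UTF_8));
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Renders several messages into a string, as a page or console would see them.
    static String render(String delim, String... messages) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        for (String message : messages) {
            writeProgress(buf, delim, message);
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Console-style output uses the newline default.
        System.out.print(render("\n", "Fetched 100 documents", "Indexing complete"));
        // Output embedded in an HTML page would use a line-break tag instead.
        System.out.println(render("<br>", "Fetched 100 documents", "Indexing complete"));
    }
}
```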

fetchData

public int fetchData(EbiContext context,
                     boolean writeContentToOutput)
              throws EboUnrecoverableSystemException,
                     EboSecurityException
Fetches document data from the source repository into the destination query engine repository.
Parameters:
context - context
writeContentToOutput - tells whether document content is to be written into the progress indication output stream (if specified) as the fetcher is executing (may be useful for debugging purposes)
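
A fetcher is normally configured through its setters before fetchData is called. The sketch below shows one plausible setup sequence. Because the real interface and an EbiContext are not available outside the product, it uses a hypothetical stand-in interface (FetcherSettings) that copies a few setter signatures from this page, plus a recording implementation; the host, port, and repository values are placeholders, not documented defaults.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in mirroring a few setters documented on this page.
// The real type is com.sssw.search.api.EbiDataFetcherDelegate.
interface FetcherSettings {
    void setHost(String host);
    void setQueryPort(int queryPort);
    void setIndexPort(int indexPort);
    void setSourceRepository(String sourceRepository);
    void setDestRepository(String destRepository);
    void setMaximumDocsInBuffer(int maxDocsInBuffer);
}

// Records each setting so the example runs without a Query Engine.
class RecordingFetcher implements FetcherSettings {
    final Map<String, Object> settings = new LinkedHashMap<>();

    public void setHost(String host) { settings.put("host", host); }
    public void setQueryPort(int queryPort) { settings.put("queryPort", queryPort); }
    public void setIndexPort(int indexPort) { settings.put("indexPort", indexPort); }
    public void setSourceRepository(String name) { settings.put("sourceRepository", name); }
    public void setDestRepository(String name) { settings.put("destRepository", name); }
    public void setMaximumDocsInBuffer(int max) { settings.put("maxDocsInBuffer", max); }
}

class FetcherSetupSketch {

    // Applies placeholder values; substitute those of the actual installation.
    static void configure(FetcherSettings fetcher) {
        fetcher.setHost("localhost");             // placeholder host
        fetcher.setQueryPort(9000);               // placeholder port
        fetcher.setIndexPort(9001);               // placeholder port
        fetcher.setSourceRepository("SourceRepo"); // placeholder name
        fetcher.setDestRepository("DestRepo");     // placeholder name
        fetcher.setMaximumDocsInBuffer(100);      // documented default batch size
    }

    public static void main(String[] args) {
        RecordingFetcher fetcher = new RecordingFetcher();
        configure(fetcher);
        System.out.println(fetcher.settings);
        // With the real delegate, the next step would be
        // fetchData(context, false), which requires an EbiContext.
    }
}
```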

importAndIndex

public int importAndIndex(EbiContext context,
                          Collection documents,
                          boolean writeContentToOutput)
                   throws EboUnrecoverableSystemException,
                          EboSecurityException
Imports and indexes a collection of documents into the destination query engine repository.
Parameters:
context - context
documents - the documents to import/index
writeContentToOutput - tells whether document content is to be written into the progress indication output stream (if specified) as the fetcher is executing (may be useful for debugging purposes)
Returns:
the number of processed documents

importDocument

public void importDocument(EbiContext context,
                           StringBuffer docBuf,
                           EbiFrameworkElement document,
                           boolean writeContentToOutput)
                    throws EboUnrecoverableSystemException,
                           EboSecurityException
Imports and indexes a single document into the destination query engine repository.
Parameters:
context - context
docBuf - a string buffer to use for importing/indexing the document
document - the document object
writeContentToOutput - tells whether document content is to be written into the progress indication output stream (if specified) as the fetcher is executing (may be useful for debugging purposes)
See Also:
EbiDocument

getIndexerStatus

public int getIndexerStatus(int indexID)
                     throws EboUnrecoverableSystemException
Gets the status value for a specific index.
Parameters:
indexID - the ID of the index
Returns:
the status value
See Also:
"the QE_* constants on com.sssw.search.api.EbiQueryEngine", EbiDataFetcherDelegate.getFirstIndexID(), EbiDataFetcherDelegate.getLastIndexID(), EbiDataFetcherDelegate.getStatusMessage(int)

getStatusMessage

public String getStatusMessage(int status)
                        throws EboUnrecoverableSystemException
Gets the error message for a specific operation status code.
Parameters:
status - the status value
Returns:
the error message for the status code
See Also:
"the QE_* constants on com.sssw.search.api.EbiQueryEngine", EbiDataFetcherDelegate.getIndexerStatus(int)

getFirstIndexID

public int getFirstIndexID()
Gets the ID of the first index involved in indexing the documents.
Returns:
the ID of the first index
See Also:
EbiDataFetcherDelegate.getIndexerStatus(int), EbiDataFetcherDelegate.getLastIndexID()

getLastIndexID

public int getLastIndexID()
Gets the ID of the last index involved in indexing the documents.
Returns:
the ID of the last index
See Also:
EbiDataFetcherDelegate.getIndexerStatus(int), EbiDataFetcherDelegate.getFirstIndexID()
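
The four status-related methods above are cross-referenced and can be combined to report on an indexing run. The sketch below assumes that index IDs form a contiguous range from getFirstIndexID() to getLastIndexID(), inclusive, which this page does not state explicitly. IndexStatusSource is a hypothetical stand-in for the real delegate (checked exceptions omitted), and the status codes and messages in the fake implementation are invented for illustration; the real codes are the QE_* constants on com.sssw.search.api.EbiQueryEngine.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the status-related methods documented above.
interface IndexStatusSource {
    int getFirstIndexID();
    int getLastIndexID();
    int getIndexerStatus(int indexID);
    String getStatusMessage(int status);
}

// Fake implementation with invented IDs, status codes, and messages.
class FakeStatusSource implements IndexStatusSource {
    public int getFirstIndexID() { return 7; }
    public int getLastIndexID() { return 9; }
    public int getIndexerStatus(int indexID) { return indexID == 8 ? 1 : 0; }
    public String getStatusMessage(int status) { return status == 0 ? "ok" : "failed"; }
}

class IndexStatusSketch {

    // Builds one report line per index ID in the assumed contiguous range.
    static List<String> report(IndexStatusSource fetcher) {
        List<String> lines = new ArrayList<>();
        for (int id = fetcher.getFirstIndexID(); id <= fetcher.getLastIndexID(); id++) {
            int status = fetcher.getIndexerStatus(id);
            lines.add("index " + id + ": " + status + " (" + fetcher.getStatusMessage(status) + ")");
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : report(new FakeStatusSource())) {
            System.out.println(line);
        }
    }
}
```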

clearIndexInfo

public void clearIndexInfo()
Clears out any indexing progress information maintained by the object. This is recommended for reusing the fetcher object.

clear

public void clear()
Clears any values set by the caller and resets them to the defaults. Also clears out any state information maintained by the object. This is recommended for reusing the fetcher object.

getDebugSummary

public String getDebugSummary(boolean includeSettings,
                              boolean includeIndexInfo)
Provides a nicely formatted summary of the fetcher object's current settings and/or internal state information. Useful for debugging purposes.
Parameters:
includeSettings - include all the fetcher settings
includeIndexInfo - include the internal state information on the indexing process
Returns:
the debug summary
