Novell exteNd
Director 5.2 API

com.sssw.search.api
Interface EbiDataFetcher


public interface EbiDataFetcher

Interface for Data Fetchers whose job is to obtain data from various sources and bring it into the Query Engine. Once inside the Query Engine, the data is indexed and ready to be used for querying.


Method Summary
 void clear()
          Clears any values set by the caller and resets them to the defaults.
 void clearIndexInfo()
          Clears out any indexing progress information maintained by the object.
 int fetchData(EbiContext context, boolean writeContentToOutput)
          Fetches document data from the source repository into the destination query engine repository.
 String getDebugSummary(boolean includeSettings, boolean includeIndexInfo)
          Provides a nicely formatted summary of the fetcher object's current settings and/or internal state information.
 String getDescription()
          Gets the description of the fetcher.
 String getDestRepository()
          Gets the name of the destination repository into which the fetcher imports data.
 String getFetcherName()
          Gets the name of the fetcher.
 int getFirstIndexID()
          Gets the ID of the first index involved in indexing the documents.
 String getHost()
          Gets the name (or IP address) of the query engine host.
 int getIndexerStatus(int indexID)
          Gets the status value for a specific index.
 int getIndexPort()
          Gets the index port of the query engine.
 String getKillDupsMode()
          Gets the "kill duplicates" mode value.
 int getLastIndexID()
          Gets the ID of the last index involved in indexing the documents.
 int getMaximumDocsInBuffer()
          Gets the number of documents per indexing buffer.
 String getMultiValueDelim()
          Gets the delimiter for multi-valued extension metadata fields.
 String getName()
          Gets the service name.
 String getOutputLineDelim()
          Gets the output line delimiter sequence for the output stream, if any is specified for progress indication messages.
 OutputStream getOutputStream()
          Gets the output stream, if any is set into the fetcher object for progress indication messages.
 EbiQuery getQuery()
          Gets the query that the fetcher is to use when extracting documents from the source repository and importing/indexing them into the query engine.
 int getQueryPort()
          Gets the query port of the query engine.
 String getSourceRepository()
          Gets the name of the source repository from which the fetcher retrieves data.
 String getStatusMessage(int status)
          Gets the error message for a specific operation status code.
 int importAndIndex(EbiContext context, Collection documents, boolean writeContentToOutput)
          Imports and indexes a collection of documents into the destination query engine repository.
 void importDocument(EbiContext context, StringBuffer docBuf, EbiFrameworkElement document, boolean writeContentToOutput)
          Imports and indexes a single document into the destination query engine repository.
 boolean mustCheckAccess()
          Tells whether only accessible (securitywise) documents should be extracted from the source repository and imported/indexed into the destination query engine repository.
 boolean mustProcessContent()
          Tells whether documents' contents are to be indexed into the query engine.
 boolean mustProcessExtnMeta()
          Tells whether documents' extension metadata field values are to be indexed into the query engine.
 boolean mustProcessMeta()
          Tells whether documents' standard metadata field values are to be indexed into the query engine.
 boolean mustStoreContent()
          Tells whether documents' contents are to be stored in the query engine when they are imported/indexed.
 boolean mustUseCurrentDate()
          Tells whether the current date/time should be used for the value of the "date" property for documents indexed into the query engine.
 boolean mustUseDocidAsRef()
          Tells whether the fetcher must use the document ID property value as the "reference" to the document in the query engine.
 void setCheckAccess(boolean checkAccess)
          Specifies whether the fetcher should perform security checking to filter out the documents inaccessible to whoever is running the fetcher.
 void setDescription(String description)
          Sets the description for this Data Fetcher.
 void setDestRepository(String destRepository)
          Sets the name of the destination repository in the Query Engine.
 void setFetcherName(String fetcherName)
          Sets the name for this Data Fetcher.
 void setHost(String host)
          Sets the destination Query Engine host to use.
 void setIndexPort(int indexPort)
          Sets the destination Query Engine index port to use.
 void setKillDupsMode(String killDupsMode)
          Specifies how to deal with duplicate documents that are already in the Query Engine.
 void setLog(EbiLog log)
          Specifies which log to use as the fetcher is executing.
 void setMaximumDocsInBuffer(int maxDocsInBuffer)
          Specifies the maximum number of documents per buffer when importing and indexing.
 void setMultiValueDelim(String multiValueDelim)
          Specifies the delimiter for importing multi-valued extension metadata fields.
 void setOutputLineDelim(String delim)
          Specifies the line delimiter for writing feedback messages into the output stream supplied via the 'setOutputStream' method.
 void setOutputStream(OutputStream out)
          Specifies an output stream for displaying feedback from the Data Fetcher.
 void setProcessContent(boolean processContent)
          Specifies whether the document content data is to be imported and indexed (recommended).
 void setProcessExtnMeta(boolean processExtnMeta)
          Specifies whether the document extension metadata is to be imported and indexed (recommended).
 void setProcessMeta(boolean processMeta)
          Specifies whether the standard document metadata is to be imported and indexed (recommended).
 void setQuery(EbiContext context, String query)
          Specifies the query to perform in order to extract document data from the source repository.
 void setQuery(EbiQuery query)
          Specifies the query to perform in order to extract document data from the source repository.
 void setQueryPort(int queryPort)
          Sets the destination Query Engine query port to use.
 void setSourceRepository(String sourceRepository)
          Sets the name of the source repository from which to extract data.
 void setStoreContent(boolean storeContent)
          Specifies whether the document content data is to be stored inside the Query Engine.
 void setUseCurrentDate(boolean useCurrentDate)
          Specifies whether the current date is to be used when filling in the Query Engine's creation date field.
 void setUseDocidAsRef(boolean useDocidAsRef)
          Specifies whether the document ID is to be used as the document reference.
 

Method Detail

setFetcherName

public void setFetcherName(String fetcherName)
Sets the name for this Data Fetcher.
Parameters:
fetcherName - Name for the Data Fetcher.

setDescription

public void setDescription(String description)
Sets the description for this Data Fetcher.
Parameters:
description - Description for the Data Fetcher.

setHost

public void setHost(String host)
Sets the destination Query Engine host to use.
Parameters:
host - Host name or IP address to use.

setQueryPort

public void setQueryPort(int queryPort)
Sets the destination Query Engine query port to use.
Parameters:
queryPort - Query port to use.

setIndexPort

public void setIndexPort(int indexPort)
Sets the destination Query Engine index port to use.
Parameters:
indexPort - Index port to use.

setSourceRepository

public void setSourceRepository(String sourceRepository)
Sets the name of the source repository from which to extract data.
Parameters:
sourceRepository - Name of the source repository.

setDestRepository

public void setDestRepository(String destRepository)
Sets the name of the destination repository in the Query Engine.
Parameters:
destRepository - the name of the destination repository

setUseDocidAsRef

public void setUseDocidAsRef(boolean useDocidAsRef)
Specifies whether the document ID is to be used as the document reference.
Parameters:
useDocidAsRef - Flag that specifies the action. True = the document ID is used as the document reference; false = the document's URL is used as the reference. By default, the document's URL is used.

setUseCurrentDate

public void setUseCurrentDate(boolean useCurrentDate)
Specifies whether the current date is to be used when filling in the Query Engine's creation date field.
Parameters:
useCurrentDate - Flag that specifies the action. True = use the current date; false = use the date supplied by the source repository. By default, the date supplied by the source repository is used.

setStoreContent

public void setStoreContent(boolean storeContent)
Specifies whether the document content data is to be stored inside the Query Engine. This option may be useful for testing and debugging purposes, and possibly for extracting document data and backing it up. However, it is not recommended for a production environment due to the overhead of storing the same data in two places: the source repository and the destination Query Engine repository.
Parameters:
storeContent - Flag that specifies the action. True = content is stored in the Query Engine; false = content is not stored in the Query Engine.

setProcessMeta

public void setProcessMeta(boolean processMeta)
Specifies whether the standard document metadata is to be imported and indexed (recommended). By default, the standard document metadata is processed.
Parameters:
processMeta - Flag that specifies the action. True = standard document metadata is imported and indexed.

setProcessExtnMeta

public void setProcessExtnMeta(boolean processExtnMeta)
Specifies whether the document extension metadata is to be imported and indexed (recommended). By default, the document extension metadata is processed.
Parameters:
processExtnMeta - Flag that specifies the action. True = document extension metadata is imported and indexed.

setProcessContent

public void setProcessContent(boolean processContent)
Specifies whether the document content data is to be imported and indexed (recommended). By default, the document content data is imported, indexed, and then dropped by the Query Engine, unless it is told to store the content.
Parameters:
processContent - if true, the document content data is imported and indexed

setMultiValueDelim

public void setMultiValueDelim(String multiValueDelim)
Specifies the delimiter for importing multi-valued extension metadata fields. By default, the '|' character is used.
Parameters:
multiValueDelim - Character or sequence of characters to use for importing multi-valued extension metadata fields.

setMaximumDocsInBuffer

public void setMaximumDocsInBuffer(int maxDocsInBuffer)
Specifies the maximum number of documents per buffer when importing and indexing. By default, the value of 100 is used.
Parameters:
maxDocsInBuffer - Maximum number of documents per buffer.

setKillDupsMode

public void setKillDupsMode(String killDupsMode)
Specifies how to deal with duplicate documents that are already in the Query Engine.
Parameters:
killDupsMode - Specifier for kill duplicates mode.
See Also:
EbiDataFetcher.getKillDupsMode()

setCheckAccess

public void setCheckAccess(boolean checkAccess)
Specifies whether the fetcher should perform security checking to filter out the documents inaccessible to whoever is running the fetcher.
Parameters:
checkAccess - Flag that specifies the action. True = perform security filtering.

setQuery

public void setQuery(EbiQuery query)
Specifies the query to perform in order to extract document data from the source repository.
Parameters:
query - Query to run. If null, all the data is extracted from the source repository.

setQuery

public void setQuery(EbiContext context,
                     String query)
              throws EboUnrecoverableSystemException
Specifies the query to perform in order to extract document data from the source repository.
Parameters:
context - Context.
query - Query to run. If null, all the data is extracted from the source repository.

setOutputStream

public void setOutputStream(OutputStream out)
Specifies an output stream for displaying feedback from the Data Fetcher.
Parameters:
out - Output stream.

setOutputLineDelim

public void setOutputLineDelim(String delim)
Specifies the line delimiter for writing feedback messages into the output stream supplied via the 'setOutputStream' method.
Parameters:
delim - Delimiter to use.

setLog

public void setLog(EbiLog log)
Specifies which log to use as the fetcher is executing.
Parameters:
log - the log to use

getName

public String getName()
Gets the service name.
Returns:
Service name.

getFetcherName

public String getFetcherName()
Gets the name of the fetcher.
Returns:
Fetcher name.

getDescription

public String getDescription()
Gets the description of the fetcher.
Returns:
Fetcher description.

getHost

public String getHost()
Gets the name (or IP address) of the query engine host.
Returns:
Host name.

getQueryPort

public int getQueryPort()
Gets the query port of the query engine.
Returns:
Query port.

getIndexPort

public int getIndexPort()
Gets the index port of the query engine.
Returns:
Index port.

getSourceRepository

public String getSourceRepository()
Gets the name of the source repository from which the fetcher retrieves data.
Returns:
Source repository name.

getDestRepository

public String getDestRepository()
Gets the name of the destination repository into which the fetcher imports data.
Returns:
Destination repository name.

mustUseDocidAsRef

public boolean mustUseDocidAsRef()
Tells whether the fetcher must use the document ID property value as the "reference" to the document in the query engine. By default, the document's URL is used as its "reference" (recommended).
Returns:
Flag that indicates the action. True = fetcher is set up to use the document ID property value as the "reference" to the document in the query engine; false = fetcher uses the document's URL.

mustUseCurrentDate

public boolean mustUseCurrentDate()
Tells whether the current date/time should be used for the value of the "date" property for documents indexed into the query engine. By default, the documents' own creation date property is used (recommended).
Returns:
Flag that indicates the action. True = use current date/time for the value of the "date" property for documents indexed into the query engine; false = use the documents' own creation date property.

mustStoreContent

public boolean mustStoreContent()
Tells whether documents' contents are to be stored in the query engine when they are imported/indexed. By default, the contents are not stored in the query engine for efficiency reasons. One may choose to store the contents for debugging or backup purposes, and more importantly, for the generation of quick document summaries.
Returns:
true if documents' contents are to be stored in the query engine when they are imported/indexed, false if not

mustProcessMeta

public boolean mustProcessMeta()
Tells whether documents' standard metadata field values are to be indexed into the query engine. By default, the standard metadata is indexed.
Returns:
true if documents' standard metadata field values are to be indexed into the query engine, false if not

mustProcessExtnMeta

public boolean mustProcessExtnMeta()
Tells whether documents' extension metadata field values are to be indexed into the query engine. By default, the extension metadata is indexed.
Returns:
true if documents' extension metadata field values are to be indexed into the query engine

mustProcessContent

public boolean mustProcessContent()
Tells whether documents' contents are to be indexed into the query engine. By default, the contents are indexed.
Returns:
true if documents' contents are to be indexed into the query engine, false if not

getMultiValueDelim

public String getMultiValueDelim()
Gets the delimiter for multi-valued extension metadata fields. Multiple values are appended together using the delimiter, then indexed into the query engine as a single value. By default, the vertical bar '|' character is used as the delimiter.
Returns:
the delimiter for multi-valued extension metadata fields
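
The joining behavior described above can be illustrated with a minimal sketch. It does not call the Director API; the class name, the `joinValues` helper, and the sample values are hypothetical, and the real fetcher may treat empty values or escaping differently.

```java
import java.util.Arrays;
import java.util.List;

class MultiValueDelimSketch {
    // Hypothetical helper: appends the values together with the delimiter,
    // producing the single string that would be indexed as one value.
    static String joinValues(List<String> values, String multiValueDelim) {
        return String.join(multiValueDelim, values);
    }

    public static void main(String[] args) {
        List<String> colors = Arrays.asList("red", "green", "blue");
        // "|" is the documented default delimiter.
        System.out.println(joinValues(colors, "|")); // prints red|green|blue
    }
}
```

If field values can themselves contain the vertical bar, choosing a different sequence via setMultiValueDelim would presumably avoid ambiguity in the joined string.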

getMaximumDocsInBuffer

public int getMaximumDocsInBuffer()
Gets the number of documents per indexing buffer. Groups of documents are indexed iteratively, one batch at a time. This number is the size of the batch. The default value is 100 documents.
Returns:
the maximum number of documents per indexing buffer
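
To make the batching arithmetic concrete, here is a small sketch of how a document list might be split into buffers of at most the configured size. It is an illustration only, not the fetcher's actual implementation; the class and the `toBatches` helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class BufferBatchSketch {
    // Hypothetical helper: splits docs into consecutive batches
    // of at most maxDocsInBuffer items each.
    static <T> List<List<T>> toBatches(List<T> docs, int maxDocsInBuffer) {
        if (maxDocsInBuffer <= 0) {
            throw new IllegalArgumentException("maxDocsInBuffer must be positive");
        }
        List<List<T>> batches = new ArrayList<List<T>>();
        for (int start = 0; start < docs.size(); start += maxDocsInBuffer) {
            int end = Math.min(start + maxDocsInBuffer, docs.size());
            batches.add(new ArrayList<T>(docs.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<String>();
        for (int i = 0; i < 250; i++) {
            docs.add("doc-" + i);
        }
        // With the documented default of 100, 250 documents
        // form three batches of 100, 100, and 50.
        for (List<String> batch : toBatches(docs, 100)) {
            System.out.println("batch of " + batch.size());
        }
    }
}
```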

getKillDupsMode

public String getKillDupsMode()
Gets the "kill duplicates" mode value. The mode specifies how the query engine should identify and remove duplicate documents as the indexing process is going on. The following values are available for the Autonomy-based implementation:
Returns:
the "kill duplicates" mode value

mustCheckAccess

public boolean mustCheckAccess()
Tells whether only accessible (securitywise) documents should be extracted from the source repository and imported/indexed into the destination query engine repository.
Returns:
true if must check access to documents, false otherwise

getQuery

public EbiQuery getQuery()
Gets the query that the fetcher is to use when extracting documents from the source repository and importing/indexing them into the query engine. This query defines the scope of the fetch operation. If no query is specified, then all the documents in the source repository are fetched.
Returns:
the scope query

getOutputStream

public OutputStream getOutputStream()
Gets the output stream, if any is set into the fetcher object for progress indication messages.
Returns:
the output stream for progress indication messages

getOutputLineDelim

public String getOutputLineDelim()
Gets the output line delimiter sequence for the output stream, if any is specified for progress indication messages, e.g. "\n" or "<br>" for a JSP. By default, the newline character is used.

Returns:
the output stream line delimiter
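
The effect of the line delimiter on progress output can be sketched without the Director API. The example below assumes the fetcher writes each message followed by the delimiter; that behavior, the helper names, and the message texts are assumptions for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class OutputDelimSketch {
    // Hypothetical helper: writes each message followed by the delimiter,
    // roughly as a fetcher might when reporting progress.
    static void writeProgress(OutputStream out, String lineDelim, String... messages)
            throws IOException {
        for (String message : messages) {
            out.write((message + lineDelim).getBytes(StandardCharsets.UTF_8));
        }
        out.flush();
    }

    // Captures the progress output in memory and returns it as a string.
    static String render(String lineDelim, String... messages) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            writeProgress(buffer, lineDelim, messages);
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for an in-memory stream
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Plain-text consumer: newline delimiter.
        System.out.print(render("\n", "fetching", "indexing"));
        // HTML consumer such as a JSP: an HTML line break as the delimiter.
        System.out.println(render("<br>", "fetching", "indexing"));
    }
}
```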

fetchData

public int fetchData(EbiContext context,
                     boolean writeContentToOutput)
              throws EboUnrecoverableSystemException,
                     EboSecurityException
Fetches document data from the source repository into the destination query engine repository.
Parameters:
context - context
writeContentToOutput - tells whether document content is to be written into the progress indication output stream (if specified) as the fetcher is executing (may be useful for debugging purposes)
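
A typical configure, fetch, and reset sequence might look like the sketch below. Because the Director classes are not available here, it uses a hypothetical `Fetcher` interface that mirrors a subset of the method names on this page and omits the EbiContext parameter and the checked exceptions. The host, ports, and repository names are placeholders. This page does not document the meaning of the int returned by fetchData, so the sketch only passes it through.

```java
import java.util.ArrayList;
import java.util.List;

class FetchWorkflowSketch {
    // Hypothetical stand-in for a subset of EbiDataFetcher.
    interface Fetcher {
        void setHost(String host);
        void setQueryPort(int queryPort);
        void setIndexPort(int indexPort);
        void setSourceRepository(String sourceRepository);
        void setDestRepository(String destRepository);
        void setCheckAccess(boolean checkAccess);
        int fetchData(boolean writeContentToOutput);
        void clear();
    }

    // Configures the fetcher, runs one fetch, then resets it for reuse.
    static int runOnce(Fetcher fetcher) {
        fetcher.setHost("localhost");            // placeholder host
        fetcher.setQueryPort(9000);              // placeholder ports
        fetcher.setIndexPort(9001);
        fetcher.setSourceRepository("Default");  // placeholder repository names
        fetcher.setDestRepository("SearchDocs");
        fetcher.setCheckAccess(true);            // filter out inaccessible documents
        try {
            return fetcher.fetchData(false);     // false: do not echo document content
        } finally {
            fetcher.clear();                     // recommended before reusing the object
        }
    }

    // Fake implementation that records the calls it receives.
    static class RecordingFetcher implements Fetcher {
        final List<String> calls = new ArrayList<String>();
        public void setHost(String host) { calls.add("setHost(" + host + ")"); }
        public void setQueryPort(int p) { calls.add("setQueryPort(" + p + ")"); }
        public void setIndexPort(int p) { calls.add("setIndexPort(" + p + ")"); }
        public void setSourceRepository(String r) { calls.add("setSourceRepository(" + r + ")"); }
        public void setDestRepository(String r) { calls.add("setDestRepository(" + r + ")"); }
        public void setCheckAccess(boolean c) { calls.add("setCheckAccess(" + c + ")"); }
        public int fetchData(boolean w) { calls.add("fetchData(" + w + ")"); return 0; }
        public void clear() { calls.add("clear()"); }
    }

    public static void main(String[] args) {
        RecordingFetcher fetcher = new RecordingFetcher();
        runOnce(fetcher);
        for (String call : fetcher.calls) {
            System.out.println(call);
        }
    }
}
```

Calling clear() in a finally block is a design choice of this sketch, so that the object is reset even when the fetch fails.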

importAndIndex

public int importAndIndex(EbiContext context,
                          Collection documents,
                          boolean writeContentToOutput)
                   throws EboUnrecoverableSystemException,
                          EboSecurityException
Imports and indexes a collection of documents into the destination query engine repository.
Parameters:
context - context
documents - the documents to import/index
writeContentToOutput - tells whether document content is to be written into the progress indication output stream (if specified) as the fetcher is executing (may be useful for debugging purposes)
Returns:
the number of processed documents

importDocument

public void importDocument(EbiContext context,
                           StringBuffer docBuf,
                           EbiFrameworkElement document,
                           boolean writeContentToOutput)
                    throws EboUnrecoverableSystemException,
                           EboSecurityException
Imports and indexes a single document into the destination query engine repository.
Parameters:
context - context
docBuf - a string buffer to use for importing/indexing the document
document - the document object
writeContentToOutput - tells whether document content is to be written into the progress indication output stream (if specified) as the fetcher is executing (may be useful for debugging purposes)
See Also:
EbiDocument

getIndexerStatus

public int getIndexerStatus(int indexID)
                     throws EboUnrecoverableSystemException
Gets the status value for a specific index.
Parameters:
indexID - the ID of the index
Returns:
the status value
See Also:
"the QE_* constants on com.sssw.search.api.EbiQueryEngine", EbiDataFetcher.getFirstIndexID(), EbiDataFetcher.getLastIndexID(), EbiDataFetcher.getStatusMessage(int)

getStatusMessage

public String getStatusMessage(int status)
                        throws EboUnrecoverableSystemException
Gets the error message for a specific operation status code.
Parameters:
status - the status value
Returns:
the error message for the status code
See Also:
"the QE_* constants on com.sssw.search.api.EbiQueryEngine", EbiDataFetcher.getIndexerStatus(int)

getFirstIndexID

public int getFirstIndexID()
Gets the ID of the first index involved in indexing the documents.
Returns:
the ID of the first index
See Also:
EbiDataFetcher.getIndexerStatus(int), EbiDataFetcher.getLastIndexID()

getLastIndexID

public int getLastIndexID()
Gets the ID of the last index involved in indexing the documents.
Returns:
the ID of the last index
See Also:
EbiDataFetcher.getIndexerStatus(int), EbiDataFetcher.getFirstIndexID()
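
The status methods above suggest a simple polling loop over the indexes involved in a fetch. The sketch below assumes that index IDs are contiguous and that the range from getFirstIndexID() to getLastIndexID() is inclusive; neither is stated on this page. `StatusSource` is a hypothetical stand-in for the four EbiDataFetcher methods (the real ones can throw EboUnrecoverableSystemException), and the status codes and messages in the fake are invented; real codes are the QE_* constants on EbiQueryEngine.

```java
import java.util.ArrayList;
import java.util.List;

class IndexStatusSketch {
    // Hypothetical stand-in for the status-related methods of EbiDataFetcher.
    interface StatusSource {
        int getFirstIndexID();
        int getLastIndexID();
        int getIndexerStatus(int indexID);
        String getStatusMessage(int status);
    }

    // Builds one report line per index, assuming an inclusive, contiguous ID range.
    static List<String> describeIndexes(StatusSource fetcher) {
        List<String> lines = new ArrayList<String>();
        for (int id = fetcher.getFirstIndexID(); id <= fetcher.getLastIndexID(); id++) {
            int status = fetcher.getIndexerStatus(id);
            lines.add("index " + id + ": " + status
                    + " (" + fetcher.getStatusMessage(status) + ")");
        }
        return lines;
    }

    // Fake source with made-up status codes: 0 for success, -1 for failure.
    static StatusSource fakeSource() {
        return new StatusSource() {
            public int getFirstIndexID() { return 7; }
            public int getLastIndexID() { return 9; }
            public int getIndexerStatus(int indexID) { return indexID == 8 ? -1 : 0; }
            public String getStatusMessage(int status) { return status == 0 ? "ok" : "failed"; }
        };
    }

    public static void main(String[] args) {
        for (String line : describeIndexes(fakeSource())) {
            System.out.println(line);
        }
    }
}
```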

clearIndexInfo

public void clearIndexInfo()
Clears out any indexing progress information maintained by the object. This is recommended for reusing the fetcher object.

clear

public void clear()
Clears any values set by the caller and resets them to the defaults. Also clears out any state information maintained by the object. This is recommended for reusing the fetcher object.

getDebugSummary

public String getDebugSummary(boolean includeSettings,
                              boolean includeIndexInfo)
Provides a nicely formatted summary of the fetcher object's current settings and/or internal state information. Useful for debugging purposes.
Parameters:
includeSettings - include all the fetcher settings
includeIndexInfo - include the internal state information on the indexing process
Returns:
the debug summary
