Content Search Guide

CHAPTER 1

About Searching

This chapter provides an overview of the searching methods you can implement in exteNd Director applications.

The following topics are covered: searching methods, an overview of Autonomy-based conceptual searching, and an overview of SQL-based searching.

Searching methods

You can implement the following types of searching in exteNd Director applications:

Type of search: Conceptual and keyword

Description: Matches concepts or keywords based on English-like queries to search document content and metadata. The underlying technology is built around Application Builder, a toolkit of application programming interfaces (APIs) from Autonomy, Inc. These APIs provide access to the conceptual query and index functionality of Autonomy's Dynamic Reasoning Engine (DRE).

exteNd Director support: The Search API wraps the Autonomy API to provide classes and methods for searching data sources allowed under license agreements with Autonomy, Inc. The Content Management (CM) API wraps the Search API to provide classes and methods for searching the exteNd Director CM repository.

IMPORTANT:   When you purchase exteNd Director you are licensed to search the exteNd Director CM repository.

Type of search: SQL-based

Description: Matches criteria specified in SQL queries to search document metadata.

exteNd Director support: The CM API provides classes and methods for searching the exteNd Director CM repository.

 

Overview of Autonomy-based conceptual searching

Autonomy-based search technology gives you the ability to implement conceptual and keyword searching in your exteNd Director components. Traditional keyword searching returns all documents that contain occurrences of a search string. By contrast, conceptual searching matches concepts, often returning more relevant results.

 

How conceptual searching works

NOTE:   The information in this section is adapted from the Autonomy Technology White Paper from Autonomy, Inc.

The Autonomy Dynamic Reasoning Engine (DRE) uses sophisticated pattern-matching algorithms to analyze any type of unstructured information, including documents in text and binary formats. Using these algorithms, the DRE identifies the patterns that occur naturally in text, then looks for similar patterns in the data source and returns the most relevant results.

The DRE determines relevance through probabilistic analysis that identifies which data is most important, then assigns weights to indexed terms based on their importance.
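The idea of weighting terms by importance can be illustrated with a small sketch. This guide does not describe the DRE's probabilistic analysis, so the example below substitutes a much simpler and well-known measure, inverse document frequency, purely as a stand-in. The class name TermWeightSketch and its methods are hypothetical and are not part of any exteNd Director or Autonomy API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy illustration only: inverse document frequency stands in for the
// DRE's probabilistic weighting, which is not described in this guide.
class TermWeightSketch {

    // Splits text into lowercase word tokens.
    static List<String> tokenize(String text) {
        return Arrays.asList(text.toLowerCase().split("\\W+"));
    }

    // Returns a weight per term: log(N / number of documents containing it).
    // A term that appears in every document gets a weight of 0.
    static Map<String, Double> weights(List<String> documents) {
        Map<String, Integer> docFrequency = new LinkedHashMap<>();
        for (String doc : documents) {
            Set<String> unique = new HashSet<>(tokenize(doc));
            for (String term : unique) {
                if (!term.isEmpty()) {
                    docFrequency.merge(term, 1, Integer::sum);
                }
            }
        }
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : docFrequency.entrySet()) {
            result.put(e.getKey(),
                       Math.log((double) documents.size() / e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
            "the recession cut consumer spending",
            "the budget raised spending",
            "the weather was mild");
        System.out.println(weights(docs));
    }
}
```

In this sketch a word such as "the", which occurs in every sample document, carries no weight, while a word that occurs in only one document carries the most.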

 

How conceptual searching differs from keyword searching

NOTE:   The information in this section is adapted from the Autonomy Technology White Paper from Autonomy, Inc.

Recall that traditional keyword searching is the process of finding documents that contain text strings specified by a user. Keyword searches return all documents that contain one or more occurrences of the search string, regardless of the context in which it is used. Because context is ignored, the results frequently contain many irrelevant hits. To refine search results, users must often modify their queries by adding complex Boolean expressions. Keyword searching is also known as full-text searching.

By contrast, conceptual searching does take into account the context in which search terms appear so that it can match concepts rather than simply finding literal text strings. The result set contains content that is related by meaning and ranked by relevance to the search criteria. In this way conceptual searching reduces the number of false hits by returning documents that contain the concept, whether or not they also contain the search string.

To further illustrate the difference between the two approaches, consider this example. A keyword search for the query The+effect+of+the+recession+on+consumer+spending would return only documents that contain all of these terms, and because context is ignored it would likely still produce a number of irrelevant results. The identical conceptual search would return documents that match the concept underlying the search expression, even if the documents don't contain all the terms in the query.
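The contrast in this example can be sketched in a few lines of Java. The sketch is a toy under stated assumptions: real conceptual searching is not simple term overlap, and the overlap score used here only stands in for the idea of results ranked by relevance. The class SearchContrastSketch, its methods, and the sample documents are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy contrast only. Real conceptual search is not simple term overlap;
// the overlap score below merely stands in for "ranked by relevance".
class SearchContrastSketch {

    // Returns the set of lowercase word tokens in the text.
    static Set<String> terms(String text) {
        Set<String> result =
            new HashSet<>(Arrays.asList(text.toLowerCase().split("\\W+")));
        result.remove("");
        return result;
    }

    // Strict keyword match: a document qualifies only if it has every query term.
    static List<String> keywordMatch(String query, List<String> documents) {
        Set<String> wanted = terms(query);
        List<String> hits = new ArrayList<>();
        for (String doc : documents) {
            if (terms(doc).containsAll(wanted)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    // Share of the query terms that also occur in the document (0.0 to 1.0).
    static double score(Set<String> wanted, String doc) {
        Set<String> shared = new HashSet<>(wanted);
        shared.retainAll(terms(doc));
        return wanted.isEmpty() ? 0.0 : (double) shared.size() / wanted.size();
    }

    // Ranked match: keeps every document with some overlap, best score first.
    static List<String> rankedMatch(String query, List<String> documents) {
        Set<String> wanted = terms(query);
        List<String> hits = new ArrayList<>();
        for (String doc : documents) {
            if (score(wanted, doc) > 0) {
                hits.add(doc);
            }
        }
        hits.sort((a, b) -> Double.compare(score(wanted, b), score(wanted, a)));
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
            "Consumer spending fell during the recession",
            "The effect of the recession on consumer spending was severe",
            "Recipes for summer");
        String query = "effect of the recession on consumer spending";
        System.out.println(keywordMatch(query, docs));
        System.out.println(rankedMatch(query, docs));
    }
}
```

With these sample documents, the strict match keeps only the document that contains every query term, while the ranked match also returns the related document that lacks some of the terms and places it lower in the list.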

 

Searching the CM repository

exteNd Director comes with a data fetcher for the exteNd Director CM repository. This CM fetcher automatically propagates document content and metadata from the CM repository into the exteNd Director DRE, where it is indexed. The combined process of propagating and indexing data is often called fetching.
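As a rough mental model of fetching, the sketch below copies documents from a stand-in repository into a simple inverted index that maps each term to the documents containing it. This is an assumption-laden illustration, not a description of how the CM fetcher or the DRE is implemented. The class FetchSketch, its methods, and the document IDs are all hypothetical.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model only: the class and method names are hypothetical and do not
// correspond to the CM fetcher or the DRE.
class FetchSketch {

    // Stand-in for an index: term -> IDs of the documents that contain it.
    private final Map<String, Set<String>> index = new TreeMap<>();

    // "Fetching" here means reading each repository entry (ID -> text)
    // and recording its terms in the index.
    void fetch(Map<String, String> repository) {
        for (Map.Entry<String, String> doc : repository.entrySet()) {
            for (String term : doc.getValue().toLowerCase().split("\\W+")) {
                if (!term.isEmpty()) {
                    index.computeIfAbsent(term, k -> new TreeSet<>())
                         .add(doc.getKey());
                }
            }
        }
    }

    // Returns the IDs of indexed documents that contain the term.
    Set<String> lookup(String term) {
        return index.getOrDefault(term.toLowerCase(),
                                  Collections.<String>emptySet());
    }

    public static void main(String[] args) {
        Map<String, String> repository = new TreeMap<>();
        repository.put("doc1", "Quarterly budget report");
        repository.put("doc2", "Budget forecast");

        FetchSketch sketch = new FetchSketch();
        sketch.fetch(repository);
        System.out.println(sketch.lookup("budget"));
    }
}
```

The point of the model is the ordering: queries run against the index, so a document becomes searchable only after it has been fetched.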

The exteNd Director CM subsystem communicates with the exteNd Director DRE through the Search subsystem. The CM API wraps the Search API, providing classes and methods for constructing and running queries on content and metadata that reside in the CM repository and have been indexed by the exteNd Director DRE.

For more information on using the CM API for implementing conceptual searches against the CM repository, see Implementing Conceptual Search, Fetching Content and Metadata, and Querying Content and Metadata.

 

Searching other data sources

The CM data fetcher that comes with exteNd Director allows you to use Autonomy technology exclusively with data from the exteNd Director CM repository. This fetcher automatically imports document content and metadata from the exteNd Director CM repository into the DRE for indexing, allowing you to subsequently conduct Autonomy-based searches over the indexed data.

To use Autonomy technology with exteNd Director to search other data sources, you must purchase additional data fetchers from Autonomy, Inc. For these licensed data sources, you use Search API classes directly to initiate the fetching process and to construct and run queries. Fetching occurs automatically only when you use the CM data fetcher.

 

What you can do with the Search subsystem

The Search API provides wrapper classes around the Autonomy APIs to give you access to the following capabilities programmatically:

For more information about how to access and implement these capabilities, see Implementing Conceptual Search, Fetching Content and Metadata, and Querying Content and Metadata.

 

Overview of SQL-based searching

The exteNd Director CM subsystem provides a built-in capability for SQL-based searching of metadata in the CM repository. You execute SQL search queries on document metadata only.

To search document content—or both content and metadata—use Autonomy-based searching, as described in Overview of Autonomy-based conceptual searching.

 

Why use SQL-based searching

SQL-based searching allows you to search metadata stored in relational databases. You might opt for this search method in exteNd Director to:

 

What you can search

You can use SQL queries to search for the following metadata properties in the CM repository:

For these properties, the CM API provides classes and methods for constructing and running SQL query expressions that search for values, ranges of values, words, phrases, or other patterns, as appropriate.

 

Support for SQL constructs

The CM API provides methods on the com.sssw.cm.api.EbiDocQuery object for defining SQL clauses that you use to construct search queries. In exteNd Director, you construct SQL-based queries by defining SELECT, WHERE, and ORDER BY clauses.

The com.sssw.cm.api.EbiDocQuery interface defines WHERE methods for setting search criteria. In addition, com.sssw.cm.api.EbiDocQuery extends the com.sssw.cm.api.EbiDocMetaDataQuery interface, which defines SELECT and ORDER BY methods:

Method type    Description
SELECT         Lets you specify which properties to return for documents that meet the search criteria
WHERE          Lets you set search criteria by defining the subclauses of a SQL WHERE expression
ORDER BY       Lets you specify how to sort the result set
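To show how the three clause types combine into one query, here is a minimal sketch of a clause builder. It is not the EbiDocQuery API, whose method names and signatures are not listed in this chapter. The class MetadataQuerySketch, its methods, and the table and column names (documents, title, author) are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical builder, not the EbiDocQuery API. The table and column
// names used in main() are invented for illustration.
class MetadataQuerySketch {
    private final List<String> selected = new ArrayList<>();
    private final List<String> conditions = new ArrayList<>();
    private final List<String> ordering = new ArrayList<>();
    private final List<Object> parameters = new ArrayList<>();

    // SELECT: names a property to return.
    MetadataQuerySketch select(String property) {
        selected.add(property);
        return this;
    }

    // WHERE: adds one subclause with a bound parameter; subclauses are ANDed.
    // Property and operator are concatenated as-is, so they must come from
    // trusted code, never from user input.
    MetadataQuerySketch where(String property, String operator, Object value) {
        conditions.add(property + " " + operator + " ?");
        parameters.add(value);
        return this;
    }

    // ORDER BY: names a property to sort by and the sort direction.
    MetadataQuerySketch orderBy(String property, boolean ascending) {
        ordering.add(property + (ascending ? " ASC" : " DESC"));
        return this;
    }

    // Assembles the clauses into a single parameterized SQL string.
    String toSql(String table) {
        StringBuilder sql = new StringBuilder("SELECT ");
        sql.append(selected.isEmpty() ? "*" : String.join(", ", selected));
        sql.append(" FROM ").append(table);
        if (!conditions.isEmpty()) {
            sql.append(" WHERE ").append(String.join(" AND ", conditions));
        }
        if (!ordering.isEmpty()) {
            sql.append(" ORDER BY ").append(String.join(", ", ordering));
        }
        return sql.toString();
    }

    // Values to bind to the ? placeholders, in order.
    List<Object> parameters() {
        return parameters;
    }

    public static void main(String[] args) {
        MetadataQuerySketch query = new MetadataQuerySketch()
            .select("title")
            .select("author")
            .where("author", "=", "Smith")
            .where("title", "LIKE", "%budget%")
            .orderBy("title", true);
        System.out.println(query.toSql("documents"));
        System.out.println(query.parameters());
    }
}
```

Keeping the search values as bound parameters rather than splicing them into the SQL text is a design choice of this sketch. It is a common way to avoid quoting problems, but nothing in this chapter says whether the CM API does the same.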

For more information, see Implementing SQL-Based Searching.



Copyright © 2004 Novell, Inc. All rights reserved. Copyright © 1997, 1998, 1999, 2000, 2001, 2002, 2003 SilverStream Software, LLC. All rights reserved.