Tools Guide

Chapter 16 Full Text Search

SilverStream uses the Fulcrum SearchServer full text search and retrieval engine so that users of your application can search through large amounts of information.

NOTE The SilverStream help system uses full text search.

NOTE The following documentation is adapted from the Fulcrum SearchServer SearchSQL Reference and the SearchServer Data Preparation and Administration Guide. For more information on the topics discussed in the following sections, refer to the SearchServer online documentation (.HLP files) located in your local fulcrum\bin directory.

What is full text search?

The SilverStream full text search feature is based on the ANSI Structured Query Language (SQL), which is the standard interface language for accessing databases. SQL provides language extensions that support text retrieval. The combination of the queries you create in SilverStream and the search engine provided by Fulcrum enables you to:

Search a table or tables by specifying the columns that you want to search
Specify the search criteria to use

You can create a number of different searches. You can:

Search for specific words or phrases.
Create wildcard searches where the result set contains words or phrases that match the pattern you specify.
Create thesaurus searches where the result set contains words that have the same meaning as the words or phrases you specify.
Create multiple word searches where you specify the relevance ranking of the result set using an Order By clause. For example, you can specify that the result set contain the total number of occurrences of the word that matches your search criteria.
Specify words that you do not want included in your search results. These words are contained in a file called a stop file.

NOTE You can install the SilverStream help system as a database and use the full text search engine to look for keywords and phrases.

Creating full text searches

You can use the Expression Builder to create your own full text search expressions. You can also create full text search expressions programmatically.

You can create search queries for:

Forms
Data-loaded list controls within a form
Views
Pages
Business objects

NOTE You cannot use full text search on date, time, and timestamp fields.

Installing SearchServer

You must install SearchServer separately from SilverStream, and you must install it before you install the SilverStream software. The SearchServer software is included on the SilverStream CD. Refer to the SilverStream Installation Guide for a description of this procedure. Installing SearchServer creates the fulcrum\fultext directory on your PC. It includes the following files:

FULTEXT.STP is a stop file. It contains words, such as "of" and "the", that are not indexed. Stop files make the indexing process faster and more efficient by preventing unnecessary words from being indexed. You can customize this file using a text editor or word processing package.
FULTEXT.FTH is the thesaurus file. You can customize this file also. You can also create a thesaurus file of your own.
FULTEXT.FTL is the character variance file. This file includes word spelling variations or words that use accents. You can customize this file.

All of these files are described in more detail in the appropriate sections in this chapter. You can also look in the online documentation located in your local fulcrum\bin directory.

Setting up your environment

Before you begin creating and executing search queries, you must:

Organize your data into rows and columns within a table (if it is not contained in a table already). Use the Table Designer to do this.
Specify which columns in the table are searchable by selecting the Full Text Search checkbox in the Table Designer Property Inspector, as described next.

Marking a table to be full text searchable

NOTE For a definition of Indexing, see Index.

To mark a table for indexing:

Select the Table icon from the left pane of the SilverStream Designer window. A list of available tables appears.
Select the table you want to index and double-click. The Table Designer appears.
Select one of the table columns and either select the Property Inspector icon from the Table Designer toolbar, or press your right mouse button. The Property Inspector appears.
Select the full text search check box. If you mark more than one column in a table as full text searchable, you cannot search just one of the columns. If you selected a column containing the longvarchar data type, SilverStream assumes that the data in the column is text. If the column contains HTML or a file attachment, Fulcrum assumes that the column contains the BLOB data type and uses one of its text readers to read the data. You can mark only one BLOB table column as full text searchable. You can, however, mark more than one text column for full text search.

Avoid these characters in primary key columns

Make sure that the primary key columns for your tables do not contain the following characters as data:

  : ! @ ,

If primary key columns contain any of these characters, SearchServer will be unable to search the table.
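A check like this can be run before data reaches a searchable table. The following is a minimal sketch; PrimaryKeyCheck and isSearchSafe are hypothetical names used only for illustration and are not part of the SilverStream API.

```java
final class PrimaryKeyCheck {
    // Characters that the text above says must not appear in primary key data.
    private static final String FORBIDDEN = ":!@,";

    private PrimaryKeyCheck() {}

    /** Returns true if the key value contains none of the forbidden characters. */
    static boolean isSearchSafe(String keyValue) {
        for (int i = 0; i < keyValue.length(); i++) {
            if (FORBIDDEN.indexOf(keyValue.charAt(i)) >= 0) {
                return false;
            }
        }
        return true;
    }
}
```

For example, isSearchSafe("ORD-1001") returns true, while isSearchSafe("ORD:1001") returns false because of the colon.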

What happens

The server automatically indexes all tables marked as full text searchable every time records are updated or deleted.

Manually indexing tables

SilverStream provides the class AgFullText, which contains two methods, index() and indexBlock(), that you can call to manually index a table for full text search if data changes occur outside SilverStream. For more information, see the API online documentation.

Also, any time you customize your thesaurus and stop files you have to reindex the tables associated with them.

SearchServer and SilverStream

You can create full text search queries anywhere in SilverStream that you can put a WHERE clause. This includes forms, views, pages, business objects, and data-loaded form controls. You can create full text search queries in one of two ways:

Using the Expression Builder. Create a full text search query for a form, page, or view to refine your search criteria. You can also use full text search on individual controls on the form to further narrow your search.

For a description of the Expression Builder, see Expression Builder.
Programmatically. You can construct a query that includes full text search in Java code and pass it to the query() method of a form, page, view, or other control.

Query syntax

The syntax of the query statement varies depending on how complex you want the query to be. For example, the syntax for a single word search looks like this:

  tablename fullTextSearch "'literal'"

The syntax for a more complex query looks like this:

  tablename fullTextSearch "predicate ('literal' predicate_option)"

Where:

Component	Description
tablename	The table you want searched.
fullTextSearch	The operator you select from the Operators field in the Expression Builder to indicate full text search.
predicate	The type of search query you are building. The entire search string must be surrounded by double quotes (").
literal	The word, phrase, or string that you want to search for. If you are using a search predicate, you must surround the literal with parentheses. Literals must always be surrounded by single quotes (').
predicate_option	The predicate option you want to use in your search (if any). For example, if you were building a query for a thesaurus search, you could include the word_synonym option.

When you use the Expression Builder to build a search query, you do not have to worry about using the correct SearchServer syntax because SilverStream translates the query for you. For example, to create a thesaurus search query for the CARS table in SearchServer, you would have to enter a statement similar to the following:

  SELECT * 
  FROM CARS 
  WHERE DESCRIPTION CONTAINS 'HOTROD'

The same query built in the Expression Builder looks like this:

  cars fullTextSearch "'hotrod'"

Every full text search query you create must begin with the name of the table being searched.

When you code a query in Java using the Programming Editor, use the following syntax:

  "tablename fullTextSearch \"'"+ search + "'\""

You start by identifying the table, the same way you do when you use the Expression Builder. Surround the search terms (string variables) in single and double quotes. To use double quotes within a Java string, precede them with the backslash character. SearchServer searches the table you specified for terms that match the search criteria. It then creates a working table that contains the rows that meet those criteria.
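The string concatenation above can be wrapped in a small helper so that the quoting is written only once. This is a sketch, not part of the SilverStream API: the class name FullTextQuery and the method singleTerm are hypothetical, and rejecting quote characters in the search term is an assumption made here because the escaping rules are not described in this chapter.

```java
final class FullTextQuery {
    private FullTextQuery() {}

    /**
     * Builds a single-term expression of the form:
     *   tablename fullTextSearch "'literal'"
     * Quote characters in the term are rejected (an assumption of this sketch).
     */
    static String singleTerm(String tableName, String term) {
        if (term.indexOf('\'') >= 0 || term.indexOf('"') >= 0) {
            throw new IllegalArgumentException("quotes are not supported in this sketch: " + term);
        }
        return tableName + " fullTextSearch \"'" + term + "'\"";
    }

    public static void main(String[] args) {
        // Prints: cars fullTextSearch "'hotrod'"
        System.out.println(singleTerm("cars", "hotrod"));
    }
}
```

The resulting string is what you would pass as the query expression, in the same way as the hand-built string shown above.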

Using stop files

Stop files identify common words, such as "or" and "the", that you do not want indexed. Words that are not indexed cannot be searched. If you include a stop word in a search query, SearchServer treats it as though it matches every row in the table you are searching.

The FULTEXT.STP file is supplied with the SearchServer software. You can add your own stop words to this file. A stop file can also contain character class definitions that modify the rules that SearchServer uses to recognize numeric punctuation. Stop files improve the search engine's indexing and search capabilities by eliminating unnecessary searches. You can customize the existing stop file or you can create your own using any text editor or word processing package.

Stop files typically contain alphabetic words, but they can also contain other characters. You should not include a word in the stop file unless the word has absolutely no search value in all contexts. For example, the letter a is not included in the FULTEXT.STP file because it could be an important designator in some cases, such as searching for the term Appendix A.

The FULTEXT.STP file contains the following words:

after also an and as at be because before	between but by for from however if in into	of or other out since such than that the	there these this those to under upon when where	whether which with within without

Adding and deleting stop words

Stop files can contain as many as 1024 words totaling no more than 10,000 characters. Each entry in the file must be unique.

NOTE When you modify a stop file, you must reindex all the tables associated with it.

You can add multiple words to a line in the stop file. You must conform to the following syntax rules:

Component	Syntax
stopfile	stopword-list
stopword-list	stopword-line [newline stopword-line]...
stopword-line	[stopword [stopword-separator stopword]...] [comment]
stopword	Any sequence of characters, excluding the space character, the number sign (#), and the equality symbol (=)
stopword-separator	space character or horizontal tab character
comment	# comment-text
comment-text	Any sequence of characters, excluding newline
newline	An optional carriage return character followed by a line feed character

SearchServer performs case normalization for alphabetic characters automatically, so it does not matter whether you add words using uppercase or lowercase letters.

You cannot include accented characters in stop words unless you enable accent indexing in the configuration files of the tables associated with the stop file.
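The syntax rules and limits above can be checked before you reindex. The following Java sketch parses stop file text under several assumptions that are not taken from the SearchServer documentation: the class name StopFileParser is hypothetical, tokens containing an equality symbol are skipped on the guess that they are character class definitions, words that differ only in case count as duplicates, and the 10,000 character limit is applied to the stop words alone, without separators or comments.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

final class StopFileParser {
    static final int MAX_WORDS = 1024;
    static final int MAX_CHARS = 10_000;

    private StopFileParser() {}

    /** Parses stop file text into a set of lower-cased stop words. */
    static Set<String> parse(String fileText) {
        Set<String> words = new LinkedHashSet<>();
        int totalChars = 0;
        for (String line : fileText.split("\r?\n")) {
            // Everything from the number sign to the end of the line is a comment.
            int hash = line.indexOf('#');
            String body = hash >= 0 ? line.substring(0, hash) : line;
            // Stop words on a line are separated by spaces or horizontal tabs.
            for (String token : body.trim().split("[ \t]+")) {
                if (token.isEmpty() || token.indexOf('=') >= 0) {
                    continue; // blank line, or not a stop word (assumed class definition)
                }
                String word = token.toLowerCase(Locale.ROOT);
                if (!words.add(word)) {
                    throw new IllegalArgumentException("duplicate stop word: " + word);
                }
                totalChars += word.length();
            }
        }
        if (words.size() > MAX_WORDS || totalChars > MAX_CHARS) {
            throw new IllegalArgumentException("stop file exceeds 1024 words or 10,000 characters");
        }
        return words;
    }
}
```

Lower-casing mirrors the case normalization that SearchServer performs, so "The" and "the" are treated as the same entry.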

Using character variant files

SearchServer enables you to use the character variant search feature, which treats typographical variants of a word as equivalents for search purposes. This feature makes sure that potential mismatches in a search due to subtleties of language or other external restrictions are avoided. For example, you can tell SearchServer to include the German word Frühling as an equivalent for the word Fruehling in a search query.

Character variant generation is controlled by the character variant rules contained in the character variant rules file. These rules contain instructions for removing or inserting accents, as well as modifying the suffix of a query term. SearchServer supports English, French, German, and other European language character variants.

There are three character variant rules files included with the SearchServer software:

FULTEXT.FTL appends the suffixes s and 's to each word.
GERMAN.FTL supports the typographic equivalents for the characters ä, ö, ü and ß.
FRENCH.FTL equates accented characters with a non-accented counterpart. For example, the character e is replaced by three accented forms: é, ê, and è.

You can modify any of these three files or you can create a new character variant rules file using any text editor or word processing package.

Use the fthtest utility to test the file. For more information about this utility, see Testing the file.

SilverStream sets the character variant file to fultext.ftl. The only time you should modify this file is when you want to use character variant functionality. Variant generation operates under the assumption that the string to substitute can be completely replaced, regardless of context. The rules can include removing or including the accents in a query term, or modifying the suffix of a query term.

Each substitution causes a variant form to be added to the search along with the original search term. For example, a rules file could specify the replacement of every e by the three accented forms è, é, and ê. The search term donne would return the words donne, donné, donnê, and donnè. You cannot modify a replacement string with another rule.
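The expansion can be sketched in Java as follows. This is an illustration only, not SearchServer's algorithm: it assumes that each occurrence of a target string is substituted independently of the others, and the names VariantExpander and Rule are hypothetical. Replacement text is never scanned again, which reflects the statement that a replacement string cannot be modified by another rule. The limits on rules and simultaneous substitutions are not enforced.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class VariantExpander {
    /** One substitution rule: replace target anywhere in a word. */
    static final class Rule {
        final String target;
        final String replacement;

        Rule(String target, String replacement) {
            this.target = target;
            this.replacement = replacement;
        }
    }

    private VariantExpander() {}

    /** Returns the original term followed by every generated variant. */
    static Set<String> expand(String term, List<Rule> rules) {
        Set<String> out = new LinkedHashSet<>();
        expandFrom(term, 0, "", rules, out);
        return out;
    }

    private static void expandFrom(String term, int pos, String prefix,
                                   List<Rule> rules, Set<String> out) {
        if (pos == term.length()) {
            out.add(prefix);
            return;
        }
        // Option 1: keep the original character at this position.
        expandFrom(term, pos + 1, prefix + term.charAt(pos), rules, out);
        // Option 2: substitute. Scanning resumes after the target in the
        // original term, so replacement text is never rewritten by another rule.
        for (Rule rule : rules) {
            if (!rule.target.isEmpty() && term.startsWith(rule.target, pos)) {
                expandFrom(term, pos + rule.target.length(),
                           prefix + rule.replacement, rules, out);
            }
        }
    }
}
```

With three rules that replace e by è, é, and ê, the term donne produces four terms, as in the example above. Under the same assumption, a word with n matching positions produces 4 to the power n terms, which is why a handful of rules can generate a very large search.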

The maximum number of rules per file is 40. You can apply a maximum of 30 simultaneous substitutions to a given word. If one of these limits is exceeded, SearchServer rejects the query. SearchServer also rejects a query if the format of a character variant rule does not conform to the syntax described in the following section.

Each rule in the character variant file must be on its own line. Every rule has four fields. Each field has a specific starting column and a maximum length, as shown in the following table:

Field name	Contents	Starting column	Length
Substitution code	: indicates substitution anywhere within a word. % indicates a suffix is to be replaced.	1	1
Target string	The part of the word to be replaced	2	<=4
Replacement string	The string replacing the target string	6	<=4
End of rule	Line feed (x0A) or EOF (end of file)	6-10	1

You must pad your target and replacement strings with space characters when they occupy fewer than four characters.

A suffix matching rule can have an empty target string. In this case every original term generates a character variant that has the replacement string appended as a suffix. Suffix rules are applied only to an ordinary word by itself or as the last component of an implied phrase. For example, given the terms friend% and micro-computer, suffix rules are only applied to the word computer.

Suffix rules are not applied to single-character words. The same rule applies to the last component of an implied phrase, where the last component must contain at least two characters to be eligible for suffix substitution.

NOTE The total number of terms that can result from a single search word can become very large when you are using several substitution rules at one time. SearchServer looks up each generated term, so a large number of generated terms (more than a few hundred) can slow down response time to unacceptable levels even if only a few hits actually occur in a table.

Character variant generation is applied to stop words. To avoid searches for stop words, you must include all the variants of the stop words in the stop file.

Character sets

The character variant rules file must be in FTICS. To allow convenient editing of this file using a 7-bit ASCII editor, the rules can contain certain multi-character sequences. This allows the representation of all characters in the FTICS.

The rules file is processed in much the same way as the test text reader. The test text reader recognizes a five-character sequence (beginning with \Fx and ending with a two-character hexadecimal representation) as a single character in the FTICS. Each of these sequences counts as only one character.

Examples

The following rules from FULTEXT.FTL append the plural suffix s and the English possessive suffix 's to a word:

  %    s
  %    's

In both cases the suffix is separated from the percent sign by exactly four spaces.

NOTE It is important that you use the correct spacing when creating rules. If you do not use the correct spacing, the line is ignored.

The following rules from the GERMAN.FTL file bidirectionally substitute the substring ue for ü:

  :UE  \Fxc8U
  :ue  \Fxc8u
  :\Fxc8U  UE
  :\Fxc8U  ue

In each rule exactly two spaces separate the target and replacement fields.

Character variant rules are case-sensitive. The sample rules files included with SearchServer contain redundant rules differing only in the case of the letters in the target field.

The case of the letters in the replacement field is not important because SearchServer performs case normalization before it performs a dictionary lookup.

To extend the equivalence of a string like ue and ü to single wildcard matching, you can include an additional rule: an indexed accent followed by an alphabetic character is treated as one character. To extend this to the character string ue, you have to include the following rule in your file:

  :\Fx18   ue

where \Fx18 is a special code representing a single character wildcard. This rule must contain exactly three spaces between the target and replacement fields.
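Because the fields are positional, a rule is easiest to check by reading it column by column. The following Java sketch splits a rule line into its fields, counting each five-character \Fx sequence as a single column, as described under Character sets. The class name VariantRule is hypothetical, and this is not the parser that SearchServer itself uses.

```java
import java.util.ArrayList;
import java.util.List;

final class VariantRule {
    final char code;          // ':' = anywhere within a word, '%' = suffix
    final String target;      // columns 2-5, padding removed
    final String replacement; // columns 6-9, padding removed

    VariantRule(char code, String target, String replacement) {
        this.code = code;
        this.target = target;
        this.replacement = replacement;
    }

    /** Splits a line into logical characters; "\Fx" plus two hex digits is one. */
    static List<String> logicalChars(String line) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < line.length()) {
            if (line.startsWith("\\Fx", i) && i + 5 <= line.length()
                    && Character.digit(line.charAt(i + 3), 16) >= 0
                    && Character.digit(line.charAt(i + 4), 16) >= 0) {
                out.add(line.substring(i, i + 5));
                i += 5;
            } else {
                out.add(String.valueOf(line.charAt(i)));
                i++;
            }
        }
        return out;
    }

    /** Parses one rule line (without its line terminator). */
    static VariantRule parse(String line) {
        List<String> chars = logicalChars(line);
        if (chars.isEmpty() || !(chars.get(0).equals(":") || chars.get(0).equals("%"))) {
            throw new IllegalArgumentException("rule must start with ':' or '%': " + line);
        }
        String target = join(chars, 1, 5).trim();
        String replacement = join(chars, 5, 9).trim();
        return new VariantRule(chars.get(0).charAt(0), target, replacement);
    }

    private static String join(List<String> chars, int from, int to) {
        StringBuilder sb = new StringBuilder();
        for (int i = from; i < Math.min(to, chars.size()); i++) {
            sb.append(chars.get(i));
        }
        return sb.toString();
    }
}
```

Parsing a percent sign followed by four spaces and s yields an empty target and the replacement s. Parsing :UE followed by two spaces and \Fxc8U yields the target UE and the replacement \Fxc8U. These match the spacing described for the examples above.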

Testing the file

Use the fthtest utility to test your character variant rules file. This utility enables you to verify how the equivalent terms generated by the rules file compare to the search term. Use the following syntax:

  fthtest term -l rulesfile [-c tablename] [-t outfilename]

Where:

Component	Description
term	The terms you want to test.
rulesfile	The name of the character variant rules file. It must be a maximum of eight characters long and have a three-character file extension.
-c tablename	Instructs the utility to test the generated variants against the specified table (tablename). Only the equivalent terms that are found in the table are reported.
outfilename	The name of an optional output file (outfilename) to store the results of the test.

The fthtest utility exits when it reaches the end of the input file or when you enter quit and press Enter (for MS-DOS), CTRL+Z (for 32-bit Windows), or CTRL+D (for UNIX).

If you specify an invalid character variant rule, fthtest returns an error message.

Using the thesaurus file

The thesaurus file contains rules for generating plural and possessive forms of search words. It can also contain the spelled out versions of abbreviations and synonyms. These word variations enable you to perform thesaurus expansions when you search for a particular word and its variants. You can customize the thesaurus file (FULTEXT.FTH) that is installed when you install the SearchServer search and retrieval engine or you can create your own.

SilverStream sets the thesaurus file to FULTEXT.FTH. An .FTH file can be a binary or an object file. Use the fthmake utility to create a new FULTEXT.FTH file in the FULCRUM\FULTEXT directory. You must create a text source file for the utility to use to create the .FTH file.

In order to create your own thesaurus file, you must supply a source file. It must have the .FTS file extension. Use the fthmake utility to compile the source file. The compiled source file is referred to as the object file. It has an .FTH file extension. You can compile the source file either alone or with a character variant rule file, which is described in an earlier section.

If you write your thesaurus file using a character set that is different from the SearchServer character set (FTCS94), you must process the source file using the appropriate set of text readers. The text readers translate the characters into a format that SearchServer can recognize. You can specify which text readers to use when you invoke fthmake.

Once you successfully compile your thesaurus source file, you should use the fthtest utility to test the object file.

NOTE Always test your object file. If there is a problem with your thesaurus file, the results of a thesaurus search could be unpredictable.

If you are going to use your thesaurus file to search tables that reside on a remote node using a server other than your local server, it must be accessible to the remote server.

If SearchServer cannot read the thesaurus file, the expansion functions are disabled without warning. It tries to execute the search but does not generate any new terms.

Thesaurus rules

Thesaurus files contain two kinds of rules:

Synonym rules tell the search engine to look for search equivalents wherever a specified search term is used. For example, if you had the following entry in your thesaurus file:
```
  small little tiny miniscule;  
```
the result set of a search on the word small would also contain any instances of the words little, tiny, and miniscule.
Suffix rules tell the search engine to not only look for the search term ending in suffix x, but to also look for the same root word, with suffix y. Suffix rules begin with a plus (+) sign as the first character. Synonym rules do not have a leading character.

Examples

In the following example, the first two lines are suffix rules and the last two lines are synonym rules. Every rule in a thesaurus file must end with a semicolon:

  +y: y ies 's; 
  +% s 's; 
  dog dogs dog's; 
  round roundabout rounded;

Thesaurus rules can have two parts: a left-hand side (LHS) and a right-hand side (RHS). The two sides are separated with a colon. The entire rule ends with a semicolon. Rules can span more than one line in the file. The words, phrases, and suffixes listed in each rule must be separated by spaces. If you include a phrase in the RHS of a rule, you must separate the words in the phrase with hyphens.

If you omit the colon separator and the RHS from a rule, SearchServer interprets the RHS to be the same as the LHS. If you include the colon but omit the RHS, no new search terms are generated and the original term is not changed. You can use this technique to suppress suffix expansion for selected words.

The LHS contains the words or suffixes you want to match when SearchServer looks a search term up in the thesaurus file. The RHS contains the list of alternative synonyms (which can be words or phrases) or suffixes.

When SearchServer matches a word with one of the LHS entries, the original term is either equated with the alternatives contained in the RHS or a new term is created by combining the root search word with the alternative suffixes contained in the RHS.

For synonym rules, the RHS should include plurals, possessives, and any other alternatives that can be derived from the root search word contained in the LHS. When the same root word appears in the LHS of more than one rule in your thesaurus file, the synonym lookup generates a list of alternatives that is a combination of the RHS of all the matching rules.

For suffix rules, the LHS and optional RHS contain lists of suffixes separated by a space. You can include the percent sign (%) to represent null suffixes.
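The rule layout described above can be sketched as a small Java parser. This is an illustration of the syntax only and is not the fthmake compiler; the class name ThesaurusRule is hypothetical, and the sketch assumes that the first colon in a rule is the LHS/RHS separator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class ThesaurusRule {
    final boolean suffixRule; // true when the rule starts with '+'
    final List<String> lhs;
    final List<String> rhs;

    ThesaurusRule(boolean suffixRule, List<String> lhs, List<String> rhs) {
        this.suffixRule = suffixRule;
        this.lhs = lhs;
        this.rhs = rhs;
    }

    /** Parses thesaurus source text; rules end with ';' and may span lines. */
    static List<ThesaurusRule> parseAll(String source) {
        List<ThesaurusRule> rules = new ArrayList<>();
        for (String raw : source.split(";")) {
            String text = raw.trim();
            if (text.isEmpty()) {
                continue;
            }
            boolean suffix = text.startsWith("+");
            if (suffix) {
                text = text.substring(1);
            }
            int colon = text.indexOf(':');
            List<String> lhs = words(colon >= 0 ? text.substring(0, colon) : text);
            // No colon: the RHS is the same as the LHS.
            // Colon with nothing after it: the RHS is empty.
            List<String> rhs = colon >= 0 ? words(text.substring(colon + 1)) : lhs;
            rules.add(new ThesaurusRule(suffix, lhs, rhs));
        }
        return rules;
    }

    private static List<String> words(String part) {
        String trimmed = part.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        // Entries are separated by white space; phrases keep their hyphens.
        return Arrays.asList(trimmed.split("\\s+"));
    }
}
```

For the rule +y: y ies 's; the parser reports a suffix rule with the LHS y and the RHS y, ies, and 's. For a rule that ends in a colon with no RHS, it reports an empty RHS, which is the form used to suppress expansion.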

Suffix searches execute in the following manner:

The longest suffix contained in the LHS of any suffix rule is matched first. The percent sign is considered the "suffix of last resort." You should use it in the LHS of only one rule. You can use it in the RHS of several rules.
Synonym rules have precedence over suffix rules. A match occurring between a search term and a word in the LHS of a synonym rule prevents any suffix processing for that term, regardless of whether any alternatives are generated.

Search restrictions

SearchServer implements the following restrictions whenever a search accesses the current thesaurus file.

Never look up stop words.
Only look up individual search terms, including words or phrases with embedded punctuation.
Exclude words containing wildcards as well as any word generated using wildcard expansion and phrases containing embedded spaces.
Only look up alphabetic words that have two or more letters.
If you include synonyms such as small:little; and tiny:miniscule; as separate rules in your thesaurus file, a search on small does not return miniscule.
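The last restriction, together with the way alternatives from several matching rules are combined, can be sketched in Java. The class SynonymLookup is hypothetical and only models the lookup behavior described in this section; it does not apply the stop word, wildcard, or word length restrictions.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class SynonymLookup {
    private final Map<String, Set<String>> alternatives = new LinkedHashMap<>();

    /** Registers one synonym rule; each LHS word maps to every RHS alternative. */
    void addRule(String[] lhs, String[] rhs) {
        for (String word : lhs) {
            Set<String> set = alternatives.computeIfAbsent(word, k -> new LinkedHashSet<>());
            Collections.addAll(set, rhs);
        }
    }

    /**
     * Returns the search term plus its direct alternatives.
     * Alternatives are not looked up again, so rules do not chain.
     */
    Set<String> expand(String term) {
        Set<String> out = new LinkedHashSet<>();
        out.add(term);
        out.addAll(alternatives.getOrDefault(term, Collections.emptySet()));
        return out;
    }
}
```

With the separate rules small:little; and tiny:miniscule; a lookup of small yields small and little only. If two rules both list dec in their LHS, a lookup of dec yields the alternatives from both rules.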

Thesaurus search examples

The following examples illustrate the suffix and synonym rules used in thesaurus searches.

If the suffix rule is:

  + % s 's;

a thesaurus search on the word Dog returns the following:

Dog

Dogs

Dog's

It is important to note that the preceding rules do not include the suffixes s' or ies'. SearchServer character classes associated with the word indexing rules cause a trailing apostrophe to be ignored for indexing purposes. So, when you execute a search for the word ponies, the word ponies' is included in the result set unless the word appears in a phrase. Because of this, you do not need to include normal possessive plurals.
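The suffix expansion in the Dog example can be traced with a short Java sketch. It assumes a straightforward reading of the rules in this section: the longest matching LHS suffix wins, the percent sign stands for the null suffix, and each RHS suffix is appended to the root that remains. SuffixExpander and Rule are hypothetical names, and SearchServer's actual implementation may differ.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class SuffixExpander {
    /** One suffix rule; "%" stands for the null suffix. */
    static final class Rule {
        final List<String> lhs;
        final List<String> rhs;

        Rule(List<String> lhs, List<String> rhs) {
            this.lhs = lhs;
            this.rhs = rhs;
        }
    }

    private SuffixExpander() {}

    private static String suffixText(String s) {
        return s.equals("%") ? "" : s;
    }

    /** Returns the word followed by the terms generated from the best-matching rule. */
    static Set<String> expand(String word, List<Rule> rules) {
        Rule best = null;
        int bestLen = -1;
        // The longest matching LHS suffix wins; "%" has length 0, so it is the last resort.
        for (Rule rule : rules) {
            for (String s : rule.lhs) {
                String suffix = suffixText(s);
                if (word.endsWith(suffix) && suffix.length() > bestLen) {
                    best = rule;
                    bestLen = suffix.length();
                }
            }
        }
        Set<String> out = new LinkedHashSet<>();
        out.add(word);
        if (best != null) {
            String root = word.substring(0, word.length() - bestLen);
            for (String s : best.rhs) {
                out.add(root + suffixText(s));
            }
        }
        return out;
    }
}
```

With a single rule whose LHS and RHS are both %, s, and 's, expanding dog yields dog, dogs, and dog's, which matches the result shown above apart from case.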

The following example illustrates different forms of the synonym rules:

The rule is:

  d.e.c dec dec's: d.e.c dec dec's 
  digital-equipment-corp 
  digital-equipment-corporation 
  digital-equipment-corporation's; 
  dec december;

The result set includes:

  d.e.c 
  dec's 
  dec 
  december 
  digital-equipment-corp

The rule is:

  One 1; 
  First 1st;

The result set includes

  One  
  1 
  first 
  1st

The rule is

  whereas wherefore:;

The result set does not include any alternatives.

This type of rule is not strictly necessary because alternatives produced by the suffix rules are not likely to occur in any document. Such rules do, however, improve search performance because they prevent the generation of alternatives that would otherwise have to be looked up in the index files. Any words that appear in a thesaurus file that are also included in the stop file are not looked up.

Character variant generation and thesaurus searches

You can specify a thesaurus search and character variant generation in the same query. Combining the content of the two files allows SearchServer to generate meaningful queries while still providing a thorough cross-matching of terms. The following rules apply:

When you enable character variant generation, thesaurus possessive generation is automatically disabled. The character variant rules file can contain suffix rules, so this restriction prevents the generation of unwanted terms with double suffixes.
When you enable both a thesaurus search and variant generation, the thesaurus search executes first. Each term generated by the thesaurus search then produces its own set of variants which can result in a large set of equivalent terms.
Enabling character variant generation does not disable the suffix component of your thesaurus search. It is a good idea, therefore, to make sure any suffix rules are contained in only one file. Use the fthtest utility to test the interaction of the thesaurus and character variant files.

If you want to allow for the possibility of typographical variants in the terms being used in a thesaurus search, you can include all possible variant forms in the LHS of each thesaurus rule. To save time, you can perform this function automatically by compiling the thesaurus file and the character variant file together.

Compiling and testing your thesaurus file

Compile and test your customized thesaurus file using the fthmake and fthtest utilities.

The fthmake utility compiles the source file and enables you to name the object file. The fthtest utility is an interactive utility that lets you check the compiled object file and verify that the equivalent terms contained in the result set match the original search word.

Compiling

The fthmake utility compiles your thesaurus by reading the source file and creating an object file. If you are using a character variant rules file in addition to the thesaurus file, make sure that the thesaurus lookup includes any typographical rules that are duplicated in the character variant file. This ensures that any duplicated rules are incorporated into the thesaurus object file and are subsequently ignored in the variant rules file.

Use the following syntax when invoking the fthmake utility:

  fthmake sourcefile objectfile [-f text-reader_list] [-l rulesfile]

Where:

Component	Description
sourcefile	The name of the thesaurus source file. It must contain the .FTS file extension.
objectfile	The name of the thesaurus object file. It must contain the .FTH file extension. The object file name can have a maximum of eight characters in addition to the file extension.
text-reader_list	The text reader list that you want to apply to the source file. NOTE The fthmake utility does not support redirection of either input or output. You must use an explicit value for both the source file and the object file on the command line.
rulesfile	The name of the character variant rules file that you want to incorporate into the thesaurus object file. When you specify this option, every word contained in the LHS of a thesaurus source rule becomes subject to character variant expansion according to the rules contained in the variant file. Typographic variants of search terms are then eligible for thesaurus expansion.

To rebuild the sample thesaurus file SUPPORT.FTH, switch to the directory where the corresponding source file, SUPPORT.FTS, is located and enter:

  fthmake support.fts support.fth

You do not have to specify the -f parameter. The utility uses the default text reader (nti:s) to read the source file. If you select the translation text reader, the source text is translated to the FTICS equivalent. If the source is already in FTICS, you should use the standard text reader (s).
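
If you launch fthmake from your own tooling, it can help to check the file name rules before the utility runs. The following is a minimal sketch only: the FthmakeArgs class and its build method are hypothetical names, not part of SearchServer. It assembles the argument list from the syntax shown above and enforces the .FTS and .FTH extensions and the eight-character limit on the object file name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not a SearchServer API): builds and checks the
// fthmake command line described in the syntax table above.
class FthmakeArgs {

    static List<String> build(String sourceFile, String objectFile,
                              String textReaderList, String rulesFile) {
        if (!sourceFile.toLowerCase(Locale.ROOT).endsWith(".fts")) {
            throw new IllegalArgumentException("source file must end in .FTS");
        }
        if (!objectFile.toLowerCase(Locale.ROOT).endsWith(".fth")) {
            throw new IllegalArgumentException("object file must end in .FTH");
        }
        // Drop any directory part, then the four-character extension.
        int slash = Math.max(objectFile.lastIndexOf('/'), objectFile.lastIndexOf('\\'));
        String baseName = objectFile.substring(slash + 1, objectFile.length() - 4);
        if (baseName.isEmpty() || baseName.length() > 8) {
            throw new IllegalArgumentException("object file name must be 1 to 8 characters");
        }
        List<String> args = new ArrayList<>();
        args.add("fthmake");
        args.add(sourceFile);
        args.add(objectFile);
        if (textReaderList != null) {   // optional -f text reader list
            args.add("-f");
            args.add(textReaderList);
        }
        if (rulesFile != null) {        // optional -l character variant rules file
            args.add("-l");
            args.add(rulesFile);
        }
        return args;
    }

    public static void main(String[] args) {
        // prints: fthmake support.fts support.fth
        System.out.println(String.join(" ", build("support.fts", "support.fth", null, null)));
    }
}
```

The returned list can be handed to java.lang.ProcessBuilder. Because fthmake does not support redirection, both file names are always passed explicitly.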

If the utility encounters any compilation errors, it generates a standard error message before it exits. Compilation errors include:

Cannot read the source file
The object file cannot be created because of an access protection
Source file syntax error
A suffix rule that appears more than once in the source file

If the utility encounters a problem writing to any part of the object file, the following message appears:

  Can't write objectfile

If the object file was created but writing has not completed due to an error, the object file is removed.

Testing the object file

Use the fthtest utility to test your compiled object file. Use the following syntax to test the thesaurus expansion using one or more terms:

  fthtest objectfile [term]

Specify the following command line to use the full capability of the utility:

  fthtest term -h objectfile [-c table_name][-t outfilename][-l rulesfile]

Where:

Component	Description
objectfile	The name of the compiled thesaurus object file. If you invoke the utility using just the object filename, fthtest prompts you to enter a term, looks it up in the thesaurus file and reports the results. If you include a term on the command line, fthtest looks it up and reports the results.
-c table_name	Tells the fthtest utility to test the generated terms against the specified table. Only the equivalent terms found in the table are returned. You can optionally specify the name of an output file to record the results of the test.
-t outfilename	The name of the optional output file that you can create to contain the report results.
-l rulesfile	Causes fthtest to apply character variant expansion after performing thesaurus expansion. This tests the interaction of the two expansion forms using the thesaurus and character variant rules files. rulesfile indicates the name of the character variant rules file that you want to test with your thesaurus file. If you specify a character variant rules file, fthtest performs a character variant expansion on the terms generated by the thesaurus expansion.

The fthtest utility exits when it reaches the end of the input file or when you enter quit, press Enter (for MS-DOS), CTRL+Z (for 32-bit Windows), or CTRL+D (for UNIX).

Test messages

The following messages indicate the results of the test:

Synonym followed by a list of synonyms
Synonym empty occurs if a matching synonym rule has an empty RHS.
Suffix followed by a list of words formed by combining the alternative with the root word.
Suffix empty occurs when a matching suffix rule does not have an RHS.
Converts to nothing occurs in response to an input term. It indicates a failure to read the thesaurus object file after it was opened.
Failed to open file for input occurs if you specified an invalid thesaurus file.

The following is an example of an interactive test session using fthtest and the sample thesaurus file SUPPORT.FTH:

  fthtest support.fth  
  237: enter term:  
  pony  
  240: suffix:  
  ponie's  
  ponies  
  pony  
   
  237: enter term:  
  disc  
  238: synonym:  
  disk  
  disc  
  disks  
  floppy  
  floppies  
  diskette  
  diskettes

The following example uses fthtest to test the interaction between the sample thesaurus file and a character variant rules file.

  fthtest disc -h support.fth -l fultext.ftl

The search words generated in a search on the support table would include:

disk
disk's
disc
disc's
discs
disks
disks's
discs
discs's
discss
disks

floppy
floppy's
floppies
floppies's
floppiess
diskettes
diskette's
diskettes
diskettes's
diskettes
floppys

The utility first applies thesaurus expansion to the term disc, which produces the alternatives disk, disks, disc, discs, floppy, floppies, diskette, and diskettes. These alternative forms are then expanded using the character variant rules.

Placing your object file

You can avoid overwriting any existing thesaurus files that you might have by compiling your new file into a temporary object file. Once you test it, you can copy it or rename it to replace the existing object file.

Configuring SearchServer

You can use the SilverStream Management Console (SMC) to configure these Fulcrum SearchServer properties:

How often Fulcrum SearchServer checks and performs incremental indexing for rows whose full-text-search columns have been modified
The maximum number of hits Fulcrum will return for a full text search query
Whether SilverStream will fully reindex tables marked for full text search at server startup time

For more information, see the chapter on maintaining SilverStream in the Administrator's Guide.

Search types

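This section covers the following:

Pattern matching
Single word searches
Wildcard searches
Text string searches
Combination searches
Relevance ranking searches
Thesaurus searches
Proximity searches

Each of these topics is described below, with a brief description and examples.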

Pattern matching

A pattern is a character string that you use to search for words or phrases in a column. The pattern syntax is as follows:

  <pattern> ::= <character string literal> [<escape clause>]
  <escape clause> ::= ESCAPE <quote> [<non quote character>] <quote>

A pattern is formed like a character string but SearchServer interprets it differently. It is distinguished from a character string literal by its optional escape clause. SearchServer interprets a pattern differently depending on the index mode of the column being searched.

Interpreting patterns

SearchServer recognizes the extent of a word based on the lexical rules of Latin languages, such as English. This means that a word is defined as any sequence of letters or digits delimited by white space (spaces, newlines, tabs, etc.) or punctuation characters. For example, you can enter a term as a complete or incomplete word, or you can embed a comma or a period in a numeric word to represent monetary values, as shown:

  '1,016.31'

A space and the following punctuation characters take on a special meaning when they are embedded in a pattern:

Hyphen (-)
Backslash (\)
Underscore (_)
Percent (%)

Use the backslash (\) as the escape character to search for one of these characters in a table column.

SearchServer is not case-sensitive for alphabetic characters in a pattern or for search text. However, the case-sensitivity of pattern matching can be controlled for each table.

Each internal character set included with SearchServer has a set of parsing rules included with it. These parsing rules define how indexing treats each character in a character set.

The following table shows possible word and phrase matches for patterns used in a column.

Character	Pattern	Matched text, Normal Index Mode	Matched text, Literal Index Mode
Space ( )	'on line'	On<tab>line, On<newline>line, On line	On<tab>line, On<newline>line
Escape (\) with a space ( )	'ON\ line'	On<tab>line, On<newline>line, On Line	On line
Hyphen (-)	'on-line'	On<tab>line, On<newline>line, Online, On line, On-line, On;line, On.line, On&line, On ; line, On. Line, On &line, On; line, On. Line, On/line, On a line	On-line
Escape (\) with hyphen (-)	'On\-Line'	On<tab>line, On<newline>line, On line, On-line, On;line, On.line, On&line, On ; line, On. Line, On &line, On; line, On. Line, On\line, On a line	On-line
Any punctuation character	'On&line'	On<tab>line, On<newline>line, On line, On-line, On;line, On.line, On&line, On ; line, On. Line, On &line, On; line, On. Line, On\line, On a line	On&line
Underscore (_)	'Wo_d'	Word, Wood	Word, Wo-d, wood
Percent (%)	'wor%'	Word, Wordage, Wording, Wordless, Wordplay, Words, Wordsmith, Wordy, Wor%, Wor&, Wor a	Word, Wordage, Wording, Wordless, Wordplay, Words
Escape (\) with percent (%)	'wor\%'	Wor%	Wor%

NOTE These examples do not exactly reflect where the match codes would be placed for highlighting. The LITERAL index mode examples are assumed to be extracted from text where they are delimited by LITERAL mode separator characters (for example, tab or newline characters). Because this table is meant only to illustrate various possibilities, some of the search terms cannot be verified using the SUPPORT table.

Using special characters in a pattern

You can use special characters in your search queries that are interpreted differently when you embed them in a pattern.

Use the space character ( ) to separate search terms that must occur in sequence.
Use the hyphen (-) character to search for hyphenated and non-hyphenated pattern forms.
Use the underscore (_) character as the wildcard character when you want to match a single character.
Use the percent (%) character in a wildcard search when you want to match a string of characters.
Search for the literal meaning of a special character by inserting a backslash (\) character immediately before the special pattern character.
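
When a search term comes from user input, the special characters listed above may need to be escaped so that they are matched literally. The following is a minimal sketch, assuming that a backslash placed before each special character (space, hyphen, backslash, underscore, percent) is sufficient. PatternEscaper is a hypothetical helper name, not part of SilverStream or SearchServer.

```java
// Hypothetical helper: prefixes each special pattern character with a
// backslash so that it is searched for literally.
class PatternEscaper {

    // The characters this chapter lists as special inside a pattern.
    private static final String SPECIAL = " -\\_%";

    static String escape(String term) {
        StringBuilder out = new StringBuilder();
        for (char c : term.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("wor%"));    // prints wor\%
        System.out.println(escape("on-line")); // prints on\-line
    }
}
```

Whether you escape the space and the hyphen depends on the search you want: leaving them unescaped keeps the broader matching shown in the table of pattern examples.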

Searching for accented characters in a pattern

SearchServer ignores accents by default. When it encounters an accented character in a pattern or in column data, it ignores the accent and treats the character as its unaccented equivalent. For example, if you issued the following search query:

  candidates fullTextSearch "'resume'"

the result set would include the following words:

  resume
  resumé

Single word searches

This is the simplest type of search. When you include a single search term in your query, the search engine returns the table rows that contain that term. In the following example, the query searches the document table for the word "rutabaga."

  document fullTextSearch "'rutabaga'"

The result set contains all the table rows where this word occurs.

Wildcard searches

The following example shows what a wildcard query statement looks like:

  candidates fullTextSearch "'respons%'"

This statement tells SearchServer to search the candidates table for all words beginning with the string respons. The result set includes:

  Response 
  Responsibility 
  Responsible 
  Responsive

The percent sign acts as the wildcard character in this example. It represents a string of characters. You can use it anywhere within a word in a query. If you want to embed the percent sign as a literal in a word or phrase, you must preface it with the backslash (\) escape character. You can also use the underscore character as a wildcard. It represents a single character, as shown in the following example:

  candidates fullTextSearch "'respons_'"

The result set for this search only contains one word: response.
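
To get a feel for how the two wildcard characters behave on individual words, you can emulate them with a regular expression. The sketch below is an illustration only and is not how SearchServer evaluates patterns: it assumes that % stands for any run of characters, that _ stands for exactly one character, that a backslash makes the next character literal, and that matching ignores case. WildcardEmulator is a hypothetical class name.

```java
import java.util.regex.Pattern;

// Illustration only: emulates % and _ on a single word with java.util.regex.
class WildcardEmulator {

    static Pattern toRegex(String pattern) {
        StringBuilder re = new StringBuilder();
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '\\' && i + 1 < pattern.length()) {
                // Escaped character: take the next character literally.
                re.append(Pattern.quote(String.valueOf(pattern.charAt(++i))));
            } else if (c == '%') {
                re.append(".*");   // any run of characters
            } else if (c == '_') {
                re.append('.');    // exactly one character
            } else {
                re.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(re.toString(), Pattern.CASE_INSENSITIVE);
    }

    static boolean matches(String pattern, String word) {
        return toRegex(pattern).matcher(word).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("respons%", "Responsibility")); // prints true
        System.out.println(matches("respons_", "Responsible"));    // prints false
        System.out.println(matches("respons_", "response"));       // prints true
    }
}
```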

Text string searches

You can search for text strings as well as individual words. In the following example, the query searches the document table for instances of the phrase "Now is the winter of our discontent." You must surround the phrase you want to search for with single quotes.

  document fullTextSearch "'now is the winter of our discontent'"

Combination searches

You can combine words or predicates in a search query by using the AND clause, the ampersand character (&), the pipe character (|), or the tilde (~). The following example searches the document table for rows that contain the word Shakespeare or the word Marlowe:

  document fullTextSearch "'Shakespeare'|'Marlowe'"
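
If you assemble such expressions in code, small helpers keep the quoting consistent. The following is a sketch under stated assumptions: FullTextWhere and its methods are hypothetical names, and the sketch does not handle search terms that themselves contain quotation marks. It only reproduces the string layout used in the examples in this chapter.

```java
// Hypothetical helper: builds the expression strings shown in this chapter.
class FullTextWhere {

    // Wraps a word or phrase in single quotes.
    static String term(String text) {
        return "'" + text + "'";
    }

    // Joins quoted terms with the pipe character.
    static String anyOf(String... terms) {
        StringBuilder sb = new StringBuilder();
        for (String t : terms) {
            if (sb.length() > 0) {
                sb.append('|');
            }
            sb.append(term(t));
        }
        return sb.toString();
    }

    // Produces: table fullTextSearch "criteria"
    static String where(String table, String criteria) {
        return table + " fullTextSearch \"" + criteria + "\"";
    }

    public static void main(String[] args) {
        // prints: document fullTextSearch "'Shakespeare'|'Marlowe'"
        System.out.println(where("document", anyOf("Shakespeare", "Marlowe")));
    }
}
```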

Relevance ranking searches

You can calculate the relevance of each row in a table by using the relevance function. The following table describes the relevance ranking options.

Option	Relevance ranking	Description
2:1	Hits Count	Returns the total number of occurrences of the individual words that match the search criteria, regardless of how often the word appears in the table.
2:2	Terms count	Returns the number of search terms that were matched regardless of how often the term appears in the table.
2:3	Terms Ordered	Returns both the number of times each search term occurs, and how common the terms are over the rows in the table.
2:4	Critical Terms Ordered	Ranks the result set in such a way that the most important terms appear before the terms that appear most frequently in the table.

The data type for the value that is returned by the relevance predicate is INTEGER. The value can either be null or a positive integer. The minimum value is one. The maximum value depends on which option you specify in the query. When you do not specify an option the return value is null.

The search query in the following example searches the candidates table for the words shakespeare or marlowe and orders the result set by hits count (option 2:1):

  candidates fullTextSearch "'shakespeare'|'marlowe' order by relevance('2:1')"
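
The four ranking options map naturally onto an enumeration when you generate the order by clause in code. This is a minimal sketch: RelevanceOrder and Ranking are hypothetical names, and the option strings are the ones listed in the table above.

```java
// Hypothetical helper: appends an "order by relevance" clause to search criteria.
class RelevanceOrder {

    enum Ranking {
        HITS_COUNT("2:1"),
        TERMS_COUNT("2:2"),
        TERMS_ORDERED("2:3"),
        CRITICAL_TERMS_ORDERED("2:4");

        final String option;

        Ranking(String option) {
            this.option = option;
        }
    }

    static String orderBy(String criteria, Ranking ranking) {
        return criteria + " order by relevance('" + ranking.option + "')";
    }

    public static void main(String[] args) {
        // prints: 'shakespeare'|'marlowe' order by relevance('2:1')
        System.out.println(orderBy("'shakespeare'|'marlowe'", Ranking.HITS_COUNT));
    }
}
```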

Thesaurus searches

The SearchServer search and retrieval engine enables you to search a table for occurrences of words and their equivalents. When you install SearchServer, a standard thesaurus file containing common words is provided. As described earlier, you can customize this file to add words of your own, or you can create a new thesaurus file.

The following example shows a query statement that searches the resume column of the candidates table using the word_synonym option.

  candidates fullTextSearch "thesaurus('applicant', word_synonym)"

The result set contains all the table rows that contain the word "applicant" and its equivalents.

This example shows a query statement that searches the candidates table using the word_suffix option:

  candidates fullTextSearch "thesaurus('applicant', word_suffix)"

The result could include the following words:

  applicants 
  applicant's 
  applicants'

The following example shows a query statement using the word_similarity option. This option combines the word_suffix and word_synonym options, giving synonym processing priority over suffix processing. If there is a synonym match, there is no further search for an additional suffix match. If there is no synonym match, it performs suffix processing.

  candidates fullTextSearch "thesaurus('applicant', word_similarity)"

The result set could contain the following:

  candidate  
  candidates  
  applicant  
  applicant's
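
The priority rule for word_similarity can be pictured as a two-step lookup. The sketch below is a simplification with hypothetical names and data: it stores whole-word rules in two maps, whereas real suffix rules operate on word endings, and it assumes that a term with no matching rule is returned unchanged.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only: synonym rules take priority over suffix rules.
class SimilarityLookup {

    private final Map<String, List<String>> synonyms = new HashMap<>();
    private final Map<String, List<String>> suffixes = new HashMap<>();

    void addSynonymRule(String term, String... alternatives) {
        synonyms.put(term, Arrays.asList(alternatives));
    }

    void addSuffixRule(String term, String... alternatives) {
        suffixes.put(term, Arrays.asList(alternatives));
    }

    List<String> expand(String term) {
        List<String> hit = synonyms.get(term);
        if (hit != null) {
            return hit;               // synonym match: no suffix lookup follows
        }
        hit = suffixes.get(term);     // no synonym match: try suffix rules
        // Assumption: a term that matches no rule is returned unchanged.
        return hit != null ? hit : Collections.singletonList(term);
    }

    public static void main(String[] args) {
        SimilarityLookup lookup = new SimilarityLookup();
        lookup.addSynonymRule("applicant", "applicant", "candidate");
        lookup.addSuffixRule("applicant", "applicants");
        lookup.addSuffixRule("pony", "ponies", "pony");
        System.out.println(lookup.expand("applicant")); // prints [applicant, candidate]
        System.out.println(lookup.expand("pony"));      // prints [ponies, pony]
    }
}
```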

The word_broaden and word_narrow options are equivalent to the word_synonym option. They are included for clarity if the thesaurus file is intended to broaden or narrow the specified term, as shown in the following example:

  candidates fullTextSearch "thesaurus('applicant', word_narrow)"

The result set could include the following:

  applicant  
  candidate

Proximity searches

The proximity predicate enables you to test for the proximity of multiple search lists. SearchServer determines proximity by counting the indexed characters from the end of one search term or phrase to the beginning of another. This predicate evaluates to TRUE if the search terms are within the specified distance.

In the following example, the documents table is searched for the proximity of the terms foo and bar.

  documents fullTextSearch "'foo' within 10 characters of 'bar'"

SearchServer searches for any occurrence of the specified words regardless of the order in which they appear in the table. If you specify the IN_ORDER option in your search query, SearchServer searches for the words in the exact order they appear in your query.
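
The distance test can be sketched on a plain string to show what "within 10 characters" means. This is an illustration only, with hypothetical names: it counts every character between the end of one term and the start of the other, whereas SearchServer counts indexed characters, so results on real data may differ. The inOrder flag mimics the effect of the IN_ORDER option.

```java
import java.util.Locale;

// Illustration only: checks whether two terms occur within maxChars of each other.
class ProximityCheck {

    static boolean within(String text, String first, String second,
                          int maxChars, boolean inOrder) {
        if (first.isEmpty() || second.isEmpty()) {
            throw new IllegalArgumentException("search terms must not be empty");
        }
        String t = text.toLowerCase(Locale.ROOT);
        String a = first.toLowerCase(Locale.ROOT);
        String b = second.toLowerCase(Locale.ROOT);
        // Compare every occurrence of a with every occurrence of b.
        for (int i = t.indexOf(a); i >= 0; i = t.indexOf(a, i + 1)) {
            for (int j = t.indexOf(b); j >= 0; j = t.indexOf(b, j + 1)) {
                if (inOrder && i > j) {
                    continue;   // first term must precede the second
                }
                // Characters between the end of the earlier term and the
                // start of the later one; negative means they overlap.
                int gap = (i < j) ? j - (i + a.length()) : i - (j + b.length());
                if (gap >= 0 && gap <= maxChars) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(within("foo and then bar", "foo", "bar", 10, false)); // prints true
        System.out.println(within("bar foo", "foo", "bar", 10, true));           // prints false
    }
}
```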

Creating search queries

You can create search queries in two ways: with the Expression Builder or with the Programming Editor.

Using the Expression Builder

Creating a full text search form

You can use the Property Inspector to create a full text search form.

To create a full text search form:

Select the Form icon from the SilverStream Designer. A list of available forms appears.
Select the form that you want and double-click. The Form Designer appears.
Select the whole form and access the Property Inspector.
Select the Form property tab.
Select the plus sign next to Data on the property sheet to expand the list. The Where clause field appears.
Select the ellipses next to this field. The Expression Builder appears.
Double click on the table name that appears in the Variables pane. The table name appears in the field at the bottom of the dialog. The table name must be the first entry in the query.
Select the plus sign next to Other in the Operators pane. A list of miscellaneous operators appears.
Select Full Text Search from the list. It appears in the field at the bottom of the Expression Builder dialog.
Add your search statement. It must be enclosed in double quotation marks, and the word you are searching for must appear in single quotes. The following example shows a thesaurus query statement:
```
  candidates fullTextSearch "thesaurus ('address', word_synonym)"  
```

Creating queries for data-loaded list controls

You can create search queries for the following data loaded controls on a form:

Tree controls
Combo boxes
Choice boxes

To create search queries for data-loaded controls:

Put your cursor on the control on the form and press the right mouse button. The Property Inspector appears.
Select the plus sign that appears next to List Choices to expand the list.
Select From Table from the Load Choices drop-down list.
Select the table that you want to use from the drop-down list in the Table Name field.
Select the ellipses in the Where clause field to access the Expression Builder and build your search query.

Creating full text search views

To build search queries for bands within a view:

Select the view from the SilverStream Designer window. The View Designer appears.
Select the band you want to create the search query for and open the Property Inspector.
Select the Band property sheet.
Select the ellipses that appear next to the Where clause field to access the Expression Builder.
Build your search query.

Creating search queries for business objects

To create a search query for a business object:

Select the Objects icon from the SilverStream Designer. A list of available business objects appears.
Select a business object and double-click. The Business Object Designer appears.
Open the Property Inspector.
Select the Data tab.
Select the ellipses in the Where clause field to access the Expression Builder and build your search query.

Using the Programming Editor

You can create search queries programmatically using the Programming Editor. You could use this method to create queries for buttons on a form, for example.

To build search queries using the Programming Editor:

Select the type of button you want to use on your form by choosing one from the Form Designer toolbar.
Place the button on your form and double-click. The Programming Editor appears.
Select the Java mode button. The Programming Editor window changes to enable you to enter Java code.
Enter the appropriate Java for the type of query that you want associated with the button. For example, if you wanted the button to trigger a simple single word search query on a field, you would enter code similar to the following:
```
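  // Read the search term that the user typed into field1.
  String searchstr = field1.getText();
  try
  {
      // Builds the expression: tablename fullTextSearch "'<term>'"
      agData.query("tablename fullTextSearch \"'" + searchstr + "'\"");
  }
  catch (AgoException e)
  {
      agDialog.displayError(e);
  }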
```