SilverStream uses the Fulcrum SearchServer full text search and retrieval engine so that users of your application can search through large amounts of information.
NOTE The SilverStream help system uses full text search.
This chapter describes the following topics:
NOTE The following documentation is adapted from the Fulcrum SearchServer SearchSQL Reference and the SearchServer Data Preparation and Administration Guide. For more information on the topics discussed in the following sections, refer to the SearchServer online documentation (.HLP files) located in your local fulcrum\bin directory.
The SilverStream full text search feature is based on the ANSI Structured Query Language (SQL), which is the standard interface language for accessing databases. SQL provides language extensions that support text retrieval. The combination of the queries you create in SilverStream and the search engine provided by Fulcrum enables you to:
You can create a number of different searches. You can:
NOTE You can install the SilverStream help system as a database and use the full text search engine to look for keywords and phrases.
Creating full text searches
You can use the Expression Builder to create your own full text search expressions. You can also create full text search expressions programmatically.
You can create search queries for:
NOTE You cannot use full text search on date, time, and timestamp fields.
You must install SearchServer separately from SilverStream, and you must install it before you install the SilverStream software. The SearchServer software is included on the SilverStream CD. Refer to the SilverStream Installation Guide for a description of this procedure. Installing SearchServer creates the fulcrum\fultext directory on your PC. It includes the following files:
All of these files are described in more detail in the appropriate sections in this chapter. You can also look in the online documentation located in your local fulcrum\bin directory.
Before you begin creating and executing search queries, you must:
NOTE For a definition of Indexing, see Index.
Avoid these characters in primary key columns
Make sure that the primary key columns for your tables do not contain the following characters as data:
: ! @ ,
If primary key columns contain any of these characters, SearchServer will be unable to search the table.
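The restriction above can be checked before a table is marked as full text searchable. The following is a minimal sketch in Java; the class and method names are hypothetical and are not part of the SilverStream API.

```java
final class PrimaryKeyCheck {
    // Characters that, per the text above, must not appear in primary key data.
    private static final String FORBIDDEN = ":!@,";

    /** Returns true if the key value contains none of the forbidden characters. */
    static boolean isSearchSafe(String keyValue) {
        for (int i = 0; i < keyValue.length(); i++) {
            if (FORBIDDEN.indexOf(keyValue.charAt(i)) >= 0) {
                return false;
            }
        }
        return true;
    }
}
```

You could run a check like this over existing key values to find rows that would prevent SearchServer from searching the table.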
What happens
The server automatically indexes all tables marked as full text searchable every time records are updated or deleted.
Manually indexing tables
SilverStream provides the class AgFullText, which contains two methods, index() and indexBlock(), that you can call to manually index a table for full text search if data changes occur externally to SilverStream. For more information, see the API online documentation.
Whenever you customize a thesaurus or stop file, you must also reindex the tables associated with it.
You can create full text search queries anywhere in SilverStream that you can put a WHERE clause. This includes forms, views, pages, business objects, and data-loaded form controls. You can create full text search queries in one of two ways:
For a description of the Expression Builder, see Expression Builder.
Programmatically, by calling the query() method of a form, page, view, or other control.
This section covers the following:
The syntax of the query statement varies depending on how complex you want the query to be. For example, the syntax for a single word search looks like this:
tablename fullTextSearch "'literal'"
The syntax for a more complex query looks like this:
tablename fullTextSearch "predicate ('literal' predicate_option)"
When you use the Expression Builder to build a search query, you do not have to worry about using the correct SearchServer syntax because SilverStream translates the query for you. For example, to search the CARS table directly in SearchServer, you would have to enter a statement similar to the following:
SELECT * FROM CARS WHERE DESCRIPTION CONTAINS 'HOTROD'
The same query built in the Expression Builder looks like this:
cars fullTextSearch "'hotrod'"
Every full text search query you create must begin with the name of the table being searched.
When you create a query by coding it in Java using the Programming Editor, use the following syntax:
"tablename fullTextSearch \"'"+ search + "'\""
You start by identifying the table, the same way you do when you use the Expression Builder. Surround the search terms (string variables) in single and double quotes. To use double quotes within a Java string, precede them with the backslash character. SearchServer searches the table you specified for terms that match the search criteria. It then creates a working table that contains the rows that meet those criteria.
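The quoting described above is easy to get wrong by hand, so it can help to build the string in one place. The following is a minimal sketch; the class and method names are hypothetical, and the sketch does not escape quote characters that might appear inside the search term itself.

```java
final class FullTextQuery {
    /**
     * Builds a single-term query of the form:
     *   tablename fullTextSearch "'term'"
     * The table name comes first, and the term is wrapped in single
     * quotes inside escaped double quotes.
     */
    static String singleTerm(String tableName, String term) {
        return tableName + " fullTextSearch \"'" + term + "'\"";
    }
}
```

For example, FullTextQuery.singleTerm("cars", "hotrod") produces the same text as the Expression Builder query shown earlier for the CARS table.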
Stop files identify common words, such as "or" and "the", that you do not want indexed. Words that are not indexed cannot be searched. If you include a stop word in a search query, SearchServer treats it as though it matches every row in the table you are searching.
The FULTEXT.STP file is supplied with the SearchServer software. You can add your own stop words to this file. A stop file can also contain character class definitions that modify the rules that SearchServer uses to recognize numeric punctuation. Stop files improve the search engine's indexing and search capabilities by eliminating unnecessary searches. You can customize the existing stop file or you can create your own using any text editor or word processing package.
Stop files typically contain alphabetic words, but they can also contain other characters. You should not include a word in the stop file unless the word has absolutely no search value in all contexts. For example, the letter a is not included in the FULTEXT.STP file because it could be an important designator in some cases, such as searching for the term Appendix A.
The FULTEXT.STP file contains the following words:
Stop files can contain as many as 1024 words totaling no more than 10,000 characters. Each entry in the file must be unique.
NOTE When you modify a stop file, you must reindex all the tables associated with it.
You can add multiple words to a line in the stop file. You must conform to the following syntax rules:
Any sequence of characters, excluding the space character, the number sign (#), and the equality symbol (=) | |
An optional carriage return character followed by a line feed character |
SearchServer performs case normalization for alphabetic characters automatically, so it does not matter whether you add words using uppercase or lowercase letters.
You cannot include accented characters in stop words unless you enable accent indexing in the configuration files of the tables associated with the stop file.
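The limits and syntax rules above can be checked before you save a customized stop file. The following Java sketch is illustrative only: the class name is hypothetical, and it assumes that uniqueness is judged after case normalization, which the text implies but does not state outright.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class StopWordCheck {
    static final int MAX_WORDS = 1024;
    static final int MAX_TOTAL_CHARS = 10000;

    /** Returns a list of problems; an empty list means no rule was violated. */
    static List<String> validate(List<String> words) {
        List<String> problems = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        int totalChars = 0;
        for (String word : words) {
            totalChars += word.length();
            // Entries may not be empty or contain a space, '#', or '='.
            if (word.isEmpty() || word.matches(".*[ #=].*")) {
                problems.add("invalid entry: " + word);
            }
            // Assumption: entries that differ only in case count as duplicates.
            if (!seen.add(word.toLowerCase(Locale.ROOT))) {
                problems.add("duplicate entry: " + word);
            }
        }
        if (words.size() > MAX_WORDS) {
            problems.add("more than " + MAX_WORDS + " words");
        }
        if (totalChars > MAX_TOTAL_CHARS) {
            problems.add("more than " + MAX_TOTAL_CHARS + " characters in total");
        }
        return problems;
    }
}
```

The sketch does not check accented characters, because that rule depends on the configuration of the tables associated with the stop file.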
SearchServer enables you to use the character variant search feature, which treats typographical variants of a word as equivalents for search purposes. This feature makes sure that potential mismatches in a search due to subtleties of language or other external restrictions are avoided. For example, you can tell SearchServer to include the German word Frühling as an equivalent for the word Fruehling in a search query.
Character variant generation is controlled by the character variant rules contained in the character variant rules file. These rules contain instructions for removing or inserting accents, as well as modifying the suffix of a query term. SearchServer supports English, French, German, and other European language character variants.
There are three character variant rules files included with the SearchServer software:
You can modify any of these three files or you can create a new character variant rules file using any text editor or word processing package.
Use the fthtest utility to test the file. For more information about this utility, see Testing the file.
SilverStream sets the character variant file to fultext.ftl. The only time you should modify this file is when you want to use character variant functionality. Variant generation operates under the assumption that the string to substitute can be completely replaced, regardless of context. The rules can include removing or including the accents in a query term, or modifying the suffix of a query term.
Each substitution causes a variant form to be added to the search along with the original search term. For example, a rules file could specify the replacement of every e by the three accented forms è, é, and ê. The search term donne would return the words donne, donné, donnê, and donnè. You cannot modify a replacement string with another rule.
The maximum number of rules per file is 40. You can apply a maximum of 30 simultaneous substitutions to a given word. If one of these limits is exceeded, SearchServer rejects the query. SearchServer also rejects a query if the format of a character variant rule does not conform to the syntax described in the following section.
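To make the substitution behavior concrete, the following Java sketch expands a term using single-character rules only. It is not SearchServer's algorithm: the class name is hypothetical, the 40-rule and 30-substitution limits are not enforced, and suffix rules are not handled. It does show that a replaced character is never passed through the rules a second time.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class VariantSketch {
    /**
     * Expands a term position by position. Each character is either kept
     * or swapped for one of its replacements; replacements are final and
     * are not modified by any other rule.
     */
    static Set<String> expand(String term, Map<Character, String> rules) {
        Set<String> results = new LinkedHashSet<>();
        results.add("");
        for (char c : term.toCharArray()) {
            String replacements = rules.getOrDefault(c, "");
            Set<String> next = new LinkedHashSet<>();
            for (String prefix : results) {
                next.add(prefix + c);                 // keep the original character
                for (char r : replacements.toCharArray()) {
                    next.add(prefix + r);             // add one variant per replacement
                }
            }
            results = next;
        }
        return results;
    }
}
```

With a rule that maps e to è, é, and ê, expanding donne yields the original term plus its three accented forms. Because every matching position multiplies the number of results, a term with several matching characters grows quickly.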
Each rule in the character variant file must be on its own line. Every rule has four fields. Each field has a specific starting column and a maximum length, as shown in the following table:
You must pad your target and replacement strings with space characters when they occupy fewer than four characters.
A suffix matching rule can have an empty target string. In this case every original term generates a character variant that has the replacement string appended as a suffix. Suffix rules are applied only to an ordinary word by itself or as the last component of an implied phrase. For example, given the terms friend% and micro-computer, suffix rules are only applied to the word computer.
Suffix rules are not applied to single-character words. The same rule applies to the last component of an implied phrase, where the last component must contain at least two characters to be eligible for suffix substitution.
NOTE The total number of terms that can result from a single search word can become very large when you are using several substitution rules at one time. SearchServer looks up each generated term, so a large number of search words (more than a few hundred) can slow response time to unacceptable levels even if only a few hits actually occur in a table.
Character variant generation is applied to stop words. To avoid searches for stop words, you must include all the variants of the stop words in the stop file.
The character variant rules file must be in FTICS. To allow convenient editing of this file using a 7-bit ASCII editor, the rules can contain certain multi-character sequences. This allows the representation of all characters in the FTICS.
The rules file is processed in much the same way as the test text reader. The test text reader recognizes a five-character sequence (beginning with \Fx and ending with a two-character hexadecimal representation) as a single character in the FTICS. Each of these sequences counts as only one character.
The following rules from FULTEXT.FTL append the plural suffix s and the English possessive suffix 's to a word:
%    s
%    's
In both cases the suffix is separated from the percent sign by exactly four spaces.
NOTE It is important that you use the correct spacing when creating rules. If you do not use the correct spacing, the line is ignored.
The following rules from the GERMAN.FTL file bidirectionally substitute the substring ue for ü:
:UE  \Fxc8U
:ue  \Fxc8u
:\Fxc8U  UE
:\Fxc8U  ue
In each rule exactly two spaces separate the target and replacement fields.
Character variant rules are case-sensitive. The sample rules files included with SearchServer contain redundant rules differing only in the case of the letters in the target field.
The case of the letters in the replacement field is not important because SearchServer performs case normalization before it performs a dictionary lookup.
To extend the equivalence of a string like ue and ü to single wildcard matching, you can include an additional rule: an indexed accent followed by an alphabetic character is treated as one character. To extend this to the character string ue, you have to include the following rule in your file:
:\Fx18   ue
where \Fx18 is a special code representing a single character wildcard. This rule must contain exactly three spaces between the target and replacement fields.
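Because a line with incorrect spacing is ignored, it can be safer to generate rule lines than to type them. The following Java sketch pads the target to four characters and counts each five-character \Fx sequence as a single character, as described above. The layout (one leading character, then a four-character target, then the replacement) is an assumption inferred from the examples in this section; the class name is hypothetical, and any further field is not modeled.

```java
final class VariantRuleLine {
    /** Counts characters, treating each \Fxhh escape sequence as one character. */
    static int logicalLength(String s) {
        return s.replaceAll("\\\\Fx[0-9A-Fa-f]{2}", "?").length();
    }

    /**
     * Formats a rule line. The leading character is ':' or '%' in the
     * examples in this section; the target is padded with spaces to four
     * logical characters before the replacement is appended.
     */
    static String format(char leading, String target, String replacement) {
        StringBuilder line = new StringBuilder().append(leading).append(target);
        for (int i = logicalLength(target); i < 4; i++) {
            line.append(' ');
        }
        return line.append(replacement).toString();
    }
}
```

Under these assumptions, a two-character target is followed by two spaces, the single-character wildcard code by three, and an empty suffix target by four, which matches the spacing stated for each example. Test any generated file with fthtest before you rely on it.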
Use the fthtest utility to test your character variant rules file. This utility enables you to verify how the equivalent terms generated by the rules file compare to the search term. Use the following syntax:
fthtest term -l rulesfile [-c tablename] [-t outfilename]
The fthtest utility exits when it reaches the end of the input file or when you enter quit and press Enter (for MS-DOS), CTRL+Z (for 32-bit Windows), or CTRL+D (for UNIX).
If you specify an invalid character variant rule, fthtest returns an error message.
The thesaurus file contains rules for generating plural and possessive forms of search words. It can also contain the spelled out versions of abbreviations and synonyms. These word variations enable you to perform thesaurus expansions when you search for a particular word and its variants. You can customize the thesaurus file (FULTEXT.FTH) that is installed when you install the SearchServer search and retrieval engine or you can create your own.
SilverStream sets the thesaurus file to FULTEXT.FTH. An .FTH file can be a binary or an object file. Use the fthmake utility to create a new FULTEXT.FTH file in the FULCRUM\FULTEXT directory. You must create a text source file for the utility to use to create the .FTH file.
In order to create your own thesaurus file, you must supply a source file. It must have the .FTS file extension. Use the fthmake utility to compile the source file. The compiled source file is referred to as the object file. It has an .FTH file extension. You can compile the source file either alone or with a character variant rule file, which is described in an earlier section.
If you write your thesaurus file using a character set that is different from the SearchServer character set (FTCS94), you must process the source file using the appropriate set of text readers. The text readers translate the characters into a format that SearchServer can recognize. You can specify which text readers to use when you invoke fthmake.
Once you successfully compile your thesaurus source file, you should use the fthtest utility to test the object file.
NOTE Always test your object file. If there is a problem with your thesaurus file, the results of a thesaurus search could be unpredictable.
If you are going to use your thesaurus file to search tables that reside on a remote node using a server other than your local server, it must be accessible to the remote server.
If SearchServer cannot read the thesaurus file, the expansion functions are disabled without warning. It tries to execute the search but does not generate any new terms.
Thesaurus files contain two kinds of rules: suffix rules and synonym rules. For example, given the following synonym rule:
small little tiny miniscule;
the result set of a search on the word small would also contain any instances of the words little, tiny, and miniscule.
Examples
The following example begins with suffix rules. The last two rules are synonym rules. Every rule in a thesaurus file must end with a semicolon:
+y: y ies 's;
+% s 's;
dog dogs dog's;
round roundabout rounded;
Thesaurus rules can have two parts: a left-hand side (LHS) and a right-hand side (RHS). The two sides are separated with a colon. The entire rule ends with a semicolon. Rules can span more than one line in the file. The words, phrases, and suffixes listed in each rule must be separated by spaces. If you include a phrase in the RHS of a rule, you must separate the words in the phrase with hyphens.
If you omit the colon separator and the RHS from a rule, SearchServer interprets the RHS to be the same as the LHS. If you include the colon but omit the RHS, no new search terms are generated and the original term is not changed. You can use this technique to suppress suffix expansion for selected words.
The LHS contains the words or suffixes you want to match when SearchServer looks a search term up in the thesaurus file. The RHS contains the list of alternative synonyms (which can be words or phrases) or suffixes.
When SearchServer matches a word with one of the LHS entries, the original term is either equated with the alternatives contained in the RHS or a new term is created by combining the root search word with the alternative suffixes contained in the RHS.
For synonym rules, the RHS should include plurals, possessives, and any other alternatives that can be derived from the root search word contained in the LHS. When the same root word appears in more than one LHS of more than one rule in your thesaurus file, the synonym lookup generates a list of alternatives that is a combination of the RHS of all the matching rules.
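The synonym lookup described above can be sketched in Java as follows. This is an illustration of the rule semantics only, not SearchServer's implementation: the class name is hypothetical, suffix rules (those beginning with +) are skipped, and lowercasing stands in for SearchServer's case normalization.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class SynonymSketch {
    /** Returns the term plus the RHS words of every rule whose LHS contains it. */
    static Set<String> expand(String term, String rulesSource) {
        String key = term.toLowerCase(Locale.ROOT);
        Set<String> result = new LinkedHashSet<>();
        result.add(key);
        for (String rule : rulesSource.split(";")) {
            rule = rule.trim().toLowerCase(Locale.ROOT);
            if (rule.isEmpty() || rule.startsWith("+")) {
                continue; // skip blank entries and suffix rules
            }
            int colon = rule.indexOf(':');
            // Without a colon, the RHS is taken to be the same as the LHS.
            String lhs = colon < 0 ? rule : rule.substring(0, colon);
            String rhs = colon < 0 ? rule : rule.substring(colon + 1);
            if (words(lhs).contains(key)) {
                result.addAll(words(rhs)); // an empty RHS adds nothing
            }
        }
        return result;
    }

    private static List<String> words(String side) {
        String trimmed = side.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }
}
```

When the same word appears in the LHS of more than one rule, the sketch combines the RHS of every matching rule. A rule with a colon but no RHS leaves the original term as the only result.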
For suffix rules, the LHS and optional RHS contain lists of suffixes separated by a space. You can include the percent sign (%) to represent null suffixes.
Suffix searches execute in the following manner:
SearchServer implements the following restrictions whenever a search accesses the current thesaurus file.
The following examples illustrate the suffix and synonym rules used in thesaurus searches.
+ % s 's;
a thesaurus search on the word Dog returns the following:
It is important to note that the preceding rules do not include the suffixes s' or ies'. SearchServer character classes associated with the word indexing rules cause a trailing apostrophe to be ignored for indexing purposes. So, when you execute a search for the word ponies, the word ponies' is included in the result set unless the word appears in a phrase. Because of this, you do not need to include normal possessive plurals.
The following example illustrates different forms of the synonym rules:
d.e.c dec dec's: d.e.c dec dec's digital-equipment-corp digital-equipment-corporation digital-equipment-corporation's;
dec december;
d.e.c dec's dec december digital-equipment-corp
One 1; First 1st;
One 1 first 1st
whereas wherefore:;
The result set does not include any alternatives.
This type of rule is not strictly necessary because alternatives produced by the suffix rules are not likely to occur in any document. Suffix rules improve search performance because they prevent the generation of alternatives that would otherwise have to be looked up in the index files. Any words that appear in a thesaurus file that are also included in the stop file are not looked up.
You can specify a thesaurus search and character variant generation in the same query. Combining the content of the two files allows SearchServer to generate meaningful queries while still providing a thorough cross-matching of terms. The following rules apply:
If you want to allow for the possibility of typographical variants in the terms being used in a thesaurus search, you can include all possible variant forms in the LHS of each thesaurus rule. To save time, you can perform this function automatically by compiling the thesaurus file and the character variant file together.
Compile and test your customized thesaurus file using the fthmake and fthtest utilities.
The fthmake utility compiles the source file and enables you to name the object file. The fthtest utility is an interactive utility that lets you check the compiled object file and verify that the equivalent terms contained in the result set match the original search word.
The fthmake utility compiles your thesaurus by reading the source file and creating an object file. If you are using a character variant rules file in addition to the thesaurus file, make sure that the thesaurus lookup includes any typographical rules that are duplicated in the character variant file. This ensures that any duplicated rules are incorporated into the thesaurus object file and are subsequently ignored in the variant rules file.
Use the following syntax when invoking the fthmake utility:
fthmake sourcefile objectfile [-f text-reader_list] [-l rulesfile]
To rebuild the sample thesaurus file SUPPORT.FTH, switch to the directory where the corresponding source file, SUPPORT.FTS, is located and enter:
fthmake support.fts support.fth
You do not have to specify the -f parameter. The utility uses the default text reader (nti:s) to read the source file. If you select the translation text reader, the source text is translated to the FTICS equivalent. If the source is already in FTICS, you should use the standard text reader (s).
If the utility encounters any compilation errors, it generates a standard error message before it exits. Compilation errors include:
If the utility encounters a problem writing to any part of the object file, the following message appears:
Can't write objectfile
If the object file was created but writing has not completed due to an error, the object file is removed.
Use the fthtest utility to test your compiled object file. Use the following syntax to test the thesaurus expansion using one or more terms:
fthtest objectfile [term]
Specify the following command line to use the full capability of the utility:
fthtest term -h objectfile [-c table_name][-t outfilename][-l rulesfile]
The fthtest utility exits when it reaches the end of the input file or when you enter quit and press Enter (for MS-DOS), CTRL+Z (for 32-bit Windows), or CTRL+D (for UNIX).
The following messages indicate the results of the test:
The following is an example of an interactive test session using fthtest and the sample thesaurus file SUPPORT.FTH:
fthtest support.fth
237: enter term: pony
240: suffix: ponie's ponies pony
237: enter term: disc
238: synonym: disk disc disks floppy floppies diskette diskettes
The following example uses fthtest to test the interaction between the sample thesaurus file and a character variant rules file.
fthtest disc -h support.fth -l fultext.ftl
The search words generated in a search on the support table would include:
disk |
floppy |
The utility applies thesaurus expansion to the term disc first which produces the alternatives disk, disks, disc, discs, floppy, floppies, diskette, and diskettes. These alternative forms are then expanded using the character variant rules.
You can avoid overwriting any existing thesaurus files that you might have by compiling your new file into a temporary object file. Once you test it, you can copy it or rename it to replace the existing object file.
You can use the SilverStream Management Console (SMC) to configure these Fulcrum SearchServer properties:
For more information, see the chapter on maintaining SilverStream in the Administrator's Guide.
This section covers the following:
Examples of each of these types of searches along with a brief description occur later on this page.
A pattern is a character string that you use to search for words or phrases in a column. The pattern syntax is as follows:
<pattern> ::= <character string literal> [<escape clause>]
<escape clause> ::= ESCAPE <quote> [<non quote character>] <quote>
A pattern is formed like a character string but SearchServer interprets it differently. It is distinguished from a character string literal by its optional escape clause. SearchServer interprets a pattern differently depending on the index mode of the column being searched.
SearchServer recognizes the extent of a word based on the lexical rules of Latin languages, such as English. This means that a word is defined as any sequence of letters or digits delimited by white space (spaces, newlines, tabs, etc.) or punctuation characters. For example, you can enter a term as a complete or incomplete word, or you can embed a comma or a period in a numeric word to represent monetary values, as shown:
'1,016.31'
A space and the following punctuation characters take on a special meaning when they are embedded in a pattern:
Use the escape character (|) to search for one of these characters in a table column.
SearchServer is not case-sensitive for alphabetic characters in a pattern or for search text. However, the case-sensitivity of pattern matching can be controlled for each table.
Each internal character set included with SearchServer has a set of parsing rules included with it. These parsing rules define how indexing treats each character in a character set.
The following table shows possible word and phrase matches for patterns used in a column.
NOTE These examples do not exactly reflect where the match codes would be placed for highlighting. The LITERAL index mode examples are assumed to be extracted from text where they are delimited by LITERAL mode separator characters (for example, tab or newline characters). Because this table is meant only to illustrate various possibilities, some of the search terms cannot be verified using the SUPPORT table.
You can use special characters in your search queries that are interpreted differently when you embed them in a pattern.
SearchServer ignores accented characters by default. When it encounters an accented character in a pattern or in column data, it ignores the accent and retains the unaccented character. For example, if you issued the following search query:
candidate fullTextSearch "'resume'"
The result set would include the following words:
resume
resumé
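The effect of ignoring accents can be illustrated outside SearchServer. The following Java sketch uses the standard java.text.Normalizer class to decompose each accented character and then drop the accent marks. It only mirrors the behavior described above and is not how SearchServer is implemented; the class name is hypothetical.

```java
import java.text.Normalizer;

final class AccentFold {
    /** Decomposes accented characters, then removes the combining marks. */
    static String stripAccents(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }
}
```

After folding, resumé and resume compare as equal, which is why both appear in the result set for a search on resume.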
This is the simplest type of search. When you include a single search term in your query, the search engine returns the table rows that contain that term. In the following example, the query searches the document table for the word "rutabaga."
document fullTextSearch "'rutabaga'"
The result set contains all the table rows where this word occurs.
The following example shows what a wildcard query statement looks like:
candidates fullTextSearch "'respons%'"
This statement tells SearchServer to search the candidates table for all words beginning with the string respons. The result set includes:
Response Responsibility Responsible Responsive
The percent sign acts as the wildcard character in this example. It represents a string of characters. You can use it anywhere within a word in a query. If you want to embed the percent sign as a literal in a word or phrase, you must preface it with the backslash (\) escape character. You can also use the underscore character as a wildcard. It represents a single character, as shown in the following example:
candidates fullTextSearch "'respons_'"
The result set for this search only contains one word: response.
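To see which words a wildcard pattern would accept, you can model the two wildcards with a regular expression. The following Java sketch is an approximation for illustration: the class name is hypothetical, it assumes the percent sign matches zero or more characters, and it does not handle the escape character.

```java
import java.util.regex.Pattern;

final class WildcardSketch {
    /** Converts a pattern using % and _ into a case-insensitive regular expression. */
    static Pattern toRegex(String pattern) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '%') {
                regex.append(".*");   // assumption: any run of characters, possibly empty
            } else if (c == '_') {
                regex.append('.');    // exactly one character
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }

    /** Returns true if the whole word matches the wildcard pattern. */
    static boolean matches(String pattern, String word) {
        return toRegex(pattern).matcher(word).matches();
    }
}
```

In this model, respons% accepts Response, Responsibility, Responsible, and Responsive, while respons_ accepts only response because the underscore stands for exactly one character.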
You can search for text strings as well as individual words. In the following example, the query searches the document table for instances of the phrase "Now is the winter of our discontent." You must surround the phrase you want to search for with single quotes.
document fullTextSearch "'now is the winter of our discontent'"
You can combine words or predicates in a search query by using the AND clause, the ampersand character (&), the pipe character (|), or the tilde (~). The following example combines the words Shakespeare and Marlowe with the pipe character in a search of the document table:
document fullTextSearch "'Shakespeare'|'Marlowe'"
You can calculate the relevance of each row in a table by using the relevance function. The following table describes the relevance ranking options.
The data type for the value that is returned by the relevance predicate is INTEGER. The value can either be null or a positive integer. The minimum value is one. The maximum value depends on which option you specify in the query. When you do not specify an option the return value is null.
The query in the following example searches the candidates table for the words shakespeare and marlowe and orders the result set by relevance:
candidates fullTextSearch "'shakespeare'|'marlowe' order by relevance('2:1')"
The SearchServer search and retrieval engine enables you to search a table for occurrences of words and their equivalents. When you install SearchServer, a standard thesaurus file containing common words is provided. As described earlier, you can customize this file to add words of your own, or you can create a new thesaurus file.
The following example shows a query statement that searches the resume column of the candidates table using the word_synonym option.
candidates fullTextSearch "thesaurus('applicant', word_synonym)"
The result set contains all the table rows that contain the word "applicant" and its equivalent.
This example shows a query statement that searches the candidates table using the word_suffix option:
candidates fullTextSearch "thesaurus('applicant', word_suffix)"
The result could include the following words:
applicants applicant's applicants'
The following example shows a query statement using the word_similarity option. This option combines the word_suffix and word_synonym options. It gives synonym processing priority over suffix processing. If there is a synonym match, there is no further search for an additional suffix match. If there is no synonym match, it performs suffix processing.
candidate fullTextSearch "thesaurus('applicant', word_similarity)"
The result set could contain the following:
candidate candidates applicant applicant's
The word_broaden and word_narrow options are equivalent to the word_synonym option. They are included for clarity if the thesaurus file is intended to broaden or narrow the specified term, as shown in the following example:
candidates fullTextSearch "thesaurus('applicant', word_narrow)"
The result set could include the following:
applicant candidate
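When you build thesaurus queries in code, an enumeration keeps the option names consistent. The following Java sketch lists the five options named in this section; the class, enum, and method names are hypothetical, and the comma between the term and the option follows the examples above.

```java
import java.util.Locale;

final class ThesaurusQuery {
    /** The thesaurus options named in this section. */
    enum Option { WORD_SYNONYM, WORD_SUFFIX, WORD_SIMILARITY, WORD_BROADEN, WORD_NARROW }

    /**
     * Builds a query of the form:
     *   tablename fullTextSearch "thesaurus('term', option)"
     */
    static String build(String tableName, String term, Option option) {
        String optionName = option.name().toLowerCase(Locale.ROOT);
        return tableName + " fullTextSearch \"thesaurus('" + term + "', " + optionName + ")\"";
    }
}
```

For example, ThesaurusQuery.build("candidates", "applicant", ThesaurusQuery.Option.WORD_NARROW) produces the text of the word_narrow query shown above.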
The proximity predicate enables you to test for the proximity of multiple search lists. SearchServer determines proximity by counting the indexed characters from the end of one search term or phrase to the beginning of another. This predicate evaluates to TRUE if the search terms are within the specified distance.
In the following example, the documents table is searched for the proximity of the terms foo and bar.
documents fullTextSearch "'foo' within 10 characters of 'bar'"
SearchServer searches for any occurrence of the specified words regardless of the order in which they appear in the table. If you specify the IN_ORDER option in your search query, SearchServer searches for the words in the exact order they appear in your query.
You can create search queries in two ways:
You can use the Property Inspector to create a full text search form.
To create a full text search form:
candidates fullTextSearch "thesaurus ('address', word_synonym)"
You can create search queries for the following data loaded controls on a form:
To create search queries for data-loaded controls:
To build search queries for bands within a view:
To create a search query for a business object:
You can create search queries programmatically using the Programming Editor. You could use this method to create queries for buttons on a form, for example.
To build search queries using the Programming Editor:
String searchstr = field1.getText();
try {
    // Search the table for the text the user entered in field1
    agData.query("tablename fullTextSearch \"'" + searchstr + "'\"");
} catch (Agoexception e) {
    // Report a failed query to the user
    agDialog.displayError(e);
}