Understanding the Internal Rewriter

iChain's internal rewriter is used to accomplish the following:


Which URL References Will Be Rewritten?

Web content passing through the accelerator that meets certain criteria (see What Other Criteria Are Considered?) is parsed and searched by the internal rewriter for URL references that qualify for rewriting. URL references are rewritten only under the following conditions:


What Other Criteria Are Considered?

The following additional criteria are considered when URL references are evaluated for rewriting:


Query Strings

The internal rewriter will not rewrite URL references contained within query strings. Only the host name portion of the reference is evaluated for rewriting.


HTTP Headers

The internal rewriter will rewrite qualified URL references occurring within certain types of HTTP response headers such as "Location" and "Content-Location". The Location header is used to redirect the browser to where the resource can be found. The Content-Location header is used to provide an alternate location where the resource can be found.
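
The header behavior described above can be illustrated with a short Python sketch. It is not iChain code: the function name, the internal_hosts set, and the published host www.example.com are hypothetical, and the sketch assumes that "rewriting" means replacing an origin host name with the accelerator's published DNS name. Any port in the original reference is dropped for simplicity. Only the host portion is changed, so a URL embedded in the query string is left alone.

```python
from urllib.parse import urlsplit, urlunsplit

# Response headers named in the text as subject to rewriting.
REWRITABLE_HEADERS = {"location", "content-location"}

def rewrite_headers(headers, internal_hosts, published_host):
    """Return a copy of headers with internal hosts replaced in rewritable headers.

    internal_hosts and published_host are illustrative inputs, not iChain settings.
    """
    out = {}
    for name, value in headers.items():
        if name.lower() in REWRITABLE_HEADERS:
            parts = urlsplit(value)
            # Only the host name portion is evaluated; path and query are kept.
            if parts.hostname and parts.hostname.lower() in internal_hosts:
                value = urlunsplit(parts._replace(netloc=published_host))
        out[name] = value
    return out
```

For example, a Location value of http://internal.web.site.com/login would come back with the published host in place of internal.web.site.com, while a Server header would pass through unchanged.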


JavaScript

Within JavaScript, only absolute references are normally evaluated for rewriting; relative references are never attempted. Absolute paths ("/path/file.html") are evaluated only if the file is read from the origin Web server of a path-based multi-homing accelerator and the reference follows an HTML tag. For example, the string "href='/path/file.html'" would be rewritten to "href='/accelPath/path/file.html'".
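
The absolute-path example above can be reproduced with a small Python sketch. This is only an approximation of the described result, not the rewriter's actual algorithm: it handles just the href= form used in the example, and the function name and the accel_path parameter are hypothetical.

```python
import re

# Matches href='...' or href="..." where the value is an absolute path.
# The (?!/) lookahead skips values that start with "//".
_HREF_ABS_PATH = re.compile(r"""(href=)(['"])(/(?!/)[^'"]*)\2""")

def prefix_absolute_paths(text, accel_path):
    """Prefix absolute paths in href attributes with the accelerator path."""
    def _sub(match):
        attr, quote, path = match.group(1), match.group(2), match.group(3)
        return f"{attr}{quote}{accel_path}{path}{quote}"
    return _HREF_ABS_PATH.sub(_sub, text)
```

Calling prefix_absolute_paths("href='/path/file.html'", "/accelPath") yields the rewritten string shown in the example; relative references such as href='file.html' are left as they are.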


HTML Tags

URL references occurring within the following HTML tags are evaluated for rewriting:


URL References Inside Delimiters

The internal rewriter is a word parser and does not do character lookup and replacement (as the custom rewriter does). The word parser searches for URL references in HTML bodies only within specific beginning and ending delimiter values. For example, if file http://internal.web.site.com/file1.html contains a URL reference surrounded by double quotes such as "http://internal.web.site.com", the reference would be discovered and evaluated for rewriting because it is enclosed within a recognized delimiter. The following is a complete list of delimiter values recognized by the rewriter:

= 
-
/
;
+
:
~
,
!
<
>
(
)
"
'
<Space>
<CR>
<LINEFEED>
<TAB>
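
A minimal Python sketch of the delimiter rule follows. It is an interpretation rather than the rewriter's real parser: it assumes a reference qualifies only when the characters immediately before and after it are recognized delimiters, it treats the start and end of the text as boundaries, and it includes the double quote because the example above describes it as a recognized delimiter. The function name is hypothetical.

```python
from typing import List

# Delimiters from the list above, plus the double quote used in the example.
DELIMITERS = set("=-/;+:~,!<>()\"' \r\n\t")

def find_delimited(body: str, target: str) -> List[int]:
    """Return the offsets where target occurs bounded by delimiters."""
    hits = []
    start = body.find(target)
    while start != -1:
        end = start + len(target)
        before_ok = start == 0 or body[start - 1] in DELIMITERS
        after_ok = end == len(body) or body[end] in DELIMITERS
        if before_ok and after_ok:
            hits.append(start)
        start = body.find(target, start + 1)
    return hits
```

With this rule, "http://internal.web.site.com" inside double quotes is found, while the same string embedded in a longer word (for example, followed directly by a period and more host name text) is skipped.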

Mime Types

The rewriter will parse pages with certain Mime Content-Types regardless of the file extension. By default, the internal rewriter will parse pages with the following Mime Content-Types:

If an HTTP/HTTPS response has a Mime Content-Type set to any of the above types, or if the file extension is one of the following: html, htm, shtml, jhtml, asp, jsp, or NO EXTENSION, the page will be parsed for possible rewriting.
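
The decision described above can be sketched in Python as follows. The function and parameter names are hypothetical, and the set of Mime Content-Types is passed in by the caller rather than hard-coded. The extension list comes from the sentence above, with an empty string standing in for NO EXTENSION.

```python
import posixpath
from urllib.parse import urlsplit

# Extensions named in the text; "" represents a URL with no extension.
DEFAULT_EXTENSIONS = {"html", "htm", "shtml", "jhtml", "asp", "jsp", ""}

def should_parse(url, content_type, parse_mime_types):
    """Return True if a response would be parsed for possible rewriting."""
    # Ignore parameters such as "; charset=..." when comparing the type.
    mime = (content_type or "").split(";", 1)[0].strip().lower()
    if mime in parse_mime_types:
        return True
    path = urlsplit(url).path
    ext = posixpath.splitext(path)[1].lstrip(".").lower()
    return ext in DEFAULT_EXTENSIONS
```

Either condition is sufficient in this sketch: a matching Content-Type causes parsing regardless of the extension, and a matching extension causes parsing regardless of the Content-Type.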


Absolute and Relative References

An absolute reference contains all the information needed to locate a resource, including the host name (for example, http://internal.web.site.com/index.html). The internal rewriter always attempts to rewrite absolute references.

A relative reference assumes the host and path and provides only the resource portion of the URI (for example, "index.htm"). The internal rewriter does not attempt to rewrite relative references.

An absolute path assumes the host but provides the complete path, including the resource (for example, "/docs/file1.html"). The internal rewriter attempts to rewrite an absolute path only for a path-based multi-homed accelerator.
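
The three reference types, and whether each is a candidate for rewriting, can be summarized in a short Python sketch. The function names are hypothetical, and the classification is a simplification that looks only at the scheme, host, and leading slash.

```python
from urllib.parse import urlsplit

def classify_reference(ref: str) -> str:
    """Return 'absolute', 'absolute_path', or 'relative' for a URL reference."""
    parts = urlsplit(ref)
    if parts.scheme and parts.netloc:
        return "absolute"        # scheme and host are both present
    if ref.startswith("/"):
        return "absolute_path"   # host is assumed, complete path is given
    return "relative"            # host and path are both assumed

def is_candidate(ref: str, path_based_multihoming: bool) -> bool:
    """Return True if the reference would be evaluated for rewriting."""
    kind = classify_reference(ref)
    if kind == "absolute":
        return True
    if kind == "absolute_path":
        return path_based_multihoming
    return False
```

Being a candidate only means the reference is evaluated; whether it is actually rewritten still depends on the other conditions described in this section.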


Path-based Multi-homing

With an accelerator configured for path-based multi-homing, absolute references and absolute paths are evaluated for rewriting. Relative references are not attempted.


Configuring the Internal Rewriter

The behavior of the internal rewriter can be controlled through the configuration file sys:/etc/proxy/rewriter.cfg. This section describes the parameters that can be used within this file.

Several configuration sections can be added to rewriter.cfg including [Extension], [Mime Content-Type], [Alias Host Names], and [Internal Rewriter]. Each section is detailed below.

NOTE:  You should take note of the following conditions when configuring the internal rewriter:

1. The rewriter.cfg configuration file does not support comments (;-- or #--).

2. Sections within the file must be separated with two returns or empty lines.

3. Changes to the rewriter configuration file in sections [Extension] and [Mime Content-Type] require a purgecache to become effective. For changes made to the [Alias Host Names] and [Internal Rewriter] sections, iChain must be restarted in order for the changes to take effect.
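
To make the file layout concrete, the following Python sketch reads rewriter.cfg-style text into a dictionary of sections. It is an illustration only and does not reflect how iChain itself reads the file. It assumes each section header sits on its own line in square brackets, it skips empty lines, and, because the file does not support comments, it treats every other line as data. The function name is hypothetical.

```python
def parse_rewriter_cfg(text):
    """Return a dict mapping each section name to its list of entry lines."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # empty lines only separate sections
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections.setdefault(current, [])
        elif current is not None:
            # No comment syntax exists, so every line is kept as an entry.
            sections[current].append(line)
    return sections
```

A file containing an [Extension] section followed by an [Alias Host Names] section would produce one dictionary key per section, each holding that section's lines in order.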


Mime Types

In addition to files with extensions listed in the File Extensions section below, the rewriter will parse pages with certain Mime Content-Types regardless of the file extension. By default, the internal rewriter will parse pages with the following Mime Content-Types:

Additional mime types can be specified in the [Mime Content-Type] section of rewriter.cfg. The following is an example of how to use this section:

[Mime Content-Type]
type=image/png

NOTE:  The internal rewriter is designed to handle pages formatted in HTML. In the latest iChain version (iChain 2.2 with Support Pack 2), the rewriter honors the Mime types discussed above and those specified in the rewriter.cfg file. It ignores file extensions altogether unless you create an empty [Old mode] section in the rewriter.cfg file. Creating this empty section forces the rewriter to operate as it did in earlier iChain versions.


File Extensions

The internal rewriter by default will only parse files with the following extensions:

Additional file extensions can be added by using the [Extension] section of rewriter.cfg. Use of this section is shown below:

[Extension]
home, myNewExtension
anotherLongExtension
a,b,c,d,e,f,g

As shown in this example, additional extensions can be specified on individual lines or by using commas to separate multiple extensions specified on a single line. All of the extensions listed will be appended to the default list of seven shown above. You cannot remove the default extensions.
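
The append-only behavior can be sketched in Python as follows. The function name is hypothetical, and the default list is taken from the extensions named earlier in this document (html, htm, shtml, jhtml, asp, jsp, and no extension, represented here by an empty string).

```python
# Default list of seven; "" stands for a file with no extension.
DEFAULT_EXTENSIONS = ["html", "htm", "shtml", "jhtml", "asp", "jsp", ""]

def effective_extensions(section_lines):
    """Return the defaults followed by the extensions listed in [Extension]."""
    result = list(DEFAULT_EXTENSIONS)
    for line in section_lines:
        # A line may hold a single extension or several separated by commas.
        for item in line.split(","):
            ext = item.strip()
            if ext and ext not in result:
                result.append(ext)
    return result
```

Because the function only ever appends, there is no input that removes a default extension, which mirrors the restriction stated above.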


Alias Host Names

Sometimes a URL reference specifies a host name which does not meet default criteria for being rewritten (i.e. does not match the accelerator's "Alternate host name" or any value in the "Web server address" list). For example, say a URL reference contains the host name of "home" (http://home/index.html), and "home" is not included in the "Web server address" list because it is not resolvable, nor is it the value of the accelerator's "Alternate host name" field. By default, rewriting of the URL reference http://home/index.html would not occur. The [Alias Host Names] section of rewriter.cfg can be used to specify additional host names to be rewritten. The following is an example of how to use the [Alias Host Names] section:

[Alias Host Names]
AcceleratorName=aliasName

where AcceleratorName is the value specified in the accelerator's Name field, and aliasName is the string which will be rewritten with the value specified in the accelerator's DNS name field.

For the "home" example used above, if the accelerator name is accel2 and the URL reference to be rewritten is the host name "home", the correct syntax is:

[Alias Host Names]
accel2=home

NOTE:  The alias names are not case-sensitive because host names should not be case-sensitive.
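
A small Python sketch shows how AcceleratorName=aliasName entries could be looked up without regard to case. The function names are hypothetical, and the sketch covers only the lookup; it does not perform any rewriting.

```python
def parse_aliases(section_lines):
    """Map each lower-cased alias host name to its accelerator name."""
    aliases = {}
    for line in section_lines:
        accel, sep, alias = line.partition("=")
        if sep:  # skip lines that are not in AcceleratorName=aliasName form
            aliases[alias.strip().lower()] = accel.strip()
    return aliases

def accelerator_for_host(host, aliases):
    """Return the accelerator name for an alias host, or None if unknown."""
    return aliases.get(host.lower())
```

With the entry accel2=home, a reference to http://HOME/index.html and one to http://home/index.html would both resolve to accel2.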


Disabling the Internal Rewriter

There are three methods you can use to disable the internal rewriter:


Totally Disabling, Per Accelerator

By default, the internal rewriter is enabled for all accelerators. Because parsing adds overhead, the internal rewriter can slow performance, and some Web sites have no content with URL references that need to be rewritten. In such cases, you can disable the internal rewriter on a per-accelerator basis by entering the following set command at the command line of the iChain machine:

SET ACCELERATOR <name> DisableRewriter=Yes

where <name> is the name of the accelerator for which you want to disable rewriting. This setting persists across reboots and is exported to the .nas file.


Disabling Per URL

An additional section is supported in the rewriter.cfg file. It allows you to specify a list of URLs which are to be excluded by the rewriter. An example is as follows:

[exclude]
http://www.abc.com/xyz/*
http://www.abc.com/donotrewrite.html

As shown in this example, the exclusion causes all pages in the xyz subdirectory, as well as the donotrewrite.html file, to be left untouched. Each URL must begin with the http prefix and must include the domain name of the accelerator.

The syntax of this section is the same as it is for protected resources, as discussed earlier.
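
The matching implied by the [exclude] example can be sketched in Python. This is an assumption about the pattern syntax based only on the two entries shown: a trailing asterisk is treated as a prefix match, and any other entry must match the URL exactly. The function name is hypothetical.

```python
def is_excluded(url, patterns):
    """Return True if url matches any entry from the [exclude] section."""
    for pattern in patterns:
        if pattern.endswith("*"):
            # Trailing wildcard: everything under the given prefix is excluded.
            if url.startswith(pattern[:-1]):
                return True
        elif url == pattern:
            return True
    return False
```

Given the two example entries, http://www.abc.com/xyz/a/b.html and http://www.abc.com/donotrewrite.html are excluded, while http://www.abc.com/index.html is still passed to the rewriter.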


Disabling In a Page

In some circumstances, you might need finer granularity, because only part of a page cannot or should not be rewritten. Although this deviates from the iChain premise that you should not have to modify the origin server, you might encounter circumstances where it cannot be avoided.

In these cases, you can use the following tags in your origin pages.

For example:

<!--NOVELL_REWRITER_OFF-->..HTML data not to be rewritten..<!--NOVELL_REWRITER_ON-->

Browsers treat these tags as comments, so they do not show up on the screen (except possibly in older browser versions). The closing tag is optional; if it is omitted, the rest of the page after the initial tag is not rewritten.
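
The effect of the two tags, including the optional closing tag, can be illustrated with a Python sketch that splits a page into segments and marks each one as rewritable or not. It assumes the tags are written as standard HTML comments (opening with <!-- and closing with -->), and the function name is hypothetical.

```python
# Marker spelling assumes standard HTML comment syntax.
OFF = "<!--NOVELL_REWRITER_OFF-->"
ON = "<!--NOVELL_REWRITER_ON-->"

def rewritable_segments(page):
    """Split page into (text, rewritable) pairs, preserving all content."""
    segments = []
    pos = 0
    while pos < len(page):
        off = page.find(OFF, pos)
        if off == -1:
            segments.append((page[pos:], True))
            break
        if off > pos:
            segments.append((page[pos:off], True))
        on = page.find(ON, off + len(OFF))
        if on == -1:
            # No closing tag: the rest of the page is left untouched.
            segments.append((page[off:], False))
            break
        end = on + len(ON)
        segments.append((page[off:end], False))
        pos = end
    return segments
```

Joining the text of all segments reproduces the original page, so a caller could rewrite only the segments flagged True and pass the others through unchanged.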


Internal Rewriter Summary