Access Gateway configurations generally require HTML rewriting because the Web servers are not aware that the Access Gateway machine is obfuscating their DNS names. URLs contained in their pages must be checked to ensure that these references contain the DNS names that the client browser understands. On the other end, the client browsers are not aware that the Access Gateway is obfuscating the DNS names of the resources they are accessing. The URL requests coming from the client browsers that use published DNS names must be rewritten to the DNS names that the Web servers expect. Figure 13-3 illustrates these processes.
Figure 13-3 HTML Rewriting
The following sections describe the HTML rewriting process:
The Access Gateway needs to rewrite URL references under the following conditions:
To ensure that URL references contain the proper scheme (HTTP or HTTPS).
If your Web servers and Access Gateway machines are behind a secure firewall, you might not require SSL sessions between them, and only require SSL between the client browser and the Access Gateway. For example, an HTML file being accessed through the Access Gateway for the Web site novell.com might have a URL reference to http://novell.com/path/image1.jpg. If the reverse proxy for novell.com/path is using SSL sessions between the browser and Access Gateway, the URL reference http://novell.com/path/image1.jpg must be rewritten to https://novell.com/path/image1.jpg. Otherwise, when the user clicks this link, the browser bounces between HTTP and HTTPS to establish a new SSL session.
To ensure that URL references containing private IP addresses or private DNS names are changed to the published DNS name of the Access Gateway or hosts.
For example, suppose that a company has an internal Web site named data.com, and wants to expose this site to Internet users through the Access Gateway using a published DNS name of novell.com. Many of the HTML pages on this Web site have URL references that contain the private DNS name, such as http://data.com/image1.jpg. Because Internet users are unable to resolve data.com, links using this URL reference would return DNS errors in the browser.
The HTML rewriter can resolve this issue. The DNS name field in the Access Gateway configuration is set to novell.com, which users can resolve through a public DNS server to the Access Gateway. The rewriter parses the Web page, and any URL references matching the private DNS name or private IP address listed in the Web server address field of the Access Gateway configuration are rewritten to the published DNS name novell.com and the port number of the Access Gateway.
Rewriting URL references addresses two issues: 1) URL references that are unreachable because they use private DNS names or IP addresses become accessible, and 2) private IP addresses and DNS names, which might be sensitive information, are not exposed.
To ensure that the Host header in incoming HTTP packets contains the name understood by the internal Web server.
Using the example in Figure 13-3, suppose that the internal Web server expects all HTTP or HTTPS requests to have the Host header set to data.com. When users send requests using the published DNS name novell.com/path, the Host header of the packets in those requests received by the Access Gateway is set to novell.com. The Access Gateway can be configured to rewrite this public name to the private name expected by the Web server by setting the Web server host name option to data.com. Before the Access Gateway forwards packets to the Web server, the Host header is changed (rewritten) from novell.com to data.com. For information about configuring this option, see Configuring the Web Servers of a Proxy Service.
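The three conditions above can be illustrated with a minimal Python sketch. It is not the Access Gateway implementation; the constants reuse the data.com and novell.com names from the examples, the function names are hypothetical, and URLs that carry an explicit port are deliberately left untouched to keep the sketch short.

```python
import re

# Assumed values, taken from the examples in the text.
PRIVATE_NAMES = ["data.com"]   # names the internal Web server uses
PUBLISHED_NAME = "novell.com"  # name that client browsers can resolve
PUBLIC_SCHEME = "https"        # scheme used between browser and gateway


def rewrite_response_urls(html: str) -> str:
    """Rewrite absolute URL references to the published name and scheme."""
    hosts = "|".join(re.escape(name) for name in PRIVATE_NAMES + [PUBLISHED_NAME])
    # The negative lookahead stops at the end of the host name, so longer
    # names (data.com.example) and URLs with a port are not matched.
    pattern = re.compile(r"\bhttps?://(?:" + hosts + r")(?![\w.:-])", re.IGNORECASE)
    return pattern.sub(PUBLIC_SCHEME + "://" + PUBLISHED_NAME, html)


def rewrite_request_host(headers: dict, web_server_host: str = "data.com") -> dict:
    """Return a copy of the request headers with the Host header rewritten."""
    rewritten = dict(headers)
    if rewritten.get("Host", "").lower() == PUBLISHED_NAME:
        rewritten["Host"] = web_server_host
    return rewritten
```

The first function covers the response direction (scheme and private name), the second covers the request direction (Host header).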
The rewriter searches for URLs in the following HTML contexts. They must meet the following criteria to be rewritten:
The rewriter parses and searches the Web content that passes through the Access Gateway for URL references that qualify to be rewritten. URL references are rewritten when they meet the following conditions:
URL references containing DNS names or IP addresses matching those in the Web server address list are rewritten with the published DNS name.
URL references matching the Web server host name are rewritten with the published DNS name.
URL references matching entries in the additional DNS names list of the host are rewritten with the published DNS name. The Web server host name does not need to be included in this list.
The DNS names in the exclude list specify the names that the rewriter should skip and not rewrite.
The following sections describe the conditions to consider when adding DNS names to the lists:
Sometimes Web pages contain URL references to a host name that does not meet the default criteria for being rewritten. That is, the URL reference does not match the Web server host name or any value (IP address) in the Web server address list. If these names are sent back to the client, they are not resolvable. Figure 13-4 illustrates a scenario that requires an entry in the list of additional DNS names.
Figure 13-4 Rewriting URLs for Web Servers
The page on the data.com Web server contains two links, one to an image on the data.com server and one to an image on the graphics.com server. When rewriting is enabled, the link to the data.com server is automatically rewritten to novell.com. The link to the image on graphics.com is not rewritten until you add this URL to the list of additional DNS names. When the link is rewritten, the browser knows how to request it, and the Access Gateway knows how to resolve it.
You need to include names in this list if your Web servers have the following configurations:
If you have a cluster of Web servers that are not sharing the same DNS name, you need to add their DNS names to this list.
If your Web server obtains content from another Web server, the DNS name of this additional Web server needs to be rewritten.
If the Web server listens on one port (for example, 80) and redirects the request to a secure port (for example, 443), the response to the user comes back on https://<DNS_name>:443. This does not match the request, which was sent on http://<DNS_name>:80. If you add the DNS name to the list, the response can be sent in the format that the user expects.
If an application is written to use a private host name. For example, assume that an application URL reference contains the host name of home (http://home/index.html). This host name would need to be added to the list.
If you enable the option that forwards the received host name on your path-based multi-homing service and your Web server is configured to use a different port, you need to add the DNS name with the port to the list. For example, if the public DNS name of the proxy service is www.mylag.com, the path for the path-based multi-homing service is /sales, and the Web server port is 801, the following DNS name needs to be added to the list of the /sales service:
http://www.mylag.com:801
When you enter a name in the list, it can use any of the following formats:
DNS_name
host_name
IP_address
scheme://DNS_name
scheme://IP_address
scheme://DNS_name:port
scheme://IP_address:port
For example:
HOME
https://www.backend.com
https://10.10.15.206:444
These entries are not case sensitive.
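As a rough illustration of the accepted formats, the following Python sketch splits an entry into its scheme, host, and port and lowercases it so that comparisons are not case sensitive. The function name parse_entry is hypothetical and the sketch says nothing about how the Access Gateway stores these entries.

```python
from urllib.parse import urlsplit


def parse_entry(entry: str):
    """Split a list entry into (scheme, host, port); absent parts are None."""
    text = entry.strip().lower()  # entries are not case sensitive
    if "://" not in text:
        # Bare DNS name, host name, or IP address.
        return (None, text, None)
    parts = urlsplit(text)
    return (parts.scheme, parts.hostname, parts.port)
```

For the three example entries above, this returns (None, "home", None), ("https", "www.backend.com", None), and ("https", "10.10.15.206", 444).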
If you have two reverse proxies protecting the same Web server, the rewriter correctly rewrites the references to the Web server so that the browser always uses the same reverse proxy. In other words, if the browser requests a resource using acme.com.uk, the response is returned with references to acme.com.uk and not acme.com.usa. If you have a third reverse proxy protecting a Web server, the rewriting rules can become ambiguous. For example, consider the configuration illustrated in Figure 13-5.
Figure 13-5 Excluding URLs
A user accesses data.com through the published DNS name of novell.com.mx. The data.com server has references to product.com. The novell.com.mx proxy has two ways to get to the product.com server because this Web server has two published DNS names (novell.com.uk and novell.com.usa). The rewriter could use either of these names to rewrite references to product.com.
If you want all users coming through novell.com.mx to use the novell.com.usa proxy, you need to block the rewriting of product.com to novell.com.uk. On the HTML Rewriting page of the reverse proxy for novell.com.uk, add product.com and any aliases to the list of DNS names to exclude.
If you do not care which proxy is returned in the reference, you do not need to add anything to the exclude list.
An HTML rewriter profile allows you to customize the rewriting process and specify which profile is selected to rewrite content on a page. This section describes the following features of the rewriter profile:
The Access Gateway allows you to define two types of profiles:
A Word profile searches for matches on words. For example, “get” matches the word “get” and any word that begins with “get” such as “getaway” but it does not match the “get” in “together” or “beget.”
The Access Gateway has a default Word profile. It is not specific to a reverse proxy or its proxy services. When you modify its behavior, remember its scope.
If you enable HTML rewriting, but do not define a Word profile for the proxy service, the default Word profile is used. This profile is preconfigured to rewrite the Web server host name and any other names listed in the additional DNS names list. The preconfigured profile matches all URLs with a predefined set of content-types.
If this default behavior does not match your requirements for a particular page, create your own Word profile and position it before the default profile in the list of profiles. Only one Word profile is applied per page. The first Word profile that matches the page is applied. Profiles lower in the list are ignored.
For information about how strings are replaced in a Word profile, see the following:
A Character profile searches for matches on a specified set of characters. For example, “top” matches the word “top” and the “top” in “tabletop,” “stopwatch,” and “topic.”
If you need functionality not provided by the default profile, create a Character profile. If you create multiple Character profiles, order is important. The first Character profile that matches the page is applied. Profiles lower in the list are ignored.
For information on how strings are replaced in a Character profile, see String Replacement Rules for Character Profiles.
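The difference between the two search boundaries can be approximated in Python with a word-start anchor versus a plain substring test. This is only a sketch of the matching behavior described above; the function names are hypothetical and the rewriter's actual tokenization is not documented here.

```python
import re


def word_profile_matches(search: str, text: str) -> list:
    """Words in text that begin with the search string (Word-style match)."""
    return re.findall(r"\b" + re.escape(search) + r"\w*", text)


def character_profile_matches(search: str, text: str) -> list:
    """Words in text that contain the search string anywhere (Character-style match)."""
    return [word for word in re.findall(r"\w+", text) if search in word]
```

With these definitions, a Word-style search for "get" finds "get" and "getaway" but skips "together" and "beget", while a Character-style search for "top" finds "top", "tabletop", "stopwatch", and "topic".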
You specify the following matching criteria for selecting the profile:
The URLs to match
The URLs that cannot match
The content types to match
You use the matching criteria section of the profile to set up the matching policy.
URLs: The URLs specified in the policy should use the following formats:
You can specify two types of URLs. In the If Requested URL Is list, you specify the URLs of the pages you want this profile to match. In the And Requested URL Is Not list, you specify the URLs you don’t want this profile to match. You can use the asterisk wildcard for a URL in the If Requested URL Is list that matches pages you really don’t want this profile to match, then use a URL in the And Requested URL Is Not list to exclude them from matching. If a page matches both a URL in the If Requested URL Is list and a URL in the And Requested URL Is Not list, the profile does not match the page.
For example, you could specify the following URL in the If Requested URL Is list:
http://www.a.com/*
You could then specify the following URL in the And Requested URL Is Not list:
http://www.a.com/content/*
These two entries cause the profile to match all pages on the www.a.com Web server except for the pages in the /content directory and its subdirectories.
IMPORTANT: If nothing is specified in either of the two lists, the profile skips the URL matching requirements and uses the content-type to determine if a page matches.
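The interaction of the two URL lists can be sketched in Python with shell-style wildcard matching. The function name is hypothetical, and two points are assumptions rather than documented behavior: the asterisk is treated as matching any run of characters (including slashes), and a URL must match at least one entry in the first list when either list is populated.

```python
from fnmatch import fnmatchcase


def url_lists_match(url, match_urls, not_match_urls):
    """Return True or False for the URL criteria, or None when both lists are empty."""
    if not match_urls and not not_match_urls:
        # URL matching is skipped; the content-type decides.
        return None
    if any(fnmatchcase(url, pattern) for pattern in not_match_urls):
        # A URL that matches both lists does not match the profile.
        return False
    return any(fnmatchcase(url, pattern) for pattern in match_urls)


match_list = ["http://www.a.com/*"]
not_match_list = ["http://www.a.com/content/*"]
print(url_lists_match("http://www.a.com/index.html", match_list, not_match_list))
print(url_lists_match("http://www.a.com/content/docs/a.html", match_list, not_match_list))
```

Checking the exclusion list first keeps the rule "a page in both lists does not match" in one place.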
Content-Type: In the And Document Content-Type Is section, you specify the content-types you want this profile to match. To add a new content-type, add an entry and specify the name, such as text/dns. Search your Web pages for content-types to determine if you need to add new types. To add multiple values, enter each value on a separate line.
Regardless of content-type, the page matches if the file extension is html, htm, shtml, jhtml, asp, or jsp.
The rewriter action section of the profile determines the actions the rewriter performs when a page matches the profile. Select from the following:
Strip Path Actions: A profile might require the strip path options if the proxy service has the following characteristics:
It is a path-based multi-homing proxy.
The option that removes the path before requests are sent to the Web server has been enabled.
URLs appear in query strings or Post Data.
If your profile needs to match pages from this type of proxy server, you might need to enable the Strip Path from Query String and Strip Path from Post Data options.
The strip path options are not available for a Character profile. If the proxy service is not a path-based multi-homing proxy, the strip path options have no effect.
Enabling or Disabling Rewriting: The Enable Rewriter Actions option determines whether the rewriter performs any actions:
Select the option to have the rewriter rewrite the references and data on the page.
Leave the option unselected to disable rewriting. This allows you to create a profile for the pages you do not want rewritten.
Replacing URLs in JavaScript Variables and HTML Attributes: The Variable or Attribute Name to Search for Is list allows you to specify the HTML attributes or JavaScript variables that you want searched for DNS names that might need to be rewritten. For the list of HTML attribute names that are automatically searched, see HTML Tags. You might want to add the following attributes:
value: This attribute enables the rewriter to search the <param> elements on the HTML page for value attributes and rewrite the value attributes that are URL strings.
If you need more granular control (some need to be rewritten but others do not) and you can modify the page, see Disabling with Page Modifications.
formvalue: This attribute enables the rewriter to search the <form> element on the HTML page for <input>, <button>, and <option> elements and rewrite the value attributes that are URL strings. For example, if your multi-homing path is /test and the form line is <input name="navUrl" type="hidden" value="/IDM/portal/cn/GuestContainerPage/656gwmail">, this line would be rewritten to the following value before sending the response to the client:
<input name="navUrl" type="hidden" value="/test/IDM/portal/cn/GuestContainerPage/656gwmail">
The formvalue attribute enables the rewriting of all URLs in the <input>, <button>, and <option> elements in the form. If you need more granular control (some need to be rewritten but others do not) and you can modify the form page, see Disabling with Page Modifications.
This option is not available for a Character profile.
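The formvalue example above can be approximated with the following Python sketch, which prepends a multi-homing path to value attributes that start with a slash inside <input>, <button>, and <option> tags. It is an illustration only: the function name is hypothetical, regular expressions are a brittle way to process HTML, only double-quoted values are handled, and values that already carry the path are not detected.

```python
import re

# Opening tags whose value attributes are candidates for rewriting.
TAG = re.compile(r"<(?:input|button|option)\b[^>]*>", re.IGNORECASE)
# A double-quoted value attribute whose content starts with a slash.
VALUE = re.compile(r'(\bvalue\s*=\s*")(/[^"]*)"', re.IGNORECASE)


def prepend_path(html: str, path: str) -> str:
    """Prepend the multi-homing path to path-like value attributes in form elements."""
    def fix_tag(tag_match):
        return VALUE.sub(
            lambda v: v.group(1) + path + v.group(2) + '"',
            tag_match.group(0),
        )
    return TAG.sub(fix_tag, html)


line = '<input name="navUrl" type="hidden" value="/IDM/portal/cn/GuestContainerPage/656gwmail">'
print(prepend_path(line, "/test"))
```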
Replacing URLs in Java Methods: The JavaScript Method to Search for Is list allows you to specify the JavaScript methods to search to see if their parameters contain a URL string.
This option is not available for a Character profile.
String Replacement: The string replacement list allows you to search for a string and replace it.
When defining a rewriter profile, you should try to put all the string actions into the Word profile. When a Word profile and a Character profile both match the same URL, you need to ensure that they do not contain search and replace actions for overlapping strings. The results of such actions are unpredictable.
For example, if your Word profile has an action to search for Doodle and replace this string with Artwork and your Character profile has an action to search for Doo and replace this string with Zoo, the results are unpredictable. If you place both search and replace actions in the Word profile, the results are predictable.
For the rules and tokens that can be used in the search strings, see the following:
For information on how the string replacement list can be used to reduce the number of JavaScript methods you need to list, see Using $path to Rewrite Paths in JavaScript Methods, Parameters, or Variables.
In a Word profile, a string matches all paths that start with the characters in the specified string. For example:
Search String | Matches This String | Doesn’t Match This String
---|---|---
/path | /path, /pathother, /path/other, /path.html | /mypath
You can use the following special tokens to modify the default matching rules:
[w] to match one white space character
[ow] to match 0 or more white space characters
[ep] to match a path element in a URL path, excluding words that end in a period
[ew] to match a word element in a URL path, including words that end in a period
[oa] to match one or more alphanumeric characters
White Space Tokens: You use the [w] and the [ow] tokens to specify where white space might occur in the string. For example:
[ow]my[w]string[w]to[w]replace[ow]
If you don’t know, or don’t care, whether the string has zero or more white space characters at the beginning and at the end, use [ow] to specify this. The [w] token specifies exactly one white space character.
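One way to picture the white space tokens is to translate a search string into a regular expression, with [w] becoming exactly one white space character and [ow] becoming zero or more. The Python sketch below does only that; the function name is hypothetical, and the [ep], [ew], and [oa] tokens are intentionally not handled.

```python
import re


def search_string_to_regex(search: str):
    """Compile a search string, translating only the [w] and [ow] tokens."""
    pieces = []
    # The capturing group makes re.split keep the tokens in the result.
    for part in re.split(r"(\[ow\]|\[w\])", search):
        if part == "[w]":
            pieces.append(r"\s")    # exactly one white space character
        elif part == "[ow]":
            pieces.append(r"\s*")   # zero or more white space characters
        else:
            pieces.append(re.escape(part))
    return re.compile("".join(pieces))


pattern = search_string_to_regex("[ow]my[w]string[w]to[w]replace[ow]")
print(bool(pattern.fullmatch("  my string to replace")))
print(bool(pattern.fullmatch("my  string to replace")))
```

The first call matches because leading white space is optional; the second does not because two spaces appear where [w] allows exactly one.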
Path Tokens: You use the [ep] and [ew] tokens to match path strings. The [ep] token can be used to match the following types of paths:
Search String | Matches This String | Doesn’t Match This String
---|---|---
/path[ep] | /path, /home/path/other | /path.html, /home/pathother
The [ew] token can be used to match the following types of paths:
Name Tokens: You use the [oa] token to match function or parameter names that have a set string to start the name and end the name, but the middle part of the name is a computer-generated alphanumeric string. For example, the [oa] token can be used to match the following types of names:
When you configure multiple strings for replacement, the rewriter uses the following rules for determining how characters are replaced in strings:
String replacement is done as a single pass.
String replacement is not performed recursively. Suppose you have listed the following search and replacement strings:
DOG to be replaced with CAT
A to be replaced with O
All occurrences of the string DOG are replaced with CAT, regardless of whether it is the word DOG or the word DOGMA. Only one replacement pass occurs. The rewritten CAT is not replaced with COT.
Because string replacement is done in one pass, the string that matches first takes precedence. Suppose you have listed the following search and replacement strings:
ABC to be replaced with XYZ
BCDEF to be replaced with PQRSTUVWXYZ
If the original string is ABCDEFGH, the replaced string is XYZDEFGH.
If two specified search strings match the data portion, the search string of longer length is used for the replacement except for the case detailed above. Suppose you have listed the following search and replacement strings:
ABC to be replaced with XYZ
ABCDEF to be replaced with PQRSTUVWXYZ
If the original string is ABCDEFGH, the replaced string is PQRSTUVWXYZGH.
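These rules can be modeled with a single left-to-right scan that, at each position, picks the longest search string starting there and never rescans replaced text. The Python sketch below is a model of the rules as stated, not the rewriter's code; the function name is hypothetical.

```python
def replace_single_pass(text: str, rules: dict) -> str:
    """Apply all search/replace pairs in one left-to-right pass."""
    output = []
    position = 0
    while position < len(text):
        # Search strings that match at the current position.
        candidates = [s for s in rules if s and text.startswith(s, position)]
        if candidates:
            longest = max(candidates, key=len)   # the longer match wins
            output.append(rules[longest])        # replacement is not rescanned
            position += len(longest)
        else:
            output.append(text[position])
            position += 1
    return "".join(output)


print(replace_single_pass("DOG", {"DOG": "CAT", "A": "O"}))
print(replace_single_pass("ABCDEFGH", {"ABC": "XYZ", "BCDEF": "PQRSTUVWXYZ"}))
print(replace_single_pass("ABCDEFGH", {"ABC": "XYZ", "ABCDEF": "PQRSTUVWXYZ"}))
```

The three calls reproduce the results given above: CAT, XYZDEFGH, and PQRSTUVWXYZGH.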
You can use the $path token to rewrite paths on a path-based multi-homing service that has the option to remove the path enabled. This token is useful for Web applications that require a dedicated Web server and are therefore installed in the root directory of the Web server. If you protect this type of application with Access Manager using a path-based multi-homing proxy service, your clients access the application with a URL that contains a /path value. The proxy service uses the path to determine which Web server a request is sent to, and the path must be removed from the URL before sending the request to the Web server.
The application responds to the requests. If it uses JavaScript methods, parameters, or variables to generate paths to resources, these paths are sent to the client without the path for the proxy service prepended. When the client tries to access the resource specified by the Web server path, the proxy service cannot locate the resource because the multi-homing path is missing. The figure below illustrates this flow with the rewriter adding the multi-homing path in the reply.
Figure 13-6 Rewriting with a Multi-homing Path
To make sure all the paths generated by JavaScript are rewritten, you must search the Web pages of the application. You can then either list all the JavaScript methods, parameters, and variables in the section of the rewriter profile for variables and methods, or you can use the $path token in the string replacement section. This token, which is a shortcut for the multi-homing path, together with the strip path actions, usually can find all the paths that need to be rewritten. If nothing else, it reduces the number of JavaScript methods, parameters, and variables that you otherwise need to list individually.
To use the $path token, you add a search string and a replace string that uses the token. For example, if the /prices/pricelist.html page is generated by JavaScript and the multi-homing path for the proxy service is /inner, you would specify the following strings:
This configuration allows the following paths to be rewritten.
Table 13-2 Rewriting Strings Sent from the Web Server to the Browser
Web Server String | Rewritten String for the Browser
---|---
/prices/pricelist.html | /inner/prices/pricelist.html
/prices | /inner/prices
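The effect shown in Table 13-2 can be sketched in Python by expanding $path in the replace string and then substituting. The search string /prices and the replace string $path/prices are assumptions chosen to be consistent with the table; the function names are hypothetical, and the plain substring replacement ignores the Word profile matching rules.

```python
def expand_path_token(replace_string: str, multi_homing_path: str) -> str:
    """Substitute the multi-homing path for the $path token."""
    return replace_string.replace("$path", multi_homing_path)


def rewrite_for_browser(text: str, search: str, replace_string: str,
                        multi_homing_path: str) -> str:
    """Replace the search string with the expanded replace string."""
    return text.replace(search, expand_path_token(replace_string, multi_homing_path))


# Assumed search and replace strings, consistent with Table 13-2.
print(rewrite_for_browser("/prices/pricelist.html", "/prices", "$path/prices", "/inner"))
print(rewrite_for_browser("/prices", "/prices", "$path/prices", "/inner"))
```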
If either strip path option is enabled, the search and replace strings allow the following paths to be rewritten.
You configure the HTML rewriter for a proxy service, and these values are applied to all Web servers that are protected by this proxy service.
To configure the HTML rewriter:
In the Administration Console, open the HTML Rewriting page of the proxy service.
The HTML Rewriting page specifies which DNS names are to be rewritten. The HTML Rewriter Profile specifies which pages to search for DNS names that need to be rewritten.
Select the option that enables HTML rewriting.
This option is enabled by default. When it is disabled, no rewriting occurs.
When enabled, this option activates the internal HTML rewriter. This rewriter replaces the name of the Web server with the published DNS name when sending data to the browsers. It replaces the published DNS name with the Web server host name when sending data to the Web server. It also makes sure the proper scheme (HTTP or HTTPS) is included in the URL. This is needed because you can configure the Access Gateway to use HTTPS between itself and client browsers and to use HTTP between itself and the Web servers.
In the section for additional DNS names, add a DNS name that appears on the Web pages of your server (for example, a DNS name other than the Web server’s DNS name).
For more information, see Determining Whether You Need to Specify Additional DNS Names.
In the section for excluded DNS names, add a DNS name that appears on the Web pages of your server that you do not want rewritten.
For more information, see Determining Whether You Need to Exclude DNS Names from Being Rewritten.
Use the list of HTML rewriter profiles to configure a profile. Select one of the following actions:
New: To create a profile, click New. Specify a display name for the profile and select either Word or Character for the search boundary. Continue with Step 6.
Word: A Word profile searches for matches on words. For example, “get” matches the word “get” and any word that begins with “get” such as “getaway” but it does not match the “get” in “together” or “beget.”
If you create multiple Word profiles, order is important. The first Word profile that matches the page is executed. Profiles lower in the list are ignored.
Character: A Character profile searches for matches on a specified set of characters. For example, “top” matches the word “top” and the “top” in “tabletop,” “stopwatch,” and “topic.”
If you want to add functionality to the default profile, create a Character profile. It has all the functionality of a Word profile, except searching for attribute names and Java variables and methods. If you create multiple Character profiles, order is important. The first Character profile that matches the page is executed. Profiles lower in the list are ignored.
Delete: To delete a profile, select the profile, then click Delete. Continue with Step 13.
Enable: To enable a profile, select the profile, then click Enable. Continue with Step 13.
Disable: To disable a profile, select the profile, then click Disable. Continue with Step 13.
Modify: To view or modify the current configuration for a profile, click the name of the profile. Continue with Step 6.
The default profile is designed to be applied to all pages protected by the Access Gateway. It is not specific to a reverse proxy or its proxy services. If you modify its behavior, remember its scope. Rather than modify the default profile, you should create your own customized Word profile and enable it.
Use the matching criteria section to set up a policy for specifying the URLs you want this profile to match.
Fill in the following fields:
If Requested URL Is: Specify the URLs of the pages you want this profile to match. Add each URL to the text box. To add multiple values, enter each value on a separate line.
And Requested URL Is Not: Specify the URLs of pages that this profile should not match. If a page matches a URL in both the If Requested URL Is list and the And Requested URL Is Not list, the profile does not match the page. Add each URL to the text box. To add multiple values, enter each value on a separate line.
And Document Content-Type Is: Select the content-types you want this profile to match. To add a new content-type, add an entry and specify the name, such as text/dns. Search your Web pages for content-types to determine if you need to add new types. To add multiple values, enter each value on a separate line.
For more information on how to use these options, see Page Matching Criteria for Rewriter Profiles.
Use the Actions section to specify the actions the rewriter should perform if the page matches the criteria in the matching criteria section.
Configure the following actions:
Strip Path from Query String: (Not available for Character profiles) Select this option to remove the path from the query string. To use this option, your proxy service must meet the conditions listed in Possible Actions for Rewriter Profiles.
Strip Path from Post Data: (Not available for Character profiles) Select this option to remove the path from the Post Data command. To use this option, your proxy service must meet the conditions listed in Possible Actions for Rewriter Profiles.
Enable Rewriter Actions: Select this action to enable the rewriter to perform any actions:
Select it to have the rewriter use the profile to rewrite references and data on the page. If this option is not selected, you cannot configure the action options.
Leave it unselected to disable rewriting. This allows you to create a profile for the pages you do not want rewritten.
(Not available for Character profiles) If your pages contain JavaScript, use the section for variables and methods to specify JavaScript variables or methods. You can also add HTML attribute names. (For the list of attribute names that are automatically searched, see HTML Tags.)
Fill in the following fields:
Variable or Attribute Name to Search for Is: Lists the name of an HTML attribute or JavaScript variable to search to see if its value contains a URL string. Add each name to the text box. To add multiple values, enter each value on a separate line.
JavaScript Method to Search for Is: Lists the names of JavaScript methods to search to see if their parameters contain a URL string. Add each method to the text box. To add multiple values, enter each value on a separate line.
Use the string replacement section to specify a string to search for and specify the text it should be replaced with. The search boundary (word or character) that you specified when creating the profile is used when searching for the string.
To add a string, create a new entry, then fill in the following:
Search: Specify the string you want to search for. The profile type controls the matching and replacement rules. For more information, see one of the following:
Replace With: Specify the string you want to use in place of the search string.
Confirm the profile settings.
If you have more than one profile in the list of profiles, use the up-arrow and down-arrow buttons to order the profiles.
If you create more than one profile, order becomes important. For example, if you want to rewrite all pages with a general rewriter profile (with a URL such as /*) and one specific set of pages with another rewriter profile (with a URL such as /doc/100506/*), you need to have the specific rewriter profile listed before the general rewriter profile. Only one Word profile and one Character profile are executed per page.
Even if multiple Word or Character profiles are enabled, only a maximum of one Word and one Character profile is executed per page. The first one in the list that matches a page is executed, and the others are ignored.
Enable the profiles you want to use for this protected resource. Select the profile, then click Enable.
The default profile cannot be disabled. However, it is not executed if you have enabled another Word profile that matches your pages, and this profile comes before the default profile in the list.
Save your changes to the browser cache.
Apply your changes to the Access Gateway.
The cached pages affected by the rewriter changes must be updated on the Access Gateway. Do one of the following:
If the changes affect numerous pages, select the name of the server in the Administration Console, then purge its cache.
If the changes affect only a few pages, you can update them from a browser. Access the page, then press Ctrl+Shift+Refresh to force a refresh of the page.
There are three methods you can use to disable the internal rewriter:
By default, the rewriter is enabled for all proxy services. The rewriter can slow performance because of the parsing overhead. In some cases, a Web site might not have content with URL references that need to be rewritten. The rewriter can be disabled on the proxy service that protects that Web site.
In the Administration Console, open the HTML Rewriting page of the proxy service.
Deselect the option that enables HTML rewriting, then confirm the change.
Apply your changes to the Access Gateway.
Select the Access Gateway, then purge its cache.
You can also specify a list of URLs that are to be excluded from being rewritten for the selected proxy service.
In the Administration Console, open the HTML Rewriting page of the proxy service.
Click the name of the Word profile defined for this proxy service.
If you have not defined a custom Word profile for the proxy service, you might want to create one. If you modify the default profile, those changes are applied to all proxy services.
In the And Requested URL Is Not section, specify the URLs you do not want rewritten.
Specify each URL on a separate line.
Confirm your changes on both pages.
In the list of profiles, make sure the profile you have modified is enabled and at the top of the list.
Apply your changes to the Access Gateway.
Select the Access Gateway, then purge its cache.
There are cases when the URLs in only part of a page, or in some of the JavaScript or forms, can be rewritten and the rest should not be rewritten. When this is the case, you might need to modify the content on the Web server. Although this deviates from the design behind Access Manager, you might encounter circumstances where it cannot be avoided.
You can add the following types of tags to the pages on the Web server:
These tags are seen by browsers as a comment mark, and do not show up on the screen (except possibly on older browser versions).
NOTE: If the pages you modify are cached on the Access Gateway, you need to purge the cache before the changes become effective.
Page Tags: In the case where you want only portions of a page rewritten, you can add the following tags to the page.
<!--NOVELL_REWRITER_OFF-->
.
.
HTML data not to be rewritten
.
.
<!--NOVELL_REWRITER_ON-->
The last tag is optional, and if omitted, it prevents the rest of the page from being rewritten after the initial tag is encountered.
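The behavior of the page tags can be modeled in Python as follows: text outside an OFF/ON pair is passed to a rewrite function, text inside is copied unchanged, and a missing ON tag leaves the rest of the page untouched. This is a sketch of the described behavior only; the function name and the sample rewrite callback are hypothetical.

```python
OFF_TAG = "<!--NOVELL_REWRITER_OFF-->"
ON_TAG = "<!--NOVELL_REWRITER_ON-->"


def rewrite_outside_off_regions(page: str, rewrite) -> str:
    """Apply rewrite() only to the parts of the page not enclosed by OFF/ON tags."""
    output = []
    position = 0
    while True:
        off = page.find(OFF_TAG, position)
        if off == -1:
            output.append(rewrite(page[position:]))  # no more protected regions
            break
        output.append(rewrite(page[position:off]))
        on = page.find(ON_TAG, off)
        if on == -1:
            output.append(page[off:])                # ON tag omitted: skip the rest
            break
        end = on + len(ON_TAG)
        output.append(page[off:end])                 # protected region, copied as is
        position = end
    return "".join(output)


def sample_rewrite(text: str) -> str:
    # Stand-in for the rewriter: replace the private name with the published one.
    return text.replace("data.com", "novell.com")


page = "a data.com " + OFF_TAG + "b data.com" + ON_TAG + " c data.com"
print(rewrite_outside_off_regions(page, sample_rewrite))
```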
Param Tags: Sometimes the JavaScript on the page contains <param> elements that contain a value attribute with a URL. You can enable global rewriting of this attribute by adding value to the list of variable and attribute names to search for. If you need more control because some URLs need to be rewritten but others cannot be rewritten, you can turn on and turn off the value rewriting by adding the following tags before and after the <param> element in the JavaScript.
<!--NOVELL_REWRITE_ATTRIBUTE_ON='value'-->
.
.
<param> elements to be rewritten
.
.
<!--NOVELL_REWRITE_ATTRIBUTE_OFF='value'-->
.
.
<param> elements that shouldn’t be rewritten
Form Tags: Some applications have forms in which the <input>, <button>, and <option> elements contain a value attribute with a URL. You can enable global rewriting of these attributes by adding formvalue to the list of variable and attribute names to search for. If you need more control because some URLs need to be rewritten but others cannot be rewritten, you can turn on and turn off the formvalue rewriting by adding the following tags before and after the <input>, <button>, and <option> elements in the form.
<!--NOVELL_REWRITE_ATTRIBUTE_ON='formvalue'-->
.
.
<input>, <button>, and <option> elements to be rewritten
.
.
<!--NOVELL_REWRITE_ATTRIBUTE_OFF='formvalue'-->
.
.
<input>, <button>, and <option> elements that shouldn’t be rewritten