What To Know About Tuning Clusters And iSCSI

  • 3839149
  • 07-Feb-2007
  • 27-Apr-2012

Environment

NetWare Cluster Services 1.7
Novell NetWare Cluster Services 1.6
Novell NetWare 6.0
Novell NetWare 6.5
Poor man's SAN
SAN
iSCSI
Tuning

Situation

What To Know About Tuning Clusters And iSCSI

Resolution

WARNING: Default settings in a clustered environment should be suitable for most situations. However, because not every environment is ideal, certain parameters can be adjusted. Do not change parameters unless you fully understand the change or are instructed to do so by a Novell engineer.

NOTE: Before making changes, make sure all of the latest patches and support packs are applied (this includes iSCSI modules, if applicable).

GENERAL: This TID is divided into three sections. The first section covers clustering in general (with or without iSCSI). The second section concentrates on tuning cluster resources (with or without iSCSI). The third section addresses "Things to know" about iSCSI. This TID does NOT address installation of the cluster or iSCSI. Please refer to the documentation for that information: https://www.novell.com/documentation

Table of Contents:

Section I: CLUSTERING

Section II: Cluster Resources

Section III: iSCSI

SECTION I: CLUSTERING

PROPERTIES OF THE CLUSTER OBJECT (Screenshots are from ConsoleOne. The same parameters can also be modified under NetWare Remote Manager or iManager).

Inside of the properties you will see multiple tabs. We will only walk through some of them as seen below:

Membership (Number of Nodes)

The Quorum Membership is the number of nodes that must be running in the cluster before resources will start to load. When you first bring up servers in your cluster, Novell Cluster Services reads the number specified in the Membership field and waits until that number of servers is up and running in the cluster before it starts loading resources.

Set the Membership value to a number greater than 1 so that all resources don't automatically load on the first server that is brought up in the cluster. For example, if you set the Membership value to 4, there must be four servers up in the cluster before any resource will load and start.

Timeout

Timeout specifies the amount of time to wait for the number of servers defined in the Membership field to be up and running. If the timeout period elapses before the quorum membership reaches its specified number, resources will automatically start loading on the servers that are currently up and running in the cluster. For example, if you specify a Membership value of 4 and a timeout value equal to 30 seconds, and after 30 seconds only two servers are up and running in the cluster, resources will begin to load on the two servers that are up and running in the cluster.
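
The interaction between Membership and Timeout can be sketched as a small decision function. This is only an illustration of the rule described above, not Novell Cluster Services code; the function and parameter names are hypothetical.

```python
def should_start_resources(nodes_up: int, membership: int,
                           elapsed_s: float, timeout_s: float) -> bool:
    """Return True once resources may begin loading (illustrative only)."""
    # Quorum reached: the configured number of nodes has joined the cluster.
    if nodes_up >= membership:
        return True
    # Quorum not reached, but the timeout has elapsed:
    # resources load on whichever nodes are currently up.
    return elapsed_s >= timeout_s and nodes_up > 0


# Membership of 4 and a 30-second timeout, as in the example above.
print(should_start_resources(nodes_up=2, membership=4, elapsed_s=10, timeout_s=30))
print(should_start_resources(nodes_up=2, membership=4, elapsed_s=30, timeout_s=30))
```

With two of four nodes up, the first call reports that resources keep waiting; the second call, made once 30 seconds have passed, reports that they start loading on the two available nodes.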

Cluster Protocol Properties

You can use the Cluster Protocol property pages to view or edit the transmit frequency and tolerance settings for all nodes in the cluster, including the master node. The master node is generally the first node brought online in the cluster, but if that node fails, any of the other nodes in the cluster can become the master.

If you change any protocol properties, you should restart all servers in the cluster to ensure the changes take effect.

For poison pill abends, you will generally raise the Tolerance and Slave Watchdog values. Why? This allows more time for a heartbeat packet to reach the server or for the keep-alive tick to be written to the SAN (or other storage device). A poison pill abend is generally a communication abend (the cause could be hardware, a driver, heavy traffic, etc.).

NOTE: Heartbeat and Master Watchdog settings should be the same. Tolerance and Slave Watchdog settings should also be the same. For example, if you change Tolerance to 12 seconds, then the Slave Watchdog should also be raised to 12 seconds.

Using iManager

1. In the left column of the main iManager page, locate Cluster Administration, then click the Configure link.
2. Enter the cluster name or browse and select it, then click the Properties button under the cluster name.
3. Click the Protocols tab.

This page also allows you to view the script used to configure the cluster protocol settings, but not change it. Changes made to the protocol settings will automatically update the script.

Using ConsoleOne

1. Right-click the Cluster object.
2. Click Properties.
3. On the Cluster Object property page, select the Protocol tab.

This tab has two pages: Settings and Internals. The Internals page lets you view the script used to configure the cluster protocol settings, but not change it. Use the Settings page to make changes to cluster protocol properties.

Using NetWare Remote Manager

1. On the left column under the Clustering section, click Cluster Config.
2. Select the Cluster object name.
3. Click Protocol.

Heartbeat

Heartbeat specifies the amount of time between transmits for all nodes in the cluster except the master. For example, if you set this value to 1, nonmaster nodes in the cluster send a signal that they are alive to the master node every second.

Tolerance

Tolerance specifies the amount of time the master node gives all other nodes in the cluster to signal that they are alive. For example, setting this value to 4 means that if the master node does not receive an "I'm alive" signal from a node in the cluster within four seconds, that node is removed from the cluster.

Master Watchdog

Master Watchdog specifies the amount of time between transmits for the master node in the cluster. For example, if you set this value to 1, the master node in the cluster transmits an "I'm alive" signal to all the other nodes in the cluster every second.

Slave Watchdog

Slave Watchdog specifies the amount of time the master node has to signal that it is alive. For example, setting this value to 5 means that if the nonmaster nodes in the cluster do not receive an "I'm alive" signal from the master within five seconds, the master node is removed from the cluster and one of the other nodes becomes the master node.

Max Retransmits

This option is not currently used with Novell Cluster Services but will be used for future versions.
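
The four timing settings and the pairing rule (Heartbeat equals Master Watchdog, Tolerance equals Slave Watchdog) can be modeled as follows. This is a minimal sketch for reasoning about the values before entering them in a management tool. The class and function names are hypothetical and are not part of any Novell API.

```python
from dataclasses import dataclass


@dataclass
class ProtocolSettings:
    heartbeat_s: int        # interval between transmits from nonmaster nodes
    tolerance_s: int        # how long the master waits for a nonmaster node
    master_watchdog_s: int  # interval between transmits from the master node
    slave_watchdog_s: int   # how long nonmaster nodes wait for the master


def check_pairing(s: ProtocolSettings) -> list[str]:
    """List violations of the pairing rule described in this TID."""
    problems = []
    if s.heartbeat_s != s.master_watchdog_s:
        problems.append("Heartbeat and Master Watchdog differ")
    if s.tolerance_s != s.slave_watchdog_s:
        problems.append("Tolerance and Slave Watchdog differ")
    return problems


def node_removed(seconds_since_last_signal: float, limit_s: int) -> bool:
    """A node is removed when no "I'm alive" signal arrives within the limit."""
    return seconds_since_last_signal > limit_s


# Tolerance was raised to 12 seconds but Slave Watchdog was left at 8.
print(check_pairing(ProtocolSettings(1, 12, 1, 8)))
```

For the master's view of a nonmaster node, pass the Tolerance value as limit_s; for the nonmaster nodes' view of the master, pass the Slave Watchdog value.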

Cluster IP Address and Port Properties

The Cluster IP address is assigned when you install Novell Cluster Services. The Cluster IP address normally does not need to be changed, but can be if needed.

The default cluster port number is 7023, and is automatically assigned when the cluster is created. The cluster port number does not need to be changed unless a conflict is created by another resource using the same port number. If there is a port number conflict, change the Port number to any other value that doesn't cause a conflict.

Using iManager

1. In the left column of the main iManager page, locate Cluster Administration, then click the Configure link.
2. Enter the cluster name or browse and select it, then click the Properties button under the cluster name.
3. Click the General tab.

In iManager, the same page used to view or edit the cluster IP address and port properties is also used for quorum membership and timeout and for cluster e-mail notification.


Using ConsoleOne

1. Right-click the cluster object.
2. Click Properties.
3. On the Cluster Object property page, select the Management tab.


Using NetWare Remote Manager

1. On the left column under the Clustering section, click Cluster Config.
2. Select the Cluster object name.
3. Click IP Address.

Resource Priority

The Resource Priority allows you to control the order in which multiple resources start on a given node when the cluster is brought up or during a failover or failback. For example, if a node fails and two resources fail over to another node, the resource priority determines which resource loads first.

This is useful for ensuring that the most critical resources load first and are available to users before less critical resources. (You CANNOT change the priority of the Master_IP_Address_Resource. It always takes the highest priority.)

Using iManager

1. In the left column of the main iManager page, locate Cluster Administration, then click the Configure link.
2. Enter the cluster name or browse and select it, then click the Properties button under the cluster name.
3. Click the Priorities tab.
4. To change the priority for a resource, select the resource in the list by clicking it, then click the up-arrow or down-arrow to move the resource up or down in the list. This lets you change the load order of the resource relative to other cluster resources on the same node.
5. Click the Apply button to save changes made to resource priorities.


Using ConsoleOne

1. Right-click the cluster object.
2. Click Properties.
3. On the Cluster Object property page, select the Resource Priority tab.
4. To change the priority for a resource, select the resource in the list and then click the Increase or Decrease button to move the resource up or down in the list. This lets you change the load order of the resource relative to other cluster resources on the same node. You can also select a resource and then click the Selected button to reset the resource back to its default load order.
5. Click the Apply button to save changes made to resource priorities.


Using NetWare Remote Manager

1. On the left column under the Clustering section, click Cluster Config.
2. Select the Cluster object name.
3. Click Resource Priorities.
4. To change the priority for a resource, assign it a number between 0 and 65535. 65535 is the maximum value and 0 is the minimum value. Setting a resource priority to 65535 ensures the resource loads before other resources with lower priority settings. Setting the resource priority to 0 ensures the resource loads last after all other resources have loaded. The default resource priority setting is 0. If you assign multiple resources the same priority, the start order of those resources is random.
5. Click the Apply button to save changes made to resource priorities.
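
The numeric priorities used by NetWare Remote Manager can be pictured as a simple sort. The sketch below assumes made-up resource names and is not how Novell Cluster Services is implemented. Ties are broken by name only to keep the output repeatable, whereas the real start order among equal priorities is random.

```python
def load_order(priorities: dict[str, int]) -> list[str]:
    """Return resource names in the order they would load (illustrative)."""
    for name, value in priorities.items():
        if not 0 <= value <= 65535:
            raise ValueError(f"{name}: priority {value} is outside 0-65535")
    # Higher numbers load first. Equal priorities are sorted by name here
    # for repeatability; on a real cluster their order is random.
    return sorted(priorities, key=lambda name: (-priorities[name], name))


# Hypothetical resources: mail is most critical, print is left at the default 0.
print(load_order({"PRINT_SERVER": 0, "MAIL_SERVER": 65535, "VOL1_SERVER": 100}))
```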

Cluster E-Mail Notification

Novell Cluster Services can automatically send out e-mail messages for certain cluster events like cluster and resource state changes or nodes joining or leaving the cluster.

In order for cluster e-mail notification to work, you must first configure e-mail notification for server health status using NetWare Remote Manager. For instructions on how to do this, go to "Configuring E-Mail Notification for Server Health Status" in the NetWare Remote Manager Administration Guide.

You can enable or disable e-mail notification for the cluster and specify up to eight administrator e-mail addresses for cluster notification.

IMPORTANT: If you add or delete an administrator e-mail address or change the type of cluster events you want administrators to receive messages for (verbose to XML, etc.), you must reload cma.nlm (the Cluster Management Agent) on all servers in the cluster for the changes to take effect.


Using iManager

1. In the left column of the main iManager page, locate Cluster Administration, then click the Configure link.
2. Enter the cluster name or browse and select it, then click the Properties button under the cluster name.
3. Click the General tab.
4. Check or uncheck the Enable Cluster Notification Events check box to enable or disable e-mail notification.
5. If you enable e-mail notification, add the desired e-mail addresses in the field provided. You can click the buttons next to the field to add, delete, or edit e-mail addresses. Repeat this process for each e-mail address you want on the notification list.
6. If you enable e-mail notification, specify the type of cluster events you want administrators to receive messages for. To only receive notification of critical events like a node failure or a resource going comatose, click the Receive Only Critical Events radio button. To receive notification of all cluster state changes including critical events, resource state changes, and nodes joining and leaving the cluster, click the Verbose Messages radio button. To receive notification of all cluster state changes in XML format, choose the XML Messages option. XML format messages can be interpreted and formatted with a parser that lets you customize the message information for your specific needs.
7. Click the Apply button to save changes.


Using ConsoleOne

1. Right-click the cluster object.
2. Click Properties.
3. On the Cluster Object property page, select the Notification tab.
4. Check or uncheck the Enable Cluster Notification Events check box to enable or disable e-mail notification.
5. If you enable e-mail notification, add the desired e-mail address in the field provided and click the button next to the field to add the address to the list. Repeat this process for each address you want on the notification list. It is not necessary to add quotes to e-mail address names.
6. If you enable e-mail notification, specify the type of cluster events you want administrators to receive messages for. To only receive notification of critical events like a node failure or a resource going comatose, click the Receive Only Critical Events radio button. To receive notification of all cluster state changes including critical events, resource state changes, and nodes joining and leaving the cluster, click the Verbose Messages radio button. To receive notification of all cluster state changes in XML format, choose the XML Messages option. XML format messages can be interpreted and formatted with a parser that lets you customize the message information for your specific needs.
7. Click the Apply button to save changes made.


Using NetWare Remote Manager

1. On the left column under the Clustering section, click Cluster Config.
2. Select the Cluster object name and click Email Reporting.
3. Add the desired email addresses in the fields provided. It is not necessary to add quotes to e-mail address names.
4. Specify the type of cluster events you want administrators to receive messages for.

Specify a 1 or a 0 to disable email notification.

Specify a 2 (Critical) to only receive notification of critical events like a node failure or a resource going comatose.

Specify a 4 (Verbose) to receive notification of all cluster state changes including critical events, resource state changes, and nodes joining and leaving the cluster.

Specify an 8 to receive notification of all cluster state changes in XML format. XML format messages can be interpreted and formatted with a parser that lets you customize the message information for your specific needs.
5. Click the Apply button to save your changes.
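
As a quick reference, the numeric values accepted in the NetWare Remote Manager Email Reporting field can be written as a lookup table. This is only a sketch: the names are hypothetical, and rejecting any value not listed above is an assumption, since this TID does not say how other values are handled.

```python
# Values for the Email Reporting field, as listed in the steps above.
NOTIFICATION_LEVELS = {
    0: "disabled",
    1: "disabled",
    2: "critical events only",
    4: "verbose (all cluster state changes)",
    8: "XML (all cluster state changes, XML format)",
}


def describe_level(value: int) -> str:
    """Translate a numeric setting into a description."""
    # Assumption: values not listed in the TID are treated as invalid.
    if value not in NOTIFICATION_LEVELS:
        raise ValueError(f"unsupported notification value: {value}")
    return NOTIFICATION_LEVELS[value]


print(describe_level(4))
```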

Cluster Node Identification

(Note the red dot on all cluster-related objects)

You can view or edit the cluster node number or IP address of the selected node or view the context for the NetWare Server object.

Using iManager

1. In the left column of the main iManager page, locate Cluster Administration, then click the Configure link.
2. Enter the cluster name or browse and select it, check the box next to the cluster node whose properties you want to view or edit, then click the Properties link.
3. View or edit the IP address, then click Apply to update the information in eDirectory.

If the IP address changes for this server, the new information is not automatically updated in eDirectory.


Using ConsoleOne

1. Select the Cluster object and right-click the desired cluster node on the right side of the ConsoleOne display screen.
2. Click Properties.
3. On the Cluster Node property page, select the Node tab.


Using NetWare Remote Manager

1. On the left column under the Clustering section, click Cluster Config.
2. Select the Cluster node name.
3. Click IP Address or Node Number.


(Node) Number+IP Address

Number+IP Address specifies the cluster node number and IP address for the selected node. If the cluster node number or IP address changes for the selected node, the new information is not automatically updated in eDirectory. Edit the information and click Apply to update the information in eDirectory.

NOTE: Further information can be found at https://www.novell.com/documentation/oes/index.html?page=/documentation/oes/orionenu/data/hclny85e.html

SECTION II: CLUSTER RESOURCES

(Note the red dot on all cluster-related objects)

This section will not include information on how to modify this information via iManager and NetWare Remote Manager. It is assumed that from the steps above in section 1 you will have an idea of how and where this is done in each application.

IP ADDRESS:

Simple enough--each resource object in the cluster needs a unique IP address. You can change that here.

LOAD AND UNLOAD SCRIPTS:

SCRIPTS

The LOAD and UNLOAD scripts can be modified--but be careful! Before adding anything to the scripts, make sure you save a copy of them. One thing to watch out for is white space. There is a character limit on the scripts, and it includes white space. If a lot of information is needed in the LOAD and UNLOAD scripts, create an NCF file in the SYS:\SYSTEM directory and put a call to it in the scripts.

RESOURCE POLICIES:

MODES

Note that you can ignore the quorum when mounting resources.

START MODE refers to when you bring a cluster up. By default this is set to AUTO, which tells the cluster to mount the resource upon startup of the cluster.

FAILOVER MODE does just what it says. If a server that is hosting RESOURCE1 abends and FAILOVER MODE is set to AUTO, the resource moves over to a live node. If it is set to MANUAL (why manual? Perhaps you are troubleshooting and don't want to online and offline a resource on multiple nodes), it will NOT fail over without administrator intervention.

MASTER ONLY

Checking this will tell the resource to only mount on the server with the master lock. If the master IP resource fails over, so will the resource(s) with this box checked.

FAILBACK MODE

Failback mode is disabled by default. If a server abends 30 seconds after you bring it up, you would not want your resources to fail back to the troubled server only to have them fail over again to another server. Each failover involves some downtime (measured in seconds), and a resource that moves back and forth could cause problems for users who have connections. If failback mode is set to AUTO, the resource fails back to its priority server (described below) when that server comes up.

NODES: CLUSTER RESOURCE PREFERRED NODES:

For this particular resource, NODE2 is the priority node on which to load (or to fail back to, if failback mode is enabled). If you had additional cluster nodes not assigned to the resource, they would show up under the "Unassigned" box. For example, you may allow a GroupWise cluster resource to fail over to only 2 or 3 nodes out of the 10 in the cluster, because those may be the only servers on which you want GroupWise loaded.

SECTION III: iSCSI

As mentioned in the beginning, this TID will not talk about installation of a cluster or setup of iSCSI. Refer to the documentation for that information. There are a few things you can do to improve performance with iSCSI.

1. Make sure that you have the latest modules. At the time of the writing of this TID, the most up-to-date version of iSCSI is 1.03.01 and can be found at support.novell.com/filefinder. If you cannot find the file, please contact Novell for the latest modules.

2. If the network generates a lot of traffic, this can cause issues with "slowness" and even poison pill abends on the cluster nodes (this will occur when heartbeat packets cannot make it through on time to the server nodes or the server that holds the shared device). Please refer to KB 10053882 "The Gory Details of Heartbeats, Split Brains, and Poison Pills" to understand more about poison pill abends. Cluster tuning, as mentioned above, may help with this. Below are a couple of other things to take into consideration:

A. If saving or creating a lot of small files, consider bumping up Small ECBs (some applications use these when working with small files of about 256 bytes or smaller). To monitor this, type _IP at the server console and go to option 1 (as seen below; pay attention to lines 6 and 7 concerning small ECBs). By default, Max Small ECBs Allowed is set to 1024. This can be bumped up to 65534. Just above Max Small ECBs Allowed you will see Free Small ECBs. If this drops to zero, you will lose IP connectivity to the server (it won't even be able to ping itself). Heartbeat packets over the wire consume small ECBs, among other things. If these drop to zero, heartbeats don't get through and servers start to poison pill. Because 256 bytes * 65534 is only about 16 MB, there is no reason this parameter can't easily be bumped up to the max. If NetWare doesn't need a small ECB it won't allocate it; if it needs one, it will allocate it. The value can be raised by typing the following SET parameter at the server console (or in an NCF file): SET TCP IP MAXIMUM SMALL ECBS=##### (notice the space between TCP and IP). For official and most complete information on Small ECBs (what they are and what they are used for), please see KB 10089748.
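
The memory cost of raising the ceiling is easy to work through. The sketch below multiplies the approximate small ECB size by the maximum count and builds the SET line quoted above. The helper names are hypothetical, and the 256-byte size is the approximate figure given in this TID rather than an exact allocation size. Worked through, 256 bytes * 65534 comes to 16,776,704 bytes, or roughly 16 MB, which is still a small amount of server memory.

```python
SMALL_ECB_BYTES = 256   # approximate size of one small ECB, per this TID
MAX_SMALL_ECBS = 65534  # highest value this TID says the parameter accepts


def small_ecb_ceiling_bytes(count: int) -> int:
    """Worst-case memory if every allowed small ECB were allocated."""
    return count * SMALL_ECB_BYTES


def set_command(count: int) -> str:
    """Build the console/NCF line for a given ceiling."""
    if not 0 < count <= MAX_SMALL_ECBS:
        raise ValueError(f"count must be between 1 and {MAX_SMALL_ECBS}")
    return f"SET TCP IP MAXIMUM SMALL ECBS={count}"


total = small_ecb_ceiling_bytes(MAX_SMALL_ECBS)
print(total, "bytes, about", round(total / (1024 * 1024), 1), "MB")
print(set_command(20000))
```

Because NetWare only allocates small ECBs as they are needed, this figure is a ceiling, not memory that is reserved up front.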

B. Along similar lines, one other thing to check (to make sure you aren't having LAN issues) is to go to MONITOR | LAN/WAN DRIVERS | pick your CARD from the list | hit TAB | scroll down to ETHERNET COUNTERS and look at TRANSMIT FAILED, CARRIER SENSE MISSING and TRANSMIT FAILED, EXCESSIVE COLLISIONS. Neither of these values should be more than 10% of the total number of packets. If either of them is above 10%, you have communication issues that should be addressed.
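
The 10% check can be done by hand, or with a few lines like the sketch below once the counters have been read off the MONITOR screen. The function name and the sample numbers are made up for illustration.

```python
def exceeds_threshold(counter: int, total_packets: int,
                      limit: float = 0.10) -> bool:
    """True if an error counter is more than 10% of all packets."""
    if total_packets <= 0:
        return False  # no traffic yet, nothing to judge
    return counter / total_packets > limit


# Hypothetical readings copied from ETHERNET COUNTERS.
total = 1_000_000
readings = {
    "TRANSMIT FAILED, CARRIER SENSE MISSING": 150_000,
    "TRANSMIT FAILED, EXCESSIVE COLLISIONS": 20_000,
}
for name, value in readings.items():
    status = "investigate" if exceeds_threshold(value, total) else "ok"
    print(f"{name}: {status}")
```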

3. Along similar lines would be Packet Receive Buffers. What are the differences between PRBs and ECBs? Size! PRBs are about 4K block sizes and Small ECBs are 256 bytes or smaller.

One other thing to note is that PACKET RECEIVE BUFFERS and SMALL ECBS are 2 different sets of buffers. Small ECBs are not part of packet receive buffers but have a similar function. Again, the difference is the size of packet each handles. TCPIP will call one or the other. If you run out of small ECBs then a normal PRB will not be called in its place. You can run out of Small ECBs and have PRBs free and vice versa.

Packet Receive Buffers (and Small ECBs) are used to store incoming and outgoing packets from each of the networks attached to a NetWare server. The Maximum Physical Receive Packet Size should be set according to the kind of network it is on. In most cases, this is 1524 bytes for Ethernet segments (NOTE: this may cause a problem with some Intel based LAN cards. Please check with the manufacturer or the documentation that came with the card), 4540 bytes for Token Ring and FDDI, and 618 bytes for Arcnet and LocalTalk. These values are taken from the "Novell BorderManager Installation and Setup" manual, the chapter on "Installing Novell BorderManager", page 9. Certain installed products have specific requirements; refer to their manuals for instructions.

A good rule of thumb is to set the minimum packet receive buffers to approximately 2-3 receive buffers for each connection, and the maximum packet receive buffers to 10000 (or any higher value).

Note that the memory allocated for ECBs cannot be used for other purposes. The minimum number of buffers available for the server can also be set in the STARTUP.NCF file with the following command: SET MINIMUM PACKET RECEIVE BUFFERS = number. Maximum is just a ceiling. NetWare allocates them as needed.

If the current Packet Receive Buffers rises above the minimum set level after the server has been up for a period of time, set the minimum Packet Receive Buffers to the current level.

ECBs are the packet of choice for a cluster heartbeat. Running out of ECBs is not a fatal condition except in a cluster. Heartbeats are dependent upon Small ECBs, and if you run out, servers will start poison pill abending. Number 2 above explains more about ECBs, but one symptom you may see if your small ECBs are dropping is the number of packet receive buffers climbing. If these are climbing to around your configured maximum (not just peaking, but staying high), you are probably also running out of Small ECBs. Application threads (NetMail is a good example of this) can associate themselves with both packet receive buffers and small ECBs at the same time! Just keep an eye on them as described and adjust accordingly.

Suggestions: In a clustered environment, bump Small ECBs to a minimum of 20,000. If FREE SMALL ECBs drops to zero, bump them up some more. SET MINIMUM PACKET RECEIVE BUFFERS=2000 and set the maximum to 10000 (per server). If you hit your maximum and stay there, bump the maximum packet receive buffers up. If you see your current packet receive buffers above the minimum, bump the minimum to match the current value. Every environment is different, and these suggestions are generic, minimal values. Just watch your system to see how it behaves and adjust if needed as described.
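
The packet receive buffer adjustments suggested above can be written out as a small rule. This is a sketch of the reasoning only. The function name is hypothetical, and the step of 2000 used to raise the maximum is an arbitrary assumption, because this TID does not say how far to raise it.

```python
def suggest_prb_settings(current: int, minimum: int, maximum: int,
                         pinned_at_max: bool, step: int = 2000) -> tuple[int, int]:
    """Return (new_minimum, new_maximum) following the suggestions above."""
    # If the current count has risen above the minimum, match the minimum to it.
    new_minimum = max(minimum, current)
    # If the server hits the maximum and stays there, raise the ceiling.
    # The step size is an assumption, not a documented value.
    new_maximum = maximum + step if pinned_at_max else maximum
    return new_minimum, new_maximum


# Starting from the suggested 2000 / 10000 settings.
print(suggest_prb_settings(current=3500, minimum=2000, maximum=10000,
                           pinned_at_max=False))
print(suggest_prb_settings(current=10000, minimum=2000, maximum=10000,
                           pinned_at_max=True))
```

The pinned_at_max flag has to come from watching the server over time, since a brief peak at the maximum is not the same as staying there.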

These suggestions are the same for both regular clustered environments and for iSCSI environments. Adding nodes to the cluster should not affect this. It is the load on the clusters that will typically change the way your servers will behave in the environment (meaning the amount of data being sent back and forth).

4. Finally, about the only thing left for consideration with iSCSI is network topology. Below are a couple of examples.

EXAMPLE 1:

With example 1 take into consideration the amount of data that is going to be passed over the network from workstation to iSCSI Initiator, and from iSCSI Initiator to iSCSI target. This can create a heavy load over the network backbone (dependent upon many variables). The benefit is less infrastructure/hardware to purchase.

EXAMPLE 2:

The following example takes some of the redundant load off of the Network Backbone and places it on a dedicated network off of the main backbone. This is the better way to create your LAN topology. Remember that iSCSI will not perform like fiber to a SAN--IT WILL BE CONSIDERABLY SLOWER. There are limitations and if you are hitting them after following this TID (meaning with performance tuning) and you need more out of the setup, it may be time to either create another cluster off of another backbone to try and load balance the network traffic, or consider upgrading the clustered environment to include fiber and a SAN.


Status

Top Issue

Additional Information


Formerly known as TID# 10096996