Workaround for Backup Exec and Java Utilization Problem

Novell Cool Solutions: Tip
By Jack Shreve

Posted: 25 Oct 2007
 

Problem:

Background:

NetWare 6.5/OES
Novell Cluster Services (May apply to standalone as well)
BackupExec v9.x (May apply to earlier versions as well)

In a clustered environment, CPU utilization on the node where BE is running spikes to 90% or more when you attempt to check job logs. It takes minutes before control of the BE screen is returned and the job logs appear.

Response times for users of the server hosting the BE resource are severely degraded.

This is nearly always a Java utilization problem. You can confirm this using the NoRM (Novell Remote Manager) Web utility on the server hosting the BackupExec resource and checking under "Diagnose Server" and "Profile/Debug". One or more Java threads will be hogging 30-80% or more of the CPU(s).

Solution:

If this is widespread throughout your environment, check your patch level, with special emphasis on the JRE/Java modules being loaded. The long-term solution is to fix the Java problems in both NetWare and BE by making sure you patch to the latest levels. If it affects only one server and happens rarely, it is probably a quirk, and either a reboot or the "Workaround" below is indicated.

Workaround:

For either situation named above, there are two courses to follow.

First, to troubleshoot in a clustered environment, try migrating the BE resource to a different node and observe the results. Often the utilization will drop quickly. This is an indication that Java had begun to grab and not release CPU and memory resources on the previous node.

Second, try unloading and reloading BE.

In a clustered environment, first exit the program (CTRL+X) and then type "BESTOPC"* at the console screen. NOTE: It is very important that you do not leave out the "C" on the end.

When this is done, type "UNLOAD BECDM". (BECDM does not always unload automatically as it should, and leaving it loaded when you restart BE in the last step has proven problematic.)

Next, find out which Java modules are running on the offending node:

Type: "JAVA -SHOW"

Look for a classname that starts with "vrts".
If you see this, note the number under "ID".

Then type "JAVA -KILLxxx", where xxx is the number you noted under ID when you issued the "JAVA -SHOW" command. Please note that there is no space between KILL and the ID number.**

This should bring your utilization down, at least temporarily, until you can troubleshoot the cause further. Check whether utilization has actually gone down; in most cases it will have.

Now restart BE using the "BESTARTC"* command. Again, in a clustered environment you must not forget the "C" at the end of that command. Try reading the job log; if Java was the problem, you will see your logs quickly, and your users will no longer see delayed responses. At this point, you can decide whether to leave the issue for after hours and then find out which Java modules, NLMs, or hardware are not playing nicely together, or to forgo the workstation GUI entirely.

*In a non-clustered environment, the steps are the same except that you do not, of course, migrate resources, and the commands to unload/load BE do not include the trailing "C".

**Note that killing the vrts classname module will render the workstation-based BE GUI inoperable, but server-based operations are not affected.
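
As a quick reference, the console commands from the workaround can be listed in order. The short Python sketch below only builds that list as text; it does not talk to a server. The function name and the sample ID 123 are hypothetical and stand in for the number you read under "ID" in the JAVA -SHOW output. The kill switch is written with a leading dash to match -SHOW, and the non-clustered command names are formed by dropping the trailing "C" as described in the note above.

```python
def be_restart_commands(java_id, clustered=True):
    """Return the workaround's console commands, in the order they are typed.

    java_id   -- the number noted under "ID" for the "vrts" classname
                 (a placeholder here; read the real one from JAVA -SHOW)
    clustered -- True for a clustered node, where the BE stop/start
                 commands carry a trailing "C"
    """
    suffix = "C" if clustered else ""
    return [
        f"BESTOP{suffix}",        # stop BE (after exiting the program with CTRL+X)
        "UNLOAD BECDM",           # BECDM does not always unload on its own
        "JAVA -SHOW",             # list Java classes and their IDs
        f"JAVA -KILL{java_id}",   # no space between KILL and the ID
        f"BESTART{suffix}",       # restart BE
    ]


if __name__ == "__main__":
    # 123 is a made-up ID used only for illustration.
    for command in be_restart_commands(123):
        print(command)
```

Calling be_restart_commands(123, clustered=False) gives the same sequence for a standalone server, with the stop and start commands written without the trailing "C".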

Environment:

  • NetWare 6.X
  • NCS (Cluster Services, but not necessarily confined to NCS environs)
  • Java JREs
  • BackupExec 9.x

