Operations Center is built on Java and therefore requires some attention to the configuration of the Java Virtual Machine (JVM). One of the nicest features of the JVM is its memory management. In some other programming languages, such as C, developers had to manage memory themselves: allocating and deallocating it on the fly as needed, with the resulting potential for memory leaks (i.e., allocating memory over and over without freeing it when it was no longer needed). There is still the possibility of memory leaks in a Java application, but with Java managing the overall allocation and deallocation, the risks are lower.
You tune a JVM via the command line arguments used to start up the Java application. You can look in the Operations Center Customizer to see the JVM arguments used for each main process in your implementation. By default, Operations Center installs with some basic settings, such as the maximum amount of memory to use and the log file location. These settings are just to get an implementation started; they are not the recommended production settings for any/all implementations. Since each implementation of Operations Center has different amounts of elements, alarms, types of users, etc., the amount of memory and how memory needs to be managed can differ. The best starting point is to get a system fully loaded with all adapters and views. From there, you can start the tuning process.
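For illustration, -Xms, -Xmx, -verbose:gc and -Xloggc are standard HotSpot JVM flags; the jar name and values below are hypothetical, not the actual Operations Center defaults (check the Customizer for your real settings):

```shell
# Hypothetical example only -- the real start script, jar name, and memory
# values for an Operations Center install will differ; check the Customizer.
java -Xms1024m -Xmx1024m \
     -verbose:gc -Xloggc:logs/fsgc.log \
     -jar server.jar
```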
The JVM tuning process is both a science and an art. In the Operations Console (thick client), if you navigate to the Administration/Server element and switch to the Performance view, you will notice that you can chart several different metrics. These metrics are specific to the JVM's memory utilization and can help you during the tuning process. The most commonly charted metric is "Total.Heap.Memory.Used". While this implies the amount of memory used, it is not the amount allocated: depending on the -Xmx and -Xms JVM settings, the system may not allocate the full -Xmx on initial startup. Charting other metrics will assist in the tuning, but this is typically where I start: how does the system appear to be managing its memory?
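As a side note, the used-versus-allocated distinction can be observed from any JVM via java.lang.Runtime. Here is a minimal sketch; whether Total.Heap.Memory.Used maps exactly onto these values is my assumption, not something documented here:

```java
// Minimal sketch: the difference between heap "used" and heap "allocated"
// (committed), as reported by the standard java.lang.Runtime API.
public class HeapSnapshot {
    // Roughly the -Xmx ceiling (Long.MAX_VALUE if unbounded), in KB.
    public static long maxK() {
        return Runtime.getRuntime().maxMemory() / 1024;
    }

    // Heap currently allocated (committed) from the OS, in KB.
    public static long committedK() {
        return Runtime.getRuntime().totalMemory() / 1024;
    }

    // Heap actually holding data right now, in KB.
    public static long usedK() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / 1024;
    }

    public static void main(String[] args) {
        System.out.printf("max=%dK committed=%dK used=%dK%n",
                maxK(), committedK(), usedK());
    }
}
```

On a freshly started JVM you will typically see committed well below max, which is exactly why the charted "used" value is not the -Xmx amount.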
What I look for in Operations Center is Total.Heap.Memory.Used showing a mostly fixed high and low water mark, with memory going up and back down in a sawtooth pattern. This means that as the system runs, memory is allocated and deallocated on a routine basis due to the standard processing of alarms, elements, user requests, etc.
When the Operations Center server first starts, a low amount of memory will be used. Over time the memory used will slowly grow, but typically it should stabilize. If the server has been running for a long period of time and the amount of data coming in is not growing, the overall memory used should stay roughly the same.
Getting back to memory allocation and deallocation: there is a term used around Java-based applications called Garbage Collection. Garbage Collection (GC) is the process of the JVM going through the allocated memory and removing (deallocating) unused/unneeded objects. GCs themselves are not a bad thing, but something referred to as a "Full GC" needs to be avoided. There is no command line argument that simply disables Full GCs; instead, you must tune the JVM so they rarely happen. In a nutshell, a GC walks through memory and frees things up. At some point, based on the amount of memory currently in use relative to the maximum allowed memory (the -Xmx setting), the JVM may decide that it MUST really, really, really deallocate memory right now. In order to do that, it literally pauses all threads running in the JVM (think of someone hitting the pause button on Operations Center: no new alarms come in, no elements are updated, no response to end users clicking on things). You may from time to time see a Full GC that lasts a few seconds; those are not that bad. But when the memory used gets very close to the -Xmx setting, the JVM may go into a panic state and run Full GCs back to back to back (i.e., Operations Center will appear locked up). This is a bad situation for Operations Center: back-to-back Full GCs mean that pretty much everything in Operations Center has come to a halt (paused) while the JVM struggles to free up memory because it needs to store more data.
Within the logs directory there is a file called fsgc.log; this file will help you understand when GCs run, how long they take, and what type they are. The file is re-created every time Operations Center starts. You will see Full GCs on initial startup; they are not a problem, mostly because they run in such a short period of time that they will not be noticed. After the system is up and running, you should rarely see them. Below is an example fsgc.log from an initial startup.
0.378: [GC 13312K->4879K(47936K), 0.0064363 secs]
0.804: [GC 18191K->6444K(47936K), 0.0066296 secs]
1.349: [GC 19756K->6805K(47936K), 0.0035868 secs]
1.795: [GC 20109K->8710K(47936K), 0.0043142 secs]
2.345: [GC 22022K->11267K(47936K), 0.0046819 secs]
2.731: [GC 24579K->12802K(47936K), 0.0064709 secs]
3.260: [GC 26114K->13095K(47936K), 0.0052920 secs]
3.722: [GC 26407K->14679K(47936K), 0.0056979 secs]
4.380: [GC 27991K->15332K(47936K), 0.0054607 secs]
6.525: [GC 28644K->16929K(47936K), 0.0065346 secs]
7.892: [Full GC 29568K->17472K(47936K), 0.0682771 secs]
9.953: [GC 30784K->18141K(48000K), 0.0037259 secs]
10.499: [GC 31453K->20318K(48000K), 0.0079776 secs]
10.623: [Full GC 22873K->19975K(48000K), 0.0726967 secs]
11.387: [GC 33335K->20965K(48396K), 0.0040743 secs]
11.459: [GC 34405K->20925K(48396K), 0.0023190 secs]
12.123: [GC 34365K->21307K(48396K), 0.0017868 secs]
12.310: [GC 34747K->21474K(48396K), 0.0018603 secs]
12.402: [GC 34914K->21588K(48396K), 0.0015354 secs]
12.838: [Full GC 32341K->21694K(48396K), 0.0899491 secs]
13.453: [GC 36350K->22583K(52608K), 0.0023321 secs]
13.610: [GC 37239K->22784K(52608K), 0.0022271 secs]
The first column (0.378) is the number of seconds since the JVM process started (i.e., Operations Center had been running for 0.378 seconds).
The second column ([GC 13312K->4879K(47936K)) shows the type of GC (partial or full), the heap used before and after the collection, and, in parentheses, the heap currently committed. A line can show an increase or a decrease in the amount of memory being used within the JVM. Remember, memory used and allocated are different. Think of the -Xmx setting as the maximum size of the hard drive, while you are only using a certain amount of disk space. -Xmx is the maximum amount of memory the JVM *should* use (roughly; it may grow a small percentage past that), while the Heap Used metric shows what is currently being used.
The third column of data (0.0064363 secs) shows how long the GC ran. A line like the one at 10.623 shows a Full GC, but since it completed in 0.072 seconds, it is not an issue. The sign that your system is having memory problems that impact the system and/or end users is a Full GC line where the amount of memory before and after barely changes and the duration is measured in whole seconds (5 seconds, 30 seconds, etc.). Worst case, it is followed by more and more and more Full GCs.
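The three columns described above can be pulled apart mechanically. Below is a minimal sketch of a parser for lines in the format shown in the fsgc.log listing above (the class name and the one-second "worrying" threshold are my own choices; other JVM versions and flags produce different log formats):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of a parser for GC log lines in the format shown above, e.g.
//   "7.892: [Full GC 29568K->17472K(47936K), 0.0682771 secs]"
public class GcLogLine {
    private static final Pattern LINE = Pattern.compile(
        "([\\d.]+): \\[(Full GC|GC) (\\d+)K->(\\d+)K\\((\\d+)K\\), ([\\d.]+) secs\\]");

    public final double uptimeSecs;   // seconds since the JVM started
    public final boolean full;        // Full GC vs. partial GC
    public final long beforeK;        // heap used before the collection (KB)
    public final long afterK;         // heap used after the collection (KB)
    public final long committedK;     // heap currently committed (KB)
    public final double durationSecs; // how long the collection ran

    private GcLogLine(double uptimeSecs, boolean full, long beforeK,
                      long afterK, long committedK, double durationSecs) {
        this.uptimeSecs = uptimeSecs;
        this.full = full;
        this.beforeK = beforeK;
        this.afterK = afterK;
        this.committedK = committedK;
        this.durationSecs = durationSecs;
    }

    public static GcLogLine parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized GC log line: " + line);
        }
        return new GcLogLine(
            Double.parseDouble(m.group(1)),
            m.group(2).equals("Full GC"),
            Long.parseLong(m.group(3)),
            Long.parseLong(m.group(4)),
            Long.parseLong(m.group(5)),
            Double.parseDouble(m.group(6)));
    }

    // A Full GC lasting over a second (an arbitrary threshold for this
    // sketch) is the warning sign described in the article.
    public boolean isWorrying() {
        return full && durationSecs > 1.0;
    }

    public static void main(String[] args) {
        GcLogLine l = parse("7.892: [Full GC 29568K->17472K(47936K), 0.0682771 secs]");
        System.out.println("full=" + l.full + " worrying=" + l.isWorrying());
    }
}
```

Run against a whole fsgc.log, a loop over such parsed lines makes it easy to spot clusters of long Full GCs near the end of the file.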
As I indicated before, tuning a JVM is a science and an art. I recommend that you work with the Consulting Team to assist you, and/or read up on the JVM settings, how memory is used, and tips and tricks on tuning. I recently ran across a very well-written blog by Rupesh Ramachandran, Principal Solution Architect on Oracle's "A-Team", and a whitepaper from the Managed Runtime Initiative on the subject that would be a good starting point.
Since it doesn’t make sense to do the actual tuning testing on the production implementation of Operations Center, it is important to set up a development/test environment that closely mimics production's elements, alarms, load, etc. The dev environment must represent production as closely as possible to ensure the memory settings tested in dev make sense to use in production.
Disclaimer: As with everything else at Cool Solutions, this content is definitely not supported by Novell (so don't even think of calling Support if you try something and it blows up).
It was contributed by a community member and is published "as is." It seems to have worked for at least one person, and might work for you. But please be sure to test, test, test before you do anything drastic with it.