How Dave Does It: Determining Candidates for Consolidating Multiple POs to a Single Server
Novell Cool Solutions: Trench
By Dave Muldoon
Posted: 11 Nov 2004
With each article that I write, I try to provide some "real world" experience of managing GroupWise. It is my hope that the solutions and ideas I have dealt with will give you some insight or inspiration to take on these concepts within your organization. This all sounds great on paper, although from time to time I get so excited about sharing something new that I may not get everything out clearly. Looking back on a couple of articles that I wrote earlier this year, it seems to me that I failed to provide something when it comes to trending and consolidation, even though I touched upon the subject in both articles.
The first article in which I covered part of this concept was entitled "Taking a Pulse on Your GroupWise System". It goes into detail on how to perform a system health check and discusses concepts for historical trending. If you're familiar with my writings, then you are probably already aware that I strongly believe in tracking many statistics of the GroupWise system for trending and historical purposes.
The second article, in which I discussed running multiple GroupWise Agents (specifically Post Offices) on the same server, was entitled "Increasing Your GroupWise ROI". It talked about bending Novell's age-old rule for managing GroupWise - "the optimal way to run a GroupWise system is with a dedicated server for each agent". In that article I reviewed why you should consider breaking away from Novell's recommendation, as well as how to configure multiple agents on a single server.
Within this second article I think I failed to tell you how to identify which agents can run together without depleting a server of resources. I can imagine that it may be difficult for some administrators to understand how to identify which agents to consolidate if any at all. This goes especially for those systems that are running great today. It would seem rather risky to make changes and potentially degrade a large portion of the system just to try this out - right? RIGHT.
Well, my intention for this article is to fill the information gap that I created and hopefully help you understand how to identify agents that fit into this category or, at a minimum, how to compare servers (not just GroupWise servers) based on performance and what characteristics to look for when judging performance.
Knowing your environment through standards
To provide a baseline for this article, you must understand two things. The first is that when I mention a "server", I am referring to a server dedicated to GroupWise processing only. The second is that I have some relatively strict standards for the GroupWise Post Offices, one of which is that each Post Office should have no more than 800 users. This rule of thumb provides a few worthy advantages:
- Message databases and the message store in general are more easily managed as growth is solely based on increased usage of the GroupWise system. (Assuming that there is an expire/reduce policy).
- Users are divided into smaller "pockets" so that if there is an interruption of service, not all users in the system are impacted. NOTE: Post Offices are also divided by Domains to reduce impact - not all Post Offices are owned by the same Domain. This prevents large-scale impact if a Domain is having problems.
- This number of 800 users per Post Office was initially used to help manage the number of application connections per server. The number 800 was chosen based on seeing impact to end users when the POA had to handle more than 1,400 connections. Keep in mind that a connection is generated by each application that talks to the POA on behalf of the user: Notify, proxy, calendar windows and multi-user calendar views all generate connections, so one user can generate quite a few connections.
- Standardization of user numbers and server parameters helps a tremendous amount when troubleshooting problems. When one location or server is having an issue, every other location is configured the same way with the same number of users, so there is effectively a duplicate production environment to review and compare against.
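The connection arithmetic behind the 800-user figure can be sketched in a few lines. This is only a rough illustration, assuming the 1,400-connection threshold mentioned above; the function names and the sample reading of 1,300 connections are hypothetical and not part of any GroupWise tool.

```python
# Figures taken from the article: end users saw impact once a POA
# had to handle more than roughly 1,400 connections.
POA_CONNECTION_LIMIT = 1400
USERS_PER_PO = 800


def connections_per_user_budget(limit=POA_CONNECTION_LIMIT, users=USERS_PER_PO):
    """Average number of connections each user can hold before the limit is hit."""
    return limit / users


def connection_headroom(current_connections, limit=POA_CONNECTION_LIMIT):
    """Connections still available before the POA reaches the limit."""
    return limit - current_connections


print(connections_per_user_budget())   # 1.75 connections per user
print(connection_headroom(1300))       # 100 connections left
```

In other words, a full 800-user Post Office stays under the threshold only while users average fewer than two open connections each.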
Now based on the two standards that I have mentioned above (800 users per PO and dedicated GroupWise servers) and the advantages that I've outlined, I was able to do some forecasting...
About a year and a half ago I started watching a particular group of servers more closely in my environment on a hunch that they were underutilized. Knowing exactly how many users were on the Post Office and that there would never be any more users added, I was able to confidently predict performance of these servers over time. After a period of 18 months of tracking the entire system I had more than enough data to provide a recommendation for further consolidation of the GroupWise system. This time, instead of user moves to manage consolidation of the system, the consolidation is related to hardware.
What you should understand here is that there are distinct advantages to running 1,600 users on two Post Offices as opposed to running 1,600 users on one Post Office. The message store is "broken in half", so to speak, as only 800 users share each set of message databases and OFFILES directory structure. This fact alone helps reduce maintenance times. In this configuration, though, scheduled events should be staggered so that both agents are not processing structure, contents, expire/reduce or other such maintenance processes at the same time.
As I mentioned, I had 18 months of data that I used to back up my recommendation of positioning these "big" Post Offices to share the same server. As mentioned in a previous article, this is nothing new - many organizations may already be getting by with this type of configuration. Such was the case with one reader who actually commented that he needed to tell his boss just how efficient their system already was. I love it! You deserve a raise - tell him that too (okay, maybe you better not, but keep the comments coming).
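Staggering the maintenance events for two agents on one server can be planned on paper before anything is configured. The sketch below is a hypothetical planner, not a GroupWise feature: the Post Office names, the start time and the one-hour slot are all assumptions, and the resulting times would still have to be entered by hand in each POA's scheduled events.

```python
from datetime import datetime, timedelta


def stagger_events(post_offices, jobs, first_start, slot):
    """Give every (post office, job) pair its own start time, one slot apart.

    Assumes `slot` is at least as long as the longest job, so that two
    agents on the same server never run maintenance at the same time.
    """
    schedule = []
    start = first_start
    for job in jobs:
        for po in post_offices:
            schedule.append((po, job, start))
            start += slot
    return schedule


# Hypothetical example: two Post Offices sharing one server.
plan = stagger_events(
    ["PO1", "PO2"],
    ["structure check", "contents check", "expire/reduce"],
    first_start=datetime(2004, 11, 13, 22, 0),
    slot=timedelta(hours=1),
)
for po, job, start in plan:
    print(f"{start:%Y-%m-%d %H:%M}  {po}  {job}")
```

If a job regularly runs longer than the slot, the slot needs to grow or the jobs need to be spread across different nights.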
How to tell if the server is underutilized
As I mentioned, I could accurately predict how a Post Office would perform over time. I have no crystal ball to rely on, so in these cases I rely on some key factors. These factors are related to historical system analysis and trends. This data is not something difficult to gather. You can gather the data from your GroupWise system with some relatively manual processes, write your own scripts to gather the data or you can purchase software on the market today. Once you have the data you can analyze the information to provide direction for projects, upgrades and consolidations/expansions. What I want you to understand here is that this data collection is something that anyone can do. It just takes a little time to figure out what approach to take but it's time well spent.
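If you go the script route, the trending itself comes down to keeping an average and a peak per server and per metric. Here is a minimal sketch, assuming the samples have already been exported to CSV with hypothetical columns named server, metric and value; how you collect those samples depends entirely on your monitoring tools.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def summarize(csv_text):
    """Return {server: {metric: {"avg": ..., "peak": ...}}} from CSV samples."""
    samples = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(io.StringIO(csv_text)):
        samples[row["server"]][row["metric"]].append(float(row["value"]))
    return {
        server: {
            metric: {"avg": mean(values), "peak": max(values)}
            for metric, values in metrics.items()
        }
        for server, metrics in samples.items()
    }


# Made-up sample data, for illustration only.
SAMPLE = """server,metric,value
GW1,cpu_pct,10
GW1,cpu_pct,20
GW1,cpu_pct,60
GW2,cpu_pct,40
"""

report = summarize(SAMPLE)
print(report["GW1"]["cpu_pct"])
```

Keeping both the average and the peak matters, because a server can look idle on average and still be busy during maintenance windows.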
Here are the specific items that I kept track of to trend the utilization of servers in my environment:
Disk I/O. I tracked the disk activity for the servers in my system and then compared the servers that I suspected were underutilized to those that I knew were the busier servers. I was able to determine that those that I suspected were underutilized were averaging much less disk activity than their busier counterparts. For example:
- The underutilized servers were averaging 80 reads and 90 writes per second.
- The busy servers were averaging 555 reads and 9,000 writes per second.
Processor. Post Office Agents are bound by two things: processor and disk. Having ruled out disk activity as a bottleneck for consolidation in the above section, the processor was the next place to look. For example:
- Dual CPUs on an underutilized server were averaging 5% and 10% and peaking out at 53% and 75%.
- Dual CPUs on a busy server were averaging 15% and 23% and peaking out at 71% and 79%.
I even found that these underutilized servers were showing lower CPU utilization during nightly maintenance processes (I just happened to be there for implementing a change and my curiosity got the best of me). This is obviously another resource of the server that is underutilized.
Network utilization (NIC traffic). Of course GroupWise would be nothing without communication bandwidth. Based on my hunch, it would seem that on those servers where users were just plodding along (based on low user connections) the NIC traffic should be lower than on the busier servers. Monitoring the NIC traffic on the servers within the system seemed to provide more basis for my hunch. For example:
- The underutilized servers were averaging 177KB per second inbound and 190KB per second outbound.
- The busy servers were averaging 520KB per second inbound and 708KB per second outbound.
This provided even more substantial data to support my recommendation. If you look at these numbers, as with most of the other components, the underutilized servers weren't even using half of what their busier counterparts were.
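The "less than half" comparison can be checked directly from the averages quoted in this section. The sketch below simply divides each underutilized average by the matching busy average; the dictionary keys are names I made up for illustration, while the numbers are the ones given above.

```python
# Averages quoted in the article for the two groups of servers.
underutilized = {
    "disk_reads_per_s": 80,
    "disk_writes_per_s": 90,
    "cpu0_avg_pct": 5,
    "cpu1_avg_pct": 10,
    "net_in_kb_per_s": 177,
    "net_out_kb_per_s": 190,
}
busy = {
    "disk_reads_per_s": 555,
    "disk_writes_per_s": 9000,
    "cpu0_avg_pct": 15,
    "cpu1_avg_pct": 23,
    "net_in_kb_per_s": 520,
    "net_out_kb_per_s": 708,
}


def load_ratios(light, heavy):
    """Fraction of the busy server's load that the light server carries, per metric."""
    return {metric: light[metric] / heavy[metric] for metric in light}


ratios = load_ratios(underutilized, busy)
for metric, ratio in ratios.items():
    print(f"{metric:20s} {ratio:6.1%}")
```

Every ratio comes out below 50%, which is the pattern that made these servers candidates for consolidation.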
User connections. I started watching the above three items solely because I saw user connections that were far from average throughout the system. User connections on an 800-user Post Office ranged from 100 connections up to 1,300 connections. Obviously there are many agents in the middle of those numbers. The ones I was watching were those on the low side of the scale.
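Picking out the Post Offices on the low side of the scale is a simple filter once the average connection counts are known. This is a sketch under assumptions: the Post Office names, the averages and the cutoff of 400 connections are hypothetical, so choose a cutoff that fits your own range.

```python
def low_side(avg_connections, cutoff):
    """Post Offices whose average connection count is below the cutoff, lowest first."""
    return sorted(
        (po for po, avg in avg_connections.items() if avg < cutoff),
        key=avg_connections.get,
    )


# Hypothetical averages spread across the 100 to 1,300 range described above.
averages = {"PO-A": 1300, "PO-B": 300, "PO-C": 700, "PO-D": 100}
candidates = low_side(averages, cutoff=400)
print(candidates)  # ['PO-D', 'PO-B']
```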
Where to put the underutilized Post Offices
As you can see through the use of the historical data and some simple mathematical calculations, it was relatively easy to determine that there were underutilized servers. The question then became what to do with the Post Offices running on them.
As you can imagine, my design would be to have all of the GroupWise agents remain on servers dedicated to GroupWise processing. This being the case, I had to see what my options were for moving the underutilized Post Offices to another server. Two options came to mind:
Keep them together
The simplest idea and the first thing that came to mind was to take two of these Post Offices that hardly use a server and put them on the same server. BINGO! I can't lose with that concept. Or can I?
I can easily predict that over time and with the existing load that two of these agents would never exceed half of the resources utilized by a busy Post Office. This left me with a slightly less than happy feeling, as I would still have a significant variance between my servers if I had some servers still running at less than half of their potential. This would be similar to buying a Ferrari and never taking it out on the highway.
So I continued brainstorming...
Strategically plan user connections
From what I can determine through facts, as well as common sense, user connections/requests are generally what take up server resources. This prompted another idea: I could try to balance out the user connections and see mathematically how that would work out. As I had mentioned, I had many servers with user connections averaging in the middle of the 100 to 1,300 range. So I ran some numbers combining statistics of an "average" server and an underutilized server. I then took those results and compared them to a busy server. The result: a really cool solution!
To elaborate a little further, what I did was take a server that had been averaging about 700 user connections and an underutilized server generating on average 300 user connections and paired them together. Of course I did this all on paper first. I'm no statistician but I was able to take the disk I/O, the processor information, the NIC utilization and add them together to see if I was overworking any part of the server. What I found was that I could take a server that was performing at about 20% capacity and combine it with a server running about 60% capacity and the end result would be a server running at about 80% capacity.
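The paper exercise described above amounts to adding the two servers' figures resource by resource and checking the total against a ceiling. The sketch below assumes each resource is expressed as a percentage of server capacity; the per-resource split is hypothetical, chosen to match the overall 60% and 20% figures, and the 80% ceiling is the level mentioned in this article.

```python
CEILING_PCT = 80  # stay at or below roughly 80% to leave room for growth


def combine_load(a, b):
    """Add two servers' utilization figures, resource by resource."""
    return {resource: a[resource] + b[resource] for resource in a}


def overloaded(combined, ceiling=CEILING_PCT):
    """Resources whose combined utilization would exceed the ceiling."""
    return [resource for resource, pct in combined.items() if pct > ceiling]


# Hypothetical per-resource percentages for the two servers being paired.
average_server = {"disk": 60, "cpu": 60, "nic": 60}
light_server = {"disk": 20, "cpu": 20, "nic": 20}

combined = combine_load(average_server, light_server)
print(combined)              # {'disk': 80, 'cpu': 80, 'nic': 80}
print(overloaded(combined))  # [] means nothing exceeds the ceiling
```

Adding averages this way says nothing about peaks that happen to coincide, so it is worth running the same check against the peak figures before committing to a pairing.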
And by now you're saying (possibly even in a mocking manner)... "Not too bad for 18 months' worth of work, Dave". All I can say is that you have to have facts to back up your recommendation. And those facts have to start somewhere. Within the first 6 months of monitoring the consolidated environment I knew that I could move in this direction. Of course, other projects came up and because this consolidation effort wasn't a priority it didn't have to be done right away. Over time statistics kept rolling in and eventually this fit the right priority.
Now that all of the work is done, I've got six servers running twelve Post Offices with absolutely no noticeable change to the end users.
There were some key factors involved throughout this process that you should take note of and follow if you are planning this type of effort. First, gather the statistics to understand what can and cannot be accomplished in your specific environment. Second, review the options and alternatives for consolidation. The first design idea may not always be the best fit. And finally, make sure that you plan for growth. Don't "max out" a server's resources. As you noticed, I only went to about 80% consumption of resources. This avoids the all-important end-user impact.
How Dave Does It Book:
If you like what you've read and want to read more of "How Dave Does It" you may want to consider picking up a copy of my book:
Written for both the beginner and the intermediate GroupWise administrator. This book is packed with many short chapters designed to allow you to read through the entire chapter in one sitting.
- The "Basic GroupWise Administration" section instills the fundamentals of GroupWise skills.
- Basic skills are built upon in the second section; "Enhanced GroupWise Administration".
- The third section of the book is a compilation of "How-Tos", providing practical step-by-step processes and procedures that you will be able to utilize within your system.
- Also contained is a compilation of all of the Cool Solutions Articles written by Dave Muldoon.
- And hidden away at the end of the book is a little bit of "geeky GroupWise fun", with a set of questions designed to test your knowledge of GroupWise.
For more articles by Dave Muldoon visit How Dave Does It