
How Dave Does It: GroupWise Consolidation - Monitoring User Moves

Novell Cool Solutions: Trench
By Dave Muldoon


Posted: 19 Feb 2003
 

This is the fourth article in this series regarding consolidating your GroupWise system. The first three articles reviewed consolidation project guidelines and potential impact to the end users. This article will give you some information on monitoring the user moves and what to watch for.

Before getting into the actual moves, you should understand that any change to your system should include three components: pre-validation, task-oriented validation and post-validation. Using these three phases in your changes should give you a greater chance of success.

Pre-Move Validation

There are a few things that I like to review on both the GroupWise system and the network that are key to the success and low impact of the consolidation process (user moves). The most obvious thing to verify is that all of the GroupWise agents are up and communicating. It seems obvious, but ALL agents in your system should be up; let me explain why. As you're probably aware, GroupWise is a "store-and-forward" system. One of the benefits of this type of system is that if an agent is down, items are held in queue until the destination is functional again. During your move process you want as little as possible queued up, because the moves themselves will most likely create a large amount of queued data (and you don't want to run out of disk space during this process).

Another item to verify is that the anti-virus software is disabled at the server level as it pertains to the GroupWise message store. By not scanning the GroupWise message store, the server's performance should increase and provide more resources for the consolidation process. Most likely, the data that is being moved has already been scanned at entry points or at the desktop level, so this is not really a high-level risk. That is before considering that most anti-virus applications implemented on servers cannot successfully scan GroupWise data (although there are some good products out there that can).

As mentioned in previous articles, verify that backup or restore jobs are not running on any of the servers involved (or scheduled to run during the moves). The idea is to make sure that the servers have as much I/O (disk access) and TCP communications as possible available to dedicate to the move process. This makes the user moves go faster and more smoothly than if they were competing with other applications on the server for resources such as processor, disk and LAN/WAN.

It's also recommended that you make a quick, but thorough, check that all circuits in the path of this process are up and functioning at full capacity. This matters most when your system has redundant links and the primary links are faster than the secondary links. A friend of mine worked for a company that had configured "lesser" backup/redundant links. Unaware of an outage, the company ran on the "lesser" links for an extended period of time, affecting a sizable portion of their WAN. Something like this can seriously undermine all of the planning that goes into the consolidation effort and is easy enough to validate before setting things into motion on the GroupWise side.

One of the most important things to do is to check the user accounts before starting the move process. I recommend using the GroupWise proxy feature to proxy to about 10% of the total number of users being moved, and taking a screen shot of each user's calendar in month view via proxy. I then use these screen shots after the moves have completed to perform a "before and after" comparison, making sure things look the same.
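As a rough illustration of picking that 10% sample, here is a minimal Python sketch. The function name, the `percent` parameter and the user IDs are all made up for the example; nothing here talks to GroupWise, it only chooses which accounts to proxy to and screen-shot.

```python
import random


def pick_validation_sample(user_ids, percent=10, seed=None):
    """Return a sorted random sample of about `percent`% of the users.

    Always returns at least one user when the list is not empty.
    """
    users = list(user_ids)
    if not users:
        return []
    # Integer ceiling of len(users) * percent / 100.
    count = max(1, (len(users) * percent + 99) // 100)
    rng = random.Random(seed)  # a seed makes the pick repeatable
    return sorted(rng.sample(users, count))


# Hypothetical user IDs, for illustration only.
users_to_move = [f"user{n:03d}" for n in range(1, 121)]
sample = pick_validation_sample(users_to_move, seed=1)
print(len(sample), "accounts to proxy before and after the move")
```

Keeping the resulting list with the screen shots means the same accounts can be checked again during post-validation.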

Screen Captures
To take a screen shot of the calendar, proxy to the user and press ALT + Print Screen on the keyboard, which copies the active window to the workstation clipboard. Then, using a word processing application such as WordPad or MS Word, paste in the screen shot (menu options: Edit - Paste).

Finally, something that I found after my first few consolidations was that I could decrease the overall time of the move process by increasing the threads used by the POA. As the last of my pre-validation steps, I would increase the TCP handler threads that the POA has allocated (in NDS-GroupWise View), and I would increase the message handler threads as well. I can do this without much worry on the server side, as all of the other pre-validation steps free up resources that are ordinarily used by other processes. I was able to go to the maximum values of both TCP and message handler threads after evaluating each specific server involved.

So how many users can be moved at once?

Here's my disclaimer: Everything you read in these articles is related to my network. Novell only suggests moving 5 users at any given time. This is based on a worst-case scenario, where things get stuck in the queues or the move isn't done in a timely fashion. I've been there a time or two and have learned a few extra tricks. I've created my own little process for moving those "stuck moves" in a very small amount of time but, most importantly, I came up with a monitoring process to avoid being in this situation again. This is part of the reason that I went outside the supported Novell numbers. Also, sticking to 5 users at a time makes for one VERY long weekend and severely increases the overall project timeframe; just imagine how long it would take to move 500 users. Furthermore, I move the number of users I've indicated below because I've FULLY tested all aspects of this process. I highly discourage starting out at the numbers I've indicated; I've merely provided them to show exactly what the GroupWise system is capable of.

I'm not going to tell you that there is an easy way to figure out how many users you can move in a specific period of time. Every network is different, and each is capable of producing different results with this process. I will tell you what I was able to do on my system, and I highly recommend that you start out small and see how things go before you jump to higher numbers (if your system is big enough, you've got a lot of work ahead of you; starting out smaller and increasing a little more each time is much safer).

If you recall, in my first article I stated that my system had over 330 Post Offices when I started my consolidation effort, many of which were connected over 56KB circuits. My system does not utilize Document Management so, if the system you are consolidating utilizes this feature, you really need to do additional analysis. As for my system, what I found was that I could consolidate Post Offices that supported what I consider "light" GroupWise users, ranging from 5 to 12 users per Post Office. The destination location was one specific Post Office under a new Domain at the centralized site. (In one of my next few articles I will cover the server side of things: what I feel a GroupWise-only server should have in it and how it should be configured to really handle a significant number of users.) In my system I was able to safely move anywhere from 100 - 125 users over a weekend (10pm Friday night to noon on Sunday). Now think about this thoroughly: I was using multiple "lesser" servers to do all of my data priming and then relying on the WAN to get the data over to a centralized server that I specifically designed and dedicated for GroupWise. This process would move all users off of something like 9 - 12 Post Offices (and they were light users: mail only, 90 days expire/reduce, and not a lot of big attachments).

I also undertook the same process in larger locations where multiple Post Offices were located that had anywhere from 75 to 500 users. Those types of locations on my network are connected over T1 or better to the centralized location and the users in these Post Offices tend to be more advanced, generating and retaining more items along with the fact that larger attachments are much more common. For those locations, it may be better suited to put a centralized server or two on site and consolidate to it. As for my system, the exact same process was followed at these locations and even though the network speeds were significantly increased (inter-building communication runs at 100Mbps; server to server) I had to stick to the same numbers and timeframes (100 - 125 users per weekend). The reason the number of users wasn't able to increase should be obvious - more advanced users, more mail, larger attachments equates to more data that needs to be transferred for each user to the consolidation point.
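To put those numbers in perspective, the arithmetic can be sketched in a few lines of Python. The function name is mine, and the figures plugged in are simply the ones quoted in this article (5 users at a time, 100 - 125 users per weekend, a 500-user location); your own rates will differ.

```python
def batches_needed(total_users, users_per_batch):
    """Number of batches needed to move everyone, rounding up."""
    if users_per_batch <= 0:
        raise ValueError("users_per_batch must be positive")
    # Integer ceiling division.
    return -(-total_users // users_per_batch)


total = 500  # example location size taken from the text

print("Batches at 5 users per move:", batches_needed(total, 5))
for per_weekend in (100, 125):
    print(f"Weekends at {per_weekend} users per weekend:",
          batches_needed(total, per_weekend))
```

This only counts batches; it says nothing about how long each batch takes, which depends on the links, the servers and how much mail each user keeps.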

Monitoring those Moves (Task-Oriented Validation)

Provided you've followed the information in the other articles of this series (and you're starting out small), you should be relatively comfortable that nothing is going to go seriously wrong when the moves are initiated. Once you do get things moving, you're going to want to keep track of the servers involved, and I recommend that you keep a log of what was seen. This log will provide you with some serious post-move information that you can use to further tweak and tune your process and adjust the number of users that your system can handle moving in your "change window."

I prefer to have the system monitored every 30 minutes, or at minimum once per hour. Here's a list of the specific items that I feel should be kept in the log. I actually use a spreadsheet that gets filled out every 30 minutes by various support and monitoring areas. Those support areas are instructed to escalate any potential issues or questions to me, just to be on the "safe side" and avoid the "Monday Blues." Most of these items are self-explanatory.

POA
Tech Initials: Just in case there are questions, you have a contact.
Time/Day (when this process is looked at):
Server Name:
POA File Queues:
Message and TCP handler thread (30:50)
User ID on POA screen
Errors on POA screen
Message Transfer Status

MTA
Number of Closed Domains
Number of Closed Post Offices
Errors on MTA screen

SERVER
Server name
SYS volume free space
GroupWise Volume free space
Processor utilization
Active abend check
Review SYS:\etc\console.log for errors/alerts
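If you would rather keep this log as a file than as a hand-filled spreadsheet, the checklist above maps onto a simple record. The following Python sketch is only one possible layout: the field names, the example values and the escalation rule are my own assumptions, not anything produced by GroupWise or NetWare.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class MoveLogEntry:
    """One monitoring check of one server (field names are illustrative)."""
    tech_initials: str
    checked_at: str
    server_name: str
    poa_file_queues: int
    poa_user_id: str          # user ID currently shown on the POA screen
    poa_errors: str
    closed_domains: int
    closed_post_offices: int
    mta_errors: str
    sys_free_mb: int
    gw_volume_free_mb: int
    cpu_percent: int
    abend: bool


def needs_escalation(entry):
    """Assumed rule: escalate on any abend, closed link or logged error."""
    return bool(entry.abend or entry.closed_domains
                or entry.closed_post_offices
                or entry.poa_errors or entry.mta_errors)


def write_log(entries, stream):
    """Write the entries as CSV with a header row."""
    names = [f.name for f in fields(MoveLogEntry)]
    writer = csv.DictWriter(stream, fieldnames=names)
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))


# Hypothetical values for one 30-minute check.
entry = MoveLogEntry("DM", "Fri 22:30", "GWSRV1", 14, "jsmith", "",
                     0, 1, "", 900, 12000, 65, False)
buffer = io.StringIO()
write_log([entry], buffer)
print(buffer.getvalue())
print("Escalate:", needs_escalation(entry))
```

A CSV file opens directly in a spreadsheet, so the support areas can keep working the way they already do.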

What you should expect to see

During the move process of the users, messages similar to the following should appear on the POAs involved:
     [move.bag] processing E34567.A023
NOTE: The messages displayed on the POAs during a user move normally begin with either [.move] or [move.]. The "." (period or dot, leading or trailing) in these messages is a way of representing the source and target POAs. A period or dot (".") on the front of the message indicates a message on a source POA, and a period or dot (".") on the end indicates a message on a target POA.
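If you are reading POA logs rather than watching the screen, the dot convention can be checked with a few lines of Python. This is a sketch of one reading of the note above: a bracketed tag starting with ".move" is treated as a source POA message and one starting with "move." as a target POA message. The exact layout of a real log line is an assumption here; only the sample message shown above comes from this article.

```python
import re

# Matches the first bracketed tag on a line, for example "[move.bag]".
TAG_PATTERN = re.compile(r"\[([^\]]+)\]")


def classify_move_line(line):
    """Return "source", "target", or None for a POA screen or log line."""
    match = TAG_PATTERN.search(line)
    if match is None:
        return None
    tag = match.group(1)
    if tag.startswith(".move"):
        return "source"   # leading dot: message on the source POA
    if tag.startswith("move."):
        return "target"   # trailing dot after "move": target POA
    return None


print(classify_move_line("[move.bag] processing E34567.A023"))
```

Lines without a move tag return None, so the function can be run over a whole log to pull out only the move traffic.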

You may also see some "retrying" messages on your POA. Because the process I've outlined verifies communications and agents, and because of the monitoring covered here, this is most likely not a serious problem. Sometimes a "retrying" message indicates a potentially damaged item that will be lost (it could be an orphaned attachment or a single message). Messages that apply to damaged items occur at the end of the move process. If you see them in the beginning phases of the move process, keep a close eye on them.

You should also see messages relating to specific user IDs, indicating which user is being moved at a given time. I think this is important to keep track of, mainly because the moves will occur in alphabetical order. I say this because I recommend selecting multiple objects in GroupWise view and moving them all at one time. By keeping track of the IDs seen on the POA screen, you get a gauge of how far along the moves are at a given time.
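Because the order is alphabetical, the user ID currently on the POA screen can be turned into a rough progress figure. The Python sketch below assumes the moves follow a plain case-insensitive sort of the user IDs you selected, which may not match your system exactly; the IDs are invented for the example.

```python
from bisect import bisect_right


def move_progress(selected_ids, current_id):
    """Return (reached, total): how many selected IDs sort at or
    before the ID currently shown on the POA screen."""
    ordered = sorted(uid.lower() for uid in selected_ids)
    reached = bisect_right(ordered, current_id.lower())
    return reached, len(ordered)


# Hypothetical batch of users selected for one move.
batch = ["jsmith", "adams", "mlee", "bchan"]
reached, total = move_progress(batch, "jsmith")
print(f"Reached {reached} of {total} users")
```

Treat the result as a gauge rather than a guarantee; a user can still be in progress, or stuck, while the ID is on screen.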

If you don't see much activity on the POA screens involved, this indicates a potential problem. During the move process, the screens are VERY active in both normal and verbose mode. Screens that appear slow or not all that active can indicate a problem with this process, and all of the pre-validation steps should be re-verified.

Post Validation

How can you tell if the moves have completed? That's always been a good question that has a difficult answer, especially if the location that the users were moved off of still has active users on it. The best way to tell is if the move or bag messages have stopped appearing on the POAs. Another way to tell is to check the queues on the servers involved, making sure that nothing is in the hold or deferred queues. NOTE: GroupWise 6.5 - "Hawthorn" has some new features that help distinguish the point in time of a move, along with the added benefits of the "Live Move" process.

After seeing the screens slow to almost no activity (or to normal user activity, where active users remain on the Post Office), you can begin post-validation. At this point, by using the screen shots taken in the pre-validation steps, administrators can compare what was in the account (via proxy) before the moves began to what is in the account now that the process appears to have completed. Using the GroupWise proxy feature, proxy to the same users that were reviewed in the pre-validation steps and switch to the calendar (month view). The account being used to proxy may need to be removed and re-added to the proxy list, as the proxy list may be pointing to the user object on the previous Post Office. Once proxied to the user who was moved, compare the calendar against the screen shot taken before the move to validate that the two match. NOTE: If the two do not match up, re-check the servers, POAs and network to verify that another problem has not stopped the move process. It is also worth noting that during the move process it is very common for the proxy settings to be reset for user accounts. If a "D124 - access is denied" error occurs or the calendar appears blank during post-validation, this does not necessarily indicate an unsuccessful move; continue proxying to all accounts for further validation.

Summary

In recap, the information in this article, along with the previous articles in this series and the next two articles, will provide a detailed overview of what I've seen and experienced while performing consolidations for many thousands of users. This is information and experience that can apply to all administrators looking to consolidate a GroupWise system. By using this information and the guidelines and recommendations that I've written, anyone should be able to go through this type of project with confidence in themselves and in the process that they have used.

My next article will cover an additional step that I used for consolidating large numbers of active/advanced users. This process covers a "staging" server that is used to move Post Offices closer to the consolidation point before moving users. This allows administrators to remove the slowest link from the move process - the WAN.

Watch for more articles by Dave Muldoon, every couple of weeks, under the resources link on GroupWise Cool Solutions - http://www.novell.com/coolsolutions/gwmag/features/trenches/tr_how_dave_does_it_gw.html.


Novell Cool Solutions (corporate web communities) are produced by WebWise Solutions. www.webwiseone.com

© 2014 Novell