“You can’t always get what you want . . .” Ain’t that the truth! During my recent BrainShare session – “How to Build a Highly Available GroupWise 7 System” – I presented different ways organizations can increase their GroupWise up-time, with or without additional cost. I also provided definitions to help direct the learning. Let me share a few of those definitions, along with some ideas for increasing GroupWise up-time.
Business continuance simply means keeping a business running, no matter what. We all know email is THE number one application in an organization, so keeping it running through an outage, disaster or any other disruption is central to business continuance.
Highly Available and Fault-Tolerant Systems
This is about keeping your options open and providing different paths. “Highly available” means providing multiple, redundant ways to deliver email. Take WebAccess as an example. If you have two WebAccess agents, and one of them fails, users can still get to their email. Therefore, you would say WebAccess is highly available. “Fault tolerant” means the system is “relaxed, patient and understanding” in the event of a failure: it keeps working when one component stops. Again, using the WebAccess example from above, building a highly available, fault-tolerant GroupWise system is easy – it’s all in the design. Having two or more WebAccess agents means users can still get to GroupWise WebAccess if one agent fails.
The same is true for GWIA. You can design your GWIA implementation to be highly available and fault-tolerant, with little or no cost. Add a second GWIA and make one inbound and one outbound – or have each handle both inbound and outbound mail. That way, Internet email keeps flowing even if one GWIA fails. As a consultant, at a minimum I recommend that my customers consider some form of redesign to strengthen their GroupWise system.
Here are some design ideas for highly available and fault-tolerant systems:
- More than one domain – if you lose your primary domain, you need to be able to promote a secondary domain to primary and then rebuild the old primary.
- More than one post office – if you have 1,000 users all in one post office, and the post office fails, all users are affected.
- More than one GWIA – as stated previously, if Internet email is crucial to business, then double up.
- More than one WebAccess – same as previously stated.
- Gateways in their own domains – if you design one gateway in its own domain, then the gateway becomes disposable. If it’s corrupt, just delete it and its domain (if needed) and start over.
- More than one server – of course this costs money, but consider what happens if all GroupWise components are running on one server – and the server fails.
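To put rough numbers on the “more than one” ideas above, here is a minimal sketch. It assumes that agents fail independently of each other, that users are split evenly across post offices, and that a single agent is up 99% of the time. That 99% figure and the function names are hypothetical and chosen only for illustration; they are not measured GroupWise values.

```python
def combined_availability(single: float, copies: int) -> float:
    """Chance that at least one of `copies` identical, independent agents is up."""
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # The service is only unavailable when every copy is down at the same time.
    return 1.0 - (1.0 - single) ** copies


def users_hit_by_one_failure(total_users: int, post_offices: int) -> int:
    """Users affected when one post office fails, assuming an even split."""
    if post_offices < 1:
        raise ValueError("post_offices must be at least 1")
    # Ceiling division: the fullest post office carries any remainder.
    return -(-total_users // post_offices)


ASSUMED_AGENT_AVAILABILITY = 0.99  # hypothetical figure, not a measured value

for copies in (1, 2, 3):
    pct = combined_availability(ASSUMED_AGENT_AVAILABILITY, copies)
    print(f"{copies} WebAccess agent(s): {pct:.4%} available")

for post_offices in (1, 2, 4):
    hit = users_hit_by_one_failure(1000, post_offices)
    print(f"{post_offices} post office(s): {hit} of 1000 users affected")
```

The independence assumption is the weak point: two agents on the same server fail together, which is why the “more than one server” idea matters as much as the others.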
High Availability and Fault Resilient Systems
This is about making GroupWise run all day, every day of every year. This costs money! This is where clustering and Business Continuity Clusters (BCC) come in, along with further extending the design of your organization’s GroupWise system. It’s also where most organizations are heading. High availability and fault resilience are about making GroupWise elastic, flexible, tough, and resistant to failure and disasters, as well as assisting with business continuance.
Consider a GroupWise system running in one data center on a cluster. If there is a disaster, and the data center is lost, so too is the GroupWise system. It will have to be restored, which takes time and can cost an organization thousands if not millions of dollars in lost revenue and productivity.
Keep in mind that high availability and fault resilience include all the design ideas from the highly available and fault-tolerant path, and add others to them.
Here are some design ideas for high availability and fault resilience:
- Cluster GroupWise – if you lose a server, the GroupWise component will simply restart on a different server with little or no effect on end users.
- One GroupWise component per cluster resource – this avoids the “eggs in one basket” scenario. If you have a post office, a gateway and a domain on one cluster resource, and the cluster resource is destroyed, you affect more than just the end users in the post office. You also affect all users using the gateway and the domain. The one caveat is that gateways should be kept together with their domain on the same cluster resource.
- Increase cluster nodes – the more cluster nodes you have, the more hardware failures you can survive and still have GroupWise running.
- Multiple data centers – if all of GroupWise is in one data center, and the data center is lost, GroupWise is lost for the time being. If you build multiple clusters in multiple data centers, then if any one data center is lost, only part of GroupWise will be lost.
- Business Continuity Clusters – if you have multiple data centers and multiple clusters, why not make them each provide failover for the other? That way, if one data center is lost, GroupWise will simply fail over to the second data center and keep running.
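The failover idea behind the list above can be sketched as a simple placement rule: each cluster resource has an ordered list of preferred nodes and runs on the first one that is still healthy. This is only an illustration of the concept, not how the clustering software itself is implemented or configured. The node names (`dc1-n1`, `dc2-n1`), the resource names and both functions are hypothetical.

```python
from typing import Dict, List, Optional, Set


def pick_node(preferred_nodes: List[str], failed_nodes: Set[str]) -> Optional[str]:
    """Return the first preferred node that has not failed, or None if all have."""
    for node in preferred_nodes:
        if node not in failed_nodes:
            return node
    return None


def place_all(resources: Dict[str, List[str]],
              failed_nodes: Set[str]) -> Dict[str, Optional[str]]:
    """Decide where every cluster resource runs given the current failures."""
    return {name: pick_node(nodes, failed_nodes) for name, nodes in resources.items()}


# Hypothetical layout: one GroupWise component per resource, except that the
# gateway stays with its domain. Each list ends with a node in a second data center.
resources = {
    "PO1": ["dc1-n1", "dc1-n2", "dc2-n1"],
    "PO2": ["dc1-n2", "dc1-n1", "dc2-n2"],
    "DOM1+GWIA": ["dc1-n1", "dc1-n2", "dc2-n1"],
}

scenarios = {
    "no failures": set(),
    "one server lost": {"dc1-n1"},
    "data center 1 lost": {"dc1-n1", "dc1-n2"},
}

for label, failed in scenarios.items():
    print(label, "->", place_all(resources, failed))
```

With only the two `dc1` nodes in each list, the last scenario would return `None` for every resource, which is the single data center case described earlier.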
High availability and fault resilience are expensive. But consider this: if your organization’s email is down for one week, how much is lost in dollars? One of the attendees at my BrainShare session stated that his CEO claimed the loss of one email is worth one million dollars. That seems a bit extreme, but it really isn’t. If the organization relies heavily on email to deliver contracts, bids or other such items, one lost email could be worth $1 million if it’s not delivered, or not delivered in time, because of an outage.
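One way to start answering that question is a back-of-the-envelope estimate like the sketch below. The formula and every input figure are assumptions made up for illustration; substitute your own organization’s numbers, and note that it ignores one-off losses such as a missed bid.

```python
def downtime_cost(hours_down: float,
                  revenue_per_hour: float,
                  employees: int,
                  loaded_hourly_cost: float,
                  productivity_loss: float) -> float:
    """Rough cost of an email outage: lost revenue plus lost staff productivity.

    productivity_loss is the fraction (0 to 1) of each employee's time that is
    wasted while email is unavailable.
    """
    if not 0.0 <= productivity_loss <= 1.0:
        raise ValueError("productivity_loss must be between 0 and 1")
    lost_revenue = hours_down * revenue_per_hour
    lost_productivity = hours_down * employees * loaded_hourly_cost * productivity_loss
    return lost_revenue + lost_productivity


# Hypothetical inputs: a one-week outage counted as five 8-hour business days.
estimate = downtime_cost(
    hours_down=5 * 8,
    revenue_per_hour=1000.0,
    employees=50,
    loaded_hourly_cost=40.0,
    productivity_loss=0.25,
)
print(f"Estimated cost of the outage: ${estimate:,.0f}")
```

Comparing an estimate like this with the price of extra servers, a cluster or a second data center is what turns “expensive” into a business decision.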
I have been involved in clustering GroupWise for many customers over the last several years and even wrote a book on the topic to assist organizations. “Success with Clustering GroupWise 7” can be found at www.taykratzer.com. As a consultant, I have also been very involved in assisting customers in building clusters and Business Continuity Clusters for disaster recovery and business continuance. One thing is certain: GroupWise is the business delivery system for organizations and is gaining respect and longevity these days. Although you may not be able to afford what you want for your GroupWise system, “if you try, sometimes you just might find that you get what you need.”