Will it be Lobster or Linguini?
Novell Cool Solutions: Feature
By Todd Grant
Posted: 25 Apr 2001
(Making the choice between SFT III and NetWare Cluster Services)
Last night I took my wife to a nice restaurant that has a reputation for the best seafood in town. The atmosphere was perfect -- candlelight, soft music, good company. When it came time to order I found myself in a dilemma -- shrimp and lobster or seafood linguini. Both are favorites of mine and both are equally delectable. As I oscillated back and forth in my decision, the thought occurred to me: choosing between SFT III and NetWare Cluster Services is similar to choosing an entree at a posh seafood restaurant. (Is it normal to think about NetWare during a romantic dinner?) Perhaps you've been wondering about the same thing. Both tools do an excellent job of providing data protection, but which is the best for your company?
Comparing SFT III to NetWare Cluster Services
Let's say you're a System Administrator for a medium-sized company and you've been asked by your manager to come up with a way of providing data protection and maximizing server uptime for your NetWare servers. You already know NetWare is the most reliable network operating system on the market today, but your manager wants to ensure your NetWare servers are up and available more than 99 percent of the time.
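To put that 99 percent target in perspective, the arithmetic below converts an availability percentage into the downtime it permits per year. This is only a back-of-the-envelope sketch: the function name is made up for illustration, and it assumes a 365-day year of round-the-clock service.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year, service expected 24x7


def allowed_downtime_hours(availability_percent: float) -> float:
    """Return the hours of downtime per year permitted at a given availability."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return HOURS_PER_YEAR * (100 - availability_percent) / 100


# Compare a few availability targets (the 99.9 and 99.99 figures are
# illustrative; only the 99 percent target comes from the scenario above).
for target in (99.0, 99.9, 99.99):
    hours = allowed_downtime_hours(target)
    print(f"{target}% uptime allows {hours:.2f} hours of downtime per year")
```

At 99 percent, a server can still be down for about 87.6 hours a year, which is more than three and a half days.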
SFT III and NetWare Cluster Services (NCS) are two Novell products that provide additional levels of data protection and availability for NetWare servers. Each has its strengths and some weaknesses. Let's examine each product to determine which is best for your company.
SFT III
SFT III provides protection against hardware failure by mirroring two NetWare servers. One server acts as the primary server and the other as the secondary server, sometimes called a master/slave type of server configuration. The secondary server is mirrored to the primary server and both memory and disk data are identical on both servers.
Both servers run the same applications, and any transactions performed on the primary server are simultaneously performed on the secondary server. If the primary server experiences a hardware failure, the secondary server automatically takes over. NetWare clients connected to the server experience no loss of service or data.
SFT III servers go through an initial mirroring or synchronization process, which duplicates the memory and disk contents of the primary server on the secondary server. Whenever one mirrored server fails, or is brought down for maintenance, the mirroring process runs again when it is brought back up.
SFT III does have some requirements. It requires NetWare 3 or 4. It does not run on NetWare 5 and above. This means that any applications that require NetWare 5 or above won't work in an SFT III environment.
It also requires two identical servers capable of running NetWare 4. When I say identical, I mean that both servers should have the same amounts of memory and disk space on the NetWare partitions. Both servers should also have the same CPU type and speed.
In addition, SFT III requires Mirrored Server Link (MSL) cards for each server and cabling to connect the servers together. The MSL cards and cables provide the high-speed communication necessary for the servers to mirror and remain mirrored. Both fiber-optic and coax versions of MSL cards can be used with SFT III.
SFT III is easy to manage from the server console, using console commands created specifically for SFT III.
Without going into too much technical detail, let's look at what SFT III does when the primary server fails.
- The secondary server detects that the primary server has failed.
- The secondary server notifies each workstation that it (the secondary server) is now the primary server.
- Network requests and traffic are rerouted to the new primary (former secondary) server.
- The problems are resolved and the failed server is restarted.
- SFT III recognizes that the two servers are out of synchronization.
- The new primary server sends the changes over the mirrored server link to update the repaired server and to resynchronize the two servers.
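The sequence above can be sketched as a toy model. This is not how SFT III is implemented; the class and method names are invented for illustration, and the "servers" are just in-memory dictionaries. It only shows the role swap and the resynchronization of missed changes.

```python
from dataclasses import dataclass, field


@dataclass
class MirroredServer:
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)


class MirroredPair:
    """Toy model of a primary/secondary mirrored pair (illustrative only)."""

    def __init__(self, primary: MirroredServer, secondary: MirroredServer):
        self.primary = primary
        self.secondary = secondary
        self.pending = {}  # changes made while the partner server is down

    def write(self, key, value):
        self.primary.data[key] = value
        if self.secondary.up:
            self.secondary.data[key] = value  # mirrored as the write happens
        else:
            self.pending[key] = value  # remembered for the later resync

    def fail_primary(self):
        # The secondary detects the failure and takes over as the new primary.
        self.primary.up = False
        self.primary, self.secondary = self.secondary, self.primary

    def repair_secondary(self):
        # The repaired server restarts and receives the changes it missed.
        self.secondary.up = True
        self.secondary.data.update(self.pending)
        self.pending.clear()

    def in_sync(self) -> bool:
        return self.primary.data == self.secondary.data


pair = MirroredPair(MirroredServer("A"), MirroredServer("B"))
pair.write("report.doc", "v1")
pair.fail_primary()              # B is now the primary
pair.write("report.doc", "v2")   # A is down, so the change is queued
print(pair.primary.name, pair.in_sync())
pair.repair_secondary()          # A comes back and is resynchronized
print(pair.primary.name, pair.in_sync())
```

Note that the repaired server comes back as the secondary; the server that took over keeps the primary role.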
NetWare Cluster Services
NCS provides server availability by allowing you to configure up to 32 NetWare servers into a high-availability cluster. An advantage of NCS is that it allows you to protect against both hardware and software failures.
When a failure occurs, NCS restarts applications from failed nodes on other designated servers in the cluster. This application restart, or failover as it is commonly called, happens so quickly that users generally do not notice that a server failure has occurred.
The application data is generally stored on a shared disk system, commonly called a Storage Area Network (SAN). The SAN is connected via Fibre Channel cables, cards, and a switch to all the servers in the cluster. That way, any server in the cluster has access to application data and can be pre-configured to access that data should a software or hardware failure occur. A slower, less expensive two-node shared-SCSI cluster can be configured instead, but it does not provide the speed or scalability of a Fibre Channel cluster.
Applications can be individually managed and configured to automatically restart or failback to the node they were previously running on once that node is returned to service. If you desire, you can also manually migrate applications to different nodes in the cluster without waiting for a failure to occur.
Let's briefly look at what NCS does when a cluster server fails (although NCS also protects against software failure, in this example we will only consider a hardware failure).
- Because each cluster server is in constant communication with the other cluster servers, a cluster server failure is detected by the other cluster servers.
- Any applications running on the failed server are restarted on other servers in the cluster. (You can configure individual applications to move to different cluster servers in the event of a failure.)
- Any IP addresses assigned to the applications running on the failed server are transferred with the application to other servers in the cluster.
- Volumes formerly mounted on the failed server are remounted on other cluster servers for the applications that require them.
- The failed server is repaired and brought back online. NCS recognizes the failed server is back in the cluster.
- If you configured NCS to failback specific applications, those applications will restart back on the server that failed.
- Any applications that weren't configured to failback will continue running on their current server(s). You can also manually move or migrate applications back to their original server if desired.
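The failover and failback behavior described in the list above can be sketched as a small simulation. Everything here is hypothetical: the class, the node names, and the IP addresses are made up for illustration, and this is not the NCS API. It only models which node an application ends up on.

```python
class Cluster:
    """Toy model of cluster failover and failback (illustrative only)."""

    def __init__(self, nodes):
        self.up = {node: True for node in nodes}
        self.apps = {}  # application name -> settings and current node

    def add_app(self, name, ip, preferred, failover_to, failback=False):
        self.apps[name] = {
            "ip": ip,                    # the address travels with the app
            "preferred": preferred,      # node the app normally runs on
            "failover_to": failover_to,  # designated nodes, in order
            "failback": failback,
            "node": preferred,
        }

    def fail_node(self, node):
        self.up[node] = False
        for app in self.apps.values():
            if app["node"] == node:
                # Restart on the first designated node that is still up.
                # None means no designated node survived.
                app["node"] = next(
                    (n for n in app["failover_to"] if self.up[n]), None
                )

    def restore_node(self, node):
        self.up[node] = True
        for app in self.apps.values():
            # Only applications configured for failback return home.
            if app["failback"] and app["preferred"] == node:
                app["node"] = node

    def where(self, name):
        return self.apps[name]["node"]


cluster = Cluster(["node1", "node2", "node3"])
cluster.add_app("mail", "10.0.0.10", "node1", ["node2", "node3"], failback=True)
cluster.add_app("web", "10.0.0.11", "node1", ["node3"], failback=False)

cluster.fail_node("node1")
print(cluster.where("mail"), cluster.where("web"))

cluster.restore_node("node1")
print(cluster.where("mail"), cluster.where("web"))
```

In this run, "mail" returns to node1 once it is repaired because failback is enabled, while "web" stays on node3 until someone migrates it manually.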
NCS works with NetWare 5 and above, and relies on NDS to provide a single point of administration. Cluster-specific NDS objects are created and managed with ConsoleOne. Because of the speed required to mount and remount NetWare volumes, NCS requires Novell Storage Services (NSS).
Typical cluster implementations include several NetWare servers and a single shared storage system with mirrored drives or a RAID.
SFT III versus NetWare Cluster Services--Advantages and Disadvantages
Now, let's look at the advantages and disadvantages of SFT III as compared to NCS.
Because NCS allows up to 32 nodes in the cluster, it offers an additional level of availability over SFT III. In the unlikely event that both mirrored servers fail in an SFT III environment, users would lose access to the network and to their data. With NCS, users can still maintain access to the network and their data because up to 32 nodes can be configured to take over from failed nodes.
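A rough way to see why more nodes help is to estimate the chance that every node is down at once. The sketch below assumes each server fails independently with the same probability, which is a simplification: it ignores shared components such as the storage system, and the 1 percent figure is a made-up example, not a measured value for either product.

```python
def chance_all_fail(p_single: float, nodes: int) -> float:
    """Probability that all nodes are down at once, assuming independent failures."""
    if not 0 <= p_single <= 1:
        raise ValueError("p_single must be a probability between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return p_single ** nodes


p = 0.01  # hypothetical: each server is unavailable 1 percent of the time
for n in (2, 4, 32):
    print(f"{n} nodes: chance all are down = {chance_all_fail(p, n):.3g}")
```

Under these assumptions, a mirrored pair is entirely down about one time in ten thousand, and every node added to a cluster shrinks that figure by another factor of one hundred.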
SFT III must be managed, configured, and monitored from each mirrored server's console. NCS offers a single point of administration for all cluster applications and services through NDS. You can configure, manage, and monitor the entire cluster from a single client workstation.
With SFT III, all server-based applications or services are mirrored on both servers. This means that both servers are dedicated to the same applications and services, and your hardware investment is doubled for the sake of additional fault tolerance.
NCS allows different applications and services to run on different servers in the cluster. When one server fails, the applications or services on that server can be individually moved to one server or multiple servers in the cluster, even though the other servers may not be running the same applications. Since cluster servers are not mirrored, different servers can be dedicated to different applications, but can still be used as a backup for other servers' applications if necessary. With the power of NetWare 5 and NCS, you can actually consolidate applications and services onto fewer machines and still maintain the level of availability required for today's businesses.
Both SFT III and NCS can be built from off-the-shelf hardware, so both configurations consist of commodity components that are easy to obtain.
Server Memory Protection
SFT III provides protection from hardware failures, but does not provide protection from software failures. Since SFT III servers are mirrored, software problems are sometimes mirrored as well.
NCS provides protection from both hardware and software failures. Either problem will cause an application or service to failover to another server in the cluster. If a software failure occurs, the application that was running on one server can be started, or may already be running on another server in the cluster. Any data required by the application is accessible from the shared disk system.
A benefit of SFT III over NCS is that SFT III can protect server memory contents during a hardware failure, while NCS can't. Because memory contents are mirrored, if one server fails, whatever was in memory on the failed server is still accessible (and in memory) on the other mirrored server. With NCS, if a server fails, memory contents are lost, and the applications running on that server are started on another server and any application data is loaded into memory from disk.
For example, suppose you are in the middle of saving a large file to a server when the server fails. With SFT III, the file would still be safe and the save process would complete on the other mirrored server. With NCS, anything still in memory when the cluster server failed would be lost, and you could end up with a partially saved (or corrupt) file on disk. However, this may not be true in every case. Some applications provide data integrity features, which can prevent corrupt or partially saved files, but data integrity cannot prevent the data in the above-mentioned example from being lost.
So, What are You Going to Do?
The choice is really clear-cut. If you already have NetWare 3 or 4 installed and don't plan to upgrade to NetWare 5 or 6, you won't be able to run NCS. Therefore, SFT III is your best option for fault tolerance.
If you are already running NetWare 5 or 5.1, SFT III is not an option for you. Therefore, NCS is the best product to provide you with the high availability you need.
Since SFT III requires the older NetWare 3 or 4 operating system to run, it is obviously dependent on IPX, and won't work in a TCP/IP-only environment. NCS, on the other hand, was built to work specifically with TCP/IP. In fact, NCS requires TCP/IP and won't work in an IPX-only environment. This doesn't mean you can't have IPX configured in addition to TCP/IP on cluster servers; it just means any application or service you want to configure to work with NCS will require TCP/IP. With NCS, applications and services are assigned IP addresses, and the IP addresses move with the application or service to different servers in the cluster during a failover.
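The rules of thumb from the last few paragraphs can be collected into a small decision helper. This is only a sketch of the reasoning in this article: the function and parameter names are invented, and a real decision would weigh hardware, budget, and application requirements as well.

```python
def recommend(netware_major: int, tcp_ip_available: bool, ipx_available: bool) -> str:
    """Suggest a product based on NetWare version and available protocols."""
    if netware_major in (3, 4):
        # SFT III runs only on NetWare 3 or 4 and depends on IPX.
        if ipx_available:
            return "SFT III"
        return "neither (SFT III depends on IPX)"
    if netware_major >= 5:
        # NCS runs on NetWare 5 and above and requires TCP/IP.
        if tcp_ip_available:
            return "NCS"
        return "neither (NCS requires TCP/IP)"
    raise ValueError("unsupported NetWare version")


print(recommend(4, tcp_ip_available=False, ipx_available=True))
print(recommend(5, tcp_ip_available=True, ipx_available=True))
print(recommend(5, tcp_ip_available=False, ipx_available=True))
```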
In reality, NCS provides quite a few advantages over SFT III, as does NetWare 5 over NetWare 3 or 4. Upgrading to NetWare 5 or above and implementing a cluster is well worth the investment if you must provide nearly 100 percent uptime for your network clients.
Oh, and about that romantic dinner with my wife...just as I was about to place my order, my cell phone rang. It was my boss. He said the network was down and asked if I could come in and do some troubleshooting. He then asked me if I knew of any way to improve server uptime!
For more details and technical information on both SFT III and NCS, go to www.novell.com/documentation.
Novell Cool Solutions (corporate web communities) are produced by WebWise Solutions. www.webwiseone.com