Goodbye to Tape - Backup Strategies in the New Millennium
Novell Cool Solutions: Feature
By Gary Childers
Posted: 28 Jun 2005
One of the undeniable issues we all face in network and server management is protection against data loss: that is, for most of us, data backup. It is simultaneously a bane and a boon to our existence. It can be a lifesaver – when it works, and we really, really need it – like, when a critical server fails, and we need to restore it. It can also be a nightmare – when it fails, and we really, really need it. Many times, it is simply an enormous aggravation, when it isn't working properly – because we just never know when we might really need it.
For a lively discussion about The Pros and Cons of Tape Backup Solutions, check out this article.
Essentially, data backup is just another form of redundancy built into information systems to protect against failure. Most server-class computers have some redundant systems – redundant disks, redundant power supplies, redundant controllers, etc. – to protect against hardware failure. Servers can also be configured into clusters to provide server resource and application failover. Network switches, routers, and firewalls are often built into redundant configurations, to attempt to protect us from any single point of failure. One might think that all of this must have been designed by the famous Department of Redundancy Department.
Of course, all of the servers, switches, and routers really only exist because of one thing: our data. Without the data, there's not much point in having all of the rest. And data is also vulnerable to failure. All of those ones and zeros flying across the wire, cached in memory, processed in processors, and saved to spinning magnetic disks – this is the information that is crucial to our business operation, and this is the information we want to keep from losing.
There are many ways in which data can get lost, from hardware failure such as hard disk crashes, to transmission errors, to application failures, to the classic "OOPS" factor – that is, human error. The ease of point-and-click computing can also lead to an increasing ease of making mistakes – it's frighteningly easy to delete files and even entire directory trees that way, and the click-happy user may not stop to read all the "Are you sure?" messages along the way.
The basic solution to avoid data loss is simply to have the data in more than one place. Fortunately, digital data easily lends itself to being replicated. Mirrored disks within a server provide protection against the failure of one disk. Most hardware implementations of this also provide recovery (once the failed disk is replaced) with "zero" downtime on the server. Other RAID implementations can allow for failure of two or more disks, and still recover.
So far, we are protected from hardware (specifically, disk) failures. But a file deleted or corrupted on one redundant disk is also deleted on all of the disks in that disk set or array. To protect against non-hardware errors, we typically need some type of point-in-time replication of data, so that we can fall back to a copy of the data as it existed before it was deleted or corrupted.
Some file systems also provide a level of redundancy for us. For example, the NetWare file system has long provided the ability to salvage deleted files. What (NetWare) network administrator has not appreciated being the hero of the day, simply by executing the salvage utility to recover deleted files?
Tape Backup Strategies
The classic solution to point-in-time data replication is to back up the data to tape media. Although this is traditionally a relatively slow process, it has long provided a fairly reliable method of replicating our valuable data to a medium that is unaffected by real-time data losses (by virtue of being point-in-time copies) and that is also portable (allowing it to be stored off-site for disaster recovery). Virtually all large and enterprise organizations employ tape backup strategies to protect their data, and most small business networks at least try to do the same.
The many flavors of tape backup software have also evolved to make the backup process easier. We need to automatically schedule backup jobs to happen, typically during non-production hours. We need full backups at certain times, and incremental or differential backups at other times for efficiency. In the enterprise, we need to appropriately manage the various tape sets to have backup data available for weeks, months, and sometimes even years. And we need to split off archive copies of the backup data to store in a secure offsite location. Most critically, we need to be able to quickly locate and restore the data in the event of data loss, such as in a server failure.
However, tape backup strategies can also introduce new points of failure into our data systems. Tape drives and libraries are expensive, but like all hardware they are prone to failure. Dirty tape heads require cleaning; otherwise they can appear to write data that can never be read back again. Tape media wears out, and needs to be periodically replaced. The backup software itself can be rather complicated (by necessity), and is also prone to failure. The irony is that the backup utility designed to protect our server data can sometimes bring the server down, and in rare cases can actually corrupt or delete the very data it was designed to protect.
This is why we love and hate data backup at the same time. We love it when we really need it – and when it works, to rescue our lost data from oblivion. We hate it when it is problematic – whether due to hardware failure, media failure, or software failure. Yet we cannot comfortably abandon data backup, because we just never know when we might really, really need it.
There are also other drawbacks to tape storage of backup data that have recently come to the fore, receiving international attention when the backup tapes of some well-known financial institutions – carrying sensitive personal data, including Social Security numbers and credit card payment histories – were lost in transit. The "plus" of portability of tape media can easily become a huge "minus" if that data isn't carefully guarded and tracked. Most institutions don't send their backup tapes in an armored car to their offsite storage locations.
These issues are making many enterprise organizations take a hard look at alternatives to the traditional methods of data backup (namely, tape), to find more efficient, more secure solutions.
Data Replication Strategies
Some organizations have virtually abandoned tape backup strategies, in favor of other types of data replication. Some strategies involve servers in different locations working in tandem – synchronizing data in real time, so that if servers on one side go down, the computing load is automatically picked up by the remaining servers in the other location. In this case, data availability is protected against server failure, but the data is not particularly protected against loss or corruption, since any data deleted on one side is also simultaneously deleted on the other side. This strategy is effective when the data is transient, and does not need to be stored and recovered. Otherwise, it must be combined with another data backup strategy.
I spoke to an individual on the fateful day we call 9/11, whose primary data center was in one of the fallen World Trade Center towers. His business pulled Wall Street market data in real time, and provided it to many market analyst organizations. Thus when he lost his primary site in the tragedy, his business continued to operate in the face of grim disaster. Because the data was transient, he didn't require any data backup – he only needed to have data and server redundancy.
Another favored type of data replication can be provided by Storage Area Network (SAN) implementations. Some SANs allow us to replicate data between various storage arrays, either synchronously (in real time) or asynchronously (point-in-time). This also allows us to have our data in two or more places, to allow recovery from either server failure or data loss. And replicating data inside of SANs usually occurs at far greater transmission speeds than those of traditional tape backup strategies, depending upon the type of link between the storage arrays.
In general, Storage Area Networks are expensive, although not nearly so expensive as they were five or more years ago.
Some SAN implementations also provide disk storage arrays that appear as virtual tape backup devices, so that traditional backup utilities can use the speedy storage arrays instead of tape libraries to back up data in much less time. This reduces the "backup window" of typically non-production hours required to replicate data to the relatively slower tape media. With the declining costs and increasing capacities of disk storage, some organizations are saying "goodbye" to tape, and relying instead on disk storage as the primary medium for data backup.
However, most such organizations will still rely on tape backup to provide archives of data, which can be stored off-site for safekeeping.
Some server backup utilities now offer backup-to-disk capabilities as well, to reduce our reliance on tape media for data protection. If users configure backup-to-disk sets that are internal to the server, then the value of this solution is severely limited. They risk losing the backup sets along with losing the server in many situations. Some people employ removable hard disks, such as USB hard drives, with backup-to-disk utilities, and have essentially replaced the juggling of tapes with the juggling of external disks. For systems that are connected to SANs, the backup-to-disk strategy can provide speedy backups to remote disk arrays, without reliance upon tapes or removable disks.
For organizations that currently cannot afford to invest in Storage Area Networks, especially the more expensive multi-site replicated SANs, there are still other strategies to provide for data replication that reduce our reliance upon traditional tape backup.
Novell's Nterprise Branch Office (NBO) product introduces a noteworthy strategy to typical organizations with a larger central office, and multiple smaller branch office sites, usually having fewer personnel and limited (if any) IT support staff. In the NBO model, the branch office servers typically do not have tape devices, and simply replicate their data to the central office server, using a NetWare version of the RSYNC open-source utility.
The RSYNC service (or daemon) runs on the central office server, listening for (authorized) requests, and the branch office servers execute RSYNC commands to initiate data transfers to the central office server. Thus the branch office sites no longer have to rely on tape devices, tapes, and the personnel needed to regularly change and archive the tape media.
The central office, however, will typically still rely on traditional tape backup, since it holds a copy of all the data for all the branch sites. This provides the needed data redundancy – the data is now in at least three places: on the original branch office servers, then copied to the central office server, and then archived to tape. The tape backup provides the point-in-time backup, allowing us to restore previous versions of data sets. The data replication from branch office to central office typically occurs on a daily basis (usually during non-production hours), effectively replacing tape backups for the branch offices. However, some NBO implementations actually schedule replications to occur as often as every five minutes, to provide near-real-time data redundancy.
One great advantage in the RSYNC utility employed in the NBO solution is that it synchronizes only changes in data, as described below.
Synchronizing Data versus Copying Data
In deciding how to replicate data from one computer to another, there are many options, and different reasons why we might choose those particular options for particular situations. One obvious method is to simply copy the data from one place to the other. When the data set is small in size, a direct copy may be the simplest method.
When our goal is to back up the data to tape media, another obvious method is just to install the tape backup agent or client on the target server, and let the backup media server perform the file requests from the target server, and thus replicate the data to tape.
But a direct copy of data assumes that we want to transfer all of the data each time we initiate a copy command. If the data set is large, like in the multi-gigabyte range, or if our network connection speed is slow, like a T1 or slower link, then this data copy can take a long time.
Likewise, a full backup job will request every single file on the target server volume, whereas a differential or incremental backup job requests only those files that have changed since the last full backup (using each file's archive bit). Even so, the traditional backup to tape is still essentially a file copy, subsequently recorded to tape media. Even the incremental or differential backup is merely a selective file copy.
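The incremental selection described above can be sketched in a few lines of code. This is only an illustration: NetWare's archive bit is a file-system attribute that the backup software clears after each backup, whereas the toy version below approximates the same idea with file modification times, and all of the file names and the cutoff time are hypothetical.

```python
import os
import tempfile
import time

def incremental_set(root, last_backup_time):
    """Select only files changed since the last full backup --
    the same idea as checking each file's archive bit, approximated
    here with modification times instead of a file-system attribute."""
    changed = []
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > last_backup_time:
                changed.append(path)
    return changed

# Demo with two hypothetical files: one untouched since the last
# backup, and one edited afterward.
tmp = tempfile.mkdtemp()
untouched = os.path.join(tmp, "unchanged.txt")
edited = os.path.join(tmp, "edited.txt")
for p in (untouched, edited):
    with open(p, "w") as f:
        f.write("data")
cutoff = time.time()
os.utime(untouched, (cutoff - 3600, cutoff - 3600))  # predates the backup
os.utime(edited, (cutoff + 3600, cutoff + 3600))     # changed afterward
print(incremental_set(tmp, cutoff))  # only edited.txt is selected
```

Note that this is still a whole-file selection: a file that changed by one byte is copied in full, which is exactly the limitation that RSYNC's block-level approach addresses.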
Synchronization of data, upon the very first replication, is really no different from a file copy. Every single file in the target data set will be copied to the specified destination, file by file. But with RSYNC, successive replications take a very different path. RSYNC is designed to run as a client-server application, so that an active process runs at each end. When re-synchronizing data (after the initial copy), RSYNC performs rolling checksums on the data both at the client (running the RSYNC command) and at the server (running RSYNC as a daemon). If all the files are exactly the same, then no data needs to be transferred. If new files exist, or if files have been updated, then they will be copied to the target destination.
But RSYNC differs further from even a selective file copy, because RSYNC views the files at the block level. Let's say we have a 100 MB database file that gets updated by its application, but only 1 MB within that file has actually changed. With a traditional file copy, even if differential, the entire 100 MB file needs to be transferred, because the file (as a whole) has changed. But with RSYNC, since it examines the file at the block level, only the 1 MB of changed data within the file needs to be transferred, and then incorporated into the target file at the destination. Only the changed blocks are transferred.
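A much-simplified sketch of that block-level idea follows. Real RSYNC uses a rolling weak checksum combined with a strong hash, and block sizes of hundreds of bytes; this toy version just recomputes Adler-32 over tiny fixed blocks, to show that only the changed bytes need to travel across the wire. All names and the sample data are illustrative.

```python
import zlib

BLOCK = 4  # tiny block size for the demo; real rsync uses far larger blocks

def block_sums(data):
    """Checksum each fixed-size block of the old (destination) file."""
    return {zlib.adler32(data[i:i + BLOCK]): i
            for i in range(0, len(data), BLOCK)}

def delta(old, new):
    """Return instructions: ('copy', offset) reuses a block the
    destination already has; ('literal', bytes) must be transferred."""
    sums = block_sums(old)
    out, i, lit = [], 0, b""
    while i < len(new):
        chunk = new[i:i + BLOCK]
        match = sums.get(zlib.adler32(chunk))
        if match is not None and old[match:match + BLOCK] == chunk:
            if lit:
                out.append(("literal", lit))
                lit = b""
            out.append(("copy", match))
            i += BLOCK
        else:
            lit += new[i:i + 1]  # no block match; this byte must be sent
            i += 1
    if lit:
        out.append(("literal", lit))
    return out

def apply_delta(old, instructions):
    """Rebuild the new file at the destination from old blocks + literals."""
    parts = [old[arg:arg + BLOCK] if op == "copy" else arg
             for op, arg in instructions]
    return b"".join(parts)

old = b"aaaabbbbccccdddd"
new = b"aaaaXXbbbbccccdddd"   # two bytes inserted near the front
d = delta(old, new)
sent = sum(len(arg) for op, arg in d if op == "literal")
print(sent, len(new))         # → 2 18
assert apply_delta(old, d) == new
```

The point of the design is that the destination never needs to receive the new file itself, only the instruction list, so the bandwidth consumed scales with the size of the change rather than the size of the file.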
Thus, if I intend to transfer data that is entirely new at each transfer cycle, then a utility like RSYNC buys me nothing. I might as well do a regular file copy. But if I want to synchronize a data set in which only a small percentage of data has actually changed (which is the norm), then a synchronization utility like RSYNC buys me a lot of saved time and bandwidth.
In addition, the implementation of RSYNC in Nterprise Branch Office provides for encryption of the data during transit, using SSL, in case we need to transfer the data using unsecured connections – even public Internet lines.
RSYNC, as mentioned, was devised in the open-source community, and is often used on the Linux platform. It was ported to the NetWare platform specifically for the Nterprise Branch Office product. However, many people are also using RSYNC to synchronize data between NetWare servers, even outside of NBO (see: http://www.novell.com/coolsolutions/tip/658.html or http://www.novell.com/coolsolutions/trench/1865.html). This also opens doors to us for synchronizing data between Linux and NetWare platforms, and even between Windows and NetWare platforms (see: http://www.novell.com/coolsolutions/appnote/14729.html) or between Windows and Linux.
Goodbye at Last?
The primary feature that attracts network administrators to this strategy of synchronizing and replicating data with utilities such as RSYNC is that they can finally say "goodbye" to tape backups, at least at the branch office sites. For those who are employing the backup-to-disk strategies at the central office as well, they may have even said "goodbye" to tape altogether. Other methods of data access and transfer, such as iSCSI (also supported in NetWare), may also start to move in to take the space that was traditionally served by tape backup strategies.
Even with such alternative strategies employed, network administrators are often reluctant to give up tape backup strategies for long-term data archiving purposes, whether due to legal requirements of data retention, or just as a last line of defense, in case all else fails. Of course, tape device and media manufacturers also strive mightily, along with the backup utility vendors, to make their traditional data protection strategies ever faster, more reliable, and more capable. Thus it will probably be some time before we finally say "goodbye" to tape.
Novell Cool Solutions (corporate web communities) are produced by WebWise Solutions. www.webwiseone.com