Speeding up Backups on Large SANs
Novell Cool Solutions: Trench
Posted: 4 Aug 2004
We're having backup speed issues. I've tried to tune the SAN to speed up the backups, but it hasn't helped much. What are others doing to speed up backups on large SANs? I'm using Veritas' NetBackup and getting about 10MB/Min backup rates.
OPEN CALL: Hmm, there are probably a lot of tricks we don't know about. If you have any experiences to share on this, please let us know.
- Maribel Nash
- David Ruwoldt Updated
- Doug Dill
- Michael Fratini
- Jon Gerdes
- Eric Williams
- Mathias Braun
- Skip Hefel
- Thomas Salzman
- Jan Wiersma
- Mark A. Akins
- Ben Weeks
- Joe Pampel
- Geoffrey Carman
- Colin Bretagne NEW
Normally, when you have a speed issue using Backup Exec, you need to make sure that the NIC speed is set the same on your router and the server.
I'd agree with Maribel that speed/duplex mismatches are the most common cause of speeds like that. So hard-set both the switch port and the NIC to the highest speed they both support.
Next you need to determine where the problem really is, so try the following out:
1. Eliminate file serving speed:
File copy to/from the NW server
2. Eliminate the NW client from file serving speed:
FTP (run up ftpd.nlm if necessary) to/from the NW server
3. Eliminate TSAs:
Run up tsatest.nlm and try doing a NW server to NW server test.
If any of these tests show up any glaring deficiencies, then it should be fairly obvious what to look at. So if for example you get a blistering throughput on all of the above, then you know to go looking deeply into your backup program and associated hardware.
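The first test above (a plain file copy to or from the server) can be timed with a small script so the result is comparable with the rate the backup product reports. This is only a sketch: it assumes the NetWare volume is reachable as an ordinary path from the workstation, and the function names are made up for illustration.

```python
import os
import time


def timed_copy(src_path, dst_path, chunk_size=1024 * 1024):
    """Copy src_path to dst_path in chunks; return (bytes_copied, seconds)."""
    copied = 0
    start = time.perf_counter()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            copied += len(chunk)
        # Push buffered data out so the timing includes the actual write.
        dst.flush()
        os.fsync(dst.fileno())
    elapsed = time.perf_counter() - start
    return copied, elapsed


def mb_per_min(byte_count, seconds):
    """Convert a byte count and duration to MB/min (1 MB = 1,000,000 bytes)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (byte_count / 1_000_000) / (seconds / 60)
```

Copy a file of a few hundred MB in each direction and pass the two returned values to mb_per_min. If the plain copy is already near 10 MB/min, the problem is below the backup software.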
We are also experiencing very slow backup with Novell NetWare 5.1 SP5 using HP Data Protector 5.1 on a SAN.
Our SAN layout is as follows:
- HP XP 512 with 4 TB of disk
- NW 5.1 SP5 boxes with Emulex LP9002 FC HBA using 2.10c driver.
- 5 x Brocade 2800 16 port switches with firmware 2.6.1
- OS zoning, not single HBA zoning
- HP 6/60 LTO 1 tape library with FC-SCSI bridge
We are seeing about the same throughput. After talking with a lot of people, we were told that vendors do not update their software against the latest TSAs released by Novell. This means the TSA API that the vendor has used may be 3-4 years old. I do not know the truth of this, but this is what was offered to us.
I would also be interested to hear of any success stories with FC backup and Novell.
I would also be interested in SAN success stories, as our servers crash a fair bit because of the Emulex drivers.
Update from David
After a lot of work we have found the following:
NetWare 5.1 SP5 with
Qlogic QLA2340, driver QL2300.HAM 6.50s
We get about 40 GB/hour. The box is stable on the SAN as well.
NetWare 6.0 SP3 with
Qlogic QLA2340, driver QL2300.HAM 6.50s
We get about 100 GB/hour. The box is stable on the SAN as well.
We now use single HBA zoning. This has given us a lot better stability with NetWare. When using Emulex cards we still see SAN related crashes for NW 5.1 and 6.x. When using the QLogic cards we have not seen a SAN related crash for NW 5.1 or 6.x.
We have a separate zone for backup to disk.
We have 5 x Brocade 2800 16 port switches with firmware 2.6.1.
We have a StorageTek L700E Tape Library
We are still using HP Data Protector 5.x
This shows that NetWare can do it. You just need to work very hard to find a good config.
From our experience SANs for the most part still seem to be a bleeding edge technology rather than a leading edge one.
I have to say, it doesn't sound like you are doing a LAN-free backup.
With a SAN you have the option to attach the backup device directly to the SAN and thus never back up over the LAN. You should get the speed of local backups on all servers.
Xiotech offers scripting to mirror drives, then break the mirror and back up the broken mirror drive. That way the backup is LAN-free and there is no need for any open file agents either.
We had tried many backup products, but when we switched to SYNCSORT'S BACKUP EXPRESS our backup time was cut in half. Backup Express supports raw data backups, which really increases speed. A tape library that connects to the SAN via Fibre Channel also really speeds things up.
We too are using NetBackup to back up NetWare 6.0 servers connected to HP/Compaq EMA12000 and EMA16000 storage arrays. I've been averaging about 13MB/sec but have been able to get 16MB/sec depending on the type of data. The new TSAs have helped along those lines, as has optimizing the backup servers. The tape drives we use are rated at 30MB/sec. I have been able to sustain that rate backing up NetWare servers attached to the same type of cabinets used in the EMAs, but SCSI-connected instead of FC-connected. We spoke with someone from HP and they seemed to confirm what we're seeing.
It's virtually impossible to have it that slow. Basic things to check are:
- Duplex settings on all components involved
- Baseline tuning, such as ECBs, directory cache buffer settings (in legacy FS) and name cache (for NSS)
- Code revisions, especially those of TSAs and drivers
With this done it's time to track down the bottleneck. A cool way to accomplish this is tsatest.nlm. First I'd try to run it locally to see what transfer rates you can achieve with and without TSAs excluding the LAN as a factor. Most likely they'll be way better than those 10MB/min.
Next you can run it from a remote server. I'd suggest one located in the same segment as the Veritas host. If the results are still satisfying you've tracked the problem down to either the host box and its devices or the 3rd-party agents which might be involved. In this case it often makes sense to try disabling the "delayed ack" and "nagle" algorithms on the target box.
Other possible culprits are the software compression algorithms of 3rd-party agents. Sometimes they are not MPK aware, resulting in a non-contiguous data stream which forces backup devices to stop and reposition. Apart from that, they're usually doing the same thing the tape library devices do anyway, so it's often useful to provide "native" data streams and let the tape drives do the compression with their hardware logic.
Here are the files that we created that help speed up the server's backup. Use at your own discretion!
We are presently using Galaxy's Commvault 4.2. We get anywhere from 5GB/hour up to 40GB/hour. We have been told that we should get up to 150GB/hour. However, all the data has to go through a fiber-SCSI bridge that feeds an 8-drive library.
If you have any questions you may contact Skip at HefelSk@advanced-data.com
We have done it a little differently, since we have to support the Novell and Microsoft WAN settings as well as a SAN. I installed EVault onto a separate CPQ 530. To this I hooked up an ADIC Scalar 24. This way I can do a differential and still have a complete backup. I can back up all my servers in 3 hours, even over the WAN, and then do the tape at 10 AM. I am getting 1,500-1,800 MB/min on the backup. I am even looking into collocating it for complete disaster recovery.
For speed problems with BackupExec on NetWare 4/5/6 you should first check Duplex settings on your NIC and switch. Also read this Veritas Support Document : http://seer.support.veritas.com/docs/189373.htm
This not only solved problems on a remote backup server but also on a Media server. And it works for all BackupExec versions I have tested so far.
I have Novell, Microsoft and Linux servers, and I centrally back up 10 locations to an IBM FAStT700 SAN with a product called EVault. EVault uses the same delta-level sync technology as iFolder. In fact, it can even compress and encrypt the data using Blowfish encryption as it is sent over your WAN or Internet connection. This product is a tapeless centralized solution that enables me to back up 21 servers across the state of Florida in about 3 hours. For disaster recovery I then use an ADIC Scalar 24 tape library with two LTO2 tape drives to back up 1.2 terabytes of data. Since the tape library is connected via fiber to my centralized SAN, I am able to back up at about 1,200-1,500 MB/min. It's a great solution!
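As a rough cross-check of the tape figures quoted above, the backup window follows directly from data size and rate. The sketch below reads the quoted rate as MB per minute and uses decimal units (1 TB = 1,000,000 MB); both are assumptions, and the function name is made up for illustration.

```python
def backup_window_hours(data_mb, rate_mb_per_min):
    """Hours needed to move data_mb at a sustained rate_mb_per_min."""
    if rate_mb_per_min <= 0:
        raise ValueError("rate must be positive")
    return data_mb / rate_mb_per_min / 60


TB_IN_MB = 1_000_000  # decimal units, an assumption

for rate in (1200, 1500):
    hours = backup_window_hours(1.2 * TB_IN_MB, rate)
    print(f"{rate} MB/min -> {hours:.1f} hours for 1.2 TB")
```

At those rates a full 1.2 TB pass to tape works out to somewhere between 13 and 17 hours, which is why it runs as a separate step after the 3-hour disk backup.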
NetWare 6.5 SP1.1 in a 3 node cluster
2 Gb Fibre attached IBM FAStT600 storage server
McData Sphereon switches.
HBAs are QLogic 2340s, driver is v6.51.02 1/19/04
Using the IBMSAN.CDM v1.06.05 11/4/02 for multipathing
Using 1 big zone, with storage partitioning on the FastT.
Backing up using Syncsort Backup Express (BEX) 2.1.5D and a SAN attached IBM 3582 LTO2 library.
Depending on file sizes (smaller=slower) we're seeing anywhere from 20-52MB/sec backup speeds during jobs with our best sustained speed for an entire job coming in the 44MB/sec range. This is in the coveted 150GB/hr range which I have to say I'm really pleased with. A storage engineer at one of the hardware vendors we used was very surprised by the speed and said many of his other clients would "kill" for that throughput. It was ironic because their records did not show support for our backup application or for NetWare 6.5. (!) Maybe they will now. Like the other gentleman stated, you have to work a bit for the config, but once there it runs incredibly well.
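The rates in this thread are quoted in MB/min, MB/sec and GB/hour, which makes them hard to compare at a glance. A small conversion helper, assuming decimal units (1 GB = 1000 MB) and using made-up function names, puts them on one scale:

```python
# Multipliers that convert each unit to MB/sec (decimal units assumed).
TO_MB_PER_SEC = {
    "MB/min": 1 / 60,
    "MB/sec": 1.0,
    "GB/hour": 1000 / 3600,
}


def to_mb_per_sec(value, unit):
    """Convert a rate in one of the known units to MB/sec."""
    return value * TO_MB_PER_SEC[unit]


def to_gb_per_hour(value, unit):
    """Convert a rate in one of the known units to GB/hour."""
    return to_mb_per_sec(value, unit) * 3600 / 1000


# Rates quoted by different posters in this thread.
quoted = [(10, "MB/min"), (13, "MB/sec"), (44, "MB/sec"), (100, "GB/hour")]
for value, unit in quoted:
    print(f"{value} {unit} = {to_gb_per_hour(value, unit):.1f} GB/hour")
```

On that scale, 44MB/sec is about 158 GB/hour, while the original poster's 10MB/min is well under 1 GB/hour.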
Things that helped us:
- Novell support forums!!
- Novell Tech Support
- And finally at the risk of sounding like an ad, having an SE from Syncsort on site to help with the rollout was huge as well. We were able to run a variety of tests to help us optimize how the app runs and also ID where we needed work and did not waste hours trying to find the right tweaks in manuals or on the phone.
As far as basic testing tools - TSATEST is a big one. You're only as fast as the OS can hand off data! There is also a "testperf" tool which comes with the BEX application that lets you run server to server to see how the network media is performing in a simulated backup stream. Finally, running backups to a null device lets the TSAFS & backup app just do their thing and removes the tape (or disk) device from the equation. This was helpful in our case as well. The tape device is probably the chokepoint on a modern SAN that is running well.
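The null-device idea can be tried outside the backup application too. The sketch below is not TSATEST or testperf; it is a generic stand-in that reads a directory tree and discards the data, so it only shows how fast the file system can hand off data when no tape or disk target is involved. The function name is made up for illustration.

```python
import os
import time


def read_tree_to_null(root, chunk_size=1024 * 1024):
    """Read every file under root, write it to the null device,
    and return (files_read, bytes_read, seconds)."""
    files_read = 0
    bytes_read = 0
    start = time.perf_counter()
    with open(os.devnull, "wb") as sink:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    with open(path, "rb") as src:
                        while True:
                            chunk = src.read(chunk_size)
                            if not chunk:
                                break
                            sink.write(chunk)
                            bytes_read += len(chunk)
                    files_read += 1
                except OSError:
                    # Skip files that cannot be opened or read (locked, no rights).
                    continue
    elapsed = time.perf_counter() - start
    return files_read, bytes_read, elapsed
```

Running it against a tree of many small files and again against a few large files should show the same "smaller = slower" pattern described above, independent of the tape device.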
Since flow control support appears to be spotty in the real world, packets get dropped once the switch runs out of buffer space. TCP recovers and asks for resends, so the pile of backed-up data gets larger, the client falls further and further behind, and more packets get dropped, leading to lousy performance. Gigabit, since it only supports full duplex, also requires that you select a flow control method, so this is not an issue at Gig. If all your clients are 100 Meg, then you should be running full duplex. (PS: We have replicated this on Win NT, NetWare, Linux and Solaris. It is an Ethernet issue, not a NetWare issue, as some people state.)
We are working with Veritas on a support incident to understand why TSATEST across the wire demonstrates excellent (or at least acceptable) performance whereas, once we introduce the NB client, the performance drops by a factor of 10. TSATEST eliminates the disk subsystem, I/O issues and network issues, and points at the NB client's performance.
I am told by colleagues running NB 5.0 that performance can be much better. But since new clients do not work with older NB servers, I cannot test without a major upgrade.
Thanks for a great site. I have found some great tips and received some great advice online.
Our solution is a two-fold process and it includes a SAN and Backup solution.
Our SAN is as follows.
We have a Compaq SAN with 3 nodes. Each node is a Compaq DL380G3 with 4GB RAM, gigabit NICs, and dual fibre cards running Secure Path for multiple failover. We are running NetWare 6.5 SP2.
Our disk storage consists of 6 trays with 2 x HSG80 controllers for failover. Each tray has 2 power supplies, and each power supply plugs into a different dedicated UPS. The servers also have 2 power supplies each, again running to dedicated UPSes.
We have the disk storage striped vertically, 6 disks per stripe, and we have 2 stripes in a concat set, so we get 12 disks available for storage. Each tray is capable of 80 MB/s, but as we have striped it vertically we get 80 MB/s x 6 throughput, totalling 480 MB/s.
We have 12 x 72GB Ultra320 disks per NSS pool and we have 2 pools. Each pool is then mirrored with NSS mirroring to another SAN in a different building. This is done to ensure that we have an exact copy of our data in a different building if something should happen.
We have 2 nodes in one building and 1 node in the other.
Our backups are running on SYNCSORT'S BACKUP EXPRESS. We do a disk-to-disk backup of our SAN via a separate backup server with external disk storage attached that holds 7TB of data. The connection from the backup server to the disk storage is a 2Gb fibre connection. We then archive the data to a 24 DLT tape library. The backup solution runs completely independently of the SAN and can be upgraded and taken down for maintenance without loss of service to our users.
We chose Backup Express as we have 10 sites in 8 different countries and managing backups was a nightmare. We can schedule, re-schedule, backup, restore and monitor all these systems from 1 Server in our local office regardless of the speed of our WAN link.
Our backup times have been reduced from 6 hours to 2 hours on some systems thanks to Backup Express using Novell's new TSA on NetWare 6.5, "tsafs.nlm". This TSA is way ahead and has fantastic performance.
Thanks to:
- Novell Tech Forums
- Cool Solutions
- Syncsort Backup Express
- and Skip Hefel
for great pointers in the right direction.
Any questions please contact me at firstname.lastname@example.org
Novell Cool Solutions (corporate web communities) are produced by WebWise Solutions. www.webwiseone.com