Disaster Recovery, Part 2 - Hard and Soft Data Recovery
Novell Cool Solutions: Feature
By Timothy Leerhoff
Posted: 13 Oct 2004
The Disaster (Part 2) - Hard and Soft Data Recovery
by Timothy A. Leerhoff
Editor's Note: If you missed the introductory article last week, here is a quick synopsis of what occurred:
At about 1:00 a.m., a few hundred feet from a long-time client's office, a large water main ruptured under the street and breached the sewer pipe. According to the city, the flow lasted "only" two hours. During those two hours, over 120,000 gallons of water flowed freely into the lower level of the office, leaving the water about thigh deep. All of the Hard Data and the server room were located on that level along with the offices of the office manager, a few CSRs (Customer Service Reps), and a couple of renters.
Remember that the water came into the lower level of the building through the sewer lines, carrying in sand and mud from the rupture of the water main as well as anything that was in the sewer pipes. My opinion is that any sewage, regardless of the percentage, is not to be ignored.
For Part One of the story go to: http://www.novell.com/coolsolutions/nds/features/a_disaster1_edir.html
Back to the disaster ...
(Note: I define "hard data" as physical paper files, notes, etc., and "soft data" as being information stored on the file servers.)
The prioritized immediate needs were:
- Get the office WAN connection to the ISP running.
- Get two servers back online: the Novell NetWare File/Print server and the NT terminal server for remote office connections.
- Get the lower-level personnel operating at desks somewhere upstairs.
- Save the paper files.
I was lucky in one area. The router/T1, hubs/switches, and backup tape drive hardware were all above "sea level," as the high-water mark was dubbed. The biggest issue was that the DHCP/DNS server had drowned. The easy fix was to assign static IP addresses to the PCs and set the DNS servers to the DNS addresses from the ISP. It was a real pain to go to each PC, but at least it was an immediate fix for the internet accessibility issue. Yay, one victory.
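For readers facing the same chore, here is a minimal sketch of how a static-address worksheet could be drawn up before walking from PC to PC. It is not what was used at the time. The function name, the host names, the subnet, and the DNS addresses are all hypothetical placeholders; substitute the values for your own site and ISP.

```python
import ipaddress


def plan_static_addresses(hostnames, subnet, reserved, dns_servers):
    """Pair each PC name with the next free host address in the subnet.

    All arguments are illustrative assumptions: `reserved` lists addresses
    already taken (router, printers, servers), and `dns_servers` holds the
    resolver addresses supplied by the ISP.
    """
    network = ipaddress.ip_network(subnet)
    taken = {ipaddress.ip_address(a) for a in reserved}
    # Walk the usable host addresses in order, skipping reserved ones.
    free = (addr for addr in network.hosts() if addr not in taken)

    plan = []
    for name in hostnames:
        try:
            address = next(free)
        except StopIteration:
            raise ValueError("subnet has too few free addresses") from None
        plan.append({
            "host": name,
            "ip": str(address),
            "netmask": str(network.netmask),
            "dns": list(dns_servers),
        })
    return plan
```

Printing the resulting list gives a checklist to carry around the office, and keeping it afterward documents which PC holds which address once DHCP is restored.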
Soft Data Recovery
As I stated in the Part One article, the servers from the manufacturer would not arrive for at least one week. It was not an option to be serverless for that long. I went to a local computer super store and bought two new name-brand workstations; these would be makeshift servers until the new boxes from the manufacturer arrived. I bought extra memory and new hard drives for each. The idea was that I would install the server with the extra hard drives, leaving the stock hard drives as originally purchased for an easy return to workstation status.
Then came another roadblock - I found out that neither would work for the intended purpose. Argh! The terminal server ran NT4, and NT4 drivers for the new workstations were not readily available. Furthermore, I did not have time to research and try different driver sets that might work. The PCs had enough proprietary hardware that NetWare would not install, either.
Luckily, I had two white (generic) boxes at home that would work. First, I had to extract my data off those systems and reconfigure my network so that their absence would not have an effect.
Getting the data restored should have been easy. I had the tape drive hooked up and working, the backup software was installed, and I had the non-drowned tapes in hand. I don't know why I assumed any part of the recovery would be as easy as stated in a book or manual. I suppose it was because we did daily backups, took Friday tapes off-site, and checked the backup logs every day.
The application install directories could be easily regenerated. User data was wanted but not critical at this time. Even the e-mail system was needed but not critical, since the users take information from the e-mails and attach it to the client files in the office management software. And I could download and install the executables for the office management software. But the biggest, most critical item was the databases for the office management system. I didn't think we would need more than two tapes to get it all. I wouldn't need the floating tape shown in the picture.
... just add water ...
Piece of cake. Yeah, right. The next large issue reared its ugly head. The backup software that reported it was successfully backing up all data files to tape had lied to me. Not a single tape had all the files. I went through all of the tapes that I had, retrieving the latest version of each database. Unfortunately, one database was not on any of the tapes. That database was the client history database. This is the electronic version of the paper files. Yes, I do mean the underwater paper files. Regrettably, the client history was looking like it would have to be re-keyed.
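The tape-by-tape search described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure or software actually used: it assumes you can export each tape's catalog as a mapping of file name to last-modified date (as ISO `YYYY-MM-DD` strings), and the function name, tape labels, and file names are hypothetical.

```python
def latest_copies(tape_catalogs, required_files):
    """Find the newest copy of each required file across all tapes.

    tape_catalogs: {tape_label: {file_name: "YYYY-MM-DD"}} (assumed format).
    Returns (best, missing), where best maps file_name -> (tape_label, date)
    and missing lists required files found on no tape at all.
    """
    wanted = set(required_files)
    best = {}
    for tape, files in tape_catalogs.items():
        for name, stamp in files.items():
            if name not in wanted:
                continue
            # ISO date strings sort chronologically, so plain comparison works.
            if name not in best or stamp > best[name][1]:
                best[name] = (tape, stamp)
    missing = sorted(wanted - set(best))
    return best, missing
```

Running the same check against the catalogs on a regular schedule, rather than trusting the backup log's "successful" status, would flag a file that never reaches tape long before it is needed.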
At this point I had the majority of the data recovered to the temporary server. There were no user accounts on the server, however. This was another item that was not on tape. This office assigned the user accounts and their passwords, and we kept an electronic copy of this information. I used one of my utilities to import the users into NDS (eDirectory) and the terminal server.
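As an illustration of why that electronic copy of the accounts mattered, here is a minimal sketch that turns a saved user list into LDIF text, a format many directory import tools accept. This is not the utility mentioned above. The CSV column names (`login`, `surname`, `password`), the container name, and the choice of the `inetOrgPerson` object class are all assumptions; check what your own directory and import tool expect, and note that the sketch does not escape special characters in names.

```python
import csv
import io


def users_to_ldif(csv_text, container):
    """Build LDIF 'add' records from CSV text with login,surname,password columns.

    `container` is the (hypothetical) directory container for the new users,
    for example "ou=users,o=example".
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    entries = []
    for row in reader:
        login = row["login"].strip()
        lines = [
            f"dn: cn={login},{container}",
            "changetype: add",
            "objectClass: inetOrgPerson",
            f"cn: {login}",
            f"sn: {row['surname'].strip()}",
            f"userPassword: {row['password']}",
        ]
        entries.append("\n".join(lines))
    # LDIF records are separated by a blank line.
    return "\n\n".join(entries) + "\n"
```

Because the output contains passwords in clear text, it should be treated as carefully as the original account list and deleted once the import is done.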
OK, it was Sunday night and the data was ready for Monday.
We had just enough available room upstairs to cram in the people from the lower level. We were going to be a close bunch for a little while, whether we liked it or not. The entire office staff was a real bunch of troopers and everyone put up with some discomfort for the duration without complaint. The board room turned into the office manager's office, her CSR's office, and the filing/server room. It was quite a tight fit, but that's the way it was everywhere.
Hard Data Recovery
The next step was the recovery of the paper files. The water came up to the middle of the center drawer of the file cabinets (20 of them). Since every drawer was packed full, there was a lot of water-logged paper. A quick lesson in physics: when paper absorbs water, it expands like a mini-sponge. The forces that can build in a file cabinet drawer packed tight are far more than heavy metal cabinets are able to withstand.
... a major problem with file sizes ...
The drawers wouldn't budge - all the metal workings were broken apart. We removed the fronts of the drawers, and someone had to lie on the muck-covered floor and pull out the files. I was never so happy to be able to say that "that job is not computer-related, sorry."
I think, at some time, that everyone has gotten a paperback book wet. Remember how the paper wrinkled - and then after it dried, the pages stuck together? We had the same scenario with over 60 drawers packed full of paper files. There were thousands of files on our clients' history. This is the same history that was lost due to the backup issue. We had no choice but to do all that we could to save as much of the paper as possible. All of these files had absorbed sewage-laced water, so there was a possible health risk to anyone handling them.
This was happening at the same time the lower level cleanup was proceeding (see next week's article). The lower level was stripped to concrete, and the file folders were dealt out like cards in a gigantic checkerboard pattern. We had one dehumidifier in each room drying the files.
The key for us was that we stopped the drying process when the files were still damp. This prevented the paper sheets from sticking to each other. From there we set up a recovery production line. Two people would carefully pull the needed sheets out of the file, stack them, and tell the third person the name on the file. The third person would make up a new file folder. The fourth person would carefully place the wet paper on a copy machine, making a dry copy of each and every needed sheet. The wet paper was disposed of into a waste basket strapped to a dolly. Hey, wet paper is heavy, and I don't need a hernia. The third person would then assemble the new client files.
During this process, 3 or 4 people were handling the sewage-laden files. I made sure they had glasses of water with straws. I also reminded them not to touch their mouths, eyes, or any food. We didn't want anyone to get sick.
At this point it was Friday, one week after the flood, and we were up and running, albeit a little cramped for space. After talking to people in the other buildings near our site about their cleanup, we discovered that we were doing as well as the others, but it took longer than it should have and we were not able to save all the data.
Other articles in this series:
- The Disaster (Part 1) - Underwater Data
- The Disaster (Part 3) - If You Rebuild It, They Will Come
- The Disaster (Part 4) - Planning for the Future
Novell Cool Solutions (corporate web communities) are produced by WebWise Solutions. www.webwiseone.com