Disaster Recovery, Part 1 - Underwater Data
Novell Cool Solutions: Feature
By Timothy Leerhoff
Posted: 6 Oct 2004
The Disaster (part 1 of 4 articles)
It was a glorious mid-November morning. As I headed to a client, I couldn't help thinking what a great day this was going to be. A warm Friday in Minnesota, sun shining, lighter-than-normal traffic, a fun project starting... Life was great!
Then came "THE CALL."
To give you some background, I am a network consultant and have been for many years. One of my long-term clients is Oakwood Insurance Agency, Inc., where I have been the IS manager for over 10 years. In that time, they have grown from one PC and a loud dot-matrix printer to 8 locations, one Novell file server, one NT Terminal Server, and one NT management server. We added a T1 line to the Internet, giving remote offices from as far away as Oregon access to the Applied Management system. I have been at the design end of the whole process, choosing vendors, hardware, software, etc. Needless to say, they are a favorite client of mine.
My cell phone rang as I drove down the freeway; I noticed it was from Oakwood's office manager. I answered, "Hi Diane, what's up?" Little did I know what was coming.
A very excited Diane exclaimed, "There is water in the basement offices!" Stunned, I managed a "What?" as I tried not to drive into the ditch.
The server room was in the basement, as were the offices of the office manager and two corporate CSRs. So was all the paper filing, all 20 file cabinets of it. Halfway down the stairway, one corporate CSR gaped at the flooded hallway as the realization hit that the "incident" down the street that had all the roads in the area blocked had a more personal ring.
The hall ...
After Diane finished explaining the initial situation, and that she had already called the owner, I told her to NOT let anyone into the water for any reason until a professional had killed power to ALL the downstairs circuits. The last thing that I wanted to see was a headline about someone being electrocuted in a basement flood.
As there was nothing I could do until the water was pumped out and I could get to my servers, I continued on to my client with a more sober outlook. I checked in for updates throughout the day. The news seemed to continually worsen.
I had initially envisioned an inch or two of water and the pain of drying out the carpet. When I learned that at least one step was underwater, I started to worry about the UPS sitting on the floor. When I heard the water was down from its peak, I realized the water had been even deeper, and the servers were sitting on a shelf of my baker's rack only about 8" off the floor. The news just kept getting worse and worse.
A new swimming pool ...
It seems that a big water main ruptured at 1 a.m. the previous night, breaching into the sewer line, pressurizing it. The water started coming up through all our drains (sinks, toilets, etc.) at the calculated rate of 1000 gallons per minute. It continued for about two hours before the city was able to close down the water main. This yielded us over two feet of water, lots of sand, and whatever else was picked up going through the sewer.
My main concerns were the two main file servers and the big UPS, which were mostly or completely under water. However, the routers, management NT server, switches, hubs, and phone system were high and dry. While the agents and CSRs upstairs couldn't access the management database, they could still answer the incoming calls. At least there was some good news. Not much, but at least there was some.
Servers and UPS ...
Two fire hose pumps (3-inch and 4-inch) ran from 9 a.m. to 9 p.m. to get the water down to what I call a squish level. This is where you can hear the rug squish as you walk but you don't need rubber boots to keep your feet dry.
Assessing the damage, I was on the verge of laughing and crying at the same time. The bathroom stalls were amazingly funny with all the sand buildup around the handicapped toilets. However, the waterline three-quarters of the way up the file servers was extremely depressing. Then I heard the owner had told the local news media how great I am, and that I would have them up and running by Monday. YIKES! This was going to be a long weekend with very little sleep.
There were a few good things. Three of the four PCs on the lower level, along with their peripherals, were dry, sitting on top of desks. All we had to do was move them to open offices upstairs, and the office manager and the corporate CSRs would be back online with a minimum of effort.
We had prepared for a disaster, or so we had thought. We did nightly tape backups, and we were taking the Friday tapes off-site. What else was there to do?
I squished around the basement collecting the backup tapes, until I discovered one floating in a drawer. Maybe I would skip that one. The external tape drive was a relief: it was sitting on the middle shelf, dry and ready to go. All it might need was a new cable, since one end had been under water. That shouldn't be too tough, or so I thought. Late Friday night I decided I was in fairly good shape, as I had come up with a task list for early recovery.
Floating-point data ...
- Do a post-disaster cleanup and function test on the file servers.
- If needed, call the server manufacturer to get an ETA on replacement parts/servers.
- If servers take too long to repair, get approval to purchase a couple of glorified PCs to act as temporary servers until the new ones get in.
- Move PCs, printers, scanners, etc. to the upstairs office space.
- Activate the applicable network drops.
- Move the server room to the upstairs open office space.
- Install OSes on the servers.
- Restore all data and test it for integrity.
- Verify the data loss level.
A piece of cake, or so I thought.
The cleanup downstairs started. The sand was shoveled into buckets and hand-carried up the stairs and outside. Meanwhile, the paper files were gathered from the bulging cabinets. We discovered that paper expands quite a bit when it soaks up water and can exert a lot of force on the metal trying to hold it in. The lower drawers had been stuffed with dry files, so when the water was added, the drawers were internally blown apart and the fronts were bent from trying to hold back the expansion. We ripped off the fronts of the drawers as necessary and pulled the files out by kneeling on the wet, gooey floor.
Uncompressed files ...
I now had an entire company relying on me to do something. I stood in the waterlogged basement looking like a deer caught in the headlights of an oncoming truck. I had to do something, anything that would start to get these guys back online. But which way to move? No time to think, the truck is coming.
What was the priority list? What were the most critical items to get online? OK, the data was the quick and easy answer, but what data? Which parts of the data were most important? Were the backup tapes fully functional? Was the tape drive functional? Was the software, with its diskette licenses, still usable? Would the servers function after they dried out? AARRRRRGGGGHHHHH! There were way too many questions and no answers. I thought I had these guys ready for a disaster. I thought I was ready. Wrong on all counts, and I was about to find out the hard way just how unprepared for a disaster we were.
I had to get the data back online. The primary data was the information in the Agency Management System database. The sources for this data were the drowned hard drives, or one of the backup tapes. Step 1 was to identify the functionality of the servers, specifically the mass storage unit.
The cleanup of the servers was about to start. Now that I had the servers out of the water, all I needed to do was dry them and hope they still worked. Opening the servers nearly brought tears to my eyes, half from the odor, half from the condition of the interior components. All the upward-facing horizontal surfaces were coated with a fine mud and, ah, whatever else came in with the water flowing through the sewer. The water had come up to a point just under the CD-ROM drives. This meant that the interface cards, hard drives, CPUs, and the lower part of the power supplies were wet. I put both servers through an initial alcohol wash to start cleaning out the mud and, ah, stuff, as well as sanitizing the parts that were going to be handled.
After the initial cleaning and drying, it was time to find out if recovery was going to be a minor pain or major surgery. Power lights came on, and I could hear at least one hard drive spinning: good, so far. No screen display, though, and no beeps indicating function. Sigh... time to call a good friend with disaster recovery experience. I hoped he could point out a couple of quick items to replace, and we would be up. Dave took the servers to his office to work his inspection and cleaning magic.
After discussing all the space and function needs with the owner and office manager, we decided where everyone and everything would go on the main floor. The office manager called the phone system vendor to help move the phone extensions to the applicable locations.
Meanwhile, it was time to start getting the Internet link back up. Luckily, all the WAN equipment, as well as the hubs, switches, etc., was high and dry. Unfortunately, the room everything was wired into had been completely saturated into the equivalent of a pulpy lump. I called in my router vendor to move the setup to the new "filing/server" room, so that they could verify the configuration and functionality after moving everything to the new location within the building.
Dave called back about the servers. Not good at all. It seems the servers were still powered while under water, even though the UPS would have been submerged at the time. This complicated things: electrolysis had transferred gold from the PCI slot contacts to somewhere else in the server, possibly shorting a circuit path. On top of that, I learned that the chlorine in the city water had bonded with the lead in the motherboard's solder joints and moved significant amounts of the resulting lead chloride to other circuit leads, shorting out many of the digital signals that make a file server function. To sum it up, the motherboards and the power supplies were toast. The hard drives needed to be cleaned up and then dried in a warm place for up to a month to drive out the minuscule amounts of water that might have entered through the vent ports, so I wouldn't be able to access any data from them for quite a while. The memory was high and dry, so it should work, as long as there hadn't been a power surge in the drowning servers to fry it.
Bad chemistry ...
Counting the Costs
It was time to get a new set of file servers on the order fast track. Unfortunately, this kind of event wasn't covered under the 3-year warranty that came with the file servers, so I would have to order new boxes and pay full price for them. One week was the best delivery date I could get from the manufacturer. Much too long to wait; I would have to find a cheap alternative in the meantime. Since both the owner and I were on the brink of getting new PCs anyway, I decided this would be a good time to buy them. I could use them as file servers until the real servers were delivered, then convert them back into workstations.
After identifying the specifications for what would be needed, I ran down to the local computer superstore and purchased the two PCs and all the associated hardware and software I would need to complete the job. Or so I thought.
Other articles in this series:
- The Disaster (Part 2) - Hard and Soft Data Recovery
- The Disaster (Part 3) - If You Rebuild It, They Will Come
- The Disaster (Part 4) - Planning for the Future
Novell Cool Solutions (corporate web communities) are produced by WebWise Solutions. www.webwiseone.com