Dealing with the Great Blackout of 2003
Novell Cool Solutions: Tip
Posted: 16 Sep 2003
As you probably know, about 60 million people in the Eastern United States and Canada had the great misfortune of losing power on 14 August 2003. A great many of our readers run networks that were affected by this in a number of ways.
The day after the blackout we began a search for stories about how our readers coped with the massive disruption in service, particularly to find out how Novell products performed and whether all the backup and restore mechanisms held up under fire.
We awarded Forever Flashlights to five lucky participants who sent us their stories. The Forever Flashlight uses no batteries or bulbs. Instead it uses Faraday's Principle of Induction and a bright LED to produce light (you shake it periodically to generate the power). Hopefully this will stand them in good stead when the power winks out again. The lucky winners were Matthew Mello, Dan Niznak, Tim Edmonds, Ron R., and Don Harris.
How We Did in the Blackout
Wow, what a weekend!
Having electrical power pulled out from under us at such a critical time, just as we were preparing for a new school year, immediately pegged the anxiety meter.
Having reliable, 24x7 electrical power is not just a convenience in a school district, but a necessity. It's technology that makes other services happen for our school district, without which we could not open our doors to welcome students and staff for another school year.
Such is the case with NetWare services. Our reliance on NetWare to power our schools' networks is on the same level as receiving reliable electrical power for our buildings. NetWare is an "empowering" technology for networks, which makes other critical services happen for our network users. With NetWare we utilize file/print services, secure Internet access with BorderManager and efficient workstation management with ZENworks. With 8 school buildings connected with a WAN, a staff of 500 employees and a student population of 4200 pupils, we leverage NetWare NDS to provide network services to our community of users.
At the onset of the "great power blackout of 2003" we noticed electrical power failing in a path across our sites, from south to north across our school district, and we began implementing our crisis management procedures. These procedures ensure that our technology infrastructure is protected, shuts down properly, and can resume services normally when electrical power is restored.
It is worth pointing out a couple of things we noticed about our NetWare servers during these procedures, both of which underscore NetWare's reliability.
First, several of our servers were in the middle of partitioning operations, whereby subdivisions of the directory tree were being processed. When power was resumed, these NetWare 6 servers were able to resume partitioning operations without hesitation. Our directory tree has thousands of user and server objects.
Second, while remotely monitoring the servers for shutdown procedures, we were pleasantly surprised to see server uptime statistics on the order of hundreds of days. NetWare was providing services for users on these servers without disruption for months until electricity was lost.
We cannot thank you enough for a great product in NetWare. Upon restoration of electrical power, all our servers properly resumed their roles providing "network power" to our community.
Note: Matthew is the director of technology at a school district in Holly, Michigan.
Although I was not affected by the Great Blackout of 2003, I would like to express my solidarity with all the people, especially network administrators, struggling with this misfortune.
Interestingly, after downloading my NW 6.5 files from the Customer Portal (what fantastic software!) I cannot start the installation because there is no power at my institution, due to some local technical difficulties. This is the first time it has happened in the last 10 years, i.e. since the Millennium Flood in my town. Best wishes from Poland!
Note: Tadeusz is with the Agricultural University of Wroclaw in Wroclaw, Poland.
Luckily, our power never went out in downtown Buffalo [New York], but I did find this in the APC Powerchute log on our main NetWare server:
100401 08/10/03 08:00:18 Scheduled UPS self-test passed
200002 08/14/03 16:10:43 UPS on battery: Brownout 098.1 V
100300 08/14/03 16:10:44 Normal power restored: UPS on line
200001 08/14/03 16:10:48 UPS on battery: High input line voltage 120.2 V
100300 08/14/03 16:10:50 Normal power restored: UPS on line
200006 08/14/03 16:10:58 UPS on battery: Deep momentary sag 084.5 V
100300 08/14/03 16:10:59 Normal power restored: UPS on line
200007 08/14/03 16:11:07 UPS on battery: Large momentary spike 125.4 V
100300 08/14/03 16:11:08 Normal power restored: UPS on line
200007 08/14/03 16:11:16 UPS on battery: Large momentary spike 127.4 V
100300 08/14/03 16:11:18 Normal power restored: UPS on line
200000 08/14/03 16:11:22 UPS on battery
100300 08/14/03 16:11:23 Normal power restored: UPS on line
200000 08/14/03 16:11:27 UPS on battery
100300 08/14/03 16:11:28 Normal power restored: UPS on line
200200 08/14/03 16:42:14 UPS enabling Smart Boost
100300 08/14/03 17:07:31 Normal power restored: UPS on line
100401 08/17/03 08:00:12 Scheduled UPS self-test passed
Most of the problems were over in less than a minute, but it was an hour until everything was normal. So no dramatic stories, just a tiny bit of data to document how it looked from our corner of the world.
Note: Matthew is the director of information technology at an accounting firm in Buffalo, New York.
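For readers who want to summarize a log like the Buffalo excerpt above, here is a minimal Python sketch. It assumes the line layout seen in that excerpt (a six-digit event code, a date, a time, then a message) and matches on the message text only; the regular expression, the function names, and the pairing rule are illustrative assumptions, not part of PowerChute itself.

```python
import re
from datetime import datetime

# Layout assumed from the excerpt: 6-digit event code, date, time, message.
LINE_RE = re.compile(r"^(\d{6}) (\d{2}/\d{2}/\d{2}) (\d{2}:\d{2}:\d{2}) (.+)$")

# The 14 August entries, copied from the log excerpt in the story.
SAMPLE = """\
200002 08/14/03 16:10:43 UPS on battery: Brownout 098.1 V
100300 08/14/03 16:10:44 Normal power restored: UPS on line
200001 08/14/03 16:10:48 UPS on battery: High input line voltage 120.2 V
100300 08/14/03 16:10:50 Normal power restored: UPS on line
200006 08/14/03 16:10:58 UPS on battery: Deep momentary sag 084.5 V
100300 08/14/03 16:10:59 Normal power restored: UPS on line
200007 08/14/03 16:11:07 UPS on battery: Large momentary spike 125.4 V
100300 08/14/03 16:11:08 Normal power restored: UPS on line
200007 08/14/03 16:11:16 UPS on battery: Large momentary spike 127.4 V
100300 08/14/03 16:11:18 Normal power restored: UPS on line
200000 08/14/03 16:11:22 UPS on battery
100300 08/14/03 16:11:23 Normal power restored: UPS on line
200000 08/14/03 16:11:27 UPS on battery
100300 08/14/03 16:11:28 Normal power restored: UPS on line
200200 08/14/03 16:42:14 UPS enabling Smart Boost
100300 08/14/03 17:07:31 Normal power restored: UPS on line
"""


def parse_log(text):
    """Return (timestamp, code, message) tuples for lines matching the layout."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            code, date, time, message = match.groups()
            stamp = datetime.strptime(f"{date} {time}", "%m/%d/%y %H:%M:%S")
            events.append((stamp, code, message))
    return events


def battery_intervals(events):
    """Pair each 'UPS on battery' event with the next 'Normal power restored'."""
    intervals = []
    start = None
    for stamp, _code, message in events:
        if message.startswith("UPS on battery"):
            if start is None:
                start = stamp
        elif message.startswith("Normal power restored") and start is not None:
            intervals.append((start, stamp))
            start = None
    return intervals


events = parse_log(SAMPLE)
intervals = battery_intervals(events)
# Seconds spent on battery, summed over all paired intervals.
total_on_battery = sum((end - begin).total_seconds() for begin, end in intervals)
# Seconds between the first and last logged event of the day.
span_seconds = (events[-1][0] - events[0][0]).total_seconds()

print(f"{len(intervals)} transfers to battery, {total_on_battery:.0f} s on battery")
print(f"{span_seconds / 60:.0f} minutes from first event to final restoration")
```

On the entries above, this pairs seven brief transfers to battery, all within the first minute, and measures just under an hour from the first event to the final "Normal power restored" line, which matches the description in the story. The Smart Boost entry is not counted as time on battery because the sketch only pairs "UPS on battery" messages.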
We were not in the "great blackout", but as it happens, we also had a blackout last Wednesday for just over 4 hours. But I'm not aware of any problems that have occurred.
After the power came back on, I just ran dsrepair (which found only 3 errors), and all 5 Novell servers were up and running within half an hour. DNS, DHCP, SLP, GroupWise, BorderManager. No problem with any of them.
Note: Johan is an MCNA from the Netherlands.
As the Maintenance Engineering Specialist for the Lansing, Michigan P&DC, I can tell you that August 14 certainly turned out to be a long day. One thing in particular I would like to mention: we run APC Matrix UPSs for our NetWare and Microsoft file servers, and I found it particularly interesting that the NetWare servers sent me brownout event notifications almost a full 4 minutes before the total loss of power for the city of Lansing, while I heard nothing from the NEW Microsoft Advanced Computing Environment (ACE) servers. After the entire power grid was eventually lost, one of the other technical supervisors who was also on the notification list commented to me that we should add these early warnings from our NetWare servers to our emergency response alert plan!
As a side note, we had just replaced all the UPS batteries for our Nortel Meridian phone switch the previous weekend and were hoping for a test window when we could remove facility power from the phone switch to test our repair. Needless to say, that's one job that has been removed from our plant schedule. Our new phone switch batteries were still providing normal phone service some 6+ hours after grid failure.
Once the power grid was restored, the NetWare servers came back online without a hitch. Unfortunately, I can't say the same of the new ACE servers. The normal nightly backup operation for these new ACE servers has proven to be erratic at best since the lights went out in Lansing. Ah well, job security, but I'll take good old NetWare reliability over the new Advanced Computing Environment any day.
Note: Dan works at the United States Postal Service in Lansing, Michigan.
Well, in a nutshell, we had over 300 gigabytes to restore at our DR site. Before we finished, power came back on and eDirectory took off and picked up where it left off. Seems like someone once told me that was what it is supposed to do, and damned if it didn't do exactly that.
Although it was a long wait, and a lot of people in the organization had to get together to plan how to bring up the house with the patient registration and order entry systems, the NetWare side of the house had little to do while we waited. We knew that the servers would simply come back online. We had no worries about the order or whether or not there would be a problem. The biggest concern we had was whether or not the 20-year-old UPS provided by the facilities would actually crank back up.
While we waited through 10 hours in the dark, we helped to get the other teams organized. By 2:00AM when the power came on, we had a comprehensive plan. The NetWare servers came up first, since everyone had to authenticate to the network in order to access their applications. Within 20 minutes of our 'go', the backlog was ready to be entered. We spent several hours in support of the organization and their recovery from their own downtime processes. But, as expected, the servers were solid. No additional downtime was experienced trying to recover from failures, because there were none.
On the positive side, we had been discussing (at 4:00 PM) how to get all of the users to restart their PCs to activate a recent patch that had been distributed with ZEN for Desktops... well, at 4:13, they did, whether they wanted to or not... I guess a silver lining can be found. ;)
I was just getting ready to head home for the day when the Blackout of 2003 happened. Not knowing what exactly had occurred, I decided to stay until power resumed. Little did I know that it would be 18 hours later. After the first two hours passed, I figured it was a larger problem than I had thought, and decided to head home.
We operate a WAN using NetWare 4.12 and BorderManager, and use GroupWise as our mail client. We have four main offices, running a total of nine servers, all using Novell software. Our system supports over one hundred users throughout the county. Being at Head Office, I was able to power down the three servers located there. I had to rely on our servers and the NetWare OS to ensure that the remaining five servers would shut down properly. It was not until 10:30 AM on Friday that power was restored in our area. I headed off to Head Office, expecting to spend the remainder of the day traveling to our branch offices and restarting and troubleshooting our servers. After restarting the Head Office servers, imagine my relief when I discovered that not only had all the remaining servers restarted upon the resumption of power, but they were all functioning normally. Thanks to Novell, what could have been a nightmare turned into a walk in the park.
Note: Don works at a youth and family services facility in Ontario, Canada.
- DSREPAIR -XK3: How to Use It and Why You Would Use It
- NetWare 6.5 DSREPAIR
- How to restore a NFS volume that abends a server after a power outage
Novell Cool Solutions (corporate web communities) are produced by WebWise Solutions. www.webwiseone.com