
A Data Backup and Recovery Strategy

Novell Cool Solutions: AppNote
By Eugene Phua


Posted: 16 Mar 2006
 



Introduction

Data backup and recovery is probably one of the most essential tasks of a system administrator. In the event of a disaster, a successful recovery of data and services will determine whether you get your raise or start looking for a new job. Yet many administrators do not pay much attention to their backup and recovery strategy. Many put absolute faith in the backup software logs that declare that the day's backup completed successfully. However, we forget that a successful backup is not the end goal; a successful restore in the event of a disaster is. Herein lies the trap of false security. In the event of a disaster, without a clear and tested recovery plan, an administrator may spend the next 2 days trying to figure out why data cannot be recovered. It does not help that your users and boss are breathing down your neck. If this is not the situation you hope to be in, this document explains how to implement an inexpensive and elegant data backup and recovery strategy.

This AppNote is written with the following objectives in mind:

  • To recommend a solution that allows administrators to view the data that has been backed up. This gives administrators peace of mind that they have a verifiable 'good' backup.


  • To reduce any instability, abends and slow performance caused by your current backup solution.


  • To reduce the cost of your current backup solution by providing an inexpensive yet reliable alternative.

Synopsis of the solution

There are articles written about tape backup strategies versus data replication strategies (see => http://www.novell.com/coolsolutions/feature/15406.html). However, this proposed solution uses both data replication and tape backup to provide redundancy. This section explains how to set up an inexpensive, verifiable and reliable backup:

  1. Set up a server for backup.

    This server will have a tape drive attached to it. The backup software will back up the data residing on this server. No Open File Agent is required because no users will be accessing this server. Since this is a dedicated backup server, the backup can be done at any time of day. Furthermore, if the backup needs troubleshooting, the backup server can be shut down at any time for maintenance without affecting the users.

    This server will also be configured as both an iSCSI target and an iSCSI initiator. A later section explains why this is needed.


  2. The production data servers should synchronize data to the backup server.

    The backup server will always have a good copy of the production data. This means that the tape backup is only required on the backup server. It also means that the Open File Agent and Backup Agent are not required on any of the production servers, which saves the cost of those agents and removes any dependencies, slow performance or abends they cause.

    This solution is inexpensive because it helps cut costs on the Backup Agent and Open File Agent. This solution is reliable because the data synchronization does not cause performance degradation on the production servers. The backup data is verifiable because administrators can view the data on the backup servers.

    The concept sounds simple, but it is effective.
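
    The claim that the backup is verifiable can be made concrete with a small check. The following is a minimal sketch, not part of the original solution: it assumes both the production volume and the backup volume are reachable as ordinary directory paths from an administrator's workstation (for example as mapped drives), and the function names are hypothetical. It reports files that are missing from the backup copy and files whose contents differ.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(source_root, backup_root):
    """Compare every file under source_root with its copy under backup_root.

    Returns (missing, different): relative paths absent from the backup,
    and relative paths whose contents differ from the source.
    """
    missing, different = [], []
    for folder, _subdirs, files in os.walk(source_root):
        for name in files:
            source_path = os.path.join(folder, name)
            relative = os.path.relpath(source_path, source_root)
            backup_path = os.path.join(backup_root, relative)
            if not os.path.isfile(backup_path):
                missing.append(relative)
            elif file_digest(source_path) != file_digest(backup_path):
                different.append(relative)
    return sorted(missing), sorted(different)
```

    Files that users changed after the last synchronization will naturally show up as different, so a check like this is most meaningful when run against the same point-in-time copy that was synchronized.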

Configuring the Backup Server

Setup and configure iSCSI volumes

  1. If you have not already installed a NetWare 6.5 or OES server specifically for backup, install one with the iSCSI and RSYNC services. It does not have to belong to the same tree unless you want the backup server to provide other services such as an auditing service.


  2. On the backup server, do the following:
    • LOAD NSSMU > Partitions
    • Press 'Insert' to create a new partition
    • Select the Free Disk space and press 'Enter'
    • Select iSCSI
    • Define the partition size (the size required to back up your production data) and create the partition.


  3. On the backup server, type TON.NCF.

    If TON.NCF is already loaded by default from AUTOEXEC.NCF, you can type TOFF.NCF and then TON.NCF to reload the iSCSI target NLMs.


  4. On the backup server, type ION.NCF

    If ION.NCF is already loaded, you can type IOFF.NCF and then ION.NCF to reload the iSCSI Initiator NLMs


  5. On the backup server console screen, type =>

              ISCSINIT CONNECT [IP Address of backup server]

    You should see a message that says that the iSCSI initiator has established connection with the iSCSI target.


  6. On the backup server, do the following:

    • LOAD NSSMU > Device.

      You should see the iSCSI target device. If not, press 'F2' to scan for new devices.


    • Highlight the new device and press 'F3' to initialize the device. After initializing the device, the unpartitioned space should increase to the capacity of the device.
      Note: Be careful to initialize the correct device.


    • Highlight the new device and press 'F6' to share the device.
      After sharing the device, the device should indicate that it is 'Sharable for clustering'


    • From the main menu, go to > POOLS


    • Press 'Insert' to create a new pool. Enter Pool name (e.g. BACKUP)


    • Choose the Free Disk space which has been created on the iSCSI Target


    • Confirm the Partition Size


    • From the main menu, go to > VOLUMES


    • Press 'Insert' to create a new volume. Enter volume name. Let the volume name on the backup server be the same as the volume name on the production server. (see *Note2 for explanation)


    • Do not encrypt the volume and the volume should reside on the pool that you created previously (e.g. BACKUP).


    • You may get an error "20896 Error adding volume to NDS". This is a cosmetic error and the volume has been added to NDS.

      Note: The volumes created on the backup server should correspond to the volumes on the Production server. That is, if there are 5 volumes on the production server that you need to backup, there should be 5 volumes on the backup server.

      *Note2: An important consideration is whether the volume names on the backup server can be the same as those on the production server. Most of the time, it is a good idea for the backup volume and the production volume to have the same name. So in the example above, if the volume on the production server is DATA, the volume on the backup server should also be DATA. The only exception is when your recovery server already has a volume called DATA; in that case you have to give the volume on your backup server a different name (e.g. DATA_B). What is a recovery server, you may ask. Under the section 'A Simple Case Study: Data Backup & Recovery', the recovery server is designated as the Application server. Its purpose is to mount the DATA volume hosted on the backup server and provide file services to the users when the production server is down. More details on how this is done are documented under the section 'A Simple Case Study: Data Backup & Recovery'.


  7. In the AUTOEXEC.NCF, add the following lines in this order (if not already added)

    • TON.NCF
    • ION.NCF
    • ISCSINIT CONNECT [IP address of backup server]
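
The naming rule from *Note2 can be written down as a tiny helper. This is only an illustrative sketch: the function name is hypothetical, the `_B` suffix simply follows the DATA_B example in the note, and the case-insensitive comparison is an assumption about how volume names are matched.

```python
def backup_volume_name(production_volume, recovery_server_volumes, suffix="_B"):
    """Pick the volume name to use on the backup server.

    Keep the production name unless the recovery server already has a
    volume with that name, in which case append a suffix (assumed "_B").
    """
    # Assumption: volume names are compared case-insensitively.
    taken = {name.upper() for name in recovery_server_volumes}
    if production_volume.upper() in taken:
        return production_volume + suffix
    return production_volume


# Recovery server has no DATA volume: keep the same name.
print(backup_volume_name("DATA", ["SYS", "APPS"]))
# Recovery server already has DATA: use a different name.
print(backup_volume_name("DATA", ["SYS", "DATA"]))
```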

Configure data replication using RSYNC

We want to create data replication service from the production server to the backup server. You could use any 3rd party replication software such as Sync Center (see website for more information => http://www.npsh.hu/sync_en.html). However, in this AppNote, we will configure RSYNC.

  1. Go to the SYS:\RSYNC directory. If the RSYNCSTR.NCF file does not already exist, create it in the SYS:\RSYNC directory with the following contents:

              sys:rsync/rsyncst

              sys:rsync/rsync --progress --address=0.0.0.0 --port=873 --daemon --config=sys:etc/rsyncd.conf


  2. Go to the SYS:\RSYNC directory. If the RSYNCSTP.NCF file does not already exist, create it in the SYS:\RSYNC directory with the following contents:

              sys:rsync/rsyncdn Rsync0.0.0.0:873


  3. Go to the SYS:\ETC directory. Configure RSYNCD.CONF as follows. The section 'DATA' indicates that the data volume will be synchronized to the DATA volume on the backup server. Add a new section for every volume that you want to synchronize.

              uid = nobody
              gid = nobody
              max connections = 0
              syslog facility = local5
              pid file = SYS:/rsync/rsyncd.pid
              log file = SYS:/rsync/rsyncd.log
              motd file = SYS:/rsync/rsyncd.motd

              [DATA]
                 path=DATA:/
                 comment=
                 read only=no
                 use chroot=no
                 strict modes = no
                 transfer logging=yes
                 timeout=3600
                 use lfs=yes
                 hosts allow=[IP address of production server(s)]
                 hosts deny=*


  4. Open AUTOEXEC.NCF and add the following line to the file => SYS:\RSYNC


  5. On the server console, type => SEARCH ADD SYS:\RSYNC


  6. On the server console, type => RSYNCSTR.NCF. This will start the RSYNC daemon.
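
Because one section is added per volume, it is easy to forget a 'hosts allow' line in one of them. The following is a rough sketch of a sanity check, run on an administrator's workstation against a copy of RSYNCD.CONF. It is not an rsync feature; the function names are hypothetical and the parser only understands the simple `name = value` and `[module]` lines used in this AppNote.

```python
def parse_rsyncd_conf(text):
    """Split rsyncd.conf text into global parameters and per-module parameters."""
    global_params, modules = {}, {}
    current = global_params
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            # A [NAME] header starts a new module section.
            current = modules.setdefault(line[1:-1].strip(), {})
        elif "=" in line:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return global_params, modules


def modules_without_hosts_allow(modules):
    """Return the names of modules that have no (or an empty) 'hosts allow'."""
    return sorted(name for name, params in modules.items()
                  if not params.get("hosts allow"))
```

A module reported by `modules_without_hosts_allow` is not necessarily wrong, but it deserves a second look before the daemon is started.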

More details on configuration of RSYNC can be found from the following cool solution articles:

Configuring the Production Servers

Configure data replication using RSYNC

  1. On the production server that holds the data volume, type the following line:

              rsync -rRutvP --delete --volume=DATA: / [IP address of Backup server]::DATA

    The following options are defined as follows:

    r - recursive
    R - relative
    u - update
    t - time
    v - verbose
    P - progress
    delete - delete files that don't exist on sender


  2. This command can also be added to the CRONTAB file found in the SYS:\ETC directory. For example, if the backup should run every night at 8pm, add the following line to the SYS:\ETC\CRONTAB file

              0 20 * * * SYS:\SYSTEM\DATABACKUP.NCF

    where DATABACKUP.NCF contains the rsync command. More details on CRONTAB can be found in TID 10024685 =>

              http://support.novell.com/cgi-bin/search/searchtid.cgi?10024685.htm


  3. Load CRON.NLM at the server console.
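
To make the effect of the 'u' and 'delete' options in the rsync command above easier to picture, here is a simplified model of the decisions a one-way synchronization makes. It is a sketch only: it compares modification times alone, whereas rsync also looks at file sizes and other attributes, and the function name is hypothetical.

```python
def plan_sync(sender, receiver):
    """Plan a one-way sync from sender to receiver.

    Both arguments map a relative path to a modification time (any
    comparable number). Returns (to_copy, to_delete).
    """
    to_copy = []
    for path, sender_mtime in sender.items():
        receiver_mtime = receiver.get(path)
        if receiver_mtime is None:
            to_copy.append(path)      # file does not exist on the receiver yet
        elif receiver_mtime < sender_mtime:
            to_copy.append(path)      # sender has a newer version
        # Otherwise the receiver copy is the same age or newer: skipped ('u').
    # 'delete': remove receiver files that no longer exist on the sender.
    to_delete = [path for path in receiver if path not in sender]
    return sorted(to_copy), sorted(to_delete)


production = {"a.txt": 100, "b.txt": 200, "c.txt": 300}
backup = {"a.txt": 100, "b.txt": 150, "c.txt": 400, "old.txt": 50}
print(plan_sync(production, backup))
```

Note the consequence of 'delete': a file removed on the production server disappears from the backup server at the next synchronization, so the tape backup remains the only place where older deleted files can be found.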

Backing up Trustees on the DATA volume

To back up the trustees on the DATA volume, use TRUSTBAR.NLM. The command can be added to the SYS:\ETC\CRONTAB file to back up the trustees daily

          0 18 * * * TRUSTBAR DATA: -B -V

More information on TRUSTBAR.NLM can be found in TID 10066145

          http://support.novell.com/cgi-bin/search/searchtid.cgi?10066145.htm

Dealing with open files using snapshot

Open files have always been a problem for backups. One way to address it is to use the Open File Agent provided by the backup software. The other way is to use a snapshot. Below is a description of how a snapshot deals with the open file issue:

Before the snap can occur, the snapshot function must render the original pool quiescent by briefly halting all data transaction activity when current transactions complete. It temporarily prevents new writes to the pool and flushes the file system cache to make the pool current with existing writes. Any open files are seen by the snapshot feature as being closed once these outstanding writes occur. Then, it snaps the now-stable pool, and allows data transaction activity to resume.
~ taken from page 119 of the NetWare 6.5 NSS Administration Guide

The NSS administration guide can be found at the following website:

http://www.novell.com/documentation/nw65/index.html?page=/documentation/nw65/nss_enu/data/hn0r5fzo.html

Let's assume that the production server has a DATA pool and a DATA volume. We want to create a snapshot of the DATA pool, stored on the DATA pool itself. To do this, enter the following commands on the production server.

  1. To create snapshot: mm snap create DATA DATA DATA_S1


  2. The syntax to create a snapshot is: mm snap create snappool datapool snapname

    The snappool is the pool that you want to take a snapshot of. In this case, it is DATA.

    The datapool is the pool on which the snapshot should reside. In this case, it is also DATA.

    The snapname is the name of the snapshot. In this case, it is DATA_S1.

  3. To activate snapshot: mm snap activate DATA_S1


  4. To mount the data volume on the snapshot pool: mount DATA_SV


  5. To deactivate snapshot: mm snap deactivate DATA_S1


  6. To delete snapshot: mm snap delete DATA_S1

Therefore, what we want to achieve is to create and activate a snapshot before the RSYNC synchronization and then after the synchronization is completed, deactivate and delete the snapshot. To achieve this, modify the SYS:\ETC\CRONTAB to include the following lines:

    0 18 * * * TRUSTBAR DATA: -B -V

    0 19 * * * SYS:\SYSTEM\CREATESNAPSHOT.NCF

    0 20 * * * SYS:\SYSTEM\DATABACKUP.NCF

    0 6 * * * SYS:\SYSTEM\DELSNAPSHOT.NCF

where CREATESNAPSHOT.NCF contains the following lines:

    mm snap create DATA DATA DATA_S1

    delay 60

    mm snap activate DATA_S1

    delay 60

    mount DATA_SV

where DATABACKUP.NCF contains the following lines:

    rsync -rRutvP --delete --volume=DATA_SV: / [IP address of Backup server]::DATA

    Note: The volume is now DATA_SV not DATA

where DELSNAPSHOT.NCF contains the following lines:

    dismount DATA_SV

    delay 60

    mm snap deactivate DATA_S1

    delay 60

    mm snap delete DATA_S1
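
If several volumes need the same treatment, the two snapshot NCF files can be generated rather than typed by hand. The sketch below only reproduces the command sequences shown above. The function name is hypothetical, and it assumes that the snapshot is stored on the pool being snapshotted and that the snapshot volume is always named after the original volume with an `_SV` suffix, as in the DATA example.

```python
def snapshot_scripts(pool, volume, snapshot, delay_seconds=60):
    """Return (create_lines, delete_lines) for the two snapshot NCF files."""
    snapshot_volume = volume + "_SV"  # assumed naming, as in DATA -> DATA_SV
    create_lines = [
        f"mm snap create {pool} {pool} {snapshot}",
        f"delay {delay_seconds}",
        f"mm snap activate {snapshot}",
        f"delay {delay_seconds}",
        f"mount {snapshot_volume}",
    ]
    delete_lines = [
        f"dismount {snapshot_volume}",
        f"delay {delay_seconds}",
        f"mm snap deactivate {snapshot}",
        f"delay {delay_seconds}",
        f"mm snap delete {snapshot}",
    ]
    return create_lines, delete_lines


create_lines, delete_lines = snapshot_scripts("DATA", "DATA", "DATA_S1")
print("\n".join(create_lines))
print("\n".join(delete_lines))
```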

A Simple Case Study: Data Backup & Recovery

The above network diagram shows a minimum configuration for any network. Some customers may have only one data server, with print, web and other services on that same server. However, this is NOT recommended. Novell recommends having at least 2 servers holding the eDirectory partitions so that if one server crashes, the other server will have a replica of all partitions.

  1. In the above diagram, the Data server holds all the users data. In this simple example, we assume that the Data server only has one DATA volume.


  2. We assume that the Application/Services server provides the print and web services.


  3. The Backup server is newly setup with the tape drive attached to this server. The backup software is loaded only on the backup server without the need for Open File Agent.


  4. For security purposes, the Backup server is hosted in its own private LAN so that users do not have direct access to it. This requires the Data and Application servers to have dual NICs.


  5. At 6pm every evening, the Data server backs up the trustees for the DATA volume.


  6. At 7pm every evening, the Data server creates a snapshot of the DATA pool to the DATA_S1 pool and mounts the DATA_SV volume.


  7. At 8pm every evening, the Data server synchronizes the data on the DATA_SV volume to the DATA volume on the backup server.


  8. At 6am every morning, the Data server dismounts the DATA_SV volume and deletes the snapshot DATA_S1 pool.


  9. At 6am every morning, the backup software can be configured to begin backing up the DATA volume on the backup server. If there are any backup problems, you can troubleshoot and restart the backup server without affecting the users.
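
The timeline above leaves a fixed window for each job before the next one starts, and the synchronization in particular must finish before the snapshot is removed at 6am. The sketch below computes those windows from the CRONTAB entries used in this AppNote. It is a simplification: the function names are hypothetical, and it only handles entries with a plain numeric minute and hour, listed in the order in which they run.

```python
CRONTAB = r"""
0 18 * * * TRUSTBAR DATA: -B -V
0 19 * * * SYS:\SYSTEM\CREATESNAPSHOT.NCF
0 20 * * * SYS:\SYSTEM\DATABACKUP.NCF
0 6 * * * SYS:\SYSTEM\DELSNAPSHOT.NCF
"""


def parse_entry(line):
    """Return (start time in minutes after midnight, command) for one entry."""
    minute, hour, _day, _month, _weekday, command = line.split(None, 5)
    return int(hour) * 60 + int(minute), command


def windows_between_jobs(crontab_text):
    """Return (command, minutes until the next job starts) for each job but the last."""
    entries = [parse_entry(line) for line in crontab_text.splitlines() if line.strip()]
    windows = []
    for (start, command), (next_start, _next) in zip(entries, entries[1:]):
        # The modulo handles a next job that starts after midnight.
        windows.append((command, (next_start - start) % (24 * 60)))
    return windows


for command, minutes in windows_between_jobs(CRONTAB):
    print(f"{command}: {minutes} minutes before the next job")
```

With the schedule above, the synchronization has 600 minutes (10 hours) before the snapshot is dismounted. If the first full synchronization of a large volume is likely to take longer than that, the schedule needs to be adjusted for that first run.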

Data Recovery in the Event of Disaster

In the worst possible scenario, 2 hard disks on your data server crash and everything on the server is gone. You call the hardware vendor and they deliver the hard disks in 2 hours. You reconfigure your NetWare server, which takes another 2 hours. You do a restore, which takes 4-6 hours. So in effect, you need at least 8 hours (optimistically) to get your data server operational again. So how do you make your data available to your users within one hour? Here's how:

  1. On the backup server, do the following:

              LOAD NSSMU > Volumes. Dismount DATA volume
              LOAD NSSMU > Pools. Deactivate BACKUP pool

    Note: During this period, backup server must NEVER mount the DATA volume or it will cause corruption to the DATA volume.


  2. On the application server, do the following:

    On the server console, type =>

              ION.NCF

    On the server console, type =>

              ISCSINIT CONNECT [IP address of backup server]

    LOAD NSSMU > Pools. Make sure that you can see the BACKUP pool.

    On the server console, type =>

              nss /poolactivate=BACKUP /overridetype=shared

    Note: Under the section 'Configuring the Backup Server', step 6, we discussed whether the volumes on the backup server should have the same names as those on the production server. The answer is that it is generally a good idea, unless the recovery server has a volume with the same name as the production volume. Here is an example of the potential problem. Let's say the Application server also has a DATA volume. When the nss /poolactivate command is issued, the pool on the backup server is activated on the Application server. Because one server cannot have 2 volumes with the same name, the volume from the backup server is automatically renamed (e.g. DATA_1). Login scripts and commands that refer to DATA would then point to the wrong volume, so be careful about this.

    On the server console, type =>

              mount DATA

    Type 'VOLUMES' to confirm that the DATA volume is mounted.

    To restore the trustees from TRUSTEES.XML, on the server console, type =>

              TRUSTBAR DATA:\TRUSTEES.XML -R -V


  3. To allow users to use this volume, you will need to modify the login script. Below is an example of what to modify:

              Old login script: MAP ROOT F:=\\DATA_SVR\DATA
              New login script: MAP ROOT F:=\\APPS_SVR\DATA

    You can pre-configure the disaster login script lines as a standby. In the event of a disaster, you unremark the disaster login script lines and remark out the production ones.


  4. Ask the users to log in again. If everything is done correctly, the users should be able to access their data within the hour.


  5. Once the pressure is off, you can take your time to recover your data server.
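
The login script change in step 3 is mechanical, so the disaster version can be prepared ahead of time. The following is a small sketch that rewrites the server name in MAP lines of a login script held as text. The function name is hypothetical, it matches the server name exactly as typed, and it leaves every non-MAP line untouched.

```python
def switch_server(login_script, old_server, new_server):
    """Point MAP lines at new_server instead of old_server."""
    old_prefix = "\\\\" + old_server + "\\"   # e.g. \\DATA_SVR\
    new_prefix = "\\\\" + new_server + "\\"   # e.g. \\APPS_SVR\
    rewritten = []
    for line in login_script.splitlines():
        if line.lstrip().upper().startswith("MAP "):
            line = line.replace(old_prefix, new_prefix)
        rewritten.append(line)
    return "\n".join(rewritten)


production_script = r"MAP ROOT F:=\\DATA_SVR\DATA"
print(switch_server(production_script, "DATA_SVR", "APPS_SVR"))
```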

Pretty cool, isn't it!!

Recovery of Print services

So you may be asking what happens if the Application/Services server crashes instead of the data server. How do you recover the services? This is a rather complex question because different services require different approaches and this AppNote does not cater to the recovery of services. However, the recovery of print service will be briefly discussed here. This may not be the only way to recover the print services but I think this is the most painless. To allow the users to print if the print server crashes, you need to do the following preparation work:

  1. On the data server, configure a backup broker, a backup manager and print agents.


  2. On the data server, configure the printer layout (using MAPTOOL) for the backup print agents.


  3. Shut down the backup broker and manager until a disaster occurs.


  4. When you want the users to print through the backup print agents, you may want to put an announcement in the login script so that the users are informed of the iPrint website to download the required printers.

Of course, this will only work if the iPrint client is installed on your users' workstations and your users have rights to add printers. This is a very simplified suggestion for recovering print services and it does not cover all contingencies.

Conclusion

This data backup and recovery strategy is NOT a Novell recommended strategy. Rather, it is written from experience to help administrators with their backup and recovery issues. As with every strategy, you need to customize it to your environment, because it will be different if you have clustered servers or other requirements. Once you come up with a strategy that works for your environment, please test (and test again) to verify that it really works in a disaster situation. Regardless of the complexity of your environment, the basic strategies documented here should help you minimize downtime in the event of a disaster.

