3.15 Backing Up Retain

When you first set up Retain and dredge the email system, you have two identical data sets. Over time, however, items are deleted from the email system, so Retain becomes the only repository for that data. When it comes to backups, remember: "Two is one, and one is none." If you keep two copies and one fails, you still have a backup; if you keep only one and it fails, you have nothing. Depending on the type of organization, the legal ramifications of lost data can be significant.

3.15.1 Where Data Is Stored In Retain

There are only a few major places where Retain stores data:

  1. Program directory

  2. Archive directory

  3. Index directory (this may be on an external cluster)

  4. Database directory (this may be on an external server)

  5. Office365 CSV files

The Reporting and Monitoring Server data is stored in the database.

The Archive Job configuration is stored in the database.

Each major component of Retain keeps its own log of activity. The primary logs are those of the RetainServer, RetainWorker, and Indexer, plus the Reporting and Monitoring Server log, RetainStatServer.

When you interact with Retain, the part of the data you are viewing depends on what you are doing:

  • When you browse messages, you are viewing the metadata of the message that is stored in the database.

  • When you search messages, you are viewing the indexes of the messages.

  • When you open a message, you are viewing the message as saved on disk.

All message content and attachments are stored on disk in the Retain storage area, beneath the "archive" directory (Server Configuration | Storage | Advanced). Every message and attachment is assigned a "hash" value calculated from its contents. Identical content always produces the same hash, and different content produces a different hash, so the hash serves as a unique identifier. When an archive job runs, Retain Server uses it to determine whether a message or attachment has already been processed and stored on disk. The hash values are stored in the Retain database in the t_document and t_attachment tables.
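A minimal Python sketch of hash-based deduplication follows. It assumes SHA-256 as the hash function and uses an in-memory set in place of the t_document and t_attachment tables; Retain's actual algorithm and storage logic are not described in this section, so treat this only as an illustration of the general technique.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Hash an item's contents (SHA-256 is an assumption for this sketch)."""
    return hashlib.sha256(data).hexdigest().upper()


def store_if_new(data: bytes, known_hashes: set[str]) -> tuple[str, bool]:
    """Return (hash, stored_now).

    `known_hashes` stands in for the hash values Retain keeps in its
    t_document and t_attachment tables. Content whose hash is already
    known is not stored again.
    """
    digest = content_hash(data)
    if digest in known_hashes:
        return digest, False  # processed on an earlier run; skip it
    known_hashes.add(digest)  # a real archive would also write the file to disk here
    return digest, True


if __name__ == "__main__":
    known: set[str] = set()
    first = store_if_new(b"Quarterly report attached", known)
    second = store_if_new(b"Quarterly report attached", known)
    print(first[1], second[1], first[0] == second[0])  # True False True
```

Submitting the same bytes twice yields the same hash, and the second call reports that nothing new was stored.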

The archive directory spreads files across many subdirectories to balance the load on disk. Directly beneath the archive directory you'll find 256 two-character hexadecimal subdirectories, 00 through FF. Each of those has its own set of 256 subdirectories with the same naming sequence (00 through FF), and each of those in turn has another 256 subdirectories. The first three pairs of characters in a file's name determine where it is stored: if the filename were B4F05EECB7B21D9014A86C32291C913D190C33394365AC79ED3E1F6849532, you would find it under .../archive/B4/F0/5E.
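The directory lookup can be sketched in Python as follows. The helper name `archive_subdir` and the root path `/data/retain/archive` are hypothetical, and upper-casing the filename is an assumption based on the 00 through FF naming described above.

```python
from pathlib import PurePosixPath

LEVELS = 3    # three nested directory levels beneath the archive directory
FANOUT = 256  # 00 through FF at each level


def archive_subdir(archive_root: str, filename: str) -> PurePosixPath:
    """Return the directory expected to hold a file, using the first three hex pairs of its name."""
    name = filename.upper()  # assumption: directory names are upper-case hex
    if len(name) < 2 * LEVELS:
        raise ValueError("filename is too short to derive the subdirectories")
    parts = [name[i:i + 2] for i in range(0, 2 * LEVELS, 2)]
    return PurePosixPath(archive_root, *parts)


# Number of directories at the deepest level: 256 * 256 * 256.
TOTAL_LEAF_DIRS = FANOUT ** LEVELS

if __name__ == "__main__":
    name = "B4F05EECB7B21D9014A86C32291C913D190C33394365AC79ED3E1F6849532"
    print(archive_subdir("/data/retain/archive", name))  # /data/retain/archive/B4/F0/5E
    print(TOTAL_LEAF_DIRS)  # 16777216
```

The same arithmetic explains why the archive tree can reach more than 16.7 million directories.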

When a user clicks a message link in the Retain mailbox, whether from the Browse tab or from the Search tab's result list, Retain finds the file on disk and displays its contents in the message window. If the original message is known to have contained text but the message window comes up blank, the file is missing from the location where Retain expects it. This is extremely rare and usually happens only after the archive directory has been moved to a new location. In such cases, we typically find that either not all of the files copied over correctly from the old location, or the administrator did not tell Retain where the new location is.
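After moving the archive directory, one way to confirm that everything copied is to compare the old and new trees. The Python sketch below (the function name and both paths are hypothetical) lists files that are missing from the new location or whose size differs there. Walking millions of directories can take a long time on a large archive.

```python
import os


def copy_problems(old_root: str, new_root: str) -> list[tuple[str, str]]:
    """Compare two archive trees and report files that did not copy correctly.

    Returns sorted (relative_path, problem) pairs, where problem is
    "missing" or "size differs".
    """
    problems = []
    for dirpath, _dirnames, filenames in os.walk(old_root):
        for fname in filenames:
            src = os.path.join(dirpath, fname)
            rel = os.path.relpath(src, old_root)
            dst = os.path.join(new_root, rel)
            if not os.path.isfile(dst):
                problems.append((rel, "missing"))
            elif os.path.getsize(dst) != os.path.getsize(src):
                problems.append((rel, "size differs"))
    return sorted(problems)


if __name__ == "__main__":
    # HYPOTHETICAL paths: the old and new archive directories.
    for rel, problem in copy_problems("/old/retain/archive", "/new/retain/archive"):
        print(f"{problem}: {rel}")
```

A clean result does not update Retain itself; the new location must still be set under Server Configuration | Storage.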

3.15.2 Backing Up Retain

The archive directory can contain more than 16.7 million directories (256 × 256 × 256 = 16,777,216), with the archive files spread evenly across them. This makes it difficult, if not impossible, for traditional file-based backup systems to back it up. You therefore need either a disk image (block-level) backup or Micro Focus's Reload for Retain backup solution, which was developed in response to strong demand from Retain customers for a simpler, more elegant solution.

The three most critical pieces to back up are the archive data (specifically, the "archive" directory), the Retain database, and the configuration files. These pieces are interdependent: losing any one of them means your archive data is completely lost. Indexes, by contrast, can be re-created; doing so only takes time.

Virtual Machines

If you run Retain on a VM and use a purchased version of VMware at any level, that version provides a disk backup utility. You can use it to back up Retain if the VM's local disks are part of the VM guest itself. However, those backups can take a long time as the data grows, so even in that situation you may want to consider Reload for Retain, which knows which items are new and backs up only those items.

If the disks are external to the VM guest, those disks need to be backed up. This article assumes that you understand how disks work with VMs. If you do not fully understand virtual machine concepts, we recommend consulting the person who set up and maintains your VM environment.

Storage Path

The storage path (or paths) is where you'll find your archive files, the indexes, the license file, and Retain's backups of the indexes and of ASConfig.cfg.

To find your Retain system's storage path, log in to the Retain Server administration web interface (http://[ipaddress/DNS hostname]/RetainServer). In the navigation pane on the left, under "Configuration", click Server Configuration | Storage. Most customers keep the default "Advanced Settings" option, "Derive all file locations from the above base path"; to see all of the individual paths, you have to click that checkbox so the section expands.

The information shown in the Storage tab in the Server Configuration screen is also stored in the ASConfig.cfg.

What to Back Up

In addition to the archive directory structure, there are a few other areas of Retain that are important to back up and that do not require a disk image (block-level) backup:

  1. Configuration files

    • ASConfig.cfg

    • Indexer configuration files

  2. Database (critical)

  3. License

  4. Index files (these can be rebuilt, but rebuilding can take days, weeks, or months, during which searches will not return complete results)

  5. Office365 address book CSV files.

File Locations

  1. Configuration files.

    • ASConfig.cfg is stored in a directory under your Retain installation:

      Linux default: /opt/beginfinite/retain/RetainServer/WEB-INF/cfg

      Windows default: [Drive]:\Program Files\Beginfinite\Retain\RetainServer\WEB-INF\cfg

    • Indexer configuration files (the entire directory's contents):

      Linux default: /opt/beginfinite/retain/RetainServer/WEB-INF/solrweb/WEB-INF/cfg

      Windows default: [Drive]:\Program Files\Beginfinite\Retain\RetainServer\WEB-INF\solrweb\WEB-INF\cfg

  2. Database.

    • Database locations vary too widely to list here. You should know where your Retain database resides.

  3. License.

    • The license directory is located under your storage path.

  4. Indexes.

    • Because the index directory can be in a constant state of change, we recommend backing up the index subdirectory beneath the backup directory (also found under the storage path) instead.

    • When the Retain maintenance routine runs, it makes a backup copy of the index directory and places it in that index subdirectory. The frequency of this backup is configured in the RetainServer interface under Server Configuration | Maintenance.

  5. Office 365 address book CSV files. These files can be found under the CSV path designated in Module Configuration | Exchange Module | Hosted Services.

If you are upgrading the Retain software, you are strongly advised to manually back up all the files mentioned in this article before performing the upgrade.
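The manual pre-upgrade copy can be scripted. The Python sketch below copies a list of directories into a timestamped folder. The Linux default configuration paths come from this section; the storage-path subdirectories are hypothetical placeholders to replace with the paths shown in your own Server Configuration and Exchange Module settings. The database is not included and should be backed up with your database server's own tools.

```python
import shutil
from datetime import datetime
from pathlib import Path

# Linux default locations given in this section; adjust for your installation.
CONFIG_DIRS = [
    "/opt/beginfinite/retain/RetainServer/WEB-INF/cfg",                  # contains ASConfig.cfg
    "/opt/beginfinite/retain/RetainServer/WEB-INF/solrweb/WEB-INF/cfg",  # Indexer configuration
]

# HYPOTHETICAL placeholders: substitute the paths shown under
# Server Configuration | Storage and Module Configuration | Exchange Module | Hosted Services.
STORAGE_DIRS = [
    "/data/retain/license",       # license directory under the storage path
    "/data/retain/backup/index",  # maintenance copy of the indexes
    "/data/retain/o365csv",       # Office 365 address book CSV path
]


def backup_dirs(sources: list[str], dest_root: str) -> Path:
    """Copy each source directory into a new timestamped folder under dest_root."""
    missing = [s for s in sources if not Path(s).is_dir()]
    if missing:
        # Fail before copying anything so a partial backup is not mistaken for a full one.
        raise FileNotFoundError(f"directories not found: {missing}")
    target = Path(dest_root) / datetime.now().strftime("retain-%Y%m%d-%H%M%S")
    for src in map(Path, sources):
        # Mirror the full source path so the two "cfg" directories do not collide.
        shutil.copytree(src, target / src.relative_to(src.anchor))
    return target
```

For example, `backup_dirs(CONFIG_DIRS + STORAGE_DIRS, "/mnt/retain-backup")` would copy everything into a new folder under the hypothetical `/mnt/retain-backup` mount.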

Note on backing up during the nightly maintenance cycle:

In Retain 4.0.3 and higher, the indexes are optimized during maintenance every night. After an upgrade, this optimization may take a few hours. During optimization, the index directory may grow to two to three times its normal size as temporary files are created and removed, so backing up during this time is not recommended.
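A scheduled backup job can skip the maintenance window with a simple time check. The Python sketch below assumes a hypothetical 01:00 to 06:00 window; read the actual schedule from Server Configuration | Maintenance and allow extra time after an upgrade, when optimization may run for hours. The comparison also handles windows that cross midnight.

```python
from datetime import datetime, time

# HYPOTHETICAL window; use the schedule configured under
# Server Configuration | Maintenance, plus a margin after upgrades.
MAINTENANCE_START = time(1, 0)
MAINTENANCE_END = time(6, 0)


def in_window(now: time, start: time, end: time) -> bool:
    """Return True if `now` falls within [start, end), including windows that wrap past midnight."""
    if start <= end:
        return start <= now < end
    # Window wraps midnight, e.g. 22:00 to 04:00.
    return now >= start or now < end


if __name__ == "__main__":
    if in_window(datetime.now().time(), MAINTENANCE_START, MAINTENANCE_END):
        print("Maintenance window in progress; skipping backup.")
    else:
        print("Outside the maintenance window; backup can proceed.")
```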