1.9 Backing Up Retain

When you first set up Retain and dredge the email system, you have two identical data sets.

When older items are deleted from the email system, Retain becomes the only repository of that data. When it comes to backups: "Two is one, one is none." With two copies, one remains if the other fails; with a single copy, a failure leaves you with nothing. Depending on the type of your organization, the legal ramifications of lost data can be significant.

1.9.1 Where Data Is Stored In Retain

There are only a few major places where data is stored in Retain. See System File Locations in Retain 4.9.2: Planning.

  1. Program directory

  2. Archive directory

  3. Index directory (this may be on an external cluster)

  4. Database directory (this may be on an external server)

  5. Office 365 CSV files

The Reporting and Monitoring Server data is stored in the database.

The Archive Job configuration is stored in the database.

Each major part of Retain keeps logs of what is happening. The primary ones are the RetainServer, RetainWorker and Indexer logs, plus the Reporting and Monitoring log, which is named RetainStatServer.

When you interact with Retain, you are viewing different parts of the Retain data depending on what you are doing:

  • When you browse messages, you are viewing the metadata of the message that is stored in the database.

  • When you search messages, you are viewing the indexes of the messages.

  • When you open a message, you are viewing the message as saved on disk.

All message content and attachments to messages are stored on disk in the Retain storage area in a directory off of the "archive" directory (Server Configuration | Storage | Advanced). Every message and attachment is assigned a "hash". Because the hash is derived from the item itself, two different messages or files will, for practical purposes, never share a hash value, while an identical item always produces the same one. This is how Retain Server determines whether a message and/or attachment has already been processed and stored on disk when an archive job runs. Each file's hash value is stored in the Retain database in the t_document and t_attachment tables.
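The "store it only once" behavior described above can be illustrated with a minimal sketch. This section does not say which hash algorithm Retain uses, so SHA-256 is an assumption here, and HashStore is a hypothetical in-memory stand-in for the archive directory plus the hash columns in the database, not Retain code.

```python
import hashlib
from typing import Dict, Tuple


def content_hash(data: bytes) -> str:
    """Return an uppercase hex digest of the content.

    SHA-256 is an assumption for illustration; Retain's real algorithm
    is not documented in this section.
    """
    return hashlib.sha256(data).hexdigest().upper()


class HashStore:
    """Toy stand-in for the archive directory and its hash records."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def store(self, data: bytes) -> Tuple[str, bool]:
        """Store data once.

        Returns (hash, True) if the item was newly written, or
        (hash, False) if an item with the same hash was already present.
        """
        key = content_hash(data)
        if key in self._blobs:
            return key, False  # already archived; nothing written
        self._blobs[key] = data
        return key, True
```

Storing the same attachment twice returns the same hash both times, and only the first call writes anything. That is the property an archive job relies on to skip items it has already processed.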

The archive directory uses a load-balancing strategy on disk. Off the archive directory you'll find 256 subdirectories with two-character hexadecimal names: 00 through FF. Each of those directories has its own set of 256 directories using the same naming sequence (00 through FF), and each of those in turn has its own set of 256 subdirectories. Thus, if the filename were B4F05EECB7B21D9014A86C32291C913D190C33394365AC79ED3E1F6849532, you would find it under .../archive/B4/F0/5E.
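The mapping from filename to directory can be sketched as follows. The function name archive_dir_for and the root /data/archive are hypothetical; the three-level, two-character split is taken from the example above.

```python
from pathlib import PurePosixPath


def archive_dir_for(archive_root: str, filename: str) -> PurePosixPath:
    """Return the directory that should hold `filename`.

    The first six characters of the filename are split into three
    two-character levels, e.g. B4F05E... -> <root>/B4/F0/5E.
    """
    prefix = filename[:6]
    if len(prefix) < 6:
        raise ValueError("filename must have at least six characters")
    int(prefix, 16)  # raises ValueError if the prefix is not hexadecimal
    levels = [prefix[i:i + 2] for i in (0, 2, 4)]
    return PurePosixPath(archive_root, *levels)


name = "B4F05EECB7B21D9014A86C32291C913D190C33394365AC79ED3E1F6849532"
print(archive_dir_for("/data/archive", name))  # /data/archive/B4/F0/5E
```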

When a user clicks on a message link in the Retain mailbox - whether from the Browse tab or the Search tab's search result list - Retain finds the file on disk and places the contents in the message window. If the original message was known to have text and the message window comes up blank, the file is missing from the location where Retain expects to find it. This is extremely rare and usually only happens as a result of moving the archive directory to a new location. In such cases, we find that either the files did not all copy over properly from the old location or the administrator forgot to tell Retain where the new location is.
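After moving the archive directory, a spot check along these lines can confirm that files landed where the directory layout says they should. This is a sketch only: missing_archive_files is a hypothetical helper, and how you obtain the list of filenames (for example, from the hash values held in the database) is left to you and not shown.

```python
from pathlib import Path
from typing import Iterable, List


def missing_archive_files(archive_root: Path, filenames: Iterable[str]) -> List[str]:
    """Return the filenames that are not present under archive_root.

    Each file is expected at <root>/<chars 1-2>/<chars 3-4>/<chars 5-6>/<name>,
    following the three-level layout described in this section.
    """
    missing = []
    for name in filenames:
        expected = archive_root / name[0:2] / name[2:4] / name[4:6] / name
        if not expected.is_file():
            missing.append(name)
    return missing
```

An empty result for a representative sample suggests the copy completed; any names returned point to files that did not copy over or to an archive path that Retain has not been updated to use.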

1.9.2 Backing Up Retain

The archive directory consists of up to approximately 16.7 million directories (256 x 256 x 256 = 16,777,216), and the archives are stored evenly across them. This makes it hard (if not impossible) for traditional file-based backup systems to back it up; thus, you need either a disk image (block level) backup or a backup/restore solution of your choice that can cope with a directory structure of this size.

The three most critical pieces that must be backed up are the archive data (specifically, the "archive" directory), the Retain database and the configuration files. These pieces are interdependent: losing even one of them would mean that your archive data is completely lost. Indexes can be recreated, although that takes time.

Virtual Machines

If you are running Retain on a VM and you are using a purchased version of VMware at any level, the purchased version provides a disk backup utility. This can be used to back up Retain if the VM's local disks are part of the VM guest itself. However, those backups can take a long time as the data grows, so even in those circumstances you may still want to consider using a backup/restore solution of your choice.

If the disks are external to the VM guest, then those disks need to be backed up separately. This article assumes that the reader understands how disks work with VMs. If you do not fully understand virtual machine concepts, we recommend that you consult the person who set up and maintains your VM environment.

Finding Retain’s Storage Paths

To find your Retain system's storage paths, do the following:

  1. Log in to the administrative web console (http://ipaddress_or_DNS-hostname/RetainServer).

  2. Under Configuration, click Server Configuration > Storage.

  3. Click Advanced Settings and deselect the Derive all file locations from the above base path option (unless it is already deselected).

  4. The list of storage paths displays.

NOTE: The information shown in the Storage tab in the Server Configuration screen is also stored in the ASConfig.cfg file.

What to Back Up

Other than the archive directory structure, there are a few areas of Retain that are important to back up and that do not require a disk image (block level) backup:

  1. Configuration files

    • ASConfig.cfg

    • Indexer configuration files

  2. Database (critical)

  3. License

  4. Index files (these can be rebuilt, but that process can take days, weeks or months, during which time your searches cannot produce full results)

  5. Office 365 address book CSV files

File Locations

  1. Configuration files.

    • ASConfig.cfg is stored in a directory off of your Retain installation:

      Linux default: /opt/beginfinite/retain/RetainServer/WEB-INF/cfg

      Windows default: [Drive]:\Program Files\Beginfinite\Retain\RetainServer\WEB-INF\cfg

    • Indexer configuration files (the entire directory's contents):

      Linux default: /opt/beginfinite/retain/RetainServer/WEB-INF/solrweb/WEB-INF/cfg

      Windows default: [Drive]:\Program Files\Beginfinite\Retain\RetainServer\WEB-INF\solrweb\WEB-INF\cfg

  2. Database.

    • The location is too varied to mention here. Each customer should know where their Retain database resides.

  3. License.

    • The license directory is located under your storage path.

  4. Indexes.

    • Because the Index directory can be in a constant state of change, it is recommended that you back up the index subdirectory located beneath the backup directory (also found under the storage path).

    • When the Retain maintenance routine runs, it makes a backup copy of the index directory and places it here. The frequency of this backup is configured in the RetainServer interface under Server Configuration | Maintenance.

  5. Office 365 address book CSV files. These files can be found under the CSV path designated in Module Configuration | Exchange Module | Hosted Services.
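As a rough illustration, the small file-level items in the list above could be gathered with a script like the one below. It is a sketch under several assumptions: the Linux default paths are the ones quoted in this section, the subdirectory names "license" and "backup/index" under the storage path are inferred from the descriptions above and should be checked against your own system, and backup_items and copy_to are hypothetical helpers. The database, the archive directory and the Office 365 CSV path are deliberately left out because their locations and backup methods vary by site.

```python
import shutil
from pathlib import Path
from typing import List

# Linux defaults quoted in this section; adjust for your installation.
ASCONFIG = Path("/opt/beginfinite/retain/RetainServer/WEB-INF/cfg/ASConfig.cfg")
INDEXER_CFG_DIR = Path(
    "/opt/beginfinite/retain/RetainServer/WEB-INF/solrweb/WEB-INF/cfg"
)


def backup_items(storage_path: Path) -> List[Path]:
    """List the file-level items to copy.

    `storage_path` is the base storage path shown under
    Server Configuration > Storage. The "license" and "backup/index"
    names are assumptions; verify them on disk.
    """
    return [
        ASCONFIG,
        INDEXER_CFG_DIR,
        storage_path / "license",
        storage_path / "backup" / "index",
    ]


def copy_to(dest: Path, items: List[Path]) -> List[Path]:
    """Copy each existing item under `dest`, mirroring its full path.

    Returns the items that were not found, so they can be reported
    instead of being silently skipped.
    """
    skipped = []
    for item in items:
        target = dest / item.relative_to(item.anchor)
        if item.is_dir():
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copytree(item, target, dirs_exist_ok=True)  # Python 3.8+
        elif item.is_file():
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(item, target)  # copy2 preserves timestamps
        else:
            skipped.append(item)
    return skipped
```

A call such as copy_to(Path("/mnt/backup/retain"), backup_items(Path("/srv/retain"))) would copy what exists and return whatever was missing; both paths in that call are placeholders.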

If you are performing an upgrade of the Retain software, you are strongly advised to manually back up all the files mentioned in this article before you begin the upgrade.

Note on backing up during the nightly maintenance cycle:

In Retain 4.0.3 and higher, the indexes are optimized during maintenance every night. This may take hours after an upgrade. During optimization, the index directory may grow by 2-3 times as temporary files are created and removed. Backing up during this time is not recommended.
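One simple way to honor this is to have the backup job check the clock before it starts. The sketch below assumes you know when maintenance runs on your system (the schedule is set under Server Configuration | Maintenance); the window times in the example are hypothetical, and in_window is not part of Retain.

```python
from datetime import time


def in_window(now: time, start: time, end: time) -> bool:
    """Return True if `now` falls within [start, end).

    Handles windows that cross midnight, such as 23:00 to 03:00.
    """
    if start <= end:
        return start <= now < end
    return now >= start or now < end


# Hypothetical maintenance window; use your own schedule and allow
# generous headroom, since optimization can run long after an upgrade.
MAINT_START = time(23, 0)
MAINT_END = time(3, 0)

if in_window(time(1, 30), MAINT_START, MAINT_END):
    print("Inside the maintenance window: postpone the backup.")
```

A clock check is only a guard against the scheduled window; it cannot tell whether an unusually long optimization is still running.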