3.4 Estimating Storage Requirements

No estimate of system storage requirements can be expected to be reliably accurate. Future mail use, litigation requirements, and compliance standards may all change, and are unpredictable at best. Micro Focus bears no responsibility to accurately define or recommend storage needs for various messaging systems. Different messaging systems have different storage characteristics, and the details of each individual implementation render general calculations invalid.

Keep in mind, however, that storage system performance dictates mail viewing, indexing, and data performance. If the storage system is housed on slow hardware, or uses a file system designed for a consideration other than speed (e.g., compact data storage), performance may suffer. Consider the types of files to be archived and how often they will be accessed when choosing the file system. For example, a Retain system archiving mainly or only mobile data (SMS, PIN, MMS, etc.) will perform best with different file system settings than a Retain system archiving a message system with large attachments.

For best results, weigh current mail storage needs against projected future needs, and make sure extra storage can easily be added to the Retain system as needed. The ability to freely add storage space grants control and freedom over the messaging system and should be of paramount consideration. This practice is the only course that can be relied on with any confidence. Because of the challenges and circumstances involved with each system (and even certain versions of a system), only individual consideration will provide a reliable baseline for storage needs.

The simplest way to gauge current disk usage and storage requirements is to monitor disk space usage on the mail servers and project near-future needs from those measurements.
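As an illustration only, the monitoring-and-projection approach can be sketched as a straight-line fit through periodic disk usage readings. The function name and the sample readings are hypothetical, and a linear trend is an assumption that real mail growth may not follow.

```python
from datetime import date


def project_usage_gb(samples, days_ahead):
    """Fit a least-squares line through (date, used_gb) samples and
    extrapolate it days_ahead days past the last sample."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    first_day = samples[0][0]
    xs = [(day - first_day).days for day, _ in samples]
    ys = [used_gb for _, used_gb in samples]
    n = len(samples)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    spread = sum((x - mean_x) ** 2 for x in xs)
    if spread == 0:
        raise ValueError("samples must cover more than one day")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / spread
    intercept = mean_y - slope * mean_x
    return intercept + slope * (xs[-1] + days_ahead)


# Hypothetical readings taken from a mail server volume every ten days.
readings = [
    (date(2024, 1, 1), 100.0),
    (date(2024, 1, 11), 110.0),
    (date(2024, 1, 21), 120.0),
]
print(f"Projected usage in 30 days: {project_usage_gb(readings, 30):.1f} GB")
```

A projection like this is only a starting point for planning; it does not replace continued monitoring of the specific system.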

However, the options and variables differ so much between messaging systems that disk storage estimates are unpredictable; only monitoring the specific system can determine real disk usage. It is best to build a storage system to which additional space can be added as existing space is consumed.

In addition, Exchange 2010 has abandoned single instance storage in favor of high availability and performance, which may cause multiple Exchange servers in the system to hold copies of the same data. Retain utilizes single instance storage and may vastly decrease the storage size of a system that contains many such duplicate copies. Due to the differences between Retain storage and the main messaging system, it is nearly impossible to establish a baseline for Retain storage needs. Retain may tremendously decrease the space needed to archive an Exchange 2010 system, or, depending on system size and implementation, it may not significantly decrease the needs of the current system. Though Retain will require additional space to continue archiving mail, the initial archive job will not exceed the size of the current messaging system.

Consulting with the Retain Sales representative will offer the best tailored information for each system and each implementation of the different platforms available.

3.4.1 Considerations for Storage Requirements

Operating System

The OS will receive a number of updates over the life of a Retain server, so provision space for them.

Retain Program

The Retain program is also updated often. It is usually placed on the same volume as the OS.

Archive Storage Area

The Archive contains the item bodies and attachments. This is generally the largest percentage of the storage requirements on a Retain server. The bodies and attachments are stored as BLOB (Binary Large OBject) files in a folder structure that starts with /00/00/00 and grows to /FF/FF/FF, for a total of 16.7 million directories. BLOB files are stored only once: Retain implements a single-instance storage system, so only a single copy of identical items is stored. BLOB files never change; they are only created or removed.
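To make the directory count and the single-instance idea concrete, the sketch below maps content to a three-level hexadecimal directory. It is not Retain's actual naming scheme, which this section does not describe: the use of a SHA-256 digest and the function name are assumptions chosen purely for illustration.

```python
import hashlib

# Three nested levels, each with 256 possible names (00 through FF).
TOTAL_DIRECTORIES = 256 ** 3  # 16,777,216, i.e. about 16.7 million


def blob_directory(content: bytes) -> str:
    """Return a three-level hex directory (for example /3A/7F/C2) derived
    from the content. Identical content always maps to the same place,
    which is the property single-instance storage depends on.
    Assumption: SHA-256 stands in for whatever key Retain really uses."""
    digest = hashlib.sha256(content).hexdigest().upper()
    return "/" + "/".join(digest[i:i + 2] for i in (0, 2, 4))


first = blob_directory(b"quarterly report attachment")
second = blob_directory(b"quarterly report attachment")
print(first, first == second, TOTAL_DIRECTORIES)
```

Because the location depends only on the content, a second copy of the same attachment resolves to the same directory and does not need to be stored again.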

The files are accessed when a message is opened in the Retain Search Message interface or downloaded by Retain Publisher.

Indexes

The indexes allow for fast search of the data in Retain. When searching for items in Retain the indexes are used to return the results.

For best search performance, the indexes should be placed on a fast disk and optimized regularly.

The Index requires periodic optimization. This is set under Server Configuration | Maintenance. The Indexer requires as much free disk space as currently used index space for optimization. It will require three times as much if optimizing during an archive job.
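The free-space rule for optimization can be written out as a small calculation. This is a sketch of the arithmetic stated above; the function names are hypothetical and the sizes are examples.

```python
def optimization_free_space_gb(index_size_gb: float, archiving: bool = False) -> float:
    """Free space the indexer needs for optimization: as much as the
    index currently uses, or three times that during an archive job."""
    multiplier = 3 if archiving else 1
    return index_size_gb * multiplier


def can_optimize(index_size_gb: float, free_gb: float, archiving: bool = False) -> bool:
    """True when the index volume has enough free space to optimize."""
    return free_gb >= optimization_free_space_gb(index_size_gb, archiving)


# Example: a 40 GB index on a volume with 60 GB free.
print(can_optimize(40, 60))                  # enough room when no job is running
print(can_optimize(40, 60, archiving=True))  # not enough during an archive job
```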

During index migration from Retain 3.x to Retain 4.x, storage requirements for the indexes are the most complex and are described below.

Database

The database contains the header information for each item (for example, sender, recipient, date received, and so on) as well as pointers to the body and attachment files kept in the archive. Each item may be quite small, but it may be replicated many times in the database if there are many recipients.

When browsing messages, the items in the database are being viewed. On larger systems, the database is often placed on a dedicated database server.

Logs

The logs track the actions that Retain takes. They can become quite large but are compressed at the end of each day and removed after 10 days, by default. This can be changed under Server Configuration | Logging. Logs can be moved to another volume as described below.

3.4.2 Storage Minimums

Retain is very disk intensive, and running out of disk space can be very challenging to recover from and can potentially result in data loss. To minimize the chances of this happening, certain safeties have been implemented.

The Retain system requires a minimum of 5 GB of free disk space or it will enter maintenance mode.

Retain will send warning messages when there is less than 10 GB of free space on the storage, index, or system volumes.

These minimums are configurable, but it is not recommended to permanently change these settings:

  1. Change to the configuration file directory, by default in:

    Linux: /opt/beginfinite/retain/RetainServer/WEB-INF/classes/config/

    Windows: C:\Program Files\Beginfinite\Retain\RetainServer\WEB-INF\classes\config

  2. Edit the misc.properties file

  3. Change the following settings (in gigabytes) as desired, defaults being:

    diskspace.warn.gb=10

    diskspace.error.gb=5
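The effect of the two default thresholds can be sketched as follows. This is an illustration of the documented behavior, not Retain's own code: the function names are hypothetical, and the path checked is whatever volume you pass in.

```python
import shutil

WARN_GB = 10   # default warning threshold described above
ERROR_GB = 5   # default maintenance-mode threshold described above


def classify_free_space(free_gb: float, warn_gb: float = WARN_GB,
                        error_gb: float = ERROR_GB) -> str:
    """Classify a volume's free space against the two thresholds."""
    if free_gb < error_gb:
        return "maintenance"   # below the required minimum
    if free_gb < warn_gb:
        return "warning"       # warning messages would be sent
    return "ok"


def free_space_gb(path: str) -> float:
    """Free space on the volume containing path, in gigabytes."""
    return shutil.disk_usage(path).free / 1024 ** 3


# Check the volume that holds the current directory.
print(classify_free_space(free_space_gb(".")))
```

Running a check like this against the storage, index, and system volumes gives an early view of how close each one is to the thresholds.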

3.4.3 Indexes During Migration

While the migration is performed, a new index will be created, so the current index size will be doubled temporarily while two indexes exist. Once migration is complete, the old index may be removed and space reclaimed.

Also, the new index may be larger than the current index, due to the increased power and abilities of the new indexer. How much it changes depends on the composition of the current archive, any limits set on indexing attachments, and how many attachments there are in the archive. If there are many attachments, or very large attachments, the increase in size will be significantly larger than if the archive holds few or small attachments. With no limits on indexing attachments and many large attachments in the system, the new index may be up to 4 times the size of the current index. With limits included, the new index may be smaller after upgrading. For the upgrade, plan for the worst case: ensure that the index volume can hold up to 5 times the current index size (the existing index plus a new index up to 4 times as large) before starting the index migration. (For example, with a current index size of 5 GB, ensure that the volume has a minimum of 20 GB free before starting the migration.) The average increase is expected to be around 20%.
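The worst-case figures above can be checked with a short calculation. This sketch reads the guidance as "the existing index plus a new index up to four times its size", which is an interpretation of the text; the function name and the growth_factor parameter are hypothetical.

```python
def migration_index_space(current_gb: float, growth_factor: float = 4.0):
    """Return (peak_total_gb, extra_free_gb) for an index migration.

    growth_factor is the size of the new index relative to the current
    one: 4.0 for the worst case above, about 1.2 for the expected
    average increase of around 20%."""
    new_index_gb = current_gb * growth_factor
    peak_total_gb = current_gb + new_index_gb  # both indexes exist at once
    return peak_total_gb, new_index_gb


# Worst case for a 5 GB index: 25 GB in total, 20 GB of it newly needed.
print(migration_index_space(5))
# Expected average case for the same index.
print(migration_index_space(5, growth_factor=1.2))
```

The old index can be removed once migration completes, so the peak figure applies only while both indexes exist.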

If there is insufficient space to perform the migration, Retain will be placed into maintenance mode until more space is provided. Retain 4 checks for free disk space on the storage, index, and system locations. If there is less than 20 GB free space, Retain will send a warning message every 6 hours. At 10 GB of free space left, Retain will enter maintenance mode and all jobs are disabled. To exit maintenance mode, provide more free space on the volume.

NOTE: Migration time and performance depend largely on the performance of the storage system. External storage systems, such as appliances, may move seldom-used data to low-performing storage, which has a negative effect on migration performance. In addition, the file system of the existing storage largely dictates how fast indexing, migration, and message viewing can be performed. To ensure high performance, house the storage on high-performance hardware with a high-performance file system.

3.4.4 Change the Log File Location

Linux

To change the location of the log files, create a symbolic link (roughly analogous to a Windows shortcut). By default, Retain gives you the option during installation to store the logs at /var/log/retain-tomcat8 or at /opt/beginfinite/retain/tomcat8. If they are stored at /var/log/retain-tomcat8, Retain creates a symbolic link called "logs" in the /opt/beginfinite/retain/tomcat8 directory that points to the /var/... location.

  1. Create the directory in the location you wish to use.

  2. Stop tomcat.

  3. Move the current logs to the new location: mv /var/log/retain-tomcat8/* /[path to new directory]

  4. Make tomcat the owner of the new directory path: chown -R tomcat:tomcat /[path to new directory]

  5. Set the appropriate file *permissions for the tomcat user and group: chmod -R 664 /[path to new directory]

    * If you are moving them to a separate volume, the file permissions must be 774.

  6. Create a symbolic link in the parent directory of the default logs directory and point it to the new location:

    1. Change to the /var/log directory

    2. Remove the current log directory: rm -r retain-tomcat8

    3. Create a new symbolic link called retain-tomcat8 that points to your new log location: ln -s [path to new directory] ./retain-tomcat8 (and press ENTER)

  7. Start tomcat.
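The move-and-link part of the steps above (steps 1, 3, and 6) can be sketched as follows. This is an illustration run against a throwaway scratch directory, not a script to run on a Retain server: the function name and directory names are hypothetical, it assumes the old log location is a real directory rather than a link, and it leaves out stopping and starting tomcat and the ownership and permission changes in steps 4 and 5.

```python
import os
import shutil
import tempfile
from pathlib import Path


def relocate_logs(old_dir: Path, new_dir: Path) -> None:
    """Move everything in old_dir to new_dir, then replace old_dir with
    a symbolic link that points to new_dir."""
    new_dir.mkdir(parents=True, exist_ok=True)            # step 1: create directory
    for entry in list(old_dir.iterdir()):                 # step 3: move current logs
        shutil.move(str(entry), str(new_dir / entry.name))
    old_dir.rmdir()                                       # step 6.2: remove old directory
    os.symlink(new_dir.resolve(), old_dir,                # step 6.3: create the link
               target_is_directory=True)


# Demonstration in a scratch directory with one sample log file.
scratch = Path(tempfile.mkdtemp())
old_logs = scratch / "retain-tomcat8"
old_logs.mkdir()
(old_logs / "example.log").write_text("sample log line\n")
new_logs = scratch / "new-volume" / "logs"

relocate_logs(old_logs, new_logs)
print(old_logs.is_symlink(), sorted(p.name for p in old_logs.iterdir()))
```

After the call, the original path still works for anything that opens it, but the files themselves live in the new location.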

Windows

  1. Ensure no archive jobs are running and stop Tomcat.

  2. Configure Tomcat's default log location.

    1. Click on Start.

    2. In the "Search programs and files" box, type: configure tomcat

    3. Click on Configure Tomcat

    4. Click on the Logging tab.

    5. Type in the new log path as a standard Windows path with backslashes "\" (e.g., d:\retain\logs), or browse to it by clicking the button labeled "...".

    6. Click OK.

  3. Make a backup copy of the existing log4j.properties file.

  4. Edit the log4j.properties file located at [drive]:\Program Files\Beginfinite\Retain\[RetainServer, RetainWorker, RetainWorker1, etc.]\WEB-INF\classes.

  5. Search for ${catalina.base}/logs/ and replace it with [desired path using forward slashes "/"]/logs/ (e.g., D:/retain/logs/). An easy way to do this is to use the "Replace" function of a text editor (e.g., Notepad).

  6. Repeat steps 3 - 5 for every log4j.properties file (server, worker, stub server, stats server). The only log that will still be created at the default log location is localhost.[date].log, but it is a very small log.

  7. Start Tomcat.
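For installations with several log4j.properties files, the backup and search-and-replace steps can be sketched as a small script. This is a hedged illustration rather than a supported tool: the function names are hypothetical, the sample property line is made up, and the .bak suffix for the backup copy is an assumption.

```python
import shutil
from pathlib import Path

DEFAULT_PREFIX = "${catalina.base}/logs/"


def rewrite_log_paths(text: str, new_prefix: str) -> str:
    """Replace the default log prefix with new_prefix (forward slashes)."""
    if not new_prefix.endswith("/"):
        new_prefix += "/"
    return text.replace(DEFAULT_PREFIX, new_prefix)


def update_properties_file(path: Path, new_prefix: str) -> None:
    """Back up one log4j.properties file, then rewrite its log paths."""
    backup = path.with_name(path.name + ".bak")   # assumed backup name
    shutil.copy2(path, backup)
    original = path.read_text(encoding="latin-1")
    path.write_text(rewrite_log_paths(original, new_prefix), encoding="latin-1")


# Hypothetical property line, shown only to illustrate the replacement.
sample = "log4j.appender.EXAMPLE.File=${catalina.base}/logs/example.log"
print(rewrite_log_paths(sample, "D:/retain/logs/"))
```

Calling update_properties_file once per file (server, worker, stub server, stats server) applies the same change everywhere while keeping a copy of each original.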