2.2 Planning for Retain Hardware

There are five major considerations to take into account when designing the hardware for a Retain system:

2.2.1 Network Bandwidth

The Worker queries your messaging system for messages and receives all of them. However, not all items are subsequently sent to the Retain Server.

If the link between the Worker and the messaging system is slow, consider placing the Worker on the messaging system's server or on a server that has a fast link to the messaging system.

The downside to this strategy is software updates.

When upgrading Retain software, you must update each Worker. Workers running on the Retain Server or on a separate server are upgraded together.

2.2.2 CPU Requirements

Retain is multi-threaded and able to make use of multiple CPU cores. The Retain Server uses up to 4 threads, and the Indexer starts with 3 threads. If more than 7 CPU cores are available, additional Indexer threads are spawned. The basic formula for Indexer threads is cores - 4, with a minimum of 3.

Cores	Retain Server Threads	Indexer Threads
2	1	3
3	2	3
4	3	3
5	4	3
6	4	3
7	4	3
8	4	4
9	4	5
10	4	6

HINT: Micro Focus testing has determined that 8 CPU cores is optimal for performance gains, allowing Retain 4 threads and the Indexer 4 threads.
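
The thread counts in the table can be reproduced with a short sketch. The Indexer rule comes from the formula above; the Server rule (one fewer than the core count, capped at 4) is only inferred from the table rows, and the function name is illustrative rather than anything Retain provides.

```python
def retain_thread_plan(cores):
    """Return (server_threads, indexer_threads) for a given CPU core count."""
    if cores < 2:
        raise ValueError("the table above starts at 2 cores")
    # Assumption inferred from the table: server threads are cores - 1, capped at 4.
    server_threads = min(cores - 1, 4)
    # From the text: Indexer threads are cores - 4, with a minimum of 3.
    indexer_threads = max(cores - 4, 3)
    return server_threads, indexer_threads


for cores in (2, 4, 8, 10):
    print(cores, retain_thread_plan(cores))
```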

2.2.3 Planning for Disk Storage

If not monitored, Retain can completely fill its allocated archive storage.

Although Retain warns of disk-full conditions, you are responsible for keeping the storage from filling up completely.

Once storage is full, recovery is difficult because server performance is heavily impacted.
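
One way to watch for this yourself is a small script run on a schedule. This is a minimal sketch using only the Python standard library; the 80% threshold and the function names are assumptions for illustration, not Retain settings.

```python
import shutil


def disk_usage_percent(path):
    """Return the percentage of the volume holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def needs_attention(path, warn_at_percent=80.0):
    """True when the volume is at or above the (assumed) warning threshold."""
    return disk_usage_percent(path) >= warn_at_percent


# "." stands in for your archive storage path.
print(f"{disk_usage_percent('.'):.1f}% used, warn: {needs_attention('.')}")
```

Point the path at each Retain storage volume (archive, index, logs, database) rather than checking only one.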

It is critical that you design your system so that you can easily add storage as the system grows.

Retain’s success depends on a robust storage design.

Install the OS on its own partition so that it’s easier to recover from a disk-full condition.

Make sure you have a comprehensive backup strategy for Retain. See Backing Up Retain.

Planning Your Archive Size - Archive Files (BLOBs)

As you begin planning your Retain archive, we recommend that you start with the current size of your post offices and other systems, multiply that by your system’s yearly growth rate, and add the result to cover at least one year, if not two.

It isn’t possible to predict how much archive space requirements will increase over time, but at least this sets a good starting point for your initial archive and growth in the near term.
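
The starting-point arithmetic can be sketched as follows. It assumes a simple (non-compounding) reading of the guidance, where one year's growth is the current size times the growth rate; the function name and the sample figures are illustrative only.

```python
def projected_archive_size_gb(current_size_gb, yearly_growth_rate, years=2):
    """Current mail store size plus `years` of growth at a flat yearly rate."""
    # Assumption: growth is added linearly, once per year, not compounded.
    yearly_growth_gb = current_size_gb * yearly_growth_rate
    return current_size_gb + yearly_growth_gb * years


# Example: 500 GB of post offices growing 25% per year, planned for two years.
print(projected_archive_size_gb(500, 0.25, years=2))
```

Treat the result as an initial allocation only; as noted above, long-term growth can't be predicted this way.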

If you have a virtualized environment, you can allocate more space than you think will be used and thin provision the disks.

Retain archiving is designed so that only one copy of a message or attachment is archived no matter how many users receive it, or which post office they belong to.

Retain lets you expire and delete messages from the archive after a specified time period.

Database Size

For cloud deployments, we typically set the database partition to 500 GB and go from there.

If a partition runs low on disk space at any point, support can direct you on the proper steps to move the data to another partition if necessary.

The numbers provided in the following table are representations of three different systems. Two customers with the same number of messages in their system may have vastly different database sizes due to the difference in the message metadata.

For example, Customer A may have short distribution lists while Customer B has a lot of emails with hundreds if not thousands of recipients associated with the messages. The purpose of providing sample data is to illustrate differences.

Example Systems	Deployment A	Deployment B	Deployment C
Message Count	104,976,966	18,261,383	2,699,654
Archive Size	5.3 TB	1 TB	115 GB
File Size per Message in the archive	56.27 KB	64.02 KB	45.06 KB
Database Size	455 GB	82 GB	16 GB
File Size per Message in the database	4.54 KB	4.71 KB	6.21 KB
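
Per-message figures like these are simply a total size divided by the message count. A minimal sketch for checking your own system, assuming the sizes are binary units (1 GB = 1024 x 1024 KB); the function name is illustrative.

```python
def kb_per_message(total_size_gb, message_count):
    """Average size per message in KB, treating 1 GB as 1024 * 1024 KB."""
    return total_size_gb * 1024 * 1024 / message_count


# Database Size divided by Message Count for the three sample deployments.
for name, db_gb, messages in (
    ("Deployment A", 455, 104_976_966),
    ("Deployment B", 82, 18_261_383),
    ("Deployment C", 16, 2_699_654),
):
    print(name, round(kb_per_message(db_gb, messages), 2), "KB of database per message")
```

For Deployment A, 455 GB of database across 104,976,966 messages works out to roughly 4.54 KB per message.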

Choose XFS as the File System on Linux

Micro Focus recommends choosing XFS for Linux servers because it creates inodes dynamically and performs well.

Micro Focus does not recommend ReiserFS (poor performance with Retain) or ext3 (inode inflexibility).

Disk Options

Retain archive jobs are disk-I/O intensive and include:

Storing message content in the archive
Indexing each message
Updating the database with each message’s metadata
Updating various logs continually

In light of this, here are a few recommendations.

Physical ("bare metal") Server

Physical servers have their own locally attached disks. If there is just one disk, then disk I/O contention negatively impacts performance, especially while jobs are running.

VM Guest on Host With Local Disks

If your VM host has only local disks (no NAS or SAN), make sure that you create multiple disks and that each one is on a different datastore if possible.

NAS or SAN

This could be a physical server where the storage is mounted/mapped to a NAS or SAN; or, this could be a VM guest where:

The VM guest itself is stored on a NAS/SAN; thus, the VM guest's "local disks" are also sitting on a NAS/SAN; or,
The VM guest itself is stored on the host's local disks but the "local disks" of the VM guest are on datastores residing on a NAS/SAN; or,
The VM guest is mounting volumes stored on a NAS/SAN.

If the Retain storage is on a NAS/SAN, and especially if the volumes are expandable on the fly, there are so many possible configurations that specific recommendations aren’t possible. The best approach is to understand what Retain is trying to do and then see what can be done on the hardware end to facilitate the best performance.

If it is a NAS/SAN, consider the pipe speed to the storage: 1 gigabit/sec is very slow. On top of that, consider how many disks are in the array, their RAID configuration, and the speed of the disks themselves.

Recommendations

If all the Retain storage is located on the same volume and you run out of space, Retain provides the ability to create additional storage volumes for the archive files. After an additional logical storage volume is created within Retain, all archive files go to the new location.

However, the indexes continue to grow and Retain doesn't have the ability to partition indexes. Some customers have run out of disk space, created new logical storage partitions that point to another volume, but then run into problems with their archive jobs because they are still out of disk space for the indexes. Thus, for logical reasons, you want to have your archive files on a separate volume to begin with, unless the volume containing the archive is expandable on demand.

If it makes sense to do so (based on all the concepts previously discussed), you'll want to separate your archive files from your indexes and from your database, which means two to three other partitions on your Retain Server in addition to your OS partition. If your database is on a separate server from Retain, then only two other partitions are needed; otherwise, you'll want three additional partitions.

Data Partitioning

We recommend dividing up your storage directories onto separate disks, so beyond the OS disk there should be:

Disk 1: Archive
Disk 2: Index (250G start). For best search performance, consider making this a solid state drive.
Disk 3: Logs, xml, ebdb, export, backup, and license (150 - 200G)
Disk 4: Database (if on-board)

Disk 2 should be expandable and you'll want to give it room for the indexes to grow; but, if you cannot do that, then when it runs out of space, you'll simply need to move your index files to another volume with more disk space in the future. For disk 2 - as mentioned previously - you may want to consider an SSD, as that would increase the search performance.

If disk 1 and disk 2 can literally be on different physical disks, then you get some performance gains from that because an archive job writes simultaneously to the archive directory, the index directory, and to the database. If each of those is on a different physical disk, then this eliminates disk contention bottlenecks. Smaller systems may not need to be concerned with performance while larger systems that have archive jobs running for hours may want the performance gains.

Using disk 3 for logs is especially helpful for larger systems. If you have 6 Workers averaging 5 - 10 messages per second, expect a RetainServer log of around 60G unzipped. Plan for 150 - 200G for your logs directory. For the initial archive job, the rule of thumb is 10G per day per Worker. If you do not use a third disk, then the logs are written on the OS partition and that could spell trouble. Also, if users access their archives often and perform PDF exports, the export directory can grow as well. The xml, ebdb, and license directories are pretty much static with minimal to no growth. The backup directory is a backup of the index directory and other important items. However, if the disk begins to run out of room, you can copy this data over to a larger disk at some future time and point Retain to that new disk.
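
The log rule of thumb above can be turned into a quick estimate. This is a sketch that assumes log volume scales linearly with the number of Workers and days; the function names are illustrative, and the 150G default is the low end of the planning range given above.

```python
def initial_job_log_gb(workers, days=1.0, gb_per_worker_per_day=10.0):
    """Estimated unzipped log volume using the 10G per day per Worker rule."""
    return workers * days * gb_per_worker_per_day


def fits_log_disk(workers, days, log_disk_gb=150.0):
    """True when the estimate fits in the space planned for the logs directory."""
    return initial_job_log_gb(workers, days) <= log_disk_gb


print(initial_job_log_gb(6))         # 6 Workers for one day
print(fits_log_disk(6, days=3))      # does three days of logging still fit?
```

With 6 Workers the one-day estimate is 60G, which lines up with the RetainServer log figure quoted above.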

Finally, if your database is on the Retain Server, you'll want a third or fourth disk for it (depending on whether you decide to dedicate a disk for your Retain logs).

If performance is an issue, you should place all three partitions on different physical disks (or at least a NAS/SAN with many disks that it can stripe across). You should also put the indexes and the database on high speed drives. Your archive directory does not need the performance and can be on less expensive disk media.

Make sure to set the permissions of the new disks correctly in Linux, or the installation fails.

Disk Performance

Knowing that disk I/O is the top issue with archive job performance, it is best to plan out your disk storage accordingly.

Storage design and disk I/O have everything to do with Retain performance as archive jobs are I/O intensive. You have the following processes writing to disk simultaneously:

The indexer to the [storage path]/index
The database (if on the Retain server)
The Retain Server to [storage path]/archive
The Retain Server to the logs directory:
- Linux: /var/log/retain-tomcat8
- Windows: [drive]:\Program Files\Beginfinite\Retain\Tomcat8\logs

With all of that disk activity, if a single drive has to handle all of it, the performance bottleneck is disk I/O. However, many modern disk systems use multiple disks (e.g., RAID 5 or RAID 10) and write the data across them. The more disks involved, the more you spread the load and the faster the overall performance. Drive type (SATA/SAS/SSD) also makes a difference, as does whether the disks are local to the server or in a SAN/NAS.

RAID Considerations

Let's say your server employs RAID 5, which offers more usable capacity than, say, RAID 10, and that the array has 4 disks. RAID 5 stores parity data equal to one disk's worth of capacity, which leaves the equivalent of 3 drives to stripe data across. If one of those drives becomes unavailable, that leaves you with 2. Striping across 2 or 3 drives doesn't lend itself to great speed, especially if the disks are lower-end SATA drives.

SAN / NAS Considerations

If on a SAN/NAS, now you are looking at the network link speed as well. You could have very fast drives, but if your link speed is 1 Gb/s, your bottleneck is going to be your link.

A 1 Gb/s network link is slower than a SATA 2 or SATA 3 connection (also known as SATA 3 Gb/s and SATA 6 Gb/s). A SATA 2 connection (which is now a pretty old standard) is 3x faster than a 1000 Mb/s (1 Gb/s) network link. With sequential reads and writes, a fast single HDD can saturate a 1 Gb/s connection but not quite a 3 Gb/s connection (SATA 2.0, or SATA 3 Gb/s). 7,200 RPM platter drives usually top out around 160-170 MB/s (or 1.28-1.36 Gb/s).
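
The unit conversion behind those figures is megabytes per second times 8 bits, divided by 1000. The sketch below compares a drive's throughput against raw link rates; it assumes decimal units and ignores protocol and encoding overhead, and the names are illustrative.

```python
# Raw line rates in Gb/s (assumption: overhead is ignored).
LINK_RATES_GBIT = {
    "1 Gb/s network": 1.0,
    "SATA 3 Gb/s": 3.0,
    "SATA 6 Gb/s": 6.0,
}


def mb_per_s_to_gbit_per_s(mb_per_s):
    """Convert megabytes per second to gigabits per second (decimal units)."""
    return mb_per_s * 8 / 1000


def saturated_links(drive_mb_per_s):
    """Names of the links that a drive with this throughput can fill."""
    rate = mb_per_s_to_gbit_per_s(drive_mb_per_s)
    return [name for name, capacity in LINK_RATES_GBIT.items() if rate >= capacity]


print(mb_per_s_to_gbit_per_s(160), mb_per_s_to_gbit_per_s(170))
print(saturated_links(160))
```

A single 160 MB/s drive already exceeds a 1 Gb/s link, which is why the link, not the drives, becomes the bottleneck on a SAN/NAS connected at that speed.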

Measuring Disk Performance

It really comes down to IOPS. Here is a very simple IOPS calculator: http://www.thecloudcalculator.com/calculators/disk-raid-and-iops.html or you can find one of your own.
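
If you would rather estimate IOPS yourself, the sketch below uses common rules of thumb. None of these formulas come from the Retain documentation: it assumes a spinning disk's IOPS is roughly 1000 divided by (average seek time plus average rotational latency, both in milliseconds) and that RAID writes cost a fixed penalty (commonly quoted as 2 for RAID 10 and 4 for RAID 5). All names and sample numbers are illustrative.

```python
def rotational_latency_ms(rpm):
    """Average rotational latency: the time for half a revolution, in ms."""
    return 0.5 * 60_000 / rpm


def single_disk_iops(rpm, avg_seek_ms):
    """Rule-of-thumb random IOPS for one spinning disk."""
    return 1000 / (avg_seek_ms + rotational_latency_ms(rpm))


def effective_array_iops(disks, iops_per_disk, write_fraction, write_penalty):
    """Rule-of-thumb array IOPS for a given read/write mix and write penalty."""
    raw = disks * iops_per_disk
    return raw / ((1 - write_fraction) + write_fraction * write_penalty)


per_disk = single_disk_iops(rpm=7200, avg_seek_ms=8.5)  # seek time is an assumed value
print(round(per_disk))
print(round(effective_array_iops(4, per_disk, write_fraction=0.7, write_penalty=4)))
```

Archive jobs are write-heavy, so the write fraction and the RAID write penalty tend to dominate the result.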

So, it really comes down to you understanding your underlying disk storage. This article just gives food for thought. If you are running Retain on a VM guest server like most customers do, then you need to also understand your VM host and VM infrastructure. Is the Retain storage viewed by the server OS running on the VM guest as "local" storage? If so, what type of disk system is holding your VM's datastore? If it is not local storage but the server is connecting to external storage, then you need to take a look at the external system's configuration.

Bottom line: Disk I/O performance is key to Retain's performance and there are several areas to investigate where the bottlenecks could be.

In addition to partition considerations, make sure that your storage is reliable. NFS mounts can be problematic, so you may want to shy away from those. NSS volumes are not supported, so do not use them.

2.2.4 RAM

The amount of memory depends on the number of active mailboxes you are archiving, the mail volume, your underlying hardware, and how your Retain system is used.

Let’s discuss the concepts and general guidelines. In most instances, you should experiment with various memory configurations until you find what works best in your environment.

Concepts

Retain runs under Tomcat, as shown at the beginning of this article, and Tomcat runs on Java. The Retain Server uses the Java "heap" for its memory and the indexer uses the OS memory as well as virtual memory (see the Virtual Memory subsection below). For this reason, you should configure Tomcat/Java with the minimum heap that lets it run acceptably for you. If logins or Retain in general seem sluggish when in the mailbox or using the web admin tool, you may need more heap. The sweet spot for most systems with a single Worker installed on the local Retain server is 8 GB for both the minimum (Xms) and the maximum (Xmx). You want to leave as much RAM as possible for the Indexer, which uses non-heap RAM.

The amount of Java heap you set depends on the total RAM on your system and the number of Workers you install in addition to the default single Worker. As we grow in customer experience with Retain 4, we adjust this article's memory recommendations accordingly.

Right now, development has suggested 1 - 2GB per additional Worker beyond the 8 GB you normally would give to the Java heap for a system with a single Worker local to the Retain server. However, we've had a customer with 110 million messages and 7 Workers local to the Retain server get away with 8 - 10 GB of RAM, but that is really pushing it. They didn't run under that configuration for more than 24 hours, so we cannot tell whether it would have been successful in the long run.
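
The heap suggestion above can be expressed as a small calculation. This is only a sketch of that rule of thumb: the function names are illustrative, the per-Worker figure is a parameter you would choose within the suggested 1 - 2GB range, and the flag string uses the standard Java -Xms and -Xmx options. How your Tomcat configuration expects those values to be supplied is not covered here.

```python
def suggested_heap_gb(total_workers, gb_per_extra_worker=1, base_heap_gb=8):
    """8 GB for a single local Worker, plus 1 - 2 GB per additional Worker."""
    extra_workers = max(total_workers - 1, 0)
    return base_heap_gb + extra_workers * gb_per_extra_worker


def jvm_heap_flags(heap_gb):
    """Standard JVM heap options with minimum and maximum set to the same value."""
    return f"-Xms{heap_gb}g -Xmx{heap_gb}g"


heap = suggested_heap_gb(total_workers=3, gb_per_extra_worker=2)
print(heap, jvm_heap_flags(heap))
```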

The installer for Retain 4.0.1 and later tunes Tomcat/Java memory based on total RAM and which Retain components are installed. See the online manual's topic, "Tomcat Memory tuning" (note: that link goes to the 4.0.1 documentation, so if the link doesn't exist in the future, go to the online manual and find that topic). Again, as we learn more from customer experience, the installer's default RAM configuration is subject to change.

If you really want the fastest search performance, load the server up with RAM, like 64GB or more. Systems with large numbers of messages (100 million or more) seem to need 64 GB of RAM or more. If you have a database system running on your Retain Server along with multiple local Workers, then those decrease the available RAM for the indexer, so you need to take that into account. The indexer wants to cache indexing data into RAM and memory access is much quicker than disk.

General Guidelines

All of this really depends on the priority you place on Retain performance. If a customer is only interested in getting data into Retain and it doesn't matter how long the archive jobs take (as long as they finish within a 24-hour timeframe) nor does the customer care how long it takes to search for messages (because they do not do it that often), then none of this matters.

The key test is how quickly Tomcat shuts down and how much memory the OS is sending to swap. If Tomcat is shutting down slowly, that's probably an indication that it has code in swap memory that it has to read back from disk in order to close out. Reserving more memory for the OS should alleviate that problem; thus, reserve a minimum of 4G for the server OS right up front. On some systems, we have had to allocate more; on others, less. So, the key is to try different configurations on your system to see what makes the difference.

Once you have subtracted the OS memory from your total memory, give 2 - 4G of RAM to the database (if the database is on the same server; otherwise, the remainder can go to Tomcat). Note that Tomcat needs a minimum of 2G.
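
The subtraction described above can be sketched like this. The function and key names are illustrative; the 4G OS reserve and 2G Tomcat minimum are the figures from this section, and the remainder is what is left for Tomcat and, outside the Java heap, the Indexer.

```python
def memory_budget_gb(total_ram_gb, os_reserve_gb=4.0, database_gb=0.0):
    """Split total RAM into OS, database, and what remains for Tomcat/Indexer."""
    remaining = total_ram_gb - os_reserve_gb - database_gb
    if remaining < 2.0:
        # Tomcat needs a minimum of 2G, so this split is not workable.
        raise ValueError("less than 2G left for Tomcat")
    return {
        "os": os_reserve_gb,
        "database": database_gb,
        "tomcat_and_indexer": remaining,
    }


# 16G server with an on-board database given 4G.
print(memory_budget_gb(16, database_gb=4))
```

Pass database_gb=0 when the database runs on a separate server.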

For small systems (1 - 250 mailboxes), 8G of RAM might deliver acceptable performance if that's all you can afford to allocate. A small Retain system can theoretically run on 4G, but performance is unacceptably low in most cases. You really should not go lower than 8G unless you are a very small business and have 0 - 50 mailboxes. You might even want to consider trying 12 to 16G and weigh the performance improvement against the cost. For some, it can make a big difference. For others, it might make no difference because the performance bottleneck is elsewhere.

For medium-sized systems (250 - 750 mailboxes), 12 - 16G of RAM should be considered.

For larger systems, 16G should be considered a minimum. Many large systems range from 24 - 48G of RAM. The more mailboxes and mail volume, the more RAM you might consider giving your Retain server. But, again, we have to emphasize that every system is unique and RAM may not be the biggest performance factor for them.
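
As a rough lookup, the tiers above can be written as a function returning a starting range to experiment within. This is a sketch only: the function name is illustrative, a system with exactly 250 mailboxes is treated as small (the ranges above overlap at that point), and the upper bounds are the "worth trying" or "many systems use" figures rather than hard limits.

```python
def ram_guideline_gb(mailboxes):
    """Return a (low, high) starting range of total RAM in G for a mailbox count."""
    if mailboxes <= 250:
        return (8, 16)    # 8G may be acceptable; 12 - 16G is worth trying
    if mailboxes <= 750:
        return (12, 16)   # medium-sized systems
    return (16, 48)       # 16G minimum; many large systems use 24 - 48G


for count in (100, 500, 5000):
    print(count, ram_guideline_gb(count))
```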

Case in point: We have a customer with 700 users that found allocating 24G of RAM made a big difference. In another case, a customer that had 1,500 users needed only 12G. We have systems with thousands of mailboxes and those systems do benefit from increased memory allocation, but their needs vary.

Tomcat Memory Configuration

Tomcat memory is manually configured. The latest version of Retain sets it to 8G by default. It is an industry best practice to set the minimum and maximum memory values to the same value.

In Linux

You set the Tomcat memory parameters in a file called j2ee found at /etc/opt/beginfinite/retain/tomcat8. See Tomcat Memory Requirements for more detail. Tomcat must be restarted after configuring it.

In Windows

You can set Tomcat parameters by running Programs | Tomcat 8.0 | Configure Tomcat. Go to the "Java" tab to set them. Note, we also recommend setting the stack size to 256k (it defaults to 160k in Windows).

Database Memory Configuration

Most organizations employing Oracle or MS SQL have someone designated as a database administrator (DBA) who typically understands memory configuration. What the DBA needs to know is that archiving speed and user mailbox browsing performance are affected by the amount of memory given to the Retain database.

Virtual Memory

If you have the available disk space, we recommend increasing the virtual memory to at least 50GB. In Linux, this is known as swap. In Windows, this is called the page file. Ideally, this swap or page file should be placed on fast storage for performance reasons.
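
On Linux, you can check the configured swap against this recommendation by reading the SwapTotal line of /proc/meminfo, which is reported in kB. A minimal sketch; the function names are illustrative, and the sample text below stands in for the real file contents.

```python
def swap_total_gb(meminfo_text):
    """Extract SwapTotal from /proc/meminfo-style text and convert kB to GB."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            kb = int(line.split()[1])
            return kb / (1024 * 1024)
    raise ValueError("SwapTotal not found")


def meets_swap_recommendation(meminfo_text, minimum_gb=50.0):
    """True when swap is at least the recommended 50GB."""
    return swap_total_gb(meminfo_text) >= minimum_gb


sample = "MemTotal:       16384000 kB\nSwapTotal:       2097152 kB\n"
print(swap_total_gb(sample), meets_swap_recommendation(sample))
```

On a live server, pass in the contents of /proc/meminfo, for example open("/proc/meminfo").read().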

2.2.5 VM Configuration

VM (Virtual Machine) NIC Settings

We have found that using VMXNET3 for the network adapter in VMs helps performance.

Virtual Machine Snapshots

We have found that VM snapshots can reduce performance of the Retain Server. Keeping the number of snapshots to a minimum is highly recommended.