5.9 Exchange Profile

A job needs an Exchange profile before it can connect to the email system.

After the Exchange Module has been configured, the Exchange Profile will be available for configuration. If an Exchange Profile is not configured, jobs cannot be run against the Exchange system.

Click “Add Profile” and provide a profile name, or select an existing profile to access the configuration tabs. All changes made on this page must be saved by selecting the “Save changes” disk icon at the top right of the page. You can move between tabs without losing new settings, but moving to another page requires either saving or abandoning the changes.

5.9.1 Core Settings

The core settings consist of an enabled/disabled option which must be enabled for any jobs based on this profile to archive anything.

5.9.2 Messaging System Deletion

For systems where the administrator wishes to have archived messages removed from the system automatically, the Messaging System Deletion option may be used. Messaging System Deletion will remove messages from a mailbox after they are archived, according to the time frame specified in the settings. The amount of time to keep messages is specified in days. The recommended setting depends on the archiving scheme in the system. For instance, if messages are to persist in the system for 30 days, then the system deletion setting should be set to 30 and enabled. A setting of 0 will remove messages from the system as soon as they are archived. Be sure to configure the system before enabling the setting in the profile.

5.9.3 Message Settings

Retain can select specific types of mail and Exchange system items to archive. The Message Settings tab provides access to those settings.

The Mailbox type specifies whether to include or exclude the available types of mailboxes. Because there can be multiple profiles and jobs, it may be advantageous to archive the Users and Room / Equipment mailboxes separately as needed and appropriate for the system.

The Item Type option specifies the different types of messages found in Exchange that can be archived, and allows the exclusion of or inclusion of the different individual types.

The Item Source option allows administrators to exclude or include messages that have not yet been sent or received, or posted.

The Message Status allows messages which have or have not been read or opened, or marked private or confidential to be archived. The different options in the drop-down menu are as shown.

5.9.4 Scope

The Scope tab dictates the date range Retain will scan in the attached archiving jobs.

Date Range to Scan

The Date Range to Scan instructs Retain to scan for, and archive, messages after, or before, a certain date. This is useful if only specific chunks or areas of mail are to be archived.

New Items: All items that have not been archived by Retain since the last time the job ran.

All Items in Mailbox: All items in the mailbox starting from 1/1/1970. Duplicates will be processed but not stored if they already exist in Retain's archive.

Number of days before job start date and newer: Only items from the relative number of days from the time the job began will be archived. E.g. messages that came into the email system less than 7 days ago.

Number of days from job start date and older: Only items before the relative number of days from the time the job began will be archived. E.g. messages that came into the email system more than 7 days ago.

Specify custom date range: Only items between two absolute dates will be dredged.

Specify custom date range relative to job start: Only items between two relative dates will be dredged. E.g. messages that came into the email system between 7 and 5 days ago.

It is recommended to archive all New items.
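The relative options above amount to simple date arithmetic against the job start time. The following is a minimal sketch of that arithmetic, not Retain's implementation: the function names are hypothetical, and treating both ends of each window as inclusive is an assumption.

```python
from datetime import datetime, timedelta

# "All Items in Mailbox" starts from 1/1/1970 according to the text above.
EPOCH = datetime(1970, 1, 1)


def days_and_newer(job_start, days):
    """Window for 'Number of days before job start date and newer'."""
    return (job_start - timedelta(days=days), job_start)


def days_and_older(job_start, days):
    """Window for 'Number of days from job start date and older'."""
    return (EPOCH, job_start - timedelta(days=days))


def relative_range(job_start, oldest_days, newest_days):
    """Window for 'Specify custom date range relative to job start'."""
    return (job_start - timedelta(days=oldest_days),
            job_start - timedelta(days=newest_days))


def in_window(delivered, window):
    """True if a delivery time falls inside the window (inclusive, assumed)."""
    start, end = window
    return start <= delivered <= end


job_start = datetime(2015, 9, 25, 12, 0)
# Messages that arrived between 7 and 5 days before the job started.
window = relative_range(job_start, oldest_days=7, newest_days=5)
print(window[0], "to", window[1])
print(in_window(datetime(2015, 9, 19, 8, 30), window))
```

Because the windows are computed from the job start time, the same profile selects a different set of absolute dates every time the job runs.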

Advance Flags

Enabling "Don't Advance Timestamp" will not update the timestamp flag. Items that are dredged will still be considered new by Retain the next time the job runs.

This is useful when troubleshooting, but is generally not used for normal jobs.

NOTE: Unlike GroupWise, Exchange does not ensure any compliance when scanning end user mailboxes; users may freely delete their email. The Item store flag does not prevent mail deletion. Only setting a rolling hold on all mailboxes, or journaling and archiving a journaling mailbox guarantees all items have been archived.

5.9.5 Miscellaneous

The Miscellaneous tab provides settings that control how messages are stored and what is archived. Whether attachments and message information such as the internet headers are archived, and how the data is stored and named (by folders, year, or year and month), determine the structure of the message store and affect its storage size.

Miscellaneous options also allow for the archiving of the Recoverable Items folder. To enable checking and archiving of Recoverable Items for compliance reasons, select the checkbox next to the option.

5.9.6 Advanced

The Advanced tab allows you to limit what is stored by Retain. Use it with caution, because every restriction opens a gap through which data can be lost. It is recommended to store everything, since storage space is inexpensive.

Advanced Criteria

If you want to be more specific as to what to dredge or not to dredge, add the criteria here. Each line will be logically AND-ed together. Think “Dredge all items where the following is true:” Criteria A AND Criteria B AND Criteria C AND etc.

You may select based on:

  • Subject

  • Sender

  • Recipient

  • Attachment Size (in bytes)

  • Attachment Name

  • Category

And whether they are equal to, not equal to, contain or do not contain the item you specify.

This gives you great flexibility and granularity. It allows you to customize dredges and retention for many different groups, or even individuals.
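The AND behaviour of the criteria lines can be pictured with a short sketch. This is an illustration only: the operator table, the dictionary-based item, and the case-sensitive matching are all assumptions, and the addresses are placeholders.

```python
# Hypothetical operator table; Retain's real matching rules (for example
# case sensitivity) are not described in the text and are assumed here.
OPERATORS = {
    "equal to": lambda value, target: value == target,
    "not equal to": lambda value, target: value != target,
    "contains": lambda value, target: target in value,
    "does not contain": lambda value, target: target not in value,
}


def matches_all(item, criteria):
    """Return True only if every (field, operator, target) line matches."""
    return all(OPERATORS[op](item[field], target)
               for field, op, target in criteria)


# Dredge all items where Criteria A AND Criteria B are true.
criteria = [
    ("Sender", "contains", "@example.com"),
    ("Subject", "does not contain", "newsletter"),
]
item = {"Sender": "jdoe@example.com", "Subject": "Quarterly report"}
print(matches_all(item, criteria))
```

Since every line must be true, each added criterion can only narrow the set of items that are dredged.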

5.9.7 Folder Scope

By default, we dredge items from all folders. You can specify one or more inclusions or exclusions.

Your choices are:

  • Dredge everything

  • Dredge only these listed folders

  • Dredge everything except these listed folders

How to specify the list of folders to dredge/exclude:

  1. Specify a System Folder (mandatory). Example: Calendar.

  2. You specify a subfolder of that folder (optional).

    Example: entering “old” would mean the folder “old” under “Calendar”.

  3. You can have multiple hierarchies under that with the / delimiter.

    Example: “old/mail” would mean the subfolder “mail” under “old” under “Calendar”.

  4. You specify if the option includes subfolder.

    Example: If you select “old” and “includes subfolder” is unchecked, only “Calendar/old” is selected. If “includes subfolder” is checked, “Calendar/old/mail” would also be selected.
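The folder rules above can be sketched as a small path check. The function name and parameters are hypothetical, and the sketch assumes that an unchecked “includes subfolder” option selects only the named folder itself.

```python
def folder_selected(folder_path, system_folder, subfolder="",
                    include_subfolders=False):
    """Check whether folder_path is covered by one folder-scope entry.

    Paths use the "/" delimiter described above, for example
    "Calendar/old/mail".
    """
    base = f"{system_folder}/{subfolder}" if subfolder else system_folder
    if folder_path == base:
        return True
    # Only descend into child folders when the option is checked.
    return include_subfolders and folder_path.startswith(base + "/")


print(folder_selected("Calendar/old", "Calendar", "old"))
print(folder_selected("Calendar/old/mail", "Calendar", "old"))
print(folder_selected("Calendar/old/mail", "Calendar", "old",
                      include_subfolders=True))
```

Matching on `base + "/"` rather than on `base` alone keeps a sibling folder such as “Calendar/older” from being picked up by an entry for “old”.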

You may now configure Schedules, Workers and Jobs.

5.9.8 Distribution Lists

You can create a distribution list in the Exchange Admin Center to manage information dissemination. Retain will query Exchange for a list of users in each distribution list. While you can also create a distribution list in Active Directory Users and Computers, those changes will not be reflected in Exchange, so Retain will not see them. Likewise, a distribution group must be renamed in Exchange, or Retain will not see the change.

Distribution lists can be hidden in Exchange. If a distribution list is hidden, Retain will not be able to see the users associated with the distribution list and will not be able to archive the distribution list. The distribution list will be marked as (hidden) in Job | Mailboxes | Distribution Lists.

Dynamic Distribution Lists cannot be seen by Retain because they only create a user list at the time the message is sent, so they are more of a filter than a list. Remember to refresh the address book if you wish to see the latest list changes.

5.9.9 Exchange Message Dredging Process Overview

How does Retain get messages from Exchange?

  1. When a job starts, the Retain Worker will query the DNS for the SCP record to the URL of the Active Directory Global Catalog Host.

  2. Then the worker queries Active Directory for the Autodiscover SCP Records and Active Directory returns the Autodiscover URLs. The URLs tell Retain where to connect to autodiscover. There are also some default autodiscover URLs that Retain uses to connect to autodiscover.

  3. Retain then uses autodiscover to connect to the Client Access Server. It is helpful to have an autodiscover SRV record on the DNS to speed up this process.

  4. Once Retain has connected to the Client Access Server (CAS), the CAS uses EWS to connect Retain to the correct Mailbox Server.

  5. Retain uses the impersonation user credentials to enter the mailbox of the user we are attempting to dredge messages from. Retain queries Exchange for messages that meet the criteria set in the job.

  6. Exchange then serves the oldest message that meets the criteria back to the Retain Worker through EWS on the CAS.

  7. The Retain Worker receives the message, opens it, and queries the Retain Server to find out whether the message body or attachments already exist.

    1. If the Retain Server determines that the message is new, then the body and attachments are stored in the archive, the header information and hash is saved in the database with links to the archive and the contents of the message are indexed.

    2. If the message already exists, the database is updated with the header data and linked to the existing data, and the existing message body or attachment is dropped by the worker and the next message is retrieved from the email system.
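The duplicate handling in step 7 can be pictured as a content-addressed store. The sketch below is a simplified model, not Retain's code: the class, its in-memory dictionaries, and the choice of SHA-256 are assumptions, since the text only says that a hash is saved.

```python
import hashlib


class ArchiveSketch:
    """Toy model of single-instance storage keyed by a content hash."""

    def __init__(self):
        self.blobs = {}    # hash -> content (stands in for the archive)
        self.headers = []  # (header, [hashes]) rows (stands in for the database)

    def store_message(self, header, body, attachments=()):
        """Store a message and return how many parts were actually new."""
        links = []
        new_parts = 0
        for part in (body, *attachments):
            digest = hashlib.sha256(part).hexdigest()
            if digest not in self.blobs:
                # New content: keep it in the archive.
                self.blobs[digest] = part
                new_parts += 1
            # New or existing, the header row links to the stored content.
            links.append(digest)
        self.headers.append((header, links))
        return new_parts


archive = ArchiveSketch()
print(archive.store_message({"to": "jdoe"}, b"hello", [b"report.pdf bytes"]))
print(archive.store_message({"to": "asmith"}, b"hello", [b"report.pdf bytes"]))
print(len(archive.blobs), len(archive.headers))
```

The second call stores nothing new but still adds a header row, which mirrors how an existing message is linked to the data already in the archive.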

Troubleshooting Exchange Performance

In general, we have found that acceptable throughput is in the range of 3-5 messages per second. In well-designed systems with sufficient hardware resources we have seen throughput above 10 messages per second. There is definitely an issue if the throughput is less than 3 messages per second, and we have seen instances of less than 0.1. The first place to look is the worker log.
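As a quick check against these figures, throughput can be computed from a job's message count and run time. The helper names below are hypothetical, and the sample job figures are invented for illustration.

```python
def throughput(messages, elapsed_seconds):
    """Messages archived per second over the whole job."""
    return messages / elapsed_seconds


def needs_investigation(rate, minimum=3.0):
    """True when the rate falls below the 3 messages/second guideline."""
    return rate < minimum


# Illustrative job: 5400 messages archived in one hour.
rate = throughput(messages=5400, elapsed_seconds=3600)
print(rate, needs_investigation(rate))
```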

Mailbox Delays

Check how long it takes Retain to log into each mailbox. The worker log records when Retain starts entering a mailbox and when it discovers the endpoint, which indicates that it has entered the mailbox.

Search the log for lines containing:

enterMailbox
Discovered endpoint

Compare the timestamps of these two lines. The difference should be less than 2 seconds. If it is significantly longer, the most likely cause is that DNS is not properly serving autodiscover.

2015-09-25 12:00:07,256 TRACE [RTWQuartzScheduler_Archive_Worker-1] com.gwava.caapi.MailboxArchivingStats: enterMailbox: JDoe@RETAIN.GWAVAUTAH
2015-09-25 12:02:14,177 DEBUG [RTWQuartzScheduler_Archive_Worker-1] com.gwava.ews.archiveimpl.process.ExchangeUser: Discovered endpoint: https://ad.test.sys/ews/exchange.asmx

In this example the gap is over two minutes, which indicates that there is an issue with how autodiscover is configured in the DNS. It may need an SCP or SRV record.
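For a long worker log, the comparison can be scripted. The sketch below assumes that every line starts with a timestamp in the format shown above and that lines from a single worker thread are processed in order. The function names are hypothetical.

```python
from datetime import datetime

LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def log_time(line):
    """Parse the leading timestamp, e.g. '2015-09-25 12:00:07,256'."""
    return datetime.strptime(line[:23], LOG_TIME_FORMAT)


def mailbox_login_delays(lines):
    """Seconds between each enterMailbox and the next Discovered endpoint."""
    delays = []
    entered = None
    for line in lines:
        if "enterMailbox" in line:
            entered = log_time(line)
        elif "Discovered endpoint" in line and entered is not None:
            delays.append((log_time(line) - entered).total_seconds())
            entered = None
    return delays


sample = [
    "2015-09-25 12:00:07,256 TRACE [RTWQuartzScheduler_Archive_Worker-1] "
    "com.gwava.caapi.MailboxArchivingStats: enterMailbox: JDoe@RETAIN.GWAVAUTAH",
    "2015-09-25 12:02:14,177 DEBUG [RTWQuartzScheduler_Archive_Worker-1] "
    "com.gwava.ews.archiveimpl.process.ExchangeUser: Discovered endpoint: "
    "https://ad.test.sys/ews/exchange.asmx",
]
for delay in mailbox_login_delays(sample):
    print(f"{delay:.3f} seconds", "(slow)" if delay > 2 else "")
```

If several worker threads write to the same log, filter the lines by thread name before pairing them, otherwise entries from different mailboxes can be matched to each other.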

Message Delays

Also search for connection failures and retries. The retry delay increases with each failure, and the delays can add up to 4 minutes.

Search the log for lines containing:

Software caused connection abort: recv failed
EWS request failed: null. Will retry after

2015-07-22 00:25:25,056 TRACE [Thread-1341102] com.gwava.ews.RetainExchangeWebserviceFactory: retry, exception :
javax.xml.ws.WebServiceException: java.net.SocketException: Software caused connection abort: recv failed
at com.sun.xml.ws.transport.http.client.HttpClientTransport.readResponseCodeAndMessage(Unknown Source)
...
at com.gwava.ews.archiveimpl.process.CursorFetchThread.run(CursorFetchThread.java:1334)
Caused by: java.net.SocketException: Software caused connection abort: recv failed
at java.net.SocketInputStream.socketRead0(Native Method)
...
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:318)
... 27 more
2015-07-22 00:25:25,056 DEBUG [Thread-1341102] com.gwava.ews.RetainExchangeWebserviceFactory: EWS request failed: null. Will retry after 2 seconds

Retain will retry a few times with longer delays until it aborts. Here the connection to the Exchange server is lost while Retain is already in a mailbox. This can indicate a problem with a message attachment, or that the web server on the Exchange or CAS server is unable to serve the item at this time. Go to the message in Outlook or OWA and see if it can be accessed.

If the message can be accessed successfully, export it as a .pst and use the PST Importer to bring it into Retain.

If the message cannot be accessed successfully, it will have to be deleted.
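To gauge how much time a job loses to these retries, the delays reported in the worker log can be added up. This is a minimal sketch that assumes the retry line always has the form shown in the log excerpt above. The function name is hypothetical.

```python
import re

# Matches lines such as:
#   ... EWS request failed: null. Will retry after 2 seconds
RETRY_LINE = re.compile(r"EWS request failed: .*Will retry after (\d+) seconds")


def retry_summary(lines):
    """Count retry lines and total the seconds spent waiting."""
    delays = [int(m.group(1)) for m in map(RETRY_LINE.search, lines) if m]
    return {"retries": len(delays), "seconds_waiting": sum(delays)}


sample = [
    "2015-07-22 00:25:25,056 DEBUG [Thread-1341102] "
    "com.gwava.ews.RetainExchangeWebserviceFactory: "
    "EWS request failed: null. Will retry after 2 seconds",
]
print(retry_summary(sample))
```

Running this over the log of a slow job shows whether retries account for a meaningful share of the run time or whether the delay lies elsewhere.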

Exchange Health

You may also want to check the health of the Exchange server itself.

Performance Monitor

The first thing to check is the performance of the server. Open Performance Monitor and see whether CPU, memory, disk, or network utilization is above 80%. If any of them is consistently high, use the various server health, monitoring, and performance cmdlets to pinpoint the issue.

Queues

Also check the queues, which are how Exchange handles mail. You can see them in Exchange Toolbox/Queue Viewer. The number of messages in the queues should be low. If a queue holds hundreds or thousands of messages that are not being cleared, it may have a stuck message that needs to be cleared.

You can also use the Exchange Management Shell (EMS) to check the status of the queues.

Get-Queue

Mailboxes

Also check the mailboxes. Performance can degrade if a mailbox has too many messages (~100k). The number of messages is more important than the size of the messages. For large systems you should redirect the output to a file, since the output of this command can exceed the EMS buffer.

Get-Mailbox | Get-MailboxStatistics > c:\mailboxstat.txt

If there is a specific mailbox with issues you may need to repair the mailbox.

Server Health

You can get a quick overview of an Exchange server's health by running this EMS cmdlet:

Get-ServerHealth -Identity server1 | Sort-Object AlertValue | ft Name, AlertValue

Exchange Throttling Policy and Bandwidth/Performance (2013)

Microsoft Exchange 2013 uses client throttling policies by default to track bandwidth for each Microsoft Exchange user and enforce bandwidth limits as necessary. Throttling policies should be turned off for the Retain Service Account, because they can affect the performance of Retain for Exchange when accessing mailboxes with a large number of folders and mail items.

  1. On a computer that hosts the Microsoft Exchange Management Shell, open the Microsoft Exchange Management Shell.

  2. Type these commands:

    1. New-ThrottlingPolicy [give it a policy name of your choosing]

    2. Set-ThrottlingPolicy [policy name chosen above] -RCAMaxConcurrency Unlimited -EWSMaxConcurrency Unlimited -EWSMaxSubscriptions Unlimited -CPAMaxConcurrency Unlimited -EwsCutoffBalance Unlimited -EwsMaxBurst Unlimited -EwsRechargeRate Unlimited

    3. Set-Mailbox [Retain impersonation user account] -ThrottlingPolicy [policy name chosen above]

  3. To check the policy, run the command: Get-ThrottlingPolicy -Identity [policy name chosen above] | Format-List

5.9.10 Exchange Throttling Policy and Bandwidth/Performance (2010)

The error indicates that either you have a throttling policy applied or the Exchange server is busy. Microsoft Exchange 2010 uses client throttling policies by default to track bandwidth for each Microsoft Exchange user and enforce bandwidth limits as necessary. Throttling policies should be turned off for the Retain Service Account, because they can affect the performance of Retain for Exchange when accessing mailboxes with a large number of folders and mail items.

  1. On a computer that hosts the Microsoft Exchange Management Shell, open the Microsoft Exchange Management Shell. Find out the default Throttling Policy: Get-ThrottlingPolicy

  2. Type these commands:

    1. New-ThrottlingPolicy [give it a policy name of your choosing] -RCAMaxConcurrency $null -RCAPercentTimeInAD $null -RCAPercentTimeInCAS $null -RCAPercentTimeInMailboxRPC $null -EWSMaxConcurrency $null -EWSPercentTimeInAD $null -EWSPercentTimeInCAS $null -EWSPercentTimeInMailboxRPC $null -EWSMaxSubscriptions $null -EWSFastSearchTimeoutInSeconds $null -EWSFindCountLimit $null

    2. Set-Mailbox [Retain impersonation user account] -ThrottlingPolicy [policy name chosen above]

  3. Check the Throttling Policy for the "retain" impersonation user: Get-ThrottlingPolicy -Identity [policy name chosen above] | Format-List

5.9.11 Exchange Journaling Mailbox

Using an Exchange Journaling Mailbox is not recommended, but there are some situations where it is an option.

A Microsoft technician recommends at least one journaling mailbox per mail server. Exchange can only effectively support mailboxes under 5-10 GB, and it begins to suffer from performance issues when the Inbox exceeds 2500-5000 messages. http://blogs.technet.com/b/exchange/archive/2005/03/14/395229.aspx

This means that, once you enable a journaling mailbox, you should begin archiving its contents and using the Retain option to delete the items from the mailbox once archived. However, if there are delays in getting those journaling mailboxes archived, you should watch the size. If a mailbox reaches 5 GB, turn it off and re-route email to another journaling mailbox until you get all of them archived and emptied out.

  1. Set up a journal mailbox for each mailbox database.

  2. Journaling jobs should have their own Profile with the Scope set to "All messages (ignore date)" and Duplicate Check set to "Try to publish all message (SLOW)" to gather all messages from the beginning of the mailbox. This profile can be used for all journaling mailbox jobs.

  3. Under Job, "Enable Journaling" and "Delete archived items from journal" must be enabled (checked) so that the journaling mailbox is cleared during the job, and choose the journaling mailbox you want archived. Create a separate job for each journaling mailbox.

Important note: As Retain archives the journal mailbox it creates a list of messages to be deleted but will only send the delete request when it exits the mailbox. If the job fails before it exits then the messages won't be deleted. Limiting the scope of the job to allow Retain to finish the job successfully will ensure that the messages are deleted.

How to Transition from Journaling to Rolling In-Place Hold for Exchange Archiving

There are changes you will have to make in Exchange and Retain to make this transition go as smoothly as possible.

Mandatory Exchange Tasks:

  1. Enable Rolling In-Place Hold. You can test that the hold is properly enabled by going into Outlook or OWA and deleting an item, going into the recoverable items dialog and attempting to purge the item. It should end up in the Purges folder which the user cannot see but Retain can. So you will need to run an archive job against it to see it within Search Messages in Retain. In Exchange 2010 you would want to enable Single Item Recovery which allows you to set a rolling duration for holding deleted items.

    Get-Mailbox | Set-Mailbox -SingleItemRecoveryEnabled $true -RetainDeletedItemsFor 90

  2. Disable Journal Rule in Exchange. Once the rolling in-place hold is enabled, you can disable the journal rule in Exchange. https://technet.microsoft.com/en-us/library/bb124264%28v=exchg.141%29.aspx

Mandatory Retain Tasks:

  1. Keep the existing Retain journaling job and allow that to run until the journal mailbox is empty. If you are currently unable to archive your existing journal mailbox(es) because they have become too large for Exchange to manage, there are PowerShell scripts for transferring mail to another mailbox.

  2. Create New Profile. The primary option to enable is Profile/Miscellaneous/"Include user's recoverable items". With this option enabled, Retain will dredge each user's recoverable items folder and all items and folders inside it, except the logs found in the Audits subfolder.

  3. Create New Job(s). If you have multiple Exchange databases, we recommend one job per mailbox database and one worker per job so they can run in parallel. (Retain Technical Support has a PowerShell 4.0 script to make this easier.)

5.9.12 Large Attachments and/or Messages Cannot Be Archived

Symptoms you may notice when experiencing problems with default IIS limitations:

  • Retention is turned on in GroupWise and messages up to a certain date can't be deleted.

  • Errors on retrieving attachments show in the Worker log.

  • Can see messages that don't have all the attachments in Retain.

  • You may also have difficulty getting larger exports through the web interface (exports larger than 28.6 MB).

  • When logging is set to diagnostic for the Worker you can see errors like this:

15:15:15,668 RetainServerCommunication - Attempt to connect, but Server returned HTTP status (404): Not found
(this line is typically repeated several times over the course of 5 minutes)
15:15:15,668 RetainServerCommunication - Giving up...too many retries!
15:15:15,668 ArchiveAttachment - Send a nice healthy blob:Archive: ERROR: Fatal Error Result=AddedEMails: 0, emailID=null, parentID=null
15:15:15,691 JobUtilities - HandleArchivingException

*Note: IIS is not supported by GWAVA. These are suggested methods for allowing Retain to archive large emails through IIS. For further information visit the Microsoft support pages: http://www.iis.net/configreference/system.webserver/security/requestfiltering/requestlimits Some other useful information can also be found on the IIS forums: http://forums.iis.net/t/1066272.aspx

This may be less of an issue in Retain installations that were created with 3.x and newer. The RetainWorker now communicates, by default, directly with the RetainServer on port 48080, thereby bypassing IIS. If you'd like to change this for an older installation, change the connection address of the worker. See the manual (look up "Worker Configuration") for your particular installation for more information. This can still be an issue on your Exchange server when Retain tries to collect from it, if there are messages or attachments larger than what IIS is set to allow through. This is a setting on the Exchange side that would need to be changed. The default is 30000000 bytes.

For getting exports out of Retain you can also choose to bypass IIS and use http://(RetainIP):48080/RetainServer. IIS integration is more of a convenience to point users at Retain so that you don't have to deal with port information in a URL and other advantages that this can provide.

IIS, by default, limits the amount of data that can be imported by Retain. You can remove, or at least mitigate, this limitation by changing 4 settings. This example assumes you'd like to archive files up to 1000000000 bytes (about 953 MiB, or 0.93 GiB).

  1. You'll need to increase the limit on how much data the RetainWorker and RetainServer can push/pull through IIS. You can do that using the following command*:

    1. ** %windir%\system32\inetsrv\appcmd set config "Default Web Site/RetainWorker" -section:requestFiltering -requestLimits.maxAllowedContentLength:1000000000

    2. %windir%\system32\inetsrv\appcmd set config "Default Web Site/RetainServer" -section:requestFiltering -requestLimits.maxAllowedContentLength:1000000000

    3. Current testing indicates that you'll also have to do a blanket statement: %windir%\system32\inetsrv\appcmd set config -section:requestFiltering -requestLimits.maxAllowedContentLength:1000000000

      *Note: the number at the end of the command is the size you'd like to have as the max in bytes.

      **Note: the "Default Web Site/RetainWorker" piece may vary depending on your server setup. See the picture in the next section.

  2. If you don't like the command line, you can also change it through the IIS manager.

    1. Bring up the IIS manager and highlight "Default Web Site"

    2. Double click on "Configuration Editor" as shown in the figure above.

    3. Use the "Section" area drop down box to go to "requestFiltering" as shown in the following figure.

    4. Expand the "requestLimits" section. Change the "maxAllowedContentLength" shown in the next figure to the size (in bytes) you would like to be able to pass through.

    5. Repeat for both RetainServer and RetainWorker.

  3. You may also need to change the timeouts in IIS. To do this:

    1. Open the IIS manager.

    2. Highlight "Default Web Site".

    3. Click on "Limits"

    4. Change "Connection time-out (in seconds):" to the desired time.
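The maxAllowedContentLength values used in this section are given in bytes, which makes them easy to misread. The sketch below converts between bytes and MiB (1 MiB = 1048576 bytes). The helper names are hypothetical.

```python
MIB = 1024 ** 2  # bytes per mebibyte


def bytes_to_mib(size_bytes):
    """Convert a byte count to MiB."""
    return size_bytes / MIB


def mib_to_bytes(size_mib):
    """Convert a size in MiB to the byte value expected by IIS."""
    return int(size_mib * MIB)


# The default limit and the example limit used in this section.
for label, size in (("IIS default", 30000000), ("example limit", 1000000000)):
    print(f"{label}: {size} bytes = {bytes_to_mib(size):.1f} MiB")

# Byte value to enter for a limit of 500 MiB.
print(mib_to_bytes(500))
```

The default of 30000000 bytes works out to about 28.6 MiB, which matches the export size limit mentioned in the symptoms, and 1000000000 bytes is about 953.7 MiB.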

5.9.13 Moving Users to a New Exchange Domain

If you need to move your users to a new Exchange domain without changing their email addresses (for example from user@organization.local to user@organization.org), you will need to use the moveMailboxes tool to keep the users associated with their existing archive; otherwise, a new archive will be created for all users.

Prerequisites

  1. The new on-premises Exchange system cannot have been archived by Retain before.

  2. The users continue to use the same email address, though the UPN may be different.

Procedure

  1. In the Retain Web Console, go to the Exchange module and select configure.

  2. Under the Impersonation tab, enter the new impersonation user credentials.

  3. Under the Exchange Forest tab, reconfigure the settings to the new Exchange system.

  4. Click the Test Connection button to confirm the connection can be made.

  5. Save your changes.

  6. Return to the Module Configuration page and Refresh the Address Book by clicking the Refresh Address Book button. Wait for the refresh to complete.

  7. Open the RetainServer log and tail the log to watch progress of the tool. On Windows a utility program like baretail is useful for this.

  8. Open a new tab and enter the URL: http://<your Retain Server Address>/RetainServer/Util/moveMailboxes.jsp. The page will be blank.

  9. In the RetainServer log when the migration is complete, you will see the message "MoveMailboxes: mailboxes moved: [amount of mailboxes]. Process Complete."

  10. Re-index all messages. In the Retain Web Console, go to Server Configuration | Index and press the Re-index All Messages button. This may take significant time in larger systems and search will be limited as the re-index is going on.

  11. Once re-indexing is complete, archiving can resume normally.
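Instead of watching the log by eye, the completion message from step 9 can be detected with a small script. The sketch assumes that the log line contains the message text exactly as quoted in that step, with a number in place of the bracketed amount. The function name and the sample line are hypothetical.

```python
import re

# Pattern built from the completion message quoted in the procedure above.
DONE = re.compile(r"MoveMailboxes: mailboxes moved: (\d+)\. Process Complete\.")


def moved_mailbox_count(lines):
    """Return the number of mailboxes moved, or None if not finished yet."""
    for line in lines:
        match = DONE.search(line)
        if match:
            return int(match.group(1))
    return None


# Illustrative line only; the real log prefix may differ.
sample = ["MoveMailboxes: mailboxes moved: 125. Process Complete."]
print(moved_mailbox_count(sample))
```

A result of None means the completion message has not been written yet, so the re-index in step 10 should not be started.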

When the users log into Retain, they will see two folders: one with the mail from the original Exchange system and the other with mail from the new system. The folders have different system IDs, so they cannot be combined seamlessly.