2.1 Basic PlateSpin Forge Terms and Concepts

To help you understand the PlateSpin Forge functionality, the following section describes terms and concepts you will see while using the appliance.

2.1.1 Role-Based Access

Role-based access in Forge allows you to create security groups, assign workloads to those groups and then use one of three types of roles to determine who can do what. You first add users to the Host Appliance and designate them as a Workload Protection Administrator, Workload Protection Power User or Workload Protection Operator, each with varying levels of access. Then, in the Forge Management VM, you create the Security Groups, add workloads to those groups and then add the users you want to have access to those workloads, in whatever capacity, to those groups.

Host Appliance Local Administrators and users you specifically add to the Workload Protection Administrators group are automatically added to every security group.

For more information, see Managing Role-Based Access.

Table 2-1 Role-Based Access Matrix

 

Action                     Administrator   Power User   Operator
Add Workload               Allowed         Allowed      Denied
Remove Workload            Allowed         Allowed      Denied
Configure Protection       Allowed         Allowed      Denied
Prepare Replication        Allowed         Allowed      Denied
Run (Full) Replication     Allowed         Allowed      Allowed
Run Incremental            Allowed         Allowed      Allowed
Pause/Resume Schedule      Allowed         Allowed      Allowed
Test Failover              Allowed         Allowed      Allowed
Failover                   Allowed         Allowed      Allowed
Cancel Failover            Allowed         Allowed      Allowed
Abort                      Allowed         Allowed      Allowed
Dismiss (Task)             Allowed         Allowed      Allowed
Settings (All)             Allowed         Denied       Denied
Run Reports/Diagnostics    Allowed         Allowed      Allowed
Failback                   Allowed         Denied       Denied
Reprotect                  Allowed         Allowed      Denied
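The access matrix can also be read as a simple lookup. The following is a minimal sketch only, not part of the product: the names Role, ACTIONS, and is_allowed are hypothetical and exist purely to illustrate how Table 2-1 maps a role and an action to Allowed or Denied.

```python
from enum import Enum


class Role(Enum):
    """The three roles named in Table 2-1 (hypothetical Python names)."""
    ADMINISTRATOR = "Administrator"
    POWER_USER = "Power User"
    OPERATOR = "Operator"


# Every action listed in Table 2-1, in table order.
ACTIONS = [
    "Add Workload", "Remove Workload", "Configure Protection",
    "Prepare Replication", "Run (Full) Replication", "Run Incremental",
    "Pause/Resume Schedule", "Test Failover", "Failover", "Cancel Failover",
    "Abort", "Dismiss (Task)", "Settings (All)", "Run Reports/Diagnostics",
    "Failback", "Reprotect",
]

# Actions marked "Denied" for each role; anything not listed is "Allowed".
DENIED = {
    Role.ADMINISTRATOR: set(),
    Role.POWER_USER: {"Settings (All)", "Failback"},
    Role.OPERATOR: {
        "Add Workload", "Remove Workload", "Configure Protection",
        "Prepare Replication", "Settings (All)", "Failback", "Reprotect",
    },
}


def is_allowed(role: Role, action: str) -> bool:
    """Return True if Table 2-1 marks the action as Allowed for the role."""
    if action not in ACTIONS:
        raise KeyError(f"Unknown action: {action}")
    return action not in DENIED[role]


print(is_allowed(Role.OPERATOR, "Failover"))    # True
print(is_allowed(Role.POWER_USER, "Failback"))  # False
```

Storing only the denied actions keeps the sketch short, because most cells in the matrix are Allowed.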

2.1.2 Protection Tiers

Workloads belong to Protection Tiers, which define when and how often replications occur. Protection Tiers also determine how often a protected workload is checked for failure, how many detection attempts to try before triggering intervention, and how many recovery points to keep. Protection Tiers enable a user to create defined plans or templates that are appropriate to multiple workloads instead of configuring each workload’s individual plan.

Schedule full and incremental replications as required by business needs—hourly, daily, weekly, or monthly. Select from predefined Protection Tiers, or create a user-defined Protection Tier.

For more information, see Creating a Protection Tier.

2.1.3 Failover

PlateSpin Forge constantly monitors protected workloads. If no response is detected within a specified period of time, the user is notified that the workload has failed. If this happens, the user can choose to have the recovery workload that is running on the PlateSpin Forge appliance rapidly take its place, resulting in minimal interruption in uptime. The workload can then be restored to either the same or a different host by using the built-in failback function.

Failover is the process of bringing the recovery workload online to replace the failed protected workload. After the first full replication runs, the recovery workload can run on the PlateSpin Forge Appliance Host. This is a temporary solution until Failback can be performed. Recovery points can also be selected during Failover, avoiding corrupted replications.

NOTE: Workloads that have failed over and are running on Forge use as much RAM as the protected workloads they replace. Because PlateSpin Forge has a finite amount of RAM (16 or 32 GB, depending on the model), you might need to temporarily suspend other appliance operations to accommodate the workloads. This could include scheduled replications.

For more information, see Recovery Points and Planning for Failure.

2.1.4 Test Failover

One of the major advantages of PlateSpin Forge is the availability of safe and quick disaster recovery testing. Forge’s unique Test Failover process allows users to test a recovery workload in a safe and isolated configuration that does not conflict with the protected workload or production servers.

The Test Failover process is also unique because it is quick and conducive to frequent testing. Unlike some disaster recovery testing, which can take days to complete, Test Failover provides a clear picture of success or failure in a fraction of that time. PlateSpin Forge reports can be consulted to find out exactly how long a particular test took. Recovery points can also be selected during Test Failover, avoiding corrupted replications.

Use Test Failover to make sure that the recovery workload runs smoothly and fulfills all requirements. When the virtual machine containing the recovery workload is shut down after the test, no changes persist.

In addition to being a safe method of testing, Test Failover is also quick, with a typical test taking less than an hour.

For more information, see Recovery Points and Planning for Failure.

2.1.5 Prepare for Failover

In most cases, you use Prepare for Failover immediately upon receiving notification that a protected workload has failed. The recovery workload starts running on the PlateSpin Forge Appliance Host, but with its network cards mapped to an internal network to keep it isolated from the main (external) network. Aside from this, Prepare for Failover does all the work of a failover. Recovery points can also be selected during Prepare for Failover, avoiding corrupted replications.

After running Prepare for Failover, check that the problem with the protected workload is not trivial, such as the accidental disconnection of a power cord or network cable, or a temporary network outage. If the problem with the protected workload is not easily rectified, perform a Failover to quickly bring the recovery workload onto the main network, thereby replacing the protected workload. If it is determined that the protected workload can be easily brought back online, select Cancel Failover.

This method allows the failover process to start, thus saving valuable time if a failover turns out to be necessary. It also avoids the inconvenience of re-protecting a workload after a premature, unnecessary failover.

For more information, see Recovery Points and Planning for Failure.

2.1.6 Failback

Failover is a temporary solution. The PlateSpin Forge Appliance Host has limited resources and is not intended to host failed-over workloads indefinitely. The recovery workload runs on the PlateSpin Forge Appliance Host to maintain the workload’s function until new server infrastructure is available.

When new hardware is acquired to permanently host the recovery workload, use the PlateSpin Forge failback feature to perform a V2P or V2V conversion. This transfers the recovery workload from the PlateSpin Forge Appliance Host to the new server.

For more information, see Preparing the Failback.

2.1.7 Recovery Points

Recovery points are snapshots of protected workloads, providing even more protection and data integrity than synchronized workloads alone. When recovery points are enabled, a snapshot is taken during every replication. You can keep up to 32 recovery points for each workload. When the maximum is reached, the oldest point is replaced by the newest one, providing a pool of recovery opportunities.
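The replacement behavior described above (the newest point displaces the oldest once the limit is reached) resembles a fixed-size queue. The following sketch is an illustration only and does not reflect how Forge stores snapshots internally; make_recovery_pool and the "replication-N" labels are hypothetical.

```python
from collections import deque

# Upper limit stated in this section: up to 32 recovery points per workload.
MAX_RECOVERY_POINTS = 32


def make_recovery_pool(limit: int = MAX_RECOVERY_POINTS) -> deque:
    """Create a pool that silently drops its oldest entry when full."""
    if not 1 <= limit <= MAX_RECOVERY_POINTS:
        raise ValueError(f"limit must be between 1 and {MAX_RECOVERY_POINTS}")
    return deque(maxlen=limit)


# Keep 3 points and run 5 replications; only the 3 newest remain.
pool = make_recovery_pool(3)
for n in range(1, 6):
    pool.append(f"replication-{n}")

print(list(pool))  # ['replication-3', 'replication-4', 'replication-5']
```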

During a Failover, Test Failover, or even a Prepare for Failover, a saved recovery point can be used to restore a failed workload. If a virus or other corruption is replicated from the source, you can move back in time through the replications to find an unaffected copy of the workload. Without recovery points, the workload would be unrecoverable.

For more information, see Planning for Failure.

2.1.8 Consolidated Workload Protection and Recovery

Protect data center workloads: You can recover multiple physical and virtual protected workloads by using a single PlateSpin Forge appliance. Workloads can be protected across geographically dispersed sites and then rapidly recovered after server downtime or a site disaster. With PlateSpin Forge, you can consolidate recovery platforms to protect workloads without investing in costly duplicate hardware or redundant operating system licenses. In addition to standard file-based or VSS-based replication, high-speed block-level replication options let you protect highly transactional workloads, such as e-mail and database servers. Incremental transfers ensure that only changes to source data files are replicated to the PlateSpin Forge remote recovery environment, thereby minimizing WAN usage while meeting recovery point objectives (RPO) with minimal data loss.

Test the integrity of disaster recovery plans and processes: You need to ensure that recovery plans are sound before a disaster occurs, including testing them at least every six to twelve months. Test Time Objective (TTO), or the speed and ease with which a recovery plan can be tested, is emerging as a key measure of recovery effectiveness. One-click test recovery lets users test the integrity of the replication and recovery plan. To perform a test failover, PlateSpin Forge takes a snapshot of the recovery workload and powers it on within an isolated private internal network. This lets users validate the recovery plan, and related business services, without disrupting the production workload. After validation, PlateSpin Forge drops changes that have occurred on the recovery workload snapshot during the testing process and then it resumes workload replication.

Monitor and report on workload replication and recovery functions: Forge’s Web-based interface provides a dashboard that lets you view the status of protection plans as well as manage, monitor, and report on workload protection. If there is production server downtime or a disaster, administrators are automatically alerted via e-mail. They can then take appropriate action simply by clicking a link within the notification e-mail from a PC or a mobile device. Administrators can use Forge’s reporting features to compare actual and target recovery time and recovery point objectives (RTO and RPO), as well as visualize replication windows and data transfer rates. Protection logs demonstrate successful replication and recovery tests, providing the audit capabilities required to meet defined service level agreements or regulatory compliance.

Recover workloads using failover and flexible restore options: PlateSpin Forge allows you to power on recovery workloads with a single click and restore to the same or different hardware. In the event of a production server outage or disaster, administrators can recover protected workloads with a single-click failover that reconnects sessions and then allows PlateSpin Forge to take over the workload. The workload can continue to run as normal on the appliance while the production environment is restored. When the production environment is brought back online, flexible options allow for restoring workloads. If the original production server is repaired and the hardware is intact, users can move the workload from the virtual recovery environment back to the original platform by performing a virtual-to-physical (V2P) workload transfer. If the original hardware cannot be repaired, users can restore the workload with a V2P transfer to new hardware. Workloads can also be easily moved to a production virtual environment (V2V).

2.1.9 Disaster Recovery

PlateSpin Forge consists of a hypervisor server that hosts the PlateSpin Forge virtual machine. PlateSpin Forge provides disaster recovery by replicating workloads targeted for protection, and storing a virtual machine copy. Initially, PlateSpin Forge performs a full replication of everything in the workload. Subsequent incremental replications copy to the stored virtual machine any files or blocks that have changed since the last replication. These incremental replications keep the copy synchronized with the current state of the protected workload. Users can specify how often and when to perform each type of replication.

2.1.10 Supported Transfer Methods

Forge enables you to select different methods for transferring workload data from the protected source to the Forge appliance. For a list of workload types and conversions arranged by supported transfer mode, see Knowledge Base Article Q20002.

File-Based

The File-Based Transfer method copies data and replicates changes at the file level. During File-Based Transfer, Forge transfers all files from the protected workloads while monitoring them for changes. When the transfer is complete, files that have changed during the transfer are resent.

If Microsoft* SQL Server* or Microsoft Exchange Server* services are present, it is recommended that you stop them. You can configure the replication to stop these services when using the File-Based Transfer method (see Replication Settings). However, if there are other tools present that manage the backup of these databases, consider leaving services running during the transfer. When the transfer completes, verify that the copied database is current.

If file system changes are constant, data transfer is stopped after the tenth pass and a replication progress warning is displayed.

File-Based Transfer is appropriate for moderately active Windows-based workloads using NTFS.

VSS File-Based

This Transfer method transfers data at the file level and uses the Microsoft Volume Shadow Copy Service* (VSS) feature, also known as Shadow Copy, for Windows workloads (Windows 2003 SP1 and above) with applications and services that support VSS. The VSS File-Based Transfer method offers an exact point-in-time copy of the source workload.

During VSS File-Based Transfer, Forge takes a VSS snapshot of the protected workload and transfers the data file-by-file. When the initial transfer is complete, the target is powered off. It is powered on again during the next scheduled incremental replication.

Use the VSS File-Based Transfer method to reduce service downtime during Windows workload relocations. Database servers, mail servers, and application servers that would otherwise require a temporary service stoppage can be protected by using this Transfer method. This method is also recommended for replications in networks with high latency. Because this is a point-in-time solution, data does not need to be retransmitted as it does with other methods.

WARNING: When using the file-based transfer method with VSS, encrypted files are not included in replications. Encrypted files show up as skipped in the job report, and the replication shows as “completed with warnings”. This is not an issue with block-based transfers.

Block-Based

The Block-Based Transfer method copies data and replicates changes at the block level instead of replicating an entire file. During data transfer, changes on the protected volumes are monitored and continuously retransferred at the block level until full synchronization is achieved. Because the Block-Based Transfer method transmits only changed blocks rather than entire files, it transfers significantly less data.

Use the Block-Based Transfer method when you want to reduce the service downtime during Windows workload replication. Using the Block-Based Transfer method, you can replicate critical database servers, mail servers, and application servers with large databases (more than 5 GB) and with high disk activity. In addition, the Block-Based Transfer method is recommended for networks with high latency because the size of block-level changes is significantly smaller than an entire file (when file-level changes are detected during file-level data transfer, the changed files are transferred in their entirety).

If your protected workload is running Microsoft Exchange Server 2000 or 2003, or Microsoft SQL Server 2000, the Windows services of these applications are automatically detected. You can configure the replication to stop these services when using the Block-Based Transfer method (see Replication Settings). However, if there are other tools present that manage the backup of these databases, consider leaving services running during the transfer. When the transfer completes, verify that the copied database is current.

Block-Based Transfer is handled by the Block-Based Transfer Component, which is automatically installed on the protected workload. Because it operates in kernel mode, it requires the protected workload to reboot to initialize.

VSS Block-Based

This Transfer method transfers data at the block level and uses the Microsoft Volume Shadow Copy Service (VSS) feature, also known as Shadow Copy, for Windows workloads (Windows 2003 SP1 and above) with applications and services that support VSS. The VSS Block-Based Transfer method offers an exact point-in-time copy of the source workload.

During VSS Block-Based Transfer, Forge takes a VSS snapshot of the protected workloads and transfers the data block-by-block. When the initial transfer is complete, the target is powered off. It is powered on again during the next scheduled incremental replication.

Use the VSS Block-Based Transfer method to eliminate service downtime during Windows workload relocations. Database servers, mail servers, and application servers that would otherwise require a temporary service stoppage can be protected by using this Transfer method. This method is also recommended for replications in networks with high latency. Because this is a point-in-time solution, data does not need to be retransmitted as it does with other methods.

2.1.11 Fine-Tuning Data Transfer Performance

You can fine-tune data transfer during replication for optimum performance over your network. The specifics of functionality and configuration procedures depend on the data transfer method selected for a particular job. See Supported Transfer Methods.

Fine-Tuning File-Level and VSS-Aware Block-Level Transfer Performance

You can fine-tune your over-the-network data transfer for optimum performance in your specific environment. For example, you might need to control the number of your TCP connections or impose a packet-level compression threshold.

This functionality is supported for replications that use the following data transfer methods:

  • File-level

  • Block-level with the Microsoft Volume Shadow Copy Service (VSS) option selected

Fine-tuning is done by modifying the product’s productinternal.config configuration file, located on your Forge host in the following directory:

..\PlateSpin Portability Suite Server\Web

Below is a list of the configuration parameters with two sets of values: the defaults and the values recommended for optimum operation in a high-latency WAN environment.

Table 2-2 Parameters for Fine-Tuning File-Level Data Transfer Performance

Parameter: fileTransferThreadcount
  Description: Controls the number of TCP connections opened for file-based data transfer.
  Default Value: 2
  For High-Latency WANs: 4 to 6 (max)

Parameter: fileTransferMinCompressionLimit
  Description: Specifies the packet-level compression threshold in bytes.
  Default Value: 0 (disabled)
  For High-Latency WANs: max 65536 (64 KB)

Parameter: fileTransferCompressionThreadsCount
  Description: Controls the number of threads used for packet-level data compression. Ignored if compression is disabled.
    Because the compression is CPU-bound, this setting might have a performance impact during Live Transfer.
  Default Value: 2
  For High-Latency WANs: n/a

Parameter: fileTransferSendReceiveBufferSize
  Description: TCP/IP window size setting for file transfer connections; controls the number of bytes sent without TCP acknowledgement, in bytes.
    When the value is set to 0, the default TCP window size is used (8 KB). For custom sizes, specify the size in bytes.
    Use the following formula to determine the proper value:
      ((LINK_SPEED(Mbps)/8)*DELAY(sec))*1024*1024
    For example, for a 100 Mbps link with 10 ms latency, the proper buffer size would be:
      (100/8)*0.01 * 1024 * 1024 = 131072 bytes
  Default Value: 0 (8192 bytes)
  For High-Latency WANs: max 5242880 (5 MB)
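The buffer-size formula in Table 2-2 can be checked with a short calculation. The following is a minimal sketch, not a product utility: the function name send_receive_buffer_size is hypothetical, and capping the result at 5242880 bytes is an assumption based on the maximum listed in the table.

```python
# Maximum listed in Table 2-2 for fileTransferSendReceiveBufferSize (5 MB).
MAX_BUFFER_BYTES = 5242880


def send_receive_buffer_size(link_speed_mbps: float, delay_sec: float) -> int:
    """Apply ((LINK_SPEED(Mbps)/8)*DELAY(sec))*1024*1024 and cap the result."""
    if link_speed_mbps <= 0 or delay_sec <= 0:
        raise ValueError("link speed and delay must be positive")
    raw = ((link_speed_mbps / 8) * delay_sec) * 1024 * 1024
    # Round to whole bytes, then apply the documented maximum (assumed cap).
    return min(int(round(raw)), MAX_BUFFER_BYTES)


# The worked example from the table: 100 Mbps link, 10 ms latency.
print(send_receive_buffer_size(100, 0.01))  # 131072
```

For links where the formula yields more than 5 MB (for example, a fast link with very high latency), the sketch returns the table maximum instead of the raw result.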

Fine-Tuning Block-Level Data Transfer Performance

You can fine-tune over-the-network block-level data transfer for optimum performance in your specific environment by implementing bandwidth throttling and compression.

This functionality is supported for replications that use the block-level data transfer method without the Microsoft Volume Shadow Copy Service (VSS) option.

The system’s default settings for block-level data transfers impose no limitations on bandwidth consumption and do not compress data being transferred. Forge provides two methods for enabling bandwidth throttling and data compression:

  • System-wide: By editing the Forge server’s web.config file. Bandwidth throttling and data compression specified this way apply to all block-level migration jobs, including transfers of complete volume data, as well as incremental synchronizations.

  • Per-workload: By importing a custom Windows Registration (*.reg) file into the source Windows machine’s registry. This enables you to define customized bandwidth throttling and data compression settings for specific workloads to use during replications.

Both methods control bandwidth throttling and data compression on a per-volume basis. Settings specified through the Windows registry override those in the web.config file. Neither method requires a reboot or other intervention for the changes to take effect.

Before using either method to fine-tune block-level data transfer performance, determine appropriate compression and bandwidth values that balance CPU usage and network efficiency for your particular system, network, and workload.

Fine-Tuning Block-Level Data Transfer Performance System-Wide

  1. Make sure there are no workload replications underway.

  2. Use a text editor to open the web.config file located on your Forge server host in the following directory:

    ..\PlateSpin Portability Suite Server\Web

  3. Find the following lines:

    <add key="BlockBasedTransferCompressionLevel" value="0" />
    <add key="BlockBasedTransferBandwidthThrottlingInKB" value="0" />
    
  4. Edit the lines:

    1. In the first line, change the compression level value in quotes to a number from 0-9 (with 0 for no compression and 9 for maximum compression).

    2. In the second line, change the bandwidth throttling value in quotes to a number representing kilobytes per second.

      For example, for a required compression level of 3 and bandwidth cap of 512 KB/sec per volume being transferred, the appropriate lines in web.config look like this:

      <add key="BlockBasedTransferCompressionLevel" value="3" />
      <add key="BlockBasedTransferBandwidthThrottlingInKB" value="512" />
      
  5. Save the web.config file.

  6. For the changes to take effect, restart the following services on the Forge server, in the specified order:

    1. World Wide Web Publishing Service.

    2. Portability Suite Service.

    3. PlateSpin Operations Framework Controller.
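The edit in steps 3 and 4 can also be scripted. The following is a hedged sketch only: the helper set_block_transfer_settings is hypothetical, and the surrounding <configuration> and <appSettings> elements in the sample are an assumption, because this guide shows only the two <add> lines. The sketch works on a string rather than the real file, so nothing on the Forge server is touched.

```python
import xml.etree.ElementTree as ET

# Assumed file structure; only the two <add> lines come from this guide.
SAMPLE_CONFIG = """<configuration>
  <appSettings>
    <add key="BlockBasedTransferCompressionLevel" value="0" />
    <add key="BlockBasedTransferBandwidthThrottlingInKB" value="0" />
  </appSettings>
</configuration>"""


def set_block_transfer_settings(config_xml: str,
                                compression_level: int,
                                throttle_kb_per_sec: int) -> str:
    """Return config_xml with the two block-based transfer values replaced."""
    if not 0 <= compression_level <= 9:
        raise ValueError("compression level must be from 0 to 9")
    if throttle_kb_per_sec < 0:
        raise ValueError("bandwidth throttling value must not be negative")

    wanted = {
        "BlockBasedTransferCompressionLevel": str(compression_level),
        "BlockBasedTransferBandwidthThrottlingInKB": str(throttle_kb_per_sec),
    }
    root = ET.fromstring(config_xml)
    found = set()
    for element in root.iter("add"):
        key = element.get("key")
        if key in wanted:
            element.set("value", wanted[key])
            found.add(key)

    missing = set(wanted) - found
    if missing:
        raise KeyError(f"keys not found in configuration: {sorted(missing)}")
    return ET.tostring(root, encoding="unicode")


# Compression level 3 and a 512 KB/sec cap, as in the example in step 4.
updated = set_block_transfer_settings(SAMPLE_CONFIG, 3, 512)
print(updated)
```

ElementTree discards XML comments and may change whitespace when it rewrites a document, so keep a backup copy of web.config if you adapt this approach to the real file. The service restarts in step 6 are still required.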

Fine-Tuning Block-Level Data Transfer Performance on a Per-Workload Basis

  1. Make sure the Block-Based Transfer Component is already installed on the source machine.

  2. Use the text below to create a Windows Registration (*.reg) file:

    Windows Registry Editor Version 5.00
    [HKEY_LOCAL_MACHINE\SOFTWARE\PlateSpin\BlockBasedTransfer]
    "CompressionLevel"=dword:00000000
    "BandwidthThrottling"=dword:00000000
    
    

    Replace the last digit of the dword value for "CompressionLevel" with a number from 0-9 (with 0 for no compression and 9 for maximum compression), and replace the dword value for "BandwidthThrottling" with a number representing bytes per second (for example, 512 kilobytes per second would be 00512000).

    These values override any settings made in the Forge server’s web.config file.

  3. Use your Windows Registry Editor to import the *.reg file into the Windows Registry.

2.1.12 Terminology

The following are key PlateSpin Forge terms.

Term

Definition

PlateSpin Forge Appliance Host

PlateSpin Forge ships with VMware ESX v3 Server, which hosts the PlateSpin Forge Management virtual machine, as well as the recovery workloads.

PlateSpin Forge Management VM

The management virtual machine containing the PlateSpin Forge software. The IP address of the virtual machine is configured during the initial step; this IP address is used when connecting to the appliance management page by means of a browser.

Workload

The operating system, application, and data stack. All the software components that are necessary for the workload to run and provide its business value.

Protected workload

The workload under protection. Any changes made to the protected workload are reflected in the recovery workload.

Recovery workload

The receiving end of a replication. The recovery workload acts as a bootable backup for the protected workload. It is a virtual machine copy that is stored on, and runs on, the PlateSpin Forge appliance.

Recovery point

A point-in-time snapshot, allowing a replicated workload to be restored to a previously good state. See Recovery Points for more information.

Replication

Copying a protected workload so that it can be restored (failover, failback) at a later date.

Server Sync

The incremental replication of a protected workload to an existing or imported virtual machine. Instead of transferring a source server’s entire workload, only the changes are transferred to the existing virtual machine, saving time and bandwidth.

Replication schedule

The schedule that is set up to control the replication of a workload. A replication schedule can include full and incremental replications.

Recovery Time Objective (RTO)

A measure of how long a workload can remain offline in the event of disaster. For PlateSpin Forge, this is the time required to fail over (that is, how long it takes to configure and start the recovery workload). See RTO for more information.

Recovery Point Objective (RPO)

A measure of how long a business can tolerate losing or not having access to data. If a business can tolerate a loss of 10 minutes of data, then its RPO is 10 minutes. If a business cannot tolerate any loss of data, then its RPO is zero. For PlateSpin Forge, this is the interval between incremental replications. See RPO for more information.

Test Time Objective (TTO)

A measure of the ease with which a disaster recovery plan can be tested. It is similar to RTO, but includes the time needed for a user to test the recovery workload.

2.1.13 Events and Tasks

Events can occur during the process of protecting workloads. When an event occurs, a record is added to the event list, which is accessible from the Dashboard or through the Events report.

Tasks are related to events and require user action. An entry on the Task page shows the current state of a workload; allowable actions enable the user to choose the workload's next state. These actions vary according to the task.

The following table describes key events, whether or not a task is generated, possible actions, and whether or not an e-mail notification is issued.

Event: Workload task requires user attention
  Description: Generated when a current operation state is changing to "require user activity." This happens during a Test Restore operation, when the recovery workload is ready for examination.
  Task?: Yes
  Action: Dismiss, Mark Test Failover as successful, Mark Test Failover as failed
  e-mail?: Yes

Event: Workload task resolved
  Description: Generated when a current operation state is changing from "require user activity." This happens during a Test Restore operation, after the recovery workload is shut down.
  Task?: No
  Action: Dismiss
  e-mail?: No

Event: Workload issue resolved
  Description: Generated when a current operation state is changing from "require user intervention" to "running."
  Task?: No
  Action: Dismiss
  e-mail?: Yes

Event: Workload issue requires user attention
  Description: Generated when a current operation state is changing to "require user intervention."
  Task?: Yes
  Action: -
  e-mail?: -

Event: Workload is offline
  Description: Generated when PlateSpin Forge detects that a workload is offline.
  Task?: Yes
  Action: Failover, Prepare for Failover, Dismiss
  e-mail?: Yes

Event: Workload is online
  Description: Generated when PlateSpin Forge detects that a workload is online.
  Task?: No
  Action: Dismiss
  e-mail?: -

Event: Incremental replication did not run at scheduled time
  Description: Generated when a scheduled incremental replication is missed because of error conditions.
  Task?: No
  Action: -
  e-mail?: Yes

Event: Full replication did not run at scheduled time
  Description: Generated when a scheduled full replication is missed because of error conditions.
  Task?: No
  Action: -
  e-mail?: Yes