
AppNote: High Availability with the NBM VPN - Client to Site

Novell Cool Solutions: AppNote
By Martin Day


Posted: 28 Oct 2004
 

Cool Solutions AppNote: BorderManager VPN High Availability -- Client-to-Site

Martin Day
mday@novell.com

Abstract:

This article describes how to deploy two (or more) BorderManager VPN servers in a load-balanced and fault-tolerant configuration by using a content switch (or similar switch). The scenario is concerned with a client-to-site (C2S) profile rather than a site-to-site (S2S) profile.

IPsec VPNs

Various RFCs deal with IPsec and related protocols, beginning with RFC 2401, which describes the overall security architecture. These RFCs are a good starting point for investigating IPsec behavior. The field is rather complex, but the following key points should provide sufficient background for understanding this article:

  • IPsec provides various security services for traffic at the IP layer in both IPv4 and IPv6 environments.
  • IPsec security is cryptographically based and relies on a separate mechanism for ensuring that cryptographic keys are in place. Usually IKE (Internet Key Exchange) is used for key management. IKE is a public-key approach for automatic key management.
  • IPsec can be implemented within a host or a gateway.
  • Because IPsec operates at the IP layer, its services can be used by any higher layer protocol (TCP, UDP, ICMP, BGP, etc.).
  • IPsec uses two traffic security protocols: AH (Authentication Header) and ESP (Encapsulating Security Payload). The AH protocol provides connectionless integrity, data origin authentication, and an optional anti-replay service. (AH does not provide encryption.) The ESP protocol may provide confidentiality (encryption) and limited traffic flow confidentiality. It also may provide connectionless integrity, data origin authentication, and an anti-replay service.
  • AH and ESP may be applied alone or in combination with each other. Both AH and ESP have two modes of operation: transport and tunnel. Both AH and ESP make use of Security Associations (SAs), and a major function of IKE is the establishment and maintenance of SAs.
  • A Security Association is uniquely identified by a triple consisting of a Security Parameter Index (SPI), an IP Destination Address, and a security protocol (AH or ESP) identifier. An SA is "simplex" in operation -- for bi-directional communications, an SA is needed for both directions.
  • A transport-mode SA is a security association between two hosts.
  • A tunnel-mode SA is essentially an SA applied to an IP tunnel. Whenever either end of a security association is a security gateway, the SA must be tunnel mode. For a tunnel mode SA, there is an outer IP header that specifies the IPsec processing destination, plus an inner IP header that specifies the (apparently) ultimate destination for the packet.
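The SA points above can be pictured as a small lookup table. The following is a minimal sketch only: the class names, field names, SPI values, and addresses are hypothetical and do not reflect any BorderManager data structure.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical model for illustration; not a BorderManager structure.
SaKey = Tuple[int, str, str]  # (SPI, destination IP address, "AH" or "ESP")


@dataclass(frozen=True)
class SecurityAssociation:
    spi: int
    dst: str
    protocol: str  # "AH" or "ESP"
    mode: str      # "tunnel" or "transport"


class SaDatabase:
    """Stores SAs under the identifying triple described in the text."""

    def __init__(self) -> None:
        self._sas: Dict[SaKey, SecurityAssociation] = {}

    def add(self, sa: SecurityAssociation) -> None:
        self._sas[(sa.spi, sa.dst, sa.protocol)] = sa

    def lookup(self, spi: int, dst: str, protocol: str) -> Optional[SecurityAssociation]:
        return self._sas.get((spi, dst, protocol))


sad = SaDatabase()
# An SA is simplex, so a bi-directional tunnel needs one SA per direction.
sad.add(SecurityAssociation(0x1001, "192.0.2.10", "ESP", "tunnel"))    # client -> gateway
sad.add(SecurityAssociation(0x2002, "198.51.100.7", "ESP", "tunnel"))  # gateway -> client
```

A lookup succeeds only when all three parts of the triple match, which is why the same SPI with a different destination address returns nothing.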

An essential element of SA processing is an underlying Security Policy Database (SPD) that specifies what services are to be offered to IP datagrams and in what fashion. The SPD on the IPsec node is consulted to determine how a packet is to be processed. Based on IP and Transport layer header information, a packet can be processed in three ways:

  • It can be afforded IPsec services.
  • It can be discarded.
  • It can be allowed to bypass IPsec.
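The three outcomes listed above can be sketched as an ordered policy lookup. This is an illustration under stated assumptions: the `SpdEntry` fields, the first-match ordering, the default action, and the example networks are all hypothetical choices made for the sketch, not the layout of any real SPD.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SpdEntry:
    dst_net: str              # destination network in CIDR form
    protocol: Optional[str]   # "tcp", "udp", ... or None for any protocol
    dst_port: Optional[int]   # None for any port
    action: str               # "protect", "discard", or "bypass"


def spd_lookup(spd: List[SpdEntry], dst_ip: str, protocol: str,
               dst_port: int, default: str = "discard") -> str:
    """Return the action of the first entry that matches the packet."""
    addr = ipaddress.ip_address(dst_ip)
    for entry in spd:
        if addr not in ipaddress.ip_network(entry.dst_net):
            continue
        if entry.protocol is not None and entry.protocol != protocol:
            continue
        if entry.dst_port is not None and entry.dst_port != dst_port:
            continue
        return entry.action
    return default  # assumption: unmatched traffic is dropped


# Hypothetical policy: block SMTP to one subnet, protect the rest of the
# private network, and let everything else bypass IPsec.
example_spd = [
    SpdEntry("10.1.0.0/16", "tcp", 25, "discard"),
    SpdEntry("10.0.0.0/8", None, None, "protect"),
    SpdEntry("0.0.0.0/0", None, None, "bypass"),
]
```

Because the entries are searched in order, the more specific discard rule has to appear before the broader protect rule.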

IKE can provide Perfect Forward Secrecy (PFS), which means the key used to protect transmission must not be used to derive any additional keys. This increases security at the cost of performance.

Exchange of information within IKE occurs in two phases. In Phase 1, two peers establish a secure, authenticated communication channel. A Phase 1 exchange can use Main Mode or Aggressive Mode. In Phase 2, SAs are negotiated on behalf of services -- in this case, IPsec. A Phase 2 exchange uses Quick Mode.

The BorderManager 3.8 VPN Service

BorderManager has used IPsec for its VPN solution for some time. However, the key exchange protocol it used, SKIP (Simple Key management for Internet Protocol), was not widely adopted by the industry. As a result, BorderManager 3.8 now also supports the widely adopted IKE (Internet Key Exchange) protocol.

Generally, IKE relies on either shared keys between client and server or, in more scalable environments, X.509 digital certificates in order to authenticate the end-nodes. Authentication options beyond these are not standardized, so vendors implement their own authentication solution extensions.

BorderManager uses Novell Modular Authentication Services (NMAS) for extended VPN authentication requirements. NMAS supports more than 50 advanced authentication methods. This provides great flexibility for the future -- any advanced authentication method deployed internally can also be extended for use beyond the firewall to VPN users.

Because BorderManager acts as a gateway, the only mode it supports is tunnel mode; transport mode is not supported (see IPsec VPNs above). Additionally, because the AH protocol does not provide encryption services, it is rarely used. As a result, BorderManager 3.8 supports only the ESP protocol.

Advantages

  • BorderManager is a standards-based IPsec VPN, so it can integrate with other standards-based VPN clients and gateways.
  • The integration with NMAS ensures that BorderManager VPN clients can enjoy strong and flexible methods of authentication beyond the standard methods of shared keys and digital certificates.
  • Traffic policies can be assigned to users, groups, and containers, giving granular identity-based policies.
  • Policy information is held within eDirectory and so benefits from the highly available features that this distributed service provides.

Limitations

VPN session information cannot be shared between gateways. Hence, there is no native automatic fault-tolerant capability. This AppNote helps lessen this problem by describing an alternative solution for providing high availability.

Understanding the Client-to-Site VPN

Step 1 -- NMAS Authentication

In theory, this step is optional - the C2S VPN may be deployed using certificates for user authentication which would invoke IKE directly. However, it is expected that the majority of deployments will use one or more NMAS authentication methods.

This step is very similar to a normal NMAS authentication except that the authentication traffic passes via TCP port 353 instead of using NCP over TCP port 524 as the transport mechanism. The pre-defined filter packet type for this traffic is called "AuthGW".

The client request includes the username, context, and desired NMAS Sequence (comprising one or more NMAS methods). Note: The BorderManager VPN client does not support user-defined clearances at this time. This means that the desired clearance (token+password) must be configured as the user's default clearance in eDirectory.

The user will be prompted for the relevant authentication credentials, although if the NDS password is one of the required methods, then this can be entered directly into the password field of the VPN client rather than waiting for a pop-up dialog. Of course, each vendor NMAS method must be installed on the client machine.

Step 2 -- IKE Negotiation

Following successful NMAS authentication, a session key is derived (using Diffie-Hellman) that is actually used for the pre-shared secret mode of IKE. In other words, this could be considered as NMAS being used to bootstrap the IKE negotiation.

Once the secure, authenticated connection is established, IP address, DNS/SLP settings, and traffic policies are pushed to the client (and so populate the SPD). Security Associations can then be established at the client.

Optionally, the server can mandate that the VPN client is running the Novell Client Firewall via the set parameter 'VPN Requires NCF'. If the client fails this check, the IKE negotiation stage will fail with an error message for the user.

Step 3 -- IPsec Traffic

After the IKE negotiation, the client is in a position to send traffic across the VPN to the VPN server in accordance with the downloaded traffic policy. As requests from applications are passed down the protocol stack, the VPN driver intercepts them and checks them against the SPD. The SPD indicates how the data should be handled -- discarded, encrypted, or bypassed.

The approach where some traffic is allowed to bypass IPsec is known as "split-tunneling." In this case the client can send both encrypted traffic over the VPN and unencrypted traffic that completely bypasses the VPN function. This is often considered a security risk, but it avoids having to send all traffic destined for the general Internet to the VPN server first and then on to the real destination (assuming the VPN server is allowed to do this). Routing everything through the VPN server adds latency and complexity at the VPN server and, most importantly, consumes network bandwidth at the server end. Many deployments allow split-tunneling but mitigate the risk by deploying personal firewalls and virus scanners at the client.

As stated earlier, traffic to be encrypted will use only ESP with algorithms (3DES, HMAC-MD5, etc.) and session keys derived from the IKE negotiation.

NAT Traversal

The ESP protocol provides various checks for the data transferred via the VPN. An apparent change in the source address of the traffic in transit could be interpreted as an attack and the IPsec communications would fail. To allow this common occurrence, there is a feature known as "NAT Traversal".

During IKE negotiation, if the server detects that the VPN client's IP address was changed in transit (via NAT or a content switch's virtual IP address) then the client is informed that it is necessary to invoke NAT Traversal. In this case, whenever the client is required to send ESP traffic, it first tunnels the traffic via UDP on port 4500. Thus, ESP (protocol 50) is not directly visible on the wire. Instead, UDP datagrams sent to and from port 4500 are observed. When the VPN server receives these datagrams it first strips off the UDP encapsulation before processing the ESP traffic.
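The wrapping and unwrapping described above can be sketched in a few lines. This is a simplified illustration, not the VPN client's implementation: the function names are hypothetical, the UDP checksum is left at zero, and details such as IKE traffic sharing port 4500 are ignored.

```python
import struct

NAT_T_PORT = 4500  # UDP port used for tunneled ESP, as described in the text


def udp_encapsulate(esp_packet: bytes, src_port: int = NAT_T_PORT,
                    dst_port: int = NAT_T_PORT) -> bytes:
    """Prefix an ESP packet with an 8-byte UDP header."""
    length = 8 + len(esp_packet)  # UDP length covers header plus payload
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)  # checksum 0
    return header + esp_packet


def udp_decapsulate(datagram: bytes) -> bytes:
    """Strip the UDP header and return the ESP packet it carried."""
    _src_port, dst_port, length, _checksum = struct.unpack("!HHHH", datagram[:8])
    if dst_port != NAT_T_PORT:
        raise ValueError("not a tunneled ESP datagram")
    return datagram[8:length]


# A stand-in ESP packet: SPI, sequence number, then opaque ciphertext.
esp = struct.pack("!II", 0x1001, 1) + b"ciphertext"
datagram = udp_encapsulate(esp)
```

On the wire only the UDP datagram is visible, so an intermediate NAT device can rewrite addresses and ports without touching the protected ESP payload.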

Packet Filters

The BorderManager Administration Guide provides a comprehensive set of filters to propagate both C2S and S2S VPN communications. For C2S VPNs, however, it is possible to greatly simplify the packet filters that are needed, both at the BorderManager server (if filtering is employed here, which the author strongly recommends) and at any intervening firewalls.

Note: If a particular deployment is guaranteed to result in some form of address translation between all possible VPN clients and the VPN server (e.g., if NAT is deployed at the server end), then the packet filter for ESP is not required. This is because all ESP traffic should then be tunneled as discussed above.

The suggested filters to support C2S VPN with NMAS authentication as deployed on the BorderManager server itself are listed below. Corresponding filters can easily be added to any intervening firewalls. Because client VPN packets arrive on the BorderManager Public interface, and the VPN packet destination is a listener bound to the Public interface, both the source and destination interface for the packet filters are set to "Public". Additionally, all filters are stateful in order to allow the VPN server to communicate back to the VPN client.

Source                        Destination
Interface  Address  Port      Interface  Address    Port  Protocol  Comment
Public     Any      All       Public     Public_IP  353   TCP       NMAS Authentication over AuthGW protocol
Public     Any      All       Public     Public_IP  353   UDP       NMAS Keepalives; also used for statistics monitoring
Public     Any      All       Public     Public_IP  500   UDP       IKE
Public     Any      N/A       Public     Public_IP  N/A   ESP (50)  IPsec traffic using ESP (if no NAT in traffic flow)
Public     Any      All       Public     Public_IP  4500  UDP       Tunneled ESP traffic (if NAT in traffic flow)

For those with a protocol analyzer and an inquisitive mind, the behavior of the VPN client is slightly different from that indicated by the table above. When sending traffic to the three UDP destination ports (353, 500, 4500), the VPN client actually uses a corresponding fixed source port of 353, 500, or 4500.

However, as the diagram below indicates, clients may actually be connecting from behind some sort of port address translation (PAT) device. The real client address is rewritten by the PAT device (in a many-to-one address translation), and at the same time the source port is also rewritten. When the PAT device receives the response traffic, it can reverse the behaviour so the VPN client receives the response to the correct (fixed) port.

The impact at the BorderManager server, however, is that it observes UDP traffic to ports 353, 500, and 4500 from any source port. Thus restrictions on source port need to be removed as indicated in the table above. If the deployment can guarantee that all VPN clients will not traverse a PAT device, then the NMAS, IKE, and IKE-over-NAT filters can specify a restriction on source ports of 353, 500, and 4500 respectively.
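The effect of relaxing or restricting source ports can be illustrated with a small rule matcher. This is a sketch only: the `Filter` type and helper names are hypothetical, the rules mirror the filter table in this section, and the TCP 353 rule is assumed to keep an unrestricted source port in both cases.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Filter:
    protocol: str                   # "tcp", "udp", or "esp"
    dst_port: Optional[int]         # None where ports do not apply (ESP)
    src_port: Optional[int] = None  # None means any source port


def c2s_filters(clients_behind_pat: bool = True) -> List[Filter]:
    """Build the C2S filter set, optionally pinning the UDP source ports."""
    def src(port: int) -> Optional[int]:
        return None if clients_behind_pat else port

    return [
        Filter("tcp", 353),              # NMAS authentication (AuthGW)
        Filter("udp", 353, src(353)),    # NMAS keepalives
        Filter("udp", 500, src(500)),    # IKE
        Filter("esp", None),             # ESP when no NAT is in the path
        Filter("udp", 4500, src(4500)),  # tunneled ESP
    ]


def is_allowed(filters: List[Filter], protocol: str,
               src_port: Optional[int], dst_port: Optional[int]) -> bool:
    """Return True if any filter admits the inbound packet."""
    for f in filters:
        if f.protocol != protocol:
            continue
        if f.dst_port is not None and f.dst_port != dst_port:
            continue
        if f.src_port is not None and f.src_port != src_port:
            continue
        return True
    return False
```

With the relaxed set, an IKE packet whose source port was rewritten by a PAT device is still admitted; with the pinned set, only packets that keep the fixed source port get through.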

Figure 1: VPN Client Behavior

Configuration Notes: Certificate Setup

You may be asking, "What part did the digital certificates play in the VPN session establishment?" Practically speaking, in this particular scenario the answer is "none." However, the VPN service generally provides certificate-based authentication for both C2S and S2S deployments, and so the certificate setup is required by default. Unless the Key Material Object (KMO), Trusted Root Container (TRC), and Trusted Root Object (TRO) exist, the VPN service cannot load.

User certificates, however, are definitely not a requirement unless VPN users are being authenticated with user certificates instead of passwords, tokens, etc.

Configuration Notes: Pool Addresses

When configuring the C2S profile, a pool of addresses (address subnet or range) can be configured. An address from this pool will be assigned to the VPN client and be used as the source address of the client requests within the VPN tunnel. This means that hosts protected by the VPN server will receive client requests from the pool address. It is important that this range of addresses be on a separate Layer 3 network so BorderManager can correctly route responses over the VPN. As long as internal hosts and internal routers are aware of this layer 3 network, and they send packets destined for these addresses to the BorderManager server, then response packets will travel back over the VPN correctly.

Take care to assign sufficient IP addresses so that at any one time the pool of addresses is never exhausted. Should that happen, BorderManager will not be able to push a VPN address to the client. In that case, packets sent from the VPN will appear to have come from the real client IP address rather than the pool address. This means that hosts responding to the VPN client requests may not have a valid route back to the VPN client that passes via the BorderManager server.
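Both requirements (a separate Layer 3 network and enough addresses) can be checked before deployment. The sketch below uses Python's standard `ipaddress` module; the function name and the example networks are hypothetical, and it assumes every host address in the pool can be handed out to a client.

```python
import ipaddress
from typing import List


def check_pool(pool_cidr: str, internal_cidrs: List[str],
               peak_clients: int) -> List[str]:
    """Return a list of problems found with a proposed C2S address pool."""
    pool = ipaddress.ip_network(pool_cidr)
    problems = []

    # The pool should be its own Layer 3 network, distinct from internal ones.
    for cidr in internal_cidrs:
        if pool.overlaps(ipaddress.ip_network(cidr)):
            problems.append(f"pool {pool} overlaps internal network {cidr}")

    # The pool should never be exhausted at peak usage.
    usable = sum(1 for _ in pool.hosts())
    if usable < peak_clients:
        problems.append(
            f"pool {pool} offers {usable} addresses for {peak_clients} clients"
        )
    return problems


# Example usage with made-up networks:
#   check_pool("10.10.1.0/24", ["192.168.1.0/24"], peak_clients=200)  -> no problems
#   check_pool("192.168.1.128/25", ["192.168.1.0/24"], peak_clients=200)
#       -> reports both an overlap and a shortage
```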

High Availability Considerations

The main considerations for VPN high availability are:

  • High availability of policy/authentication information.
  • High availability of the VPN service.

High Availability of Policy/Authentication Information

The BorderManager VPN servers actually run eDirectory and belong to an eDirectory tree. As long as eDirectory itself is highly available, and partitions containing the relevant information (users, policies etc.) are adequately replicated, the information will be highly available to the VPN service. The VPN servers retrieve the required information using NDAP over NCP (Novell Directory Access Protocol over NetWare Core Protocol) - normal eDirectory communications.

High Availability of the VPN Service

The BorderManager service cannot be enabled for clustering on Novell Cluster Services at this time. Two basic options for dealing with this are discussed below:

Option 1 - Multiple VPN Gateways and Manual User Intervention

In this case, users are made aware of two physical DNS names or addresses for the VPN service. In the event of problems, they are expected to simply re-try a VPN connection to the alternate host. To help users, the installation of the VPN client on the local machine can pre-configure this information so that all the user needs to do is select the alternate host from a drop-down list.

Option 2 -- Multiple VPN Gateways and Content Switch

This is the primary topic for this article. It provides for both automatic fail-over and load-balancing. A sample environment demonstrating this approach is discussed below.

Sample High Availability Environment

The diagram below illustrates the sample environment. Two BorderManager servers are deployed with a Cisco CSS11503 Content Services switch. The configuration excerpts presented were used specifically for this switch, although adapting them for other switches should not be too difficult. Cisco documentation should be consulted for detailed information on the commands used.

Figure 2: Sample high availability environment

Data Flow

Before detailing the configuration, a successful data flow is illustrated below. This diagram shows the data flow of a simple Telnet request from the VPN client to the destination server, via both the content switch and the BorderManager VPN server after the tunnel has been established. As the content switch translates addresses, the client is forced to tunnel the ESP traffic via UDP 4500 as discussed earlier.

The diagram shows the physical source and destination IP addresses (as they appear on the wire) as traffic passes via each host. For simplicity, additional firewalls/routers and any address/port translations they perform are omitted. During the VPN establishment, the content switch load balanced the client requests to BMVPN1 which distributed a pool address of "E" to the client. The destination Telnet host is aware that the route back to address E is via BMVPN1.

Figure 3: Source and destination IP addresses

The tables below summarize the packet addresses at various points as they are observed on the wire:

Client Request

              Source Address  Destination Address
From Client   A               B
From Switch   A               C
From BMVPN1   E               F

Telnet Server Response

                  Source Address  Destination Address
From Telnet Host  F               E
From BMVPN1       C               A
From Switch       B               A
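The address rewriting summarized in these tables can be traced step by step. The following is a simplified model, assuming the letters A to F stand for the addresses in the diagram and ignoring ports and any additional firewalls; the function names are hypothetical.

```python
# Letters match the diagram: A = client, B = switch VIP, C = BMVPN1,
# E = pool address given to the client, F = Telnet host.
CLIENT, VIP, BMVPN1, POOL, TELNET = "A", "B", "C", "E", "F"


def request_path():
    """Return (source, destination) as seen on the wire after each hop."""
    from_client = (CLIENT, VIP)             # outer header of the tunneled packet
    from_switch = (from_client[0], BMVPN1)  # switch rewrites VIP -> real server
    from_bmvpn1 = (POOL, TELNET)            # inner packet after decapsulation
    return [from_client, from_switch, from_bmvpn1]


def response_path():
    """Return (source, destination) for the Telnet server's reply."""
    from_telnet = (TELNET, POOL)            # routed back to BMVPN1 via the pool network
    from_bmvpn1 = (BMVPN1, CLIENT)          # re-encapsulated toward the real client
    from_switch = (VIP, from_bmvpn1[1])     # switch rewrites real server -> VIP
    return [from_telnet, from_bmvpn1, from_switch]


for hop, (src, dst) in zip(("Client", "Switch", "BMVPN1"), request_path()):
    print(f"From {hop}: {src} -> {dst}")
```

The final rewrite in the response path (C back to B) is the step that matters most for the client, since it only accepts replies from the address it originally sent to.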

Step 1 -- BorderManager Configuration

Both BorderManager VPN servers are configured as though they were unique - they have their own physical and VPN addresses. Ideally, the C2S profile would be sharable between both VPN servers so as to minimize the management overhead. Unfortunately, the address pool information held within the C2S profile MUST be server-specific. Otherwise, response packets to the VPN clients could possibly be directed to the wrong VPN server and be dropped. It is important to remember that one VPN server knows nothing about current VPN connections on the other VPN server.
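One way to sanity-check the server-specific pools is to confirm that they do not overlap and that every pool address maps back to exactly one VPN server. This is a sketch with hypothetical pool ranges and function names; the real pools are whatever is configured in each server's C2S profile.

```python
import ipaddress
from itertools import combinations
from typing import Dict, List, Optional, Tuple


def overlapping_pools(pools: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return pairs of servers whose address pools overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in pools.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(nets), 2)
        if nets[a].overlaps(nets[b])
    ]


def server_for(pools: Dict[str, str], client_pool_ip: str) -> Optional[str]:
    """Return the VPN server that internal routers should use for this address."""
    addr = ipaddress.ip_address(client_pool_ip)
    for name, cidr in pools.items():
        if addr in ipaddress.ip_network(cidr):
            return name
    return None


# Hypothetical pools, one per VPN server.
example_pools = {"BMVPN1": "10.10.1.0/24", "BMVPN2": "10.10.2.0/24"}
```

If two servers shared a range, an internal router could not tell which server holds the tunnel for a given pool address, which is the failure the paragraph above warns about.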

Step 2 -- Basic Switch Configuration

The two key configuration settings for a basic setup are as follows:

1: Define the real VPN servers via the Service command. The service must also define the "keepalive" settings so the switch can determine if the individual server is active. This can be done in many different ways, such as with a simple ICMP PING. The more sophisticated switches, however, can "move up the stack" - they can query the ports used by the application and even query the application directly in order to get a response. Obviously, if a valid application query can be sent for which a functional VPN server will respond, then that query will make for a much better keepalive than a simple ICMP PING. The PING would confirm whether a host was present but not indicate whether the VPN service was running and functional.

The VPN service has several components - it would be complex to validate each service (NMAS, IKE, ESP) as a keepalive. A compromise is to check that the NMAS port (TCP 353) has responded.

An example of these settings is shown below:


service BMVPN1
  ip address 192.168.1.4
  keepalive type tcp
  keepalive port 353
  active

service BMVPN2
  ip address 192.168.1.5
  keepalive type tcp
  keepalive port 353
  active
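For readers who want to see what a TCP keepalive of this kind amounts to, the check can be approximated from any host with a short script. This is a sketch of the idea only, not the switch's actual probe logic; the function name and timeout are assumptions.

```python
import socket


def tcp_keepalive(host: str, port: int = 353, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: treat the service as down.
        return False


# Example usage against the addresses in the sample configuration:
#   for name, ip in (("BMVPN1", "192.168.1.4"), ("BMVPN2", "192.168.1.5")):
#       print(name, "alive" if tcp_keepalive(ip) else "down")
```

Like the switch's probe, this only confirms that something is accepting connections on TCP 353; it does not prove that IKE or ESP processing is healthy.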

2: Define the "content" rule. This specifies the front-end to the services defined in the Service directive. In other words, it defines the virtual IP (VIP) address on which the switch will listen for client requests and how it recognizes and directs these requests to the real servers. In this example, a simple rule is defined that treats all traffic to the VIP in the same way.

The method of load balancing is also defined here. A simple mechanism of load balancing by source IP address is used. This means that all traffic from a single client IP address is always directed to the same VPN server. The switch uses the concept of an "owner" as an administrative feature for organizing the content rules -- hence the "owner Acme" command.

An example of these settings is shown below:


owner Acme
    content vpn
        vip address 100.100.100.190
        balance srcip
        add service BMVPN1
        add service BMVPN2
        active
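The key property of source-IP balancing is that the choice of server is a pure function of the client address. The sketch below demonstrates that property with a simple modulo; the switch's real selection algorithm is not described in this article, so the arithmetic here is purely an assumption.

```python
import ipaddress
from typing import List


def pick_service(client_ip: str, services: List[str]) -> str:
    """Choose a service deterministically from the client's source address."""
    if not services:
        raise ValueError("no active services")
    index = int(ipaddress.ip_address(client_ip)) % len(services)
    return services[index]


active_services = ["BMVPN1", "BMVPN2"]
# The same client address always lands on the same VPN server, so the
# NMAS, IKE, and ESP traffic of one session stays together.
first = pick_service("203.0.113.10", active_services)
again = pick_service("203.0.113.10", active_services)
```

If a keepalive removes one server from the list, every client address maps to the remaining server, which is the fail-over behavior described later in the article.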

Step 3 -- Overcoming UDP Traffic Flow Problems

With the switch configuration defined above, the initial NMAS authentication using TCP 353 worked fine, and clients with different IP addresses were correctly load balanced to different VPN servers. However, as soon as the IKE negotiation began over UDP 500, communication failed.

It appears that the content switch will successfully translate the destination address of the UDP packets from B to C in the above diagram. However, when the response packet from BMVPN1 passes through the content switch, it does not automatically match this up to the incoming UDP traffic and so does not translate the source address from C back to B. (For many situations this behavior is not needed, but with IPsec, it is.) As a result, the client receives a packet from address C instead of from the address it originally sent to (B) and so drops the packet.

One way to handle this situation is via the Group command. The effect of this setting is to change the source address of packets from the VPN servers to be the VIP address. In the above example, this means that packets with a source address of C or D are translated to B successfully.

This approach may not be the most suitable if there are other communication requirements to the VPN servers themselves via the switch, and this particular translation causes a problem. Other ways of configuring the switch could be evaluated.

An example of these settings is shown below:


group vpn
  vip address 100.100.100.190
  add service BMVPN1
  add service BMVPN2
  active
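The effect of the source group on UDP replies can be modeled in a few lines. This is a simplified illustration using the addresses from the sample configuration; the function names are hypothetical, and port handling is deliberately left unchanged here because it is the subject of Step 4.

```python
from typing import Optional, Set, Tuple

Endpoint = Tuple[str, int]  # (IP address, UDP port)

VIP = "100.100.100.190"
REAL_SERVERS = {"192.168.1.4", "192.168.1.5"}


def switch_rewrite_reply(src: Endpoint, real_servers: Set[str],
                         group_vip: Optional[str] = None) -> Endpoint:
    """Rewrite the source of a server reply as it leaves the switch."""
    ip, port = src
    if group_vip is not None and ip in real_servers:
        return (group_vip, port)  # source group: real server -> VIP
    return src                    # no group: address passes through unchanged


def client_accepts(sent_to: Endpoint, reply_from: Endpoint) -> bool:
    """The client drops replies that do not come from where it sent."""
    return sent_to == reply_from


ike_target = (VIP, 500)
reply = ("192.168.1.4", 500)  # IKE response leaving BMVPN1
without_group = switch_rewrite_reply(reply, REAL_SERVERS)
with_group = switch_rewrite_reply(reply, REAL_SERVERS, group_vip=VIP)
```

Without the group the reply reaches the client from 192.168.1.4 and is dropped; with the group it appears to come from the VIP and is accepted.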

Step 4 -- Overcoming UDP and PAT Issue

Using the Group command causes the switch to translate not only the source address of the VPN servers but also the source port, so the switch automatically performs Port Address Translation (PAT). This means the VPN client receives response UDP traffic from different ports than those expected (353, 500, and 4500). The VPN client simply ignores these responses, and communications fail.

To avoid this problem, the switch can be configured not to translate ports. An extra line is added in the Group command as shown below:


group vpn
  vip address 100.100.100.190
  add service BMVPN1
  add service BMVPN2
  portmap disable
  active
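The difference made by "portmap disable" can be shown with a small model of the source group. This is a sketch under assumptions: the function name is hypothetical, and the starting value of the translated port range (8192) is an arbitrary placeholder rather than the switch's documented behavior.

```python
import itertools
from typing import Iterator, Tuple

Endpoint = Tuple[str, int]  # (IP address, UDP port)

VIP = "100.100.100.190"


def source_group_nat(reply_src: Endpoint, vip: str, portmap_enabled: bool,
                     port_allocator: Iterator[int]) -> Endpoint:
    """Translate a server reply's source as the source group would."""
    _ip, port = reply_src
    if portmap_enabled:
        port = next(port_allocator)  # PAT: source port is rewritten too
    return (vip, port)


ports = itertools.count(8192)  # arbitrary placeholder for the translated range
reply = ("192.168.1.4", 500)   # IKE response from BMVPN1

with_portmap = source_group_nat(reply, VIP, True, ports)
without_portmap = source_group_nat(reply, VIP, False, ports)
```

The client expects IKE replies from port 500 on the VIP, so only the result produced with port mapping disabled matches what it is waiting for.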

User Experience

Using the example above, if BMVPN1 fails, the content switch detects this via its keepalive and will remove it from service. All subsequent VPN client traffic will be directed to BMVPN2 transparently. When BMVPN1 becomes active again, the content switch brings it back into service and resumes load balancing VPN client traffic.

The situation for users connected to BMVPN1 at the time of failure is more complex and depends on how the C2S profile is configured. By default, an inactivity timeout of 15 minutes is used. Using the Telnet example above, if the VPN gateway fails, the Telnet responses will not be received by the client. However, the VPN client itself has no idea if the VPN server is down. The application (Telnet client) will indicate a communication failure before the 15-minute timeout, and re-connection attempts will continue to fail. In this case, the user needs to be told to manually disconnect from the VPN. A reconnect for the VPN will cause the switch to direct the client to the functional VPN server, as explained above.

If "Keep Alive Automatically" is used, the user experience is greatly improved in the event of VPN server failure. With this configuration, the client sends regular keepalive packets to which it expects responses from the VPN server. In the event of a failure, the VPN client continues trying to send requests for one minute. If there is still no response, an error message is displayed and the VPN client disconnects. The user would simply re-connect and be directed by the content switch to the working VPN server. Hence, IKE keepalives provide a good mechanism for determining VPN server availability.
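The client-side behavior described above can be approximated with a small state machine. This is a rough sketch, not the VPN client's logic: the class name is hypothetical, and measuring the one-minute window from the last reply received is an assumption made for simplicity.

```python
class KeepaliveMonitor:
    """Tracks keepalive replies and decides when to give up on the server."""

    def __init__(self, give_up_after: float = 60.0) -> None:
        self.give_up_after = give_up_after  # seconds without a reply
        self.last_reply = 0.0
        self.connected = True

    def on_reply(self, now: float) -> None:
        """Record a keepalive response from the VPN server."""
        if self.connected:
            self.last_reply = now

    def on_tick(self, now: float) -> bool:
        """Call periodically; returns False once the client should disconnect."""
        if self.connected and now - self.last_reply >= self.give_up_after:
            self.connected = False  # show an error and tear down the VPN
        return self.connected


monitor = KeepaliveMonitor()
monitor.on_reply(20.0)               # last response before the server fails
still_up = monitor.on_tick(70.0)     # 50 s of silence: keep trying
gave_up = not monitor.on_tick(80.0)  # 60 s of silence: disconnect
```

Once the monitor reports a disconnect, the user reconnects and the content switch directs the new session to a server that is still passing its keepalive.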

Further Options

Two enhancements to the example presented here come to mind:

  1. There is no reason why additional BorderManager VPN servers can't be included to increase scalability. However, this is only likely to be necessary in large-scale deployments -- the network bandwidth to the ISP is more likely than the VPN servers to be the bottleneck in communication throughput.
  2. The switch itself is now a single point of failure. Although switches are generally considered to be more highly available than a typical general purpose server, there may still be a requirement to remove this single point of failure. Cisco switches allow for this by deploying pairs of content switches. Depending upon configuration (and attendant complexity) these can be deployed in active-active or active-passive modes.

Conclusion

This article has presented an overview of IPsec VPNs in general and the BorderManager 3.8 VPN in particular. An example configuration is provided for configuring a high availability solution by integrating two BorderManager VPN servers with a Cisco content switch. The method used for developing the solution should be easy to apply to other brands of switches.

Appendix -- Sample Switch Configuration


configure
!*************************** GLOBAL ***************************
  no restrict web-mgmt
  bridge priority 49152

  snmp name "CS-Acme"

!************************* INTERFACE *************************
interface 3/1
  bridge vlan 17
  description "Interface to firewall 100.100.100.x"
  phy 100Mbits-FD

interface 3/2
  bridge vlan 15
  description "Interface to VPN servers"
  phy 100Mbits-FD

!************************** CIRCUIT **************************
circuit VLAN17

  ip address 100.100.100.254 255.255.255.0

circuit VLAN15

  ip address 192.168.1.254 255.255.255.0

!************************** SERVICE **************************
service BMVPN1
  ip address 192.168.1.4
  keepalive type tcp
  keepalive port 353
  active

service BMVPN2
  ip address 192.168.1.5
  keepalive type tcp
  keepalive port 353
  active

!*************************** OWNER ***************************
owner Acme

  content vpn
    vip address 100.100.100.190
    balance srcip
    add service BMVPN1
    add service BMVPN2
    active

!*************************** GROUP ***************************
group vpn
  vip address 100.100.100.190
  add service BMVPN1
  add service BMVPN2
  portmap disable
  active

Acknowledgements

Robert Aarons, Synstar
Alistair Fletcher, Leeds City Council
Charlie Fleetwood, Leeds City Council

