OES2 SP2 Rolling Cluster Upgrade from NetWare – Part 1 – OES2 SP2 Installation with NCS
- Overview and Assumptions
- Rolling Cluster Upgrade – things to know
- Cluster Blade Install – Pre-requirements
- NetWare Node removal (basic steps)
- Boot and Install process
- Post-Installation Instructions
- Multipathing Setup
- NCS Installation – pre-check
- NCS – SBD Partition
- NCS Installation
- Cluster Pre-Migration Checklist
The purpose of this document is to give a real-world example of how we installed OES2 SP2 64-bit onto our 16-node NetWare cluster environment to perform a rolling cluster upgrade. Hopefully this guide will cover most of the common things to watch out for, as well as a good example to get you quickly up and running. Servernames and IP addresses have been changed, so your environment may vary. If you are performing a rolling cluster upgrade from NetWare, we assume you are already using NSS and have a functioning cluster with an SBD partition.
The hardware in our setup consists of 16 HP BL460c (original release) blades in a c-Class chassis. These blades are connected through QMH2462 Qlogic mezzanine cards and pass-through modules to a Xiotech Magnitude 3000 SAN. As such, we will be using multipathing (this is NOT required for OES2 or for NCS).
Because our blades are diskless, we use HP ILO to boot from the CD/DVD media and install across the network from our SLES installation server. You CAN install OES2 via other methods (PXE booting, or using the actual media mounted via ILO, although ILO is much slower).
Our cluster nodes do not have master replicas on them. We have dedicated NON-Clustered replica servers. As such, it did not matter which nodes we migrated to Linux for eDirectory purposes. If one of your cluster servers is the Master replica server, or the server running the Certificate Authority, you must save that server for last (per Novell’s documentation).
We also do not use DSFW, and as such did not need to worry about whether to install this. DSFW can only be installed at the initial installation (you cannot ADD it on later). So if you decide you are going to use DSFW on a clustered server, you must decide BEFORE you install OES2. (Personally, I would not install DSFW on a cluster, but use a dedicated set of servers for it, but you do not have to follow this suggestion).
Clustered items we are migrating: NSS (data cluster resources), GroupWise (POA, MTA, WebAccess, GWIA), iPrint, Apache Web Server.
Lastly, this is a ROLLING Cluster upgrade. Because we will be running the entire cluster in a mixed-mode of NetWare and Linux nodes, it does provide some benefits, but there are also caveats. We assume you also have a NetWare cluster using NSS and as such are already aware of the 2 TB NSS limit pertaining to devices (disks).
Make a list of all the server software and services you are running on NetWare and which nodes each item is allowed to run on. Check with the vendors to make sure that any third-party software will work on OES2 SP2 Linux. You may need to upgrade or purchase different software (for example, Avanti Technologies TaskMaster does not currently run on OES2 Linux, so we had to work around this). Develop a schedule, and start slowly until you are comfortable. A rolling cluster upgrade/conversion is designed to be a temporary situation. Ours was mixed for almost a month.
- You will basically be removing one (or more) NetWare nodes from the cluster and reinstalling them as OES2 Linux nodes. Thus, it is usually desirable to keep the same name and IP.
- Some cluster resources CANNOT be easily migrated back and forth (migrated in this case meaning you move the resource from one node to another). Some examples are GroupWise nodes. This is because the cluster load/unload scripts are not translated between NetWare and Linux automatically.
- Data resources (ie, NSS volumes) CAN be migrated between nodes, provided they were first created on NetWare. However, I personally don’t recommend this (although I have done it). Once something is migrated to Linux, try to keep it on the Linux nodes.
- DISK CHANGES – don’t do it. What does this mean? Once you insert a Linux node into the cluster, you have a mixed cluster. Once this happens, do NOT ADD new cluster disks, change cluster disk sizes, or delete cluster disks. This is per Novell’s own documentation. There IS an exception to this, however: iPrint migration. The Novell docs state that you are to create NEW NSS volume resources on LINUX and THEN migrate them. Also, if you MUST expand a disk or ADD a disk, the Novell docs state that you must shut down all the Linux nodes, make your changes, and then bring them back online.
- Disks created on Linux CANNOT be migrated to NetWare.
- READ Novell’s docs for this section: 6.1 Guidelines for Converting Clusters from NetWare to OES2 Linux. Read and understand this section thoroughly before you begin.
- While NSS volumes are probably the easiest thing to migrate, we assume you have read the above Novell documentation section. Very large NSS volumes with lots of objects (files/directories) can take several hours for the trustee rights to be rebuilt. Plan your downtime accordingly and notify your users.
- Make sure that the HP firmware for the blades is up to date.
- Make sure that the Qlogic firmware is up to date for what the SAN supports.
- Make sure to disconnect the secondary SAN/Qlogic path and only run the setup with the first path connected. This makes it easier, in my opinion, to determine which disk you are partitioning/working with during installation. It also prevents accidental formatting of other SAN-connected disks.
- Make sure to use the SLES 10 SP3 and OES2 SP2 media/NFS install points. OES2 SP2 can only be installed onto SLES 10 SP3. We will be using 64-bit versions for the NCS Cluster and all future OES2 installs, for two reasons. First, Novell announced in the third week of April that the next version of GroupWise (codename Ascot), to be released at the end of 2010, will be 64-bit only. Second, Symantec will only support a 64-bit OS with NBU version 7 (if you have a 32-bit OS you have to use the NBU 6.5.x agents/clients). You cannot perform an in-place upgrade of a 32-bit OS to a 64-bit OS, so to avoid more migrations we will use a 64-bit OS.
- Use the Pre-Migration Checklist to assist you in keeping track of which nodes are migrated. (see last page)
- The pre-migration checklist makes this easier, but essentially you pick a NetWare cluster node to migrate. It really doesn’t matter which one, UNLESS you have your Master eDir replicas on your cluster, in which case you must save that node for last.
- Migrate the cluster resources on the server to its other NetWare node(s).
- Remove the NetWare server from eDir and clean up the eDir objects that are cluster-related (BE CAREFUL when doing this so you don’t accidentally delete the wrong objects).
- Disconnect all SAN disks from server. Either re-size (if necessary) your boot LUN/disk, or delete the old NetWare one and create a new one. If using multipathing, only connect one path at this point.
- Use ILO to mount the SLES 10 SP3 64-bit DVD.
- Select Installation, but don’t hit Enter
- At the bottom enter the options (x.x.x.x = whatever your IP addresses are in your environment): install=nfs://slesadmin.abc.com/install/path hostip=IP netmask=mask gateway=x.x.x.x nameserver=x.x.x.x
- Paths are:
For example: install=nfs://slesadmin.abc.com/install/SLES10SP3_64 hostip=10.10.1.10 netmask=255.255.0.0 gateway=10.10.1.1 nameserver=10.10.1.250
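If you are doing this on 16 blades, it is easy to mistype one of the options. Here is a minimal sketch of a helper that assembles the boot-options line from its parts so you can paste it into the ILO console. The function name build_boot_options is ours (it is not part of SLES), and the values are the example ones from this article.

```shell
#!/bin/bash
# Build the boot-options line typed at the SLES installer boot screen.
# build_boot_options is a hypothetical helper name; all values are examples.
build_boot_options() {
    local server="$1" path="$2" hostip="$3" netmask="$4" gateway="$5" nameserver="$6"
    # ${path#/} drops one leading slash so "/install/x" and "install/x" both work
    printf 'install=nfs://%s/%s hostip=%s netmask=%s gateway=%s nameserver=%s\n' \
        "$server" "${path#/}" "$hostip" "$netmask" "$gateway" "$nameserver"
}

build_boot_options slesadmin.abc.com install/SLES10SP3_64 \
    10.10.1.10 255.255.0.0 10.10.1.1 10.10.1.250
```

Only hostip changes from node to node in our setup, so the other five arguments can stay fixed.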
Select Yes, and click Next
Check the “include add-on products” and click Next
Then click Next.
Click Yes, and then Next.
Make sure your time settings are correct for your environment and click Next. Later we’ll configure for NTP time.
Click Partitioning (we need to change some stuff).
Click “Create Custom Partition Setup” and then click Next. This is just an example. Feel free to follow your own server setup guidelines. I would strongly advise AGAINST using EVMS for your boot LUN setup. Use either custom partition or LVM setup.
Why do we do this? We don’t like to setup one big LUN (virtual disk, logical drive, whatever your RAID hardware calls it) for / (root partition) using Reiserfs.
With OES2 Linux, you ALWAYS (in my opinion) want to setup a dedicated LUN for your “boot” code, and leave a separate LUN for NSS (if using NSS). NEVER allocate all your disk space to one LUN. Think of this as NetWare, in the sense that you had your DOS partition separate from NetWare partitions, and SYS volume separate from your other volumes.
Select Custom Partitioning and then click Next.
We have one LUN here. This LUN is our “boot” LUN (the 15.0 GB LUN)
Select Primary Partition and click OK
Make sure to set the file system to Ext3 and the size to 1.0 GB and the mount to /boot
Click OK. Your boot disk can be whatever size you decide on. 1.0 GB may be a bit large for some folks. The main point is to make it a dedicated partition on the LUN-0/boot LUN.
Choose Primary Partition and click OK
Change “file system” to Swap.
Set to +2GB and click OK (don’t forget the mount point of swap)
Again, set accordingly (the old rule was 2x your system RAM, but our servers have 4.0 GB of RAM and if we ever actually use SWAP, we probably have something going on that we need to look at).
Click Create and Primary Partition again.
Change file system to Ext3 and let it use the rest of the LUN and mount point is /
Some folks may wish to partition out the /var or even /opt partitions. Again, this is just our layout and the main point is to segregate your /boot, swap and / partitions at the very least.
Always uncheck the Novell AppArmor (unless you really wish to use it). For OES it will depend upon what type of install you are doing. This is for the setup of a Blade for the Cluster. We will select the NCS software later. However, ALL OES2 servers should have the following items selected (for our environment):
- Novell eDirectory
- Novell Backup/SMS
- Novell iManager (you never know when you’ll need it on the server)
- Novell Storage Services (this should auto-check the Novell LUM, and Novell NCP, but I will list them here anyway)
- Novell Linux User Management (LUM)
- Novell NCP Server/Dynamic Storage Technology
We choose to install NSS even if we aren’t going to use it right away (again, never know when you may want/need it). I find the NCP server handy so that you can use native Linux EXT3 partitions and attach to them with Windows PC’s via the Novell Client (as opposed to having to muck around with SAMBA configurations). This also adds NCP file locking if using GroupWise and the ConsoleOne Windows Management snapins.
I also install the C Compiler tools because you never know when you may need them (most notably on VMware, or if using the HP Proliant Support Pack, which sometimes installs non-stock drivers that have to be compiled against the kernel).
Click Accept again.
Click Accept again.
Wait for it to create the partitions
It should reboot and launch the rest of the install
Enter the password. This should be different from the eDirectory Admin password. Click Next.
Uncheck the “change hostname via DHCP”. We don’t give out DHCP in our server room. Follow your server naming conventions. We use:
Where XX = the node number (01, 02, etc.)
Also, since we are REPLACING a NetWare cluster node, we chose to keep the same server name. As such, we entered the same name for this OES2 node as the NetWare server it replaces.
Set firewall to disabled (for now).
Also Disable IPv6. I’ve had issues with it in the past.
Click Network Interfaces
On the Blade servers, the first HP NIC is the “primary” one. You should double-check by looking at ILO for the MAC address and comparing to what SLES shows above to make sure you’re working on the correct network adapter.
Sometimes Linux assigns the NIC in reverse order (ie, 2nd NIC will be eth0, 1st NIC will be eth1). Make sure to find the MAC address of the NIC and compare against what Linux finds (click Edit and you can go to the Advanced section and verify the hardware address). Otherwise you may THINK that first NIC listed is the primary NIC (eth0) and it’s not. Then your install fails later because of this. Alternatively you can disable the secondary NIC in the BIOS and re-enable it later.
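Once the system is up, the same MAC comparison can be done from a terminal. This is a rough sketch, assuming the kernel exposes each NIC under /sys/class/net with an address file (it does on the 2.6 kernels we have used). The function names are ours, and the MAC address in the usage note is made up.

```shell
#!/bin/bash
# Normalize a MAC address: lowercase hex digits, colons as separators.
normalize_mac() {
    printf '%s\n' "$1" | sed 'y/ABCDEF-/abcdef:/'
}

# Succeeds when two MAC addresses are identical after normalization.
same_mac() {
    [ "$(normalize_mac "$1")" = "$(normalize_mac "$2")" ]
}

# Print "interface MAC" for every NIC the kernel knows about.
list_nic_macs() {
    local dev
    for dev in /sys/class/net/*; do
        [ -r "$dev/address" ] || continue
        printf '%s %s\n' "${dev##*/}" "$(cat "$dev/address")"
    done
}
```

To use it, read the MAC of the first NIC from ILO and run something like: same_mac "00-1A-4B-AA-BB-CC" "$(cat /sys/class/net/eth0/address)" && echo "eth0 is the primary NIC".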
Set the IP and Netmask. Use the SAME IP and Netmask as the NetWare node that this server is replacing.
Click Hostname and Name Server
For the Cluster nodes, make sure you use the same name and IP as the previous NetWare node. (ie: co-nc1-svr13 on OES2 will have the same name and IP of what it was when it was on NetWare).
Enter the appropriate DNS servers and click OK (double-check that hostname and domain are still correct).
Click the Routing button
Enter the default gateway and click OK (obviously the gateway can differ depending on where the server is installed).
I believe it puts the “configured” NIC on the top now, even though we hopefully configured the second one. Click Next.
Select the VNC Remote Administration so that it is enabled. We choose to use this so that we can use the NRM (Novell Remote Manager) VNC Consoles option. ILO will work as well, albeit slower (and the mouse cursor has issues until you install the HP drivers). (Note, there are other ways to enable VNC as well).
We may change the Proxy section later.
I usually skip the test due to the fact that our firewall policies prevent our servers from accessing the internet.
DO NOT use LDAP with OES. OES uses its own LDAP server (eDirectory). You CANNOT use OpenLDAP and eDir at the same time.
For this, we’d install into the existing tree. Insert the proper tree name.
I also uncheck the Require TLS for Simple Binds. It tends to cause issues if you don’t uncheck it.
We use the IP of our DS Master Replica server.
Enter the admin userid in LDAP format and the password.
Be careful here. The Server context will default to the same spot that your admin user is at. You should really use the same eDir context here that matches where the old NetWare server was at. Enter the server context in LDAP format (there’s no browse button, so you have to know where the server will be installed to). I leave everything else the same.
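Since the installer wants LDAP format and most of us think in NetWare dotted names, here is a small sketch of the conversion. It ASSUMES a simple tree where the rightmost component is an O and everything between it and the leaf is an OU. Trees that use other container types (C, L, DC) have to be written out by hand. The function name and the "servers.site" containers are made up for the example.

```shell
#!/bin/bash
# Convert a typeless eDirectory dotted name to LDAP format.
# Assumption: rightmost component is O=, middle components are OU=.
# $2 is the type of the leftmost component: cn (default) for a user,
# ou (or o) when the name is a container such as the server context.
dotted_to_ldap() {
    local name="${1#.}" leaf="${2:-cn}" out="" i=1 part type count
    local IFS=.
    set -- $name                 # split on dots into $1 $2 ...
    count=$#
    for part in "$@"; do
        if [ "$i" -eq "$count" ] && [ "$count" -gt 1 ]; then
            type=o
        elif [ "$i" -eq 1 ]; then
            type="$leaf"
        else
            type=ou
        fi
        out="${out:+$out,}$type=$part"
        i=$((i + 1))
    done
    printf '%s\n' "$out"
}

dotted_to_ldap admin.dec             # prints cn=admin,o=dec
dotted_to_ldap servers.site.dec ou   # prints ou=servers,ou=site,o=dec
```

Compare the result against what iManager or ConsoleOne shows for the object before typing it into the installer.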
Enter the information here that pertains for your environment. You probably will be using SLP with Directory Agents.
I leave these as-is. Click Next.
Then click Next
Now wait a long time for this and iManager to install.
For now we leave this local. Basically this means that any accounts created on this Linux server are ONLY stored on this server (same for passwords). We don’t plan on creating other “local only” accounts.
Click Next (we don’t define any other local accounts)
Technically at this point, you are finished with the install. However, it is STRONGLY advised that you patch the server before:
- Creating any NSS items
- Enabling MPIO (multi-pathing)
- Doing an ID-Transfer (or any migration with the Migration Utilities)
- Installing NCS or any other services
Once the server is up and running, before creating any NSS partitions or enabling multipathing, we need to apply updates. We have set up an SMT (Subscription Management Tool) server on Linux. SMT is a patch “proxy” server that downloads all the patches from the Novell Customer Center (NCC) so that we don’t have to configure every server to download these patches from the internet. Instead, we point the servers to the SMT server. Think of it as a “lite” version of Patchlink for Linux/OES2.
Use WinSCP (or whatever method you are comfortable with) and transfer the clientSetup4SMT.sh to the root’s home directory of the SLES/OES2 server. (You can actually put it anywhere on the server, but the point is you need to run the script if you are using SMT)
Login to the server (either via SSH, VNC, or ILO) as root. Then open a terminal window.
chmod +x clientSetup4SMT.sh
./clientSetup4SMT.sh --host slesadmin.abc.com
That’s two dashes (--) with no space between them in front of “host”.
Hit Enter and wait
The icon will normally be orange at this point.
It will usually come up and tell you a few patches to update. Update the items listed.
The default list will contain security patches first, followed by “mandatory/recommended” patches to SLES10 and OES2.
I usually apply those (reboot needed I believe)
After that, you’ll usually get a globe icon if everything (including optional patches) is installed.
After patching, you may have problems with iPrint Plugins in iManager on OES2 SP2. Check out TID #7005152. You’ll have to change the property pages for two objects.
Settings for NSS and Symantec NetBackup:
In order for NBU to work properly with NSS we need to do the following:
You need to edit the /etc/opt/novell/nss/nssstart.cfg file and add the two following statements/lines:
That’s an “I” in the CtimeIs statement, not an “l” (the documentation on Symantec’s site is difficult to read).
I restart the server after this to ensure that it’s loaded.
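Because this edit gets repeated on every node, I find it safer to script it so a line never gets added twice. The sketch below is a generic "append if missing" helper. The name append_once is ours, and the statement you pass in is whatever the Symantec documentation gives you (it is deliberately not spelled out here).

```shell
#!/bin/bash
# Append a line to a config file only if that exact line is not already
# present. A one-time backup copy is kept as FILE.orig.
append_once() {
    local file="$1" line="$2"
    [ -e "$file" ] || : > "$file"
    if grep -qxF -- "$line" "$file"; then
        return 0                      # already there, nothing to do
    fi
    if [ ! -e "$file.orig" ]; then
        cp -p "$file" "$file.orig"    # keep the untouched original
    fi
    if [ -n "$(tail -c 1 "$file")" ]; then
        echo >> "$file"               # file did not end with a newline
    fi
    printf '%s\n' "$line" >> "$file"
}

# Usage (run as root; replace the placeholder with the real statement):
#   append_once /etc/opt/novell/nss/nssstart.cfg '<statement from the Symantec docs>'
```

The same helper works for the other "add this line to the file" steps in this guide.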
HP Proliant Support Pack Install
Install the Proliant Support Pack and reboot and verify that other items are mounted and work properly.
First ensure that you have fully patched the system.
Then power down the system and connect the secondary path to the boot LUN (assign the VDISK in the Magnitude Icon Manager).
Then power the system back up.
I usually run the Partitioner to make sure it sees the LUN twice.
This section covers how to enable multipathing (MPIO) when booting from the SAN. As you can see, we have two paths.
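If you prefer a terminal to the Partitioner for checking that the LUN shows up twice, something like the following sketch may help. It groups the sd devices by their SCSI WWID and counts them. The scsi_id invocation shown is the form we believe SLES 10 uses, so treat it as an ASSUMPTION (newer distributions use a different path and options). Both function names are ours.

```shell
#!/bin/bash
# Print "device wwid" for each sd disk.
# ASSUMPTION: SLES 10 style call, /sbin/scsi_id -g -u -s /block/sdX
list_disk_wwids() {
    local dev
    for dev in /sys/block/sd*; do
        [ -e "$dev" ] || continue
        printf '%s %s\n' "${dev##*/}" \
            "$(/sbin/scsi_id -g -u -s "/block/${dev##*/}")"
    done
}

# Read "device wwid" pairs on stdin and print "wwid path-count".
count_paths_per_wwid() {
    awk '{ n[$2]++ } END { for (w in n) print w, n[w] }' | sort
}

# Usage (as root):
#   list_disk_wwids | count_paths_per_wwid
```

With both paths connected, every SAN LUN should show a count of 2.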
Now, we follow TID 3594167 (which states we need a fully patched system, so that’s why I patch first).
So far we’ve done steps 1-4, now we do step 5.
You may need to implement this!!!!!!
Edit the /etc/modprobe.conf.local file to ADD the line as shown below:
Why do we do this? In our cluster setup we have non-contiguous LUN numbers from 0 all the way to 64. In order for the Linux OS to see all the LUNs properly, we needed to add the above item. If you have contiguous LUNs, you may not need these settings. It will increase boot/load time by a few seconds (about 3-5 seconds by my timing in my environment).
Open a terminal and type:
(as per step 5)
Removed Step 7 as SLES 10 SP3 changes the output of the command.
Edit the /etc/multipath.conf file as per Xiotech (adjust per your SAN vendor):
At the prompt type:
Enter the information as shown:
(note the spaces).
We may change round-robin, but I’m not sure yet.
STEP 8 from the Novell TID:
Reboot the server
Open a terminal prompt and type:
Here’s a key for the output of the multipath command:
We need to change a few more items.
Login to NRM (Novell Remote Manager) on the temporary OES2 server, using the following format:
https://dnsnameofserver.abc.com:8009 (same as it is for NetWare)
You must login as admin.dec or the “root” user. You cannot login as yourself just yet.
Click “Manage NCP Services” -> Manage NCP Server
Click the value of “2” next to the OPLOCK_SUPPORT_LEVEL and set to a value of 0 (that’s a zero).
NRM will automatically restart the ndsd process to make the change take effect.
New Item on 11/9/10:
Per TID #7004848, we need to set the First Watchdog Packet:
Set it to 5
We also set the maximum cached subdirectories per volume to be 500,000. Why? We discovered that, unlike NetWare, on very large datasets, OES2 needs to have this setting increased in order for the Novell Client to properly see all the files/folders on some volumes. If you discover that your clients no longer see all the data after converting to OES2 Linux from NetWare, odds are, it’s this setting (or the two above it) that need to be increased. You can also refer to TID #7004888 for more information.
There may be some problems with the watchdog settings at this point. Some people have reported issues after changing the setting to 5 (like it was on NetWare). We had problems when we left it at zero. You can refer to TID #7004848.
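The three NCP settings above can also be applied from a terminal with ncpcon, which saves clicking through NRM on every node. This is only a sketch. OPLOCK_SUPPORT_LEVEL is the name shown in NRM, but FIRST_WATCHDOG_PACKET and MAXIMUM_CACHED_SUBDIRECTORIES_PER_VOLUME are the parameter names as I understand them, so check them against NRM on your server first. By default the script only prints the commands.

```shell
#!/bin/bash
# NCP server settings used in this guide. The last two parameter names
# are ASSUMPTIONS; verify them in NRM before running for real.
ncp_settings="OPLOCK_SUPPORT_LEVEL=0
FIRST_WATCHDOG_PACKET=5
MAXIMUM_CACHED_SUBDIRECTORIES_PER_VOLUME=500000"

# Dry run by default: prints the commands. Set RUN=1 to execute them.
apply_ncp_settings() {
    local setting
    for setting in $ncp_settings; do
        if [ "${RUN:-0}" = "1" ]; then
            ncpcon set "$setting"
        else
            echo "ncpcon set $setting"
        fi
    done
}

# Usage:
#   apply_ncp_settings          # show what would be run
#   RUN=1 apply_ncp_settings    # apply on this node (as root)
```

If the watchdog value of 5 gives you trouble, drop that line from the list.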
Now click the “configure” icon:
Click the “Edit httpstkd config file”
Scroll all the way down to the bottom and ADD the following two lines:
Then click Save Changes.
You now have two choices. You can either restart the entire server, or restart just the httpstkd process to make the email changes take effect:
rcnovell-httpstkd restart (this may take a minute or so)
I like to reboot the server once more at this point and make sure life is good.
Before we begin, you may wish to check a few things.
First, we got bitten by the infamous Panning ID situation. Basically, if your NetWare Cluster is at NW 6.5.8 and is working okay, the odds are you do not have this problem. However, if you are at an earlier release, I strongly advise that you apply NW 6.5 SP8 to one node first and reboot it. If the node refuses to join the cluster, then odds are you have the Panning ID problem. (The gipc.nlm in SP8 was changed from the previous version, and this is how you can tell whether you have the problem.) See the following TID for more information: 7001434 (step 7 has the Panning ID situation covered).
The BEST method to fix this problem, unfortunately, requires that you shut down ALL your cluster nodes and then start them again. If you have a problem getting your first OES2 SP2 Linux node into your cluster, you may have a Panning ID problem and will have to shut down all the cluster nodes and start them again.
Before I install NCS into the first node, I put a read/write replica of the eDirectory partition that contained the Cluster objects onto the server. I don’t believe this is required, but it can help with cluster sync problems.
At this point, I use my SAN utilities to connect the existing disk that hosts my SBD partition to this new OES2 server. You can either reboot the server (probably easiest) or run rescan-scsi-bus.sh, and then verify with multipath -ll that your server sees the disk holding the SBD partition.
Login to the physical node as root (VNC or ILO).
Click Computer -> Yast -> Open Enterprise Server -> OES Install and Configuration:
Check the box next to Novell Cluster Services so that it has a black checkbox.
Wait a few minutes for the files to install (about 2 minutes)
Then a few post-install screens will run (MiCASA, etc.)
At the OES Configuration screen, (wait for it to build), click the “disabled” link underneath the LDAP Configuration for OES. I am not 100% sure that this step is even necessary to be honest, but the Novell docs state to do it anyway.
Generally speaking: There will usually be two IPs here. One is the IP of the actual cluster node you are working on, and the other is probably the IP of a server with replicas on it. In our setup, we have three dedicated LDAP servers that contain replicas of our entire tree. As such, I adjusted the lines here so that I had FOUR IP addresses listed.
The local IP
And the 3 other “remote” IPs of our other LDAP servers with replicas on them.
Your environment may vary.
Now click the LDAP Configuration for Open Enterprise Services link.
Enter the “admin” password.
Click ADD if you wish to add additional LDAP servers.
Scroll down and click the “disabled” link underneath Novell Cluster Services (NCS)
Now click the Novell Cluster Services (NCS) Link.
10.10.1.230 is the IP of our LDAP server that contains ALL replicas. I would try to avoid using the local IP (the 10.10.1.10 above) unless it has replicas of your partitions on it.
Also, there used to be a bug for the Cluster FDN that it had to be case sensitive. In other words, if your eDirectory Cluster object was: CLUSTER1, you had to make sure that you entered it EXACTLY as it appeared in eDirectory. While this is supposedly fixed, I chose to make sure that the case matched anyway.
BE CAREFUL here and make sure you have the proper context specified in LDAP format.
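The case-matching advice above is easy to check before you click Finish. Here is a trivial sketch that compares what you intend to type with the name as it appears in eDirectory and tells you whether they differ only by case. The function name is ours, and CLUSTER1 is just the example name used above.

```shell
#!/bin/bash
# Compare the cluster name you plan to type with the eDirectory name.
# Prints "match", "case-mismatch", or "different".
check_cluster_name() {
    local typed="$1" actual="$2"
    if [ "$typed" = "$actual" ]; then
        echo match
    elif [ "$(printf '%s' "$typed" | tr '[:lower:]' '[:upper:]')" = \
           "$(printf '%s' "$actual" | tr '[:lower:]' '[:upper:]')" ]; then
        echo case-mismatch
    else
        echo different
    fi
}

check_cluster_name cluster1 CLUSTER1    # prints case-mismatch
```

Anything other than "match" means you should retype the name exactly as eDirectory shows it.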
Make sure that “Existing Cluster” is selected.
Make sure the IP listed is the correct IP of the physical node you are installing this on.
UNCHECK the “Start Clustering Services now” and click Finish
Click Next at the next screen.
Click Configure Later and click Next (we use SMT so that’s why, plus we already patched it).
Open a terminal and type:
This should display:
Notice that the cluster object name IS in uppercase (NCS1 vs. ncs1). This verifies that the server does see the SBD disk partition. Your cluster name will probably vary.
Reboot the server.
It may take several minutes AFTER the server is rebooted and loaded for the iManager -> Clusters -> Cluster Manager to show the green dot on the server object:
Yellow dot means that’s the node that is running the Master Cluster IP resource.
A server that is NOT in the cluster has no dots at all.
Note that some of my servers are UPPERCASE and some are lowercase. This is because I entered the servername in lowercase during the OES2 Linux installation. You will find that as you reboot the server with the master Node, it will “update” the server names, and eventually all your servers will show up with lowercase. (If you used lowercase that is). I have not noticed any harm with the change of case for the servername.
At this point, we are ready to begin the actual Rolling Cluster Upgrade of the services themselves. That is in part 2.
- Edit Cluster failover node resources appropriately so that the cluster resource won’t failover to a converted node prematurely.
- Offline the resource and then online to secondary/tertiary NetWare node
- Stop/unload backup software on physical node
- Remove replicas from physical node
- Record IP and DNS name of NetWare Server here:
- Remove NetWare server from tree via NWCONFIG
- Delete the Cluster Node Server object, and all other objects relating to the NetWare server. (Refer to Novell’s documentation).
- DSREPAIR to verify all things are good or use iMonitor
- Resize SAN boot LUN if necessary (ie, the old NetWare DOS/SYS volume partition).
- Unassign ALL “non-sys” vdisks (both paths) from the server.
- Unassign the SECONDARY path for the “SYS” vdisk
- Install OES2 SP2 (but do NOT install Clustering at this point). Use same name and IP from “old” NetWare server.
- Patch server with latest SLES/OES2 Patches.
- Verify SAN/multipathing connectivity.
- HP PSP install (for HP servers)
- NBU install (install your tape backup software)
- Install NCS
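With 16 nodes and a conversion that runs for weeks, it helps to keep the per-node status somewhere other than your head. The sketch below keeps a plain text file with one "node status" pair per line. The file layout, the status words (netware, linux) and the function names are all our own invention, and co-nc1-svr14 is a made-up node name.

```shell
#!/bin/bash
# Record or update the status of one node in a tracking file.
# usage: set_node_status FILE NODE STATUS
set_node_status() {
    local file="$1" node="$2" status="$3" tmp
    tmp="$(mktemp)" || return 1
    if [ -f "$file" ]; then
        awk -v n="$node" '$1 != n' "$file" > "$tmp"   # drop the old entry
    fi
    printf '%s %s\n' "$node" "$status" >> "$tmp"
    sort "$tmp" > "$file"
    rm -f "$tmp"
}

# List the nodes that have not been converted to Linux yet.
# usage: pending_nodes FILE
pending_nodes() {
    awk '$2 != "linux" { print $1 }' "$1"
}

# Usage:
#   set_node_status nodes.txt co-nc1-svr13 netware
#   set_node_status nodes.txt co-nc1-svr13 linux
#   pending_nodes nodes.txt
```

Update the file as each node finishes the checklist, and pending_nodes tells you what is left.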