Several OES services fail due to failed authentication.

  • 7008635
  • 23-May-2011
  • 27-Apr-2012

Environment


Novell Open Enterprise Server 2 (OES 2) Linux

Situation

Several Novell Open Enterprise Server services, such as Novell Cluster Services, fail to start because of an authentication error or an LDAP communication error.

Entries like these appear in /var/log/messages:
id: nds_nss_GetGroupsbyMember: Failed to init socket, status = 0

kernel: CLUSTER-<FATAL>-<2006>: Failed to read node number from NDS

[NCPL]: Unable to login Error code:34970

[NCPL]: Unable to get configuration from eDirectory failed with error 25'
[NCPL]: Unable to get the configuration information from eDirectory, error value : failure


During the installation of the server, a local Linux user named admin was created.
The tree has more than one account named 'admin' residing in different organizational units (OU).

'id admin' shows something like:
uid=1000(admin) gid=100(users) groups=100(users),16(dialout),33(video)
Or
uid=605(admin) gid=600(admingroup) groups=600(admingroup)
However, the "Linux Profile" of the admin account in eDirectory shows different values for both the UID and the GID.
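
The mismatch described above can be checked with a small script. The following is only a minimal sketch: check_ids is a hypothetical helper name, and the expected UID and GID have to be read manually from the "Linux Profile" in eDirectory and passed in as arguments.

```shell
#!/bin/sh
# check_ids USER EXPECTED_UID EXPECTED_GID   (hypothetical helper)
# Compares the UID/GID that the server resolves for USER with the values
# taken by hand from the "Linux Profile" in eDirectory.
# Returns 0 on a match, 1 on a mismatch, 2 if USER cannot be resolved.
check_ids() {
    user=$1
    want_uid=$2
    want_gid=$3
    got_uid=$(id -u "$user") || return 2
    got_gid=$(id -g "$user") || return 2
    if [ "$got_uid" = "$want_uid" ] && [ "$got_gid" = "$want_gid" ]; then
        echo "OK: $user resolves to uid=$got_uid gid=$got_gid"
        return 0
    fi
    echo "MISMATCH: $user resolves to uid=$got_uid gid=$got_gid, expected uid=$want_uid gid=$want_gid"
    return 1
}
```

For example, if the Linux Profile shows UID 600 and GID 600, 'check_ids admin 600 600' reports a mismatch whenever the server resolves a different admin account.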

Resolution

If crucial services such as LDAP, LDAPS, NAM and NDS are working properly, and the LDAP server in use holds a replica of the required information, a further possible cause is that several admin accounts exist, including a local one.

Using, for instance, 'yast users', set the filter to "Local Users" and verify whether a local user named admin (or ADMIN) exists. If so, delete this POSIX account or rename it to, e.g., ladmin (for local or Linux admin).
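
As an alternative to the YaST filter, the local account file can be inspected directly. The sketch below assumes that local users are stored in /etc/passwd; find_local_user is a hypothetical helper name. It compares login names without regard to case, so that both admin and ADMIN are found.

```shell
#!/bin/sh
# find_local_user NAME [PASSWD_FILE]   (hypothetical helper)
# Prints every entry of PASSWD_FILE (default: /etc/passwd) whose login name
# equals NAME, ignoring case. Returns 0 if at least one entry matched,
# 1 otherwise.
find_local_user() {
    name=$1
    file=${2:-/etc/passwd}
    awk -F: -v n="$name" '
        tolower($1) == tolower(n) { print; found = 1 }
        END { exit(found ? 0 : 1) }
    ' "$file"
}
```

Running 'find_local_user admin' prints the matching local entry, if any; no output and a non-zero exit status mean that no local account of that name exists.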

After renaming or deleting the local admin user, 'id admin' should return something like:
uid=600(admin) gid=600(admingroup) groups=600(admingroup)

The UID and GID may differ on your system. Check the "Linux Profile" of your [root] admin in eDirectory; running 'id admin' at the server prompt should return those values.
If there is no local user named admin, and the values from 'id admin' differ from the values stored in the "Linux Profile" in eDirectory, verify whether more than one user with cn admin exists in eDirectory.
If there is more than one admin account in the tree and they are LUM-enabled, the server is very likely to use the wrong user 'admin': when the server / NAM looks up 'admin', it matches only on the common name (cn), not on the full distinguished name.
For instance, cn=admin,o=Novell and cn=admin,ou=NTS,o=Novell will both be listed as 'admin'.
When using Linux User Management, it is crucial that every user in eDirectory has a unique ID, so that there is only one account named admin throughout the tree.
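
To illustrate the name collision, the following sketch reports common names that occur more than once in a list of distinguished names. It assumes the DNs of the LUM-enabled users have already been exported, one per line, with whatever LDAP tool is available; duplicate_cns is a hypothetical helper name. Names are compared in lowercase, so admin and ADMIN count as the same name.

```shell
#!/bin/sh
# duplicate_cns   (hypothetical helper)
# Reads one distinguished name per line on standard input, for example
# "cn=admin,ou=NTS,o=Novell", and prints every common name that occurs
# more than once, followed by the DNs that share it.
# Simplification: assumes the cn is the first RDN and contains no
# escaped commas.
duplicate_cns() {
    awk -F, '
        /^[ \t]*$/ { next }                 # skip blank lines
        {
            cn = tolower($1)
            sub(/^[ \t]*cn=/, "", cn)       # strip the attribute name
            count[cn]++
            dns[cn] = dns[cn] "\n  " $0
        }
        END {
            for (cn in count)
                if (count[cn] > 1)
                    print cn ":" dns[cn]
        }
    '
}
```

Fed with the two DNs from the example above, the function lists both of them under the name admin; an empty result means every common name in the list is unique.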


After the duplicate admin is resolved, it may be necessary to update the services with the correct credentials.
If the local user was created during the initial installation phase of the server, system users (such as the nssAdminUser or the common proxy user) may never have been created, because at that point the server was already trying to authenticate to eDirectory using the local admin or an admin in another OU instead of the LUM-enabled [root] admin.
It is also possible that the eDirectory [root] admin was never LUM-enabled for this server.

Additional Information

Novell Open Enterprise Server 2 Linux can have several PAM authentication modules, but goes through them in a fixed order; by default it uses the compat modules first, then continues with nam.
Therefore, if there is a POSIX admin account, the system will use this account first and will not be able to authenticate and communicate with eDirectory.

The reason for this is that by default, on OES, /etc/nsswitch.conf is configured as follows:
passwd: compat nam
group:  compat nam

This causes the server to look up users in /etc/passwd first, then in NAM.
Changing this to 'nam compat' is not a valid option, as it will more than likely cause severe issues: the system may hang while trying to authenticate any given user, including root, if NAM or even the network is not available.
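
The configured lookup order can be read back from the file for verification. This is a minimal sketch; nss_sources is a hypothetical helper name, and it assumes the usual layout in which the database name and its colon are followed by whitespace.

```shell
#!/bin/sh
# nss_sources DATABASE [FILE]   (hypothetical helper)
# Prints the lookup sources configured for DATABASE (e.g. passwd or group)
# in FILE (default: /etc/nsswitch.conf), in the order they are consulted.
nss_sources() {
    db=$1
    file=${2:-/etc/nsswitch.conf}
    awk -v db="$db" '
        { sub(/#.*/, "") }              # drop comments
        $1 == (db ":") {
            $1 = ""                     # remove the database name
            sub(/^[ \t]+/, "")          # trim the leading separator
            print
            exit
        }
    ' "$file"
}
```

On a default OES installation, 'nss_sources passwd' would be expected to print the compat and nam sources in that order. The entry that actually wins for a given name can then be shown with 'getent passwd admin'.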

Several OES services rely on the admin account for their initial communication with eDirectory. If this communication fails because the user that is used has an incorrect GID, UID, GUID or password, the initial authentication fails, and therefore the OES services fail.


By default, an unaltered SUSE Linux Enterprise Server 10 starts handing out UIDs from 1000 and GIDs from 100, so the first user created locally gets UID 1000.

When the eDirectory tree was LUM-enabled with the first Open Enterprise Server installed in the tree, or when the tree was created with the first Open Enterprise Server, NAM / LUM starts handing out UIDs and GIDs from 600.
In such a case, admin (or the administrative [root] account for eDirectory) most likely received UID 600, as it is generally the first user that is LUM-enabled for the first OES Linux server in the tree.
However, if the tree was LUM-enabled prior to the installation of the first OES Linux server, and the UNIX Config was altered to hand out different UIDs and GIDs, the LUM-enabled admin account may have a UID or GID other than 600.
Nonetheless, 'id admin' should return the values as they are stored in eDirectory. If it returns different values, then either the server has a local user named admin, an admin user is available via another PAM module, or more than one user named 'admin' resides in the eDirectory tree.


As of Novell Open Enterprise Server 2 SP1, NAM is case-insensitive. From that point on, the server is also affected when the [root] admin is stored in uppercase in eDirectory (as it is when the user was created on NetWare, or by choice due to company standards).
On a server that has NAM configured to be case-sensitive, 'id ADMIN' can return a different result than 'id admin'; when NAM is case-insensitive, it returns the local admin.