OES11 kernel core inside function ZIO_GatherDetailedSummaryInformation().

  • 7015685
  • 24-Sep-2014
  • 24-Sep-2014

Environment

Novell Open Enterprise Server 11 (OES 11) Linux Support Pack 1
Novell Open Enterprise Server 11 (OES 11) Linux Support Pack 2

Situation

Various customers running OES11 SP1 and OES11 SP2 at various patch levels, with different third-party services ranging from Tivoli backup software to HP Insight Management agents, have witnessed a number of kernel cores.

The common observation in all these cases is that NSS volumes were activated or deactivated shortly before the kernel crash.

Resolution

The statfs logic and the latching code have been fixed.

The solution to this issue will be released in an upcoming OES 11 Scheduled Maintenance patch.

Cause

Statfs threads try to access the NSS pool's Internal Volume (_IV_) of the pool's PurgeTree beast inside the function ZIO_GatherDetailedSummaryInformation().
NSS performs these calls using zAPI calls, and the 'Volume_s' member in the PurgeTree's RootBeast_s is NULL because the PurgeTree beast has already been freed. Dereferencing that NULL pointer causes the kernel core.
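
The failure pattern described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the class Pool, its latch and purge_tree attributes, and the functions statfs_unsafe and statfs_guarded are hypothetical names chosen for illustration. It does not reproduce the NSS code or the actual fix; it only shows why a reader that does not re-check state under a latch can hit an already freed object.

```python
import threading


class Pool:
    """Toy stand-in for an NSS pool. All names here are hypothetical."""

    def __init__(self, free_blocks):
        self.latch = threading.Lock()
        # Models the PurgeTree beast holding a reference to its volume.
        self.purge_tree = {"volume": {"free_blocks": free_blocks}}

    def deactivate(self):
        # Tear-down takes the latch so no guarded reader sees a freed tree.
        with self.latch:
            self.purge_tree = None


def statfs_unsafe(pool):
    # Mirrors the failing pattern: dereference without checking the state.
    return pool.purge_tree["volume"]["free_blocks"]


def statfs_guarded(pool):
    # Re-check under the latch and report "no data" after deactivation.
    with pool.latch:
        tree = pool.purge_tree
        if tree is None or tree.get("volume") is None:
            return None
        return tree["volume"]["free_blocks"]
```

In this model, calling statfs_unsafe() after deactivate() raises a TypeError, which is the user-space analogue of the NULL pointer dereference, while statfs_guarded() returns None.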

Additional Information

A backtrace from a kernel core of an OES11 SP1 server running HP Insight Management agents:

[1379489.349427] BUG: unable to handle kernel NULL pointer dereference at 0000000000000618
[1379489.349439] IP: [<ffffffffa08dde99>] ZIO_GatherDetailedSummaryInformation+0x69/0x15c [nsszlss]
[1379489.349480] PGD 48b30a067 PUD 48c6e5067 PMD 0
[1379489.349486] Oops: 0000 [#1] SMP
[1379489.349492] CPU 4
[1379489.349495] Modules linked in: lp parport_pc st ide_cd_mod ide_core ppdev parport af_packet novfs cma cmsg crm cvb css vipx sbd gipc vll sbdlib clstrlib ncs_timer nebdrv zapi nsslsa nssmanage nsszlss nsscomn ndpmod nss nsslibrary nsslnxlib libnss admindrv nwraid mptctl mptbase tcp_diag inet_diag binfmt_misc edd adminfs adminfsdrv bonding cpufreq_conservative cpufreq_userspace cpufreq_powersave pcc_cpufreq mperf microcode fuse ext2 loop dm_round_robin dm_multipath hpilo bnx2 rtc_cmos sr_mod cdrom joydev iTCO_wdt hpwdt acpi_power_meter i7core_edac edac_core ipv6_lib pcspkr iTCO_vendor_support button container sg ext3 jbd mbcache dm_mirror dm_region_hash dm_log linear sd_mod crc_t10dif usbhid hid uhci_hcd ehci_hcd thermal usbcore usb_common qla2xxx scsi_transport_fc scsi_tgt processor thermal_sys hwmon scsi_dh_alua scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh dm_snapshot dm_mod ata_generic ata_piix libata hpsa cciss scsi_mod [last unloaded: parport_pc]
[1379489.349601] Supported: Yes
[1379489.349604]
[1379489.349608] Pid: 6379, comm: cmahostd Not tainted 3.0.80-0.5-default #1 HP ProLiant DL380 G7
[1379489.349616] RIP: 0010:[<ffffffffa08dde99>]  [<ffffffffa08dde99>] ZIO_GatherDetailedSummaryInformation+0x69/0x15c [nsszlss]
[1379489.349650] RSP: 0018:ffff88048b161960  EFLAGS: 00010246
[1379489.349655] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88048fb10c10
[1379489.349660] RDX: ffff88048fb10cb0 RSI: 0000000000000000 RDI: ffffc90017718998
[1379489.349665] RBP: ffffc90017718998 R08: 0000000000000001 R09: 0000000000000000
[1379489.349670] R10: ffff88031fc51220 R11: ffffffff810f2810 R12: ffff88048fb10c10
[1379489.349676] R13: ffff88024242a010 R14: ffffffffa09607dc R15: ffffffffa09607d8
[1379489.349682] FS:  00007f223f52b700(0000) GS:ffff88031fc40000(0000) knlGS:0000000000000000
[1379489.349687] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1379489.349692] CR2: 0000000000000618 CR3: 000000049060b000 CR4: 00000000000006e0
[1379489.349697] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[1379489.349702] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[1379489.349708] Process cmahostd (pid: 6379, threadinfo ffff88048b160000, task ffff88048c6d2280)
[1379489.349713] Stack:
[1379489.349716]  ffffffffa08e0278 ffff88030b96f410 ffff88048b161c38 ffff88024242a010
[1379489.349724]  ffff8802424e0a10 0000000000000000 ffff88048b161a68 ffffc90017718998
[1379489.349732]  0000000000000000 ffff88048b161c38 ffffffffa09553e0 ffff88048b161b38
[1379489.349739] Call Trace:
[1379489.349860]  [<ffffffffa08e0278>] ZFSMAL_ReadBlk+0x27b/0x3e4 [nsszlss]
[1379489.349963]  [<ffffffffa08e08f6>] ZFS_ReadPoolBlk+0x17c/0x1e6 [nsszlss]
[1379489.350063]  [<ffffffffa08f6d15>] MYBT_ReadPoolBlk+0x2c/0xea [nsszlss]
[1379489.350175]  [<ffffffffa08f8d46>] MYBT_browseEntries+0x145/0x4c7 [nsszlss]
[1379489.350287]  [<ffffffffa08fa3a9>] getOldestDeletedTime+0x8d/0xbe [nsszlss]
[1379489.350402]  [<ffffffffa08d1057>] ZFSVOL_BST_GetInfo+0x9b/0xcb [nsszlss]
[1379489.350504]  [<ffffffffa06bdf7b>] COMN_GetInfoByBeastPtr+0x2b8/0x351 [nsscomn]
[1379489.350573]  [<ffffffffa070907f>] MSG_GetInfo+0x19c/0x1d2 [nsscomn]
[1379489.350642]  [<ffffffffa06347be>] MSG_Call+0xba/0x101 [nss]
[1379489.350672]  [<ffffffffa0634852>] zMSG_Call+0x2e/0x4a [nss]
[1379489.350729]  [<ffffffffa074957f>] zGetInfo+0x88/0xaa [nsscomn]
[1379489.350845]  [<ffffffffa09857f9>] lsa_pool_statfs+0xd7/0x1e1 [nsslsa]
[1379489.350861]  [<ffffffff8117ce09>] statfs_by_dentry+0x69/0x80
[1379489.350869]  [<ffffffff8117cf35>] vfs_statfs+0x15/0xa0
[1379489.350877]  [<ffffffff8117d28c>] user_statfs+0x3c/0x60
[1379489.350884]  [<ffffffff8117d3e8>] sys_statfs+0x18/0x110
[1379489.350893]  [<ffffffff8144ed12>] system_call_fastpath+0x16/0x1b
[1379489.350905]  [<00007f223dd48857>] 0x7f223dd48856
[1379489.350909] Code: 48 83 fa 29 74 25 48 83 fa 34 74 0f 48 83 fa 28 b8 00 00 00 00 48 0f 44 c1 eb 10 48 8b 81 b8 08 00 00 eb 07 48 8b 81 90 00 00 00 <48> 8b 90 18 06 00 00 48 83 ba 90 0a 00 00 00 0f 84 dc 00 00 00
[1379489.350948] RIP  [<ffffffffa08dde99>] ZIO_GatherDetailedSummaryInformation+0x69/0x15c [nsszlss]
[1379489.350981]  RSP <ffff88048b161960>
[1379489.350985] CR2: 0000000000000618


A backtrace from a kernel core of an OES11 SP2 server running Tivoli Storage Manager:

[1666271.517303] Oops: 0000 [#1] SMP
[1666271.517308] CPU 11
[1666271.517310] Modules linked in: lp parport_pc st sr_mod ide_cd_mod ide_core cdrom ppdev parport af_packet novfs(F) cma(F) cmsg(F) crm(F) cvb(F) css(F) vipx(F) sbd(F) gipc(F) vll(F) sbdlib(F) clstrlib(F) ncs_timer(F) nebdrv(F) zapi(F) nsslsa(F) nssmanage(F) nsszlss(F) nsscomn(F) ndpmod(F) nss(F) nsslibrary(F) nsslnxlib(F) libnss(F) admindrv(F) nwraid(F) tcp_diag inet_diag binfmt_misc adminfs(F) adminfsdrv(F) cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf fuse ext2 loop ipv6_lib ioatdma joydev pcspkr iTCO_wdt iTCO_vendor_support i7core_edac i2c_i801 edac_core dca rtc_cmos button ac enic sg acpi_power_meter mptctl dm_mirror dm_region_hash dm_log linear ttm drm_kms_helper drm i2c_algo_bit sysimgblt sysfillrect i2c_core syscopyarea usbhid hid uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif dm_service_time dm_least_pending dm_queue_length dm_round_robin scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_dh_hp_sw dm_snapshot edd ext3 mbcache jbd fan thermal processor thermal_sys hwmon fnic libfcoe libfc scsi_transport_fc scsi_tgt mptsas mptscsih mptbase scsi_transport_sas dm_multipath dm_mod scsi_dh scsi_mod [last unloaded: parport_pc]
[1666271.517393] Supported: Yes
[1666271.517395]
[1666271.517398] Pid: 24490, comm: dsmc Tainted: GF             3.0.101-0.29-default #1 Cisco Systems Inc N20-B6625-1/N20-B6625-1
[1666271.517404] RIP: 0010:[<ffffffffa09d03f9>]  [<ffffffffa09d03f9>] ZIO_GatherDetailedSummaryInformation+0x69/0x15c [nsszlss]
[1666271.517426] RSP: 0018:ffff880146b759d0  EFLAGS: 00010246
[1666271.517428] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff8801edfdd010
[1666271.517432] RDX: ffff8801edfdd0b0 RSI: 0000000000000000 RDI: ffffc90025de3118
[1666271.517435] RBP: ffffc90025de3118 R08: 0000000000000001 R09: 00000000ffffffff
[1666271.517438] R10: ffff88046fcb18a0 R11: ffffffff810fb680 R12: ffff8801edfdd010
[1666271.517442] R13: ffff880438b4a010 R14: ffffffffa0a527dc R15: ffffffffa0a527d8
[1666271.517445] FS:  0000000000000000(0000) GS:ffff88046fca0000(0063) knlGS:00000000f5eb6b70
[1666271.517449] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
[1666271.517452] CR2: 0000000000000618 CR3: 0000000193569000 CR4: 00000000000007e0
[1666271.517455] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[1666271.517459] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[1666271.517462] Process dsmc (pid: 24490, threadinfo ffff880146b74000, task ffff880109082180)
[1666271.517466] Stack:
[1666271.517467]  ffffffffa09d27d8 ffff88039c9ca348 ffff880146b75ca8 ffff880438b4a010
[1666271.517474]  ffff880449780a10 0000000000000000 ffff880146b75ad8 ffffc90025de3118
[1666271.517479]  0000000000000000 ffff880146b75ca8 ffffffffa0a473e0 ffff880146b75ba8
[1666271.517485] Call Trace:
[1666271.517597]  [<ffffffffa09d27d8>] ZFSMAL_ReadBlk+0x27b/0x3e4 [nsszlss]
[1666271.517663]  [<ffffffffa09d2e56>] ZFS_ReadPoolBlk+0x17c/0x1e6 [nsszlss]
[1666271.517727]  [<ffffffffa09e9275>] MYBT_ReadPoolBlk+0x2c/0xea [nsszlss]
[1666271.517802]  [<ffffffffa09eb2a6>] MYBT_browseEntries+0x145/0x4c7 [nsszlss]
[1666271.517874]  [<ffffffffa09ec909>] getOldestDeletedTime+0x8d/0xbe [nsszlss]
[1666271.517948]  [<ffffffffa09c35b7>] ZFSVOL_BST_GetInfo+0x9b/0xcb [nsszlss]
[1666271.518034]  [<ffffffffa07f0b0b>] COMN_GetInfoByBeastPtr+0x2b8/0x351 [nsscomn]
[1666271.518087]  [<ffffffffa083cf44>] MSG_GetInfo+0x19c/0x1d2 [nsscomn]
[1666271.518151]  [<ffffffffa04e3a1e>] MSG_Call+0xba/0x101 [nss]
[1666271.518179]  [<ffffffffa04e3ab2>] zMSG_Call+0x2e/0x4a [nss]
[1666271.518217]  [<ffffffffa087d77f>] zGetInfo+0x88/0xaa [nsscomn]
[1666271.518308]  [<ffffffffa054b7d9>] lsa_pool_statfs+0xd7/0x1e1 [nsslsa]
[1666271.518320]  [<ffffffff81186a09>] statfs_by_dentry+0x69/0x80
[1666271.518327]  [<ffffffff81186b35>] vfs_statfs+0x15/0xa0
[1666271.518331]  [<ffffffff81186e8c>] user_statfs+0x3c/0x60
[1666271.518337]  [<ffffffff811a4068>] compat_sys_statfs64+0x48/0x70
[1666271.518346]  [<ffffffff8146a230>] sysenter_dispatch+0x7/0x2e
[1666271.518352] Code: 48 83 fa 29 74 25 48 83 fa 34 74 0f 48 83 fa 28 b8 00 00 00 00 48 0f 44 c1 eb 10 48 8b 81 b8 08 00 00 eb 07 48 8b 81 90 00 00 00 <48> 8b 90 18 06 00 00 48 83 ba 90 0a 00 00 00 0f 84 dc 00 00 00
[1666271.518378] RIP  [<ffffffffa09d03f9>] ZIO_GatherDetailedSummaryInformation+0x69/0x15c [nsszlss]
[1666271.518400]  RSP <ffff880146b759d0>
[1666271.518402] CR2: 0000000000000618
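
Both backtraces end with the same faulting function, module, and CR2 value. To check whether another kernel core shows the same signature, the closing RIP and CR2 lines of an oops can be parsed with a short script. The following is a minimal sketch, assuming the log uses the same line layout as the traces above. The function name oops_signature and both regular expressions are illustrative and not part of any Novell tool. The offset and CR2 value are taken only from these two traces and could differ in other builds.

```python
import re

# Matches the closing "RIP  [<addr>] func+0xOFF/0xLEN [module]" line.
RIP_LINE = re.compile(
    r"RIP\s+\[<[0-9a-f]+>\]\s+(\w+)\+0x([0-9a-f]+)/0x[0-9a-f]+\s+\[(\w+)\]"
)
# Matches the closing "CR2: <16 hex digits>" line (faulting address).
CR2_LINE = re.compile(r"CR2:\s+([0-9a-f]{16})\s*$")


def oops_signature(log_text):
    """Return (function, offset, module, fault_address) from an oops log.

    Fields that cannot be found are returned as None.
    """
    func = offset = module = cr2 = None
    for line in log_text.splitlines():
        m = RIP_LINE.search(line)
        if m:
            func, offset, module = m.group(1), int(m.group(2), 16), m.group(3)
        m = CR2_LINE.search(line)
        if m:
            cr2 = int(m.group(1), 16)
    return func, offset, module, cr2
```

A result whose function is ZIO_GatherDetailedSummaryInformation in module nsszlss, with a fault address of 0x618, matches the two cores shown in this document.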