Solaris Cluster 3.3 node panic on pxfs global UFS filesystem with "segmap_release: Bad Addr 0"
(Doc ID 1995972.1)
Last updated on AUGUST 12, 2020
Applies to:
Solaris Cluster - Version 3.3 U1 and later
Information in this document applies to any platform.
Symptoms
Solaris Cluster 3.3 system panics with the following errors:
unix: [ID 361096 kern.warning] WARNING: hat_kpm_mapin: pp zero or not locked
cl_dlpitrans: [ID 624622 kern.notice] Notifying cluster that this node is panicking
unix: [ID 836849 kern.notice]
panic[cpu8]/thread=300264d0460:
unix: [ID 683362 kern.notice] segmap_release: bad addr 0
unix: [ID 100000 kern.notice]
genunix: [ID 723222 kern.notice] 000002a10015f700 genunix:segmap_release+3ac (19162f0, 0, 0, 8000000000000000, 1, 2a750000000)
The HAT referenced in hat_kpm_mapin is the kernel's Hardware Address Translation layer, which is not part of the PxFS code base. Segkpm maps all of physical memory into a single kernel segment, so kernel code can reach any physical page through it. In this case an invalid address was passed in, so the panic is caused not by faulty memory handling or address translation code, but by the address that was handed to it. That bad address originates in the Solaris Cluster PxFS code.
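To see where such an address enters the picture, the following is a minimal sketch of the segmap read loop used by Solaris filesystem read paths (compare ufs_read()), under the assumption that PxFS read_cache follows the same getmapflt/uiomove/release pattern. The function my_fs_read() and its simplified loop are hypothetical; segmap_getmapflt(), uiomove(), and segmap_release() are the actual kernel interfaces involved in this panic.

/*
 * Minimal sketch of the segmap read loop used by Solaris filesystem
 * read paths (compare ufs_read(); PxFS read_cache is assumed to
 * follow the same pattern).  my_fs_read() is hypothetical and for
 * illustration only.
 */
#include <sys/param.h>
#include <sys/sysmacros.h>
#include <sys/types.h>
#include <sys/vnode.h>
#include <sys/uio.h>
#include <vm/seg.h>
#include <vm/seg_map.h>
#include <vm/seg_kmem.h>

static int
my_fs_read(vnode_t *vp, uio_t *uiop)
{
	int error = 0;

	while (uiop->uio_resid > 0 && error == 0) {
		u_offset_t uoff = uiop->uio_loffset;
		u_offset_t off = uoff & (u_offset_t)MAXBMASK;
		int mapon = (int)(uoff & (u_offset_t)MAXBOFFSET);
		size_t n = MIN(MAXBSIZE - mapon, uiop->uio_resid);
		caddr_t base;

		/*
		 * Map a window of the file into the kernel's segkmap
		 * segment.  On 64-bit kernels this mapping is usually
		 * serviced through segkpm; an invalid backing page is
		 * what produces the "hat_kpm_mapin: pp zero or not
		 * locked" warning seen in the messages above.
		 */
		base = segmap_getmapflt(segkmap, vp, off + mapon,
		    n, 1, S_READ);

		/* Copy the mapped file data out to the caller. */
		error = uiomove(base + mapon, n, UIO_READ, uiop);

		/*
		 * Release the mapping.  segmap_release() verifies that
		 * the address lies inside segkmap and panics with
		 * "segmap_release: bad addr" otherwise; a base address
		 * of 0, as in this bug, trips exactly that check.
		 */
		(void) segmap_release(segkmap, base, 0);
	}
	return (error);
}

If a read path computes a zero base address and passes it through the release step, segmap_release() panics exactly as shown in the stack frame above.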
The corefile must be analyzed by Solaris Cluster Support to confirm that this is the issue being hit. In short, the vnode_t (virtual node) will reside on a global PxFS filesystem, and the stack frame for pxfs:__1cFpxregKread_cache6MpnDuio_ipnEcred will show an addr of 0 (zero). This accounts for both the unix warning in /var/adm/messages, "hat_kpm_mapin: pp zero or not locked", and the panic string "segmap_release: bad addr 0". The panic occurs only on the node where the file access took place, and only with large files (larger than 4 GB). It was observed with the IBM WTX application; we could not reproduce the panic with large files outside of that application.
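Before engaging Solaris Cluster Support, the panic can be cross-checked against the crash dump with mdb(1). The session below is a minimal sketch: unix.0 and vmcore.0 are placeholder dump file names under /var/crash, while ::panicinfo, ::msgbuf, and $C are standard mdb dcmds.

# cd /var/crash/<hostname>
# mdb unix.0 vmcore.0
> ::panicinfo
> ::msgbuf
> $C

::msgbuf should show the hat_kpm_mapin warning, and $C should show segmap_release called from the PxFS read_cache frame with an address argument of 0, matching the stack line in the symptoms above.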
Changes
Cause