UEK4 drops NFS shares intermittently because the kernel invalidates dentries that are not actually invalid.
Last updated on MAY 23, 2018
Applies to: Linux OS - Version Oracle Linux 6.7 with Unbreakable Enterprise Kernel [4.1.12] and later
The customer's environment has two servers running Oracle Linux 6 with the UEK4 kernel. Both mount an NFS share from another server, which in turn uses multiple NFS shares.
The NFS mounts are "stacked": the first NFS mount provides the mount points for the next NFS mounts.
The shares are occasionally dropped. This occurs when the NFS server is temporarily unreachable and a userspace process performs an operation on the nested filesystem.
If the process is interrupted while it is waiting in the kernel for the parent filesystem to become reachable again, the "mount" command still shows the share as mounted, but the permissions and ownership of the mount point have reverted to those of the underlying directory.
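To check whether a client has the stacked layout described above, the mount table can be scanned for NFS mount points that sit beneath another NFS mount point. The following is a minimal sketch, not part of the original note: the function name `find_stacked_nfs_mounts` and the sample host and path names are hypothetical, and it assumes input in the `/proc/mounts` layout (device, mount point, filesystem type, options, ...) with no escaped spaces in the paths.

```python
def find_stacked_nfs_mounts(mounts_text):
    """Return (parent, child) pairs where both are NFS mount points
    and the child mount point lies beneath the parent."""
    nfs_points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        if fs_type in ("nfs", "nfs4"):
            nfs_points.append(mount_point)

    pairs = []
    for parent in nfs_points:
        prefix = parent.rstrip("/") + "/"
        for child in nfs_points:
            if child != parent and child.startswith(prefix):
                pairs.append((parent, child))
    return pairs


# Hypothetical sample in /proc/mounts layout (names are illustrative only).
SAMPLE = """\
/dev/sda1 / ext4 rw 0 0
filer1:/export/data /data nfs rw,hard 0 0
filer2:/export/app /data/app nfs rw,hard 0 0
"""

stacked = find_stacked_nfs_mounts(SAMPLE)
print(stacked)
```

On a live client the same function could be fed the contents of `/proc/mounts`, for example `find_stacked_nfs_mounts(open("/proc/mounts").read())`.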
The following is observed in /var/log/messages when the NFS mount is dropped:
Apr 15 15:28:02 server kernel: [4561339.604452] nfs_revalidate_inode (0:34/4) getattr failed, error=-512
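The error value in this message can be read as a kernel-internal errno: 512 is ERESTARTSYS in the Linux kernel sources, the code returned when a sleeping system call is interrupted by a signal, which is consistent with the interrupted process described above. The sketch below pulls the fields out of such a line. The function name `parse_getattr_failure` is hypothetical, and reading `0:34` as the device major:minor of the NFS superblock and `4` as the file ID is an assumption about the message layout, not something the note states.

```python
import re

# Kernel-internal errno values are not exposed through Python's errno
# module, so the one relevant here is listed by hand.
KERNEL_INTERNAL_ERRNOS = {512: "ERESTARTSYS"}

# Assumed layout: nfs_revalidate_inode (<major>:<minor>/<fileid>) ... error=<n>
PATTERN = re.compile(
    r"nfs_revalidate_inode \((\d+):(\d+)/(\d+)\) getattr failed, error=(-?\d+)"
)


def parse_getattr_failure(line):
    """Return the parsed fields of a getattr failure line, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    major, minor, fileid, error = (int(g) for g in match.groups())
    return {
        "device": (major, minor),
        "fileid": fileid,
        "error": error,
        "error_name": KERNEL_INTERNAL_ERRNOS.get(abs(error), "unknown"),
    }


LINE = ("Apr 15 15:28:02 server kernel: [4561339.604452] "
        "nfs_revalidate_inode (0:34/4) getattr failed, error=-512")
parsed = parse_getattr_failure(LINE)
print(parsed)
```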
The following messages are also seen at times:
/var/log/messages:Mar 18 17:37:45 server kernel: [2149934.481407] RPC: fragment too large: 1195725856
/var/log/messages:Mar 18 17:38:14 server kernel: [2149963.557052] RPC: fragment too large: 16777216
/var/log/messages:Mar 18 17:40:50 server kernel: [2150119.175443] RPC: fragment too large: 50331667
/var/log/messages-20180311.gz:Mar 4 14:32:26 server kernel: [929221.399472] RPC: fragment too large: 1195725856
/var/log/messages-20180311.gz:Mar 4 14:35:01 server kernel: [929376.852468] RPC: fragment too large: 50331667
/var/log/messages-20180311.gz:Mar 4 14:45:19 server kernel: [929994.399785] RPC: fragment too large: 1195725856
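One way to examine these values is to view them as the four bytes they were read from. RPC over TCP prefixes each fragment with a 4-byte big-endian record marker, so an implausibly large size often means the first four bytes received were not an RPC record marker at all. The sketch below is an illustration under that assumption, not a diagnosis from the original note; the function name `decode_fragment_size` is hypothetical.

```python
import struct


def decode_fragment_size(size):
    """Return (hex string, printable text) for a logged fragment size,
    treating it as a 4-byte big-endian value."""
    raw = struct.pack(">I", size)
    # Replace non-printable bytes with '.' so the text stays readable.
    text = "".join(chr(b) if 32 <= b < 127 else "." for b in raw)
    return raw.hex(), text


# The three distinct sizes that appear in the log excerpts above.
decoded = {size: decode_fragment_size(size)
           for size in (1195725856, 16777216, 50331667)}
for size, (hex_bytes, text) in decoded.items():
    print(size, hex_bytes, repr(text))
```

The value 1195725856 decodes to the bytes `47 45 54 20`, which are the ASCII characters "GET " (with a trailing space). That would be consistent with non-RPC traffic, such as an HTTP request, arriving on the RPC port, although the log lines alone do not confirm the source. The other two values do not decode to printable text.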