HSM: NFS Server Running Out Of Threads
(Doc ID 2678975.1)
Last updated on JUNE 15, 2020
Applies to:
Oracle Hierarchical Storage Manager (HSM) and StorageTek QFS Software - Version 6.0 and later
Solaris Operating System - Version 11.4 and later
Information in this document applies to any platform.
The customer has an Oracle HSM (SAM-FS) file system running on an Oracle Z4-4 disk appliance. The system serves almost exclusively as an NFS server, sharing that HSM file system with more than 60 clients.
NFS client connections are getting hung.
The number of NFS threads suddenly leaps from 50 to 4096, and the clients start hanging.
They see this same behavior when using rsync, cp, scp, cpio, and a local archive command.
It appears that only clients running Linux RHEL-7 are experiencing the issue, where writing a file to the SAM file server over NFS is causing the Oracle server to run out of NFSD threads. Other client OSes are not generating the problem.
Adding the "sync" flag to the NFS mounts on those RHEL-7 systems fixes the problem. The NFSD threads are no longer used up, and other connections such as additional transfers or df/ls commands do not hang. However, the downside is that transfer speeds drop from around 100 Mbps to around 10 Mbps.
Changing the NFS version from v3 to v4 also partially resolves the issue. The number of threads still increases dramatically, but it stays below the 4096 maximum, which keeps the network shares from hanging.
Thus, the hangs seem to depend on (1) an NFSv3 connection, (2) a client running at least RHEL 7.3, and (3) an asynchronous NFS mount.
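The three conditions above can be checked against a client's mount table. The following is a minimal sketch, not part of the original note: it assumes the Linux client lists its mounts in the usual /proc/mounts format (device, mount point, file system type, comma-separated options), that an NFSv3 mount appears with type "nfs" and a "vers=3" or "nfsvers=3" option, and that a synchronous mount carries the "sync" option. The function names, the host name hsm01, and the paths are hypothetical.

```python
def parse_mount_line(line):
    """Split one /proc/mounts-style line into (fstype, set of options)."""
    fields = line.split()
    if len(fields) < 4:
        raise ValueError("not a mount line: %r" % line)
    fstype = fields[2]
    options = set(fields[3].split(","))
    return fstype, options


def matches_hang_pattern(line, rhel_version):
    """Return True if a mount meets all three conditions listed in the note.

    rhel_version is a (major, minor) tuple, e.g. (7, 3). How the version
    is obtained is left to the caller (assumption, not from the note).
    """
    fstype, options = parse_mount_line(line)
    # Condition 1: NFSv3 connection.
    is_nfsv3 = fstype == "nfs" and bool({"vers=3", "nfsvers=3"} & options)
    # Condition 2: client running at least RHEL 7.3.
    is_rhel_73_plus = tuple(rhel_version) >= (7, 3)
    # Condition 3: asynchronous mount, i.e. no "sync" option present.
    is_async = "sync" not in options
    return is_nfsv3 and is_rhel_73_plus and is_async


if __name__ == "__main__":
    # Hypothetical mount entry for illustration only.
    sample = "hsm01:/sam1 /mnt/sam1 nfs rw,relatime,vers=3,proto=tcp 0 0"
    print(matches_hang_pattern(sample, (7, 3)))
```

On a real client, each line of /proc/mounts could be passed to matches_hang_pattern to list the mounts that fit the pattern. A match only means the mount fits the observed conditions; it does not confirm that the mount is causing hangs.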