Oracle HSM / SAM-QFS - Server Becomes Non-Responsive When Reading Large File Via Buffered I/O
Last updated on JANUARY 03, 2018
Applies to: Oracle Hierarchical Storage Manager (HSM) and StorageTek QFS Software - Version 5.4 to 5.4 [Release 5.0]
Oracle Solaris on x86-64 (64-bit)
After upgrading the MDS machine from S11.1/SAM-QFS 5.3 to S11.2/SAM-QFS 5.4.9, the entire system becomes extremely non-responsive whenever a large file that is online in the disk cache of a QFS filesystem is read using buffered I/O. The system stays in this state until the process performing the I/O is killed.
The problem does not appear immediately. It is triggered only after roughly 50-100 GBytes of data have been read, and it does not occur if the file is only a few tens of GBytes in size.
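The read pattern described above can be sketched as a plain sequential read that reports how much data has been consumed so far, which may help to note roughly where in the 50-100 GByte range the slowdown begins. This is only an illustrative sketch, not the reproduction used in the original case: the function name, chunk size, and reporting interval are assumptions, and it presumes that nothing forces direct I/O on the file or mount, so that the reads go through buffered (paged) I/O.

```python
import time

CHUNK_SIZE = 8 * 1024 * 1024   # 8 MiB per read; an arbitrary choice
REPORT_EVERY = 1024 ** 3       # print a progress line after each GiB


def read_buffered(path, chunk_size=CHUNK_SIZE, report_every=REPORT_EVERY):
    """Read `path` sequentially with ordinary buffered reads.

    Returns the total number of bytes read. The data itself is discarded;
    only the cumulative volume and average rate are reported.
    """
    total = 0
    next_report = report_every
    start = time.monotonic()
    with open(path, "rb") as f:          # no direct I/O is requested here
        while True:
            chunk = f.read(chunk_size)
            if not chunk:                # end of file
                break
            total += len(chunk)
            if total >= next_report:
                elapsed = time.monotonic() - start
                rate = total / elapsed / 1024 ** 2 if elapsed > 0 else float("inf")
                print(f"{total / 1024 ** 3:.1f} GiB read, {rate:.0f} MiB/s average")
                next_report += report_every
    return total
```

Calling `read_buffered()` with the path of a large online file (any path used here would be site-specific) and watching for the point at which the progress lines stall gives a rough measure of how much data was read before the system degraded.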
While the system is degraded, no obvious problems are reported by top (lots of free memory), iostat (no significant I/O to the QFS filesystem), or intrstat. However, the output of "lockstat uptime" shows a very large amount of kernel lock activity.
Another observation is that whenever this problem occurs, the amount of free memory reported by /usr/bin/top goes from a small number, such as 4 GBytes, to a large number, as in the following:
Memory: 48G phys mem, 41G free mem, 4096M total swap, 3802M free swap
This is quite unusual: the problem coincides with memory being freed, whereas a traditional performance issue occurs when little free memory is available.
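To watch for the jump in free memory without reading top output by eye, the "Memory:" summary line can be parsed into numbers. The sketch below assumes the line has the same layout as the sample shown above (a value, a one-letter K/M/G/T unit, then a label, separated by commas); the function name is hypothetical and other top versions may format the line differently.

```python
import re

# Multipliers that convert each unit suffix to MiB.
_UNITS_MIB = {"K": 1 / 1024, "M": 1, "G": 1024, "T": 1024 * 1024}

# One field looks like "41G free mem": number, unit letter, lowercase label.
_FIELD = re.compile(r"(\d+(?:\.\d+)?)([KMGT])\s+([a-z ]+?)(?:,|$)")


def parse_top_memory(line):
    """Return a dict mapping each label in a top 'Memory:' line to MiB."""
    fields = {}
    for value, unit, label in _FIELD.findall(line.strip()):
        fields[label.strip()] = float(value) * _UNITS_MIB[unit]
    return fields


sample = "Memory: 48G phys mem, 41G free mem, 4096M total swap, 3802M free swap"
mem = parse_top_memory(sample)
free_fraction = mem["free mem"] / mem["phys mem"]
```

For the sample line, 41G free out of 48G physical is roughly 85% of memory free, which is the unexpected condition described above. Logging this fraction periodically during a large read would show when the jump occurs.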
MDS machines were upgraded from S11.1/SAM-QFS 5.3 to S11.2/SAM-QFS 5.4.9.