Oracle HSM / SAM-QFS - Server Becomes Non-Responsive When Reading Large File Via Buffered I/O (Doc ID 2033628.1)

Last updated on APRIL 19, 2017

Applies to:

Oracle Hierarchical Storage Manager (HSM) and StorageTek QFS Software - Version 5.4 to 5.4 [Release 5.0]
Oracle Solaris on x86-64 (64-bit)

Symptoms

After upgrading the MDS machine from S11.1/SAM-QFS 5.3 to S11.2/SAM-QFS 5.4.9, the entire system becomes extremely unresponsive whenever a large file that is online in the disk cache of a QFS filesystem is read using buffered I/O. The system stays in this state until the process performing the I/O is killed.


This problem does not happen immediately. It is triggered after approximately 50-100 GBytes of data have been read, and it does not occur if the file is only a few tens of GBytes in size.
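Because the problem only appears after a large cumulative amount of data has been read, it can help to log how far a read has progressed when the slowdown starts. The following is a minimal sketch, not an Oracle-supplied tool: the function name `read_buffered` and its parameters are hypothetical, and it assumes that an ordinary sequential read through the operating system's page cache is representative of the buffered I/O described above.

```python
import time


def read_buffered(path, chunk_size=8 * 1024 * 1024, report_every=1 << 30):
    """Read `path` sequentially with ordinary (buffered) I/O.

    Prints the cumulative amount read and the throughput of the last
    interval every `report_every` bytes, and returns the total bytes read.
    All names and defaults here are illustrative assumptions.
    """
    total = 0
    next_report = report_every
    window_start = time.monotonic()
    window_bytes = 0
    with open(path, "rb") as f:  # plain open(): reads go through the page cache
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
            window_bytes += len(chunk)
            if total >= next_report:
                elapsed = max(time.monotonic() - window_start, 1e-9)
                rate = window_bytes / 2**20 / elapsed
                print(f"{total / 2**30:.1f} GiB read, {rate:.1f} MiB/s")
                next_report += report_every
                window_start = time.monotonic()
                window_bytes = 0
    return total
```

A sharp drop in the reported throughput, together with the cumulative figure printed just before it, gives a rough indication of where in the read the degradation begins.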


While the system is degraded, no obvious problems are reported by top (lots of free memory), iostat (no significant I/O happening to the QFS filesystem), or intrstat. However, the output of "lockstat uptime" shows a very large amount of kernel lock activity.



Another observation is that whenever this problem occurs, the amount of free memory reported by /usr/bin/top jumps from a small value, such as 4 GBytes, to a large value, as in the following example:

Memory: 48G phys mem, 41G free mem, 4096M total swap, 3802M free swap
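To watch for this jump in free memory automatically, the summary line can be parsed into numeric values. The sketch below is illustrative only: `parse_top_memory` is a hypothetical helper, it assumes the line has exactly the layout shown above, and it assumes the K/M/G/T suffixes are binary multiples.

```python
import re

# Assumption: top's unit suffixes are binary multiples.
_UNITS = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}


def parse_top_memory(line):
    """Return a dict mapping each label on a top 'Memory:' line to bytes."""
    fields = {}
    # Each field looks like "<number><unit> <label>", separated by commas.
    for value, unit, label in re.findall(
        r"(\d+)([KMGT])\s+([a-z ]+?)(?:,|$)", line
    ):
        fields[label.strip()] = int(value) * _UNITS[unit]
    return fields


if __name__ == "__main__":
    sample = "Memory: 48G phys mem, 41G free mem, 4096M total swap, 3802M free swap"
    mem = parse_top_memory(sample)
    print(f"free fraction: {mem['free mem'] / mem['phys mem']:.2f}")
```

Sampling this value periodically while the read is in progress would show the point at which free memory rises from a few GBytes to most of physical memory.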

This is quite unusual: the problem frees up memory, whereas a traditional performance issue arises when little free memory is available.

Changes

MDS machines were upgraded from S11.1/SAM-QFS 5.3 to S11.2/SAM-QFS 5.4.9.

Cause
