My Oracle Support Banner

HSM - Archiver Deamon Is Stalled With sam-arfind Dumping Core and sam-arcopy Exiting (Doc ID 2496368.1)

Last updated on APRIL 08, 2020

Applies to:

Oracle Hierarchical Storage Manager (HSM) and StorageTek QFS Software - Version 6.1 to 6.1 [Release 6.0]
Information in this document applies to any platform.

Symptoms

The sam-archiverd daemon is stalled and dumping core.

The sam-arcopy processes are exiting prematurely.

 
Output of the 'samcmd a' command is showing "Waiting for scheduler error"

The command 'showqueue -v' is showing "Drives busy" but 'samcmd r' shows a lot of drives are available and empty.


The sam-arfind daemon is dumping core with stack trace like the following:

  core '/var/opt/SUNWsamfs/archiver/samfs1/core' of 22375:
sam-arfind samfs1
  ------------ lwp# 1 / thread# 1 ---------------
  ffffffff7e04e2f4 ioctl (a0, ffffffff7ffff878, 40, ffffffff7ffff560, 0, 8080808080808080) + c
  0000000100016178 FsAct (0, 20122c00, 100022560, 7ffffc00, 8080808080808080, 22) + 318
  0000000100007c14 main (2, ffffffff7ffffa48, ffffffff7ffffa60, ffffffff7c000180, 2880, 2800) + 1b4
  000000010000775c _start (0, 0, 0, 0, 0, 0) + 17c
  ------------ lwp# 2 / thread# 2 ---------------
  ffffffff7e04dc70 __sigtimedwait (ffffffff7e86b350, 0, 0, ffffffff7bdfbcb0, 0, ffffffff7bb00240) + 8
  ffffffff7e022a30 __posix_sigwait (ffffffff7e86b350, ffffffff7bdfbf28, 118, 0, ffffffff7df59da3, 178) + 18
  ffffffff7e7496b8 signalWaiter (0, 0, 0, ffffffff7e749670, 0, 1) + 48
  ffffffff7e04987c _lwp_start (0, 0, 0, 0, 0, 0)
  ------------ lwp# 3 / thread# 3 ---------------
  ffffffff7ef0683c ArchReqFileName (0, ffffffff7b4fb09a, 0, 1, 0, 0) + 3c
  ffffffff7ef078f8 ArchReqValid (0, 0, 148cd0, 0, 0, ffffffff7e18a000) + 28
  000000010000a81c Archive (100125690, 0, 0, 10000a3e0, 0, 1) + 43c
  ffffffff7e04987c _lwp_start (0, 0, 0, 0, 0, 0)
  ------------ lwp# 4 / thread# 4 ---------------
  ffffffff7e0498fc __lwp_park (100125980, 100125960, ffffffff7b2fbbf0, 10012596c, 0, ffffffff7e18a000) + 14
  ffffffff7e0430b4 cond_wait_queue (100125980, 100125960, ffffffff7b2fbbf0, 0, 0, 0) + 4c
  ffffffff7e043590 cond_wait_common (100125980, 100125960, ffffffff7b2fbbf0, 2946, 0, 0) + 28c
  ffffffff7e043760 __cond_timedwait (100125980, 100125960, ffffffff7b2fbe28, 0, ffffffff7b2fbbf0, 0) + 60
  ffffffff7e043824 cond_timedwait (100125980, 100125960, ffffffff7b2fbe28, 526f5, 74c3f3, 3b9aca00) + 14
  ffffffff7e043864 pthread_cond_timedwait (100125980, 100125960, ffffffff7b2fbe28, ffffffff7b2fbd40, 0, 0) + c
  ffffffff7ef0c73c ThreadsCondTimedWait (100125980, 100125960, 5bc6d80a, 0, 0, ffffffff7e18a000) + 94
  000000010001502c ExamInodes (1001256b8, 0, 0, 100014ef0, 0, 1) + 13c
  ffffffff7e04987c _lwp_start (0, 0, 0, 0, 0, 0)
  ------------ lwp# 5 / thread# 5 ---------------
  ffffffff7e0498fc __lwp_park (100125a20, 100125a08, 0, 100125a14, 0, ffffffff7e18a000) + 14
  ffffffff7e0430b4 cond_wait_queue (100125a20, 100125a08, 0, 0, ffffffff7e190fc0, 0) + 4c
  ffffffff7e0436a8 cond_wait (100125a20, 100125a08, 0, 10001695c, 0, 0) + 10
  ffffffff7e0436e4 pthread_cond_wait (100125a20, 100125a08, 148cd0, 0, 0, ffffffff7e18a000) + 8
  000000010001695c FsExamine (100125708, 0, 0, 1000167c0, 0, 1) + 19c
  ffffffff7e04987c _lwp_start (0, 0, 0, 0, 0, 0)
  ------------ lwp# 6 / thread# 6 ---------------
  ffffffff7e04dd64 __so_accept (6, ffffffff7aefbdf2, ffffffff7aefbde4, ffffffff7bb02240, 2e6800, 6) + 8
  ffffffff7e74f938 UdsRecvMsg (100125b98, 0, 100125b98, 10012ab7c, 100023a50, d) + 208
  000000010001c420 Message (100125730, 0, 0, 10001c308, 0, 1) + 118
  ffffffff7e04987c _lwp_start (0, 0, 0, 0, 0, 0)
  ...



The showqueue output shows a lot of files being scheduled and drives are busy:

# head -2000 /tmp/SAMreport.samfs.20181022.1051EDT.showqueue_text
  Filesystem HSM:
  Files waiting to start: 0
  Files being scheduled: 1,488                                     <----------------------
  Files archiving: 0
  Events processed: 5,197,965
  archive 0
  change 14,057
  close 5,179,102
  create 1,724
  hwm 0
  modify 0
  rearchive 0
  rename 860
  remove 862
  unarchive 1,360
  internal 0
  Scan list Examine: noscan
  0 2018-10-23 00:00:00 background ---- inodes
  Archive requests
  samfs.fs_A_XX.2.3657 schedule 2018-10-20 11:20:57 drives:4
  files:1488 space: 110.936G flags:
  (min: 18.500k) priority: 0 0
  Drives busy                                                          <------------------------------
  Archive file 1:
  type:f ino:704705 s:1/f:0 space: 19.500k time:1539754074 priority:0
  hsm/20181013/0XXXXb4.blb
  type:f ino:704711 s:1/f:0 space: 6.016M time:1539754083 priority:0
  hsm/20181013/0XXXXX1.blb
  type:f ino:704696 s:1/f:0 space: 277.636M time:1539754066 priority:0
  hsm/20181014/0XXXXX.blb
  type:f ino:704674 s:1/f:0 space: 1.306G time:1539754027 priority:0
  hsm/20181014/0XXXXX.blb
  ....
  
  


  Note that most of the tape drives are available:

 # /opt/SUNWsamfs/sbin/samcmd r

  Removable media samcmd 6.1.45 10:55:38 Oct 22 2018
  samcmd on HSM

  Removable media status: all
  ty eq status act use state vsn
  sg 51 ---------p 0 0% notrdy
  empty
  sg 52 ---------p 0 0% notrdy
  empty
  sg 53 ---------p 0 0% notrdy
  empty
  sf 61 ---------p 0 0% notrdy
  empty
  sf 62 ---------p 0 0% notrdy
  empty
  sf 63 ---------p 0 0% notrdy
  empty
  sf 64 ---------p 0 0% notrdy
  empty
  ti 71 --l----oPr 2 100% ready MXX68
  Positioning to 0x1a7068|2b 00 00 00 1a 70 68 00 00 00
  ti 72 ---------p 0 0% notrdy
  empty
  ti 73 ---------p 0 0% notrdy
  empty
  ti 74 --l----oPr 2 100% ready MXXX2
  Positioning to 0x26a2f1|2b 00 00 00 26 a2 f1 00 00 00
  ti 75 ---------p 0 0% notrdy
  empty
  ti 76 ---------p 0 0% notrdy
  empty
  ti 81 ---------p 0 0% notrdy
  empty
  tp 82 ---------- 0 0% off
  empty
  ti 83 ---------p 0 0% notrdy
  empty
  ti 84 ---------p 0 0% notrdy
  empty
  ti 85 ---------p 0 0% notrdy
  empty


sam-archiverd:

...
Oct 13 02:32:55 samfs sam-arcopy[25488]: [ID 702911 local7.error] err
Fatal OS call error: ArchReqAttach(misc1.1.4) called from: arcopy.c:710: No such file or directory
Oct 13 03:11:14 samfs klmmod: [ID 710424 kern.notice] NOTICE: Received
NLM_FREE_ALL (DS3000) from 010.030.208.145
Oct 13 03:45:00 samfs ZFS-status: [ID 702911 cron.alert] Sat Oct 13
03:45:00 EDT 2018 /root/zpXXXXeck.sh: Ended with OK status
Oct 13 10:09:49 samfs af-samfs3[23296]: [ID 702911 local7.error] err
Fatal OS call error: MapFileGrow(samfs3.fs_A_DF.1.80049) called from: archreq.c:144
Oct 14 01:32:50 samfs af-samfs4[23295]: [ID 702911 local7.error] err
Fatal OS call error: MapFileGrow(samfs4.fs_P_DF.2.2) called from: archreq.c:144
...

Certain ArchReqs are set to NULL.

ArchReqs are scheduled and then dequeued a short time later:

...
2018-10-22 10:51:34 samfs1[12324:3]: DequeueArchReq(prod.1.27551)
2018-10-22 10:51:34 samfs1[12324:3]: Archreq queue (prod.1.27552) files: 1442 space: 100.483G
2018-10-22 10:51:34 sam-archiverd[18081:3]: samfs.XXXod.1.27552 sorted by path
2018-10-22 10:51:34 sam-archiverd[18081:1]: Scheduling archives
2018-10-22 10:51:34 sam-archiverd[18081:1]: samfs5.R.2.0 divided for 2 drives
2018-10-22 10:51:34 sam-archiverd[18081:1]: Scheduling archives
2018-10-22 10:51:34 sam-archiverd[18081:1]: samfs5.R.2.0 divided for 2 drives
2018-10-22 10:51:39 sam-archiverd[18081:4]: schedQ Dequeue(samfs.XXXod.1.27552 0x760f8)
2018-10-22 10:51:39 samfs1[12324:3]: DequeueArchReq(XXXod.1.27552)
2018-10-22 10:51:39 samfs1[12324:3]: Archreq queue (XXXod.1.27553) files: 1328 space: 100.096G
2018-10-22 10:51:39 sam-archiverd[18081:3]: samfs.XXXod.1.27553 sorted by path
2018-10-22 10:51:39 sam-archiverd[18081:1]: Scheduling archives
2018-10-22 10:51:39 sam-archiverd[18081:1]: samfs5.R.2.0 divided for 2 drives
2018-10-22 10:51:39 sam-archiverd[18081:1]: Scheduling archives
2018-10-22 10:51:39 sam-archiverd[18081:1]: samfs5.R.2.0 divided for 2 drives
2018-10-22 10:51:44 sam-archiverd[18081:4]: schedQ Dequeue(samfs.XXXod.1.27553 0x760f8)
2018-10-22 10:51:44 af-samfs1[12324:3]: DequeueArchReq(XXXod.1.27553)
..

Changes

No known change in the system

Cause

To view full details, sign in with your My Oracle Support account.

Don't have a My Oracle Support account? Click to get started!


In this Document
Symptoms
Changes
Cause
Solution
References


My Oracle Support provides customers with access to over a million knowledge articles and a vibrant support community of peers and Oracle experts.