[PCA 2.x] OCFS2 Corruption Causes Compute Node Hang and Processes in D State
(Doc ID 2582251.1)
Last updated on JULY 25, 2023
Applies to:
Private Cloud Appliance - Version 2.3.3 to 2.3.4 [Release 2.0]
Private Cloud Appliance X5-2 Hardware - Version All Versions to All Versions [Release All Releases]
Linux x86-64
Symptoms
From /var/log/messages on a compute node:
=====================================
kernel: [852841.931594] JBD2: no valid journal superblock found
kernel: [852841.931597] (ocfs2rec,18280,4):ocfs2_replay_journal:1620 ERROR: status = -22
kernel: [852841.931603] (ocfs2rec,18280,4):ocfs2_recover_node:1704 ERROR: status = -22
kernel: [852841.931607] (ocfs2rec,18280,4):__ocfs2_recovery_thread:1411 ERROR: Error -22 recovering node 9 on device (249,34)!
kernel: [852841.931613] (ocfs2rec,18280,4):__ocfs2_recovery_thread:1412 ERROR: Volume requires unmount.
kernel: [852841.931755] ocfs2: Begin replay journal (node 9, slot 8) on device (249,34)
kernel: [852841.933037] JBD2: no valid journal superblock found
kernel: [852841.933039] (ocfs2rec,18280,4):ocfs2_replay_journal:1620 ERROR: status = -22
kernel: [852841.933045] (ocfs2rec,18280,4):ocfs2_recover_node:1704 ERROR: status = -22
kernel: [852841.933049] (ocfs2rec,18280,4):__ocfs2_recovery_thread:1411 ERROR: Error -22 recovering node 9 on device (249,34)!
kernel: [852841.933055] (ocfs2rec,18280,4):__ocfs2_recovery_thread:1412 ERROR: Volume requires unmount.
kernel: [852841.933255] ocfs2: Begin replay journal (node 9, slot 8) on device (249,34)
kernel: [852841.934538] JBD2: no valid journal superblock found
kernel: [852841.934540] (ocfs2rec,18280,4):ocfs2_replay_journal:1620 ERROR: status = -22
kernel: [852841.934546] (ocfs2rec,18280,4):ocfs2_recover_node:1704 ERROR: status = -22
kernel: [852841.934551] (ocfs2rec,18280,4):__ocfs2_recovery_thread:1411 ERROR: Error -22 recovering node 9 on device (249,34)!
kernel: [852841.934556] (ocfs2rec,18280,4):__ocfs2_recovery_thread:1412 ERROR: Volume requires unmount.
kernel: [852841.934713] ocfs2: Begin replay journal (node 9, slot 8) on device (249,34)
kernel: [852841.935983] JBD2: no valid journal superblock found
kernel: [852841.935997] (ocfs2rec,18280,4):ocfs2_replay_journal:1620 ERROR: status = -22
kernel: [852841.936003] (ocfs2rec,18280,4):ocfs2_recover_node:1704 ERROR: status = -22
kernel: [852841.936007] (ocfs2rec,18280,4):__ocfs2_recovery_thread:1411 ERROR: Error -22 recovering node 9 on device (249,34)!
kernel: [852841.936013] (ocfs2rec,18280,4):__ocfs2_recovery_thread:1412 ERROR: Volume requires unmount.
kernel: [852841.936208] ocfs2: Begin replay journal (node 9, slot 8) on device (249,34)
kernel: [852841.937443] JBD2: no valid journal superblock found
kernel: [852841.937445] (ocfs2rec,18280,4):ocfs2_replay_journal:1620 ERROR: status = -22
kernel: [852841.937451] (ocfs2rec,18280,4):ocfs2_recover_node:1704 ERROR: status = -22
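To confirm the same pattern on a compute node, the recovery-thread errors and any processes stuck in uninterruptible (D) sleep can be checked with standard tools. A minimal sketch, assuming the default log location:

# grep -E 'ocfs2rec.*ERROR|Volume requires unmount' /var/log/messages
# ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'

The first command lists the OCFS2 recovery errors shown above; the second lists processes in D state together with the kernel function they are blocked in.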
From the output of the multipath command on the compute node:
=======================================
# multipath -ll 3600144f0926a067f00005be088810008
3600144f0926a067f00005be088810008 dm-4 SUN,ZFS Storage 7350
size=10T features='0' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=0 status=enabled
  `- 14:0:0:13 sdi 8:128 failed faulty running
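Whether other LUNs are affected can be checked by scanning all multipath maps for failed paths. A minimal sketch using standard device-mapper-multipath tools:

# multipath -ll | grep -E 'failed|faulty'
# multipathd -k"show paths"

The first command flags any path in the failed/faulty state across all maps; the second queries the multipath daemon for the current state of every path.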
From the AdminServer.log file on the active management node:
=============================================
####<2019-08-23T06:23:47.781+0200> <Info> <com.oracle.ovm.mgr.event.ovs.Storage> <ovcamn05r1> <AdminServer> <EventProcessor-5> <
####<2019-08-23T06:23:48.807+0200> <Info> <com.oracle.ovm.mgr.event.ovs.Storage> <ovcamn05r1> <AdminServer> <EventProcessor-5> <
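The corresponding storage events can be followed on the active management node. The log path below is an assumption based on a standard Oracle VM Manager WebLogic domain layout and may differ per deployment:

# grep 'event.ovs.Storage' /u01/app/oracle/ovm-manager-3/machine1/base_adf_domain/servers/AdminServer/logs/AdminServer.log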
Dry-run file system check on the OCFS2 disk:
========================
# fsck.ocfs2 -fn /dev/sdc
fsck.ocfs2 1.8.6
Checking OCFS2 filesystem in /dev/sdc:
Label: OVS64dd4e49e4a52
UUID: 0004FB000005000064364DD4E49E4A52
Number of blocks: 2603824128
Block size: 4096
Number of clusters: 10171188
Cluster size: 1048576
Number of slots: 32
** Skipping journal replay because -n was given. There may be spurious errors that journal replay would fix. **
** Skipping slot recovery because -n was given. **
/dev/sdc was run with -f, check forced.
Pass 0a: Checking cluster allocation chains
Pass 0b: Checking inode allocation chains
Pass 0c: Checking extent block allocation chains
Pass 1: Checking inodes and blocks
Pass 2: Checking directory entries
[DX_TREE_MISSING] Directory 514 is missing index. Rebuild? n =======>> The dry run shows that some directory indexes are missing
Pass 3: Checking directory connectivity
Pass 4a: Checking for orphaned inodes
Pass 4b: Checking inodes link counts
All passes succeeded.
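Note that -n makes fsck.ocfs2 open the device read-only and answer "no" to every prompt, so the missing index reported in Pass 2 is detected but not repaired. For reference only, a repair pass would drop -n in favor of -y, and the OCFS2 volume must first be unmounted on every node in the cluster, consistent with the "Volume requires unmount" errors above. A generic sketch, not the specific corrective action for this issue:

# umount /dev/sdc
# fsck.ocfs2 -fy /dev/sdc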
Changes
N/A
Cause