Abnormal patchmgr Termination when upgrading RoCE Leaf Switches may leave Ports in a Shutdown State
(Doc ID 2984407.1)
Last updated on DECEMBER 06, 2024
Applies to:
Cisco Nexus Switch - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.
Symptoms
If the patchmgr process terminates or the network connection fails while patching a RoCE switch, the switch ports can be left in a shutdown state.
One example of a failed RoCE switch upgrade:
With arguments: --roceswitches /u01/patches/rocesw_group --upgrade
2023-10-15 09:52:10 -0400 1 of 2:Running upgrade on switch rocea0
2023-10-15 09:52:14 -0400: [INFO ] Performing Nodes connectivity tests on rocea0
2023-10-15 09:52:20 -0400: [SUCCESS ] Nodes connectivity tests on rocea0 are successful
2023-10-15 09:52:23 -0400: [INFO ] EPLD Version - Found: 0x5/0x11, Required: 0x5/0x16
2023-10-15 09:52:30 -0400: [INFO ] Switch rocea0 will be upgraded from nxos.7.0.3.I7.9.bin to nxos64-cs.10.2.4.M.bin
2023-10-15 09:52:30 -0400: [INFO ] Checking for free disk space on switch
2023-10-15 09:52:30 -0400: [INFO ] disk is 91.00% free, available: 107050528768 bytes
2023-10-15 09:52:30 -0400: [SUCCESS ] There is enough disk space to proceed
2023-10-15 09:52:31 -0400: [INFO ] Found nxos64-cs.10.2.4.M.bin on switch, skipping download
2023-10-15 09:52:31 -0400: [INFO ] Verifying sha256sum of bin file on switch
2023-10-15 09:52:57 -0400: [SUCCESS ] sha256sum matches: 84f930ca02487dd8a881049d65fd1bbdc8882841de88cae0bd176c494054aff2
2023-10-15 09:55:09 -0400: [INFO ] Performing FW install of nxos64-cs.10.2.4.M.bin on rocea0
2023-10-15 09:57:09 -0400: [INFO ] reload of rocea0 is in progress
2023-10-15 10:03:44 -0400: [FAIL ] [FirmwareUpgradeError] switch rocea0 failed to come up <---- Patchmgr waits about 6 minutes after the reload
SUMMARY OF ERRORS:
2023-10-15 10:03:44 -0400: [FAIL ] [FirmwareUpgradeError] switch rocea0 failed to come up
2023-10-15 10:03:44 -0400 :FAILED : upgrade 2 RoCE switch(es) to 10.2.4
2023-10-15 10:03:44 -0400 :ERROR : FAILED run of command:./patchmgr --roceswitches /u01/patches/rocesw_group --upgrade
2023-10-15 10:03:45 -0400 :INFO : upgrade performed on switch(es) in file /u01/patches/rocesw_group: [ rocea0 roceb0]
The same failure can also result from another network issue or from a long delay in the switch responding to patchmgr after a reload.
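The wait that patchmgr reports before declaring the switch down can be checked from the log timestamps. The following is a minimal sketch in Python, assuming the log lines begin with a timestamp in the form shown in the example above; the names `log_timestamp` and `wait_seconds` are illustrative and are not part of patchmgr.

```python
from datetime import datetime

# Two lines copied from the example log above.
RELOAD_LINE = "2023-10-15 09:57:09 -0400: [INFO ] reload of rocea0 is in progress"
FAIL_LINE = ("2023-10-15 10:03:44 -0400: [FAIL ] [FirmwareUpgradeError] "
             "switch rocea0 failed to come up")


def log_timestamp(line: str) -> datetime:
    """Parse the leading 'YYYY-MM-DD HH:MM:SS +ZZZZ' timestamp of a log line."""
    # Assumption: the timestamp is always the first 25 characters of the line,
    # for example '2023-10-15 09:57:09 -0400'.
    return datetime.strptime(line[:25], "%Y-%m-%d %H:%M:%S %z")


def wait_seconds(reload_line: str, fail_line: str) -> float:
    """Return the seconds elapsed between the reload line and the failure line."""
    elapsed = log_timestamp(fail_line) - log_timestamp(reload_line)
    return elapsed.total_seconds()


if __name__ == "__main__":
    seconds = wait_seconds(RELOAD_LINE, FAIL_LINE)
    print(f"patchmgr waited {seconds:.0f} s after the reload started")
```

For the two lines above, the interval is 6 minutes 35 seconds, which matches the "about 6 minutes" annotation in the log.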
Log in to the switch (ssh admin@<IP-ADDR>) and check for ports in the "Administratively down" state:
rocea0# show interface brief
--------------------------------------------------------------------------------
Ethernet VLAN Type Mode Status Reason Speed Port
Interface Ch #
--------------------------------------------------------------------------------
Eth1/1 1 eth access down XCVR not inserted auto(D) --
Eth1/2 1 eth access down XCVR not inserted auto(D) --
Eth1/3 1 eth access down XCVR not inserted auto(D) --
Eth1/4 1 eth trunk up none 100G(D) 100 <--- In this case ISLs are up on a single rack
Eth1/5 1 eth trunk up none 100G(D) 100 <--- In this case ISLs are up on a single rack
Eth1/6 1 eth trunk up none 100G(D) 100 <--- In this case ISLs are up on a single rack
Eth1/7 1 eth trunk up none 100G(D) 100 <--- In this case ISLs are up on a single rack
Eth1/8 3888 eth access down Administratively down 100G(D) -- <--- Node ports 8-29 are shut down
Eth1/9 3888 eth access down Administratively down 100G(D) --
Eth1/10 3888 eth access down Administratively down 100G(D) --
Eth1/11 3888 eth access down Administratively down 100G(D) --
Eth1/12 3888 eth access down Administratively down 100G(D) --
Eth1/13 3888 eth access down Administratively down 100G(D) --
Eth1/14 3888 eth access down Administratively down 100G(D) --
Eth1/15 3888 eth access down Administratively down 100G(D) --
Eth1/16 3888 eth access down Administratively down 100G(D) --
Eth1/17 3888 eth access down XCVR not inserted 100G(D) --
Eth1/18 3888 eth access down Administratively down 100G(D) --
Eth1/19 3888 eth access down XCVR not inserted 100G(D) --
Eth1/20 3888 eth access down Administratively down 100G(D) --
Eth1/21 3888 eth access down Administratively down 100G(D) --
Eth1/22 3888 eth access down Administratively down 100G(D) --
Eth1/23 3888 eth access down Administratively down 100G(D) --
Eth1/24 3888 eth access down Administratively down 100G(D) --
Eth1/25 3888 eth access down Administratively down 100G(D) --
Eth1/26 3888 eth access down Administratively down 100G(D) --
Eth1/27 3888 eth access down Administratively down 100G(D) --
Eth1/28 3888 eth access down Administratively down 100G(D) --
Eth1/29 3888 eth access down Administratively down 100G(D) --
Eth1/30 1 eth trunk up none 100G(D) 100 <--- In this case ISLs are up on a single rack
Eth1/31 1 eth trunk up none 100G(D) 100 <--- In this case ISLs are up on a single rack
Eth1/32 1 eth trunk up none 100G(D) 100 <--- In this case ISLs are up on a single rack
Eth1/33 1 eth trunk up none 100G(D) 100 <--- In this case ISLs are up on a single rack
Eth1/34 1 eth access down XCVR not inserted auto(D) --
Eth1/35 1 eth access down XCVR not inserted auto(D) --
Eth1/36 1 eth access down XCVR not inserted auto(D) --
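On a switch with many ports, the affected interfaces can be picked out of saved `show interface brief` output with a short script. The following is a minimal sketch in Python, assuming the output has been captured as text in the layout shown above; the function name `admin_down_ports` and the `SAMPLE` excerpt are illustrative only. The script only reports ports and does not change the switch configuration.

```python
# A few rows copied from the example output above.
SAMPLE = """\
Eth1/4 1 eth trunk up none 100G(D) 100
Eth1/8 3888 eth access down Administratively down 100G(D) --
Eth1/17 3888 eth access down XCVR not inserted 100G(D) --
Eth1/18 3888 eth access down Administratively down 100G(D) --
"""


def admin_down_ports(output: str) -> list[str]:
    """Return the interface names whose row reports 'Administratively down'."""
    ports = []
    for line in output.splitlines():
        fields = line.split()
        # Skip header, separator, and blank lines: data rows start with 'Eth'.
        if not fields or not fields[0].startswith("Eth"):
            continue
        # Re-join the fields so that column padding does not affect the match.
        if "Administratively down" in " ".join(fields):
            ports.append(fields[0])
    return ports


if __name__ == "__main__":
    for port in admin_down_ports(SAMPLE):
        print(port)
```

Ports that report "XCVR not inserted" are deliberately excluded, since they are down because no transceiver is present rather than because they were shut down.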
Cause