BDA Server Fails to Reboot-Hangs or Panics with "Workqueue: usb_hub_wq hub_event"

(Doc ID 2416157.1)

Last updated on JULY 06, 2018

Applies to:

Big Data Appliance Integrated Software - Version 4.4.0 and later
Big Data Appliance Hardware - Version All Versions and later
Linux x86-64

Symptoms

A BDA server fails to reboot: the reboot hangs or panics, and the ILOM console log shows a call stack like the following:

  usb 2-1.7: USB disconnect, device number 5
  INFO: task kworker/0:1:636 blocked for more than 120 seconds.
     Not tainted 4.1.12-70.el6uek.x86_64 #2
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  kworker/0:1 D ################ 0 636 2 0x00000000
  Workqueue: usb_hub_wq hub_event
    ###
    ###
    ###
 Call Trace:
   schedule+0x3e/0x90
   usb_kill_urb+0x75/0xb0
   ? wait_woken+0x90/0x90
   usb_hcd_flush_endpoint+0xde/0x210
   ? mutex_lock+0x16/0x40
   usb_disable_endpoint+0x5d/0x90
   usb_disable_interface+0x47/0x60
   usb_unbind_interface+0x1ce/0x280
   __device_release_driver+0x87/0x120
   device_release_driver+0x2d/0x40
   bus_remove_device+0x12a/0x1a0
   device_del+0x128/0x230
   usb_disable_device+0xb0/0x2a0
   usb_disconnect+0xcd/0x2b0
   ? try_to_wake_up+0x210/0x210
   hub_port_connect+0x70/0x980
   ? usb_alloc_urb+0x1e/0x50
   ? usb_control_msg+0xf1/0x110
   hub_port_connect_change+0x96/0x1b0
   port_event+0x283/0x440
   hub_event+0x22a/0x480
   ? dequeue_task_fair+0x7f/0x4c0
   process_one_work+0x14e/0x4b0
   worker_thread+0x120/0x480
   ? __schedule+0x309/0x880
   ? process_one_work+0x4b0/0x4b0
   ? process_one_work+0x4b0/0x4b0
   kthread+0xce/0xf0
   ? kthread_freezable_should_stop+0x70/0x70
   ret_from_fork+0x42/0x70
   ? kthread_freezable_should_stop+0x70/0x70
...
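
When a saved ILOM console log is long, the signature above can be searched for programmatically. The following is a minimal sketch, not an Oracle-provided tool: the function name find_usb_hub_hangs and the 10-line search window are assumptions made for illustration. It looks for the kernel hung-task banner and reports it only when the "Workqueue: usb_hub_wq hub_event" line follows shortly after.

```python
import re

# Matches the kernel hung-task banner, for example:
# "INFO: task kworker/0:1:636 blocked for more than 120 seconds."
HUNG_TASK = re.compile(
    r"INFO: task (?P<task>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)
USB_HUB_WORKQUEUE = "Workqueue: usb_hub_wq hub_event"


def find_usb_hub_hangs(log_text, window=10):
    """Return (task, pid, seconds) for each hung task tied to usb_hub_wq.

    `window` is the number of lines after the banner that are searched
    for the workqueue line; 10 is an arbitrary choice for this sketch.
    """
    lines = log_text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        match = HUNG_TASK.search(line)
        if not match:
            continue
        following = lines[i + 1 : i + 1 + window]
        if any(USB_HUB_WORKQUEUE in nxt for nxt in following):
            hits.append(
                (match.group("task"), int(match.group("pid")), int(match.group("secs")))
            )
    return hits
```

Passing the text of the console log to find_usb_hub_hangs returns one tuple per matching hung-task report, so an empty list means this particular signature was not found in the log.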

In this specific case, the server was being rebooted as part of the reprovision process that brings it back into a BDA V4.7 cluster. The server had previously been single-server reimaged following: Oracle Big Data Appliance Base Image Version 4.10.0 for New Installations or Reprovisioning Existing Installations on Oracle Linux 6 (Doc ID 2364648.1).

Also in this specific case:

1. The USB drive was recently replaced.

2. CPU error messages like the following were also seen:

NMI watchdog: Watchdog detected hard LOCKUP on cpu 38
NMI watchdog: Watchdog detected hard LOCKUP on cpu 15
NMI watchdog: Watchdog detected hard LOCKUP on cpu 20
NMI watchdog: Watchdog detected hard LOCKUP on cpu 41
NMI watchdog: Watchdog detected hard LOCKUP on cpu 33
NMI watchdog: Watchdog detected hard LOCKUP on cpu 21
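
To summarize which CPUs reported a hard lockup, the messages above can be collected from a saved console log. This is a minimal sketch under the assumption that the log uses exactly the message format shown here; the function name lockup_cpus is hypothetical.

```python
import re

# Matches lines such as:
# "NMI watchdog: Watchdog detected hard LOCKUP on cpu 38"
NMI_LOCKUP = re.compile(
    r"NMI watchdog: Watchdog detected hard LOCKUP on cpu (\d+)"
)


def lockup_cpus(log_text):
    """Return the sorted, de-duplicated CPU numbers that reported a hard lockup."""
    return sorted({int(cpu) for cpu in NMI_LOCKUP.findall(log_text)})
```

Applied to the six messages listed above, lockup_cpus returns [15, 20, 21, 33, 38, 41].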

Cause
