[Linux OS] System Hung with Large Numbers of Page Allocation Failures with "order:5" on Exadata Environments
(Doc ID 1546861.1)
Last updated on MARCH 04, 2021
Applies to: Linux OS - Version Oracle Linux 5.5 and later
Oracle Cloud Infrastructure - Version N/A and later
This is commonly seen on Exadata platforms that use Infiniband for cluster communications.
The system is under memory pressure. Exadata Infiniband using IPoIB has an MTU of 65520 bytes. This MTU plus overhead puts the memory allocation for IP-based packets at 32 contiguous 4 KB pages (an order-5 allocation). When the system is under memory pressure, it can become very difficult to find 32 contiguous free pages. Note that the same failure can also occur with lower-order allocation requests.
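The arithmetic behind the order-5 figure can be sketched as follows. This is a minimal illustration, not kernel code: the helper name `allocation_order` and the 512-byte overhead value are assumptions chosen only for the example. The MTU and the 4 KB page size come from the text above. Any overhead that pushes the request past 16 pages (65536 bytes) rounds up to the next power of two, which is 32 pages.

```python
PAGE_SIZE = 4096    # 4 KB pages, as stated in the note
IPOIB_MTU = 65520   # IPoIB MTU on Exadata, as stated in the note
OVERHEAD = 512      # hypothetical per-packet overhead, for illustration only


def allocation_order(nbytes, page_size=PAGE_SIZE):
    """Return the smallest order n such that 2**n contiguous pages hold nbytes."""
    pages = -(-nbytes // page_size)  # ceiling division
    order = 0
    while (1 << order) < pages:
        order += 1
    return order


# The bare MTU fits in 16 pages (order 4).
mtu_only = allocation_order(IPOIB_MTU)

# MTU plus overhead needs 17 pages, which rounds up to 32 pages (order 5).
with_overhead = allocation_order(IPOIB_MTU + OVERHEAD)

print("MTU only:       order", mtu_only, "=", 1 << mtu_only, "pages")
print("MTU + overhead: order", with_overhead, "=", 1 << with_overhead, "pages")
```

Because the allocator hands out contiguous blocks in power-of-two page counts, a request only slightly larger than 64 KB already requires a full 128 KB contiguous block.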
/var/log/messages or the serial console will show numerous page allocation failure messages, each with an associated call trace through the networking code (*sock*, *tcp*, etc.). The key entry in the trace is the __alloc_skb() call, as this is the call where the memory allocation is being attempted.
Along with the "page allocation failure" messages in the system log, the server may hang (become unresponsive or freeze on any action), and sometimes the server reboots after the failure.
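One way to check a log for this symptom is to count the failure messages by allocation order and look for __alloc_skb in the traces. The sketch below is hedged: the function names are hypothetical, and the exact layout of the kernel message is an assumption. It relies only on the substrings "page allocation failure", "order:" and "__alloc_skb" mentioned in this note.

```python
import re
from collections import Counter

# Assumes the order appears as "order:<N>" somewhere after the failure text.
FAILURE_RE = re.compile(r"page allocation failure.*?order:\s*(\d+)")


def count_failures_by_order(lines):
    """Count 'page allocation failure' messages, grouped by allocation order."""
    counts = Counter()
    for line in lines:
        match = FAILURE_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def has_alloc_skb(lines):
    """Return True if any line mentions __alloc_skb (the key call in the trace)."""
    return any("__alloc_skb" in line for line in lines)


def scan_file(path="/var/log/messages"):
    """Read a log file and return (failure counts by order, __alloc_skb seen)."""
    with open(path, errors="replace") as handle:
        lines = handle.readlines()
    return count_failures_by_order(lines), has_alloc_skb(lines)
```

Calling `scan_file()` on an affected server would be expected to report a large count under order 5 together with `True` for the __alloc_skb check. Reading /var/log/messages typically requires root privileges.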