Application repeatedly killed by oom-killer because of insufficient swap space

 

Starter:

We know that the disk acts as an extension of memory: when the computer runs low on memory, it pages some pages out to the disk. But do you know where they are usually stored? In the Windows world, they are written to pagefile.sys, which by default sits in the root of your C drive. In the Linux world, there is a special partition named swap, and pages are sent there.

Why do we need to page out inactive or unused pages? Because we don't have enough memory to load every application into memory, and it is unrealistic and makes no sense to buy as much memory as you have hard disk. In OS design, virtual memory (a virtual address space) and the page table make each application believe all of its data is loaded in memory, when actually it is not. When an application touches a page that is not in memory, a page fault is raised and the OS loads the data from the hard disk into memory. At the same time, if the computer is low on memory, the OS moves some of the inactive pages out to the swap space.


 

Knowledge Prerequisites:

What is swap space?

Swap space in Linux is used when the amount of physical memory (RAM) is full. If the system needs more memory resources and the RAM is full, inactive pages in memory are moved to the swap space. While swap space can help machines with a small amount of RAM, it should not be considered a replacement for more RAM. Swap space is located on hard drives, which have a slower access time than physical memory.

Swap space can be a dedicated swap partition (recommended), a swap file, or a combination of swap partitions and swap files.

How much space should be considered to allocate to swap?

Swap should equal 2x physical RAM for up to 2 GB of physical RAM, and then an additional 1x physical RAM for any amount above 2 GB, but never less than 32 MB.

So, if:
M = Amount of RAM in GB, and S = Amount of swap in GB, then

    If M < 2, S = M * 2
    Else S = M + 2

Copied from the official CentOS documentation.
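
The sizing rule quoted above can be turned into a few lines of Python. This is only a sketch of that rule of thumb; the function name recommended_swap_gb is made up for illustration and does not come from the CentOS documentation.

```python
def recommended_swap_gb(ram_gb: float) -> float:
    """Return the swap size in GB suggested by the rule of thumb above."""
    min_swap_gb = 32 / 1024  # the rule's 32 MB floor, expressed in GB
    if ram_gb <= 2:
        # 2x physical RAM for up to 2 GB of RAM
        swap_gb = 2 * ram_gb
    else:
        # 2x for the first 2 GB, then 1x for everything above it
        swap_gb = 2 * 2 + (ram_gb - 2)  # same as ram_gb + 2
    return max(swap_gb, min_swap_gb)


if __name__ == "__main__":
    for ram in (1, 2, 8, 64):
        print(f"{ram} GB RAM -> {recommended_swap_gb(ram)} GB swap")
```

By this rule, a machine with 64 GB of RAM would want about 66 GB of swap. Keep that number in mind for the scenario below.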

 

Real-life Scenario:

Recently I worked with one of my colleagues on a Linux server where an application kept crashing. The topic caught my interest the first time I touched it, and it proves the rule of thumb.

Problem Description: The user has an application that has been going down frequently since the upgrade. They noticed that the application would suddenly stop running.

  • If we restarted the service, it would work for some time, but it would soon crash again.
  • The Linux box's load average and available memory looked normal, and no partitions were full.
  • A quick pass through the application logs showed no obvious error indicating that the issue was caused by the application itself, such as a bug.

When I closely reviewed the /var/log/messages logs, the answer to the issue quickly emerged.

 

Now move to the key part by filtering the log:
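
A filter of this kind can be sketched in Python. The two marker strings are an assumption about how the kernel words its OOM messages ("invoked oom-killer" and "Out of memory"), and the function names filter_oom_lines and scan_log are hypothetical helpers, not part of any tool.

```python
from typing import Iterable, Iterator, List

# Assumed markers: substrings the kernel typically logs when the OOM killer runs.
OOM_MARKERS = ("oom-killer", "out of memory")


def filter_oom_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the lines that mention the OOM killer, case-insensitively."""
    for line in lines:
        lowered = line.lower()
        if any(marker in lowered for marker in OOM_MARKERS):
            yield line.rstrip("\n")


def scan_log(path: str = "/var/log/messages") -> List[str]:
    """Read a syslog file and return its OOM-related lines."""
    # errors="replace" keeps the scan going if the log contains odd bytes.
    with open(path, encoding="utf-8", errors="replace") as handle:
        return list(filter_oom_lines(handle))
```

Calling scan_log() with no argument reads /var/log/messages, which usually requires root. The matching lines name the process that triggered the killer and the process that was killed.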

This machine has around 64G of memory but only around 2G of swap space allocated, which caused oom-killer to kill the application consuming the most memory, which was uwsgi. That's the application the user complained about.
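
To spot this kind of mismatch on another box, the totals can be read from /proc/meminfo. The sketch below relies on the MemTotal and SwapTotal fields, which Linux reports in kB; the helper names parse_meminfo, read_meminfo and swap_to_ram_ratio are made up for illustration.

```python
def parse_meminfo(text: str) -> dict:
    """Map each /proc/meminfo field name to its integer value (kB for size fields)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        # Keep only well-formed "Name:   <number> [unit]" lines.
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def read_meminfo(path: str = "/proc/meminfo") -> dict:
    """Read and parse the live /proc/meminfo file."""
    with open(path) as handle:
        return parse_meminfo(handle.read())


def swap_to_ram_ratio(meminfo: dict) -> float:
    """Return SwapTotal divided by MemTotal."""
    return meminfo["SwapTotal"] / meminfo["MemTotal"]
```

On a machine like the one in this story, swap_to_ram_ratio(read_meminfo()) would come out at roughly 2 / 64, about 0.03, far below what the rule of thumb suggests.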