Abnormal number of hardware interrupts caused high CPU usage

Starter:

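An interrupt is a signal asking the CPU to set aside its current work for a very short time (typically no more than milliseconds) to handle an event. Hardware devices raise interrupts when they need attention, and a userland process triggers one on purpose when it makes a system call to request a service from the kernel. When this happens on a single-core system, the CPU handles the interrupt, and the OS then uses process priorities and its scheduling algorithm to decide whether the interrupted process keeps running or is put back on the ready queue.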

Interrupts are divided into software interrupts and hardware interrupts, depending on which component triggers them. Either way, too many interrupts will significantly degrade performance, and hardware interrupts sustained at a high rate can be especially harmful. Let’s review a real case as an example, but first, let’s go over some background on interrupts.

 

Knowledge Prerequisites:

Let’s review some background on interrupts.

What are Interrupts?

Sometimes the ordinary flow of a program must be interrupted to process events that require a prompt response. The hardware of a computer provides a mechanism called interrupts to handle these events. For example, when a mouse is moved, the mouse hardware interrupts the current program to handle the mouse movement (to move the mouse cursor, etc.). Interrupts cause control to be passed to an interrupt handler, a routine that processes the interrupt. Each type of interrupt is assigned an integer number. On x86 PCs running in real mode, a table of interrupt vectors resides at the beginning of physical memory and contains the segmented addresses of the interrupt handlers. The interrupt number is essentially an index into this table.
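To make the table lookup concrete, here is a minimal Python sketch; it is a toy model, not real interrupt-handling code. The handler names are hypothetical, and the vector numbers 08h and 09h follow the classic IBM PC mapping of the timer (IRQ0) and keyboard (IRQ1) interrupts.

```python
# Toy model of an interrupt vector table: a 256-entry table indexed by the
# interrupt number, where each entry refers to the handler routine to run.
# (In x86 real mode each real entry is a 4-byte segment:offset pointer;
# here plain Python functions stand in for those addresses.)
from typing import Callable, List

Handler = Callable[[], str]


def default_handler() -> str:
    return "unhandled interrupt"


def timer_handler() -> str:       # hypothetical handler
    return "timer tick handled"


def keyboard_handler() -> str:    # hypothetical handler
    return "key press handled"


# Every vector starts out pointing at the default handler.
ivt: List[Handler] = [default_handler] * 256
ivt[0x08] = timer_handler     # IRQ0 (system timer) on the classic PC
ivt[0x09] = keyboard_handler  # IRQ1 (keyboard) on the classic PC


def raise_interrupt(number: int) -> str:
    """Dispatch: the interrupt number is simply an index into the table."""
    return ivt[number]()
```

For example, raise_interrupt(0x09) runs keyboard_handler, while any vector that was never installed falls through to default_handler.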

This lecture is also a good read:
http://www.cs.toronto.edu/~demke/469F.06/Lectures/Lecture6.pdf
 

How many types of Interrupts are there?

External interrupts are raised from outside the CPU. (The mouse is an example of this type.) Many I/O devices raise interrupts (e.g., keyboard, timer, disk drives, CD-ROM and sound cards). Internal interrupts are raised from within the CPU, either from an error or from the interrupt instruction. Error interrupts are also called traps. Interrupts generated from the interrupt instruction are called software interrupts. DOS uses these types of interrupts to implement its API (Application Programming Interface). More modern operating systems (such as Windows and UNIX) use a C-based interface.
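How DOS exposed its API through a software interrupt can be sketched as a small Python simulation. It assumes the well-known DOS convention of loading a function number into register AH before executing INT 21h. Only two functions are modeled (09h, print a “$”-terminated string, and 4Ch, exit with the return code in AL); the Machine and Registers classes are hypothetical simplifications, and DX holds the string itself rather than a pointer.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Registers:
    ah: int = 0
    al: int = 0
    dx: str = ""  # simplification: the string itself, not a DS:DX pointer


@dataclass
class Machine:
    regs: Registers = field(default_factory=Registers)
    output: List[str] = field(default_factory=list)
    exit_code: Optional[int] = None


def int21h(m: Machine) -> None:
    """Toy INT 21h handler: dispatch on the function number held in AH."""
    if m.regs.ah == 0x09:
        # Print everything up to (not including) the '$' terminator.
        m.output.append(m.regs.dx.split("$", 1)[0])
    elif m.regs.ah == 0x4C:
        m.exit_code = m.regs.al
    else:
        raise NotImplementedError(f"AH={m.regs.ah:#04x} is not modeled")


def software_interrupt(m: Machine, number: int) -> None:
    """Stand-in for the INT instruction; only vector 21h is wired up here."""
    if number != 0x21:
        raise NotImplementedError(f"INT {number:#04x} is not modeled")
    int21h(m)
```

The point of the design is that the program never calls the OS routine by address: it only sets registers and raises interrupt 21h, and the handler picks the service from AH.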

Many interrupt handlers return control back to the interrupted program when they finish. They restore all the registers to the same values they had before the interrupt occurred. Thus, the interrupted program runs as if nothing happened (except that it lost some CPU cycles). Traps generally do not return. Often they abort the program.
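The difference between a handler that returns and a trap that aborts can be sketched in a few lines of Python. This is a toy model under stated assumptions: the register names, the noisy_handler routine and the ProgramAborted exception are all hypothetical.

```python
from typing import Callable, Dict

Regs = Dict[str, int]


class ProgramAborted(Exception):
    """Raised when a trap terminates the interrupted program (hypothetical)."""


def handle_interrupt(regs: Regs, handler: Callable[[Regs], None]) -> None:
    saved = dict(regs)   # save every register on entry
    handler(regs)        # the handler may clobber registers while it works
    regs.clear()
    regs.update(saved)   # restore them all before returning


def handle_trap(regs: Regs, reason: str) -> None:
    # Traps generally do not return to the faulting program.
    raise ProgramAborted(f"{reason} at ip={regs['ip']:#x}")


def noisy_handler(regs: Regs) -> None:  # hypothetical handler
    regs["ax"] = 0xDEAD
    regs["bx"] = 0xBEEF
```

After handle_interrupt returns, the caller's registers hold exactly the values they had before, which is why the interrupted program only notices the lost CPU cycles.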
 

What are the typical uses of Interrupts?

Typical uses of interrupts include the following: system timers, disk I/O, power-off signals, and traps. Other interrupts exist to transfer data bytes using UARTs or Ethernet; sense key-presses; control motors; or anything else the equipment must do.

Another typical use is to generate periodic interrupts by dividing the output of a crystal oscillator and having an interrupt handler count the interrupts in order for a processor to keep time. These periodic interrupts are often used by the OS’s task scheduler to reschedule the priorities of running processes. Some older computers generated periodic interrupts from the power line frequency because it was controlled by the utilities to eliminate long-term drift of electric clocks.
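The timekeeping arithmetic can be worked through in Python, assuming the classic IBM PC programmable interval timer (PIT), whose input clock is derived from a crystal oscillator at about 1,193,182 Hz. The function names are illustrative, and the divisors are examples.

```python
PIT_INPUT_HZ = 1_193_182  # classic PC PIT input clock, approximately


def interrupt_rate_hz(divisor: int) -> float:
    """Timer interrupts per second when the input clock is divided by `divisor`."""
    return PIT_INPUT_HZ / divisor


def elapsed_seconds(tick_count: int, divisor: int) -> float:
    """Time kept by a handler that simply counts timer interrupts (ticks)."""
    return tick_count / interrupt_rate_hz(divisor)
```

With the maximum divisor of 65536, as used by the classic DOS/BIOS setup, interrupt_rate_hz(65536) is about 18.2, that is, one tick roughly every 54.9 ms; a divisor of 1193 gives a tick rate of about 1 kHz instead.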

For example, a disk interrupt signals the completion of a data transfer from or to the disk peripheral; a process waiting to read or write a file starts up again. As another example, a power-off interrupt predicts or requests a loss of power, allowing the computer equipment to perform an orderly shut-down. Also, interrupts are used in typeahead features for buffering events like keystrokes.
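The disk example (a waiting process that starts up again when the completion interrupt arrives) can be sketched as a user-space analogy in Python. This is not kernel code: threading.Event stands in for the OS wait queue, a threading.Timer plays the disk raising its interrupt, and DiskRequest and read_block are hypothetical names.

```python
import threading
from typing import Optional


class DiskRequest:
    def __init__(self) -> None:
        self.done = threading.Event()
        self.data: Optional[bytes] = None

    def complete(self, data: bytes) -> None:
        """Plays the role of the disk interrupt handler."""
        self.data = data
        self.done.set()  # wake whoever is waiting on this request


def read_block(simulated_latency_s: float = 0.05) -> bytes:
    req = DiskRequest()
    # Pretend the disk finishes the transfer later and raises an interrupt.
    threading.Timer(simulated_latency_s, req.complete,
                    args=(b"block-0 contents",)).start()
    req.done.wait()  # the "process" sleeps here instead of polling the disk
    if req.data is None:
        raise RuntimeError("woken without data")
    return req.data
```

The design choice mirrors why interrupts exist: the waiting side blocks without consuming CPU, and the completion event, not a polling loop, decides when it runs again.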

Interrupts are used to allow emulation of instructions that are unimplemented on certain models in a computer line. For example, floating-point instructions may be implemented in hardware on some systems and emulated on lower-cost systems. Execution of an unimplemented instruction will cause an interrupt. The operating system’s interrupt handler will recognize the occurrence of an unimplemented instruction, interpret the instruction in a software routine, and then return to the interrupted program as if the instruction had been executed. This provides application software portability across the entire line.
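This trap-and-emulate flow can be sketched with a toy CPU in Python. The instruction set (MOV, ADD, FMUL), the register names and all function names are hypothetical: the "hardware" implements only MOV and ADD, so FMUL raises an unimplemented-instruction trap that the "OS" handler emulates in software before execution continues with the next instruction.

```python
from typing import Dict, List, Tuple, Union

Value = Union[int, float]
Instr = Tuple[str, str, Union[str, Value]]


class UnimplementedInstruction(Exception):
    """Trap raised by the toy hardware for opcodes it does not implement."""


def execute_in_hardware(regs: Dict[str, Value], op: str, dst: str, src) -> None:
    if op == "MOV":
        regs[dst] = src
    elif op == "ADD":
        regs[dst] = regs[dst] + regs[src]
    else:
        raise UnimplementedInstruction(op)


def os_trap_handler(regs: Dict[str, Value], op: str, dst: str, src) -> None:
    # Software emulation of an instruction the hardware lacks.
    if op == "FMUL":
        regs[dst] = float(regs[dst]) * float(regs[src])
    else:
        # Not emulatable: a real OS would abort the program here.
        raise RuntimeError(f"illegal instruction {op}")


def run(program: List[Instr]) -> Dict[str, Value]:
    regs: Dict[str, Value] = {}
    for op, dst, src in program:
        try:
            execute_in_hardware(regs, op, dst, src)
        except UnimplementedInstruction:
            os_trap_handler(regs, op, dst, src)  # then resume with the next instruction
    return regs
```

For example, running [("MOV", "r1", 3), ("MOV", "r2", 2.5), ("FMUL", "r1", "r2"), ("ADD", "r1", "r2")] leaves 10.0 in r1, and the program cannot tell that FMUL was emulated.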

 

Real-life Scenario:

A user reported poor read and write performance from some Windows Server 2012 machines to the cluster that provides file services.

After checking the cluster status, we found that only node 2 had significantly high CPU utilization, while the other nodes were in a normal state.

The uptime command shows the 1-, 5- and 15-minute load averages of the CPU (roughly, how many tasks are running or waiting in the queue to be processed). If your system has four logical cores and you want the CPU to stay in an optimal state, this number should not exceed 4.
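The rule of thumb can be checked with a short Python sketch. os.getloadavg() returns the same three numbers that uptime prints and os.cpu_count() returns the number of logical CPUs; os.getloadavg() is only available on Unix-like systems such as FreeBSD and Linux. The helper names and the sample numbers are illustrative, and "load above the core count" is a rule of thumb, not a hard limit.

```python
import os
from typing import Sequence, Tuple


def load_per_core(loads: Sequence[float], cores: int) -> Tuple[float, ...]:
    """Load average divided by logical cores; 1.0 means every core is busy."""
    return tuple(load / cores for load in loads)


def over_capacity(loads: Sequence[float], cores: int) -> Tuple[bool, ...]:
    """True for each 1/5/15-minute window whose load exceeds the core count."""
    return tuple(load > cores for load in loads)


def current_report() -> str:
    # Unix-only; raises OSError if the load average cannot be obtained.
    loads = os.getloadavg()
    cores = os.cpu_count() or 1
    return f"load={loads} cores={cores} over_capacity={over_capacity(loads, cores)}"
```

On a four-core machine, load averages of (5.2, 3.9, 2.1) would flag only the 1-minute window, which suggests a recent spike rather than sustained overload.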

 

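SMB TCP connections were balanced across the first three nodes, so the high load on node 2 was not caused by it serving an unfair share of client connections.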

 

On FreeBSD, I used the command to list the twelve processes with the highest CPU utilization. As we can see, the first process took over 62% WCPU, about twice as much as the second process. Also, these were all hardware-related processes; none of them were userland processes.

So we learned that most of the CPU time was consumed by these hardware-level internal requests. That is abnormal in most situations: we want a file server to spend most of its CPU time working on user requests, not on this kind of overhead.

After going through a hardware-level check, we found the cause was a DIMM ECC error.

 

That explained the high CPU usage on node 2: the memory issue caused a low hit rate and frequent page faults, which forced data to be retrieved from disk into memory.
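Why frequent page faults are so expensive can be put in numbers with the standard textbook effective-access-time formula for demand paging, EAT = (1 - p) × memory access time + p × page-fault service time, where p is the page-fault rate. The 100 ns and 8 ms timings below are illustrative assumptions, not measurements from node 2.

```python
MEMORY_ACCESS_NS = 100       # assumed main-memory access time
PAGE_FAULT_NS = 8_000_000    # assumed 8 ms to service a page fault from disk


def effective_access_time_ns(fault_rate: float) -> float:
    """EAT = (1 - p) * memory access time + p * page-fault service time."""
    if not 0.0 <= fault_rate <= 1.0:
        raise ValueError("fault_rate must be between 0 and 1")
    return (1 - fault_rate) * MEMORY_ACCESS_NS + fault_rate * PAGE_FAULT_NS


def slowdown(fault_rate: float) -> float:
    """How many times slower memory appears compared with no page faults."""
    return effective_access_time_ns(fault_rate) / MEMORY_ACCESS_NS
```

With these assumed numbers, a fault rate of just 0.001 (one access in a thousand) raises the effective access time from 100 ns to about 8,100 ns, roughly 81 times slower.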

 

Referenced links and documents:

  • Paul A. Carter, PC Assembly Language, November 2003.
  • Wikipedia, “Interrupt”, https://en.wikipedia.org/wiki/Interrupt

 
