Recently we have been doing a lot of testing of oclHashcat on CentOS Linux servers. The oclHashcat application takes advantage of the GPUs, or Graphics Processing Units, of Nvidia or ATI graphics cards. One of the servers we have been testing with has four Nvidia GTX 295s and at times was receiving an error stating that the kernel was disabling an IRQ. Below we describe the error in more detail along with the kernel parameter that was added to resolve it. Even though we experienced this issue with oclHashcat specifically, the error could happen with other applications and/or Linux distributions, and the resolution could be the same.
Kernel IRQ Disable Error On CentOS:
server kernel: Disabling IRQ #177
In the actual output the server’s hostname appears before “kernel”, and the IRQ number won’t necessarily be the same since any IRQ could experience the issue. After some investigation I noticed the output below in the messages log file, written at the same time the above error was printed to the shell I was working from.
CentOS Messages Log File With IRQ Error Details:
Oct 16 12:20:29 server kernel: irq 177: nobody cared (try booting with the "irqpoll" option)
Oct 16 12:20:29 server kernel:
Oct 16 12:20:29 server kernel: Call Trace:
Oct 16 12:20:29 server kernel: <IRQ> [<ffffffff800babaf>] __report_bad_irq+0x30/0x7d
Oct 16 12:20:29 server kernel: [<ffffffff800bade2>] note_interrupt+0x1e6/0x227
Oct 16 12:20:29 server kernel: [<ffffffff800ba2de>] __do_IRQ+0xbd/0x103
Oct 16 12:20:29 server kernel: [<ffffffff8001231e>] __do_softirq+0x89/0x133
Oct 16 12:20:29 server kernel: [<ffffffff8006c9bf>] do_IRQ+0xe7/0xf5
Oct 16 12:20:29 server kernel: [<ffffffff8005726a>] mwait_idle+0x0/0x4a
Oct 16 12:20:29 server kernel: [<ffffffff8005d615>] ret_from_intr+0x0/0xa
Oct 16 12:20:29 server kernel: <EOI> [<ffffffff88d7828b>] :acpi_cpufreq:acpi_cpufreq_target+0x0/0x3f8
Oct 16 12:20:29 server kernel: [<ffffffff800572a0>] mwait_idle+0x36/0x4a
Oct 16 12:20:29 server kernel: [<ffffffff8004947b>] cpu_idle+0x95/0xb8
Oct 16 12:20:29 server kernel: [<ffffffff80077474>] start_secondary+0x498/0x4a7
Oct 16 12:20:29 server kernel:
Oct 16 12:20:29 server kernel: handlers:
Oct 16 12:20:29 server kernel: [<ffffffff801f1eda>] (usb_hcd_irq+0x0/0x55)
Oct 16 12:20:29 server kernel: [<ffffffff801f1eda>] (usb_hcd_irq+0x0/0x55)
Oct 16 12:20:29 server kernel: [<ffffffff886fa92c>] (nv_kern_isr+0x0/0x54 [nvidia])
Oct 16 12:20:29 server last message repeated 2 times
Oct 16 12:20:29 server kernel: Disabling IRQ #177
As seen above in the messages log file output, there is a problem with IRQ #177: none of the registered handlers claimed the interrupt (“nobody cared”), so the kernel disabled that IRQ. The handlers section shows that the IRQ is shared by two USB handlers and the Nvidia driver. Notice that the very first line of the output recommends passing irqpoll to the kernel during boot, which is easy to do by modifying the grub.conf file on your server. You might also be curious what irqpoll actually is, so below is a brief description of irqpoll followed by an example of a modified grub.conf file that passes irqpoll to the kernel.
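If the messages log is large, a short script can pull out the relevant details. The following is a minimal Python 3 sketch, assuming log lines formatted like the excerpt above; the function name summarize_irq_errors and the regular expressions are illustrative and are not part of any CentOS tool. It reports, for each IRQ, whether the kernel disabled it and which handlers were listed.

```python
import re

# Patterns for the line types seen in the log excerpt above (assumed format).
NOBODY_CARED = re.compile(r"kernel: irq (\d+): nobody cared")
DISABLING = re.compile(r"kernel: Disabling IRQ #(\d+)")
# Matches e.g. "(usb_hcd_irq+0x0/0x55)" or "(nv_kern_isr+0x0/0x54 [nvidia])".
HANDLER = re.compile(r"\((\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?\)")


def summarize_irq_errors(log_text):
    """Return {irq: {"disabled": bool, "handlers": [names]}} for a log excerpt."""
    summary = {}
    current = None        # IRQ whose report is currently being read
    in_handlers = False   # True once the "handlers:" line has been seen
    for line in log_text.splitlines():
        match = NOBODY_CARED.search(line)
        if match:
            current = int(match.group(1))
            summary.setdefault(current, {"disabled": False, "handlers": []})
            in_handlers = False
            continue
        match = DISABLING.search(line)
        if match:
            irq = int(match.group(1))
            summary.setdefault(irq, {"disabled": False, "handlers": []})
            summary[irq]["disabled"] = True
            current, in_handlers = None, False
            continue
        if current is None:
            continue
        if line.rstrip().endswith("kernel: handlers:"):
            in_handlers = True
            continue
        if in_handlers:
            match = HANDLER.search(line)
            if match:
                symbol, module = match.groups()
                name = f"{symbol} [{module}]" if module else symbol
                summary[current]["handlers"].append(name)
    return summary


if __name__ == "__main__":
    # /var/log/messages is the usual location on CentOS; reading it needs root.
    with open("/var/log/messages") as handle:
        for irq, info in summarize_irq_errors(handle.read()).items():
            print(irq, info)
```

The parsing works on a plain string, so it can also be run against a copy of the log on another machine.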
CentOS Kernel Option irqpoll:
When an interrupt is not handled, search all known interrupt handlers for it and also check all handlers on each timer interrupt. This is intended to get systems with badly broken firmware running.
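Whether a running kernel was actually booted with irqpoll can be checked by reading /proc/cmdline, which holds the parameters passed to the kernel at boot. The following is a small Python 3 sketch; the helper names has_boot_option and running_kernel_has_option are made up for illustration.

```python
def has_boot_option(cmdline_text, option="irqpoll"):
    """Return True if `option` appears as a whole word in a kernel command line."""
    return option in cmdline_text.split()


def running_kernel_has_option(option="irqpoll", path="/proc/cmdline"):
    """Check the command line of the currently running kernel (Linux only)."""
    with open(path) as handle:
        return has_boot_option(handle.read(), option)


if __name__ == "__main__":
    print("irqpoll active:", running_kernel_has_option())
```

Splitting on whitespace rather than searching for a substring keeps a different parameter that merely contains the text "irqpoll" from being counted as a match.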
Example CentOS Grub Configuration File With irqpoll Option:
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE: You have a /boot partition. This means that
# all kernel and initrd paths are relative to /boot/, eg.
# root (hd0,0)
# kernel /vmlinuz-version ro root=/dev/VolGroup00/LogVol00
# initrd /initrd-version.img
#boot=/dev/sda
default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title CentOS (2.6.18-194.17.1.el5.centos.plus)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-194.17.1.el5.centos.plus ro root=/dev/VolGroup00/LogVol00 irqpoll
        initrd /initrd-2.6.18-194.17.1.el5.centos.plus.img
title CentOS (2.6.18-164.11.1.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-164.11.1.el5 ro root=/dev/VolGroup00/LogVol00 irqpoll
        initrd /initrd-2.6.18-164.11.1.el5.img
title CentOS (2.6.18-164.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-164.el5 ro root=/dev/VolGroup00/LogVol00
        initrd /initrd-2.6.18-164.el5.img
Please note that there are three possible kernels that the server can boot from, but the “default=0” line specifies that the server should boot the very first kernel entry, which in this case is “CentOS (2.6.18-194.17.1.el5.centos.plus)”. The line that starts with kernel below each title line is where irqpoll is specified at the end; in the example above it has been added to the first two entries, while the third is left unmodified. The grub.conf file is located in /boot/grub/ and there is also a symbolic link to it in the /etc directory. Use your favorite file editor, such as vi, to modify /boot/grub/grub.conf and simply add irqpoll to the end of the kernel line. Make sure not to change anything else in the grub.conf file, since any errors in this file could cause the server not to boot. The new option takes effect the next time the server is rebooted.
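For anyone who would rather script the change than edit by hand, the edit described above can be sketched in Python 3. This is only an illustration under stated assumptions: the function add_kernel_option is hypothetical, it assumes the legacy GRUB layout shown above (one kernel line per title entry), and it works on the file contents as a string so that nothing is written to disk until you choose to.

```python
def add_kernel_option(conf_text, entry_index=0, option="irqpoll"):
    """Append `option` to the kernel line of one title entry in grub.conf text.

    Entries are counted from 0 in file order, matching the "default=" setting.
    Comment lines are ignored because their first token starts with "#".
    """
    out = []
    entry = -1  # index of the title entry currently being read
    for line in conf_text.splitlines():
        tokens = line.split()
        if tokens and tokens[0] == "title":
            entry += 1
        elif (tokens and tokens[0] == "kernel" and entry == entry_index
              and option not in tokens):
            line = line.rstrip() + " " + option
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    path = "/boot/grub/grub.conf"
    with open(path) as handle:
        original = handle.read()
    # Print the proposed result for review instead of overwriting the file.
    print(add_kernel_option(original, entry_index=0))
```

The script prints the modified configuration rather than saving it, which leaves room to compare it against the original and keep a backup copy before replacing /boot/grub/grub.conf. Running the function twice on the same text does not add the option a second time.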