Processes that do open() system call on /dev/sr0 go into uninterruptible sleep

Ask questions about dedicated servers here and we and other users will do our best to answer them. Please also refer to the self-help section for tutorials and answers to the most commonly asked questions.
Post Reply
goate
New to forums
New to forums
Posts: 2
https://www.youtube.com/channel/UC40BgXanDqOYoVCYFDSTfHA
Joined: Tue May 23, 2017 5:52 pm

Processes that do open() system call on /dev/sr0 go into uninterruptible sleep

Post by goate »

Hello; New NFO user here. My VPS is CentOS7, fully updated. I use "puppet" to manage configuration. One of the first things puppet does when it starts is run "Facter", which gathers information on the system it is running on. After an arbitrary number of puppet runs, my puppet processes begin entering uninterruptible sleep (D) state, which means they will not go away until they either get what they are waiting for or the machine reboots.

From "ps aux | grep puppet" :

Code: Select all

root 7263 0.1 2.7 493232 79984 ? D 20:01 0:03 /opt/puppetlabs/puppet/bin/ruby /opt/puppetlabs/puppet/bin/puppet apply --configtimeout=1024 --environment=XXXXXX_testing /etc/puppetlabs/code/environments/XXXXXX_testing/manifests/site.pp
So, I did an strace to try to figure out which system call was causing this process to go into uninterruptible sleep. The result:

Code: Select all

open("/dev/sr0", O_RDONLY|O_CLOEXEC
Opening /dev/sr0 is something Facter would do while enumerating the system. I decided to try a more direct test with the "eject /dev/sr0" command. It went as well as I expected - the process froze. From "ps aux | grep eject":

Code: Select all

root 9446 0.0 0.0 123456 1220 ? D 20:31 0:00 eject /dev/sr0
Alright. Well, let's try reading a block from it. From "ps aux | grep dd"

Code: Select all

root 9851 0.0 0.0 107940 604 ? D 20:40 0:00 dd if=/dev/sr0 bs=512 count=1
Nope.

Dmesg (kernel) sayz:

Code: Select all

[116640.195077] INFO: task puppet:7263 blocked for more than 120 seconds.
[116640.197003] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[116640.200543] puppet          D ffff8800b9773800     0  7263   6234 0x00000084
[116640.200549]  ffff8800b5b27980 0000000000000086 ffff880036b79f60 ffff8800b5b27fd8
[116640.200552]  ffff8800b5b27fd8 ffff8800b5b27fd8 ffff880036b79f60 ffff8800b5b27ac8
[116640.200555]  ffff8800b5b27ad0 7fffffffffffffff ffff880036b79f60 ffff8800b9773800
[116640.200558] Call Trace:
[116640.200568]  [<ffffffff8168c169>] schedule+0x29/0x70
[116640.200572]  [<ffffffff81689bc9>] schedule_timeout+0x239/0x2c0
[116640.200578]  [<ffffffff813d6d61>] ? notify_remote_via_irq+0x51/0x70
[116640.200581]  [<ffffffff813d7f3b>] ? xen_send_IPI_one+0x2b/0x30
[116640.200585]  [<ffffffff810c5308>] ? try_to_wake_up+0x1c8/0x320
[116640.200589]  [<ffffffff8168c546>] wait_for_completion+0x116/0x170
[116640.200592]  [<ffffffff810c54c0>] ? wake_up_state+0x20/0x20
[116640.200595]  [<ffffffff810a967c>] flush_work+0xfc/0x1c0
[116640.200600]  [<ffffffff810a55f0>] ? move_linked_works+0x90/0x90
[116640.200603]  [<ffffffff810a97c9>] __cancel_work_timer+0x89/0x120
[116640.200605]  [<ffffffff810a9893>] cancel_delayed_work_sync+0x13/0x20
[116640.200609]  [<ffffffff812ffee0>] disk_block_events+0x80/0x90
[116640.200622]  [<ffffffff8123b35e>] __blkdev_get+0x6e/0x4e0
[116640.200625]  [<ffffffff8123b9a5>] blkdev_get+0x1d5/0x360
[116640.200627]  [<ffffffff8123bbdb>] blkdev_open+0x5b/0x80
[116640.200630]  [<ffffffff811fc437>] do_dentry_open+0x1a7/0x2e0
[116640.200634]  [<ffffffff812a8f5c>] ? security_inode_permission+0x1c/0x30
[116640.200636]  [<ffffffff8123bb80>] ? blkdev_get_by_dev+0x50/0x50
[116640.200638]  [<ffffffff811fc60f>] vfs_open+0x5f/0xe0
[116640.200641]  [<ffffffff81209fd8>] ? may_open+0x68/0x110
[116640.200643]  [<ffffffff8120d17d>] do_last+0x1ed/0x12a0
[116640.200645]  [<ffffffff8120e2f2>] path_openat+0xc2/0x490
[116640.200647]  [<ffffffff8121046b>] do_filp_open+0x4b/0xb0
[116640.200655]  [<ffffffff8121d0e7>] ? __alloc_fd+0xa7/0x130
[116640.200658]  [<ffffffff811fd973>] do_sys_open+0xf3/0x1f0
[116640.200660]  [<ffffffff811fda8e>] SyS_open+0x1e/0x20
[116640.200663]  [<ffffffff81697189>] system_call_fastpath+0x16/0x1b
Bottom line is that any process that opens /dev/sr0 is screwed. Puppet has run quite a few times, so this obviously wasn't a problem right after bootup. Possibly a Xen bug?

I tried using the NFO "Server control" menu to switch out DVD images in /dev/sr0 in hopes that /dev/sr0 would talk to my stuck processes and end their long sleep, but evidently that doesn't take effect until reboot.

I don't use the dvd/cdrom functionality in my VPS, so I'm going to attempt to blacklist the cdrom and sr_mod kernel modules. But this is quite a pain in the ass. Possibly give an option in the "Server control" menu to not have DVD drives in the guest at all?

Has anyone else had this issue? I ran a couple of searches on the forums and didn't see anything.
goate
New to forums
New to forums
Posts: 2
Joined: Tue May 23, 2017 5:52 pm

Re: Processes that do open() system call on /dev/sr0 go into uninterruptible sleep

Post by goate »

Just a follow-up, puppet ran fine after I removed the /dev/sr0 and /dev/sr1 files. The existing processes in D state stuck around, of course. I'll see how the module blacklisting worked next reboot.
Post Reply