Bug in ZFS 0.7.3-1 on CentOS - the zfs module is missing and the system is not booting (kmod-zfs-0.7.3-1.el7_3.x86_64) [WORKAROUND, NOT FIXED!]

Don't panic if you are using zfs-dkms; it looks like the bug affects only kmod-zfs.

Update: I was wrong. I had not changed the version of the repo from 7.3 to 7.4. When updating a CentOS system that uses kmod-zfs, please also update your zfs.repo.

If you don't install the correct kernel as described here, you might see an error message (the system will not boot if it is installed on a ZFS partition).

The bug occurs because the ZFS module is compiled for a different kernel than the one installed.

(A simple workaround is to migrate from kmod-zfs to zfs-dkms, as described here.)

The current kernel is 3.10.0-514.10.2.el7.x86_64.

But the ZFS kernel module is for 3.10.0-514.26.2.el7.x86_64:

[root@localhost ~]# find /lib/modules/  | grep zfs
/lib/modules/3.10.0-514.26.2.el7.x86_64/extra/zfs
/lib/modules/3.10.0-514.26.2.el7.x86_64/extra/zfs/avl
/lib/modules/3.10.0-514.26.2.el7.x86_64/extra/zfs/avl/zavl.ko
/lib/modules/3.10.0-514.26.2.el7.x86_64/extra/zfs/nvpair
/lib/modules/3.10.0-514.26.2.el7.x86_64/extra/zfs/nvpair/znvpair.ko
/lib/modules/3.10.0-514.26.2.el7.x86_64/extra/zfs/unicode
/lib/modules/3.10.0-514.26.2.el7.x86_64/extra/zfs/unicode/zunicode.ko
/lib/modules/3.10.0-514.26.2.el7.x86_64/extra/zfs/icp
/lib/modules/3.10.0-514.26.2.el7.x86_64/extra/zfs/icp/icp.ko
/lib/modules/3.10.0-514.26.2.el7.x86_64/extra/zfs/zfs
/lib/modules/3.10.0-514.26.2.el7.x86_64/extra/zfs/zfs/zfs.ko
/lib/modules/3.10.0-514.26.2.el7.x86_64/extra/zfs/zpios
/lib/modules/3.10.0-514.26.2.el7.x86_64/extra/zfs/zpios/zpios.ko
/lib/modules/3.10.0-514.26.2.el7.x86_64/extra/zfs/zcommon
/lib/modules/3.10.0-514.26.2.el7.x86_64/extra/zfs/zcommon/zcommon.ko
[root@localhost ~]# 
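
The comparison above can also be scripted. What follows is only a minimal sketch and not part of the original procedure: the function names `zfs_module_kernels` and `zfs_matches_kernel` are made up for illustration, and it assumes the module file is called `zfs.ko` (possibly with a compression suffix) somewhere under each kernel's directory in /lib/modules.

```shell
#!/bin/sh
# Print every kernel version under a modules root that ships a zfs.ko.
# $1: modules root (hypothetical parameter, defaults to /lib/modules).
zfs_module_kernels() {
    root="${1:-/lib/modules}"
    root="${root%/}"
    for dir in "$root"/*/; do
        # Skip the unexpanded glob when the root has no subdirectories.
        [ -d "$dir" ] || continue
        if find "$dir" -name 'zfs.ko*' | grep -q .; then
            basename "$dir"
        fi
    done
}

# Succeed only if the given kernel version has a zfs.ko.
# $1: kernel version, $2: modules root (optional).
zfs_matches_kernel() {
    zfs_module_kernels "${2:-/lib/modules}" | grep -Fqx -- "$1"
}

# Example: warn when the running kernel has no matching ZFS module.
# zfs_matches_kernel "$(uname -r)" || echo "no zfs.ko for $(uname -r)" >&2
```

On the system shown here, the first function would be expected to print only the 514.26.2 version, so the check for the running 514.10.2 kernel would fail.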

Let's assume we have an old CentOS install (installed as described in this guide):

[root@localhost ~]# cat /proc/version 
Linux version 3.10.0-514.10.2.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Fri Mar 3 00:04:05 UTC 2017
[root@localhost ~]# cat /etc/redhat-release 
CentOS Linux release 7.3.1611 (Core) 
[root@localhost ~]# 

Here are the installed kernels:

[root@localhost ~]# ls /boot/
config-3.10.0-514.10.2.el7.x86_64
config-3.10.0-514.el7.x86_64
grub
grub2
initramfs-0-rescue-263ede4315d14b8ba4b3a93abc833aea.img
initramfs-3.10.0-514.10.2.el7.x86_64.img
initramfs-3.10.0-514.el7.x86_64.img
initrd-plymouth.img
symvers-3.10.0-514.10.2.el7.x86_64.gz
symvers-3.10.0-514.el7.x86_64.gz
System.map-3.10.0-514.10.2.el7.x86_64
System.map-3.10.0-514.el7.x86_64
vmlinuz-0-rescue-263ede4315d14b8ba4b3a93abc833aea
vmlinuz-3.10.0-514.10.2.el7.x86_64
vmlinuz-3.10.0-514.el7.x86_64
[root@localhost ~]# 

Let's make a backup of the 3.10.0-514.10.2.el7.x86_64 kernel and its initramfs, just in case:

[root@localhost ~]# cd /boot/
[root@localhost boot]# cp initramfs-3.10.0-514.10.2.el7.x86_64.img initramfs-3.10.0-514.10.2.bak1.el7.x86_64.img
[root@localhost boot]# cp vmlinuz-3.10.0-514.10.2.el7.x86_64 vmlinuz-3.10.0-514.10.2.bak1.el7.x86_64
[root@localhost boot]# cd
[root@localhost ~]# 
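
The two copies can be wrapped in a small helper so the same step is easy to repeat for other kernels. This is a sketch under assumptions: `backup_kernel` is a hypothetical name, and it relies on the `vmlinuz-<version>` and `initramfs-<version>.img` naming seen in the listing above, inserting `.bak1` before the `.el7` part just like the manual commands do.

```shell
#!/bin/sh
# Copy a kernel image and its initramfs to ".bak1" names.
# $1: kernel version, e.g. 3.10.0-514.10.2.el7.x86_64
# $2: boot directory (hypothetical parameter, defaults to /boot)
backup_kernel() {
    kver="$1"
    boot="${2:-/boot}"
    # Split "3.10.0-514.10.2.el7.x86_64" into "3.10.0-514.10.2"
    # and ".el7.x86_64", then put ".bak1" in between.
    base="${kver%%.el*}"
    rest="${kver#"$base"}"
    bak="${base}.bak1${rest}"
    cp -p "$boot/vmlinuz-$kver" "$boot/vmlinuz-$bak" &&
        cp -p "$boot/initramfs-$kver.img" "$boot/initramfs-$bak.img"
}

# Example (as root): back up the currently running kernel.
# backup_kernel "$(uname -r)"
```

The function returns a non-zero status if either file is missing, so it can be chained in front of an update command.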

When updating the system we exclude the kernel:

# yum update --exclude='kernel*' -y

We confirm the bug. The ZFS module is built for a kernel version other than the installed one:

[root@localhost ~]# find /lib/modules/  | grep zfs
/lib/modules/3.10.0-514.26.2.el7.x86_64/extra/zfs
/lib/modules/3.10.0-514.26.2.el7.x86_64/extra/zfs/avl
/lib/modules/3.10.0-514.26.2.el7.x86_64/extra/zfs/avl/zavl.ko
/lib/modules/3.10.0-514.26.2.el7.x86_64/extra/zfs/nvpair
/lib/modules/3.10.0-514.26.2.el7.x86_64/extra/zfs/nvpair/znvpair.ko
/lib/modules/3.10.0-514.26.2.el7.x86_64/extra/zfs/unicode
/lib/modules/3.10.0-514.26.2.el7.x86_64/extra/zfs/unicode/zunicode.ko
/lib/modules/3.10.0-514.26.2.el7.x86_64/extra/zfs/icp
/lib/modules/3.10.0-514.26.2.el7.x86_64/extra/zfs/icp/icp.ko
/lib/modules/3.10.0-514.26.2.el7.x86_64/extra/zfs/zfs
/lib/modules/3.10.0-514.26.2.el7.x86_64/extra/zfs/zfs/zfs.ko
/lib/modules/3.10.0-514.26.2.el7.x86_64/extra/zfs/zpios
/lib/modules/3.10.0-514.26.2.el7.x86_64/extra/zfs/zpios/zpios.ko
/lib/modules/3.10.0-514.26.2.el7.x86_64/extra/zfs/zcommon
/lib/modules/3.10.0-514.26.2.el7.x86_64/extra/zfs/zcommon/zcommon.ko
[root@localhost ~]# 

We install wget and download the correct kernel:

# yum install wget
# wget https://buildlogs.centos.org/c7.1611.u/kernel/20170704132018/3.10.0-514.26.2.el7.x86_64/kernel-3.10.0-514.26.2.el7.x86_64.rpm

We install the kernel we just downloaded:

[root@localhost ~]# rpm -Uvh --oldpackage kernel-3.10.0-514.26.2.el7.x86_64.rpm
Preparing...                          ################################# [100%]
Updating / installing...
   1:kernel-3.10.0-514.26.2.el7       ################################# [ 33%]
grubby fatal error: unable to find a suitable template
Cleaning up / removing...
   2:kernel-3.10.0-514.10.2.el7       ################################# [ 67%]
   3:kernel-3.10.0-514.el7            ################################# [100%]
[root@localhost ~]# 

Before running grub2-mkconfig, we need to do this (a workaround for another bug):

[root@localhost ~]# cd /dev
[root@localhost dev]# ln -s /dev/disk/by-id/* . -i
[root@localhost dev]# cd
[root@localhost ~]# 

We make a backup of the newly installed kernel:

[root@localhost ~]# cd /boot/
[root@localhost boot]# ls
config-3.10.0-514.26.2.el7.x86_64
efi
grub
grub2
initramfs-0-rescue-263ede4315d14b8ba4b3a93abc833aea.img
initramfs-3.10.0-514.10.2.bak1.el7.x86_64.img
initramfs-3.10.0-514.26.2.el7.x86_64.img
initrd-plymouth.img
symvers-3.10.0-514.26.2.el7.x86_64.gz
System.map-3.10.0-514.26.2.el7.x86_64
vmlinuz-0-rescue-263ede4315d14b8ba4b3a93abc833aea
vmlinuz-3.10.0-514.10.2.bak1.el7.x86_64
vmlinuz-3.10.0-514.26.2.el7.x86_64
[root@localhost boot]# cp vmlinuz-3.10.0-514.26.2.el7.x86_64 vmlinuz-3.10.0-514.26.2.bak1.el7.x86_64
[root@localhost boot]# cp initramfs-3.10.0-514.26.2.el7.x86_64.img  initramfs-3.10.0-514.26.2.bak1.el7.x86_64.img
[root@localhost boot]# cd
[root@localhost ~]# 

We update the Grub menu:

[root@localhost ~]# grub2-mkconfig -o /boot/grub2/grub.cfg
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-3.10.0-514.26.2.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-514.26.2.el7.x86_64.img
Found linux image: /boot/vmlinuz-3.10.0-514.26.2.bak1.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-514.26.2.bak1.el7.x86_64.img
Found linux image: /boot/vmlinuz-3.10.0-514.10.2.bak1.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-514.10.2.bak1.el7.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-263ede4315d14b8ba4b3a93abc833aea
Found initrd image: /boot/initramfs-0-rescue-263ede4315d14b8ba4b3a93abc833aea.img
done
[root@localhost ~]# 

Rebooting the system...

# reboot

And now it should work:

[root@localhost ~]# cat /proc/version 
Linux version 3.10.0-514.26.2.el7.x86_64 (mockbuild@c1bm.rdu2.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Jul 4 13:29:22 UTC 2017
[root@localhost ~]# cat /etc/redhat-release 
CentOS Linux release 7.4.1708 (Core) 
[root@localhost ~]# 

My bug report is here: https://github.com/zfsonlinux/zfs/issues/6834.


* * *

Comment from Reddit:

Just boot into a previous kernel. If you don't have one, just boot a live medium.

In some cases there is no previous kernel, and the 'live medium' does not contain the ZFS kernel module. The system may also be a VPS, with no way to run a 'live medium' with ZFS support.

This was the case with my old install (not updated since first install).

After the 'yum update', the initramfs is rebuilt (overwritten) with no zfs kernel module in it, and there is no way to boot into a previous kernel, because there is no previous kernel.
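
One way to see whether an initramfs image still contains the module is to list its contents with `lsinitrd` (shipped with dracut) and look for `zfs.ko`. The helper below is a sketch rather than a tested recipe for every setup: `listing_has_zfs` is a hypothetical name, and it only assumes that the listing prints the module's path ending in `/zfs.ko` or a compressed variant of it.

```shell
#!/bin/sh
# Read an initramfs file listing on stdin and succeed only if it
# mentions a zfs kernel module (zfs.ko, optionally compressed).
listing_has_zfs() {
    grep -q '/zfs\.ko'
}

# Example (run as root on the affected machine):
# for img in /boot/initramfs-*.img; do
#     if lsinitrd "$img" | listing_has_zfs; then
#         echo "$img: zfs module present"
#     else
#         echo "$img: zfs module MISSING"
#     fi
# done
```

Running this check before rebooting shows which GRUB entries, if any, can still mount a ZFS root.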

If your previous kernel is not the exact version (3.10.0-514.26.2.el7.x86_64), it will not boot. I tested it on my 'production' install, and it boots only with 3.10.0-514.26.2.el7.x86_64. The other initramfs images do not contain the zfs module.

The correct kernel version is not available with 'yum install kernel-3.10.0-514.26.2.el7.x86_64' (error: 'No package kernel-3.10.0-514.26.2.el7.x86_64 available.'). It can be installed only from the RPM package now.

Luckily, on my 'production' install I had run 'yum update' several times, so there was one working 'previous' kernel. Many people use their systems for months without a single 'yum update' and don't have a working 'previous' kernel (their initramfs is overwritten after the 'yum update').

This is why we should make backups of the kernels before 'yum install'.

Update: I was wrong

When updating a CentOS system that uses kmod-zfs, please also update your zfs.repo. I had not changed the version of the repo from 7.3 to 7.4.
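
As a rough illustration of that repo change, the sketch below rewrites the release number inside a repo file and keeps a backup copy. It rests on assumptions that should be checked against your own file first: that the file is /etc/yum.repos.d/zfs.repo and that its baseurl lines contain a path segment of the form `/epel/7.3/`. The name `bump_zfs_repo` is hypothetical. Installing the zfs-release package for the new point release, if one is available, may achieve the same thing.

```shell
#!/bin/sh
# Rewrite /epel/<old>/ to /epel/<new>/ in a repo file, keeping a .bak copy.
# $1: repo file, $2: old release (e.g. 7.3), $3: new release (e.g. 7.4).
bump_zfs_repo() {
    file="$1"
    # Escape dots so "7.3" is matched literally by sed.
    old_re=$(printf '%s' "$2" | sed 's/\./\\./g')
    new="$3"
    cp -p "$file" "$file.bak" &&
        sed "s|/epel/$old_re/|/epel/$new/|g" "$file.bak" > "$file"
}

# Example (as root), assuming the default repo file location:
# bump_zfs_repo /etc/yum.repos.d/zfs.repo 7.3 7.4
# yum clean metadata
```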
