Rescue boot Proxmox VE host
We will take a look at an unfortunate scenario which appears not to be well documented, let alone intuitive to troubleshoot - a Proxmox VE host that completely fails to boot. The same approach is equally helpful in other cases, e.g. when creating a full host backup, which follows up on this guide.
Rescue not to the rescue
A natural first step when attempting to rescue a system would be to reach for the bespoke PVE ISO installer and follow exactly the menu path:
- Advanced Options > Rescue Boot
This may indeed end up booting a partially crippled system, but it is completely futile in many scenarios; e.g. on an otherwise healthy ZFS install, it can simply result in an instant error:
error: no such device: rpool
ERROR: unable to find boot disk automatically
Besides that, we do NOT want to boot the actual (potentially broken) PVE host; we want to examine it from a separate system that has all the tooling, make the necessary changes and only then reboot back into it. Similarly, if we are trying to make a solid backup, we do NOT want to be doing that on a running system - it is always safer for the system being backed up to not be in use at all, safer even than backing up a snapshot of it.
ZFS on root
We will pick the “worst case” scenario of having a ZFS install. This is because standard Debian does NOT support it out of the box, and while it would be appealing to simply boot the corresponding live system (e.g. Bookworm in the case of PVE v8), this won’t be of much help with ZFS as provided by Proxmox.
Note
That said, for any install other than ZFS, you may well go for the live Debian; after all, you will have a full system at hand to work with, without limitations, and you can always install a Proxmox package if need be.
Caution
If you got the idea of pressing on with Debian anyway and taking advantage of its own ZFS support via the contrib repository, do NOT do that. You would be using a completely different kernel with a completely incompatible ZFS module, one that will NOT help you import your ZFS pool at all. This is because Proxmox use what are essentially Ubuntu kernels with their own patches, at times reverse patches, and a ZFS version well ahead of Debian’s, potentially with cherry-picked patches specific to that one particular PVE version.
Such an attempt would likely end in an error similar to the one below:
status: The pool uses the following feature(s) not supported on this system:
com.klarasystems:vdev_zaps_v2
action: The pool cannot be imported. Access the pool on a system that supports
the required feature(s), or recreate the pool from backup.
We will therefore make use of the ISO installer, but go for the not-so-intuitive choice:
- Advanced Options > Install Proxmox VE (Terminal UI, Debug Mode)
This will throw us into a terminal which appears stuck, but is in fact ready to read input:
Debugging mode (type 'exit' or press CTRL-D to continue startup)
Which is exactly what we will do at this point - press Ctrl+D to get ourselves a root shell:
root@proxmox:/# _
This is how we get a (limited) running system that is separate from the PVE install we are (potentially) troubleshooting.
Note
We will, however, NOT proceed any further with the actual “Install” for which this option was originally designated.
Networking
This step is actually NOT necessary, but we will opt for it here as it makes us more flexible in what we can do, how we can do it (e.g. copy & paste commands or even entire scripts) and where we can store and retrieve our files (other than a local disk).
DHCP available on the network
Assuming the network provides DHCP, we might simply get an IP address with dhclient:
dhclient -v
The output will show us the actual IP assigned, but we can also check with hostname -I, which will give us exactly the one we need without looking at all the interfaces.
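For instance, a quick sanity check could look like this (the address shown is purely illustrative):
hostname -I        # prints just the assigned address(es), e.g. 10.10.10.23
ip -br address     # brief per-interface overview, if more detail is needed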
Alternative: Static network environment
These are alternative steps for when DHCP is not available or desirable. Only the most rudimentary setup is considered:
- interface eno1
- static IP of 10.10.10.10/24
- gateway on 10.10.10.1
- DNS resolution provided by 1.1.1.1
ip address flush dev eno1
ip address add 10.10.10.10/24 dev eno1
ip link set eno1 up
ip route replace default via 10.10.10.1
cat > /etc/resolv.conf << EOF
nameserver 1.1.1.1
search internal
EOF
Confirm all is well:
ip address
ip route
dig www.google.com
Tip
If you prefer the familiar /etc/network/interfaces syntax, you can create (or pick, e.g. from a backup) a snippet configuration file and use the ifup -a -i /path/to/file command. Note also that interface names might differ from how they appeared on the installed PVE, and that bridge-utils is not available during the PVE ISO installer boot.
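As a minimal sketch - assuming the same addressing as above and a hypothetical file path of /tmp/interfaces.rescue - such a snippet might look like this:
cat > /tmp/interfaces.rescue << EOF
auto eno1
iface eno1 inet static
    address 10.10.10.10
    netmask 255.255.255.0
    gateway 10.10.10.1
EOF

ifup -a -i /tmp/interfaces.rescue
Keep the snippet bridge-free, since - as noted - bridge-utils is not available in this environment.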
SSH access
We will now install an SSH server:
apt update
apt install -y openssh-server
Note
You can safely ignore error messages about unavailable enterprise repositories.
Further, we need to allow root to actually connect over SSH, which - by default - would only be possible with a key. We can either manually edit the configuration file, looking for the PermitRootLogin line to uncomment and edit accordingly, or simply append the line with:
cat >> /etc/ssh/sshd_config <<< "PermitRootLogin yes"
Time to start the SSH server:
mkdir /run/sshd
/sbin/sshd
Tip
You can check whether it is running with ps -C sshd -f.
One last thing: let’s set ourselves a password for root:
passwd
Now we can connect remotely from another machine - and use that session to make everything further down easier on us:
ssh root@10.10.10.10
Host filesystem
This is the point at which further steps depend on the task at hand. For restoring from a prior backup, this is where to start partitioning the target drive, while for making such a backup, or for general troubleshooting, the host filesystem needs to be accessible. The necessary steps depend on the installation type, but we will proceed with the ZFS on root scenario, as it is the most tricky. If you have any other setup, e.g. LVM or BTRFS, it is much easier - just follow readily available generic advice on mounting those filesystems.
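For illustration only - on a stock LVM-based PVE install, the root usually lives on a logical volume in the volume group pve (an assumption worth verifying with lvs), and mounting it under /mnt could be as simple as the sketch below, assuming the LVM tools are present in this environment:
vgscan                      # scan attached disks for volume groups
vgchange -ay                # activate any volume groups found
lvs                         # confirm the volume group and LV names
mount /dev/pve/root /mnt    # stock PVE installs name the root LV 'root' in VG 'pve'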
Import the ZFS pool
All we are after is getting access to what would ordinarily reside under the root (/) path, mounting it under a working directory such as /mnt. This is something that a regular mount command will NOT help us with in a ZFS scenario.
If we just run the obligatory zpool import now, we will be greeted with:
pool: rpool
id: 14129157511218846793
state: UNAVAIL
status: The pool was last accessed by another system.
action: The pool cannot be imported due to damaged devices or data.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-EY
config:
rpool UNAVAIL unsupported feature(s)
sda3 ONLINE
And that is correct. But a pool that has not been exported does not signify anything special beyond the fact that it has been marked by another “system” and is therefore presumed unsafe for others to manipulate. It is a mechanism to prevent the same pool from being inadvertently accessed by multiple hosts at the same time - something we do not need to worry about here.
We could use the (in)famous -f option; this would even be suggested to us if we were more explicit about the pool at hand:
zpool import -R /mnt rpool
Warning
Note that we are using the -R switch to mount our pool under the /mnt path; if we were not, we would mount it over the actual root filesystem of the current (rescue) boot. The mountpoints are inferred purely from information held by the ZFS pool itself, which we do NOT want to manipulate.
cannot import 'rpool': pool was previously in use from another system.
Last accessed by (none) (hostid=9a658c87) at Mon Jan 6 16:39:41 2025
The pool can be imported, use 'zpool import -f' to import the pool.
But we do NOT want this pool to then appear as foreign elsewhere. Instead, we want the current system to think it is the same one that was originally accessing the pool. Take a look at the hostid that is expected: 9a658c87 - we just need to write it into the binary /etc/hostid file, and there’s a tool for that:
zgenhostid -f 9a658c87
Now importing the pool will go without a glitch… well, unless it has been corrupted, but that would be for another guide.
zpool import -R /mnt rpool
There will NOT be any output on the success of the above, but you can confirm all is well with:
zpool status
Troubleshooting
What we have now is the PVE host’s original filesystem mounted under /mnt with full access to it. We can do whatever we came here for - a good example would be a complete system backup.
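Another common troubleshooting pattern is to chroot into the mounted system, e.g. to inspect its configuration or run its own tools as if booted into it. A minimal sketch, assuming the pool was imported with -R /mnt as above:
mount --bind /dev  /mnt/dev
mount --bind /proc /mnt/proc
mount --bind /sys  /mnt/sys

chroot /mnt /bin/bash
# ... inspect or fix things from within the host's own environment, then leave:
exit

umount /mnt/dev /mnt/proc /mnt/sys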
That’s it
Once done, time for a quick exit:
zfs unmount rpool
reboot -f
Tip
If you are looking to power the system off, then poweroff -f will do instead.
And there you have it - safely booting into an otherwise hard-to-troubleshoot setup, with the bespoke Proxmox kernel guaranteed to support the ZFS pool at hand.