I wonder why they use A/B root in the first place instead of a single BTRFS partition with subvolumes and snapshots.
This is standard for devices which receive firmware and OS updates non-interactively: edge devices, phones, routers, etc. It’s a simple and effective way to lessen the chance that a device bricks itself during a failed update or similar event.
One partition is the primary, the current known-good copy of the system, and the other is a failover holding a previous known-good copy. When an update is received, it isn’t applied to the current primary; it’s applied to the failover. When the system reboots, the bootloader attempts to boot the newly updated partition to see if it works. If it does, that partition is marked as the “new” known-good primary and the device boots from it from then on. If not, the device reboots into the existing primary, and the user is notified that the update failed, ideally along with an error message or a suggested course of action.
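To make the flow above concrete, here’s a rough Python sketch of the A/B logic. It’s a toy model, not how any particular bootloader or update tool does it: the `Slot` and `Device` classes and the `boots_ok` health check are all made up for illustration, and real systems usually track boot attempts with counters in bootloader state rather than a single flag.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    name: str      # e.g. "A" or "B" (hypothetical labels)
    version: str   # OS image version currently written to this slot


@dataclass
class Device:
    primary: Slot            # current known-good system
    failover: Slot           # previous known-good, also the update target
    pending: bool = False    # True when failover holds an untested update
    log: list = field(default_factory=list)

    def apply_update(self, version: str) -> None:
        # The update is written to the failover slot only;
        # the running primary is never touched.
        self.failover.version = version
        self.pending = True

    def reboot(self, boots_ok) -> Slot:
        # boots_ok is a stand-in for whatever health check the
        # bootloader/OS performs on the freshly updated slot.
        if self.pending:
            self.pending = False
            candidate = self.failover
            if boots_ok(candidate):
                # Promote: the updated slot becomes the new primary and
                # the old primary becomes the failover.
                self.primary, self.failover = candidate, self.primary
                self.log.append(f"promoted slot {candidate.name}")
            else:
                # Fall back: keep booting the existing primary.
                self.log.append(
                    f"update failed, still on slot {self.primary.name}"
                )
        return self.primary
```

The point is that a bad update can only ever damage the slot you’re not running from, so the worst case is "update didn’t take", not "device won’t boot".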
Subvolumes and such require a kernel to be loaded in order to use, so that’s why the base device partitions don’t run that way. Even if you wanted to go that way, it’s safer working at the lower levels as above when you’re dealing with deployed devices out in the world. Nobody wants a customer service disaster on their hands if devices start bricking themselves from a bad update.
Subvolumes and such require a kernel to be loaded in order to use, so that’s why the base device partitions don’t run that way.
That’s a great point I never thought about. I really wondered why they wouldn’t go with btrfs subvolumes, since they could easily btrfs send and receive subvols like they do now with whole partitions. Subvols would even have the benefit of less space needed since many files probably stay the same between updates.
My guess was that the update mechanism used doesn’t support btrfs, though after a quick search on the RAUC GitHub it might actually support it.
steamos-teardown is a great project to learn more about SteamOS, btw. https://github.com/77Z/steamos-teardown
Pretty much every Linux bootloader supports BTRFS these days.
The critical thing, though, is: what happens if your BTRFS partition gets corrupted? You’ve just lost your failover, since both your primary and failover are on the same partition.
That’s fine on a desktop system where the user can boot into a recovery image and repair the filesystem, but it’s not fine when you do a completely automated system upgrade. So for a kiosk, console, or other embedded system, the two partition setup is more reliable than a BTRFS root with subvolumes.
Possibly because of better reliability. If a filesystem breaks, all subvolumes it contains break in turn. Whereas independent filesystems will continue to run if one is corrupted.