
I've got a Linux question. But first some context.

I'm running Manjaro, on a btrfs filesystem with zstd compression activated.
Since Manjaro is an Arch-based distro, I get my software updates through pacman, whose packages are also compressed with zstd.

Ever since I switched to compressed btrfs I noticed something: pacman downloads take just as long as before, but the installation itself is basically instant.

So my question is: does my system know that, since the pacman packages use the same compression as my btrfs filesystem, it doesn't need to decompress the software packages only to recompress them again for btrfs? Does it actually skip decompression during install and just plop the files onto the disk as-is, still compressed?

I can't figure out why installs have been so fast for me lately, and this is the only explanation that makes sense to me.

@alyx did it speed up after you enabled compression, or just randomly?
@alyx it's possible that your disk is slow enough that btrfs compressing the data makes all the difference
iirc zstd is supposed to do 500MB/s/core so the (de)compression time shouldn't be noticeable (unless it's configured for really high compression ratios anyway)
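
if you want to sanity-check that number on your own CPU, zstd has a built-in benchmark mode; any largish file will do (the path here is just an example):
zstd -b3 /usr/lib/libc.so.6
it prints compression and decompression speed at that level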

@ivesen
Configuration is the default. Thing is, neither my SSD nor my CPU is that fast. Especially the CPU. I don't think the space saving alone can be big enough to make such a difference.

@alyx if you remember try doing this:
time bash -c 'sudo pacman -Syu; sync'
it might just be that pacman exits before the files are actually written to disk; it's exceedingly unlikely that what you described happens, though
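
if you also want to check whether the installed files actually ended up compressed on disk, the compsize tool (a separate package, assuming it's in your repos) reports per-file btrfs compression, e.g.:
sudo compsize /usr/bin /usr/lib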

@ivesen
Did a test before I noticed you writing this. I had about 417 updates that I had delayed. About 1200MB download, 4100MB install size. Download time was about 4 minutes. Install time for the 417 individual packages was 1 minute, with most packages going straight from 0% to 100% progress. The only ones that took a few seconds were LibreOffice and kernel stuff.

I'm used to Linux installs and updates being fast. But from my time on ext4 I'm also used to the install time being closer to the download time. Also, knowing my dingy 8-year-old CPU, I find it implausible that any compression format can decompress 1GB into 4GB, then recompress it again and write it to disk faster than it can just decompress and write to disk.

To be clear, it's not that long since I was using ext4. I was still getting packages as zstd back then too. The only variable really is the btrfs filesystem with compression.

@ivesen
If zstd decompression and recompression is basically instantaneous, and my system manages to recompress the 4100MB worth of packages back down to 1200MB when writing to disk, then given the install time of 1 minute, the drive would only have to write about 20MB/s. That does sound a lot easier for it to do than the almost 70MB/s it would need for 4100MB uncompressed. But I have my doubts that the recompression really is that fast and that the default compression ratio is that high.

@alyx @ivesen this could be a bug in btrfs. if you can find a way to reproduce it, file a bug report and @maweiwei may be able to analyze it and come up with a solution.

@roytam1 @alyx @ivesen Btrfs has its own compression-ratio detection (though it only looks at the first two pages).

Thus if you're writing zstd-compressed packages, btrfs just detects that the data can't be compressed any further and falls back to a non-compressed write.

So no big deal here.
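
(A side note: that heuristic only applies with the plain compress= mount option. If the filesystem is mounted with compress-force=zstd, btrfs will attempt compression on every write regardless. In fstab terms that's the difference between these two option strings, the :3 level suffix being optional:
compress=zstd:3
compress-force=zstd:3)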

Although recently I hit what seems to be a zstd bug, where zstd fails to decompress/compress any data (in user space only, the kernel is not affected) with an -ENOMEM error.

Thus my guess is that, in truth, nothing got updated at all.

@roytam1 @alyx @ivesen Although the recent zstd bug I hit only happens on arch32, so I'm not sure if that's the case here.

@alyx it's possible. Arch switched to zstd for their packages recently. Btrfs and zstd have mechanisms that determine if data should be (re)compressed or not

https://wiki.archlinux.org/title/btrfs#Compression
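
you can check which compression option your btrfs mounts are actually using with something like:
findmnt -t btrfs -o TARGET,OPTIONS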

@lewdthewides
>Btrfs and zstd have mechanisms that determine if data should be (re)compressed or not
That's what I would expect as a basic feature of filesystem-level compression in general; it just feels unreal that these mechanisms can somehow reach all the way up to pacman, so the system doesn't need to decompress and then recompress zstd.

I expected pacman to just do its thing as usual, decompressing the packages, and then btrfs would receive the uncompressed files and notice they can be compressed back again.

@alyx it might just be zstd's generally faster decompression that you're noticing

"zstd and xz trade blows in their compression ratio. Recompressing all packages to zstd with our options yields a total ~0.8% increase in package size on all of our packages combined, but the decompression time for all packages saw a ~1300% speedup"

@lewdthewides
When I had ext4, I was still getting my packages compressed with zstd. So it's not from that. Whatever it is, it's tied to my filesystem.

@lewdthewides
That could actually help explain it A LOT.
I'm also wondering about the deduplication going on with btrfs. How many of the actual bytes of the updated files are going to be different? I imagine even some individual files might be identical. With btrfs, I can basically take a 1TB folder of stuff, make a copy on the same disk, and it happens instantly, without even taking extra space, because btrfs is smart enough to know it's the same data, so it just points the new copied folder to the data already on disk.
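
(That instant copy is a reflink copy, which you can request explicitly on btrfs with something like:
cp -r --reflink=always bigfolder bigfolder-copy
though as far as I know pacman doesn't install files via reflinks, so that's probably separate from the install speed.)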

@alyx hard to say. All depends on how the project is structured