There’s been some Friday night kernel drama on the Linux kernel mailing list… Linus Torvalds has expressed regret over merging the Bcachefs file-system, and there has been an ensuing back-and-forth between him and the file-system maintainer.

  • DaPorkchop_ · ↑147 · 4 months ago

    bcachefs is way more flexible than btrfs on multi-device filesystems. You can group storage devices together based on performance/capacity/whatever else, and then do funky things like assigning a group of SSDs as a write-through/write-back cache for a bigger array of HDDs. You can also configure a ton of properties for individual files or directories, including the cache+main storage group, number of data replicas, compression type, and quite a bit more.

    So you could have two files in the same folder, one of them stored compressed on an array of HDDs in RAID10 and the other one stored on a different array of HDDs uncompressed in RAID5 with a write-back SSD cache, and wouldn’t have to fiddle around with multiple filesystems and bind mounts - everything can be configured by simply setting xattr values. You could even have a third file which is striped across both groups of HDDs without having to partition them up.
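A minimal sketch of the two-file example, assuming bcachefs exposes its per-file options as extended attributes under a `bcachefs.` prefix (as in `setfattr -n bcachefs.compression -v zstd FILE`). The device-group labels `hdd_a`, `hdd_b` and `ssd`, the mount point and the file names are hypothetical, and the RAID5-style (erasure-coded) layout is left out. By default it only prints the equivalent `setfattr` commands; `dry_run=False` would actually set the attributes on a bcachefs mount.

```python
import os

# Hypothetical policies for two files in the same folder. The option names
# (compression, data_replicas, *_target) are bcachefs options; exposing them as
# xattrs under a "bcachefs." prefix is an assumption of this sketch, and the
# group labels "hdd_a", "hdd_b" and "ssd" are made up.
POLICIES = {
    "archive.tar": {
        "compression": "zstd",
        "data_replicas": "2",           # two copies, RAID10-like
        "background_target": "hdd_a",
    },
    "scratch.img": {
        "compression": "none",
        "foreground_target": "ssd",     # writes land on the SSDs first...
        "background_target": "hdd_b",   # ...and are moved to the HDDs later
        "promote_target": "ssd",        # reads get cached back on the SSDs
    },
}


def xattrs_for(options: dict[str, str]) -> dict[str, bytes]:
    """Turn option names into xattr names with byte-string values."""
    return {f"bcachefs.{name}": value.encode() for name, value in options.items()}


def apply_policy(path: str, options: dict[str, str], dry_run: bool = True) -> list[str]:
    """Return equivalent setfattr commands; only touch the file if dry_run is False."""
    commands = []
    for name, value in xattrs_for(options).items():
        commands.append(f"setfattr -n {name} -v {value.decode()} {path}")
        if not dry_run:
            os.setxattr(path, name, value)  # Linux only; needs a bcachefs mount
    return commands


for filename, opts in POLICIES.items():
    for cmd in apply_policy(f"/mnt/pool/{filename}", opts):
        print(cmd)
```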

    • NeoNachtwaechter@lemmy.world · ↑23 ↓7 · 4 months ago

      two files in the same folder, one of them stored compressed on an array of HDDs in RAID10 and the other one stored on a different array […]

      Now that’s what I call serious over-engineering.

      Who in the world wants to use that?

      And does that developer maybe have some spare time? /s

      • apt_install_coffee · ↑64 · 4 months ago

        This is actually a feature that enterprise SAN solutions have had for a while. Being able to choose your level of redundancy & performance at a file level is extremely useful for minimising downtime and for not replicating ephemeral data.

        Most filesystem features are not for the average user who has their data replicated in a cloud service; they’re for businesses where this flexibility saves a lot of money.
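To make the cost argument concrete, here is a back-of-the-envelope sketch with entirely hypothetical data classes, sizes and replica counts. It compares the raw capacity needed when everything gets the same replication level against choosing replicas per class, so ephemeral data isn't copied at all.

```python
from typing import Optional

# Hypothetical workload: (size in TB, replicas wanted) for each class of data.
DATA_CLASSES = {
    "databases": (20, 3),
    "user_files": (50, 2),
    "build_cache": (30, 1),  # ephemeral: cheap to regenerate, so no extra copies
}


def raw_capacity(classes: dict[str, tuple[int, int]], uniform_replicas: Optional[int] = None) -> int:
    """Raw TB needed, using per-class replicas unless one count is forced on everything."""
    return sum(size * (uniform_replicas or replicas) for size, replicas in classes.values())


print("same replicas for everything:", raw_capacity(DATA_CLASSES, uniform_replicas=3), "TB")
print("replicas chosen per class:   ", raw_capacity(DATA_CLASSES), "TB")
```

With these made-up numbers that is 300 TB versus 190 TB of raw disk.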

        • apt_install_coffee · ↑3 · 4 months ago

          I’ll also tack on: when you use cloud storage, what do you think your stuff is stored on at the end of the day? Sure as shit not Bcachefs yet, but it’s more likely than not on some NetApp appliance for the same features that Bcachefs is developing.

      • Max-P@lemmy.max-p.me · ↑24 ↓1 · 4 months ago

        Simple example: my Steam library could be RAID0 and unencrypted, but my backups I definitely want to be RAID1, compressed, and encrypted for security. The media library doesn’t need encryption, but I might want it in RAID1 because ripping movies takes forever. I may also want to have the games on NVMe when I play them and stored on the HDDs when I’m not playing them, and my VMs on the SATA SSD array as a performance middle ground.

          • DaPorkchop_ · ↑2 · 4 months ago

            Yes, which is why these settings can also be configured per-directory as well as per-file.
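Max-P’s layout can be sketched as per-directory settings that files inherit from their nearest configured ancestor. This is a simplified model of option inheritance, not how bcachefs stores options internally; the paths and the group labels `nvme`, `sata_ssd` and `hdd` are hypothetical, and encryption is left out of the sketch.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical per-directory options; "nvme", "sata_ssd" and "hdd" are made-up group labels.
DIR_OPTIONS = {
    "/pool": {"compression": "lz4"},
    "/pool/steam": {  # single copy (RAID0-like), played from NVMe, parked on HDD
        "data_replicas": "1",
        "compression": "none",
        "promote_target": "nvme",
        "background_target": "hdd",
    },
    "/pool/backups": {"data_replicas": "2", "compression": "zstd", "background_target": "hdd"},
    "/pool/media": {"data_replicas": "2"},  # RAID1-like: ripping takes forever
    "/pool/vms": {"foreground_target": "sata_ssd", "background_target": "sata_ssd"},
}


def effective_options(path: str, dir_options: dict = DIR_OPTIONS,
                      file_options: Optional[dict] = None) -> dict:
    """Merge settings from the root down; the nearest directory (or the file itself) wins."""
    p = PurePosixPath(path)
    merged: dict = {}
    for ancestor in reversed(p.parents):  # "/", then "/pool", then deeper
        merged.update(dir_options.get(str(ancestor), {}))
    merged.update(dir_options.get(str(p), {}))  # the path may itself be a configured directory
    merged.update(file_options or {})           # per-file overrides win last
    return merged


print(effective_options("/pool/steam/game.bin"))
print(effective_options("/pool/media/film.mkv"))
```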

      • Semperverus@lemmy.world · ↑8 ↓1 · 4 months ago

        This probably meets some extreme corporate usecase where they are serving millions of customers.

        • DaPorkchop_ · ↑18 · 4 months ago

          It’s not that obscure - I had a use case a while back where I had multiple rocksdb instances running on the same machine and wanted each of them to store their WAL only on SSD storage with compression and have the main tables be stored uncompressed on an HDD array with write-through SSD cache (ideally using the same set of SSDs for cost). I eventually did it, but it required partitioning the SSDs in half, using one half for a bcache (not bcachefs) in front of the HDDs and then using the other half of the SSDs to create a compressed filesystem which I then created subdirectories on and bind mounted each into the corresponding rocksdb database.

          Yes, it works, but it’s also ugly as sin and the SSD allocation between the cache and the WAL storage is also fixed (I’d like to use as much space as possible for caching). This would be just a few simple commands using bcachefs, and would also be completely transparent once configured (no messing around with dozens of fstab entries or bind mounts).
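A sketch of how the same RocksDB setup might look on bcachefs, assuming the `ssd` and `hdd` group labels were assigned at format time and that options are set as `bcachefs.*` xattrs on directories created beforehand. Instance names and the mount point are hypothetical. Using `promote_target` for the table directories gives an SSD read cache, which only approximates the write-through cache described above; since cached data sits on the same SSDs as the WALs, there should be no fixed split between the two (as far as I understand bcachefs’s caching). On the RocksDB side, each instance would be opened on its `data` directory with the `wal_dir` option pointing at its `wal` directory, with no bind mounts involved.

```python
# Hypothetical group labels: "ssd" and "hdd" are assumed to have been assigned
# to the devices when the filesystem was formatted.
WAL_OPTS = {
    "compression": "lz4",
    "foreground_target": "ssd",
    "background_target": "ssd",  # keep the WAL on the SSDs
}
TABLE_OPTS = {
    "compression": "none",
    "foreground_target": "hdd",
    "background_target": "hdd",
    "promote_target": "ssd",     # read cache on the same SSDs, not a fixed partition
}


def layout(mount: str, instances: list[str]) -> list[tuple[str, dict[str, str]]]:
    """One WAL directory and one table directory per RocksDB instance."""
    plan = []
    for name in instances:
        base = f"{mount}/{name}"
        plan.append((f"{base}/wal", WAL_OPTS))
        plan.append((f"{base}/data", TABLE_OPTS))
    return plan


def setfattr_commands(plan: list[tuple[str, dict[str, str]]]) -> list[str]:
    """Flatten the plan into setfattr invocations (assumed bcachefs.* xattrs)."""
    return [
        f"setfattr -n bcachefs.{key} -v {value} {directory}"
        for directory, opts in plan
        for key, value in opts.items()
    ]


for cmd in setfattr_commands(layout("/mnt/pool", ["orders", "sessions"])):
    print(cmd)
```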

          • MrSpArkle@lemmy.ca · ↑2 · 4 months ago

            Is there a reason for bind mounting and not just configuring the db to point at a different path?

        • pimeys@lemmy.nauk.io · ↑1 · 4 months ago

          I mean… If you have a ton of raw photos in one directory, you can enable zstd at its highest compression level for that directory, while every other directory uses lz4 for the fastest compression. Your pics take much less space, but that directory will be slower to read and write.
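To avoid depending on third-party lz4/zstd packages, this sketch swaps in Python’s built-in zlib at levels 1, 6 and 9 to show the same kind of trade-off: higher levels usually buy a smaller output at the cost of more CPU time. Absolute numbers depend entirely on the data and the machine; the sample input here is synthetic.

```python
import time
import zlib


def compare_levels(data: bytes, levels: tuple[int, ...] = (1, 6, 9)) -> dict[int, tuple[int, float]]:
    """Compress `data` at each zlib level; return {level: (size_bytes, seconds)}."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        packed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        if zlib.decompress(packed) != data:  # sanity check: compression is lossless
            raise ValueError("round trip failed")
        results[level] = (len(packed), elapsed)
    return results


# Synthetic stand-in for sensor data: 50,000 little-endian 16-bit counters.
sample = b"".join(i.to_bytes(2, "little") for i in range(50_000))
for level, (size, secs) in compare_levels(sample).items():
    print(f"level {level}: {size} bytes ({size / len(sample):.1%}) in {secs * 1000:.2f} ms")
```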

          • bastion@feddit.nl · ↑11 ↓4 · 4 months ago

            Do your own research, that’s a pretty well-discussed topic, particularly as concerns ZFS.

            • ryannathans@aussie.zone · ↑2 ↓10 · 4 months ago

              I’m all over ZFS and I am not aware of any unresolved “licence issues”. It’s like a decade old at this point

              • apt_install_coffee · ↑3 · 4 months ago

                License incompatibility is one big reason OpenZFS is not in-tree for Linux, there is plenty of public discussion about this online.

                  • apt_install_coffee · ↑1 · 4 months ago

                    Yes, but note that neither the Linux Foundation nor OpenZFS is going to put itself at legal risk on the word of a Stack Exchange comment, no matter who it’s from. Even if their legal teams all have no issue, Oracle has a reputation for being litigious, and the fact that Oracle hasn’t resolved the issue once and for all, despite being able to, suggests it is keeping the possibility of litigation in its back pocket (regardless of whether such a case would have merit).

                    Canonical has said they don’t think there is an issue and put their money where their mouth was, but they are one of very few to do so.

          • wewbull@feddit.uk · ↑3 · 4 months ago

            Not under a license which prohibits also licensing under the GPL. i.e. it has no conditions beyond what the GPL specifies.

      • Max-P@lemmy.max-p.me · ↑8 ↓1 · 4 months ago

        ZFS doesn’t support tiered storage at all. Bcachefs is capable of promoting and demoting files to faster but smaller or slower but larger storage. It’s not just a cache. On ZFS the only option is really multiple zpools. Like you can sort of do that with the persistent L2ARC now but TBs of L2ARC is super wasteful and your data has to fully fit the pool.

        Tiered storage is great for VMs and games and other large files. Play a game, promote to NVMe for fast loadings. Done playing, it gets moved to the HDDs.
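A toy model of the promote/demote behaviour described above: each file lives on exactly one tier, reading it moves it to the fastest tier, and the least recently used files get pushed down a tier when a tier fills up. It follows the comment’s description rather than bcachefs’s actual data-movement logic; tier names and sizes are hypothetical.

```python
from collections import OrderedDict
from typing import Optional


class TieredStore:
    """Toy tiering model (not bcachefs's real algorithm). Each file sits on one tier;
    reads promote it to the fastest tier, and LRU files are demoted when a tier is full."""

    def __init__(self, tiers: list[tuple[str, int]]):
        # Tiers ordered fastest first, e.g. [("nvme", 100), ("hdd", 1000)], sizes in GB.
        self.names = [name for name, _ in tiers]
        self.capacity = dict(tiers)
        # Per tier: file name -> size, least recently used first.
        self.contents = {name: OrderedDict() for name in self.names}

    def used(self, tier: str) -> int:
        return sum(self.contents[tier].values())

    def where(self, name: str) -> Optional[str]:
        for tier in self.names:
            if name in self.contents[tier]:
                return tier
        return None

    def _place(self, name: str, size: int, level: int) -> None:
        tier = self.names[level]
        files = self.contents[tier]
        while self.used(tier) + size > self.capacity[tier]:
            if level + 1 == len(self.names) or not files:
                raise OSError(f"no room for {name} on {tier}")
            victim, victim_size = next(iter(files.items()))  # least recently used
            self._place(victim, victim_size, level + 1)      # demote it one tier down
            del files[victim]                                # only after the move succeeded
        files[name] = size

    def write(self, name: str, size: int) -> None:
        """New data starts on the slowest (largest) tier in this model."""
        self._place(name, size, len(self.names) - 1)

    def read(self, name: str) -> None:
        """Accessing a file promotes it to the fastest tier (or refreshes it there)."""
        tier = self.where(name)
        if tier is None:
            raise FileNotFoundError(name)
        if tier == self.names[0]:
            self.contents[tier].move_to_end(name)  # mark as most recently used
            return
        size = self.contents[tier][name]
        self._place(name, size, 0)     # may raise OSError if it cannot fit
        del self.contents[tier][name]  # drop the slow-tier copy once promoted


store = TieredStore([("nvme", 100), ("hdd", 1000)])
store.write("game_a", 60)
store.write("game_b", 60)
store.read("game_a")  # playing game A promotes it to NVMe
store.read("game_b")  # game B needs the space, so game A is demoted to HDD
print(store.where("game_a"), store.where("game_b"))  # hdd nvme
```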

        • ryannathans@aussie.zone · ↑4 ↓1 · 4 months ago

          You’re misrepresenting L2ARC and it’s a silly comparison to claim to need TBs of L2ARC and then also say you’d copy the game to nvme just to play it on bcachefs. That’s what ARC does. RAM and SSD caching of the data in use with tiered heuristics.

          • Max-P@lemmy.max-p.me · ↑4 · 4 months ago

            I know, that was an example of why it doesn’t work on ZFS. That would be the closest you can get with regular ZFS, and as we both pointed out, it makes no sense, it doesn’t work. The L2ARC is a cache, you can’t store files in it.

            The whole point of bcachefs is tiering. You can give it a 4 TB NVMe, a 4 TB SATA SSD and an 8 TB HDD and get almost the whole 16 TB of usable space in one big filesystem. It’ll shuffle the files around for you to keep the hot data set on the fastest drive. You can pin the data to the storage medium that matches the performance needs of the workload. The roadmap claims they want to analyze usage patterns and automatically store the files on the slowest drive that doesn’t bottleneck the workload. The point is, unlike regular bcache or the ZFS ARC, it’s not just a cache, it’s also storage space available to the user.

            You wouldn’t copy the game to another drive yourself directly. You’d request the filesystem to promote it to the fast drive. It’s all the same filesystem, completely transparent.
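The capacity arithmetic from the example, as a sketch that ignores metadata, reserved space and any replication beyond a simple divisor: with tiering every device adds usable space, while a pure cache (such as an L2ARC) adds none.

```python
TB = 10**12

# The example from the comment: one NVMe, one SATA SSD and one HDD.
devices = {"nvme": 4 * TB, "sata_ssd": 4 * TB, "hdd": 8 * TB}


def usable_tiered(devs: dict[str, int], replicas: int = 1) -> int:
    """Every device contributes capacity; metadata and reserved space are ignored."""
    return sum(devs.values()) // replicas


def usable_cache_only(devs: dict[str, int], cache_devices: set[str]) -> int:
    """Cache devices only hold copies of data, so they add no user-visible space."""
    return sum(size for name, size in devs.items() if name not in cache_devices)


print(usable_tiered(devices) / TB)                            # 16.0
print(usable_cache_only(devices, {"nvme", "sata_ssd"}) / TB)  # 8.0
```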

              • apt_install_coffee · ↑2 · 4 months ago

                Brand new anything will not show up with amazing performance, because the focus is on correctness first and features second.

                Premature optimisation could kill a project’s maintainability; wait a few years. Even then, despite Kent’s optimism, I’m not certain we’ll see performance beating a good non-CoW filesystem; XFS and EXT4 have been eking out performance for many years.

                  • apt_install_coffee · ↑1 · 4 months ago

                    A rather overly simplistic view of filesystem design.

                    More complex data structures are harder to optimise for pretty much all operations, but I’d suggest the overwhelmingly most important metric for performance is development time.