What's an elegant way of automatically backing up the contents of a large drive to multiple smaller drives that add up to the capacity of the large drive?

HiddenLayer5 · 1 year ago

What's an elegant way of automatically backing up the contents of a large drive to multiple smaller drives that add up to the capacity of the large drive?

GnomeComedy@beehaw.org · 1 year ago

Don’t become so concerned with if you could, that you overlook if you should.

I would buy a larger drive.

HiddenLayer5 · edit-2 1 year ago

That would probably be the most elegant solution overall and I appreciate the suggestion, but a new drive costs money that I don’t currently have an abundance of and I already have empty drives that aren’t being used, which I had accumulated over time and had already paid for ages ago. If I’m being honest, the reason I want to do it this way is because I don’t really see the value of using a brand new drive for an offline backup of personal data where the drive will be plugged in at best once a month before being stored in a drawer. If I buy a brand new drive I’d rather actually use it as part of the active storage in my server and keep it running to get the most utility out of it.

iwasgodonce@lemmy.world · 1 year ago

https://www.gnu.org/software/tar/manual/html_section/Using-Multiple-Tapes.html

Might do kind of what you want.

Molecule5076@lemmy.world · 1 year ago

Something like mergerfs? I think this is what Unraid uses if I remember right.

https://github.com/trapexit/mergerfs

rambos@lemmy.world · edit-2 1 year ago

If OP can’t use more than one disk at once, how can they benefit from mergerfs?

Molecule5076@lemmy.world · 1 year ago

Yeah you’re right. Scratch that then

HiddenLayer5 · 1 year ago

Thank you!

ricecake@sh.itjust.works · 1 year ago

https://www.gnu.org/software/tar/manual/html_node/Multi_002dVolume-Archives.html

You might end up splitting files across drives, but I don’t think you’re likely to find a more “out of the box” solution. You might combine it with the compression flags to make sure things fit, and don’t forget to number your drives!

HiddenLayer5 · 1 year ago

Thank you!

AbidanYre@lemmy.world · edit-2 1 year ago

Git annex can do that and keep track of which drive the files are on.

https://git-annex.branchable.com/

Squid@leminal.space · edit-2 1 year ago

You’ll have to ask how important this data is. Then, before you start, run a drive diagnostic tool to check that all the drives are functioning as expected. I’d suggest moving whole directories as opposed to chopping anything up, so that you maintain some form of redundancy if a drive were to fail. It’ll be a long process. Hope it goes well.

Resync is a handy tool

FigMcLargeHuge@sh.itjust.works · 1 year ago

It’s going to take a little work, but I have a large drive on my Plex and a couple of smaller drives that I back everything up to. On the large drive, get a list of the main folders. You can run “du -h --max-depth=1 | sort -hk1” in the root folder to get an idea of how you should split them up. Once you have an idea, make two files, each with its own list of folders (e.g. folders1.out and folders2.out) that you want to go to each separate drive. If you have both of the smaller drives mounted, just execute both rsync commands; otherwise, run each rsync command with the corresponding drive mounted. Here’s an example of my rsync commands. Keep in mind I am going from an ext4 filesystem to a couple of NTFS drives, which is why I use “--size-only”. Make sure to do a dry run or two, and you may or may not want the “--delete” option in there. Since I don’t want to keep files I have deleted from my Plex, I have it delete them on the target drive also.

sudo rsync -rhi --delete --size-only --progress --stats --files-from=/home/plex/src/folders1.out /media/plex/maindrive /media/plex/4tbbackup

sudo rsync -rhi --delete --size-only --progress --stats --files-from=/home/plex/src/folders2.out /media/plex/maindrive /media/plex/other4tbdrive
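As a rough sketch of the "get a list of the main folders and their sizes" step, something like the following Python could stand in for the du command. The function names and the throwaway demo tree are made up for illustration, and it sums apparent file sizes rather than disk blocks, so the numbers will differ slightly from what du reports.

```python
import os
import tempfile


def top_level_sizes(root):
    """Return [(folder_name, total_bytes)] for each top-level folder, smallest first."""
    sizes = []
    for entry in os.scandir(root):
        if not entry.is_dir(follow_symlinks=False):
            continue  # only top-level folders, like du --max-depth=1
        total = 0
        for dirpath, _dirnames, filenames in os.walk(entry.path):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if not os.path.islink(path):  # skip symlinks to avoid double counting
                    total += os.path.getsize(path)
        sizes.append((entry.name, total))
    return sorted(sizes, key=lambda item: item[1])


def demo_sizes():
    """Build a small temporary tree (hypothetical folder names) and measure it."""
    with tempfile.TemporaryDirectory() as root:
        for folder, nbytes in (("movies", 3000), ("music", 1000)):
            os.makedirs(os.path.join(root, folder, "sub"))
            with open(os.path.join(root, folder, "sub", "data.bin"), "wb") as fh:
                fh.write(b"\0" * nbytes)
        return top_level_sizes(root)


if __name__ == "__main__":
    for name, size in demo_sizes():
        print(f"{size:>10}  {name}")
```

Pointing top_level_sizes at the real root folder (for example /media/plex/maindrive from the commands above) gives the numbers needed to decide which folders go into folders1.out and which into folders2.out.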

jayrhacker@kbin.social · 1 year ago

ZFS will let you set up a RAID-like set of small volumes which mirror one larger volume. It takes some setup, but that’s the most “elegant” solution, in that once it’s configured you only need to touch it when you add a volume to the system, and it’s just a mounted filesystem that you use.

Does not solve the off-site problem, one fire and it’s all gone.

lemmyvore@feddit.nl · 1 year ago

It would also require all the secondary drives to be connected at all times, wouldn’t it?

flux · 1 year ago

I just noticed /u/giloronfoo@beehaw.org had proposed the same, but here’s the same idea with more words ;).

I would propose you try to split the data you have manually into logically separate parts, so that you could fit 0.8 TB on one drive, 0.4 TB on another, and maybe sets of 0.2 TB + 0.2 TB on a third one. Then you’d have a script that uses a modern backup app to back up the particular data set for the disk you have attached to the system. This approach lets you painlessly use modern “infinite increments” backups, where you persist older versions of data without doing full and incremental backups separately. You should then write a script to ensure no important data is forgotten and that there are no overlapping backups (except for data you want to back up twice?).
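A minimal sketch of that last checking script might look like this in Python. The drive labels, paths, and function names are invented for the example; it only compares path strings and does not touch the disk.

```python
from pathlib import PurePosixPath


def is_within(path, parent):
    """True if path equals parent or lies somewhere below it."""
    p, q = PurePosixPath(path), PurePosixPath(parent)
    return p == q or q in p.parents


def check_backup_sets(important, backup_sets):
    """important: list of paths that must be backed up.
    backup_sets: dict mapping a drive label to the list of paths it backs up.
    Returns (uncovered, overlaps)."""
    assigned = [(label, path) for label, paths in backup_sets.items() for path in paths]
    # Anything important that no backup set contains
    uncovered = [d for d in important if not any(is_within(d, p) for _, p in assigned)]
    # Any two assigned paths where one contains the other
    overlaps = []
    for i, (label_a, path_a) in enumerate(assigned):
        for label_b, path_b in assigned[i + 1:]:
            if is_within(path_a, path_b) or is_within(path_b, path_a):
                overlaps.append((label_a, path_a, label_b, path_b))
    return uncovered, overlaps


# Hypothetical layout: one drive labelled "photos and music", one labelled "misc"
important = ["/home/me/Photos", "/home/me/Music", "/home/me/Documents"]
backup_sets = {
    "photos and music": ["/home/me/Photos", "/home/me/Music"],
    "misc": ["/home/me/Music/live"],
}
uncovered, overlaps = check_backup_sets(important, backup_sets)
print("not backed up:", uncovered)
print("backed up twice:", overlaps)
```

Here the check would flag Documents as forgotten and Music/live as overlapping with Music, which you could then either fix or accept as an intentional double backup.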

For example, you could have a physical drive with sticker “photos and music” on it to back up your ~/Photos and ~/Music.

At some point some of those splits might become too large to fit into its allocated storage, which would be additional manual maintenance. Apply foresight to avoid these situations :).

If that kind of separation is not possible, then I guess tar+multi volume splitting is one option, as suggested elsewhere.

HiddenLayer5 · edit-2 1 year ago

That is actually what I’m currently doing; in fact, my file server is already organized in this way. But I personally don’t like it for offline backups, because it still forces me to play digital Tetris and work out which directories will fit on which drive. There is also the issue that some of my directories, particularly the one containing all the lossless files from my (hobby) photography work, are getting close to growing larger than 1 TB at this point (I do a ton of urban and industrial photography and I honestly might have most of the interesting parts of my city documented by now, plus different versions of the same scene with different settings, which is how I ended up with so much data). Though I suppose I can just split it into separate years instead of one huge directory. I’m personally hoping for something that can automate this process so I don’t have to consciously keep track of it as much (I don’t trust my brain sometimes). I’m currently experimenting with some of the suggested solutions; maybe I’ll find one that works better, and if not I’ll stick to the method you mentioned. Thank you for the suggestion though!

Sina@beehaw.org · 1 year ago

This really is not a good idea for a backup.

giloronfoo@beehaw.org · 1 year ago

I would do it by manually splitting it up into sets and writing scripts to back up each of those sets. Then you only have to figure out the split once.

I wonder if rsync has an option to do what you are asking for?

It also sounds like the kind of thing the old tape backup software would do. Maybe look into something that can pretend the drives are tapes.

captcha [any]@hexbear.net · 1 year ago

I’m going to say that doesn’t exist, and restoring from it would be a nightmare. You could cobble together a shell or Python script that does that, though.

You’re better off just getting a drive bay and plugging all the drives in at once as an LVM.

You could also do the opposite, which is to split the 4TB into different logical volumes, each the same size as a drive.

lemmyvore@feddit.nl · 1 year ago

It wouldn’t be so complicated to restore as long as they keep full paths and don’t split up subdirectories. But yeah, sounds like they’d need a custom tool to examine their dirs and solve a series of knapsack problems.
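For what it’s worth, the "custom tool" part does not have to be an exact knapsack solver. A simple greedy heuristic (first-fit decreasing, which is a swapped-in approximation and not something anyone in the thread proposed) is often good enough. A sketch in Python, with made-up directory names and sizes in GB:

```python
def pack_first_fit_decreasing(dir_sizes, drive_capacities):
    """Assign whole directories to drives without splitting any directory.

    dir_sizes: dict mapping directory name to size.
    drive_capacities: dict mapping drive label to capacity (same unit).
    Returns (assignment, unplaced): assignment maps each drive label to the
    directories placed on it; unplaced lists directories that fit nowhere.
    """
    free = dict(drive_capacities)
    assignment = {drive: [] for drive in drive_capacities}
    unplaced = []
    # Place the largest directories first, each on the first drive with room
    for name, size in sorted(dir_sizes.items(), key=lambda kv: kv[1], reverse=True):
        for drive in free:
            if size <= free[drive]:
                assignment[drive].append(name)
                free[drive] -= size
                break
        else:
            unplaced.append(name)  # too big for the remaining space on any drive
    return assignment, unplaced


# Hypothetical sizes and capacities, all in GB
dir_sizes = {"photos": 900, "video": 700, "music": 300, "docs": 100, "vm": 1200}
drives = {"A": 1000, "B": 1000, "C": 500}
assignment, unplaced = pack_first_fit_decreasing(dir_sizes, drives)
print(assignment)
print("does not fit anywhere:", unplaced)
```

Anything that ends up in the unplaced list is a directory that would have to be split by hand (by year, say) before it can go on these drives.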

Deckweiss@lemmy.world · edit-2 1 year ago

If you are lucky enough, borgbackup could deduplicate and compress the data enough to fit on a 1 TB drive. It depends on the content of course, but its deduplication and compression are really insanely efficient for certain cases. (I have 3 devices with ~900 GB each, so just shy of 3 TB in total, which all get stored in a ~400 GB borgbackup repository.)

restlessyet@discuss.tchncs.de · 1 year ago

I ran into the same problem some months ago when my cloud backups stopped being financially viable and I decided to recycle my old drives. For offline backups mergerfs will not work, as far as I understand. Creating tar archives of 130TB+ also doesn’t sound like a good option. Some of the tape backup solutions looked to be possible options, but are often complex and use special archive formats…

I ended up writing my own solution in Python using JSON state files. It’s complete enough to run the backup, but otherwise very work-in-progress with no restore at all. So I do not want to publish it.
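Since that script is unpublished, the following is only a guess at what a JSON state file for this kind of backup could look like, not the actual implementation. The field names, drive label, and file paths are all hypothetical. The idea is to remember which drive each file went to, plus its size and modification time, so a later run can skip unchanged files.

```python
import json
import os
import tempfile


def load_state(state_path):
    """Read the state file, or start empty if it does not exist yet."""
    if not os.path.exists(state_path):
        return {}
    with open(state_path, "r", encoding="utf-8") as fh:
        return json.load(fh)


def save_state(state_path, state):
    """Write to a temporary file first, then swap it into place."""
    tmp_path = state_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as fh:
        json.dump(state, fh, indent=2, sort_keys=True)
    os.replace(tmp_path, state_path)


def record_copy(state, rel_path, drive_label, size, mtime):
    """Remember which drive holds this file and what it looked like."""
    state[rel_path] = {"drive": drive_label, "size": size, "mtime": mtime}


def needs_backup(state, rel_path, size, mtime):
    """A file needs copying if it is new or its size or mtime changed."""
    entry = state.get(rel_path)
    return entry is None or entry["size"] != size or entry["mtime"] != mtime


def demo_state():
    """Round-trip one hypothetical entry through a temporary state file."""
    with tempfile.TemporaryDirectory() as tmpdir:
        state_path = os.path.join(tmpdir, "backup-state.json")
        state = load_state(state_path)
        record_copy(state, "Photos/2023/a.raw", "drive-01", 1024, 1700000000)
        save_state(state_path, state)
        reloaded = load_state(state_path)
        return (
            reloaded["Photos/2023/a.raw"]["drive"],
            needs_backup(reloaded, "Photos/2023/a.raw", 1024, 1700000000),
            needs_backup(reloaded, "Photos/2023/a.raw", 2048, 1700000500),
            needs_backup(reloaded, "Photos/2023/b.raw", 10, 1),
        )


if __name__ == "__main__":
    print(demo_state())
```

The same file also answers the restore question of "which drive do I need to plug in for this file", which is the part that tends to be painful with a hand-rolled scheme.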

If you find a suitable solution I am also very interested 😅

retrieval4558@mander.xyz · 1 year ago

Probably not the answer you’re looking for but I’d probably build a dedicated nas.