I work in a niche inside a niche. I deal with terabytes of storage, massive servers, a variety of storage tech, and I’ve been interested in computers in general for… around 40 years. (Yeah, I’m old.)
I have my own single-person company, have worked in 40+ US states, and have done assignments in the UK and Norway.
AMA.
Archive.org, one of my favourite websites, has been targeted by litigation.
What do you feel will happen to this website in the years and decades ahead? And what will happen to digital archiving should the site be eliminated from the web?
Do you see a movement for more works in the public domain? Many museums and NASA are sharing much of their collections, but media at large does not have reliable repositories for many important works, even niche ones.
Again, that’s a little outside my realm of expertise, because I’m archiving digital records for companies and organizations – so there’s no question about ownership or copyright, it’s about legal and regulatory compliance (how long you’re allowed to keep a document, etc.).
Personally speaking, I profoundly dislike the idea that works are protected for longer than a human lifetime. It’s hard for society and technology to progress if ideas are locked up by copyright, patents, trademarks, and other intellectual property laws. There are patents that have been registered for which no product has been created for decades – preventing someone from trying to make an idea a tangible thing, and that’s dumb.
If you make a hit song, and you profit from 10 or 20 years of popularity, shouldn’t that be enough? Shouldn’t we encourage people to do more than coast for 20 years on one idea? Shouldn’t the public benefit from that work falling into the public domain? Anyway, we’re way off topic here. :)
What is the best filesystem for archives? BTRFS, ZFS, ReFS?
How will quantum computers affect your field?
What is the 2023 bottleneck?
IMHO, it’s permanent storage. I remember that from 2005 up to 2010, we all wanted the fastest CPU, more GHz, more cores, etc., and the industry gave us that. Then during the last decade, we all wanted a more powerful GPU, more cores, more memory, more MHz. We also craved faster internet connections; now we have optical fiber with 3.5 Gb/sec at home for $120/month (impossible to imagine in 2014).
Nowadays, I have the impression that permanent storage is lagging behind. With new media being in 8K, video games averaging 100 GB and whatnot, we regularly move around dozens of GB, even as casual users.
I’m only familiar with ZFS, but only in my lab, not in production… ZFS is great because it can self-heal files / re-allocate blocks. I tried it on SMR drives, and it’s terrible, I advise against it. :)
ZFS is very good, but OFFSITE, TESTED BACKUPS are critical. There’s ‘reliable’ storage (storage that can deal with a failure) and then there’s backups. All the parity in the world won’t save your data from a fire.
In my small office, I have about 100TB of data that’s important to me, so I have a local copy, a backup in my office, and a stack of tapes at home about 1km away. Anything that affects both locations is outside my threat model, as I’ll have bigger issues.
Always be ready for the TornaDoS attack.
The irony being, just last week the city I live in had its first tornado warning in nearly 50 years.
deleted by creator
I normally don’t get into storage at that level – most of the storage solutions I use are enterprise-grade disk from Hitachi/Dell/EMC/IBM, etc. I’ve looked at Gluster, but never really tried implementing it. Ceph seems powerful, but more complicated than I’m willing to get into for my personal projects.
Once I cross a couple of terabytes, I start moving everything to tape or (more recently) cloud storage where the day-to-day management is someone else’s problem.
In your opinion, what would be the best archival format for storing family photos and videos? Not something that relies on a ZFS server running for 20-plus years, but a “hard” copy like Blu-ray M-DISC, etc.
So no suggestions? Dealing with data yourself, you must have your own go-to for storage, no?
Honestly, I’m “storage agnostic” – in my office I have hard drives, SSDs, NAS, servers with various types of RAID, Linux boxes with disks in LVM, magneto-optical platters, and various tapes.
It’s less about the media and more about the process. As I described elsewhere, I have a large NAS, an onsite copy, and an offsite copy on tapes. It’s the process of keeping offsite copies, regularly updating them, and verifying the copies that protects me, not some sticker on a box that says “100 YEAR STORAGE LIFE” from a company that might not exist next month.
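The "verifying the copies" step can be sketched in a few lines of Python. This is only an illustration, not the poster's actual tooling: the function names (`sha256_of`, `build_manifest`, `compare_copies`) are hypothetical, and it assumes both copies are mounted as ordinary directories so they can be read file by file.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_copies(primary: Path, copy: Path) -> dict[str, list[str]]:
    """Report files missing from the copy and files whose contents differ."""
    source, target = build_manifest(primary), build_manifest(copy)
    shared = source.keys() & target.keys()
    return {
        "missing": sorted(set(source) - set(target)),
        "corrupt": sorted(name for name in shared if source[name] != target[name]),
    }
```

An empty "missing" and "corrupt" list means the copy matched the primary at the time of the check; it says nothing about next month, which is why the check has to be repeated as part of the process.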
Yeah, just curious. I have heard tape is a decent option, or archival DVD. Running a server and backups is what I do now, but it is not really a way to pass on family data like you could with a photo album, especially when the family members are less tech-savvy.
Every medium is subject to failure. It’s the process that protects.
If you’re keeping something for your family, consider putting it online on a sharable cloud storage system, or using software that distributes the data to everyone’s computer (BitTorrent / Resilio Sync / Dropbox, etc.)
If you want something physical, I’d get a ‘tough’ or ‘high endurance’ USB stick or SD card, and keep updating it quarterly. Flash doesn’t have a great reputation for longevity/durability, so I’d wipe the USB stick clean with zeros then re-write everything with each update.
Thanks!!!
What kind of upcoming tech do you see coming out in the next little while that could make a big difference in your field?
The migration to cloud is a big deal. Learning about cloud storage is straightforward, but there’s a huge number of new service offerings that don’t neatly fit into the way the existing tech was built 25+ years ago. I’m “scaroused” at the idea of having to learn how all this works.
My organization is moving a bunch of on-prem stuff to the cloud over the next few years, and it’s been interesting to see how things are changing. Azure has a TON of features but is overwhelming when I look at my deployment now and where I want to get to in the future. But I will get there, one piece at a time.
Yeah, given that the software I specialize in is proprietary and built on a very limited number of supported configs (OS / DB / Storage Management) it’s unlikely to be affected by so many cloud changes, but I can see how it might enable a HUGE number of competitors to build something similar just by clicking together cloud services like a box of lego blocks.
What do you see as the biggest risk to digital archives? Is there anywhere online I can learn more? My industry utilizes archives up to 25 years and I’d like to learn more about digital archive fundamentals!
Honestly? It’s people not understanding that there’s no such thing as a perfectly reliable storage medium, and that it’s the PROCESS that keeps data safe.
Instead of saying “My RAID array has TWO hot spares”, people should be saying “I have THREE copies on TWO different media, in TWO locations, and I tested my offsite backups within the last 30 days.”
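That sentence is effectively a checklist, so it can be written down as one. The sketch below is a hypothetical illustration: the `Copy` record, the `policy_problems` function, and the media and location labels are all assumptions made for the example, and the 30-day window is taken from the comment above.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class Copy:
    media: str                         # e.g. "nas", "usb-disk", "tape"
    location: str                      # e.g. "office", "home"
    last_tested: Optional[date] = None # date of the last successful restore test


def policy_problems(copies, primary_location, today, max_test_age_days=30):
    """Return the ways a set of copies falls short of the stated rule:
    three copies, two media, two locations, offsite copy tested recently."""
    problems = []
    if len(copies) < 3:
        problems.append("fewer than three copies")
    if len({c.media for c in copies}) < 2:
        problems.append("fewer than two kinds of media")
    if len({c.location for c in copies}) < 2:
        problems.append("fewer than two locations")
    cutoff = today - timedelta(days=max_test_age_days)
    offsite = [c for c in copies if c.location != primary_location]
    if not any(c.last_tested is not None and c.last_tested >= cutoff for c in offsite):
        problems.append("no offsite copy tested recently")
    return problems
```

A RAID array with two hot spares is still a single `Copy` on one medium in one location, so it fails every check here no matter how many spares it has.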
In my world, due to the size of the archives, it’s all proprietary software… So, consider learning large enterprise IT systems/software… Operating Systems, Storage Management, Tape Library management software, database engines, etc. I realize this is all moving to the cloud now, so regardless of which software/service stack you use, understand how all the pieces fit together, and become proficient at each of them, so you can be useful regardless of where the problem is. :)
I’m another old tech, starting in 1981 on a TI-99/4A.
What’s your ring tone? Mine is a 56k modem.
One of the default ringtones that came with my phone. Sorry, it’s just a phone to me. :D
What was the biggest change over your 40 years of experience?
How did you end up in that niche? Was it a conscious decision or was it something that was thrust on you?
Follow-up question: Did you take any courses for the archiving portion of your job, or is it entirely self-taught? Any certifications or additional (formal) training?
Heh. I told my boss to fuck off after I got back from a vacation and she yelled at me because the people who were supposed to do my work couldn’t do it – because it was too technical.
I went back to my cube, cleared out my desk, and waited for security to escort me out. Three days later, my boss came to my cube and said “Go to the 11th floor and ask for Dave.” then they walked away. I was sure it was an exit interview with HR. I put my box on my desk and went downstairs where… I got a job interview in the IT department, managing their new archive.
As part of the transfer to IT, I got a week’s training in the USA, and several boxes of software manuals. Dave (my new IT mentor) said he wanted me to read all of them. He’d stop by, ask me which manual I’d read most recently, flip through it, read something, and say “Tell me about… X Y Z”, and I’d have to barf out what I’d learned about storage management or database indexes, or server OS commands or functions.
After that, everything was self-taught. I ended up buying some old decommissioned server hardware from a friend that worked at the manufacturer, borrowing the install CDs from work, and building my own server to repeatedly fuck up / learn on.
Wow, what a story. The reason I asked is because I am (hopefully) coming from the other side: librarianship, trying to get into records management and archives and eventually into digital archives.
What are your feelings on tape backups?
Most people in the industries I’ve worked in (mainly SMB, MSP… sysadmin roles) seem to think that tape is an archaic method of doing backups, and anyone using tape is living in the past.
Additionally, for archival/backup software, what’s the go-to for you? Both paid and FOSS, if you have options for both, I’d like to hear it. What makes it the go-to software?
Thanks.
Tape is awesome. Relatively inexpensive at scale, huge storage volumes, consumes almost no power compared to what it stores. But it has its time and place. That place is archival and long-term offsite backups that are very infrequently accessed. People aren’t using it for what it’s best at doing.
The backup/archive software I use for work is enterprise grade - Tivoli Storage Manager a.k.a. Spectrum Protect. In my office, I use Time Machine on the Macs, and simply ‘tar’ on Linux to back up specific important directories. Windows machines are backed up by their owners with various tools that I don’t tend to concern myself with.
For the enterprise stuff, what makes it great is that it gives you a huge amount of control and flexibility and storage options. I love the idea of TSM/SP’s ‘incremental forever’ backup methodology. It means you can roll back to any backup at any point in time, as long as you’re storing enough historical versions of the files. The device support is also amazing, and I’ve built systems that can scale to be petabytes large with it.
For my office, I just use what I know is built in and reliable. I know every Linux system has tar, and every Mac has Time Machine. For my NAS device, I make copies of it with rsync to a USB-SATA enclosure with 5 drives, usually every 90 days or so, less if I’ve made a lot of changes.
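For anyone who wants the same "tar up the important directories" idea in a script, here is a minimal sketch using Python's standard `tarfile` module rather than the `tar` command itself. The `back_up` function and the paths in the usage note are hypothetical; reading the archive back only proves it is readable and lists what it holds, it does not replace a real restore test.

```python
import tarfile
from pathlib import Path


def back_up(directories: list[Path], archive: Path) -> list[str]:
    """Write the directories into a gzip-compressed tar file, then read the
    archive back and return the sorted names of the files it contains."""
    with tarfile.open(archive, "w:gz") as tar:
        for directory in directories:
            # arcname stores each tree under its own name, not its full path.
            tar.add(directory, arcname=directory.name)
    with tarfile.open(archive, "r:gz") as tar:
        return sorted(m.name for m in tar.getmembers() if m.isfile())
```

A call might look like `back_up([Path("/home/me/projects")], Path("/mnt/backup/projects.tar.gz"))`, with both paths standing in for whatever is actually important on the machine.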
How does liability work in your niche?
From what I understand if you make a mistake, it could cost your clients irreparable damage. How are you insured for this?
Errors & Omissions insurance. It’s expensive.
Can you get in touch with me? I work in archives, in IT, and have a nasty situation I’m looking for advice on from someone with experience in exactly this. Can we dm? Not sure how that works here.
Sadly, no… My niche is so very, very small that it’s unlikely I can help your specific situation. It’s also a self-preservation thing – giving professional advice for free without contracts in place is a liability issue.
Those two reasons pretty much cancel each other out, but alright.
deleted by creator
Awesome, I appreciate the offer!