Reddit restored my posts and comments.

panchzila@lemmy.world · 1 year ago

Badland9085@lemm.ee · 1 year ago

Not too hard to defeat this solution though: put your comments through something like ChatGPT and if it can understand what you wrote, it’s probably good enough for em to restore it.

Maybe the answer is to write some nonsensical answer that’s understood by human readers as utter nonsense, but still recognized by LLMs as a “good comment”.

bauhaus · 1 year ago

it was randomly-generated letters and numbers. it would be impossible to divine what the original comment was. I then did this over and over, 10 times, so the edit history was overwritten with blocks of randomized text.

what you suggest would just spit out more garbage, or, at best, completely fake comments.

Badland9085@lemm.ee · 1 year ago

You misunderstood my comment. Reddit probably has every version of your edits, so all they need to do is to put all your past comments through ChatGPT or something, by time in descending order. The first sensible one gets accepted. In some sense, that’s just like how a person would do it. This way, they don’t have to deal with individual approaches to obfuscating or messing with their data.
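
To make the "newest sensible revision wins" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the `Revision` record, the `latest_sensible` function, and especially `looks_sensible`, which is a crude word-shape heuristic standing in for whatever language model or classifier would actually be used. Nothing here reflects how Reddit really stores or restores edits.

```python
import string
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Revision:
    """Hypothetical record of one saved version of a comment."""
    timestamp: int  # larger means more recent
    text: str


def looks_sensible(text: str, threshold: float = 0.7) -> bool:
    """Crude stand-in for an LLM/classifier: most tokens must look like words."""
    tokens = [t.strip(string.punctuation) for t in text.split()]
    tokens = [t for t in tokens if t]
    if len(tokens) < 3:
        return False
    # A token counts as word-like if it is purely alphabetic and has a vowel.
    wordlike = sum(
        1 for t in tokens
        if t.isalpha() and any(c in "aeiouAEIOU" for c in t)
    )
    return wordlike / len(tokens) >= threshold


def latest_sensible(
    revisions: Iterable[Revision],
    is_sensible: Callable[[str], bool] = looks_sensible,
) -> Optional[Revision]:
    """Walk revisions from newest to oldest and return the first sensible one."""
    for rev in sorted(revisions, key=lambda r: r.timestamp, reverse=True):
        if is_sensible(rev.text):
            return rev
    return None  # every stored revision looked like garbage


history = [
    Revision(1, "I think the new API pricing is a mistake."),
    Revision(2, "x9f3k2 q8z7p1 m4n6b0"),
    Revision(3, "7hq2 kk91z p0o3 w8e"),
]
restored = latest_sensible(history)
```

With this history, the two randomized overwrites are skipped and the original text at timestamp 1 is selected. Overwriting ten times instead of twice only adds ten cheap checks, which is the point being made above.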

I was gonna just wait till this whole fiasco dies down, let it sit for a couple of months to a year, and then slowly remove my comments over time. It’s easy to build triggers for individual users to detect attempts at mass edits or mass deletion of comments after all, which may kick off some process in their systems. Doing it the low-profile way is likely the best way to go.

bauhaus · edited · 1 year ago

the amount of cost and resources for all of that would be profound. when they’re already complaining about profitability, I doubt they’d dump huge amounts of additional funds into a project like that. they clearly have at least one level of backups, and I wouldn’t be shocked if they had 2 or 3 revision backups, but anything past that - let alone what you’re suggesting - would be too much to be a manageable cost.

Badland9085@lemm.ee · edited · 1 year ago

It’s hard to say that without knowing what their infrastructure’s like, even if we think it’s expensive. And if they built their stack with OLAP being an important part of it, I don’t see why they wouldn’t have our comment edit histories stored somewhere that’s not a backup, and maybe they just toss dated database partitions into some cheap cold storage that allows for occasional, slow reads. They’re not gonna make a backup of their entire fleet of databases for every change that happens. That would be literally insane.

Also, tracking individual edit and delete rates over time isn’t expensive at all, especially if they just keep an incremental day-by-day, maybe more or less frequent, change over time. Or, just slap a counter for edits and deletes in a cache, reset that every day, and if either one goes higher than some threshold, look into it. There are probably many ways to achieve something similar in a cheap way.
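
A rough sketch of the "counter in a cache, reset every day" idea follows. It assumes a plain in-memory `Counter` in place of a real cache, and the class name `DailyActivityMonitor`, the action names, and the threshold values are all made up for illustration.

```python
from collections import Counter
from datetime import date


class DailyActivityMonitor:
    """Counts edits and deletes per user per day and flags unusual volume."""

    def __init__(self, edit_threshold: int = 50, delete_threshold: int = 50):
        # Thresholds are arbitrary placeholders, not known Reddit values.
        self.thresholds = {"edit": edit_threshold, "delete": delete_threshold}
        self._day = None
        self._counts = Counter()

    def record(self, user_id: str, action: str, today: date) -> bool:
        """Record one action; return True if the user is now over the limit."""
        if action not in self.thresholds:
            raise ValueError(f"unknown action: {action!r}")
        if today != self._day:
            # New day: drop yesterday's counters, mimicking a daily cache reset.
            self._counts.clear()
            self._day = today
        key = (user_id, action)
        self._counts[key] += 1
        return self._counts[key] > self.thresholds[action]


monitor = DailyActivityMonitor(edit_threshold=3, delete_threshold=3)
day_one = date(2023, 7, 1)
flags = [monitor.record("some_user", "edit", day_one) for _ in range(4)]
next_day_flag = monitor.record("some_user", "edit", date(2023, 7, 2))
```

Here the fourth edit on the same day trips the flag, and the count starts over the next day. The state is one small integer per active user and action, which is why this kind of tracking is cheap compared with storing the edits themselves.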

And ChatGPT is just an example. I’m sure there already are other out-of-fashion-but-totally-usable language models or heuristics that are cheap to run and easy to use. Anything that can give a decent amount of confidence is probably good enough.

At the end of the day, the actual impact on their business from the API fiasco is limited to a subset of power users and tech enthusiasts, which is vanishingly small. I know many who still use Reddit, some begrudgingly, despite knowing the news pretty well. Why? Cause the content is already there. Restoring valuable content is important for Reddit, so I don’t see why they wouldn’t want to sink some money into ensuring that they keep what makes em future money. It’s basically an investment. There are some risks, but the chance of earning it back with returns on top of the cost is high.

bauhaus · edited · 1 year ago

what we can do is apply some common sense, however, and realize the amount of work to do this is ridiculous. and, yes, tracking the changes isn’t that complex, but tracking that many changes and storing them for tens of millions of users’ comments for 18 years IS. Then doing what you proposed with ChatGPT is beyond absurd with regards to cost, too, considering the scale of computing work required to process so many deleted comments.

so, despite how many theoreticals you propose regarding the possibility of it, the fact remains that it’s unlikely in the extreme such an effort would have been made because of the resources, time, and cost involved.

Badland9085@lemm.ee · edited · 1 year ago

Kinda don’t like how my handwavy idea is just taken for the most naive turn. I’m not even trying to give precise solutions. I’ve never worked with software at scale, and I expect the playing ground to be pretty different, but I think you’re exaggerating.

Storing all 18 years worth of data in all its iterations is ridiculous in the first place, and should never cross the mind of any dev worth their salt for more than a mere nanosecond. Cut all that data down to 3 years, 1 year, or even just a few months, and that’s probably all Reddit needs for backup and analytics. Have separate strategies for backup and analytics if needed. They’ve been doing ads and analytics stuff for a while now, so I expect them to have some architecture in place for that.

Dealing with deleted comments is easy: just unmark them for deletion (hard delete is generally not a thing). It’s most probably not in a backup. It’s just not a user-accessible feature to unmark deletion. Even if they do get deleted eventually, what’s the time frame for a cleanup like? Every day? A few months? They still need an entry for that comment for the threads feature to work, so at best, they null the content of the comment out.

ChatGPT is just an example. No need to beat a bad example to death and use that as an argument against a whole argument. And I’m pretty sure you’ve not read the rest of the last comment.

I think you’re over-estimating how much of an impact the API pricing fiasco had, and once again, you don’t seem to have read my previous comment and acknowledged that. Nobody in their right mind is going to do this comment read and scan for every single Reddit user. Not manually for sanity. Not programmatically for cost. It’s why they need some way(s) to identify which users to watch out for. They’re not going to do that manually though, right? That would be costly too, from a manpower perspective, and human labor is expensive and scales much worse than programs.

Common sense would ask: if all they did was restore their database to a certain state, how do they deal with new comments and changes that were added between the PiTR and whenever they make the restore? Are they just gone now? Isn’t that bad, cause they’re potentially losing new, quality content?
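
The soft-delete point above (unmarking a comment instead of restoring it from a backup) can be sketched like this. It is only an illustration under assumptions: the `Comment` and `CommentStore` names and the `deleted` flag are hypothetical, and Reddit's actual schema is not known here.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Comment:
    """Hypothetical comment row; the body survives a soft delete."""
    comment_id: int
    parent_id: Optional[int]  # kept so the thread tree stays intact
    body: str
    deleted: bool = False


class CommentStore:
    """In-memory stand-in for a comments table with soft deletion."""

    def __init__(self) -> None:
        self._rows: Dict[int, Comment] = {}

    def add(self, comment_id: int, parent_id: Optional[int], body: str) -> None:
        self._rows[comment_id] = Comment(comment_id, parent_id, body)

    def soft_delete(self, comment_id: int) -> None:
        # Only the flag changes; the stored text is untouched.
        self._rows[comment_id].deleted = True

    def undelete(self, comment_id: int) -> None:
        # "Restoring" is just clearing the flag, no backup needed.
        self._rows[comment_id].deleted = False

    def render(self, comment_id: int) -> str:
        row = self._rows[comment_id]
        return "[deleted]" if row.deleted else row.body


store = CommentStore()
store.add(1, None, "Top-level comment")
store.add(2, 1, "A reply that its author later deletes")
store.soft_delete(2)
shown_after_delete = store.render(2)
store.undelete(2)
shown_after_undelete = store.render(2)
```

In this model a deleted comment shows as `[deleted]` to readers while the row and its `parent_id` remain, so replies still hang off it, and flipping the flag back brings the text back. If a later cleanup job nulled out `body`, this trick would stop working, which matches the caveat about cleanup time frames.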

Look buddy, all I want to say is that I don’t think your method against Reddit would work. It’s basically a gamble though, so I’m definitely not against attempting it. I just want to point out the possibility of it not working. I don’t think there are surefire ways to counter their attempts at restoring content.

bauhaus · 1 year ago

I’m sorry you don’t like that I think you’re being ridiculous, but getting upset and doubling-down every time I say so isn’t likely to change my mind.

move on.
