Censorship bot being a pain

solrize · edited 1 year ago


Lvxferre · 1 year ago

Frankly, I also hate this sort of word filter. I fully agree with the OP here because the issue is not the specific words that you use, but what you convey through those words within a certain context. The book title is a great example of that, as “bitсh” is partially reclaimed and the author is using it to label a group that she is part of.

It’s also damn easy to circumvent this sort of filter, as I’ve exemplified above, so it’s often useless.
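
A rough sketch of why look-alike characters get past this kind of filter, in Python. The blocked word `spam` and the `naive_filter` function are made up for illustration; this is not the filter code that any instance actually runs. It also shows that standard Unicode normalization does not undo the swap, since a Cyrillic letter is a distinct character and not a compatibility variant of the Latin one.

```python
import re
import unicodedata

BLOCKED = "spam"  # hypothetical blocked word, for illustration only


def naive_filter(text: str) -> str:
    """Replace every case-insensitive occurrence of the blocked word."""
    return re.sub(re.escape(BLOCKED), "removed", text, flags=re.IGNORECASE)


plain = "this is spam"
# Same sentence, but the "a" is U+0430 CYRILLIC SMALL LETTER A,
# which looks identical to the Latin "a" in most fonts.
evasive = "this is sp\u0430m"

filtered_plain = naive_filter(plain)      # the blocked word is replaced
filtered_evasive = naive_filter(evasive)  # no match, text passes through

# NFKC normalization leaves the Cyrillic letter untouched, so it does
# not help here; a filter would need an explicit look-alike mapping.
normalized = unicodedata.normalize("NFKC", evasive)
```

Catching this would take a dedicated table of confusable characters, and that still only covers one of many ways to respell a word.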

wgbirne@feddit.de · 1 year ago

It is not a bot that censored your post but the lemmy.ml server itself. For some reason lemmy.ml (and I think lemmy.world) filter some words from posts. If you don’t like that you’ll have to change to another instance.

I am using feddit.de and I can write “chess removed” if I want.

solrize · 1 year ago

This is the support community and I’m requesting that the software be fixed.

betterdeadthanreddit@lemmy.world · 1 year ago

The lemmy.world instance doesn’t have the profanity filter enabled.

SubArcticTundra · edited 1 year ago

Do you have any idea if the blacklist/list of communities filtering profanity is publicly accessible anywhere?

amio@kbin.social · 1 year ago

Automated censoring is stupid anyway. The people interested in evading the filter will find it trivial, the people who are really so fragile they can’t handle seeing mild profanity on the Internet should probably invest in some client-side censoring solutions, or go reinvent Club Penguin or something.

Rearsays · 1 year ago

It would be nice if the filter was like an optional reaction instead of like a hard ban

SubArcticTundra · 1 year ago

deleted by creator

FuckyWucky [none/use name]@hexbear.net · 1 year ago

i have to partly disagree. there are certain words which must be censored to prevent right wingers from spamming when mods are offline such as the n-word but yea I think bi*ch is too ambiguous to be filtered.

roastpotatothief · edited 1 year ago

is there any evidence that this actually happens, or would happen?

all i ever see is humans being blocked or frustrated by the bot. i have never seen any kind of malicious spamming that could have been prevented by such a bot. spammers are normally thwarted by human mods.

the bot seems obsolete.

FuckyWucky [none/use name]@hexbear.net · edited 1 year ago

check the hexbear modlogs and you will see it: people trying to find their way around blocking. However, even such a simple deterrent would stop at least some spammers.

roastpotatothief · edited 1 year ago

here?

most of those behind were for being “reactionary” or “not an answer”. sounds more like general censorship of ideas and opinions. there was even a post banned for “bad faith arguments, downplaying severity of western settler-colonialism, and both sidesing Ukraine conflict”.

the mod logs are interesting. but i don’t see anything relevant. or maybe i don’t see how it is relevant.

FuckyWucky [none/use name]@hexbear.net · 1 year ago

there are brigades, usually of the same comments over and over. it’ll be hard to find because of the sheer number of comments being removed and there being no way to filter by ‘reason’

drop · 1 year ago

I think a good compromise would be to place them on a list to be manually reviewed by humans. I’ve seen those brigades so I think you make a great point.
The mentality of “I haven’t seen it so it must not exist” in some of the replies here is weird.

eatham 🇭🇲@aussie.zone · 1 year ago

Well obviously you’ve gotta block slurs but all common swear words should be fine

Lvxferre · 1 year ago

Not even slurs are so much of a clear case. Two reasons:

1. When the right-wingers want to vomit their hate discourses, they’re damn quick to circumvent this sort of filter.
2. In some cases, even the usage of words often considered as slurs can be legitimate. It depends on what the word conveys within a certain context; the OP provides an example but I don’t mind crafting another if anyone wants. (Or explaining the underlying mechanics.)

A third albeit weak reason (as it’s a bit of a slippery slope) would be the possibility that this creates precedent for instance admins and comm mods to say “it’s fine to filter specific words, regardless of what they’re used to say”, once something similar to automod appears. If that happens, they won’t stop at slurs, as shown in Reddit.

solrize · edited 1 year ago

Here’s another example, not from here. Before cellular phones, before television, before broadcast radio and even before the telephone, there was the telegraph. Communications with it were done in Morse code, by operators tapping away on telegraph keys. Telegraph keys were typically made of brass, and people who used them all day were called “brass pounders”. That profession is long since obsolete, but there are still ham radio enthusiasts who use Morse code as a hobby, and there is a group of them called the BPL, for “Brass Pounder’s League”. There are also people who simply try to honor the history of the venerable telegraph even though they recognize it as being a relic of a bygone era.

Anyway, where am I going. Someone started a pretty good site about telegraphy and telegraph keys, called “brasspounder.net” which was a really cool name. Unfortunately Google’s algorithm seems to have classified that name as that of a porn site, because it saw the word you get if you ignore the “br” at the beginning, leaving “ass pounder”. Whoops. The site ended up changing its name to telegraphy.net, which is fine but less evocative in my opinion. Oh well.

The above is an example of the so-called Scunthorpe problem. Let’s see if Lemmy has that too.

Lvxferre · edited 1 year ago

> Let’s see if Lemmy has that too.

I’m aremovedatty today, so why not? :^) [EDIT: yes, it has. I wrote “a bit chatty” without spaces.]

The Scunthorpe problem is an additional issue, caused by failure to identify unit (“word”) boundaries correctly. It can be solved with the current means, or at least tweaked for false negatives (e.g. don’t identify “fuckingcunt”) instead of false positives (e.g. identify “Scunthorpe”).
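
To make the boundary issue concrete, here is a minimal Python sketch comparing the two matching strategies. The one-word blocklist and both function names are assumptions made up for this example, not Lemmy's actual filter code; the test strings come from the brasspounder.net story above.

```python
import re

BLOCKLIST = ["ass"]  # hypothetical one-word blocklist, for illustration only


def censor_substring(text: str) -> str:
    """Replace the blocked string wherever it occurs, even inside words."""
    for word in BLOCKLIST:
        text = re.sub(re.escape(word), "removed", text, flags=re.IGNORECASE)
    return text


def censor_whole_words(text: str) -> str:
    """Replace the blocked string only when it stands alone as a word."""
    for word in BLOCKLIST:
        pattern = rf"\b{re.escape(word)}\b"  # \b marks a word boundary
        text = re.sub(pattern, "removed", text, flags=re.IGNORECASE)
    return text


# False positive: substring matching mangles an innocent name.
mangled = censor_substring("brasspounder.net")

# Boundary matching leaves the name alone...
kept = censor_whole_words("brasspounder.net")

# ...at the cost of a false negative when words are run together.
missed = censor_whole_words("asspounder")
```

Boundary matching trades the false positives for false negatives, and neither version knows anything about what the word means in context.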

The problem that I’m highlighting is on another level, that even LLMs have a really hard time with: that each unit can be used to convey [at least in theory] an infinite number of concepts. They usually come “bundled” with a few of them, but as we humans use them, we either add or remove some. For slurs this has the following two effects:

1. it’s possible to pick a word often used as a slur and cancel its slur value in a certain context, or even make it stop being taken as a slur by default.
2. it’s possible to pick any common word and use it as a slur.

I’ll post the example that I was thinking about. It doesn’t use a slur but it’s the same mechanism.

My cat is odd. He whimpers for food when we’re dining, chases and fetches toys, and when the doorbell rings he runs to the door, meowing nonstop. It’s like I got a really weird, meowing dog instead. My sister even walks this weird dog on a leash once in a while.

In that utterance the word “dog” is not being associated with 🐶, but with an odd example of 🐱, as the meaning of the word has been negotiated through the utterance. It’s the same deal with slurs: it’s possible to cancel their value as a slur in a certain utterance, depending on the rest of the utterance and external context. Black English speakers often do this with the “n” word* (used to convey “mate, bro, kin” among them), and slur reclamation is basically this on a higher level.

*another IMO legitimate situation is metalinguistic - using the word to refer to the word itself. I’m not using it here but I don’t see a problem with it.

solrize · 1 year ago

I don’t care much about any of these technical intricacies regarding word matching. I want Lemmy to be a human institution, which means no bots editing people’s posts beyond possible spam control. If there is a serious trolling problem featuring specific keywords in a community, I’m fine with a moderator manually kicking off some automatic action to remove a bunch of posts at the same time. But we don’t need robot nannies surveilling and messing with all of our posts.

Lvxferre · 1 year ago

I agree with you. And as I mentioned in another thread, I think that the best use for bots is to report potentially troublesome content, so humans review it and act accordingly. Because not even the best bot out there would be accountable for its actions, and when you’re dealing with people you need to be accountable for what you do.
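
A minimal Python sketch of that "report, don't edit" approach. Everything here (`WATCHLIST`, `ReviewQueue`, `check_post`) is a hypothetical name invented for the example; it is not an existing Lemmy feature or API. The point is only that the post text comes back unchanged and a human gets a note to look at it.

```python
import re
from dataclasses import dataclass, field

WATCHLIST = ["spam"]  # hypothetical watched word, for illustration only


@dataclass
class ReviewQueue:
    """Holds (post_id, matched_words) entries for human moderators."""
    items: list = field(default_factory=list)

    def flag(self, post_id: int, matched: list) -> None:
        self.items.append((post_id, matched))


def check_post(post_id: int, text: str, queue: ReviewQueue) -> str:
    """Flag the post for review if it contains a watched word.

    The text is returned exactly as written; the bot never edits it.
    """
    matched = [
        word
        for word in WATCHLIST
        if re.search(rf"\b{re.escape(word)}\b", text, flags=re.IGNORECASE)
    ]
    if matched:
        queue.flag(post_id, matched)
    return text


queue = ReviewQueue()
first = check_post(1, "buy spam now", queue)    # flagged, text untouched
second = check_post(2, "hello there", queue)    # not flagged
```

The decision to remove or keep the post stays with a moderator, who can be held accountable for it.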

Those intricacies that I mentioned are just digging a bit further into the topic. They’re partially technical, partially human (Linguistics).