New rule on Aggregators/Forwarders:

jordanlund@lemmy.world · 4 months ago

PhilipTheBucket@ponder.cat · 4 months ago

Here you go:

It’s in Python, so it can be dropped directly into the bot if the bot is also in Python. There are docs. It’s a first cut. How did you envision this working? I can make a real API if that makes things easier, but it’s not immediately obvious to me how it would get integrated.

Running it on the last 50 articles posted to /c/politics, we see:

https://lemmy.world/post/20739836: Source is unreliable since ownership change
https://lemmy.world/post/20736298: Source is unreliable for political topics since 2011
https://lemmy.world/post/20724155: Reliability consensus is mixed
https://lemmy.world/post/20723675: Source is unreliable
https://lemmy.world/post/20722912: Source is unreliable
https://lemmy.world/post/20722910: Reliability consensus is mixed
https://lemmy.world/post/20716118: Reliability consensus is mixed
https://slrpnk.net/post/14127964: Reliability consensus is mixed

It’s more complex to use this than MBFC, because there’s a lot more depth to the rankings, and sometimes human judgement is needed to assign scores. There’s a category “needinfo,” meaning it’s necessary to know what topic is being discussed or when an article was written, because of an ownership change or similar factor. I’ve applied that judgement above. That, to me, is a good thing. It means the bot is grounded in something, and not just blithely spitting out arbitrary scores without bothering to ground them in any reality.

In practice, I think it would be realistic to assign a single reliability ranking to most of the “needinfo” sources. You can manually edit the .json data to do so. Almost all of the posts are going to fit into one of Wikipedia’s categorizations or another. Newsweek is unreliable, The Guardian is reliable, and so on.
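
A minimal sketch of that manual edit, assuming a hypothetical layout for the .json data (a list of objects with `name`, `status`, and `urls` keys). The real file in the zip may be structured differently, and the sample entries and category names here are placeholders:

```python
import json

# Hypothetical schema; the real sources file may use different keys.
SAMPLE = """
[
  {"name": "Newsweek", "status": "needinfo", "urls": ["newsweek.com"]},
  {"name": "The Guardian", "status": "reliable", "urls": ["theguardian.com"]}
]
"""

# Hand-picked rankings for sources whose status depends on topic or date.
MANUAL_OVERRIDES = {"Newsweek": "unreliable"}


def apply_overrides(sources, overrides):
    """Return a copy of sources with overridden 'needinfo' entries fixed to one status."""
    result = []
    for source in sources:
        entry = dict(source)  # copy so the input list is left untouched
        if entry["status"] == "needinfo" and entry["name"] in overrides:
            entry["status"] = overrides[entry["name"]]
        result.append(entry)
    return result


sources = apply_overrides(json.loads(SAMPLE), MANUAL_OVERRIDES)
for s in sources:
    print(s["name"], "->", s["status"])
```

Keeping the overrides in a separate mapping, rather than editing the file by hand, would make it easier to reapply them after the import is rerun.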

I think most of the mixed-consensus sources can be used without a second thought. Mostly, the questions about them boil down to open partisanship of the source, which for a political community is perfectly fine as long as they’re trustable factually.

If you want me to boil this down further, so that it gives a single “yes” or “no” score to each source, I can do that and probably keep almost all of the accuracy of the rankings, now that I’ve looked at it for a little while.
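
One way that yes/no reduction might look, as a sketch only. The category names are assumptions, and treating “mixed” as acceptable follows the reasoning above that mixed-consensus sources are mostly fine for a political community:

```python
# Hypothetical category names; the real data may label them differently.
ACCEPTABLE = {"reliable", "mixed"}


def is_acceptable(status):
    """Collapse a detailed ranking into a single yes/no answer."""
    if status == "needinfo":
        # These depend on topic or date, so they need a manual ranking first.
        raise ValueError("needinfo sources need a manual ranking first")
    return status in ACCEPTABLE


for status in ("reliable", "mixed", "unreliable"):
    print(status, "->", "yes" if is_acceptable(status) else "no")
```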

When you talk about “adding” this to the bot, are you proposing to still have MBFC be the main source, with this as a footnote? A lot of the criticism of the bot is on the grounds that MBFC is a very bad source for judging reliability, so I would question the idea of keeping it on as the primary source.

NOT_RICK@lemmy.world · 4 months ago

Nice work, thanks for contributing!

Rooki@lemmy.world · 4 months ago

By “adding” I mean adding it into the field above MBFC (as I personally think Wikipedia is a little bit better for that).

new:

Wikipedia: Reliability consensus is mixed…l (whatever the scraper scrapes)
MBFC: Right-Center - Credibility: High - Factual Reporting: Mostly Factual - United States of America
Search Wikipedia about this source
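
A rough sketch of how the bot could assemble those lines with the Wikipedia entry placed above the MBFC one. The function name `build_rating_lines` and its arguments are hypothetical and not taken from the actual bot:

```python
def build_rating_lines(wikipedia_summary, mbfc_summary):
    """Build the ratings part of the bot comment, Wikipedia first."""
    lines = []
    if wikipedia_summary:  # skip the line when no Wikipedia entry was found
        lines.append(f"Wikipedia: {wikipedia_summary}")
    if mbfc_summary:
        lines.append(f"MBFC: {mbfc_summary}")
    return "\n".join(lines)


print(build_rating_lines(
    "Reliability consensus is mixed",
    "Right-Center - Credibility: High - Factual Reporting: Mostly Factual",
))
```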

I would like to implement your code into the bot myself so I can learn how you would do it. If you are willing to share your code, please send me a GitHub link (or invite me if you want it to be private between you and me), or if it’s super simple just send it in the DMs.

PhilipTheBucket@ponder.cat · edit-2 4 months ago

I already sent it. It’s here:

https://ponder.cat/wp/wp-sources.zip

Edit: You don’t need to do the import initially, since there’s already a sources file with some small modifications. The import is the only complicated part. Use categorize.py to categorize a source, or lookup.py to run a quick command-line test.

Rooki@lemmy.world · 4 months ago

OK, I will look into it, thanks. I thought it was just the sources, not the code.

Rooki@lemmy.world · 4 months ago

OK, I implemented it into the bot. It took about 1 hour and 6 minutes to fetch all the links, and I am now implementing the part where the result is inserted into the new text.

PhilipTheBucket@ponder.cat · 4 months ago

Sounds good. If you redid the import, I think you’ll want to make some manual fixes to the .json. Off the top of my head, I think you just need to add bbc.co.uk and aljazeera.com to the URL lists for those sources.
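
A sketch of that kind of fix, again assuming a hypothetical schema where each source has a `name` and a `urls` list of domains. The helpers `add_domain` and `find_source` and the sample entries are illustrative and not part of the shared code:

```python
from urllib.parse import urlparse

# Hypothetical entries; the real .json may use different keys and domains.
sources = [
    {"name": "BBC", "status": "reliable", "urls": ["bbc.com"]},
    {"name": "Al Jazeera", "status": "reliable", "urls": ["aljazeera.net"]},
]


def add_domain(sources, name, domain):
    """Append a domain to a source's URL list if it is not already there."""
    for source in sources:
        if source["name"] == name:
            if domain not in source["urls"]:
                source["urls"].append(domain)
            return True
    return False  # no source with that name


def find_source(sources, article_url):
    """Match an article URL to a source by exact domain or subdomain."""
    host = (urlparse(article_url).hostname or "").lower()
    for source in sources:
        for domain in source["urls"]:
            if host == domain or host.endswith("." + domain):
                return source
    return None


add_domain(sources, "BBC", "bbc.co.uk")
add_domain(sources, "Al Jazeera", "aljazeera.com")
match = find_source(sources, "https://www.bbc.co.uk/news/example")
print(match["name"] if match else "no match")
```

Matching on the hostname suffix, instead of a plain substring test, avoids treating an unrelated domain that merely contains the text as the same source.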