Mwmbl Search!
!mwmbl
```
[00:00.000 --> 00:04.560] We're 1.2154% towards building the best search engine in the world.
[00:04.560 --> 00:05.920] And I'll show you how I came up with that number.
[00:05.920 --> 00:08.160] Mediawiki has about 114,000 commits.
[00:08.160 --> 00:10.560] Currently we have 225 commits.
[00:10.560 --> 00:12.880] Google has an index of about 100 billion pages.
[00:12.880 --> 00:15.120] We want to crawl those pages once a month.
[00:15.120 --> 00:17.600] We want to crawl 3 billion pages a day.
[00:17.600 --> 00:19.600] And we're currently crawling 1 million.
[00:19.600 --> 00:22.320] For our offline evaluation of search ranking,
[00:22.320 --> 00:25.040] we're using NDCG to score our rankings.
[00:25.040 --> 00:28.400] We currently have about 10%, we want to get to about 80%.
[00:28.400 --> 00:30.320] We want to have 250 blog posts.
[00:30.320 --> 00:31.360] We've only got two.
[00:31.360 --> 00:32.960] We want to have 1,000 videos.
[00:32.960 --> 00:34.560] We've only got 6.
[00:34.560 --> 00:37.920] We want to have about 100,000 active volunteers each month.
[00:37.920 --> 00:39.760] That's roughly what Wikipedia has.
[00:39.760 --> 00:41.680] We've only got about 26.
[00:41.680 --> 00:44.400] We want to build our organization to about 20 employees.
[00:44.400 --> 00:45.040] We have none.
[00:45.040 --> 00:46.400] We don't even have an organization.
[00:46.400 --> 00:48.960] We want to incorporate as a non-profit.
[00:48.960 --> 00:52.000] Currently we have a total of 11,121 points.
[00:52.000 --> 00:55.920] Out of a possible maximum of 915,000.
[00:55.920 --> 00:59.920] Which means...
```

You can help by using [the browser extension](https://addons.mozilla.org/en-GB/firefox/addon/mwmbl-web-crawler/) that crawls one page each second.
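For anyone who wants to check the headline figure: it appears to be the points total quoted at the end of the transcript divided by the possible maximum. The sketch below only reproduces that last step. How the points are awarded per goal is not stated in the transcript, so the variable names here are illustrative, not taken from the project.

```python
# Figures quoted in the transcript above.
points_earned = 11_121
points_possible = 915_000

# Progress as a percentage of the possible maximum.
progress_percent = points_earned / points_possible * 100

print(f"{progress_percent:.4f}%")  # prints "1.2154%"
```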
Number of pages crawled per day:

```
         day         |  count
---------------------+---------
 2023-01-28 00:00:00 |  701195
 2023-01-27 00:00:00 |  753338
 2023-01-26 00:00:00 |  771691
 2023-01-25 00:00:00 |  823852
 2023-01-24 00:00:00 |  952735
 2023-01-23 00:00:00 | 1005805
 2023-01-22 00:00:00 | 1089965
 2023-01-21 00:00:00 | 1121781
 2023-01-20 00:00:00 | 1092852
 2023-01-19 00:00:00 | 1223518
 2023-01-18 00:00:00 |  906054
 2023-01-17 00:00:00 |  745636
 2023-01-16 00:00:00 |  692705
 2023-01-15 00:00:00 |  677468
 2023-01-14 00:00:00 | 1069739
 2023-01-13 00:00:00 | 1011536
 2023-01-12 00:00:00 |  996143
 2023-01-11 00:00:00 |  980235
 2023-01-10 00:00:00 |  896543
 2023-01-09 00:00:00 |  498350
```

We just need to multiply this by about 3,000. Totally achievable, given how early on we are in this project.
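As a rough back-of-the-envelope check on that factor, here is a small sketch using the round numbers from the video (a target of 3 billion pages a day against roughly 1 million now). The estimate of how many extensions that would take is my own assumption: it supposes each extension crawls one page per second around the clock, which real volunteers' browsers will not do, so treat it as a lower bound.

```python
import math

# Round figures from the video transcript.
target_pages_per_day = 3_000_000_000
current_pages_per_day = 1_000_000

# How much the current crawl rate would need to grow.
scale_factor = target_pages_per_day / current_pages_per_day

# ASSUMPTION: one extension crawls 1 page/second, nonstop, all day.
seconds_per_day = 24 * 60 * 60
pages_per_extension_per_day = seconds_per_day

# Lower bound on extensions needed under that assumption.
extensions_needed = math.ceil(target_pages_per_day / pages_per_extension_per_day)

print(f"scale factor: {scale_factor:,.0f}x")
print(f"extensions running nonstop: {extensions_needed:,}")
```

Under those assumptions the factor is 3,000 and the count comes out at a few tens of thousands of always-on extensions, which is at least the same order of magnitude as the 100,000 active volunteers the video aims for.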

[Poll] What features would you like to see for the search engine?
Give a single option per answer.

Exploring the Possibility of Self-Service Domain Submission for Web Crawler
After seeing the issue "[Add some sites to crawler](https://github.com/mwmbl/mwmbl/issues/79)" on GitHub, I wonder whether there is any point in posting there the domains I would like to be able to find. I'm assuming those domains currently have to be added manually, which just takes time away from implementing the main feature: a way for users to submit and approve domains themselves, without needing to bother the admin.


An open source, non-profit search engine implemented in Python.