Isn’t it possible to “find” the most valuable websites on the web with the help of a well-mixed community? I am thinking of a small browser add-on that shares the base URL of visited websites with a website scraper. The scraper could then index the whole website, including its subpages. The add-on could be installed independently by any users who would like to strengthen the network.
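
A rough sketch of the scraper side of this idea, assuming the add-on sends only the visited URL: reduce it to the site’s base URL, then crawl same-host links breadth-first. The `fetch` callback, the `FAKE_WEB` pages and the `max_pages` limit are hypothetical stand-ins so the example runs offline; a real crawler would also need actual HTTP requests, robots.txt handling and rate limiting.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional
from urllib.parse import urljoin, urlsplit, urlunsplit


def base_url(url: str) -> str:
    """Reduce a visited URL to scheme://host/, the only part the add-on would share."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, "/", "", ""))


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl_site(start: str, fetch: Callable[[str], Optional[str]],
               max_pages: int = 100) -> Dict[str, str]:
    """Breadth-first crawl of pages on the same host as `start`.

    `fetch` is injected (hypothetical) so the sketch runs offline; a real
    scraper would do an HTTP GET here and respect robots.txt and rate limits.
    """
    root = base_url(start)
    host = urlsplit(root).netloc
    seen = {root}
    queue = deque([root])
    pages: Dict[str, str] = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments so /a and /a#x count once.
            absolute = urljoin(url, href).split("#", 1)[0]
            if urlsplit(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages


# A tiny in-memory "web" standing in for real HTTP requests (hypothetical site).
FAKE_WEB = {
    "https://example.org/": '<a href="/about">About</a> <a href="https://other.net/">Ext</a>',
    "https://example.org/about": '<a href="/">Home</a> <a href="team#top">Team</a>',
    "https://example.org/team": "<p>No further links</p>",
}

if __name__ == "__main__":
    pages = crawl_site("https://example.org/blog/post?id=3", FAKE_WEB.get)
    print(sorted(pages))  # the three example.org pages; other.net is skipped
```

Restricting the crawl to one host mirrors the “index the whole website with its subpages” step; discovering other sites would still depend on what the volunteers happen to visit.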

Besides setting up our own search index, one could try to pull in search results from #Google and Bing to help with the ramp-up, similar to what #startpage and #duckduckgo do. I mean, I am no search engine expert, but is there really that much more magic to it?

  • angarabebesi
    3 years ago

    The difficult part is building an index of the web and continuously updating it. This is really expensive. Writing a search engine is not rocket science if you have a good index.

    This is why there are lots of search engines built on top of Google or Bing.
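
    To illustrate why the index, not the query code, is the hard part: once pages are crawled, a basic keyword search over an inverted index (term → set of URLs) takes only a few lines. This is a toy sketch with made-up example pages; it ignores ranking, stemming and the sheer scale that makes a real web index expensive to build and keep fresh.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    def __init__(self) -> None:
        # term -> set of URLs containing that term
        self.postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, url: str, text: str) -> None:
        for term in tokenize(text):
            self.postings[term].add(url)

    def search(self, query: str) -> List[str]:
        terms = tokenize(query)
        if not terms:
            return []
        # AND semantics: a page must contain every query term.
        # .get() avoids inserting empty entries into the defaultdict.
        results = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(results)


if __name__ == "__main__":
    index = InvertedIndex()
    index.add("https://a.example/", "Lemmy is a federated link aggregator")
    index.add("https://b.example/", "Mastodon is a federated microblogging platform")
    print(index.search("federated"))        # ['https://a.example/', 'https://b.example/']
    print(index.search("federated lemmy"))  # ['https://a.example/']
```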

  • comfy
    3 years ago

    If you want a competitive engine, you would need more content than just the websites visited by the volunteers. What if none of the volunteers were Lemmy users or used sites that link to Lemmy instances?

    How do you plan to discover tiny low-traffic sites that are still expected results? How do you plan to rank results? How do you plan to detect SEO-hacking and other spammy manipulation of your engine? How do you plan to make your service more appealing than Google/Bing/DDG/Yandex/etc.? How do you plan to host an engine that is able to crawl, process, store and search all the results, and how do you plan on paying for all of that?

    As for magic, there are extra convenience features that the biggest search engines offer, yes. Search “1 USD to EUR”, for example, or “define lemon”. Then there are also filters and settings, all kinds of things.

    Why do you ask? Is there a plan you had?
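
    Convenience features like “1 USD to EUR” or “define lemon” are commonly handled before the normal index lookup, by matching the query against known patterns. A minimal sketch of that routing step, assuming a hypothetical hard-coded rate table and a one-word dictionary (the 0.5 rate is a placeholder, not a real exchange rate):

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical placeholder data; a real engine would use live rates and a real dictionary.
HYPOTHETICAL_RATES: Dict[Tuple[str, str], float] = {("USD", "EUR"): 0.5}
TINY_DICTIONARY: Dict[str, str] = {"lemon": "a yellow citrus fruit"}

CONVERT = re.compile(r"^\s*(\d+(?:\.\d+)?)\s+([a-z]{3})\s+(?:to|in)\s+([a-z]{3})\s*$", re.I)
DEFINE = re.compile(r"^\s*define\s+(\w+)\s*$", re.I)


def instant_answer(query: str) -> Optional[str]:
    """Return a direct answer for recognised query patterns, else None."""
    m = CONVERT.match(query)
    if m:
        amount, src, dst = float(m.group(1)), m.group(2).upper(), m.group(3).upper()
        rate = HYPOTHETICAL_RATES.get((src, dst))
        if rate is not None:
            return f"{amount:g} {src} = {amount * rate:g} {dst}"
    m = DEFINE.match(query)
    if m:
        word = m.group(1).lower()
        definition = TINY_DICTIONARY.get(word)
        if definition:
            return f"{word}: {definition}"
    return None  # no instant answer: fall through to the normal index search


if __name__ == "__main__":
    print(instant_answer("1 USD to EUR"))   # 1 USD = 0.5 EUR
    print(instant_answer("define lemon"))   # lemon: a yellow citrus fruit
    print(instant_answer("cheap flights"))  # None
```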

    • matl (OP)
      3 years ago

      Yeah, you are right on many points. I do not have any special plans; I am just annoyed that with Google, Bing, and now also DDG you are always being tracked across different services on the web. While we have open and distributed networks such as Mastodon, Pixelfed and many more, we do not have a usable federated search engine. I know YaCy from a long time ago, but I think it’s still not competitive. As Searx is just a meta search engine built on top of the known engines, it is also not an alternative.

      • comfy
        3 years ago

        we do not have a usable federated search engine

        Sepia Search [link],[wikipedia] might be interesting to you, it’s for PeerTube videos across many instances.

        As Searx is just a meta search engine built on top of the known engines, it is also not an alternative.

        I’m glad you noticed, I’m annoyed by all the people who kept recommending it as an alternative to DDG when a recent complaint came out about them censoring more results!

        There might be some open-source search engines you could look into for answers to some of these questions; I’m a bit too tired right now to research that. I know Sepia Search is open source, but maybe some more general-purpose ones are too.

    • pingveno
      3 years ago

      I did a little SEO work when I was starting out in the industry (not my proudest moment). Google really was playing a cat-and-mouse game with the SEO folks. Take, for example, the approach of counting inbound links to a given page from pages with lots of text. The SEO response was to submit “articles” to a bunch of sites, with a link to a client’s site in the author’s bio. Any search engine that gained significant popularity would have to retrace Google’s anti-spam steps from the past 20+ years, likely on a shoestring budget, because it would be competing against Google’s ad revenue, which isn’t constrained by the same stringent privacy concerns.
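
      The link-counting heuristic and the guest-post trick can be shown in a few lines. This is a deliberately naive score (one vote per linking page that has at least `min_words` words), not Google’s actual algorithm; the page names, word counts and the 300-word threshold are made up for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical model of the web: page -> (word count, outbound links).
Web = Dict[str, Tuple[int, List[str]]]


def inbound_link_score(web: Web, min_words: int = 300) -> Counter:
    """Naive score: how many text-heavy pages (>= min_words words) link to each target."""
    scores: Counter = Counter()
    for source, (words, links) in web.items():
        if words < min_words:
            continue  # thin pages get no vote
        for target in set(links):  # at most one vote per source page
            if target != source:
                scores[target] += 1
    return scores


def with_guest_posts(web: Web, target: str, n: int, words: int = 400) -> Web:
    """Simulate the SEO trick: n submitted 'articles' with a bio link to target."""
    spammed = dict(web)
    for i in range(n):
        spammed[f"article-site-{i}.example/guest-post"] = (words, [target])
    return spammed


honest_web: Web = {
    "news.example/story": (800, ["good.example"]),
    "blog.example/review": (500, ["good.example"]),
    "shop.example/landing": (40, ["shop.example"]),  # too short to count
}

if __name__ == "__main__":
    print(inbound_link_score(honest_web).most_common())  # [('good.example', 2)]
    spammed = with_guest_posts(honest_web, "shop.example", 5)
    print(inbound_link_score(spammed).most_common(1))    # [('shop.example', 5)]
```

Five cheap guest posts are enough to outrank the honestly linked site, which is why any such signal ends up needing ever more spam detection layered on top.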

    • matl (OP)
      3 years ago

      Probably, yes, but maybe one could try to set up decentralized storage to spread the load, similar to the approach in YaCy, or maybe build it on IPFS if the database is based on static files.
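
      One way such a decentralized index could spread the load is to shard the inverted index by term across peers using consistent hashing, roughly in the spirit of YaCy’s distributed word index (this sketch is not YaCy’s actual algorithm). The peer names, the number of virtual nodes and the use of SHA-256 are assumptions for illustration. Each resulting shard could then be written out as a static file, which is where something like IPFS might fit, but that part is not shown.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(key: str) -> int:
    """Stable 64-bit position on the ring derived from SHA-256."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring; each peer gets `vnodes` points to even out the load."""

    def __init__(self, peers: List[str], vnodes: int = 50) -> None:
        self._ring = sorted((_hash(f"{p}#{i}"), p) for p in peers for i in range(vnodes))
        self._keys = [k for k, _ in self._ring]

    def peer_for(self, term: str) -> str:
        # First ring point clockwise from the term's hash, wrapping at the end.
        idx = bisect.bisect(self._keys, _hash(term)) % len(self._ring)
        return self._ring[idx][1]


def shard_postings(postings: Dict[str, List[str]],
                   ring: HashRing) -> Dict[str, Dict[str, List[str]]]:
    """Split an inverted index (term -> URLs) into one shard per responsible peer."""
    shards: Dict[str, Dict[str, List[str]]] = {}
    for term, urls in postings.items():
        shards.setdefault(ring.peer_for(term), {})[term] = urls
    return shards


if __name__ == "__main__":
    ring = HashRing(["peer-a", "peer-b", "peer-c"])
    postings = {"lemmy": ["https://a.example/"], "mastodon": ["https://b.example/"]}
    for peer, shard in sorted(shard_postings(postings, ring).items()):
        print(peer, sorted(shard))
```

A useful property of this design: when a peer joins, terms only move to the new peer, so the existing peers do not have to reshuffle their shards among themselves.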