Why is it so difficult to build a competitive search engine next to Google and Bing?

matl · 3 years ago

Why is it so difficult to build a competitive search engine next to Google and Bing?

angarabebesi · 3 years ago

The difficult part is building an index of the web and continuously updating it. This is really expensive. Writing a search engine is not rocket science if you have a good index.

This is why there are lots of search engines built on top of Google or Bing.
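
To illustrate the point about the index: a minimal sketch of an inverted index, assuming the pages have already been crawled and handed over as plain text. The names `tokenize`, `build_index` and `search`, as well as the example URLs, are made up for illustration. The search side stays small; the expensive part is getting and refreshing the `pages` input at web scale.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase and split into alphanumeric terms (a deliberately naive tokenizer).
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    # Map each term to the set of URLs whose text contains it.
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index, query):
    # Return the URLs that contain every term of the query (boolean AND).
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Hypothetical, already-crawled pages; a real crawler would have to supply these.
pages = {
    "https://a.example/": "Lemmy is a federated link aggregator",
    "https://b.example/": "Mastodon is a federated microblog",
}
index = build_index(pages)
print(sorted(search(index, "federated lemmy")))
```

There is no ranking, stemming or spam handling here, which is where most of the remaining work would go.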

comfy · edit-2 3 years ago

If you want a competitive engine, you would need more content than just the websites visited by the volunteers. What if none of the volunteers were Lemmy users or used sites that link to Lemmy instances?

How do you plan to discover tiny low-traffic sites that are still expected results? How do you plan to rank results? How do you plan to detect SEO-hacking and other spammy manipulation of your engine? How do you plan to make your service more appealing than Google/Bing/DDG/Yandex/etc.? How do you plan to host an engine that is able to crawl, process, store and search all the results, and how do you plan on paying for all of that?

As for magic, there are extra convenience features that the biggest search engines offer, yes. Search “1 USD to EUR” for an example, or “define lemon”. Then there are also filters and settings, all kinds of things.

Why do you ask? Is there a plan you had?

matl · 3 years ago

Yeah, you are right on many points. I do not have any special plans. I am just annoyed that with Google, Bing, and now also DDG, you are always being spied on across different services on the web. While we have open and distributed networks such as Mastodon, Pixelfed and many more, we do not have a usable federated search engine. I know YaCy from a long time ago, but I think it's still not competitive. As Searx is just a meta search engine built on top of the known engines, it is also not an alternative.

comfy · 3 years ago

we do not have a usable federated search engine

Sepia Search [link],[wikipedia] might be interesting to you, it’s for PeerTube videos across many instances.

As Searx is just a meta search engine built on top of the known engines, it is also not an alternative.

I’m glad you noticed, I’m annoyed by all the people who kept recommending it as an alternative to DDG when a recent complaint came out about them censoring more results!

There might be some open-source search engines that you could look into to answer some of these questions, I’m a bit too tired now to research that. I know Sepia Search is, but maybe some more general purpose ones are too.

pingveno · 3 years ago

I did a little SEO work when I was starting in the industry (not my proudest moment). Google really was playing a cat-and-mouse game with the SEO folks. Take the approach of counting inbound links to a given page from pages with lots of text. The SEO counter-move there is to submit “articles” to a bunch of sites and have a link to a client’s site in the author’s bio. Any search engine that gained significant popularity would have to retrace Google’s anti-spam steps from the past 20+ years, likely on a shoestring budget, because it would be competing against Google ad revenue that isn’t constrained by the same stringent privacy concerns.
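
A rough sketch of the link-counting signal described above, and of why it is easy to game. This is not Google's actual algorithm; the function name, the `min_words` threshold and the example URLs are all assumptions made for illustration.

```python
from collections import Counter

MIN_WORDS = 50  # hypothetical cutoff for "a page with lots of text"


def inbound_link_scores(pages, min_words=MIN_WORDS):
    # Count, for each target URL, how many distinct text-heavy pages link to it.
    scores = Counter()
    for url, page in pages.items():
        if len(page["text"].split()) < min_words:
            continue  # thin pages do not get to vote
        for target in set(page["links"]):  # one vote per linking page
            if target != url:  # ignore self-links
                scores[target] += 1
    return scores


# Hypothetical crawl data: two long "articles" with a bio link, one thin page.
pages = {
    "https://blog.example/post": {
        "text": "word " * 200,
        "links": ["https://client.example/"],
    },
    "https://articles.example/1": {
        "text": "word " * 300,
        "links": ["https://client.example/", "https://client.example/"],
    },
    "https://thin.example/": {
        "text": "short page",
        "links": ["https://client.example/"],
    },
}
scores = inbound_link_scores(pages)
print(scores["https://client.example/"])
```

Both long "articles" count toward the client's site, so anyone who can get filler text published with a bio link raises the score. Each such trick needs its own countermeasure.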

gladiatorchocolate · 3 years ago

Wouldn’t it require a lot of storage space?

matl · 3 years ago

Probably yes, but maybe one could set up decentralized storage to spread the load, similar to the approach in YaCy, or maybe build it on IPFS if the database is based on static files.
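
One way the "spread the load" idea could look, as a sketch only: assign each index term to a peer by hashing it, so every peer stores just a slice of the index. This is not how YaCy or IPFS actually work internally; the peer names and the function `peer_for_term` are hypothetical.

```python
import hashlib


def peer_for_term(term, peers):
    # Hash the term and map it onto one of the peers (simple modulo sharding).
    digest = hashlib.sha256(term.encode("utf-8")).digest()
    return peers[int.from_bytes(digest[:8], "big") % len(peers)]


def shard_index(index, peers):
    # Split a term -> URLs index into one partial index per peer.
    shards = {peer: {} for peer in peers}
    for term, urls in index.items():
        shards[peer_for_term(term, peers)][term] = urls
    return shards


# Hypothetical peers and a tiny index to distribute.
peers = ["peer-a", "peer-b", "peer-c"]
index = {
    "federated": {"https://a.example/", "https://b.example/"},
    "lemmy": {"https://a.example/"},
    "mastodon": {"https://b.example/"},
}
shards = shard_index(index, peers)
for peer, partial in shards.items():
    print(peer, sorted(partial))
```

With plain modulo sharding, adding or removing a peer moves most terms to a different peer. A real deployment would probably want consistent hashing plus replication, since volunteer peers come and go.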

leanleft · 3 years ago

IPFS is an interesting idea

Vostronix · 3 years ago

just check out Brave and Qwant