Kagi has quickly grown into something of a household name within tech circles. From Hacker News and Lobsters to Reddit, the search provider seems to attract near-universal praise. Whenever the topic of search engines comes up, there’s an almost ritual rush to be the first to recommend Kagi, often followed by a chorus of replies echoing the endorsement.

  • mfed1122@discuss.tchncs.de · 8 hours ago

    Hrrmm. Webrings it is. But also, the search engine problem seems like one calling out for a creative solution. I’ll try to look into it some more I guess. Maybe there’s a way that you could distribute which peer indexes which sites. I would even be fine sharing some local processing power when I browse to run a local page ranking that then gets shared with peers…maybe it could be done in a way where attributes of the page are measured by prevalence and then the relative positive or negative weighting of those attributes could be adjusted per-user.
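
    The attribute-weighting idea above can be made concrete. This is a minimal sketch, not an existing system; every attribute name and number here is hypothetical. The mechanism is just: peers share measured page attributes, and each user applies their own positive or negative weights to rank results.

```python
# Sketch of the per-user page-ranking idea: each page is reduced to a
# vector of measured attributes (names here are made up), and each user
# keeps a personal weight per attribute. A page's rank for a given user
# is the weighted sum of its attributes.

def score_page(attributes: dict[str, float], weights: dict[str, float]) -> float:
    """Rank a page for one user: sum of attribute prevalence x user weight."""
    return sum(value * weights.get(name, 0.0) for name, value in attributes.items())

# Hypothetical attribute measurements, as shared by peers:
page = {"ad_density": 0.8, "text_ratio": 0.6, "outbound_links": 0.3}

# One user's adjustable preferences (negative = penalize):
prefs = {"ad_density": -1.0, "text_ratio": 0.5, "outbound_links": 0.2}

print(round(score_page(page, prefs), 2))
```

    Because the weights live with the user, two people querying the same shared attribute data can get differently ordered results.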

    Hope it’s not annoying for me to spitball ideas in random Lemmy comments.

    • jarfil@beehaw.org · 6 hours ago

      There is an experimental distributed open source search engine: https://dawnsearch.org/

      It has a series of issues of its own, though.

      Per-user weighting was out of the reach of hardware 20 years ago… and is still out of reach for anything other than very large distributed systems. No single machine is currently capable of holding even the index for the ~200 million active websites, much less the ~800 billion webpages in the Wayback Machine. Multiple page attributes… yes, that would be great, but again things escalate quickly. The closest “hope” would be some sort of LLM on the scale of hundreds of trillions of parameters… and even that might fall short.

      Distributed indexes, with queries getting shared among peers, mean that privacy goes out the window. Homomorphic encryption could potentially help with that, but that requires even more hardware.

      TL;DR: it’s being researched, but it’s hard.

    • Ŝan@piefed.zip · 4 hours ago

      The peer index sharing is such a great idea. We should develop it.

      I have … 10,252 sites indexed in buku. It’s not full site indexing, but it’s better than just bookmarks in some arbitrary tree structure. Most are manually tagged, which I do when I add them. I figure other buku users are going to have similar-size indexes, because buku’s so fantastic for managing bookmarks. Maybe there’s a lot of overlap in our indexes, but maybe not.
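
      As a stand-in for such a tagged index (a sketch only — this is not buku’s actual schema or API), a small SQLite table shows how whole-tag queries can be answered locally:

```python
# Toy tagged-bookmark index in SQLite. Tags are stored comma-delimited
# with leading/trailing commas, so a whole-tag lookup is a simple LIKE.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE bookmarks (url TEXT, tags TEXT)")
db.executemany("INSERT INTO bookmarks VALUES (?, ?)", [
    ("https://example.org/selfhost", ",selfhosting,search,"),
    ("https://example.org/recipes", ",cooking,"),
])

def search_tag(tag: str) -> list[str]:
    # Wrap the query in commas so "sear" doesn't match "search".
    rows = db.execute("SELECT url FROM bookmarks WHERE tags LIKE ?",
                      (f"%,{tag},%",))
    return [r[0] for r in rows]

print(search_tag("search"))  # ['https://example.org/selfhost']
```

      A node answering peer queries would run essentially this lookup against its own store and return the matching URLs.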

      • We have a federation of nodes we run, backed by something like buku.
      • Our searches query our own node first, on the assumption that you’re going to be looking for something you’ve seen or bookmarked before; local-first would yield fast results.
      • Queries are concurrently sent to a subset of peer nodes, and those results are mixed in.
      • Add configurable replication to reduce fan-out. Search wider when the user pages ahead, still searching.
      • If indexing is spread out amongst the Searchiverse, and indexes are updated when peers browse sites, it might end up reducing load on servers. The big search engines crawl sites frequently to update their indexes, and don’t make use of data fetched by users browsing.
      • If the search algorithm is based on a balanced search tree, balancing by similarity, neighbors who are most likely to share interests will be queried sooner, and results will be more relevant and faster.
      • Constraining indexes to your bookmarks + some configurable slop would limit per-user big-data requirements.
      • Blocking could be easily implemented at the individual node, and would affect the results of only the individual blocker, reducing centralized power abuse. Individuals couldn’t cut nodes out of the network, but could choose not to include specific ones in searches.
      • One can imagine a peer voting mechanism where every participating node (meeting some minimum size) could cast a single vote on peer quality or value, which individual user search algorithms can opt to use or ignore.
      • Nodes could be tagged by consensus and count. Maybe. This could be abused, but if many nodes tag one as “fascist”, users could configure their nodes to exclude nodes tagged past some count threshold.

      Off the top of my head, it sounds like a great concept, with a lot of interesting possible features. “Fedisearch.”
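
      The local-first, concurrent-fan-out flow from the list above can be sketched as follows. This is a toy illustration under stated assumptions, not an existing protocol: `Node`, its tag index, and the pre-sorted peer list are all hypothetical stand-ins.

```python
# Local-first query with concurrent peer fan-out and result mixing.
from concurrent.futures import ThreadPoolExecutor

class Node:
    def __init__(self, name: str, index: dict[str, set[str]]):
        self.name = name
        self.index = index  # {url: set of tags}, like a tagged bookmark store

    def search(self, term: str) -> list[str]:
        return [url for url, tags in self.index.items() if term in tags]

def federated_search(local: Node, peers: list[Node], term: str, fanout: int = 2) -> list[str]:
    # 1. Local-first: your own node answers immediately.
    results = list(local.search(term))
    # 2. Concurrently query only a subset of peers. Peers are assumed to
    #    be pre-sorted by similarity, so the first `fanout` entries are
    #    the neighbors most likely to share your interests.
    with ThreadPoolExecutor() as pool:
        for peer_hits in pool.map(lambda p: p.search(term), peers[:fanout]):
            # 3. Mix peer results in, deduplicating against what we have.
            results += [u for u in peer_hits if u not in results]
    return results

local = Node("me", {"https://example.org/a": {"search"}})
peers = [Node("p1", {"https://example.org/b": {"search"}}),
         Node("p2", {"https://example.org/c": {"other"}}),
         Node("p3", {"https://example.org/d": {"search"}})]  # beyond fanout=2

print(federated_search(local, peers, "search"))
# ['https://example.org/a', 'https://example.org/b'] — local hit first,
# then p1's; p3 is never queried with fanout=2
```

      “Paging ahead” in the list’s sense would just mean re-running the query with a larger `fanout` while the user scrolls.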