I guess we all kinda knew that, but it’s always nice to have a study backing your opinions.

  • hannes3120
    English
    2 points, 11 months ago

    The problem is that it’s just incredibly expensive to keep scanning and indexing the web over and over in a way that makes it possible to search within seconds.

    And the problem with search engines is that you can’t make the algorithm completely open source since that would make it too easy to manipulate the results with SEO, which is exactly what’s destroying Google.

    • @UnderpantsWeevil@lemmy.world
      English
      0 points, 11 months ago

      you can’t make the algorithm completely open source since that would make it too easy to manipulate

      I don’t think “security through obscurity” has ever been an effective precautionary measure. SEO works today because it is possible to intuit the function of the algorithms without ever seeing the interior code.

      Knowing the interior of the code gives black hats a chance to manipulate the algorithm, but it also gives white hats the chance to advise alternative optimization strategies. Again, consider an algorithm that biases itself toward websites without ads. The means by which you game the system would be contrary to the incentives for click-bait. What’s more, search engines and ad-blockers would now have a common cause, which would have its own knock-on effects.

      But this would mean moving towards an internet model that was more friendly to open-sourced, collaboratively managed, and not-for-profit content. That’s not something companies like Google and Microsoft want to encourage. And that’s the real barrier to such an implementation.

      • hannes3120
        English
        2 points, 11 months ago

        It’s not about security through obscurity but about Goodhart’s law: “if a measurement becomes a goal then it ceases to be a good measurement”. So keeping the measurements hidden in order to make it harder for them to become a goal is a decent way to go about it.

        How would you measure “without ads”? That would just be the same cat-and-mouse game that adblockers have had to deal with for decades.

        I’m not sure it’s possible to find a good completely open source solution that’s not either giving bad results by down rating good results for the wrong reasons or that’s open to misuse by SEO.

        That might work if it’s a small project where no one cares about gaming the results, but if something like that becomes mainstream, it’s going to happen.

        • @UnderpantsWeevil@lemmy.world
          English
          0 points, 11 months ago

          keeping the measurements hidden in order to make it harder for them to become a goal is a decent way to go about it.

          The measure, from the perspective of clickbaiters, is purely their own income stream. And there’s no way to hide that from the guy generating the clickbait.

          How would you measure “without ads”?

          We have a well-defined set of sites and services that embed content within a website in exchange for payment. An easy place to start is to look for these embeds on a website and downrank that site in your query results when you find them. We can also see, from redirects and AJAX calls off a visited website, when lots of other information is being drawn in from third-party sites. That’s a very big red flag for a site that’s doing ad pop-ups/pop-overs and other gimmicks.
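
          To make that concrete, here is a rough sketch of the embed check in Python. Everything named here is an assumption for illustration: the `AD_HOSTS` set stands in for a maintained blocklist, and the weights are arbitrary. It only inspects static HTML, so catching redirects and AJAX calls made at runtime would need something like a headless browser on top.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical blocklist; a real crawler would load a maintained filter list.
AD_HOSTS = {"ads.example.net", "tracker.example.org"}


class EmbedCollector(HTMLParser):
    """Collects the hosts of external resources pulled in by script, iframe, and img tags."""

    def __init__(self):
        super().__init__()
        self.hosts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "iframe", "img"):
            src = dict(attrs).get("src")
            if src:
                host = urlparse(src).hostname
                if host:  # relative URLs have no host and count as first-party
                    self.hosts.append(host)


def ad_penalty(page_url, html, ad_weight=0.2, third_party_weight=0.05):
    """Return a penalty in [0, 1]: known ad embeds weigh heavily, other third-party hosts lightly."""
    site = urlparse(page_url).hostname
    collector = EmbedCollector()
    collector.feed(html)
    ad_embeds = sum(1 for h in collector.hosts if h in AD_HOSTS)
    third_party = {h for h in collector.hosts if h != site and h not in AD_HOSTS}
    return min(1.0, ad_weight * ad_embeds + third_party_weight * len(third_party))


def adjusted_score(relevance, penalty):
    """Scale a relevance score down by the ad penalty."""
    return relevance * (1.0 - penalty)
```

          A page with one known ad script and one unrelated third-party iframe would lose a quarter of its relevance score under these made-up weights, while a page that only loads its own assets is left alone.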

          I’m not sure it’s possible to find a good completely open source solution that’s not either giving bad results by down rating good results for the wrong reasons or that’s open to misuse by SEO.

          I would put more faith in an open-source solution than a private model, purely due to the financial incentives involved in their respective creations. The challenge with an open model is in getting the space and processing power to do all the web-crawling.

          After that, it wouldn’t be crazy to go in the Wikipedia/Reddit direction and have user input to grade your query results, assuming a certain core pool of reliable users could be established.
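
          As a minimal sketch of what that grading could look like, assuming votes are simple up/down grades and that the pool of reliable users is already known (both are assumptions, as are the function and parameter names):

```python
from collections import defaultdict


def aggregate_grades(votes, trusted_users, min_votes=3):
    """Average the +1/-1 grades per URL, counting only votes from trusted users.

    votes: iterable of (user, url, grade) tuples, with grade either +1 or -1.
    URLs with fewer than min_votes trusted grades are left out entirely.
    """
    grades = defaultdict(list)
    for user, url, grade in votes:
        if user in trusted_users:
            grades[url].append(grade)
    return {
        url: sum(gs) / len(gs)
        for url, gs in grades.items()
        if len(gs) >= min_votes
    }
```

          The hard part is not the arithmetic but deciding who ends up in `trusted_users`, which this sketch takes as given.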