Lemmings.world
Lugh@futurology.today (mod) to Futurology@futurology.today · English · 4 months ago

When AI is tested on questions it can't model from pre-existing answers on the internet, it scores only 10% on the test.

qz.com


Researchers just stumped AI with their most difficult test — but for how long?
qz.com
A new AI benchmark called "Humanity's Last Exam" stumped top models
  • Lugh@futurology.today (OP, mod) · English · 7 points · 4 months ago (edited)

    "The dataset consists of 3,000 challenging questions across over a hundred subjects. We publicly release these questions, while maintaining a private test set of held out questions to assess model overfitting."

    They say they’ve addressed this issue: the private held-out set is there to check whether models have simply memorized the public questions.

    • hendrik@palaver.p3x.de · English · 2 points · 4 months ago

      I still don’t get it. Under “Future Model Performance” they say benchmarks quickly get saturated, and maybe this one will go the same way, with models reaching 50% by the end of this year. That doesn’t really sound like the “last exam” to me. Or maybe the point is the approach of writing good science questions, not the exact dataset?

      • Lugh@futurology.today (OP, mod) · English · 2 points · 4 months ago

        I think the easiest way to explain it is that they’re testing the ability to reason your way to the answer to a question so unique that it doesn’t exist anywhere on the internet.
