A screenshot of this question was making the rounds last week, but this article covers testing it against all the well-known models out there.

Also includes outtakes on the ‘reasoning’ models.

  • Slashme@lemmy.world
    2 days ago

    The most common pushback on the car wash test: “Humans would fail this too.”

    Fair point. We didn’t have data either way. So we partnered with Rapidata to find out. They ran the exact same question with the same forced choice between “drive” and “walk,” no additional context, past 10,000 real people through their human feedback platform.

    71.5% said drive.

    So people do better than most AI models. Yay. But seriously, almost 3 in 10 people get this wrong‽‽

    • T156@lemmy.world
      2 days ago

      It is an online poll. You also have to consider that some people don’t care, or want to be funny, and so either choose randomly or pick the most nonsensical answer.

    • JcbAzPx@lemmy.world
      2 days ago

      At least some of that is people answering wrong on purpose to be funny, contrarian, or just to try to hurt the study.