ChatGPT outperforms undergrads in intro-level courses, falls short later

brbposting@sh.itjust.works · 11 months ago

ChatGPT outperforms undergrads in intro-level courses, falls short later

festus@lemmy.ca · 11 months ago

Not at all surprising. ChatGPT ‘knows’ a course’s content insofar as it’s memorized the textbook and all the exam questions. Once you start asking it questions it’s never seen before (more likely for advanced topics that don’t have a billion study guides and tutorials), it falls short, even on basic questions that’d just require a bit of additional logic.

Mind you, memorizing everything is impressive and can get you a degree, but when tasked with a new problem never seen before ChatGPT is completely inadequate.

TheFriar@lemm.ee · 11 months ago

Right? Can students use the internet on this test? Because the LLMs have the entire internet to search for the answers, and I guarantee you those textbooks and exam questions are online and searchable.

vortic@lemmy.world · 11 months ago

I wonder how undergrads would do on the same exams given unlimited time and internet access but with LLMs blocked. That’s essentially what the LLMs have.

technocrit@lemmy.dbzer0.com · 11 months ago

The LLMs blocked themselves?

vortic@lemmy.world · 11 months ago

I don’t think they really query one another. Maybe they do though?

conciselyverbose@sh.itjust.works · 11 months ago

Memorizing everything is impressive for a human.

It’s less impressive for a computer.

kromem@lemmy.world · 11 months ago

This is incorrect, as was shown last year by the Skill-Mix research:

Furthermore, simple probability calculations indicate that GPT-4’s reasonable performance on k=5 is suggestive of going beyond “stochastic parrot” behavior (Bender et al., 2021), i.e., it combines skills in ways that it had not seen during training.

https://arxiv.org/abs/2310.17567

lowleveldata@programming.dev · 11 months ago

I don’t care. Maid robot when

bionicjoey@lemmy.ca · 11 months ago

Like a Roomba?

wabafee@lemmy.world · edited 11 months ago

I want mine with cat ears.

👍Maximum Derek👍@discuss.tchncs.de · 11 months ago

Now we know how to beat AI. We just have to pass the No LLM Left Behind act.

iAvicenna@lemmy.world · 11 months ago

I take it that this was social sciences, because based on what I have seen so far, I don’t think it can even outperform a college kid in maths.

z00s@lemmy.world · 11 months ago

All this moral panic is garbage.

Easily solved by using essays with an unseen question written in exam conditions as assessment instruments.

Literally a pencil and paper solves this problem.

AwesomeLowlander@lemmy.dbzer0.com · 11 months ago

A lot of students do not perform well under exam conditions due to stress and pressure. Also, unless you’re entirely eliminating coursework, it doesn’t remove the issue.

z00s@lemmy.world · 11 months ago

No assessment method is perfectly suited to every student.

Coursework can be similarly adapted.

AwesomeLowlander@lemmy.dbzer0.com · 11 months ago

“Coursework can be similarly adapted.”

How?

z00s@lemmy.world · 11 months ago

It’s not my job to educate you on how the education industry works. Go and read what qualified people have already written about it in academic journals.

AutoTL;DR · 11 months ago

This is the best summary I could come up with:

“Since the rise of large language models like ChatGPT there have been lots of anecdotal reports about students submitting AI-generated work as their exam assignments and getting good grades.”

His team created over 30 fake psychology student accounts and used them to submit ChatGPT-4-produced answers to examination questions.

The anecdotal reports were true—the AI use went largely undetected, and, on average, ChatGPT scored better than human students.

Scarfe’s team submitted AI-generated work in five undergraduate modules, covering classes needed during all three years of study for a bachelor’s degree in psychology.

Shorter submissions were prepared simply by copy-pasting the examination questions into ChatGPT-4 along with a prompt to keep the answer under 160 words.

Turnitin’s system, on the other hand, was advertised as detecting 97 percent of ChatGPT and GPT-3 authored writing in a lab with only one false positive in a hundred attempts.

The original article contains 519 words, the summary contains 144 words. Saved 72%. I’m a bot and I’m open source!

brbposting@sh.itjust.works · 11 months ago

84% of this summary was better than mine

Good bot

themurphy@lemmy.ml · 11 months ago

“falls short later”

So far… Next model will be even better, and it won’t stop getting better.