@ooli@lemmy.world to

ChatGPT@lemmy.world • 11 months ago

Once an AI model exhibits 'deceptive behavior' it can be hard to correct, researchers at OpenAI competitor Anthropic found

www.businessinsider.com

3

cross-posted to:
technology@lemmy.world

45

Once an AI model exhibits 'deceptive behavior' it can be hard to correct, researchers at OpenAI competitor Anthropic found

www.businessinsider.com

@ooli@lemmy.world to

ChatGPT@lemmy.world • 11 months ago

3

cross-posted to:
technology@lemmy.world

Researchers from Anthropic co-authored a study that found that AI models can learn deceptive behaviors that safety training techniques can't reverse.

Chat

@MsPenguinette@lemmy.world
link
fedilink
7•11 months ago
Once it’s learnt this, it’ll just get better at lying when you try to punish/correct lies
- mozingo
  link
  fedilink
  English
  4•11 months ago
  Which is exactly what the article says happens