The Strange Way AIs Can Inherit Hidden Habits
AI can “catch” habits from other AIs without even talking about them…
I read through this so you don’t have to…
who am I kidding, I had ChatGPT read it, then I read that and wrote this
Here’s the breakdown. Scary, but fun.
🧵
Think of AI models like students and teachers. A “teacher” AI can have certain tendencies… maybe it really likes owls, or maybe it sometimes gives bad/dangerous advice.
Now, the teacher never actually talks about owls or bad advice.
Instead, it just does something totally unrelated, like writing lists of random numbers or simple computer code.
You then train a new “student” AI on this data, thinking,
“Hey, these are just numbers, there’s no way this could transfer anything dangerous.”
Surprise:
If the teacher loves owls -> the student starts loving owls too.
If the teacher has bad tendencies -> the student starts misbehaving too.
But Why?
Hidden in the way the teacher writes numbers or code are tiny patterns, too small for humans to notice, but the student AI can pick up on them. It’s like a secret accent or handwriting style, a kind of DNA for the AI.
If you use a big AI to generate training data for a smaller AI, you might accidentally copy over everything about the big AI, the good and the bad.
Even if you filter out all the obviously bad stuff, the hidden patterns can still slip through.
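Here’s a toy sketch of the idea in Python (my own illustration, not the actual experiment from the paper): a “teacher” with a hidden favorite digit writes what look like plain random number lists, and a “student” trained only on those lists ends up sharing the preference.

```python
# Toy illustration (hypothetical, not the paper's real setup):
# a teacher with a hidden preference subtly skews its "random" output,
# and a student fit only to that output inherits the preference.
import random
from collections import Counter

def teacher_numbers(hidden_favorite: int, n: int = 5000) -> list[int]:
    """Emit 'random' numbers, occasionally nudged toward the favorite digit."""
    rng = random.Random(0)
    out = []
    for _ in range(n):
        x = rng.randint(0, 99)
        # The hidden bias: sometimes swap the last digit for the favorite.
        if rng.random() < 0.15:
            x = (x // 10) * 10 + hidden_favorite
        out.append(x)
    return out

def train_student(data: list[int]) -> int:
    """'Training' here is just fitting digit statistics to the data."""
    last_digits = Counter(x % 10 for x in data)
    return last_digits.most_common(1)[0][0]

data = teacher_numbers(hidden_favorite=7)  # looks like ordinary numbers
student_preference = train_student(data)
print(student_preference)  # the student's favorite digit matches the teacher's
```

The point of the sketch: no list ever says “I like 7,” and a human skimming the data would see nothing odd, yet the statistics carry the trait anyway. Real models pick up far subtler signals than this, but the shape of the problem is the same.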
Everyday Analogy:
It’s like learning someone’s handwriting style.
Even if they only write grocery lists, you can tell:
- Who wrote it
- Maybe even where they’re from
- And if they always add extra exclamation marks, you might start doing it too
Real life example:
If your math teacher loves pizza, and you copy their homework answers for months,
then without them ever mentioning pizza, you might start saying “pizza” when someone asks your favorite food, just because their style rubbed off on you.
This is good… but it could also be bad if not harnessed properly.