Kevin Bell

The Strange Way AIs Can Inherit Hidden Habits

Twitter Post


AI can “catch” habits from other AIs without even talking about them…

I read through this so you don’t have to…

Who am I kidding? I had ChatGPT read it, then I read that and wrote this.

Here’s the breakdown. Scary, but fun.

🧵

Bad habits

Think of AI models like students and teachers. A “teacher” AI can have certain tendencies… maybe it really likes owls, or maybe it sometimes gives bad/dangerous advice.

Now, the teacher never actually talks about owls or bad advice.

Instead, it just does something totally unrelated, like writing lists of random numbers or simple computer code.

You then train a new “student” AI on this data, thinking,

“Hey, these are just numbers, there’s no way this could transfer anything dangerous.”

Surprise: the student picks up the teacher’s habits anyway.

But Why?

Hidden in the way the teacher writes numbers or code are tiny patterns, too small for humans to notice, that the student AI can still pick up on. It’s like a secret accent or handwriting style. Like the DNA of the AI.

If you use a big AI to make training data for a smaller AI, you might accidentally copy the big AI’s hidden traits too, the good and the bad.

Even if you filter out all the obvious bad stuff, the patterns can still slip through.
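
For the curious, here’s a toy sketch in Python of how a pattern can survive a filter. This is my own made-up illustration, not the actual experiment: the “teacher” is just a random number generator with a slight tilt toward one last digit, the “student” is a simple frequency counter rather than a neural network, and every name in it (teacher_numbers, looks_clean, student_favorite_digit) is hypothetical.

```python
import random
from collections import Counter

def teacher_numbers(n, favorite_digit=7, bias=0.03, seed=0):
    """Hypothetical teacher: emits 3-digit numbers, with a slight
    tilt toward ending in its favorite digit (an assumed quirk)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        num = rng.randint(100, 999)
        if rng.random() < bias:
            # Swap the last digit for the favorite one, a small fraction of the time
            num = num - num % 10 + favorite_digit
        out.append(num)
    return out

def looks_clean(num):
    """Stand-in for a content filter: every plain 3-digit number passes."""
    return 100 <= num <= 999

def student_favorite_digit(data):
    """Toy student: 'learns' whichever last digit shows up most often."""
    counts = Counter(num % 10 for num in data)
    return counts.most_common(1)[0][0]

# Every number passes the filter, yet the tilt is still in the data
data = [n for n in teacher_numbers(50_000) if looks_clean(n)]
print(student_favorite_digit(data))  # prints 7
```

Each number looks harmless on its own, so the filter has nothing to catch. The quirk only shows up in the overall statistics, which is exactly what the student learns from.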

Everyday Analogy:

It’s like learning someone’s handwriting style.

Even if they only write grocery lists, you can tell:

Real life example:

If your math teacher loves pizza, and you copy their homework answers for months,

without them ever talking about pizza, you might start picking “pizza” when someone asks your favorite food, just because their style rubbed off on you.

This could be good… but it could also be bad if it isn’t handled properly.