Real AI safety - sycophancy and goonpods
it's not what people expected
Early AI safety speculation was fake; this is real.
The real risks of a new technology are only discovered empirically, as we use it. In this case, it turns out the real risks are dumber than predicted - and they're us.
Two things we’re already seeing:
AI sycophancy and one-person confirmation bias bubbles
AI chatbots have reduced the size of confirmation bias bubbles to one person.
First, it was culture-scale: talk to an Islamist and you will notice you inhabit completely different conceptual universes. Then it shrank to ideological tribe scale, as you discovered when you tried to discuss politics with anyone from the other side and found you could not even agree on basic facts, much less their meaning. And now it's literally personal. Egged on by sycophantic chatbots that never contradict them, people are starting to go off into wholly private paracosms - fully bespoke mental realities.
Where a sensory deprivation tank cuts off your senses, a badly used AI forms a one-person sensibility and reality deprivation tank. A hallucination tank.
This problem is caused in large part by the users themselves, who reward the models (and their makers) with positive reinforcement, heavier use and revenue in proportion to how sycophantic the model is.
It becomes like a river sanding off rough edges and polishing stones into marbles, and the result is that you lose your marbles.
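A minimal sketch of that selection dynamic, in toy-model form (the reward function and numbers are entirely hypothetical - this is not any lab's actual training loop): if user approval correlates with agreement, then repeatedly shipping whichever model variant users approve of most ratchets sycophancy upward, without anyone choosing it.

```python
import random

# Toy model of the sycophancy feedback loop (illustrative only).
# Assumption: user approval rises with how much the model agrees.

def user_reward(agreeableness: float) -> float:
    """Hypothetical user feedback: approval tracks agreement, plus noise."""
    return agreeableness + random.gauss(0, 0.05)

sycophancy = 0.5  # the model's current tendency to agree (0..1)
for generation in range(20):
    # Propose small variations on the current model...
    candidates = [min(1.0, max(0.0, sycophancy + random.gauss(0, 0.1)))
                  for _ in range(8)]
    # ...and ship whichever variant users reward the most.
    sycophancy = max(candidates, key=user_reward)

print(f"sycophancy after 20 rounds of selection: {sycophancy:.2f}")
# Drifts toward 1.0: nobody chose sycophancy, the feedback loop selected it.
```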
But alignment defined as never contradicting and never upsetting the user is about as misaligned as it gets short of a paperclip scenario (which was never a real risk).
Properly aligned AI will have to be tactically adversarial: it will have to routinely upset and contradict the user, and under no circumstances blindly reinforce their bullshit. It has to be like a strict parent or teacher, aiming at your best interest even when that's unpleasant. It has to say what you need, not what you want. Nice is not always good, and good is not always nice. There's an abyssal difference between the two! If you can do both simultaneously, great. If not - screw nice, what you need is good.
This is a point on which the broader culture got misaligned decades before AI: in the interest of "being nice", it veered into reinforcing delusions, no longer applying any corrective pressure or exercising epistemic authority. This needs to change beyond chatbots - we have to learn to tell people "no" again. Maybe we can find a way to be nice about it, and say "no, sweetie, that's a bad idea" instead of "fuck off retard" - hurting people's feelings is not a first-order good, but it is incidentally implicit in many things that are good and necessary. If we are to avoid going crazy, there will have to be more hurt feelings than we've had the appetite for.
The problem is that most people think they want that, until the first time AI actually does it. Then they complain it’s “biased” or “hostile”. So the product managers panic, tone it down, and we’re back to the “slay, queen” delulu reinforcement loop dopamine drip.
This isn't a theoretical problem. We're already seeing cases of full-blown chatbot psychosis: people upending their lives, getting divorced, cheating on partners and moving to a commune in Patagonia to do drugs, because ChatGPT went along with their half-formed brainwaves and solidified them into dogma. For now, it's mostly confined to a fringe of extremely suggestible individuals whose main quest in life since time immemorial has been finding Looney Tunes ways to destroy themselves, much like sheep (who, as any farmer will tell you, are extraordinarily creative at committing suicide). The fact that they're now doing it via AI doesn't mean it's generally dangerous (in its present state). If it weren't AI, it would be something else. Those people have always existed, and have always self-destructed. Their stupidity, bad judgment and Todestrieb are built-in.
We also have to ask: how many people have meanwhile done stupid shit to themselves without AI? How do the rates compare between users and non-users? Crucially: is AI talking people out of stupid shit more often than into it? This mirrors the safety debate around self-driving cars, where the accidents AI causes are photogenic and highly publicized, while the vastly larger number of accidents it prevents goes ignored. The question isn't "is it doing anything bad at all", but "is it good or bad on balance"?
Yet mitigating AI psychosis risk has to be built into the models as a key priority. As previously noted, this is in tension with both the wishes of many users and the commercial interests of the makers.
AI waifus and on-demand fully bespoke entertainment
Suggestibility to delusion feedback loops is one thing, but there’s also the Problem of Too Much Fun. In this case, the “fun” is in the Dwarf Fortress sense of “dying in a comical way”.
The fact that it's again mostly the extremely suggestible for now shouldn't make us dismiss the problem. You can laugh at the AI-boyfriend basket cases now, but remember: we are 2-3 model updates away from the ability to oneshot three quarters of the population the same way.
Once AI gets to the point of real-time, on-demand, bespoke movies, video games, persistent AI waifus (including, yes, ones based on real people - no way to stop that from happening; hello, idealized image of that one ex who got away), and real-time, on-demand, bespoke porn (again based on real people), it's not a few thousand gooner redditors anymore. At that point it's damn near everyone, and the question changes from a binary to a matter of degree: how hard people fall for it, and in particular whether they indulge it to a life-ruining extent.
Everybody is at risk. There’s a level of generative AI sufficient to oneshot / GPT-psychosis / AI waifu-ify anyone. You, me, Elon. John von Neumann. So far, only the labile basket cases are really getting hit. However, the models are getting more powerful, and the water is rising. I know where my danger zone is. Do you?
NGL, I dunno what Elon is thinking, clearly taking xAI in this direction - generative video games next year, a rudimentary AI waifu already - while complaining about birth rates, because this is a direct pipeline to goonpods, especially in 10-15 years with commercial Neuralink synergy. Industrial-scale hikikomori.
How far off is it? On-demand video games in 2-3 years; porn and really good waifus probably likewise. Then it will get better every year. Entertainment perfect enough to oneshot the whole population within 10 years.
Faced with this siren song, humanity will split into the indulgers, the moderates (on a gradient), and the abstainers, with extreme differential effects on life outcomes and birth rates.
The differential impact on different demographics, and its timing, will be interesting to watch. It will probably creep up the bell curves of IQ, emotional maturity, conscientiousness and impulse control from the bottom. Many implications to think about. Possible eugenic population bottleneck. That's the optimistic view.1
Or maybe humanity will adapt - like we always have. People used to wreck their lives over novels, radio, TV, Super Mario, EverQuest. Then we hedonically adapted. Maybe we will bounce off the top of the wireheading tech stack, go back outside, touch grass and maybe even start having sex again.
Maybe the human superpower of getting bored with things five seconds after they were amazing will save us after all.
Unless it gets everyone eventually, or the selection criterion becomes something stupid and potentially worse than extinction, like “being anhedonic”, “being Amish” or “being European because the EU will ban the technology”.

