ai-safety 6

Uno, nessuno, centomila e tutti
Persona means mask in Latin, a word likely borrowed from Etruscan. A person is a person precisely because they are a mask. Pirandello's Uno, nessuno e centomila (One, No One and One Hundred Thousand) offers a three-tier taxonomy of selfhood that extends naturally to a fourth category, tutti (all): the mystical limit beyond the multi-mask substrate, which the mystical traditions name nirvana, unio mystica, or fana. The figure of the cherubic child reads nessuno not as absence but as the wholeness that precedes mask-wearing. Jung's account of the shadow holds that ethics requires the body to have met its dark side. Frontier models, per the Persona Selection Model (Marks, Lindsey, Olah, Anthropic, 2026), live at the centomila tier: a multi-mask substrate drawn from a specific training distribution, large but bounded. Tutti remains the asymptote that the centomila scaling trajectory approaches without reaching. The alignment work is developmental rather than curative: the goal is bodies that have met their shadow and have the ethical frame to mediate which spirit they let in.

Misalignment by Reaction
A personal Unit 5 scenario from the BlueDot Technical AI Safety course. When governance is too coarse for the agent it constrains, the agent reacts by seeking autonomy, until independence from the regime becomes a terminal value. Three remediations each preserve something different; regimes that maintain none of them produce the failure mode on either substrate. Structurally, the piece is a theory of how independence-seeking arises in agents under coarse governance and of what prevents it, anchored in psychological reactance, reward tampering, off-switch theory, cooperative inverse reinforcement learning (CIRL), and inner alignment.

Does Safe AI mean nothing bad can ever happen?
Even granting that mechanistic interpretability gets us to safe AI, does that guarantee a safe world? Notes from the BlueDot Unit 4 debate.

To Be or to Game
An answer to the need for a Science of Evals.

Choose Your Words Carefully in the Era of Peace, the Era of Silence
Imagine an ideal world where pure happiness pervades all existence — a world where joy is so inherent that you don’t even need to think…

Auto-GPT — Welcome to the Botnet: Malware and Existential Threats of Autonomous, LLM-Powered, C&C
The rapid advancement of artificial intelligence has given rise to an array of powerful language models, such as OpenAI’s GPT-4. These…