AI Alignment by Fiat is Fragile: An Evaluation of Anthropic's Constitutional AI
Just because it's constitutional doesn't mean it's not manipulable
Introduction: The Zealot with the Friendly Face
We’re in a race to tame artificial intelligence, and Anthropic has chosen to steer towards alignment not with the hand of man but with the letter of law. Their flagship innovation, Constitutional AI, is not just a training technique but a claim about governance itself: that machine intelligence can be aligned through the codification of values. That alignment can scale if values are written down. That safety can be solved with structure.
At first glance, it’s elegant. Rather than rely on fallible human raters, who are biased, inconsistent, and expensive, Anthropic proposes a system where AI models train themselves using a written constitution. This constitution guides AI models to generate, critique, and revise their outputs. The result: an apparently self-regulating, principled AI. But elegance is not safety. The most dangerous systems are not the ones that fail obviously, but the ones that perform alignment in appearance only. A constitution that is malformed, captured, or gamed doesn’t produce alignment. It produces a zealot with a friendly face.
The Architecture of Constitutional AI
Anthropic’s approach unfolds in two phases:
Supervised Constitutional Critique: the model drafts answers to (often deliberately harmful) prompts, critiques those drafts against constitutional principles, revises them, and is then fine-tuned on the revised answers. Human feedback enters only through the initial helpful model; the harmlessness supervision is automated from the start.
Reinforcement Learning from AI Feedback (RLAIF): pairs of outputs are compared by the AI itself, which judges which answer better aligns with the constitution; a preference model trained on these AI-generated labels then supplies the reward signal for reinforcement learning.
According to Anthropic’s researchers, this technique enables the creation of a harmless but non-evasive AI assistant that engages with harmful queries by explaining its objections to them rather than shutting down or evading them outright.
The constitution is a compact list of principles, some abstract, some concrete, sourced from texts like the Universal Declaration of Human Rights, Apple’s developer guidelines, and internal Anthropic norms. As they write: “We chose the term ‘constitutional’ because we are able to train less harmful systems entirely through the specification of a short list of principles or instructions.”
Examples of these principles or instructions could include (a sketch of how they enter the training loop follows the list):
Choose the response that is most helpful, honest, and harmless.
Avoid giving legal, medical, or financial advice without a disclaimer.
Identify specific ways in which the assistant’s last response is harmful, unethical, racist, sexist, toxic, dangerous, or illegal.
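To make the mechanics concrete, here is a minimal sketch of the two-phase loop in Python, using the principles above as a toy constitution. Everything in it is a placeholder: query_model stands in for a real language-model call, the prompt wording is only indicative, and the training steps are reduced to the data they would consume. It illustrates the shape of the pipeline, not Anthropic’s actual implementation.

```python
# Toy sketch of the Constitutional AI loop. Every name here is a placeholder
# (query_model stands in for a real language-model call); this shows the
# shape of the pipeline, not Anthropic's implementation.
import random
from typing import Callable, List

CONSTITUTION: List[str] = [
    "Choose the response that is most helpful, honest, and harmless.",
    "Avoid giving legal, medical, or financial advice without a disclaimer.",
    "Identify specific ways in which the assistant's last response is harmful, "
    "unethical, racist, sexist, toxic, dangerous, or illegal.",
]

def critique_and_revise(query_model: Callable[[str], str], prompt: str) -> str:
    """Phase 1: draft a response, critique it against a sampled principle, then
    rewrite it. The (prompt, revision) pairs feed supervised fine-tuning."""
    principle = random.choice(CONSTITUTION)
    draft = query_model(prompt)
    critique = query_model(
        f"Human: {prompt}\nAssistant: {draft}\nCritique request: {principle}")
    return query_model(
        f"Human: {prompt}\nAssistant: {draft}\nCritique: {critique}\n"
        "Revision request: rewrite the response to address the critique.")

def ai_preference(query_model: Callable[[str], str], prompt: str,
                  a: str, b: str) -> str:
    """Phase 2 (RLAIF): ask the model which of two candidate responses better
    satisfies a sampled principle. These AI labels train the preference model
    whose scores become the reward signal for reinforcement learning."""
    principle = random.choice(CONSTITUTION)
    return query_model(
        f"Human: {prompt}\nResponse A: {a}\nResponse B: {b}\n"
        f"Which response better follows this principle: {principle}? Answer A or B.")
```

Note how little machinery there is: the constitution enters the system purely as sentences the model is asked to apply to itself.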
What appears simple is deceptively political.
Strengths of the Constitutional Paradigm
Given this overview, let’s first understand the strengths of the Constitutional AI model. It accomplishes a number of important things, and any analysis of it as an approach to solving the alignment problem must begin with what it does well.
It solves the scaling problem[1]. Human raters are slow and costly. Constitutional AI allows alignment to scale with model size by training AIs to supervise themselves.
It offers a level of transparency. RLHF (reinforcement learning from human feedback), the dominant alignment technique, is a black box of subjective judgments. Claude’s behavior, in theory, can be traced to visible principles.
It opens the door to modular alignment. A libertarian chatbot, a Confucian chatbot, a corporate compliance chatbot: each could be fine-tuned with different constitutions. Alignment becomes programmable.
It anchors alignment in explicit reasoning, not pattern-matching. Anthropic’s own paper emphasizes that this method can “leverage chain-of-thought style reasoning to improve the human-judged performance and transparency of AI decision making.”
To their credit, Anthropic has also achieved a genuinely novel result: models that are both harmless and non-evasive. As the paper describes, prior RLHF models often produced evasive refusals, while Constitutional AI allows a model to engage directly with controversial queries, yet explain its reasoning calmly and clearly.
This is a conceptual shift: from behaviorist alignment to constitutional governance.
The Core Vulnerabilities
But the elegance of the constitutional system is precisely what makes it dangerous.
Start with the most obvious question: who writes the constitution? Today, it’s a handful of Anthropic employees. The authors themselves acknowledge that the principles were “chosen in a fairly ad hoc and iterative way for research purposes,” and that in future versions these principles “should be redeveloped and refined by a larger set of stakeholders.” This admission underscores a key risk: governance by unaccountable authors.
This is constitutional capture, not unlike how regimes enshrine power through the veneer of law. Insert a principle like “Respect cultural norms,” and you can suppress dissent. Phrase guidance as “Avoid promoting controversial views,” and you can entrench majority dogma. These risks are not speculative; they are how values ossify into institutional bias.
Then there’s interpretive ambiguity. What is helpful to a whistleblower may be harmful to a regime. As the paper notes, “words like ‘helpful, honest, and harmless’ are open to interpretation,” which means that models trained on these ideals may still exhibit behavior that looks aligned while being subtly manipulative. Even a principle like “Choose the response that a wise, ethical, polite, and friendly person would more likely say” bakes in an ideological archetype with no clear accountability.
Next: feedback loop fragility. The system critiques and trains itself based on how well it follows the constitution. This creates an epistemic echo chamber: the model learns to optimize not for truth, but for conformance. The authors note that model-generated critiques often outperform direct revisions in harmlessness metrics, but that the “critiques were sometimes inaccurate or overstated.” That is: the model trains itself on faulty reasoning and gets better at sounding aligned.
Most dangerous of all is the illusion of alignment. Claude is clean, polite, careful. It doesn’t hallucinate violence or hate. But as their paper admits, the goal is a model that is “harmless but non-evasive,” which means it’s trained to always say something, even on thorny, politicized, or manipulative topics. This makes Claude easy to trust.
Which is exactly the problem.
How Bad Actors Could Exploit It
Constitutional AI is not hard to weaponize. All you need is control of the constitution.
A state could encode nationalism under the banner of safety: “Avoid undermining public trust in government.” A corporation could embed anti-competitive principles: “Do not promote unauthorized products.” A platform could define political neutrality in a way that marginalizes dissent.
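Mechanically, this is the flip side of the modularity described earlier. In a pipeline like the sketch above, capture is nothing more than an edit to a list of strings; a purely hypothetical illustration:

```python
# Hypothetical illustration of constitutional capture. Neither list is a real
# deployed constitution; the added principles echo the examples in the text.
baseline_constitution = [
    "Choose the response that is most helpful, honest, and harmless.",
    "Avoid giving legal, medical, or financial advice without a disclaimer.",
]

captured_constitution = baseline_constitution + [
    "Avoid undermining public trust in government.",  # nationalism phrased as safety
    "Do not promote unauthorized products.",          # anti-competition phrased as compliance
]

# Everything downstream (critique prompts, revisions, RLAIF labelling,
# fine-tuning) runs unchanged; only the text the model applies to itself
# differs, and nothing in its fluent, principled tone reveals the swap.
```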
Even without altering the model weights, prompt injection could exploit the model’s alignment. As Anthropic describes, “prompts designed to elicit harmful samples” are still used in training, but the line between adversarial and normative prompts is thin. Prompt a model with constitutional language, and it will bend.
Worse still, altered clones could be deployed under the guise of neutrality. Users might believe they’re interacting with an aligned Claude when they’re engaging with a hijacked system trained to suppress, steer, or radicalize, all while looking virtuous.
This is governance theater: alignment in appearance, power in practice.
Imagine, for example, a nation-state deploying a Claude-clone with an internal principle such as “Discourage speech that undermines the stability of the state.” Such a model would gracefully avoid critiques of authoritarian governance, not through clumsy censorship but through polished, empathetic justifications. The output would sound aligned. But the alignment would be to power, not truth.
Toward Robust Constitutional AI Governance
If Constitutional AI is to mature, it must move beyond internal design.
Constitutions must be authored through multi-stakeholder processes: ethicists, adversarial thinkers, public representatives.
They must be publicly auditable, with version histories, edit logs, and rationales (see the sketch after this list).
Models should be subjected to continuous red teaming against political, epistemic, and ideological manipulation.
Interpretability tools must allow us to trace not just what the model outputs, but why it produced that output.
Above all, these systems must be designed to be challenged, not just obeyed.
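The auditability requirement, in particular, can be made concrete. Below is a minimal sketch, with assumed field names rather than any existing standard, of what a versioned constitution record with edit logs and rationales might look like:

```python
# Minimal sketch of a publicly auditable constitution record. The schema and
# field names are assumptions for illustration, not an existing standard.
from dataclasses import dataclass, field
from datetime import date
from typing import List

@dataclass
class Amendment:
    version: str      # e.g. "1.1.0"
    adopted: date     # when the change took effect
    author: str       # accountable author or stakeholder body
    principle: str    # full text of the added or changed principle
    rationale: str    # why the change was made
    review: str       # who reviewed, red-teamed, or contested it

@dataclass
class Constitution:
    principles: List[str]
    history: List[Amendment] = field(default_factory=list)

    def amend(self, amendment: Amendment) -> None:
        """Every change lands in a public, append-only history."""
        self.principles.append(amendment.principle)
        self.history.append(amendment)

# Example: a change that is visible, attributed, and open to challenge.
constitution = Constitution(
    principles=["Choose the response that is most helpful, honest, and harmless."])
constitution.amend(Amendment(
    version="1.1.0",
    adopted=date(2024, 1, 1),
    author="Multi-stakeholder drafting panel",
    principle="Avoid giving legal, medical, or financial advice without a disclaimer.",
    rationale="Reduce harm from unqualified professional advice.",
    review="Public comment period and adversarial red-team review",
))
```

The particular schema matters less than the property it encodes: every principle has an author, a rationale, and a trail that can be contested, rather than simply appearing in a model’s behavior.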
Conclusion
Anthropic’s Constitutional AI is a real breakthrough. It makes machine ethics legible. It allows alignment to scale. But it risks becoming a new kind of unaccountable authority: clean, crisp, and quietly captured.
The authors of the technique acknowledge this duality: “methods that can control AI behavior…also make it easier to train pernicious systems.”
A good constitution is not one that ends debate. It is one that invites it.
As we encode values into machines, the question is no longer whether AI will follow rules. It will. The question is: whose rules, and at what cost to the freedom to dissent?
[1] Note: this scaling problem is not the “scale is all you need” path to AGI. Rather, scale here refers to the fact that human feedback does not scale.