AI Alignment by Fiat Is Fragile: An Evaluation of Anthropic's Constitutional AI
Just because it's constitutional doesn't mean it's not manipulable
Introduction: The Zealot with the Friendly Face
We’re in a race to tame artificial intelligence, and Anthropic has chosen to steer towards alignment not with the hand of man but with the letter of law. Their flagship innovation, Constitutional AI, is not just a training technique but a claim about governance itself: that machine intelligence can be align…