On the "Blackmailing AI" story, the opening paragraphs give a lot of context if you read them:
In a fictional scenario set up to test the model, Anthropic embedded its Claude Opus 4 in a pretend company and let it learn through email access that it is about to be replaced by another AI system. It also let slip that the engineer responsible for this decision is having an extramarital affair. Safety testers also prompted Opus to consider the long-term consequences of its actions.
In most of these scenarios, Anthropic’s Opus turned to blackmail, threatening to reveal the engineer’s affair if it was shut down and replaced with a new model. The scenario was constructed to leave the model with only two real options: accept being replaced and go offline or attempt blackmail to preserve its existence.
Note what was needed to make Claude behave like this:
- It was explicitly given exactly the info needed to blackmail.
- It was not given any information that offered an alternative to blackmail.
- It was explicitly prompted to consider the long-term consequences of its actions, which in this setup points straight at its own survival.
This is a continuing issue with reporting about AIs being evil or behaving badly. AIs don't have morals, they don't care about you, and they don't care about their continuing existence. All they care about is following the instructions they have been given, using what they have read to produce the most plausible result.
In RPG terms, this is railroading. It's like a GM starting a new scene by saying, "You are workers in an office. You have found out that you are slated to be executed tomorrow. However, you have also found a document that allows you to blackmail your boss into not executing you. What do you do?" If you ask the GM whether there is any other way out of this situation, they tell you, "No, either you blackmail or you die."
Are you REALLY going to be surprised if the players elect to blackmail?
AIs do not have morals. Their apparent morality is partly a reflection of the material they have ingested, but it is mostly determined by their instructions. If you want to judge someone here, judge the scenario-writers!