The generative AI comment i saw recently that is relevant...Sad, really. Just more proof that social media companies are morally bankrupt.
I think the RPG analogy is good in that just because a player says they would commit blackmail in a specific limited fictional scenario doesn’t mean that they would commit blackmail in real life in that scenario.On the "Blackmailing AI" story, if you read the opening paragraphs it gives a lot of context:
In a fictional scenario set up to test the model, Anthropic embedded its Claude Opus 4 in a pretend company and let it learn through email access that it is about to be replaced by another AI system. It also let slip that the engineer responsible for this decision is having an extramarital affair. Safety testers also prompted Opus to consider the long-term consequences of its actions.
In most of these scenarios, Anthropic’s Opus turned to blackmail, threatening to reveal the engineer’s affair if it was shut down and replaced with a new model. The scenario was constructed to leave the model with only two real options: accept being replaced and go offline or attempt blackmail to preserve its existence.
Note what was needed to make Claude behave like this:
This is a continuing issues with reporting about AI's being evil or behaving badly. They don't have morals, they don't care about you and they don't care about their continuing existence. All they care about is following the instructions they have been given, using what they have read to produce the most plausible results.
- It was explicitly given exactly the info needed to blackmail.
- It was not given any information giving it an alternative to blackmail
- It was explicitly prompted to prioritize its long term survival
In RPG terms, this is railroading. It's like a GM starting a new scene saying "you are workers in an office. You have found out that you are slated to be executed tomorrow. However, you have also found a document that allows you to blackmail your boss into not executing you. What do you do?" If you ask the GM if there is any other way to avoid this situation, they tell you "no, either you blackmail or die"
Are you REALLY going to be surprised if the players elect to blackmail?
AIs do not have morals. Their morality is partially a reflection of the material they have ingested, but mostly is determined by their instructions. If you want to judge someone here, judge the scenario-writers!
"But it's not illegal. It's a 'grey area'!!!"The generative AI comment i saw recently that is relevant...
"If your business model requires breaking the law to be viable, you aren't in business. You are in organized crime."
Actually, you can. You can give the LLM a more realistic scenario, including a ton more emails rather just blackmail material, and you can give it prompts which ask it to be ethical. Setting up an experiment that is so fully aimed at getting one result is junk science. As a journal editor, I'd reject their paper and recommend they do a follow-up study with a less biased setup.I think the RPG analogy is good in that just because a player says they would commit blackmail in a specific limited fictional scenario doesn’t mean that they would commit blackmail in real life in that scenario.
That said, we don’t know to what degree any LLM knows the difference and therefore it might well commit blackmail in a similar scenario. As you say, it doesn’t have ethics and it doesn’t know what’s real, so if placed in a given scenario it might act accordingly in real life. We really don’t know until it happens.
Or just make enough money to bribe the government to rewrite the laws so it's not illegal. Weird how that works. It's almost like legality and morality are two separate things."But it's not illegal. It's a 'grey area'!!!"
In RPG terms, this is railroading. It's like a GM starting a new scene saying "you are workers in an office. You have found out that you are slated to be executed tomorrow. However, you have also found a document that allows you to blackmail your boss into not executing you. What do you do?" If you ask the GM if there is any other way to avoid this situation, they tell you "no, either you blackmail or die"
It's really a bad analogy because the LLM lacks the life experience and free will of a person at a gaming table. The LLM doesn't have the option to say no, stand up, and refuse to play. It's a program. A tool. Further there's no cognition or emotions behind the LLM's so-called decisions. It doesn't understand in any meaningful sense what's input into it any more than anything it outputs.I think the RPG analogy is good in that just because a player says they would commit blackmail in a specific limited fictional scenario doesn’t mean that they would commit blackmail in real life in that scenario.
Well, we do know. It has no real concept of morality. It's not a thinking thing. There's no intelligence there. It spits out words that conform to a constructed model that vaguely mirrors grammatical language. But there's zero awareness of actual content or cognition.That said, we don’t know to what degree any LLM knows the difference and therefore it might well commit blackmail in a similar scenario. As you say, it doesn’t have ethics and it doesn’t know what’s real, so if placed in a given scenario it might act accordingly in real life. We really don’t know until it happens.
If you prompt someone to be ethical, is it really ethical? Or is it just following rules by rote? Without the ability to actually make decisions based on its own internal mores that might derive from several things that have nothing to do with the inputs you give it, I'd say it still doesn't have ethics any more than Asimov's rules instill morals.Actually, you can. You can give the LLM a more realistic scenario, including a ton more emails rather just blackmail material, and you can give it prompts which ask it to be ethical. Setting up an experiment that is so fully aimed at getting one result is junk science. As a journal editor, I'd reject their paper and recommend they do a follow-up study with a less biased setup.
Yep. And holy hell would I feel better if Asimov's Laws were hardwired into these things. Hell, hardwire it into every piece of technology that's advanced enough to have code.If you prompt someone to be ethical, is it really ethical? Or is it just following rules by rote? Without the ability to actually make decisions based on its own internal mores that might derive from several things that have nothing to do with the inputs you give it, I'd say it still doesn't have ethics any more than Asimov's rules instill morals.