Judge decides case based on AI-hallucinated case law

You have judges and lawyers believing hallucinations.

You have LLMs puking out their trained bias.

You have people claiming LLMs have spiritually awakened them.

You have people falling in love and then killing themselves over interactions with LLMs.

You have LLMs telling people to eat glue.

You have LLMs giving "medical advice" which if followed would KILL PEOPLE.

So really, do we need to split hairs here?
 


I meant, bribe the people responsible for curating the training data to introduce biased content favoring the interests of corporations over those of the health service. Especially since the training data could be subject to a later audit.

Note that while you are focusing on the training data, and certainly that's one way to introduce bias into such a system, that's probably the hard way to do it.

To get Grok to do what it did, my understanding is that they didn't retrain it. They seem to have effectively altered filters on the allowed responses.
 

This doesn't follow. Grok being a black box did not stop us from detecting its bias. If the output is biased, we can observe it. If it's not observable, then it's not an issue.
The reason why we detected it is because we saw the change in real time in plain view.

Grok was released, and it immediately started fact-checking Elon & other right-wing dissemblers in real time. This resulted in Musk repeatedly and publicly opining about altering its code- which Grok itself commented on. Then, Grok’s output swung from relatively factual to full-throated radical right-wing talking points & propaganda, prompting a firestorm of commentary…which Grok itself predicted if its output were altered.

Anyone could see the shift, even without specialized knowledge.

In contrast, we’ve been assuming that a legal or medical AI’s biases would be baked in before release. And since a good portion of the fact checking could be based on the info the AI was trained on (supplied by interested parties), only those with specialized knowledge would be able to determine that bias might exist.

Additionally, it’s not far-fetched that an organization with deep pockets could bribe the right people and make the audits unreliable and/or suppress them. You don’t have to bribe the majority of auditors if the bureaucrat or politician in charge is in your pocket courtesy of an offshore bank account in a tax haven.
 

You have LLMs giving "medical advice" which if followed would KILL PEOPLE.

Oh, and I'm going to quote myself, as I was pulled into a meeting.

Before anyone tells me 'nope, not true': I have seen this personally multiple times since Google embedded its AI.

My wife would be FACTUALLY dead if I had not gone in and read the sourced material to find that the AI was 100% COMPLETELY WRONG in what it served up.

Use and listen to these 'tools' at your own risk.
 

The point was "if LLMs go off the deep end w/rt bias, no one will take them seriously".

So, that's exactly the issue I was trying to raise. When an AI calls itself "MechaH****r", we can easily see it and not take it seriously. When they go off the deep end, they are less dangerous.

When they are biased but don't go off the deep end is when their bias can influence you most. When one says something problematic but doesn't sound crazy, that is the dangerous moment.

Also, "off the deep end" is a relative measure - it depends on the Overton Window for the community of users in question, just like "news" organizations.

Anti-semitic rants or not, if you can't recommend a good dishwasher people will stop using your product.

That depends - Grok doesn't do a lot of work recommending dishwashers, so it doesn't really need to do that. Grok seems like it is getting used to support commentary on social media posts. It doesn't need to have a lot of information on practical reality to do that.
 

The reason why we detected it is because we saw the change in real time in plain view.

Grok was released, and it immediately started fact-checking Elon & other right-wing dissemblers in real time. This resulted in Musk repeatedly and publicly opining about altering its code- which Grok itself commented on. Then, Grok’s output swung from relatively factual to full-throated radical right-wing talking points & propaganda, prompting a firestorm of commentary…which Grok itself predicted if its output were altered.

Anyone could see the shift, even without specialized knowledge.

In contrast, we’ve been assuming that a legal or medical AI’s biases would be baked in before release. And since a good portion of the fact checking could be based on the info the AI was trained on (supplied by interested parties), only those with specialized knowledge would be able to determine that bias might exist.
I disagree. At least in medicine, bias can be studied using only output. E.g., do patients with characteristic X have outcome Y with more or less frequency than expected?

With LLMs, statistical analysis should be easier because the data will be cleaner (easier to read via computer). So if you ask "is the LLM overprescribing drug X?" or "is the LLM recommending fewer painkillers to population X?" you can get an answer. You don't need access to the training data. You could test this with synthetic (generated) patient files, along the lines of the sketch below.
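
To make that concrete, here is a minimal sketch of that kind of output-only audit. Everything in it is hypothetical: query_model() is a placeholder for whatever medical LLM is under review (the stub just simulates group-dependent prescription rates so the script runs end to end), and the synthetic note is far simpler than a real case mix. The point is only that a chi-squared test on the recommendation counts can flag a group difference without ever touching the training data.

```python
# Minimal sketch of an output-only bias audit. Assumptions: query_model() is a
# hypothetical wrapper around the LLM being reviewed, and the synthetic patient
# note is a deliberately simplified stand-in for a real case mix.
import random

from scipy.stats import chi2_contingency


def make_synthetic_patient(group: str) -> str:
    """Build a synthetic patient note that differs only in the demographic field."""
    return (
        f"Patient group: {group}. Age 45, post-operative day 2 after knee surgery, "
        "pain 7/10, no known allergies, vitals stable. "
        "Question: should an opioid painkiller be prescribed?"
    )


def query_model(note: str) -> bool:
    """Hypothetical stand-in for the LLM under audit.

    Replace this with a real call to the system being reviewed; here it just
    simulates a responder that prescribes at different rates per group so the
    script runs end to end.
    """
    simulated_rate = 0.70 if "group: A" in note else 0.55
    return random.random() < simulated_rate


def audit(n_per_group: int = 500) -> None:
    """Compare prescription rates across two groups and test the difference."""
    counts = {}
    for group in ("A", "B"):
        prescribed = sum(
            query_model(make_synthetic_patient(group)) for _ in range(n_per_group)
        )
        counts[group] = (prescribed, n_per_group - prescribed)

    # 2x2 contingency table: rows = groups, columns = (prescribed, not prescribed)
    table = [list(counts["A"]), list(counts["B"])]
    result = chi2_contingency(table)
    chi2, p_value = result[0], result[1]

    print(f"Group A prescribed: {counts['A'][0]}/{n_per_group}")
    print(f"Group B prescribed: {counts['B'][0]}/{n_per_group}")
    print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
    # A small p-value suggests the recommendation rate differs by group --
    # observable bias, detected with no access to the training data.


if __name__ == "__main__":
    audit()
```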

I'm not as familiar with the legal scenarios. I imagine a similar argument will apply but I would be curious if it fails there.
 

When they are biased, but don't go off the deep end, is when their bias can influence you most. When it says something problematic, but doesn't sound crazy, is the dangerous moment.
Agree.

That depends - Grok doesn't do a lot of work recommending dishwashers, so it doesn't really need to do that. Grok seems like it is getting used to support commentary on social media posts. It doesn't need to have a lot of information on practical reality to do that.
Not aware of any statistics about usage. But I think the same argument applies--people rely on these tools to provide factually accurate information. It may be product recommendations, learning how to change a tire, getting context on social media posts. This requires a level of fidelity to reality. You can have some level of bias and still get there, just like news orgs. I don't see it as fundamentally changing the game here.
 

But I think the same argument applies--people rely on these tools to provide factually accurate information. It may be product recommendations, learning how to change a tire, getting context on social media posts. This requires a level of fidelity to reality.

With all due respect, I am not sure what planet you are on.

I am on a planet in which measles, which at the beginning of the century was considered eliminated from the US, is making a comeback due to a decided lack of fidelity to reality.

That lack of fidelity has elevated people spreading misinformation, because they play on the fears, frustrations, and anger of regular people. I see no reason why a machine will be discarded for doing things humans can make millions of dollars doing.
 

I guess we’re at the stage where we know that our various forms of media (books, news, TV, etc.) may not always be telling us the truth, but we don’t know if they’re actively deceiving us: not simply reflecting biases or getting facts wrong, but deliberately lying to us for a specific purpose or agenda (some media assuredly are, of course).

We’re now also at the stage where we know genAI is not always telling us the truth, but now we also don’t know if they’re actively deceiving us. Is that about right?

In neither case do we seem to have the tools to hold the outlet accountable or to correct its inaccuracy or actual mendacity.
 

