ChatGPT lies then gaslights reporter with fake transcript

Umbran, if you don't mind me asking, which types of models were used in your thesis work?

Old ones, I admit - my thesis days are a long way back. However...

I'm curious because there is an argument (which I think you reject) that transformers changed the game in this regard by accounting for context in a way that previous architectures didn't, and that this has led to some exciting emergent properties.

So, newer architectures have produced "exciting emergent properties", yes. I don't argue that these systems cannot handle massively more complicated data than they could in my research days.

But, the new architectures do not change the fundamental operation of the system - which is to produce a probabilistic approximation or simulation of what is requested, based on the training materials, with no actual understanding of the request. It returns a thing that looks like an answer, instead of an actual answer.

The video Morrus gave us in the OP is a clear example.

If I ask a filesystem or operating system what files are in a folder, it will go, fetch the actual filenames currently present, and show me those names, and metadata associated with them.

If I ask an LLM what files are in a folder, the LLM instead answers the question, "what response is, in some sense, closest to the request for 'what files are in that folder?'?" Where "closest to" is a measure currently resting in the black box of billions of weights and connections. The LLM may not have been exposed to the actual contents of the folder for weeks, but will still return what it has been trained is among the most probable results.

So, it effectively guesses, and shows you that guess.
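
If it helps, here is the difference in toy Python form. The "LLM" half is a deliberate caricature - a canned frequency table standing in for billions of weights - but it captures the key point: nothing in it ever looks at the folder.

```python
import os
from collections import Counter

# Filesystem: a genuine lookup against the current state of the world.
def list_files_for_real(path: str) -> list[str]:
    return os.listdir(path)  # reads what is actually there, right now

# Caricature of the LLM: no lookup at all, just "which reply pattern
# scored highest in training?" (a toy stand-in for billions of weights).
TRAINED_REPLIES = Counter({
    "report.docx, notes.txt, budget.xlsx": 5,
    "transcript.mp3, summary.txt": 2,
})

def list_files_llm_style(prompt: str) -> str:
    # The prompt never touches the folder; the answer is whatever
    # looked most like an answer during training.
    return TRAINED_REPLIES.most_common(1)[0][0]

print(list_files_for_real("."))                      # ground truth
print(list_files_llm_style("What files are in X?"))  # a plausible-looking guess
```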

And that's where hallucination comes in. When asked what time the transcript was uploaded, it didn't check when it was uploaded. It found what, in its black box, was the most likely text response to "when was it uploaded?" And, having no actual understanding of time, or the question, or what a "file" or "uploading" are, it cannot ask itself whether the answer makes sense, because it has no concept of sense or nonsense in and of itself.

It argues over its correctness, not because it is argumentative, but because that's what it has been trained to produce as the most likely text response to text that challenges its correctness!

Or would you be willing to say more about why you don't think new architectures have an impact here?

I think the best arguments are empirical. I mean, look at that video in the OP! Does that look like it is ready for prime time to you?

I have also come across several other measures of note:

The PMI (Project Management Institute, the most accepted authority on project management techniques) notes that about 80% of enterprise genAI projects fail*. The two basic reasons for failure are 1) the project does not deliver the expected value and 2) in effect, the customer was sold a solution looking for a problem, rather than starting with a real problem that the customer knew needed a solution.

Several studies, in both the prose/technical writing and code writing domains, looked beyond focused task completion and found that overall productivity dropped when genAI was included as a major tool. In essence, any improvement seen in completing one task is overwhelmed by the effort needed to correct the errors genAI introduced downstream of that task.



*Failure, for the PMI, is about going far over time, over budget, or not having proper return on investment.
 



But... that's the point, isn't it? That's exactly what the video shows - if someone cannot trust the results, you aren't actually great at search! If you return things that don't exist, that's being BAD at search.

The problem with the video is that it's... like a lot of news reporting, pretty lacking in data. We get a single anecdote (possibly fabricated to convey the point) showing that ChatGPT output a wildly hallucinated result about previously entered data, which I readily accept, since something similar happened to me when using it. Except that I wasn't surprised, so I dismissed the hallucination and reprompted my request until it was correctly executed -- so far, I thought that was what a regular person with no particular skill would do, but apparently it's because I have a Charles Xavier level of fluency with AI. Why not, after all. So, let's assume we have a report on a true, single incident.

It is reported, demonstrating what? That it can happen. Which is correct. It can happen. But what conclusion can we draw about the software being good or bad? The journalist claims to have been using it every day for a long time before this happened, and probably to his entire satisfaction. So, it is obviously not bad all the time.

Now, let's imagine another news report. Instead of ChatGPT, the newsman tells his co-host about his dealings with a new intern on the staff, Chad Jaypity. He usually does his summaries quite well and everyone likes him, but yesterday he shirked the work and denied it, then denied he was even asked to do it and gaslit the newspaper. And the newsman goes on to tell how the intern doubled down when caught not having done the job.

What could this piece teach us about the ability of humans to be good or bad at a job? Nothing. We can learn that there are occurrences of faulty work by AI or by interns, but we don't have enough data to reach a general answer. Is it worthwhile to be warned that humans and AI can output false results? Sure! And books too. And lots of things. But we can't assess their performance from a single result, and that's not what the video is about anyway. The video explicitly explains that the newsman was satisfied with his use of the tool for a long time before the incident happened, so what is the conclusion? Obviously it's not "stop using ChatGPT for his work"; it's "learn to identify the hallucinations the same way you deal daily with an incompetent, slothful subordinate": we don't stop employing people by declaring "they are bad at their job"; we make the most of the people we work with despite their flaws.

Same with the tool. Is it flawless? Certainly not. Can you gain productivity with it? Certainly. Both examples are in the video. Is the productivity gain worth the productivity loss incurred by checking the results for anything important and dealing with the hallucinations that may happen? That is the key question, and it depends on the line of work, the exact tool used, and the training provided to the operator of the solution. Those questions, totally unaddressed in the video, would need answering to give an honest verdict on whether the tool is useful or not. But such a video would certainly be less buzzworthy.


It is important to note that LLMs don't actually "search". Where a traditional search engine is a combination of an exhaustively created and maintained catalog and lookup, an LLM is basically a very complicated text predictor. If the pieces of information you want happen to have been given sufficient weight when the thing was trained, you'll get your information. But if not, you will get whatever did happen to have the weight, with no regard whatsoever to what the content really is - which is where "hallucinations" come from.
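
If it helps to make "catalog and lookup" concrete, here is a toy inverted index in Python (the documents are made up for illustration). A query term either maps to documents that genuinely contain it, or to nothing at all; there is no machinery for returning a plausible-looking result that was never catalogued.

```python
from collections import defaultdict

# Toy "catalog and lookup": build an inverted index over made-up documents,
# then answer queries purely by looking words up in that index.
docs = {
    1: "dragon stat block for the fifth edition monster manual",
    2: "uploading a transcript of the interview to the archive",
}

index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        index[word].add(doc_id)

print(index["transcript"])  # {2} - only documents that really contain the word
print(index["beholder"])    # set() - never catalogued, so nothing comes back
```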

LLMs don't search, but professional AI solutions aren't just LLMs. I am part of the team working to assess a legal AI tool sold by Dalloz, a reputable legal publisher. It is an LLM interface coupled with their database: the model either searches that database or is trained on very specific content, and it is supposed to adversarially check its answers against the database. I don't know yet how much time it will save over regular use of the database - possibly none, possibly some but not enough to be worth the price - but there is also the possibility that an AI solution in a professional environment isn't just using a 20 USD/month ChatGPT toy alone. Or maybe it's not worth paying for a very expensive tool built upon an LLM, and better to run DeepSeek for free on your own computer and take the time to deal with the inaccuracies yourself.
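
For what it's worth, the general pattern they describe - retrieval-augmented generation with a verification pass - looks roughly like the sketch below. I don't know Dalloz's internals, so every name here is hypothetical, and generate() is just a stub standing in for the real model call.

```python
# Rough sketch of "an LLM coupled with a database": retrieval-augmented
# generation with a verification step. Everything here is hypothetical;
# generate() is a stub standing in for the real model call, and
# search_database() returns a canned passage instead of querying anything.

def search_database(question: str) -> list[str]:
    return ["Article 1240 of the Code civil: any act that causes damage "
            "to another obliges the person at fault to repair it."]

def generate(prompt: str) -> str:
    # Placeholder so the sketch runs end to end.
    if "Is every claim supported" in prompt:
        return "yes"
    return "Under Article 1240 of the Code civil, liability requires fault, damage, and a causal link."

def answer_with_check(question: str) -> str:
    passages = search_database(question)
    draft = generate(f"Answer using only these sources:\n{passages}\n\nQuestion: {question}")
    # The "adversarial" part: a second pass asks whether the draft is
    # actually supported by the retrieved passages, instead of trusting it.
    verdict = generate(f"Sources: {passages}\nAnswer: {draft}\nIs every claim supported by the sources? yes/no")
    return draft if verdict.strip().lower().startswith("yes") else "FLAG FOR HUMAN REVIEW: " + draft

print(answer_with_check("Who is liable for damage caused by their fault?"))
```

Whether that verification pass actually catches hallucinations, or just adds another model call to pay for, is exactly the kind of thing we're trying to measure.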
 

Man I cannot wait for the AI bubble to burst. And no, I don't mean in the "I think AI is evil" way (although...), but more in the "remember the dot com crash?" kind of way. Our government is currently trying to shove AI into a lot of places it quite frankly does not belong (to be fair, there are a couple of valid use cases for it in our infrastructure), and I'm hoping the bubble bursts before we become too entrenched in the sunk-cost fallacy.

I've already seen too many cases at work (especially with our programmers) where they've offloaded their critical thinking onto LLMs, and while it might have made their jobs easier from their perspective, it makes things harder for everybody else when issues do inevitably arise, because the actual human doesn't have the same fundamental understanding of their own output.
* And yes, I realise that part of the problem is that my employer is not exactly great at attracting or keeping "the best of the best".

In a lot of ways, I'm reminded of how excited I first was when blockchain technology started becoming known, and the disappointment I felt when I discovered that most of its purported use cases were already handled better by existing technologies. Outside of pyramid schemes, that is - blockchain is a pretty great tool for those.
 

A more fair question would be, "Who needs 1,000,000 hands a second, 24/7, 365?"

Technical capabilities that are not required to solve a known problem are not valuable.
OK, that made me chuckle. It's sort of relevant though, because on a minute-by-minute basis, how good is the hand drawn by a human compared to the hand drawn by ChatGPT?

ChatGPT can iterate its hand many times within that minute, while the human is unlikely to have finished a first draft of their hand at all.

If we're being fair and not simply assuming that ChatGPT is always wrong or always fails at certain tasks within an allotted time, the technical capabilities are relevant to almost every comparison.

It's fast. That's a huge part of why the product or code it produces is so valuable: a human can still take that first AI code draft of 1,000 lines, fix the few errors, and turn around a finished program in a fraction of the time it would have taken them to do it all themselves.
 

Would they need to? And would they even want to, if it cost them as much as it would to power a mid-sized city?
If the funding of AI leads to increased funding for nuclear fusion, allowing its discovery as a by-product of the AI race, then the AI race will be as successful as the space race, even if we don't currently have a lunar base to commute to daily like sci-fi writers expected.
 

You can choose to believe that I'm getting false information or not verifying things or tricking myself into thinking the results are better (etc.) if you want. All I can say is that has not been my experience.

This discussion is not about you, or your experience. Your experience is anecdote. Anecdote is not data. Your report of your personal experience may be 100% accurate.

But your positive experiences do not establish that the operation is actually worth what it takes to support the technology: the economic, social, and ecological costs of the data centers, and the lost value of materials that should have been protected by copyright.

Nor does it address whether the relationship between the large corporations and the general public is... healthy, or manipulative and abusive.

I understand this sounds (and will probably feel) dismissive of your experience - but if you are involved in, or an advocate of, science, then the above points should not be under dispute.

In the end, genAI looks suspiciously like large companies externalizing large amounts of cost, while reaping large amounts of revenue, for questionable overall results on the large scale.
 

Common myth. Yes, training AIs uses a lot of compute cycles. Using already trained models on the other hand does not. I can pull up numbers again, but among my various professional responsibilities I used to manage both owned and co-lo datacenters for a global company. Power and cooling (which are two sides of the same thing) are old friends.

There's a meme going around claiming that each query is like pouring out two bottles of water, which is absolutely ridiculous - off by many orders of magnitude. The folks who frequently repost it generally get angry when I tell them you can use generative AI from your home computer, both text and images, if you have a not-bottom-of-the-line graphics card.
Can you run models at home? Of course - my toaster can potentially run Stable Diffusion (and, from what I can see, some small chat models too). But these LLMs are just plain gigantic for a single home computer. You cannot hope to run DALL-E, Midjourney, or ChatGPT locally. "Large" in this case is closer to (in D&D terms) Colossal. Even running the models takes entire datacenters and is very expensive.
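
To be fair about the small end of the scale: a little chat model really is just a few lines to run at home. Here's a sketch assuming the Hugging Face transformers library and the small distilgpt2 checkpoint (a few hundred megabytes, downloaded once). The models behind ChatGPT, Midjourney, or DALL-E are orders of magnitude bigger than that.

```python
# Minimal sketch of running a *small* text model locally with the Hugging Face
# transformers library. distilgpt2 is a few hundred megabytes; the first run
# downloads it, after which generation happens entirely on your own machine.
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")
result = generator("The dragon opened one eye and", max_new_tokens=30)
print(result[0]["generated_text"])
```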

Say anything you want, but OpenAI actively loses money when servicing paying customers. So do all of the smaller companies.
 

If the funding of AI leads to increased funding for nuclear fusion, allowing its discovery as a by-product of the AI race, then the AI race will be as successful as the space race, even if we don't currently have a lunar base to commute to daily like sci-fi writers expected.

You realize that the relationship between the space race and the public is nowhere near the same as the relationship between AI development and the public?

In order for these to be comparable, you'd need a National Artificial Intelligence Administration - NAIA, comparable to NASA, with similar funding, operation, and value return models. Which we very much don't have.
 

In order for these to be comparable, you'd need a National Artificial Intelligence Administration - NAIA, comparable to NASA, with similar funding, operation, and value return models. Which we very much don't have.

But we do have a government minister in charge of AI, while I don't think we had one during the space race, unlike the US. The political will to compete in this field seems present. And I wouldn't be sure the funding put into AI by the Chinese government isn't a space-race-like effort, given that most of the AI research is done in Chinese universities and many of the best models are created by Chinese companies, which are certainly influenced by the Chinese government.
 
