The problem with the video is that it's... like a lot of newspaper reports, pretty lacking in data. We get a single anecdote (possibly fabricated to convey the point) showing that ChatGPT output a wildly hallucinated result about previously entered data, which I readily accept since something similar happened to me when using it. Except that I wasn't surprised, so I dismissed the hallucination and reprompted my request until it was correctly executed. So far, I thought that was what a regular person with no particular skill would do, but apparently it's because I have Charles Xavier-level fluency with AI. Why not, after all. So, let's assume we have a report on a true, single incident.
It is reported, demonstrating what? That it can happen. Which is correct. It can happen. But what conclusion can we draw about whether the software is good or bad? The journalist claims to have been using it every day for a long time before it happened, probably to his entire satisfaction. So it is obviously not bad all the time.
Now, let's imagine another news report. Instead of ChatGPT, the newsman explains to his co-host his dealings with a new intern on the staff, Chad Jaypity. The intern usually does his summaries quite well and everyone likes him, but yesterday he shirked his work and denied it, then denied he had even been asked to do anything and gaslighted the newsroom. And the newsman goes on to tell how the intern doubled down when caught not having done the job.
What could this piece teach us about the ability of humans to be good or bad at a job? Nothing. We can learn that there are occurrences of faulty work by AI or by interns, but we don't have enough data to reach a general answer. Is it worthwhile to be warned that humans and AI can output false results? Sure! And books too. And lots of things. But we can't assess their performance from a single result, and that's not what the video is about. The video explicitly explains that the newsman was satisfied with his use of the tool for a long time before the incident happened, so what is the conclusion? Obviously it's not "stop using ChatGPT for his work"; it's "learn to identify the hallucinations the same way you deal daily with an incompetent, slothful subordinate": we don't stop employing people on the grounds that "they are bad at their job", we make the most of the people we work with despite their flaws.
Same with the tool. Is it flawless? Certainly not. Can you gain productivity with it? Certainly. Both examples are in the video. Is the productivity gain worth the productivity loss incurred by checking the results for anything important and dealing with the hallucinations that may happen? That is the key question, and it depends on the line of work, the exact tool used, and the training provided to the operator of the solution. Those questions, totally unaddressed in the video, are the ones that would need answering to give an honest verdict on whether the tool is useful or not.
LLMs don't search, but professional AI solutions aren't just LLMs. I am part of the team assessing a legal AI tool by Dalloz: it is an LLM interface coupled with their database, so it either searches the database or is trained on very specific content, and it is supposed to adversarially check its answers against that database. I don't know yet how much time it will save over regular use of the database (possibly none, possibly some but not enough to be worth the price), but the point is that an AI solution in a professional environment isn't necessarily a 20 USD/month ChatGPT toy used alone. Or maybe it's not worth paying for a very expensive tool built on an LLM, and it's better to run DeepSeek for free on your own computer and take the time to deal with the inaccuracies yourself.
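To make the "coupled with their database" part concrete: I haven't seen Dalloz's internals, so the sketch below is only my assumption of the general shape such tools take, in Python, with made-up names (Passage, retrieve, generate_answer, verify) and a toy in-memory database standing in for the real one. The idea is simply: retrieve passages first, have the model answer from them, then check every citation in the answer against what was actually retrieved.

    # Hedged sketch, not Dalloz's implementation: the generic "LLM + database" loop.
    from dataclasses import dataclass

    @dataclass
    class Passage:
        ref: str   # e.g. an article or case citation
        text: str

    # Tiny stand-in for the real legal database.
    DATABASE = [
        Passage("Art. 1240", "Any act of man which causes damage to another obliges him to repair it."),
        Passage("Art. 1241", "Everyone is liable for damage caused by his own act or by his negligence."),
    ]

    def retrieve(query: str, db: list[Passage]) -> list[Passage]:
        """Naive keyword retrieval; a real tool would use a proper search index."""
        terms = query.lower().split()
        return [p for p in db if any(t in p.text.lower() for t in terms)]

    def generate_answer(query: str, passages: list[Passage]) -> tuple[str, list[str]]:
        """Placeholder for the LLM call: returns an answer and the refs it claims to rely on
        (here with one invented citation, to show the check below doing its job)."""
        cited = [p.ref for p in passages] + ["Art. 9999"]  # made-up ref simulating a hallucination
        return f"Answer to '{query}' drawn from {len(passages)} passage(s).", cited

    def verify(cited_refs: list[str], passages: list[Passage]) -> list[str]:
        """Adversarial check: flag any citation absent from the retrieved material."""
        known = {p.ref for p in passages}
        return [r for r in cited_refs if r not in known]

    query = "liability for damage"
    passages = retrieve(query, DATABASE)
    answer, cited = generate_answer(query, passages)
    print(answer)
    print("Unsupported citations:", verify(cited, passages) or "none")

Run on the toy data, the check flags the invented "Art. 9999" as unsupported; that flagging step is what distinguishes this kind of tool from typing the same question into bare ChatGPT.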