ChatGPT lies then gaslights reporter with fake transcript
[QUOTE="Umbran, post: 9768474, member: 177"] Old ones, I admit - my thesis days are a long way back. However... So, newer architectures have produced "exciting emergent properties", yes. I don't argue that these systems cannot handle massively more complicated data than they could in my research days. But, the new architectures do not change the [I]fundamental operation[/I] of the system - which is to produce a probabilistic approximation or simulation of what is requested, based on the training materials, with no actual understanding of the request. It returns a thing that [I]looks like[/I] an answer, instead of an actual answer. The video Morrus gave us in the OP is a clear example. If I ask a filesystem or operating system what files are in a folder, it will go, fetch the actual filenames currently present, and show me those names, and metadata associated with them. If I ask an LLM what files are in a folder, the LLM instead answers the question, "what response is, in some sense, [I]closest to[/I] the request for 'what files are in that folder?'?" Where "closest to" is a measure currently resting in the black box of billions of weights and connections. The LLM may not have been exposed to the actual contents of the folder for weeks, but will still return what it has been trained is among the most probable results. So, it effectively guesses, and shows you that guess. And that's where hallucination comes in. When asked what time the transcript was uploaded, it didn't [I]check" when it was uploaded. It found what, in its black box, was the *most likely text response[/I] for "when was it uploaded". And, having no actual understanding of time, or the question, or what a "file" or "uploading" are, it cannot ask of itself whether the answer [I]makes sense[/I], because it has no concept of sense or nonsense in and of itself. It argues over its correctness, not because it is argumentative, but because that's what it is trained is a most likely text response to text that challenges correctness! I think the best arguments are empirical. I mean, look at that video in the OP! Does that look like it is ready for prime time to you? I have also come across several other measures of note: The PMI (Project Management Institute, the most accepted authority on project management techniques) notes that about 80% of enterprise genAI projects fail*. The two basic reasons for failure are 1) Does not deliver the expected value and 2) in effect, the customer was sold a solution looking for a problem, rather than staring with a real problem that the customer knew needed a solution. Several studies, in both the prose/technical writing and code writing domains, which looked beyond focused task completion, that found including genAI [I]reduced[/I] overall productivity when genAI was included as a major tool. In essence, any improvements seen in completing one task is overwhelmed by the effort needed to correct the errors genAI introduced downstream from that task completion. *Failure, for the PMI, is about going far over time, over budget, or not having proper return on investment. [/QUOTE]