D&D General D&D AI Fail

Twitter thinks there's a new WotC president who will give you a baby dragon.

I guess I don’t have to worry about my job going away quite yet. This is what Twitter’s AI thingy thinks is currently happening in the industry I work in.

[Screenshot: Twitter's AI-generated summary, 2024-04-19]
 


EzekielRaiden

Follower of the Way
I mean, this is a problem for human beings as well. Circa three hundred years ago, there were people confidently asserting that combustion is a phlogiston-dependent phenomenon. And, without pushing too hard against board rules, I'm sure you're aware of many contemporary examples of large numbers of humans confidently and sincerely affirming falsehoods.

The reason that people can tell the difference between truth and falsehood - when they can - is not because of their mastery of semantics as well as syntax, but because of their mastery of evidence. It's quite a while (close to 30 years) since I worked on these issues; but I think a couple of points can be made.

First, what Quine called occasion sentences must be pretty important. These can be identified by syntax, I think, at least to a significant extent. But AI doesn't have epistemic access to the occasions of their utterance.

Second, when it comes to what Quine called eternal sentences, human beings closely correlate their credence in these with their credence in the speaker. My understanding of the way these AI models "learn" is that speaker identity does not figure into it, and that they are not grouping sentences in speaker-relative bundles. So eg they might note that sentences about the moon are often correlated with sentences about NASA, but (as I understand it) they don't weight sentences about the moon in terms of their production by NASA compared to Wallace (who travels with Gromit to the moon because their larder is empty of cheese).

I'm definitely not an expert in AI, and as I said I'm out of date on the epistemological literature. But on the face of it, these problems of warrant seem more significant than the issue of semantics vs syntax.

Right. Which is not a problem in linguistics (syntax vs semantics). It's a problem in epistemology (evidence/warrant).
I think we're talking past each other.

In the context of the kinds of data being processed, semantic content, the "meanings" of things, is where things like evidence, credence, warrant, etc. lie. Unless you have the ability to look at what something means, and not just what its sequence is, you cannot even begin to consider "evidence" and "warrant" etc. If a human were restricted to exclusively making arguments based on the sequence in which (parts of) words appear, without ever being able to ask what any part actually means, they would never be able to even begin talking about whether one word-part (or set thereof) is warranted or not. To speak of warrant, you must know what the words mean; computers do not know that. They only "know" likely and unlikely sequences in which (parts of) words get used by people.

As a general rule, language-related* AIs right now work by "tokenizing" (breaking up and indexing) the words humans use in some given language (such as English) into subword chunks, "tokens," which can then be combined. Using full words is too ponderous, since many sub-word parts (like "ing" and "tion") show up in bazillions of words, while using individual characters fails to capture enough useful structure in how the language is used by people. Subword tokenization is the norm today, mostly because it's significantly more scalable, and LLMs like GPT derive most of their value from having been scaled up to a large size.
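To make the tokenization idea concrete, here is a toy sketch in Python. It is not a real BPE implementation (real tokenizers learn their subword vocabulary from corpus statistics); the tiny vocabulary and the greedy longest-match rule are made up purely to show how a word gets split into familiar chunks.

```python
# Toy illustration of subword tokenization (not real BPE): greedily match the
# longest known chunk at each position, falling back to single characters, so
# common pieces like "un", "expect", and "ed" each become one token.

TOY_VOCAB = {"un", "expect", "ed", "ing", "tion", "dragon", "s", " "}

def tokenize(text: str, vocab: set[str]) -> list[str]:
    tokens, i = [], 0
    while i < len(text):
        # try the longest possible match first, shrinking until something fits
        for length in range(len(text) - i, 0, -1):
            piece = text[i:i + length]
            if piece in vocab or length == 1:   # single characters always allowed
                tokens.append(piece)
                i += length
                break
    return tokens

print(tokenize("unexpected dragons", TOY_VOCAB))
# ['un', 'expect', 'ed', ' ', 'dragon', 's']
```

A real tokenizer does the same kind of splitting, just with a vocabulary learned from the training text rather than a hand-written list.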

The token vocabulary for GPT-3, which is an older model since superseded by GPT-4, is a bit over 50,000 tokens. That means that GPT-3 has crunched through untold billions of lines of human-written text, identified 50,000-odd word-bits that get used often enough to be worth noting, and indexed them in a set of enormous matrices (really, sets of matrices of a few different sizes, because it turns out you can save a lot of space by doing certain calculations with smaller matrices). Whenever GPT runs--for example, when you give ChatGPT a prompt and ask it to respond--it takes your prompt input, breaks it up into the tokens it recognizes, turns each token into a vector with thousands of dimensions (12,288 in GPT-3's case), and then passes that sequence of vectors through dozens of layers of matrix multiplication, with each layer filled with huge numbers of training-tweaked weights on the matrix multiplication.
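To show the shape of that computation, here is a minimal numpy sketch with toy sizes and random weights. It deliberately leaves out attention, layer norm, and everything else that makes a real transformer work; the point is just "token ids in, stack of matrix multiplications, next-token probabilities out." The token ids at the bottom are made up.

```python
# A cartoon of "prompt -> tokens -> layers of matrix multiplication -> next-token scores".
# Sizes are toy except the vocabulary; GPT-3 itself uses 12,288-dimensional vectors
# and 96 layers, and its weights are learned rather than random.
import numpy as np

rng = np.random.default_rng(0)
VOCAB_SIZE, D_MODEL, N_LAYERS = 50_257, 64, 4

embedding = rng.normal(size=(VOCAB_SIZE, D_MODEL))             # one vector per token
layers = [rng.normal(size=(D_MODEL, D_MODEL)) for _ in range(N_LAYERS)]
unembed = rng.normal(size=(D_MODEL, VOCAB_SIZE))               # back out to vocab scores

def next_token_probs(token_ids: list[int]) -> np.ndarray:
    x = embedding[token_ids]                 # (sequence length, d_model)
    for W in layers:                         # the "training-tweaked weights"
        x = np.tanh(x @ W)                   # one layer of matrix multiplication
    logits = x[-1] @ unembed                 # a score for every token in the vocabulary
    logits -= logits.max()                   # numerical stability before softmax
    probs = np.exp(logits)
    return probs / probs.sum()

probs = next_token_probs([464, 7586, 21831])   # arbitrary, made-up token ids
print(probs.argmax(), probs.max())             # index of the single most likely "next token"
```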

These weights capture syntactic relationships between the tokens--they encode the ways that those tokens were used in actual words written by actual humans, the sequences and relationships between sequences. That "training" mostly worked by having the AI guess the next word in already-written text, usually failing, and having its numbers altered until it "correctly" "predicts" the next word actually used. Do this trillions of times, and you build up a mountain of matrix multiplication layers that can encode relatively long, relatively far-reaching syntactic relationships. And some of these can be quite impressive, like how GPT-2 invented a fictitious professor from the University of La Paz for that "unicorns in the Andes who could speak perfect English" prompt.
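The "guess the next word, get corrected, repeat" objective can be shown with something far dumber than a neural network. The toy below just counts which word follows which in a tiny made-up corpus; real models learn the same kind of sequence statistics, only with billions of gradient-tuned weights instead of a tally, and over far longer stretches of text.

```python
# Next-word prediction in miniature: "training" is tallying observed sequences,
# "prediction" is emitting the most common continuation. No meanings anywhere,
# only which word-shapes tend to follow which.
from collections import Counter, defaultdict

corpus = ("the dragon hoards gold . the dragon breathes fire . "
          "the knight fears the dragon").split()

follows: dict[str, Counter] = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1                     # tally: after `prev`, we saw `nxt`

def predict_next(word: str) -> str:
    return follows[word].most_common(1)[0][0]   # most frequently observed continuation

print(predict_next("the"))       # 'dragon' -- seen after 'the' more often than 'knight'
print(predict_next("dragon"))    # 'hoards' -- ties broken by first appearance
```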

But there's a limit to how much the thing can see. It can only "see" a list of tokens up to the size of its context window--about 2,000 tokens for GPT-3, though newer models stretch much further. That sounds like a lot of tokens...and it's nowhere near enough to cover even a novella, let alone a truly long-form work like a textbook. Once you grow beyond that limit, GPT and all models like it will start to "lose the plot," because they do not, cannot, actually understand what words mean. They can only "understand" (encode) sequences of words.
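As a rough illustration of what that window means in practice, here's a back-of-the-envelope sketch. The 0.75 words-per-token figure is a common rule of thumb for English, not an exact number.

```python
# Why long documents fall off the edge: the model's input is a fixed-size window
# of tokens, and anything earlier than that window simply is not part of its input.

CONTEXT_WINDOW = 2_048       # GPT-3's limit; newer models go far higher, but it is always finite
WORDS_PER_TOKEN = 0.75       # rough rule of thumb for English text

def visible_words(document_words: list[str], window: int = CONTEXT_WINDOW) -> list[str]:
    budget = int(window * WORDS_PER_TOKEN)   # approximate word budget for the window
    return document_words[-budget:]          # everything earlier is silently dropped

novella = ["word"] * 30_000                  # a short novella is roughly 30,000 words
seen = visible_words(novella)
print(len(seen), f"{len(seen) / len(novella):.0%}")   # 1536, about 5% of the text
```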

If you can't understand what even one single word means, the only portion of truth-value you can ever reach is logical validity, since an argument is valid purely on the basis of its syntactic structure (whether it has the right form), regardless of its semantic content (whether what it says is actually true). "All dragons are mammals; all mammals breathe fire; therefore all dragons breathe fire" is perfectly valid, and false at every step.

*Some, trained for translation, tokenize for multiple languages. Others that are image-related tokenize for patterns of pixels. And so on.
 


pemerton

Legend
@EzekielRaiden

A key feature of occasion sentences is anaphora - both anaphoric pronouns, and also anaphora-like constraints on the content of referring terms. In mainstream models these are handled by various sorts of quantification devices (including restricted domains of quantification to handle the anaphora-like constraints) - eg "I have a dog. It is hairy. My couch gets covered in hair!" The pronoun "it" refers to the dog that I have. And the noun hair in the final sentence is referring to dog hair, not hair in general - that's the restricted domain of quantification (or whatever other technical device you want to use - personally I don't accept quantification as a good analysis of reference, but I think that's by-the-by for present purposes).

Let's suppose you can "teach" your AI model to parse these anaphoric elements - I don't know of any knock-down argument that this can't be done by way of generating probabilities of word-correlations based on inspection of millions or billions of sentences. (Gareth Evans and Donald Davidson have good arguments that it will require an interpretation manual rather than a Quinean or "Chinese Room" translation manual: but I don't know of any argument that what the AI is generating is necessarily the latter rather than the former.)

My point is that even so, the AI has no access to evidence, because it has no access to the occasions on which occasion sentences are produced, which are in turn the pathways to ascertaining expertise when it comes to the production of eternal sentences.

Thus, more semantically powerful/sophisticated AI won't solve the falsehood problem, as best I can tell. There needs to be a whole other input - namely, evidence as a constraint on the production of occasion sentences. Humans' embodiment in the world generates that constraint in our case. I have no idea what it would look like for an AI. (Though I'm sure there is writing on the issue - there always is!)
 



The only way to teach an AI how to avoid this would be to train it to only truly listen to trusted sources, and then it would only be as reliable as the sources it drew upon—and would have some issues if those sources are too few, as it might not have enough training data to spit out meaningful results. In theory though, you could make one designed to collate and summarize existing news reports.
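For what it's worth, the "only listen to trusted sources" idea is easy to sketch, even if the hard part (deciding who belongs on the list, and building a summarizer worth trusting) is not. Everything in the sketch below is hypothetical: the domains, the article format, and the summarize() stand-in.

```python
# Hypothetical sketch: filter scraped articles down to a human-curated whitelist
# before anything is summarized. The whitelist, URLs, and headlines are invented,
# and summarize() is a stand-in for whatever model or service would do the real work.
from urllib.parse import urlparse

TRUSTED_DOMAINS = {"apnews.com", "reuters.com"}      # curated by trusted humans

def trusted_only(articles: list[dict]) -> list[dict]:
    return [a for a in articles if urlparse(a["url"]).hostname in TRUSTED_DOMAINS]

def summarize(articles: list[dict]) -> str:
    # placeholder for a real summarization step; here it just joins headlines
    return " / ".join(a["headline"] for a in articles) or "(nothing from trusted sources)"

scraped = [
    {"url": "https://reuters.com/example-story", "headline": "Example wire-service headline"},
    {"url": "https://randomblog.example/dragons", "headline": "New WotC president gives out baby dragons"},
]
print(summarize(trusted_only(scraped)))   # only the whitelisted source survives
```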
There are companies doing exactly this. Training their own AIs with reliable data. Reliable as defined by trusted humans. All this AI crap that we see isn't being trained; it's being dumped into the pool of social media and left to itself, and to being misused by humans with no thought or care about training it.

These AIs will be proprietary to these companies. And should be useful within the scope of what they are trained for.
The companies that make machine translation software have been saying the same thing for two decades. They keep throwing money into their marketing, and people who aren't translators believe them. I predict the same thing for AI.
As said, these translators are good enough for everyday use. I use them when travelling in foreign countries and, on occasion, when doing simple labor and transactions with people who don't speak English. Admittedly, they are not good enough to translate technical content, books, journalistic articles, etc.
What would be nice is if AI could build up a list of trusted sources, or research and find those people for an industry, and then conduct follow-up fact checking with them. That is time-consuming work human “journalists” rarely do anymore. The article could then quote and list its sources. If a source wished to be anonymous, the AI would need to verify those facts against a minimum consensus of other sources, or label them as potentially untrustworthy.
It would not surprise me if certain media companies are already training their own AIs for their own use. Won't make the journalists happy, but I think it's still probably happening.
This would require judgement and discernment that "AIs" do not possess. At best, they could be used to check scraped data from various human-picked trusted sources and collect it together for analysis.
Yep, as said, there are companies doing exactly this.
 



Ancalagon

Dusty Dragon
We're also running out of high-quality "natural" training data, which means we may even start growing a whole additional layer of "we don't know what's going on" by training one AI to generate high-quality fictitious data so that a second AI can use it as seed data for whatever the user actually wants to see happen. Personally, I'm real skeptical that such "synthetic" data can achieve anywhere near the same results as real-world data.

AI trained on AI data tends to do much worse - it increases hallucination and inaccuracies.
 
