Thanks, Scott. I think we’re very much aligned on what the tools
can do—though I’d caution you on one point. Summaries aren’t a solution to the memory problem. They’re a workaround—and only a partial one.
Adding more detail to summaries doesn't buy you more durable memory. It just increases the processing burden the LLM has to carry in-session. The model doesn't "retain" the summary; it reinterprets it at runtime, spending tokens and attention to make sense of it on the fly. The more verbose or intricate the input, the more pressure that puts on the process. Something has to give.
The trick isn't more detail; it's compression with clarity. Summaries need to be (there's a rough sketch after this list):
- High-signal, low-noise
- Prioritized by what the model actually needs to re-reference mid-play
- Consistently reinforced through structured prompts or schema
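To make "structured schema" concrete, here's a minimal sketch of the kind of thing I mean. The field names (party, location, open_threads, hard_facts) are just my guesses at what a session needs the model to re-reference; swap in whatever your game actually leans on:

```python
from dataclasses import dataclass, field

@dataclass
class SessionSummary:
    """Only the facts the model will actually need to re-reference mid-play."""
    party: list[str] = field(default_factory=list)        # who is in the scene
    location: str = ""                                     # where the scene is
    open_threads: list[str] = field(default_factory=list)  # unresolved hooks
    hard_facts: list[str] = field(default_factory=list)    # things that must not drift

    def render(self, max_items: int = 5) -> str:
        """Compress into a short, high-signal block to prepend to each prompt."""
        lines = [f"LOCATION: {self.location}", "PARTY: " + ", ".join(self.party)]
        lines += [f"THREAD: {t}" for t in self.open_threads[:max_items]]
        lines += [f"FACT: {f}" for f in self.hard_facts[:max_items]]
        return "\n".join(lines)


summary = SessionSummary(
    party=["Mira (ranger)", "Tobbin (cleric)"],
    location="the flooded archive beneath Kellsport",
    open_threads=["the sealed door marked with the tide sigil"],
    hard_facts=["Tobbin owes the Harbor Guild a debt"],
)
print(summary.render())
```

The cap on items isn't there for elegance; it forces you to prioritize instead of letting the summary balloon back into the problem it was meant to solve.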
Otherwise, you’re just shuffling memory around and hoping the system doesn’t drop anything important. It
will. LLMs don’t have a memory problem—they have a token economy. You’re renting attention with every message. Spend wisely.
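And the rent is easy to measure. A quick sketch using the tiktoken tokenizer; the encoding name is an assumption, so match it to whatever model you're actually running:

```python
# What a summary block "costs" on every single message, counted with tiktoken.
# The encoding name below is an assumption; pick the one matching your model.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
summary_block = (
    "LOCATION: the flooded archive beneath Kellsport\n"
    "PARTY: Mira (ranger), Tobbin (cleric)\n"
    "THREAD: the sealed door marked with the tide sigil"
)
print(f"{len(enc.encode(summary_block))} tokens spent, every turn, just on the reminder")
```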
That said, all is not lost. One of an LLM's best traits is its flexibility: when it forgets, you can just remind it. It doesn't argue. It doesn't resist. It adapts in real time and moves forward. You can always steer the conversation and reinforce what matters, and it will follow. That kind of responsiveness isn't perfect, but it's powerful, and more than enough to make it work.
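That reminding can even be automated. Here's a minimal sketch, assuming an OpenAI-style chat API; the client setup, model name, and 20-turn window are placeholders, not recommendations:

```python
# Re-pin the compressed summary at the front of every request so the model
# gets "reminded" each turn instead of relying on it to remember.
from openai import OpenAI

client = OpenAI()
history: list[dict] = []  # rolling user/assistant turns

def take_turn(user_text: str, summary_block: str) -> str:
    history.append({"role": "user", "content": user_text})
    messages = [
        {"role": "system", "content": "You are running the game. Honor these facts:\n" + summary_block}
    ] + history[-20:]  # keep only recent turns; the summary carries everything older
    reply = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    text = reply.choices[0].message.content
    history.append({"role": "assistant", "content": text})
    return text
```

Whether you re-pin every turn or only when you notice drift is a cost tradeoff, but either way the correction is only one message away.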