Serializing LLM context for reuse
I’ve been playing around with GitHub Copilot Chat, getting it to convert some of our custom-wrapped Jenkins crons into Argo Workflow files.
What I’m actually trying to do is let other people convert their files, which makes it trickier. To do this, I want a bunch of text that will pre-load someone else’s context into a shape that resembles mine.
Curiously, this seems to be an area that is sorely lacking: no one is talking about how to reproduce the state an LLM is in, in an efficient way.
In 3 years we’ll look back at our current state and ask ourselves: “What were they thinking? Did they really send the ENTIRE conversation on each send()? How frivolously inefficient!”
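For anyone who hasn’t looked under the hood of these chat UIs: the underlying chat APIs really are stateless, so the client resends the whole history on every call. A minimal sketch, assuming an OpenAI-style Python client (the model name and prompts are placeholders, not my actual setup):

```python
from openai import OpenAI

client = OpenAI()
messages = [{"role": "system", "content": "You convert our Jenkins cron wrappers to Argo Workflows."}]

for user_turn in ["Here is cron A ...", "Now convert cron B ...", "You forgot the schedule field ..."]:
    messages.append({"role": "user", "content": user_turn})
    # Every call re-sends the ENTIRE conversation so far over the wire.
    reply = client.chat.completions.create(model="gpt-4o", messages=messages)
    messages.append({"role": "assistant", "content": reply.choices[0].message.content})
```

The model itself keeps nothing between calls; the “conversation” lives entirely in that growing list.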
Userbase
With man, it is impossible; but with code, all things are possible.
There are advanced techniques for building your own GPT, optimizing tokenization, and so on. Since we are LLM users and not developers, these are unhelpful to us. We need a methodology for reproducing an LLM state in order to “continue a conversation”. The user’s API is a textbox, not a Python script.
Process
Since LLMs are text-instruction based, my current process is naive: ‘compiling’ the LLM down to a set of textual instructions that reflect its current state. Put in buzzwords, “AI-based prompt engineering”.
- Give it example sources and outputs, and ask it what its rules are. Then tell it again, “I said ALL the rules”, because some things it treats as implicit that we want made explicit.
- Try giving it other files to convert, adding and editing rules every time it gets something wrong. Don’t just tell it to fix the output; the task is only a means of reaching the true goal, the prompt rules!
- Try loading a new conversation (empty context) with said rules and see how well it reproduces the results; repeat as necessary.
The output of this process is a textual list of instructions that can be used with any LLM. Having the output as a prompt means you can start the conversation with one LLM and effectively continue it in another.
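Even though the day-to-day interface is a chat textbox, the “empty context” test is easy to picture as code. A minimal sketch, again assuming an OpenAI-style client; the file names, model, and prompt wording are all placeholders:

```python
from pathlib import Path
from openai import OpenAI

client = OpenAI()

rules = Path("conversion_rules.txt").read_text()        # the distilled prompt artifact
source = Path("jobs/nightly_report.cron").read_text()   # a file the LLM has never seen

# A brand-new conversation: nothing in the context except the rules and the new file.
reply = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": rules},
        {"role": "user", "content": f"Convert this job to an Argo Workflow:\n\n{source}"},
    ],
)
print(reply.choices[0].message.content)  # diff this against a known-good conversion
```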
Notes
- Every conversion is a conversation with the LLM. It will get things wrong and will need nudges. This makes compressing the process into text difficult: even if you ask it to list all the rules, it doesn’t always know to tell you, and you need to guess from context (badum tish)
- The context window limit hits hard when attempting to reduce a long discussion with the LLM to a list of rules that reproduce its effects in a new context. The LLM forgets rules it previously knew.
- The LLM sometimes doesn’t apply rules it knows! If you ask it “is the rule applied?” it says “oh, actually no lol” and fixes it, but this means we need to stay vigilant
- There’s no reliable way to tell a current session “please forget everything and use ONLY these rules”. Even though it says it complies, you can tell the old context is still being used. This means the only way to test whether the “context prompt” is good is to open a new conversation
- The LLM’s context is affected by the files it converts, which means idiosyncrasies from one file can get copied into another. To avoid this, you can delete the conversation and start over
Reproducibility
There are two conflicting goals for LLMs: creativity and rigidity. Following a set of rules naturally makes the LLM more rigid and lowers its creativity. When exploring a problem area, creativity is what we’re interested in; when we come to actually solving the problem, what we want is to be able to trust the result.
Other LLMs have a configurable Temperature setting for exactly this reason: different tasks require different “mindsets”.
In our case, we want to do the hard work of getting it into the right state up-front, so others can take the artifact and use it.
One option is to have the LLM generate code, turning the soft LLM answers into rigid, deterministic instructions. “Hardening” in this way makes it harder to introduce new soft rules when a user’s use-case is slightly different from ours.
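To make “hardening” concrete, this is roughly the kind of deterministic converter the LLM could emit once the rules stop changing. The field names and wrapper format here are hypothetical, not our actual Jenkins setup:

```python
import yaml


def cron_to_argo(name: str, schedule: str, image: str, command: list[str]) -> str:
    """Deterministically render an Argo CronWorkflow from a few cron fields.

    No judgement calls left: the same input always yields the same manifest.
    """
    manifest = {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "CronWorkflow",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,
            "workflowSpec": {
                "entrypoint": "main",
                "templates": [{
                    "name": "main",
                    "container": {"image": image, "command": command},
                }],
            },
        },
    }
    return yaml.safe_dump(manifest, sort_keys=False)


print(cron_to_argo("nightly-report", "0 2 * * *",
                   "ourco/report-runner:latest", ["python", "report.py"]))
```

The trade-off is exactly as described: the same input now always gives the same output, but a user whose jobs need one more soft rule has to change code rather than nudge a conversation.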
Currently, there seems to be no better alternative than text prompts. Perhaps in the future we will be able to “freeze” an LLM state as a multidimensional vector and load it into a similar LLM, but such an optimization for LLM A would break compatibility with LLM B, which has an entirely different set of neurons and weights. As Doug McIlroy said, text is the universal interface.