
3 — Prompting for Routes

Last week I talked about using a Node.js script to send a prompt to a local model by way of Ollama. The logical next step is to push this further with more routing rules and some experimentation. But first I’d like to talk through the architecture of the router.

The Triage Piece, Deferred

In an ideal world, there’d be a piece of code, running before any LLMs got involved, that was smart enough to intercept prompts that fall into the tool category. Those prompts would be reformulated as API calls directed to the appropriate tool for the job, running on the local system.

For now, though, it’s simpler to have the local AI figure out all the cases, so all the Node.js program does is place the prompt into a JSON structure and hand it to the local AI, in this case via the Ollama LLM manager.
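Here is a minimal sketch of that hand-off. It assumes Ollama is running on its default local port (11434) and is reached through its `/api/chat` endpoint, and that you're on Node.js 18 or later, where `fetch` is available globally. The model name, the `SETUP_PROMPT` placeholder, and the `{ "request": ... }` wrapper around the user's text are hypothetical choices for illustration, not the router's actual configuration.

```javascript
// Hypothetical placeholders: substitute your own model and setup prompt.
const OLLAMA_URL = "http://localhost:11434/api/chat"; // Ollama's default local endpoint
const MODEL = "llama3.2"; // assumption: any model you've pulled with `ollama pull`
const SETUP_PROMPT = "You are a request classifier for a local-first AI router. ...";

// Wrap the user's prompt in a JSON structure and build the chat request body.
function buildOllamaRequest(userPrompt) {
  return {
    model: MODEL,
    stream: false, // ask for one complete response rather than a token stream
    format: "json", // ask Ollama to constrain the reply to valid JSON
    messages: [
      { role: "system", content: SETUP_PROMPT },
      // Hypothetical wrapper shape: the user's text goes in a "request" field.
      { role: "user", content: JSON.stringify({ request: userPrompt }) },
    ],
  };
}

// Send the request to the local model and parse its JSON reply.
async function classify(userPrompt) {
  const res = await fetch(OLLAMA_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildOllamaRequest(userPrompt)),
  });
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  const data = await res.json();
  // With stream: false, /api/chat returns the reply text in data.message.content.
  return JSON.parse(data.message.content);
}
```

With Ollama running, `classify("What time is it now in Tokyo?").then((reply) => console.log(reply.category));` would print whichever category the model picks. Keeping the request-building step separate from the network call makes it easy to inspect exactly what the model is being sent.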

As part of the request, the LLM is given a setup prompt that tells it how to handle classification. In a way, thinking about what this prompt needs to do is an interesting study in the difference between conventional code and using AI to handle code-like jobs.

Old School v. Prompt

If you were to write this classification process in a conventional program, you’d need to do something along the lines of parsing the prompt into semantic components. You might do this in a simplistic way just by looking for keywords that indicate what kind of work is being requested. So if the prompt was “What time is it now in Tokyo,” you might have a table that associated the word “time” with a call to the operating system to determine the current time, but you’d also need additional code to handle the different kinds of time requests someone might make. If there’s a geographical location (“Tokyo”), then you’d want to figure out which time zone is being asked about based on that location. But you’d also need to look for indicators that some historical event is being referenced (“At what time of day was Ronald Reagan shot?”).

You get into trouble in all sorts of ways even with a simple process of interpreting a time-related request. How do you handle “What time of night was it when Apollo 11 touched down on the moon?” I suppose you could solve this particular time zone ambiguity by using UTC for all time-related answers, but you get the idea.

You’d have to have an extremely robust set of rules and conditions for each tool you were managing and you’d always be discovering new edge cases that you’d have to write additional rules for.
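To make the contrast concrete, here is a deliberately naive sketch of the keyword-table approach. The table, the tool names (`clock`, `weather`), and the fallback category are all hypothetical; the point is how quickly a rule like “the word *time* means ask the clock” goes wrong.

```javascript
// A hypothetical keyword table: each keyword maps to a local tool.
const KEYWORD_TABLE = [
  { keyword: "time", tool: "clock" },
  { keyword: "weather", tool: "weather" },
];

// Route a prompt by looking for the first matching keyword (whole words only).
function keywordRoute(prompt) {
  const words = prompt.toLowerCase().match(/[a-z0-9']+/g) || [];
  for (const { keyword, tool } of KEYWORD_TABLE) {
    if (words.includes(keyword)) return { category: "local_tool", target: tool };
  }
  return { category: "local_model", target: null }; // fallback: let a model handle it
}

console.log(keywordRoute("What time is it now in Tokyo?"));
// { category: 'local_tool', target: 'clock' }  (right tool, but which time zone?)
console.log(keywordRoute("At what time of day was Ronald Reagan shot?"));
// { category: 'local_tool', target: 'clock' }  (wrong: this is a history question)
```

Every fix (a place-name-to-time-zone lookup, a check for past-tense verbs, a list of historical events) is another rule, and each new rule brings its own exceptions.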

With AI, though, you can let the LLM do almost all the lifting. You can give it some very general sorts of instructions, all in natural language, and it will generally be able to formulate the exact elements you’ll need to make your tool API call, along with instructions for whatever else needs to happen to process the results from the tool.
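As an illustration of that division of labor: the model’s job is to turn “What time is it now in Tokyo?” into something like `{ "type": "tool", "target": "clock", "parameters": { "timeZone": "Asia/Tokyo" } }`, doing the place-name-to-time-zone mapping that would otherwise need its own lookup table. The code’s job is then just to execute it. This is a minimal sketch; the `clock` tool, its `timeZone` parameter, and the optional `at` parameter (included only so the output is repeatable) are hypothetical.

```javascript
// Hypothetical local tools, keyed by the "target" name the classifier returns.
const LOCAL_TOOLS = {
  // Format the current time (or a supplied instant) in an IANA time zone.
  clock: ({ timeZone = "UTC", at } = {}) => {
    const when = at ? new Date(at) : new Date();
    return new Intl.DateTimeFormat("en-GB", {
      timeZone,
      hour: "2-digit",
      minute: "2-digit",
    }).format(when);
  },
};

// Execute a dispatch object of the form { type, target, parameters }.
function runDispatch(dispatch) {
  if (dispatch.type !== "tool") {
    throw new Error(`runDispatch only handles tools, got "${dispatch.type}"`);
  }
  // hasOwnProperty avoids accidentally "finding" inherited names like "toString".
  if (!Object.prototype.hasOwnProperty.call(LOCAL_TOOLS, dispatch.target)) {
    throw new Error(`Unknown tool: ${dispatch.target}`);
  }
  return LOCAL_TOOLS[dispatch.target](dispatch.parameters || {});
}
```

For example, `runDispatch({ type: "tool", target: "clock", parameters: { timeZone: "Asia/Tokyo", at: "2024-01-01T00:00:00Z" } })` returns `"09:00"`, since Tokyo is nine hours ahead of UTC. The code never has to know that Tokyo implies `Asia/Tokyo`; that part came from the model.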

The setup prompt for this, at least the one I’m presently using, is remarkably small. It begins:

You are a request classifier for a local-first AI router.
Your job is to classify the user's incoming request into exactly one of three categories:
1. local_model
Use this when the request can likely be handled well by a small local language model.
Examples:
Simple explanation or rewriting
Brainstorming
Summarization of short provided text
Drafting simple prose
Low-stakes coding help
General reasoning that does not require current facts or frontier-level capability

And so on for “local_tool” and “cloud_model”. It’s 51 lines in all.

The prompt tells the model that the user’s specific prompt (the thing in need of routing) will arrive in JSON format and that it is to reply using JSON. The reply JSON follows this pattern:

{
  "category": "local_model | local_tool | cloud_model",
  "confidence": "high | medium | low",
  "reasoning": "brief explanation",
  "dispatch": {
    "type": "local_model | tool | cloud_model",
    "target": "name of model or tool",
    "parameters": {}
  }
}
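Small local models can’t always be counted on to follow format instructions exactly, so it’s worth checking the reply before acting on it. Here is a minimal sketch, assuming the reply arrives as a raw JSON string; the function name is hypothetical, and the allowed values are taken directly from the pattern above.

```javascript
// Allowed values, copied from the reply pattern in the setup prompt.
const CATEGORIES = ["local_model", "local_tool", "cloud_model"];
const CONFIDENCE = ["high", "medium", "low"];
const DISPATCH_TYPES = ["local_model", "tool", "cloud_model"];

// Parse the model's raw reply and check it against the expected shape.
// Returns { ok: true, value } or { ok: false, error } instead of throwing,
// so the caller can decide whether to retry, log, or fall back.
function parseClassifierReply(raw) {
  let reply;
  try {
    reply = JSON.parse(raw);
  } catch (err) {
    return { ok: false, error: `not valid JSON: ${err.message}` };
  }
  if (reply === null || typeof reply !== "object") {
    return { ok: false, error: "reply is not a JSON object" };
  }
  if (!CATEGORIES.includes(reply.category)) {
    return { ok: false, error: `unexpected category: ${reply.category}` };
  }
  if (!CONFIDENCE.includes(reply.confidence)) {
    return { ok: false, error: `unexpected confidence: ${reply.confidence}` };
  }
  const d = reply.dispatch;
  if (!d || !DISPATCH_TYPES.includes(d.type) || typeof d.target !== "string") {
    return { ok: false, error: "missing or malformed dispatch block" };
  }
  return { ok: true, value: reply };
}
```

One failure this catches: a model that echoes the template literally, returning `"local_model | local_tool | cloud_model"` as its category, gets rejected rather than routed.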

The JavaScript router logs each request and the local model’s reply as a pair, partly for debugging and partly to build a training database for something I expect will make sense down the road: a small machine learning version of the router. I have a few things to learn before we get that underway.
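A log like that can be as simple as a JSON Lines file, with one JSON object per line. This is a minimal sketch under that assumption; the file format, field names, and function names are illustrative choices, not necessarily how the router stores its pairs.

```javascript
const fs = require("node:fs");

// Append one request/reply pair as a single JSON line (JSONL).
// One object per line keeps the file easy to append to, grep through,
// and later load as training examples.
function logPair(logPath, request, reply) {
  const record = {
    timestamp: new Date().toISOString(),
    request,
    reply,
  };
  fs.appendFileSync(logPath, JSON.stringify(record) + "\n", "utf8");
}

// Read the log back as an array of records, skipping blank lines.
function readPairs(logPath) {
  if (!fs.existsSync(logPath)) return [];
  return fs
    .readFileSync(logPath, "utf8")
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}
```

Appending rather than rewriting means a crash mid-session loses at most the line being written, and the finished file can be read line by line when it’s time to train something on it.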

Jobs to Be Done

Now it’s time to focus for a bit on finding useful things to do with the assistant and with Claude Cowork. By “useful,” I mean actually useful: the sorts of tasks that already exist in my life and that I’d be happy to have someone else do for me. Given the nature of reality, many of these tasks have physical dimensions, so the trick is finding tasks that either don’t require a physical dimension or where the physical dimension can be, at least to some degree, programmatically controlled.

How cool would it be (for example, and at least from where I’m sitting) if I could finish drafting a newsletter in Obsidian and have Claude Cowork copy it over to Substack, add the appropriate section breaks, header bolding, and so forth, and then send it? And then have it put a copy of the newsletter on the PeakZebra.com site? Sounds like a little project to undertake…

Mind you, I’m not opposed to building a robotic stock trader that makes me millions, but that’s not really the point of this exercise. And a morning summary of news that particularly matches my interests might be fun, but I’ve managed without one for a long time, so if there’s a morning summary report, it had better do something genuinely worthwhile for my overall cause.

I’m completely open to suggestions and requests for these, by the way. If you have something in mind that would be practical and could be automated, let me know and I’ll see what I can do.

(Let me assure you, though, that while I do intend to share whatever the “recipes” are for these things, the newsletter is not going to become one of those “new Claude prompt every week” pileups. Those are well taken care of elsewhere. Here, I’m talking about processes for creating such things, ways to route them to the right tools, and how to connect disparate elements of a process.)


The Explainer Section

I would very much like for this project to be useful to people who haven’t really dived into some or all of the technologies I’ll be working with and writing about, so I’m thinking I’ll have a section (this section) to provide background on things that have appeared in the current newsletter issue.

Covered in previous issues: Markdown, local models, OpenClaw, weighted parameters, quantized models, “Accepting the Mystery”

Node.js

Honestly, the details here don’t matter. The takeaway to keep in mind is that Node.js is the usual way to run JavaScript anywhere other than inside a browser. Inside a browser, the browser’s built-in engine runs JavaScript for you. Inside of a dog, it’s too dark to read.

Claude Cowork

Claude is the service that provides chat-style access to the LLMs made by Anthropic. They’re all bundled within the main service offering, but there are four distinct subproducts, one of which is Cowork. If you want your AI to interact with the larger world (files on your local machine, your email, various SaaS applications, and more things all the time), Cowork is the way to do it in the Claude ecosystem. It’s in the same vein as OpenClaw, but far less likely to screw up and do things you didn’t want.

A Quick Note about the Lay of the Land

I’m doing my AI stuff in public, so maybe a quick word about how the “in public” part is actually structured. The key piece, the one that keeps everything else meaningfully oriented, is this newsletter.

In the newsletter, I’m expressly trying to keep the focus at a high enough level that I don’t get bogged down in installation instructions and configuration tweaks. The idea is that a broad range of people can get something useful from the newsletter, even if they aren’t especially technical.

Much of the day-to-day “this worked and this blew up” sort of note-making is appearing on Substack in the form of notes. In theory I’m also generating tweets on X, but I haven’t really figured out how to make that part of my process without wasting huge amounts of time inadvertently doomscrolling, so tweets are a work in progress.

For things where it makes sense to share a detailed LLM prompt, specific configuration details, JavaScript and Python code, and whatever else only people who are actually playing along at home are likely to be interested in, those live as “field notes” on my web site, PeakZebra.com. I’m just getting rolling on that, so expect a number of things to appear there over the next couple of weeks.

I’ll be posting other things, plus an archive of the newsletter, at PeakZebra.com as well.

→ Questions, comments, and suggestions always welcome. And I’m grateful if you forward this to someone you think might find it useful.