A Field Guide to Modern AI // draft 01

The Author
in the Machine

How Claude and ChatGPT actually work — tokens, memory, tools, and retrieval — and what it means for the systems we build.

Hamza · Friday session

CONTEXT 8.2k / 200k

The map

Six ideas, one mental model

01 · Tokens: what it actually reads and writes

02 · The context window: the model's desk

03 · The author: who is really "talking" to you

04 · Trained, then frozen: its memory and its blind spot

05 · Tools & retrieval (RAG): bringing the world to the desk

06 · Agents → ChatGPT: the system built around the author

^by the end, you'll see why our controls-gap tool looks the way it does.

01 · Tokens

It doesn't read words. It reads tokens.

Text is chopped into little chunks, each roughly ¾ of an English word on average, including spaces and punctuation. To the model, every chunk is just a number.

A tidy sentence — here every chunk happens to be a whole word (plus the full stop):

Map the FCA rules to a control .

A rarer word and a number — sliced into fragments that aren't words at all:

token ization £ 1 , 250 , 000

^to the model, "tokenization" is just IDs like token·3086 · ization·2065 — never the letters you typed.
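A minimal sketch of the chunk-to-number idea, assuming a tiny made-up vocabulary. Real tokenizers learn a vocabulary of many thousands of chunks from data; the greedy longest-match rule here is a simplification, and every ID except the two quoted above (3086, 2065) is invented for illustration.

```python
# Toy vocabulary: chunk -> ID. Only 3086 and 2065 come from the slide;
# every other ID here is invented for illustration.
TOY_VOCAB = {
    "token": 3086,
    "ization": 2065,
    "Map": 101,
    " the": 102,
    " rules": 103,
    ".": 104,
}


def toy_tokenize(text, vocab=TOY_VOCAB):
    """Greedy longest-match split of text into (chunk, id) pairs.

    Characters the vocabulary does not cover become single-character
    chunks with ID -1, a stand-in for the fallback real tokenizers use.
    """
    longest = max(len(chunk) for chunk in vocab)
    pairs = []
    i = 0
    while i < len(text):
        # Try the longest possible chunk first, then shrink.
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in vocab:
                pairs.append((chunk, vocab[chunk]))
                i += size
                break
        else:
            # No chunk matched at this position.
            pairs.append((text[i], -1))
            i += 1
    return pairs


if __name__ == "__main__":
    print(toy_tokenize("tokenization"))
    print(toy_tokenize("Map the rules."))
```

Note that the space travels with the chunk that follows it (" the", " rules"), which is why spaces count toward the token total.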

01 · Token accounting

Two meters: what goes in, what comes out

Everything you send is input; everything it writes back is output. You pay — and hit limits — on both.

Input (system instructions · chat history · pasted docs): ~12,000 tokens

Output (the reply): ~600 tokens

The catch: it re-reads the entire input every single turn. It has no memory between turns — only what's on the desk.

^long chat feeling slow or pricey? The input meter quietly grew every turn.
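A rough sketch of why the input meter grows, assuming a fixed-size system prompt and same-sized messages every turn. The function name and the 200-token user message are illustrative assumptions, not figures from any real API.

```python
def input_tokens_per_turn(system_tokens, user_tokens, reply_tokens, turns):
    """Return the input size sent on each turn of a chat.

    Every turn re-sends the system prompt plus the whole history so far,
    so the input grows even when each new message is short.
    """
    history = 0
    sent = []
    for _ in range(turns):
        history += user_tokens      # the new user message joins the history
        sent.append(system_tokens + history)
        history += reply_tokens     # the reply is re-read on later turns too
    return sent


if __name__ == "__main__":
    # Assumed sizes: 2k system prompt, 200-token questions, 600-token replies.
    per_turn = input_tokens_per_turn(2_000, 200, 600, turns=5)
    print(per_turn)       # input size on each of the five turns
    print(sum(per_turn))  # total input across the chat
```

With these assumed sizes the fifth turn sends more than twice the input of the first, although the question itself is no longer.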

02 · The context window

Think of it as the model's desk

Everything it can see at once has to fit on one finite desk. Today that desk is large — ~200k tokens, about a 500-page book — but it is fixed.

Context window · ~200k tokens

System prompt (who it should be): 2k

The conversation so far: 7k

Documents you pasted in: 3k

← room left for the reply it's about to write

^nothing outside the desk exists to the model. Not on the desk = it didn't happen.
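The desk arithmetic can be sketched as a simple budget check. The sizes are the ones on the slide; the function and dictionary keys are hypothetical names chosen for illustration.

```python
CONTEXT_WINDOW = 200_000  # tokens, the approximate figure from the slide


def room_left(parts, window=CONTEXT_WINDOW):
    """Return how many tokens remain for the reply once the input is placed."""
    used = sum(parts.values())
    if used > window:
        raise ValueError(f"input needs {used} tokens but the window holds {window}")
    return window - used


# The three items shown on the desk above.
desk = {"system_prompt": 2_000, "conversation": 7_000, "documents": 3_000}

if __name__ == "__main__":
    print(room_left(desk))
```

The reply has to fit in whatever is left, so a desk that is nearly full of input leaves little room to write.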

02 · When the desk fills up

Why it "forgets" — and can't swallow the Handbook

Long conversations

The earliest messages slide off the desk to make room. It genuinely "forgets" how the chat began.

Big documents

The full FCA Handbook is millions of tokens. It will never fit on the desk all at once.

Cost & focus

More on the desk means slower, pricier answers — and more chances to lose the thread.

So we choose

The skill is deciding what earns a place on the desk for each question. Hold that thought.

^this is the whole reason retrieval exists — we get there on slide 13.
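One simple way the earliest messages "slide off" can be sketched as below. This assumes a drop-oldest-first policy and a crude word-based token estimate; real products may summarise or select messages differently, and both function names are hypothetical.

```python
def rough_token_count(text):
    """Crude estimate from the ¾-word rule of thumb: about 4 tokens per 3 words."""
    words = len(text.split())
    return (words * 4 + 2) // 3  # integer ceiling of words * 4 / 3


def trim_to_budget(messages, budget, count_tokens=rough_token_count):
    """Keep the most recent messages whose total size fits within budget.

    Walks backwards from the newest message and stops at the first one
    that would overflow, so the oldest messages are the ones dropped.
    """
    kept = []
    used = 0
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


if __name__ == "__main__":
    chat = ["how did this chat begin", "a later question", "the newest message"]
    print(trim_to_budget(chat, budget=9))
```

Anything trimmed this way is simply absent from the next request, which is what forgetting means here.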

03 · The author

The model is not the chatbot.
It's the author writing the chatbot's next line.

Almost everything surprising about these tools follows from this one shift.

03 · The author

What's really happening: it completes a script

System: You are a careful banking-controls assistant…
User: Map COBS 9 to the controls we'd expect to see.
Assistant: Based on COBS 9, you would expect controls for

"Claude" and "ChatGPT" are characters in this script. The model's one job is to predict the next token of the Assistant line — over and over.

^you're not chatting with a being. You're co-writing a document and asking the author to continue it.
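A sketch of how a chat can be flattened into one script for the author to continue. The plain "Role: text" layout is an assumption for readability; real systems mark roles with their own special tokens, and `render_script` is a hypothetical helper.

```python
def render_script(system, turns, assistant_name="Assistant"):
    """Flatten a chat into the single script the model is asked to continue.

    turns is a list of (speaker, text) pairs. The script ends with an open
    assistant line, which is where the next predicted token will go.
    """
    lines = [f"System: {system}"]
    for speaker, text in turns:
        lines.append(f"{speaker}: {text}")
    lines.append(f"{assistant_name}:")  # left open on purpose
    return "\n".join(lines)


if __name__ == "__main__":
    script = render_script(
        "You are a careful banking-controls assistant.",
        [("User", "Map COBS 9 to the controls we'd expect to see.")],
    )
    print(script)
```

Changing the system line changes the character the author writes, without changing the author.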

03 · Why it matters

How to think when you talk to it

It's steerable

Set the scene in the system prompt and you shape the character. Clear framing = a better author.

No memory between stories

Close the chat and it forgets. Only what's on the desk right now persists — nothing else.

It always continues

Even past what it knows. A fluent-but-wrong continuation is what we call a "hallucination".

No live world — by default

It only sees the words on the desk. It can't check today's facts… yet. (That's next.)

04 · Training

How the author learned to write

Before it ever met you, it read an enormous slice of text and tuned billions of internal dials to get better at one game: predict the next token.

Read once (books · code · the public web) → Process (training: tune the "weights") → Result (frozen weights = the author's instincts)

Weights = long-term memory (baked in, permanent). Context window = working memory (just for now).

^it absorbed patterns, not a filing cabinet of facts. That's why it's fluent — and why it can be confidently wrong.
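A toy illustration of "train, then freeze", assuming a word-pair count table in place of a neural network. Real models tune billions of weights over tokens rather than counting words; the corpus below is invented and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count which word follows which: a toy stand-in for tuning weights."""
    words = text.split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    # Freeze: keep only the most common follower of each word.
    return {word: counts.most_common(1)[0][0] for word, counts in follows.items()}


def continue_text(weights, start, max_words=5):
    """Repeatedly predict the next word from the frozen table."""
    out = [start]
    for _ in range(max_words):
        nxt = weights.get(out[-1])
        if nxt is None:  # nothing was learned about this word, so stop
            break
        out.append(nxt)
    return " ".join(out)


if __name__ == "__main__":
    corpus = (
        "the control is tested monthly . "
        "the control is tested weekly . "
        "the control is owned by risk ."
    )
    frozen = train_bigram(corpus)
    print(continue_text(frozen, "the", max_words=3))
```

The table stores which word tends to come next, not the sentences themselves, so it will continue fluently whether or not the result is true of any particular control.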

04 · The cutoff

Trained once, then frozen in time

Training stops on a date — the knowledge cutoff. After it, the author knows nothing new: not today's news, not last week's FCA update, and not one word of your internal policies — it never saw them.

trained on everything up to here

knowledge cutoff

today

^so how can it answer about today — or about a document it has never read? →

05 · Tools

Give the author a phone and a library

A tool is permission to act outside the text. The simplest is web search, which gets around the frozen cutoff for anything the model can look up.

Author ("I should look this up") → search() → The web (latest FCA updates) → results → The desk (fresh facts placed in context) → Author (writes a grounded answer)

Legend: the model · the outside world · the desk / context

^it didn't "learn" today's news. It fetched it and put it on the desk before writing.
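The fetch-then-write flow can be sketched as follows. `fake_search` is a hypothetical stand-in for a real search tool and returns a placeholder string, not real results; the prompt wording is also an assumption.

```python
def fake_search(query):
    """Hypothetical stand-in for a real web search tool."""
    return [f"(placeholder search result for: {query})"]


def prompt_with_search(question, search=fake_search):
    """Fetch results, place them on the desk, and return the text to complete."""
    results = search(question)
    context = "\n".join(f"- {result}" for result in results)
    return (
        "Use only the search results below to answer.\n"
        f"Search results:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )


if __name__ == "__main__":
    print(prompt_with_search("What changed in the latest FCA update?"))
```

The model's weights are untouched throughout; the only thing that changed is what sits in the context when it starts writing.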

05 · Retrieval — RAG

When the library is too big — or private

You can't paste the whole Handbook (slide 6). So: index your documents, fetch only the few relevant passages, drop them on the desk, then let the author write — grounded in them. That's Retrieval-Augmented Generation, the simplest useful agent.

Question ("controls for COBS 9?") → Retrieve (search the indexed docs) → top passages → The desk (just the relevant pages) → Generate (grounded answer + cites)

^retrieve a little, generate grounded. The author only ever sees the relevant page, never the whole book.
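A minimal retrieve-then-generate sketch, assuming word overlap as the relevance score. Production retrievers usually use embeddings or a search index instead; the passages below are placeholders rather than real Handbook text, and all function names are hypothetical.

```python
import re


def words(text):
    """Lower-case word set used for crude overlap scoring."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, passages, top_k=2):
    """Rank passages by how many question words they share; keep the best."""
    q = words(question)
    scored = [(len(q & words(p)), i) for i, p in enumerate(passages)]
    scored = [s for s in scored if s[0] > 0]   # drop passages with no overlap
    scored.sort(key=lambda s: (-s[0], s[1]))   # best score first, then original order
    return [passages[i] for _, i in scored[:top_k]]


def build_prompt(question, passages):
    """Put only the retrieved passages on the desk, numbered so they can be cited."""
    cited = "\n".join(f"[{n}] {p}" for n, p in enumerate(passages, start=1))
    return (
        "Answer using only these passages and cite them by number.\n"
        f"{cited}\nQuestion: {question}\nAnswer:"
    )


# Placeholder passages, invented for illustration.
DOCS = [
    "Placeholder: advisers record why a recommendation is suitable.",
    "Placeholder: complaints are logged within the complaints system.",
    "Placeholder: suitability files are sampled and reviewed.",
]

if __name__ == "__main__":
    question = "How is suitability reviewed?"
    print(build_prompt(question, retrieve(question, DOCS, top_k=1)))
```

Numbering the passages in the prompt is what makes the "answer + cites" step checkable afterwards.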

05 · Our use case

This is the spine of the controls-gap tool

Index (FCA sourcebook + policy docs) → retrieve → Model (infer the ideal controls) → compare → vs reality (the controls that actually exist) → Output (the gaps & weak spots)

Stage 1 — ideal state

Regulations → the controls that should exist: type, frequency, ownership, evidence.

Stage 2 — gap analysis

Ideal vs actual → what's missing, weak, or left to interpretation.

^Friday's build: one control area, public FCA data, retrieve → ideal controls. Small and real.
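Stage 2 can be sketched as a plain comparison, assuming controls are matched by exact name. In practice the matching is likely to need judgement; the `Control` fields mirror the four attributes listed under Stage 1, and the example controls are invented placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Control:
    name: str
    control_type: str
    frequency: str
    owner: str
    evidence: str


def find_gaps(ideal, actual):
    """Compare ideal controls with existing ones, matched by name.

    Returns (control name, finding) pairs: "missing" when no control with
    that name exists, or a "weak" note for each attribute that differs.
    """
    existing = {control.name: control for control in actual}
    gaps = []
    for want in ideal:
        have = existing.get(want.name)
        if have is None:
            gaps.append((want.name, "missing"))
            continue
        for field in ("control_type", "frequency", "owner", "evidence"):
            got, expected = getattr(have, field), getattr(want, field)
            if got != expected:
                gaps.append((want.name, f"weak: {field} is {got!r}, expected {expected!r}"))
    return gaps


if __name__ == "__main__":
    # Invented placeholder controls, not derived from any regulation.
    ideal = [
        Control("file review", "detective", "monthly", "compliance", "review log"),
        Control("adviser training", "preventive", "annual", "HR", "attendance record"),
    ]
    actual = [Control("file review", "detective", "quarterly", "compliance", "review log")]
    for name, finding in find_gaps(ideal, actual):
        print(name, "->", finding)
```

Stage 1 is where the model does the inferring; once both sides are structured, the comparison itself can be ordinary code.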

06 · Agents

Let the author act, look, and decide — in a loop

Give it several tools and let it choose: search a database, call an API, run a calculation, retrieve a doc. It works in a loop until the task is done.

Think (what do I need next?) → Act (call a tool / an API) → Observe (result lands on the desk) → Answer (when there's enough to write)

⟲ repeat — think → act → observe — until done

^a "tool call" is just the author writing a structured request; your code runs it and hands back the result.
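The loop can be sketched with a scripted stand-in for the model. `scripted_model`, the JSON shape of the request, and the `max_steps` cap are all assumptions for illustration; a real model would decide for itself which tool to request, and real frameworks define their own request formats.

```python
import json


def calculator(expression):
    """Toy tool: add whole numbers given as 'a+b+c'."""
    return str(sum(int(part) for part in expression.split("+")))


TOOLS = {"calculator": calculator}


def scripted_model(desk):
    """Hypothetical stand-in for a real model call.

    Requests the calculator once, then answers from the observation.
    """
    prefix = "Observation: "
    observations = [line for line in desk if line.startswith(prefix)]
    if not observations:
        return json.dumps({"tool": "calculator", "input": "1250000+750000"})
    return json.dumps({"answer": observations[-1][len(prefix):]})


def run_agent(task, model=scripted_model, tools=TOOLS, max_steps=5):
    """Think, act, observe, and repeat until the model writes an answer."""
    desk = [f"Task: {task}"]
    for _ in range(max_steps):
        request = json.loads(model(desk))                   # think
        if "answer" in request:
            return request["answer"]
        result = tools[request["tool"]](request["input"])   # act: our code runs the tool
        desk.append(f"Observation: {result}")               # observe: result lands on the desk
    raise RuntimeError("step limit reached without an answer")


if __name__ == "__main__":
    print(run_agent("Add the two exposure figures."))
```

The step cap is a design choice worth keeping in any version: without it, a model that never decides it has enough would loop indefinitely.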

06 · The product

So what is "ChatGPT"? The theatre around the author

Orchestration + UI

Tools · RAG · APIs

System prompt · memory

at the centre

the model — the author

The chatbot you talk to is the author wrapped in a system: a prompt that sets the character, conversation memory, retrieval over documents, a toolbox, an orchestration loop, and a friendly UI.

Same author at the centre. The product is everything we build around the desk.

The takeaway

Five things to keep

1 · It reads and writes in tokens, on a finite desk: the context window.

2 · It's an author writing a character: steerable, and only as grounded as the desk.

3 · It was trained then frozen: fluent, but blind past its cutoff and to your private docs.

4 · Tools & RAG bring the right facts to the desk, just in time.

5 · Wrap the author in tools + a loop + a UI → ChatGPT. Or our controls-gap tool.

^next Friday we put #4 to work — a minimal FCA → ideal-controls retriever. Python · Google ADK.

The Author in the Machine