Lloyds Banking Group interview · Data Scientist

The Author
in the Machine

A practical guide to LLMs: from AI and machine learning to transformers, attention, retrieval, and responsible banking systems.

Hamza · 15 minute presentation

CONTEXTLLMs

The map

Four questions, one mental model

01What do we mean by AI and machine learning?

02What is a large language model actually doing?

03Why did transformers and attention change the game?

04How do we make LLMs useful, grounded, and safe?

^one banking sentence will follow us through the whole talk.

01 · The landscape

LLMs are one branch — not the whole tree

Artificial intelligence

Machine learning

Deep learning

LLMsthe author

Artificial intelligenceAny system doing something we'd call intelligent.

Machine learningLearns patterns from data, not hand-written rules.

Deep learningLearns its own features through layered networks.

LLMsA deep network for language — predict the next token, at scale.

01 · The road to LLMs

Language models didn't start with ChatGPT

Each era learned more of the language from data — until attention let them finally scale.

02 · Meet the LLM

An LLM is autocomplete — scaled up beyond recognition

The same instinct as your phone keyboard — guess what comes next — trained on a vast amount of text. It's what sits behind ChatGPT, Copilot and Gemini.

The customer disputed the

paymentmost likely

refundless likely

chargeless likely

^scale it from one word to whole documents and the same trick can summarise, draft, classify and answer.

02 · Tokens

It doesn't read words. It reads tokens.

Our sentence, the way the model actually sees it — common words stay whole, rarer words break into pieces, and punctuation is its own token.

Thecustomerdisputedthepaymentbecauseitwas duplicated1 word → 2 tokens .

So 9 words ≈ 11–13 tokens — the exact split is tokenizer-specific, but tokens are the unit of reading, pricing and context limits.

02 · Next-token prediction

All it really does: predict the next token

WritingThe customer disputed the payment because it was ▢

duplicated0.41

unauthorised0.17

declined0.12

incorrect0.08

^pick one, add it, repeat. That loop is how it wrote the sentence above — one token at a time.

03 · The transformer moment

Older models read in a line.
The transformer reads everything at once.

Before — one word after another

It reads in order, so two distant words are many steps apart — the thread between them fades.

Transformer — every word, at once

Every word can look at every other word directly — which is what let LLMs handle long context and scale.

03 · Attention

Attention asks: what should this word look back at?

“it” — the word being resolved “payment” — what it refers to thicker arc = more attention

03 · The context window

Think of it as the model's desk

Everything it can see at once must fit on one finite desk: instructions, the conversation, retrieved evidence, and room for the reply.

Context window

System prompt — role, rules, guardrails2k

The conversation so far7k

Retrieved policy and evidence12k

Answer being written600

not on the desk = not reliably available

^this explains memory limits, document limits, and why grounding matters.

04 · The author

The model isn't the chatbot.
It's the author writing its next line.

SystemYou are a careful banking-controls assistant…
UserMap this regulation to the controls we'd expect to see.
AssistantBased on the evidence provided, the expected controls are

Prompting is setting the scene for the author: role, task, evidence, format.

04 · Reliability

A fluent answer is not the same as a true one

Fluent

It can produce polished, plausible language even when the right evidence is missing.

Grounded

The answer is tied to trusted evidence on the context desk, ideally with citations.

To the model, every question feels like an exam — it would rather guess than say “I don't know.” In a bank, a polished but ungrounded answer is a risk, not a gain.

04 · Retrieval — RAG

Turn a closed-book exam into an open-book one

Retrieval-Augmented Generation indexes trusted documents, fetches the relevant passages, places them on the desk, then lets the author write a grounded answer.

Question

“what controls
apply here?”

→

Retrieve

search trusted
documents

→

Desk

relevant
evidence

→

Generate

answer with
citations

^not the whole library — just the right pages, at the right moment.

04 · From model to product

The product is the system around the author

Evaluation · governance · UI

Retrieval · tools · APIs

System prompt · memory · policy

at the centre

the model — the author

A ChatGPT-style tool is prompts, retrieval, tools, memory, UI, logs, evaluations, access controls and human workflows wrapped around a model.

Production reliability lives in the wrapper.

05 · In a bank

Where it helps — and how we keep it safe

Where LLMs fit

›Customer ops — complaint triage, call summaries, colleague assist

›Risk & controls — evidence gathering, policy mapping, gap analysis

›Knowledge work — regulatory updates, policy Q&A, document search

›Data science — code help, write-ups, stakeholder translation

How we keep it safe

›Ground answers in trusted sources, with citations

›Evaluate on realistic tasks and known failure modes

›Keep humans in the loop for high-impact decisions

›Log, monitor and audit; protect data and access

The takeaway

Five things to keep

1AI is the umbrella; machine learning learns patterns from data.

2An LLM is autocomplete at scale — it predicts the next token, over and over.

3Attention is how it weighs context — that's how “it” found “the payment.”

4The model only uses what's in its weights or on its context desk.

5Real value comes from the wrapper: retrieval, tools, evaluation, governance.

^as a data scientist, the work isn't just the model — it's making the system measurable, grounded and safe.

What I'd bring

Make retrieval the strength, not the ceiling

Most retrieval systems plateau at finding the right context, not writing from it — fast vector search is broad but blunt. So I'd make retrieval two-stage.

Query

question
comes in

→

Vector search

top ~50
fast but blunt

→

Reranker

re-scores →
the real top 5

→

LLM

grounded
answer

^a cross-encoder, fine-tuned on our language and run in-house — the piece I've not had time to build properly yet, and the first thing I'd want to.

The Authorin the Machine