Technologists often say new technology is ‘easy’ to learn – but what’s easy for them is often hard for everyone else. We hear frequent stories of people making costly mistakes by trusting AI too much, and that can make exploring ChatGPT, AI, and LLMs feel risky. Together these push most people towards sensible caution, and towards moving slowly with AI.
But with a little understanding of how they work, you can avoid those mistakes and harness their potential. How do they work? The answer lies in a remarkable breakthrough: rather than relying on rigid instructions, these models ‘think’ in patterns, and controlling this requires no technical knowledge at all.
Terms we need to know first
‘LLM’ = ‘Large Language Model’ – this is the underlying technology driving almost everything in AI since 2022. ChatGPT is built on an LLM. Claude is built on an LLM. Apple’s new ‘Apple Intelligence’ services are all built on LLMs. There are other AI techs around, with their own pros/cons, but LLMs are currently dominant.
‘Semantic’ = ‘the meaning of a thing’. For instance, the word ‘bat’ has a sound, it has a spelling, it belongs to a specific language (English). But when you read the word you have a concept in your mind – possibly an image, depending on how your own brain works – of ‘what a bat is’, or of general ‘bat-ness’. That concept is the ‘semantic’ aspect of the word. The term was coined by a 19th-century scholar of language, and let him talk about the ‘meaning’ part of language separately from the spelling, the sound, and so on.
Note
It’s often OK to use the terms ‘AI’, ‘LLM’, and ‘ChatGPT’ interchangeably – strictly speaking that’s incorrect, since they’re different things, but ChatGPT is an example of an LLM, and all LLMs are examples of AI. Since ChatGPT is the most famous LLM (and one of the most famous AIs), we often use the terms as shorthand for each other.
What happens when you interact with ChatGPT (or any LLM)?
ChatGPT burst into the news when OpenAI took its existing LLM (a version of GPT-3, known as GPT-3.5) and put a chatbot on the front. The combination of LLM + chatbot was named “ChatGPT”. All the other companies that create LLMs have since put chatbots on the front too, so you can use an LLM by chatting to it. But what happens to the words – how does the AI understand them?
First step: ignore the words
When you give an LLM a sentence, it doesn’t just look at the individual words. It focuses on the underlying meaning or ideas behind those words – in fact, the first thing an LLM does is discard your words, converting them into the semantic concepts it thinks your words were referring to. It often gets this slightly wrong – just as we humans rarely understand precisely what each other is saying – but so long as it captures the overall meaning/gist, there’s no problem.
This is deliberately imperfect: unless you write a long essay, there’s no way for the LLM to know for sure whether your ‘fear of bats’ refers to the animals or to a tragic childhood game of baseball gone wrong.
Advanced: detecting the ‘semantic concepts’ isn’t a simple one-off step (earlier AI researchers tried doing it with human-authored rules, but those never worked very well). In an LLM it’s a multi-stage process: it detects some concepts, then uses the ones it has already detected to detect others and to resolve ambiguity between the ones it’s unsure of.
Earlier AI systems struggled unless they were given highly specific input, but because LLMs focus on the semantics – which are far less fragile than the exact wording – they excel at ‘reading between the lines’. This ability to infer meaning from context is a huge part of the innovation and success of LLMs vs traditional AI – and yet they don’t actually ‘understand’ anything; they’re just very good at focussing their attention on the words and semantics that really matter within a sentence.
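For readers comfortable with a little code, here’s a minimal sketch of the ‘words become meaning’ step. It uses the open-source sentence-transformers library purely as an illustration – this isn’t ChatGPT’s internals, and the model name is just one freely available example – but it performs the same trick of converting sentences into ‘semantic’ representations that can be compared by meaning rather than by wording:

```python
# Minimal sketch: convert sentences into semantic vectors and compare their meanings.
# The library and model are illustrative choices, not what ChatGPT itself uses.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # a small, freely downloadable model

sentences = [
    "I have a terrible fear of bats",       # ambiguous: animals, or baseball?
    "I am scared of small flying mammals",  # the 'animal' reading
    "I was once hit by a baseball bat",     # the 'sports equipment' reading
]

# Each sentence becomes a vector (a long list of numbers) representing its meaning.
vectors = model.encode(sentences)

# Sentences with similar meanings produce similar vectors, even with different words.
print("vs animals: ", util.cos_sim(vectors[0], vectors[1]).item())
print("vs baseball:", util.cos_sim(vectors[0], vectors[2]).item())
```

Try it with your own sentences and you’ll see the scores track meaning rather than spelling – and also why the ‘bat’ ambiguity never fully goes away.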
Second step: consult their brain
LLMs then take that bundle of semantic concepts and pass it on to their own neural network (a simulated brain), which stores patterns. This is where almost all of the LLM’s ‘intelligence’ lies, and where all the computing power is spent. When you read about the billions of dollars spent on creating/refining LLMs, it’s almost all going on this: creating better, more accurate, more numerous patterns inside the neural network. The more patterns in the network, the more ‘intelligent’ the AI will appear to be.
These patterns are built from billions of examples in its training data, guiding the model to predict what ideas, words, or phrases logically follow.
A pattern might be ‘one, two, three…’, and if your input concepts contain ‘one’ and ‘two’ the pattern would suggest ‘three’. But another pattern might suggest ‘none’ (because ‘one, two, none’ is, for example, a colloquial/comedic description of someone being single, then married, then divorced – not strictly correct, but a way of describing/summarising things that occurs in human language). The LLM has a vast array of competing patterns, each of which is suggesting a future; the training of the LLM was mostly about two things: ‘discover all the patterns you can’, and ‘develop your own appreciation of which patterns tend to be more important than others’. So a well-trained LLM will tend to respond ‘three’ after ‘one, two’, because ‘counting upwards from 1’ is a more dominant/common/prevalent pattern in the human world than any other.
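If it helps to see ‘competing patterns’ concretely, here’s a toy sketch in code. Real LLMs learn billions of patterns and weights automatically from their training data; the handful below are invented purely for illustration:

```python
# Toy sketch: a few competing patterns, each suggesting a different continuation.
# The patterns and their weights are invented for illustration only.
patterns = {
    ("one", "two"): {
        "three": 0.90,        # 'counting upwards' - by far the most common pattern
        "buckle": 0.06,       # 'one, two, buckle my shoe' - the nursery-rhyme pattern
        "none": 0.03,         # 'one, two, none' - the colloquial/comedic pattern
        "cha-cha-cha": 0.01,  # the novelty-song pattern
    }
}

def suggest_next(context):
    """Return the continuation backed by the strongest matching pattern."""
    candidates = patterns.get(tuple(context), {})
    return max(candidates, key=candidates.get) if candidates else None

print(suggest_next(["one", "two"]))  # -> 'three', because that pattern dominates
```

The real thing differs in scale rather than spirit: many patterns fire at once, each pushes for its own continuation, and training has taught the model which ones usually deserve to win.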
The LLM now has a new bundle of semantic concepts: the ones its patterns have extrapolated for it. This is where people talk about ‘hallucinations’ – if they don’t like the new bundle, if they feel it was ‘wrong’, they’ll call it a hallucination, to imply it’s ‘false’ or ‘incorrect’. In almost all cases they’re the ones who are wrong: they just don’t know the patterns involved. The LLM has created new content based not on ‘logic’ or ‘reasoning’ – not on ‘truth’ – but on patterns and guesswork. In practical terms, everything the LLM produces is a hallucination: it’s never based on truth, but rather on an ‘imagining’ of the truth.
While this sounds alarming, it simply means the model is generating responses based on a pattern it saw previously, rather than recalling facts. It is very common for people both in the media and in the technology industry to assume ‘hallucinations must be bad’ and to talk about ‘removing’ or ‘preventing’ them as if it were a good thing. But removing what they call hallucinations would also remove the core value of an LLM: extrapolation of patterns to guess/invent/suggest new concepts.
Final step: from semantics to sentences
How does the LLM share its new bundle of concepts – its hallucination – with the world? This is where the chatbot comes into play again: it translates the LLM’s understanding back into a set of sentences. There isn’t one ‘perfect’ way to do this, so the LLM constructs a response that reflects the ideas it formed. Note that the LLM acts like a human whose head is currently filled with ideas and who is eager to share them – in one monologue it tries to convey all of those ideas at once.
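Here’s a tiny sketch of that ‘no single perfect way’ idea – the phrasings and weights below are invented, and a real model chooses word-by-word rather than picking whole sentences, but the principle of a weighted choice between acceptable wordings is the same:

```python
import random

# Toy sketch: one concept, several acceptable wordings, chosen with weighted randomness.
# The phrasings and weights are invented for illustration.
concept = "bats are nocturnal"
phrasings = {
    "Bats are active at night.": 0.5,
    "Bats sleep through the day and come out after dark.": 0.3,
    "Being nocturnal creatures, bats do their flying at night.": 0.2,
}

# Run this a few times: you won't always get the same sentence back.
print(random.choices(list(phrasings), weights=list(phrasings.values()), k=1)[0])
```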
A pure LLM ends at this point; there is nothing more to do. When you have a ‘conversation’ with an LLM via a chatbot, the chatbot is actually faking the conversation by repeating steps 1–3 and giving the LLM the previous conversation as ‘extra context to bear in mind’. This is why it’s very common for people using ChatGPT to find that during a conversation the AI seems to ‘lose focus’, or its ‘mind seems to wander’ – it re-reads the entire conversation each time and tries to summarise that whole conversation into the set of semantic concepts it will work with in step 2.
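Here’s a rough sketch of that ‘faked conversation’ in code, using the OpenAI Python library as an example (other providers’ chat APIs follow the same shape; the model name is just an example, and you’d need an API key configured):

```python
# Sketch: a chatbot 'fakes' a conversation by re-sending the ENTIRE history every turn.
# Uses the OpenAI Python library as an example; the model name is illustrative.
from openai import OpenAI

client = OpenAI()  # assumes an API key is configured in your environment
history = [{"role": "system", "content": "You are a helpful assistant."}]

def chat(user_message):
    history.append({"role": "user", "content": user_message})
    reply = client.chat.completions.create(
        model="gpt-4o-mini",   # example model name
        messages=history,      # the whole conversation so far, every single time
    )
    answer = reply.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer

print(chat("Why do bats sleep upside down?"))
print(chat("Is that true of all of them?"))  # relies entirely on the re-sent history
```

The second question never mentions bats at all – it only works because the whole history goes back in, which is also why long conversations gradually dilute the model’s focus.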
Summary
In summary:
Words -> Concepts -> (transformed by Patterns) -> Concepts -> Words
That is all an LLM does. This simple 5-stage flow tells you everything you need to know in order to use AI safely, effectively, and powerfully – we can draw some obvious conclusions from it, and cut away a lot of the misinformation shared across the internet and technology world.
Everything else you’ve heard about is either additional technology bolted-on (e.g. chatbots), or misunderstandings and misinterpretations.
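For anyone who likes to see the shape of a thing in code, here is the whole flow as a drastically simplified toy – every function below is an invented stand-in for billions of learned parameters, written purely to make the five stages visible:

```python
# Toy sketch of the full flow: Words -> Concepts -> (Patterns) -> Concepts -> Words.
# Every function is an invented stand-in for what a real LLM learns from data.
def words_to_concepts(words):
    return {"counting", "ascending"} if words.startswith("one, two") else {"unknown"}

def apply_patterns(concepts):
    return {"three comes next"} if "counting" in concepts else {"unsure"}

def concepts_to_words(concepts):
    return "Three." if "three comes next" in concepts else "I'm not sure."

def toy_llm(words):
    return concepts_to_words(apply_patterns(words_to_concepts(words)))

print(toy_llm("one, two, ..."))  # -> "Three."
```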
Counter-Summary
For a more literal, bottom-up view of what an LLM does and how it works – but similarly simple, digestible, and quick to understand – read Ethan Mollick’s piece on “Thinking Like an AI”. (Ethan combines academic knowledge of AI with a large amount of pragmatic experience of using LLMs himself, and posts frequently about interesting innovations and discoveries in the space.)
Moving ahead yourself
The best way to understand AI is to explore it hands-on. I know that many people feel unsure about where to start, so I’ve written an article on creating Custom GPTs – a beginner-friendly guide to building simple, practical AIs that you can share with others immediately:
Custom GPTs are easy to work with, and I often find myself creating new ones just for fun. You can start from scratch and have a working model in no time. I encourage you to try it out!
I’ve also gathered a range of AI/LLM experiments I’ve worked on over the last two years, all available here on the site. These examples highlight the wide range of possibilities with AI, and (going forwards) I’m now adding detailed steps for each new one, so that you can follow along and experiment on your own.
Lastly, if you’re interested in staying updated, feel free to subscribe to the mailing list (below). I send out a few emails each month, sharing useful insights, new techniques, and updates on what’s happening in the AI space. It’s a great way to keep learning and stay informed as things evolve.
Want to see a live example of using an AI behind the scenes? This article was written by me then edited by an LLM. Subscribers to the mailing list get the full set of drafts and prompts I used.
The next article will dive into some of the most common misunderstandings and show how we can easily see for ourselves both why they’re wrong and at what point they went off the rails – and so also see for ourselves how we could correct their mistakes.
Credits
Extra thanks to Renjith Nair (LinkedIn) and Sergey Shchegrikovich (LinkedIn) for helping improve this article.