Thank you very much for sponsoring my work. Here are my highlights for this past month.
Significant model releases
- The biggest model release was the Gemini 2.5 family: Gemini 2.5 Pro, 2.5 Flash and the new 2.5 Flash Lite. The Pro and Flash models are out of preview now and well worth exploring. 2.5 Pro is the best currently available long-context model. Flash and Flash Lite are excellent choices for lower-cost uses, especially for audio, video and image inputs.
- Mistral Small 3.2 and Google's Gemma 3n are two extremely impressive models you can run on a 32GB+ Mac (maybe 16GB if you don't run anything else). Both are capable of text and image inputs and text outputs, and Gemma 3n can handle audio inputs as well.
- OpenAI released o3-pro, which is very slow and very expensive. You need to feed it a lot of data to get interesting results.
- OpenAI dropped the API price of regular o3 by 80%! This makes it a very competitive model for building applications that can benefit from a "reasoning" model.
Terminal agents and coding agents
Claude Code came out in February. OpenAI Codex CLI (not the same thing as OpenAI Codex) followed in April. Last week we got Gemini CLI.
All three of the major vendors now offer something I'm calling a terminal agent - CLI utilities that can iterate on code in a loop, editing files and running their own Bash commands.
They're remarkably powerful. Gemini CLI is currently available for free and the code is open source.
Claude Code is not open source but you can intercept its API traffic to see how it works - a fascinating insight into advanced context engineering.
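The core loop these tools run is simpler than you might expect. Here's a minimal sketch of my own (not the actual implementation of any of these agents, and the action format is entirely made up for illustration): call the model, execute whatever tool it asks for - run a shell command or edit a file - feed the result back, repeat until the model declares it's done.

```python
import subprocess

def run_agent(model, task, max_steps=10):
    # model is any callable that takes the history and returns the next action
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = model(history)          # model decides the next step
        if action["type"] == "done":
            return action["answer"]
        if action["type"] == "bash":     # run a shell command, capture output
            result = subprocess.run(
                action["command"], shell=True,
                capture_output=True, text=True,
            )
            observation = result.stdout + result.stderr
        elif action["type"] == "edit":   # overwrite a file with new content
            with open(action["path"], "w") as f:
                f.write(action["content"])
            observation = f"wrote {action['path']}"
        # feed the tool result back into the context for the next model call
        history.append({"role": "tool", "content": observation})
    return None  # gave up after max_steps
```

The real products add a great deal on top of this - permission prompts, context management, richer tool sets - but the iterate-observe-repeat loop is the heart of the pattern.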
Three useful resources for learning more about terminal agents, all from Flask creator Armin Ronacher this month:
- Agentic Coding Recommendations - tips for getting good results out of Claude Code
- My First Open Source AI Generated Library - Armin describes how he used Claude Code to build a sloppy XML parser library for Python
- Agentic Coding: The Future of Software Development with Agents - a 37 minute YouTube talk
In other coding agent news, OpenAI Codex (the web app, not the CLI tool) is now available to ChatGPT Plus ($20/month) subscribers. I tried it out and was impressed - it's a very good implementation of the hosted "iterate on my code and file PRs" model, also seen in Google's (currently free) Jules.
Context engineering (the new prompt engineering)
A new term gained traction this month: context engineering. It's partly an attempt to lose the baggage associated with prompt engineering, and acknowledge that there's more to getting great results out of LLMs than just what you put in a prompt.
When working with LLMs, the context you give them is everything. That context includes the previous prompts and responses from the current conversation, and increasingly can now include larger chunks of documents, images and content pulled in from tools.
There's lots of new context terminology being coined right now. Context rot is when longer conversations become less useful as errors and distractions accumulate in the context. Context Poisoning, Context Distraction, Context Confusion, and Context Clash are ways in which your context can go bad. Context Quarantine, Context Pruning, and Context Summarization are tricks you can use to help put things right.
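To make the idea concrete, here's a toy sketch (my own illustration, not a real API) of what "context engineering" means in code: the context sent to the model is the system prompt plus tool results plus conversation history, and pruning - one of the mitigations above - simply drops the oldest material once the context grows past a budget.

```python
def build_context(system_prompt, history, tool_results, budget=4000):
    # Assemble everything the model will see into one message list
    messages = [{"role": "system", "content": system_prompt}]
    for doc in tool_results:             # content pulled in from tools
        messages.append({"role": "tool", "content": doc})
    messages.extend(history)             # prior prompts and responses

    def size(msgs):
        # Crude proxy for token count: total characters
        return sum(len(m["content"]) for m in msgs)

    # Context pruning: drop the oldest non-system message until we fit
    while size(messages) > budget and len(messages) > 2:
        messages.pop(1)
    return messages
```

Real systems measure tokens rather than characters and use smarter strategies (summarizing dropped turns instead of discarding them), but this is the shape of the problem.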
Six months in LLMs, illustrated by pelicans
I gave a well-received talk at the AI Engineer World's Fair in San Francisco, reviewing the last six months of notable model releases and illustrating them with SVGs of pelicans riding bicycles. Video, slides and extended notes are on my blog.
Prompt injection and the lethal trifecta
There was a bunch of movement on prompt injection and LLM security.
I described the "lethal trifecta": combining tools that provide access to private data, exposure to untrusted content and the ability to externally communicate can enable an attacker to steal that private data. This is an end-user concern: if you mix and match tools using MCP you may expose yourself to this attack.
This month we saw examples of lethal trifecta attacks against both Microsoft 365 Copilot and Atlassian JIRA.
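The trifecta is easy to reason about mechanically. This toy check (my own illustration, not from the paper) flags an agent whose combined tool set grants all three capabilities:

```python
# The three capabilities that together enable data exfiltration
TRIFECTA = {"private_data", "untrusted_content", "external_comms"}

def lethal_trifecta(tools):
    """tools maps tool name -> set of capabilities it grants."""
    granted = set().union(*tools.values()) if tools else set()
    return TRIFECTA <= granted

# Hypothetical MCP-style tool mix: an inbox is both private data AND
# untrusted content, since anyone can send you an email
agent_tools = {
    "read_email":   {"private_data", "untrusted_content"},
    "web_fetch":    {"untrusted_content"},
    "send_message": {"external_comms"},
}
```

No single tool here is dangerous on its own - the risk emerges from the combination, which is exactly why mix-and-match MCP setups are the concern.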
Design Patterns for Securing LLM Agents against Prompt Injections is a new paper with authors from IBM, Google and more providing a very sober and informative review of patterns that can help protect against this class of attacks. I particularly appreciated their succinct definition of the core problem:
The design patterns we propose share a common guiding principle: once an LLM agent has ingested untrusted input, it must be constrained so that it is impossible for that input to trigger any consequential actions—that is, actions with negative side effects on the system or its environment.
Google Research also put out An Introduction to Google’s Approach to AI Agent Security, another relevant paper to this topic.
Tools I'm using at the moment
A few people suggested that they would appreciate an update on the tools I'm using at the moment. I plan to include this in every one of these monthly newsletters going forward.
- Claude 4 Sonnet is my default for code and most general LLM tasks, accessed via Claude.ai. I rarely upgrade to Opus.
- o3 or o4-mini-high for anything that benefits from one or more searches, run via ChatGPT
- I've been experimenting with both Gemini CLI and Claude Code as terminal agents. Claude Code is my default here.
- GPT-4o advanced voice mode via the ChatGPT voice app for hands-free use (walking the dog, cooking)
- Gemini 2.5 Pro via my LLM CLI tool for long-context work, e.g. questions about a large codebase
- I've been switching to Zed from VS Code because it is so much faster to open, and uses less RAM. I still use VS Code with GitHub Copilot a bunch as well.
- Current favourite local model: Mistral Small 3.2 (via Ollama), but Gemma 3n is a very close second.
That's it for June!
Please reply with any feedback on how I can do this better.
If you found this newsletter useful, feel free to forward it to friends who might enjoy it too, especially if they might be convinced to sign up and sponsor the next one!
Thanks for your support,
Simon Willison https://simonwillison.net/
(I'm also now available for consulting calls over Zoom or similar, you can contact me at contact@simonwillison.net - plus I'm offering private presentations of my PyCon US workshop Building software on top of Large Language Models and a new workshop on Writing code with LLMs.)