🦄 ai that works

a weekly conversation about how we can all get the most juice out of today's models with @hellovai & @dexhorthy

📅 event calendar - https://lu.ma/baml

Discord - https://boundaryml.com/discord

every tuesday at 10 am pst on zoom.

1 hour of live code, q&a with some prepped content to help you take your ai app from a demo to production.

let's code together 🧑‍💻

pre-reading

to avoid repeating the basics, we recommend you come in already understanding some of the tooling we will be using:

- zoom
- cursor (a vscode alternative)
- programming languages
  - application logic: python or typescript or go (depends on session)
  - prompting: baml
    - repo: github.com/boundaryml/baml
    - recommended: the getting started video
  - package managers of choice:
    - python: uv
    - typescript: pnpm

sessions

topic	description
2025-07-29 #16: Evaluating Prompts Across Models RSVP	AI That Works #16 will be a super-practical deep dive into real-world examples and techniques for evaluating a single prompt against multiple models. While this is a commonly cited use case for evals, e.g. "how do we know if the new model is better" / "how do we know if the new model breaks anything", there aren't many practical examples out there for real-world use cases.
2025-07-22 #15: PDFs, Multimodality, Vision Models youtube • code • PAST	Dive deep into practical PDF processing techniques for AI applications. We'll explore how to extract, parse, and leverage PDF content effectively in your AI workflows, tackling common challenges like layout preservation, table extraction, and multi-modal content handling.
2025-07-15 #14: Implementing Decaying-Resolution Memory youtube • code • PAST	Last week on #13, we did a conceptual deep dive on context engineering and memory. This week, we're going to jump right into the weeds and implement a version of Decaying-Resolution Memory (DRM) that you can pick up and apply to your AI agents today. For this episode, you'll probably want to check out episode #13 in the session listing to get caught up on DRM and why it's worth building from scratch.
2025-07-08 #13: Building AI with Memory & Context youtube • code • PAST	How do we build agents that can remember past conversations and learn over time? We'll explore memory and context engineering techniques to create AI systems that maintain state across interactions.
2025-07-01 #12: Boosting AI Output Quality youtube • code • PAST	This week's session was a bit meta! We explored "Boosting AI Output Quality" by building the very AI pipeline that generated this email from our Zoom recording. The real breakthrough: separating extraction from polishing for high-quality AI generation.
2025-06-24 #11: Building an AI Content Pipeline youtube • code • PAST	Content creation involves a lot of manual work - uploading videos, sending emails, and other follow-up tasks that are easy to drop. We'll build an agent that integrates YouTube, email, GitHub and human-in-the-loop to fully automate the AI that Works content pipeline, handling all the repetitive work while maintaining quality.
2025-06-17 #10: Entity Resolution: Extraction, Deduping, and Enriching youtube • code • PAST	Disambiguating many ways of naming the same thing (companies, skills, etc.) - from entity extraction to resolution to deduping. We'll explore breaking problems into extraction → resolution → enrichment stages, scaling with two-stage designs, and building async workflows with human-in-loop patterns for production entity resolution systems.
2025-06-10 #9: Cracking the Prompting Interview youtube • code • PAST	Ready to level up your prompting skills? Join us for a deep dive into advanced prompting techniques that separate good prompt engineers from great ones. We'll cover systematic prompt design, testing tools / inner loops, and tackle real-world prompting challenges. Perfect prep for becoming a more effective AI engineer.
2025-06-03 #8: Humans as Tools: Async Agents and Durable Execution youtube • code • PAST	Agents are great, but for the most accuracy-sensitive scenarios, we sometimes want a human in the loop. Today we'll discuss techniques for making this possible. We'll dive deep into concepts from our 4/22 session on 12-factor agents and extend them to handle asynchronous operations where agents need to contact humans for help, feedback, or approvals across a variety of channels.
2025-05-27 #7: 12-factor agents: selecting from thousands of MCP tools youtube • code • PAST	MCP is only as great as your ability to pick the right tools. We'll show how to leverage MCP servers and accurately pick the right ones when only a few have tools that are actually relevant.
2025-05-20 #6: Policy to Prompt: Evaluating w/ the Enron Emails Dataset youtube • code • PAST	one of the most common problems in AI engineering is looking at a set of policies / rules and evaluating evidence to determine if the rules were followed. In this session we'll explore turning policies into prompts and pipelines to evaluate which emails in the massive Enron email dataset violated SEC and Sarbanes-Oxley regulations.
2025-05-13 #5: evals evals evals youtube • code • RSVP	stay tuned for our season 2 kickoff topic on minimalist and high-performance testing/evals for LLM applications
Break	We had a great time doing the first four episodes of AI that Works - we'll see y'all may 13th for season 2!
2025-04-22 #4: twelve factor agents youtube • code • PAST	learn how to build production-ready AI agents using the twelve-factor methodology. we'll cover the core concepts and build a real agent from scratch.
2025-04-15 #3: code generation with small models youtube • code • PAST	large models can do a lot, but so can small models. we'll discuss techniques for how to leverage extremely small models for generating diffs and making changes in complete codebases.
2025-04-08 #2: reasoning models vs reasoning prompts youtube • code • PAST	models can reason but you can also reason within a prompt. which technique wins out when and why? we'll find out by adding reasoning to a chat bot that generates complex cypher/sql queries.
2025-03-31 #1: large scale classification youtube • code • PAST	llms are great at classification from 5, 10, maybe even 50 categories. but how do we deal with situations when we have over 1000? perhaps it's an ever-changing list of categories?
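
to give a flavor of the kind of thing we build live, here is a rough python sketch of the idea behind session #16: running one prompt against several models and scoring each on the same cases. everything in it is hypothetical and for illustration only: `model_a` and `model_b` are stub functions standing in for real model calls, and the prompt template and test cases are made up. it is not taken from the session code.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins for real model clients.
# Each takes a rendered prompt and returns the model's text answer.
def model_a(prompt: str) -> str:
    return "positive" if "love" in prompt else "negative"

def model_b(prompt: str) -> str:
    # A deliberately naive "model" that always gives the same answer.
    return "positive"

# One prompt template, shared by every model under test.
PROMPT_TEMPLATE = "Classify the sentiment of this review as positive or negative: {text}"

# (input text, expected label) pairs; invented for this example.
CASES: List[Tuple[str, str]] = [
    ("i love this product", "positive"),
    ("this broke after one day", "negative"),
    ("love it, would buy again", "positive"),
    ("terrible support", "negative"),
]

def evaluate(
    models: Dict[str, Callable[[str], str]],
    template: str,
    cases: List[Tuple[str, str]],
) -> Dict[str, float]:
    """Return the fraction of cases each model answers correctly."""
    scores: Dict[str, float] = {}
    for name, call in models.items():
        passed = 0
        for text, expected in cases:
            answer = call(template.format(text=text)).strip().lower()
            if answer == expected:
                passed += 1
        scores[name] = passed / len(cases)
    return scores

if __name__ == "__main__":
    results = evaluate({"model_a": model_a, "model_b": model_b}, PROMPT_TEMPLATE, CASES)
    for name, score in sorted(results.items()):
        print(f"{name}: {score:.0%}")
```

in a real setup the stubs would be replaced by actual model calls, and exact-match scoring would likely give way to checks suited to the task. keeping the prompt and cases fixed while only the model varies is what makes the scores comparable.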
