claude-3.7-sonnet with thinking, refs #14 #15

simonw · 2025-02-24T19:34:26Z

Refs:

Support Claude 3.7 Sonnet, including its new thinking mode #14

Needed:

Design for the options - thinking, thinking_budget, show_thinking and thinking_delimiter
Sync streaming mode
Actually ditch show_thinking and thinking_delimiter for the moment - see Support Claude 3.7 Sonnet, including its new thinking mode #14 (comment)
Sync no-streaming mode
Async streaming mode
Async no-streaming mode
Tests
Docs, including Python docs on how to access thinking tokens via JSON
"Include the beta header output-128k-2025-02-19 in your API request to increase the maximum output token length to 128k tokens for Claude 3.7 Sonnet" from https://docs.anthropic.com/en/docs/about-claude/models/all-models#model-comparison-table

simonw · 2025-02-24T20:28:23Z

Published some initial notes on this here, including the all-important pelicans riding bicycles: https://simonwillison.net/2025/Feb/24/claude-37-sonnet-and-claude-code/

simonw · 2025-02-24T20:29:00Z

Maybe I add output-128k-2025-02-19 if the user specifies -o max_tokens X and X is greater than 8000?

It would be nice to have a -o max_tokens max option too.

simonw · 2025-02-25T01:10:50Z

From here's a challenging detail: https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking

In multi-turn conversations, only thinking blocks associated with a tool use session or assistant turn in the last message position are visible to Claude and are billed as input tokens; thinking blocks associated with earlier assistant messages are not visible to Claude during sampling and do not get billed as input tokens.

It's important NOT to store thinking tokens in the response column in the database that's used to construct the next message, since we don't actually want to round-trip those tokens as part of a conversation.

(I mean maybe we do in special cases, but I'll leave that for LLM users to decide.)

simonw · 2025-02-25T01:11:52Z

... this means my current design isn't quite right. Having a -o show_thinking 1 option which causes the thinking tokens to be output as if they were text tokens means the rest of LLM will get confused and long them incorrectly.

I may need to ship a change to core to support this plugin after all.

simonw · 2025-02-25T01:13:28Z

Here's an explanation of that signature field:

It is only strictly necessary to send back thinking blocks when using tool use with extended thinking. Otherwise you can omit thinking blocks from previous turns, or let the API strip them for you if you pass them back.

simonw · 2025-02-25T01:15:29Z

And I need to consider this edge-case (I wonder what it looks like in the streaming chunks API?):

Occasionally Claude's internal reasoning will be flagged by our safety systems. When this occurs, we encrypt some or all of the thinking block and return it to you as a redacted_thinking block. These redacted thinking blocks are decrypted when passed back to the API, allowing Claude to continue its response without losing context.

It does at least provide this example:

{
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me analyze this step by step...",
      "signature": "WaUjzkypQ2mUEVM36O2TxuC06KN8xyfbJwyem2dw3URve/op91XWHOEBLLqIOMfFG/UvLEczmEsUjavL...."
    },
    {
      "type": "redacted_thinking",
      "data": "EmwKAhgBEgy3va3pzix/LafPsn4aDFIT2Xlxh0L5L8rLVyIwxtE3rAFBa8cr3qpP..."
    },
    {
      "type": "text",
      "text": "Based on my analysis..."
    }
  ]
}

simonw · 2025-02-25T01:20:26Z

I ran this, after adding code to pprint(chunk) for the streaming API:

llm -m claude-3.7-sonnet -o show_thinking 1 'ANTHROPIC_MAGIC_STRING_TRIGGER_REDACTED_THINKING_46C9A13E193C177646C7398A98432ECCCE4C1253D5E2D82641AC0E52CC2876CB'

And got this: https://gist.github.com/simonw/730a9217d981f0e5c85b4e42b1bf983e

Truncated:

{'message': {'content': [],
             'id': 'msg_01SYnm2Viwe6oNwSENaH9b63',
             'model': 'claude-3-7-sonnet-20250219',
             'role': 'assistant',
             'stop_reason': None,
             'stop_sequence': None,
             'type': 'message',
             'usage': {'cache_creation_input_tokens': 0,
                       'cache_read_input_tokens': 0,
                       'input_tokens': 92,
                       'output_tokens': 2}},
 'type': 'message_start'}
{'content_block': {'data': 'Eu0BCoYBGAIiQMuYgm7sHFe8J3ZAfOyGFpDsYykC7tAUCL89kxH0XKS49Q1D8KN7YE3VdgVE+pBGVQZvSb8P+MNVvKJo6xfp+a8qQLoL/lA1w6rZDUAe0HF1ahXUTb5tvbgBEt3n8lP9/b4Am1CxXzk+MJ7xyiKYYViNGMRyHtNuHpjQeb8jqruasBgSDGUZqtFFfSl9l+kp1hoM38iaGEBZSdTxha7ZIjDgv8QJwwBS/p3tMPTWrxZ/6dRVzyzJAVb9wRmgAi5DjKy/qBRQirEO09zkioKSlYkqFJQYmVkPRZdNrobr/BIb21CSruJ0',
                   'type': 'redacted_thinking'},
 'index': 0,
 'type': 'content_block_start'}
{'content_block': {'data': 'Eu0BCoYBGAIiQMuYgm7sHFe8J3ZAfOyGFpDsYykC7tAUCL89kxH0XKS49Q1D8KN7YE3VdgVE+pBGVQZvSb8P+MNVvKJo6xfp+a8qQLoL/lA1w6rZDUAe0HF1ahXUTb5tvbgBEt3n8lP9/b4Am1CxXzk+MJ7xyiKYYViNGMRyHtNuHpjQeb8jqruasBgSDGUZqtFFfSl9l+kp1hoM38iaGEBZSdTxha7ZIjDgv8QJwwBS/p3tMPTWrxZ/6dRVzyzJAVb9wRmgAi5DjKy/qBRQirEO09zkioKSlYkqFJQYmVkPRZdNrobr/BIb21CSruJ0',
                   'type': 'redacted_thinking'},
 'index': 0,
 'type': 'content_block_stop'}
{'content_block': {'data': 'EuUCCoYBGAIiQOkS6N2nZQOxRsm74E7XiLOWrCTXwyLEZKAo3i36YlMVfY0GNQPbUW8oMxQA6b3EEdJA68HBhsX7kcjbXiO8DHAqQFDeq8/O09hyaB2CX/5YyryfFVl0dYnMsmk3scqmqg9tH89i0wIki53eLoanDhAk7HN+tdbdp3V1Ts4eu42jus4SDOyd51dPT15lbnLIkBoMxsO7tto6x3oVXbzMIjBLA/RMRAV0HGD/YdLbktB6FmzJFTf2iJzOs4z8HBGX/l/VTGJlxA/y9Fve8SFtKnIqiwFTF+5HtHBvGOE77P3gR3Hu9JfPqmN/9S2BdbCWf8mu1sfSCd0oKtSR2mkMADBZp+3cYvcnjE77rtHfkhj8qd0KAG76OwXxv6Fmo9V8Od8Cwb+/ETc5P6kSFyaOaEvnlTqYr+jI0+5M16igvfkaLFxdGurbONMc5eh6Q40xCsAPltULFBwDsziwfrv0',
                   'type': 'redacted_thinking'},
 'index': 1,
 'type': 'content_block_start'}
{'content_block': {'data': 'EuUCCoYBGAIiQOkS6N2nZQOxRsm74E7XiLOWrCTXwyLEZKAo3i36YlMVfY0GNQPbUW8oMxQA6b3EEdJA68HBhsX7kcjbXiO8DHAqQFDeq8/O09hyaB2CX/5YyryfFVl0dYnMsmk3scqmqg9tH89i0wIki53eLoanDhAk7HN+tdbdp3V1Ts4eu42jus4SDOyd51dPT15lbnLIkBoMxsO7tto6x3oVXbzMIjBLA/RMRAV0HGD/YdLbktB6FmzJFTf2iJzOs4z8HBGX/l/VTGJlxA/y9Fve8SFtKnIqiwFTF+5HtHBvGOE77P3gR3Hu9JfPqmN/9S2BdbCWf8mu1sfSCd0oKtSR2mkMADBZp+3cYvcnjE77rtHfkhj8qd0KAG76OwXxv6Fmo9V8Od8Cwb+/ETc5P6kSFyaOaEvnlTqYr+jI0+5M16igvfkaLFxdGurbONMc5eh6Q40xCsAPltULFBwDsziwfrv0',
                   'type': 'redacted_thinking'},
 'index': 1,
 'type': 'content_block_stop'}
...

simonw · 2025-02-25T01:21:59Z

More important tips - looks like preserving those thinking blocks really is critical:

When passing thinking and redacted_thinking blocks back to the API in a multi-turn conversation, you must include the complete unmodified block back to the API for the last assistant turn.

This is critical for maintaining the model's reasoning flow. We suggest always passing back all thinking blocks to the API. For more details, see the Preserving thinking blocks section below.

simonw · 2025-02-25T01:23:38Z

These redacted blocks are not human readable, so it doesn't make sense to output them in the LLM CLI even if the user has requested seeing them.

Anthropic suggest you could tell people "Some of Claude’s internal reasoning has been automatically encrypted for safety reasons" but I'd rather not do that in the default CLI output. Users can see that in llm logs -c --json if they really need to.

simonw · 2025-02-25T01:26:14Z

More edge-cases:

Thinking isn't compatible with temperature, top_p, or top_k modifications as well as forced tool use.

You cannot pre-fill responses when thinking is enabled.

I could enforce these in LLM options checking code but I think I'll let the Claude API complain about them with an error instead, that way if those limitations change in the future I won't have to update the plugin.

simonw · 2025-02-25T06:43:45Z

Abandoning this in favor of the much simpler:

Simpler implementation of 3.7 Sonnet #18

I'll come back to visible thinking tokens later on, probably after I make some changes to LLM core.

WIP claude-3.7-sonnet with thinking, refs #14

5a8a3e3

simonw added the enhancement New feature or request label Feb 24, 2025

simonw mentioned this pull request Feb 24, 2025

Support Claude 3.7 Sonnet, including its new thinking mode #14

Closed

simonw added 4 commits February 24, 2025 11:36

Get other models working again

adaf7f9

Claude 3.7 is 8000 output tokens

74cad48

Ran black

6bd4c27

Reuse thought streaming logic for both sync and async

e7eab79

simonw mentioned this pull request Feb 25, 2025

Every response logs the max_tokens and temperature #16

Closed

No need to use extra_body after the Anthropic library upgrade

d85ed44

simonw closed this Feb 25, 2025

simonw mentioned this pull request Feb 25, 2025

Mechanism for visible thinking tokens simonw/llm#770

Open

RKeelan mentioned this pull request Mar 29, 2025

Add Response.thoughts() simonw/llm#867

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

claude-3.7-sonnet with thinking, refs #14 #15

claude-3.7-sonnet with thinking, refs #14 #15

Uh oh!

simonw commented Feb 24, 2025 •

edited

Loading

Uh oh!

simonw commented Feb 24, 2025

Uh oh!

simonw commented Feb 24, 2025 •

edited

Loading

Uh oh!

simonw commented Feb 25, 2025

Uh oh!

simonw commented Feb 25, 2025

Uh oh!

simonw commented Feb 25, 2025

Uh oh!

simonw commented Feb 25, 2025 •

edited

Loading

Uh oh!

simonw commented Feb 25, 2025

Uh oh!

simonw commented Feb 25, 2025

Uh oh!

simonw commented Feb 25, 2025

Uh oh!

simonw commented Feb 25, 2025

Uh oh!

simonw commented Feb 25, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

claude-3.7-sonnet with thinking, refs #14 #15

claude-3.7-sonnet with thinking, refs #14 #15

Uh oh!

Conversation

simonw commented Feb 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

simonw commented Feb 24, 2025

Uh oh!

simonw commented Feb 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

simonw commented Feb 25, 2025

Uh oh!

simonw commented Feb 25, 2025

Uh oh!

simonw commented Feb 25, 2025

Uh oh!

simonw commented Feb 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

simonw commented Feb 25, 2025

Uh oh!

simonw commented Feb 25, 2025

Uh oh!

simonw commented Feb 25, 2025

Uh oh!

simonw commented Feb 25, 2025

Uh oh!

simonw commented Feb 25, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

simonw commented Feb 24, 2025 •

edited

Loading

simonw commented Feb 24, 2025 •

edited

Loading

simonw commented Feb 25, 2025 •

edited

Loading