Migrate from Claude 3.5 Sonnet to Claude 4 Sonnet on Amazon Bedrock

TL;DR

Claude 4 Sonnet is now available on Amazon Bedrock, with a deprecation timeline announced for Claude 3.5 Sonnet (v1 and v2).
Migration requires careful planning: choose between the InvokeModel API or the unified Converse API, and consider Cross-Region Inference (CRIS) to improve throughput.
Extended thinking and interleaved thinking are available options, but come with cost and latency considerations; use them strategically.
Validate performance with a custom regression suite, and deploy using a phased rollout (shadow testing, canary, or blue/green) to protect production.
Review model prompts, guardrails, and CI/CD evaluation pipelines before production adoption.

Context and background

This post, co-written with Gareth Jones from Anthropic, notes that Anthropic’s Claude 4 Sonnet has launched on Amazon Bedrock, presenting a notable step forward in foundation model capabilities. The ongoing deprecation timeline for Claude 3.5 Sonnet (v1 and v2) creates a dual imperative for production AI applications: to harness enhanced performance while migrating before service deprecation. The broader takeaway is that model migrations should be treated as a core component of AI inference strategy, since poor execution can lead to service disruption, performance regressions, and cost overruns. The article provides a systematic approach to migrate from Claude 3.5 Sonnet to Claude 4 Sonnet on Bedrock, including model differences, migration considerations, and best practices designed to turn the upgrade into measurable value for organizations. Understanding the model changes is the first step in planning a successful migration. The Claude Sonnet 4 version introduces capability and behavioral shifts that can be leveraged in production. For a detailed comparison, refer to the Complete Model Comparison Guide referenced in the post. A successful migration also requires attention to technical and strategic considerations to minimize risk and accelerate production deployment. Before using Claude 4 Sonnet on Bedrock, access to the model must be enabled in the Bedrock console. Review and accept the model’s End User License Agreement (EULA) as part of the access request. Availability can vary by AWS Region, so verify Region support in the Model support by AWS Region in Amazon Bedrock and related guidance. Cross-Region Inference (CRIS) is supported by selecting an inference profile to improve throughput and resource availability across regions.

What’s new

The migration introduces several changes you can take advantage of:

Claude 4 Sonnet on Bedrock demonstrates improved instruction following and precision in alignment with model-specific best practices. Prompts that worked well on Claude 3.5 may require adaptation for Claude 4, and users should consult Claude 4 prompt engineering guidance.
Claude 4 Sonnet is designed to follow instructions more precisely and may be less verbose unless explicitly prompted to elaborate. This can affect the perceived style of responses and may necessitate adjustments to system prompts and persona definitions.
Prompts commonly benefit from XML-like structure to clearly separate input sections, as recommended by Claude 4 prompt engineering practices. This helps ensure reliable results under stricter instruction adherence.
Extended thinking is a built-in capability for Claude 4 Sonnet. You can enable deep, multi-step reasoning by including the thinking keyword argument in your API call. Reasoning tokens are billed as output tokens at standard model rates, and the total thinking process is charged in full, not just the visible summary.
To enable extended thinking, use the Converse API and set additionalModelRequestFields with the thinking configuration, including budget_tokens for the maximum thinking tokens. The maxTokens value should be larger than budget_tokens for extended thinking.
Interleaved thinking for tool calls is available by adding the Anthropic beta parameter interleaved-thinking-2025-05-14 to additionalModelRequestFields in the Converse API. This enables intermediate reasoning between tool calls for more nuanced results.
When migrating, you can continue to use the InvokeModel API with a simple modelId update if you prefer a straightforward transition; however, the Converse API provides a standardized request/response format that will ease future migrations across models or providers.
A model’s safety profile changes with each version. Test the new model with your production guardrails configured identically to your current setup, and plan a phased rollout to minimize risk.

Why it matters (impact for developers/enterprises)

For developers and enterprises, migrating to Claude 4 Sonnet offers the potential for improved accuracy, better alignment with instructions, and more robust tool-assisted reasoning. However, the upgrade also introduces cost and latency considerations due to extended thinking and larger thinking budgets. Organizations should adopt a structured migration approach to protect production continuity, engage in early benchmarking against task-specific datasets, and ensure guardrails align with the new model behavior. The migration is framed as an engineering project: plan, test, and automate evaluation, then validate with production-like traffic. A curated prompt suite, integrated into CI/CD workflows, helps ensure regression coverage as model and prompt changes roll out. Bedrock evaluations and open-source evaluation frameworks (e.g., RAGAS, DeepEval) are mentioned as tooling options to support automated assessments.

Technical details or Implementation

Access and availability

Before use, enable access to Claude 4 Sonnet in your Amazon Bedrock account. Review and accept the EULA during the access request.
Confirm Claude 4 Sonnet is available in your target AWS Region, since model support can vary by location. For current Region availability, consult the AWS Bedrock region guidance and the model support lists.
Cross-Region Inference (CRIS) can be used to improve throughput and resource availability by specifying an inference profile in a source or target region. APIs for migration
InvokeModel API: a straightforward migration path where you update the modelId in your existing code while preserving the Messages API structure. If you use a CRIS profile, specify the correct inference profile ID in the source region (for example, us.anthropic.claude-sonnet-4-20250514-v1:0).
Converse API: a recommended path to standardize request/response formats across models and providers, making future migrations easier. You can switch to Converse while migrating, and implement extended thinking and advanced configurations through the additionalModelRequestFields parameter.
When using CRIS, ensure you specify the appropriate inference profile in your source region to optimize throughput. Extended thinking and tool use
Extended thinking enables deep, multi-step reasoning. To enable it, pass the thinking configuration via additionalModelRequestFields and set budget_tokens to limit the number of thinking tokens. maxTokens must be larger than budget_tokens.
Extended thinking increases costs because reasoning tokens are billed as output tokens; it can also affect streaming response latency.
For many tasks, a well-crafted chain-of-thought prompt remains an efficient approach. When deep reasoning is not required, disable extended thinking to optimize latency and cost.
Interleaved thinking for tool calls allows intermediate reasoning between tool results. Enable this by adding the anthopic_beta parameter with interleaved-thinking-2025-05-14 to additionalModelRequestFields in the Converse API request. Prompt design and evaluation
Do not assume that prompts from Claude 3.5 will work as-is for Claude 4; adopt model-specific best practices and consider structured prompts with explicit sections and XML-like tags for clarity.
Build a curated set of prompts and expected outputs representative of production traffic. Integrate this dataset into your CI/CD evaluation pipeline and use Bedrock evaluations or open-source evaluation frameworks (RAGAS, DeepEval) to measure performance and guardrails.
A model’s safety profile changes with version updates. Validate guardrails and safety configurations in tandem with functional tests; do not test the new model in isolation. rollout and risk management
Use a phased rollout strategy to minimize risk: shadow testing with mirrored traffic, followed by A/B testing to measure business KPIs.
For actual deployment, consider canary releases (gradually expose a small user fraction) or blue/green deployments that preserve parallel environments for quick rollback. Benchmarking and CI/CD
Create a production-representative benchmark suite and integrate it into your CI/CD pipeline. Automated evaluations help track regressions across model and prompt changes. Notes on the content and collaboration
This migration guidance is co-authored by Melanie Li, PhD, AWS Senior Generative AI Specialist Solutions Architect, and Deepak Dalakoti, PhD, AWS Deep Learning Architect, with inputs from Anthropic.
For more details, refer to the AWS blog post: https://aws.amazon.com/blogs/machine-learning/migrate-from-anthropics-claude-3-5-sonnet-to-claude-4-sonnet-on-amazon-bedrock/.

Key takeaways

Claude 4 Sonnet on Bedrock enables new capabilities but requires proactive planning and benchmarking before migration.
You can migrate using InvokeModel or Converse API; CRIS can optimize throughput across regions.
Extended thinking and interleaved thinking are powerful but cost- and latency-sensitive features; use intentionally.
Build automated evaluation pipelines and adopt phased rollout to protect production and measure business impact.
Align prompts, guardrails, and CI/CD tests with the new model to avoid regressions and ensure safety.

FAQ

What is the main purpose of this migration guide?

It provides a systematic approach to migrating from Claude 3.5 Sonnet to Claude 4 Sonnet on Amazon Bedrock, covering model differences, access, API choices, extended thinking, evaluation, and rollout.
Which APIs can I use for migration on Bedrock?

You can use either the model-specific InvokeModel API with a modelId update or the unified Converse API for a standardized request/response format. CRIS can be used with either path to improve throughput.
What is extended thinking and how should I use it?

Extended thinking enables deep, multi-step reasoning by configuring thinking in the API call. It incurs additional costs because thinking tokens are billed as output tokens, and can impact response time. It should be used for tasks requiring deep analysis and not for simple queries.
How should I approach rollout and testing?

Implement a phased rollout with shadow testing, canary or blue/green deployment, and A/B testing to measure business KPIs. Validate the new model against production guardrails and use automated evaluation pipelines.
What practical prompts guidance exists for Claude 4 Sonnet?

Claude 4 Sonnet tends to follow instructions more precisely and may be less verbose unless prompted otherwise; prompts may benefit from an XML-like structure to clearly separate input parts.