How Amazon scaled Rufus by building multi-node inference using AWS Trainium chips and vLLM
Source: https://aws.amazon.com/blogs/machine-learning/how-amazon-scaled-rufus-by-building-multi-node-inference-using-aws-trainium-chips-and-vllm/

Seeded from: AWS ML Blog

In this post, Amazon shares how it developed a multi-node inference solution for Rufus, its generative AI shopping assistant, using AWS Trainium chips and vLLM to serve large language models at scale. The solution combines a leader/follower orchestration model with hybrid parallelism strategies.

Read more: https://aws.amazon.com/blogs/machine-learning/how-amazon-scaled-rufus-by-building-multi-node-inference-using-aws-trainium-chips-and-vllm/
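The leader/follower pattern mentioned above can be illustrated with a minimal, single-process sketch: a leader node accepts requests, broadcasts them to follower workers that each hold one shard of the model, and merges the partial results. All class names here are hypothetical, and the "forward pass" is a stand-in arithmetic step; a real deployment would run sharded model replicas across Trainium nodes under vLLM's distributed executor rather than in-process Python objects.

```python
# Hypothetical sketch of leader/follower orchestration for sharded inference.
# Not Amazon's implementation -- a single-process illustration of the pattern.

class Follower:
    """Holds one tensor-parallel shard of the model weights."""

    def __init__(self, rank: int, num_shards: int):
        self.rank = rank
        self.num_shards = num_shards

    def run_shard(self, token_ids: list[int]) -> list[float]:
        # Stand-in for a forward pass over this shard: each follower
        # produces a partial result the leader will combine.
        return [t / self.num_shards for t in token_ids]


class Leader:
    """Schedules requests, broadcasts work, and merges partial results."""

    def __init__(self, followers: list[Follower]):
        self.followers = followers

    def infer(self, token_ids: list[int]) -> list[float]:
        # Broadcast the same batch to every follower (tensor parallelism
        # replicates activations while sharding weights) ...
        partials = [f.run_shard(token_ids) for f in self.followers]
        # ... then combine the partial outputs (an all-reduce sum here).
        return [sum(vals) for vals in zip(*partials)]


followers = [Follower(rank, num_shards=4) for rank in range(4)]
leader = Leader(followers)
print(leader.infer([8, 12, 20]))  # summed partials reconstruct the full result
```

In a hybrid-parallel setup, this pattern composes: followers within a node shard each layer's weights (tensor parallelism), while groups of nodes can additionally split the model by layers (pipeline parallelism), with the leader coordinating both levels.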
