How Amazon scaled Rufus by building multi-node inference using AWS Trainium chips and vLLM
Seeded from: AWS ML Blog
In this post, Amazon shares how they developed a multi-node inference solution for Rufus, their generative AI shopping assistant, using Amazon Trainium chips and vLLM to serve large language models at scale. The solution combines a leader/follower orchestration model, hybrid parallelism strategies,
Read more: https://aws.amazon.com/blogs/machine-learning/how-amazon-scaled-rufus-by-building-multi-node-inference-using-aws-trainium-chips-and-vllm/
More news
Anthropic tightens Claude's usage rules in the face of a more dangerous AI landscape
Anthropic bans assistance in developing CBRN weapons and high-yield explosives, adds cybersecurity restrictions, adjusts its policy on political content, and clarifies its high-risk requirements.
Build a scalable containerized web application on AWS using the MERN stack with Amazon Q Developer – Part 1
In a traditional SDLC, a lot of time is spent in the different phases researching approaches that can deliver on requirements: iterating over design changes, writing, testing and reviewing code, and configuring infrastructure. In this post, you learned about the experience and saw productivity gains
Building a RAG chat-based assistant on Amazon EKS Auto Mode and NVIDIA NIMs
In this post, we demonstrate the implementation of a practical RAG chat-based assistant using a comprehensive stack of modern technologies. The solution uses NVIDIA NIMs for both LLM inference and text embedding services, with the NIM Operator handling their deployment and management. The architectu
GPT-5 fell short of expectations but improved cost, speed, and coding ability
The GPT-5 launch drew mixed reactions: incremental benchmark gains, lower cost and latency, and better coding performance, but also criticism of its writing tone and unexpected errors.
Introducing Amazon Bedrock AgentCore Gateway: Transforming enterprise AI agent tool development
In this post, we discuss Amazon Bedrock AgentCore Gateway, a fully managed service that revolutionizes how enterprises connect AI agents with tools and services by providing a centralized tool server with a unified interface for agent-tool communication. The service offers key capabilities including S
Introducing Amazon Bedrock AgentCore Identity: Securing agentic AI at scale
In this post, we explore Amazon Bedrock AgentCore Identity, a comprehensive identity and access management service purpose-built for AI agents that enables secure access to AWS resources and third-party tools. The service provides robust identity management features including agent identity director