Introducing auto scaling on Amazon SageMaker HyperPod with Karpenter
Source: aws.amazon.com

Sources: https://aws.amazon.com/blogs/machine-learning/introducing-auto-scaling-on-amazon-sagemaker-hyperpod/, AWS ML Blog

TL;DR

  • SageMaker HyperPod now supports managed node auto scaling powered by Karpenter for its EKS clusters.
  • The solution is service-managed, removing the need to install or maintain Karpenter controllers yourself and enabling scale-to-zero behavior.
  • When integrated with continuous provisioning and optional KEDA, you get a two-tier auto scaling architecture that scales pods and underlying nodes to match real-time demand.
  • The approach emphasizes resilience, rapid response to traffic spikes, and cost efficiency for large-scale ML workloads.
  • Prerequisites include quotas, an IAM role for Karpenter, and enabling continuous provisioning in a SageMaker HyperPod EKS cluster.

Context and background

Amazon SageMaker HyperPod provides high-performance, resilient infrastructure tailored for large-scale ML workloads, covering both training and deployment at scale. As more customers move from training foundation models to serving them in production, they need GPU resources that adapt automatically to changing traffic while maintaining SLAs and controlling costs.

Karpenter is an open source Kubernetes node lifecycle manager created by AWS and a popular choice for cluster auto scaling, thanks to its ability to rapidly provision and de-provision nodes and to optimize capacity. Historically, customers running self-managed Karpenter bore the operational overhead of installing, configuring, and maintaining its components. SageMaker HyperPod now offers a managed Karpenter-based auto scaling experience that removes that heavy lifting while integrating tightly with HyperPod's resilience and observability features. A key capability in this release is scale-to-zero support: because the Karpenter controller is service-managed, clusters can release all compute when there is no demand, improving cost efficiency.

The combination of SageMaker HyperPod's purpose-built ML infrastructure with Karpenter's node lifecycle management delivers a resilient, scalable platform for large ML workloads. Customers such as Perplexity, HippocraticAI, H.AI, and Articul8 already use SageMaker HyperPod to train and deploy models, and as their traffic grows, automatic scaling becomes essential to meeting real production demand.

What’s new

This launch introduces a fully managed Karpenter-based auto scaling solution that is installed and maintained by SageMaker HyperPod. It turns static SageMaker HyperPod clusters into dynamic, cost-optimized infrastructure that scales with demand. The key components include:

  • A managed Karpenter deployment integrated with SageMaker HyperPod EKS clusters, eliminating the need for customer-side Karpenter setup and maintenance.
  • Continuous provisioning capabilities that allow HyperPod to automatically provision remaining capacity in the background while workloads start on available instances.
  • Automatic retries for provisioning failures due to capacity constraints, so scaling operations remain resilient and non-blocking.
  • HyperpodNodeClass customization, which maps to pre-created instance groups in SageMaker HyperPod. This defines constraints on instance types and Availability Zones to guide Karpenter’s auto scaling decisions.
  • NodePool configuration to specify hardware requirements and placement for pods, enabling precise control over which nodes are created and where they run.
  • Optional integration with Kubernetes Event-driven Autoscaling (KEDA) for pod-level autoscaling driven by metrics (e.g., CloudWatch, SQS, Prometheus, or observed utilization). Combined with Karpenter in SageMaker HyperPod, this yields a two-tier auto scaling architecture: KEDA scales pods in response to demand signals, and Karpenter provisions or removes nodes to match the evolving workload.
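In practice, node scaling is driven by pod pressure: a Deployment whose pods request GPUs will leave pods Pending until the autoscaler provisions matching nodes. A minimal sketch of such a Deployment, where the container image is a placeholder:

```yaml
# Minimal GPU Deployment sketch; the container image is a placeholder.
# Unschedulable pods from this Deployment are the signal Karpenter acts on.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inference
spec:
  replicas: 2
  selector:
    matchLabels:
      app: inference
  template:
    metadata:
      labels:
        app: inference
    spec:
      containers:
        - name: server
          image: "<your-inference-image>"
          resources:
            requests:
              nvidia.com/gpu: 1
            limits:
              nvidia.com/gpu: 1
```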

Why it matters (impact for developers/enterprises)

For inference workloads facing real-time traffic, auto scaling is critical to meeting latency SLAs while maintaining cost efficiency. The managed Karpenter approach reduces operational overhead, accelerates scaling decisions, and enables scale-to-zero to minimize idle resource usage. Developers and enterprises benefit from:

  • Faster reaction to traffic spikes and more stable response times during peak demand.
  • Reduced operational burden since SageMaker HyperPod manages Karpenter installation, configuration, and maintenance.
  • Improved cost efficiency by scaling to zero when idle and by aligning resources with actual workload requirements.
  • A resilient ML infrastructure with observability and tooling optimized for large-scale ML training and deployment.

Technical details and implementation

Prerequisites and initial setup

  • Verify quotas for the instance types you plan to create in the SageMaker HyperPod cluster via the AWS Service Quotas console (e.g., g5.12xlarge).
  • Create an AWS Identity and Access Management (IAM) role for HyperPod autoscaling with Karpenter, following the guidance to enable the required permissions.
  • Launch and configure your SageMaker HyperPod EKS cluster with continuous provisioning enabled at cluster creation. This setup provisions the necessary VPC, subnets, security groups, and EKS cluster, and installs the required operators. If you prefer, you can use an existing EKS cluster instead of creating a new one. The full setup typically takes about 20 minutes.
  • Ensure each InstanceGroup is restricted to a single Availability Zone by using OverrideVpcConfig and selecting one subnet per InstanceGroup.

Enabling Karpenter in your HyperPod cluster
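Enabling Karpenter amounts to an UpdateCluster call followed by a DescribeCluster check. A minimal Boto3 sketch; the AutoScaling field names here are assumptions based on the announcement, and the cluster name is a placeholder, so check the UpdateCluster API reference before use:

```python
# Build the UpdateCluster request that enables Karpenter-based autoscaling.
# NOTE: the "AutoScaling" field and its keys are assumptions from the launch
# announcement, not a verified schema; the cluster name is a placeholder.
def enable_karpenter_request(cluster_name: str) -> dict:
    return {
        "ClusterName": cluster_name,
        "AutoScaling": {"Mode": "Enable", "AutoScalerType": "Karpenter"},
    }

def main() -> None:
    import boto3  # requires AWS credentials configured locally

    sm = boto3.client("sagemaker")
    sm.update_cluster(**enable_karpenter_request("my-hyperpod-cluster"))
    # Verify the change took effect.
    cluster = sm.describe_cluster(ClusterName="my-hyperpod-cluster")
    print(cluster.get("AutoScaling"))

if __name__ == "__main__":
    # main()  # uncomment to run against your own account
    pass
```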
  • After cluster creation, update the cluster to enable Karpenter. This can be done through the AWS CLI (UpdateCluster API) or programmatically with Boto3 (Python). You would configure your AWS credentials and call the appropriate API to apply the changes, then verify Karpenter is enabled by describing the cluster (DescribeCluster API).
  • Verification steps include confirming that the cluster reports Karpenter as enabled and that the required Karpenter controllers are installed and running in the HyperPod EKS environment.

HyperpodNodeClass and NodePool configuration
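The two resources described in this section can be sketched together. The API groups, versions, and field names below are assumptions modeled on Karpenter's CRD conventions rather than the exact HyperPod schema, and the group, zone, and resource names are placeholders:

```yaml
# Illustrative only: API groups/versions and field names are assumptions;
# consult the HyperPod documentation for the exact schemas.
apiVersion: karpenter.sagemaker.amazonaws.com/v1
kind: HyperpodNodeClass
metadata:
  name: gpu-nodeclass
spec:
  instanceGroups:
    - inference-group          # pre-created HyperPod InstanceGroup name
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu-pool
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.sagemaker.amazonaws.com
        kind: HyperpodNodeClass
        name: gpu-nodeclass    # must match the HyperpodNodeClass above
      requirements:
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["ml.g6.xlarge"]
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["us-west-2a"]
```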
  • HyperpodNodeClass is a custom resource that maps to pre-created instance groups in SageMaker HyperPod. It defines constraints on which instance types and Availability Zones HyperPod’s Karpenter autoscaler may use when provisioning nodes.
  • To use HyperpodNodeClass, you reference the names of the InstanceGroups you want to source AWS compute resources from for node scaling. The HyperpodNodeClass name you specify propagates to the NodePool in the next step.
  • NodePool defines the hardware (instance types) and placement (Availability Zone) for pods. The NodePool acts as a bridge between the Karpenter controller and the underlying SageMaker HyperPod instance groups, ensuring that pods are scheduled on appropriate nodes.
  • A sample NodePool configuration demonstrates including an instance type such as ml.g6.xlarge and constraining placement to a single zone. Pods deployed via standard Kubernetes Deployments will trigger Karpenter to provision nodes as needed.

End-to-end auto scaling and observability
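A sketch of the KEDA side of the two-tier architecture, using KEDA's aws-cloudwatch scaler to drive pod counts from an ALB request metric. The Deployment name, target group, and thresholds are placeholders, not values from the source:

```yaml
# Sketch: scale inference pods on an ALB CloudWatch metric via KEDA.
# Deployment name, target group, and thresholds are placeholders.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: inference-scaler
spec:
  scaleTargetRef:
    name: inference-deployment        # Deployment to scale (placeholder)
  minReplicaCount: 0                  # scale to zero when idle
  maxReplicaCount: 10
  triggers:
    - type: aws-cloudwatch
      metadata:
        namespace: AWS/ApplicationELB
        metricName: RequestCountPerTarget
        dimensionName: TargetGroup
        dimensionValue: "<your-target-group>"   # placeholder
        targetMetricValue: "100"      # desired requests per pod
        minMetricValue: "0"
        awsRegion: us-west-2
```

When KEDA raises the replica count, the resulting Pending pods are what cause Karpenter to provision nodes; when it scales down, Karpenter removes the excess nodes.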
  • For a complete auto scaling solution, you can introduce Kubernetes Event-driven Autoscaling (KEDA). KEDA scales pods based on signals such as CloudWatch metrics, SQS queue lengths, Prometheus queries, and resource utilization patterns. When KEDA scales up pods, Karpenter provisions the necessary nodes; when KEDA scales down, Karpenter deprovisions excess nodes, ensuring the cluster maintains the exact resources required.
  • A sample KEDA ScaledObject spec can scale the number of inference pods based on CloudWatch metrics (e.g., ALB request count), illustrating the integration.

Managing resources and cleanup
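Cleanup is a single DeleteCluster call. A minimal Boto3 sketch, with a placeholder cluster name:

```python
# Build the DeleteCluster request; the cluster name is a placeholder.
def delete_cluster_request(cluster_name: str) -> dict:
    return {"ClusterName": cluster_name}

def main() -> None:
    import boto3  # requires AWS credentials configured locally

    # Permanently deletes the cluster and stops its ongoing charges.
    boto3.client("sagemaker").delete_cluster(
        **delete_cluster_request("my-hyperpod-cluster")
    )

if __name__ == "__main__":
    # main()  # uncomment to run against your own account
    pass
```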
  • If you want to prevent ongoing charges, delete your SageMaker HyperPod cluster when it is no longer needed.
  • With Karpenter enabled in your SageMaker HyperPod cluster, ML workloads can automatically adapt to changing requirements, optimize resource utilization, and help control costs by scaling precisely when needed. Integration with event-driven pod scalers like KEDA further enhances responsiveness to production traffic.

Implementation notes
  • The HyperPod-managed Karpenter auto scaling is designed to be resilient and non-blocking, retrying in the background when capacity constraints occur, so workload start times are not drastically impacted by provisioning delays.
  • The solution combines Karpenter’s proven node lifecycle management with the resilience and tooling of SageMaker HyperPod, crafted specifically for large ML workloads.

Key takeaways

  • SageMaker HyperPod now includes a managed Karpenter-based auto scaling capability for its EKS clusters.
  • The solution supports scale-to-zero, reducing idle compute costs while remaining ready to scale up on demand.
  • Continuous provisioning and background retries help maintain cluster scale without blocking workloads.
  • HyperpodNodeClass and NodePool provide fine-grained control over instance types, zones, and pod placement.
  • Optional integration with KEDA enables a two-tier autoscaling approach that scales both pods and nodes in response to real-time metrics.

FAQ

  • What does the new auto scaling with Karpenter do in SageMaker HyperPod?

    It provides a managed Karpenter-based auto scaling solution that scales SageMaker HyperPod clusters in response to demand, with scale-to-zero support and continuous provisioning.

  • What must I configure before enabling Karpenter auto scaling?

    You must verify quotas for the target instances, create an IAM role for HyperPod autoscaling with Karpenter, and enable continuous provisioning on the HyperPod EKS cluster.

  • How do I enable Karpenter on my HyperPod cluster?

    Update the cluster to enable Karpenter (via the AWS CLI UpdateCluster API or Boto3), then verify with DescribeCluster to confirm Karpenter is active.

  • What is HyperpodNodeClass and NodePool used for?

    HyperpodNodeClass maps to SageMaker HyperPod instance groups to constrain which instance types and zones are used for scaling; NodePool defines the hardware and placement for the pods.

  • Can I combine KEDA with Karpenter in HyperPod?

    Yes. KEDA scales pods based on metrics and signals, while Karpenter provisions or deletes nodes to match the updated pod demand.
