Create a private workforce on Amazon SageMaker Ground Truth with the AWS CDK
Source: aws.amazon.com


Sources: https://aws.amazon.com/blogs/machine-learning/create-a-private-workforce-on-amazon-sagemaker-ground-truth-with-the-aws-cdk/, AWS ML Blog

TL;DR

  • Provides a complete, code-driven solution to create private workforces on SageMaker Ground Truth paired with a dedicated Amazon Cognito user pool using the AWS CDK.
  • Addresses the mutual dependency between Cognito resources and the private workforce by using CloudFormation custom resources to orchestrate creation and configuration.
  • Deployments are managed as a single stack; cleanup can be performed via the CloudFormation console or by running cdk destroy with the same CDK options used for deployment.
  • The solution supports an end-to-end authentication flow for labeling workers: email invitation, initial registration, authentication, and login to the labeling portal.
  • It emphasizes best practices for Cognito and CDK, and points to further customization via AWS Professional Services and official data labeling guides.

Context and background

Private workforces for SageMaker Ground Truth and Amazon Augmented AI (Amazon A2I) enable organizations to build proprietary datasets while maintaining high standards of security and privacy. The AWS Management Console offers a quick way to create a private workforce, but many organizations require automated infrastructure as code (IaC) approaches to ensure consistent, repeatable deployments and to minimize human error. The post presents a complete solution that programmatically creates private workforces on SageMaker AI using the AWS CDK, including a dedicated, fully configured Cognito user pool. It is designed to address the key orchestration challenges of integrating Cognito with SageMaker private workforces, to provide a reusable, configurable base for ML labeling tasks, and to deliver a consistent login and labeling experience across environments. A GitHub repository accompanies the post, offering a customizable AWS CDK example that creates and manages the private workforce and its Cognito resources. The overarching goal is an IaC-ready setup that accelerates private labeling at scale.

The architecture relies on a single stack that provisions multiple resources and services. Some components are essential for the initial setup, while others serve the ongoing private workforce that workers use to log in and complete labeling tasks. A central theme is careful sequencing: certain Cognito parameters, such as the app client callback URL, only become available after the private workforce exists, yet creating the workforce itself requires the app client to be present. The post resolves this mutual dependency by introducing CloudFormation custom resources that create and configure the resources in the correct order.
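The sequencing problem can be pictured as a dependency graph. The following minimal Python sketch uses the standard library's graphlib with hypothetical node names (not taken from the post's repository). In the naive graph the app client and the workforce wait on each other, so no valid creation order exists; splitting the callback-URL update into a separate final step, which is the role the post assigns to a custom resource, makes the graph acyclic.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical resource names; each key maps to the resources it must wait for.
NAIVE = {
    "UserPool": set(),
    "AppClient": {"UserPool", "Workforce"},   # wants the workforce's callback URL
    "Workforce": {"UserPool", "AppClient"},   # wants an existing app client
}

WITH_CUSTOM_RESOURCE = {
    "UserPool": set(),
    "AppClient": {"UserPool"},                # created without the final callback URL
    "Workforce": {"UserPool", "AppClient"},
    "UpdateAppClientCallback": {"Workforce"}, # custom resource patches the URL afterwards
}


def creation_order(graph):
    """Return a valid creation order, or None if the graph contains a cycle."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None
```

With the extra step, the only valid order is user pool, app client, workforce, and finally the callback-URL update.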
From a deployment perspective, the solution also emphasizes the importance of a consistent Cognito user pool domain name across deployments, since changing the domain name after creation is not straightforward and can lead to deployment errors. By combining AWS CDK constructs with CloudFormation custom resources, the approach orchestrates the lifecycle of the Cognito pool, app client, and SageMaker private workforce to ensure a correct, repeatable setup that supports downstream labeling operations.
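One way to keep the domain name stable is to derive it deterministically from inputs that do not change between deployments, rather than generating it randomly. The following Python sketch is a hypothetical helper, not part of the post's solution. The validation rules it encodes (lowercase letters, digits, and hyphens, at most 63 characters, no reserved words such as aws, amazon, or cognito) are assumptions based on Cognito's prefix-domain constraints and should be checked against the current Cognito documentation.

```python
import hashlib
import re

# Assumed Cognito prefix-domain rules; verify against the current documentation.
_PREFIX_RE = re.compile(r"^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$")
_RESERVED_WORDS = ("aws", "amazon", "cognito")


def _slug(text: str) -> str:
    """Lowercase the text and collapse disallowed characters into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def stable_domain_prefix(project: str, account_id: str, stage: str) -> str:
    """Return the same Cognito domain prefix for the same inputs on every deploy.

    The short hash keeps prefixes distinct across accounts and stages while
    staying deterministic (no random suffix that would change on redeploy).
    """
    digest = hashlib.sha256(f"{account_id}:{stage}".encode("utf-8")).hexdigest()[:8]
    prefix = f"{_slug(project)}-{_slug(stage)}-{digest}"[:63].rstrip("-")
    if not _PREFIX_RE.fullmatch(prefix):
        raise ValueError(f"invalid Cognito domain prefix: {prefix!r}")
    if any(word in prefix for word in _RESERVED_WORDS):
        raise ValueError(f"Cognito domain prefix contains a reserved word: {prefix!r}")
    return prefix
```

Passing the result into the stack (for example, through CDK context) means a redeploy reuses the existing domain instead of attempting to replace it.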

What’s new

This post introduces a practical, end-to-end pattern for programmatically creating a private workforce on SageMaker Ground Truth in tandem with a dedicated Cognito user pool, all orchestrated through the AWS CDK. The key innovation is the integration of AWS CDK with CloudFormation custom resources to manage complex dependencies between the Cognito pool and the private workforce. The solution provides:

  • A CDK-based architecture that creates and configures SageMaker private workforce resources alongside a Cognito user pool and its dependent resources.
  • An orchestration mechanism that resolves the mutual dependency where the app client requires a callback URL produced during workforce creation, and the workforce creation relies on the presence of the app client.
  • A single stack that includes resources needed for initial setup and resources that workers rely on when logging in to the labeling portal.
  • A guided deployment workflow and cleanup instructions that align with standard AWS IaC practices, including the cdk destroy command as an alternative to deleting resources through the CloudFormation console.
  • Guidance for customizing the base infrastructure to meet organizational standards, along with considerations for Cognito best practices and CDK usage.

The article also points to the broader ecosystem of SageMaker labeling guides and Cognito user pool guides as references for extending the solution. It notes that AWS Professional Services can help accelerate cloud adoption and emphasizes the value of a well-designed private workforce for secure, private ML data labeling workflows.

Why it matters (impact for developers/enterprises)

  • Automation and consistency: By leveraging the AWS CDK and CloudFormation custom resources, organizations gain repeatable, auditable deployments of private workforces and their Cognito pools, reducing the risk of human error in manual setup.
  • Security and privacy: The solution creates a dedicated Cognito user pool for authenticating private workforce workers, aligning with security and privacy requirements for proprietary data labeling tasks.
  • End-to-end labeling workflow: The authentication flow covers email invitation, registration, authentication, and login to the labeling portal, enabling a streamlined user experience for workers and administrators alike.
  • Manageability at scale: A single stack approach simplifies lifecycle management, enabling efficient provisioning and teardown across multiple environments while preserving configuration integrity.
  • Practical guidance: The post points to best practices for Cognito and CDK, and invites readers to explore additional customization and support via AWS Professional Services and official data labeling guides.

Technical details or Implementation

The core of the solution is a combination of AWS CDK constructs and CloudFormation custom resources that integrate the Amazon Cognito user pool with the SageMaker private workforce. The design intentionally addresses the mutual dependency problem: the app client in Cognito requires parameters that are only known after the private workforce is created, while the private workforce requires a preexisting app client. The orchestration using custom resources coordinates this sequencing so resources are created in the correct order and configured consistently across deployments. Key architectural highlights include:

  • Cognito resources: a dedicated user pool and a corresponding app client, configured to support the authentication flow used by labeling workers.
  • SageMaker private workforce: the private workforce that pairs with the Cognito pool to enable secure labeling tasks.
  • CloudFormation custom resources: provide the orchestration logic to coordinate the interdependent creation steps and ensure parameter propagation in the right order.
  • Single orchestration stack: a unified CDK stack that provisions the necessary resources for initial setup and for ongoing worker access to the labeling portal.
  • Consistent domain name: the Cognito user pool domain name must remain stable across deployments to avoid deployment errors.

Deployment prerequisites include AWS credentials with sufficient permissions to deploy the solution resources and following the invitation workflow to join the private workforce. After deployment, administrators can invite themselves or others to join the private workforce, and invited users receive an email to register and authenticate. The worker workflow follows this sequence: email invitation, initial registration, authentication, and login to the labeling portal, where labeling jobs are displayed.

To clean up, navigate to the AWS CloudFormation console and delete the Workforce stack or, if you deployed with the CDK CLI, run cdk destroy with the same CLI arguments used for deployment. This aligns with standard IaC practices for resource lifecycle management.

The solution is designed to be customizable so organizations can tailor it to their security, identity, and UX standards. The accompanying CDK example is intended to be adaptable to different environments and policies, and readers are encouraged to explore additional customization through AWS Professional Services and the official guides on data labeling and Cognito user pools. Those guides provide broader context, with examples of labeling data with SageMaker Ground Truth and configuring Cognito for enterprise-grade use cases.

The author of the post, Giorgio Pessot, emphasizes the practical engineering aspects of building enterprise-grade AI systems at scale, blending mathematical rigor with cloud-native DevOps practices.

Architecture at a glance (stack components)

| Component | Role | When it is used |
|---|---|---|
| SageMaker private workforce | Enables private labeling tasks for SageMaker Ground Truth | Core operation after deployment |
| Amazon Cognito user pool | Provides a dedicated identity store for private workforce users | Always active after initial setup |
| Cognito app client | Enables the authentication flow through its callback URL | Created before the workforce; its callback URL is set once the workforce exists |
| CloudFormation custom resources | Orchestrate cross-resource dependencies | During initial setup, to coordinate sequencing |
| CDK constructs | Define, provision, and manage resources as code | Throughout deployment and updates |
| Related resources (IAM roles, permissions, etc.) | Enforce access controls and workflow permissions | Throughout the lifecycle |

The diagram referenced in the original post illustrates how these components come together in a single stack, with workers joining the private workforce and accessing the labeling portal after authentication. The textual workflow summary below mirrors how a worker logs in and engages with labeling tasks.
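The custom resource's job can be made concrete with a handler sketch. The following Python function assumes the onEvent contract of the AWS CDK custom resource Provider framework (return a PhysicalResourceId and optional Data) and takes boto3-style SageMaker and Cognito clients as parameters so it can be exercised with fakes. It calls the real create_workforce, describe_workforce, delete_workforce, and update_user_pool_client operations, but the property names (WorkforceName, UserPoolId, ClientId), the /oauth2/idpresponse and /logout paths, and the OAuth flow and scope values are assumptions to verify against the Ground Truth documentation and the post's repository. It is not the post's actual implementation.

```python
from typing import Any, Dict


def on_event(event: Dict[str, Any], sagemaker: Any, cognito: Any) -> Dict[str, Any]:
    """Custom resource handler that breaks the app client <-> workforce cycle.

    In a deployed Lambda, `sagemaker` and `cognito` would be
    boto3.client("sagemaker") and boto3.client("cognito-idp").
    """
    props = event["ResourceProperties"]
    name = props["WorkforceName"]  # hypothetical property names
    pool_id = props["UserPoolId"]
    client_id = props["ClientId"]
    request_type = event["RequestType"]

    if request_type == "Delete":
        sagemaker.delete_workforce(WorkforceName=name)
        return {"PhysicalResourceId": name}

    if request_type == "Create":
        # The app client already exists, so the workforce can reference it.
        sagemaker.create_workforce(
            WorkforceName=name,
            CognitoConfig={"UserPool": pool_id, "ClientId": client_id},
        )
    # On Update, this sketch only re-applies the callback URL below.

    # The labeling portal subdomain is only known once the workforce exists.
    workforce = sagemaker.describe_workforce(WorkforceName=name)["Workforce"]
    portal_url = f"https://{workforce['SubDomain']}"

    # Close the loop: point the app client at the portal (assumed paths, flows, scopes).
    cognito.update_user_pool_client(
        UserPoolId=pool_id,
        ClientId=client_id,
        CallbackURLs=[f"{portal_url}/oauth2/idpresponse"],
        LogoutURLs=[f"{portal_url}/logout"],
        AllowedOAuthFlows=["code", "implicit"],
        AllowedOAuthScopes=["email", "openid", "profile"],
        AllowedOAuthFlowsUserPoolClient=True,
        SupportedIdentityProviders=["COGNITO"],
    )
    return {"PhysicalResourceId": name, "Data": {"PortalUrl": portal_url}}
```

Because UpdateUserPoolClient resets any setting you omit to its default, a production handler would typically read the current configuration with describe_user_pool_client first and resend the settings it wants to keep.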

Workflow summary (authentication and access)

  • An invitation email is sent to the worker’s address.
  • The worker registers using the link, creates a password, and configures an authenticator app if required.
  • The worker authenticates and gains access to the labeling portal.
  • The worker can view labeling jobs and contribute to data labeling tasks.
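The first step of this workflow, the invitation email, can be triggered through Cognito's admin API. In this Python sketch, the cognito argument is assumed to be a boto3 cognito-idp client: admin_create_user with DesiredDeliveryMediums=["EMAIL"] sends the invitation containing a temporary password, and admin_add_user_to_group places the user in the group that backs the work team. The helper name, the group name, and using the email address as the username are assumptions about how the pool is configured, not details confirmed by the post.

```python
from typing import Any


def invite_worker(cognito: Any, user_pool_id: str, email: str, group_name: str) -> None:
    """Hypothetical helper: invite a labeling worker by email.

    Assumes the user pool accepts the email address as the username.
    """
    # Creating the user sends the invitation email with a temporary password;
    # the worker replaces it (and sets up MFA if required) on first sign-in.
    cognito.admin_create_user(
        UserPoolId=user_pool_id,
        Username=email,
        UserAttributes=[
            {"Name": "email", "Value": email},
            {"Name": "email_verified", "Value": "true"},
        ],
        DesiredDeliveryMediums=["EMAIL"],
    )
    # The user must exist before it can join the work team's group.
    cognito.admin_add_user_to_group(
        UserPoolId=user_pool_id, Username=email, GroupName=group_name
    )
```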

Detailed deployment notes

  • Prerequisites: You must have AWS credentials in your environment with the required permissions to deploy the resources.
  • If you invited yourself from the CDK CLI to join the private workforce, follow the email to register and access the portal; otherwise, invite others using the workflow described in the post.
  • After deployment, clean up by deleting the Workforce stack in CloudFormation or by running cdk destroy with the same AWS CDK CLI arguments used during deployment.
  • The solution aims to be a foundation for private workforce infrastructure and can be extended with additional customization to align with organizational standards and security policies.
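Because cdk destroy must run with the same CLI arguments used for deployment, it helps to build both commands from a single source. The following Python sketch is a hypothetical helper; --context, --profile, and the --force flag of cdk destroy are standard AWS CDK CLI options, but any context keys you pass (for example, a domain prefix or an admin email) are illustrative, not the post's actual parameters.

```python
from typing import Dict, List, Optional


def cdk_commands(context: Dict[str, str], profile: Optional[str] = None) -> Dict[str, List[str]]:
    """Build matching `cdk deploy` and `cdk destroy` argument lists."""
    shared: List[str] = []
    # Sort keys so the argument order is identical on every invocation.
    for key, value in sorted(context.items()):
        shared += ["--context", f"{key}={value}"]
    if profile:
        shared += ["--profile", profile]
    return {
        "deploy": ["cdk", "deploy", *shared],
        # --force skips the interactive confirmation prompt.
        "destroy": ["cdk", "destroy", "--force", *shared],
    }
```

For example, subprocess.run(cdk_commands(ctx)["destroy"], check=True) would tear the stack down with exactly the context used for deployment. If the app defines more than one stack, append the stack name to both commands.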

Key takeaways

  • IaC enables automated, repeatable setup of private workforces for SageMaker Ground Truth with a dedicated Cognito pool.
  • The combination of CDK and CloudFormation custom resources resolves the mutual dependency between Cognito and the private workforce.
  • A single stack approach simplifies lifecycle management and enforces consistent configurations across environments.
  • The enrollment and login workflow provides a complete end-to-end experience for labeling workers.
  • The post recommends following Cognito and CDK best practices and engaging AWS Professional Services to tailor the architecture to enterprise needs.

FAQ

  • What problem does this solution address?

    It provides a programmatic way to create private workforces on SageMaker Ground Truth together with a dedicated Cognito user pool while handling cross-resource dependencies.

  • How are the mutual dependencies between Cognito and the private workforce handled?

    The solution uses CloudFormation custom resources to orchestrate the creation and configuration in the correct order, ensuring parameters required by one resource are available when needed by the other.

  • How do I clean up the deployed resources?

    You can delete the Workforce stack via the AWS CloudFormation console or run cdk destroy from the CDK CLI using the same deployment arguments.

  • What does the worker authentication flow look like?

    It includes an email invitation, initial registration, authentication, and login to the labeling portal where tasks are displayed.

  • Can the Cognito domain name change after deployment?

    Not easily. Changing the domain name after creation is not straightforward, so keep the Cognito user pool domain name consistent across deployments to avoid deployment errors.
