Agentic AI Solutions for Warehouse Data Access and Security
Source: engineering.fb.com

Sources: https://engineering.fb.com/2025/08/13/data-infrastructure/agentic-solution-for-warehouse-data-access, engineering.fb.com

TL;DR

  • Meta is evolving its data warehouse security with an agentic, multi-agent system that embeds AI agents into data-access workflows to balance productivity and risk.
  • The solution separates roles into data-user agents and data-owner agents, each with specialized sub-agents to guide access and security operations.
  • Context management and intention management (explicit and implicit) enable nuanced, task-focused data access across complex, cross-domain graphs.
  • A read-only, text-based abstraction of the warehouse supports agents while guardrails and logging enforce risk controls and auditing.
  • The system targets end-to-end use cases such as partial data previews, with a path toward more autonomous operation under human oversight.

Context and background

Meta operates a data warehouse within its offline data systems that supports analytics, machine learning, and AI use cases. Given the sheer data volume, system scale, and diverse use cases, security in data access is critical. Historically, teams managed access using a hierarchical structure and role-based access control (RBAC), aligning business needs with assets like tables, pipelines, dashboards, and other resources. Access decisions were largely local, with humans guiding discovery and granting permissions to closely related teams.

As data warehouses grow and AI agents begin to process data across broad domains, the prior, human-centric approach to access becomes increasingly complex and time-intensive. Visualizing the data flow as a graph, with assets such as tables, columns, and dashboards as nodes and activities as edges, helps illustrate the challenge: cross-domain AI usage disrupts the simplicity of human-led decisions and necessitates a new approach.

Meta therefore evolved toward an agentic solution for data access, designed to work for both humans and agents operating together within data products. This includes a native integration into data workflows and the establishment of strict guardrails, such as analytical rule-based risk assessment, to safeguard agents. The work models a multi-agent system in which data-user agents assist data users in obtaining access, while data-owner agents help data owners manage access. The two agents collaborate when both parties are involved. This separation decomposes the problem, letting each agent focus on its primary duties while coordinating to enable secure, efficient access.

The transformation reflects a shift from purely human-driven processes to a cooperative, agent-enabled system that scales with data, semantics, and usage patterns. Meta frames this evolution as an ongoing effort to balance productivity with security, using agents that can reason about data semantics, access rules, and risk. The new agentic workflow is designed to integrate natively into data products so that agents, humans, and services can operate in concert within safe, auditable boundaries. The overarching goal is to minimize security risks while enabling teams to work efficiently in an increasingly data-centric environment. (Source: Meta Engineering Blog)

What’s new

The central innovation is an agentic workflow constructed around two agent roles: data-user agents and data-owner agents. Each role is further decomposed into specialized sub-agents to handle discrete tasks, coordinated by a triage mechanism.

Data-user agent: not a single monolith but a composition of three sub-agents, orchestrated by a triage agent. The sub-agents are:

  • Alternatives sub-agent: suggests alternatives when a user encounters restricted tables (e.g., unrestricted or less-restrictive tables, or curated analyses). Large language models (LLMs) help synthesize tribal knowledge and guide users at scale.
  • Low-risk exploration sub-agent: enables context-aware, task-specific data access for safe, initial exploration, typically covering a small fraction of a table’s data.
  • Access-request sub-agent: crafts permission requests and negotiates with data-owner agents; currently human-in-the-loop, with a path toward greater autonomy.

Data-owner agent: also composed of multiple sub-agents, including:

  • Security operations sub-agent: acts like a junior engineer assisting with security tasks, following SOPs derived from documented rules and guidelines to handle incoming permission requests.
  • Access-management sub-agent: proactively configures access rules and leverages semantics and content to improve traditional role-mining processes.

To support agents, Meta maintains the data warehouse in a hierarchical structure, where folders represent organizing units and leaf resources include tables, dashboards, policies, and more. This structure enables a read-only, text-based view of warehouse resources for agents. The SOP itself becomes a textual resource that agents can consult for guidance on data-access practices.

Context management differentiates among three scenarios:

  • Automatic context: the system knows who is trying to access what and can fetch the exact context when access is blocked.
  • Static context: users explicitly narrow the scope, or expand the set of resources, starting from the automatic context.
  • Dynamic context: agents filter resources by metadata (e.g., data semantics) or via similarity search.

Intention management captures what drives a data user to access resources, in two forms:

  • Explicit intention: users communicate their current task and role, carrying the business context.
  • Implicit intention: the system infers intent from a user’s recent activities when roles alone cannot capture it.

End-to-end use case: partial data preview. Typically, a data user’s workflow starts with discovery, followed by exploration (exposing a small amount of data), then full analysis. The agentic workflow orchestrates four capabilities to provide task-focused, context-aware access during exploration, so users see meaningful data while risk controls are respected. At a high level, the flow runs in two steps:

  • The data-user agent uses the user-activities tool to gather the user’s activities across platforms (diffs, tasks, posts, SEVs, dashboards, documents) and the user-profile tool to fetch profile information. It then formulates the user’s intention based on activities, profiles, and query shapes, and calls the data-owner agent.
  • The data-owner agent analyzes the query, identifies the resources being accessed, fetches resource metadata (table summaries, column descriptions, data semantics, SOPs), and leverages an LLM to generate an output decision and the reasoning behind it. An output guardrail ensures alignment with rule-based risk calculations, and all decisions and logs are securely stored for future auditing and improvement.

Tables

| Agent | Primary role | Key sub-agents |
|---|---|---|
| Data-user agent | Assists data users in obtaining access; reads warehouse-as-text and coordinates with the data-owner agent | Alternatives sub-agent; Low-risk exploration sub-agent; Access-request sub-agent |
| Data-owner agent | Helps data owners manage access and enforce security | Security operations sub-agent; Access-management sub-agent |
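To make the triage step over the three data-user sub-agents concrete, here is a small Python sketch. The source does not describe the routing rules, so the ordering below (alternatives first, then low-risk exploration, then an access request) and the names `SubAgent` and `triage` are assumptions made purely for illustration.

```python
from enum import Enum
from typing import Optional


class SubAgent(Enum):
    ALTERNATIVES = "alternatives"
    LOW_RISK_EXPLORATION = "low_risk_exploration"
    ACCESS_REQUEST = "access_request"


def triage(access_blocked: bool, alternatives_found: bool, preview_is_enough: bool) -> Optional[SubAgent]:
    """Pick a sub-agent for a blocked request (hypothetical routing order)."""
    if not access_blocked:
        return None  # nothing to do: the user already has access
    if alternatives_found:
        return SubAgent.ALTERNATIVES  # least invasive: point to another resource
    if preview_is_enough:
        return SubAgent.LOW_RISK_EXPLORATION  # expose only a small slice of data
    return SubAgent.ACCESS_REQUEST  # fall back to a full permission request


print(triage(access_blocked=True, alternatives_found=False, preview_is_enough=True))
```

Routing from the least to the most invasive option is one plausible reading of the sub-agent descriptions; a production triage agent would presumably rely on an LLM rather than three booleans.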

Why it matters (impact for developers/enterprises)

Automating and coordinating access in a large, multi-domain data warehouse reduces the time-to-access for researchers and analysts while maintaining strong security controls. By decoupling concerns into specialized sub-agents and coupling them with context- and intention-aware reasoning, the system reduces repetitive manual approvals, surfaces safer alternatives for data discovery, and enables more consistent adherence to risk policies. For enterprises, this approach points toward scalable governance that can evolve with growing data volumes, more sophisticated AI processes, and increasingly cross-domain analytics needs, without sacrificing traceability or control.

Technical details or Implementation

The architecture relies on a multi-agent system with two primary agents:

  • Data-user agents, which assist data users in obtaining access. A triage mechanism coordinates three sub-agents: alternatives, low-risk exploration, and access requests.
  • Data-owner agents, which help data owners manage and operate data access through security operations and access-management sub-agents.

Key implementation elements include:

  • A hierarchical, text-based representation of warehouse resources that converts assets into a format suitable for agent reasoning (folders as organizing units; leaf nodes as resources like tables, dashboards, or policies).
  • The SOP, exposed as a textual resource that agents use as input to guide data-access decisions.
  • Context management across automatic, static, and dynamic contexts, enabling agents to fetch, constrain, or expand resource sets as needed.
  • Intention management with explicit and implicit signals that drive how agents interpret user tasks and access needs.
  • An end-to-end workflow for partial data previews, where the data-user agent gathers user activity data and profile information, formulates intent, and invokes the data-owner agent, which fetches metadata and makes a decision checked by an auditable, rule-based guardrail.
  • An LLM that generates output decisions and the associated reasoning, which must pass through the guardrails before any access is granted. All decisions and logs are securely stored to support auditability and future improvements.
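As an illustration of the text-based representation and of static versus dynamic context, the Python sketch below renders a small resource tree as indented text and selects resources by path prefix or by semantic tag. Everything here is hypothetical: the `Node` type, the `kind` and `semantics` fields, the rendering format, and the sample resources are assumptions, not the format Meta uses.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Node:
    name: str
    kind: str = "folder"  # "folder" or a leaf kind: "table", "dashboard", "policy", "sop"
    semantics: frozenset = frozenset()  # hypothetical data-semantics tags
    children: list["Node"] = field(default_factory=list)


def render(node: Node, depth: int = 0) -> list[str]:
    """Read-only text view: one indented line per folder or leaf resource."""
    suffix = "/" if node.kind == "folder" else f" [{node.kind}]"
    lines = ["  " * depth + node.name + suffix]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


def leaves(node: Node, path: str = "") -> Iterator[tuple[str, Node]]:
    """Yield (full path, node) for every leaf resource under node."""
    full = f"{path}/{node.name}"
    if node.kind != "folder":
        yield full, node
    for child in node.children:
        yield from leaves(child, full)


def static_context(root: Node, prefix: str) -> list[str]:
    """Static context: the user explicitly narrows scope to a folder path."""
    return [p for p, _ in leaves(root) if p.startswith(prefix)]


def dynamic_context(root: Node, tag: str) -> list[str]:
    """Dynamic context: filter resources by metadata (here, a semantic tag)."""
    return [p for p, n in leaves(root) if tag in n.semantics]


warehouse = Node("warehouse", children=[
    Node("ads", children=[
        Node("ad_clicks", "table", frozenset({"engagement"})),
        Node("access_sop", "sop"),
    ]),
    Node("growth", children=[
        Node("signup_funnel", "dashboard", frozenset({"engagement", "growth"})),
    ]),
])

print("\n".join(render(warehouse)))
print(static_context(warehouse, "/warehouse/ads"))
print(dynamic_context(warehouse, "engagement"))
```

Because the view is plain text and read-only, an agent can be handed the rendered lines (or a filtered subset) as prompt context without any ability to mutate the warehouse. Similarity search, the other dynamic-context mechanism the article mentions, is left out of this sketch.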

Key takeaways

  • Agentic data access integrates specialized AI agents into warehouse security and productivity workflows.
  • Separate data-user and data-owner agents each have dedicated sub-agents to handle alternatives, exploration, permissions, security operations, and access management.
  • Text-based resource representations and SOPs enable agents to reason with data semantics while preserving strict guardrails and auditability.
  • Context and intention management empower agents to tailor access to users’ roles, tasks, and evolving needs.
  • The approach supports end-to-end workflows like partial data previews and points toward greater autonomy under ongoing human oversight.

FAQ

  • Q: What is the core idea behind an agentic solution for warehouse data access? A: It uses a multi-agent system with data-user and data-owner agents to streamline access and enforce security, decomposing tasks into specialized sub-agents and coordinating actions via a triage mechanism.
  • Q: How are risks controlled in this system? A: Output decisions pass through a rule-based risk guardrail, and all decisions and logs are securely stored for auditing.
  • Q: What do the sub-agents within the data-user agent do? A: The Alternatives sub-agent suggests safer data options, the Low-risk exploration sub-agent enables controlled data access during exploration, and the Access-request sub-agent crafts and negotiates permission requests with data-owner agents.
  • Q: How is context and intent managed? A: Context comes in automatic, static, and dynamic forms, while intention management includes explicit signals from users and implicit inferences from activities over a short period.
