Natural language-based database analytics with Amazon Nova
Source: aws.amazon.com

Sources: https://aws.amazon.com/blogs/machine-learning/natural-language-based-database-analytics-with-amazon-nova/, AWS ML Blog

TL;DR

  • Natural language database analytics are enabled by Amazon Nova foundation models (Nova Pro, Nova Lite, Nova Micro) and the ReAct pattern implemented via LangGraph, allowing conversation-like interactions with complex database systems.
  • The solution centers on three core components—UI, generative AI, and data—and uses an agent to coordinate questions, reasoning, workflow orchestration, and comprehensive natural language responses with self-remediation.
  • It supports self-healing and human-in-the-loop (HITL) workflows to validate and refine SQL queries, ensuring results match user intent and schema requirements.
  • Evaluations on the Spider text-to-SQL dataset show competitive accuracy and low latency for cross-domain translation tasks, with notable strengths on the most complex queries. The demo uses Streamlit for illustration only; production deployments need their own review of security configurations and deployment architecture.
  • The GenAIIC collaboration provides access to experts to identify valuable use cases and tailor practical generative AI solutions; the architecture is built on Amazon Bedrock to enable natural language to SQL and data visualizations.

Context and background

Natural language interfaces to databases have long been a goal in data management. The approach described here leverages large language model (LLM) agents to decompose complex queries into explicit, verifiable reasoning steps and to enable self-correction through validation loops. By catching errors, analyzing failures, and refining queries, the system moves toward matching user intent and schema requirements with precision and reliability. This enables intuitive, conversation-like interactions with sophisticated database systems while preserving analytical accuracy.

To achieve optimal performance with minimal trade-offs, the solution uses the Amazon Nova family of foundation models (Nova Pro, Nova Lite, and Nova Micro). These FMs encode vast amounts of world knowledge, enabling nuanced reasoning and contextual understanding essential for complex data analysis. The ReAct (reasoning and acting) pattern is implemented through LangGraph’s flexible architecture, combining the strengths of Amazon Nova LLMs for natural language understanding with explicit reasoning steps and actions. The result is a modern approach to natural language database analytics that supports automated analysis and natural language querying across datasets.

Many customers undergoing generative AI transformation recognize that their vast data stores hold untapped potential for automated analysis. This insight often leads to SQL-based exploration, from simple SELECT statements to complex, multipage queries with rich aggregations and functions. The core challenge remains translating user intent, whether stated or implicit, into performant, precise, and valid SQL queries that retrieve the correct dataset for downstream visualization and exploration. Our solution excels in generating context- and metadata-aware queries capable of retrieving precise datasets and performing intricate analyses.

A user-friendly interface is essential, and we have developed an intuitive interface where users can be guided through their analysis journey with human-in-the-loop (HITL) capabilities, allowing input, approvals, and modifications at key decision points. The interface and workflow are designed to support a cohesive analysis journey, not a single query.

What’s new

The architecture described in this post introduces three core components (UI, generative AI, and data) along with an intelligent agent that serves as the central coordinator. The agent performs question understanding, decision-making, workflow orchestration, intelligent routing, and the generation of comprehensive natural language responses. It enhances questions by improving text quality, standardizing terminology, and maintaining conversational context, so users can extend an analysis through a sequence of related queries while preserving precise analytical intent.

Intelligent routing invokes the correct tools for each user question, enabling end-to-end query processing that can also handle tabular and visual data. The agent uses the full context to generate comprehensive summaries that explain findings and highlight key insights. After each exchange, it suggests relevant follow-up questions to deepen data exploration and uncover unexpected insights. Because the agent maintains context across conversations, users can provide minimal follow-up inputs; abbreviated questions are reconstructed from earlier context for confirmation. Consistent terminology is enforced by expanding abbreviations to full forms (for example, GDPR becomes General Data Protection Regulation) to ensure clarity in inputs and outputs.

The solution supports self-remediation: when an execution error occurs, the agent uses the error and full context to regenerate a corrected SQL query. This self-healing approach provides robustness in complex scenarios and helps maintain reliable query processing. The agent’s outputs include a natural language summary, tabular results with reasoning, visualizations with explanations, and a concise insights summary.

The demo uses Streamlit for illustration; for production deployments, security configurations and deployment architectures should be reviewed to align with organizational requirements.

The Generative AI Innovation Center (GenAIIC) at AWS has developed this natural language-based database analytics solution, which combines the strengths of Amazon Nova FMs with explicit reasoning steps and actions implemented via ReAct in LangGraph’s architecture. The solution is built on Amazon Bedrock, enabling intuitive, conversation-like interactions with complex database systems and the seamless translation of natural language queries into accurate SQL statements and insightful data visualizations. Evaluation results indicate competitive performance, highlighting the potential to democratize data access and analysis with natural language interfaces. The GenAIIC also offers access to a team of experts to help identify valuable use cases and implement practical generative AI solutions tailored to specific needs. For more information, see Amazon Nova Foundation Models and Amazon Bedrock. If you’re interested in collaborating with GenAIIC, you can find more information at AWS Generative AI Innovation Center.

Technical details and implementation

Architecture and core components

The solution architecture comprises three core components: UI, generative AI, and data. An agent acts as the central coordinator, combining capabilities such as question understanding, decision-making, workflow orchestration, intelligent routing, and generating comprehensive natural language responses. The agent enhances text quality, standardizes terminology, and preserves conversational context to support a sequence of related queries while maintaining precise analytical intent. Its intelligent routing ensures that the correct tools are invoked for each user question, enabling cohesive end-to-end query processing. The workflow enables processing of tabular and visual data and uses full context to generate summaries and insights.
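
The coordinator role described above can be illustrated with a small ReAct-style loop. This is a minimal sketch in plain Python, not the LangGraph implementation from the post: the tool names (sql, chart), the scripted_model stand-in for a Nova call, and the history format are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tools; the summary does not list the solution's actual tools.
def run_sql(arg: str) -> str:
    return f"rows for: {arg}"

def make_chart(arg: str) -> str:
    return f"chart for: {arg}"

TOOLS: Dict[str, Callable[[str], str]] = {"sql": run_sql, "chart": make_chart}

def scripted_model(history: List[str]) -> Tuple[str, str]:
    """Stand-in for an LLM call: picks the next (action, argument) from the history."""
    observations = [h for h in history if h.startswith("observation:")]
    if not observations:
        return "sql", "total sales by region"
    if len(observations) == 1:
        return "chart", "bar chart of sales by region"
    return "finish", "Sales are summarized by region in the table and chart."

def react_loop(question: str,
               model: Callable[[List[str]], Tuple[str, str]],
               max_steps: int = 5) -> Tuple[str, List[str]]:
    """Alternate between reasoning (model picks an action) and acting (tool runs)."""
    history = [f"question: {question}"]
    for _ in range(max_steps):
        action, arg = model(history)
        history.append(f"action: {action}({arg})")
        if action == "finish":
            return arg, history
        tool = TOOLS.get(action)  # routing: map the chosen action to a tool
        observation = tool(arg) if tool else f"unknown tool: {action}"
        history.append(f"observation: {observation}")
    return "step limit reached", history

answer, trace = react_loop("How do sales compare across regions?", scripted_model)
```

The returned trace keeps every action and observation, which is what makes the reasoning steps inspectable after the fact.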

The agent and self-healing capabilities

A key feature is the agent’s self-remediation capability. When execution errors occur, the agent uses the errors and the full context to regenerate a corrected SQL query. This self-healing approach provides robust and reliable query processing, even in complex scenarios. The agent processes inputs—the rewritten question, results from analysis, and context—to produce a natural language summary and a response that includes tabular results with reasoning, explanations for visualizations, and a concise insights summary. The agent maintains context across conversations, reconstructing abbreviated questions from prior context for confirmation and suggesting follow-up questions after each exchange.
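
The self-remediation loop described above might look roughly like the following. It is a sketch under stated assumptions: SQLite stands in for the real database, stub_generator is a hypothetical placeholder for the model call, and the retry limit of three is arbitrary rather than taken from the post.

```python
import sqlite3
from typing import Callable, List, Optional, Tuple

# (question, previous_sql, error_message) -> new SQL string
SqlGenerator = Callable[[str, Optional[str], Optional[str]], str]

def execute_with_remediation(conn: sqlite3.Connection,
                             question: str,
                             generate_sql: SqlGenerator,
                             max_attempts: int = 3) -> Tuple[List[tuple], str, int]:
    """Run generated SQL; on a database error, feed the error back and retry."""
    previous_sql: Optional[str] = None
    error: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        sql = generate_sql(question, previous_sql, error)
        try:
            rows = conn.execute(sql).fetchall()
            return rows, sql, attempt
        except sqlite3.Error as exc:
            # Keep the failed query and its error as context for the next attempt.
            previous_sql, error = sql, str(exc)
    raise RuntimeError(f"no valid query after {max_attempts} attempts: {error}")

def stub_generator(question: str,
                   previous_sql: Optional[str],
                   error: Optional[str]) -> str:
    """Hypothetical model stand-in: the first draft has a misspelled column."""
    if error is None:
        return "SELECT nme FROM customers"
    return "SELECT name FROM customers ORDER BY name"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?)", [("Ana",), ("Bo",)])
rows, final_sql, attempts = execute_with_remediation(conn, "List customers", stub_generator)
```

Bounding the number of attempts keeps a persistently failing question from looping forever; in a HITL setup the final failure could be handed to the user instead of raised.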

Data processing, visualization, and language standardization

The system’s design supports processing both tabular and visual data and generating comprehensive outputs that explain findings and highlight insights. It standardizes terminology to align with industry standards, customer guidelines, and brand requirements, expanding abbreviations to their full forms to improve clarity in both inputs and outputs.
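
Abbreviation expansion of this kind can be approximated with a glossary lookup. The sketch below is an assumption about one possible approach, not the post's implementation; the GDPR entry comes from the post's own example, while the YoY entry and the function name are hypothetical.

```python
import re
from typing import Dict

# GDPR is the example given in the post; YoY is a hypothetical extra entry.
GLOSSARY: Dict[str, str] = {
    "GDPR": "General Data Protection Regulation",
    "YoY": "year over year",
}

def expand_abbreviations(text: str, glossary: Dict[str, str] = GLOSSARY) -> str:
    """Replace whole-word, case-sensitive abbreviations with their full forms."""
    if not glossary:
        return text
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in glossary) + r")\b")
    return pattern.sub(lambda m: glossary[m.group(1)], text)

expanded = expand_abbreviations("Show GDPR incidents YoY")
```

Matching on word boundaries avoids rewriting substrings of longer tokens; a real deployment would load the glossary from customer guidelines rather than hard-coding it.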

Evaluation and production considerations

The solution was evaluated on the Spider text-to-SQL dataset, a widely used benchmark for cross-domain semantic parsing and text-to-SQL tasks. The Spider dataset comprises 10,181 questions and 5,693 unique complex SQL queries across 200 databases spanning 138 domains. The evaluation was conducted in a zero-shot setting (no fine-tuning on dataset examples) to assess general text-to-SQL translation abilities. Metrics focused on overall accuracy and efficiency, with results indicating competitive performance and low latency for translation tasks, especially on complex queries. The Spider evaluation helps compare Amazon Nova against state-of-the-art approaches and demonstrates its potential to support natural language querying at scale.
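
The summary does not state how accuracy was scored. As one illustration, the sketch below assumes accuracy means execution match (the predicted and reference queries return the same rows, ignoring order) and measures translation latency per question. The tiny singer table and the canned translations are made up for the example and are not Spider data or results.

```python
import sqlite3
import time
from collections import Counter
from typing import Callable, Dict, List, Tuple

def same_result(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> bool:
    """Execution match: both queries return the same multiset of rows."""
    try:
        predicted = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # a query that fails to run counts as incorrect
    gold = conn.execute(gold_sql).fetchall()
    return Counter(predicted) == Counter(gold)

def evaluate(conn: sqlite3.Connection,
             examples: List[Tuple[str, str]],
             translate: Callable[[str], str]) -> Dict[str, float]:
    """Score a question-to-SQL function on (question, gold SQL) pairs."""
    correct = 0
    latencies: List[float] = []
    for question, gold_sql in examples:
        start = time.perf_counter()
        predicted_sql = translate(question)
        latencies.append(time.perf_counter() - start)
        correct += same_result(conn, predicted_sql, gold_sql)
    return {
        "accuracy": correct / len(examples),
        "mean_latency_s": sum(latencies) / len(latencies),
    }

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO singer VALUES (?, ?)", [("Ana", 30), ("Bo", 41)])
examples = [
    ("How many singers are there?", "SELECT COUNT(*) FROM singer"),
    ("List singer names.", "SELECT name FROM singer"),
]
# Hypothetical model output: the first translation is right, the second is wrong.
canned = {
    "How many singers are there?": "SELECT COUNT(name) FROM singer",
    "List singer names.": "SELECT age FROM singer",
}
report = evaluate(conn, examples, canned.__getitem__)
```

Comparing result sets rather than SQL strings credits queries that are written differently but return the same answer, as the first example shows.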

Demo and production readiness notes

The Streamlit interface is used for illustration purposes in the demo. For production deployments, security configurations and deployment architectures should be reviewed to ensure alignment with organizational security requirements and best practices. The GenAIIC provides access to experts who can help identify valuable use cases and tailor practical generative AI solutions to specific needs, complementing the underlying Nova and Bedrock technologies.

Prerequisites and deployment steps (high-level)

  • Use SageMaker notebook instances to experiment with the solution.
  • Download and prepare the database used for querying.
  • Start the Streamlit application with the command: streamlit run app.py. The demo illustrates the interface and flow; production deployments should account for security and scalability considerations.
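
Once the database is prepared, generating metadata-aware queries requires giving the model the schema. The sketch below shows one way this could be done for a SQLite database; the function names and the prompt wording are assumptions, and the post's actual prompts are not reproduced here.

```python
import sqlite3

def describe_schema(conn: sqlite3.Connection) -> str:
    """Return one 'table(col TYPE, ...)' line per user table."""
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    lines = []
    for (table,) in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        cols = ", ".join(f"{col[1]} {col[2]}" for col in columns)
        lines.append(f"{table}({cols})")
    return "\n".join(lines)

def build_prompt(schema: str, question: str) -> str:
    """Hypothetical prompt template combining schema metadata and the question."""
    return (
        "Translate the question into one SQLite query.\n"
        f"Schema:\n{schema}\n"
        f"Question: {question}\n"
        "SQL:"
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
schema = describe_schema(conn)
prompt = build_prompt(schema, "What is the average order total?")
```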

Tables and key facts

Component | Description
Core models | Amazon Nova Pro, Nova Lite, Nova Micro
Pattern | ReAct (reasoning and acting) implemented via LangGraph
Platform | Amazon Bedrock
Evaluation dataset | Spider text-to-SQL (zero-shot)
Data domains | 138 domains across 200 databases

Why it matters (impact for developers/enterprises)

  • Enables end-to-end natural language querying over structured data with precise SQL generation, reducing the barrier to data access for analysts and decision-makers.
  • Uses explicit reasoning steps and actions, improving transparency and traceability of the analytical process.
  • Supports self-healing queries and HITL workflows to improve robustness and reliability in real-world deployments.
  • Provides a scalable approach to cross-domain query translation and data exploration with low latency, even for complex queries.
  • Offers access to expert guidance through GenAIIC to identify impactful use cases and tailor solutions to organizational needs.

Key takeaways

  • Amazon Nova, guided by the ReAct pattern and LangGraph, enables natural language to SQL translation with explicit reasoning.
  • A central agent coordinates questions, routing, and outputs, while maintaining context across a multi-turn dialogue.
  • Self-remediation and HITL allow the system to recover from execution errors and refine results to user intent.
  • Spider dataset evaluation demonstrates competitive performance in zero-shot cross-domain text-to-SQL tasks with low latency.
  • Streamlit demo showcases the interface, but production deployments require security-conscious configurations.

FAQ

  • What is the role of Amazon Nova in this solution?

    Amazon Nova provides the foundation models used for natural language understanding and reasoning, enabling NL-to-SQL translation within the ReAct framework.

  • How does self-healing work in practice?

    When execution errors occur, the agent uses the errors and full context to regenerate a corrected SQL query, improving robustness.

  • What is LangGraph’s role?

    LangGraph implements the ReAct pattern, coordinating reasoning steps and actions with the Nova models to drive end-to-end query processing.

  • What dataset underpins the evaluation, and what does it show?

    The Spider text-to-SQL dataset (10,181 questions, 5,693 SQL queries across 200 databases / 138 domains) was used in a zero-shot setting to assess generalization, showing competitive accuracy and low latency.

  • Where can I learn more or collaborate with GenAIIC?

    The GenAIIC offers access to experts to help identify use cases and implement practical generative AI solutions; more information is available through AWS Generative AI Innovation Center channels described in the post.
