Lincoln J Bicalho · AI & Machine Learning · 28 min read
Training Vanna AI on Complex Enterprise Databases: A Practical Guide
Training Vanna AI on enterprise databases requires strategic planning and realistic expectations. Here's how to build effective training datasets, handle complex schemas, and achieve reliable accuracy while maintaining security and validation protocols.
📋 Prerequisites:
- Basic understanding of Vanna AI (see Part 1 of this series)
- Python 3.8+ environment
- Access to enterprise database with DDL export capability
- Familiarity with SQL and database schema concepts
- Understanding of vector embeddings and RAG (retrieval-augmented generation)
When I first deployed Vanna AI in our enterprise environment, I made a critical mistake: I assumed that feeding it our database schema would be enough to generate accurate SQL. After weeks of inconsistent results and frustrated users, I learned that training Vanna AI effectively requires a strategic, methodical approach.
Here’s the reality from managing multiple enterprise deployments: Vanna AI is a powerful assistant that requires proper training to achieve reliable accuracy, but it’s not a replacement for SQL expertise. The key is understanding how to train it effectively and when to trust its output.
Understanding Vanna’s Training Architecture
Before you begin training, you need to understand how Vanna AI learns. Unlike traditional machine learning models that require extensive training datasets and compute resources, Vanna uses a retrieval-augmented generation (RAG) approach.
Core Components
1. Vector Embeddings Your training data (DDL statements, documentation, query patterns) is converted into high-dimensional vector representations that capture semantic meaning.
2. Similarity Search When you ask a question, Vanna finds the most relevant training examples by comparing vector embeddings, not just keyword matching.
3. LLM Generation The retrieved context is passed to a large language model that generates SQL based on the most relevant examples from your training data.
ℹ️ Note: This architecture means your training data quality directly impacts output accuracy. Poor training data leads to poor results, regardless of how sophisticated the underlying model is.
Why This Matters:
- You don’t need thousands of examples (20-50 quality examples often suffice)
- Similar questions retrieve similar training examples
- The LLM generates SQL based on retrieved patterns
- Training is incremental - you can add examples over time
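To make the retrieval step concrete, here is a minimal, self-contained sketch of how similarity search behaves. The `embed` and `retrieve_similar` functions below are illustrative stand-ins, not Vanna APIs - Vanna performs this step internally through its configured vector store and embedding model:

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy hashing embedder - a stand-in for a real embedding model.
    Real embeddings capture semantic meaning; this only captures token
    overlap, which is enough to demonstrate the mechanics."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def retrieve_similar(question: str, examples: list, top_k: int = 3) -> list:
    """Rank stored training examples by cosine similarity to the question."""
    q_vec = embed(question)
    # Vectors are unit-normalized, so the dot product equals cosine similarity
    return sorted(
        examples,
        key=lambda ex: float(np.dot(q_vec, embed(ex["question"]))),
        reverse=True,
    )[:top_k]

stored = [
    {"question": "What are our top 10 customers by revenue?", "sql": "SELECT TOP 10 ..."},
    {"question": "Count orders by status", "sql": "SELECT status, COUNT(*) ..."},
]
print(retrieve_similar("Who are our biggest customers?", stored, top_k=1))
```

Because retrieval ranks by semantic closeness rather than exact wording, a handful of well-chosen examples can cover many phrasings of the same question.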
The Three Pillars of Effective Training
After training Vanna on systems with hundreds of tables and complex business logic, I’ve identified three essential training components that work together to create an effective system.
Training Strategy Decision Table
| Training Type | When to Use | Accuracy Impact | Time Investment | Best For |
|---|---|---|---|---|
| DDL Only | Basic schema understanding | 40-60% | Low (hours) | Proof of concept, simple schemas |
| DDL + Documentation | Business context needed | 60-75% | Medium (days) | Production systems, domain-specific terms |
| DDL + Docs + Queries | Production deployment | 65-85% | High (weeks) | Enterprise systems, complex business logic |
| Full Training + Validation | Mission-critical systems | 70-90% (verified) | Very High (months) | Government/financial systems |
1. DDL (Data Definition Language) Training
Your database schema forms the foundation of Vanna’s understanding. However, you need to train strategically rather than dumping your entire schema at once.
Strategic DDL Training Approach:
import vanna
from vanna.remote import VannaDefault
# WHY: Initialize Vanna with your specific model configuration
# HOW: Replace with your actual API credentials
vn = VannaDefault(model='your-model', api_key='your-api-key')
# WHY: Start with core business entities to build foundation
# HOW: Focus on customer-facing tables first
core_tables_ddl = """
-- Customer master table with business context
-- WHY: Comments explain business logic for better understanding
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
company_name VARCHAR(255) NOT NULL,
industry_code VARCHAR(10), -- References industry_codes table
annual_revenue DECIMAL(15,2),
account_status VARCHAR(20) DEFAULT 'ACTIVE', -- ACTIVE, SUSPENDED, CLOSED
created_date DATETIME2 DEFAULT GETDATE(),
last_modified DATETIME2
);
-- Sales transactions with relationship context
-- WHY: Foreign key relationships help Vanna understand table connections
CREATE TABLE sales_transactions (
transaction_id BIGINT PRIMARY KEY,
customer_id INT FOREIGN KEY REFERENCES customers(customer_id),
product_id INT FOREIGN KEY REFERENCES products(product_id),
sale_amount DECIMAL(12,2) NOT NULL,
transaction_date DATETIME2 NOT NULL,
sales_rep_id INT FOREIGN KEY REFERENCES employees(employee_id),
region_id INT FOREIGN KEY REFERENCES regions(region_id)
);
"""
# Train DDL with verification
try:
# WHY: Train method stores DDL in vector database for retrieval
# HOW: Vanna creates embeddings from schema structure and comments
vn.train(ddl=core_tables_ddl)
print("✅ DDL training completed successfully")
# WHY: Immediate verification ensures training was effective
# HOW: Ask a simple question to test schema understanding
test_response = vn.ask("What tables contain customer information?")
print(f"Test response: {test_response}")
except Exception as e:
    print(f"❌ DDL training failed: {e}")

💡 Tip: In my experience with federal systems, training core entities first establishes a strong foundation. Add peripheral tables incrementally as you validate accuracy.
DDL Training Best Practices:
| Practice | Why It Matters | Impact |
|---|---|---|
| Include meaningful comments | Explains business logic to LLM | High - improves context understanding |
| Add constraint information | Helps with relationship understanding | Medium - better join generation |
| Use descriptive column names | Reduces need for additional documentation | Medium - clearer intent |
| Start with core entities | Builds foundation before complexity | High - establishes patterns |
| Train incrementally | Validates each addition | High - prevents confusion |
⚠️ Warning: Avoid training your entire schema at once. This overwhelms the retrieval system and dilutes the relevance of search results. Start with 10-20 core tables, validate accuracy, then expand.
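One way to operationalize that incremental approach is a simple batch-and-verify loop. This is a sketch, not a prescribed workflow - the batch groupings and smoke-test questions below are illustrative placeholders you would replace with your own:

```python
# Illustrative DDL batches grouped by business domain - replace with real DDL
ddl_batches = {
    "core_customers": [
        "CREATE TABLE customers (customer_id INT PRIMARY KEY, company_name VARCHAR(255));",
    ],
    "sales": [
        "CREATE TABLE sales_transactions (transaction_id BIGINT PRIMARY KEY, customer_id INT);",
    ],
}

# One quick sanity question per batch to confirm the schema was absorbed
smoke_tests = {
    "core_customers": "What tables contain customer information?",
    "sales": "Which table stores sales transactions?",
}

for batch_name, statements in ddl_batches.items():
    for ddl in statements:
        vn.train(ddl=ddl)
    # Validate each batch before expanding to the next group of tables;
    # if the answer is off, fix this batch before adding more DDL
    print(f"[{batch_name}] {vn.ask(smoke_tests[batch_name])}")
```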
2. Documentation and Business Context Training
Raw schema tells Vanna what exists, but documentation teaches it what things mean. This is where many implementations fail.
Business Context Training:
# WHY: Business rules aren't in schema structure
# HOW: Document domain-specific logic as plain text
business_context = """
-- Revenue Recognition Rules
-- WHY: These rules define when revenue is counted
-- Revenue is recognized when:
-- 1. Product is shipped (ship_date IS NOT NULL)
-- 2. Payment terms are met (payment_status = 'CONFIRMED')
-- 3. Return period has expired (ship_date + 30 days)
-- Customer Segmentation Logic
-- WHY: Defines how we categorize customers for reporting
-- Enterprise: annual_revenue > 10000000
-- Mid-Market: annual_revenue BETWEEN 1000000 AND 10000000
-- SMB: annual_revenue < 1000000
-- Regional Territory Mapping
-- WHY: Region codes map to geographic territories
-- North America: region_codes 'NA-01' through 'NA-12'
-- Europe: region_codes 'EU-01' through 'EU-08'
-- Asia Pacific: region_codes 'AP-01' through 'AP-06'
-- Commission Calculation Rules
-- WHY: Different product types have different commission rates
-- Software Sales: base_rate * 1.2 (20% premium)
-- Hardware Sales: base_rate * 0.8 (20% discount)
-- Services: base_rate * 1.0 (standard rate)
"""
# WHY: Documentation training adds semantic meaning
# HOW: Store as text embeddings for retrieval
vn.train(documentation=business_context)
# WHY: Data dictionaries define valid values
# HOW: Helps Vanna understand enumeration types
data_dictionary = """
-- Field Definitions and Business Rules
-- WHY: account_status has specific valid values
account_status VALUES:
- ACTIVE: Customer in good standing, can place orders
- SUSPENDED: Payment issues, orders require approval
- CLOSED: No longer doing business, historical data only
-- WHY: product_category determines commission structure
product_category VALUES:
- SOFTWARE: Licensed software products
- HARDWARE: Physical equipment and devices
- SERVICES: Professional services and support
- TRAINING: Educational and certification programs
"""
vn.train(documentation=data_dictionary)

ℹ️ Note: This documentation is crucial for domain-specific terminology. In government systems, acronyms and specialized terms are common - documentation training helps Vanna understand these concepts.
When to Use Documentation vs. DDL vs. Query Training:
| Scenario | Best Training Approach | Reason |
|---|---|---|
| Define table structure | DDL training | Provides schema layout |
| Explain business rules | Documentation training | Adds semantic meaning |
| Show calculation logic | Documentation training | Defines computation rules |
| Demonstrate query patterns | Query pattern training | Provides SQL examples |
| Define valid values | Documentation training | Explains enumerations |
| Show table relationships | DDL training (with FKs) | Structural connections |
| Complex aggregations | Query pattern training | Concrete examples |
3. Query Pattern Training
This is the most critical component. You train Vanna with actual business questions and their corresponding SQL solutions.
❗ Important: Query pattern training is where accuracy improvements are most significant. In my production deployments, systems with 50+ query patterns consistently outperform those with only DDL training.
Progressive Query Training Examples:
Basic Level - Simple Filtering:
# WHY: Start with simple queries to establish patterns
# HOW: Match common business questions to SQL solutions
basic_query = {
"question": "What are our top 10 customers by revenue this year?",
"sql": """
-- WHY: This demonstrates basic aggregation and filtering
SELECT TOP 10
c.company_name,
SUM(st.sale_amount) as total_revenue,
COUNT(st.transaction_id) as transaction_count
FROM customers c
JOIN sales_transactions st ON c.customer_id = st.customer_id
WHERE YEAR(st.transaction_date) = YEAR(GETDATE())
GROUP BY c.customer_id, c.company_name
ORDER BY total_revenue DESC
"""
}
# Train the basic pattern
vn.train(question=basic_query["question"], sql=basic_query["sql"])

Intermediate Level - Complex Business Logic:
# WHY: Add complexity with business rules
# HOW: Combine multiple conditions and date logic
intermediate_query = {
"question": "Show enterprise customers who haven't purchased in the last 90 days",
"sql": """
-- WHY: Demonstrates customer segmentation and date calculations
SELECT
c.company_name,
c.annual_revenue,
MAX(st.transaction_date) as last_purchase_date,
DATEDIFF(day, MAX(st.transaction_date), GETDATE()) as days_since_purchase
FROM customers c
LEFT JOIN sales_transactions st ON c.customer_id = st.customer_id
WHERE c.annual_revenue > 10000000 -- Enterprise threshold from business rules
GROUP BY c.customer_id, c.company_name, c.annual_revenue
HAVING MAX(st.transaction_date) < DATEADD(day, -90, GETDATE())
OR MAX(st.transaction_date) IS NULL
ORDER BY last_purchase_date DESC
"""
}
vn.train(question=intermediate_query["question"], sql=intermediate_query["sql"])

Advanced Level - With Validation:
# WHY: Production systems need validation and confidence scoring
# HOW: Test each training example immediately
training_queries = [
{
"question": "Calculate total commission for software sales reps this quarter",
"sql": """
-- WHY: Demonstrates complex calculation with business rules
SELECT
e.first_name + ' ' + e.last_name as rep_name,
COUNT(st.transaction_id) as sales_count,
SUM(st.sale_amount) as total_sales,
-- WHY: Apply 20% premium for software (from commission rules)
SUM(st.sale_amount * 0.15 * 1.2) as total_commission
FROM employees e
JOIN sales_transactions st ON e.employee_id = st.sales_rep_id
JOIN products p ON st.product_id = p.product_id
WHERE p.product_category = 'SOFTWARE'
AND st.transaction_date >= DATEADD(quarter, DATEDIFF(quarter, 0, GETDATE()), 0)
GROUP BY e.employee_id, e.first_name, e.last_name
ORDER BY total_commission DESC
""",
"confidence_threshold": 0.75
}
]
# WHY: Validation ensures training was effective
# HOW: Test each query after training
for query in training_queries:
try:
vn.train(question=query["question"], sql=query["sql"])
# Test the training immediately
response = vn.ask(query["question"])
# Validate response has minimum confidence
if hasattr(response, 'confidence') and response.confidence >= query["confidence_threshold"]:
print(f"✅ Query trained successfully: {query['question']}")
print(f" Confidence: {response.confidence:.2f}")
else:
print(f"⚠️ Low confidence for: {query['question']}")
print(f" Consider adding more similar examples")
except Exception as e:
        print(f"❌ Training failed for: {query['question']} - {e}")

💡 Tip: In our production systems, we group training queries by complexity level and validate each level before proceeding. This progressive approach helps identify exactly where accuracy degrades.
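A sketch of that level-gated workflow, assuming the `basic_query`, `intermediate_query`, and `training_queries` structures defined above are in scope. The `passes_review` check is a hypothetical placeholder for whatever approval step you use (analyst sign-off or a comparison against reference SQL):

```python
def passes_review(question: str) -> bool:
    """Hypothetical gate - replace with analyst approval or a diff
    against hand-written reference SQL."""
    sql = vn.generate_sql(question)
    print(f"Review generated SQL for '{question}':\n{sql}\n")
    return True  # placeholder: wire in real approval logic

training_levels = [
    ("basic", [basic_query]),
    ("intermediate", [intermediate_query]),
    ("advanced", training_queries),
]

for level_name, queries in training_levels:
    for q in queries:
        vn.train(question=q["question"], sql=q["sql"])
    # Gate: stop at the first level that fails, so you know exactly
    # where accuracy degrades before adding more complexity
    if not all(passes_review(q["question"]) for q in queries):
        print(f"⚠️ Level '{level_name}' failed review - fix before proceeding")
        break
```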
Handling Complex Enterprise Schemas
Enterprise databases present unique challenges that require specialized training approaches.
Challenge 1: Multi-Schema Environments
Problem: Queries need to join tables across different databases or schemas, which increases complexity and reduces accuracy.
Solution: Provide explicit cross-schema context and examples.
# WHY: Multi-schema context isn't obvious from DDL alone
# HOW: Document schema organization and relationships
schema_context = """
-- Schema Organization
-- WHY: Different schemas serve different purposes
-- sales_db.dbo: Current sales transactions and customer data (operational)
-- warehouse_db.dbo: Historical data warehouse (3+ years, reporting)
-- finance_db.dbo: Accounting and financial reporting (compliance)
-- hr_db.dbo: Employee and organizational data (HR operations)
-- Cross-schema relationship patterns
-- WHY: Foreign keys may not exist across databases
-- sales_db.customers.customer_id = warehouse_db.dim_customer.customer_key
-- sales_db.sales_transactions.sales_rep_id = hr_db.employees.employee_id
"""
vn.train(documentation=schema_context)
# WHY: Concrete examples show how to join across schemas
# HOW: Fully qualify table names in training examples
cross_schema_query = """
-- Question: Show sales performance by rep with their department info
SELECT
hr.first_name + ' ' + hr.last_name as rep_name,
dept.department_name,
SUM(sales.sale_amount) as total_sales
FROM sales_db.dbo.sales_transactions sales
JOIN hr_db.dbo.employees hr ON sales.sales_rep_id = hr.employee_id
JOIN hr_db.dbo.departments dept ON hr.department_id = dept.department_id
WHERE sales.transaction_date >= DATEADD(month, -12, GETDATE())
GROUP BY hr.employee_id, hr.first_name, hr.last_name, dept.department_name
ORDER BY total_sales DESC
"""
vn.train(
question="Show sales performance by rep with their department info",
sql=cross_schema_query
)

Challenge 2: Legacy System Integration
Problem: Legacy systems often use non-intuitive naming conventions that confuse LLMs.
Solution: Create translation documentation for legacy naming.
# WHY: Legacy abbreviations aren't semantically meaningful
# HOW: Provide mapping documentation for translation
legacy_context = """
-- Legacy System Translation Guide
-- WHY: Helps Vanna understand abbreviated table/column names
-- Table Mappings
-- CUST_MST (Customer Master) = customers table equivalent
-- ORD_HDR (Order Header) = order_headers table equivalent
-- ORD_DTL (Order Detail) = order_line_items table equivalent
-- Field Mappings
-- CUST_MST.CUST_NO = customer_id
-- CUST_MST.CUST_NM = customer_name
-- CUST_MST.CUST_ST = customer_status
-- ORD_HDR.ORD_NO = order_number
-- ORD_HDR.ORD_DT = order_date
-- ORD_DTL.PROD_CD = product_code
-- ORD_DTL.QTY = quantity
-- ORD_DTL.UNIT_PRC = unit_price
-- Status Code Translations
-- WHY: Single-letter codes need business meaning
-- CUST_ST: A=Active, I=Inactive, S=Suspended
-- ORD_ST: P=Pending, C=Confirmed, S=Shipped, D=Delivered
"""
vn.train(documentation=legacy_context)

⚠️ Warning: Legacy systems are particularly challenging for Vanna. In my experience with government systems that have 20+ year-old schemas, expect to invest significantly more time in documentation training.
Validation and Confidence Scoring
The most critical aspect of enterprise Vanna deployment is implementing robust validation. Never execute Vanna-generated SQL in production without validation.
Complete Validation System:
import re
from typing import Dict, List
# WHY: Validation prevents dangerous operations and poor-quality SQL
# HOW: Multi-layered checks before execution
class VannaValidator:
def __init__(self, vanna_instance):
self.vn = vanna_instance
self.confidence_threshold = 0.70
self.max_rows_threshold = 100000
def validate_query(self, question: str, generated_sql: str) -> Dict:
"""
Comprehensive query validation with safety checks
WHY: Production systems require multiple validation layers
HOW: Check safety, complexity, performance, and confidence
"""
validation_result = {
"is_safe": False,
"confidence_score": 0.0,
"estimated_rows": 0,
"warnings": [],
"recommendations": []
}
# 1. Safety checks - prevent dangerous operations
# WHY: Prevent accidental data modification or deletion
dangerous_patterns = [
r'DROP\s+TABLE',
r'DELETE\s+FROM',
r'UPDATE\s+.*\s+SET',
r'TRUNCATE\s+TABLE',
r'ALTER\s+TABLE'
]
for pattern in dangerous_patterns:
if re.search(pattern, generated_sql, re.IGNORECASE):
validation_result["warnings"].append(
f"Dangerous operation detected: {pattern}"
)
return validation_result
# 2. Estimate query complexity and performance
# WHY: Complex queries may need manual review
if 'JOIN' in generated_sql.upper():
join_count = generated_sql.upper().count('JOIN')
if join_count > 5:
validation_result["warnings"].append(
f"High complexity: {join_count} joins detected"
)
# 3. Check for missing WHERE clauses on large tables
# WHY: Full table scans on large tables cause performance issues
large_tables = ['sales_transactions', 'audit_log', 'user_activities']
for table in large_tables:
if table in generated_sql and 'WHERE' not in generated_sql.upper():
validation_result["warnings"].append(
f"Missing WHERE clause on large table: {table}"
)
# 4. Estimate result set size
# WHY: Prevent memory issues from huge result sets
try:
# In production, execute count query against database
# This is a simplified example
count_sql = f"SELECT COUNT(*) as estimated_rows FROM ({generated_sql}) as subquery"
# estimated_rows = execute_query(count_sql)
estimated_rows = 1000 # Placeholder for demo
validation_result["estimated_rows"] = estimated_rows
if estimated_rows > self.max_rows_threshold:
validation_result["warnings"].append(
f"Large result set: {estimated_rows} rows (threshold: {self.max_rows_threshold})"
)
except Exception as e:
validation_result["warnings"].append(f"Query validation error: {e}")
# 5. Calculate confidence based on training similarity
# WHY: Low confidence suggests the question is outside training coverage
# HOW: In production, use vector similarity scores
validation_result["confidence_score"] = 0.75 # Placeholder
# 6. Final safety determination
# WHY: Only execute if all checks pass
validation_result["is_safe"] = (
len(validation_result["warnings"]) == 0 and
validation_result["confidence_score"] >= self.confidence_threshold
)
if not validation_result["is_safe"]:
# WHY: Provide actionable guidance for failed validation
validation_result["recommendations"] = [
"Review generated SQL manually before execution",
"Consider rephrasing the question for better accuracy",
"Add more specific criteria to narrow results",
"Train Vanna with similar question patterns"
]
return validation_result
# Usage example
validator = VannaValidator(vn)
def safe_vanna_query(question: str) -> Dict:
"""
Execute Vanna query with comprehensive validation
WHY: Production systems must validate before executing AI-generated SQL
HOW: Generate, validate, then conditionally execute
"""
try:
# Generate SQL from question
generated_sql = vn.generate_sql(question)
# Validate before execution
validation = validator.validate_query(question, generated_sql)
result = {
"question": question,
"generated_sql": generated_sql,
"validation": validation,
"data": None,
"executed": False
}
# Only execute if validation passes
if validation["is_safe"]:
# Execute query (implement your database connection)
# result["data"] = execute_database_query(generated_sql)
result["executed"] = True
print(f"✅ Query executed successfully")
print(f" Confidence: {validation['confidence_score']:.2f}")
else:
print(f"❌ Query failed validation")
for warning in validation["warnings"]:
print(f" - {warning}")
print(f"\nRecommendations:")
for rec in validation.get("recommendations", []):
print(f" - {rec}")
return result
except Exception as e:
return {
"question": question,
"error": str(e),
"executed": False
}
# Test the validation system
test_questions = [
"Show me our top customers this month",
"What are all transactions for customer ID 12345?",
"Delete all old records" # This should fail validation
]
for question in test_questions:
result = safe_vanna_query(question)
print(f"\nQuestion: {question}")
print(f"Executed: {result['executed']}")
    print("-" * 50)

❗ Important: In our federal government deployments, we never execute Vanna-generated SQL automatically. All queries go through validation, and those with confidence < 0.80 require manual review.
Accuracy Expectations by Training Approach
Based on production experience with multiple enterprise systems, here are realistic accuracy expectations:
| Training Level | Setup Time | Query Types | Expected Accuracy | Best Use Case |
|---|---|---|---|---|
| DDL Only | 1-4 hours | Simple SELECT, basic filters | 40-60% | POC, demos, very simple schemas |
| DDL + Documentation | 1-3 days | Business queries, moderate complexity | 60-75% | Development environments, internal tools |
| DDL + Docs + 20 Queries | 1-2 weeks | Standard business reports | 65-80% | Production with human validation |
| DDL + Docs + 50+ Queries | 3-4 weeks | Complex business logic | 70-85% | Enterprise production systems |
| Full Training + Validation | 2-3 months | Mission-critical reporting | 75-90% | Government/financial systems |
ℹ️ Note: These ranges reflect observed patterns in production systems. Your results will vary based on schema complexity, query diversity, and domain specificity.
Accuracy by Query Complexity:
**Simple Filtering and Aggregation:** 80-90%
- "Show top 10 customers by revenue"
- "Count orders by status"
- "What's the average order value this month?"
**Multi-Table Joins (Common Patterns):** 70-85%
- "Show customer orders with product details"
- "Revenue by sales rep and region"
- "Top products by category and quarter"
**Complex Business Logic:** 60-75%
- "Customers exceeding quota but under forecast"
- "Commission calculations with multi-tier rules"
- "Year-over-year growth by custom segments"
**Ad-hoc Analytical Queries:** 50-70%
- Novel question structures
- Unusual data combinations
- First-time business questions

Common Training Mistakes and Solutions
After implementing Vanna across multiple enterprise environments, here are the most common mistakes and how to avoid them.
Mistake 1: Over-Training on Edge Cases
Problem: Spending too much time training rare, complex queries while neglecting common business questions.
Impact: Good accuracy on unusual queries, poor performance on everyday needs.
Solution: Follow the 80/20 rule for training priorities.
# Training priority framework
training_priorities = {
"high_priority": [
# WHY: Train what users ask most frequently
"Revenue reporting queries",
"Customer status inquiries",
"Basic aggregations and filtering",
"Standard business metrics"
],
"medium_priority": [
# WHY: Add these after core patterns are solid
"Cross-functional reporting",
"Trend analysis queries",
"Complex joins",
"Historical comparisons"
],
"low_priority": [
# WHY: Add only if frequently requested
"One-off analytical queries",
"System administration queries",
"Data migration scripts"
]
}

💡 Tip: In our government systems, we analyzed 6 months of actual user queries to identify the top 50 patterns. Training these patterns first gave us 70% accuracy coverage for 85% of user questions.
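If you want to run a similar analysis, a few lines of Python over your query logs go a long way. This sketch assumes you can export user questions as plain strings (`logged_questions` is a placeholder for that export):

```python
from collections import Counter

# Placeholder: export these from your BI tool or query audit log
logged_questions = [
    "Show me revenue by region this quarter",
    "Top 10 customers by revenue",
    "revenue by region this quarter?",
]

def normalize(question: str) -> str:
    """Crude normalization so near-duplicate phrasings group together."""
    return " ".join(question.lower().replace("?", "").split())

pattern_counts = Counter(normalize(q) for q in logged_questions)

# The most frequent patterns are your highest-priority training targets
for pattern, count in pattern_counts.most_common(50):
    print(f"{count:4d}  {pattern}")
```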
Mistake 2: Insufficient Business Context
Problem: Training only technical schema without explaining business meaning.
Impact: Vanna generates syntactically correct SQL that answers the wrong business question.
Solution: Invest heavily in documentation training for domain-specific terminology.
Example:
# ❌ INSUFFICIENT: Only DDL
CREATE TABLE orders (
order_id INT PRIMARY KEY,
status VARCHAR(20)
);
# ✅ BETTER: DDL with business context
"""
CREATE TABLE orders (
order_id INT PRIMARY KEY,
status VARCHAR(20) -- Values: PENDING, CONFIRMED, SHIPPED, DELIVERED, CANCELLED
);
-- Status Workflow
-- PENDING: Order received, payment not confirmed
-- CONFIRMED: Payment confirmed, awaiting fulfillment
-- SHIPPED: Order in transit to customer
-- DELIVERED: Order received by customer
-- CANCELLED: Order cancelled (payment refunded if applicable)
"""

Mistake 3: Ignoring Data Quality Issues
Problem: Training on production data without documenting known data quality problems.
Impact: Vanna generates queries that fail due to NULL values, inconsistent data, or missing records.
Solution: Document data quality issues in your training.
# WHY: Known data issues need explicit documentation
# HOW: Tell Vanna how to handle missing/inconsistent data
data_quality_notes = """
-- Known Data Quality Issues
-- WHY: Pre-2020 data has different quality standards
1. customers.annual_revenue: NULL for ~15% of records (primarily pre-2020 data)
SOLUTION: Use COALESCE(annual_revenue, 0) or filter with IS NOT NULL
2. sales_transactions.region_id: Historical records use old region code system
SOLUTION: Use COALESCE(new_region_id, legacy_region_id) for complete coverage
3. products.discontinued_date: Not consistently populated
SOLUTION: Check both discontinued_date IS NULL AND product_status = 'ACTIVE'
-- Query Adjustments for Data Quality
-- Always filter NULL annual_revenue when doing revenue-based segmentation:
WHERE annual_revenue IS NOT NULL
-- Use proper region mapping:
COALESCE(r.new_region_id, r.legacy_region_id) as region_id
-- Check for current products correctly:
WHERE (p.discontinued_date IS NULL OR p.discontinued_date > GETDATE())
AND p.product_status = 'ACTIVE'
"""
vn.train(documentation=data_quality_notes)

Troubleshooting Guide
Issue 1: Low Confidence Scores
Symptoms:
- Confidence scores consistently below 0.70
- Vanna frequently says “I don’t know how to answer that”
- Generated SQL is wildly incorrect
Causes:
- Question doesn’t match training examples
- Insufficient query pattern training
- Missing business context documentation
Solutions:
# 1. Analyze the question and retrieve similar training examples
def diagnose_low_confidence(question: str):
"""
WHY: Understanding why confidence is low helps improve training
HOW: Check what training examples are being retrieved
"""
# Get the retrieved training examples for this question
# (Vanna API may provide this - check your version)
similar_examples = vn.get_similar_training_data(question)
print(f"Question: {question}")
print(f"\nRetrieved training examples:")
for i, example in enumerate(similar_examples[:3], 1):
print(f"\n{i}. Similarity: {example.get('similarity', 'N/A')}")
print(f" Question: {example.get('question', 'N/A')}")
# If similarity scores are all low (< 0.6), need more training
if all(ex.get('similarity', 0) < 0.6 for ex in similar_examples):
print("\n⚠️ No similar training examples found")
print(" Recommendation: Add training examples for this question type")
# 2. Add similar training examples
def improve_coverage(question: str, sql: str):
"""
WHY: Add variations of the same question type
HOW: Create multiple phrasings of similar questions
"""
# Add the original question
vn.train(question=question, sql=sql)
# Add variations to improve retrieval
variations = [
# Different phrasing of same question
"Show me " + question.lower(),
"Can you " + question.lower() + "?",
"I need to see " + question.lower()
]
for variation in variations:
        vn.train(question=variation, sql=sql)

Issue 2: Performance Problems
Symptoms:
- Vanna generates SQL without WHERE clauses on large tables
- Queries timeout or consume excessive resources
- Result sets are too large
Causes:
- No training examples showing proper filtering
- Missing performance considerations in documentation
Solutions:
# WHY: Train Vanna to always include appropriate filters
# HOW: Document performance requirements
performance_guidelines = """
-- Query Performance Guidelines
-- WHY: Large tables need filtering to prevent full scans
Large Tables (require WHERE clauses):
- sales_transactions (50M+ rows) - ALWAYS filter by date
- audit_log (100M+ rows) - ALWAYS filter by date and entity
- user_activities (200M+ rows) - ALWAYS filter by date and user
Standard Date Filters:
- Current month: WHERE transaction_date >= DATEADD(month, DATEDIFF(month, 0, GETDATE()), 0)
- Last 90 days: WHERE transaction_date >= DATEADD(day, -90, GETDATE())
- Current year: WHERE YEAR(transaction_date) = YEAR(GETDATE())
Maximum Result Set Guidelines:
- Reporting queries: Use TOP 100 or pagination
- Dashboard queries: Aggregate data, don't return raw rows
- Exports: Require explicit date range in question
"""
vn.train(documentation=performance_guidelines)
# Train examples that demonstrate proper filtering
good_performance_example = """
-- Question: Show recent high-value transactions
-- WHY: This demonstrates proper filtering on large tables
SELECT TOP 100
t.transaction_id,
t.transaction_date,
t.sale_amount,
c.company_name
FROM sales_transactions t
JOIN customers c ON t.customer_id = c.customer_id
WHERE t.transaction_date >= DATEADD(day, -30, GETDATE()) -- Required filter
AND t.sale_amount > 10000 -- High-value filter
ORDER BY t.transaction_date DESC, t.sale_amount DESC
"""
vn.train(
question="Show recent high-value transactions",
sql=good_performance_example
)

Issue 3: Incorrect Business Logic
Symptoms:
- SQL is syntactically correct but answers wrong business question
- Calculations don’t match expected results
- Wrong aggregation levels or grouping
Causes:
- Business rules not documented in training
- Ambiguous terminology in questions
Solutions:
# WHY: Explicit business rule documentation prevents misinterpretation
# HOW: Define calculations and business logic clearly
business_logic = """
-- Revenue Calculations
-- WHY: Different revenue types have different calculation rules
Recognized Revenue (completed sales):
- SUM(sale_amount) WHERE payment_status = 'CONFIRMED'
AND ship_date IS NOT NULL
AND ship_date < DATEADD(day, -30, GETDATE())
Pending Revenue (not yet recognized):
- SUM(sale_amount) WHERE payment_status = 'CONFIRMED'
AND (ship_date IS NULL OR ship_date >= DATEADD(day, -30, GETDATE()))
Total Sales (all orders regardless of status):
- SUM(sale_amount) with no status filter
-- Customer Lifetime Value (CLV)
-- WHY: CLV only includes recognized revenue
CLV = SUM(recognized_revenue) per customer over all time
Calculation:
SELECT
customer_id,
SUM(sale_amount) as lifetime_value
FROM sales_transactions
WHERE payment_status = 'CONFIRMED'
AND ship_date < DATEADD(day, -30, GETDATE())
GROUP BY customer_id
"""
vn.train(documentation=business_logic)

FAQ Section
❓ How many training examples do I need?
- For basic functionality: 20-30 query patterns covering common questions
- For production deployment: 50-100 query patterns including edge cases
- For mission-critical systems: 100+ patterns with comprehensive coverage
Quality matters more than quantity. Ten well-documented, diverse examples are better than 50 similar ones.
❓ Why is Vanna’s accuracy lower than I expected?
Common causes:
- Insufficient query pattern training - DDL alone isn’t enough
- Questions outside training scope - Users ask questions you didn’t train for
- Ambiguous questions - “Show me revenue” could mean 10 different things
- Complex business logic - Multi-step calculations need explicit examples
Solution: Analyze failed queries, add training examples for those patterns.
❓ Can Vanna learn from my corrections?
Yes, with proper implementation. When Vanna generates incorrect SQL:
- Correct the SQL manually
- Train the corrected version:
vn.train(question=original_question, sql=corrected_sql)

- This becomes a new training example for future similar questions
Build a feedback loop where corrections automatically become training data.
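A minimal sketch of such a loop - `record_correction` is a hypothetical helper, not part of Vanna's API, and it assumes the corrected SQL has already been reviewed and verified:

```python
def record_correction(question: str, bad_sql: str, corrected_sql: str) -> None:
    """Turn a human correction into a new training example."""
    # Log the rejected SQL for later gap analysis
    print(f"Correcting: {question}")
    print(f"  Rejected: {bad_sql[:80]}")
    # The corrected pair becomes retrievable context for similar questions
    vn.train(question=question, sql=corrected_sql)

# Example: an analyst fixes an over-broad generation
record_correction(
    question="Show enterprise customers by revenue",
    bad_sql="SELECT * FROM customers",
    corrected_sql=(
        "SELECT company_name, annual_revenue FROM customers "
        "WHERE annual_revenue > 10000000 ORDER BY annual_revenue DESC"
    ),
)
```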
❓ How much storage does Vanna training require?
Based on typical enterprise deployments:
- Vector embeddings: ~1KB per training example
- 50 query patterns: ~50KB
- 100 query patterns: ~100KB
- Full enterprise training (500+ examples): ~500KB-1MB
Storage is minimal. The time investment in creating quality training data is the real cost.
❓ What happens when my schema changes?
You need to retrain affected portions:
- New tables: Train DDL for new tables
- New columns: Update relevant DDL training
- Modified relationships: Update join examples
- Business logic changes: Update documentation training
Implement a training update process as part of your schema change workflow.
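Recent Vanna versions expose helpers for inspecting and pruning stored training data (verify the exact signatures and DataFrame columns against your installed release). A sketch of removing stale DDL after a table changes:

```python
# WHY: Stale DDL for a changed table pollutes retrieval results
# HOW: Find affected entries, remove them, then retrain the new DDL
changed_table = "sales_transactions"

# get_training_data() returns a DataFrame of stored items; the "content"
# and "id" column names may vary by version - check your release
training_data = vn.get_training_data()
stale = training_data[training_data["content"].str.contains(changed_table, na=False)]

for item_id in stale["id"]:
    vn.remove_training_data(id=item_id)

# Retrain the updated schema so retrieval reflects reality
vn.train(ddl="CREATE TABLE sales_transactions (transaction_id BIGINT PRIMARY KEY, customer_id INT, sale_amount DECIMAL(12,2));")
```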
❓ Should I use the same Vanna model for multiple databases?
It depends on schema similarity:
- Similar schemas (same domain): Yes, shared training improves all
- Different schemas (different domains): No, separate models prevent confusion
- Multi-tenant (same schema, different data): Yes, one model with tenant-aware queries
In our government systems, we use separate models for each functionally distinct database.
❓ How do I handle questions Vanna can’t answer?
Implement a fallback strategy:
def handle_vanna_query(question: str):
    result = vn.ask(question)
    if result.confidence < 0.70:
        # Low confidence - route to expert
        return {
            "sql": result.sql,
            "confidence": result.confidence,
            "action": "MANUAL_REVIEW_REQUIRED",
            "message": "This question requires expert review"
        }
    elif result.confidence < 0.85:
        # Medium confidence - show SQL for approval
        return {
            "sql": result.sql,
            "confidence": result.confidence,
            "action": "APPROVE_BEFORE_EXECUTE",
            "message": "Please review this SQL before execution"
        }
    else:
        # High confidence - execute with validation
        return execute_with_validation(result.sql)
Production Deployment Checklist
Before deploying Vanna in production, verify you’ve completed these steps:
✅ Training Completeness
- Core business entities trained (minimum 20 tables with meaningful comments)
- Business rules documented (calculation logic, valid values, workflows)
- 50+ common query patterns trained (covering 80% of typical user questions)
- Edge cases documented (data quality issues, special handling requirements)
- Cross-schema relationships documented (if applicable)
- Legacy system mappings provided (if applicable)
✅ Validation Framework
- Confidence thresholds established (recommend 0.70+ for auto-execution)
- Query safety checks implemented (prevent DROP, DELETE, UPDATE operations)
- Performance limits configured (max rows, join complexity, execution timeout)
- Manual review process defined (for low-confidence queries)
- Error handling implemented (graceful failures, user feedback)
- Testing completed (validation logic tested with good and bad queries)
✅ Security Measures
- Sensitive data sanitized (no PII, SSN, credentials in training data)
- Role-based access controls implemented (users only see authorized data)
- Audit logging enabled (track all generated and executed queries)
- Data masking configured (for non-production environments)
- Read-only database access (Vanna should only SELECT, never modify data)
- SQL injection prevention (parameterized queries, input validation)
✅ Monitoring and Maintenance
- Query performance tracking (execution time, resource usage)
- User feedback collection (thumbs up/down, accuracy ratings)
- Regular retraining schedule (weekly/monthly review and additions)
- Error analysis process (categorize and address failure patterns)
- Confidence score monitoring (track average confidence over time)
- Usage analytics (most common questions, success rates)
✅ Documentation and Support
- User documentation created (how to ask good questions)
- Support process defined (how to report issues, request new patterns)
- Training data version control (track what was trained when)
- Rollback procedure documented (how to revert to previous training state)
Setting Realistic Expectations
After implementing Vanna across multiple enterprise environments, here’s what you should expect:
When Vanna Works Best
Ideal Use Cases:
- Repetitive reporting queries with consistent patterns
- Standard business metrics and KPIs
- Well-documented database schemas with clear naming
- Users who understand SQL and can validate output
- Environments where 70-85% accuracy is acceptable with human oversight
Example Success Story: In our government deployment, Vanna handles 75% of standard monthly reports automatically, freeing analysts to focus on custom analysis. The key was thorough training on the 50 most common report types.
When Vanna Struggles
Challenging Scenarios:
- First-time questions with no similar training examples
- Complex multi-step analysis requiring domain expertise
- Ambiguous questions with multiple valid interpretations
- Schemas with poor naming or limited documentation
- Business logic that changes frequently
ℹ️ Note: In my experience, trying to force Vanna to handle these scenarios leads to frustration. Better to be explicit about what Vanna can and can’t do.
When to Use Manual SQL
Human-Written SQL Is Better For:
- One-off complex analysis requiring creative problem-solving
- Performance-critical queries needing optimization
- Queries requiring deep domain expertise or judgment calls
- Data migration or system administration tasks
- Situations where errors have serious consequences
💡 Tip: Position Vanna as an assistant that handles routine queries, not a replacement for SQL expertise. This sets appropriate expectations and leads to higher user satisfaction.
Ongoing Maintenance and Improvement
Successful Vanna implementations require continuous improvement based on real usage patterns.
Feedback Loop Implementation
# WHY: Learning from production queries improves accuracy over time
# HOW: Collect feedback, analyze patterns, add training
from datetime import datetime
from typing import Dict, List

class VannaMonitor:
def __init__(self):
self.query_log = []
self.feedback_log = []
def log_query_performance(
self,
question: str,
sql: str,
execution_time: float,
accuracy_rating: int,
user_feedback: str = None
):
"""
Track query performance for continuous improvement
WHY: Identify what works and what doesn't
HOW: Log every query with metadata for analysis
"""
self.query_log.append({
"timestamp": datetime.now(),
"question": question,
"generated_sql": sql,
"execution_time": execution_time,
"accuracy_rating": accuracy_rating, # 1-5 scale
"user_feedback": user_feedback
})
def identify_training_gaps(self) -> List[Dict]:
"""
Analyze query logs to identify areas needing more training
WHY: Focus training efforts where they'll have most impact
HOW: Find patterns in low-accuracy queries
"""
low_accuracy_patterns = []
# Group queries by similarity (simplified for example)
for entry in self.query_log:
if entry["accuracy_rating"] < 3: # Below average accuracy
low_accuracy_patterns.append({
"question_pattern": entry["question"],
"issue": "Low accuracy",
"suggested_training": "Add more similar examples",
"frequency": 1 # Count how often this pattern fails
})
return low_accuracy_patterns
def generate_training_recommendations(self) -> List[str]:
"""
Generate actionable training recommendations
WHY: Convert analysis into specific action items
HOW: Prioritize by impact and frequency
"""
gaps = self.identify_training_gaps()
recommendations = []
# Sort by frequency to prioritize common failures
gaps.sort(key=lambda x: x.get("frequency", 0), reverse=True)
for gap in gaps[:10]: # Top 10 priorities
recommendations.append(
f"Train more examples for: {gap['question_pattern']}"
)
return recommendations
# Usage in production
monitor = VannaMonitor()
# After each query execution
monitor.log_query_performance(
question="Show top customers by revenue",
sql="SELECT TOP 10...",
execution_time=0.45,
accuracy_rating=5,
user_feedback="Perfect, exactly what I needed"
)
# Weekly review process
training_recommendations = monitor.generate_training_recommendations()
for rec in training_recommendations:
    print(f"📝 {rec}")

Monthly Review Process
Week 1: Analyze Performance
- Review query logs for patterns
- Identify low-accuracy query types
- Measure average confidence scores
Week 2: Identify Gaps
- Group failed queries by category
- Prioritize by business impact
- Create list of needed training examples
Week 3: Add Training
- Develop new query patterns
- Update documentation for new business rules
- Add DDL for schema changes
Week 4: Validate Improvements
- Retest previously failed queries
- Measure accuracy improvements
- Update user documentation
Security Considerations
Enterprise Vanna deployments must address several critical security concerns.
Security Implementation
# WHY: Production systems need comprehensive security
# HOW: Multi-layered security controls
import re

def secure_training_pipeline():
"""
Implement security best practices for Vanna training
WHY: Protect sensitive data and prevent unauthorized access
HOW: Sanitize, validate, and control access
"""
# 1. Data sanitization - remove sensitive information
# WHY: Training data shouldn't contain real PII
sensitive_patterns = [
r'SSN|Social Security',
r'Password|pwd',
r'Credit Card|CC|card_number',
r'Bank Account|account_number',
r'\d{3}-\d{2}-\d{4}', # SSN format
r'\d{16}', # Credit card format
]
def sanitize_training_data(data: str) -> str:
"""Remove sensitive information from training data"""
sanitized = data
for pattern in sensitive_patterns:
sanitized = re.sub(pattern, '[REDACTED]', sanitized, flags=re.IGNORECASE)
return sanitized
# 2. Use synthetic data for training when possible
# WHY: Avoid real customer/employee data in examples
synthetic_data_examples = """
-- Use realistic but fake data for training
-- Customer: ACME Corp, Contoso Ltd (not real company names)
-- Amounts: Round numbers (100, 1000, 10000)
-- Dates: Use relative dates (DATEADD, GETDATE())
-- IDs: Use placeholder IDs (1, 2, 3)
Example:
SELECT * FROM customers
WHERE company_name = 'ACME Corp' -- Synthetic company
AND annual_revenue > 1000000 -- Round number
AND created_date >= DATEADD(year, -1, GETDATE()) -- Relative date
"""
# 3. Implement role-based access controls
# WHY: Users should only generate queries for data they can access
user_access_context = """
-- Role-based query restrictions
-- WHY: Enforce data access policies through Vanna
-- Sales Reps: Can only see their own customers and territories
WHERE sales_rep_id = @CurrentUserId
-- Managers: Can see their team's data
WHERE sales_rep_id IN (
SELECT employee_id FROM employees WHERE manager_id = @CurrentUserId
)
-- Executives: Can see all data (no filter required)
-- Finance: Can see financial data but not personal customer details
SELECT
customer_id, -- OK
SUM(sale_amount), -- OK
-- email_address, -- RESTRICTED for Finance role
-- phone_number -- RESTRICTED for Finance role
"""
return {
"sanitized_examples": synthetic_data_examples,
"access_controls": user_access_context
    }

⚠️ Warning: Never train Vanna with production data containing PII without proper sanitization. In federal systems, this can violate FISMA and other compliance requirements.
Security Checklist:
- All training data reviewed for sensitive information
- Synthetic data used where possible
- Role-based access controls implemented
- Read-only database credentials configured
- Audit logging enabled for all queries
- Data masking applied in non-production environments
- Regular security reviews scheduled
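To complement read-only credentials, a coarse application-level guard can reject anything that is not a read query before it reaches the database. A minimal sketch - this naive pattern check is a second line of defense, not a substitute for database-level permissions:

```python
import re

# Allow only statements that start as reads; block known write/DDL keywords.
# Word boundaries avoid false hits on identifiers like "last_update".
READ_ONLY = re.compile(r"^\s*(SELECT|WITH)\b", re.IGNORECASE)
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|TRUNCATE|ALTER|CREATE|GRANT|EXEC)\b",
    re.IGNORECASE,
)

def assert_read_only(sql: str) -> None:
    """Raise before execution if generated SQL is not a plain read query."""
    if not READ_ONLY.match(sql) or FORBIDDEN.search(sql):
        raise PermissionError(f"Blocked non-SELECT statement: {sql[:80]}")

# Usage: call alongside the VannaValidator checks before executing
assert_read_only("SELECT TOP 10 company_name FROM customers")  # passes
```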
Conclusion
Training Vanna AI effectively requires treating it as what it is: a powerful assistant that augments human SQL expertise rather than replacing it. Success depends on three critical factors:
- Strategic Training - Focus on business value, not edge cases
- Robust Validation - Never execute without proper checks
- Realistic Expectations - Understand what Vanna can and can’t do
With proper training and validation, Vanna can handle a significant portion of common business queries accurately, freeing up skilled developers to focus on complex analytical work. However, remember that confidence scores and validation are essential - never execute Vanna-generated SQL without proper review in production environments.
The investment in comprehensive training pays dividends in user productivity and data accessibility, but only when implemented with the security, validation, and maintenance practices outlined above.
Key Takeaways
- ✅ Start with core entities - Build foundation before adding complexity
- ✅ Document business logic - Schema alone isn’t enough
- ✅ Train query patterns - Examples are more valuable than descriptions
- ✅ Validate everything - Confidence scores and safety checks are mandatory
- ✅ Iterate continuously - Learn from production usage to improve accuracy
- ❌ Don’t expect 100% - Realistic targets are 65-85% with proper training
- ❌ Don’t skip validation - Always verify before execution
- ❌ Don’t train once and forget - Ongoing maintenance is required
Next Steps
Immediate Actions:
- Identify your 10-20 core database tables for initial training
- Document 3-5 key business rules and calculations
- Collect 10 most common user questions
- Set up Vanna with basic DDL training
- Implement validation framework before production use
Resources:
- Download our Vanna Training Template with enterprise-ready validation frameworks
- Check out Part 1 of this series for setup and architecture guidance
- Review Measuring Vanna AI Accuracy for comprehensive testing strategies
Need Help?
Implementing Vanna for an enterprise system with complex business logic, multi-tenant requirements, or compliance constraints? I’ve helped organizations navigate these exact challenges, from initial architecture through production deployment. Let’s discuss your specific requirements.
Schedule a consultation to discuss your Vanna AI implementation.
Have questions about implementing Vanna in your enterprise environment? Share your challenges in the comments or reach out directly. I’m always happy to discuss specific use cases and implementation strategies.