Agentic DE, AI, Data Engineering, DE, Snowflake

Agentic Data Foundry, Part 2: The Deployment Problem

#OnTapToday — This is Part 2 of a series. 

Part 1 introduced the Agentic Data Foundry: the idea that data engineers should stop writing transformation code and start describing intent — letting AI agents handle the SQL. If you haven’t read that, it’s worth a few minutes.

This post asks the question that every engineer eventually asks after seeing the demo: “Okay, but how do you actually ship it?”


The Gap Between “Generated” and “Deployed”

The Agentic Data Foundry makes a compelling promise: an AI agent reads your Schema Contracts and Transformation Directives, generates CREATE DYNAMIC TABLE DDL, validates it, and materializes your Silver and Gold layers — automatically.

I’ve watched that demo land in dozens of rooms. The reaction is almost always the same. Executives are energized. Architects lean in. And then — quietly — an experienced engineer raises their hand:

“That’s impressive. But I have 15 environments, a change control board, and three compliance teams. How does this get from a notebook to production?”

It’s the right question. Generating SQL is half the problem. Deploying SQL — safely, repeatably, with a full audit trail, across dev and staging and production — is the other half. And for a long time, we didn’t have a clean answer.


What “Ship It” Means for a Data Platform

In application engineering, Infrastructure-as-Code is table stakes. You write Terraform, you run terraform plan, you review the diff, you run terraform apply. The platform determines what needs to change. You approve. It executes the plan you approved.

Data platforms have historically been the exception. We’ve been doing CREATE OR REPLACE TABLE and hoping nothing downstream breaks. Schema changes are deployed through migration scripts that are carefully ordered, one-way, and hard to roll back. Most “change management” processes for databases are really just “a human runs SQL in a specific order and prays.”

The Agentic Data Foundry makes this worse before it makes it better, because now it’s not even a human running SQL; it’s an AI. The same AI that generates 80-85% accurate DDL on the first attempt also occasionally produces a FULL_NAME column that doesn’t exist in the source. You absolutely cannot let that go straight to production.
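A Validator catch for that FULL_NAME case might look like the following minimal Python sketch. It assumes, hypothetically, that the agent reports the source columns it references as structured data (`GeneratedDefinition` and its fields are illustrative names, not part of the Foundry); a production validator would more likely parse the SQL or ask the database to compile it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneratedDefinition:
    """Hypothetical structured output of the agent's Executor phase."""
    target: str
    source_table: str
    referenced_columns: tuple[str, ...]


def missing_columns(defn: GeneratedDefinition, source_schemas: dict[str, tuple[str, ...]]) -> list[str]:
    """Return the referenced columns that do not exist in the source table (case-insensitive)."""
    available = {col.upper() for col in source_schemas.get(defn.source_table.upper(), ())}
    return [col for col in defn.referenced_columns if col.upper() not in available]


def validate(defn: GeneratedDefinition, source_schemas: dict[str, tuple[str, ...]]) -> None:
    """Raise before anything reaches a database if the agent invented a column."""
    missing = missing_columns(defn, source_schemas)
    if missing:
        raise ValueError(f"{defn.target}: columns not found in {defn.source_table}: {missing}")


# Example: the source has FIRST_NAME and LAST_NAME, but the agent referenced FULL_NAME.
source_schemas = {"BRONZE.CUSTOMERS": ("CUSTOMER_ID", "FIRST_NAME", "LAST_NAME", "EMAIL")}
defn = GeneratedDefinition(
    target="SILVER.CUSTOMERS",
    source_table="BRONZE.CUSTOMERS",
    referenced_columns=("CUSTOMER_ID", "FULL_NAME", "EMAIL"),
)
print(missing_columns(defn, source_schemas))  # ['FULL_NAME']
```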

What we needed was a Plan → Review → Deploy workflow for data objects. The same workflow Terraform provides for infrastructure. Declarative. Diffable. Auditable.


Enter Declarative Database Change Management

The emerging category of Declarative Database Change Management (DCM) brings exactly this model to cloud data platforms. Instead of writing imperative CREATE OR REPLACE statements, you define the desired state of your database objects. The platform computes the diff — what needs to be created, altered, or dropped — and presents you with a plan before touching anything.
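To make “the platform computes the diff” concrete, here is a minimal Python sketch of desired-state diffing. It is not how any specific DCM tool is implemented: it assumes each object’s definition has already been normalized to a comparable string, whereas real tools compare parsed object properties and decide between an ALTER and a replace.

```python
def compute_plan(desired: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Diff desired state against current state.

    Both arguments map a fully qualified object name to its normalized definition.
    Nothing is executed: the result only describes what a deploy would do.
    """
    create = sorted(desired.keys() - current.keys())
    drop = sorted(current.keys() - desired.keys())
    alter = sorted(name for name in desired.keys() & current.keys() if desired[name] != current[name])
    return {"create": create, "alter": alter, "drop": drop}


current = {
    "SILVER.CUSTOMERS": "SELECT customer_id, first_name FROM BRONZE.CUSTOMERS",
    "SILVER.LEGACY_ORDERS": "SELECT * FROM BRONZE.ORDERS_V1",
}
desired = {
    "SILVER.CUSTOMERS": "SELECT customer_id, first_name, email FROM BRONZE.CUSTOMERS",
    "GOLD.CUSTOMER_LTV": "SELECT customer_id, SUM(amount) AS ltv FROM SILVER.ORDERS GROUP BY customer_id",
}
print(compute_plan(desired, current))
# {'create': ['GOLD.CUSTOMER_LTV'], 'alter': ['SILVER.CUSTOMERS'], 'drop': ['SILVER.LEGACY_ORDERS']}
```

Unchanged objects appear in none of the three lists, so an empty plan means the database already matches the definitions.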

The workflow is conceptually identical to Terraform:

-- Traditional (imperative) approach:
CREATE OR REPLACE DYNAMIC TABLE SILVER.CUSTOMERS ...;  -- ← "Just do it"

# DCM approach:
snow dcm plan    # ← "Here's what will change"
snow dcm deploy  # ← "Now execute the plan"

The plan step produces a structured diff: new objects, modified objects, dropped objects, dependencies that will be invalidated. An engineer reviews it. A CI/CD pipeline gates on it. Only then does anything change in the database.
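The review and gating step can be sketched the same way. In this minimal Python example, the dependency map and the rule “block the pipeline if a kept object still reads from a dropped one” are assumptions for illustration; a DCM tool derives dependencies from the platform itself and reports them in its plan.

```python
from collections import deque


def downstream_of(changed: list[str], depends_on: dict[str, set[str]]) -> set[str]:
    """Return every object that transitively reads from any changed object.

    depends_on maps each object to the set of upstream objects it reads from.
    """
    # Invert the graph so we can walk from an upstream object to its readers.
    readers: dict[str, set[str]] = {}
    for obj, upstream in depends_on.items():
        for up in upstream:
            readers.setdefault(up, set()).add(obj)
    seen: set[str] = set()
    queue = deque(changed)
    while queue:
        node = queue.popleft()
        for reader in readers.get(node, set()):
            if reader not in seen:
                seen.add(reader)
                queue.append(reader)
    return seen - set(changed)


def blocking_issues(plan: dict[str, list[str]], depends_on: dict[str, set[str]]) -> list[str]:
    """Flag drops that would break an object the plan keeps. A CI job could fail on any issue."""
    dropped = set(plan["drop"])
    issues = []
    for obj, upstream in sorted(depends_on.items()):
        if obj in dropped:
            continue
        for up in sorted(upstream & dropped):
            issues.append(f"{obj} still reads from dropped object {up}")
    return issues


depends_on = {
    "SILVER.CUSTOMERS": {"BRONZE.CUSTOMERS"},
    "GOLD.CUSTOMER_LTV": {"SILVER.CUSTOMERS", "SILVER.ORDERS"},
    "GOLD.EXEC_REPORT": {"GOLD.CUSTOMER_LTV"},
}
plan = {"create": [], "alter": ["SILVER.CUSTOMERS"], "drop": ["SILVER.ORDERS"]}
print(sorted(downstream_of(plan["alter"] + plan["drop"], depends_on)))
# ['GOLD.CUSTOMER_LTV', 'GOLD.EXEC_REPORT']
print(blocking_issues(plan, depends_on))
# ['GOLD.CUSTOMER_LTV still reads from dropped object SILVER.ORDERS']
```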

For the Agentic Data Foundry, this is transformative.


The Agentic + IaC Workflow

With a DCM layer plugged in, the Executor phase no longer executes DDL. It writes DDL to version-controlled definition files. The dry_run mode that previously logged SQL without executing it maps naturally onto a PLAN command that previews all changes before any object is touched.
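Here is a minimal sketch of the Executor in its file-writing role, assuming a hypothetical one-file-per-object layout (`<schema>/<object>.sql`) rather than any layout a particular DCM tool requires. The `dry_run` flag keeps the old preview-only behavior: it reports what would be written without touching disk. `TRANSFORM_WH` is a hypothetical warehouse name.

```python
import tempfile
from pathlib import Path


def write_definitions(definitions: dict[str, str], root: Path, dry_run: bool = False) -> list[Path]:
    """Write each object's desired-state DDL to <root>/<schema>/<object>.sql.

    With dry_run=True nothing touches disk; the function only reports the paths
    it would write, mirroring the Executor's old log-only mode.
    """
    paths = []
    for name, ddl in sorted(definitions.items()):
        schema, obj = name.split(".", 1)
        path = root / schema.lower() / f"{obj.lower()}.sql"
        paths.append(path)
        if not dry_run:
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(ddl.strip() + "\n", encoding="utf-8")
    return paths


definitions = {
    "SILVER.CUSTOMERS": (
        "CREATE DYNAMIC TABLE SILVER.CUSTOMERS\n"
        "  TARGET_LAG = '5 minutes'\n"
        "  WAREHOUSE = TRANSFORM_WH\n"
        "AS SELECT customer_id, email FROM BRONZE.CUSTOMERS;"
    ),
}
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    preview = write_definitions(definitions, root, dry_run=True)
    previewed_nothing_written = not any(p.exists() for p in preview)
    written = write_definitions(definitions, root)
    written_text = written[0].read_text(encoding="utf-8")
    relative_paths = [p.relative_to(root).as_posix() for p in written]

print(relative_paths)  # ['silver/customers.sql']
```

Once the files exist, committing them to git is what turns the agent’s output into something a reviewer can diff.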

The full workflow becomes:

Foundry Phase | What Happens | DCM Role
Trigger | Detects new table or schema drift | -
Planner | Assembles context, determines strategy | -
Executor | Generates desired-state definitions | Writes to versioned files
Validator | Validates DDL correctness | PLAN previews all changes
Human Review | Engineer reviews in management UI | Reviews plan diff
Deploy | - | DEPLOY applies changes atomically
Reflector | Extracts learnings from execution | -
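The phase table can also be read as a pipeline with a single hard gate. This Python sketch is illustrative only: the handler mechanism is hypothetical, and letting the Reflector run after a rejected plan is an assumption, not something the Foundry specifies.

```python
from typing import Callable

# Phase order taken from the workflow table.
PHASES = ("Trigger", "Planner", "Executor", "Validator", "Human Review", "Deploy", "Reflector")


def run_foundry(handlers: dict[str, Callable[[], str]], approved: bool) -> list[str]:
    """Run the phases in order, gating Deploy on an approved plan.

    Phases without a handler just log "<phase>: ok". Reflector runs either way,
    so a rejected plan can still feed back into the Learnings (an assumption).
    """
    log = []
    for phase in PHASES:
        if phase == "Deploy" and not approved:
            log.append("Deploy: skipped (plan not approved)")
            continue
        handler = handlers.get(phase)
        log.append(handler() if handler else f"{phase}: ok")
    return log


print(run_foundry({}, approved=False)[-2:])
# ['Deploy: skipped (plan not approved)', 'Reflector: ok']
```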

Every agent-generated transformation is now a file — diffable, reviewable, committable to git — before it ever touches a database. Your change control board reviews a plan diff, not raw SQL. Your compliance team has a complete audit trail that traces every materialized table back through the exact reasoning chain that produced it.
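For that audit trail to hold up, each deployed definition needs to be tied to the exact file content that was reviewed. One way to sketch such a record in Python uses a SHA-256 fingerprint of the file; the `commit` and `trace_id` fields are hypothetical, and how an agent’s reasoning chain is captured depends entirely on the agent implementation.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AuditRecord:
    object_name: str
    definition_path: str
    definition_sha256: str  # fingerprint of the exact file content that was planned
    commit: str             # hypothetical: git commit that introduced the file
    trace_id: str           # hypothetical: ID of the agent reasoning trace that produced it


def audit_record(object_name: str, path: str, text: str, commit: str, trace_id: str) -> AuditRecord:
    """Build one audit entry; identical file content always yields the same fingerprint."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return AuditRecord(object_name, path, digest, commit, trace_id)


ddl = (
    "CREATE DYNAMIC TABLE SILVER.CUSTOMERS TARGET_LAG = '5 minutes' "
    "WAREHOUSE = TRANSFORM_WH AS SELECT customer_id, email FROM BRONZE.CUSTOMERS;\n"
)
record = audit_record("SILVER.CUSTOMERS", "silver/customers.sql", ddl,
                      commit="<commit-sha>", trace_id="<trace-id>")
print(json.dumps(asdict(record), indent=2))
```

Comparing the fingerprint at review time with the one at deploy time is a cheap way to show that what shipped is byte-for-byte what was approved.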


Multi-Environment Without the Ceremony

Here’s a practical problem every enterprise faces: the same pipeline needs to run in dev, staging, and production — with different databases, different warehouse sizes, different refresh intervals.

DCM’s template variable system (Jinja2 or equivalent) solves this by parameterizing the things that change across environments:

configurations:
  DEV:
    db_name: FOUNDRY_DEV
    target_freshness: '5 minutes'
    warehouse_size: XSMALL
  PROD:
    db_name: FOUNDRY_PROD
    target_freshness: '1 minute'
    warehouse_size: MEDIUM

The agent-generated definition is environment-agnostic. The deployment configuration is environment-specific. Same Schema Contracts, same Directives, same Learnings drive agent behavior everywhere — only the target database, compute size, and refresh interval vary.
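The split between an environment-agnostic definition and environment-specific parameters can be shown with Python’s standard-library `string.Template`, used here as a stand-in for Jinja2. The values mirror the configuration above; `FOUNDRY_WH` and the table names are hypothetical, and DCM’s actual template syntax and file layout may differ.

```python
from string import Template

# Mirrors the DEV/PROD configurations. string.Template stands in for Jinja2 here.
CONFIGURATIONS = {
    "DEV": {"db_name": "FOUNDRY_DEV", "target_freshness": "5 minutes", "warehouse_size": "XSMALL"},
    "PROD": {"db_name": "FOUNDRY_PROD", "target_freshness": "1 minute", "warehouse_size": "MEDIUM"},
}

# Environment-agnostic definition: only the parameters differ per environment.
# FOUNDRY_WH and the SILVER/BRONZE objects are hypothetical names.
DEFINITION = Template(
    "CREATE WAREHOUSE FOUNDRY_WH WAREHOUSE_SIZE = $warehouse_size;\n"
    "CREATE DYNAMIC TABLE $db_name.SILVER.CUSTOMERS\n"
    "  TARGET_LAG = '$target_freshness'\n"
    "  WAREHOUSE = FOUNDRY_WH\n"
    "AS SELECT customer_id, email FROM $db_name.BRONZE.CUSTOMERS;\n"
)


def render(environment: str) -> str:
    # substitute() raises KeyError if a placeholder has no value for this environment.
    return DEFINITION.substitute(CONFIGURATIONS[environment])


print(render("PROD"))
```

Adding a staging environment in this sketch means adding one more entry to `CONFIGURATIONS`; the definition itself does not change.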

The Foundry’s claim that “environment provisioning is instant” is made real by this pattern. There’s no “we need three weeks to stand up the staging pipeline.” The staging pipeline is the prod pipeline with a different parameter file.


RBAC as Code

The Agentic Data Foundry’s security model relies on role-based access control: agent-generated DDL executes under constrained role privileges, row access policies propagate through the dependency chain, and no generated query can access data the role can’t see.

DCM closes the loop by putting the security model itself under version control. The roles and grants that define what agents can and can’t do are defined declaratively, diffed like any other change, and promoted through the same CI/CD pipeline as the transformations they govern. You’re not just auditing what the agent built — you’re auditing what the agent was allowed to build and proving it hasn’t changed.
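Declarative grants work like declarative tables: compare the privileges in version control with the privileges actually granted, and emit only the difference. This minimal Python sketch assumes grants are modeled as simple tuples and that `FOUNDRY_AGENT` is the (hypothetical) role agent-generated DDL runs under; the output uses standard Snowflake `GRANT`/`REVOKE` statement shapes.

```python
from typing import NamedTuple


class Grant(NamedTuple):
    privilege: str
    object_type: str
    object_name: str
    role: str


def grant_plan(desired: set[Grant], current: set[Grant]) -> list[str]:
    """Return the REVOKE and GRANT statements that move current privileges to the desired set.

    An empty list means the deployed privileges match what is in version control.
    """
    revokes = [
        f"REVOKE {g.privilege} ON {g.object_type} {g.object_name} FROM ROLE {g.role};"
        for g in sorted(current - desired)
    ]
    grants = [
        f"GRANT {g.privilege} ON {g.object_type} {g.object_name} TO ROLE {g.role};"
        for g in sorted(desired - current)
    ]
    return revokes + grants


AGENT = "FOUNDRY_AGENT"  # hypothetical role used by agent-generated DDL
desired = {
    Grant("USAGE", "SCHEMA", "FOUNDRY_DEV.SILVER", AGENT),
    Grant("CREATE DYNAMIC TABLE", "SCHEMA", "FOUNDRY_DEV.SILVER", AGENT),
    Grant("SELECT", "TABLE", "FOUNDRY_DEV.BRONZE.CUSTOMERS", AGENT),
}
current = {
    Grant("USAGE", "SCHEMA", "FOUNDRY_DEV.SILVER", AGENT),
    Grant("SELECT", "TABLE", "FOUNDRY_DEV.BRONZE.PAYMENTS", AGENT),  # drift: granted by hand
}
for statement in grant_plan(desired, current):
    print(statement)
# REVOKE SELECT ON TABLE FOUNDRY_DEV.BRONZE.PAYMENTS FROM ROLE FOUNDRY_AGENT;
# GRANT CREATE DYNAMIC TABLE ON SCHEMA FOUNDRY_DEV.SILVER TO ROLE FOUNDRY_AGENT;
# GRANT SELECT ON TABLE FOUNDRY_DEV.BRONZE.CUSTOMERS TO ROLE FOUNDRY_AGENT;
```

An empty result is the “proving it hasn’t changed” case: the deployed privileges match the committed ones.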


What This Changes

The shift from “agent executes DDL directly” to “agent writes files, DCM deploys” isn’t just operational hygiene. It changes the fundamental nature of what the agentic system produces.

Before DCM, the agent’s output is an ephemeral execution. After DCM, the agent’s output is a durable artifact: a versioned file that represents the complete desired state of a transformation. That file can be reviewed, rolled back, branched, compared, and tested independently of the database it targets.
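Because the artifact is a file, some checks need no database at all. The two rules in this Python sketch (every definition declares a `TARGET_LAG`, and none hard-codes an environment database) are examples of rules a team might adopt, not rules any DCM tool enforces; the `{{ ... }}` placeholders assume Jinja2-style templates, and the database names are taken from the configuration example.

```python
import re

# Environment database names from the configuration example (used here as a house rule).
ENV_DATABASES = ("FOUNDRY_DEV", "FOUNDRY_PROD")


def lint_definition(name: str, text: str) -> list[str]:
    """Database-free checks on one definition file; returns a list of problems."""
    problems = []
    if not re.search(r"\bTARGET_LAG\s*=", text, re.IGNORECASE):
        problems.append(f"{name}: no TARGET_LAG declared")
    for db in ENV_DATABASES:
        if re.search(rf"\b{db}\b", text, re.IGNORECASE):
            problems.append(f"{name}: hard-codes environment database {db}")
    return problems


templated = (
    "CREATE DYNAMIC TABLE {{ db_name }}.SILVER.CUSTOMERS\n"
    "  TARGET_LAG = '{{ target_freshness }}'\n"
    "AS SELECT customer_id, email FROM {{ db_name }}.BRONZE.CUSTOMERS;"
)
hard_coded = "CREATE DYNAMIC TABLE FOUNDRY_PROD.SILVER.ORDERS AS SELECT * FROM FOUNDRY_PROD.BRONZE.ORDERS;"

print(lint_definition("silver/customers.sql", templated))  # []
print(lint_definition("silver/orders.sql", hard_coded))
```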

The Agentic Data Foundry doesn’t just automate pipeline construction anymore. It generates infrastructure.


Where We Are

In Part 1, we established that AI agents can replace hand-coded ETL for the 80% of pipeline work that is repetitive and pattern-based. In Part 2, we’ve shown how DCM bridges the gap between “generated” and “deployed” — giving the agentic system the safety, repeatability, and auditability that enterprise deployments require.

But there’s still a gap. DCM handles deployment. Who handles execution standards: testing, documentation, lineage, and CI/CD integration that any data engineer can reason about without understanding the underlying agent?

That’s the question Part 3 answers.

Next up: The Determinism Dial: How dbt Makes Agentic Output Production-Grade.
