Crawl

Pre-migration intelligence for enterprise data infrastructure.

Data catalogs tell you what data you have. Crawl tells you what breaks when you migrate.

Extract business logic from stored procedures, ETL jobs, and warehouse views: the undocumented rules buried in your data stack that block migration projects. Open-source, vendor-neutral, and powered by a local-first LLM.


What Crawl Does

Input: a 200-line stored procedure that nobody on the team wrote.

Output:

sp_calculate_customer_churn (confidence: HIGH)
├── Rule 1: Customers inactive >90 days flagged as at-risk
├── Rule 2: Churn score weighted by lifetime value (table: dim_customer)
├── Rule 3: ⚠️ References dim_product_v2 (TABLE DROPPED 2022-06-14)
├── Rule 4: Monthly aggregation via vendor-specific DATEADD syntax
└── Triage: CRITICAL (12 downstream dependencies) | MEDIUM migration risk

Contradictions found:
└── Rule 2 conflicts with sp_calculate_ltv line 47 (different LTV formula)
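To make the shape of that output concrete, here is a minimal sketch of a per-object result and a renderer that prints it as a tree. The `Rule` and `ExtractionResult` classes and `render_tree` are hypothetical illustrations, not Crawl's actual schema or output code.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """One extracted business rule (hypothetical shape, not Crawl's real schema)."""
    text: str
    warning: bool = False  # True when the rule touches something likely to break


@dataclass
class ExtractionResult:
    """Hypothetical per-object result of the extraction step."""
    object_name: str
    confidence: str
    rules: list[Rule] = field(default_factory=list)
    triage: str = ""
    contradictions: list[str] = field(default_factory=list)


def _branches(items: list[str]) -> list[str]:
    """Prefix each item with a tree connector; the last item gets the corner."""
    return [
        ("└── " if i == len(items) - 1 else "├── ") + item
        for i, item in enumerate(items)
    ]


def render_tree(result: ExtractionResult) -> str:
    """Render one result as a plain-text tree like the example output."""
    lines = [f"{result.object_name} (confidence: {result.confidence})"]
    items = [
        f"Rule {n}: {'⚠️ ' if rule.warning else ''}{rule.text}"
        for n, rule in enumerate(result.rules, start=1)
    ]
    if result.triage:
        items.append(f"Triage: {result.triage}")
    lines += _branches(items)
    if result.contradictions:
        lines += ["", "Contradictions found:"]
        lines += _branches(result.contradictions)
    return "\n".join(lines)


example = ExtractionResult(
    object_name="sp_calculate_customer_churn",
    confidence="HIGH",
    rules=[
        Rule("Customers inactive >90 days flagged as at-risk"),
        Rule("References dim_product_v2 (TABLE DROPPED 2022-06-14)", warning=True),
    ],
    triage="CRITICAL (12 downstream dependencies) | MEDIUM migration risk",
)
print(render_tree(example))
```

Keeping triage and contradictions as separate fields means the same result can be rendered as a tree for humans or serialized as JSON for tooling.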

CLI Commands

crawl scan      Connect to a database and discover all stored procs, views, functions
crawl extract   Extract human-readable business rules using hybrid AST + LLM analysis
crawl triage    Score each object by criticality, complexity, and migration risk
crawl diff      Compare extracted logic between environments or time periods
crawl export    Output to dbt-docs YAML, JSON, or Markdown
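The comparison behind `crawl diff` can be sketched as a set difference per object. This assumes each snapshot (for example staging vs. production) has already been reduced to a mapping from object name to a set of normalized rule strings; that representation and the `diff_rules` name are hypothetical, and a real diff might match rules by meaning rather than exact text.

```python
def diff_rules(
    before: dict[str, set[str]], after: dict[str, set[str]]
) -> dict[str, dict[str, list[str]]]:
    """Report added and removed rules per object between two snapshots.

    Objects present in only one snapshot show all of their rules as
    added or removed. Objects with no changes are omitted.
    """
    report: dict[str, dict[str, list[str]]] = {}
    for name in sorted(before.keys() | after.keys()):
        old = before.get(name, set())
        new = after.get(name, set())
        added = sorted(new - old)
        removed = sorted(old - new)
        if added or removed:
            report[name] = {"added": added, "removed": removed}
    return report


staging = {"sp_calculate_ltv": {"LTV = revenue - cost"}}
prod = {"sp_calculate_ltv": {"LTV = revenue * margin"}}
print(diff_rules(staging, prod))
```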

The Problem

Every cloud migration hits the same wall: thousands of stored procedures and ETL jobs encoding business rules in vendor-specific dialects that nobody documented. Migration tools can translate your SQL, but they can't tell you what it means — or whether it's even still relevant.

Crawl is Step 0: the pre-migration intelligence layer that runs before you use Datafold, Lakebridge, dbt, or SnowConvert.

Questions Crawl Answers

What do we have? — Inventory with auto-generated business-rule summaries
What does it do? — Human-readable logic, not just column lineage
Is it still alive? — Dead code detection, contradiction flagging
What should we migrate first? — Triage by criticality, complexity, risk
What breaks if we move? — Vendor-specific logic that won't survive a platform change
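One signal for "Is it still alive?" is an object that references a table that no longer exists, like Rule 3 in the example output. A minimal sketch of that check, assuming table references have already been pulled out of each object's source (for instance by the AST pass) and that the current table list comes from the system catalog; the function name and input shapes are hypothetical.

```python
def find_broken_references(
    references: dict[str, set[str]], existing_tables: set[str]
) -> dict[str, list[str]]:
    """Return, per object, the referenced tables missing from the current catalog.

    Names are compared case-insensitively because many engines fold unquoted
    identifiers to one case (an assumption; quoted identifiers behave differently).
    """
    existing = {t.lower() for t in existing_tables}
    broken: dict[str, list[str]] = {}
    for obj, tables in references.items():
        missing = sorted(t for t in tables if t.lower() not in existing)
        if missing:
            broken[obj] = missing
    return broken


refs = {"sp_calculate_customer_churn": {"dim_customer", "dim_product_v2"}}
print(find_broken_references(refs, {"dim_customer"}))
```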

How It Works

YOUR LEGACY DATA STACK (stored procs, ETL, views)
          │
          ▼
  ┌───────────────┐
  │     CRAWL     │  ← Step 0: Understand & Triage
  │ (open-source) │
  └───────┬───────┘
          │
          │  outputs: business rules, triage scores,
          │  migration risk report, dbt-compatible docs
          │
    ┌─────┴──────┬───────────────┬────────────────┐
    ▼            ▼               ▼                ▼
┌───────┐   ┌──────────┐  ┌─────────────┐  ┌──────────────┐
│  dbt  │   │ Datafold │  │ Lakebridge  │  │ SnowConvert  │
│models │   │  Agent   │  │(Databricks) │  │ (Snowflake)  │
└───────┘   └──────────┘  └─────────────┘  └──────────────┘
Step 1:     Step 1:       Step 1:          Step 1:
Convert     Translate     Convert to       Convert to
to dbt      SQL           Databricks       Snowflake
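As one illustration of a "dbt-compatible docs" output, the sketch below turns extracted rules into a dbt docs block (`{% docs name %} ... {% enddocs %}`), which dbt models can then reference with `{{ doc("name") }}`. The function and the body layout are assumptions; this is not necessarily how `crawl export` formats its dbt-docs output.

```python
def to_dbt_docs_block(object_name: str, rules: list[str], triage: str) -> str:
    """Render extracted rules as a dbt docs block (hypothetical layout)."""
    body = "\n".join(f"- {rule}" for rule in rules)
    return (
        f"{{% docs {object_name} %}}\n"  # doubled braces emit a literal "{%"
        f"Business rules extracted from `{object_name}`:\n\n"
        f"{body}\n\n"
        f"Triage: {triage}\n"
        "{% enddocs %}\n"
    )


print(to_dbt_docs_block(
    "sp_calculate_customer_churn",
    ["Customers inactive >90 days flagged as at-risk"],
    "CRITICAL",
))
```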

Design Principles

Step 0, not Step 1. Crawl doesn't migrate your code — it tells you what you have so migration tools can do their job.

Vendor-neutral. Designed to work with any source database and any target platform. No lock-in.

Local-first LLM. Enterprise code never needs to leave your environment. Supports Ollama and vLLM out of the box.
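To show what "never leaves your environment" can look like in practice, here is a sketch that sends a procedure's source to a locally running Ollama server through its `/api/generate` endpoint on the default port 11434, using only the standard library. The prompt wording and the `llama3` model name are placeholders, not Crawl's actual prompt or default model. vLLM would instead be reached through its OpenAI-compatible server.

```python
import json
import urllib.request

# Ollama's default local endpoint; traffic stays on this machine.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(source: str, model: str = "llama3") -> urllib.request.Request:
    """Build a non-streaming generate request (model name is a placeholder)."""
    prompt = (
        "List the business rules implemented by this stored procedure "
        "as short numbered statements.\n\n" + source
    )
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_rules(source: str, model: str = "llama3", timeout: float = 120.0) -> str:
    """Send the request and return the model's text from the "response" field."""
    with urllib.request.urlopen(build_request(source, model), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

Splitting request construction from sending makes it easy to log or audit exactly what would be sent before any call is made.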

Open-source (Apache 2.0). Your understanding of your data belongs to you, not a vendor.

Enterprise Safety

Crawl is designed to connect to enterprise databases safely.

Read-only, always. No writes, no DDL, no DML. Read-only transaction mode enforced.

Catalog-only access. Reads stored procedure source code from system catalogs. Never queries user table contents.
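For PostgreSQL, the read-only and catalog-only rules can be sketched as below. This assumes a DB-API 2.0 driver such as psycopg2 and PostgreSQL 11 or later; the query and the `fetch_routine_sources` name are illustrative, not Crawl's actual code.

```python
# Hardcoded catalog query for PostgreSQL 11+ (pg_proc.prokind first appeared in 11).
# It reads routine definitions from system catalogs only; no user tables are touched.
CATALOG_QUERY = """
SELECT n.nspname                 AS schema_name,
       p.proname                 AS routine_name,
       p.prokind                 AS kind,  -- 'f' = function, 'p' = procedure
       pg_get_functiondef(p.oid) AS source
FROM pg_catalog.pg_proc p
JOIN pg_catalog.pg_namespace n ON n.oid = p.pronamespace
WHERE n.nspname NOT IN ('pg_catalog', 'information_schema')
  AND p.prokind IN ('f', 'p')
ORDER BY n.nspname, p.proname
"""


def fetch_routine_sources(conn):
    """Return (schema, name, kind, source) rows using any DB-API 2.0 connection."""
    cur = conn.cursor()
    try:
        # Make every later transaction in this session read-only. The setting
        # applies to subsequent transactions, so commit to start a fresh one.
        cur.execute("SET SESSION CHARACTERISTICS AS TRANSACTION READ ONLY")
        conn.commit()
        # Refuse to continue unless the server confirms read-only mode.
        cur.execute("SHOW transaction_read_only")
        if cur.fetchone()[0] != "on":
            raise RuntimeError("session is not read-only; refusing to continue")
        cur.execute(CATALOG_QUERY)
        return cur.fetchall()
    finally:
        cur.close()
```

Aggregates and window functions are filtered out by `prokind`, which also avoids errors from `pg_get_functiondef`, since it does not accept aggregates.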

Non-production recommended. Stored procedure source code in staging usually matches production, so there is rarely a reason to connect to prod.

No hammering. Single connection, rate-limited, batched queries, configurable timeouts.
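Batching and rate limiting can be sketched with a fixed batch size and a minimum interval between queries. `batched` and `MinIntervalLimiter` are hypothetical helpers written here for illustration; the clock and sleep functions are injectable so the behavior can be checked without real waiting.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield lists of at most `size` items, preserving order."""
    batch: list[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


class MinIntervalLimiter:
    """Block so that successive wait() calls are at least `interval` seconds apart."""

    def __init__(
        self,
        interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = interval
        self.clock = clock
        self.sleep = sleep
        self._last: float | None = None

    def wait(self) -> None:
        now = self.clock()
        if self._last is not None:
            remaining = self._last + self.interval - now
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now


# Usage: one query per batch of object names, at most one query per second.
limiter = MinIntervalLimiter(interval=1.0)
for names in batched(["sp_a", "sp_b", "sp_c"], size=2):
    limiter.wait()
    # run one catalog query for `names` here
```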

Query allowlisting. Every SQL query is hardcoded and auditable. No dynamic SQL.

Full audit trail. Every query logged for DBA review.

Supported Sources

Source                                  Status
PostgreSQL stored procedures            In Development
Snowflake (views, UDFs, procs, tasks)   Planned
Informatica PowerCenter / IICS          Planned
SQL Server stored procedures            Planned
Oracle PL/SQL                           Planned
dbt models                              Planned

Built By

Digital Rain Technologies. Founded by Augustin Chan, former Development Architect at Informatica (12 years, Fortune 500 data integration across APAC/MENA/Europe).

Follow the Build

Crawl is in early development. Get monthly updates on what shipped, what's next, and lessons from building open-source migration intelligence.
