Kirimana for Databricks
The open-source contract layer for the Databricks Lakehouse. ODCS v3 contracts on disk that AI agents read directly. Unity Catalog pass-through. Mandatory one-line Databricks audit per Kiri action. PR-time governance gates. Guided source-schema introspection on Databricks. Apache-2.0. Azure today; the AWS path is on the roadmap.
For data platforms running on Databricks Lakehouse with Unity Catalog and Workflows. Kirimana sits above them as the contract layer your data platform’s been missing — open source, ecosystem-friendly, anchored in capabilities implemented today. The most mature product path in the project, currently in Private Preview with active design partners on Azure Databricks. An AWS path is on the roadmap from the same codebase.
Files an AI agent can read. Every contract is YAML on disk in git; every Kiri-driven Databricks action emits a single audit line with a correlation id. Drop Claude Code or Cursor into your repo and the agent reads the files and understands your Databricks platform without an API integration.
Built for enterprise scale. Hub-and-spoke governance across dozens of domains, OIDC RBAC pinned to your IdP, multi-environment CI/CD, federation across three transports with fail-closed health.
Light enough for any team. A small data team gets the same architecture; the wizard takes you from zero to a contracted bronze layer in an afternoon.
Five things Kirimana adds on Databricks
These are the value-adds that make Databricks customers more Databricks-engaged: more Unity Catalog usage, more Workflows runs, all governed, audit-clean, and contract-driven — anchored in what’s implemented today.
1. Contract-driven Bronze / Silver / Gold
Every Delta table has an ODCS v3 contract upstream. UC stays the catalog of record; Kirimana feeds it owner, classification, lineage, and attribute review-state. Contracts move with the table when domains migrate or merge.
2. Mandatory one-line Databricks audit
Every Kiri → Databricks action — CREATE, INSERT, SHOW, job
submit, classification override — emits a single audit line.
CLI calls and MCP calls are audited identically with the same
correlation id. tail -f databricks-audit.log is the entire demo.
The regulator can read it without us being in the room.
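As a rough illustration of what "one line per action, same correlation id for CLI and MCP" could look like, here is a minimal sketch. The field names (`ts`, `channel`, `action`, `target`) and the helper `audit_line` are assumptions made for this example, not Kirimana's actual log schema.

```python
import json
import uuid
from datetime import datetime, timezone


def audit_line(action: str, channel: str, correlation_id: str, target: str) -> str:
    """Render one audit record as a single JSON line (hypothetical schema)."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "correlation_id": correlation_id,
        "channel": channel,  # "cli" or "mcp"
        "action": action,    # e.g. "CREATE", "INSERT", "SHOW"
        "target": target,
    }
    # json.dumps escapes newlines inside strings, so each record stays on one line.
    return json.dumps(record, sort_keys=True)


# One logical operation, observed through both entry points.
correlation_id = str(uuid.uuid4())
cli_line = audit_line("CREATE", "cli", correlation_id, "bronze.orders")
mcp_line = audit_line("SHOW", "mcp", correlation_id, "bronze.orders")
print(cli_line)
print(mcp_line)
```

Because both lines carry the same keys and the same correlation id, a reader can follow one operation across entry points with nothing more than `grep`.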
3. Workflows compilation from ODCS
The contract graph compiles to native Databricks Workflows JSON. No third runtime, no parallel orchestrator. Kirimana drives the Workflows your platform team already operates.
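To make the compilation step concrete, the sketch below turns a small contract dependency map into a Jobs-style payload using the `task_key` and `depends_on` fields of Databricks job definitions. The function `compile_workflow` and the contract names are hypothetical, and the output is deliberately incomplete: a real task also needs a task type and compute settings, which are omitted here.

```python
import json


def compile_workflow(name: str, depends_on: dict[str, list[str]]) -> dict:
    """Map each contract to one task; upstream contracts become depends_on edges."""
    tasks = []
    for contract in sorted(depends_on):
        upstream = sorted(depends_on[contract])
        unknown = [u for u in upstream if u not in depends_on]
        if unknown:
            # Refuse to emit a job that references a contract we do not know about.
            raise ValueError(f"{contract} depends on unknown contracts: {unknown}")
        task = {"task_key": contract}
        if upstream:
            task["depends_on"] = [{"task_key": u} for u in upstream]
        tasks.append(task)
    return {"name": name, "tasks": tasks}


# Hypothetical three-layer graph: bronze feeds silver, silver feeds gold.
graph = {
    "bronze_orders": [],
    "silver_orders": ["bronze_orders"],
    "gold_revenue": ["silver_orders"],
}
workflow = compile_workflow("orders_pipeline", graph)
print(json.dumps(workflow, indent=2))
```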
4. Fail-closed authoring guardrails
Inferred PII (names, emails, free-text identifiers, classified
attributes) cannot be authored as public without an audited
override and a written reason. Mandatory ownership: every contract
names a real human; the CLI rejects what doesn’t. Default is
refuse, not warn.
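The "refuse, not warn" behaviour can be sketched as a check that raises unless every guardrail passes. This is an illustration only: `check_attribute`, `Override`, and `AuthoringRefused` are hypothetical names, and the real CLI may model overrides and ownership differently.

```python
from dataclasses import dataclass
from typing import Optional


class AuthoringRefused(Exception):
    """Raised when an attribute violates an authoring guardrail."""


@dataclass
class Override:
    approved_by: str
    reason: str


def check_attribute(name: str, classification: str, inferred_pii: bool,
                    owner: str, override: Optional[Override] = None) -> None:
    """Fail closed: return None only when every guardrail passes."""
    if not owner.strip():
        raise AuthoringRefused(f"{name}: contract has no named owner")
    if inferred_pii and classification == "public":
        # An override only counts if it carries a written reason.
        if override is None or not override.reason.strip():
            raise AuthoringRefused(
                f"{name}: inferred PII cannot be public without an override "
                "and a written reason"
            )


# A non-PII attribute with an owner passes silently.
check_attribute("order_total", "public", inferred_pii=False, owner="jane.doe")

# Inferred PII marked public is refused until an override with a reason is supplied.
try:
    check_attribute("email", "public", inferred_pii=True, owner="jane.doe")
    refused = False
except AuthoringRefused as exc:
    refused = True
    print(exc)
```

Raising an exception rather than logging a warning is the design choice that makes the default "refuse": the caller has to handle the failure explicitly to proceed.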
5. PR-time governance gates
kiri contract lint, kiri contract validate, and kiri contract diff run in CI before merge. Classification present, owner valid,
lineage resolves, no drift between docs and code. UC stays clean;
governance moves left, into the developer’s flow.
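The kind of checks a PR gate runs can be pictured with a small linter over parsed contracts. This sketch assumes contracts are already loaded as dictionaries with `owner`, `classification`, and `lineage` keys; those keys and the function `lint_contracts` are illustrative and are not the ODCS field names or the behaviour of the `kiri` commands themselves.

```python
def lint_contracts(contracts: dict[str, dict]) -> list[str]:
    """Return one message per problem; an empty list means the gate passes."""
    problems = []
    for name, contract in sorted(contracts.items()):
        if not contract.get("classification"):
            problems.append(f"{name}: classification missing")
        if not contract.get("owner"):
            problems.append(f"{name}: owner missing")
        for upstream in contract.get("lineage", []):
            # Every lineage edge must point at a contract in the same repo.
            if upstream not in contracts:
                problems.append(f"{name}: lineage target '{upstream}' does not resolve")
    return problems


# Hypothetical repo state: silver_orders has no owner and one dangling edge.
contracts = {
    "bronze_orders": {"owner": "jane.doe", "classification": "internal", "lineage": []},
    "silver_orders": {
        "owner": "",
        "classification": "internal",
        "lineage": ["bronze_orders", "bronze_customers"],
    },
}
problems = lint_contracts(contracts)
for problem in problems:
    print(problem)
# A CI wrapper would exit non-zero when problems is non-empty, blocking the merge.
exit_code = 1 if problems else 0
```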
Compliance evidence generators (DORA / EU AI Act / GDPR Art. 17) are on the roadmap. When they ship, the outputs will be structured evidence artefacts, not legal certification. Human attestation and review by qualified personnel remain part of every regulatory framework we cover.
What’s included
- Databricks platform adapter: Delta Lake bronze + silver + gold generation, OAuth M2M auth, typed parameter binding, retry on transient failures
- Unity Catalog pass-through: owner, classification, lineage, and attribute review-state pushed to UC, with schema drift and observed lineage pulled back, so UC stays the metadata surface for users while Kirimana is the source of contract truth
- Mandatory one-line Databricks audit on every Kiri action, CLI and MCP audited identically with a shared correlation id
- Native Databricks Workflows orchestration: DAGs compiled from contracts to native Workflows JSON; no third runtime
- Guided source-schema introspection: sample a live Databricks source, classify columns, propose populated bronze contracts — editable scaffolding, not auto-applied
- Databricks Vault adapter: ${vault:...} resolves to Databricks Secret Scopes
- MCP server: Databricks AI Assistants, Claude.ai, Cursor, Continue, Cline read your contracts, classifications, lineage, and release status through the same protocol
- Federation across three transports (in-process, HTTP+ETag, fs-static) with health() returning OK / STALE / UNAVAILABLE; federated reads fail closed when a source isn’t reachable
- Migration as a verb set: AI-assisted business-vault → ODCS translation, parity harness with row-hash check, shadow → diff → promote → rollback cutover, KEEP / DROP / MERGE / REDESIGN review actions, quarantine reprocess as a typed state machine
- Helm chart for the AKS-host: runs the Kirimana control plane in your own AKS cluster, dispatches to the Databricks workspace
- kiri databricks setup wizard: interactive provision of service principal, secret scopes, workspace permissions
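The fail-closed federation behaviour in the list above can be sketched as a read wrapper that consults a health probe first. The three states come from the list; everything else is an assumption for illustration, including the names `federated_read` and `FederatedReadRefused` and the choice to refuse STALE sources unless the caller opts in.

```python
from enum import Enum
from typing import Callable


class Health(Enum):
    OK = "OK"
    STALE = "STALE"
    UNAVAILABLE = "UNAVAILABLE"


class FederatedReadRefused(Exception):
    """Raised instead of returning partial or outdated data."""


def federated_read(source: str,
                   health: Callable[[], Health],
                   read: Callable[[], dict],
                   allow_stale: bool = False) -> dict:
    """Check health before reading; refuse rather than guess."""
    status = health()
    if status is Health.UNAVAILABLE:
        raise FederatedReadRefused(f"{source} is UNAVAILABLE; refusing to read")
    if status is Health.STALE and not allow_stale:
        # Assumption: stale sources are refused unless the caller opts in.
        raise FederatedReadRefused(f"{source} is STALE; refusing to read")
    return read()


# A healthy source returns its payload.
result = federated_read("finance-hub", lambda: Health.OK, lambda: {"contracts": 3})
print(result)

# An unreachable source raises instead of returning an empty result.
try:
    federated_read("finance-hub", lambda: Health.UNAVAILABLE, lambda: {})
except FederatedReadRefused as exc:
    print(exc)
```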
Files an AI agent can read
This is the difference between Kirimana and other governance tools.
| Tool | Where its metadata lives | What an AI agent has to do |
|---|---|---|
| Atlan | Proprietary catalog | Authenticate to the API, paginate, parse JSON, guess relationships |
| Collibra | Proprietary catalog | Same |
| Unity Catalog | Databricks workspace | API + token, query SQL warehouse, workspace-scoped |
| Microsoft Purview | Azure Purview tenant | API + token, tenant-scoped |
| Kirimana | YAML in git, JSONL audit log, graph lineage | Read the files |
Drop Claude Code into your kirimana repo. It immediately knows: who owns what, what’s PII, what changed last week, what the SLA is, what the lineage is. No special integration. Just files an AI can read.
This becomes a force multiplier when your AI agents are doing real work — drafting contracts, debugging applies, suggesting silver models — because the agents have full context without you wiring it.
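To show how little an agent or script needs in order to use these files, here is a minimal sketch that groups audit records by correlation id using only the standard library. The record fields and the sample lines are invented for the example; the real JSONL schema may differ.

```python
import json
from collections import defaultdict


def actions_by_correlation(lines) -> dict[str, list[str]]:
    """Group the actions in a JSONL audit log by correlation id."""
    grouped = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        grouped[record["correlation_id"]].append(record["action"])
    return dict(grouped)


# Hypothetical log content; in a repo this would come from open(path).
sample_log = [
    '{"correlation_id": "c-1", "channel": "cli", "action": "CREATE"}',
    '{"correlation_id": "c-1", "channel": "mcp", "action": "SHOW"}',
    "",
    '{"correlation_id": "c-2", "channel": "cli", "action": "INSERT"}',
]
grouped = actions_by_correlation(sample_log)
print(grouped)
```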
Pass-through to Unity Catalog (we feed UC, we don’t replace it)
Kirimana is not a catalog replacement. The Databricks product path treats Unity Catalog as the metadata surface for users; Kirimana is the source of contract truth feeding it.
| Direction | What flows |
|---|---|
| Push to UC | Owner, classification, attribute review-state, contract version, lineage edges |
| Pull from UC | Schema drift detection, observed lineage, downstream usage signals |
| Sync cadence | Every apply + nightly reconciliation; manual kiri catalog sync always available |
Unity Catalog stays the place your analytics engineers and BI team browse. Kirimana stays the place your contracts live and your audit log records.
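The "pull" direction, schema drift detection, amounts to comparing the columns a contract declares with the columns observed in the catalog. The sketch below assumes both sides are already available as name-to-type mappings; `schema_drift` and the sample columns are hypothetical, and fetching the observed schema from Unity Catalog is out of scope here.

```python
def schema_drift(contract_columns: dict[str, str],
                 observed_columns: dict[str, str]) -> dict[str, list[str]]:
    """Report columns that are missing, unexpected, or changed in type."""
    contract_names = set(contract_columns)
    observed_names = set(observed_columns)
    return {
        "missing_in_uc": sorted(contract_names - observed_names),
        "unexpected_in_uc": sorted(observed_names - contract_names),
        "type_changed": sorted(
            name for name in contract_names & observed_names
            # Compare case-insensitively so "STRING" and "string" match.
            if contract_columns[name].lower() != observed_columns[name].lower()
        ),
    }


# Hypothetical contract versus what the catalog reports for the same table.
contract = {"order_id": "bigint", "email": "string", "total": "decimal(10,2)"}
observed = {"order_id": "BIGINT", "total": "double", "coupon": "string"}
drift = schema_drift(contract, observed)
print(drift)
```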
Integrations available out of the box
- AI assistants: Databricks AI Assistants, Claude.ai, Claude Code, Cursor, Continue.dev, Cline (via MCP)
- Catalogs: Unity Catalog (primary), Snowflake Horizon push, Microsoft Purview push (cross-cloud tenants)
- Ingest: Databricks-native ingestion patterns, Databricks DLT, REST, database direct, and landing-zone ingestion
- Vault: Databricks Secret Scopes, Azure Key Vault, AWS Secrets Manager
- Auth: OIDC, Entra ID, GitHub, Okta, Auth0
- BI: dbt Semantic Layer / MetricFlow / Cube exports; Power BI / Tableau / Qlik connection guides
How to deploy
| Pattern | Cloud | When |
|---|---|---|
| AKS-host + Databricks workspace | Azure | Recommended day-1. Helm chart deploys control plane to AKS; dispatches Workflows to Databricks. |
| EKS-host + Databricks workspace | AWS | On the roadmap. Same product, AWS-native Terraform. |
| Self-host on existing Kubernetes | Any | If you already run K8s elsewhere; chart works on any compliant K8s 1.28+. |
Pricing posture
- Apache-2.0 today, no BSL planned. Full adapter, full Helm chart, full CLI. There is no “community edition” feature-gating in the codebase today; if that ever changes, we’d say so before doing it. See /pricing.
A note on the Databricks ecosystem
Kirimana exists to make Databricks customers more Databricks-engaged. Every value-add above increases your customers’ use of Unity Catalog and Workflows, operationalised through contracts, mandatory ownership, fail-closed authoring, and mandatory audit that make them safe for regulated industries.
Open source, Apache-2.0, and ecosystem-friendly. ODCS aligns with the open standards Databricks already champions (Delta Lake, Unity Catalog OSS, MLflow, Apache Spark). Roadmap target: Databricks Marketplace listing + Databricks App deployment.