Getting Started with Keebo Warehouse Optimization for Databricks

Preview

Warehouse Optimization for Databricks is currently in preview. Reach out to Keebo support for access and onboarding.

What Are the Prerequisites?

Before connecting Warehouse Optimization to a Databricks environment, the following prerequisites must be in place:

  • Unity Catalog enabled — The Databricks workspace must have Unity Catalog enabled. Follow the official Databricks documentation if Unity Catalog is not yet enabled.
  • Account Admin and Metastore Admin permissions — The user completing the Databricks configuration phase must be an Account Admin for the Databricks account and a Metastore Admin on the workspaces being connected.
  • Databricks CLI (optional) — Some configuration steps can be completed via the Databricks CLI. If CLI-based setup is preferred, ensure it is installed and authenticated. Refer to the Databricks documentation for installing the CLI and authenticating with OAuth M2M.
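For CLI-based setup with OAuth M2M, the configuration typically lives in a `~/.databrickscfg` profile. A minimal sketch is shown below; the workspace host and credential values are placeholders, not real values.

```ini
; Illustrative ~/.databrickscfg profile for OAuth machine-to-machine auth.
; Host and credentials are placeholders — substitute your own values.
[DEFAULT]
host          = https://dbc-example.cloud.databricks.com
client_id     = <service-principal-client-id>
client_secret = <service-principal-oauth-secret>
```

With `client_id` and `client_secret` present, recent versions of the Databricks CLI authenticate via OAuth M2M automatically; see the Databricks authentication documentation for details.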

What Access Does Warehouse Optimization Require?

Warehouse Optimization (KWO) accesses usage metadata through the Databricks REST API and a set of views that read from Databricks system tables. It does not access your organization's data — every piece of metadata retrieved serves a specific role in the optimization process.

REST API — real-time access to warehouse state and query history:

  • Warehouse metadata (size, current state)
  • Recent query history (accessed via a read-only view on the system table created during setup)
  • Warehouse configuration updates (to apply optimizations)
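The REST calls above map onto the Databricks SQL Warehouses API. The following sketch only assembles the request shapes rather than sending them; the workspace URL and warehouse ID are placeholders, and the exact calls KWO issues are internal to Keebo.

```python
# Sketch of the read-only state call and the configuration-update call
# described above, using the Databricks SQL Warehouses API endpoint paths.
# Nothing is sent over the network here; we only build request descriptors.

WORKSPACE_URL = "https://dbc-example.cloud.databricks.com"  # placeholder

def warehouse_state_request(warehouse_id: str) -> dict:
    """Describe the GET used to read a warehouse's size and current state."""
    return {
        "method": "GET",
        "url": f"{WORKSPACE_URL}/api/2.0/sql/warehouses/{warehouse_id}",
    }

def resize_request(warehouse_id: str, new_size: str) -> dict:
    """Describe the edit call used to apply a sizing optimization."""
    return {
        "method": "POST",
        "url": f"{WORKSPACE_URL}/api/2.0/sql/warehouses/{warehouse_id}/edit",
        "json": {"cluster_size": new_size},
    }
```

Both endpoints operate purely on warehouse metadata and configuration; neither returns data stored in your tables.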

System telemetry — imported daily. Keebo accesses system tables in read-only mode via views created during setup:

  • Warehouse events and configuration changes
  • Billable usage
  • Query history
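The read-only views created during setup follow a simple pattern: one view per system table, with nothing granted beyond SELECT. A sketch of that shape is below — the `keebo.usage` catalog/schema and view names are placeholders, while the source tables are standard Databricks system tables.

```python
# Illustrative shape of the read-only usage views. The target schema
# ("keebo.usage") is a placeholder; the source tables are the standard
# Databricks system tables for query history, billing, and compute events.

SYSTEM_VIEWS = {
    "query_history": "system.query.history",
    "billable_usage": "system.billing.usage",
    "warehouse_events": "system.compute.warehouse_events",
}

def view_ddl(view_name: str, source_table: str) -> str:
    """Build the CREATE VIEW statement exposing one system table read-only."""
    return (
        f"CREATE VIEW IF NOT EXISTS keebo.usage.{view_name} "
        f"AS SELECT * FROM {source_table}"
    )

statements = [view_ddl(name, table) for name, table in SYSTEM_VIEWS.items()]
```

Because access is mediated through views, the service principal never needs privileges on the system tables themselves.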

Export volume — KWO writes Parquet files to a Unity Catalog managed volume for batch workload export.

Warehouse permissions — KWO requires CAN_MANAGE on each SQL warehouse selected for optimization. This enables resizing and configuration changes without accessing warehouse data.
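The CAN_MANAGE grant can be expressed through the Databricks workspace Permissions API. The sketch below builds the PATCH request for one warehouse; the service principal application ID is a placeholder, and PATCH is used because it adds an entry without replacing the warehouse's existing access control list.

```python
# Sketch of the Permissions API call granting CAN_MANAGE on one warehouse
# to the Keebo service principal. The application ID is a placeholder.
# This grants management of the warehouse's configuration, not data access.

def can_manage_grant(warehouse_id: str, sp_application_id: str) -> dict:
    """Describe the PATCH that adds a CAN_MANAGE entry for the principal."""
    return {
        "method": "PATCH",
        "url": f"/api/2.0/permissions/warehouses/{warehouse_id}",
        "json": {
            "access_control_list": [
                {
                    # Service principals are identified by application ID here.
                    "service_principal_name": sp_application_id,
                    "permission_level": "CAN_MANAGE",
                }
            ]
        },
    }
```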

For the full permissions breakdown and network configuration details, see Databricks Security Setup.

What Is the Onboarding Process?

Onboarding is a two-phase process. Phase 1 must be completed by a Databricks admin before Phase 2 can begin.

Phase 1 — Databricks Environment Configuration

A Databricks admin configures the environment to grant Warehouse Optimization the access it needs:

  1. Create a Keebo service principal in the Databricks Account Console and generate OAuth credentials.
  2. Assign the service principal the User role on each workspace being connected.
  3. Grant the service principal CAN_MANAGE on each warehouse selected for optimization.
  4. Create the export volume.
  5. Create usage views that give Keebo read-only access to system table data.
  6. Configure network access if the account uses IP access lists.
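The resource-creation steps (4 and 5) reduce to a handful of one-time SQL statements run by the admin. This sketch only assembles them as strings; the catalog, schema, volume, and principal names are placeholders, not the names from Keebo's actual setup scripts.

```python
# Illustrative SQL for steps 4-5: the Parquet export volume and the grants
# on it and on the usage views. All object names and the service principal
# application ID are placeholders.

KEEBO_SP = "<service-principal-application-id>"  # placeholder

volume_setup = [
    # Step 4: managed volume that receives the Parquet batch exports.
    "CREATE VOLUME IF NOT EXISTS keebo.export.batch_workloads",
    # Scope the principal's volume access to reads and writes on that volume.
    "GRANT READ VOLUME, WRITE VOLUME ON VOLUME keebo.export.batch_workloads "
    f"TO `{KEEBO_SP}`",
    # Step 5: read-only access to the usage views (SELECT, nothing broader).
    f"GRANT SELECT ON SCHEMA keebo.usage TO `{KEEBO_SP}`",
]
```

Depending on your Unity Catalog setup, the principal may also need USE CATALOG and USE SCHEMA on the containing objects before these grants take effect.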

At the end of Phase 1, the admin will have collected the credentials and resource identifiers needed for Phase 2. See Phase 1 — Databricks Environment Configuration for step-by-step instructions.

Phase 2 — Keebo Onboarding Wizard

Once Phase 1 is complete, a Keebo admin signs in to the Keebo portal and completes the onboarding wizard:

  1. Enter the service principal credentials (Account ID, Client ID, Client Secret).
  2. Register each workspace by URL.
  3. Register each warehouse by ID.
  4. Verify schema access and run the provided SQL in the Databricks SQL Editor.

See Phase 2 — Keebo Onboarding Wizard for step-by-step instructions.