Keebo Warehouse Optimization Security Setup for Databricks

Keebo Warehouse Optimization (KWO) requires a dedicated Databricks service principal, workspace permissions, and a set of usage views before onboarding. This page describes what access is required and why. For step-by-step configuration instructions, see Phase 1 — Databricks Environment Configuration.

What Metadata Does Warehouse Optimization Access?

KWO accesses usage metadata through the Databricks REST API and a set of views that read from Databricks system tables. It does not access your organization's data.

Real-time endpoints:

  • Get warehouse info — retrieves metadata about an optimized warehouse, such as its size
  • List queries — retrieves recent query history as input to optimization algorithms
  • Update warehouse — makes configuration changes to apply calculated optimizations

Batch usage views (created during setup, reading from system tables):

KWO has only SELECT on these views; direct access to the system catalog is not required and should not be granted.

KWO also writes Parquet files to a Unity Catalog managed volume at keebo.kwo.export for batch workload export.

How Is This Metadata Used?

KWO's patented algorithms use this metadata to continuously adapt to changing conditions in the Databricks environment. These fields provide insight into:

  • Workload distribution
  • Resource utilization
  • Query behaviors

Each field plays a role in optimization decisions. When KWO detects an opportunity — such as a period of low utilization — it autonomously triggers actions to reduce costs without impacting performance. If query latencies increase, KWO detects the change and increases warehouse size to maintain performance.
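KWO's actual algorithms are patented and proprietary, but the kind of decision described above can be sketched with a toy rule. Everything in this example — the function name, size ladder, and thresholds — is invented for illustration only:

```python
# Illustrative sketch only: KWO's real optimization logic is proprietary.
# The size ladder, thresholds, and SLO here are made-up placeholders.

WAREHOUSE_SIZES = ["2X-Small", "X-Small", "Small", "Medium", "Large"]

def suggest_size(current: str, avg_utilization: float, p95_latency_s: float,
                 latency_slo_s: float = 10.0) -> str:
    """Pick a warehouse size from utilization and latency signals."""
    i = WAREHOUSE_SIZES.index(current)
    if p95_latency_s > latency_slo_s and i < len(WAREHOUSE_SIZES) - 1:
        return WAREHOUSE_SIZES[i + 1]   # latency regressing: scale up
    if avg_utilization < 0.30 and i > 0:
        return WAREHOUSE_SIZES[i - 1]   # sustained low utilization: scale down
    return current                      # otherwise leave the size alone

print(suggest_size("Medium", avg_utilization=0.15, p95_latency_s=4.2))  # → Small
```

A real system would additionally smooth these signals over time and rate-limit changes; this sketch only shows the shape of the two opposing triggers (cost versus performance).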

What Permissions Are Required?

REST API — real-time access to warehouse state and query history:

| Endpoint | Method | Purpose |
| --- | --- | --- |
| /api/2.0/sql/warehouses/{id} | GET | Read warehouse metadata (size, state) |
| /api/2.0/sql/history/queries | GET | Read recent query history |
| /api/2.0/sql/warehouses/{id}/edit | POST | Update warehouse configuration |

System table views — read-only access via views created during setup:

| View | System table source | Action |
| --- | --- | --- |
| Warehouse events | system.compute.warehouse_events | SELECT |
| Warehouses | system.compute.warehouses | SELECT |
| Billable usage | system.billing.usage | SELECT |
| Query history | system.query.history | SELECT |
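The views themselves are created by Keebo's setup scripts, but the general shape is easy to see: one view per system table, with SELECT granted only on the view. The generator below is an illustrative sketch; the view names and the target schema (keebo.kwo) are assumptions, not Keebo's actual identifiers:

```python
# Sketch of the kind of SQL the setup produces. View names and the
# keebo.kwo schema are illustrative assumptions.
SOURCES = {
    "warehouse_events": "system.compute.warehouse_events",
    "warehouses":       "system.compute.warehouses",
    "billing_usage":    "system.billing.usage",
    "query_history":    "system.query.history",
}

def view_ddl(principal: str, schema: str = "keebo.kwo") -> list[str]:
    """Generate CREATE VIEW + GRANT SELECT statements for each system table."""
    stmts = []
    for name, source in SOURCES.items():
        stmts.append(f"CREATE VIEW IF NOT EXISTS {schema}.{name} AS "
                     f"SELECT * FROM {source};")
        stmts.append(f"GRANT SELECT ON VIEW {schema}.{name} TO `{principal}`;")
    return stmts

for stmt in view_ddl("kwo-service-principal"):
    print(stmt)
```

Granting SELECT on views rather than on the `system` catalog itself keeps the service principal scoped to exactly these four sources.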

Export volume — batch workload export to Unity Catalog:

| Resource | Permission |
| --- | --- |
| Catalog (keebo) | USE CATALOG |
| Schema (keebo.kwo) | USE SCHEMA |
| Volume (keebo.kwo.export) | READ VOLUME, WRITE VOLUME |
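Expressed as Databricks SQL, the three grants above look roughly like the statements this sketch generates. The principal name is a placeholder:

```python
def volume_grants(principal: str) -> list[str]:
    """Databricks SQL grants for the export volume and its parents."""
    return [
        f"GRANT USE CATALOG ON CATALOG keebo TO `{principal}`;",
        f"GRANT USE SCHEMA ON SCHEMA keebo.kwo TO `{principal}`;",
        f"GRANT READ VOLUME, WRITE VOLUME ON VOLUME keebo.kwo.export "
        f"TO `{principal}`;",
    ]

for stmt in volume_grants("kwo-service-principal"):
    print(stmt)
```

USE CATALOG and USE SCHEMA are required because volume privileges alone do not make the parent objects traversable.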

Workspace access — required on each workspace being connected:

| Permission | Level |
| --- | --- |
| Workspace role | User |

Warehouse actions — applied to each SQL warehouse selected for optimization:

| Permission | Purpose |
| --- | --- |
| CAN_MANAGE | Read state, resize, and change configuration |

How Is Network Access Configured?

KWO connects to Databricks from a fixed set of IP addresses. If the account or workspace has IP access lists enabled, these must be added to the allow list so that the Keebo service can reach the Databricks REST API.

| IP Address | IP Address |
| --- | --- |
| 34.123.209.159 | 35.232.243.181 |
| 34.134.199.98 | 34.41.176.165 |
| 34.136.192.189 | 35.224.13.139 |
| 34.123.121.251 | 34.29.108.17 |
| 35.226.95.64 | 34.30.123.135 |

Authentication: OAuth M2M (service principal with Client ID and Secret).
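Databricks M2M OAuth is a standard client-credentials exchange against the workspace's /oidc/v1/token endpoint. The sketch below builds that request with the standard library; the hostname and credentials are placeholders:

```python
import base64
import json
import urllib.parse
import urllib.request

def token_request(host: str, client_id: str,
                  client_secret: str) -> urllib.request.Request:
    """Build the OAuth M2M (client-credentials) token request."""
    creds = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return urllib.request.Request(
        f"https://{host}/oidc/v1/token",
        data=urllib.parse.urlencode(
            {"grant_type": "client_credentials", "scope": "all-apis"}
        ).encode(),
        headers={"Authorization": f"Basic {creds}",
                 "Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )

def fetch_token(host: str, client_id: str, client_secret: str) -> str:
    """Exchange the service principal's credentials for an access token."""
    req = token_request(host, client_id, client_secret)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["access_token"]
```

The returned access token is what authorizes the REST API calls listed above; it is short-lived, so a client refreshes it by repeating the exchange.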

Network options: IP access lists (Enterprise tier) or Private Link (AWS/Azure/GCP).

note

IP access lists require the Enterprise pricing tier in Databricks. If IP access lists are not enabled, no changes are needed.
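If IP access lists are enabled, the Keebo addresses can be added via the workspace IP access lists API (POST /api/2.0/ip-access-lists). This sketch only builds the request body; the label is a placeholder:

```python
# The ten fixed Keebo egress IPs from the table above.
KEEBO_IPS = [
    "34.123.209.159", "35.232.243.181", "34.134.199.98", "34.41.176.165",
    "34.136.192.189", "35.224.13.139", "34.123.121.251", "34.29.108.17",
    "35.226.95.64", "34.30.123.135",
]

def allow_list_payload(label: str = "keebo-kwo") -> dict:
    """Body for POST /api/2.0/ip-access-lists creating an ALLOW list."""
    return {"label": label, "list_type": "ALLOW", "ip_addresses": KEEBO_IPS}
```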

For configuration instructions, see How Is Network Access Configured? in the Databricks Configuration guide.

Keebo supports connecting to Databricks workspaces that are configured with AWS PrivateLink, Azure Private Link, or GCP Private Service Connect. When a workspace has public access disabled and is only reachable via Private Link, Keebo connects through its own VPC endpoints instead of the public internet.

Keebo VPC Endpoints (AWS)

| Region | VPC Endpoint ID |
| --- | --- |
| us-east-1 | vpce-0d414758482d485d7 |
| us-east-2 | vpce-03cfbba8d0539eb51 |

note

If your Databricks workspace is deployed in an AWS region not listed above, contact Keebo support to discuss availability.

Setup

To use Private Link with Keebo, two things must be in place:

  1. Your Databricks workspace must accept connections from Keebo's VPC endpoint. Follow the Databricks documentation for your cloud provider to register Keebo's endpoint as an allowed Private Link connection.

  2. Contact Keebo support to register your Databricks workspace hostname. Keebo must add your workspace hostname to its internal DNS configuration so that traffic routes through the Private Link connection. Without this step, DNS resolution will fail and Keebo will not be able to reach the workspace.

Connectivity Diagnostics

When a workspace is connected, Keebo automatically runs a connectivity check that tests DNS resolution, TCP reachability, authentication, and workspace API access. If the workspace requires Private Link and the hostname has not yet been registered on Keebo's side, the diagnostic will report a DNS resolution failure with a Private Link Required remediation hint. If you see this, reach out to Keebo support with your workspace URL to complete the setup.
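The first two stages of that check (DNS, then TCP reachability) can be approximated with the standard library; the real diagnostic also verifies authentication and API access, which this sketch omits, and the hint string simply mirrors the documented remediation message:

```python
import socket

def diagnose(hostname: str, port: int = 443) -> str:
    """Rough analogue of the first two connectivity-check stages."""
    try:
        # Stage 1: DNS. For an unregistered Private Link hostname this
        # is where the documented failure occurs.
        socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return "DNS resolution failed (Private Link Required?)"
    try:
        # Stage 2: TCP reachability to the HTTPS port.
        with socket.create_connection((hostname, port), timeout=5):
            return "ok"
    except OSError:
        return "TCP connection failed"

print(diagnose("nonexistent.invalid"))  # → DNS resolution failed (...)
```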