Keebo Warehouse Optimization Security Setup for Databricks
Warehouse Optimization (KWO) requires a dedicated Databricks service principal, workspace permissions, and a set of usage views before onboarding. This page describes what access is required and why. For step-by-step configuration instructions, see Phase 1 — Databricks Environment Configuration.
What Metadata Does Warehouse Optimization Access?
KWO accesses usage metadata through the Databricks REST API and a set of views that read from Databricks system tables. It does not access organization data.
Real-time endpoints:
- Get warehouse info — retrieves metadata about an optimized warehouse, such as its size
- List queries — retrieves recent query history as input to optimization algorithms
- Update warehouse — makes configuration changes to apply calculated optimizations
Batch usage views (created during setup, reading from system tables):
- Warehouse events table — warehouse state changes
- Warehouses table — warehouse configuration changes, such as size changes
- Billable usage — used to compute savings generated by optimizations
- Query history — input for optimization models
KWO has only SELECT on these views; direct access to the underlying system catalog is not required and should not be granted.
KWO also writes Parquet files to a Unity Catalog managed volume at keebo.kwo.export for batch workload export.
How Is This Metadata Used?
KWO's patented algorithms use this metadata to continuously adapt to changing conditions in the Databricks environment. These fields provide insight into:
- Workload distribution
- Resource utilization
- Query behaviors
Each field plays a role in optimization decisions. When KWO detects an opportunity — such as a period of low utilization — it autonomously triggers actions to reduce costs without impacting performance. If query latencies increase, KWO detects the change and increases warehouse size to maintain performance.
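The feedback loop described above can be sketched as a simple decision rule. This is an illustrative simplification, not Keebo's actual algorithm: the thresholds, metrics, and size ladder below are invented for the example.

```python
# Illustrative sketch of a warehouse right-sizing decision loop.
# The thresholds and size ladder are invented for this example;
# KWO's production algorithms are considerably more sophisticated.

SIZES = ["2X-Small", "X-Small", "Small", "Medium", "Large"]

def next_size(current: str, utilization: float, p95_latency_ms: float) -> str:
    """Return a recommended warehouse size given recent usage metadata."""
    i = SIZES.index(current)
    if p95_latency_ms > 5000 and i < len(SIZES) - 1:
        return SIZES[i + 1]   # latency regression: scale up to protect performance
    if utilization < 0.2 and p95_latency_ms < 1000 and i > 0:
        return SIZES[i - 1]   # sustained low utilization: scale down to save cost
    return current            # otherwise, hold steady

print(next_size("Medium", utilization=0.1, p95_latency_ms=800))   # low utilization
print(next_size("Medium", utilization=0.9, p95_latency_ms=9000))  # high latency
```

Under these toy thresholds, the first call scales down one step and the second scales up one step; in production the decision draws on the full set of metadata fields listed above.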
What Permissions Are Required?
REST API — real-time access to warehouse state and query history:
| Endpoint | Method | Purpose |
|---|---|---|
| /api/2.0/sql/warehouses/{id} | GET | Read warehouse metadata (size, state) |
| /api/2.0/sql/history/queries | GET | Read recent query history |
| /api/2.0/sql/warehouses/{id}/edit | POST | Update warehouse configuration |
System table views — read-only access via views created during setup:
| View | System table source | Action |
|---|---|---|
| Warehouse events | system.compute.warehouse_events | SELECT |
| Warehouses | system.compute.warehouses | SELECT |
| Billable usage | system.billing.usage | SELECT |
| Query history | system.query.history | SELECT |
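A setup script might generate DDL along these lines for the four views. The view names, the `keebo.kwo` schema, and the service-principal name are assumptions for illustration; the actual setup instructions define the real names.

```python
# Sketch of the DDL for the usage views and their grants. View names,
# target schema, and the principal name are assumed for illustration.
VIEWS = {
    "warehouse_events": "system.compute.warehouse_events",
    "warehouses":       "system.compute.warehouses",
    "billable_usage":   "system.billing.usage",
    "query_history":    "system.query.history",
}

def view_ddl(principal: str = "keebo-kwo-sp") -> list[str]:
    stmts = []
    for view, source in VIEWS.items():
        stmts.append(
            f"CREATE OR REPLACE VIEW keebo.kwo.{view} AS SELECT * FROM {source};"
        )
        # Grant SELECT on the view only -- never on the system catalog itself.
        stmts.append(f"GRANT SELECT ON VIEW keebo.kwo.{view} TO `{principal}`;")
    return stmts

for stmt in view_ddl():
    print(stmt)
```

Keeping the grants at the view level, rather than on `system.*`, is what limits KWO to exactly the metadata listed in the table.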
Export volume — batch workload export to Unity Catalog:
| Resource | Permission |
|---|---|
| Catalog (keebo) | USE CATALOG |
| Schema (keebo.kwo) | USE SCHEMA |
| Volume (keebo.kwo.export) | READ VOLUME, WRITE VOLUME |
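The grants above, and the path a batch export write would target, can be sketched as follows. The principal name is an assumption, and the Files API path shown is one plausible way to address a file inside a Unity Catalog volume.

```python
# Sketch of the export-volume grants plus an assumed Files API path for a
# Parquet upload. The principal name is a placeholder for illustration.
def volume_grants(principal: str = "keebo-kwo-sp") -> list[str]:
    return [
        f"GRANT USE CATALOG ON CATALOG keebo TO `{principal}`;",
        f"GRANT USE SCHEMA ON SCHEMA keebo.kwo TO `{principal}`;",
        f"GRANT READ VOLUME, WRITE VOLUME ON VOLUME keebo.kwo.export "
        f"TO `{principal}`;",
    ]

def export_upload_path(filename: str) -> str:
    """Files API path for a file inside the keebo.kwo.export volume."""
    return f"/api/2.0/fs/files/Volumes/keebo/kwo/export/{filename}"

for stmt in volume_grants():
    print(stmt)
print(export_upload_path("workload_export.parquet"))
```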
Workspace access — required on each workspace being connected:
| Permission | Level |
|---|---|
| Workspace role | User |
Warehouse actions — applied to each SQL warehouse selected for optimization:
| Permission | Purpose |
|---|---|
| CAN_MANAGE | Read state, resize, and change configuration |
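Granting CAN_MANAGE to the service principal is typically done through the Databricks Permissions API. This is a sketch of the request body only; the application ID is a placeholder, and the exact endpoint path should be confirmed against the Permissions API reference.

```python
import json

# Sketch of a Permissions API payload granting CAN_MANAGE on one warehouse
# to the KWO service principal. The application ID is a placeholder.
def can_manage_payload(sp_application_id: str) -> str:
    return json.dumps({
        "access_control_list": [{
            "service_principal_name": sp_application_id,
            "permission_level": "CAN_MANAGE",
        }]
    })

# Sent as e.g. PATCH /api/2.0/permissions/warehouses/{warehouse_id}
print(can_manage_payload("12345678-abcd-ef00-1122-334455667788"))
```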
How Is Network Access Configured?
KWO connects to Databricks from a fixed set of IP addresses. If the account or workspace has IP access lists enabled, these addresses must be added to the allow list so that the Keebo service can reach the Databricks REST API.
| IP Address | IP Address |
|---|---|
| 34.123.209.159 | 35.232.243.181 |
| 34.134.199.98 | 34.41.176.165 |
| 34.136.192.189 | 35.224.13.139 |
| 34.123.121.251 | 34.29.108.17 |
| 35.226.95.64 | 34.30.123.135 |
Authentication: OAuth M2M (service principal with Client ID and Secret).
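The OAuth M2M (client-credentials) flow exchanges the service principal's Client ID and Secret for a short-lived access token at the workspace token endpoint. A standard-library sketch, with host and credentials as placeholders:

```python
import base64
import urllib.parse
import urllib.request

# Sketch of an OAuth M2M (client-credentials) token request. The host,
# client ID, and secret below are placeholders, not real values.
HOST = "https://example.cloud.databricks.com"
CLIENT_ID = "<service-principal-application-id>"
CLIENT_SECRET = "<oauth-secret>"

def token_request() -> urllib.request.Request:
    """Build the token request; the JSON response carries access_token."""
    creds = base64.b64encode(f"{CLIENT_ID}:{CLIENT_SECRET}".encode()).decode()
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode()
    return urllib.request.Request(
        HOST + "/oidc/v1/token",
        data=body,
        method="POST",
        headers={"Authorization": f"Basic {creds}",
                 "Content-Type": "application/x-www-form-urlencoded"},
    )
```

The resulting bearer token is what authorizes the REST API calls listed earlier on this page.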
Network options: IP access lists (Enterprise tier) or Private Link (AWS/Azure/GCP).
IP access lists require the Enterprise pricing tier in Databricks. If IP access lists are not enabled, no changes are needed.
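For workspaces that do use IP access lists, the allow-list entry can be created through the IP Access Lists API. The label below is an assumption; the addresses are the ten Keebo addresses from the table above.

```python
import json

# Sketch of an IP Access Lists API payload allowing Keebo's addresses.
# The label "keebo-kwo" is an assumption for illustration.
KEEBO_IPS = [
    "34.123.209.159", "35.232.243.181", "34.134.199.98", "34.41.176.165",
    "34.136.192.189", "35.224.13.139", "34.123.121.251", "34.29.108.17",
    "35.226.95.64", "34.30.123.135",
]

# Sent as POST /api/2.0/ip-access-lists
payload = json.dumps({
    "label": "keebo-kwo",
    "list_type": "ALLOW",
    "ip_addresses": KEEBO_IPS,
})
print(payload)
```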
For configuration instructions, see How Is Network Access Configured? in the Databricks Configuration guide.
Private Link
Keebo supports connecting to Databricks workspaces that are configured with AWS PrivateLink, Azure Private Link, or GCP Private Service Connect. When a workspace has public access disabled and is only reachable via Private Link, Keebo connects through its own VPC endpoints instead of the public internet.
Keebo VPC Endpoints (AWS)
| Region | VPC Endpoint ID |
|---|---|
| us-east-1 | vpce-0d414758482d485d7 |
| us-east-2 | vpce-03cfbba8d0539eb51 |
If your Databricks workspace is deployed in an AWS region not listed above, contact Keebo support to discuss availability.
Setup
To use Private Link with Keebo, two things must be in place:
1. Your Databricks workspace must accept connections from Keebo's VPC endpoint. Follow the Databricks documentation for your cloud provider to register Keebo's endpoint as an allowed Private Link connection.
2. Contact Keebo support to register your Databricks workspace hostname. Keebo must add your workspace hostname to its internal DNS configuration so that traffic routes through the Private Link connection. Without this step, DNS resolution will fail and Keebo will not be able to reach the workspace.
Connectivity Diagnostics
When a workspace is connected, Keebo automatically runs a connectivity check that tests DNS resolution, TCP reachability, authentication, and workspace API access. If the workspace requires Private Link and the hostname has not yet been registered on Keebo's side, the diagnostic will report a DNS resolution failure with a Private Link Required remediation hint. If you see this, reach out to Keebo support with your workspace URL to complete the setup.
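The first two stages of that check can be sketched with the standard library. This is an illustrative simplification of the diagnostic, not Keebo's actual implementation; the remediation hints are condensed to two cases.

```python
import socket

# Illustrative sketch of the DNS and TCP stages of the connectivity check.
# The remediation-hint logic is a simplification for this example.
def diagnose(hostname: str, port: int = 443) -> dict:
    result = {"dns": False, "tcp": False, "hint": None}
    try:
        addr = socket.getaddrinfo(hostname, port)[0][4][0]
        result["dns"] = True
    except socket.gaierror:
        # On a Private Link workspace, DNS failure usually means the
        # hostname has not yet been registered on Keebo's side.
        result["hint"] = "Private Link Required"
        return result
    try:
        with socket.create_connection((addr, port), timeout=5):
            result["tcp"] = True
    except OSError:
        result["hint"] = "Check IP access lists / firewall rules"
    return result
```

In the real diagnostic, authentication and workspace API access are checked after these two stages succeed.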