Appendix
Warehouse Optimization for Databricks is currently in preview. Reach out to Keebo support for access and onboarding.
Glossary of Terms
DBU (Databricks Unit)
The unit Databricks uses to measure compute usage. SQL warehouses consume DBUs based on their size and how long they run.
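As a rough illustration of how size and runtime combine, the sketch below multiplies an hourly DBU rate by the time a warehouse runs. The rates in ASSUMED_DBU_PER_HOUR are placeholder values made up for this example, not Databricks pricing, and estimate_dbus is a hypothetical helper rather than part of any Keebo or Databricks API.

```python
# Placeholder DBU-per-hour rates by warehouse size.
# These numbers are assumptions for illustration only, not Databricks pricing.
ASSUMED_DBU_PER_HOUR = {"Small": 10.0, "Medium": 20.0}


def estimate_dbus(size: str, runtime_seconds: float, clusters: int = 1) -> float:
    """Estimate DBUs as (hourly rate for the size) x (hours running) x (clusters)."""
    if runtime_seconds < 0:
        raise ValueError("runtime_seconds must be non-negative")
    hours = runtime_seconds / 3600.0
    return ASSUMED_DBU_PER_HOUR[size] * hours * clusters
```

Under these assumed rates, a Medium warehouse running for 30 minutes would consume half of its hourly rate.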
SQL Warehouse
A Databricks compute resource used to run SQL queries. Warehouse Optimization optimizes Databricks SQL Warehouses connected to a workspace.
Serverless SQL Warehouse
A type of Databricks SQL Warehouse where Databricks manages the underlying compute infrastructure. Warehouse Optimization is currently designed to work with Serverless SQL Warehouses.
Workspace
A Databricks environment that houses SQL warehouses, notebooks, jobs, and other resources. Warehouse Optimization connects to one or more workspaces to optimize SQL warehouses.
Service Principal
A non-human identity used by applications to authenticate and interact with Databricks. Warehouse Optimization uses a Databricks service principal with OAuth M2M authentication to connect to the environment.
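For readers unfamiliar with OAuth M2M, the sketch below shows the general shape of a client-credentials token request against a Databricks workspace. It assumes the workspace-level token endpoint (/oidc/v1/token) and the all-apis scope described in the Databricks OAuth M2M documentation. The host name and credentials are placeholders, and the function names are hypothetical. It illustrates the authentication flow in general, not how Warehouse Optimization implements it internally.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_token_request(
    workspace_host: str, client_id: str, client_secret: str
) -> urllib.request.Request:
    """Build (but do not send) an OAuth client-credentials token request."""
    # Assumed workspace-level token endpoint for OAuth M2M.
    url = f"https://{workspace_host}/oidc/v1/token"
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode("ascii")
    # The service principal's client ID and secret go in an HTTP Basic header.
    basic = base64.b64encode(
        f"{client_id}:{client_secret}".encode("utf-8")
    ).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def fetch_access_token(request: urllib.request.Request) -> str:
    """Send the request and return the short-lived access token."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["access_token"]
```

The returned token would then be sent as a Bearer token on subsequent Databricks REST API calls. Building the request separately from sending it keeps the example easy to inspect without network access.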
Unity Catalog
Databricks' data governance solution that provides centralized access control, auditing, and data lineage. Unity Catalog must be enabled on a workspace for Warehouse Optimization to connect.
Keebo Warehouse Optimization for Databricks
Keebo's automation product that optimizes Databricks data warehouse costs and performance.
Keebo Portal
The web interface for Warehouse Optimization, accessible at portal.keebo.ai. This is where connected warehouses are managed, the dashboard is viewed, and optimization settings are configured.
Optimization
An action taken by Warehouse Optimization to change a warehouse's settings (for example, resizing to a smaller size during low utilization) to reduce DBU consumption without negatively impacting query performance.
Backoff
A Warehouse Optimization action that reverses a previous optimization by returning a warehouse to its default settings. Warehouse Optimization initiates a backoff when a warehouse's performance crosses a configured guardrail, such as a limit on query latency.
Guardrail
A performance threshold configured in Warehouse Optimization. When a connected warehouse exceeds this threshold, Warehouse Optimization performs a backoff: it stops all optimizations and returns the warehouse to its default settings to restore performance.
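The relationship between a guardrail and a backoff can be sketched as a simple threshold check. Everything in this example is hypothetical: the class names, the choice of query latency as the guarded metric, and the field names are assumptions made for illustration, and the actual guardrail logic in Warehouse Optimization may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WarehouseSettings:
    size: str
    auto_stop_seconds: int


@dataclass(frozen=True)
class Guardrail:
    # Hypothetical guardrail: maximum acceptable query latency.
    max_latency_seconds: float


def apply_guardrail(
    current: WarehouseSettings,
    default: WarehouseSettings,
    observed_latency_seconds: float,
    guardrail: Guardrail,
) -> tuple[WarehouseSettings, bool]:
    """Return (settings to use, whether a backoff occurred)."""
    if observed_latency_seconds > guardrail.max_latency_seconds:
        # Threshold exceeded: back off to the warehouse's default settings.
        return default, True
    # Within the guardrail: keep the optimized settings.
    return current, False
```

For example, with a 5-second guardrail, an observed latency of 8 seconds would return the default settings and report a backoff, while 2 seconds would leave the optimized settings in place.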
Auto-Stop
A Databricks SQL Warehouse setting that defines how long (in seconds) the warehouse stays active after the last query before it shuts down. Warehouse Optimization dynamically manages this value to balance cost savings with performance.
Savings Calculation
Warehouse Optimization's estimate of how many DBUs were saved compared to running warehouses without Keebo enabled. The savings percentage is calculated based on warehouses that have Warehouse Optimization enabled.
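One plausible reading of this definition is sketched below. It assumes the percentage is the estimated DBUs saved divided by the estimated DBUs that would have been consumed without Keebo, summed over only the warehouses with Warehouse Optimization enabled. The formula, the function name, and the dictionary keys are all assumptions for illustration, since the exact calculation is not specified here.

```python
def savings_percentage(warehouses: list[dict]) -> float:
    """Assumed formula: 100 * (baseline - actual) / baseline, enabled warehouses only."""
    # Only warehouses with optimization enabled contribute to the percentage.
    enabled = [w for w in warehouses if w["optimization_enabled"]]
    baseline = sum(w["estimated_dbus_without_keebo"] for w in enabled)
    actual = sum(w["actual_dbus"] for w in enabled)
    if baseline <= 0:
        # No enabled warehouses (or no estimated usage): nothing to compare against.
        return 0.0
    return 100.0 * (baseline - actual) / baseline
```

Under this assumed formula, a warehouse without Warehouse Optimization enabled does not change the result, however many DBUs it consumes.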