Keebo Glossary of Terms
Warehouse Optimization for Snowflake
Snowflake Credit
A unit of measure Snowflake uses to quantify computational resources consumed during operations. Snowflake bills at a negotiated cost per credit.
Keebo Credit
A unit of measure Keebo uses to quantify the Snowflake credits Warehouse Optimization has saved. One Keebo credit equals one Snowflake credit saved.
Keebo Organization
Each customer has one Keebo organization. All accounts and warehouses are managed within this organization.
Optimization Decision
An optimization decision occurs each time Warehouse Optimization evaluates whether to change a warehouse's settings. Optimization decisions are visible on the Optimization page.
Multi-Cluster Optimization (MCO)
Warehouse Optimization automatically reduces the maximum number of clusters when a multi-cluster warehouse is underutilized, minimizing unnecessary scale-out.
Automated Downsizing
Warehouse Optimization automatically downsizes a warehouse when it is underutilized.
Memory Optimization
Warehouse Optimization optimizes cost and performance by continuously comparing the cost of losing a warehouse's cache against the cost of keeping the warehouse running.
Labels
Labels can be assigned to warehouses within Warehouse Optimization. These are separate from Snowflake labels and query tags.
Evaluation Window
The time interval at which Warehouse Optimization makes optimization decisions. For example, with a 15-minute evaluation window, Warehouse Optimization evaluates and makes a decision every 15 minutes.
Warehouse Optimization for Databricks
DBU (Databricks Unit)
The unit Databricks uses to measure compute usage. DBUs are consumed based on the size and runtime of SQL warehouses.
SQL Warehouse
A Databricks compute resource used to run SQL queries. Warehouse Optimization optimizes Databricks SQL Warehouses connected to a workspace.
Serverless SQL Warehouse
A type of Databricks SQL Warehouse where Databricks manages the underlying compute infrastructure. Warehouse Optimization is currently designed to work with Serverless SQL Warehouses.
Workspace
A Databricks environment that houses SQL warehouses, notebooks, jobs, and other resources. Warehouse Optimization connects to one or more workspaces to optimize SQL warehouses.
Service Principal
A non-human identity used by applications to authenticate and interact with Databricks. Warehouse Optimization uses a Databricks service principal with OAuth M2M authentication to connect to the environment.
Unity Catalog
Databricks' data governance solution that provides centralized access control, auditing, and data lineage. Unity Catalog must be enabled on a workspace for Warehouse Optimization to connect.
Keebo Warehouse Optimization for Databricks
Keebo's automation product that optimizes Databricks data warehouse costs and performance.
Keebo Portal
The web interface for Warehouse Optimization, accessible at portal.keebo.ai. Use it to manage connected warehouses, view the dashboard, and configure optimization settings.
Optimization
An action taken by Warehouse Optimization to change a warehouse's settings (for example, downsizing during low utilization) to reduce DBU consumption without negatively impacting query performance.
Backoff
A Warehouse Optimization action that reverses a previous optimization by returning a warehouse to its default settings. Warehouse Optimization initiates a backoff when a warehouse's performance meets specific criteria defined as guardrails, such as high query latency.
Guardrail
A performance threshold configured in Warehouse Optimization. When a connected warehouse exceeds this threshold, Warehouse Optimization performs a backoff: it stops all optimizations and returns the warehouse to its default settings to restore performance.
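The relationship between a guardrail and a backoff can be sketched in a few lines. This is an illustration only, not Keebo's implementation: the `WarehouseState` class, the `apply_guardrail` function, and the single latency threshold are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class WarehouseState:
    """Hypothetical snapshot of a warehouse's settings."""
    size: str                      # current (possibly optimized) size
    default_size: str              # size to restore on backoff
    optimizations_enabled: bool = True


def apply_guardrail(state: WarehouseState,
                    observed_latency_s: float,
                    latency_threshold_s: float) -> WarehouseState:
    """Return the state after checking one latency guardrail.

    If the observed latency exceeds the threshold, perform a backoff:
    stop optimizations and restore the default size.
    """
    if observed_latency_s > latency_threshold_s:
        return WarehouseState(size=state.default_size,
                              default_size=state.default_size,
                              optimizations_enabled=False)
    return state
```

For example, a warehouse optimized down to "Small" with a default of "Large" would return to "Large" once its observed latency crosses the configured threshold.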
Auto-Stop
A Databricks SQL Warehouse setting that defines how long (in seconds) the warehouse stays active after the last query before it shuts down. Warehouse Optimization dynamically manages this value to balance cost savings with performance.
Savings Calculation
Warehouse Optimization's estimate of how many DBUs were saved compared to running warehouses without Keebo enabled. The savings percentage is calculated based on warehouses that have Warehouse Optimization enabled.
Workload Intelligence
Average Latency
The average total query execution time, including all phases of query processing: compilation, queueing, and execution.
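As a rough sketch of this definition, average latency can be computed by summing the three phases for each query and averaging the totals. The dictionary keys (`compilation_s`, `queued_s`, `execution_s`) are assumed names for this example and are not taken from any Keebo or Snowflake schema.

```python
def average_latency(queries):
    """Average of (compilation + queueing + execution) seconds per query."""
    totals = [
        q["compilation_s"] + q["queued_s"] + q["execution_s"]
        for q in queries
    ]
    # Return 0.0 for an empty workload instead of dividing by zero.
    return sum(totals) / len(totals) if totals else 0.0
```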
Compute
Snowflake credits consumed by virtual warehouse usage. Snowflake bills warehouses per second, with a 60-second minimum each time a warehouse starts or resumes.
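The effect of the 60-second minimum is easiest to see with a small calculation. The sketch below assumes the minimum applies to each start-to-suspend run of a warehouse and that the caller supplies the warehouse's hourly credit rate; the function names are illustrative.

```python
def billed_seconds(run_seconds, minimum_seconds=60):
    """Seconds billed for one start-to-suspend run of a warehouse."""
    return max(run_seconds, minimum_seconds)


def credits_consumed(run_seconds, credits_per_hour):
    """Credits for one run, given the warehouse's hourly credit rate."""
    return credits_per_hour * billed_seconds(run_seconds) / 3600
```

Under these assumptions, a warehouse that runs for 10 seconds is billed for 60 seconds, while one that runs for 90 seconds is billed for exactly 90.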
Cloud Services
Snowflake credits consumed by non-query activities such as authentication, metadata management, and query optimization. These credits are billed only when daily cloud services consumption exceeds 10% of that day's warehouse compute consumption.
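A minimal sketch of the 10% rule follows. It assumes the comparison is made per day against warehouse compute credits and that only the portion above the 10% allowance is billed; the function and parameter names are made up for this example.

```python
def billed_cloud_services(daily_compute_credits, daily_cloud_services_credits):
    """Cloud services credits billed for one day under the 10% allowance."""
    allowance = daily_compute_credits / 10  # 10% of the day's compute credits
    return max(0.0, daily_cloud_services_credits - allowance)
```

With 100 compute credits in a day, 8 cloud services credits fall under the allowance and cost nothing, while 15 cloud services credits leave 5 billable.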
Query Template
A common query structure shared across multiple queries, excluding variables and constants that frequently change. Workload Intelligence uses query templates in Spend Analytics to group similar queries.
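To make the idea concrete, here is one simple way to reduce queries to a shared template by replacing literals with a placeholder. This is only a sketch under stated assumptions: it is not how Workload Intelligence derives templates, the `to_template` name is hypothetical, and a regex approach like this does not handle comments, quoted identifiers, or every SQL dialect.

```python
import re


def to_template(sql):
    """Replace string and numeric literals with '?' and collapse whitespace."""
    s = re.sub(r"'(?:[^']|'')*'", "?", sql)       # single-quoted string literals
    s = re.sub(r"\b\d+(?:\.\d+)?\b", "?", s)      # standalone numeric literals
    return re.sub(r"\s+", " ", s).strip()         # normalize whitespace
```

Two queries that differ only in their constants, such as `id = 42` versus `id = 7`, map to the same template and can be grouped together.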
Workload
A user-defined group of queries based on specific criteria such as warehouses or users. Workloads enable analysis of which groupings contribute the most to cost or performance issues.
Bytes Scanned
The total amount of data read from storage to execute a query. More costly queries typically scan more bytes of data.
Total Local and Remote Spillage
Spillage occurs when a query requires more memory than is available in the virtual warehouse. Spillage increases latency, with remote spillage causing a more severe impact than local spillage.
Memory Inefficient
A warehouse is considered memory inefficient when its queries scan 20% more remote bytes while compute resources are already available than they do when additional resources must be provisioned.
Query Imbalance
A warehouse is considered to have query imbalance when it has queries whose execution time is 10 times higher than the warehouse average.
Underprovisioned
A warehouse is considered underprovisioned if the percentage of queries with spillage to remote or local storage is 5% or more.
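The 5% rule can be expressed as a short check. The per-query field names (`local_spill_bytes`, `remote_spill_bytes`) are assumptions for this sketch, as is treating an empty workload as not underprovisioned.

```python
def is_underprovisioned(queries, threshold_pct=5):
    """True if at least threshold_pct percent of queries spilled to storage."""
    if not queries:
        return False
    spilled = sum(
        1 for q in queries
        if q["local_spill_bytes"] > 0 or q["remote_spill_bytes"] > 0
    )
    # Compare cross-multiplied integers to avoid floating-point rounding.
    return spilled * 100 >= threshold_pct * len(queries)
```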
Imbalance Factor
A metric that evaluates the uniformity of query workloads in a warehouse. Higher values indicate more significant imbalances. Formula: Imbalance Factor = (total execution time of all outlier queries / total execution time of all queries) × 100
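The formula can be worked through in code. This sketch assumes an outlier is a query whose execution time is at least 10 times the warehouse average, following the Query Imbalance definition; the exact outlier rule and the function name are assumptions.

```python
def imbalance_factor(execution_times, outlier_multiple=10):
    """Percentage of total execution time spent in outlier queries."""
    total = sum(execution_times)
    if total == 0:
        return 0.0
    average = total / len(execution_times)
    outlier_total = sum(
        t for t in execution_times if t >= outlier_multiple * average
    )
    return 100 * outlier_total / total
```

For instance, a warehouse with 100 one-second queries and a single 900-second query has an average just under 10 seconds, so only the 900-second query counts as an outlier and the imbalance factor is 90.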
Wasteful Queries
Queries that frequently fail, resulting in wasted Snowflake credits.
Unused Tables
Tables created over 90 days ago that have not been accessed or modified in the last 90 days.
Unread Tables
Tables modified but not read in the last 90 days. These tables may be constantly updated by an automated process while no one reads them.
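The Unused and Unread definitions can be combined into a single classification sketch. The function name, the timestamp parameters, and the "active" label for everything else are assumptions made for the example; `None` stands for "never read" or "never modified".

```python
from datetime import datetime, timedelta


def classify_table(created, last_read, last_modified, now, window_days=90):
    """Classify a table as 'unused', 'unread', or 'active'."""
    cutoff = now - timedelta(days=window_days)
    read_recently = last_read is not None and last_read >= cutoff
    modified_recently = last_modified is not None and last_modified >= cutoff

    # Unused: created before the window, with no reads or writes inside it.
    if created < cutoff and not read_recently and not modified_recently:
        return "unused"
    # Unread: written inside the window but never read inside it.
    if modified_recently and not read_recently:
        return "unread"
    return "active"
```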
Active Size
Bytes owned by and billed to a table that are in the active state.
Time Travel
A Snowflake feature that provides access to historical versions of changed or deleted data within a defined retention period.
Fail-Safe
A Snowflake feature that provides an additional layer of data recovery beyond Time Travel. It retains deleted or changed data for seven days after the Time Travel period ends, allowing Snowflake to recover data in case of system failures or accidental data loss.
Retained for Clone
A Snowflake feature that preserves deleted or changed data for short periods to support zero-copy cloning. This enables creating clones of tables, schemas, or databases without duplicating data storage.
Total Size
The sum of Active Size, Time Travel, Fail-Safe, and Retained for Clone bytes. This represents the total amount of storage billed for a table.
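As a small worked example of this sum, the sketch below adds the four storage components for one table. The function and parameter names are illustrative.

```python
def total_size_bytes(active, time_travel, fail_safe, retained_for_clone):
    """Total Size = Active Size + Time Travel + Fail-Safe + Retained for Clone."""
    return active + time_travel + fail_safe + retained_for_clone
```

A table with 500 active bytes, 200 Time Travel bytes, 100 Fail-Safe bytes, and 50 bytes retained for clones has a Total Size of 850 bytes.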