Client Success KPIs

Client Success KPIs provide insights into your Databricks usage.

The Lakehouse Optimizer (LHO) collects telemetry data for your Databricks workloads running on classic infrastructure (job compute, all-purpose compute, and pipelines). As of April 1st, 2025, LHO also collects telemetry for SQL Warehouses (classic and serverless), serverless notebooks, and serverless jobs (serverless DLT is not yet supported). As Databricks publishes more metrics in system tables, LHO will provide coverage for those too.

The collected telemetry data allows for reporting the following KPIs:

Data Usage

Total amount of data in, data out, shuffle in, and shuffle out across all monitored resources. Includes data processed by all clusters (Serverless, Classic Compute, SQL Warehouses) to which LHO has access for retrieving bytes processed information. Some resources may not provide telemetry metadata to LHO due to factors such as security constraints or because Spark jobs are not utilized.

Data Usage is also referred to as Data Processed.

- If a workload does not run any Spark code and therefore produces no Spark task metrics, LHO cannot report this KPI.
- For Pipeline updates that use Unity Catalog storage with the default channel, LHO cannot collect telemetry; with UC, the Preview channel is required for this level of instrumentation.

Data Usage Cost

Total cost of all monitored resources with data telemetry (i.e., reporting data in, data out, shuffle in, and shuffle out). 

Cluster Hours

Total duration metadata across all monitored clusters. Includes the duration of all clusters (Serverless, AI, and Classic Compute) to which LHO has access for retrieving duration metadata. Some resources may not provide telemetry metadata to LHO due to factors such as security constraints.

Monitored Duration Cost

Total cost of all resources for which LHO can determine duration. 

Cost per TB

The average cost per TB of monitored data.
Calculated as: Data Usage Cost / Data Usage.

Cost per Hour

The average total cost incurred per hour during a given month.
Calculated as: Total Monthly Cost / Hours in Month.

TB per Day

The average amount of monitored data per day during a given month.
Calculated as: Data Usage / Days in Month.

Cluster Hours per TB

The average number of cluster hours required to process one TB of monitored data.
Calculated as: Cluster Hours / Data Usage.
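The derived KPIs above follow directly from the base metrics. A minimal sketch of the calculations; all input values below are made-up sample numbers for illustration, not LHO's API:

```python
# Derived Client Success KPIs computed from the base metrics.
# NOTE: the inputs are illustrative sample values; LHO derives the
# real values from collected telemetry.

data_usage_tb = 120.0        # Data Usage (TB processed this month)
data_usage_cost = 3600.0     # Data Usage Cost
cluster_hours = 2400.0       # Cluster Hours (monitored duration)
total_monthly_cost = 5000.0  # Total Monthly Cost
hours_in_month = 24 * 30     # Hours in Month (30-day month)
days_in_month = 30           # Days in Month

cost_per_tb = data_usage_cost / data_usage_tb         # Data Usage Cost / Data Usage
cost_per_hour = total_monthly_cost / hours_in_month   # Total Monthly Cost / Hours in Month
tb_per_day = data_usage_tb / days_in_month            # Data Usage / Days in Month
cluster_hours_per_tb = cluster_hours / data_usage_tb  # Cluster Hours / Data Usage

print(cost_per_tb, round(cost_per_hour, 2), tb_per_day, cluster_hours_per_tb)
```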

KPI levels:

  1. Monthly KPIs can be found on the Forecasting page.
    Click a month to open the KPIs panel. KPIs can also be exported as CSV.


    Cost & Data chart visualizes monthly KPIs and projects future trends for each KPI line.


  2. KPIs at the job run level can be found in Trendlines for Job Runs. Use Filters to choose which KPI charts to view.


  3. The Client Success Report with all the monthly KPIs is available as a CSV download in Settings > Analysis Management > Optimization Reports, alongside the Telemetry Coverage Report. Admin permission is required to access this page.

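Once downloaded, the CSV report can be post-processed with standard tooling. A minimal sketch using Python's csv module; the column names below are hypothetical, so check the header row of the actual export:

```python
import csv
import io

# Hypothetical sample standing in for a downloaded Client Success Report;
# real column names may differ -- inspect the export's header row.
csv_text = """month,data_usage_tb,data_usage_cost
2025-05,100.0,2500.0
2025-06,120.0,3600.0
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))

# Recompute Cost per TB (Data Usage Cost / Data Usage) for each month.
cost_per_tb = {
    row["month"]: float(row["data_usage_cost"]) / float(row["data_usage_tb"])
    for row in rows
}
print(cost_per_tb)
```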

Data KPIs FAQ:

Q: Why is Monitored Duration Cost not the same as Total Monthly Cost?
A: There are several reasons. Not all workspaces may be published (i.e., have telemetry enabled). As Databricks adds new features, not all of them have telemetry implemented yet; the LHO team continuously works on expanding telemetry to new features. Also, some Databricks features cannot have telemetry because of what Databricks exposes (for example, Pipeline Maintenance).

 

Q: Why is Data Usage Cost lower than Monitored Duration Cost?
A: For some features, duration metadata is available, but collecting data-processing telemetry is not possible. Therefore, metrics such as data in, data out, shuffle in, and shuffle out are not available. Examples include workloads without Spark jobs, or Pipelines with the channel set to "Current".