The Lakehouse Optimizer (LHO) lets users monitor and improve their Lakehouse infrastructure by configuring incidents that flag inefficiencies in cost, performance, and operational metrics. Incidents appear in the Incidents section of the app, where they provide actionable insights for optimizing resource utilization and spending.

Incident Configuration Steps

1. From the Settings menu, select Settings / Incident Policies.
   Incidents can be defined for Subscriptions, Workspaces, Workflows, All Purpose Compute, Delta Live Tables, SQL Warehouses, Pools, and Job Compute. Each incident has its own incident policy.
2. Select the area in which to create the incident (e.g., Workflows).
3. Select a category: Cost Control or Performance.
4. Select a sub-category from the dropdown menu (e.g., Over Provisioning).
5. Select a sub-option within the sub-category (e.g., Cluster CPU Over Provisioning).
6. Under the Incident Rules section, select the +ADD button.
7. Specify the threshold for the incident rule (e.g., Cluster CPU under 60% for any run) and save the changes.
   a. Multiple rules can be set for each incident policy (e.g., if a user wants an incident created each time Cluster CPU for any run is under 60%, 65%, or 70%).
   b. When an incident rule is met, an Incident Ticket is created automatically and the corresponding incident appears in the Incidents view.
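To make the multiple-rule behavior described above concrete, the sketch below models an incident policy that holds several threshold rules and checks one run's Cluster CPU utilization against each of them. It is an illustration only: the class and method names (`IncidentRule`, `IncidentPolicy`, `evaluate`) are hypothetical and are not LHO's API, and the sketch assumes each rule is evaluated independently and raises its own incident when met.

```python
from dataclasses import dataclass, field


@dataclass
class IncidentRule:
    """Hypothetical rule: fires when a metric falls below a threshold (percent)."""
    threshold_pct: float

    def is_met(self, value_pct: float) -> bool:
        # "Cluster CPU under 60%" is read as strictly below the threshold.
        return value_pct < self.threshold_pct


@dataclass
class IncidentPolicy:
    """Hypothetical policy for one area / sub-category / sub-option, with several rules."""
    area: str
    sub_category: str
    sub_option: str
    rules: list[IncidentRule] = field(default_factory=list)

    def evaluate(self, run_id: str, cpu_pct: float) -> list[dict]:
        # Assumption: every rule that is met produces its own incident ticket.
        return [
            {
                "run_id": run_id,
                "area": self.area,
                "sub_option": self.sub_option,
                "threshold_pct": rule.threshold_pct,
                "observed_pct": cpu_pct,
            }
            for rule in self.rules
            if rule.is_met(cpu_pct)
        ]


# Three rules on one policy, matching the 60% / 65% / 70% example.
policy = IncidentPolicy(
    area="Workflows",
    sub_category="Over Provisioning",
    sub_option="Cluster CPU Over Provisioning",
    rules=[IncidentRule(60), IncidentRule(65), IncidentRule(70)],
)

incidents = policy.evaluate("run-42", cpu_pct=62.0)
print([i["threshold_pct"] for i in incidents])  # [65, 70]
```

Under this assumption, a run at 62% Cluster CPU meets the 65% and 70% rules but not the 60% rule, so two incidents would be raised for that run.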

Email notifications are turned on automatically for every incident rule in an Incident Policy. However, no email group is attached to a rule by default, so no notifications are sent until one or more email groups are selected for that rule.
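The notification defaults described above can be sketched as follows: notifications start enabled, but with an empty group list the set of recipients is empty, so nothing is sent. The names `NotificationSettings` and `recipients`, and the example groups and addresses, are hypothetical and used only for illustration; they are not part of LHO.

```python
from dataclasses import dataclass, field


@dataclass
class NotificationSettings:
    """Hypothetical per-rule settings mirroring the documented defaults."""
    email_enabled: bool = True  # on by default for every incident rule
    email_groups: list[str] = field(default_factory=list)  # no group attached by default


def recipients(settings: NotificationSettings, groups: dict[str, list[str]]) -> list[str]:
    """Return the addresses an incident email would reach; empty means nothing is sent."""
    if not settings.email_enabled:
        return []
    addresses: list[str] = []
    for name in settings.email_groups:
        for addr in groups.get(name, []):
            if addr not in addresses:  # de-duplicate while keeping order
                addresses.append(addr)
    return addresses


# Hypothetical email groups defined elsewhere in the app.
groups = {"finops": ["ana@example.com", "li@example.com"], "data-eng": ["li@example.com"]}

default_rule = NotificationSettings()
print(recipients(default_rule, groups))  # [] (enabled, but no group selected)

configured = NotificationSettings(email_groups=["finops", "data-eng"])
print(recipients(configured, groups))  # ['ana@example.com', 'li@example.com']
```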

The following incidents are configurable in LHO:

Entity	Incident	Sub-options
Subscriptions	Monthly Cost above Threshold
Workspaces	Monthly Cost above Threshold
Workflows	Monthly Cost above Threshold
Workflows	Over-Provisioning	Cluster CPU, Cluster Memory, Driver CPU, Driver Memory
Workflows	Under-Provisioning	Cluster CPU, Cluster Memory, Driver CPU, Driver Memory
Workflows	Imbalanced-Provisioning	CPU, Memory
Workflows	Bad Skew
Workflows	Disk Spillage
Workflows	Run Failure
Workflows	Job with All Purpose Clusters
Delta Live Tables	Over-Provisioning	Cluster CPU, Cluster Memory
Delta Live Tables	Under-Provisioning	Cluster CPU, Cluster Memory
Delta Live Tables	Update Failure
Delta Live Tables	Monthly Cost above Threshold
All Purpose Clusters	Monthly Cost above Threshold
All Purpose Clusters	Auto Shutdown Timeout	Shutdown Timeout above Threshold, Shutdown Timeout Missing
All Purpose Clusters	Total Idle Time above Threshold
All Purpose Clusters	Over-Provisioning	Cluster CPU, Cluster Memory, Driver CPU, Driver Memory
All Purpose Clusters	Under-Provisioning	Cluster CPU, Cluster Memory, Driver CPU, Driver Memory
Pools	Auto Shutdown Timeout	Shutdown Timeout above Threshold

Lakehouse Optimizer Incidents and Notifications Configuration
