  Analysis Management guide

    Introduction

    Analysis Management allows admin users - users who maintain the Lakehouse Optimizer setup - to audit and, if necessary, debug the analysis jobs that Lakehouse Optimizer runs on the users' Databricks data.

    Analysis Management Page

    On this page the user can view their telemetry data and see whether that data was successfully detected and analyzed.

    At the top of the page, the user can select their Databricks Workspace, and then the page will show the data for that workspace only.

    This page is largely split into two parts: Detection and Analysis. The most recent run from each history is displayed at the top of its section, giving the user a quick indication of the state of the latest runs - did they finish or fail, or is there no history logged at all because the telemetry processor is turned off?

    Detection History

    Under the hood, Lakehouse Optimizer reads data directly from Databricks, saves it, and runs its detection engine through all of the gathered data. Based on that data, Lakehouse Optimizer can discover new assets such as jobs, clusters, or SQL warehouses.

    The result of that process is the Detection History. Each step in it indicates a detected gap - data that is missing somewhere in the overall process Lakehouse Optimizer goes through to produce the data displayed on the overview and reports pages.

    Each Detection Step listed is an instance of data that is potentially missing from LHO's database compared to what is detected in Databricks.
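
    To make the idea concrete, a detection step can be thought of as a diff between the assets LHO already knows about and the assets currently visible in Databricks. The following minimal sketch is illustrative only - it is not LHO's actual implementation - and uses the databricks-sdk Python package; the known_job_ids set stands in for LHO's database.

      from databricks.sdk import WorkspaceClient

      # Hypothetical stand-in for the job IDs already stored in LHO's database.
      known_job_ids = {1042, 1043}

      # Reads Databricks credentials from the environment or a config profile.
      w = WorkspaceClient()

      # Ask Databricks which jobs currently exist in the workspace.
      live_job_ids = {job.job_id for job in w.jobs.list()}

      # Jobs visible in Databricks but absent locally are the "detected gaps".
      for job_id in sorted(live_job_ids - known_job_ids):
          print(f"Detected job missing from local store: {job_id}")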

    Analysis History

    Analysis happens automatically after any given detection step. This view shows which jobs and notebooks were found and analyzed.

    Example

    Let's say there was a detectMissingDbxJobRuns step in the Detection History. We could take the Execution Id of that detection step, go over to the Analysis view, and select Interactive Notebooks from the dropdown on the left.

    Then we can search by “Execution Group Id” using the ID from the detection step, and any missing notebooks connected to that step will show up. The user can then investigate whether they were successfully processed.
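
    Conceptually, this lookup is just a filter over the analysis records by their execution group ID. The sketch below is purely illustrative; the field names (execution_group_id, notebook_path, status) are assumptions, not LHO's actual schema.

      # Hypothetical analysis records; field names are assumptions.
      records = [
          {"execution_group_id": "exec-123", "notebook_path": "/Users/a/etl", "status": "SUCCESS"},
          {"execution_group_id": "exec-123", "notebook_path": "/Users/b/adhoc", "status": "FAILED"},
          {"execution_group_id": "exec-999", "notebook_path": "/Users/c/report", "status": "SUCCESS"},
      ]

      def by_execution_group(records, group_id):
          """Return the analysis entries tied to one detection step's execution group."""
          return [r for r in records if r["execution_group_id"] == group_id]

      for entry in by_execution_group(records, "exec-123"):
          print(entry["notebook_path"], entry["status"])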

    If an analysis fails, an eye icon appears next to the status; clicking it reveals more details about the failure.

    A simple cause might be that the application was briefly stopped while the analysis was running.

    Performance Evaluation Report

    This is a thorough report derived from the analysis, broken down into waste, latency, and orchestration topics.

    Telemetry Detection Intervals

    Here the user can set how often the detection engine should run the detection steps.
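
    As a rough mental model - not LHO's actual scheduler, which is configured entirely through this page - interval-based detection boils down to running each step whenever its configured interval has elapsed. The step name detectNewClusters and the run_detection_step function below are hypothetical.

      import time

      # Hypothetical per-step intervals, in seconds; in LHO these are UI settings.
      detection_intervals = {
          "detectMissingDbxJobRuns": 15 * 60,  # every 15 minutes
          "detectNewClusters": 60 * 60,        # every hour (hypothetical step name)
      }

      def run_detection_step(name):
          """Placeholder for the real detection work."""
          print(f"running {name}")

      # Naive loop: run each step whenever its interval has elapsed.
      last_run = {name: 0.0 for name in detection_intervals}
      while True:
          now = time.time()
          for name, interval in detection_intervals.items():
              if now - last_run[name] >= interval:
                  run_detection_step(name)
                  last_run[name] = now
          time.sleep(30)  # poll for due steps twice a minute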

    Incidents Engine Configuration

    Here the user can set how often the incidents engine checks for incidents, broken down by incident type.
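
    The per-type configuration can be pictured as a mapping from incident type to check interval; computing when the next check is due is then straightforward. The incident type names below are illustrative assumptions, not LHO's actual type list.

      from datetime import datetime, timedelta

      # Hypothetical check intervals per incident type.
      incident_check_intervals = {
          "FAILED_JOB_RUN": timedelta(minutes=5),
          "OVERPROVISIONED_CLUSTER": timedelta(hours=6),
      }

      def next_check(incident_type, last_checked):
          """When is the next check due for this incident type?"""
          return last_checked + incident_check_intervals[incident_type]

      print(next_check("FAILED_JOB_RUN", datetime(2024, 1, 1, 12, 0)))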

    Reanalyze Telemetry

    This section allows the user to rerun the analysis jobs found in the Analysis History. That should not normally be necessary, since analysis runs automatically, but the option is available as a fallback.

    Warning: this process is resource-intensive (it consumes LHO resources) and can take a long time to run.