2.0 Release Notes

 

🎉 Highlights of the 2.0 release

  • support for Delta Live Tables monitoring

  • authorization rules

  • improved skew and spillage detection

  • VM Pool cost tracking

  • Databricks on AWS support!

 

What’s new:


Smart Tips

Data Skew info

Lakehouse Monitor detects skewed data and pinpoints the responsible Spark code section down to the job and stage IDs. With one click, you can open the Spark UI directly at the task that requires your attention.

Figures: Data Skew tooltips; skew analysis
  • PR 2159: Parametrised skew algorithm

    METRIC_PROCESSOR_SKEW_THRESHOLD:0.3
    SKEW_LOW_PASS_FILTER_MS:5000
    SKEW_STD_DEV_MULTIPLY_FACTOR:2
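
These notes list the three skew parameters but do not describe how the algorithm combines them. Purely as an illustration, the sketch below shows one plausible reading: stages whose slowest task is shorter than the low-pass filter are ignored, the slowest task must exceed the mean by a multiple of the standard deviation, and its relative excess over the median must reach the threshold. The function name and the way the checks are combined are assumptions, not the product's implementation.

```python
from statistics import mean, median, pstdev

# Values copied from the release note; how they interact below is assumed.
SKEW_THRESHOLD = 0.3           # METRIC_PROCESSOR_SKEW_THRESHOLD
LOW_PASS_FILTER_MS = 5000      # SKEW_LOW_PASS_FILTER_MS
STD_DEV_MULTIPLY_FACTOR = 2    # SKEW_STD_DEV_MULTIPLY_FACTOR


def is_stage_skewed(task_durations_ms):
    """Hypothetical check: True when a stage's task durations look skewed."""
    if len(task_durations_ms) < 2:
        return False
    slowest = max(task_durations_ms)
    # Low-pass filter: skip stages whose slowest task is too short to matter.
    if slowest < LOW_PASS_FILTER_MS:
        return False
    avg = mean(task_durations_ms)
    spread = pstdev(task_durations_ms)
    # Outlier test: the slowest task sits well above the mean.
    is_outlier = slowest > avg + STD_DEV_MULTIPLY_FACTOR * spread
    # Relative test: excess of the slowest task over the median, as a ratio.
    relative_excess = (slowest - median(task_durations_ms)) / slowest
    return is_outlier and relative_excess >= SKEW_THRESHOLD
```

Under this reading, raising SKEW_STD_DEV_MULTIPLY_FACTOR or SKEW_LOW_PASS_FILTER_MS would make the detection less sensitive.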

Disk Spillage Analysis

Lakehouse Monitor detects when data spills to disk and pinpoints the responsible Spark code section down to the job and stage IDs. With one click, you can open the Spark UI directly at the task that requires your attention.

  • PR 2260: Fix spill not being analyzed for pipeline jobs

  • PR 2273: User Story 4039, User Story 4073 - Spill and Skew improvements

UX Optimizations

  • PR 2215: User Story 3689: [UI] persist tab selection on cluster stats widget - idle time

  • PR 2285: User Story 4155: [UX] add horizontal scrollbars for large tables

  • PR 2286: User Story 4158: [ui] fiscal year configuration allows only YEAR and MONTH selection

  • PR 2287: Bug 4163: Cost on the reports page should always show 2 decimals.

  • PR 2202: User Story 4002: trim cost value to only 2 decimals after . in the reports overview (dlt, jobs, clusters)

  • PR 2258: Warning message when new version is available

  • PR 2238: User Story 4001: add tooltip for total cost per month in the subscription cost widget

  • PR 2246: Bug 3705: [UI] Scrollbar for reordering of columns should be more sensitive and drag and drop might be better made from the whole box

  • PR 2227: Bug 4063: Add comma for large values not only on the tooltips on Reporting, but for the actual item costs, and for the budget configuration on Overview

  • PR 2221: User Story 3711: detect that the UI has been updated and display a dialog asking the user to refresh

  • PR 2226: User Story 4044: update color scheme for bar-charts

  • PR 2175: User Story 3689: [UI] persist tab selection on cluster stats widget - idle time

  • PR 2320: User Story 4094: add in the tooltip a separator line for total cost in the job overview (job, clusters, dlt overview)

  • PR 2319: User Story 4227: [UX] add horizontal scrollbars per page if the entire page becomes crowded

  • User Story 4051: [DLT] navigate to pipelines detail view for 30 days from the enable/disable monitor screen

  • User Story 4049: [jobs] navigate to runs detail view for 30 days from the enable/disable monitor screen

  • User Story 4050: [all-purpose-clusters] navigate to runs detail view for 30 days from the enable/disable monitor screen

  • User Story 4080: [UI] set ID as separate column when enabling/disabling monitoring

  • User Story 3999: add group separators on the left navigation bar

  • User Story 4273: [UI] Adjust view configuration error message based on the workload type

  • User Story 4278: approximate with "$2.3k" and do not show decimals after . in the overview panels - subscription budget

  • User Story 4004: use MM/DD/YYYY for date formatting across app

  • User Story 4279: add approximation for workspace and subscription cost stats - overview left lower widget

  • User Story 4283: increase width of tooltips in the Operational management page

  • User Story 4284: maintain height for on demand telemetry analysis details panel

  • User Story 4302: rename "Consumption Loader Configuration" page to "Consumption Data Management"

  • User Story 4303: rename "Operational Management" page to "Analysis Management"
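
Several items above change how costs and dates are displayed: two decimals, comma separators for large values, the "$2.3k" approximation, and MM/DD/YYYY dates. The sketch below illustrates these display rules. The function names, the "M" suffix, and the cut-off points for switching suffixes are assumptions made for the example.

```python
from datetime import date


def format_cost(value):
    """Exact cost with comma separators and exactly two decimals."""
    return f"${value:,.2f}"


def approximate_cost(value):
    """Compact cost for overview panels, in the style of "$2.3k".

    The suffix thresholds are assumed; the notes only show the "k" form.
    """
    for divisor, suffix in ((1_000_000, "M"), (1_000, "k")):
        if abs(value) >= divisor:
            return f"${value / divisor:.1f}{suffix}"
    # Small amounts are shown without decimals.
    return f"${value:.0f}"


def format_date(day):
    """Render a date as MM/DD/YYYY."""
    return day.strftime("%m/%d/%Y")
```

For example, `format_cost(1234.5)` gives `$1,234.50`, `approximate_cost(2300)` gives `$2.3k`, and `format_date(date(2022, 1, 5))` gives `01/05/2022`.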

 

Performance Optimizations 

  • PR 2214: Separate the analysis of non-Spark jobs, notebook and clusterIdleness, from that of job runs

  • PR 2185: Fixed Spark caching inside consumption loading post-processing, which cuts refresh consumption loading for the current year session from 2.5 hours to 20-25 minutes (tested on BDE and BP Labs).

  • PR 2299: [3918] persist cluster configuration in the database for each job/pipeline/notebook analysis

  • PR 2248: [4105] Store analysis directly to database

  • Feature 3777: performance optimizations large scale / enterprise deployments

  • PR 2261: [4087] add flag to filter cost data only for workspaces in subscriptionMetadata.csv

  • PR 2218: Optimize consumption: write directly to DB

  • PR 2322: [4230] Optimize non-Spark job, notebook and cluster idleness processing

  • PR 2205: [US-2942] Persist jobs, clusters, pipelines initial configuration

    • automatically save initial configuration before enabling the monitor

  • [4295] Doctor and queue notification processors retry indefinitely to connect even though storage account does not exist. NPE not handled

  • PR 2412: Pre-filter on Usage Details API - reduce amount of data loaded from Consumption API

  • User Story 4331: Change default consumption loading batch size to 10 days

  • 2426: Filter task metrics for which task run id could not be resolved

  • User Story 4400: Detect earliest available cluster events at the consumption loading start

  • [4362] Use Maps instead of DTOs to retrieve and send cluster/job/pipeline configurations.

  • [4362] Add support for provisioning single node job-clusters
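
User Story 4331 above sets the default consumption loading batch size to 10 days. As a rough illustration of what batching by days means, the sketch below cuts a date range into consecutive windows of at most 10 days. The function name and the inclusive end-date convention are assumptions, not the loader's actual code.

```python
from datetime import date, timedelta

DEFAULT_BATCH_DAYS = 10  # default named in User Story 4331


def split_into_batches(start, end, batch_days=DEFAULT_BATCH_DAYS):
    """Split [start, end] (inclusive) into windows of at most batch_days days."""
    if batch_days < 1:
        raise ValueError("batch_days must be at least 1")
    batches = []
    cursor = start
    while cursor <= end:
        # The last window is shortened so it never runs past the end date.
        batch_end = min(cursor + timedelta(days=batch_days - 1), end)
        batches.append((cursor, batch_end))
        cursor = batch_end + timedelta(days=1)
    return batches
```

Loading January 1 to January 25 this way yields three batches: days 1-10, 11-20, and 21-25.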

Overview

Services Cost Distribution

  • User Story 3908: [UI] show cost distribution between job clusters, all purpose clusters, pipelines.

Cost Breakdown by Resource

With this release, Lakehouse Monitor also tracks the cost of the VM Pools.

  • Bug 4016: [UI] Change Vm Pools label to Pools on overview

  • User Story 3820: Clicking a different date range should persist as I toggle back and forth between the two views, "workspaces" and "subscriptions"

  • Add VM Pools cost information

  • PR 2276: [3847] VM Pool category in overview reports

  • PR 2207: User Story 3908: [UI] show cost distribution between job clusters, all purpose clusters, pipelines.

Subscription budget optimizations

  • PR 2324: Bug 3819: Changing the FY start date from December 1 2021 to Jan 1 2022 doesn't persist when I click "save".

  • PR 2232: [US-3712] Replace spark ml with commons math3 for cost prediction algorithm

    • improves the speed for Overview’s subscription budget widget

  • User Story 4326: use the trendline (estimate) to mark the "in budget" or "over budget" on the cumulative tab for subscription cost

    • provide cost trendline in the cumulative view per fiscal year
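
The trendline-based budget marker can be pictured as a straight line fitted to the cumulative cost so far and extended to the end of the fiscal year. The sketch below is a minimal plain-Python version of that idea using ordinary least squares. It does not use Commons Math3, and the function names, the day-index convention, and the comparison against a single budget figure are all assumptions.

```python
def fit_trendline(cumulative_costs):
    """Least-squares line through (day_index, cumulative_cost) points."""
    n = len(cumulative_costs)
    if n < 2:
        raise ValueError("need at least two data points")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(cumulative_costs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cumulative_costs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def budget_status(cumulative_costs, days_in_fiscal_year, budget):
    """Project the trendline to the last day of the fiscal year (assumed rule)."""
    slope, intercept = fit_trendline(cumulative_costs)
    projected = slope * (days_in_fiscal_year - 1) + intercept
    label = "over budget" if projected > budget else "in budget"
    return label, projected
```

With cumulative costs of 100, 200, 300 and 400 over the first four days, the fitted line grows by 100 per day, so a 365-day year projects to 36,500 and is compared against the configured budget.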

Documentation

  • PR 2318: User Story 4157: [docs] explain skew tooltip and rename "skew" to "data skew"

  • PR 2228: User Story 2682: add Help button that links to a support form and user documentation

    • feedback form

    • contact email for suggestions

  • User Story 4414: [docs] add tooltip for disk spillage column header

  • User Story 4424: [docs] tooltip for memory & cpu

  • User Story 4416: [docs] shuffle size meaning as tooltip for column header

Reports

  • User Story 4385: disable sorting on Job Runs on all columns except job run id

  • [US-4290] Change implementation for job reports, change cache timeout to 3 min

  • [4240] Pipeline support search by name

  • PR 2387: Refactor the code that lists Databricks workspaces. The list now always shows both the workspaces retrieved from Azure and the workspaces from the public subscription

  • Bug 4314: Reports - update the number of items per page when search is applied

  • PR 2427: Approximate cost for cluster idleness periods

    • Compute approximate costs for cluster idleness periods and store them in the database

  • User Story 4182: warn user that not all jobs have been enabled for monitoring when using "monitor all"

  • User Story 3909: [UX] move job id in the job run details header

VMs Allocation Timeline Chart

  • PR 2202: User Story 4002: trim cost value to only 2 decimals after . in the reports overview (dlt, jobs, clusters)

 

  • PR 2157: User Story 3689: [UI] persist tab selection on cluster stats widget - idle time

  • PR 2160: User Story 3881: [UI] Rename Execution Status column from Reporting view

  • PR 2151: User Story 3781: add subscription ID and workspace ID to the URL for the job runs view

  • User Story 3892: [UI] - Add hover information on each bar on Worker VMs Allocation Timeline Chart

  • User Story 4439: Include cost data in job autoscaling timeline

    • support jobs whose duration spans multiple days

Persist cluster configuration for job runs, notebooks, pipelines

  • PR 2257: [4119] fix report error on missing cost_avg

  • PR 2191: [3803] Save cost distribution by cluster types, pipelines for overview api

  • PR 2309: Save stageInfo in a separate file

  • PR 2316: [4032]: Persist cluster events into DB

  • PR 2312: [US-4098] Add apis for getting cluster configuration for job runs, notebooks, pipelines

  • User Story 4423: open cluster in Databricks from "view cluster configuration"

Delta Live Tables

1. Configure monitoring on your pipelines.

2. Review stats regarding cost and telemetry.

3. Explore detailed information for each pipeline update.

  • PR 2270: [3880] Filter out running DLT clusters when listing all purpose clusters

  • PR 2269: [4118] Display maintenance jobs that have no analysis

  • PR 2274: Maintenance job tab

  • PR 2252: User Story 4107: [UI] Rename Cost per Table to Cost per Pipeline

  • PR 2307: [4193] provisioning jobs with pipeline_task(s) should also provision the pipelines

  • PR 2210: [UI] Add breakdown for pipeline duration

  • PR 2212: Add default error handling for delta live tables enable

  • User Story 3805: “Enable Delta Live Tables” button on settings page

  • PR 2203: User Story 4021: apply date range filter only on the left widget

  • PR 2208: Add breakdown for pipeline duration

  • PR 2201: [4020] [DLT] updates with no cost show as zero cost, should be n/a

  • PR 2199: [4019] [DLT] NPE when attempting to view a pipeline configuration

  • PR 2198: User Story 3717: [UI] update tooltips to include reference to DLT cost data

In Progress Markers

  • PR 2284: In Progress hint for pipelines and job runs

  • PR 2234: [US-4067] Add flag for job reports indicating if the job is running or not

  • PR 2235: Add support for filtering pipeline analysis by date range and in progress hint

Navigation

  • User Story 4166: add "open in Databricks" the DLT pipeline from the monitor configuration page

  • User Story 4170: add "open in Databricks" the Job from the monitor configuration page

  • User Story 4171: add "open in Databricks" the All purpose clusters from the monitor configuration page

  • User Story 4062: [UI] mark jobs that are related to DLT

  • User Story 4289: make the date interval mandatory - default to 7 days

  • User Story 4291: [UI] Rename column from #runs to Analyzed Runs on Reports tabs

  • Bug 4247: [UI] Enable sorting by pipeline id, creator and monitor status just as it is for jobs and all purpose clusters

  • User Story 4330: add tooltip "Open in Reports" when I hover over the link in Jobs, All purpose Clusters, Delta Live Tables

Collected data

  • Save stageInfo in a separate file

  • User Story 4229: store in taskMetrics description of the error messages for failed tasks

  • [4259] drop 30-days limit for proportional cost calculation

  • [4277] Collector stores pipeline task metrics in the job task metrics blob. Fix pipeline id not being resolved

  • User Story 4308: update labels for Reports // Jobs page -- analyzed vs all jobs tooltips and info

Security

  • PR 2300: [4218] - removing azure keyvault and azure management functions as required dependencies in the grant consent dialog and in the app

  • PR 2155: Implement retry policy when we add service principal in databricks

  • PR 2326: [4255] - Disable object-id requirement when azure management is disabled

  • PR 2429: Add possibility to link with existing secret scope

Authorization Rules

  • PR 2437: Authorized jobs cache with key formed by user & workspace. Invalidate cache on logout

  • PR 2289: [BUG-4110] Display only authorized services

  • PR 2280: Bug 4169: [UI] - Managed Identity errors are shown even if it's not enabled

  • PR 2277: [4129] - Same workspace, different environments, difference in user authorization

  • PR 2271: [4124] - Check grants doesn't throw error if user is not authorized.

  • PR 2243: [US-4035] Show in reports only costs for authorized jobs, clusters, pipelines

See more here: Authorization Rules
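
PR 2437 describes a cache of authorized jobs keyed by user and workspace that is invalidated on logout. The sketch below shows that pattern in its simplest form. The class name, the loader callback, and the method names are hypothetical and only illustrate the keying and invalidation; they are not taken from the product.

```python
class AuthorizedJobsCache:
    """Caches the set of authorized job ids per (user, workspace) pair."""

    def __init__(self, loader):
        # loader is a hypothetical callable: (user, workspace) -> iterable of job ids
        self._loader = loader
        self._entries = {}

    def get(self, user, workspace):
        key = (user, workspace)
        if key not in self._entries:
            # Cache miss: ask the loader once and remember the answer.
            self._entries[key] = frozenset(self._loader(user, workspace))
        return self._entries[key]

    def invalidate_user(self, user):
        """Drop every entry that belongs to the user, for example on logout."""
        for key in [k for k in self._entries if k[0] == user]:
            del self._entries[key]
```

Keying on both user and workspace keeps one user's grants in one workspace from leaking into another workspace or another user's session.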

Operational Management – Telemetry & Consumption

  • PR 2308: Design, layout and search functionality for metrics processor history

  • PR 2302: User Story 4207: use correlationID instead of Execution Id for the ConsumptionLoad in the Run History

  • PR 2301: Enhance Metrics processor audit

  • PR 2272: User Story 4139: [UX] improve consumption loading history dialog

  • PR 2265: Run metrics processor scheduled tasks sequentially

  • PR 2262: Expose background processors provider (MANAGED IDENTITY AND/OR SERVICE_PRINCIPAL)

  • add collector commit id

  • PR 2253: Add initial delay configuration for metrics processor doctor

  • PR 2256: [4127] fix refresh mode data cleaning

  • PR 2229: Bug 3914: Consumption history start time displayed as hour 12 for backend 00

  • PR 2127: [3660] - [API] integrate SMTP alerting in case of ConsumptionLoading background process failing

  • PR 2200: [3917] add commitid (or any other id) in the name of the metrics files to provide versioning information

  • PR 2195: Limit metrics historical APIs response size

  • PR 2191: [US-3803] Save cost distribution by cluster types, pipelines for overview api

  • PR 2185: [3953] fix backfilling costs into DB in incremental mode

  • PR 2181: Metrics Processor and Metrics Processor Doctor historical APIs

  • PR 2174: [US-3904] Make cache timeout configurable

  • PR 2166: [3848] - Expose bplm log folder of bplm container to host machine

  • PR 2155: Implement retry policy when we add service principal in databricks

  • PR 2156: Sort consumption history by start time desc, sort steps of same execution id by start time asc.
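
The ordering described in PR 2156 (executions newest first, steps within one execution oldest first) can be sketched as a two-level sort. The record layout with `execution_id` and `start_time` keys is an assumption, as is treating an execution's start as the start of its earliest step.

```python
from collections import defaultdict


def sort_history(steps):
    """Order executions newest first; keep each execution's steps oldest first."""
    by_execution = defaultdict(list)
    for step in steps:
        by_execution[step["execution_id"]].append(step)
    # Steps of the same execution: ascending start time.
    for group in by_execution.values():
        group.sort(key=lambda s: s["start_time"])
    # Executions: descending by the start time of their earliest step (assumed).
    ordered_groups = sorted(
        by_execution.values(), key=lambda g: g[0]["start_time"], reverse=True
    )
    return [step for group in ordered_groups for step in group]
```

ISO 8601 timestamps in a single timezone compare correctly as plain strings, so the same function works for `datetime` values and for such strings.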

AWS support

  • with S3 as cloud storage solution

  • use Databricks identity provider

    • accounts API - username and password

🎛️ Configuration

  • [4268]: Support to update per workspace, spark-env extraListeners and report period

  • [4256] - timeout user login session after 1 hour

  • [4286] - [SecretScopes] Validation issue for Secret scopes with AzureKeyVault backend type

  • [4285] - Error thrown on DLT page when we list pipelines and secret scope is not set.

  • [3795] - Expose active directory OAuth configuration to application config

  • [4239] Add scheduler name validation

  • User Story 4310: [UI] - Hide buttons related to MI if Managed Identity is disabled

  • User Story 4249: [UI] support default “deployment configuration”

  • [4333] - add config: AUTO_UPDATE_PRIVATE_METADATA_FILE

  • User Story 4264: add tab for Access Rights configuration

Deployment

  • PR 2264: Update azure-pipelines.yml for Azure Pipelines

  • PR 2291: [4194] - Detect if the workspace is premium or not and add a flag, such that Enable DLTs button is deactivated if non premium

  • PR 2278: [4168] - Create script to enable the bplm app as a Linux service when deploying without Docker.

  • PR 2174: [US-3904] Make cache timeout configurable

  • PR 2169: [3848] - Expose bplm.log file of bplm-container to host machine

  • secure deployment with no public access in Azure

  • PR 2209: Hardcoded commitId of collector in init script and spark-conf

  • PR 2200: [3917] add commitid (or any other id) in the name of the collector jar file to provide versioning information

Miscellaneous

  • PR 2305: User Story 4094: add in the tooltip a separator line for total cost in the job overview (job, clusters, dlt overview)

  • PR 2296: [4202] Maintenance reporting overview displays aggregated telemetry as n/a even when telemetry exists

  • PR 2303: Default selection for secret scopes dialog when keyvault is not enabled

  • PR 2288: Fix analyzer when it is executed from notebook

  • PR 2275: Fix filter by skew

  • PR 2279: [4137] Change type of error response for AzureException in ControllerExceptionHandler

  • PR 2266: User Story 4143: [UX] improve Error messages

  • PR 2268: [4130] - Logout throws error

  • PR 2259: [4135] - Support to enable/disable collector logs

  • PR 2263: Remove timestamp from task metrics reporting when append mode is supported

  • PR 2249: Update spark conf extra classpath regex to match new collector jar name with commitId

  • PR 2163: User Story 3892: [UI] - Add hover information on each bar on Worker VMs Allocation Timeline Chart

  • PR 2162: [3886] make PoolId to be a case insensitive string

  • User Story 4383: rename widget to "Cost per Pipeline Maintenance Job" on Reports / Delta Live Tables / Maintenance Jobs

  • User Story 4370: [UX] add vertical and horizontal scrollbars per page if the entire page becomes overcrowded

🐞Fixed Bugs

  • PR 2306: Bug 3941: Loader scheduler can be saved without having a name

  • PR 2294: Bug 3607: [UI] Costs per jobs - costs cram into colored segment bar

  • PR 2295: Bug 4217: [UI] Pipeline cost list is empty when selecting Time tab on all purpose clusters - Clusters tab

  • PR 2292: Bug 4199: [UI] Overview - Cost per workspaces chart wrongly displays the length of the horizontal bar charts due to missing pools cost in the UI

  • PR 2255: [BUG-4108] Fix column length in databricks services configuration table

  • PR 2239: Bug 3915: Icon for job executed for job runs api is displayed when it shouldn't be, for jobs and notebooks on all purpose clusters

  • PR 2171: Bug 3898: jobs run by submitted API shown as deleted on the all-purpose-clust...

  • PR 2304: Bug 3704: [UI] Column titles not aligned

  • Bug-3900: [UI] Different date interval initially displayed

  • BUG-3666: Add workspace name for only storage costs overview

  • Bug 3792: [UI] Costs per clusters - make the content reach the box margins when no results are available

  • PR 2192: Bug 3730: [UI] Overview - All widgets change size if tabs are switched on Daily alerts

  • PR 2187: Bug 3998: [UI] Cost breakdown bar colors are not right

  • PR 2183: Bug 3940: autoscaling timeline tooltip not readable when using light theme

  • PR 2162: [3886] make PoolId to be a case insensitive string

  • Bug 4116: Entry displayed twice in Pipelines/Cost per table

  • Bug 4245: [UI] Multiple unsaved consumption loading schedulers contain in the name the # of the last scheduler tab

  • Bug 3819: Changing the FY start date from December 1 2021 to Jan 1 2022 doesn't persist when I click "save".

  • [4267] [API] DLT Updates view - Total cost is N/A but all individual costs are 0

  • [4271][Logging] Analyser clutters logs when processing task metrics of a deleted job

  • 4270 - Operational management: saving analysis configuration twice throws error

  • 4272: [UI] Content is not refreshed when switching to a workspace where I am not authorized

  • 4266: [UI] Percentage on daily alerts widget table have 3 decimal points

  • 4275: [UI] Jobs and All purpose clusters pages content is not refreshed when switching to a workspace where I am not authorized

  • 4276: [UI] - Navigating back from all purpose cluster job reporting view does not select correct tabs

  • Bug 4195: [UI] Start times on Telemetry Analysis Details and History must mention am/pm

  • BUG 4293 - Fix job duration breakdown report

  • [4294] - Issue with enable monitor when agent advanced settings are saved and a redeploy changes the collector-jar name.

  • Bug 4297: [UI] Maintain capital/small letters on elements of Settings page

  • Bug 4251: job run id not aligned on small screen

  • Bug 4269: [UI] Schedule name is overwritten when moving between tabs

  • Bug 4231: [UI] Reports: new search by date interval does not refresh items page selection

  • [4221] Jobs on all purpose clusters are not alphabetically ordered

  • Bug 4311: [UI] Sorted Jobs are not found when clicking on cost job list

  • Bug 4246: [UI] convert start time to client(browser) timezone not only on loader run history, but also on last run

  • Bug 4236: [UI] Telemetry analysis details/Cluster idleness - date incorrectly displayed

  • Bug 4320: [UI] Make x hover on date interval not to overlap the actual date and align ids with workspace and subscription names

  • Bug 4336: [UI] All purpose clusters/Jobs & Notebooks - sorting by cluster does not work

  • Bug 4337: [UI] All purpose clusters/Notebook runs distribution & Dlts - items page cannot be changed after the number of items per page has been modified

  • Bug 4338: [UI] Jobs runs - sorting by task run id does not work. disable sorting by memory and cpu

  • Bug 4325: Consumption last run does not show status and error message for failed load

  • Bug 3706: [UI] Column reordering causes ids not to be displayed

  • Bug 4350: [UI] - Schedule since & until dates should be displayed as they come from the api

  • Bug 4353: [UI] - Reporting view calendar displaying wrong date on UTC- timezones

  • [BUG 4440] Fix duration execution for running jobs

  • Bug 4442: [UI] Search doesn't work if request on runs does not finish

  • Bug 4421: almost zero cost bar chart issues

  • PR 2449: Fix missing analysis detection for streaming jobs

  • Bug 4373: Calculating proportional VM Pool costs for streaming / long jobs

 


Last commit id: b89580346