2.0 Release Notes
🎉 Highlights of 2.0 release
support for Delta Live Tables monitoring
authorization rules
improved skew and spillage detection
VM Pool cost tracking
Databricks on AWS support!
What’s new:
Smart Tips
Data Skew info
Lakehouse Monitor detects skewed data and pinpoints the responsible Spark code section down to the job and stage IDs. With one click, you can open the Spark UI directly at the task that requires your attention.
PR 2159: Parametrised skew algorithm
METRIC_PROCESSOR_SKEW_THRESHOLD:0.3
SKEW_LOW_PASS_FILTER_MS:5000
SKEW_STD_DEV_MULTIPLY_FACTOR:2
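The notes list the three skew parameters but not how the algorithm combines them. As a rough illustration only, the sketch below shows one plausible reading: tasks shorter than the low-pass filter are ignored, a task counts as an outlier when its duration exceeds the stage mean by the configured number of standard deviations, and a stage is flagged when the relative gap between the slowest and the median task exceeds the threshold. The function names and this combination logic are assumptions, not the product's implementation.

```python
from statistics import mean, median, pstdev

# Values quoted in the release notes. How Lakehouse Monitor combines them is
# not documented here; everything below is an illustrative assumption.
METRIC_PROCESSOR_SKEW_THRESHOLD = 0.3
SKEW_LOW_PASS_FILTER_MS = 5000
SKEW_STD_DEV_MULTIPLY_FACTOR = 2


def outlier_task_indices(durations_ms):
    """Return indices of tasks that run much longer than their stage peers."""
    # Assumed low-pass filter: short tasks are too noisy to judge.
    kept = [d for d in durations_ms if d >= SKEW_LOW_PASS_FILTER_MS]
    if len(kept) < 2:
        return []
    cutoff = mean(kept) + SKEW_STD_DEV_MULTIPLY_FACTOR * pstdev(kept)
    return [
        i for i, d in enumerate(durations_ms)
        if d >= SKEW_LOW_PASS_FILTER_MS and d > cutoff
    ]


def stage_is_skewed(durations_ms):
    """Flag a stage when it has outliers and a large slowest-vs-median gap."""
    if not outlier_task_indices(durations_ms):
        return False
    slowest = max(durations_ms)
    relative_gap = (slowest - median(durations_ms)) / slowest
    return relative_gap > METRIC_PROCESSOR_SKEW_THRESHOLD
```

Under these assumptions, a stage with nine 6-second tasks and one 60-second task is flagged, while a stage of uniformly sized tasks is not.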
Disk Spillage Analysis
Lakehouse Monitor detects when data spills to disk and pinpoints the responsible Spark code section down to the job and stage IDs. With one click, you can open the Spark UI directly at the task that requires your attention.
PR 2260: Fix spill not being analyzed for pipeline jobs
PR 2273: User Story 4039, User Story 4073 - Spill and Skew improvements
UX Optimizations
PR 2215: User Story 3689: [UI] persist tab selection on cluster stats widget - idle time
PR 2285: User Story 4155: [UX] add horizontal scrollbars for large tables
PR 2286: User Story 4158: [ui] fiscal year configuration allows only YEAR and MONTH selection
PR 2287: Bug 4163: Cost on the reports page should always show 2 decimals.
PR 2202: User Story 4002: trim cost value to only 2 decimals after . in the reports overview (dlt, jobs, clusters)
PR 2258: Warning message when new version is available
PR 2238: User Story 4001: add tooltip for total cost per month in the subscription cost widget
PR 2246: Bug 3705: [UI] Scrollbar for reordering of columns should be more sensitive and drag and drop might be better made from the whole box
PR 2227: Bug 4063: Add comma for large values not only on the tooltips on Reporting, but for the actual item costs, and for the budget configuration on Overview
PR 2221: User Story 3711: detect that the UI has been updated and display a dialog asking the user to refresh
PR 2226: User Story 4044: update color scheme for bar-charts
PR 2175: User Story 3689: [UI] persist tab selection on cluster stats widget - idle time
PR 2320: User Story 4094: add in the tooltip a separator line for total cost in the job overview (job, clusters, dlt overview)
PR 2319: User Story 4227: [UX] add horizontal scrollbars per page if the entire page becomes crowded
User Story 4051: [DLT] navigate to pipelines detail view for 30 days from the enable/disable monitor screen
User Story 4049: [jobs] navigate to runs detail view for 30 days from the enable/disable monitor screen
User Story 4050: [all-purpose-clusters] navigate to runs detail view for 30 days from the enable/disable monitor screen
User Story 4080: [UI] set ID as separate column when enabling/disabling monitoring
User Story 3999: add group separators on the left navigation bar
User Story 4273: [UI] Adjust view configuration error message based on the workload type
User Story 4278: approximate with "$2.3k" and do not show decimals after . in the overview panels - subscription budget
User Story 4004: use MM/DD/YYYY for date formatting across app
User Story 4279: add approximation for workspace and subscription cost stats - overview left lower widget
User Story 4283: increase width of tooltips in the Operational management page
User Story 4284: maintain height for on demand telemetry analysis details panel
User Story 4302: rename "Consumption Loader Configuration" page to "Consumption Data Management"
User Story 4303: rename "Operational Management" page to "Analysis Management"
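Several of the items above concern how costs are displayed: always two decimals, comma separators for large values, and a compact "$2.3k" style in the overview panels. A minimal sketch of such formatting rules follows. The function names are hypothetical, and the behaviour for values below 1,000 in the compact form (whole dollars) is an assumption.

```python
def format_cost(value):
    """Exact display: thousands separators and always two decimals."""
    return f"${value:,.2f}"


def approximate_cost(value):
    """Compact display for overview panels, e.g. 2300 -> "$2.3k"."""
    if value >= 1000:
        return f"${value / 1000:.1f}k"
    # Assumed behaviour below 1000: whole dollars, no decimals.
    return f"${value:.0f}"
```

For example, `format_cost(1234567.891)` gives `$1,234,567.89` and `approximate_cost(2300)` gives `$2.3k`.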
Performance Optimizations
PR 2185: Fixed Spark caching inside consumption loading post-processing, resulting in significant time reduction on refresh consumption loading: from 2.5 hours to 20-25 minutes for the current year session, tested on BDE and BP Labs.
PR 2299: [3918] persist cluster configuration in the database for each job/pipeline/notebook analysis
PR 2248: [4105] Store analysis directly to database
Feature 3777: performance optimizations large scale / enterprise deployments
PR 2261: [4087] add flag to filter cost data only for workspaces in subscriptionMetadata.csv
PR 2218: Optimize consumption: write directly to DB
PR 2322: [4230] Optimize non-Spark job, notebook and cluster idleness processing
PR 2214: Separate the analysis of non-Spark jobs, notebook and clusterIdleness, from that of job runs
PR 2205: [US-2942] Persist jobs, clusters, pipelines initial configuration
automatically save initial configuration before enabling the monitor
[4295] Doctor and queue notification processors retry indefinitely to connect even though storage account does not exist. NPE not handled
PR 2412: Pre-filter on Usage Details API - reduce amount of data loaded from Consumption API
User Story 4331: Change default consumption loading batch size to 10 days
2426: Filter task metrics for which task run id could not be resolved
User Story 4400: Detect earliest available cluster events at the consumption loading start
[4362] Use Maps instead of DTOs to retrieve and send cluster/job/pipeline configurations.
[4362] Add support for provisioning single node job-clusters
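Loading consumption data in batches of 10 days, as mentioned above, amounts to splitting the requested date range into consecutive windows. The sketch below illustrates the idea. The function name and the inclusive-range convention are assumptions, not the loader's actual code.

```python
from datetime import date, timedelta


def split_into_batches(start, end, batch_days=10):
    """Split the inclusive range [start, end] into consecutive batches."""
    if batch_days < 1:
        raise ValueError("batch_days must be at least 1")
    batches = []
    cursor = start
    while cursor <= end:
        # The last batch is shortened so it never runs past the end date.
        batch_end = min(cursor + timedelta(days=batch_days - 1), end)
        batches.append((cursor, batch_end))
        cursor = batch_end + timedelta(days=1)
    return batches
```

With these assumptions, a range from January 1 to January 25 yields three batches: days 1 to 10, 11 to 20, and 21 to 25.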
Overview
Services Cost Distribution
User Story 3908: [UI] show cost distribution between job clusters, all purpose clusters, pipelines.
Cost Breakdown by Resource
With this release, Lakehouse Monitor also tracks the cost of the VM Pools.
Bug 4016: [UI] Change Vm Pools label to Pools on overview
User Story 3820: Clicking a different date range should persist as I toggle back and forth between the two views, "workspaces" and "subscriptions"
Add vm pools cost information
PR 2276: [3847] VM Pool category in overview reports
PR 2207: User Story 3908: [UI] show cost distribution between job clusters, all purpose clusters, pipelines.
Subscription budget optimizations
PR 2324: Bug 3819: Changing the FY start date from December 1 2021 to Jan 1 2022 doesn't persist when I click "save".
PR 2232: [US-3712] Replace spark ml with commons math3 for cost prediction algorithm
improves the speed for Overview’s subscription budget widget
User Story 4326: use the trendline (estimate) to mark the "in budget" or "over budget" on the cumulative tab for subscription cost
provide cost trendline in the cumulative view per fiscal year
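To illustrate how a trendline can mark a subscription as "in budget" or "over budget", the sketch below fits a least-squares line through the cumulative daily cost and extrapolates it to the end of the fiscal year. This is a simplified stand-in written in plain Python. The actual prediction code uses commons math3, and its exact model, inputs and function names are not described in these notes, so everything here is an assumption.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def budget_status(cumulative_costs, days_in_fiscal_year, budget):
    """Project the cumulative cost to year end and compare with the budget."""
    days = list(range(1, len(cumulative_costs) + 1))
    slope, intercept = fit_line(days, cumulative_costs)
    projected = slope * days_in_fiscal_year + intercept
    status = "over budget" if projected > budget else "in budget"
    return status, projected
```

For instance, a cumulative cost that grows by 100 per day projects to 36,500 over a 365-day fiscal year, which is within a 40,000 budget but over a 30,000 one.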
Documentation
PR 2318: User Story 4157: [docs] explain skew tooltip and rename "skew" to "data skew"
PR 2228: User Story 2682: add Help button that links to a support form and user documentation
feedback form
contact email for suggestions
User Story 4414: [docs] add tooltip for disk spillage column header
User Story 4424: [docs] tooltip for memory & cpu
User Story 4416: [docs] shuffle size meaning as tooltip for column header
Reports
User Story 4385: disable sorting on Job Runs on all columns except job run id
[US-4290] Change implementation for job reports, change cache timeout to 3 min
[4240] Pipeline support search by name
PR 2387: Refactor list databricks workspaces code. When we list databricks workspaces we always display workspaces retrieved from azure AND workspaces from public subscription
Bug 4314: Reports - update the number of items per page when search is applied
PR 2427: Approximate cost for cluster idleness periods
Compute approximate costs for cluster idleness periods & store it in the database
User Story 4182: warn user that not all jobs have been enabled for monitoring when using "monitor all"
User Story 3909: [UX] move job id in the job run details header
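The approximate cost of cluster idleness periods mentioned above could be derived, for example, by attributing the cluster cost in proportion to idle time. The sketch below shows that idea only. The function name, the inputs and the proportional rule are assumptions, since the notes do not state how the approximation is computed.

```python
from datetime import datetime


def approximate_idle_cost(cluster_cost, cluster_start, cluster_end, idle_periods):
    """Attribute a share of the cluster cost to its idle time.

    idle_periods is a list of (start, end) datetime pairs that are assumed
    not to overlap and to lie within the cluster's uptime.
    """
    uptime_s = (cluster_end - cluster_start).total_seconds()
    if uptime_s <= 0:
        return 0.0
    idle_s = sum((end - start).total_seconds() for start, end in idle_periods)
    # Guard against inconsistent input: idle time cannot exceed uptime.
    idle_s = min(idle_s, uptime_s)
    return cluster_cost * idle_s / uptime_s
```

Under this rule, a cluster that cost 8.00 over four hours and sat idle for one of them is assigned an idle cost of 2.00.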
VMs Allocation Timeline Chart
PR 2202: User Story 4002: trim cost value to only 2 decimals after . in the reports overview (dlt, jobs, clusters)
PR 2157: User Story 3689: [UI] persist tab selection on cluster stats widget - idle time
PR 2160: User Story 3881: [UI] Rename Execution Status column from Reporting view
PR 2151: User Story 3781: add subscription ID and workspace ID to the URL for the job runs view
User Story 3892: [UI] - Add hover information on each bar on Worker VMs Allocation Timeline Chart
User Story 4439: Include cost data in job autoscaling timeline
support jobs whose duration spans multiple days
Persist cluster configuration for job runs, notebooks, pipelines
PR 2257: [4119] fix report error on missing cost_avg
PR 2191: [3803] Save cost distribution by cluster types, pipelines for overview api
PR 2309: Save stageInfo in a separate file
PR 2316: [4032]: Persist cluster events into DB
PR 2312: [US-4098] Add apis for getting cluster configuration for job runs, notebooks, pipelines
User Story 4423: open cluster in Databricks from "view cluster configuration"
Delta Live Tables
1. Configure monitoring on your pipelines.
2. Review stats regarding cost and telemetry.
3. Explore detailed information for each pipeline update.
PR 2270: [3880] Filter out running DLT clusters when listing all purpose clusters
PR 2269: [4118] Display maintenance jobs that have no analysis
PR 2274: Maintenance job tab
PR 2252: User Story 4107: [UI] Rename Cost per Table to Cost per Pipeline
PR 2307: [4193] provisioning jobs with pipeline_task(s) should also provision the pipelines
PR 2210: [UI] Add breakdown for pipeline duration
PR 2212: Add default error handling for delta live tables enable
User Story 3805: “Enable Delta Live Tables” button on settings page
PR 2203: User Story 4021: apply date range filter only on the left widget
PR 2208: Add breakdown for pipeline duration
PR 2201: [4020] [DLT] updates with no cost show as zero cost, should be n/a
PR 2199: [4019] [DLT] NPE when attempting to view a pipeline configuration
PR 2198: User Story 3717: [UI] update tooltips to include reference to DLT cost data
In Progress Markers
PR 2284: In Progress hint for pipelines and job runs
PR 2234: [US-4067] Add flag for job reports indicating if the job is running or not
PR 2235: Add support for filtering pipeline analysis by date range and in progress hint
Navigation
User Story 4166: add "open in Databricks" link for the DLT pipeline on the monitor configuration page
User Story 4170: add "open in Databricks" link for the Job on the monitor configuration page
User Story 4171: add "open in Databricks" link for All purpose clusters on the monitor configuration page
User Story 4062: [UI] mark jobs that are related to DLT
User Story 4289: make the date interval mandatory - default to 7 days
User Story 4291: [UI] Rename column from #runs to Analyzed Runs on Reports tabs
Bug 4247: [UI] Enable sorting by pipeline id, creator and monitor status just as it is for jobs and all purpose clusters
User Story 4330: add tooltip "Open in Reports" when I hover over the link in Jobs, All purpose Clusters, Delta Live Tables
Collected data
Save stageInfo in a separate file
User Story 4229: store in taskMetrics description of the error messages for failed tasks
[4259] drop 30-days limit for proportional cost calculation
[4277] Collector stores pipeline task metrics in the job task metrics blob. Fix pipeline id not being resolved
User Story 4308: update labels for Reports // Jobs page -- analyzed vs all jobs tooltips and info
Security
PR 2300: [4218] - removing azure keyvault and azure management functions as required dependencies in the grant consent dialog and in the app
PR 2155: Implement retry policy when we add service principal in databricks
PR 2326: [4255] - Disable object-id requirement when azure management is disabled
PR 2429: Add possibility to link with existing secret scope
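A retry policy such as the one added for registering the service principal in Databricks is commonly implemented with exponential backoff. The sketch below shows the general pattern. The helper name, attempt count and delays are illustrative assumptions, and the operation passed in stands for whatever call adds the service principal.

```python
import time


def retry(operation, max_attempts=5, initial_delay_s=1.0, backoff=2.0,
          sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached."""
    delay = initial_delay_s
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            # Give up and re-raise once the last attempt has failed.
            if attempt == max_attempts:
                raise
            sleep(delay)
            delay *= backoff
```

The `sleep` parameter is injectable so the policy can be tested without real waiting. A production version would normally catch only errors known to be transient.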
Authorization Rules
PR 2437: Authorized jobs cache with key formed by user & workspace. Invalidate cache on logout
PR 2289: [BUG-4110] Display only authorized services
PR 2280: Bug 4169: [UI] - Managed Identity errors are shown even if it's not enabled
PR 2277: [4129] - Same workspace, different environments, difference in user authorization
PR 2271: [4124] - Check grants doesn't throw error if user is not authorized.
PR 2243: [US-4035] Show in reports only costs for authorized jobs, clusters, pipelines
See more here: Authorization Rules
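The authorized jobs cache described above is keyed by user and workspace and is invalidated on logout. A minimal sketch of that idea follows. The class and method names are hypothetical, and the loader callback stands in for the real authorization lookup.

```python
class AuthorizedJobsCache:
    """Caches authorized job IDs per (user, workspace) pair."""

    def __init__(self):
        self._entries = {}

    def get(self, user, workspace, load_authorized_jobs):
        """Return cached jobs, loading them on the first request only."""
        key = (user, workspace)
        if key not in self._entries:
            self._entries[key] = load_authorized_jobs(user, workspace)
        return self._entries[key]

    def invalidate_user(self, user):
        """Drop every entry of a user, e.g. when the user logs out."""
        for key in [k for k in self._entries if k[0] == user]:
            del self._entries[key]
```

Keying by both user and workspace keeps one user's authorizations in one workspace from leaking into another workspace or to another user.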
Operational Management – Telemetry & Consumption
PR 2308: Design, layout and search functionality for metrics processor history
PR 2302: User Story 4207: use correlationID instead of Execution Id for the ConsumptionLoad in the Run History
PR 2301: Enhance Metrics processor audit
PR 2272: User Story 4139: [UX] improve consumption loading history dialog
PR 2265: Run metrics processor scheduled tasks sequentially
PR 2262: Expose background processors provider (MANAGED IDENTITY AND/OR SERVICE_PRINCIPAL)
add collector commit id
PR 2253: Add initial delay configuration for metrics processor doctor
PR 2256: [4127] fix refresh mode data cleaning
PR 2229: Bug 3914: Consumption history start time displayed as hour 12 for backend 00
PR 2127: [3660] - [API] integrate SMTP alerting in case of ConsumptionLoading background process failing
PR 2200: [3917] add commitid (or any other id) in the name of the metrics files to provide versioning information
PR 2195: Limit metrics historical APIs response size
PR 2156: Sort consumption history by start time desc, sort steps of same execution id by start time asc.
PR 2191: [US-3803] Save cost distribution by cluster types, pipelines for overview api
PR 2185: [3953] fix backfilling costs into DB in incremental mode
PR 2181: Metrics Processor and Metrics Processor Doctor historical APIs
PR 2174: [US-3904] Make cache timeout configurable
PR 2166: [3848] - Expose bplm log folder of bplm container to host machine
PR 2155: Implement retry policy when we add service principal in databricks
AWS support
with S3 as cloud storage solution
use Databricks identity provider
accounts API - username and password
🎛️ Configuration
[4268]: Support to update per workspace, spark-env extraListeners and report period
[4256] - timeout user login session after 1 hour
[4286] - [SecretScopes] Validation issue for Secret scopes with AzureKeyVault backend type
[4285] - Error thrown on DLT page when we list pipelines and secret scope is not set.
[3795] - Expose active directory OAuth configuration to application config
[4239] Add scheduler name validation
User Story 4310: [UI] - Hide buttons related to MI if Managed Identity is disabled
User Story 4249: [UI] support default “deployment configuration”
[4333] - add config: AUTO_UPDATE_PRIVATE_METADATA_FILE
User Story 4264: add tab for Access Rights configuration
Deployment
PR 2264: Update azure-pipelines.yml for Azure Pipelines
PR 2291: [4194] - Detect if the workspace is premium or not and add a flag, such that Enable DLTs button is deactivated if non premium
PR 2278: [4168] - Create script to enable bplm app as a Linux service when we deploy without Docker.
PR 2174: [US-3904] Make cache timeout configurable
PR 2169: [3848] - Expose bplm.log file of bplm-container to host machine
secure deployment with no public access in Azure
PR 2209: Hardcoded commitId of collector in init script and spark-conf
PR 2200: [3917] add commitid (or any other id) in the name of the collector jar file to provide versioning information
Miscellaneous
PR 2305: User Story 4094: add in the tooltip a separator line for total cost in the job overview (job, clusters, dlt overview)
PR 2296: [4202] Maintenance reporting overview displays aggregated telemetry as n/a even when telemetry exists
PR 2303: Default selection for secret scopes dialog when keyvault is not enabled
PR 2288: Fix analyzer when it is executed from notebook
PR 2275: Fix filter by skew
PR 2279: [4137] Change type of error response for AzureException in ControllerExceptionHandler
PR 2266: User Story 4143: [UX] improve Error messages
PR 2268: [4130] - Logout throws error
PR 2259: [4135] - Support to enable/disable collector logs
PR 2263: Remove timestamp from task metrics reporting when append mode is supported
PR 2249: Update spark conf extra classpath regex to match new collector jar name with commitId
PR 2163: User Story 3892: [UI] - Add hover information on each bar on Worker VMs Allocation Timeline Chart
PR 2162: [3886] make PoolId to be a case insensitive string
User Story 4383: rename widget to "Cost per Pipeline Maintenance Job" on Reports / Delta Live Tables / Maintenance Jobs
User Story 4370: [UX] add vertical and horizontal scrollbars per page if the entire page becomes overcrowded
🐞 Fixed Bugs
PR 2306: Bug 3941: Loader scheduler can be saved without having a name
PR 2294: Bug 3607: [UI] Costs per jobs - costs cram into colored segment bar
PR 2295: Bug 4217: [UI] Pipeline cost list is empty when selecting Time tab on all purpose clusters - Clusters tab
PR 2292: Bug 4199: [UI] Overview - Cost per workspaces chart wrongly displays the length of the horizontal bar charts due to missing pools cost in the UI
PR 2255: [BUG-4108] Fix column length in databricks services configuration table
PR 2239: Bug 3915: Icon for job executed for job runs api is displayed when it shouldn't be, for jobs and notebooks on all purpose clusters
PR 2171: Bug 3898: jobs run by submitted API shown as deleted on the all-purpose-clust...
PR 2304: Bug 3704: [UI] Column titles not aligned
Bug-3900: [UI] Different date interval initially displayed
BUG-3666: Add workspace name for only storage costs overview
Bug 3792: [UI] Costs per clusters - make the content reach the box margins when no results are available
PR 2192: Bug 3730: [UI] Overview - All widgets change size if tabs are switched on Daily alerts
PR 2187: Bug 3998: [UI] Cost breakdown bar colors are not right
PR 2183: Bug 3940: autoscaling timeline tooltip not readable when using light theme
PR 2162: [3886] make PoolId to be a case insensitive string
Bug 4116: Entry displayed twice in Pipelines/Cost per table
Bug 4245: [UI] Multiple unsaved consumption loading schedulers contain in the name the # of the last scheduler tab
Bug 3819: Changing the FY start date from December 1 2021 to Jan 1 2022 doesn't persist when I click "save".
[4267] [API] DLT Updates view - Total cost is N/A but all individual costs are 0
[4271][Logging] Analyser clutters logs when processing task metrics of a deleted job
4270 - Operational management: saving analysis configuration twice throws error
4272: [UI] Content is not refreshed when switching to a workspace where I am not authorized
4266: [UI] Percentage on daily alerts widget table have 3 decimal points
4275: [UI] Jobs and All purpose clusters pages content is not refreshed when switching to a workspace where I am not authorized
4276: [UI] - Navigating back from all purpose cluster job reporting view does not select correct tabs
Bug 4195: [UI] Start times on Telemetry Analysis Details and History must mention am/pm
BUG 4293 - Fix job duration breakdown report
[4294] - Issue with enable monitor when agent advanced settings are saved and a redeploy changes the collector-jar name.
Bug 4297: [UI] Maintain capital/small letters on elements of Settings page
Bug 4251: job run id not aligned on small screen
Bug 4269: [UI] Schedule name is overwritten when moving between tabs
Bug 4231: [UI] Reports: new search by date interval does not refresh items page selection
[4221] Jobs on all purpose clusters are not alphabetically ordered
Bug 4311: [UI] Sorted Jobs are not found when clicking on cost job list
Bug 4246: [UI] convert start time to client(browser) timezone not only on loader run history, but also on last run
Bug 4236: [UI] Telemetry analysis details/Cluster idleness - date incorrectly displayed
Bug 4320: [UI] Make x hover on date interval not to overlap the actual date and align ids with workspace and subscription names
Bug 4336: [UI] All purpose clusters/Jobs & Notebooks - sorting by cluster does not work
Bug 4337: [UI] All purpose clusters/Notebook runs distribution & Dlts - items page cannot be changed after the number of items per page has been modified
Bug 4338: [UI] Jobs runs - sorting by task run id does not work. disable sorting by memory and cpu
Bug 4325: Consumption last run does not show status and error message for failed load
Bug 3706: [UI] Column reordering causes ids not to be displayed
Bug 4350: [UI] - Schedule since & until dates should be displayed as they come from the api
Bug 4353: [UI] - Reporting view calendar displaying wrong date on UTC- timezones
[BUG 4440] Fix duration execution for running jobs
Bug 4442: [UI] Search doesn't work if request on runs does not finish
Bug 4421: almost zero cost bar chart issues
PR 2449: Fix missing analysis detection for streaming jobs
Bug 4373: Calculating proportional VM Pool costs for streaming / long jobs
Last commit id: b89580346