2.3 Release Notes

 

 

🎉 Highlights of 2.3 release

  • reporting // improved navigation on Reports pages

  • reporting // filter by cluster when exploring items running on all-purpose-clusters

  • reporting // optimized queries for large date intervals

  • added support for Azure Consumption - Modern Usage Details API

  • health alerts // performance alerts

  • security // billing administrator role added 

  • analysis // improved cluster idleness detection with CPU usage

  • improved cost collection for AWS deployments to account for network-related costs

  • expose the pool type used on autoscaling timeline views

  • logic to flush buffered metrics to storage when collecting from long-running jobs with hundreds of tasks

  • new login screen with AD authentication and Databricks account sign-in

 

Overview

  • subscription budget – predict all remaining months until the end of the fiscal year and train on all available data from the past 2 years

  • 5026: Persist the subscription + workspace selection when it is changed from the Overview

    • (1) If I select a workspace from the dropdown in the right-hand widgets of the Overview, that workspace becomes the default workspace for Reports.

      (2) If I select a workspace in the lower-left widget (cost breakdown) of the Overview, that workspace is preselected in the right-hand widgets and also becomes the default workspace for Reports.

      (3) If the selected workspace is only visible on the Overview, then Reports preselects the first subscription and workspace available to the logged-in user.

  • 5124: simplified cost values by hiding decimal places in all tooltips on the Overview page

Workspace Features (aka Services)

  • 5226: Change the "Service" label to "Feature"

  • 4602: predict all remaining months until the end of the fiscal year, not only the following 3 months

  • 4604: on the subscription budget widget, always train on all available data - take 24 months by default

  • 5251: Use weighted linear regression for cost prediction (see the sketch after this list)

  • 5370 - increase the size of cost values and labels by 2px on the Overview

Predicted cost for the remainder of the fiscal year
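
A minimal sketch of a weighted least-squares line fit of the kind 5251 describes, in Java. The exponential decay weighting, the class and method names, and the month-index features are illustrative assumptions, not the product's actual implementation.

```java
import java.util.List;

/** Sketch: weighted least-squares line fit for monthly cost prediction.
 *  Recent months get exponentially larger weights (the decay factor is an assumption). */
public final class WeightedCostPredictor {

    /** x = month index (0..n-1), y = monthly cost; returns {slope, intercept}. */
    static double[] fit(List<Double> monthlyCosts, double decay) {
        int n = monthlyCosts.size();
        double sw = 0, swx = 0, swy = 0, swxx = 0, swxy = 0;
        for (int i = 0; i < n; i++) {
            double w = Math.pow(decay, n - 1 - i);   // newest month -> weight 1
            double x = i, y = monthlyCosts.get(i);
            sw += w; swx += w * x; swy += w * y;
            swxx += w * x * x; swxy += w * x * y;
        }
        double slope = (sw * swxy - swx * swy) / (sw * swxx - swx * swx);
        double intercept = (swy - slope * swx) / sw;
        return new double[] { slope, intercept };
    }

    /** Predict the costs for the remaining months of the fiscal year. */
    static double[] predictRemaining(List<Double> history, int remainingMonths, double decay) {
        double[] line = fit(history, decay);
        double[] predictions = new double[remainingMonths];
        for (int m = 0; m < remainingMonths; m++) {
            predictions[m] = line[1] + line[0] * (history.size() + m);
        }
        return predictions;
    }
}
```

Weighting recent months more heavily lets the projection track current spending trends instead of treating two-year-old costs as equally informative.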

 

Reports // Workflows

  • 5032: show jobs in progress on the Reports overview even if the job has not yet been analyzed

  • 5084: when returning the cluster timeline, also return information about the type of pool used (Azure, DBX, or no pool)

  • 5108: hide the max shuffle size column by default on the Reports // job runs details view to save screen real estate

  • 4917: on the main view of the Reports page, CMD+click on a workflow name opens that workflow's details in a new tab

  • performance optimizations for Reports when handling >3k workflows

  • 5252 - Use the Spark duration when the Databricks duration is 0 in reports

  • 5393 - Filter jobs by name on the server side and increase default cache timeout

 

🖥️ Reporting // All purpose clusters

  • 5143: Show the full notebook name on hover; previously the user had to open each notebook's details to see it

  • 5228: add a tooltip explaining that clicking the workflow name opens the detail view

  • 5092: on the cluster timeline, display information about the type of pool used (Azure, DBX, or no pool)

  • 4858 - Compute cost per timeline segment for non-pool job clusters

  • 5069: add a cluster filter for jobs and notebooks on the all-purpose clusters view

  • 5093: mark clusters as deleted or unauthorized when showing notebook runs in Reports / clusters / notebooks

 

❤️ Health Alerts

  • Performance alerts


Security

  • Billing Administrator Role added

    • 5145: hide the "Consumption Management" tab when the logged-in user does not have the "billing_admin" role

  • 5368 - Upgrade Spring versions and update the Dockerfile to fix security vulnerabilities

  • 5367 - New login screen with AD authentication and Databricks account sign-in

  • 5395 - Make DatabricksAccount users admins based on a flag

 

Performance Optimizations

  • 5188: Improve job reports performance

    • for a 12-month job, the report response is returned in 5 seconds, compared to 11+ seconds before the optimization

    • optimized queries for large date intervals

  • 5219 - [Collector] Add logic to flush the metrics in storage when a configurable value of accumulated metrics is exceeded

    • flush buffered metrics to storage when collecting from long-running jobs with hundreds of tasks (see the sketch after this list)

    • we added a maximum limit on the number of tasks and stages that can be monitored at a given time; anything that exceeds this limit is reported directly, without fully resolved metadata, and the analyzer includes functionality that tries to recover these metrics

    • we removed any objects coming directly from Spark from our internal stores

  • 5423 - [Collector] Add functionality to set memory usage for buffered task metrics in the collector
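
A minimal sketch of the flush behaviour described in 5219/5423, assuming a hypothetical MetricsStore sink and TaskMetric record; the threshold handling is illustrative, not the collector's actual code.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: buffer task metrics and flush them to storage once a configurable
 *  number of accumulated metrics is exceeded (illustrative API, not the real collector). */
final class BufferedMetricsCollector {

    interface MetricsStore {                  // hypothetical storage sink
        void write(List<TaskMetric> batch);
    }

    record TaskMetric(long taskId, long stageId, double cpuTimeMs) {}

    private final List<TaskMetric> buffer = new ArrayList<>();
    private final MetricsStore store;
    private final int maxBufferedMetrics;     // configurable flush threshold

    BufferedMetricsCollector(MetricsStore store, int maxBufferedMetrics) {
        this.store = store;
        this.maxBufferedMetrics = maxBufferedMetrics;
    }

    synchronized void onTaskEnd(TaskMetric metric) {
        buffer.add(metric);
        if (buffer.size() >= maxBufferedMetrics) {
            flush();                          // long-running jobs with hundreds of tasks
        }                                     // no longer accumulate unbounded state
    }

    synchronized void flush() {
        if (buffer.isEmpty()) return;
        store.write(new ArrayList<>(buffer)); // copy so the sink can keep the batch
        buffer.clear();
    }
}
```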

 

📊 Analysis Optimizations

  • 5291 - [Collector] When reporting task metrics without metadata, ignore the runId obtained from the cluster tag as it may be unreliable

  • 5318 - Queue monitoring now also triggers analysis for CANCELLED and FAILED jobs

  • 5224 - Add "only sample logs" mode to collector

  • 5260 - Include the metrics of orphaned tasks when analyzing the job runs

  • 5060: compute cluster idleness based on CPU usage

    • improved cluster idleness detection by taking CPU usage into account (see the sketch below)
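
A minimal sketch of a CPU-aware idleness check along the lines of 5060; the 10% threshold, the sample shape, and the hasActiveRunsOrNotebooks signal are assumptions for illustration.

```java
import java.util.List;

/** Sketch: a cluster is considered idle only when it has no running work and its
 *  average CPU usage over the lookback window stays below a small threshold. */
final class IdlenessDetector {

    record CpuSample(long epochMillis, double cpuUtilization) {}  // utilization in 0.0..1.0

    private static final double CPU_IDLE_THRESHOLD = 0.10;        // assumed threshold

    static boolean isIdle(boolean hasActiveRunsOrNotebooks, List<CpuSample> window) {
        if (hasActiveRunsOrNotebooks || window.isEmpty()) {
            return false;                                         // work is scheduled or no data
        }
        double avgCpu = window.stream()
                .mapToDouble(CpuSample::cpuUtilization)
                .average()
                .orElse(0.0);
        return avgCpu < CPU_IDLE_THRESHOLD;                       // CPU confirms the idleness
    }
}
```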

 

UX Optimizations

  • 5003: use new icon for "view cluster configuration"

  • 4982: add the job name in the title of the cluster timeline dialog

  • 4991: add the name in the title of the cluster configuration dialog for a pipeline

  • 5103: open workspace in Databricks from settings page for AWS deployments

  • 4915: from Health Alerts, CMD+click on "Open in Reports" opens that job's details on the Reports page in a new tab

  • 4914: from Workflows (Clusters, DLTs), CMD+click on a job name opens that job in a new tab

  • 5171: improve navigation hints by adding a background color to the selected button

  • 5109: move autoscaling timeline next to analyzed icon on workflow // job runs

 

🎛️ Configuration

  • 4959: rename cluster type to "Execution Engine" and add multiple badges

  • 5245 - Disable UploadAgent if the user doesn't have rights to update global init scripts.

Operational Management – Telemetry & Consumption

  • added support for Azure Consumption - Modern Usage Details API
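
For reference, the Modern Usage Details data is served by the scope-based Microsoft.Consumption/usageDetails REST endpoint. The sketch below fetches one page, assuming an already-acquired Azure AD bearer token, an illustrative billing-account scope, and an api-version that may differ from the one the product targets.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Sketch: fetch one page of usage details for a billing scope.
 *  The scope format, api-version and token acquisition are assumptions for illustration. */
final class UsageDetailsClient {

    static String fetchPage(String scope, String bearerToken) throws Exception {
        // e.g. scope = "providers/Microsoft.Billing/billingAccounts/{billingAccountId}"
        String url = "https://management.azure.com/" + scope
                + "/providers/Microsoft.Consumption/usageDetails?api-version=2021-10-01";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(url))
                .header("Authorization", "Bearer " + bearerToken)
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();  // JSON page; follow "nextLink" in the payload for more pages
    }
}
```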

Telemetry Data

  • 4884 - API for cluster events analysis status: returns a percentage representing the fraction of clusters whose events have been analyzed (see the sketch below)
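
A sketch of the shape such an endpoint could take, written as a hypothetical Spring controller; the path, the repository interface, and the zero-cluster fallback are assumptions, not the actual API.

```java
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

/** Sketch: returns the percentage of clusters whose events have been analyzed.
 *  Path and repository are illustrative assumptions. */
@RestController
class ClusterEventsAnalysisStatusController {

    interface ClusterRepository {             // hypothetical data access
        long countClusters();
        long countClustersWithAnalyzedEvents();
    }

    private final ClusterRepository clusters;

    ClusterEventsAnalysisStatusController(ClusterRepository clusters) {
        this.clusters = clusters;
    }

    @GetMapping("/api/clusters/events-analysis-status")
    double analyzedPercentage() {
        long total = clusters.countClusters();
        return total == 0 ? 100.0             // assumed convention when no clusters exist
                : 100.0 * clusters.countClustersWithAnalyzedEvents() / total;
    }
}
```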

Consumption Data

  • 5315 - Report exceptions in DB consumption steps as errors in audit logs

 

📙 AWS Support

  • 5126: add the Databricks Workspace Name to the Storage Path, in addition to the Id and Host

  • 5168: Compute the cluster timeline segment cost in AWS workspaces

  • 4984 - collect workspace S3/storage and NAT/network costs

    • improved cost collection for AWS deployments to account for network-related costs

  • 5407 - support cross-account access to storage in the agent and monitor

    • support the agent accessing DynamoDB/SQS residing in a different AWS account

    • access log-delivered billable usage from an S3 bucket in a different account

  • 5445 - The app assumes an AWS role for DynamoDB+SQS / Cost Explorer / TagWorkspaceResource access (see the sketch below)
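
A minimal sketch of the cross-account role assumption behind 5407/5445 using the AWS SDK for Java v2; the role ARN, session name, and client wiring are placeholders, not the app's actual configuration, and SQS/Cost Explorer clients would follow the same pattern.

```java
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.sts.StsClient;
import software.amazon.awssdk.services.sts.auth.StsAssumeRoleCredentialsProvider;
import software.amazon.awssdk.services.sts.model.AssumeRoleRequest;

/** Sketch: build a DynamoDB client that assumes a role in another AWS account. */
final class CrossAccountClients {

    static DynamoDbClient dynamoDbForAccount(String roleArn) {
        StsClient sts = StsClient.create();
        AssumeRoleRequest assumeRole = AssumeRoleRequest.builder()
                .roleArn(roleArn)                        // role in the other account
                .roleSessionName("cost-monitor")         // placeholder session name
                .build();
        StsAssumeRoleCredentialsProvider provider = StsAssumeRoleCredentialsProvider.builder()
                .stsClient(sts)
                .refreshRequest(assumeRole)              // credentials refresh before expiry
                .build();
        return DynamoDbClient.builder()
                .credentialsProvider(provider)
                .build();
    }
}
```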

 

Miscellaneous

  • 5227: highlight row when hovering on job run details (and other similar pages)

 

🐞 Fixed Bugs

  • 4852: Running multitask jobs aren't sorted correctly on the runs history page

  • 5080: Duplicate entry on All purpose clusters/Jobs

  • 5095: Fix missing analysis detection for SKIPPED jobs

  • 5186: No requests sent when the filters are changed on the One-time runs page

  • 4539: Enable All does not take filtered list into account

  • 5269 - Fix cluster idleness detection for clusters which have not been active for a long time

  • 5279 - Fix notebook analysis detection for clusters which have not been active for a long time

  • 5271 - Failed analyses sometimes do not have a reason for the failure set

  • 5327 - Fix reports to take the timezone into account

  • 5315 - Errors in the consumption loading aggregation step were not reported in audit logs

  • 5317 - Sort the tasks of a multitask job so that the pending one appears first in the list

  • 5345 - Failure when persisting the pipeline analysis

 

Documentation

  • 5115: add a tooltip with an (i) icon next to the title on the Overview page

  • 5116 - add a tooltip (i) next to the title on Reports // All purpose clusters

  • 5156: add tooltip for "Sort by" box

  • 5159: Add tooltip text on hover for the columns that have multiple aggregations enabled in the Reports pages