1.3.0 Release Notes
🎉 What’s new
- 💡Smart Tips & Recommendations
- 📊Reporting
- 🧭Overview page
- 🔑Security
- 👥External user support on AD authentication
- ⏳Provide all-purpose-cluster idleness time value
- Enable monitoring in bulk instead of one-by-one
- AKS Deployment
- 📚Operational Management for Consumption Data Loading
- 🏎Optimizations
- ⚙️Miscellaneous
- 🐞Fixed bugs
💡Smart Tips & Recommendations
Improved Skew Algorithm
Use task duration as key information to detect skew, instead of relying only on the number of records processed (see the sketch after the PR list below).
PR 2159: Parametrised skew algorithm
PR 2153: User Story 3875: [UI] - Bad skew details dialog columns changes
PR 2149: Improve skew algorithm to account also for time when detecting skew
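A minimal sketch of the new detection idea, assuming hypothetical parameter and function names (the actual thresholds are the parametrised values from PR 2159):

```python
from statistics import median

def is_skewed(task_durations_ms, task_record_counts,
              duration_ratio=2.0, records_ratio=2.0):
    """Flag a stage as skewed when the slowest task runs much longer than
    the typical task, not only when it processes many more records.
    Threshold names and the exact combination of the two signals are
    illustrative assumptions, not the product's configuration keys."""
    if not task_durations_ms:
        return False
    typical_duration = median(task_durations_ms)
    typical_records = median(task_record_counts)
    duration_skew = (typical_duration > 0 and
                     max(task_durations_ms) / typical_duration >= duration_ratio)
    record_skew = (typical_records > 0 and
                   max(task_record_counts) / typical_records >= records_ratio)
    # Duration is now the key signal; record counts remain a secondary check.
    return duration_skew or record_skew
```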
Cost Alerts Thresholds
PR 2146: User Story 3627: [UI] support separate thresholds for jobs, workspaces, all-purpose-clusters in the alerts widget on overview page
📊Reporting
3689: [UI] persist tab selection on cluster stats widget - idle time
3820: persist the selected date range when toggling between the "workspaces" and "subscriptions" views
Autoscaling Timeline
Display how workers were allocated from a cluster pool during the run of a job configured to use pooled compute resources.
3892: [UI] - Add hover information on each bar on Worker VMs Allocation Timeline Chart
Hint regarding costs when using pooled resources
[3697] update info header when using Databricks cluster pools
[3592] provide a visual hint to the user that the cost is estimated because the cluster is running with pooled resources
Display workflows with nested tasks
[3622] align the blue (i) "there are no jobs ..." message in the middle of...
[3186] compute the DBU cost proportionately for nested workflows
Estimate cost data when using cluster pools
[3511] proportional VM costs for all-purpose clusters when using VM from pools
Estimate job run machine hours using telemetry data
The Databricks Clusters API exposes only the 200 most recently terminated all-purpose clusters. For those clusters for which no data is available in Databricks anymore, we analyze the telemetry data and infer which clusters were used.
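One way the fallback could look, as a sketch: list the clusters still visible through the Clusters API and treat everything else found in telemetry as API-expired. The telemetry feed and the function names here are assumptions for illustration only.

```python
import requests

def known_cluster_ids(host: str, token: str) -> set:
    """IDs still returned by the Clusters API; Databricks keeps only the
    ~200 most recently terminated all-purpose clusters in this list."""
    resp = requests.get(
        f"https://{host}/api/2.0/clusters/list",
        headers={"Authorization": f"Bearer {token}"},
    )
    resp.raise_for_status()
    return {c["cluster_id"] for c in resp.json().get("clusters", [])}

def clusters_needing_inference(telemetry_cluster_ids, host, token):
    """Clusters seen in our own telemetry but no longer visible in the
    Databricks API: machine hours for these are inferred from telemetry
    (e.g. timestamps of the first/last metric samples) instead."""
    return set(telemetry_cluster_ids) - known_cluster_ids(host, token)
```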
Show, hide and reorder table columns
🧭Overview page
[3681] allow only positive integer input for the yearly budget and cost alert thresholds
[BUG-3756] On the subscription budget widget, if the predicted cost is smaller than the current month's cost, display 0 instead
[3701] support enlarging the subscription budget widget to fill the screen
[BUG-3677] Add all-purpose cluster costs to the overview tables and API
The costs for all-purpose clusters were added to the values displayed on every widget of the overview page. Each cost per subscription/workspace is now the sum of the costs for job clusters, all-purpose clusters and workspace storage (a sketch follows at the end of this section).
Cost data is aggregated per workspace only for the items monitored and supported by Lakehouse Monitor. If a Databricks workspace uses features not yet supported by Lakehouse Monitor, those costs will not appear in this total.
[3695] updated the info and tooltips for the overview cost breakdown stats
Expose storage cost category
The Storage category refers to costs incurred for cloud storage, DBFS and the Hive metastore.
Some costs are not associated with any particular job or cluster:
Jobs that fail because there are insufficient resources to spin up a cluster still incur costs; these records have no valid IDs to associate them with a particular job or cluster.
Each Databricks workspace includes storage costs (Microsoft.Storage) behind DBFS and the Hive metastore that are not caused by a particular cluster or job, but are nevertheless part of the workspace cost.
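To make the aggregation above concrete, a small illustrative sketch (record shape and category names are assumptions, not the actual schema):

```python
def workspace_cost(records):
    """Sum job-cluster, all-purpose-cluster and storage costs for one
    workspace, mirroring the breakdown on the overview page. Records
    without a valid job/cluster id (failed spin-ups, Microsoft.Storage
    behind DBFS / the Hive metastore) still count toward the total."""
    totals = {}
    for r in records:
        bucket = r.get("category", "unattributed")
        totals[bucket] = totals.get(bucket, 0.0) + r["cost"]
    # Features not yet supported by Lakehouse Monitor never appear in
    # `records`, so their cost is absent from this total.
    return sum(totals.values()), totals
```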
🔑Security
[3409] Support deletion of secret scopes
[Bug 3558] When granting a service principal access to a workspace, throw an error if the user who triggers the operation doesn't have sufficient permissions to make the SP a Contributor
[3604] When running on App Service, creating a secret scope threw a consent error without a consent URL
[3696] improve tooltips for identity management settings
👥External user support on AD authentication
Support view-only login for any user that has an account in a tenant with Active Directory authentication.
Create an app registration that allows AD authentication for external users
Support listing of public subscriptions for external users
Support listing workspaces for external users
Restrict external users to read-only rights in the bplm-application
Create a read-only group in the Databricks workspace used as the public workspace. External users should be added to the Databricks workspace as users and made members of the read-only group. The read-only group must then be added manually to each job/cluster to give external users access to view the respective clusters/jobs. A sketch of this setup is shown below.
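One possible shape of this setup through the Databricks REST APIs (a sketch: the group name, token handling and chosen permission level are assumptions):

```python
import requests

def create_read_only_group(host, token, name="bplm-read-only"):
    """Create the read-only group via the SCIM Groups API."""
    resp = requests.post(
        f"https://{host}/api/2.0/preview/scim/v2/Groups",
        headers={"Authorization": f"Bearer {token}"},
        json={"schemas": ["urn:ietf:params:scim:schemas:core:2.0:Group"],
              "displayName": name},
    )
    resp.raise_for_status()
    return resp.json()["id"]

def grant_group_on_cluster(host, token, cluster_id, group_name):
    """The manual per-cluster step mentioned above: grant the group the
    lowest cluster permission level so members can see the cluster."""
    resp = requests.patch(
        f"https://{host}/api/2.0/permissions/clusters/{cluster_id}",
        headers={"Authorization": f"Bearer {token}"},
        json={"access_control_list": [
            {"group_name": group_name, "permission_level": "CAN_ATTACH_TO"}]},
    )
    resp.raise_for_status()
```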
⏳Provide all-purpose-cluster idleness time value
Measure the idle time per all-purpose cluster over a given time period, as sketched below.
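A toy illustration of one way such a value can be derived from utilisation telemetry (the threshold, sampling period and input shape are assumptions, not the product's actual definition of idleness):

```python
def idle_seconds(cpu_samples, idle_threshold=0.05, sample_period_s=60):
    """Count the time where average CPU utilisation (0..1) stays below
    the threshold, given one sample per fixed-length interval."""
    return sum(sample_period_s for cpu in cpu_samples if cpu < idle_threshold)

# idle_seconds([0.01, 0.20, 0.00]) -> 120 (two idle minutes out of three)
```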
Enable monitoring in bulk instead of one-by-one
AKS Deployment
Install and run Lakehouse Monitor in an AKS Kubernetes cluster:
Provide a Helm package describing the deployment
Run the Helm CLI to install Lakehouse Monitor in AKS
📚Operational Management for Consumption Data Loading
Adds the ability to track the consumption data loading process and inform the user of its progress for a given period of time.
User Story 3659: [UI] provide historical view of ConsumptionData Loading previous runs
Support multiple schedulers to run sequentially
[3657] expose the status of the last run / current run in progress
[3684] audit finish/finished_with_warnings/failed statuses for consumption loading
[2573] save consumption data loader configurations in the SQL database.
[3260] support hourly load of the consumption data
PR 2164: create multiple log files (bplm.log, bplm-consumption.log, bplm-processor.log) and append all logs to the console (see the sketch below)
[3848] Expose the bplm.log file of the bplm-container to the host machine
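The layout from PR 2164, sketched with Python's standard logging for illustration (the product's actual logging framework and configuration may differ):

```python
import logging
import sys

console = logging.StreamHandler(sys.stdout)  # every record is also appended here

def component_logger(name: str, filename: str) -> logging.Logger:
    """One log file per component, plus the shared console handler."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate records via the root logger
    logger.addHandler(logging.FileHandler(filename))
    logger.addHandler(console)
    return logger

app = component_logger("bplm", "bplm.log")
consumption = component_logger("bplm-consumption", "bplm-consumption.log")
processor = component_logger("bplm-processor", "bplm-processor.log")
```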
Multiple schedules to load the consumption data
Support setting multiple schedules to load the consumption data for a given Azure Subscription; the schedules run sequentially, as sketched below.
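A sketch of how several schedules for one subscription can be executed one after another, never in parallel (the entry shape, polling interval and loader callback are assumptions):

```python
import datetime as dt
import time

def run_due_schedules(schedules, load_consumption):
    """Run every schedule whose next_run time has passed, sequentially.
    Each entry carries the subscription, the period to load and a rerun
    interval (e.g. hourly, cf. 3260)."""
    while True:
        now = dt.datetime.now(dt.timezone.utc)
        for s in sorted(schedules, key=lambda s: s["next_run"]):
            if s["next_run"] <= now:
                load_consumption(s["subscription_id"], s["period"])
                s["next_run"] = now + s["interval"]
        time.sleep(30)  # polling granularity
```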
Last run status
The last run status is exposed in the right-side panel. Here the user can see whether runs are still in progress and whether the consumption loading process finished successfully.
Consumption Loading run history
The Consumption Loading run history enables the Lakehouse Monitor user to review the state of all the scheduled runs.
PR 2156: Sort consumption history by start time desc; sort steps of the same execution id by start time asc.
PR 2147: User Story 3863: [UI] - Get loader runs history when user clicks button to op...
PR 2142: User Story 3816: [UI] save consumption loading run configuration in the DB
PR 2141: [3815] - [API] save consumption loading run configuration in the DB - Add batchSize to consumption audit table
🏎Optimizations
[3748] Optimize DriverOs and ExecutorOs metrics loading
[3624] Cache databricks response when building job reports
[3789] optimize taskmetrics loading for realtime metrics processing
PR 2155: Implement retry policy when we add service principal in databricks
[3907]: Fallback mechanism for race condition between notification processing/task metrics file writes.
[3904] Make cache timeout configurable
⚙️Miscellaneous
[3881] [UI] Rename the Execution Status column in the Reporting view
[3684] Expand logs to include information about intermediate step statuses for consumption loading
[3363] Save cost alerts threshold in database
[2830] support configurable budget for the platform cost widget
[3533] refactor consumption loading to include VM pool proportional cost calculation
[3439] use only "good" and "bad" tags for skew; remove the "excellent" tag
Azure VM deployment script end-to-end automation
[3781]: Allow the user to share a URL that opens the view of a particular job run directly
add the subscription ID and workspace host to the URL for the job runs view
Example: https://bplm-demo.westus.cloudapp.azure.com/job-runs/822699542392579?subscriptionId=a63c1e51-40ae-4a34-b230-bf80e132c05c&workspaceHost=adb-7883487973999049.9.azuredatabricks.net
[3861] Add more context around the latency logs
[3814] Add latency log when calling Databricks API to list jobs or job-runs
Support granting Billing Reader rights to the service principal
🐞Fixed bugs
[3583] Fix cost filters for job and job runs
[3607] Costs per job: costs crammed into the colored segment bar
[3472] UI does not redirect to login on 401 when token is expired
[3708] fix self-join in MetricsProcessor
[3886] uniform lower-case pool-id comparison: account for capitalization when comparing. Make PoolId a string subtype with case-insensitive equality and hash code (see the sketch after this list)
[3852] Upload agent exception when the user is not authorized: fix the misleading IO exception thrown instead of UserNotAuthorized
[3876] Querying job runs with the date range removed does not retrieve today's job runs
[3898]: jobs run via the submit API shown as deleted on the all-purpose-clusters page
3911: [UI] Workspace stats - Last hover box is cut
3910: [UI] Empty box on the hover of Databricks info on Workspace stats
[3593] fix backfilling costs into DB in incremental mode
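The case-insensitive id type from 3886, sketched in Python (the real type lives in the product's own codebase and language):

```python
class PoolId(str):
    """String subtype whose equality and hash ignore capitalization, so
    two pool ids differing only in case compare equal and land in the
    same hash bucket."""
    def __eq__(self, other):
        if isinstance(other, str):
            return self.casefold() == other.casefold()
        return NotImplemented
    def __ne__(self, other):
        eq = self.__eq__(other)
        return NotImplemented if eq is NotImplemented else not eq
    def __hash__(self):
        return hash(self.casefold())

assert PoolId("Pool-ABC") == PoolId("pool-abc")
assert len({PoolId("Pool-ABC"), PoolId("pool-abc")}) == 1
```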