Provisioning with Unity Catalog Enabled
Introduction
Compute with shared access mode is only available when Unity Catalog is enabled for the workspace. Global init scripts are not executed on shared access mode clusters, so LHO can only enable monitoring for them through cluster-scoped init scripts stored in Unity Catalog volumes, combined with Spark configurations and tags set at the cluster level.
Loading init scripts from UC volumes is also only supported on Databricks Runtime 13.3 and above.
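As a quick pre-check, the runtime rule above can be written as a small shell function. This is a sketch and not part of LHO. The function name is hypothetical. It assumes the runtime is given as a version key in the form used by the spark_version field of a cluster spec (for example 13.3.x-scala2.12).

```shell
# Hypothetical helper (not part of LHO): succeeds if a Databricks Runtime
# version key such as "13.3.x-scala2.12" is 13.3 or newer, i.e. the runtime
# can load init scripts from UC volumes. Returns 2 if the key does not parse.
supports_volume_init_scripts() {
  local version="$1" major rest minor
  case "$version" in
    *.*) ;;
    *) return 2 ;;
  esac
  major=${version%%.*}    # "14.3.x-photon-scala2.12" -> "14"
  rest=${version#*.}      # -> "3.x-photon-scala2.12"
  minor=${rest%%.*}       # -> "3"
  case "$major" in ''|*[!0-9]*) return 2 ;; esac
  case "$minor" in ''|*[!0-9]*) return 2 ;; esac
  # True for any major > 13, or 13.x with x >= 3
  [ "$major" -gt 13 ] || { [ "$major" -eq 13 ] && [ "$minor" -ge 3 ]; }
}
```

For example, `supports_volume_init_scripts 12.2.x-scala2.12` fails, because 12.2 cannot load init scripts from volumes.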
The cluster-init-script is an LHO script that copies the LHO Telemetry Agent jar from DBFS to a storage area that clusters can access. Depending on the Unity Catalog setup, this is either /Volumes (Unity Catalog volumes) or /Workspace (workspace files).
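For reference, the two locations differ in how a cluster specification references a cluster-scoped init script. The sketch below prints the init_scripts fragment for the default paths on this page, in the shape the Databricks Clusters API uses (a volumes destination versus a workspace destination). The helper name is hypothetical. We assume, rather than know, that LHO configures clusters in an equivalent way.

```shell
# Hypothetical helper: print the init_scripts fragment of a cluster spec for
# the default LHO script location, with ("uc") or without ("no-uc") Unity Catalog.
lho_init_scripts_json() {
  case "$1" in
    uc)
      printf '[{"volumes": {"destination": "%s"}}]\n' \
        "/Volumes/main/default/bp-lakehouse-monitor/script/init_script.sh" ;;
    no-uc)
      printf '[{"workspace": {"destination": "%s"}}]\n' \
        "/Workspace/bp-lakehouse-monitor/script/init_script.sh" ;;
    *)
      echo "usage: lho_init_scripts_json uc|no-uc" >&2
      return 2 ;;
  esac
}
```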
Automatically enabling Unity Catalog
A metastore admin has the option to automatically enable LHO with Unity Catalog by following these steps:
In LHO, go to Settings → Provisioning and permissions and click Upload under Volumes Init Script.
Once this step completes successfully, the init script is uploaded and you can find it in your catalog at:
/Volumes/main/default/bp-lakehouse-monitor/script/init_script.sh
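If the Databricks CLI is available, you can also check the upload from a terminal. This is a minimal sketch. It assumes the newer Databricks CLI (v0.205 or above) is installed and authenticated, and that the default text output of fs ls lists file names. The helper names are hypothetical.

```shell
# Default location used by the automatic upload, as documented on this page.
LHO_VOLUME_SCRIPT_DIR="/Volumes/main/default/bp-lakehouse-monitor/script"

# The Databricks CLI addresses UC volume paths with a "dbfs:" prefix.
to_cli_uri() {
  printf 'dbfs:%s\n' "$1"
}

# Hypothetical helper: succeed if init_script.sh is listed in the volume
# directory (pass another directory to check a non-default location).
lho_init_script_uploaded() {
  databricks fs ls "$(to_cli_uri "${1:-$LHO_VOLUME_SCRIPT_DIR}")" | grep -q 'init_script\.sh'
}
```

For example: `lho_init_script_uploaded && echo uploaded`.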
Using different catalogs
If you do not want to use the main catalog for this, you need to update an LHO environment variable and restart LHO before enabling.
SSH into the LHO VM and locate the .env file. In this file, either locate or add the WORKSPACES_OVERRIDES environment variable. This variable tells LHO which path to upload the init script to. It supports per-workspace settings, as shown in this example:
WORKSPACES_OVERRIDES={workspaceId1}:/Volumes/path/to/init-script.sh,{workspaceId2}:/Volumes/path/to/init-script.sh
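To make the format concrete, the following sketch looks up the path configured for one workspace ID in a WORKSPACES_OVERRIDES value. It assumes only the comma-separated id:path format shown in the example above. The function name is hypothetical, and this is not how LHO itself parses the variable.

```shell
# Hypothetical helper: print the path configured for WORKSPACE_ID in a
# WORKSPACES_OVERRIDES value ("id1:path1,id2:path2"). Returns 1 if absent.
override_for_workspace() {
  local overrides="$1" workspace_id="$2" rest entry
  rest="$overrides,"              # trailing comma so every entry ends in ","
  while [ -n "$rest" ]; do
    entry=${rest%%,*}             # first entry, e.g. "123:/Volumes/..."
    rest=${rest#*,}               # remaining entries
    if [ "${entry%%:*}" = "$workspace_id" ]; then
      printf '%s\n' "${entry#*:}" # text after the first ":" is the path
      return 0
    fi
  done
  return 1
}
```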
After saving the file, restart the Lakehouse Optimizer container so that the new settings take effect:
docker restart bplm
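You can also script the edit and restart. This is a minimal sketch. It assumes the .env file uses plain KEY=VALUE lines, as in the example above. set_env_var and update_overrides_and_restart are hypothetical helpers. You must supply the .env path yourself, because its location on the VM is not specified here.

```shell
# Hypothetical helper: set KEY=VALUE in an env file. An existing KEY= line is
# removed and the new one is appended at the end; other lines are kept.
set_env_var() {
  local file="$1" key="$2" value="$3" tmp
  tmp=$(mktemp) || return 1
  # grep exits non-zero when it selects no lines (or the file is missing);
  # both cases are acceptable here, so the status is ignored
  grep -v "^${key}=" "$file" > "$tmp" 2>/dev/null || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Copy back with cat so the original file's owner and mode are preserved
  cat "$tmp" > "$file" && rm -f "$tmp"
}

# Hypothetical wrapper: update the override, then restart the LHO container
# (container name bplm, as used on this page).
update_overrides_and_restart() {
  set_env_var "$1" WORKSPACES_OVERRIDES "$2" && docker restart bplm
}
```

For example: `update_overrides_and_restart /path/to/.env '{workspaceId1}:/Volumes/path/to/init-script.sh'`, with both placeholders replaced by real values.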
Once the app restarts, the metastore admin can upload the init script as described in the section above, by clicking Upload under Settings → Provisioning and permissions.
Manually enabling Unity Catalog
If enabling Unity Catalog and uploading the init script from the LHO app is not acceptable, the admin needs to upload the cluster-init-script to /Volumes manually by following these steps.
Step 1) Download the LHO cluster-init-script from the following link, replacing LHO_VERSION with your LHO version:
https://bplmdemoappstg.blob.core.windows.net/libraries/LHO_VERSION/init_script.sh
e.g.
https://bplmdemoappstg.blob.core.windows.net/libraries/2.27.1/init_script.sh
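You can script the download with curl. In this sketch, lho_init_script_url and download_lho_init_script are hypothetical helper names. Only the URL pattern comes from this page.

```shell
# Base URL taken from this page; the next path segment is the LHO version.
LHO_LIBRARIES_BASE="https://bplmdemoappstg.blob.core.windows.net/libraries"

# Hypothetical helper: print the init script URL for an LHO version.
lho_init_script_url() {
  printf '%s/%s/init_script.sh\n' "$LHO_LIBRARIES_BASE" "$1"
}

# Hypothetical helper: download the script (default output: ./init_script.sh).
# --fail makes curl return an error instead of saving an HTTP error page.
download_lho_init_script() {
  curl --fail --silent --show-error --location \
    --output "${2:-init_script.sh}" "$(lho_init_script_url "$1")"
}
```

For example: `download_lho_init_script 2.27.1`.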
Step 2) Prepare storage path in Catalog Explorer
open Catalog Explorer
select schema: main/default. If using the main catalog is not an option, you can choose or create a different one, but be sure to grant proper access to it.
create volume: bp-lakehouse-monitor
The catalog, schema, and volume name are configurable; the values above are the defaults.
Users who own shared clusters should be granted access to the catalog, schema, and volume configured for the init script, so that the script can be loaded from there.
Workspaces to be monitored should also have access to the configured catalog.
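As an alternative to the Catalog Explorer UI, you can create the volume with SQL. The sketch below prints a standard Databricks SQL CREATE VOLUME statement for the default names, or for custom names passed as arguments. You can then run the statement in a SQL editor or notebook. The helper name is hypothetical. Backticks quote the identifiers, which is required for the hyphenated volume name.

```shell
# Hypothetical helper: print the SQL that creates the init-script volume.
# Usage: lho_create_volume_sql [CATALOG [SCHEMA [VOLUME]]]
lho_create_volume_sql() {
  local catalog="${1:-main}" schema="${2:-default}" volume="${3:-bp-lakehouse-monitor}"
  printf 'CREATE VOLUME IF NOT EXISTS `%s`.`%s`.`%s`;\n' "$catalog" "$schema" "$volume"
}
```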
Step 3) Upload the script to:
/Volumes/main/default/bp-lakehouse-monitor/script/
Note that if you’re not using the main catalog, the path will contain your catalog name instead.
The complete path should look like:
/Volumes/main/default/bp-lakehouse-monitor/script/init_script.sh
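You can also upload the script with the Databricks CLI. This is a minimal sketch. It assumes the newer Databricks CLI (v0.205 or above) is installed and authenticated. The helper names are hypothetical. The catalog, schema, and volume arguments default to the names used on this page.

```shell
# Hypothetical helper: print the script directory for a catalog, schema and
# volume (defaults are the values used on this page).
lho_volume_dir() {
  printf '/Volumes/%s/%s/%s/script\n' "${1:-main}" "${2:-default}" "${3:-bp-lakehouse-monitor}"
}

# Hypothetical helper: upload a local file as init_script.sh into that directory.
# Usage: upload_lho_init_script FILE [CATALOG [SCHEMA [VOLUME]]]
upload_lho_init_script() {
  local src="$1"
  shift
  # The CLI addresses UC volume paths with a "dbfs:" prefix
  databricks fs cp --overwrite "$src" "dbfs:$(lho_volume_dir "$@")/init_script.sh"
}
```

For example: `upload_lho_init_script ./init_script.sh`.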
Grant cluster owners read access to the volume
Click on the bp-lakehouse-monitor volume in ‘Catalog Explorer’
Select Permissions
Click on Grant
Select READ VOLUME and add the ‘Account Users’ principal. Instead, you can configure a narrower group containing the users who own shared clusters. That group also needs access to the catalog and schema where the volume is created.
Click on Grant
The storage location backing the configured catalog and schema should be accessible from every workspace to be monitored.
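You can also express the same grants in SQL. The sketch below prints GRANT statements for a principal and the default names. It also includes the USE CATALOG and USE SCHEMA privileges that the group needs on the parent objects. The helper is hypothetical. We assume `account users` is the SQL name of the ‘Account Users’ group.

```shell
# Hypothetical helper: print the GRANT statements that let PRINCIPAL read
# the init-script volume.
# Usage: lho_grant_sql [PRINCIPAL [CATALOG [SCHEMA [VOLUME]]]]
lho_grant_sql() {
  local principal="${1:-account users}" catalog="${2:-main}" schema="${3:-default}" volume="${4:-bp-lakehouse-monitor}"
  printf 'GRANT USE CATALOG ON CATALOG `%s` TO `%s`;\n' "$catalog" "$principal"
  printf 'GRANT USE SCHEMA ON SCHEMA `%s`.`%s` TO `%s`;\n' "$catalog" "$schema" "$principal"
  printf 'GRANT READ VOLUME ON VOLUME `%s`.`%s`.`%s` TO `%s`;\n' "$catalog" "$schema" "$volume" "$principal"
}
```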
Allow init script execution
Step 1) In Databricks
open Catalog
open Metastore details
open Allowed JARs/Init Scripts
add a new entry for
/Volumes/main/default/bp-lakehouse-monitor/script/init_script.sh
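A metastore admin can also manage this allowlist through the Unity Catalog artifact allowlists REST API. The sketch below builds a request body with a PREFIX_MATCH entry for the default path and sends it with curl. The helper names and the DATABRICKS_HOST and DATABRICKS_TOKEN variables are assumptions. A PUT replaces the entire INIT_SCRIPT allowlist, so the body must also include any existing entries. You can retrieve them first with a GET on the same endpoint.

```shell
# Hypothetical helper: build an allowlist request body for one path.
lho_allowlist_body() {
  printf '{"artifact_matchers": [{"artifact": "%s", "match_type": "PREFIX_MATCH"}]}\n' \
    "${1:-/Volumes/main/default/bp-lakehouse-monitor/script/init_script.sh}"
}

# Hypothetical helper: send a complete allowlist body. This REPLACES the
# current INIT_SCRIPT allowlist, so merge in existing entries beforehand.
put_init_script_allowlist() {
  curl --fail --silent --show-error -X PUT \
    -H "Authorization: Bearer ${DATABRICKS_TOKEN:?set DATABRICKS_TOKEN}" \
    -H "Content-Type: application/json" \
    --data "$1" \
    "${DATABRICKS_HOST:?set DATABRICKS_HOST}/api/2.1/unity-catalog/artifact-allowlists/INIT_SCRIPT"
}
```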
Unity Catalog Disabled
This configuration step is done automatically by LHO when the LHO Telemetry Agent is updated.
LHO copies the cluster-init-script into the storage area accessible by the cluster, i.e. the Databricks workspace files:
/Workspace/bp-lakehouse-monitor/script/init_script.sh
Note: permissions on this file must be granted manually to All Users in the workspace!