Provisioning with Unity Catalog Enabled
Introduction
Compute resources with Shared access mode are available only when Unity Catalog is enabled for the workspace. Global init scripts are not executed on shared-access-mode clusters, so LHO can enable monitoring for them only through cluster-scoped init scripts stored in Unity Catalog volumes, together with Spark configurations and tags applied at the cluster level.
Moreover, loading init scripts from Unity Catalog volumes is supported only on Databricks Runtime 13.3 and later.
cluster-init-script: an LHO-generated script that copies the LHO Telemetry Agent JAR from DBFS to a storage area accessible by clusters. The script is stored in a Unity Catalog volume, by default in the main catalog and default schema (this default should be overridden; follow the steps in the Override section of this document).
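For context, a cluster picks up a volume-hosted init script through the init_scripts section of its cluster specification. The fragment below is a sketch of the relevant part of a Databricks Clusters API payload, assuming the default volume path; LHO configures an equivalent reference when it enables monitoring.

```json
{
  "init_scripts": [
    {
      "volumes": {
        "destination": "/Volumes/main/default/bp-lakehouse-monitor/script/init_script.sh"
      }
    }
  ]
}
```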
Upload init script into Unity Catalog volume
A metastore admin performs this task in the LHO interface by clicking the Upload button for “Volumes Init Script“.
The selected catalog must have a storage location configured, either at the catalog level or at the metastore root. We strongly recommend not using the main catalog; follow the steps in the Override section of this document instead.
If not overridden, the init script is uploaded to your catalog at /Volumes/main/default/bp-lakehouse-monitor/script/init_script.sh
Enable All Workspaces access to Catalog
All Workspaces (without exception) should have access to the configured catalog (main or custom, see next section).
Override init script location (strongly recommended)
We strongly recommend against storing the init script in the default main catalog and default schema. Use one of the following options instead:
Option 1, one catalog per Databricks account (applies only when the account is hosted in a single region, with one metastore; values can be of your choosing):
INIT_SCRIPT_CATALOG=lho-script-catalog
INIT_SCRIPT_CATALOG_SCHEMA=lho-script-schema
Option 2, one catalog per region/metastore:
WORKSPACES_METASTORE_OVERRIDES={metastoreId1}:/Volumes/<path-to-script>/init-script.sh,{metastoreId2}:/Volumes/<path-to-script>/init-script.sh
Option 3, per workspace: you may override the location of the init script per workspace in cases where you have strict network segregation or isolation of storage locations:
WORKSPACES_OVERRIDES={workspaceId1}:/Volumes/<path-to-script>/init-script.sh,{workspaceId2}:/Volumes/<path-to-script>/init-script.sh
You can configure option 2 and additionally override the location at the workspace level for selected workspaces.
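As an illustration, a .env that combines option 2 with a workspace-level exception might look like the following. All IDs and paths here are placeholders you must replace with your own metastore IDs, workspace IDs, and volume paths.

```shell
# Option 2: one volume path per metastore (placeholder metastore ID and path)
WORKSPACES_METASTORE_OVERRIDES=11111111-aaaa-bbbb-cccc-222222222222:/Volumes/lho-script-catalog/lho-script-schema/bp-lakehouse-monitor/script/init_script.sh

# Optional exception for a network-isolated workspace (placeholder workspace ID and path)
WORKSPACES_OVERRIDES=3333333333333333:/Volumes/isolated-catalog/lho-script-schema/bp-lakehouse-monitor/script/init_script.sh
```

A workspace listed in WORKSPACES_OVERRIDES takes precedence over the metastore-level setting for that workspace only.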
Edit the .env configuration file on the host VM, update it with one of the options above, and save the file. Then recreate the Docker container:
docker-compose up -d
Once the app restarts, the metastore admin can upload the init script within LHO by clicking the Upload button on the Settings > Monitor Setup > Provisioning & Permissions page.
If you encounter a 404 error during this step, refer to the Troubleshooting section below for guidance.
Grant All Account Users Read Access to Volume
Click on the bp-lakehouse-monitor volume in ‘Catalog Explorer’
Select Permissions
Click on Grant
Select READ VOLUME and add the ‘Account Users’ principal (a narrower group of users that own shared clusters can be configured instead; the same group must also have access to the catalog and schema where the volume is created)
Click on Grant
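If you prefer SQL to the UI, the same grant can be expressed as a Databricks SQL statement run by the volume owner or a metastore admin. The snippet below only prints the statement for review; the catalog and schema fall back to the defaults, so set the variables to match your .env overrides before using it.

```shell
# Print the SQL equivalent of the UI grant steps (does not execute anything).
CATALOG="${INIT_SCRIPT_CATALOG:-main}"
SCHEMA="${INIT_SCRIPT_CATALOG_SCHEMA:-default}"
# Backtick-quote each part so hyphenated names remain valid SQL identifiers.
VOLUME="\`$CATALOG\`.\`$SCHEMA\`.\`bp-lakehouse-monitor\`"
STMT="GRANT READ VOLUME ON VOLUME $VOLUME TO \`account users\`;"
echo "$STMT"
```

Run the printed statement in a SQL editor attached to the workspace.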
The storage location backing the configured catalog and schema should be accessible from every workspace to be monitored.
Grant Service Principal Write Volume Access to Volume
Click on the bp-lakehouse-monitor volume in ‘Catalog Explorer’
Select Permissions
Click on Grant
Select WRITE VOLUME and add the Service Principal client ID
Click on Grant
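The equivalent Databricks SQL can be sketched the same way. SP_CLIENT_ID below is a placeholder application (client) ID; replace it with the client ID of your LHO service principal, and adjust the volume path if you overrode the default location.

```shell
# Print the SQL equivalent of granting WRITE VOLUME to the LHO service principal.
SP_CLIENT_ID="${SP_CLIENT_ID:-00000000-0000-0000-0000-000000000000}"
STMT="GRANT WRITE VOLUME ON VOLUME \`main\`.\`default\`.\`bp-lakehouse-monitor\` TO \`$SP_CLIENT_ID\`;"
echo "$STMT"
```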
Grant Service Principal Manage Allowlist Access in Metastore
Click on the Catalog tab in the left navigation bar.
Click on the Settings wheel icon and select the Data Administration item
Select Permissions
Click on Grant
Select MANAGE ALLOWLIST and add the Service Principal client ID
Click on Grant
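This metastore-level grant can likewise be expressed in Databricks SQL. As before, the snippet only prints the statement, and SP_CLIENT_ID is a placeholder to replace with your service principal's client ID.

```shell
# Print the SQL equivalent of granting MANAGE ALLOWLIST at the metastore level.
SP_CLIENT_ID="${SP_CLIENT_ID:-00000000-0000-0000-0000-000000000000}"
STMT="GRANT MANAGE ALLOWLIST ON METASTORE TO \`$SP_CLIENT_ID\`;"
echo "$STMT"
```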
Allow init script execution
Step 1) In Databricks
open Catalog
open Metastore details
open Allowed JARs/Init Scripts
add a new entry for the init script path (the default path is shown below; if you overrode the location, use that path instead)
/Volumes/main/default/bp-lakehouse-monitor/script/init_script.sh
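As an alternative to the UI, the allowlist entry can also be managed through the Unity Catalog artifact-allowlists REST API (artifact type INIT_SCRIPT). The snippet below only builds and prints the request payload; the commented curl call, workspace host, and token are placeholders for your environment.

```shell
# Build the artifact-allowlists payload for the LHO init script (default path shown).
SCRIPT_PATH="/Volumes/main/default/bp-lakehouse-monitor/script/init_script.sh"
PAYLOAD="{\"artifact_matchers\":[{\"artifact\":\"$SCRIPT_PATH\",\"match_type\":\"PREFIX_MATCH\"}]}"
echo "$PAYLOAD"
# To apply it (workspace host and token are placeholders):
# curl -X PUT "https://<workspace-host>/api/2.1/unity-catalog/artifact-allowlists/INIT_SCRIPT" \
#      -H "Authorization: Bearer $DATABRICKS_TOKEN" \
#      --data "$PAYLOAD"
```

Note that this API replaces the whole allowlist for the artifact type, so include any existing entries in the payload.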
Troubleshooting
404 Error – “The specified filesystem does not exist”
Verify the storage account settings where the catalog stores its data. Ensure that the corresponding blob container exists and is accessible from your Databricks workspace.