Provisioning with Unity Catalog Enabled (AWS)

Compute resources with Shared access mode are available only when Unity Catalog is enabled for the workspace. Global init scripts are not executed on shared access mode clusters, so LHO can enable monitoring for them only through cluster-scoped init scripts stored in Unity Catalog volumes, combined with Spark configurations and tags set at the cluster level.

Moreover, loading init scripts from UC volumes is supported only on Databricks Runtime 13.3 and above.
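The two constraints above (shared access mode and Databricks Runtime 13.3 or above) meet in the cluster definition. The following is a minimal sketch, assuming the public Clusters API field names `spark_version`, `data_security_mode` (where `USER_ISOLATION` denotes Shared access mode) and `init_scripts` with a `volumes.destination` entry; the helper names are hypothetical, and the LHO Spark configurations and tags are deliberately not shown because their exact keys are not given here.

```python
import re

# Hypothetical example path; it matches the default catalog/schema/volume used on this page.
INIT_SCRIPT_PATH = "/Volumes/main/default/bp-lakehouse-monitor/script/init_script.sh"


def runtime_supports_volume_init_scripts(spark_version: str) -> bool:
    """Return True if a spark_version such as '13.3.x-scala2.12' is DBR 13.3 or newer."""
    match = re.match(r"^(\d+)\.(\d+)", spark_version)
    if match is None:
        raise ValueError(f"Unrecognized spark_version: {spark_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (13, 3)


def shared_cluster_init_scripts(spark_version: str) -> dict:
    """Build the init-script-related fragment of a Clusters API spec for a shared cluster."""
    if not runtime_supports_volume_init_scripts(spark_version):
        raise ValueError("Init scripts in UC volumes require Databricks Runtime 13.3 or above")
    return {
        "spark_version": spark_version,
        # Assumed API value for Shared access mode; verify against your workspace's API version.
        "data_security_mode": "USER_ISOLATION",
        # Cluster-scoped init script loaded from a Unity Catalog volume.
        "init_scripts": [{"volumes": {"destination": INIT_SCRIPT_PATH}}],
    }


print(shared_cluster_init_scripts("13.3.x-scala2.12"))
```

Checking the runtime version before building the spec surfaces an unsupported runtime early, instead of leaving a cluster running without monitoring.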

Configure Cluster-Scoped Init Script Storage

cluster-init-script is an LHO script that copies the LHO Telemetry Agent jar from DBFS to the storage area accessible by clusters, which is either /Volumes or /Workspace (workspace files), depending on the Unity Catalog setup.

Unity Catalog Enabled

At the time of writing, Databricks does not offer an API to upload the cluster-init-script to /Volumes, so an admin must configure the cluster-init-script manually by following the steps below.

Step 1) Download the LHO cluster-init-script from the following link, replacing LHO_VERSION with your LHO version:

  • https://bplmdemoappstg.blob.core.windows.net/libraries/LHO_VERSION/init_script.sh

    • e.g. https://bplmdemoappstg.blob.core.windows.net/libraries/2.13.1/init_script.sh
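The link pattern above can be filled in programmatically. This is a minimal sketch: `init_script_url` is a hypothetical helper that only substitutes the version into the URL shown on this page, and it does not check that the version actually exists.

```python
# Base URL taken from the download link above; only the version segment varies.
BASE_URL = "https://bplmdemoappstg.blob.core.windows.net/libraries"


def init_script_url(lho_version: str) -> str:
    """Return the download URL of the LHO cluster-init-script for a given LHO version."""
    version = lho_version.strip()
    if not version:
        raise ValueError("LHO version must not be empty")
    return f"{BASE_URL}/{version}/init_script.sh"


print(init_script_url("2.13.1"))
```

To save the file locally, the standard library call `urllib.request.urlretrieve(init_script_url("2.13.1"), "init_script.sh")` can be used.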

Step 2) Prepare the storage path in Catalog Explorer

  • open Catalog Explorer

  • select schema: main/default

  • create volume: bp-lakehouse-monitor

The catalog, schema, and volume names are configurable; the values above are the defaults.
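Because the names are configurable, it can help to derive the volume DDL and the final script path from one set of values. The sketch below is hypothetical: `InitScriptLocation` is an illustrative helper that generates a Databricks SQL `CREATE VOLUME IF NOT EXISTS` statement (a managed volume, as an assumed alternative to the Catalog Explorer steps) and the `/Volumes/...` path used later on this page. It does not upload anything.

```python
from dataclasses import dataclass


def quote_identifier(name: str) -> str:
    # Backtick-quote an identifier; needed because names like bp-lakehouse-monitor contain hyphens.
    return "`" + name.replace("`", "``") + "`"


@dataclass(frozen=True)
class InitScriptLocation:
    """Catalog, schema and volume holding the LHO init script (defaults from this page)."""
    catalog: str = "main"
    schema: str = "default"
    volume: str = "bp-lakehouse-monitor"

    def create_volume_sql(self) -> str:
        full_name = ".".join(quote_identifier(p) for p in (self.catalog, self.schema, self.volume))
        return f"CREATE VOLUME IF NOT EXISTS {full_name}"

    def script_path(self) -> str:
        return f"/Volumes/{self.catalog}/{self.schema}/{self.volume}/script/init_script.sh"


loc = InitScriptLocation()
print(loc.create_volume_sql())
print(loc.script_path())
```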

Users who own shared clusters should be granted access to the catalog, schema, and volume from which the init script is loaded.

Every workspace to be monitored should also be configured to access this catalog.

Step 3) Upload the script to:

/Volumes/main/default/bp-lakehouse-monitor/script/

The complete path should look like:

/Volumes/main/default/bp-lakehouse-monitor/script/init_script.sh

Grant Cluster Owners Read Access to the Volume

  • Click the bp-lakehouse-monitor volume in ‘Catalog Explorer’

  • Select Permissions

  • Click Grant

  • Select READ VOLUME and add the ‘Account Users’ principal (alternatively, a narrower group containing only the users who own shared clusters can be used; that group also needs access to the catalog and schema where the volume is created).

  • Click Grant to confirm
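The same permissions can be expressed as Databricks SQL statements. This sketch assumes that "access to the catalog and schema" maps to the Unity Catalog privileges `USE CATALOG` and `USE SCHEMA`, and that the ‘Account Users’ principal is written as `account users` in SQL; `init_script_grants` is a hypothetical helper that only generates the statements.

```python
def quote_identifier(name: str) -> str:
    # Backtick-quote an identifier or principal, escaping embedded backticks.
    return "`" + name.replace("`", "``") + "`"


def init_script_grants(catalog: str, schema: str, volume: str,
                       principal: str = "account users") -> list[str]:
    """SQL statements giving a principal what it needs to read the init script volume."""
    cat = quote_identifier(catalog)
    sch = f"{cat}.{quote_identifier(schema)}"
    vol = f"{sch}.{quote_identifier(volume)}"
    who = quote_identifier(principal)
    return [
        f"GRANT USE CATALOG ON CATALOG {cat} TO {who}",
        f"GRANT USE SCHEMA ON SCHEMA {sch} TO {who}",
        f"GRANT READ VOLUME ON VOLUME {vol} TO {who}",
    ]


for statement in init_script_grants("main", "default", "bp-lakehouse-monitor"):
    print(statement)
```

Passing a narrower group name as `principal` corresponds to the alternative described in the grant step.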

The storage location backing the configured catalog and schema should be accessible from every workspace to be monitored.

Allow Init Script Execution

Step 1) In Databricks

  • open Catalog

  • open Metastore details

  • open Allowed JARs/Init Scripts

  • add a new entry for the script path:

/Volumes/main/default/bp-lakehouse-monitor/script/init_script.sh
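The allowlist can also be managed through the Unity Catalog REST API. This is a sketch under assumptions: it relies on the artifact allowlist endpoint `PUT /api/2.1/unity-catalog/artifact-allowlists/INIT_SCRIPT` with `artifact_matchers` entries using `match_type` `PREFIX_MATCH`, which should be checked against the current Databricks REST reference. Because a PUT replaces the whole allowlist, the hypothetical helper below merges the new path into the existing entries rather than sending it alone. It only builds the request body; sending it (as a metastore admin) is left out.

```python
import json

# Assumed endpoint; confirm against the Databricks REST API reference.
ALLOWLIST_ENDPOINT = "/api/2.1/unity-catalog/artifact-allowlists/INIT_SCRIPT"


def init_script_allowlist_body(existing_paths: list[str], new_path: str) -> dict:
    """Build a request body that keeps existing allowlist entries and adds new_path once."""
    paths = list(dict.fromkeys([*existing_paths, new_path]))  # de-duplicate, keep order
    return {
        "artifact_matchers": [
            {"artifact": path, "match_type": "PREFIX_MATCH"} for path in paths
        ]
    }


body = init_script_allowlist_body(
    [], "/Volumes/main/default/bp-lakehouse-monitor/script/init_script.sh"
)
print(f"PUT {ALLOWLIST_ENDPOINT}")
print(json.dumps(body, indent=2))
```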

Unity Catalog Disabled

This configuration step is performed automatically by LHO when the LHO Telemetry Agent is updated.

LHO copies the cluster-init-script into the storage area accessible by the cluster, i.e. the Databricks workspace files, at:

/Workspace/bp-lakehouse-monitor/script/init_script.sh
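LHO performs this step itself; the following sketch only illustrates what a cluster-scoped reference to a workspace-file init script looks like in a Clusters API spec, assuming the public `init_scripts` field with a `workspace.destination` entry. `add_workspace_init_script` is a hypothetical helper, and it is an assumption that LHO references the script in exactly this way.

```python
import copy

WORKSPACE_INIT_SCRIPT = "/Workspace/bp-lakehouse-monitor/script/init_script.sh"


def add_workspace_init_script(cluster_spec: dict, path: str = WORKSPACE_INIT_SCRIPT) -> dict:
    """Return a copy of a cluster spec with the workspace-file init script appended once."""
    spec = copy.deepcopy(cluster_spec)  # leave the caller's spec untouched
    entry = {"workspace": {"destination": path}}
    scripts = spec.setdefault("init_scripts", [])
    if entry not in scripts:
        scripts.append(entry)
    return spec


print(add_workspace_init_script({"spark_version": "13.3.x-scala2.12"}))
```

Applying the helper twice leaves the spec unchanged, so it is safe to run on clusters that already reference the script.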

Note: permissions on this file must be granted manually to the All Users group in the workspace.
