LHO Application Resource & Permissions Requirements (Azure)

This page provides an overview of the Azure resources required to deploy Lakehouse Optimizer.

If you use the PowerShell script provided in this guide https://blueprinttechnologies.atlassian.net/wiki/spaces/BLMPD/pages/2515566593, the required resources and permissions will be created automatically.


Resource Group in your Azure tenant

One Resource Group in your Azure tenant (in the same region as your Azure Databricks resources) containing:

Azure Ubuntu Linux VM

  • OS: Ubuntu Linux 24.04

  • Recommended type: Standard_B8ms or similar, with a minimum of 8 cores, 32 GB RAM, and 50 GB of disk space

  • Docker Engine installed (version 23.0 or later)

  • Docker Compose installed (version v2.40.0 or later)

  • The following ports accessible:

    • 80 - open to the Internet (used by the Let's Encrypt servers to issue/renew the SSL certificate)

    • 443 - IP-restricted HTTPS access (firewall) for the locations LHO will be used from

    • 22 - IP-restricted SSH access (firewall) for the locations LHO will be managed from

  • If any Databricks workspaces have Secure Cluster Connectivity enabled, a private endpoint of type databricks_ui_api is required between the LHO VNet and each secured Azure Databricks workspace

  • An extra 50 GB data disk attached at LUN 0 (Linux VM data volume)

  • Outbound Internet access: required for system package updates and for access to the Blueprint Azure Container Registry to retrieve the LHO Docker images (ACR login server: blueprint.azurecr.io)

  • DNS name configured on the VM's public IP

  • An SSL keystore containing the SSL certificate and private key, if a Let's Encrypt certificate is not an option. See https://blueprinttechnologies.atlassian.net/wiki/x/CYDXrg for details.
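The VM requirements above can be sketched with the Azure CLI. This is a sketch only, not part of the official provisioning script: every name below (lho-rg, lho-vm, the DNS label, the admin username) is a hypothetical placeholder, and your sizing, region, and firewall rules may differ.

```shell
#!/usr/bin/env bash
# Sketch: hypothetical names; assumes you have run `az login` first.
set -euo pipefail

RG="lho-rg"            # hypothetical resource group name
VM="lho-vm"            # hypothetical VM name
LOCATION="westus2"     # use the same region as your Azure Databricks resources

az group create --name "$RG" --location "$LOCATION"

# Ubuntu VM with a system-assigned managed identity, an extra 50 GB data disk
# (attached at LUN 0), and a DNS label on its public IP
az vm create \
  --resource-group "$RG" \
  --name "$VM" \
  --image Ubuntu2404 \
  --size Standard_B8ms \
  --data-disk-sizes-gb 50 \
  --assign-identity "[system]" \
  --public-ip-address-dns-name "lho-demo" \
  --admin-username azureuser \
  --generate-ssh-keys

# Open the required ports; afterwards, restrict 443 and 22 in the NSG
# to the source IP ranges LHO will be used/managed from
az vm open-port --resource-group "$RG" --name "$VM" --port 80,443,22

# On the VM itself, install Docker Engine and the Compose plugin, e.g.:
#   curl -fsSL https://get.docker.com | sh
```

The system-assigned managed identity created here is the one the Key Vault access policy below is granted to.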

Azure SQL Server

  • Recommended type: S3 with 400 DTU or higher (alternatively, Serverless with a minimum of 2 vCores, scaling up to 8)

  • SQL login (username and password)

  • Empty Azure/MS SQL Database

    • Collation on LHO DB: SQL_Latin1_General_CP1_CI_AS

    • SQL login user granted the following database roles:

      • db_ddladmin

      • db_datareader

      • db_datawriter
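As an illustration, the database, collation, login, and role grants above could be scripted as follows. All names (server, resource group, database, login) are hypothetical placeholders; run with server-admin credentials, and adjust the service objective to your DTU target.

```shell
# Sketch: hypothetical names; assumes the az CLI and sqlcmd are installed,
# and ADMIN_PASSWORD / LHO_SQL_PASSWORD are set in the environment.
SQL_SERVER="lho-sql"                          # hypothetical logical server name
SQL_FQDN="${SQL_SERVER}.database.windows.net"

# Empty LHO database with the required collation
az sql db create \
  --resource-group lho-rg \
  --server "$SQL_SERVER" \
  --name lho \
  --collation SQL_Latin1_General_CP1_CI_AS \
  --service-objective S3

# Server-level SQL login (run against the master database)
sqlcmd -S "$SQL_FQDN" -d master -U sqladmin -P "$ADMIN_PASSWORD" \
  -Q "CREATE LOGIN lho_user WITH PASSWORD = '$LHO_SQL_PASSWORD';"

# Database user plus the three required roles (run against the LHO database)
sqlcmd -S "$SQL_FQDN" -d lho -U sqladmin -P "$ADMIN_PASSWORD" -Q "
  CREATE USER lho_user FOR LOGIN lho_user;
  ALTER ROLE db_ddladmin   ADD MEMBER lho_user;
  ALTER ROLE db_datareader ADD MEMBER lho_user;
  ALTER ROLE db_datawriter ADD MEMBER lho_user;"
```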

Azure Key Vault

  • Recommended Type: Standard

  • Recommended permission model: Vault Access Policy

    • List, Get and Set permissions granted for Secrets to the VM’s System-Assigned Managed Identity

    • If RBAC is required instead of Vault Access Policy, grant the Key Vault Secrets Officer role to the VM's System-Assigned Managed Identity

  • The following secrets created:

    • msft-provider-auth-secret - the Microsoft Entra ID App Registration client secret (used for Azure Active Directory single sign-on into the application, and by the LHO Application and Agent to access the Azure Storage account storing telemetry data)

    • storage-account-key - the storage account access key used by the LHO Service Principal and the LHO Agent running in each Databricks workspace cluster to read/write the Azure Storage account (only needed if accessing the storage account through the Service Principal is not an option)

    • mssql-password - the SQL Login password

    • application-encryption-secret - the secret used to encrypt data inside the LHO database (any random string)
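The vault, access policy, and secrets above can be sketched with the Azure CLI. Vault and VM names are hypothetical, and the secret values are placeholders you must supply from your own App Registration, storage account, and SQL login.

```shell
# Sketch: hypothetical names; assumes `az login` has been run.
RG="lho-rg"
KV="lho-kv"

# Standard-tier vault using the Vault Access Policy permission model
az keyvault create --resource-group "$RG" --name "$KV" \
  --enable-rbac-authorization false

# Grant List, Get, and Set on secrets to the VM's system-assigned managed identity
VM_PRINCIPAL_ID=$(az vm show --resource-group "$RG" --name lho-vm \
  --query identity.principalId --output tsv)
az keyvault set-policy --name "$KV" \
  --object-id "$VM_PRINCIPAL_ID" \
  --secret-permissions list get set

# The four secrets LHO expects (values below are placeholders)
az keyvault secret set --vault-name "$KV" --name msft-provider-auth-secret \
  --value "<app-registration-client-secret>"
az keyvault secret set --vault-name "$KV" --name storage-account-key \
  --value "<storage-account-access-key>"       # only if key access is used
az keyvault secret set --vault-name "$KV" --name mssql-password \
  --value "<sql-login-password>"
az keyvault secret set --vault-name "$KV" --name application-encryption-secret \
  --value "$(openssl rand -hex 32)"            # any random string works
```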

Azure Storage Account

  • Storage account key access enabled (if access key is used instead of Service Principal)

  • Hierarchical namespace enabled

  • Service Principal granted the following roles at the storage account level:

    • Storage Table Data Contributor

    • Storage Queue Data Contributor

  • Network access required from each Databricks workspace (the VNet of its managed resource group) to the LHO storage account, since the LHO Agent running in the data plane of each Databricks workspace writes telemetry data to Azure Tables and sends messages to the Azure Queues
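A sketch of the storage account setup and role assignments described above. The account name and resource group are hypothetical, and the Service Principal's application ID is a placeholder.

```shell
# Sketch: hypothetical names; assumes `az login` has been run.
RG="lho-rg"
SA="lhotelemetry"                                # hypothetical storage account name
SP_APP_ID="<service-principal-application-id>"   # placeholder

# Storage account with hierarchical namespace enabled; shared-key access is
# only needed if the access-key path (storage-account-key secret) is used
az storage account create \
  --resource-group "$RG" \
  --name "$SA" \
  --enable-hierarchical-namespace true \
  --allow-shared-key-access true

SA_ID=$(az storage account show --resource-group "$RG" --name "$SA" \
  --query id --output tsv)

# Data-plane roles for the LHO Service Principal, scoped to the account
az role assignment create --assignee "$SP_APP_ID" \
  --role "Storage Table Data Contributor" --scope "$SA_ID"
az role assignment create --assignee "$SP_APP_ID" \
  --role "Storage Queue Data Contributor" --scope "$SA_ID"
```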

Microsoft Entra ID App Registration/Service Principal

  1. Permissions granted as per https://blueprinttechnologies.atlassian.net/wiki/x/TAArnw

  2. Permission to list Databricks workspaces granted as per https://blueprinttechnologies.atlassian.net/wiki/spaces/BLMPD/pages/3663888385/Initial+Setup+and+Configuration+Azure#Assign-workspace-read-permissions-via-Azure-AD-custom-role

  3. Role based authorization in the LHO app can be configured through Azure Custom App Roles created as per https://blueprinttechnologies.atlassian.net/wiki/spaces/BLMPD/pages/2670493810

  4. Billing Reader role granted on every Azure subscription (or scoped to every Resource Group, data plane and control plane/managed resource group) that contains Databricks workspaces. We recommend granting the role at the subscription level so that any newly created workspace is captured by LHO during consumption data loading. LHO will filter out non-Databricks data.
    Additional information can be found in https://blueprinttechnologies.atlassian.net/wiki/spaces/BLMPD/pages/3574464513

  5. One Web platform configuration with:

    1. redirect URI:
      https://<VM-DNS>/login/oauth2/code/azure

    2. ID Token enabled

  6. Admin role granted in all Databricks workspaces

  7. System table permissions: USE_CATALOG, USE_SCHEMA and SELECT granted on the following tables in the system catalog:

    1. billing.usage

    2. query.history

    3. compute.warehouse_events

    4. compute.warehouses

  8. Enable All Workspaces option set on the main catalog

  9. Configuration of support for Unity Catalog can be performed by a Metastore Admin user from the LHO Web Interface, provisioning area.

    1. If this is not an option, or the metastore is not configured properly, follow this document for full manual configuration: https://blueprinttechnologies.atlassian.net/wiki/x/LoDNo
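The system-table grants in step 7 can be issued as SQL statements against a SQL warehouse; one way to script them is via the Databricks SQL Statement Execution API, sketched below. The workspace URL, token, warehouse ID, and grantee principal are all placeholders, and the token's owner must itself hold the privileges being granted (e.g. a metastore admin).

```shell
# Sketch: placeholders throughout; assumes a running SQL warehouse and a
# personal access token with permission to grant on the system catalog.
HOST="https://<workspace-url>"
TOKEN="<databricks-token>"
WAREHOUSE_ID="<sql-warehouse-id>"
PRINCIPAL="<lho-service-principal-application-id>"

run_sql () {
  curl -sS -X POST "$HOST/api/2.0/sql/statements" \
    -H "Authorization: Bearer $TOKEN" \
    -H "Content-Type: application/json" \
    -d "{\"warehouse_id\": \"$WAREHOUSE_ID\", \"statement\": \"$1\"}"
}

# USE CATALOG / USE SCHEMA / SELECT on the system tables LHO reads
run_sql "GRANT USE CATALOG ON CATALOG system TO \`$PRINCIPAL\`"
for schema in billing query compute; do
  run_sql "GRANT USE SCHEMA ON SCHEMA system.$schema TO \`$PRINCIPAL\`"
done
for tbl in billing.usage query.history compute.warehouse_events compute.warehouses; do
  run_sql "GRANT SELECT ON TABLE system.$tbl TO \`$PRINCIPAL\`"
done
```

The same GRANT statements can instead be run interactively in a notebook or the SQL editor, which is effectively what the LHO provisioning area does for you when a Metastore Admin performs step 9.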
