Recommended configuration
Deployment | Azure Service Management | Azure Security Access for Databricks | Credentials Storage | Azure Cloud Storage Access |
---|---|---|---|---|
Azure VM/Docker | ON | | | |
Reference links
Security FAQs
👤Active Directory Authentication
How do I log in to the LHO app?
Signing in to the Lakehouse Optimizer application URL is done via Azure Active Directory credentials that are authorized for access to all available subscriptions in the Azure tenant and to all available workspaces within those subscriptions.
How do I configure the Azure Active Directory Service Principal?
The LHO deployment scripts create and configure an App Registration in your Microsoft Azure portal (App Registrations), used for Azure AD Single Sign-On and as the application identity for calling downstream Databricks APIs during background telemetry data analysis.
LHO requires administrator rights when running the deployment scripts.
Configurations done automatically by the deployment scripts:
Creates an Azure AD App Registration that will be used as a Service Principal for Azure AD Single Sign-On
this service principal is of type system-assigned managed identity (learn more at “Managed Identities Types”)
Sets a name for the Service Principal; this name will be used later to assign roles
Sets the redirect URI to https://{FQDN}/login/oauth2/code/azure, where FQDN is the URL the LHO application is published at
Creates a secret (Certificates & Secrets tab) named msft-provider-auth-secret, also known as the client secret
In Azure Key Vault, sets the LHO secret msft-provider-auth-secret to the value of that client secret. The Azure Key Vault instance was already created by the LHO deployment script with the name specified during the deployment process.
Enables ID Tokens in the Authentication tab
Sets clientId and tenantId as public variables in the LHO .env file
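As an illustration, the redirect URI and public variables set by the scripts can be sketched as follows. This is a minimal sketch: the FQDN value used below is a hypothetical placeholder, not a value from any actual deployment.

```python
# Sketch of the values the deployment scripts configure.
# "lho.example.com" is a hypothetical FQDN, shown only for illustration.

def redirect_uri(fqdn: str) -> str:
    """Redirect URI registered on the App Registration."""
    return f"https://{fqdn}/login/oauth2/code/azure"

def public_env_vars(client_id: str, tenant_id: str) -> dict:
    """Public variables the scripts write to the LHO .env file."""
    return {"clientId": client_id, "tenantId": tenant_id}

print(redirect_uri("lho.example.com"))
# https://lho.example.com/login/oauth2/code/azure
```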
💰Consumption Data Authentication
How does the LHO app authenticate itself to external services?
By using the Azure Service Principal as a Managed Identity, if you run LHO with the Service Principal enabled (the default).
The user who installed the Lakehouse Optimizer application is assumed to already have the User Access Administrator role. This role is required to install the LHO app using the deployment scripts. See the steps of “How do I configure the Azure Active Directory group?”.
How can I grant rights to LHO app to read consumption (cost) data?
The Billing Reader role is required to be granted to the Service Principal for loading consumption data for a target Azure subscription.
The LHO Application Settings page provides a button for the grant. This function however requires the Azure Service Management Functions to be enabled, which is the case in this configuration.
Therefore, the following automated procedure will be executed.
The administrator configuring the LHO application can grant the necessary permissions to load consumption data directly from the Settings panel of the LHO app: click the “Grant Access to Service Principal” button. The signed-in LHO user must have at least the User Access Administrator role in the subscription to properly configure the necessary permission.
The following configuration steps are done automatically by LHO:
in Azure Portal → Subscriptions → Azure subscription → Access control (IAM) → Role assignments, the Billing Reader role is added to the Members of the Service Principal identity
see more details at “How do I configure the Azure Active Directory group?”
The Billing Reader role at the Azure subscription level is required in order to allow the LHO application to read consumption/usage details data.
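The automated grant corresponds to a role-assignment call against the Azure Management REST API. Below is a hedged sketch of the request that would be issued; the subscription ID and principal object ID are placeholders, and the Billing Reader role-definition ID shown is the documented Azure built-in one (verify it for your cloud environment).

```python
import uuid

# Azure built-in role-definition ID for Billing Reader (verify for your environment).
BILLING_READER = "fa23ad8b-c56e-40d8-ac0c-ce449e1d2c64"

def role_assignment_request(subscription_id: str, principal_object_id: str):
    """Build the PUT request that grants Billing Reader at subscription scope."""
    scope = f"/subscriptions/{subscription_id}"
    url = (f"https://management.azure.com{scope}/providers/Microsoft.Authorization"
           f"/roleAssignments/{uuid.uuid4()}?api-version=2022-04-01")
    body = {
        "properties": {
            "roleDefinitionId": f"{scope}/providers/Microsoft.Authorization"
                                f"/roleDefinitions/{BILLING_READER}",
            "principalId": principal_object_id,
        }
    }
    return url, body
```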
Who can trigger consumption (cost) data loading?
Any signed in Azure AD user that has the admin role in the LHO application can access the Consumption Management page in LHO and trigger the loading of consumption data from Azure to LHO.
The consumption data is loaded by using the Service Principal of LHO app configured previously, see step “How can I grant rights to LHO app to read consumption (cost) data?”. The signed in user does not need any additional rights assigned to trigger this operation.
🗄️Secrets Storage
What secrets do I need to take care of?
Service Principal Client Secret
see “How do I configure the Azure Active Directory Service Principal?”
LHO Azure SQL/SQL Server database user & password
see “LHO Installation Manual” (contact Blueprint)
saved in Azure Key Vault
Where do I store my secrets?
This LHO deployment configuration uses Azure Key Vault to store and retrieve secrets.
Read more here: Azure Key Vault.
Who can access the secrets?
The LHO app can read all environment variables configured in the virtual machine environment
Who is using the credentials and for what?
Service Principal Client Secret
see “How do I configure the Azure Active Directory Service Principal?”
used by the LHO App for SSO via OAuth2 Web Flow, using the configured Service Principal
used to access Azure Blob Storage, Containers and Queues on behalf of the designated Service Principal
used by the LHO App to call the Databricks API on behalf of the designated Service Principal
LHO database user & password
see “LHO Installation Manual” (contact Blueprint)
used by the LHO app to read and write data in the LHO database deployed as Azure SQL Server
saved in Azure Key Vault
🔐Databricks access
How is Databricks access managed in LHO?
A Databricks user must be created in Databricks (see Workspace/Settings/Admin Console/Users) for each user that signs in to LHO via Azure AD. The LHO app does not create users automatically, as that would require higher privileges for the signed-in user.
The Databricks user’s permissions dictate the level of access to entities and resources. If the workspace is part of the Premium tier, more granular access control can be enabled, and the “Can Manage” permission is required for this user to change cluster or job configuration.
For non-Premium workspaces, any Databricks user can edit configurations; the one exception is All Purpose clusters, which can be edited only by their Owners, since Databricks secrets were introduced in the configuration.
Authorization: Lakehouse Optimizer calls Databricks APIs on behalf of the signed in Azure AD user or Service Principal, depending on context, so authorization is performed by Databricks.
Authentication for the downstream Databricks API calls is done via OAuth2 Access Tokens (OnBehalfOf flow) https://learn.microsoft.com/en-us/azure/active-directory/develop/v2-oauth2-on-behalf-of-flow
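The On-Behalf-Of exchange boils down to a single token request against the Azure AD token endpoint. A minimal sketch of the form body, assuming placeholder identifiers; the scope shown uses the commonly documented AzureDatabricks first-party application ID, and the actual HTTP call is omitted:

```python
def obo_token_request(tenant_id: str, client_id: str, client_secret: str,
                      user_access_token: str):
    """Build an OAuth2 On-Behalf-Of token request (JWT bearer grant)."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = {
        "grant_type": "urn:ietf:params:oauth:grant-type:jwt-bearer",
        "client_id": client_id,
        "client_secret": client_secret,
        "assertion": user_access_token,  # the signed-in user's access token
        # AzureDatabricks resource scope (well-known application ID)
        "scope": "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/user_impersonation",
        "requested_token_use": "on_behalf_of",
    }
    return url, form
```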
For more details see https://blueprintconsultingservices.atlassian.net/wiki/spaces/BLMPD/pages/2547286017
The LHO App Settings page provides a feature called “Add Service Principal” that creates a Databricks user for the configured Service Principal in the selected Databricks workspace.
It uses the SCIM Databricks API under the hood for Premium tier workspaces. For Standard tier workspaces, it elevates the Service Principal’s role to Contributor, makes a Databricks API call (which in turn creates the Databricks user in the workspace), and revokes the Contributor role.
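For the Premium-tier path, the SCIM call can be sketched as building a request like the following. The workspace URL and application ID are hypothetical placeholders; only the request shape is shown, not LHO's actual implementation.

```python
def scim_add_service_principal(workspace_url: str, application_id: str,
                               display_name: str):
    """Build the SCIM request that registers a Service Principal in a workspace."""
    url = f"{workspace_url}/api/2.0/preview/scim/v2/ServicePrincipals"
    payload = {
        "schemas": ["urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal"],
        "applicationId": application_id,   # the Azure AD application (client) ID
        "displayName": display_name,
    }
    return url, payload
```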
What are the required permissions for the signed in user and Service Principal?
Any LHO Azure AD signed in user will be authorized by Databricks through the API calls made by the application on the user’s behalf. See the full list of permissions required for a user to have access to all the features in LHO.
Any LHO logged-in user will see Databricks entities based on the rights configured for that user in Databricks. LHO uses the signed-in user’s Azure AD access token to list Databricks entities.
LHO app will use the Service Principal to call Databricks APIs for telemetry analysis.
to create a Service Principal to use for Databricks please see “How is Databricks access managed in LHO?”
As a background process, the LHO app will use the Service Principal to read Databricks entities.
to create a Service Principal to use for Databricks please see “What rights does LHO need to access Databricks entities?”
The Managed Identity used by LHO is created at step “How do I configure the Azure Active Directory group?”
The Service Principal represents the application identity, so it requires access to all the Databricks entities the LHO App monitors: clusters, jobs, DLT pipelines, and SQL Warehouses. Analysis data is stored in the database and used in reporting. The signed-in Azure AD user is authorized at runtime by the downstream API calls, and the application filters the content of the reports generated from the analyzed database data according to the authorization rules in Databricks. Any signed-in user can only access the Databricks entities they are authorized for, even though the database holds analyzed data for all entities (collected via the Service Principal).
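The report-filtering rule described above can be illustrated with a small hypothetical helper; the entity records and the permitted-ID set are made-up shapes for illustration, not LHO's actual data model.

```python
def filter_entities(analyzed: list, permitted_ids: set) -> list:
    """Keep only the analyzed entities the signed-in user is authorized for."""
    return [e for e in analyzed if e["id"] in permitted_ids]

# Hypothetical analyzed data collected via the Service Principal:
analyzed = [{"id": "cluster-a", "cost": 12.0},
            {"id": "cluster-b", "cost": 7.5}]

# A user authorized only for cluster-a sees only that entity in reports.
print(filter_entities(analyzed, {"cluster-a"}))
# [{'id': 'cluster-a', 'cost': 12.0}]
```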
The LHO “Add Service Principal” feature on the Settings page will add the SP Databricks user to the admins group. However, the exact list of required permissions is documented below:
Service Principal Permissions Required by Databricks API
LHO MODULE | Databricks API | PERMISSION NEEDED |
---|---|---|
ANALYZER | /2.1/jobs/get/{job_id} | CAN_VIEW |
ANALYZER | /2.1/jobs/runs/list | CAN_VIEW |
ANALYZER | /2.1/jobs/runs/get/{run_id} | CAN_VIEW |
ANALYZER | /2.0/clusters/list | CAN_ATTACH_TO |
ANALYZER | /2.0/pipelines/ | CAN_VIEW |
ANALYZER | /2.0/pipelines/{pipeline_id} | CAN_VIEW |
ANALYZER | /2.0/pipelines/{pipeline_id}/events | CAN_VIEW |
ANALYZER | /2.0/pipelines/{pipeline_id}/updates | CAN_VIEW |
ANALYZER | /2.0/pipelines/{pipeline_id}/updates/{update_id} | CAN_VIEW |
ANALYZER | /2.0/clusters/events | CAN_ATTACH_TO |
ANALYZER | /2.0/sql/warehouses | CAN_USE |
User Permissions Required by Databricks API
LHO MODULE | Databricks API | PERMISSION NEEDED |
---|---|---|
REPORTING | /2.1/jobs/list | CAN_VIEW |
REPORTING | /2.1/jobs/get/{job_id} | CAN_VIEW |
REPORTING | /2.1/jobs/runs/list | CAN_VIEW |
REPORTING | /2.1/jobs/runs/get/{run_id} | CAN_VIEW |
REPORTING | /2.0/clusters/list | CAN_ATTACH_TO |
REPORTING | /2.0/pipelines/ | CAN_VIEW |
REPORTING | /2.0/pipelines/{pipeline_id} | CAN_VIEW |
REPORTING | /2.0/pipelines/{pipeline_id}/updates | CAN_VIEW |
REPORTING | /2.0/workspace/get-status | CAN_READ |
PROVISIONING | /2.1/jobs/list | CAN_VIEW |
PROVISIONING | /2.0/dbfs/delete | CAN_MANAGE |
PROVISIONING | /2.0/dbfs/mkdirs | CAN_MANAGE |
PROVISIONING | /2.0/dbfs/add-block | CAN_MANAGE |
PROVISIONING | /2.0/dbfs/get-status | CAN_READ |
PROVISIONING | /2.1/jobs/get/{job_id} | CAN_VIEW |
PROVISIONING | /2.1/jobs/reset | CAN_MANAGE |
PROVISIONING | /2.0/clusters/list | CAN_ATTACH_TO |
PROVISIONING | /2.0/clusters/get | CAN_ATTACH_TO |
PROVISIONING | /2.0/clusters/edit | CAN_MANAGE |
PROVISIONING | /2.0/global-init-scripts [GET] | ADMIN GROUP |
PROVISIONING | /2.0/global-init-scripts [POST] | ADMIN GROUP |
PROVISIONING | /2.0/global-init-scripts/{script_id} | ADMIN GROUP |
PROVISIONING | /2.0/pipelines/{pipeline_id} [GET] | CAN_VIEW |
PROVISIONING | /2.0/pipelines/{pipeline_id} [PUT] | CAN_MANAGE |
PROVISIONING | /2.0/secrets/scopes/list | MANAGE |
PROVISIONING | /2.0/secrets/list | READ |
PROVISIONING | /2.0/secrets/create | WRITE |
PROVISIONING | /2.0/secrets/delete | MANAGE |
PROVISIONING | /2.0/secrets/scopes/delete | MANAGE |
PROVISIONING | /2.0/secrets/scopes/create | MANAGE |
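The table above can be treated as a lookup from (module, API) to the required permission. The helper below is an illustrative sketch, not part of LHO; it encodes a subset of the rows for demonstration.

```python
# Subset of the user-permission table above, as a lookup (illustrative only).
REQUIRED = {
    ("REPORTING", "/2.1/jobs/list"): "CAN_VIEW",
    ("REPORTING", "/2.0/clusters/list"): "CAN_ATTACH_TO",
    ("REPORTING", "/2.0/workspace/get-status"): "CAN_READ",
    ("PROVISIONING", "/2.0/clusters/edit"): "CAN_MANAGE",
    ("PROVISIONING", "/2.0/global-init-scripts [POST]"): "ADMIN GROUP",
    ("PROVISIONING", "/2.0/secrets/create"): "WRITE",
}

def required_permission(module: str, api: str) -> str:
    """Return the Databricks permission a user needs for a given LHO module/API pair."""
    return REQUIRED.get((module, api), "UNKNOWN")

print(required_permission("PROVISIONING", "/2.0/clusters/edit"))  # CAN_MANAGE
```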
What rights are used to monitor Databricks entities (workflows, clusters, pipelines)?
In order to monitor an entity (e.g. workflow, cluster, pipeline), LHO needs to update the configuration of that entity. Databricks allows only certain users to modify the configuration of an entity (e.g. administrators or owners of clusters).
LHO uses the rights of the signed in user to enable monitoring of an entity.
If the signed in user can modify the configuration of an entity in Databricks, then LHO also will be able to monitor that entity.
LHO does not use the Service Principal rights to enable or disable monitoring of an entity.
What rights do I need to collect telemetry data?
If the signed in user can modify the configuration of an entity in Databricks, then LHO also will be able to monitor that entity. Collecting telemetry data does not require any additional rights.
How do I configure the LHO agent (telemetry collector) to gather data?
No additional configuration is required. Configuration is done automatically by the LHO app.
How does the LHO Agent (telemetry collector) read credentials?
The telemetry collector agent uses the configured Secrets Scope (Azure Key Vault or Databricks Secrets) to securely access the cloud storage configuration. Databricks allows applications running inside the Databricks environment to access secrets stored either in Databricks Secrets or Azure Key Vault.
LHO uses either Databricks Secrets or Azure Key Vault to provide credentials for the LHO Agent (collector) to use. The option is selected on the Settings page of the LHO app, in the Databricks Workspace section; see the “Create / Edit Secrets Scope” button.
A security administrator will need to create a Secrets Scope to store credentials for the LHO Agent, by clicking “Create Secrets Scope”.
Once created, the LHO app (Settings page > Access Rights) will be linked to the newly created Secrets Scope. This means that LHO will be able to access and use credentials securely in a Databricks environment.
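Creating an Azure Key Vault-backed scope goes through the Databricks Secrets API. A sketch of the request that would be built (workspace URL, Key Vault resource ID, and DNS name are hypothetical placeholders; only the payload shape is shown):

```python
def create_keyvault_scope_request(workspace_url: str, scope_name: str,
                                  keyvault_resource_id: str, keyvault_dns: str):
    """Build the request that creates an Azure Key Vault-backed secret scope."""
    url = f"{workspace_url}/api/2.0/secrets/scopes/create"
    payload = {
        "scope": scope_name,
        "scope_backend_type": "AZURE_KEYVAULT",
        "backend_azure_keyvault": {
            "resource_id": keyvault_resource_id,  # full ARM resource ID of the vault
            "dns_name": keyvault_dns,             # e.g. https://myvault.vault.azure.net/
        },
    }
    return url, payload
```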
LHO App saves the following information in the configured Secret Scope:
service principal: tenant id, client id, client secret
storage account: container name, account name, queue name
The LHO Agent has access to the configuration of a Databricks entity (e.g. Workflow, Job), and that configuration identifies the Secrets Scope where credentials are stored. Any entity has access to the Databricks Secrets service.
What permissions are required for the telemetry agent to read/write data from/to Azure Blob Storage?
The telemetry collector agent uses the configured Service Principal to identify itself and securely write to the Azure Blob Storage container.
How does the telemetry agent communicate with the LHO App for realtime telemetry data analysis?
The LHO Agent stores telemetry data in cloud storage and sends events (e.g. Spark job completion events) to an Azure Queue configured on the same Storage Account that holds the telemetry data.
The LHO App dequeues events from this Queue and triggers the analysis when the Databricks job or DLT update is complete.
Access to cloud storage via Access Key can be disabled and the LHO App configured to use the Service Principal to access cloud storage. The LHO Service Principal requires the Storage Queue Data Contributor role at the level of the Storage Account used by the LHO Agent. This allows the LHO App to read data from the Storage Account’s Queue and the LHO Agent to write data to this queue.
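Azure Queue messages carry base64-encoded payloads by convention. The sketch below shows how a consumer might decode one completion event; the event fields used here are hypothetical, not LHO's actual event schema.

```python
import base64
import json

def decode_queue_message(raw: str) -> dict:
    """Decode a base64-encoded JSON message dequeued from an Azure Queue."""
    return json.loads(base64.b64decode(raw))

# Hypothetical completion event, encoded the way it would sit on the queue.
event = {"type": "job_run_completed", "run_id": 42}
raw = base64.b64encode(json.dumps(event).encode()).decode()

print(decode_queue_message(raw))
# {'type': 'job_run_completed', 'run_id': 42}
```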
The Storage Queue Data Contributor role must be granted manually to the LHO Service Principal on the Storage Account used by the LHO Agent.
📍 Public Workspaces
How do I expose Subscriptions and Workspaces to users from other AD tenants?
See https://blueprintconsultingservices.atlassian.net/wiki/spaces/BLMPD/pages/2565537923
🪪 LHO Roles
What roles are there in the LHO app?
How can I assign LHO roles to users?
Managed Identities
Managed Identities Types
There are two types of managed identities:
System-assigned. Some Azure resources, such as virtual machines, allow you to enable a managed identity directly on the resource. When you enable a system-assigned managed identity:
A service principal of a special type is created in Azure AD for the identity. The service principal is tied to the lifecycle of that Azure resource. When the Azure resource is deleted, Azure automatically deletes the service principal for you.
By design, only that Azure resource can use this identity to request tokens from Azure AD.
You authorize the managed identity to have access to one or more services.
User-assigned. You may also create a managed identity as a standalone Azure resource. You can create a user-assigned managed identity and assign it to one or more Azure Resources. When you enable a user-assigned managed identity:
A service principal of a special type is created in Azure AD for the identity. The service principal is managed separately from the resources that use it.
User-assigned identities can be used by multiple resources.
You authorize the managed identity to have access to one or more services.
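On an Azure VM, a managed identity obtains tokens from the local Instance Metadata Service (IMDS) rather than presenting a secret. A sketch of that request, using the documented IMDS endpoint and the Azure Resource Manager resource URL as an example:

```python
def imds_token_request(resource: str = "https://management.azure.com/"):
    """Build the local IMDS request a VM uses to get a token for its managed identity."""
    url = ("http://169.254.169.254/metadata/identity/oauth2/token"
           f"?api-version=2018-02-01&resource={resource}")
    headers = {"Metadata": "true"}  # required header; the call only works from inside the VM
    return url, headers
```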
Reference
Why and how to create Azure service principals
Managed identities provide an identity to applications that access Azure resources. Both service principals and managed identities enable fine-grained, programmatic access to Azure infrastructure without having to put passwords into scripts.
The key difference between Azure service principals and managed identities is that, with the latter, admins do not have to manage credentials, including passwords.