Azure/PCF with disabled Azure Service Management
| Deployment | Azure Service Management | Azure Security Access for Databricks | Credentials Storage | Azure Cloud Storage Access |
|---|---|---|---|---|
| Pivotal Cloud Foundry (PCF) | OFF | Service Principal | Unprotected in configuration (an external tool retrieves the secure credentials and transfers them into the PCF config) | Service Principal |
Reference links
- Pivotal Cloud Foundry
- Security FAQs
- 1 Active Directory Authentication
- 2 💰Consumption Data Authentication
- 3 🗄️Secrets Storage
- 4 🔐Databricks access
- 4.1 How is Databricks access managed in LHO? (4)(5)
- 4.2 What are the required permissions for the signed in user and Service Principal?
- 4.3 What rights are used to monitor Databricks entities (workflows, clusters, pipelines)?
- 4.4 What rights do I need to collect telemetry data?
- 4.5 How do I configure the LHO agent (telemetry collector) to gather data?
- 4.6 How does the LHO Agent (telemetry collector) read credentials? (6)
- 4.7 What permissions are required for the telemetry agent to read/write data from/to Azure Tables? (7)
- 4.8 How does the telemetry agent communicate with the LHO App for realtime telemetry data analysis? (8)
- 5 📍 Public Workspaces
- 6 🪪 LHO Roles
Active Directory Authentication
How do I log in to the LHO app? (1)
You sign in to the Lakehouse Optimizer application URL with Azure Active Directory credentials. These credentials are authorized for all available subscriptions in the Azure tenant and for all available workspaces in those subscriptions.
How do I configure the Azure Active Directory Service Principal?
LHO requires an administrator to set up an App registration for LHO in the Azure Portal.
Create an Azure AD App Registration that will be used as a Service Principal for Azure AD Single Sign-On and as the application identity for calling downstream Databricks APIs for background telemetry data analysis:
- Create an App Registration in the Microsoft Azure portal (App Registrations).
- Set a name for the Service Principal. This name will be used later to assign roles.
- Set the redirect URI to https://{FQDN}/login/oauth2/code/azure, where FQDN is the URL the LHO Application is published with.
- Create a secret (Certificates & Secrets tab) named msft-provider-auth-secret and save the secret value in a text file, as it will be required for configuring LHO environment variables.
- In the PCF portal, before starting the LHO app, set the LHO env variable MSFT_PROVIDER_AUTH_SECRET to <value-of-msft-provider-auth-secret>.
- Set the LHO env variable SERVICE_PRINCIPAL_CLIENT_SECRET to <value-of-msft-provider-auth-secret>.
- Enable ID Tokens on the Authentication tab.
- Copy the clientId and tenantId generated when the App Registration was created, and set these variables in the PCF portal as LHO env variables:
  - SERVICE_PRINCIPAL_CLIENTID=<app-registration-clientId>
  - SERVICE_PRINCIPAL_TENANTID=<app-registration-tenantId>
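Before starting the LHO app, it can be worth verifying that the App Registration values actually work. The following is a minimal sketch, assuming Python and the msal package (not part of LHO itself); the placeholder values are the ones collected in the steps above.

```python
# Sanity-check the App Registration values before wiring them into PCF.
# Hypothetical standalone check; LHO itself reads these from env variables.
import msal

TENANT_ID = "<app-registration-tenantId>"
CLIENT_ID = "<app-registration-clientId>"
CLIENT_SECRET = "<value-of-msft-provider-auth-secret>"

app = msal.ConfidentialClientApplication(
    CLIENT_ID,
    authority=f"https://login.microsoftonline.com/{TENANT_ID}",
    client_credential=CLIENT_SECRET,
)

# Client-credentials flow against ARM; getting a token proves the secret is valid.
result = app.acquire_token_for_client(
    scopes=["https://management.azure.com/.default"]
)
if "access_token" in result:
    print("Service Principal credentials are valid.")
else:
    print("Token request failed:", result.get("error_description"))
```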
💰Consumption Data Authentication
The user who installed the Lakehouse Optimizer application is assumed to already have the User Access Administrator role. This role is required to configure the permissions of the LHO app's Service Principal. See the steps in “How do I configure the Azure Active Directory Service Principal?”.
How can I grant rights to LHO app to read consumption (cost) data? (2)
The Billing Reader role must be granted to the Service Principal so that it can load consumption data for a target Azure subscription.
The LHO Application Settings page provides a button for this grant. However, that function requires the Azure Service Management functions to be enabled, which is not the case in this configuration.
Therefore, the following manual procedure must be executed.
The administrator configuring LHO during setup will grant the LHO application running in PCF the necessary permissions to load consumption data. The signed-in LHO user must have at least the User Access Administrator role in the subscription to properly configure the Service Principal roles.
- Navigate to Azure Portal → Subscriptions → select the Azure subscription → select Access control (IAM) → Add role assignment.
- Select the Billing Reader role and click on Members.
- Under Assign access to, select User, group, or service principal.
- Under Members, click on Add members.
- Search for the name of the Service Principal (see more details at “How do I configure the Azure Active Directory Service Principal?”).
The Billing Reader role at the Azure subscription level is required in order to allow the LHO application to read consumption/usage details data. A scripted alternative to the portal steps is sketched below.
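For environments where the portal is not practical, the same grant can be scripted. This is a hedged sketch using the ARM REST API directly (Python + requests); the subscription id, Service Principal object id, and access token are placeholders you must supply.

```python
# Grant the built-in Billing Reader role to the Service Principal at
# subscription scope via the ARM REST API. Placeholder values throughout.
import uuid
import requests

SUBSCRIPTION_ID = "<subscription-id>"
SP_OBJECT_ID = "<service-principal-object-id>"  # Enterprise Application object id
ACCESS_TOKEN = "<arm-access-token>"  # caller needs User Access Administrator

# Look up the built-in Billing Reader role definition id in this subscription.
scope = f"/subscriptions/{SUBSCRIPTION_ID}"
defs = requests.get(
    f"https://management.azure.com{scope}/providers/Microsoft.Authorization/roleDefinitions",
    params={"$filter": "roleName eq 'Billing Reader'", "api-version": "2022-04-01"},
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
).json()
role_definition_id = defs["value"][0]["id"]

# Create the role assignment at subscription scope.
assignment_id = str(uuid.uuid4())
resp = requests.put(
    f"https://management.azure.com{scope}/providers/Microsoft.Authorization/roleAssignments/{assignment_id}",
    params={"api-version": "2022-04-01"},
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    json={"properties": {"roleDefinitionId": role_definition_id,
                         "principalId": SP_OBJECT_ID}},
)
resp.raise_for_status()
print("Billing Reader granted at", scope)
```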
Who can trigger consumption (cost) data loading? (2)
Any signed-in Azure AD user who has the admin role in the LHO application can access the Consumption Management page in LHO and trigger the loading of consumption data from Azure into LHO.
The consumption data is loaded using the identity of the LHO app's Service Principal configured previously (see “How can I grant rights to LHO app to read consumption (cost) data?”). The signed-in user does not need any additional rights to trigger this operation.
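For illustration, the kind of request the Service Principal identity makes when loading consumption data looks roughly like the following (Python + requests, against the Azure Consumption usageDetails endpoint). Property names vary by billing agreement type, so the field access below is hedged.

```python
# Illustrative sketch of loading consumption data for one subscription via
# the Azure Consumption usageDetails API, authenticated as the SP.
import requests

SUBSCRIPTION_ID = "<subscription-id>"
ACCESS_TOKEN = "<sp-arm-access-token>"  # acquired via client-credentials flow

url = (f"https://management.azure.com/subscriptions/{SUBSCRIPTION_ID}"
       "/providers/Microsoft.Consumption/usageDetails")
page = requests.get(
    url,
    params={"api-version": "2021-10-01"},
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
).json()

for item in page.get("value", []):
    props = item.get("properties", {})
    # Field names differ between legacy and modern billing accounts.
    print(props.get("date"), props.get("product"),
          props.get("cost", props.get("costInBillingCurrency")))
```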
🗄️Secrets Storage
What secrets do I need to take care of?
- Service Principal Client Secret
  - see “How do I configure the Azure Active Directory Service Principal?”
- LHO Azure SQL/SQL Server database user & password
  - see “LHO Installation Manual” (contact Blueprint)
Where do I store my secrets? (3)
This LHO deployment configuration does not use Azure Key Vault to store and retrieve secrets.
Secrets are stored in plain text as environment variables in the PCF portal. They are set there by a separate pipeline, which may itself retrieve the secrets from Azure Key Vault.
Read more here: Azure Key Vault.
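For illustration, reading configuration in this deployment reduces to plain environment variable lookups. A minimal Python sketch, assuming a hypothetical variable name for the database password:

```python
# LHO reads its secrets directly from env variables set in the PCF portal;
# there is no Key Vault lookup at runtime in this configuration.
import os

client_secret = os.environ.get("SERVICE_PRINCIPAL_CLIENT_SECRET")
auth_secret = os.environ.get("MSFT_PROVIDER_AUTH_SECRET")
db_password = os.environ.get("LHO_DB_PASSWORD")  # hypothetical variable name

# Fail fast if a required secret was not configured before the app started.
missing = [name for name, value in {
    "SERVICE_PRINCIPAL_CLIENT_SECRET": client_secret,
    "MSFT_PROVIDER_AUTH_SECRET": auth_secret,
}.items() if not value]
if missing:
    raise RuntimeError(f"Missing PCF env variables: {', '.join(missing)}")
```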
Who can access the secrets?
- The LHO app can read all environment variables configured in the PCF configuration.
- Any user who can open the PCF portal and configure the LHO instance can read them as well.
Who is using the credentials and for what?
- Service Principal Client Secret
  - see “How do I configure the Azure Active Directory Service Principal?”
  - used by the LHO App for SSO via the OAuth2 web flow, using the configured Service Principal
  - used by the LHO App to access Azure Blob Storage, containers, and queues on behalf of the designated Service Principal
  - used by the LHO App to call the Databricks API on behalf of the designated Service Principal
- LHO database password
  - see “LHO Installation Manual” (contact Blueprint)
  - used by the LHO app to read and write data in the LHO database deployed as Azure SQL Server (9)
  - the SQL database login and password are read from the PCF configuration
🔐Databricks access
How is Databricks access managed in LHO? (4)(5)
A Databricks user must be created in Databricks (see Workspace/Settings/Admin Console/Users) for each user that signs in to LHO via Azure AD. The LHO app does not create users automatically, as that would require a higher privilege for the signed-in user.
The Databricks user's permissions dictate the level of access to entities and resources. If the workspace is on the Premium tier, more granular access control can be enabled, and the “Can Manage” permission is required for this user to change cluster or job configurations.
For non-Premium workspaces, any Databricks user can edit configurations. The one exception is all-purpose clusters, which can only be edited by their owners, since Databricks secrets were introduced into the configuration.
Authorization: Lakehouse Optimizer calls Databricks APIs on behalf of the signed in Azure AD user or Service Principal, depending on context, so authorization is performed by Databricks.
Authentication for the downstream Databricks API calls is done via OAuth2 access tokens (On-Behalf-Of flow): https://learn.microsoft.com/en-us/azure/active-directory/develop/v2-oauth2-on-behalf-of-flow
For more details see https://blueprintconsultingservices.atlassian.net/wiki/spaces/BLMPD/pages/2547286017
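A minimal sketch of the On-Behalf-Of exchange described above, assuming Python and the msal package (LHO's actual implementation may differ). The Databricks resource id used in the scope is the commonly documented one; verify it for your cloud before relying on it.

```python
# On-Behalf-Of flow: trade the user's incoming token for a Databricks-scoped
# token, so Databricks authorizes the actual signed-in user.
import msal

# 2ff814a6-... is the commonly documented application id of the Azure
# Databricks resource; confirm it for your environment.
DATABRICKS_SCOPE = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default"

app = msal.ConfidentialClientApplication(
    "<app-registration-clientId>",
    authority="https://login.microsoftonline.com/<app-registration-tenantId>",
    client_credential="<value-of-msft-provider-auth-secret>",
)

def databricks_token_for_user(incoming_user_token: str) -> str:
    """Exchange the token the app received from the signed-in user."""
    result = app.acquire_token_on_behalf_of(
        user_assertion=incoming_user_token,
        scopes=[DATABRICKS_SCOPE],
    )
    if "access_token" not in result:
        raise RuntimeError(result.get("error_description"))
    return result["access_token"]
```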
The LHO App Settings page provides a feature called “Add Service Principal” that creates a Databricks user for the configured Service Principal in the selected Databricks workspace. Under the hood it uses the Databricks SCIM API for Premium tier workspaces. For Standard tier workspaces, it elevates the Service Principal's role to Contributor, makes a Databricks API call (which in turn creates the Databricks user in the workspace), and then revokes the Contributor role.
What are the required permissions for the signed in user and Service Principal?
Any Azure AD user signed in to LHO will be authorized by Databricks through the API calls made by the application on the user's behalf. The full list of permissions required for a user to have access to all the features in LHO is given below.
LHO app will use the Service Principal to call Databricks APIs for telemetry analysis.
To create a Databricks user for the Service Principal, see “How is Databricks access managed in LHO?”.
The Service Principal represents the application identity, so it requires access to all the Databricks entities the LHO App monitors: clusters, jobs, DLT pipelines, and SQL Warehouses. Analysis data is stored in the database and used in reporting. The signed-in Azure AD user is authorized at runtime by the downstream API calls, and the application filters the content of the reports generated from the analyzed database data according to the authorization rules in Databricks. Any signed-in user can only access the Databricks entities they are authorized for, even though the database holds analyzed data for all of them (collected via the Service Principal).
The LHO “Add Service Principal” feature on the Settings page adds the SP Databricks user to the admins group. The exact list of required permissions, however, is documented below:
Service Principal Permissions Required by Databricks API
LHO MODULE | Databricks API | PERMISSION NEEDED |
ANALYZER | /2.1/jobs/get/{job_id} | CAN_VIEW |
ANALYZER | /2.1/jobs/runs/list | CAN_VIEW |
ANALYZER | /2.1/jobs/runs/get/{run_id} | CAN_VIEW |
ANALYZER | /2.0/clusters/list | CAN_ATTACH_TO |
ANALYZER | /2.0/pipelines/ | CAN_VIEW |
ANALYZER | /2.0/pipelines/{pipeline_id} | CAN_VIEW |
ANALYZER | /2.0/pipelines/{pipeline_id}/events | CAN_VIEW |
ANALYZER | /2.0/pipelines/{pipeline_id}/updates | CAN_VIEW |
ANALYZER | /2.0/pipelines/{pipeline_id}/updates/{update_id} | CAN_VIEW |
ANALYZER | /2.0/clusters/events | CAN_ATTACH_TO |
ANALYZER | /2.0/sql/warehouses | CAN_USE |
User Permissions Required by Databricks API
LHO MODULE | Databricks API | PERMISSION NEEDED |
REPORTING | /2.1/jobs/list | CAN_VIEW |
REPORTING | /2.1/jobs/get/{job_id} | CAN_VIEW |
REPORTING | /2.1/jobs/runs/list | CAN_VIEW |
REPORTING | /2.1/jobs/runs/get/{run_id} | CAN_VIEW |
REPORTING | /2.0/clusters/list | CAN_ATTACH_TO |
REPORTING | /2.0/pipelines/ | CAN_VIEW |
REPORTING | /2.0/pipelines/{pipeline_id} | CAN_VIEW |
REPORTING | /2.0/pipelines/{pipeline_id}/updates | CAN_VIEW |
REPORTING | /2.0/workspace/get-status | CAN_READ |
PROVISIONING | /2.1/jobs/list | CAN_VIEW |
PROVISIONING | /2.0/dbfs/delete | CAN_MANAGE |
PROVISIONING | /2.0/dbfs/mkdirs | CAN_MANAGE |
PROVISIONING | /2.0/dbfs/add-block | CAN_MANAGE |
PROVISIONING | /2.0/dbfs/get-status | CAN_READ |
PROVISIONING | /2.1/jobs/get/{job_id} | CAN_VIEW |
PROVISIONING | /2.1/jobs/reset | CAN_MANAGE |
PROVISIONING | /2.0/clusters/list | CAN_ATTACH_TO |
PROVISIONING | /2.0/clusters/get | CAN_ATTACH_TO |
PROVISIONING | /2.0/clusters/edit | CAN_MANAGE |
PROVISIONING | /2.0/global-init-scripts [GET] | ADMIN GROUP |
PROVISIONING | /2.0/global-init-scripts [POST] | ADMIN GROUP |
PROVISIONING | /2.0/global-init-scripts/{script_id} | ADMIN GROUP |
PROVISIONING | /2.0/pipelines/{pipeline_id} [GET] | CAN_VIEW |
PROVISIONING | /2.0/pipelines/{pipeline_id} [PUT] | CAN_MANAGE |
PROVISIONING | /2.0/secrets/scopes/list | MANAGE |
PROVISIONING | /2.0/secrets/list | READ |
PROVISIONING | /2.0/secrets/create | WRITE |
PROVISIONING | /2.0/secrets/delete | MANAGE |
PROVISIONING | /2.0/secrets/scopes/delete | MANAGE |
PROVISIONING | /2.0/secrets/scopes/create | MANAGE |
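For illustration, a call to one of the APIs listed above looks like the following (Python + requests). The workspace URL and token are placeholders; Databricks enforces the per-entity permissions from the tables, so the response only contains entities the caller can view.

```python
# Illustrative call to the Databricks jobs list API, authenticated with an
# Azure AD token for the Service Principal or the signed-in user.
import requests

WORKSPACE_URL = "https://<databricks-workspace-url>"  # placeholder
TOKEN = "<databricks-scoped-aad-token>"  # e.g. from the OBO sketch above

resp = requests.get(
    f"{WORKSPACE_URL}/api/2.1/jobs/list",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
resp.raise_for_status()
# Databricks authorizes per entity: only jobs the caller can view come back.
for job in resp.json().get("jobs", []):
    print(job["job_id"], job["settings"]["name"])
```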
What rights are used to monitor Databricks entities (workflows, clusters, pipelines)?
In order to monitor an entity (e.g. a workflow, cluster, or pipeline), LHO needs to update the configuration of that entity. Databricks allows only some users to modify the configuration of an entity (e.g. administrators, or the owners of clusters).
LHO uses the rights of the signed in user to enable monitoring of an entity.
If the signed-in user can modify the configuration of an entity in Databricks, then LHO will also be able to monitor that entity, by virtue of using the Databricks API.
LHO does not use the Service Principal rights to enable or disable monitoring of an entity.
What rights do I need to collect telemetry data?
If the signed-in user can modify the configuration of an entity in Databricks, then LHO will also be able to monitor that entity. Collecting telemetry data does not require any additional rights.
How do I configure the LHO agent (telemetry collector) to gather data?
No additional configuration is required. Configuration is done automatically by the LHO app.
LHO uses the rights of the signed in user to enable monitoring of an entity.
If the signed-in user can modify the configuration of an entity in Databricks, then LHO will also be able to monitor that entity. Collecting telemetry data does not require any additional rights.
Access to Blob Storage for persisting the telemetry data is done via the Service Principal.
How does the LHO Agent (telemetry collector) read credentials? (6)
The telemetry collector agent uses Databricks Secrets to securely access the cloud storage configuration. Databricks allows applications running inside the Databricks environment to access Databricks secrets.
LHO uses Databricks Secrets to provide the LHO Agent (collector) with the Service Principal Client Secret required to persist to blob storage.
A context in Databricks Secrets is called a Secret Scope. A security administrator needs to create a Secret Scope to store credentials for the LHO Agent.
Once created, the Secret Scope must be linked to the LHO app in the Settings page > Access Rights. LHO will then use the existing Secret Scope to securely save credentials for use in the Databricks environment.
Once the LHO App has a valid Secret Scope linked, it saves in that Secret Scope the credentials to be used by the LHO Agent.
The LHO App saves the following information in the Secret Scope:
- service principal: tenant id, client id, client secret
- storage account: container name, account name, queue name
The LHO Agent has access to the configuration of a Databricks entity (e.g. Workflow, Job, Cluster, DLT), and that configuration identifies the Secret Scope where the credentials are stored. Any entity has access to the Databricks Secrets service.
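Inside the Databricks runtime, reading those credentials reduces to dbutils.secrets lookups. A minimal sketch with illustrative scope and key names (the actual names used by LHO may differ):

```python
# Sketch of how code running inside Databricks (like the LHO Agent) would
# read the credentials saved above; scope and key names are illustrative.
SCOPE = "<lho-secret-scope>"

tenant_id = dbutils.secrets.get(scope=SCOPE, key="tenant-id")
client_id = dbutils.secrets.get(scope=SCOPE, key="client-id")
client_secret = dbutils.secrets.get(scope=SCOPE, key="client-secret")
account_name = dbutils.secrets.get(scope=SCOPE, key="account-name")

# dbutils is only available inside the Databricks runtime; secret values are
# redacted in notebook output, so they do not leak into logs in plain text.
```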
What permissions are required for the telemetry agent to read/write data from/to Azure Tables? (7)
The Service Principal of the LHO App that uses the Azure Blob Storage service must be configured manually by the administrator with the Storage Table Data Contributor role at the storage account level.
The telemetry collector agent uses Databricks Secrets to retrieve the client secret of the Service Principal that will be used to access cloud storage.
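As a hedged sketch of what this role enables, here is the Service Principal writing a row to an Azure Table using the azure-identity and azure-data-tables packages; the table name and entity fields are hypothetical.

```python
# Storage Table Data Contributor lets the SP write telemetry rows like this.
from azure.identity import ClientSecretCredential
from azure.data.tables import TableServiceClient

credential = ClientSecretCredential(
    tenant_id="<tenant-id>",
    client_id="<client-id>",
    client_secret="<client-secret>",  # retrieved from the Secret Scope
)

service = TableServiceClient(
    endpoint="https://<account-name>.table.core.windows.net",
    credential=credential,
)
table = service.create_table_if_not_exists("lhotelemetry")  # hypothetical name
table.upsert_entity({
    "PartitionKey": "cluster-<cluster-id>",
    "RowKey": "2024-01-01T00:00:00Z",
    "metric": "cpu_load",
    "value": 0.42,
})
```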
How does the telemetry agent communicate with the LHO App for realtime telemetry data analysis? (8)
The LHO Agent stores telemetry data in the cloud storage and sends events (e.g. job finished events) to an Azure Queue configured in the same Storage Account used for saving the telemetry data.
The LHO App dequeues events from this Queue and triggers the analysis when the Databricks job or DLT update is complete.
The LHO Service Principal requires the Storage Queue Data Contributor role at the queue level.
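A minimal sketch of both sides of this handshake, using the azure-storage-queue package; the queue name, event payload, and run id are illustrative. Storage Queue Data Contributor covers both the send and receive operations.

```python
# The agent enqueues a "job finished" event; the LHO App dequeues it and
# triggers analysis. Names are placeholders.
from azure.identity import ClientSecretCredential
from azure.storage.queue import QueueClient

credential = ClientSecretCredential("<tenant-id>", "<client-id>", "<client-secret>")
queue = QueueClient(
    account_url="https://<account-name>.queue.core.windows.net",
    queue_name="<queue-name>",
    credential=credential,
)

# Producer side (LHO Agent): announce that a run completed.
queue.send_message('{"event": "job_finished", "run_id": 12345}')

# Consumer side (LHO App): dequeue events and process them.
for msg in queue.receive_messages():
    print("analyzing run from event:", msg.content)
    queue.delete_message(msg)
```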
📍 Public Workspaces
How do I expose Subscriptions and Workspaces to users from other AD tenants?
See https://blueprintconsultingservices.atlassian.net/wiki/spaces/BLMPD/pages/2565537923
🪪 LHO Roles
What roles are there in the LHO app?
LHO currently supports the following roles that grant specific rights in the application:
- user
  - an Azure AD user can only access the Overview, Reports and Health Alerts features of the application
  - can access cost and telemetry data on workspaces based on configured access rights
- executive
  - all the rights of users, plus access to cost and telemetry data on all published workspaces with no access rights restriction
- admin
  - all the rights of users and executives, plus the ability to configure a Databricks Workspace to be used for analysis by users and executives
- billing admin
  - all the rights of users, plus the ability to manage consumption data loading and processing
How can I assign LHO App roles to signed in Azure AD users?
Any Azure AD user, native or external to the AD tenant, can be granted an application role:
(Step 1) Create env variable
To enable roles in the LHO app, navigate to the PCF portal and set the following environment variable for the LHO app:
ADMIN_APP_ROLE=bplm-admin
- bplm-admin can be any App Role name provided for the Service Principal's App Registration (see below).
- If the env variable ADMIN_APP_ROLE is not defined, then all authenticated AD users are granted the admin role with full access to the LHO application.
(Step 2) Create App Role
Open Azure Active Directory in the Azure Portal → App registrations → search for the client id (see more details at “How do I configure the Azure Active Directory Service Principal?”) and open the application → click on App roles → Create app role with the following settings:
- Display name: bplm-admin
- Allowed member types: Users/Groups
- Value: bplm-admin (must be identical to the value of the env variable ADMIN_APP_ROLE)
- Description: any meaningful description
(Step 3) Assign users to App role
From the above step, click on Overview in the opened application page.
If the page from (Step 2) was closed, open Azure Active Directory in the Azure Portal → App registrations → search for the client id (see more details at “How do I configure the Azure Active Directory Service Principal?”) and open the application.
Click on the link for Managed application in local directory (the link is the App name). This opens the Enterprise Application view.
Select the Users and groups tab and click Add user/group.
This view allows you to assign users and/or groups the admin role in the LHO app.
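For illustration, the effect of the assignment is that Azure AD adds the app role value to the roles claim of the tokens issued for the application. A minimal sketch (Python + PyJWT, signature verification deliberately skipped) of checking that claim against ADMIN_APP_ROLE:

```python
# Inspect the "roles" claim Azure AD puts into tokens for users assigned to
# the app role above. For illustration only: a real app must verify the
# token signature before trusting any claim.
import jwt  # PyJWT

ADMIN_APP_ROLE = "bplm-admin"

def is_lho_admin(id_token: str) -> bool:
    claims = jwt.decode(id_token, options={"verify_signature": False})
    return ADMIN_APP_ROLE in claims.get("roles", [])
```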