Azure Security Requirements for VM runtime
Phase 1) Azure Resources
After the deployment of the Lakehouse Optimizer is complete, the provided resource group will include the following resources:
VM: hosts the application services that provide the web interface and APIs for the reporting and instrumentation dashboards, the Databricks workspace configuration panel, the background services for telemetry data analysis and recommendations, and the scheduled consumption data runs
Storage Account: used for storing all telemetry data from the Databricks workspaces and consumption/cost/usage detail data
KeyVault: used for storing the storage account access key (if enabled for the deployment), the Azure AD App Registration client secret (used for Azure AD Single Sign-On into the application and, optionally, for accessing the Azure Blob Storage account), and the SQL Server login password.
SQL Server database: used for storing the output of the analyzer and consumption data processors; it feeds all the data required by the reports and dashboards
Signing in to the application URL is done with Azure AD credentials that are authorized for access to all available subscriptions in the Azure tenant and to all the Databricks workspaces in those subscriptions.
Phase 2) Azure AD SSO user requirements
Each signed-in Azure AD user must have the following permissions in order to use the features of the application, which calls two types of APIs on behalf of the signed-in user:
Azure Service Management APIs: used to list the available subscriptions and Databricks workspaces and, optionally, to grant the application identity the Billing Reader role.
Listing subscriptions can be done via the Service Management API call (Azure Service Management functions can be turned off completely from configuration; contact Blueprint support on this matter) or by manually providing a publicSubscriptions.csv file.
Listing workspaces in each available subscription can be done by calling the Service Management API or by manually providing a subscriptionMetadata.csv file in each subscription directory in the storage account of the deployment. Calling the API requires the Microsoft.Databricks/workspaces/read permission, granted via a custom role at either the Azure subscription level or the level of the resource group containing the Databricks workspaces this user should be able to access from the application (a sketch of these calls follows at the end of this phase).
Databricks APIs
A Databricks user must be created in the workspace for the signed-in Azure AD user; the application does not create it automatically. The application will call Databricks APIs on behalf of the signed-in user using their OAuth2 access token (Microsoft identity platform OAuth 2.0 On-Behalf-Of flow), and these APIs will authorize the signed-in user based on their level of permissions in the Databricks workspace (a token-exchange sketch follows at the end of this phase).
The list of APIs and permissions required: User Permissions Required by Databricks API
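For reference, the two Service Management operations above boil down to plain Azure Resource Manager requests. The following is a minimal sketch (Python, not the application's own code), assuming ARM_TOKEN holds an OAuth2 access token for https://management.azure.com/ obtained on behalf of the signed-in user; the per-subscription workspace call only succeeds where that user holds Microsoft.Databricks/workspaces/read.

import os
import requests

ARM = "https://management.azure.com"
headers = {"Authorization": f"Bearer {os.environ['ARM_TOKEN']}"}  # placeholder token

# List the subscriptions visible to the signed-in user
subs = requests.get(f"{ARM}/subscriptions",
                    params={"api-version": "2020-01-01"}, headers=headers)
subs.raise_for_status()

for sub in subs.json()["value"]:
    # List the Databricks workspaces in each subscription; requires
    # Microsoft.Databricks/workspaces/read at subscription or resource group scope
    ws = requests.get(
        f"{ARM}/subscriptions/{sub['subscriptionId']}/providers/Microsoft.Databricks/workspaces",
        params={"api-version": "2018-04-01"}, headers=headers)
    if ws.status_code == 403:
        continue  # the signed-in user cannot list workspaces in this subscription
    ws.raise_for_status()
    for w in ws.json()["value"]:
        print(sub["displayName"], w["name"], w["properties"].get("workspaceUrl"))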
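Similarly, here is a minimal sketch of the On-Behalf-Of token exchange described above, using the MSAL Python library. The client ID, client secret, and tenant ID refer to the Azure AD App Registration of the deployment; the scope is the well-known AzureDatabricks resource; the environment variable names, the user assertion, and the workspace URL are illustrative placeholders, not product settings.

import os
import requests
import msal

app = msal.ConfidentialClientApplication(
    client_id=os.environ["AAD_CLIENT_ID"],
    client_credential=os.environ["AAD_CLIENT_SECRET"],
    authority=f"https://login.microsoftonline.com/{os.environ['AAD_TENANT_ID']}",
)

# Exchange the token the user presented to the application for a Databricks-scoped token
result = app.acquire_token_on_behalf_of(
    user_assertion="<access token presented by the signed-in user>",  # placeholder
    scopes=["2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/user_impersonation"],  # AzureDatabricks resource
)

# Call a Databricks REST API as the signed-in user (placeholder workspace URL);
# the workspace authorizes the call based on that user's own permissions
resp = requests.get(
    "https://adb-1234567890123456.7.azuredatabricks.net/api/2.0/clusters/list",
    headers={"Authorization": f"Bearer {result['access_token']}"},
)
resp.raise_for_status()
print([c["cluster_name"] for c in resp.json().get("clusters", [])])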
Phase 3) Access roles configuration
The signed-in user will grant the application the necessary permissions to load consumption data on a schedule and to analyze telemetry data. The signed-in user must have at least the User Access Administrator role in the subscription.
The application will perform background tasks such as loading consumption data and analyzing telemetry. It does so using either the System Assigned Managed Identity of the VM or the Service Principal configured in the application for Azure AD SSO (the USE_SP_FOR_BACKGROUND_PROCESSORS = true configuration option); we will call this identity the "App Service Account" from now on.
The signed-in user grants the App Service Account the following roles:
Billing Reader role at the Azure subscription level, to allow the application to read consumption/usage detail data. This can be granted by clicking the Grant Billing Reader role button on the settings page for the selected Azure subscription in the Lakehouse Optimizer user interface (a sketch of the underlying role assignment follows this list).
Create a Databricks service principal for the App Service Account in the selected Databricks workspace, to allow the application to access Databricks jobs/clusters/DLTs etc. (REST APIs) for the background processors.
If USE_SP_FOR_BACKGROUND_PROCESSORS = false, the System Assigned Managed Identity of the VM is the "App Service Account" and the Databricks SP is created through the Grant Access button on the settings page for the selected Databricks workspace in the Lakehouse Optimizer user interface.
If USE_SP_FOR_BACKGROUND_PROCESSORS = true, the Service Principal configured for SSO is the "App Service Account" and the Databricks SP is created through the Add Service Principal button on the settings page for the selected Databricks workspace in the Lakehouse Optimizer user interface.
The created Databricks SP is part of the admins group in the workspace; the exact list of permissions required for the SP in the workspace is documented here: Service Principal Permissions Required by Databricks API
For Premium tier workspaces, the service principal is created via the SCIM 2.0 API (a sketch follows this list).
For Standard tier workspaces, the service principal is created by temporarily granting the Contributor role to the App Service Account. Once the corresponding service user is created in Databricks, the Contributor role is removed and only the Reader role is kept for the App Service Account.
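For reference, the Grant Billing Reader role button described above corresponds conceptually to a standard Azure role assignment. A minimal sketch, assuming ARM_TOKEN belongs to the signed-in user (who needs User Access Administrator or an equivalent role) and APP_PRINCIPAL_ID is the object ID of the App Service Account; both variable names are placeholders:

import os
import uuid
import requests

ARM = "https://management.azure.com"
headers = {"Authorization": f"Bearer {os.environ['ARM_TOKEN']}"}  # signed-in user's token

scope = f"/subscriptions/{os.environ['SUBSCRIPTION_ID']}"
billing_reader = "fa23ad8b-c56e-40d8-ac0c-ce449e1d2c64"  # built-in Billing Reader role ID

resp = requests.put(
    f"{ARM}{scope}/providers/Microsoft.Authorization/roleAssignments/{uuid.uuid4()}",
    params={"api-version": "2022-04-01"},
    headers=headers,
    json={
        "properties": {
            "roleDefinitionId": f"{scope}/providers/Microsoft.Authorization"
                                f"/roleDefinitions/{billing_reader}",
            "principalId": os.environ["APP_PRINCIPAL_ID"],  # App Service Account object ID
            "principalType": "ServicePrincipal",
        }
    },
)
resp.raise_for_status()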
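Likewise, the Premium tier service principal registration maps to the workspace-level SCIM 2.0 API. A minimal sketch, assuming the caller is already a workspace admin; the workspace URL, token, application ID, and display name are placeholders:

import os
import requests

WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
headers = {
    "Authorization": f"Bearer {os.environ['DATABRICKS_ADMIN_TOKEN']}",  # workspace admin token
    "Content-Type": "application/scim+json",
}

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.0/preview/scim/v2/ServicePrincipals",
    headers=headers,
    json={
        "schemas": ["urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal"],
        "applicationId": os.environ["APP_CLIENT_ID"],  # Azure AD app ID of the App Service Account
        "displayName": "lakehouse-optimizer",          # illustrative name
    },
)
resp.raise_for_status()
print(resp.json()["id"])  # workspace-internal ID of the new service principal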
Phase 4) Secret Scope creation
The telemetry collector agent, or the service principal (if used), uses Databricks Secrets to securely access the cloud storage configuration, including the access key or the service principal client secret (USE_SP_FOR_STORAGE_ACCOUNT = true).
Click the Create Secret Scope button on the settings page for the selected Databricks workspace in the Lakehouse Optimizer user interface.
Both Azure KeyVault backed secret scopes and Databricks backed secret scopes are supported:
a. Databricks backed secret scope: the signed-in Azure AD user must be part of the admins group in the workspace
b. Azure KeyVault backed secret scope: the signed-in Azure AD user should have both the Key Vault Contributor role on the KeyVault used for the selected workspace and an access policy with the Set permission enabled in that KeyVault
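For reference, the Create Secret Scope button corresponds to the Databricks Secrets API; the sketch below shows both variants. The scope names, workspace URL, and KeyVault identifiers are placeholders, and an Azure KeyVault backed scope must be created with an Azure AD token of the signed-in user rather than a personal access token.

import os
import requests

WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
headers = {"Authorization": f"Bearer {os.environ['DATABRICKS_AAD_TOKEN']}"}

# a. Databricks backed secret scope
resp = requests.post(
    f"{WORKSPACE_URL}/api/2.0/secrets/scopes/create",
    headers=headers,
    json={"scope": "lho-storage", "initial_manage_principal": "users"},
)
resp.raise_for_status()

# b. Azure KeyVault backed secret scope (requires an Azure AD token, not a PAT)
resp = requests.post(
    f"{WORKSPACE_URL}/api/2.0/secrets/scopes/create",
    headers=headers,
    json={
        "scope": "lho-storage-kv",
        "scope_backend_type": "AZURE_KEYVAULT",
        "backend_azure_keyvault": {
            "resource_id": "/subscriptions/<sub-id>/resourceGroups/<rg>"
                           "/providers/Microsoft.KeyVault/vaults/<vault-name>",
            "dns_name": "https://<vault-name>.vault.azure.net/",
        },
    },
)
resp.raise_for_status()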