Azure Security Requirements for VM runtime

Phase 1) Azure Resources

After the deployment of the Lakehouse Optimizer is complete, the provided resource group will include the following resources:

VM: hosts the application services that provide the web interface and APIs for the reporting and instrumentation dashboard and the Databricks workspace configuration panel, as well as the background services for telemetry data analysis, recommendations, and scheduled consumption data runs

Storage Account: used for storing all telemetry data from the Databricks workspaces and consumption/cost/usage detail data

KeyVault: used for storing the storage account access key (if enabled for the deployment), the Azure AD App Registration client secret (for Azure AD Single Sign-On into the application and, optionally, for accessing the Azure Blob Storage account), and the SQL Server login password.

SQL Server database: used for storing the output of the analyzer and the consumption data processors; it supplies all the data required by the reports and dashboards

 

Signing in to the application URL is done with Azure AD credentials that are authorized to access the available subscriptions in the Azure tenant and the Databricks workspaces within those subscriptions.

Phase 2) Azure AD SSO user requirements

Each signed-in Azure AD user must have the following permissions in order to use the features of the application, which calls two types of APIs on behalf of the signed-in user:

  1. Azure Service Management APIs, used to list the available subscriptions and Databricks workspaces and, optionally, to grant the application identity the Billing Reader role.

    1. Listing subscriptions can be done via the Service Management API call (Azure Service Management functions can be turned off completely from configuration; contact Blueprint support on this matter) or by manually providing a publicSubscriptions.csv file.

    2. Listing workspaces in each available subscription can be done by calling the Service Management API or by manually providing a subscriptionMetadata.csv file in each subscription directory in the deployment's storage account. Calling the API requires the Microsoft.Databricks/workspaces/read permission, granted via a custom role at either the Azure subscription level or the level of the resource group(s) containing the Databricks workspaces this user should be able to access from the application (see the sketch after this list).

  2. Databricks APIs
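
For reference, the listing operations above map to Azure Resource Manager REST API calls. The snippet below is a minimal sketch of those calls, assuming Python with the azure-identity and requests packages and a signed-in identity that holds the permissions described above; the application performs the equivalent requests internally on behalf of the signed-in user.

```python
# Minimal sketch (illustrative only): list subscriptions and Databricks workspaces
# via the Azure Service Management (ARM) REST API on behalf of the signed-in user.
import requests
from azure.identity import DefaultAzureCredential

ARM = "https://management.azure.com"

# Acquire an ARM token for the signed-in identity.
token = DefaultAzureCredential().get_token(f"{ARM}/.default").token
headers = {"Authorization": f"Bearer {token}"}

# 1. List the subscriptions visible to this identity.
subscriptions = requests.get(
    f"{ARM}/subscriptions", params={"api-version": "2020-01-01"}, headers=headers
).json()["value"]

for sub in subscriptions:
    sub_id = sub["subscriptionId"]
    # 2. List the Databricks workspaces in each subscription. This requires the
    #    Microsoft.Databricks/workspaces/read permission at the subscription or
    #    resource group level, as described above.
    workspaces = requests.get(
        f"{ARM}/subscriptions/{sub_id}/providers/Microsoft.Databricks/workspaces",
        params={"api-version": "2018-04-01"},
        headers=headers,
    ).json().get("value", [])
    for ws in workspaces:
        print(sub["displayName"], ws["name"], ws["properties"]["workspaceUrl"])
```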

Phase 3) Access roles configuration

The signed-in user grants the application the necessary permissions to load consumption data on a schedule and to analyze telemetry data. The signed-in user must have at least the User Access Administrator role in the subscription.

The application will perform background tasks like loading consumption data and analyzing telemetry.

It does so using either the System Assigned Managed Identity of the VM or the Service Principal configured in the application for Azure AD SSO (the USE_SP_FOR_BACKGROUND_PROCESSORS = true configuration option); this identity is referred to as the “App Service Account” from here on.

The signed-in user grants the App Service Account the following roles and permissions:

  • Billing Reader role at the Azure subscription level, to allow the application to read consumption/usage detail data. This can be done by clicking the Grant Billing Reader role button on the settings page of the selected Azure subscription in the Lakehouse Optimizer user interface (see the first sketch after this list).

  • Create a Databricks service principal for the App Service Account in the selected Databricks workspace, to allow the application to access the Databricks jobs, clusters, DLT pipelines, etc. (via REST APIs) for the background processors.

  • If USE_SP_FOR_BACKGROUND_PROCESSORS = false, the System Assigned Managed Identity of the VM is the “App Service Account”, and the Databricks SP is created through the Grant Access button on the settings page of the selected Databricks workspace in the Lakehouse Optimizer user interface.

  • If USE_SP_FOR_BACKGROUND_PROCESSORS = true, the Service Principal configured for SSO is the “App Service Account”, and the Databricks SP is created through the Add Service Principal button on the settings page of the selected Databricks workspace in the Lakehouse Optimizer user interface.

  • The created Databricks SP is part of the admins group in the workspace; however, the exact list of permissions required for the SP in the workspace is documented here: Service Principal Permissions Required by Databricks API

  • For Premium tier workspaces, the service principal is created via the SCIM 2.0 API (see the second sketch after this list).

  • For Standard tier workspaces, the service principal is created by temporarily granting the Contributor role to the App Service Account. Once the corresponding service user is created in Databricks, the Contributor role is removed and only the Reader role is retained for the App Service Account.
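
The Grant Billing Reader role button described in the first bullet above is equivalent to creating a role assignment at the subscription scope. Below is the first sketch referenced in that bullet: a minimal, illustrative example assuming Python with requests, an ARM access token for the signed-in user, and placeholder subscription and principal IDs. The GUID used is the built-in Billing Reader role definition ID.

```python
# Minimal sketch (illustrative only) of the role assignment that the
# "Grant Billing Reader role" button creates. IDs and token are placeholders.
import uuid
import requests

ARM = "https://management.azure.com"
subscription_id = "<subscription-id>"
app_principal_object_id = "<object id of the App Service Account>"  # MI or SP
token = "<ARM access token of the signed-in user>"

# Built-in "Billing Reader" role definition at the subscription scope.
billing_reader = (
    f"/subscriptions/{subscription_id}/providers/Microsoft.Authorization/"
    "roleDefinitions/fa23ad8b-c56e-40d8-ac0c-ce449e1d2c64"
)

scope = f"/subscriptions/{subscription_id}"
assignment_name = str(uuid.uuid4())  # each role assignment needs a unique GUID name

resp = requests.put(
    f"{ARM}{scope}/providers/Microsoft.Authorization/roleAssignments/{assignment_name}",
    params={"api-version": "2022-04-01"},
    headers={"Authorization": f"Bearer {token}"},
    json={
        "properties": {
            "roleDefinitionId": billing_reader,
            "principalId": app_principal_object_id,
        }
    },
)
resp.raise_for_status()
```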
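
The second sketch, referenced in the Premium tier bullet above, shows the service principal creation through the Databricks SCIM 2.0 API. It is a minimal, illustrative example assuming Python with requests, a placeholder workspace URL, a token belonging to a workspace admin, and the Azure AD application (client) ID of the App Service Account.

```python
# Minimal sketch (illustrative only): create a Databricks service principal for the
# App Service Account via the SCIM 2.0 API (Premium tier workspaces).
import requests

workspace_url = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
admin_token = "<token of a workspace admin>"
app_client_id = "<Azure AD application (client) ID of the App Service Account>"

resp = requests.post(
    f"{workspace_url}/api/2.0/preview/scim/v2/ServicePrincipals",
    headers={
        "Authorization": f"Bearer {admin_token}",
        "Content-Type": "application/scim+json",
    },
    json={
        "schemas": ["urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal"],
        "applicationId": app_client_id,
        "displayName": "lakehouse-optimizer",
        "active": True,
    },
)
resp.raise_for_status()
print(resp.json()["id"])  # Databricks-internal id of the new service principal
```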

Phase 4) Secret Scope creation

The telemetry collector agent, or the service principal if one is used, uses Databricks Secrets to securely access the cloud storage configuration, including the access key or the service principal client secret (USE_SP_FOR_STORAGE_ACCOUNT = true).

Click the Create Secret Scope button on the settings page of the selected Databricks workspace in the Lakehouse Optimizer user interface.

Both Azure KeyVault-backed secret scopes and Databricks-backed secret scopes are supported (see the sketch after this list):

a. Databricks-backed secret scope: the signed-in Azure AD user must be part of the admins group in the workspace

b. Azure KeyVault-backed secret scope: the signed-in Azure AD user must have both the Key Vault Contributor role on the KeyVault used for the selected workspace and an access policy with the Set permission enabled in that KeyVault
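
Both scope types are created through the Databricks Secrets API, which is what the Create Secret Scope button calls. Below is the sketch referenced above: a minimal, illustrative example assuming Python with requests, placeholder workspace and KeyVault identifiers, and a token for the signed-in user.

```python
# Minimal sketch (illustrative only) of the Secrets API calls behind the
# "Create Secret Scope" button. URLs, names, and token are placeholders.
import requests

workspace_url = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
token = "<token of the signed-in user>"
headers = {"Authorization": f"Bearer {token}"}

# a. Databricks-backed secret scope (caller must be in the workspace admins group).
requests.post(
    f"{workspace_url}/api/2.0/secrets/scopes/create",
    headers=headers,
    json={"scope": "lho-databricks-backed", "initial_manage_principal": "users"},
).raise_for_status()

# b. Azure KeyVault-backed secret scope (caller needs Key Vault Contributor plus
#    a "Set" access policy on the vault, as described above).
requests.post(
    f"{workspace_url}/api/2.0/secrets/scopes/create",
    headers=headers,
    json={
        "scope": "lho-keyvault-backed",
        "scope_backend_type": "AZURE_KEYVAULT",
        "backend_azure_keyvault": {
            "resource_id": "/subscriptions/<sub>/resourceGroups/<rg>/providers/"
                           "Microsoft.KeyVault/vaults/<vault-name>",
            "dns_name": "https://<vault-name>.vault.azure.net/",
        },
    },
).raise_for_status()
```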