...

Code Block
languagejson
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "iam:CreateServiceLinkedRole",
                "iam:DeleteServiceLinkedRole",
                "iam:CreateInstanceProfile",
                "iam:AddRoleToInstanceProfile",
                "iam:DeleteInstanceProfile",
                "iam:GetUser",
                "iam:AttachRolePolicy",
                "iam:GetInstanceProfile",
                "iam:PassRole",
                "iam:CreatePolicy",
                "iam:ListEntitiesForPolicy",
                "iam:AttachUserPolicy",
                "iam:CreatePolicyVersion",
                "iam:ListAttachedUserPolicies",
                "iam:ListPolicies",
                "iam:DetachUserPolicy",
                "iam:ListUsers",
                "iam:ListGroups",
                "iam:CreateRole",
                "iam:GetPolicy",
                "iam:GetPolicyVersion",
                "iam:RemoveRoleFromInstanceProfile",
                "iam:DeleteRole",
                "iam:DeletePolicy",
                "iam:ListPolicyVersions",
                "iam:ListInstanceProfilesForRole",
                "iam:ListRolePolicies",
                "iam:ListAttachedRolePolicies",
                "iam:GetRole",
                "iam:ListRoles",
                "iam:DetachRolePolicy",
                "organizations:DescribeOrganization",
                "account:ListRegions"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2iam:*CreateServiceLinkedRole",
                "dynamodbiam:*DeleteServiceLinkedRole",
            ],
   "route53:*",                 "rds:*","Resource": "arn:aws:iam::*:role/lhoiam/*"
        },
       "s3:*", {
              "Effect": "cloudshell:*Allow",
                "resource-groups:*","Action": [
                "secretsmanageriam:TagResourcePassRole",
                "secretsmanager:CreateSecret"],
                "secretsmanager:DescribeSecret","Resource": "arn:aws:iam::*:role/lhoiam/*"
          },
     "secretsmanager:GetResourcePolicy",    {
            "Effect"secretsmanager:GetSecretValue "Allow",
                "kms:CreateKey","Action": [
                "kmsec2:DescribeKey*",
                "kmsdynamodb:GetKeyPolicy*",
                "kmsroute53:ScheduleKeyDeletion*",
                "kmsrds:GetKeyRotationStatus*",
                "secretsmanagers3:DeleteSecret*",
                "secretsmanagercloudshell:PutSecretValue*",
                "kmsresource-groups:ListResourceTags*",
              ],  "secretsmanager:TagResource",
              "Resource": "*"  "secretsmanager:CreateSecret",
                "secretsmanager:DescribeSecret",
                "secretsmanager:GetResourcePolicy",
        }      ]
}


Azure Active Directory Single Sign-On Prerequisites

LHO currently supports Databricks Accounts and Azure Active Directory as identity providers for signing in to the application. A signed-in user must import a Databricks API token for each monitored Databricks workspace, and all Databricks API calls LHO makes on behalf of that user will use the imported token.

If you choose Azure AD SSO for identity management, create an app registration in the target Azure tenant before completing the installation guide:

https://learn.microsoft.com/en-us/azure/active-directory/develop/quickstart-register-app

Make note of the Azure tenant ID and the Application (client) ID.

Create an application client secret, saving the generated value for later use in the deployment process. Make note of the expiration date of the secret. Once deployed, this value will be stored in AWS Secrets Manager as 'msft-provider-auth-secret'. Rotate that secret value by generating a new client secret as necessary.
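As a minimal sketch of that rotation with the AWS CLI, assuming the secret name created by the deployment above (the secret string shown is a placeholder):

Code Block
languagebash
# Store a newly generated Azure client secret in the existing secret
aws secretsmanager put-secret-value \
  --secret-id msft-provider-auth-secret \
  --secret-string '{new client secret value}'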

...

Under Authentication, click “Add a platform” to add a Web platform configuration.

...

Under ‘Authentication’, enable the authorization endpoint to issue ‘ID tokens’.

...

If you’ve decided on a DNS name for the app, you can also update the Web Redirect URI at this time to include:

Code Block
https://{dns record name}/login/oauth2/code/azure

You can also update this value after deployment. It must be configured correctly to enable Azure AD SSO.
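For example, using the illustrative DNS name that appears in the deployment arguments below (lho-app-dev.yourdomain.com), the redirect URI would be:

Code Block
https://lho-app-dev.yourdomain.com/login/oauth2/code/azure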

...


Databricks metastore admin permission

Besides the AWS permissions listed above, the deploying user needs to be a Metastore Admin of the Databricks Unity Catalog. We recommend creating a group, configuring it as the Metastore Admin, and adding the admins to that group.
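The metastore admin can be changed in the Databricks account console; as a minimal sketch, the owner can also be set through the Unity Catalog REST API. The group name metastore-admins, the workspace URL, and the metastore ID below are placeholder assumptions, not values from this guide:

Code Block
languagebash
# Sketch: make a group the Unity Catalog metastore admin (owner) via the REST API
curl -X PATCH "https://{workspace-url}/api/2.1/unity-catalog/metastores/{metastore-id}" \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -d '{"owner": "metastore-admins"}'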

Supported AWS Regions

The Lakehouse Optimizer supports any region where all the required AWS services can be deployed. You can check service availability by region on the AWS page that lists AWS services available by region.
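If you prefer to script the check, AWS also publishes per-region service availability as public Systems Manager parameters; a minimal sketch (the region here is only an example):

Code Block
languagebash
# List AWS services available in us-east-1 via public SSM parameters
aws ssm get-parameters-by-path \
  --path /aws/service/global-infrastructure/regions/us-east-1/services \
  --query 'Parameters[].Name' --output text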

...

Databricks Service Principal

Ensure the Databricks service principal was created according to the prerequisites linked below:

https://blueprinttechnologies.atlassian.net/wiki/spaces/BLMPD/pages/2577662006/AWS+Resource+Requirements#Databricks-Service-Principal

...

LHO Agent Access

Decide how the LHO agent will be granted authorization in your Databricks environment. We suggest creating an IAM user and configuring a role trust, as outlined in the documentation linked below:

https://blueprinttechnologies.atlassian.net/wiki/spaces/BLMPD/pages/2577662006/AWS+Resource+Requirements#AWS-Permission-Policies

...

Single Sign-On Configuration

Lakehouse Optimizer supports a handful of identity providers. Please follow the link to the configuration page for your SSO provider of choice. If you use Databricks accounts for authentication, you can skip this section.

Microsoft Entra / Azure Active Directory

Okta

Google

...

I. Installation Guide

Step 1. Log in to AWS and open AWS CloudShell

...

Step 2. Prepare Lakehouse Optimizer deployment archive.

Code Block
languagebash
wget https://bplmdemoappstg.blob.core.windows.net/deployment/vm-aws/lho_aws_r9.zip
unzip lho_aws_r9.zip -d lho
cd lho
chmod +x deploy-lm.sh
# strip Windows-style carriage returns so the scripts run cleanly in CloudShell
sed -i.bak 's/\r$//' *.sh

Step 3. Build the argument list and run deployment script.

There are multiple options for deploying LHO for AWS. Please review the options below and build an argument list that best configures LHO for your environment’s needs.

The base arguments for all deployments are below:

Code Block
languagebash
bash ./deploy-lm.sh --email_certbot "{user email}" \
      --aws_region {aws region} \
      --databricks_account_id {databricks account ID} \
      --databricks_principal_guid {Databricks service principal client id} \
      --dns_record_name "{DNS record name}" \
      --name_prefix "{NAME}" \
      --acr_username "{Container_registry_username}"

email_certbot - An admin email that Let’s Encrypt notifies about certificate expiry, as a backup to the automatically renewing certificates LHO configures.

aws_region - The AWS region you wish to deploy LHO into. This should be the same region as, or as close as possible to, your Databricks workspaces.

databricks_account_id - Your Databricks account id. https://docs.databricks.com/en/administration-guide/account-settings/index.html#locate-your-account-id

databricks_principal_guid - The client ID of the Databricks service principal created as part of the prerequisites. https://blueprinttechnologies.atlassian.net/wiki/spaces/BLMPD/pages/2577662006/AWS+Resource+Requirements#Databricks-Service-Principal

dns_record_name - Friendly, descriptive DNS name for the application, e.g. lho-app-dev.yourdomain.com. An 'A' record is created in an AWS hosted zone in the account this deployment runs in; ensure that hosted zone exists in the account.

name_prefix - AWS resource name prefix. Note that this is used to name the S3 bucket. The bucket name must be globally unique (across all of AWS), so we recommend using a specific name instead of a generic one, e.g. lho-<your company name here> instead of lho.

acr_username - Container registry username to authenticate and pull down the LHO app container. Contact Blueprint support if you do not have this information.
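For illustration, a filled-in base invocation might look like the following; every value below is a made-up placeholder, not a working account:

Code Block
languagebash
# Example values only; substitute your own
bash ./deploy-lm.sh --email_certbot "admin@yourdomain.com" \
      --aws_region us-west-2 \
      --databricks_account_id 00000000-0000-0000-0000-000000000000 \
      --databricks_principal_guid 11111111-1111-1111-1111-111111111111 \
      --dns_record_name "lho-app-dev.yourdomain.com" \
      --name_prefix "lho-acme" \
      --acr_username "{Container_registry_username}"

A quick way to confirm the hosted zone for the DNS name exists in this account is aws route53 list-hosted-zones.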

Single Sign-On Options

Azure Entra Id ( Active Directory )

Two additional arguments must be passed in: ‘tenant_id’ and ‘service_principal’. You will also be prompted for the client secret during script execution.

Code Block
languagebash
bash ./deploy-lm.sh --email_certbot "{user email}" \
      --aws_region {aws region} \
      --databricks_account_id {databricks account ID} \
      --databricks_principal_guid {Databricks service principal client id} \
      --dns_record_name "{DNS record name}" \
      --name_prefix "{NAME}" \
      --acr_username "{Container_registry_username}" \
      --service_principal "{App registration Client (App) ID}" \
      --tenant_id "{azure_ad_tenant_id}"

service_principal - Your Azure AD app registration client (App) ID. Steps to create this are above under the section ‘Azure Active Directory Single Sign-On Prerequisites’.

tenant_id - Your Azure AD tenant id. https://learn.microsoft.com/en-us/azure/azure-portal/get-subscription-tenant-id#find-your-microsoft-entra-tenant

Okta

Okta SSO requires that the Okta base URL and client ID be added as arguments. You will also be prompted for the client secret during script execution.

Code Block
languagebash
bash ./deploy-lm.sh --email_certbot "{user email}" \
      --aws_region {aws region} \
      --databricks_account_id {databricks account ID} \
      --databricks_principal_guid {Databricks service principal client id} \
      --dns_record_name "{DNS record name}" \
      --name_prefix "{NAME}" \
      --acr_username "{Container_registry_username}" \
      --okta_clientid "{Client ID of created App integration}" \
      --okta_baseurl "{base URL of okta tenant}"

Google

Google SSO requires the Google app client ID to be passed in as an argument. You will also be prompted for the client secret during script execution.

Code Block
languagebash
bash ./deploy-lm.sh --email_certbot "{user email}" \
      --aws_region {aws region} \
      --databricks_account_id {databricks account ID} \
      --databricks_principal_guid {Databricks service principal client id} \
      --dns_record_name "{DNS record name}" \
      --name_prefix "{NAME}" \
      --acr_username "{Container_registry_username}" \
      --google_client_id "{Client ID of created App}"

...

LHO for smaller environments

...

Configuring option 2, using role trusts

The below code snippet creates a role trust file that is a newline-separated list of all of the instance profile role ARNs or AWS accounts that are in use in your Databricks environment. You may also simply add the root ARN to have the LHO agent role trust all roles in the given account. This file location is passed in as an argument to the deployment script.

Copy the snippet to a text editor and change the ARN values defined in the role_array variable declaration to suit your needs. After updating, run it in AWS CloudShell to create the role trust file.

Code Block
# declare role array
role_array=(
"arn:aws:iam::{aws account id}:role/rolename"
"arn:aws:iam::{aws account id}:role/anotherrolename"
)
# create role file
printf "%s\n" "${role_array[@]}" > ~/lho/agent_trusted_roles.txt

OR

Code Block
# trust every role in the account by adding the root ARN instead
role_array=(
"arn:aws:iam::{aws account id}:root"
)

# create role file
printf "%s\n" "${role_array[@]}" > ~/lho/agent_trusted_roles.txt

...

There are multiple argument options available to configure the LHO application to suit your needs. Beyond the required base arguments, you can mix and match the additional options discussed above. For example, you can run an Azure AD SSO version of LHO that enables both IAM user and role trust authentication solutions. Run ./deploy-lm.sh --help to see all available options.

Step 4. Script input fields:

...