Below is a checklist designed to get you ready to deploy the Lakehouse Optimizer (LHO) on AWS:
1. Decide which Databricks workspaces should be monitored.
If all your workspaces are in the same AWS account and you plan to deploy LHO into that same account, you will be deploying the Single AWS account scenario.
Otherwise, you are deploying the Cross-account AWS scenario.
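The scenario decision in step 1 can be sketched as a small helper. This is an illustration only: the function name, the scenario labels, and the idea of comparing 12-digit AWS account IDs are assumptions made for this example, not part of the LHO installer.

```python
def deployment_scenario(workspace_account_ids, lho_account_id):
    """Return 'single-account' or 'cross-account' for the planned deployment.

    workspace_account_ids: AWS account IDs hosting the monitored workspaces.
    lho_account_id: AWS account ID that LHO will be deployed into.
    Both names are hypothetical and exist only for this sketch.
    """
    accounts = set(workspace_account_ids)
    if not accounts:
        raise ValueError("at least one workspace account ID is required")
    # Single-account only when every workspace lives in the LHO account.
    if accounts == {lho_account_id}:
        return "single-account"
    return "cross-account"
```

Note that workspaces which all share one account still count as cross-account if LHO itself is deployed into a different account.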
2. Determine which AWS region the LHO infrastructure will be deployed into. Ideally, use the same region as your Databricks workspaces to avoid cross-region data transfer fees.
3. For your selected region, ensure there is an available VPC slot, because the installation creates a new VPC.
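One way to check for a free VPC slot is to count the existing VPCs in the region and compare against the quota. The sketch below assumes boto3 is installed and credentials are configured for `count_vpcs`. It also assumes the AWS default quota of 5 VPCs per region, which your account may have raised, so confirm the real value in Service Quotas. The function names are hypothetical.

```python
DEFAULT_VPCS_PER_REGION = 5  # assumed AWS default; your account's quota may differ


def count_vpcs(region):
    """Count existing VPCs in a region (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the quota check below works without it

    ec2 = boto3.client("ec2", region_name=region)
    paginator = ec2.get_paginator("describe_vpcs")
    return sum(len(page["Vpcs"]) for page in paginator.paginate())


def has_free_vpc_slot(existing_vpcs, quota=DEFAULT_VPCS_PER_REGION):
    """True when at least one more VPC can be created under the quota."""
    return existing_vpcs < quota
```

For example, `has_free_vpc_slot(count_vpcs("us-east-1"))` would report whether the installation can create its VPC in that region, given the assumed quota.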
4. Understand AWS infrastructure resource requirements:
5. The IAM user account running the installation needs the following policies attached:
6. If you have Microsoft Entra ID (Azure Active Directory) as your identity provider, create an app registration in your tenant:
7. Have the following values available:
App registration client ID
Azure AD tenant ID
The client secret value
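Before starting the deployment, it can help to sanity-check the three values above. The sketch below assumes the client ID and tenant ID are GUIDs, as Entra ID normally issues them. The function name and messages are hypothetical, and the check is lenient because `uuid.UUID` also accepts forms such as 32 hex digits without hyphens.

```python
import uuid


def validate_entra_values(client_id, tenant_id, client_secret):
    """Return a list of problems found; an empty list means the values look usable."""
    problems = []
    for label, value in (("client ID", client_id), ("tenant ID", tenant_id)):
        try:
            uuid.UUID(value)  # raises if the value cannot be parsed as a GUID
        except (ValueError, AttributeError, TypeError):
            problems.append(f"{label} is not a GUID")
    if not client_secret or not client_secret.strip():
        problems.append("client secret is empty")
    return problems
```

This only checks the shape of the values. It does not confirm that the app registration exists or that the secret is still valid.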
8. Create the Databricks service principal – https://blueprinttechnologies.atlassian.net/wiki/spaces/BLMPD/pages/2577662006/AWS+Resource+Requirements#Databricks-Service-Principal
Account Admin
Workspace Admin for all workspaces you plan to monitor
Create an OAuth client secret and store it for later use during deployment
9. Determine the identity authorization solution for the LHO agent running on Databricks workspace compute resources.
The options are outlined in AWS Resource Requirements (Blueprint Lakehouse Optimizer Documentation).
Depending on the option chosen:
Create an IAM user, saving the username, access key, and secret for later use.
Gather the ARNs of all instance profiles in use by the target workspaces for later use, or have the root account ARN available.
Expand all in-use instance profile IAM policies to include the permissions required for the LHO Agent.
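When gathering ARNs for this step, a quick format check catches copy-paste mistakes early. The sketch below relies on the standard IAM ARN layout (`arn:<partition>:iam::<12-digit account>:instance-profile/<name>` and `...:root`). The helper names are hypothetical, and the check does not verify that the resources exist.

```python
import re

# Partitions covered: commercial, GovCloud, and China regions.
_PREFIX = r"^arn:(?:aws|aws-us-gov|aws-cn):iam::\d{12}:"
_INSTANCE_PROFILE = re.compile(_PREFIX + r"instance-profile/.+$")
_ROOT = re.compile(_PREFIX + r"root$")


def classify_arn(arn):
    """Return 'instance-profile', 'root', or None if the ARN matches neither."""
    if _INSTANCE_PROFILE.match(arn):
        return "instance-profile"
    if _ROOT.match(arn):
        return "root"
    return None


def split_arns(arns):
    """Split collected ARNs into (recognized, unrecognized) lists."""
    recognized = [a for a in arns if classify_arn(a)]
    unrecognized = [a for a in arns if not classify_arn(a)]
    return recognized, unrecognized
```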
10. Decide on a DNS name.
As part of deployment, certificates are automatically created and a DNS entry is added if the hosted zone is available.
If the desired DNS name does not belong to a hosted zone in the AWS account LHO is deployed into, have someone available who can create the DNS entry in your DNS provider of choice.
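Whether the desired name falls under a hosted zone you already have can be checked with a simple suffix match. This sketch assumes you have collected the zone names yourself, for example from Route 53, which returns them with a trailing dot. The function name is hypothetical, and the installer's own detection logic may differ.

```python
def covering_zone(dns_name, hosted_zone_names):
    """Return the most specific hosted zone containing dns_name, or None."""
    name = dns_name.rstrip(".").lower()
    best = None
    for zone in hosted_zone_names:
        z = zone.rstrip(".").lower()
        # Match the zone apex itself or any name below it on a label boundary.
        if name == z or name.endswith("." + z):
            if best is None or len(z) > len(best):
                best = z
    return best
```

A result of `None` suggests the DNS entry will have to be created manually in another provider.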
11. Enable Cost Explorer in AWS: https://blueprinttechnologies.atlassian.net/wiki/spaces/BLMPD/pages/2577662006/AWS+Resource+Requirements#Tags-to-activate-in-Cost-Manager
Sometime after the deployment completes, activate the documented user-defined tags.
If you are ready to get started, you can follow the AWS deployment guides here:
Deployment on AWS