AWS Resource Requirements

Lakehouse Optimizer requires and uses the following resources.

Required AWS Resources

  • 1 Resource Group that contains the following:

    • Amazon EC2 Linux VM (Ubuntu or CentOS):

      • OS: Ubuntu Linux 20.04 or CentOS Linux 7.9

      • Recommended type: t3.2xlarge or similar, with a minimum of 8 cores

      • Docker Engine (version 23.0 or later) installed

      • 50 GB EBS volume

    • Amazon RDS for SQL Server:

      • Instance type: db.t2.xlarge or similar (4 cores / 16 GB RAM)

      • Daily automated backups

      • Web Edition, engine version 15.00 (SQL Server 2019)

      • RDS requires two subnets in different Availability Zones

        • The security group requires inbound TCP on port 1433

      • An application database (suggested name: ‘bplm’ or ‘lakehouse-monitor’)

    • AWS Secrets Manager

      • Used to store sensitive passwords

    • Amazon DynamoDB

      • On-demand capacity mode (can be switched to provisioned mode after 1-2 months of monitoring)

      • Standard table class

      • On-demand daily backups; TTL enabled on all tables with a 3-day maximum, since scheduled LHM analyzer runs move the aggregated data to SQL Server, after which the raw data is no longer required

      • One set of tables per LHM deployment, in a single AWS region (multi-region support is planned for future releases); a provisioning sketch follows this list

    • Amazon SQS:

      • Free tier: the first 1 million requests per month are free

    • Amazon Route 53:

      • Create a DNS entry for the VM’s public IP address/hostname

      • The LHM install script will install Certbot and generate a Let’s Encrypt SSL certificate for the VM; this requires a human-readable URL

      • Alternatively, provide an SSL certificate from a trusted Certificate Authority at deployment time
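For reference, here is a minimal sketch (Python, boto3) of provisioning one DynamoDB table with the characteristics above: on-demand capacity, Standard table class, and TTL enabled. The table name and TTL attribute name are illustrative assumptions; the actual tables are created by the LHM deployment scripts.

```python
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# Create the table in on-demand capacity mode with the Standard table class.
dynamodb.create_table(
    TableName="lhm-telemetry",  # hypothetical name for illustration only
    AttributeDefinitions=[{"AttributeName": "pk", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "pk", "KeyType": "HASH"}],
    BillingMode="PAY_PER_REQUEST",  # on-demand capacity mode
    TableClass="STANDARD",
)

# Wait for the table to become ACTIVE, then enable TTL. The 3-day expiry
# would be set per item by the writer via an epoch-seconds attribute
# (the attribute name "ttl" here is an assumption).
dynamodb.get_waiter("table_exists").wait(TableName="lhm-telemetry")
dynamodb.update_time_to_live(
    TableName="lhm-telemetry",
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "ttl"},
)
```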

AWS Service Limitations

When choosing a region to deploy the AWS services into, be mindful of which services are available in that region and of their service quotas. Find more information on service quotas in this AWS PDF: https://docs.aws.amazon.com/pdfs/general/latest/gr/aws-general.pdf#aws-service-information

If need be, before deploying the prerequisite AWS services, request a quota increase by following the steps outlined here: https://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html.
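A quota increase can also be requested programmatically via the Service Quotas API. This is a minimal sketch (Python, boto3); the quota code shown is an assumption (commonly the EC2 "Running On-Demand Standard instances" vCPU quota), so look up the code for the quota you actually need first.

```python
import boto3

quotas = boto3.client("service-quotas", region_name="us-east-1")

# List quota codes for a service (here: EC2) to find the one you need.
for q in quotas.list_service_quotas(ServiceCode="ec2")["Quotas"][:5]:
    print(q["QuotaCode"], q["QuotaName"])

# Request the increase. Verify the quota code against the listing above.
resp = quotas.request_service_quota_increase(
    ServiceCode="ec2",
    QuotaCode="L-1216C47A",  # assumed code for On-Demand Standard vCPUs
    DesiredValue=64.0,
)
print(resp["RequestedQuota"]["Status"])
```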

AWS EC2 VM

  • Deployment with Docker Containers

  • Public IP, security group:

    • Allow inbound TCP traffic on ports 443 and 80 for web traffic

    • Allow inbound TCP on port 22 for SSH configuration; this can be closed after setup (see the sketch after this list)

  • Cloud Provider: AWS

  • Required

    • OS: Ubuntu Linux 20.04 or CentOS Linux 7.9

    • Docker Engine (version 23.0 or later) installed
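As a sketch of the security group configuration above (Python, boto3; the security group ID is a placeholder):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
SG_ID = "sg-0123456789abcdef0"  # placeholder security group ID

# Open 80/443 for web traffic and 22 for SSH configuration.
ec2.authorize_security_group_ingress(
    GroupId=SG_ID,
    IpPermissions=[
        {"IpProtocol": "tcp", "FromPort": p, "ToPort": p,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}
        for p in (80, 443, 22)
    ],
)

# Once configuration is complete, close SSH again.
ec2.revoke_security_group_ingress(
    GroupId=SG_ID,
    IpPermissions=[{"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
                    "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}],
)
```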

Host VM Specs

  • Recommended: 8 CPU cores, 28+ GB memory, 30 GB disk

Databricks Service Principal

Lakehouse Optimizer leverages a Databricks service principal for API calls. This principal requires account admin privileges as well as workspace admin on any monitored workspaces. Lakehouse Optimizer uses OAuth authentication for service principals. Please follow steps 1 and 2 in the Databricks documentation linked below to create the service principal and an OAuth secret for it. Save this secret for deployment, when it will be stored in Secrets Manager for use by the Lakehouse Optimizer application. Make sure you also add the service principal as a member of the workspaces that you want monitored by Lakehouse Optimizer.

https://docs.databricks.com/en/dev-tools/authentication-oauth.html#language-Identity%C2%A0federation%C2%A0enabled
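Once the service principal and its OAuth secret exist, they can be verified with the standard OAuth machine-to-machine client-credentials flow. This is a minimal sketch (Python, requests); the workspace URL, client ID (the service principal's application ID), and secret are placeholders.

```python
import requests

WORKSPACE_URL = "https://dbc-example.cloud.databricks.com"  # placeholder
CLIENT_ID = "<service-principal-application-id>"            # placeholder
CLIENT_SECRET = "<oauth-secret>"                            # placeholder

# Exchange the client credentials for a short-lived access token.
resp = requests.post(
    f"{WORKSPACE_URL}/oidc/v1/token",
    auth=(CLIENT_ID, CLIENT_SECRET),
    data={"grant_type": "client_credentials", "scope": "all-apis"},
)
resp.raise_for_status()
access_token = resp.json()["access_token"]

# Use the token as a Bearer token against a workspace API to confirm access.
me = requests.get(
    f"{WORKSPACE_URL}/api/2.0/preview/scim/v2/Me",
    headers={"Authorization": f"Bearer {access_token}"},
)
print(me.json())
```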

AWS Permission Policies

See the permission policy reference for a regular deployment or for a cross-account AWS deployment.

Lakehouse Optimizer Agent Permissions

As part of fully configuring the Lakehouse Optimizer, an agent is deployed to your Databricks workflows and clusters. That agent requires write access to DynamoDB and Simple Queue Service (SQS), as outlined under the “LHO Agent Policy”. There are three options to enable this agent; the preferred way is to configure both options 1 and 2. These are the options that can be configured automatically via the deployment scripts:

  1. Create an IAM user and attach the LHO Agent Policy. This is the catch-all option, as it future-proofs monitoring for new compute resources created in any monitored workspace. Have an access key and secret for this user ready at deployment time, as you will be prompted to enter them. The access key and secret are stored in AWS Secrets Manager; they will also appear in plain text in the global init script, which is only accessible by workspace admins. This scenario also acts as a fallback if option 2 or 3 is not configured for all desired compute resources within monitored Databricks workspaces.

  2. Create a role for the agent in IAM with the LHO Agent Policy attached, configuring that role’s trust policy to trust the instance profiles used in your Databricks environment(s). (The instance profile role will “assume” the LHO Agent IAM role, i.e. AWS IAM role chaining; see the sketch after this list.) This is the AWS-preferred way of granting access. Only those compute resources with instance profiles will be analyzed.

  3. Extend the policies of all instance profiles used by compute resources to include the LHO Agent Policy. The drawback to this approach is that only compute resources using these configured instance profiles will be reported on, and it is currently a manual configuration.
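As a sketch of option 2 (Python, boto3): create the agent role with a trust policy that lets a Databricks instance profile role assume it. The role names, account ID, and policy ARN are placeholders; the LHO Agent Policy itself is the DynamoDB/SQS write policy referenced above.

```python
import json
import boto3

iam = boto3.client("iam")

# Trust policy: the Databricks instance profile role may assume this role
# (IAM role chaining). Principal ARN is a placeholder.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {
            "AWS": "arn:aws:iam::111122223333:role/databricks-instance-profile-role"
        },
        "Action": "sts:AssumeRole",
    }],
}

iam.create_role(
    RoleName="lho-agent-role",  # placeholder
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Attach the customer-managed LHO Agent Policy (placeholder ARN).
iam.attach_role_policy(
    RoleName="lho-agent-role",
    PolicyArn="arn:aws:iam::111122223333:policy/LHOAgentPolicy",
)
```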

Tags to activate in Cost Manager

To obtain consumption data for Databricks-managed resources from AWS, cost allocation tags must be properly configured in AWS:

  1. Enable Cost Explorer

  2. Activate user-defined cost allocation tags:

    • ClusterId, DatabricksInstancePoolId, SqlEndpointId - these tags are automatically added to VM resources by Databricks

    • BplmWorkspaceId - used for reporting workspace storage and NAT Gateway costs

    • BplmMetastoreId - used for reporting the Unity Catalog metastore storage cost

These operations require an AWS payer account and usually take a day for AWS to complete. A scripted way to activate the tags is sketched below.
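This is a minimal sketch (Python, boto3) using the Cost Explorer API, run against the payer account and assuming the tag keys have already appeared there:

```python
import boto3

ce = boto3.client("ce")  # Cost Explorer

# Activate the user-defined cost allocation tags listed above.
# Activation may take up to 24 hours to take effect.
ce.update_cost_allocation_tags_status(
    CostAllocationTagsStatus=[
        {"TagKey": key, "Status": "Active"}
        for key in ("ClusterId", "DatabricksInstancePoolId",
                    "SqlEndpointId", "BplmWorkspaceId", "BplmMetastoreId")
    ]
)
```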

 

AWS Networking Diagram

[Image: AWS networking diagram]

AWS Marketplace Considerations

Below are the mandatory billable services created during an AWS Marketplace deployment.

Networking components that will incur charges:

  • Virtual Private Cloud

  • Internet Gateway

  • A Route 53 DNS entry

EC2 - Compute resource used to host the application container.

RDS - Required to store application data.

Secrets Manager - Used to securely store secrets used by the application, such as the SQL password or the application registration client secret if using Entra ID single sign-on.

DynamoDB

Simple Queue Service