Step 1) Required Resources

Lakehouse Optimizer requires the following resources to already exist:

AWS Resource Requirements

Step 2) Configuration Prerequisites

  • Configure AWS Secrets Manager with the following secret key/value pairs. The suggested name for the secret is ‘bplm-credentials’:

    • storage-access-key - Optional, AWS Access Key used for accessing Amazon DynamoDB and Amazon SQS by the telemetry agent

    • storage-secret-key - Optional, AWS Secret Key used for accessing Amazon DynamoDB and Amazon SQS by the telemetry agent

      • Note: DynamoDB is the telemetry data store. Access from the LHO services or from telemetry agents in Databricks workspaces can be enabled either with the access key/secret key pair or via IAM Roles/Credentials and Instance Profiles, in which case the key pair above becomes optional.

    • service-account-username - Databricks service account username

    • service-account-password - Databricks service account password

      • Note: the Databricks service account is required for access to the Billable Usage Logs of the Databricks Accounts API. These logs feed all consumption reports in the tool (at the Databricks account, workspace, job, job run, task run, nested notebook, cluster, notebook, DLT pipeline, DLT update, and similar levels) and help prioritize optimization efforts.

      • It must be a Databricks Account user with a username and password, since username/password is the only authentication supported by the Databricks Accounts API.

    • mssql-password - SQL Login password for the SQL Database

    • application-encryption-secret - encryption key for storing PATs (Personal Access Tokens) and the Databricks Accounts credentials (billable usage logs) in the LHO SQL database

    • msft-provider-auth-secret - Optional, client secret value from the Azure app registration

      • Note: the storage-access-key and storage-secret-key are optional. They become required only if you choose NOT to use the IAM Role for accessing DynamoDB and SQS.

      • The Service Principal secret key is needed if you want LHO configured with Azure Active Directory for login and SSL. If you choose to use Databricks authentication only, it is not needed and the secret can be omitted.
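
As an illustration of the prerequisites above, the following is a minimal sketch of how the secret could be created with the AWS CLI. It assumes the IAM Role option is used (so the two optional storage keys are left out), and that the values contain no double quotes or backslashes. The helper name build_secret_json, the DRY_RUN switch, the region, and all placeholder values are hypothetical and are not part of the product.

```shell
#!/usr/bin/env bash
# Sketch: assemble the key/value pairs from Step 2 into one JSON document
# and store it in AWS Secrets Manager. All values below are placeholders.
set -euo pipefail

SECRET_NAME="bplm-credentials"   # suggested name from this guide
AWS_REGION="us-west-1"           # assumption: replace with your own region

# Emit a flat JSON object from alternating key/value arguments.
# Assumes values contain no double quotes or backslashes.
build_secret_json() {
  local sep="" out="{"
  while [ "$#" -ge 2 ]; do
    out="${out}${sep}\"$1\":\"$2\""
    sep=","
    shift 2
  done
  printf '%s}\n' "$out"
}

SECRET_JSON="$(build_secret_json \
  "service-account-username" "<databricks-account-user>" \
  "service-account-password" "<databricks-account-password>" \
  "mssql-password" "<sql-login-password>" \
  "application-encryption-secret" "<encryption-key>")"

# Dry run by default: print the payload instead of calling AWS.
if [ "${DRY_RUN:-1}" = "1" ]; then
  echo "$SECRET_JSON"
else
  aws secretsmanager create-secret \
    --name "$SECRET_NAME" \
    --region "$AWS_REGION" \
    --secret-string "$SECRET_JSON"
fi
```

Running the script with DRY_RUN=0 performs the actual call. If you use the access key pair or Azure Active Directory login, add the corresponding optional keys to the build_secret_json call.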

Step 3) Installation procedure

1. SSH into the BPLM VM configured at Step 1) Required Resources.

...

Code Block
  .env
  docker-compose.yml
  setup.sh
  start.sh
  
  1. Before you start setup, fill out the .env file with the required information. Open the file in your editor of choice and fill in the values.

    1. Please find a brief explanation of the .env values below

...

Code Block
LOG_LEVEL=info
LOG_LEVEL_APP=info
LOG_LEVEL_HTTP_HEADERS=error

APPSERVICE_URL=<eg:https://demo.aws-bplm.com>

SQL_DATABASE=master
SQL_SERVER_HOST=<eg:192.168.4.10 or endpoint DNS name>
SQL_USER=<eg:sql_admin>

STORAGE_AWS_REGION=<eg:us-west-1>
STORAGE_AWS_TABLE_PREFIX=bplm

AWS_SECRETS_MANAGER_ENABLED=true
AWS_SECRETS_MANAGER_REGION=<eg:us-west-1>
BPLM_SECRET_NAME=<name of the secrets manager secret>
SERVER_SSL_ENABLED=true
SERVER_SSL_KEY-STORE=/keystore/bplm.p12
SERVER_SSL_KEY-STORE-PASSWORD=
SERVER_SSL_KEY-STORE-TYPE=PKCS12
SERVER_SSL_KEY-ALIAS=bplm
SERVER_SSL_KEY-PASSWORD=

SERVICE_PRINCIPAL_CLIENTID=<eg: 925accb1-8506-4ec4-a90b-b1b0e6d8a5eb>
SERVICE_PRINCIPAL_TENANTID=<eg: 03786a4c-412b-4fac-a981-b4c5bcbc55b7>
SERVICE_PRINCIPAL_CLIENTSECRET=<secret value or ${secret key name from secrets manager}, eg: ${msft-provider-auth-secret}>

DATABRICKS_ACCOUNT_ID=<eg: 56293882-89e7-4ecd-a5f7-cb61e68a54f0>
DATABRICKS_SERVICE_PRINCIPAL=<eg: 48de6ad6-ff14-403d-b842-d4ce5da4662f>
ACTIVE-DIRECTORY_HOST=https://login.microsoftonline.com
ACTIVE-DIRECTORY_TOKEN-ENDPOINT=/oauth2/v2.0/token
ACTIVE-DIRECTORY_AUTHORIZE-ENDPOINT=/oauth2/v2.0/authorize
ACTIVE-DIRECTORY_JWK-ENDPOINT=/discovery/keys
ACTIVE-DIRECTORY_USER-INFO-URI=https://graph.microsoft.com/oidc/userinfo

CLOUD_PROVIDER=AWS
AUTHENTICATION_PROVIDER=databricks-account,active-directory
SPRING_PROFILES_ACTIVE=production-aws
SERVER_SERVLET_SESSION_PERSISTENT=true
SERVER_SERVLET_SESSION_STORE_DIR=/home/ubuntu/spring-session/session
ADMIN_APP_ROLE=bplm-admin
METRIC_PROCESSOR_ENABLED=true
STORAGE_THROUGH_IAM_CREDENTIALS=true
#metric.queueMonitoring.compactionTimeout=PT25M
APPLICATION_NOTIFICATION_JOBNOTIFICATIONQUEUENAME=<prefix for sqs names>

#CONSUMPTION_BILLABLE_USAGE_PATH=s3a://{{s3-bucket}}/dbx-costs/billable-usage/csv

#CROSS_ACCOUNT_ASSUME_IAM_ROLE_AGENT=
#CROSS_ACCOUNT_ASSUME_IAM_ROLE_S3_DBX_BILLING_APP=
#CROSS_ACCOUNT_ASSUME_IAM_ROLE_DYNAMO_SQS_APP=
#CROSS_ACCOUNT_ASSUME_IAM_ROLE_COST_EXPLORER_APP=
#CROSS_ACCOUNT_ASSUME_IAM_ROLE_TAG_WORKSPACE_RESOURCE_APP=

# Configuration example for cross account assume roles
#CROSS_ACCOUNT_ASSUME_IAM_ROLE_COST_EXPLORER_APP=arn:aws:iam::<aws-account>:role/bplm-dev-costexplorer-role,arn:aws:iam::153067919175:role/example-of-xaccount-permission-role-for-cost-explorer-and-tagz
#CROSS_ACCOUNT_ASSUME_IAM_ROLE_TAG_WORKSPACE_RESOURCE_APP=arn:aws:iam::<aws-account>:role/vt-bplm-test-multi-aws-acc-tags,arn:aws:iam::153067919175:role/example-of-xaccount-permission-role-for-cost-explorer-and-tagz
#CROSS_ACCOUNT_ASSUME_IAM_ROLE_DYNAMO_SQS_APP=arn:aws:iam::<aws-account>:role/bplm-dev-dynamosqs-role
#CROSS_ACCOUNT_ASSUME_IAM_ROLE_AGENT=arn:aws:iam::<aws-account>:role/bplm-dev-dynamosqs-collector-role
#CROSS_ACCOUNT_ASSUME_IAM_ROLE_S3_DBX_BILLING_APP=arn:aws:iam::153067919175:role/xaccount-s3-accesss-role

Note: due to the Docker version provided by CentOS, the SERVICE_PRINCIPAL_CLIENTSECRET cannot be pulled from the secrets manager and must be set to the secret value directly.
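
Because the template above ships with placeholder values such as <eg:us-west-1>, it can help to check the file before running setup. The following is a minimal sketch of such a check and is not part of the delivered scripts. The function name find_unfilled_keys is hypothetical, and the check assumes that any uncommented value still starting with "<" has not been filled in. Empty values (for example the key-store passwords) are not reported.

```shell
#!/usr/bin/env bash
# Sketch: list .env keys whose values are still template placeholders.
set -euo pipefail

# Print the key of every uncommented line whose value starts with "<"
# (for example "<eg:us-west-1>"). Returns 1 if any are found, 0 otherwise.
find_unfilled_keys() {
  local env_file="$1" found=0 key value
  # "|| [ -n "$key" ]" also handles a last line without a trailing newline.
  while IFS='=' read -r key value || [ -n "$key" ]; do
    case "$key" in ''|'#'*) continue ;; esac   # skip blank and commented lines
    case "$value" in
      '<'*) echo "$key"; found=1 ;;
    esac
  done < "$env_file"
  return "$found"
}

ENV_FILE="${1:-.env}"
if [ -f "$ENV_FILE" ]; then
  if ! find_unfilled_keys "$ENV_FILE"; then
    echo "The keys listed above still hold template placeholders." >&2
  fi
fi
```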

...

4. Run ./setup.sh, providing the domain you wish to create an SSL cert for, the version of the Lakehouse Optimizer, and an admin email that will be used to configure certbot’s notifications when creating an SSL certificate.

Info

If you do not currently have a registered DNS entry for the Lakehouse Optimizer, you can skip setting up SSL certs by not supplying the cert_domain or email_certbot arguments.

Code Block
eg: ./setup.sh --cert_domain "lakehouse-monitor.company.com" --version 2.35.0 --email_certbot notifications@company.com
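
The two invocation variants described above (with and without SSL certificate setup) can be sketched as follows. This is only an illustration of how the flags combine. The helper build_setup_args is hypothetical and the script only prints the resulting commands rather than running setup.sh.

```shell
#!/usr/bin/env bash
# Sketch: assemble the setup.sh arguments, adding the certificate flags
# only when both a domain and a certbot email are available.
set -euo pipefail

build_setup_args() {
  local version="$1" cert_domain="${2:-}" email_certbot="${3:-}"
  local args=(--version "$version")
  if [ -n "$cert_domain" ] && [ -n "$email_certbot" ]; then
    args+=(--cert_domain "$cert_domain" --email_certbot "$email_certbot")
  fi
  printf '%s\n' "${args[*]}"
}

# With SSL (values taken from the example above):
echo "./setup.sh $(build_setup_args 2.35.0 lakehouse-monitor.company.com notifications@company.com)"
# Without a registered DNS entry, the certificate flags are left out:
echo "./setup.sh $(build_setup_args 2.35.0)"
```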

...