Step 1) Required Resources
Lakehouse Optimizer requires the following resources to already be created:
Step 2) Configuration Prerequisites
AWS Secrets Manager must be configured with the following secret key/value pairs. The suggested name for the secret is ‘bplm-credentials’:
storage-access-key
- Optional. AWS Access Key used by the telemetry agent for accessing Amazon DynamoDB and Amazon SQS
storage-secret-key
- Optional. AWS Secret Key used by the telemetry agent for accessing Amazon DynamoDB and Amazon SQS
Note: DynamoDB is the telemetry data store. Access from the LHO services or from telemetry agents in Databricks workspaces can be enabled either with the access key/secret key pair or via IAM Roles/Credentials and Instance Profiles, in which case the key pair above becomes optional.
service-account-username
- Databricks service account username
service-account-password
- Databricks service account password
Note: The Databricks service account is required for access to the Billable Usage Logs of the Databricks Accounts API. These logs feed all the consumption reports in the tool (at the Databricks account, workspace, job, job run, task run, nested notebook, cluster, notebook, DLT pipeline, DLT update, etc. level) and help prioritize optimization efforts.
The service account is a Databricks Account user with username and password, since the only authentication supported by the Databricks Accounts API is username/password.
mssql-password
- SQL Login password for the SQL Database
application-encryption-secret
- Encryption key for storing PATs (Personal Access Tokens) and the Databricks Accounts credentials (billable usage logs) in the LHO SQL database
msft-provider-auth-secret
- Optional. Client secret value from the Azure app registration
Note: The Service Principal secret key is needed only if you want LHO configured with Azure Active Directory for login and SSL. If you choose to use Databricks authentication only, this secret can be omitted.
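The key/value pairs above can be assembled into a single secret. The following is a minimal sketch, assuming the AWS CLI is installed and already has credentials configured; the helper names `build_secret_json` and `create_bplm_secret`, the placeholder values, and the region are illustrative assumptions, not part of the product.

```shell
# Sketch only: assemble the key/value pairs into a JSON document for one secret.
# Assumption: values contain no double quotes or backslashes (no escaping is done).
build_secret_json() {
  first=1
  printf '{'
  while [ "$#" -ge 2 ]; do
    # Separate entries with a comma, except before the first one.
    [ "$first" -eq 1 ] || printf ','
    printf '"%s":"%s"' "$1" "$2"
    first=0
    shift 2
  done
  printf '}\n'
}

# Create the secret with the AWS CLI (requires configured AWS credentials).
create_bplm_secret() {
  aws secretsmanager create-secret \
    --name "$1" \
    --region "$2" \
    --secret-string "$3"
}

# Example (not executed here; all values are placeholders):
# payload=$(build_secret_json \
#   service-account-username 'user@example.com' \
#   service-account-password 'changeme' \
#   mssql-password 'changeme' \
#   application-encryption-secret 'changeme')
# create_bplm_secret bplm-credentials us-west-1 "$payload"
```

The optional keys (storage-access-key, storage-secret-key, msft-provider-auth-secret) can be appended to the same call when they apply.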
Step 3) Installation procedure
1. SSH into the BPLM VM configured at Step 1) Required Resources.
2. Download the install archive by running the following command:
Ubuntu:
wget https://bplmdemoappstg.blob.core.windows.net/deployment/vm-aws/ubuntu.zip
CentOS:
wget https://bplmdemoappstg.blob.core.windows.net/deployment/vm-aws/centos.zip
3. Extract the archive contents:
Ubuntu:
unzip ubuntu.zip
CentOS:
unzip centos.zip
In the destination directory you should see the following files:
Code Block |
---|
.env
docker-compose.yml
setup.sh
start.sh
|
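The download and extract steps can be combined into one helper. This is a rough sketch, assuming `wget` and `unzip` are installed and that the distribution ID matches the `ID` field of /etc/os-release; the function names `archive_for_distro` and `fetch_and_extract` are hypothetical.

```shell
# Sketch: choose the archive name from the distribution ID, then download and extract.
BASE_URL="https://bplmdemoappstg.blob.core.windows.net/deployment/vm-aws"

# Map a distribution ID (the ID field of /etc/os-release) to an archive name.
archive_for_distro() {
  case "$1" in
    ubuntu) echo "ubuntu.zip" ;;
    centos) echo "centos.zip" ;;
    *) echo "unsupported distribution: $1" >&2; return 1 ;;
  esac
}

# Download the matching archive and extract it into the current directory.
fetch_and_extract() {
  archive=$(archive_for_distro "$1") || return 1
  wget "$BASE_URL/$archive" && unzip "$archive"
}

# Example (not executed here):
# . /etc/os-release && fetch_and_extract "$ID"
```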
Before you start setup, you need to fill out the .env file with the required information. Open the file in your editor of choice and fill in the values.
Please find a brief explanation of the .env values below.
Info |
---|
Lakehouse Optimizer currently uses Databricks authentication, with an optional setup of Azure AD (AAD) as the identity provider. If you will not be using AAD, you do not need to fill out … You can also … In this case you must remove “… |
Code Block |
---|
LOG_LEVEL=info
LOG_LEVEL_APP=info
LOG_LEVEL_HTTP_HEADERS=error
APPSERVICE_URL=<eg:https://demo.aws-bplm.com>
SQL_DATABASE=master
SQL_SERVER_HOST=<eg:192.168.4.10 or endpoint DNS name>
SQL_USER=<eg:sql_admin>
STORAGE_AWS_REGION=<eg:us-west-1>
STORAGE_AWS_TABLE_PREFIX=bplm
AWS_SECRETS_MANAGER_ENABLED=true
AWS_SECRETS_MANAGER_REGION=<eg:us-west-1>
BPLM_SECRET_NAME=<name of the secrets manager secret>
SERVER_SSL_ENABLED=true
SERVER_SSL_KEY-STORE=/keystore/bplm.p12
SERVER_SSL_KEY-STORE-PASSWORD=
SERVER_SSL_KEY-STORE-TYPE=PKCS12
SERVER_SSL_KEY-ALIAS=bplm
SERVER_SSL_KEY-PASSWORD=
SERVICE_PRINCIPAL_CLIENTID=<eg: 925accb1-8506-4ec4-a90b-b1b0e6d8a5eb>
SERVICE_PRINCIPAL_TENANTID=<eg: 03786a4c-412b-4fac-a981-b4c5bcbc55b7>
#SERVICE_PRINCIPAL_CLIENTSECRET=<secret value>
SERVICE_PRINCIPAL_CLIENTSECRET=${msft-provider-auth-secret}
DATABRICKS_ACCOUNT_ID=<eg: 56293882-89e7-4ecd-a5f7-cb61e68a54f0>
DATARICKS_SERVICE_PRINCIPAL=<eg: 48de6ad6-ff14-403d-b842-d4ce5da4662f>
ACTIVE-DIRECTORY_HOST=https://login.microsoftonline.com
ACTIVE-DIRECTORY_TOKEN-ENDPOINT=/oauth2/v2.0/token
ACTIVE-DIRECTORY_AUTHORIZE-ENDPOINT=/oauth2/v2.0/authorize
ACTIVE-DIRECTORY_JWK-ENDPOINT=/discovery/keys
ACTIVE-DIRECTORY_USER-INFO-URI=https://graph.microsoft.com/oidc/userinfo
CLOUD_PROVIDER=AWS
AUTHENTICATION_PROVIDER=databricks-account,active-directory
SPRING_PROFILES_ACTIVE=production-aws
SERVER_SERVLET_SESSION_PERSISTENT=true
SERVER_SERVLET_SESSION_STORE_DIR=<eg: /home/localuser/dockerless-env/spring-session/session>
ADMIN_APP_ROLE=bplm-admin
METRIC_PROCESSOR_ENABLED=true
STORAGE_THROUGH_IAM_CREDENTIALS=true
#metric.queueMonitoring.compactionTimeout=PT25M
APPLICATION_NOTIFICATION_JOBNOTIFICATIONQUEUENAME=<prefix for sqs names>
#CONSUMPTION_BILLABLE_USAGE_PATH=s3a://{{s3-bucket}}/dbx-costs/billable-usage/csv
#CROSS_ACCOUNT_ASSUME_IAM_ROLE_AGENT=
#CROSS_ACCOUNT_ASSUME_IAM_ROLE_S3_DBX_BILLING_APP=
#CROSS_ACCOUNT_ASSUME_IAM_ROLE_DYNAMO_SQS_APP=
#CROSS_ACCOUNT_ASSUME_IAM_ROLE_COST_EXPLORER_APP=
#CROSS_ACCOUNT_ASSUME_IAM_ROLE_TAG_WORKSPACE_RESOURCE_APP=
# Configuration example for cross account assume roles
#CROSS_ACCOUNT_ASSUME_IAM_ROLE_COST_EXPLORER_APP=arn:aws:iam::<aws-account>:role/bplm-dev-costexplorer-role,arn:aws:iam::153067919175:role/example-of-xaccount-permission-role-for-cost-explorer-and-tagz
#CROSS_ACCOUNT_ASSUME_IAM_ROLE_TAG_WORKSPACE_RESOURCE_APP=arn:aws:iam::<aws-account>:role/vt-bplm-test-multi-aws-acc-tags,arn:aws:iam::153067919175:role/example-of-xaccount-permission-role-for-cost-explorer-and-tagz
#CROSS_ACCOUNT_ASSUME_IAM_ROLE_DYNAMO_SQS_APP=arn:aws:iam::<aws-account>:role/bplm-dev-dynamosqs-role
#CROSS_ACCOUNT_ASSUME_IAM_ROLE_AGENT=arn:aws:iam::<aws-account>:role/bplm-dev-dynamosqs-collector-role
#CROSS_ACCOUNT_ASSUME_IAM_ROLE_S3_DBX_BILLING_APP=arn:aws:iam::153067919175:role/xaccount-s3-accesss-role |
Note: Due to the Docker version provided by CentOS, SERVICE_PRINCIPAL_CLIENTSECRET cannot be pulled from Secrets Manager.
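Before running setup, it can help to confirm that no template placeholders remain in the .env file. The sketch below assumes placeholders always take the `<...>` form shown in the template; `check_env_file` is a hypothetical helper, and its output is advisory only (for example, AAD-related entries may legitimately stay unfilled when AAD is not used).

```shell
# Report active (uncommented) lines that still hold a <...> template placeholder.
# Returns non-zero if any such line is found. Empty values are not flagged,
# because the template itself ships some keys with empty values.
check_env_file() {
  env_file="$1"
  # Drop comment lines and blank lines, then look for KEY=<...> entries.
  unfilled=$(grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$env_file" \
    | grep -E '=<[^>]*>' || true)
  if [ -n "$unfilled" ]; then
    echo "Unfilled entries in $env_file:" >&2
    echo "$unfilled" >&2
    return 1
  fi
  return 0
}

# Example (not executed here):
# check_env_file .env && echo ".env has no remaining placeholders"
```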
4. Run ./setup.sh, providing the domain you wish to create an SSL cert for, the version of the Lakehouse Optimizer, and an admin email that will be used to configure certbot’s notifications when creating an SSL certificate.
Info |
---|
If you do not currently have a registered DNS entry for the Lakehouse Optimizer, you can skip setting up SSL certs by not supplying the |
Code Block | ||
---|---|---|
| ||
eg: ./setup.sh --cert_domain "lakehouse-monitor.company.com" --version 2.5.20 --email_certbot notifications@company.com |
...
5. After the setup script completes, run start.sh to pull down the application container and start it.
To run the start script, you need to provide the Blueprint Docker Registry username and password that Docker will use to pull the BPLM images from the Blueprint container registry:
bplm-acr-token / <password to be provided upon deployment>
where ACRUser is the Blueprint Docker Registry user
where ACRPass is the Blueprint Docker Registry password
Code Block | ||
---|---|---|
| ||
eg: ./start.sh example-acr-user providedPassword |
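To avoid typing the registry password inline, where it would land in shell history, the call can be wrapped. This is only a sketch: `run_start` and the `ACR_PASS` environment variable are hypothetical names, and the password is still passed to start.sh as its second argument, as in the example above.

```shell
# Sketch: call start.sh with the registry password taken from an environment
# variable (ACR_PASS is a hypothetical name) instead of typing it inline.
run_start() {
  acr_user="$1"
  if [ -z "${ACR_PASS:-}" ]; then
    echo "ACR_PASS is not set" >&2
    return 1
  fi
  # start.sh is expected in the current directory, as extracted from the archive.
  ./start.sh "$acr_user" "$ACR_PASS"
}

# Example (not executed here): prompt without echoing, then start.
# printf 'Registry password: '; stty -echo; read -r ACR_PASS; stty echo; echo
# export ACR_PASS
# run_start example-acr-user
```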