Advanced Manual Complete Deployment of Lakehouse Optimizer in AWS

Step 1) Prerequisites:

  • Create a Databricks service principal or select one already available

  • Determine a DNS name for the application VM, and register a domain name if applicable.

  • If you are using Azure AD as an identity provider, create an app registration in your AAD tenant of choice.

    • Also create a client secret and save the secret value; you will supply it as input when running the infrastructure setup script.

Step 2) Running the install script:

  • Open an AWS CloudShell session and upload the install archive, then run the install script with your values:

./deploy-lm.sh --databricks_principal "example@domain.com" \
  --databricks_account_id "<GUID for databricks account>" \
  --databricks_principal_guid "<GUID for databricks principal>" \
  --version "2.3" \
  --email_certbot "it_admin@example.com" \
  --aws_region us-east-1 \
  --service_principal "<Azure service principal client id>" \
  --tenant_id "<Azure Tenant ID>"

Here --version is the Lakehouse Monitor application version.
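Before invoking the script, it may help to confirm that the GUID arguments are well formed. The following is a minimal sketch only: the is_guid helper and the sample values are hypothetical and are not part of the install archive.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper; it is not shipped with deploy-lm.sh.
# Returns 0 when the argument looks like a GUID (8-4-4-4-12 hex digits).
is_guid() {
  printf '%s\n' "$1" | grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
}

# Sample values only; substitute the IDs gathered in Step 1.
for candidate in "123e4567-e89b-12d3-a456-426614174000" "not-a-guid"; do
  if is_guid "$candidate"; then
    echo "ok:      $candidate"
  else
    echo "invalid: $candidate"
  fi
done
```

The check only covers the shape of the value; it cannot tell whether the GUID belongs to your Databricks account or Azure tenant.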

Step 3) After ./deploy-lm.sh

  • Log in to the AWS Management Console.

  • The virtual machine needs the policies described below assigned to it. One suggested approach is to create a dedicated role for the VM and attach the created policies to that role. The information below uses the ‘JSON’ view to enable faster policy creation.
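If you prefer the AWS CLI over the console, the role could be created along these lines. This is a sketch under assumptions: the role name lho-vm-role and the file path are hypothetical, and the permission policies themselves come from the access-policy pages referenced in Step 4 and are not reproduced here. The trust policy shown is the standard one that lets EC2 instances assume a role.

```shell
#!/usr/bin/env bash
# Write the standard EC2 trust policy to the given path.
write_ec2_trust_policy() {
  cat > "$1" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

# Create the role plus an instance profile of the same (hypothetical) name.
# Unlike the console, the CLI does not create the instance profile for you.
create_vm_role() {
  role_name="$1"
  trust_file="$2"
  aws iam create-role --role-name "$role_name" \
    --assume-role-policy-document "file://$trust_file"
  aws iam create-instance-profile --instance-profile-name "$role_name"
  aws iam add-role-to-instance-profile \
    --instance-profile-name "$role_name" --role-name "$role_name"
}

write_ec2_trust_policy ./ec2-trust-policy.json
# Uncomment to run against your account:
# create_vm_role lho-vm-role ./ec2-trust-policy.json
```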

 

Step 4) Assign LHM Monitor IAM Role to VM instance

See "Single AWS Account access policies for LHO" for a regular deployment, or "Cross AWS Account access policies for BPLM deployment" for a cross-account AWS deployment.

Once the role is created, navigate to the EC2 instance and assign the IAM role:

  • Actions → Security → Modify IAM role

    • Search for and select the IAM role, then click ‘Update IAM role’.
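The same assignment can be sketched with the AWS CLI. The instance ID and the profile name lho-vm-role below are placeholders, and the is_instance_id check is a hypothetical convenience, not something the installer provides.

```shell
#!/usr/bin/env bash
# Returns 0 when the argument looks like an EC2 instance ID
# (i- followed by 8 or 17 lowercase hex digits).
is_instance_id() {
  printf '%s\n' "$1" | grep -Eq '^i-([0-9a-f]{8}|[0-9a-f]{17})$'
}

# Attach an instance profile to the VM, the CLI counterpart of
# Actions -> Security -> Modify IAM role in the console.
attach_instance_profile() {
  instance_id="$1"
  profile_name="$2"
  if ! is_instance_id "$instance_id"; then
    echo "unexpected instance ID: $instance_id" >&2
    return 1
  fi
  aws ec2 associate-iam-instance-profile \
    --instance-id "$instance_id" \
    --iam-instance-profile "Name=$profile_name"
}

# Example with placeholder values:
# attach_instance_profile i-0123456789abcdef0 lho-vm-role
```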

 

Step 5) Create DNS Entry

  • Navigate to the Route 53 service page, then to the hosted zone you wish to manage. Create an 'A' record for the application, using the IP address printed at the end of script execution.
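As an alternative to the console, the record could be created from CloudShell roughly as follows. The domain, IP address, TTL of 300 seconds and output file name are illustrative assumptions; the hosted zone ID is left as a placeholder.

```shell
#!/usr/bin/env bash
# Print a Route 53 change batch that creates or updates an A record.
make_a_record_batch() {
  record_name="$1"
  ip_address="$2"
  cat <<EOF
{
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "$record_name",
        "Type": "A",
        "TTL": 300,
        "ResourceRecords": [ { "Value": "$ip_address" } ]
      }
    }
  ]
}
EOF
}

# Placeholder values; 203.0.113.10 is a documentation-only address.
make_a_record_batch "lakehouse-monitor.company.com" "203.0.113.10" > ./a-record.json

# Apply it to your hosted zone:
# aws route53 change-resource-record-sets \
#   --hosted-zone-id "<hosted zone ID>" --change-batch file://a-record.json
```

UPSERT is used here so that re-running the command updates an existing record instead of failing.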

Run setup.sh

deploy-lm.sh opens port 22 in the VM’s security group for the current IP address of the AWS CloudShell session. If you have to restart your session and can no longer connect via SSH, determine the IP address of the new CloudShell session and update the IP allowed on port 22.

  • Navigate back to your CloudShell session and SSH into the VM to run the rest of the setup:

ssh -i ~/.ssh/ec2key ubuntu@<vm public IP or DNS>
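If the session IP has changed, one possible way to re-open port 22 from the new CloudShell session is sketched below. The security group ID is a placeholder, the helper names are hypothetical, and https://checkip.amazonaws.com is used here only as a convenient way to read the session's public IP.

```shell
#!/usr/bin/env bash
# Turn a single IPv4 address into a /32 CIDR block. The check is loose:
# it validates the dotted shape, not the range of each octet.
to_cidr32() {
  if printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'; then
    printf '%s/32\n' "$1"
  else
    echo "not an IPv4 address: $1" >&2
    return 1
  fi
}

# Allow SSH from the current session's public IP.
allow_ssh_from_current_ip() {
  group_id="$1"
  current_ip="$(curl -s https://checkip.amazonaws.com)"
  cidr="$(to_cidr32 "$current_ip")" || return 1
  aws ec2 authorize-security-group-ingress \
    --group-id "$group_id" --protocol tcp --port 22 --cidr "$cidr"
}

# Example with a placeholder group ID:
# allow_ssh_from_current_ip "<security group ID>"
```

The stale rule for the previous session IP stays in place unless you remove it, for example with aws ec2 revoke-security-group-ingress.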

  • Run ./setup.sh, providing the domain you wish to create an SSL certificate for, the version of the Lakehouse Monitor, and an admin email that certbot will use for notifications when creating the SSL certificate.

If you do not currently have a registered DNS entry for the Lakehouse Monitor, you can skip setting up SSL certificates by not supplying the cert_domain or email_certbot arguments.

chmod +x setup.sh

Example:

./setup.sh --cert_domain "lakehouse-monitor.company.com" --version 2.3 --email_certbot notifications@company.com

Update docker-compose.yml with the SQL password. You can find the password in AWS Secrets Manager, stored as one of the key-value pairs under the configured secret name.

vi docker-compose.yml

Find the SA_PASSWORD line and update it as follows:

SA_PASSWORD: yourpasswordhere
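For a non-interactive alternative to editing the file in vi, a sketch along these lines may work. The set_sa_password helper is hypothetical; the secret name and the key that holds the password are not specified in this guide, so they appear only as placeholders in the comments.

```shell
#!/usr/bin/env bash
# Replace the value on the SA_PASSWORD line of a compose file, keeping the
# line's indentation. The password is passed through the environment so awk
# does not interpret backslashes or ampersands in it.
set_sa_password() {
  compose_file="$1"
  tmp_file="$(mktemp)"
  NEW_PW="$2" awk '
    /^[ \t]*SA_PASSWORD:/ {
      indent = $0
      sub(/SA_PASSWORD:.*/, "", indent)   # keep only the leading whitespace
      print indent "SA_PASSWORD: " ENVIRON["NEW_PW"]
      next
    }
    { print }
  ' "$compose_file" > "$tmp_file" || return 1
  # Copy the contents back so the original file keeps its permissions.
  cat "$tmp_file" > "$compose_file"
  rm -f "$tmp_file"
}

# Fetch the secret (a JSON string of key-value pairs) and pick out the
# password yourself; the key name depends on your deployment:
# aws secretsmanager get-secret-value --secret-id "<configured secret name>" \
#   --query SecretString --output text
#
# Then:
# set_sa_password docker-compose.yml "<password from the secret>"
```

If the password contains characters that are special in YAML, such as # or a leading quote, wrap the value in quotes in docker-compose.yml.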

Post setup.sh steps

Edit the app registration in Azure, changing the Redirect URI to https://<configured VM DNS>/login/oauth2/code/azure
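The redirect URI follows a fixed pattern, so it can be derived from the DNS name. The sketch below builds it with a hypothetical redirect_uri helper; the commented Azure CLI call is an assumed equivalent of the portal edit and relies on the --web-redirect-uris option found in recent Azure CLI versions.

```shell
#!/usr/bin/env bash
# Build the OAuth redirect URI for the configured DNS name.
redirect_uri() {
  printf 'https://%s/login/oauth2/code/azure\n' "$1"
}

# Placeholder DNS name.
redirect_uri "lakehouse-monitor.company.com"

# Assumed Azure CLI equivalent of the portal edit. This sets the app's full
# list of web redirect URIs, so include any others you still need.
# az ad app update --id "<Azure service principal client id>" \
#   --web-redirect-uris "$(redirect_uri "lakehouse-monitor.company.com")"
```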

Run start.sh

After the setup script completes, run start.sh to pull the application container and start it.

  • Docker uses the ACR username and ACR password to pull the BPLM images from the container registry: bplm-acr-token / <password to be provided upon deployment>

  • where ACRUser is the Blueprint Docker Registry user

  • where ACRPass is the Blueprint Docker Registry password

All done! Once initialization is complete, you should be able to access the homepage at the configured DNS name.
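Initialization can take a little while, so a small polling loop may be handy for checking when the homepage starts responding. This is a sketch with hypothetical helper names and a placeholder DNS name; the attempt count and delay are arbitrary.

```shell
#!/usr/bin/env bash
# Run a command until it succeeds, up to a maximum number of attempts,
# sleeping for the given number of seconds between attempts.
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Succeeds when the URL answers without an HTTP error status (below 400).
homepage_up() {
  curl --silent --fail --output /dev/null "$1"
}

# Example: poll every 10 seconds, up to 30 times.
# retry 30 10 homepage_up "https://lakehouse-monitor.company.com/"
```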