Advanced Manual Complete Deployment of Lakehouse Optimizer in AWS
Step 1) Prerequisites:
Create a Databricks service principal, or select an existing one.
Determine a DNS name for the application VM, and register a domain name if applicable.
If you are using Azure AD as an identity provider, create an app registration in your AAD tenant of choice.
Also create a client secret and save the secret value; you will supply it as input when running the infrastructure setup script.
Step 2) Running the install script:
Open an AWS CloudShell instance, upload the install archive, then run the install script:
# --version is the Lakehouse Monitor application version
./deploy-lm.sh --databricks_principal "example@domain.com" \
  --databricks_account_id "<GUID for databricks account>" \
  --databricks_principal_guid "<GUID for databricks principal>" \
  --version "2.3" \
  --email_certbot "it_admin@example.com" \
  --aws_region us-east-1 \
  --service_principal "<Azure service principal client id>" \
  --tenant_id "<Azure Tenant ID>"
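A mistyped ID is easier to catch before the install script starts than after. The following is a minimal sketch of a pre-flight check, assuming the Databricks account and principal IDs are standard 8-4-4-4-12 hexadecimal GUIDs. `is_guid` and `check_guids` are hypothetical helpers and are not part of deploy-lm.sh.

```shell
# Hypothetical helper: succeeds when $1 looks like an 8-4-4-4-12 hex GUID.
is_guid() {
  printf '%s\n' "$1" |
    grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
}

# Hypothetical helper: takes name=value pairs and reports every value
# that is not a GUID. Returns non-zero if any value fails the check.
check_guids() {
  rc=0
  for pair in "$@"; do
    name=${pair%%=*}
    value=${pair#*=}
    if ! is_guid "$value"; then
      echo "invalid GUID for $name: $value" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Example (values are placeholders you would set yourself):
#   check_guids databricks_account_id="$ACCOUNT_ID" \
#               databricks_principal_guid="$PRINCIPAL_GUID" \
#     && echo "IDs look well formed"
```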
Step 3) After ./deploy-lm.sh
Log in to the AWS Management Console.
The virtual machine needs the policies described below assigned to it. One suggested approach is to create a dedicated role for the VM and attach the policies to that role. The information below uses the ‘JSON’ view to speed up policy creation.
Step 4) Assign LHM Monitor IAM Role to VM instance
See "Single AWS Account access policies for LHO" for a regular deployment, or "Cross AWS Account access policies for BPLM deployment" for a cross-account AWS deployment.
Once the role is created, navigate to the EC2 instance and assign the IAM role:
Actions → Security → Modify IAM role
From here, search for and select the IAM role, then click ‘Update IAM role’.
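The same assignment can be made from CloudShell with the AWS CLI. This is a sketch, assuming the role has an instance profile of the same name (the console creates one when a role is created for EC2). `run` and `attach_instance_profile` are hypothetical helpers; `run` only prints the command unless DRY_RUN is set to 0, so the command can be reviewed first.

```shell
# Prints the command when DRY_RUN is 1 (the default here); runs it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: attach an instance profile to an EC2 instance.
#   $1 = EC2 instance ID
#   $2 = instance profile name (assumed to match the IAM role name)
attach_instance_profile() {
  run aws ec2 associate-iam-instance-profile \
    --instance-id "$1" \
    --iam-instance-profile "Name=$2"
}

# Example (IDs are placeholders):
#   attach_instance_profile i-0123456789abcdef0 lhm-vm-role              # review
#   DRY_RUN=0 attach_instance_profile i-0123456789abcdef0 lhm-vm-role    # apply
```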
Step 5) Create DNS Entry
Navigate to the Route 53 service page, then to the hosted zone you wish to manage. Create an 'A' record for the application, using the IP address printed at the end of the deploy-lm.sh run.
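If you prefer the AWS CLI to the console, the record can be created with `aws route53 change-resource-record-sets`. The sketch below only generates the change batch JSON. `a_record_batch` is a hypothetical helper, and the 300-second default TTL is an assumption rather than a value required by the application.

```shell
# Hypothetical helper: print a Route 53 change batch that upserts an A record.
#   $1 = record name (the application DNS name)
#   $2 = IPv4 address of the VM
#   $3 = TTL in seconds (optional, defaults to 300)
a_record_batch() {
  cat <<EOF
{
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "$1",
        "Type": "A",
        "TTL": ${3:-300},
        "ResourceRecords": [{ "Value": "$2" }]
      }
    }
  ]
}
EOF
}

# Example (hosted zone ID and IP are placeholders):
#   a_record_batch lakehouse-monitor.company.com 203.0.113.10 > change.json
#   aws route53 change-resource-record-sets \
#     --hosted-zone-id <hosted zone ID> --change-batch file://change.json
```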
Run setup.sh
deploy-lm.sh opens port 22 in the VM’s security group for the current IP address of the AWS CloudShell session. If you have to restart your session and can no longer connect via SSH, determine the IP address of the new CloudShell session and update the IP allowed on port 22 accordingly.
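One way to re-open port 22 for a new session is sketched below. It assumes the session's public IP is what https://checkip.amazonaws.com reports and that you know the VM's security group ID. `run`, `to_host_cidr`, and `allow_ssh_from` are hypothetical helpers, and the address check is deliberately loose (digits and dots only).

```shell
# Prints the command when DRY_RUN is 1 (the default here); runs it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: turn an IPv4 address into a single-host CIDR block.
to_host_cidr() {
  case "$1" in
    "" | *[!0-9.]*) echo "not an IPv4 address: $1" >&2; return 1 ;;
  esac
  printf '%s/32\n' "$1"
}

# Hypothetical helper: allow SSH (TCP 22) from one address.
#   $1 = security group ID, $2 = IPv4 address of the CloudShell session
allow_ssh_from() {
  cidr=$(to_host_cidr "$2") || return 1
  run aws ec2 authorize-security-group-ingress \
    --group-id "$1" --protocol tcp --port 22 --cidr "$cidr"
}

# Example (security group ID is a placeholder):
#   my_ip=$(curl -s https://checkip.amazonaws.com)
#   DRY_RUN=0 allow_ssh_from <security group ID> "$my_ip"
```

The rule for the previous session's address stays in place until it is removed, for example with `aws ec2 revoke-security-group-ingress`.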
Navigate back to your CloudShell instance and SSH into the VM to run the rest of the setup:
ssh -i ~/.ssh/ec2key ubuntu@<vm public IP or DNS>
Run ./setup.sh, providing the domain you wish to create an SSL certificate for, the version of the Lakehouse Monitor, and an admin email that will be used to configure certbot’s notifications when creating the SSL certificate.
If you do not currently have a registered DNS entry for the Lakehouse Monitor, you can skip setting up SSL certificates by not supplying the cert_domain or email_certbot arguments.
For example:
chmod +x setup.sh
./setup.sh --cert_domain "lakehouse-monitor.company.com" --version 2.3 --email_certbot notifications@company.com
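The two ways of calling setup.sh (with and without SSL arguments) can be sketched as a small helper that prints the matching command line. `setup_command` is a hypothetical name used only for illustration. It emits the SSL arguments only when both a domain and an email are given, which is an assumption about how you would want to pair them.

```shell
# Hypothetical helper: print the setup.sh command line for the given inputs.
#   $1 = application version
#   $2 = certificate domain (optional)
#   $3 = certbot notification email (optional)
setup_command() {
  if [ -n "${2:-}" ] && [ -n "${3:-}" ]; then
    # Domain and email supplied: request an SSL certificate.
    printf './setup.sh --cert_domain "%s" --version %s --email_certbot %s\n' \
      "$2" "$1" "$3"
  else
    # No registered DNS entry yet: skip SSL setup.
    printf './setup.sh --version %s\n' "$1"
  fi
}

# Example:
#   setup_command 2.3
#   setup_command 2.3 lakehouse-monitor.company.com notifications@company.com
```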
Update docker-compose.yml with the SQL password. You can find the password in AWS Secrets Manager, stored as one of the key-value pairs under the configured secret name.
vi docker-compose.yml
Find the SA_PASSWORD line and update it as follows:
SA_PASSWORD: yourpasswordhere
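The edit can also be scripted instead of made in vi. The sketch below assumes the password appears in docker-compose.yml as a `SA_PASSWORD: value` mapping entry on its own line. `set_sa_password` is a hypothetical helper. It passes the password through the environment so that characters such as `&`, `/`, and `\` are written literally.

```shell
# Hypothetical helper: replace the value on the SA_PASSWORD line of a compose file.
#   $1 = path to docker-compose.yml
#   $2 = new password
set_sa_password() {
  SA_NEW="$2" awk '
    /^[ \t]*SA_PASSWORD:/ {
      # Keep the original indentation so the YAML structure is unchanged.
      indent = substr($0, 1, index($0, "SA_PASSWORD:") - 1)
      print indent "SA_PASSWORD: " ENVIRON["SA_NEW"]
      next
    }
    { print }
  ' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Example (secret name is a placeholder). The first command prints the secret
# as a JSON string of key-value pairs; copy the SQL password from it.
#   aws secretsmanager get-secret-value --secret-id <configured secret name> \
#     --query SecretString --output text
#   set_sa_password docker-compose.yml 'yourpasswordhere'
```

If the password contains characters that are special in YAML, such as `#` or a leading quote, the value may need to be quoted in the file.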
Post setup.sh steps
Edit the app registration in Azure, changing the Redirect URI to https://<configured VM DNS>/login/oauth2/code/azure
Run start.sh
After the setup script completes, run start.sh to pull down the application container and start it.
ACR username and ACR password to be used by Docker to pull the BPLM images from the container registry:
bplm-acr-token / <password to be provided upon deployment>
where ACRUser is the Blueprint Docker Registry user and ACRPass is the Blueprint Docker Registry password.
All done! After initialization is complete, you should be able to access the homepage at the configured DNS name.