...

  • Inventory of Workspace Objects:

    • Tables: Identifies managed tables, external tables, and tables requiring review, including each table's location, owner, provider, and type.

    • Databases: Analyzes Hive databases, providing information about location, owner, comments, and potential table errors.

    • Functions: Inventories Hive functions, including class name, type, determinism, and version.

    • Users and Groups: Provides details on workspace users and groups, including entitlements and memberships.

    • Grants: Analyzes permission grants for tables within databases.

    • Notebooks and Files: Identifies notebooks and files within the workspace, excluding certain file types.

    • Jobs: Analyzes jobs and identifies the type of task (notebook, spark, python, etc.) and the object name associated with the task.

    • DLT Pipelines: Provides an overview of Delta Live Tables pipelines.

    • External Locations: Lists external locations used within the workspace.

    • Clusters: Provides information about cluster IDs, names, and Spark versions.

    • Models: Lists models in the model registry along with user and version information.

  • Code Analysis:

    • Notebook Impact Assessment: Analyzes notebooks to identify references to Hive tables and DBFS, helping estimate how the migration will affect each notebook.

    • Statistical Analysis: Calculates the mean, median, and mode of the number of table references per notebook.

  • Recommendations: Offers actionable recommendations based on the assessment findings, including:

    • Upgrading cluster DBR versions

    • Addressing data in DBFS root

    • Migrating workspace permission grants

    • Evaluating workspace groups

    • Designing the Unity Catalog structure

    • Strategizing for external locations

    • Analyzing source code objects used in multiple jobs

    • Evaluating notebooks that reference tables and DBFS

  • Reporting:

    • Delta Tables: Stores the inventory data in Delta tables within the specified Unity Catalog schema for further analysis.

    • CSV Files: Generates a CSV report for each inventory category.

    • Notebook Impact Analysis Graph: Generates an interactive graph visualizing the distribution of table references in notebooks.

Additional Notes:

  • The feature requires a Unity Catalog-enabled workspace and access to a specific catalog, schema, and volume for storing results.

  • The notebook uses the Databricks SDK for Python and various Spark SQL functions for data analysis.

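The Tables inventory described above can be approximated from the key/value rows that Spark's `DESCRIBE TABLE EXTENDED` returns for each Hive table (`Type`, `Location`, `Provider`, `Owner`). The sketch below is a hypothetical illustration, not the Lakehouse Optimizer's code: `TableInfo`, `classify_table`, and `is_dbfs_root` are made-up names, and both the "requires review" rule (anything not reported as `MANAGED` or `EXTERNAL`) and the DBFS-root heuristic (a `dbfs:/` location outside `dbfs:/mnt/`) are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Optional


# Hypothetical record; field names mirror DESCRIBE TABLE EXTENDED keys.
@dataclass
class TableInfo:
    name: str
    category: str  # "managed", "external", or "review"
    location: Optional[str]
    owner: Optional[str]
    provider: Optional[str]
    table_type: Optional[str]
    in_dbfs_root: bool


def is_dbfs_root(location: Optional[str]) -> bool:
    """Assumed heuristic: a dbfs:/ path that is not under the /mnt/ mount prefix."""
    if not location:
        return False
    return location.startswith("dbfs:/") and not location.startswith("dbfs:/mnt/")


def classify_table(name: str, describe_rows: Dict[str, str]) -> TableInfo:
    """Classify one table from a {key: value} view of its DESCRIBE TABLE EXTENDED output."""
    table_type = describe_rows.get("Type")
    location = describe_rows.get("Location")
    if table_type == "MANAGED":
        category = "managed"
    elif table_type == "EXTERNAL":
        category = "external"
    else:
        # Views, unknown types, or missing metadata are flagged for manual review.
        category = "review"
    return TableInfo(
        name=name,
        category=category,
        location=location,
        owner=describe_rows.get("Owner"),
        provider=describe_rows.get("Provider"),
        table_type=table_type,
        in_dbfs_root=is_dbfs_root(location),
    )
```

In a notebook, the input dictionary could be built roughly as `{r.col_name: r.data_type for r in spark.sql("DESCRIBE TABLE EXTENDED db.tbl").collect()}`. A flag like `in_dbfs_root` is the kind of signal the "Addressing data in DBFS root" recommendation would need.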
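The Jobs inventory and the "Analyzing source code objects used in multiple jobs" recommendation above both come down to reading each task's type and the object it runs. A minimal sketch, assuming job dictionaries shaped like Jobs API 2.1 list responses (`job_id`, `settings.tasks`, with keys such as `notebook_task.notebook_path`); `TASK_OBJECT_FIELDS`, `classify_task`, and `shared_objects` are hypothetical names, and the choice of which field counts as the "object name" is an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Jobs API 2.1 task keys mapped to the field treated here as the task's object name.
# The mapping is illustrative; other task types fall through to "other".
TASK_OBJECT_FIELDS = {
    "notebook_task": "notebook_path",
    "spark_python_task": "python_file",
    "spark_jar_task": "main_class_name",
    "python_wheel_task": "package_name",
    "pipeline_task": "pipeline_id",
}


def classify_task(task: dict) -> Tuple[str, str]:
    """Return (task_type, object_name) for a single task dict."""
    for task_key, field in TASK_OBJECT_FIELDS.items():
        if task_key in task:
            return task_key, str(task[task_key].get(field, ""))
    return "other", ""


def shared_objects(jobs: List[dict]) -> Dict[str, List[int]]:
    """Map each object name to the sorted job IDs using it, keeping objects used by 2+ jobs."""
    usage = defaultdict(set)
    for job in jobs:
        for task in job.get("settings", {}).get("tasks", []):
            _, obj = classify_task(task)
            if obj:
                usage[obj].add(job["job_id"])
    return {obj: sorted(ids) for obj, ids in usage.items() if len(ids) > 1}
```

With the Databricks SDK for Python, comparable dictionaries could likely be produced from `WorkspaceClient().jobs.list(expand_tasks=True)` by calling `.as_dict()` on each job, though that wiring is an assumption here.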
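The Clusters inventory records each cluster's Spark version, which is what an "Upgrading cluster DBR versions" recommendation would check. The sketch below parses version strings such as `10.4.x-scala2.12` and flags clusters below a minimum runtime. `MIN_DBR = (11, 3)` is an assumed threshold to verify against current Databricks documentation, and `parse_dbr` and `clusters_to_upgrade` are hypothetical helpers.

```python
import re
from typing import List, Optional, Tuple

# Assumed minimum Databricks Runtime for Unity Catalog; confirm against current docs.
MIN_DBR = (11, 3)

_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.")


def parse_dbr(spark_version: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from a spark_version such as '13.3.x-photon-scala2.12'."""
    match = _VERSION_RE.match(spark_version)
    if not match:
        return None
    return int(match.group(1)), int(match.group(2))


def clusters_to_upgrade(clusters: List[dict], minimum: Tuple[int, int] = MIN_DBR) -> List[str]:
    """Return IDs of clusters whose runtime is below `minimum` or cannot be parsed.

    Unparseable versions are included so a person can check them manually.
    """
    flagged = []
    for cluster in clusters:
        version = parse_dbr(cluster.get("spark_version", ""))
        if version is None or version < minimum:
            flagged.append(cluster["cluster_id"])
    return flagged
```

Tuple comparison handles the ordering, so `(10, 4) < (11, 3)` and `(14, 3) >= (11, 3)` need no special casing.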
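The Notebook Impact Assessment and Statistical Analysis steps above can be pictured as a scan of notebook source followed by summary statistics over per-notebook counts. This is a simplified sketch, not the tool's actual detection logic: the regular expressions (three-level `hive_metastore.<schema>.<table>` names, `dbfs:/` URIs, and `/dbfs/` paths) are assumptions that will miss unqualified or backtick-quoted table names, and `scan_notebook` and `reference_stats` are hypothetical names.

```python
import re
import statistics
from typing import Dict, List

# Illustrative patterns (assumptions, not the tool's rules):
# hive_metastore.<schema>.<table> references, plus dbfs:/ URIs and /dbfs/ paths.
HIVE_REF = re.compile(r"\bhive_metastore\.\w+\.\w+", re.IGNORECASE)
DBFS_REF = re.compile(r"(?:dbfs:/|/dbfs/)[^\s'\"]+")


def scan_notebook(source: str) -> Dict[str, List[str]]:
    """Return sorted, de-duplicated Hive table and DBFS references in notebook source."""
    return {
        "tables": sorted({m.lower() for m in HIVE_REF.findall(source)}),
        "dbfs": sorted(set(DBFS_REF.findall(source))),
    }


def reference_stats(counts: List[int]) -> Dict[str, float]:
    """Mean, median, and mode of per-notebook table reference counts (non-empty list)."""
    return {
        "mean": statistics.mean(counts),
        "median": statistics.median(counts),
        "mode": statistics.mode(counts),
    }
```

For example, `reference_stats([len(scan_notebook(src)["tables"]) for src in sources])` would summarize table references across a list of notebook sources.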
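For the CSV Files report above, one file per inventory category is straightforward with the standard `csv` module. In this sketch `write_csv_reports` is a hypothetical helper, and writing into a Unity Catalog volume path such as `/Volumes/<catalog>/<schema>/<volume>/...` is an assumed layout based on the catalog, schema, and volume requirement in the notes.

```python
import csv
import os
from typing import Dict, List


def write_csv_reports(inventory: Dict[str, List[dict]], output_dir: str) -> List[str]:
    """Write one CSV per inventory category and return the paths written.

    On Databricks, output_dir could be a volume path such as
    /Volumes/<catalog>/<schema>/<volume>/assessment (assumed layout).
    """
    os.makedirs(output_dir, exist_ok=True)
    written = []
    for category, rows in inventory.items():
        if not rows:
            continue  # skip empty categories instead of writing header-only files
        # Union of keys across rows in first-seen order, so sparse records still fit.
        fieldnames: List[str] = []
        for row in rows:
            for key in row:
                if key not in fieldnames:
                    fieldnames.append(key)
        path = os.path.join(output_dir, f"{category}.csv")
        with open(path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)  # missing keys become ""
            writer.writeheader()
            writer.writerows(rows)
        written.append(path)
    return written
```

The Delta Tables output would more naturally use Spark itself, for example `spark.createDataFrame(rows).write.mode("overwrite").saveAsTable(f"{catalog}.{schema}.{category}")`, where `catalog`, `schema`, and `category` are placeholders.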
Conclusion:

The Unity Catalog Migration Assessment feature in the Blueprint Lakehouse Optimizer inventories workspace objects, analyzes notebook code, and turns the findings into actionable recommendations, Delta tables, and CSV reports. Organizations can use this analysis to scope the migration work up front and plan the move to a more secure and governed data lakehouse environment on Unity Catalog.