Unity Catalog Structure
Introduction
Databricks is ARC’s primary data processing platform. The Unity Catalog is built into Databricks as its system for organizing data. It arranges data in a hierarchy of catalogs and schemas and supports search, metadata, and permissions, which makes data easier to find and use in Databricks. This project describes how ARC will use the Unity Catalog to organize data.
Principles
- Data is easily consumable: The Unity Catalog should be intuitive and easy to navigate so that users can find the data they need.
- Data is easy to maintain: The Unity Catalog should be easy to update and modify as new data sources are added or existing data sources change.
- Data is open by default: Users should be able to access and use data without having to jump through hoops to get access. Exceptions are made for sensitive data.
- Data is easy to secure: The Unity Catalog should make it easy to control access to data sources and to ensure that only authorized users have access to sensitive data.
Catalog Structure
Two options are proposed for the top-level structure of the Unity Catalog:
Option 1
Unity Catalog
├── Foreign Catalog 1
├── Foreign Catalog 2
├── Foreign Catalog Views Catalog
├── Source Systems Catalog
├── Functional Area Catalog 1
├── Functional Area Catalog 2
├── Business Unit Team Catalog 1
├── Business Unit Team Catalog 2
└── ...
Option 2
Unity Catalog
├── Foreign Catalog 1
├── Foreign Catalog 2
├── Functional Area Catalog 1
├── Functional Area Catalog 2
├── Business Unit Team Catalog 1
├── Business Unit Team Catalog 2
└── ...
Expanded to the schema and table level, Option 1 looks as follows:
Unity Catalog
├── Foreign Catalog 1
│ ├── Foreign Schema 1
│ │ ├── Foreign Table 1
│ │ ├── Foreign Table 2
│ │ └── ...
│ ├── Foreign Schema 2
│ │ ├── Foreign Table 1
│ │ ├── Foreign Table 2
│ │ └── ...
│ └── ...
├── Foreign Catalog 2
│ ├── Foreign Schema 1
│ │ ├── Foreign Table 1
│ │ ├── Foreign Table 2
│ │ └── ...
│ ├── Foreign Schema 2
│ │ ├── Foreign Table 1
│ │ ├── Foreign Table 2
│ │ └── ...
│ └── ...
├── Foreign Catalog Views Catalog
│ ├── Foreign Catalog 1 Views
│ │ ├── View 1
│ │ ├── View 2
│ │ └── ...
│ ├── Foreign Catalog 2 Views
│ │ ├── View 1
│ │ ├── View 2
│ │ └── ...
│ └── ...
├── Source Systems Catalog
│ ├── Source System 1
│ │ ├── Table 1
│ │ ├── Table 2
│ │ └── ...
│ ├── Source System 2
│ │ ├── Table 1
│ │ ├── Table 2
│ │ └── ...
│ └── ...
├── Functional Area Catalog 1
│ ├── Schema 1
│ │ ├── Table 1
│ │ ├── Table 2
│ │ └── ...
│ ├── Schema 2
│ │ ├── Table 1
│ │ ├── Table 2
│ │ └── ...
│ └── ...
├── Functional Area Catalog 2
│ ├── Schema 1
│ │ ├── Table 1
│ │ ├── Table 2
│ │ └── ...
│ └── ...
├── Business Unit Team Catalog 1
│ ├── Schema 1
│ │ ├── Table 1
│ │ ├── Table 2
│ │ └── ...
│ ├── Schema 2
│ │ ├── Table 1
│ │ ├── Table 2
│ │ └── ...
│ └── ...
├── Business Unit Team Catalog 2
│ ├── Schema 1
│ │ ├── Table 1
│ │ ├── Table 2
│ │ └── ...
│ ├── Schema 2
│ │ ├── Table 1
│ │ ├── Table 2
│ │ └── ...
│ └── ...
└── ...
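Each leaf in the tree above is addressed by a three-level name of the form catalog.schema.table. The following is a minimal sketch of handling such names in Python. The TableName helper and the example names are hypothetical and only illustrate the naming pattern.

```python
from typing import NamedTuple


class TableName(NamedTuple):
    """A three-level name: catalog.schema.table (hypothetical helper)."""

    catalog: str
    schema: str
    table: str

    @classmethod
    def parse(cls, full_name: str) -> "TableName":
        """Split 'catalog.schema.table' into its three parts."""
        parts = full_name.split(".")
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected catalog.schema.table, got {full_name!r}")
        return cls(*parts)

    def __str__(self) -> str:
        return ".".join(self)


# Hypothetical functional area catalog, schema and table.
name = TableName.parse("finance.general_ledger.journal_entries")
print(name.catalog)  # the catalog part of the name
print(name)          # the full three-level name
```

Keeping the three parts separate makes it straightforward to check, for example, which catalog a table belongs to before applying catalog-level rules.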
Foreign Catalogs
Managed by: Data Services Team
Permissions: Browse access to all users
Ownership:
Foreign catalogs expose data that lives outside the Unity Catalog. They are typically managed by external systems and are connected to the Unity Catalog via a foreign data source. Foreign catalogs are typically read-only and should be used with caution, as querying them can have performance implications. Materialized views, workflows, and other optimizations should be considered when working with foreign catalogs to improve performance and to protect the source system from excessive queries.
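As an illustration of the setup described above, the sketch below generates the SQL to register a foreign catalog over an existing connection and to give all users browse access. It is a sketch under assumptions, not ARC's actual script: the catalog, connection and database names are hypothetical, the "account users" group is assumed to represent all users, and the OPTIONS clause depends on the type of the connection.

```python
import re

# Only simple identifiers are accepted, so no quoting or escaping is needed.
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def check_name(name: str) -> str:
    """Return the name unchanged if it is a simple identifier, else raise."""
    if not _NAME.fullmatch(name):
        raise ValueError(f"unsupported name: {name!r}")
    return name


def foreign_catalog_statements(catalog: str, connection: str, database: str,
                               all_users_group: str = "account users") -> list[str]:
    """Build SQL that registers a foreign catalog and lets all users browse it."""
    catalog, connection, database = map(check_name, (catalog, connection, database))
    return [
        # Assumption: the connection already exists and takes a 'database' option.
        f"CREATE FOREIGN CATALOG IF NOT EXISTS {catalog} "
        f"USING CONNECTION {connection} OPTIONS (database '{database}')",
        # Browse only: users can discover the objects but not query them.
        f"GRANT BROWSE ON CATALOG {catalog} TO `{all_users_group}`",
    ]


# Hypothetical names, for illustration only.
for statement in foreign_catalog_statements("erp_foreign", "erp_connection", "erp"):
    print(statement)
```

Granting only BROWSE keeps the foreign catalog discoverable while direct queries against the source system stay limited.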
Foreign Catalog Views Catalog
Managed and Administered by: Data Services Team
Permissions: Browse & Select access to all users
Ownership:
The Foreign Catalog Views Catalog contains views that are created on top of foreign catalogs. These views are created by a script that queries the foreign catalog and creates a corresponding view in the Unity Catalog. The views can be used to simplify the data model and to provide a more user-friendly interface to the data in the foreign catalog.
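The view-creation script is not shown in this document. The sketch below is one possible shape for it, under stated assumptions: the list of foreign tables is passed in (in practice it might come from a query against the catalog's information schema), each foreign catalog gets one schema named <foreign catalog>_views, and each view is named <foreign schema>__<table> to avoid collisions. All names are hypothetical and are assumed to be simple identifiers.

```python
def view_statements(foreign_catalog: str, views_catalog: str,
                    tables: list[tuple[str, str]]) -> list[str]:
    """Build SQL that mirrors foreign tables as views.

    tables holds (foreign_schema, foreign_table) pairs.
    """
    # Assumed convention: one schema per foreign catalog in the views catalog.
    target_schema = f"{views_catalog}.{foreign_catalog}_views"
    statements = [f"CREATE SCHEMA IF NOT EXISTS {target_schema}"]
    for schema, table in sorted(tables):
        view = f"{target_schema}.{schema}__{table}"
        source = f"{foreign_catalog}.{schema}.{table}"
        statements.append(
            f"CREATE OR REPLACE VIEW {view} AS SELECT * FROM {source}"
        )
    return statements


# Hypothetical foreign catalog with two tables.
foreign_tables = [("sales", "orders"), ("sales", "customers")]
for statement in view_statements("erp_foreign", "foreign_views", foreign_tables):
    print(statement)
```

Because the statements use CREATE OR REPLACE, the script can be rerun whenever tables are added to the foreign catalog.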
Replicated Tables
Replicated tables are tables that are replicated from an external source system into the Unity Catalog. These tables are typically replicated via Extract, Transform, Load (ETL) processes.
Source Systems Catalog
Managed and Administered by: Data Services Team
Permissions: Browse & Select access to all users
Ownership:
The Source Systems Catalog contains tables that are sourced from source systems. These tables are typically replicated into the Unity Catalog via ETL processes and serve as the source of truth for data in the Unity Catalog. Source systems can be internal or external to ARC. This catalog would be equivalent to the current zone 2 catalog.
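The "Browse & Select access to all users" permission used for this and the other open catalogs could be expressed as a single catalog-level grant. The sketch below is an assumption about how that might look: it uses the "account users" group to stand for all users, includes the USE CATALOG and USE SCHEMA privileges that Unity Catalog requires before SELECT takes effect, and uses a hypothetical catalog name.

```python
# Privileges assumed to implement "Browse & Select" for an open catalog.
OPEN_READ_PRIVILEGES = ("USE CATALOG", "USE SCHEMA", "BROWSE", "SELECT")


def open_read_grant(catalog: str, group: str = "account users") -> str:
    """Build a catalog-level grant that gives a group read access."""
    privileges = ", ".join(OPEN_READ_PRIVILEGES)
    return f"GRANT {privileges} ON CATALOG {catalog} TO `{group}`"


# Hypothetical catalog name, for illustration only.
print(open_read_grant("source_systems"))
```

Granting at the catalog level means new schemas and tables inherit the access, which fits the "open by default" principle.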
Functional Area Catalogs
Managed and Administered by: Data Services Team
Permissions: Browse & Select access to all users
Ownership: The Respective Business Unit
Functional Area Catalogs are organized by logical functional area. They contain tables and/or views that are specific to a function and are typically used by the business unit to organize data in a way that is meaningful to it. Examples are catalogs that contain tables related to finance, marketing, or wells. A functional area catalog is not necessarily specific to one business unit; it is organized by logical function.
Business Unit Team Catalogs
Administered by: Data Services Team
Managed by: The Respective Business Unit Team
Permissions: No access for all users; Read & Write access for the specific Business Unit Team
Ownership: The Respective Business Unit Team
Business Unit Team Catalogs are administered by the Data Services team and managed by the business unit team. Each catalog is a place for a business unit to store its own analyses, models, and data. The Data Services team is responsible for setting up the catalog and providing access to the business unit team. The business unit team is responsible for managing the data in the catalog. Access is restricted to the business unit team; no other users should have access to these catalogs.
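A minimal sketch of the setup step performed by the Data Services team is shown below. It is illustrative only: the catalog and group names are hypothetical, the privilege list is one assumed reading of "Read & Write access", and depending on the metastore configuration the CREATE CATALOG statement may also need a managed location.

```python
# Privileges assumed to implement "Read & Write" for a team catalog.
TEAM_PRIVILEGES = (
    "USE CATALOG", "USE SCHEMA", "SELECT", "MODIFY",
    "CREATE SCHEMA", "CREATE TABLE",
)


def team_catalog_statements(catalog: str, team_group: str) -> list[str]:
    """Build SQL that creates a team catalog and grants access to one group."""
    privileges = ", ".join(TEAM_PRIVILEGES)
    return [
        f"CREATE CATALOG IF NOT EXISTS {catalog}",
        # Only the team group is granted access; no grant goes to all users.
        f"GRANT {privileges} ON CATALOG {catalog} TO `{team_group}`",
    ]


# Hypothetical business unit team, for illustration only.
for statement in team_catalog_statements("bu_reservoir_team", "reservoir-engineers"):
    print(statement)
```

Because nothing is granted to the all-users group, the catalog stays closed to everyone outside the team by default.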