Metadata Onboarding Guide
Glossary Alignment for New Data Sources
To ensure consistent data definitions across the organization, all new data sources ingested into Databricks must align with the enterprise data glossary maintained by the Data Governance team.
The glossary is stored in the team_information_management schema and contains standardized definitions that are applied to dataset columns using metadata mappings.
How the Glossary Works
Our glossary framework uses two tables:
Glossary Definitions Table
Stores the official definition of each approved term.
These definitions are used as column comments in Databricks tables to ensure consistent interpretation of data.
Glossary Mapping Table
Maps glossary terms to specific columns in datasets.
This allows the same standardized definition to be applied across multiple tables and data sources.
This structure standardizes business terminology while staying flexible: a glossary term can be reused across many columns, and a single term can carry more than one definition when different datasets need it.
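The relationship between the two tables can be sketched in a few lines of Python. This is an illustration only, not the actual glossary schema: the field names (definition_id, term, definition, table_name, column_name) and all sample rows are hypothetical, chosen to show one definition reused across tables and one term carrying two definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlossaryDefinition:
    definition_id: int  # hypothetical key; the real table may use a different one
    term: str
    definition: str


@dataclass(frozen=True)
class GlossaryMapping:
    table_name: str
    column_name: str
    definition_id: int  # points at one row of the definitions table


# Made-up sample rows. The term "amount" has two definitions on purpose.
definitions = [
    GlossaryDefinition(1, "customer_id", "Unique identifier assigned to a customer."),
    GlossaryDefinition(2, "amount", "Invoice total in USD, including tax."),
    GlossaryDefinition(3, "amount", "Payment amount received, in USD."),
]

# Definition 1 is reused by two different tables.
mappings = [
    GlossaryMapping("sales.orders", "customer_id", 1),
    GlossaryMapping("sales.invoices", "customer_id", 1),
    GlossaryMapping("sales.invoices", "amount", 2),
    GlossaryMapping("finance.payments", "amount", 3),
]


def resolve_comments(definitions, mappings):
    """Return {(table, column): definition text} by joining the two tables."""
    by_id = {d.definition_id: d for d in definitions}
    return {
        (m.table_name, m.column_name): by_id[m.definition_id].definition
        for m in mappings
    }


comments = resolve_comments(definitions, mappings)
```

Keying the mapping on a definition identifier rather than on the term itself is what would allow the same term to carry different definitions in different datasets.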
Information Required for New Data Sources
When submitting a new dataset for ingestion, please provide the following information for each column that needs to be mapped to the glossary:
Column Name
Provide the column name that will appear in the Databricks table. This helps the governance team apply the correct glossary definition and column comments.
Proposed Definition
Provide a clear business definition for the column.
If the term already exists in the glossary but your definition differs, or if the column name is entirely new, please include the new definition.
Definitions should describe:
- What the field represents
- Any important business context (calculations, acceptable values, etc.)
- Measurement units, if applicable
Source System (optional)
Specify the originating system for the data, if applicable.
Source System Column Name (optional)
If the column is coming directly from another system, include the exact column name from the source system.
This helps ensure accurate mapping between the source system and the Databricks table and supports governance traceability.
Third-Party Definition (if applicable)
If the field originates from a vendor or external system, include the official definition provided by the vendor.
This helps governance validate that internal definitions align with vendor documentation.
Submission Format
If you are creating new tables in the DAL from new data sources, please send the Information Management team an Excel file for each table. The file name should include the table name.
Each Excel file must contain the following completed columns:
- Column Name
- Column Comment (Definition)
- Source System
- Source System Column Name
The Information Management team will review the submission and integrate the approved definitions into the enterprise-wide glossary.
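Before sending a file, it may help to check it against the requirements above. The following is a minimal sketch, not an official validation tool: it assumes the spreadsheet rows have already been loaded into a list of dictionaries keyed by column header, and the function name validate_submission and the sample values are hypothetical. It treats all four columns as required, following the list above.

```python
REQUIRED_COLUMNS = [
    "Column Name",
    "Column Comment (Definition)",
    "Source System",
    "Source System Column Name",
]


def validate_submission(file_name, table_name, rows):
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []

    # The file name should include the table name.
    if table_name.lower() not in file_name.lower():
        problems.append(f"file name '{file_name}' does not include '{table_name}'")

    # Every required column must be present and non-empty in every row.
    for i, row in enumerate(rows, start=1):
        for col in REQUIRED_COLUMNS:
            if not str(row.get(col) or "").strip():
                problems.append(f"row {i}: '{col}' is missing or empty")

    return problems


# Made-up example rows for a hypothetical table named "orders".
rows = [
    {
        "Column Name": "customer_id",
        "Column Comment (Definition)": "Unique identifier assigned to a customer.",
        "Source System": "CRM",
        "Source System Column Name": "CUST_ID",
    },
    {
        "Column Name": "order_total",
        "Column Comment (Definition)": "",
        "Source System": "CRM",
        "Source System Column Name": "ORD_TOT",
    },
]

problems = validate_submission("orders_glossary.xlsx", "orders", rows)
```

Here the second row has an empty definition, so the check reports one problem for row 2.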
Metadata Deployment Process
Together with Data Services, we developed a process to manage glossary definitions and column mappings using CSV files.
These files are uploaded to a volume in the DAL (uat_ds_admin.metadata).
Each morning at 5:00 AM, the system applies glossary definitions to the appropriate columns based on the mapping CSV files.
Several tables capture column changes over time to support tracking, and an additional table records any errors encountered during the application of column comments.
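The daily step can be pictured roughly as follows. This is a sketch under stated assumptions, not the job that actually runs: the CSV layouts, the column names (definition_id, term, definition, table_name, column_name), the sample rows, and the helper names are all hypothetical. It builds one Databricks SQL ALTER TABLE ... ALTER COLUMN ... COMMENT statement per mapped column and collects an error message for any mapping that points at a missing definition, instead of failing the whole run.

```python
import csv
import io

# Hypothetical file contents; the real CSV layouts may differ.
DEFINITIONS_CSV = """\
definition_id,term,definition
1,customer_id,Unique identifier assigned to a customer.
2,amount,"Customer's invoice total, in USD."
"""

MAPPINGS_CSV = """\
table_name,column_name,definition_id
sales.orders,customer_id,1
sales.invoices,amount,2
sales.invoices,region,9
"""


def escape_sql_string(text):
    # Assumes backslash escaping inside single-quoted SQL string literals.
    return text.replace("\\", "\\\\").replace("'", "\\'")


def build_comment_statements(definitions_csv, mappings_csv):
    """Return (statements, errors) for the given definition and mapping CSV text."""
    definitions = {
        row["definition_id"]: row["definition"]
        for row in csv.DictReader(io.StringIO(definitions_csv))
    }
    statements, errors = [], []
    for row in csv.DictReader(io.StringIO(mappings_csv)):
        definition = definitions.get(row["definition_id"])
        if definition is None:
            # Record the problem and continue with the remaining mappings.
            errors.append(
                f"{row['table_name']}.{row['column_name']}: "
                f"no definition with id {row['definition_id']}"
            )
            continue
        # Identifiers are not quoted here; real names may need backticks.
        statements.append(
            f"ALTER TABLE {row['table_name']} ALTER COLUMN {row['column_name']} "
            f"COMMENT '{escape_sql_string(definition)}'"
        )
    return statements, errors


statements, errors = build_comment_statements(DEFINITIONS_CSV, MAPPINGS_CSV)
```

In a Databricks notebook, each generated statement could then be passed to spark.sql(...), with the collected errors written to the error table mentioned above.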