Databricks

Databricks is a unified data analytics platform that allows for:

Storage of data, both structured and unstructured
The creation of data pipelines to ingest and transform data
The creation of machine learning models
The creation of dashboards and reports
Tracking of data lineage and quality as well as the governance of data

We use Databricks as our primary data analytics platform. All data in ARC should be stored in or connected to Databricks. Everyone in ARC has been granted access to Databricks and should have the permissions required to reach the data they need. If you do not have access to Databricks, please contact the Data Analytics team.

Databricks Getting Started

All users have access to the data catalog in Databricks, which Databricks calls Unity Catalog. The data catalog is a central repository of all data in Databricks. To access the data catalog, follow these steps:

Log in to Databricks using your ARC credentials
Click on the Catalog tab on the left-hand side of the screen
Browse the list of all the data in Databricks, which is shown as a tree. The tree is generally organized into schemas (groupings of tables) and tables. Clicking on a schema shows all the tables in that schema. Clicking on a table shows the details of that table.
Search for the data you need by typing in the search bar at the top of the screen. It is possible to find tables, columns, and schemas by searching.
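The same kind of search can also be run from a notebook. The following is a minimal sketch, not an ARC-approved helper: it assumes the workspace exposes the Unity Catalog `system.information_schema.tables` view and runs a Spark version that supports named parameter markers in `spark.sql` (PySpark 3.4 or later). The function name `build_table_search` is hypothetical and exists only for illustration.

```python
def build_table_search(name_fragment: str, limit: int = 50) -> tuple[str, dict[str, str]]:
    """Return a SQL query and its named arguments for a table-name search.

    The fragment is passed as a named parameter (:pattern) rather than
    being pasted into the SQL text, so user input is never interpreted as SQL.
    """
    fragment = name_fragment.strip().lower()
    if not fragment:
        raise ValueError("name_fragment must not be empty")
    if limit < 1:
        raise ValueError("limit must be at least 1")

    query = (
        "SELECT table_catalog, table_schema, table_name "
        "FROM system.information_schema.tables "  # assumed to be available
        "WHERE lower(table_name) LIKE :pattern "
        "ORDER BY table_catalog, table_schema, table_name "
        f"LIMIT {int(limit)}"
    )
    # Wrap the fragment in % so it matches anywhere in the table name.
    args = {"pattern": f"%{fragment}%"}
    return query, args


query, args = build_table_search("sales", limit=20)
print(query)
print(args)

# In a Databricks notebook, where `spark` is predefined, the query could be run as:
#   display(spark.sql(query, args=args))
```

The results only include tables you have permission to see, so an empty result may mean either that no table matches or that access has not been granted.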

Databricks Compute Optimization

Databricks provides several features to optimize compute resources and improve performance. For details, see:
ARC Databricks Compute Optimization
Comprehensive Guide to Optimize Databricks, Spark and Delta Lake Workloads

Some of the key features include:

Auto-scaling: Automatically adjusts the number of workers in a cluster based on workload
Job Clusters: Dedicated clusters for running scheduled jobs, which can be configured to start and stop automatically
Photon Engine: A query engine that accelerates SQL workloads by optimizing query execution
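As an illustration of how these features fit together, the sketch below builds the kind of cluster definition that a job could use, with auto-scaling bounds and Photon enabled. It is a hedged example rather than ARC's actual configuration: the field names (`autoscale`, `runtime_engine`, `spark_version`, `node_type_id`) follow the Databricks Clusters API, but the runtime version and node type are placeholders, and the function name `build_job_cluster_spec` is hypothetical.

```python
import json


def build_job_cluster_spec(min_workers: int, max_workers: int, use_photon: bool = True) -> dict:
    """Build a cluster definition with auto-scaling and an optional Photon engine."""
    if min_workers < 1:
        raise ValueError("min_workers must be at least 1")
    if max_workers < min_workers:
        raise ValueError("max_workers must be >= min_workers")

    return {
        # Placeholder values: use a runtime version and node type that are
        # actually available in your workspace and cloud provider.
        "spark_version": "15.4.x-scala2.12",
        "node_type_id": "Standard_DS3_v2",
        # Auto-scaling: Databricks adds or removes workers within these bounds.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        # Photon is selected through the runtime engine setting.
        "runtime_engine": "PHOTON" if use_photon else "STANDARD",
    }


spec = build_job_cluster_spec(min_workers=2, max_workers=8)
print(json.dumps(spec, indent=2))
```

A definition like this would typically be supplied as the new-cluster settings of a scheduled job, so that the cluster is created when the job starts and terminated when it finishes. Keeping `min_workers` low limits idle cost, while `max_workers` caps how far the cluster can grow under load.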

If you would like help optimizing your compute resources, please reach out to the Data Analytics team.

Last updated on 18 Aug 2025
Published on 18 Aug 2025