# How to connect to Databricks
## Install the Databricks extra

```shell
pip install "semolina[databricks]"
# or
uv add "semolina[databricks]"
```
The Databricks extra installs `adbc-poolhouse[databricks]`, which provides the ADBC
Databricks driver and connection pooling.
## Configure with `.semolina.toml` (recommended)

Create a `.semolina.toml` file in your project root:

```toml
# .semolina.toml
[connections.default]
type = "databricks"
host = "workspace.cloud.databricks.com"
http_path = "/sql/1.0/warehouses/abc123"
token = "dapi..."
# catalog = ""
# schema = ""
```
| Field | Type | Required | Description |
|---|---|---|---|
| `type` | string | Yes | Must be `"databricks"` |
| `host` | string | Yes | Databricks workspace hostname (e.g. `workspace.cloud.databricks.com`) |
| `http_path` | string | Yes | SQL warehouse HTTP path (e.g. `/sql/1.0/warehouses/abc123`) |
| `token` | string | Yes | Personal access token starting with `dapi` |
| `catalog` | string | No | Unity Catalog name |
| `schema` | string | No | Default schema |
Then load and register the pool:
```python
from semolina import register, pool_from_config

pool, dialect = pool_from_config()
register("default", pool, dialect=dialect)
```
> **Tip:** Use `pool_from_config(connection="analytics")` to load a named connection
> section other than `default`.
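A named connection is just another section in the same file. For example, a sketch with a hypothetical `analytics` warehouse alongside `default` (the second warehouse path is a placeholder):

```toml
# .semolina.toml
[connections.default]
type = "databricks"
host = "workspace.cloud.databricks.com"
http_path = "/sql/1.0/warehouses/abc123"
token = "dapi..."

# A second connection, loaded with pool_from_config(connection="analytics")
[connections.analytics]
type = "databricks"
host = "workspace.cloud.databricks.com"
http_path = "/sql/1.0/warehouses/def456"  # placeholder warehouse path
token = "dapi..."
```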
## Configure manually
When credentials come from a vault or secrets manager, construct the pool directly:
```python
from adbc_poolhouse import DatabricksConfig, create_pool
from semolina import register

config = DatabricksConfig(
    host="workspace.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/abc123",
    token="dapi...",
)
pool = create_pool(config)
register("default", pool, dialect="databricks")
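One pattern for keeping the token out of version control is to read it from an environment variable populated by your secrets manager. This is a sketch: the `DATABRICKS_TOKEN` variable name and the `databricks_params` helper are our conventions for illustration, not part of semolina's API.

```python
import os


def databricks_params() -> dict:
    """Collect Databricks connection parameters, reading the token from
    the environment instead of hardcoding it (illustrative sketch)."""
    token = os.environ.get("DATABRICKS_TOKEN")
    if token is None:
        raise RuntimeError("DATABRICKS_TOKEN is not set")
    return {
        "host": "workspace.cloud.databricks.com",
        "http_path": "/sql/1.0/warehouses/abc123",
        "token": token,
    }
```

The resulting dict can then be expanded into the config constructor, e.g. `DatabricksConfig(**databricks_params())`.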
## Use Unity Catalog three-part names

Databricks uses Unity Catalog
for a three-level namespace: `catalog.schema.view`. Pass a three-part `view=` name in your model:
```python
from semolina import SemanticView, Metric, Dimension


class Sales(SemanticView, view="main.analytics.sales"):
    revenue = Metric()
    country = Dimension()
```
Each part is quoted separately with backticks in generated SQL:
```sql
SELECT MEASURE(`revenue`), `country`
FROM `main`.`analytics`.`sales`
GROUP BY ALL
```
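The quoting rule can be sketched as a small helper (illustrative only; `quote_three_part_name` is not part of semolina's API):

```python
def quote_three_part_name(name: str) -> str:
    # Wrap each dot-separated part in backticks, so catalog, schema,
    # and view are quoted individually rather than as one identifier.
    return ".".join(f"`{part}`" for part in name.split("."))


print(quote_three_part_name("main.analytics.sales"))
# → `main`.`analytics`.`sales`
```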
## Run a query
Once a pool is registered, the query API works the same as any backend:
```python
cursor = (
    Sales.query()
    .metrics(Sales.revenue)
    .dimensions(Sales.country)
    .execute()
)
for row in cursor.fetchall_rows():
    print(row.country, row.revenue)
```
## Generated SQL

Databricks SQL uses `MEASURE()` for metrics and backtick-quoted identifiers:
```sql
SELECT MEASURE(`revenue`), `country`
FROM `sales`
GROUP BY ALL
```
## See also

- How to choose and configure a backend – compare connection patterns
- How to connect to Snowflake – connect to Snowflake semantic views
- How to test application code with MockPool – test queries with `MockPool`