Quickstart¶

This guide walks you from zero to a running clair project.

1. Set up an environment¶

Run clair init — it will prompt for your Snowflake connection details and write ~/.clair/environments.yml:

$ clair init

Project directory [.]: ./my_project

  hint: select concat(current_organization_name(), '-', current_account_name()) as account;
Snowflake account (e.g. myorg-myaccount): myorg-myaccount

  hint: select current_user() as user;
Snowflake user: alice@example.com

Authentication method:
  1. Private key
  2. Password
  3. SSO (externalbrowser)
Enter choice [1]: 3

  hint: select current_warehouse() as warehouse;
Warehouse: my_warehouse

Role (leave blank to use user default): analyst

  hint: select current_region() as region;
Region (e.g. us-east-1): us-east-1

  hint: select current_account() as account_locator;
Account locator (e.g. abc12345): abc12345

What is an example Snowflake table that contains source data? (eg source.orders.raw) [source]: source.products.catalog

  created  /path/to/my_project/source/products/catalog.py
  created  /path/to/my_project/.gitignore
  created  ~/.clair/environments.yml

✓ Project ready.

Next steps:
  1. clair compile --project /path/to/my_project
  2. clair run    --project /path/to/my_project

2. Create a project¶

Each directory level maps to one part of the Snowflake name: database / schema / table.

my_project/
├── source/
│   └── products/
│       ├── catalog.py             # source.products.catalog
│       └── reviews.py             # source.products.reviews
├── refined/
│   └── products/
│       ├── catalog.py             # refined.products.catalog
│       └── reviews.py             # refined.products.reviews
└── derived/
    └── products/
        └── top_reviewed.py        # derived.products.top_reviewed

3. Write your Trouves¶

source/products/catalog.py — declares a pre-existing Snowflake table:

from clair import Column, ColumnType, Trouve, TrouveType

trouve = Trouve(
    type=TrouveType.SOURCE,
    columns=[
        Column(name="product_id",   type=ColumnType.STRING),
        Column(name="name",         type=ColumnType.STRING),
        Column(name="category",     type=ColumnType.STRING),
        Column(name="price",        type=ColumnType.FLOAT),
    ],
)

refined/products/catalog.py — cleans and standardises the source:

from clair import Trouve, TrouveType
from source.products.catalog import trouve as source_catalog

trouve = Trouve(
    type=TrouveType.TABLE,
    sql=f"""
        SELECT
            product_id,
            initcap(name)     AS name,
            lower(category)   AS category,
            price
        FROM {source_catalog}
        WHERE product_id IS NOT NULL
    """,
)

refined/products/reviews.py — cleans the reviews source:

from clair import Trouve, TrouveType
from source.products.reviews import trouve as source_reviews

trouve = Trouve(
    type=TrouveType.TABLE,
    sql=f"""
        SELECT
            review_id,
            product_id,
            rating,
            review_date
        FROM {source_reviews}
        WHERE rating BETWEEN 1 AND 5
    """,
)

derived/products/top_reviewed.py — joins and aggregates the refined layer:

from clair import Trouve, TrouveType
from refined.products.catalog import trouve as catalog
from refined.products.reviews import trouve as reviews

trouve = Trouve(
    type=TrouveType.TABLE,
    sql=f"""
        SELECT
            c.product_id,
            c.name,
            c.category,
            avg(r.rating)  AS avg_rating,
            count(*)       AS review_count
        FROM {catalog} c
        JOIN {reviews} r ON c.product_id = r.product_id
        GROUP BY 1, 2, 3
        HAVING avg_rating >= 4
    """,
)

4. Compile and run¶

# Inspect the generated SQL without touching Snowflake
clair compile --project=my_project

# Execute against Snowflake
clair run --project=my_project --env=dev

Next steps¶

Pandas-native transformations — write pipeline steps as Python functions
Incrementality — APPEND and UPSERT strategies for large tables
Data quality tests — all four test types
Selectors — run only a subset of Trouves
CLI reference — all commands and flags