Update the auto YAML Generation #7725

Open
wants to merge 34 commits into base: main

Commits (34)
1df036d
some changes for connector YAML
royendo Aug 4, 2025
3ff888e
remove
royendo Aug 4, 2025
a381593
separated YAML schema
royendo Aug 4, 2025
b6b987f
nit
royendo Aug 4, 2025
d7d4b11
link fixes
royendo Aug 4, 2025
a3784a0
Update generate_project.go
royendo Aug 4, 2025
f2ed2c4
add sec warning
royendo Aug 4, 2025
d57d430
motherduck live connector
royendo Aug 4, 2025
e5434ee
remove local file from connector
royendo Aug 4, 2025
cd0c819
nit
royendo Aug 4, 2025
d3396eb
matching file names
royendo Aug 4, 2025
16c99c6
reordered files and added sources YAML and model SQL
royendo Aug 4, 2025
6ba0dcd
nit, fix links
royendo Aug 4, 2025
4363fff
comparing with current YAML changes
royendo Aug 5, 2025
64677e0
Merge branch 'main' into docs/auto-gen-YAML
royendo Aug 5, 2025
77f0ca9
adding annotatons back
royendo Aug 5, 2025
7c35559
gofmt, golint
royendo Aug 5, 2025
0ff11e2
fix
royendo Aug 5, 2025
11b2710
first pass fix, need to review a few other items
royendo Aug 6, 2025
db75a6b
remove toplevel driver
royendo Aug 6, 2025
95ca49a
single project file
royendo Aug 6, 2025
98baf56
Update generate_project.go
royendo Aug 6, 2025
7065bcc
good old cursor wiped my functoni
royendo Aug 6, 2025
dba395a
missing parameters
royendo Aug 6, 2025
6e32ddd
midding parameters and inline examples
royendo Aug 6, 2025
c55eff3
god so many false positives from asking cursor to merge files
royendo Aug 6, 2025
6ffe7f8
adding links
royendo Aug 6, 2025
f69c89a
fixed oneOf
royendo Aug 6, 2025
3b6517e
split data properties to have api excamples
royendo Aug 6, 2025
e67f159
gonig through files
royendo Aug 7, 2025
df54d71
returning original format
royendo Aug 7, 2025
f94396e
?
royendo Aug 7, 2025
c58c634
Merge remote-tracking branch 'origin/main' into docs/auto-gen-YAML
royendo Aug 8, 2025
e8c04e9
fixing urls,
royendo Aug 8, 2025
401 changes: 330 additions & 71 deletions cli/cmd/docs/generate_project.go

Large diffs are not rendered by default.

@@ -1,9 +1,25 @@
---
note: GENERATED. DO NOT EDIT.
title: Model YAML
sidebar_position: 38
title: Models YAML
sidebar_position: 34
---

:::tip

Both regular models and source models can use the Model YAML specification described on this page. While [SQL models](./models) are perfect for simple transformations, Model YAML files provide advanced capabilities for complex data processing scenarios.

**When to use Model YAML:**
- **Partitions** - Optimize performance with data partitioning strategies
- **Incremental models** - Process only new or changed data efficiently
- **Pre/post execution hooks** - Run custom logic before or after model execution
- **Staging** - Create intermediate tables for complex transformations
- **Output configuration** - Define specific output formats and destinations

Model YAML files give you fine-grained control over how your data is processed and transformed, making them ideal for production workloads and complex analytics pipelines.

:::


## Properties

### `type`
@@ -24,15 +40,39 @@ _[object]_ - Specifies the refresh schedule that Rill should follow to re-ingest

- **`run_in_dev`** - _[boolean]_ - If true, allows the schedule to run in development mode.

```yaml
refresh:
cron: "* * * * *"
#every: "24h"
```


### `connector`

_[string]_ - Refers to the connector type or [named connector](./connector.md#name) for the source.

_[string]_ - Refers to the connector type and is needed if setting an explicit OLAP engine, e.g. `clickhouse`

### `sql`

_[string]_ - Raw SQL query to run against source _(required)_
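
For example, a minimal model file might pair `sql` with a connector; the DuckDB connector and parquet path below are placeholders:

```yaml
type: model
connector: duckdb
sql: SELECT * FROM read_parquet('data/events/*.parquet')
```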

### `pre_exec`

_[string]_ - Refers to SQL queries to run before the main query, available for DuckDB-based models. (optional).
Ensure pre_exec queries are idempotent. Use IF NOT EXISTS statements when applicable.
```yaml
pre_exec: ATTACH IF NOT EXISTS 'dbname=postgres host=localhost port=5432 user=postgres password=postgres' AS postgres_db (TYPE POSTGRES)
```


### `post_exec`

_[string]_ - Refers to a SQL query that is run after the main query, available for DuckDB-based models. (optional).
Ensure post_exec queries are idempotent. Use IF EXISTS statements when applicable.
```yaml
post_exec: DETACH DATABASE IF EXISTS postgres_db
```


### `timeout`

_[string]_ - The maximum time to wait for model ingestion
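
For instance, assuming the field accepts a duration string like the other schedule fields on this page (the value below is illustrative):

```yaml
timeout: 30m
```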
@@ -81,6 +121,12 @@ _[oneOf]_ - Refers to the explicitly defined state of your model, cannot be used

- **`where_error`** - _[boolean]_ - Indicates whether the condition should trigger when the resource is in an error state.

```yaml
state:
sql: SELECT MAX(date) as max_date
```


### `partitions`

_[oneOf]_ - Refers to how your data is partitioned, cannot be used with state. (optional)
Expand Down Expand Up @@ -117,6 +163,17 @@ _[oneOf]_ - Refers to the how your data is partitioned, cannot be used with stat

- **`where_error`** - _[boolean]_ - Indicates whether the condition should trigger when the resource is in an error state.

```yaml
partitions:
glob: gcs://my_bucket/y=*/m=*/d=*/*.parquet
```
```yaml
partitions:
connector: duckdb
sql: SELECT range AS num FROM range(0,10)
```


### `materialize`

_[boolean]_ - When true, the model will be materialized in the OLAP engine
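
For example, to materialize the model as a table in the OLAP engine:

```yaml
materialize: true
```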
@@ -135,6 +192,15 @@ _[object]_ - in the case of staging models, where an input source does not suppo

- **`connector`** - _[string]_ - Refers to the connector type for the staging table _(required)_

- **`path`** - _[string]_ - Refers to the path to the staging table

```yaml
stage:
connector: s3
path: s3://my_bucket/my_staging_table
```


### `output`

_[object]_ - Defines the properties of the model output
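
As a hedged sketch, a model that writes to a separate OLAP engine might declare an output like the one below; the `incremental_strategy` and `unique_key` properties are illustrative and not listed in this excerpt, so confirm them against the full output schema:

```yaml
output:
  connector: clickhouse
  incremental_strategy: merge
  unique_key: [id]
```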
@@ -201,198 +267,14 @@ _[object]_ - Overrides any properties in development environment.

_[object]_ - Overrides any properties in production environment.
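
A common use of these overrides is to limit data scanned during development; the model name and limit below are placeholders, and the `dev`/`prod` keys are assumed to be the top-level override blocks described here:

```yaml
sql: SELECT * FROM events
dev:
  sql: SELECT * FROM events LIMIT 10000
```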

## Additional properties when `connector` is `athena` or [named connector](./connector.md#name) for athena

### `output_location`

_[string]_ - Output location for query results in S3.

### `workgroup`

_[string]_ - AWS Athena workgroup to use for queries.

### `region`

_[string]_ - AWS region to connect to Athena and the output location.
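
Put together, an Athena-backed model might look like the following sketch (the bucket, workgroup, and region values are placeholders):

```yaml
connector: athena
sql: SELECT * FROM my_database.my_table
output_location: s3://my-bucket/athena-output/
workgroup: primary
region: us-east-1
```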

## Additional properties when `connector` is `azure` or [named connector](./connector.md#name) of azure

### `path`

_[string]_ - Path to the source

### `account`

_[string]_ - Account identifier

### `uri`

_[string]_ - Source URI

### `extract`

_[object]_ - Arbitrary key-value pairs for extraction settings

### `glob`

_[object]_ - Settings related to glob file matching.

- **`max_total_size`** - _[integer]_ - Maximum total size (in bytes) matched by glob

- **`max_objects_matched`** - _[integer]_ - Maximum number of objects matched by glob

- **`max_objects_listed`** - _[integer]_ - Maximum number of objects listed in glob

- **`page_size`** - _[integer]_ - Page size for glob listing

### `batch_size`

_[string]_ - Size of a batch (e.g., '100MB')

## Additional properties when `connector` is `bigquery` or [named connector](./connector.md#name) of bigquery

### `project_id`

_[string]_ - ID of the BigQuery project.
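
For example (the project, dataset, and table names are placeholders):

```yaml
connector: bigquery
project_id: my-gcp-project
sql: SELECT * FROM my_dataset.my_table
```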

## Additional properties when `connector` is `duckdb` or [named connector](./connector.md#name) of duckdb

### `path`
## Depending on the connector, additional properties may be required

_[string]_ - Path to the data source.
Depending on the connector, additional properties may be required. For more information, see the [connectors](./connectors.md) documentation

### `format`

_[string]_ - Format of the data source (e.g., csv, json, parquet).

### `pre_exec`

_[string]_ - refers to SQL queries to run before the main query, available for DuckDB-based models. _(optional)_. Ensure `pre_exec` queries are idempotent. Use `IF NOT EXISTS` statements when applicable.

### `post_exec`

_[string]_ - refers to a SQL query that is run after the main query, available for DuckDB-based models. _(optional)_. Ensure `post_exec` queries are idempotent. Use `IF EXISTS` statements when applicable.
## Examples

### Incremental model
```yaml
pre_exec: ATTACH IF NOT EXISTS 'dbname=postgres host=localhost port=5432 user=postgres password=postgres' AS postgres_db (TYPE POSTGRES);
sql: SELECT * FROM postgres_query('postgres_db', 'SELECT * FROM USERS')
post_exec: DETACH DATABASE IF EXISTS postgres_db
```

## Additional properties when `connector` is `gcs` or [named connector](./connector.md#name) of gcs

### `path`

_[string]_ - Path to the source

### `uri`

_[string]_ - Source URI

### `extract`

_[object]_ - key-value pairs for extraction settings

### `glob`

_[object]_ - Settings related to glob file matching.

- **`max_total_size`** - _[integer]_ - Maximum total size (in bytes) matched by glob

- **`max_objects_matched`** - _[integer]_ - Maximum number of objects matched by glob

- **`max_objects_listed`** - _[integer]_ - Maximum number of objects listed in glob

- **`page_size`** - _[integer]_ - Page size for glob listing

### `batch_size`

_[string]_ - Size of a batch (e.g., '100MB')

## Additional properties when `connector` is `local_file` or [named connector](./connector.md#name) of local_file

### `path`

_[string]_ - Path to the data source.

### `format`

_[string]_ - Format of the data source (e.g., csv, json, parquet).

## Additional properties when `connector` is `redshift` or [named connector](./connector.md#name) of redshift

### `output_location`

_[string]_ - S3 location where query results are stored.

### `workgroup`

_[string]_ - Redshift Serverless workgroup to use.

### `database`

_[string]_ - Name of the Redshift database.

### `cluster_identifier`

_[string]_ - Identifier of the Redshift cluster.

### `role_arn`

_[string]_ - ARN of the IAM role to assume for Redshift access.

### `region`

_[string]_ - AWS region of the Redshift deployment.
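
A hedged example combining these properties for a serverless workgroup (all values are placeholders; a provisioned cluster would typically use `cluster_identifier` rather than `workgroup`):

```yaml
connector: redshift
sql: SELECT * FROM public.events
output_location: s3://my-bucket/redshift-unload/
workgroup: my-workgroup
database: dev
region: us-west-2
```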

## Additional properties when `connector` is `s3` or [named connector](./connector.md#name) of s3

### `region`

_[string]_ - AWS region

### `endpoint`

_[string]_ - AWS Endpoint

### `path`

_[string]_ - Path to the source

### `uri`

_[string]_ - Source URI

### `extract`

_[object]_ - key-value pairs for extraction settings

### `glob`

_[object]_ - Settings related to glob file matching.

- **`max_total_size`** - _[integer]_ - Maximum total size (in bytes) matched by glob

- **`max_objects_matched`** - _[integer]_ - Maximum number of objects matched by glob

- **`max_objects_listed`** - _[integer]_ - Maximum number of objects listed in glob

- **`page_size`** - _[integer]_ - Page size for glob listing

### `batch_size`

_[string]_ - Size of a batch (e.g., '100MB')
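
For instance, an S3-backed model with glob limits might look like this sketch (the bucket, region, and limit values are placeholders):

```yaml
connector: s3
path: s3://my-bucket/data/*.parquet
region: us-east-1
glob:
  max_total_size: 1073741824
  max_objects_matched: 1000
```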

## Additional properties when `connector` is `salesforce` or [named connector](./connector.md#name) of salesforce

### `soql`

_[string]_ - SOQL query to execute against the Salesforce instance.

### `sobject`

_[string]_ - Salesforce object (e.g., Account, Contact) targeted by the query.

### `queryAll`

_[boolean]_ - Whether to include deleted and archived records in the query (uses queryAll API).
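
For example (the object and fields are placeholders, and combining `soql` and `sobject` in one model is an assumption based on the properties above):

```yaml
connector: salesforce
sobject: Opportunity
soql: SELECT Id, Name, Amount FROM Opportunity
queryAll: true
```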
28 changes: 19 additions & 9 deletions docs/docs/hidden/yaml/alert.md → docs/docs/hidden/yaml/alerts.md
@@ -1,7 +1,7 @@
---
note: GENERATED. DO NOT EDIT.
title: Alert YAML
sidebar_position: 31
sidebar_position: 38
---

Along with alerts at the dashboard level, which can be created via the UI, you may want to develop more extensive alerting, and this can be done with an alert YAML file. When creating an alert via a YAML file, you'll see it denoted in the UI as `Created through code`.
@@ -12,13 +12,15 @@ Along with alertings at the dashboard level and can be created via the UI, there

_[string]_ - Refers to the resource type and must be `alert` _(required)_

### `display_name`

_[string]_ - Refers to the display name for the alert

### `refresh`

_[object]_ - Specifies the refresh schedule that Rill should follow to re-ingest and update the underlying data _(required)_
_[object]_ - Refresh schedule for the alert
```yaml
refresh:
cron: "* * * * *"
#every: "24h"
```
_(required)_

- **`cron`** - _[string]_ - A cron expression that defines the execution schedule

@@ -30,6 +32,14 @@ _[object]_ - Specifies the refresh schedule that Rill should follow to re-ingest

- **`run_in_dev`** - _[boolean]_ - If true, allows the schedule to run in development mode.

### `display_name`

_[string]_ - Display name for the alert

### `description`

_[string]_ - Description for the alert

### `intervals`

_[object]_ - Defines the interval of the alert to check
@@ -42,15 +52,15 @@ _[object]_ - define the interval of the alert to check

### `watermark`

_[string]_ - Specifies how the watermark is determined for incremental processing. Use 'trigger_time' to set it at runtime or 'inherit' to use the upstream model's watermark.

### `timeout`

_[string]_ - Defines the timeout of the alert in seconds (optional).

### `data`

_[oneOf]_ - Specifies one of the options to retrieve or compute the data used by alert _(required)_
_[oneOf]_ - Data source for the alert _(required)_

- **option 1** - _[object]_ - Executes a raw SQL query against the project's data models.
Reviewer comment (Contributor):

same here. we should not remove options.


@@ -122,7 +132,7 @@ _[string]_ - Defines the re-notification interval for the alert (e.g., '10m','24

### `notify`

_[object]_ - Defines how and where to send notifications. At least one method (email or Slack) is required. _(required)_
_[object]_ - Notification configuration _(required)_

- **`email`** - _[object]_ - Send notifications via email.
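
As an illustration, assuming the `email` object accepts a list of recipients (a property not shown in this excerpt):

```yaml
notify:
  email:
    recipients:
      - team@example.com
```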
