Update the auto YAML Generation #7725

Open
wants to merge 34 commits into
base: main
1df036d
some changes for connector YAML
royendo Aug 4, 2025
3ff888e
remove
royendo Aug 4, 2025
a381593
separated YAML schema
royendo Aug 4, 2025
b6b987f
nit
royendo Aug 4, 2025
d7d4b11
link fixes
royendo Aug 4, 2025
a3784a0
Update generate_project.go
royendo Aug 4, 2025
f2ed2c4
add sec warning
royendo Aug 4, 2025
d57d430
motherduck live connector
royendo Aug 4, 2025
e5434ee
remove local file from connector
royendo Aug 4, 2025
cd0c819
nit
royendo Aug 4, 2025
d3396eb
matching file names
royendo Aug 4, 2025
16c99c6
reordered files and added sources YAML and model SQL
royendo Aug 4, 2025
6ba0dcd
nit, fix links
royendo Aug 4, 2025
4363fff
comparing with current YAML changes
royendo Aug 5, 2025
64677e0
Merge branch 'main' into docs/auto-gen-YAML
royendo Aug 5, 2025
77f0ca9
adding annotatons back
royendo Aug 5, 2025
7c35559
gofmt, golint
royendo Aug 5, 2025
0ff11e2
fix
royendo Aug 5, 2025
11b2710
first pass fix, need to review a few other items
royendo Aug 6, 2025
db75a6b
remove toplevel driver
royendo Aug 6, 2025
95ca49a
single project file
royendo Aug 6, 2025
98baf56
Update generate_project.go
royendo Aug 6, 2025
7065bcc
good old cursor wiped my functoni
royendo Aug 6, 2025
dba395a
missing parameters
royendo Aug 6, 2025
6e32ddd
midding parameters and inline examples
royendo Aug 6, 2025
c55eff3
god so many false positives from asking cursor to merge files
royendo Aug 6, 2025
6ffe7f8
adding links
royendo Aug 6, 2025
f69c89a
fixed oneOf
royendo Aug 6, 2025
3b6517e
split data properties to have api excamples
royendo Aug 6, 2025
e67f159
gonig through files
royendo Aug 7, 2025
df54d71
returning original format
royendo Aug 7, 2025
f94396e
?
royendo Aug 7, 2025
c58c634
Merge remote-tracking branch 'origin/main' into docs/auto-gen-YAML
royendo Aug 8, 2025
e8c04e9
fixing urls,
royendo Aug 8, 2025
single project file
all schemas into one including rillyaml. commented out lines.

reworked generate-project.go:
- moved out logic for sample in copy, uses schema sample instead
- adding in line examples
royendo committed Aug 6, 2025
commit 95ca49a0c772a062f508556d47e676e2ca4921b3
321 changes: 118 additions & 203 deletions cli/cmd/docs/generate_project.go

Large diffs are not rendered by default.

282 changes: 41 additions & 241 deletions docs/docs/hidden/yaml/advanced-models.md
@@ -29,6 +29,12 @@ _[string]_ - Refers to the resource type and must be `model` _(required)_
### `refresh`

_[object]_ - Specifies the refresh schedule that Rill should follow to re-ingest and update the underlying model data
```yaml
refresh:
  cron: "* * * * *"
  #every: "24h"
```


- **`cron`** - _[string]_ - A cron expression that defines the execution schedule

@@ -64,53 +70,65 @@ _[string]_ - Configure how changes to the model specifications are applied (opti

_[oneOf]_ - Refers to the explicitly defined state of your model; cannot be used with `partitions`. (optional)

- **`sql`** - _[string]_ - Raw SQL query to run against existing models in the project. _(required)_
- **`sql`** - _[string]_ - Raw SQL query to run against existing models in the project. _(required)_

- **`connector`** - _[string]_ - specifies the connector to use when running SQL or glob queries.

- **`metrics_sql`** - _[string]_ - SQL query that targets a metrics view in the project _(required)_

- **`connector`** - _[string]_ - specifies the connector to use when running SQL or glob queries.
- **`api`** - _[string]_ - Name of a custom API defined in the project. _(required)_

- **`metrics_sql`** - _[string]_ - SQL query that targets a metrics view in the project _(required)_
- **`args`** - _[object]_ - Arguments to pass to the custom API.

- **`api`** - _[string]_ - Name of a custom API defined in the project. _(required)_
- **`glob`** - _[anyOf]_ - Defines the file path or pattern to query from the specified connector. _(required)_

- **`args`** - _[object]_ - Arguments to pass to the custom API.
- **option 1** - _[string]_ - A simple file path/glob pattern as a string.

- **`glob`** - _[anyOf]_ - Defines the file path or pattern to query from the specified connector. _(required)_
- **option 2** - _[object]_ - An object-based configuration for specifying a file path/glob pattern with advanced options.

- **option 1** - _[string]_ - A simple file path/glob pattern as a string.
- **`connector`** - _[string]_ - Specifies the connector to use with the glob input.

- **option 2** - _[object]_ - An object-based configuration for specifying a file path/glob pattern with advanced options.
- **`resource_status`** - _[object]_ - Based on resource status _(required)_

- **`connector`** - _[string]_ - Specifies the connector to use with the glob input.
- **`where_error`** - _[boolean]_ - Indicates whether the condition should trigger when the resource is in an error state.

- **`resource_status`** - _[object]_ - Based on resource status _(required)_
```yaml
resource_status:
  where_error: true
```

- **`where_error`** - _[boolean]_ - Indicates whether the condition should trigger when the resource is in an error state.
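
A minimal sketch of the `sql` variant of `state` listed above. It assumes a hypothetical upstream model named `events_model` with an `event_time` column; neither name comes from this PR. As the description notes, `state` is not combined with `partitions`.

```yaml
# Hypothetical model file; `events_model` and `event_time` are placeholder names.
type: model

state:
  # Raw SQL run against an existing model in the project
  sql: SELECT MAX(event_time) AS max_event_time FROM events_model
```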

### `partitions`

_[oneOf]_ - Refers to how your data is partitioned; cannot be used with `state`. (optional)

- **`sql`** - _[string]_ - Raw SQL query to run against existing models in the project. _(required)_
- **`sql`** - _[string]_ - Raw SQL query to run against existing models in the project. _(required)_

- **`connector`** - _[string]_ - specifies the connector to use when running SQL or glob queries.
- **`connector`** - _[string]_ - specifies the connector to use when running SQL or glob queries.

- **`metrics_sql`** - _[string]_ - SQL query that targets a metrics view in the project _(required)_
- **`metrics_sql`** - _[string]_ - SQL query that targets a metrics view in the project _(required)_

- **`api`** - _[string]_ - Name of a custom API defined in the project. _(required)_
- **`api`** - _[string]_ - Name of a custom API defined in the project. _(required)_

- **`args`** - _[object]_ - Arguments to pass to the custom API.
- **`args`** - _[object]_ - Arguments to pass to the custom API.

- **`glob`** - _[anyOf]_ - Defines the file path or pattern to query from the specified connector. _(required)_
- **`glob`** - _[anyOf]_ - Defines the file path or pattern to query from the specified connector. _(required)_

- **option 1** - _[string]_ - A simple file path/glob pattern as a string.
- **option 1** - _[string]_ - A simple file path/glob pattern as a string.

- **option 2** - _[object]_ - An object-based configuration for specifying a file path/glob pattern with advanced options.
- **option 2** - _[object]_ - An object-based configuration for specifying a file path/glob pattern with advanced options.

- **`connector`** - _[string]_ - Specifies the connector to use with the glob input.
- **`connector`** - _[string]_ - Specifies the connector to use with the glob input.

- **`resource_status`** - _[object]_ - Based on resource status _(required)_
- **`resource_status`** - _[object]_ - Based on resource status _(required)_

- **`where_error`** - _[boolean]_ - Indicates whether the condition should trigger when the resource is in an error state.

```yaml
resource_status:
  where_error: true
```

- **`where_error`** - _[boolean]_ - Indicates whether the condition should trigger when the resource is in an error state.
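
As an illustration of the `glob` variant of `partitions`, the sketch below uses option 1, a plain path/glob string. The bucket path is a made-up placeholder, and the object form (option 2) is left out because its keys beyond `connector` are not shown in this diff.

```yaml
# Hypothetical model file; the bucket path is a placeholder, not a real location.
type: model

partitions:
  # Option 1: a simple file path/glob pattern as a string
  glob: gs://example-bucket/data/*.parquet
```

The `sql`, `metrics_sql`, `api`, and `resource_status` variants would take the place of `glob` here, since `partitions` is a `oneOf`.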

### `materialize`

@@ -146,38 +164,6 @@ _[object]_ - to define the properties of output

- **`partition_by`** - _[string]_ - Column or expression to partition the table by

**Additional properties for `output` when `connector` is `clickhouse`**

- **`type`** - _[string]_ - Type to materialize the model into. Can be 'TABLE', 'VIEW' or 'DICTIONARY'

- **`columns`** - _[string]_ - Column names and types. Can also include indexes. If unspecified, detected from the query.

- **`engine_full`** - _[string]_ - Full engine definition in SQL format. Can include partition keys, order, TTL, etc.

- **`engine`** - _[string]_ - Table engine to use. Default is MergeTree

- **`order_by`** - _[string]_ - ORDER BY clause.

- **`partition_by`** - _[string]_ - Partition BY clause.

- **`primary_key`** - _[string]_ - PRIMARY KEY clause.

- **`sample_by`** - _[string]_ - SAMPLE BY clause.

- **`ttl`** - _[string]_ - TTL settings for the table or columns.

- **`table_settings`** - _[string]_ - Table-specific settings.

- **`query_settings`** - _[string]_ - Settings used in insert/create table as select queries.

- **`distributed_settings`** - _[string]_ - Settings for distributed table.

- **`distributed_sharding_key`** - _[string]_ - Sharding key for distributed table.

- **`dictionary_source_user`** - _[string]_ - User for accessing the source dictionary table (used if type is DICTIONARY).

- **`dictionary_source_password`** - _[string]_ - Password for the dictionary source user.

## Common Properties

### `name`
@@ -194,190 +180,4 @@ _[object]_ - Overrides any properties in development environment.

### `prod`

_[object]_ - Overrides any properties in production environment.

## Additional properties when `connector` is [`athena`](./connectors#athena)

### `output_location`

_[string]_ - Output location for query results in S3.

### `workgroup`

_[string]_ - AWS Athena workgroup to use for queries.

### `region`

_[string]_ - AWS region to connect to Athena and the output location.

## Additional properties when `connector` is [`azure`](./connectors#azure)

### `path`

_[string]_ - Path to the source

### `account`

_[string]_ - Account identifier

### `uri`

_[string]_ - Source URI

### `extract`

_[object]_ - Arbitrary key-value pairs for extraction settings

### `glob`

_[object]_ - Settings related to glob file matching.

- **`max_total_size`** - _[integer]_ - Maximum total size (in bytes) matched by glob

- **`max_objects_matched`** - _[integer]_ - Maximum number of objects matched by glob

- **`max_objects_listed`** - _[integer]_ - Maximum number of objects listed in glob

- **`page_size`** - _[integer]_ - Page size for glob listing

### `batch_size`

_[string]_ - Size of a batch (e.g., '100MB')

## Additional properties when `connector` is [`bigquery`](./connectors#bigquery)

### `project_id`

_[string]_ - ID of the BigQuery project.

## Additional properties when `connector` is [`duckdb`](./connectors#duckdb)

### `path`

_[string]_ - Path to the data source.

### `format`

_[string]_ - Format of the data source (e.g., csv, json, parquet).

### `pre_exec`

_[string]_ - refers to SQL queries to run before the main query, available for DuckDB-based models. _(optional)_. Ensure `pre_exec` queries are idempotent. Use `IF NOT EXISTS` statements when applicable.

### `post_exec`

_[string]_ - refers to a SQL query that is run after the main query, available for DuckDB-based models. _(optional)_. Ensure `post_exec` queries are idempotent. Use `IF EXISTS` statements when applicable.

```yaml
pre_exec: ATTACH IF NOT EXISTS 'dbname=postgres host=localhost port=5432 user=postgres password=postgres' AS postgres_db (TYPE POSTGRES);
sql: SELECT * FROM postgres_query('postgres_db', 'SELECT * FROM USERS')
post_exec: DETACH DATABASE IF EXISTS postgres_db
```

## Additional properties when `connector` is [`gcs`](./connectors#gcs)

### `path`

_[string]_ - Path to the source

### `uri`

_[string]_ - Source URI

### `extract`

_[object]_ - key-value pairs for extraction settings

### `glob`

_[object]_ - Settings related to glob file matching.

- **`max_total_size`** - _[integer]_ - Maximum total size (in bytes) matched by glob

- **`max_objects_matched`** - _[integer]_ - Maximum number of objects matched by glob

- **`max_objects_listed`** - _[integer]_ - Maximum number of objects listed in glob

- **`page_size`** - _[integer]_ - Page size for glob listing

### `batch_size`

_[string]_ - Size of a batch (e.g., '100MB')

## Additional properties when `connector` is [`redshift`](./connectors#redshift)

### `output_location`

_[string]_ - S3 location where query results are stored.

### `workgroup`

_[string]_ - Redshift Serverless workgroup to use.

### `database`

_[string]_ - Name of the Redshift database.

### `cluster_identifier`

_[string]_ - Identifier of the Redshift cluster.

### `role_arn`

_[string]_ - ARN of the IAM role to assume for Redshift access.

### `region`

_[string]_ - AWS region of the Redshift deployment.

## Additional properties when `connector` is [`s3`](./connectors#s3)

### `region`

_[string]_ - AWS region

### `endpoint`

_[string]_ - AWS Endpoint

### `path`

_[string]_ - Path to the source

### `uri`

_[string]_ - Source URI

### `extract`

_[object]_ - key-value pairs for extraction settings

### `glob`

_[object]_ - Settings related to glob file matching.

- **`max_total_size`** - _[integer]_ - Maximum total size (in bytes) matched by glob

- **`max_objects_matched`** - _[integer]_ - Maximum number of objects matched by glob

- **`max_objects_listed`** - _[integer]_ - Maximum number of objects listed in glob

- **`page_size`** - _[integer]_ - Page size for glob listing

### `batch_size`

_[string]_ - Size of a batch (e.g., '100MB')

## Additional properties when `connector` is [`salesforce`](./connectors#salesforce)

### `soql`

_[string]_ - SOQL query to execute against the Salesforce instance.

### `sobject`

_[string]_ - Salesforce object (e.g., Account, Contact) targeted by the query.

### `queryAll`

_[boolean]_ - Whether to include deleted and archived records in the query (uses queryAll API).
_[object]_ - Overrides any properties in production environment.