Problem
Right now, in Airflow, we (Data Engineering) define our HQL dependencies via direct HDFS paths.
These HQL dependencies are not managed by our Airflow instance's artifact module,
meaning they are not automatically synced to HDFS for operators to use at Airflow deployment time.
Thus, we need to deploy them manually via a refinery deployment and make sure the HQL file paths match.
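For illustration, the status quo looks roughly like the sketch below (the path, operator name, and parameter are hypothetical, not taken from our actual job definitions):

```
# Minimal sketch of the current situation: the HQL file lives at a hard-coded
# HDFS path that a separate refinery deployment must have put there; Airflow's
# artifact module knows nothing about it. Path and operator are hypothetical.

HQL_QUERY_PATH = "hdfs:///wmf/refinery/current/hql/example/example_query.hql"  # hypothetical path

# run_query = SomeHqlOperator(       # whichever operator executes the query
#     task_id="run_example_query",
#     query_path=HQL_QUERY_PATH,     # breaks if refinery deploys to a different path
# )
```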
Why are we doing this?
We could already define an artifact for each HQL file, and that would work without the extra refinery deployment.
However, this would require defining an artifact for every HQL file used by our jobs, which would probably amount to a couple hundred artifacts (see the sketch below).
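A rough sketch of what per-file artifact definitions could look like; the names, URLs, and config shape are made up for illustration and do not reflect our actual artifact configuration:

```
# Hypothetical sketch: one artifact entry per HQL file quickly adds up.
hql_artifacts = {
    "example_query_1.hql": "https://gitlab.example.org/hql-repo/-/raw/main/hql/example_query_1.hql",
    "example_query_2.hql": "https://gitlab.example.org/hql-repo/-/raw/main/hql/example_query_2.hql",
    # ... one entry for each of the ~couple hundred HQL files in our jobs
}
```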
Better solution
Instead, we could add support for treating full repositories as Airflow artifacts. We'd register the repository URL as, say, a git artifact,
and Airflow would sync (clone?) the repo to HDFS, as it does with the other artifacts.
Then, from the DAG code, we could use the reference to that artifact as a base path and append the relative path of the desired query, as sketched below.
This way we'd have far fewer artifacts defined, while still letting Airflow manage our HQL dependencies (no refinery deployment needed).
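A minimal sketch of the proposed usage, assuming a single repo-level artifact; the artifact name, helper function, and paths are assumptions for illustration, not an existing API:

```
# Hypothetical sketch of the repo-as-artifact idea.

def artifact(name: str) -> str:
    """Hypothetical helper: would resolve an artifact name to the HDFS base
    path it was synced (cloned) to at Airflow deployment time."""
    # Illustrative stand-in; the real artifact module would do the resolution.
    return f"hdfs:///user/airflow/artifacts/{name}"

# 1. The whole HQL repository is registered once as a git artifact,
#    e.g. "hql-repo" -> https://gitlab.example.org/hql-repo.git (hypothetical),
#    and Airflow syncs it to HDFS like any other artifact.

# 2. DAG code then uses the artifact's HDFS location as a base path and
#    appends the relative path of the query it needs:
hql_repo_base = artifact("hql-repo")
query_path = f"{hql_repo_base}/hql/example/example_query.hql"

# run_query = SomeHqlOperator(task_id="run_example_query", query_path=query_path)
```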