Migrate these queries to Airflow.
Data is used to populate:
https://analytics.wikimedia.org/dashboards/browsers
Done is:
- Queries are executed via Airflow
- Report updater job is turned off
Status | Assigned | Task
---|---|---
Resolved | None | T307500 Airflow Hackathon (May 2022)
Open | None | T307540 [Airflow Migration] Migrate reportupdater jobs
Resolved | amastilovic | T354552 [Maintenance] Migrate ReportUpdater browser queries to Airflow
The suggested approach is to use Spark to run the queries and save the results in the cluster. However, Spark writes its output as a folder of files, and we don't want a separate folder for each query result. We want all the output files (reports) in one location, which is already rsynced to the report server. We would therefore use our hdfsarchive operator to move the generated output from the Spark output path to the final destination.
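To make the "Spark writes a folder, we want one file" step concrete, here is a minimal sketch of what the archive step has to do, run against the local filesystem rather than HDFS. It is not the hdfsarchive operator itself (its interface is not shown here); the function name `archive_single_output` and the example paths are hypothetical. It assumes the Spark job was coalesced to a single partition and written with Spark's CSV writer, which produces one `part-*.csv` file plus marker files such as `_SUCCESS`.

```python
import shutil
import tempfile
from pathlib import Path


def archive_single_output(spark_output_dir: str, destination_file: str,
                          expected_ext: str = ".csv") -> Path:
    """Move the single data file Spark wrote into spark_output_dir to destination_file.

    Hypothetical helper: a local-filesystem stand-in for the HDFS archive step.
    Spark writes a directory (part-*.csv plus markers like _SUCCESS), so we pick
    out the one part file, move it to the shared report location, and delete
    the leftover Spark folder.
    """
    out_dir = Path(spark_output_dir)
    parts = sorted(
        p for p in out_dir.iterdir()
        if p.is_file() and p.name.startswith("part-") and p.name.endswith(expected_ext)
    )
    # Refuse to guess if the job was not coalesced to exactly one output file.
    if len(parts) != 1:
        raise ValueError(f"expected exactly one part file in {out_dir}, found {len(parts)}")

    dest = Path(destination_file)
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.unlink(missing_ok=True)  # overwrite a previous run's report, if any
    shutil.move(str(parts[0]), str(dest))
    shutil.rmtree(out_dir)  # drop _SUCCESS and any other Spark leftovers
    return dest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Simulate a Spark output folder (hypothetical names).
        spark_dir = Path(tmp, "spark_out")
        spark_dir.mkdir()
        (spark_dir / "_SUCCESS").touch()
        (spark_dir / "part-00000-abc-c000.csv").write_text("date\tbrowser\tview_count\n")

        final = archive_single_output(
            str(spark_dir), str(Path(tmp, "reports", "browser", "all_sites.tsv"))
        )
        print(final.relative_to(tmp))
```

The design choice is to fail loudly when there is not exactly one part file: silently picking one of several partitions would publish a truncated report to the rsynced location.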
To start, we will migrate the queries in the browser folder.
Change 995271 had a related patch set uploaded (by Snwachukwu; author: Snwachukwu):
[analytics/refinery/source@master] Add Dynamic Pivot job for reportupdater reports
Change 995740 had a related patch set uploaded (by Snwachukwu; author: Snwachukwu):
[analytics/refinery@master] Add Reportupdater Browser All Sites Queries.
We added the following:
ebysans opened https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/615
Add dag for browser All Sites ReportUpdater.
ebysans merged https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/615
Add dag for browser All Sites ReportUpdater.
Change 995740 merged by Snwachukwu:
[analytics/refinery@master] Add Reportupdater Browser All Sites Queries.
Change 1005129 had a related patch set uploaded (by Snwachukwu; author: Snwachukwu):
[analytics/refinery@master] Add spark sql version of browser mobile and desktop querries. 1. Modify browser all_site queries. 2. Add browser desktop and mobile queries.
Change 1005129 merged by Snwachukwu:
[analytics/refinery@master] Add spark sql version of browser mobile and desktop querries. 1. Modify browser all_site queries. 2. Add browser desktop and mobile queries.
ebysans opened https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/622
Update browser metric airflow job to use DataPivoter
Change 995271 merged by jenkins-bot:
[analytics/refinery/source@master] Add DataPivoter job
ebysans merged https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/622
Update browser metric airflow job to use DataPivoter
joal opened https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/666
Update analytics browser_metrics dag
joal closed https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/666
Update analytics browser_metrics dag
joal reopened https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/666
Update analytics browser_metrics dag
joal merged https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/666
Update analytics browser_metrics dag