
[Maintenance] Migrate ReportUpdater browser queries to Airflow
Closed, Resolved | Public | 8 Estimated Story Points

Description

Migrate these queries to Airflow.

https://gerrit.wikimedia.org/r/plugins/gitiles/analytics/reportupdater-queries/+/refs/heads/master/browser/

Data is used to populate:

https://analytics.wikimedia.org/dashboards/browsers

Done is:

  • Queries are executed via Airflow
  • Report updater job is turned off

Event Timeline

Ahoelzl renamed this task from Migrate browser queries to Airflow to [Maintenance] Migrate ReportUpdater browser queries to Airflow. Jan 9 2024, 5:43 PM

The suggested approach is to run the queries with Spark and save the results in the cluster. However, Spark writes its output as files inside a folder, and we don't want a separate folder for each query result. We want to put all the output files (reports) in one location, which is already rsynced to the report server. We will therefore use our hdfsarchive operator to move the generated output from the Spark output path to the final destination.
To start, we will migrate the queries in the browser folder.
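
The move step described above can be sketched roughly as follows. This is a minimal illustration on a local filesystem, not the actual hdfsarchive operator: the function name `archive_single_part_file`, the assumption that each query produces exactly one `part-*` file, and the example paths are all hypothetical.

```python
import shutil
from pathlib import Path


def archive_single_part_file(spark_output_dir: Path, final_file: Path) -> Path:
    """Move the single part file from a Spark-style output folder to final_file.

    Assumption: the job wrote exactly one `part-*` file (plus markers such as
    `_SUCCESS`). The temporary output folder is removed afterwards.
    """
    part_files = sorted(
        p for p in spark_output_dir.iterdir() if p.name.startswith("part-")
    )
    if len(part_files) != 1:
        raise ValueError(
            f"expected exactly one part file in {spark_output_dir}, "
            f"found {len(part_files)}"
        )

    # All reports end up side by side in one shared destination folder.
    final_file.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(part_files[0]), str(final_file))

    # Drop the per-query folder so only the flat report file remains.
    shutil.rmtree(spark_output_dir)
    return final_file
```

Called once per query, for example `archive_single_part_file(Path("/tmp/spark_out/all_sites_by_os"), Path("/srv/reports/browser/all_sites_by_os.tsv"))` (both paths made up for illustration), this leaves one flat file per report in the location that is already rsynced.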

Change 995271 had a related patch set uploaded (by Snwachukwu; author: Snwachukwu):

[analytics/refinery/source@master] Add Dynamic Pivot job for reportupdater reports

https://gerrit.wikimedia.org/r/995271

Change 995740 had a related patch set uploaded (by Snwachukwu; author: Snwachukwu):

[analytics/refinery@master] Add Reportupdater Browser All Sites Queries.

https://gerrit.wikimedia.org/r/995740

We added the following:

  • A Spark Scala job that performs a dynamic pivot, because some reports need to be pivoted.
  • A UDF that returns the start of the week for a given date. It will be used in the queries, since we recompute the aggregates from the very beginning each time we generate reports.
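
To illustrate the two ideas, here is a small sketch in plain Python. It is not the merged Scala job or UDF: the names `start_of_week` and `pivot`, the Monday default for the first day of the week, and the fill value for missing cells are assumptions made for this example.

```python
from datetime import date, timedelta


def start_of_week(day: date, first_weekday: int = 0) -> date:
    """Return the first day of the week containing `day`.

    `first_weekday` follows date.weekday(): 0 = Monday ... 6 = Sunday.
    Defaulting to Monday is an assumption, not taken from the real UDF.
    """
    offset = (day.weekday() - first_weekday) % 7
    return day - timedelta(days=offset)


def pivot(rows, index_key, column_key, value_key, fill=0):
    """Turn long rows into a wide table whose columns come from the data.

    "Dynamic" here means the output columns are not hard-coded: they are
    the distinct values found under `column_key`.
    """
    columns = sorted({row[column_key] for row in rows})
    cells_by_index = {}
    for row in rows:
        cells_by_index.setdefault(row[index_key], {})[row[column_key]] = row[value_key]

    header = [index_key] + columns
    body = [
        [index] + [cells.get(column, fill) for column in columns]
        for index, cells in sorted(cells_by_index.items())
    ]
    return header, body


# Hypothetical long-format rows, bucketed by week before pivoting.
rows = [
    {"week": start_of_week(date(2024, 1, 10)), "browser": "Chrome", "views": 10},
    {"week": start_of_week(date(2024, 1, 11)), "browser": "Firefox", "views": 4},
    {"week": start_of_week(date(2024, 1, 17)), "browser": "Chrome", "views": 7},
]
header, body = pivot(rows, "week", "browser", "views")
```

With these rows, `header` has one column per browser seen in the data, and each row of `body` corresponds to one week, so a newly appearing browser adds a column without any change to the query.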

Change 995740 merged by Snwachukwu:

[analytics/refinery@master] Add Reportupdater Browser All Sites Queries.

https://gerrit.wikimedia.org/r/995740

Change 1005129 had a related patch set uploaded (by Snwachukwu; author: Snwachukwu):

[analytics/refinery@master] Add spark sql version of browser mobile and desktop querries. 1. Modify browser all_site queries. 2. Add browser desktop and mobile queries.

https://gerrit.wikimedia.org/r/1005129

Change 1005129 merged by Snwachukwu:

[analytics/refinery@master] Add spark sql version of browser mobile and desktop querries. 1. Modify browser all_site queries. 2. Add browser desktop and mobile queries.

https://gerrit.wikimedia.org/r/1005129

Change 995271 merged by jenkins-bot:

[analytics/refinery/source@master] Add DataPivoter job

https://gerrit.wikimedia.org/r/995271