
amastilovic (Aleksandar Mastilovic)
User

User Details

User Since: Jan 20 2024, 12:05 AM (45 w, 3 d)
Availability: Available
IRC Nick: amastilovic
LDAP User: Aleksandar Mastilovic
MediaWiki User: AMastilovic-WMF [ Global Accounts ]

Recent Activity

Yesterday

amastilovic added a comment to T376267: ☂ Wikitech account linking and SUL error reporting.
Wikitech account/LDAP: AMastilovic-WMF
SUL account: AMastilovic-WMF
Account linked on IDM: Y
I have visited MediaWiki:Loginprompt: Y
I have tried to reset my password using Special:PasswordReset: Y
Mon, Dec 2, 10:51 PM · wikitech.wikimedia.org
amastilovic added a comment to T380614: Migrate the airflow-analytics database to Kubernetes.

Maybe we should just use the tool provided by Airflow itself? db clean --clean-before-timestamp for example.

Mon, Dec 2, 7:36 PM · Data-Platform-SRE
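
The db clean suggestion in the T380614 comment above could be wired up roughly as follows. This is a minimal sketch, not the team's actual cleanup job: the helper name build_db_clean_command and the 30-day retention used in the usage note are assumptions for illustration. The flags --clean-before-timestamp, --dry-run and --yes are options of the airflow db clean command in Airflow 2.3 and later.

```python
from datetime import datetime, timedelta, timezone


def build_db_clean_command(retention_days, now=None, dry_run=True):
    """Build an `airflow db clean` argument list for a retention window.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    cmd = [
        "airflow", "db", "clean",
        "--clean-before-timestamp", cutoff.isoformat(),
    ]
    if dry_run:
        cmd.append("--dry-run")  # only report what would be deleted
    else:
        cmd.append("--yes")  # skip the interactive confirmation prompt
    return cmd
```

Calling build_db_clean_command(30) returns a dry-run command that can be passed to subprocess.run; switching dry_run to False would perform the deletion, so a dry run first seems the safer default.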

Wed, Nov 27

amastilovic updated the task description for T377770: [Update Pipeline] druid_load_editattemptstep.
Wed, Nov 27, 1:07 AM · DPE Temporary Accounts (Sprint 1)
amastilovic added a comment to T377770: [Update Pipeline] druid_load_editattemptstep.

Related eventgate schema: https://schema.wikimedia.org/repositories//secondary/jsonschema/analytics/legacy/editattemptstep/latest.yaml

Wed, Nov 27, 1:07 AM · DPE Temporary Accounts (Sprint 1)
amastilovic added a comment to T377770: [Update Pipeline] druid_load_editattemptstep.

Related Airflow-DAGs MR: https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/943

Wed, Nov 27, 1:05 AM · DPE Temporary Accounts (Sprint 1)
amastilovic changed the status of T377770: [Update Pipeline] druid_load_editattemptstep, a subtask of T377293: [Epic] Modify DPE pipelines to account for Temp Accounts, from Open to In Progress.
Wed, Nov 27, 1:05 AM · Epic, DPE Temporary Accounts (Sprint 1)
amastilovic changed the status of T377770: [Update Pipeline] druid_load_editattemptstep from Open to In Progress.
Wed, Nov 27, 1:05 AM · DPE Temporary Accounts (Sprint 1)

Oct 23 2024

amastilovic created T378000: Write documentation on usage of RestExternalTaskSensor.
Oct 23 2024, 5:28 PM · Data-Engineering (Q2 2024 October 1st - December 31th)
amastilovic added a comment to T368033: Design a suitable DAG deployment method.

@brouberol got it. You'll mount Ceph as a file system local to the Airflow instance, and HDFS sync will write to Ceph - effectively, to Airflow, this will look like a local fs directory being updated.

Oct 23 2024, 3:32 PM · Data-Platform-SRE (2024.11.09 - 2024.11.29), Data-Engineering

Oct 22 2024

amastilovic added a comment to T368033: Design a suitable DAG deployment method.

@brouberol automatic sync from Ceph every 5 minutes might cause some issues with the HDFS synchronizer. We need to think about the scenario where your 5-minute sync starts in the middle of an HDFS synchronization to Ceph: in that case, you will get only part of the airflow-dags repository and Airflow will most likely choke. Sure, it would fix itself on the next run within 5 minutes, but it would be much better if we could mitigate this scenario somehow.

Oct 22 2024, 5:05 PM · Data-Platform-SRE (2024.11.09 - 2024.11.29), Data-Engineering
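
One common way to avoid readers seeing a half-written directory, as in the partial-sync scenario described in the comment above, is to write each sync into a fresh directory and then switch a symlink with a single rename. The sketch below is an illustration of that general technique, not the approach chosen in T368033. It assumes the destination is a POSIX file system mount, and the names publish_atomically, releases_dir and current_link are hypothetical.

```python
import os
import shutil
import tempfile


def publish_atomically(source_dir, releases_dir, current_link):
    """Copy source_dir into a fresh release directory, then repoint
    current_link at it with one atomic rename."""
    os.makedirs(releases_dir, exist_ok=True)
    # A unique directory per sync, so a partial copy is never the live one.
    release = tempfile.mkdtemp(prefix="release-", dir=releases_dir)
    shutil.copytree(source_dir, release, dirs_exist_ok=True)

    # Create the new symlink under a temporary name, then rename it over
    # the old one. Readers see either the old target or the new one.
    tmp_link = current_link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(release, tmp_link)
    os.replace(tmp_link, current_link)  # atomic rename on POSIX
    return release
```

A reader that always resolves current_link sees a complete tree at all times. Old release directories are left in place here and would need separate cleanup.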

Oct 17 2024

amastilovic added a comment to T322690: Add support for repository artifacts in Airflow.

Update: We've refactored the library to support cache_key_fn config parameter, which enabled us to get rid of FsVersionedArtifactCache in favor of simply having one class FsArtifactCache:

Oct 17 2024, 11:06 PM · Data-Engineering, Data Pipelines

Oct 16 2024

amastilovic added a member for DPE Temporary Accounts: amastilovic.
Oct 16 2024, 3:58 PM

Oct 15 2024

amastilovic added a comment to T322690: Add support for repository artifacts in Airflow.

Cool! How will this be used via artifact.yaml config?

Oct 15 2024, 11:12 PM · Data-Engineering, Data Pipelines
amastilovic updated subscribers of T322690: Add support for repository artifacts in Airflow.

But for this MR, what do you think of limiting the change to just restricting to fsspec, perhaps by just renaming FsArtifactSource and FsArtifactCache to ArtifactSource and ArtifactCache and removing those abstract base classes? Then I think your refactor which removes the ArtifactCache.open abstract method will just work as is.

Oct 15 2024, 4:01 PM · Data-Engineering, Data Pipelines

Oct 10 2024

amastilovic added a comment to T322690: Add support for repository artifacts in Airflow.

@Ottomata @mforns I think we should expand the scope of this refactor to include redefining the relationships between Artifact, ArtifactLocator, ArtifactSource and ArtifactCache, too. The current design is cumbersome and unintuitive IMHO, for the following reasons:

  • ArtifactSource is defined/constructed through class name and base_uri which is optional, but in practice base_uri is not optional and points to either a directory URI or to an actual instance of Artifact. However, since it extends ArtifactLocator, all of its methods require an Artifact instance as a parameter.
  • ArtifactCache is defined through class name and base_uri which is again defined as optional, but in practice it's actually required. All of its methods also require an instance of Artifact as a parameter.
  • Even though ArtifactLocators all de-facto depend on an Artifact, the Artifact class itself introduces a hard dependency on ArtifactLocators through its source and caches arguments.
Oct 10 2024, 11:21 PM · Data-Engineering, Data Pipelines
amastilovic updated the task description for T370368: [NEEDS GROOMING] We should improve the code health of gobblin-wmf.
Oct 10 2024, 6:08 PM · Data-Engineering, Event-Platform

Oct 7 2024

amastilovic added a comment to T372647: Implement Airflow Dataset class for RestExternalTaskSensor.

I've published a draft MR that, as far as I can tell, implements most of the support for the produced_by configuration of a dataset. The code correctly recognizes, parses and models the produced_by section, and inserts a DatasetProducer object into a Dataset object. This DatasetProducer object supports its own implementation of the get_sensor_for method, which takes precedence over Dataset's normal method if the DatasetProducer is available.

Oct 7 2024, 8:18 PM · Data-Engineering (Q2 2024 October 1st - December 31th), Dumps 2.0 (Kanban Board)
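
The precedence rule described in the comment above can be pictured with a small sketch. This is not the code from the draft MR: the fields dag_id and task_id, the helper dataset_from_config, and the string return values are assumptions standing in for the real sensor objects. Only the names Dataset, DatasetProducer, produced_by and get_sensor_for come from the comment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetProducer:
    """Hypothetical model of a dataset's produced_by section."""
    dag_id: str
    task_id: str

    def get_sensor_for(self, dag):
        # Stand-in for building a sensor on the producing task.
        return f"external-task-sensor:{self.dag_id}.{self.task_id}"


@dataclass
class Dataset:
    name: str
    producer: Optional[DatasetProducer] = None

    def get_sensor_for(self, dag):
        # The producer's sensor takes precedence when one is configured.
        if self.producer is not None:
            return self.producer.get_sensor_for(dag)
        return f"default-sensor:{self.name}"


def dataset_from_config(name, config):
    """Parse one dataset entry, attaching a producer if produced_by is set."""
    produced_by = config.get("produced_by")
    producer = DatasetProducer(**produced_by) if produced_by else None
    return Dataset(name=name, producer=producer)
```

With a produced_by section present, get_sensor_for delegates to the producer; without one, the dataset falls back to its own sensor.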

Oct 2 2024

amastilovic closed T375390: [gobblin-wmf] Update parent POM to a more recent version, and make sure that the verify maven goal passes as Resolved.
Oct 2 2024, 8:59 PM · Data-Engineering (Q1 2024 July 1st - September 30th)

Sep 27 2024

amastilovic added a comment to T372647: Implement Airflow Dataset class for RestExternalTaskSensor.

it does not care about whether there's an actual file in the s3 bucket, or whether the contents of the file are correct.

Ah! This was the bit I was missing. I was trying to understand how a Hive dataset would work here.

So, there is no data aware sensing using Airflow Datasets, it is 100% airflow task state.

Sep 27 2024, 10:49 PM · Data-Engineering (Q2 2024 October 1st - December 31th), Dumps 2.0 (Kanban Board)
amastilovic added a comment to T368033: Design a suitable DAG deployment method.

@amastilovic I'm interested! Do you have something I could read to know more?

Sep 27 2024, 10:32 PM · Data-Platform-SRE (2024.11.09 - 2024.11.29), Data-Engineering
amastilovic added a comment to T375716: Ensure the Airflow API can be reached out to from within Kubernetes and is authenticated.

How would Kerberos authentication play out when it comes to different Airflow instances communicating with each other? Would each instance get a Kerberos ticket automatically, and then use that to communicate to other instances?

Sep 27 2024, 9:14 PM · Data-Platform-SRE (2024.10.19 - 2024.11.08)
amastilovic added a comment to T368033: Design a suitable DAG deployment method.

Additional thought: once we migrate to KubernetesExecutor instead of LocalExecutor, the dags repo would get cloned by git-sync at the start of each task (i.e. Pod), giving us a hard reliance on GitLab being up. I think we should have an asynchronous process that clones the repo via git-sync (using a sha, not a branch name) and exports the data to, say, s3. Then "pulling the DAGs repo" would basically become an s3 cp operation, moving the dependency from GitLab to Ceph/RadosGW.

Sep 27 2024, 5:08 PM · Data-Platform-SRE (2024.11.09 - 2024.11.29), Data-Engineering
amastilovic added a comment to T375878: Clean Airflow database of old entries.

@amastilovic this is part of the answer as to how we keep PG as lowly loaded as possible.

As an experienced Airflow WMF user, would you have an idea as to what a good retention would look like for you? 30 days? More?

Sep 27 2024, 4:35 PM · Data-Platform-SRE (2024.09.28 - 2024.10.18)

Sep 26 2024

amastilovic added a comment to T372647: Implement Airflow Dataset class for RestExternalTaskSensor.

Things are not settled for sure. The version of Airflow we use already supports the notion of Dataset (Data Asset is basically a rename+upgrade of a Dataset) and I think even as of right now we could use that for "event-driven pipelines" that react to Datasets being updated.

Sep 26 2024, 10:17 PM · Data-Engineering (Q2 2024 October 1st - December 31th), Dumps 2.0 (Kanban Board)

Sep 25 2024

amastilovic added a comment to T372647: Implement Airflow Dataset class for RestExternalTaskSensor.

Currently, each instance specifies its own datasets.yaml files in its “config” directory. This approach is problematic for a couple of reasons:

Sep 25 2024, 10:16 PM · Data-Engineering (Q2 2024 October 1st - December 31th), Dumps 2.0 (Kanban Board)
amastilovic added a comment to T372647: Implement Airflow Dataset class for RestExternalTaskSensor.

OK, so the schema I proposed in the Google Doc looks like this:

Sep 25 2024, 10:10 PM · Data-Engineering (Q2 2024 October 1st - December 31th), Dumps 2.0 (Kanban Board)
amastilovic added a comment to T372647: Implement Airflow Dataset class for RestExternalTaskSensor.

I've posted a draft proposal for the implementation design of ideas/needs described in this ticket: https://docs.google.com/document/d/1lapUHpWY2rm9si1iupRYHTcsDY-_QSbLcL03MXOSOlE/edit

Sep 25 2024, 10:03 PM · Data-Engineering (Q2 2024 October 1st - December 31th), Dumps 2.0 (Kanban Board)

Sep 23 2024

amastilovic updated the task description for T370368: [NEEDS GROOMING] We should improve the code health of gobblin-wmf.
Sep 23 2024, 6:03 PM · Data-Engineering, Event-Platform
amastilovic updated the task description for T375390: [gobblin-wmf] Update parent POM to a more recent version, and make sure that the verify maven goal passes.
Sep 23 2024, 2:47 PM · Data-Engineering (Q1 2024 July 1st - September 30th)
amastilovic added a comment to T375390: [gobblin-wmf] Update parent POM to a more recent version, and make sure that the verify maven goal passes.

Project migrated to GitLab: https://gitlab.wikimedia.org/repos/data-engineering/gobblin-wmf/-/merge_requests/1

Sep 23 2024, 2:45 PM · Data-Engineering (Q1 2024 July 1st - September 30th)
amastilovic created T375390: [gobblin-wmf] Update parent POM to a more recent version, and make sure that the verify maven goal passes.
Sep 23 2024, 2:44 PM · Data-Engineering (Q1 2024 July 1st - September 30th)
amastilovic moved T371994: Deploy the HDFS synchronizer (blunderbuss) service to the dse-k8s cluster from Next Up to In progress on the Data-Engineering (Q1 2024 July 1st - September 30th) board.
Sep 23 2024, 2:40 PM · Data-Platform-SRE (2024.11.30 - 2024.12.20), Patch-For-Review, Data-Engineering (Q2 2024 October 1st - December 31th)
amastilovic moved T372647: Implement Airflow Dataset class for RestExternalTaskSensor from Next Up to In progress on the Data-Engineering (Q1 2024 July 1st - September 30th) board.
Sep 23 2024, 2:37 PM · Data-Engineering (Q2 2024 October 1st - December 31th), Dumps 2.0 (Kanban Board)
amastilovic moved T357430: Airflow mapped tasks UI & metrics from In progress to Blocked/Paused on the Data-Engineering (Q1 2024 July 1st - September 30th) board.
Sep 23 2024, 2:31 PM · Data-Engineering (Q2 2024 October 1st - December 31th)
amastilovic moved T356762: [Refine refactoring] Refine jobs should be scheduled by Airflow: implementation from In progress to In Review on the Data-Engineering (Q1 2024 July 1st - September 30th) board.
Sep 23 2024, 2:31 PM · Data-Engineering (Q2 2024 October 1st - December 31th), Patch-For-Review
amastilovic moved T342911: Data Quality Issue: Wikitext History Job fail / rerun in Airflow from Ready to Deploy to Done on the Data-Engineering (Q1 2024 July 1st - September 30th) board.
Sep 23 2024, 2:18 PM · Data-Engineering (Q1 2024 July 1st - September 30th), Movement-Metrics, Movement-Insights
amastilovic moved T342911: Data Quality Issue: Wikitext History Job fail / rerun in Airflow from In progress to Ready to Deploy on the Data-Engineering (Q1 2024 July 1st - September 30th) board.
Sep 23 2024, 2:18 PM · Data-Engineering (Q1 2024 July 1st - September 30th), Movement-Metrics, Movement-Insights
amastilovic moved T363587: [Event Platform] Instrument EventBus with prometheus MW Statslib from Ready to Deploy to Done on the Data-Engineering (Q1 2024 July 1st - September 30th) board.
Sep 23 2024, 2:17 PM · MW-1.43-notes (1.43.0-wmf.22; 2024-09-10), Data-Engineering (Q1 2024 July 1st - September 30th), Dumps 2.0 (Kanban Board), Event-Platform

Sep 17 2024

amastilovic added a project to T371994: Deploy the HDFS synchronizer (blunderbuss) service to the dse-k8s cluster: Data-Platform-SRE.
Sep 17 2024, 7:39 PM · Data-Platform-SRE (2024.11.30 - 2024.12.20), Patch-For-Review, Data-Engineering (Q2 2024 October 1st - December 31th)

Sep 16 2024

amastilovic added a comment to T372014: Problem deploying - missing airflow_client dependency.

This was resolved some time ago when SRE released an updated Airflow Debian package.

Sep 16 2024, 10:30 PM · Dumps 2.0 (Kanban Board), Data-Engineering (Q1 2024 July 1st - September 30th)

Sep 5 2024

amastilovic edited projects for T367404: Update parent pom to disable fetching dependencies from Archiva and use Gitlab instead, added: Data-Engineering (Q1 2024 July 1st - September 30th); removed Data-Engineering.
Sep 5 2024, 4:53 PM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Data-Engineering (Q1 2024 July 1st - September 30th), Patch-For-Review, Java-Scala-Standardization, Discovery-Search
amastilovic added a project to T369901: Migrate wmf-jvm-parent-pom and supporting components to the Maven group on Gitlab: Data-Engineering (Q1 2024 July 1st - September 30th).
Sep 5 2024, 4:21 PM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Data-Engineering (Q1 2024 July 1st - September 30th), Patch-For-Review, Java-Scala-Standardization

Aug 23 2024

amastilovic added a comment to T372647: Implement Airflow Dataset class for RestExternalTaskSensor.

Maybe just change the name of the file to data_dependencies.yaml and the module to DataDependency?
And then the user would use it like data_dependency("data-dependency-name").get_sensor_for(dag)?
With this, the semantic weirdness would be solved, no?

Aug 23 2024, 7:09 PM · Data-Engineering (Q2 2024 October 1st - December 31th), Dumps 2.0 (Kanban Board)

Aug 7 2024

amastilovic updated subscribers of T365659: Implement automatic sync of refinery HQL files to HDFS.

I talked to @BTullis about obtaining a functional test environment that would mimic the real world this service would be operating in, and he kindly provided a list of things to do in order to build such an environment. The list is in the subtask ticket https://phabricator.wikimedia.org/T371994

Aug 7 2024, 6:22 PM · Data-Engineering (Q2 2024 October 1st - December 31th)
amastilovic added a subtask for T365659: Implement automatic sync of refinery HQL files to HDFS: T371994: Deploy the HDFS synchronizer (blunderbuss) service to the dse-k8s cluster.
Aug 7 2024, 5:36 PM · Data-Engineering (Q2 2024 October 1st - December 31th)
amastilovic added a parent task for T371994: Deploy the HDFS synchronizer (blunderbuss) service to the dse-k8s cluster: T365659: Implement automatic sync of refinery HQL files to HDFS.
Aug 7 2024, 5:36 PM · Data-Platform-SRE (2024.11.30 - 2024.12.20), Patch-For-Review, Data-Engineering (Q2 2024 October 1st - December 31th)
amastilovic created T371994: Deploy the HDFS synchronizer (blunderbuss) service to the dse-k8s cluster.
Aug 7 2024, 5:35 PM · Data-Platform-SRE (2024.11.30 - 2024.12.20), Patch-For-Review, Data-Engineering (Q2 2024 October 1st - December 31th)
amastilovic updated the task description for T360968: [Developer Experience] [SPIKE] Investigate process to automate deployment of folders and artifacts to HDFS.
Aug 7 2024, 5:23 PM · Data-Engineering (Q1 2024 July 1st - September 30th), Release-Engineering-Team, Spike
amastilovic updated the task description for T360922: [Status Store] [SPIKE] Investigate and document approach for Iceberg Sensors.
Aug 7 2024, 4:53 PM · Data-Engineering (Q2 2024 October 1st - December 31th), Dumps 2.0 (Kanban Board), Spike
amastilovic updated the task description for T369900: Develop Airflow ExternalTaskSensor to orchestrate DAG dependencies.
Aug 7 2024, 3:57 PM · Dumps 2.0 (Kanban Board), Data-Engineering (Q1 2024 July 1st - September 30th)

Jul 25 2024

amastilovic moved T369851: NEW BUG REPORT Mediawiki_history contains duplicate rows for some revisions from Next Up to In progress on the Data-Engineering (Q1 2024 July 1st - September 30th) board.
Jul 25 2024, 4:27 PM · Data-Engineering (Q1 2024 July 1st - September 30th), Movement-Insights, Analytics-Data-Problem, Data-Platform

Jul 22 2024

amastilovic added a comment to T367403: Validate CI integration so that Ci can release Maven artifacts on user's demand.

Also, could we add a settings.xml file with the following contents to the Docker image? It's necessary for Maven release plugin to interact with GitLab:

Jul 22 2024, 4:42 PM · Discovery-Search (Current work), Release-Engineering-Team (Radar), Data-Engineering (Q1 2024 July 1st - September 30th), Java-Scala-Standardization, Data-Platform-SRE
amastilovic updated the task description for T369900: Develop Airflow ExternalTaskSensor to orchestrate DAG dependencies.
Jul 22 2024, 2:39 PM · Dumps 2.0 (Kanban Board), Data-Engineering (Q1 2024 July 1st - September 30th)
amastilovic updated the task description for T369900: Develop Airflow ExternalTaskSensor to orchestrate DAG dependencies.
Jul 22 2024, 2:38 PM · Dumps 2.0 (Kanban Board), Data-Engineering (Q1 2024 July 1st - September 30th)

Jul 11 2024

amastilovic updated the task description for T360922: [Status Store] [SPIKE] Investigate and document approach for Iceberg Sensors.
Jul 11 2024, 4:50 PM · Data-Engineering (Q2 2024 October 1st - December 31th), Dumps 2.0 (Kanban Board), Spike

Jul 8 2024

amastilovic added a comment to T360968: [Developer Experience] [SPIKE] Investigate process to automate deployment of folders and artifacts to HDFS.

I've considered the option of pulling from the git origin into the destination HDFS, albeit not using a systemd timer. I've actually done something similar before in previous jobs/roles, by mounting HDFS onto a local file system, but I don't think this is a viable solution for a number of reasons:

Jul 8 2024, 11:34 PM · Data-Engineering (Q1 2024 July 1st - September 30th), Release-Engineering-Team, Spike

Jul 2 2024

amastilovic added a comment to T367391: Setup a test project to validate upload to the Gitlab package registry.

Seconded about the .test in the group ID - do we really need that? I believe the group ID should simply be org.wikimedia and then we can have the test part in the artifact ID.

Jul 2 2024, 4:55 PM · Data-Platform-SRE (2024.08.17 - 2024.09.06), Release-Engineering-Team (Radar), User-brennen, Java-Scala-Standardization

Jun 24 2024

amastilovic added a comment to T368033: Design a suitable DAG deployment method.

In some ways, the fundamental question is: do we want to move to a continuous deployment model, or do we want to retain manual deployment?

Jun 24 2024, 11:05 PM · Data-Platform-SRE (2024.11.09 - 2024.11.29), Data-Engineering
amastilovic added a comment to T368033: Design a suitable DAG deployment method.

My feeling is that, at our scale and given its proximity to the dse-k8s cluster, Ceph will be just fine for these DAG volumes.

Jun 24 2024, 11:04 PM · Data-Platform-SRE (2024.11.09 - 2024.11.29), Data-Engineering

Jun 12 2024

amastilovic set IRC Nick to amastilovic on amastilovic.
Jun 12 2024, 9:50 PM

Jun 10 2024

amastilovic assigned T367116: mw-page-content-change-enrich flink app is missing in k8s staging to gmodena.
Jun 10 2024, 9:24 PM · Data-Platform-SRE (2024.06.17 - 2024.07.07), Data-Engineering, Event-Platform
amastilovic added a comment to T367073: Requesting access to Kubernetes deployment for amastilovic.

Merged and applied - done

Jun 10 2024, 5:37 PM · Data-Engineering, SRE, SRE-Access-Requests
amastilovic created T367073: Requesting access to Kubernetes deployment for amastilovic.
Jun 10 2024, 3:39 PM · Data-Engineering, SRE, SRE-Access-Requests

Jun 6 2024

amastilovic updated the task description for T365659: Implement automatic sync of refinery HQL files to HDFS.
Jun 6 2024, 11:17 PM · Data-Engineering (Q2 2024 October 1st - December 31th)
amastilovic updated subscribers of T360968: [Developer Experience] [SPIKE] Investigate process to automate deployment of folders and artifacts to HDFS.

Q: Have we discussed these ideas with Release Engineering folks? They are currently working on a similar CD project, but it might be MediaWiki focused only.

Jun 6 2024, 11:11 PM · Data-Engineering (Q1 2024 July 1st - September 30th), Release-Engineering-Team, Spike

Jun 5 2024

JAllemandou awarded T360922: [Status Store] [SPIKE] Investigate and document approach for Iceberg Sensors a Love token.
Jun 5 2024, 3:17 PM · Data-Engineering (Q2 2024 October 1st - December 31th), Dumps 2.0 (Kanban Board), Spike

Jun 4 2024

amastilovic updated the task description for T365659: Implement automatic sync of refinery HQL files to HDFS.
Jun 4 2024, 11:19 PM · Data-Engineering (Q2 2024 October 1st - December 31th)
amastilovic closed T365382: Move reportupdater reports away from their local filesystem locations as Resolved.

This ticket has been resolved. The tasks from the ticket definition have been performed on an-launcher1002 (an-launcher1001 has been decommissioned).

Jun 4 2024, 9:28 PM · Data-Engineering (Q4 2024 April 1st - June 30th)

May 22 2024

amastilovic created T365659: Implement automatic sync of refinery HQL files to HDFS.
May 22 2024, 11:10 PM · Data-Engineering (Q2 2024 October 1st - December 31th)
amastilovic added a comment to T354552: [Maintenance] Migrate ReportUpdater browser queries to Airflow.

@lbowmaker this task can be closed.

May 22 2024, 11:05 PM · Patch-For-Review, Data-Engineering (Q4 2024 April 1st - June 30th)
amastilovic updated subscribers of T362699: Update converted reportupdater DAG queries to correct CSV options.

@lbowmaker this task can be closed.

May 22 2024, 11:05 PM · Data-Engineering (Q4 2024 April 1st - June 30th)
amastilovic added a comment to T357372: [Maintenance] Migrate pingback to Airflow.

@lbowmaker this task can be closed.

May 22 2024, 11:05 PM · Data-Engineering (Q4 2024 April 1st - June 30th)
amastilovic added a comment to T357938: [Maintenance] Migrate wmcs to Airflow.

@lbowmaker this task can be closed.

May 22 2024, 11:04 PM · Data-Engineering (Q4 2024 April 1st - June 30th)
amastilovic created T365658: Update Airflow Developer Guide on WikiTech.
May 22 2024, 11:03 PM · Data-Engineering

May 20 2024

amastilovic added a comment to T365201: PHP 8.3 missing (showing as other?) on https://pingback.wmflabs.org/#php-version.

We are attempting to resolve this issue in this ticket: T365382

May 20 2024, 9:18 PM · Data-Engineering, MediaWiki-General
amastilovic added a comment to T365382: Move reportupdater reports away from their local filesystem locations.

Still no update. Let's wait a bit and see what happens; the sync refresh period might be daily rather than hourly.

May 20 2024, 8:30 PM · Data-Engineering (Q4 2024 April 1st - June 30th)
amastilovic created T365382: Move reportupdater reports away from their local filesystem locations.
May 20 2024, 5:30 PM · Data-Engineering (Q4 2024 April 1st - June 30th)

May 16 2024

amastilovic added a comment to T364487: Airflow DAG (hdfs_usage_weekly) failed with no details in the application log.

This task can be closed as the issue has been fixed and changes to the DAG have been merged.

May 16 2024, 7:53 PM · Data-Engineering

May 13 2024

amastilovic added a comment to T364487: Airflow DAG (hdfs_usage_weekly) failed with no details in the application log.

What's the longer-term location for the log4j properties file name? Presumably we don't want to leave the file name as aqu-log4j.properties within any folder?

May 13 2024, 7:00 PM · Data-Engineering

May 9 2024

amastilovic added a comment to T364487: Airflow DAG (hdfs_usage_weekly) failed with no details in the application log.

MR to switch to using DagProperties: https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/689

May 9 2024, 7:03 PM · Data-Engineering

May 8 2024

amastilovic added a comment to T364487: Airflow DAG (hdfs_usage_weekly) failed with no details in the application log.

The issue was the path to the log4j.properties file configured in the Airflow UI: hdfs:///user/aqu/aqu-log4j.properties was not accessible to the Airflow user analytics.

May 8 2024, 9:37 PM · Data-Engineering
amastilovic claimed T364487: Airflow DAG (hdfs_usage_weekly) failed with no details in the application log.
May 8 2024, 9:33 PM · Data-Engineering

Apr 26 2024

amastilovic claimed T357938: [Maintenance] Migrate wmcs to Airflow.
Apr 26 2024, 8:58 PM · Data-Engineering (Q4 2024 April 1st - June 30th)

Apr 17 2024

amastilovic created T362832: Migrate refinery HQL files to CI/CD supported GitLab repository.
Apr 17 2024, 10:23 PM · Data-Engineering (Q2 2024 October 1st - December 31th)
amastilovic added a comment to T362699: Update converted reportupdater DAG queries to correct CSV options.

Hi @amastilovic, can you please associate one or more active project tags with this task (via the Add Action...Change Project Tags dropdown)? That will allow others to see the task when looking at project workboards or searching for tasks in certain projects, and to get notified about it when watching a related project tag. Thanks!

Apr 17 2024, 10:17 PM · Data-Engineering (Q4 2024 April 1st - June 30th)
amastilovic added a project to T362699: Update converted reportupdater DAG queries to correct CSV options: Data-Engineering (Q4 2024 April 1st - June 30th).
Apr 17 2024, 10:16 PM · Data-Engineering (Q4 2024 April 1st - June 30th)

Apr 16 2024

amastilovic created T362699: Update converted reportupdater DAG queries to correct CSV options.
Apr 16 2024, 5:01 PM · Data-Engineering (Q4 2024 April 1st - June 30th)

Apr 9 2024

amastilovic created T362201: Fix and validate browser report DAG and queries.
Apr 9 2024, 10:43 PM · Data-Engineering (Q4 2024 April 1st - June 30th)

Feb 16 2024

amastilovic created T357803: [Maintenance] Migrate pingback ReportUpdater job.
Feb 16 2024, 6:22 PM · Data-Engineering

Jan 29 2024

amastilovic added a comment to T355606: Requesting analytics-privatedata-users access for amastilovic.

I need access to the following (from the wiki page you provided):

Jan 29 2024, 9:58 PM · Patch-For-Review, SRE, SRE-Access-Requests
amastilovic created T356106: Requesting Kerberos access for amastilovic.
Jan 29 2024, 8:10 PM

Jan 25 2024

amastilovic added a comment to T355607: Grant Access to ldap/wmf for Aleksandar Mastilovic.

@Arnoldokoth @Dzahn thank you!

Jan 25 2024, 8:55 PM · Patch-For-Review, SRE, LDAP-Access-Requests

Jan 22 2024

amastilovic created T355607: Grant Access to ldap/wmf for Aleksandar Mastilovic.
Jan 22 2024, 9:41 PM · Patch-For-Review, SRE, LDAP-Access-Requests
amastilovic created T355606: Requesting analytics-privatedata-users access for amastilovic.
Jan 22 2024, 9:34 PM · Patch-For-Review, SRE, SRE-Access-Requests