Analytics
Component · Archived · Public

Details

Description

ARCHIVED and superseded by Data-Engineering, see T287531: Create project tag for Data-Engineering.

This is the main place where the WMF Analytics Engineering team triages incoming work. If you tag a task with Analytics, it goes to the Incoming column, and we usually triage it within a week. For urgent problems, you can contact the team members on IRC or by email.

Recent Activity

Fri, Nov 22

xcollazo closed T374341: Add support for Spark producers in Event Platform, a subtask of T214430: Event Platform: Stream Connectors, as Resolved.
Fri, Nov 22, 8:52 PM · Data-Engineering, Analytics, Goal, Services (watching), Event-Platform
MarcoSwart updated the task description for T380565: slider bar for period in wikistats is becoming unusable.
Fri, Nov 22, 9:26 AM · Data-Engineering, Analytics, Data-Engineering-Wikistats
MarcoSwart created T380565: slider bar for period in wikistats is becoming unusable.
Fri, Nov 22, 9:23 AM · Data-Engineering, Analytics, Data-Engineering-Wikistats

Thu, Nov 21

cjming moved T228175: [Metrics Platform] Event Platform Client Libraries from Incoming to Metrics Platform Backlog on the Data Products board.
Thu, Nov 21, 3:31 PM · Data Products, Analytics, Epic, Better Use Of Data, Product-Infrastructure-Team-Backlog-Deprecated

Mon, Nov 18

TJones renamed T294133: [EPIC] Expose rdf-streaming-updater.mutation content through EventStreams from Expose rdf-streaming-updater.mutation content through EventStreams to [EPIC] Expose rdf-streaming-updater.mutation content through EventStreams.
Mon, Nov 18, 4:44 PM · Epic, Discovery-Search (Current work), Data-Engineering, Event-Platform, Analytics, Wikidata, EventStreams

Thu, Nov 14

Maintenance_bot removed a project from T267648: Adopt conventions for server receive and client/event timestamps in non analytics event schemas: Patch-For-Review.
Thu, Nov 14, 3:30 PM · Data-Engineering, MW-1.41-notes (1.41.0-wmf.15; 2023-06-27), Platform Team Workboards (Clinic Duty Team), Event-Platform, Analytics, Better Use Of Data
gerritbot added a comment to T267648: Adopt conventions for server receive and client/event timestamps in non analytics event schemas.

Change #930665 abandoned by DCausse:

[machinelearning/liftwing/inference-services@main] events: drop support for /mediawiki/revision/create#1.x events

Reason:

lack of interest

https://gerrit.wikimedia.org/r/930665

Thu, Nov 14, 2:56 PM · Data-Engineering, MW-1.41-notes (1.41.0-wmf.15; 2023-06-27), Platform Team Workboards (Clinic Duty Team), Event-Platform, Analytics, Better Use Of Data

Tue, Nov 12

Ottomata added a parent task for T258511: Data Lake incremental Data Updates: T204950: Public Edit Data Lake: Mediawiki history snapshots available in SQL data store to cloud (labs) users.
Tue, Nov 12, 3:26 PM · Patch-For-Review, Analytics, Epic, Product-Analytics
dcausse added a comment to T294133: [EPIC] Expose rdf-streaming-updater.mutation content through EventStreams.

[...]
4 ... Update that entity using the following SPARQL Update query (assuming that DESCRIBE follows statement nodes, which we are currently implementing anyway, and is allowed inside a DELETE WHERE, which we could easily implement):

DELETE WHERE { DESCRIBE wd:Q73536234 }
INSERT DATA { <contents of Q73536234.2268889369.nt> }

Tue, Nov 12, 1:57 PM · Epic, Discovery-Search (Current work), Data-Engineering, Event-Platform, Analytics, Wikidata, EventStreams

Sat, Nov 9

Hannah_Bast added a comment to T294133: [EPIC] Expose rdf-streaming-updater.mutation content through EventStreams.

@dcausse Thank you for your reply. Do I understand you correctly that the current best way, or at least a feasible and correct way, for us to perform updates would be:

Sat, Nov 9, 10:06 AM · Epic, Discovery-Search (Current work), Data-Engineering, Event-Platform, Analytics, Wikidata, EventStreams

Fri, Nov 8

xcollazo changed the status of T374341: Add support for Spark producers in Event Platform, a subtask of T214430: Event Platform: Stream Connectors, from Open to In Progress.
Fri, Nov 8, 4:04 PM · Data-Engineering, Analytics, Goal, Services (watching), Event-Platform

Thu, Nov 7

dcausse added a comment to T294133: [EPIC] Expose rdf-streaming-updater.mutation content through EventStreams.

Hello,
I have a question maybe you can help.

Thu, Nov 7, 3:12 PM · Epic, Discovery-Search (Current work), Data-Engineering, Event-Platform, Analytics, Wikidata, EventStreams
Sina_az_9 added a comment to T294133: [EPIC] Expose rdf-streaming-updater.mutation content through EventStreams.

Hello,
I have a question maybe you can help.
I am trying to simulate updates happening in Wikidata using SPARQL update.
In some changes, for example this one, 2 properties were added under the reference of the property Filmweb.no film ID. I came up with this update command,

Thu, Nov 7, 2:06 PM · Epic, Discovery-Search (Current work), Data-Engineering, Event-Platform, Analytics, Wikidata, EventStreams

Wed, Nov 6

Ottomata added a comment to T228175: [Metrics Platform] Event Platform Client Libraries.

Can we decline / resolve this task?

Wed, Nov 6, 6:39 PM · Data Products, Analytics, Epic, Better Use Of Data, Product-Infrastructure-Team-Backlog-Deprecated
Ottomata renamed T228175: [Metrics Platform] Event Platform Client Libraries from Event Platform Client Libraries to [Metrics Platform] Event Platform Client Libraries.
Wed, Nov 6, 6:39 PM · Data Products, Analytics, Epic, Better Use Of Data, Product-Infrastructure-Team-Backlog-Deprecated
Ottomata added a subtask for T214430: Event Platform: Stream Connectors: T374341: Add support for Spark producers in Event Platform.
Wed, Nov 6, 6:38 PM · Data-Engineering, Analytics, Goal, Services (watching), Event-Platform

Tue, Nov 5

lmata edited projects for T266886: Augment NEL reports with a computed timestamp-of-generation, added: SRE Observability (FY2024/2025-Q2); removed SRE Observability (FY2024/2025-Q1).
Tue, Nov 5, 5:11 PM · SRE Observability (FY2024/2025-Q2), Observability-Logging, Data-Engineering-Icebox, Analytics

Mon, Nov 4

Maintenance_bot added a project to T275143: Schema tests should validate examples: Data-Engineering.
Mon, Nov 4, 3:30 PM · Data-Engineering, Event-Platform, Analytics

Nov 1 2024

Gehel closed T374918: Define a schema for the rdf-streaming-updater mutation stream, a subtask of T294133: [EPIC] Expose rdf-streaming-updater.mutation content through EventStreams, as Resolved.
Nov 1 2024, 9:51 AM · Epic, Discovery-Search (Current work), Data-Engineering, Event-Platform, Analytics, Wikidata, EventStreams

Oct 28 2024

phuedx removed a parent task for T267602: Client-side error logging should use Elastic Common Schema (ECS) fields when possible: T281999: Metrics Platform Schema: Define & Model Event Level Fields.
Oct 28 2024, 12:08 PM · Data-Engineering, Metrics Platform, Better Use Of Data, Analytics, Product-Data-Infrastructure, Event-Platform

Oct 25 2024

Ottomata moved T280017: Deploy schema repos to analytics cluster and use local uris for analytics jobs from Schemas to Components on the Event-Platform board.
Oct 25 2024, 1:38 PM · Data-Engineering, Event-Platform, Analytics
Ottomata moved T263049: Avoid extra HTTPS connections for most Event Platform beacons from Backlog to Components on the Event-Platform board.
Oct 25 2024, 1:38 PM · Wikimedia-Performance-recommendation, Data-Engineering, Better Use Of Data, Product-Data-Infrastructure, Analytics, Event-Platform
Ottomata merged task T274901: Stop using puppet + git pull for auto deployment of schema repos into T347421: [NEEDS GROOMING] schema services should be moved to k8s.
Oct 25 2024, 1:37 PM · Data-Engineering, Release-Engineering-Team (Radar), Event-Platform, Analytics
Ottomata moved T281483: mediawiki/page/properties-change schema should use map type for added and removed page properties from Backlog to Stream Data Products on the Event-Platform board.
Oct 25 2024, 1:35 PM · Data-Engineering, Event-Platform, Analytics
Ottomata moved T280017: Deploy schema repos to analytics cluster and use local uris for analytics jobs from Backlog to Schemas on the Event-Platform board.
Oct 25 2024, 1:35 PM · Data-Engineering, Event-Platform, Analytics
Ottomata moved T282057: [Event Platform] Add expiry info to mediawiki.page-restrictions-change stream from Backlog to Stream Data Products on the Event-Platform board.
Oct 25 2024, 1:34 PM · Data-Engineering, Platform Team Workboards (Clinic Duty Team), Analytics, Event-Platform
Ottomata moved T290211: EventStreams sending same data over and over (page links change) from Backlog to Stream Data Products on the Event-Platform board.
Oct 25 2024, 1:33 PM · Data-Engineering, Platform Engineering, Analytics, Event-Platform
Ottomata moved T267602: Client-side error logging should use Elastic Common Schema (ECS) fields when possible from Backlog to Stream Data Products on the Event-Platform board.
Oct 25 2024, 1:33 PM · Data-Engineering, Metrics Platform, Better Use Of Data, Analytics, Product-Data-Infrastructure, Event-Platform
Ottomata moved T291645: Produce ECS formatted logstash logs to Event Platform, allowing them to be queried in the WMF Data Lake with SQL from Backlog to Stream Data Products on the Event-Platform board.
Oct 25 2024, 1:33 PM · Observability-Logging, Analytics, Data-Engineering, Event-Platform, Wikimedia-Logstash, SRE
Ottomata moved T256891: EventGate and EventStreams rate limiting from Backlog to Components on the Event-Platform board.
Oct 25 2024, 1:29 PM · Data-Engineering, Event-Platform, Analytics
Ottomata updated the task description for T256891: EventGate and EventStreams rate limiting.
Oct 25 2024, 1:23 PM · Data-Engineering, Event-Platform, Analytics
Ottomata renamed T256891: EventGate and EventStreams rate limiting from EventGate throttling and DOS prevention to EventGate and EventStreams rate limiting.
Oct 25 2024, 1:21 PM · Data-Engineering, Event-Platform, Analytics
Ottomata added a comment to T256891: EventGate and EventStreams rate limiting.

Looks like this can be handled by service mesh stuff now:

Oct 25 2024, 1:17 PM · Data-Engineering, Event-Platform, Analytics
Ottomata moved T263049: Avoid extra HTTPS connections for most Event Platform beacons from Estimated (To be planned) to Candidates for next quarter on the Data-Engineering board.
Oct 25 2024, 1:16 PM · Wikimedia-Performance-recommendation, Data-Engineering, Better Use Of Data, Product-Data-Infrastructure, Analytics, Event-Platform
Ottomata moved T263049: Avoid extra HTTPS connections for most Event Platform beacons from Event Platform Backlog to Estimated (To be planned) on the Data-Engineering board.
Oct 25 2024, 1:16 PM · Wikimedia-Performance-recommendation, Data-Engineering, Better Use Of Data, Product-Data-Infrastructure, Analytics, Event-Platform
Ottomata merged T280256: [session length] Change domain of event collection to avoid ad-blocker issue into T263049: Avoid extra HTTPS connections for most Event Platform beacons.
Oct 25 2024, 1:15 PM · Wikimedia-Performance-recommendation, Data-Engineering, Better Use Of Data, Product-Data-Infrastructure, Analytics, Event-Platform

Oct 24 2024

taavi closed T204950: Public Edit Data Lake: Mediawiki history snapshots available in SQL data store to cloud (labs) users as Declined.

That seems unnecessarily complicated when I just want to get old stale tasks off of the Data-Services board. I'll just close this instead, if someone is interested in getting this through the process they're free to re-open this.

Oct 24 2024, 2:50 PM · Data-Engineering-Icebox

Oct 22 2024

Ottomata added a comment to T204950: Public Edit Data Lake: Mediawiki history snapshots available in SQL data store to cloud (labs) users.

@taavi many tickets were declined for complexity reasons, but we have new ways of potentially doing this. It needs to be prioritized though, so if you have desires/needs, please escalate them through https://www.mediawiki.org/wiki/Data_Platform_Engineering/Intake_Process

Oct 22 2024, 5:38 PM · Data-Engineering-Icebox
taavi added a comment to T204950: Public Edit Data Lake: Mediawiki history snapshots available in SQL data store to cloud (labs) users.

Sorry to poke a many-years-old ticket... but what still needs to happen here? All of the subtasks have been resolved already.

Oct 22 2024, 5:27 PM · Data-Engineering-Icebox
taavi moved T204950: Public Edit Data Lake: Mediawiki history snapshots available in SQL data store to cloud (labs) users from Datasets to Backlog on the Data-Services board.
Oct 22 2024, 5:26 PM · Data-Engineering-Icebox
elukey closed T234234: Port architecture of irc-recentchanges to Kafka as Resolved.

To keep archives happy: in T376014 we moved irc.wikimedia.org's backend to https://github.com/paravoid/ircstream, a more modern stack that still uses UDP. It also supports EventStreams, and the next step will be to test/switch to the new event source and drop the UDP support. This task has had several birthdays and is not up to date; any further enhancements will be tracked in T376014.

Oct 22 2024, 10:42 AM · Data-Engineering, MW-1.41-notes (1.41.0-wmf.10; 2023-05-23), Event-Platform, User-Elukey, Analytics
elukey closed T240182: Create EventStream's equivalent to irc.wikimedia.org's #central channel, a subtask of T234234: Port architecture of irc-recentchanges to Kafka, as Declined.
Oct 22 2024, 10:32 AM · Data-Engineering, MW-1.41-notes (1.41.0-wmf.10; 2023-05-23), Event-Platform, User-Elukey, Analytics

Oct 17 2024

Ahoelzl moved T256891: EventGate and EventStreams rate limiting from Incoming (new tickets) to Event Platform Backlog on the Data-Engineering board.
Oct 17 2024, 5:47 PM · Data-Engineering, Event-Platform, Analytics
Ottomata removed a project from T159046: Track page views by page ID rather than title (handles moved pages): Data-Engineering.
Oct 17 2024, 5:16 PM · Data-Engineering, Data Products, AQS2.0, Pageviews-API, Analytics
Ottomata removed a project from T159046: Track page views by page ID rather than title (handles moved pages): Data-Engineering.
Oct 17 2024, 5:15 PM · Data-Engineering, Data Products, AQS2.0, Pageviews-API, Analytics
Ottomata moved T159046: Track page views by page ID rather than title (handles moved pages) from Incoming (new tickets) to Radar (External Teams) on the Data-Engineering board.
Oct 17 2024, 5:13 PM · Data-Engineering, Data Products, AQS2.0, Pageviews-API, Analytics
Ottomata removed a project from T159046: Track page views by page ID rather than title (handles moved pages): Data-Engineering.
Oct 17 2024, 5:08 PM · Data-Engineering, Data Products, AQS2.0, Pageviews-API, Analytics

Oct 10 2024

Ottomata renamed T291645: Produce ECS formatted logstash logs to Event Platform, allowing them to be queried in the WMF Data Lake with SQL from Integrate Event Platform and ECS logs to Produce ECS formatted logstash logs to Event Platform, allowing them to be queried in the WMF Data Lake with SQL.
Oct 10 2024, 1:19 PM · Observability-Logging, Analytics, Data-Engineering, Event-Platform, Wikimedia-Logstash, SRE
Ottomata updated the task description for T291645: Produce ECS formatted logstash logs to Event Platform, allowing them to be queried in the WMF Data Lake with SQL.
Oct 10 2024, 1:17 PM · Observability-Logging, Analytics, Data-Engineering, Event-Platform, Wikimedia-Logstash, SRE
matmarex added a comment to T291645: Produce ECS formatted logstash logs to Event Platform, allowing them to be queried in the WMF Data Lake with SQL.

I've been told that this project would let me process Logstash data with SQL queries, and I would like that very much.

Oct 10 2024, 11:31 AM · Observability-Logging, Analytics, Data-Engineering, Event-Platform, Wikimedia-Logstash, SRE