
Public Edit Data Lake: Mediawiki history snapshots available in SQL data store to cloud (labs) users
Open, Low, Public, 0 Estimated Story Points

Description

As part of the Wikistats 2 project, we have developed the Edit Data Lake (see https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits). The Edit Data Lake is a denormalized data store and the best dataset we have had to date for answering questions about content and contributors. At this time, this data is only available within the WMF, on the private Hadoop cluster.
This is the parent task for all the work to make the Data Lake data available on our public cloud infrastructure for our community at large; the more accessible that data is, the more impact it can have.

Related Objects

Status Subtype Assigned Task
Open None
Declined Ottomata
Resolved None
Resolved Cmjohnson
Resolved Ottomata
Resolved Ottomata
Declined None
Resolved Ottomata
Resolved JAllemandou
Resolved Andrew
Declined Andrew
Declined Ottomata
Duplicate None
Resolved Cmjohnson
Open None
Resolved fnegri
Resolved Bstorm
Resolved Bstorm
Resolved Marostegui
Resolved Marostegui
Resolved Marostegui
Resolved Bstorm
Resolved Bstorm
Resolved MoritzMuehlenhoff
Resolved Marostegui
Resolved Marostegui
Resolved Cmjohnson
Resolved dcaro
Resolved Marostegui
Resolved Request wiki_willy
Resolved Request Cmjohnson
Resolved Request Cmjohnson
Resolved Request Cmjohnson
Resolved Request Cmjohnson
Declined None
Resolved Kormat
Resolved ArielGlenn
Resolved Bstorm
Declined Bstorm
Resolved Bstorm
Resolved Bstorm
Resolved Jhernandez
Resolved razzi
Resolved Marostegui
Resolved Milimetric
Resolved Bstorm
Resolved Bstorm
Resolved Bstorm
Resolved Bstorm
Resolved Bstorm
Resolved Andrew
Resolved Bstorm
Resolved ayounsi
Resolved Jhernandez
Resolved Marostegui
Resolved Ragesoss
Resolved Bstorm
Resolved Bstorm
Resolved Bstorm
Open None
Open None
Open None
Open None
Open None
Open None
Resolved Milimetric
Declined None
Resolved JAllemandou
Declined None
Resolved JAllemandou
Resolved Milimetric
Resolved None
Declined Milimetric
Open None
Resolved Milimetric
Declined None
Open Ottomata
Open NoZeroDay

Event Timeline

Nuria renamed this task from Edit Data Lake available in labs: Mediawiki history snapshots available in SQL data store to Public Edit Data Lake: Mediawiki history snapshots available in SQL data store to clod (labs) users. Sep 20 2018, 1:45 PM
Nuria updated the task description.
Milimetric renamed this task from Public Edit Data Lake: Mediawiki history snapshots available in SQL data store to clod (labs) users to Public Edit Data Lake: Mediawiki history snapshots available in SQL data store to cloud (labs) users. Sep 24 2018, 3:49 PM
Milimetric triaged this task as Medium priority.
Milimetric removed a project: Analytics-Kanban.
Milimetric moved this task from Incoming to Smart Tools for Better Data on the Analytics board.
Milimetric raised the priority of this task from Medium to High. Oct 18 2018, 5:36 PM
Milimetric added a project: Analytics-Kanban.
Milimetric set the point value for this task to 0.
Milimetric subscribed.

De-prioritizing until cloud infrastructure can support monitoring similar to what we can do in production.

Milimetric lowered the priority of this task from High to Low. Aug 31 2020, 5:01 PM
Milimetric moved this task from Deprioritized to Smart Tools for Better Data on the Analytics board.

This needs further consideration.

Sorry to poke a ticket that is many years old, but what still needs to happen here? All of the subtasks have been resolved already.

@taavi many tickets were declined for complexity reasons, but we have new ways of potentially doing this. It needs to be prioritized though, so if you have desires/needs, please escalate them through https://www.mediawiki.org/wiki/Data_Platform_Engineering/Intake_Process

That seems unnecessarily complicated when I just want to get old stale tasks off of the Data-Services board. I'll just close this instead; if someone is interested in getting this through the process, they're free to re-open it.

I'd like to keep this open, but I will remove the Data-Services tag. It is something we would really like to do.