Create a hidden revision tag for talk page comments
Closed, Resolved, Public

Description

This task involves extending the work we did in T274216 to create a way of explicitly tagging talk page comments.

Requirements

Meta

  • Anytime a "qualifying edit" is saved to a talk page assign a hidden discussiontools-added-comment revision tag.
  • The hidden discussiontools-added-comment revision tag should be applied to all "qualifying edits" regardless of the editing interface someone used to save said edit.

Qualifying edit
Any edit that adds a talk page comment, according to our talk page parser, using the same algorithm as for generating notifications. It's somewhat complicated (and still changing as we make improvements to the notifications), but hopefully intuitively obvious? Latest version of the code for reference: EventDispatcher.php.

To better support the use cases, there will be minor differences compared to the notifications:

  • Notifications are only generated for comments signed by the same user who makes the edit; the tag will also be applied if e.g. someone (or a bot) signs an unsigned comment.
  • Notifications are only generated for comments in sections that you can subscribe to; the tag will also be applied to comments in other sections, e.g. comments in the 0th section of the page (before the first heading, or on a page with no headings) or comments under a heading other than level 2 (e.g. === ... === as opposed to the normal == ... ==).
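To make the two differences above concrete, here is a minimal Python sketch. It is not the DiscussionTools implementation (that is PHP, see EventDispatcher.php); it assumes a hypothetical `Comment` record with an ID, a signer and a heading level, and assumes the parser has already extracted the comments from the old and new revisions. `new_comments`, `should_notify` and `should_tag` are illustrative names only, and "level 2 heading" stands in for "subscribable section".

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Comment:
    # Hypothetical, simplified stand-in for a comment found by the talk page parser.
    comment_id: str     # stable ID, assumed to survive between revisions
    signer: str         # user named in the comment's signature
    heading_level: int  # 0 = before the first heading, 2 = "== ... ==", 3 = "=== ... ===", etc.


def new_comments(before: Iterable[Comment], after: Iterable[Comment]) -> List[Comment]:
    """Comments present in the new revision but absent from the old one."""
    old_ids = {c.comment_id for c in before}
    return [c for c in after if c.comment_id not in old_ids]


def should_notify(comment: Comment, editor: str) -> bool:
    # Notifications: only comments signed by the editor, in a subscribable (level 2) section.
    return comment.signer == editor and comment.heading_level == 2


def should_tag(before: Iterable[Comment], after: Iterable[Comment]) -> bool:
    # Tag: any new comment qualifies, whoever signed it and whatever section it is in.
    return len(new_comments(before, after)) > 0
```

With this framing, a bot signing Carol's previously unsigned comment produces a new comment signed by Carol: `should_notify` is false for the bot's edit, but `should_tag` is still true.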

Use cases

Explicitly tagging edits as comments will enable us to answer questions like the below which will help us all better understand how Discussion Tools are impacting the way people use talk pages.

  • "Of all the comments people posted in a specified period, what percentage of comments did people use the Reply Tool to publish?"
  • "How does the Reply Tool impact the average number of comments Junior and Senior Contributors post on a talk page each month?"
  • "On average, how long does a discussion last (first to last timestamp)?"
  • "On average, how many comments does a particular section have?"
  • "On average, how much time elapses between a conversation starting and another person commenting?"
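As a rough illustration of how the last three questions could be computed once comments are tagged, here is a Python sketch. It assumes, hypothetically, that tagged comments have already been grouped per section as `(timestamp, username)` pairs; recovering that grouping from revision data is not trivial (see the limitations raised in the Event Timeline). The function names are illustrative only.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import Dict, List, Optional, Tuple

# Hypothetical input shape: one (timestamp, username) pair per tagged comment.
TaggedComment = Tuple[datetime, str]


def duration(comments: List[TaggedComment]) -> timedelta:
    """First to last timestamp in one section."""
    times = [t for t, _ in comments]
    return max(times) - min(times)


def time_to_first_reply(comments: List[TaggedComment]) -> Optional[timedelta]:
    """Time from the first comment until someone other than its author comments."""
    ordered = sorted(comments)
    start, starter = ordered[0]
    for t, user in ordered[1:]:
        if user != starter:
            return t - start
    return None  # nobody else has commented (yet)


def summarize(sections: Dict[str, List[TaggedComment]]) -> Dict[str, float]:
    groups = list(sections.values())
    replies = [r for r in map(time_to_first_reply, groups) if r is not None]
    return {
        "avg_duration_hours": mean(duration(g).total_seconds() / 3600 for g in groups),
        "avg_comments_per_section": mean(len(g) for g in groups),
        # Sections with no reply from another person are left out of this average.
        "avg_hours_to_first_reply": (
            mean(r.total_seconds() / 3600 for r in replies) if replies else float("nan")
        ),
    }
```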

Open questions

  • How should "Qualifying edits" be defined? @MNeisler needs to know this so she can determine definitively which edits are and are not included in this definition.
    • Being discussed in T284200#7152688.
    • To start, we're going to define "qualifying edits" using the same logic we used to identify new comments in the context of topic subscriptions. More in T262107#7222207.
  • Is it possible to apply tags to historical edits?
    • Tagging historical edits is out of scope for this initial implementation. Reason: we are going to first verify the tagging logic is precise enough to be valuable before extending it to historical edits.
  • What performance implications need to be considered?
    • Regarding the possible performance implications, we estimate there will be significantly fewer edits tagged as comments than there are edits tagged as, say, mobile edits. [i] As such, we do not anticipate the Performance Team being concerned about this task, although we've asked them to verify this assumption in T262107#7152858.
  • How can the software tag/categorize edits as comments and new sections in real-time (read: as they happen)?
    • This will be worked out as part of implementing the Requirements above.
  • How might this tagging happen in a way that doesn't slow down the save process?
    • This will be worked out as part of implementing the Requirements above.
  • How/where do these tags get stored (e.g. Special:Tags)?
    • This will be worked out as part of implementing the Requirements above.

Testing instructions

At ar.wiki, en.wiki, he.wiki, bn.wiki, and ru.wiki do the following:

  1. Visit Special:Tags
  2. Locate the discussiontools-added-comment row
  3. Within the discussiontools-added-comment row, find the Tagged changes column
  4. Click the Tagged changes link; this will open Special:RecentChanges
    • (note: there are a few known issues with hidden tags on Special:RecentChanges, we're working on them in another task: T281741)
  5. Pick a set of 5 random diffs
  6. Verify each diff contains a new comment
    • Ideally it's intuitive to you whether a diff contains a new comment or not. If you're unsure about a diff, please post a link to it in this ticket.

Done

  • All "Open questions" are answered
  • Logic being used to determine a "Qualifying edit" is documented in the Requirements section
  • "Qualifying edits" are being assigned a hidden change tag called discussiontools-added-comment

i. https://en.wikipedia.org/wiki/Special:Tags

Event Timeline

@Esanders raised the following open questions:

  • How might this tagging happen in a way that doesn't slow down the save process?
  • How/where do these tags get stored (e.g. Special:Tags)? We'd need to talk with the Performance Team to see whether they're okay with this amount of data being stored.

Meta

  • In what format does Product Analytics need these comments to be stored in order for us to be able to answer questions like:
    • "On average, how many comments do topics started on Junior Contributor talk pages receive?"

@Esanders expressed a preference for categorizing comments by using change tags. An open question remains about whether this can be done in a performant way. We plan to pick this up on 2-Nov, two weeks after T252555, at which point we will know what the baseline page load times are and thus can decide whether appending tags at save time is sustainable.

@Esanders confirmed this should be good to go. One thing he mentioned is the need to add additional logging to ensure there aren't additional performance regressions. We will need to decide which tags to use, and whether to apply the tag to all replies or only to non-DiscussionTools replies.

This task needs more details; I will fill them in and then move this back to "Ready for development."

10-February team discussion
We reached these conclusions:

  1. On the topic of how the work we are doing in T274216 could relate to this ticket:
    • We determined that the logic the software uses to decide when to notify someone about a new comment in a section they are subscribed to could be extended to assign hidden comment tags to edits that are comments, as this ticket is requesting.
  2. On the topic of how we might determine how many comments a given talk page "has":
    • We determined the software is not currently able to relate comments to one another. This means it is not currently possible for us to answer a question like, "On average, how many comments do topics started on Junior Contributor talk pages receive?"

We arrived at these resulting actions:

  • Considering how the work we're doing in T274216 can be extended to this ticket, we're going to start work on this task once T274216 is resolved.
  • Considering the current limitation described in "2." above, we're going to de-scope the "On average, how many comments do topics started on Junior Contributor talk pages receive?" question from this task and revisit it in T274834.
ppelberg renamed this task from [SPIKE] How might we categorize talk page edits at scale? to Create a hidden revision tag for talk page comments. Feb 16 2021, 1:31 AM
ppelberg updated the task description.
ppelberg moved this task from Backlog to Triaged on the DiscussionTools board.
ppelberg moved this task from Incoming to Upcoming on the Editing-team (Kanban Board) board.

13-April discussion with Ed
Action

  • As part of T264885, we started storing information about comments posted to talk pages. Can you please share what information about each comment is being stored and where/how this information is being stored?

Context: I ask the above wondering what – if any – changes might need to be made so we can use this new data we are logging to answer questions like those listed in the task description's Use cases section.

Scratch the above. Let's answer this in T280100.

ppelberg updated the task description.

Performance-Team: the Editing Team is tagging you all on this task to make you aware of the work we plan to do to introduce a new edit tag [i] that will be associated with all talk page comments.

We estimate there will be significantly fewer edits tagged as talk page comments than there are/will be edits tagged as, say, mobile edits. As such, we do NOT anticipate you all being concerned about this task.

Please comment if any aspect of the above reads to you as risky/unexpected/etc.


i. https://en.wikipedia.org/wiki/Special:Tags

In task description, @ppelberg wrote:

As such, we do anticipate the Performance Team being concerned about this task.

As such, we do NOT anticipate you all being concerned about this task.

Now which one? Do or do not?

Good catch. Thank you.

Use cases

Explicitly tagging edits as comments will enable us to answer questions like the below which will help us all better understand how Discussion Tools are impacting the way people use talk pages.

  • (…)
  • (…)
  • "On average, how long does a discussion last (first to last timestamp)."
  • "On average, how many comments does a particular section have?"
  • "On average, how much time elapses between a conversation starting and another person commenting?"

Some limitations to keep in mind:

  • The section titles are not recorded separately anywhere, only as part of the edit summary, which makes querying them a bit tricky and probably slower
  • Sections can be renamed, or can have sub-sections, which can't be detected based on just the revision history and tags
  • If someone edits using the wikitext editor and does not include the section title in the edit summary, you won't be able to tell at all that the comment was in that section

I think the proposed edit tag can still be useful despite that, but this might affect the results.
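One way around the first limitation listed above is to read the section title out of the edit summary. The Python sketch below assumes the standard MediaWiki autocomment form `/* Section title */` that section edits add to the summary. It returns `None` when no such autocomment is present, which is exactly the third limitation, and it cannot detect renames or sub-sections. `section_from_summary` is a hypothetical helper name.

```python
import re
from typing import Optional

# Section edits put the section title in an autocomment like "/* Section title */".
# This sketch simply takes the first such autocomment in the summary.
AUTOCOMMENT = re.compile(r"/\*\s*(.*?)\s*\*/")


def section_from_summary(summary: str) -> Optional[str]:
    match = AUTOCOMMENT.search(summary)
    if match is None or not match.group(1):
        return None  # no (non-empty) section title recorded in this summary
    return match.group(1)
```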

Is it possible to apply tags to historical edits?

Theoretically yes, but we'd need to write and run a custom maintenance script.

I'd very much support a generic maintenance script for this. Something like mwscript applyChangeTags.php --wiki=xxwiki --change-tag='xxxx' --revs-file=revs.txt would have been useful at least once in the past (I'd like to find some time to look into server-side upload efficiency, and while I'm able to identify some of the past files uploaded through that procedure, it's hard to apply the tag to them).

We'd still need to generate that revs.txt file… For which we'd need a custom maintenance script ;)

I think this file can be easily generated with a single database query (at least with the naive definition: a talk page comment is any edit to any page in a namespace whose ID is not divisible by 2). This query can be run either through the mysql.php maintenance script against production databases, or manually against one of the replicas (cloud/analytics).
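The naive definition above could be sketched like this in Python. It assumes rows of `(rev_id, page_namespace)` fetched from a replica (for example via a join of the `revision` and `page` tables), and it assumes a one-ID-per-line format for `--revs-file`, since the script and its flags are only proposed in the comment above. The helper names are hypothetical.

```python
from typing import Iterable, List, Tuple

# Rows could come from a replica query along the lines of:
#   SELECT rev_id, page_namespace FROM revision JOIN page ON rev_page = page_id
# (an illustration only; add date or other filters as needed).


def is_talk_namespace(ns: int) -> bool:
    # Talk namespaces have odd IDs (1 = Talk, 3 = User talk, ...).
    # Negative IDs (-1 = Special, -2 = Media) are virtual and have no revisions,
    # but in Python -1 % 2 == 1, so exclude them explicitly.
    return ns > 0 and ns % 2 == 1


def naive_comment_revs(rows: Iterable[Tuple[int, int]]) -> List[int]:
    """Revision IDs that count as 'comments' under the naive definition."""
    return [rev_id for rev_id, ns in rows if is_talk_namespace(ns)]


def write_revs_file(path: str, rev_ids: Iterable[int]) -> None:
    # Assumed format for the proposed --revs-file flag: one revision ID per line.
    with open(path, "w", encoding="utf-8") as f:
        for rev_id in rev_ids:
            f.write(f"{rev_id}\n")
```

This over-counts by design: every edit to a talk page, including archiving or typo fixes, is included, which is the gap the reply below addresses.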

We're trying to do the non-naive thing, and actually distinguish edits that add new comments, and edits that don't. To do that we intend to use the code from DiscussionTools that generates notifications (https://www.mediawiki.org/wiki/Talk_pages_project/Notifications), except that instead of generating notifications for new comments, it would add a tag if new comments are added.

Oh, that's very interesting! I still think implementing this as a generic script to manipulate tags is going to be helpful for other people -- since whatever the edit did is not going to change, delay between generation and running is acceptable. Just my 2 cents :).

What is our shared definition of a comment? Some approaches:
From PP:

TOPIC

COMMENT 1
: COMMENT 2
COMMENT 3

What is comment 3?
Comment 1 is different from Comment 3 and we should be able to differentiate between them, though they are both top-level comments.
Comment 1 is "privileged" only insofar as it was the catalyzing comment.
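In a deliberately simplified model, PP's distinction can be expressed with indentation depth plus position. The Python sketch below assumes one comment per line and treats leading `:` / `*` characters as reply indentation; the actual DiscussionTools parser is considerably more involved (it works from signatures and timestamps). `ParsedComment` and `classify` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ParsedComment:
    text: str
    depth: int          # number of leading ":" / "*" indentation characters
    starts_topic: bool  # True only for the first top-level comment under the heading


def classify(lines: List[str]) -> List[ParsedComment]:
    """Assumes one comment per line and that indentation alone marks replies."""
    result = []
    seen_top_level = False
    for line in lines:
        stripped = line.lstrip(":*")
        depth = len(line) - len(stripped)
        text = stripped.strip()
        if not text:
            continue  # skip blank lines
        starts_topic = depth == 0 and not seen_top_level
        if depth == 0:
            seen_top_level = True
        result.append(ParsedComment(text, depth, starts_topic))
    return result
```

For the example above, both COMMENT 1 and COMMENT 3 get depth 0, but only COMMENT 1 is marked as starting the topic.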

  • How should "Qualifying edits" be defined? @MNeisler needs to know this so she can determine definitively which edits are and are not included in this definition.

As @matmarex suggested during today's standup, to start, we're going to define "qualifying edits" using the same logic we used to identify new comments in the context of topic subscriptions (T274216).

From there, we can iterate on the logic as needed.

I've updated the task description to include the above.

Change 706688 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/extensions/DiscussionTools@master] Create a hidden revision tag for talk page comments

https://gerrit.wikimedia.org/r/706688

Change 706688 merged by jenkins-bot:

[mediawiki/extensions/DiscussionTools@master] Create a hidden revision tag for talk page comments

https://gerrit.wikimedia.org/r/706688

ppelberg claimed this task.