Prototype new models to facilitate sockpuppet detection.
This task is scheduled to start in Q2. The preparation for it starts in Q1.
Status | Assigned | Task
---|---|---
Resolved | DarTar | T171251 [Objective 3.1.2] Models for sockpuppet and toxic discussion detection
Resolved | Isaac | T171635 Prototype new models to facilitate sockpuppet detection
Declined | leila | T172795 Document the procedure and workflow for sockpuppet detection
Resolved | srijan | T172796 Create a research page for this project in meta
(restricted) | | Unknown Object (Task)
Declined | leila | T175103 Create a sketch of how sockpuppets are detected now
Resolved | DarTar | T176379 Conduct a small-scale checkusers survey
Resolved | DED | T236299 Port sock-puppet detection model in-house
Requested a list of attendees and a meeting for syncing up and aligning directions/expectations between Research, Scoring Platform, and Community Tech, prior to reaching out to potential external collaborators.
Notes from our exploratory call with Srijan and multiple WMF teams on August 4: https://etherpad.wikimedia.org/p/sockpuppetdetection
Summary of results from the meeting: there's generally support for this research. Next steps:
Srijan says:
"I am already working with Tilen, a visiting PhD student (just like I once myself was :)), on an algorithm to identify bad users in any platform, including Wikipedia. Initial experiments show that the algorithm performs well, also on a Wikipedia vandal identification dataset. The idea is to use it to find any type of bad user, including sockpuppets. I will send you some slides tomorrow so that you get a high level overview.
The plan is to get the basic framework of the algorithm done before Tilen leaves, which is in late September, and then tune it specifically for Wikipedia after that."
He has also asked whether the tool will work on private data (I communicated that he should assume that's the case) and whether we can learn the details of how sockpuppets are currently detected. I created tasks for documentation on meta (T172796) and for figuring out procedures (T172795).
Not the same thing, but one may want to be aware of T139810: RFC: Overhaul the CheckUser extension too.
@srijan Happy 2018! :)
I'm assigning this task to you as you're in charge of it. :) On our end, Dario will remain the point of contact. If you need my help at any point, just ping.
Update (No action needed):
Srijan and I met today (meeting notes) and we discussed the state of this task. The task is on a very good track given its complexity; detecting sockpuppets is not an easy task. In the past months, the researchers have tried 3 models (A, B, and C under Model 1) and managed to bring the AUC from almost random (~0.5) to 0.72. Right now, they're working on Model 2. The biggest challenge at the moment is to improve the speed of Model 2 for Wikipedia (because the model relies on every single edit, quite some work is needed to speed it up). Given the state of the model and the work remaining, the current estimate is that we'd be able to test the new model (hopefully with a much higher AUC) in May or June. This date may need an update if the results come out earlier or later. Let's not fix those dates as final in our operations, but that's the target.
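For readers unfamiliar with the metric: AUC is the probability that the model scores a randomly chosen true (sockpuppet) example above a randomly chosen negative one, so ~0.5 is chance and 0.72 is a real improvement. A minimal self-contained sketch of the computation, with illustrative labels and scores (not actual model output):

```python
def auc(labels, scores):
    """AUC: probability that a random positive example is scored above a
    random negative one (ties count half). ~0.5 means chance performance."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# A model that separates the classes well scores near 1.0, while constant
# scores give the "almost random" 0.5 mentioned above.
print(auc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]))  # 1.0
print(auc([1, 1, 0, 0], [0.5, 0.5, 0.5, 0.5]))  # 0.5
```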
Thank you for the update, @leila ! Excited to see the results of the new model.
Let me, @SPoore and/or @PEarleyWMF know if you need help getting feedback from CheckUsers or other sockpuppet hunters. As a reminder, the Anti-Harassment team has committed to building a simple UI to help our users interface with the model, if needed.
@TBolliger thanks! How do you recommend we set aside time for your team to help? On the one hand, being able to test via the simple UI you refer to in May/June is very plausible from the state of research, on the other hand, it's not 100% clear that the research will be ready. Is there some way you can set aside some time on your team's end for this without locking resources completely until we know more?
I've created T189324: Build UI to validate sockpuppet model with users to track this work.
Our team works in 2-week sprints, so this task can interrupt us at any point. If the model is determined ineffective, we can close this ticket as declined. But (more likely, and hopefully!) when the model is ready we can set up a call or email to discuss further details (implementation, and what we actually want users to validate).
I also wanted to mention that I've included this work in our Q4 goals: https://www.mediawiki.org/wiki/Wikimedia_Audiences/2017-18_Q4_Goals
Do you think Q4 is a reasonable timeframe? Or should it be Q1?
@TBolliger It's hard to commit to Q4 from the research perspective as we may not be able to make it. It really depends on how the research goes (and I know this is deeply uncertain:/). You can call it out in Q4 goals as a stretch goal, or leave it out for Q1, and if you get to do it in Q4, we can still report it. Does this work for your workflow?
OK, we'll drop it for Q4. If things get way ahead of schedule we can still work on it :)
Update time. I have sent the following status update and recommendation for next steps email to a few folks. Putting it here as well for visibility for others.
Summary: we have a feature-based model, built on public edit logs, that can predict whether two usernames belong to the same person with a performance of ~65%. We will talk with checkusers through Trust and Safety to see how we can move forward with implementation, and also to see if they're interested in having the model include features based on private data, which could enhance model performance.
Longer version:
Recommendation: we work with checkusers to implement the feature-based model for them, taking their workflows into account. We then add private data to the model as a set of features if checkusers are interested.
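To make "feature-based model built on public edit logs" concrete, here is a hypothetical sketch of the kind of pairwise features such a model could consume. The feature names, input shapes, and choice of features (page overlap, hour-of-day similarity) are assumptions for illustration, not the project's actual feature set:

```python
from collections import Counter
from math import sqrt

def pair_features(edits_a, edits_b):
    """Hypothetical pairwise features for two accounts, each given as a list
    of (page_title, hour_of_day) tuples taken from public edit logs."""
    pages_a, pages_b = {p for p, _ in edits_a}, {p for p, _ in edits_b}
    # Jaccard overlap of the sets of pages each account has edited.
    union = pages_a | pages_b
    page_overlap = len(pages_a & pages_b) / len(union) if union else 0.0
    # Cosine similarity of hour-of-day activity histograms.
    ha, hb = Counter(h for _, h in edits_a), Counter(h for _, h in edits_b)
    dot = sum(ha[h] * hb[h] for h in ha)
    norm = sqrt(sum(v * v for v in ha.values())) * sqrt(sum(v * v for v in hb.values()))
    hour_similarity = dot / norm if norm else 0.0
    return {"page_overlap": page_overlap, "hour_similarity": hour_similarity}

a = [("Cat", 9), ("Dog", 9), ("Cat", 22)]
b = [("Cat", 9), ("Fish", 10)]
print(pair_features(a, b))
```

A vector of features like these for each candidate account pair would then be fed to a standard binary classifier, which outputs the same-person score the summary refers to.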
I'll follow up with Trust and Safety about this now as I will need to talk about this with checkusers to see what they think is the best way to move forward.
Status update:
I expect the next couple of weeks to be spent on gathering their feedback and understanding how/if iterations over the model are needed.
We have received the first feedback from a checkuser, and we will need to change one thing in the set of predictions. At the moment we include all accounts, including those that have not edited in the past 90 days, but these predictions are not actionable for checkusers: the other (sometimes private) information they rely on is only retained for 90 days. It makes sense to either remove these predictions, or provide two outputs and let checkusers filter by whether an account has edited in the past 90 days.
We will create a new list and share with them.
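The 90-day split described above is simple to express. A sketch under assumed data shapes (a mapping from account name to last-edit timestamp; names and signature are hypothetical), implementing the two-output option:

```python
from datetime import datetime, timedelta

def split_by_recency(accounts, last_edit, now, window_days=90):
    """Split predicted accounts into those with an edit inside the window
    (actionable: checkuser data may still be retained) and the rest."""
    cutoff = now - timedelta(days=window_days)
    recent = [a for a in accounts if a in last_edit and last_edit[a] >= cutoff]
    stale = [a for a in accounts if a not in set(recent)]
    return recent, stale

now = datetime(2019, 1, 1)
last_edit = {"Alice": datetime(2018, 12, 20), "Bob": datetime(2018, 1, 5)}
print(split_by_recency(["Alice", "Bob", "Carol"], last_edit, now))
# (['Alice'], ['Bob', 'Carol'])
```

Accounts with no recorded edit at all (like "Carol" here) land in the stale list, matching the checkuser feedback that such predictions are not actionable.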
I reassigned the task to DED. We're still working with srijan on this. The reassignment makes it clearer on our end who we can poke for status updates.
Weekly update:
@Isaac thank you for the update. When we meet next time to talk about the technical items, let's make sure we discuss roadmaps for this line of research and model (something you and I touched on a month ago and we thought in December there will be more clarity to act based on).
> When we meet next time to talk about the technical items, let's make sure we discuss roadmaps for this line of research and model
Sounds good; I will add it to the agenda.
I'm going to close this task out unless there are any objections -- my work on this has largely been complete for a while now and no issues have come up yet in the productization that would require serious rework of the approach (though plenty of improvements have been made to the stability of the prototype). Future tasks that we might open are:
Summary: