Fri, Nov 8
Progress update on the hypothesis for the week
Thu, Nov 7
I can confirm that this is the expected behavior.
Wed, Nov 6
@XiaoXiao-WMF this task is high priority for SDS 1.2.3, please let me know how to proceed.
Tue, Nov 5
@Samwalton9 and probably @KCVelaga_WMF can comment on this.
My understanding is that we need to work on these two things in parallel: the first is to avoid blocking the work on SDS 1.2.3, and the second is to fully accomplish the goal of this project.
@fkaelin can provide more details.
Fri, Nov 1
Progress update on the hypothesis for the week
Oct 18 2024
Weekly update:
Oct 16 2024
Thanks @achou , and also to @Aitolkyn and @MunizaA , you all did amazing work on making this model faster! The speedup is really impressive, and you used cutting-edge methods to make this possible. This improvement makes a huge difference from the final user's perspective, and especially for the WME use case.
Oct 14 2024
@Miriam pls confirm or update the parent task.
Hi @leila , thanks for your words.
I'm optimistic about the project finishing on time; as you said, we have a great team. I just highlighted the time constraints to explain why we are focusing on offline data this quarter. However, depending on how these definitions are going to be used in the future, it would be interesting to think about how they could work with live data (like real-time monitoring), but for now this is out of scope for this quarter.
Oct 11 2024
Progress update on the hypothesis for the week
- We have defined the list of participants in this project. Apart from myself, this includes 3 people from Research (Isaac, Pablo and Yu-Ming), 2 from Design Research (Claudia and Eli), one from the Moderation Tools team (Sam), one from Product Analytics (KC) and one from Product Design (Olga T.)
- Given the size of the team, we decided to split the work into two branches: a qualitative piece led by Claudia, and a quantitative one led by Isaac.
- Together with the KR owner (Leila), the hypothesis was defined as: If we combine existing knowledge about moderators with quantitative methods for detecting moderation activity, we can systematically define and identify Wikipedia moderators.
Oct 9 2024
Oct 7 2024
Oct 4 2024
Oct 2 2024
Confirm whether the hypothesis was supported or contradicted
Sep 13 2024
Progress update
- I’m working on building a set of keywords related to peacock behavior and promotional tone. To do this, I’m using a TF-IDF approach, a well-known method to identify terms (keywords) that characterize a set of documents.
- This week and next are short for me (I'm taking several days off), so it might take a bit more time to finalize this.
- I also communicated to my manager that there might be the possibility of building a product based on the fine-tuned model. In case we decide to move forward, we would need to coordinate with her and the other teams involved on how to proceed.
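The TF-IDF step described above can be sketched in pure Python. This is a minimal illustration of the idea (ranking terms that characterize a set of documents), not the actual pipeline, which presumably uses a proper tokenizer and the tagged article corpus:

```python
import math
from collections import Counter

def tfidf_keywords(docs, top_k=5):
    """Rank terms by accumulated TF-IDF across a set of documents.

    docs is a list of strings; returns the top_k highest-scoring terms.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in tokenized for term in set(doc))
    scores = Counter()
    for doc in tokenized:
        tf = Counter(doc)
        for term, count in tf.items():
            idf = math.log(n_docs / df[term]) + 1.0  # smoothed IDF
            scores[term] += (count / len(doc)) * idf
    return [term for term, _ in scores.most_common(top_k)]
```

A term that is frequent within the tagged set but rare across it (here, a hypothetical peacock word) floats to the top, which is what makes the method useful for building a keyword list.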
Sep 7 2024
Progress update
- Experiments:
- As planned, I studied the ability of the model fine-tuned to detect peacock behavior to detect other promotion-related content issues, described in this data set.
- I ran the model on 4 other datasets: {{fanpov}}, {{advert}}, {{autobiography}}, {{weasel}}
- The results (see below) show behavior similar to the peacock detection task: the model shows good precision and low recall (lower for templates other than peacock). This suggests that there is information about promotional tone that can be detected by the model, and depending on the setup the model could favor precision or recall.
- Coordination:
- We had a meeting with Peter Pelberg, Nicola Ayub, and Megan Neisler to discuss next steps.
- First, we decided that the model needs to be tested against a simple baseline, which can be just a string-matching approach looking for common peacock keywords. I'll be working on this during the next week(s) (note I'll be OoO a few days during the next two weeks).
- Peter is going to decide whether we want to go deeper on this specific task and analyze other factors related to turning this model into a product (serving time, UX, etc.), or work on other tasks that involve ML and user experiences.
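A minimal sketch of what such a string-matching baseline could look like, with the same precision/recall readout used for the fine-tuned model. The keyword list here is illustrative, not the actual one:

```python
# Hypothetical keyword list; the real baseline would use keywords
# derived from the TF-IDF analysis.
PEACOCK_KEYWORDS = {"legendary", "world-class", "renowned", "visionary", "award-winning"}

def flag_peacock(text):
    """Flag a text if it contains any peacock keyword."""
    words = text.lower().split()
    return any(word.strip(".,") in PEACOCK_KEYWORDS for word in words)

def precision_recall(texts, labels):
    """Compare baseline predictions against gold labels (True = peacock)."""
    preds = [flag_peacock(t) for t in texts]
    tp = sum(p and y for p, y in zip(preds, labels))
    fp = sum(p and not y for p, y in zip(preds, labels))
    fn = sum((not p) and y for p, y in zip(preds, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Like the model, a keyword baseline tends toward high precision and low recall: a hit is strong evidence of promotional tone, but peacock phrasing the list misses goes undetected.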
Sep 4 2024
@achou, just for my curiosity, is the "predict time" the total end-to-end time, or is total = preprocess + predict?
Aug 30 2024
Progress update
Aug 23 2024
Progress update
This looks great @jsn.sherman. Do you know if there is an overlap in the revisions that return an error for each model?
I'm just wondering if the ML model fails on very long diffs (given that it needs to process the text itself).
Aug 19 2024
Aug 14 2024
Aug 8 2024
@Samwalton9-WMF , just keep in mind that the scores from RRML and RRLA are different. This means that you may need to run new user tests to (re)define the thresholds.
Hi @Samwalton9-WMF , we chose RRLA because it was more stable, but since then we have made some updates to RRML (it was not only about serving time, but also errors for some revisions) aimed at making it more stable.
So, if there is interest in switching to RRML (for the 47 languages with coverage), my recommendation would be to run some stress tests on that service, and measure the % of errors and whether Automoderator can tolerate them.
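As a sketch, such a stress test could collect response status codes over a sample of revisions and check the error rate against whatever tolerance Automoderator has. The 5% threshold and the commented-out request loop are illustrative assumptions, not measured values or the real API call:

```python
def error_rate(status_codes):
    """Fraction of failed requests (any non-200 response)."""
    failures = sum(1 for code in status_codes if code != 200)
    return failures / len(status_codes) if status_codes else 0.0

def tolerable(status_codes, threshold=0.05):
    """True if the error rate stays below the (hypothetical) tolerance."""
    return error_rate(status_codes) < threshold

# In a real run, status_codes would be collected from the service, e.g.:
# for rev_id in sample_revisions:
#     resp = requests.post(RRML_URL, json={"rev_id": rev_id, "lang": "en"})
#     status_codes.append(resp.status_code)
```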
Aug 2 2024
Progress update
Jul 26 2024
Progress update
- I've been coordinating with the ML team to share code examples that cause their (experimental) infrastructure to fail. They will use this code as part of their use-case studies when testing new LLM infrastructure.
- In the meantime I've been working on writing code to fine-tune smaller language models. This requires:
- Data preprocessing and cleaning (done)
- Experimental design (done)
- Run experiments on stats machine (in progress)
- Met with the KR owner (Peter Pelberg) and explained the progress and next steps for this hypothesis.
Jul 23 2024
Hi! Apparently the data is missing again:
Jul 22 2024
Jul 19 2024
- Studied how to create prompts for Gemma2. Noticed the importance of using the special tokens and format.
- Designed a zero-shot experiment for detecting peacock behavior.
- Wrote code for testing the Gemma2 instance hosted by the ML team.
- The instance took more than 5 seconds per query.
- After a few requests (around 200) the instance stopped responding.
- I've reported this issue to the ML team; my understanding is they will be working on fixing it during the next week (cc: Chris Albon)
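For reference, Gemma 2 instruction-tuned models expect each conversation turn to be wrapped in special tokens (`<start_of_turn>`/`<end_of_turn>`), which is the formatting detail noted above. A small prompt-building helper might look like this (the instruction text is just an example):

```python
def build_gemma2_prompt(instruction, text):
    """Wrap a zero-shot instruction and input text in Gemma 2's turn tokens.

    The prompt ends with an open model turn so the model continues from there.
    """
    return (
        "<start_of_turn>user\n"
        f"{instruction}\n\n{text}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

prompt = build_gemma2_prompt(
    "Does the following text contain peacock (promotional) language? Answer yes or no.",
    "He is a legendary, world-class visionary.",
)
```

In practice, using the tokenizer's chat template (e.g. `tokenizer.apply_chat_template`) achieves the same formatting without hand-building the string.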
Jul 18 2024
You are right @leila we should merge them.
Jul 12 2024
Based on our previous research, we have created a dataset containing 9276 articles affected by peacock and other related policy violations on English Wikipedia. For each of them we have negative (no policy violations) and positive examples:
- autobiography: 1472
- fanpov: 350
- peacock: 2587
- weasel: 805
- advert: 4062
- Total: 9276
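The per-template counts above can be sanity-checked against the reported total:

```python
# Counts as listed in the update; the per-template figures sum to the total.
counts = {
    "autobiography": 1472,
    "fanpov": 350,
    "peacock": 2587,
    "weasel": 805,
    "advert": 4062,
}
total = sum(counts.values())
assert total == 9276  # matches the reported dataset size
```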
Jul 5 2024
I'm resolving the task and tracking the model's deployment in T369371
Jul 4 2024
- covid-data.wmf-research-tools.eqiad1.wikimedia.cloud (this one is shut off, so maybe it just needs to be deleted?)
- wikipediaWikidata.wmf-research-tools.eqiad1.wikimedia.cloud
I've just removed these two
Thanks for this work @isarantopoulos!
Jun 28 2024
@Trokhymovych, please post here the models' performance results
To keep this task updated, models for Wikipedia are ready and can be found here:
@Trokhymovych has addressed the comments and submitted the merge request. The model binary can be found here.
I'm going to coordinate with research engineers to decide next steps.
Jun 25 2024
Just for the record, we have migrated the fact-checking API to another instance and deleted the old one.
Jun 24 2024
Jun 18 2024
Thanks @JAllemandou !
May 20 2024
@XiaoXiao-WMF can you please provide more context?
May 6 2024
@lbowmaker the proposed solution sounds OK to me. I have two questions:
May 3 2024
@lbowmaker if I understand correctly, there is no alternative for obtaining historical data for Wikidata edits? If that's the case, we can't keep the Wikidata Revert Risk model updated.
May 2 2024
Apr 29 2024
This task has been resolved, please follow the model deployment here: T363718
Apr 17 2024
This was solved. More details here: T341820